Find your current working directory
getwd()
[1] "/Users/mitzi/tmp"
Change your working directory (as needed)
setwd("path/to/work/dir")
hello.stan
In exercise 1.2, we created a data generating process which produces draws from a normally-distributed RV with mean 5.0 and standard deviation 10.0. The Stan model which fits this data is:
data {
int<lower=0> N;
vector[N] y;
}
parameters {
real mu;
real<lower=0> sigma;
}
model {
y ~ normal(mu, sigma);
}
Two things to note:
In Stan, parameters must be declared with constraints corresponding to their support in the model. Because the standard deviation \(\sigma\) must be positive, we place a lower bound on the parameter declaration.
Because we don’t specify priors on parameters mu
and sigma
, Stan puts an uniform prior from \(-\infty\) to \(+\infty\) on these parameters; this is an improper prior. Stan can fit models with improper priors as long as there is data which results in a proper posterior. For guidance on choosing priors, see Stan wiki page: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
stan
The RStan function stan
fits a Stan model and returns the fitted result as a stanfit
object.
Exercise 2.1 Hello World in Stan
Create the file “hello.stan” in your current R working directory and copy the above program into it.
In your R environment, create variables “N” and “y”, where N is the number of draws, and y is a vector of draw from a normally-distributed RV with mean 5.0 and standard deviation 10.0:
N = 100
y = rnorm(N, mean=5, sd=10)
Run the stan
command to fit the model to the data (variables “N” and “y”) and save the resulting stanfit
object:
stanfit = stan("hello.stan",data=c("N","y"))
Print a summary of the stanfit
object:
print(stanfit)
First, the summary reports the number of draws from the posterior saved into the stanfit
object:
Inference for Stan model: hello.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
Next, it reports on all the parameters and any top-level variables declared in the transformed parameters and generated quantities blocks:
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu 5.48 0.02 1.09 3.24 4.78 5.50 6.21 7.58 2953 1
sigma 10.76 0.01 0.77 9.43 10.23 10.71 11.22 12.42 3656 1
lp__ -284.55 0.02 1.02 -287.21 -284.94 -284.24 -283.82 -283.57 1905 1
Lastly, it reminds you what n_eff
and Rhat
mean:
Samples were drawn using NUTS(diag_e) at Mon Oct 16 23:20:08 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Exercise 2.2
In an R session:
Fit this data and print the summary of the fit using commands:
fit_gt_100 = stan("hello.stan",data=c("N","y"))
print(fit_gt_100);
Does Stan recover the mean and standard deviation?
Fit this data using commands:
fit_lt_100 = stan("hello.stan",data=c("N","y"))
print(fit_lt_100);
How well does Stan recover the mean and standard deviation? How do the reported values for n_eff
and Rhat
change?