Getting Started

• Start R or R Studio.
• Find your current working directory

getwd()
[1] "/Users/mitzi/tmp"
• Change your working directory (as needed)

setwd("path/to/work/dir")

Stan program hello.stan

In exercise 1.2, we created a data generating process which produces draws from a normally-distributed RV with mean 5.0 and standard deviation 10.0. The Stan model which fits this data is:

data {
int<lower=0> N;
vector[N] y;
}
parameters {
real mu;
real<lower=0> sigma;
}
model {
y ~ normal(mu, sigma);
}

Two things to note:

• In Stan, parameters must be declared with constraints corresponding to their support in the model. Because the standard deviation $$\sigma$$ must be positive, we place a lower bound on the parameter declaration.

• Because we don’t specify priors on parameters mu and sigma, Stan puts an uniform prior from $$-\infty$$ to $$+\infty$$ on these parameters; this is an improper prior. Stan can fit models with improper priors as long as there is data which results in a proper posterior. For guidance on choosing priors, see Stan wiki page: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations

RStan function stan

The RStan function stan fits a Stan model and returns the fitted result as a stanfit object.

Exercise 2.1 Hello World in Stan

1. Create the file “hello.stan” in your current R working directory and copy the above program into it.

2. In your R environment, create variables “N” and “y”, where N is the number of draws, and y is a vector of draw from a normally-distributed RV with mean 5.0 and standard deviation 10.0:

N = 100
y = rnorm(N, mean=5, sd=10)
3. Run the stan command to fit the model to the data (variables “N” and “y”) and save the resulting stanfit object:

stanfit = stan("hello.stan",data=c("N","y"))
4. Print a summary of the stanfit object:

print(stanfit)

First, the summary reports the number of draws from the posterior saved into the stanfit object:

Inference for Stan model: hello.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

Next, it reports on all the parameters and any top-level variables declared in the transformed parameters and generated quantities blocks:

     mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
mu       5.48    0.02 1.09    3.24    4.78    5.50    6.21    7.58  2953    1
sigma   10.76    0.01 0.77    9.43   10.23   10.71   11.22   12.42  3656    1
lp__  -284.55    0.02 1.02 -287.21 -284.94 -284.24 -283.82 -283.57  1905    1

Lastly, it reminds you what n_eff and Rhat mean:

Samples were drawn using NUTS(diag_e) at Mon Oct 16 23:20:08 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Exercise 2.2

In an R session:

• Set variable “N” to some value > 100
• Generate a vector “y” consisting N draws from a normally distributed RV with mean and standard deviation of your choice
• Fit this data and print the summary of the fit using commands:

fit_gt_100 = stan("hello.stan",data=c("N","y"))
print(fit_gt_100);

Does Stan recover the mean and standard deviation?

• Repeat this procedure, setting N to some value < 100, generate a new vector of N y’s
• Fit this data using commands:

fit_lt_100 = stan("hello.stan",data=c("N","y"))
print(fit_lt_100);

How well does Stan recover the mean and standard deviation? How do the reported values for n_eff and Rhat change?