- Start R or R Studio.
Find your current working directory

`getwd() [1] "/Users/mitzi/tmp"`

Change your working directory (as needed)

`setwd("path/to/work/dir")`

`hello.stan`

In exercise 1.2, we created a data generating process which produces draws from a normally-distributed RV with mean 5.0 and standard deviation 10.0. The Stan model which fits this data is:

```
data {
int<lower=0> N;
vector[N] y;
}
parameters {
real mu;
real<lower=0> sigma;
}
model {
y ~ normal(mu, sigma);
}
```

Two things to note:

In Stan, parameters must be declared with constraints corresponding to their support in the model. Because the standard deviation \(\sigma\) must be positive, we place a lower bound on the parameter declaration.

Because we don’t specify priors on parameters

`mu`

and`sigma`

, Stan puts an uniform prior from \(-\infty\) to \(+\infty\) on these parameters; this is an improper prior. Stan can fit models with improper priors as long as there is data which results in a proper posterior. For guidance on choosing priors, see Stan wiki page: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations

`stan`

The RStan function `stan`

fits a Stan model and returns the fitted result as a `stanfit`

object.

**Exercise 2.1** Hello World in Stan

Create the file “hello.stan” in your current R working directory and copy the above program into it.

In your R environment, create variables “N” and “y”, where N is the number of draws, and y is a vector of draw from a normally-distributed RV with mean 5.0 and standard deviation 10.0:

`N = 100 y = rnorm(N, mean=5, sd=10)`

Run the

`stan`

command to fit the model to the data (variables “N” and “y”) and save the resulting`stanfit`

object:`stanfit = stan("hello.stan",data=c("N","y"))`

Print a summary of the

`stanfit`

object:`print(stanfit)`

First, the summary reports the number of draws from the posterior saved into the

`stanfit`

object:`Inference for Stan model: hello. 4 chains, each with iter=2000; warmup=1000; thin=1; post-warmup draws per chain=1000, total post-warmup draws=4000.`

Next, it reports on all the parameters and any top-level variables declared in the transformed parameters and generated quantities blocks:

`mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat mu 5.48 0.02 1.09 3.24 4.78 5.50 6.21 7.58 2953 1 sigma 10.76 0.01 0.77 9.43 10.23 10.71 11.22 12.42 3656 1 lp__ -284.55 0.02 1.02 -287.21 -284.94 -284.24 -283.82 -283.57 1905 1`

Lastly, it reminds you what

`n_eff`

and`Rhat`

mean:`Samples were drawn using NUTS(diag_e) at Mon Oct 16 23:20:08 2017. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).`

**Exercise 2.2**

In an R session:

- Set variable “N” to some value > 100
- Generate a vector “y” consisting N draws from a normally distributed RV with mean and standard deviation of your choice
Fit this data and print the summary of the fit using commands:

`fit_gt_100 = stan("hello.stan",data=c("N","y")) print(fit_gt_100);`

Does Stan recover the mean and standard deviation?

- Repeat this procedure, setting N to some value < 100, generate a new vector of N y’s
Fit this data using commands:

`fit_lt_100 = stan("hello.stan",data=c("N","y")) print(fit_lt_100);`

How well does Stan recover the mean and standard deviation? How do the reported values for `n_eff`

and `Rhat`

change?