June 27, 2016

Bayesian Perspective on Hierarchical Models

  • Hierarchical models are essentially models with interaction terms between predictors and group indicators, with the additional provisions that:
    • Group deviations are from a common mean rather than from a baseline category
    • Distributional assumptions are made for how the groups deviate
  • Suppose there are \(J\) groups and \(N_j\) observations in the \(j\)-th group, and \[y_{ij} \sim \mathcal{N}\left(\mu_{ij},\sigma\right) \ \forall\, i,j\] \[\mu_{ij} = \left(\alpha + a_j\right) + \left(\beta + b_j\right)x_{ij} = \alpha + \beta x_{ij} + a_j + b_j x_{ij} \ \forall\, i,j\] \[\begin{bmatrix}a_j\\ b_j \end{bmatrix} \sim \mathcal{N}_{2}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_a^2 & \rho \sigma_a \sigma_b \\ \rho \sigma_a \sigma_b & \sigma_b^2 \end{bmatrix}\right) \ \forall\, j\]
  • Bayesians put priors on the common parameters \(\sigma\), \(\alpha\), \(\beta\), \(\rho\), \(\sigma_a\), and \(\sigma_b\), as in the sketch below
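  • For concreteness, a minimal sketch assuming a hypothetical data frame d with columns y, x, and grouping factor g; the stan_glmer function is introduced properly later in these notes, and the priors shown are merely illustrative
library(rstanarm)
fit <- stan_glmer(y ~ x + (1 + x | g), data = d,
                  prior_intercept = normal(0, 10), # prior on alpha
                  prior = normal(0, 5),            # prior on beta
                  prior_covariance = decov())      # prior on Sigma: rho, sigma_a, sigma_b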

Frequentist Perspective on Hierarchical Models

  • For frequentists, \(a_j + b_j x_{ij}\) is part of the error term; thus
    • Observations within group \(j\) are not conditionally independent given \(\alpha\) and \(\beta\)
    • Frequentists are willing to make distributional assumptions about \(a_j\) and \(b_j\) (invariably bivariate normal) and to integrate them out of each group's likelihood. Let \(\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_a^2 & \rho \sigma_a \sigma_b \\ \rho \sigma_a \sigma_b & \sigma_b^2 \end{bmatrix}\) and
    \[z_j = \int\limits_{-\infty}^{\infty} \int\limits_{-\infty}^{\infty} \prod_{i=1}^{N_j} \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2} \left(\frac{y_{ij} - \alpha - \beta x_{ij} - a_j - b_j x_{ij}}{\sigma}\right)^2} \frac{1}{2 \pi \sqrt{\left|\boldsymbol{\Sigma}\right|}} e^{-\frac{1}{2} \begin{bmatrix} a_j \\ b_j \end{bmatrix}^\top \boldsymbol{\Sigma}^{-1} \begin{bmatrix} a_j \\ b_j \end{bmatrix}} \, da_j \, db_j\]
  • This particular integral happens to have a closed-form solution, and one can choose \(\sigma\), \(\alpha\), \(\beta\), \(\rho\), \(\sigma_a\), and \(\sigma_b\) to maximize \(\sum_{j=1}^J \ln z_j\)
  • But maximum likelihood is not a great estimator here, so in practice people penalize it, REML being the most common adjustment (see the lme4 sketch below)
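  • A minimal lme4 sketch under the same hypothetical data frame d; REML = FALSE requests plain maximum likelihood, while the default is the usual REML adjustment
library(lme4)
fit_ml   <- lmer(y ~ x + (1 + x | g), data = d, REML = FALSE) # maximum likelihood
fit_reml <- lmer(y ~ x + (1 + x | g), data = d)               # REML (the default)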

Limitations of Frequentist Perspective

  • For frequentists, \(a_j\) and \(b_j\) are not parameters and thus cannot be estimated
  • \(a_j\) and \(b_j\) can, however, be predicted from group \(j\)'s residuals implied by \(\widehat{\alpha}\) and \(\widehat{\beta}\) (see the ranef sketch after this list)
  • Since \(a_j\) and \(b_j\) are not estimated, you cannot construct standard errors
  • Thus, you cannot make frequentist inferences about \(a_j\) and / or \(b_j\)
  • You can conceptualize standard errors for the estimators of the common parameters \(\sigma\), \(\alpha\), \(\beta\), \(\rho\), \(\sigma_a\), and \(\sigma_b\), but they are hard to calculate unless you treat the predictions of \(a_j\) and \(b_j\) as given
  • To obtain a closed-form likelihood function to maximize, you have to assume normality both for the outcome (conditional on \(x_{ij}\)) and for \(a_j, b_j\). Otherwise, you get all of the computational difficulty with intractable integrals that Bayesians avoid with MCMC and none of the benefit in interpretation.
  • The optimization process is not nearly as routine as it is for flat GLMs, and you often get corner solutions where \(\widehat{\boldsymbol{\Sigma}}\) is not positive definite
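  • Continuing the hypothetical lme4 sketch, the conditional modes of \(a_j\) and \(b_j\) can be extracted, but no frequentist standard errors come with them
ranef(fit_reml) # predictions of a_j and b_j from the group-level residuals
fixef(fit_reml) # point estimates of the common alpha and beta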

The stan_glmer Function

  • The [g]lmer functions in the lme4 R package are very popular because people want to quickly estimate hierarchical models with a convenient syntax and interpret the results as if they were Bayesian
  • But you can slowly estimate hierarchical models using the same convenient syntax by using the stan_glmer function in the rstanarm R package and interpret the results in a genuinely Bayesian fashion
library(rstanarm)
post <- stan_glmer(Days ~ (1 | Age:Sex:Eth:Lrn), data = MASS::quine,
                   family = "neg_binomial_2")

Results

summary(post, probs = c(.25, .75))
## stan_glmer(formula = Days ~ (1 | Age:Sex:Eth:Lrn), data = MASS::quine, 
##     family = "neg_binomial_2")
## 
## Family: neg_binomial_2 (log)
## Algorithm: sampling
## Posterior sample size: 4000
## Observations: 146
## Groups: Age:Sex:Eth:Lrn 28
## 
## Estimates:
##                                            mean   sd     25%    75% 
## (Intercept)                                 2.7    0.1    2.6    2.8
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:A:AL]    0.3    0.3    0.0    0.5
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:A:SL]   -0.3    0.5   -0.6    0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:N:AL]    0.2    0.3    0.0    0.4
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:N:SL]    0.2    0.5   -0.1    0.5
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:A:AL]    0.0    0.3   -0.3    0.2
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:A:SL]   -0.2    0.4   -0.5    0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:N:AL]   -0.6    0.4   -0.9   -0.4
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:N:SL]    0.5    0.4    0.2    0.7
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:A:AL]   -0.1    0.3   -0.4    0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:A:SL]    0.4    0.3    0.2    0.6
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:N:AL]   -0.2    0.3   -0.4    0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:N:SL]   -0.7    0.3   -0.9   -0.5
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:A:AL]   -0.1    0.4   -0.4    0.2
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:A:SL]   -0.2    0.4   -0.5    0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:N:AL]   -0.5    0.5   -0.8   -0.2
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:N:SL]   -0.6    0.3   -0.8   -0.3
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:A:AL]   -0.3    0.6   -0.7    0.0
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:A:SL]    0.7    0.3    0.5    0.9
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:N:AL]   -0.4    0.6   -0.7    0.0
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:N:SL]   -0.6    0.3   -0.8   -0.4
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:A:AL]    0.5    0.3    0.3    0.7
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:A:SL]    0.7    0.4    0.4    0.9
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:N:AL]   -0.3    0.3   -0.5   -0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:N:SL]    0.5    0.4    0.2    0.7
## b[(Intercept) Age:Sex:Eth:Lrn:F3:F:A:AL]    0.0    0.3   -0.2    0.2
## b[(Intercept) Age:Sex:Eth:Lrn:F3:F:N:AL]    0.0    0.3   -0.2    0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F3:M:A:AL]    0.5    0.3    0.3    0.7
## b[(Intercept) Age:Sex:Eth:Lrn:F3:M:N:AL]    0.5    0.3    0.3    0.7
## overdispersion                              1.5    0.2    1.3    1.6
## mean_PPD                                   16.7    2.0   15.3   18.0
## log-posterior                            -585.9    5.8 -589.5 -581.8
## 
## Diagnostics:
##                                          mcse Rhat n_eff
## (Intercept)                              0.0  1.0  1516 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:A:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:N:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:A:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:N:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:A:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:N:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:A:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:N:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:A:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:N:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:A:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:N:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F3:F:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F3:F:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F3:M:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F3:M:N:AL] 0.0  1.0  4000 
## overdispersion                           0.0  1.0  4000 
## mean_PPD                                 0.0  1.0  4000 
## log-posterior                            0.2  1.0  1031 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Workflow for Stan via the rstan R Package

  1. You write the program in a (text) .stan file in the Stan language
  2. Stan's parser, stanc, does two things:
    • checks that the program is syntactically valid and tells you if not
    • writes a conceptually equivalent C++ source file to disk
  3. A C++ compiler creates a binary file from the C++ source
    • C++ is used due to its operator overloading and templating, which also facilitate automatic differentiation
  4. You execute the binary from R (can be concurrent with 2 – 3)
  5. You analyze the resulting draws from the posterior, as in the sketch below
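  • A minimal sketch of steps 2 – 5 in R, reusing the regression.stan program and data_block list that appear later in these notes (the stan function used there wraps the first two calls into one)
library(rstan)
sm <- stan_model("regression.stan")     # steps 2 and 3: parse, write C++, compile
post <- sampling(sm, data = data_block) # step 4: execute the binary
draws <- extract(post)                  # step 5: analyze the posterior draws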

Primitive Object Types in Stan

  • In Stan / C++, variables must first be declared with types
  • In Stan / C++, statements are terminated with semi-colons
  • Primitive scalar declarations: real x; or int K;
    • Unknowns cannot be int: there are no derivatives with respect to integers and hence no HMC
    • You can condition on integer data, since no derivatives with respect to data are needed
  • Real declarations: vector[K] z;, row_vector[K] zt;, matrix[N,K] X;
  • Arrays are just holders of any other homogeneous objects, like an R list where all elements are restricted to be the same type and shape
  • Vectors and matrices cannot contain genuinely integer data so use integer array declarations: int y[N]; or int Y[N,P];

Builtin Functions in Stan

  • rstan has a function called lookup
  • Input the name of an R function to find an analogous Stan function
  • Input a regular expression to find all matching Stan functions (second example below)
suppressPackageStartupMessages(library(rstan))
lookup("besselK")
##                    StanFunction       Arguments ReturnType Page
## 290 modified_bessel_second_kind (int v, real z)       real  342
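  • A regular expression returns every match; for example (output omitted)
lookup("bessel") # all Stan functions whose names match the regex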

Optional functions Block of a Stan Program

  • Stan permits users to define and use their own functions
  • If used, they must be defined in a leading functions block
  • Argument constraints cannot be declared in the signature; they can only be validated inside the function body
  • Very useful for several reasons
    1. Easier to reuse across different .stan programs
    2. Makes subsequent chunks of code more readable
    3. Enables likelihoods with Ordinary Differential Equations
    4. Can be exported to R via expose_stan_functions
  • All functions, whether user-defined or built-in, must be called by argument position rather than by argument name, and there are no default arguments
  • See the cumprod.stan file; a hypothetical sketch of such a file follows
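  • The function name and body below are assumptions about what cumprod.stan might contain, not the actual file; after expose_stan_functions("cumprod.stan"), the function would be callable from R

    functions {
      vector cumprod(vector x) { // running product of the elements of x
        vector[rows(x)] out;
        out[1] = x[1];
        for (k in 2:rows(x)) out[k] = out[k - 1] * x[k];
        return out;
      }
    }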

Constrained Object Declarations in Stan

  • Any primitive object can have lower and / or upper bounds if declared in the data, transformed data, parameters, or transformed parameters blocks
  • int<lower=1> K; real<lower=-1,upper=1> rho;
  • vector<lower=0>[K] alpha; and similarly for a matrix
  • Alternatively, a vector can be specialized as
    1. unit_vector[K] x; implies \(\sum_{k=1}^K{x_k^2} = 1\)
    2. simplex[K] x; implies \(x_k \geq 0 \ \forall\, k\) and \(\sum_{k=1}^K{x_k} = 1\)
    3. ordered[K] x; implies \(x_i \leq x_j \forall i<j\)
    4. positive_ordered[K] x; implies also \(0 \leq x_1\)
  • Alternatively, a matrix can be specialized as
    1. cov_matrix[K] Sigma; or better cholesky_factor_cov[K, K] L;
    2. corr_matrix[K] Lambda; or better cholesky_factor_corr[K] L;

Required data Block of a Stan Program

  • Contains declarations for everything being conditioned on in Bayes Rule
  • Each such object needs to be passed from R as a named list
  • Can have comments in C++ style (// comment or /* comment */)
  • Whitespace is essentially irrelevant

    data {
      int<lower=1> N; // number of observations
      int<lower=1> K; // number of predictors
      matrix[N, K] X; // design matrix
      vector[N]    y; // outcomes
      real<lower=0> prior_scale; // hyperparameter
    }

Optional transformed data Block

  • Is executed only once before the iterations start
  • Used to calculate needed deterministic functions of objects in the data block
  • Can use it to check that data was passed correctly from R (see the sketch below)
  • All declarations must come directly after the opening {

    transformed data {
      vector[N] log_y;
      log_y = log(y);
    }
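  • For example, a minimal sketch (assuming we want to guard the logarithm above) using Stan's reject statement to halt with an informative message if the data are invalid

    transformed data {
      vector[N] log_y;
      if (min(y) <= 0)
        reject("all elements of y must be positive to take logarithms");
      log_y = log(y);
    }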

Required parameters Block of a Stan Program

  • Declare everything whose posterior distribution is sought
  • Cannot declare int parameters
  • Cannot do assignments within the parameters block
  • Must specify the sample space of the parameters but lower and upper bounds are implicitly \(\pm\infty\) if unspecified

    parameters {
      vector[K] beta;
      real<lower=0> sigma_unscaled; // Jacobian handled automatically here
    }

Optional transformed parameters Block

  • Like transformed data but involves objects declared in the parameters block and is evaluated each leapfrog step
  • Constraints are validated and draws are stored

    transformed parameters {
      real<lower=0> sigma;
      sigma = sigma_unscaled * prior_scale;
    }

Required model Block of a Stan Program

  • Builds up an evaluation of the log-kernel function with the target keyword
  • Can declare local objects at the top of the model block and then assign to them but draws are not stored

    model {
      vector[N] eta;
      eta = X * beta;
      target += normal_lpdf(log_y | eta, sigma);      // likelihood of log(y)
      target += normal_lpdf(beta | 0, 5);             // prior for each beta_k
      target += exponential_lpdf(sigma_unscaled | 1); // prior for sigma_unscaled
    }
  • Can increment target with user-defined functions or arbitrary expressions; sampling statements, sketched below, are an equivalent shorthand
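  • For example, the same model using sampling statements, which drop additive constants from target but are otherwise equivalent

    model {
      vector[N] eta;
      eta = X * beta;
      log_y ~ normal(eta, sigma);      // same kernel as normal_lpdf above
      beta ~ normal(0, 5);
      sigma_unscaled ~ exponential(1);
    }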

Optional generated quantities Block

  • Only evaluated once per iteration
  • Useful to declare and define objects of interest that do not go into the likelihood function
  • Can reference any object declared in data, transformed data, parameters, or transformed parameters blocks
  • Can use pseudo-random number generation

    generated quantities {
      vector[N] y_rep; // posterior predictive draws for each log(y[n])
      for (n in 1:N) y_rep[n] = normal_rng(X[n] * beta, sigma);
    }

Calling a Stan Program

state.x77 <- within(as.data.frame(state.x77), { # choose reasonable units
  Density <- Population / Area
  Income <- Income / 1000
  Frost <- Frost / 100
})
X <- model.matrix(Murder ~ Density + Income + Illiteracy + Frost, data = state.x77)
y <- state.x77$Murder
data_block <- list(N = nrow(X), K = ncol(X), X = X, y = y, prior_scale = 5)
options(mc.cores = parallel::detectCores())
post <- stan("regression.stan", data = data_block)

Results

print(post, pars = 'y_rep', include = FALSE, digits = 2, probs = c(.25, .75))
## Inference for Stan model: regression.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##                  mean se_mean   sd    25%    75% n_eff Rhat
## beta[1]          0.74    0.02 0.74   0.22   1.24  1348    1
## beta[2]         -0.60    0.01 0.33  -0.81  -0.37  2295    1
## beta[3]          0.17    0.00 0.13   0.08   0.26  1598    1
## beta[4]          0.57    0.00 0.16   0.46   0.69  1671    1
## beta[5]         -0.22    0.00 0.18  -0.34  -0.11  1847    1
## sigma_unscaled   0.10    0.00 0.01   0.09   0.10  2271    1
## sigma            0.48    0.00 0.05   0.44   0.51  2271    1
## lp__           -48.00    0.05 1.79 -49.01 -46.66  1518    1
## 
## Samples were drawn using NUTS(diag_e) at Mon Jun 27 15:11:23 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
##  The estimated Bayesian Fraction of Missing Information is a measure of
##  the efficiency of the sampler with values close to 1 being ideal.
##  For each chain, these estimates are
##  1 0.9 0.9 1

Diagnostics

CA_rep <- extract(post, pars = 'y_rep')[[1]][,which(rownames(state.x77) == "California")]
hist(state.x77["California", "Murder"] - exp(CA_rep), prob = TRUE,
     main = "Errors for California", las = 1)

Implementing a Hierarchical Poisson Model

  • Let's write a Stan program for a count model with group-specific intercepts
  • To simplify, let's use a Poisson likelihood with a log link from \(\lambda\) to \(\eta\) instead of the negative binomial. This involves the poisson_log_lpmf function.
  • Hints: You can use R-style subsetting, and here is a data block to get you started (one possible sketch of the remaining blocks follows)

    data {
      int<lower=1> N;                    // number of observations
      int<lower=0> y[N];                 // outcomes
      int<lower=1> J;                    // number of groups
      int<lower=1, upper=J> group_ID[N]; // what group is y[n] in?
      real<lower=0> prior_scale_group;   // hyperparameter
    }
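  • One possible sketch of the remaining blocks, assuming normal group deviations on the log scale; the priors shown are merely illustrative

    parameters {
      real alpha;            // common intercept
      vector[J] a;           // group-specific intercept deviations
      real<lower=0> sigma_a; // standard deviation of the deviations
    }
    model {
      // R-style subsetting: a[group_ID] is a vector of length N
      target += poisson_log_lpmf(y | alpha + a[group_ID]);
      target += normal_lpdf(a | 0, sigma_a);
      target += normal_lpdf(alpha | 0, 5);                         // illustrative prior
      target += exponential_lpdf(sigma_a | 1 / prior_scale_group); // illustrative prior
    }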

Summary

  • You can write almost any posterior log-kernel (whose parameters are continuous) in the Stan language
  • Whether Stan samples efficiently depends on how numerically stable your Stan program is
    • But Stan will have no problem with a multivariate normal
    • Stan has no problem with a lot of non-normal posteriors
    • If Stan has problems, you can usually work around them
    • If you cannot get around those problems, other software will usually suffer from them too without being as easy to diagnose
  • It is easy to include particular Stan program(s) in a package, as rstanarm does; see the rstan.package.skeleton function in the rstan package