Bayesian Modeling Using Stan, Part I

What is the paradigm?	What is fixed?	What is random?	What proportion is important?	What is the conclusion?
Randomization	\({y_1, y_2, \dots, y_N}\)	Treatment assignment	\(p\)-value for null: ATE \(= 0\)?	ATE \(\neq 0\)
Frequentist	\(Y\), \(\boldsymbol{\theta}\), \(N\)	Sample inclusion	\(\theta \in\) confidence intervals (plural)	Something basically Bayesian
Supervised learning	\({y_1, y_2, \dots, y_N}\)	Training / testing inclusion	Correctly classified outcomes in testing data	Some procedure predicts best
Bayesian	\({y_1, y_2, \dots, y_N}\), \(\boldsymbol{\theta}\)	Beliefs about \(\boldsymbol{\theta}\)	Posterior draws of \(\theta \in \left(a,b\right)\)	Decision or action

par(mar = c(4,4,1,1) + .1, las = 1, bg = "lightgrey")
x <- sapply(1:6, FUN = function(i) arima.sim(model = list(ar = 0.9999999), n = 10^6))
matplot(x, type = "l", col = 1:6, lty = 1)
for (j in 1:ncol(x)) abline(h = mean(x[,j]), col = j, lty = 2)

data("sat.act", package = "psych") # requires psych R package
sat.act <- within(sat.act, { # choose reasonable codings
  gender <- factor(gender, labels = c("male", "female"))
  # education <- as.factor(education)
})
library(rstanarm)
options(mc.cores = parallel::detectCores())

## stan_lm(formula = SATQ ~ gender + education + age, data = sat.act,
##     prior = R2(0.25))
##
## Estimates:
##               Median MAD_SD
## (Intercept)   640.38  13.97
## genderfemale  -42.04   8.96
## education       8.34   3.61
## age            -1.16   0.54
## sigma         113.63   3.12
## log-fit_ratio   0.00   0.03
## R2              0.04   0.01
##
## Sample avg. posterior predictive
## distribution of y (X = xbar):
##          Median MAD_SD
## mean_PPD 610.32   5.87

##                   25%     75%
## (Intercept)   630.909 649.767
## genderfemale  -48.152 -36.066
## education       5.947  10.839
## age            -1.527  -0.803
## sigma         111.568 115.762
## log-fit_ratio  -0.016   0.021
## R2              0.029   0.047

## Computed from 4000 by 687 log-likelihood matrix
##
##          Estimate   SE
## elpd_loo  -4230.2 18.8
## p_loo         4.9  0.4
## looic      8460.4 37.5
##
## All Pareto k estimates OK (k < 0.5)
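The three rows of the loo output are related: `looic` is `elpd_loo` put on the deviance scale, that is, multiplied by -2. The short check below is only a sketch that reuses the rounded numbers printed above; the variable names are illustrative and nothing here calls the loo package.

```r
# Values copied from the printed loo output (rounded as displayed)
elpd_loo <- -4230.2
looic    <- 8460.4

# looic is defined as -2 * elpd_loo
looic_check <- -2 * elpd_loo
print(looic_check)

# Compare with the reported looic, allowing for floating-point error
matches_report <- isTRUE(all.equal(looic_check, looic))
print(matches_report)
```

Because the two quantities differ only by a constant factor, models ranked by `elpd_loo` (higher is better) are ranked in the reverse order by `looic` (lower is better).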

Any primitive object can have lower and/or upper bounds if it is declared in the `data`, `transformed data`, `parameters`, or `transformed parameters` block:
`int<lower=1> K;` `real<lower=-1,upper=1> rho;`
`vector<lower=0>[K] alpha;` and similarly for a `matrix`
Alternatively, a vector can be specialized as
1. `unit_vector[K] x;` implies $\sum_{k=1}^K x_k^2 = 1$
2. `simplex[K] x;` implies $x_k \geq 0 \; \forall k$ and $\sum_{k=1}^K x_k = 1$
3. `ordered[K] x;` implies $x_i \leq x_j \; \forall i<j$
4. `positive_ordered[K] x;` additionally implies $0 \leq x_1$
Alternatively, a matrix can be specialized as
1. `cov_matrix[K] Sigma;` or, better, `cholesky_factor_cov[K,K] L;`
2. `corr_matrix[K] Lambda;` or, better, `cholesky_factor_corr[K] L;`
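To make the constraints listed above concrete, the following base R sketch builds one object of each kind from unconstrained real numbers. The constructions are chosen for readability and are not claimed to be the transformations Stan uses internally (the softmax for the simplex, in particular, is only an illustrative choice); the seed and `K = 4` are arbitrary.

```r
set.seed(1)            # arbitrary seed, for reproducibility only
K <- 4
z <- rnorm(K)          # unconstrained real numbers

# unit_vector[K]: squared elements sum to 1
unit_vec <- z / sqrt(sum(z^2))

# simplex[K]: non-negative elements that sum to 1 (softmax, for illustration)
simplex <- exp(z) / sum(exp(z))

# ordered[K]: first element is free, later ones add positive increments
ordered <- cumsum(c(z[1], exp(z[-1])))

# positive_ordered[K]: every increment is positive, so the first element is too
positive_ordered <- cumsum(exp(z))

# cov_matrix[K] built from a Cholesky factor:
# L is lower triangular with a strictly positive diagonal
L <- matrix(0, K, K)
L[lower.tri(L, diag = TRUE)] <- rnorm(K * (K + 1) / 2)
diag(L) <- exp(diag(L))
Sigma <- L %*% t(L)    # symmetric and positive definite by construction

# corr_matrix[K]: rescale Sigma so that its diagonal is 1
Lambda <- cov2cor(Sigma)
```

Working with the factor `L` rather than `Sigma` avoids repeatedly decomposing the matrix, which is one reason the Cholesky-factor types are described as the better option.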

data {
  int<lower=1> N;            // number of observations
  int<lower=1> K;            // number of predictors
  matrix[N, K] X;            // design matrix
  vector[N] y;               // outcomes
  real<lower=0> prior_scale; // hyperparameter
}

model {
  vector[N] eta;
  eta = X * beta;
  target += normal_lpdf(log_y | eta, sigma);      // likelihood of log(y)
  target += normal_lpdf(beta | 0, 5);             // prior for each beta_k
  target += exponential_lpdf(sigma_unscaled | 1); // prior for sigma_unscaled
}
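The model block above accumulates a log-density in `target`. As a rough illustration, the same three terms can be evaluated in plain R. The helper `log_target` is hypothetical, and two things are assumptions because the corresponding blocks are not shown here: that `log_y` holds `log(y)`, and that `sigma = prior_scale * sigma_unscaled`. The sketch also omits the Jacobian adjustment that Stan adds for constrained parameters, and the inputs are simulated toy data rather than the SAT scores.

```r
# Hypothetical helper: evaluates the three terms the model block adds to target
log_target <- function(beta, sigma_unscaled, X, log_y, prior_scale) {
  # ASSUMPTION: relationship between sigma and sigma_unscaled is not shown in the source
  sigma <- prior_scale * sigma_unscaled
  eta <- as.vector(X %*% beta)                      # linear predictor, eta = X * beta
  sum(dnorm(log_y, mean = eta, sd = sigma, log = TRUE)) +  # normal_lpdf(log_y | eta, sigma)
    sum(dnorm(beta, mean = 0, sd = 5, log = TRUE)) +       # normal_lpdf(beta | 0, 5)
    dexp(sigma_unscaled, rate = 1, log = TRUE)             # exponential_lpdf(sigma_unscaled | 1)
}

# Toy inputs (simulated, not the SAT data)
set.seed(123)
N <- 50
X <- cbind(1, rnorm(N))                 # intercept plus one predictor
beta_true <- c(6, 0.1)
log_y <- as.vector(X %*% beta_true) + rnorm(N, sd = 0.2)

# The log-density is much higher near the generating values than far from them
lt_near <- log_target(beta_true, sigma_unscaled = 0.04, X, log_y, prior_scale = 5)
lt_far  <- log_target(c(0, 0),   sigma_unscaled = 0.04, X, log_y, prior_scale = 5)
print(c(near = lt_near, far = lt_far))
```

Writing each contribution as a separate `target +=` statement, as the Stan program does, keeps the likelihood and each prior visible as individual terms of the sum.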

library(rstan) # provides stan()
X <- model.matrix(SATQ ~ gender + education + age, data = sat.act)
y <- sat.act$SATQ
y <- y[!is.na(y)] # drop missing outcomes so that y matches the rows of X
data_block <- list(N = nrow(X), K = ncol(X), X = X, y = y, prior_scale = 5)
options(mc.cores = parallel::detectCores())
post <- stan("regression.stan", data = data_block)

## Inference for Stan model: regression.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
##                 mean se_mean   sd   25%   75% n_eff Rhat
## beta[1]         6.44    0.00 0.03  6.42  6.46  2201    1
## beta[2]        -0.07    0.00 0.02 -0.08 -0.06  2500    1
## beta[3]         0.02    0.00 0.01  0.01  0.02  2604    1
## beta[4]         0.00    0.00 0.00  0.00  0.00  3428    1
## sigma_unscaled  0.04    0.00 0.00  0.04  0.04  2142    1
## sigma           0.21    0.00 0.01  0.21  0.21  2142    1
## lp__           84.63    0.04 1.52 83.87 85.75  1686    1
##
## Samples were drawn using NUTS(diag_e) at Mon Jul 11 22:34:07 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
## The estimated Bayesian Fraction of Missing Information is a measure of
## the efficiency of the sampler with values close to 1 being ideal.
## For each chain, these estimates are
## 1.1 1 1 1.1

summary(post, probs = c(.25, .75))

## stan_glmer(formula = Days ~ (1 | Age:Sex:Eth:Lrn), data = MASS::quine, 
##     family = "neg_binomial_2")
## 
## Family: neg_binomial_2 (log)
## Algorithm: sampling
## Posterior sample size: 4000
## Observations: 146
## Groups: Age:Sex:Eth:Lrn 28
## 
## Estimates:
##                                            mean   sd     25%    75% 
## (Intercept)                                 2.7    0.1    2.6    2.7
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:A:AL]    0.3    0.4    0.0    0.5
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:A:SL]   -0.3    0.5   -0.6    0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:N:AL]    0.2    0.3    0.0    0.4
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:N:SL]    0.2    0.5   -0.1    0.5
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:A:AL]    0.0    0.3   -0.2    0.2
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:A:SL]   -0.2    0.4   -0.5    0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:N:AL]   -0.6    0.4   -0.9   -0.4
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:N:SL]    0.5    0.4    0.2    0.7
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:A:AL]   -0.1    0.3   -0.3    0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:A:SL]    0.4    0.3    0.2    0.6
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:N:AL]   -0.2    0.3   -0.4    0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:N:SL]   -0.6    0.3   -0.8   -0.5
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:A:AL]   -0.1    0.4   -0.4    0.2
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:A:SL]   -0.2    0.4   -0.5    0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:N:AL]   -0.5    0.5   -0.8   -0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:N:SL]   -0.5    0.3   -0.8   -0.3
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:A:AL]   -0.3    0.5   -0.7    0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:A:SL]    0.8    0.3    0.6    0.9
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:N:AL]   -0.4    0.6   -0.7    0.0
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:N:SL]   -0.6    0.3   -0.8   -0.4
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:A:AL]    0.5    0.3    0.3    0.7
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:A:SL]    0.7    0.3    0.4    0.9
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:N:AL]   -0.3    0.3   -0.5   -0.1
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:N:SL]    0.5    0.4    0.2    0.7
## b[(Intercept) Age:Sex:Eth:Lrn:F3:F:A:AL]    0.0    0.3   -0.1    0.2
## b[(Intercept) Age:Sex:Eth:Lrn:F3:F:N:AL]    0.0    0.3   -0.2    0.2
## b[(Intercept) Age:Sex:Eth:Lrn:F3:M:A:AL]    0.5    0.3    0.3    0.7
## b[(Intercept) Age:Sex:Eth:Lrn:F3:M:N:AL]    0.5    0.3    0.3    0.7
## overdispersion                              1.5    0.2    1.3    1.6
## mean_PPD                                   16.6    1.9   15.2   17.9
## log-posterior                            -585.9    5.8 -589.5 -581.8
## 
## Diagnostics:
##                                          mcse Rhat n_eff
## (Intercept)                              0.0  1.0  1693 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:A:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:F:N:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:A:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F0:M:N:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:A:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:F:N:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:A:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F1:M:N:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:A:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:F:N:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:A:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F2:M:N:SL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F3:F:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F3:F:N:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F3:M:A:AL] 0.0  1.0  4000 
## b[(Intercept) Age:Sex:Eth:Lrn:F3:M:N:AL] 0.0  1.0  4000 
## overdispersion                           0.0  1.0  4000 
## mean_PPD                                 0.0  1.0  4000 
## log-posterior                            0.2  1.0  1050 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Installation

Outline

Obligatory Disclosure

Why Bayes?

What Is the Probability that Johnny Manziel Has Narcissistic Personality Disorder?

Different Perspectives on Probability

Two Justifications for Bayes Rule

Markov Chain Monte Carlo

A Markov Process with Severe Dependence

Bayesianism and the Crisis in Psychology

Why Doesn't Everyone Use Bayesian Methods?

Bayesian Process

What is Stan and How Does It Help?

Overview of Hamiltonian Monte Carlo

Example of Drawing from a Multivariate Normal

Details of Hamiltonian Monte Carlo

A Model for SAT Quantitative Scores

Results

Diagnostics

You Can Do Anything with the Draws

Model Comparison

Using the loo Function

Stan Language

Workflow for Stan via the rstan R Package

Primitive Object Types in Stan

Built-in Functions in Stan

Optional functions Block of a Stan Program

Constrained Object Declarations in Stan

Required data Block of a Stan Program

Optional transformed data Block

Required parameters Block of a Stan Program

Optional transformed parameters Block

Required model Block of a Stan Program

Optional generated quantities Block

Calling a Stan Program

Results

Summary

Hierarchical Models

Bayesian Perspective on Hierarchical Models

Frequentist Perspective on Hierarchical Models

Limitations of Frequentist Perspective

The stan_glmer Function

Results
