July 30, 2016
stanc
, does two things:
real x;
or int K;
vector[K] z;
, row_vector[K] zt;
, matrix[N,K] X;
int y[N];
or int Y[N,P]
;lookup
suppressPackageStartupMessages(library(rstan)) lookup("besselK")
## StanFunction Arguments ReturnType Page ## 290 modified_bessel_second_kind (int v, real z) real 342
functions
Block of a Stan Programfunctions
blockexpose_stan_functions
data
, transformed data
, parameters
, or transformed parameters
blocksint<lower=1> K; real<lower=-1,upper=1> rho;
vector<lower=0>[K] alpha;
and similarly for a matrix
vector
can be specialized as
unit_vector[K] x;
implies \(\sum_{k=1}^K{x_k^2} = 1\)simplex[K] x;
implies \(x_k \geq 0 \forall k\) and \(\sum_{k=1}^K{x_k} = 1\)ordered[K] x;
implies \(x_i \leq x_j \forall i<j\)positive_ordered[K] x;
implies also \(0 \leq x_1\)matrix
can be specialized as
cov_matrix[K] Sigma
or better cholesky_factor_cov[K,K] L;
corr_matrix[K] Lambda
or better cholesky_factor_corr[K] L;
data
Block of a Stan Program// comment
or /* comment */
)Whitespace is essentially irrelevant
data { int<lower=1> N; // number of observations int<lower=1> K; // number of predictors matrix[N, K] X; // design matrix vector[N] y; // outcomes real<lower=0> prior_mean; // hyperparameter }
transformed data
Blockdata
blockAll declarations must come directly after the opening {
transformed data { vector[N] log_y; log_y = log(y); }
parameters
Block of a Stan Programint
parametersparameters
blockMust specify the sample space of the parameters but lower and upper bounds are implicitly \(\pm\infty\) if unspecified
parameters { vector[K] beta; real<lower=0> sigma_unscaled; // Jacobian handled automatically here }
transformed parameters
Blocktransformed data
but involves objects declared in the parameters
block and is evaluated each leapfrog stepConstraints are validated and draws are stored
transformed parameters { real<lower=0> sigma; sigma = sigma_unscaled * prior_mean; }
model
Block of a Stan Programtarget
keywordCan declare local objects at the top of the model
block and then assign to them but draws are not stored
model { vector[N] eta; eta = X * beta; target += normal_lpdf(log_y | eta, sigma); // likelihood of log(y) target += normal_lpdf(beta | 0, 5); // prior for each beta_k target += exponential_lpdf(sigma_unscaled | 1); // prior for sigma_unscaled }
Can increment target
with user-defined functions or arbitrary expressions
generated quantities
Blockdata
, transformed data
, parameters
, or transformed parameters
blocksCan use pseduo-random number generation
generated quantities { vector[N] y_rep; // posterior beliefs about each y[n] for (n in 1:N) y_rep[n] = normal_rng(X[n,] * beta, sigma); }
state.x77 <- within(as.data.frame(state.x77), { # choose reasonable units Density <- Population / Area Income <- Income / 1000 Frost <- Frost / 100 }) X <- model.matrix(Murder ~ Density + Income + Illiteracy + Frost, data = state.x77) y <- state.x77$Murder data_block <- list(N = nrow(X), K = ncol(X), X = X, y = y, prior_mean = 5) options(mc.cores = parallel::detectCores()) post <- stan("regression.stan", data = data_block)
print(post, pars = 'y_rep', include = FALSE, digits = 2, probs = c(.25, .75))
## Inference for Stan model: regression. ## 4 chains, each with iter=2000; warmup=1000; thin=1; ## post-warmup draws per chain=1000, total post-warmup draws=4000. ## ## mean se_mean sd 25% 75% n_eff Rhat ## beta[1] 0.75 0.02 0.74 0.26 1.22 1476 1 ## beta[2] -0.60 0.01 0.33 -0.83 -0.39 2129 1 ## beta[3] 0.17 0.00 0.13 0.08 0.26 1666 1 ## beta[4] 0.57 0.00 0.16 0.46 0.68 1728 1 ## beta[5] -0.23 0.00 0.18 -0.35 -0.11 2090 1 ## sigma_unscaled 0.09 0.00 0.01 0.09 0.10 2368 1 ## sigma 0.47 0.00 0.05 0.44 0.51 2368 1 ## lp__ -47.99 0.05 1.82 -48.95 -46.64 1315 1 ## ## Samples were drawn using NUTS(diag_e) at Sat Jul 30 13:28:10 2016. ## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split chains (at ## convergence, Rhat=1).
CA_rep <- extract(post, pars = 'y_rep')[[1]][,which(rownames(state.x77) == "California")] hist(state.x77["California", "Murder"] - exp(CA_rep), prob = TRUE, main = "Errors for California", las = 1)
stan_glmer
Function[g]lmer
functions in the lme4 R package are very popular because people want to quickly estimate hierarchical models with a convenient syntax and interpret the results as if they were Bayesianstan_glmer
function in the rstanarm R package and interpret the results in a genuinely Bayesian fashionrstan.package.skeleton
function in the rstan package