Stan User's Guide
Overview
Part 1. Example Models
1
Regression Models
1.1
Linear regression
Matrix notation and vectorization
1.2
The QR reparameterization
1.3
Priors for coefficients and scales
1.4
Robust noise models
1.5
Logistic and probit regression
1.6
Multi-logit regression
Identifiability
1.7
Parameterizing centered vectors
\(K-1\) degrees of freedom
QR decomposition
Translated and scaled simplex
Soft centering
1.8
Ordered logistic and probit regression
Ordered logistic regression
1.9
Hierarchical logistic regression
1.10
Hierarchical priors
Boundary-avoiding priors for MLE in hierarchical models
1.11
Item-response theory models
Data declaration with missingness
1PL (Rasch) model
Multilevel 2PL model
1.12
Priors for identifiability
Location and scale invariance
Collinearity
Separability
1.13
Multivariate priors for hierarchical models
Multivariate regression example
1.14
Prediction, forecasting, and backcasting
Programming predictions
Predictions as generated quantities
1.15
Multivariate outcomes
Seemingly unrelated regressions
Multivariate probit regression
1.16
Applications of pseudorandom number generation
Prediction
Posterior predictive checks
2
Time-Series Models
2.1
Autoregressive models
AR(1) models
Extensions to the AR(1) model
AR(2) models
AR(\(K\)) models
ARCH(1) models
2.2
Modeling temporal heteroscedasticity
GARCH(1,1) models
2.3
Moving average models
MA(2) example
Vectorized MA(Q) model
2.4
Autoregressive moving average models
Identifiability and stationarity
2.5
Stochastic volatility models
2.6
Hidden Markov models
Supervised parameter estimation
Start-state and end-state probabilities
Calculating sufficient statistics
Analytic posterior
Semisupervised estimation
Predictive inference
3
Missing Data and Partially Known Parameters
3.1
Missing data
3.2
Partially known parameters
3.3
Sliced missing data
3.4
Loading matrix for factor analysis
3.5
Missing multivariate data
4
Truncated or Censored Data
4.1
Truncated distributions
4.2
Truncated data
Constraints and out-of-bounds returns
Unknown truncation points
4.3
Censored data
Estimating censored values
Integrating out censored values
5
Finite Mixtures
5.1
Relation to clustering
5.2
Latent discrete parameterization
5.3
Summing out the responsibility parameter
Log sum of exponentials: linear sums on the log scale
Dropping uniform mixture ratios
Recovering posterior mixture proportions
Estimating parameters of a mixture
5.4
Vectorizing mixtures
5.5
Inferences supported by mixtures
Mixtures with unidentifiable components
Inference under label switching
5.6
Zero-inflated and hurdle models
Zero inflation
Hurdle models
5.7
Priors and effective data size in mixture models
Comparison to model averaging
6
Measurement Error and Meta-Analysis
6.1
Bayesian measurement error model
Regression with measurement error
Rounding
6.2
Meta-analysis
Treatment effects in controlled studies
7
Latent Discrete Parameters
7.1
The benefits of marginalization
7.2
Change point models
Model with latent discrete parameter
Marginalizing out the discrete parameter
Coding the model in Stan
Fitting the model with MCMC
Posterior distribution of the discrete change point
Discrete sampling
Posterior covariance
Multiple change points
7.3
Mark-recapture models
Simple mark-recapture model
Cormack-Jolly-Seber with discrete parameter
Collective Cormack-Jolly-Seber model
Individual Cormack-Jolly-Seber model
7.4
Data coding and diagnostic accuracy models
Diagnostic accuracy
Data coding
Noisy categorical measurement model
Model parameters
Noisy measurement model
Stan implementation
8
Sparse and Ragged Data Structures
8.1
Sparse data structures
8.2
Ragged data structures
9
Clustering Models
9.1
Relation to finite mixture models
9.2
Soft K-means
9.2.1
Geometric hard K-means clustering
9.2.2
Soft K-means clustering
Stan implementation of soft K-means
Generalizing soft K-means
9.3
The difficulty of Bayesian inference for clustering
Non-identifiability
Multimodality
9.4
Naive Bayes classification and clustering
Coding ragged arrays
Estimation with category-labeled training data
Estimation without category-labeled training data
Full Bayesian inference for naive Bayes
Prediction without model updates
9.5
Latent Dirichlet allocation
The LDA Model
Summing out the discrete parameters
Implementation of LDA
Correlated topic model
10
Gaussian Processes
10.1
Gaussian process regression
10.2
Simulating from a Gaussian process
Multivariate inputs
Cholesky factored and transformed implementation
10.3
Fitting a Gaussian process
GP with a normal outcome
Discrete outcomes with Gaussian processes
Automatic relevance determination
10.3.1
Priors for Gaussian process parameters
Predictive inference with a Gaussian process
Multiple-output Gaussian processes
11
Directions, Rotations, and Hyperspheres
11.1
Unit vectors
11.2
Circles, spheres, and hyperspheres
11.3
Transforming to unconstrained parameters
11.4
Unit vectors and rotations
Angles from unit vectors
11.5
Circular representations of days and years
12
Solving Algebraic Equations
12.1
Example: system of nonlinear algebraic equations
12.2
Coding an algebraic system
12.3
Calling the algebraic solver
Data versus parameters
Length of the algebraic function and of the vector of unknowns
Pathological solutions
12.4
Control parameters for the algebraic solver
Tolerance
Maximum number of steps
13
Ordinary Differential Equations
13.1
Notation
13.2
Example: simple harmonic oscillator
13.3
Coding the ODE system function
Strict signature
13.4
Measurement error models
Simulating noisy measurements
Estimating system parameters and initial state
13.5
Stiff ODEs
13.6
Control parameters for ODE solving
Discontinuous ODE system function
Tolerance
Maximum number of steps
13.7
Adjoint ODE solver
13.8
Solving a system of linear ODEs using a matrix exponential
14
Computing One Dimensional Integrals
14.1
Calling the integrator
14.1.1
Limits of integration
14.1.2
Data vs. parameters
14.2
Integrator convergence
14.2.1
Zero-crossing integrals
14.2.2
Avoiding precision loss near limits of integration in definite integrals
15
Complex Numbers
15.1
Working with complex numbers
15.1.1
Constructing and accessing complex numbers
15.1.2
Complex assignment and promotion
15.1.3
Complex arrays
15.1.4
Complex functions
15.2
Complex random variables
15.3
Complex matrices and vectors
15.4
Complex linear regression
15.4.1
Independent real and imaginary error
15.4.2
Dependent complex error
16
Differential-Algebraic Equations
16.1
Notation
16.2
Example: chemical kinetics
16.3
Index of DAEs
16.4
Coding the DAE system function
Strict signature
16.5
Solving DAEs
16.6
Control parameters for DAE solving
Maximum number of steps
Part 2. Programming Techniques
17
Floating Point Arithmetic
17.1
Floating-point representations
17.1.1
Finite values
17.1.2
Normality
17.1.3
Ranges and extreme values
17.1.4
Signed zero
17.1.5
Not-a-number values
17.1.6
Positive and negative infinity
17.2
Literals: decimal and scientific notation
17.3
Arithmetic precision
17.3.1
Rounding and probabilities
17.3.2
Machine precision and the asymmetry of 0 and 1
17.3.3
Complementary and epsilon functions
17.3.4
Catastrophic cancellation
17.3.5
Overflow
17.3.6
Underflow and the log scale
17.4
Log sum of exponentials
17.4.1
Log-sum-exp function
17.4.2
Applying log-sum-exp to a sequence
17.4.3
Calculating means with log-sum-exp
17.5
Comparing floating-point numbers
18
Matrices, Vectors, and Arrays
18.1
Basic motivation
18.2
Fixed sizes and indexing out of bounds
18.3
Data type and indexing efficiency
Matrices vs. two-dimensional arrays
(Row) vectors vs. one-dimensional arrays
18.4
Memory locality
Memory locality
Matrices
Arrays
18.5
Converting among matrix, vector, and array types
18.6
Aliasing in Stan containers
19
Multiple Indexing and Range Indexing
19.1
Multiple indexing
19.2
Slicing with range indexes
Lower and upper bound indexes
Lower or upper bound indexes
Full range indexes
Slicing functions
19.3
Multiple indexing on the left of assignments
Assign-by-value and aliasing
19.4
Multiple indexes with vectors and matrices
Vectors
Matrices
Matrices with one multiple index
Arrays of vectors or matrices
Block, row, and column extraction for matrices
19.5
Matrices with parameters and constants
20
User-Defined Functions
20.1
Basic functions
User-defined functions block
Function bodies
Return statements
Reject statements
Type declarations for functions
Array types for function declarations
Data-only function arguments
20.2
Functions as statements
20.3
Functions accessing the log probability accumulator
20.4
Functions acting as random number generators
20.5
User-defined probability functions
20.6
Overloading functions
20.6.1
Warning on usage
20.6.2
Function resolution
20.7
Documenting functions
20.8
Summary of function types
Void vs. non-void return
Suffixed or non-suffixed
20.9
Recursive functions
20.10
Truncated random number generation
Generation with inverse CDFs
Truncated variate generation
21
Custom Probability Functions
21.1
Examples
Triangle distribution
Exponential distribution
Bivariate normal cumulative distribution function
22
Proportionality Constants
22.1
Dropping Proportionality Constants
22.2
Keeping Proportionality Constants
22.3
User-defined Distributions
22.4
Limitations on Using _lupdf and _lupmf Functions
23
Problematic Posteriors
23.1
Collinearity of predictors in regressions
Examples of collinearity
Mitigating the invariances
23.2
Label switching in mixture models
Mixture models
Convergence monitoring and effective sample size
Some inferences are invariant
Highly multimodal posteriors
Hacks as fixes
23.3
Component collapsing in mixture models
23.4
Posteriors with unbounded densities
Mixture models with varying scales
Beta-binomial models with skewed data and weak priors
23.5
Posteriors with unbounded parameters
Separability in logistic regression
23.6
Uniform posteriors
23.7
Sampling difficulties with problematic priors
Gibbs sampling
Hamiltonian Monte Carlo sampling
No-U-turn sampling
Examples: fits in Stan
24
Reparameterization and Change of Variables
24.1
Theoretical and practical background
24.2
Reparameterizations
Beta and Dirichlet priors
Transforming unconstrained priors: probit and logit
24.3
Changes of variables
Change of variables vs. transformations
Multivariate changes of variables
24.4
Vectors with varying bounds
Varying lower bounds
Varying upper and lower bounds
25
Efficiency Tuning
25.1
What is efficiency?
25.2
Efficiency for probabilistic models and algorithms
25.3
Statistical vs. computational efficiency
25.4
Model conditioning and curvature
Condition number and adaptation
Unit scales without correlation
Varying curvature
Reparameterizing with a change of variables
25.5
Well-specified models
25.6
Avoiding validation
25.7
Reparameterization
Example: Neal’s funnel
Reparameterizing the Cauchy
Reparameterizing a Student-t distribution
Hierarchical models and the non-centered parameterization
Non-centered parameterization
Multivariate reparameterizations
25.8
Vectorization
Gradient bottleneck
Vectorizing summations
Vectorization through matrix operations
Vectorized probability functions
Reshaping data for vectorization
25.9
Exploiting sufficient statistics
25.10
Aggregating common subexpressions
25.11
Exploiting conjugacy
25.12
Standardizing predictors and outputs
Standard normal distribution
25.13
Using map-reduce
26
Parallelization
26.1
Reduce-sum
26.1.1
Example: logistic regression
26.1.2
Picking the grainsize
26.2
Map-rect
26.2.1
Map function
Map function signature
26.2.2
Example: logistic regression
26.2.3
Example: hierarchical logistic regression
26.2.4
Ragged inputs and outputs
26.3
OpenCL
Part 3. Posterior Inference & Model Checking
27
Posterior Predictive Sampling
27.1
Posterior predictive distribution
27.2
Computing the posterior predictive distribution
27.3
Sampling from the posterior predictive distribution
27.4
Posterior predictive simulation in Stan
27.4.1
Simple Poisson model
27.4.2
Stan code
27.4.3
Analytic posterior and posterior predictive
27.5
Posterior prediction for regressions
27.5.1
Posterior predictive distributions for regressions
27.5.2
Stan program
27.6
Estimating event probabilities
27.7
Stand-alone generated quantities and ongoing prediction
28
Simulation-Based Calibration
28.1
Bayes is calibrated by construction
28.2
Simulation-based calibration
28.3
SBC in Stan
28.3.1
Example model
28.3.2
Testing a Stan program with simulation-based calibration
28.3.3
Pseudocode for simulation-based calibration
28.3.4
The importance of thinning
28.4
Testing uniformity
28.4.1
Indexing to simplify arithmetic
28.5
Examples of simulation-based calibration
28.5.1
When things go right
28.5.2
When things go wrong
28.5.3
When Stan’s sampler goes wrong
29
Posterior and Prior Predictive Checks
29.1
Simulating from the posterior predictive distribution
29.2
Plotting multiples
29.3
Posterior "p-values"
29.3.1
Which statistics to test?
29.4
Prior predictive checks
29.4.1
Coding prior predictive checks in Stan
29.5
Example of prior predictive checks
29.6
Mixed predictive replication for hierarchical models
29.7
Joint model representation
29.7.1
Posterior predictive model
29.7.2
Prior predictive model
29.7.3
Mixed replication for hierarchical models
30
Held-Out Evaluation and Cross-Validation
30.1
Evaluating posterior predictive densities
30.1.1
Stan program
30.2
Estimation error
30.2.1
Parameter estimates
30.2.2
Predictive estimates
30.2.3
Predictive estimates in Stan
30.3
Cross-validation
30.3.1
Stan implementation with random folds
30.3.2
User-defined permutations
30.3.3
Cross-validation with structured data
30.3.4
Cross-validation with spatio-temporal data
30.3.5
Approximate cross-validation
31
Poststratification
31.1
Some examples
31.1.1
Earth science
31.1.2
Polling
31.2
Bayesian poststratification
31.3
Poststratification in Stan
31.4
Regression and poststratification
31.5
Multilevel regression and poststratification
31.5.1
Dealing with small partitions and non-identifiability
31.6
Coding MRP in Stan
31.6.1
Binomial coding
31.6.2
Coding binary groups
31.7
Adding group-level predictors
32
Decision Analysis
32.1
Outline of decision analysis
32.2
Example decision analysis
Step 1. Define decisions and outcomes
Step 2. Define density of outcome conditioned on decision
Step 3. Define the utility function
Step 4. Maximize expected utility
32.3
Continuous choices
33
The Bootstrap and Bagging
33.1
The bootstrap
33.1.1
Estimators
33.1.2
The bootstrap in pseudocode
33.2
Coding the bootstrap in Stan
33.3
Error statistics from the bootstrap
33.3.1
Standard errors
33.3.2
Confidence intervals
33.4
Bagging
33.5
Bayesian bootstrap and bagging
Appendices
34
Using the Stan Compiler
34.1
Command-line options for stanc3
34.2
Understanding stanc3 errors and warnings
34.2.1
Warnings
34.2.2
Errors
34.3
Pedantic mode
34.3.1
Distribution argument and variate constraint issues
34.3.2
Special-case distribution issues
34.3.3
Unused parameters
34.3.4
Large or small constants in a distribution
34.3.5
Control flow depends on a parameter
34.3.6
Parameters with multiple tildes
34.3.7
Parameters with zero or multiple priors
34.3.8
Variables used before assignment
34.3.9
Strict or nonsensical parameter bounds
34.3.10
Nonlinear transformations
34.3.11
Pedantic mode limitations
34.4
Automatic updating and formatting of Stan programs
34.4.1
Automatic formatting
34.4.2
Canonicalizing
34.4.3
Known issues
34.5
Optimization
34.5.1
Optimization levels
34.5.2
O1 Optimizations
34.5.3
Oexperimental Optimizations
35
Stan Program Style Guide
35.1
Choose a consistent style
35.2
Line length
35.3
File extensions
35.4
Variable naming
35.5
Local variable scope
35.6
Parentheses and brackets
Braces for single-statement blocks
Parentheses in nested operator expressions
No open brackets on own line
35.7
Conditionals
35.8
Functions
35.9
White space
Line breaks between statements and declarations
No tabs
Two-character indents
35.9.1
Space between if, { and condition
No space for function calls
Spaces around operators
No spaces in type constraints
Breaking expressions across lines
Spaces after commas
Unix newlines
36
Transitioning from BUGS
36.1
Some differences in how BUGS and Stan work
BUGS is interpreted; Stan is compiled
BUGS performs MCMC by updating one scalar parameter at a time; Stan uses HMC, which moves through the entire space of all the parameters at each step
Differences in tuning during warmup
The Stan language is directly executable; the BUGS modeling language is not
Differences in statement order
Stan computes the gradient of the log density; BUGS computes the log density but not its gradient
Both BUGS and Stan are semi-automatic
Licensing
Interfaces
Platforms
36.2
Some differences in the modeling languages
36.3
Some differences in the statistical models that are allowed
36.4
Some differences when running from R
36.5
The Stan community
References
15 Complex Numbers
Stan supports complex scalars, vectors, and matrices in addition to their real-valued counterparts.
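As a quick orientation before the sections below, here is a minimal sketch, with illustrative variable names and assuming a Stan version with complex-number support, that declares complex scalars, vectors, and matrices and uses the constructor and accessor functions (to_complex, get_real, get_imag) covered later in this chapter.

  transformed data {
    int N = 3;
    complex z = to_complex(1.5, -2.0);   // construct from real and imaginary parts
    real re = get_real(z);               // extract the real part: 1.5
    real im = get_imag(z);               // extract the imaginary part: -2.0
    complex_vector[N] v;                 // complex column vector
    complex_row_vector[N] rv;            // complex row vector (declared to show the type)
    complex_matrix[N, N] A;              // complex matrix (declared to show the type)
    array[N] complex zs;                 // array of complex scalars
    for (n in 1:N) {
      v[n] = to_complex(n, -n);          // element n holds n - n * i
    }
  }
  model {
    // intentionally empty; this program only illustrates complex declarations
  }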