Case Studies
open-source methods and models
The case studies on this page are intended to reflect best practices in Bayesian methodology and Stan programming.
Contributing Case Studies
To contribute a case study, please contact us through the Stan Forums. We require

a documented, reproducible example with narrative documentation (e.g., knitr or Jupyter with software/compiler versions noted and seeds fixed) and

an open-source code license (preferably BSD or GPL for code, Creative Commons for text); authors retain all copyright.
Stan Case Studies, Volume 5 (2018)
Predator-Prey Population Dynamics: the Lotka-Volterra model in Stan
Lotka (1925) and Volterra (1926) formulated parametric differential equations that characterize the oscillating populations of predators and prey. A statistical model to account for measurement error and unexplained variation uses the deterministic solutions to the Lotka-Volterra equations as expected population sizes. Stan is used to encode the statistical model and perform full Bayesian inference to solve the inverse problem of inferring parameters from noisy data. The model is fit to Canadian lynx and snowshoe hare populations between 1900 and 1920, based on the number of pelts collected annually by the Hudson’s Bay Company. Posterior predictive checks for replicated data show the model fits this data well. Full Bayesian inference may be used to estimate future (or past) populations.
View (HTML)
 Author
 Bob Carpenter
 Keywords
 population dynamics, Lotka-Volterra equations, differential equations, posterior predictive checks
 Source Repository
 stan-dev/example-models/knitr/lotka-volterra (GitHub)
 R Package Dependencies
 rstan, ggplot2, gridExtra, knitr, reshape, tufte
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
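The Lotka-Volterra equations are commonly written as du/dt = (α − βv)u for the prey u and dv/dt = (−γ + δu)v for the predators v. As a minimal sketch of the deterministic part of the model, assuming these standard forms and a fixed-step fourth-order Runge-Kutta integrator (Stan has its own ODE solvers, which are not reproduced here), the Python below integrates the system and checks the quantity δu − γ log u + βv − α log v, which exact trajectories conserve. The parameter values in the usage note are illustrative only, not estimates from the case study.

```python
import math

def lotka_volterra(state, alpha, beta, gamma, delta):
    """Right-hand side of the Lotka-Volterra equations.

    state = (u, v): prey (e.g. hares) and predators (e.g. lynx).
    """
    u, v = state
    du_dt = (alpha - beta * v) * u    # prey growth minus losses to predation
    dv_dt = (-gamma + delta * u) * v  # predator deaths plus growth from predation
    return (du_dt, dv_dt)

def rk4_solve(state0, params, t_end, dt):
    """Integrate from t = 0 to t_end with fixed-step fourth-order Runge-Kutta.

    Returns the list of states at t = 0, dt, 2 dt, ..., t_end.
    """
    def shifted(state, slope, h):
        return tuple(s + h * k for s, k in zip(state, slope))

    state = tuple(state0)
    path = [state]
    for _ in range(int(round(t_end / dt))):
        k1 = lotka_volterra(state, *params)
        k2 = lotka_volterra(shifted(state, k1, 0.5 * dt), *params)
        k3 = lotka_volterra(shifted(state, k2, 0.5 * dt), *params)
        k4 = lotka_volterra(shifted(state, k3, dt), *params)
        state = tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
        path.append(state)
    return path

def lv_invariant(u, v, alpha, beta, gamma, delta):
    """Quantity conserved along exact Lotka-Volterra trajectories."""
    return delta * u - gamma * math.log(u) + beta * v - alpha * math.log(v)
```

For example, `rk4_solve((33.0, 6.0), (0.55, 0.028, 0.80, 0.024), 20.0, 0.01)` returns 2001 (u, v) states; in the statistical model, solutions like these serve as the expected population sizes around which the noisy pelt counts vary.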
Nearest neighbor Gaussian process (NNGP) models in Stan
Nearest neighbor Gaussian process (NNGP) models are a family of highly scalable Gaussian process-based models. In brief, NNGP extends Vecchia’s approximation (Vecchia 1988) to a process by using conditional independence given information from neighboring locations. This case study shows how to express and fit these models in Stan.
View (HTML)
 Author
 Lu Zhang
 Keywords
 Gaussian process, nearest neighbor Gaussian process, spatial models, latent process, regression
 Source Repository
 LuZhangstat/NNGP_STAN (GitHub)
 R Package Dependencies
 rstan
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
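The idea behind Vecchia’s approximation fits in one line. Assuming the n locations are put in some fixed order and N(i) denotes (at most) the m nearest neighbors of location i among the locations that precede it, the joint density of the latent process w is approximated by a product of low-dimensional conditionals. This is a sketch of the general construction, not the case study’s exact notation.

```latex
p(w_1, \dots, w_n) = \prod_{i=1}^{n} p(w_i \mid w_1, \dots, w_{i-1})
  \approx \prod_{i=1}^{n} p(w_i \mid w_{N(i)}),
\qquad
w_i \mid w_{N(i)} \sim \mathcal{N}\!\left(C_{i,N(i)}\, C_{N(i)}^{-1} w_{N(i)},\;
  C_{ii} - C_{i,N(i)}\, C_{N(i)}^{-1} C_{N(i),i}\right)
```

Here C is the covariance matrix of the zero-mean process w. Each factor requires only an m × m solve, so the cost grows roughly as O(n m^3) rather than the O(n^3) needed for the exact joint density.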
Stan Case Studies, Volume 4 (2017)
Extreme value analysis and user-defined probability functions in Stan
This notebook demonstrates how to implement user-defined probability functions in the Stan language. As an example I use the generalized Pareto distribution (GPD) to model geomagnetic storm data from the World Data Center for Geomagnetism.
View (HTML)
 Author
 Aki Vehtari
 Keywords
 extreme value analysis, generalized Pareto distribution, user-defined probability functions
 Source Repository
 avehtari/BDA_R_demos/demos_rstan/gpareto_functions (GitHub)
 R Package Dependencies
 rstan, bayesplot, loo, ggplot2, tidyr, dplyr, extraDistr, gridExtra
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
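A user-defined distribution in Stan is a set of functions such as a `_lpdf` and an `_rng`. The Python below is a minimal sketch of those two pieces for the GPD, assuming the parameterization with threshold ymin, shape k, and scale sigma (our naming and argument order, which may not match the case study’s): the log density, and a random number generator obtained by inverting the CDF F(y) = 1 − (1 + k(y − ymin)/sigma)^(−1/k).

```python
import math
import random

def gpareto_logpdf(y, ymin, k, sigma):
    """Log density of the generalized Pareto distribution.

    ymin: threshold (location), k: shape, sigma: scale > 0.
    Returns -inf outside the support.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - ymin) / sigma
    if z < 0:
        return -math.inf
    if abs(k) < 1e-12:          # k -> 0 limit is the exponential distribution
        return -math.log(sigma) - z
    t = 1.0 + k * z
    if t <= 0:                  # for k < 0 the support is bounded above
        return -math.inf
    return -math.log(sigma) - (1.0 + 1.0 / k) * math.log(t)

def gpareto_rng(ymin, k, sigma, rng):
    """One GPD draw by inverting the CDF; rng is a random.Random instance."""
    u = rng.random()            # uniform on [0, 1)
    if abs(k) < 1e-12:
        return ymin - sigma * math.log1p(-u)
    return ymin + sigma / k * ((1.0 - u) ** (-k) - 1.0)
```

For k < 1 the GPD mean is ymin + sigma / (1 − k), so comparing that value with the average of many `gpareto_rng` draws is a quick sanity check of the generator.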
Modelling Loss Curves in Insurance with RStan
Loss curves are a standard actuarial technique for helping insurance companies assess the amount of reserve capital they need to keep on hand to cover claims from a line of business. Claims made and reported for a given accounting period are tracked separately over time. This enables the use of historical patterns of claim development to predict expected total claims for newer policies.
We model the growth of the losses in each accounting period as an increasing function of time, and use the model to estimate the parameters which determine the shape and form of this growth. We also use the sampler to estimate the values of the “ultimate loss ratio”, i.e. the ratio of the total claims for an accounting period to the total premium received to write those policies. We treat each accounting period as a cohort.
View (HTML)
 Author
 Mick Cooney
 Keywords
 actuarial science, loss curves, insurance, ultimate loss ratio, hierarchical model
 Source Repository
 kaybenleroll/stancasestudy_losscurves (GitHub)
 R Package Dependencies
 rstan, bayesplot, tidyverse, scales, cowplot
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
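To make the growth-curve idea concrete, expected cumulative losses for a cohort t years into development can be written as premium × ULR × G(t), where the growth function G rises from 0 toward 1. The sketch below assumes two growth functions often used for this purpose, the Weibull and the log-logistic, each with a shape omega and a scale theta; treat the choice of curves and all names as assumptions rather than a transcription of the case study’s Stan code.

```python
import math

def weibull_growth(t, omega, theta):
    """Fraction of ultimate losses developed by time t (Weibull form)."""
    return 1.0 - math.exp(-((t / theta) ** omega))

def loglogistic_growth(t, omega, theta):
    """Fraction of ultimate losses developed by time t (log-logistic form)."""
    return t ** omega / (t ** omega + theta ** omega)

def expected_cumulative_loss(premium, ulr, t, growth, omega, theta):
    """Expected losses for a cohort t years after its accounting period began."""
    return premium * ulr * growth(t, omega, theta)
```

With a premium of 1000 and a ULR of 0.7, for example, expected losses approach 700 as t grows, which is exactly the sense in which the ULR is the long-run ratio of claims to premium.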
Splines in Stan
In this document, we discuss the implementation of splines in Stan. We start by providing a brief introduction to splines and then explain how they can be implemented in Stan. We also discuss a novel prior that alleviates some of the practical challenges of spline models.
View (HTML)
 Author
 Milad Kharratzadeh
 Keywords
 B-splines, piecewise regression, knots, priors
 Source Repository
 milkha/Splines_in_Stan (GitHub)
 R Package Dependencies
 rstan, splines
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
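B-spline basis functions are usually built with the Cox-de Boor recursion, which defines each basis function of a given order as a weighted combination of two basis functions of the next lower order. The Python below is a minimal, unoptimized sketch of that recursion, assuming the boundary knots are repeated so that the basis covers the whole knot range; the function names are ours, and the prior proposed in the case study is not shown.

```python
def bspline_basis(i, order, t, knots):
    """Value at t of the i-th B-spline basis function of the given order
    (order = degree + 1) over a non-decreasing knot list, via Cox-de Boor."""
    if order == 1:
        # Piecewise constant: 1 on [knots[i], knots[i+1]), 0 elsewhere.
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    denom = knots[i + order - 1] - knots[i]
    if denom > 0:  # skip terms with repeated knots (0/0 is taken as 0)
        left = (t - knots[i]) / denom * bspline_basis(i, order - 1, t, knots)
    right = 0.0
    denom = knots[i + order] - knots[i + 1]
    if denom > 0:
        right = (knots[i + order] - t) / denom * bspline_basis(i + 1, order - 1, t, knots)
    return left + right

def bspline_design_row(t, knots, degree):
    """All basis functions evaluated at t. The first and last knots are
    repeated `degree` extra times so the basis spans [knots[0], knots[-1])."""
    order = degree + 1
    padded = [knots[0]] * degree + list(knots) + [knots[-1]] * degree
    n_basis = len(padded) - order
    return [bspline_basis(i, order, t, padded) for i in range(n_basis)]
```

With five knots and cubic splines this gives seven basis functions that are non-negative and sum to one inside the knot range. Because the last interval is half-open, evaluating exactly at the right boundary knot returns all zeros, so practical code usually special-cases that point.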
Spatial Models in Stan: Intrinsic AutoRegressive Models for Areal Data
This case study shows how to efficiently encode and compute an Intrinsic Conditional AutoRegressive (ICAR) model in Stan. When data has a neighborhood structure, ICAR models provide spatial smoothing by averaging measurements of directly adjoining regions. The Besag, York, and Mollié (BYM) model is a Poisson GLM which includes both an ICAR component and an ordinary random-effects component for non-spatial heterogeneity. We compare two variants of the BYM model and fit two datasets taken from epidemiological studies over 56 and 700 regions, respectively.
View (HTML)
 Author
 Mitzi Morris
 Keywords
 spatial modeling, CAR, ICAR, INLA, OpenBUGS, hierarchical models
 Source Repository
 stan-dev/example-models (GitHub)
 R Package Dependencies
 rstan, rstanarm, ggplot2, broom, reshape2, dplyr, maptools, spdep, R-INLA, R2OpenBUGS
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
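One efficient way to encode the ICAR prior rests on the fact that its log density depends only on squared differences between neighbors, −½ Σ over neighboring pairs (φ_i − φ_j)², so it can be computed from an edge list without building the adjacency matrix. The sketch below assumes 0-based edge arrays `node1`/`node2` listing each neighboring pair once, and a soft sum-to-zero constraint sum(φ) ~ normal(0, 0.001 · N) to identify the otherwise improper density; both are assumptions of this sketch.

```python
def icar_log_density(phi, node1, node2, sum_to_zero_scale=0.001):
    """Unnormalized ICAR log density in pairwise-difference form.

    node1[e], node2[e]: 0-based indices of the two regions joined by edge e.
    Adds a soft sum-to-zero constraint, sum(phi) ~ normal(0, scale * N),
    with normalizing constants dropped.
    """
    pairwise = sum((phi[a] - phi[b]) ** 2 for a, b in zip(node1, node2))
    spread = sum_to_zero_scale * len(phi)
    soft_constraint = -0.5 * (sum(phi) / spread) ** 2
    return -0.5 * pairwise + soft_constraint
```

The pairwise sum equals the quadratic form φᵀ(D − W)φ with D the diagonal matrix of neighbor counts and W the adjacency matrix, but it costs time proportional to the number of edges rather than to N².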
The QR Decomposition for Regression Models
This case study reviews the QR decomposition, a technique for decorrelating covariates and, consequently, the resulting posterior distribution in regression models.
View (HTML)
 Author
 Michael Betancourt
 Keywords
 Markov chain Monte Carlo, regression, RStan
 Source Repository
 betanalpha/knitr_case_studies/qr_regression (GitHub)
 R Package Dependencies
 rstan, knitr
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
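In the QR reparameterization the n × k design matrix X is factored as X = QR; one regresses on the orthogonal columns of Q instead of on X, then maps the coefficients back. The sketch below uses modified Gram-Schmidt in plain Python (a swapped-in method for illustration; Stan provides built-in `qr_thin_Q` and `qr_thin_R` functions) together with the scaling by sqrt(n − 1) recommended in the Stan User’s Guide, so that X = Q* R* with Q* = Q · sqrt(n − 1) and R* = R / sqrt(n − 1). Coefficients θ estimated on Q* are mapped back to β = R*⁻¹ θ by back-substitution.

```python
import math

def thin_qr(X):
    """Thin QR of an n x k matrix (list of rows) by modified Gram-Schmidt.
    Returns Q (n x k, orthonormal columns) and upper-triangular R (k x k).
    Assumes X has full column rank (otherwise a zero pivot appears)."""
    n, k = len(X), len(X[0])
    cols = [[X[r][c] for r in range(n)] for c in range(k)]
    q_cols = []
    R = [[0.0] * k for _ in range(k)]
    for j in range(k):
        v = cols[j][:]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(qi * vi for qi, vi in zip(q, v))  # project the updated v
            v = [vi - R[i][j] * qi for vi, qi in zip(v, q)]
        R[j][j] = math.sqrt(sum(vi * vi for vi in v))
        q_cols.append([vi / R[j][j] for vi in v])
    Q = [[q_cols[c][r] for c in range(k)] for r in range(n)]
    return Q, R

def scaled_qr(X):
    """X = Q* R* with Q* = Q sqrt(n - 1) and R* = R / sqrt(n - 1)."""
    Q, R = thin_qr(X)
    s = math.sqrt(len(X) - 1)
    return [[v * s for v in row] for row in Q], [[v / s for v in row] for row in R]

def back_substitute(R, theta):
    """Solve R beta = theta for upper-triangular R, e.g. beta = R*^-1 theta."""
    k = len(theta)
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = theta[i] - sum(R[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = s / R[i][i]
    return beta
```

After the scaling, each column of Q* has sum of squares n − 1, so the columns look like unit-variance covariates, which is the decorrelated geometry the sampler benefits from.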
Robust RStan Workflow
This case study demonstrates the recommended RStan workflow for ensuring robust inferences with the default dynamic Hamiltonian Monte Carlo algorithm.
View (HTML)
 Author
 Michael Betancourt
 Keywords
 Markov chain Monte Carlo, Hamiltonian Monte Carlo, divergences, RStan
 Source Repository
 betanalpha/knitr_case_studies/rstan_workflow (GitHub)
 R Package Dependencies
 rstan, knitr
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
Robust PyStan Workflow
This case study demonstrates the recommended PyStan workflow for ensuring robust inferences with the default dynamic Hamiltonian Monte Carlo algorithm.
View (HTML)
 Author
 Michael Betancourt
 Keywords
 Markov chain Monte Carlo, Hamiltonian Monte Carlo, divergences, PyStan
 Source Repository
 betanalpha/jupyter_case_studies/pystan_workflow (GitHub)
 Python Package Dependencies
 pystan, pickle, numpy, md5
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
Typical Sets and the Curse of Dimensionality
This case study illustrates the so-called “curse of dimensionality” using simple examples based on simulation to show that all points are far away in high dimensions and that the mode is an atypical draw from a multivariate normal. The information-theoretic concept of typical set is illustrated with both discrete and continuous cases, which show that probability mass is a product of volume and density (or count and mass in the discrete case). It also illustrates Monte Carlo methods and relates distance to the log density of the normal distribution and the chi-squared distribution.
View R version (HTML)
 Author
 Bob Carpenter
 Keywords
 probability mass, typical sets, concentration of measure, Monte Carlo methods
 Source Repository (R)
 stan-dev/example-models/knitr/curse-dims (GitHub)
 R Package Dependencies
 ggplot2
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
View Python version (HTML)
 Author (Python translation)
 Aravind S
 Source Repository (Python)
 Aravindsds/StanCode/python notebooks/curse_dims (GitHub)
 Python Package Dependencies
 numpy, scipy, pandas, matplotlib, collections, sys
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
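A few lines of simulation reproduce the central observation: draws from a D-dimensional standard normal land at a distance of roughly sqrt(D) from the mode and essentially never near it, even though the density is highest at the mode. Because the squared distance follows a chi-squared distribution with D degrees of freedom, the distance concentrates around sqrt(D). The sketch below uses only Python’s standard library (the case study’s own code is in R, and the Python translation uses numpy/scipy); the seed and sizes are arbitrary choices.

```python
import math
import random

def distance_from_mode(dim, n_draws, seed=1234):
    """Euclidean distances from the origin (the mode) of draws from a
    standard normal in `dim` dimensions."""
    rng = random.Random(seed)
    return [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim)))
            for _ in range(n_draws)]

def std_normal_log_density(r, dim):
    """Log density of the D-dimensional standard normal at any point at
    distance r from the mode: -D/2 log(2 pi) - r^2 / 2."""
    return -0.5 * dim * math.log(2 * math.pi) - 0.5 * r * r
```

In 100 dimensions the draws sit near distance 10, where the log density is 50 lower than at the mode; the mode has far higher density, but the volume near it is tiny, so it carries almost no probability mass.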
Diagnosing Biased Inference with Divergences
This case study discusses the subtleties of accurate Markov chain Monte Carlo estimation and how divergences can be used to identify biased estimation in practice.
View (HTML)
 Author
 Michael Betancourt
 Keywords
 Markov chain Monte Carlo, Hamiltonian Monte Carlo, divergences, RStan
 Source Repository
 betanalpha/knitr_case_studies/divergences_and_bias (GitHub)
 R Package Dependencies
 rstan, knitr
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
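Divergences arise when the leapfrog integrator becomes unstable, typically because the step size is too large for a region of high curvature such as the neck of a hierarchical funnel, and the Hamiltonian, which should be nearly conserved, blows up. The toy sketch below runs leapfrog on a one-dimensional Gaussian with standard deviation sigma, for which the integrator is stable only when the step size is below 2 · sigma, and flags a divergence when the energy error exceeds a threshold. The threshold of 1000 is an assumption meant to mirror the default we understand Stan to use; this illustrates the mechanism and is not Stan’s sampler.

```python
def leapfrog_divergence(q0, p0, sigma, step_size, n_steps, threshold=1000.0):
    """Simulate leapfrog on H(q, p) = q^2 / (2 sigma^2) + p^2 / 2.

    Returns (diverged, max_abs_energy_error). The trajectory stops as soon
    as the energy error exceeds `threshold` (threshold value assumed)."""
    def grad_potential(q):
        return q / sigma ** 2

    def hamiltonian(q, p):
        return 0.5 * (q / sigma) ** 2 + 0.5 * p * p

    q, p = q0, p0
    h0 = hamiltonian(q, p)
    max_err = 0.0
    for _ in range(n_steps):
        p -= 0.5 * step_size * grad_potential(q)  # half step for momentum
        q += step_size * p                         # full step for position
        p -= 0.5 * step_size * grad_potential(q)  # half step for momentum
        err = abs(hamiltonian(q, p) - h0)
        max_err = max(max_err, err)
        if err > threshold:
            return True, max_err
    return False, max_err
```

With sigma = 1 and a step size of 0.1 the energy error stays tiny; shrinking sigma to 0.01 with the same step size, as happens when a sampler tuned for the wide part of a funnel enters its narrow neck, makes the trajectory diverge on the first step.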
Identifying Bayesian Mixture Models
This case study discusses the common pathologies of Bayesian mixture models as well as some strategies for identifying and overcoming them.
View (HTML)
 Author
 Michael Betancourt
 Keywords
 Markov chain Monte Carlo, Hamiltonian Monte Carlo, mixture models, multimodal models, RStan
 Source Repository
 betanalpha/knitr_case_studies/identifying_mixture_models (GitHub)
 R Package Dependencies
 rstan, knitr
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
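The label-switching non-identifiability is easy to see numerically: a mixture likelihood, computed by marginalizing the discrete component labels with log-sum-exp, is unchanged when the components are permuted, so the posterior has K! symmetric modes. The Python sketch below assumes a normal mixture for concreteness. In Stan, one remedy for this particular symmetry is declaring the component locations with the `ordered` type, which keeps only one of the K! copies.

```python
import math

def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mixture_loglik(ys, weights, mus, sigmas):
    """Log likelihood of a K-component normal mixture with the component
    labels marginalized out."""
    total = 0.0
    for y in ys:
        total += log_sum_exp([math.log(w) + normal_logpdf(y, m, s)
                              for w, m, s in zip(weights, mus, sigmas)])
    return total
```

Swapping the two components, together with their weights and scales, leaves the log likelihood exactly unchanged, which is why the data alone cannot tell the labels apart.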
How the Shape of a Weakly Informative Prior Affects Inferences
This case study reviews the basics of weakly informative priors and how the choice of a specific shape of such a prior affects the resulting posterior distribution.
View (HTML)
 Author
 Michael Betancourt
 Keywords
 Markov chain Monte Carlo, Hamiltonian Monte Carlo, priors, weakly informative priors, RStan
 Source Repository
 betanalpha/knitr_case_studies/weakly_informative_shapes (GitHub)
 R Package Dependencies
 rstan, knitr
 License
 Code: BSD (3 clause), Text: CC BY-NC 4.0
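Much of the difference between weakly informative priors of the same scale lies in their tails. As a quick numerical sketch (the shapes compared here, unit-scale normal, Laplace, and Cauchy, are our choice for illustration and may not match the case study’s), the probability each assigns to |θ| > c can be computed in closed form with Python’s standard library.

```python
import math

def tail_prob_normal(c):
    """P(|theta| > c) for theta ~ normal(0, 1)."""
    return math.erfc(c / math.sqrt(2.0))

def tail_prob_laplace(c):
    """P(|theta| > c) for theta ~ double_exponential(0, 1)."""
    return math.exp(-c)

def tail_prob_cauchy(c):
    """P(|theta| > c) for theta ~ cauchy(0, 1)."""
    return 1.0 - 2.0 / math.pi * math.atan(c)

if __name__ == "__main__":
    for c in (1.0, 3.0, 5.0, 10.0):
        print(c, tail_prob_normal(c), tail_prob_laplace(c), tail_prob_cauchy(c))
```

At c = 5 the Cauchy still puts about 12.6% of its mass beyond the cutoff, while the normal puts less than one in a million there; the posterior inherits this difference wherever the likelihood is weak.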
Stan Case Studies, Volume 3 (2016)
Exact Sparse CAR Models in Stan
This document details sparse exact conditional autoregressive (CAR) models in Stan as an extension of previous work on approximate sparse CAR models in Stan. Sparse representations seem to give order-of-magnitude efficiency gains, scaling better for large spatial data sets.
View (HTML)
 Author
 Max Joseph
 Keywords
 conditional autoregressive (CAR), intrinsic autoregressive (IAR), sparsity, spatial random effects, maps
 Source Repository
 mbjoseph/CARstan (GitHub)
 R Package Dependencies
 rstan, dplyr, ggmcmc, knitr, maptools, rgeos, spdep
 License
 BSD (3 clause), CC-BY
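Two tricks make exact CAR densities cheap. First, the quadratic form φᵀ(D − αW)φ can be accumulated over the edge list. Second, log|D − αW| equals Σ log d_i + Σ log(1 − α λ_i), where the λ_i are eigenvalues of D^(−1/2) W D^(−1/2); they depend only on the map, so they can be computed once before sampling. The Python sketch below assumes a binary, symmetric adjacency matrix given as an undirected edge list and takes the eigenvalues as precomputed input; the names are ours.

```python
import math

def sparse_car_quadratic_form(phi, edges, alpha):
    """phi' (D - alpha W) phi for a binary symmetric W given as an undirected
    edge list [(i, j), ...], without forming any n x n matrix."""
    degree = [0] * len(phi)
    cross = 0.0
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        cross += phi[i] * phi[j]
    # Each undirected edge appears twice in W, hence the factor of 2.
    return sum(d * p * p for d, p in zip(degree, phi)) - 2.0 * alpha * cross

def car_log_density(phi, edges, alpha, tau, lambdas):
    """Log density of phi ~ MVN(0, [tau (D - alpha W)]^-1), dropping constants.

    lambdas: eigenvalues of D^{-1/2} W D^{-1/2}, computed once up front.
    The sum(log d_i) part of the log determinant is constant and dropped."""
    n = len(phi)
    log_det_terms = sum(math.log1p(-alpha * lam) for lam in lambdas)
    return (0.5 * n * math.log(tau) + 0.5 * log_det_terms
            - 0.5 * tau * sparse_car_quadratic_form(phi, edges, alpha))
```

Both pieces cost time proportional to the number of regions plus the number of edges per evaluation, instead of the cubic cost of a dense determinant.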
A Primer on Bayesian Multilevel Modeling using PyStan
This case study replicates the analysis of home radon levels using hierarchical models of Lin, Gelman, Price, and Krantz (1999). It illustrates how to generalize linear regressions to hierarchical models with group-level predictors and how to compare predictive inferences and evaluate model fits. Along the way it shows how to get data into Stan using pandas, how to sample using PyStan, and how to visualize the results using Seaborn.
View (HTML)
 Author
 Chris Fonnesbeck
 Keywords
 hierarchical/multilevel modeling, linear regression, model comparison, predictive inference, radon
 Source Repository
 fonnesbeck/stan_workshop_2016 (GitHub)
 Python Package Dependencies
 pystan, numpy, pandas, matplotlib, seaborn
 License
 Apache 2.0 (code), CC-BY 3 (text)
The Impact of Reparameterization on Point Estimates
When changing variables, a Jacobian adjustment needs to be provided to account for the rate of change of the transform. Applying the adjustment ensures that inferences that are based on expectations over the posterior are invariant under reparameterizations. In contrast, the posterior mode changes as a result of the reparameterization. In this note, we use Stan to code a repeated binary trial model parameterized by chance of success, along with its reparameterization in terms of log odds in order to demonstrate the effect of the Jacobian adjustment on the Bayesian posterior and the posterior mode. We contrast the posterior mode to the maximum likelihood estimate, which, like the Bayesian estimates, is invariant under reparameterization. Along the way, we derive the logistic distribution by transforming a uniformly distributed variable.
View (HTML)
 Author
 Bob Carpenter
 Keywords
 MLE, Bayesian posterior, reparameterization, Jacobian, binomial
 Source Repository
 example-models/knitr/mle-params (GitHub)
 R Package Dependencies
 rstan
 License
 BSD (3 clause), CC-BY
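The effect can be reproduced with a little arithmetic. With y successes in n trials and a uniform prior on θ, the posterior mode in θ is y/n. Expressed in log odds α = logit(θ), the posterior density picks up the Jacobian |dθ/dα| = θ(1 − θ), so it is proportional to θ^(y+1) (1 − θ)^(n−y+1) and its mode moves to θ = (y + 1)/(n + 2); dropping the Jacobian, as maximum likelihood does, recovers y/n. The Python sketch below checks this with a brute-force grid search; the data (y = 3, n = 10) are made up for illustration.

```python
import math

def inv_logit(a):
    return 1.0 / (1.0 + math.exp(-a))

def log_post_theta(theta, y, n):
    """Log posterior of theta, up to a constant, with a uniform prior."""
    return y * math.log(theta) + (n - y) * math.log1p(-theta)

def log_post_alpha(alpha, y, n, jacobian=True):
    """The same model expressed in log odds alpha = logit(theta).
    log |d theta / d alpha| = log theta + log(1 - theta) is added if requested."""
    theta = inv_logit(alpha)
    lp = log_post_theta(theta, y, n)
    if jacobian:
        lp += math.log(theta) + math.log1p(-theta)
    return lp

def argmax_on_grid(f, lo, hi, n_grid=20001):
    """Brute-force maximizer of f over an evenly spaced grid on [lo, hi]."""
    best_x, best_f = lo, f(lo)
    for i in range(1, n_grid):
        x = lo + (hi - lo) * i / (n_grid - 1)
        fx = f(x)
        if fx > best_f:
            best_x, best_f = x, fx
    return best_x
```

Maximizing `log_post_alpha` with the Jacobian gives θ ≈ 4/12, while maximizing it without the Jacobian, or maximizing `log_post_theta` directly, gives θ ≈ 0.3: the posterior mode moves under reparameterization, the maximum likelihood estimate does not.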
Hierarchical Two-Parameter Logistic Item Response Model
This case study documents a Stan model for the two-parameter logistic model (2PL) with hierarchical priors. A brief simulation indicates that the Stan model successfully recovers the generating parameters. An example using a grade 12 science assessment is provided.
View (HTML)
 Author
 Daniel C. Furr
 Keywords
 education, item response theory, two-parameter logistic model, hierarchical priors
 Source Repository
 example-models/education/hierarchical_2pl (GitHub)
 R Package Dependencies
 rstan, ggplot2, mirt
 License
 BSD (3 clause), CC-BY
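At the core of the 2PL is the item response function: the probability of a correct response is the inverse logit of a_i (θ_j − b_i) for person ability θ_j, item discrimination a_i > 0, and item difficulty b_i. The Python below is a minimal sketch of that function and the resulting Bernoulli log likelihood, assuming this common parameterization (the case study’s may differ in detail); the hierarchical priors on the item parameters are not shown, and fixing every a_i = 1 gives the Rasch model.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def prob_correct_2pl(theta, a, b):
    """P(correct) for ability theta, discrimination a > 0, difficulty b."""
    return inv_logit(a * (theta - b))

def loglik_2pl(responses, thetas, a, b):
    """Bernoulli log likelihood; responses[j][i] is 0/1 for person j, item i."""
    total = 0.0
    for j, row in enumerate(responses):
        for i, y in enumerate(row):
            p = prob_correct_2pl(thetas[j], a[i], b[i])
            total += math.log(p) if y == 1 else math.log1p(-p)
    return total
```

A person whose ability equals an item’s difficulty has a 50% chance of answering it correctly, whatever the discrimination; larger a_i makes the probability change more sharply as ability moves away from b_i.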
Rating Scale and Generalized Rating Scale Models with Latent Regression
This case study documents a Stan model for the rating scale model (RSM) and the generalized rating scale model (GRSM) with latent regression. The latent regression portion of the models may be restricted to an intercept only, yielding a standard RSM or GRSM. A brief simulation indicates that the Stan models successfully recover the generating parameters. An example using a survey of public perceptions of science and technology is provided.
View (HTML)
 Author
 Daniel C. Furr
 Keywords
 education, item response theory, rating scale model, generalized rating scale model
 Source Repository
 example-models/education/rsm_and_grsm (GitHub)
 R Package Dependencies
 rstan, edstan, ggplot2, ltm
 License
 BSD (3 clause), CC-BY
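Category probabilities under the rating scale model can be computed by taking a cumulative sum of the terms θ − β_i − κ_m (with a leading zero for the lowest category) and applying a softmax. The Python sketch below assumes that formulation, with an optional item discrimination α_i multiplying each term to give the generalized version; the latent regression part of the models is not shown, and the names are ours.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rating_scale_probs(theta, beta, kappa, alpha=1.0):
    """P(y = 0), ..., P(y = M) under the (generalized) rating scale model.

    beta: item difficulty; kappa: the M step parameters shared by all items;
    alpha: item discrimination (alpha = 1 gives the standard RSM)."""
    unsummed = [0.0] + [alpha * (theta - beta - k) for k in kappa]
    cumulative, running = [], 0.0
    for u in unsummed:
        running += u
        cumulative.append(running)
    return softmax(cumulative)
```

Because the steps κ are shared across items, each item needs only its own difficulty (and, in the generalized model, a discrimination), which is what distinguishes these models from the partial credit family.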
Partial Credit and Generalized Partial Credit Models with Latent Regression
This case study documents a Stan model for the partial credit model (PCM) and the generalized partial credit model (GPCM) with latent regression. The latent regression portion of the models may be restricted to an intercept only, yielding a standard PCM or GPCM. A brief simulation indicates that the Stan models successfully recover the generating parameters. An example using the TIMSS 2011 mathematics assessment is provided.
View (HTML)
 Author
 Daniel C. Furr
 Keywords
 education, item response theory, partial credit model, generalized partial credit model
 Source Repository
 example-models/education/pcm_and_gpcm (GitHub)
 R Package Dependencies
 rstan, edstan, ggplot2, TAM
 License
 BSD (3 clause), CC-BY
Rasch and Two-Parameter Logistic Item Response Models with Latent Regression
This case study documents Stan models for the Rasch and two-parameter logistic models with latent regression. The latent regression portion of the models may be restricted to an intercept only, yielding standard versions of the models. Simulations indicate that the two models successfully recover generating parameters. An example using a grade 12 science assessment is provided.
View (HTML)
 Author
 Daniel C. Furr
 Keywords
 education, item response theory, Rasch model, two-parameter logistic model
 Source Repository
 example-models/education/rasch_and_2pl.html (GitHub)
 R Package Dependencies
 rstan, edstan, ggplot2, TAM
 License
 BSD (3 clause), CC-BY
Two-Parameter Logistic Item Response Model
This tutorial introduces the R package edstan for estimating two-parameter logistic item response models using Stan without knowing the Stan language. Subsequently, the tutorial explains how the model can be expressed in the Stan language and fit using the rstan package. Specification of prior distributions and assessment of convergence are discussed. Using the Stan language directly has the advantage that it becomes quite easy to extend the model, and this is demonstrated by adding a latent regression and differential item functioning to the model. Posterior predictive model checking is also demonstrated.
View (HTML)
 Authors
 Daniel C. Furr, Seung Yeon Lee, Joon-Ho Lee, and Sophia Rabe-Hesketh
 Keywords
 education, item response theory, two-parameter logistic model
 Source Repository
 example-models/education/tutorial_twopl (GitHub)
 R Package Dependencies
 rstan, reshape2, ggplot2, gridExtra, devtools, edstan
 License
 BSD (3 clause), CC-BY
Cognitive Diagnosis Model: DINA model with independent attributes
This case study documents a Stan model for the DINA model with independent attributes. A simulation indicates that the Stan model successfully recovers the generating parameters and predicts respondents’ attribute mastery. A Stan model with no structure imposed on the attributes is also discussed and applied to the simulated data. An example using a subset of the fraction subtraction data is provided.
View (HTML)
 Author
 Seung Yeon Lee
 Keywords
 education, cognitive diagnosis model, diagnostic classification model, attribute mastery, DINA
 Source Repository
 example-models/education/dina_independent (GitHub)
 R Package Dependencies
 rstan, ggplot2, CDM
 License
 BSD (3 clause), CC-BY
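In the DINA model a respondent answers item i correctly with probability 1 − s_i (one minus the slip parameter) if they have mastered every attribute the item requires, according to row i of the Q-matrix, and with the guessing probability g_i otherwise. The Python sketch below computes one respondent’s likelihood by summing over all 2^K attribute profiles, assuming independent attributes with mastery probabilities p_k; the function names are ours, not the case study’s.

```python
import itertools
import math

def dina_prob_correct(alpha, q_row, slip, guess):
    """alpha: 0/1 mastery profile; q_row: 0/1 attributes the item requires."""
    eta = all(a == 1 for a, q in zip(alpha, q_row) if q == 1)
    return 1.0 - slip if eta else guess

def dina_marginal_loglik(y, Q, slip, guess, mastery_prob):
    """Log likelihood of one respondent's 0/1 responses y, summing over all
    2^K profiles; attributes are independent with P(alpha_k = 1) = mastery_prob[k]."""
    K = len(mastery_prob)
    total = 0.0
    for alpha in itertools.product((0, 1), repeat=K):
        prior = math.prod(p if a else 1.0 - p for a, p in zip(alpha, mastery_prob))
        lik = 1.0
        for i, yi in enumerate(y):
            p = dina_prob_correct(alpha, Q[i], slip[i], guess[i])
            lik *= p if yi == 1 else 1.0 - p
        total += prior * lik
    return math.log(total)
```

Normalizing the individual `prior * lik` terms by their sum gives each profile’s posterior probability, from which a respondent’s probability of mastering each attribute follows.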
Pooling with Hierarchical Models for Repeated Binary Trials
This note illustrates the effects on posterior inference of pooling data (aka sharing strength) across items for repeated binary trial data. It provides Stan models and R code to fit and check predictive models for three situations: (a) complete pooling, which assumes each item is the same, (b) no pooling, which assumes the items are unrelated, and (c) partial pooling, where the similarity among the items is estimated. We consider two hierarchical models to estimate the partial pooling, one with a beta prior on chance of success and another with a normal prior on the log odds of success. The note explains with working examples how to (i) fit models in RStan and plot the results in R using ggplot2, (ii) estimate event probabilities, (iii) evaluate posterior predictive densities to evaluate model predictions on held-out data, (iv) rank items by chance of success, (v) perform multiple comparisons in several settings, (vi) replicate new data for posterior p-values, and (vii) perform graphical posterior predictive checks.
View (HTML)
 Author
 Bob Carpenter
 Keywords
 binary trials, pooling, hierarchical models, baseball, epidemiology, prediction, posterior predictive checks
 Source Repository
 example-models/knitr/pool-binary-trials (GitHub)
 R Package Dependencies
 rstan, ggplot2, rmarkdown
 License
 BSD (3 clause), CC-BY
RStanARM version
There is also a version of this case study in which all models are fit using the RStanARM interface. Many of the visualizations are also created using RStanARM’s plotting functions.
View RStanARM version (HTML)
 Authors
 Bob Carpenter, Jonah Gabry, Ben Goodrich
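The contrast among the three pooling regimes already shows up in closed form. With a Beta(a, b) prior on an item’s chance of success, the posterior mean after y_i successes in n_i trials is (y_i + a)/(n_i + a + b). The Python sketch below compares complete pooling (one shared chance of success), no pooling, and partial pooling, assuming uniform Beta(1, 1) priors for the first two and a fixed, hypothetical Beta(a, b) population distribution for the third; the case study instead estimates the population-level parameters from the data with MCMC.

```python
def pooling_estimates(y, n, a, b):
    """Posterior means of each item's chance of success under three regimes.

    y[i] successes out of n[i] trials; (a, b) is a hypothetical, fixed
    population Beta(a, b) used only for partial pooling."""
    shared = (sum(y) + 1.0) / (sum(n) + 2.0)   # complete pooling, Beta(1, 1) prior
    return {
        "complete": [shared] * len(y),
        "none": [(yi + 1.0) / (ni + 2.0) for yi, ni in zip(y, n)],      # Beta(1, 1) each
        "partial": [(yi + a) / (ni + a + b) for yi, ni in zip(y, n)],  # shared Beta(a, b)
    }
```

Each partial-pooling estimate is a weighted average of the item’s own rate y_i/n_i and the population mean a/(a + b), with weight n_i/(n_i + a + b) on the item’s data, so items with fewer trials are pulled further toward the population.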
Stan Case Studies, Volume 2 (2015)
Multiple Species-Site Occupancy Model
This case study replicates the analysis and output graphs of the noisy-measurement occupancy model of Dorazio et al. (2006) for multiple-species abundance of butterflies. Going beyond the paper, the super-community assumptions are tested to show they are invariant to sizing, and posterior predictive checks are provided.
View (HTML)
 Author
 Bob Carpenter
 Keywords
 ecology, occupancy, species abundance, super-community, posterior predictive check
 Source Repository
 example-models/knitr/dorazio-royle-occupancy (GitHub)
 License
 BSD (3 clause), CC-BY
 R Package Dependencies
 rstan, ggplot2, rmarkdown
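The noisy-measurement part of an occupancy model comes from marginalizing the latent occupancy indicator z: a species present at a site (probability psi) is detected on each of J visits with probability p, while an absent species is never detected, so zero detections can arise either way. The Python sketch below computes that marginal probability for one species at one site, plus the implied posterior probability of occupancy given no detections; the names are ours, and the species-level hierarchy and super-community augmentation of Dorazio et al. are not shown.

```python
import math

def detection_probability(detections, n_visits, psi, p):
    """P(detections | psi, p) with the latent occupancy state z marginalized."""
    # z = 1 (present): detections ~ Binomial(n_visits, p)
    occupied = (psi * math.comb(n_visits, detections)
                * p ** detections * (1.0 - p) ** (n_visits - detections))
    # z = 0 (absent) can only produce zero detections
    return occupied + (1.0 - psi) if detections == 0 else occupied

def prob_occupied_given_no_detections(n_visits, psi, p):
    """Posterior probability that z = 1 when the species was never detected."""
    occupied = psi * (1.0 - p) ** n_visits
    return occupied / (occupied + 1.0 - psi)
```

With psi = 0.6, p = 0.3, and three visits, a species that was never seen still has about a one-in-three chance of being present, which is why naive counts of observed species understate richness.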
Stan Case Studies, Volume 1 (2014)
Soil Carbon Modeling with RStan
This case study provides ordinary differential equation-based compartment models of soil carbon flux, with experimental data fitted with unknown initial compartment balance and noisy CO₂ measurements. Results from Sierra and Müller’s (2014) soilR package are replicated.
View (HTML)
 Author
 Bob Carpenter
 Keywords
 biogeochemistry, compartment ODE, soil carbon respiration, incubation experiment
 Source Repository
 soilmetamodel/stan/soilknit (GitHub)
 License
 BSD (3 clause), CC-BY
 R Package Dependencies
 rstan, ggplot2, rmarkdown
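To illustrate what an unknown initial compartment balance means, consider the simplest two-pool model, in which each pool decays independently at its own rate. With total initial carbon C0, a fraction gamma of it in pool 1, and decay rates k1 and k2, the carbon remaining at time t is gamma C0 e^(−k1 t) + (1 − gamma) C0 e^(−k2 t), and cumulative evolved CO₂ is C0 minus that. The Python sketch below encodes this closed form; it is a simplification for illustration (a parallel model with no transfer between pools), whereas the case study’s compartment models are solved as ODEs in Stan, with a measurement-error model for the noisy CO₂ data on top.

```python
import math

def evolved_co2_two_pool(t, total_c0, gamma, k1, k2):
    """Cumulative CO2 respired by time t in a two-pool parallel model.

    total_c0: initial carbon; gamma: fraction initially in pool 1 (treated
    as an unknown parameter); k1, k2: first-order decay rates."""
    c1_0 = gamma * total_c0
    c2_0 = (1.0 - gamma) * total_c0
    remaining = c1_0 * math.exp(-k1 * t) + c2_0 * math.exp(-k2 * t)
    return total_c0 - remaining
```

Early measurements are dominated by the fast pool and late ones by the slow pool, which is what lets gamma be estimated jointly with the two rates from a single CO₂ time series.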