Case Studies

open-source methods and models

The case studies on this page are intended to reflect best practices in Bayesian methodology and Stan programming.

Contributing Case Studies

To contribute a case study, please contact us through the Stan Forums. We require

  • a documented, reproducible example with narrative documentation (e.g., knitr or Jupyter with software/compiler versions noted and seeds fixed) and

  • an open-source code license (preferably BSD or GPL for code, Creative Commons for text); authors retain all copyright.

Stan Case Studies, Volume 4 (2017)



Spatial Models in Stan: Intrinsic Auto-Regressive Models for Areal Data

This case study shows how to efficiently encode and compute an Intrinsic Conditional Auto-Regressive (ICAR) model in Stan. When data has a neighborhood structure, ICAR models provide spatial smoothing by averaging measurements of directly adjoining regions. The Besag-York-Mollié (BYM) model is a Poisson GLM that includes both an ICAR component and an ordinary random-effects component for non-spatial heterogeneity. We compare two variants of the BYM model and fit two datasets taken from epidemiological studies over 56 and 700 regions, respectively.

View (HTML)
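
As a rough illustration of the smoothing idea described above: up to an additive constant, the ICAR log density can be written as minus one half of the sum of squared differences between neighboring regions. The sketch below is plain Python with NumPy rather than Stan, and the four-region map, the edge arrays `node1` and `node2`, and the function name are hypothetical choices made for this example.

```python
import numpy as np

def icar_log_density(phi, node1, node2):
    """Unnormalized ICAR log density: -0.5 * sum over edges of (phi_i - phi_j)^2."""
    phi = np.asarray(phi, dtype=float)
    diffs = phi[np.asarray(node1)] - phi[np.asarray(node2)]
    return -0.5 * np.dot(diffs, diffs)

# Hypothetical map: four regions in a row, with edges 0-1, 1-2, 2-3.
node1 = [0, 1, 2]
node2 = [1, 2, 3]

# Spatial effects that vary smoothly across neighbors...
smooth = icar_log_density([0.1, 0.2, 0.3, 0.4], node1, node2)
# ...versus effects that flip sign between every pair of neighbors.
rough = icar_log_density([1.0, -1.0, 1.0, -1.0], node1, node2)

print("smooth:", smooth)
print("rough: ", rough)
```

The smooth configuration receives the higher log density, which is the sense in which the prior pulls each region toward its neighbors. Storing the adjacency as an edge list rather than a dense matrix keeps the cost proportional to the number of neighbor pairs.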

The QR Decomposition for Regression Models

This case study reviews the QR decomposition, a technique for decorrelating covariates and, consequently, the resulting posterior distribution in regression models.

View (HTML)

Author
Michael Betancourt
Keywords
Markov chain Monte Carlo, regression, RStan
Source Repository
betanalpha/knitr_case_studies/qr_regression (GitHub)
R Package Dependencies
rstan, knitr.
License
Code: BSD (3 clause), Text: CC BY-NC 4.0
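
The decorrelation described above can be sketched outside of Stan with NumPy. This is a minimal illustration on simulated data, not the case study's own code: the covariates, coefficients, and the `sqrt(N - 1)` rescaling are assumptions made for the example. The thin QR factorization replaces the correlated design matrix `X` with orthogonal columns `Q_star`; coefficients fit on `Q_star` are mapped back to the original scale by solving with `R_star`.

```python
import numpy as np

rng = np.random.default_rng(1234)
N, K = 500, 3

# Hypothetical, strongly correlated covariates: a shared signal plus small noise.
shared = rng.normal(size=(N, 1))
X = shared + 0.1 * rng.normal(size=(N, K))
X = X - X.mean(axis=0)  # center each column

# Thin QR decomposition, rescaled so the columns of Q_star have unit variance.
Q, R = np.linalg.qr(X)
Q_star = Q * np.sqrt(N - 1)
R_star = R / np.sqrt(N - 1)

# Simulated outcome with made-up coefficients.
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(scale=0.5, size=N)

# Fit on the decorrelated covariates, then map back to the original coefficients.
theta_hat, *_ = np.linalg.lstsq(Q_star, y, rcond=None)
beta_hat = np.linalg.solve(R_star, theta_hat)

corr_X = np.corrcoef(X, rowvar=False)
corr_Q = np.corrcoef(Q_star, rowvar=False)
print("correlation of X columns 0 and 1:     ", corr_X[0, 1])
print("correlation of Q_star columns 0 and 1:", corr_Q[0, 1])
print("recovered coefficients:", beta_hat)
```

Because `X = Q_star @ R_star`, the regression on `Q_star` describes the same model; only the geometry of the coefficients changes.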

Robust RStan Workflow

This case study demonstrates the recommended RStan workflow for ensuring robust inferences with the default dynamic Hamiltonian Monte Carlo algorithm.

View (HTML)

Author
Michael Betancourt
Keywords
Markov chain Monte Carlo, Hamiltonian Monte Carlo, divergences, RStan
Source Repository
betanalpha/knitr_case_studies/rstan_workflow (GitHub)
R Package Dependencies
rstan, knitr.
License
Code: BSD (3 clause), Text: CC BY-NC 4.0

Robust PyStan Workflow

This case study demonstrates the recommended PyStan workflow for ensuring robust inferences with the default dynamic Hamiltonian Monte Carlo algorithm.

View (HTML)

Author
Michael Betancourt
Keywords
Markov chain Monte Carlo, Hamiltonian Monte Carlo, divergences, PyStan
Source Repository
betanalpha/jupyter_case_studies/pystan_workflow (GitHub)
Python Package Dependencies
pystan, pickle, numpy, md5.
License
Code: BSD (3 clause), Text: CC BY-NC 4.0

Typical Sets and the Curse of Dimensionality

This case study illustrates the so-called “curse of dimensionality” using simple examples based on simulation to show that all points are far from one another in high dimensions and that the mode is an atypical draw from a multivariate normal. The information-theoretic concept of typical set is illustrated with both discrete and continuous cases, which show that probability mass is a product of volume and density (or count and mass in the discrete case). It also illustrates Monte Carlo methods and relates distance to the log density of the normal distribution and the chi-squared distribution.

View (HTML)

Author
Bob Carpenter
Keywords
probability mass, typical sets, concentration of measure, Monte Carlo methods
Source Repository
stan-dev/example-models/knitr/curse-dims (GitHub)
R Package Dependencies
ggplot2
License
Code: BSD (3 clause), Text: CC BY-NC 4.0
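
The claim that the mode is an atypical draw can be checked with a short simulation. The sketch below is an illustration in Python with NumPy, not the case study's R code; the dimensions, number of draws, and seed are arbitrary choices. It draws from a standard multivariate normal and records how far each draw lies from the mode at the origin.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 2000
summary = {}

for dim in [1, 10, 100, 1000]:
    # Each row is one draw from a standard normal in `dim` dimensions.
    draws = rng.standard_normal(size=(n_draws, dim))
    # Euclidean distance of each draw from the mode (the origin).
    dist = np.linalg.norm(draws, axis=1)
    summary[dim] = (dist.min(), dist.mean(), np.sqrt(dim))
    print(f"dim={dim:5d}  min={dist.min():8.3f}  "
          f"mean={dist.mean():8.3f}  sqrt(dim)={np.sqrt(dim):8.3f}")
```

The squared distance follows a chi-squared distribution with `dim` degrees of freedom, so the distances concentrate near `sqrt(dim)`. In one dimension some draws land very close to the mode, while in a thousand dimensions even the closest draw is far from it.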

Diagnosing Biased Inference with Divergences

This case study discusses the subtleties of accurate Markov chain Monte Carlo estimation and how divergences can be used to identify biased estimation in practice.

View (HTML)

Author
Michael Betancourt
Keywords
Markov chain Monte Carlo, Hamiltonian Monte Carlo, divergences, RStan
Source Repository
betanalpha/knitr_case_studies/divergences_and_bias (GitHub)
R Package Dependencies
rstan, knitr.
License
Code: BSD (3 clause), Text: CC BY-NC 4.0

Identifying Bayesian Mixture Models

This case study discusses the common pathologies of Bayesian mixture models as well as some strategies for identifying and overcoming them.

View (HTML)

Author
Michael Betancourt
Keywords
Markov chain Monte Carlo, Hamiltonian Monte Carlo, mixture models, multimodal models, RStan
Source Repository
betanalpha/knitr_case_studies/identifying_mixture_models (GitHub)
R Package Dependencies
rstan, knitr.
License
Code: BSD (3 clause), Text: CC BY-NC 4.0

How the Shape of a Weakly Informative Prior Affects Inferences

This case study reviews the basics of weakly-informative priors and how the choice of a specific shape of such a prior affects the resulting posterior distribution.

View (HTML)

Author
Michael Betancourt
Keywords
Markov chain Monte Carlo, Hamiltonian Monte Carlo, priors, weakly-informative priors, RStan
Source Repository
betanalpha/knitr_case_studies/weakly_informative_shapes (GitHub)
R Package Dependencies
rstan, knitr.
License
Code: BSD (3 clause), Text: CC BY-NC 4.0

Stan Case Studies, Volume 3 (2016)



Exact Sparse CAR Models in Stan

This document details sparse exact conditional autoregressive (CAR) models in Stan as an extension of previous work on approximate sparse CAR models in Stan. Sparse representations appear to give an order-of-magnitude efficiency gain and to scale better for large spatial data sets.

View (HTML)

Author
Max Joseph
Keywords
conditional autoregressive (CAR), intrinsic autoregressive (IAR), sparsity, spatial random effects, maps
Source Repository
mbjoseph/CARstan (GitHub)
R Package Dependencies
rstan, dplyr, ggmcmc, knitr, maptools, rgeos, spdep.
License
BSD (3 clause), CC-BY

A Primer on Bayesian Multilevel Modeling using PyStan

This case study replicates the analysis of home radon levels using hierarchical models of Lin, Gelman, Price, and Kurtz (1999). It illustrates how to generalize linear regressions to hierarchical models with group-level predictors and how to compare predictive inferences and evaluate model fits. Along the way it shows how to get data into Stan using pandas, how to sample using PyStan, and how to visualize the results using Seaborn.

View (HTML)

Author
Chris Fonnesbeck
Keywords
hierarchical/multilevel modeling, linear regression, model comparison, predictive inference, radon
Source Repository
fonnesbeck/stan_workshop_2016 (GitHub)
Python Package Dependencies
pystan, numpy, pandas, matplotlib, seaborn
License
Apache 2.0 (code), CC-BY 3 (text)

The Impact of Reparameterization on Point Estimates

When changing variables, a Jacobian adjustment needs to be provided to account for the rate of change of the transform. Applying the adjustment ensures that inferences that are based on expectations over the posterior are invariant under reparameterizations. In contrast, the posterior mode changes as a result of the reparameterization. In this note, we use Stan to code a repeated binary trial model parameterized by chance of success, along with its reparameterization in terms of log odds in order to demonstrate the effect of the Jacobian adjustment on the Bayesian posterior and the posterior mode. We contrast the posterior mode to the maximum likelihood estimate, which, like the Bayesian estimates, is invariant under reparameterization. Along the way, we derive the logistic distribution by transforming a uniformly distributed variable.

View (HTML)

Author
Bob Carpenter
Keywords
MLE, Bayesian posterior, reparameterization, Jacobian, binomial
Source Repository
example-models/knitr/mle-params (GitHub)
R Package Dependencies
rstan
License
BSD (3 clause), CC-BY
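
The change of variables mentioned above can be checked numerically. The following sketch is an illustration in Python with NumPy, not the case study's Stan or R code; the sample size, seed, and helper names are assumptions for the example. A chance of success drawn uniformly on (0, 1) is mapped to log odds, and the Jacobian of the inverse transform gives the implied density, which is the standard logistic density.

```python
import numpy as np

def inv_logit(alpha):
    """Map log odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-alpha))

def logit(theta):
    """Map a probability in (0, 1) to log odds."""
    return np.log(theta) - np.log1p(-theta)

def log_odds_density(alpha):
    """Density of alpha = logit(theta) when theta ~ Uniform(0, 1).

    The uniform density is 1, so only the Jacobian term remains:
    |d theta / d alpha| = inv_logit(alpha) * (1 - inv_logit(alpha)).
    """
    p = inv_logit(alpha)
    return p * (1.0 - p)

rng = np.random.default_rng(42)
theta = rng.uniform(size=100_000)
alpha = logit(theta)

# Compare the empirical CDF of the transformed draws with the logistic CDF.
grid = np.linspace(-4.0, 4.0, 9)
empirical_cdf = np.array([(alpha <= a).mean() for a in grid])
logistic_cdf = inv_logit(grid)
max_gap = np.max(np.abs(empirical_cdf - logistic_cdf))
print("largest CDF discrepancy on the grid:", max_gap)

# The Jacobian-adjusted density should match the derivative of the logistic CDF.
h = 1e-5
numeric_derivative = (inv_logit(grid + h) - inv_logit(grid - h)) / (2 * h)
print("density matches CDF derivative:",
      np.allclose(numeric_derivative, log_odds_density(grid)))
```

Dropping the Jacobian term would leave a flat function of the log odds, which is not a proper density on the real line.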

Hierarchical Two-Parameter Logistic Item Response Model

This case study documents a Stan model for the two-parameter logistic model (2PL) with hierarchical priors. A brief simulation indicates that the Stan model successfully recovers the generating parameters. An example using a grade 12 science assessment is provided.

View (HTML)

Author
Daniel C. Furr
Keywords
education, item response theory, two-parameter logistic model, hierarchical priors
Source Repository
example-models/education/hierarchical_2pl (GitHub)
R Package Dependencies
rstan, ggplot2, mirt
License
BSD (3 clause), CC-BY

Rating Scale and Generalized Rating Scale Models with Latent Regression

This case study documents a Stan model for the rating scale model (RSM) and the generalized rating scale model (GRSM) with latent regression. The latent regression portion of the models may be restricted to an intercept only, yielding a standard RSM or GRSM. A brief simulation indicates that the Stan models successfully recover the generating parameters. An example using a survey of public perceptions of science and technology is provided.

View (HTML)

Author
Daniel C. Furr
Keywords
education, item response theory, rating scale model, generalized rating scale model
Source Repository
example-models/education/rsm_and_grsm (GitHub)
R Package Dependencies
rstan, edstan, ggplot2, ltm
License
BSD (3 clause), CC-BY

Partial Credit and Generalized Partial Credit Models with Latent Regression

This case study documents a Stan model for the partial credit model (PCM) and the generalized partial credit model (GPCM) with latent regression. The latent regression portion of the models may be restricted to an intercept only, yielding a standard PCM or GPCM. A brief simulation indicates that the Stan models successfully recover the generating parameters. An example using the TIMSS 2011 mathematics assessment is provided.

View (HTML)

Author
Daniel C. Furr
Keywords
education, item response theory, partial credit model, generalized partial credit model
Source Repository
example-models/education/pcm_and_gpcm (GitHub)
R Package Dependencies
rstan, edstan, ggplot2, TAM
License
BSD (3 clause), CC-BY

Rasch and Two-Parameter Logistic Item Response Models with Latent Regression

This case study documents Stan models for the Rasch and two-parameter logistic models with latent regression. The latent regression portion of the models may be restricted to an intercept only, yielding standard versions of the models. Simulations indicate that the two models successfully recover generating parameters. An example using a grade 12 science assessment is provided.

View (HTML)

Author
Daniel C. Furr
Keywords
education, item response theory, rasch model, two-parameter logistic model
Source Repository
example-models/education/rasch_and_2pl.html (GitHub)
R Package Dependencies
rstan, edstan, ggplot2, TAM
License
BSD (3 clause), CC-BY

Two-Parameter Logistic Item Response Model

This tutorial introduces the R package edstan for estimating two-parameter logistic item response models using Stan without knowing the Stan language. Subsequently, the tutorial explains how the model can be expressed in the Stan language and fit using the rstan package. Specification of prior distributions and assessment of convergence are discussed. Using the Stan language directly has the advantage that it becomes quite easy to extend the model, and this is demonstrated by adding a latent regression and differential item functioning to the model. Posterior predictive model checking is also demonstrated.

View (HTML)

Authors
Daniel C. Furr, Seung Yeon Lee, Joon-Ho Lee, and Sophia Rabe-Hesketh
Keywords
education, item response theory, two-parameter logistic model
Source Repository
example-models/education/tutorial_twopl (GitHub)
R Package Dependencies
rstan, reshape2, ggplot2, gridExtra, devtools, edstan
License
BSD (3 clause), CC-BY

Cognitive Diagnosis Model: DINA model with independent attributes

This case study documents a Stan model for the DINA model with independent attributes. A simulation indicates that the Stan model successfully recovers the generating parameters and predicts respondents’ attribute mastery. A Stan model with no structure of the attributes is also discussed and applied to the simulated data. An example using a subset of the fraction subtraction data is provided.

View (HTML)

Author
Seung Yeon Lee
Keywords
education, cognitive diagnosis model, diagnostic classification model, attribute mastery, DINA
Source Repository
example-models/education/dina_independent (GitHub)
R Package Dependencies
rstan, ggplot2, CDM
License
BSD (3 clause), CC-BY

Pooling with Hierarchical Models for Repeated Binary Trials

This note illustrates the effects on posterior inference of pooling data (also known as sharing strength) across items for repeated binary trial data. It provides Stan models and R code to fit and check predictive models for three situations: (a) complete pooling, which assumes each item is the same, (b) no pooling, which assumes the items are unrelated, and (c) partial pooling, where the similarity among the items is estimated. We consider two hierarchical models to estimate the partial pooling, one with a beta prior on chance of success and another with a normal prior on the log odds of success. The note explains with working examples how to (i) fit models in RStan and plot the results in R using ggplot2, (ii) estimate event probabilities, (iii) compute posterior predictive densities to evaluate model predictions on held-out data, (iv) rank items by chance of success, (v) perform multiple comparisons in several settings, (vi) replicate new data for posterior p-values, and (vii) perform graphical posterior predictive checks.

View (HTML)

Author
Bob Carpenter
Keywords
binary trials, pooling, hierarchical models, baseball, epidemiology, prediction, posterior predictive checks
Source Repository
example-models/knitr/pool-binary-trials (GitHub)
R Package Dependencies
rstan, ggplot2, rmarkdown
License
BSD (3 clause), CC-BY
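
The three pooling situations described above can be contrasted with a few lines of arithmetic. The sketch below is a simplified illustration in Python, not the case study's Stan models: the counts are made up, and the partial-pooling step uses a beta prior with fixed, hand-picked hyperparameters `a` and `b`, whereas the hierarchical models in the case study estimate the amount of pooling from the data.

```python
import numpy as np

# Hypothetical data: successes y out of K trials for four items.
y = np.array([2, 5, 9, 14])
K = np.array([20, 20, 20, 20])

# (a) Complete pooling: one shared chance of success for every item.
complete = y.sum() / K.sum()

# (b) No pooling: each item is estimated from its own data alone.
no_pool = y / K

# (c) Partial pooling, simplified: posterior mean under a Beta(a, b) prior
#     with fixed hyperparameters chosen so the prior mean equals the
#     complete-pooling estimate (an assumption for this sketch).
a, b = 3.0, 5.0
partial = (y + a) / (K + a + b)

print("complete pooling:", complete)
print("no pooling:      ", no_pool)
print("partial pooling: ", partial)
```

Each partially pooled estimate falls between the item's own rate and the shared rate, which is the shrinkage effect that sharing strength refers to.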

RStanARM version

There is also a version of this case study in which all models are fit using the RStanARM interface. Many of the visualizations are also created using RStanARM’s plotting functions.

View RStanARM version (HTML)

Authors
Bob Carpenter, Jonah Gabry, Ben Goodrich

Stan Case Studies, Volume 2 (2015)



Multiple Species-Site Occupancy Model

This case study replicates the analysis and output graphs of Dorazio et al.’s (2006) noisy-measurement occupancy model for multiple species abundance of butterflies. Going beyond the paper, the supercommunity assumptions are tested to show they are invariant to sizing, and posterior predictive checks are provided.

View (HTML)

Author
Bob Carpenter
Keywords
ecology, occupancy, species abundance, supercommunity, posterior predictive check
Source Repository
example-models/knitr/dorazio-royle-occupancy (GitHub)
License
BSD (3 clause), CC-BY
R Package Dependencies
rstan, ggplot2, rmarkdown

Stan Case Studies, Volume 1 (2014)



Soil Carbon Modeling with RStan

This case study provides ordinary differential equation-based compartment models of soil carbon flux, with experimental data fitted with unknown initial compartment balance and noisy CO2 measurements. Results from Sierra and Müller’s (2014) SoilR package are replicated.

View (HTML)

Author
Bob Carpenter
Keywords
biogeochemistry, compartment ODE, soil carbon respiration, incubation experiment
Source Repository
soil-metamodel/stan/soil-knit (GitHub)
License
BSD (3 clause), CC-BY
R Package Dependencies
rstan, ggplot2, rmarkdown