Introduction

This vignette demonstrates how to write a Stan program that computes and stores the pointwise log-likelihood required for using the loo package. The other vignettes included with the package demonstrate additional functionality.

Some sections from this vignette are excerpted from our papers

  • Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. :10.1007/s11222-016-9696-4. Links: published | arXiv preprint.

  • Vehtari, A., Gelman, A., and Gabry, J. (2017). Pareto smoothed importance sampling. arXiv preprint arXiv:1507.04544.

which provide important background for understanding the methods implemented in the package.

Example: Well water in Bangladesh

This example comes from a survey of residents from a small area in Bangladesh that was affected by arsenic in drinking water. Respondents with elevated arsenic levels in their wells were asked if they were interested in getting water from a neighbor’s well, and a series of logistic regressions were fit to predict this binary response given various information about the households (Gelman and Hill, 2007). Here we fit a model for the well-switching response given two predictors: the arsenic level of the water in the resident’s home, and the distance of the house from the nearest safe well.

The sample size in this example is \(N=3020\), which is not huge but is large enough that it is important to have a computational method for LOO that is fast for each data point. On the plus side, with such a large dataset, the influence of any given observation is small, and so the computations should be stable.

Coding the Stan model

Here is the Stan code for fitting the logistic regression model, which we save in a file called logistic.stan:

data {
  int<lower=0> N;             // number of data points
  int<lower=0> P;             // number of predictors (including intercept)
  matrix[N,P] X;              // predictors (including 1s for intercept)
  int<lower=0,upper=1> y[N];  // binary outcome
}
parameters {
  vector[P] beta;
}
model {
  beta ~ normal(0, 1);
  y ~ bernoulli_logit(X * beta);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = bernoulli_logit_lpmf(y[n] | X[n] * beta);
  }
}

We have defined the log likelihood as a vector named log_lik in the generated quantities block so that the individual terms will be saved by Stan. After running Stan, log_lik can be extracted (using the extract_log_lik function provided in the loo package) as an \(S \times N\) matrix, where \(S\) is the number of simulations (posterior draws) and \(N\) is the number of data points.

Fitting the model with RStan

Next we fit the model in Stan using the rstan package:

library("rstan")

# Prepare data 
url <- "http://stat.columbia.edu/~gelman/arm/examples/arsenic/wells.dat"
wells <- read.table(url)
wells$dist100 <- with(wells, dist / 100)
X <- model.matrix(~ dist100 + arsenic, wells)
standata <- list(y = wells$switch, X = X, N = nrow(X), P = ncol(X))

# Fit model
fit_1 <- stan("logistic.stan", data = standata)
print(fit_1, pars = "beta")
         mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
beta[1]  0.00       0 0.08 -0.16 -0.05  0.00  0.05  0.15  1964    1
beta[2] -0.89       0 0.10 -1.09 -0.96 -0.89 -0.82 -0.68  2048    1
beta[3]  0.46       0 0.04  0.38  0.43  0.46  0.49  0.54  2198    1

Computing approximate leave-one-out cross-validation usig PSIS-LOO

We can then use the loo package to compute the efficient PSIS-LOO approximation to exact LOO-CV:

library("loo")

# Extract pointwise log-likelihood and compute LOO
log_lik_1 <- extract_log_lik(fit_1, merge_chains = FALSE)

# as of loo v2.0.0 we can optionally provide relative effective sample sizes
# when calling loo, which allows for better estimates of the PSIS effective
# sample sizes and Monte Carlo error
r_eff <- relative_eff(exp(log_lik_1)) 

loo_1 <- loo(log_lik_1, r_eff = r_eff, cores = 2)
print(loo_1)
Computed from 4000 by 3020 log-likelihood matrix

         Estimate   SE
elpd_loo  -1968.5 15.6
p_loo         3.2  0.1
looic      3937.0 31.2
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

The printed output from the loo function shows the estimates \(\widehat{\mbox{elpd}}_{\rm loo}\) (expected log predictive density), \(\widehat{p}_{\rm loo}\) (effective number of parameters), and \({\rm looic} =-2\, \widehat{\mbox{elpd}}_{\rm loo}\) (the LOO information criterion).

The line at the bottom of the printed output provides information about the reliability of the LOO approximation (the interpretation of the \(k\) parameter is explained in the PSIS-LOO section in help("loo-package") and in greater detail in Vehtari, Gelman, and Gabry (2017)). In this case the message tells us that all of the estimates for \(k\) are fine.

Comparing models

To compare this model to an alternative model for the same data we can use the compare function in the loo package. First we’ll fit a second model to the well-switching data, using log(arsenic) instead of arsenic as a predictor:

standata$X[, "arsenic"] <- log(standata$X[, "arsenic"])
fit_2 <- stan(fit = fit_1, data = standata) 

log_lik_2 <- extract_log_lik(fit_2, merge_chains = FALSE)
r_eff_2 <- relative_eff(exp(log_lik_2))
loo_2 <- loo(log_lik_2, r_eff = r_eff_2, cores = 2)
print(loo_2)
Computed from 4000 by 3020 log-likelihood matrix

         Estimate   SE
elpd_loo  -1952.3 16.2
p_loo         3.1  0.1
looic      3904.6 32.4
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

We can now compare the models on LOO using the compare function:

# Compare
comp <- compare(loo_1, loo_2)

This new object, comp, contains the estimated difference of expected leave-one-out prediction errors between the two models, along with the standard error:

print(comp)
elpd_diff        se 
     16.2       4.4 

The positive difference in elpd (and its scale relative to the standard error) indicates a preference for the second model.

References

Gelman, A., and Hill, J. (2007). Data Analysis Using Regression and Multilevel Hierarchical Models. Cambridge University Press.

Stan Development Team (2017). The Stan C++ Library, Version 2.17.0. http://mc-stan.org

Stan Development Team (2018) RStan: the R interface to Stan, Version 2.17.3. http://mc-stan.org

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. :10.1007/s11222-016-9696-4. online, arXiv preprint arXiv:1507.04544.

Vehtari, A., Gelman, A., and Gabry, J. (2017). Pareto smoothed importance sampling. arXiv preprint arXiv:1507.02646.