28.6 Mixed predictive replication for hierarchical models
Andrew Gelman, Meng, and Stern (1996) discuss the case of mixed replication for hierarchical models in which the hyperparameters remain fixed, but varying effects are replicated. This is neither a purely prior nor purely posterior predictive check, but falls somewhere in between.
For example, consider a simple varying intercept logistic regression, with intercepts \(\alpha_k\) for \(k \in 1:K\). Each data item \(y_n \in \{ 0, 1 \}\) is assumed to correspond to group \(kk_n \in 1:K.\) The sampling distribution is thus \[ y_n \sim \textrm{bernoulli}(\textrm{logit}^{-1}(\alpha_{kk[n]})). \] The varying intercepts have a hierarchical normal prior, \[ \alpha_k \sim \textrm{normal}(\mu, \sigma). \] The hyperparameters are themselves given weakly informative priors, \[\begin{eqnarray*} \mu & \sim & \textrm{normal}(0, 2) \\[4pt] \sigma & \sim & \textrm{lognormal}(0, 1). \end{eqnarray*}\]
Like in a posterior predictive check, the hyperparameters \(\mu\) and \(\sigma\) are drawn from the posterior, \[ \mu^{(m)}, \sigma^{(m)} \sim p(\mu, \sigma \mid y) \] Like in a prior predictive check, replicated values of \(\alpha\) are drawn from the hyperparameters, \[ \alpha^{\textrm{rep}(m)}_k \sim \textrm{normal}(\alpha_k \mid \mu^{(m)}, \sigma^{(m)}). \] The data items are then each replicated using the replicated intercepts, \[ y^{\textrm{rep}(m)}_n \sim \textrm{bernoulli} (\textrm{logit}^{-1}(\alpha^{\textrm{rep}(m)}_{kk[n]})). \] Thus the \(y^{\textrm{rep}(m)}\) can be seen as a kind of posterior predictive replication of observations from new groups that were not among the original \(K\) groups.
In Stan, mixed predictive replications \(y^{\textrm{rep}(m)}\) can be programmed directly.
data {
int<lower=0> K;
int<lower=0> N;
array[N] int<lower=1, upper=K> kk;
array[N] int<lower=0, upper=1> y;
}
parameters {
real mu;
real<lower=0> sigma;
vector<offset=mu, multiplier=sigma>[K] alpha;
}
model {
mu ~ normal(0, 2); // hyperprior
sigma ~ lognormal(0, 1);
alpha ~ normal(mu, sigma); // hierarchical prior
y ~ bernoulli_logit(alpha[kk]); // sampling distribution
}
generated quantities {
// alpha replicated; mu and sigma not replicated
array[K] real alpha_rep
= normal_rng(rep_vector(mu, K), sigma);
array[N] int<lower=0, upper=1> y_rep
= bernoulli_logit_rng(alpha_rep[kk]);
}