30.6 Coding MRP in Stan
Multilevel regression and poststratification can be coded directly in
Stan. To code the non-centered parameterization for each coefficient,
which will be required for sampling efficiency, the multiplier
transform is used on each of the parameters. The combination of
vector<multiplier=s>[K] a;
// ...
a ~ normal(0, s);
implements a non-centered parameterization for a
; a centered
parameterization would drop the multiplier
specification. The prior
scale s
is being centered here. The prior location is fixed to
zero in multilevel regressions because there is an overall intercept;
introducing a location parameters in the prior would introduce
non-identifiability with the overall intercept. The centered
parameterization drops the multiplier
.
Here is the full Stan model, which performs poststratification in the
generated quantities using population sizes made available through
data variable P
.
data {
int<lower=0> N;
array[N] int<lower=1, upper=4> age;
array[N] int<lower=1, upper=5> income;
array[N] int<lower=1, upper=50> state;
array[N] int<lower=0> y;
array[4, 5, 50] int<lower=0> P;
}
parameters {
real alpha;
real<lower=0> sigma_beta;
vector<multiplier=sigma_beta>[4] beta;
real<lower=0> sigma_gamma;
vector<multiplier=sigma_gamma>[5] gamma;
real<lower=0> sigma_delta;
vector<multiplier=sigma_delta>[50] delta;
}
model {
y ~ bernoulli_logit(alpha + beta[age] + gamma[income] + delta[state]);
alpha ~ normal(0, 2);
beta ~ normal(0, sigma_beta);
gamma ~ normal(0, sigma_gamma);
delta ~ normal(0, sigma_delta);
{ sigma_beta, sigma_gamma, sigma_delta } ~ normal(0, 1);
}
generated quantities {
real expect_pos = 0;
int total = 0;
for (b in 1:4) {
for (c in 1:5) {
for (d in 1:50) {
total += P[b, c, d];
expect_pos
+= P[b, c, d]
* inv_logit(alpha + beta[b] + gamma[c] + delta[d]);
}
}
}
real<lower=0, upper=1> phi = expect_pos / total;
}
Unlike in posterior predictive inference aimed at uncertainty, there is no need to introduce binomial sampling uncertainty into the estimate of expected positive votes. Instead, generated quantities are computed as expectations. In general, it is more efficient to work in expectation if possible (the Rao-Blackwell theorem says it’s at least as efficient to work in expectation, but in practice, it can be much much more efficient, especially for discrete quantities).
30.6.1 Binomial coding
In some cases, it can be more efficient to break the data down by group. Suppose there are \(4 \times 5 \times 2 \times 50 = 2000\) groups. The data can be broken down into a size-2000 array, with entries corresponding to total vote counts in that group
int<lower=0> G;
array[G] int<lower=1, upper=4> age;
array[G] int<lower=1, upper=5> income;
array[G] int<lower=1, upper=50> state;
Then the number of positive votes and the number of total votes are collected into two parallel arrays indexed by group.
array[G] int<lower=0> pos_votes;
array[G] int<lower=0> total_votes;
Finally, the likelihood is converted to binomial.
pos_votes ~ binomial_logit(total_votes,
alpha + beta[age] + ...);
The predictors look the same because of the way the age
and other
data items are coded.
30.6.2 Coding binary groups
In this first model, sex is not included as a predictor. With only
two categories, it needs to be modeled separately, because it is not
feasible to build a hierarchical model with only two cases.
A sex predictor is straightforward to add to the data block; it takes
on values 1 or 2 for each of the N
data points.
array[N] int<lower=1, upper=2> sex;
Then add a single regression coefficient as a parameter,
real epsilon;
In the log odds calculation, introduce a new term
[epsilon, -epsilon][sex]';
That is, the likelihood will now look like
y ~ bernoulli_logit(alpha + beta[age] + gamma[income] + delta[state]
+ [epsilon, -epsilon][sex]');
For data point n
, the expression [epsilon, -epsilon][sex]
takes on
value [epsilon, -epsilon][sex][n]
, which with Stan’s multi-indexing
reduces to [epsilon, -epsilon][sex[n]]
. This term evaluates to
epsilon
if sex[n]
is 1 and to -epsilon
if sex[n]
is 2. The
result is effectively a sum-to-zero constraint on two sex
coefficients. The '
at the end transposes [epsilon, -epsilon][sex]
which is a row_vector
into a vector
that can be added to the other
variables.
Finally, a prior is needed for the coefficient in the model block,
epsilon ~ normal(0, 2);
As with other priors in multilevel models, the posterior for epsilon
should be investigated to make sure it is not unrealistically wide.