29.6 Coding MRP in Stan

Multilevel regression and poststratification can be coded directly in Stan. To code the non-centered parameterization for each coefficient, which will be required for sampling efficiency, the multiplier transform is used on each of the parameters. The combination of

vector<multiplier = s>[K] a;
...
a ~ normal(0, s);

implements a non-centered parameterization for a; a centered parameterization would drop the multiplier specification. The prior scale s is being centered here. The prior location is fixed to zero in multilevel regressions because there is an overall intercept; introducing a location parameters in the prior would introduce non-identifiability with the overall intercept. The centered parameterization drops the multiplier.

Here is the full Stan model, which performs poststratification in the generated quantities using population sizes made available through data variable P.

data {
  int<lower = 0> N;
  int<lower = 1, upper = 4> age[N];
  int<lower = 1, upper = 5> income[N];
  int<lower = 1, upper = 50> state[N];
  int<lower = 0> y[N];
  int<lower = 0> P[4, 5, 50];
}
parameters {
  real alpha;
  real<lower = 0> sigma_beta;
  vector<multiplier = sigma_beta>[4] beta;
  real<lower = 0> sigma_gamma;
  vector<multiplier = sigma_gamma>[5] gamma;
  real<lower = 0> sigma_delta;
  vector<multiplier = sigma_delta>[50] delta;
}
model {
  y ~ bernoulli_logit(alpha + beta[age] + gamma[income] + delta[state]);
  alpha ~ normal(0, 2);
  beta ~ normal(0, sigma_beta);
  gamma ~ normal(0, sigma_gamma);
  delta ~ normal(0, sigma_delta);
  { sigma_beta, sigma_gamma, sigma_delta } ~ normal(0, 1);
}
generated quantities {
  real expect_pos = 0;
  int total = 0;
  for (b in 1:4)
    for (c in 1:5)
      for (d in 1:50) {
        total += P[b, c, d];
        expect_pos
          += P[b, c, d]
             * inv_logit(alpha + beta[b] + gamma[c] + delta[d]);
      }
  real<lower = 0, upper = 1> phi = expect_pos / total;
}

Unlike in posterior predictive inference aimed at uncertainty, there is no need to introduce binomial sampling uncertainty into the estimate of expected positive votes. Instead, generated quantities are computed as expectations. In general, it is more efficient to work in expectation if possible (the Rao-Blackwell theorem says it’s at least as efficient to work in expectation, but in practice, it can be much much more efficient, especially for discrete quantities).

29.6.1 Binomial coding

In some cases, it can be more efficient to break the data down by group. Suppose there are \(4 \times 5 \times 2 \times 50 = 2000\) groups. The data can be broken down into a size-2000 array, with entries corresponding to total vote counts in that group

int<lower = 0> G;
int<lower = 1, upper = 4> age[G];
int<lower = 1, upper = 5> income[G];
int<lower = 1, upper = 50> state[G];

Then the number of positive votes and the number of total votes are collected into two parallel arrays indexed by group.

int<lower = 0> pos_votes[G];
int<lower = 0> total_votes[G];

Finally, the likelihood is converted to binomial.

pos_votes ~ binomial_logit(total_votes,
                           alpha + beta[age] + ...);

The predictors look the same because of the way the age and other data items are coded.

29.6.2 Coding binary groups

In this first model, sex is not included as a predictor. With only two categories, it needs to be modeled separately, because it is not feasible to build a hierarchical model with only two cases. A sex predictor is straightforward to add to the data block; it takes on values 1 or 2 for each of the N data points.

  int<lower = 1, upper = 2> sex[N];

Then add a single regression coefficient as a parameter,

  real epsilon;

In the log odds calculation, introduce a new term

[epsilon, -epsilon][sex]';

That is, the likelihood will now look like

  y ~ bernoulli_logit(alpha + beta[age] + gamma[income] + delta[state]
                      + [epsilon, -epsilon][sex]');

For data point n, the expression [epsilon, -epsilon][sex] takes on value [epsilon, -epsilon][sex][n], which with Stan’s multi-indexing reduces to [epsilon, -epsilon][sex[n]]. This term evaluates to epsilon if sex[n] is 1 and to -epsilon if sex[n] is 2. The result is effectively a sum-to-zero constraint on two sex coefficients. The ' at the end transposes [epsilon, -epsilon][sex] which is a row_vector into a vector that can be added to the other variables.

Finally, a prior is needed for the coefficient in the model block,

epsilon ~ normal(0, 2);

As with other priors in multilevel models, the posterior for epsilon should be investigated to make sure it is not unrealistically wide.