1.15 Multivariate Outcomes
Most regressions are set up to model univariate observations (be they scalar, boolean, categorical, ordinal, or count). Even multinomial regressions are just repeated categorical regressions. In contrast, this section discusses regression when each observed value is multivariate. To relate multiple outcomes in a regression setting, their error terms are provided with covariance structure.
This section considers two cases, seemingly unrelated regressions for continuous multivariate quantities and multivariate probit regression for boolean multivariate quantities.
Multivariate Probit Regression
The multivariate probit model generates sequences of boolean variables by applying a step function to the output of a seemingly unrelated regression.
The observations \(y_n\) are \(D\)-vectors of boolean values (coded 0 for false, 1 for true). The values for the observations \(y_n\) are based on latent values \(z_n\) drawn from a seemingly unrelated regression model (see the previous section),
\[ \begin{array}{rcl} z_n & = & x_n \, \beta + \epsilon_n \\[4pt] \epsilon_n & \sim & \mathsf{multivariate\ normal}(0, \Sigma) \end{array} \]
These are then put through the step function to produce a \(K\)-vector \(z_n\) of boolean values with elements defined by \[ y_{n, k} = \mathrm{I}(z_{n, k} > 0), \] where \(\mathrm{I}()\) is the indicator function taking the value 1 if its argument is true and 0 otherwise.
Unlike in the seemingly unrelated regressions case, here the covariance matrix \(\Sigma\) has unit standard deviations (i.e., it is a correlation matrix). As with ordinary probit and logistic regressions, letting the scale vary causes the model (which is defined only by a cutpoint at 0, not a scale) to be unidentified (see Greene (2011)).
Multivariate probit regression can be coded in Stan using the trick introduced by Albert and Chib (1993), where the underlying continuous value vectors \(y_n\) are coded as truncated parameters. The key to coding the model in Stan is declaring the latent vector \(z\) in two parts, based on whether the corresponding value of \(y\) is 0 or 1. Otherwise, the model is identical to the seemingly unrelated regression model in the previous section.
First, we introduce a sum function for two-dimensional arrays of integers; this is going to help us calculate how many total 1 values there are in \(y\).
functions {
int sum2d(int[,] a) {
int s = 0;
for (i in 1:size(a))
s += sum(a[i]);
return s;
}
}
The function is trivial, but it’s not a built-in for Stan and it’s easier to understand the rest of the model if it’s pulled into its own function so as not to create a distraction.
The data declaration block is much like for the seemingly unrelated
regressions, but the observations y
are now integers
constrained to be 0 or 1.
data {
int<lower=1> K;
int<lower=1> D;
int<lower=0> N;
int<lower=0,upper=1> y[N,D];
vector[K] x[N];
}
After declaring the data, there is a rather involved transformed data
block whose sole purpose is to sort the data array y
into
positive and negative components, keeping track of indexes so that
z
can be easily reassembled in the transformed parameters
block.
transformed data {
int<lower=0> N_pos;
int<lower=1,upper=N> n_pos[sum2d(y)];
int<lower=1,upper=D> d_pos[size(n_pos)];
int<lower=0> N_neg;
int<lower=1,upper=N> n_neg[(N * D) - size(n_pos)];
int<lower=1,upper=D> d_neg[size(n_neg)];
N_pos = size(n_pos);
N_neg = size(n_neg);
{
int i;
int j;
i = 1;
j = 1;
for (n in 1:N) {
for (d in 1:D) {
if (y[n,d] == 1) {
n_pos[i] = n;
d_pos[i] = d;
i += 1;
} else {
n_neg[j] = n;
d_neg[j] = d;
j += 1;
}
}
}
}
}
The variables N_pos
and N_neg
are set to the number of
true (1) and number of false (0) observations in y
. The loop
then fills in the sequence of indexes for the positive and negative
values in four arrays.
The parameters are declared as follows.
parameters {
matrix[D, K] beta;
cholesky_factor_corr[D] L_Omega;
vector<lower=0>[N_pos] z_pos;
vector<upper=0>[N_neg] z_neg;
}
These include the regression coefficients beta
and the Cholesky
factor of the correlation matrix, L_Omega
. This time there is
no scaling because the covariance matrix has unit scale (i.e., it is a
correlation matrix; see above).
The critical part of the parameter declaration is that the latent real value \(z\) is broken into positive-constrained and negative-constrained components, whose size was conveniently calculated in the transformed data block. The transformed data block’s real work was to allow the transformed parameter block to reconstruct \(z\).
transformed parameters {
vector[D] z[N];
for (n in 1:N_pos)
z[n_pos[n], d_pos[n]] = z_pos[n];
for (n in 1:N_neg)
z[n_neg[n], d_neg[n]] = z_neg[n];
}
At this point, the model is simple, pretty much recreating the seemingly unrelated regression.
model {
L_Omega ~ lkj_corr_cholesky(4);
to_vector(beta) ~ normal(0, 5);
{
vector[D] beta_x[N];
for (n in 1:N)
beta_x[n] = beta * x[n];
z ~ multi_normal_cholesky(beta_x, L_Omega);
}
}
This simple form of model is made possible by the Albert and
Chib-style constraints on z
.
Finally, the correlation matrix itself can be put back together in the generated quantities block if desired.
generated quantities {
corr_matrix[D] Omega;
Omega = multiply_lower_tri_self_transpose(L_Omega);
}
The same could be done for the seemingly unrelated regressions in the previous section.