3.1 Missing data
Stan treats variables declared in the data and transformed data blocks as known and the variables in the parameters block as unknown.
An example involving missing normal observations could be coded as follows.[10]
data {
  int<lower=0> N_obs;
  int<lower=0> N_mis;
  real y_obs[N_obs];
}
parameters {
  real mu;
  real<lower=0> sigma;
  real y_mis[N_mis];
}
model {
  y_obs ~ normal(mu, sigma);
  y_mis ~ normal(mu, sigma);
}
The numbers of observed and missing data points are coded as data with non-negative integer variables N_obs and N_mis. The observed data are provided as an array data variable y_obs. The missing data are coded as an array parameter, y_mis. The ordinary parameters being estimated, the location mu and scale sigma, are also coded as parameters. The model is vectorized on the observed and missing data separately; combining them into a single container would be less efficient in this case because the observed data would be promoted to autodiff variables and have needless derivatives calculated.
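For contrast, the combined version that the paragraph above advises against might look like the following sketch. It assumes y_obs and y_mis are declared as vectors rather than arrays so that append_row can be used, and the local variable y is a name introduced here only for illustration. It defines the same posterior as the two-statement version; the only difference is that the observed values are copied into a container of autodiff variables.

```stan
data {
  int<lower=0> N_obs;
  int<lower=0> N_mis;
  vector[N_obs] y_obs;   // assumption: vector instead of array, so append_row applies
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector[N_mis] y_mis;   // missing observations, still coded as parameters
}
model {
  // hypothetical combined vector: because y_mis is a parameter,
  // the known values in y_obs are promoted along with it
  vector[N_obs + N_mis] y = append_row(y_obs, y_mis);
  y ~ normal(mu, sigma);
}
```

Keeping y_obs and y_mis in separate sampling statements avoids this promotion, which is why the original coding is preferable.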
[10] A more meaningful estimation example would involve a regression of the observed and missing observations using predictors that were known for each and specified in the data block.
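A minimal sketch of such a regression is given here under stated assumptions: a single predictor, a linear mean, and vector types in place of arrays. The names x_obs, x_mis, alpha, and beta do not appear in the original example and are chosen only for illustration.

```stan
data {
  int<lower=0> N_obs;
  int<lower=0> N_mis;
  vector[N_obs] x_obs;   // hypothetical predictor for the observed outcomes
  vector[N_mis] x_mis;   // hypothetical predictor for the missing outcomes
  vector[N_obs] y_obs;   // observed outcomes
}
parameters {
  real alpha;            // intercept (assumed name)
  real beta;             // slope (assumed name)
  real<lower=0> sigma;
  vector[N_mis] y_mis;   // missing outcomes, coded as parameters
}
model {
  // observed and missing outcomes share the same regression,
  // but are kept in separate statements so y_obs stays as data
  y_obs ~ normal(alpha + beta * x_obs, sigma);
  y_mis ~ normal(alpha + beta * x_mis, sigma);
}
```

In this sketch the posterior draws of y_mis act as predictions for the missing outcomes at their known predictor values.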