6.1 Bayesian Measurement Error Model
A Bayesian approach to measurement error can be formulated directly by treating the true quantities being measured as missing data (Clayton 1992; Richardson and Gilks 1993). This requires a model of how the measurements are derived from the true values.
Regression with Measurement Error
Before considering regression with measurement error, first consider a linear regression model where the observed data for \(N\) cases includes a predictor \(x_n\) and outcome \(y_n\). In Stan, a linear regression for \(y\) based on \(x\) with a slope and intercept is modeled as follows.
data {
int<lower=0> N; // number of cases
vector[N] x; // predictor (covariate)
vector[N] y; // outcome (variate)
}
parameters {
real alpha; // intercept
real beta; // slope
real<lower=0> sigma; // outcome noise
}
model {
y ~ normal(alpha + beta * x, sigma);
alpha ~ normal(0, 10);
beta ~ normal(0, 10);
sigma ~ cauchy(0, 5);
}
Now suppose that the true values of the predictors \(x_n\) are not known, but for each \(n\), a measurement \(x^{\mathrm meas}_n\) of \(x_n\) is available. If the error in measurement can be modeled, the measured value \(x^{\mathrm{meas}}_n\) can be modeled in terms of the true value \(x_n\) plus measurement noise. The true value \(x_n\) is treated as missing data and estimated along with other quantities in the model. A simple approach is to assume the measurement error is normal with known deviation \(\tau\). This leads to the following regression model with constant measurement error.
data {
...
real x_meas[N]; // measurement of x
real<lower=0> tau; // measurement noise
}
parameters {
real x[N]; // unknown true value
real mu_x; // prior location
real sigma_x; // prior scale
...
}
model {
x ~ normal(mu_x, sigma_x); // prior
x_meas ~ normal(x, tau); // measurement model
y ~ normal(alpha + beta * x, sigma);
...
}
The regression coefficients alpha
and beta
and
regression noise scale sigma
are the same as before, but now
x
is declared as a parameter rather than as data. The data are
now x_meas
, which is a measurement of the true x
value
with noise scale tau
. The model then specifies that the
measurement error for x_meas[n]
given true value x[n]
is normal with deviation tau
. Furthermore, the true values
x
are given a hierarchical prior here.
In cases where the measurement errors are not normal, richer measurement error models may be specified. The prior on the true values may also be enriched. For instance, Clayton (1992) introduces an exposure model for the unknown (but noisily measured) risk factors \(x\) in terms of known (without measurement error) risk factors \(c\). A simple model would regress \(x_n\) on the covariates \(c_n\) with noise term \(\upsilon\), \[ x_n \sim \mathsf{normal}(\gamma^{\top}c, \upsilon). \] This can be coded in Stan just like any other regression. And, of course, other exposure models can be provided.
Rounding
A common form of measurement error arises from rounding measurements. Rounding may be done in many ways, such as rounding weights to the nearest milligram, or to the nearest pound; rounding may even be done by rounding down to the nearest integer.
Exercise 3.5(b) from Gelman et al. (2013) provides an example.
Letting \(z_n\) be the unrounded measurement for \(y_n\), the problem as stated assumes the likelihood
\[ z_n \sim \mathsf{normal}(\mu, \sigma). \]
The rounding process entails that \(z_n \in (y_n - 0.5, y_n + 0.5)\). The probability mass function for the discrete observation \(y\) is then given by marginalizing out the unrounded measurement, producing the likelihood \[ p(y_n \, | \, \mu, \sigma) = \int_{y_n - 0.5}^{y_n + 0.5} \ \mathsf{normal}(z_n \, | \, \mu, \sigma) \ \mathrm{d}z_n = \Phi\!\left(\frac{y_n + 0.5 - \mu}{\sigma}\right) - \Phi\!\left(\frac{y_n - 0.5 - \mu}{\sigma}\right). \] Gelman’s answer for this problem took the noninformative prior to be uniform in the variance \(\sigma^2\) on the log scale, which yields (due to the Jacobian adjustment), the prior density \[ p(\mu, \sigma^2) \propto \frac{1}{\sigma^2}. \] The posterior after observing \(y = (10, 10, 12, 11, 9)\) can be calculated by Bayes’s rule as
\[ \begin{array}{rcl} p(\mu, \sigma^2 \, | \, y) & \propto & p(\mu, \sigma^2) \ p(y \, | \, \mu, \sigma^2) \\[6pt] & \propto & \frac{1}{\sigma^2} \ \prod_{n=1}^5 \left( \Phi\!\left(\frac{y_n + 0.5 - \mu}{\sigma}\right) - \Phi\!\left(\frac{y_n - 0.5 - \mu}{\sigma}\right) \right). \end{array} \]
The Stan code simply follows the mathematical definition, providing an example of the direct definition of a probability function up to a proportion.
data {
int<lower=0> N;
vector[N] y;
}
parameters {
real mu;
real<lower=0> sigma_sq;
}
transformed parameters {
real<lower=0> sigma;
sigma = sqrt(sigma_sq);
}
model {
target += -2 * log(sigma);
for (n in 1:N)
target += log(Phi((y[n] + 0.5 - mu) / sigma)
- Phi((y[n] - 0.5 - mu) / sigma));
}
Alternatively, the model may be defined with latent parameters for the unrounded measurements \(z_n\). The Stan code in this case uses the likelihood for \(z_n\) directly while respecting the constraint \(z_n \in (y_n - 0.5, y_n + 0.5)\). Because Stan does not allow varying upper- and lower-bound constraints on the elements of a vector (or array), the parameters are declared to be the rounding error \(y - z\), and then \(z\) is defined as a transformed parameter.
data {
int<lower=0> N;
vector[N] y;
}
parameters {
real mu;
real<lower=0> sigma_sq;
vector<lower=-0.5, upper=0.5>[N] y_err;
}
transformed parameters {
real<lower=0> sigma;
vector[N] z;
sigma = sqrt(sigma_sq);
z = y + y_err;
}
model {
target += -2 * log(sigma);
z ~ normal(mu, sigma);
}
This explicit model for the unrounded measurements \(z\) produces the same posterior for \(\mu\) and \(\sigma\) as the previous model that marginalizes \(z\) out. Both approaches mix well, but the latent parameter version is about twice as efficient in terms of effective samples per iteration, as well as providing a posterior for the unrounded parameters.