Measurement Error and Meta-Analysis
Most quantities used in statistical models arise from measurements. Most of these measurements are taken with some error. When the measurement error is small relative to the quantity being measured, its effect on a model is usually small. When measurement error is large relative to the quantity being measured, or when precise relations can be estimated being measured quantities, it is useful to introduce an explicit model of measurement error. One kind of measurement error is rounding.
Meta-analysis plays out statistically much like measurement error models, where the inferences drawn from multiple data sets are combined to do inference over all of them. Inferences for each data set are treated as providing a kind of measurement error with respect to true parameter values.
Bayesian measurement error model
A Bayesian approach to measurement error can be formulated directly by treating the true quantities being measured as missing data (Clayton 1992; Richardson and Gilks 1993). This requires a model of how the measurements are derived from the true values.
Regression with measurement error
Before considering regression with measurement error, first consider a linear regression model where the observed data for \(N\) cases includes a predictor \(x_n\) and outcome \(y_n\). In Stan, a linear regression for \(y\) based on \(x\) with a slope and intercept is modeled as follows.
data {
int<lower=0> N; // number of cases
vector[N] x; // predictor (covariate)
vector[N] y; // outcome (variate)
}parameters {
real alpha; // intercept
real beta; // slope
real<lower=0> sigma; // outcome noise
}model {
y ~ normal(alpha + beta * x, sigma);0, 10);
alpha ~ normal(0, 10);
beta ~ normal(0, 5);
sigma ~ cauchy( }
Now suppose that the true values of the predictors \(x_n\) are not known, but for each \(n\), a measurement \(x^{\textrm{meas}}_n\) of \(x_n\) is available. If the error in measurement can be modeled, the measured value \(x^{\textrm{meas}}_n\) can be modeled in terms of the true value \(x_n\) plus measurement noise. The true value \(x_n\) is treated as missing data and estimated along with other quantities in the model. A simple approach is to assume the measurement error is normal with known deviation \(\tau\). This leads to the following regression model with constant measurement error.
data {
// ...
array[N] real x_meas; // measurement of x
real<lower=0> tau; // measurement noise
}parameters {
array[N] real x; // unknown true value
real mu_x; // prior location
real sigma_x; // prior scale
// ...
}model {
// prior
x ~ normal(mu_x, sigma_x); // measurement model
x_meas ~ normal(x, tau);
y ~ normal(alpha + beta * x, sigma);// ...
}
The regression coefficients alpha
and beta
and regression noise scale sigma
are the same as before, but now x
is declared as a parameter rather than as data. The data are now x_meas
, which is a measurement of the true x
value with noise scale tau
. The model then specifies that the measurement error for x_meas[n]
given true value x[n]
is normal with deviation tau
. Furthermore, the true values x
are given a hierarchical prior here.
In cases where the measurement errors are not normal, richer measurement error models may be specified. The prior on the true values may also be enriched. For instance, Clayton (1992) introduces an exposure model for the unknown (but noisily measured) risk factors \(x\) in terms of known (without measurement error) risk factors \(c\). A simple model would regress \(x_n\) on the covariates \(c_n\) with noise term \(\upsilon\), \[ x_n \sim \textsf{normal}(\gamma^{\top}c, \upsilon). \] This can be coded in Stan just like any other regression. And, of course, other exposure models can be provided.
Rounding
A common form of measurement error arises from rounding measurements. Rounding may be done in many ways, such as rounding weights to the nearest milligram, or to the nearest pound; rounding may even be done by rounding down to the nearest integer.
Exercise 3.5(b) by Gelman et al. (2013) provides an example.
3.5. Suppose we weigh an object five times and measure weights, rounded to the nearest pound, of 10, 10, 12, 11, 9. Assume the unrounded measurements are normally distributed with a noninformative prior distribution on \(\mu\) and \(\sigma^2\).
- Give the correct posterior distribution for \((\mu, \sigma^2)\), treating the measurements as rounded.
Letting \(z_n\) be the unrounded measurement for \(y_n\), the problem as stated assumes \[ z_n \sim \textsf{normal}(\mu, \sigma). \]
The rounding process entails that \(z_n \in (y_n - 0.5, y_n + 0.5)\)1. The probability mass function for the discrete observation \(y\) is then given by marginalizing out the unrounded measurement, producing the likelihood \[\begin{align*} p(y_n \mid \mu, \sigma) &= \int_{y_n - 0.5}^{y_n + 0.5} \textsf{normal}(z_n \mid \mu, \sigma) \,\textsf{d}z_n \\ &= \Phi\!\left(\frac{y_n + 0.5 - \mu}{\sigma}\right) -\Phi\!\left(\frac{y_n - 0.5 - \mu}{\sigma}\right). \end{align*}\] Gelman’s answer for this problem took the noninformative prior to be uniform in the variance \(\sigma^2\) on the log scale, but we replace it with more recently recommended half-normal prior on \(\sigma\) \[ \sigma \sim \textsf{normal}^+(0, 1). \] The posterior after observing \(y = (10, 10, 12, 11, 9)\) can be calculated by Bayes’s rule as \[\begin{align*} p(\mu, \sigma \mid y) &\propto p(\mu, \sigma) \ p(y \mid \mu, \sigma) \\ &\propto \textsf{normal}^+(\sigma \mid 0, 1)\prod_{n=1}^5 \left( \Phi\!\left(\frac{y_n + 0.5 - \mu}{\sigma}\right) -\Phi\!\left(\frac{y_n - 0.5 - \mu}{\sigma}\right) \right). \end{align*}\]
The Stan code simply follows the mathematical definition, providing an example of the direct definition of a probability function up to a proportion.
data {
int<lower=0> N;
vector[N] y;
}parameters {
real mu;
real<lower=0> sigma;
}model {
0, 1);
sigma ~ normal(for (n in 1:N) {
target += log_diff_exp(normal_lcdf(y[n] + 0.5 | mu, sigma),
0.5 | mu, sigma));
normal_lcdf(y[n] -
} }
where normal_lcdf(y[n]+0.5 | mu, sigma)
is equal to log(Phi((y[n] + 0.5 - mu) / sigma))
, and log_diff_exp(a, b)
computes log(exp(a) - exp(b))
in numerically more stable way.
Alternatively, the model may be defined with latent parameters for the unrounded measurements \(z_n\). The Stan code in this case uses a distribution statement for \(z_n\) directly while respecting the constraint \(z_n \in (y_n - 0.5, y_n + 0.5)\).
data {
int<lower=0> N;
vector[N] y;
}parameters {
real mu;
real<lower=0> sigma;
vector<lower=y-0.5, upper=y+0.5>[N] z;
}model {
0, 1);
sigma ~ normal(
z ~ normal(mu, sigma); }
This explicit model for the unrounded measurements \(z\) produces the same posterior for \(\mu\) and \(\sigma\) as the previous model that marginalizes \(z\) out. Both approaches mix well, but the latent parameter version is about twice as efficient in terms of effective samples per iteration, as well as providing a posterior for the unrounded parameters.
Meta-analysis
Meta-analysis aims to pool the data from several studies, such as the application of a tutoring program in several schools or treatment using a drug in several clinical trials.
The Bayesian framework is particularly convenient for meta-analysis, because each previous study can be treated as providing a noisy measurement of some underlying quantity of interest. The model then follows directly from two components, a prior on the underlying quantities of interest and a measurement-error style model for each of the studies being analyzed.
Treatment effects in controlled studies
Suppose the data in question arise from a total of \(M\) studies providing paired binomial data for a treatment and control group. For instance, the data might be post-surgical pain reduction under a treatment of ibuprofen (Warn, Thompson, and Spiegelhalter 2002) or mortality after myocardial infarction under a treatment of beta blockers (Gelman et al. 2013, sec. 5.6).
Data
The clinical data consists of \(J\) trials, each with \(n^t\) treatment cases, \(n^c\) control cases, \(r^t\) successful outcomes among those treated and \(r^c\) successful outcomes among those in the control group. This data can be declared in Stan as follows.2
data {
int<lower=0> J;
array[J] int<lower=0> n_t; // num cases, treatment
array[J] int<lower=0> r_t; // num successes, treatment
array[J] int<lower=0> n_c; // num cases, control
array[J] int<lower=0> r_c; // num successes, control
}
Converting to log odds and standard error
Although the clinical trial data are binomial in its raw format, it may be transformed to an unbounded scale by considering the log odds ratio \[\begin{align*} y_j &= \log \left( \frac{r^t_j / (n^t_j - r^t_j)} {r^c_j / (n^c_j - r^c_j)} \right) \\ &= \log \left( \frac{r^t_j}{n^t_j - r^t_j} \right) -\log \left( \frac{r^c_j}{n^c_j - r^c_j} \right) \end{align*}\] and corresponding standard errors \[ \sigma_j = \sqrt{ \frac{1}{r^T_i} + \frac{1}{n^T_i - r^T_i} + \frac{1}{r^C_i} + \frac{1}{n^C_i - r^C_i} }. \]
The log odds and standard errors can be defined in a transformed data block, though care must be taken not to use integer division.3
transformed data {
array[J] real y;
array[J] real<lower=0> sigma;
for (j in 1:J) {
y[j] = log(r_t[j]) - log(n_t[j] - r_t[j])
- (log(r_c[j]) - log(n_c[j] - r_c[j]));
}for (j in 1:J) {
1 / r_t[j] + 1 / (n_t[j] - r_t[j])
sigma[j] = sqrt(1 / r_c[j] + 1 / (n_c[j] - r_c[j]));
+
} }
This definition will be problematic if any of the success counts is zero or equal to the number of trials. If that arises, a direct binomial model will be required or other transforms must be used than the unregularized sample log odds.
Non-hierarchical model
With the transformed data in hand, two standard forms of meta-analysis can be applied. The first is a so-called “fixed effects” model, which assumes a single parameter for the global odds ratio. This model is coded in Stan as follows.
parameters {
real theta; // global treatment effect, log odds
}model {
y ~ normal(theta, sigma); }
The distribution statement for y
is vectorized; it has the same effect as the following.
for (j in 1:J) {
y[j] ~ normal(theta, sigma[j]); }
It is common to include a prior for theta
in this model, but it is not strictly necessary for the model to be proper because y
is fixed and \(\textsf{normal}(y \mid \mu,\sigma) =
\textsf{normal}(\mu \mid y,\sigma)\).
Hierarchical model
To model so-called “random effects,” where the treatment effect may vary by clinical trial, a hierarchical model can be used. The parameters include per-trial treatment effects and the hierarchical prior parameters, which will be estimated along with other unknown quantities.
parameters {
array[J] real theta; // per-trial treatment effect
real mu; // mean treatment effect
real<lower=0> tau; // deviation of treatment effects
}model {
y ~ normal(theta, sigma);
theta ~ normal(mu, tau);0, 10);
mu ~ normal(0, 5);
tau ~ cauchy( }
Although the vectorized distribution statement for y
appears unchanged, the parameter theta
is now a vector. The distribution statement for theta
is also vectorized, with the hyperparameters mu
and tau
themselves being given wide priors compared to the scale of the data.
Rubin (1981) provided a hierarchical Bayesian meta-analysis of the treatment effect of Scholastic Aptitude Test (SAT) coaching in eight schools based on the sample treatment effect and standard error in each school.
Extensions and alternatives
Smith, Spiegelhalter, and Thomas (1995) and Gelman et al. (2013, sec. 19.4) provide meta-analyses based directly on binomial data. Warn, Thompson, and Spiegelhalter (2002) consider the modeling implications of using alternatives to the log-odds ratio in transforming the binomial data.
If trial-specific predictors are available, these can be included directly in a regression model for the per-trial treatment effects \(\theta_j\).
References
Footnotes
There are several different rounding rules (see, e.g., Wikipedia: Rounding), which affect which interval ends are open and which are closed, but these do not matter here as for continuous \(z_n\) \(p(z_n=y_n-0.5)=p(z_n=y_n+0.5)=0\).↩︎
Stan’s integer constraints are not powerful enough to express the constraint that \(\texttt{r}\mathtt{\_}\texttt{t[j]} \leq \texttt{n}\mathtt{\_}\texttt{t[j]}\), but this constraint could be checked in the transformed data block.↩︎
When dividing two integers, the result type is an integer and rounding will ensue if the result is not exact. See the discussion of primitive arithmetic types in the reference manual for more information.↩︎