6.2 Meta-Analysis

This is an old version, view current version.

Meta-analysis aims to pool the data from several studies, such as the application of a tutoring program in several schools or treatment using a drug in several clinical trials.

The Bayesian framework is particularly convenient for meta-analysis, because each previous study can be treated as providing a noisy measurement of some underlying quantity of interest. The model then follows directly from two components, a prior on the underlying quantities of interest and a measurement-error style model for each of the studies being analyzed.

Treatment Effects in Controlled Studies

Suppose the data in question arise from a total of \(M\) studies providing paired binomial data for a treatment and control group. For instance, the data might be post-surgical pain reduction under a treatment of ibuprofen Warn, Thompson, and Spiegelhalter (2002) or mortality after myocardial infarction under a treatment of beta blockers (Gelman et al. 2013, Section 5.6).

Data

The clinical data consists of \(J\) trials, each with \(n^t\) treatment cases, \(n^c\) control cases, \(r^t\) successful outcomes among those treated and \(r^c\) successful outcomes among those in the control group. This data can be declared in Stan as follows.¹³

data {
  int<lower=0> J;
  int<lower=0> n_t[J];  // num cases, treatment
  int<lower=0> r_t[J];  // num successes, treatment
  int<lower=0> n_c[J];  // num cases, control
  int<lower=0> r_c[J];  // num successes, control
}

Converting to Log Odds and Standard Error

Although the clinical trial data are binomial in its raw format, it may be transformed to an unbounded scale by considering the log odds ratio \[\begin{align*} y_j &= \log \left( \frac{r^t_j / (n^t_j - r^t_j)} {r^c_j / (n^c_j - r^c_j)} \right) \\ &= \log \left( \frac{r^t_j}{n^t_j - r^t_j} \right) -\log \left( \frac{r^c_j}{n^c_j - r^c_j} \right) \end{align*}\] and corresponding standard errors \[ \sigma_j = \sqrt{ \frac{1}{r^T_i} + \frac{1}{n^T_i - r^T_i} + \frac{1}{r^C_i} + \frac{1}{n^C_i - r^C_i} }. \]

The log odds and standard errors can be defined in a transformed parameter block, though care must be taken not to use integer division.¹⁴

transformed data {
  real y[J];
  real<lower=0> sigma[J];
  for (j in 1:J)
    y[j] = log(r_t[j]) - log(n_t[j] - r_t[j])
            - (log(r_c[j]) - log(n_c[j] - r_c[j]);
  for (j in 1:J)
    sigma[j] = sqrt(1 / r_t[j] + 1 / (n_t[j] - r_t[j])
                     + 1 / r_c[j] + 1 / (n_c[j] - r_c[j]));
}

This definition will be problematic if any of the success counts is zero or equal to the number of trials. If that arises, a direct binomial model will be required or other transforms must be used than the unregularized sample log odds.

Non-Hierarchical Model

With the transformed data in hand, two standard forms of meta-analysis can be applied. The first is a so-called “fixed effects” model, which assumes a single parameter for the global odds ratio. This model is coded in Stan as follows.

parameters {
  real theta;  // global treatment effect, log odds
}
model {
  y ~ normal(theta, sigma);
}

The sampling statement for y is vectorized; it has the same effect as the following.

  for (j in 1:J)
    y[j] ~ normal(theta, sigma[j]);

It is common to include a prior for theta in this model, but it is not strictly necessary for the model to be proper because y is fixed and \(\textsf{normal}(y \mid \mu,\sigma) = \textsf{normal}(\mu \mid y,\sigma)\).

Hierarchical Model

To model so-called “random effects,” where the treatment effect may vary by clinical trial, a hierarchical model can be used. The parameters include per-trial treatment effects and the hierarchical prior parameters, which will be estimated along with other unknown quantities.

parameters {
  real theta[J];      // per-trial treatment effect
  real mu;            // mean treatment effect
  real<lower=0> tau;  // deviation of treatment effects
}
model {
  y ~ normal(theta, sigma);
  theta ~ normal(mu, tau);
  mu ~ normal(0, 10);
  tau ~ cauchy(0, 5);
}

Although the vectorized sampling statement for y appears unchanged, the parameter theta is now a vector. The sampling statement for theta is also vectorized, with the hyperparameters mu and tau themselves being given wide priors compared to the scale of the data.

Rubin (1981) provided a hierarchical Bayesian meta-analysis of the treatment effect of Scholastic Aptitude Test (SAT) coaching in eight schools based on the sample treatment effect and standard error in each school.

Extensions and Alternatives

Smith, Spiegelhalter, and Thomas (1995) and (Gelman et al. 2013, Section 19.4) provides meta-analyses based directly on binomial data. Warn, Thompson, and Spiegelhalter (2002) consider the modeling implications of using alternatives to the log-odds ratio in transforming the binomial data.

If trial-specific predictors are available, these can be included directly in a regression model for the per-trial treatment effects \(\theta_j\).

Stan’s integer constraints are not powerful enough to express the constraint that \(\texttt{r}\mathtt{\_}\texttt{t[j]} \leq \texttt{n}\mathtt{\_}\texttt{t[j]}\), but this constraint could be checked in the transformed data block.↩︎
When dividing two integers, the result type is an integer and rounding will ensue if the result is not exact. See the discussion of primitive arithmetic types in the reference manual for more information.↩︎