From Baseball to Surgical Mortality

Another example of repeated binary trial data is surgical mortality, with \(y_n\) surgical patients dying in \(K_n\) surgeries for hospitals \(n \in 1{:}N\). Instead of calculating a player’s hit rate, we’re calculating a hospital’s mortality rate for pediatric cardiac surgeries - so success here is technicaly a failure, nonetheless, the model is the same.

Spiegelhalter et al. (1996) provide a data set of mortality rates in 12 hospitals performing cardiac surgery in babies. The data from the paper has been manually entered into the data files here, which are also available in the Stan example models repository.

surgical_mortality_df <- read.csv("surgical-mortality.csv", sep=",", comment.char="#");
surgical_mortality_df;
##     y   K
## 1   0  47
## 2  18 148
## 3   8 119
## 4  46 810
## 5   8 211
## 6  13 196
## 7   9 148
## 8  31 215
## 9  14 207
## 10  8  97
## 11 29 256
## 12 24 360
plot(surgical_mortality_df$K, surgical_mortality_df$y,
     main=("hospital surgical mortalities"),
     xlab=("pediatric cardiac procedures"),
     ylab=("deaths"));

Three Models: Complete Pooling, No Pooling, Partial Pooling

Model 1: Complete Pooling.

This model assumes a single parameter \(\theta \in [0,1]\) which represents the mortality rate (chance of success) for all hospitals (items). Assuming each hospital’s surgeries are \(K_n\) independent Bernoulli trials, the sampling distribution for the mortalities \(y_n\) is modeled as: \[ p(y_n \, | \, \theta) \ = \ \mathsf{Binomial}(y_n \, | \, K_n, \theta). \] Assuming each hospital is independent leads to the complete data likelihood \[ p(y \, | \, \theta) = \prod_{n=1}^N \mathsf{Binomial}(y_n \, | \, K_n, \theta). \]

Model 2: No Pooling.

This model assumes that all hospitals are unrelated, and estimates a separate parameter \(\theta \in [0,1]\) for each.

The likelihood uses the chance of success \(\theta_n\) for item \(n\) in modeling the number of successes \(y_n\) as \[ p(y_n \, | \, \theta_n) = \mathsf{Binomial}(y_n \, | \, K_n, \theta_n). \]

Model 3: Partial Pooling.

This is a hierarchical model which estimates the similarity among hospitals, i.e., it treats the hospitals as belonging to a population of hospitals. The properties of this population will be estimated along with individual hospital mortality rates, implicitly controlling the amount of pooling that is applied. The more variable the (estimate of the) population, the less pooling is applied.

The hierarchical model places a prior on a hospital’s mortality rates with parameters that are themselves estimated. In this case, we will assume a beta distribution as the prior as it is scaled to values in \([0, 1]\): \[ p(\theta_n \, | \, \alpha, \beta) \ = \ \mathsf{Beta}(\theta_n \, | \, \alpha, \beta), \] where \(\alpha, \beta > 0\) are the parameters of the prior. The beta distribution is the conjugate prior for the binomial, meaning that the posterior is known to be a beta distribution. This also allows us to interpret the prior’s parameters as prior data, with \(\alpha - 1\) being the prior number of successes and \(\beta - 1\) being the prior number of failures, and \(\alpha = \beta = 1\) corresponding to no prior observations and thus a uniform distribution. Each \(\theta_n\) will be modeled as conditionally independent given the prior parameters, so that the complete prior is: \[ p(\theta \, | \, \alpha, \beta) = \prod_{n=1}^N \mathsf{Beta}(\theta_n \, | \, \alpha, \beta). \] The likelihood is the same binomial as above, the chance of success \(\theta_n\) for item \(n\) with number of successes \(y_n\) is: \[ p(y_n \, | \, \theta_n) = \mathsf{Binomial}(y_n \, | \, K_n, \theta_n). \]

The direct encoding of this model in Stan would be:

data {
  int<lower=0> N;          // items
  int<lower=0> K[N];       // trials
  int<lower=0> y[N];       // successes
}
parameters {
  real<lower=0> alpha; 
  real<lower=0> beta;
  vector<lower=0,upper=1>[N] theta;
}
model {
  // no priors on alpha and beta - improper
  theta ~ beta(alpha,beta);  
  y ~ binomial(K, theta);
}

However, the Stan manual recommends using Stan’s transformed parameters block to reparameterize the two count parameters \(\alpha, \beta\) of the Beta distribution in terms of a mean parameter: \[\phi = \frac{\alpha}{\alpha + \beta}\] and a total count parameter: \[\kappa = \alpha + \beta\] Following Gelman et al. (2013, Chapter 5), the mean gets a uniform prior and the count parameter a Pareto prior with \[p(\lambda) \propto \lambda^{-2.5}\] The new parameters, \(\phi\) and \(\lambda\), are declared in the parameters block and the parameters for the Beta distribution, \(\alpha\) and \(\beta\), are declared and defined in the transformed parameters block:

parameters {
  vector<lower=0,upper=1>[N] theta;
  real<lower=0,upper=1> phi;
  real<lower=0.1> lambda;
  ...
}
transformed parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
  ...
  alpha = lambda * phi;
  beta = lambda * (1 - phi);
  ...
}
model {
  phi ~ beta(1, 1); // uniform on phi, could drop
  lambda ~ pareto(0.1, 1.5);
  theta ~ beta(alpha, beta);

}

Exercise 3 Write Stan programs: pool.stan, no-pool.stan, and hier.stan. For all three of these models, the data block is the same:

data {
  int<lower=0> N;              // items
  int<lower=0> K[N];           // trials
  int<lower=0> y[N];           // successes
}

You just need to write the parameters and model blocks in which you declare the parameters and specify the prior and likelihoods, following the above definitions.

Note that in Stan, parameters must be declared with constraints corresponding to their support in the model. Because \(\theta\) will be used as a binomial parameter, \(\theta\) is constrained to be in the interval \([0,1]\).

Fit these models using the data from file surgical-mortality.csv using the following R commands:

surgical_mortality_df <- read.csv("surgical-mortality.csv", sep=",", comment.char="#");
y <- surgical_mortality_df$y;
K <- surgical_mortality_df$K;
N <- length(y);

fit_surgical_pool <- stan("pool.stan", data = c("N", "K", "y"));
fit_surgical_no_pool <- stan("no-pool.stan", data = c("N", "K", "y"));
fit_surgical_hier <- stan("hier.stan", data = c("N", "K", "y"));

print(fit_surgical_pool,probs=c(0.1, 0.5, 0.9));
print(fit_surgical_no_pool,probs=c(0.1, 0.5, 0.9));
print(fit_surgical_hier,probs=c(0.1, 0.5, 0.9));

References

  • Spiegelhalter, D., Thomas, A., Best, N., & Gilks, W. (1996) BUGS 0.5 Examples. MRC Biostatistics Unit, Institute of Public health, Cambridge, UK.