22.9 Exploiting Sufficient Statistics

In some cases, models can be recoded to exploit sufficient statistics in estimation. This can lead to large efficiency gains compared to an expanded model. For example, consider the following Bernoulli sampling model.

data {
  int<lower=0> N;
  int<lower=0, upper=1> y[N];
  real<lower=0> alpha;
  real<lower=0> beta;
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(alpha, beta);
  for (n in 1:N)
    y[n] ~ bernoulli(theta);
}

In this model, the number of positive outcomes in y, computed as sum(y), is a sufficient statistic for the chance of success theta. The model may be recoded using the binomial distribution as follows.

    theta ~ beta(alpha, beta);
    sum(y) ~ binomial(N, theta);

Because truth is represented as one and falsehood as zero, the expression sum(y) for a binary array y is equal to the number of positive outcomes out of a total of N trials.
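
To see why the two codings lead to the same posterior, the Bernoulli likelihood can be written in terms of the sum alone. The following is a short worked derivation; the symbol $s$ for the sum is introduced here only for illustration.

$$
p(y \mid \theta)
= \prod_{n=1}^{N} \theta^{y_n} (1 - \theta)^{1 - y_n}
= \theta^{s} (1 - \theta)^{N - s},
\qquad s = \sum_{n=1}^{N} y_n
$$

$$
\mathrm{Binomial}(s \mid N, \theta)
= \binom{N}{s} \, \theta^{s} (1 - \theta)^{N - s}
$$

The two expressions differ only by the binomial coefficient, which does not depend on $\theta$, so it contributes a constant to the log density and leaves the posterior for $\theta$ unchanged.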

This can be generalized to other discrete cases (one wouldn’t expect continuous observations to be duplicated if they are random). Suppose there are only $K$ possible discrete outcomes, $z_1, \ldots, z_K$, but there are $N$ observations, where $N$ is much larger than $K$. If $f_k$ is the frequency of outcome $z_k$, then the entire likelihood with distribution foo can be coded as follows.

for (k in 1:K)
  target += f[k] * foo_lpmf(z[k] | ...);

where the ellipsis stands for the parameters of the log probability mass function for distribution foo (there’s no distribution called foo; this is just a placeholder for any discrete distribution name).
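
The loop over outcomes is a regrouping of the usual loop over observations. As a brief worked identity, writing $y_1, \ldots, y_N$ for the raw observations and $\phi$ for the parameters of foo (both symbols are introduced here only for illustration):

$$
\sum_{n=1}^{N} \log \mathrm{foo}(y_n \mid \phi)
= \sum_{k=1}^{K} \sum_{n \,:\, y_n = z_k} \log \mathrm{foo}(z_k \mid \phi)
= \sum_{k=1}^{K} f_k \, \log \mathrm{foo}(z_k \mid \phi)
$$

The left-hand side requires $N$ evaluations of the log pmf, whereas the right-hand side requires only $K$.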

The resulting program looks like a “weighted” regression, but here the weights f[k] are counts and thus sufficient statistics for the pmf; using them simply amounts to an alternative, more efficient coding of the same likelihood. For efficiency, the frequencies f[k] should be counted once in the transformed data block and stored.
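
The counting step might look like the sketch below. It assumes a Poisson likelihood stands in for foo and that the K distinct outcome values are passed in as data. The names z, f, and lambda, and the choice of Poisson, are illustrative assumptions rather than part of the original example. Array declarations follow the style used earlier in this section; more recent Stan releases write them as array[K] int f.

```stan
data {
  int<lower=0> N;        // number of raw observations
  int<lower=1> K;        // number of distinct outcomes (assumed known)
  int<lower=0> z[K];     // distinct outcome values (assumed supplied as data)
  int<lower=0> y[N];     // raw observations; each assumed equal to some z[k]
}
transformed data {
  int<lower=0> f[K];     // f[k] = number of observations equal to z[k]
  for (k in 1:K) {
    f[k] = 0;
    for (n in 1:N)
      f[k] = f[k] + (y[n] == z[k]);   // comparison evaluates to 1 or 0
  }
}
parameters {
  real<lower=0> lambda;  // Poisson rate (illustrative stand-in for foo's parameters)
}
model {
  // no explicit prior: implicit flat prior on lambda > 0 (assumption)
  for (k in 1:K)
    target += f[k] * poisson_lpmf(z[k] | lambda);
}
```

Because the transformed data block is executed once when the data are read, the nested counting loop adds no cost per log density evaluation; each evaluation then makes K calls to the log pmf instead of N.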