22.9 Exploiting sufficient statistics
In some cases, models can be recoded to exploit sufficient statistics in estimation. This can lead to large efficiency gains compared to an expanded model. For example, consider the following Bernoulli sampling model.
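data {
  int<lower=0> N;                     // number of trials
  array[N] int<lower=0, upper=1> y;   // binary outcomes
  real<lower=0> alpha;                // prior parameters for theta
  real<lower=0> beta;
}
parameters {
  real<lower=0, upper=1> theta;       // chance of success
}
model {
  theta ~ beta(alpha, beta);
  // expanded likelihood: one Bernoulli term per observation
  for (n in 1:N) {
    y[n] ~ bernoulli(theta);
  }
}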
In this model, the sum of positive outcomes in y is a sufficient statistic for the chance of success theta. The model may be recoded using the binomial distribution as follows.
theta ~ beta(alpha, beta);
sum(y) ~ binomial(N, theta);
Because truth is represented as one and falsehood as zero, the value sum(y) for a binary array y is equal to the number of positive outcomes out of a total of N trials.
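To see why the two codings agree, note that the product of the Bernoulli terms is \(\theta^{s} (1 - \theta)^{N - s}\) with \(s\) the number of successes, which differs from the binomial probability mass function only by the binomial coefficient, a constant that does not depend on theta. A minimal sketch of a complete program along these lines follows. The variable name s is introduced here for illustration and is not part of the original model, and the array declaration assumes the syntax of recent Stan releases.

```stan
data {
  int<lower=0> N;                     // number of trials
  array[N] int<lower=0, upper=1> y;   // binary outcomes
  real<lower=0> alpha;                // prior parameters for theta
  real<lower=0> beta;
}
transformed data {
  // s is a hypothetical name: the number of successes,
  // computed once rather than on every log density evaluation
  int<lower=0, upper=N> s = sum(y);
}
parameters {
  real<lower=0, upper=1> theta;       // chance of success
}
model {
  theta ~ beta(alpha, beta);
  s ~ binomial(N, theta);             // one term replaces N Bernoulli terms
}
```

Computing the sum in the transformed data block means it is evaluated a single time when the data are read, not once per evaluation of the model block.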
This can be generalized to other discrete cases (one wouldn’t expect
continuous observations to be duplicated if they are random). Suppose
there are only \(K\) possible discrete outcomes, \(z_1, \dotsc, z_K\), but
there are \(N\) observations, where \(N\) is much larger than \(K\). If
\(f_k\) is the frequency of outcome \(z_k\), then the entire likelihood with distribution foo can be coded as follows.
for (k in 1:K) {
  target += f[k] * foo_lpmf(z[k] | ...);
}
where the ellipsis stands for the parameters of the log probability mass function for distribution foo (there’s no distribution called “foo”; this is just a placeholder for any discrete distribution name).
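As one concrete instance of the pattern, the placeholder can be replaced by the Poisson distribution. The sketch below assumes the distinct values z and their frequencies f have already been tabulated outside of Stan and are passed in as data. The parameter name lambda and the gamma prior are illustrative choices made for this example, not something prescribed by the technique.

```stan
data {
  int<lower=1> K;              // number of distinct observed values
  array[K] int<lower=0> z;     // the distinct values z[1], ..., z[K]
  array[K] int<lower=0> f;     // f[k] = number of times z[k] was observed
}
parameters {
  real<lower=0> lambda;        // Poisson rate (hypothetical name)
}
model {
  lambda ~ gamma(2, 0.5);      // illustrative prior, an assumption of this sketch
  // K weighted terms stand in for sum(f) individual Poisson terms
  for (k in 1:K) {
    target += f[k] * poisson_lpmf(z[k] | lambda);
  }
}
```

Each term contributes exactly what f[k] repeated observations of z[k] would contribute to the log density, so the posterior is the same as for the expanded model.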
The resulting program looks like a “weighted” regression, but here the weights f[k] are counts and thus sufficient statistics for the PMF; they simply amount to an alternative, more efficient coding of the same likelihood. For efficiency, the frequencies f[k] should be counted once in the transformed data block and stored.
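The counting step might be sketched as follows for a categorical outcome. This example assumes the observations are coded as integers 1 through K, so that each observation can index the frequency array directly. The Dirichlet prior and the names theta and alpha are assumptions made for illustration.

```stan
data {
  int<lower=0> N;                     // number of observations
  int<lower=1> K;                     // number of possible outcomes
  array[N] int<lower=1, upper=K> y;   // outcomes, assumed coded 1..K
  vector<lower=0>[K] alpha;           // Dirichlet prior parameters (assumption)
}
transformed data {
  // count each outcome once, when the data are read
  array[K] int<lower=0> f = rep_array(0, K);
  for (n in 1:N) {
    f[y[n]] += 1;
  }
}
parameters {
  simplex[K] theta;                   // outcome probabilities
}
model {
  theta ~ dirichlet(alpha);
  for (k in 1:K) {
    target += f[k] * categorical_lpmf(k | theta);
  }
}
```

If the outcomes are not coded as consecutive integers starting at one, the tabulation would need a lookup from each observation to its index k, or the counts could be computed before the data are passed to Stan.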