5.4 Vectorizing mixtures
There is (currently) no way to vectorize mixture models at the
observation level in Stan. This section is to warn users away from
attempting to vectorize naively, as it results in a different model.
A proper mixture at the observation level is defined as follows, where
we assume that lambda
, y[n]
, mu[1], mu[2]
, and
sigma[1], sigma[2]
are all scalars and lambda
is between
0 and 1.
for (n in 1:N) {
target += log_sum_exp(log(lambda)
+ normal_lpdf(y[n] | mu[1], sigma[1]),
log1m(lambda)
+ normal_lpdf(y[n] | mu[2], sigma[2]));
or equivalently
for (n in 1:N) {
target += log_mix(lambda,
normal_lpdf(y[n] | mu[1], sigma[1]),
normal_lpdf(y[n] | mu[2], sigma[2]))
};
This definition assumes that each observation \(y_n\) may have arisen from either of the mixture components. The density is \[ p\left(y \mid \lambda, \mu, \sigma\right) = \prod_{n=1}^N \big(\lambda \times \textsf{normal}\left(y_n \mid \mu_1, \sigma_1 \right) + (1 - \lambda) \times \textsf{normal}\left(y_n \mid \mu_2, \sigma_2 \right)\big). \]
Contrast the previous model with the following (erroneous) attempt to vectorize the model.
target += log_sum_exp(log(lambda)
+ normal_lpdf(y | mu[1], sigma[1]),
log1m(lambda)
+ normal_lpdf(y | mu[2], sigma[2]));
or equivalently,
target += log_mix(lambda,
normal_lpdf(y | mu[1], sigma[1]),
normal_lpdf(y | mu[2], sigma[2]));
This second definition implies that the entire sequence \(y_1, \dotsc, y_n\) of observations comes form one component or the other, defining a different density, \[ p\left(y \mid \lambda, \mu, \sigma \right) = \lambda \times \prod_{n=1}^N \textsf{normal}\left(y_n \mid \mu_1, \sigma_1\right) + (1 - \lambda) \times \prod_{n=1}^N \textsf{normal}\left(y_n \mid \mu_2, \sigma_2\right). \]