5.4 Vectorizing Mixtures

There is (currently) no way to vectorize mixture models at the observation level in Stan. This section is to warn users away from attempting to vectorize naively, as it results in a different model. A proper mixture at the observation level is defined as follows, where we assume that lambda, y[n], mu[1], mu[2], and sigma[1], sigma[2] are all scalars and lambda is between 0 and 1.

for (n in 1:N) {
  target += log_sum_exp(log(lambda)
                          + normal_lpdf(y[n] | mu[1], sigma[1]),
                        log1m(lambda)
                          + normal_lpdf(y[n] | mu[2], sigma[2]));

or equivalently

for (n in 1:N)
  target += log_mix(lambda,
                    normal_lpdf(y[n] | mu[1], sigma[1]),
                    normal_lpdf(y[n] | mu[2], sigma[2]));

This definition assumes that each observation \(y_n\) may have arisen from either of the mixture components. The density is \[ p(y \, | \, \lambda, \mu, \sigma) = \prod_{n=1}^N (\lambda * \mathsf{normal}(y_n \, | \, \mu_1, \sigma_1) + (1 - \lambda) * \mathsf{normal}(y_n \, | \, \mu_2, \sigma_2). \]

Contrast the previous model with the following (erroneous) attempt to vectorize the model.

target += log_sum_exp(log(lambda)
                        + normal_lpdf(y | mu[1], sigma[1]),
                      log1m(lambda)
                        + normal_lpdf(y | mu[2], sigma[2]));

or equivalently,

target += log_mix(lambda,
                  normal_lpdf(y | mu[1], sigma[1]),
                  normal_lpdf(y | mu[2], sigma[2]));

This second definition implies that the entire sequence \(y_1, \ldots, y_n\) of observations comes form one component or the other, defining a different density, \[ p(y \, | \, \lambda, \mu, \sigma) = \lambda * \prod_{n=1}^N \mbox{normal}(y_n \, | \, \mu_1, \sigma_1) + (1 - \lambda) * \prod_{n=1}^N \mbox{normal}(y_n \, | \, \mu_2, \sigma_2). \]