21.1 Dropping Proportionality Constants
If a density \(p(\theta)\) can be factored into \(K g(\theta)\), where \(K\) collects all the factors that are not a function of \(\theta\) and \(g(\theta)\) collects all the factors that are a function of \(\theta\), then \(g(\theta)\) is said to be proportional to \(p(\theta)\), with \(K\) the proportionality constant.
The advantage of this factoring is that \(K\) is sometimes expensive to compute. Because \(K\) is not a function of the parameters of the distribution that is to be sampled (or optimized, or approximated with variational inference), there is no need to compute it: it does not affect the results.
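One way to see why \(K\) does not matter is to write the factorization on the log scale. As a sketch, with \(\theta'\) denoting a second point in parameter space (notation introduced here only for illustration) and assuming \(K > 0\):
\[ \log p(\theta) = \log K + \log g(\theta), \qquad \log p(\theta') - \log p(\theta) = \log g(\theta') - \log g(\theta), \qquad \nabla_\theta \log p(\theta) = \nabla_\theta \log g(\theta) \]
The term \(\log K\) cancels in any difference of log densities and vanishes in any gradient with respect to \(\theta\), so quantities of this kind are the same whether they are computed from \(p\) or from \(g\).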
Stan takes advantage of this fact with the ~ syntax.
Take for instance the normal likelihood:
data {
  real mu;
  real<lower=0.0> sigma;
}
parameters {
  real x;
}
model {
  x ~ normal(mu, sigma);
}
Syntactically, this is just shorthand for the equivalent model that replaces the ~ syntax with a target += statement and a normal_lupdf function call:
data {
  real mu;
  real<lower=0.0> sigma;
}
parameters {
  real x;
}
model {
  target += normal_lupdf(x | mu, sigma);
}
The function normal_lupdf is only guaranteed to return the log density of the normal distribution up to an additive constant, which corresponds to a proportionality constant on the density to be sampled. The constant itself is not specified. The full log density for the statement here is:
\[ \textsf{normal\_lpdf}(x | \mu, \sigma) = -\log \left( \sigma \sqrt{2 \pi} \right) -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \]
Because \(\mu\) and \(\sigma\) are data in this model, \(x\) is the only parameter, so the additive terms in the log density that are not a function of \(x\) can be dropped. In this case it is enough to know only the quadratic term:
\[ \textsf{normal\_lupdf}(x | \mu, \sigma) = -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \]
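As a minimal sketch, the same target can be written by hand using only the quadratic term. This illustrates the formula above; it is not a statement about how normal_lupdf is implemented, since the constant it drops is not specified.

```stan
data {
  real mu;
  real<lower=0.0> sigma;
}
parameters {
  real x;
}
model {
  // Hand-written quadratic term only. The term -log(sigma * sqrt(2 * pi))
  // is left out because it does not depend on the parameter x.
  target += -0.5 * square((x - mu) / sigma);
}
```

With mu and sigma fixed as data, this model defines the same distribution over x as the two models above, because the omitted term is constant in x.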