20.3 Changes of Variables

Changes of variables are applied when the transformation of a parameter is characterized by a distribution. The standard textbook example is the lognormal distribution, which is the distribution of a variable \(y > 0\) whose logarithm \(\log y\) has a normal distribution. The distribution is being assigned to \(\log y\).

The change of variables requires an adjustment to the probability to account for the distortion caused by the transform. For this to work, univariate changes of variables must be monotonic and differentiable everywhere in their support.

For univariate changes of variables, the resulting probability must be scaled by the absolute derivative of the transform.

In the case of log normals, if \(y\)’s logarithm is normal with mean \(\mu\) and deviation \(\sigma\), then the distribution of \(y\) is given by \[ p(y) \ = \ \mathsf{normal}(\log y| \mu, \sigma) \ \left| \frac{d}{dy} \log y \right| \ = \ \mathsf{normal}(\log y| \mu, \sigma) \frac{1}{y}. \] Stan works on the log scale to prevent underflow, where \[ \log p(y) = \log \mathsf{normal}(\log y| \mu, \sigma) - \log y. \]

In Stan, the change of variables can be applied in the sampling statement. To adjust for the curvature, the log probability accumulator is incremented with the log absolute derivative of the transform. The lognormal distribution can thus be implemented directly in Stan as follows.37

parameters {
  real<lower=0> y;
  ...
model {
  log(y) ~ normal(mu, sigma);
  target += -log(y);
  ...

It is important, as always, to declare appropriate constraints on parameters; here y is constrained to be positive.

It would be slightly more efficient to define a local variable for the logarithm, as follows.

model {
  real log_y;
  log_y = log(y);
  log_y ~ normal(mu, sigma);
  target += -log_y;
  ...

If y were declared as data instead of as a parameter, then the adjustment can be ignored because the data will be constant and Stan only requires the log probability up to a constant.

Change of Variables vs. Transformations

This section illustrates the difference between a change of variables and a simple variable transformation. A transformation samples a parameter, then transforms it, whereas a change of variables transforms a parameter, then samples it. Only the latter requires a Jacobian adjustment.

It does not matter whether the probability function is expressed using a sampling statement, such as

log(y) ~ normal(mu, sigma);

or as an increment to the log probability function, as in

target += normal_lpdf(log(y) | mu, sigma);

20.3.0.1 Gamma and Inverse Gamma Distribution {-}

Like the log normal, the inverse gamma distribution is a distribution of variables whose inverse has a gamma distribution. This section contrasts two approaches, first with a transform, then with a change of variables.

The transform based approach to sampling y_inv with an inverse gamma distribution can be coded as follows.

parameters {
  real<lower=0> y;
}
transformed parameters {
  real<lower=0> y_inv;
  y_inv = 1 / y;
}
model {
  y ~ gamma(2,4);
}

The change-of-variables approach to sampling y_inv with an inverse gamma distribution can be coded as follows.

parameters {
  real<lower=0> y_inv;
}
transformed parameters {
  real<lower=0> y;
  y = 1 / y_inv;  // change variables
}
model {
  y ~ gamma(2,4);
  target +=  -2 * log(y_inv);  //  Jacobian adjustment;
}

The Jacobian adjustment is the log of the absolute derivative of the transform, which in this case is

\[ \log \left| \frac{d}{du} \left( \frac{1}{u} \right) \right| \ = \ \log | - u^{-2} | \ = \ \log u^{-2} \ = \ -2 \log u. \]

Multivariate Changes of Variables

In the case of a multivariate transform, the log of the Jacobian of the transform must be added to the log probability accumulator. In Stan, this can be coded as follows in the general case where the Jacobian is not a full matrix.

parameters {
  vector[K] u;      // multivariate parameter
   ...
transformed parameters {
  vector[K] v;     // transformed parameter
  matrix[K, K] J;   // Jacobian matrix of transform
  ... compute v as a function of u ...
  ... compute J[m, n] = d.v[m] / d.u[n] ...
  target += log(fabs(determinant(J)));
  ...
model {
  v ~ ...;
  ...

If the Jacobian is known analytically, it will be more efficient to apply it directly than to call the determinant function, which is neither efficient nor particularly stable numerically.

In many cases, the Jacobian matrix will be triangular, so that only the diagonal elements will be required for the determinant calculation. Triangular Jacobians arise when each element v[k] of the transformed parameter vector only depends on elements u[1], , u[k] of the parameter vector. For triangular matrices, the determinant is the product of the diagonal elements, so the transformed parameters block of the above model can be simplified and made more efficient by recoding as follows.

transformed parameters {
  ...
  vector[K] J_diag;  // diagonals of Jacobian matrix
  ...
  ... compute J[k, k] = d.v[k] / d.u[k] ...
  target += sum(log(J_diag));
  ...

  1. This example is for illustrative purposes only; the recommended way to implement the lognormal distribution in Stan is with the built-in lognormal probability function; see the functions reference manual for details.