18.10 Truncated Random Number Generation

This is an old version, view current version.

18.10 Truncated Random Number Generation

Generation with Inverse CDFs

To generate random numbers, it is often sufficient to invert their cumulative distribution functions. This is built into many of the random number generators. For example, to generate a standard logistic variate, first generate a uniform variate $u \sim \textsf{uniform}(0, 1)$ , then run through the inverse cumulative distribution function, $y = \textrm{logit}(u)$ . If this were not already built in as logistic_rng(0, 1), it could be coded in Stan directly as

real standard_logistic_rng() {
  real u = uniform_rng(0, 1);
  real y = logit(u);
  return y;
}

Following the same pattern, a standard normal RNG could be coded as

real standard_normal_rng() {
  real u = uniform_rng(0, 1);
  real y = inv_Phi(u);
  return y;
}

that is, $y = \Phi^{-1}(u)$ , where $\Phi^{-1}$ is the inverse cumulative distribution function for the standard normal distribution, implemented in the Stan function inv_Phi.

In order to generate non-standard variates of the location-scale variety, the variate is scaled by the scale parameter and shifted by the location parameter. For example, to generate $\textsf{normal}(\mu, \sigma)$ variates, it is enough to generate a uniform variate $u \sim \textsf{uniform}(0, 1)$ , then convert it to a standard normal variate, $z = \Phi(u)$ , where $\Phi$ is the inverse cumulative distribution function for the standard normal, and then, finally, scale and translate it, $y = \mu + \sigma \times z$ . In code,

real my_normal_rng(real mu, real sigma) {
  real u = uniform_rng(0, 1);
  real z = inv_Phi(u);
  real y = mu + sigma * z;
  return y;
}

A robust version of this function would test that the arguments are finite and that sigma is non-negative, e.g.,

  if (is_nan(mu) || is_infinite(mu))
    reject("my_normal_rng: mu must be finite; ",
           "found mu = ", mu);
  if (is_nan(sigma) || is_infinite(sigma) || sigma < 0)
    reject("my_normal_rng: sigma must be finite and non-negative; ",
           "found sigma = ", sigma);

Truncated variate generation

Often truncated uniform variates are needed, as in survival analysis when a time of death is censored beyond the end of the observations. To generate a truncated random variate, the cumulative distribution is used to find the truncation point in the inverse CDF, a uniform variate is generated in range, and then the inverse CDF translates it back.

Truncating below

For example, the following code generates a $\textsf{Weibull}(\alpha, \sigma)$ variate truncated below at a time $t$ ,³¹

real weibull_lb_rng(real alpha, real sigma, real t) {
  real p = weibull_cdf(lt, alpha, sigma);   // cdf for lb
  real u = uniform_rng(p, 1);               // unif in bounds
  real y = sigma * (-log1m(u))^inv(alpha);  // inverse cdf
  return y;
}

Truncating above and below

If there is a lower bound and upper bound, then the CDF trick is used twice to find a lower and upper bound. For example, to generate a $\textsf{normal}(\mu, \sigma)$ truncated to a region $(a, b)$ , the following code suffices,

real normal_lub_rng(real mu, real sigma, real lb, real ub) {
  real p_lb = normal_cdf(lb, mu, sigma);
  real p_ub = normal_cdf(ub, mu, sigma);
  real u = uniform_rng(p_lb, p_ub);
  real y = mu + sigma * inv_Phi(u);
  return y;
}

To make this more robust, all variables should be tested for finiteness, sigma should be tested for positiveness, and lb and ub should be tested to ensure the upper bound is greater than the lower bound. While it may be tempting to compress lines, the variable names serve as a kind of chunking of operations and naming for readability; compare the multiple statement version above with the single statement

  return mu + sigma * inv_Phi(uniform_rng(normal_cdf(lb, mu, sigma),
                                          normal_cdf(ub, mu, sigma)));

for readability. The names like p indicate probabilities, and p_lb and p_ub indicate the probabilities of the bounds. The variable u is clearly named as a uniform variate, and y is used to denote the variate being generated itself.

The original code and impetus for including this in the manual came from the Stan forums post ; by user lcomm, who also explained truncation above and below.↩︎