18.10 Truncated Random Number Generation
Generation with Inverse CDFs
To generate random numbers, it is often sufficient to invert their
cumulative distribution functions. This is built into many of the
random number generators. For example, to generate a standard
logistic variate, first generate a uniform variate
\(u \sim \mathsf{Uniform}(0, 1)\), then run through the inverse
cumulative distribution function, \(y = \mathrm{logit}(u)\). If this
were not already built in as logistic_rng(0, 1)
, it could be
coded in Stan directly as
real standard_logistic_rng() {
real u = uniform_rng(0, 1);
real y = logit(u);
return y;
}
Following the same pattern, a standard normal RNG could be coded as
real standard_normal_rng() {
real u = uniform_rng(0, 1);
real y = Phi(u);
return y;
}
In order to generate non-standard variates of the location-scale variety, the variate is scaled by the scale parameter and shifted by the location parameter. For example, to generate \(\mathsf{normal}(\mu, \sigma)\) variates, it is enough to generate a uniform variate \(u \sim \mathsf{Uniform}(0, 1)\), then convert to a standard normal variate, \(z = \Phi(u)\), where \(\Phi\) is the inverse cumulative distribution function for the standard normal, and then finally, scale and translate, \(y = \mu + \sigma \cdot z\). In code,
real my_normal_rng(real mu, real sigma) {
real u = uniform_rng(0, 1);
real z = Phi(u);
real y = mu + sigma * z;
return y;
}
A robust version of this function would test that the arguments are
finite and that sigma
is non-negative, e.g.,
if (is_nan(mu) || is_infinite(mu))
reject("my_normal_rng: mu must be finite; ",
"found mu = ", mu);
if (is_nan(sigma) || is_infinite(sigma) || sigma < 0)
reject("my_normal_rng: sigma must be finite and non-negative; ",
"found sigma = ", sigma);
Truncated variate generation
Often truncated uniform variates are needed, as in survival analysis when a time of death is censored beyond the end of the observations. To generate a truncated random variate, the cumulative distribution is used to find the truncation point in the inverse CDF, a uniform variate is generated in range, and then the inverse CDF translates it back.
Truncating below
For example, the following code generates a \(\mathsf{Weibull}(\alpha, \sigma)\) variate truncated below at a time \(t\),30
real weibull_lb_rng(real alpha, real sigma, real t) {
real p = weibull_cdf(lt, alpha, sigma); // cdf for lb
real u = uniform_rng(p, 1); // unif in bounds
real y = sigma * (-log1m(u))^inv(alpha); // inverse cdf
return y;
}
Truncating above and below
If there is a lower bound and upper bound, then the CDF trick is used twice to find a lower and upper bound. For example, to generate a \(\mathsf{normal}(\mu, \sigma)\) truncated to a region \((a, b)\), the following code suffices,
real normal_lub_rng(real mu, real sigma, real lb, real ub) {
real p_lb = normal_cdf(lb, mu, sigma);
real p_ub = normal_cdf(ub, mu, sigma);
real u = uniform_rng(p_lb, p_ub);
real y = mu + sigma * Phi(u);
return y;
}
To make this more robust, all variables shold be tested for
finiteness, sigma
should be tested for positiveness, and
lb
and ub
should be tested to ensure the upper bound is
greater than the lower bound. While it may be tempting to compress
lines, the variable names serve as a kind of chunking of operations
and naming for readability; compare the multiple statement version
above with the single statement
return mu + sigma * Phi(uniform_rng(normal_cdf(lb, mu, sigma),
normal_cdf(ub, mu, sigma)));
for readability. The names like p
indicate probabilities, and
p_lb
and p_ub
indicate the probabilities of the
bounds. The variable u
is clearly named as a uniform variate,
and y
is used to denote the variate being generated itself.
The original code and impetus for including this in the manual came from the Stan forums post ; by user
lcomm
, who also explained truncation above and below.↩