
## 4.2 Truncated data

Truncated data are data for which measurements are only reported if they fall above a lower bound, below an upper bound, or between a lower and upper bound.

Truncated data may be modeled in Stan using truncated distributions. For example, suppose the truncated data are $$y_n$$ with an upper truncation point of $$U = 300$$ so that $$y_n < 300$$. In Stan, these data can be modeled as following a truncated normal distribution as follows.

data {
  int<lower=0> N;
  real U;
  real<upper=U> y[N];
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  for (n in 1:N)
    y[n] ~ normal(mu, sigma) T[,U];
}

The model declares the upper bound U as data and constrains every element of y to lie at or below it; this constraint is checked when the data are loaded into the model, before sampling begins.
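In density terms, the truncation statement divides the normal density by the probability mass that lies below the truncation point. Written out, with $$\Phi$$ denoting the standard normal cumulative distribution function (notation assumed here, not taken from the text above), the density for each observation is roughly:

```latex
p(y_n \mid \mu, \sigma, U)
  = \frac{\mathrm{Normal}(y_n \mid \mu, \sigma)}
         {\Phi\!\left(\frac{U - \mu}{\sigma}\right)}
  \quad \text{for } y_n \le U,
\qquad
p(y_n \mid \mu, \sigma, U) = 0 \quad \text{for } y_n > U .
```

The denominator depends on mu and sigma, which is why the truncation changes the posterior and cannot be dropped as a constant.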

This model implicitly uses an improper flat prior on the scale and location parameters; these could be given priors in the model using sampling statements.
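A minimal sketch of the same model with proper priors added is shown here. The particular prior families and hyperparameter values are placeholders chosen for illustration, not recommendations from the text; they should be replaced with priors suited to the scale of the data.

```stan
data {
  int<lower=0> N;
  real U;
  real<upper=U> y[N];
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // Illustrative priors only; the hyperparameters are assumptions.
  mu ~ normal(0, 100);
  sigma ~ cauchy(0, 5);  // acts as a half-Cauchy because sigma > 0
  for (n in 1:N)
    y[n] ~ normal(mu, sigma) T[,U];
}
```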

### Constraints and out-of-bounds returns

If the sampled variate in a truncated distribution lies outside of the truncation range, the probability is zero, so the log probability will evaluate to $$-\infty$$. For instance, if variate y is sampled with the statement

for (n in 1:N)
  y[n] ~ normal(mu, sigma) T[L,U];

then if the value of y[n] is less than the value of L or greater than the value of U, the sampling statement produces a zero-probability estimate. For user-defined truncation, this zeroing outside of truncation bounds must be handled explicitly.
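One way the explicit handling might look for a hand-written truncated normal is sketched below. This is an illustration under stated assumptions, not the code Stan generates: it assumes L and U are supplied as data, and it builds the normalizing term by hand from normal_lpdf, normal_lcdf, and log_diff_exp.

```stan
data {
  int<lower=0> N;
  real L;
  real<lower=L> U;
  real y[N];  // deliberately unconstrained so the bounds check below matters
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  for (n in 1:N) {
    if (y[n] < L || y[n] > U) {
      // Outside the truncation range: zero probability.
      target += negative_infinity();
    } else {
      // Normal log density minus the log of the mass between L and U.
      target += normal_lpdf(y[n] | mu, sigma)
                - log_diff_exp(normal_lcdf(U | mu, sigma),
                               normal_lcdf(L | mu, sigma));
    }
  }
}
```

Without the first branch, an out-of-range value would still receive a finite log density, which is the error the explicit check guards against.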

To avoid variables straying outside of truncation bounds, appropriate constraints are required. For example, if y is a parameter in the above model, the declaration should constrain it to fall between the values of L and U.

parameters {
  real<lower=L,upper=U> y[N];
  ...

If in the above model, L or U is a parameter and y is data, then L and U must be appropriately constrained so that all data are in range and the value of L is less than that of U (if they are equal, the parameter range collapses to a single point and the Hamiltonian dynamics used by the sampler break down). The following declarations ensure the bounds are well behaved.

parameters {
  real<upper=min(y)> L;           // L < y[n]
  real<lower=fmax(L, max(y))> U;  // L < U; y[n] < U

For pairs of real numbers, the function fmax is used rather than max.

### Unknown truncation points

If the truncation points are unknown, they may be estimated as parameters. This can be done with a slight rearrangement of the variable declarations from the model in the previous section with known truncation points.

data {
  int<lower=1> N;
  real y[N];
}
parameters {
  real<upper = min(y)> L;
  real<lower = max(y)> U;
  real mu;
  real<lower=0> sigma;
}
model {
  L ~ ...;
  U ~ ...;
  for (n in 1:N)
    y[n] ~ normal(mu, sigma) T[L,U];
}

Here the lower truncation point L is declared to be no greater than the minimum value of y, and the upper truncation point U is declared to be no less than the maximum value of y. These declarations, although dependent on the data, only enforce the constraint that the data fall within the truncation bounds. With N declared as type int<lower=1>, there must be at least one data point. The constraint that L is less than U is enforced indirectly, based on the non-empty data.

The ellipses mark where the priors for the bounds L and U belong. Each should be filled in with an informative prior; otherwise the model will concentrate L strongly around min(y) and U strongly around max(y).
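As a purely hypothetical illustration of what informative priors on the bounds could look like, the sketch below passes prior locations and scales in as data. The names L_loc, L_scale, U_loc, and U_scale and the choice of normal priors are assumptions introduced here, not part of the original model; the appropriate prior depends on what is actually known about the truncation mechanism.

```stan
data {
  int<lower=1> N;
  real y[N];
  // Hypothetical hyperparameters encoding prior knowledge of the bounds.
  real L_loc;
  real<lower=0> L_scale;
  real U_loc;
  real<lower=0> U_scale;
}
parameters {
  real<upper=min(y)> L;
  real<lower=max(y)> U;
  real mu;
  real<lower=0> sigma;
}
model {
  // Assumed prior families; substitute whatever reflects real knowledge.
  L ~ normal(L_loc, L_scale);
  U ~ normal(U_loc, U_scale);
  for (n in 1:N)
    y[n] ~ normal(mu, sigma) T[L,U];
}
```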