5.3 Univariate data types and variable declarations

This is an old version, view current version.

5.3 Univariate data types and variable declarations

All variables used in a Stan program must have an explicitly declared data type. The form of a declaration includes the type and the name of a variable. This section covers univariate types, the next section vector and matrix types, and the following section array types.

Unconstrained integer

Unconstrained integers are declared using the int keyword. For example, the variable N is declared to be an integer as follows.

int N;

Constrained integer

Integer data types may be constrained to allow values only in a specified interval by providing a lower bound, an upper bound, or both. For instance, to declare N to be a positive integer, use the following.

int<lower=1> N;

This illustrates that the bounds are inclusive for integers.

To declare an integer variable cond to take only binary values, that is zero or one, a lower and upper bound must be provided, as in the following example.

int<lower=0,upper=1> cond;

Unconstrained real

Unconstrained real variables are declared using the keyword real. The following example declares theta to be an unconstrained continuous value.

real theta;

Constrained real

Real variables may be bounded using the same syntax as integers. In theory (that is, with arbitrary-precision arithmetic), the bounds on real values would be exclusive. Unfortunately, finite-precision arithmetic rounding errors will often lead to values on the boundaries, so they are allowed in Stan.

The variable sigma may be declared to be non-negative as follows.

real<lower=0> sigma;

The following declares the variable x to be less than or equal to \(-1\).

real<upper=-1> x;

To ensure rho takes on values between \(-1\) and \(1\), use the following declaration.

real<lower=-1,upper=1> rho;

Infinite constraints

Lower bounds that are negative infinity or upper bounds that are positive infinity are ignored. Stan provides constants positive_infinity() and negative_infinity() which may be used for this purpose, or they may be read as data in the dump format.

Affinely transformed real

Real variables may be declared on a space that has been transformed using an affine transformation \(x\mapsto \mu + \sigma * x\) with offset \(\mu\) and (positive) multiplier \(\sigma\), using a syntax similar to that for bounds. While these transforms do not change the asymptotic sampling behaviour of the resulting Stan program (in a sense, the model the program implements), they can be useful for making the sampling process more efficient by transforming the geometry of the problem to a more natural multiplier and to a more natural offset for the sampling process, for instance by facilitating a non-centered parameterisation. While these affine transformation declarations do not impose a hard constraint on variables, they behave like the bounds constraints in many ways and could perhaps be viewed as acting as a sort of soft constraint.

The variable x may be declared to have offset \(1\) as follows.

real<offset=1> x;

Similarly, it can be declared to have multiplier \(2\) as follows.

real<multiplier=2> x;

Finally, we can combine both declarations to declare a variable with offset \(1\) and multiplier \(2\).

real<offset=1,multiplier=2> x;

As an example, we can give x a normal distribution with non-centered parameterization as follows.

parameters {
  real<offset=mu,multiplier=sigma> x;
}
model {
  x ~ normal(mu, sigma);
}

Recall that the centered parameterization is achieved with the code

parameters {
  real x;
}
model {
  x ~ normal(mu, sigma);
}

or equivalently

parameters {
  real<offset=0,multiplier=1> x;
}
model {
  x ~ normal(mu, sigma);
}

Expressions as bounds and offset/multiplier

Bounds (and offset and multiplier) for integer or real variables may be arbitrary expressions. The only requirement is that they only include variables that have been declared (though not necessarily defined) before the declaration. If the bounds themselves are parameters, the behind-the-scenes variable transform accounts for them in the log Jacobian.

For example, it is acceptable to have the following declarations.

data {
 real lb;
}
parameters {
   real<lower=lb> phi;
}

This declares a real-valued parameter phi to take values greater than the value of the real-valued data variable lb. Constraints may be complex expressions, but must be of type int for integer variables and of type real for real variables (including constraints on vectors, row vectors, and matrices). Variables used in constraints can be any variable that has been defined at the point the constraint is used. For instance,

data {
   int<lower=1> N;
   real y[N];
}
parameters {
   real<lower=min(y), upper=max(y)> phi;
}

This declares a positive integer data variable N, an array y of real-valued data of length N, and then a parameter ranging between the minimum and maximum value of y. As shown in the example code, the functions min() and max() may be applied to containers such as arrays.

A more subtle case involves declarations of parameters or transformed parameters based on parameters declared previously. For example, the following program will work as intended.

parameters {
  real a;
  real<lower = a> b;  // enforces a < b
}
transformed parameters {
  real c;
  real<lower = c> d;
  c = a;
  d = b;
}

The parameters instance works because all parameters are defined externally before the block is executed. The transformed parameters case works even though c isn’t defined at the point it is used, because constraints on transformed parameters are only validated at the end of the block. Data variables work like parameter variables, whereas transformed data and generated quantity variables work like transformed parameter variables.

Declaring optional variables

A variable may be declared with a size that depends on a boolean constant. For example, consider the definition of alpha in the following program fragment.

data {
  int<lower = 0, upper = 1> include_alpha;
  ...
parameters {
  vector[include_alpha ? N : 0] alpha;

If include_alpha is true, the model will include the vector alpha; if the flag is false, the model will not include alpha (technically, it will include alpha of size 0, which means it won’t contain any values and won’t be included in any output).

This technique is not just useful for containers. If the value of N is set to 1, then the vector alpha will contain a single element and thus alpha[1] behaves like an optional scalar, the existence of which is controlled by include_alpha.

This coding pattern allows a single Stan program to define different models based on the data provided as input. This strategy is used extensively in the implementation of the RStanArm package.