5.3 Univariate data types and variable declarations
All variables used in a Stan program must have an explicitly declared data type. The form of a declaration includes the type and the name of a variable. This section covers univariate types, the next section vector and matrix types, and the following section array types.
Unconstrained integer
Unconstrained integers are declared using the int keyword. For example, the variable N is declared to be an integer as follows.
int N;
Constrained integer
Integer data types may be constrained to allow values only in a specified interval by providing a lower bound, an upper bound, or both. For instance, to declare N to be a positive integer, use the following.
int<lower=1> N;
This illustrates that the bounds are inclusive for integers.
To declare an integer variable cond to take only binary values, that is, zero or one, a lower and upper bound must be provided, as in the following example.
int<lower=0,upper=1> cond;
Unconstrained real
Unconstrained real variables are declared using the keyword real. The following example declares theta to be an unconstrained continuous value.
real theta;
Constrained real
Real variables may be bounded using the same syntax as integers. In theory (that is, with arbitrary-precision arithmetic), the bounds on real values would be exclusive. Unfortunately, finite-precision arithmetic rounding errors will often lead to values on the boundaries, so they are allowed in Stan.
The variable sigma may be declared to be non-negative as follows.
real<lower=0> sigma;
The following declares the variable x to be less than or equal to \(-1\).
real<upper=-1> x;
To ensure rho takes on values between \(-1\) and \(1\), use the following declaration.
real<lower=-1,upper=1> rho;
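As a minimal sketch of how these declarations fit together, the following complete program combines constrained integer data with a constrained real parameter. The variable names y and theta and the beta(2, 2) prior are illustrative assumptions, not part of the text above.

```stan
data {
  int<lower=1> N;                // number of trials; must be at least 1
  int<lower=0> y;                // number of successes; must be non-negative
}
parameters {
  real<lower=0, upper=1> theta;  // success probability, kept in [0, 1]
}
model {
  theta ~ beta(2, 2);            // illustrative prior (an assumption)
  y ~ binomial(N, theta);
}
```

If the supplied data violate a declared bound, for example N = 0, Stan reports an error when the data are read rather than running the model.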
Infinite constraints
Lower bounds that are negative infinity or upper bounds that are positive infinity are ignored. Stan provides constants positive_infinity() and negative_infinity(), which may be used for this purpose, or they may be read as data in the dump format.
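The rule can be illustrated with a short sketch. The variable names a and b and the priors are assumptions chosen for illustration. Going by the statement above that infinite bounds are ignored, each declaration should behave like the simpler form noted in its comment.

```stan
parameters {
  // infinite lower bound is ignored; behaves like: real a;
  real<lower=negative_infinity()> a;
  // infinite upper bound is ignored; behaves like: real<lower=0> b;
  real<lower=0, upper=positive_infinity()> b;
}
model {
  a ~ normal(0, 1);     // illustrative prior (an assumption)
  b ~ exponential(1);   // illustrative prior (an assumption)
}
```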
Affinely transformed real
Real variables may be declared on a space that has been transformed using an affine transformation \(x\mapsto \mu + \sigma * x\) with offset \(\mu\) and (positive) multiplier \(\sigma\), using a syntax similar to that for bounds. These transforms do not change the asymptotic sampling behavior of the resulting Stan program (in a sense, the model the program implements). They can nevertheless make sampling more efficient by shifting and rescaling the geometry of the problem to a more natural offset and multiplier, for instance by facilitating a non-centered parameterization. Although affine transform declarations do not impose a hard constraint on variables, they behave like bounds in many ways and can be viewed as a kind of soft constraint.
The variable x
may be declared to have offset \(1\) as follows.
real<offset=1> x;
Similarly, it can be declared to have multiplier \(2\) as follows.
real<multiplier=2> x;
Finally, we can combine both declarations to declare a variable with offset \(1\) and multiplier \(2\).
real<offset=1,multiplier=2> x;
As an example, we can give x a normal distribution with non-centered parameterization as follows.
parameters {
  real<offset=mu,multiplier=sigma> x;
}
model {
  x ~ normal(mu, sigma);
}
Recall that the centered parameterization is achieved with the code
parameters {
  real x;
}
model {
  x ~ normal(mu, sigma);
}
or equivalently
parameters {
  real<offset=0,multiplier=1> x;
}
model {
  x ~ normal(mu, sigma);
}
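For comparison, the effect of the offset and multiplier declaration can be written out by hand. The sketch below assumes mu and sigma are parameters and introduces a hypothetical auxiliary variable x_raw; the priors on mu and sigma are placeholders.

```stan
parameters {
  real mu;
  real<lower=0> sigma;
  real x_raw;                    // hypothetical standardized parameter
}
transformed parameters {
  real x = mu + sigma * x_raw;   // affine transform: offset mu, multiplier sigma
}
model {
  mu ~ normal(0, 1);             // placeholder prior (an assumption)
  sigma ~ normal(0, 1);          // placeholder half-normal prior (an assumption)
  x_raw ~ normal(0, 1);          // implies x ~ normal(mu, sigma)
}
```

With the offset and multiplier declaration, the sampler works on the same standardized scale as x_raw, but the model block can still be written directly in terms of x.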
Expressions as bounds and offset/multiplier
Bounds (and offset and multiplier) for integer or real variables may be arbitrary expressions. The only requirement is that they only include variables that have been declared (though not necessarily defined) before the declaration. If the bounds themselves are parameters, the behind-the-scenes variable transform accounts for them in the log Jacobian.
For example, it is acceptable to have the following declarations.
data {
  real lb;
}
parameters {
  real<lower=lb> phi;
}
This declares a real-valued parameter phi to take values greater than the value of the real-valued data variable lb.
Constraints may be complex expressions, but must be of type int for integer variables and of type real for real variables (including constraints on vectors, row vectors, and matrices). Variables used in constraints can be any variable that has been defined at the point the constraint is used. For instance,
data {
  int<lower=1> N;
  real y[N];
}
parameters {
  real<lower=min(y), upper=max(y)> phi;
}
This declares a positive integer data variable N, an array y of real-valued data of length N, and then a parameter ranging between the minimum and maximum value of y. As shown in the example code, the functions min() and max() may be applied to containers such as arrays.
A more subtle case involves declarations of parameters or transformed parameters based on parameters declared previously. For example, the following program will work as intended.
parameters {
  real a;
  real<lower = a> b;  // enforces a < b
}
transformed parameters {
  real c;
  real<lower = c> d;
  c = a;
  d = b;
}
The parameters instance works because all parameters are defined externally before the block is executed. The transformed parameters case works even though c isn’t defined at the point it is used, because constraints on transformed parameters are only validated at the end of the block. Data variables work like parameter variables, whereas transformed data and generated quantities variables work like transformed parameter variables.
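To make the log Jacobian remark concrete, the declaration real<lower = a> b can be approximated by hand as in the following sketch. It assumes the lower-bound transform is b = a + exp(b_raw), with a hypothetical unconstrained parameter b_raw; the normal priors are placeholders.

```stan
parameters {
  real a;
  real b_raw;                          // hypothetical unconstrained version of b
}
transformed parameters {
  real<lower = a> b = a + exp(b_raw);  // b >= a by construction
}
model {
  target += b_raw;                     // log Jacobian: log(d b / d b_raw) = b_raw
  a ~ normal(0, 1);                    // placeholder prior (an assumption)
  b ~ normal(0, 1);                    // placeholder prior (an assumption)
}
```

Because the bound a is itself a parameter, the transform depends on it, but the Jacobian adjustment involves only b_raw, which is why the declaration form needs no extra code in the model block.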
Declaring optional variables
A variable may be declared with a size that depends on a boolean constant. For example, consider the definition of alpha in the following program fragment.
data {
int<lower = 0, upper = 1> include_alpha;
...
parameters {
vector[include_alpha ? N : 0] alpha;
If include_alpha is true, the model will include the vector alpha; if the flag is false, the model will not include alpha (technically, it will include alpha of size 0, which means it won’t contain any values and won’t be included in any output).
This technique is not just useful for containers. If the value of N is set to 1, then the vector alpha will contain a single element and thus alpha[1] behaves like an optional scalar, the existence of which is controlled by include_alpha.
This coding pattern allows a single Stan program to define different models based on the data provided as input. This strategy is used extensively in the implementation of the RStanArm package.
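A complete version of the pattern might look like the following sketch, in which alpha acts as an optional scalar intercept. The data variable y, the parameter sigma, and the priors are assumptions added to make the fragment self-contained.

```stan
data {
  int<lower=0> N;
  vector[N] y;
  int<lower=0, upper=1> include_alpha;   // 1 to estimate an intercept, 0 to omit it
}
parameters {
  vector[include_alpha ? 1 : 0] alpha;   // size 1 or size 0
  real<lower=0> sigma;
}
model {
  real mu = 0;                           // mean used when alpha is excluded
  if (include_alpha) {
    mu = alpha[1];                       // only accessed when alpha has an element
    alpha ~ normal(0, 10);               // illustrative prior (an assumption)
  }
  sigma ~ normal(0, 1);                  // illustrative half-normal prior (an assumption)
  y ~ normal(mu, sigma);
}
```

Passing include_alpha = 0 in the data fits the model with the mean fixed at zero; passing include_alpha = 1 adds alpha[1] as an estimated intercept.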