18.1 Basic Functions
Here’s an example of a skeletal Stan program with a userdefined relative difference function employed in the generated quantities block to compute a relative differences between two parameters.
functions {
real relative_diff(real x, real y) {
real abs_diff;
real avg_scale;
abs_diff = fabs(x  y);
avg_scale = (fabs(x) + fabs(y)) / 2;
return abs_diff / avg_scale;
}
}
...
generated quantities {
real rdiff;
rdiff = relative_diff(alpha, beta);
}
The function is named relative_diff
, and is declared to have two realvalued arguments and return a realvalued result. It is used the same way a builtin function would be used in the generated quantities block.
UserDefined Functions Block
All functions are defined in their own block, which is labeled functions
and must appear before all other program blocks. The userdefined functions block is optional.
Function Bodies
The body (the part between the curly braces) contains ordinary Stan code, including local variables. The new function is used in the generated quantities block just as any of Stan’s builtin functions would be used.
Return Statements
Return statements, such as the one on the last line of the definition of relative_diff
above, are only allowed in the bodies of function definitions. Return statements may appear anywhere in a function, but functions with nonvoid return types must end in a return statement.
Reject Statements
The Stan reject statement provides a mechanism to report errors or problematic values encountered during program execution. It accepts any number of quoted string literals or Stan expressions as arguments. This statement is typically embedded in a conditional statement in order to detect bad or illegal outcomes of some processing step.
Catching errors
Rejection is used to flag errors that arise in inputs or in program state. It is far better to fail early with a localized informative error message than to run into problems much further downstream (as in rejecting a state or failing to compute a derivative).
The most common errors that are coded is to test that all of the arguments to a function are legal. The following function takes a square root of its input, so requires nonnegative inputs; it is coded to guard against illegal inputs.
real dbl_sqrt(real x) {
if (!(x >= 0))
reject("dblsqrt(x): x must be positive; found x = ", x);
return 2 * sqrt(x);
}
The negation of the positive test is important, because it also catches the case where x
is a notanumber value. If the condition had been coded as (x < 0)
it would not catch the notanumber case, though it could be written as (x < 0  is_nan(x))
. The positive infinite case is allowed through, but could also be checked with the is_inf(x)
function. The square root function does not itself reject, but some downstream consumer of dbl_sqrt(2)
would be likely to raise an error, at which point the origin of the illegal input requires detective work. Or even worse, as Matt Simpson pointed out in the GitHub comments, the function could go into an infinite loop if it starts with an infinite value and tries to reduce it by arithmetic, likely consuming all available memory and crashing an interface. Much better to catch errors early and report on their origin.
The effect of rejection depends on the program block in which the rejection is executed. In transformed data, rejections cause the program to fail to load. In transformed parameters or in the model block, rejections cause the current state to be rejected in the Metropolis sense.^{29}
In generated quantities, rejections cause execution to halt because there is no way to recover and generate the remaining parameters, so extra care should be taken in calling functions in the generated quantities block.
Type Declarations for Functions
Function Argument  Local  Block 

(unsized)  (unconstrained)  (constrained) 
int 
int 
int 
int<lower = L> 

int<upper = U> 

int<lower = L, upper = U> 

real 
real 
real 
real<lower = L> 

real<upper = U> 

real<lower = L, upper = U> 

vector 
vector[N] 
vector[N] 
vector[N]<lower = L> 

vector[N]<upper = U> 

vector[N]<lower = L, upper = U> 

ordered[N] 

positive_ordered[N] 

simplex[N] 

unit_vector[N] 

row_vector 
row_vector[N] 
row_vector[N] 
row_vector[N]<lower = L> 

row_vector[N]<upper = U> 

row_vector[N]<lower = L, upper = U> 

matrix 
matrix[M, N] 
matrix[M, N] 
matrix[M, N]<lower = L> 

matrix[M, N]<upper = U> 

matrix[M, N]<lower = L, upper = U> 

matrix[K, K] 
corr_matrix[K] 

matrix[K, K] 
cov_matrix[K] 

matrix[K, K] 
cholesky_factor_corr[K] 

matrix[K, K] 
cholesky_factor_cov[K] 
The leftmost column is a list of the unconstrained and undimensioned basic types; these are used as function return types and argument types. The middle column is of unconstrained types with dimensions; these are used as local variable types. The variables M
and N
indicate number of columns and rows, respectively. The variable K
is used for square matrices, i.e., K
denotes both the number of rows and columns. The rightmost column lists the corresponding constrained types. An expression of any righthand column type may be assigned to its corresponding lefthand column basic type. At runtime, dimensions are checked for consistency for all variables; containers of any sizes may be assigned to function arguments. The constrained matrix types cov_matrix[K]
, corr_matrix[K]
, cholesky_factor_cov[K]
, and cholesky_factor_corr[K]
are only assignable to matrices of dimensions matrix[K, K]
types. Stan also allows arrays of any of these types, with slightly different declarations for function arguments and return types and variables.
Function argument and return types for vector and matrix types are not declared with their sizes, unlike type declarations for variables. Function argument type declarations may not be declared with constraints, either lower or upper bounds or structured constraints like forming a simplex or correlation matrix, (as is also the case for local variables); see the table of constrained types for full details.
For example, here’s a function to compute the entropy of a categorical distribution with simplex parameter theta
.
real entropy(vector theta) {
return sum(theta .* log(theta));
}
Although theta
must be a simplex, only the type vector
is used.^{30}
Upper or lower bounds on values or constrained types are not allowed as return types or argument types in function declarations.
Array Types for Function Declarations
Array arguments have their own syntax, which follows that used in this manual for function signatures. For example, a function that operates on a twodimensional array to produce a onedimensional array might be declared as follows.
real[] baz(real[,] x);
The notation [ ]
is used for onedimensional arrays (as in the return above), [ , ]
for twodimensional arrays, [ , , ]
for threedimensional arrays, and so on.
Functions support arrays of any type, including matrix and vector types. As with other types, no constraints are allowed.
Dataonly Function Arguments
A function argument which is a realvalued type or a container of a realvalued type, i.e., not an integer type or integer array type, can be qualified using the prefix qualifier data
. The following is an example of a dataonly function argument.
real foo(real y, data real mu) {
return 0.5 * (y  mu)^2;
}
This qualifier restricts this argument to being invoked with expressions which consist only of data variables, transformed data variables, literals, and function calls. A dataonly function argument cannot involve real variables declared in the parameters, transformed parameters, or model block. Attempts to invoke a function using an expression which contains parameter, transformed parameters, or model block variables as a dataonly argument will result in an error message from the parser.
Use of the data
qualifier must be consistent between the forward declaration and the definition of a functions.
This qualifier should be used when writing functions that call the builtin ordinary differential equation (ODE) solvers, algebraic solvers, or map functions. These higherorder functions have strictly specified signatures where some arguments of are data only expressions. (See the ODE solver chapter for more usage details and the functions reference manual for full definitions.) When writing a function which calls the ODE or algebraic solver, arguments to that function which are passed into the call to the solver, either directly or indirectly, should have the data
prefix qualifier. This allows for compiletime type checking and increases overall program understandability.
Just because this makes it possible to code a rejection sampler does not make it a good idea. Rejections break differentiability and the smooth exploration of the posterior. In Hamiltonian Monte Carlo, it can cause the sampler to be reduced to a diffusive random walk.↩
A range of builtin validation routines is coming to Stan soon! Alternatively, the
reject
statement can be used to check constraints on the simplex.↩