User-Defined Functions
Stan allows users to define their own functions. The basic syntax is a simplified version of that used in C and C++. This chapter specifies how functions are declared, defined, and used in Stan.
Function-definition block
User-defined functions appear in a special function-definition block before all of the other program blocks.
functions {
  // ... function declarations and definitions ...
}
data {
  // ...
Function definitions and declarations may appear in any order. Forward declarations are allowed but not required.
Function names
The rules for function naming and function-argument naming are the same as for other variables; see the section on variables for more information on valid identifiers. For example,
real foo(real mu, real sigma);
declares a function named foo with two argument variables of types real and real. The arguments are named mu and sigma, but that is not part of the declaration.
Function overloading
Multiple user-defined functions may have the same name if they have different sequences of argument types. This is known as function overloading.
For example, the following two functions are both defined with the name add_up
real add_up(real a, real b) {
  return a + b;
}
real add_up(real a, real b, real c) {
  return a + b + c;
}
The return types of overloaded functions do not need to be the same. One could define an additional add_up function as follows
int add_up(int a, int b) {
  return a + b;
}
That being said, functions may not use the same name if their signature only differs by the return type.
For example, the following is not permitted
// illegal
real baz(int x);
int baz(int x);
Function names used in the Stan standard library may be overloaded by user-defined functions. Exceptions to this are the reduce_sum family of functions and ODE integrators, which cannot be overloaded.
Calling functions
All function arguments are mandatory—there are no default values.
Functions as expressions
Functions with non-void return types are called just like any other built-in function in Stan—they are applied to appropriately typed arguments to produce an expression, which has a value when executed.
Functions as statements
Functions with void return types may be applied to arguments and used as statements. These act like sampling statements or print statements. Such uses are only appropriate for functions that act through side effects, such as incrementing the log probability accumulator, printing, or raising exceptions.
Resolving overloads
Overloaded functions alongside type promotion can result in situations where there are multiple valid interpretations of a function call. Stan requires that there be a unique signature which minimizes the number of promotions required.
Consider the following two overloaded functions
real foo(int a, real b);
real foo(real a, int b);
When called with two integer arguments, as in foo(1, 2), each of these signatures requires exactly one promotion, so there is no unique minimum and the call is rejected.
Promotion of integers to complex numbers is considered as two separate promotions, one from int to real and a second from real to complex. Consider the following functions with real and complex signatures
real bar(real x);
real bar(complex z);
A call bar(5) with an integer argument will be resolved to bar(real) because it only requires a single promotion, whereas the promotion to a complex number requires two promotions.
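The resolution rule above can be sketched as a small program. The function bodies here are hypothetical and chosen only so that the two overloads behave differently; get_real is the built-in accessor for the real part of a complex number.

```stan
functions {
  // Hypothetical bodies, for illustration only.
  real bar(real x) {
    return 2 * x;
  }
  real bar(complex z) {
    return get_real(z);
  }
}
generated quantities {
  // 5 is an int: int -> real needs one promotion and int -> complex
  // needs two, so this call should resolve to bar(real).
  real r = bar(5);
}
```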
Argument promotion
The rules for calling functions work the same way as assignment as far as promotion goes. This means that we can promote arguments to the type expected by function arguments. For example, the following will work.
real foo(real x) { return ... }
...
int a = 5;
real b = foo(a); // a promoted to type real
In addition to promoting int to real, Stan also promotes real to complex, and by transitivity, int to complex. Promotion also works for containers, so an array of int may be assigned to an array of real of the same shape. Likewise, vector may be promoted to complex_vector, and similarly for row vectors and matrices.
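As a minimal sketch of container promotion at a call site, assume a hypothetical helper total that takes an array of reals. The names total, counts, and t are illustrative and not part of the original text.

```stan
functions {
  // Hypothetical helper: sums a one-dimensional array of reals.
  real total(array[] real xs) {
    return sum(xs);
  }
}
transformed data {
  array[3] int counts = {1, 2, 3};
  // counts has type array[] int and is promoted to array[] real
  // when passed as an argument.
  real t = total(counts);
}
```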
Probability functions in sampling statements
Functions whose names end in _lpdf or _lpmf (density and mass functions) may be used as probability functions and may be used in place of parameterized distributions on the right-hand side of sampling statements.
Restrictions on placement
Functions of certain types are restricted in where they may be used. Functions whose names end in _lp assume access to the log probability accumulator and are only available in the transformed parameters and model blocks.
Functions whose names end in _rng assume access to the random number generator and may only be used within the generated quantities block, transformed data block, and within user-defined functions ending in _rng.
Functions whose names end in _lpdf and _lpmf can be used anywhere. However, _lupdf and _lupmf functions can only be used in the model block or user-defined probability functions.
See the section on function bodies for more information on these special types of function.
Argument types and qualifiers
Stan’s functions all have declared types for both arguments and returned value. As with built-in functions, user-defined functions are only declared for base argument type and dimensionality. This requires a different syntax than for declaring other variables. The choice of language was made so that return types and argument types could use the same declaration syntax.
The type void may not be used as an argument type, only a return type for a function with side effects.
Base variable type declaration
The base variable types are int, real, complex, vector, row_vector, and matrix. No lower-bound or upper-bound constraints are allowed (e.g., real<lower=0> is illegal). Specialized constrained types are also not allowed (e.g., simplex is illegal).
Tuple types of the form tuple(T1, ..., TN) are also allowed, with all of the types T1 to TN being function argument types (i.e., no constraints and no sizes).
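A small sketch of a tuple-typed signature follows, assuming a Stan version with tuple support. The function name swap_pair is hypothetical; note that neither the argument nor the return type carries sizes or constraints.

```stan
functions {
  // Hypothetical function: takes a tuple and returns its
  // components in the opposite order.
  tuple(real, int) swap_pair(tuple(int, real) p) {
    return (p.2, p.1);
  }
}
```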
Dimensionality declaration
Arguments and return types may be arrays, and these are indicated with optional brackets and commas as would be used for indexing. For example, int denotes a single integer argument or return, whereas array[] real indicates a one-dimensional array of reals, array[,] real a two-dimensional array and array[,,] real a three-dimensional array; whitespace is optional, as usual.
The dimensions for vectors and matrices are not included, so that matrix is the type of a single matrix argument or return type. Thus if a variable is declared as matrix a, then a has two indexing dimensions, so that a[1] is a row vector and a[1, 1] a real value. Matrices implicitly have two indexing dimensions. The type declaration array[ , ] matrix b specifies that b is a two-dimensional array of matrices, for a total of four indexing dimensions, with b[1, 1, 1, 1] picking out a real value.
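The indexing rules above can be illustrated with two hypothetical functions (first_row and first_entry are names invented for this sketch). Neither signature mentions sizes.

```stan
functions {
  // matrix has two indexing dimensions; a single index yields a row vector.
  row_vector first_row(matrix a) {
    return a[1];
  }
  // A two-dimensional array of matrices has four indexing dimensions.
  real first_entry(array[,] matrix b) {
    return b[1, 1, 1, 1];
  }
}
```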
Dimensionality checks and exceptions
Function argument and return types are not themselves checked for dimensionality. A matrix of any size may be passed in as a matrix argument. Nevertheless, a user-defined function might call a function (such as a multivariate normal density) that itself does dimensionality checks.
Dimensions of function return values will be checked if they’re assigned to a previously declared variable. They may also be checked if they are used as the argument to a function.
Any errors raised by calls to functions inside user functions or return type mismatches are simply passed on; this typically results in a warning message and rejection of a proposal during sampling or optimization.
Data-only qualifiers
Some of Stan’s built-in functions, like the differential equation solvers, have arguments that must be data. Such data-only arguments must be expressions involving only data, transformed data, and generated quantity variables.
In user-defined functions, the qualifier data may be placed before an argument type declaration to indicate that the argument must be data only. For example,
real foo(data real x) {
  return x^2;
}
requires the argument x to be data only.
Declaring an argument data only allows type inference to proceed in the body of the function so that, for example, the variable may be used as a data-only argument to a built-in function.
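A minimal sketch of how the qualifier constrains call sites, assuming a hypothetical function scaled and a toy model that exists only to show a legal and an illegal call:

```stan
functions {
  // Hypothetical function: x must be data, theta may be anything.
  real scaled(data real x, real theta) {
    return theta * x^2;
  }
}
data {
  real x;
}
parameters {
  real theta;
}
model {
  // OK: x is a data variable.
  theta ~ normal(scaled(x, 1.0), 1);
  // By contrast, scaled(theta, 1.0) would be rejected at compile time
  // because theta is a parameter, not data.
}
```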
Function bodies
The body of a function is between an open curly brace ({) and close curly brace (}). The body may contain local variable declarations at the top of the function body's block and these scope the same way as local variables used in any other statement block.
Any user-defined function may be used in the function body regardless of the order in which the function definitions appear in the file. Self-recursive and mutually recursive functions are possible without any additional declarations.
The only restrictions on statements in function bodies are external, and determine whether the log probability accumulator or random number generators are available; see the rest of this section for details.
Random number generating functions
Functions that call random number generating functions in their bodies must have a name that ends in _rng; attempts to use random-number generators in other functions lead to a compile-time error.
Like other random number generating functions, user-defined functions with names that end in _rng may be used only in the generated quantities block and transformed data block, or within the bodies of user-defined functions ending in _rng. An attempt to use such a function elsewhere results in a compile-time error.
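As an illustrative sketch, the hypothetical function shifted_normal_rng below calls the built-in normal_rng, so its name has to end in _rng, and it is called from the generated quantities block.

```stan
functions {
  // The body calls normal_rng, so the name must end in _rng.
  real shifted_normal_rng(real mu, real sigma, real shift) {
    return normal_rng(mu, sigma) + shift;
  }
}
generated quantities {
  real y_sim = shifted_normal_rng(0, 1, 2.5);
}
```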
Log probability access in functions
Functions that include sampling statements or log probability increment statements must have a name that ends in _lp. Attempts to use sampling statements or increment log probability statements in other functions lead to a compile-time error.
Like the target log density increment statement and sampling statements, user-defined functions with names that end in _lp may only be used in blocks where the log probability accumulator is accessible, namely the transformed parameters and model blocks. An attempt to use such a function elsewhere results in a compile-time error.
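A minimal sketch of an _lp function, using the hypothetical name add_prior_lp. It increments the target inside its body and is called as a statement from the model block.

```stan
functions {
  // Increments the log density, so the name must end in _lp.
  void add_prior_lp(real theta) {
    target += normal_lpdf(theta | 0, 1);
  }
}
parameters {
  real theta;
}
model {
  // Allowed here because the accumulator is accessible in the model block.
  add_prior_lp(theta);
}
```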
Defining probability functions for sampling statements
Functions whose names end in _lpdf and _lpmf (density and mass functions) can be used as probability functions in sampling statements. As with the built-in functions, the first argument will appear on the left of the sampling statement operator (~) in the sampling statement and the other arguments follow. For example, suppose a function returning the log of the density of y given parameter theta is defined as follows.
real foo_lpdf(real y, vector theta) { ... }
Note that for function definitions, the comma is used rather than the vertical bar.
For every custom _lpdf and _lpmf defined there is a corresponding _lupdf and _lupmf defined automatically. The _lupdf and _lupmf versions of the functions cannot be defined directly (to do so will produce an error). The difference between the _lpdf and _lpmf functions and the corresponding _lupdf and _lupmf functions is that if any other unnormalized density functions are used inside the user-defined function, the _lpdf and _lpmf forms of the user-defined function will change these densities to be normalized. The _lupdf and _lupmf forms of the user-defined functions will instead allow other unnormalized density functions to drop additive constants.
The sampling shorthand
z ~ foo(phi);
will have the same effect as incrementing the target with the log of the unnormalized density:
target += foo_lupdf(z | phi);
Other _lupdf and _lupmf functions used in the definition of foo_lpdf will drop additive constants when foo_lupdf is called and will not drop additive constants when foo_lpdf is called.
If there are _lupdf and _lupmf functions used inside the following call to foo_lpdf, they will be forced to normalize (return the equivalent of their _lpdf and _lpmf forms):
target += foo_lpdf(z | phi);
If there are no _lupdf or _lupmf functions used in the definition of foo_lpdf, then there will be no difference between a foo_lpdf or foo_lupdf call.
The unnormalized _lupdf and _lupmf functions can only be used in the model block or in user-defined probability functions (those ending in _lpdf or _lpmf).
The same syntax and shorthand that works for _lpdf also works for log probability mass functions with suffixes _lpmf.
A function that is going to be accessed as a distribution must return the log of the density or mass function it defines.
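The pieces of this section can be put together in one sketch. The body of foo_lpdf is an assumption made purely for illustration (a unit-scale normal centered at the sum of the two entries of theta); the original leaves the body elided.

```stan
functions {
  // Hypothetical body: uses normal_lupdf, so additive constants are
  // dropped when called as foo_lupdf and kept when called as foo_lpdf.
  real foo_lpdf(real y, vector theta) {
    return normal_lupdf(y | theta[1] + theta[2], 1);
  }
}
parameters {
  real z;
  vector[2] phi;
}
model {
  phi ~ std_normal();
  // Same effect as: target += foo_lupdf(z | phi);
  z ~ foo(phi);
}
```

Replacing the last statement with target += foo_lpdf(z | phi); would force the inner normal_lupdf call to include its normalizing constant.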
Parameters are constant
Within function definition bodies, the parameters may be used like any other variable. But the parameters are constant in the sense that they can't be assigned to (i.e., can't appear on the left side of an assignment (=) statement). In other words, their value remains constant throughout the function body. Attempting to assign a value to a function parameter will raise a compile-time error.[1]
Local variables may be declared at the top of the function block and scope as usual.
Return value
Non-void functions must have a return statement that returns an appropriately typed expression. If the expression in a return statement does not have the same type as the return type declared for the function, a compile-time error is raised.
Void functions may use return only without an argument, but return statements are not mandatory.
Return guarantee required
Unlike C++, Stan enforces a syntactic guarantee for non-void functions that ensures control will leave a non-void function through an appropriately typed return statement or because an exception is raised in the execution of the function. To enforce this condition, functions must have a return statement as the last statement in their body. This notion of last is defined recursively in terms of statements that qualify as bodies for functions. The base case is that
- a return statement qualifies,
and the recursive cases are that
- a sequence of statements qualifies if its last statement qualifies,
- a for loop or while loop qualifies if its body qualifies, and
- a conditional statement qualifies if it has a default else clause and all of its body statements qualify.
An exception is made for “obviously infinite” loops like while (1), which contain a return statement and no break statements. The only way to exit such a loop is to return, so such loops are treated as returning statements.
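A sketch of a function that relies on this exception follows. The function next_pow2 is hypothetical; it is assumed to be accepted because its while (1) loop contains a return and no break.

```stan
functions {
  // Hypothetical: smallest power of two that is at least n.
  int next_pow2(int n) {
    int p = 1;
    while (1) {
      if (p >= n) {
        return p;  // the only way out of the loop
      }
      p *= 2;
    }
  }
}
```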
These rules disqualify
real foo(real x) {
  if (x > 2) {
    return 1.0;
  } else if (x <= 2) {
    return -1.0;
  }
}
because there is no default else clause, and disqualify
real foo(real x) {
  real y;
  y = x;
  while (x < 10) {
    if (x > 0) {
      return x;
    }
    y = x / 2;
  }
}
because the return statement is not the last statement in the while loop. A bogus dummy return could be placed after the while loop in this case. The rules for returns allow
real log_fancy(real x) {
  if (x < 1e-30) {
    return x;
  } else if (x < 1e-14) {
    return x * x;
  } else {
    return log(x);
  }
}
because there’s a default else clause and each condition body has return as its final statement.
Void Functions as Statements
Void functions
A function can be declared without a return value by using void in place of a return type. Note that the type void may only be used as a return type; arguments may not be declared to be of type void.
Usage as statement
A void function may be used as a statement.
Because there is no return value, such a usage is only for side effects, such as incrementing the log probability accumulator, printing, or raising an error.
Special return statements
In a return statement within a void function's definition, the return keyword is followed immediately by a semicolon (;) rather than by the expression whose value is returned.
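As a sketch of a void function used as a statement, the hypothetical check_positive below acts only through a side effect (raising an error with reject) and ends with a bare return, which is optional.

```stan
functions {
  // Hypothetical validation helper with no return value.
  void check_positive(real x) {
    if (x <= 0) {
      reject("x must be positive; found x = ", x);
    }
    return;  // bare return; could be omitted
  }
}
transformed data {
  // Used as a statement, purely for its side effect.
  check_positive(1.5);
}
```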
Declarations
Stan supports forward declarations, which look like function definitions without bodies. For example,
real unit_normal_lpdf(real y);
declares a function named unit_normal_lpdf that consumes a single real-valued input and produces a real-valued output. Declaring a function without a definition is only really useful when using an extension which supplies the definition in C++ rather than in the Stan code itself. How exactly this can be accomplished will differ depending on your Stan interface.
A function definition with a body simultaneously declares and defines the named function, as in
real unit_normal_lpdf(real y) {
  return -0.5 * square(y);
}
A function can be declared and (perhaps separately) defined at most once. However, functions with different argument types are considered distinct even if they have the same name; see the section on function overloading.
Footnotes
[1] Despite being declared constant and appearing to have a pass-by-value syntax in Stan, the implementation of the language passes function arguments by constant reference in C++.