User-Defined Functions

Stan allows users to define their own functions. The basic syntax is a simplified version of that used in C and C++. This chapter specifies how functions are declared, defined, and used in Stan.

Function-definition block

User-defined functions appear in a special function-definition block before all of the other program blocks.

functions {
   // ... function declarations and definitions ...
}
data {
  // ...

Function definitions and declarations may appear in any order. Forward declarations are allowed but not required.

Function names

The rules for function naming and function-argument naming are the same as for other variables; see the section on variables for more information on valid identifiers. For example,

real foo(real mu, real sigma);

declares a function named foo with two argument variables of types real and real. The arguments are named mu and sigma, but that is not part of the declaration.

Function overloading

Multiple user-defined functions may have the same name if they have different sequences of argument types. This is known as function overloading.

For example, the following two functions are both defined with the name add_up

real add_up(real a, real b){
  return a + b;
}

real add_up(real a, real b, real c){
  return a + b + c;
}

The return types of overloaded functions do not need to be the same. One could define an additional add_up function as follows

int add_up(int a, int b){
  return a + b;
}

That being said, functions may not use the same name if their signature only differs by the return type.

For example, the following is not permitted

// illegal
real baz(int x);
int baz(int x);

Function names used in the Stan standard library may be overloaded by user-defined functions. Exceptions to this are the reduce_sum family of functions and ODE integrators, which cannot be overloaded.

Calling functions

All function arguments are mandatory—there are no default values.

Functions as expressions

Functions with non-void return types are called just like any other built-in function in Stan—they are applied to appropriately typed arguments to produce an expression, which has a value when executed.

Functions as statements

Functions with void return types may be applied to arguments and used as statements. These act like distribution statements or print statements. Such uses are only appropriate for functions that act through side effects, such as incrementing the log probability accumulator, printing, or raising exceptions.

Resolving overloads

Overloaded functions alongside type promotion can result in situations where there are multiple valid interpretations of a function call. Stan requires that there be a unique signature which minimizes the number of promotions required.

Consider the following two overloaded functions

real foo(int a, real b);
real foo(real a, int b);

These functions do not have a unique minimum when called with two integer arguments foo(1,2), and therefore cannot be called as such.

Promotion of integers to complex numbers is considered as two separate promotions, one from int to real and a second from real to complex. Consider the following functions with real and complex signatures

real bar(real x);
real bar(complex z);

A call bar(5) with an integer argument will be resolved to bar(real) because it only requires a single promotion, whereas the promotion to a complex number requires two promotions.

Argument promotion

The rules for calling functions work the same way as assignment as far as promotion goes. This means that we can promote arguments to the type expected by function arguments. For example, the following will work.

real foo(real x) { return ... };
...
int a = 5;
real b = foo(a); // a promoted to type real

In addition to promoting int to real, Stan also promotes real to complex, and by transitivity, int to complex. This also works for containers, so an array of int may be assigned to an array of real of the same shape. And we can also promote vector to complex_vector and similarly for row vectors and matrices.

Probability functions in distribution statements

Functions whose name ends in _lpdf or _lpmf (log density and mass functions) may be used as probability functions and may be used in place of parameterized distributions on the right side of statements.qmd#distribution-statements.section.

Restrictions on placement

Functions of certain types are restricted on scope of usage. Functions whose names end in _lp assume access to the log probability accumulator and are only available in the transformed parameters and model blocks.

Functions whose name end in _jacobian assume access to the log probability accumulator may only be used within the transformed parameters block.

Functions whose names end in _rng assume access to the random number generator and may only be used within the generated quantities block, transformed data block, and within user-defined functions ending in _rng.

Functions whose names end in _lpdf and _lpmf can be used anywhere. However, _lupdf and _lupmf functions can only be used in the model block or user-defined probability functions.

See the section on function bodies for more information on these special types of function.

Argument types and qualifiers

Stan’s functions all have declared types for both arguments and returned value. As with built-in functions, user-defined functions are only declared for base argument type and dimensionality. This requires a different syntax than for declaring other variables. The choice of language was made so that return types and argument types could use the same declaration syntax.

The type void may not be used as an argument type, only a return type for a function with side effects.

Base variable type declaration

The base variable types are integer, real, complex, vector, row_vector, and matrix. No lower-bound or upper-bound constraints are allowed (e.g., real<lower=0> is illegal). Specialized constrained types are also not allowed (e.g., simplex is illegal).

Tuple types of the form tuple(T1, ..., TN) are also allowed, with all of the types T1 to TN being function argument types (i.e., no constraints and no sizes).

Dimensionality declaration

Arguments and return types may be arrays, and these are indicated with optional brackets and commas as would be used for indexing. For example, int denotes a single integer argument or return, whereas array[] real indicates a one-dimensional array of reals, array[,] real a two-dimensional array and array[,,] real a three-dimensional array; whitespace is optional, as usual.

The dimensions for vectors and matrices are not included, so that matrix is the type of a single matrix argument or return type. Thus if a variable is declared as matrix a, then a has two indexing dimensions, so that a[1] is a row vector and a[1, 1] a real value. Matrices implicitly have two indexing dimensions. The type declaration matrix[ , ] b specifies that b is a two-dimensional array of matrices, for a total of four indexing dimensions, with b[1, 1, 1, 1] picking out a real value.

Dimensionality checks and exceptions

Function argument and return types are not themselves checked for dimensionality. A matrix of any size may be passed in as a matrix argument. Nevertheless, a user-defined function might call a function (such as a multivariate normal density) that itself does dimensionality checks.

Dimensions of function return values will be checked if they’re assigned to a previously declared variable. They may also be checked if they are used as the argument to a function.

Any errors raised by calls to functions inside user functions or return type mismatches are simply passed on; this typically results in a warning message and rejection of a proposal during sampling or optimization.

Data-only qualifiers

Some of Stan’s built-in functions, like the differential equation solvers, have arguments that must be data. Such data-only arguments must be expressions involving only data, transformed data, and generated quantity variables.

In user-defined functions, the qualifier data may be placed before an argument type declaration to indicate that the argument must be data only. For example,

real foo(data real x) {
  return x^2;
}

requires the argument x to be data only.

Declaring an argument data only allows type inference to proceed in the body of the function so that, for example, the variable may be used as a data-only argument to a built-in function.

Function bodies

The body of a function is between an open curly brace ({) and close curly brace (}). The body may contain local variable declarations at the top of the function body’s block and these scope the same way as local variables used in any other statement block.

Any user-defined function may be used in the function body regardless of the order in which the function definitions appear in the file. Self-recursive and mutually recursive functions are possible without any additional declarations.

The only restrictions on statements in function bodies are external, and determine whether the log probability accumulator or random number generators are available; see the rest of this section for details.

Random number generating functions

Functions that call random number generating functions in their bodies must have a name that ends in _rng; attempts to use random-number generators in other functions lead to a compile-time error.

Like other random number generating functions, user-defined functions with names that end in _rng may be used only in the generated quantities block and transformed data block, or within the bodies of user-defined functions ending in _rng. An attempt to use such a function elsewhere results in a compile-time error.

Log probability access in functions

Functions that include distribution statements or log probability increment statements must have a name that ends in _lp. Attempts to use distribution statements or increment log probability statements in other functions lead to a compile-time error.

Like the target log density increment statement and distribution statements, user-defined functions with names that end in _lp may only be used in blocks where the log probability accumulator is accessible, namely the transformed parameters and model blocks. An attempt to use such a function elsewhere results in a compile-time error.

Defining probability functions for distribution statements

Functions whose names end in _lpdf and _lpmf (density and mass functions) can be used as probability functions in distribution statements. As with the built-in functions, the first argument will appear on the left of the distribution statement operator (~) in the distribution statement and the other arguments follow. For example, suppose a function returning the log of the density of y given parameter theta allows the use of the distribution statement is defined as follows.

real foo_lpdf(real y, vector theta) { ... }

Note that for function definitions, the comma is used rather than the vertical bar.

For every custom _lpdf and _lpmf defined there is a corresponding _lupdf and _lupmf defined automatically. The _lupdf and _lupmf versions of the functions cannot be defined directly (to do so will produce an error). The difference in the _lpdf and _lpmf and the corresponding _lupdf and _lupmf functions is that if any other unnormalized density functions are used inside the user-defined function, the _lpdf and _lpmf forms of the user-defined function will change these densities to be normalized. The _lupdf and _lupmf forms of the user-defined functions will instead allow other unnormalized density functions to drop additive constants.

The distribution statement shorthand

z ~ foo(phi);

will have the same effect as incrementing the target with the log of the unnormalized density:

target += foo_lupdf(z | phi);

Other _lupdf and _lupmf functions used in the definition of foo_lpdf will drop additive constants when foo_lupdf is called and will not drop additive constants when foo_lpdf is called.

If there are _lupdf and _lupmf functions used inside the following call to foo_lpdf, they will be forced to normalize (return the equivalent of their _lpdf and _lpmf forms):

target += foo_lpdf(z | phi);

If there are no _lupdf or _lupmf functions used in the definition of foo_lpdf, then there will be no difference between a foo_lpdf or foo_lupdf call.

The unnormalized _lupdf and _lupmf functions can only be used in the model block or in user-defined probability functions (those ending in _lpdf or _lpmf).

The same syntax and shorthand that works for _lpdf also works for log probability mass functions with suffixes _lpmf.

A function that is going to be accessed as distributions must return the log of the density or mass function it defines.

Parameters are constant

Within function definition bodies, the parameters may be used like any other variable. But the parameters are constant in the sense that they can’t be assigned to (i.e., can’t appear on the left side of an assignment (=) statement). In other words, their value remains constant throughout the function body. Attempting to assign a value to a function parameter value will raise a compile-time error.¹

Local variables may be declared at the top of the function block and scope as usual.

Return value

Non-void functions must have a return statement that returns an appropriately typed expression. If the expression in a return statement does not have the same type as the return type declared for the function, a compile-time error is raised.

Void functions may use return only without an argument, but return statements are not mandatory.

Return guarantee required

Unlike C++, Stan enforces a syntactic guarantee for non-void functions that ensures control will leave a non-void function through an appropriately typed return statement or because an exception is raised in the execution of the function. To enforce this condition, functions must have a return statement as the last statement in their body. This notion of last is defined recursively in terms of statements that qualify as bodies for functions. The base case is that

a return statement qualifies,

and the recursive cases are that

a sequence of statements qualifies if its last statement qualifies,
a for loop or while loop qualifies if its body qualifies, and
a conditional statement qualifies if it has a default else clause and all of its body statements qualify.

An exception is made for “obviously infinite” loops like while (1), which contain a return statement and no break statements. The only way to exit such a loop is to return, so they are considered as returning statements.

These rules disqualify

real foo(real x) {
  if (x > 2) {
    return 1.0;
  } else if (x <= 2) {
    return -1.0;
  }
}

because there is no default else clause, and disqualify

real foo(real x) {
  real y;
  y = x;
  while (x < 10) {
    if (x > 0) {
      return x;
    }
    y = x / 2;
  }
}

because the return statement is not the last statement in the while loop. A bogus dummy return could be placed after the while loop in this case. The rules for returns allow

real log_fancy(real x) {
  if (x < 1e-30) {
    return x;
  } else if (x < 1e-14) {
    return x * x;
  } else {
    return log(x);
  }
}

because there’s a default else clause and each condition body has return as its final statement.

Void Functions as Statements

Void functions

A function can be declared without a return value by using void in place of a return type. Note that the type void may only be used as a return type—arguments may not be declared to be of type void.

Usage as statement

A void function may be used as a statement.

Because there is no return, such a usage is only for side effects, such as incrementing the log probability function, printing, or raising an error.

Special return statements

In a return statement within a void function’s definition, the return keyword is followed immediately by a semicolon (;) rather than by the expression whose value is returned.

Declarations

Stan supports forward declarations, which look like function definitions without bodies. For example,

real unit_normal_lpdf(real y);

declares a function named unit_normal_lpdf that consumes a single real-valued input and produces a real-valued output. Declaring a function without a definition is only really useful when using an extension which supplies the definition in C++ rather than in the Stan code itself. How exactly this can be accomplished will differ depending on your Stan interface.

A function definition with a body simultaneously declares and defines the named function, as in

real unit_normal_lpdf(real y) {
  return -0.5 * square(y);
}

A function can be declared and (perhaps separately) defined at most once. However, functions with different argument types are considered distinct even if they have the same name; see the section on function overloading.

Footnotes

Despite being declared constant and appearing to have a pass-by-value syntax in Stan, the implementation of the language passes function arguments by constant reference in C++.↩︎