# User-Defined Functions

Stan allows users to define their own functions. The basic syntax is a simplified version of that used in C and C++. This chapter specifies how functions are declared, defined, and used in Stan.

## Function-definition block

User-defined functions appear in a special function-definition block before all of the other program blocks.

```
functions {
// ... function declarations and definitions ...
}data {
// ...
```

Function definitions and declarations may appear in any order. Forward declarations are allowed but not required.

## Function names

The rules for function naming and function-argument naming are the same as for other variables; see the section on variables for more information on valid identifiers. For example,

`real foo(real mu, real sigma);`

declares a function named `foo`

with two argument variables of types `real`

and `real`

. The arguments are named `mu`

and `sigma`

, but that is not part of the declaration.

### Function overloading

Multiple user-defined functions may have the same name if they have different sequences of argument types. This is known as function overloading.

For example, the following two functions are both defined with the name `add_up`

```
real add_up(real a, real b){
return a + b;
}
real add_up(real a, real b, real c){
return a + b + c;
}
```

The return types of overloaded functions do not need to be the same. One could define an additional `add_up`

function as follows

```
int add_up(int a, int b){
return a + b;
}
```

That being said, functions may **not** use the same name if their signature *only* differs by the return type.

For example, the following is not permitted

```
// illegal
real baz(int x);
int baz(int x);
```

Function names used in the Stan standard library may be overloaded by user-defined functions. Exceptions to this are the `reduce_sum`

family of functions and ODE integrators, which cannot be overloaded.

## Calling functions

All function arguments are mandatory—there are no default values.

### Functions as expressions

Functions with non-void return types are called just like any other built-in function in Stan—they are applied to appropriately typed arguments to produce an expression, which has a value when executed.

### Functions as statements

Functions with void return types may be applied to arguments and used as statements.qmd. These act like distribution statements or print statements. Such uses are only appropriate for functions that act through side effects, such as incrementing the log probability accumulator, printing, or raising exceptions.

### Resolving overloads

Overloaded functions alongside type promotion can result in situations where there are multiple valid interpretations of a function call. Stan requires that there be a unique signature which minimizes the number of promotions required.

Consider the following two overloaded functions

```
real foo(int a, real b);
real foo(real a, int b);
```

These functions do **not** have a unique minimum when called with two integer arguments `foo(1,2)`

, and therefore cannot be called as such.

Promotion of integers to complex numbers is considered as two separate promotions, one from `int`

to `real`

and a second from `real`

to `complex`

. Consider the following functions with `real`

and `complex`

signatures

```
real bar(real x);
real bar(complex z);
```

A call `bar(5)`

with an integer argument will be resolved to `bar(real)`

because it only requires a single promotion, whereas the promotion to a complex number requires two promotions.

### Argument promotion

The rules for calling functions work the same way as assignment as far as promotion goes. This means that we can promote arguments to the type expected by function arguments. For example, the following will work.

```
real foo(real x) { return ... };
...int a = 5;
real b = foo(a); // a promoted to type real
```

In addition to promoting `int`

to `real`

, Stan also promotes `real`

to `complex`

, and by transitivity, `int`

to `complex`

. This also works for containers, so an array of `int`

may be assigned to an array of `real`

of the same shape. And we can also promote `vector`

to `complex_vector`

and similarly for row vectors and matrices.

### Probability functions in distribution statements

Functions whose name ends in `_lpdf`

or `_lpmf`

(log density and mass functions) may be used as probability functions and may be used in place of parameterized distributions on the right side of statements.qmd#distribution-statements.section.

### Restrictions on placement

Functions of certain types are restricted on scope of usage. Functions whose names end in `_lp`

assume access to the log probability accumulator and are only available in the transformed parameter and model blocks.

Functions whose names end in `_rng`

assume access to the random number generator and may only be used within the generated quantities block, transformed data block, and within user-defined functions ending in `_rng`

.

Functions whose names end in `_lpdf`

and `_lpmf`

can be used anywhere. However, `_lupdf`

and `_lupmf`

functions can only be used in the model block or user-defined probability functions.

See the section on function bodies for more information on these special types of function.

## Argument types and qualifiers

Stan’s functions all have declared types for both arguments and returned value. As with built-in functions, user-defined functions are only declared for base argument type and dimensionality. This requires a different syntax than for declaring other variables. The choice of language was made so that return types and argument types could use the same declaration syntax.

The type `void`

may not be used as an argument type, only a return type for a function with side effects.

### Base variable type declaration

The base variable types are `integer`

, `real`

, `complex`

, `vector`

, `row_vector`

, and `matrix`

. No lower-bound or upper-bound constraints are allowed (e.g., `real<lower=0>`

is illegal). Specialized constrained types are also not allowed (e.g., `simplex`

is illegal).

Tuple types of the form `tuple(T1, ..., TN)`

are also allowed, with all of the types `T1`

to `TN`

being function argument types (i.e., no constraints and no sizes).

### Dimensionality declaration

Arguments and return types may be arrays, and these are indicated with optional brackets and commas as would be used for indexing. For example, `int`

denotes a single integer argument or return, whereas `array[] real`

indicates a one-dimensional array of reals, `array[,] real`

a two-dimensional array and `array[,,] real`

a three-dimensional array; whitespace is optional, as usual.

The dimensions for vectors and matrices are not included, so that `matrix`

is the type of a single matrix argument or return type. Thus if a variable is declared as `matrix a`

, then `a`

has two indexing dimensions, so that `a[1]`

is a row vector and `a[1, 1]`

a real value. Matrices implicitly have two indexing dimensions. The type declaration `matrix[ , ] b`

specifies that `b`

is a two-dimensional array of matrices, for a total of four indexing dimensions, with `b[1, 1, 1, 1]`

picking out a real value.

### Dimensionality checks and exceptions

Function argument and return types are not themselves checked for dimensionality. A matrix of any size may be passed in as a matrix argument. Nevertheless, a user-defined function might call a function (such as a multivariate normal density) that itself does dimensionality checks.

Dimensions of function return values will be checked if they’re assigned to a previously declared variable. They may also be checked if they are used as the argument to a function.

Any errors raised by calls to functions inside user functions or return type mismatches are simply passed on; this typically results in a warning message and rejection of a proposal during sampling or optimization.

### Data-only qualifiers

Some of Stan’s built-in functions, like the differential equation solvers, have arguments that must be data. Such data-only arguments must be expressions involving only data, transformed data, and generated quantity variables.

In user-defined functions, the qualifier `data`

may be placed before an argument type declaration to indicate that the argument must be data only. For example,

```
real foo(data real x) {
return x^2;
}
```

requires the argument `x`

to be data only.

Declaring an argument data only allows type inference to proceed in the body of the function so that, for example, the variable may be used as a data-only argument to a built-in function.

## Function bodies

The body of a function is between an open curly brace (`{`

) and close curly brace (`}`

). The body may contain local variable declarations at the top of the function body’s block and these scope the same way as local variables used in any other statement block.

Any user-defined function may be used in the function body regardless of the order in which the function definitions appear in the file. Self-recursive and mutually recursive functions are possible without any additional declarations.

The only restrictions on statements in function bodies are external, and determine whether the log probability accumulator or random number generators are available; see the rest of this section for details.

### Random number generating functions

Functions that call random number generating functions in their bodies must have a name that ends in `_rng`

; attempts to use random-number generators in other functions lead to a compile-time error.

Like other random number generating functions, user-defined functions with names that end in `_rng`

may be used only in the generated quantities block and transformed data block, or within the bodies of user-defined functions ending in `_rng`

. An attempt to use such a function elsewhere results in a compile-time error.

### Log probability access in functions

Functions that include statements.qmd#distribution-statements.section or statements.qmd#increment-log-prob.section must have a name that ends in `_lp`

. Attempts to use distribution statements or increment log probability statements in other functions lead to a compile-time error.

Like the target log density increment statement and distribution statements, user-defined functions with names that end in `_lp`

may only be used in blocks where the log probability accumulator is accessible, namely the transformed parameters and model blocks. An attempt to use such a function elsewhere results in a compile-time error.

### Defining probability functions for distribution statements

Functions whose names end in `_lpdf`

and `_lpmf`

(density and mass functions) can be used as probability functions in distribution statements. As with the built-in functions, the first argument will appear on the left of the distribution statement operator (`~`

) in the distribution statement and the other arguments follow. For example, suppose a function returning the log of the density of `y`

given parameter `theta`

allows the use of the distribution statement is defined as follows.

`real foo_lpdf(real y, vector theta) { ... }`

Note that for function definitions, the comma is used rather than the vertical bar.

For every custom `_lpdf`

and `_lpmf`

defined there is a corresponding `_lupdf`

and `_lupmf`

defined automatically. The `_lupdf`

and `_lupmf`

versions of the functions cannot be defined directly (to do so will produce an error). The difference in the `_lpdf`

and `_lpmf`

and the corresponding `_lupdf`

and `_lupmf`

functions is that if any other unnormalized density functions are used inside the user-defined function, the `_lpdf`

and `_lpmf`

forms of the user-defined function will change these densities to be normalized. The `_lupdf`

and `_lupmf`

forms of the user-defined functions will instead allow other unnormalized density functions to drop additive constants.

The distribution statement shorthand

` z ~ foo(phi);`

will have the same effect as incrementing the target with the log of the unnormalized density:

`target += foo_lupdf(z | phi);`

Other `_lupdf`

and `_lupmf`

functions used in the definition of `foo_lpdf`

will drop additive constants when `foo_lupdf`

is called and will not drop additive constants when `foo_lpdf`

is called.

If there are `_lupdf`

and `_lupmf`

functions used inside the following call to `foo_lpdf`

, they will be forced to normalize (return the equivalent of their `_lpdf`

and `_lpmf`

forms):

`target += foo_lpdf(z | phi);`

If there are no `_lupdf`

or `_lupmf`

functions used in the definition of `foo_lpdf`

, then there will be no difference between a `foo_lpdf`

or `foo_lupdf`

call.

The unnormalized `_lupdf`

and `_lupmf`

functions can only be used in the model block or in user-defined probability functions (those ending in `_lpdf`

or `_lpmf`

).

The same syntax and shorthand that works for `_lpdf`

also works for log probability mass functions with suffixes `_lpmf`

.

A function that is going to be accessed as distributions must return the log of the density or mass function it defines.

## Parameters are constant

Within function definition bodies, the parameters may be used like any other variable. But the parameters are constant in the sense that they can’t be assigned to (i.e., can’t appear on the left side of an assignment (`=`

) statement). In other words, their value remains constant throughout the function body. Attempting to assign a value to a function parameter value will raise a compile-time error.^{1}

Local variables may be declared at the top of the function block and scope as usual.

## Return value

Non-void functions must have a return statement that returns an appropriately typed expression. If the expression in a return statement does not have the same type as the return type declared for the function, a compile-time error is raised.

Void functions may use `return`

only without an argument, but return statements are not mandatory.

### Return guarantee required

Unlike C++, Stan enforces a syntactic guarantee for non-void functions that ensures control will leave a non-void function through an appropriately typed return statement or because an exception is raised in the execution of the function. To enforce this condition, functions must have a return statement as the last statement in their body. This notion of last is defined recursively in terms of statements that qualify as bodies for functions. The base case is that

- a return statement qualifies,

and the recursive cases are that

- a sequence of statements qualifies if its last statement qualifies,
- a for loop or while loop qualifies if its body qualifies, and
- a conditional statement qualifies if it has a default else clause and all of its body statements qualify.

An exception is made for “obviously infinite” loops like `while (1)`

, which contain a `return`

statement and no `break`

statements. The only way to exit such a loop is to return, so they are considered as returning statements.

These rules disqualify

```
real foo(real x) {
if (x > 2) {
return 1.0;
else if (x <= 2) {
} return -1.0;
} }
```

because there is no default `else`

clause, and disqualify

```
real foo(real x) {
real y;
y = x;while (x < 10) {
if (x > 0) {
return x;
}2;
y = x /
} }
```

because the return statement is not the last statement in the while loop. A bogus dummy return could be placed after the while loop in this case. The rules for returns allow

```
real log_fancy(real x) {
if (x < 1e-30) {
return x;
else if (x < 1e-14) {
} return x * x;
else {
} return log(x);
} }
```

because there’s a default else clause and each condition body has return as its final statement.

## Void Functions as Statements

### Void functions

A function can be declared without a return value by using `void`

in place of a return type. Note that the type `void`

may only be used as a return type—arguments may not be declared to be of type `void`

.

### Usage as statement

A void function may be used as a statement.

Because there is no return, such a usage is only for side effects, such as incrementing the log probability function, printing, or raising an error.

### Special return statements

In a return statement within a void function’s definition, the `return`

keyword is followed immediately by a semicolon (`;`

) rather than by the expression whose value is returned.

## Declarations

Stan supports forward declarations, which look like function definitions without bodies. For example,

`real unit_normal_lpdf(real y);`

declares a function named `unit_normal_lpdf`

that consumes a single real-valued input and produces a real-valued output. Declaring a function without a definition is only really useful when using an extension which supplies the definition in C++ rather than in the Stan code itself. How exactly this can be accomplished will differ depending on your Stan interface.

A function definition with a body simultaneously declares and defines the named function, as in

```
real unit_normal_lpdf(real y) {
return -0.5 * square(y);
}
```

A function can be declared and (perhaps separately) defined at most once. However, functions with different argument types are considered distinct even if they have the same name; see the section on function overloading.

## Footnotes

Despite being declared constant and appearing to have a pass-by-value syntax in Stan, the implementation of the language passes function arguments by constant reference in C++.↩︎