Language Syntax

This chapter defines the basic syntax of the Stan modeling language using a Backus-Naur form (BNF) grammar plus extra-grammatical constraints on function typing and operator precedence and associativity.

BNF grammars

Syntactic conventions

In the following BNF grammars, tokens are represented in ALLCAPS. Grammar non-terminals are surrounded by < and >. A square brackets ([A]) indicates optionality of A. A postfixed Kleene star (A*) indicates zero or more occurrences of A. Parenthesis can be used to group symbols together in productions.

Finally, this grammar uses the concept of “parameterized nonterminals” as used in the parsing library Menhir. A rule like <list(x)> ::= x (COMMA x)* declares a generic list rule, which can later be applied to others by the symbol <list(<expression>)>.

The following representation is constructed directly from the OCaml reference parser using a tool called Obelisk. The raw output is available here.

Programs

<program> ::= [<function_block>] [<data_block>] [<transformed_data_block>]
              [<parameters_block>] [<transformed_parameters_block>]
              [<model_block>] [<generated_quantities_block>] EOF

<functions_only> ::= <function_def>* EOF

<function_block> ::= FUNCTIONBLOCK LBRACE <function_def>* RBRACE

<data_block> ::= DATABLOCK LBRACE <top_var_decl_no_assign>* RBRACE

<transformed_data_block> ::= TRANSFORMEDDATABLOCK LBRACE
                             <top_vardecl_or_statement>* RBRACE

<parameters_block> ::= PARAMETERSBLOCK LBRACE <top_var_decl_no_assign>*
                       RBRACE

<transformed_parameters_block> ::= TRANSFORMEDPARAMETERSBLOCK LBRACE
                                   <top_vardecl_or_statement>* RBRACE

<model_block> ::= MODELBLOCK LBRACE <vardecl_or_statement>* RBRACE

<generated_quantities_block> ::= GENERATEDQUANTITIESBLOCK LBRACE
                                 <top_vardecl_or_statement>* RBRACE

Function declarations and definitions

<function_def> ::= <return_type> <decl_identifier> LPAREN [<arg_decl> (COMMA
                   <arg_decl>)*] RPAREN <statement>

<return_type> ::= VOID
                | <unsized_type>

<arg_decl> ::= [DATABLOCK] <unsized_type> <decl_identifier>

<unsized_type> ::= ARRAY <unsized_dims> <basic_type>
                 | ARRAY <unsized_dims> <unsized_tuple_type>
                 | <basic_type>
                 | <unsized_tuple_type>

<unsized_tuple_type> ::= TUPLE LPAREN <unsized_type> COMMA <unsized_type>
                         (COMMA <unsized_type>)* RPAREN

<basic_type> ::= INT
               | REAL
               | COMPLEX
               | VECTOR
               | ROWVECTOR
               | MATRIX
               | COMPLEXVECTOR
               | COMPLEXROWVECTOR
               | COMPLEXMATRIX

<unsized_dims> ::= LBRACK COMMA* RBRACK

Variable declarations and compound definitions

<identifier> ::= IDENTIFIER
               | TRUNCATE

<decl_identifier> ::= <identifier>

<no_assign> ::= UNREACHABLE

<optional_assignment(rhs)> ::= [ASSIGN rhs]

<id_and_optional_assignment(rhs)> ::= <decl_identifier>
                                      <optional_assignment(rhs)>

<decl(type_rule, rhs)> ::= type_rule <decl_identifier> <dims>
                           <optional_assignment(rhs)> SEMICOLON
                         | <higher_type(type_rule)>
                           <id_and_optional_assignment(rhs)> (COMMA
                           <id_and_optional_assignment(rhs)>)* SEMICOLON

<higher_type(type_rule)> ::= <array_type(type_rule)>
                           | <tuple_type(type_rule)>
                           | type_rule

<array_type(type_rule)> ::= <arr_dims> type_rule
                          | <arr_dims> <tuple_type(type_rule)>

<tuple_type(type_rule)> ::= TUPLE LPAREN <higher_type(type_rule)> COMMA
                            <higher_type(type_rule)> (COMMA
                            <higher_type(type_rule)>)* RPAREN

<var_decl> ::= <decl(<sized_basic_type>, <expression>)>

<top_var_decl> ::= <decl(<top_var_type>, <expression>)>

<top_var_decl_no_assign> ::= <decl(<top_var_type>, <no_assign>)>
                           | SEMICOLON

<sized_basic_type> ::= INT
                     | REAL
                     | COMPLEX
                     | VECTOR LBRACK <expression> RBRACK
                     | ROWVECTOR LBRACK <expression> RBRACK
                     | MATRIX LBRACK <expression> COMMA <expression> RBRACK
                     | COMPLEXVECTOR LBRACK <expression> RBRACK
                     | COMPLEXROWVECTOR LBRACK <expression> RBRACK
                     | COMPLEXMATRIX LBRACK <expression> COMMA <expression>
                       RBRACK

<top_var_type> ::= INT [LABRACK <range> RABRACK]
                 | REAL <type_constraint>                  | TUPLE

                 | COMPLEX <type_constraint>
                 | VECTOR <type_constraint> LBRACK <expression> RBRACK
                 | ROWVECTOR <type_constraint> LBRACK <expression> RBRACK
                 | MATRIX <type_constraint> LBRACK <expression> COMMA
                   <expression> RBRACK
                 | COMPLEXVECTOR <type_constraint> LBRACK <expression> RBRACK
                 | COMPLEXROWVECTOR <type_constraint> LBRACK <expression>
                   RBRACK
                 | COMPLEXMATRIX <type_constraint> LBRACK <expression> COMMA
                   <expression> RBRACK
                 | ORDERED LBRACK <expression> RBRACK
                 | POSITIVEORDERED LBRACK <expression> RBRACK
                 | SIMPLEX LBRACK <expression> RBRACK
                 | UNITVECTOR LBRACK <expression> RBRACK
                 | CHOLESKYFACTORCORR LBRACK <expression> RBRACK
                 | CHOLESKYFACTORCOV LBRACK <expression> [COMMA <expression>]
                   RBRACK
                 | CORRMATRIX LBRACK <expression> RBRACK
                 | COVMATRIX LBRACK <expression> RBRACK

<type_constraint> ::= [LABRACK <range> RABRACK]
                    | LABRACK <offset_mult> RABRACK

<range> ::= LOWER ASSIGN <constr_expression> COMMA UPPER ASSIGN
            <constr_expression>
          | UPPER ASSIGN <constr_expression> COMMA LOWER ASSIGN
            <constr_expression>
          | LOWER ASSIGN <constr_expression>
          | UPPER ASSIGN <constr_expression>

<offset_mult> ::= OFFSET ASSIGN <constr_expression> COMMA MULTIPLIER ASSIGN
                  <constr_expression>
                | MULTIPLIER ASSIGN <constr_expression> COMMA OFFSET ASSIGN
                  <constr_expression>
                | OFFSET ASSIGN <constr_expression>
                | MULTIPLIER ASSIGN <constr_expression>

<arr_dims> ::= ARRAY LBRACK <expression> (COMMA <expression>)* RBRACK

Expressions

<expression> ::= <expression> QMARK <expression> COLON <expression>
               | <expression> <infixOp> <expression>
               | <prefixOp> <expression>
               | <expression> <postfixOp>
               | <common_expression>

<constr_expression> ::= <constr_expression> <arithmeticBinOp>
                        <constr_expression>
                      | <prefixOp> <constr_expression>
                      | <constr_expression> <postfixOp>
                      | <common_expression>

<common_expression> ::= <identifier>
                      | INTNUMERAL
                      | REALNUMERAL
                      | DOTNUMERAL
                      | IMAGNUMERAL
                      | LBRACE <expression> (COMMA <expression>)* RBRACE
                      | LBRACK [<expression> (COMMA <expression>)*] RBRACK
                      | <identifier> LPAREN [<expression> (COMMA
                        <expression>)*] RPAREN
                      | TARGET LPAREN RPAREN
                      | <identifier> LPAREN <expression> BAR [<expression>
                        (COMMA <expression>)*] RPAREN
                      | LPAREN <expression> COMMA <expression> (COMMA
                        <expression>)* RPAREN
                      | <common_expression> DOTNUMERAL
                      | <common_expression> LBRACK <indexes> RBRACK
                      | LPAREN <expression> RPAREN

<prefixOp> ::= BANG
             | MINUS
             | PLUS

<postfixOp> ::= TRANSPOSE

<infixOp> ::= <arithmeticBinOp>
            | <logicalBinOp>

<arithmeticBinOp> ::= PLUS
                    | MINUS
                    | TIMES
                    | DIVIDE
                    | IDIVIDE
                    | MODULO
                    | LDIVIDE
                    | ELTTIMES
                    | ELTDIVIDE
                    | HAT
                    | ELTPOW

<logicalBinOp> ::= OR
                 | AND
                 | EQUALS
                 | NEQUALS
                 | LABRACK
                 | LEQ
                 | RABRACK
                 | GEQ

<indexes> ::= epsilon
            | COLON
            | <expression>
            | <expression> COLON
            | COLON <expression>
            | <expression> COLON <expression>
            | <indexes> COMMA <indexes>

<printables> ::= <expression>
               | <string_literal>
               | <printables> COMMA <printables>

Statements

<statement> ::= <atomic_statement>
              | <nested_statement>

<atomic_statement> ::= <common_expression> <assignment_op> <expression>
                       SEMICOLON
                     | <identifier> LPAREN [<expression> (COMMA
                       <expression>)*] RPAREN SEMICOLON
                     | <expression> TILDE <identifier> LPAREN [<expression>
                       (COMMA <expression>)*] RPAREN [<truncation>] SEMICOLON
                     | TARGET PLUSASSIGN <expression> SEMICOLON
                     | BREAK SEMICOLON
                     | CONTINUE SEMICOLON
                     | PRINT LPAREN <printables> RPAREN SEMICOLON
                     | REJECT LPAREN <printables> RPAREN SEMICOLON
                     | FATAL_ERROR LPAREN <printables> RPAREN SEMICOLON
                     | RETURN <expression> SEMICOLON
                     | RETURN SEMICOLON
                     | SEMICOLON

<assignment_op> ::= ASSIGN
                  | PLUSASSIGN
                  | MINUSASSIGN
                  | TIMESASSIGN
                  | DIVIDEASSIGN
                  | ELTTIMESASSIGN
                  | ELTDIVIDEASSIGN

<string_literal> ::= STRINGLITERAL

<truncation> ::= TRUNCATE LBRACK [<expression>] COMMA [<expression>] RBRACK

<nested_statement> ::= IF LPAREN <expression> RPAREN <vardecl_or_statement>
                       ELSE <vardecl_or_statement>
                     | IF LPAREN <expression> RPAREN <vardecl_or_statement>
                     | WHILE LPAREN <expression> RPAREN
                       <vardecl_or_statement>
                     | FOR LPAREN <identifier> IN <expression> COLON
                       <expression> RPAREN <vardecl_or_statement>
                     | FOR LPAREN <identifier> IN <expression> RPAREN
                       <vardecl_or_statement>
                     | PROFILE LPAREN <string_literal> RPAREN LBRACE
                       <vardecl_or_statement>* RBRACE
                     | LBRACE <vardecl_or_statement>* RBRACE

<vardecl_or_statement> ::= <statement>
                         | <var_decl>

<top_vardecl_or_statement> ::= <statement>
                             | <top_var_decl>

Tokenizing rules

Many of the tokens used in the BNF grammars follow obviously from their names: DATABLOCK is the literal string ‘data’, COMMA is a single ‘,’ character, etc. The literal representation of each operator is additionally provided in the operator precedence table.

A few tokens are not so obvious, and are defined here in regular expressions:

IDENTIFIER = [a-zA-Z] [a-zA-Z0-9_]*

STRINGLITERAL = ".*"

INTNUMERAL = [0-9]+ (_ [0-9]+)*

EXPLITERAL = [eE] [+-]? INTNUMERAL

REALNUMERAL = INTNUMERAL \. INTNUMERAL? EXPLITERAL?
            | \. INTNUMERAL EXPLITERAL
            | INTNUMERAL EXPLITERAL

IMAGNUMERAL = (REALNUMERAL | INTNUMERAL) i

DOTNUMERAL = \. INTNUMERAL

Extra-grammatical constraints

Type constraints

A well-formed Stan program must satisfy the type constraints imposed by functions and distributions. For example, the binomial distribution requires an integer total count parameter and integer variate and when truncated would require integer truncation points. If these constraints are violated, the program will be rejected during compilation with an error message indicating the location of the problem.

Operator precedence and associativity

In the Stan grammar provided in this chapter, the expression 1 + 2 * 3 has two parses. As described in the operator precedence table, Stan disambiguates between the meaning \(1 + (2 \times 3)\) and the meaning \((1 + 2) \times 3\) based on operator precedences and associativities.

Typing of compound declaration and definition

In a compound variable declaration and definition, the type of the right-hand side expression must be assignable to the variable being declared. The assignability constraint restricts compound declarations and definitions to local variables and variables declared in the transformed data, transformed parameters, and generated quantities blocks.

Typing of array expressions

The types of expressions used for elements in array expressions ('{' expressions '}') must all be of the same type or a mixture of scalar (int, real and complex) types (in which case the result is promoted to be of the highest type on the int -> real -> complex hierarchy).

Forms of numbers

Integer literals longer than one digit may not start with 0 and real literals cannot consist of only a period or only an exponent.

Conditional arguments

Both the conditional if-then-else statement and while-loop statement require the expression denoting the condition to be a primitive type, integer or real.

For loop containers

The for loop statement requires that we specify in addition to the loop identifier, either a range consisting of two expressions denoting an integer, separated by ‘:’, or a single expression denoting a container. The loop variable will be of type integer in the former case and of the contained type in the latter case. Furthermore, the loop variable must not be in scope (i.e., there is no masking of variables).

Only break and continue in loops

The break and continue statements may only be used within the body of a for-loop or while-loop.

Block-specific restrictions

Some constructs in the Stan language are only allowed in certain blocks or in certain kinds of user-defined functions.

PRNG functions

Functions ending in _rng may only be called in the transformed data and generated quantities block, and within the bodies of user-defined functions with names ending in _rng.

Unnormalized distributions

Unnormalized distributions (with suffixes _lupmf or _lupdf) may only be called in the model block, user-defined probability functions, or within the bodies of user defined functions which end in _lp.

Incrementing and accessing target

target += statements can only be used inside of the model block or user-defined functions which end in _lp.

User defined functions which end in _lp and the target() function can only be used in the model block, transformed parameters block, and in the bodies of other user defined functions which end in _lp.

Sampling statements (using ~) can only be used in the model block or in the bodies of user-defined functions which end in _lp.

Probability function naming

A probability function literal must have one of the following suffixes: _lpdf, _lpmf, _lcdf, or _lccdf.

Indexes

Standalone expressions used as indexes must denote either an integer (int) or an integer array (array[] int). Expressions participating in range indexes (e.g., a and b in a : b) must denote integers (int).

A second condition is that there not be more indexes provided than dimensions of the underlying expression (in general) or variable (on the left side of assignments) being indexed. A vector or row vector adds 1 to the array dimension and a matrix adds 2. That is, the type array[ , , ] matrix, a three-dimensional array of matrices, has five index positions: three for the array, one for the row of the matrix and one for the column.

Back to top