JSON Format for CmdStan
CmdStan can use JSON format for input data for both model data and parameters. Model data is read in by the model constructor. Model parameters are used to initialize the sampler and optimizer.
Creating JSON files
You can create the JSON file yourself using the guidelines below, but a more convenient way to create a JSON file for use with CmdStan is to use the write_stan_json()
function provided by the CmdStanR interface.
JSON syntax summary
JSON is a data interchange notation, defined by an EMCA standard. JSON data files must in Unicode. JSON data is a series of structural tokens, literal tokens, and values:
Structural tokens are the left and right curly bracket
{}
, left and right square bracket[]
, the semicolon;
, and the comma,
.Literal tokens must always be in lowercase. There are three literal tokens:
true
,false
,null
.A primitive value is a single token which is either a literal, a string, or a number.
A string consists of zero or more Unicode characters enclosed in double quotes, e.g.
"foo"
. A backslash is used to escape the double quote character as well as the backslash itself. JSON allows the use of Unicode character escapes, e.g."\\uHHHH"
whereHHHH
is the Unicode code point in hex.Numbers are represented using either decimal notation or scientific notation. The following are examples of numbers:
17
,17.2
,17.2
,17.2e8
,17.2e8
.
There is no distinction between integer and real numbers in the JSON format other than whether they have periods or scientific notation.The special floating point values for positive infinity, negative infinity, and notanumber can be represented in multiple ways. Positive infinity can be represented as the string
"Inf"
, the string"Infinity"
, or the atomInfinity
. Negative infinity can be represented as the string"Inf"
, the string"Infinity"
, or the atomInfinity
. Notanumber can be represented as the string"NaN"
or the atomNaN
. These values may be mixed with other numerical types.A complex scalar is represented as a twoelement array consisting of its real component followed by its imaginary component. For example, the complex number \(2.3  1.83i\) would be represented in JSON as the twoelement array
[2.3, 1.83]
.A JSON array is an ordered, commaseparated list of zero or more JSON values enclosed in square brackets. The elements of an array can be of any type. The following are examples of arrays:
[]
,[1]
,[0.2, "inf", true]
.Vectors and row vectors in JSON are representing as arrays of their elements. For example, both the vector \([1 \quad 2]^{\top}\) and the row vector \([1 \quad 2]\) are represented by the JSON array
[1, 2]
.Complex vectors are represented as arrays of twoelement arrays. For example, the complex vector \([2.3  1.83i \quad 4.8 + 2i]^{\top}\) is represented as
[[2.3, 1.83], [4.8, 2]]
in JSON. A complex row vector has the same representation as its transpose (the vector with the same elements).Matrices are represented as arrays of their row vectors. For example, the \(2 \times 3\) matrix \[\begin{equation*} \begin{bmatrix} 1 & 2.7 & 9.8 \\ 4.2 & 1.8 & 7.3 \end{bmatrix} \end{equation*}\] is represented in JSON as
[[1, 2.7, 9.8], [4.2, 1.8, 7.3]]
.Complex matrices are also represented as arrays of their row vectors. For example, the \(2 \times 3\) complex matrix \[\begin{equation*} \begin{bmatrix} 1 + 2i & 3  4.2i & 13.1 + 2.7i \\ 3.1 & 5i & 0 \end{bmatrix} \end{equation*}\] would be represented in JSON as
[[[1, 2], [3, 4.2], [13.1, 2.7]], [[3.1, 0], [0, 5], [0, 0]]]
.Tuples are written as nested JSON objects where the keys are strings for the numbered slots in the tuple. For example, the tuple
(1.5, 3.4)
is represented in JSON as{"1": 1.5, "2": 3.4}
.A namevalue pair consists of a string followed by a colon followed by a value, either primitive or compound.
A JSON object is a commaseparated series of zero or more namevalue pairs enclosed in curly brackets. Each namevalue pair is a member of the object. Membership is unordered. Member names are not required to be unique. The following are examples of objects:
{ }
,{"foo": null}
,{"bar" : 17, "baz" : [14,15,16.6] }
.
Stan data types in JSON notation
Stan follows the JSON standard. A Stan input file in JSON notation consists of single JSON object which contains zero or more namevalue pairs. This structure corresponds to a Python data dictionary object. The following is an example of JSON data for the simple Bernoulli example model:
{ "N" : 10, "y" : [0,1,0,0,0,0,0,0,0,1] }
Matrix data and multidimensional arrays are indexed in rowmajor order. For a Stan program which has data block:
data {
int d1;
int d2;
int d3;
array[d1, d2, d3] int ar;
}
the following JSON input would be valid:
{ "d1" : 2,
"d2" : 3,
"d3" : 4,
"ar" : [[[0,1,2,3], [4,5,6,7], [8,9,10,11]],
[[12,13,14,15], [16,17,18,19], [20,21,22,23]]]
}
JSON ignores whitespace. In the above examples, the spaces and newlines are only used to improve readability and can be omitted.
All data inputs are encoded as namevalue pairs. The following table provides more examples of JSON data. The left column contains a Stan data variable declaration and the right column contains valid JSON data inputs.
Stan declaration  JSON encoding 

int i 
"i": 17 
real a 
"a" : 17 
"a" : 17.2 

"a" : "NaN" 

"a" : "+inf" 

"a" : "inf" 

complex z 
"z": [1, 2.3] 
array[5] int 
"a" : [1, 2, 3, 4, 5] 
array[5] real a 
"a" : [ 1, 2, 3.3, "NaN", 5 ] 
array[2] complex b 
"b" : [[1, 2.3], [4.9, 0]] 
vector[5] a 
"a" : [1, 2, 3.3, "NaN", 5] 
row_vector[5] a 
"a" : [1, 2, 3.3, "NaN", 5] 
matrix[2, 3] a 
"a" : [[ 1, 2, 3 ], [ 4, 5, 6]] 
complex_vector[2] c 
"c" : [[1.2, 3.3], [4.8, 1.9], [2.3, 0]] 
complex_row_vector[2] c 
"c" : [[1.2, 3.3], [4.8, 1.9], [2.3, 0]] 
complex_matrix[2, 3] d 
"d" : [[[1, 1], [2, 2], [3, 3]], [4, 4], [5, 5], [6, 6]]] 
tuple(real, array[2] int) t 
"t" : { "1": 1.4, "2": [1, 2]} 
Empty arrays in JSON
JSON notation is not able to distinguish between multidimensional arrays where any dimension is \(0\), e.g., a 2D array with dimensions \((1,0)\), i.e., an array which contains a single array which is empty, has JSON representation [ ]
. To see how this works, consider the following Stan program data block:
data {
int d;
array[d] int ar_1d;
array[d, d] int ar_2d;
array[d, d, d] int ar_3d;
}
In the case where variable d
is 1
, all arrays will contain a single value. If array variable ar_d1
contains value 7
, 2D array variable ar_d2
contains (an array which contains) value 8
, and 3D array variable ar_d3
contains (an array which contains an array which contains) value 9
, the JSON representation is:
{ "ar_d1" : [7],
"ar_d2" : [[8]],
"ar_d3" : [[[9]]]
}
However, in the case where variable d
is 0
, ar_d1
is empty, i.e., it contains no values, as is ar_d2
, ar_d3
, and the JSON representation is
{ "d" : 0,
"ar_d1" : [ ],
"ar_d2" : [ ],
"ar_d3" : [ ]
}