21 JSON Format for CmdStan

CmdStan can use JSON format for input data for both model data and parameters. Model data is read in by the model constructor. Model parameters are used to initialize the sampler and optimizer.

21.1 Creating JSON files

You can create the JSON file yourself using the guidelines below, but a more convenient way to create a JSON file for use with CmdStan is to use the write_stan_json() function provided by the CmdStanR interface.

21.2 JSON syntax summary

JSON is a data interchange notation, defined by an . JSON data files must in Unicode. JSON data is a series of structural tokens, literal tokens, and values:

  • Structural tokens are the left and right curly bracket {}, left and right square bracket [], the semicolon ;, and the comma ,.

  • Literal tokens must always be in lowercase. There are three literal tokens: true, false, null.

  • A primitive value is a single token which is either a literal, a string, or a number.

  • A string consists of zero or more Unicode characters enclosed in double quotes, e.g. "foo". A backslash is used to escape the double quote character as well as the backslash itself. JSON allows the use of Unicode character escapes, e.g. "\\uHHHH" where HHHH is the Unicode code point in hex.

  • Numbers are represented using either decimal notation or scientific notation. The following are examples of numbers: 17, 17.2, -17.2, -17.2e8, 17.2e-8.
    There is no distincition between integer and real numbers in the JSON format other than whether they have periods or scientific notation.

  • The special floating point values for positive infinity, negative infinity, and not-a-number can be represented in multiple ways. Positive infinity can be represented as the string "Inf", the string "Infinity", or the atom Infinity. Negative infinity can be represented as the string "-Inf", the string "-Infinity", or the atom -Infinity. Not-a-number can be represented as the string "NaN" or the atom NaN. These values may be mixed with other numerical types.

  • A complex scalar is represented as a two-element array consisting of its real component followed by its imaginary component. For example, the complex number \(2.3 - 1.83i\) would be represented in JSON as the two-element array [2.3, -1.83].

  • A JSON array is an ordered, comma-separated list of zero or more JSON values enclosed in square brackets. The elements of an array can be of any type. The following are examples of arrays: [], [1], [0.2, "-inf", true].

  • Vectors and row vectors in JSON are representing as arrays of their elements. For example, both the vector \([1 \quad 2]^{\top}\) and the row vector \([1 \quad 2]\) are represented by the JSON array [1, 2].

  • Complex vectors are represented as arrays of two-element arrays. For example, the complex vector \([2.3 - 1.83i \quad -4.8 + 2i]^{\top}\) is represented as [[2.3, -1.83], [-4.8, 2]] in JSON. A complex row vector has the same representation as its transpose (the vector with the same elements).

  • Matrices are represented as arrays of their row vectors. For example, the \(2 \times 3\) matrix \[ \begin{bmatrix} 1 & 2.7 & -9.8 \\ 4.2 & 1.8 & -7.3 \end{bmatrix} \] is represented in JSON as [[1, 2.7, -9.8], [4.2, 1.8, -7.3]].

  • Complex matrices are also represented as arrays of their row vectors. For example, the \(2 \times 3\) complex matrix \[ \begin{bmatrix} 1 + 2i & 3 - 4.2i & 13.1 + 2.7i \\ 3.1 & -5i & 0 \end{bmatrix} \]
    would be represented in JSON as [[[1, 2], [3, -4.2], [13.1, 2.7]], [[3.1, 0], [0, -5], [0, 0]]].

  • A name-value pair consists of a string followed by a colon followed by a value, either primitive or compound.

  • A JSON object is a comma-separated series of zero or more name-value pairs enclosed in curly brackets. Each name-value pair is a member of the object. Membership is unordered. Member names are not required to be unique. The following are examples of objects: { }, {"foo": null}, {"bar" : 17, "baz" : [14,15,16.6] }.

21.3 Stan data types in JSON notation

Stan follows the JSON standard. A Stan input file in JSON notation consists of single JSON object which contains zero or more name-value pairs. This structure corresponds to a Python data dictionary object. The following is an example of JSON data for the simple Bernoulli example model:

{ "N" : 10, "y" : [0,1,0,0,0,0,0,0,0,1] }

Matrix data and multi-dimensional arrays are indexed in row-major order. For a Stan program which has data block:

data {
  int d1;
  int d2;
  int d3;
  array[d1, d2, d3] int ar;
}

the following JSON input would be valid:

{ "d1" : 2,
  "d2" : 3,
  "d3" : 4,
  "ar" : [[[0,1,2,3], [4,5,6,7], [8,9,10,11]],
          [[12,13,14,15], [16,17,18,19], [20,21,22,23]]]
}

JSON ignores whitespace. In the above examples, the spaces and newlines are only used to improve readability and can be omitted.

All data inputs are encoded as name-value pairs. The following table provides more examples of JSON data. The left column contains a Stan data variable declaration and the right column contains valid JSON data inputs.

Stan declaration JSON encoding
int i "i": 17
real a "a" : 17
"a" : 17.2
"a" : "NaN"
"a" : "+inf"
"a" : "-inf"
complex z "z": [1, -2.3]
array[5] int "a" : [1, 2, 3, 4, 5]
array[5] real a "a" : [ 1, 2, 3.3, "NaN", 5 ]
array[2] complex b "b" : [[1, -2.3], [4.9, 0]]
vector[5] a "a" : [1, 2, 3.3, "NaN", 5]
row_vector[5] a "a" : [1, 2, 3.3, "NaN", 5]
matrix[2, 3] a "a" : [[ 1, 2, 3 ], [ 4, 5, 6]]
complex_vector[2] c "c" : [[-1.2, 3.3], [4.8, 1.9], [2.3, 0]]
complex_row_vector[2] c "c" : [[-1.2, 3.3], [4.8, 1.9], [2.3, 0]]
complex_matrix[2, 3] d "d" : [[[1, 1], [2, 2], [3, 3]],
[4, 4], [5, 5], [6, 6]]]

21.3.1 Empty arrays in JSON

JSON notation is not able to distinguish between multi-dimensional arrays where any dimension is \(0\), e.g., a 2-D array with dimensions \((1,0)\), i.e., an array which contains a single array which is empty, has JSON representation [ ]. To see how this works, consider the following Stan program data block:

data {
  int d;
  array[d] int ar_1d;
  array[d, d] int ar_2d;
  array[d, d, d] int ar_3d;
}

In the case where variable d is 1, all arrays will contain a single value. If array variable ar_d1 contains value 7, 2-D array variable ar_d2 contains (an array which contains) value 8, and 3-D array variable ar_d3 contains (an array which contains an array which contains) value 9, the JSON representation is:

{ "ar_d1" : [7],
  "ar_d2" : [[8]],
  "ar_d3" : [[[9]]]
}

However, in the case where variable d is 0, ar_d1 is empty, i.e., it contains no values, as is ar_d2, ar_d3, and the JSON representation is

{ "d" : 0,
  "ar_d1" : [ ],
  "ar_d2" : [ ],
  "ar_d3" : [ ]
}