Stan User’s Guide

This is an old version, view current version.

10.2 Simulating from a Gaussian Process

It is simplest to start with a Stan model that does nothing more than simulate draws of functions \(f\) from a Gaussian process. In practical terms, the model will draw values \(y_n = f(x_n)\) for finitely many input points \(x_n\).

The Stan model defines the mean and covariance functions in a transformed data block and then samples outputs \(y\) in the model using a multivariate normal distribution. To make the model concrete, the squared exponential covariance function described in the previous section will be used with hyperparameters set to \(\alpha^2 = 1\), \(\rho^2 = 1\), and \(\sigma^2 = 0.1\), and the mean function \(m\) is defined to always return the zero vector, \(m(x) = {\bf 0}\). Consider the following implementation of a Gaussian process simulator.

data {
  int<lower=1> N;
  real x[N];
}
transformed data {
  matrix[N, N] K;
  vector[N] mu = rep_vector(0, N);
  for (i in 1:(N - 1)) {
    K[i, i] = 1 + 0.1;
    for (j in (i + 1):N) {
      K[i, j] = exp(-0.5 * square(x[i] - x[j]));
      K[j, i] = K[i, j];
    }
  }
  K[N, N] = 1 + 0.1;
}
parameters {
  vector[N] y;
}
model {
  y ~ multi_normal(mu, K);
}

The above model can also be written more compactly using the specialized covariance function that implements the exponentiated quadratic kernel.

data {
  int<lower=1> N;
  real x[N];
}
transformed data {
  matrix[N, N] K = cov_exp_quad(x, 1.0, 1.0);
  vector[N] mu = rep_vector(0, N);
  for (n in 1:N)
    K[n, n] = K[n, n] + 0.1;
}
parameters {
  vector[N] y;
}
model {
  y ~ multi_normal(mu, K);
}

The input data are just the vector of inputs x and its size N. Such a model can be used with values of x evenly spaced over some interval in order to plot sample draws of functions from a Gaussian process.

Multivariate Inputs

Only the input data needs to change in moving from a univariate model to a multivariate model.

The only lines that change from the univariate model above are as follows.

data {
  int<lower=1> N;
  int<lower=1> D;
  vector[D] x[N];
}
transformed data {
...
...

The data are now declared as an array of vectors instead of an array of scalars; the dimensionality D is also declared.

In the remainder of the chapter, univariate models will be used for simplicity, but any of the models could be changed to multivariate in the same way as the simple sampling model. The only extra computational overhead from a multivariate model is in the distance calculation.

Cholesky Factored and Transformed Implementation

A more efficient implementation of the simulation model can be coded in Stan by relocating, rescaling and rotating an isotropic standard normal variate. Suppose \(\eta\) is an an isotropic standard normal variate \[ \eta \sim \mathsf{normal}({\bf 0}, {\bf 1}), \] where \({\bf 0}\) is an \(N\)-vector of 0 values and \({\bf 1}\) is the \(N \times N\) identity matrix. Let \(L\) be the Cholesky decomposition of \(K(x | \theta)\), i.e., the lower-triangular matrix \(L\) such that \(LL^{\top} = K(x | \theta)\). Then the transformed variable \(\mu + L\eta\) has the intended target distribution, \[ \mu + L\eta \sim \mathsf{multivariate\ normal}(\mu(x), K(x | \theta)). \]

This transform can be applied directly to Gaussian process simulation.

This model has the same data declarations for N and x, and the same transformed data definitions of mu and K as the previous model, with the addition of a transformed data variable for the Cholesky decomposition. The parameters change to the raw parameters sampled from an isotropic standard normal, and the actual samples are defined as generated quantities.

...
transformed data {
  matrix[N, N] L;
...
  L = cholesky_decompose(K);
}
parameters {
  vector[N] eta;
}
model {
  eta ~ std_normal();
}
generated quantities {
  vector[N] y;
  y = mu + L * eta;
}

The Cholesky decomposition is only computed once, after the data are loaded and the covariance matrix K computed. The isotropic normal distribution for eta is specified as a vectorized univariate distribution for efficiency; this specifies that each eta[n] has an independent standard normal distribution. The sampled vector y is then defined as a generated quantity using a direct encoding of the transform described above.