Nearest Neighbor Gaussian Processes (NNGP) based models is a family of highly scalable Gaussian processes based models. In brief, NNGP extends the Vecchia’s approximation (Vecchia (1988)) to a process using conditional independence given information from neighboring locations. In this section, I will briefly review response and latent NNGP models. For more details and applications of NNGP, please refer to (A. Datta, Banerjee, Finley, and Gelfand 2016) and (A. Datta, Banerjee, Finley, Hamm, et al. 2016).

We envision a spatial regression model at any location \(s\)

\[\begin{equation}\label{eq: spatial_regression_model} y(s) = m_{\theta}(s) + w(s) + \epsilon(s)\;,\quad \epsilon(s) \stackrel{iid}{\sim} \mathcal{N}(0,\tau^2) \;, \end{equation}\]where, usually, \(m_{\theta}(s) = x(s)^{\top}\beta\) and \(w(s)\) is a latent spatial process capturing spatial dependence. Let \(S\) be the set of \(n\) observed locations. If we model the process \(w(s)\) with a Gaussian process \(w(s) \sim GP(0, C_\theta(\cdot, \cdot))\), then a customary Bayesian hierarchical models for observations on \(S = \{s_1, \ldots, s_N\}\) can be constructed as

\[\begin{equation} \label{eq: latent_model} \mathcal{p}(\theta) \; \times \; \mathcal{N}(w(S)\;|\;0, C_\theta(S,S)) \; \times \; \mathcal{N}(y(S)\;|\; m_\theta(S) + w(S), \tau^2 I_n) \end{equation}\]Another implementation of GP is to marginalize over the latent process \(w(s)\) and construct the outcome process \(y(s)\) directly with a GP. By integrating out \(w(s)\), we have a more parsimonious model whose parameters set collapses from \(\{\theta, w\}\) to \(\{\theta\}\). In a Bayesian setting, \(\theta\) will be sampled from its posterior distribution

\[\begin{equation}\label{eq: response model} \mathcal{p}(\theta \;|\; y(S)) \; \propto \; \mathcal{p}(\theta) \; \times \; \mathcal{N} (y(S)\;|\;m_{\theta}(S), C_\theta(S, S) + \tau^2 I_n) \end{equation}\]To distinguish these two models, we shall call the former as a latent GP model, and the latter as a response GP model. These two models are referred as the latent variable GP and the marginal likelihood GP in Stan reference manual (Stan Development Team (2017)).

Nearest neighbor Gaussian process (NNGP) provides an alternative to the Gaussian process in the models discussed in the preceding subsection. The likelihoods of two models basing on NNGP derived from the original Gaussian process coincide with the Vecchia’s approximation (Vecchia (1988)) of the original models. In particular, a latent NNGP model has a posterior distribution proportional to

\[\begin{equation}\label{eq: latent NNGP model} \mathcal{p} (\theta) \; \times \; \mathcal{N}(w(S) \;|\; 0, C^*) \; \times \; \mathcal{N}(y(S) \;|\; m_\theta(S)+w(S), \tau^2 I_n) \;, \end{equation}\]where \(C^{*-1} = (I-A^*)^\top D^{*-1}(I-A^*)\) is the precision matrix of the latent process \(w(s)\) over \(S\). Here \(A^*\) is a sparse and strictly lower triangular matrix with at most \(M\)(\(M \ll N\)) non-zero entries in each row, and \(D^*\) is a diagonal matrix. One can readily calculate the determinant of \(C^*\) by the product of the diagonal elements in \(D^*\). The likelihood of \(w(S)\) based on precision matrix \(C^{*-1}\) serves as a good approximation to the likelihood of \(w(s)\) in \eqref{eq: latent_model}, while the storage and computational burden of the former is linear in \(N\).

A response NNGP model yields posterior distribution:

\[\begin{equation} \label{eq:response_NNGP} \mathcal{p}(\theta \;|\; y(S)) \; \propto \; \mathcal{p}(\theta) \; \times \; \mathcal{N}(y(S)\;|\;m_{\theta}(S), \{C_\theta + \tau^2 I\}^*) \;, \end{equation}\]where \(\{C_\theta + \tau^2 I\}^{*-1} = (I-A)^\top D^{-1}(I-A)\), analogous to the latent NNGP model, can be treated as an approximation of \(\{C_\theta(S, S)+\tau^2I_n\}^{-1}\). Notice that although one can obtain the response GP model by integrating out the latent process in a latent GP model, the corresponding NNGP model doesn’t have this property.

The details of Matrices \(A\), \(D\) and two models can be found in Finley et al. (2017). Here we use the response NNGP model to show how to construct matrix \(A\) and \(D\). Let \(N(s_i)\) be at most \(M\) closest points to \(s_i\) among the locations indexed less than \(i\). The \(i\) th row (\(i > 1\)) of \(A\) has nonzero entries on positions indexed by \(N(s_i)\), and the nonzero entries are calculated by

\[\begin{equation} \label{eq: A_construct} A(i, N(s_i)) = C_\theta(s_i, N(s_i))(C_\theta(N(s_i), N(s_i)) + \tau^2I)^{-1} \end{equation}\]And the \(i\) th element on the diagonal of \(D\) satisfies

\[\begin{equation} \label{eq: D_construct} D(i, i) = C_\theta(s_i, s_i) + \tau^2 - C_\theta(s_i, N(s_i))(C_\theta(N(s_i), N(s_i)) + \tau^2I)^{-1}C_\theta(N(s_i), s_i) \end{equation}\]These equations are derived from the distribution of \(E[y(s_i) \; | \; y(N(s_i))]\). The nonzero entries in each row of \(A\) are precisely the weights obtained by predicting \(y(s_i)\), or “kriging”, based upon the values of \(y(s)\) at neighboring locations, i.e., \(N(s_i)\). And the diagonal elements in \(D\) are the variance of \(y(s_i)\) conditioning on its’ neighbors in the “past” \(y(N(s_i))\).

In this section, I will use a simulation data to show how to code NNGP based models efficiently in Stan.

We generated response \(Y\) along with a covariate \(x\) at \(n = 500\) randomly sited locations in a unit square domain by the following model:

\[\begin{equation} y(s) = \beta_0 + x(s)\beta_1 + w(s) + \epsilon(s), \hspace{1cm} \epsilon(s) \sim N(0, \tau^2) \;, \end{equation}\]where the zero-centered spatial latent process \(w(s)\) were sampled from a Gaussian process with a covariance function \(C_\theta\) specified by exponential:

\[\begin{equation}\label{exp_K} C_\theta(s_i, s_j) = \sigma^2 \exp(-\phi||s_i-s_j||)\;, \hspace{0.5cm} s_i,s_j \in S \end{equation}\]The predictor \(x\) were generated from \(N(0, 1)\). The setting of parameters is listed in the code.

```
rmvn <- function(N, mu = 0, V = matrix(1)){
P <- length(mu)
if(any(is.na(match(dim(V), P))))
stop("Dimension problem!")
D <- chol(V)
t(matrix(rnorm(N * P), ncol = P) %*% D + rep(mu, rep(N, P)))
}
set.seed(1234)
N <- 500
coords <- cbind(runif(N), runif(N))
X <- as.matrix(cbind(1, rnorm(N)))
B <- as.matrix(c(1, 5))
sigma.sq <- 2
tau.sq <- 0.1
phi <- 3 / 0.5
D <- as.matrix(dist(coords))
R <- exp(-phi*D)
w <- rmvn(1, rep(0, N), sigma.sq*R)
Y <- rnorm(N, X %*% B + w, sqrt(tau.sq))
```

The following block shows the elements needed for NNGP based models

```
data {
int<lower=1> N;
int<lower=1> M;
int<lower=1> P;
vector[N] Y;
matrix[N, P + 1] X;
int NN_ind[N - 1, M];
matrix[N - 1, M] NN_dist;
matrix[N - 1, (M * (M - 1) / 2)] NN_distM;
}
```

Here the design matrix X contains an initial column of 1s, P is the number of regression coefficients, and M is the number of nearest neighbors (maximum number of elements in each row of \(n \times n\) sparse matrix \(A\)). Notice that we provide three matrices `NN_ind`

, `NN_dist`

and `NN_distM`

:

`NN_ind`

is a two-dimensional array of indices whose \(i - 1\)th row shows at most \(M\) closest points to \(s_i\) among the locations indexed less than \(i\).

`NN_dist`

is a matrix whose \(i - 1\)th row contains the distance of \(i\)th location to its selected neighbors.

`NN_dist`

is a matrix whose \(i - 1\)th row contains the strictly lower triangular part of the distance matrix of the selected neighbors of \(i\)th location.

These three matrices are required for constructing the sparse lower triangular matrix \(A\), and the diagonal matrix \(D\). Since they are fixed across the MCMC updates, we recommend user to provide them in the data segment. Next, I will show how to efficiently generate the matrices listed above.

File “NNmatrix.R” provides a wrapper function in R that uses package spNNGP, which has a fast algorithm of building neighbor index, to generate the required matrices in the Stan data segment. Here `n.omp.threads`

indicates the number of threads to use for parallel processing.

```
source("NNmatrix.R")
M = 6 # Number of Nearest Neighbors
NN.matrix <- NNMatrix(coords = coords, n.neighbors = M, n.omp.threads = 2)
str(NN.matrix)
```

We can use function `Check_Neighbors`

in “NNmatrix.R” for checking the nearest neighbor index. It is important to point out that `NNMatrix`

sorts coordinates on the first column before building the neighbor index. Thus we should use the sorted response and design matrix instead of the raw data in the data block.

`Check_Neighbors(NN.matrix$coords.ord, n.neighbors = M, NN.matrix, ind = 200)`

We assign Gaussian priors for \(\sigma\) and \(\tau\), an inverse gamma prior for \(\phi\) and a Gaussian prior for \(\beta = \{\beta_0, \beta_1\}\). The following is the parameter and model block for a latent NNGP model.

```
parameters{
vector[P + 1] beta;
real<lower = 0> sigma;
real<lower = 0> tau;
real<lower = 0> phi;
vector[N] w;
}
transformed parameters {
real sigmasq = sigma^2;
real tausq = tau^2;
}
model{
beta ~ multi_normal_cholesky(uB, L_VB);
phi ~ inv_gamma(ap, bp);
sigma ~ normal(0, ss);
tau ~ normal(0, st);
w ~ nngp_w(sigmasq, phi, NN_dist, NN_distM, NN_ind, N, M);
Y ~ normal(X * beta + w, tau);
}
```

A small modification will make the code work for a response NNGP model:

```
parameters{
vector[P + 1] beta;
real<lower = 0> sigma;
real<lower = 0> tau;
real<lower = 0> phi;
}
transformed parameters {
real sigmasq = sigma^2;
real tausq = tau^2;
}
model{
beta ~ multi_normal_cholesky(uB, L_VB);
phi ~ inv_gamma(ap, bp);
sigma ~ normal(0, ss);
tau ~ normal(0, st);
Y ~ nngp(X * beta, sigmasq, tausq, phi, NN_dist, NN_distM, NN_ind, N, M);
}
```

Here, the user-defined functions `nngp_w`

and `nngp`

will be given in the simulation study section.

The hardest part in coding NNGP in Stan is the user-defined likelihood, specifically, the function `nngp_w`

and `nngp`

in the last subsection. Here we use `nngp`

to illustrate the main idea of coding NNGP likelihood.

The log-likelihood of \(y(S)\) in \eqref{eq:response_NNGP} is given by:

\[ -{1 \over 2}\sum_{i = 1}^N\log{D_{ii}} - {1 \over 2}{(y(S) - X(S)^\top\beta)^\top (I-A)^T D^{-1} (I-A)(y(S) - X(S)^\top\beta)} \]

In the code below, vector `U`

saves the results of \((I-A)(y(S) - X(S)^\top\beta)\), and vector `V`

saves all the diagonal elements of Matrix \(D\) scaled by \(\sigma^2\). With this notation, the log-likelihood can be simplified as

where all the elements in the likelihood are vectors.

In the calculation of vector \(U = (I-A)(y(S) - X(S)^\top\beta)\), since we know that matrix \(A\) has at most \(M\) nonzero elements and the index of nonzero elements is given in `NN_ind`

, there is no need for saving the \(n \times n\) matrix \(I-A\). Instead, we use a for loop to calculate \(U\). Within each iteration, we first use `NN_dist`

and `NN_distM`

along with the updated parameter to obtain \(A(i, N(s_i))\) by \eqref{eq: D_construct} and \(D(i, i)\) by \eqref{eq: A_construct}, then use `NN_ind`

and \(y(S) - X(S)^\top\beta\) to calculate the \(i\) th element of \(U\). The flops required in each iteration is in the order of \(M^3\).

```
functions{
real nngp_lpdf(vector Y, vector X_beta, real sigmasq, real tausq,
real phi, matrix NN_dist, matrix NN_distM, int[,] NN_ind,
int N, int M){
vector[N] V;
vector[N] YXb = Y - X_beta;
vector[N] U = YXb;
real kappa_p_1 = tausq / sigmasq + 1;
int dim;
int h;
for (i in 2:N) {
matrix[ i < (M + 1) ? (i - 1) : M, i < (M + 1) ? (i - 1): M]
iNNdistM;
matrix[ i < (M + 1) ? (i - 1) : M, i < (M + 1) ? (i - 1): M]
iNNCholL;
vector[ i < (M + 1) ? (i - 1) : M] iNNcorr;
vector[ i < (M + 1) ? (i - 1) : M] v;
row_vector[i < (M + 1) ? (i - 1) : M] v2;
dim = (i < (M + 1))? (i - 1) : M;
if(dim == 1){iNNdistM[1, 1] = kappa_p_1;}
else{
h = 0;
for (j in 1:(dim - 1)){
for (k in (j + 1):dim){
h = h + 1;
iNNdistM[j, k] = exp(- phi * NN_distM[(i - 1), h]);
iNNdistM[k, j] = iNNdistM[j, k];
}
}
for(j in 1:dim){
iNNdistM[j, j] = kappa_p_1;
}
}
iNNCholL = cholesky_decompose(iNNdistM);
for (j in 1: dim){
iNNcorr[j] = exp(- phi * NN_dist[(i - 1), j]);
}
v = mdivide_left_tri_low(iNNCholL, iNNcorr);
V[i] = kappa_p_1 - dot_self(v);
v2 = mdivide_right_tri_low(v', iNNCholL);
for (j in 1:dim){
U[i] = U[i] - v2[j] * YXb[NN_ind[(i - 1), j]];
}
}
V[1] = kappa_p_1;
return - 0.5 * ( 1 / sigmasq * dot_product(U, (U ./ V)) +
sum(log(V)) + N * log(sigmasq));
}
}
```

Now let’s run the NNGP based models for the simulation data in the last section. First set parameters of priors:

```
P = 1 # number of regression coefficients
uB = rep(0, P + 1) # mean vector in the Gaussian prior of beta
VB = diag(P + 1)*1000 # covariance matrix in the Gaussian prior of beta
ss = 3 * sqrt(2) # scale parameter in the normal prior of sigma
st = 3 * sqrt(0.1) # scale parameter in the normal prior of tau
ap = 3; bp = 0.5 # shape and scale parameter in the inv-gamma prior of phi
```

The following chunk is the R code for running response NNGP models. We use the response and design matrix sorted by the order from `NNMatrix`

instead of the raw `Y`

and `X`

in the data block.

```
library(rstan)
options(mc.cores = parallel::detectCores())
data <- list(N = N, M = M, P = P,
Y = Y[NN.matrix$ord], X = X[NN.matrix$ord, ], # sorted Y and X
NN_ind = NN.matrix$NN_ind,
NN_dist = NN.matrix$NN_dist,
NN_distM = NN.matrix$NN_distM,
uB = uB, VB = VB, ss = ss, st = st, ap = ap, bp = bp)
myinits <-list(list(beta = c(1, 5), sigma = 1, tau = 0.5, phi = 12),
list(beta = c(5, 5), sigma = 1.5, tau = 0.2, phi = 5),
list(beta = c(0, 0), sigma = 2.5, tau = 0.1, phi = 9))
parameters <- c("beta", "sigmasq", "tausq", "phi")
samples <- stan(
file = "nngp_response.stan",
data = data,
init = myinits,
pars = parameters,
iter = 600,
chains = 3,
thin = 1,
seed = 123
)
```

The full Stan program for the response NNGP model is in the file “nngp_response.stan”.

`writeLines(readLines('nngp_response.stan'))`

```
/* Response NNGP model */
functions{
real nngp_lpdf(vector Y, vector X_beta, real sigmasq, real tausq,
real phi, matrix NN_dist, matrix NN_distM, int[,] NN_ind,
int N, int M){
vector[N] V;
vector[N] YXb = Y - X_beta;
vector[N] U = YXb;
real kappa_p_1 = tausq / sigmasq + 1;
int dim;
int h;
for (i in 2:N) {
matrix[ i < (M + 1) ? (i - 1) : M, i < (M + 1) ? (i - 1): M]
iNNdistM;
matrix[ i < (M + 1) ? (i - 1) : M, i < (M + 1) ? (i - 1): M]
iNNCholL;
vector[ i < (M + 1) ? (i - 1) : M] iNNcorr;
vector[ i < (M + 1) ? (i - 1) : M] v;
row_vector[i < (M + 1) ? (i - 1) : M] v2;
dim = (i < (M + 1))? (i - 1) : M;
if(dim == 1){iNNdistM[1, 1] = kappa_p_1;}
else{
h = 0;
for (j in 1:(dim - 1)){
for (k in (j + 1):dim){
h = h + 1;
iNNdistM[j, k] = exp(- phi * NN_distM[(i - 1), h]);
iNNdistM[k, j] = iNNdistM[j, k];
}
}
for(j in 1:dim){
iNNdistM[j, j] = kappa_p_1;
}
}
iNNCholL = cholesky_decompose(iNNdistM);
iNNcorr = to_vector(exp(- phi * NN_dist[(i - 1), 1: dim]));
v = mdivide_left_tri_low(iNNCholL, iNNcorr);
V[i] = kappa_p_1 - dot_self(v);
v2 = mdivide_right_tri_low(v', iNNCholL);
U[i] = U[i] - v2 * YXb[NN_ind[(i - 1), 1:dim]];
}
V[1] = kappa_p_1;
return - 0.5 * ( 1 / sigmasq * dot_product(U, (U ./ V)) +
sum(log(V)) + N * log(sigmasq));
}
}
data {
int<lower=1> N;
int<lower=1> M;
int<lower=1> P;
vector[N] Y;
matrix[N, P + 1] X;
int NN_ind[N - 1, M];
matrix[N - 1, M] NN_dist;
matrix[N - 1, (M * (M - 1) / 2)] NN_distM;
vector[P + 1] uB;
matrix[P + 1, P + 1] VB;
real ss;
real st;
real ap;
real bp;
}
transformed data {
cholesky_factor_cov[P + 1] L_VB;
L_VB = cholesky_decompose(VB);
}
parameters{
vector[P + 1] beta;
real<lower = 0> sigma;
real<lower = 0> tau;
real<lower = 0> phi;
}
transformed parameters {
real sigmasq = square(sigma);
real tausq = square(tau);
}
model{
beta ~ multi_normal_cholesky(uB, L_VB);
phi ~ gamma(ap, bp);
sigma ~ normal(0, ss);
tau ~ normal(0, st);
Y ~ nngp(X * beta, sigmasq, tausq, phi, NN_dist, NN_distM, NN_ind, N, M);
}
```

The following chunk is the R code for running the latent NNGP model:

```
options(mc.cores = parallel::detectCores())
data <- list(N = N, M = M, P = P,
Y = Y[NN.matrix$ord], X = X[NN.matrix$ord, ], # sorted Y and X
NN_ind = NN.matrix$NN_ind,
NN_dist = NN.matrix$NN_dist,
NN_distM = NN.matrix$NN_distM,
uB = uB, VB = VB, ss = ss, st = st, ap = ap, bp = bp)
myinits <-list(list(beta = c(1, 5), sigma = 1, tau = 0.5, phi = 12,
w_b1 = rep(0, N)),
list(beta = c(5, 5), sigma = 1.5, tau = 0.2, phi = 5,
w_b1 = rep(0.1, N)),
list(beta = c(0, 0), sigma = 2.5, tau = 0.1, phi = 9 ,
w_b1 = rep(0, N)))
parameters <- c("beta", "sigmasq", "tausq", "phi", "w")
samples_w <- stan(
file = "nngp_latent.stan",
data = data,
init = myinits,
pars = parameters,
iter = 600,
chains = 3,
thin = 1,
seed = 123
)
```

The full Stan program for the latent NNGP model is in the file “nngp_latent.stan”.

`writeLines(readLines('nngp_latent.stan'))`

```
/* Latent NNGP model*/
functions{
real nng
```