Nearest Neighbor Gaussian Processes (NNGP) based models is a family of highly scalable Gaussian processes based models. In brief, NNGP extends the Vecchia’s approximation (Vecchia (1988)) to a process using conditional independence given information from neighboring locations. In this section, I will briefly review response and latent NNGP models. For more details and applications of NNGP, please refer to (A. Datta, Banerjee, Finley, and Gelfand 2016) and (A. Datta, Banerjee, Finley, Hamm, et al. 2016).

To distinguish these two models, we shall call the former as a latent GP model, and the latter as a response GP model. These two models are referred as the latent variable GP and the marginal likelihood GP in Stan reference manual (Stan Development Team (2017)).

where \(C^{*-1} = (I-A^*)^\top D^{*-1}(I-A^*)\) is the precision matrix of the latent process \(w(s)\) over \(S\). Here \(A^*\) is a sparse and strictly lower triangular matrix with at most \(M\)(\(M \ll N\)) non-zero entries in each row, and \(D^*\) is a diagonal matrix. One can readily calculate the determinant of \(C^*\) by the product of the diagonal elements in \(D^*\). The likelihood of \(w(S)\) based on precision matrix \(C^{*-1}\) serves as a good approximation to the likelihood of \(w(s)\) in \eqref{eq: latent_model}, while the storage and computational burden of the former is linear in \(N\).

A response NNGP model yields posterior distribution: \[\begin{equation} \label{eq:response_NNGP} \mathcal{p}(\theta \;|\; y(S)) \; \propto \; \mathcal{p}(\theta) \; \times \; \mathcal{N}(y(S)\;|\;m_{\theta}(S), \{C_\theta + \tau^2 I\}^*) \;, \end{equation}\]where \(\{C_\theta + \tau^2 I\}^{*-1} = (I-A)^\top D^{-1}(I-A)\), analogous to the latent NNGP model, can be treated as an approximation of \(\{C_\theta(S, S)+\tau^2I_n\}^{-1}\). Notice that although one can obtain the response GP model by integrating out the latent process in a latent GP model, the corresponding NNGP model doesn’t have this property.

These equations are derived from the distribution of \(E[y(s_i) \; | \; y(N(s_i))]\). The nonzero entries in each row of \(A\) are precisely the weights obtained by predicting \(y(s_i)\), or “kriging”, based upon the values of \(y(s)\) at neighboring locations, i.e., \(N(s_i)\). And the diagonal elements in \(D\) are the variance of \(y(s_i)\) conditioning on its’ neighbors in the “past” \(y(N(s_i))\).

In this section, I will use a simulation data to show how to code NNGP based models efficiently in Stan.

The predictor \(x\) were generated from \(N(0, 1)\). The setting of parameters is listed in the code.

```
rmvn <- function(N, mu = 0, V = matrix(1)){
P <- length(mu)
if(any(is.na(match(dim(V), P))))
stop("Dimension problem!")
D <- chol(V)
t(matrix(rnorm(N * P), ncol = P) %*% D + rep(mu, rep(N, P)))
}
set.seed(1234)
N <- 500
coords <- cbind(runif(N), runif(N))
X <- as.matrix(cbind(1, rnorm(N)))
B <- as.matrix(c(1, 5))
sigma.sq <- 2
tau.sq <- 0.1
phi <- 3 / 0.5
D <- as.matrix(dist(coords))
R <- exp(-phi*D)
w <- rmvn(1, rep(0, N), sigma.sq*R)
Y <- rnorm(N, X %*% B + w, sqrt(tau.sq))
```

The following block shows the elements needed for NNGP based models

```
data {
int<lower=1> N;
int<lower=1> M;
int<lower=1> P;
vector[N] Y;
matrix[N, P + 1] X;
int NN_ind[N - 1, M];
matrix[N - 1, M] NN_dist;
matrix[N - 1, (M * (M - 1) / 2)] NN_distM;
}
```

Here the design matrix X contains an initial column of 1s, P is the number of regression coefficients, and M is the number of nearest neighbors (maximum number of elements in each row of \(n \times n\) sparse matrix \(A\)). Notice that we provide three matrices `NN_ind`

, `NN_dist`

and `NN_distM`

:

`NN_ind`

is a two-dimensional array of indices whose \(i - 1\)th row shows at most \(M\) closest points to \(s_i\) among the locations indexed less than \(i\).

`NN_dist`

is a matrix whose \(i - 1\)th row contains the distance of \(i\)th location to its selected neighbors.

`NN_dist`

is a matrix whose \(i - 1\)th row contains the strictly lower triangular part of the distance matrix of the selected neighbors of \(i\)th location.

These three matrices are required for constructing the sparse lower triangular matrix \(A\), and the diagonal matrix \(D\). Since they are fixed across the MCMC updates, we recommend user to provide them in the data segment. Next, I will show how to efficiently generate the matrices listed above.

File “NNmatrix.R” provides a wrapper function in R that uses package spNNGP, which has a fast algorithm of building neighbor index, to generate the required matrices in the Stan data segment. Here `n.omp.threads`

indicates the number of threads to use for parallel processing.

```
source("NNmatrix.R")
M = 6 # Number of Nearest Neighbors
NN.matrix <- NNMatrix(coords = coords, n.neighbors = M, n.omp.threads = 2)
str(NN.matrix)
```

We can use function `Check_Neighbors`

in “NNmatrix.R” for checking the nearest neighbor index. It is important to point out that `NNMatrix`

sorts coordinates on the first column before building the neighbor index. Thus we should use the sorted response and design matrix instead of the raw data in the data block.

`Check_Neighbors(NN.matrix$coords.ord, n.neighbors = M, NN.matrix, ind = 200)`

We assign Gaussian priors for \(\sigma\) and \(\tau\), an inverse gamma prior for \(\phi\) and a Gaussian prior for \(\beta = \{\beta_0, \beta_1\}\). The following is the parameter and model block for a latent NNGP model.

```
parameters{
vector[P + 1] beta;
real<lower = 0> sigma;
real<lower = 0> tau;
real<lower = 0> phi;
vector[N] w;
}
transformed parameters {
real sigmasq = sigma^2;
real tausq = tau^2;
}
model{
beta ~ multi_normal_cholesky(uB, L_VB);
phi ~ inv_gamma(ap, bp);
sigma ~ normal(0, ss);
tau ~ normal(0, st);
w ~ nngp_w(sigmasq, phi, NN_dist, NN_distM, NN_ind, N, M);
Y ~ normal(X * beta + w, tau);
}
```

A small modification will make the code work for a response NNGP model:

```
parameters{
vector[P + 1] beta;
real<lower = 0> sigma;
real<lower = 0> tau;
real<lower = 0> phi;
}
transformed parameters {
real sigmasq = sigma^2;
real tausq = tau^2;
}
model{
beta ~ multi_normal_cholesky(uB, L_VB);
phi ~ inv_gamma(ap, bp);
sigma ~ normal(0, ss);
tau ~ normal(0, st);
Y ~ nngp(X * beta, sigmasq, tausq, phi, NN_dist, NN_distM, NN_ind, N, M);
}
```

Here, the user-defined functions `nngp_w`

and `nngp`

will be given in the simulation study section.

The hardest part in coding NNGP in Stan is the user-defined likelihood, specifically, the function `nngp_w`

and `nngp`

in the last subsection. Here we use `nngp`

to illustrate the main idea of coding NNGP likelihood.

`U`

saves the results of \((I-A)(y(S) - X(S)^\top\beta)\), and vector `V`

saves all the diagonal elements of Matrix \(D\) scaled by \(\sigma^2\). With this notation, the log-likelihood can be simplified as
\[\begin{equation} \label{eq: ll_code}
-{1 \over 2}\{\sum_{i = 1}^N\log{V_{i}} +N \log{(\sigma^2)}+ {1\over \sigma^2} U^\top(U \oslash V)\} \;,
\end{equation}\]
where all the elements in the likelihood are vectors.

In the calculation of vector \(U = (I-A)(y(S) - X(S)^\top\beta)\), since we know that matrix \(A\) has at most \(M\) nonzero elements and the index of nonzero elements is given in `NN_ind`

, there is no need for saving the \(n \times n\) matrix \(I-A\). Instead, we use a for loop to calculate \(U\). Within each iteration, we first use `NN_dist`

and `NN_distM`

along with the updated parameter to obtain \(A(i, N(s_i))\) by \eqref{eq: D_construct} and \(D(i, i)\) by \eqref{eq: A_construct}, then use `NN_ind`

and \(y(S) - X(S)^\top\beta\) to calculate the \(i\) th element of \(U\). The flops required in each iteration is in the order of \(M^3\).

```
functions{
real nngp_lpdf(vector Y, vector X_beta, real sigmasq, real tausq,
real phi, matrix NN_dist, matrix NN_distM, int[,] NN_ind,
int N, int M){
vector[N] V;
vector[N] YXb = Y - X_beta;
vector[N] U = YXb;
real kappa_p_1 = tausq / sigmasq + 1;
int dim;
int h;
for (i in 2:N) {
matrix[ i < (M + 1) ? (i - 1) : M, i < (M + 1) ? (i - 1): M]
iNNdistM;
matrix[ i < (M + 1) ? (i - 1) : M, i < (M + 1) ? (i - 1): M]
iNNCholL;
vector[ i < (M + 1) ? (i - 1) : M] iNNcorr;
vector[ i < (M + 1) ? (i - 1) : M] v;
row_vector[i < (M + 1) ? (i - 1) : M] v2;
dim = (i < (M + 1))? (i - 1) : M;
if(dim == 1){iNNdistM[1, 1] = kappa_p_1;}
else{
h = 0;
for (j in 1:(dim - 1)){
for (k in (j + 1):dim){
h = h + 1;
iNNdistM[j, k] = exp(- phi * NN_distM[(i - 1), h]);
iNNdistM[k, j] = iNNdistM[j, k];
}
}
for(j in 1:dim){
iNNdistM[j, j] = kappa_p_1;
}
}
iNNCholL = cholesky_decompose(iNNdistM);
for (j in 1: dim){
iNNcorr[j] = exp(- phi * NN_dist[(i - 1), j]);
}
v = mdivide_left_tri_low(iNNCholL, iNNcorr);
V[i] = kappa_p_1 - dot_self(v);
v2 = mdivide_right_tri_low(v', iNNCholL);
for (j in 1:dim){
U[i] = U[i] - v2[j] * YXb[NN_ind[(i - 1), j]];
}
}
V[1] = kappa_p_1;
return - 0.5 * ( 1 / sigmasq * dot_product(U, (U ./ V)) +
sum(log(V)) + N * log(sigmasq));
}
}
```

Now let’s run the NNGP based models for the simulation data in the last section. First set parameters of priors:

```
P = 1 # number of regression coefficients
uB = rep(0, P + 1) # mean vector in the Gaussian prior of beta
VB = diag(P + 1)*1000 # covariance matrix in the Gaussian prior of beta
ss = 3 * sqrt(2) # scale parameter in the normal prior of sigma
st = 3 * sqrt(0.1) # scale parameter in the normal prior of tau
ap = 3; bp = 0.5 # shape and scale parameter in the inv-gamma prior of phi
```

The following chunk is the R code for running response NNGP models. We use the response and design matrix sorted by the order from `NNMatrix`

instead of the raw `Y`

and `X`

in the data block.

```
library(rstan)
options(mc.cores = parallel::detectCores())
data <- list(N = N, M = M, P = P,
Y = Y[NN.matrix$ord], X = X[NN.matrix$ord, ], # sorted Y and X
NN_ind = NN.matrix$NN_ind,
NN_dist = NN.matrix$NN_dist,
NN_distM = NN.matrix$NN_distM,
uB = uB, VB = VB, ss = ss, st = st, ap = ap, bp = bp)
myinits <-list(list(beta = c(1, 5), sigma = 1, tau = 0.5, phi = 12),
list(beta = c(5, 5), sigma = 1.5, tau = 0.2, phi = 5),
list(beta = c(0, 0), sigma = 2.5, tau = 0.1, phi = 9))
parameters <- c("beta", "sigmasq", "tausq", "phi")
samples <- stan(
file = "nngp_response.stan",
data = data,
init = myinits,
pars = parameters,
iter = 600,
chains = 3,
thin = 1,
seed = 123
)
```

The full Stan program for the response NNGP model is in the file “nngp_response.stan”.

`writeLines(readLines('nngp_response.stan'))`

```
/* Response NNGP model */
functions{
real nngp_lpdf(vector Y, vector X_beta, real sigmasq, real tausq,
real phi, matrix NN_dist, matrix NN_distM, int[,] NN_ind,
int N, int M){
vector[N] V;
vector[N] YXb = Y - X_beta;
vector[N] U = YXb;
real kappa_p_1 = tausq / sigmasq + 1;
int dim;
int h;
for (i in 2:N) {
matrix[ i < (M + 1) ? (i - 1) : M, i < (M + 1) ? (i - 1): M]
iNNdistM;
matrix[ i < (M + 1) ? (i - 1) : M, i < (M + 1) ? (i - 1): M]
iNNCholL;
vector[ i < (M + 1) ? (i - 1) : M] iNNcorr;
vector[ i < (M + 1) ? (i - 1) : M] v;
row_vector[i < (M + 1) ? (i - 1) : M] v2;
dim = (i < (M + 1))? (i - 1) : M;
if(dim == 1){iNNdistM[1, 1] = kappa_p_1;}
else{
h = 0;
for (j in 1:(dim - 1)){
for (k in (j + 1):dim){
h = h + 1;
iNNdistM[j, k] = exp(- phi * NN_distM[(i - 1), h]);
iNNdistM[k, j] = iNNdistM[j, k];
}
}
for(j in 1:dim){
iNNdistM[j, j] = kappa_p_1;
}
}
iNNCholL = cholesky_decompose(iNNdistM);
iNNcorr = to_vector(exp(- phi * NN_dist[(i - 1), 1: dim]));
v = mdivide_left_tri_low(iNNCholL, iNNcorr);
V[i] = kappa_p_1 - dot_self(v);
v2 = mdivide_right_tri_low(v', iNNCholL);
U[i] = U[i] - v2 * YXb[NN_ind[(i - 1), 1:dim]];
}
V[1] = kappa_p_1;
return - 0.5 * ( 1 / sigmasq * dot_product(U, (U ./ V)) +
sum(log(V)) + N * log(sigmasq));
}
}
data {
int<lower=1> N;
int<lower=1> M;
int<lower=1> P;
vector[N] Y;
matrix[N, P + 1] X;
int NN_ind[N - 1, M];
matrix[N - 1, M] NN_dist;
matrix[N - 1, (M * (M - 1) / 2)] NN_distM;
vector[P + 1] uB;
matrix[P + 1, P + 1] VB;
real ss;
real st;
real ap;
real bp;
}
transformed data {
cholesky_factor_cov[P + 1] L_VB;
L_VB = cholesky_decompose(VB);
}
parameters{
vector[P + 1] beta;
real<lower = 0> sigma;
real<lower = 0> tau;
real<lower = 0> phi;
}
transformed parameters {
real sigmasq = square(sigma);
real tausq = square(tau);
}
model{
beta ~ multi_normal_cholesky(uB, L_VB);
phi ~ gamma(ap, bp);
sigma ~ normal(0, ss);
tau ~ normal(0, st);
Y ~ nngp(X * beta, sigmasq, tausq, phi, NN_dist, NN_distM, NN_ind, N, M);
}
```

The following chunk is the R code for running the latent NNGP model:

```
options(mc.cores = parallel::detectCores())
data <- list(N = N, M = M, P = P,
Y = Y[NN.matrix$ord], X = X[NN.matrix$ord, ], # sorted Y and X
NN_ind = NN.matrix$NN_ind,
NN_dist = NN.matrix$NN_dist,
NN_distM = NN.matrix$NN_distM,
uB = uB, VB = VB, ss = ss, st = st, ap = ap, bp = bp)
myinits <-list(list(beta = c(1, 5), sigma = 1, tau = 0.5, phi = 12,
w_b1 = rep(0, N)),
list(beta = c(5, 5), sigma = 1.5, tau = 0.2, phi = 5,
w_b1 = rep(0.1, N)),
list(beta = c(0, 0), sigma = 2.5, tau = 0.1, phi = 9 ,
w_b1 = rep(0, N)))
parameters <- c("beta", "sigmasq", "tausq", "phi", "w")
samples_w <- stan(
file = "nngp_latent.stan",
data = data,
init = myinits,
pars = parameters,
iter = 600,
chains = 3,
thin = 1,
seed = 123
)
```

The full Stan program for the latent NNGP model is in the file “nngp_latent.stan”.

`writeLines(readLines('nngp_latent.stan'))`

```
/* Latent NNGP model*/
functions{
real nngp_w_lpdf(vector w, real sigmasq, real phi, matrix NN_dist,
matrix NN_distM, int[,] NN_ind, int N, int M){
vector[N] V;
vector[N] I_Aw = w;
int dim;
int h;
for (i in 2:N) {
matrix[ i < (M + 1)? (i - 1) : M, i < (M + 1)? (i - 1): M]
iNNdistM;
matrix[ i < (M + 1)? (i - 1) : M, i < (M + 1)? (i - 1): M]
iNNCholL;
vector[ i < (M + 1)? (i - 1) : M] iNNcorr;
vector[ i < (M + 1)? (i - 1) : M] v;
row_vector[i < (M + 1)? (i - 1) : M] v2;
dim = (i < (M + 1))? (i - 1) : M;
if(dim == 1){iNNdistM[1, 1] = 1;}
else{
h = 0;
for (j in 1:(dim - 1)){
for (k in (j + 1):dim){
h = h + 1;
iNNdistM[j, k] = exp(- phi * NN_distM[(i - 1), h]);
iNNdistM[k, j] = iNNdistM[j, k];
}
}
for(j in 1:dim){
iNNdistM[j, j] = 1;
}
}
iNNCholL = cholesky_decompose(iNNdistM);
iNNcorr = to_vector(exp(- phi * NN_dist[(i - 1), 1:dim]));
v = mdivide_left_tri_low(iNNCholL, iNNcorr);
V[i] = 1 - dot_self(v);
v2 = mdivide_right_tri_low(v', iNNCholL);
I_Aw[i] = I_Aw[i] - v2 * w[NN_ind[(i - 1), 1:dim]];
}
V[1] = 1;
return - 0.5 * ( 1 / sigmasq * dot_product(I_Aw, (I_Aw ./ V)) +
sum(log(V)) + N * log(sigmasq));
}
}
data {
int<lower=1> N;
int<lower=1> M;
int<lower=1> P;
vector[N] Y;
matrix[N, P + 1] X;
int NN_ind[N - 1, M];
matrix[N - 1, M] NN_dist;
matrix[N - 1, (M * (M - 1) / 2)] NN_distM;
vector[P + 1] uB;
matrix[P + 1, P + 1] VB;
real ss;
real st;
real ap;
real bp;
}
transformed data {
cholesky_factor_cov[P + 1] L_VB;
L_VB = cholesky_decompose(VB);
}
parameters{
vector[P + 1] beta;
real<lower = 0> sigma;
real<lower = 0> tau;
real<lower = 0> phi;
vector[N] w;
}
transformed parameters {
real sigmasq = square(sigma);
real tausq = square(tau);
}
model{
beta ~ multi_normal_cholesky(uB, L_VB);
phi ~ gamma(ap, bp);
sigma ~ normal(0, ss);
tau ~ normal(0, st);
w ~ nngp_w(sigmasq, phi, NN_dist, NN_distM, NN_ind, N, M);
Y ~ normal(X * beta + w, tau);
}
```

In this section, we will show the results of the simulation study, compare response and latent NNGP models, and provide suggestions on how to use NNGP based models.

Response NNGP model is faster and easier to sample from posterior distribution than the latent NNGP models. The following shows the summary table and trace plot of the posterior samples from response NNGP model.

`print(samples)`

```
Inference for Stan model: nngp_response.
3 chains, each with iter=600; warmup=300; thin=1;
post-warmup draws per chain=300, total post-warmup draws=900.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff
beta[1] 0.76 0.03 0.60 -0.44 0.45 0.78 1.09 1.89 385
beta[2] 5.00 0.00 0.03 4.95 4.98 5.00 5.02 5.06 660
sigmasq 2.63 0.11 1.34 1.49 1.90 2.26 2.92 5.57 144
tausq 0.10 0.00 0.03 0.04 0.08 0.10 0.12 0.16 406
phi 4.44 0.10 1.48 1.70 3.42 4.44 5.50 7.38 239
lp__ -99.50 0.12 1.88 -104.15 -100.39 -99.13 -98.26 -97.04 228
Rhat
beta[1] 1.01
beta[2] 1.00
sigmasq 1.01
tausq 1.00
phi 1.00
lp__ 1.01
Samples were drawn using NUTS(diag_e) at Wed Jan 31 08:54:29 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```

`stan_trace(samples)`