When data has a spatio-temporal structure and when observations from neighboring regions exhibit higher correlation than observations between distant regions, this correlation can be accounted for using the class of spatial models called “CAR” models (Conditional Auto-Regressive) introduced by Besag and collaborators. This case study covers how to efficiently code these models in Stan.

All models and data files are available in the Stan example-models GitHub repo for Stan case studies: car-iar-poisson. All commands should be run from the directory stan-dev/example-models/knitr/car-iar-poisson.

## Spatial Data as a Gaussian Random Field Model

The following math and its notation is taken from “Gaussian Random Field Models for Spatial Data” by Murali Haran, which is Chapter 18 of the “Handbook of Markov Chain Monte Carlo”.

Besag (1974) shows that by encoding the neighbor relations between spatial regions as a lattice, results from the physics of lattice systems of particles and the Hammersly-Clifford theorem provide an equivalence between a local specification of the conditional distribution of each particle given its neighboring particles and the global specification of the joint distribution of all particles. This specification of the joint distribution via the local specification of the conditional distributions of the individual variables is a Markov random field specification.

Therefore, given a set of observations taken at $$n$$ different subregions of a region with a number of dimensions $$D$$ (for spatio-temporal data, the number of dimensions is usually between 1 and 4, i.e., 1-3 spatial dimensions and 1 time dimension), spatial interactions between regions $$n_i$$ and $$n_j$$ can be modelled conditionally as a spatial random variable $$\mathbf{w}$$ as follows:

• Let $$w_{-i}$$ denote the $$n$$-length vector $$w$$ excluding $$w_i$$.

• We model each $$w_i$$ in terms of its full conditional distribution which is its distribution given the remaining random variables, $$\mathbf{w}_{-i}$$: $w_i \,\vert\, \mathbf{w}_{-i}, \Theta \sim N \left( \sum_{j=0}^n {c}_{ij} w_j,\kappa_i^{-1} \right), i= 1,\ldots,n$ where $$c_{ij}$$ describes the neighborhood structure such that $$c_{ij}$$ is nonzero only if $$i$$ and $$j$$ are neighbors and $$\kappa$$ is the precision (inverse variance) parameter.

### CAR Models

The neighborhood structure of the $$\kappa$$ and $$c_{ij}$$ elements can be stored in an $$n \times n$$ matrix $$Q$$ where the diagonal elements represent each of the $$n$$ subregions with value $$\kappa_i$$ and the off-diagonal elements contain $$-\kappa_i c_{ij}$$ if subregions $$i$$ and $$j$$ are adjacent and 0 otherwise. Usually a common precision parameter $$\tau$$, is assumed, where $${\kappa}_i = \tau$$ for all $$i$$.

When the matrix $$Q$$ is symmetric and positive definite, this specifies a valid joint distribution, $w \,\vert\, \Theta \sim N(0, Q^{-1})$ with $$\Theta$$ the vector of the precision parameters. This provides a proper prior for a CAR model. However evaluation of $$w$$ requires computing the covariance matrix $$Q^{-1}$$, which is computationally expensive for large values of $$n$$.

See the Stan case study Exact sparse CAR models in Stan, for further discussion of this model.

### IAR Models

Intrinsic Auto-Regressive models are intrinsic Gaussian Markov random fields, (see Besag and Kooperberg 1995). They are a subclass of CAR models which have an improper prior. Spatial models which use this improper prior are most correctly called IAR models, but the distinction between CAR and IAR models is often not made, especially in software packages which implement these models.

For this class of models, $$Q$$ is a positive semidefinite matrix. The off-diagonals of $$Q$$ are $$-{\tau} c_{ij}$$ and the value of the $$i^{th}$$ diagonal element is $$\tau \sum_{j} c_{ij}$$. This intrinsic GMRF model corresponds to the following conditional specification:

$f(w_j \vert \mathbf{w}_{-i}, \tau) \sim N \left( \frac{ \sum_{j \in N(i)}^n w_j} {n}, \frac{1}{n_i,\tau} \right)$

The individual spatial random variable $$w_j$$ for region $$n_j$$ with neighbors $$N(i)$$ is normally distributed with a mean equal to the average of its neighbors. The variance decreases as the number of neighbors increases.

Although this is an improper prior, given data, this results in a proper posterior density. Computing this density in Stan is computationally tractable due to the fact that Stan is computing proportional densities, allowing constant terms to drop out; among these is the term which requires computation of the determinant of this matrix. The density of the random variable $$\mathbf{w}$$ is:

$f (\mathbf{w} \vert \Theta) \propto \tau^{(N−1)/2} \exp(−\mathbf{w}^TQ(\tau)\mathbf{w})$

When the neighborhood graph contains disconnected subsets, term $$\tau^{(N−1)/2}$$ must be changed to $$\tau^{(N−k)/2}$$ where $$k$$ is the number of distinct subsets.

NOTE: this is only valid when all the regions in the adjacency matrix have at least 1 neighbor, that is, for all subregions $$n_i$$, the set $$j \sim i$$ is non-empty.

On the log scale, this is computed as:

$\frac{(N−1)}{2}log(\tau) + −\mathbf{w}^TQ(\tau)\mathbf{w}$

Although the computation of the matrix determinant has been eliminated, we still need to do matrix multiplication. An efficient representation of a sparse adjacency matrix for a set of $$N$$ subregions is:

• an array of length $$N$$ which contains the number of neighbors for each region; this is used to compute the diagonal elements of $$Q$$.

• an $$L \times 3$$ array where $$L$$ is the number of the non-zero off-diagonal elements of the adjacency matrix. The first two columns provide the $$i,j$$ region ids respectively, and the 3rd column contains the value $$c_{ij}$$, (the weight contributed by this neighbor). This 3rd column can be omitted altogether when all weights are the same.

The following Stan program fragments shows how to compute this in Stan.

The sparse adjacency matrix is passed in as data. The diagonal and off-diagonal elements are stored separately. Since all off-diagonal values in the adjacency matrix are -1, we only need to store the $$i,j$$ coordinates of the off-diagonal elements:

data {
int<lower=1> diag_weights[N_regions];  // weights == num_neighbors
int<lower=1> off_diag_coords[N_links,2]; // ij coords of off-diag entries
}

The IAR spatial component parameters are:

parameters {
vector[N_regions] h;  // individual-level spatial effect (IAR)
real<lower=0> tau;  // precision param
}

Stan computes the contribution of $$\frac{(N−1)}{2}log(\tau) + −\mathbf{w}^TQ(\tau)\mathbf{w}$$ to the log density as shown below. The diagonal and off-diagonal elements are computed separately. The weight of the off-diagonal elements is always -1.

transformed parameters {
real neg_tau_div_2 = -tau * 0.5;
}
model {
real off_diag_weight = -1.0;
...
target += ((N_tracts - 1) / 2.0) * log(tau);
for (i in 1:N_tracts) { // diagonals
target += neg_tau_div_2 * square(h[i]) * diag_weights[i];
}
for (j in 1:N_links) {   // off-diagonals
target += neg_tau_div_2 *
h[off_diag_coords[j,1]] * h[off_diag_coords[j,2]] * off_diag_weight;
}
...
}

## Multi-level Poisson GLMs for Areal Count Data

Adding a CAR (or IAR) spatially structured error term to a multi-level GLM as a random-effects multi-variate Gaussian provides spatial smoothing of the resulting estimates. For count data, e.g. incidents of disease or accidents, Poisson CAR models are available via R, BUGS, and JAGS as well as INLA (Integrated Nested Laplace Approximation) which is a fast alternative to MCMC, (INLA trades speed and scalability for accuracy, per the “no free lunch” principle).

### The Besag York Mollié (BYM) Model

A popular model for count data in biostatistics and epidemiology is a lognormal Poisson model proposed in Besag York Mollié 1991 which includes both an IAR component for spatial smoothing as well as an ordinary random-effects component for non-spatial heterogeneity. Banerjee Carlin and Gelfand 2003, section 5.4, presents the details of this model and its difficulties, together with a WinBUGS implementation (Figure 5.11) which is used to fit the Scottish lip cancer dataset from Clayton and Kaldor 1987. Using the notation of Banerjee et al., the Besag York Mollié model is:

$Y_i\, |\, {\psi}_i \sim Poisson ( E_i,e^{\psi_i}) ,$ $where\ \ {\psi}_i = {x'}_i\, \beta + \theta_i + \phi_i$

The $${x'}_i$$ are the explanatory spatial covariates having parameter coefficients . The $$\theta_i$$ are the ordinary random-effects components. The $$\phi_i$$ is an IAR spatial component.

As noted above, the IAR prior is an improper prior. Besag and Kooperberg 1995 show that the IAR prior is a pairwise difference prior which is identified up to an additive constant. Thus, models which include both the IAR prior and an intercept term are non-identifiable.

The intercept term can be estimated by constraining the random effects to sum to zero and specifying a separate intercept term with a location invariant Uniform($$-\infty$$, $$+\infty$$) prior. (The example WinBUGS program does this: the prior dflat() corresponds to an improper (flat) prior on the whole real line; the function CAR.normal perform the recentering of $$\phi$$.) This is equivalent to the unconstrained parameterization with no separate intercept, Since Stan works on the unconstrained parameters scale, we omit the intercept term from the model and compute it in the generated quantities block.

Another challenge for the BYM model is determining how much extra-Poisson variability should be allocated to the spatial $$\phi$$ and non-spatial $$\theta$$ random effects components. Banerjee et al. investigate the use of different gamma hyperpriors for both $$\phi$$ and $$\theta$$.

### A Stan Implementation of the BYM Model

We build a Stan version of the BYM model for the Scotland Lip cancer dataset, distributed as part of the CARBayesdata R package. This dataset consists of:

• y: the observed lip cancer case counts on a per-county basis
• x_aff: an area-specific continuous covariate that represents the proportion of the population employed in agriculture, fishing, or forestry (AFF)
• e_pop: the an expected number of cases, used as an offset,

In Stan, we would write the Poisson regression as:

  y ~ poisson_log(beta_1 + beta_2 * x_aff + log(e_pop));

where beta_1 and beta_2 are the regression intercept and slope parameters to be estimated.

The Stan program scotland_bym.stan. expands this model to include both a non-centered random-effects component and an IAR spatial component. As noted above, we compute the intercept term in the generated quantities block, instead of estimating it as part of the model.

The spatial information is coded as a sparse matrix and the IAR component is computed as described above:

data {
int<lower=1> N_areas;
int<lower=0> y[N_areas]; // number of events per area
vector[N_areas] x_aff;  // covariate: pct pop employed in AFF
vector[N_areas] e_pop;  // exposure: population
int<lower=1> diag_weights[N_areas]; // weights == num_neighbors
int<lower=1> off_diag_coords[N_links,2]; // ij coords of off-diag entries
}
transformed data {
vector[N_areas] log_e_pop = log(e_pop);
}
parameters {
real beta_2;   // slope
vector[N_areas] h;  // individual-level spatial effect (IAR)
real<lower=0> tau;  // precision param
vector[N_areas] re_nc;  // individual-level random effect
real<lower=0> sigma;   // scale of random effect
}
transformed parameters {
real neg_tau_div_2 = -tau * 0.5;
}
model {
real off_diag_weight = -1.0;
y ~ poisson_log(beta_2 * x_aff + log_e_pop + h + re_nc * sigma);
beta_2 ~ normal(0, 1);
tau ~ gamma(1.0, 1.0);  // following Banerjee et al 2003
re_nc ~ normal(0, 1);
sigma ~ gamma(3.3, 1.8);   // following Banerjee et al 2003
target += ((N_areas - 1) * 0.5) * log(tau);
for (i in 1:N_areas) { // diagonals
target += neg_tau_div_2 * square(h[i]) * diag_weights[i];
}
for (j in 1:N_links) {   // off-diagonals
target += neg_tau_div_2
* h[off_diag_coords[j,1]] * h[off_diag_coords[j,2]] * off_diag_weight;
}
}
generated quantities {
real beta_1;
vector[N_areas] eta = h + re_nc * sigma;
vector[N_areas] mu = exp(beta_2 * x_aff + log_e_pop + eta);
vector[N_areas] intercepts;
for (i in 1:N_areas) {
intercepts[i] = y[i] - mu[i];
}
beta_1 = mean(intercepts);
}

To run this program with the data, we need to load the following R packages:

library(gpclib);
library(maptools);
library(spdep);
gpclibPermit()

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

The Scotland data from the CARBayes package is included in the directory scotland_data:

scotland_shp = readShapePoly('scotland_data/scotland');
scotland_shp = scotland_shp[order(scotland_shp$ID),]; x_aff = scotland_shp$pcaff;
y = scotland_shp$Observed; e_pop = scotland_shp$Expected;

The scotland_data includes the geolocation information used to create the adjacency matrix, using functions from the R maptools and spdep packages. We use spdep package function poly2nb to get a sparse representation of the adjacency matrix coded up as an nb object which is a list of lists of the adjacent counties for each tract, referenced by county id. We have written an R script which factors this nb object into the array of i,j coordinates for the off-diagonal elements of this matrix. Given this, we create inputs: N_areas, N_links, diag_weights, off_diag_coords as follows:

scotland_nb = poly2nb(scotland_shp);
source("munge_data_helper.R")
off_diag_coords = get_nb_off_diags(scotland_nb);
N_areas = length(scotland_nb);
diag_weights = card(scotland_nb);
N_links = nrow(off_diag_coords);

We use RStan to fit the Stan model scotland_bym.stan with this data:

scot_fit = stan("scotland_bym.stan",
data=list(N_areas, y, x_aff, e_pop, diag_weights, N_links, off_diag_coords),
iter = 10000, warmup = 9000, control = list("adapt_delta=0.99"));

traceplot(scot_fit,
pars=c("lp__",  "beta_2", "tau", "sigma", "h[1]",  "h[2]",  "re_nc[1]",  "re_nc[2]"));