Introduction

This case study shows how we can apply Bayesian inference to Hidden Markov Models (HMMs) using Stan to extract useful information from basketball player tracking data. Specifically we show how to tag drive events and how to determine defensive assignment. Before diving into basketball data we show how to fit an HMM in Stan using a simple example. This should help build some intuition for those who are unfamiliar with HMMs and will also show how to specify an HMM using Stan.

The code used to build this document is available in the following GitHub repository: https://github.com/imadmali/bball-hmm.

Simple HMM Example

HMMs enable you to model a series of observed values. The generative model produces an observation for each latent state, so you also have a series of states that corresponds to the series of observed values. The state series exhibits the Markov property so the value of the state at time \(t\) only depends on the value of the state at time \(t-1\). Often the states are hidden so the goal of inference using an HMM is to,

  1. Estimate the parameters that allow you to transition from one state to the next and (given the state) the parameters involved in generating the observation.
  2. Predict the most likely state sequence based on the observed data and the parameters estimated in (1).

The plot below outlines an HMM that we simulated. The model involves one sequence of observed outcomes generated from the normal distribution and two states. At each time step we are in one of the two states. Each of the states corresponds to a location parameter from the normal distribution. This means that the observed value is generated depending on which one of two location parameters is selected, which in turn depends on which state you are in. In more complicated data you could have multiple observations and many more states at each time step.

You can see how state 1 corresponds to smaller values of the outcome while state 2 corresponds to relatively higher values. In most real-world situations you do not know the state value at each time step because it is hidden. So in order to fit the model and infer the state sequence you first need to make an assumption about how many possible states there might be at each time step.

hmm_data <- readRDS("../data/hmm_example.RDS")
z <- hmm_data$z
y <- hmm_data$y
par(mfrow=c(2,1))
plot(hmm_data$z, type="s",
     main = "Hidden States",
     ylab = "State Value",
     xlab = "Time",
     ylim = c(0.5,2.5), yaxt = "n")
axis(2, 1:2, 1:2)
plot(hmm_data$y, type = "l",
     main = "Observed Output",
     ylab = "Observation Value",
     xlab = "Time")
y_plt <- hmm_data$y
y_plt[hmm_data$z==1] <- NA
lines(y_plt, lwd = 3)
legend("bottomright", c("State 1","State 2"), lty = c(1,1), lwd = c(1,3), cex = 0.8)

Suppose we don’t know how many states there are at each time step \(t\), but we assume that there are two states, \(z_t \in \{1,2\}\). The full generative model can be defined as,

\[ \begin{align*} y_t &\sim \mathcal{N}(\mu_{z_t}, 1) \\ z_t &\sim \mathrm{Categorical}(\boldsymbol{\theta}_{z_{t-1}}) \\ &\mbox{priors on } \mu_k \mbox{ and } \boldsymbol{\theta} \end{align*} \]

This says that the distribution of each observation \(y_t\) is normal with location parameter \(\mu_{z_t}\) and constant scale. Since \(z_t\) can take two values at each time step, there are two location parameters, \(\mu_{z_t} \in \{\mu_1,\mu_2\}\). The model also says that the state variable follows the categorical distribution parameterized by \(\boldsymbol{\theta}_{z_{t-1}}\), the row of the transition matrix corresponding to the previous state. Each such row contains two probabilities (one for each state that \(z_t\) can move to) and, by the properties of the categorical distribution, sums to 1. This means that \(\boldsymbol{\theta}\) is a \(2 \times 2\) stochastic matrix (also known as a transition matrix).
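
To make the generative model concrete, the following R snippet simulates a state and observation sequence from it. The specific parameter values here (the transition matrix theta_sim and the location parameters mu_sim) are hypothetical choices for illustration, not the values used to generate the data plotted above.

set.seed(123)
N_sim <- 100
mu_sim <- c(3, 9)                                     # hypothetical location parameters
theta_sim <- matrix(c(0.7, 0.3,                       # hypothetical transition matrix;
                      0.1, 0.9), 2, 2, byrow = TRUE)  # each row sums to 1
z_sim <- integer(N_sim)
y_sim <- numeric(N_sim)
z_sim[1] <- sample(1:2, 1)                            # uniform initial state
y_sim[1] <- rnorm(1, mu_sim[z_sim[1]], 1)
for (t in 2:N_sim) {
  z_sim[t] <- sample(1:2, 1, prob = theta_sim[z_sim[t-1], ])  # Markov transition
  y_sim[t] <- rnorm(1, mu_sim[z_sim[t]], 1)                   # emission given the state
}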

A graphical representation of the model is provided below.

The parameters that we want to estimate in this model are the transition probabilities (\(\boldsymbol{\theta}\)) and the parameters associated with the emission probabilities (\(\mu_1\),\(\mu_2\)).

We can interpret an element in the stochastic matrix \(\boldsymbol{\theta}\) as the probability of going from the row state to the column state from time \(t-1\) to time \(t\). So the diagonal elements give the probability of staying in the same state, and the off-diagonal elements give the probability of transitioning from one state to the next. Continuing with our example we have,
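
\[ \boldsymbol{\theta} = \begin{bmatrix} \theta_{1,1} & \theta_{1,2} \\ \theta_{2,1} & \theta_{2,2} \end{bmatrix} = \begin{bmatrix} p(z_t = 1 \mid z_{t-1} = 1) & p(z_t = 2 \mid z_{t-1} = 1) \\ p(z_t = 1 \mid z_{t-1} = 2) & p(z_t = 2 \mid z_{t-1} = 2) \end{bmatrix} \]

where each row sums to 1. (This simply writes out the generic form of the matrix; the estimated probabilities for this example are reported in the model fit below.)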

Since the data in this example are normally distributed the emission probabilities come from the normal distribution. The emission probabilities depend on the location and scale parameter associated with each state (i.e. the emission parameters). In our model we assume the scale parameter is known and only have to estimate the location parameters \(\mu_1\) and \(\mu_2\) for states 1 and 2.

In order to estimate the parameters we need to define the posterior distribution, which requires us to specify the likelihood of the data and the priors. The likelihood is the probability of observing that particular sequence of outcome values, marginalizing over the hidden state sequence. The forward algorithm computes this marginalization efficiently, \[ p(y | \theta, \mu) = \sum_{z} p(y,z | \theta, \mu) \]
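
Written out (with \(\alpha_t(k)\) denoting the joint probability of the first \(t\) observations and \(z_t = k\), a notation introduced here only for exposition), the forward recursion is

\[ \begin{align*} \alpha_1(k) &= p(z_1 = k)\, p(y_1 \mid \mu_k) \\ \alpha_t(k) &= p(y_t \mid \mu_k) \sum_{j=1}^{K} \alpha_{t-1}(j)\, \theta_{j,k} \\ p(y \mid \theta, \mu) &= \sum_{k=1}^{K} \alpha_N(k) \end{align*} \]

The Stan program later in this section implements this recursion on the log scale (the array called gamma), treating the initial state distribution as uniform.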

The priors are defined on the stochastic matrix and on the emission parameters. Since each row of the stochastic matrix sums to 1, a natural prior choice is the Dirichlet distribution on each row of the matrix.

We can then use Stan’s variant of the Hamiltonian Monte Carlo algorithm to estimate the stochastic matrix and emission parameters. Once we estimate the parameters we can determine the most probable state sequence that generated the sequence of observations. This can be computed with the Viterbi algorithm.
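
For completeness, the Viterbi algorithm replaces the sum in the forward recursion with a maximization. Writing \(\delta_t(k)\) for the probability of the most likely state path ending in state \(k\) at time \(t\) (again taking the initial state distribution as uniform, as the code below does),

\[ \begin{align*} \delta_1(k) &= p(y_1 \mid \mu_k) \\ \delta_t(k) &= p(y_t \mid \mu_k) \max_{j} \left[ \delta_{t-1}(j)\, \theta_{j,k} \right] \end{align*} \]

The most likely sequence is then recovered by taking \(z^*_N = \arg\max_k \delta_N(k)\) and backtracking through the states that achieved each maximum. This is what the best_logp and back_ptr variables compute (on the log scale) in the generated quantities block below.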

Specifying the Model

Below we represent the HMM in Stan (also available in models/hmm_example.stan). The model {} block specifies the priors and uses the forward algorithm to compute the marginal likelihood of the observed sequence, and the generated quantities {} block implements the Viterbi algorithm, which enables us to determine the most likely state sequence. This model was adapted from the Stan User’s Guide.

Notice that we have chosen to declare \(\mu\) as positive_ordered[K] mu instead of real mu[K]. This constraint enforces a strict ordering of the two \(\mu\) values associated with the two states. While in theory priors that separate the two location parameters should be enough to identify the model, in practice the sampler benefits from a parameterization that rules out label switching, so we enforce an ordering. If we don’t, the chains may each settle on a different labeling of the states and fail to converge (see Betancourt 2017 for more detail). An example of omitting the constraint is provided in models/hmm_example_bad.stan.

data {
  int<lower=0> N;
  int<lower=0> K;
  real y[N];
}

parameters {
  simplex[K] theta[K];
  // real mu[K];
  positive_ordered[K] mu;
}

model {
  // priors
  target += normal_lpdf(mu[1] | 3, 1);
  target += normal_lpdf(mu[2] | 10, 1);
  // forward algorithm
  {
  real acc[K];
  real gamma[N, K];
  for (k in 1:K)
    gamma[1, k] = normal_lpdf(y[1] | mu[k], 1);
  for (t in 2:N) {
    for (k in 1:K) {
      for (j in 1:K)
        acc[j] = gamma[t-1, j] + log(theta[j, k]) + normal_lpdf(y[t] | mu[k], 1);
      gamma[t, k] = log_sum_exp(acc);
    }
  }
  target += log_sum_exp(gamma[N]);
  }
}

generated quantities {
  int<lower=1,upper=K> z_star[N];
  real log_p_z_star;
  {
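    // Viterbi algorithm: recover the most likely state sequence given the sampled theta and mu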
    int back_ptr[N, K];
    real best_logp[N, K];
    for (k in 1:K)
      best_logp[1, k] = normal_lpdf(y[1] | mu[k], 1);
    for (t in 2:N) {
      for (k in 1:K) {
        best_logp[t, k] = negative_infinity();
        for (j in 1:K) {
          real logp;
          logp = best_logp[t-1, j] + log(theta[j, k]) + normal_lpdf(y[t] | mu[k], 1);
          if (logp > best_logp[t, k]) {
            back_ptr[t, k] = j;
            best_logp[t, k] = logp;
          }
        }
      }
    }
    log_p_z_star = max(best_logp[N]);
    for (k in 1:K)
      if (best_logp[N, k] == log_p_z_star)
        z_star[N] = k;
    for (t in 1:(N - 1))
      z_star[N - t] = back_ptr[N - t + 1, z_star[N - t + 1]];
  }
}

Fitting the Model

We fit the model to the data provided above in order to estimate the parameters theta and mu along with the hidden state sequence z_star.

# code available in hmm_example.R
library(rstan)      # provides stan()
library(bayesplot)  # provides mcmc_trace(), used below
stan_data <- list(N = length(hmm_data$y),
                  K = 2,
                  y = hmm_data$y)
hmm_fit <- stan("../models/hmm_example.stan", data = stan_data, iter = 1e3, chains = 4)

Post-estimation Validation

The post-estimation steps that we take to validate the model are,

  1. Diagnostics: Make sure that the chains for each parameter have converged. This can be evaluated by examining the R-hat and effective sample size values for each parameter.
  2. Predictions: Specifically we perform a posterior predictive check. Using the estimated parameters and the predicted state sequence we can simulate multiple output sequences and see if they line up with the observed output sequence. Additionally, since we know the true state values, we can check that the predicted state values line up with the true state values.

We go through these steps below.

Diagnostics

For the transition probabilities \(\boldsymbol{\theta}\) and the parameters associated with the emission probabilities \(\mu\) we have the following parameter estimates.

print(hmm_fit, pars = "z_star", include = FALSE, probs = c(0.05,0.95))
## Inference for Stan model: hmm_example.
## 4 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##                 mean se_mean   sd      5%     95% n_eff Rhat
## theta[1,1]      0.66    0.00 0.10    0.50    0.81  1647    1
## theta[1,2]      0.34    0.00 0.10    0.19    0.50  1647    1
## theta[2,1]      0.07    0.00 0.03    0.03    0.12  1638    1
## theta[2,2]      0.93    0.00 0.03    0.88    0.97  1638    1
## mu[1]           3.03    0.01 0.23    2.67    3.40  1046    1
## mu[2]           8.83    0.00 0.11    8.64    9.01  2305    1
## log_p_z_star -166.18    0.04 1.34 -168.85 -164.60   882    1
## lp__         -170.20    0.05 1.39 -173.00 -168.57   772    1
## 
## Samples were drawn using NUTS(diag_e) at Sat Oct  5 14:02:48 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

With Rhat values close to 1 and high enough effective sample sizes (n_eff) we can conclude that the chains have converged and the parameter estimates are reliable.
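
If you would rather check these diagnostics programmatically than by reading the table, something along the following lines works with rstan's summary method (the 1.1 threshold is a common rule of thumb, not a strict cutoff).

fit_summary <- summary(hmm_fit, pars = c("theta", "mu"))$summary
all(fit_summary[, "Rhat"] < 1.1)   # TRUE if no parameter shows signs of non-convergence
min(fit_summary[, "n_eff"])        # smallest effective sample size across these parameters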

We can also look at the traceplots to get an idea of what convergence among chains looks like.

mcmc_trace(as.array(hmm_fit), regex_pars = "^theta\\[|^mu\\[", facet_args = list(nrow = 2))

Below we have diagnostics from a situation where priors are defined but an ordering is not enforced (see hmm_example_bad_fit.R for the code). You can see that the parameter chains have not converged: the Rhat values are well above 1 and the effective sample sizes are very small, both indicators of poor convergence.

# code available in hmm_example_bad.R
hmm_bad_stan <- readRDS("../results/hmm_example_bad.RDS")
print(hmm_bad_stan$fit, pars = "z_star", include = FALSE, probs = c(0.05,0.95))
## Inference for Stan model: hmm_example_bad.
## 4 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##                 mean se_mean    sd      5%     95% n_eff  Rhat
## theta[1,1]      0.80    0.09  0.14    0.55    0.96     2  2.18
## theta[1,2]      0.20    0.09  0.14    0.04    0.45     2  2.18
## theta[2,1]      0.20    0.09  0.15    0.04    0.47     3  2.13
## theta[2,2]      0.80    0.09  0.15    0.53    0.96     3  2.13
## mu[1]           5.88    2.02  2.86    2.73    8.88     2 16.61
## mu[2]           6.10    1.93  2.73    3.08    8.96     2 16.71
## log_p_z_star -166.91    0.42  1.86 -170.48 -164.72    20  1.07
## lp__         -192.63   13.78 19.54 -214.04 -171.64     2 15.08
## 
## Samples were drawn using NUTS(diag_e) at Tue Oct  1 23:22:14 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
mcmc_trace(as.array(hmm_bad_stan$fit), regex_pars = "^theta\\[|^mu\\[", facet_args = list(ncol = 2))

Posterior Predictions

Below we plot 100 predicted outcome sequences given the predicted states and emission parameter estimates. (These predictions could also be implemented in the generated quantities {} block of our Stan model.) Notice how these posterior predictions based on the data line up nicely with the observed output values. This is one indication that the data generation process was appropriately modeled.

# extract samples
samples <- as.matrix(hmm_fit)
theta <- samples[,grep("^theta",colnames(samples))]
mu <- samples[,grep("^mu",colnames(samples))]
z_star <- samples[,grep("^z_star",colnames(samples))]

# simulate observations for each iteration in the sample
y_hat <- list()
for (i in 1:nrow(samples)) {
  psi_seq <- sapply(z_star[i,], function(x){mu[i,x]})
  y_hat[[i]] <- rnorm(length(psi_seq), psi_seq, 1)
}

# plot
indxs <- sample(length(y_hat), 100, replace = FALSE)
plot(hmm_data$y, type = "n",
     main = "Observed vs Predicted Output",
     ylab = "Observation Value",
     xlab = "Time",
     ylim = c(0,11))
for (i in indxs) {
  lines(y_hat[[i]], col = "#ff668890")
}
lines(hmm_data$y, lwd = 2)
legend("bottomright", c("Observed","Predicted"), col = c("#000000","#ff668890"), lty = c(1,1), lwd = c(2,1), cex = 0.8)

Below we overlay the predicted states with the true states. Similar to the conclusion above, our predictions line up nicely with the true values. Note that we can only do this because we know the true value of the hidden states. In situations where the states are truly hidden this step is infeasible.

# visualization
par(mfrow=c(2,1))
plot(hmm_data$z, type="s",
     main = "Latent States",
     ylab = "State Value",
     xlab = "Time",
     ylim = c(0.5,2.5), yaxt = "n")
axis(2, 1:2, 1:2)
points(colMeans(z_star), cex = 0.5)
legend("bottomright", c("Actual","Predicted"), pch = c(NA,1), lty = c(1,NA), cex = 0.5)
plot(hmm_data$y, type = "l",
     main = "Observed Output",
     ylab = "Observation Value",
     xlab = "Time")
y_plt <- hmm_data$y
y_plt[hmm_data$z==1] <- NA
lines(y_plt, lwd = 3)
legend("bottomright", c("State 1","State 2"), lty = c(1,1), lwd = c(1,3), cex = 0.8)

We can now adapt the methodology discussed in this section to identify drive events and defensive assignments using basketball player tracking data.

Tagging Drive Events

For those who are unfamiliar with basketball, a drive occurs when an offensive player dribbles the ball towards the hoop for a shot attempt (often a layup). We can translate a drive into two types of events that are happening simultaneously over time until the shot attempt,

  • The player’s speed increases as they accelerate towards the hoop.
  • The player’s distance from the hoop decreases.

The video below illustrates what a drive event looks like in the player tracking data (see drive_data.R for code). This drive possession was attributed to Zach LaVine in the Minnesota Timberwolves v Boston Celtics game on 12/21/2015.

Pre-process Data

Using the player tracking data we can construct the speed and distance metrics associated with LaVine’s drive event. We define speed as distance over time and use Euclidean distance to determine the player’s distance from the hoop at each time step. Below we show these metrics along with the player tracking data (see data/pt_data_evt140_drive.R for code). Notice how LaVine decreases his distance from the basket and increases his speed as he drives to the hoop.
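
As a rough sketch of how these metrics can be computed from raw coordinates (this is not the repository's exact code, and the player and hoop coordinates below are made up for illustration): distance is the Euclidean distance from the player to the hoop, and speed is the distance traveled between consecutive frames divided by the frame interval.

# hypothetical player coordinates (feet) over five frames sampled at 25hz
px <- c(40.0, 39.2, 38.3, 37.2, 36.0)
py <- c(30.0, 29.6, 29.1, 28.5, 27.8)
hoop <- c(5.25, 25)  # assumed location of the attacking basket

dist_from_hoop <- sqrt((px - hoop[1])^2 + (py - hoop[2])^2)
speed <- c(NA, sqrt(diff(px)^2 + diff(py)^2) * 25)  # feet per second; undefined at t = 1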

It’s apparent that the speed metric is pretty noisy. This may be attributed to the fact that the data are captured at 25 hertz (one observation every 1/25th of a second) and that location is determined by a computer vision algorithm rather than a tracking chip attached to the player. There are many methods that can be used to smooth the data (e.g. splines). Here we use a basic rolling mean with a window of three time steps. In our example the data are not so noisy that they would affect the performance of our model, so the smoothing is mostly for aesthetics and ease of interpretation. If the noise in the series were more extreme then we might want to consider a better smoothing method before fitting our model.

drive_data <- readRDS("../data/evt140_0021500411.RDS")

lavine_speed_smooth <- rep(drive_data$game$lavine_speed[2],2)
for (i in 3:nrow(drive_data$game))
  lavine_speed_smooth[i] <- mean(drive_data$game$lavine_speed[(i-2):i], na.rm=TRUE)

par(mfrow = c(3,1))
plot(drive_data$game$lavine_dist, type = "l",
     main = "Distance from Hoop",
     xlab = "Time (25hz)", ylab = "Distance from Hoop")
plot(drive_data$game$lavine_speed, type = "l",
     main = "Raw Speed",
     xlab = "Time (25hz)", ylab = "Speed")
plot(lavine_speed_smooth, type = "l",
     main = "Smooth Speed",
     xlab = "Time (25hz)", ylab = "Speed")

Now that we have transformed the data appropriately we can specify and fit the model.

Specifying and Fitting the Model

Here our HMM needs to infer two hidden states, drive and none, using both the observed speed and distance sequences. One approach would be to model 1/speed so that both observed sequences trend in the same direction within the drive and non-drive states (Keshri et al 2017). In this case a drive would be defined by,

  • The player reduces their 1/speed (equivalent to increasing speed).
  • The player reduces the distance between himself and the basket.

If the data are transformed in this way, a natural modeling approach for the emission probabilities would be to use the exponential distribution. Unfortunately this poses two issues:

  1. Scale of the data: The distance and speed metrics are on very different scales. It would make sense to take the log of these data to put them on comparable scales, but that is incompatible with the exponential distribution, whose support is restricted to nonnegative values (the logged data can be negative).
  2. Computational instability: The 1/speed transformation produces values that are mostly close to 0. This makes it difficult to use distributions that concentrate most of their density near 0, such as the exponential distribution: it becomes hard to discriminate between the “drive” and “none” states, since modeling 1/speed with the exponential distribution attributes high probability to both states.

While the exponential distribution is mathematically more tractable, we argue that such a choice for the data generation process would be computationally unstable. Instead, we opt to use the normal distribution for both metrics, which lets us model 1/speed and distance on the log scale.
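
As a minimal sketch (not necessarily the repository's exact preprocessing), the two observed sequences fed to the drive model would then be constructed along these lines, using the smoothed speed and distance series computed above.

u <- log(1 / lavine_speed_smooth)      # log of 1/speed
v <- log(drive_data$game$lavine_dist)  # log of distance from the hoop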

With the normal distributions defined over the observed data, our HMM is defined as follows,

\[ \begin{align*} u_t &\sim \mathcal{N}(\phi_{z_t}, \tau) \\ v_t &\sim \mathcal{N}(\lambda_{z_t}, \rho) \\ z_t &\sim \mathrm{Categorical}(\boldsymbol{\theta}_{z_{t-1}}) \\ \boldsymbol{\theta}_k &\sim \mathrm{Dirichlet}(\boldsymbol{\alpha}_k) \end{align*} \]

where \(u_t\) is the log of 1/speed, \(v_t\) is the log of the distance from the hoop, \(z_t \in \{1,2\}\), \(k \in \{1,2\}\) indexes the rows of the transition matrix, \(\tau=\rho=0.1\), and \(\boldsymbol{\alpha} = [[4,2],[2,4]]\).

Below is a graphical representation of the process that we are trying to model.

Now we can fit the model to LaVine’s data for the single drive possession. We provide the model results and diagnostics below.

# code available in drive_1.R
drive_stan <- readRDS("../results/drive_1.RDS")
print(drive_stan$fit, pars = 'z_star', include = FALSE, probs = c(0.05,0.95))
## Inference for Stan model: drive_1.
## 4 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##                  mean se_mean   sd       5%      95% n_eff Rhat
## theta[1,1]       0.97    0.00 0.02     0.93     0.99  2129 1.00
## theta[1,2]       0.03    0.00 0.02     0.01     0.07  2129 1.00
## theta[2,1]       0.01    0.00 0.01     0.00     0.02  1954 1.00
## theta[2,2]       0.99    0.00 0.01     0.98     1.00  1954 1.00
## phi[1]          -2.34    0.00 0.01    -2.36    -2.33  1501 1.00
## phi[2]          -0.74    0.00 0.01    -0.75    -0.74  2150 1.00
## lambda[1]        2.43    0.00 0.01     2.41     2.45  1321 1.00
## lambda[2]        3.54    0.00 0.01     3.53     3.55  2262 1.00
## log_p_z_star -6905.79    0.07 2.19 -6910.07 -6902.96  1026 1.00
## lp__         -6932.75    0.07 1.82 -6936.19 -6930.45   757 1.01
## 
## Samples were drawn using NUTS(diag_e) at Sun Sep 29 00:05:49 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
mcmc_trace(as.array(drive_stan$fit), regex_pars = "^theta\\[|^phi\\[|^lambda\\[", facet_args = list(nrow = 2))