2.4 Autoregressive Moving Average Models
Autoregressive moving-average models (ARMA), combine the predictors of the autoregressive model and the moving average model. An ARMA(1,1) model, with a single state of history, can be encoded in Stan as follows.
data {
int<lower=1> T; // num observations
real y[T]; // observed outputs
}
parameters {
real mu; // mean coeff
real phi; // autoregression coeff
real theta; // moving avg coeff
real<lower=0> sigma; // noise scale
}
model {
vector[T] nu; // prediction for time t
vector[T] err; // error for time t
nu[1] = mu + phi * mu; // assume err[0] == 0
err[1] = y[1] - nu[1];
for (t in 2:T) {
nu[t] = mu + phi * y[t-1] + theta * err[t-1];
err[t] = y[t] - nu[t];
}
mu ~ normal(0, 10); // priors
phi ~ normal(0, 2);
theta ~ normal(0, 2);
sigma ~ cauchy(0, 5);
err ~ normal(0, sigma); // likelihood
}
The data are declared in the same way as the other time-series regressions and the parameters are documented in the code.
In the model block, the local vector nu
stores the predictions
and err
the errors. These are computed similarly to the
errors in the moving average models described in the previous section.
The priors are weakly informative for stationary processes. The likelihood only involves the error term, which is efficiently vectorized here.
Often in models such as these, it is desirable to inspect the
calculated error terms. This could easily be accomplished in Stan by
declaring err
as a transformed parameter, then defining it the
same way as in the model above. The vector nu
could still be a
local variable, only now it will be in the transformed parameter block.
Wayne Folta suggested encoding the model without local vector variables as follows.
model {
real err;
mu ~ normal(0, 10);
phi ~ normal(0, 2);
theta ~ normal(0, 2);
sigma ~ cauchy(0, 5);
err = y[1] - mu + phi * mu;
err ~ normal(0, sigma);
for (t in 2:T) {
err = y[t] - (mu + phi * y[t-1] + theta * err);
err ~ normal(0, sigma);
}
}
This approach to ARMA models illustrates how local
variables, such as err
in this case, can be reused in Stan.
Folta’s approach could be extended to higher order moving-average
models by storing more than one error term as a local variable and
reassigning them in the loop.
Both encodings are fast. The original encoding has the advantage
of vectorizing the normal distribution, but it uses a bit more memory.
A halfway point would be to vectorize just err
.
Identifiability and Stationarity
MA and ARMA models are not identifiable if the roots of the characteristic polynomial for the MA part lie inside the unit circle, so it’s necessary to add the following constrain9
real<lower = -1, upper = 1> theta;
When the model is run without the constraint, using synthetic data
generated from the model, the simulation can sometimes find modes for
(theta
, phi
) outside the \([-1,1]\) interval, which
creates a multiple mode problem in the posterior and also causes the
NUTS tree depth to get large (often above 10). Adding the
constraint both improves the accuracy of the posterior and
dramatically reduces the tree depth, which speeds up the simulation
considerably (typically by much more than an order of magnitude).
Further, unless one thinks that the process is really non-stationary, it’s worth adding the following constraint to ensure stationarity.
real<lower = -1, upper = 1> phi;