2.5 Stochastic Volatility Models
Stochastic volatility models treat the volatility (i.e., variance) of a return on an asset, such as an option to buy a security, as following a latent stochastic process in discrete time Kim, Shephard, and Chib (1998). The data consist of mean corrected (i.e., centered) returns \(y_t\) on an underlying asset at \(T\) equally spaced time points. Kim et al. formulate a typical stochastic volatility model using the following regression-like equations, with a latent parameter \(h_t\) for the log volatility, along with parameters \(\mu\) for the mean log volatility, and \(\phi\) for the persistence of the volatility term. The variable \(\epsilon_t\) represents the white-noise shock (i.e., multiplicative error) on the asset return at time \(t\), whereas \(\delta_t\) represents the shock on volatility at time \(t\). \[ y_t = \epsilon_t \exp(h_t / 2), \] \[ h_{t+1} = \mu + \phi (h_t - \mu) + \delta_t \sigma \] \[ h_1 \sim \mathsf{normal}\left( \mu, \frac{\sigma}{\sqrt{1 - \phi^2}} \right) \] \[ \epsilon_t \sim \mathsf{normal}(0,1); \ \ \ \ \ \delta_t \sim \mathsf{normal}(0,1) \]
Rearranging the first line, \(\epsilon_t = y_t \exp(-h_t / 2)\), allowing the sampling distribution for \(y_t\) to be written as \[ y_t \sim \mathsf{normal}(0,\exp(h_t/2)). \] The recurrence equation for \(h_{t+1}\) may be combined with the scaling and sampling of \(\delta_t\) to yield the sampling distribution \[ h_t \sim \mathsf{normal}(\mu + \phi(h_t - \mu), \sigma). \] This formulation can be directly encoded, as shown in the following Stan model.
data {
int<lower=0> T; // # time points (equally spaced)
vector[T] y; // mean corrected return at time t
}
parameters {
real mu; // mean log volatility
real<lower=-1,upper=1> phi; // persistence of volatility
real<lower=0> sigma; // white noise shock scale
vector[T] h; // log volatility at time t
}
model {
phi ~ uniform(-1, 1);
sigma ~ cauchy(0, 5);
mu ~ cauchy(0, 10);
h[1] ~ normal(mu, sigma / sqrt(1 - phi * phi));
for (t in 2:T)
h[t] ~ normal(mu + phi * (h[t - 1] - mu), sigma);
for (t in 1:T)
y[t] ~ normal(0, exp(h[t] / 2));
}
Compared to the Kim et al. formulation, the Stan model adds priors for the parameters \(\phi\), \(\sigma\), and \(\mu\). The shock terms \(\epsilon_t\) and \(\delta_t\) do not appear explicitly in the model, although they could be calculated efficiently in a generated quantities block.
The posterior of a stochastic volatility model such as this one typically has high posterior variance. For example, simulating 500 data points from the above model with \(\mu = -1.02\), \(\phi = 0.95\), and \(\sigma = 0.25\) leads to 95% posterior intervals for \(\mu\) of \((-1.23, -0.54)\), for \(\phi\) of \((0.82,0.98 )\) and for \(\sigma\) of \((0.16,0.38)\).
The samples using NUTS show a high degree of autocorrelation among the samples, both for this model and the stochastic volatility model evaluated in (Hoffman and Gelman 2011, 2014). Using a non-diagonal mass matrix provides faster convergence and more effective samples than a diagonal mass matrix, but will not scale to large values of \(T\).
It is relatively straightforward to speed up the effective samples per second generated by this model by one or more orders of magnitude. First, the sampling statements for return \(y\) is easily vectorized to
y ~ normal(0, exp(h / 2));
This speeds up the iterations, but does not change the effective
sample size because the underlying parameterization and log
probability function have not changed. Mixing is improved by by
reparameterizing in terms of a standardized volatility, then
rescaling. This requires a standardized parameter h_std
to be
declared instead of h
.
parameters {
...
vector[T] h_std; // std log volatility time t
The original value of h
is then defined in a transformed
parameter block.
transformed parameters {
vector[T] h = h_std * sigma; // now h ~ normal(0, sigma)
h[1] /= sqrt(1 - phi * phi); // rescale h[1]
h += mu;
for (t in 2:T)
h[t] += phi * (h[t-1] - mu);
}
The first assignment rescales h_std
to have a
\(\mathsf{normal}(0,\sigma)\) distribution and temporarily assigns it to
h
. The second assignment rescales h[1]
so that its
prior differs from that of h[2]
through h[T]
. The next
assignment supplies a mu
offset, so that h[2]
through
h[T]
are now distributed \(\mathsf{normal}(\mu,\sigma)\); note
that this shift must be done after the rescaling of h[1]
. The
final loop adds in the moving average so that h[2]
through
h[T]
are appropriately modeled relative to phi
and
mu
.
As a final improvement, the sampling statement for h[1]
and
loop for sampling h[2]
to h[T]
are replaced with a
single vectorized standard normal sampling statement.
model {
...
h_std ~ std_normal();
Although the original model can take hundreds and sometimes thousands of iterations to converge, the reparameterized model reliably converges in tens of iterations. Mixing is also dramatically improved, which results in higher effective sample sizes per iteration. Finally, each iteration runs in roughly a quarter of the time of the original iterations.
References
Hoffman, Matthew D., and Andrew Gelman. 2011. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” arXiv 1111.4246. http://arxiv.org/abs/1111.4246.
Hoffman, Matthew D., and Andrew Gelman. 2011. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” arXiv 1111.4246. http://arxiv.org/abs/1111.4246.
2014. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research 15: 1593–1623. http://jmlr.org/papers/v15/hoffman14a.html.Kim, Sangjoon, Neil Shephard, and Siddhartha Chib. 1998. “Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models.” Review of Economic Studies 65: 361–93.