24.3 Some Differences in the Statistical Models that are Allowed

Stan does not yet support declaration of discrete parameters. Discrete data variables are supported. Inference is supported for discrete parameters as described in the mixture and latent discrete parameters chapters of the manual.

Stan has some distributions on covariance matrices that do not exist in BUGS, including a uniform distribution over correlation matrices which may be rescaled, and the priors based on C-vines defined in Lewandowski, Kurowicka, and Joe (2009). In particular, the Lewandowski et al. prior allows the correlation matrix to be shrunk toward the unit matrix while the scales are given independent priors.

In BUGS you need to define all variables. In Stan, if you declare but don’t define a parameter it implicitly has a flat prior (on the scale in which the parameter is defined). For example, if you have a parameter p declared as

real<lower = 0, upper = 1> p;

and then have no sampling statement for p in the model block, then you are implicitly assigning a uniform \([0,1]\) prior on p.

On the other hand, if you have a parameter theta declared with

real theta;

and have no sampling statement for theta in the model block, then you are implicitly assigning an improper uniform prior on \((-\infty,\infty)\) to theta.

BUGS models are always proper (being constructed as a product of proper marginal and conditional densities). Stan models can be improper. Here is the simplest improper Stan model:

parameters {
  real theta;
model { }

Although parameters in Stan models may have improper priors, we do not want improper posterior distributions, as we are trying to use these distributions for Bayesian inference. There is no general way to check if a posterior distribution is improper. But if all the priors are proper, the posterior will be proper also.

Each statement in a Stan model is directly translated into the C++ code for computing the log posterior. Thus, for example, the following pair of statements is legal in a Stan model:

y ~ normal(0,1);
y ~ normal(2,3);

The second line here does not simply overwrite the first; rather, both statements contribute to the density function that is evaluated. The above two lines have the effect of including the product, \(\mathsf{normal}(y|0,1) * \mathsf{normal}(y|2,3)\), into the density function.

For a perhaps more confusing example, consider the following two lines in a Stan model:

x ~ normal(0.8 * y, sigma);
y ~ normal(0.8 * x, sigma);

At first, this might look like a joint normal distribution with a correlation of 0.8. But it is not. The above are not interpreted as conditional entities; rather, they are factors in the joint density. Multiplying them gives, \(\mathsf{normal}(x|0.8y,\sigma) \times \mathsf{normal}(y|0.8x,\sigma)\), which is what it is (you can work out the algebra) but it is not the joint distribution where the conditionals have regressions with slope 0.8.

With censoring and truncation, Stan uses the censored-data or truncated-data likelihood—this is not always done in BUGS. All of the approaches to censoring and truncation discussed in Gelman et al. (2013) and Gelman and Hill (2007) may be implemented in Stan directly as written.

Stan, like BUGS, can benefit from human intervention in the form of reparameterization.


Lewandowski, Daniel, Dorota Kurowicka, and Harry Joe. 2009. “Generating Random Correlation Matrices Based on Vines and Extended Onion Method.” Journal of Multivariate Analysis 100: 1989–2001.

Gelman, Andrew, J. B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. Third. London: Chapman &Hall/CRC Press.

Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel-Hierarchical Models. Cambridge, United Kingdom: Cambridge University Press.