We should know this by now, right? \(y \sim \mathcal{N}(X\beta + \alpha, \sigma)\)
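For concreteness, a toy data set for this model can be simulated in R along these lines. The sizes N = 5 and K = 3 and the parameter values are assumptions for illustration (chosen to match the shapes that appear in the fits below), not the exact script that produced the data:

```r
set.seed(1234)
N <- 5                              # number of observations (assumed)
K <- 3                              # number of predictors (assumed)
X <- matrix(rnorm(N * K), N, K)     # design matrix
alpha <- -0.11                      # illustrative "true" parameters
beta  <- c(-0.51, -0.91, -0.84)
sigma <- 4
# linear predictor plus Gaussian noise, exactly the model above
y <- as.vector(X %*% beta + alpha + rnorm(N, sd = sigma))
```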
data {
  int N;
  int K;
  matrix[N, K] X;
  real y[N];
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(X * beta + alpha, sigma);
}
Warning: There were 88 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Warning: There were 1318 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
Warning: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See http://mc-stan.org/misc/warnings.html#bfmi-low
Warning: Examine the pairs() plot to diagnose sampling problems
Inference for Stan model: model.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean      se_mean           sd n_eff Rhat
alpha   -431333212.7 9.838594e+08 6.353719e+09    42 1.12
beta[1]  -12507614.2 1.919952e+07 2.422585e+08   159 1.03
beta[2]  -42691574.9 9.176587e+07 6.523436e+08    51 1.09
beta[3]  -12206733.3 1.800335e+07 2.702798e+08   225 1.01
sigma   3006709155.6 1.278141e+09 8.207271e+09    41 1.10
lp__           -76.8 4.740000e+00 1.219000e+01     7 1.41

Samples were drawn using NUTS(diag_e) at Tue Dec 5 22:40:25 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
data {
  int N;
  int K;
  matrix[N, K] X;
  real y[N];
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(X * beta + alpha, sigma);
  sigma ~ normal(0, 20);
  alpha ~ normal(0, 20);
  beta ~ normal(0, 20);
}
Warning: There were 69 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Warning: Examine the pairs() plot to diagnose sampling problems
Inference for Stan model: model2.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

         mean se_mean   sd   2.5%    25%    50%   75% 97.5% n_eff Rhat
alpha    4.36    0.26 8.31 -14.61   0.49   4.97  8.49 20.59  1006 1.01
beta[1] -0.52    0.01 0.52  -1.65  -0.74  -0.51 -0.30  0.57  1367 1.00
beta[2]  0.91    0.04 1.30  -2.16   0.36   1.00  1.54  3.45  1033 1.00
beta[3]  0.25    0.03 0.92  -1.81  -0.14   0.29  0.66  2.08  1293 1.00
sigma   10.91    0.43 7.75   1.87   5.30   8.90 14.23 31.70   319 1.00
lp__   -11.27    0.17 2.92 -17.79 -13.09 -10.99 -9.16 -6.47   288 1.01

Samples were drawn using NUTS(diag_e) at Tue Dec 5 22:40:31 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
pairs(fit2, pars = c("alpha", "beta[1]", "sigma"))
fit2.1 = sampling(model2, list(y = y, N = N, X = X), control = list(adapt_delta = 0.99), seed = 1234)
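Whether the higher adapt_delta actually got rid of the divergences can be checked directly from the sampler diagnostics rather than by rereading the warnings. A sketch using rstan's get_sampler_params on the fit2.1 object above:

```r
# one matrix of sampler diagnostics per chain, post-warmup only
sp <- get_sampler_params(fit2.1, inc_warmup = FALSE)
# total number of divergent transitions across all chains
sum(sapply(sp, function(x) sum(x[, "divergent__"])))
```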
Inference for Stan model: model2.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

         mean se_mean   sd   2.5%    25%    50%   75% 97.5% n_eff Rhat
alpha    4.57    0.26 8.33 -13.66   0.82   4.85  8.48 21.86  1052 1.01
beta[1] -0.51    0.01 0.51  -1.57  -0.73  -0.51 -0.30  0.50  1224 1.01
beta[2]  0.92    0.04 1.33  -2.18   0.34   0.98  1.52  3.56  1007 1.00
beta[3]  0.25    0.03 0.87  -1.59  -0.11   0.25  0.63  2.15  1188 1.01
sigma   10.48    0.35 7.50   2.44   5.14   8.18 13.69 30.37   466 1.00
lp__   -11.02    0.15 2.92 -17.52 -12.90 -10.70 -8.81 -6.44   379 1.00

Samples were drawn using NUTS(diag_e) at Tue Dec 5 22:40:36 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
pairs(fit2.1, pars = c("alpha", "beta[1]", "sigma"))
data {
  int N;
  int K;
  matrix[N, K] X;
  real y[N];
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(X * beta + alpha, sigma);
  sigma ~ normal(0, 2);
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
}
fit3 = sampling(model3, list(y = y, N = N, X = X), seed = 1234)
Inference for Stan model: model3.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

         mean se_mean   sd   2.5%    25%   50%   75% 97.5% n_eff Rhat
alpha    4.45    0.06 2.32  -0.38   2.98  4.56  5.96  8.72  1404 1.00
beta[1] -0.52    0.00 0.13  -0.80  -0.60 -0.52 -0.44 -0.27  1717 1.00
beta[2]  0.94    0.01 0.35   0.21   0.74  0.96  1.16  1.62  1432 1.00
beta[3]  0.26    0.01 0.24  -0.25   0.11  0.27  0.41  0.73  1721 1.00
sigma    3.14    0.03 0.97   1.64   2.41  3.00  3.72  5.40  1304 1.00
lp__    -9.67    0.06 1.90 -14.33 -10.69 -9.29 -8.28 -7.01   949 1.01

Samples were drawn using NUTS(diag_e) at Tue Dec 5 22:40:41 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
pairs(fit3, pars = c("alpha", "beta[1]", "sigma"))
model {
  y ~ normal(X * beta + alpha, sigma);
  sigma ~ normal(0, 2);
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
}
generated quantities {
  vector[N] y_ppc;
  for (n in 1:N)
    y_ppc[n] = normal_rng(X[n, ] * beta + alpha, sigma);
}
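Outside of shinystan, the posterior predictive draws can also be pulled out of the fit directly. A sketch, assuming the fitted object for this model is stored in fit3ppc as below:

```r
# extract the draws-by-N matrix of posterior predictive replicates
y_ppc <- rstan::extract(fit3ppc, pars = "y_ppc")$y_ppc
# 95% posterior predictive interval per observation, to compare against y
ppc_interval <- apply(y_ppc, 2, quantile, probs = c(0.025, 0.975))
```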
library(shinystan)
launch_shinystan(fit3ppc)
alpha
[1] -0.1102855
beta
[1] -0.5110095 -0.9111954 -0.8371717
sigma
[1] 4
Inference for Stan model: model3.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

         mean se_mean   sd   2.5%    25%   50%   75% 97.5% n_eff Rhat
alpha    4.45    0.06 2.32  -0.38   2.98  4.56  5.96  8.72  1404 1.00
beta[1] -0.52    0.00 0.13  -0.80  -0.60 -0.52 -0.44 -0.27  1717 1.00
beta[2]  0.94    0.01 0.35   0.21   0.74  0.96  1.16  1.62  1432 1.00
beta[3]  0.26    0.01 0.24  -0.25   0.11  0.27  0.41  0.73  1721 1.00
sigma    3.14    0.03 0.97   1.64   2.41  3.00  3.72  5.40  1304 1.00
lp__    -9.67    0.06 1.90 -14.33 -10.69 -9.29 -8.28 -7.01   949 1.01

Samples were drawn using NUTS(diag_e) at Tue Dec 5 22:40:41 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
readLines("dist/y.txt")
[1] "15.72128 -2.101855 -11.15166 10.11426 -2.182092"
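Note that readLines returns the file contents as character strings. scan() parses whitespace-separated numbers straight into a numeric vector; shown here on the same contents via the text argument (with the file on disk this would be scan("dist/y.txt")):

```r
# scan() splits on whitespace and coerces each token to numeric
y <- scan(text = "15.72128 -2.101855 -11.15166 10.11426 -2.182092")
length(y)   # 5 observations
```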