Radon contamination

  • radioactive gas that enters homes through the ground
  • vary greatly household to household
  • carcinogenic
  • mediocre yelp reviews

EPA Study

  • 80,000 houses
  • Two predictors:
    • measument on lowest floor?
    • county uranium level (positive corr with radon levels)
  • We'll focus on Minnesota, which has 919 households in 85 counties
str(df)
'data.frame':   919 obs. of  4 variables:
 $ log.radon    : num  0.8329 0.8329 1.0986 0.0953 1.1632 ...
 $ floor.measure: int  1 0 0 0 0 0 0 0 0 0 ...
 $ county       : int  1 1 1 1 2 2 2 2 2 2 ...
 $ uranium      : num  -0.689 -0.689 -0.689 -0.689 -0.847 ...

Distribution of radon levels in MN (log scale)

Conventional approaches

  • Complete pooling
    • Treat all counties the same and estimate a single radon level, \(y_i \sim \mathcal{N}(\alpha + \beta x_i, \sigma)\)
  • No pooling
    • Model radon in each county independently, \(y_i \sim \mathcal{N}(\alpha_{j[i]} + \beta x_i, \sigma)\), where \(j = 1, ..., 85\)
  • As usual, we can think of \(\sigma\) as capturing all of the variance we aren't modeling.

Pooled model

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  vector[2] beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(beta[1] + beta[2] * x, sigma);
}

Pooled fit

Inference for Stan model: pooled.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

           mean se_mean   sd    2.5%     50%   97.5% n_eff Rhat
beta[1]    1.36    0.00 0.03    1.31    1.36    1.42  2698    1
beta[2]   -0.59    0.00 0.07   -0.72   -0.59   -0.46  2560    1
sigma      0.79    0.00 0.02    0.76    0.79    0.83  3134    1
lp__    -243.72    0.03 1.25 -246.89 -243.40 -242.33  1774    1

Samples were drawn using NUTS(diag_e) at Thu Dec  7 14:58:43 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Pooled fit

Let's try writing an unpooled model

We want a separate intercept for each county, but we can share beta and sigma.

\(y_i \sim ~ \mathcal{N}(\alpha_{j[i]} + \beta x_i, \, \sigma)\)

\(\alpha_{j} \sim \mathcal{N}(0, 10)\) (or something weakly informative)

data {
  int<lower=0> N; // number of houses
  int<lower=0> J; // number of counties
  vector[N] y; // radon output
  vector[N] x; // measured on the floor?
  int county[N]; //county id for each house
}

Unpooled model

data {
  int<lower=0> N;
  int<lower=1,upper=85> county[N];
  vector[N] x;
  vector[N] y;
}
parameters {
  vector[85] a;
  real beta;
  real<lower=0,upper=100> sigma;
}
model {
  y ~ normal(beta * x + a[county], sigma);
  // equivalent to:
  //for (i in 1:N)
  //  y[i] ~ normal(beta * x[i] + a[county[i]], sigma);
}

Unpooled fit

Inference for Stan model: unpooled.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

         mean se_mean   sd    2.5%     50%   97.5% n_eff Rhat
a[1]     0.89    0.01 0.37    0.17    0.89    1.61  4000    1
a[2]     0.93    0.00 0.10    0.73    0.93    1.12  4000    1
a[3]     1.55    0.01 0.43    0.71    1.55    2.40  4000    1
a[4]     1.58    0.00 0.27    1.04    1.58    2.11  4000    1
a[5]     1.44    0.01 0.36    0.73    1.45    2.17  4000    1
a[6]     1.54    0.01 0.44    0.69    1.54    2.40  4000    1
a[7]     2.02    0.00 0.19    1.64    2.03    2.41  4000    1
a[8]     1.99    0.01 0.37    1.26    1.99    2.71  4000    1
a[9]     1.05    0.00 0.23    0.61    1.05    1.50  4000    1
a[10]    1.56    0.00 0.29    0.99    1.57    2.13  4000    1
a[11]    1.43    0.01 0.33    0.78    1.43    2.08  4000    1
a[12]    1.75    0.01 0.37    1.02    1.75    2.47  4000    1
a[13]    1.09    0.00 0.31    0.51    1.09    1.70  4000    1
a[14]    2.00    0.00 0.20    1.61    2.00    2.41  4000    1
a[15]    1.37    0.01 0.36    0.65    1.38    2.06  4000    1
a[16]    0.71    0.01 0.52   -0.29    0.71    1.72  4000    1
a[17]    1.26    0.01 0.36    0.57    1.26    1.96  4000    1
a[18]    1.16    0.00 0.22    0.73    1.17    1.58  4000    1
a[19]    1.37    0.00 0.09    1.20    1.37    1.56  4000    1
a[20]    1.81    0.01 0.41    0.99    1.82    2.61  4000    1
a[21]    1.75    0.00 0.24    1.29    1.75    2.21  4000    1
a[22]    0.78    0.00 0.31    0.19    0.78    1.38  4000    1
a[23]    1.40    0.01 0.52    0.38    1.41    2.39  4000    1
a[24]    2.11    0.00 0.25    1.63    2.11    2.59  4000    1
a[25]    1.97    0.00 0.19    1.59    1.97    2.35  4000    1
a[26]    1.39    0.00 0.07    1.26    1.39    1.53  4000    1
a[27]    1.78    0.00 0.30    1.20    1.78    2.38  4000    1
a[28]    1.27    0.01 0.33    0.64    1.27    1.89  4000    1
a[29]    1.09    0.01 0.43    0.24    1.09    1.95  4000    1
a[30]    0.97    0.00 0.21    0.54    0.97    1.40  4000    1
a[31]    2.03    0.01 0.33    1.40    2.04    2.66  4000    1
a[32]    1.26    0.01 0.37    0.54    1.25    2.00  4000    1
a[33]    2.08    0.01 0.37    1.35    2.08    2.78  4000    1
a[34]    1.62    0.01 0.42    0.81    1.62    2.44  4000    1
a[35]    0.86    0.00 0.28    0.32    0.86    1.40  4000    1
a[36]    2.96    0.01 0.52    1.96    2.96    3.98  4000    1
a[37]    0.48    0.00 0.24    0.00    0.49    0.94  4000    1
a[38]    1.88    0.01 0.37    1.17    1.88    2.62  4000    1
a[39]    1.77    0.01 0.33    1.15    1.77    2.42  4000    1
a[40]    2.32    0.01 0.38    1.56    2.31    3.09  4000    1
a[41]    1.98    0.00 0.25    1.48    1.98    2.46  4000    1
a[42]    1.31    0.00 0.20    0.91    1.32    1.72  4000    1
a[43]    1.38    0.01 0.73   -0.04    1.38    2.84  4000    1
a[44]    1.63    0.00 0.25    1.15    1.63    2.12  4000    1
a[45]    1.11    0.00 0.27    0.58    1.11    1.66  4000    1
a[46]    1.24    0.01 0.32    0.62    1.24    1.86  4000    1
a[47]    0.96    0.01 0.52   -0.07    0.96    1.98  4000    1
a[48]    1.18    0.00 0.24    0.71    1.18    1.66  4000    1
a[49]    1.73    0.00 0.20    1.33    1.73    2.12  4000    1
a[50]    2.51    0.01 0.73    1.12    2.50    3.92  4000    1
a[51]    2.17    0.01 0.38    1.43    2.17    2.90  4000    1
a[52]    1.94    0.01 0.42    1.12    1.93    2.73  4000    1
a[53]    1.29    0.01 0.43    0.43    1.28    2.12  4000    1
a[54]    1.34    0.00 0.15    1.05    1.34    1.64  4000    1
a[55]    1.64    0.00 0.25    1.14    1.64    2.13  4000    1
a[56]    1.19    0.01 0.42    0.35    1.20    2.00  4000    1
a[57]    0.82    0.00 0.30    0.24    0.83    1.41  4000    1
a[58]    1.87    0.01 0.36    1.19    1.87    2.56  4000    1
a[59]    1.74    0.01 0.35    1.04    1.74    2.42  4000    1
a[60]    1.31    0.01 0.51    0.31    1.31    2.29  4000    1
a[61]    1.20    0.00 0.13    0.94    1.20    1.47  4000    1
a[62]    2.01    0.01 0.32    1.40    2.00    2.65  4000    1
a[63]    1.69    0.01 0.42    0.87    1.68    2.51  4000    1
a[64]    1.86    0.00 0.22    1.44    1.86    2.29  4000    1
a[65]    1.34    0.01 0.51    0.33    1.34    2.34  4000    1
a[66]    1.68    0.00 0.20    1.28    1.68    2.07  4000    1
a[67]    0.92    0.00 0.07    0.79    0.92    1.05  4000    1
a[68]    1.82    0.00 0.21    1.40    1.82    2.23  4000    1
a[69]    1.13    0.00 0.25    0.66    1.13    1.61  4000    1
a[70]    1.28    0.01 0.36    0.57    1.28    1.98  4000    1
a[71]    1.52    0.00 0.14    1.24    1.52    1.79  4000    1
a[72]    1.60    0.00 0.23    1.16    1.59    2.05  4000    1
a[73]    1.81    0.01 0.52    0.81    1.81    2.83  4000    1
a[74]    1.02    0.01 0.37    0.31    1.02    1.73  4000    1
a[75]    1.74    0.01 0.41    0.90    1.75    2.52  4000    1
a[76]    2.02    0.01 0.37    1.29    2.02    2.76  4000    1
a[77]    1.85    0.00 0.28    1.30    1.84    2.37  4000    1
a[78]    1.32    0.01 0.32    0.71    1.32    1.94  3877    1
a[79]    0.71    0.01 0.35   -0.01    0.72    1.40  4000    1
a[80]    1.36    0.00 0.11    1.15    1.37    1.57  4000    1
a[81]    2.70    0.01 0.41    1.93    2.70    3.53  4000    1
a[82]    2.24    0.01 0.74    0.77    2.24    3.66  4000    1
a[83]    1.65    0.00 0.20    1.25    1.65    2.04  4000    1
a[84]    1.67    0.00 0.20    1.27    1.67    2.05  4000    1
a[85]    1.21    0.01 0.51    0.22    1.22    2.21  4000    1
beta    -0.69    0.00 0.07   -0.82   -0.69   -0.54  3688    1
sigma    0.73    0.00 0.02    0.69    0.73    0.76  4000    1
lp__  -166.89    0.18 7.04 -182.17 -166.47 -154.54  1547    1

Samples were drawn using NUTS(diag_e) at Thu Dec  7 14:58:53 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Unpooled fit

meh

  • Pooling is useless for identifying high-radon counties
  • We don't trust unpooled estimates especially for counties with little data
  • 3 stars on Yelp

DIY partial pooling

Let's take our current model and add hierarchical priors on alpha: \(y_i \sim ~ \mathcal{N}(\alpha_{j[i]} + \beta x_i, \, \sigma)\)

\(\alpha_{j} \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha), \,\)

\(\mu_\alpha, \sigma_\alpha \sim\) [something weakly informative]

data {
  int<lower=0> N; // number of houses
  int<lower=0> J; // number of counties
  vector[N] y; // radon output
  vector[N] x; // measured on the floor?
  int county[N]; //county id for each house
}

Varying intercepts by group, CP

parameters {
  real<lower=0> sigma;
  real<lower=0> sigma_a;
  vector[J] a;
  real mu_a;
  real beta;
}
model {
  mu_a ~ normal(0, 100);
  sigma ~ normal(0, 10);
  sigma_a ~ normal(0, 10);
  a ~ normal(mu_a, sigma_a);
  beta ~ normal(0, 10);
  y ~ normal(a[county] + beta * x, sigma);
}

Varying intercepts fit

Inference for Stan model: partial_cp0.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

           mean se_mean   sd    2.5%  97.5% n_eff Rhat
sigma      0.73     0.0 0.02    0.69   0.76  4000    1
sigma_a    0.32     0.0 0.05    0.24   0.42  1286    1
a[1]       1.22     0.0 0.25    0.72   1.69  4000    1
a[2]       0.98     0.0 0.10    0.80   1.17  4000    1
a[3]       1.51     0.0 0.26    1.00   2.03  4000    1
a[4]       1.53     0.0 0.21    1.13   1.93  4000    1
a[5]       1.47     0.0 0.25    0.98   1.95  4000    1
a[6]       1.51     0.0 0.26    1.01   2.01  4000    1
a[7]       1.88     0.0 0.17    1.55   2.21  4000    1
a[8]       1.71     0.0 0.25    1.23   2.20  4000    1
a[9]       1.19     0.0 0.19    0.82   1.56  4000    1
a[10]      1.52     0.0 0.22    1.09   1.97  4000    1
a[11]      1.46     0.0 0.23    1.00   1.91  4000    1
a[12]      1.61     0.0 0.24    1.14   2.08  4000    1
a[13]      1.27     0.0 0.23    0.82   1.71  4000    1
a[14]      1.86     0.0 0.17    1.53   2.20  4000    1
a[15]      1.43     0.0 0.23    0.97   1.87  4000    1
a[16]      1.27     0.0 0.28    0.72   1.80  4000    1
a[17]      1.38     0.0 0.24    0.91   1.84  4000    1
a[18]      1.25     0.0 0.18    0.90   1.60  4000    1
a[19]      1.38     0.0 0.09    1.20   1.56  4000    1
a[20]      1.61     0.0 0.26    1.11   2.12  4000    1
a[21]      1.65     0.0 0.19    1.27   2.02  4000    1
a[22]      1.10     0.0 0.23    0.66   1.54  4000    1
a[23]      1.47     0.0 0.27    0.94   2.00  4000    1
a[24]      1.88     0.0 0.20    1.49   2.28  4000    1
a[25]      1.83     0.0 0.17    1.50   2.16  4000    1
a[26]      1.40     0.0 0.07    1.26   1.53  4000    1
a[27]      1.65     0.0 0.22    1.22   2.07  4000    1
a[28]      1.38     0.0 0.23    0.94   1.84  4000    1
a[29]      1.34     0.0 0.26    0.84   1.85  4000    1
a[30]      1.14     0.0 0.18    0.79   1.50  4000    1
a[31]      1.75     0.0 0.23    1.30   2.23  4000    1
a[32]      1.39     0.0 0.24    0.92   1.87  4000    1
a[33]      1.74     0.0 0.25    1.27   2.24  4000    1
a[34]      1.53     0.0 0.26    1.01   2.04  4000    1
a[35]      1.12     0.0 0.22    0.70   1.54  4000    1
a[36]      1.89     0.0 0.28    1.35   2.47  4000    1
a[37]      0.86     0.0 0.20    0.44   1.25  4000    1
a[38]      1.65     0.0 0.25    1.17   2.14  4000    1
a[39]      1.62     0.0 0.23    1.18   2.07  4000    1
a[40]      1.85     0.0 0.25    1.38   2.34  4000    1
a[41]      1.78     0.0 0.21    1.38   2.18  4000    1
a[42]      1.36     0.0 0.17    1.03   1.69  4000    1
a[43]      1.48     0.0 0.31    0.88   2.09  4000    1
a[44]      1.57     0.0 0.19    1.22   1.94  4000    1
a[45]      1.27     0.0 0.20    0.86   1.67  4000    1
a[46]      1.37     0.0 0.23    0.91   1.82  4000    1
a[47]      1.34     0.0 0.28    0.78   1.87  4000    1
a[48]      1.29     0.0 0.20    0.91   1.68  4000    1
a[49]      1.65     0.0 0.17    1.31   2.00  4000    1
a[50]      1.65     0.0 0.30    1.07   2.26  4000    1
a[51]      1.79     0.0 0.25    1.33   2.29  4000    1
a[52]      1.65     0.0 0.25    1.15   2.15  4000    1
a[53]      1.42     0.0 0.25    0.92   1.91  4000    1
a[54]      1.37     0.0 0.14    1.10   1.63  4000    1
a[55]      1.57     0.0 0.20    1.17   1.95  4000    1
a[56]      1.37     0.0 0.27    0.83   1.89  4000    1
a[57]      1.13     0.0 0.22    0.67   1.56  4000    1
a[58]      1.66     0.0 0.24    1.18   2.14  4000    1
a[59]      1.59     0.0 0.23    1.14   2.07  4000    1
a[60]      1.44     0.0 0.27    0.93   1.96  4000    1
a[61]      1.24     0.0 0.12    1.01   1.48  4000    1
a[62]      1.74     0.0 0.23    1.29   2.20  4000    1
a[63]      1.56     0.0 0.26    1.05   2.09  4000    1
a[64]      1.74     0.0 0.18    1.39   2.10  4000    1
a[65]      1.45     0.0 0.27    0.92   1.98  4000    1
a[66]      1.62     0.0 0.17    1.27   1.96  4000    1
a[67]      0.94     0.0 0.07    0.81   1.07  4000    1
a[68]      1.72     0.0 0.17    1.37   2.07  4000    1
a[69]      1.27     0.0 0.21    0.87   1.67  4000    1
a[70]      1.40     0.0 0.24    0.92   1.87  4000    1
a[71]      1.51     0.0 0.13    1.24   1.77  4000    1
a[72]      1.56     0.0 0.19    1.20   1.93  4000    1
a[73]      1.58     0.0 0.27    1.03   2.15  4000    1
a[74]      1.29     0.0 0.25    0.79   1.76  4000    1
a[75]      1.58     0.0 0.27    1.07   2.11  4000    1
a[76]      1.72     0.0 0.25    1.22   2.21  4000    1
a[77]      1.69     0.0 0.21    1.27   2.11  4000    1
a[78]      1.40     0.0 0.23    0.95   1.85  4000    1
a[79]      1.14     0.0 0.25    0.65   1.61  4000    1
a[80]      1.37     0.0 0.10    1.17   1.58  4000    1
a[81]      1.93     0.0 0.27    1.42   2.48  4000    1
a[82]      1.62     0.0 0.30    1.06   2.22  4000    1
a[83]      1.60     0.0 0.17    1.27   1.94  4000    1
a[84]      1.62     0.0 0.17    1.28   1.96  4000    1
a[85]      1.42     0.0 0.27    0.87   1.95  4000    1
mu_a       1.49     0.0 0.05    1.39   1.59  4000    1
beta      -0.66     0.0 0.07   -0.79  -0.53  4000    1
lp__    -112.02     0.3 8.83 -130.00 -95.13   852    1

Samples were drawn using NUTS(diag_e) at Thu Dec  7 14:58:58 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Lines for each county

DIY partial pooling, now with 8500% more slopes

Let's say that each county should now also get its own slope for floor measurement, \(\beta_j\), and code \(y_i \sim ~ \mathcal{N}(\alpha_{j[i]} + \beta_{j[i]} x_i, \, \sigma)\) with hierarchical priors on both \(\alpha_j\) and \(\beta_j\):

\(\alpha_{j} \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha), \,\) \(\beta_{j} \sim \mathcal{N}(\mu_\beta, \sigma_\beta)\)

\(\mu_\alpha, \mu_\beta, \sigma_\alpha, \sigma_\beta \sim\) [something weakly informative]

data {
  int<lower=0> N; // number of houses
  int<lower=0> J; // number of counties
  vector[N] y; // radon output
  vector[N] x; // measured on the floor?
  int county[N]; //county id for each house
}

Varying slopes and intercepts by group, CP

parameters {
  real<lower=0> sigma;
  real<lower=0> sigma_a; real<lower=0> sigma_b;
  vector[J] a; vector[J] b;
  real mu_a; real mu_b;
}
model {
  mu_a ~ normal(0, 100); mu_b ~ normal(0, 100);
  sigma ~ normal(0, 10);
  sigma_a ~ normal(0, 10); sigma_b ~ normal(0, 10);
  a ~ normal(mu_a, sigma_a);
  b ~ normal(mu_b, sigma_b);
  y ~ normal(a[county] + b[county] .* x, sigma);
}

Uh oh

Warning: There were 13 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Warning: There were 1 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
Warning: There were 8 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
Warning: Examine the pairs() plot to diagnose sampling problems

pairs()

Shinystan!

launch_shinystan(partial.cp.fit)

Let's non-center !

  • Right now we have \(b_j \sim \mathcal{N}(\mu_b, \sigma_b)\)
  • We want to add a new parameter, \(b_j^{std} \sim \mathcal{N}(0, 1)\)
  • Define \(b_j\) now as a transformed parameter, \(b_j = \mu_b + \sigma_b * b_j^{std}\)

Varying slopes and intercepts by group, NCP

parameters {
  real<lower=0> sigma;
  real<lower=0> sigma_a;
  real<lower=0> sigma_b;
  real mu_a; real mu_b;
  vector[J] b_std;
  vector[J] a;
}
transformed parameters {
  vector[J] b = mu_b + sigma_b * b_std;
}
model {
  mu_a ~ normal(0, 100); mu_b ~ normal(0, 100);
  sigma_b ~ normal(0, 20); b_std ~ normal(0, 1);
  sigma ~ normal(0, 5);
  sigma_a ~ normal(0, 20);
  a ~ normal(mu_a, sigma_a);
  y ~ normal(a[county] + b[county] .* x, sigma);
}
In file included from file115535512f55c.cpp:8:
In file included from /usr/local/lib/R/3.4/site-library/StanHeaders/include/src/stan/model/model_header.hpp:4:
In file included from /usr/local/lib/R/3.4/site-library/StanHeaders/include/stan/math.hpp:4:
In file included from /usr/local/lib/R/3.4/site-library/StanHeaders/include/stan/math/rev/mat.hpp:4:
In file included from /usr/local/lib/R/3.4/site-library/StanHeaders/include/stan/math/rev/core.hpp:12:
In file included from /usr/local/lib/R/3.4/site-library/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5:
In file included from /usr/local/lib/R/3.4/site-library/StanHeaders/include/stan/math/rev/core/var.hpp:7:
In file included from /usr/local/lib/R/3.4/site-library/BH/include/boost/math/tools/config.hpp:13:
In file included from /usr/local/lib/R/3.4/site-library/BH/include/boost/config.hpp:39:
/usr/local/lib/R/3.4/site-library/BH/include/boost/config/compiler/clang.hpp:196:11: warning: 'BOOST_NO_CXX11_RVALUE_REFERENCES' macro redefined [-Wmacro-redefined]
#  define BOOST_NO_CXX11_RVALUE_REFERENCES
          ^
<command line>:6:9: note: previous definition is here
#define BOOST_NO_CXX11_RVALUE_REFERENCES 1
        ^
1 warning generated.

Slopes and intercepts for each county

Warning: Removed 217 rows containing non-finite values (stat_sum).

PPCs?

Generated quantities for \(y_i \sim ~ \mathcal{N}(\alpha_{j[i]} + \beta_{j[i]} x_i, \, \sigma)\)

PPCs!

model {
  mu_a ~ normal(0, 100); mu_b ~ normal(0, 100);
  sigma_b ~ normal(0, 20); b_std ~ normal(0, 1);
  sigma ~ normal(0, 5);
  sigma_a ~ normal(0, 20);
  a ~ normal(mu_a, sigma_a);
  y ~ normal(a[county] + b[county] .* x, sigma);
}
generated quantities {
  vector[N] yppc;
  for (i in 1:N)
    yppc[i] = normal_rng(a[county[i]] + b[county[i]] * x[i], sigma);
}

House 10 PPC

samples = extract(partial.ncp.ppc.fit)
sdf = as.data.frame(samples$yppc)
ggplot(sdf, aes(V10)) + geom_histogram(bins=50) + geom_vline(xintercept = df$log.radon[10])

House 25 PPC

ggplot(sdf, aes(V25)) + geom_histogram(bins=50) + geom_vline(xintercept = df$log.radon[25])

Shinystan

launch_shinystan(partial.ncp.ppc.fit)

Group-level predictors… Go!

  • County uranium levels available
  • Let's get rid of the per-county floor.measure coefficient for simplicity
  • \(y_i \sim ~ \mathcal{N}(\alpha_{j[i]} + \beta x_i + \gamma u_j, \, \sigma)\)
  • Add vector[N] u; to your data block
  • Pass in u = df$uranium in your call to sampling like the rest of the data
  • Note that the model has both indicator variables for each county, plus a county-level covariate. In classical regression, this would result in collinearity. In a multilevel model, the partial pooling of the intercepts towards the expected value of the group-level linear model avoids this.
  • Group-level predictors also serve to reduce group-level variation \(\sigma_\alpha\). An important implication of this is that the group-level estimate induces stronger pooling.

Now with a group-level predictor, u_coeff

}
model {
  mu_a ~ normal(0, 100);
  sigma ~ normal(0, 5);
  sigma_a ~ normal(0, 20);
  a ~ normal(mu_a, sigma_a);
  u_coeff ~ normal(0, 2);
  floor_coeff ~ normal(0, 2);
}
Warning: There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Warning: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
Warning: Examine the pairs() plot to diagnose sampling problems

pairs()

pairs(group.hier.fit, pars=c('sigma', 'sigma_a', 'u_coeff', 'floor_coeff'))

Shinystan

launch_shinystan(group.hier.fit)

NCP \(\alpha\)

  vector[J] a = mu_a + sigma_a * a_std;
}
model {
  mu_a ~ normal(0, 5); sigma_a ~ normal(0, 10);
  a_std ~ normal(0, 1);
  sigma ~ normal(0, 5);
  u_coeff ~ normal(0, 2);
  floor_coeff ~ normal(0, 2);
}

House 10 PPC

samples = extract(group.hier.ncp.fit)
sdf = as.data.frame(samples$yppc)
ggplot(sdf, aes(V10)) + geom_histogram(bins=50) + geom_vline(xintercept = df$log.radon[10])

House 25 PPC

ggplot(sdf, aes(V25)) + geom_histogram(bins=50) + geom_vline(xintercept = df$log.radon[25])

Effect of county-level uranium on log(radon)

samples = as.data.frame(group.hier.ncp.fit)
ggplot(samples, aes(u_coeff)) + geom_histogram(bins=50)

Uranium coefficient

Floor measure coefficient

Shinystan

launch_shinystan(group.hier.ncp.fit)

[TODO] Let's add a predictor for % floor measurements per county

transformed data{
  vector[N] avgx;
  {
    vector[J] county_avg = rep_vector(0, J);
    vector[J] num_houses_in_county = rep_vector(0, J);
    for (n in 1:N) {

Launch shinystan!

Floor measure coefficient

Uranium coefficient