- IRT with edstan
- Constructing basic IRT models with Stan
- Extending IRT models with Stan
July 11, 2016
IRT with edstan
The purpose is to lower the start-up costs of doing IRT with Stan.

edstan assists by providing Stan programs for several common IRT models, along with R functions for preparing data, fitting the models, and viewing the results (both are listed in the tables below).
rstan should be installed first. Then edstan can be installed from GitHub.

```r
# Install edstan from GitHub (requires the devtools package)
install.packages("devtools")
devtools::install_github("danielcfurr/edstan")
```
Then both should be loaded.
```r
# Load rstan and edstan
library(rstan)
library(edstan)
rstan_options(auto_write = TRUE)              # Optional
options(mc.cores = parallel::detectCores())   # Optional
```
Model | Stan file |
---|---|
Rasch | rasch_latent_reg.stan |
Partial credit | pcm_latent_reg.stan |
Rating scale | rsm_latent_reg.stan |
Two-parameter logistic | 2pl_latent_reg.stan |
Generalized partial credit | gpcm_latent_reg.stan |
Generalized rating scale | grsm_latent_reg.stan |
Function | Purpose |
---|---|
irt_data() | Prepares data for fitting |
labelled_integer() | Create vector of consecutive integers |
irt_stan() | Wrapper for running MCMC |
print_irt_stan() | Show table of output |
`irt_data()` returns a data list in the correct format for the edstan models.

For wide-form data, it is used as follows:

`irt_data(response_matrix, W)`

- `response_matrix` is a matrix of scored responses.
- `W` (optional) is a matrix of person covariates.

For long-form data, it is used as follows:

`irt_data(y, ii, jj, W)`

- `y` is the scored response vector.
- `ii` is an index for items.
- `jj` is an index for persons.

`labelled_integer()` returns a vector of consecutive integers. It helps in creating indices like `ii` and `jj`.

It is used as follows:

`labelled_integer(x)`

- `x` is a vector of identifiers.
- `x` can be a numeric, character, or factor variable.
- The elements of the returned vector are labelled with their original values.
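A small sketch of what this looks like; the exact ordering of the integer codes may differ.

```r
# Map arbitrary identifiers to consecutive integers; the result keeps the
# original values as element names, so it can be used directly as ii or jj.
ids <- c("curse", "scold", "curse", "shout")
labelled_integer(ids)
## Expected form (ordering of the codes may differ):
## curse scold curse shout
##     1     2     1     3
```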
`irt_stan()` is a wrapper for `stan()` and returns a `stanfit` object.

It is used as follows:

`irt_stan(data_list, model, ...)`

- `data_list` is the result of `irt_data()`.
- `model` (optional) is the file name for one of the edstan models.
- `...` allows arguments to be passed to `stan()`, such as:
    - `chains`: the number of MCMC chains.
    - `iter`: the number of iterations per chain.

`print_irt_stan()` provides summaries of parameter posteriors. It is an alternative to `print()` for `stanfit` objects.

It is used as follows:

`print_irt_stan(fit, data_list, probs)`

- `fit` is the fitted model from `irt_stan()`.
- `data_list` is the formatted data list from `irt_data()`.
- `probs` (optional) are the quantiles to include for the posteriors.
The running example is the verbal aggression data (`aggression`):

- Dichotomous responses: 24 items.
- Polytomous responses: 24 items.
- Uses a \(3 \times 2 \times 2\) item design.
- Responses in polytomous and dichotomous form (`poly`, `dich`).
- Person covariates (`anger`, `male`).
- Item covariate indicators (`do`, `other`, `scold`, `shout`).

```r
# View first few rows
head(aggression)
```
```
##   person item poly dich    description anger male do other scold shout
## 1      1    1    0    0   do bus curse    20    1  0     1     0     0
## 2      1    2    0    0   do bus scold    20    1  0     1     1     0
## 3      1    3    0    0   do bus shout    20    1  0     1     0     1
## 4      1    4    0    0 do train curse    20    1  0     1     0     0
## 5      1    5    0    0 do train scold    20    1  0     1     1     0
## 6      1    6    0    0 do train shout    20    1  0     1     0     1
```
\[ \Pr(y_{ij} = 1 | \theta_j, \beta_i) = \mathrm{logit}^{-1} [ \theta_j - \beta_i ] \]
\[ \theta_j | w_j \sim \mathrm{N}(w_j' \lambda, \sigma^2) \]
A latent regression may be included (or not) with any of the edstan models. To do this, create a matrix of covariates and pass it to `irt_data()`.
```r
# Assemble matrix of person covariates
covars <- aggression[, c("person", "anger", "male")]
covars <- unique(covars)
covars <- covars[order(covars$person), ]
W <- cbind(intercept = 1, covars[, -1])
head(W)
```
```
##     intercept anger male
## 1           1    20    1
## 25          1    11    1
## 49          1    17    0
## 73          1    21    0
## 97          1    17    0
## 121         1    21    0
```
The data list is assembled and then the model is fit.
```r
# Make the data list
data_dich <- irt_data(y = aggression$dich,
                      ii = labelled_integer(aggression$description),
                      jj = aggression$person,
                      W = W)

# Fit the latent regression Rasch model
fit_rasch <- irt_stan(data_dich, model = "rasch_latent_reg.stan",
                      iter = 200, chains = 4)
```
```r
# View summary of parameter posteriors
print_irt_stan(fit_rasch, data_dich, probs = c(.025, .975))
```
```
## Item 1: do bus curse
##          mean se_mean    sd  2.5% 97.5% n_eff  Rhat
## beta[1]  -1.4 0.00679 0.136 -1.64 -1.12   400 0.998
##
## Item 2: do bus scold
##           mean se_mean    sd  2.5%  97.5% n_eff  Rhat
## beta[2] -0.743  0.0069 0.138 -1.01 -0.488   400 0.999
##
## Item 3: do bus shout
##           mean se_mean    sd   2.5%  97.5% n_eff  Rhat
## beta[3] -0.245  0.0063 0.126 -0.495 0.0399   400 0.998
##
## Item 4: do train curse
##          mean se_mean    sd  2.5% 97.5% n_eff Rhat
## beta[4] -1.92 0.00759 0.152 -2.23 -1.63   400    1
##
## Item 5: do train scold
##           mean se_mean    sd 2.5%  97.5% n_eff Rhat
## beta[5] -0.881  0.0059 0.118 -1.1 -0.659   400 1.01
##
## Item 6: do train shout
##           mean se_mean   sd   2.5%  97.5% n_eff Rhat
## beta[6] -0.174 0.00701 0.14 -0.449 0.0869   400    1
##
## Item 7: do store curse
##           mean se_mean    sd   2.5%  97.5% n_eff  Rhat
## beta[7] -0.699 0.00682 0.136 -0.967 -0.433   400 0.998
##
## Item 8: do store scold
##          mean se_mean    sd  2.5% 97.5% n_eff Rhat
## beta[8] 0.517 0.00642 0.128 0.277 0.781   400    1
##
## Item 9: do store shout
##          mean se_mean    sd 2.5% 97.5% n_eff Rhat
## beta[9]  1.37 0.00774 0.155 1.07  1.65   400 1.01
##
## Item 10: do operator curse
##           mean se_mean    sd  2.5%  97.5% n_eff  Rhat
## beta[10] -1.26 0.00688 0.138 -1.53 -0.999   400 0.997
##
## Item 11: do operator scold
##           mean se_mean    sd    2.5% 97.5% n_eff  Rhat
## beta[11] 0.177 0.00618 0.124 -0.0912 0.436   400 0.996
##
## Item 12: do operator shout
##           mean se_mean    sd  2.5% 97.5% n_eff Rhat
## beta[12] 0.899 0.00688 0.138 0.642  1.16   400    1
##
## Item 13: want bus curse
##           mean se_mean    sd 2.5% 97.5% n_eff Rhat
## beta[13] -1.39 0.00756 0.151 -1.7 -1.09   400    1
##
## Item 14: want bus scold
##            mean se_mean    sd   2.5%  97.5% n_eff  Rhat
## beta[14] -0.558 0.00615 0.123 -0.815 -0.322   400 0.993
##
## Item 15: want bus shout
##          mean se_mean    sd  2.5% 97.5% n_eff  Rhat
## beta[15]  0.7 0.00685 0.136 0.443  0.95   396 0.998
##
## Item 16: want train curse
##           mean se_mean    sd  2.5%  97.5% n_eff  Rhat
## beta[16] -1.04 0.00726 0.145 -1.34 -0.755   400 0.996
##
## Item 17: want train scold
##            mean se_mean    sd   2.5% 97.5% n_eff  Rhat
## beta[17] -0.105 0.00691 0.138 -0.354 0.166   400 0.993
##
## Item 18: want train shout
##          mean se_mean   sd 2.5% 97.5% n_eff Rhat
## beta[18] 1.33 0.00749 0.15 1.03  1.6   400    1
##
## Item 19: want store curse
##            mean se_mean    sd  2.5% 97.5% n_eff  Rhat
## beta[19] 0.0471 0.00616 0.123 -0.19 0.268   400 0.993
##
## Item 20: want store scold
##          mean se_mean    sd 2.5% 97.5% n_eff Rhat
## beta[20] 1.34 0.00785 0.151 1.05  1.66   369    1
##
## Item 21: want store shout
##          mean se_mean    sd 2.5% 97.5% n_eff Rhat
## beta[21] 2.82  0.0112 0.224 2.38  3.26   400    1
##
## Item 22: want operator curse
##            mean se_mean    sd  2.5%  97.5% n_eff Rhat
## beta[22] -0.863 0.00704 0.141 -1.15 -0.607   400    1
##
## Item 23: want operator scold
##           mean se_mean    sd    2.5% 97.5% n_eff  Rhat
## beta[23] 0.223 0.00719 0.144 -0.0461 0.481   400 0.994
##
## Item 24: want operator shout
##          mean se_mean    sd 2.5% 97.5% n_eff Rhat
## beta[24] 1.86 0.00882 0.171 1.56  2.22   375    1
##
## Ability distribution
##              mean  se_mean     sd    2.5%   97.5% n_eff Rhat
## lambda[1] -1.3981 0.016313 0.3263 -2.0374 -0.7364   400 1.00
## lambda[2]  0.0577 0.000785 0.0157  0.0274  0.0861   400 1.01
## lambda[3]  0.3118 0.011648 0.1958 -0.0774  0.6718   283 1.00
## sigma      1.3705 0.004084 0.0659  1.2545  1.5031   261 1.01
```
Check all parameters for convergence.
```r
# Check convergence (Rhat)
stan_rhat(fit_rasch)
```
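The Rhat statistics can also be checked numerically with rstan's `summary()` method; a minimal sketch:

```r
# Numeric convergence check: pull the Rhat column from the summary matrix
rhats <- summary(fit_rasch)$summary[, "Rhat"]
max(rhats, na.rm = TRUE)   # values near 1 suggest convergence
```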
The 2PL may be fit by requesting a different Stan model.
```r
# Fit the latent regression 2PL model
fit_2pl <- irt_stan(data_dich, model = "2pl_latent_reg.stan",
                    iter = 200, chains = 4)
```
The latent regression may be omitted by excluding `W` when using `irt_data()`.
```r
# Make the data list without providing W
data_noreg <- irt_data(y = aggression$dich,
                       ii = labelled_integer(aggression$description),
                       jj = aggression$person)

# Fit the 2PL without latent regression
fit_noreg <- irt_stan(data_noreg, model = "2pl_latent_reg.stan",
                      iter = 200, chains = 4)
```
To fit the polytomous models, assemble the data list using the original responses.
```r
data_poly <- irt_data(y = aggression$poly,
                      ii = labelled_integer(aggression$description),
                      jj = aggression$person,
                      W = W)
```
Then pass the new data list to `irt_stan()` and specify the desired model.
```r
fit_rsm  <- irt_stan(data_poly, model = "rsm_latent_reg.stan",
                     iter = 300, chains = 4)
fit_pcm  <- irt_stan(data_poly, model = "pcm_latent_reg.stan",
                     iter = 300, chains = 4)
fit_gpcm <- irt_stan(data_poly, model = "gpcm_latent_reg.stan",
                     iter = 300, chains = 4)
fit_grsm <- irt_stan(data_poly, model = "grsm_latent_reg.stan",
                     iter = 300, chains = 4)
```
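Each of these fits can then be summarized in the same way as before; for example, for the partial credit model:

```r
# View a summary of the partial credit model posteriors
print_irt_stan(fit_pcm, data_poly, probs = c(.025, .975))
```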
Constructing basic IRT models with Stan
\[ \Pr(y_{ij} = 1 | \theta_j, \beta_i) = \mathrm{logit}^{-1} [ \theta_j - \beta_i ] \]
\[ \theta_j \sim \mathrm{N}(0, \sigma^2) \]
Priors:

\[\beta_i \sim \mathrm{N}(0, 25)\]
\[\sigma \sim \mathrm{Exp}(.1)\]

(The normal prior is written in terms of the variance; Stan's normal distribution is parameterized by the standard deviation, so it appears as `normal_lpdf(beta | 0, 5)` in the code below.)
```stan
data {
  int<lower=1> I;               // # questions
  int<lower=1> J;               // # persons
  int<lower=1> N;               // # observations
  int<lower=1, upper=I> ii[N];  // question for n
  int<lower=1, upper=J> jj[N];  // person for n
  int<lower=0, upper=1> y[N];   // correctness for n
}
parameters {
  ...
}
model {
  ...
}
```
```stan
data {
  ...
}
parameters {
  vector[I] beta;
  vector[J] theta;
  real<lower=0> sigma;
}
model {
  target += normal_lpdf(beta | 0, 5);
  target += normal_lpdf(theta | 0, sigma);
  target += exponential_lpdf(sigma | .1);
  target += bernoulli_logit_lpmf(y | theta[jj] - beta[ii]);
}
```
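As a minimal sketch of using this program directly, the data list simply mirrors the data block. The file name `rasch_simple.stan` is hypothetical, and the verbal aggression data from the first part are reused.

```r
# Hypothetical file name for the simple Rasch program shown above
data_simple <- list(I  = max(aggression$item),
                    J  = max(aggression$person),
                    N  = nrow(aggression),
                    ii = aggression$item,
                    jj = aggression$person,
                    y  = aggression$dich)
fit_simple <- stan("rasch_simple.stan", data = data_simple,
                   iter = 200, chains = 4)
```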
\[ \Pr(y_{ij} = 1 | \theta_j, \beta_i) = \mathrm{logit}^{-1} [ \theta_j - \beta_i ] \]
\[ \theta_j | w_j \sim \mathrm{N}(w_j' \lambda, \sigma^2) \]
Priors:
\[\beta_1 \cdots \beta_{(I-1)} \sim \mathrm{N}(0, 25)\]
\[\sigma \sim \mathrm{Exp}(.1)\]
\[\lambda \sim \mathrm{unif}(-\infty, \infty)\]
Constraint:
\[\beta_I = -\sum_{i=1}^{(I-1)} \beta_i\]
edstan file: rasch_latent_reg.stan
```stan
data {
  int<lower=1> I;               // # questions
  int<lower=1> J;               // # persons
  int<lower=1> N;               // # observations
  int<lower=1, upper=I> ii[N];  // question for n
  int<lower=1, upper=J> jj[N];  // person for n
  int<lower=0, upper=1> y[N];   // correctness for n
  int<lower=1> K;               // # person covariates
  matrix[J,K] W;                // person covariate matrix
}
parameters {
  ...
}
transformed parameters {
  ...
}
model {
  ...
}
```
```stan
data {
  ...
}
parameters {
  vector[I-1] beta_free;
  vector[J] theta;
  real<lower=0> sigma;
  vector[K] lambda;
}
transformed parameters {
  vector[I] beta;
  beta = append_row(beta_free, rep_vector(-1*sum(beta_free), 1));
}
model {
  target += normal_lpdf(beta_free | 0, 5);
  target += normal_lpdf(theta | W*lambda, sigma);
  target += exponential_lpdf(sigma | .1);
  target += bernoulli_logit_lpmf(y | theta[jj] - beta[ii]);
}
```
\[ \Pr(y_{ij} = 1 | \theta_j, \alpha_i, \beta_i) = \mathrm{logit}^{-1} [ \alpha_i (\theta_j + w_j'\lambda - \beta_i) ] \]
\[ \theta_j \sim \mathrm{N}(0, 1) \]
Priors:
\[\beta_1 \cdots \beta_{(I-1)} \sim \mathrm{N}(0, 25)\]
\[\alpha_i \sim \mathrm{log~N}(1, 1)\]
\[\lambda \sim \mathrm{unif}(-\infty, \infty)\]
Constraint:
\[\beta_I = -\sum_{i=1}^{(I-1)} \beta_i\]
edstan file: 2pl_latent_reg.stan
```stan
data {
  int<lower=1> I;               // # questions
  int<lower=1> J;               // # persons
  int<lower=1> N;               // # observations
  int<lower=1, upper=I> ii[N];  // question for n
  int<lower=1, upper=J> jj[N];  // person for n
  int<lower=0, upper=1> y[N];   // correctness for n
  int<lower=1> K;               // # person covariates
  matrix[J,K] W;                // person covariate matrix
}
parameters {
  ...
}
transformed parameters {
  ...
}
model {
  ...
}
```
```stan
data {
  ...
}
parameters {
  vector<lower=0>[I] alpha;
  vector[I-1] beta_free;
  vector[J] theta;
  vector[K] lambda;
}
transformed parameters {
  vector[I] beta;
  beta = append_row(beta_free, rep_vector(-1*sum(beta_free), 1));
}
model {
  vector[J] mu;
  mu = W*lambda;
  target += lognormal_lpdf(alpha | 1, 1);
  target += normal_lpdf(beta_free | 0, 5);
  target += normal_lpdf(theta | 0, 1);
  target += bernoulli_logit_lpmf(y | alpha[ii].*(theta[jj] + mu[jj] - beta[ii]));
}
```
The Rasch and 2PL models may be fit with `stan()` rather than `irt_stan()`, yielding equivalent results.
```r
# Assuming the .stan files are in the working directory:

# Fit Rasch latent reg model
rasch_fit <- stan("rasch_latent_reg.stan", data = data_dich,
                  iter = 200, chains = 4)

# Fit 2PL latent reg model
twopl_fit <- stan("2pl_latent_reg.stan", data = data_dich,
                  iter = 200, chains = 4)
```
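The resulting stanfit objects can be summarized with the usual `print()` method (`print_irt_stan()` expects the edstan data list, but `print()` works on any stanfit); for example:

```r
# Summarize selected parameters from the directly fitted Rasch model
print(rasch_fit, pars = c("beta", "lambda", "sigma"),
      probs = c(.025, .5, .975))
```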
See also an in-depth tutorial for the 2PL model.
Extending IRT models with Stan
\[ \Pr(y_{ij} = 1 | \xi_s, \zeta_{sj}, \beta_i) = \mathrm{logit}^{-1} [ \xi_s + \zeta_{sj} - \beta_i ] \]
\[ \zeta_{sj} \sim \mathrm{N}(0, \sigma_1^2) \]
\[ \xi_s \sim \mathrm{N}(0, \sigma_2^2) \]
```stan
data {
  int<lower=1> I;               // # questions
  int<lower=1> J;               // # persons
  int<lower=1> S;               // # schools
  int<lower=1> N;               // # observations
  int<lower=1, upper=I> ii[N];  // question for n
  int<lower=1, upper=J> jj[N];  // person for n
  int<lower=1, upper=S> ss[N];  // school for n
  int<lower=0, upper=1> y[N];   // correctness for n
}
parameters {
  ...
}
model {
  ...
}
```
```stan
data {
  ...
}
parameters {
  vector[I] beta;
  vector[J] zeta;
  vector[S] xi;
  real<lower=0> sigma[2];
}
model {
  target += normal_lpdf(zeta | 0, sigma[1]);
  target += normal_lpdf(xi | 0, sigma[2]);
  target += bernoulli_logit_lpmf(y | xi[ss] + zeta[jj] - beta[ii]);
}
```
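A sketch of the corresponding data list, assuming a hypothetical long-form data frame `df` with columns `item`, `person`, `school`, and `y`, and a hypothetical file name `multilevel_rasch.stan`:

```r
# df, its columns, and the file name below are hypothetical placeholders
data_multilevel <- list(I  = max(df$item),
                        J  = max(df$person),
                        S  = max(df$school),
                        N  = nrow(df),
                        ii = df$item,
                        jj = df$person,
                        ss = df$school,
                        y  = df$y)
fit_multilevel <- stan("multilevel_rasch.stan", data = data_multilevel,
                       iter = 200, chains = 4)
```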
\[ \Pr(y_{ij} = 1 | \theta_j, \beta_i) = \mathrm{logit}^{-1} [ \theta_{j} - \beta_i ] \]
\[ \theta_j \sim \mathrm{N}(\mu, \sigma^2) \]
\[ \beta_i \sim \mathrm{N}(0, \tau^2) \]
```stan
data {
  int<lower=1> I;               // # questions
  int<lower=1> J;               // # persons
  int<lower=1> N;               // # observations
  int<lower=1, upper=I> ii[N];  // question for n
  int<lower=1, upper=J> jj[N];  // person for n
  int<lower=0, upper=1> y[N];   // correctness for n
}
parameters {
  ...
}
model {
  ...
}
```
```stan
data {
  ...
}
parameters {
  vector[I] beta;
  vector[J] theta;
  real mu;
  real<lower=0> sigma;
  real<lower=0> tau;
}
model {
  target += normal_lpdf(theta | mu, sigma);
  target += normal_lpdf(beta | 0, tau);
  target += bernoulli_logit_lpmf(y | theta[jj] - beta[ii]);
}
```
See also a case study for a 2PL random item model that includes a correlation between the discrimination and difficulty parameters.
Linear logistic test model with error:
\[ \Pr(y_{ij} = 1 | \theta_j, \delta_i) = \mathrm{logit}^{-1} [ \theta_{j} - \delta_i ] \]
\[ \delta_i \equiv x_i'\beta + \epsilon_i \]
\[ \theta_j \sim \mathrm{N}(0, \sigma^2) \]
\[ \epsilon_i \sim \mathrm{N}(0, \tau^2) \]
Stan file: lltme.stan
```stan
data {
  int<lower=1> I;               // # questions
  int<lower=1> J;               // # persons
  int<lower=1> N;               // # observations
  int<lower=1, upper=I> ii[N];  // question for n
  int<lower=1, upper=J> jj[N];  // person for n
  int<lower=0, upper=1> y[N];   // correctness for n
  int<lower=1> K;               // # item predictors
  matrix[I,K] X;                // item predictor matrix
}
parameters {
  ...
}
transformed parameters {
  ...
}
model {
  ...
}
generated quantities {
  ...
}
```
```stan
data {
  ...
}
parameters {
  vector[J] theta;
  vector[K] beta;
  vector[I] epsilon;
  real<lower=0> sigma;
  real<lower=0> tau;
}
transformed parameters {
  vector[I] delta;
  delta = X*beta + epsilon;
}
model {
  target += normal_lpdf(theta | 0, sigma);
  target += normal_lpdf(epsilon | 0, tau);
  target += bernoulli_logit_lpmf(y | theta[jj] - delta[ii]);
}
generated quantities {
  ...
}
```
The generated quantity `rsq` is the share of the variance in item difficulty explained by the item covariates.

```stan
data {
  ...
}
parameters {
  ...
}
transformed parameters {
  ...
}
model {
  ...
}
generated quantities {
  real rsq;
  rsq = variance(X*beta) / (variance(X*beta) + tau^2);
}
```
Let's revisit the verbal aggression data.
```r
# View first few rows
head(aggression)
```
```
##   person item poly dich    description anger male do other scold shout
## 1      1    1    0    0   do bus curse    20    1  0     1     0     0
## 2      1    2    0    0   do bus scold    20    1  0     1     1     0
## 3      1    3    0    0   do bus shout    20    1  0     1     0     1
## 4      1    4    0    0 do train curse    20    1  0     1     0     0
## 5      1    5    0    0 do train scold    20    1  0     1     1     0
## 6      1    6    0    0 do train shout    20    1  0     1     0     1
```
Make \(X\), the matrix of item covariates.
```r
# Assemble a matrix of item covariates
item_covars <- aggression[, c("item", "do", "other", "scold", "shout")]
item_covars <- unique(item_covars)
item_covars <- item_covars[order(item_covars$item), ]
X <- cbind(intercept = 1, item_covars[, -1])
head(X)
```
```
##   intercept do other scold shout
## 1         1  0     1     0     0
## 2         1  0     1     1     0
## 3         1  0     1     0     1
## 4         1  0     1     0     0
## 5         1  0     1     1     0
## 6         1  0     1     0     1
```
Make the full data list and fit the model.
```r
# Make the data list
data_lltme <- list(I = max(aggression$item),
                   J = max(aggression$person),
                   N = nrow(aggression),
                   ii = aggression$item,
                   jj = aggression$person,
                   y = aggression$dich,
                   K = ncol(X),
                   X = X)

# Fit the LLTM-E with Stan
fit_lltme <- stan("lltme.stan", data = data_lltme, iter = 200, chains = 4)
```
```r
# Check convergence (Rhat)
stan_rhat(fit_lltme)
```
```r
# View a summary of parameter posteriors
print(fit_lltme, pars = c("beta", "tau", "rsq", "sigma"))
```
```
## Inference for Stan model: lltme.
## 4 chains, each with iter=200; warmup=100; thin=1;
## post-warmup draws per chain=100, total post-warmup draws=400.
##
##          mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## beta[1] -0.73    0.01 0.23 -1.17 -0.88 -0.74 -0.60 -0.26   262 1.00
## beta[2]  0.72    0.01 0.20  0.32  0.59  0.70  0.85  1.11   400 1.00
## beta[3] -1.07    0.01 0.19 -1.40 -1.19 -1.06 -0.95 -0.65   379 1.00
## beta[4]  1.05    0.01 0.26  0.47  0.89  1.06  1.19  1.55   340 0.99
## beta[5]  2.12    0.02 0.25  1.64  1.95  2.12  2.28  2.69   264 0.99
## tau      0.44    0.01 0.09  0.30  0.37  0.42  0.49  0.65   316 1.00
## rsq      0.86    0.00 0.06  0.70  0.83  0.88  0.90  0.93   272 1.00
## sigma    1.40    0.01 0.08  1.26  1.34  1.40  1.44  1.55   205 1.01
##
## Samples were drawn using NUTS(diag_e) at Thu Jul 07 14:57:57 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
## The estimated Bayesian Fraction of Missing Information is a measure of
## the efficiency of the sampler with values close to 1 being ideal.
## For each chain, these estimates are
## 1.3 0.8 1 0.8
```
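Posterior draws can also be extracted directly for further work; for example, a minimal sketch summarizing `rsq` by hand:

```r
# Extract the posterior draws of rsq and compute a 95% credible interval
rsq_draws <- rstan::extract(fit_lltme, pars = "rsq")$rsq
quantile(rsq_draws, probs = c(.025, .5, .975))
```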