The two-parameter logistic model (2PL) is an item response theory model that includes parameters for both the difficulty and discrimination of items. A hierarchical extension, presented here, models these item parameter pairs as correlated draws from a bivariate normal distribution. This model is similar to the hierarchical three-parameter logistic model proposed by Glas and van der Linden (2003).
\[ \mathrm{logit} [ \Pr(y_{ij} = 1 | \theta_j, \alpha_i, \beta_i) ] = \alpha_i (\theta_j - \beta_i) \] \[ \log \alpha_i, \beta_i \sim \mathrm{MVN}(\mu_1, \mu_2, \Sigma) \] \[ \theta_p \sim \mathrm{N}(0, 1) \]
Variables:
Parameters:
Priors:
The Stan program for the model is provided below. It differs from the notation above in that it is written in terms of the Cholesky decomposition of the correlation matrix for better efficiency. This matrix is named L_Omega
; Omega
because it is the correlation rather than covariance matrix, and L
because it is a Cholesky decomposition. The vector tau
contains the standard deviations that would be the (square root of the) diagonal of the covariance matrix. (The standard deviations are invariant whether or not the the Cholesky decomposition is used.) In the model
block, L_Omega
is converted to its covariance equivalent L_Sigma
, and item parameter pairs xi[i]
are sampled from L_Sigma
. The first element of vector xi[i]
is the log discrimination for item i
, and the second is the difficulty for item i
.
Weakly informative normal priors are placed on mu
, and weakly informative truncated normal priors are placed on tau
. The prior placed on L_Omega
using lkj_corr_cholesky()
is weakly informative and slightly favors a correlation of zero. See the Stan manual for details.
While more efficient, parameters L_Omega
and xi
are difficult to interpret. To alleviate this inconvenience, item parameters alpha
and beta
are derived from xi
in the transformed parameters
block. (There is some redundancy, as beta[i]
and xi[i,2]
are equal.) Also, L_Omega
is converted to a standard correlation matrix, Omega
, in the generated quantities
block.
data {
int<lower=1> I; // # items
int<lower=1> J; // # persons
int<lower=1> N; // # observations
int<lower=1, upper=I> ii[N]; // item for n
int<lower=1, upper=J> jj[N]; // person for n
int<lower=0, upper=1> y[N]; // correctness for n
}
parameters {
vector[J] theta; // abilities
vector[2] xi[I]; // alpha/beta pair vectors
vector[2] mu; // vector for alpha/beta means
vector<lower=0>[2] tau; // vector for alpha/beta residual sds
cholesky_factor_corr[2] L_Omega;
}
transformed parameters {
vector[I] alpha;
vector[I] beta;
for (i in 1:I) {
alpha[i] <- exp(xi[i,1]);
beta[i] <- xi[i,2];
}
}
model {
matrix[2,2] L_Sigma;
L_Sigma <- diag_pre_multiply(tau, L_Omega);
for (i in 1:I)
xi[i] ~ multi_normal_cholesky(mu, L_Sigma);
theta ~ normal(0, 1);
L_Omega ~ lkj_corr_cholesky(4);
mu[1] ~ normal(0,1);
tau[1] ~ exponential(.1);
mu[2] ~ normal(0,5);
tau[2] ~ exponential(.1);
y ~ bernoulli_logit(alpha[ii] .* (theta[jj] - beta[ii]));
}
generated quantities {
corr_matrix[2] Omega;
Omega <- multiply_lower_tri_self_transpose(L_Omega);
}
First, the necessary R packages are loaded.
# Load R packages
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
library(ggplot2)
The R code that follows simulates a dataset conforming to the model. The Stan model will be evaluated in terms of its ability to recover the generating values of the parameters when fit to this dataset.
# Set paramters for the simulated data
I <- 20
J <- 1000
mu <- c(0, 0)
tau <- c(0.25, 1)
Omega <- matrix(c(1, 0.3, 0.3, 1), ncol = 2)
# Calculate or sample remaining paramters
Sigma <- tau %*% t(tau) * Omega
xi <- MASS::mvrnorm(I, c(0, 0), Sigma)
alpha <- exp(mu[1] + as.vector(xi[, 1]))
beta <- as.vector(mu[2] + xi[, 2])
theta <- rnorm(J, mean = 0, sd = 1)
# Assemble data and simulate response
data_list <- list(I = I, J = J, N = I * J, ii = rep(1:I, times = J), jj = rep(1:J,
each = I))
eta <- alpha[data_list$ii] * (theta[data_list$jj] - beta[data_list$ii])
data_list$y <- as.numeric(boot::inv.logit(eta) > runif(data_list$N))
The simulated data consists of 20 items and 1000 persons. The log discriminations have mean 0 and standard deviation 0.25. The difficulties have mean 0 and standard deviation 1. The correlation between the log discrimination residuals and difficulty residuals is 0.3. The simulated dataset is fit with Stan.
# Fit model to simulated data
sim_fit <- stan(file = "hierarchical_2pl.stan", data = data_list, chains = 4,
iter = 500)
Before interpreting the results, it is necessary to check that the chains have converged. Stan provides the \(\hat{R}\) statistic for the model parameters and log posterior. These are provided in the following figure. All values for \(\hat{R}\) should be less than 1.1.
sim_summary <- as.data.frame(summary(sim_fit)[[1]])
sim_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(sim_summary)))
ggplot(sim_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0,
width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))
The Stan model is evaluated in terms of its ability to recover the generating values of the parameters. The R code below prepares a plot in which the points indicate the difference between the posterior means and generating values for the parameters of main interest. This difference is referred to as discrepancy. The lines indicate the 95% posterior intervals for the difference. Ideally, (nearly) all the 95% posterior intervals would include zero.
# Make vector of wanted parameter names
wanted_pars <- c(paste0("alpha[", 1:I, "]"), paste0("beta[", 1:I, "]"), c("mu[1]",
"mu[2]", "tau[1]", "tau[2]", "Omega[1,2]"))
# Get estimated and generating values for wanted parameters
generating_values = c(alpha, beta, mu, tau, Omega[1, 2])
estimated_values <- sim_summary[wanted_pars, c("mean", "2.5%", "97.5%")]
# Assesmble a data frame to pass to ggplot()
sim_df <- data.frame(parameter = factor(wanted_pars, rev(wanted_pars)), row.names = NULL)
sim_df$middle <- estimated_values[, "mean"] - generating_values
sim_df$lower <- estimated_values[, "2.5%"] - generating_values
sim_df$upper <- estimated_values[, "97.5%"] - generating_values
# Plot the discrepancy
ggplot(sim_df) + aes(x = parameter, y = middle, ymin = lower, ymax = upper) +
scale_x_discrete() + geom_abline(intercept = 0, slope = 0, color = "white") +
geom_linerange() + geom_point(size = 2) + labs(y = "Discrepancy", x = NULL) +
theme(panel.grid = element_blank()) + coord_flip()
# Use data and scoring function from the mirt package
library(mirt)
sat <- key2binary(SAT12, key = c(1, 4, 5, 2, 3, 1, 2, 1, 3, 1, 2, 4, 2, 1, 5,
3, 4, 4, 1, 4, 3, 3, 4, 1, 3, 5, 1, 3, 1, 5, 4, 5))
The example data (Wood et al. 2003) are from a grade 12 science assessment. 600 students responded to 32 dichotomously scored items. Non-responses were scored as incorrect, so the data contain no missing values. The scored response matrix is converted to list form and fit with Stan.
# Assemble data list and fit model
sat_list <- list(I = ncol(sat), J = nrow(sat), N = length(sat), ii = rep(1:ncol(sat),
each = nrow(sat)), jj = rep(1:nrow(sat), times = ncol(sat)), y = as.vector(sat))
sat_fit <- stan(file = "hierarchical_2pl.stan", data = sat_list, chains = 4,
iter = 500)
As discussed above, convergence of the chains is assessed for every parameter, and also the log posterior, using \(\hat{R}\).
ex_summary <- as.data.frame(summary(sat_fit)[[1]])
ex_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(ex_summary)))
ggplot(ex_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0,
width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))
Next we view summaries of the parameter posteriors.
# View table of parameter posteriors
print(sat_fit, pars = c("alpha", "beta", "mu", "tau", "Omega[1,2]"))
## Inference for Stan model: hierarchical_2pl.
## 4 chains, each with iter=500; warmup=250; thin=1;
## post-warmup draws per chain=250, total post-warmup draws=1000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha[1] 0.79 0.00 0.12 0.57 0.71 0.78 0.86 1.03 605 1.00
## alpha[2] 1.44 0.01 0.17 1.13 1.33 1.44 1.56 1.77 1000 1.00
## alpha[3] 1.04 0.00 0.13 0.81 0.95 1.04 1.12 1.31 717 1.00
## alpha[4] 0.59 0.00 0.10 0.40 0.52 0.59 0.66 0.80 1000 1.00
## alpha[5] 0.97 0.00 0.13 0.74 0.88 0.96 1.05 1.25 1000 1.01
## alpha[6] 1.11 0.01 0.15 0.83 1.00 1.11 1.20 1.40 640 1.00
## alpha[7] 0.99 0.01 0.14 0.74 0.90 0.99 1.09 1.28 574 1.00
## alpha[8] 0.69 0.00 0.11 0.48 0.62 0.69 0.76 0.91 600 1.00
## alpha[9] 0.73 0.00 0.12 0.51 0.65 0.73 0.81 1.00 663 1.00
## alpha[10] 0.99 0.00 0.12 0.77 0.91 0.98 1.07 1.25 1000 1.00
## alpha[11] 1.70 0.01 0.39 1.03 1.43 1.64 1.91 2.59 799 1.00
## alpha[12] 0.28 0.00 0.07 0.15 0.23 0.28 0.32 0.43 767 1.00
## alpha[13] 1.08 0.01 0.14 0.83 0.99 1.08 1.17 1.35 700 1.00
## alpha[14] 1.02 0.01 0.14 0.77 0.93 1.02 1.11 1.32 652 1.00
## alpha[15] 1.25 0.01 0.18 0.93 1.12 1.24 1.36 1.62 467 1.01
## alpha[16] 0.71 0.00 0.10 0.53 0.64 0.70 0.77 0.92 1000 1.00
## alpha[17] 1.52 0.01 0.30 1.00 1.32 1.49 1.72 2.15 638 1.00
## alpha[18] 1.65 0.01 0.18 1.33 1.52 1.64 1.77 2.04 548 1.00
## alpha[19] 0.83 0.00 0.11 0.61 0.75 0.82 0.90 1.06 1000 1.00
## alpha[20] 1.46 0.01 0.22 1.08 1.32 1.44 1.59 1.94 493 1.00
## alpha[21] 0.83 0.01 0.15 0.57 0.72 0.83 0.94 1.14 732 1.00
## alpha[22] 1.46 0.01 0.27 0.98 1.27 1.44 1.62 2.03 440 1.01
## alpha[23] 0.63 0.00 0.10 0.45 0.56 0.63 0.70 0.85 1000 1.00
## alpha[24] 1.17 0.01 0.15 0.90 1.06 1.17 1.27 1.51 728 1.00
## alpha[25] 0.75 0.00 0.11 0.54 0.67 0.74 0.83 0.97 1000 1.00
## alpha[26] 1.48 0.01 0.16 1.20 1.36 1.47 1.59 1.79 599 1.00
## alpha[27] 1.79 0.01 0.27 1.31 1.60 1.78 1.98 2.35 400 1.01
## alpha[28] 1.03 0.00 0.13 0.80 0.95 1.03 1.12 1.31 1000 1.00
## alpha[29] 0.82 0.00 0.11 0.60 0.74 0.81 0.89 1.05 689 1.00
## alpha[30] 0.43 0.00 0.08 0.26 0.37 0.43 0.49 0.60 717 1.00
## alpha[31] 2.13 0.02 0.30 1.58 1.93 2.11 2.31 2.75 347 1.00
## alpha[32] 0.37 0.00 0.07 0.25 0.32 0.37 0.42 0.53 415 1.00
## beta[1] 1.34 0.01 0.22 0.97 1.20 1.32 1.46 1.84 598 1.00
## beta[2] -0.31 0.00 0.08 -0.47 -0.36 -0.30 -0.25 -0.15 354 1.01
## beta[3] 1.09 0.01 0.15 0.82 0.99 1.08 1.18 1.42 683 1.00
## beta[4] 0.92 0.01 0.22 0.57 0.76 0.90 1.05 1.38 1000 1.01
## beta[5] -0.64 0.01 0.12 -0.89 -0.71 -0.63 -0.56 -0.41 399 1.01
## beta[6] 1.84 0.01 0.21 1.49 1.69 1.82 1.98 2.30 594 1.00
## beta[7] -1.42 0.01 0.19 -1.85 -1.52 -1.41 -1.30 -1.10 517 1.00
## beta[8] 2.21 0.02 0.36 1.66 1.96 2.17 2.41 3.12 470 1.00
## beta[9] -3.08 0.02 0.47 -4.11 -3.36 -3.02 -2.75 -2.30 499 1.00
## beta[10] 0.36 0.00 0.10 0.18 0.29 0.35 0.42 0.57 1000 1.01
## beta[11] -3.19 0.02 0.49 -4.36 -3.46 -3.12 -2.85 -2.38 668 1.00
## beta[12] 1.36 0.02 0.49 0.64 1.01 1.27 1.59 2.60 625 1.00
## beta[13] -0.79 0.01 0.12 -1.03 -0.86 -0.78 -0.71 -0.58 472 1.00
## beta[14] -1.16 0.01 0.15 -1.50 -1.25 -1.15 -1.05 -0.91 666 1.00
## beta[15] -1.55 0.01 0.18 -1.92 -1.66 -1.53 -1.42 -1.23 397 1.02
## beta[16] 0.54 0.00 0.15 0.26 0.43 0.53 0.64 0.83 1000 1.00
## beta[17] -2.80 0.02 0.44 -3.74 -3.04 -2.74 -2.50 -2.12 495 1.00
## beta[18] 0.51 0.00 0.08 0.35 0.45 0.50 0.56 0.68 479 1.01
## beta[19] -0.29 0.00 0.11 -0.50 -0.36 -0.29 -0.22 -0.08 1000 1.00
## beta[20] -1.78 0.01 0.19 -2.19 -1.89 -1.76 -1.64 -1.43 434 1.01
## beta[21] -3.23 0.02 0.53 -4.49 -3.55 -3.14 -2.84 -2.39 619 1.00
## beta[22] -2.39 0.02 0.33 -3.10 -2.57 -2.37 -2.17 -1.85 425 1.01
## beta[23] 1.36 0.01 0.26 0.95 1.20 1.34 1.51 1.94 725 1.00
## beta[24] -1.09 0.00 0.13 -1.38 -1.17 -1.08 -0.99 -0.84 716 1.00
## beta[25] 0.77 0.01 0.16 0.49 0.65 0.75 0.87 1.11 721 1.01
## beta[26] 0.11 0.00 0.08 -0.03 0.06 0.11 0.16 0.26 576 1.00
## beta[27] -1.52 0.01 0.15 -1.86 -1.62 -1.51 -1.41 -1.26 289 1.01
## beta[28] -0.17 0.00 0.10 -0.38 -0.23 -0.17 -0.11 0.02 1000 1.00
## beta[29] 0.93 0.01 0.16 0.66 0.81 0.91 1.04 1.27 557 1.00
## beta[30] 0.60 0.01 0.25 0.17 0.44 0.57 0.73 1.19 555 1.01
## beta[31] -1.26 0.01 0.11 -1.50 -1.32 -1.26 -1.18 -1.06 226 1.01
## beta[32] 4.57 0.04 0.92 3.17 3.90 4.43 5.16 6.43 606 1.00
## mu[1] -0.05 0.00 0.10 -0.25 -0.11 -0.05 0.02 0.13 1000 1.00
## mu[2] -0.25 0.01 0.34 -0.91 -0.48 -0.26 -0.02 0.39 1000 1.00
## tau[1] 0.50 0.00 0.08 0.38 0.45 0.50 0.55 0.68 708 1.00
## tau[2] 1.83 0.01 0.28 1.37 1.64 1.80 1.99 2.44 523 1.00
## Omega[1,2] -0.44 0.00 0.15 -0.70 -0.56 -0.45 -0.34 -0.12 1000 1.00
##
## Samples were drawn using NUTS(diag_e) at Tue Apr 19 16:52:58 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
To visualize the correlation between item parameters, the values of alpha[i]
and beta[i]
may be plotted against one another. This is presented in the left side of the figure below. Because the correlation is actually between xi[i,1]
(which is equal to beta[i]
) and xi[i,2]
(which is alpha[i]
on the log scale), a scatter plot pf xi[i,1]
versus xi[i,2]
is given on the right side.
# Assesmble a data frame of item parameter estimates and pass to ggplot
ab_df <- data.frame(Discrimination = ex_summary[paste0("alpha[", 1:sat_list$I,
"]"), "mean"], Difficulty = ex_summary[paste0("beta[", 1:sat_list$I, "]"),
"mean"], parameterization = "alpha & beta")
xi_df <- data.frame(Discrimination = ex_summary[paste0("xi[", 1:sat_list$I,
",1]"), "mean"], Difficulty = ex_summary[paste0("xi[", 1:sat_list$I, ",2]"),
"mean"], parameterization = "xi")
full_df <- rbind(ab_df, xi_df)
ggplot(full_df) + aes(x = Difficulty, y = Discrimination) + geom_point() + facet_wrap(~parameterization,
scales = "free")
Glas, Cees AW, and Wim J van der Linden. 2003. “Computerized Adaptive Testing with Item Cloning.” Applied Psychological Measurement 27 (4). Sage Publications: 247–61.
Wood, R., Wilson D. T., Gibbons R. D., Schilling S. G., Muraki E., and R. D. Bock. 2003. TESTFACT 4 for Windows: Test Scoring, Item Statistics, and Full-Information Item Factor Analysis. Lincolnwood, IL: Scientific Software International.