# 1 Model

## 1.1 Overview

The two-parameter logistic model (2PL) is an item response theory model that includes parameters for both the difficulty and discrimination of items. A hierarchical extension, presented here, models these item parameter pairs as correlated draws from a bivariate normal distribution. This model is similar to the hierarchical three-parameter logistic model proposed by Glas and van der Linden (2003).

$\mathrm{logit} [ \Pr(y_{ij} = 1 | \theta_j, \alpha_i, \beta_i) ] = \alpha_i (\theta_j - \beta_i)$ $\log \alpha_i, \beta_i \sim \mathrm{MVN}(\mu_1, \mu_2, \Sigma)$ $\theta_p \sim \mathrm{N}(0, 1)$

Variables:

• $$i = 1 \ldots I$$ indexes items
• $$j = 1 \ldots J$$ indexes persons
• $$y_{ij} \in \{ 0,1 \}$$ is the response of person $$j$$ to item $$i$$

Parameters:

• $$\alpha_i$$ is the discrimination for item $$i$$
• $$\beta_i$$ is the difficulty for item $$i$$
• $$\theta_j$$ is the ability for person $$j$$
• $$\mu_1$$ is the mean for $$\log \alpha_i$$
• $$\mu_2$$ is the mean for $$\beta_i$$
• $$\Sigma$$ is the covariance matrix for $$\log \alpha_i$$ and $$\beta_i$$

Priors:

• $$\mu_1 \sim \mathrm{N}(0,1)$$ is a weakly informative prior for the mean of the log discrimination parameters.
• $$\mu_2 \sim \mathrm{N}(0,25)$$ is a weakly informative prior for the mean of the difficulty parameters.
• Let $$\tau_1^2 = \Sigma_{1,1}$$ be the variance of the log discrimination parameters. Then $$\tau \sim \mathrm{Exp}(.1)$$ is a weakly informative prior for the standard deviation.
• Let $$\tau_2^2 = \Sigma_{2,2}$$ be the variance of the difficulty parameters. Then $$\tau_2 \sim \mathrm{Exp}(.1)$$ is a weakly informative prior for the standard deviation.
• A weakly informative prior is placed on the covariance, $$\Sigma_{1,2} = \Sigma_{2,1}$$, shrinking the posterior towards zero. This is described in more detail in the next section.

## 1.2Stan program

The Stan program for the model is provided below. It differs from the notation above in that it is written in terms of the Cholesky decomposition of the correlation matrix for better efficiency. This matrix is named L_Omega; Omega because it is the correlation rather than covariance matrix, and L because it is a Cholesky decomposition. The vector tau contains the standard deviations that would be the (square root of the) diagonal of the covariance matrix. (The standard deviations are invariant whether or not the the Cholesky decomposition is used.) In the model block, L_Omega is converted to its covariance equivalent L_Sigma, and item parameter pairs xi[i] are sampled from L_Sigma. The first element of vector xi[i] is the log discrimination for item i, and the second is the difficulty for item i.

Weakly informative normal priors are placed on mu, and weakly informative truncated normal priors are placed on tau. The prior placed on L_Omega using lkj_corr_cholesky() is weakly informative and slightly favors a correlation of zero. See the Stan manual for details.

While more efficient, parameters L_Omega and xi are difficult to interpret. To alleviate this inconvenience, item parameters alpha and beta are derived from xi in the transformed parameters block. (There is some redundancy, as beta[i] and xi[i,2] are equal.) Also, L_Omega is converted to a standard correlation matrix, Omega, in the generated quantities block.

data {
int<lower=1> I;               // # items
int<lower=1> J;               // # persons
int<lower=1> N;               // # observations
int<lower=1, upper=I> ii[N];  // item for n
int<lower=1, upper=J> jj[N];  // person for n
int<lower=0, upper=1> y[N];   // correctness for n
}
parameters {
vector[J] theta;              // abilities
vector[2] xi[I];              // alpha/beta pair vectors
vector[2] mu;                 // vector for alpha/beta means
vector<lower=0>[2] tau;       // vector for alpha/beta residual sds
cholesky_factor_corr[2] L_Omega;
}
transformed parameters {
vector[I] alpha;
vector[I] beta;
for (i in 1:I) {
alpha[i] <- exp(xi[i,1]);
beta[i] <- xi[i,2];
}
}
model {
matrix[2,2] L_Sigma;
L_Sigma <- diag_pre_multiply(tau, L_Omega);
for (i in 1:I)
xi[i] ~ multi_normal_cholesky(mu, L_Sigma);
theta ~ normal(0, 1);
L_Omega ~ lkj_corr_cholesky(4);
mu[1] ~ normal(0,1);
tau[1] ~ exponential(.1);
mu[2] ~ normal(0,5);
tau[2] ~ exponential(.1);
y ~ bernoulli_logit(alpha[ii] .* (theta[jj] - beta[ii]));
}
generated quantities {
corr_matrix[2] Omega;
Omega <- multiply_lower_tri_self_transpose(L_Omega);
}

# 2 Simulation

First, the necessary R packages are loaded.

# Load R packages
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
library(ggplot2)

The R code that follows simulates a dataset conforming to the model. The Stan model will be evaluated in terms of its ability to recover the generating values of the parameters when fit to this dataset.

# Set paramters for the simulated data
I <- 20
J <- 1000
mu <- c(0, 0)
tau <- c(0.25, 1)
Omega <- matrix(c(1, 0.3, 0.3, 1), ncol = 2)

# Calculate or sample remaining paramters
Sigma <- tau %*% t(tau) * Omega
xi <- MASS::mvrnorm(I, c(0, 0), Sigma)
alpha <- exp(mu[1] + as.vector(xi[, 1]))
beta <- as.vector(mu[2] + xi[, 2])
theta <- rnorm(J, mean = 0, sd = 1)

# Assemble data and simulate response
data_list <- list(I = I, J = J, N = I * J, ii = rep(1:I, times = J), jj = rep(1:J,
each = I))
eta <- alpha[data_list$ii] * (theta[data_list$jj] - beta[data_list$ii]) data_list$y <- as.numeric(boot::inv.logit(eta) > runif(data_list$N)) The simulated data consists of 20 items and 1000 persons. The log discriminations have mean 0 and standard deviation 0.25. The difficulties have mean 0 and standard deviation 1. The correlation between the log discrimination residuals and difficulty residuals is 0.3. The simulated dataset is fit with Stan. # Fit model to simulated data sim_fit <- stan(file = "hierarchical_2pl.stan", data = data_list, chains = 4, iter = 500) Before interpreting the results, it is necessary to check that the chains have converged. Stan provides the $$\hat{R}$$ statistic for the model parameters and log posterior. These are provided in the following figure. All values for $$\hat{R}$$ should be less than 1.1. sim_summary <- as.data.frame(summary(sim_fit)[[1]]) sim_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(sim_summary)))
ggplot(sim_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0,
width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))

The Stan model is evaluated in terms of its ability to recover the generating values of the parameters. The R code below prepares a plot in which the points indicate the difference between the posterior means and generating values for the parameters of main interest. This difference is referred to as discrepancy. The lines indicate the 95% posterior intervals for the difference. Ideally, (nearly) all the 95% posterior intervals would include zero.

# Make vector of wanted parameter names
wanted_pars <- c(paste0("alpha[", 1:I, "]"), paste0("beta[", 1:I, "]"), c("mu[1]",
"mu[2]", "tau[1]", "tau[2]", "Omega[1,2]"))

# Get estimated and generating values for wanted parameters
generating_values = c(alpha, beta, mu, tau, Omega[1, 2])
estimated_values <- sim_summary[wanted_pars, c("mean", "2.5%", "97.5%")]

# Assesmble a data frame to pass to ggplot()
sim_df <- data.frame(parameter = factor(wanted_pars, rev(wanted_pars)), row.names = NULL)
sim_df$middle <- estimated_values[, "mean"] - generating_values sim_df$lower <- estimated_values[, "2.5%"] - generating_values
sim_df$upper <- estimated_values[, "97.5%"] - generating_values # Plot the discrepancy ggplot(sim_df) + aes(x = parameter, y = middle, ymin = lower, ymax = upper) + scale_x_discrete() + geom_abline(intercept = 0, slope = 0, color = "white") + geom_linerange() + geom_point(size = 2) + labs(y = "Discrepancy", x = NULL) + theme(panel.grid = element_blank()) + coord_flip() # 3 Example application # Use data and scoring function from the mirt package library(mirt) sat <- key2binary(SAT12, key = c(1, 4, 5, 2, 3, 1, 2, 1, 3, 1, 2, 4, 2, 1, 5, 3, 4, 4, 1, 4, 3, 3, 4, 1, 3, 5, 1, 3, 1, 5, 4, 5)) The example data (Wood et al. 2003) are from a grade 12 science assessment. 600 students responded to 32 dichotomously scored items. Non-responses were scored as incorrect, so the data contain no missing values. The scored response matrix is converted to list form and fit with Stan. # Assemble data list and fit model sat_list <- list(I = ncol(sat), J = nrow(sat), N = length(sat), ii = rep(1:ncol(sat), each = nrow(sat)), jj = rep(1:nrow(sat), times = ncol(sat)), y = as.vector(sat)) sat_fit <- stan(file = "hierarchical_2pl.stan", data = sat_list, chains = 4, iter = 500) As discussed above, convergence of the chains is assessed for every parameter, and also the log posterior, using $$\hat{R}$$. ex_summary <- as.data.frame(summary(sat_fit)[[1]]) ex_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(ex_summary)))
ggplot(ex_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0,
width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))

Next we view summaries of the parameter posteriors.

# View table of parameter posteriors
print(sat_fit, pars = c("alpha", "beta", "mu", "tau", "Omega[1,2]"))
## Inference for Stan model: hierarchical_2pl.
## 4 chains, each with iter=500; warmup=250; thin=1;
## post-warmup draws per chain=250, total post-warmup draws=1000.
##
##             mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## alpha[1]    0.79    0.00 0.12  0.57  0.71  0.78  0.86  1.03   605 1.00
## alpha[2]    1.44    0.01 0.17  1.13  1.33  1.44  1.56  1.77  1000 1.00
## alpha[3]    1.04    0.00 0.13  0.81  0.95  1.04  1.12  1.31   717 1.00
## alpha[4]    0.59    0.00 0.10  0.40  0.52  0.59  0.66  0.80  1000 1.00
## alpha[5]    0.97    0.00 0.13  0.74  0.88  0.96  1.05  1.25  1000 1.01
## alpha[6]    1.11    0.01 0.15  0.83  1.00  1.11  1.20  1.40   640 1.00
## alpha[7]    0.99    0.01 0.14  0.74  0.90  0.99  1.09  1.28   574 1.00
## alpha[8]    0.69    0.00 0.11  0.48  0.62  0.69  0.76  0.91   600 1.00
## alpha[9]    0.73    0.00 0.12  0.51  0.65  0.73  0.81  1.00   663 1.00
## alpha[10]   0.99    0.00 0.12  0.77  0.91  0.98  1.07  1.25  1000 1.00
## alpha[11]   1.70    0.01 0.39  1.03  1.43  1.64  1.91  2.59   799 1.00
## alpha[12]   0.28    0.00 0.07  0.15  0.23  0.28  0.32  0.43   767 1.00
## alpha[13]   1.08    0.01 0.14  0.83  0.99  1.08  1.17  1.35   700 1.00
## alpha[14]   1.02    0.01 0.14  0.77  0.93  1.02  1.11  1.32   652 1.00
## alpha[15]   1.25    0.01 0.18  0.93  1.12  1.24  1.36  1.62   467 1.01
## alpha[16]   0.71    0.00 0.10  0.53  0.64  0.70  0.77  0.92  1000 1.00
## alpha[17]   1.52    0.01 0.30  1.00  1.32  1.49  1.72  2.15   638 1.00
## alpha[18]   1.65    0.01 0.18  1.33  1.52  1.64  1.77  2.04   548 1.00
## alpha[19]   0.83    0.00 0.11  0.61  0.75  0.82  0.90  1.06  1000 1.00
## alpha[20]   1.46    0.01 0.22  1.08  1.32  1.44  1.59  1.94   493 1.00
## alpha[21]   0.83    0.01 0.15  0.57  0.72  0.83  0.94  1.14   732 1.00
## alpha[22]   1.46    0.01 0.27  0.98  1.27  1.44  1.62  2.03   440 1.01
## alpha[23]   0.63    0.00 0.10  0.45  0.56  0.63  0.70  0.85  1000 1.00
## alpha[24]   1.17    0.01 0.15  0.90  1.06  1.17  1.27  1.51   728 1.00
## alpha[25]   0.75    0.00 0.11  0.54  0.67  0.74  0.83  0.97  1000 1.00
## alpha[26]   1.48    0.01 0.16  1.20  1.36  1.47  1.59  1.79   599 1.00
## alpha[27]   1.79    0.01 0.27  1.31  1.60  1.78  1.98  2.35   400 1.01
## alpha[28]   1.03    0.00 0.13  0.80  0.95  1.03  1.12  1.31  1000 1.00
## alpha[29]   0.82    0.00 0.11  0.60  0.74  0.81  0.89  1.05   689 1.00
## alpha[30]   0.43    0.00 0.08  0.26  0.37  0.43  0.49  0.60   717 1.00
## alpha[31]   2.13    0.02 0.30  1.58  1.93  2.11  2.31  2.75   347 1.00
## alpha[32]   0.37    0.00 0.07  0.25  0.32  0.37  0.42  0.53   415 1.00
## beta[1]     1.34    0.01 0.22  0.97  1.20  1.32  1.46  1.84   598 1.00
## beta[2]    -0.31    0.00 0.08 -0.47 -0.36 -0.30 -0.25 -0.15   354 1.01
## beta[3]     1.09    0.01 0.15  0.82  0.99  1.08  1.18  1.42   683 1.00
## beta[4]     0.92    0.01 0.22  0.57  0.76  0.90  1.05  1.38  1000 1.01
## beta[5]    -0.64    0.01 0.12 -0.89 -0.71 -0.63 -0.56 -0.41   399 1.01
## beta[6]     1.84    0.01 0.21  1.49  1.69  1.82  1.98  2.30   594 1.00
## beta[7]    -1.42    0.01 0.19 -1.85 -1.52 -1.41 -1.30 -1.10   517 1.00
## beta[8]     2.21    0.02 0.36  1.66  1.96  2.17  2.41  3.12   470 1.00
## beta[9]    -3.08    0.02 0.47 -4.11 -3.36 -3.02 -2.75 -2.30   499 1.00
## beta[10]    0.36    0.00 0.10  0.18  0.29  0.35  0.42  0.57  1000 1.01
## beta[11]   -3.19    0.02 0.49 -4.36 -3.46 -3.12 -2.85 -2.38   668 1.00
## beta[12]    1.36    0.02 0.49  0.64  1.01  1.27  1.59  2.60   625 1.00
## beta[13]   -0.79    0.01 0.12 -1.03 -0.86 -0.78 -0.71 -0.58   472 1.00
## beta[14]   -1.16    0.01 0.15 -1.50 -1.25 -1.15 -1.05 -0.91   666 1.00
## beta[15]   -1.55    0.01 0.18 -1.92 -1.66 -1.53 -1.42 -1.23   397 1.02
## beta[16]    0.54    0.00 0.15  0.26  0.43  0.53  0.64  0.83  1000 1.00
## beta[17]   -2.80    0.02 0.44 -3.74 -3.04 -2.74 -2.50 -2.12   495 1.00
## beta[18]    0.51    0.00 0.08  0.35  0.45  0.50  0.56  0.68   479 1.01
## beta[19]   -0.29    0.00 0.11 -0.50 -0.36 -0.29 -0.22 -0.08  1000 1.00
## beta[20]   -1.78    0.01 0.19 -2.19 -1.89 -1.76 -1.64 -1.43   434 1.01
## beta[21]   -3.23    0.02 0.53 -4.49 -3.55 -3.14 -2.84 -2.39   619 1.00
## beta[22]   -2.39    0.02 0.33 -3.10 -2.57 -2.37 -2.17 -1.85   425 1.01
## beta[23]    1.36    0.01 0.26  0.95  1.20  1.34  1.51  1.94   725 1.00
## beta[24]   -1.09    0.00 0.13 -1.38 -1.17 -1.08 -0.99 -0.84   716 1.00
## beta[25]    0.77    0.01 0.16  0.49  0.65  0.75  0.87  1.11   721 1.01
## beta[26]    0.11    0.00 0.08 -0.03  0.06  0.11  0.16  0.26   576 1.00
## beta[27]   -1.52    0.01 0.15 -1.86 -1.62 -1.51 -1.41 -1.26   289 1.01
## beta[28]   -0.17    0.00 0.10 -0.38 -0.23 -0.17 -0.11  0.02  1000 1.00
## beta[29]    0.93    0.01 0.16  0.66  0.81  0.91  1.04  1.27   557 1.00
## beta[30]    0.60    0.01 0.25  0.17  0.44  0.57  0.73  1.19   555 1.01
## beta[31]   -1.26    0.01 0.11 -1.50 -1.32 -1.26 -1.18 -1.06   226 1.01
## beta[32]    4.57    0.04 0.92  3.17  3.90  4.43  5.16  6.43   606 1.00
## mu[1]      -0.05    0.00 0.10 -0.25 -0.11 -0.05  0.02  0.13  1000 1.00
## mu[2]      -0.25    0.01 0.34 -0.91 -0.48 -0.26 -0.02  0.39  1000 1.00
## tau[1]      0.50    0.00 0.08  0.38  0.45  0.50  0.55  0.68   708 1.00
## tau[2]      1.83    0.01 0.28  1.37  1.64  1.80  1.99  2.44   523 1.00
## Omega[1,2] -0.44    0.00 0.15 -0.70 -0.56 -0.45 -0.34 -0.12  1000 1.00
##
## Samples were drawn using NUTS(diag_e) at Tue Apr 19 16:52:58 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).

To visualize the correlation between item parameters, the values of alpha[i] and beta[i] may be plotted against one another. This is presented in the left side of the figure below. Because the correlation is actually between xi[i,1] (which is equal to beta[i]) and xi[i,2] (which is alpha[i] on the log scale), a scatter plot pf xi[i,1] versus xi[i,2] is given on the right side.

# Assesmble a data frame of item parameter estimates and pass to ggplot
ab_df <- data.frame(Discrimination = ex_summary[paste0("alpha[", 1:sat_list$I, "]"), "mean"], Difficulty = ex_summary[paste0("beta[", 1:sat_list$I, "]"),
"mean"], parameterization = "alpha & beta")
xi_df <- data.frame(Discrimination = ex_summary[paste0("xi[", 1:sat_list$I, ",1]"), "mean"], Difficulty = ex_summary[paste0("xi[", 1:sat_list$I, ",2]"),
"mean"], parameterization = "xi")
full_df <- rbind(ab_df, xi_df)
ggplot(full_df) + aes(x = Difficulty, y = Discrimination) + geom_point() + facet_wrap(~parameterization,
scales = "free")

# 4 References

Glas, Cees AW, and Wim J van der Linden. 2003. “Computerized Adaptive Testing with Item Cloning.” Applied Psychological Measurement 27 (4). Sage Publications: 247–61.

Wood, R., Wilson D. T., Gibbons R. D., Schilling S. G., Muraki E., and R. D. Bock. 2003. TESTFACT 4 for Windows: Test Scoring, Item Statistics, and Full-Information Item Factor Analysis. Lincolnwood, IL: Scientific Software International.