Public opinion data are typically analyzed when researchers are interested in the opinions of individuals and subgroups. But sometimes we are more interested in the average opinion in a society at a point in time. In such studies of macro-opinion, our attention turns to trends across time, or patterns across societies. Yet this is where problems arise. Existing cross-national surveys field questions intermittently. The wording of questions varies across projects and sometimes across time as well. All of this makes it hard to produce a single measure of opinion on a given topic that includes all relevant public opinion data and which can be used to make comparisons across time and space.

This case study describes a method for doing so, based on (Claassen 2019, 2020a, 2020b). I begin by describing the model and its coding in Stan. I then outline the data I use – public support for a democratic system of government – and explain how to set the model up in R. After fitting the model using cmdstanr, I explore some model checks before extracting and examining key quantities of interest.

Model

The observed number of respondents \(x\) offering an affirmative opinion (e.g. in support of democracy) for each country \(i\), year \(t\), and survey item \(k\), is modeled as a binomial distributed count: \[\begin{equation} x_{ikt} \sim \text{Binomial}(s_{ikt},~ \pi_{ikt}). \end{equation}\]

Following McGann (2014), I use a beta prior on the probability parameter. This allows for some additional dispersion in the observed survey responses, i.e., it captures sources of error over and above simple sampling error:

\[\begin{equation} \pi_{ikt} \sim \text{Beta}(\alpha_{ikt},~ \beta_{ikt}) \end{equation}\]

The two shape parameters of the beta distribution can be reparameterized as a vector of expectation parameters \(\eta\) and a dispersion parameter \(\phi\):

\[\begin{align} \alpha_{ikt} &= \phi\eta_{ikt}\\ \beta_{ikt} &= \phi(1-\eta_{ikt}) \end{align}\]

The expectation parameters are then modeled as a function of the latent country-year estimates \(\theta\) as well as some additional parameters:

\[\begin{equation} \eta_{ikt} = \text{logit}^{-1}(\lambda_{k} + \delta_{ik} + \gamma_k\theta_{it} ) \end{equation}\]

Two item parameters, \(\lambda\) and \(\gamma\), are included. The former allows for item-specific bias that may inflate or deflate the probability of agreement regardless of true latent opinion \(\theta\). This is comparable to an item difficulty parameter in item-response theoretic (IRT) models. \(\gamma\) is an item slope that allows items to have different magnitude effects on latent opinion, analogous to the item discrimination parameters used in IRT models.

I furthermore include an additional parameter \(\delta\) that allows item bias to vary by country. For example, questions about military regimes will have different connotations (and bias) where military regimes have previously ruled.

All item parameters are modeled hierarchically, which will be helpful if observed survey data are sparse when grouped by country, by item, and by item-country combinations.

\[\begin{equation} \delta_{ik} \sim \text{N}(0,~ \sigma^2_{\delta}) \end{equation}\]

The \(\lambda\) and \(\gamma\) parameters are modeled jointly using a hierarchical bivariate normal prior:

\[\begin{equation} \begin{pmatrix} \lambda_k \\ \gamma_k \\ \end{pmatrix} \sim \text{N} \begin{bmatrix} \begin{pmatrix} \mu_{\lambda} \\ \mu_{\gamma} \\ \end{pmatrix} , \begin{pmatrix} \sigma^2_\lambda & \rho \sigma_\lambda \sigma_\gamma \\ \rho \sigma_\lambda \sigma_\gamma & \sigma^2_\gamma \\ \end{pmatrix} \end{bmatrix} \end{equation}\]

Finally, latent opinion is allowed to evolve over time using a simple dynamic linear model where the current level of latent opinion is a function of the previous year’s level plus some random noise (e.g., Caughey and Warshaw (2015)):

\[\begin{equation} \theta_{it} \sim \text{N}(\theta_{i, t-1},~ \sigma^2_\theta) \end{equation}\]

Stan code

The data block includes data, dimensions (of various grouping factors), and several index vectors. In terms of data, we have the vectors x, and samp, which are the number of respondents who, respectively, offered and were asked an opinion for each country, year and item combination. N is the number of national opinions we have in our data – there may be more than one per country-year if a survey has included multiple items or if multiple surveys were run in a country and year. R, in contrast, is the number of country-by-year estimates of latent opinion. The model only estimates the latent opinion for country \(j\) starting with the year in which the first survey was fielded in country \(j\). As such, R \(<\) J \(\times\) T.

data{
  int<lower=1> N; // number of national survey opinions
  int<lower=1> J; // number of countries
  int<lower=1> K; // number of items
  int<lower=1> P; // number of items-country combinations  
  int<lower=1> R; // number of national opinion estimates
  array[N] int<lower=1,upper=J> jj; // country j for opinion n
  array[N] int<lower=1,upper=K> kk; // item k for opinion n
  array[N] int<lower=1,upper=P> pp; // item-country p for opinion n
  array[N] int<lower=1,upper=R> rr; // estimate r for opinion n
  array[N] int<lower=1> x; // vector of survey responses, count
  array[N] int<lower=1> samp; // vector of sample sizes
  array[K] int<lower=1> it_len; // number of countries for each item
  array[J] int<lower=1> est_pos; // indicator showing cntry start for estimate vector   
  array[J] int<lower=1> len_theta_ts; // indicator for length of each cntry ts
  real mn_resp_log; // observed response mean proportion on logit scale
}

parameters{
  real<lower=0> sigma_theta; // opinion evolution error SD
  real<lower=0> sigma_delta; // item-country intercept error SD
  row_vector[J] theta_init; // initial latent traits for year 0
  real<lower=0> phi; // dispersion parameter
  corr_matrix[2] Omega; // correlation matrix for item pars
  vector<lower=0>[2] tau; // cor -> cov conversion
  real mu_lambda; // item intercept expectation
  vector[P] delta_ncp; // redundant parameters for item-country effects
  vector[R] theta_ncp; // redundant parameters for latent opinion
  matrix[K,2] Gamma_ncp; // non-centered parameters for item parameters
}

transformed parameters{
  // dynamic model with non-centered parameters
  vector[R] theta; // R-vector of theta values  
  for (j in 1:J) {                  
    theta[est_pos[j]] = theta_init[j] 
    + sigma_theta * theta_ncp[est_pos[j]];
    for (i in 1:(len_theta_ts[j]-1)) {
      theta[(est_pos[j]+i)] = theta[(est_pos[j]+i-1)] 
      + sigma_theta * theta_ncp[(est_pos[j]+i)];
    }
  }
  // variance-covariance matrix for item ints and slopes
  matrix[2,2] Sigma = quad_form_diag(Omega, tau);  
  // non-centered item-country parameters
  vector[P] delta = sigma_delta * delta_ncp; 
  // item parameter models with non-centered parameters
  real mu_gamm = 1; // item slope expectation
  matrix[K,2] Gamma; // matrix of item intercepts and slopes 
  for (k in 1:K) 
    Gamma[k] = [ mu_lambda , mu_gamm ] + Gamma_ncp[k] * Sigma;
  vector[K] lambda = Gamma[,1]; // K estimated item intercepts
  vector[K] gamm = Gamma[,2]; // K estimated item slopes
  // fitted values model
  vector<lower=0,upper=1>[N]eta = inv_logit(lambda[kk]+ delta[pp] 
    + gamm[kk] .* theta[rr]);  
  // reparamaterise beta distr parameters
  vector<lower=0>[N] beta_par1 = phi * eta; 
  vector<lower=0>[N] beta_par2 = phi * (1 - eta); 
}

model{
  // response model
  x ~ beta_binomial(samp, beta_par1, beta_par2); 
  // priors
  phi ~ gamma(3, 0.04);                 
  sigma_theta ~ normal(0, 1); 
  sigma_delta ~ normal(0, 1); 
  tau ~ normal(0, 1);
  Omega ~ lkj_corr(2);
  mu_lambda ~ normal(mn_resp_log, 0.5);
  theta_init ~ normal(0, 1);
  theta_ncp ~ normal(0, 1);
  to_vector(Gamma_ncp) ~ normal(0, 1);
  // standard normal prior for item-country effects, centered within items
  int pos; // local variable indicating which item to evaluate  
  pos = 1;
  for (k in 1:K) { 
    segment(delta_ncp, pos, it_len[k]) ~ normal(0, 1);
    pos = pos + it_len[k];
  }
}

generated quantities {
  vector[N] x_pred; // fitted data to check model
  vector[N] log_lik; // log lik for WAIC calc
  for (i in 1:N) {
    x_pred[i] = beta_binomial_rng(samp[i], beta_par1[i], beta_par2[i]);
    log_lik[i] = beta_binomial_lpmf(x[i] | samp[i], beta_par1[i], beta_par2[i]); 
  }
}

To faciliate the estimation of a non-rectagular panel of latent opinion, latent estimates are stored as a vector theta, which allows the use of vectorized code and speeds up estimation. This is where the various index vectors, e.g., jj, kk, etc., come into play. The non-centralized parameterization proved necessary for convergence of theta and its error standard deviation sigma_theta, hence the theta model being specified in the tranformed parameters block. The item and item-country parameters lambda, gamma, and delta are similarly estimated using non-centralized parametetrizaions and coded within the tranformed parameters block.

Latent opinion is linked to observed survey data using the \(\eta\) estimates. Since \(\eta\) are the probability parameter for the beta-binomial response model, these parameters are inverse-logit transformed to be constrained to the [0,1] interval.

The response model linking \(\eta\) to the observed data \(x\) is specfied in the model block, using the beta_binomial distribution. \(\eta\) is reparameterized as the two beta-binomial shape parameters (coded as beta_par1 and beta_par2 in the model).

To aid with model identification, the expectation of the item intercepts \(\lambda\) is centered on the logit-transformed mean percentage agreement observed in \(x\) while the expectation of the item slopes \(\gamma\) is fixed at 1.

Variance parameters are given weakly-informative half-normal priors, e.g., \(\sigma_\theta \sim \text{N}^{+}(0,~ 1)\) and the beta-binomial dispersion parameter \(\phi\) is given a \(\Gamma(3,~ 0.04)\) prior. The variance-covariance matrix of item intercepts and slopes is decomposed into the product of the variances for each vector of parameters and a \(2\times2\) correlation matrix \(\Omega\), which is given an LKJ prior as described in the stan documentation. Finally, the initial values of latent opinion for each country receive a \(\text{N}(0,~ 1)\) prior.

To aid in model checking, log-likelihood and predicted response count vectors are included in the generated quantities block but these are not examined here.

Data and Set up

Support for democracy is the extent to which a national public approves of a democratic system and rejects any autocratic alternatives. It is measured using survey questions that ask respondents to evaluate the appropriateness or desirability of democracy; compare democracy to some undemocratic alternative; or evaluate one of these undemocratic forms of government. Data were collected from cross-national survey projects that ran surveys between 1988 and 2020. The dataset includes 4,536 nationally aggregated survey responses gathered by 16 survey projects in 162 countries and territories and is available here.

I load the opinion dataset and drop countries for which we have survey data from only one point in time. Data for 144 countries and up to 32 years (1988 to 2020) remain.

# load data
supdem = read.csv("Support for democracy edit 1122.csv")

# drop countries with less than 2 years of data
cnt_obs_yrs = rowSums(table(supdem$Country, supdem$Year) > 0)
supdem = supdem[supdem$Country %in% levels(factor(supdem$Country))[cnt_obs_yrs > 1], ]

# check
head(supdem)

##   Country ISO3c Year Project        Item             ItemCnt Sample Response
## 4 Albania   ALB 1998     wvs  Church_wvs  Church_wvs_Albania    999    0.887
## 5 Albania   ALB 1998     wvs EvDemoc_wvs EvDemoc_wvs_Albania    999    0.943
## 6 Albania   ALB 2002     wvs    Army_wvs    Army_wvs_Albania   1000    0.803
## 7 Albania   ALB 2002     wvs  Church_wvs  Church_wvs_Albania   1000    0.889
## 8 Albania   ALB 2002     wvs EvDemoc_wvs EvDemoc_wvs_Albania   1000    0.917
## 9 Albania   ALB 2002     wvs  Strong_wvs  Strong_wvs_Albania   1000    0.748
##   RespN
## 4   886
## 5   942
## 6   803
## 7   889
## 8   917
## 9   748

length(unique(supdem$Country))

## [1] 144

Next, I extract the dimensions of the grouping factors – country, item, year, etc. – and create index vectors. Both of these will be needed for our vectorized Stan code.

# factorise
supdem$Country = as.factor(as.character(supdem$Country))
supdem$ISO3c = as.factor(as.character(supdem$ISO3c))
supdem$Item = as.factor(as.character(supdem$Item))
supdem$ItemCnt = as.factor(as.character(supdem$ItemCnt))
supdem$Project = as.factor(as.character(supdem$Project))

# identify first year to estimate for each country
year0 = 1987 # set year 0 
supdem$Year = supdem$Year - year0
cnt_start_yr = by(supdem$Year, supdem$Country, min, simplify = TRUE)

# extract dimensions
n_items = length(levels(supdem$Item))
n_cntrys = length(levels(supdem$Country))
n_yrs = 2020 - year0 # estimates up to 2020
n_resp = dim(supdem)[1]
n_itm_cnt = length(levels(supdem$ItemCnt))

# extract country, year, item index vectors
cntrys = as.numeric(factor(supdem$Country))
cnt_names = levels(supdem$Country)
cnt_iso = levels(supdem$ISO3c)
items = as.numeric(factor(supdem$Item))
yrs = supdem$Year
itm_cnts = as.numeric(factor(supdem$ItemCnt))
mean_resp.log = logit(mean(supdem$Response))

# create item-country length indicator for items
item_ind_kp = rep(0, length(levels(supdem$ItemCnt)))
for(i in 1:length(levels(supdem$Item))) {
  item_ind_kp[grepl(levels(supdem$Item)[i], levels(supdem$ItemCnt))] = i
}
item_ind_len = sapply(
  lapply(levels(supdem$Item), function(x) grep(x, levels(supdem$ItemCnt))), 
  length)

# length of each estimated mood series
len_theta_ts = rep(0, n_cntrys)
for(i in 1:n_cntrys){
  len_theta_ts[i] = length(cnt_start_yr[i]:n_yrs)
}
theta_n = sum(len_theta_ts)  

# create R-length vector indicating which country each estimate corresponds to
cntrys_r = rep(0, theta_n)
pos = 1
for(i in 1:n_cntrys) {
  cntrys_r[pos:(len_theta_ts[i] + pos - 1)] = (rep(i, len_theta_ts[i]))
  pos = pos + len_theta_ts[i]
}

# create R-length vector indicating which year each estimate corresponds to
year_r = rep(0, theta_n)
pos = 1
for(i in 1:n_cntrys) {
  year_r[pos:(len_theta_ts[i] + pos-1)] = cnt_start_yr[i]:n_yrs
  pos = pos + len_theta_ts[i]
}

# create R-length vector indicating which opinion observation each estimate corresponds to
n_map = data.frame(Obs = 1:n_resp, Cntry = cntrys, Yr = yrs)
r_map = data.frame(Est = 1:theta_n, Cntry = cntrys_r, Yr = year_r)
n_r_merg = merge(n_map, r_map, by = c("Cntry", "Yr"), all.x = TRUE)
n_r_merg = n_r_merg[order(n_r_merg$Obs), ]

# create vector indicating the start positions for each country estimate series
est_pos = rep(1, n_cntrys)
est_pos[1] = 1
for(i in 2:n_cntrys) {
  est_pos[i] = est_pos[i-1] + len_theta_ts[i-1]
}

Model fitting and checking

The model is compiled and fit in R using cmdstanr. It proves helpful to set the adapt_delta parameter to 0.90, somewhat higher than the default of 0.80.

# specify data for stan
dat_1 = list(N = n_resp, K = n_items, J = n_cntrys, P = n_itm_cnt, R = theta_n,
             jj = cntrys, pp = itm_cnts, kk = items, rr = n_r_merg$Est,
             it_len = item_ind_len, est_pos = est_pos, len_theta_ts = len_theta_ts,
             x = supdem$RespN, samp = supdem$Sample, mn_resp_log = mean_resp.log)

# parameters to save
pars_1 = c("Sigma", "Omega", "sigma_delta", "sigma_theta", "phi", "mu_lambda", "lambda",
           "gamm", "delta", "theta", "x_pred", "log_lik")

# compile model
stan_mod = cmdstan_model('dynamic_crossnat_latent_opinion_model.stan', quiet = TRUE)

# Stan fit
stan_fit = stan_mod$sample(
  data = dat_1, chains = 4, init = 1, parallel_chains = 4,
  iter_warmup = 1000, iter_sampling = 1000, refresh = 0,
  adapt_delta = 0.90, max_treedepth = 12, save_warmup = FALSE
)

Once complete, convergence is verified using cmdstanr’s diagnostic_summary() function as well as an examination of parameters with the largest Rhats. There are no divergences, the estimated Bayesian Fraction of Missing Information is greater than 0.2, and Rhats are no greater than 1.01.

# Examine model fit
res = stan_fit
res$diagnostic_summary()

## $num_divergent
## [1] 0 0 0 0
## 
## $num_max_treedepth
## [1] 0 0 0 0
## 
## $ebfmi
## [1] 0.6593041 0.6525008 0.6897211 0.5916663

sum = res$summary(pars_1)
print(sum[order(sum$rhat, decreasing = TRUE), ], n = 10)

## # A tibble: 13,844 × 10
##    variable          mean   median     sd    mad      q5     q95  rhat ess_bulk
##    <chr>            <num>    <num>  <num>  <num>   <num>   <num> <num>    <num>
##  1 mu_lambda      1.07     1.07    0.124  0.120   0.864   1.28    1.01     415.
##  2 Sigma[2,1]    -0.00475 -0.00628 0.0388 0.0368 -0.0662  0.0628  1.01     875.
##  3 Sigma[1,2]    -0.00475 -0.00628 0.0388 0.0368 -0.0662  0.0628  1.01     875.
##  4 Omega[2,1]    -0.0154  -0.0161  0.0969 0.0950 -0.176   0.147   1.01     888.
##  5 Omega[1,2]    -0.0154  -0.0161  0.0969 0.0950 -0.176   0.147   1.01     888.
##  6 gamm[43]       0.933    0.938   0.176  0.171   0.642   1.22    1.01    2090.
##  7 delta[332]     0.0313   0.0289  0.335  0.337  -0.512   0.594   1.01    7274.
##  8 log_lik[845]  -5.30    -5.21    0.245  0.104  -5.78   -5.12    1.01    1332.
##  9 log_lik[3619] -5.24    -5.08    0.572  0.464  -6.36   -4.65    1.00    4683.
## 10 log_lik[3319] -5.99    -5.91    0.223  0.0932 -6.43   -5.83    1.00    2297.
## # ℹ 13,834 more rows
## # ℹ 1 more variable: ess_tail <num>

# traceplot
tp_pars = c("Sigma[1,1]", "Sigma[2,2]", "Omega[1,2]", "sigma_theta", "sigma_delta",
            "mu_lambda", "phi", "lambda[31]", "gamm[31]", "delta[23]", "theta[423]",
            "theta[2092]")
tp = bayesplot::mcmc_trace(res$draws(tp_pars), np = nuts_params(res))
tp

A traceplot confirms that parameters have converged.

Extracting and examining latent opinion estimates

Now that the model has converged, we can extract and examine the estimates of most interest: country-year latent opinion, \(\theta\).

# extract theta
theta_out = apply(res$draws("theta"), 3, as.vector)

# standardize
theta_mean = mean(as.vector(theta_out))
theta_sd = sd(as.vector(theta_out))
theta_std = (theta_out - theta_mean) / theta_sd

# extract PEs and SEs
theta_t = apply(theta_std, 1, function(x) t(x) )
theta_pe = apply(theta_t, 1, mean)
theta_u95 = apply(theta_t, 1, quantile, probs = c(0.975))
theta_l95 = apply(theta_t, 1, quantile, probs = c(0.025))
theta_pe_sd = apply(theta_t, 1, sd)

# collate into dataset
theta_df = data.frame(Country = cnt_names[r_map$Cntry], Year = r_map$Yr + year0, 
                      SupDem = theta_pe, SupDem_u95 = theta_u95, 
                      SupDem_l95 = theta_l95, SupDem_sd = theta_pe_sd)

# plot theta by country
cnt_plot = sort(unique(theta_df$Country))
par(mfrow = c(18, 8), mar = c(0.75, 0.75, 0.2, 0.2), cex = 0.8, tcl = -0.1,
    mgp = c(1.2, 0.3, 0), las = 1)
for (i in 1:n_cntrys) {
  plot(x = (year0 + 1):2020, y = rep(0,length((year0 + 1):2020)), type = "n",
       ylim = c(-3, 4), axes = FALSE)
  abline(h = c(2, 0, -2), lwd = 0.75, lty = 3, col = rgb(0.5, 0.5, 0.5, 0.5))
  abline(v = seq(1980, 2020, by = 10), lwd = 0.75, lty = 3, col = rgb(0.5, 0.5, 0.5, 0.5))
  lines(x = theta_df[theta_df$Country == cnt_plot[i], "Year"],
        y = theta_df[theta_df$Country == cnt_plot[i], "SupDem"],
        lwd = 1.5, col = rgb(0, 0.4, 0, 1))
  polygon(x = c(theta_df[theta_df$Country == cnt_plot[i], "Year"],
              rev(theta_df[theta_df$Country == cnt_plot[i], "Year"])),
          y = c(theta_df[theta_df$Country == cnt_plot[i], "SupDem_u95"],
              rev(theta_df[theta_df$Country == cnt_plot[i], "SupDem_l95"])),
          col = rgb(0, 0.4, 0, 0.4), border = NA)
  text(1988, 3.5, cnt_plot[i], adj = 0, cex = 0.65)
  axis(side = 1, at = seq(1980, 2020, by = 10), labels = c(1980, 1990, 2000, 2010, 2020),
       mgp = c(1, -0.3, 0), cex.axis = 0.5, lwd = 1)
  axis(side = 2, at = c(-2, 0, 2, 4), mgp = c(1, 0.15, 0), cex.axis = 0.5, lwd = 1)
  box(lwd = 0.75)
}

I standardize \(\theta\) to have mean of 0 and standard deviation of 1 before plotting the point estimates and 95% credible intervals by country (some discussion of these results can be found in Claassen (2020b)).

Examining item parameters

Finally I extract and examine the item parameters, \(\lambda\) and \(\gamma\). This allows us to examine whether all survey items are contributing the expected information to the latent variable.

# examine item pars
item_names = levels(supdem$Item)
item_pars = data.frame(item_int = res$summary("lambda")[,2],
                       item_slope = res$summary("gamm")[,2])
rownames(item_pars) = item_names
names(item_pars) = c("lambda", "gamma")
item_pars$project = as.character(supdem$Project[match(item_names, supdem$Item)])

# extract 100 draws from posteriors of ints and slopes
lambda_out = apply(res$draws("lambda"), 3, as.vector)
gamma_out = apply(res$draws("gamm"), 3, as.vector)
theta_samp_100 = theta_std[sample(1:dim(theta_std)[1], 100), ]

# adjust intercepts and slopes to account for standardization of theta
item_pars_std = item_pars
item_pars_std[,1] = item_pars[,1] + (item_pars[,2] * theta_mean)
lambda_out_std = lambda_out + (gamma_out * theta_mean)
item_pars_std[,2] = item_pars[,2] * theta_sd
gamma_out_std = gamma_out * theta_sd

# plot iccs
draw_ind = sample(1:dim(lambda_out)[1], 100)
par(mfrow = c(9, 7), mar = c(1, 1, 0.2, 0.2), tcl = -0.1, las = 1, cex = 0.8)
for(i in 1:dim(item_pars_std)[1]){
  curve(arm::invlogit(item_pars_std[i, 1] + x * item_pars_std[i, 2]), -3, 3, type = "l",
        xlab = "", ylab = "", xaxt = "n", yaxt = "n", yaxs = "i",
        col = rgb(1, 1, 1, 1), ylim = c(0, 1))
  abline(h = 0.5, lty = 3, lwd = 0.75, col = rgb(0.5, 0.5, 0.5, 0.5))
  abline(v = 0, lty = 3, lwd = 0.75, col = rgb(0.5, 0.5, 0.5, 0.5))
  curve(arm::invlogit(item_pars_std[i, 1] + x * item_pars_std[i, 2]), -3, 3,
        type = "l", lwd = 1.5, col = rgb(0, 0.5, 0, 1), add = TRUE)
  for(j in 1:100){
    curve(arm::invlogit(lambda_out_std[draw_ind[j], i] + x * gamma_out_std[draw_ind[j], i]),
          -3, 3, type = "l", lwd = 1, col = rgb(0, 0.5, 0, 0.1), add = TRUE)
  }
  text(-3, 0.15, rownames(item_pars_std)[i], adj = 0, cex = 0.7)
  axis(2, at = c(0, .5, 1), labels = c("0.0", "0.5", "1.0"), mgp = c(1.5, 0.2, 0),
       cex.axis = 0.5)
  axis(1, at = c(-2, 0, 2), mgp = c(1, -0.3, 0), cex.axis = 0.5)
}

I plot these parameters in the form of item characteristic curves, which show the relationship between latent opinion (x-axis) and the population percentage supporting / agreeing with the item in question (y-axis). Steeper curves indicate items with greater discriminative power. Curves that appear shifted upwards indicate items that are “easy”, i.e., they tend to attract supportive responses even when latent opinion is negative.

All of the included items show decent discriminative qualities, likely because they were selected based on an extensive literature that has examined these survey data.

References

Caughey, Devin, and Christopher Warshaw. 2015. “Dynamic Estimation of Latent Opinion Using a Hierarchical Group-Level IRT Model.” Political Analysis 23 (2): 197–211.

Claassen, Christopher. 2019. “Estimating Smooth Country-Year Panels of Public Opinion.” Political Analysis 27 (1): 1–20.

———. 2020a. “Does Public Support Help Democracy Survive?” American Journal of Political Science 64 (1): 118–34.

———. 2020b. “In the Mood for Democracy? Democratic Support as Thermostatic Opinion.” American Political Science Review 114 (1): 36–53.

McGann, Anthony J. 2014. “Estimating the Political Center from Aggregate Data: An Item Response Theory Alternative to the Stimson Dyad Ratios Algorithm.” Political Analysis 22 (1): 115–29.

Estimating dynamic cross-national opinion using existing survey data

Christopher Claassen

2023-11-21