Instrumental Variables Analysis of Randomized Experiments with One-Sided Noncompliance

# Load data: Sommer-Zeger example from Imbens and Rubin (1997)
Z <- c(rep(1, 34 + 2385 + 12 + 9663),
       rep(0, 74 + 11514))
W <- c(rep(0, 34 + 2385),
       rep(1, 12 + 9663),
       rep(0, 74 + 11514))
Y <- c(rep(0, 34),
       rep(1, 2385),
       rep(0, 12),
       rep(1, 9663),
       rep(0, 74),
       rep(1, 11514))
1 Introduction
In this document, we demonstrate how to implement Bayesian inference for causal effects in randomized experiments with one-sided noncompliance using Stan. Specifically, we aim to replicate the analysis presented in Imbens and Rubin (1997). Noncompliance refers to situations where some units, assigned to receive a particular treatment level, do not comply with their assignment and instead receive an alternative treatment (Imbens and Rubin 2015). One-sided noncompliance denotes an asymmetric scenario where only units assigned to receive the active treatment can potentially circumvent their assigned treatment and receive the control treatment. In contrast, all units assigned to receive the control treatment comply with their assignment.
Initially, we define causal estimands and additional assumptions that allow us to estimate local average effects for the treatment of interest, specifically, averages for subsets of units. Subsequently, we analyze an example dataset using Stan according to the model-based estimation strategy suggested by Imbens and Rubin (1997). We then present Stan models with and without the exclusion restriction assumption, showcasing a significant advantage of the Bayesian model-based approach that enables us to assess the sensitivity of the treatment effect estimate to the assumption. All source code for this case study is available on GitHub at joonho112/Bayesian-causal-inference/Case_study_2.
2 Data Example
We demonstrate the application of the methods using data from a randomized experiment investigating the impact of vitamin A supplements on infant mortality in Indonesia. The data, previously analyzed by Sommer and Zeger (1991), can be found in the table below:
| Compliance Type | Assignment (Z) | Receipt (W) | Survival (Y) | Number of units |
|---|---|---|---|---|
| never-taker | 1 | 0 | 0 | 34 |
| never-taker | 1 | 0 | 1 | 2,385 |
| complier | 1 | 1 | 0 | 12 |
| complier | 1 | 1 | 1 | 9,663 |
| complier or never-taker | 0 | 0 | 0 | 74 |
| complier or never-taker | 0 | 0 | 1 | 11,514 |
| (not observed) | 0 | 1 | 0 | 0 |
| (not observed) | 0 | 1 | 1 | 0 |
In this experiment, Indonesian villages were randomly assigned to receive or not to receive vitamin A supplements.1 The assignment to the supplements is denoted by $Z_i$, the receipt of the supplements by $W_i$, and the survival status of an infant by $Y_i$.
In principle, eight different combinations of the three observed binary variables, $(Z_i, W_i, Y_i)$, are possible. Because no unit assigned to the control condition received the supplements, however, the two combinations with $Z_i = 0$ and $W_i = 1$ do not occur, leaving the six observed rows with positive counts in the table above.
We then organize the data into a list format suitable for use with Stan.
# Collect data into a list form
stan_data <- list(N = length(Y), Y = Y, Z = Z, W = W)
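As a sanity check on the data coding, the vectors can be rebuilt from the table's cell counts and their margins verified. Here is a quick Python stand-in for the R code above (illustrative only):

```python
# Rebuild Z, W, Y from the table's cell counts and confirm the margins.
Z = [1] * (34 + 2385 + 12 + 9663) + [0] * (74 + 11514)
W = [0] * (34 + 2385) + [1] * (12 + 9663) + [0] * (74 + 11514)
Y = ([0] * 34 + [1] * 2385 + [0] * 12 + [1] * 9663
     + [0] * 74 + [1] * 11514)

N = len(Y)
print(N)        # 23682 units in total
print(sum(Z))   # 12094 assigned to treatment
print(sum(W))   # 9675 received treatment
# One-sided noncompliance: no control-assigned unit received treatment.
print(sum(1 for z, w in zip(Z, W) if z == 0 and w == 1))  # 0
```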
3 Causal Estimands
3.1 Setup
First, let’s define potential outcomes to fit the instrumental variable settings. For unit $i$, let $W_i(z)$ denote the treatment that would be received under assignment $z \in \{0, 1\}$. Based on the joint values of $(W_i(0), W_i(1))$, each unit belongs to one of four compliance types: compliers ($W_i(0) = 0$, $W_i(1) = 1$), never-takers ($W_i(0) = W_i(1) = 0$), always-takers ($W_i(0) = W_i(1) = 1$), and defiers ($W_i(0) = 1$, $W_i(1) = 0$).
We denote the proportions of the four compliance types in the population as $\pi_c$, $\pi_n$, $\pi_a$, and $\pi_d$, respectively.
In the one-sided noncompliance case, all units assigned to the control group comply with the assignment, meaning $W_i(0) = 0$ for every unit. This rules out always-takers and defiers, so the population consists only of compliers and never-takers, with $\pi_c + \pi_n = 1$.
For the primary outcome, we define the potential outcome, $Y_i(z)$, as the survival status of unit $i$'s infant under treatment assignment $z$.
In the one-sided noncompliance setting, where both always-takers ($\pi_a = 0$) and defiers ($\pi_d = 0$) are absent, the exclusion restriction needs to be considered only for never-takers: it assumes that assignment has no effect on their outcomes, $Y_i(0) = Y_i(1)$, which is plausible because their treatment receipt is unchanged by assignment.
3.2 Intention-to-treat effects
Now consider the intention-to-treat (ITT) effect, the average effect of treatment assignment $Z_i$ on the outcome $Y_i$. Because compliance type is determined before assignment, the overall ITT effect decomposes into a weighted average of the ITT effects within the two compliance types: $ITT_Y = \pi_c \, ITT_Y^c + \pi_n \, ITT_Y^n$.
Since the super-population average ITT effect for never-takers ($ITT_Y^n$) is zero under the exclusion restriction, the ITT effect for compliers, known as the complier average causal effect (CACE), can be written as $ITT_Y^c = ITT_Y / \pi_c$.
Note that we are dropping the unit subscript $i$ here, since these estimands are defined at the super-population level.
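The ratio above can be made concrete with a back-of-the-envelope calculation from the table's cell counts. This moment-based estimate (sketched here in Python, purely as a cross-check and not part of the original analysis) serves as a point of comparison for the model-based results later on:

```python
# Moment-based (ratio) estimate of the CACE from the observed cell counts:
# CACE_hat = ITT_hat / pi_c_hat, per the decomposition above.
# Treatment arm: 2419 never-takers (34 died, 2385 survived),
#                9675 compliers   (12 died, 9663 survived).
# Control arm:   11588 units      (74 died, 11514 survived).
surv_treat = (2385 + 9663) / (34 + 2385 + 12 + 9663)  # survival rate, assigned treatment
surv_ctrl = 11514 / (74 + 11514)                      # survival rate, assigned control
itt_hat = surv_treat - surv_ctrl                      # ITT effect on survival
pi_c_hat = (12 + 9663) / (34 + 2385 + 12 + 9663)      # observed share of compliers

cace_hat = itt_hat / pi_c_hat
print(round(cace_hat * 1000, 2))  # 3.23 -- about 3.23 extra survivors per 1,000 compliers
```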
4 Model-Based Analysis
The moment-based estimator relies on the differences in sample averages across distinct groups, consequently disregarding individual-level information about compliance status. In contrast, Imbens and Rubin (1997) devised a model-based, or simulation, approach to estimating causal effects in randomized experiments with noncompliance, which emphasizes each individual’s compliance status. This alternative method to the standard ratio estimator offers a more nuanced perspective on causal effects.
The model-based estimation strategy offers several advantages over the moment-based approach. Firstly, it establishes a systematic framework for evaluating the effects of various restrictions on the joint distribution of observed variables (Imbens and Rubin 2015). In the context of our one-sided noncompliance example, the model-based method enables us to assess the sensitivity of the CACE estimate to the exclusion restriction for never-takers, which will be elaborated upon later in this document. Secondly, the model-based approach yields more accurate estimates if the proposed parametric model is approximately correct, especially in cases of small sample sizes or a low proportion of compliers (Page et al. 2015). Lastly, it provides a flexible means to extend the model to scenarios involving covariates or complex compliance statuses (e.g., Feller et al. (2016)).
4.1 Stan model with exclusion restriction for never-takers
Data block
data {
int<lower=1> N; // Sample size N
array[N] int<lower=0, upper=1> Z; // Treatment assigned Z
array[N] int<lower=0, upper=1> W; // Treatment received W
array[N] int<lower=0, upper=1> Y; // Outcome Y
}
For the model’s data inputs, we encode the three observed binary variables (`Z`, `W`, and `Y`), each of length `N`, along with the sample size `N` itself. Note that we use Stan's `array[N] int` syntax for array declarations, which provides a more explicit and streamlined way to denote multidimensional structures. This modern syntax not only enhances clarity but also aligns with the direction of Stan’s evolution, ensuring compatibility with future versions.
Parameters block
parameters {
// Population probability of being a complier
real<lower=0, upper=1> pi_c;
// Probabilities for the binomial outcome distributions
real<lower=0, upper=1> eta_c0;
real<lower=0, upper=1> eta_c1;
real<lower=0, upper=1> eta_n;
}
Next, we define a comprehensive parameter vector $\theta = (\pi_c, \eta_{c0}, \eta_{c1}, \eta_n)$ for the model. The parameter $\pi_c$ represents the population probability of a unit being a complier; since only two compliance types remain under one-sided noncompliance, the probability of being a never-taker is $1 - \pi_c$.
For binary outcomes, we assume that the outcome distribution for each compliance type follows a binomial distribution. The parameters $\eta_{c0}$ and $\eta_{c1}$ denote the probability of infant survival for compliers assigned to control and treatment, respectively. Under the exclusion restriction, never-takers have a single survival probability $\eta_n$, regardless of assignment.
Model block
model {
// Define local variables for efficiency
real log_pi_c = log(pi_c);
real log1m_pi_c = log1m(pi_c);
// Prior for Complier probability
// implicit prior: pi_c ~ Unif(0, 1)
// Priors for outcome model parameters
eta_c0 ~ beta(2, 2);
eta_c1 ~ beta(2, 2);
eta_n ~ beta(2, 2);
// Likelihood
for(n in 1:N){
// Complier (assigned to treatment)
if (Z[n] == 1 && W[n] == 1){
target += log_pi_c + bernoulli_lpmf(Y[n] | eta_c1);
}
// Never-taker (assigned to treatment)
else if (Z[n] == 1 && W[n] == 0){
target += log1m_pi_c + bernoulli_lpmf(Y[n] | eta_n);
}
// Complier or Never-taker (assigned to control)
else if (Z[n] == 0 && W[n] == 0){
  target += log_mix(
    pi_c,                          // Complier probability
    bernoulli_lpmf(Y[n] | eta_c0), // Complier
    bernoulli_lpmf(Y[n] | eta_n)   // Never-taker
  );
}
}
}
The model block in the Stan program defines prior distributions and the likelihood. The model's initial lines compute `log_pi_c = log(pi_c)` and `log1m_pi_c = log1m(pi_c)`. The Stan function `log1m(x)` calculates the natural logarithm of `(1 - x)`; it is more numerically stable than directly computing `log(1 - x)`, which loses precision when `x` is very small and the intermediate subtraction `1 - x` rounds to 1. By precalculating and storing these logarithm values, the model also avoids recomputing `log(pi_c)` and `log(1 - pi_c)` for every unit within the likelihood calculations.
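The value of the `log1m`-style computation is easy to see outside Stan. In Python, `math.log1p(-x)` plays the role of Stan's `log1m(x)` (an illustrative check, not part of the original analysis):

```python
import math

# Stan's log1m(x) is equivalent to log1p(-x): it evaluates log(1 - x)
# without first rounding the intermediate quantity 1 - x.
x = 1e-17
print(math.log(1 - x))   # 0.0 -- the subtraction 1 - x rounds to exactly 1.0
print(math.log1p(-x))    # -1e-17 -- the tiny value survives
```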
The next portion of the model block specifies prior distributions for the parameters. By not explicitly providing a prior for `pi_c`, we implicitly assign it a uniform prior on the interval [0, 1], as implied by its declared bounds. For the outcome parameters, we specify weakly informative Beta(2, 2) priors, which gently pull the survival probabilities away from the boundary values of 0 and 1.
The crux of the model-based estimation lies in utilizing a full likelihood that incorporates individual-level information about compliance status. To define the observed-data likelihood for the parameters, we first partition the set of $N$ units into three subsets according to their observed assignment and receipt: $\mathcal{S}(1, 1)$, units assigned to treatment that received it (known compliers); $\mathcal{S}(1, 0)$, units assigned to treatment that did not receive it (known never-takers); and $\mathcal{S}(0, 0)$, units assigned to control, an unresolved mixture of compliers and never-takers.

For the sets $\mathcal{S}(1, 1)$ and $\mathcal{S}(1, 0)$, compliance status is known, so each unit contributes the probability of its compliance type multiplied by the corresponding Bernoulli outcome probability.

For the set $\mathcal{S}(0, 0)$, compliance status is not observed, so each unit contributes a two-component mixture over the complier and never-taker outcome distributions.

Given these specifications for the outcome distributions and the probabilities of compliance type for the three subsets, we can express the full likelihood function in terms of the observed data as follows:

$$
\Pr(\mathbf{Y} \mid \mathbf{Z}, \mathbf{W}, \theta) =
\prod_{i \in \mathcal{S}(1,1)} \pi_c \, \eta_{c1}^{Y_i} (1 - \eta_{c1})^{1 - Y_i}
\times
\prod_{i \in \mathcal{S}(1,0)} (1 - \pi_c) \, \eta_{n}^{Y_i} (1 - \eta_{n})^{1 - Y_i}
\times
\prod_{i \in \mathcal{S}(0,0)} \left[ \pi_c \, \eta_{c0}^{Y_i} (1 - \eta_{c0})^{1 - Y_i} + (1 - \pi_c) \, \eta_{n}^{Y_i} (1 - \eta_{n})^{1 - Y_i} \right]
$$
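Because the outcomes are binary, this observed-data likelihood depends on the data only through the table's cell counts. As a cross-check (a Python sketch, not part of the original R/Stan analysis), the log-likelihood can be evaluated directly:

```python
import math

def log_lik(pi_c, eta_c0, eta_c1, eta_n):
    """Observed-data log-likelihood for the Sommer-Zeger cell counts,
    under the exclusion restriction for never-takers."""
    ll = 0.0
    # S(1,1): compliers assigned to treatment (12 deaths, 9663 survivors)
    ll += 9675 * math.log(pi_c) + 9663 * math.log(eta_c1) + 12 * math.log(1 - eta_c1)
    # S(1,0): never-takers assigned to treatment (34 deaths, 2385 survivors)
    ll += 2419 * math.log(1 - pi_c) + 2385 * math.log(eta_n) + 34 * math.log(1 - eta_n)
    # S(0,0): mixture of compliers and never-takers assigned to control
    p_surv = pi_c * eta_c0 + (1 - pi_c) * eta_n
    ll += 11514 * math.log(p_surv) + 74 * math.log(1 - p_surv)
    return ll

# Parameter values near the posterior means reported below (~0.80, 0.995,
# 0.999, 0.986) should fit far better than an arbitrary parameter point:
print(log_lik(0.80, 0.995, 0.999, 0.986) > log_lik(0.5, 0.9, 0.9, 0.9))  # True
```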
Since there is no built-in Stan function to address the specific mixture structure of this likelihood function, we directly construct the likelihood function in the model block. We work with the log-likelihood because Stan’s execution relies on evaluating a log probability function for a given set of parameters. By taking the natural logarithm of the likelihood function defined above, the log of a product of terms within the likelihood function is transformed into a sum of the log of the terms.
Note that the likelihood function is expressed as an increment to the log probability function using the `target +=` statement. The model accumulates the log contributions from each of the three observed-data subsets. For a unit in $\mathcal{S}(1, 1)$, the contribution is the log of the complier probability (`log_pi_c`) plus the log Bernoulli probability mass at `Y[n]` given the complier survival probability under treatment (`bernoulli_lpmf(Y[n] | eta_c1)`). For a unit in $\mathcal{S}(1, 0)$, it is the log of the probability of being a never-taker (`log1m_pi_c`) plus the log Bernoulli probability mass at `Y[n]` given the never-taker survival probability (`bernoulli_lpmf(Y[n] | eta_n)`).
For the set $\mathcal{S}(0, 0)$, a built-in Stan function, `log_mix()`, is employed to define the mixture on the log scale. The `log_mix(p, a, b)` function computes the log-weighted mixture of two log-scale densities `a` and `b` with mixing weight `p`, i.e., `log_sum_exp(log(p) + a, log1m(p) + b)`, which provides a numerically stable evaluation of mixtures (see Chapter 5 of the Stan User's Guide). By adopting this function, we directly obtain the log-likelihood contributions for units in the subset $\mathcal{S}(0, 0)$, whose compliance status is unknown.
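What `log_mix()` computes can be verified outside Stan. The following Python sketch (illustrative only) reimplements it via the log-sum-exp identity and checks it against the direct calculation for one surviving control-group unit:

```python
import math

def log_mix(p, lp_a, lp_b):
    """Python analogue of Stan's log_mix(p, a, b):
    log(p * exp(a) + (1 - p) * exp(b)), computed via log-sum-exp."""
    x = math.log(p) + lp_a
    y = math.log1p(-p) + lp_b
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))

# Mixture contribution for a surviving control-group unit (Y = 1),
# with illustrative values pi_c = 0.8, eta_c0 = 0.995, eta_n = 0.986:
direct = math.log(0.8 * 0.995 + 0.2 * 0.986)
stable = log_mix(0.8, math.log(0.995), math.log(0.986))
print(abs(direct - stable) < 1e-12)  # True: same value, but the log-sum-exp
                                     # form stays accurate for extreme log terms
```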
Generated quantities block
generated quantities {
// Superpopulation complier average causal effect (CACE)
// in per-1000 units
real CACE = (eta_c1 - eta_c0) * 10^3;
}
The estimand of primary interest is the super-population complier average causal effect (CACE), $\eta_{c1} - \eta_{c0}$. We use the generated quantities block to generate the posterior distribution of the CACE, which is defined as a function of the declared parameters. We rescale the CACE by multiplying by $10^3$, so that estimates are interpreted as the difference in the number of surviving infants per 1,000 individuals.
Model estimation
The code blocks specified above are combined in the Stan program cace_with_exclusion.stan. This program is then fit to the Sommer-Zeger vitamin A supplement data:
# Compile and run the stan model
stan_fit_ER <- stan(
  file = "stan/cace_with_exclusion.stan",
  data = stan_data,
  iter = 2000, chains = 4
)
The posterior means and standard deviations of the parameters can be displayed as follows:
print(stan_fit_ER,
probs = c(0.005, 0.05, 0.5, 0.95, 0.995),
digits = 3)
Inference for Stan model: cace_with_exclusion.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 0.5% 5% 50% 95%
pi_c 0.800 0.000 0.004 0.791 0.794 0.800 0.806
eta_c0 0.995 0.000 0.001 0.992 0.994 0.995 0.997
eta_c1 0.999 0.000 0.000 0.997 0.998 0.999 0.999
eta_n 0.986 0.000 0.002 0.979 0.981 0.986 0.989
CACE 3.226 0.020 1.144 0.381 1.396 3.188 5.128
lp__ -6807.267 0.033 1.380 -6812.340 -6809.888 -6806.961 -6805.639
99.5% n_eff Rhat
pi_c 0.809 3641 1.001
eta_c0 0.998 3459 0.999
eta_c1 0.999 3939 1.000
eta_n 0.991 3130 1.000
CACE 6.384 3420 0.999
lp__ -6805.408 1739 1.001
Samples were drawn using NUTS(diag_e) at Sun Sep 17 10:38:58 2023.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Before interpreting the results, it is necessary to check that the chains have converged for each parameter. The Rhat statistics in the table above are all close to 1, indicating that the four chains have mixed well and the posterior summaries can be trusted.
# Extract posterior draws of CACE
df_CACE <- rstan::extract(stan_fit_ER) %>%
  pluck("CACE") %>%
  {. ->> vec_CACE} %>%
  as_tibble() %>%
  set_names("CACE")
# Plot the histogram of CACE with exclusion restriction
ggplot(data = df_CACE, aes(x = CACE)) +
  geom_histogram(
    bins = 40, color = "black", fill = "gray95"
  ) +
  geom_vline(
    xintercept = quantile(vec_CACE, probs = c(0.05, 0.50, 0.95)),
    color = "red", linetype = "longdash"
  ) +
  geom_text(
    x = 3.8, y = 25,
    label = paste0("Median = ", round(median(df_CACE$CACE), 2)),
    color = "blue", size = 5
  ) +
  scale_x_continuous(
    name = "CACE (in per 1,000 units)",
    breaks = seq(from = -4, to = 10, by = 2)
  ) +
  labs(
    title = "Histogram of CACE with exclusion restriction",
    subtitle = "with median and 90% credible interval"
  ) +
  theme_bw() +
  theme(panel.grid = element_blank())
This histogram replicates Figure 3 from Imbens and Rubin (1997). Under the exclusion restriction for never-takers, the posterior mean of the CACE is 3.23. The 90% credible interval spans from 1.40 to 5.13 per 1,000 individuals, while the 99% credible interval ranges from 0.38 to 6.38. These findings suggest that vitamin A treatment is likely to confer a positive effect on infant survival among compliers in the studied population.
In practical terms, a CACE of 3.23 means that for every 1,000 complier infants, about 3.23 additional infants are expected to survive due to the vitamin A treatment, compared to their counterparts who did not receive the intervention. To gauge the practical significance of this effect, one must also consider other relevant data, the broader landscape of infant health, and any potential drawbacks or costs of the treatment. For instance, in regions with high infant mortality rates, even marginal improvements in survival can carry substantial importance, and the case for vitamin A supplementation is further strengthened if it is inexpensive and has minimal side effects.
4.2 Stan model without exclusion restriction for never-takers
In this section, we assess the sensitivity of the CACE estimate to the exclusion restriction for never-takers. Without the exclusion restriction, the potential outcomes for never-takers are allowed to depend on the treatment assignment $Z_i$, so the parameters block now declares two survival probabilities for never-takers, $\eta_{n0}$ and $\eta_{n1}$, in place of the single parameter $\eta_n$:
parameters {
// Population probability of being a complier
real<lower=0, upper=1> pi_c;
// Probabilities for the binomial outcome distributions
real<lower=0, upper=1> eta_c0;
real<lower=0, upper=1> eta_c1;
real<lower=0, upper=1> eta_n0;
real<lower=0, upper=1> eta_n1;
}
The model block is nearly identical to the one specified in the previous section, sharing the same implicit uniform prior for `pi_c` and the same conjugate Beta(2, 2) priors for the binomial outcome distributions. The only difference is the parameterization of the outcome distributions for never-takers. Instead of a common parameter `eta_n`, we now use `eta_n1` for never-takers assigned to treatment and `eta_n0` for never-takers assigned to control:
model {
// Define local variables for efficiency
real log_pi_c = log(pi_c);
real log1m_pi_c = log1m(pi_c);
// Prior for Complier probability
// implicit prior: pi_c ~ Unif(0, 1)
// Priors for outcome model parameters
eta_c0 ~ beta(2, 2);
eta_c1 ~ beta(2, 2);
eta_n0 ~ beta(2, 2);
eta_n1 ~ beta(2, 2);
// Likelihood
for(n in 1:N){
// Complier (assigned to treatment)
if (Z[n] == 1 && W[n] == 1){
target += log_pi_c + bernoulli_lpmf(Y[n] | eta_c1);
}
// Never-taker (assigned to treatment)
else if (Z[n] == 1 && W[n] == 0){
target += log1m_pi_c + bernoulli_lpmf(Y[n] | eta_n1);
}
// Complier or Never-taker (assigned to control)
else if (Z[n] == 0 && W[n] == 0){
  target += log_mix(
    pi_c,                           // Complier probability
    bernoulli_lpmf(Y[n] | eta_c0),  // Complier
    bernoulli_lpmf(Y[n] | eta_n0)   // Never-taker
  );
}
}
}
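Before fitting, it is worth seeing why dropping the exclusion restriction makes inference harder. In the control subset, the likelihood involves only the mixture `pi_c * eta_c0 + (1 - pi_c) * eta_n0`, so distinct `(eta_c0, eta_n0)` pairs can fit the control data identically; the two parameters are separated only through `pi_c` and the priors. A small Python illustration with made-up parameter values:

```python
# Two different (eta_c0, eta_n0) pairs that imply the *same* control-group
# survival probability, and hence the same S(0,0) likelihood (pi_c = 0.8):
pi_c = 0.8

pair_a = (0.996, 0.983)  # illustrative values only
pair_b = (0.994, 0.991)  # chosen so the mixture matches pair_a

mix_a = pi_c * pair_a[0] + (1 - pi_c) * pair_a[1]
mix_b = pi_c * pair_b[0] + (1 - pi_c) * pair_b[1]
print(abs(mix_a - mix_b) < 1e-12)  # True: the control data cannot tell them apart
```

This weak identification is exactly what produces the wide posterior for the never-taker contrast reported below.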
In addition to the main causal estimand, the super-population complier average causal effect (CACE), we now compute the analogous average effect of assignment for never-takers in the generated quantities block. We will refer to this as the NACE (never-taker average causal effect):
generated quantities {
// Super-population average causal effects
real CACE = (eta_c1 - eta_c0) * 10^3;
real NACE = (eta_n1 - eta_n0) * 10^3;
}
The Stan program cace_without_exclusion.stan
is subsequently fit to the example dataset:
# Compile and run the stan model
stan_fit_noER <- stan(
  file = "stan/cace_without_exclusion.stan",
  data = stan_data,
  iter = 2000, chains = 4
)
The CACE and NACE estimates, without assuming the exclusion restriction for never-takers, are presented in the table below. The Rhat values again indicate that the chains have converged.
print(stan_fit_noER,
probs = c(0.005, 0.05, 0.5, 0.95, 0.995),
digits = 3)
Inference for Stan model: anon_model.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 0.5% 5% 50% 95%
pi_c 0.800 0.000 0.004 0.791 0.794 0.800 0.806
eta_c0 0.996 0.000 0.002 0.991 0.993 0.996 0.999
eta_c1 0.999 0.000 0.000 0.997 0.998 0.999 0.999
eta_n0 0.983 0.000 0.008 0.964 0.970 0.983 0.996
eta_n1 0.985 0.000 0.002 0.978 0.981 0.985 0.989
CACE 2.714 0.040 1.993 -1.429 -0.361 2.668 6.079
NACE 2.162 0.167 8.364 -15.710 -10.855 2.017 16.022
lp__ -6816.380 0.046 1.705 -6822.810 -6819.627 -6816.005 -6814.370
99.5% n_eff Rhat
pi_c 0.809 3840 1.000
eta_c0 1.000 2399 1.000
eta_c1 0.999 4039 0.999
eta_n0 0.999 2422 1.000
eta_n1 0.991 4045 0.999
CACE 7.659 2446 1.000
NACE 23.241 2522 0.999
lp__ -6814.004 1404 1.003
Samples were drawn using NUTS(diag_e) at Fri Sep 22 21:20:47 2023.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
# Extract posterior draws of CACE
df_CACE <- rstan::extract(stan_fit_noER) %>%
  pluck("CACE") %>%
  {. ->> vec_CACE} %>%
  as_tibble() %>%
  set_names("CACE")
# Plot the histogram of CACE without exclusion restriction
ggplot(data = df_CACE, aes(x = CACE)) +
  geom_histogram(
    bins = 40, color = "black", fill = "gray95"
  ) +
  geom_vline(
    xintercept = quantile(vec_CACE, probs = c(0.05, 0.50, 0.95)),
    color = "red", linetype = "longdash"
  ) +
  geom_text(
    x = 3.8, y = 25,
    label = paste0("Median = ", round(median(df_CACE$CACE), 2)),
    color = "blue", size = 5
  ) +
  scale_x_continuous(
    name = "CACE (in per 1,000 units)",
    breaks = seq(from = -4, to = 10, by = 2)
  ) +
  labs(
    title = "Histogram of CACE without exclusion restriction",
    subtitle = "with median and 90% credible interval"
  ) +
  theme_bw() +
  theme(panel.grid = element_blank())
The histogram above replicates Figure 1 from Imbens and Rubin (1997). The 90% credible interval of the posterior distribution indicates that the true CACE, without exclusion restriction, likely falls within the range of -0.36 to 6.08 per 1,000 individuals.
# Extract posterior draws of NACE
df_NACE <- rstan::extract(stan_fit_noER) %>%
  pluck("NACE") %>%
  {. ->> vec_NACE} %>%
  as_tibble() %>%
  set_names("NACE")
# Plot the histogram of NACE without exclusion restriction
ggplot(data = df_NACE, aes(x = NACE)) +
  geom_histogram(
    bins = 40, color = "black", fill = "gray95"
  ) +
  geom_vline(
    xintercept = quantile(vec_NACE, probs = c(0.05, 0.50, 0.95)),
    color = "red", linetype = "longdash"
  ) +
  geom_text(
    x = 3.8, y = 25,
    label = paste0("Median = ", round(median(df_NACE$NACE), 2)),
    color = "blue", size = 5
  ) +
  scale_x_continuous(
    name = "NACE (in per 1,000 units)",
    breaks = seq(from = -30, to = 30, by = 2)
  ) +
  labs(
    title = "Histogram of NACE without exclusion restriction",
    subtitle = "with median and 90% credible interval"
  ) +
  theme_bw() +
  theme(panel.grid = element_blank())
We also replicate Figure 2 from Imbens and Rubin (1997), which represents the posterior distribution for the NACE. Under the exclusion restriction, the NACE is constrained to be 0 because $\eta_{n1} = \eta_{n0}$ by assumption. Without that restriction, the posterior for the NACE is centered near 2 per 1,000 but is widely dispersed (90% credible interval roughly from -10.9 to 16.0), reflecting how little direct information the data contain about never-takers' outcomes under treatment.
# Plot joint posterior distribution of CACE and NACE
ggplot(data = bind_cols(df_CACE, df_NACE), aes(x = CACE, y = NACE)) +
geom_point(size = 2, shape = 1) +
geom_vline(xintercept = 0, color = "blue") +
geom_hline(yintercept = 0, color = "blue") +
labs(title = "Joint posterior distribution of CACE and NACE") +
theme_bw() +
theme(panel.grid = element_blank())
Next, we plot the joint posterior distribution of the CACE and NACE without the exclusion restriction, replicating Figure 4 from Imbens and Rubin (1997). The plot shows that to believe the CACE is negative, one must also believe the NACE is strongly positive. Since this combination of hypotheses is not plausible, we can conclude with some confidence that receiving treatment has a positive effect on the outcome, even without imposing the exclusion restriction for never-takers (Imbens and Rubin 1997).
5 Conclusion
This document illustrates the application of Bayesian inference for estimating causal effects in randomized experiments with one-sided noncompliance using Stan. The methodology depends on two primary assumptions: unconfoundedness of treatment assignment, and an exclusion restriction that negates the effect of assignment on outcomes for noncompliers. The first assumption is intrinsic to randomized experiments, while the second necessitates more subject-matter knowledge and can be reinforced through design measures like double-blinding.
Our demonstration highlights the Bayesian model-based approach’s ability to assess the sensitivity of the treatment effect estimate to the exclusion restriction assumption, by performing the analysis both with and without this assumption. Without the exclusion restriction, inference might be imprecise, even in large samples. However, with the exclusion restriction, the complier average causal effect (CACE) can be more accurately estimated using Bayesian model-based approaches compared to traditional econometric instrumental variables approaches (Imbens and Rubin 1997).
References
Footnotes
This setting involves treatment being randomized at the group (village) level, while noncompliance occurs at the individual level. Imbens and Rubin (1997) mention that the authors did not have access to village indicators, which prevents them from accounting for the clustering resulting from the village-level randomization.↩︎