Experimental |
|
|
|
|
Randomized Controlled Trial (RCT) [Duflo et al., 2007]
|
RCTs are generally not feasible for network water infrastructure, as
such interventions are clustered, directional, and designed to serve
populations at scale or to address known (selected) system deficiencies.
Some complementary interventions (e.g., information campaigns) can be
evaluated using this approach. Smaller-scale rural infrastructure (e.g.,
condominial sewerage, village-scale piped water) can be evaluated with
cluster RCTs or stepped-wedge RCTs.
|
Confounding due to unbalanced randomization
Spillovers (violation of the stable unit treatment value
assumption, or SUTVA), whereby some units benefit as a result of other
units’ uptake.
Vulnerable to selective attrition
|
Typically artefactual, with limited evaluation questions
Treatment effect can be representative
“Gold standard” for causal researchers
Results are not conditioned by assumptions
Statistical power is a design feature, but usually sufficient
for a few pre-identified outcomes
|
Cost: High, especially when powered for multiple outcomes or
interventions
Contamination risk: Moderate, as pressure to help “untreated”
units increases over time
Coordination: Mainly pertains to maintaining integrity of
randomization
Interpretation: Intuitive and highly transparent
Pre-intervention data needs: Low to none
Flexibility to adapt: Very low
|
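To make the RCT comparison concrete, here is a minimal illustrative sketch (not from the source; all data, variable names, and parameters are hypothetical) of the difference-in-means estimate that randomization licenses, using simulated data:

```python
import numpy as np

def rct_difference_in_means(outcomes, treated):
    """ATE estimate from an RCT: mean outcome in the treated arm
    minus the mean in the control arm."""
    outcomes = np.asarray(outcomes, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    return outcomes[treated].mean() - outcomes[~treated].mean()

# Hypothetical simulation: randomization balances confounders in expectation.
rng = np.random.default_rng(0)
n = 10_000
treated = rng.random(n) < 0.5              # coin-flip assignment
baseline = rng.normal(50.0, 10.0, n)       # e.g., baseline water use
true_effect = 5.0
outcomes = baseline + true_effect * treated + rng.normal(0.0, 2.0, n)

est = rct_difference_in_means(outcomes, treated)
```

Because assignment is randomized, baseline differences cancel in expectation, which is why the raw comparison needs no controls or modeling assumptions.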
Experimental encouragement design [Katz et al., 2001]
|
Subsidies or other assistance to customers can generate exogenous
variation in the take-up of infrastructure connections, for use as an
instrumental variable for isolating impacts. The resulting local average
treatment effect is specific to those who respond to the encouragement
[Heckman et al., 2006].
|
Same as above
|
Same as above, except that the treatment effect only applies to the
population that responds to the encouragement
|
|
Quasi-experimental |
|
|
|
|
Natural experiment [Angrist et al., 2002]
|
Some infrastructure placements are determined by geographic or other
factors that are “as good as random” in determining exposure to
improvements, such that they provide researchers with “natural
experiments” [Cerdá et al., 2012] that give rise to comparable
treatment and control groups. Another version of this is an interrupted
time series analysis, where a time-dependent event (e.g., rehabilitation
of one part of a water network) gives rise to a sharp change that
affects some households but not others.
|
Confounding: Geographic or other factors that determine exposure
may also affect outcomes
Spillovers (i.e., violation of SUTVA) outside of treatment area
|
Evidence arises directly from the real world
Treatment effect is representative but contingent on natural
experiment conditions
Generally accepted by researchers
Results are not conditioned by assumptions
Statistical power: Difficult to anticipate ex ante
|
Cost: Low to moderate, depending on data collection needs
Contamination risk: Low
Coordination: Moderate; mainly in combining with other methods
(DiD) to strengthen validity
Interpretation: Intuitive but not always transparent
Pre-intervention data needs: Low to none
Flexibility to adapt: None
Other: Natural experiment can be hard to anticipate
|
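The interrupted time series variant mentioned above can be sketched as follows (a hypothetical simulation, not from the source; a real analysis would also model underlying trends and seasonality rather than comparing simple means):

```python
import numpy as np

def its_level_shift(series, event_index):
    """Interrupted time series: mean outcome after a sharp, time-dependent
    event minus the mean before it (assumes no underlying trend)."""
    series = np.asarray(series, dtype=float)
    return series[event_index:].mean() - series[:event_index].mean()

# Hypothetical simulation: rehabilitation of part of a network at week 52
# reduces weekly supply interruptions for the affected households.
rng = np.random.default_rng(7)
pre = rng.poisson(5.0, 52)     # interruptions per week before the event
post = rng.poisson(2.0, 52)    # after the event
series = np.concatenate([pre, post])

shift = its_level_shift(series, event_index=52)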
Difference-in-differences (DiD) [Card and Krueger,
2000]
|
In this approach, impacts are estimated by subtracting out the trend in
an unexposed sample, which represents the counterfactual, from that in
an exposed sample. Such samples are created using variation in spatial
targeting or other eligibility criteria, which are common for network
water infrastructure extension or rehabilitation. The validity of the
comparison relies on pre-treatment trends being similar in the groups,
and can be enhanced using matching or econometric models that control
for differences in baseline covariates.
|
Confounding by time-varying unobservables
Spillovers (i.e., violation of SUTVA)
Vulnerable to selective attrition
|
Evidence arises directly from the real world
Treatment effect is usually representative (unless combined
with other methods)
Generally accepted by researchers, subject to showing parallel trends
Results are conditioned on the parallel trends assumption
Statistical power is a design feature
|
Cost: Moderate to high, depending on data collection needs
Contamination risk: Moderate to high
Coordination: Moderate; mainly in combining with other methods
(matching) to strengthen validity
Interpretation: Intuitive and transparent
Pre-intervention data needs: Moderate to high (parallel trends)
Flexibility to adapt: Moderate
|
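The "subtract out the control-group trend" logic of DiD can be sketched in a few lines (illustrative only; all data and parameters are hypothetical, and the shared trend built into the simulation is exactly the parallel-trends assumption the method relies on):

```python
import numpy as np

def did_estimate(y_treat_pre, y_treat_post, y_ctrl_pre, y_ctrl_post):
    """Difference-in-differences: the change in the treated group minus
    the change in the control group (the counterfactual trend)."""
    return (np.mean(y_treat_post) - np.mean(y_treat_pre)) - \
           (np.mean(y_ctrl_post) - np.mean(y_ctrl_pre))

# Hypothetical simulation: a shared time trend of +3 and a true effect of +2.
rng = np.random.default_rng(1)
n = 5000
trend, effect = 3.0, 2.0
y_ctrl_pre   = rng.normal(10.0, 1.0, n)
y_ctrl_post  = rng.normal(10.0 + trend, 1.0, n)
y_treat_pre  = rng.normal(12.0, 1.0, n)            # a level difference is fine
y_treat_post = rng.normal(12.0 + trend + effect, 1.0, n)

est = did_estimate(y_treat_pre, y_treat_post, y_ctrl_pre, y_ctrl_post)
```

Note that the pre-existing level difference between the groups does not bias the estimate; only a divergence in trends would.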
Matching or synthetic control [Abadie and Gardeazabal, 2003;
Rosenbaum and Rubin, 1985]
|
These methods are best combined with DiD analysis, but can also be used
on their own to improve comparability when targeting is correlated with
baseline characteristics. Various matching approaches enhance comparability by
sampling untreated observations that can approximate the treatment
counterfactual. For example, propensity score matching (PSM) finds
treated and untreated observations that have a similar probability of
being treated, from a regression of participation on observables.
Synthetic control uses a time series of pre-intervention observations to
“train” an algorithm that assigns weights to a pool of untreated
observations so that their weighted combination reproduces the
counterfactual trend of one or more treated units.
|
Confounding by unobservables (violation of the conditional
independence assumption, or CIA), worse when match quality is low
Spillovers (i.e., violation of SUTVA)
|
Evidence arises directly from the real world
Treatment effect only applies to units with suitable comparisons
(common support region)
Researchers are often skeptical that the CIA has been met
Results are conditioned by assumptions of the matching
algorithm
Statistical power is a design feature
|
Cost: Moderate to high, depending on data collection needs
Contamination risk: High
Coordination: Moderate; mainly in combining with other methods
(DiD) to strengthen validity
Interpretation: Intuitive, but matching may lack transparency
Pre-intervention data needs: Moderate (matching)
Flexibility to adapt: Moderate
|
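As a simplified stand-in for propensity score matching (not the source's method; a hypothetical simulation with nearest-neighbour matching on a single observed covariate, which coincides with matching on the propensity score when there is one covariate):

```python
import numpy as np

def nn_match_att(y, treated, x):
    """Nearest-neighbour matching on a single covariate x: for each treated
    unit, find the untreated unit with the closest x and average the
    outcome differences (an ATT estimate)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    treated = np.asarray(treated, bool)
    yc, xc = y[~treated], x[~treated]
    diffs = []
    for yt, xt in zip(y[treated], x[treated]):
        j = np.argmin(np.abs(xc - xt))     # closest control on x
        diffs.append(yt - yc[j])
    return float(np.mean(diffs))

# Hypothetical simulation: take-up depends on x (selection on observables),
# and x also raises the outcome, so the naive comparison is confounded.
rng = np.random.default_rng(2)
n = 4000
x = rng.normal(0.0, 1.0, n)
treated = rng.random(n) < 1 / (1 + np.exp(-x))   # higher x, more take-up
effect = 1.5
y = 2.0 * x + effect * treated + rng.normal(0.0, 0.5, n)

naive = y[treated].mean() - y[~treated].mean()   # biased upward by x
matched = nn_match_att(y, treated, x)            # much closer to the truth
```

This works only because selection runs through the observed x; confounding by unobservables (the CIA violation noted above) would survive matching.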
Instrumental variables (IV) [Angrist and Krueger, 2001]
|
An instrumental variable is a factor that predicts exposure to or
participation in an intervention, but does not affect outcomes through
channels other than its effect on participation. This
creates exogenous variation in the intervention that can be leveraged to
determine its impacts. The impact measure is a local average treatment
effect that measures the effect of the intervention on those
(“compliers”) whose participation is affected by the instrument.
Program placement rules or constraints may give rise to valid
instruments.
|
Confounding: For many interventions and outcomes, there are few
plausibly “exogenous” assignments of this type, at least in a
statistical sense
Spillovers (i.e., violation of SUTVA)
|
Evidence arises directly from the real world
Treatment effect (LATE) is not representative, and may not apply to the
most policy-relevant population
Researchers are often skeptical about exclusion restriction
Results are conditioned by exogeneity assumptions
Statistical power is often reduced by 2-stage estimation
|
Cost: Low to moderate, depending on data collection needs
Contamination risk: Not applicable
Coordination: Low
Interpretation: Unintuitive, lacks transparency
Pre-intervention data needs: Low
Flexibility to adapt: High
Other: Suitable IV may not exist
|
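The LATE logic, in its simplest (Wald) form with a binary instrument, can be sketched as follows (a hypothetical encouragement-design simulation, not from the source; it also covers the encouragement design described earlier in the table):

```python
import numpy as np

def iv_wald_estimate(y, d, z):
    """Wald/2SLS estimator with a binary instrument z: the effect of z on
    the outcome y, scaled by the effect of z on take-up d. This is a local
    average treatment effect for compliers."""
    y, d, z = np.asarray(y, float), np.asarray(d, float), np.asarray(z, bool)
    itt = y[z].mean() - y[~z].mean()           # intention-to-treat effect
    first_stage = d[z].mean() - d[~z].mean()   # effect of z on take-up
    return itt / first_stage

# Hypothetical simulation: a randomized subsidy offer z raises take-up d,
# while an unobserved confounder u drives both take-up and the outcome.
rng = np.random.default_rng(3)
n = 20_000
z = rng.random(n) < 0.5
u = rng.normal(0.0, 1.0, n)                    # unobserved confounder
d = (rng.random(n) < 0.2 + 0.4 * z + 0.2 * (u > 0)).astype(float)
effect = 2.0
y = effect * d + 1.5 * u + rng.normal(0.0, 1.0, n)

ols = y[d == 1].mean() - y[d == 0].mean()      # confounded by u
late = iv_wald_estimate(y, d, z)               # recovers the complier effect
```

The naive treated-vs-untreated comparison is biased by u, while the instrument (being randomized, and excluded from the outcome equation) is not.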
Regression discontinuity (RD) [Imbens and Lemieux, 2008;
Thistlethwaite and Campbell, 1960]
|
RD exploits discontinuities in eligibility for an intervention with
respect to an assignment variable, for example population thresholds or
a poverty-line threshold for subsidy eligibility.
|
Confounding: Eligibility rule violations or manipulation, or
“fuzzy” discontinuities that are difficult to characterize well
Spillovers (i.e., violation of SUTVA)
Vulnerable to selective attrition
|
Evidence arises directly from the real world
Treatment effect is limited to units very near the discontinuity
Generally accepted by researchers
Results are conditioned on proximity to eligibility cutoff
Statistical power may be limited
|
Cost: Low to moderate, depending on data collection needs
Contamination risk: Moderate, depending on rigor with which
eligibility is assessed
Coordination: Low
Interpretation: Intuitive, but transparency may be lacking due
to definition of the RD bandwidth
Pre-intervention data needs: Low
Flexibility to adapt: Low
|
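A minimal sharp-RD sketch (illustrative only; all data and parameters are hypothetical) compares mean outcomes just above and just below the cutoff within a bandwidth:

```python
import numpy as np

def rd_estimate(y, running, cutoff, bandwidth):
    """Sharp RD: difference in mean outcomes just above vs just below the
    eligibility cutoff, within a chosen bandwidth of the running variable."""
    y, running = np.asarray(y, float), np.asarray(running, float)
    near = np.abs(running - cutoff) < bandwidth
    above = running >= cutoff
    return y[near & above].mean() - y[near & ~above].mean()

# Hypothetical simulation: a poverty-score cutoff assigns a subsidy, and
# the outcome also trends smoothly in the score.
rng = np.random.default_rng(4)
n = 50_000
score = rng.uniform(0.0, 100.0, n)
effect = 4.0
y = 0.1 * score + effect * (score >= 60) + rng.normal(0.0, 1.0, n)

est = rd_estimate(y, score, cutoff=60.0, bandwidth=2.0)
```

The bandwidth choice trades bias against noise: the smooth trend in the score leaves a small bias in this difference-in-means version, which local linear fits on each side of the cutoff would remove. This is the transparency issue about bandwidth definition noted above.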
Other |
|
|
|
|
Ex post regression
|
Statistical comparison of treated and untreated units, with statistical
control for observed differences between the groups. Also commonly
called “observational” comparisons.
|
Selection: Units that participate are systematically different
than those that do not
Confounding by unobservables
Spillovers (i.e., violation of SUTVA)
|
Evidence arises directly from the real world
Treatment effect is usually representative
Causal researchers are typically highly skeptical of results
Results are conditioned on controls
Statistical power: Difficult to anticipate ex ante
|
Cost: Low to moderate, depending on data collection needs
Contamination risk: Not applicable
Coordination: Low
Interpretation: Intuitive, but transparency may be lacking
(contingent on choice of controls)
Pre-intervention data needs: None
Flexibility to adapt: High
|
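An ex post regression reduces to OLS with a treatment dummy and controls. A hypothetical sketch (all data and names illustrative) shows both why the adjustment helps and why skepticism remains, since it only removes bias from *observed* differences:

```python
import numpy as np

def ols_with_controls(y, treated, controls):
    """Ex post regression: regress the outcome on a treatment dummy plus
    observed controls; the treatment coefficient is the adjusted comparison."""
    X = np.column_stack([np.ones(len(y)), treated, controls])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]   # coefficient on the treatment dummy

# Hypothetical simulation: take-up depends on an *observed* covariate x,
# so controlling for x removes the bias; an unobserved driver would not
# be removed, which is the confounding threat noted above.
rng = np.random.default_rng(5)
n = 8000
x = rng.normal(0.0, 1.0, n)
treated = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(float)
effect = 1.0
y = effect * treated + 2.0 * x + rng.normal(0.0, 1.0, n)

raw = y[treated == 1].mean() - y[treated == 0].mean()   # confounded
adjusted = ols_with_controls(y, treated, x.reshape(-1, 1))
```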
Counterfactual modeling [Balke and Pearl, 2013]
|
Complex water resources systems evolve stochastically according to both
human and environmental influences. This approach leverages systems
understanding from socio-hydrological or hydro-economic models to
conduct “with” and “without” simulations of interventions, for
construction of model-based comparisons [Srinivasan,
2015].
|
Confounding by behavioral or other system-level factors not
accounted for
|
Evidence is artefactual; model may diverge from real world
observations
Treatment effect is usually representative, but may not align with
policy-maker priorities and needs
Not widely used by causal social science researchers, who are wary of
over-calibration
Results are conditioned on model assumptions
Statistical power: Not applicable
|
Cost: Low
Contamination risk: Not applicable
Coordination: Low
Interpretation: Not intuitive and not always transparent
(requires interdisciplinary expertise)
Pre-intervention data needs: Moderate to high, depending on
calibration needs
Flexibility to adapt: High
Other: Required model effort is substantial
|
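The "with" and "without" simulation logic can be sketched with a toy storage model (entirely hypothetical; a real socio-hydrological or hydro-economic model would be far richer). The key feature is that both runs share the same stochastic inputs, so the difference isolates the modeled intervention:

```python
import numpy as np

def simulate_storage(inflows, demand, capacity, leak_rate):
    """Toy water-system model: track reservoir storage under stochastic
    inflows, constant demand, and distribution losses (leakage); return
    cumulative unmet demand."""
    storage, shortfall = capacity / 2, 0.0
    for q in inflows:
        withdrawal = demand / (1 - leak_rate)   # pump extra to cover leaks
        storage = min(capacity, storage + q) - withdrawal
        if storage < 0:
            shortfall += -storage               # record unmet demand
            storage = 0.0
    return shortfall

# "With" vs "without" runs of a hypothetical leak-repair intervention,
# holding the inflow sequence fixed (the model-based counterfactual).
rng = np.random.default_rng(6)
inflows = rng.gamma(shape=2.0, scale=5.0, size=365)   # daily inflows
without  = simulate_storage(inflows, demand=9.0, capacity=200.0, leak_rate=0.3)
with_fix = simulate_storage(inflows, demand=9.0, capacity=200.0, leak_rate=0.1)
impact = without - with_fix    # modeled reduction in unmet demand
```

As the table notes, the result is conditioned entirely on the model's structure and calibration; here the "impact" exists only inside the simulation.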