With rigorous economic research and practical policy solutions, we focus on the issues and institutions that are critical to global development. Explore our core themes and topics to learn more about our work.
In timely and incisive analysis, our experts parse the latest development news and devise practical solutions to new and emerging challenges. Our events convene the top thinkers and doers in global development.
In this paper we examine how policymakers and practitioners should interpret the impact evaluation literature when presented with conflicting experimental and non-experimental estimates of the same intervention across varying contexts. We show three things. First, as is well known, non-experimental estimates of a treatment effect comprise a causal treatment effect and a bias term due to endogenous selection into treatment. When non-experimental estimates vary across contexts any claim for external validity of an experimental result must make the assumption that (a) treatment effects are constant across contexts, while (b) selection processes vary across contexts. This assumption is rarely stated or defended in systematic reviews of evidence. Second, as an illustration of these issues, we examine two thoroughly researched literatures in the economics of education—class size effects and gains from private schooling—which provide experimental and non-experimental estimates of causal effects from the same context and across multiple contexts.
We show that the range of “true” causal effects in these literatures implies OLS estimates from the right context are, at present, a better guide to policy than experimental estimates from a different context. Third, we show that in important cases in economics, parameter heterogeneity is driven by economy- or institution-wide contextual factors, rather than personal characteristics, making it difficult to overcome external validity concerns through estimation of heterogeneous treatment effects within a single localized sample. We conclude with recommendations for research and policy, including the need to evaluate programs in context, and avoid simple analogies to clinical medicine in which “systematic reviews” attempt to identify best-practices by putting most (or all) weight on the most “rigorous” evidence with no allowance for context.