In this paper we examine how policymakers and practitioners should interpret the impact evaluation literature when presented with conflicting experimental and non-experimental estimates of the same intervention across varying contexts. We show three things. First, as is well known, non-experimental estimates of a treatment effect comprise a causal treatment effect and a bias term due to endogenous selection into treatment. When non-experimental estimates vary across contexts, any claim of external validity for an experimental result must assume that (a) treatment effects are constant across contexts, while (b) selection processes vary across contexts. This assumption is rarely stated or defended in systematic reviews of evidence. Second, as an illustration of these issues, we examine two thoroughly researched literatures in the economics of education—class size effects and gains from private schooling—which provide experimental and non-experimental estimates of causal effects from the same context and across multiple contexts.
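The decomposition behind the first point can be written in standard potential-outcomes notation (the notation here is a generic sketch, not quoted from the paper):

```latex
\[
\underbrace{E[Y \mid D=1] - E[Y \mid D=0]}_{\text{non-experimental (OLS) estimate}}
\;=\;
\underbrace{E[Y_1 - Y_0 \mid D=1]}_{\text{causal effect on the treated, } \tau_c}
\;+\;
\underbrace{E[Y_0 \mid D=1] - E[Y_0 \mid D=0]}_{\text{selection bias, } B_c}
\]
\[
\hat{\beta}^{\mathrm{OLS}}_c = \tau_c + B_c,
\qquad
\hat{\beta}^{\mathrm{exp}}_c \approx \tau_c .
\]
```

On this notation, using an experimental estimate $\hat{\beta}^{\mathrm{exp}}_{c'}$ from context $c'$ to guide policy in context $c$ is justified only if $\tau_c = \tau_{c'}$, i.e. only if all cross-context variation in the non-experimental estimates $\hat{\beta}^{\mathrm{OLS}}_c$ is attributed to the selection terms $B_c$ alone—precisely assumptions (a) and (b) above.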
We show that the range of “true” causal effects in these literatures implies that OLS estimates from the right context are, at present, a better guide to policy than experimental estimates from a different context. Third, we show that in important cases in economics, parameter heterogeneity is driven by economy- or institution-wide contextual factors, rather than personal characteristics, making it difficult to overcome external validity concerns through estimation of heterogeneous treatment effects within a single localized sample. We conclude with recommendations for research and policy, including the need to evaluate programs in context, and to avoid simple analogies to clinical medicine in which “systematic reviews” attempt to identify best practices by putting most (or all) weight on the most “rigorous” evidence with no allowance for context.
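The second point can be illustrated with a small simulation. All parameter values below are hypothetical and chosen only for illustration, not drawn from the paper: two contexts have different true treatment effects, and the non-experimental context has endogenous selection into treatment, so its naive difference in means is biased—yet still closer to the local truth than a clean experimental estimate imported from the other context.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical true causal effects in two contexts (illustrative values only)
tau_A, tau_B = 2.0, 5.0

# Context A: randomized assignment -> experimental contrast is unbiased for tau_A
ability_A = rng.normal(0.0, 1.0, n)
D_A = rng.integers(0, 2, n)                       # coin-flip assignment
Y_A = ability_A + tau_A * D_A + rng.normal(0.0, 1.0, n)
exp_est_A = Y_A[D_A == 1].mean() - Y_A[D_A == 0].mean()

# Context B: take-up depends on unobserved ability -> selection bias in OLS
ability_B = rng.normal(0.0, 1.0, n)
D_B = (ability_B + rng.normal(0.0, 1.0, n) > 0).astype(int)   # endogenous take-up
Y_B = ability_B + tau_B * D_B + rng.normal(0.0, 1.0, n)
ols_est_B = Y_B[D_B == 1].mean() - Y_B[D_B == 0].mean()

# The bias term: difference in mean unobserved ability between groups
bias_B = ability_B[D_B == 1].mean() - ability_B[D_B == 0].mean()

print(f"experimental estimate, context A: {exp_est_A:.2f} (truth {tau_A})")
print(f"naive OLS estimate,   context B: {ols_est_B:.2f} (truth {tau_B})")
print(f"selection bias in context B:     {bias_B:.2f}")

# Error of the in-context (biased) estimate vs the out-of-context
# (unbiased, but wrong-context) estimate, for the policy question in B:
print(f"|OLS_B  - tau_B| = {abs(ols_est_B - tau_B):.2f}")
print(f"|exp_A  - tau_B| = {abs(exp_est_A - tau_B):.2f}")
```

Under these assumed parameters the selection bias in context B is smaller than the cross-context gap in true effects, so the biased local estimate gives the smaller policy error—matching the paper's claim that when parameter heterogeneity is large, an OLS estimate from the right context can dominate an experimental estimate from the wrong one.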