I never cease to be astonished by the amount of energy people put into claiming that Randomized Controlled Trials (RCTs) are the be-all and end-all of impact evaluation methods; nor at the energy people put into claiming that RCTs are marginal, costly, and a waste of time.
In both cases, people build a case against one side with a kind of argument that they fail to apply to the side they favor. Angus Deaton has roundly exposed the assumptions in RCTs that qualify their use, and Lant Pritchett questions whether RCTs ever have external validity. But both ignore that most evaluation work, including most studies that claim to be measuring impact, is neither rigorous quasi-experimental research nor sophisticated qualitative study. It is a mix of methods, some with poorly explicated assumptions, many of which fail to explicitly address even the most basic concerns about bias.
In a recent blog post, Philipp Krause highlights another part of the anti-RCT critique. He correctly argues that the flat theory of change expressed by many RCT supporters, a study-results-to-efficient-policy-decision model, is absurd. But, like many other RCT critics, he then claims that alternative evaluations are cheap and effective, ignoring just how much money is poured into such studies and how many of them also lack rigor and also fail to influence policy. Krause cites Chile’s public evaluation system as evidence that large benefits can be obtained from relatively inexpensive studies. But Chile is an outlier, the exception, not the rule. Yes, Chile’s experience shows that inexpensive studies looking at a particular set of questions can be very useful. No, it does not prove that RCTs are marginal and useless.
Krause and Pritchett argue that RCTs are marginal, and I agree, but not for the reasons that they put forward. RCTs are marginal because only about 200 of them (my estimate based on a 3ie database) are started in any given year on topics related to development programs. This is dwarfed by the thousands of evaluations conducted using expert interviews, focus groups, non-purposive samples, and quasi-experimental methods. RCTs are hot and visible, which makes them look like the dominant form of evaluation (especially if you take your sample at a J-PAL 10th anniversary event), but they are not.
And here’s the main point: No method can be proven less costly or more effective than any other without reference to the specific context in which a specific evaluation question is being asked.
Abstract methodological debates are worse than useless. They are harmful because they cast doubt on good studies along with the bad. They provide excuses for managers and politicians to cut spending on evaluation work rather than making the case to improve the appropriateness of methods applied to different kinds of policy questions. And they confuse people about whether there is even any such thing as a better or worse study. Ruth Levine makes an eloquent case that we risk throwing out the baby (genuine efforts to find out what policies do) and drowning in the bathwater (of polarized debates over methodological superiority).
So here’s my plea: Stop criticizing evaluation methods in the abstract.
That means: if you read a study and think it is asking irrelevant questions or applying the wrong method to answer its question, focus your criticism on that specific study, and propose a better alternative for that case if you know of one.
RCTs are not the be-all and end-all of evaluation evidence. Neither are regression discontinuity, utilization-focused evaluation, realist evaluation, or participatory evaluation. Each of these is part of the all, with its place in addressing different questions in different places for different purposes.