
Using “Random” Right: New Insights from IDinsight Team

December 10, 2015

The unfolding of “thesis, antithesis, synthesis” about the use of randomized control trials (RCTs) as a tool for improving development policies and practices has reached the “synthesis” stage.  A new paper in the 3ie working paper series, “Evaluations with Impact,” by Shah, Wang, Fraker and Gastfriend (2015) (hereafter the IDinsight team and, full disclosure, three of whom were students of mine) does an excellent job both of laying out the debate and of articulating a newly emerging conventional wisdom, in which quite similar new approaches to the use of randomization have been proposed by a number of actors.

In this debate the “thesis” was that the increased use of randomized controlled trials (RCTs) as a technique of impact evaluation (usually promoted as “independent” impact evaluation) would make important contributions to human well-being in the field of development.  Variants of this thesis were promoted by academics at newly formed research hubs like JPAL and IPA, by practitioners at newly formed organizations like 3ie, within at least parts of major multi-lateral and bi-lateral organizations (DIME in the World Bank, DIV in USAID, DFID, IDB, etc.), and by a few governments (e.g. Mexico’s unit in the Social group). 

In 1996 circumstances led me to be the World Bank “task manager” of record for an early RCT by Michael Kremer and Paul Glewwe.  Hence I was an early non-adopter of this new technique, because I could see it was leading research away from, not towards, the key insights needed about the process of improving development projects and programs.  For about 20 years I have been part of the antithesis.  I have always maintained that RCTs got one claim right—using randomization in assignment is the best way to identify the causal impacts of particular interventions—but got everything else about the use and impact of RCTs on development policy and practice wrong.  I will just list (with links to various papers and blogs) the four claims in the “causal chain” (or “theory of change” or “logframe”) running from doing more RCTs to better development practice to increased human well-being that I argue against. 

  • Claims that RCTs for impact evaluation could (even in principle) produce useful, codifiable knowledge with external validity about development policies and practices were wrong (paper and paper).
  • Claims about the political economy of policy adoption and scaling were wrong (paper).
  • Claims about how organizations learn and change practices on the basis of evidence were wrong (paper and paper).
  • The claim that RCTs would or could address issues of first order importance in development was wrong (blog).

I will be the first to admit that my arguments (and even those of other more senior and respected development economists like Angus Deaton (paper and paper)) had the effectiveness of a pea-shooter against a tank.  Apparently nothing could prevent a hype cycle in RCTs.  One reason critics had no impact was that a very effective rejoinder to early critiques was not counter-argument per se but the Alka-Seltzer commercial refrain: “Try it, you’ll like it.” Initially there were so few RCTs that the argument that we couldn’t possibly know their impacts until we tried them was persuasive.

Another reason critics had little impact was that agnostics rarely alter the course of faith-based movements; heretics do. 

The importance of the IDinsight team paper is that neither of these powerful defenses applies. 

The IDinsight team (2015) points to 2,500 RCTs in development (the latest version of Eva Vivalt’s meta-analysis paper draws on 647 papers). OK, so we’ve tried it, for going on two decades now, and at massive scale.  Where are we, and now what? 

Perhaps even more importantly, the IDinsight team authors are children of the revolution, not older agnostics who saw the rise of RCTs from the outside (and can be accused of “not getting it” or defensiveness about the past).

On my reading, a key insight of the IDinsight team is to distinguish Knowledge Focused Evaluation from Decision Focused Evaluation, two uses of the technique of randomization (the Knowledge Focused and Decision Focused columns of Table 1 are drawn from their paper).  That is, rather than a mostly pointless debate about the value of “randomization” as a technique per se, the interesting questions are the who, what, how, and why of randomization.  Their point is that “knowledge focused” evaluations were seen as an arm of research, aimed at building a generalizable (and hence externally valid) body of knowledge about “what works.”  In contrast, “decision focused” evaluations use randomization to inform decisions relevant to the implementing organization.

Table 1: Characteristics of Knowledge-Focused, Decision-Focused, and Accountability-Focused Evaluations

Characteristic | Knowledge Focused Evaluation (IDinsight team) | Decision Focused Evaluation (IDinsight team) | Accountability Focused Evaluation (me)
Question source | Evaluator (with input from implementer) | Implementer (with input from evaluator) | Funder (e.g. MoF, donor)
Evaluator | Outside technical support | Embedded policy advisor | Outsider (“independent”)
Time to release findings | 1-5 years | 1-14 months | Depends on funder
Cost | US$100,000-$5 million | US$10,000-$500,000 | ? (properly, only the incremental cost vs. other evaluation methods)
Methodology | Lower diversity, with emphasis on more robust methods and downstream outcomes | Higher diversity, with greater emphasis on proximate outcomes and practical considerations | Emphasis on impacts/outcomes on beneficiaries as justification
External validity | Significant concerns, due to the intention to apply findings across contexts | Reduced concern, since action is intended to occur in the implementer’s context | Less concern
Definition of success | Contribution to development theory, contribution to high-level policy debates, scale-up of generalizable interventions | Informed decision and at-scale action, or program discontinuation, in the implementer’s context | Evaluation provides the funder with an assessment of the implementer’s impact

Source: IDinsight team (2015), Table 5, for the Knowledge Focused Evaluation and Decision Focused Evaluation columns; the Accountability Focused Evaluation (AFE) column is my own.

The IDinsight team assesses (based on evidence and interviews with key actors) the “weak links in the Knowledge Focused Evaluation theory of change” and examines why there is now a general perception that the randomista revolution has had rather more impact on PhD training and journal articles than on development practice. 

They argue that a better approach is not to give up on randomization per se but rather to move to a strategic mix of Knowledge Focused Evaluations and Decision Focused Evaluations.  The IDinsight team argues that Decision Focused Evaluations should be (p. 22):

  • Demand driven—conducted only when an implementer desires evidence to inform future action;
  • Tailored—generating decision-relevant evidence within the temporal, budgetary, operational, and political constraints of the implementer;
  • Embedded—within the implementer’s operation and decision-making structures; and
  • Cost-effective—aiming for a positive social return on investment, with the evaluation itself considered as the “investment” (a stylized back-of-the-envelope sketch of this criterion follows below).
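
To make that last criterion concrete, here is a minimal back-of-the-envelope sketch of what a “positive social return on investment” calculation might look like. The budget figures, benefit-per-dollar numbers, and variable names are all hypothetical illustrations of mine; the IDinsight paper does not prescribe any particular formula.

```python
# A stylized sketch of the "positive social return on investment" test for a
# Decision Focused Evaluation. All figures are hypothetical illustrations.

evaluation_cost = 150_000   # cost of the evaluation itself, in US$ (assumed)
program_budget = 5_000_000  # budget of the program whose design is at stake (assumed)

# Suppose the evaluation moves the implementer from a design yielding $0.60 of
# social benefit per dollar spent to one yielding $0.75 per dollar (assumed).
benefit_per_dollar_status_quo = 0.60
benefit_per_dollar_informed = 0.75

incremental_benefit = program_budget * (
    benefit_per_dollar_informed - benefit_per_dollar_status_quo
)
net_social_return = incremental_benefit - evaluation_cost
roi = net_social_return / evaluation_cost

print(f"Incremental social benefit: ${incremental_benefit:,.0f}")
print(f"Net return on the evaluation 'investment': ${net_social_return:,.0f} ({roi:.1f}x)")
# With these made-up numbers the evaluation easily clears the bar; for a small
# program, or a decision with little money at stake, it may not.
```

The point of such a calculation is not precision but discipline: an evaluation whose plausible decision stakes are smaller than its own cost fails the criterion.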

To their distinction between Knowledge Focused Evaluation and Decision Focused Evaluation I would add a third type (the fourth column of Table 1), the Accountability Focused Evaluation, which builds randomization into the ex post evaluations that are routinely required of donor-funded projects and programs (and which are done in some government programs).  The World Bank (as one example of a multi-lateral organization I have some knowledge of) has always, as a matter of policy, done an ex-post evaluation of every project that rated the project’s success or failure.  That evaluation was reviewed by a quasi-autonomous arm of the organization (in that it answered only to the Board, not Management), which also rated the project.  This department (once the Operations Evaluation Department, now the Independent Evaluation Group) also did, and does, both more in-depth evaluations of selected projects and thematic evaluations (e.g. of all “directed credit” or “integrated rural development” projects).  Part of the rhetoric around the advent of RCTs was that the existing project and program evaluation mechanisms were too weak as an accountability device, both because they were not truly “independent” of the implementer/funder organization and because, without a valid counter-factual, the causal impact could not be rigorously assessed.  Hence it was “win-win”: doing evaluation with RCTs could be better at both accountability (Accountability Focused Evaluation) and knowledge (Knowledge Focused Evaluation). 

It is easy to see (even without my paper on the topic) that one cannot have Knowledge Focused Evaluation, Decision Focused Evaluation, and Accountability Focused Evaluation all at the same time: one cannot be both embedded in the implementing organization, providing real-time feedback on how to do better what the organization wants to do, and an “independent” evaluator of whether what the organization does is worth doing at all (and hence of whether it should be at risk of losing funding).  It is also easy to see (again, without my paper) that organizations will generally resist Accountability Focused Evaluation if they can, and hence will resist Knowledge Focused Evaluation if it is bundled with Accountability Focused Evaluation.  On the other hand, organizations are more likely to want to use Decision Focused Evaluation.  

The IDinsight team (2015) is articulating an emerging new “synthesis” based on the “lessons of experience” of the new evaluation organizations.  Because the arguments for it are roughly right, something like this shift from Knowledge Focused to Decision Focused Evaluation, and the rise of organizations like IDinsight itself working directly with development organizations, is happening nearly everywhere, under different names and different acronyms.

“It’s all about MeE” is an organizational-learning approach that combines monitoring (M), experiential learning (“little e,” akin to Decision Focused Evaluation) to “crawl the design space,” and, at some stage, impact Evaluation (akin to Knowledge Focused Evaluation).  This is an integral part of the Problem Driven Iterative Adaptation (PDIA) approach to building state capability (BSC) at CID, in which I am a participant, and of the broader Doing Development Differently (DDD) network.

The Evidence for Policy Design (EPoD) group, also at CID at Harvard, proposes SMART policy design, which emphasizes embeddedness with the implementing organization and rapid, evidence-based feedback loops. 

Howard White (until recently the executive director of 3ie), at a recent conference on evaluation in Berlin, discussed the difference between “Accountability” evaluations and “Learning” evaluations.  He drew a distinction very much like that between Accountability Focused, Knowledge Focused, and Decision Focused Evaluation, saying: “Second generation questions don’t ask ‘does it work’ but design questions about ‘how can it work better?’”  He noted that the biggest uptake in the use of randomization has been in the private sector, where organizations like Yahoo! do “impact evaluations” that take one hour.  These are clearly Decision Focused, not Knowledge Focused, uses of randomization.
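
For readers who have not seen this style of rapid testing, the sketch below shows (with simulated data) roughly what such a quick, decision-focused use of randomization looks like: randomize users between two variants, compare average outcomes, and adopt the winner. Everything in it (the outcome measure, the effect size, the code) is a hypothetical illustration of mine, not Yahoo!’s actual tooling or anything drawn from the IDinsight paper.

```python
# A minimal, simulated "one-hour impact evaluation": randomized assignment to
# two variants, then a simple comparison of mean outcomes. Illustrative only.
import random
import statistics

random.seed(0)

def simulated_outcome(variant: str) -> float:
    # Hypothetical outcome (e.g., minutes of engagement); variant "B" is built
    # to be slightly better in this simulation.
    baseline = 5.0 if variant == "A" else 5.3
    return random.gauss(baseline, 2.0)

# Randomized assignment is what gives the A-vs-B comparison a causal reading.
assignments = [random.choice(["A", "B"]) for _ in range(10_000)]
outcomes = {"A": [], "B": []}
for variant in assignments:
    outcomes[variant].append(simulated_outcome(variant))

effect = statistics.mean(outcomes["B"]) - statistics.mean(outcomes["A"])
print(f"Estimated effect of B relative to A: {effect:.2f}")
# The implementer simply adopts whichever variant does better in its own
# context; no claim of external validity is being made.
```

The contrast with a Knowledge Focused Evaluation is exactly the one in Table 1: speed and decision relevance in the implementer’s own context rather than generalizable knowledge.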

The new Global Innovation Fund has an approach of “pilot, test, scale,” which recognizes that a “pilot” phase of “field testing” innovations (the Decision Focused Evaluation stage) must precede the “test” (or Knowledge Focused Evaluation) stage.  And perhaps the “pilot” and “test” need to be adapted for each context, as “pilot-test” in one context and “scale” in another is demonstrably poor practice.

The IDinsight team points in a useful and productive direction: taking the techniques of randomization into the fabric of the detailed decision making and tacit learning that organizations need to do as they respond to problems and scale up robust approaches to solving those problems.  


I would like to point out that it is love, not science, that means never having to say you are sorry.  What was “learned from experience,” and what leads to the shift in emphasis from Accountability Focused and Knowledge Focused Evaluation to Decision Focused Evaluation (or similar variants like MeE or SMART policy design, among many others), was that (a) there were few development problems for which there was a logistically implementable solution, based on codifiable knowledge, for which an RCT (or even a small set of RCTs) could generate externally valid results, so, for instance, the medicine analogy often appealed to in support of the value of RCTs was almost wholly misleading; (b) the model in which organizational learning could effectively happen through Knowledge Focused Evaluation and Accountability Focused Evaluation was deeply flawed; and (c) the political economy of policy change and program design that assumed effectiveness was primarily limited by the availability of rigorous evidence was also misguided.  Hmm. 

Disclaimer

CGD blog posts reflect the views of the authors, drawing on prior research and experience in their areas of expertise. CGD is a nonpartisan, independent organization and does not take institutional positions.