Each year billions of dollars are spent on development programs with relatively few rigorous studies of whether they actually work. In 2004, CGD set out to address this lack of good-quality impact evaluations, and our recommendations led to the creation of the International Initiative for Impact Evaluation (3ie) in 2009. The number and quality of impact evaluations have risen significantly, but there is still a long way to go to make sure future development interventions are based on evidence of what works.
In 2006, CGD published a working group report that addressed the insufficient number of rigorous impact evaluations of social programs in low- and middle-income countries. Last week—marking 10 years since the report’s release—CGD and J-PAL co-hosted the event, “Improving Development Policy through Impact Evaluation,” which echoed three key messages of the 2006 report: 1) conduct more and better evaluations; 2) connect evaluators and policymakers; and 3) recognize that impact evaluations are an important global public good that requires more unconstrained funding. Participants celebrated progress, but agreed that there’s more work to be done.
Here are our three takeaways:
1. Conduct more and better impact evaluations
Even if you’re not an avid follower of impact evaluation news, you may have read blogs or papers criticizing “Randomistas”—researchers who conduct randomized controlled trials (RCTs)—for taking over development research, lacking external validity, ignoring complexity, and displacing more important topics. The number of impact evaluations over the past 10 years has indeed increased markedly. But the presentations at the event put this trend into context. First, RCTs are not taking over; while they are driving the growth in impact evaluations, other methods (such as quasi-experimental studies) are increasing too. Moreover, fewer than 10 percent of evaluations conducted directly by major development agencies are impact evaluations, and fewer than half of those are RCTs. In the case of USAID, for example, a 2013 study found that only three percent of the agency’s evaluations conducted between 2009 and 2012 were impact evaluations that used a counterfactual to attribute impact to a particular intervention.
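“Using a counterfactual” has a precise meaning here. In standard potential-outcomes notation (a textbook formulation, not something drawn from the 2013 USAID study), the impact of an intervention is the difference between what happened and what would have happened without it:

% A minimal potential-outcomes sketch (standard notation; illustrative,
% not taken from the USAID study). Y_i(1) is unit i's outcome with the
% intervention, Y_i(0) without it; D_i indicates treatment status.
\[
\text{ATE} = \mathbb{E}\left[\, Y_i(1) - Y_i(0) \,\right]
\]
% Randomizing D_i makes treatment independent of potential outcomes, so the
% observable difference in group means identifies the average treatment effect:
\[
\mathbb{E}\left[\, Y_i \mid D_i = 1 \,\right] - \mathbb{E}\left[\, Y_i \mid D_i = 0 \,\right] = \text{ATE}
\]

Studies without a credible estimate of that second, unobserved term cannot attribute changes in outcomes to the intervention.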
More importantly, impact evaluations are influencing policy. Abhijit Banerjee, J-PAL director and professor of economics at MIT, discussed how evidence from different contexts—rather than just a single, large RCT—has answered old questions and shifted policy debates. He highlighted three such cases: preventive health products like bed nets (for which charging even a small fee has a big effect on demand); microcredit (it doesn’t have a transformative impact on raising incomes); and improving learning (teaching at the right level can produce large improvements). Reflecting on these examples, Rachel Glennerster, executive director of J-PAL, suggested that the benefits from this kind of evidence in just a few areas are large enough to justify the spending that has gone into the entire portfolio of impact evaluations.
2. Connect evaluators and policymakers
Many of today’s impact evaluations engage evaluators and policymakers in jointly defining questions, designing programs, and identifying policy implications. Traditionally, evaluators insisted on independence to assure objectivity. But this separation often means evaluators are brought in only after a project is underway, when the opportunity to collect baseline data and test hypotheses has been lost. Glennerster noted that researchers today rely on their methodology and transparency to assure objectivity, allowing them to engage in defining questions and collecting useful data.
The event included both policymakers and researchers who reflected on their partnerships. Bambang Widianto, executive secretary in the office of the vice president of Indonesia, discussed how collaboration with external researchers helped his government’s decision to expand an identification card system that sharply cut waste in the national rice subsidy program. Rema Hanna, scientific director for J-PAL Southeast Asia and one of the researchers who conducted the evaluation, explained how timely funding from the Australian Government permitted the research team to turn around results in just six months to meet the government’s budget deadline. Banerjee used this case to reflect on the value to policymakers of having quick access to technical support from researchers.
3. Greater investment is necessary
A key message of “When Will We Ever Learn?” in 2006 was the need for greater investment in rigorous impact evaluations because the knowledge they create is a global public good. While most agencies and foundations have increased their own spending on impact evaluations, only a few, like the British Department for International Development (DFID) and the Bill and Melinda Gates Foundation, have provided substantial sums to organizations that contract studies with an eye on the collective benefit they provide. At the event, one of us (Savedoff) argued that agencies and foundations need to stop being free riders and contribute resources to organizations that can apply these funds flexibly and strategically, so that impact evaluations are not only good quality but also widely relevant to policy. Specifically, agencies could dedicate 0.1 percent of their annual disbursements to the International Initiative for Impact Evaluation (3ie) or other similar funds that conduct impact evaluations to build collective knowledge. Speakers also highlighted funds like USAID’s Development Innovation Ventures and the Global Innovation Fund, which solicit a wide range of innovative proposals that are then rigorously evaluated as small-scale pilots or tested at scale.
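To put that 0.1 percent pledge in concrete terms (my own illustrative arithmetic for a hypothetical agency, not a figure cited at the event):

% Illustrative arithmetic only: under a 0.1 percent pledge, a hypothetical
% agency disbursing \$10 billion a year would contribute
\[
0.001 \times \$10\ \text{billion} = \$10\ \text{million per year.}
\]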
Despite progress, impact evaluations have only addressed a few of the myriad unanswered questions about programs being implemented around the world. Moreover, data on cost-effectiveness—an important complement to impact studies and a critical component for decision-making—are hard to come by. While conducting research for Millions Saved: New Cases of Proven Success in Global Health, Amanda Glassman and colleagues identified 50 evaluations of at-scale health programs that used rigorous methods to attribute impact (out of a total of 250 evaluations), but only three of them estimated cost-effectiveness. Recapping this experience at the event, Glassman emphasized that cost analyses using standard methodologies can help policymakers draw useful comparisons and enhance value for money.
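For readers unfamiliar with those standard methodologies, the core quantity is typically an incremental cost-effectiveness ratio (a generic formulation; this is not a description of the specific method used in Millions Saved):

% Generic incremental cost-effectiveness ratio (ICER); an illustration of the
% standard approach, not the specific methodology used in Millions Saved.
% C = cost, E = health effect (e.g., DALYs averted), program vs. a comparator.
\[
\text{ICER} = \frac{C_{\text{program}} - C_{\text{comparator}}}{E_{\text{program}} - E_{\text{comparator}}}
\qquad \text{e.g., dollars per DALY averted}
\]

Computing such ratios with a common methodology is what allows policymakers to compare value for money across otherwise dissimilar programs.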
Summing up the road ahead, Santhosh Mathew, joint secretary of India’s ministry of rural development, highlighted the need for trust, resources, and sustained commitments from donors, researchers, implementers, and policymakers to continue closing the evaluation gap. And many questions still remain: How can we ensure external validity? How should we prioritize when and where to conduct rigorous impact evaluations? While progress over the past decade is surely cause for optimism, it is no time for complacency.
This blog is part of a special series celebrating CGD’s 15th anniversary in 2016. All year, CGD experts will look back at work we’ve done that has had real-world impact, and forward to future research that we hope will help increase global prosperity.
“3ie has made my job much easier.”
This is what we heard last month from a high-ranking government official in Africa, referring to the International Initiative for Impact Evaluation (3ie), and it made us very proud. Creating 3ie was the outcome of the Evaluation Gap Working Group, which we led along with Nancy Birdsall to address the limited number of rigorous impact evaluations of public policies in developing countries. As CGD celebrates its 15th year, it is worth considering what made that working group so successful, the obstacles we confronted, and the work that still remains to be done.
In the early 2000s, we decided to tackle a long-standing problem: there were simply too few good-quality impact evaluations being conducted to ascertain whether development projects were achieving their goals and to help developing countries learn which interventions are effective and how to improve them. We wanted to know why governments and agencies underinvested in rigorous studies so that we might develop a practical solution to this problem. With the support of the Gates and Hewlett Foundations, we convened a working group to undertake this task.
When CGD convenes working groups, it tries to bring together people with different perspectives and backgrounds. The Evaluation Gap Working Group followed this model: some members were strong supporters of particular evaluation methods, while others were skeptical of the demand for, let alone the usefulness of, such studies. Through research findings, interviews, and consultations in different parts of the world, we gradually developed a consensus report on the need for more and better impact evaluations, the reasons behind the lack of investment in evaluation within the development community, and the potential solutions corresponding to those reasons: strengthening the evaluation practices of major funders and creating an international organization to fund and promote rigorous studies. It took another two years to facilitate the process that culminated in the creation of 3ie, and longer still for large bilateral agencies, like the UK Department for International Development and the US Agency for International Development, to create and implement policies that incorporated attention to impact evaluation.
The Working Group process was not without its difficulties. We worried that the topic would fail to garner attention; one of our colleagues told us the issue was simply too “wonky.” Instead, it kicked off a firestorm among bilateral evaluation departments and evaluation associations. The negative reaction suggested to us a kind of “Emperor’s New Clothes” story – most agencies and evaluation experts knew that their evaluation studies weren’t up to the task of assessing impact, but they were loath to address it. There are likely many reasons for this, ranging from professional pride to sincere concerns about the ethics, feasibility, and utility of what some perceived as an approach too sophisticated or inappropriate for the context of development programs.
In parallel, researchers (primarily economists) were expanding the use of randomized controlled trials (RCTs) to assess project interventions, and our initiative got tied up in long-standing methodological debates over the applicability of RCTs to social analysis. As much as we argued that our initiative was about rigor and not a particular method, it continued to be branded as an RCT crusade – and, admittedly, there were some on the working group who sought to make it that. However, even though the final report argued that RCTs held great promise, it called for convening an international group to develop standards and stated that “[t]he starting point is defining the policy question and the context. From there it becomes possible to choose the best method for collecting and analyzing data and drawing valid inferences.”
By raising the question of why more impact evaluations were not being conducted – why so few development programs even had baseline data – the working group process directly shook up established interests. Evaluation among bilateral agencies was (and still is) focused more on processes, operations, and strategies than impact. The world of professional evaluators has a lot to offer in these kinds of studies but has relatively fewer experts in constructing the plausible counterfactuals – either statistically or qualitatively – that are needed to assess impact. The World Bank delayed publication of our report by a full year by offering numerous critiques which, once written down, were at best confusing. The reasons for this opposition remain unclear to us, but were probably related to the World Bank’s own interest in getting funds for its impact evaluation efforts like DIME and SIEF. It remains a shame that the World Bank is not providing funding to 3ie and other collective efforts to promote impact evaluation in developing countries. The World Bank is probably the only international organization with the scale and capacity to pursue its own rigorous evaluation program, and perhaps because of that go-it-alone tendency has not fully engaged in the community of practice around impact evaluation.
In retrospect, it’s clear that the working group process and results gradually won respect and shifted some of the terms of the debate. The working group articulated a general recognition that more impact evaluation was required and that the rigor of evaluations needed to be addressed. It persuaded some key people that evaluating important questions about the impact of aid at the country or sector level still needed to be informed by evidence on local and specific interventions. It demonstrated that a dozen or so developing countries had an interest in institutionalizing impact evaluation as part of their domestic policy process. And last but not least, it culminated in creating an international organization, 3ie, to channel funds and build a community of practice around more and better impact evaluations.
Impact evaluations have clearly increased in number and quality over the last 10 years. CGD’s Evaluation Gap Working Group and the creation of 3ie were not the sole cause of this, but we think it plausible that the initiative contributed to this process, and may even have accelerated it, by bringing a new organization and more funding into the field. In fact, though the controversy and misunderstanding were frustrating, they probably garnered the subject more attention than it would have received if we hadn’t ruffled feathers.
The Evaluation Gap initiative, one of the early CGD working groups, was in equal parts eclectic and directed: focused on a problem and keen to get to a solution, albeit not a predetermined one. Many CGD working groups follow this approach: identifying a problem, posing a provocative question that invites different ways of thinking, convening people with different perspectives, and informing the process with good research. When a working group’s final report and recommendations are issued, CGD follows through by engaging directly with organizations that can carry the ideas forward, acting as facilitator and sometimes incubator.
The evaluation gap is closing even if it isn’t closed. The key finding of our 2006 report was an institutional bias against investing in rigorous studies of impact, and that bias persists: 3ie remains heavily dependent on a few committed funders when it should be financed by long-term commitments from all countries and multilateral agencies engaged in policy and programs. We have argued that a collective commitment to dedicate 0.1% of annual disbursements to 3ie – or some similar international fund for rigorous impact evaluation – would make a big difference to improving the effectiveness of development aid. In fact, we argue that collectively financing the creation of knowledge should be the future of aid. We simply cannot continue to underinvest in the evidence base on public policy.
Finally, this is not simply a matter of more studies but also of growing a community of practice that includes researchers, government officials, and institutions. The African official we quoted at the beginning explained why 3ie was so important: “I can get things done [in my country] because of the resources I can draw on from 3ie, and I stay motivated because I can turn to these international colleagues.” That is one key to global development.
Ruth Levine is now Director, Global Development and Population Program, Hewlett Foundation and was formerly a CGD Senior Fellow and Vice President for Programs and Operations.
Recently, I sent out the final Evaluation Gap Update – a newsletter about impact evaluations and the institutions that fund them, implement them, or are supposed to be influenced by them. After 10 years, it seemed the right time to move on to other projects, particularly since numerous other resources have sprung up over this decade (many listed below!). Yet some of the pushback against the growth of impact evaluations worries me. I hear people say too many impact evaluations are being conducted (despite the need for the information they provide). I hear others claim that impact evaluations are irrelevant (a claim based on a faulty model of how policymaking happens).
We started the Evaluation Gap Updates in 2005 to accompany the work of CGD's Evaluation Gap Working Group, which concluded in 2006 with its report "When Will We Ever Learn?" At that time, we documented how few impact evaluations were being conducted on public policies in low- and middle-income countries and recommended greater investment in this kind of research. Several foundations and a few aid agencies picked up the challenge to create 3ie. In this way, our work joined existing initiatives in developing countries, research centers, and aid agencies that were seeking to address the same problem. Today, more impact evaluations are being conducted than ever before.
This very success in mobilizing resources for learning has created a target for people arguing that we now have too many impact evaluations. Yet the number of studies is still very small relative to the questions we are asking and to the myriad programs underway around the world in multiple sectors. We certainly need other kinds of studies and different approaches to policy, but the specific need for impact evaluations is still, in my view, nowhere near being met. If anything, the world needs to commit more resources to impact evaluation than it is doing today.
The other criticism I hear is that impact evaluations are not relevant to policies. This may be true if you expect a one-to-one correspondence between a particular study and a specific decision – granted, occasionally you can find such a link. However, evaluations do not really influence policy in such a reductionist way. Rather, such influence occurs through a complex process of interpretation and knowledge formation. Studies enter a social space of debate, along with other kinds of information, and gradually alter the frames within which people think and act. In other words, for the topics addressed by these impact evaluations, better understanding filters into discussions and from there into the minds of policymakers, agency staff, and project implementers.
Even though the evaluation gap is closing and our Updates are finished, I believe the value of impact evaluations will be increasingly recognized. In part, that is because an evaluator with many years of experience once assured me in an interview that fluctuating support for evaluations was less like a pendulum and more like a spiral – each cycle leaves us at a somewhat higher level of capacity and learning. I’m also heartened by the growth in the number and quality of information sources that I perused for the Evaluation Gap Updates. The following list includes links to many good newsletters, blogs, and databases which I have used and which can help you stay current on this dynamic field. And you can always follow what my colleagues and I are doing at the Center by subscribing to other related CGD Newsletters and checking our CGD Policy Blogs.
The International Initiative for Impact Evaluation (3ie) has announced that Emmanuel (Manny) Jimenez will be the organization’s new Executive Director starting in early 2015. The selection of Jimenez represents a key transition for 3ie, which has moved quickly from start-up to maturity in just six years.
When I was involved in the discussions to conceive, design, and create 3ie between 2008 and 2009, many people were skeptical about the need for such an institution and about its capacity to accomplish its goal: getting better information for public policies in developing countries by promoting more and better impact evaluations. Yet the rationale seemed obvious to me. Aid agencies are regularly asked for evidence of impact, but are just as regularly denied the funding or high-level commitment to conduct the research necessary to provide that very evidence. Creating an external entity to carry out that function seemed like a really good idea.
3ie saw success early on thanks to sustained interest from key funders; a commitment to selecting high-quality relevant studies by the first Board and its first chair, Paul Gertler; and the energy, experience and outreach efforts of its first Executive Director, Howard White, and the able team he assembled.
Over the last few years 3ie has taken two more big steps toward maturity. The renewal of the Board, along with the appointment of Richard Manning as its new Chair, has shown the organization’s capacity to engage in fruitful reflections on its next steps. With the transition to a new Executive Director, 3ie will be able to build on past success while drawing on the different qualities and depth of experience and expertise that Manny Jimenez will bring to the post.
This latest transition shows that 3ie can renew itself while building on its strengths. The United Kingdom, the Bill & Melinda Gates Foundation, and the William & Flora Hewlett Foundation deserve credit as 3ie’s leading funders, but it is time to revisit one of the original visions for 3ie – that all foreign aid and multilateral agencies should contribute 0.1% of their annual disbursements to 3ie in support of impact evaluation. These studies are a public good: underfunded relative to their value, yet a source of useful evidence for policy decisions by agencies and developing countries alike.
But that’s another challenge. For now, it is mostly time to congratulate and welcome Manny Jimenez to his new position and to wish 3ie continuing growth and success.
I never cease to be astonished by the energy people put into claiming that randomized controlled trials (RCTs) are the be-all and end-all of impact evaluation methods, nor by the energy others put into claiming that RCTs are marginal, costly, and a waste of time.
In both cases, people build a case against one side with a kind of argument they fail to apply to the side they favor. Angus Deaton has roundly exposed the assumptions in RCTs that qualify their use, and Lant Pritchett has questioned whether RCTs ever have external validity. But both ignore that most evaluation work, including most studies that claim to be measuring impact, is made up of neither rigorous quasi-experimental nor sophisticated qualitative studies. It is a mix of methods, some of whose assumptions are poorly explicated and many of which fail to explicitly address even the most basic concerns about bias.
In a recent blog, Philipp Krause highlights another part of the anti-RCT critique. He correctly argues that the flat theory of change expressed by many RCT supporters – a study-results-to-efficient-policy-decision model – is absurd. But, like many other RCT critics, he then claims that alternative evaluations are cheap and effective, ignoring just how much money is poured into such studies and how many of them also lack rigor and also fail to influence policy. Krause cites Chile’s public evaluation system as evidence that large benefits can be obtained from relatively inexpensive studies. But Chile is an outlier, the exception, not the rule. Yes, Chile’s experience shows that inexpensive studies looking at a particular set of questions can be very useful. No, it does not prove that RCTs are marginal and useless.
Krause and Pritchett argue that RCTs are marginal, and I agree, but not for the reasons they put forward. RCTs are marginal because only about 200 of them (my estimate based on a 3ie database) are started in any given year on topics related to development programs. This is dwarfed by the thousands of evaluations being conducted using expert interviews, focus groups, non-purposive samples, and quasi-experimental methods. RCTs are hot and visible, which makes them look like the dominant form of evaluation (especially if you take your sample at a J-PAL 10th anniversary event), but they are not.
And here’s the main point: No method can be proven less costly or more effective than any other without reference to the specific context in which a specific evaluation question is being asked.
Abstract methodological debates are worse than useless. They are harmful because they cast doubt on good studies along with the bad. They provide excuses for managers and politicians to cut spending on evaluation work rather than making the case for matching methods more appropriately to different kinds of policy questions. And they confuse people about whether there is even such a thing as a better or worse study. Ruth Levine makes an eloquent case that we risk throwing out the baby (genuine efforts to find out what policies do) and drowning in the bathwater (of polarized debates over methodological superiority).
So here’s my plea: Stop criticizing evaluation methods in the abstract.
That means, if you read a study and think it is asking irrelevant questions or applying the wrong method to answer its question, then focus your criticism on that specific study – and propose a better alternative for that case, if you know of one.
RCTs are not the be-all and end-all of evaluation evidence. Neither are regression discontinuity, utilization-focused evaluation, realist evaluation, or participatory evaluation. Each of these is part-of-the-all, with its place in addressing different questions in different places for different purposes.
In this paper we examine how policymakers and practitioners should interpret the impact evaluation literature when presented with conflicting experimental and non-experimental estimates of the same intervention across varying contexts. We show three things. First, as is well known, non-experimental estimates of a treatment effect comprise a causal treatment effect and a bias term due to endogenous selection into treatment. When non-experimental estimates vary across contexts, any claim for external validity of an experimental result must assume that (a) treatment effects are constant across contexts, while (b) selection processes vary across contexts. This assumption is rarely stated or defended in systematic reviews of evidence. Second, as an illustration of these issues, we examine two thoroughly researched literatures in the economics of education—class size effects and gains from private schooling—which provide experimental and non-experimental estimates of causal effects from the same context and across multiple contexts.
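To spell out that first point (a standard selection-bias identity; the notation below is illustrative rather than copied from the paper), the naive non-experimental difference in mean outcomes between treated (D=1) and untreated (D=0) units decomposes as:

% Standard decomposition of a non-experimental difference in means
% (illustrative notation, not the paper's own). Y(1), Y(0) are potential
% outcomes with and without treatment; D indicates selection into treatment.
\[
\underbrace{\mathbb{E}[Y \mid D=1] - \mathbb{E}[Y \mid D=0]}_{\text{non-experimental estimate}}
= \underbrace{\mathbb{E}[Y(1)-Y(0) \mid D=1]}_{\text{causal effect on the treated}}
+ \underbrace{\mathbb{E}[Y(0) \mid D=1] - \mathbb{E}[Y(0) \mid D=0]}_{\text{selection bias}}
\]
% If selection processes differ across contexts, the bias term differs too, so
% non-experimental estimates can vary even when the causal effect is constant.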
The impact evaluation world has changed dramatically through a range of initiatives at research institutions, think tanks, development agencies, and governmental policy units. It has now been seven years since CGD’s Evaluation Gap Working Group released “When Will We Ever Learn? Improving Lives Through Impact Evaluation,” and four years since the launch of 3ie.
The purpose of this conference is to reflect on what has been achieved in recent years, to consider how the environment has and has not changed, to assess existing initiatives aimed at improving the supply and use of high-quality evidence, and to provide ideas for 3ie as it considers the next stage of its strategy within this landscape. Please note that the afternoon sessions will include small group discussions with the intention of generating specific and useful ideas for future action.