When Will We Ever Learn?
Recommendations to Improve Social Development through Enhanced Impact Evaluation

Aid evaluation plays an essential role in the efforts to enhance the quality of development co-operation.
-OECD/DAC "Principles for Evaluation of Development Assistance"

The Millennium Development Goals call, among other things, for increasing primary school completion, reducing child mortality rates and the incidence of malaria, and making the benefits of new technologies, especially information and communication technologies, available in developing countries. However, we know surprisingly little about the most effective ways to reach these goals. For example, when the Center for Global Development convened a group of health experts to nominate successful public health projects, resulting in the book Millions Saved, it discovered that fewer than two dozen programs could substantiate their claims of success with good evidence.

This paper demonstrates that there is a serious gap in our knowledge about which social development programs in developing countries are most effective. This gap in turn reflects a lack of high-quality impact evaluations, that is, evaluations designed to measure the impact directly attributable to a specific program or policy, as distinct from other potential explanatory factors. The paper argues that the cost of not having this information – in terms of misallocated resources and, in some cases, outright harm – is high. It analyzes the obstacles to more systematic production of good quality impact evaluations and demonstrates that confronting these obstacles requires collective action by international agencies and governments. It is hoped that the recommendations in this paper will serve as the basis for international discussion and agreements that will improve the quality and focus of impact evaluations, so that 5 or 10 years from now we will have answers to important questions about what kinds of social development programs are most likely to succeed.

I. An Evaluation Gap is Restraining Progress

Knowledge is a global public good that has the potential to improve millions of lives without being depleted. For example, the discovery in the late 19th century that cholera was transmitted through contaminated water has saved lives around the world over many years. The knowledge gained from evaluating public programs is the most systematic means available for improving public policy and spending public money well, and it is thus an invaluable complement to the political processes that establish public policy. Fortunately, the capacity to conduct good evaluations and learn from them has increased dramatically in the last few decades as a consequence of methodological advances, expanding research institutions, a growing number of educated and skilled evaluators, and more effective evaluation offices in public agencies around the world.

Nevertheless, the evidence base for designing new programs and providing financial assistance remains quite weak. Substantial resources are devoted to designing projects, monitoring their implementation, and even measuring their outputs, but very little is done to assess whether the projects are ultimately successful and necessary for achieving positive impacts. This underinvestment in good quality "impact evaluations" – studies designed to measure the net impact directly attributable to a program or policy – means that opportunities for expanding good projects are lost and that funds continue to be wasted on bad ones.

This gap is puzzling because building this knowledge does not require an impact evaluation for every project. It requires only that agencies take a strategic view and conduct impact evaluations of those projects that can yield important information about what is and is not effective. For example, it would suggest paying particular attention to projects that are widespread but whose effectiveness has not been demonstrated, or to new approaches that have not yet been tested. Given the volume of money devoted to social programs and the emphasis on “results,” it is puzzling that careful impact evaluations of these kinds of projects – with clear strategies for drawing valid inferences based on plausible counterfactual scenarios – are so rare.

This gap in knowing which projects and programs are effective is made worse by the poor implementation of impact evaluations in those cases where they have been commissioned. Too often, development agencies and governments, quite reasonably, focus on implementing the current project but thereby lose important opportunities to improve future policies. As a result, while they may generate studies that are important for improving processes, operations, and implementation, they frequently neglect studies of impact. And when impact evaluations are conducted, all too often their designs are so weak that they cannot reliably measure the net impact of the programmed activities.

As one example, consider the history of programs that promote voluntary community health insurance schemes as a way to build sustainable financing for health services. Such programs have been proposed and encouraged for decades (see, for example, WHO 1978). Since that time, millions of dollars have been spent on such programs in dozens of countries, and reviews of the evaluation literature give the impression that we know a great deal about these programs and that they are beneficial (e.g. Commission on Macroeconomics and Health 2001 and 2002). However, reviews that explicitly discount methodologically weak studies find that there is very little evidence actually available on whether these strategies are effective. The ILO’s Universitas Programme reviewed 127 studies covering 258 community health schemes and found that only 2 of the studies had “internal validity”, that is, only 2 were designed in such a way that they could distinguish impacts on the relevant population that were specific to the program from changes attributable to other factors. As the authors state:

…even for utilization of services the information and analysis is scarce and inconclusive mostly due to the few studies that address the question … and due to the lack of internal validity for most of those studies that address the question. The main internal validity problems are related to, inter allia (sic), lack of base lines, absence of control groups, problems in sampling techniques, control for confounding variables … and sources of data for utilization analysis. (ILO 2002, p. 47)

Other reviews of community health schemes confirm the methodological weaknesses identified in the ILO report (Ekman 2004; Jakab et al. 2001).

This is not to say that impact evaluation has been barren in all fields or at all times. Good evaluations do happen, and when they are disseminated, they stand out in their field. Decades later, the RAND health insurance experiment and the Job Training Partnership Act (JTPA) evaluation in the United States remain important points of reference for designing health insurance and job training programs (Newhouse 2004, Gueron 2002, Wilson 1998). More recently, the evaluation of Mexico’s conditional cash transfer program, PROGRESA/Oportunidades, has influenced the design of similar programs throughout the world (Morley and Coady 2003).

In 2004, we reviewed existing initiatives to address this apparent underinvestment in good impact evaluations. We found that several organizations are working to improve impact evaluation in various ways: through advocacy, publishing guidelines, training programs, literature reviews, and promoting or conducting specific evaluations. However, no organization was asking why international agencies and developing countries have faced the same dilemma for decades: international agencies make it clear that they want to finance proven programs, and governments insist on value for money, yet the frequent absence of evidence does not spur adequate investment in the good quality impact evaluations needed to fill these gaps in our knowledge.