Though the value of the information from impact evaluations may be very great, the suggestion that more such evaluations are required is often met with skepticism – if not outright rejection. The main objections are that good impact evaluations:
- are not appropriate for answering questions of importance to decision-makers;
- cannot be ethically implemented;
- are too costly;
- produce results too late to be of use to decision-makers;
- do not provide important information about how programs operate; and
- are too complex and do not influence policymaking.
Such objections are not supported by the facts, as we now demonstrate.
Current methods can answer questions that are important to decision-makers
It is difficult to design high-quality impact evaluations that can answer broad policy questions such as “under what circumstances should a country have a fixed exchange rate?” Nevertheless, the range of questions that well-designed impact evaluations can answer is much wider than generally recognized.
Studies aimed at learning the best way to assist individuals can be relatively easy to design, though they still require time and money. For example, certain questions relate to how individuals respond to specific interventions and can be studied simply by comparing participants and non-participants who have been randomly assigned to “treatment” and “control” groups (a minimal sketch of this comparison follows the list below):
- Do Vitamin A supplements reduce infant mortality? (Sommer et al 1986)
- Do textbooks increase students’ learning? (Glewwe et al 2001)
- Do microfinance programs improve child nutrition? (MkNelly and Dunford 1998)
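To make the logic concrete, here is a minimal sketch – illustrative code, not drawn from any of the studies cited above – of the comparison such evaluations rely on. The function names and data structures are hypothetical; the point is that random assignment is what licenses the simple difference in means.

```python
# Minimal sketch of a randomized comparison (illustrative only).
import random
import statistics

def random_assignment(individuals, seed=0):
    """Split a list of individuals into treatment and control groups at random."""
    rng = random.Random(seed)
    shuffled = list(individuals)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # treatment, control

def impact_estimate(treated_outcomes, control_outcomes):
    """Estimate program impact as the difference in mean outcomes.

    Randomization ensures the groups differ, on average, only in their
    exposure to the program, so this simple difference is credible."""
    return statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
```

Because assignment is random, any systematic difference in outcomes can be attributed to the program rather than to pre-existing differences between participants and non-participants.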
But questions regarding the best ways to “produce” services – requiring comparisons across classrooms, facilities or districts – can also be addressed in a relatively straightforward fashion:
- Does hiring an additional teacher in non-formal schools improve attendance and performance? (Banerjee et al 2003)
- What are the most effective ways to reduce provider absenteeism in schools and government clinics? (Banerjee and Duflo 2005)
- Does rewarding teachers or children for improved test scores lead to sustained boosts in students’ learning? Which is more effective: incentives for teachers or for students? (Glewwe et al 2003 and Kremer 2003)
- Does community monitoring of development programs reduce corruption? Is it more or less effective than audits? (Olken 2004)
It is still feasible, though more resource-intensive, to use randomized assignment to assess programs that have externalities – that is, to measure the net impact on a person (or community) of a program delivered to a neighboring person or community (a sketch of estimating such spillovers follows the list below):
- Does school-based mass treatment of children for intestinal parasites, in a high prevalence area, improve health and schooling even for those not receiving the treatment? (Miguel and Kremer 2001)
- Does agricultural extension affect not only the farmers it reaches directly but also their neighbors, through the diffusion of learning? (Conley and Udry 2001)
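A hedged sketch of how such spillovers can be measured, assuming hypothetical record fields: when treatment is randomized at the community level, the externality is estimated by comparing untreated individuals in treated communities with individuals in untreated (control) communities.

```python
# Illustrative spillover estimation under community-level randomization.
import statistics

def spillover_estimates(records):
    """records: dicts with keys 'community_treated' (bool),
    'individually_treated' (bool), and 'outcome' (float) - hypothetical fields."""
    def mean_outcome(keep):
        return statistics.mean(r["outcome"] for r in records if keep(r))

    pure_control = mean_outcome(lambda r: not r["community_treated"])
    directly_treated = mean_outcome(
        lambda r: r["community_treated"] and r["individually_treated"])
    untreated_neighbors = mean_outcome(
        lambda r: r["community_treated"] and not r["individually_treated"])

    return {
        "direct_effect": directly_treated - pure_control,
        "spillover_effect": untreated_neighbors - pure_control,  # the externality
    }
```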
In other cases, the impact of a national program delivering new social services can be measured by contrasting changes across districts or municipalities as the program is introduced in successive waves (see the sketch after the example below):
- Do cash transfers to poor families that are conditional on school attendance and utilization of preventive health care services improve children’s health and schooling? (Gertler 2000, Schultz 2000)
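A minimal sketch of the phased-rollout logic, with hypothetical district records: districts that receive the program in a later wave serve as the comparison group for early-wave districts, and the impact is the difference in how outcomes change over the same period (a simple difference-in-differences).

```python
# Illustrative phased-rollout (difference-in-differences) comparison.
import statistics

def phased_rollout_estimate(districts):
    """districts: dicts with 'wave' (1 = early, 2 = late) and 'baseline' /
    'midline' outcomes, with midline measured before wave 2 is treated."""
    changes_early = [d["midline"] - d["baseline"] for d in districts if d["wave"] == 1]
    changes_late = [d["midline"] - d["baseline"] for d in districts if d["wave"] == 2]
    # Late-wave districts supply the counterfactual trend for early-wave ones.
    return statistics.mean(changes_early) - statistics.mean(changes_late)
```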
Even questions that might be considered quite difficult to answer – such as the impact of gender on political decision-making – can be rigorously studied:
- Do quotas for women's participation in political decision-making improve allocations of public funds? (Chattopadhyay and Duflo 2001)
In short, the only real limitation on this type of impact evaluation is that it cannot address questions for which no credible counterfactual can be constructed. But even in these cases, there is usually a series of underlying questions that impact evaluations can answer, bringing us closer to an overall judgment. For example, it would be difficult to develop a random assignment study to assess "budget support" (i.e. recent strategies of development assistance that replace project-specific funding with direct financial support to developing country governments, linked to broader policy targets). However, if we had a good evidence base on the most effective interventions in particular circumstances (built up from impact evaluations), we would be in a position to judge indirectly the impact of moving from project assistance to direct budget support, because we would have a measure of the effectiveness of the programs implemented by the government under the budget-support scheme. By contrast, in the absence of information about which programs are effective, it is impossible to judge whether a government has made a "better" or "worse" allocation of its budget. More importantly, without good impact evaluations, governments that receive additional budgetary support have no information on which to base their decisions about how to spend the money effectively.
Impact evaluations can be ethical
Some argue that impact evaluations that rely on collecting data from control groups are unethical because they exclude people from program benefits. But this criticism only applies when resources are available for serving everyone as soon as the program starts. In fact, whenever funds are scarce or programs need to be expanded in phases, only a portion of potential beneficiaries can be reached at any time. Choosing who initially participates by lottery is no less ethical (and perhaps even more so) than many other approaches. Some programs are already allocated by lottery when they are oversubscribed (e.g. school choice in the United States, voucher programs in Colombia) or for transparency and fairness (e.g. random rotation of the local government seats set aside for women in Indian elections).
Furthermore, whenever we have reasonable doubts about a program’s efficacy or concerns about unforeseen negative effects, it is not only ethical but an imperative responsibility to monitor and evaluate its impact adequately. This is exactly the premise underlying the routine use and regulation of medical trials, and it applies to many social programs as much as to medicine.
Finally, starting with a properly evaluated pilot program can greatly increase the number of eventual program beneficiaries, since the evidence of success will provide support for continuing and expanding an effective program.
Ignorance is more expensive than impact evaluations
It is also argued that impact evaluations are too costly or difficult. This argument is often made by comparing the cost of an evaluation to the cost of the program it studies. But the appropriate comparator is not the program's cost but the value of the knowledge the evaluation would produce. For example, evaluations of demonstration training programs in the US and Latin America have sometimes exceeded a third of the initial program costs, but their results affected decisions regarding the rollout of much larger national programs. In these cases, the value of scaling up programs that worked, and of avoiding or redesigning those that were ineffective, was extremely high. To the degree that these findings were generalizable, they yielded benefits to other countries as well. Thus, a few well-selected impact evaluations can generate knowledge that influences the design and adoption of an entire class of interventions around the world.
Sometimes the additional cost of doing a good impact evaluation is actually quite small. When projects are results-oriented and require baseline data, an intelligent design for data-gathering can determine whether or not an impact evaluation will be feasible – sometimes without any additional cost of data collection. Costs are also likely to be lower in studies of developing country programs because the field costs of surveys and local researchers are generally lower than in higher-income countries.
The principal cost of a random assignment study is data collection, and collecting data for a bad study is just as expensive as collecting it for a good one. For example, a large primary education program in India (DPEP) spent millions of dollars collecting data on all the districts in which the program was implemented. But, as noted above, this kind of data collection does not lend itself to a proper impact evaluation. A proper data collection strategy – for example, randomly choosing which districts would be offered the program and then conducting surveys in a sample of districts that were and were not offered it, as sketched below – might have cost less and, most importantly, would have provided useful information about the program’s impact (Duflo and Kremer 2003, Duflo 2004).
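A hedged sketch of such a strategy – not DPEP’s actual design, and with hypothetical names throughout: randomize which districts are offered the program, then survey only a random subsample from each arm rather than collecting data everywhere.

```python
# Illustrative evaluation sampling design (hypothetical, not DPEP's).
import random

def evaluation_sample(districts, n_survey_per_arm, seed=0):
    """Randomly assign districts to be offered the program or not, then
    pick a random subsample from each arm for surveying."""
    rng = random.Random(seed)
    shuffled = list(districts)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    offered, not_offered = shuffled[:half], shuffled[half:]
    return {
        "survey_offered": rng.sample(offered, n_survey_per_arm),
        "survey_not_offered": rng.sample(not_offered, n_survey_per_arm),
    }
```

Surveying samples from both arms costs less than a census of program districts and, unlike the census, yields a credible comparison group.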
At a minimum, critics must recognize that the cost or difficulty of good impact evaluations is not a universal fact, but rather one that has to be judged for particular questions and contexts.
Impact evaluations can provide timely information
A fourth critique is that impact evaluations take too long to produce results that are useful for decisions. However, the time needed depends largely on the questions being studied. Some rigorous impact evaluations produce results within a matter of months. Others take longer but are still available in time to affect important policy decisions. For example, the initial findings of the impact evaluation of Mexico’s national conditional cash transfer program were available in time to convince a new administration to preserve it. A rigorous impact evaluation comparing different kinds of teachers provided valuable information to Pratham (an NGO in India) in time for it to expand a program of community-based teachers who had been shown to be at least as effective as, and less costly than, new teachers.
It is also possible to design impact evaluations that generate useful feedback during implementation. For example, a multi-year study of the impact of HIV education in Kenya was designed to monitor the long-term impact but also to assess intermediate outcomes, such as the accuracy of knowledge about HIV transmission. The transfer of knowledge is a necessary, but not sufficient, condition for the program to have an impact; measuring the success or failure of reaching such intermediate goals can help program managers make the adjustments needed to improve implementation.
More fundamentally, it is better to have accurate information about which programs work, even if it takes some years to acquire, than to have inaccurate information quickly.
Impact evaluations complement other studies
Critics sometimes claim that impact evaluations only tell us whether or not something has an impact without telling us why and how. But a good impact evaluation can provide reliable evidence about the mechanism by which the outcome is achieved when it simultaneously collects information on processes and intermediate outcomes. Impact evaluations are not a replacement for sound theories and models, needs assessment, monitoring, and operational evaluations. All of these elements are necessary to complement the analysis of impact. But it is equally true that the knowledge gained from impact evaluations is a necessary complement to these other kinds of analyses.
Findings from impact evaluation can be simple and transparent
The final critique is that such studies are too complex to be understood by policymakers and therefore do not influence policymaking. In fact, good impact evaluations – and randomized evaluations in particular – are, with a little work, relatively easy to present to policymakers. MDRC, formerly known as the Manpower Demonstration Research Corporation, conducted randomized controlled trials of numerous state welfare programs in the United States.[i] Because the findings were readily conveyed to policymakers, MDRC’s studies had a significant impact on U.S. welfare reform legislation in the mid-1980s (Wiseman et al 1991, Gueron 1997, Gueron 2002).
Other cases where impact evaluation affected subsequent policy can be found in Latin America. In the 1980s, evaluations of radio-assisted education programs in Nicaragua led to widespread replication of this promising intervention (Jamison 1978). The impact evaluation of PROGRESA in the mid-1990s is widely credited with preserving that social program through the transition to an opposition administration (the program was retained, though renamed Oportunidades). Furthermore, the PROGRESA evaluation influenced the adoption of similar conditional cash transfer programs in many other countries (Morley and Coady 2003).
If we don't start now, then when will we ever learn?
Most important of all, if impact evaluations are not started today, then we will never have access to the information needed for evidence-based decisions. This point has been made in recent declarations associated with the creation of the Global Fund for AIDS, TB and Malaria, the replenishment of World Bank and African Development Bank concessional funds, and the formulation of the Millennium Development Goals. In each case, attention to measurement of results makes it imperative to lay down the foundations today so that we can learn about the effects of our actions in the future. Any investment takes time to yield benefits, and building the kind of knowledge generated by impact evaluations is one of the best investments we can make today.
[i] MDRC is a private non-profit organization established in 1974 with support from the Ford Foundation and six US government agencies to assess welfare, training, and education programs.