Bill Savedoff is a senior fellow at the Center for Global Development where he works on issues of aid effectiveness and health policy. His current research focuses on the use of performance payments in aid programs and problems posed by corruption. At the Center, Savedoff played a leading role in the Evaluation Gap Initiative and co-authored Cash on Delivery Aid with Nancy Birdsall. Before joining the Center, Savedoff prepared, coordinated, and advised development projects in Latin America, Africa and Asia for the Inter-American Development Bank and the World Health Organization. As a Senior Partner at Social Insight, Savedoff worked for clients including the National Institutes of Health, Transparency International, and the World Bank. He has published books and articles on labor markets, health, education, water, and housing including “What Should a Country Spend on Health?,” Governing Mandatory Health Insurance, and Diagnosis Corruption.
Last year, our colleague Jonah Busch revealed that India had surpassed Norway as the largest results-based funder of forest conservation. Now, India has become the single largest payer for outcomes in a nationwide sanitation initiative.
For years, conventional aid programs have tried to improve sanitation by building infrastructure for potable water and latrines. Successive failures led them to refocus on improving maintenance systems or trying to change social norms, but usually by prejudging the “right” plan of action and paying for inputs. India has had its share of these failures, as discussed in the press and documented by researchers.
Now the Government of India and the World Bank have adopted an approach using principles we describe as Cash on Delivery (COD). The program follows three of these principles by linking payments to outcomes, not inputs; independently verifying outcomes; and allowing recipients to take the lead (in other words, the program is “hands off”).
How does it work? India and the World Bank recently signed a US$1.5 billion loan that will finance incentives to states that succeed in reducing open defecation while implementing the Swachh Bharat Abhiyan (or Clean India Mission)—a five-year, US$22 billion national government program. The World Bank loan is structured as a Program for Results (PforR) operation and has two parts. In the first, the World Bank pays the central government based on verified outcomes. In the second, the central government releases incentive grants to the states, also based on declines in open defecation. In addition to its sheer scale, the program is distinctive because:
It pays against outcomes—the ultimate development goals, far along the “results chain,” that we care about. For example, 50 percent of the US$1.5 billion loan is tied to reductions in open defecation and an additional 30 percent to sustaining open defecation-free status in villages.
Annual national surveys conducted by a third party will independently verify the level of progress achieved against these indicators.
As it gives states the flexibility to decide how best to reduce open defecation, this results-based approach is "hands-off." States can decide where to allocate funds and adopt strategies that take the local context into account.
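The disbursement logic described above (payments released only against independently verified declines in open defecation) can be sketched in a few lines. This is a minimal illustration with a hypothetical per-percentage-point rate and tranche cap; the actual indicators and payment formulas are set in the loan agreement.

```python
# A minimal sketch of an outcome-linked disbursement rule along COD lines.
# The rate and cap below are hypothetical illustrations, not the figures
# in the India loan agreement.

def cod_disbursement(verified_reduction_pts: float,
                     rate_per_point_usd: float,
                     tranche_cap_usd: float) -> float:
    """Pay per verified percentage-point decline in open defecation,
    capped at the tranche amount; no payment without verified progress."""
    if verified_reduction_pts <= 0:
        return 0.0
    return min(verified_reduction_pts * rate_per_point_usd, tranche_cap_usd)
```

The key design feature is visible in the code itself: nothing is disbursed until an independent survey verifies the outcome, and how the reduction was achieved never enters the calculation.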
These three features set the new India program apart from conventional aid, and even from 34 other PforR operations in the World Bank’s portfolio. PforR loans disburse funds in relation to indicators that are chosen to reflect an operation’s goals and context. This flexible instrument allows operations to pay for outcomes along the lines of a COD agreement, though most of them focus on institutional changes, activities, and outputs.
A new CGD policy paper reviews the World Bank’s first 35 PforR operations and shows that, unlike the India program, most of them link payments to outputs and actions. These are measured by such indicators as the availability of anti-retroviral drugs in Mozambique, the number of telephone calls received by Citizen Contact Centers in Pakistan, and the creation of a monitoring system for technical and vocational training programs in Brazil. About one-third of the PforR loans link a portion of the disbursements to an outcome, like the number of children who can read in Tanzania or increases in crop yields in Rwanda. India’s sanitation program stands at one end of the spectrum by allocating 90 percent of the loan to three outcomes. In addition, the results in the India sanitation program are independently verified, in contrast to some operations that rely on self-reported progress.
As this largest of all COD programs launches, we can expect challenges in implementation. For example, the first annual survey will establish the baseline levels against which progress is measured and targets are set; project managers will develop procedures that could either constrain or reinforce state autonomy; and it may take time for the actors involved to fully understand that they have autonomy and to respond to the incentives. Nevertheless, this is a tremendously exciting opportunity to see whether paying for outcomes can improve the health of millions of rural Indians faster and more effectively than past efforts.
We would like to thank Alan Gelb and Anit Mukherjee for feedback and input on this blog.
Health aid pays for life-saving medicines, products, and services in the poorest countries in the world. Funding for such uses needs to be smooth and uninterrupted. But when fraud is detected, funds are subject to sudden stops and starts—the result of a sequence of events set off by the scandal cycle in health aid. We examine this idea in a new CGD policy paper.
To understand the scandal cycle, we looked at four cases of fraud and response involving the World Bank in India, USAID in Afghanistan, the Global Fund in Mali, Djibouti and Mauritania, and European donors in Zambia. While corruption is discovered in different ways, scandals tend to erupt when the press publicizes it or a funder reacts strongly. Once allegations are in the public eye, funders typically react by suspending aid. Then, they work with recipients to create action plans for improving financial management systems, and eventually resume funding.
This scandal cycle is, unfortunately, all too common. In May, the Global Fund published an investigation that tracked down $3.8 million in fraudulent expenditures at Nigeria’s Department of Health Planning, Research & Statistics. The Fund’s executive director issued a statement reaffirming the Fund’s “zero tolerance of corruption” policy, underscoring that the Fund has frozen disbursements to several Nigerian agencies, and calling for reforms to government control measures.
As with the cases we analyzed in our paper, the focus on fraud often comes at the expense of considering the scale of corruption and the impact of disruption on health programs. While $3.8 million is no small number, it represents less than one percent of the $889 million in grants to Nigeria that the Global Fund audited in a companion report. Furthermore, the impact of international support on improving health has been rather large; the Global Fund’s own statement indicates that international support has helped Nigeria reduce deaths from malaria by 62 percent since 2000.
Halting disbursements to health programs can have serious consequences for service delivery, health outcomes, and institutional development. In light of the scale of fraud and the potential health impact, is suspending aid an effective response? And without information on health impact, how would we know?
We argue that funders may be able to escape the scandal cycle—and reduce such disruptions—by paying greater attention to information on program achievements. Currently, funders pay a lot of attention to procedural issues. For example, a 2013 report from the Special Inspector General for Afghanistan Reconstruction (SIGAR) documented weak accounting systems at the Afghan Ministry of Health. Even though the report had no direct evidence of fraud and the health program was successfully delivering services, SIGAR recommended USAID suspend the program.
By contrast, the World Bank’s 2008 Detailed Implementation Review of the Indian health sector not only included evidence of procedural failures, such as bid rigging, but also documented results failures, like continuing high malaria rates and inoperative hospitals. If the World Bank and India had reported these results failures earlier, the cases where corruption was big enough to affect programs would have come to light much sooner.
We think results on service delivery, population health, and institutional development are the key piece of information that could change the dynamics of the scandal cycle. This kind of information can help funders communicate more effectively about why they are deciding to suspend or continue aid, set appropriate standards for when aid should be halted, and establish new funding mechanisms that make it more difficult to divert funds.
We recommend the following three steps to improve funder response:
Communicate using results. When a scandal erupts, it is important to communicate to stakeholders, the media, and the broader public the actions the funder is taking to control or prevent corruption. But emphasizing whether health aid programs are achieving intended results is also an essential component of the communications strategy. If a program is achieving results, stakeholders and constituents would better understand a funder’s decision not to suspend aid when a scandal erupts (while investigating abuse and working with the recipient to address the problem).
Differentiate responses by results. In addition to responding to corruption allegations (which typically come from whistleblowers), tracking program results could help funders detect corruption. If a program is falling short of achieving results, corruption might be a contributing factor and an investigation could help determine whether and how much. Moreover, results data would allow funders to determine whether corruption is—or is not—hampering program implementation, and to recalibrate anti-corruption controls accordingly.
Disburse in proportion to results. Where feasible, paying for results in health could help ensure that funds are only paid out when results are achieved. This approach makes it harder to divert funds because payments only occur after the program’s impact is measured. In programs that pay for results, dishonest people can only skim off funds if they have been very efficient at generating impact. In practice, they are likely to simply set their sights elsewhere.
The Global Fund’s recent statement recognizes the importance of communicating the results of its health grants to Nigeria, but it doesn’t address whether it is helpful to suspend aid over a relatively small amount of fraud or lack of supporting documentation. Our paper encourages funders to incorporate information about program results into their risk management strategies so they can communicate better, detect corruption sooner, and make more considered choices about creating or responding to scandals.
Global health action has been remarkably successful at saving lives and preventing illness in many of the world’s poorest countries. This is a key reason that funding for global health initiatives has increased in the last twenty years. Nevertheless, financial support is periodically jeopardized when scandals erupt over allegations of corruption, sometimes halting health programs altogether.
What should tomorrow’s aid agencies look like in a landscape where the global goal is to ensure sustainable development? In the past, the role of aid has mainly been to “finance” specific projects or services, with a strong sense of donor identity and marked projections of donor interests. A modern approach to development assistance, however, focuses on the catalytic role of institutions and their capacity to mobilize expertise and resources towards shared objectives.
A billion premature deaths this century—that’s the estimated toll of smoking. As 80% of the world’s smokers live in low- to middle-income countries, that’s a huge problem for the developing world. That's a lot of not only lost lives but also lost economic output, and increased strain on already-overburdened health budgets.
So what’s the solution? You’ve heard before from CGD senior fellow Bill Savedoff that increasing tobacco taxes can actually help turn people away from nicotine; on this week’s podcast for World No Tobacco Day, you’ll hear another idea.
University of Ottawa professor David Sweanor, who helped develop Canada’s tobacco control laws, believes that smokers should be encouraged to switch to less harmful nicotine delivery systems, like e-cigarettes. But does switching our focus to harm reduction mean letting go of the “endgame”—a completely tobacco-free future?
That’s the question that Sweanor and Savedoff tackle in this week’s podcast. In the clip below, Sweanor argues that nicotine itself is not particularly hazardous. But, Savedoff asks, if “puffing” on e-cigarettes becomes the norm, will that undo all the progress we’ve made towards eradication?
Is the tobacco epidemic more like smallpox or HIV? It’s an important question. If it is like smallpox, then we can pursue strategies to eradicate tobacco as a risk to human health. However, if it is like HIV, we instead need to be thinking in terms of controlling and managing the epidemic. I have tended to favor the idea of eradication. But this “World No Tobacco Day,” I find myself reconsidering.
I aired some of these thoughts in a CGD podcast with David Sweanor, a long-time veteran of the war on smoking, and Rajesh Mirchandani, where we discussed the promise and misgivings associated with encouraging people to switch from cigarettes to Electronic Nicotine-Delivery Systems (ENDS, commonly known as e-cigarettes). This strategy of harm reduction could be extremely effective at reducing premature deaths, but it would mean accepting that nicotine addiction, like unsafe sex, is something that won’t be eliminated anytime soon.
Eradicating deaths from smoking is not necessarily a pipe dream
It was possible to eradicate smallpox because human beings were its only host, transmission didn’t involve another reservoir for infection, and an effective vaccine existed to prevent infection and interrupt transmission. Smoking cigarettes shares many of these factors. If human beings stopped growing tobacco, manufacturing cigarettes, and selling them, then the prevalence of associated diseases—such as lung cancer—would plummet. This idea of moving beyond tobacco “control” to something like a “tobacco-free future,” in which smoking is no longer a major health threat, is known as an “endgame proposal.”
Most strategies today for reducing death and disease from tobacco are compatible with endgame proposals but are essentially efforts to manage, constrain, and reduce the scope of the epidemic. This year, WHO is prominently discussing the effectiveness of plain packaging, which limits the ability of companies to entice new smokers and undercuts profitability by increasing competition. Other control measures include tobacco taxes (which I’ve argued are the single best health policy in the world) and measures to ban smoking in public places, restrict sales to minors, prohibit advertising and sports sponsorships, and limit cigarette company access to policymakers. These control measures are responsible for remarkable progress against smoking in the last few decades; yet the number of smokers has continued to rise.
When I’m feeling optimistic, I envision that these control efforts will reduce demand enough to make cigarette manufacturing unprofitable, marketing efforts weaker, and interference with regulation unlikely. But the process is slow (another 30 million people start smoking every year), the number of deaths during the transition is enormous (rising to 10 million per year in 2030), and the context is constantly changing. Cigarette manufacturers have demonstrated incredible ingenuity at outsmarting taxes, advertising prohibitions, and other regulations. In countries with the largest smoking populations, multinationals aren’t even necessarily the main drivers of the epidemic. China has a large state tobacco monopoly. In India, high-toxin chewing tobacco and hand-rolled bidis are a major threat. Tobacco consumption in Indonesia is diverse, and Indonesia is the only large country that has failed to muster the political support to sign the Framework Convention on Tobacco Control.
Electronic Nicotine-Delivery Systems are changing the game
The Royal College of Physicians (RCP) concluded that using ENDS is substantially less hazardous than smoking cigarettes. This creates an alternative for those currently addicted to nicotine and who have been unable to quit. By regulating and taxing ENDS less stringently than cigarettes, it could be possible to accelerate progress against smoking-related diseases. Furthermore, as David Sweanor argues in our podcast:
ENDS producers could force cigarette companies to face more competition, lowering their profitability (so long as regulators don’t constrain the ENDS market); and
plaintiffs’ attorneys could sue combustible cigarette manufacturers for selling unreasonably dangerous nicotine-delivery devices (because of a safer alternative in the market).
While I find these points convincing, I still have misgivings. My biggest worry is that cigarette manufacturers will co-opt ENDS so as to encourage nicotine-dependence and move users to combustible cigarettes (a concern also raised in the RCP report). I also wonder if ENDS will become a way to renormalize the social behavior of “puffing” in public and roll back what little gain we’ve made in de-glamorizing smoking.
More importantly, does using ENDS as a harm-reduction strategy mean giving up on the idea of an endgame? As Clive Bates asks, what exactly are we trying to eradicate: death and disease or nicotine-addiction? The two issues are quite different.
I’m not willing to give up on the idea of a world free of tobacco-related illnesses. As with smallpox, the “reservoir” of this epidemic is exclusive to our species. The associated diseases could be eradicated if we were to eliminate the political, social, and economic mechanisms that support “transmission.” If that is simply infeasible because of the genetic and behavioral aspects of nicotine use, then we need ENDS for the endgame. That requires the right public health policies: regulating and taxing ENDS in a way that discourages nicotine addiction, offers current smokers a safer alternative, and disrupts the cigarette market. If successful, this approach would help people live with nicotine rather than die from smoking.
I’d like to thank David Sweanor and Prabhat Jha for comments on an earlier draft.
In a recent blog, Duncan Green wonders if “Pay by Results” (PbR) programs are overhyped and questions whether foreign aid agencies and NGOs should be pursuing them at all. If PbR programs are taking off, it doesn’t seem to be for a particular form of PbR that we describe as Cash on Delivery (COD) Aid. That’s not surprising. COD requires donors to be willing to try something substantially different from conventional aid. In particular, it requires them to recognize up front that development programs don’t always achieve results in the first few years, and it takes away the illusion of progress created by adherence to pre-approved plans and implementation schedules.
Only a few countries have stepped into this new way of doing aid. The UK experimented with paying Ethiopia a small sum for each additional secondary student who completed school and took a test. To their credit, the UK and Ethiopia stuck by the agreement even after first-year outcomes came in much lower than anticipated. Norway is certainly a champion of patience among donors, having maintained its commitment to Indonesia for five years now, still waiting for Indonesia to complete a number of preconditions of the performance aid package.
Much of what Duncan Green questions in PbR applies to payments to specific service providers, NGOs, communities, or households. Indeed, the Bond study that he cites is focused on paying service providers for results. But these programs differ from COD in important ways. COD pays governments, not service providers. COD pays for only one or a few broad and important measures of outcomes that the government and donor care about, such as educated children, healthier people, access to inexpensive energy, or more efficient tax administration. COD requires that the outcome indicator be regularly reported to the public.
Why do these differences matter? Because without these features, PbR programs tend to create “deliverables” defined by donors, to be delivered on a preset schedule, which invites creation of “plans” and “result chains” and fixed implementation schedules. That, in turn, diverts donor attention from outcomes to inputs, indulging donor impatience and discouraging the kinds of local initiative and innovation (e.g., Problem-Driven Iterative Adaptation) that are ultimately the best guarantee of sustained progress.
Debates over something as abstract and generalized as PbR will continue to be fruitless unless people make clear distinctions about who is being paid, for what, and how. Performance agreements for a consultant writing reports, a health clinic caring for infectious disease, or an energy firm delivering electricity differ from each other, and differ from performance agreements with governments. PbR of the COD Aid type — paying governments in proportion to outcomes as COD Aid proposes — is still largely unexplored.
PbR may be overhyped at the same time that at least one type of PbR is underutilized.
Cash transfer programs have shown mostly consistent success at improving conditions that matter for development: smoothing consumption, increasing school attendance and use of health care, and sometimes improving nutritional status and helping households accumulate productive assets, among others. This event will highlight cash transfers as a tool for development and pose two questions: when are cash transfers better than traditional foreign aid? And should aid be benchmarked against the cost-effectiveness of cash transfers?
Opening keynote remarks from Paul Niehaus, co-founder of GiveDirectly and Professor at the University of California San Diego, will argue for benchmarking in-kind aid interventions against cash transfers, and will be followed by a debate on the relevance and feasibility of this approach. Jenny Aker, Jishnu Das, and Sudhanshu Handa will argue for benchmarking, while Ferdinando Regalia, David Roodman, and Bill Savedoff will argue against. Audience Q&A will follow. See the agenda for more information.
I never cease to be astonished by the amount of energy people put into claiming that Randomized Control Trials (RCTs) are the be-all and end-all of impact evaluation methods; nor at the energy people put into claiming that RCTs are marginal, costly, and a waste of time.
In both cases, people build a case against one side with a kind of argument that they fail to apply to the side they favor. Angus Deaton roundly exposed the assumptions that qualify the use of RCTs, and Lant Pritchett questions whether RCTs ever have external validity. But both ignore that most evaluation work, including most studies that claim to measure impact, is neither rigorous quasi-experimental research nor sophisticated qualitative study. It is a mix of methods, some of whose assumptions are poorly explicated and many of which fail to explicitly address even the most basic concerns about bias.
In a recent blog, Philipp Krause highlights another part of the anti-RCT critique. He correctly argues that the flat theory of change expressed by many RCT supporters (a straight line from study results to efficient policy decisions) is absurd. But, like many other RCT critics, he then claims that alternative evaluations are cheap and effective, ignoring just how much money is poured into such studies and how many of them also lack rigor and fail to influence policy. Krause cites Chile’s public evaluation system as evidence that large benefits can be obtained from relatively inexpensive studies. But Chile is an outlier, the exception rather than the rule. Yes, Chile’s experience shows that inexpensive studies looking at a particular set of questions can be very useful. No, it does not prove that RCTs are marginal and useless.
Krause and Pritchett argue that RCTs are marginal, and I agree, but not for the reasons that they put forward. RCTs are marginal because only about 200 of them (my estimate based on a 3ie database) are being started in any given year on topics related to development programs. This is dwarfed by the thousands of evaluations being conducted using expert interviews, focus groups, non-purposive samples, and quasi-experimental methods. RCTs are hot and visible which makes it look like they’re the dominant form of evaluation (especially if you take your sample at a J-PAL 10th anniversary event), but they’re not the dominant form of evaluation.
And here’s the main point: No method can be proven less costly or more effective than any other without reference to the specific context in which a specific evaluation question is being asked.
Abstract methodological debates are worse than useless. They are harmful because they cast doubts on good studies along with the bad. They provide excuses for managers and politicians to cut spending on evaluation work rather than making the case to improve the appropriateness of methods applied to different kinds of policy questions. And they confuse people regarding whether there even is any such thing as a better or worse study. Ruth Levine makes an eloquent case that we risk throwing out the baby (genuine efforts to find out what policies do) and drowning in the bathwater (of polarized debates over methodological superiority).
So here’s my plea: Stop criticizing evaluation methods in the abstract.
That means, if you read a study and think it is asking irrelevant questions or applying the wrong method to answer its question, then focus your criticism on that specific study – and propose a better alternative for that case, if you know of one.
RCTs are not the be-all and end-all of evaluation evidence. Neither are regression discontinuity, utilization-focused evaluation, realist evaluation, or participatory evaluation. Each is part-of-the-all, with a place in addressing different questions in different places for different purposes.
The Addis Ababa Action Agenda runs to 20 pages. And while those pages are inevitably bubble-wrapped in diplo-speak and hat-tipping, there is a solid package of proposals nestled within. They cover domestic public finance, private finance, international public finance, trade, debt, technology, data, and systemic issues. Among many other things, the Agenda calls for more tax and better tax (less regressive, more focused on pollution and tobacco). And it is long and specific on base erosion, tax evasion, tax competition, and tax cooperation. It calls for financial inclusion and cheaper remittances. The draft discusses blended finance and a larger role for market-based instruments to support infrastructure rollout, as well as a new measure of “Total Official Support for Sustainable Development.” It calls for Multilateral Development Bank reform, including new graduation criteria and scaling up. And it suggests a global compact to guarantee a universal package of basic social services and a second compact covering infrastructure. Finally, the draft has a good section on technology, including the need for public finance and flexibility on intellectual property rights.
With the US Congress considering cuts to foreign assistance and aid budgets in other donor countries coming under increased pressure, evidence about what works in global development is more important than ever. Evidence should inform decisions on where to allocate scarce resources—but to do so, evaluations must be of good quality. The evaluation community has made tremendous progress on quality over the past decade. Several funders have implemented new evaluation policies and most are conducting more evaluations than ever before. But less is known about how well aid agencies are evaluating programs.
To fill in the gap, we—together with our colleagues Julia Raifman Goldberg, Felix Lam, and Alex Radunsky—set out to assess the quality of global health evaluations (both performance and impact evaluations). We looked specifically at publicly available evaluations of large-scale health programs from five major funders: USAID, the Global Fund, PEPFAR, DFID, and IDA at the World Bank. We describe our findings in a new CGD Working Paper and accompanying brief. Check out the brief recap of our findings below.
What types of evaluations are aid agencies conducting?
We identified a total of 299 evaluations of global health programs published between 2009 and 2014. One feature stood out to us: performance evaluations made up an overwhelming majority (91 percent), with impact evaluations accounting for less than 10 percent. This is comparable to the share found across USAID evaluations in all sectors by an earlier study. And among impact evaluations, those using experimental methods, known as randomized controlled trials or RCTs, constituted a minority (we only found five RCTs). When looking at evaluations commissioned or conducted by major funders, the often-made criticism that RCTs are displacing other forms of evaluation doesn’t hold up.
How well are aid agencies evaluating global health programs?
We randomly sampled 37 evaluations and applied a standardized assessment approach with two reviewers rating each evaluation. To answer questions about evaluation quality, we used three criteria from the evaluation literature: relevance, validity, and reliability. We considered evaluations as relevant if the evaluation addressed questions related to the means or ends of an intervention, and used appropriate data to answer those questions. Evaluations were considered valid if analyses were methodologically sound and conclusions were derived logically and consistently from the findings. Evaluations were considered reliable if the method and analysis would be likely to yield similar conclusions if the evaluation were repeated in the same or similar context.
We constructed four aggregate scores (on a three-point scale) to correspond with these criteria. Overall, we found that most evaluations did not meet social science standards in terms of relevance, validity, and reliability; only a relatively small share of evaluations received a high score.
Looking across different types of evaluations, we found that impact evaluations generally scored better than performance evaluations on measures of validity and reliability.
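As a rough illustration of the two-reviewer rating setup described above, the snippet below averages each reviewer’s ratings on a three-point scale per criterion. The aggregation rule here is an assumption for illustration; the paper’s actual scoring procedure may differ.

```python
from statistics import mean

# Hypothetical sketch of combining two reviewers' three-point ratings
# (1 = low, 2 = medium, 3 = high) for each quality criterion.
# Assumes both reviewers rate the same set of criteria.

def aggregate_scores(reviewer_a: dict, reviewer_b: dict) -> dict:
    """Average the two reviewers' ratings criterion by criterion."""
    return {c: mean([reviewer_a[c], reviewer_b[c]]) for c in reviewer_a}

scores = aggregate_scores(
    {"relevance": 3, "validity": 2, "reliability": 1},
    {"relevance": 2, "validity": 2, "reliability": 2},
)
```

Averaging across raters is one simple way to smooth individual reviewer judgment; disagreement between the two ratings can also itself flag evaluations that need a closer look.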
What can aid agencies do better going forward?
Building on our analysis, we developed 10 recommendations for aid agency staff overseeing and managing evaluations to improve quality.
1. Classify the evaluation purpose by including this information in the title and abstract, as well as in coding/tagging categories on the agency website.
2. Discuss evaluator independence by acknowledging the evaluators' institutional affiliations and any financial conflicts of interest.
3. Disclose the costs and duration of programs and evaluations.
4. Plan and design the evaluation before program implementation begins; we found that early planning was associated with higher evaluation quality.
5. State the evaluation question(s) clearly to ensure the right kinds of data are collected and an appropriate methodology is used.
6. Explain the theoretical framework underlying the evaluation.
7. Explain sampling and data collection methods so that subsequent researchers could apply them in another context and readers can judge the likelihood of bias.
8. Improve data collection methods, using purposeful or random sampling where possible, to provide more confidence in findings.
9. Triangulate findings using varied sources of qualitative and quantitative data.
10. Be transparent on data and ethics by publishing data in usable formats and taking appropriate measures to protect privacy and assure confidentiality.
This set of recommendations draws on the high-quality evaluations we found in our sample. These examples showed that it is possible to conduct good quality evaluations for a range of methodologies and purposes. In many cases, quality improvement is possible within existing budgets by planning early or using better data collection approaches. Taking steps to improve quality can help ensure evaluations promote learning about what works and hold funders and implementers accountable—with an eye on increasing value for money and maximizing development impact.
We assessed the methodological quality of global health program evaluations from five major funders between 2009 and 2014. We found that most evaluations did not meet social science methodological standards in terms of relevance, validity, and reliability. Nevertheless, good quality evaluations made it possible to identify ten recommendations for improving evaluations, including a robust finding that early planning is associated with better quality.
Cash on Delivery Aid (COD Aid) is moving from concept to reality, as I learned on a recent trip to Europe. In the process, we are learning a lot about measuring outcomes and other implementation challenges. While I heard about the ways aid agencies are beginning to try COD Aid or similar initiatives, the resistance they face internally told me a lot about the contradictions we’ve lived with in foreign aid for a long time.
One of the most prominent issues underlying all these discussions was the growing disenchantment in Europe with general budget support programs. The main political argument being leveled against budget support is that it can’t demonstrate performance, and the basic response has been to move toward project-specific aid. If European aid agencies move in this direction, it will be a shame – throwing the baby out with the bathwater. My hope is that they will see that they can preserve the good aspects of budget support – working through country systems and giving recipients greater ownership – by agreeing to disburse flexible funds in relation to progress on a few key high-level indicators, such as educational attainment, reduced child mortality, better security, less deforestation, or cleaner energy.
The discussion of Results-Based Aid is also extremely useful for uncovering the dynamics of foreign aid politics. For example, a key advantage of results-based mechanisms is that they can reduce the transaction costs associated with tracking inputs. Nevertheless, we heard of several cases in which the measurement of outcome indicators was simply added on top of existing spending-control mechanisms. In addition, results-based aid from one government to another should be an opportunity to keep attention on broad high-level goals and leave the recipient with flexibility in how it responds. Instead, it is extremely tempting for the discussion to fall into using the payments to get “them” to do what “we” think they should.
The entire conception of foreign aid is changing – with new actors, new constraints, and new ideas. I think the process of working out these new ideas in practice will show if the system can really be reformed or whether it will be increasingly marginalized.
The United Kingdom has been a stalwart funder and innovator in foreign assistance for almost 20 years. In 2011, it created the Independent Commission for Aid Impact (ICAI) to report to Parliament on the country’s growing aid portfolio. ICAI is a QUANGO in Brit-speak – a quasi-autonomous non-governmental organization – with a four-year mandate that is undergoing review this year. Recently, I took a look at the reports it has produced to see whether the organization is fulfilling its role in holding the country’s overseas development aid programs accountable. I found one fascinating report that shows what ICAI could be doing, and many more reports that made me wonder whether ICAI is duplicating work already within the purview of the Department for International Development (DFID), the agency that accounts for most of the UK’s foreign assistance programs.
The world of impact evaluation has changed dramatically over the last ten years, and I’ve been worried that political and bureaucratic pressures to water down evaluation systems would erode this wave of commitment to study, learn, and respond to findings on aid programs. This was a key concern in CGD’s report When Will We Ever Learn? So in 2011, when I first heard about ICAI, I wrote “…by establishing ICAI, the UK has gone further than most countries in establishing independent external oversight for aid programs, thereby raising the visibility of evaluation work and the standards of evidence.”
One of the first ICAI reports that I read seemed to fulfill this goal. Two years after DFID completed an impact evaluation of the Western Orissa Rural Livelihoods Project (an anti-poverty program in India), ICAI commissioned researchers to return and assess the quality of the evaluation, the reliability of the information, and the sustainability of the results. This study (in the newly renamed Western Odisha) was a brilliant way to check on the project itself (did poverty really decline?) as well as provide insights into the way DFID conducts the impact evaluations that should serve as the basis for learning and adaptation. In this case, they found delays in the initial baseline survey, problems with the quality of the questionnaires, and errors in the associated cost-benefit analysis, which underestimated the program’s likely return.
The report was quite candid about these findings, but to my surprise none of these points were highlighted in the report’s three recommendations. Instead, the ICAI report concludes with three broad recommendations that are neither related to evaluation nor specific to anti-poverty programs. It calls for better long-term planning, attention to sustainability in project design, and more transparency. These are perfectly reasonable admonitions. But in practical terms, what do they mean? In retrospect, projects always look like they should have planned for problems in implementation and sustainability. Furthermore, how would you know if DFID were implementing those recommendations?
By contrast, the evidence this team collected would have allowed ICAI to make much more specific recommendations about the rigor and use of DFID’s evaluations. Some of the findings could be tracked as a way of improving the learning cycle. A simple point noted in the report is the need to conduct baseline surveys in a timely fashion – this would be a powerful and relatively easy-to-monitor recommendation.
The ICAI report on a health program in Zimbabwe – which describes its methodology as “desk-based research and a two-week visit to Zimbabwe in September 2011” – is more typical of the studies that assess individual projects. In fact, of the 14 studies on the ICAI website that look at individual projects, 12 relied on short visits, interviews, and secondary information while only two involved significant primary data collection (the other relied on a cross-section of interviews in Nigeria). The remaining seven studies on ICAI’s site look at DFID’s relationship with other multilateral agencies (World Bank, Asian Development Bank, EU Aid, UNDP and UNICEF) based on literature reviews, short visits and interviews; explain ICAI’s approach to Value for Money; and assess DFID’s strategy for reducing corruption.
Some of these studies are nicely done, but most of them look like the kinds of operational and quick project completion reports that are common within aid agencies, and it isn’t clear why an independent commission is required to do them. Meanwhile, ICAI has generated a stream of recommendations, which I’ve been told has led DFID to generate a voluminous stream of new guidance – with little notion of whether that guidance is read, is used, or has much effect.
What an independent commission really can do is hold an aid agency accountable for having a strong evaluation and learning system in place. Is good evidence generated, and is it being used? The Western Odisha study was explicit about improvements needed in evaluation – including providing adequate resources and time relative to the study goals – and in learning – noting that lessons from this program had been applied across India but have not informed similar programs in other countries.
Other countries should learn from the UK’s experience and think about setting up a QUANGO of their own – but make sure it is focused on the right things. DFID is mandated to evaluate and learn. Instead of duplicating that function, ICAI could hold it to account for doing so better.
Thanks to Ted Collins for research assistance on this blog.