Why a Careful Evaluation of the Millennium Villages Is Not Optional

March 18, 2010

The Millennium Villages Project (MVP) is a new, large, experimental intervention to promote economic development in 13 clusters of small and very poor villages across Africa. It has been driven forward by Jeffrey Sachs, one of the world’s foremost economists, and today is a joint effort of Columbia University, Millennium Promise, and the United Nations. The project deploys a broad package of interventions for five years in each village, including distribution of fertilizer and insecticide-treated bednets, school construction, HIV control, microfinance, electric lines, road construction, piped water and irrigation lines, mobile phones, and several others. Its goal is to break the villages out of poverty traps and build a “solid foundation for sustainable growth”.

Over the years I’ve been critical of the Millennium Villages, in soundbites trotted out periodically by the Financial Times and New York Times. Here I want to explain what lies behind those soundbites. I hope the MVP will consider these points as they revise their evaluation practices this year and continue to scale up the project.

I deeply admire the goals of those who conceived the MVP and run it today. The challenges of Africa require boldness, and the MVP is commendably bold. The big problem with the enterprise is that the effect of the MV intervention is not being as carefully evaluated as it should be. Without sound evaluation, it simply cannot be known, regardless of what is observed today at MV sites, whether the money devoted to the MVs is accomplishing its goals. An intervention of this scale deserves proper evaluation, which can only make it better as it expands.

A careful evaluation of the MVs would comprise two critical elements. First, long-term follow-up is essential. Second, the villages that get the intervention must be compared to villages that do not get it, with the villages that do and don’t get the intervention picked at random from an initial group.
I will show with real-world examples why these elements are “must have” rather than “nice to have”.

Long run follow-up before scale-up

The stated goal of the MVP is to break the villages out of a poverty trap, and put them onto the “ladder of development”. Therefore the project, no matter what its benefits in the short run, will have failed in its own terms if its effects are not sustained.

The only independent evaluation of the MVs is currently planned to proceed in three waves: baseline, year 3, and year 5 of the project. That is, the evaluation has no publicly-stated plan to proceed long past the end of the five-year intervention in each village, before deciding whether or not it would be right to vastly scale up the intervention all across Africa. A recent research paper starkly showed how inadequate such a stance can be.

That paper, by Shaohua Chen, Ren Mu, and Martin Ravallion, is a lesson in humility (ungated version here, published here). It studies the Southwest Project in China, a village-level development package intervention executed in 1,800 rural villages in the late 1990s. Like the Millennium Village intervention it targeted the poorest villages, lasted about five years, and cost hundreds of thousands of dollars per village. It sought to permanently reverse the fortunes of those villages with a broad-based package including roads, piped water, power lines, upgrading schools and clinics, training of teachers and health-care workers, microcredit, and initiatives for raising crop yields, animal husbandry, and horticulture.

Right before the end of the Southwest Project intervention, five years after it started, the project seemed to indeed be reversing the fortunes of the treated villages.
Income in those villages grew by 20% more during the project than in similar villages in the same area that had not received the intervention, and savings grew by 100% more.

Then the intervention ended and, fast forward five years, all those effects on income and savings disappeared. Ten years after the five-year project began, average income and savings in the villages that got that massive package of interventions were indistinguishable from income and savings in villages that did not.

Notably, incomes in both the treated and untreated Chinese villages in the Southwest Project area increased greatly during the span of the project (1995-2000) and for years thereafter. This happened because the Chinese economy was being transformed during this period, not because of massive village-level package development interventions.

The point here is not that the Southwest Project was the same as the MVP; it wasn’t. The point is that short-term evaluation is plainly inadequate. It is obvious that the MVP is going to have short-term impacts. The size of the intervention is the same order of magnitude as the size of the entire economy of each village; that is, the MV intervention is roughly 100% of local income per capita (see the bottom of this post for this calculation). Indeed, it would be astonishing not to see short-term effects with an intervention that gargantuan.
The three-year evaluation results that the MVP plans to release this year simply won’t tell us much.

The only interesting evaluation question is in the long term, for three reasons: 1) because unlike short-term impacts the answer is not obvious, 2) because long-term change is the stated goal of the MVP, and 3) because other village-level package interventions have shown that short-term effects and long-term effects can be completely different from one another.

Randomized evaluation

The ongoing independent evaluation of the MVs notes that “the specific interventions contained within the MVP package have scientifically proven efficacy” in other settings, and that outcomes in the MVs will be compared to those in untreated “comparison” villages that appear similar to the treated villages. It is laudable that individually-proven elements (such as fertilizer) have been chosen for inclusion in the packages, and it is laudable that the evaluation includes comparison villages. But this does not amount to a careful, scientific evaluation of the MVP intervention. Why?

First, the fact that a technology has been scientifically proven in isolation (such as a certain fertilizer proven to raise crop yields) does not mean that it will improve people’s well-being amidst the complexities of real villages. Recent research by Esther Duflo, CGD non-resident fellow Michael Kremer, and Jonathan Robinson shows that fertilizer use is scientifically proven highly effective at raising farm yields and farmers’ profits in Kenya. But for complex reasons very few farmers wish to adopt fertilizer, even those well trained in its use and usefulness. This means that this proven technology has enormous difficulty raising farmers’ incomes in practice. The gap between agronomy and development is very hard to cross.

Second, it is not sufficient to compare treated villages to untreated villages that were chosen ex-post as comparison villages because they appear similar.
Many recent research papers have shown this conclusively. A long list of studies conducted over decades showed that African and other children learned much more in schools that had textbooks than in schools that appeared otherwise similar but did not have textbooks. Paul Glewwe, Michael Kremer, and Sylvie Moulin evaluated a large intervention in some of the neediest schools in Kenya (ungated version here, published here). Schools that received textbooks were randomly chosen from an initial pool of candidates. The problem: Children did not learn more in the treated schools than in the untreated schools.

This strongly suggests that unobserved differences between treated and untreated schools could account for many of the earlier results claiming a large, positive effect of textbooks on learning in this setting. Likewise, when the early MVP evaluation results are released later this year, differences are likely to exist between treated villages and untreated villages that are similar in appearance to the treated villages. But we cannot know if those differences are attributable to the intervention. This is because the decision about which villages got treatment was not made by choosing villages at random from a pool of candidates.

In the U.S. and other developed countries, people insist that pharmaceutical companies’ claims about the medicines people take are backed up by randomized evaluations of those claims. No company can assert that a pill will change the course of your life without proving the claim to an independent evaluation agency with a transparent, randomized experiment. People living in the current or future MVs deserve to know whether the claims that they are being launched onto the “ladder of development” are backed up by good science.

Randomization is not a panacea for all development evaluation, and many important development policy decisions cannot feasibly be based on randomized evidence. But the MVP is not among those.
Opponents of all randomization have suggested that it is cruel to deny treatment to the control group, but in the pilot phase of any project it is impossible to treat all villages across a whole continent. Having a control group of some kind is not optional, and thus not cruel. In such settings the decision to randomize does not affect how many people receive treatment.

An evaluation for the new millennium

The MV evaluation should plan for the long term and focus on showing long-term impacts before the project is scaled up. There should be three waves of evaluation: five years after the intervention is over, 10 years after, and 15 years after. The cost of the household surveys required to conduct this evaluation would be less than 5% of the project cost, which is well worth it in the pilot phase of any venture (see the bottom of this post for this calculation).

And there was no fundamental reason why the selection of treatment villages for the MVP could not have been randomized. There was certainly a large pool of candidate villages, and the people running the MVP are some of the most capable scientists on earth, so they are very familiar with these methods and why they matter. But treatment selection was not random, and it may be too late to evaluate the initial 13 MVs scientifically. It would be very easy, however, to scientifically evaluate the next wave. The MVP seeks to scale up its intervention massively, so there will likely be many more such villages in the years to come. The next wave of them could easily be randomly selected for treatment.

These changes would help ensure that the MVP, if it is to be massively scaled up, does so in a form that is known to accomplish its goals.

APPENDIX: Explanation of how two numbers used in this post were calculated:

1) The MVP intervention is roughly the same size as the entire local economy, that is, on the order of 100% of local income per capita.

Take the Mwandama, Malawi cluster. Income per capita is somewhere between US$100 and US$150 (exchange rate dollars), and probably closer to US$100. The basis for this is that about 90% of people in the Mwandama cluster live in extreme poverty. The World Bank’s PovCalNet figures for Malawi in 2004 say that 74% of the national population lives below PPP$38/month, the Bank’s criterion for “extreme poverty”. If the poverty line referenced by the MV website is similar to that used by the World Bank, then given that the Mwandama headcount rate is higher than the national rate, this suggests that PPP$38/month is a reasonable upper bound on the typical income of someone living there, which might be much less.

So the typical income per head in a village of Mwandama is something less than PPP$456/year. According to the price data collected by the International Comparison Program, the Purchasing Power Parity (“PPP”) conversion factor to market exchange rate ratio for Malawi in the most recent year available (2006) is 0.333. Therefore US$152, converted into Kwacha at market exchange rates, will give roughly the same living standard at Malawian prices as US$456 would give at US prices. So the typical Kwacha income per capita of a Mwandama resident could be purchased with less than US$152 in cash, perhaps substantially less because the income figure is an upper bound.

Thus with somewhere between US$100 and US$150, one could purchase the Kwacha income of a typical Mwandama resident. This is bolstered by the fact that unskilled wages in Mwandama are around US$0.50/day (US$156/year with a six-day workweek), and of course per capita income would include many non-wage-earning dependents, suggesting that average income per capita is closer to US$100 or less.

How much does the MV intervention in Mwandama cost? The 2008 Annual Report of the MVP (Appendix B) says that the cost of the MV intervention for a typical village is US$120 per person per year.
If the Mwandama intervention is typical, this amount could purchase an amount of Kwacha on the order of 100% of typical local income per capita, and possibly more.

Should we exclude administrative costs from this figure? The MVP says that about 1/6 of the US$120 is spent on “logistical and operational costs associated with implementation, community training, and monitoring and evaluation”. So even if we exclude that amount, the big story is the same. And it’s not clear that items like “community training” should be excluded from the amount if the goal is to compare the size of the intervention to the size of the local economy.

2) The cost of a comprehensive long-term follow-up survey of the MVP would be less than 5% of the project budget.

The MV budget is $300,000/village/year, or $1.5 million over five years for each of the original 13 village clusters in the MVP, for a total of $19.5 million. Approximating the design of other impact evaluation studies (with arguably much smaller interventions) suggests that a robust impact evaluation to measure treatment effects of the magnitude that the MVP seeks would entail surveying fewer than 20 households in each treated and control (untreated) village. If three villages from the treatment cluster and three from the control cluster are surveyed, six villages are surveyed in each cluster pair (treatment-control pair) in each wave. Suppose that there are three evaluation waves: 5 years after project completion, 10 years after, and 15 years after. What are the costs of this data collection? Kathleen Beegle of the Research Group in the World Bank, who has extensive experience running large household surveys in Africa, estimates that the total cost of the surveys (including survey teams, supervisors, drivers, computers, GPS equipment, travel by international consultants, everything) would amount to less than $150 per surveyed household.
Thus 20 households per village x 6 villages per cluster-pair x 13 cluster pairs x 3 waves x $150/household/wave = $702,000. That is just 3.6% of the total project cost of $19.5 million.
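
For readers who want to check the arithmetic, the two calculations above can be reproduced in a few lines of Python. This is only a back-of-the-envelope sketch: every input is a figure quoted in this appendix, the variable names are my own labels, and the results inherit all the caveats already stated (in particular, the income figure is an upper bound).

```python
# Back-of-the-envelope check of the two appendix calculations.
# All inputs are figures quoted in the appendix; names are illustrative only.

# --- 1) Size of the intervention relative to local income (Mwandama) ---
ppp_poverty_line_per_month = 38.0    # PPP$ per month (extreme-poverty line)
ppp_to_market_ratio = 0.333          # PPP factor / market exchange rate, Malawi 2006
intervention_per_person = 120.0      # US$ per person per year (MVP 2008 Annual Report)
admin_share = 1 / 6                  # share spent on logistics, training, M&E

# Upper bound on typical income, first in PPP$ and then in exchange-rate US$.
income_ppp_per_year = ppp_poverty_line_per_month * 12
income_usd_upper_bound = income_ppp_per_year * ppp_to_market_ratio

# Cross-check from unskilled wages: US$0.50/day, six days a week, 52 weeks.
wage_income_per_year = 0.50 * 6 * 52

# Intervention as a share of income at the upper bound and at US$100.
ratio_at_upper_bound = intervention_per_person / income_usd_upper_bound
ratio_at_100 = intervention_per_person / 100.0

# Intervention cost per person with the administrative share excluded.
intervention_net_of_admin = intervention_per_person * (1 - admin_share)

# --- 2) Cost of long-term follow-up surveys relative to project budget ---
households_per_village = 20
villages_per_cluster_pair = 6        # three treated plus three control
cluster_pairs = 13
waves = 3                            # 5, 10, and 15 years after completion
cost_per_household = 150             # US$ per surveyed household per wave

survey_cost = (households_per_village * villages_per_cluster_pair
               * cluster_pairs * waves * cost_per_household)

project_budget = 300_000 * 5 * 13    # US$300,000/year x 5 years x 13 clusters
survey_share = survey_cost / project_budget

print(f"Income upper bound: PPP${income_ppp_per_year:.0f} = US${income_usd_upper_bound:.0f}")
print(f"Wage-based annual income: US${wage_income_per_year:.0f}")
print(f"Intervention / income: {ratio_at_upper_bound:.0%} to {ratio_at_100:.0%}")
print(f"Intervention net of admin: US${intervention_net_of_admin:.0f}")
print(f"Survey cost: US${survey_cost:,} of US${project_budget:,} ({survey_share:.1%})")
```

The intervention comes out at roughly 80% of income at the US$152 upper bound and 120% at US$100, which is what "on the order of 100%" is meant to convey; excluding the administrative sixth lowers the intervention figure to about US$100 per person and does not change the picture.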


CGD blog posts reflect the views of the authors, drawing on prior research and experience in their areas of expertise. CGD is a nonpartisan, independent organization and does not take institutional positions.