I recently made a public plea for rigorous evaluation of the U.N. Millennium Village Project (MVP), an experimental effort to break African villages out of poverty traps with a large package of outside assistance. The public response from the MVP left me puzzled.
Paul Pronyk, Director of Monitoring and Evaluation for the MVP, responds with two points. Pronyk’s first point is:
“The need for careful evaluation and long term follow-up. We fully agree. The duration of the Millennium Villages Project is, in fact, 10 years, according to the 2015 MDG timeline. We are now completing the mid-point (end-of-third-year) review of the first five-year phase. We will of course continue to monitor and evaluate the project beyond 2015.”
It is strange to learn that the MVP is a ten-year project. The MVP has repeatedly stated that it will demonstrate how “people in the poorest regions of rural Africa can lift themselves out of poverty in five years’ time.” If it has now morphed into a ten-year project, is this because the project failed to meet the original five-year goal? Or if it was always conceived as a ten-year project, why was it sold as a five-year project?
It is laudable that the MVP has now announced its intention to continue evaluation beyond 2015. But that is not enough. That long-term evaluation must be done before the project is scaled up. I explained why in my earlier post: other village-level development packages have appeared to have effects after five years but showed zero effects after ten. The goal of the MVP is to effect long-term change, building “a solid foundation for sustainable growth.” So whether the project is effective can only be assessed in the long term. That means that scaling up the project before any long-term evaluation would be scaling up an intervention of unknown effectiveness.
But without any such evaluation, the MVP has already called for “the model’s expansion throughout Africa” to affect millions of people. Researchers at the Overseas Development Institute already advocate for African governments to make “MVP-type investments” a “key component” of their national development strategies. It would be irresponsible to vastly expand an intervention that is not known to accomplish its own stated, long-term goals. This concern is by no means specific to the MVP; it is irresponsible to vastly expand any testable intervention without testing.
Pronyk’s second point is:
“The need to compare villages that do get the intervention with those that do not. We also agree with the inclusion of comparison villages. Indeed, once we scaled up the project from the first two sites to 10 countries, we also introduced matched randomly selected comparison villages in 10 of the more recently established sites.”
I am sure that Dr. Pronyk knows the difference between what he calls “matched randomly selected comparison villages” and true, scientific control villages. But he must also know that many readers will not immediately grasp this difference. I will explain:
A true scientific evaluation would pick which village clusters receive treatment from each of several matched pairs of clusters. The crucial part is that matching must be done before it is decided which cluster to treat. First, pairs of village clusters broadly similar to each other would be established --- for example, the pair of clusters A1 and A2, the pair B1 and B2, and the pair C1 and C2. After this, it would be randomly determined which cluster from each pair would get the treatment. For example, the treated clusters might be A2, B1, and C2. If this is done for a sufficient number of pairs, any difference between the treated and untreated clusters could only reasonably be attributed to the effect of the intervention. It is impossible to “pick winners” in this setting because treatment is assigned randomly, and therefore transparently. This is the core of the evaluation method used to ensure that the pharmaceuticals you take are effective.
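The assignment procedure described above can be sketched in a few lines of code. This is only an illustration; the pair labels (A1/A2, B1/B2, C1/C2) follow the example in the text, and everything else is an assumption of the sketch. The essential property is that matching happens first, and only then does a coin flip decide which member of each pair is treated.

```python
import random

# Matched pairs established BEFORE any assignment decision.
# Labels follow the hypothetical example in the text.
pairs = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]

def assign_treatment(pairs, seed=None):
    """Randomly pick one cluster from each matched pair to treat.

    Because the pairing is fixed in advance and the pick is a coin
    flip, the treated and comparison groups can differ systematically
    only through the intervention itself.
    """
    rng = random.Random(seed)
    treated, comparison = [], []
    for a, b in pairs:
        t, c = (a, b) if rng.random() < 0.5 else (b, a)
        treated.append(t)
        comparison.append(c)
    return treated, comparison

treated, comparison = assign_treatment(pairs, seed=42)
print("treated:", treated)
print("comparison:", comparison)
```

With enough pairs, no one can "pick winners": the assignment is transparent, and the comparison group is determined by the same coin flips as the treatment group.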
What the MVP has done is something very different from this, very different from a rigorous evaluation. First, village cluster A1 was chosen for treatment, for a range of reasons that may include its potential for responding positively to the project. Then, long after treatment began, three other clusters that appeared similar to A1 were identified --- call these “candidate” comparison clusters A2, A3, and A4. The fact that all three candidates were chosen after treatment in A1 began creates an enormous incentive to pick those candidates, consciously or unconsciously, whose performance will make the intervention in A1 look good. Only then was the comparison cluster chosen at random from among A2, A3, and A4.
Done like this, it helps very little that the comparison villages were randomly chosen, since they were randomly chosen from a group whose members were all selected after treatment began. Differences between the treated cluster and the comparison cluster might be due to the MVP. But those differences might also be due to how the original Millennium Village was chosen, and how the three candidate comparison villages were chosen. This is not a hypothetical concern; I discussed real-world examples of this problem in my original post.
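A small simulation makes the problem concrete. All numbers below are invented for illustration: the true treatment effect is set to zero, the treated site is drawn with a tilt toward promising villages, and the after-the-fact candidate comparisons are drawn with a tilt toward weaker-looking ones. Any "effect" the procedure finds is therefore an artifact of selection, not of the intervention.

```python
import random

def simulate_selection_bias(n_sims=2000, seed=1):
    """Average treated-minus-comparison gap when the TRUE effect is zero.

    Treated site: the better of two random draws (chosen for promise).
    Candidate comparisons: the worse of two random draws each (a possibly
    unconscious tilt toward villages that make the site look good), with
    the final comparison then picked at random among the candidates.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_sims):
        treated_outcome = max(rng.gauss(0, 1), rng.gauss(0, 1))
        candidates = [min(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(3)]
        comparison_outcome = rng.choice(candidates)
        gaps.append(treated_outcome - comparison_outcome)
    return sum(gaps) / len(gaps)

# Positive average gap despite a true treatment effect of zero.
print(round(simulate_selection_bias(), 2))
```

The random draw among the candidates does nothing to remove the bias, because the bias was built in at the candidate-selection stage --- exactly the point made above.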
Would it be possible to do the evaluation in the proper, transparent, truly randomized way? Absolutely. It is too late to do it for the original 13 Millennium Villages, but it could easily be done for the next wave of villages. Given that the MVP seeks to scale up the intervention across the continent, there will be many opportunities to do the evaluation right.
I have talked with a few of the world’s top experts on conducting randomized evaluations in Africa, and all have told me that it would be possible to conduct a proper evaluation of the MVP with 20 or 25 matched pairs of village clusters. Chris Blattman, a Yale professor and CGD fellow with deep experience in such evaluations, has rightly pointed out that such a small sample size would only allow evaluation of the entire MVP intervention as a package, and would not allow richer estimation of the relative effectiveness of different elements of the package. But this does not mean that evaluating the effectiveness of the package as a package is uninteresting. Indeed, the basic premise of the MVP is that a large package is capable of permanently breaking villages out of poverty traps. That notion is testable with a modest and entirely feasible sample size.