In my first blog on recently published impact evaluations by the Millennium Challenge Corporation (MCC), I argued that MCC has set a new standard for systematic learning about development programs. In this blog, I address a second set of questions, related to the design and use of impact evaluations.

Impact evaluations are not foolproof. The probability that a given evaluation will yield good information is nowhere near 100%. Evaluation is a risky endeavor – as risky as a development project or even a new business – and all agencies (managers, take note!) should recognize this from the start.

There are at least three basic ways that an evaluation can fail:

  • If the program doesn’t get implemented, the study has nothing to evaluate.
  • If the program is implemented but the study doesn’t collect good data, it can’t tell you much.
  • Even with good implementation and data, the evaluation may have too little “statistical power”, that is, it may have too few observations to detect an impact when the effects are small or when there are lots of other confounding factors.

In the MCC studies, most programs were implemented, achieving the inputs and outputs that had been planned. For example, in El Salvador, 11,520 farmers reportedly applied improved farming techniques as a result of the program, well above the target of 7,000. However, some implementation problems are not apparent from the raw numbers. In Honduras, the process of selecting farmers for the program was poorly done, and in Armenia complementary irrigation investments were delayed. You can only evaluate the impact of an intervention if it has been successfully implemented. Most of these programs accomplished the tasks – the inputs, activities, and outputs – that they set for themselves, but implementation of some components did not fully meet the intended design.

The MCC studies confronted some problems with data collection, but the one that stood out for me was the difficulty evaluators had in maintaining a clear enough distinction between farmers who received training and those who didn’t. In Honduras, the implementing agency apparently confused matters by obscuring how farmers were selected. In Ghana, participants and non-participants seem to have received about the same amount of training. Evaluators are constantly under pressure to bring everyone into a program, even when no one really knows whether the program is beneficial. While it would be unethical to test a proven intervention by withholding it from a control group, the results of these and many other studies show that so-called “proven” interventions are frequently less robust than proponents claim. In most of these cases, non-participants were not harmed and many participants failed to benefit. This is not like a program to feed hungry people or provide parachutes to those who jump out of airplanes: getting farmers to spend time listening to an extension agent and change the way they farm may or may not be beneficial to them. Like an unproven medicine, the intervention should be tested, and the control group brought in only if some clear benefit becomes apparent.

The biggest problem these studies faced was inadequate statistical power. In some cases, the average difference in income between participants and non-participants was large, but there was so much variation, so much “noise”, that the difference could have occurred just by chance (Michael Carter addresses this in his peer review). This problem is common in the impact evaluation world. Markus Goldstein notes that the randomized roll-out designs were partly to blame. Given the complexities of our social world, I think we are simply too optimistic about what our programs can accomplish. When you expect farm or household incomes to double, you don’t need very large samples to detect the impact – but when the true effect turns out to be modest, those same samples are far too small to distinguish it from chance. The combination of high predicted returns and cost constraints on evaluation is a perfect recipe for small samples that yield “we don’t know” at the end of the day. Managers who want to increase the probability of getting good evaluation results and use their evaluation budget effectively should take heed. When teams propose to study such complex social interventions, don’t ask them to reduce the cost. If anything, ask them how much it would cost to increase the probability of detecting a statistically robust estimate, and budget for that! You won’t regret the decision when it comes time to show what your department has learned.
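The arithmetic behind that trap can be sketched with a textbook power calculation. The numbers below are illustrative only – they are not drawn from the MCC studies – and use the standard normal-approximation formula for comparing two group means:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(effect_size: float, alpha: float = 0.05,
                        power: float = 0.80) -> int:
    """Approximate sample size needed in each arm of a two-group
    comparison of means (normal approximation, two-sided test).

    effect_size: expected difference in mean outcomes divided by the
    outcome's standard deviation (i.e., a standardized effect).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# If you expect a huge impact (effect as large as the outcome's
# standard deviation), a small sample looks sufficient ...
print(sample_size_per_arm(1.0))   # 16 farmers per arm
# ... but detecting a more realistic, modest effect takes far more.
print(sample_size_per_arm(0.2))   # 393 farmers per arm
```

An evaluation budgeted for the optimistic projection in the first call is roughly 25 times too small for the modest effect in the second – which is exactly how high predicted returns plus cost pressure produce underpowered studies.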

MCC is taking the right approach to these evaluations. First and foremost, it has reaffirmed its commitment to keep doing and publishing rigorous evaluations. For this alone, it deserves an international award. Second, MCC is presenting the results in a clear but nuanced fashion – emphasizing learning over accountability. Third, it is proposing to improve evaluation designs so that future studies yield better evidence. And fourth, it is mining the studies for information about how to better design and implement its farmer training programs.

Now, what do the studies tell us about farmer training programs themselves? I’ll address that in part three …

CGD blog posts reflect the views of the authors, drawing on prior research and experience in their areas of expertise. CGD is a nonpartisan, independent organization and does not take institutional positions.