The R-Word Is Not Dirty

March 05, 2014

Randomized controlled trials (RCTs) have been used to demonstrate the effectiveness of development interventions from cash transfers to tutoring. They are one of many tools used to evaluate policies, products, and services around the world.

RCTs also catch a lot of flak, including today in a New York Times blog post by University of Chicago professor Casey Mulligan. He endorses the old idea that development RCTs somehow withhold beneficial treatments from poor people, calling them ethically or morally “wrong.”

There are limits to what we can learn from randomized controlled trials, of course, but the drawbacks presented by Mulligan are mostly red herrings. His handwringing rests on two clearly false ideas: 1) rationing is “wrong” because, were it not for the research, everyone could receive the treatment, and 2) experimenting is unnecessary because we can know before the experiment whether the treatment is all that beneficial.

The simple truth is that rationing treatment is inescapable, and we do not know whether or not many development interventions—including the ones Mulligan seems to like—are effective. Let’s dig into each of these.

I’ve run numerous randomized trials in developing countries, and in my experience, they don’t work quite the way Mulligan suggests. For example, Mulligan describes the challenges in measuring the effect of education on wages, and then proposes:

“An alternative approach is to randomly assign study participants to treatment and control groups. The treatment group would be sent to extra schooling; the control group would be prohibited from attending extra school.”

Most RCTs use “encouragement” designs precisely because the scenario Mulligan proposes is preposterous. Researchers have neither the authority nor the right to prohibit a control group from attending extra school, and they cannot require attendance from the treatment group. Instead, researchers randomly assign some study participants to be eligible for a program, such as tutoring. Those in the control group are not eligible for the tutoring provided by the study, but they are not prohibited from seeking out tutoring of their own.
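For readers who like to see the mechanics, here is a minimal sketch of that kind of design. Everything in it is hypothetical: the function names, the 1,000 students, and the take-up probabilities are made up for illustration and do not come from any actual study.

```python
import random


def assign_eligibility(participant_ids, seed=0):
    """Randomly split participants into an eligible arm and a control arm."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"eligible": set(shuffled[:half]), "control": set(shuffled[half:])}


def take_up_rate(arm, tutored):
    """Share of an arm that received tutoring from any source."""
    return len(arm & tutored) / len(arm)


# Hypothetical study of 1,000 students; the probabilities below are made up.
arms = assign_eligibility(range(1000), seed=42)
rng = random.Random(7)
tutored = set()
for student in range(1000):
    # Eligible students may accept the study's tutoring; control students
    # may still find tutoring on their own. Nobody is forced or forbidden.
    p = 0.6 if student in arms["eligible"] else 0.1
    if rng.random() < p:
        tutored.add(student)

print("eligible take-up:", take_up_rate(arms["eligible"], tutored))
print("control take-up:", take_up_rate(arms["control"], tutored))
```

The only thing the researcher controls here is who is offered the program. Take-up in both arms is left to the participants.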

The difference may seem subtle, but it is important. The control group is not made worse off or denied access to services it would have been able to access absent the experiment. It might not share in all of the benefits available to the treatment group, but that disadvantage is not necessarily due to the evaluation. Consider the Millennium Villages Project, which Mulligan describes as follows:

“Perhaps most famously, development economists—randomistas as Prof. Angus Deaton calls them—have randomly assigned economic assistance to poor villages in order to measure the rates of return on that assistance. Prof. Jeffrey Sachs’s Millennium Villages Project, an ambitious effort to help African villages escape poverty, has been criticized for, among other things, failing to randomly assign its treatments.

“But Professor Sachs didn’t accidentally forget to randomize his assistance. He thinks that it’s wrong to withhold from poor people assistance that he’s confident can help. The patients who get placebos in randomized F.D.A. trials would probably agree.”

If we were in a world where we could afford to make every village a Millennium Village, and we knew that Millennium Villages were the most effective way to provide assistance to the poor, then we could debate the ethics of designating a control group. I’ll discuss the second condition in a moment, but let’s focus first on the budget constraint. Rationing access to a program is a natural consequence of budget constraints, not of randomized controlled experiments. Millennium Villages are really expensive: on average, over $4,500 per household. Right now, they reach at most a few hundred thousand people. Hundreds of millions of others are excluded, because the project simply doesn’t have enough money to operate in every poor village in the world. Failing to randomize didn’t prevent people from being excluded from the Millennium Villages; it just meant that the people excluded were excluded by project design, not chance. And, unfortunately, it means that it’s a lot harder for us to learn how effective the Millennium Villages Project really is.

That takes us to the condition I hinted at above. We use RCTs to answer certain types of questions about the impact of a program or product. When the questions have been answered and we know whether, how, and for whom a product works, RCTs are neither necessary nor interesting. But at the time of evaluation, we didn’t know whether tutoring improves student learning (or, for that matter, whether Millennium Villages improve welfare for their residents).

Perhaps more importantly, we still have questions about what type of tutoring is most effective, for whom it has the biggest benefits, how and by whom it should be implemented, and whether it is more cost effective than other uses of the same amount of money. It often takes a series of studies to establish the best way to carry out what seems to be a simple intervention. For example, after initial promising studies of community-based tutoring programs in India and Ghana, work by Justin Sandefur and others highlighted the challenges in obtaining equal success with programs implemented by governments rather than NGOs.

If we really knew that tutoring was the best use of our education dollars, then we should focus on scaling up effective programs. If we really knew that Millennium Villages were the best way to do development, then, yes, we should focus on expanding them. Mulligan ignores the unanswered questions and the opportunity costs of spending in his indictment of RCTs.

When we spend money on programs whose effect we do not know, we are experimenting without evaluating. When we provide a program to some people and not others, we are rationing without randomizing. These sorts of uncontrolled experiments measure up no better than controlled experiments along the dimensions that Mulligan discusses, and they fall far short of RCTs in guiding us towards better decisions and improved program designs in the future.


CGD blog posts reflect the views of the authors, drawing on prior research and experience in their areas of expertise. CGD is a nonpartisan, independent organization and does not take institutional positions.