Last week CGD released a new working paper that I’ve been working on—together with my co-authors Tessa Bold, Mwangi Kimenyi, Germano Mwabu, and Alice Ng’ang’a—since I was a grad student in 2007.  “Scaling Up What Works: Experimental Evidence on External Validity in Kenyan Education” describes our attempt to bring the wave of randomized impact evaluations in development economics into the policymaking process in Kenya’s Ministry of Education.

My Kenyan co-authors were eager for the Kenyan Ministry of Education to get on board the RCT bandwagon. School performance was (and is) abysmal, yet many of the groundbreaking studies in the education impact evaluation literature were conducted in the Ministry’s own primary schools in Kenya’s Western province! So we set out to replicate, as best we could, one successful impact evaluation conducted by Esther Duflo, Pascaline Dupas, and Michael Kremer, which showed that putting additional teachers on temporary contracts in public schools significantly raised pupils’ scores in math and English.

Can governments learn from such NGO pilots? 

To answer that question we added one major tweak to the replication study.  Instead of running the program through a partner NGO, we randomly assigned half of the schools in our study to receive a contract teacher from an international NGO, and the other half to receive a contract teacher from the Ministry of Education—just as the Ministry prepared to roll out 18,000 contract teachers nationwide.  We kept everything else as identical as possible between the NGO and government treatment arm: same salaries, same hiring rules, etc.

Below are the results. When we had an NGO implement the contract teacher intervention, we found effects strikingly similar to those in the original Duflo, Dupas, and Kremer study: test scores rose by roughly 0.18 standard deviations. But when the Ministry of Education (MOE) implemented a seemingly identical program, there was zero impact.

Figure 1. The impact of contract teachers on pupil test scores

So what does this all mean? 

“Find what works” is the unofficial motto of the movement to do more randomized impact evaluations in both education research and development policy.  The US Department of Education runs something called the What Works Clearinghouse, where educators can find research-tested solutions.  And in the development industry, DfID and 3ie have been commissioning a range of systematic reviews of the evidence for various development interventions in education and other sectors.

The risk of these systematic reviews is that carefully contextualized research projects get summarized as a menu of apolitical fixes that can be applied to any educational system.  In this sense the results agenda comes close to what the author Evgeny Morozov decries as “solutionism”: the idea that all problems have benign solutions, often technological or—in the case of education reform—highly technocratic. 

Solutionism has its virtues.

Yglesias could just as easily have said “innovations” instead of technologies, and I think that would better capture the phenomenon in the development industry. This is not just about one laptop per child; it includes contract teachers, community scorecards of school performance, and all kinds of non-technological innovations to cut costs and improve performance. These small interventions can add up to big improvements in learning and welfare.

But as we write in the paper, solutions are slippery, especially when deprived of context.

In most of the experimental evaluation literature in development economics, the treatment construct is defined to include only the school- or clinic-level intervention, abstracting from the institutional context of these interventions. Our findings suggest that the treatment in this case was not a “contract teacher”, but rather a multi-layered organizational structure including monitoring systems, payroll departments, long-run career incentives and political pressures.

In short, as we try to figure out what works, we need to be extremely careful in defining what is the “what.”  Is it a contract teacher, or something bigger and more institutionally embedded?   To answer that question, and to get closer to policy-relevant solutions, there’s a case to be made that policy-oriented researchers should be working more closely with the governments they’d like to influence.   Gabriel Demombynes, who discussed our paper on the World Bank’s “Development Impact” blog, makes this point very well:

I expect this paper will be a sort of Rorschach test for views on RCTs and service delivery in developing countries. Evaluation skeptics may try to cite this as evidence that RCTs are a waste of time, since it suggests that successful interventions implemented by NGOs, as they often are in experiments, may not be replicated at scale by governments. Others might take the paper to indicate that NGOs should be the preferred vehicle for interventions. I think these readings would be mistaken.... we should do many more rigorous studies working with governments where we vary forms of service delivery to better understand what can work in practice.… the long, difficult slog of working to improve government systems is the right one, because it’s the only way to ultimately make services work for the poor at large scale.

The wave of RCTs in education research and development economics is a huge methodological advance. The next step, if we’re serious about large-scale policy reform, is bringing context, politics and the state back into the equation.