Is Your Impact Evaluation Asking Questions That Matter? A Four Part Smell Test

November 06, 2014

In 2006 I was in West Bengal with a World Bank team, asking a group of women about a “livelihoods” program that built and financed women’s self-help groups as a means of increasing women’s productivity and incomes. After an hour or so of questions, I asked whether they had any questions for me or the team. After all, they had been so gracious in answering our nosy questions that it would have been rude not to let them ask us anything they wanted to know. After an awkward silence, one woman said, “You all are from countries that are much richer and doing much better than our country so your country’s women’s self-help groups must also be much better, tell us how women’s self-help groups work in your country.”

I’m American. Also on the team were a German woman, a man from New Zealand, and a woman from the UK. We all looked at each other blankly, as none of us had any idea whether there had ever been, at any time in our countries’ histories, such a thing as “women’s self-help groups” (much less a government program for promoting them). We also had no idea how to explain that, yes, all of our countries are now developed, but no, none of them got there with a major role for women’s self-help groups at any time (or if there was a role, we development experts were collectively ignorant of it), but yes, women’s self-help groups promote development.

My four-fold “smell test” for what is important to development

I have a four-fold set of criteria for whether something is potentially an important determinant of development or, more narrowly, just of economic growth. I am happy if the “thing X” that I am proposing is “good for development” can satisfy all four (and I can then move on from these simple facts about potential importance to tease out complicated questions of proximal, distal, and reverse causality).

One, countries differ in their level of development by an order of magnitude.   Countries that are developed should have more of thing X than countries that aren’t.  If Denmark and Canada don’t have more of thing X than Mali or Nepal I am kind of suspicious.

Two, since now-developed countries are almost an order of magnitude more developed than they were in 1870, I am happy if there is more of thing X in developed countries now than there was 140 years ago. If Germany and Japan don’t have more of thing X (or at least the same amount) than they did in 1870 I am kind of suspicious.

Three, since over the period since 1950 some countries have seen their development improve incredibly rapidly and others have seen almost no progress I am happy if thing X is more prevalent in rapid development successes than in development failures.  If Korea and Taiwan don’t have more of thing X than Haiti and Nigeria then I am kind of suspicious. 

Four, since countries change dramatically in their pace of development over time (and this is particularly true of economic growth, less so of human development indicators), I am happy if there is more of thing X in a country in periods when development progress is rapid than in periods when development progress is slow. If China doesn’t have more of thing X after 1978 than before 1978 (as growth accelerated by 3.3 percentage points per annum) or if Côte d’Ivoire doesn’t have less of thing X after 1978 than before 1978 (as growth decelerated by 3.7 percentage points per annum) then I am kind of suspicious.
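For readers who like to see the logic spelled out, the four comparisons above can be sketched as a small Python function. This is only an illustration, not anything from the underlying research: the function name `smell_test`, the argument names, and the idea of boiling “thing X” down to a single number are all assumptions made for the sketch, and the numbers in the example are made up purely to exercise the code.

```python
def smell_test(rich_vs_poor, now_vs_1870, success_vs_failure, fast_vs_slow):
    """Apply the four-part smell test to a candidate 'thing X'.

    Each argument is a pair (x_expected_higher, x_expected_lower) holding a
    hypothetical measure of thing X in the two cases being compared.
    Returns a dict of pass/fail results and the number of tests passed.
    """
    results = {
        # One: developed countries today vs. less developed countries today
        "1_rich_vs_poor": rich_vs_poor[0] > rich_vs_poor[1],
        # Two: developed countries now vs. 1870 (at least the same amount passes)
        "2_now_vs_1870": now_vs_1870[0] >= now_vs_1870[1],
        # Three: rapid development successes vs. failures since 1950
        "3_success_vs_failure": success_vs_failure[0] > success_vs_failure[1],
        # Four: within a country, fast-progress periods vs. slow-progress periods
        "4_fast_vs_slow": fast_vs_slow[0] > fast_vs_slow[1],
    }
    return results, sum(results.values())


# Illustrative, invented values for some thing X (not real data)
results, passed = smell_test(
    rich_vs_poor=(0.9, 0.2),
    now_vs_1870=(0.9, 0.1),
    success_vs_failure=(0.8, 0.3),
    fast_vs_slow=(0.5, 0.6),  # less of X in the fast period: fails test four
)
print(results, passed)
```

Passing says only that thing X is potentially important; it says nothing about which way causality runs.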

These four of course don’t resolve the debates or details about the respective roles of macroeconomic management, policy approaches to external markets (e.g. trade, capital, ideas), security of property rights, infrastructure, accumulation of human capital, technological change, capability in the product space, or “institutions” (or, more deeply, what is cause and what is consequence amongst these elements themselves). But nearly all contenders in debates about economic growth or development more broadly pass two or three, and sometimes all four, of these “smell tests” of at least potentially being an important determinant.

Are interventions being evaluated important for development?

Eva Vivalt (2014) has written a paper that is so good it deserves several blog posts to discuss its interesting findings. She and her team have asked the important question about the generalizability of the findings from “rigorous impact evaluations” (including RCTs). In order to do so, her team surveyed 621 papers (not all of which could be used in her analysis). That is an impressive number. Suppose the typical productivity of an academic or research economist is three original completed papers per year. Then 621 papers is 207 person-years of research. Alternatively, think of the inclusive cost (opportunity cost of researcher time plus money costs) per impact evaluation.
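The person-years figure is simple division, and it is easy to redo under a different productivity guess. A minimal sketch, where the three-papers-per-year rate is the supposition from the paragraph above rather than a measured value, and the variable names are just for illustration:

```python
# Back-of-the-envelope research effort behind the surveyed papers
papers_surveyed = 621          # papers surveyed by Vivalt's team
papers_per_person_year = 3     # supposed productivity, an assumption

person_years = papers_surveyed / papers_per_person_year
print(person_years)
```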

I would encourage you to fill in this table with the 20 programs on which Vivalt (2014) finds enough rigorous impact evaluations for comparison before reading on.


After the table is filled in (don’t cheat or I’ll send a nudge mobile phone reminder to alter your behavior) ask yourself: why has much of the best and brightest talent of a generation of development economists been devoted to producing rigorous impact evaluations about these 20 topics?

