[Note: A full response to Mark Pitt is now available.]
On Saturday night, Jonathan Morduch and I learned second-hand that Brown University economist Mark Pitt had circulated a paper via blast e-mail that challenges our replication of Pitt and Khandker, which was for a decade the leading study of the impact of microcredit on poverty. Here's the abstract of the new paper:
This response to Roodman and Morduch seeks to correct the substantial damage that their claims have caused to the reputation of microfinance as a means of alleviating poverty by providing a detailed explanation of why their replication of Pitt and Khandker (1998) is incorrect. Using the dataset constructed by Pitt and Khandker, as well as the data set Roodman and Morduch constructed themselves, the Pitt and Khandker results standup extremely well, indeed are strengthened, when estimated with Roodman’s cmp program, after correcting for the Roodman and Morduch errors.
History has repeated itself. Back in 1999, Pitt wrote a similar response to Jonathan's original attempt to understand Pitt and Khandker.
The timing is terrible for me as I am in the midst of a big push to finalize the book. So here is a quick response, pending the investment of time needed for a more thorough reply. Take these thoughts as preliminary.
Pitt's response has exposed an important mistake in our work. With his fixes, we now match their key results extremely well. This is good news. But in truth we did not dwell on the sign mismatch---Pitt and Khandker found a positive association between borrowing and household spending where we found a negative one---but on whether cause and effect had been proved. In announcing the study, for instance, I wrote, "Seemingly, lending to women makes families poorer…but I just told you how much credence we put on such claims about cause and effect." And our statistical analysis of the causality claims has not changed (though we have more to do).
As a result, Pitt's response:
- Validates our open approach to research, in which we, in contrast to Pitt and Khandker, have freely shared our data and code.
- Strengthens our main conclusion, which is about lack of proof of causality.
I'll elaborate on those points in turn.
1. How does it validate our open approach? In the mid-1980s, the editor of the Journal of Money, Credit and Banking set out to perform replications (i.e., rerun the original math on the original data) of articles recently published in his journal. Roughly a third of authors responded to his request for data. From these, he determined that "inadvertent errors in published empirical articles are a commonplace rather than a rare occurrence." This is not surprising. Most econometric work involves computer programming. All programs have bugs, and we'd expect more bugs in code written by amateur programmers, a group that includes most economists. Invariably, such mistakes look dumb when exposed. Think "I used the GDP of Switzerland instead of Swaziland." But there was good news: rarely did the mistakes overturn the authors' conclusions.
We have found the same pattern in our work. We unearthed small errors in Pitt and Khandker and in our own code. For example, Pitt and Khandker appear to treat all students in school as having no years of education. Separately, in one place (Table A2) they report 1,461 girls and 1,589 boys of school age in the sample while in another (Table 4) they report roughly double those figures, which could signal a problem in their analysis. The formulas at the end of the paper contain typographic errors. And so on.
Most of Pitt's substantive points are either ones we would quibble with or secondary, in that they don't affect the results much. But one is not: it points to one of those mistakes, on our side, that matters a lot. It is the omission of the dummy for a household's target status (a zero-one indicator of whether a household formally qualifies for microcredit based on whether its landholdings are below half an acre). Pitt is right that including this missing control flips the sign on key coefficients so that they match the original. He is also right that this missing variable was in fact listed in a table in his paper and mentioned in the text. We are grateful to him for uncovering and pointing out this mistake (which, to be clear, is essentially mine). The relationship between borrowing and household spending (a measure of affluence or poverty) now appears to be positive in our replication too.
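To see how leaving out a single dummy can flip a sign, here is a minimal sketch in Python on made-up data. It is not our data, and it is not the Pitt-Khandker estimator (theirs is a much more elaborate model); it uses plain least squares. The names target, credit, and spending, and every coefficient in the simulation, are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process (all numbers are assumptions):
# target = 1 for households that qualify for credit.
target = (rng.random(n) < 0.5).astype(float)
# Qualifying households borrow more on average...
credit = 2.0 * target + rng.normal(size=n)
# ...and spend less on average, while credit itself is positively
# associated with spending once target status is held fixed.
spending = 0.3 * credit - 2.0 * target + rng.normal(size=n)


def ols(y, *cols):
    """Least-squares coefficients of y on a constant plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Coefficient on credit without and with the target dummy as a control.
b_without = ols(spending, credit)[1]
b_with = ols(spending, credit, target)[1]

print(f"credit coefficient, target dummy omitted:  {b_without:+.3f}")
print(f"credit coefficient, target dummy included: {b_with:+.3f}")
```

In this toy setup the omitted dummy is correlated with both borrowing and spending, so dropping it pushes the credit coefficient below zero even though the simulated association, holding target status fixed, is positive. Neither coefficient is a causal estimate in any real-world sense.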
Before discussing what this means for our understanding of the impact of microcredit, I will finish my point about the research process. Replicating a complex statistical study without access to the original code and data is a scientific whodunnit. You look for clues, generate hypotheses about what the original authors did, then pick the hypothesis that best fits the observable data. Since the replicatees certainly made mistakes, you cannot take any particular piece of evidence as gospel.
To save space, the final Pitt and Khandker paper did not list complete regression results, only results for the microcredit variables of greatest interest. So we turned to the more-detailed working paper version, whose Table B2 does provide complete results for what would seem to be the key regression. The variable list there excludes the variable we should have included and it includes a variable Pitt says we should have excluded. Now we see that the list was for a slightly different regression, one restricted to target households, for which an indicator of target status would have been superfluous since it is uniformly 1. This difference from the headline regression (in Table 4.1C of the working paper) appears not to be signaled in the text. It is signaled by the very last row of Table B2, which shows the number of households in the sample for this regression as smaller than in the headline one. But see above (on number of kids in school) for an example of where we didn't know whether to take such numbers at face value. Meanwhile, the variable list we copied conflicted with the table, also in the working paper, that Pitt now says we actually should have copied (over the "participated but did not take credit" variable); again, it was not obvious which to believe.
Update: I discovered that the data set Pitt sent us embodies the same misleading variable list. Variables xb1--xb25 are the explanatory variables for household spending. None of these is the erroneously omitted variable, but the last is the erroneously included variable. The erroneously omitted one, nontar, is elsewhere in the file but clustered with variables that are not explanatory variables, suggesting that it is not one either.
The point is not to excuse our mistake but to illustrate how closed research, meaning research in which data and code are kept secret, inhibits replication (which is a sine qua non of science). Our open research, in contrast, while exposing us to vehement attack, allowed for the detection of error and helped us improve our work. That serves the greater good.
Moreover, I must emphasize that the authors have been less open with us than it would appear. In our correspondence, and in reviewing our paper for the Journal of Political Economy, Pitt emphasized that the data set he provided us was not the exact one used in Pitt and Khandker, and that therefore our work is not a true replication. The new paper now refers categorically to the same rectangle of numbers as "the dataset constructed by Pitt and Khandker." And the copy now made public is better than what was made available to us, having informative variable names and descriptions, more columns (including some Pitt wrongly says were in there before), and a missing row restored. Meanwhile, Khandker fought our efforts to obtain the later round of survey data from the World Bank. (A more senior official ultimately shared it, according to the World Bank's policies on data openness.)
2. How does Pitt's work strengthen our conclusions? Our main conclusion is not about the failure to match the signs of the key coefficients. It is about the failure to convincingly demonstrate cause and effect. For example, read Pitt's quote from my congressional testimony:
A couple of years ago I spent time scrutinizing what was then the leading academic study finding that microcredit reduces poverty. To decide whether I believed this crucial study, I replicated it—rerunning all the original math on the original data. The math and computer programming were really complex. In time, with my coauthor Jonathan Morduch, I would conclude that the study does not stack up. We’re not saying microcredit doesn’t help people, just that you cannot judge the matter with this data.
...we have little solid evidence that microcredit, the dominant form of microfinance, reduces poverty.
This is not about our inability to match the sign. It is about whether we can interpret the sign as an effect of borrowing on poverty. (It could be an effect of poverty on borrowing, with better-off people capable of borrowing more.) This is the standard problem with inferring causality from non-experimental data, and is one reason that the randomized approach has caught on.
More to the point, when we fix our regressions, they continue to fail tests of the assumptions needed to infer causality. So improving the match to the original greatly strengthens our conclusion that this study does not convincingly demonstrate an impact of microcredit on poverty. (Non-econometricians can read Roodman's Law of Instrumental Value to understand the underlying issues.)
Now that we are able to replicate the original much better, Jonathan and I will spend some time studying it. (And you can too: just add the nontar dummy to the second stage.) For now, here are a couple of preliminary tables (updated April 4). In the first, under "WESML-LIML-FE" you'll see that the "PK" and "new" columns now match well. That's Pitt's main point. But at the bottom of the last column is the result of a statistical test, a "0.022." (For experts, this is a Hansen test on a parallel 2SLS regression, as explained in our paper.) That says that if the assumptions Pitt and Khandker make in order to infer causality from correlations are correct, then there is only a 2.2% chance that a certain test statistic would be at least as large as it actually is.
And this updates Table 4 of our working paper. This shows the parallel 2SLS regressions still failing Sargan/Hansen tests, and (in the right half) excluded instruments having clear explanatory power when included:
One of the questions we will investigate in coming weeks is why the 2SLS regressions produce such weak results compared to the LIML ones. In theory 2SLS is less efficient, but has the advantage of robustness to heteroskedasticity. (The LIML regressions model the Tobit censoring of credit and so require homoskedasticity.)
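For readers who want to see the mechanics of an overidentification test, here is a minimal sketch in Python on simulated data. It runs plain 2SLS with one endogenous regressor and two instruments, then computes the Sargan statistic, which is the simpler cousin of the Hansen test and assumes homoskedastic errors. It is not the code behind our tables. The function names two_sls and sargan_p, the data-generating process, and all coefficients are assumptions made up for illustration.

```python
import math
import numpy as np


def two_sls(y, x, Z):
    """2SLS of y on a constant and x, using a constant and Z as instruments."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])  # regressors
    W = np.column_stack([np.ones(n), Z])  # full instrument set
    # First stage: project the regressors onto the instruments.
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]
    # Second stage: regress y on the projected regressors.
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    # Residuals are computed with the actual x, not the fitted x.
    resid = y - X @ beta
    return beta, resid, W


def sargan_p(resid, W):
    """Sargan statistic and p-value for ONE overidentifying restriction."""
    n = len(resid)
    fitted = W @ np.linalg.lstsq(W, resid, rcond=None)[0]
    stat = n * (fitted @ fitted) / (resid @ resid)
    # With 1 degree of freedom, the chi-squared upper tail is erfc(sqrt(stat/2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


rng = np.random.default_rng(1)
n = 4000
Z = rng.normal(size=(n, 2))           # two candidate instruments
u = rng.normal(size=n)                # unobserved confounder
x = Z @ np.array([1.0, 1.0]) + u + rng.normal(size=n)

# Case 1: both instruments affect y only through x (exclusion holds).
y_valid = 0.5 * x + u + rng.normal(size=n)
# Case 2: the second instrument also affects y directly (exclusion fails).
y_invalid = y_valid + 0.5 * Z[:, 1]

beta_valid, r_valid, W = two_sls(y_valid, x, Z)
beta_invalid, r_invalid, _ = two_sls(y_invalid, x, Z)
stat_valid, p_valid = sargan_p(r_valid, W)
stat_invalid, p_invalid = sargan_p(r_invalid, W)

print(f"valid instruments:   slope={beta_valid[1]:.3f}  Sargan p={p_valid:.3g}")
print(f"invalid instruments: slope={beta_invalid[1]:.3f}  Sargan p={p_invalid:.3g}")
```

A small p-value, as in the second case, says the residuals are more correlated with the instruments than the exclusion assumptions allow. A large p-value does not prove the instruments valid; it only means this test found no evidence against them.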
More technical notes: The regressions reported above incorporate two other changes from Pitt. One is his distinction between what I call the censoring threshold of log(1000) and the censoring value of log(1). cmp can actually handle this fine. The other is that in the instrumenting stage, observations for all three survey rounds now use the first round's data. In addition to homoskedasticity, the Pitt-Khandker LIML regressions assume no cross-household error correlation; thus the only deviation from sphericity that they allow is serial correlation. This makes errors i.i.d. within survey rounds and makes the Sargan test valid for 2SLS regressions restricted to individual rounds. The Hansen test is required for regressions that pool all three rounds. [Update: this is not quite right. The regressions are weighted for sampling---"pweights" in Stata parlance---which effectively introduces heteroskedasticity and invalidates the Sargan test even in cross-sections. But Hansen, now shown for all columns, remains valid and corroborates Sargan.]
Code and data are here.