Tomorrow is apparently the official release date of a Lancet article you may already have read. Christopher Murray (the corresponding author) and his colleagues have assembled a detailed and comprehensive data set on "development assistance for health" (DAH). This data set, released last year, is an achievement in itself. (See Mead Over's commentary.) The new study plugs the DAH data into cross-country regressions to study a big question: how does aid for health affect total health spending in receiving countries?
This is an interesting and important exercise, but I believe that the information supplied does not warrant such confident causal claims as this:
The statistical analysis showed that DAH to government had a negative and significant effect on domestic government spending on health such that for every US$1 of DAH to government, government health expenditures from domestic resources were reduced by $0·43 (p=0) to $1·14 (p=0). However, DAH to the non-governmental sector had a positive and significant effect on domestic government health spending.
I am less concerned about the numbers than I am about the causal interpretation attached to them. How sure can we be that health aid is affecting health spending but not, for example, the other way around? I'm no health policy expert, and I don't play one on this blog. But I wrote a program that the authors may well have used to come up with these numbers. (If they didn't use mine, they almost certainly used a program that Stata Corp wrote to mimic features in mine.)
Mind you, I find the numerical results roughly plausible. In general, one can imagine different ways that health aid affects health spending. Health aid may "crowd in" health spending: perhaps because of conditionality, injecting the aid from the outside causes receiving governments to contribute more from their coffers; then health aid would increase total health spending more than one-to-one. Or perhaps health aid is partly fungible: governments take advantage of outside finance for health by shifting their own budget to guns, gyms, and schools. (I would. Wouldn't you?) Or maybe the crowding out is so intense that total health spending goes down as health aid goes up. My own prior is that aid to governments is partly fungible: somewhere between 0% and 100% is siphoned into other areas. Most of the estimates in the paper are within this range.
And these plausible results point to an important conclusion: there is no simple equation between aid and expenditure. (Hat tip to April Harding.) If you estimate that it will cost $100 million to stick a needle in every Nigerian kid's arm, giving $100 million to Nigeria might not get the job done. People allocating aid care about that sort of complication.
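To put the quoted range in concrete terms, here is a minimal sketch of the accounting. It takes the paper's two displacement estimates ($0.43 and $1.14 per $1 of DAH to government) at face value, which is exactly the causal reading questioned in this post, and applies them to the hypothetical $100 million from the example above. The function name is mine, for illustration only.

```python
def net_change_in_government_health_spending(aid, displacement_per_dollar):
    """Net change in total government health spending (aid plus domestic funds).

    aid: health aid given to the government.
    displacement_per_dollar: reduction in domestically financed health
        spending for each $1 of aid (0 = no fungibility, 1 = fully fungible).
    """
    domestic_change = -displacement_per_dollar * aid
    return aid + domestic_change


aid = 100.0  # millions of US$, the hypothetical figure from the text
for displacement in (0.43, 1.14):  # the two ends of the range quoted from the paper
    net = net_change_in_government_health_spending(aid, displacement)
    print(f"displacement ${displacement:.2f} per $1 -> net change ${net:.0f} million")
```

At the low end, $100 million of aid raises government health spending by about $57 million; at the high end, spending falls by about $14 million. Neither figure says anything about whether the underlying estimates are causal.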
What is unclear is how much the study carries us beyond these intuitions. It is really hard for cross-country statistical analyses to prove what causes what. Maybe aid is indeed reducing spending by the receiving governments. Maybe governments with lower domestic health spending attract more health aid (reverse causality). Maybe third factors simultaneously affect both variables. One can make parallel conjectures for the finding that DAH increases non-governmental health spending. For more on the fundamental obstacles here, see my November post on CGD's main blog and my Guide for the Perplexed.
Now, the authors in the Lancet well understand the challenges to inferring causation from correlation. That's why they deploy the complex "Arellano-Bover/Blundell-Bond (ABBB) linear generalized method of moments estimators." Behind the veil of complexity, ABBB uses instruments (explained in my November post) to try to sort out causality. The instruments are "lags." That is, roughly speaking, the instrument for DAH today is DAH some years ago, which is assumed to be correlated with government health spending today only via DAH today. My technical paper (published, gated version) explains fully.
That paper and the follow-up Note on the Theme of Too Many Instruments (published, gated version) make some general points that are relevant for reading this Lancet article:
- The researcher has many degrees of freedom in applying ABBB, mostly having to do with the set of instruments. There is no one ABBB estimator.
- These choices matter: they can make the estimates more or less valid; and they can make certain tests of that validity more or less strong. It is unfortunately very easy to generate results that are invalid but test valid.
- The ABBB estimator is based on a particular assumption about the dynamic process that generates the data we observe, referred to in the title of the "BB" paper ("Initial conditions and moment restrictions in dynamic panel data models"). This assumption is a) non-trivial and b) testable (albeit imperfectly).
- Unless the authors report what choices they make and the results of various appropriate tests, it is impossible for the reader to judge how well ABBB is applied. In this article, the reporting is more Spartan than I have ever seen, perhaps reflecting the awkward jump from economics to medicine. We get only coefficients and standard errors.
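To illustrate how much room the instrument choices leave, here is a rough sketch that enumerates the lagged-level instruments for a single variable in the differenced equation. The setup is my assumption, not anything reported in the Lancet article: periods numbered 1 to T, the differenced equation available from period 3, and every lag of 2 or more used. The panel lengths are hypothetical, and the extra instruments for the levels equation are left out.

```python
def difference_equation_instruments(n_periods, collapse=False):
    """Label the lagged-level instruments for one variable.

    Assumes periods 1..n_periods, a differenced equation for t = 3..n_periods,
    and all lags of 2 or more used as instruments.
    collapse=False: a separate instrument for each (period, lag) pair.
    collapse=True:  one instrument per lag distance, shared across periods.
    """
    if collapse:
        return [f"lag {lag}" for lag in range(2, n_periods)]
    return [
        f"period {t}, lag {lag}"
        for t in range(3, n_periods + 1)
        for lag in range(2, t)  # lag t-1 reaches back to period 1
    ]


for n_periods in (5, 10, 15):  # hypothetical panel lengths
    full = len(difference_equation_instruments(n_periods))
    collapsed = len(difference_equation_instruments(n_periods, collapse=True))
    print(f"T={n_periods}: {full} instruments uncollapsed, {collapsed} collapsed")
```

Uncollapsed, the count grows roughly with the square of the number of periods, while collapsing keeps it linear. That is why the number of instruments and whether they were collapsed belong in any write-up.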
In sum, it is very easy for the complexity of ABBB to hide rather than solve the fundamental challenges to identifying causal effects. Simply stating that ABBB has been applied, as in the Lancet, does not meet a reasonable burden of evidence. We cannot judge how well the causal claims are grounded in the data. That seems an important caveat given the rather strong and explosive conclusions.
The authors' online appendix does not fill the gap. It does report results from a separate estimation procedure that grapples in a different way with the challenges of this kind of data ("panel data"). But that one incorporates no effort to instrument in order to pin down causality, so it doesn't speak to the issue I am raising.
I hope that the authors will share their programming code and data. Owen McCarthy and I have submitted requests for code over the last few days, which may bear fruit.
To get technical, here is what I would want to see. All of these can be easily had from my program:
- A precise description of the instruments created (which lags of which variables; whether or not collapsed).
- The number of instruments.
- Whether estimation is one- or two-step, and what kind of standard errors are calculated.
- Results of the relevant Arellano-Bond autocorrelation test(s).
- Coefficients and standard errors for the lagged dependent variable. The coefficient needs to be less than 1 for ABBB to be valid. [Update: I suppose it is, else they could not "equilibrium-correct" their results.]
- Results of the Hansen instrument validity test and a demonstration that the test is not weakened by instrument proliferation.
- Results from difference-Hansen tests of the instruments that are most suspect in ABBB, namely those for the levels equation based on lags of the dependent variable. Analogous checks here for weakening from instrument proliferation.
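On the lagged dependent variable: the sketch below shows why its coefficient matters, under one assumption about what "equilibrium-correct" means. I am assuming the standard long-run multiplier for a dynamic model, in which a short-run coefficient is divided by one minus the coefficient on the lagged dependent variable; the article's exact procedure is not something I can confirm. The numbers are hypothetical, not taken from the paper.

```python
def long_run_effect(short_run_coefficient, lagged_dependent_coefficient):
    """Long-run effect implied by y_t = rho * y_{t-1} + beta * x_t + ...

    Assumed formula: beta / (1 - rho), which is defined only when |rho| < 1.
    """
    if abs(lagged_dependent_coefficient) >= 1:
        raise ValueError(
            "lagged dependent coefficient must be below 1 in absolute value "
            "for a long-run equilibrium to exist"
        )
    return short_run_coefficient / (1.0 - lagged_dependent_coefficient)


# Hypothetical values for illustration only.
beta, rho = -0.2, 0.5
print(f"short-run effect {beta}, long-run effect {long_run_effect(beta, rho):.2f}")
```

With a coefficient of 1 or more, the adjustment never settles and the division makes no sense, which is the point of the update in the list above.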
David Roodman usually blogs at David Roodman's Open Book Microfinance Blog.