[Update, June 29, 2012. The authors of the study here blogged have retracted it. The issue they refer to in the retraction is one I raise toward the end of this post, where I write, "It appears to me that the estimator does not include fixed effects, but merely clusters standard errors by country."]

Have you followed the debate on whether health aid is "fungible," i.e., whether giving money to governments to spend on health leads them to cut their own funding for same, thereby effectively siphoning health aid into other uses? It has been like watching the French Open from a center-line seat. Two years ago, a team of authors mostly affiliated with the Institute for Health Metrics and Evaluation (IHME) in Seattle concluded in the Lancet (gated) that health aid has been highly fungible. Now two physician-scholars at Stanford have reanalyzed IHME's data in PLoS Medicine (quite ungated) and judged the Lancet findings to be spuriously generated by bad and/or extreme data points.

The original paper estimated that for every dollar that developing-country governments received in health-focused aid, they cut health spending from their own resources, such as tax revenues, by $0.43--1.14. Ergo, donors who thought they were investing in health were substantially financing something else unknown---road building? jets for dictators? That conclusion applied as well to the noted private philanthropy that paid for the research. This uncomfortable factoid about fungibility was widely cited in academia and in the press (e.g., the New York Times, where it cameoed, garbled, on the home page).

To the casual reader, the reanalysis by Rajaie Batniji and Eran Bendavid seems damning:

We...demonstrate that prior conclusions drawn from these data are unstable and driven by outliers. While government spending may be displaced by development assistance for health in some settings, the evidence is not robust and is highly variable across countries.

My advice: don't trust the conclusions of either side.

Of the two studies, the original is of higher quality, not least because it embodies the Herculean effort to collect the data that the new study is quick to describe as flawed. It is always harder to create than destroy. Still, as I blogged when the Lancet paper appeared, there are strong technical reasons to doubt its confident claims. Sure, maybe countries receiving more aid for health spend less on it themselves, but what is causing what is less clear. It's that old correlation-is-not-causation bugaboo. The IHME team battled the bugaboo with a fancy mathematical technique, but in a way that I am qualified to doubt as the author of a popular program to implement it.

The Stanford docs' new scrutiny of the IHME health data set is healthy. But their own data undermine their conclusion that the Lancet conclusion is undermined by bad data. (In 2009, my colleague Mead Over took apart another study by Bendavid as tautological.)

You don't need a Ph.D. in statistics to see how. The key table from the new paper is below. It shows the results of four sets of statistical runs, listed from top to bottom. The first set was done on the full IHME database, the next three after omitting certain extreme groups of data points in turn. Within each set of runs, the left side of the table uses health spending data from the IMF and the right from the WHO; the two sources disagree enough to significantly affect results. Also within each set, the first row uses the statistical method favored in the Lancet ("Arellano-Bover/Blundell Bond"---Did I mention I'm an expert on that? Yes, I'm a highly expert expert on that) while the second uses the simpler method favored by Batniji and Bendavid ("Linear, country clustered"). Scan the table, then read my exegesis underneath:

The "coefficient" columns show the central estimates of fungibility. So, e.g., "--0.40" suggests that for every dollar of health aid, receiving governments spent an average 40 cents less of their own money on health.

Notice that the p values in the rows for the IHME-favored "Arellano-Bover/Blundell Bond" method are almost all 0 or 0.001. "p" stands for probability, so these small numbers are saying that if the IHME authors are wrong, if health aid is not fungible, then it is highly improbable that we would see such high estimates of fungibility as are reported to the left of the p values. Turning that around, if you buy the IHME statistical approach, aid is probably fungible, and little in this table changes that conclusion. Contrary to the damning verdict I quote at the outset, the Lancet analysis is robust to the deletion of data points that the Stanford authors deem dubious.
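For readers who want to see where a p value comes from, here is a minimal sketch. It assumes a large-sample normal approximation, which may not be exactly what either paper used, and the standard errors in the usage lines are hypothetical numbers chosen for illustration, not figures from the table.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p value for the null hypothesis that the true coefficient is zero.

    Assumes the estimate is approximately normally distributed (an assumption
    of this sketch, not a claim about either paper's exact procedure).
    """
    z = abs(coef / se)
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


# Hypothetical inputs: a fungibility coefficient of -0.40 with two made-up standard errors.
p_precise = two_sided_p(-0.40, 0.10)  # estimate is 4 standard errors from zero
p_noisy = two_sided_p(-0.40, 0.40)    # estimate is 1 standard error from zero
```

The same coefficient can look decisive or inconclusive depending on how precisely it is estimated, which is why the p columns matter as much as the coefficient columns.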

What does change the numbers---producing lower fungibility estimates and higher p values---is the switch to the "Linear, country clustered" method that the authors of the new study prefer. So the big divide within this table is not between its top quarter and the rest, but between the "Arellano-Bover/Blundell Bond" rows and the "Linear, country clustered" rows.

That is, what breaks the fungibility finding is not the changes to the data sample, but the change to statistical method. The authors have interpreted their results backwards.

To go slightly technical for a moment, there are other problems, which together persuade me that Batniji and Bendavid have more to learn about short-panel econometrics than the IHME team did. The argument for fixed effects over random effects reads like a non sequitur, focusing on whether nations are diverse instead of whether their diversity fits a bell curve. The authors do not explain why their statistical approach is superior, thereby seeming to confuse issues of data, which they do discuss, with issues of method, which they essentially do not. Their preferred method is described in just one sentence, which contains a phrasing, "country fixed effects, clustered by country," that I have never seen before, and I think reveals confusion. It appears to me that the estimator does not include fixed effects, but merely clusters standard errors by country. (A quick e-mail exchange with Batniji has strengthened my guess that Ordinary Least Squares was done without fixed effects, but I'm still not certain.)
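The distinction between including fixed effects and clustering standard errors can be made concrete with a small simulation. What follows is a minimal sketch on made-up data, not a reconstruction of either paper's analysis: the data-generating process, the true coefficient of -0.4, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def simulate_panel(n_countries=200, n_years=8, beta=-0.4, seed=0):
    """Made-up panel in which aid (x) is correlated with a fixed country trait (mu)."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=n_countries)
    x = 0.8 * mu[:, None] + rng.normal(size=(n_countries, n_years))
    y = beta * x + mu[:, None] + rng.normal(size=(n_countries, n_years))
    return x, y


def pooled_ols(x, y):
    """Pooled OLS slope with classical and country-clustered standard errors."""
    n, t = x.shape
    X = np.column_stack([np.ones(n * t), x.ravel()])
    yv = y.ravel()
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ yv
    resid = yv - X @ b
    # Classical standard error: assumes independent, equal-variance errors.
    se_classical = np.sqrt(resid @ resid / (n * t - 2) * XtX_inv[1, 1])
    # Clustered standard error: sum the scores X'u within each country first.
    scores = np.einsum("ntk,nt->nk", X.reshape(n, t, 2), resid.reshape(n, t))
    V = XtX_inv @ (scores.T @ scores) @ XtX_inv
    se_cluster = np.sqrt(V[1, 1])
    return b[1], se_classical, se_cluster


def within_estimator(x, y):
    """Fixed-effects slope: demean each country's series, then regress."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return (xd * yd).sum() / (xd ** 2).sum()


x, y = simulate_panel()
ols_slope, se_classical, se_cluster = pooled_ols(x, y)
fe_slope = within_estimator(x, y)
```

In this sketch, clustering changes only the standard error attached to the pooled OLS slope; the slope itself is untouched and stays far from the true value because the country trait is left in the error term. Only the within (fixed-effects) estimator removes that trait and recovers something close to -0.4.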

Most importantly, if my educated guess is right, then the authors appear unaware that their favored method is known to be biased. Dynamic panel bias is exactly what motivated Manuel Arellano and Stephen Bond to design their famous alternative. (See page 4 of this paper by Bond.) In the case at hand, the naive method will overestimate the ability of last year's domestic health spending to predict this year's, which will leave less room for variables such as health aid receipts to explain that outcome. OLS without fixed effects can be expected to underestimate the negative relationship between health aid receipts and governments' own health spending, which is consistent with what we see in the table above. [Update based on further correspondence: my educated guess about them not actually using fixed effects was right. But my guess about the direction of bias was wrong because they did not actually include the lagged dependent variable.]
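Dynamic panel bias is easy to reproduce in a simulation. The sketch below uses made-up data with an assumed persistence of 0.5 and hypothetical function names; it does not implement the Arellano-Bond estimator itself, only the two naive estimators whose biases motivated it.

```python
import numpy as np


def simulate_dynamic_panel(n=500, t=6, rho=0.5, burn=50, seed=1):
    """Made-up panel: y_it = rho * y_i,t-1 + mu_i + e_it, returned with shape (n, t + 1)."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=n)
    y = mu / (1 - rho)  # start each country at its own long-run mean
    path = []
    for _ in range(burn + t + 1):
        y = rho * y + mu + rng.normal(size=n)
        path.append(y)
    return np.array(path[burn:]).T


def slope(x, y):
    """Simple regression slope of y on x, with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)


def estimate_rho(panel):
    """Estimate the persistence rho two naive ways."""
    lag, cur = panel[:, :-1], panel[:, 1:]
    # Pooled OLS: the country effect sits in the error and is correlated with the lag.
    pooled = slope(lag.ravel(), cur.ravel())
    # Within (fixed effects): demeaning over a short panel correlates the
    # transformed lag with the transformed error.
    lag_w = lag - lag.mean(axis=1, keepdims=True)
    cur_w = cur - cur.mean(axis=1, keepdims=True)
    within = slope(lag_w.ravel(), cur_w.ravel())
    return pooled, within


pooled_rho, within_rho = estimate_rho(simulate_dynamic_panel())
```

With these settings, pooled OLS should land well above the true 0.5 and the within estimator well below it, which is the sandwich that instrument-based estimators in the Arellano-Bond family are designed to escape.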

When I blogged two years ago, I reported that the Lancet authors had not responded to a request for their data and computer code. In effect, and as is still the norm in academia, they asked the world to trust their results---indeed, to use them to shape policy involving billions of dollars and millions of lives---while they kept the precise method behind those results secret.

This new article sits on the servers of the Public Library of Science, which aims to break the centuries-old lock of the journals on scientific research. Now, anyone can be a reviewer. Check out the dozen comments I've peppered throughout the web version of the article. (Look for the little "1" icons. Did I go overboard?)

It appears to me that the old and new-style publishers have both let slip into their pages articles with problematic methods---or at least methods whose results are over-relied upon---probably for lack of reviewers expert in short-panel econometrics. But at least with the PLoS article, the crowd can flag the low quality, and even correct it at the margin (and in the margins).

In January, I gave a talk on my new microfinance book at the IHME. Toward the end, I described CGD's new data and code transparency policy, which calls on our researchers to publicly post all the data sets and the lists of computer commands (code) needed to reproduce our statistical results. I argued that the IHME could learn from this policy. My hosts were extremely gracious, and soon sent me the data and code for the Lancet paper. Now, two years later, I am engaged in a constructive, private dialog with them that I will say more about when the time is right.

Meanwhile PLoS, unlike the Lancet, has a policy that calls for data sharing in letter and code sharing in spirit. I mentioned that I'm not yet 100% certain about the methods in the new paper. But thanks to that policy, I'm optimistic that, much more quickly this time, I'll get to the bottom of the latest research on health aid fungibility.