David Roodman, a former CGD senior fellow, worked at the Center from March 2002 to July 2013. His work at the Center focused on microfinance, debt relief, and aid effectiveness. His widely praised book Due Diligence confronts questions about the impacts of microfinance and how it should be supported. He wrote the book through a pathbreaking Microfinance Open Book Blog, where he shared questions, discoveries, and draft chapters.
Roodman was an architect and manager of the Commitment to Development Index from the project's inception in 2002. The Index ranks the world's richest countries based on their dedication to policies that benefit the 5 billion people living in poorer nations; it is widely recognized as the most comprehensive measure of rich-country policies towards the developing world.
Roodman wrote several papers questioning the capacity of common cross-country statistical techniques to shed light on what causes economic development. He co-authored a 2004 American Economic Review paper that challenged findings of World Bank research that aid works in a good policy environment. His non-technical Guide for the Perplexed builds on analysis of methodological problems and fragility in other studies. Among econometricians Roodman is best known for his computer programs that run in the statistical software package Stata; articles about them won him the inaugural Stata Journal editors' prize in 2012. Also in 2012, Roodman aged off the RePEc list of top young economists in the world, at number 6.
If I had had the stamina, I would have inserted into my book a chapter on the history of the microfinance movement.
Central to that story would have been the work of Sam Daley-Harris. He was drawn into activism in the late 1970s by the, well, cult-like est movement and associated Hunger Project, which had the grand goal of eliminating world hunger by 1997 (and apparently didn't do much practical to achieve the goal). In 1980, Sam founded the more practical, and at least as disciplined, grassroots lobbying group Results, which in the early 1980s persuaded the U.S. Congress to increase funding for Oral Rehydration Therapy and the UN's International Fund for Agricultural Development (IFAD). These were remarkable feats for a mostly-volunteer, start-up nonprofit.
As it happened, IFAD was an early funder of the Grameen project in Bangladesh. An IFAD documentary on Grameen brought microcredit to Sam's attention in 1985. He later introduced Muhammad Yunus to the American Congress and press. As I recall, a story in the Christian Science Monitor led to a segment on 60 Minutes in 1990. Yunus was catapulted into fame in the US and beyond. Sam matched that publicity coup by organizing the Microcredit Summit in Washington, DC, in 1997, whose headliners included First Lady Hillary Clinton and Bangladesh Prime Minister Sheikh Hasina. (I think FINCA founder John Hatch first suggested the Summit.) At the event, a grand goal was announced: bringing microcredit to 100 million by 2005.
The Summit and the ensuing, permanent Microcredit Summit Campaign have been a major force behind the global microfinance movement, combining savvy publicity with behind-the-scenes lobbying for funding. Each year the MCS has released a painstakingly collected tally of microloans worldwide. After the 100 million goal for 2005 was met, the campaign issued two goals for 2015: reaching 175 million of the poorest with microcredit and lifting 100 million people out of extreme poverty (under $1.25/day).
I enjoyed the report. It is a short, clearly written sampler of current thinking in the microfinance world, covering such topics as economic psychology and "graduation programs," which assist the poorest with a package of services and assets that will, it is hoped, reduce their poverty and prepare them for microfinance.
The most striking finding is that in 2011, for the first time, the number of microloans fell---and sharply. The graph is below. Most of the decline happened in Andhra Pradesh, but also some in Bangladesh (which I don't know how to reconcile with the Mix Market's finding of little change in 2011).
Yet the report's cover and title de-emphasize this arresting drop. The title, "Vulnerability," connects to a quote in the report from Indian microfinance veteran Vijayalakshmi Das. Identifying with microfinance institutions (MFIs), she said, "we are as vulnerable as our clients." That is, MFIs will be fragile as long as they prey upon the vulnerabilities of the poor. They lost sight of their clients' interests and paid the price. It's a compelling formulation, and resonates as we try to understand what went wrong in Andhra Pradesh, and why the global total fell. But I wonder if it is the whole truth. In the U.S., payday lenders seem to be making a steady profit off a clientele whose finances are far less steady. In Mexico, so is Compartamos.
Broadly, the interests of merchant and customer conflict on some margins and coincide on others. Many MFIs could gain by raising interest rates, which would hurt clients. Yet clearly MFIs will go bankrupt if all their clients do.
None of that is new. MFIs have always said that they were succeeding by helping their clients succeed. What is new is that in India, it came to appear that they were not, in some cases. So what we need to learn from India is not that MFIs should remember their clients' interests, but how the MFIs came to collectively work against the interests of some of their clients.
An implication of the vulnerability theme is that some of the loans in Andhra Pradesh, and presumably elsewhere, were harmful. With microcredit, more is not always better. That cuts against the historical messaging of the MCS, with its goals and tallies and its fusing of a goal for outreach with one for poverty reduction. The general impression has been that expansion is urgent and tantamount to poverty reduction.
The text thoughtfully covers both mobile money and graduation programs. I found the discussion subtly asymmetric, cautious about high tech and more hopeful that graduation programs represent the best path for MFIs wanting to reduce the vulnerability of clients and themselves. One need not choose sides, and the report properly captures many nuances in these topics. But I'd argue that if anything the report gets things backwards. Mobile money is delivering useful financial services to tens of millions of people in a businesslike way. In my view, that is true to the spirit of microfinance. As for graduation programs, they appear in randomized trials to be helping people too. This is great. But to my limited knowledge, it's not clear that the financial services that are part of these packages or become available after are key to this impact. So is it perhaps provincial to call them "graduation" programs, if by that is meant graduation to microfinance? Maybe giving very poor people carefully chosen goods and services is a good way to help them, regardless of whether it leads to greater microfinance use. Would we describe the U.S. food stamp program as a graduation program meant to increase financial access?
I think microfinance has succeeded mainly by streamlining, not adding expensive extras. If adding the extras helps people and donors are willing to fund it, that's excellent. Whether it's microfinance, I don't know. So while I applaud the pragmatic creativity of BRAC in developing the approach and the rigor of Ford and CGAP in evaluating it, I fear that casting it as a species of microfinance at once belittles it and exaggerates its promise for mainstream MFIs.
As usual, it is the friction of disagreement that most excites me to write. Yet there is much I like in the report and I encourage you to peruse it.
[Note: This post has been revised to reflect information from a Paris Club press release and conversations with knowledgeable officials.]
In the last few days, a delicate dance of reconciliation between Myanmar and its estranged foreign creditors reached its final measures. At the Club de Paris---the collective negotiating forum for creditor governments such as Japan and the United States---a press release just announced a debt deal with the poor and long-isolated Asian nation. The creditors committed to what is by Paris Club standards an exceptionally generous deal: cancelling half the debt in arrears---Myanmar defaulted in 1998---and instituting a 15-year repayment schedule for the remainder, including a 7-year grace period. Because the interest rates on most of these loans are low, typically about 1%, this stretching out of repayment further reduces the debt's economic cost ("net present value" or NPV). Overall, the NPV will fall 60%. Meanwhile the World Bank and Asian Development Bank made their first loans to Myanmar in more than 20 years, in the process erasing their own arrears issues with the country.
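The arithmetic behind that kind of NPV cut can be sketched. The numbers below are purely hypothetical---$100 of arrears surviving the 50% cancellation, a 1% contractual rate, a 5% market discount rate---chosen only to illustrate how a partial write-off plus a long, low-interest repayment schedule compounds into a deeper cut in present-value terms:

```python
def npv(cash_flows, rate):
    """Present value of (year, payment) cash flows at a given annual discount rate."""
    return sum(pay / (1 + rate) ** year for year, pay in cash_flows)

# Hypothetical figures for illustration only.
face_value = 100.0      # arrears left after the 50% cancellation
contract_rate = 0.01    # ~1% contractual interest on the old loans
discount_rate = 0.05    # assumed market discount rate

# Rescheduled stream: 7-year grace period (interest only),
# then equal principal installments in years 8 through 15.
flows = []
outstanding = face_value
for year in range(1, 16):
    principal = face_value / 8 if year > 7 else 0.0
    flows.append((year, outstanding * contract_rate + principal))
    outstanding -= principal

stretched_npv = npv(flows, discount_rate)
# Total cut: half cancelled outright, plus the NPV loss on the stretched half.
total_cut = 1 - 0.5 * stretched_npv / face_value
print(f"NPV of rescheduled half: {stretched_npv:.1f}")
print(f"Overall NPV reduction:   {total_cut:.0%}")
```

With these made-up parameters the rescheduled half is worth roughly two-thirds of its face value, so the combined cut lands in the same neighborhood as the Paris Club's 60% figure; the exact number depends entirely on the discount rate assumed.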
With aid relationships normalized, the gates---floodgates?---are now open for donors into Myanmar.
[As the government gains legitimacy with the international community, the case strengthens for accepting the government's chosen English name for the country: Myanmar. So even though Aung San Suu Kyi and the US and UK governments still favor "Burma," I'm switching.]
This graph summarizes the recent transactions and commitments. Below, I explain the numbers and put them in some context:
As I recounted in an earlier post, Myanmar's military government fell out of favor with industrial democracies and multilateral development banks in 1988 after it brutally suppressed democracy protests. Aid flows slowed to a trickle, which came mainly from Japan. Myanmar paid the interest and principal due on its old debts to donors until 1998, perhaps hoping for an improvement in relations, but then slipped into default. Recent economic and political reforms have made the great powers desirous of détente: Obama visited right after winning reelection, and Japan promised substantial debt relief. The donors want to reward the government's difficult reforms and improve their own position within the geopolitical rivalries among the United States, Japan, China, and even India.
One obstacle to reconciliation was the clearance of Myanmar's arrears. Formally, for example, the development banks cannot lend to nations that are in default with them. I argued last summer that while debt relief would certainly not harm Myanmar, neither was the case for it that compelling. The country's debt burden was modest by most standards. And famed opposition leader Aung San Suu Kyi had called for foreign governments to suspend but not end economic sanctions on Myanmar, such as on trade and visa issuance; the threat of easy reinstatement, in her judgment, would spur further reform. The analogous step in the debt dance was to refinance defaulted loans rather than cancel them. Just as sanctions can be permanently abolished later, perhaps in order to reward further reform, so can debts be.
Events have exposed my dichotomy between debt refinancing and debt reduction as artificial. Some debts have indeed been refinanced rather than cancelled, but in such a way as to cut their net present value. So refinancing and reduction, rather than being mutually exclusive choices, went hand in hand. This possibility should have been obvious to me. I'm refinancing my mortgage now, which won't reduce my debt but will save me money.
The fresh deals have also sidelined my contention that deep debt relief was not needed. Events have gone otherwise than I envisioned for two reasons. First, the IMF learned that Myanmar was significantly more indebted than it estimated last May in a report that I leaned on. The IMF estimate for the stock rose from $12.4 billion (page 22) to $15.3 billion (note 2). Probably most of these newly discovered debts were penalties from creditors for missing payments. Myanmar didn't pay those penalties, so the creditors tacked them onto the debt stocks. And in many cases, Myanmar didn't even track them, which is why the IMF, relying on Myanmar's data, left them out of last May's debt assessment. Unfortunately, the government has not yet permitted the IMF to publish its revised analysis. Second, creditors are eager to improve relations with Myanmar for foreign policy reasons beyond debt sustainability. So eager are they that the Myanmar deal is generous by the creditors' own standards, offering a larger cut than is normally given for countries at Myanmar's (moderate) level of debt distress. Notably, the Paris Club press release makes no reference to the standard terms of treatment.
I had argued against such generous relief now, citing Nobel Peace Prize winner Aung San Suu Kyi, as giving away too much too soon. Why not condition debt relief on major additional reforms of the political or economic system? In fact, though not made clear in the Paris Club press release, the creditors have formally conditioned their write-offs on modest economic reforms occurring in 2013. The IMF negotiated and will oversee these conditions, under a <a href="http://www.imf.org/external/pubs/cat/longres.aspx?sk=40248.0">Staff-Monitored Program</a> (a type of arrangement that is distinctive in not involving any lending by the IMF). The five listed reforms (page 32) are no doubt important. Myanmar will, for example, "prepare regulations to introduce treasury securities auctions" in order "to facilitate market-based deficit financing." Still, these conditions for reengagement feel pallid next to the events that triggered the break-up in 1988, namely the killing of hundreds of protesting students. The contrast became more poignant when, the day after the debt deals were announced, a different kind of news emerged from Myanmar: lab tests showed that the military used phosphorus on protesting monks and villagers in November. If nothing else, this situation illustrates the complex interplay in diplomacy between morality, geopolitics, and development concerns.
Two governments are plotting a different course in managing their Myanmar credits. Norway has agreed to cancel all its claims on the country. And Japan, the largest of Myanmar's Paris Club creditors, had already committed to a 60% cut---but according to its own terms and timing.
Intertwined with the Paris Club deal is a resolution of Myanmar's arrears to the World Bank and Asian Development Bank (ADB). The World Bank announced today that it had lent $440 million, and the ADB announced $512 million. The two credits total $952 million.
The real significance of the transactions lies not in that impressive sum, which is mostly illusory, but in the normalization of relations with the donors, which will lead to more loans and grants. Why is the $952 million illusory? Not by coincidence, the loans almost exactly match Myanmar's arrears to the same lenders (page 13): $436 million to the World Bank and $517 million to the ADB. That is $953 million altogether. Or was. On or soon after January 17, Myanmar paid off those arrears with the proceeds of the new loans. Clearing the arrears then allowed the development banks, by their own rules, to lend to Myanmar...
Sort of. Did you notice the paradox? Myanmar didn't pay the overdues until it got new loans. And the banks couldn't make the new loans until Myanmar paid the overdues. So which came first? As I explained before, what actually happened is that the government of Japan made a bridge loan. It lent Myanmar $950 million or so. Myanmar passed the money to development banks to clear the arrears. The development banks quickly disbursed new loans of similar size. Myanmar repaid the Japanese loan, perhaps along with a fee for the service. Perhaps $20 million is now left over for the other avowed purpose of the World Bank and ADB credits, which is to provide technical assistance to help the government "revamp the national budget process and modernize tax administration" and otherwise strengthen governance and the business climate.
To my knowledge, this is the first time the ADB has skirted its own prohibition on lending to its delinquents. The World Bank has conducted at least seven such pragmatic transactions in the last decade. Usually its descriptions of the deals have carried an Orwellian tinge, as proceeds from loans for "poverty reduction" circle back to the Bank within hours. In that light, the first word in the Bank's description of the Myanmar loan, "Reengagement and Reform Support Program," is refreshingly honest. (The term appears to have been first used under similar circumstances with Liberia.)
And given the dangers of an aid avalanche in Myanmar (see: Haiti), it may also be for the good that the net proceeds from these first loans are modest. A "rush for the entrances" by dozens of well-meaning donors could exceed their capacity and the government's to manage the aid productively.
On CGD's main blog, Julia Clark and I just posted a ranking of noted American think tanks based on their ability to generate public profile: press mentions, academic citations, web traffic, and social media followers. The effort is aimed at providing some healthy methodological competition for another ranking of think tanks, this one looking at institutions around the world, which experts have mostly criticized. If the criticisms are right, the other index may be distorting funding decisions and think tank behavior from Croatia to Kenya.
This Thursday, the World Bank will host the unveiling of the latest edition of the best-known ranking of think tanks, which is produced by the University of Pennsylvania. The public event will reveal whether the Brookings Institution has lost its hold on "Think Tank of the Year," which tanks made the top 50 worldwide, which are best in Latin America, and so on.
As with the Oscars, the verdicts of the Global Go To Think Tank (GGTTT) Index are rendered on the basis not of performance measurement, but on the perceptions of those in the business, in this case hundreds of journalists, policymakers, and think tank employees. That approach may be one reason expert perceptions of the GGTTT index itself have tended to be highly critical (here, here, here, here, here). Among the concerns: the opacity of the ranking process, the inclusion of institutions that are not think tanks in any usual sense of that term, and fundamental doubts about what it means for a member of such a diverse class to be "the best." Yet there's no doubt the results turn heads each year. If the criticisms from think tank experts are right, then the GGTTT may be distorting the behavior of think tanks, as they strive to raise their standing on dubious metrics, as well as misleading think tank funders.
The combination of continuing criticism and continuing interest made us wonder: could we do better, or at least illustrate the possibility of doing better? Last November we posted indicators of think tank "profile", looking at how often a tank's work is reported, cited, downloaded, or followed. Here, after tweaking and updating the indicators, we blend them into a single index for ranking, which we easily do by drawing upon methods we honed over 10 years for the Commitment to Development Index.
Let us be clear: CGD does not intend to enter the tank-ranking business long-term. As a think tank, we have a dog in this fight and lack the necessary objectivity. Indeed, we acknowledge that our focus on public profile may bias the results in CGD's favor, since public outreach is central to our strategy. Moreover, as we wrote last time, web page hits, media mentions, and scholarly citations are just a subset of the characteristics that can make a think tank effective. Thus the "profile" in our title. Some tanks succeed precisely by flying below the radar. Our purpose is to stimulate and improve the discourse around think tank performance.
These caveats notwithstanding, like many in the think tank community, we are data geeks, committed to the use of objective evidence in our research. We believe that any effort to rank the tanks should begin by gathering the best available data. In the spirit of offering some much-needed competition to the GGTTT, let’s take a closer look at the data.
We start with an update of the last post's first table, covering noted US institutions. Major changes in our data collection include a switch from Google News to Nexis for media mentions counts, the latter having proved more stable from week to week; the use of academic citations of papers published in 2010 instead of 2012, since 2012 is too recent for much mention to have accrued; and the exclusion of Human Rights Watch and NBER as not being think tanks despite being listed as such in the GGTTT:
Indicators of aggregate profile for major US think tanks
Next we turn those numbers into scores by multiplying each performance indicator column by a scaling factor chosen so that an average performer scores exactly 5 (and a twice-average one gets a 10). This puts most scores in the intuitive 0--10 scale. Finally, we average the five categories to get the overall scores in the last column of the table below.
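In code, that scaling step is just dividing each indicator column by its mean and multiplying by 5, then averaging across columns. A minimal sketch (the indicator names and values below are invented for illustration, not our actual data):

```python
def score(indicators):
    """Rescale each indicator column so an average performer scores 5
    (and a twice-average performer scores 10), then average the
    columns into one overall score per institution."""
    n = len(next(iter(indicators.values())))
    scaled = {}
    for name, column in indicators.items():
        mean = sum(column) / len(column)
        scaled[name] = [5 * value / mean for value in column]
    overall = [sum(col[i] for col in scaled.values()) / len(scaled)
               for i in range(n)]
    return scaled, overall

# Hypothetical raw indicators for three imaginary institutions.
raw = {
    "media mentions": [120, 300, 180],
    "web visitors":   [40_000, 90_000, 20_000],
}
scaled, overall = score(raw)
```

Changing the weights, as the accompanying spreadsheet allows, amounts to replacing the simple average in the last step with a weighted one.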
The conservative stalwarts Cato and Heritage perform best in social media and web presence, while Brookings leads in media mentions and scholarly citations---but as we'll explain in a moment, we don't view these rankings as our most useful results, preferring to order another way:
Scores of aggregate profile for major US think tanks (5 = average)
(In the spreadsheet, you can change the weights on the five indicators.)
The next two tables are like the first two except that they divide all performance indicators by annual spending: not total media mentions, for example, but media mentions per dollar of budget. These results seem more practically relevant because a donor of $100,000 is probably less interested in an institution's aggregate profile than in its ability to build a following per dollar spent. By that measure, the Cato Institute, the Pew Research Center, and the Peterson Institute have been most effective:
Indicators of profile per dollar spent for major US think tanks
Scores of profile per dollar spent for major US think tanks (5 = average)
Ranking by profile per dollar spent of major US think tanks
These standings differ from those of the Global Go To Think Tank index. Cato is third in aggregate profile (second table above) and tops in profile/dollar (fourth table and graph above) but only 14th on the 2011 GGTTT. Last year's "Think Tank of the Year," Brookings, scores just 5.5 (5 being average), placing it 7th out of 18.
To defend the GGTTT against this seeming contradiction by real data, one could argue that the GGTTT index covers more than public profile: it encompasses any performance attribute the raters think of, such as timeliness or creativity of policy proposals. Recognizing this argument, for a more meaningful comparison, we plot our results against corresponding rankings in the GGTTT report.
This next chart compares the average of our first three indicators (per-dollar social media fans, web traffic, and incoming links) with an institution's rank on the GGTTT list of "Think Tanks with the Best Use of the Internet or Social Media to Engage the Public." (It drops tanks that don't make the GGTTT list.) If the two approaches agreed, we would see the dots cluster roughly along a line from bottom left (low score from us, high rank number from GGTTT) to top right (high score, low rank number such as #1). Some do that, but others are far from the diagonal line of good agreement:
Internet/Social media: GGTTT rank vs. CGD index
This is a similar plot for print and electronic media; it compares our media-mention scores per dollar to the GGTTT "Think Tanks with the Best Use of the Media (Print or Electronic) to Communicate Programs and Research." Here the correlation is even weaker:
Print/electronic media: GGTTT rank vs. CGD index
Our small exercise falls short of an objective ranking of think tanks. Still, we can’t help but believe that expert and popular understanding of think tanks would improve substantially if some of the energy currently put into cajoling hundreds of experts to rate thousands of institutions each year were redirected into collecting and analyzing objective, empirical measurements of think tank performance.
A spreadsheet with all results shown above, plus the same for the GGTTT's "international development" think tanks, is here. Perhaps some think tank funders will want to pursue this work. If so, your comments below could contribute to their own thinking about think tanks.
CGD senior fellow David Roodman has won the inaugural Stata Journal Editors’ Prize for what the editors termed “two outstanding papers:” How to do xtabond2: An introduction to difference and system GMM in Stata and Fitting fully observed recursive mixed-process models with cmp. Each article describes how to use a computer program he wrote to extend Stata, a widely used statistical toolset. Roodman’s papers have been cited thousands of times. The prize citation concludes:
“David Roodman has provided to the Stata community:
• Excellent programs substantially extending the functionality available to users
• Programs that incorporate innovative and sophisticated Mata programming
• Excellent accompanying articles in the Stata Journal that not only explain the programs but also are excellent free-standing pedagogic pieces in their own right”
CGD president Nancy Birdsall congratulated Roodman, noting that the prize is “a welcome reminder that at CGD we have found a way to attract world-class scholars who bring rigor as well as passion to this century's crucial challenge: reducing poverty and inequality in the world.”
Roodman explains his work and discusses his reaction to winning the prize in the Q&A below.
What is the “Stata Journal Editors’ Prize”?
Stata is software for doing statistics. It’s popular in academia, especially in the social sciences, which is why CGD uses it. StataCorp’s strategy is interesting: as a for-profit corporation it has worked to build a public, free-software ecosystem on its platform. People write and share add-on packages for it. Probably the strategy works well in the academic market because for researchers, reference to one’s work by others is a valuable professional commodity. To further reward contributors to this community, StataCorp sponsors an academic journal that publishes articles about user-written programs and the theory behind them. For the same reason, the journal this year inaugurated an annual award for the best contributions.
So why did you win the inaugural prize?
Partly out of luck. My two articles hit the journal in 2009 and 2011, the years that bracket the period for the inaugural prize. Aside from that, I think the editors appreciate the contributions for the professionalism I brought to the computer programming and the pedagogy I brought to the write-ups. I am a mathematician and programmer first (my degree is in math) and a social scientist second. I think that most contributors are the other way around. For good mathematicians and programmers, doing things right is paramount. Elegance is akin to truth. I have strived to make my programs elegant, powerful, and flexible. I’ve also tried to respond quickly to suggestions from users, in the spirit of continuous improvement. Meanwhile, I thrive on teaching the complex things I’ve figured out to others. That’s the common thread from this abstruse mathematical work to my broad book on microfinance to the Commitment to Development Index.
Why did you write the new programs for Stata?
I arrived at CGD in 2002 knowing little about the application of statistics to social sciences, what is called econometrics. One of my first projects was to help Senior Fellow Bill Easterly reconstruct an important study that found that foreign aid sped economic growth in countries with good economic policies. It was a powerful finding, which had influenced my own understanding. But when we added more years of data the key finding disappeared. This experience made a strong impression on me. It taught me that replication is a great way to learn econometrics. And it taught me that econometric work can be a black box that fools even the people who do it.
You are most known in the Stata community for “xtabond2.” What is that?
I wrote it in 2004 to implement a statistical method not then available in Stata. Called System GMM, it is used on data sets generated by observing many individuals—people, firms, or countries—a few times. The case at hand: data from some 100 countries in the six five-year periods between 1970 and 2000 on foreign aid and various economic and political indicators. The method makes a particular attack on a central econometric challenge: inferring causation from correlation. That is to say, statistics are only really good at telling us about patterns, such as whether faster-growing countries receive more aid. It’s much tougher to figure out which way the causal arrows go: is aid making countries grow faster or is faster growth merely attracting more aid? Ironically, the upshot of my making it easier for people to do System GMM was a distrust of studies that use it. Still, researchers have to make the best of the data they have, and sometimes the best is System GMM, so after I completed the CGD working paper about xtabond2 that became a Stata Journal article, I wrote A Note on the Theme of Too Many Instruments, to promote better use of it.
And what is your other program, “cmp,” for?
Econometrics is most natural when the outcomes studied vary across a wide range of numerical values. Examples are household net worth and IQ. But for some outcomes the range is hemmed in—you can’t borrow a negative amount of money—or discontinuous—you can’t be a little bit pregnant. And sometimes researchers want to analyze the determinants of several such variables at once. In the example that inspired the command, Mark Pitt and Shahidur Khandker set up a model in which several factors determine how much microcredit a Bangladeshi household would borrow, which in turn could affect whether, say, a child was in school, which is a binary characteristic like pregnancy. They modeled in two stages in an attempt, again, to surmise causation from correlation (explained here). They performed the complex math using custom code now reportedly lost. So I wrote cmp to fit this model and more. It is a way to mix and match models for these sorts of outcomes. It’s flexible, a sort of smartphone for Stata. It’s been used to study everything from inequality in health in the U.S. to the effect of remittances on poverty in Ghana.
What does the award mean to you?
I started programming when I was 12, on a mainframe computer under my father’s tutelage. At 20, I interned at Microsoft for a summer. It was fun, but I wanted to do something more meaningful than beating another software maker and parking an expensive car outside the office building. So I long struggled to meld my aptitudes to a larger purpose—as a teacher put it, to program for the revolution. In addition, my path at CGD has been unorthodox. I have learned econometrics by coding it, in order to engage in economics not as a producer, as a graduate economics program would have made me, but as an annoyingly demanding consumer. I wrote the programs in order to rerun and scrutinize important studies on the impact of foreign aid and microfinance. I felt this depth of review was necessary to inform my judgments on the substantive matters. But despite the sense of compulsion, I often questioned my judgment about my choice of direction. It is gratifying now to see that my work is of some use to others.
Editor’s Note: See also Ich bin ein Über-Geek
Roodman’s formal education ended in 1990 with a Bachelor’s degree in theoretical mathematics from Harvard College. After years at the Worldwatch Institute and on a Fulbright in Vietnam, he arrived at CGD in 2002 knowing little about econometrics. He discovered that a great way to learn econometrics is to code it. His contributions to the Stata community since then were motivated by a desire to replicate and scrutinize complex, influential studies in development economics, which led him to write xtabond2, cmp, and other packages; and motivated by a pedagogic bent, which led him to document the packages and their mathematics in the Stata Journal. He is the author of Due Diligence: An Impertinent Inquiry into Microfinance (Roodman 2012).
David's prize is a welcome reminder that at CGD we have found a way to attract world-class scholars who bring rigor as well as passion to this century's crucial challenge: reducing poverty and inequality in the world.
One popular statistical software package in academia is Stata. CGD has always used it, and thus so have I. As my colleague Mead Over pointed out, Stata's business model is an interesting mix of private and public goods provision. The private corporation profits by cultivating a public free-software community on top of its core product. Stata sells you the main program, which includes commands to perform all sorts of analyses. People outside the company write add-on commands and share their code freely, all in return for...the satisfaction and prestige of seeing others use their work.
Stata has worked over the years to reward such sharing. One step was the founding of the Stata Journal (and before that the Stata Technical Bulletin) to give academics a venue to leverage their coding labors into career-boosting publications. A more recent step was the institution of an annual award for the best contributions to that journal in the previous three years. The first award was announced today. The recipient is: me.
The prize is awarded to David Roodman specifically for two outstanding papers in this journal:
How to do xtabond2: An introduction to difference and system GMM in Stata (Roodman 2009b)
Fitting fully observed recursive mixed-process models with cmp (Roodman 2011)
The titles alone are exciting, I know! Ungated CGD versions are here and here. My two papers were fortuitously timed for the period of this first award.
Born in Indianapolis, Indiana, in 1968, Roodman grew up in Hanover, New Hampshire, and Binghamton, New York.
I am humbled and happy about the award.
I wrote the first program as part of my appraisal of the literature on whether foreign aid causes economic growth. (See the technical Anarchy of Numbers and this non-technical guide for the perplexed.) At the encouragement of David Drukker of StataCorp, I then wrote my paper about the program.
As I blogged in 2009, the second paper documents a program I wrote in order to replicate the Pitt & Khandker study of the impact of microcredit in Bangladesh. It's the most beautiful program I've written. Philosophers argue about whether mathematical ideas are discovered or invented. The concept of this program, cmp, is so elegant that I feel like it was there waiting to be discovered.
I echo the end of the award announcement:
As editors, we are indebted to...a necessarily anonymous nominator for a singularly lucid and detailed précis of Roodman’s work.
I've followed an unusual track at CGD, teaching myself econometrics by coding it, doing so primarily in order to be an annoyingly demanding consumer of econometrics, trying to decide which studies to believe, abrading some people along the way. I feel fortunate to have had the opportunity to grow and contribute in this way, to realize my peculiar latent potential. But especially in the early years, I also felt bashful about this strange path---real economists are trained to produce research, not just to be annoying consumers of it---even as I felt compelled to cut that path. The validation is appreciated.
Of course, writing cool code is not the same as improving lives, which is CGD's reason for being. I only hope that the tools I have made, through their use in the hands of others, have in some small way advanced social science, especially as it relates to helping the world's poor.
CGD working paper 26, "New Data, New Doubts: Revisiting 'Aid, Policies, and Growth,'" by CGD non-resident fellow William Easterly, research fellow David Roodman, and Ross Levine (also published as "Aid, Policies, and Growth: Comment" in the American Economic Review, June 2004), concludes that the Burnside and Dollar (2000) finding that aid raises growth in a good policy environment is not statistically robust. The accompanying dataset is a panel of four-year periods covering 1966–97. It includes all the Burnside and Dollar data as well as Easterly, Levine, and Roodman's expanded data set.
The Burnside and Dollar (2000) finding that aid raises growth in a good policy environment has had an important influence on policy and academic debates. We conduct a data gathering exercise that updates their data from 1970-93 to 1970-97, as well as filling in missing data for the original period 1970-93. We find that the BD finding is not robust to the use of this additional data. (JEL F350, O230, O400)
[Update: To compare Compartamos to its Mexican peers on price, read this next.]
I have told how a question last fall from filmmaker Tom Heinemann prompted me to measure the Grameen Bank's interest rate as sharply as I could. At the time, he asked me not to mention his role in inspiring that post, so as not to step on his toes as he rolled out the documentary. (In the event, he used a figure higher than the one I calculated.)
Now it can be revealed: Tom asked me the same question about Mexico's Compartamos Banco, and I responded with another spreadsheet. I was interested for my own work, but since it was his idea, I held off blogging that result until today for basically the reason just given. The English premiere of his documentary was scheduled for today, and it, unlike the Norwegian version, apparently covers Mexico and India in addition to Bangladesh. The release event planned for today at the Overseas Development Institute has been postponed indefinitely...but that's enough waiting for my bit of analysis.
As you probably know, Compartamos's IPO in April 2007 touched off a controversy akin to the current Indian one, which also began with an IPO and then exploded with the Andhra Pradesh crackdown. In both IPOs, a few people made millions of dollars, inviting accusations of usury. There is one big difference this time around: while the government of Andhra Pradesh expressed outrage at microcreditors for charging more than 30%, Compartamos charged 85%---98% after value-added tax (VAT; "sales tax" for Americans). Makes you wonder what all the ruckus in India is about. Or, more carefully: makes you wonder whether the role of interest rates in India's troubles has been exaggerated. You can get into debt trouble at 0%.
Now, the price of credit from Compartamos has glided down since the IPO. And I learned from the Grameen Bank that pinning down the price of credit often requires analytical effort: it's not something you can generally look up on a website, unless MFTransparency has been on the job. In fact, MFTransparency has not measured Mexican microcredit rates yet because it needs the cooperation of the microcreditors and funding to proceed. This is ironic, since it was Compartamos's high interest rates (along with a prod from Muhammad Yunus) that made Chuck Waterfield start MFTransparency.
So I had reasons to poke into the matter. Here, I am going to describe how I calculated the rate, show you some pretty huge numbers, then philosophize.
I analyzed what I believe is the main Compartamos loan product, Crédito Mujer (Women's Credit), which is delivered through groups of 12--50 women in village banks in amounts of 1,500--27,000 pesos (US$125--2,225) per person. This loan is repaid in 16 installments over 16 weeks (not quite the four months advertised on the web site, as Chuck Waterfield has pointed out, which matters for the effective interest rate). Roughly, the interest is a "flat rate" of 0.84%/week of the starting balance, or 16 × 0.84% = 13.4% over the full term. (This is before VAT: see below.) So right there, we're looking at 52 × 0.84% = 43.7%/year, easily on the high end of the Indian range.
And we're just getting started. As you should have learned in Microfinance 101, such a "flat rate," expressed relative to the opening balance, understates the true interest rate roughly twofold. People might think they are paying 13.4% interest over 16 weeks, but with weekly payments steadily reducing the principal to 0, the average balance over the term is about half the starting balance. If a woman borrows 1,000 pesos, her total interest of 16 × 0.84% × 1,000 = 134 pesos works out to 26.8% of an average balance of 500 pesos. In fact, the g-forces are so high at these interest rates that they bend the math; a precise calculation translates the flat rate of 13.4% into a more accurate "declining rate" of not 26.8% but 24.3%. (If you really want to know why: Crédito Mujer, like a fixed-rate mortgage, has constant payments. In the early payments, more pesos are interest, so the principal falls slowly at first. Halfway through the repayment period, the loan is less than half repaid, so the average balance over the term is more than half the starting balance and the declining rate is less than half the flat rate.)
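For concreteness, here is a small Python sketch of that precise calculation for a hypothetical 1,000-peso loan. It solves for the weekly internal rate of return implied by the repayment schedule; the figures are approximate, not the spreadsheet's exact ones.

```python
# Solve for the weekly internal rate of return implied by a hypothetical
# 1,000-peso Crédito Mujer-style loan: 16 equal weekly payments, flat
# interest of 0.84%/week of the starting balance (before VAT).
principal = 1000.0
n = 16
payment = (principal + principal * 0.0084 * n) / n   # 70.875 pesos/week

def present_value(rate):
    """Value today of the 16 weekly payments, discounted at a weekly rate."""
    return sum(payment / (1 + rate) ** t for t in range(1, n + 1))

# Bisect for the rate at which the payment stream is worth the principal.
lo, hi = 0.0, 1.0
for _ in range(100):
    mid = (lo + hi) / 2
    if present_value(mid) > principal:
        lo = mid          # discounting too little: the rate must be higher
    else:
        hi = mid
weekly_irr = (lo + hi) / 2

print(f"declining rate per 16 weeks: {16 * weekly_irr:.1%}")   # ~24.3%
print(f"uncompounded annual rate:    {52 * weekly_irr:.1%}")   # ~79%
```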
24.3% per 16 weeks multiplies up to 79.1% per year. That exceeds the 71% that a Compartamos representative quoted to Tom Heinemann in e-mail but lines up very well with the Annualized Percentage Rate (APR) of 79.99% posted on Compartamos's web site.
Three more considerations increase the computed rate substantially (see my spreadsheet):
Compounding. Suppose the borrower paid all 134 pesos of interest in one installment at the end of the term rather than in 16 installments along the way; suppose further that the schedule of principal repayments remained the same as before this change. Deferring the interest payments does not affect the interest rate as computed above: total interest and average outstanding balance would stay the same. Yet deferring interest would reduce the financial burden because it would give the borrower more time to reinvest her earnings in her business and expand it. A proper "effective APR" calculation that factors in the time value of money---that reflects the additional burden of having to pay interest before the end of the loan term---turns that 79.1% into 119.3%. In other words, the borrower's business would have to generate a weekly rate of return equivalent to 119.3%/year in order to break even on the loan. This calculation is the credit analog of the way that reinvesting interest in a savings account increases (compounds) the rate of return. My 119.3% lines up well with the "CAT" (Costo Anual Total) of 116.9% that Compartamos reports, I believe by law. Why the match is inexact, I don't know.
At any rate (ha ha), it's important to keep in mind that while the interest fully compounds from the borrower's point of view (in theory, every peso of early interest payment reduces the capital in her business, thus its growth, thus her future capacity to service the loan), Compartamos's revenue does not fully compound. That is, if it is charging 70%/year before compounding, and its operating costs (wages, rent, equipment) are 60%, then it can only reinvest the 10% margin in order to increase future profits. Compartamos is not earning 120.2%/year.
Value-added tax. Including a standard 16% VAT on the sales price of credit---the interest---raises the uncompounded rate to 91.8% and the compounded rate to 148.4%. Here too, it's important to distinguish the borrower's and the lender's points of view. The borrower does pay 91.8% or 148.4% but the lender does not earn that much since VAT goes to the government. (Apparently VAT is 11% near the U.S. border.)
Forced savings. Compartamos requires borrowers to have savings equal to 10% of the starting loan balance, as a kind of collateral or emergency fund for weeks in which repayment is hard. The savings can be at any bank. According to the Compartamos representative Tom contacted, savers can expect to earn 4--8%/year. I use 8% to be conservative, i.e., bias the net cost of borrowing from Compartamos downward. (You can change this in the spreadsheet.) This requirement effectively reduces a 1,000 peso loan to 900 without reducing the interest charged. That raises the uncompounded rate to 87.0%, or 101.1% after VAT, and the compounded rate to 154.4%, or 195.3% after VAT.
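Most of these adjustments can be sketched in a few lines of Python. This rough reconstruction covers the compounding and VAT steps, taking the weekly rate implied by the repayment schedule (about 1.52%, i.e., 24.3% per 16 weeks) as given; the forced-savings adjustment needs the full spreadsheet, and the outputs here are approximations, so they differ slightly from the figures quoted.

```python
# Rough reconstruction of how compounding and VAT scale the quoted rates.
# Takes the weekly internal rate of return (~1.52%/week, i.e. 24.3% per
# 16 weeks) as given. These are approximations, not the exact spreadsheet
# figures; the forced-savings adjustment is omitted.
weekly = 0.243 / 16     # weekly IRR from the declining-rate calculation
vat = 0.16              # standard Mexican VAT charged on the interest

uncompounded = 52 * weekly                            # ~79%
compounded = (1 + weekly) ** 52 - 1                   # ~119%
vat_uncompounded = uncompounded * (1 + vat)           # ~92%
vat_compounded = (1 + weekly * (1 + vat)) ** 52 - 1   # ~148%

for label, r in [("uncompounded", uncompounded),
                 ("compounded", compounded),
                 ("uncompounded + VAT", vat_uncompounded),
                 ("compounded + VAT", vat_compounded)]:
    print(f"{label:>20}: {r:.1%}")
```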
Arguably then, Compartamos credit costs nearly 200%/year, much more than you would guess from the 0.84%/week flat rate and much more than the 71--80% APR the bank tends to quote. Here's a table of the rate computed different ways:

                                     Uncompounded   Compounded
  Excluding VAT and forced savings          79.1%       119.3%
  Including VAT                             91.8%       148.4%
  Including forced savings                  87.0%       154.4%
  Including VAT and forced savings         101.1%       195.3%
So should we be outraged that what is pitched as 0.84%/week (or 1%/week with VAT) really costs 195.3%/year? Maybe. But I have three doubts.
First, it is not clear to me that we have cause to castigate Compartamos for opacity. On its website is a credit simulator that asks you for the parameters of a loan---amount borrowed, frequency of payments---then displays an exact repayment schedule, including VAT. The schedule could not be simpler: 16 equal payments, no other fees. Lord knows, probably most of Compartamos's 1.5 million clients cannot access this tool. But its presence may indicate a wider corporate practice. Clients may get the same schedules in hard copy when they borrow. Last October at the Financial Access Initiative conference in New York I asked Compartamos executive vice president and co-founder Carlos Danel if this is the case. He said it is.
Now, one can argue that disclosing a loan repayment schedule is morally inferior to disclosing an interest rate. No campesina is going to look at a 10% savings requirement and a series of sixteen 72.25-peso payments on a 1,000-peso loan and infer 195.3%. Ergo, the argument goes, she is deceived about the true price she is paying. To be fair, if you squint at the bottom of the loan schedule, you'll find a link to the interest rate disclosure I have already cited (hat tip to Chuck Waterfield). Perhaps in dealing with borrowers, Compartamos similarly discloses interest rates in fine print.
But I am not sure that Compartamos is wrong to emphasize the repayment schedule over the effective interest rate. Which is easier for you to understand, 16 payments of 72.25 pesos or 195.3%? Which is easier for a poor Mexican woman to understand? More to the point, which will better help her judge whether she can handle the credit? As I blogged, in my Reflections on Transparency this debate goes back at least 80 years, to when American economists criticized the Morris Plan of Industrial Banking in the same way. The defense was that Morris Plan banks were crystal clear about the deal they offered, even though---indeed, precisely because---their price wasn't in the economists' "correct" language of effective interest rates. The proper test of disclosure is not whether it reifies a capitalistic metaphor but whether, given the foibles of the mind, it encourages good decision making.
Not that there isn't room under the sun for APRs. APRs have several virtues. They aggregate all the determinants of cost into a single number, bringing complex or hidden fees into the open. As a consistent and conceptually solid yardstick, they help consumers compare financial offers. Since everyone argues about interest rates, it is good for them to be measured systematically in the way that MFTransparency does. And APRs can support good decisionmaking on the part of sophisticated players, such as regulators.
Another source of doubt---or at least debate---about the 195.3% is whether it is proper to count the compulsion to save purely as a cost. Maybe Compartamos is doing clients a favor by disciplining them into not only repaying the loan but saving too---or at least by assuring that borrowers have a buffer to draw on in bad weeks. One strong message from Portfolios of the Poor and from behavioral economists is that people look to financial services for the discipline to set aside money for important purposes. Having a loan to pay off or a commitment savings account to pay into is a valuable excuse to say "no" when money is tight, when temptations to spend are all around, and when friends and relatives ask for cash.
The people at MFTransparency have thought a lot about this question. They are firm in treating forced savings as a cost:
Most Truth-in-Lending legislation requires that all obligatory additional costs be incorporated into the transparent price, even if the costs are argued to be related to other services bundled with the loan. Otherwise, the interest rate can be used to hide the true cost of the loan. [A]ny financial requirement that reduces the amount of money available to the client, regardless of its purpose, is included in the calculation of the true price of the loan.
This position has some limitations (discussed below), but overall it is strong and coherent. If my mortgage lender required a deposit in exchange for a loan, I'd want that counted in my APR. Further strengthening the MFTransparency view is that a borrower could subvert the ascribed good intentions of Compartamos's savings requirement. Instead of scraping together that 10% before taking a loan, she could borrow it from a sibling or moneylender, put that in a bank, get the Compartamos loan, repay her informal creditor...and repeat for the next loan.
So on balance, I tend to feel it is truer to factor in the forced savings.
[Update: now I tend not to feel that. Below Carlos Danel explains that Compartamos cannot enforce the "forced savings" because it is done at other banks.]
My third doubt overlaps conceptually with the previous one: how meaningful is it to compound, especially when extrapolating from a 16-week term to an annual rate? As I blogged before:
Consider also the example of the vegetable sellers of Chennai, who pay 10%/day for informal credit. Are their moneylenders remiss in not disclosing the effective APR of 128,330,558,031,335,170%/year? I'm reminded of the Steven Wright joke: “One time, the police stopped me for speeding, and they said, ‘Don't you know the speed limit is 55 miles an hour?’ I said, ‘Yeah, I know, but I wasn't gonna be out that long.’”
Struggling with this example, and talking with Carlos Danel in New York, I began to wonder if a core confusion here is the ill fit of a particular capitalistic metaphor. What, really, does that 128 quadrillion %/year mean? If one of those vegetable sellers were to take a 100 rupee loan on day 1, start a business with it, repay the 100 rupees plus 10 rupees interest at the end of the day, borrow back the 110 rupees the next morning, expand her business, repay the 110 rupees plus 11 rupees interest that next evening, borrow back the 121 rupees the morning after that, etc., for a year, she'd have a 128-quadrillion-rupee business by the end of the year. She would own the world.
But she would not do that. One reason is that after a week or so of expanding her stock and her stall, she would have to hire people. She would grow beyond the cost-free labor of her own two hands. Paying wages would transform the economics of her business. It would become much harder for her to clear (and grow) 10%/day. So while 128 quadrillion %/year is a mathematically sound way to represent the interest rate, it is predicated on the false hypothetical italicized above. It is frail in real-world meaning.
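For what it's worth, the arithmetic behind that astronomical figure is just daily compounding, as this quick Python check shows:

```python
# Annualize 10%/day with daily compounding over 365 days,
# as in the quoted effective APR.
daily = 0.10
apr_percent = ((1 + daily) ** 365 - 1) * 100
print(f"{apr_percent:,.0f}%")   # ~1.28e17 %: the "128 quadrillion" figure
```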
The unrealism in that case arises from viewing a daily loan as an annual one. The same principle applies to Compartamos's 16-week loans: perhaps we confuse ourselves when we annualize their interest charges.
Another kind of unrealism comes from viewing a loan used for consumption as one used for investment. Suppose a woman borrows to buy a phone. She takes the loan in order to bind herself into setting aside money each week to retroactively pay for the phone. For her, the loan gives discipline more than capital. If she is indeed buying discipline then is it not misguided to model her loan as a one-time injection of capital that will generate returns over time? Shouldn't we instead view the loan as providing a flow of services over its term? To my mind, if she is paying weekly for a weekly service, then there is no compounding. I think that this is the essence of why Carlos Danel finds compounded rates (in Mexico, CATs) misleading.
So here's an idea: to the extent borrowers are buying capital, use the compounded rate---195.3%/year in this case, factoring in VAT and forced savings. To the extent they are buying discipline, use the uncompounded rate---here, a much lower 101.1%. Of course in practice the distinction is not clean, but in theory it legitimizes the use of uncompounded rates. To the extent that we are seeking intuition for the price of Compartamos credit, perhaps it is best to say it costs 101.1--195.3%/year. I know that's a wide range. Maybe "31.0--39.4%/16 weeks" characterizes it better.
Can you tell I am groping here?...and hoping you'll think about this and give me feedback.
While you're at it, puzzle me this. SafeSave, the microfinance institution founded by Stuart Rutherford in the shadow of Bangladeshi giants, offers some innovative services that let people borrow and save at the same time. One is called P9. More straightforward is a savings account that lets you borrow back 80% of your own deposits. People seem to like this option as a way to draw down savings while disciplining themselves to rebuild. Now suppose, as Stuart wrote to me, SafeSave raised the borrowing limit from 80% to 100%. And suppose people hit the limit. Then they would be paying net interest (the difference between the interest they paid on the loan and any interest they earned on their gross savings) for exactly no capital. Clearly, they would be buying discipline, not capital. But here's the conundrum: the APR would be infinite. And my idea of pricing discipline without compounding doesn't help: it would still be infinite. From a standard economic point of view, clients' behavior is irrational: they are paying to borrow nothing. Behavioral economics can give us a more realistic model. But the question remains: Is there a coherent way to define the price of discipline in this case, thus in general? Or is it wrong to think that just because we can price capital in a neat, consistent way, through APRs, we can do the same for discipline?
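To make the conundrum concrete, here is a toy Python sketch with illustrative numbers of my own; the loan and savings rates are hypothetical, not SafeSave's.

```python
# A client saves 1,000, borrows back 100% of it, pays 20%/yr on the loan,
# and earns 5%/yr on the savings. All rates here are hypothetical.
savings, loan = 1000.0, 1000.0
net_interest = 0.20 * loan - 0.05 * savings   # pays 150/yr, net
net_capital = loan - savings                  # receives 0 in net capital
# The "price of capital" is net interest over net capital: infinite.
apr = net_interest / net_capital if net_capital else float("inf")
print(apr)   # inf: she is paying a real price for exactly no capital
```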
Here also is where the idea of treating savings as a pure cost breaks down. People cannot partake of this particular loan product without first saving, but it seems wrong to treat the savings requirement as a cost of the loan.
Sorry to end in a muddle. Here's a partial bottom line. Ultimately, as MFTransparency notes, there is no one right way to measure interest rates. How you measure should depend on why you measure. If you want to know whether Compartamos is profiteering, the 79.1%, which excludes the effects of compounding, VAT, and forced savings, is more relevant. If you want to know whether borrowers understand the obligations they are assuming, APRs may be a distraction. If you want to compare Compartamos to other microcreditors on price, the 195.3% may be best. If you want to know whether credit is helping people, you'll need to research that more directly.
Over the past decade, donors have publicly declared that they would improve how they operate in order to make aid work better. They would coordinate better, let recipient countries take more ownership of project design, and so on. Ten years and ten days ago, there was the Rome Declaration. Then came the Paris Declaration, the Accra Agenda for Action, and the Busan Partnership. Probably you doubt as much as I do whether these statements are worth their weight in paper. But their impact should be judged not by our biases, but by evidence---evidence of how donors actually behave. And the ultimate test for donors, suggests a finely written new report, will come in Myanmar. After the remarkable turnabout there---symbolized for foreigners by longtime dissident Aung San Suu Kyi becoming a member of the legislature last April---dozens of public and private donors are flocking to the country.
Write Lex Rieffel of Brookings and James Fox, a former USAID senior economist:
Every respectable aid agency and international NGO in the world is planning to initiate or expand operations in Myanmar. The best and the brightest in these organizations are pushing to be posted in Yangon or to manage the Myanmar account. We were told that in a recent survey of World Bank employees, 80 percent listed Myanmar as their first choice for an overseas posting. Administrators of the Princeton-in-Asia program have described a fellowship in Myanmar as “the hot ticket” for its current applicants.
Rieffel and Fox are admirably forthright about their limitations as analysts of Myanmar---they've worked on development for decades but only spent a few months in the country, and don't speak the language---but Too Much, Too Soon? The Dilemma of Foreign Aid to Myanmar/Burma seems a fair portrait of donor behavior in the early days of an aid rush. Already, Myanmar's thin government is overwhelmed by requests for meetings from "ministers from donor countries, business leaders, [and] movie stars." Already, donors that vowed to cooperate are withholding information from each other. In this light, it may be a strange blessing that the government suddenly moved its capital in 2005 from Yangon to the terribly planned new city Naypyitaw. "For the government of Myanmar, encouraging embassies and donor offices to remain in Yangon could have the advantage of slowing the flow of visitors and making it easier to avoid them altogether."
The news is not all bad though. Rieffel and Fox single out the UK's Department for International Development for praise: "arguably the most innovative and principled aid agency in the world today." One thing DFID did right was, along with the EU and Australia, to help establish four multidonor trust funds. For example, the "3 Diseases Fund" fills the void left by the departure of the Global Fund to Fight AIDS, TB, and Malaria in 2005. By pooling contributions from many countries, it prevents them from each setting up their own projects pell-mell.
My one disappointment is that the report doesn't quite live up to its title. It doesn't answer the question, "Too Much, Too Soon?," nor even confront it with the directness I'd like. There is much (good) discussion of whether aid quality is too low---whether donors are living up to their Paris Declaration commitments---but less of whether the sheer amount of aid in prospect could do more harm than good, no matter the quality. Even when aid is perfectly harmonized, coordinated, untied, country-owned, aligned, defragmented, managed for results, channeled through country systems, and done with mutual accountability and civil society consultation, it can still distort the political economy of the receiving country. The more that revenue comes from abroad rather than from the population, the less this fragile democracy may be compelled to respond to the needs of its citizens. Or the more, as the report worries, that dealing with donors will distract officials from existential challenges such as making peace with ethnic minorities.
Although Rieffel and Fox don't say this directly, their descriptions imply that the best hope for coordinating the donors lies with the recipient. "Aid professionals generally do not get promoted in their organizations for being good cooperators. They advance by responding to the headquarters agenda, by showing that their organization is doing something that makes a difference, and by speeding up disbursement of resources under their control." Perhaps only the government of Myanmar, by leading a planning process, can get the donors to fall into line. Here, early signs are promising. It began work on a Framework for Economic and Social Reforms last May, presented it in draft to donors in December, and brought them together to discuss it in January. The donors offered "high praise."
Encouraging too is the deep debt reduction the government won from the Paris Club, the association of creditor nations, earlier this year. So far I've found no one who was involved in the 19-hour negotiation who is willing to tell me what happened. I surmise, however, that the government of Myanmar knew it had a good hand going in and played it well---owing to what blend of native savvy and foreign advisers, I know not. Norway says the talks almost collapsed. Japan, Myanmar's largest creditor, had already promised deep debt relief, and seems set to become the country's largest donor. With that set, Myanmar was apparently prepared to let its arrears to Norway, France, and other creditors languish, which could have frozen them out of the aid action. I've questioned whether Myanmar needed such deep debt relief. But if they were smart enough to squeeze it out of the donors, like a clever gambler who beats the house, I figure they deserve it.
At any rate, if you need to understand the aid situation in Myanmar, this is an essential read. If your interest is more general, I still highly recommend the report as a humble, historically informed, and insightful snapshot of a country on the eve of major change.
This working paper by CGD research fellow David Roodman provides an original synthesis and exposition of the statistical theory behind one of the most influential studies of the impact of microcredit on borrowers (Pitt and Khandker, Journal of Political Economy, 1998). The present paper also documents Roodman's program, called cmp, which for the first time makes it easy for other researchers to apply these methods. The program implements a "maximum likelihood" estimator for "fully observed, recursive, mixed-process systems of equations," and runs in the commercial statistical analysis package Stata.
As you would expect when the same people write about the same things, the new paper shares much with its predecessors. It is written emphatically, seeming to raise profound concerns; I don't find it that persuasive; and yet it has taught me something. Perhaps the most succinct rebuttal is that the paper does not refute the fact that if you drop the 16 data points with the most extreme (highest) values on household spending, less than 0.5% of the sample, the finding that microcredit increases household spending completely goes away. The new paper spends much more space challenging our hypotheses (what PK call "claims") about why this happens than whether it happens. But for real-world implications, the whether matters more than the why.
Here, I will explain my thinking in more detail. I'll try to keep section openers jargon-free, but no promises otherwise. If this post bewilders you, then you will know how I felt when I decided years ago to understand the debate over PK, which Jonathan initiated.
Section 1 of the new reply is an introduction. Section 2 does not deal with substance.
Section 3 criticizes an alternative to PK's main statistical method, which we use in part of our paper. The alternative was first proposed by PK in 1998 (footnote 16) and first applied to the PK data by Pitt in 1999 in his attempt to rebut Morduch (1998). PK now describe the method as "outlandish" and "extraordinarily artificial." To me, the theoretical critique and the demonstration through simulations both appear flawed. And even if they are correct, they mainly go just to that "why" question, not the "whether."
PK (1998) propose estimating impacts via two-stage IV in which the instruments are interactions between included controls and each of the two dummies for female and male credit availability. Pitt (1999) implements this as 2SLS. We do classical linear LIML instead, solely in order to stay conceptually closer to PK's nonlinear LIML. But this distinction matters little. In fact the associated under- and weak-identification tests, whose availability substantially motivates use of these estimators, are identical. Both also have the virtue of being known to be robust to non-normality in the errors. PK's theoretical attack on the method they once used to defend themselves can, I think, be distilled to this:
Consider the system of structural equations:
y = x1 + x2 + x3 + e
x1 = z1 + u1
x2 = z2 + u2
where e, u1, u2 are potentially correlated error terms; x3 is an exogenous control; z1, z2 are uncorrelated with e; and z1, z2 are strong explanators for x1 and x2, making them strong instruments for x1 and x2 in the y equation. 2SLS is appropriate for estimating the coefficients on x1 and x2; it would put x3, z1, and z2 in the first-stage equations for x1 and x2. However, the exogenous control x3 is a weak instrument because its expected coefficients in the first-stage equations are 0. (Its first-stage coefficients correspond to πfx and πmx in the PK (2012) exposition.) And z1 is a weak instrument because its expected coefficient in one of the equations, the x2 equation, is zero; and vice versa for z2. (These correspond to πfm and πmf.)
The above argument is wrong because: a) the exogenous control x3 cannot be a "weak instrument"; and b) z1 and z2 are collectively strong instruments for x1 and x2 even if z1 is weak for x2 and z2 is weak for x1.
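Point b) is easy to check by simulation. Here is a minimal sketch in Python/numpy (my own toy, not PK's simulations or our Stata code) that generates data from exactly the three-equation system above, with all structural coefficients set to 1 and a shared error component inducing the endogeneity. OLS is biased upward by the correlated errors, while exactly identified IV, using z1 and z2 as instruments with x3 as an included exogenous regressor, recovers the truth:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# The three-equation system from the text, all true coefficients 1:
# y = x1 + x2 + x3 + e;  x1 = z1 + u1;  x2 = z2 + u2
z1, z2, x3 = rng.normal(size=(3, n))
common = rng.normal(size=n)          # shared component: makes e, u1, u2 correlated
e  = common + rng.normal(size=n)
u1 = common + rng.normal(size=n)
u2 = common + rng.normal(size=n)
x1 = z1 + u1
x2 = z2 + u2
y  = x1 + x2 + x3 + e

X = np.column_stack([x1, x2, x3, np.ones(n)])   # structural regressors
Z = np.column_stack([z1, z2, x3, np.ones(n)])   # instruments + included exogenous x3

# OLS is inconsistent because e is correlated with x1 and x2
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Exactly identified IV (= 2SLS = LIML here): beta = (Z'X)^{-1} Z'y
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
```

With this data-generating process, the OLS coefficients on x1 and x2 converge to about 1.25 while the IV coefficients stay near 1: x3 being an included exogenous control rather than an excluded instrument causes no trouble at all.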
PK attempt to illustrate their contention that the linear estimators are fatally flawed by weak instruments. However, these and subsequent simulations deviate from the PK estimation framework in two ways that greatly and unrealistically weaken the instruments. First, treatment quantity (how much is borrowed) is simulated with a zero-centered distribution rather than being always positive as in the real data. As a result, treatment averages zero for treated and untreated alike, and the dummies for availability of credit by gender, the key instruments in PK, are perfectly weak. (In the data and code file, in Table1&9groups3a_liml.do, lines 22--33 define the zero-centered female and male treatment quantities.) And second, also unlike in PK, these dummies enter as controls rather than instruments! In a classical treatment impact assessment, this is equivalent to looking at the impact of treatment while controlling for rather than instrumenting with intent-to-treat. This too should weaken the remaining instruments. (In the same .do file, note the appearance of "treatm treatf" in the second stages of the specifications in lines 71 and 83.)
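To see why a zero-centered treatment quantity makes the availability dummies perfectly weak, consider this toy calculation (a hypothetical Python illustration, not PK's actual simulation code; all variable names are mine). The first-stage coefficient on an availability dummy is just the difference in mean borrowing between those who can and cannot borrow, which a zero-centered amount drives to zero:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Dummy for credit availability: half the sample can borrow
avail = rng.integers(0, 2, size=n).astype(float)

# Zero-centered "amount borrowed": borrowers average zero,
# just like non-borrowers
amount_centered = avail * rng.normal(size=n)

# More realistic amount: positive for borrowers, zero for everyone else
amount_positive = avail * rng.lognormal(size=n)

# A first-stage coefficient on the dummy is just the difference in means
f_weak   = amount_centered[avail == 1].mean() - amount_centered[avail == 0].mean()
f_strong = amount_positive[avail == 1].mean() - amount_positive[avail == 0].mean()
# f_weak is ~0: the dummy explains nothing, a perfectly weak instrument
```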
I think what this first simulation set actually demonstrates is a problem PK don't emphasize, and which I of all people should have taken more on board. The linear approach generates a lot of instruments, which causes overfitting bias toward OLS. I think when we revise, we should add exactly identified regressions, instrumenting only with the credit availability dummies, not their interactions with the controls. Tests on a Pitt (1999) simulated data set demonstrate the minimal bias (but inefficiency) of this method (see the Appendix of our first paper). I think it was a mistake not to apply it to the real data earlier.
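The overfitting mechanism is easy to reproduce. In this sketch (my own stylized example, not the PK setup), a single strong instrument identifies the model; padding the instrument list with junk instruments pulls 2SLS back toward biased OLS:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 500, 200

def tsls(y, x, Z):
    # 2SLS: regress x on the instruments (plus a constant),
    # then use the fitted values to estimate the coefficient on x
    Z = np.column_stack([Z, np.ones(n)])
    xhat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    X  = np.column_stack([x,    np.ones(n)])
    Xh = np.column_stack([xhat, np.ones(n)])
    return np.linalg.solve(Xh.T @ X, Xh.T @ y)[0]

b_few  = np.empty(reps)
b_many = np.empty(reps)
for r in range(reps):
    z = rng.normal(size=n)              # one genuinely strong instrument
    junk = rng.normal(size=(n, 150))    # 150 irrelevant instruments
    u = rng.normal(size=n)
    x = z + u
    e = u + rng.normal(size=n)          # endogeneity: corr(x, e) > 0
    y = 1.0 * x + e                     # true coefficient is 1
    b_few[r]  = tsls(y, x, z.reshape(-1, 1))
    b_many[r] = tsls(y, x, np.column_stack([z, junk]))

# b_few centers near the truth; b_many is pulled toward OLS (biased up)
```

The junk instruments let the first stage partially fit the endogenous error u, which is exactly the overfitting bias that exact identification avoids.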
Exactly identified LIML or 2SLS regressions (they coincide), which are relatively free of overfitting bias and robust to deviations from normality, show no impact of microcredit on household consumption. See the 3rd and 5th columns, which are new:
Section 4 helpfully spots a bug in our code. It then devotes almost 3 pages to its potential implications---rather than fixing it to see if it makes a difference, and rather than acknowledging that I informed Mark Pitt this summer that it doesn't.
I forgot to factor in the sampling weights in the lines that compute the skew and kurtosis of the second-stage errors and test whether they deviate from normality. The fix actually strengthens our findings of non-normality: skew in the errors in the replication regression rises from 0.64 to 0.71 and kurtosis from 4.78 to 5.12. (Add "[aw=weightpk]" clauses to the "sum ey, detail" and "sktest ey" lines in this.) This is to be expected since PK undersampled ineligible (less poor) households and overweighted them in estimation to compensate, which accentuates the long right tail in the household consumption data.
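For concreteness, the weighted moments that something like `sum ey [aw=weightpk], detail` reports can be sketched as follows (a generic Python illustration with made-up data, not the PK variables):

```python
import numpy as np

def weighted_skew_kurt(x, w):
    """Weighted skewness and kurtosis, the analogue of what
    Stata's `sum x [aw=w], detail` reports."""
    w = np.asarray(w, float) / np.sum(w)   # normalize weights to sum to 1
    m = np.sum(w * x)                      # weighted mean
    var = np.sum(w * (x - m) ** 2)         # weighted variance
    skew = np.sum(w * (x - m) ** 3) / var ** 1.5
    kurt = np.sum(w * (x - m) ** 4) / var ** 2
    return skew, kurt

rng = np.random.default_rng(2)
x = rng.lognormal(size=5_000)              # long right tail, like household spending
w_flat = np.ones_like(x)                   # flat weights = unweighted statistics
w_tail = 1.0 + (x > np.quantile(x, 0.8))   # extra weight on part of the sample

s0, k0 = weighted_skew_kurt(x, w_flat)
s1, k1 = weighted_skew_kurt(x, w_tail)     # weighting shifts the measured moments
```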
All code has bugs. And most bugs, once found, can be made to look really dumb, especially by a good lawyer. Suppose that in one place in the complicated computer program for your study of childhood obesity, you typed an H instead of a W: you used height instead of weight. Equating height and weight goes against millennia of medical science, not to mention common sense. One can demonstrate with theory and simulations how it can produce all sorts of wrong results. And no justification is provided for this strange theoretical construct!
Seemingly, it would be easier to just fix the problem and see if it matters. Seemingly, this is also the thing to do if one's priority is getting the science right.
Section 5 confronts our finding that the PK estimation method is bimodal, tending to generate two contradictory results---microcredit increases or reduces household spending. The section argues that such bimodality is neither unusual nor a problem. The argument here seems flawed in three ways, one minor and two major.
The minor flaw is that the computer simulation code that shows the normalcy of bimodality doesn't actually demonstrate the presence of two local maxima the way ours does (see Figure1conloop3negt.do in the code and data file). The code loops over various possible impact coefficients for female and male credit, equating the two, each time maximizing the likelihood while constraining the estimated impacts to these values. It is a graph of constrained likelihoods over a subset of possible values for the constrained parameters. But I fixed this and found that the two peaks indeed correspond to true local maxima when the likelihood search is unconstrained. So the double-peaked graph is conceptually flawed but meaningful.
The first major problem was already mentioned. The simulations unrealistically deviate from the PK set-up in ways that weaken the instruments. One deviation is in the hypothesized data-generating process: amount borrowed averages zero. The other is in the estimator: credit availability is a control rather than instrument. When these two problems are fixed, the bimodality goes away. Compare this to PK's double-humped Figure 1:
So instead of challenging Jonathan and me by showing that bimodality is the norm, PK's simulations corroborate us by associating bimodality with econometric degeneracy.
The other major problem is an elision of the distinction between bimodality in the likelihood and bimodality in the estimator. It is absolutely the case, as PK say, that ML does not require unimodality of the likelihood for consistency. If a particular mode is highest with probability 1 as sample size goes to infinity and the ML search always detects this mode, the estimator will be consistent. However, multimodality in the estimator is inconsistency prima facie. It's a matter of definition, not theory. (Some narrow counterexamples for completeness: the estimator could be asymptotically bimodal such that the mass of all but one mode goes to zero in probability, or such that the modes become infinitely close. But the data do not suggest such scenarios.)
And we do present evidence of bimodality in the estimator, via bootstrapping. Using the best method we've found for detecting multiple modes, we found 65% of the mass of the ML estimate of the impact of Grameen lending to women to be below zero. But I recently discovered a subtle bug in that code: the right number, I now believe, is 36%:
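The bootstrap logic is straightforward, whatever the estimator. In outline (a schematic Python sketch with a made-up stand-in estimator, not the PK ML estimator or our actual code): re-estimate on resampled data many times and read off the share of the estimates falling below zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 999

# Made-up stand-in for the impact estimator: a simple IV ratio
# with a deliberately weak instrument, so estimates are dispersed
def estimate(y, x, z):
    return (z @ y) / (z @ x)

z = rng.normal(size=n)
x = 0.1 * z + rng.normal(size=n)   # weak first stage
y = rng.normal(size=n)             # true impact of x on y is zero

boot = np.empty(reps)
for r in range(reps):
    i = rng.choice(n, size=n, replace=True)   # resample observations with replacement
    boot[r] = estimate(y[i], x[i], z[i])

# Share of the bootstrap mass below zero: the basis for a
# one-tailed test of whether the impact is positive
mass_below_zero = np.mean(boot < 0)
```

In our application, each bootstrap replication reruns the full estimation on a resampled data set, but the inference step is the same: the distribution of the replications stands in for the sampling distribution of the estimator.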
So a one-tailed test of whether the impact is positive yields significance only at p=0.36. PK address our bootstrapping only in footnote 26, where they say it "lacks any econometric justification." For justification, they can refer to authoritative texts.
It's possible that the estimator is asymptotically stable, contrary to our finite-sample evidence. Perhaps 5,218 observations on 1,798 households is not large enough for asymptotic behavior to kick in. That brings us to theory. Econometric theory tells us that the PK estimator is consistent when the assumptions implied in its likelihood are correct, notably normality of the errors. It may well be robust to certain violations of these assumptions, such as the non-normality we detected, but no one has proven as much. Thus PK are correct that "RM’s concern with bias [actually, inconsistency] arising from non-normality draws no support from econometric theory." But the opposite holds: neither does theory reassure. And I thought that in econometrics, estimators were presumed inconsistent until proven consistent. It takes chutzpah to defend an estimator by saying no one has proved it doesn't work.
PK offer some hand-waving about why their estimator is probably robust to non-normality. The arguments are reasonable, but illustrate the danger of hand-waving, for they are wrong. These simulations demonstrate as much. We are correct that the classical linear estimators are strictly more robust to non-normality than PK's, and thus provide a useful check.
Section 6 returns to the linear estimators that PK first proposed and used (and which again relate primarily to the "why," not the "whether").
There is some semantic confusion here---linear LIML is not a way of doing 2SLS, and parameter and moment covariance matrices are not the same. But the main upshot is that our first linear regression---in which there are six instrumented variables, credit by gender and lender---is underidentified. It is "the equivalent of sirens blaring and red lights flashing to proclaim that something is terribly wrong with the estimation." Actually, our paper notes the underidentification (see the under-ID test in the table above) and does not rely on that regression for inference. We fix it by pooling credit by lender, which eliminates the underidentification, as shown above (the p values on the test plunge to 0.000).
Moreover, as that table newly shows, and contrary to PK, instrument weakness is not an irreducible source of trouble. In the exactly identified estimates, the instruments appear strong; the impact of microcredit on poverty still does not. It appears now that the instruments are not weak in the overidentified regressions, at least those that pool across lenders. Rather, instrument proliferation is distorting the test of instrument weakness. Adding instruments should increase true instrument strength even as it drags down the test statistic. This finding does contradict our earlier thinking a bit, and I'll return to it.
The section also provides an unusual interpretation of the linear estimates, combining the point estimates from LIML with the "perfectly valid" standard errors from 2SLS. In fact, the LIML and 2SLS regressions return the exact same weak instrument diagnostics, so it's not clear why one is more valid than another. At any rate, it is an unorthodox move and a thin reed on which to rest a defense of PK.
Section 7 is interesting. It strips away components of the PK estimator to isolate the source of identification. What it still does not do, however, is use the formal language of probability to state and defend the conditions needed for identification of causal effects. PK, for example, have never motivated the assumption that variation in the availability of credit by gender is exogenous. They also have not explained why credit availability is a good instrument in a nonlinear IV set-up despite our demonstration (Table 4) that credit availability is correlated with the second-stage error.
Contrary to appearance, the new PK paper confirms rather than refutes the conjecture that bimodality in the PK estimator is a sign of weak instrumentation. The paper does not overturn the finding that dropping a handful of systematically picked outliers collapses the two modes into one near zero. It does not change the fact that linear estimators that are robust to demonstrated deviations from the likelihood model produce estimates close to zero. It does not change the fact that the PK estimator is demonstrably inconsistent in the face of such deviations. It does not address the bootstrap evidence that the estimator is inconsistent on the real data.
But we have learned from this round. Most important is the discovery of a paradox: PK now provide a laboratory demonstration of how weak instruments make their estimator bimodal, confirming one of our hypotheses; yet the friction with them led me to run exactly identified linear regressions that revealed the instruments to be strong in that context, cutting against our hypothesis.
This forces us to revise whatever tentative insight into the nonlinear PK regressions that we derive from the linear analogs. I would not now hypothesize that the PK instruments are weak in the usual sense, across the full sample. Nevertheless, as Jonathan and I noted in 2011, the PK result disappears when dropping villages where both genders can borrow and, symmetrically, persists strongly when restricting to just those villages. So the PK result seems to emanate from this subsample, in which the female and male credit availability dummies are identical, making them weak for explaining distinctive variation in the endogenous variables, credit uptake by gender. It seems as if instruments being weak only within a subsample, while irrelevant for linear estimation, can distort a nonlinear one---at least when there are outliers. This is why I continue to conjecture that the outliers and instrument weakness are interacting: fix either and the bimodality goes away.
Perhaps someone else can formulate a sharper explanation for the instability of the PK estimator; our conclusions about the credibility of the PK findings remain regardless.