The Hunting of the Snark: Reinhart, Rogoff, and McArdle

While I was on my ~~mental vacation~~ hiatus, Megan McArdle was all over l'affaire Reinhart and Rogoff. As we all know, data is McArdle's bête noire. It didn't have to be that way. McArdle was always willing to hold up her side of the data bargain. She was perfectly happy to look for numbers that seemed to support her point, or perhaps a bar graph or one of those colorful little pie charts. True, the graph didn't always say what she seemed to think it said, but the spirit was willing even if the mind was weak. But those numbers betrayed her by being different from the numbers she envisioned in her head. Something was very wrong in McArdleland, but never fear: McArdle was more than eager to Cuisinart the lemons of defeat into the lemonade of victory.

Bit of a bombshell in the econoblogosphere yesterday. Several economists from the University of Massachusetts are contesting one of the key findings by the authors of This Time is Different, a landmark study of financial crises and debt dynamics from Carmen Reinhart and Ken Rogoff. At issue is their observation that once the debt-to-GDP ratio passes 90%, growth slows down dramatically. We should be careful about what we're actually refuting. Since this critique broke, there's been a bit of strengthening up Rogoff and Reinhart's claims in order to beat them down--claiming, for example, that Rogoff and Reinhart asserted that high debt mechanically causes low growth. I've interviewed both of them about their work, and they've always been most modest in their claims, emphasizing that they've isolated an empirical regularity, not causality. While the paper under question does speculate about possible vehicles for causality, its claims are more modest than both its critics, and those who have bandied about the 90% statistic, would have you believe.

I would love to take McArdle's word for it that, since she has met Reinhart and Rogoff, she knows they would not claim a causal relationship between high debt and low growth. However, the last time she personally vouched for a man, that gentleman was David Koch. It would be wiser to go to the source, the Reinhart and Rogoff (R and R) paper Growth in a Time of Debt.

Our main finding is that across both advanced countries and emerging markets, high debt/GDP levels (90 percent and above) are associated with notably lower growth outcomes. In addition, for emerging markets, there appears to be a more stringent threshold for total external debt/GDP (60 percent), that is also associated with adverse outcomes for growth. Seldom do countries simply “grow” their way out of deep debt burdens. Why are there thresholds in debt, and why 90 percent? This is an important question that merits further research, but we would speculate that the phenomenon is closely linked to logic underlying our earlier analysis of “debt intolerance” in Reinhart, Rogoff, and Savastano (2003). As we argued in that paper, debt thresholds are importantly country-specific and as such the four broad debt groupings presented here merit further sensitivity analysis. A general result of our “debt intolerance” analysis, however, highlights that as debt levels rise towards historical limits, risk premia begin to rise sharply, facing highly indebted governments with difficult tradeoffs. Even countries that are committed to fully repaying their debts are forced to dramatically tighten fiscal policy in order to appear credible to investors and thereby reduce risk premia. The link between indebtedness and the level and volatility of sovereign risk premia is an obvious topic ripe for revisiting in light of the more comprehensive cross-country data on government debt.

According to Wikipedia, critical realist economists say:

The world that mainstream economists study is the empirical world. But this world is "out of phase" (Lawson) with the underlying ontology of economic regularities. The mainstream view is thus a limited reality because empirical realists presume that the objects of inquiry are solely "empirical regularities"—that is, objects and events at the level of the experienced.

McArdle is saying that R and R are just noting two things that happen together and do not say one was a cause of the other. Clearly this is wrong, and it's typical that McArdle thinks denial based on an appeal to authority is an effective means of argumentation, especially when she has written about Reinhart's work before. In an earlier post, McArdle argues that Reinhart is right that our debt will drag down the economy.

One way or another, all the debt we've taken on has to be dealt with. And the least painful way, at least in the short term, is for central bankers to keep their hands on the interest rate levers--and their eyes on the government debt.

And if the R and R paper was not pointed enough, Reinhart emphasized the relationship in an interview with Der Spiegel.

You have to deal with the debt overhang one way or the other because the high debt levels are an impediment to growth, they paralyze the financial system and the credit process.

Perhaps Reinhart played McArdle or McArdle didn't understand a word the woman said. Or McArdle is attempting to re-write reality to agree with her personal opinions.

I've seen more than one suggestion today that Rogoff and Reinhart must have deliberately or subconsciously biased their work because they're such mad advocates of fiscal austerity. But I interviewed Rogoff about the fiscal cliff last fall, and he was emphatic that we should not simply slam on the brakes and cut spending drastically, immediately. In fact, he was moderately dovish on stimulus. For example, he said "Back in 2008-9, there was a reasonable chance, maybe 20% that we’d end up in another Great Depression. Spending a trillion dollars is nothing to knock that off the table." Rogoff is basically an austerity moderate: he thinks we should be spending a little more now, while making plans to cut back in the future. And note that the main vehicle by which they suggest high debt causes slow growth is . . . that it forces sudden fiscal contraction.

Apparently Rogoff does advocate austerity, just a slower, gentler type. And Reinhart believes austerity is absolutely necessary.

SPIEGEL: Do you think it is wrong for Europe to focus on austerity measures with inflation at such a low level?

Reinhart: No. Restructuring, inflation and financial repression are not substitutes for austerity. All these measures reduce your existing stock of debt. Unless you do austerity you keep adding to the debt. There is no either-or. You need a combination of both to bring down debt to a sustainable level.

McArdle (incorrectly) throws austerity into the mix to distract her readers from the main issue, the data. Then she attempts to downplay the effect of the Reinhart and Rogoff paper by declaring it had only a trivial effect on some radicals. It is utterly impossible to believe that McArdle does not know of the impact of the R and R paper; at the very least she saw a description of it in the paper written by Herndon et al. As the important critique relates, the impact of R and R's paper was enormous.

Publication, Citations, Public Impact, and Policy Relevance

According to Reinhart's and Rogoff's website, the findings reported in the two 2010 papers formed the basis for testimony before the Senate Budget Committee (Reinhart, February 9, 2010) and a Financial Times opinion piece "Why We Should Expect Low Growth amid Debt" (Reinhart and Rogoff, January 28, 2010). The key tables and figures have been reprinted in additional Reinhart and Rogoff publications and presentations of Centre for Economic Policy Research and the Peter G. Peterson Institute for International Economics. A Google Scholar search for the publication excluding pieces by the authors themselves finds more than 500 results. The key findings have also been widely cited in popular media. Reinhart's and Rogoff's website lists 76 high-profile features, including The Economist, Wall Street Journal, New York Times, Washington Post, Fox News, National Public Radio, and MSNBC, as well as many international publications and broadcasts. Furthermore, RR 2010a is the only evidence cited in the "Paul Ryan Budget" on the consequences of high public debt for economic growth. Representative Ryan's "Path to Prosperity" reports:

A well-known study completed by economists Ken Rogoff and Carmen Reinhart confirms this common-sense conclusion. The study found conclusive empirical evidence that gross debt (meaning all debt that a government owes, including debt held in government trust funds) exceeding 90 percent of the economy has a significant negative effect on economic growth. (Ryan 2013 p. 78)

RR have clearly exerted a major influence in recent years on public policy debates over the management of government debt and fiscal policy more broadly. Their findings have provided significant support for the austerity agenda that has been ascendant in Europe and the United States since 2010.

But McArdle is not one to let reality rear its ugly head.

That said, many more radical austerity hawks have naturally been drawn to that 90% figure. Such a lovely, round, precise number is bonza for stump speeches and TV sound bytes, and unsurprisingly, it's been found in a lot of them. So it matters whether it's in error. And it does seem to be at least somewhat in error.

Tie me kangaroo down, sport! After mitigating and shading as hard as her little heart could, McArdle is forced to admit that mistakes were made. Little, inconsequential mistakes that would in no way destroy R and R's arguments. And that mistake never influenced policy at all.

The UMass authors (heretofore to be known as Herndon et al) argue that there are three major problems with Rogoff and Reinhart's work, or at least with the claim that very high debt causes negative average growth rates: 1. They excluded the immediate postwar-growth years for Australia, Canada, and New Zealand. 2. There is a coding error in the spreadsheet which caused them to exclude the first five countries in their analysis: Austria, Australia, Canada, Belgium, and Denmark. 3. They weighted each country's growth rate during high-debt episodes equally, rather than by the number of years for which the debt persisted.

Number one is arguably the most troubling, but funnily enough, it is mostly taken care of by the coding error. We're actually arguing mostly about New Zealand. Herndon et al. argue that New Zealand, plus the decision to weight by country, instead of the number of years that each country was in debt, lowers the growth rate during high-debt episodes from a somewhat robust 2.2% average to a terrible -0.1% average. Basically, they're arguing that because New Zealand had one year of very high debt and very bad growth, when you weight all the countries equally, you multiply that one bad year into a spurious "tipping point" where high debt destroys your GDP growth rate. Obviously this is a problem. I'm unable to tell exactly how much of a problem, because the country-year method is also arguably problematic. The years that a country spends in debt are serially correlated--which is to say that if you had a debt load above 90% of GDP last year, you're much more likely to have a similar debt load this year than a country which had a debt load in the 30% of GDP range. So weighting by country year is also likely to produce problems with your data. You could argue about how to calculate this for years--and I hope that these guys, and Rogoff/Reinhart, will do just that.

Because McArdle would much rather you argue methodology than discuss the effect of the methodology problems on the data and therefore the conclusion. But what of McArdle's claim that serial correlation invalidates the country-year method? Let's hand this question over to Josh Bivens at the Economic Policy Institute:

Some have argued that “serial correlation” in country/year high debt episodes—particularly when the years are consecutive—might mean that each country/year observation is actually not providing another fully independent data point in their sample and that weighting each as such might be inappropriate. Maybe, but it’s a long way from this insight to thinking that a proper fix is that the “year” part of the country/year observation should be completely ignored and each high debt year for a given country should just be collapsed into one single data point. Further, R&R have never been hugely clear about the economic transmission mechanism that allows high debt ratios to slow growth (indeed, they note that the most logical prime suspect—rising interest rates, do not seem to be up to the job of explaining this association). What they have strongly implied is that it is the problem of debt exceeding 90 percent is greatest when it comes in long-lived episodes rather than in one or two-year bursts (their latest paper on “debt overhangs,” in fact, focuses exclusively on episodes of debt exceeding 90 percent of GDP for five years or more). Given this, one might think that serial correlation would make their results stronger when one switches to country/year observations. That is, long-lived episodes of high debt (the 19 years in the UK) should be much more damaging to growth than one-off years that see debt barely move over the 90 percent threshold and then retreat (the one year of New Zealand data in their sample). But as HAP show, weighting each country/year observation equally (which should allow serial correlation to influence the results) actually makes most the R&R findings on debt exceeding 90 percent melt away.
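
To make the weighting dispute concrete, here is a minimal Python sketch of the two averaging methods. Everything in it is hypothetical: the two countries, their growth rates, and the function names are invented for illustration (one 19-year episode and one single-year episode, loosely echoing the UK and New Zealand examples Bivens mentions). These are not the figures from R and R or from Herndon et al.

```python
# Hypothetical growth rates (percent) for each country's high-debt years.
# These numbers are invented for illustration; they are NOT the actual
# Reinhart-Rogoff or Herndon et al. data.
episodes = {
    "Country A": [2.5] * 19,  # a long episode: 19 high-debt years
    "Country B": [-7.5],      # a single bad high-debt year
}


def country_weighted_mean(episodes):
    """Average each country's years first, then weight every country equally."""
    country_means = [sum(years) / len(years) for years in episodes.values()]
    return sum(country_means) / len(country_means)


def country_year_weighted_mean(episodes):
    """Pool every country-year observation and weight each one equally."""
    all_years = [g for years in episodes.values() for g in years]
    return sum(all_years) / len(all_years)


print("Weighted by country:     ", country_weighted_mean(episodes))
print("Weighted by country-year:", country_year_weighted_mean(episodes))
```

With these invented inputs, the country-weighted average comes out at -2.5 percent and the country-year average at 2.0 percent. The single bad year counts for half of the result under the first method and for one-twentieth under the second, which is the amplification effect being argued over.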

After blowing smoke, McArdle helpfully hands her audience a convenient excuse to ignore the flaws in R and R's data:

Frustratingly, though the authors of the paper break out the results in various ways, the labelling is not very clear, and as far as I can tell they do not show you what the data looks like if you put all the New Zealand miscoded years back, but use the Rogoff/Reinhart weighting method. I'd really like to see this to get a sense of how much of their dispute hinges on omissions, and how much over disagreements about weighting methods.

Herndon et al. tell us:

The exclusion of the missing years is alone responsible for a reduction of 0.3 percentage points of estimated real GDP growth in the highest public debt/GDP category. Further, RR's unconventional weighting method that we describe below amplifies the effect of the exclusion of years for New Zealand so that it has a very large effect on the RR results.

Not that the actual numbers matter to McArdle. She's already decided that there are two versions of R and R's conclusion, the actual "strong" one and an imaginary "softer" version that would be easier to defend.

Nonetheless, I think it's fair to say that a result should not hinge on a single bad year from New Zealand. And Herndon et al are arguing that the "strong" version of Reinhart-Rogoff, where debt levels of above 90% of GDP are actually correlated with negative growth rates, is almost entirely driven by that one bad year, plus the choice of weighting method. This is not, to put it mildly, a very robust result. The question remains: how much does it matter? As a policy matter, in my humble opinion is: not at all.

McArdle goes on to explain at great length why R and R's paper didn't really count because everybody else said debt was associated with slowdowns and Clinton reduced debt so liberal critics are just hypocrites and Europe's austerity had nothing to do with the paper and the US didn't count.

I think there is a roughly 0% chance that US economic policy would be detectably different if Reinhart and Rogoff had never been published. This is obviously going to be embarrassing for Reinhart and Rogoff, because coding errors always are, and especially when your coding error produced a widely cited figure. To point out the obvious, conservative wonks and politicians should stop citing that result.

Most of all, McArdle wants us to learn a very valuable lesson from this unfortunate situation: Nobody can know anything ever.

And to point out the somewhat-less-obvious, people on all sides should be cautious about lovely, round numbers. Even if there had been no coding error, no disagreements about the country weights, this still would have been one number from one study. People were relying on this figure because it gave the illusion of precision. For some stupid reason, things sound more like a fact if there's a number attached.

Especially liberals.

On the flipside, no one should be acting as if discrediting this single number somehow defeats the hawkish arguments over government borrowing. Even if this one number is wrong, there is still ample reason to worry about debt dynamics and crowding out--some of that evidence from Reinhart and Rogoff, but also from many other sources. Indeed, Herndon et al show a relationship. They say that this relationship is not statistically significant. But "not statistically significant" is not the same as "unlikely to be true". There is other empirical work, and some good theoretical reasons, to think that too much debt is dangerous. The reasons for debt hawkery can certainly be argued with. But they neither stand nor fall on a single paper, much less a single number from it.

I predict that within a week or two McArdle will be reminding us that it has been proven statistically that high debt slows growth, and will link to this article as proof. I also predict that if a liberal made these sorts of mistakes (or if McArdle could convince people that a liberal made these sorts of mistakes), McArdle would be the first to call them dishonest and ideological, as she attempted to do with Elizabeth Warren.

14 comments:

Dragon-King Wangchuck, May 29, 2013 at 2:20 PM
But "not statistically significant" is not the same as "unlikely to be true".

WhatisthisIdon'teven.

Susan of Texas, May 29, 2013 at 2:22 PM
See, just because R and R were wrong doesn't mean that what they said was wrong. It just means that if they were wrong, which they were, it doesn't matter.

Dragon-King Wangchuck, May 29, 2013 at 2:53 PM
"Not statistically significant" means EXACTLY "unlikely to be true". In fact, the degree to which it is statistically insignificant quantifies the probability that the thing is not true.

Is there a potential argument about p-values and sample sizes and wevs? No. That's covered by the word "unlikely". The likelihood is exactly described by the significance test. And sample sizes? The not statistically significant correlation was found using EXACTLY the same data that R&R used in the first place. If there was enough data to prove the hypothesis true, there is also enough data to prove it false - at exactly the same level of confidence.

Unless, I guess, McArdle uses a new kind of statistics where it's possible to reject the null hypothesis while still getting it to pay for dinner and a movie.

Susan of Texas, May 29, 2013 at 2:59 PM
McArdle's numbers mean exactly what she wants them to mean, no more, no less.

More seriously, since she is certain the conclusion is right, the actual proof doesn't matter to her at all. It's the conservative mind at work, in which denial prevents them from acknowledging anything they don't like.

Smut Clyde, May 29, 2013 at 3:29 PM
"Bonza" is *not* an acceptable spelling of "bonzer". Urban Dictionary is wrong about this. Why MM would affect out-of-date Orstralian slang is a mystery.

fish, May 29, 2013 at 4:24 PM
"Not statistically significant" means EXACTLY "unlikely to be true".

Actually all it means is that you can't rule out the null (not true) hypothesis. You can have a not statistically significant result that can become significant if the sample size (n) becomes larger. The larger the n required to prove something, the smaller the effect size is, but it can still be real.

Dragon-King Wangchuck, May 29, 2013 at 7:09 PM
Well more data changes stuff, sure I totally accept that. But on the point - once you determine that something is not statistically significant at p=x then it's x likely to be untrue. While stats defines the continuum of possibilities between true and untrue, it's binary in that there's no state for "neither". IOW, the probability that the hypothesis is true summed with the probability that it's not true is 100%.

In the specific example, McArdle (and austerity junkies) believe that debt is correlated (actually they believe causation which is another argument) with low growth. There is no statistically significant correlation between debt and growth rate. Therefore the hypothesis is not likely to be true. If the p value for that test was the typical 2 sigma 19 times out of 20, then it is 95% likely (or wev the actual %age is for that test distribution) that there is no correlation between debt and growth.

Unless there's another state. I mean, I'm not saying that the opposite of the tested condition is true, just that if no statistically significant correlation is found at some confidence interval, then you can say that no correlation exists with that level of confidence. That's right, isn't it?

brad, May 29, 2013 at 10:04 PM
Statistical significance, like all empirical manifestations of consensus reality, completely leaves out faith, stupidity, ignorance, and self-serving partisan deception.
Besides which, it hurts the authoritarian brain to have to think of things in terms of probabilities and potentialities and multiple outcomes based on multiple choices by multiple actors when everything is ultimately good/bad, right/wrong, based on tribal loyalty, and Daddy does all the real doing anyway.
McMegan naturally no like.

Downpuppy, May 30, 2013 at 7:01 AM
Crowding out?

Crowding out!!*&^$*&*

Yeah. Mid term AFR is 1% because nobody has money to lend.

She isn't even trying.

ifthethunderdontgetya™³²®©, May 30, 2013 at 8:45 AM
Or McArdle is attempting to re-write reality to agree with her personal opinions.

I think that's the winning ticket.
~

fish, May 30, 2013 at 10:54 AM
I'm not saying that the opposite of the tested condition is true, just that if no statistically significant correlation is found at some confidence interval, then you can say that no correlation exists with that level of confidence. That's right, isn't it?

No, it is not exactly right. You could have a p value of say 0.056, this would not traditionally be considered statistically significant, but it is still quite likely to be true (94.4%). The p=0.05 cutoff is a somewhat arbitrary one that is set so that there is some standard language that everyone can agree on. This kind of statistics is a continuum, that is why sample size is a critical component of any statistical argument. Large effects can be detected with small N (e.g. the fake data from R&R), but it takes much larger sets to prove small effects are real. But by that time, you are admitting that the effects are small (the opposite of what R&R want in this case) and probably other conditions are more important or there are hundreds of small effects that add up to the observed phenomenon...

fish, May 30, 2013 at 11:00 AM
Maybe a better way of saying it is p value represents the confidence with which you can say that the null hypothesis (i.e. no correlation) is not true. So when the p value hits 0.5 it is a coin toss (50/50 true not true). At p= 0.05 you are correct that the null is not true 95% of the time.

Susan of Texas, May 31, 2013 at 9:23 AM
Smut Clyde--I am guessing that her knowledge of Australia comes mostly from Nevil Shute books, which would explain the WWII Australian slang.

McArdle is extremely parochial for someone who's a world traveler.

Ghost of Joe Liebling's Dog, October 5, 2013 at 8:04 AM
I had to follow the link, just for the "heretofore to be known as" -- it's so rare to see one in the wild.

Such a shiny big word; it's almost sad that it doesn't mean what she thinks it means.

Wednesday, May 29, 2013