Wednesday, March 30, 2011

Keith Briffa and the Renaissance

While I was busy with the Antarctica analysis a blogfuss started at CA on Briffa's 1999 (!) Science presentation of various paleo reconstructions, particularly relating to whether a particular graph should have started in 1400 or 1550. A file had been discovered which showed data down to 1400, and if you plot it, it goes into oscillations in the years before 1550. Since it is clear that this is in a period of rapidly diminishing data, and very likely caused by that, I thought that would die fairly quickly, but no, as these things go, it was promoted to a grand ethical violation, megaphoned at WUWT, and taken up at the Air Vent, where it was seen as "unbelievable fraud".

The evidence is supposed to be this graph, taken from this paper: SEEING THE WOOD FROM THE TREES. KR Briffa and TJ Osborn, Science, 1999

The pink bits have been helpfully added by Steve McIntyre, based on numbers he found in an Excel file which had apparently been inadvertently attached to another paper's data archived at NOAA. It's undocumented there.

Well, it seemed clear to me that the available data is just getting sparse as we go back beyond 1550, and the wild swings are just the result of the growing noise, as you'd expect. And I haven't found anyone who seems to seriously think they reflect any kind of reality. So Briffa sensibly stopped at 1550 to avoid misleading the public. But no, it's apparently academic misconduct, another Yamal even. [ed. But, but, wasn't that about actually proceeding with plots where data was running low? Oh, never mind, ethical violation etc.]

So I belatedly saddled up and sallied forth to dispute at CA and tAV. I don't want to rehash that here, nor transfer the debate, which I'll continue there. In the course of it, there was argument about how much the data was actually reducing, and Steve McI pointed me to the archived list of sites at UEA. There isn't one for the Science paper but there is a list for the Nature paper that it references:
Briffa KR et al. (1998) Influence of volcanic eruptions on Northern Hemisphere summer temperature over the past 600 years. Nature 393, 450-455.
and a list for a subsequent paper
Briffa KR et al. (2001) Low-frequency temperature variations from a northern tree-ring-density network. J. Geophysical Research 106, 2929-2941.

From these, I could extract a graph for the rate at which the number of sites was reducing in the relevant period. I also made lists of the actual sites concerned.
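A minimal sketch of how such a count can be made, offered as an illustration and not necessarily how the plots here were produced. It assumes a site contributes data for every year from its Start to its End, inclusive at both ends. The six (start, end) pairs are taken from the Nature 1998 site list in this post, and the function names are hypothetical.

```python
# Sketch only: count how many sites have data in each year.
# Assumption: a site covers every year from start to end, inclusive.
# The sample is a small subset of the Nature 1998 site list, not the full network.
sites = [
    (914, 1990),   # Polar-Ural (historisch)
    (1246, 1969),  # Mangazeja (hist. + rez.)
    (1307, 1698),  # Mangazeja (historisch)
    (1400, 1980),  # Tornetrask
    (1550, 1991),  # Batagay,Chandon-river
    (1574, 1990),  # Kulyumbe River
]


def sites_in_year(sites, year):
    """Number of sites whose record covers `year`."""
    return sum(start <= year <= end for start, end in sites)


def site_counts(sites, first_year, last_year):
    """Map each year in [first_year, last_year] to its site count."""
    return {y: sites_in_year(sites, y) for y in range(first_year, last_year + 1)}


counts = site_counts(sites, 1400, 1600)
for year in (1400, 1550, 1600):
    print(year, counts[year])
```

Run on the full list, the same loop gives the number of sites for every year, which is what the plots of site count against time show.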



The reason for plotting the JGR paper as well is that Steve M pushed the claim that the JGR paper had plotted down to 1400, while the Science paper stopped at 1550. They referenced different data sets, so I wanted to check if they perhaps had more data in the later paper. As you'll see, the answer is no, but they did claim in the second paper to have an improved method, which might well allow them to go further back in time.

Anyway, here is the plot of number of sites vs time, over all time.

As you can see, the number of sites is dropping rapidly before 1600, and is down to about 40 near 1550. Here is the expanded region between 1400 and 1600:

As you can see, the rate of decrease is quite sharp near 1550. There's no absolute rule on where a plot has to be stopped. The noise rises relative to the signal in a continuous way, and I don't currently know how to quantify whether 40 sites is likely to be sufficient. But neither do the critics. What is clear is that the rapid changes in McIntyre's graph are closely associated with the steep reduction in data. In those circumstances, I would be very uncomfortable about presenting them as real. And I don't think referees would let me.
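As a rough yardstick only: if the noise at each site were independent and of equal variance (an idealisation that spatially clustered tree-ring sites will not satisfy), the standard error of a simple site average would scale as 1/sqrt(N). The sketch below, with a hypothetical helper name, does that arithmetic relative to 40 sites. It says nothing about spatial coverage or about the actual reconstruction method.

```python
import math


def relative_standard_error(n_sites, n_reference=40):
    """Standard error of an n_sites average relative to an n_reference average.

    Assumption: independent, equal-variance noise at every site, so the
    standard error of the mean is proportional to 1 / sqrt(n_sites).
    """
    return math.sqrt(n_reference / n_sites)


for n in (40, 20, 10, 5):
    print(n, round(relative_standard_error(n), 2))
```

Under that assumption, halving the network from 40 to 20 sites raises the standard error by a factor of about 1.41, and dropping to 10 sites doubles it. Correlated or unevenly distributed sites would behave differently.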

Here is a list of the first 61 sites from the 1998 Nature paper. I won't list the corresponding JGR sites, as that list is very similar.
Row  ID     Name                       Abbr      Genus  Lon      Lat    Start  End
276  862A   Polar-Ural (historisch)    POU_LA    LASI   65.63    66.87  914    1990
253  928A   Mangazeja (hist. + rez.)   MANGAZPC  PCOB   82.3     66.68  1246   1969
252  928D   Mangazeja (historisch)     MANGAZLA  LASI   82.3     66.68  1307   1698
301  913A   Zhaschiviersk              ZHASHIST  LASI   142.62   67.45  1311   1708
351  1041A  Majakit river/village      MAJAKILA  LADA   151.95   61.28  1338   1994
278  930A   Seimchan-river             SEIMCHLA  LADA   151.72   63.52  1362   1991
215  745A   Sylvan Pass bei Cody       SYLVAN    PCEN   -110.13  44.37  1388   1983
264  1017A  Nuleger river              NULEGELA  LADA   127.43   71.22  1391   1990
174  691A   Jasper                     SUNW      PCEN   -117     52.25  1400   1983
383  9999Z  Tornetrask                 TORNXX    PISY   19.5     68.3   1400   1980
206  731A   Medicine Bow Peak          MEDI      PCEN   -107.7   41.3   1401   1983
372  992A   Qamdo                      QAMDOFI   PCBA   96.95    31.08  1406   1994
337  1014A  Bilibina,Mali Anuj river   BILIBILA  LADA   167.67   67.47  1432   1991
382  1060A  Tschokurdach,Ochotingna r  TSCHOKLA  LADA   148.05   70.28  1434   1990
48   630A   Sierra da Crispo           CRISPO    PILE   16.23    39.9   1441   1980
332  1032A  Andryuschkine              ANDRYULA  LADA   145.77   69.28  1449   1991
268  899A   Olenjok                    OLENJOLA  LAGM   112.53   68.52  1450   1990
214  686A   Snow Bowl, San.Fr. Peak    SNOW      PCEN   -110.2   35.43  1453   1983
377  1063B  Adycha river (flach)       ADYBOALA  LADA   137.25   65.8   1481   1991
304  865A   Olenok-River               ZWANZGLA  LAGM   119.1    68.52  1482   1990
316  540A   Lofoten, Loedingen         LOFOTN    PISY   16.03    68.48  1485   1978
333  955A   Balshoia Anui              BALANULA  LADA   165.42   66.22  1492   1991
198  692A   Highland Fire Outlook      HIGHL     PCEN   -112.53  45.75  1496   1983
209  733A   Powder River Pass          POWDER    PCEN   -107.05  44.15  1496   1983
352  981A   Omoloya river              OMOLOYLA  LADA   132.98   70.95  1496   1991
380  1070A  Balygichan-river           BALYRILA  LADA   154.58   62.13  1497   1994
188  762A   Barlow Pass, am Mt.Hood    BARLOW    PSME   -121.65  45.32  1504   1983
200  746A   Granite Pass, Hunt Mtn.    HUNT      PCEN   -107.87  44.78  1508   1983
101  689A   Hidden Peak, Wasatch Mtn.  ALTA      PCEN   -111.63  40.57  1511   1983
184  702A   Yosemite Park, E Eingang   YOSE      PICO   -119.25  37.8   1513   1983
87   629A   Col de Sorba, Mt.Renoso    SORBA     PINI   9.2      42.07  1518   1980
202  760A   Lassen National-Park       LASSEN    TSME   -121.52  40.45  1525   1983
196  723A   Galena Pass, Sawtooth NF   GALENA    PCEN   -114.72  43.87  1530   1983
208  756A   Pike Peaks                 PIKES     PCEN   -105.03  39.33  1530   1983
341  974A   Cherskij                   CHERSKLA  LADA   163.05   68.8   1538   1991
220  961A   Ayakli river               AYAKLILA  LAGM   97.53    69.53  1540   1990
192  755A   Electric Lake              ELEC      PCEN   -111.33  39.58  1542   1983
345  1047A  Julietta north             JULINOLA  LADA   153.97   61.17  1547   1994
204  732A   Lone Lake                  LONE      PICO   -118.47  37.17  1548   1983
336  973A   Batagay,Chandon-river      BATAGALA  LADA   138.17   70.25  1550   1991
221  924A   Ayandina-River             AYANDYLA  LADA   143.17   68.42  1553   1991
125  704A   Denali National-Park       DENALI    PCGL   -149.58  63.67  1554   1983
210  700A   Sierra Blanca, (Ruidoso)   RUID      PSME   -105.67  33.33  1554   1983
339  956A   Bulun river                BULUNLAD  LADA   154.93   65.1   1554   1991
353  958A   Rossocha river             ROSSOCLA  LADA   149.43   65.15  1555   1991
187  726A   Baldy Peak                 BALDY     PSME   -109.55  33.97  1556   1983
348  969A   Srednie-Kolymsk            KOLYMSLA  LADA   153.7    67.25  1556   1991
381  1061A  Kubaka right bank middle   KUBA2LAG  LAGM   159.95   63.67  1560   1994
240  864A   Kotuykan-River             KOTUYALA  LAGM   104.25   70.58  1563   1990
191  748A   Crater Lake, NE-Medford    CRATER    TSME   -122.17  42.97  1564   1983
246  926A   Kuonamka-River (trocken)   KUONLATR  LAGM   112.82   69.93  1564   1990
302  906A   Zhigansk                   ZHIGANPI  PISY   122.33   66.52  1564   1991
354  970A   Sartan river               SARTANLA  LADA   132.93   64.93  1564   1991
190  683A   Cottonwood Pass            COTTON    PCEN   -107.58  38.67  1565   1982
356  980A   Tirekhtjakh river          TIREKHLA  LADA   137.47   67.62  1565   1991
203  695A   Mt. Lemon                  LEMON     PSME   -110.78  32.45  1568   1983
234  927A   Khandiga-River             KHANDILA  LADA   137.75   62.47  1568   1990
238  914A   Khotugn-Uladan-Tukulan     KHOTUGPI  PISY   125.8    63.38  1568   1991
228  1011A  Balschaya Kamenka river    KAMENKLA  LAGM   93.97    71.32  1569   1990
243  871A   Kulyumbe River             KULYUMLA  LASI   89.07    68.05  1574   1990

Briffa et al Nature 1998: ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/reconstructions/n_hem_temp/nhemtemp_data.txt

24 comments:

  1. Hey Nick,
    Interesting discussions. I figured you might like to further note that spatial distribution is probably the key factor here. See the following:
    http://www.skepticalscience.com/pics/Sites.png

    This is also supported by Briffa 2001

    “Bias might be introduced in cases where the spatial coverage is not uniform (e.g., of the 24 original chronologies with data back to 1500, half are concentrated in eastern Siberia) but this can be reduced by prior averaging of the chronologies into regional series (as was done in the previous section)… Eight different methods have been used… They produce very similar results for the post-1700 period… They exhibit fairly dramatic differences, however, in the magnitude of multidecadal variability prior to 1700… highlighting the sensitivity of the reconstruction to the methodology used, once the number of regions with data, and the reliability of each regional reconstruction, begin to decrease. The selection of a single reconstruction of the ALL temperature series is clearly somewhat arbitrary… The method that produces the best fit in the calibration period is principal component regression…”

    “…we note that the 1450s were much cooler in all of the other (i.e., not PCA regression) methods of producing this curve…”

    It is a bit arbitrary to select 40 but without mapping the 40 in particular it might be hard to understand. Later today I can provide a GIS map for you perhaps.

  2. Nick,
    In my research, I came across this paper,

    http://www.cru.uea.ac.uk/cru/pubs/thesis/2004-melvin/melvin-2004-thesis.pdf

    a dissertation by Tom Melvin. It looks to be primarily about data from the Northern treeline in Luosto, Finland and Helldalisen, Norway. It has references to the Briffa, Jones etc. work from that time period and discusses the pre-1550 data.

    The NOAA has a lot of that data

    http://www.ncdc.noaa.gov/paleo/indextree.html

    from the sites listed on page 28 of the Melvin dissertation.

    I hope this helps you.

    - gryp

  3. What you should do in a case like this is include the data together with an uncertainty estimate of the data.

    Strange that people demanded that Loehle provide error bars, but they don't do it themselves. That's just an aside.

    I also suspect you can find plenty of proxy reconstructions that rely on far fewer than 40 proxies.

    What you don't do is quietly redact part of your data analysis without explanation. Not and remain ethical in your treatment of the data, that is. These weren't junior researchers so "sloppiness" isn't a viable excuse.

  4. Okay Carrick,
    So you suggest using a global temperature record like this:
    http://treesfortheforest.files.wordpress.com/2010/05/ghcn_global_temp_area_1961-1990_5x51.png

    And just including uncertainty estimates going back further? Are you really saying that ALL data in a reconstruction should be included no matter how uncertain?

  5. No, Robert I'm not suggesting that. Where did you get that idea?

    The issue is only about the responsibility of the researcher in reporting the results of his analysis of an ensemble of data that he has analyzed, whether he collected it himself or not.

    What we have here is the post-hoc redaction of a portion of the data that were considered by the authors, with no explanation for why the data were redacted and no comment that a redaction ever occurred.

    Can you imagine as a hypothetical example, somebody comparing a model to data, and stopping the comparison at the point where the model and data started to disagree (and making no comment that they had done this)? Would that be appropriate to do?

  6. Also, regarding the uncertainty in the data, Nick has only shown there were fewer series, he hasn't shown that the uncertainty has bloated to the point where cutting off the graph at 1550 would be a reasonable and prudent thing to do. After all, if the data are uncorrelated, the uncertainty goes as the inverse square root of the number of time series. This means in going from 40 tree ring series in 1550 to 20 series in 1450, we've only increased the uncertainty by a factor of the square root of 2.

    (Even if Nick were to objectively demonstrate that one should drop the data prior to 1550, which I doubt he can actually do, that still doesn't excuse the neglect of the authors in failing to mention that they had done so, nor the neglect in the explanation for why they had cut the series off.)

  7. Carrick,
    I think the continuous plot underlines the point - where would you stop? There's data back to 954 AD, although that would give about 300 years of just Polar Urals. So do they have to show a plot back to 954? What makes 1400 OK?

    And yes, I can try to imagine what CA would be saying if they presented 300 years of NH based on Polar Urals.

  8. Robert #1
    Thanks. Yes, I think it may be mostly spatial, but the sample number also has an effect, which I've tried to illustrate with the new post.

  9. Grypo #2
    Thanks. I've downloaded that thesis, and it looks very informative and well written. I'll read it later on today (when the NH goes to bed).

  10. Nick: I think the continuous plot underlines the point - where would you stop? There's data back to 954 AD, although that would give about 300 years of just Polar Urals. So do they have to show a plot back to 954? What makes 1400 OK?
    We keep covering the same ground, making ruts here, but you stop where your analysis ends:

    If you select a collection of data, begin analyzing it, and then find part of your results detrimental to your conclusions, the question is what are your responsibilities as a researcher in this situation?

    How is what they did different than my scenario of a person comparing his model to data, but only including the portions where the two agree, without mentioning or explaining the exclusion of the other data (for which there can be good reasons to exclude it)?

    Again (going deeper into the same rut) what we are describing here is a post-hoc redaction of results, which for the language-challenged, means that they do the analysis then decide after the fact not to show part of the analysis that they performed.

    This document covers some of the issues addressed here. In the US it is a requirement to take a course on responsible conduct of research (I could link the site, but unfortunately it is a closed site, your institute needs to be associated with the program to participate).

    But please note:

    The ethics issues are very different from the question of whether the redaction could be justified. The ethics issues address the question of the researcher's responsibility to alert the reader that part of their results have been withheld, and why.

    All this said, I can think of plenty of cases where it would be OK to leave the data off the graph (which by the way is still a "manipulation" of the results, just a kosher one)...I can't think of any examples where it would be OK to do so, and not notify the reader though. That's the issue I believe we're really dealing with here.

  11. I meant to say, at many higher educational institutions in the US, it is a requirement to take a course on responsible conduct of research.

    This requirement, where present, typically includes professors, research staff, post-docs and grad students and covers anybody who is receiving federal funding for their research. Here's a sample.

  12. I'm a little confused: the Science piece is not a paper, it's a commentary - probably invited and probably not refereed.

    You don't put details about data analysis in such a piece or report any original research at all.

    Normally such a graph would come from a cited paper which reports the data analysis. So there may still be valid criticisms of the commentary.

    But you can also imagine counter arguments - the authors might view as justified taking a portion of a graph constructed for different purposes (did they?) relevant to the discussions in the commentary - which BTW seems to be aimed at cataloging the weakness in paleoclimate reconstructions of Mann et al.

    So the hyperbole is at least misplaced.

  13. Journals like Science have strict word limits on submissions. There's no way to include anywhere near a full description of data handling and analysis in any given paper. In fact it's very frustrating to try to write a research paper in the 4-5k word range; all that's on your mind is what you've left out. I'm new to this particular Briffa kerfuffle - what's the evidence that he was purposely hiding something by not mentioning his inclusion criteria? There are a hundred other things that weren't in the paper because of space constraints as well - there's simply no way to include everything that you've done, and every choice you've made, that someone somewhere might have a question about in the future.

  14. andrewt,
    Briffa referred to his volcano paper and the Phil Trans paper. I think the Phil Trans paper is the source paper - it discusses and uses the RCS methods which seem to be the basis of the plot in question. But it doesn't actually show the plot.

    But yes, the hyperbole is misplaced.

  15. Hi Nick, I like your recent plots!
    Have you seen the recent article in Annals of Applied Stats, vol 5(1), pp. 5-44, by McShane & Wyner ... with Discussion! The most interesting part, Rob Dunne tells me! :)

    Bill Wilson

  16. Thanks, Bill,
    There was a lot of discussion of the McShane and Wyner paper last year, eg CA, CA, CA, CA, tAV, tAV, WUWT.

    My own contribution was slight, and fell way short of expectations. I can't remember reading the discussion - I must get back to that. A lot seems to depend on what you think of the Lasso in this context - do you have any ideas there?

  17. The hyperbole is the suggestion that it would have been difficult to have put two sentences in, explaining that you had cut off the graph at a particular period, and a short (1-2 sentence) explanation of why. As I've said in another thread here (one that degenerated into google search counts), I've actually written review papers, including one that appeared in Nature, so I realize the limits...and these arguments you guys are giving are just pure unmitigated BS.

    You guys are going into bizarre contortions trying to explain away what can be diplomatically described as "poor judgement" on the part of the authors. At least, Kerry Emanuel had the honesty to admit that much.

  18. Nick,
    I'm not sure why you would point folks to the largely fatuous commentary on M&W at CA (Briggs?? give me a break) and tAV. Yet you omitted the substantive comment at RC and, yes, yours truly.

    RC:
    http://www.realclimate.org/index.php/archives/2010/08/doing-it-yourselves/

    http://www.realclimate.org/index.php/archives/2010/12/responses-to-mcshane-and-wyner/

    DC:
    http://deepclimate.org/2010/08/19/mcshane-and-wyner-2010/

    As I showed in excruciating detail, M&W Lasso results are flatly contradicted by an actual paleoclimatological study using the same data set (Mann et al 2008). So at best, M&W's section 3 merely demonstrates that an entirely unsuitable and unused methodology doesn't work.

    M&W's account of the "scientific literature" is an appalling rehash of CA and the Wegman Report, and demonstrates that they haven't read, much less understood, the actual relevant literature.

  19. Deep,
    Yes, I realised later that I had only remembered the discussion that I found frustrating - yours and RC were much better.

  20. Re: M&W: see SSWR, A.12, pp.96-113.

    Not only did M&W use poor statistical methods, but their paper most likely falls under academic misconduct (plagiarism + falsification/fabrication, certainly false citation.) They cut-and-paste/edit or copied tipoff errors from Wegman Report & Wikipedia, and then false-cited Ray Bradley's book, whose title they misspelled in exactly the same way as Wegman Report had! Read that section of SSWR, although it was done quickly, and even more is known now.

  21. Regarding some matters arising:
    While Science articles are strictly limited in length, Science, like most journals, allows authors to include supporting online data, which is not limited, so one can no longer point to space limitations as an excuse for omitting detail.

    Most graduate students are required to receive some training in ethical conduct of science, but it does not need to be a formal course.

    Most scientists would agree that it is improper to delete data without at least stating criteria (which should be unbiased, and preferably decided upon before inspecting the results) for eliminating data. If the data was deleted for a good reason, and the author merely failed to explain it, this would be considered to be more along the lines of error than ethical violation.

  22. trrll,
    Remember, this is 1999. As I remember the internet then, I doubt if Science had that provision in 1999. If it did, it was very new.

    No data was deleted here. It's a plotting issue.

    But I agree that, if the plot cutoff was done for a good reason, then the most one can argue about is whether it should have been explained, which is an exposition issue, not an ethical one.

  23. I would think that the poor coherence that shows with fewer proxies should be considered part of a sensitivity test. If one can use a reasonable fraction of the total proxies used over the time period and come up with a very different looking time series I would think that the lack of robustness must be considered in a complete analysis.

    To be honest, my interests are not in the ethics of the matter, as I couldn't care less about the standing of these scientists. On the other hand, I am most interested in looking at the results of sensitivity tests, whether intended or not. I have seen this lack of sensitivity testing, and/or of the reporting of it, as a weakness in some climate science papers I have read, even with simple things like sensitivity to start/end points for a time series trend that might have cyclical components.
