Sunday, February 5, 2017

On the Mail on Sunday article on Karl et al., 2015

There is an "interesting" piece (use of quotes intentional) in the Mail on Sunday today around the Karl et al., 2015 Science paper.

There are a couple of relevant pieces arising from Victor Venema and Zeke Hausfather already available which cover most of the science aspects and are worth a read. I'm adding some thoughts because I worked for three and a bit years in the responsible NOAA group in the build-up to the Karl et al. paper (although I had left prior to that paper's preparation and publication). I have been involved in, and am a co-author on, all the relevant underlying papers to Karl et al., 2015.

The 'whistle blower' is John Bates, who was not involved in any aspect of the work. NOAA's process is very stove-piped, such that beyond seminars there is little dissemination of information across groups. John Bates never participated, either in person or remotely, in any of the numerous technical meetings on the land or marine data that I took part in at NOAA NCEI. This shows in his reputed (I am taking the journalist at their word that these are directly attributable quotes) misrepresentation of the processes that actually occurred. In some cases these misrepresentations are publicly verifiable.

I will go through a small selection of these in the order they appear in the piece:

1. 'Insisting on decisions and scientific choices that maximised warming and minimised documentation'

Dr. Tom Karl was not personally involved at any stage of the ERSSTv4 development, the ISTI databank development, or the work on the GHCN algorithm during my time at NOAA NCEI. At no point was any pressure brought to bear over any scientific or technical choices. It was insisted that best practices be followed throughout. The GHCN homogenisation algorithm is fully available to the public and its bug fixes are documented. The ISTI databank has been led by NOAA NCEI but involved the work of many international scientists. The databank involves full provenance of all data, and all processes and code are fully documented. The paper describing the databank was held by the journal for almost a year (accepted October 2013, published September 2014) to allow the additional NOAA internal review processes to complete. The ERSSTv4 analysis has been published in no fewer than three papers. It too went through internal review and approval processes, including a public beta release, prior to its release, which occurred before Karl et al., 2015.

2. 'NOAA has now decided the sea dataset will have to be replaced and revised just 18 months after it was issued, because it used unreliable methods which overstated the speed of warming' 

While a new version of ERSST is forthcoming, the reasoning here is incorrect. The new version arises because NOAA, and all other centres looking at SST records, are continuously looking to develop and refine their datasets. The ERSSTv4 development was completed in 2013, so the new version reflects over three years of continued development and refinement. All datasets I have ever worked upon have undergone version increments. Measuring in the environment is a tough proposition - it's not a repeatable lab experiment - and the measurements were never made for climate. It is important that we continue to strive for better understanding and the best possible analyses of the imperfect measurements. That means being open to new, improved analyses. The ERSSTv4 analysis was a demonstrable improvement on the prior version, and the same shall be true in going to the next version once it too has cleared both peer review and the NOAA internal process review checks (as its predecessor did).

3. 'The land temperature dataset used by the study was afflicted by devastating bugs in its software that rendered its findings unstable' (also returned to later in the piece, to which the same response applies)

The land data homogenisation software is publicly available (although I understand a refactored and more user-friendly version shall appear with GHCNv4) and all known bugs have been identified and their impacts documented. There is a degree of flutter in daily updates. But this does not arise from software issues (running the software multiple times on a static data source on the same computer yields bit repeatability). Rather, it reflects the impacts of data additions, as the algorithm homogenises all stations to look like the most recent segment. The PHA algorithm has been used by several other groups outside NOAA, who did not find any devastating bugs. Any bugs reported during my time at NOAA were investigated, fixed and their impacts reported.

4. 'The paper relied on a preliminary alpha version of the data which was never approved or verified'

The land data of Karl et al., 2015 relied upon the published and internally process-verified ISTI databank holdings and the published, and publicly accessible, homogenisation algorithm applied thereto. This provenance satisfied both Science and the reviewers of Karl et al. It applied a known method (used operationally) to a known set of improved data holdings (published and approved).

5. [the SST increase] 'was achieved by dubious means'

The fact that SST measurements from ships and buoys disagree, with buoys cooler on average, is well established in the literature. See the IPCC AR5 WG1 Chapter 2 SST section for a selection of references by a range of groups, all confirming this finding. ERSSTv4 is an anomaly product. What matters for an anomaly product is relative homogeneity of sources, not absolute precision. Whether the ships are matched to buoys or the buoys matched to ships will not affect the trend. What will affect the trend is doing so (v4) or not (v3b). It would be perverse to know of a data issue and not correct for it in constructing a long-term climate data record.
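To make the point concrete, here is a toy numerical sketch (my own illustration, not the ERSSTv4 method; the 0.12 C offset and the series are invented): with a fixed ship-buoy offset, correcting in either direction recovers the same trend, while leaving the offset uncorrected does not.

```python
# Toy illustration of offset correction in a spliced record. All numbers
# here are made up for the sketch; this is not the ERSSTv4 processing.

def ols_slope(y):
    """Ordinary least-squares slope of y against time steps 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

OFFSET = 0.12                          # ships read warmer than buoys
truth = [0.01 * t for t in range(20)]  # true warming: 0.01 C per step

# A spliced record: ships report the first half, buoys the second half.
raw = [v + OFFSET for v in truth[:10]] + truth[10:]

ships_to_buoys = [v - OFFSET for v in raw[:10]] + raw[10:]  # adjust ships down
buoys_to_ships = raw[:10] + [v + OFFSET for v in raw[10:]]  # adjust buoys up

# Either correction direction recovers the true trend; the raw splice does not.
print(ols_slope(ships_to_buoys), ols_slope(buoys_to_ships), ols_slope(raw))
```

The choice of direction only shifts the absolute level, which is irrelevant for an anomaly product; the uncorrected splice, by contrast, buries a spurious step in the trend.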

6. 'They had good data from buoys. And they threw it out [...]'

v4 actually makes preferential use of buoys over ships (they are weighted almost seven times in favour), as documented in the ERSSTv4 paper. The assertion made in the article that buoy data were thrown away is demonstrably incorrect.

7. 'they had used a 'highly experimental early run' of a programme that tried to combine two previously separate sets of records'

Karl et al. used the ISTI databank as the land basis. This databank combined in excess of 50 unique underlying sources into an amalgamated set of holdings. The code used to perform the merge was publicly available, the method published, and internally approved. This statement is therefore demonstrably false.

There are many other aspects of the piece that I disagree with. Having worked with the NOAA NCEI team involved in land and SST data analysis I can only say that the accusations in the piece do not square one iota with the robust integrity I see in the work and discussions that I have been involved in with them for over a decade. 


Ceist said...

Please post this on Judith Curry's blog. She has a guest article by John Bates there, and science deniers are jumping up and down with glee.

PeterThorne said...

Ceist 8,

People should feel free to use this content elsewhere or link to it.

bert9000 said...

Thank you for the detailed explanation that debunks some of the inevitable spin from the Mail. I had a couple of questions, if I may.

In regards to point 2, is it correct to say that the new data will show cooler temperatures than that used in the 2015 paper? If so, is it significantly cooler?

On another note, the Mail article talks of an error in which there was a "failure to archive and make available fully documented data" meaning that the results of the influential 2015 paper cannot be independently validated.

Is there any truth to that?

PeterThorne said...

To anon:

In regard to point 2, it will not be entirely clear what the change in ERSSTv5 vs. ERSSTv4 is until a final version appears. In the version submitted to the journal the change is minor, yielding a slight reduction in the rate of recent warming of SSTs. But peer review, internal process review and thorough testing must take their course, and during this time changes may be suggested that improve and change the product. It would never be wise to assume that a submitted manuscript will match the final product.

For Karl et al., 2015, the source code and data for much, if not all, of the process are available. As noted in my post, I was not involved in that paper itself. But for the bits I was involved in, I would disagree with the assertion in the Mail on Sunday.

For the land data, the ISTI databank source and its construction are fully archived and reproducible, and the homogenisation algorithm likewise. Those were the aspects I was most directly involved with.

The SST source data are available publicly (ICOADS), as is the derived ERSST product.

I was never involved in aspects of the SST and land product merge and infilling so am not qualified to venture an opinion on the archival of that aspect.

But the underlying land and marine data have full archival and documentation, with much, if not all, processing software publicly available without request.

Jim Hunt said...

Peter - Thanks very much for this article. You may be interested to learn that Ex Prof. Curry is apparently currently quite content that the Mail on Sunday have now modified the caption under David Rose's "anomalous baseline" graph.

I am endeavouring to explain to her how these matters are supposed to be resolved here in the once Great Britain:

Judith – Do you seriously expect us to accept that changing the caption and not the graph, without apology, satisfies clause 1.ii of the IPSO Editors Code of Practice?

Would you be open to a (possibly private?) conversation about pursuing the IPSO process to its (potentially bitter?) end?


PeterThorne said...


I saw the graph issue but considered others to have taken sufficient care of that. The baseline paradox gets folks far more often than it should. If I recall correctly this isn't the first time this particular journalist has fallen foul of it.

At this juncture my main interest is in providing corrections, on the record, to the critiques of process, which only those involved such as myself were privy to, so that folks have proper context. Others have already provided scientific rebuttals. I consider this to be public record and that people can quote from / use it for fair purposes with attribution.

Beyond that I'll see how things pan out over coming weeks.

Anonymous said...

Buoy data is only significant over the last few decades and thus adjusting buoy data would only affect the last 40 years, or so. The trend would not change if looking at only the last 40 years, but, please correct me if I'm wrong, would not the overall trend increase when looking at the entire time series?

PeterThorne said...


The issue with buoys coming on board is that they read systematically differently to the ships. And buoys started out in the 80s/90s at 10% of the data and are now 90% of the data. If they had stayed at 10% of the data, then you are right that over the period of buoy measurements the trend wouldn't (to first order) be affected.

The issue is that if I start out in a hypothetical stationary process of true mean 1.00 with 90% of measurements reading 1.12 and 10% reading 1.00 and end up with 90% reading 1.00 and 10% reading 1.12 (or vice-versa) then I impart a trend in the series which isn't real. To get the real trend requires some sort of bias correction which recognises a. the offset between the techniques and b. the change in relative propensity of the two techniques. The precise nature of the method can be subject to discussion, the need for it less so.
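The hypothetical above can be sketched in a few lines (using the 1.00/1.12 numbers from the comment; this is purely illustrative, not the actual SST processing):

```python
# The stationary process from the example: the truth is 1.00, ships read
# 1.12 (a constant +0.12 offset), and the buoy share of reports grows.

TRUTH = 1.00
SHIP_READING = 1.12  # constant instrumental offset of +0.12

def network_mean(buoy_fraction):
    """Average of all reports for a given mix of buoys and ships."""
    return buoy_fraction * TRUTH + (1.0 - buoy_fraction) * SHIP_READING

early = network_mean(0.10)  # buoys ~10% of reports (80s/90s)
late = network_mean(0.90)   # buoys ~90% of reports (now)

# Nothing real changed, yet the uncorrected average drifts by ~ -0.1:
print(early, late, late - early)

# Correcting for (a) the offset between the techniques, given (b) the
# changing propensity of each technique, removes the artificial trend:
def corrected_mean(buoy_fraction):
    ship_corrected = SHIP_READING - (SHIP_READING - TRUTH)
    return buoy_fraction * TRUTH + (1.0 - buoy_fraction) * ship_corrected

assert corrected_mean(0.10) == corrected_mean(0.90) == TRUTH
```

The spurious drift comes entirely from the changing mix of two offset instrument types, which is exactly why a bias correction of some form is needed even though its precise construction can be debated.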

Gene said...

Last night I found a link to the "process that Dr. Bates helped design" for NOAA to track code changes / archive data / etc. I thought it was on Dr. Bates' blog, but I can't find that link again. Does anyone have an outline of that NOAA data archive process? Some googling led me to realize they have many processes, but I'm looking for the one originally linked to the story.

Unknown said...

I'm not a climate scientist, only a signals engineer with a PhD. Does the following statement make sense?

Climate scientists are trying to measure the total global heat balance measured over the year (expressed as average temperature of the earth), using a global grid of temperature measurements all affected by both linear and nonlinear noise sources that change from year to year. An algorithm to "homogenize" the data is used to make up for the measurement variations. You can't possibly get a good value for the absolute yearly global heat balance, but you can compare "homogenized and corrected" data sets against each other to get a relative change year or decades apart.

Just trying to get a handle on the problem.

R Graf said...

Peter, do you know if Karl's team considered the Matthews (2013) rebuttal (and its part II) of Thompson (2008)? Basically Matthews found that the ERI observations were not biased warm due to the depth of intakes getting colder water. He showed the speed of the observations was the most critical factor.

Can you kindly point to the location of the Karl(15) data and code archive?


PeterThorne said...


As you say, there are many process documents, and many of them also aren't publicly available. I would not know precisely which process document was meant to be followed at the time, or whether it is publicly available. The general ethos of the review processes for the land databank and ERSST that I was involved in preparing for, however, is reflected in the maturity assessment that John Bates and colleagues produced, which outlines several processing-maturity based strands.

A mature dataset by such metrics is a good thing, and it's right that it's well documented, the code verified, reviewed, etc. But these are aspects of processing maturity - they cannot guarantee scientific accuracy. They may ever so slightly tilt the odds in favour of scientific accuracy. But the fundamental issue is that well-engineered and well-understood processing does not guarantee a scientifically accurate product.

To my view Karl et al. were faced with the following question:

Would you rather have a mature product that you know is wrong or a less mature product that you believe is right based upon new knowledge of issues in the data that weren't addressed in your current process engineering mature product?

My judgement is that it is better to be scientifically 'correct' than process 'perfect'.

PeterThorne said...

In terms of the process of temperature analysis you are broadly correct. However, in terms of energy, the near-surface atmosphere is an insignificant extra. The real energy in the climate system is in the ocean (90%+) and the cryosphere / land. Box 3.1 in IPCC AR5 WG1 is a good piece on the total energetics.

PeterThorne said...

R Graf,

I'm on the road at present but believe that Matthews papers were cited in Huang et al.

In terms of your question about the archived data (sorry if blogger doesn't like ftp links).

Paul Matthews said...

On the question of instability, it is interesting that you acknowledge a 'degree of flutter'. Do you know where this is discussed in the literature or technical reports? It does not seem to be mentioned in the technical reports at the GHCN site.
In the case of one station, Alice Springs, the amplitude of the flutter in past temperatures as reported in the adjusted GHCN data during 2012 is 3 or 4 degrees C, which appears to support the statement made by Bates.

Paul Matthews said...

In this blog post you say that the software is available.
But the link provided is to Fortran code dating from 2012.
This is not the code currently used. The Fortran code was re-written in Python in 2012. According to one of the technical documents dating from 2012, "This was part of a larger effort of the Climate Code Foundation to improve accessibility to climate related software."
My question is, where is this current Python version of the PHA code?

R Graf said...

Peter, Matthews is not cited in Huang (2015). I am wondering if you or any of the other authors of Huang (2015) considered re-adjusting the ERI and bucket temps directly instead of using HadNMAT2. Also, did you make a diurnal temperature range adjustment for the century trend in DTR? As you realize, nighttime air temps are a proxy only for Tmin, not Tavg.

PeterThorne said...

Rgraf and Paul Matthews,

Sorry, those comments were in spam and I didn't check.

R Graf: We did consider direct adjustment but chose to retain use of NMAT. The rationale was that there is power in looking at the same problem in distinct manners. The Hadley group were already undertaking an approach similar to the one you describe, and we wanted to ensure the methodological degrees of freedom were spanned. If we don't explore the methodological approach space, we pretty much guarantee underestimating the true uncertainty.

Paul, the effort by CCF was part of the Google Summer of Code. It never completed, so there is no final Python version. However, the PHA code has been refactored in Fortran and made considerably more user-friendly. My understanding is that this refactored code shall be released in coming weeks or months.

With regard to the flutter: the flutter of daily updates is implicit in the updates protocol in the Lawrimore et al. paper, but I can't remember whether it was explicitly discussed. The effects will be greatest for remote stations, where 'marginal' breaks are of larger magnitude. It's a neighbour-based algorithm, and breakpoint identification is a function of the difference-series sigma. In a densely sampled region that sigma is small, so the marginal (p c. 0.05) breaks are small. Where the region is very sparsely sampled (e.g. Outback Australia), sigma is very large, so even 3 or 4 C breaks become marginal. Adding little bits of new data can change whether the algorithm decides a break exists. It is a feature rather than a bug per se. My understanding is that GHCNv4 will only make periodic POR reassessments and append new data regularly, which will de facto remove the flutter. But I am no longer directly involved in GHCN development, so this is simply what I understand on the matter.
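The sigma dependence described above can be sketched numerically. This uses an illustrative threshold rule of my own (a step is flagged when it exceeds k sigma of the difference series); the real PHA test statistic differs, but the scaling behaviour is the point:

```python
# Illustrative only: flag a step when it stands out against the noise
# (sigma) of the candidate-minus-neighbour difference series. The k = 2
# threshold is invented for this sketch, not taken from the PHA.

K = 2.0  # detection threshold, in sigma units

def break_flagged(step_c, diff_sigma_c, k=K):
    """Is a step of the given size (C) detectable against the noise?"""
    return abs(step_c) > k * diff_sigma_c

# Dense network: neighbours agree closely, sigma is small, so even
# modest breaks are clearly detected.
assert break_flagged(0.5, 0.1)

# Sparse network (e.g. Outback Australia): sigma is large, so even a
# 3-4 C break sits at the margin of detectability.
assert not break_flagged(3.0, 2.0)

# 'Flutter': a little new data nudges the sigma estimate, flipping a
# marginal break between detected and not detected on successive runs.
assert break_flagged(3.9, 1.9) and not break_flagged(3.9, 2.0)
```

This is why the run-to-run variation concentrates at remote stations and why it is a property of the detection problem rather than a software bug.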

Post a Comment

Please note all comments are moderated. Comments containing profanities, unwarranted accusations, deemed off-topic etc. shall be deleted. There may be a substantial delay between posting a comment and its acceptance owing to moderator availability.