BEST Data "Quality" Problems
Source: Climate Audit
CA reader Gary Wescom writes about more data quality problems with the Berkeley temperature study (see here).
In a surprising number of records, the "seasonally adjusted" station data in the Berkeley archive contain wildly incorrect values. Gary shows a number of cases. One of them, Longmont 2ESE, outside the nest of climate scientists in Boulder, CO, is said to have temperatures below minus 50 deg C in the late fall, as shown below:
This is not an isolated incident. Gary reports:
Of the 39028 sites listed in the data.txt file, arbitrarily counting only sites with 60 months of data or more, 34 had temperature blips of greater than +/- 50 degrees C, 215 greater than +/- 40 C, 592 greater than +/- 30 C, and 1404 greater than +/- 20 C. That is quite a large number of faulty temperature records, considering that this kind of error is something that is so easy to check for. A couple hours work is all it took to find these numbers.
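Gary does not spell out his exact method, so what follows is only a minimal sketch of the kind of check he describes. It assumes that a "blip" is any monthly value whose departure from that site's own median exceeds the threshold. It also assumes that the counts are cumulative, so a site over 50 C is also counted under 40, 30 and 20. That reading seems to fit his nested thresholds, but it is an inference. The function name and site ids are hypothetical, and parsing of data.txt is left out because its layout isn't described here.

```python
from statistics import median

def count_blip_sites(sites, thresholds=(50, 40, 30, 20), min_months=60):
    """Count sites whose largest departure from their own median exceeds each threshold.

    sites: dict mapping a (hypothetical) site id to monthly values in deg C, None = missing.
    Counts are cumulative: a site over 50 C is also counted under 40, 30 and 20.
    """
    counts = {t: 0 for t in thresholds}
    for values in sites.values():
        valid = [v for v in values if v is not None]
        if len(valid) < min_months:  # skip short records, as in Gary's count
            continue
        centre = median(valid)
        peak = max(abs(v - centre) for v in valid)
        for t in thresholds:
            if peak > t:
                counts[t] += 1
    return counts

# Hypothetical records, not BEST data.
sites = {
    "site_a": [10.0] * 71 + [-55.0],  # one blip 65 C below the median
    "site_b": [10.0] * 71 + [35.0],   # one blip 25 C above the median
    "site_c": [10.0] * 29 + [-60.0],  # only 30 months, so skipped
}
print(count_blip_sites(sites))  # {50: 1, 40: 1, 30: 1, 20: 2}
```

Using the median rather than the mean keeps a single wild value from dragging the reference level toward itself.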
In the engineering world, this kind of error is not acceptable. It is an indication of poor quality control. Statistical algorithms were run on the data without subsequent checks on the results. Coding errors obviously existed that would have been caught with just a cursory examination of a few site temperature plots. That the BEST team felt the quality of their work, though preliminary, was adequate for public display is disconcerting.
Gary also observed a strange ringing problem in the data.
I observed earlier that, for several stations that I looked at (with less exotic results), I had been unable to replicate the implied calculation of monthly anomalies that occurs somewhere in the BEST algorithm. Because the problems are always in the same month, it seems likely that there is some sort of error in the BEST algorithm for calculating monthly anomalies. When I looked at this previously, I couldn't see where the problem occurred. (There isn't any master script or road map to the code, and I wasn't sufficiently interested in the issue to try to figure out where their problem occurred. That should be their responsibility.)
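For readers who want to try the replication themselves, here is a minimal sketch of the conventional way to compute monthly anomalies. It subtracts each calendar month's mean (its climatology) and then tallies which calendar month the large anomalies fall in. This is a standard textbook approach, not BEST's code. The function names, the 20 C threshold and the sample station are hypothetical.

```python
from collections import Counter
from statistics import mean

def monthly_anomalies(values, start_month=1):
    """Subtract each calendar month's mean (its climatology) from that month's values.

    values: monthly means in time order, None for missing; start_month: 1-12 for values[0].
    """
    months = [(start_month - 1 + i) % 12 for i in range(len(values))]
    clim = {}
    for m in range(12):
        obs = [v for v, mm in zip(values, months) if mm == m and v is not None]
        clim[m] = mean(obs) if obs else None
    return [None if v is None or clim[m] is None else v - clim[m]
            for v, m in zip(values, months)]

def outlier_months(anoms, start_month=1, threshold=20.0):
    """Tally the calendar month (1-12) of every anomaly larger than threshold."""
    months = [(start_month - 1 + i) % 12 + 1 for i in range(len(anoms))]
    return Counter(m for a, m in zip(anoms, months)
                   if a is not None and abs(a) > threshold)

# Hypothetical station: five identical years, then one bad November value.
cycle = [-5, -3, 2, 8, 14, 19, 22, 21, 16, 9, 2, -3]
values = cycle * 5
values[34] = -50  # November of year 3
anoms = monthly_anomalies(values)
print(outlier_months(anoms))  # Counter({11: 1})
```

Note that the one bad value also pulls the November climatology down to -8.4, which shifts every other November anomaly in this toy series up by 10.4. A per-month median would be a more robust choice of climatology.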
Even though GHCN data is the common building block of all the major temperature indices, its station location information is inaccurate. Peter O'Neill (see here) has spot-checked a number of stations, locating numerous stations that are nowhere near their GHCN locations. Peter has notified GHCN of many of these errors. However, with the stubbornness that is all too typical of the climate "community", GHCN's most recent edition (Aug 2011) perpetuated the location errors (see Peter's account here).
Unfortunately, BEST has used GHCN location data directly, apparently without any due diligence of its own on these locations, even though this has been a known problem area. In a number of cases, the incorrect locations will be classified as "very rural" under MODIS. For example, the incorrect locations of the Cherbourg stations (in the English Channel) and of Limassol (in the Mediterranean) will obviously not be classified as urban. In a number of cases that I looked at, BEST had duplicate versions of stations with incorrect GHCN locations. Where the incorrect location was classified differently from the correct location, essentially the same data would be classified as both rural and urban.
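A check for the rural/urban inconsistency described above could look like the following minimal sketch. It assumes that duplicate versions share an aligned monthly time axis and that each carries a single classification label. The labels, ids, tolerance and minimum overlap here are all hypothetical. They are not actual MODIS output or BEST station records.

```python
def same_data(a, b, tol=0.1, min_overlap=24):
    """True if two aligned monthly series agree within tol on enough shared months."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return len(pairs) >= min_overlap and all(abs(x - y) <= tol for x, y in pairs)

def label_conflicts(stations, **kwargs):
    """Return id pairs that carry essentially the same data but different labels."""
    conflicts = []
    for i, s in enumerate(stations):
        for t in stations[i + 1:]:
            if s["label"] != t["label"] and same_data(s["series"], t["series"], **kwargs):
                conflicts.append((s["id"], t["id"]))
    return conflicts

# Hypothetical station versions: two copies of one record, plus an unrelated station.
base = [float(k % 12) for k in range(36)]
stations = [
    {"id": "dup_a", "label": "very rural", "series": base},
    {"id": "dup_b", "label": "urban", "series": list(base)},
    {"id": "other", "label": "urban", "series": [x + 5.0 for x in base]},
]
print(label_conflicts(stations))  # [('dup_a', 'dup_b')]
```

The pairwise loop is quadratic. That is fine for spot checks, but a full archive would need the candidates grouped first, for example by station name.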
I haven't parsed the BEST station details, but I did look up some of the erroneous locations already noted by Peter, and I report here on the first few that I looked at.
Peter observed that Kzyl-Orda, Kazakhstan has a GHCN location of 49.82N 65.50E, over 5 degrees from its true location near 44.71N 65.69E. BEST station 148338 (Kzyl-Orda) is also at the GHCN location of 49.82N 65.50E. Other versions (124613 and 146861) are at 44.8233N 65.530E and 44.8000N 65.500E.
Peter observed that Isola Gorgona, Italy has a GHCN location of 42.40N 9.90E, more than one degree away from its true location of 43.43N 9.91E. BEST station 148309 (ISOLA GORGONA) has the incorrect GHCN location of 42.4N 9.9E.
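To put the coordinate errors above in kilometres, here is a minimal sketch using the haversine great-circle formula on a spherical Earth. The coordinates are the ones quoted above. The 6371 km mean radius is a standard approximation, and the helper name is hypothetical. Computed this way, the Kzyl-Orda offset comes to roughly 570 km and the Isola Gorgona offset to roughly 115 km.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# GHCN location vs true location, coordinates as quoted in the text.
kzyl_orda_km = haversine_km(49.82, 65.50, 44.71, 65.69)
gorgona_km = haversine_km(42.40, 9.90, 43.43, 9.91)

# Largest separation among the three BEST versions of Kzyl-Orda.
versions = {"148338": (49.82, 65.50), "124613": (44.8233, 65.530), "146861": (44.8000, 65.500)}
spread_km = max(haversine_km(*p, *q) for p, q in combinations(versions.values(), 2))

print(f"Kzyl-Orda offset {kzyl_orda_km:.0f} km, Isola Gorgona offset {gorgona_km:.0f} km")
print(f"Kzyl-Orda duplicate spread {spread_km:.0f} km")
```

A flag on any station whose duplicate versions sit more than a few kilometres apart would catch cases like Kzyl-Orda without needing to know the true location in advance.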
The same sort of errors can be observed in virtually all the stations in Peter's listing.
I realize that the climate community is pretty stubborn about this sort of thing. (Early CA readers will recall that "the rain in Maine falls mainly in the Seine", an error stubbornly repeated in Mann et al 2007.) While BEST should have been alert to this sort of known problem, it's hardly unreasonable for them to presume that GHCN had done some sort of quality control on station locations during the past 20 years. In fact, however, this was presuming too much.
These errors will affect the BEST urbanization paper (the size of the effect is not known at present).