Squaring the Culture

"...and I will make justice the plumb line, and righteousness the level;
then hail will sweep away the refuge of lies,
and the waters will overflow the secret place."
Isaiah 28:17

01/29/2010 (5:24 pm)

…and We Can't Trust the Raw Climate Data…

Part of the continuing fallout from ClimateGate is the awareness that virtually all recent climate research depends on a data set of varying quality that has not been examined by scientists outside of the agency handling it, and over which global warming alarmists appear to have complete control. Researchers are responding by paying much closer attention to the quality of the data — and they’re not happy about what they’re finding.

Anthony Watts and Joe D’Aleo, two retired meteorologists, have teamed up to produce a report noting serious problems with the base climate data. This report goes far beyond Watts’s original work on siting problems with weather stations in the US, although that work is part of the current publication. Watts and D’Aleo were aided by some hard computer work produced by one E. Michael Smith, an Information Technology specialist. D’Aleo is the Executive Director of ICECAP, that useful source of thankfully-not-addled-by-progressive-talking-points climate science. Smith blogs at Musings from the ChiefIO.

What follows is an explanation of one of the chief findings from recent inquiries into the quality of the raw data on which climate research is based. I borrowed a great deal of analysis from Smith’s article from 1/8/2010, entitled “The Bolivia Effect,” and from Marc Sheppard’s article, “Climategate: CRU Was But the Tip of the Iceberg,” cross-posted at the American Thinker and Watts Up With That.

First of all, it is crucial to note that there exists basically one data set containing historical temperature data from stations on the ground: the Global Historical Climatology Network (GHCN), compiled by the National Oceanic and Atmospheric Administration (NOAA), a US federal agency. There are three independent organizations that produce adjusted temperature histories, but they’re all based on more or less the same data, and that data is from GHCN. The three major adjusted histories are named after the places producing them: the Climatic Research Unit at the University of East Anglia (CRU), NASA’s Goddard Institute for Space Studies (GISS), and the National Climatic Data Center (NCDC), also run by NOAA. Two more organizations produce climate histories from satellite readings: Remote Sensing Systems (RSS), a private company supported by NASA, and the University of Alabama in Huntsville (UAH). Satellite data has only been available since 1979, however, so the temperature history from the sky is all recent.

So, to keep the alphabet soup straight: virtually all ground data come from GHCN, which is run by NOAA. CRU, GISS, and NCDC all produce ground-based temperature histories using GHCN data. RSS and UAH produce recent temperature histories from satellites. Got it? (Sheesh. Scrabble, anyone?)

The GHCN data is dirty, having been accumulated over many decades from literally hundreds of collections. It begins in 1701 with a single station in Germany. Additional stations appear gradually, until by the mid-20th century the world is crowded with reporting stations, as many as 6,000 of them. Then, suddenly, around 1990 the number of stations drops off dramatically. Also, individual stations come and go; some stations only report for 10 or 20 years, others skip particular months. None of the temperature records span the entire time period of the database. Lots of months are missing.

Here’s a graph showing the number of stations in the GHCN over time, taken from NOAA’s overview of the GHCN data. The solid line is the number of stations reporting average daily temperatures; the dotted line is the number of stations reporting two temperatures daily, the minimum and the maximum temp:

[Graph: number of GHCN stations over time]

NOAA explains the drop-off in the number of stations as follows:

The reasons why the number of stations in GHCN drop off in recent years are because some of GHCN’s source datasets are retroactive data compilations (e.g., World Weather Records) and other data sources were created or exchanged years ago.

Say what?

That’s not crystal-clear prose, but as near as I can figure, this means somebody had to go back through historical records to reconstruct some of the individual data series, and they’re not finished reconstructing records gathered since 1990. I find that incredibly hard to believe; there’s been somewhere on the order of $50 billion spent by governments studying climate change since 1990. Surely resources have been available to gather data accurately since 1990. And, as we’ll see later in the discussion, it appears as though somebody has gone through the record of individual stations deleting particular stations, and deleting them within a predictable pattern. So, I’m not buying that the drop-off is due to a lag in reconstructing records.

All of us who have been following the climate change debate for a while have seen world graphs from NASA that look like this:


The map purports to show where the globe is getting warmer, where it’s getting cooler, and where it’s staying the same. For most of the last 20 years, these maps have shown that the planet seems to be heating, especially as we approach the North Pole. As we will see, however, that’s not exactly what it’s showing us. This will be a lesson in why it is important to know not only what a chart says, but where the data came from, and how the chart was constructed.

These graphs are produced by NASA from GISS-adjusted data, which (as I just said) is one of three adjusted data sets based on GHCN. They’re apparently generated by a piece of FORTRAN code called GIStemp, which was (finally!) released to the public for inspection in 2007. GIStemp takes a Mercator Projection map of the earth (notice that the poles are drawn as wide as the equator, so the sizes of things get more distorted the farther you get from the equator) and breaks it up into grid boxes of roughly 5° latitude by 5° longitude which, if you do the math, would mean 36 × 72 = 2,592 separate grid boxes on the map. Smith says there are 8,000, so the boxes must be considerably smaller than 5°. Then it calculates two numbers: 1) the temperature in each box today; and 2) the temperature in each box at some point in the past, which is called the “baseline.” It subtracts the baseline temperature from today’s temperature. If the result is positive, it gets displayed in shades of red, darker reds showing greater differences. If the result is negative, it gets displayed in blue, again with darker shades for greater differences. Thus, the world shows up red where it’s getting hotter, and blue where it’s getting colder, if the numbers are meaningful.
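To check the box arithmetic, here’s a small Python sketch. It assumes a plain latitude/longitude grid, and for the 8,000-box figure it assumes the boxes are equal in area, which is my assumption rather than something confirmed here. The color rule is just the sign test described above, without the shading. None of this is GIStemp’s actual code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of the earth


def grid_box_count(box_deg: float) -> int:
    """Number of cells in a regular box_deg x box_deg latitude/longitude grid."""
    n_lat = round(180 / box_deg)  # bands from pole to pole
    n_lon = round(360 / box_deg)  # columns around a full circle of longitude
    return n_lat * n_lon


def equal_area_box_side_km(n_boxes: int) -> float:
    """Side of a square with the same area as one of n_boxes equal-area boxes (assumed)."""
    surface_km2 = 4 * math.pi * EARTH_RADIUS_KM ** 2
    return math.sqrt(surface_km2 / n_boxes)


def anomaly_color(current: float, baseline: float) -> str:
    """Red if warmer than the baseline, blue if cooler; 'white' for no change is an assumption."""
    anomaly = current - baseline
    if anomaly > 0:
        return "red"
    if anomaly < 0:
        return "blue"
    return "white"


print(grid_box_count(5))                    # 2592
print(round(equal_area_box_side_km(8000)))  # 253
```

Run it and a regular 5° grid gives 2,592 boxes, while 8,000 equal-area boxes would each be roughly 250 km on a side.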

The baseline could have been anything the programmers chose, depending on what they wanted to show. In this case, what they wanted to show was a general temperature trend of the planet over a few decades, so they picked a series of years not too far into the past, 1951-1980, and made the program calculate the average temperature for each grid box in those years. It uses an average because the temperature tends to rise and fall dramatically (as we all notice every day), and averaging over a group of years gets rid of these normal variations which statisticians would call “noise.” The result is a graph of what they call “anomalies”: temperatures getting hotter, colder, or staying the same when compared to the average temperature from 1951-1980.
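Here’s the baseline idea as a minimal Python sketch, assuming one average temperature per year for a single grid box. The numbers are hypothetical: the baseline readings bounce up and down by half a degree from year to year, and averaging over 1951-1980 smooths that noise away before the comparison is made.

```python
from statistics import mean

BASELINE_YEARS = range(1951, 1981)  # 1951 through 1980 inclusive


def baseline_mean(annual_temps: dict) -> float:
    """Average of the readings that fall inside the baseline period."""
    in_baseline = [t for year, t in annual_temps.items() if year in BASELINE_YEARS]
    if not in_baseline:
        raise ValueError("no readings in the baseline period")
    return mean(in_baseline)


def anomaly(annual_temps: dict, year: int) -> float:
    """Difference between one year's reading and the baseline average."""
    return annual_temps[year] - baseline_mean(annual_temps)


# Hypothetical grid box: alternating 9.5 C and 10.5 C during the baseline, 10.5 C in 2009.
box = {year: (9.5 if year % 2 else 10.5) for year in BASELINE_YEARS}
box[2009] = 10.5
print(anomaly(box, 2009))  # 0.5
```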

What does the program do if there’s no thermometer anywhere in a grid box?

What it does is attempt to figure the temperature from nearby grid boxes where there are actual temperature readings. It picks up to 5 grid boxes and tries to get a temperature from them, averaging them with (I think) a weight for distance. The maximum distance from which it can borrow a temperature is 1200 km, or about 750 miles.
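Here’s a rough sketch of that borrowing step in Python. The 1200 km limit and the cap of five sources come from the description above; the linear weighting, which falls from 1 at zero distance to 0 at 1200 km, is my own assumption standing in for “a weight for distance,” and the station list is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_DISTANCE_KM = 1200.0  # about 750 miles
MAX_SOURCES = 5


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def borrowed_value(box_lat, box_lon, stations):
    """Distance-weighted value for an empty box; stations is a list of (lat, lon, value).

    Returns None when no station lies within MAX_DISTANCE_KM.
    """
    nearby = []
    for lat, lon, value in stations:
        d = distance_km(box_lat, box_lon, lat, lon)
        if d < MAX_DISTANCE_KM:
            nearby.append((d, value))
    nearby.sort()                   # nearest first
    nearby = nearby[:MAX_SOURCES]   # keep at most five sources
    if not nearby:
        return None
    weights = [1 - d / MAX_DISTANCE_KM for d, _ in nearby]  # assumed linear taper
    return sum(w * v for w, (_, v) in zip(weights, nearby)) / sum(weights)


# Hypothetical: one station in the box itself, one about 556 km away.
print(round(borrowed_value(0, 0, [(0, 0, 10.0), (0, 5, 20.0)]), 1))  # 13.5
```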

Immediately, I see two problems. The first is that 750 miles can produce a dramatic difference in altitude or regional climate. If you’ve got a thermometer at Los Angeles International Airport, and you want the temperature for a grid box located in the High Sierra, borrowing the temp from the airport is not going to give you a reasonable reading. It becomes important to understand where the thermometers are placed.
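To put a rough number on the altitude problem, here’s a Python sketch using the textbook average lapse rate of about 6.5 °C per kilometer of elevation. The elevations are hypothetical stand-ins for an airport near sea level and a High Sierra grid box, and real mountain weather can depart a lot from that average.

```python
STANDARD_LAPSE_RATE_C_PER_KM = 6.5  # textbook average; real rates vary with weather and season


def temp_at_elevation(temp_c: float, from_elev_m: float, to_elev_m: float,
                      lapse_c_per_km: float = STANDARD_LAPSE_RATE_C_PER_KM) -> float:
    """Estimate the temperature at to_elev_m from a reading taken at from_elev_m."""
    return temp_c - lapse_c_per_km * (to_elev_m - from_elev_m) / 1000.0


# Hypothetical: a 20 C reading at 40 m carried up to a 3,000 m mountain box.
print(round(temp_at_elevation(20.0, 40.0, 3000.0), 2))  # 0.76
```

That’s a difference of roughly 19 degrees Celsius from altitude alone, before regional climate even enters the picture.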

The second problem is even larger: what happens if there was a thermometer in the grid box during the baseline period, but not today, or vice versa? If I were programming this and found that condition, I’d try to recalculate the baseline using the same stations that were used to calculate the current temperature; but that may not be possible, because the stations come and go over time.

What the GIStemp programmers did was ignore the condition. They calculate the baseline using thermometers available during that period, and calculate today’s temperature using thermometers available now. And this is where the problem of the disappearing stations comes in.
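Here’s a toy model of the difference in Python, assuming one grid box with two hypothetical stations whose temperatures never change at all: a cold mountain station that stops reporting after 1990 and a warm beach station that keeps going. It follows the procedure as described in this post, not GIStemp’s actual code. The first function mixes station sets the way described above; the second uses only stations that report in both periods, as I suggested.

```python
from statistics import mean

BASELINE = range(1951, 1981)  # 1951-1980

# Hypothetical yearly readings (deg C). Neither station warms or cools.
mountain = {year: 2.0 for year in range(1951, 1991)}  # stops reporting after 1990
beach = {year: 16.0 for year in range(1951, 2010)}    # still reporting


def box_average(stations, years):
    """Average of every reading from any station reporting in the given years."""
    return mean(s[y] for s in stations for y in years if y in s)


def naive_anomaly(stations, year):
    """Baseline from whoever reported then; current value from whoever reports now."""
    return box_average(stations, [year]) - box_average(stations, BASELINE)


def common_station_anomaly(stations, year):
    """Use only stations that report in both the baseline period and the target year."""
    common = [s for s in stations if year in s and any(y in s for y in BASELINE)]
    return box_average(common, [year]) - box_average(common, BASELINE)


print(naive_anomaly([mountain, beach], 2009))           # 7.0
print(common_station_anomaly([mountain, beach], 2009))  # 0.0
```

Neither station warmed a bit, yet mixing station sets produces a 7-degree “anomaly,” while the common-station version correctly shows zero. If no station spans both periods, the common-station version has nothing to average (Python’s mean raises an error on empty data), which is the coming-and-going problem I mentioned.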

You see, E. Michael Smith, the programmer who’s evaluating the GHCN database, says there’s a pattern to which stations have disappeared from the data set, according to Marc Sheppard at the American Thinker:

Perhaps the key point discovered by Smith was that by 1990, NOAA had deleted from its datasets all but 1,500 of the 6,000 thermometers in service around the globe.

Now, 75% represents quite a drop in sampling population… Yet as disturbing as the number of dropped stations was, it is the nature of NOAA’s “selection bias” that Smith found infinitely more troubling.

It seems that stations placed in historically cooler, rural areas of higher latitude and elevation were scrapped from the data series in favor of more urban locales at lower latitudes and elevations. Consequently, post-1990 readings have been biased to the warm side not only by selective geographic location, but also by the anthropogenic heating influence of a phenomenon known as the Urban Heat Island Effect (UHI).

For example, Canada’s reporting stations dropped from 496 in 1989 to 44 in 1991, with the percentage of stations at lower elevations tripling while the numbers of those at higher elevations dropped to one. That’s right: As Smith wrote in his blog, they left “one thermometer for everything north of LAT 65.” And that one resides in a place called Eureka, which has been described as “The Garden Spot of the Arctic” due to its unusually moderate summers.

Smith also discovered that in California, only four stations remain – one in San Francisco and three in Southern L.A. near the beach…
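The percentages in that passage are easy to check. Here’s a trivial Python helper applied to the station counts quoted above; it shows Canada’s loss was even steeper than the global 75%.

```python
def percent_drop(before: int, after: int) -> float:
    """Percentage decrease from before to after."""
    return 100.0 * (before - after) / before


print(percent_drop(6000, 1500))         # 75.0  (global stations)
print(round(percent_drop(496, 44), 1))  # 91.1  (Canada, 1989 to 1991)
```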

The baseline is 1951-1980. The Big Deletion occurred around 1990. So all around the world, baselines are being calculated using data from actual thermometers, and then the current temperature is being computed from temperatures borrowed from stations as far as 750 miles away: stations apparently selected for their warmth.

Take our hypothetical grid box in the High Sierra. If there was a thermometer there in the 1950s, but not today, the baseline was measured in the mountains, and the current temperature is being measured at the beach. Of course there will be a warming trend, if the deleted stations are disproportionately from high elevations and high latitudes.

Smith has evaluated every major area of the globe, and they all show the same pattern: there were real thermometers reporting from the cooler and higher areas during the baseline period, but today’s temperature is being borrowed from stations in warmer areas.

So, with that knowledge in hand, let’s look at the anomaly map again. Just below the anomaly map, I’ve inserted a GIStemp output produced by Smith showing where the actual measuring stations are located today. The second map is from a year and a half earlier, but I’m not including it to compare temperatures, just to show which readings are real and which are borrowed:


[Map: GIStemp output by E. Michael Smith showing where the actual measuring stations are located]

Do you see the large red spot over the middle of South America in the full-map GIStemp display, showing extreme warming? It’s the result of deleting stations that were located in the mountains of Bolivia and substituting stations on the coast or in the Amazon.

Do you see the extreme warming around the Arctic Circle? It’s largely the result of losing temperature stations around the Arctic Circle, and replacing them with proxies located far south of there.

This is only one of the problems reported by Watts and D’Aleo. They also complained about the pre-analysis adjustment of raw data from GHCN which is carried out by each of the major reporters (CRU, GISS, and NCDC, remember?) before they plot their data. There’s a certain amount of adjustment that’s required, of course; they need to filter out differences due to different altitudes, measurements taken at different times of day, or proximity to cities which get hotter than the surrounding countryside, and they need to fill in missing months. They also need to adjust for known station events, like differences caused by switching the type of instrument used to record the temperature, or caused by moving the station.
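For readers wondering what an adjustment for a station move looks like, here’s a deliberately simplified Python sketch. It is hypothetical and is not the method used by CRU, GISS, or NCDC: it assumes the move date is known and shifts every reading before the move by the difference between the averages of a few years on either side of it.

```python
from statistics import mean


def adjust_for_move(series, move_index, window=5):
    """Shift readings before a known station move so the two segments line up.

    Hypothetical step adjustment: compare the mean of up to `window` readings on
    each side of the move and add the difference to the earlier segment.
    Requires at least one reading on each side of move_index.
    """
    before = series[max(0, move_index - window):move_index]
    after = series[move_index:move_index + window]
    offset = mean(after) - mean(before)
    return [x + offset for x in series[:move_index]] + list(series[move_index:])


# Hypothetical: a station reads 2 C warmer after moving in year 5.
print(adjust_for_move([10.0] * 5 + [12.0] * 5, 5))
```

Note that this shifts the whole early record, so the choice of method directly changes the long-term trend; a genuine climate shift that happened to coincide with the move would be erased along with the artifact.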

The problem is, nobody has peer-reviewed the adjustments, and virtually all the adjustments go in the same direction. Global temperature is falling, if you go by the raw data, but it’s rising steadily after adjustments. There are plausible-sounding reasons for the adjustments, but in the wake of the revelations of partisan animus in ClimateGate, can we take that for granted? And since the standard advanced by the climate alarmists themselves dismisses anything not peer-reviewed (as is the norm for scientific pursuits), why should we take it for granted?
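Here’s how adjustments can reverse a trend, as a Python sketch with made-up numbers (not real GHCN data). It fits an ordinary least-squares trend line to a slightly cooling raw series and to the same series after adding adjustments that grow over time.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical series, not real GHCN values.
years = list(range(2000, 2010))
raw = [15.0 - 0.01 * i for i in range(10)]   # cooling 0.01 C per year
adjustments = [0.03 * i for i in range(10)]  # adjustments growing 0.03 C per year
adjusted = [r + a for r, a in zip(raw, adjustments)]

print(round(ols_slope(years, raw), 3))       # -0.01
print(round(ols_slope(years, adjusted), 3))  # 0.02
```

Because a least-squares slope is linear, the adjusted trend is simply the raw trend plus the trend of the adjustments themselves, which is why the direction of the adjustments matters so much.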

Watts and D’Aleo also complain about the lack of standards for handling the data, the improper siting of temperature-recording stations, the number of stations located near airports, changes in adjustments that fail to account for Urban Heat Island effects (those warm urban areas again), problems in the way data are merged from separate stations, changes in the methods of taking ocean surface temperatures, and several other issues. The sum of their analysis is simply, “the data are a mess.”

Is the globe actually warming? It’s really impossible to say. The presentation of the data has made it seem that way, but it’s not a valid measurement. Bad data means bad results, always. The solution is to start afresh, but with lots of public eyes watching the process this time. After ClimateGate, and after the IPCC debacle, no results produced so far are solid.



January 30, 2010 @ 3:22 pm #

Another great explanation. Thanks.

April 1, 2010 @ 10:43 am #

[…] which we now know to be unreliable as well. And Horner points out that Watts and D’Aleo recently published a report giving reason to doubt the accuracy of NCDC’s data […]
