The University of East Anglia used to proudly host the Climatic Research Unit (CRU). Now, perhaps, it wishes that its well-heeled yet skeptical alumni didn’t know that this particular academic institution existed on its campus. The CRU claimed to have the most comprehensive and well-ordered archive of climate data in the world. According to Lorrie Goldstein of the Toronto Sun, they have an archive that holds a certain volume of climate data.

The CRU assigned a programmer to archive and organize this sprawling agglomeration of data. They took this step so that they could retroactively validate claims to excellence that their professoriate had already made. This was not only a moral shortcoming on the university's part; it was also a tactical error.

One of the sad ironies of both engineering and mathematics is that figuring out and organizing what you already have can be just as tough as discovering something new. Problems of this ilk, such as data normalization and system identification, can give very competent practitioners of the scientific black arts splitting migraine headaches.

This programmer labored in the sterile vineyards of the CRU data warehouse from 2006 through this year. His methodical notes and working papers were collected into the standard read_me.txt file that all reasonable programmers include in the deliverable for any large project. His 274-page opus, HARRY_READ_ME.txt, reads like a Dean Koontz novel.

The now-infamous programmer’s journal chronicles the man’s descent into ethical Gehenna. The programmer first realizes that his assignment is, as academics put it, “non-trivial.” Harry’s frustration shows in the excerpts below.

– “But what are all those monthly files? DON’T KNOW, UNDOCUMENTED. Wherever I look, there are data files, no info about what they are other than their names. And that’s useless …” (17)

– “It’s botch after botch after botch.” (18)

– “Am I the first person to attempt to get the CRU databases in working order?!!” (47)

He then makes the shocking realization that a lot of this data is either counterfactual or, even worse, made up from scratch.

– “COBAR AIRPORT AWS (data from an Australian weather station) cannot start in 1962, it didn’t open until 1993!” (71)

Finally, having realized that the data populating this database was not representative, he was forced to make a choice. He could have blown the whistle and declared this database fraudulent. He could also have played along and continued receiving remuneration for his efforts. Judging from the quotes below, he became another working girl in the house of academic ill repute that was the University of East Anglia's Climatic Research Unit. Here are some examples of Harry just going along to get along.

“What the hell is supposed to happen here? Oh yeah — there is no ‘supposed,’ I can make it up. So I have : – )” (98)

– “You can’t imagine what this has cost me — to actually allow the operator to assign false WMO (World Meteorological Organization) codes!! But what else is there in such situations? Especially when dealing with a ‘Master’ database of dubious provenance …” (98)

– “So with a somewhat cynical shrug, I added the nuclear option — to match every WMO possible, and turn the rest into new stations … In other words what CRU usually do. It will allow bad databases to pass unnoticed, and good databases to become bad …” (98-9)

And so it goes in modern academia. Like the corrupt, self-serving police officers in the novel L.A. Confidential, people are forced to bend or break. The hard choices imposed by the ethical squalor among the leadership of a major academic enterprise drove otherwise decent people into a similar moral decline.

Harry had to choose between his own soul and the continued success of his team. Yet even in the end, after he had sacrificed his integrity, reality still wouldn’t budge a single degree Fahrenheit. As he continued to try to bring order to dishonesty, the sheer intractability of the project seemed to overwhelm and demoralize Harry.

– “OH F— THIS. It’s Sunday evening, I’ve worked all weekend, and just when I thought it was done, I’m hitting yet another problem that’s based on the hopeless state of our databases.” (241).

– “This whole project is SUCH A MESS …” (266)

All of that for a graduate stipend, or maybe a postdoc… Things like this are why I enraged most of the blogosphere by demanding that science be put under a regulatory regime similar to Sarbanes-Oxley.