Anyone who can read German can read the first book ever printed. If you can read Sumerian cuneiform, you can read clay tablets that were probably the first things ever written. Hard copy sticks around, equally a delight to scholars and a burden to office managers. More significantly, the read-out system - the human eye, hand, and brain for which the scribes of Sumer scratched in clay - has not changed appreciably, which is why nearly every written thing that survives, from the dawn of writing to yesterday's newspaper, is still accessible and constitutes a fragment of civilization.
But not all modern means of storing data share this eternal readability. The problem first appeared in the pre-electronic age, with the invention of sound recording: signals were embodied in an object that required a specific machine to render them back into a form the senses could apprehend. In those days, there were dozens of incompatible recording formats. The 10-inch, 78-rpm shellac platter ultimately won out, but not before the losers had produced a substantial body of recorded material, some of it irreplaceable.
Serious audiophiles constructed customized machines that could play anything from Edison cylinders to the various platter formats, including center-start discs that played from the axis out to the circumference as well as the standard outside-in disk. When LPs were introduced, turntable manufacturers included variable-speed switches, so you could play your old 78s as well as the new "albums."
That was a slower era, of course, when decades passed between one standard and the next. But the advent of digital computing in the early '50s vastly accelerated the pace at which we replace formats designed to store information. With computers gaining an order of magnitude in speed every two or three years while steadily dropping in cost, the pressure to dump the old, less efficient standards was irresistible.
Obviously, much of the data stored on the old systems - the material of immediate or archival value to the organization doing the replacement - is recorded in the new format and lives on. But a lot of it doesn't.
Digital archaeology is a discipline that doesn't quite exist yet, but may develop to deal with this problem, which is pervasive in the world of data.
NASA, for example, has huge quantities of information sent back from space missions in the 1960s stored on deteriorating magnetic tape, in formats designed for computers that were obsolete twenty years ago. NASA didn't have the funds to transfer the data before the machines became junk. The National Center for Atmospheric Research has "thousands of terabits" of data on aging media that will probably never be updated, because copying it all would take a century. The archival tapes of Doug Engelbart's Augment project - an important part of the history of computing - are decaying in a St. Louis warehouse.
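NCAR's century figure is easy to sanity-check. Here is a minimal back-of-envelope sketch in Python, assuming an archive of 5,000 terabits and a single period tape drive streaming at roughly 200 KB/s - both numbers are illustrative guesses, not NCAR's actual figures:

    # Rough sanity check of NCAR's "century to copy it all" estimate.
    # Both inputs are illustrative assumptions, not NCAR's actual figures.
    SECONDS_PER_YEAR = 60 * 60 * 24 * 365

    archive_bits = 5_000 * 10**12      # "thousands of terabits": assume 5,000
    drive_bits_per_sec = 200_000 * 8   # one nine-track drive at ~200 KB/s

    years = archive_bits / drive_bits_per_sec / SECONDS_PER_YEAR
    print(f"~{years:.0f} years of continuous copying")   # prints ~99 years

And that is one drive running around the clock, before counting tape mounts, rewinds, and failed reads.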
"The 'aging of the archives' issue isn't trivial," says desktop publisher Ari Davidow. "We're thinking of CD-ROM as a semi-permanent medium, but it isn't. We already have PageMaker files that are useless."
Also, recall that the PC era is an eye-blink compared to the mainframe generations that came and went under the care of the old Egyptian priesthood of computer geeks. (Would you believe a '60s-vintage GE 225 machine whose tapes stored 256 bits per inch? At that density each bit occupies about four-thousandths of an inch; drop some developer on the tape and you can actually see the bits.) J. Paul Holbrook, technical services manager for CICNet (one such Egyptian priest), summarizes the problem this way:
"The biggest challenge posed by systems like this is the sheer volume of information saved - there's too much stuff, it isn't indexed when it's saved, so there's lots of stuff you could never discover without loading it up again - that is, if you could load it up.
"The nature of the technology makes saving it all a daunting task. It's certainly possible to keep information moving forward indefinitely, if you keep upgrading it as you go along. But given the volume of data and how fast it's growing, this could present an enormous challenge."
Holbrook says twenty years is the maximum time you can expect to maintain a form of digital data without converting it to a newer format. He draws an analogy to print: "What if all your books had only a twenty-year life span before you had to make copies of them?"
A "museum of information," suggested by WELL info-maven Hank Roberts, might help to stem the leakage. Roberts says, "[Museum] collections are spotty and odd sometimes, because whenever people went out to look for anything, they brought back 'everything else interesting.' And that's the only way to do it, because it always costs too much to get info on demand - a library makes everything available and throws out old stuff; a museum has lots of stuff tucked away as a gift to the future."
But the information museum and the birth of digital archaeology itself await the development of a keener sense of loss. Those who created archaeology believed that modern civilization had roots in the past, roots that were no longer visible, and so they went to look for them, at Nineveh and Troy and in the Valley of the Kings. In contrast, electronic data seems to us like the sea: eternal, uniform, and plumbable to its deepest regions. We tend not to see it as layered in hidden strata, like Jericho.
But so it is. In the far future, cybernauts cruising the Matrix will spot professors in virtual pith helmets directing virtually sweating graduate students digging for - what? Checking accounts from 1957? The margarine sales of Safeway store #103 for 1971? You can never tell what our electronic legacy to the future will be. It's doubtful that the kings of Assyria thought much about how much barley was harvested year to year in some obscure Euphratean province. They were recording on the towers of Nineveh the really important stuff - their ancestry, their conquests, their laws. So we, too, sort out the important electronic data from the "junk." Yet Nineveh is dust and the grain tallies survive.