Preserving Electronic Libraries
Researchers are developing a system that would read data lost on aging computer tapes.
DIGITAL DILEMMA
CAMBRIDGE, MASS. — In a few weeks, Alan Bawden is going to start spinning 1,700 computer tapes in an attempt to restore his past.
From 1976 until earlier this year, Dr. Bawden was an undergraduate, a staff member, and a graduate student at the Massachusetts Institute of Technology's Artificial Intelligence Laboratory. Every few months during those years, MIT's computer labs made a set of backup tapes containing all of the data on the lab's computers. On those tapes were the results of government-funded research projects, breakthrough programs that changed the world of computer science, and reams upon reams of personal electronic mail. It's all there in digital form, all safely stored away for posterity. There's just one problem: No computer left at MIT can make any sense of it.
Although much of the AI Lab's research was published in papers, "the published papers don't contain the full results," says Bawden. "A lot of the valuable data is contained only on those tapes."
So for the past four years, Bawden and a small group of friends have been developing a computer system that can read the tapes again. Their goal: to copy the data onto modern computers so that the lab's computer research from the '70s and '80s will be available to today's researchers.
Bawden has a personal reason for wanting to preserve the data as well: "Some of this history is my personal history."
What's a problem for MIT is also a problem for government and most corporations. "People who are running organizations aren't aware of the need to treat electronic media and electronic systems differently from the way that they treated paper systems," says David Bearman, editor of Archives and Museum Informatics, an industry newsletter.
While paper does deteriorate over time if it is not properly preserved, even books that are printed on acid-based paper can be put away in boxes and read in 20 or 30 years. Computer tapes, on the other hand, simply do not age well. Recently, when a technician working for the National Archives and Records Administration tried to read the contents of a 27-year-old computer tape from the Equal Employment Opportunity Commission, the tape literally melted inside the tape drive, says Kenneth Thibodeau, director of the Archives' Center for Electronic Records. "The [tape's] chemistry had changed," he says.
Short shelf life
Experts say that even when stored under proper temperature and humidity, computer tapes shouldn't be depended on to store data for more than 10 years. Furthermore, the tapes can't be left on the shelf: Every few years the tapes should be unrolled and re-rolled to prevent a problem called "print through," in which the data transfers between one layer of tape and another on a tightly wound reel.
"If you look at the interior of the tape hub, there is an incredible amount of pressure between the two layers of the tape," says Jim Green, who heads the space science data operations for NASA's Goddard Space Flight Center in Maryland. "The magnetic domains from one [tape] layer end up being so close to the layer underneath it that they end up being imprinted." The result is added noise and, eventually, loss of data.
The solution is to copy data from old tapes onto newer tapes every few years, before the tapes go bad. Fortunately, since the information on the tapes is in digital form, each copy is an exact duplicate of the original. But making those copies is an ongoing expense that is sometimes hard to justify to decision-makers. The reason the National Archives hadn't tried to read the 27-year-old tape earlier, Mr. Thibodeau says, was that funding for the project had been cut under the Reagan administration and only recently restored.
Organizations are increasingly turning to optical disks as a way of preserving data in their archives because of optical media's longer life span. But the long life can give a false sense of confidence. "Optical disks are sure to last 50 years or so, but in 50 years the chances that you will be able to read the optical disks we are now producing is about zero," because the optical drives available 50 years from now won't be compatible with today's optical disks, says Mr. Bearman.
Indeed, the real problem in keeping an electronic library isn't preventing the physical destruction of the tapes. "It's staying abreast of the technology to read and write data," says Green, who oversees the National Space Science Data Center, one of the largest electronic archives in the world.
Twenty years ago, most computers used tapes with seven tracks of data; a decade ago, seven-track tapes were replaced with nine-track systems, which are now being replaced with IBM 3480 data cartridges and optical disks, says Green.
But NASA still has 20,000 seven-track tapes containing valuable scientific data from past missions. Seven years ago, when the data center's seven-track tape drive broke, NASA bought the last seven-track drive made in the country, says Green.
It will take more than a year to copy data off the remaining tapes. Hopefully, the archive's seven-track drive will last. "When the last tape drive available to read the seven-tracks gives out, we might as well throw [the tapes] we have left away," Green says.
Computer catch-up
Simply having the computer files transferred to the next generation of computer hardware is rarely enough. "Software often captures the data in a fairly proprietary way," says Bearman. Many word processors, spreadsheets, and database programs store their files in secret, proprietary formats. Nothing guarantees that next year's programs will be able to read last year's files.
To compound the problem, says Bearman, documents increasingly incorporate sound, video, or computer-generated animations.
Documents are being created that are far more complicated than standard ASCII, or plain text. While these multimedia documents make great presentations today, there's no guarantee that they will look the same way when played back on a computer 10 years from now. In fact, says Bearman, there's no assurance that computers 10 years from now will be able to replay them at all.
Being unable to access today's computer records in the future raises profound public policy issues, says Sheryl Walter, general counsel for the National Security Archive, a private Washington-based watchdog group. "People who know that their actions are being recorded, and can be reviewed in the future, are more likely to act in a way that they would want to be accounted for," she says.
Nevertheless, despite the problems and extra precautions that must be taken with electronic libraries, experts agree they are only going to become more common.
"It's hopeless to store all these things on paper," says Fred Wood, a senior associate at the Congressional Office of Technology Assessment. Electronic documents can be searched far faster than paper archives, making them more valuable than their paper counterparts. But most important, electronic documents require only a fraction of the storage space of their paper counterparts. "That's a major problem for archives and libraries - they can't get more space, and they're not going to get it," Mr. Wood says.