From wax cylinders to 'the cloud': How to preserve data for the long term

The University of California, Santa Barbara, is digitizing wax cylinders from the late 1800s and early 1900s, including some of the earliest sounds ever recorded. What's the best way to store information so that it can be retrieved by future generations?

The original Edison phonograph, shown here, recorded and played sounds on a sheet of tinfoil wrapped around a cylinder. Tinfoil was replaced by wax and later by celluloid, which is a more durable material.

Edison National Historic Site/National Park Service

November 11, 2015

Before Spotify, CDs, cassettes, or vinyl, there were wax cylinders. Invented by Thomas Edison in the late 1870s, the cylinders could be dropped into a phonograph to play back recorded sounds or music. But early versions were exceedingly fragile, and could only be played a few dozen times before the grooves in the wax surface wore down.

Now, the University of California, Santa Barbara, is trying to preserve some of those early sound recordings in a format that’s a bit more durable than wax. The UCSB team is digitizing its collection of wax cylinder recordings from the late 1800s and early 1900s, and has made more than 10,000 of them available to stream online. The collection includes pop songs, poems, dramatic readings, opera, and speeches.

The collection is being digitized using an Archéophone, a purpose-built phonograph that converts sound from wax and metal cylinders to modern formats. The team uses special styluses so as not to cause extra damage to the recordings, and runs the resulting files through a series of software filters to remove clicks, crackles, buzzes, and hisses.

“Many cylinders sound wonderful, while others are almost unlistenable, even after having undergone treatment,” the university concedes on its site. “Project staff decided that bad copies were better than no copies at all since ... the public's ability to hear even copies in poor condition is essentially nil.” 

The university’s project raises another important question: What’s the best way to preserve information for the very long term? Wax cylinders degrade quickly. CDs can last for up to 200 years if stored in a stable environment, but will eventually develop bit errors and become unusable. Even data stored on hard drives and servers is susceptible to “bit rot,” a term used loosely for both the gradual corruption of stored bits and the risk that files become unreadable once the programs and formats used to view them stop being supported.

One solution may be found in the cloud. Data stored in commercial cloud services such as Dropbox or iCloud is kept in multiple redundant locations, so it won’t be taken out even if one server succumbs to fire or flood. The Internet Archive, a non-profit organization that digitizes websites, computer software, books, music, and more, follows this model. The archive mirrors its data in San Francisco, Redwood City, and Richmond, Calif.; Alexandria, Egypt; and Amsterdam. The Archive’s staff recently started rewriting the code for the Wayback Machine, which collects snapshots of websites over time, to support more formats and automatically restore broken links. 

To store information for the very long term, follow NASA’s strategy for the Voyager spacecraft. The agency doesn't have a sterling track record: in the 1990s it lost more than a million reels of data, including the Apollo moonwalk footage. But in 1977, NASA engineers needed to stash a recording of Earth’s sights and sounds aboard Voyager 1 and Voyager 2 in a format that could potentially be retrieved in a million years or more by an extraterrestrial civilization. They settled on a copper record plated in gold, whose etchings will last for hundreds of millions of years before ongoing micrometeoroid impacts render the information unreadable.