Preservation Strategies for Digital Objects

As digital resources proliferate throughout modern culture, so does the need to learn how to preserve them for the future. The concepts and problems associated with preserving objects have always been with us (one need look no further than the destruction of the great library in Alexandria in the first century BC to see the result of a failure to adequately preserve ancient texts), but the problems posed by digital objects are distinctive enough to require special attention.

Firstly, the rapid advancement of hardware capabilities has itself harmed our ability to retrieve digital information locked in older formats. As media become denser, older standards are abandoned for newer ones, and support for an older device type is first reduced, then dropped. Eventually, it may become difficult even to locate the specifications that detail how the bitstream was written to the media. A good example of this phenomenon is the 5.25" floppy disk: once quite common, today it is definitely a challenge to read data off one, if you are unlucky enough to have the need.

Secondly, even if the physical hardware and media formats remain consistent, the abstract layer above them -- the software that determines how the bits are laid down across the media, and how to extract and reassemble them again -- will almost certainly change over the years. This presents a separate challenge to the librarian, and it usually arrives in tandem with the hardware difficulties described above.

Thirdly, the media commonly used for long-term storage (mostly CD-ROMs and DVDs today) have simply not been around long enough for us to know how long they will last in an archival environment (assuming that there will be hardware and software around to read them when needed). Although some manufacturers claim lifespans of up to 100 years for their media, those figures are ultimately conjecture, since optical disk media are less than 15 years old. Because of this, the current strategy for preserving digital objects usually involves regular, systematic copying without loss (and as we're dealing with digital objects, this process is not as painful as it sounds).
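To make "copying without loss" concrete, here is a minimal sketch of one way a copy could be checked bit for bit. It assumes the object is an ordinary file, and it uses a SHA-256 checksum as the test of fidelity; the function names sha256_of and copy_without_loss are hypothetical, chosen only for this illustration, and real archives may well use other fixity checks.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_without_loss(source: Path, destination: Path) -> str:
    """Copy a file to new media and confirm the copy is bit-identical.

    Hypothetical helper: raises OSError if the checksums disagree,
    otherwise returns the checksum so it can be stored with the copy.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise OSError(f"copy of {source} does not match the original")
    return after
```

Recording the returned checksum alongside each generation of the copy would let a later refresh cycle confirm that nothing has decayed in the meantime.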

Once the process of copying has been decided on, there is another decision point: what exactly is copied? We can opt to copy the physical bitstream of the original document, in its original format, and simply emulate the specific characteristics of the machine that extracted and interpreted it; or we can choose to preserve the logical means by which the document is interpreted (the "business logic", so to speak), through a process of regular migration of the document from older to newer formats and media.
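The difference between the two choices can be shown with a deliberately small sketch. It uses a text encoding as a stand-in for a full file format, which is a simplifying assumption; the ArchivedObject type and both function names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedObject:
    payload: bytes   # the stored bitstream
    encoding: str    # how that bitstream must be interpreted


def preserve_bitstream(obj: ArchivedObject) -> ArchivedObject:
    """Emulation-style copy: the bytes are carried forward untouched.

    Reading the result later still requires something that understands
    the original encoding (the role an emulator plays for old software).
    """
    return ArchivedObject(obj.payload, obj.encoding)


def migrate(obj: ArchivedObject, new_encoding: str = "utf-8") -> ArchivedObject:
    """Migration-style copy: interpret the old bytes, then rewrite them.

    The bytes change, but the logical content is meant to survive.
    """
    text = obj.payload.decode(obj.encoding)
    return ArchivedObject(text.encode(new_encoding), new_encoding)


# A one-word "document" stored in an older encoding.
original = ArchivedObject("café".encode("latin-1"), "latin-1")
kept = preserve_bitstream(original)
moved = migrate(original)
```

In this sketch, kept holds exactly the original bytes, while moved holds different bytes that decode to the same text. The first approach depends on the old interpreter remaining available; the second depends on every migration step being faithful.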

A good example of the former strategy is found in many places in the online gaming community. Concerned about the number of classic and original computer games that were no longer playable, coders developed emulators that create a virtual machine inside a modern computer, one that presents itself to the original code as an older machine. A good example of the latter strategy is employed by almost every office worker worldwide as they upgrade their documents yet again to the very latest version of Microsoft Word. In this case, the migration usually gains them nothing (aside from a few minor new features), but after enough cycles, the newest versions of Word can no longer read the oldest file formats, so if you weren't upgrading consistently along the way, eventually you'll find yourself up the proverbial creek without a paddle (or a hardware emulator?).