
Keeping Our Bits About Us

When it comes to preserving your digital heritage, backup is only the
beginning. Is that photo album you stored on DVD going to be readable in 20
years?

You just used your high-powered digital camera to take wonderful pictures of
your kids romping at the petting zoo. But unless you're both careful and lucky,
your grandkids may never get to see them. "People are accumulating digital
photos and music and tax returns and personal correspondence," says William
LeFurgy, digital initiatives project manager at the Library of Congress.
"Eventually the disk's going to fail. If you haven't backed it up, it's gone."
Multiply that problem by a billion or so and you begin to understand the
challenge of preserving information born digital: anything and everything
that began its life as electronic ones and zeroes.

A 2003 University of California study estimated that new information in
electronic form amounted to about 17.7 exabytes per annum, or 17.7 billion
gigabytes. The number has only grown since. Nowadays information, whether
document, photo, architectural rendering, high-def video or aircraft design,
starts out as bits. And preserving those bits for posterity isn't as easy as
sticking a sheet of paper in a drawer.

"It's not at all uncommon to have people looking at printed records that are
200 years old," says Clifford Lynch, executive director of the Coalition for
Networked Information. "In the traditional world many things would survive
for an awfully long time just through benign neglect. For digital things, they
will survive only if people plan and think systematically about their survival on
a continuing basis." The issues surrounding our digital heritage have become
so complex that intense academic, institutional and corporate efforts are
under way to develop means of preserving digitally born data and having a
chance of understanding it decades and centuries from now.

The Library of Congress is midway through its ten-year, $100 million National
Digital Information Infrastructure and Preservation Program, designed to develop
digital preservation strategies. Last September the National Archives awarded
Lockheed Martin a $308 million contract to develop ways of preserving
diverse electronic government records. In the past year and a half Iron
Mountain, a 55-year-old company specializing in the storage of physical
records, bought digital-archiving specialists Connected and LiveVault.

At minimum, future digital archeologists will need ways to extract information
from existing and future storage media. Since devices eventually become
unavailable or unworkable (when was the last time you saw a Commodore 64
floppy disk drive?), organizations bent on preservation move info from older
systems to newer ones regularly. Bill Gates' company Corbis, for example,
stores 73 terabytes (73,000 gigabytes) of images on hard drives it upgrades on
a three-year schedule.

Just storing bits may be the easy part. Making them usable can require the
hardware and software that created them. The multimedia BBC Domesday
project, a record of life in Britain, cost (at historic exchange rates) $4.2 million
to develop in 1986; restoring it only 15 years later required reconstructing an
obsolete computer and laser-disk player, reverse-engineering software data
structures and writing a new program. The Washington State Archives
maintains a legacy library of old hardware and software and is beginning to
collect some oft-overlooked missing links: manuals and how-to books.

Sometimes software can stand in for hardware. Software emulators let
thousands of old arcade and console games be played on today's PCs, though
few of those games have been licensed legally. But seemingly simple chores
like translating file formats can be tricky; for example, no competing word
processor renders every last element of Microsoft Word files with absolute
fidelity.

The new game is to preserve and extract electronic records "free from
dependence on any specific hardware or software," in the words of the National
Archives. Given the diversity of what's born digital today, this is a daunting
task. Kenneth Thibodeau, director of the National Archives' Electronic
Records Archives program, points to Navy ships with a life span of 50 years:
"All the records to keep the ships operational are digital, including
computer-assisted manufacturing data designed to interface with a particular
tool." As the ship gets older, "How do they know that the data can be used to
replace a system if it gets damaged?"

One key to minimizing the importance of original hardware and software is
metadata: additional data that describes the digital information and explains
how to handle it. As Thibodeau puts it, "theoretically you wrap the records in
enough information that you could figure out what you've got and what you
need to do with it."

Until that exalted state comes to pass, simpler metadata can help users search
and retrieve born-digital content. Information like the date-and-time stamp
attached to data files, the lens and shutter info embedded in digital photo files
and the correspondents included in e-mail add descriptive information
without forcing users to take extra action. The content in text files amounts to
internal metadata ripe for automatic indexing.
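
The idea of metadata that accumulates without extra effort can be illustrated
with a minimal Python sketch. It is not drawn from any archive's actual system:
the function names (describe_file, index_words, build_record) and the sample
file are hypothetical, chosen only for illustration. The sketch reads the
date-and-time stamp and size the file system already keeps, then builds a
simple word index from the text itself.

```python
import os
import re
import tempfile
from collections import defaultdict
from datetime import datetime, timezone


def describe_file(path):
    """Gather metadata the file system already records, with no user effort."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # Last-modified time as an ISO 8601 string in UTC.
        "modified": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }


def index_words(text):
    """Map each lowercased word to the sorted line numbers where it appears."""
    index = defaultdict(set)
    for line_number, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            index[word].add(line_number)
    return {word: sorted(lines) for word, lines in index.items()}


def build_record(path):
    """Combine file-system metadata with a word index of the text content."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    record = describe_file(path)
    record["word_index"] = index_words(text)
    return record


if __name__ == "__main__":
    # Hypothetical sample file, created only for this demonstration.
    with tempfile.TemporaryDirectory() as folder:
        sample = os.path.join(folder, "letter.txt")
        with open(sample, "w", encoding="utf-8") as handle:
            handle.write("Dear grandkids,\nHere are the petting zoo photos.\n")
        record = build_record(sample)
        print(record["name"], record["size_bytes"], record["modified"])
        print(record["word_index"]["zoo"])
```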

Sound and image files put more demands on humans to create metadata that
makes them useful and searchable, as anybody who's received hilarious
automated results from Google's image search can attest. Metadata standards
do exist: Digital photojournalists, for example, often use a standard called
IPTC for captions, locations and credits. Closed captions provide a form of
internal metadata for TV shows. Communal metadata, like the tagging from
users of sites like Flickr or del.icio.us, helps categorize Web pages and
snapshots for retrieval.

Nonetheless, much of today's information (like, say, the Web) refuses to sit
still for its portrait. If you rely on the representations of a governmental or
corporate Web site, how do you later prove what was there? The Internet
Archive stores snapshots of the publicly available Web, but Brewster Kahle, its
director and cofounder, points out that "it's like a camera with a shutter that
takes two months to get the picture." Plenty of change can happen in the
interim.

That changeability leads to another archival problem: authenticity. New
Securities & Exchange Commission regulations require that if securities
dealers maintain mandated transaction records in electronic form, they must
be serialized, time-stamped and stored on a nonrewritable, nonerasable
medium in more than one location. Stringent but less specific rules apply for
health records and those dealing with Sarbanes-Oxley compliance, and in
lawsuits that compel discovery of electronic information. Compounding the
headache for the health care industry is another preservation challenge: digital
privacy.

One of the traditional roles of archivists, deciding what to throw away, is
becoming unnecessary in many situations. Lynch observes that today's cheap
storage media make it relatively easy to store virtually everything you create
directly.

But sensors of all sorts (scientific, agricultural, mechanical, and even the ones
in a digital camcorder) can rapidly create massive collections of digital data.
Scientific endeavors like Microsoft cofounder Paul Allen's Brain Atlas project
to map the activity of genes in brain cells can create a terabyte (a thousand
gigabytes) of data every day. Given the costs of maintaining massive
collections, determining what to keep and what to discard will remain an
issue.

Automation is likely to help simplify some aspects of digital preservation. For
example, BBN Technologies' PodZinger Web site uses speech-to-text software
to index podcasts and let you search their content. Software that can analyze
images may one day let them be catalogued and retrieved with minimal
human intervention. And digital advantages, like the ease of storing multiple
copies of documents at separate locations, make preservation a key way to
dodge the consequences of regional disasters like Hurricane Katrina.

The Internet Archive's Kahle sees easy access to data as digital preservation's
ultimate rationale. "Access," he says, "drives preservation." As the issues get
sorted out, the real achievement of digital preservation may turn out to be in
collaboration with the World Wide Web, opening up heretofore hidden realms
of information to the genealogists, historians, scientists, authors, musicians
and videographers of today and tomorrow.
