Friday, July 10, 2009

Managing Exabytes

According to this YouTube video, an estimated four exabytes “of unique information” will be created this year—“more than the previous 5,000 years.”

Some years ago, to confront the challenges of managing and preserving the large amounts of data generated over decades of space missions and observations, the Consultative Committee for Space Data Systems (CCSDS), an international body of space agencies that includes NASA, created the Reference Model for an Open Archival Information System (OAIS).

A helpful introduction to this 148-page document is available here:

Unlike many standards, OAIS specifies no particular implementation, API, data format, or protocol. Instead, it’s an abstract model that provides four basic things:

  • A vocabulary for talking about common operations, services, and information structures of a repository. … [see section 1]
  • A simple data model for the information that a repository takes in (or “ingests”, to use the OAIS vocabulary), manages internally, and provides to others. … [see sections 2 and 4]
  • A set of required responsibilities of the archive. … [see sections 3 and 5]
  • A set of recommended functions for carrying out the archive’s required responsibilities. … [see section 6].
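
To make the data model in the second bullet more concrete, here is a minimal Python sketch of the three stages it describes: information taken in, managed internally, and provided to others. OAIS calls these the Submission, Archival, and Dissemination Information Packages. Since OAIS prescribes no implementation, everything else here is an assumption made for illustration: the class and field names, the "aip-N" identifier scheme, and the use of a SHA-256 checksum as a fixity check.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmissionPackage:
    """What a producer hands to the archive (hypothetical fields)."""
    producer: str
    content: bytes


@dataclass(frozen=True)
class ArchivalPackage:
    """What the archive manages internally (hypothetical fields)."""
    identifier: str
    producer: str
    content: bytes
    sha256: str  # assumed fixity information, recorded at ingest


@dataclass(frozen=True)
class DisseminationPackage:
    """What the archive provides to a consumer (hypothetical fields)."""
    identifier: str
    content: bytes


class Archive:
    """Toy in-memory archive; a real repository would use durable storage."""

    def __init__(self) -> None:
        self._store: dict[str, ArchivalPackage] = {}

    def ingest(self, sip: SubmissionPackage) -> str:
        # Assign an identifier and record a checksum alongside the content.
        identifier = f"aip-{len(self._store) + 1}"
        digest = hashlib.sha256(sip.content).hexdigest()
        self._store[identifier] = ArchivalPackage(
            identifier, sip.producer, sip.content, digest
        )
        return identifier

    def disseminate(self, identifier: str) -> DisseminationPackage:
        # Re-verify the checksum before handing the content out.
        aip = self._store[identifier]
        if hashlib.sha256(aip.content).hexdigest() != aip.sha256:
            raise ValueError(f"fixity check failed for {identifier}")
        return DisseminationPackage(aip.identifier, aip.content)


archive = Archive()
aip_id = archive.ingest(SubmissionPackage("mission-team", b"telemetry"))
dip = archive.disseminate(aip_id)
```

The point of the sketch is only the separation of the three package types; the preservation metadata that a real archive would attach to each package is far richer than a single checksum.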

Building on this model, in 2002 RLG-OCLC released the report Trusted Digital Repositories: Attributes and Responsibilities, which was followed in 2007 by the RLG-NARA Trustworthy Repositories Audit & Certification: Criteria and Checklist. A number of institutions are currently working to become trusted digital repositories.