National Partnership for Advanced Computational Infrastructure
San Diego Supercomputer Center
Graduate Students
A. Bagchi S. Bansal A. Behere R. Bharath S. Bharath L. Sui N. Cotofana D. Le J. Trang L. Yin +/- NN
Undergraduate Interns
Topics
Digital preservation approach
Levels of abstraction
Application to NARA collections
ERA: Archival Components Concept
[Figure: Storage Resource Broker / Extensible Meta-data CATalog; Grid Security Infrastructure; Accessioning Workbench; Reference Workbench; Accession; Describe; Wrap & Containerize; Verify / Rebuild; Query; Present; Collection; Collection Disks; Tapes; Metadata; Internet; Mediation of Information using XML; Records Schedules; Order Fulfillment System; Archival Research Catalog]
[Figure: Digital Object; Operating System; Storage System; Display System]
Create a storage abstraction layer
The Storage Resource Broker (SRB) provides the data management system
[Figure: SRB components: Data Transport; Metadata Transport; Prime Server; Servers]
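The storage abstraction idea can be sketched as one interface that clients call, with a driver per storage system behind it. This is a minimal Python sketch that assumes nothing about SRB's real interfaces: `StorageDriver`, `MemoryDisk`, `MemoryTape`, and `Broker` are hypothetical names, and both drivers are in-memory stand-ins for real disk and tape systems.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical driver interface: every storage system exposes the same operations."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, path: str) -> bytes: ...


class MemoryDisk(StorageDriver):
    """Stand-in for a disk file system: objects are kept in a dict keyed by path."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def get(self, path: str) -> bytes:
        return self._files[path]


class MemoryTape(StorageDriver):
    """Stand-in for a tape archive: data is appended sequentially, with an offset index."""

    def __init__(self) -> None:
        self._tape = bytearray()
        self._index: dict[str, tuple[int, int]] = {}

    def put(self, path: str, data: bytes) -> None:
        self._index[path] = (len(self._tape), len(data))
        self._tape.extend(data)

    def get(self, path: str) -> bytes:
        offset, length = self._index[path]
        return bytes(self._tape[offset:offset + length])


class Broker:
    """Routes each request to a named storage resource; clients never see driver details."""

    def __init__(self, resources: dict[str, StorageDriver]) -> None:
        self.resources = resources

    def put(self, resource: str, path: str, data: bytes) -> None:
        self.resources[resource].put(path, data)

    def get(self, resource: str, path: str) -> bytes:
        return self.resources[resource].get(path)
```

Client code such as `Broker({"disk": MemoryDisk(), "tape": MemoryTape()}).put("tape", "/nara/msg1", data)` is identical whichever resource is named, which is the point of the abstraction layer.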
Physical representation
What is the physical structure of the digital entity?
[Figure: Digital Entity; Disk; Physical: Encoding Format (syntax, structure)]
[Figure: Digital Entity; Files; Physical: Data Handling System (SRB/MCAT)]
Information Management
Abstraction layer for interacting with information repositories
Manage the schema and physical table structures of a database
Extensible schema
User-defined attributes
Extensible Metadata CATalog (EMCAT) manages collections
mySRB.html interface supports dynamic collection creation
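One common way to get an extensible schema with user-defined attributes is an entity-attribute-value layout: the tables stay fixed and each new attribute is just another row. The sketch below uses Python's built-in `sqlite3`; it is an illustration of that technique, not a description of how EMCAT stores metadata, and the table and function names are hypothetical.

```python
import sqlite3

# Hypothetical catalog layout: fixed tables, with user-defined attributes stored as
# (object, name, value) rows so a new attribute never requires ALTER TABLE.
SCHEMA = """
CREATE TABLE collection (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE object (id INTEGER PRIMARY KEY,
                     collection_id INTEGER NOT NULL REFERENCES collection(id),
                     logical_name TEXT NOT NULL);
CREATE TABLE attribute (object_id INTEGER NOT NULL REFERENCES object(id),
                        name TEXT NOT NULL, value TEXT);
"""


def open_catalog() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def create_collection(conn: sqlite3.Connection, name: str) -> int:
    """Collections are created at run time by inserting a row, not by changing the schema."""
    return conn.execute("INSERT INTO collection (name) VALUES (?)", (name,)).lastrowid


def register(conn: sqlite3.Connection, collection_id: int,
             logical_name: str, attrs: dict[str, str]) -> int:
    """Add an object to a collection together with any number of user-defined attributes."""
    obj_id = conn.execute(
        "INSERT INTO object (collection_id, logical_name) VALUES (?, ?)",
        (collection_id, logical_name),
    ).lastrowid
    conn.executemany(
        "INSERT INTO attribute (object_id, name, value) VALUES (?, ?, ?)",
        [(obj_id, k, v) for k, v in attrs.items()],
    )
    return obj_id


def find(conn: sqlite3.Connection, collection_id: int, name: str, value: str) -> list[str]:
    """Logical names of objects in a collection whose attribute `name` equals `value`."""
    rows = conn.execute(
        """SELECT o.logical_name FROM object o JOIN attribute a ON a.object_id = o.id
           WHERE o.collection_id = ? AND a.name = ? AND a.value = ?
           ORDER BY o.logical_name""",
        (collection_id, name, value),
    )
    return [r[0] for r in rows]
```

The trade-off is that queries need a join per attribute, in exchange for accepting arbitrary header names (such as the 1000+ user-defined fields in the NARA prototype) without schema changes.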
[Figure: Digital Entity; Physical: EMCAT/CWM; Database]
Characterization of knowledge repository operations
Mapping from collection attributes to discipline concepts
Mapping from knowledge relationships to rules for application in inference engines
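The two mappings can be illustrated with a toy example: attributes are first mapped to concepts, producing (subject, concept, value) facts, and a relationship is then expressed as a rule that a simple forward-chaining loop applies until nothing new is derived. This is a hypothetical sketch, not the model-based mediation system named below; the concept names and the two rules are assumptions chosen for RFC 1036 messages.

```python
# Hypothetical mapping from collection attributes (RFC 1036 header names) to concepts.
ATTRIBUTE_TO_CONCEPT = {
    "From": "author",
    "Newsgroups": "topic",
    "References": "replies_to",
}

Fact = tuple[str, str, str]


def to_facts(messages: dict[str, dict[str, str]]) -> set[Fact]:
    """Turn each message's attributes into (subject, concept, value) facts."""
    facts: set[Fact] = set()
    for msg_id, headers in messages.items():
        for attr, value in headers.items():
            concept = ATTRIBUTE_TO_CONCEPT.get(attr)
            if concept is None:
                continue  # attribute has no discipline concept
            # References may hold several whitespace-separated message IDs.
            values = value.split() if attr == "References" else [value]
            for v in values:
                facts.add((msg_id, concept, v))
    return facts


def infer(facts: set[Fact]) -> set[Fact]:
    """Forward-chain two example rules until a fixed point is reached:
    1. replies_to(a, b)                     -> same_thread(a, b)
    2. same_thread(a, b), same_thread(b, c) -> same_thread(a, c)
    """
    derived = set(facts)
    while True:
        new = {(a, "same_thread", b) for a, r, b in derived if r == "replies_to"}
        thread = [(a, b) for a, r, b in derived if r == "same_thread"]
        new |= {(a, "same_thread", c) for a, b in thread for b2, c in thread if b == b2}
        if new <= derived:
            return derived
        derived |= new
```

Keeping the attribute-to-concept table separate from the rules means either can be revised without touching the archived attributes themselves.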
[Figure: Digital Entity: Concept Space (ontology instance); Logical: Knowledge Repository Schema; Physical: Model-based Mediation System; Knowledge Repository]
Persistent Archives
Storage system abstraction
Logical name space and data manipulations
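A logical name space lets an archive keep the same name for a digital object while its physical copies move between storage systems. The sketch below shows that separation with a hypothetical `LogicalNameSpace` catalog and `Replica` record; it only updates catalog entries and assumes the actual data copy happens elsewhere.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Replica:
    resource: str       # storage system holding this copy (hypothetical name)
    physical_path: str  # location inside that storage system


@dataclass
class LogicalNameSpace:
    """Hypothetical catalog mapping persistent logical names to physical replicas."""

    entries: dict[str, list[Replica]] = field(default_factory=dict)

    def register(self, logical: str, replica: Replica) -> None:
        self.entries.setdefault(logical, []).append(replica)

    def resolve(self, logical: str) -> list[Replica]:
        return list(self.entries[logical])

    def migrate(self, old_resource: str, new_resource: str,
                rename: Callable[[str], str]) -> None:
        """Repoint every replica on `old_resource` to `new_resource`.

        Logical names are untouched; `rename` maps an old physical path to the
        path used on the new resource. Copying the bytes is out of scope here.
        """
        for replicas in self.entries.values():
            for i, r in enumerate(replicas):
                if r.resource == old_resource:
                    replicas[i] = Replica(new_resource, rename(r.physical_path))
```

Because users and applications only ever hold the logical name, a storage-technology migration becomes a catalog update rather than a change visible to every client.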
NARA Prototype
Demonstrate the ability to ingest, archive, re-create, query, and present a digital object from a 1-million-record E-mail collection (RFC 1036 USENET message format)
2.5 GB of data
6 required fields
13 optional fields
User-defined fields (over 1000)
<!ELEMENT rfc1036_mesg (headers, body)>
<!ELEMENT headers (required_headers, optional_headers, other_headers)>
<!ELEMENT body (#PCDATA)>
<!ELEMENT required_headers (From, Date, Newsgroups, Subject, Message-ID, Path)>
<!ELEMENT optional_headers (Followup-To?, Expires?, Reply-To?, Sender?, References?, Control?, Distribution?, Keywords?, Summary?, Approved?, Lines?, Xref?, Organization?)>
<!ELEMENT other_headers (other+)>
<!-- 6 required header keywords -->
<!ELEMENT From (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Newsgroups (#PCDATA)>
<!ELEMENT Subject (#PCDATA)>
<!ELEMENT Message-ID (#PCDATA)>
<!ELEMENT Path (#PCDATA)>
<!ATTLIST From seqno CDATA #REQUIRED>
<!ATTLIST Date seqno CDATA #REQUIRED>
<!ATTLIST Newsgroups seqno CDATA #REQUIRED>
<!ATTLIST Subject seqno CDATA #REQUIRED>
<!ATTLIST Message-ID seqno CDATA #REQUIRED>
<!ATTLIST Path seqno CDATA #REQUIRED>
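Wrapping a message into XML that follows the structure of this DTD can be sketched with Python's standard `email` and `xml.etree.ElementTree` modules. Assumptions: `seqno` is taken to be the message's position in the collection, unknown headers are written as `other` elements containing "Name: value" text (the excerpt does not declare `other`), and no DTD validation is performed; `wrap_message` is a hypothetical name.

```python
import email
import xml.etree.ElementTree as ET

# RFC 1036 header names, in the order used by the DTD content models.
REQUIRED = ["From", "Date", "Newsgroups", "Subject", "Message-ID", "Path"]
OPTIONAL = ["Followup-To", "Expires", "Reply-To", "Sender", "References", "Control",
            "Distribution", "Keywords", "Summary", "Approved", "Lines", "Xref",
            "Organization"]


def wrap_message(raw: str, seqno: int) -> ET.Element:
    """Build an <rfc1036_mesg> element from one raw message (sketch, not validated)."""
    msg = email.message_from_string(raw)  # header lookups are case-insensitive
    root = ET.Element("rfc1036_mesg")
    headers = ET.SubElement(root, "headers")
    required = ET.SubElement(headers, "required_headers")
    optional = ET.SubElement(headers, "optional_headers")
    other = ET.SubElement(headers, "other_headers")

    for name in REQUIRED:
        value = msg.get(name)
        if value is None:
            raise ValueError(f"missing required header: {name}")
        ET.SubElement(required, name, seqno=str(seqno)).text = value

    for name in OPTIONAL:
        if (value := msg.get(name)) is not None:
            ET.SubElement(optional, name).text = value

    # Everything else is a user-defined header. Note the DTD requires at least
    # one <other>, so a message with none would not validate as declared.
    known = {h.lower() for h in REQUIRED + OPTIONAL}
    for name, value in msg.items():
        if name.lower() not in known:
            ET.SubElement(other, "other").text = f"{name}: {value}"

    ET.SubElement(root, "body").text = msg.get_payload()
    return root
```

`ET.tostring(wrap_message(raw, 1), encoding="unicode")` then yields the archivable XML form of a single message.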
Creation of attributes that represent the accessioning template concepts
Analysis of attributes for anomalies and implied inherent knowledge
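The anomaly half of this analysis can be sketched as a pass over the extracted attributes: count how often each attribute occurs across the collection and flag records that break the accessioning template, here a missing required header or a `Date` that `email.utils.parsedate_tz` cannot parse. The checks and the `analyze` name are illustrative assumptions, not the prototype's actual rules.

```python
from collections import Counter
from email.utils import parsedate_tz

REQUIRED = ("From", "Date", "Newsgroups", "Subject", "Message-ID", "Path")


def analyze(messages: list[dict[str, str]]) -> tuple[Counter, list[tuple[int, str]]]:
    """Return attribute frequencies and (message index, anomaly) pairs."""
    freq: Counter = Counter()
    anomalies: list[tuple[int, str]] = []
    for i, headers in enumerate(messages):
        freq.update(headers.keys())  # how often each attribute appears
        for name in REQUIRED:
            if name not in headers:
                anomalies.append((i, f"missing {name}"))
        date = headers.get("Date")
        # parsedate_tz returns None when the string is not a recognizable date.
        if date is not None and parsedate_tz(date) is None:
            anomalies.append((i, f"unparseable Date: {date!r}"))
    return freq, anomalies
```

Attribute frequencies are also a starting point for the "implied inherent knowledge" half: a header present in almost every record is a candidate template concept, while a rare one may be an artifact.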
[Figures: Data Organization; Data Storage; Collection Storage]
Persistent Collection
Define context for archiving data: annotate information content
Create archivable form: standard encoding format
Archive information content along with data
Test closure of the collection: all digital objects that can be discovered in the collection are members of the collection
Test completeness of the collection: inherent relationships within the collection can be cast in terms of attributes generated from the annotated information
Differentiate between inherent knowledge and anomalies / artifacts
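For the RFC 1036 prototype, the closure test can be made concrete by treating the `References` header as the way one message discovers another. That choice of relation is an assumption; the test itself only checks that every discovered object is a member. The function name `closure_violations` is hypothetical.

```python
def closure_violations(collection: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map each member Message-ID to the referenced Message-IDs that are NOT members.

    An empty result means the collection is closed under the References relation.
    """
    members = set(collection)
    violations: dict[str, set[str]] = {}
    for msg_id, headers in collection.items():
        refs = set(headers.get("References", "").split())  # space-separated IDs
        missing = refs - members
        if missing:
            violations[msg_id] = missing
    return violations
```

A non-empty result either names objects that must be added to the collection or exposes a relationship that is an artifact rather than inherent knowledge.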
Self-Instantiating Archive
Archive the processes that are used to control the ingestion process:
Conversion to archivable form
Annotation of information content
When accessing the collection, retrieve the processes and the original digital objects:
Apply the processing steps to re-create the information content
Query the result to discover desired digital objects
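A self-instantiating archive stores the original objects together with the ordered list of processing steps, and re-creates the information content on access by replaying those steps. In this minimal sketch the steps are identified by name in a hypothetical `STEPS` registry; a real archive would also preserve the step implementations themselves, which this sketch does not attempt.

```python
import json
from typing import Callable

# Hypothetical registry of ingestion steps, keyed by the name stored in the archive.
STEPS: dict[str, Callable[[dict], dict]] = {}


def step(name: str):
    """Decorator that registers a processing step under `name`."""
    def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        STEPS[name] = fn
        return fn
    return register


@step("parse_headers")
def parse_headers(obj: dict) -> dict:
    """Conversion step: split the raw message into headers and body."""
    head, _, body = obj["raw"].partition("\n\n")
    headers = dict(line.split(": ", 1) for line in head.splitlines())
    return {**obj, "headers": headers, "body": body}


@step("annotate_newsgroups")
def annotate_newsgroups(obj: dict) -> dict:
    """Annotation step: expose Newsgroups as a list attribute."""
    groups = obj["headers"].get("Newsgroups", "")
    return {**obj, "groups": [g.strip() for g in groups.split(",") if g.strip()]}


def archive(raw_objects: list[str], pipeline: list[str]) -> str:
    """Store the original objects together with the ordered list of processing steps."""
    return json.dumps({"pipeline": pipeline, "objects": raw_objects})


def instantiate(package: str) -> list[dict]:
    """Re-create the information content by replaying the archived steps on each object."""
    data = json.loads(package)
    result = []
    for raw in data["objects"]:
        obj = {"raw": raw}
        for name in data["pipeline"]:
            obj = STEPS[name](obj)
        result.append(obj)
    return result
```

After `instantiate`, an ordinary filter such as `[o for o in objs if "comp.b" in o["groups"]]` plays the role of the final query step.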
Information
Any tagged data, which is treated as an attribute. Attributes may be tagged data within the digital object, or tagged data that is associated with the digital object
Knowledge
Relationships between attributes
Relationships can be procedural/temporal, structural/spatial, logical/semantic, or functional
Temporal / procedural: Workflow systems
Spatial / structural: GIS systems
Functional / algorithmic: Scientific feature analysis
Access Services
[Figure: Knowledge; XML DTD; Information; Attributes, Semantics; Feature-based Query]
Further Information
http://www.npaci.edu/DICE