Escolar Documentos
Profissional Documentos
Cultura Documentos
TITLE Tape
GOES HERE
Combining
SNIA Cloud,
and Container
Format Technologies for the Long Term
Retention of Big Data
Abstract
Combining SNIA Cloud, Tape and Container Format
Technologies for the Long Term Retention of Big
Data
Generating and collecting very large data sets is becoming a necessity in many domains that
also need to keep that data for long periods. Examples include astronomy, atmospheric
science, genomics, medical records, photographic archives, video archives, and large-scale
e-commerce. While this presents significant opportunities, a key challenge is providing
economically scalable storage systems to efficiently store and preserve the data, as well as
to enable search, access, and analytics on that data in the far future.
Both cloud and tape technologies are viable alternatives for storage of big data and SNIA
supports their standardization. The SNIA Cloud Data Management Interface (CDMI) provides
a standardized interface to create, retrieve, update, and delete objects in a cloud. The SNIA
Linear Tape File System (LTFS) takes advantage of a new generation of tape hardware to
provide efficient access to tape using standard, familiar system tools and interfaces. In
addition, the SNIA Self-contained Information Retention Format (SIRF) defines a storage
container for long term retention that will enable future applications to interpret stored data
regardless of the application that originally produced it.
This tutorial will present advantages and challenges in long term retention of big data, as well
as initial work on how to combine SIRF with LTFS and SIRF with CDMI to address some of
those challenges. SIRF with CDMI will also be examined in the European Union integrated
research project ENSURE Enabling kNowledge, Sustainability, Usability and Recovery for
Economic value.
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
Outline
Introduction
SNIA technologies
Big Data..
Really is BIG..
2.5 quintillion (1018) bytes of new data created per day in 2012
(source IBM)
Satellite data is
kept for ever
Scientific and
Cultural
We would like to
keep digital art
for ever
X-rays are
often stored for
periods of 75
years
Records of
minors are
needed until 20 to
43 years of age
M&E
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
Security Risk
Protection of business or
intellectual assets
Business Risk
Business Risk
Compliance
Requirements
Compliance
Requirements
Meeting regulatory
requirements
Meeting regulatory
requirements
Security Risk
Legal Risk
Legal Risk
10%
20%
30%
40%
50%
60%
Percent of Respondents
>50-100 Years
13.1%
>21-50 Years
15.7%
>11-20 Years
0.0%
12.3%
>7-10 Years
>3-6 Years
38.8%
1.9%
5.0% 10.0%
15.0%
25.0%
30.0% Format
35.0% Technologies
40.0%
Combining
SNIA
Cloud,20.0%
Tape and
Container
for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
2007
To: roger_cummings@symantec.com
From: sue@somewhere.com
Subject: Something else
To: gary.phillips@veritas.com
From: fred@nowhere.com
Subject: Something or other
To: gary_phillips@symantec.com
From: sue@somewhere.com
Subject: Something else
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
Outline
Introduction
SNIA technologies
10
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
11
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
12
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
13
Index Partition
Guard Wraps
File
B
O
T
File
File
E
O
T
File
File
Data Partition
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
14
An Analogy
15
SIRF Properties
Required Properties
Self-describing can be interpreted by different
systems
Self-contained all data needed for the
interpretation is in the container
Extensible so it can meet future needs
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
16
SIRF Components
A SIRF container includes:
A magic object: identifies
SIRF container and its
version
Multiple preservation
objects that are immutable
A catalog that is
Updatable
Contains metadata to make
container and preservation
objects portable into the
17
Outline
Introduction
SNIA technologies
18
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
19
cat Patient PO
PO Container PO
We can read the various preservation objects and the catalog object via
CDMI REST API as follows:
GET <root URI>/<PatientContainer>/encounterJan2001
GET <root URI>/<PatientContainer>/chestImage
GET <root URI>/ PatientContainer>/sirfCatalog
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
20
PatientContainer
sirfCatalog
{
"encounterJan2001":[
"IDs": [{ ...},]
"Fixity": [{ ...},]
]
"chestImage":[
"IDs": [{ ...},]
"Fixity": [{ ...},]
]
}
Simple
Simple
PO
Simple
Encounter
PO
PO
Jan2001
Simple PO
cestImage
cestImage
manifest
cestImage
manifest
chestImage
manifest
cestImage manifest
cestImage
cestImage
cestImage
dicom1
dicom1
cestImage
cestImage
dicom1
dicom1
chestImage
chestImage
dicom1
dicom1
dicom1
dicom1
Composite PO
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
21
22
Label
Construct
DP
Label
Construct
SIRF
catalog
Preservation
Object
LTFS
index
Preservation
Object
A SIRF preservation object (PO) that is a simple object (contains one element)
is mapped to a LTFS file
A SIRF PO that is a composite object (contains several elements) is mapped to:
a set of LTFS files (one for each element) and a manifest file that its content includes
the IDs and fixities of the element data objects
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
23
VOL1
Label
fixed-size
80 bytes
LTFS
Label
SIRF
Label
XML
XML
24
Outline
Introduction
SNIA technologies
25
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
26
27
Summary
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
28
29
Additional Contributors
30
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
2013 Storage Networking Industry Association. All Rights Reserved.
31