
EMC© Data Domain DD OS 5.1
Technology and Systems Introduction

Student Guide

Version: A.1
January, 2012

Backup Recovery Systems Division


EMC© Data Domain
2421 Mission College Boulevard
Santa Clara, CA 95054
866-WE-DEDUPE
408-980-4800
www.datadomain.com

 
EMC believes the information in this publication is accurate as of its publication date. The information is
subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC
CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE
INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Using, copying, and distributing EMC software
described in this publication requires an applicable software license. EMC2, EMC, Data Domain, Global
Compression™, SISL™, the EMC logo, and where information lives are registered trademarks or trademarks of
EMC Corporation in the United States and other countries. All other trademarks used herein are the property
of their respective owners. © Copyright 2009-2012 EMC Corporation. All rights reserved. Published in the
USA.

 
Welcome to EMC Data Domain Technology and Systems Introduction.
Click the Notes tab at any time to view text that corresponds to the audio recording.
Click the Supporting Materials tab to download a PDF version of this eLearning.
Copyright © 1996, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 EMC Corporation. All Rights Reserved. EMC 
believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.  
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.”  EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF 
ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF 
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
EMC2, EMC, Data Domain, RSA, EMC Centera, EMC ControlCenter, EMC LifeLine, EMC OnCourse, EMC Proven, EMC Snap, EMC SourceOne, 
EMC Storage Administrator, Acartus, Access Logix, AdvantEdge, AlphaStor, ApplicationXtender, ArchiveXtender, Atmos, Authentica, 
Authentic Problems, Automated Resource Manager, AutoStart, AutoSwap, AVALONidm, Avamar, Captiva, Catalog Solution, C‐Clip, Celerra, 
Celerra Replicator, Centera, CenterStage, CentraStar, ClaimPack, ClaimsEditor, CLARiiON, ClientPak, Codebook Correlation Technology, 
Common Information Model, Configuration Intelligence, Configuresoft, Connectrix, CopyCross, CopyPoint, Dantz, DatabaseXtender, Direct 
Matrix Architecture, DiskXtender, DiskXtender 2000, Document Sciences, Documentum, elnput, E‐Lab, EmailXaminer, EmailXtender, 
Enginuity, eRoom, Event Explorer, FarPoint, FirstPass, FLARE, FormWare, Geosynchrony, Global File Virtualization, Graphic Visualization, 
Greenplum, HighRoad, HomeBase, InfoMover, Infoscape, Infra, InputAccel, InputAccel Express, Invista, Ionix, ISIS, Max Retriever, 
MediaStor, MirrorView, Navisphere, NetWorker, nLayers, OnAlert, OpenScale, PixTools, Powerlink, PowerPath, PowerSnap, QuickScan, 
Rainfinity, RepliCare, RepliStor, ResourcePak, Retrospect, RSA, the RSA logo, SafeLine, SAN Advisor, SAN Copy, SAN Manager, Smarts, 
SnapImage, SnapSure, SnapView, SRDF, StorageScope, SupportMate, SymmAPI, SymmEnabler, Symmetrix, Symmetrix DMX, Symmetrix 
VMAX, TimeFinder, UltraFlex, UltraPoint, UltraScale, Unisphere, VMAX, Vblock, Viewlets, Virtual Matrix, Virtual Matrix Architecture, Virtual 
Provisioning, VisualSAN, VisualSRM, Voyence, VPLEX, VSAM‐Assist, WebXtender, xPression, xPresso, YottaYotta, the EMC logo, and where 
information lives, are registered trademarks or trademarks of EMC Corporation in the United States and other countries. 

All other trademarks used herein are the property of their respective owners.

© Copyright 2012 EMC Corporation. All rights reserved. Published in the USA.

Revision Date: January 2012


Revision Number: MR-5WP-DDTSINTRO

Copyright © 2012 EMC Corporation. Do not copy - All Rights Reserved. EMC Data Domain Technology and Systems Introduction 1
In this course you will learn about:
Data Domain solutions, including the important role played by EMC Data Domain systems in a
variety of today's infrastructure solutions.
Data Domain technology, including core technologies, and a wide range of supported 
protocols and topologies. 
Data Domain systems, including the full range of appliances and systems in the current 
product line as of this publication.
Data Domain software, including a number of licensed options enabling important 
functionality. 

The Technology and Systems Introduction course provides students with basic foundation 
knowledge of EMC Data Domain products, including technology, systems, and software.
This course is intended for individuals who desire an overview of EMC Data Domain system 
features and functionality, and forms the foundation for more advanced training and 
certification.
To complete this course successfully, a student should have an understanding of
computer storage, networking, and backup concepts.

This module provides a general overview of a number of solutions in which Data Domain systems
play an important role.
In this module you will learn about: the importance of EMC Data Domain systems in today's IT
infrastructures and the benefits they provide; a variety of IT solutions in which Data Domain
systems play a key role.

This lesson describes why customers choose EMC Data Domain systems based on a number of
key benefits.
In this lesson you will learn about: typical backup environments with and without Data Domain
systems; the migration from tape to disk-based backup and recovery systems in today's
environments; the significant impact of speed provided by Data Domain's unique technology and
architecture.

Increasing the capacity, speed, and cost-effectiveness of data storage is a
perpetual challenge. One of the most expensive and resource-intensive tasks is
gathering, storing, and protecting data backups. Writing data to tape, then shipping
and storing the tapes off-site, is one of the largest financial and labor
burdens in the conventional tape-centric environment. The diagram illustrates
the conventional process of handling backups through backup servers, which then
store that content in tape libraries. The tapes may be retained on-site for quick
retrieval or, in some cases, shipped and stored off-site, and data recovery must
reverse the steps with an involved manual process of moving the data back
on-site.

Data Domain systems simplify the storage and handling of data by reducing, or in many cases
entirely eliminating, the need for tape for data storage. With Data Domain systems, data is backed
up to disk instead of to tape. Data Domain deduplication greatly reduces the data footprint before
the data is backed up. Data Domain Global Compression technology combines an exceptionally
efficient, high-performance in-line deduplication technology with a local compression technique.
The reduced data footprint allows data to be retained on-site for longer periods and allows
transfer across the network for archival. Data recovery is similarly transformed by the elimination
of the time-consuming and resource-intensive handling of tape.

This illustration provides a quick snapshot of the evolution of backup and recovery from tape to
disk. From top to bottom, the conventional tape-centric backup and recovery process has evolved
from a heavily burdened manual process to some degree of automation with the introduction of
virtual tape library (VTL) systems. More recently, the introduction of disk-based technologies
such as EMC Data Domain systems has transformed the process, providing both hybrid digital
emulation of virtual tape library systems and purely disk-based backup, recovery, and
archival solutions. Moving from left to right in the diagram, the move from tape to disk has had a
significant effect on each important stage of the process: seamless integration with a
wide variety of application backup clients and media management architectures; a smaller
footprint in the on-site backup storage space; and fast and efficient disaster recovery through
replication, a process in which disk-based backup data is copied off premises
to the disaster recovery (DR) site. A number of additional Data Domain
technologies provide a unique level of protection at the DR site.


Beyond the transformational migration from tape to disk, Data Domain's unique core
technologies provide the added benefit of exceptional ingest speed. Quite simply, the faster the
system in terms of ingest speed, the larger the environment the system can support. As the chart
indicates, the size of the environment to protect is fundamentally defined by the number of hours
in the backup window and by how many terabytes the system can ingest during each backup
window. In today's customer environments, the backup window is defined by the customer and is
fixed or shrinking; therefore, increases in ingest speed drive an increase in the amount of data
that can be protected. Since their introduction, Data Domain systems have continued to transform this
fact of the IT environment. While many competitors claim extreme capacity, Data Domain systems
have always matched performance, in terms of throughput or ingest speed, with system capacity,
resulting in the ability to protect ever larger customer environments within consistent or smaller
backup windows with each new release.
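
The relationship described above can be written as simple arithmetic: the data that can be protected in one backup window is the ingest rate multiplied by the window length. The sketch below is illustrative only; the function name and the figures are assumptions, not values from this course.

```python
def protectable_tb(ingest_tb_per_hour, window_hours):
    """Data that fits in one backup window = ingest rate x window length."""
    return ingest_tb_per_hour * window_hours


# Hypothetical figures: a fixed 8-hour window and two ingest rates.
window_hours = 8.0
slower = protectable_tb(2.0, window_hours)
faster = protectable_tb(4.0, window_hours)

# With the window fixed, doubling ingest speed doubles the protected data.
print(slower, faster)
```

With the window held constant, the only way to protect more data is a higher ingest rate, which is the point the chart makes.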

Due to their unique architecture, Data Domain systems have continued to scale in significant
ways. The red line on this slide shows how throughput and addressable capacity have increased
over time in the flagship Data Domain system for industry-standard data access protocols such as
NFS, CIFS, and VTL. However, in 2010 EMC introduced two important products: Data Domain
Boost, represented by the green line, and Data Domain Global Deduplication Array, represented
by the blue line, both of which bump this trajectory to a whole new level. The key to the
scalability is the Data Domain CPU-centric architecture, referred to as Stream Informed Segment
Layout or SISL, which relies on CPU processing power rather than disk spindles to scale
performance. This allows Data Domain systems to scale in performance every time Intel
introduces a new CPU, versus competitors that rely on disk vendors to increase performance over
time. Since 2004, Data Domain systems have increased 175 times in throughput and 450 times
in capacity.

In summary, here are the key transformational advantages customers get from EMC Data Domain.
Customers can retain backups longer on-site using less space through deduplication that delivers
10 to 30 times data reduction compared to traditional methods, while also reducing or
eliminating entirely the use of tape; replicate smarter by moving the deduplicated data over existing
networks with up to 99% bandwidth efficiency for fast, cost-effective disaster recovery; and
recover reliably from disk with continuous fault detection and self-healing, ensuring data
recoverability to meet the requirements of service level agreements.

This lesson helps place EMC Data Domain systems in the context of a number of common
infrastructure solutions.
In this lesson you will learn about: Data Domain's role in backup and recovery solutions;
archiving and compliance solutions; business continuity and availability solutions; data center and
green IT solutions; and virtualization and consolidation solutions.

Deployed in the context of today's backup and recovery solutions, Data Domain systems optimize
data protection environments by minimizing reliance on tape. Customers can consolidate tape
operations to reduce costs and ease management while significantly reducing backup
windows and quickly restoring key applications. Both local and remote Data Domain
deduplication storage reduce data volume by 10 to 30 times, making disk backup cost-effective
and WAN (wide area network) vaulting of data operationally feasible. In summary, backup and
recovery solutions employing Data Domain systems minimize risk, improve data protection and
recovery, and control costs.

For archiving and compliance solutions, Data Domain systems allow customers to cost-effectively
archive non-changing data while keeping it online for fast, reliable access and recovery. These
capabilities free up expensive storage to significantly improve operational efficiency and control
costs. Data Domain provides a capacity-optimized storage tier for backup and archive of data in a
single system. File-level retention locking enables active archive protection for IT governance.
Data Domain systems offer field-proven and automated data replication, data deduplication,
enterprise management, and built-in data safety for extended on-site and off-site retention and
recovery. In combination, these capabilities enhance compliance and eDiscovery, allowing
customers to adhere to government, industry, corporate, and legal mandates.

In the context of business continuity and availability, Data Domain systems optimize data
protection and allow customers to quickly restore key applications. The incorporation of Data
Domain systems into business continuity and availability solutions helps ensure that business
operations recover quickly from disasters while improving the availability of key business
systems.

Data Domain systems allow customers with data center infrastructure to consolidate tape
operations, lower storage acquisition costs, and simplify data protection management.
Customers can utilize cost-efficient storage tiers to achieve green IT goals, including data center
floor space reduction and lower power consumption. Customers overcome critical energy
challenges by utilizing Data Domain systems to optimize storage tiers designed specifically for
consolidation and recovery of data center operations. In addition, Data Domain deduplication
storage systems automate disaster recovery (DR) processes and simplify the management of global
data protection assets in a small footprint. In summary, Data Domain systems enable customers
to maximize the energy efficiency of information infrastructure while reducing costs, ensuring
uninterrupted operation, and minimizing environmental impact.

Virtualization and consolidation are current cornerstones of improving data center efficiency. As
customers consolidate server environments, Data Domain system storage capacity, data
protection policies, and compliance requirements enable cost and management optimization. In
summary, Data Domain systems supply environments engaged in virtualization and consolidation
with flexibility, control, and choice of infrastructure.

In this module you learned about: Data Domain key benefits and Data Domain's role in a number
of today's industry solutions.

This module provides an overview of Data Domain's core technology.
In this module you will learn about: deduplication; Stream Informed Segment Layout (SISL)
scaling architecture; Data Invulnerability Architecture (DIA); Data Domain replication; supported
protocols; Data Domain data paths; and Data Domain file systems.

This lesson describes the fundamentals of deduplication.
In this lesson you will learn about: deduplication fundamentals; file-based deduplication; fixed
segment deduplication; variable segment size deduplication; in-line versus post-process
deduplication; and the specifics of Data Domain's deduplication.

Deduplication eliminates redundant data because it stores only one instance of specific portions
of data. Here's a very simple, text-based analogy. Looking at the sentence "Mary had a little lamb"
and thinking in terms of the duplication of the alphabetical characters, the sentence gets stored as
the shortened version shown here. There are no second instances of a letter. Deduplication
recognizes and removes common elements in data. It stores only one copy of the duplicated data.

Turning to computer-based deduplication examples, a very simple form is file-based
deduplication. For example, consider e-mail attachments in which a single attachment may be
sent to many people. In file-based deduplication, if any two files are exactly alike, one file is
stored and later copies of the file are pointed to the original file. This form of deduplication is
not very efficient, because a file that differs by even one byte is treated as entirely new.
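
A minimal sketch of file-based deduplication follows, assuming a content hash (SHA-256 here, chosen for illustration) is used to decide whether two files are exactly alike. The function and file names are hypothetical and do not describe any particular product.

```python
import hashlib


def store_files(files):
    """Keep one copy per distinct file content; each name points to that copy."""
    blobs = {}     # content hash -> file bytes, stored once
    pointers = {}  # file name -> content hash
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        blobs.setdefault(digest, data)
        pointers[name] = digest
    return blobs, pointers


attachment = b"quarterly report " * 100
mailboxes = {
    "alice/report.doc": attachment,
    "bob/report.doc": attachment,
    "carol/report.doc": attachment + b"!",  # differs by a single byte
}
blobs, pointers = store_files(mailboxes)

# Alice's and Bob's copies share one stored blob; Carol's one-byte change
# forces a second full copy, which is why this approach is inefficient.
print(len(blobs))
```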

Segment-based deduplication increases efficiency by allowing sub-file chunking of data that
is earmarked for deduplication. Smaller segments make it easier for the deduplication system to
find duplicates. Fixed segment deduplication fixes the chunking at a specific size, for example 8
KB. If data is inserted into a stream that is chunked at a fixed size, every later segment boundary
shifts, so previously stored segments no longer match, which is a disadvantage in efficiency.

Variable segment size deduplication improves on fixed segment size
deduplication because you can add data to a variable segment without
moving the rest of the data stream.
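
The difference between fixed and variable segment sizes can be sketched with a toy example. The boundary rule below (cut where a weighted sum of the last few bytes has certain bits set) is an assumption made for illustration; it is not Data Domain's algorithm, and the segment sizes are far smaller than real ones. The point is only that a content-defined boundary moves with the data, while a fixed boundary does not.

```python
def fixed_segments(data, size=8):
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_segments(data, window=4, mask=7):
    """Toy content-defined cut: end a segment wherever a weighted sum of the
    last `window` bytes has all `mask` bits set. The decision depends only on
    nearby content, not on the offset within the stream."""
    segments, start = [], 0
    for i in range(window - 1, len(data)):
        recent = data[i - window + 1:i + 1]
        value = sum((k + 1) * b for k, b in enumerate(recent))
        if value & mask == mask:
            segments.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        segments.append(data[start:])
    return segments


def shared(old, new):
    """Count segments in `new` that were already stored from `old`."""
    known = set(old)
    return sum(1 for seg in new if seg in known)


# Synthetic, non-repeating stream: 64 four-byte counters.
original = b"".join(n.to_bytes(4, "big") for n in range(64))
edited = b"X" + original  # one byte inserted at the front

fixed_hits = shared(fixed_segments(original), fixed_segments(edited))
var_old = variable_segments(original)
var_new = variable_segments(edited)

print("fixed segments reused:", fixed_hits, "of", len(fixed_segments(edited)))
print("variable segments reused:", shared(var_old, var_new), "of", len(var_new))
```

With fixed segments, the single inserted byte shifts every boundary and nothing matches. With content-defined boundaries, only the segment containing the insertion changes and every later segment is found again.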

Another aspect of deduplication is whether the process occurs in-line, prior to storing the data, or
post-process, utilizing disks to manage data during the deduplication. With in-line deduplication,
the data is filtered before it is stored. In post-process deduplication, additional storage and
administration are involved to manage multiple pools of data in various states of deduplication.
Data Domain systems employ in-line deduplication, which occurs in RAM (random-access
memory) before the data is written to disk. Around 99% of data segments are analyzed in RAM
without requiring disk access. Only a very small amount of data is not identified immediately as
either unique or redundant, and that data is stored to disk and examined again later against the
already stored data. Because deduplication is done with limited disk access, the speed of in-line
deduplication is not limited by disk seek times. Stream speed is as fast as other virtual tape
library products that do not have deduplication.

Data Domain deduplication works by breaking the data into segments and then identifying the
unique segments. Local compression is performed on the unique segments using standard
compression algorithms before the unique data is written to disk. The end result is a significant
reduction in the size of the data stored on disk as backup, as well as the amount of data to
replicate for disaster recovery. Here's an illustration. On Friday the backup application initiates
the first full backup of 1 TB, but only 250 GB is stored on the Data Domain system. This occurs
because, as the data stream is coming into the system, in-line deduplication finds one duplicate
segment (segment A), and then compression is applied to the unique segments stored on disk.
On average this results in a 2 to 4 times reduction in data on the first full backup. Over the course
of the week, 100 GB daily incremental backups result in a 7 to 10 times reduction and only
require 10 GB each to be stored, due to the data that was already protected from the first full backup.
Finally, on the second Friday, the second full backup contains almost all redundant data;
therefore, of the 1 TB of backup data sent, only 18 GB needs to be stored. In total, over the course of
the week, 2.4 TB of data was backed up to the Data Domain system, but the system only required
308 GB of capacity to protect this data set. Overall, this resulted in a 7.8 times reduction in one
week. The amount of reduction increases even further over the course of additional weeks and
months.
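
The arithmetic in this illustration can be checked with a short calculation. Two assumptions are made here that the text does not state outright: 1 TB is treated as 1000 GB, and the week contains four 100 GB incremental backups, which is the count that reproduces the 2.4 TB and 308 GB totals.

```python
# (backup, logical GB sent by the backup application, GB actually stored)
backups = [
    ("first full (Friday)", 1000, 250),
    ("incremental 1", 100, 10),
    ("incremental 2", 100, 10),
    ("incremental 3", 100, 10),
    ("incremental 4", 100, 10),
    ("second full (second Friday)", 1000, 18),
]

logical_gb = sum(sent for _, sent, _ in backups)
stored_gb = sum(kept for _, _, kept in backups)
reduction = round(logical_gb / stored_gb, 1)

print(logical_gb, stored_gb, reduction)  # 2400 308 7.8
```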

One of the core technologies pioneered by Data Domain is the Stream Informed Segment Layout (SISL)
scaling architecture, which leverages the continued advancement of CPU performance to add
direct benefit to system throughput scalability. In this lesson you will learn about: SISL data flow;
SISL process; and how SISL handles new versus duplicate data.

The main goal of Data Domain SISL technology is to allow deduplication throughput to scale with
CPU performance. SISL achieves this through a combination of patented techniques that allow
the system to identify 99% of duplicate segments in RAM before storing to disk, while also
minimizing disk access by storing and caching related segments and fingerprints together so
groups can be read or written all at once. As a result, Data Domain can utilize the full capacity of
large SATA disks for data protection without increasing RAM, and minimize the number of disks
needed to deliver high throughput. In the long term, SISL allows DD OS-based system
performance and throughput to track dramatically with CPU speed improvements.

Here's the fundamental data flow in SISL. In the following animation we'll examine each step in
detail.

First, the data stream enters RAM on the Data Domain system.
SISL slices the incoming data into segments, 4 to 12 kilobytes in size.
SISL then creates a fingerprint for each segment.
SISL then verifies whether segments are unique. To do this, SISL includes a series of techniques
performed in-line in RAM prior to data storage to disk, involving two technologies: the Summary
Vector and Segment Localities. The Summary Vector is an in-memory data structure used by DD
OS to quickly identify new, unique segments. Identifying the new segments saves the system from
doing a lookup in the on-disk index. The Summary Vector is not by itself sufficient for declaring a
segment redundant: a small fraction of the time, typically less than 1%, further steps are
necessary, including checking against information stored on disk. One key to disk efficiency is to
retrieve many segments with each access. Generally, a given small segment of data in most
backups will tend to be stored sequentially with the same neighboring segments before it and
after it most of the time. The Data Domain system stores these neighbors together as sequences
of segments in units called segment localities, which are packed into containers. With SISL, when
a segment is not found in cache, the system looks it up in the on-disk index and then pre-fetches
the fingerprints of an entire stream-informed locality. The vast majority of the following segments
in the incoming backup data stream are then typically found in the cache without further disk
accesses. Together, these techniques and others make it possible to find duplicate segments at
high speed in an application-independent way, while minimizing array hardware. It requires
neither huge amounts of RAM nor large numbers of disk drives.
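
The behavior described for the Summary Vector (a "not seen" answer is definitive, while a "possibly seen" answer needs confirmation) is the behavior of a Bloom filter. The sketch below uses a toy Bloom filter as a stand-in. The class, the SHA-256 fingerprint, the sizes, and the Python set standing in for the on-disk index are all assumptions for illustration, not the DD OS implementation.

```python
import hashlib


class SummaryVector:
    """Toy Bloom filter over segment fingerprints (illustrative stand-in)."""

    def __init__(self, bits=1 << 16, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, fingerprint):
        # Derive several bit positions from the fingerprint.
        for salt in range(self.hashes):
            digest = hashlib.sha256(bytes([salt]) + fingerprint).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, fingerprint):
        for pos in self._positions(fingerprint):
            self.array[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint):
        return all(self.array[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fingerprint))


vector = SummaryVector()
on_disk_index = set()  # stands in for the on-disk fingerprint index


def is_duplicate(segment):
    """Return True if the segment was stored before, recording it otherwise."""
    fp = hashlib.sha256(segment).digest()
    if vector.might_contain(fp) and fp in on_disk_index:
        return True  # "possibly seen" confirmed against the index
    # Either the vector said "definitely new" (no index lookup was needed)
    # or the index check showed a false positive; store as new.
    vector.add(fp)
    on_disk_index.add(fp)
    return False


print(is_duplicate(b"segment A"), is_duplicate(b"segment A"))
```

Because `and` short-circuits, the index is consulted only when the vector reports a possible match, which mirrors how the Summary Vector saves index lookups for new segments.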

Finally, the system goes through a process of storing the unique segments, which includes
compression of the segments and writing containers full of segments to disk. New segments are
grouped into multi-segment compression regions and then locally compressed. Segments are
written to a container, and SISL continues with this process until a container is filled. SISL then
writes the fingerprint metadata and other information into the container and writes the container
to disk.
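
The storing step can be sketched as follows: unique segments are grouped into compression regions, each region is compressed, and regions plus fingerprint metadata are packed into a container. The region and container sizes, the dictionary layout, and the use of zlib and SHA-256 are assumptions for illustration; the source says only that standard compression algorithms are used.

```python
import hashlib
import zlib

REGION_SEGMENTS = 2    # hypothetical: segments per compression region
CONTAINER_REGIONS = 2  # hypothetical: regions per container


def pack_containers(unique_segments):
    """Group segments into compressed regions, then regions into containers."""
    per_container = REGION_SEGMENTS * CONTAINER_REGIONS
    containers = []
    for start in range(0, len(unique_segments), per_container):
        batch = unique_segments[start:start + per_container]
        regions = [
            zlib.compress(b"".join(batch[i:i + REGION_SEGMENTS]))
            for i in range(0, len(batch), REGION_SEGMENTS)
        ]
        containers.append({
            # Metadata written alongside the compressed data.
            "fingerprints": [hashlib.sha256(s).hexdigest() for s in batch],
            "segment_lengths": [len(s) for s in batch],
            "regions": regions,
        })
    return containers


def unpack(container):
    """Recover the original segments from one container."""
    data = b"".join(zlib.decompress(r) for r in container["regions"])
    segments, pos = [], 0
    for length in container["segment_lengths"]:
        segments.append(data[pos:pos + length])
        pos += length
    return segments


segments = [bytes([i]) * 100 for i in range(5)]
containers = pack_containers(segments)
print(len(containers), [len(c["regions"]) for c in containers])
```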

This lesson describes one of Data Domain's core technologies, known as Data Invulnerability
Architecture, or DIA.
In this lesson you will learn about: the four primary ways in which Data Domain's Data
Invulnerability Architecture helps provide safe and reliable storage using end-to-end verification;
fault avoidance and containment; continuous fault detection and healing; and file system
recoverability.

The Data Invulnerability Architecture, or DIA, lays out a defense against data integrity issues by
providing unprecedented levels of data protection, data verification, and self-healing capabilities
that are unavailable in conventional disk or tape systems. There are three key areas of data
integrity protection described on this slide.

The first is end-to-end data verification at backup
time. As illustrated, end-to-end verification means reading data after it is written and comparing it
to what was sent to disk, proving that it is reachable through the file system to disk and that the
data is not corrupted. Specifically, when the Data Domain operating system receives a write
request from backup software, it computes a checksum over the data. After analyzing the data for
redundancy, it stores the new data segments and all of the checksums. After all the data has been
written to disk, the Data Domain operating system verifies that it can read the entire file from the
disk platter and through the Data Domain file system, and that the checksums of the data read
back match the checksums of the written data. If there are problems anywhere along the way,
for example if a bit has flipped on a disk drive, it will be caught.

The second key area is a self-healing
file system. Data Domain systems actively re-verify the integrity of all data every week in
an ongoing background process. This scrub process will find and repair defects on the disk before
they can become a problem. In addition, real-time error detection ensures that all data returned to
the user during a restore is correct. On every read from disk, the system first verifies that the block
read from the disk is the block expected. It then uses the checksum to verify the integrity of the
data. If any issue is found, the Data Domain operating system will self-heal and correct the data
error.

In addition to data verification and self-healing, there is a collection of other capabilities
that help with data integrity. The Data Domain system with RAID 6 provides double-disk failure
protection, and NVRAM enables fast, safe restarts. Snapshots are also available to provide
point-in-time file system recoverability.
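
The write-then-read-back check can be sketched in a few lines. This is only an illustration of the idea under stated assumptions: the function names, the exception class, and the SHA-256 checksum are hypothetical, and a read-back through an ordinary operating system may be served from cache rather than from the disk platter, which the real system is described as verifying.

```python
import hashlib
import os
import tempfile


class IntegrityError(Exception):
    """Raised when data read back does not match its recorded checksum."""


def write_verified(path, data):
    """Write data, read it back, and confirm the checksums match."""
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    read_verified(path, expected)
    return expected


def read_verified(path, expected):
    """Return the file contents only if they match the recorded checksum."""
    with open(path, "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != expected:
        raise IntegrityError("checksum mismatch for " + path)
    return data


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "segment.bin")
    checksum = write_verified(target, b"backup data")
    # Simulate a flipped bit on disk, then try to restore.
    with open(target, "wb") as f:
        f.write(b"backup dat`")
    try:
        read_verified(target, checksum)
        corruption_caught = False
    except IntegrityError:
        corruption_caught = True

print(corruption_caught)
```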

This lesson describes Data Domain replication.
In this lesson you'll learn about: data replication; Data Domain replication types; replication
context; and replication topologies.

Because replication copies data over a WAN after it is deduplicated, only unique, compressed data
segments are sent over the network, and network demands are reduced. You
replicate, or copy, data from one Data Domain system to another for the purposes of disaster
recovery, remote office data protection, multiple-site tape consolidation, and on-site
archiving. Once you configure replication between a source and a destination, only new data
written to the source is automatically replicated to the destination. Data is deduplicated at the
source and at the destination. You can recover off-site replicated data online, so there is no need to
remount tapes or transport them by truck. You need a Replicator license for both the source and
destination Data Domain systems.
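
The bandwidth saving comes from sending only the segments the destination does not already hold. A minimal sketch follows, assuming segments are identified by a SHA-256 fingerprint and the destination is modeled as a dictionary. Both are illustrative assumptions, and the sketch omits compression and the real replication protocol.

```python
import hashlib


def replicate(source_segments, destination_store):
    """Send only segments whose fingerprints the destination lacks.

    Returns the number of bytes that had to cross the network.
    """
    bytes_sent = 0
    for segment in source_segments:
        fingerprint = hashlib.sha256(segment).hexdigest()
        if fingerprint not in destination_store:
            destination_store[fingerprint] = segment
            bytes_sent += len(segment)
    return bytes_sent


destination = {}
first = replicate([b"A" * 10, b"B" * 10, b"A" * 10], destination)
second = replicate([b"A" * 10, b"C" * 10], destination)

# The repeated "A" segment is sent once; the second run sends only "C".
print(first, second)  # 20 10
```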

Replication is set up with a source Data Domain system and one or more destination Data
Domain systems. There are three replication types: collection, directory and pool.

With collection replication, the entire backup directory is replicated and duplicated. A full system
data replication mirror exists. Any changes made manually on the destination are overwritten
after the next change is made on the source. It is recommended that changes be made only on
the source. The backup directory is used for these purposes. The data is immediately accessible
at the destination. Other than receiving data from the source, the destination is read-only, and all
user accounts and passwords are replicated from the source.

Directory replication provides replication at the level of individual directories. Each Data Domain
system can be the source or the destination for multiple directories, and can also be a source for
some directories and the destination for others. During directory replication, the Data Domain
system can also perform normal backup and restore operations. A destination Data Domain
system must have available storage capacity that is at least the post-compressed size of the
expected maximum size of the source. A single destination Data Domain system can receive
backups from both CIFS clients and NFS clients, as long as separate directories are used for each.
When replication is initialized, a destination directory is created automatically if it doesn't already
exist. After replication is initialized, ownership and permissions of the destination directory are
always identical to those of the source directory. At any time, due to differences in global
compression, the source and destination directories can differ in size.
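
The destination sizing rule above is simple arithmetic, sketched below in Python. The function name, its parameters, and the sample figures (200 TB of source data, a 10x reduction factor) are assumptions for illustration only. Real compression factors depend on the data.

```python
def destination_has_room(expected_max_source_tb: float,
                         compression_factor: float,
                         destination_free_tb: float) -> bool:
    """Check the sizing rule for a directory replication destination.

    expected_max_source_tb: largest pre-compression size the source
        directory is expected to reach.
    compression_factor: assumed overall reduction ratio (10.0 means 10x).
    destination_free_tb: storage currently available on the destination.
    """
    if compression_factor <= 0:
        raise ValueError("compression_factor must be positive")
    post_compressed_tb = expected_max_source_tb / compression_factor
    return destination_free_tb >= post_compressed_tb


print(destination_has_room(200.0, 10.0, 25.0))  # True: 20 TB needed, 25 TB free
print(destination_has_room(200.0, 10.0, 15.0))  # False: 20 TB needed, 15 TB free
```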

Pool replication is similar to directory replication, adapted for Data Domain systems configured as
VTLs. Replicating VTL pools and tape cartridges does not require a VTL license on the destination
Data Domain system. VTL virtual tapes can be replicated from multiple replication originators to a
single replication destination. The system refers to directories that contain VTL tape cartridges or
pools and operates similarly to directory replication.

A replication pair is referred to as a context. We can also refer to a replication context as a
replication stream. Although the use case is quite different, the stream resource utilization
within a Data Domain system is roughly equivalent to a read stream for a source context or a write
stream for a destination context.

This table lists the number of streams available for various models. Take a moment to review the
table. Note that if you exceed the number of streams supported by a specific Data Domain model,
your replication throughput may be compromised.

Data Domain supports various replication topologies in which data flows from a source to a
destination directory over a LAN or WAN. Directory replication can be configured in the following
ways. One-to-one replication is the simplest type of replication: data flows from a Data Domain
source system to a Data Domain destination system. In a bidirectional replication pair, each system
is both a source and a destination: data from the source directory on one system is replicated to the
destination directory on the other system, and vice versa. In
many-to-one replication, data flows from several source directories to a single destination
system. This type of replication occurs, for example, when several branch offices replicate their
data to the corporate headquarters IT system. In one-to-many replication, multi-stream
optimization maximizes replication throughput per context. In a cascaded replication topology,
directory replication is chained among three or more Data Domain systems. Data recovery can be
performed from the non-degraded replication pair context. One additional topology is available:
cascaded one-to-many.
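
One way to keep these topologies straight is to treat each replication context as a (source, destination) pair and look at the shape of the resulting graph. The Python sketch below is a rough classifier written for this guide's terminology. The `classify` function and the system names are hypothetical, and it does not reflect how DD OS itself represents contexts.

```python
from collections import defaultdict


def classify(contexts):
    """Label a set of (source, destination) replication contexts."""
    pairs = set(contexts)
    outgoing = defaultdict(set)
    incoming = defaultdict(set)
    for src, dst in pairs:
        outgoing[src].add(dst)
        incoming[dst].add(src)

    if len(pairs) == 1:
        return "one-to-one"
    if len(pairs) == 2 and all((dst, src) in pairs for src, dst in pairs):
        return "bidirectional"
    # A system that both receives and forwards data forms a chain.
    relays = [system for system in outgoing if system in incoming]
    fan_out = any(len(dsts) > 1 for dsts in outgoing.values())
    if relays:
        return "cascaded one-to-many" if fan_out else "cascaded"
    if any(len(srcs) > 1 for srcs in incoming.values()):
        return "many-to-one"
    if fan_out:
        return "one-to-many"
    return "independent pairs"


print(classify([("branch1", "hq"), ("branch2", "hq")]))  # many-to-one
print(classify([("site_a", "site_b"), ("site_b", "site_c")]))  # cascaded
```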

This lesson provides an overview of the supported protocols available with Data Domain systems.
In this lesson you will learn about: the range of supported protocols; how the supported protocols
provide application transparency; how these protocols interact with integrated deduplication and
data invulnerability; how the supported protocols enable replication transparency; and several
use cases for protocols.

In summary, Data Domain system protocol flexibility enables easy integration into existing
infrastructures. Protocol transparency provides seamless integration with backup and archive
applications, and allows for simplified and centralized disaster recovery. Customers can leverage
their investment in Data Domain systems for broad protocol usage within their infrastructures.

This chart illustrates the file system hierarchy and the supported protocols that provide data
connectivity. Fibre Channel connectivity is supported through VTL. For Ethernet connectivity, the DD
Boost, NFS, CIFS, and NDMP protocols are all supported.

The range of supported protocols allows for transparency with many different applications at
work in today's infrastructures. This allows customers to keep their backup environment the
same, with no significant changes. No new client agents and no new backup software are
required. Deduplication applies to all backup applications. This helps keep the load off the
clients while Data Domain systems perform deduplication in dedicated hardware. There's no
need to have different deduplication systems in separate silos within the site. Data Domain
systems also go far beyond backup, with optimization that applies to many applications, for
example, archive.

The full range of supported protocols also enables replication transparency both in the
production environment and at the data recovery site. Identical protocols allow for seamless
replication and recovery.

Here is a use case of an environment employing the NFS and CIFS protocols. The topology allows
backup, archive, and direct access to work through these protocols, achieving deduplicated
backup through a single Data Domain system.

In this example, the environment is employing disk-to-disk backup with CIFS and NFS.
Data Domain systems easily integrate into these existing IP environments for
network-efficient replication for disaster recovery, with support for all leading backup and
archive applications.

Here are resources for more information.

This lesson provides a brief overview of Data Domain data paths.
In this lesson you will learn about: data flow in a typical backup environment; data paths in a
typical Ethernet environment; and data paths over Fibre Channel utilizing VTL.

Data Domain systems integrate into typical backup environments non-intrusively. Often, Data
Domain systems connect directly to the backup server. The backup data flow is redirected from the
clients to the Data Domain device instead of to tape. If tape needs to be made for long-term
archival retention, data flows from the Data Domain device back to the server and then to tape,
completing the same flow that the backup server was doing initially. Tapes come out in the same
standard backup software formats as before and go off-site for long-term retention. If a tape must
be retrieved, it goes back into the tape library, and the data flows back through the backup software to
the client as needed.

In environments that rely on Ethernet connectivity, backup and archive media servers send data
from clients to Data Domain systems on the network. A direct connection between a dedicated
port on the backup or archive server and a dedicated port on the Data Domain system may also
be used. The data is written to the backup file system on the Data Domain system. Physical
separation of the replication traffic from backup traffic can be achieved by using two separate
Ethernet interfaces on the Data Domain system. This allows backups and replication to run
simultaneously without network conflicts.

If the Data Domain virtual tape library (VTL) option is licensed, and a VTL Fibre Channel HBA is
installed on the Data Domain system, the system can be connected to a Fibre Channel SAN, or
storage area network. The backup or archive media server sees the Data Domain system as one or
multiple VTLs.

This lesson provides a brief introduction to Data Domain file systems.
In this lesson you will learn about: the administrative file system known as ddvar; and the
storage file system known as MTrees.

The Data Domain system's administrative file system is called ddvar. The NFS directory is /ddvar;
the CIFS directory is \ddvar. This file system stores system core and log files. You cannot rename or
delete this file system, nor can you access all of its subdirectories. Data streams for this file
system change according to the Data Domain OS version and hardware model you have. Check
the Data Domain support portal for more information on data streams for each OS and hardware
model.

For the storage file system, MTrees are available in DD OS release 5.0 and later to provide more
granular management of data, so that different types of data, or data from different sources, can be
managed and reported on separately with different policies applied.

At the top of this directory structure is the data folder. Note that you cannot add anything to the
data directory; you can only make changes to the col1 subdirectory. You might wonder why the
entry path includes col1 instead of just data and subfolders. The addition of col1 in the path
allows for the possibility of additional layers or collections (col2, col3, and more) in the future.
Note also that the backup MTree cannot be deleted or renamed. Subdirectories can be created
under the backup directory to keep data separate. Separately managed directory trees, or
MTrees, can also be configured under the col1 directory. Up to 100 MTrees can exist, but only up
to 14 should be active at a time, because performance drops sharply if more than 14 are active.
MTrees that you add can be renamed and deleted.
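
The layout and limits described above can be captured in a small Python sketch. The constants come from the text (100 MTrees, 14 active, a protected backup MTree, everything under the col1 directory). The helper names and the sample MTree names are hypothetical, and this is not a DD OS interface.

```python
MAX_MTREES = 100         # stated limit on MTrees that can exist
MAX_ACTIVE_MTREES = 14   # stated limit before performance drops sharply
MTREE_ROOT = "/data/col1"
PROTECTED = {"backup"}   # the backup MTree cannot be deleted or renamed


def mtree_path(name: str) -> str:
    # MTrees sit directly under col1, so a name is a single path component.
    if not name or "/" in name:
        raise ValueError("MTree name must be a single path component")
    return f"{MTREE_ROOT}/{name}"


def check_plan(mtrees: set, active: set) -> list:
    """Return warnings for a planned set of MTrees."""
    warnings = []
    if len(mtrees) > MAX_MTREES:
        warnings.append(f"{len(mtrees)} MTrees exceeds the limit of {MAX_MTREES}")
    if len(active) > MAX_ACTIVE_MTREES:
        warnings.append(
            f"{len(active)} active MTrees; more than {MAX_ACTIVE_MTREES} degrades performance"
        )
    return warnings


def can_delete(name: str) -> bool:
    return name not in PROTECTED


print(mtree_path("backup"))   # /data/col1/backup
print(can_delete("backup"))   # False
print(can_delete("finance"))  # True
```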

In this module you learned about: deduplication, Stream Informed Segment Layout (SISL)
scaling architecture, Data Invulnerability Architecture, Data Domain replication, supported
protocols, Data Domain data paths, and Data Domain file systems.

This module provides an overview of the full range of Data Domain systems.
In this module you will learn about: Data Domain's current product line; hardware features across
Data Domain systems; internal versus external storage as it applies to Data Domain's product
line; the DD800 series appliances; the DD600 series appliances; the DD160 remote office
appliance; DD Archiver; Global Deduplication Array; and system management features.

This diagram illustrates the range of currently marketed Data Domain deduplication
storage systems.
A few words about terminology. In general, the controller is the part of each Data Domain
system that deduplicates data and performs writes and reads of the data to and from the
storage disks. A system that stores no data internally is a standalone controller, which is
sometimes referred to as a data-less head or DLH. When you encounter the word
controller applied to a Data Domain system, it is generally safe to assume that
a DLH is being referred to.
This table compares the models by speed, logical capacity, and usable capacity. Here are
some definitions:
Speed: DD Boost is the throughput measured when EMC Data Domain Boost, or DD Boost,
is used with 10 Gb Ethernet.
Speed: Other is the throughput measured when other protocols such as CIFS, NFS, or VTL
are used to access the Data Domain system.
Logical capacity is the amount of data that can be stored on the system after
deduplication and compression. The logical capacity is calculated based on a mix of
typical enterprise backup data: file systems, databases, e-mail, developer files, and so
forth.
Usable capacity is the amount of space available on the disks after overhead for the
operating system and RAID is deducted. All capacity values shown are calculated using
base 10, in which one terabyte equals 1 trillion bytes. Take a moment to review this diagram
before proceeding.
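
Because the table uses base-10 units, the figures will look larger than what a tool reporting in base-2 units shows. The short Python sketch below illustrates the conversion and the relationship between usable and logical capacity. The 100 TB usable figure and the 20x reduction ratio are made-up inputs for illustration, not values from the table.

```python
TB = 10 ** 12   # one terabyte as used in the table: 1 trillion bytes
TIB = 2 ** 40   # one tebibyte, the base-2 unit that many tools report


def tb_to_tib(terabytes: float) -> float:
    return terabytes * TB / TIB


def logical_capacity_tb(usable_tb: float, reduction_ratio: float) -> float:
    # reduction_ratio is an assumed deduplication-plus-compression factor.
    return usable_tb * reduction_ratio


print(round(tb_to_tib(100), 2))      # 90.95
print(logical_capacity_tb(100, 20))  # 2000
```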

This lesson introduces common hardware features that you will find across Data Domain's
product line.
In this lesson you will learn about: a full list of common hardware features; N+1 hardware
redundancy; connectivity features; expansion shelves; and internal versus external storage.

Here's a list of common hardware features across Data Domain models. Data Domain systems are
based around the same basic hardware architecture. Documents for each hardware model are
published on the Data Domain support site. Hardware features common to all models include:
rack mounting in four-post racks; hot-swappable disks, with redundant hot-swappable fans and
redundant hot-swappable power modules; a serial port and copper Ethernet ports; DIMM modules
for RAM; a battery-backed NVRAM card; video, keyboard, and mouse ports to connect a
monitor, keyboard, and mouse; and front panel LEDs that provide system status indicators.

Components under high mechanical or electrical stress, such as spinning drives, fans, and power
supplies, are provided in an N+1 redundant configuration. N+1 redundancy is a system
configuration in which certain components have at least one independent backup component to
ensure that system functionality continues if a part fails. This allows for uninterrupted operation at full
capacity and operational status if one component fails. For data, RAID 6 technology provides
additional protection of data integrity when up to two disks fail.

Connectivity features include Ethernet connections, Fibre Channel connections, a serial console
connection, and keyboard, mouse, and monitor connections.

Data Domain systems may be connected to Ethernet networks for TCP/IP-based data transfer and
system management. All models have two built-in ports. Some models may be configured with
additional ports by adding optional Ethernet expansion cards. Newer systems also include a
dedicated Ethernet port for what is known as lights-out management, or remote system
management. Interface cards are usually added to provide additional network capacity.

Connecting to a Fibre Channel-based storage area network is supported by adding a host bus
adapter card. In these environments, the virtual tape library (VTL) software license is also required.

For repairs in the field, access to the command line interface to shut down, restart, and run
diagnostics is usually through the serial port. The username for the built-in administrator is
sysadmin.

Alternatives to serial port access are to connect over the network or directly to the unit with a
keyboard and monitor. Check with the onsite administrator for the preferred access method.

The majority of Data Domain system models support the expansion of storage with the addition
of ES20 or ES30 expansion shelves. The 3U Data Domain ES20 expansion shelf has 16 disks for
increased storage capacity of certain models. The expansion shelf fits in a standard 19-inch rack
and connects to either a Data Domain system or to another expansion shelf.

The 3U Data Domain ES30 expansion shelf has 45% less power consumption than the ES20
and utilizes a SAS II interface. Otherwise, performance characteristics and usable capacity
are the same. Systems supporting the ES30 are shown. Requirements include shelf capacity
licensing and DD OS 5.1.

Here are the features of the ES30. Take a moment to review before proceeding.

This lesson describes the DD800 series appliances.
In this lesson you will learn about: DD800 series features; hardware; back view; and model
comparisons.

The DD800 series consists of the DD890 and the DD860, two high-end single-controller
deduplication storage systems. The DD890 is the industry's fastest single controller. Throughput
figures are up to 14.1 TB per hour with DD Boost and up to 8.1 TB per hour with VTL. Capacity figures
are up to 14.2 PB of logical capacity, up to 384 TB of raw capacity, and 285 TB of usable capacity. All of
this capacity fits in 20U, half of a rack.

The DD890 hardware consists of dual-socket six-core processors; 96 GB of RAM; one 2 GB NVRAM
card; four hot-swappable disk drives; and two quad-port SAS cards with mini-SAS connectors to
connect ES20 or ES30 expansion shelves for external storage of data.

The DD860 hardware consists of dual-socket quad-core processors; 36 GB of RAM, expandable to
72 GB; one 2 GB NVRAM card; four hot-swappable disk drives; and two quad-port SAS HBAs with
mini-SAS connectors to connect ES20 or ES30 expansion shelves for external storage of data.

This slide illustrates the back view of the DD890. Take a moment to review these details
before proceeding.

This slide illustrates the back view of the DD860. Take a moment to review before proceeding.

This lesson describes the DD600 series appliances, designed for midsize enterprise data centers.
In this lesson you will learn about: DD600 series specifications, features, and capacities.

The DD600 series appliances are designed for midsize enterprise data centers. The series
consists of the DD620, DD640, and DD670. These systems have different capacities for
optional internal and external expansion.

Here are the specifications for the DD640.  Take a moment to review before proceeding.

Here are the features of the DD640.  Take a moment to review before proceeding.

Here are the features of the DD620.  Take a moment to review before proceeding.

Here are the features of the DD670.  Take a moment to review before proceeding.

This lesson describes the DD160 remote office appliance.
In this lesson you will learn about: the DD160 specifications, features, and hardware.

Here are the specifications for the DD160.  Take a moment to review before proceeding.

Here are the features of the DD160.  Take a moment to review before proceeding.

This slide provides a quick review of how the different Data Domain models store data, from
the high-end to the low-end models.

• Most models store data only on disks in external expansion shelves. These include the
DD860, DD890, DD Archiver, and GDA. Models that store data only on expansion disks
have four built-in disks for the DD OS, boot, and logs.

• The DD640 and DD670 store data both on built-in disks and on optional expansion
shelves.

• Two models store data only on built-in disks: the DD620 and the DD160 branch office
appliance.

• ES20 and ES30 expansion shelves are used for external storage.

This lesson describes the EMC Data Domain Archiver.
In this lesson you will learn about: archiving in the industry; an overview of the DD
Archiver, including descriptions of features and hardware; fault isolation and data recovery; and
replication with DD Archiver.

For a long time, EMC has recommended archive before backup as a best practice
for data management. There are several reasons to archive data, including to free up primary storage,
comply with regulations or policies, preserve intellectual property, better manage data, and extend
primary storage. Archiving data by moving it from primary storage frees up
precious capacity. Often, a positive side effect of freeing up capacity on primary storage is
that the backup process includes less data and is therefore faster. A faster process means a
shorter backup window and faster recoveries. By implementing this practice, each process is
used and optimized for what it was designed to do: the backup process is used for data
protection and disaster recovery, while the archive process is used for data retention. When
possible, implementing the archive-before-backup practice leads to better utilization of resources and
achievement of data management objectives.

In practice, however, while every major enterprise implements data protection processes, very few
have archive processes as widely or uniformly deployed. Everyone backs up their data, while few
have archiving processes implemented. Data protection as backup is almost always seen as a
must-have requirement, with a lot of urgency, attention, and budget dedicated to it. Occasionally,
archiving can have a similar level of mind share relative to data protection, but mostly for narrow
data sets, typically associated with retaining the e-mail of a few employees.
The more common situation is that there is little interest in an enterprise-wide method to archive
data, and the complexity of the implementation makes the effort daunting. As a consequence,
the most common way to comply with a newly imposed retention period is to extend the retention
of backups instead of starting a new archiving project. Administrators leverage the fact that all
data sets are being backed up. This method is by no means optimal. The market for archiving
applications and the penetration of archiving processes in the enterprise will continue to grow
and evolve. At some point, archiving applications will be predominant in the enterprise, and formal
archive processes will enable enterprises to archive before backup. In the meantime, enterprises
are best served by deploying a bridge platform that can be cost-effective and support their
extended backup retention workloads.

The EMC Data Domain Archiver system meets the growing need for long-term retention of
deduplicated data. The Data Domain Archiver supports: incremental capacity growth; existing
archive, backup, and data management applications and existing policies; and access to all data in the
system many times faster than access to data archived to tape and possibly vaulted. As shown in
the illustration, data can be sent from archive servers, backup servers, file servers, or directly from
users to the DD Archiver. The illustration shows four archive storage units, each with its own separate
store of recent, old, older, and oldest data. All the data, from the most recent to the oldest, is
available for access.

DD Archiver is the first long-term retention system for backup and archive specifically developed
for longer-term storage of larger amounts of backup and archive data that is infrequently
accessed. It incorporates a large tier of storage behind a standard Data Domain controller. It is
optimized for cost-effective retention of data, with stronger compression algorithms used in the
archive tier for additional data size reductions. Because data in the archive tier is always
accessible, the risks and delays associated with accessing data from tape archives can be
eliminated.

One main benefit of DD Archiver is that one controller is shared by up to 24 ES20 expansion
shelves. The system supports up to 570 TB of usable storage and retains up to 28.5 PB of data
(that's the logical capacity) for as low as $0.50 per gigabyte. The system ingests data at a rate of up
to 8.1 TB per hour. It also lowers data retention costs with just one deduplication engine (the
controller) shared over up to twice as many storage shelves as other Data Domain single-controller
models.
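
As a quick sanity check on how the quoted figures relate, the arithmetic below (in Python) derives the overall data reduction that the logical and usable numbers imply, and the usable capacity per shelf at the maximum configuration. The variable names are illustrative, and the implied ratio is only what these "up to" figures work out to, not a guaranteed result for any particular data set.

```python
usable_tb = 570     # usable storage quoted above
logical_pb = 28.5   # logical capacity quoted above
shelves = 24        # maximum number of ES20 shelves quoted above

logical_tb = logical_pb * 1000          # base-10 units: 1 PB = 1000 TB
implied_ratio = logical_tb / usable_tb  # overall reduction the figures imply
usable_per_shelf_tb = usable_tb / shelves

print(implied_ratio)        # 50.0
print(usable_per_shelf_tb)  # 23.75
```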

As shown in this slide, data can be sent from archive servers, backup servers, file servers, or
directly from users to the DD Archiver. The data first lands in the active tier. The controller, also
sometimes referred to as a data-less head, or DLH, deduplicates and compresses the data inline
and stores the data on one or more connected expansion shelves. The controller and expansion
shelves together make up the active tier. Additional shelves can be added later to the active unit
if the amount of data that needs to be deduplicated increases. Data remains in the active tier for
a customer-configured amount of time: usually a period between weeks and months, up to a
maximum of 90 days.

DD Archiver introduces the concept of an archive tier. Based on a customer-configured data
movement schedule, data can be moved from the active tier to the first archive unit in the archive
tier. Note that data movement is often referred to as data migration. Data movement is based on
the last modification date and the data movement schedule. Note that the data is only written
into one archive unit. The Data Invulnerability Architecture checks files after they are written into
the archive tier. A file is removed from the active tier only after DIA verifies that the file has been
correctly copied to the archive tier. Data in an archive tier is retained for much longer periods of
time than in the active tier, usually for months to years.
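
The data movement rule described above can be sketched as a small model. This is a simplified illustration, not DD OS code: the names FileEntry, verify_copy, and run_data_movement are hypothetical, the age threshold stands in for the customer-configured schedule, and the verification step is a placeholder for the check performed by the Data Invulnerability Architecture.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileEntry:
    name: str
    last_modified: datetime
    tier: str = "active"


def verify_copy(entry: FileEntry) -> bool:
    """Placeholder for the post-write integrity check; always succeeds here."""
    return True


def run_data_movement(files, now, age_threshold, verify=verify_copy):
    """Move files whose last modification is at least age_threshold old.

    A file leaves the active tier only after verification succeeds,
    mirroring the rule that removal follows a verified copy.
    """
    moved = []
    for entry in files:
        if entry.tier != "active":
            continue
        if now - entry.last_modified < age_threshold:
            continue  # modified too recently; stays in the active tier
        if verify(entry):
            entry.tier = "archive"
            moved.append(entry.name)
    return moved
```

Passing the verification function as a parameter makes it easy to see that a failed check leaves the file in the active tier.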

One or multiple shelves can be configured as an archive unit. A full archive unit is automatically
sealed; data is not written into a sealed unit, but files can be deleted. For fault isolation,
namespace information and system files are copied into the sealed unit so data can be recovered
even if other parts of the system are lost. Data in a sealed archive unit remains accessible.

After one archive unit is filled and sealed, the DD Archiver begins to write data into a
second archive unit.

When the second archive unit is filled and sealed, the DD Archiver begins to write data
onto a third archive unit.

And so on.
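
The fill-and-seal sequence can be modeled in a few lines. This is a toy sketch under stated assumptions: the class names are hypothetical, capacities are abstract units, and a unit is treated as full (and sealed) as soon as the next file does not fit, which is a simplification of whatever rule the real system applies.

```python
class ArchiveUnit:
    def __init__(self, capacity):
        self.capacity = capacity
        self.files = {}        # file name -> size
        self.sealed = False

    def used(self):
        return sum(self.files.values())


class ArchiveTier:
    def __init__(self, unit_capacities):
        self.units = [ArchiveUnit(c) for c in unit_capacities]

    def write(self, name, size):
        """Write to the first unsealed unit, sealing a unit the file cannot fit in."""
        for index, unit in enumerate(self.units):
            if unit.sealed:
                continue  # sealed units never accept new data
            if unit.used() + size > unit.capacity:
                unit.sealed = True
                continue
            unit.files[name] = size
            return index
        raise RuntimeError("no unsealed archive unit has room")

    def delete(self, name):
        """Files can still be deleted from a unit after it is sealed."""
        for unit in self.units:
            if name in unit.files:
                del unit.files[name]
                return True
        return False
```

Note that deleting a file from a sealed unit frees space in the model but does not reopen the unit for writes.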

DD Archiver hardware is a specific configuration of the DD860 with 72 GB of RAM, one NVRAM
card, and three quad-port SAS cards to connect 1 to 24 shelves. The system also includes two 1
Gb Ethernet ports on the motherboard and up to two optional 1 Gb or 10 Gb Ethernet cards for
additional connectivity.

Looking at the DD Archiver back view, the tables and diagram here show the slot assignments
and ports. Take a minute to review these details before proceeding.

DD Archiver uses sophisticated fault isolation. Each archive unit is self-contained. Data in each
archive unit is deduplicated separately from data in other archive units. When an archive unit
fills up, namespace information and system files are copied into the unit before it is sealed.
Data in a sealed archive unit can then be recovered even if other parts of the DD Archiver system
are lost. The DD Archiver can tolerate partial failures so operations can continue as usual. Unlike
in a non-DD Archiver Data Domain system, the DD Archiver file system can come up even if one or
more archive units are not available because each archive unit is self-contained. Here's an
illustration of the process: failure of a sealed archive unit affects only its contents; if a sealed
archive unit fails the rest of the DD Archiver system continues to work as usual.
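
The effect of self-contained units can be illustrated with a small model. It is a sketch under assumptions, not a description of the actual file system: each unit is represented as carrying its own namespace, and the class and exception names are invented for the example.

```python
class UnitUnavailable(Exception):
    pass


class SealedUnit:
    def __init__(self, name, namespace):
        self.name = name
        self.namespace = dict(namespace)  # path -> contents, kept inside the unit
        self.available = True


class ArchiverFileSystem:
    def __init__(self, units):
        self.units = list(units)

    def online_units(self):
        """The file system comes up with whichever units are available."""
        return [u.name for u in self.units if u.available]

    def read(self, path):
        for unit in self.units:
            if path in unit.namespace:
                if not unit.available:
                    raise UnitUnavailable(f"{path} is on failed unit {unit.name}")
                return unit.namespace[path]
        raise FileNotFoundError(path)
```

In this model a failed unit only affects reads of its own files; reads served by other units keep working.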

Here's an illustration of the process.

Failure of a sealed archive unit affects only its contents.

If a sealed archive unit fails, the rest of the DD Archiver system continues to work as
usual.

If a replica exists, a single failed archive unit can be replaced with a new archive unit and
the data can be restored onto the new archive unit from its replica. If no replica exists and
if access is attempted on a file that is no longer in the system, an error is returned or the
filesystem may restart.
Data Domain support should be called to prune the missing files. See the DD 860
Archiver Administration Guide for more details.

If the controller fails, a new controller can be swapped into the DD Archiver (called a
head swap).

If the active tier fails, the failed active tier can be replaced by a new active tier.

After the new active tier is configured, the sealed archive units can be reassembled
into the new DD Archiver by Data Domain support.

This illustration shows the basic topology for replication with the Archiver system. Note that the
units at the remote DR site on the right mirror the units at the data center. Collection replication is
performed through the controller at the data center to the controller at the DR site. If a single unit
goes down, a Data Domain support engineer can perform single unit recovery from the matching
unit at the DR site. Replication at a DR site is highly recommended with DD Archiver.

This lesson describes Global Deduplication Array, or GDA.
In this lesson you will learn about: the role played by GDA in large and distributed data centers;
an overview of GDA and its benefits, with a description of features and hardware; and VTL context for
GDA.

Key challenges in large and distributed data centers include lack of backup
storage infrastructure scalability in which large backup policies do not fit in a
single system. The use of multiple systems leads to inefficient deduplication and
it is also difficult to maintain balanced utilization of storage resources. A lack of
scalable and network efficient disaster recovery solutions leads to: complexity of
managing a tape-based DR solution, with limited and cost prohibitive network
infrastructure to handle DR via replication, and complex end-to-end backup
lifecycle. Administration involves backup administration, capacity planning, and
system management, which is exacerbated by multiple storage systems with the
lack of integration and backup applications to centrally manage the lifecycle.

Data Domain Global Deduplication Array is designed to address these challenges. The Global
Deduplication Array is a dual-controller deduplication storage system with global deduplication in
a single namespace across two DD890 controllers, providing massive scalability and simple
management. The system is the industry's fastest in-line deduplication storage system. It is the
largest Data Domain system. GDA provides up to 26.3 TB per hour and is now up to seven times
faster than its primary competitor IBM ProtecTIER. In addition Global Deduplication Array offers
up to 28.5 PB of logical capacity. Enterprises with hundreds of terabytes of data can use Global
Deduplication Array to store and protect months of backups in the same number of floor tiles that
would normally provide only a few days of tape staging.

This is a partial list of the various backup applications that can now be used with GDA. In the case
of EMC NetWorker, Symantec NetBackup and Backup Exec, GDA supports DD Boost or VTL access.
DD Boost is preferred but the VTL option is available if there is a customer with preference for VTL
over Fibre Channel.

VTL support makes GDA compatible with 100% of leading backup software applications that
support VTL, including IBM Tivoli Storage Manager, with the VTL license. Four 8 Gb Fibre Channel
HBA cards, two per controller, can be utilized for fast VTL access.

VTL support also means the GDA provides a much larger total deduplication storage capacity for
VTLs. The GDA VTL now supports double the number of virtual tape drives supported by any other
Data Domain system, with support for 512 virtual tape drives. The 256 maximum per controller
limit is the same as for the DD890 and the DD880 systems. The GDA also supports a large
number of active backup and restore streams. The GDA supports a maximum total of 270 write
streams for each GDA controller. The maximum number of write streams is 256, compared to a
maximum of 180 write streams on single controller models.

This slide illustrates the back view of the GDA. In addition to the slots dedicated to the SAS and
NVRAM cards, in a GDA configuration of two DD890 controllers, slot 1 in both controllers is
reserved for a mandatory 10 GbE optical card. This card is used for the GDA connectivity between the
controllers. Please take a moment to review before proceeding.

This lesson describes features for managing Data Domain systems.
In this lesson you will learn about: system management interfaces; the Enterprise Manager; Data
Domain System Management Framework; and lights out management.

Data Domain systems are managed using sophisticated tools, including EMC Data Domain
Enterprise Manager, a web-based graphical user interface. Enterprise Manager provides
simple configuration wizards and displays of resource usage and performance reporting, with a
single interface to manage multiple systems. The Data Domain Management Framework involves
the CLI, or command line interface, as well as infrastructure with rich functionality for script
automation, SSH-enabled remote administration through a protocol known as IPMI, or Intelligent
Platform Management Interface, SNMP-based alert monitoring, integration with syslog monitors,
comprehensive product security, and integrated email home functionality.

Data Domain Enterprise Manager provides direct access to system configuration and
management features in an easy-to-use interface. The web-based GUI covers configuration,
management, monitoring and reporting, and administration.

The Data Domain Management Framework includes the command line interface (CLI) framework,
with complete command coverage for all DD OS functionality, and enhanced auto-completion
functionality. The framework allows you to configure alerts and monitoring with SNMP MIB
support for DD system serial number, FileSystemClean and FileSystemSpace; configurable alert
classes and severity levels with assignment of alert notifications to users via mailing lists. The
framework also includes role-based access control with security officer roles for encryption of
data at rest, and the support includes 24/7 Data Domain support.

Many customers deploy Data Domain systems at remote locations that do not have IT
staff on site. Customers want the following remote management options without having
to send someone to the site: checking of power status; turn power on or off; or power
cycle systems that are hung; and access to system serial console to run off-line
diagnostics and repair problems.

Remote management of systems, whether or not they are powered on, is called lights out
management.
Data Domain systems provide the ability to do the following over the network: manage power
with intelligent platform management interface or IPMI commands; access the serial console with
serial over LAN or SOL.

This illustration shows the basic architecture involved in lights out management. As indicated in
the remote system at the top, the baseboard management controller, or BMC, in each Data Domain
system has IPMI version 2.0 firmware. IPMI is an industry-standard protocol designed for remote
management of systems. An administrative user logged in to one of the local systems can issue
power management commands and get SOL access for running diagnostics and troubleshooting
over the WAN to manage the remote target. Remote power management includes: power on the
Data Domain system after power outage; power cycle after crash; power off for power savings on
systems not currently in use; and the ability to obtain power status. SOL gives remote access to
the target system's serial console.
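
For readers unfamiliar with IPMI, the sketch below builds command lines for ipmitool, a generic open-source IPMI client. It is an illustration of the protocol operations only: the course does not say that Data Domain administrators use ipmitool (DD OS provides its own commands, which are not shown here), and the host and user names are hypothetical. The code only constructs argument lists and does not contact any system.

```python
def ipmi_command(host, user, *action):
    """Build an ipmitool argument list for an IPMI 2.0 (lanplus) session.

    The -E option tells ipmitool to read the password from the
    IPMI_PASSWORD environment variable rather than the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *action]


def power_status(host, user):
    return ipmi_command(host, user, "chassis", "power", "status")


def power_cycle(host, user):
    return ipmi_command(host, user, "chassis", "power", "cycle")


def serial_console(host, user):
    """Serial over LAN (SOL): attach to the target's serial console."""
    return ipmi_command(host, user, "sol", "activate")


# Hypothetical BMC address and account, shown only to print a sample command.
print(" ".join(power_status("bmc.example.com", "admin")))
```

A resulting list could be passed to subprocess.run on a machine that has ipmitool installed and network access to the BMC.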

In this module you learned about:
Current product line
Common hardware features
Overview of internal and external storage
DD800 series appliances
DD600 series appliances
DD140 remote office appliance
DD Archiver
Global Deduplication Array (GDA)
System management features

This module provides an overview of Data Domain software options.
In this module you will learn about: Data Domain license options including: DD Boost; the
virtual tape library with two variations; DD replicator; DD encryption; and DD retention
lock.

This slide shows all the licenses for software options and for hardware and capacity. Take a
minute to review this table before proceeding.

This lesson provides an introduction to the DD Boost option.
In this lesson you will learn about: DD Boost benefits; the deduplication process with and
without a feature of DD Boost known as Distributed Segment Processing, or DSP; replication
awareness; and advanced load-balancing and link failover.

EMC Data Domain Boost extends optimization capabilities of Data Domain solutions. The main
advantages for DD Boost are: improved throughput; backup server managed file replication; and
backup server replication awareness. How is this possible, and why is this important in a backup
recovery and archive environment?

The benefits of DD Boost include improved throughput. This improved throughput when retaining
data backups is accomplished with the OpenStorage Technology (OST) protocol and
Data Domain Distributed Segment Processing (DSP). OST, when compared to the CIFS and NFS
protocols, has faster throughput. Distributed Segment Processing offloads part of the
deduplication process to the backup server. Why is backup server managed file replication
beneficial? Without DD Boost, replication is initiated and managed by the Data Domain system.
The backup server is not aware of the data written on the replica Data Domain system, system B
in the diagram. With DD Boost, the backup server can now control and manage file replication.
Why is this backup server awareness of the replica Data Domain system beneficial? Because DD
Boost allows for the backup server to be aware of the replica system, it can directly restore and
recover from the replica. Prior to DD Boost, the backup server didn't control or manage replica
data. This made restoring the data from replica a manual process.

Let's compare the deduplication process with and without DSP. In this illustration, the deduplication
process is performed on the Data Domain system without DSP. Without DSP enabled, the entire
deduplication process is performed on one system. The Data Domain system goes through the process of
segmenting, fingerprinting, filtering, compressing, and finally writing the data to disk. The backup
server has no part in the deduplication process.

With DSP enabled, the process is distributed between the backup server and the Data Domain
system. The creation of segments and fingerprints, along with the segment compression, is
performed on the backup server. The Data Domain system filters the fingerprints and writes the
segments to the Data Domain system disk. Note that the overall deduplication process remains
the same with or without DSP. Enabling DSP merely changes which component, whether the
backup server or the Data Domain system, performs parts of the process.
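
The division of work with DSP enabled can be sketched as a toy model. This is not the DD Boost implementation: it assumes fixed-size segments, SHA-256 fingerprints, and zlib compression purely for illustration, and the class and function names are invented.

```python
import hashlib
import zlib

SEGMENT_SIZE = 8  # tiny fixed size, for illustration only


class DedupServer:
    """Plays the Data Domain role: filters fingerprints and stores new segments."""

    def __init__(self):
        self.store = {}  # fingerprint -> compressed segment

    def filter_new(self, fingerprints):
        """Return the fingerprints the server has not stored yet."""
        return {fp for fp in fingerprints if fp not in self.store}

    def write(self, fingerprint, compressed_segment):
        self.store[fingerprint] = compressed_segment


def backup_with_dsp(data: bytes, server: DedupServer) -> int:
    """Plays the backup server role; returns how many segments were sent."""
    segments = [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]
    fingerprints = [hashlib.sha256(s).hexdigest() for s in segments]
    wanted = server.filter_new(fingerprints)   # filtering happens on the server
    sent = 0
    for fp, segment in zip(fingerprints, segments):
        if fp in wanted:
            server.write(fp, zlib.compress(segment))  # compress on the client side
            wanted.discard(fp)                         # send each new segment once
            sent += 1
    return sent
```

Because only segments the server reports as new are compressed and sent, a repeated backup of unchanged data transfers no segment data in this model.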

The DD Boost advanced load-balancing and link failover functionality supports transparent
failover of jobs. In-flight jobs on failed ports on the DD system are transparently moved to healthy
links on the Data Domain system, and subsequent jobs are sent to the healthy links. This further
improves the enterprise robustness of the backup environments that use Data Domain Boost.
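
The failover behavior can be pictured with a simple reassignment routine. This is a hypothetical sketch: the function name, the least-loaded placement policy, and the link names are assumptions made for the example, not details taken from DD Boost.

```python
def rebalance(assignments, healthy_links):
    """Return a new job -> link mapping after a link failure.

    Jobs already on healthy links stay where they are; jobs on failed
    links are moved to whichever healthy link currently has the fewest jobs.
    """
    if not healthy_links:
        raise RuntimeError("no healthy links available")
    load = {link: 0 for link in healthy_links}
    result = {}
    for job, link in assignments.items():
        if link in load:
            result[job] = link
            load[link] += 1
    for job, link in assignments.items():
        if link not in load:
            target = min(healthy_links, key=lambda name: load[name])
            result[job] = target
            load[target] += 1
    return result
```

New jobs submitted after the failure would be placed the same way, since only healthy links appear in the load table.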

This lesson provides a brief introduction to DD virtual tape library.
In this lesson you will learn about: the challenges with traditional VTLs; DD VTL benefits; DD VTL
deployment; DD VTL specifications; support for IBMi environments with VTL; and enhanced
support for NAS environments.

The challenges with traditional VTLs include: disk costs too expensive for long-term retention;
backup data that's too large to replicate over the WAN to a DR site; and the requirement of
creating physical tapes to support post backup operations.

To summarize the benefits of Data Domain VTL, it starts with economics. There is less disk to
resource and less to manage. The CPU-centric deduplication approach of SISL allows the system
to be simpler to manage, as well as easier to provision. In addition, Data Domain is more mature
and flexible than most of its competitors. Finally, because of their resilience and replication
flexibility, Data Domain systems are highly reliable when deployed for VTL environments.

Data Domain systems support backups over the SAN and LAN. The backup application manages
all data movement to and from Data Domain systems. The backup application manages physical
tape creation. Data Domain replicator software manages virtual tape replication, and Data
Domain Enterprise Manager is used to configure and manage tape emulations.

This slide displays specifications for VTL deployments. Take a minute to review these
numbers before proceeding.

Using an enhancement to the VTL software option, Data Domain systems provide the
fastest backup throughput available with up to 8.1 TB per hour ingest for IBMi operating
environments. The licensable feature is available for both the DD600 and the DD800
series systems, and offers broad scalability up to 14.2 PB of logical capacity. This
capability is also available as an upgrade. The solution is available for environments that
use Backup Recovery and Media Services (also known as BRMS) as the backup
application to protect data stored on IBMi powered systems running IBMi versions 5.4,
6.1 or 6.1.1. The Data Domain system supports simultaneous use with other data access
protocols including Data Domain Boost, CIFS and NFS, and VTL to provide consolidated
protection for IBMi and open systems environments.

DD OS 5.0 and later includes an NDMP tape server feature that simplifies the deployment of Data
Domain systems for network attached storage or NAS devices. The feature is ideal to provide data
protection for NAS systems and data centers or remote offices that use an Ethernet infrastructure,
and complements the existing VTL over Fibre Channel capability. The capability is included with
purchase of the VTL software option, bundled with Fibre Channel HBA's, and is available to the
VTL installed base with an upgrade to DD OS 5.0 or greater.

This lesson provides a brief introduction to DD replicator.
In this lesson you will learn about: the benefits of DD replicator licensing, and encrypted
replication.

Because there's a variety of IT centers and visions for how long vaulting should work, with Data
Domain products EMC offers a wide variety of options for deploying different topologies
cost-effectively, from centralized support of remote office backup over a WAN, to protecting larger peer
data centers. All of these configurations are available.

The main benefits provided by DD replicator licensing include network efficient asynchronous IP
replication; easy integration with existing backup and archive applications and environments;
and an extensive feature set for real-world enterprises, providing the greatest deployment flexibility.

Data Domain replicator licensing supports replication and includes the addition of an
SSL-based encrypted replication capability. This functionality is included at no additional
charge and available to customers that have a DD replicator license and a valid support
contract. Once the source and destination systems have authenticated, secure replication
connections are established using the standard SSL protocol, which encrypts data and
metadata using 256-bit AES key strength. The encrypted replication has a minimal
performance impact, and the capability works concurrently with DD encryption of data at
rest.

This lesson provides a brief introduction to DD encryption.
In this lesson you will learn about: security challenges in backup; the benefits of DD encryption
licensing; encrypted replication; challenges of encrypting with deduplication; Data Domain
in-line encryption; key management and data integrity; and interoperability and transparency with
DD encryption.

Increased levels of publicized loss of tape and disk-based backups, coupled with compliance
mandates are driving the need for customers to encrypt their data at rest. The Data Domain
encryption software option provides a way for organizations to secure the data that resides on
their Data Domain systems. There are two types of encryption. Encryption for data in flight means
encryption as data is transported. Only at the source and the destination is the data's true
meaning apparent. Encryption for data at rest involves data that is physically stored in an
encrypted manner, such that if the data is removed or copied and taken to another
environment, it cannot be accessed without decrypting it.

The benefits and goals of Data Domain encryption are to protect against theft or loss of the
system in transit; to protect against theft or loss of physical storage media; to allow failed drives
to be returned to factory securely; and to provide adequate data encryption security to meet basic
compliance regulations. DD encryption encrypts all system data at rest in the physical storage and
provides adequate encryption key management to ensure key integrity and security.

When deduplication is involved, there are challenges. There are three approaches: encrypt before
deduplication, which leads to poor compression; encrypt after deduplication with an adjunct
gateway solution (additional hardware is required in this circumstance and it's complex to
manage); or integrated deduplication and encryption. This is the best of both worlds with security
and space savings, but it's not easily implemented and requires an architecture suited to in-line
deduplication such as that provided by Data Domain systems.
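
Why encrypting before deduplication hurts data reduction can be shown with a small experiment. The cipher below is a toy built from SHA-256 for illustration only; it is not secure and is not the algorithm used by any Data Domain product. The point it demonstrates is general: when each encryption uses a fresh random nonce, identical plaintext segments become different ciphertexts, so their fingerprints no longer match.

```python
import hashlib
import os


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher for illustration only. NOT secure.

    A fresh random nonce per call mimics how real encryption modes make
    identical plaintexts produce different ciphertexts.
    """
    nonce = os.urandom(16)
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def unique_fingerprints(segments) -> int:
    """Count distinct segments the way a deduplication filter would."""
    return len({hashlib.sha256(s).digest() for s in segments})


key = b"example-key"
segments = [b"same backup segment"] * 3   # three identical segments

dedup_first = unique_fingerprints(segments)
encrypt_first = unique_fingerprints([toy_encrypt(key, s) for s in segments])
print(dedup_first, encrypt_first)
```

Deduplicating first finds a single unique segment to store, while encrypting first leaves three apparently unrelated segments, barring an astronomically unlikely nonce collision.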

Data Domain's in-line encryption provides real-time data encryption with deduplication, and
immediate data protection. There's no post-processing encryption, which is safer and more secure.
Data is encrypted immediately. There's no window of exposure and the process is predictable
and simple. Data Domain in-line encryption also involves SISL architecture, leveraged for
optimized encryption with the same level of deduplication provided to non-encrypted data. The
software-based approach requires no additional hardware.

Key management and data integrity involve an in-box design for simple key management; a single
key for the entire system; robust protection against accidental key loss, with passphrase
protection of encryption keys; and a locking file system. Data Domain Data Invulnerability
Architecture with RAID 6 provides added protection.

The Data Domain encryption software option greatly simplifies the encryption management as
encryption is done on the Data Domain system transparently to the applications writing to it. This
allows the ultimate flexibility in selecting and changing your applications without impacting the
encryption process. Complete Protocol and Application transparency allows encryption of all data
types via any backup application, archiving application or simple file sharing using NAS
interfaces.

This lesson provides a brief introduction to DD retention lock.
In this lesson you will learn about: DD retention lock benefits and the added security that comes
with combining DD retention lock with DD encryption.

IT governance policies require that critical business records, files, or e-mail be retained for
specific periods of time. Ensuring that these policies are being implemented and enforced is an
ongoing challenge for IT staffs. Retention lock software enables users to easily implement
deduplication with file locking to satisfy IT governance and compliance policies. DD retention lock
includes electronic data shredding. Electronic data shredding is performed on a per file basis,
ensuring that deleted files have been disposed of in an appropriate and permanent manner. DD
retention lock includes enforced retention for active archiving. Files are retained on disk in a
non-rewritable and non-erasable format. Retention parameters can be set on a file-by-file basis.
Minimum and maximum retention periods can be set globally. DD retention lock also involves
operational flexibility: changing regulations and policies can be addressed quickly.
Administrators can easily react to changes in security or retention policies.
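
The combination of per-file retention and global minimum and maximum periods can be sketched as follows. This is a hypothetical model, not the DD retention lock interface: the class and method names are invented, and rejecting an out-of-range period (rather than adjusting it) is an assumption made for the example.

```python
from datetime import datetime, timedelta


class RetentionError(Exception):
    pass


class RetentionLockedStore:
    def __init__(self, min_period: timedelta, max_period: timedelta):
        self.min_period = min_period    # global minimum retention period
        self.max_period = max_period    # global maximum retention period
        self.locked_until = {}          # file name -> expiry time

    def lock(self, name, period, now):
        """Lock one file; the period must fall inside the global bounds."""
        if not (self.min_period <= period <= self.max_period):
            raise RetentionError("period outside global minimum/maximum")
        self.locked_until[name] = now + period

    def delete(self, name, now):
        """Refuse deletion until the file's retention period has expired."""
        expiry = self.locked_until.get(name)
        if expiry is not None and now < expiry:
            raise RetentionError(f"{name} is locked until {expiry}")
        self.locked_until.pop(name, None)
        return True
```

Passing the current time explicitly keeps the model easy to exercise with fixed dates.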

Combined with DD encryption, DD retention lock is designed to provide the highest level of data integrity and security,
working transparently with backup and archive applications as well as NAS environments working over Ethernet.

In this module you learned about: Data Domain license options; DD Boost; DD virtual tape library
and two variations; DD replicator; DD encryption; and DD retention lock.

To learn about EMC Data Domain products and solutions, consult the following resources.
For product information, including overviews, data and specification sheets, and white papers,
visit the link shown at EMC’s website.
For product documentation, knowledge base articles, and additional white papers, visit
my.datadomain.com. This site requires a login.
To find and enroll in follow-on training covering a wide range of topics including system
installation and maintenance, integration and implementation, administration and
troubleshooting, visit EMC Education Services, using the link shown. Search for Data Domain to
view a complete list of offerings.

This course covered the topics shown.
This concludes the training. Proceed to the course assessment on the next slide.

