Presentation Download
Since 2009, there have been a number of changes and advancements in the storage environment. Data deduplication was the #1 storage technology being evaluated by storage professionals last year. This presentation download will explain how to leverage data deduplication technology to benefit your organization. Download to learn the answers to the following questions: How do recent major acquisitions affect the options in the dedupe marketplace? Is everyone doing dedupe now? Are all dedupe products roughly equivalent, or are there advantages to certain approaches? These questions and more will be answered by storage expert W. Curtis Preston in this Dedupe School seminar presentation download.
A Little About Me
When I started as the backup guy at a $35B company in 1993:
Tape drive: QIC 80 (80 MB capacity)
Tape drive: Exabyte 8200 (2.5 GB & 256 KB/s)
Biggest server: 4 GB ('93), 100 GB ('96)
Entire data center: 200 GB ('93), 400 GB ('96)
My TiVo now has 5 times the storage my data center did!
Consulting in backup & recovery since '96
Author of O'Reilly's Backup & Recovery and Using SANs and NAS
Webmaster of BackupCentral.com
Founder/CEO of Truth in IT
Follow me on Twitter: @wcpreston
Agenda
Understanding Deduplication
Using Deduplication in Backup Systems
Using Data Reduction in Primary Systems
Recent Backup Software Advancements
Backing Up Virtual Servers
Backups on a Budget
Stump Curtis
Session 1
Understanding Deduplication
Why Disk?
First, a little history
More History
Plan A: Stage to disk, spool to tape
Pioneered by IBM in the '90s, widely adopted in the late '00s
A large, very fast virtual disk as a caching mechanism in front of tape
Only need enough disk to hold one night's backups
Helps backups; does not help restores
Plan B: Backup to disk, leave on disk
AKA the early VTL craze
Helps backups and restores
Disk was still way too expensive to make this feasible for most people
Plan C: Dedupe
It's perfect for traditional backup
Fulls back up the same data every day/week/month
Incrementals back up the entire file when only one byte changes
Both back up a file 100 times if it's in 100 locations
Databases are often backed up full every day
Tons of duplicate blocks!
Average actual reduction of 10:1 and higher
It's not perfect for everything
Pre-compressed or encrypted data
File types that don't have versions (multimedia)
Naysayers
Eliminate all but one copy? No, just eliminate duplicates per location
What about hash collisions? More on this later, but this is nothing but FUD
If you're unconvinced, use a delta differential approach
Doesn't this have immutability concerns? Everything that changes the format of the data has immutability concerns (e.g. sector-based storage, tar, etc.)
What about the dedupe tax? Let's talk more about this one in a bit
Is There a Plan D?
Some pundits/analysts think dedupe (especially target dedupe) is a band-aid and will eventually be done away with via backup-software-based dedupe, delta backups, etc. Maybe this will happen in a 3-5 year time span, maybe it won't. (In fact, some backup software companies will tell you they don't need no stinking dedupe appliances.) That's still no argument for not moving on what's available to solve your problems now
Length of retention (longer retention = more dupes)
Redundancy in a single full backup (if your product notices)
Things that confuse dedupe:
Encrypting data before the dedupe process sees it
Compressing data before the dedupe process sees it
Multiplexing to a VTL
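To make the compression point concrete, here is a small Python sketch using synthetic data and an assumed 4 KB chunk size (nothing vendor-specific): two backups that differ by a single byte share almost all of their raw chunks, but almost none once each stream is compressed independently before dedupe sees it.

```python
# Illustrative only: shows why compressing (or encrypting) data before the
# dedupe process sees it destroys most duplicate chunks. Chunk size and the
# synthetic data are assumptions for the demo.
import hashlib
import zlib

CHUNK = 4 * 1024  # assumed 4 KB chunks

def dedupe_fraction(reference: bytes, new: bytes) -> float:
    """Fraction of new's chunks whose SHA-1 already appears in reference."""
    ref_hashes = {hashlib.sha1(reference[i:i + CHUNK]).hexdigest()
                  for i in range(0, len(reference), CHUNK)}
    new_chunks = [new[i:i + CHUNK] for i in range(0, len(new), CHUNK)]
    dupes = sum(1 for c in new_chunks
                if hashlib.sha1(c).hexdigest() in ref_hashes)
    return dupes / len(new_chunks)

if __name__ == "__main__":
    last_night = b"some fairly compressible backup data " * 200000
    tonight = bytearray(last_night)
    tonight[500] ^= 0xFF                      # a single changed byte
    tonight = bytes(tonight)
    print("raw:        %.1f%% of tonight's chunks dedupe"
          % (100 * dedupe_fraction(last_night, tonight)))
    print("compressed: %.1f%% of tonight's chunks dedupe"
          % (100 * dedupe_fraction(zlib.compress(last_night),
                                   zlib.compress(tonight))))
```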
Delta differential: ExaGrid, IBM ProtecTIER, Ocarina, SEPATON
Some systems may use a hybrid approach
Chunking/Hashing Method
Slice all data into segments or chunks
Run each chunk through a hashing algorithm (SHA-1)
Check the hash value against all other hash values
A chunk with an identical hash value is discarded
Will find redundant blocks between files from different file systems, even different servers
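A minimal sketch of that chunk-and-hash approach, assuming fixed-size 8 KB segments and an in-memory chunk store (real products use their own chunk sizes, variable-length chunking, and on-disk indexes):

```python
# Illustrative chunk-and-hash dedupe: store each unique chunk once and keep a
# "recipe" of fingerprints so the original stream can be rebuilt (rehydrated).
import hashlib

CHUNK_SIZE = 8 * 1024  # assumed 8 KB segments

def dedupe_stream(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into chunks, keep only chunks whose SHA-1 is new,
    and return the fingerprint list needed to rebuild the data."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha1(chunk).hexdigest()
        if fingerprint not in store:   # new chunk: store it
            store[fingerprint] = chunk
        recipe.append(fingerprint)     # duplicate chunk: reference only
    return recipe

def rehydrate(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original stream from its chunk fingerprints."""
    return b"".join(store[fp] for fp in recipe)

if __name__ == "__main__":
    store: dict[str, bytes] = {}
    full_backup = b"A" * 64 * 1024                   # highly redundant data
    recipe = dedupe_stream(full_backup, store)
    print(len(recipe), "chunks referenced,", len(store), "chunks stored")
    assert rehydrate(recipe, store) == full_backup
```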
Tonight's backup of Elvis is compared byte-by-byte to last night's backup of Elvis, and redundant segments are found
Most used method, with the most mileage
Some are concerned about hash collisions (more on this later)
Compares everything to everything, therefore gets more dedupe out of similar data in dissimilar datasets (e.g. production and test copies of the same data)
Delta Differentials
Faster than hashing
No concern about hash collisions
Only compares like backups, so it will get no dedupe on similar data in dissimilar datasets, but it does get more dedupe on the same data
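A rough sketch of the delta-differential idea, comparing tonight's backup of an object against last night's backup of the same object, segment by segment; the segment size and the in-memory comparison are illustrative assumptions, not any vendor's design:

```python
# Illustrative delta differential: only segments that changed since the
# previous backup of the SAME object are kept; identical segments are discarded.
SEGMENT_SIZE = 8 * 1024  # assumed segment size

def delta_differential(last_night: bytes, tonight: bytes):
    """Return (unchanged_count, changed_segments), where changed_segments
    maps segment index -> the new segment bytes to store."""
    changed = {}
    unchanged = 0
    n_segments = (len(tonight) + SEGMENT_SIZE - 1) // SEGMENT_SIZE
    for i in range(n_segments):
        new_seg = tonight[i * SEGMENT_SIZE:(i + 1) * SEGMENT_SIZE]
        old_seg = last_night[i * SEGMENT_SIZE:(i + 1) * SEGMENT_SIZE]
        if new_seg == old_seg:      # byte-for-byte identical: discard
            unchanged += 1
        else:                       # changed: this delta is what gets stored
            changed[i] = new_seg
    return unchanged, changed

if __name__ == "__main__":
    last_night = bytes(64 * 1024)
    tonight = bytearray(last_night)
    tonight[100:110] = b"x" * 10    # one small change since last night
    unchanged, changed = delta_differential(last_night, bytes(tonight))
    print(unchanged, "segments unchanged,", len(changed), "segments stored")
```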
What will you get? Only testing with your data will answer that question.
10^-15: Odds of a single disk writing incorrect data and not knowing it (Undetectable Bit Error Rate, or UBER)
With SHA-1, we have to write 6.6 PB to get those odds
10^-5: Worst odds of a double-disk RAID5 failure
We have to write 1,371,181 YB to reach those odds
Original formula here: http://en.wikipedia.org/wiki/Birthday_attack
Original formula modified with a MacLaurin series expansion to mitigate Excel's lack of precision; the spreadsheet is at backupcentral.com/hash-odds.xls
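For readers who want to play with the arithmetic, here is a small Python sketch of the birthday-bound approximation behind figures like these. The 8 KB chunk size is an assumption, and the slide's exact numbers come from its own spreadsheet and assumptions, so this only shows the shape of the calculation:

```python
# The probability that at least two of n random b-bit hashes collide is
# approximately 1 - exp(-n*(n-1) / 2^(b+1)). For SHA-1, b = 160.
import math

def collision_probability(n_chunks: float, hash_bits: int = 160) -> float:
    """Approximate probability of any hash collision among n_chunks."""
    exponent = -(n_chunks * (n_chunks - 1)) / (2.0 ** (hash_bits + 1))
    return -math.expm1(exponent)   # 1 - exp(exponent), accurate for tiny values

if __name__ == "__main__":
    chunk_size = 8 * 1024                          # assumed 8 KB chunks
    for data_bytes, label in [(6.6e15, "6.6 PB"), (1e24, "1 YB")]:
        n = data_bytes / chunk_size
        print(f"{label}: p(collision) ~ {collision_probability(n):.3e}")
```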
Target Dedupe
Data is sent unmodified across the LAN and deduped at the target
No LAN/WAN benefits until you replicate target to target
Cannot compress or encrypt before sending to the target
Source Dedupe
Redundant data is identified at the backup client
Only new, unique data is sent across the LAN/WAN
LAN/WAN benefits; can back up remote/mobile data
Allows for compression and encryption at the source
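A minimal sketch of the source-dedupe conversation between client and target (an illustrative protocol, not any product's API): the client fingerprints its chunks, asks the target which fingerprints are new, and sends only those chunks across the LAN/WAN.

```python
# Illustrative source dedupe: only fingerprints and previously unseen chunks
# cross the network; everything here (chunk size, Target class) is assumed.
import hashlib

CHUNK_SIZE = 8 * 1024  # assumed chunk size

class Target:
    """Stands in for the backup server/appliance."""
    def __init__(self):
        self.store: dict[str, bytes] = {}

    def unknown(self, fingerprints: list[str]) -> set[str]:
        return {fp for fp in fingerprints if fp not in self.store}

    def receive(self, chunks: dict[str, bytes]) -> None:
        self.store.update(chunks)

def source_dedupe_backup(data: bytes, target: Target) -> int:
    """Back up data to the target, returning the bytes actually sent."""
    chunks = {hashlib.sha1(data[i:i + CHUNK_SIZE]).hexdigest():
              data[i:i + CHUNK_SIZE]
              for i in range(0, len(data), CHUNK_SIZE)}
    needed = target.unknown(list(chunks))         # only fingerprints cross the WAN
    payload = {fp: chunks[fp] for fp in needed}   # plus the chunks the target lacks
    target.receive(payload)
    return sum(len(c) for c in payload.values())

if __name__ == "__main__":
    target = Target()
    day1 = source_dedupe_backup(b"A" * 32 * 1024 + b"B" * 32 * 1024, target)
    day2 = source_dedupe_backup(b"A" * 32 * 1024 + b"C" * 32 * 1024, target)
    print(day1, "bytes sent on day 1;", day2, "bytes sent on day 2")
```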
Hybrid
Fingerprint data at the source, dedupe at the target
Allows for compression and encryption at the source
Integrated target dedupe: Symantec NetBackup
Integrated source dedupe: Asigra, Symantec NetBackup
Standalone source dedupe: EMC Avamar, i365 eVault, Symantec NetBackup
Hybrid: CommVault Simpana
Multi-node Deduplication
AKA Global Deduplication
AKA Clustered Deduplication
Your dataset sizes never change
A given dataset never outgrows a node
Some single-node sales reps will point out that this also doesn't harm your dedupe ratio, because most dedupe comes from comparing like to like. They're also the same ones claiming they get better dedupe because they compare all to all. Which is it?
When Is It Deduped?
AKA Inline or Post Process?
For every 100 GB, an inline hash system writes 10 GB to disk
For every 100 GB, an inline delta system writes 10 GB and reads 100 GB from disk
For every 100 GB, a post-process hash system writes 100 GB, reads 100 GB, and deletes 90 GB from disk
For every 100 GB, a post-process delta system writes 100 GB, reads 200 GB, and deletes 90 GB from disk
Common sense suggests inline has a major advantage
Things change when you consider the dedupe tax
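A back-of-the-envelope model of those figures, assuming a 10:1 dedupe ratio on a 100 GB backup; the dedupe tax itself (the CPU and latency cost of deduping in the data path) is not modeled.

```python
# Illustrative I/O model per backup for inline vs. post-process dedupe,
# mirroring the bullets above under an assumed 10:1 reduction.
def io_per_backup(gb: float = 100.0, ratio: float = 10.0) -> dict[str, dict[str, float]]:
    unique = gb / ratio          # data that survives dedupe
    return {
        "inline hash":        {"write": unique, "read": 0,      "delete": 0},
        "inline delta":       {"write": unique, "read": gb,     "delete": 0},
        "post-process hash":  {"write": gb,     "read": gb,     "delete": gb - unique},
        "post-process delta": {"write": gb,     "read": 2 * gb, "delete": gb - unique},
    }

if __name__ == "__main__":
    for method, io in io_per_backup().items():
        print(f"{method:>18}: write {io['write']:.0f} GB, "
              f"read {io['read']:.0f} GB, delete {io['delete']:.0f} GB")
```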
The post-process counterargument:
You don't need as much staging disk as you might think
Inline vendors may slow down large backups and restores; they always rehydrate, while post-process systems only rehydrate older data
Is There an Index?
What happens if the index is destroyed? How do you protect against that?
Does it need its index to read the data?
What do you do to verify data integrity?
What about malicious people?
Some dedupe vendors aren't very good at answering these questions, partially because they don't get them enough
Make sure you ask them
Session Two
Using Deduplication in Backup Systems
Using Data Reduction in Primary Systems
Adding storage
Replacing drives (how long does rebuild take?)
Monitoring, etc.
VMware Backup
One of the challenges with typical VMware backup is the I/O load it places on the server
Source dedupe can perform an incremental-forever backup with a much lower I/O load
Could allow you to continue simpler backups without having to invest in VCB
Test Everything
Installation and configuration, including adding additional capacity
Make a support call and ask stupid questions
Dedupe ratio: must use your data, must use your retention settings, must fill up the system
All speeds: backup speed, copy speed (extremely important to test), restore speed
Aggregate performance with all your data types (especially true if using local dedupe)
Single-stream performance: backup speed, restore and copy speed (especially if going to tape)
Replication performance: lag time (if using post process), dedupe speed (if using post process)
Loss of physical systems: drive rebuild times; reverse replication to replace the array?
Unplug things, see how it handles it
Be mean!
Options
Compression
File-level dedupe
Sub-file-level dedupe
Some files compress but don't dedupe; some files dedupe but don't compress well
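A small illustration of why compression and dedupe are complementary, using synthetic data (the chunk size and the data itself are assumptions): repeated random data dedupes but barely compresses, while unique log text compresses well but yields no duplicate chunks.

```python
# Illustrative comparison of compression ratio vs. chunk-level dedupe ratio
# for two very different synthetic data sets.
import hashlib
import os
import zlib

CHUNK = 4 * 1024  # assumed chunk size

def stats(data: bytes) -> tuple[float, float]:
    """Return (compression ratio, chunk dedupe ratio) for a byte string."""
    compressed = zlib.compress(data)
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    unique = {hashlib.sha1(c).hexdigest() for c in chunks}
    return len(data) / len(compressed), len(chunks) / len(unique)

if __name__ == "__main__":
    repeated_random = os.urandom(64 * 1024) * 4           # dedupes, won't compress
    unique_log = "".join("line %06d of a log file\n" % i
                         for i in range(8000)).encode()   # compresses, won't dedupe
    for name, data in [("repeated random", repeated_random),
                       ("unique log", unique_log)]:
        c, d = stats(data)
        print(f"{name:>15}: compression ~{c:.1f}:1, chunk dedupe ~{d:.1f}:1")
```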
Vendors
Compression: Storwize, Ocarina
File-level dedupe: EMC Celerra
Sub-file-level dedupe: NetApp ASIS, Ocarina, GreenBytes, Exar/Hifn, Sun/Oracle
Usually you get compression or dedupe
Ocarina and Exar claim to do both compression and sub-file-level dedupe
Contact Me
Email: curtis@backupcentral.com
Websites to which I contribute:
http://www.backupcentral.com
http://www.searchstorage.com
http://www.searchdatabackup.com
Follow me on Twitter: @wcpreston
My upcoming venture: http://www.truthinit.com
RESOURCES FROM OUR SPONSOR
The ROI of Backup Redesign Using Deduplication: An EMC Data Domain User
IDC Executive Guide: Assess the Value of Deduplication for your Storage Consolidation Initiatives