
Data Protection, Recovery, and High Availability for Oracle Database


Paisit Wongsongsarn
Master Principal Storage Architect

Oracle Corporation Asia Pacific

Copyright 2014, Oracle and/or its affiliates. All rights reserved. |


Who Here Has Oracle DB
in Their Company?



Program Agenda

1 Today's Data Protection Challenge


2 Data Protection and Oracle MAA

3 Bronze and Silver Data Protection

4 Gold and Platinum Data Protection

5 Reference Architecture



How Much Does Downtime Cost Your Business?
What are the main causes of unplanned system downtime?

Studies show disasters cause an average of 2.2 days of downtime, costing $366,363, for the majority of businesses

Source: Acronis, The Acronis Global Disaster Recovery Index: 2012



Inadequate Data Protection = Downtime
European Cloud Infrastructure Provider: 5-day outage
Storage array failed, unable to read tape backups used for DR

Global Specialty Retailer: 8-day outage
Disk failure, followed by mirrored disk failure. Restore from local backup failed. Restore using copy at DR site also failed.

U.S. State Government: 5-day outage
SAN memory failure, problem mirrored to standby SAN.



Data Protection is the Backbone of Business Continuity & Disaster Recovery

Significant challenges for data protection?
#1: Backing up and managing increasingly large data volumes

Backups are too slow
Backups need constant management
Complex Recovery
Production slow down
Coordination of multiple backups

Customers Need a Trusted Partner to Architect the Best Data Protection Strategy!
Which One is More Important?
BACKUP vs. RECOVERY



Understanding The Basics
of Backup & Recovery



Data Protection: Understanding RPO/RTO Requirements

[Figure: timeline of data evolution over time. Last known good image of data, modifications since last backup, incident, detection, restore, recover, application restarted.]

Recovery Point Objective (RPO): tolerance for data loss (seconds, hours, days); determines frequency of backup/replication approaches
Recovery Time Objective (RTO): the shorter the RTO, the quicker you get back to business
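
As a rough illustration of the two objectives, the sketch below measures the data-loss exposure and the downtime of a single incident from three timestamps and compares them with targets. The function name, the dictionary keys, and the sample times are hypothetical and are not taken from the slides.

```python
from datetime import datetime, timedelta


def assess_incident(last_good_copy, incident, service_restored, rpo, rto):
    """Compare one incident against RPO/RTO targets.

    Data loss exposure: time between the last recoverable copy and the incident.
    Downtime: time between the incident and the application restart
    (detection + restore + recover).
    """
    data_loss = incident - last_good_copy
    downtime = service_restored - incident
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical case: backup at 01:00, failure at 14:30, back online at 18:00.
result = assess_incident(
    last_good_copy=datetime(2014, 7, 16, 1, 0),
    incident=datetime(2014, 7, 16, 14, 30),
    service_restored=datetime(2014, 7, 16, 18, 0),
    rpo=timedelta(hours=24),
    rto=timedelta(hours=2),
)
print(result["data_loss"], result["downtime"], result["rpo_met"], result["rto_met"])
```

In this made-up case the 24-hour RPO is met (13.5 hours of changes are exposed) while the 2-hour RTO is missed (3.5 hours of downtime), which is the kind of gap the technology tables that follow are meant to close.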



Oracle Database Data Protection Technologies
All About RTO & RPO

Technology                Protection Against     Downtime           Data Loss Exposure
                          (Type of Failure)      (RTO)              (RPO)

Oracle Secure Backup      Physical               Hours/Days         Days/Hours (from last backup)
Recovery Manager (RMAN)   Physical               Hours/Days         Hours (from last backup)
Flashback Technologies    Logical                Minutes/Hours      Minutes/Hours (from point-in-time)
Data Guard                Site Failure / DR      Seconds/Minutes    Zero/Seconds

RTO = Recovery Time Objective, RPO = Recovery Point Objective



Oracle Storage Data Protection Technologies
All About RTO & RPO

Technology                           Protection Against     Downtime         Data Loss Exposure
                                     (Type of Failure)      (RTO)            (RPO)

Oracle SAN FS1 (snapshot & clone)    Logical                Minutes/Hours    Hours (from last snapshot)
Oracle NAS ZS3 (snapshot & clone)    Logical                Minutes/Hours    Hours (from last snapshot)
Oracle Tape                          Physical               Hours/Days       Minutes/Hours (from last backup)
Oracle Storage Replication           Site Failure / DR      Minutes/Hours    Zero/Seconds/Minutes
(FS1 and ZS3)

RTO = Recovery Time Objective, RPO = Recovery Point Objective



Oracle Database Data Protection and Availability
Design Principles

Data Protection at Every Level
Strong Fault Isolation: Real-Time Validation
Real-time HA/DR: All Components Active



The MAA Approach



Oracle Maximum Availability Architecture (MAA)

Production Site
  RAC: Scalability, Server HA
  ASM: Local storage protection
  Flashback: Human error correction

Enterprise Manager Cloud Control
  Site Guard: Coordinated Site Failover
  Application Continuity: Application HA
  Global Data Services: Service Failover / Load Balancing

Active Replica
  Active Data Guard: Data Protection, DR, Query Offload
  GoldenGate: Active-active replication, Heterogeneous

Edition-based Redefinition, Online Redefinition, Data Guard, GoldenGate
  Minimal downtime maintenance, upgrades, migrations

RMAN, Oracle Secure Backup, Recovery Appliance
  Backup to disk, tape or cloud



Oracle MAA Availability Tiers
Availability Service Levels for Unplanned and Planned Outages

PLATINUM: Zero Outage for Platinum Ready Applications
  Zero data loss

GOLD: Comprehensive HA and Disaster Protection
  Recovery in seconds with zero or near-zero data loss

SILVER: High Availability (HA) for Recoverable Local Outages
  Backups plus redo for Oracle data protection

BRONZE: Basic Service Restart
  Backups plus redo for Oracle data protection



Oracle MAA Availability Tiers
Reference Architectures

PLATINUM: Platinum-Ready Apps, Clusters and Replication
GOLD: Clusters, Replication
SILVER: Clusters, Backups
BRONZE: Single Instance, Backups



Problem Statement: Lack of Intelligent Data Validation

Data can be corrupted anywhere and at any time, and the corruption can go undetected until the data is touched
A checksum alone is not sufficient

How do we know restore and recovery will succeed?
Is my mirrored copy corrupt too?
Can I achieve recovery SLAs?

Backups and DR without validation are an enormous risk: they do not guarantee a working recovery or that recovery SLAs are met
Validation is helpful everywhere: I/O, memory, storage, Oracle data block, inter-block, database and application



Third-Party Backups Lack Data Protection and Validation
A backup is meaningless if it does not result in successful recovery
A recover operation can fail if:
  Backup script is incorrect or incomplete (e.g. missing data files, archives, control files)
  Backup operation is incorrect (e.g. online backups without database in backup mode)
  Backups are corrupted (from source, from storage or media)
Most backup appliances do not have ongoing checks and validations
Reality: The inability to recover successfully results in extended downtime, lost revenue, damaged reputations, and career changes.
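
The first failure cause listed (an incomplete backup script) can be caught with a simple completeness check. The sketch below is a toy model, not an Oracle or RMAN interface: the function name, the file kinds, and every file name are hypothetical.

```python
def missing_from_backup(database_files, backup_manifest):
    """Return files the database needs for recovery that the backup lacks.

    database_files: dict mapping a file kind to the list of files in use.
    backup_manifest: iterable of file names actually present in the backup.
    """
    backed_up = set(backup_manifest)
    gaps = {}
    for kind, files in database_files.items():
        missing = sorted(set(files) - backed_up)
        if missing:
            gaps[kind] = missing
    return gaps


# Hypothetical inventory: all file names are made up for the example.
database_files = {
    "datafile": ["system01.dbf", "users01.dbf", "sales01.dbf"],
    "archived_log": ["arch_101.arc", "arch_102.arc"],
    "control_file": ["control01.ctl"],
}
backup_manifest = ["system01.dbf", "users01.dbf", "arch_101.arc", "arch_102.arc"]

gaps = missing_from_backup(database_files, backup_manifest)
print(gaps)
```

A check like this only covers completeness. It says nothing about whether the backed-up blocks are themselves corrupt, which is the validation gap the slide goes on to describe.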



Common Misconceptions

Is a storage snapshot equal to a backup?

Does deduplication work well with Oracle DB?



Snapshots 101

Copy-on-write method
  Block C is updated
  Old block copied to new location
  New block (C) written to original location
  Double write

Redirect-on-write method
  New block (C') written directly to the snapshot storage
  No double write
  Active version becomes fragmented over time
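
The difference between the two snapshot methods can be sketched with a small in-memory model that counts physical writes. This is an illustrative toy, not any storage vendor's implementation; the class names and block contents are hypothetical.

```python
class CopyOnWriteVolume:
    """Snapshot keeps old blocks; the active volume is updated in place."""

    def __init__(self, blocks):
        self.active = dict(blocks)      # block id -> data, at original locations
        self.snapshot_area = {}         # preserved old versions
        self.physical_writes = 0

    def write(self, block_id, data):
        if block_id not in self.snapshot_area:
            # First change since the snapshot: copy the old block aside.
            self.snapshot_area[block_id] = self.active[block_id]
            self.physical_writes += 1
        self.active[block_id] = data    # new block goes to the original location
        self.physical_writes += 1

    def read_snapshot(self, block_id):
        return self.snapshot_area.get(block_id, self.active[block_id])


class RedirectOnWriteVolume:
    """Old blocks stay in place; new versions are redirected elsewhere."""

    def __init__(self, blocks):
        self.original = dict(blocks)    # frozen snapshot image
        self.redirected = {}            # new versions at new locations
        self.physical_writes = 0

    def write(self, block_id, data):
        self.redirected[block_id] = data
        self.physical_writes += 1

    def read_active(self, block_id):
        return self.redirected.get(block_id, self.original[block_id])

    def read_snapshot(self, block_id):
        return self.original[block_id]

    def fragmented_blocks(self):
        # Active blocks that no longer sit next to their original neighbours.
        return len(self.redirected)


base = {"A": "a0", "B": "b0", "C": "c0"}
cow = CopyOnWriteVolume(base)
row = RedirectOnWriteVolume(base)
cow.write("C", "c1")
row.write("C", "c1")
print(cow.physical_writes, row.physical_writes, row.fragmented_blocks())
```

Updating block C costs two physical writes under copy-on-write and one under redirect-on-write, while the redirect-on-write active image picks up one relocated block, matching the trade-off listed on the slide.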



Snapshots Are NOT Backups
No awareness of the Oracle block structure: snapshots operate at the storage-block level
Physically different from backups: a mixture of pointers and data blocks
A corruption in a never-changed block can affect an entire series of snapshots
Therefore, not true DR protection
Not to mention the database performance impact due to:
  Double writes
  Fragmented block state after reverting to a snapshot
Net-net: snapshots are good for test/dev/QA purposes when created off a secondary database copy



Other Observations
Storage vendors position mirroring of DB storage volumes, plus snapshots off the mirror, as DR
What gets lost in translation:
  No Oracle validation: I/O corruption on the primary array replicates to the secondary array
    An RMAN backup does not let that happen
  No storage savings: mirroring requires 1X the storage of the DB plus time to create the copy
    No different from an RMAN image copy
    Mirroring software license costs extra
  No DB software savings:
    If the DB will be run at the mirrored location, it requires a license (per http://www.oracle.com/us/corporate/pricing/data-recovery-licensing-070587.pdf)

Much better solution: RMAN + Data Guard



Backup and Deduplication 101
Facts About Deduplication for Oracle Database Backups

Description              RMAN Incremental Backups                  Deduplication Appliance

Backup level             Incremental                               Recommend Full
Deduplication type       Source: sends changed data only,          Target: inline dedup
                         saves network bandwidth
Oracle database-aware    Yes (HCC)                                 No (no HCC awareness)
Backup media options     Disk or tape or both                      Disk only
Backup performance       Faster than full backup, depending on     Full backup time + deduplication
                         the database change rate                  processing time
Restore performance      Similar to that of backup operation       Restore is longer due to
                         with merged full backup                   rehydration process

Incremental backups use less IO/Memory/CPU than full backups
Database data is not really redundant, hence a lower dedup ratio
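
The "source sends changed data only" row can be illustrated with a toy model in which a backup is a dictionary of blocks. This is a sketch under stated assumptions, not how RMAN is implemented: the function names, the 1,000-block database, and the 30 changed blocks are all hypothetical.

```python
def take_incremental(current, changed_ids):
    """Copy only the blocks changed since the previous backup."""
    return {block_id: current[block_id] for block_id in changed_ids}


def merge_incremental(full_backup, incremental):
    """Roll the changed blocks into the previous full backup copy."""
    merged = dict(full_backup)
    merged.update(incremental)
    return merged


# Hypothetical database: 1,000 blocks, 30 of which change before the next backup.
previous = {i: f"v1-{i}" for i in range(1000)}
full_backup = dict(previous)

current = dict(previous)
changed_ids = set(range(0, 300, 10))
for i in changed_ids:
    current[i] = f"v2-{i}"

incremental = take_incremental(current, changed_ids)
merged = merge_incremental(full_backup, incremental)

blocks_sent_incremental = len(incremental)
blocks_sent_full = len(current)
print(blocks_sent_incremental, blocks_sent_full, merged == current)
```

With a 3% change rate the incremental moves 30 blocks where a full backup would move 1,000, and the merged copy equals the current database image, so a restore does not need a rehydration step.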


Bronze and Silver Data Protection

We can do a much better job preventing and repairing corruptions in real time.

I'd like to know that my backups are validated when they are created, and on a regular basis to make sure they are good. I want to be alerted whenever a database can NOT meet my recovery SLAs.



Oracle DB Data Protection
Bronze - Single Instance Oracle Database (MOS 1302539.1)

Type      Capability           Physical Block Corruption              Logical Block Corruption

Manual    DBverify, Analyze    Physical block checks                  Logical checks for intra-block and
                                                                      inter-object consistency
Manual    RMAN, ASM            Physical block checks                  Intra-block logical checks
Runtime   Database             In-memory block and redo checksum      In-memory intra-block checks
Runtime   ASM                  Automatic corruption detection and repair using extent pairs
Runtime   Exadata              HARD checks on write, automatic        HARD checks on write
                               disk scrubbing and repair



MAA: Data Protection for All Databases
Validation, Detection and Repair in Memory, during I/O and on Disk

DB_BLOCK_CHECKSUM=FULL (Optional DB_BLOCK_CHECKING)
  Computes checksum on change and catches corruptions in memory
  Validates checksum on read and update (DETECTION)
  Prevents corrupted blocks from being written to disk (PREVENTION)
  Recovers using good data block and redo (REPAIR)

Automatic Storage Management
  Data corruption or I/O error triggers repair (DETECTION/REPAIR)
  Oracle semantics aware
  Reads extent copies for good copy (PREVENTION of ERROR)
  Good writes can correct existing corruptions (REPAIR)
  [Figure labels: Bad SCN, Good SCN]

Exadata HARD and Automatic Disk Scrub and Repair
  Prevents physical corruption during writes (OS to storage) (PREVENTION)
  Inspects and repairs hard disk corruption that resides on storage (DETECTION)
  Calls ASM to repair using good extent copy (REPAIR)
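
The detect-then-repair pattern described above (validate a checksum on read, fall back to a mirrored copy, overwrite the bad copy) can be sketched as follows. This is a simplified model, not Oracle's block format or checksum algorithm: CRC32 from Python's zlib stands in for the real checksum, and the function names and block contents are hypothetical.

```python
import zlib


def make_block(payload):
    """Store a payload together with a checksum computed when it changes."""
    return {"data": payload, "checksum": zlib.crc32(payload)}


def is_valid(block):
    """Recompute the checksum and compare it with the stored value."""
    return zlib.crc32(block["data"]) == block["checksum"]


def read_with_repair(primary, mirror, block_id):
    """Validate on read; on mismatch, read the mirror copy and repair."""
    block = primary[block_id]
    if is_valid(block):
        return block["data"], "ok"
    candidate = mirror[block_id]
    if not is_valid(candidate):
        raise IOError(f"block {block_id}: no valid copy available")
    primary[block_id] = dict(candidate)      # overwrite the bad copy
    return candidate["data"], "repaired"


primary = {135: make_block(b"account balance=100")}
mirror = {135: make_block(b"account balance=100")}

# Simulate corruption on disk: one byte changes, the stored checksum does not.
primary[135]["data"] = b"Xccount balance=100"

data, status = read_with_repair(primary, mirror, 135)
print(data, status, is_valid(primary[135]))
```

The caller receives the good data and never sees the corrupt copy, which mirrors the "application continues to run" behaviour the slides attribute to ASM redundancy.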



Corruption detection, Mirror Read, and Automatic Repair
With ASM Redundancy

Application continues to run without ever noticing the failure
  Database update encounters corruption
  Database reads ASM mirror copy and repairs corruption

Oracle logs the following for the administrator:
  Corrupt block relative dba: 0x16400087 (file 89, block 135)
  Bad check value found during multiblock buffer read
  Data in bad block:
  type: 6 format: 2 rdba: 0x16400087
  last change scn: 0x0000.b6702b33 seq: 0x1 flg: 0x04
  spare1: 0x0 spare2: 0x0 spare3: 0x0
  consistency value in tail: 0x2b330601
  check value in block header: 0xa07a
  computed block checksum: 0x3
  Reading datafile '+DATA/qs/datafile/c.257.825768683' for corruption at rdba: 0x16400087 (file 89, block 135)
  Read datafile mirror 'DATA_CD_08_CELL13' (file 89, block 135) found same corrupt data (no logical check)
  Read datafile mirror 'DATA_CD_07_CELL14' (file 89, block 135) found valid data
  Hex dump of (file 89, block 135) in trace file /u01/app/oracle/diag/ /qs1_ora_60475.trc
  Repaired corruption at (file 89, block 135)



I/O Error Prevention
Exadata Disk Scrubbing Combined with ASM Auto Repair

Application never encounters the I/O error
  Disk sector goes bad
  Cell disk scrub finds bad sector and ASM repairs it

Oracle logs the following for the administrator:
  Wed Jul 16 17:00:06 2014
  Begin scrubbing CellDisk:CD_06_cell06.
  Begin scrubbing CellDisk:CD_07_cell06.
  ..
  Wed Jul 16 18:33:05 2014
  Read Error on Cell Disk CD_06_cell06 (/dev/sdg) at device offset 2794140467200 bytes with size 1048576 bytes (errno: Input/output error [5])
  Read Error on Grid Disk RECOC1_CD_06_cell06 at grid disk offset 423268188160 bytes with size 1048576 bytes from disk scrub
  Wed Jul 16 18:33:12 2014
  Broadcast: 1 events ASM REPAIR diskgroup of opcode 10 for diskgroup RECOC1 to:
  ...
  Finished scrubbing CellDisk:CD_06_cell06, scrubbed blocks (1MB):2860960, found bad blocks:2
  Finished scrubbing CellDisk:CD_07_cell06, scrubbed blocks (1MB):2860960, found bad blocks:0
  ..
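
An administrator who wants a per-disk summary of scrub results could extract it from the "Finished scrubbing" lines. The sketch below assumes the line format is exactly as it appears in the excerpt above, with wrapped lines joined; the regular expression and function name are hypothetical, and real log layouts may differ.

```python
import re

# Pattern derived only from the excerpt shown on the slide (an assumption).
SCRUB_DONE = re.compile(
    r"Finished scrubbing CellDisk:(?P<disk>[^,\s]+), "
    r"scrubbed blocks \(1MB\):(?P<scrubbed>\d+), "
    r"found bad blocks:(?P<bad>\d+)"
)


def summarize_scrub(log_text):
    """Map each cell disk to (scrubbed 1MB blocks, bad blocks found)."""
    return {
        m.group("disk"): (int(m.group("scrubbed")), int(m.group("bad")))
        for m in SCRUB_DONE.finditer(log_text)
    }


log_text = """\
Begin scrubbing CellDisk:CD_06_cell06.
Begin scrubbing CellDisk:CD_07_cell06.
Finished scrubbing CellDisk:CD_06_cell06, scrubbed blocks (1MB):2860960, found bad blocks:2
Finished scrubbing CellDisk:CD_07_cell06, scrubbed blocks (1MB):2860960, found bad blocks:0
"""

summary = summarize_scrub(log_text)
disks_with_bad_blocks = sorted(d for d, (_, bad) in summary.items() if bad > 0)
print(summary)
print(disks_with_bad_blocks)
```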
Corruption Prevention with Automatic Retry
Exadata Hardware Assisted Resilient Disk (HARD)

Application never encounters a corruption
  Network packet containing database write is corrupted
  Cell prevents write of corrupt block and ASM retries write

Oracle logs the following for the Administrator:

Cell side:
  Thu Sep 11 08:42:33 2014
  HARD CHECK FAILED for ftyp=0 blksiz=512 blkno=0 checks=1 startblk=33182326784 nblks=16

Database side:
  Errors in file /u01/app/oracle/diag/rdbms/qs/qs1/trace/qs1_dbwf_41262.trc:
  ORA-27603: Cell storage I/O error, I/O failed on disk o/192.168.10.29;192.168.10.30/DATAC1_CD_02_CELL7 at offset 151396352 for data length 8192
  ORA-27626: Exadata error: 205 (HARD check failed)
  WARNING: Write Failed, will retry. group:1 disk:74 AU:36 offset:401408 size:8192



Zero Data Loss Recovery Appliance Overview
Data Protection for Your Backups, Recovery for Your Business
Protected Databases
  Protects all Oracle Databases
  Petabytes of data, any release
  No expensive backup agents

Delta Push
  Access and send only changes
  Minimal impact on production
  Data Guard-like real-time redo ship instantly protects new transactions

Delta Store (virtual full backups)
  Stores validated, compressed changes on disk
  Fast restores to any point-in-time using deltas and redo
  Built on Exadata scaling and resilience
  Enterprise Manager end-to-end control

ZDLRA offloads tape backup
ZDLRA replicates to remote ZDLRA for disaster recovery



Data Loss Protection from Corruptions
ZDLRA Understands and Validates Database Formats End-to-end

Recovery Appliance
  Data validated on receive
  Data periodically revalidated
  Data validated on restore
  Built using MAA practices
    ASM auto repair
    Exadata HARD checks and automatic disk scrub/repair

Tape Archive
  Data validated when copied to and restored from tape

Remote Replica
  Data validated on receive, restore, and periodically
  ASM and Exadata checks and repair
Policy-Based Database Protection as a Service

Platinum and Gold Policy, Mission Critical
  Disk: 90 days
  Tape: 2 years
  RPO: 5 secs

Silver Policy, Business Critical
  Disk: 30 days
  Tape: 45 days
  RPO: 15 mins

Bronze Policy, Test/Dev
  Disk: 5 days
  Tape: 30 days
  RPO: 1 hour

Protection Policies
  Easy-to-deploy
  Standardized
  Alerts when not meeting Recovery SLAs

Replica ZDLRA also policy based
[Figure labels: Tape, Replica]
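
The three policies above map naturally onto a small data structure plus an alert check. The sketch below is illustrative only and is not the Recovery Appliance's interface: the class, the dictionary keys, the database names, and the choice of 730 days for "2 years" are assumptions; the retention and RPO values are taken from the slide.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ProtectionPolicy:
    tier: str
    disk_retention: timedelta
    tape_retention: timedelta
    rpo: timedelta


# Values from the slide; "2 years" is approximated as 730 days (assumption).
POLICIES = {
    "platinum_gold": ProtectionPolicy(
        "Mission Critical", timedelta(days=90), timedelta(days=730), timedelta(seconds=5)
    ),
    "silver": ProtectionPolicy(
        "Business Critical", timedelta(days=30), timedelta(days=45), timedelta(minutes=15)
    ),
    "bronze": ProtectionPolicy(
        "Test/Dev", timedelta(days=5), timedelta(days=30), timedelta(hours=1)
    ),
}


def rpo_alerts(assignments, unprotected_window):
    """Return databases whose unprotected window exceeds their policy RPO."""
    return sorted(
        db
        for db, policy_key in assignments.items()
        if unprotected_window[db] > POLICIES[policy_key].rpo
    )


# Hypothetical databases and how long each has gone without protected changes.
assignments = {"orders": "platinum_gold", "hr": "silver", "devtest": "bronze"}
unprotected_window = {
    "orders": timedelta(seconds=2),
    "hr": timedelta(minutes=40),
    "devtest": timedelta(minutes=20),
}

alerts = rpo_alerts(assignments, unprotected_window)
print(alerts)
```

Only the database whose unprotected window (40 minutes) exceeds its policy RPO (15 minutes) is flagged, which is the "alerts when not meeting Recovery SLAs" behaviour in miniature.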



Let's Summarize BRONZE and SILVER

MAA parameters
ASM redundancy
RMAN Backups
ZDLRA so we can count on successful restore when required
Exadata-unique capabilities for the best database protection and availability



Storage Remote Mirroring Architecture
Problem: No real-time validation. Corruption and other problems are mirrored

[Figure: Primary Database (Oracle instance in memory, recovery files, database files) connected by SYNC or ASYNC block replication to Remote Volumes.]

Data corruptions are replicated
Zero Oracle validation
  No Oracle block checks
  No database recovery checks
  No application validation

Error shown: ORA-01578: ORACLE data block corrupt (file # 27, block # 331214)



Oracle Data Protection
Gold and Platinum Comprehensive Data Protection

Type      Capability           Physical Block Corruption                 Logical Block Corruption

Manual    Dbverify, Analyze    Physical block checks                     Logical checks for intra-block and
                                                                         inter-object consistency
Manual    RMAN, ASM            Physical block checks                     Intra-block logical checks
Runtime   Active Data Guard    Continuous physical block checking        Detect lost write corruption, auto
                               at standby                                shutdown and failover
                               Strong isolation to prevent single        Intra-block logical checks at
                               point of failure                          standby
                               Automatic repair of physical corruptions
                               Automatic database failover
Runtime   Database             In-memory block and redo checksum         In-memory intra-block checks
Runtime   ASM                  Automatic corruption detection and repair using extent pairs
Runtime   Exadata              HARD checks on write, automatic disk      HARD checks on write
                               scrub and repair



Active Data Guard Architecture
Oracle Aware Process Maintains an Exact Physical Copy of Production

[Figure: Primary Database (Oracle instance in memory, recovery files, database files) sends database redo, SYNC or ASYNC, to the Active Standby Database (Oracle instance in memory, open read-only), where redo apply updates the standby database files.]

Data corruption is isolated to primary
Comprehensive run-time validation
  By Data Guard apply
  By read-only application workload
Automatic repair of primary using good copy from standby

Messages shown:
  Automatic block media recovery requested for (file#6, block #8738)
  Automatic block media recovery successful for (file#6, block #8738)
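
The contrast between block-level mirroring and redo-based replication can be sketched with a toy model. It is a simplification under stated assumptions, not Data Guard's implementation: CRC32 stands in for Oracle's block checksum, "redo" is reduced to a list of (block id, new contents) pairs, and all class and function names are hypothetical.

```python
import zlib


class Database:
    def __init__(self):
        self.blocks = {}                       # block id -> (data, checksum)

    def apply_change(self, block_id, data):
        """Apply a change record and compute a fresh checksum."""
        self.blocks[block_id] = (data, zlib.crc32(data))

    def corrupt_on_disk(self, block_id, bad_data):
        """Simulate storage-level damage: data changes, checksum does not."""
        _, stored_checksum = self.blocks[block_id]
        self.blocks[block_id] = (bad_data, stored_checksum)

    def block_ok(self, block_id):
        data, stored_checksum = self.blocks[block_id]
        return zlib.crc32(data) == stored_checksum


def storage_mirror(primary, remote):
    """Block replication: copies whatever is on disk, without validation."""
    remote.blocks = dict(primary.blocks)


def ship_redo(redo, standby):
    """Redo shipping: the standby applies change records to its own blocks."""
    for block_id, data in redo:
        standby.apply_change(block_id, data)


def repair_from_standby(primary, standby, block_id):
    """Replace a corrupt primary block with the standby's validated copy."""
    if not primary.block_ok(block_id) and standby.block_ok(block_id):
        primary.blocks[block_id] = standby.blocks[block_id]


primary, mirrored, standby = Database(), Database(), Database()
redo = [(8738, b"order 42 shipped")]
for block_id, data in redo:
    primary.apply_change(block_id, data)
ship_redo(redo, standby)

primary.corrupt_on_disk(8738, b"Xrder 42 shipped")   # damage below the database
storage_mirror(primary, mirrored)

mirror_ok = mirrored.block_ok(8738)
standby_ok = standby.block_ok(8738)
repair_from_standby(primary, standby, 8738)
primary_ok_after_repair = primary.block_ok(8738)
print(mirror_ok, standby_ok, primary_ok_after_repair)
```

The mirrored volume inherits the damaged block, while the standby, which never read the primary's disk, still holds a valid copy that can be used to repair the primary.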



Active Data Guard Demonstrations
Environment: Primary, RAC Far Sync, Active Data Guard Standby

Primary Database
  Production instance

Primary to Far Sync Instance: SYNC, limited distance

Far Sync Instance
  Oracle control file and log files
  No database files, no media recovery
  Offload transport compression
  Supports up to thirty remote destinations

Far Sync Instance to Active Standby Database: ASYNC, any distance, transport compression over WAN

Active Standby Database
  DR and reporting instance
  Open read-only
  Continuous Oracle validation
  Zero data loss failover target
  Manual or automatic failover



Let's Summarize GOLD and PLATINUM

MAA parameters + lost_write + db_block_checking (standby only)
Active Data Guard provides continuous validation
Auto data block repair for primary and standby
Full utilization of standby for queries and reports
Fast database and application failover in seconds
Zero data loss with SYNC (LAN/MAN) or FAR SYNC (WAN)



Oracle Integrated Software Environment
Address both Planned and Unplanned Downtime
Oracle Maximum Availability Architecture
Automatic Storage Management, Recovery Manager (RMAN), Oracle Secure Backup
  Storage failure
  Data recovery
  Backups

Oracle RAC
  Instance failure
  Server failure
  Rolling maintenance
  Performance scale-out
  Online instance relocation
  Consolidation

Data Guard
  Database failure
  System failure
  Site failure
  Zero data loss
  Automatic failover
  Corruption protection
  Rolling upgrade
  Read-only offload
  Backup offload

Flashback
  Fast point-in-time recovery
  Granular repair of logical corruptions
  Transaction
  Table
  Database

GoldenGate
  Heterogeneous migrations
  Bi-directional multi-master replication
  Zero downtime maintenance

Edition Based Redefinition
  Zero downtime application upgrades



Oracle Data Protection & HA Reference Architecture
The Ultimate Data Protection Solution for Oracle Database
[Figure: reference architecture. Production (Oracle DB or RAC) and Standby (Oracle DB) are linked by Data Guard. Each side sends RMAN backups and redo shipping to a ZDLRA. Production uses ASM/Flashback and the standby uses ASM on online storage. On each side, RMAN and OSB back up to tape or disk backup storage.]



Key Takeaways
Data is everything. Carefully plan how to protect it
Data loss is considered downtime for your business
Follow the Oracle MAA approach to make sure you conform to Oracle data protection recommendations
Generic solutions for backup and replication put data at risk and make recovery uncertain
Review your environment to see if there are any gaps; Oracle can advise you on how to address them
