
ORACLE RAC

Lokesh Aggarwal

7th Sept, 2011

AGENDA
Introduction to RAC
Availability
Manageability
Voting Disk Crash Scenarios
Global Cache Services
ASM
Grid infrastructure
Patching and upgrading
By others
ACFS
Patching in Data Guard
New features in 11gR2
RAC concepts
Eviction scenarios
2006 IBM Corporation

INTRODUCTION

What is RAC ???


Multiple instances running on separate servers (nodes)
Single database on shared storage accessible to all nodes
Instances exchange information over an interconnect network

[Figure: Instance 1 on Node 1 and Instance 2 on Node 2, each with a local disk, connected by the interconnect and attached to shared storage]

What is a RAC Database???

Located on shared storage accessible by all instances
Includes
  Control Files
  Data Files
  Online Redo Logs
  Server Parameter File
May optionally include
  Archived Redo Logs
  Backups
  Flashback Logs (Oracle 10.1 and above)
  Change Tracking Writer files (Oracle 10.1 and above)

Contd
Contents similar to single instance database except
One redo thread per instance
ALTER DATABASE ADD LOGFILE THREAD 2
  GROUP 3 SIZE 51200K,
  GROUP 4 SIZE 51200K;
ALTER DATABASE ENABLE PUBLIC THREAD 2;

If using Automatic Undo Management, one UNDO tablespace per instance is also required
CREATE UNDO TABLESPACE "UNDOTBS2" DATAFILE SIZE 25600K AUTOEXTEND ON MAXSIZE UNLIMITED EXTENT MANAGEMENT LOCAL;

Additional dynamic performance views (V$, GV$ but not X$) created by $ORACLE_HOME/rdbms/admin/catclust.sql

RAC Internal Structures and Services


Global Resource Directory (GRD)
  Records current state and owner of each resource
  Contains convert and write queues
  Distributed across all instances in cluster
Global Cache Services (GCS)
  Implements cache coherency for database
  Coordinates access to database blocks for instances
  Maintains GRD
Global Enqueue Services (GES)
  Controls access to other resources (locks) including
    library cache
    dictionary cache

Why Do Users Deploy RAC ???


Users may deploy RAC to achieve
  Increased availability
  Increased scalability
  Improved maintainability
  Reduced total cost of ownership

Availability

What is Failover?
If one node or instance fails
Node detecting failure will
  Read redo log of failed instance from last checkpoint
  Apply redo to datafiles including undo segments (roll forward)
  Roll back uncommitted transactions
Cluster is frozen during part of this process

[Figure: Instance 1 on Node 1 and Instance 2 on Node 2, each with a local disk, connected by the interconnect and attached to shared storage]

What are Database Services???


Database Services are logical groups of sessions
Can be configured using
  DBCA
  Enterprise Manager (10.2 and above)
Can also be configured using
  SRVCTL (Oracle Cluster Registry only)
  SQL*Plus (Data Dictionary only)
In Oracle 10.1 and above, each service has
  Preferred Nodes (used by default)
  Available Nodes (used if preferred node fails)
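
The preferred / available rule can be sketched in a few lines of shell. This is only an illustration of the placement idea, not how Oracle Clusterware implements it: the function name pick_instance and the instance names RAC1 and RAC2 are hypothetical, and the real relocation logic involves more state than a list of failed instances.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool).
# Usage: pick_instance "<preferred list>" "<available list>" "<failed list>"
# Prints the first preferred instance that has not failed, otherwise the
# first available instance that has not failed, otherwise "none".
pick_instance() {
    preferred=$1
    available=$2
    failed=$3
    for inst in $preferred $available; do
        case " $failed " in
            *" $inst "*) continue ;;   # skip instances marked as failed
        esac
        echo "$inst"
        return 0
    done
    echo "none"
    return 1
}

pick_instance "RAC1" "RAC2" ""        # preferred instance is up
pick_instance "RAC1" "RAC2" "RAC1"    # preferred instance has failed
```

The first call selects the preferred instance; the second falls back to the available one, which matches the "used if preferred node fails" wording above.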

What is Oracle Clusterware???


Introduced in Oracle 10.1 (Cluster Ready Services - CRS)
Renamed in Oracle 10.2 to Oracle Clusterware
Cluster Manager providing
  Node membership services
  Global resource management
  High availability functions
On Linux
  Configured in /etc/inittab
  Implemented using three daemons
    CRS - Cluster Ready Service
    CSS - Cluster Synchronization Service
    EVM - Event Manager
In Oracle 10.2 includes High Availability framework
  Allows non-Oracle applications to be managed

What is the OCR???


Oracle Cluster Registry (OCR)
Configuration information for Oracle Clusterware / CRS
Introduced in Oracle 10.1
Replaced Server Management (SRVM) disk/file
Similar to Windows Registry
Located on shared storage
In Oracle 10.2 and above can be mirrored
  Maximum two copies

What is a Voting Disk???


Known as Quorum Disk / File in Oracle 9i
Located on shared storage accessible to all instances
Used to determine RAC instance membership
In the event of node failure voting disk is used to determine which instance takes control of cluster
  Avoids split brain
In Oracle 10.2 and above can be mirrored
  Odd number of copies (1, 3, 5 etc)
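
The reason for an odd number of copies is easier to see with the arithmetic written out. The sketch below assumes the usual rule that a node must be able to access a strict majority (more than half) of the voting disks; the function names are hypothetical and are not part of any Oracle utility.

```shell
#!/bin/sh
# Smallest number of voting disks that forms a strict majority of $1 disks.
votedisk_majority() {
    echo $(( $1 / 2 + 1 ))
}

# Number of voting disks that can be lost while a majority is still reachable.
votedisk_tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 4 5; do
    echo "$n voting disks: need $(votedisk_majority "$n"), can lose $(votedisk_tolerated_failures "$n")"
done
```

Under that assumption, three and four voting disks both tolerate the loss of only one disk, so an even count adds a disk without adding protection.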

What is VIP???
Node application introduced in Oracle 10.1
Allows Virtual IP address to be defined for each node
All applications connect using Virtual IP addresses
If node fails Virtual IP address is automatically relocated to another node
Only applies to newly connecting sessions

VIP Failover ???


[Figure: mydb = x.x.x.201 x.x.x.202. One node holds VIP x.x.x.201 (static address x.x.x.101); the other node holds VIP x.x.x.202 (static address x.x.x.102)]

VIP Failover ???


[Figure: mydb = x.x.x.201 x.x.x.202. After a node failure both VIPs (x.x.x.202 and x.x.x.201) are shown together; a TCP Reset is returned to the client. Static addresses: x.x.x.101, x.x.x.102]

VIP Failover ???


[Figure: mydb = x.x.x.201 x.x.x.202. Both VIPs (x.x.x.202 and x.x.x.201) are shown together. Static addresses: x.x.x.101, x.x.x.102]

What is TAF???
TAF is Transparent Application Failover
Requires additional coding in client

Requires configuration in TNSNAMES.ORA


RAC_FAILOVER =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = RAC)
      (SERVER = DEDICATED)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 30)(DELAY = 5))
    )
  )
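
As a rough guide to what RETRIES and DELAY mean together, the sketch below multiplies them. It assumes RETRIES is the number of reconnection attempts and DELAY is the wait in seconds between attempts, and it ignores the time each attempt itself takes; the function name is hypothetical.

```shell
#!/bin/sh
# Approximate upper bound, in seconds, on the time a client spends waiting
# between reconnection attempts: RETRIES multiplied by DELAY.
taf_retry_window() {
    retries=$1
    delay=$2
    echo $(( retries * delay ))
}

taf_retry_window 30 5   # prints 150
```

With the values in the entry above, a session would keep trying for about two and a half minutes before giving up.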

Does RAC Increase Availability???


Depends on definition of availability
  May achieve less unplanned downtime
  May have more time to respond to failures
Instance failover means any node can fail without total loss of service
Must provide overcapacity in cluster to survive failover
  Additional Oracle and RAC licenses
Load can be distributed over all running nodes
Can use Grid to provision additional nodes

Contd
Can still get data corruptions
  Human errors / software errors
  Only one logical copy of data
  Only one logical copy of application / Oracle software
Lots of possibility for human errors
  Power / network cabling / storage configuration
Upgrades and patches are more complex
  Can upgrade software on subset of nodes
  If database is affected then still need downtime

Manageability

Server Parameter File


Introduced in Oracle 9.0.1
Must reside on shared storage
Shared by all RAC instances
Binary (not text) files
Parameters can be changed using ALTER SYSTEM
Can be backed up using the Recovery Manager (RMAN)
Created using
  CREATE SPFILE [ = SPFILE_NAME ] FROM PFILE [ = PFILE_NAME ];
init.ora file on each node must contain SPFILE parameter
  SPFILE = <pathname>

Parameters
Some parameters must be the same on each instance, including:
  ACTIVE_INSTANCE_COUNT
  ARCHIVE_LAG_TARGET
  CLUSTER_DATABASE
  CONTROL_FILES
  DB_BLOCK_SIZE
  DB_DOMAIN
  DB_FILES
  DB_NAME
  DB_RECOVERY_FILE_DEST
  DB_RECOVERY_FILE_DEST_SIZE
  DB_UNIQUE_NAME
  TRACE_ENABLED
  UNDO_MANAGEMENT

Contd
Some parameters, if used, must be different on each instance, including:
  THREAD
  INSTANCE_NUMBER
  INSTANCE_NAME
  UNDO_TABLESPACE
  ROLLBACK_SEGMENTS
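
A text parameter file can be checked for this rule before it is turned into an SPFILE. The sketch below assumes the file uses SID-prefixed entries such as RAC1.thread=1 and RAC2.thread=2, with * marking cluster-wide settings; the function name check_instance_params is hypothetical, and the check only looks at the five parameters listed above.

```shell
#!/bin/sh
# Hypothetical checker (not an Oracle tool).
# Usage: check_instance_params <pfile>
# Prints one line for every must-differ parameter whose value is used by
# more than one instance. Prints nothing when all values are distinct.
check_instance_params() {
    awk '
        BEGIN {
            n = split("thread instance_number instance_name undo_tablespace rollback_segments", names, " ")
            for (i = 1; i <= n; i++) must_differ[names[i]] = 1
        }
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            line = tolower($0)
            gsub(/[ \t]/, "", line)              # ignore whitespace
            eq = index(line, "=")
            dot = index(line, ".")
            if (eq == 0 || dot == 0 || dot > eq) next   # no SID prefix
            sid = substr(line, 1, dot - 1)
            param = substr(line, dot + 1, eq - dot - 1)
            value = substr(line, eq + 1)
            if (sid == "*" || !(param in must_differ)) next
            key = param "=" value
            if (key in seen)
                print "duplicate " param " value " value ": " seen[key] " and " sid
            else
                seen[key] = sid
        }
    ' "$1"
}
```

Run against a file in which both RAC1 and RAC2 set thread=1, it reports the clash; names and values are lower-cased before comparison, so the report shows them in lower case.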

DBCA
Can be used to
  Create RAC database and instances
  Create ASM instance
  Manage ASM instance (10.2)
  Add RAC instances
  Create RAC database
  Create clone RAC database (10.2)
  Create, Manage and Drop Services
  Drop instances and database

What is SRVCTL?
Utility used to manage cluster database
Configured in Oracle Cluster Registry (OCR)
Controls
  Database
  Instance
  ASM
  Listener
  Node Applications
  Services
Options include
  Start / Stop
  Enable / Disable
  Add / Delete
  Show current configuration
  Show current status

SRVCTL - Examples
Starting and Stopping a Database
  srvctl start database -d RAC
  srvctl stop database -d RAC

Starting and Stopping an Instance
  srvctl start instance -d RAC -i RAC1
  srvctl stop instance -d RAC -i RAC1

Starting and Stopping a Service
  srvctl start service -d RAC -s SERVICE1
  srvctl stop service -d RAC -s SERVICE1

Starting and Stopping ASM on a specified node
  srvctl start asm -n node1
  srvctl stop asm -n node1

What is CLUVFY?
Introduced in Oracle 10.2
Supplied with Oracle Clusterware
Can be downloaded from OTN (Linux and Windows)
Written in Java - requires JRE (supplied)
Also works with 10.1 (specify -10gR1 option)
Checks cluster configuration
  stages - verifies all steps for specified stage have been completed
  components - verifies specified component has been correctly installed

CLUVFY
Stages include
  -post hwos      post-check for hardware and operating system
  -pre cfs        pre-check for CFS setup
  -post cfs       post-check for CFS setup
  -pre crsinst    pre-check for Oracle Clusterware installation
  -post crsinst   post-check for Oracle Clusterware installation
  -pre dbinst     pre-check for database installation
  -pre dbcfg      pre-check for database configuration

contd..
Components include
  nodereach   Checks reachability between nodes
  nodecon     Checks node connectivity
  cfs         Checks CFS integrity
  ssa         Checks shared storage accessibility
  space       Checks space availability
  sys         Checks minimum system requirements
  clu         Checks cluster integrity
  clumgr      Checks cluster manager integrity
  ocr         Checks OCR integrity
  crs         Checks Oracle Clusterware (CRS) integrity
  nodeapp     Checks node applications exist
  admprv      Checks administrative privileges
  peer        Compares properties with peers

contd..
For example, to check configuration before installing Oracle Clusterware on node1 and node2 use:
  sh runcluvfy.sh stage -pre crsinst -n node1,node2

Checks:
  node reachability
  user equivalence
  administrative privileges
  node connectivity
  shared storage accessibility
If any checks fail, append -verbose to display more information

Other Utilities
Additional RAC utilities and diagnostics include
  OCRCONFIG
  OCRCHECK
  OCRDUMP
  CRSCTL
  CRS_STAT
Additional RAC diagnostics can be obtained using
  ORADEBUG utility
    DUMP option
    LKDEBUG option
  Events

Does RAC Improve Manageability?


Advantages
  Fewer databases to manage
  Easier to monitor
  Easier to upgrade
  Easier to control resource allocation
  Resources can be shared between applications
Disadvantages
  Upgrades potentially more complex
  Downtime may affect more applications
  Requires more experienced operational staff
  Higher cost / harder to replace

Voting Disk Crash Scenarios

contd
Losing one Voting Disk
Voting disks are used in a RAC configuration to maintain node membership. They are critical pieces of a cluster configuration. Starting with Oracle 10gR2, it is possible to mirror the OCR and the voting disks. Using the default mirroring template, the minimum number of voting disks necessary for normal functioning is two.

Scenario Setup

Identify the voting disks:
  crsctl query css votedisk
  /dev/raw/raw1
  /dev/raw/raw2
  /dev/raw/raw3

contd
Corrupt one of the voting disks (as root):
  dd if=/dev/zero of=/dev/raw/raw3 bs=1M

Recoverability Steps
Check the $CRS_HOME/log/[hostname]/alert[hostname].log file. It should contain a message like the following, which identifies the voting disk that became corrupted:
[cssd(9120)]CRS-1604:CSSD voting file is offline: /opt/oracle/product/10.2.0/crs_1/Voting1. Details in /opt/oracle/product/10.2.0/crs_1/log/ractest2/cssd/ocssd.log.
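
Pulling the offline voting file out of the alert log can be scripted. The sketch below assumes the CRS-1604 message has exactly the layout shown above (path followed by ". Details in"); the function name offline_voting_files is hypothetical, and other Clusterware versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool).
# Usage: offline_voting_files [alert log file ...]
# Reads the named files, or standard input when none are given, and prints
# the voting file path from every CRS-1604 message it finds.
offline_voting_files() {
    sed -n 's/.*CRS-1604:CSSD voting file is offline: \([^ ]*\)\. Details.*/\1/p' "$@"
}

# Example using the message shown above.
printf '%s\n' '[cssd(9120)]CRS-1604:CSSD voting file is offline: /opt/oracle/product/10.2.0/crs_1/Voting1. Details in /opt/oracle/product/10.2.0/crs_1/log/ractest2/cssd/ocssd.log.' |
    offline_voting_files
```

For the sample message this prints /opt/oracle/product/10.2.0/crs_1/Voting1.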

contd
According to the above listing, Voting1 is the corrupted disk. Shut down the CRS stack:
  srvctl stop database -d fitstest -o immediate
  srvctl stop asm -n ractest1
  srvctl stop asm -n ractest2
  srvctl stop nodeapps -n ractest1
  srvctl stop nodeapps -n ractest2
  crs_stat -t

On every node as root:
  crsctl stop crs

contd
Pick a good voting disk from the remaining ones and copy it over the corrupted one:
  dd if=/dev/raw/raw4 of=/dev/raw/raw3 bs=1M

Start CRS (on every node as root):
  crsctl start crs

Check the log file $CRS_HOME/log/[hostname]/alert[hostname].log. It should look like this:
  [cssd(14463)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ractest1 ractest1.
  2007-05-31 15:19:53.954
  [crsd(14268)]CRS-1012:The OCR service started on node ractest1.
  2007-05-31 15:19:53.987
  [evmd(14228)]CRS-1401:EVMD started on node ractest1.
  2007-05-31 15:19:55.861
  [crsd(14268)]CRS-1201:CRSD started on node ractest1.

contd
After a couple of minutes check the status of the whole CRS stack:
[oracle@ractest1 ~]$ crs_stat -t

Global Cache Services

Read with No Transfer


Instance 2 requests current read on block

[Figure: Instances 1 to 4 and the resource master. Steps: 1 Request shared resource; 2 Request granted; 3 Read request; 4 Block returned. Resource modes shown: N S; block label shown: 1318]

Read to Write Transfer


Instance 1 requests exclusive read on block

[Figure: Instances 1 to 4 and the resource master. Steps: 1 Request exclusive resource; 2 Transfer block to Instance 1 for exclusive access; 3 Block and resource status; 4 Resource status. Resource modes shown: N S N and N X; block labels shown: 1318 and 1320]

Write to Write Transfer


Instance 4 requests exclusive read on block

[Figure: Instances 1 to 4 and the resource master. Steps: 1 (shown at the resource master); 2 Transfer block to Instance 4 in exclusive mode; 3 Block and resource status; 4 Resource status. Resource modes shown: N S N, N X N and N X; block labels shown: 1318, 1320 and 1323]

Note that Instance 1 will create a past image (PI) of the dirty block

Past Images
When an instance passes a dirty block to another instance it
  Flushes redo buffer to redo log
  Retains past image (PI) of block in buffer cache
PI is retained until another instance writes block to disk
Used to reduce recovery times
Recorded in V$BH.STATUS as PI
Based on X$BH.STATE (value 8 in Oracle 10.2)

contd..
[Figure: animation of past images and recovery for block 42 of table t1. Instance 1 runs UPDATE t1 SET c1 = ... for the values 1324 to 1328 and commits; Instance 2 runs UPDATE t1 SET c1 = 1329 and commits. The frames show the buffer cache of each instance, Redo Log 1 and Redo Log 2, DBWR, the transfer of the block by GCS, the past image kept by the sending instance, and recovery after an instance crash]

THANK YOU !!!


What is the Interconnect???

Instances communicate with each other over the interconnect (network)
Information transferred between instances includes
  data blocks
  locks
  SCNs
Typically Gigabit (1 Gb) Ethernet
UDP protocol

Why Use Shared Storage ???


Mandatory for
  Database files
  Control files
  Online redo logs
  Server Parameter file (if used)
Optional for
  Archived redo logs (recommended)
  Executables (Binaries)
  Password files
  Parameter files
  Network configuration files
  Administrative directories
    Alert Log
    Dump Files