Aman Sharma
OTNYatra-2016
Aman Sharma-OTNYatra(2016)
Who Am I?
Aman Sharma
12+ years using Oracle Database
Oracle ACE
Frequent contributor to the OTN Database forum (Aman.)
Oracle Certified
Sun Certified
@amansharma81
http://blog.aristadba.com
Agenda
RAC Overview
Understanding Node Eviction
Instance Eviction
Q/A
[Diagram: RAC overview. Database tier with an SGA per instance, shared storage (CFS), Clusterware on each node, and the voting disk(s).]
Instance Eviction
Mechanism by which one or more instances are evicted by the group of
remaining instances
Happens if a critical issue is detected at the RAC
instance/database level
Done so that a problem in one or a few instances does not bring
down the entire database
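[Diagram: two-node cluster in the healthy state. Over the network (NW) heartbeat, one node reports "I can see Node 2" and the other "I can see Node 1". Clusterware on each node also maintains the voting disk (VD) heartbeat to the voting disk(s) on shared storage (CFS).]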
[Diagram: the same two-node cluster with the network (NW) heartbeat broken. One node reports "I can see Node 2", the other "I can't see Node 2". The voting disk (VD) heartbeat to the voting disk(s) is still shown.]
CSS Misscount
CSS timeout (misscount) = maximum network heartbeat latency tolerated in
node-to-node communication (in seconds)
Can be checked with:
$ crsctl get css misscount
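To make the idea of a misscount concrete, here is a minimal sketch that flags any node whose last network heartbeat is older than the misscount. The function name, node names, timestamps, and the 30-second value are assumptions chosen for illustration; the real value should be read with crsctl get css misscount, and this is not how OCSSD itself is implemented.

```python
def nodes_past_misscount(last_heartbeat, now, misscount_s):
    """Return the nodes whose last network heartbeat is older than misscount_s seconds."""
    return sorted(
        node
        for node, seen_at in last_heartbeat.items()
        if now - seen_at > misscount_s
    )


# Hypothetical timestamps (in seconds) of the last heartbeat received from each node.
last_heartbeat = {"node1": 100.0, "node2": 65.0}

# With an assumed misscount of 30 seconds, node2 has been silent for too long.
print(nodes_past_misscount(last_heartbeat, now=101.0, misscount_s=30))
```

A node that appears in the returned list is only a candidate for eviction in this model; the actual decision is made by the clusterware.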
Actions to be taken
Check the OCSSD log and the alert log of the evicted node
Confirm that the interconnect network is working properly
Confirm that there are no packet drops in the
private network
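One way to look for packet drops on Linux is to read the per-interface drop counters in /proc/net/dev. The sketch below parses that format from a sample string; the interface name eth1 and the counter values are made up for illustration, and which interface carries the private interconnect is an assumption you would need to confirm on your own nodes.

```python
def parse_drops(proc_net_dev_text):
    """Map interface name -> (rx_dropped, tx_dropped) from /proc/net/dev text."""
    drops = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 16:
            continue  # not a complete counter row
        # Receive columns: bytes packets errs drop ...; transmit columns start at index 8.
        drops[name.strip()] = (int(fields[3]), int(fields[11]))
    return drops


# Sample text in the /proc/net/dev layout; the numbers are invented.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth1: 5000 50 0 7 0 0 0 0 4000 40 0 2 0 0 0 0
"""

print(parse_drops(SAMPLE))
```

On a live Linux node the same function could be fed the contents of /proc/net/dev; counters that keep growing on the private interface are worth investigating.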
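[Diagram: two-node cluster in which neither node can see the other ("I can't see Node 2", "I can't see Node 1"). The network (NW) heartbeat and the voting disk (VD) heartbeat to the voting disk(s) on shared storage (CFS) are shown.]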
(Voting) Disk Heartbeat
Cluster nodes write continuously to the voting
disk to mark their presence
The disk heartbeat has an internal timeout, the Disk
Timeout (DTO), in seconds
The disk heartbeat I/O timeout interval is directly
related to the misscount parameter setting
Default value = 200 seconds
The default value can be changed (MOS#284752.1)
The OCSSD process of each node writes to the voting disk and also
reads the heartbeat writes of the other nodes
To stay in the cluster, each node must be able to access more than 50% of the
voting disks
Writes are done in units of the OS block size (OS dependent)
The number of voting disks must be odd
Number of voting disks = 2F + 1, where F = number of voting disk failures to be tolerated
The maximum number of voting disks is 15
From 11.2, voting disks can also be stored on ASM
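The sizing rule and the majority requirement can be checked with a few lines of arithmetic. The sketch below assumes the intended rule is 2F + 1 voting disks to tolerate F failures (an odd count that still leaves a strict majority after F disks are lost); the function names are illustrative only.

```python
def voting_disks_needed(failures_tolerated):
    """Voting disks required so that a strict majority survives the given failures."""
    return 2 * failures_tolerated + 1


def has_majority(accessible, configured):
    """True if a node can access more than 50% of the configured voting disks."""
    return accessible > configured / 2


for failures in (1, 2, 3):
    print(failures, "failure(s) tolerated ->", voting_disks_needed(failures), "voting disks")

# With 3 voting disks, a node seeing 2 stays in the cluster; a node seeing 1 does not.
print(has_majority(2, 3), has_majority(1, 3))
```

Note that an even count adds no extra tolerance: with 4 disks a node still needs 3, so it survives only one failure, the same as with 3 disks.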
Reboot Time
Time allowed for an evicted node to complete its
reboot
Clusterware
CSS log:
2012-03-27 22:05:48.693: [    CSSD][1100548416](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 3 configured voting disks available, need 2
2012-03-27 22:05:48.693: [    CSSD][1100548416]###################################
2012-03-27 22:05:48.693: [    CSSD][1100548416]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
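When scanning a long ocssd.log for this condition, it can help to pull the numbers out of the clssnmvDiskCheck message. The sketch below is a small parser written against the single log line shown above; the regular expression and function name are my own, and other clusterware versions may word the message differently.

```python
import re

# Pattern modelled on the one clssnmvDiskCheck line quoted in this slide.
DISK_CHECK = re.compile(
    r"clssnmvDiskCheck: Aborting,\s*(\d+) of (\d+) configured voting disks\s+"
    r"available,\s*need (\d+)"
)


def parse_disk_check(line):
    """Return available/configured/needed voting disk counts, or None if no match."""
    match = DISK_CHECK.search(line)
    if match is None:
        return None
    available, configured, needed = map(int, match.groups())
    return {"available": available, "configured": configured, "needed": needed}


line = (
    "2012-03-27 22:05:48.693: [    CSSD][1100548416](:CSSNM00018:)"
    "clssnmvDiskCheck: Aborting, 0 of 3 configured voting disks available, need 2"
)
print(parse_disk_check(line))
```

Here the node could reach 0 of its 3 voting disks while needing 2, which is why CSSD aborted.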
Network issues
OS Load issues
Memory or CPU starvation issues
Instance eviction may lead to node
eviction (member kill escalation)
LMON may request CSS to remove an instance
from the cluster
If a network timeout occurs before that happens, the node
itself is evicted (11g onwards)
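The escalation described above can be pictured as a simple decision: if the instance is removed before the timeout, only the instance is evicted; otherwise the node goes. The sketch below is a toy model of that rule only. The function name, the parameters, and the 30-second timeout are hypothetical, and the real clusterware logic involves more states than this.

```python
def eviction_outcome(kill_completed_after_s, timeout_s):
    """Toy model of member kill escalation.

    kill_completed_after_s: seconds until the instance was removed, or None if it never was.
    timeout_s: assumed timeout, in seconds, after which the kill escalates to the node.
    """
    if kill_completed_after_s is not None and kill_completed_after_s <= timeout_s:
        return "instance evicted"
    return "node evicted"


# Hypothetical timings against an assumed 30-second timeout.
print(eviction_outcome(5, 30))     # instance removed in time
print(eviction_outcome(None, 30))  # instance never removed, so the kill escalates
```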
Files to be checked
Clusterware (CW) alert log (evicted and surviving nodes)
LMON trace file (evicted and surviving nodes)
Linux: /var/log/messages
Sun Solaris: /var/adm/messages
HP-UX: /var/adm/syslog/syslog.log
IBM AIX: /bin/errpt -a > messages.out
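For scripts that collect diagnostics across mixed platforms, the list above can be kept as a small lookup table. This is only a sketch: the dictionary keys assume the system names that Python's platform.system() reports (taken from uname), and the AIX entry is a command to run rather than a file to read.

```python
import platform

# OS log locations from the list above, keyed by the assumed platform.system() name.
OS_LOG_SOURCES = {
    "Linux": "/var/log/messages",
    "SunOS": "/var/adm/messages",
    "HP-UX": "/var/adm/syslog/syslog.log",
    "AIX": "/bin/errpt -a > messages.out",  # a command, not a file path
}


def os_log_source(system=None):
    """Return the OS log location for the given (or current) system, or None if unknown."""
    return OS_LOG_SOURCES.get(system or platform.system())


print(os_log_source("Linux"))
print(os_log_source("HP-UX"))
```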
Few Recommendations
Ensure adequate hardware resources are available
For the database, use AMM and Resource Manager for
proper memory management
Run ORAchk and implement its recommendations
Install and use OSWatcher
Use a fast network, especially between the
nodes (private network)
The private interconnect card should be very fast, at least
1 Gigabit
Do not use crossover cables; use a switch instead
Make sure NTP (Network Time Protocol) is in use
Thank You!
@amansharma81
amansharma@aristadba.com