
Troubleshooting and Diagnosing Oracle Database 12.2 and Oracle RAC


https://www.linkedin.com/in/raosandesh/
sandeshr

Sandesh Rao, Senior Director, RAC Development

Copyright 2017, Oracle and/or its affiliates. All rights reserved. |


Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle's products remains at the sole discretion of Oracle.

Common Questions
How do I contact you?
LinkedIn: Sandesh Rao
Email: Sandesh.rao@oracle.com
Where do I get your presentation?
http://otnyathra.in/downloads/

Which books on RAC do I read for basics or internals?
Oracle Database 11g Oracle Real Application Clusters Handbook, 2nd Edition (Oracle Press)
Pro Oracle Database 11g RAC on Linux (Expert's Voice in Oracle), 2nd Edition
Oracle 10g RAC Grid, Services and Clustering, 1st Edition
Pro Oracle Database 10g RAC on Linux: Installation, Administration, and Performance (Expert's Voice in Oracle), 1st Corrected Edition, Corr. 3rd printing
Oracle Database 12c Release 2 Oracle Real Application Clusters Handbook: Concepts, Administration, Tuning & Troubleshooting (Oracle Press), 1st Edition
Documentation: Autonomous Computing Guide, RAC Admin Guide
Agenda
Architectural Overview
Troubleshooting Scenarios
Proactive and Reactive tools
Q&A



Grid Infrastructure Overview
Grid Infrastructure is the name for the combination of
Oracle Cluster Ready Services (CRS)
Oracle Automatic Storage Management (ASM)
The Grid Home contains the software for both products
CRS can also be Standalone for ASM and/or Oracle Restart
CRS can run by itself or in combination with other vendor clusterware
Grid Home and RDBMS home must be installed in different locations
The installer locks the Grid Home path by setting root permissions.



Grid Infrastructure Overview
CRS requires shared Oracle Cluster Registry (OCR) and Voting files
Must be in ASM or CFS
OCR is backed up automatically every 4 hours to GI_HOME/cdata
Backups are kept at 4, 8, and 12 hours, 1 day, and 1 week
Restored with ocrconfig
Voting file backed up into OCR at each change.
Voting file restored with crsctl



Grid Infrastructure Overview
For networking, CRS requires
One or more high-speed, low-latency, redundant private networks for inter-node communications
Think of the interconnect as a memory backplane for the cluster
Should be a separate physical network or a managed converged network
VLANs are supported
Used for:
Clusterware messaging
RDBMS messaging and block transfer
ASM messaging
HANFS for block traffic



Grid Infrastructure Overview
Only one set of Clusterware daemons can run on each node
The CRS stack is spawned from Oracle HA Services Daemon (ohasd)
On Unix ohasd runs out of inittab with respawn
A node can be evicted when deemed unhealthy
May require reboot but at least CRS stack restart (rebootless restart)
IPMI integration or diskmon in case of Exadata
CRS provides Cluster Time Synchronization services
Always runs but in observer mode if ntpd configured



Grid Infrastructure Processes
Agents change everything
Multi-threaded Daemons
Manage multiple resources and types
Implements entry points for multiple resource types
Start, stop, check, clean, fail
oraagent, orarootagent, application agent, script agent, cssdagent
Single process started from init on Unix (ohasd)
Diagram below shows all core resources



Grid Infrastructure Processes
[Diagram: startup hierarchy of the core resources, organized as Level 0, Level 1, Levels 2a and 2b, Level 3, and Levels 4a and 4b]



Grid Infrastructure Processes
Init Scripts
/etc/init.d/ohasd ( location O/S dependent )
RC script with start and stop actions
Initiates Oracle Clusterware autostart
Control file coordinates with CRSCTL
/etc/init.d/init.ohasd ( location O/S dependent )
OHASD Framework Script runs from init/upstart
Control file coordinates with CRSCTL
Named pipe syncs with OHASD



Grid Infrastructure Processes

Level 1: OHASD Spawns:


cssdagent - Agent responsible for spawning CSSD
orarootagent - Agent responsible for managing all root owned ohasd resources
oraagent - Agent responsible for managing all oracle owned ohasd resources
cssdmonitor - Monitors CSSD and node health (along with the cssdagent)
Level 2a: OHASD rootagent spawns:
CRSD - Primary daemon responsible for managing cluster resources.
CTSSD - Cluster Time Synchronization Services Daemon
Diskmon ( Exadata )
ACFS (ASM Cluster File System) Drivers



Grid Infrastructure Processes
Level 2b: OHASD oraagent spawns:
MDNSD - Multicast DNS daemon
GIPCD - Grid IPC Daemon
GPNPD - Grid Plug and Play Daemon
EVMD - Event Monitor Daemon
ASM - ASM instance started here as may be required by CRSD
Level 3: CRSD spawns:
orarootagent - Agent responsible for managing all root owned crsd resources.
oraagent - Agent responsible for managing all nonroot owned crsd resources.
One is spawned for every user that has CRS resources to manage.



Grid Infrastructure Processes
Startup Sequence
Level 4: CRSD oraagent spawns:
ASM Resource - ASM Instance(s) resource (proxy resource)
Diskgroup - Used for managing/monitoring ASM diskgroups.
DB Resource - Used for monitoring and managing the DB and instances
SCAN Listener - Listener for single client access name, listening on SCAN VIP
Listener - Node listener listening on the Node VIP
Services - Used for monitoring and managing services
ONS - Oracle Notification Service
eONS - Enhanced Oracle Notification Service ( pre 11.2.0.2 )
GSD - For 9i backward compatibility
GNS (optional) - Grid Naming Service - Performs name resolution



Oracle Flex Cluster
The standard going forward (every Oracle 12c Rel. 2 cluster is a Flex Cluster by default)



Under the Hood: Any New Install Ends Up in a Flex Cluster

[GRID]> crsctl get cluster name


CRS-6724: Current cluster name is 'SolarCluster'

[GRID]> crsctl get cluster class


CRS-41008: Cluster class is 'Standalone Cluster'

[GRID]> crsctl get cluster type


CRS-6539: The cluster type is 'flex'.



Cluster Domain
[Diagram: four Member Clusters (three Database Member Clusters and one Application Member Cluster) connected through the private network, SAN, and NAS to a Domain Services Cluster (DSC). The Member Clusters variously use local ASM, GI only, the ASM Service, or the IO and ASM Service of the DSC. The DSC hosts the Management Repository (GIMR) Service, the Trace File Analyzer (TFA) Service, the Rapid Home Provisioning (RHP) Service, additional optional services, the ASM Service, the IO Service, and shared ASM.]



ASM Flex Diskgroups 1
Database-oriented Storage Management for more flexibility and availability
Pre-12.2 diskgroup organization: shared resource management
12.2 Flex Diskgroup organization: database-oriented resource management using File Groups
[Diagram: a pre-12.2 diskgroup in which the files of DB1, DB2, and DB3 are intermixed, next to a Flex Diskgroup in which the files of each database are kept together in a File Group]

ASM Flex Diskgroups 2
Database-oriented Storage Management for more flexibility and availability
12.2 Flex Diskgroup Organization
Flex Diskgroups enable
Quota Management - limit the space databases can allocate in a diskgroup and thereby improve the customer's ability to consolidate databases into fewer DGs
Redundancy Change - utilize lower redundancy for less critical databases
Shadow Copies (split mirrors) - easily and dynamically create database clones for test/dev or production databases
[Diagram: a Flex Diskgroup holding the File Groups of DB1, DB2, and DB3 under a quota, with a shadow copy of the DB3 files]

Node Weighting in Oracle RAC 12c Release 2
Idea: Everything equal, let the majority of work survive
Node Weighting is a new feature that considers the workload hosted in the cluster during fencing
The idea is to let the majority of work survive, if everything else is equal
Example: In a 2-node cluster, the node hosting the majority of services (at fencing time) is meant to survive



CSS_CRITICAL Fencing with Manual Override
[Diagram: two scenarios, labeled "Node eviction despite WL; WL will failover." and "Conflict."]
srvctl modify database -help | grep critical
-css_critical {YES | NO}    Define whether the database or service is CSS critical
crsctl set server css_critical {YES|NO} + server restart
CSS_CRITICAL can be set on various levels / components to mark them as critical so that the cluster will try to preserve them in case of a failure.
CSS_CRITICAL will be honored if no other technical reason prohibits survival of the node which has at least one critical component at the time of failure.
A fallback scheme is applied if CSS_CRITICAL settings do not lead to an actionable outcome.



Proven Features Even More Beneficial on the DSC

The DSC is the ideal hosting environment for Rapid Home Provisioning (RHP), enabling software fleet management.
Oracle ASM 12c Rel. 2 based storage consolidation is best performed on the DSC, as it enables numerous additional features and use cases.
Autonomous Health Framework (powered by machine learning) works more efficiently for you on the DSC, as continuous analysis is taken off the production cluster.



Node Eviction Basics



Basic RAC Cluster with Oracle Clusterware

[Diagram: cluster nodes, each running CSSD, connected to the public LAN, to each other over the private LAN / interconnect, and through the SAN network to the Voting Disk]



What does CSSD do?
CSSD monitors and evicts nodes
Monitors nodes using 2 communication channels:
Private Interconnect: Network Heartbeat
Voting Disk based communication: Disk Heartbeat
Evicts (forcibly removes) nodes from a cluster depending on heartbeat feedback (failures)



Network Heartbeat
Interconnect basics
Each node in the cluster is pinged every second
Nodes must respond in css_misscount time (defaults to 30 secs.)
Reducing the css_misscount time is generally not supported

Network heartbeat failures will lead to node evictions


CSSD-log: [date / time] [CSSD][1111902528]clssnmPollingThread: node mynodename (5) at 75% heartbeat fatal, removal in 6.770 seconds
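The countdown in the log message above can be related to css_misscount with a little arithmetic. The following is a minimal sketch, assuming the default misscount of 30 seconds and the 50%, 75%, and 90% warning levels that appear in CSSD log messages; the function name `heartbeat_status` is hypothetical and not part of any Oracle tool.

```python
def heartbeat_status(miss_seconds, misscount=30.0):
    """Relate the time since the last network heartbeat to css_misscount.

    Returns (percent_of_misscount, seconds_until_removal, warning_level),
    where warning_level is the highest of 50, 75, 90 already reached,
    or None if no warning level has been reached yet.
    """
    percent = 100.0 * miss_seconds / misscount
    remaining = max(misscount - miss_seconds, 0.0)
    level = None
    for threshold in (50, 75, 90):  # warning levels assumed from the log messages
        if percent >= threshold:
            level = threshold
    return percent, remaining, level


# 23.23 s without a heartbeat leaves 6.770 s, as in the log line above.
percent, remaining, level = heartbeat_status(23.23)
print(f"{percent:.0f}% of misscount, removal in {remaining:.3f} s, warning level {level}%")
```

The sketch only illustrates how the "removal in" value follows from misscount; CSSD itself decides when warnings are logged.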




Disk Heartbeat
Voting Disk basics Part 1
Each node in the cluster pings (r/w) the Voting Disk(s) every second
Nodes must receive a response in (long / short) diskTimeout time
I/O errors indicate clear accessibility problems, so the timeout is irrelevant

Disk heartbeat failures will lead to node evictions


CSSD-log: [CSSD] [1115699552] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(1) LATS(63436584) Disk lastSeqNo(1)



Voting Disk Structure
Voting Disk basics Part 2
Voting Disks contain dynamic and static data:
Dynamic data: disk heartbeat logging
Static data: information about the nodes in the cluster

With 11.2.0.1 Voting Disks got an identity, e.g. a Voting Disk serial number:
[GRID]> crsctl query css votedisk
1. 2 1212f9d6e85c4ff7bf80cc9e3f533cc1 (/dev/sdd5) [DATA]

Voting Disks must therefore not be copied using dd or cp anymore




Simple Majority Rule
Voting Disk basics Part 3
Oracle supports redundant Voting Disks for disk failure protection
Simple Majority Rule applies:
Each node must see the simple majority of configured Voting Disks
at all times in order not to be evicted (to remain in the cluster)

trunc(n/2+1) with n=number of voting disks configured and n>=1
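The majority formula can be checked with a few lines of code. This is a minimal sketch of trunc(n/2+1) using integer division; the function names are hypothetical and chosen only for illustration.

```python
def required_voting_disks(configured):
    """Number of voting disks a node must see: trunc(n/2 + 1) for n >= 1."""
    if configured < 1:
        raise ValueError("at least one voting disk must be configured")
    return configured // 2 + 1


def node_stays_in_cluster(configured, accessible):
    """True if the node still sees a simple majority of the voting disks."""
    return accessible >= required_voting_disks(configured)


for n in (1, 3, 5):
    print(f"{n} configured -> {required_voting_disks(n)} required")
```

With three voting disks a node tolerates the loss of one; with a single voting disk it tolerates none.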




Insertion 1: Simple Majority Rule
In extended Oracle clusters
Same principles apply; Voting Disks are just geographically dispersed
See http://www.oracle.com/goto/rac: "Using standard NFS to support a third voting file for extended cluster configurations" (PDF)



Insertion 2: Voting Disk in Oracle ASM
The way of storing Voting Disks doesn't change their use
[GRID]> crsctl query css votedisk
1. 2 1212f9d6e85c4ff7bf80cc9e3f533cc1 (/dev/sdd5) [DATA]
2. 2 aafab95f9ef84f03bf6e26adc2a3b0e8 (/dev/sde5) [DATA]
3. 2 28dd4128f4a74f73bf8653dabd88c737 (/dev/sdd6) [DATA]
Located 3 voting disk(s).

Oracle ASM auto creates 1/3/5 Voting Files based on External/Normal/High redundancy and on Failure Groups in the Disk Group
Per default there is one failure group per disk
ASM will enforce the required number of disks
New failure group type: Quorum Failgroup



Why are nodes evicted?
To prevent worse things from happening
Evicting (fencing) nodes is a preventive measure (a good thing)!
Nodes are evicted to prevent consequences of a split brain:
Shared data must not be written by independently operating nodes
The easiest way to prevent this is to forcibly remove a node from the cluster




How are nodes evicted?
EXAMPLE: Heartbeat failure
The network heartbeat between nodes has failed
It is determined which nodes can still talk to each other
A kill request is sent to the node(s) to be evicted
Using all (remaining) communication channels, that is, the Voting Disk(s)
A node is requested to kill itself; executor: typically CSSD



Re-bootless Node Fencing (restart)



Re-bootless Node Fencing (restart)
Fence the cluster, do not reboot the node
Until Oracle Clusterware 11.2.0.2, fencing meant re-boot
With Oracle Clusterware 11.2.0.2, re-boots will be seen less, because:
Re-boots affect applications that might run on a node, but are not protected
Customer requirement: prevent a reboot, just stop the cluster (implemented)
[Diagram: two nodes, each running CSSD, an Oracle RAC DB instance (DB Inst. 1, DB Inst. 2), and a standalone application (App X, App Y)]



Re-bootless Node Fencing (restart)
How it works
With Oracle Clusterware 11.2.0.2, re-boots will be seen less:
Instead of fast re-booting the node, a graceful shutdown of the stack is attempted

Then IO issuing processes are killed; it is made sure that no IO process remains
For a RAC DB mainly the log writer and the database writer are of concern

[Diagram: both nodes still run CSSD and their standalone applications (App X, App Y); only Oracle RAC DB Inst. 1 remains]



Re-bootless Node Fencing (restart)
EXCEPTIONS
With Oracle Clusterware 11.2.0.2, re-boots will be seen less, unless:
IF the check for a successful kill of the IO processes fails: reboot
IF CSSD gets killed during the operation: reboot
IF cssdmonitor is not scheduled: reboot
IF the stack cannot be shut down in short_disk_timeout seconds: reboot
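The four exceptions can be read as a simple decision function. The sketch below only restates the conditions listed above; the function name, parameter names, and return strings are hypothetical, and the 27-second timeout in the usage line is an illustrative value, not a recommendation.

```python
def fencing_action(io_kill_verified, cssd_survived, cssdmonitor_scheduled,
                   shutdown_seconds, short_disk_timeout):
    """Decide between a rebootless stack restart and a node reboot.

    Any single failed condition falls back to a reboot.
    """
    if not io_kill_verified:          # check for killed IO processes failed
        return "reboot"
    if not cssd_survived:             # CSSD was killed during the operation
        return "reboot"
    if not cssdmonitor_scheduled:     # cssdmonitor did not get scheduled
        return "reboot"
    if shutdown_seconds > short_disk_timeout:  # stack shutdown took too long
        return "reboot"
    return "restart stack"


print(fencing_action(True, True, True, shutdown_seconds=12, short_disk_timeout=27))
```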




Troubleshooting Scenarios
Cluster Startup Problem Triage (11.2+)
Cluster Startup Diagnostic Flow
1. Is OHASD running?
ps -ef | grep init.ohasd
ps -ef | grep ohasd.bin
If NO: check crsctl config crs and ohasd.log. If the cause is obvious, engage the Sysadmin Team; otherwise run TFA Collector and engage Oracle Support.
2. If YES: are the daemons and agents of the startup sequence running?
ps -ef | grep cssdagent
ps -ef | grep ocssd.bin
ps -ef | grep orarootagent
ps -ef | grep ctssd.bin
ps -ef | grep crsd.bin
ps -ef | grep cssdmonitor
ps -ef | grep oraagent
ps -ef | grep ora.asm
ps -ef | grep gpnpd.bin
ps -ef | grep mdnsd.bin
ps -ef | grep evmd.bin
If NO: check ohasd.log, the agent logs, and the process logs. If the cause is obvious, engage the Sysadmin Team; otherwise run TFA Collector and engage Oracle Support.
3. If YES: check the stack as a whole
crsctl check crs
crsctl check cluster
If a problem remains: check ohasd.log and OLR permissions, and compare with a reference system. If the cause is obvious, engage the Sysadmin Team; otherwise run TFA Collector and engage Oracle Support.
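The process checks in this flow are repetitive and easy to script. The following is a minimal sketch that applies the grep patterns from the slide to captured `ps -ef` output; the helper name `missing_processes`, the pattern list names, and the sample output (including its paths) are hypothetical.

```python
import re
import subprocess

# Patterns taken from the triage slide, grouped by step.
OHASD_PATTERNS = ["init.ohasd", "ohasd.bin"]
STACK_PATTERNS = [
    "cssdagent", "ocssd.bin", "orarootagent", "ctssd.bin", "crsd.bin",
    "cssdmonitor", "oraagent", "ora.asm", "gpnpd.bin", "mdnsd.bin", "evmd.bin",
]


def missing_processes(ps_output, patterns):
    """Return the patterns that match no line of the ps output (grep-like)."""
    lines = ps_output.splitlines()
    return [p for p in patterns if not any(re.search(p, line) for line in lines)]


def ps_ef():
    """Capture live `ps -ef` output; call this on a cluster node."""
    return subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout


# Hypothetical sample: OHASD is up, but only cssdagent of the stack has started.
sample = """\
root  1201     1  0 10:00 ?  00:00:00 /bin/sh /etc/init.d/init.ohasd run
root  1350     1  0 10:00 ?  00:00:05 /u01/app/grid/bin/ohasd.bin reboot
root  1500     1  0 10:01 ?  00:00:01 /u01/app/grid/bin/cssdagent
"""
print("OHASD missing:", missing_processes(sample, OHASD_PATTERNS))
print("Stack missing:", missing_processes(sample, STACK_PATTERNS))
```

On a real node, `missing_processes(ps_ef(), STACK_PATTERNS)` would show which part of the startup sequence to investigate first.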



Troubleshooting Scenarios
Cluster Startup Problem Triage

Multicast Domain Name Service Daemon (mDNS(d))


Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to perform
name resolution. The mDNS process is a background process on Linux and UNIX and on
Windows.
Uses multicast for cache updates on service advertisement arrival/departure.
Advertises/serves on all found node interfaces.
Log is GI_HOME/log/<node>/mdnsd/mdnsd.log



Troubleshooting Scenarios
Cluster Startup Problem Triage
<?xml version="1.0" encoding="UTF-8"?>
<gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-
pnp.org/2005/11/gpnp-profile" xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile
gpnp-profile.xsd" ProfileSequence="6" ClusterUId="b1eec1fcdd355f2bbf7910ce9cc4a228" ClusterName="staij-cluster"
PALocation="">
<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth0" Use="public"/>
<gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth1" Use="cluster_interconnect"/>
</gpnp:HostNetwork></gpnp:Network-Profile>
<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
<orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+SYSTEM/staij-cluster/asmparameterfile/registry.253.693925293"/>
<ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod
Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><ds:SignatureMethod Algorithm="http://www.w3.org/2001/10/xml-
exc-c14n#"> <InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl
xsi"/></ds:Transform></ds:Transforms><ds:DigestMethod
Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>x1H9LWjyNyMn6BsOykHhMvxnP8U=</ds:DigestValue
></ds:Reference></ds:SignedInfo><ds:SignatureValue>N+20jG4=</ds:SignatureValue></ds:Signature>
</gpnp:GPnP-Profile>
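When triaging startup problems it helps to see quickly which adapters the profile assigns to the public network and to the interconnect. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; it works on a trimmed, well-formed excerpt of the network section (attribute quoting as assumed here), and the function name `list_networks` is hypothetical.

```python
import xml.etree.ElementTree as ET

GPNP_NS = {"gpnp": "http://www.grid-pnp.org/2005/11/gpnp-profile"}

# Trimmed excerpt of the profile: only the network section is kept.
PROFILE_EXCERPT = """\
<gpnp:GPnP-Profile Version="1.0"
    xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile"
    ClusterName="staij-cluster">
  <gpnp:Network-Profile>
    <gpnp:HostNetwork id="gen" HostName="*">
      <gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth0" Use="public"/>
      <gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth1" Use="cluster_interconnect"/>
    </gpnp:HostNetwork>
  </gpnp:Network-Profile>
</gpnp:GPnP-Profile>
"""


def list_networks(profile_xml):
    """Return (adapter, subnet, use) for every network in a GPnP profile."""
    root = ET.fromstring(profile_xml)
    return [
        (net.get("Adapter"), net.get("IP"), net.get("Use"))
        for net in root.iterfind(".//gpnp:Network", GPNP_NS)
    ]


for adapter, subnet, use in list_networks(PROFILE_EXCERPT):
    print(f"{adapter}: {subnet} ({use})")
```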



Troubleshooting Scenarios
Cluster Startup Problem Triage
cssd agent and monitor
Same functionality in both agent and monitor
Functionality of several pre-11.2 daemons consolidated in both
OPROCD - system hang
OMON - oracle clusterware monitor
VMON - vendor clusterware monitor
Run realtime with locked down memory, like CSSD
Provides enhanced stability and diagnosability
Logs are
GI_HOME/log/<node>/agent/oracssdagent_root/oracssdagent_root.log
GI_HOME/log/<node>/agent/oracssdmonitor_root/oracssdmonitor_root.log
12c ORACLE_BASE/diag/node/agent/..



Troubleshooting Scenarios
Node Evictions
Node Eviction Diagnostic Flow
[Flowchart: starting from an eviction scenario, review the cluster alert log, ocssd.log, and the system log. Decision points: missing network heartbeat (NHB)? missing disk heartbeat (DHB)? resource starvation (free memory, CPU load, node response)? fenced? obvious? resolved? Outcomes: engage the networking team, the storage team, the sysadmin team, or the appropriate team; otherwise run TFA Collector and engage Oracle Support.]
My Oracle Support notes referenced in the flowchart: 1050693.1, 1534949.1, 1531223.1, 1546004.1, 1328466.1, 1549428.1, 1466639.1



Missing Network Heartbeat (1)
ocssd.log from node 1
===> Sending network heartbeats to other nodes. Normally, this message is output once every 5 messages (seconds)
2016-08-13 17:00:20.023: [ CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
2016-08-13 17:00:20.023: [ CSSD][4096109472]clssnmSendingThread: sent 5 status msgs to all nodes
===> The network heartbeat is not received from node 2 (drrac2) for 15 consecutive seconds.
===> This means that 15 network heartbeats are missing and is the first warning (50% threshold).
2016-08-13 17:00:22.818: [ CSSD][4106599328]clssnmPollingThread: node drrac2 (2) at 50% heartbeat fatal, removal in 14.520 seconds
2016-08-13 17:00:22.818: [ CSSD][4106599328]clssnmPollingThread: node drrac2 (2) is impending reconfig, flag 132108, misstime 15480
===> continuing to send the network heartbeats and log messages once every 5 messages
2016-08-13 17:00:25.023: [ CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
2016-08-13 17:00:25.023: [ CSSD][4096109472]clssnmSendingThread: sent 5 status msgs to all nodes
===> 75% threshold of missing network heartbeat is reached. This is second warning.
2016-08-13 17:00:29.833: [ CSSD][4106599328]clssnmPollingThread: node drrac2 (2) at 75% heartbeat fatal, removal in 7.500 seconds
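Warnings like these are easy to pull out of a long ocssd.log with a regular expression. The following is a minimal sketch, assuming each log entry is on a single line as CSSD writes it; the pattern and the function name `heartbeat_warnings` are hypothetical and cover only the message format shown on this slide.

```python
import re

# Matches "node <name> (<number>) at <pct>% heartbeat fatal, removal in <secs> seconds"
POLL_RE = re.compile(
    r"clssnmPollingThread: node (?P<node>\S+) \((?P<num>\d+)\) "
    r"at (?P<pct>\d+)% heartbeat fatal, removal in (?P<secs>[\d.]+) seconds"
)


def heartbeat_warnings(log_text):
    """Return (node_name, node_number, percent, seconds_to_removal) tuples."""
    return [
        (m.group("node"), int(m.group("num")), int(m.group("pct")), float(m.group("secs")))
        for m in POLL_RE.finditer(log_text)
    ]


log_text = (
    "2016-08-13 17:00:22.818: [ CSSD][4106599328]clssnmPollingThread: node drrac2 (2) "
    "at 50% heartbeat fatal, removal in 14.520 seconds\n"
    "2016-08-13 17:00:25.023: [ CSSD][4096109472]clssnmSendingThread: sent 5 status msgs to all nodes\n"
    "2016-08-13 17:00:29.833: [ CSSD][4106599328]clssnmPollingThread: node drrac2 (2) "
    "at 75% heartbeat fatal, removal in 7.500 seconds\n"
)
for node, num, pct, secs in heartbeat_warnings(log_text):
    print(f"node {node} ({num}): {pct}% warning, removal in {secs:.3f} s")
```

Running the same extraction on the logs of both nodes and comparing timestamps shows whether the nodes lost each other at the same moment.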



Missing Network Heartbeat (2)
===> continuing to send the network heartbeats and log messages once every 5 messages
2016-08-13 17:00:30.023: [ CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
2016-08-13 17:00:30.023: [ CSSD][4096109472]clssnmSendingThread: sent 5 status msgs to all nodes
===> continuing to send the network heartbeats, but the message is logged after 4 messages
2016-08-13 17:00:34.021: [ CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
2016-08-13 17:00:34.021: [ CSSD][4096109472]clssnmSendingThread: sent 4 status msgs to all nodes
===> Last warning shows that 90% threshold of the missing network heartbeat is reached.
===> The eviction will occur in 2.49 seconds.
2016-08-13 17:00:34.841: [ CSSD][4106599328]clssnmPollingThread: node drrac2 (2) at 90% heartbeat fatal, removal in 2.490 seconds, seedhbimpd 1
===> Eviction of node 2 (drrac2) started
2016-08-13 17:00:37.337: [ CSSD][4106599328]clssnmPollingThread: Removal started for node drrac2 (2), flags 0x2040c, state 3, wt4c 0
===> This shows that the node 2 is actively updating the voting disks
2016-08-13 17:00:37.340: [ CSSD][4085619616]clssnmCheckSplit: Node 2, drrac2, is alive, DHB (1281744040, 1396854) more than disk timeout of 27000 after the last NHB (1281744011, 1367154)



Missing Network Heartbeat (3)
===> Evicting node 2 (drrac2)
2016-08-13 17:00:37.340: [ CSSD][4085619616](:CSSNM00007:)clssnmrEvict: Evicting node 2, drrac2, from the cluster in incarnation 169934272, node birth incarnation 169934271, death incarnation 169934272, stateflags 0x24000
===> Reconfigured the cluster without node 2
2016-08-13 17:01:07.705: [ CSSD][4043389856]clssgmCMReconfig: reconfiguration successful, incarnation 169934272 with 1 nodes, local node number 1, master node number 1



Missing Network Heartbeat (4)
ocssd.log from node 2:
===> Logging the message to indicate 5 network heartbeats are sent to other nodes
2016-08-13 17:00:26.009: [ CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
2016-08-13 17:00:26.009: [ CSSD][4062550944]clssnmSendingThread: sent 5 status msgs to all nodes
===> First warning of reaching 50% threshold of missing network heartbeats
2016-08-13 17:00:26.213: [ CSSD][4073040800]clssnmPollingThread: node drrac1 (1) at 50% heartbeat fatal, removal in 14.540 seconds
2016-08-13 17:00:26.213: [ CSSD][4073040800]clssnmPollingThread: node drrac1 (1) is impending reconfig, flag 394254, misstime 15460
===> Logging the message to indicate 5 network heartbeats are sent to other nodes
2016-08-13 17:00:31.009: [ CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
2016-08-13 17:00:31.009: [ CSSD][4062550944]clssnmSendingThread: sent 5 status msgs to all nodes
===> Second warning of reaching 75% threshold of missing network heartbeats
2016-08-13 17:00:33.227: [ CSSD][4073040800]clssnmPollingThread: node drrac1 (1) at 75% heartbeat fatal, removal in 7.470 seconds



Missing Network Heartbeat (5)
===> Logging the message to indicate 4 network heartbeats are sent
2016-08-13 17:00:35.009: [ CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
2016-08-13 17:00:35.009: [ CSSD][4062550944]clssnmSendingThread: sent 4 status msgs to all nodes
===> Third warning of reaching 90% threshold of missing network heartbeats
2016-08-13 17:00:38.236: [ CSSD][4073040800]clssnmPollingThread: node drrac1 (1) at 90% heartbeat fatal, removal in 2.460 seconds, seedhbimpd 1
===> Logging the message to indicate 5 network heartbeats are sent to other nodes
2016-08-13 17:00:40.008: [ CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
2016-08-13 17:00:40.009: [ CSSD][4062550944]clssnmSendingThread: sent 5 status msgs to all nodes
===> Eviction started for node 1 (drrac1)
2016-08-13 17:00:40.702: [ CSSD][4073040800]clssnmPollingThread: Removal started for node drrac1 (1), flags 0x6040e, state 3, wt4c 0
===> Node 1 is actively updating the voting disk, so this is a split brain condition
2016-08-13 17:00:40.706: [ CSSD][4052061088]clssnmCheckSplit: Node 1, drrac1, is alive, DHB (1281744036, 1243744) more than disk timeout of 27000 after the last NHB (1281744007, 1214144)
2016-08-13 17:00:40.706: [ CSSD][4052061088]clssnmCheckDskInfo: My cohort: 2
2016-08-13 17:00:40.707: [ CSSD][4052061088]clssnmCheckDskInfo: Surviving cohort: 1
Missing Network Heartbeat (6)
===> Node 2 is aborting itself to resolve the split brain and ensure the cluster integrity
2016-08-13 17:00:40.707: [ CSSD][4052061088](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, drrac2, is smaller than cohort of 1 nodes led by node 1, drrac1, based on map type 2
2016-08-13 17:00:40.707: [ CSSD][4052061088]###################################
2016-08-13 17:00:40.707: [ CSSD][4052061088]clssscExit: CSSD aborting from thread clssnmRcfgMgrThread
2016-08-13 17:00:40.707: [ CSSD][4052061088]###################################
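The abort message compares two cohorts of equal size and still calls the one led by node 2 "smaller". The sketch below models one reading of that outcome: the larger cohort survives, and a tie is broken in favor of the cohort containing the lowest node number. The tie-break rule is an assumption inferred from this log, not a statement of CSSD's full algorithm, and in 12.2 Node Weighting and CSS_CRITICAL can change the result. The function name `surviving_cohort` is hypothetical.

```python
def surviving_cohort(cohort_a, cohort_b):
    """Pick the cohort that stays in the cluster after a split.

    Each cohort is a list of node numbers. The larger cohort wins;
    on a tie, the cohort with the lowest node number wins (assumed rule).
    """
    def rank(cohort):
        # Sort key: bigger cohorts first, then the lowest leader node number.
        return (-len(cohort), min(cohort))

    return min((cohort_a, cohort_b), key=rank)


# The two-node case from the log: cohorts {1} and {2} are the same size.
print(surviving_cohort([2], [1]))
```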



Missing Network Heartbeat (7)
Observations
1. Both nodes reported missing heartbeats at the same time
2. Both nodes sent heartbeats to other nodes all the time
3. Node 2 aborted itself to resolve split brain

Conclusion
1. This is likely a network problem; engage the network team
2. Check OSWatcher output (netstat and traceroute)
   Configure the private.net file, which is not configured by default
3. Check CHM
4. Check system log



Voting Disk Access Problem (1)
ocssd.log:
===> The first message to indicate a problem accessing the voting disk: the read fails with an I/O error
2016-08-13 18:31:19.787: [ SKGFD][4131736480]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output error
Additional information: 4
Additional information: 721425
Additional information: -1)
)
2016-08-13 18:31:19.787: [ CSSD][4131736480](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 529 of /dev/sdb8
2016-08-13 18:31:19.802: [ CSSD][4131736480]clssnmvDiskAvailabilityChange: voting file /dev/sdb8 now offline
Voting Disk Access Problem (2)
====> The error message that shows a problem accessing the voting disk repeats once every 4 seconds
2016-08-13 18:31:23.782: [ CSSD][150477728]clssnmvDiskOpen: Opening /dev/sdb8
2016-08-13 18:31:23.782: [ SKGFD][150477728]Handle 0xf43fc6c8 from lib :UFS:: for disk :/dev/sdb8:
2016-08-13 18:31:23.782: [ CLSF][150477728]Opened hdl:0xf4365708 for dev:/dev/sdb8:
2016-08-13 18:31:23.787: [ SKGFD][150477728]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output error
Additional information: 4
Additional information: 720913
Additional information: -1)
)
2016-08-13 18:31:23.787: [ CSSD][150477728](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 17 of /dev/sdb8



Voting Disk Access Problem (3)
====> The last error that shows a problem accessing the voting disk.
====> Note that the last message is 200 seconds after the first message
====> because the long disktimeout is 200 seconds
2016-08-13 18:34:37.423: [ CSSD][150477728]clssnmvDiskOpen: Opening /dev/sdb8
2016-08-13 18:34:37.423: [ CLSF][150477728]Opened hdl:0xf4336530 for dev:/dev/sdb8:
2016-08-13 18:34:37.429: [ SKGFD][150477728]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output error
Additional information: 4
Additional information: 720913
Additional information: -1)
)
2016-08-13 18:34:37.429: [ CSSD][150477728](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 17 of /dev/sdb8
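The interval between the first and the last read error can be computed directly from the log timestamps. The following is a minimal sketch using only the standard library; the function name `seconds_between` is hypothetical, and the two timestamps are those of the first error (18:31:19.787) and the last error (18:34:37.429) in this scenario.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # timestamp layout used in ocssd.log


def seconds_between(first, last):
    """Elapsed seconds between two ocssd.log timestamps."""
    delta = datetime.strptime(last, LOG_TIME_FORMAT) - datetime.strptime(first, LOG_TIME_FORMAT)
    return delta.total_seconds()


elapsed = seconds_between("2016-08-13 18:31:19.787", "2016-08-13 18:34:37.429")
print(f"{elapsed:.3f} seconds between first and last read error")
```

The result is just under 200 seconds, which is consistent with the long disk timeout of 200 seconds mentioned above.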



Voting Disk Access Problem (4)
====> This message shows that ocssd.bin tried accessing the voting disk for 200 seconds
2016-08-13 18:34:38.205: [ CSSD][4110736288](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for 200880 ms for voting file /dev/sdb8)
====> ocssd.bin aborts itself with an error message that the majority of voting disks are not available. In this case, there was only one voting disk; had three voting disks been configured, ocssd.bin would not abort as long as two of them remained accessible.
2016-08-13 18:34:38.206: [ CSSD][4110736288](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1
2016-08-13 18:34:38.206: [ CSSD][4110736288]###################################
2016-08-13 18:34:38.206: [ CSSD][4110736288]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
2016-08-13 18:34:38.206: [ CSSD][4110736288]###################################
Conclusion
The voting disk was not available, engage storage team



Troubleshooting Scenarios
Node Eviction Triage

Time synchronisation issue


Cluster Time Synchronisation Services daemon (CTSS)
Provides time management in a cluster for Oracle.
Observer mode when vendor time synchronisation software (for example NTP) is found:
logs the time difference to the CRS alert log
Active mode when no vendor time synchronisation software is found



Troubleshooting Scenarios
Node Eviction Triage
Cluster Ready Services Daemon
The CRSD daemon is primarily responsible for maintaining the availability of application
resources, such as database instances. CRSD is responsible for starting and stopping these
resources, relocating them when required to another node in the event of failure, and
maintaining the resource profiles in the OCR (Oracle Cluster Registry). In addition, CRSD is
responsible for overseeing the caching of the OCR for faster access, and also backing up the
OCR.
Log file is GI_HOME/log/<node>/crsd/crsd.log
Rotation policy: 10-50 MB
Retention policy: 10 logs
Rotation and retention are dynamic in 12.1 and can be changed



Troubleshooting Scenarios
Node Eviction Triage
CRSD oraagent
CRSD's oraagent manages:
all database, instance, service and diskgroup resources
node listeners
SCAN listeners, and ONS
If the Grid Infrastructure owner is different from the RDBMS home owner then you would
have 2 oraagents each running as one of the installation owners. The database, and service
resources would be managed by the RDBMS home owner and other resources by the Grid
Infrastructure home owner.
Log file is
GI_HOME/log/<node>/agent/crsd/oraagent_<user>/oraagent_<user>.log



Troubleshooting Scenarios
Node Eviction Triage

CRSD orarootagent
CRSD's orarootagent manages:
GNS and its VIP
Node VIP
SCAN VIP
network resources.
Log file is
GI_HOME/log/<node>/agent/crsd/orarootagent_root/orarootagent_root.log



Troubleshooting Scenarios
Node Eviction Triage
Agent return codes
Check entry must return one of the following return codes:
ONLINE
UNPLANNED_OFFLINE: Target=online, may be recovered or failed over
PLANNED_OFFLINE
UNKNOWN: Cannot determine; if previously online or partial, then monitor
PARTIAL: Some of a resource's services are available, e.g. instance up but not open
FAILED: Requires clean action
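
As a rough illustration of how a monitoring script might treat these codes, the sketch below maps each return code to the follow-up described on this slide. The enum and the `follow_up` helper are hypothetical and are not the agent framework API; the follow-up wording paraphrases the slide, and PLANNED_OFFLINE has no follow-up listed there.

```python
from enum import Enum


class CheckState(Enum):
    ONLINE = "ONLINE"
    UNPLANNED_OFFLINE = "UNPLANNED_OFFLINE"
    PLANNED_OFFLINE = "PLANNED_OFFLINE"
    UNKNOWN = "UNKNOWN"
    PARTIAL = "PARTIAL"
    FAILED = "FAILED"


# Follow-up per state, paraphrased from the slide (illustrative wording only).
FOLLOW_UP = {
    CheckState.ONLINE: "none",
    CheckState.UNPLANNED_OFFLINE: "recover or fail over (target is online)",
    CheckState.PLANNED_OFFLINE: "no follow-up listed on the slide",
    CheckState.UNKNOWN: "keep monitoring if previously online or partial",
    CheckState.PARTIAL: "some services available, e.g. instance up but not open",
    CheckState.FAILED: "run clean action",
}


def follow_up(code: str) -> str:
    """Look up the follow-up for a return code; raises ValueError for unknown codes."""
    return FOLLOW_UP[CheckState(code)]


print(follow_up("FAILED"))
```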



Troubleshooting Scenarios
Automatic Diagnostic Repository (ADR)

Important logs and traces


In 11.2, only databases use ADR
Grid Infrastructure files are in $GI_HOME/log/<node_name>/<component_name>, for example:
$GI_HOME/log/myHost/cssd
$GI_HOME/log/myHost/alertmyHost.log
In 12c, both Grid Infrastructure and databases use ADR
Different locations for Grid Infrastructure and databases
Grid Infrastructure: alert.log, cssd.log, crsd.log, etc.
Databases: alert.log, background process traces, foreground process traces



Oracle's Database and Clusterware Tools
What if issues were detected before they had an impact?
What if you were notified with a specific diagnosis and corrective actions?
What if resource bottlenecks threatening SLAs were identified early?
What if bottlenecks could be automatically relieved just in time?
What if database hangs and node reboots could be eliminated?
Tools: Hang Manager, Trace File Analyzer, Quality of Service Management, Cluster Health Advisor,
EXAchk, ORAchk, Memory Guard, Cluster Health Monitor, Cluster Verification Utility

Maintains Compliance
with Best Practices and
Alerts Vulnerabilities to
Known Issues

Oracle 12c ORAchk & EXAchk



Why Oracle ORAchk & EXAchk
Automatic proactive warning of problems before they impact you
Health checks for most impactful reoccurring problems
Runs in your environment with no need to send anything to Oracle
Get scheduled health reports sent to you in email
Findings can be integrated into other tools of choice
Common Framework: EXAchk for Engineered Systems, ORAchk for Non Engineered Systems



Oracle Stack Coverage
Oracle Engineered Systems
  Oracle Database Appliance
  Oracle Exadata Database Machine
  Oracle SuperCluster / MiniCluster
  Oracle Private Cloud Appliance
  Oracle Big Data Appliance
  Oracle Exalogic Elastic Cloud
  Oracle Exalytics In-Memory Machine
  Oracle Zero Data Loss Recovery Appliance
Oracle ASR
Oracle Systems
  Oracle Solaris
  Cross stack checks
  Solaris Cluster
  OVN
Oracle Database
  Standalone Database
  Grid Infrastructure & RAC
  Maximum Availability Architecture (MAA) Scorecard
  Upgrade Readiness Validation
  Golden Gate
  Oracle Restart
Oracle Enterprise Manager Cloud Control
  Repository
  Agent
  OMS
Oracle Middleware
  Application Continuity
  Oracle Identity and Access Management Suite (Oracle IAM)
Oracle E-Business Suite
  Oracle Payables
  Oracle Workflow
  Oracle Purchasing
  Oracle Order Management
  Oracle Process Manufacturing
  Oracle Receivables
  Oracle Fixed Assets
  Oracle HCM
  Oracle CRM
  Oracle Project Billing
Oracle Siebel
  Database best practices
Oracle PeopleSoft
  Database best practices
Oracle SAP
  EXAdata best practices



Profiles (EXAchk)
Profiles provide logical grouping of checks which are about similar topics
Run only checks in a specific profile:
./exachk -profile <profile>
Run everything except checks in a specific profile:
./exachk -excludeprofile <profile>

Profile               Description
asm                   ASM Checks
avdf                  Audit Vault Configuration checks
clusterware           Oracle clusterware checks
control_VM            Checks only for Control VM(ec1-vm, ovmm, db, pc1, pc2). No cross node checks
corroborate           Exadata checks needs further review by user to determine pass or fail
dba                   DBA Checks
ebs                   Oracle E-Business Suite checks
eci_healthchecks      Enterprise Cloud Infrastructure Healthchecks
ecs_healthchecks      Enterprise Cloud System Healthchecks
goldengate            Oracle GoldenGate checks
hardware              Hardware specific checks for Oracle Engineered systems
maa                   Maximum Availability Architecture Checks
ovn                   Oracle Virtual Networking
platinum              Platinum certification checks
preinstall            Pre-installation checks
prepatch              Checks to execute before patching
security              Security checks
solaris_cluster       Solaris Cluster Checks
storage               Oracle Storage Server Checks
switch                Infiniband switch checks
sysadmin              Sysadmin checks
user_defined_checks   Run user defined checks from user_defined_checks.xml



Profiles (ORAchk)
Profiles provide logical grouping of checks which are about similar topics
Run only checks in a specific profile:
./orachk -profile <profile>
Run everything except checks in a specific profile:
./orachk -excludeprofile <profile>

Profile               Description
asm                   ASM Checks
bi_middleware         Oracle Business Intelligence checks
clusterware           Oracle clusterware checks
dba                   DBA Checks
ebs                   Oracle E-Business Suite checks
emagent               Cloud control agent checks
emoms                 Cloud Control management server
em                    Cloud control checks
goldengate            Oracle GoldenGate checks
hardware              Hardware specific checks for Oracle Engineered systems
oam                   Oracle Access Manager checks
oim                   Oracle Identity Manager checks
oud                   Oracle Unified Directory server checks
ovn                   Oracle Virtual Networking
peoplesoft            Peoplesoft best practices
preinstall            Pre-installation checks
prepatch              Checks to execute before patching
security              Security checks
siebel                Siebel Checks
solaris_cluster       Solaris Cluster Checks
storage               Oracle Storage Server Checks
switch                Infiniband switch checks
sysadmin              Sysadmin checks
user_defined_checks   Run user defined checks from user_defined_checks.xml



Keep Track of Changes to the Attributes of Important Files
Track changes to the attributes of important files with -fileattr
Looks at all files & directories within Grid Infrastructure and Database homes by default
The list of monitored directories and their contents can be configured to your specific requirements
Use -fileattr start to take the first snapshot:

$ ./orachk -fileattr start


CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to
/u01/app/11.2.0.4/grid?[y/n][y]
Checking ssh user equivalency settings on all nodes in cluster
Node mysrv22 is configured for ssh user equivalency for oradb user
Node mysrv23 is configured for ssh user equivalency for oradb user
List of directories(recursive) for checking file attributes:
/u01/app/oradb/product/11.2.0/dbhome_11203
/u01/app/oradb/product/11.2.0/dbhome_11204
orachk has taken snapshot of file attributes for above directories at:
/orahome/oradb/orachk/orachk_mysrv21_20170504_041214



Keep Track of Changes to the Attributes of Important Files
Compare current attributes against the first snapshot using -fileattr check:
$ ./orachk -fileattr check -includedir "/root/myapp/config" -excludediscovery
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to
/u01/app/12.2.0/grid?[y/n][y]
Checking for prompts on myserver18 for oragrid user...
Checking ssh user equivalency settings on all nodes in cluster
Node myserver17 is configured for ssh user equivalency for root user
List of directories(recursive) for checking file attributes:
/root/myapp/config
Checking file attribute changes...
.
"/root/myapp/config/myappconfig.xml" is different:
Baseline : 0644 oracle root /root/myapp/config/myappconfig.xml
Current : 0644 root root /root/myapp/config/myappconfig.xml
etc
etc

Note:
Use the same arguments with check that you used with start
Will proceed to perform standard health checks after attribute checking
File Attribute Changes will also show in HTML report output
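
If you want to post-process the comparison text yourself, a sketch like the following can pick out which attribute changed. It assumes the Baseline/Current lines have the column layout seen in the sample output above (mode, owner, group, path); that layout is inferred from this one example, and the helper names are hypothetical.

```python
import re

# Assumed layout: "<Baseline|Current> : <mode> <owner> <group> <path>"
LINE = re.compile(r"^(Baseline|Current)\s*:\s*(\d{4})\s+(\S+)\s+(\S+)\s+(\S+)$")


def parse_attr_line(line):
    """Return (kind, attributes) for a Baseline/Current line, or None if it does not match."""
    m = LINE.match(line.strip())
    if not m:
        return None
    kind, mode, owner, group, path = m.groups()
    return kind, {"mode": mode, "owner": owner, "group": group, "path": path}


def changed_fields(baseline_line, current_line):
    """Map each changed attribute to its (baseline, current) pair."""
    _, base = parse_attr_line(baseline_line)
    _, cur = parse_attr_line(current_line)
    return {k: (base[k], cur[k]) for k in ("mode", "owner", "group") if base[k] != cur[k]}


baseline = "Baseline : 0644 oracle root /root/myapp/config/myappconfig.xml"
current = "Current : 0644 root root /root/myapp/config/myappconfig.xml"
print(changed_fields(baseline, current))
```

For the sample above only the owner differs (oracle in the baseline, root now).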



Improve performance of SQL queries
Many new checks focus on known issues in 12c Optimizer as well as SQL Plan Management
All contained in the dba profile: -profile dba

These checks target problems such as:


Wrong results returned
High memory & CPU usage
Errors such as ORA-00600 or ORA-07445
Issues with cursor usage
Other general SQL plan management problems



Oracle Database Security Assessment Tool (DBSAT) included
DBSAT analyzes database configurations and security policies
Uncovers security risks
Improves the security posture of Oracle Databases
All results included within report output under the check:
Validate database security configuration using database security assessment tool



Upgrade to Database 12.2 with confidence
New checks to help when upgrading the database to 12.2
Both pre and post upgrade verification to prevent problems related to:
OS configuration
Grid Infrastructure & Database patch prerequisites
Database configuration
Cluster configuration
Pre upgrade: -u -o pre
Post upgrade: -u -o post



Oracle Health Checks Collection Manager
New Collection Manager app built on APEX 5 theme
Tabs replaced with drop down menus for easier navigation
ORAchk & EXAchk continue to ship with APEX 4 app too
No more new functionality in the APEX 4 app, all new features will go into the APEX 5 app



Enterprise Manager Integration

Check results integrated into EM compliance framework via plugin
View results in native EM compliance dashboards
Related checks grouped into compliance standards
View targets checked, violations & average score
Drill down into compliance standard to see individual check results
View break down by target



Provision
Use the Enterprise Manager provisioning feature and select ORAchk/EXAchk
Once selected, this launches the provisioning wizard; choose the system type



View Results by Compliance Standard
Drill into applicable standard and view individual checks & target status
Filter by Exachk%
Click individual checks for recommendation details



JSON Output to Integrate with Kibana, Elasticsearch etc
The JSON provides many tags to allow dashboard filtering based on facts such as:
Engineered System type
Engineered System version
Hardware type
Node name
OS version
Rack identifier
Rack type
Database version
And more...
Kibana can be used to view health check compliance across your data center
Results can also be filtered based on any combination of exposed system attributes
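
Outside Kibana, the same kind of tag-based filtering can be sketched in plain Python. The tag names below (NodeName, OSVersion, DBVersion, Status) and the sample records are hypothetical placeholders chosen for illustration; check the JSON your ORAchk/EXAchk version actually writes for the real field names.

```python
import json


def filter_results(records, **wanted):
    """Keep records whose tags equal every requested key=value pair."""
    return [r for r in records if all(r.get(k) == v for k, v in wanted.items())]


# Hypothetical sample records; real output uses its own tag names.
raw = """
[
  {"NodeName": "myserver69", "OSVersion": "OL7", "DBVersion": "12.2.0.1", "Status": "FAIL"},
  {"NodeName": "myserver70", "OSVersion": "OL7", "DBVersion": "12.2.0.1", "Status": "PASS"},
  {"NodeName": "myserver71", "OSVersion": "OL6", "DBVersion": "12.1.0.2", "Status": "FAIL"}
]
"""

records = json.loads(raw)
failing_ol7 = filter_results(records, OSVersion="OL7", Status="FAIL")
print([r["NodeName"] for r in failing_ol7])
```

Any combination of tags can be passed as keyword arguments, which mirrors the "any combination of exposed system attributes" filtering described above.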



Speeds Issue Diagnosis,
Triage and Resolution

Oracle 12c Trace File Analyzer

Why TFA?

Provides one interface for all diagnostic needs
Collects data across the cluster and consolidates it in one place
Collects all relevant diagnostic data at the time of the problem
Reduces time required to obtain diagnostic data, which saves your business money

Supported Platforms and Versions
All major Operating Systems are supported:
Linux (OEL, RedHat, SUSE, Itanium & zLinux)
Oracle Solaris (SPARC & x86-64)
AIX
HPUX (Itanium & PA-RISC)
Windows
All Oracle Database & Grid versions 10.2+ are supported
You probably already have TFA installed as it is included with:
Oracle Grid Infrastructure 11.2.0.4+, 12.1.0.2+, 12.2.0.1+
Oracle Database 12.2.0.1+
Updated quarterly via 1513912.1
OS versions supported are the same as those supported by the Database
Java Runtime Edition 1.8 required



Linux / Unix Installation
Root / Daemon Install
1. Download from 1513912.1
2. Copy to one required machine and unzip
3. Run ./installTFA<platform>
Will:
Install on all nodes
Auto discover relevant Oracle Software & Exadata Storage Servers
Start monitoring for problems & perform auto collections

Non root / Non Daemon Install
1. Download from 1513912.1
2. Copy to every required machine and unzip
3. Run ./installTFA<platform> -extractto <install_dir> -javahome <jre_home>
Will:
Only install on current host
Not do automatic collections
Not collect from remote hosts
Not collect files unreadable by install user

Recommended install location: /opt/oracle.tfa



Architecture
TFA daemon runs on each cluster node
Or single instance when no Grid Infrastructure is used
Command line communication is via tfactl command
TFA Daemons on all nodes coordinate:
Script execution
Collection of diagnostics
Trimming of log contents
Cluster wide collection output is consolidated on one node
The daemon is only used when installed as root
[Figure: tfactl on the initiator node (where the command originated) talks to the local TFA daemon, which coordinates with the TFA daemons on remote nodes 1 to n; each daemon runs scripts and reads alerts & log files, and the cluster wide collection is gathered on the initiator node]



Automatic Diagnostic Collections
Significant problem occurs in Oracle Grid Infrastructure & Database(s). Oracle Trace File Analyzer then:
1. Automatically detect event
2. Collect & package relevant diagnostics
3. Notify relevant DBA and or Sys Admin by email
4. DBA(s) / Sys Admin(s) upload collection to Oracle Support for further help



Command Interfaces
Command line
Specify all command options at the command line:
tfactl <command>
Shell
1. Set and change context
2. Run commands from within the shell:
tfactl
tfactl > database MyDB
MyDB tfactl > oratop
Menu
1. Select menu navigation options then choose the command you want to run:
tfactl menu



Maintain
Option 1
Applying standard PSUs will automatically update TFA
PSUs do not contain Support Tools Bundle updates
Option 2
To update with latest TFA & Support Tools Bundle:
1. Download latest version: 1513912.1
2. Repeat the same installation steps

Upgrade to the latest version whenever possible to include bug fixes, new features & optimizations



View System & Cluster Summary

Choose an option to drill down further
Quick summary of status of key components



Summary ASM Drill Down Example

ASM Overview: ASM cluster wide summary
ASM cluster wide status: problems found on myserver69
Also disk space warning on both servers



Summary ASM Drill Down Example
View ASM problems for myserver69
View node wise & drill into myserver69
View ASM status summary for myserver69
View recent problems detected
View component status



Investigate Logs & Look for Errors
Analyze all important recent log entries:
tfactl analyze -last 1d
Search recent log entries (here searching for ora-00600):
tfactl analyze -search "ora-00600" -last 8h



Perform Analysis Using the Included Tools
Tool               Description
orachk or exachk   Provides health checks for the Oracle stack. Oracle Trace File Analyzer will install either
                   Oracle EXAchk for Engineered Systems, see document 1070954.1 for more details
                   or Oracle ORAchk for all non-Engineered Systems, see document 1268927.2 for more details
oswatcher          Collects and archives OS metrics. These are useful for instance or node evictions &
                   performance issues. See document 301137.1 for more details
procwatcher        Automates & captures database performance diagnostics and session level hang
                   information. See document 459694.1 for more details
oratop             Provides near real-time database monitoring. See document 1500864.1 for more details.
sqlt               Captures SQL trace data useful for tuning. See document 215187.1 for more details.
alertsummary       Provides summary of events for one or more database or ASM alert files from all nodes
ls                 Lists all files TFA knows about for a given file name pattern across all nodes
pstack             Generate process stack for specified processes across all nodes
grep               Search alert or trace files with a given database and file name pattern, for a search string.
summary            Provides high level summary of the configuration
vi                 Opens alert or trace files for viewing a given database and file name pattern in the vi editor
tail               Runs a tail on an alert or trace files for a given database and file name pattern
param              Shows all database and OS parameters that match a specified pattern
dbglevel           Sets and unsets multiple CRS trace levels with one command
history            Shows the shell history for the tfactl shell
changes            Reports changes in the system setup over a given time period. This includes database
                   parameters, OS parameters and patches applied
calog              Reports major events from the Cluster Event log
events             Reports warnings and errors seen in the logs
managelogs         Shows disk space usage and purges ADR log and trace files
ps                 Finds processes
triage             Summarize oswatcher/exawatcher data

Not all tools are included in Grid or Database install.
Download from 1513912.1 to get full collection of tools
Verify which tools you have installed: tfactl toolstatus



OS Watcher (Support Tools Bundle)

Collect & Archive OS Metrics


Executes standard UNIX utilities (e.g. vmstat, iostat, ps,
etc) on regular intervals
Built in Analyzer functionality to summarize, graph and
report upon collected metrics
Output is Required for node reboot and performance
issues
Simple to install, extremely lightweight
Runs on ALL platforms (Except Windows)
MOS Note: 301137.1 OS Watcher Users Guide



Procwatcher (Support Tools Bundle)

Monitor & Examine Database Processes


Single instance & RAC
Generates session wait, lock and latch reports as well as call stacks
from any problem process(es)
Ability to collect stack traces of specific processes using Oracle Tools
and OS Debuggers
Typically reduces SR resolution time for performance related issues
Runs on ALL major UNIX Platforms
MOS Note: 459694.1 Procwatcher Install Guide



oratop (Support Tools Bundle)

Near Real-Time Database Monitoring


Single instance & RAC
Monitoring current database activities
Database performance
Identifying contention and bottlenecks



Analyze
Each tool can be run using tfactl in shell mode
Start tfactl shell with: tfactl
Run a tool with the tool name: tfactl > orachk
1. Where necessary set context with database <dbname>: tfactl > database MyDB
2. Then run tool: MyDB tfactl > oratop
3. Clear context with database: MyDB tfactl > database



One Command SRDCs
For certain types of problems Oracle Support will ask you to run a Service Request Data Collection (SRDC)
Previously this would have involved:
Reading many different support documents
Collecting output from many different tasks
Gathering lots of different diagnostics
Packaging & uploading
Now just run:

tfactl diagcollect -srdc <srdc_type>



Faster & Easier SR Data Collection
tfactl diagcollect -srdc <srdc_type>

Type of Problem                          SRDC Types                       Collection Scope
ORA Errors                               ORA-00600, ORA-00700,            Local only
                                         ORA-04030, ORA-04031,
                                         ORA-07445, ORA-27300,
                                         ORA-27301, ORA-27302
Other internal database errors           internalerror                    Local only
Database performance problems            dbperf                           Cluster wide
Database patching problems               dbpatchinstall (New)             Local only
                                         dbpatchconflict (New)
Database install / upgrade problems      dbinstall (New)                  Local only
                                         dbupgrade (New)
Enterprise Manager tablespace usage      emtbsmetrics (New)               Local only (on EM Agent target)
metric problems
Enterprise Manager general metrics       emdebugon (New)                  Local only (on EM Agent target & OMS)
page or threshold problems -             emdebugoff (New)
Run all three SRDCs                      emmetricalert (New)              Local only (on EM Agent target & Repository DB)



One Command SRDCs: Examples of What's Collected
ORA-4031: tfactl diagcollect -srdc ora4031
1. IPS Package
2. Patch Listing
3. AWR report
4. Memory information
5. RDA

Database Performance: tfactl diagcollect -srdc dbperf
1. ADDM report
2. AWR for good and problem period
3. AWR Compare Period report
4. ASH report for good and problem period
5. OS Watcher
6. IPS Package (if errors during problem period)
7. ORAchk (performance related checks)



Manual Data Gathering vs One Command SRDC
Manual Data Gathering
1. Generate ADDM reviewing Document 1680075.1
2. Identify good and problem periods and gather AWR reviewing Document 1903158.1
3. Generate AWR compare report (awrddrpt.sql) using good and problem periods
4. Generate ASH report for good and problem periods reviewing Document 1903145.1
5. Collect OSWatcher data reviewing Document 301137.1
6. Check alert.log if there are any errors during the problem period
7. Find any trace files generated during the problem period
8. Collate and upload all the above files/outputs to SR

TFA SRDC
1. Run tfactl diagcollect -srdc dbperf
2. Upload resulting zip file to SR



One Command SRDC
Interactive Mode
tfactl diagcollect -srdc <srdc_type>
1. Enter default for event date/time and database name
2. Scans system to identify recent 10 events in the system (ORA600 example shown)
3. Once the relevant event is chosen, proceeds with diagnostic collection
4. All required files are identified
5. Trimmed where applicable
6. Package in a zip ready to provide to support



One Command SRDC
Silent Mode
tfactl diagcollect -srdc <srdc_type> -database <db> -for <time>
1. Parameters (date/time, DB name) are provided in the command
2. Does not prompt for any more information
3. All required files are identified
4. Trimmed where applicable
5. Package in a zip ready to provide to support



Default Collection
Run a default diagnostic collection if there is not yet an SRDC about your problem:
tfactl diagcollect
Will trim & collect all important log files updated in the past 12 hours
Collections stored in the repository directory
Change diagcollect timeframe with -last <n>h|d

Automatic Database Log Purge
TFA can automatically purge database logs
OFF by default, except on a Domain Service Cluster (DSC), where it is ON by default
Turn auto purging on or off: tfactl set manageLogsAutoPurge=<ON|OFF>
Will remove logs older than 30 days, configurable with: tfactl set manageLogsAutoPurgePolicyAge=<n><d|h>
Purging runs every 60 minutes, configurable with: tfactl set manageLogsAutoPurgeInterval=<minutes>

Manual Database Log Purge
TFA can manage ADR log and trace files
Show disk space usage of individual diagnostic destinations
Purge these file types based on diagnostic location and or age:
ALERT, INCIDENT, TRACE, CDUMP, HM, UTSCDMP, LOG
tfactl managelogs <options>

Option                              Description
-show usage                         Shows disk space usage per diagnostic directory for both GI and database logs
-show variation -older <n><m|h|d>   Use to determine per directory disk space growth.
                                    Shows the disk usage variation for the specified period per directory.
-purge -older <n><m|h|d>            Remove all ADR files under the GI_BASE directory, which are older than the time specified
-gi                                 Restrict command to only diagnostic files under the GI_BASE
-database [all | dbname]            Restrict command to only diagnostic files under the database directory. Defaults to all,
                                    alternatively specify a database name
-dryrun                             Use with -purge to estimate how many files will be affected and how much disk space will be
                                    freed by a potential purge command. May take a while for a large number of files

Runs as the ADR home owner, so it will only be able to purge files this owner has permission to delete
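
To make the dry-run idea concrete, here is a small sketch of an age-based estimate: parse an `<n><m|h|d>` age, then count the files and bytes that would fall before the cutoff. This only illustrates the concept and is not how tfactl is implemented; `parse_age`, `dry_run` and the file list are hypothetical.

```python
from datetime import datetime, timedelta

UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_age(spec: str) -> timedelta:
    """Parse an age such as '30d', '12h' or '90m' into a timedelta."""
    n, unit = int(spec[:-1]), spec[-1]
    return timedelta(**{UNITS[unit]: n})


def dry_run(files, older: str, now: datetime):
    """files: iterable of (path, mtime, size_bytes). Returns (file_count, bytes_freed)."""
    cutoff = now - parse_age(older)
    hits = [size for _path, mtime, size in files if mtime < cutoff]
    return len(hits), sum(hits)


# Hypothetical file list with fixed timestamps so the result is reproducible.
now = datetime(2017, 5, 31, 12, 0, 0)
files = [
    ("trace/old_ora_1234.trc", datetime(2017, 4, 1), 100),
    ("trace/new_ora_5678.trc", datetime(2017, 5, 30), 50),
]
print(dry_run(files, "30d", now))
```

Only the first file is older than 30 days, so the estimate is one file and 100 bytes; nothing is deleted.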

Manual Database Log Purge
tfactl managelogs -show usage
tfactl managelogs -show variation -older <n><m|h|d>
Use -gi to only show grid infrastructure
Use -database to only show database

Manual Database Log Purge
tfactl managelogs -purge -older n<m|h|d> -dryrun
tfactl managelogs -purge -older n<m|h|d>
Use -dryrun for a what if

Disk Usage Snapshots
TFA will track disk usage and record snapshots to:
tfa/repository/suptools/<node>/managelogs/usage_snapshot/
Snapshot happens every 60 minutes, configurable with:
tfactl set diskUsageMonInterval=<minutes>

Disk usage monitoring is ON by default, configurable with:
tfactl set diskUsageMon=<ON|OFF>

Collect
Trim & collect all important log files updated in the past 12 hours: tfactl diagcollect
Collections stored in the repository directory
Change diagcollect timeframe with -since <n>h|d
Collect a problem specific Service Request Data Collection (SRDC): tfactl diagcollect -srdc ora600
For list of types of srdc collections use tfactl diagcollect -srdc help

TFA dbglevel profiles
Example
tfactl dbglevel -set node_eviction
would be used for enhancing diagnostics when node evictions are being
investigated and would perform the following operations internally
crsctl set log css "CSSD=4"
crsctl set log css "CSSDNMC=4"
crsctl set log css "CLSF=4"
crsctl set log css "CSSDGMCC=4"
crsctl set log css "CSSDGMPC=4"

To revert to the original or default logging levels, the following command

$ tfactl dbglevel -unset node_eviction
would perform the following operations internally
crsctl set log css "CSSD=2"
crsctl set log css "CSSDNMC=2"
crsctl set log css "CLSF=0"
crsctl set log css "CSSDGMCC=2"
crsctl set log css "CSSDGMPC=2"
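
A profile like this is essentially a table of component levels. The sketch below models the node_eviction profile from this slide as data and generates the same crsctl command strings; it does not run anything, and the dictionary layout and the `crsctl_commands` helper are assumptions made for illustration, not how TFA stores its profiles.

```python
# component -> (level when the profile is set, level after it is unset), taken from the slide
NODE_EVICTION = {
    "CSSD": (4, 2),
    "CSSDNMC": (4, 2),
    "CLSF": (4, 0),
    "CSSDGMCC": (4, 2),
    "CSSDGMPC": (4, 2),
}


def crsctl_commands(profile, unset=False):
    """Build the crsctl command strings for setting or unsetting a profile."""
    idx = 1 if unset else 0
    return [f'crsctl set log css "{component}={levels[idx]}"'
            for component, levels in profile.items()]


for cmd in crsctl_commands(NODE_EVICTION):
    print(cmd)
for cmd in crsctl_commands(NODE_EVICTION, unset=True):
    print(cmd)
```

Keeping the set and unset levels side by side makes it easy to see that CLSF returns to 0 while the other components return to 2.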
Incident Based Collections with SRDC
Incident Type   Description
ora4030         For ORA-04030 errors
ora4031         For ORA-04031 errors
dbperf          For basic db performance problems
ora600          For ORA-00600 errors
ora700          For ORA-00700 errors
ora7445         For ORA-07445 errors

Use srdc <incident type>: tfactl srdc ora4030
To specify sid use -sid <oracle sid>
To specify database use -db <dbname>
To specify incident date & time use -inc_date <YYYY-MM-DD> -inc_time <HH:MM:SS>
To upload directly to the SR use -sr <SR#>

tfactl srdc ora4030 -sid orcl -db RDBMS121 \
-inc_date 2016-06-15 -inc_time 02:48:23 \
-sr 3-123456789

For dbperf use these parameters to specify the good & bad performance periods to compare:

Parameter       Description
perf_base_sd    Start date for a good performance period
perf_base_st    Start time for a good performance period
perf_base_ed    End date for a good performance period
perf_base_et    End time for a good performance period
perf_comp_sd    Start date for a bad performance period
perf_comp_st    Start time for a bad performance period
perf_comp_ed    End date for a bad performance period
perf_comp_et    End time for a bad performance period

tfactl srdc dbperf -db RDBMS121 \
-perf_base_sd 2016-06-15 -perf_base_st 01:30:00 \
-perf_base_ed 2016-06-15 -perf_base_et 02:00:00 \
-perf_comp_sd 2016-06-16 -perf_comp_st 09:30:00 \
-perf_comp_ed 2016-06-16 -perf_comp_et 10:00:00
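
Because the good and bad periods are each split into four date and time parameters, it is easy to mistype one. A hedged sketch of a pre-check: validate each period against the YYYY-MM-DD and HH:MM:SS formats shown above, confirm the end is after the start, and produce the eight parameter values. The `period` helper is hypothetical and only prepares values; it does not call tfactl.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def period(prefix: str, start: str, end: str) -> dict:
    """Validate a period and split it into <prefix>_sd/_st/_ed/_et values."""
    s, e = datetime.strptime(start, FMT), datetime.strptime(end, FMT)
    if e <= s:
        raise ValueError(f"{prefix}: end must be after start")
    return {
        f"{prefix}_sd": s.strftime("%Y-%m-%d"),
        f"{prefix}_st": s.strftime("%H:%M:%S"),
        f"{prefix}_ed": e.strftime("%Y-%m-%d"),
        f"{prefix}_et": e.strftime("%H:%M:%S"),
    }


params = {
    **period("perf_base", "2016-06-15 01:30:00", "2016-06-15 02:00:00"),
    **period("perf_comp", "2016-06-16 09:30:00", "2016-06-16 10:00:00"),
}
for name, value in params.items():
    print(name, value)
```

The sample values are the ones used in the dbperf example on this slide.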

Generates Diagnostic
Metrics View of Cluster
and Databases

Oracle 12c Cluster Health Monitor

Cluster Health Monitor (CHM)
Generates Diagnostic Metrics View of Cluster and Databases

Always on - Enabled by default
Provides Detailed OS Resource Metrics
Assists Node eviction analysis
Locally logs all process data
User can define pinned processes
Listens to CSS and GIPC events
Categorizes processes by type
Supports plug-in collectors (ex. traceroute, netstat, ping, etc.)
New CSV output for ease of analysis
[Figure: osysmond on each node collects OS data and passes it to ologgerd (master), which stores it in the GIMR (12c Grid Infrastructure Management Repository)]


Cluster Health Monitor (CHM)
Oclumon CLI or Full Integration with EM Cloud Control

Oracle 12c Cluster Health Advisor
Discovers Potential Cluster & DB Problems - Notifies with Corrective Actions

Cluster Health Advisor
CHA/DB Health: I/O problem
CHA has detected a service degradation due to higher than expected I/O latencies.

Problem: The degradation is caused by a higher than expected utilization of shared storage devices for this database. No evidence of a significant increase in I/O demand on the local node.
Confidence: 95.17%
Action: Validate whether there is an increase in I/O demand on nodes other than the local node, and find I/O-intensive SQL. Add more disks to the disk group or move the database to faster disks.
(Screenshot labels: proddb_1, proddb_2)
Cluster Health Advisor Daemon

Dependencies on the Grid Infrastructure Management Repository (GIMR)

Command Line Tool - chactl

Cluster Health Advisor

Will only monitor cluster initially
Tell it to monitor the database:

chactl monitor database -db <db_name>

Cluster Health Advisor - diagnosis

Query the cluster diagnosis for incidents and recommendations:
chactl query diagnosis

Query a specific database for diagnosis:
chactl query diagnosis -db <db_name>

Query the repository footprint:
chactl query repository

Oracle 12c Database Hang Manager
Autonomously Preserves Database Availability and Performance

Debugging Live Systems: Hangs
Parsing the system state dump can be very time consuming.
To debug a hang more quickly you could query v$session.blocking_session:

select sess.sid sid, substr(proc.program,0,25) prog,
       substr(sw.event,0,15) event, sw.wait_time wt,
       sess.blocking_session bsid
from v$process proc, v$session sess, v$session_wait sw
where proc.addr = sess.paddr
  and sess.status = 'ACTIVE'
  and sw.sid = sess.sid
order by prog;

SID   Program                   Event           WT  BSID
----- ------------------------- --------------- --- -----
2836  oracle@fstsun002 (S000)   enq: TM - conte 0   2979
2690  oracle@fstsun002 (S001)   enq: TM - conte 0   2979
2531  oracle@fstsun002 (S002)   enq: TM - conte 0   2979
2811  oracle@fstsun002 (S003)   enq: TM - conte 0   2979
2979  oracle@fstsun002 (TNS V1- enq: TM - conte 0   2853
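In the output above, four shared servers wait on SID 2979, which in turn waits on SID 2853, so the session to investigate is the one at the head of the chain. As a minimal sketch of that reasoning (the function name root_blockers is hypothetical, and the rows are the SID/BSID pairs copied from the sample output), the chain can be followed like this:

```python
def root_blockers(rows):
    """Map each waiting SID to the session at the head of its blocking chain.

    rows: iterable of (sid, blocking_sid) pairs; blocking_sid may be None.
    """
    blocked_by = {sid: bsid for sid, bsid in rows if bsid is not None}
    roots = {}
    for sid in blocked_by:
        seen = {sid}
        current = blocked_by[sid]
        # Follow the chain until we reach a session that is not itself
        # waiting on another listed session, or until a cycle is detected.
        while current in blocked_by and current not in seen:
            seen.add(current)
            current = blocked_by[current]
        roots[sid] = current
    return roots


# (SID, BSID) pairs taken from the sample query output.
rows = [(2836, 2979), (2690, 2979), (2531, 2979), (2811, 2979), (2979, 2853)]
roots = root_blockers(rows)
print(sorted(set(roots.values())))
```

The seen set guards against looping forever if the rows describe a deadlock, where sessions block each other in a cycle.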

Debugging Live Systems: Hangs
sqlplus -prelim / as sysdba is useful because it avoids creating a process state object (PSO), which requires various resources such as latches.
Trying to acquire those resources may cause your debugger session to hang.
Some dumps/commands may require a PSO; you can execute those in an existing process that already has one:

$ sqlplus -prelim "/ as sysdba"
SQL> oradebug setorapid 9
SQL> oradebug dump systemstate 3

Oracle 12c Hang Manager
Autonomously Preserves Database Availability and Performance
Always on - Enabled by default
Reliably detects database hangs and deadlocks
Autonomously resolves them
Supports QoS Performance Classes, Ranks and Policies to maintain SLAs
Logs all detections and resolutions
New SQL interface to configure sensitivity (Normal/High) and trace file sizes

[Diagram: DIA0 takes a session through DETECT, EVALUATE (Hung?), ANALYZE (QoS Policy) and VERIFY, then selects a Victim]
Oracle 12c Hang Manager
Full Resolution Dump Trace File and DB Alert Log Audit Reports
Dump trace file:

Dump file /diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
Oracle Database 12c Enterprise Edition Release 12.2.0.0.0 - 64bit Beta
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options
Build label: RDBMS_MAIN_LINUX.X64_151013
ORACLE_HOME: /3775268204/oracle
System name: Linux
Node name: slc05kyr
Release: 2.6.39-400.211.1.el6uek.x86_64
Version: #1 SMP Fri Nov 15 13:39:16 PST 2013
Machine: x86_64
VM name: Xen Version: 3.4 (PVM)
Instance name: hm62
Redo thread mounted by this instance: 2
Oracle process number: 19
Unix process pid: 12656, image: oracle@slc05kyr (DIA0)

*** 2015-10-13T16:47:59.541509+17:00
*** SESSION ID:(96.41299) 2015-10-13T16:47:59.541519+17:00
*** CLIENT ID:() 2015-10-13T16:47:59.541529+17:00
*** SERVICE NAME:(SYS$BACKGROUND) 2015-10-13T16:47:59.541538+17:00
*** MODULE NAME:() 2015-10-13T16:47:59.541547+17:00
*** ACTION NAME:() 2015-10-13T16:47:59.541556+17:00
*** CLIENT DRIVER:() 2015-10-13T16:47:59.541565+17:00

DB alert log:

2015-10-13T16:47:59.435039+17:00
Errors in file /oracle/log/diag/rdbms/hm6/hm6/trace/hm6_dia0_12433.trc (incident=7353):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: /diag/rdbms/hm6/hm6/incident/incdir_7353/hm6_dia0_12433_i7353.trc
2015-10-13T16:47:59.506775+17:00
DIA0 requesting termination of session sid:40 with serial # 43179 (ospid:13031) on instance 2
due to a GLOBAL, HIGH confidence hang with ID=1.
Hang Resolution Reason: Automatic hang resolution was performed to free a
significant number of affected sessions.
DIA0: Examine the alert log on instance 2 for session termination status of hang with ID=1.

In the alert log on the instance local to the session (instance 2 in this case), we see the following:

2015-10-13T16:47:59.538673+17:00
Errors in file /diag/rdbms/hm6/hm62/trace/hm62_dia0_12656.trc (incident=5753):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: /diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
2015-10-13T16:48:04.222661+17:00
DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1
requested by master DIA0 process on instance 1
Hang Resolution Reason: Automatic hang resolution was performed to free a
significant number of affected sessions.
by terminating session sid:40 with serial # 43179 (ospid:13031)
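When auditing Hang Manager activity, it can help to extract the terminated sessions from the alert log instead of reading it by eye. The following is a minimal sketch that assumes the message wording shown in the excerpt above ("DIA0 terminating blocker (ospid: ... sid: ... ser#: ...) of hang with ID = ..."); other releases may word it differently, and the names TERMINATION and find_terminations are hypothetical.

```python
import re

# Pattern based only on the alert log line shown in the slide excerpt.
TERMINATION = re.compile(
    r"DIA0 terminating blocker \(ospid: (?P<ospid>\d+) sid: (?P<sid>\d+) "
    r"ser#: (?P<serial>\d+)\) of hang with ID = (?P<hang_id>\d+)"
)


def find_terminations(alert_log_text):
    """Return one dict per blocker termination found in the alert log text."""
    return [
        {name: int(value) for name, value in match.groupdict().items()}
        for match in TERMINATION.finditer(alert_log_text)
    ]


# Lines copied from the instance 2 alert log excerpt.
sample = """2015-10-13T16:48:04.222661+17:00
DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1
requested by master DIA0 process on instance 1
"""
events = find_terminations(sample)
print(events)
```

The sid and serial values identify the terminated session, and ospid the operating system process, so they can be matched against the incident trace file named in the same log entry.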

Oracle Domain Services Cluster (DSC)
Deploys with Minimum Footprint and Maximum Manageability

Oracle 12c Domain Services Cluster (DSC)
Deploys with Minimum Footprint and Maximum Manageability
Hosts Framework as Services
Reduces local resource footprint
Centralizes management
Speeds deployment and patching
Optional Shared Storage
Supports multiple versions and platforms going forward

[Diagram: ORACLE CLUSTER DOMAIN, with Application Member Clusters and Database Member Clusters around the Oracle Domain Services Cluster, which hosts the Management Repository Service, Trace File Analyzer Receiver, ORAchk Collection Service, Grid Names Service, Storage Services and Rapid Home Provisioning Service]

Oracle Cluster Domain
[Diagram: four member clusters (a Database Member Cluster that uses local ASM, an Application Member Cluster with GI only, a Database Member Cluster that uses the ASM Service, and a Database Member Cluster that uses the IO & ASM Service of the DSC) connected over the private network and SAN/NAS storage to the Oracle Domain Services Cluster. The DSC hosts the Mgmt Repository (GIMR) Service, Trace File Analyzer (TFA) Service, Rapid Home Provisioning (RHP) Service, Additional Optional Services, ACFS Services, ASM Service and IO Service on Shared ASM]

Compare Database Status Before & After Upgrade
Download dbupgdiag.sql from doc 556610.1
Run both before and after the upgrade:
$ cd <location of the script>
$ sqlplus / as sysdba
sql> alter session set nls_language='American';
sql> @dbupgdiag.sql
sql> exit
