
ORACLE RAC

NEW FEATURES:
10g: we will have three homes (the Oracle Database home, the CRS home and the ASM home).
11g Release 2:
* SCAN listener is the new feature.
* Oracle recommends 3 SCAN listeners per cluster (one is the technical minimum). Connect-time load balancing is done by the SCAN listeners, and each node additionally runs its own local listener.
Why does Oracle recommend that you set up the SCAN name with three IP
addresses, thus having three SCAN VIPs and three SCAN listeners?
Redundancy - having 2 would already cover the redundancy requirement.
Maximum Availability - having 3 ensures that the connect load of any one SCAN
listener does not get exhausted, reduces the CPU cost per node, and
requires the least amount of decision-making on the part of the cluster.
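
A quick way to verify this setup (a sketch; the SCAN name rac-scan.example.com is just a placeholder):

# The SCAN name should resolve to 3 IP addresses, returned round-robin by DNS
nslookup rac-scan.example.com

# SCAN VIPs and SCAN listeners registered in the cluster
srvctl config scan
srvctl config scan_listener
srvctl status scan_listener
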
* FOR ANY RAC CONFIGURATION, PUBLIC, PRIVATE AND VIRTUAL IPS ARE MANDATORY.
Public - used for client connections
Private - interconnect communication between the nodes
Virtual - used for fast failover of client connections when a node goes down (any issue on any node)
Why is a Virtual IP necessary in an Oracle RAC environment?
To avoid TCP timeouts. When a node in an Oracle RAC environment goes down, clients that connected to its physical IP would have to wait out a long TCP timeout before retrying. Because the VIP fails over to a surviving node, new connection attempts are refused immediately and the client can move on to the next address right away.
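
To see where a node's VIP is defined and currently running (11.2 srvctl syntax; the node name racnode1 is just an example):

srvctl config vip -n racnode1
srvctl status vip -n racnode1

# After a node failure, the same VIP shows up as running on one of the surviving nodes.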

SCAN LISTENER:
*************
* SCAN LISTENER CONFIGURATION NEEDS 3 MORE VIRTUAL IPS (the SCAN VIPs, on top of the per-node VIPs).
The SCAN can be resolved using DNS or GNS.
Domain Name Server (DNS): the SCAN name must be registered with its full VIP addresses.
Grid Naming Service (GNS): 11g new feature. No need to configure the VIP addresses by hand; we only delegate a subdomain to the cluster and provide the subnet (e.g. the 172.30.193 subnet behind 172.30.193.82), and the addresses are handed out dynamically via DHCP.
Why a SCAN virtual IP?
The SCAN VIP not only has the task of avoiding a wait for the TCP timeout, it must
also ensure that the SCAN listener associated with the SCAN VIP can be started on
every available node in the cluster, if needed.
NOTE

The benefit of using SCAN is that the network configuration files on the
client computer do not need to be modified when nodes are added to or
removed from the cluster.
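
For example, a client tnsnames.ora entry that connects through the SCAN references only the single SCAN name, so it stays the same no matter how many nodes the cluster has (ORCL, orcl.example.com and rac-scan.example.com are placeholder names):

ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl.example.com)
    )
  )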

SERVER REBOOT:
**************

init - the first process after a server reboot; through the init.ohasd entry registered in /etc/inittab (the /etc/init.d script) it brings up OHASD, and with it the clusterware stack.

crsctl enable crs - if autostart is enabled (crsctl enable has is the equivalent on an Oracle Restart installation), all the cluster services also come up with the server reboot.
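
To check and enable autostart on a typical 11g R2 installation, and to verify the stack after the reboot (run as root):

# Is Oracle High Availability Services autostart enabled?
crsctl config crs

# Enable autostart so the whole stack comes up after a reboot
crsctl enable crs

# After the reboot, verify the clusterware on all nodes
crsctl check cluster -all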

CLUSTERWARE STARTUP SEQUENCE:
*********************************
OHASD (the Oracle High Availability Services daemon) is started first, by init through init.ohasd. OHASD then starts its agents (oraagent, orarootagent, cssdagent, cssdmonitor), which bring up the lower-stack daemons (mDNSd, GPnPd, GIPCd, CSSD, CTSSd, ASM) and finally CRSD, which in turn starts the cluster resources (VIPs, SCAN, listeners, database instances, services).
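
To see what OHASD manages and which daemons are actually running (output varies per installation; this is only a sketch):

# Lower-stack (init) resources managed by OHASD: cssd, ctssd, crsd, asm, gpnpd, gipcd, mdnsd, evmd, ...
crsctl stat res -t -init

# The corresponding OS processes
ps -ef | egrep 'ohasd|cssd|crsd|evmd' | grep -v grep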

OCR AND VOTING DISK:
********************

The kfed utility can be used to read the header of the ASM disks holding the OCR and Voting Disk.
Before 11g R2 we also used to take manual backups of the Voting Disk (e.g. with dd) in addition to the OCR backups.
But from 11g R2, Oracle backs up the voting disk data automatically along with the OCR, which is backed up by default every 4 hours, at the end of the day and at the end of the week.
ocrconfig -showbackup lists these automatic backups.
OCR - stores the cluster and resource configuration information of every node in the cluster.
Voting Disk - holds the node membership information.
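
The usual checks (ocrcheck and ocrconfig normally need to be run as root for the complete output):

# OCR integrity, size and location
ocrcheck

# Automatic OCR backups (4-hourly, daily, weekly) and manual backups
ocrconfig -showbackup

# Take an on-demand OCR backup
ocrconfig -manualbackup

# Voting disk locations
crsctl query css votedisk
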
VOTING DISK IN 11G:
*****************
The voting disk is a key component of the clusterware, and its failure can lead to inoperability
of the cluster.
In RAC, at any point in time the clusterware must know which nodes are members of
the cluster, so that
- it can perform load balancing
- in case a node fails, it can perform failover of resources as defined in the resource
profiles
- if a node joins, it can start resources on it as defined in OCR/OLR
- if a node joins, it can assign a VIP to it in case GNS is in use
- if a node fails, it can execute callouts if defined
Hence, there must be a way by which the clusterware can find out about the node
membership.
That is where the voting disk comes into the picture. It is the place where nodes mark their
attendance.

The CSSD process on every node makes entries in the voting disk to ascertain the
membership of that node. The voting disk records node membership information. If
it ever fails, the entire clustered environment for Oracle 11g RAC will be adversely
affected and a possible outage may result if the voting disk(s) are lost.
Also, in a cluster, communication between the various nodes is of paramount
importance. Nodes which can't communicate with other nodes should be evicted
from the cluster. While marking their own presence, all the nodes also register
information about their communicability with other nodes in the voting disk. This is
called the network heartbeat.
The CSSD process on each RAC node maintains its heartbeat in a block of one OS
block in size, at a specific offset of the voting disk. The written block has a
header area with the node name. The heartbeat counter increments on every write
call, once per second. Thus the heartbeat of the various nodes is recorded at different offsets
in the voting disk. In addition to maintaining its own disk block, each CSSD process also
monitors the disk blocks maintained by the CSSD processes running on the other cluster
nodes. Healthy nodes will have continuous network and disk heartbeats exchanged
between the nodes. A break in the heartbeat indicates a possible error scenario. If the
disk block is not updated within a short timeout period, that node is considered
unhealthy and may be rebooted to protect the database information. In this case,
a message to this effect is written in the kill block of the node. Each node reads its
kill block once per second; if the kill block has been overwritten, the node commits suicide (reboots itself).
During a reconfiguration (node join or leave), CSSD monitors all nodes and determines whether a
node has a disk heartbeat, including those with no network heartbeat. If no disk
heartbeat is detected either, the node is declared dead.
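
The timeouts behind these two heartbeats can be checked with crsctl (the defaults are commonly 30 seconds for the network heartbeat misscount and 200 seconds for the disk timeout, but verify on your own cluster):

# Maximum number of seconds a node may miss network heartbeats before being evicted
crsctl get css misscount

# Maximum number of seconds a node may fail to update its voting disk block
crsctl get css disktimeout
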
Why should we have an odd number of voting disks?
Configuring an odd number of voting disks provides a method to determine which nodes
in the cluster should survive.
A node must be able to access more than half of the voting disks at any time. For
example, take a two-node cluster with an even number of voting disks, say 2.
Suppose Node1 can access only voting disk 1 and Node2 can access only voting
disk 2. Then there is no common file where the clusterware can check the
heartbeat of both nodes. If we have 3 voting disks and both nodes can
access more than half of them, i.e. 2 voting disks, there will be at least one disk
accessible by both nodes, and the clusterware can use that disk to check the
heartbeat of both nodes. Hence, each node should be able to access more
than half the number of voting disks. A node not able to do so has to be
evicted from the cluster by another node that does have more than half the voting disks,
to maintain the integrity of the cluster. After the cause of the failure has been
corrected and access to the voting disks has been restored, you can instruct Oracle
Clusterware to recover the failed node and restore it to the cluster.
Loss of more than half of your voting disks will cause the entire cluster to fail!
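
On 11g R2, when the voting disks are stored in ASM, their number follows the disk group redundancy (1 for external, 3 for normal, 5 for high redundancy). A sketch of the related commands (+DATA is a placeholder disk group name; crsctl replace must be run as root):

# List the current voting disks and the ASM disks backing them
crsctl query css votedisk

# Recreate/move the voting disks into another disk group
crsctl replace votedisk +DATA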

CACHE FUSION:

**************
Cache Fusion is a new technology that uses a high speed interprocess
communication (IPC) interconnect to provide cache to cache transfers of data
blocks between instances in a cluster. This eliminates disk I/O (which is inherently
slow, since it is a mechanical process) and optimizes read/write concurrency. Block
reads take advantage of the speed of IPC and an interconnecting network. Cache
Fusion also relaxes the requirements of data partitioning.
Cache Fusion addresses these types of concurrency between instances, each of
which is discussed in the following sections:

Concurrent Reads on Multiple Nodes
Concurrent Reads and Writes on Different Nodes
Concurrent Writes on Different Nodes
Concurrent Reads on Multiple Nodes
Concurrent reads on multiple nodes occur when two instances need to read the
same data block. Real Application Clusters easily resolves this situation because
multiple instances can share the same blocks for read access without cache
coherency conflicts.
Concurrent Reads and Writes on Different Nodes
Concurrent reads and writes on different nodes are the dominant form of
concurrency in Online Transaction Processing (OLTP) and hybrid applications. A read
of a data block that was recently modified can be either for the current version of
the block or for a read-consistent previous version. In both cases, the block will be
transferred from one cache to the other.
Concurrent Writes on Different Nodes
Concurrent writes on different nodes occur when the same data block is modified
frequently by processes on different instances.

The main features of the cache coherency model used in Cache Fusion are:

The cache-to-cache data transfer is done through the high speed IPC
interconnect. This virtually eliminates any disk I/Os to achieve cache
coherency.
The Global Cache Service (GCS) tracks one or more past images (PIs) for
a block in addition to the traditional GCS resource roles and modes. (The GCS
tracks blocks that were shipped to other instances by retaining block copies
in memory. Each such copy is called a past image (PI). In the event of a
failure, Oracle can reconstruct the current version of a block by using a PI.)
The work required for recovery in node failures is proportional to the number
of failed nodes. Oracle must perform a log merge in the event of failure on
multiple nodes.
The number of context switches is reduced because of the reduced sequence
of round trip messages. In addition, the database writer (DBWR) is not involved in
Cache Fusion block transfers. Reducing the number of context switches adds
to the more efficient use of the cache coherency protocol.
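
Cache Fusion traffic can be observed from the 'gc ...' statistics of each instance; a minimal SQL*Plus sketch, run on a cluster node with the database environment set (statistic names can differ slightly between versions):

sqlplus -s / as sysdba <<'EOF'
-- Global Cache (Cache Fusion) block transfers per instance
select inst_id, name, value
from   gv$sysstat
where  name in ('gc cr blocks received',
                'gc current blocks received',
                'gc cr blocks served',
                'gc current blocks served')
order by inst_id, name;
EOF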

Global Cache Service Operations
The GCS tracks the location and status (mode and role) of data blocks, as well as
the access privileges of various instances. Oracle uses the GCS for cache coherency
when the current version of a data block is in one instance's buffer cache and
another instance requests that block for modification. It is also used for reading
blocks.
Following the initial acquisition of exclusive resources, multiple transactions
running on a single Real Application Clusters instance can, in subsequent transactions,
share access to a set of data blocks without involvement of the GCS, as long as the
block is not transferred out of the local cache. If the block has to be transferred out
of the local cache, then the Global Resource Directory is updated by the GCS.
NOTE
The Global Enqueue Service (GES) uses a similar notification mechanism. There,
only completion interrupts and blocking interrupts are used.
* All requests for cluster-wide access to a resource are maintained in grant
queues and convert queues. While requests are in progress and until they
are completed, the requests remain in a convert queue. These queues are
managed by the GCS and GES.
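
To see how many consistent-read (CR) and current blocks the local instance has received from each of the other instances, v$instance_cache_transfer can be queried (a sketch; the exact column list varies a little between versions):

sqlplus -s / as sysdba <<'EOF'
-- Blocks transferred to this instance from other instances, by block class
select instance, class, cr_block, current_block
from   v$instance_cache_transfer
order by instance, class;
EOF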

LOAD BALANCING:
****************
How to do load balancing with SCAN in 11gR2 RAC
Load balancing is done at the session level and only at connect time. Oracle
does not move sessions between nodes during normal operation.
Example (2 nodes and 3 sessions):
Every minute or so, PMON of each instance is updating all listeners in the
cluster about the load on the node.
S1 is connecting, so let's say it is connecting to node1. It starts running
something, taking 20% CPU.
S2 is connecting, the listeners know that node1 is 20% busy so S2 is
connecting to node2 and takes 2% CPU.
S3 is connecting, so again, according to the load it will be connected to
node2.
Now, there are several problems with this:
1. After a while, S1 finishes, so node1 is idle while S2 and S3 are running heavy
queries, causing 100% CPU on node2. There is nothing we can really do;
Oracle will not rebalance that unless we disconnect and connect again.

2. The load balancing is automatic, based on the information PMON sends every
minute or so. I've seen cases where an application creates a connection pool
(opens many connections upon startup to use them later) and all of the
sessions ended up connected to a single node. This happened because both nodes
were idle at the time and all the connect requests went to the same listener; when we
changed the client side to load balance between the listeners, it was solved.
By the way, this was 10g, I didn't check it on 11g.
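
To see what connect-time load balancing relies on, check that each instance registers with the listeners (PMON sends them the load updates), and for the 10g-style fix above use a client alias that spreads connections across the node listeners (the host names, service name and alias below are placeholders):

sqlplus -s / as sysdba <<'EOF'
-- PMON registers the instance with these listeners and sends them load updates
show parameter local_listener
show parameter remote_listener
EOF

A client-side tnsnames.ora entry that load balances between two node VIP listeners:

ORCL_LB =
  (DESCRIPTION =
    (LOAD_BALANCE = ON)
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = racnode1-vip.example.com)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = racnode2-vip.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl.example.com)
    )
  )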
