Oracle Database Extended RAC

Oracle 10g Extended RAC on Windows 2003 Handbook
For Achieving HA & Disaster Recovery Solution
By Database Manager

Index
· Introduction
· Test Bed overview
· Installation of Virtual Machines
· Installation of Oracle Clusterware Services
· Installation of Oracle Software/ASM/DB
· Post RAC Installation Health Check
· Applying Oracle Patch set
· Adding a third Node to the Cluster Database
· RAC Concepts Primer
· Troubleshooting RAC Environment
· Backup and Recovery - RAC Environment
· References

Introduction
This document is intended as a set of guidelines for Oracle and system administrators who are responsible for implementing an Oracle Extended RAC (also known as stretched clustering) across nodes that are located within 5 km of each other. I have implemented the same on IBM P590 series servers running AIX 5.3 with Oracle 10g, using Oracle Clusterware on SAN storage, with the two data centers connected over a 1GB laser link network with a latency of less than 5 ms. We had some problems implementing the solution, and most of the issues were related to configuring the Oracle RAC components that deal with network synchronization. At that time I decided to first fully test the Extended RAC implementation on my own, and thanks to VMWare software I was able to do so. This document is based on the VMWare installation on Windows, and I would highly recommend that, before you implement Extended RAC on Unix in a UAT/Production environment, you have it fully tested on Windows to clear the concepts behind it. For the RAC Handbook on IBM P590 Series-AIX OS, please refer to the RAC resources section on my site.

Test Bed overview
· Oracle 10g (later we will apply the latest patch set).
· Windows 2003 Enterprise operating system, Service Pack 1.
· Windows resource kit, to be installed on every virtual machine.
· VMWare Workstation Version 4.5 (you can download a trial version from their site, but I would highly recommend buying it).
· 4 Windows XP workstation PCs attached to a network with at most 5 ms latency, each with 1GB RAM, 1 CPU and at least 40GB of free storage space.

There are many articles on the internet that talk about implementing RAC on a single PC with VMWare installed. However, they all need at least 2GB of memory, and even when you have that, after the RAC is installed you cannot test all possible RAC scenarios for lack of resources. What I did, and recommend, is to talk to other DBAs at your workplace and agree that 4 DBAs will offer their PCs to simulate the extended RAC testing. So all you need is 4 PCs with 1GB of memory each, connected over a LAN, and administrative privileges on those workstations.

Installation of Virtual Machines
Let's call the 4 Windows XP workstations XPWS1, XPWS2, XPWS3 and XPWS4. On each of these workstations, install the VMware Workstation software, launch it, and create one Windows 2003 virtual machine. While creating a virtual machine, please make sure of the following for each virtual server:
· On XPWS1, create folder C:\RACVIRTUAL and under it create two subfolders, RAC1 and ASMDISK.
· On XPWS2, create folder D:\RACVIRTUAL and under it create two subfolders, RAC3 and ASMDISK.
· On XPWS3, create folder D:\RACVIRTUAL and under it create one subfolder, RAC5.
· The RAC folders above are for the new virtual servers hosting Windows 2003. The ASMDISK folder is for hosting the ASM raw devices.
· Each virtual OS should be assigned 524MB of RAM.
· During creation of the virtual servers, choose bridged network, LSI Logic as the I/O adapter, and SCSI disks.
· Under Virtual machine settings, remove Drive A:
· For Windows 2003, choose the default settings.
· Make sure the swap space is 1GB and goes to drive C:
· Create two logical drives, C: and D:, where C: should be 4GB for OS usage and D: 5GB, where we will install the Oracle 10g software.
· After installation is over, choose the VM menu option and install VMWare tools for the new virtual machine/OS.

Workstation   Virtual Server   Remarks
XPWS1         RAC1             RAC Node1 with its storage defined
XPWS2         RAC3             RAC Node2 with its storage defined
XPWS3         RAC5             RAC Node3 with no storage of its own
XPWS4         RAC6             Only storage for 3rd voting disk
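The folder layout above can also be written down as data, which makes it easy to check on each workstation. The following is only a minimal Python sketch: the names LAYOUT and folders_for are hypothetical, and it assumes the RAC5 folder lives on XPWS3, as the workstation table indicates. XPWS4 is left out because no folder is listed for it.

```python
from pathlib import PureWindowsPath

# Workstation -> (base folder, subfolders), taken from the bullets above.
# Assumption: the RAC5 folder is on XPWS3, matching the workstation table.
LAYOUT = {
    "XPWS1": (r"C:\RACVIRTUAL", ["RAC1", "ASMDISK"]),
    "XPWS2": (r"D:\RACVIRTUAL", ["RAC3", "ASMDISK"]),
    "XPWS3": (r"D:\RACVIRTUAL", ["RAC5"]),
}


def folders_for(workstation):
    """Return the full folder paths expected on one workstation."""
    base, subfolders = LAYOUT[workstation]
    return [str(PureWindowsPath(base) / name) for name in subfolders]


if __name__ == "__main__":
    for host in LAYOUT:
        for path in folders_for(host):
            print(host, path)
```

On the workstation itself, each returned path could be passed to os.makedirs(path, exist_ok=True) to create the folders.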

At this point you will have 4 XP workstations, each running a single virtual machine with the Windows 2003 OS. Let us focus now on the first two virtual machines created. The remaining two virtual machines will be used later for a) adding a 3rd node and b) adding a third voting disk/site, respectively. Therefore, at this point you will work with two virtual machines, which I named RAC1 (on PC XPWS1) and RAC3 (on PC XPWS2). RAC5 (on XPWS3) will be configured later as a 3rd node. You now need to configure the network settings for these two virtual servers.

Shut down the RAC1 and RAC3 servers (we will call them from now on RAC1 and RAC3, which are created on two separate workstations connected over your home or office network). This point is very important to note, so please do not create the two virtual machines on the same physical PC but have them created on different physical servers, because all of my configurations below are based on this fact. If you would like to proceed with single PC RAC testing, then you should see other articles on the internet.

Let's now proceed with the NIC settings:
· Shut down RAC1, edit the virtual machine settings and add a new Ethernet adapter of Bridged type. Bring up the RAC1 server and you should see a window displayed for a new H.W found; press Next and complete it. It will fail at the end, but let it be, because if you press Cancel, the same new H.W found screen will appear again every time you reboot RAC1.
· Go to Network Connections from the Control Panel and you should see two NICs identified as Local Area connection1 and Local Area connection2. Rename the two as Public and Private respectively. Click on Advanced Settings in the Network Connections window and make sure the order of the list is first Public and then Private, so that the host name RAC1 will resolve to the Public NIC. Click on their properties, choose Internet Protocol and assign the IP addresses. Make sure the subnet is different for Public and Private, because the Private NIC will be used for Cache Fusion/inter-node communication between the two RAC nodes. My settings are:
RAC1 Node:
Public NIC: IP 10.…, subnet: 255.…, Gateway: 10.…
Private NIC: IP 10.…, subnet: 255.…, Gateway: leave empty
Repeat the same for the RAC3 node.
RAC3 Node:
Public NIC: IP 10.…, subnet: 255.…, Gateway: 10.…
Private NIC: IP 10.…, subnet: 255.…, Gateway: leave empty
· Modify the Hosts file of the RAC1 & RAC3 OS (c:\windows\system32\drivers\etc) as:

10.…   RAC1
10.…   RAC1-priv
10.…   RAC1-vip
10.…   RAC3
10.…   RAC3-priv
10.…   RAC3-vip

RAC1-VIP and RAC3-VIP are not physically linked to any network card but are logically defined on the public subnet. When we have completed the RAC setup, these are the IP addresses (or names) which will be used to configure client connections to the RAC. They will act as components of the cluster, so in case Node1 is down, its VIP service will fail over to the surviving node and clients will be redirected to the surviving node without any TCP timeout, which usually happens when the listener is listening on a port attached to a physical IP address.

Perform the following steps on all nodes (RAC1, RAC3, RAC5, RAC6):
· Oracle supports the TCP/IP protocol for the public and private networks and requires that Windows Media Sensing is disabled by setting the value of the DisableDHCPMediaSense parameter to 1. To do this, go to the Windows registry via Regedit.exe, navigate to HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters, and add the DWORD key DisableDHCPMediaSense=1. To know more about this parameter see http://www.…dows2000serv/reskit/regentry/94173.mspx?mfr=true
· Go to the command prompt and type:
set devmgr_show_nonpresent_devices=1
devmgmt.msc
Then remove any greyed-out NIC under the Network category.
· From the Control Panel, Add/Remove Windows Components, add the Terminal Services component. This will enable you to use Remote Desktop services to access both nodes from a remote laptop or workstation if required (you could use XPWS1 and from there open a remote connection to XPWS2; that's what I did).

Shut down both servers (RAC1, RAC3) from their respective workstations. Also create a mapped drive from each workstation to the other's C:\RACVIRTUAL (and D:\RACVIRTUAL) folder with full permissions granted. The mapped drive can be called Z:, mapped to C:\RACVIRTUAL. Now edit the main virtual server file C:\RACVIRTUAL\RAC1\winNetEnterprise.vmx on XPWS1 (for the RAC1 node settings) and add the following lines:
disk.locking = "FALSE"
diskLib.dataCacheMaxSize = "0"
diskLib.dataCacheMaxReadAheadSize = "0"
diskLib.dataCacheMinReadAheadSize = "0"
diskLib.dataCachePageSize = "4096"
diskLib.maxUnsyncedWrites = "0"
scsi1.present = "TRUE"
scsi1.virtualDev = "lsilogic"
scsi1.sharedBus = "VIRTUAL"
· Repeat the same on the mapped drive, in Z:\RAC3\winNetEnterprise.vmx, for the virtual server RAC3 created on XPWS2.

We are now ready to create the raw devices. Since we are going to have an Extended RAC setup, I will create a set of raw devices on the storage of both workstations (XPWS1, XPWS2) via VMware. Each set will have three raw devices: one to hold database files, a second for OCR and a third for voting disks. Later, when configuring ASM with normal redundancy, I will mirror these two sets of raw devices (created on the storage of two different PCs). Later I will also show you how to create additional raw disks to move different database files (like redo and backup sets) to their own separate raw disks. The OCR and voting disk are explained in the next section.

From the RAC1 node, go to the command prompt.
Raw device to hold Oracle database files: you need to use the VMWare GUI to create this one, as I had run out of SCSI limits for the number of virtual devices, so I chose an IDE hard disk as the new virtual disk. Make sure you pre-allocate the disk space and choose a size of 4GB. Create the raw disk as C:\RACVIRTUAL\ASMDISK\oradata1.vmdk
Raw device to hold OCR information:
vmware-vdiskmanager.exe -c -s 300MB -a lsilogic -t 2 C:\RACVIRTUAL\ASMDISK\ocr1.vmdk
Raw device to hold voting disk information:
vmware-vdiskmanager.exe -c -s 200MB -a lsilogic -t 2 C:\RACVIRTUAL\ASMDISK\votingdisk1.vmdk

Repeat the same from the RAC3 node (on XPWS2) as:
vmware-vdiskmanager.exe -c -s 200MB -a lsilogic -t 2 D:\RACVIRTUAL\ASMDISK\ocr2.vmdk
vmware-vdiskmanager.exe -c -s 200MB -a lsilogic -t 2 D:\RACVIRTUAL\ASMDISK\votingdisk2.vmdk
And now create D:\RACVIRTUAL\ASMDISK\oradata2.vmdk via the GUI as an IDE hard disk. Also note that the names of these raw devices end with the digit 2, because they will be used together with the raw devices created on XPWS1 and mirrored by ASM. On XPWS2 I have used drive D: because I have more free space on drive D: on that PC; for the RAC1 node on XPWS1 I have used drive C:.

Now you need to make the new disks available to the VMWare Workstation software by editing the virtual machine settings as follows:
· Bring up both nodes, RAC1 and RAC3, and perform the following:
1. Go to the command prompt, type Diskpart and then enter the command Automount enable. This is required to make sure the raw devices are auto-mounted every time the OS starts up.
2. The two RAC nodes must have their clocks synchronized. You can download third-party software which keeps the clocks in sync among different servers in one network; search for Time Sync Server on Windows in Google for that. Or you can use the net time command to set the time from any time server available on the internet. What I did was basically sync the time of each virtual server to the host OS (which is the XP workstation). From the RAC1 node, check the current time server with:
NET TIME /QUERYSNTP
Set the initial time from the time server with:
NET TIME \\XPWS1 /SET
Set the current time server (XPWS1) for RAC1 with:
NET TIME /SETSNTP:XPWS1
Repeat the same for the RAC3 node and make XPWS1 its time server as well. However, you need to run the above commands on every machine start-up, so make these two commands part of a scheduled job that triggers on every system start-up. Then check the times on both servers from one place with:
NET TIME \\RAC1
NET TIME \\RAC3
Alternatively, right-click on the VMware tools icon and select time sync between host and virtual machines; but when you opt for this option, make sure the Windows Time service is disabled.
· For the RAC1 node, you have already registered the raw device that holds the database files (oradata1.vmdk), as you created it via the VMWare GUI. However, for the OCR and voting disks, since you created them from the command prompt, just use the VMWare GUI (Add Disks) and this time, instead of creating a new virtual disk, choose create from existing, browse to the location of these two files and select the vmdk files.
· Since the RAC1 node also needs to access the disks created on the RAC3 node, you need to repeat the same procedure as above, but this time when you add disks from existing, choose the remote location Z:\ and add the three raw devices from the RAC3 node.
· Repeat the whole procedure as described above for RAC3.
· Shut down both RAC1 and RAC3 nodes.
· Start up the RAC1 node, right-click on the My Computer shortcut on the desktop, choose Manage, then select the Storage section and click on Disk Management. You should see a popup window which lists the three new disks you added. Accept the defaults and press Next; also accept the defaults here and complete. You should now see your three disks appear as offline in the Disk Management tab. Click on each of them one by one and perform the following tasks:
1. Right-click on the new disk, select New Partition, choose Extended and proceed to finish. You should now see the disk with Online status.
2. Right-click again and choose Create Logical Drive. Here make sure to choose "Do not assign drive letter" and "Do not format the disk", and continue until completion.
Repeat the same for the remaining two disks (ocr and voting).
· Bring up both nodes and verify all storage settings.

For simplicity I have copied the contents of the vmx files for both nodes below:
RAC1 Node vmx file (notice the remote raw device links with Z:)
Location: C:\RACVIRTUAL\RAC1\winNetEnterprise.vmx
disk.locking = "FALSE"
diskLib.dataCacheMaxSize = "0"
diskLib.dataCacheMaxReadAheadSize = "0"
diskLib.dataCacheMinReadAheadSize = "0"
diskLib.dataCachePageSize = "4096"
diskLib.maxUnsyncedWrites = "0"
config.version = "7"
virtualHW.version = "3"

addressType = "generated" uuid.location = "56 4d c5 0d df c2 22 83-78 96 e8 44 92 2d 88 e3" uuid.deviceType = "plainDisk" scsi1:3.fileName = "Windows Server 2003 Enterprise Edition (3).present = "TRUE" scsi1.reset = "default" ide1:0.scsi0.powerOn = "default" powerType.ungrabbed = "normal" powerType.mode = "persistent" scsi1:1.fileName = "C:\RACVIRTUAL\ASMDISK\oradata1.fileName = "C:\RACVIRTUAL\ASMDISK\ocr1.fileName = "-1" displayName = "RAC1" guestOS = "winNetEnterprise" priority.vmdk" scsi1:1.deviceType = "cdrom-raw" floppy0.suspend = "default" powerType.vmdk" scsi1:2.powerOff = "default" powerType.fileName = "C:\RACVIRTUAL\ASMDISK\votingdisk1.virtualDev = "lsilogic" memsize = "524" scsi0:0.present = "TRUE" scsi0:0.startConnected = "TRUE" Ethernet0.vmdk" scsi1:3.sharedBus = "VIRTUAL" scsi1:1.mode = "persistent" scsi1:2.present = "TRUE" scsi1:2.vmdk" ide1:0.present = "TRUE" sound.generatedAddressOffset = "0" tools.syncTime = "FALSE" scsi0:1.bios = "56 4d c5 0d df c2 22 83-78 96 e8 44 92 2d 88 e3" ethernet0.generatedAddress = "00:0c:29:2d:88:e3" ethernet0.grabbed = "normal" priority.fileName = "A:" Ethernet0.present = "TRUE" sound.virtualDev = "lsilogic" scsi1.deviceType = "plainDisk" scsi1:2.vmdk" sound.present = "TRUE" scsi1:3.present = "TRUE" scsi0.deviceType = "plainDisk" .mode = "persistent" scsi1:3.present = "FALSE" scsi1:1.present = "TRUE" ide1:0.fileName = "auto detect" ide1:0.present = "TRUE" scsi0:1.fileName = "Windows Server 2003 Enterprise Edition.virtualDev = "es1371" scsi1.

present = "FALSE" scsi0:6.scsi1:4.mode = "persistent" scsi1:0.fileName = "C:\RACVIRTUAL\ASMDISK\test.present = "FALSE" scsi0:5.generatedAddressOffset = "10" floppy0.deviceType = "plainDisk" RAC3 Node vmx file: Note remote links to raw devices on RAC1 Location: Z:\RAC3\winNetEnterprise.present = "FALSE" ide1:1.locking = "FALSE" diskLib.deviceType = "plainDisk" ide0:1.dataCacheMinReadAheadSize = "0" diskLib.vmdk" scsi0:2.present = "TRUE" Ethernet1.present = "TRUE" scsi1:0.fileName = "C:\RACVIRTUAL\ASMDISK\test9.vmx disk.present = "TRUE" scsi1:4.present = "TRUE" ide0:1.present = "FALSE" redoLogDir = ".fileName = "C:\RACVIRTUAL\ASMDISK\oradata1.deviceType = "plainDisk" scsi0:5.fileName = "Z:\ASMDISK\ocr2." Ethernet0.vmdk" scsi1:0.vmdk" ide0:0.fileName = "Z:\ASMDISK\oradata2.fileName = "C:\RACVIRTUAL\ASMDISK\oradata1.mode = "persistent" scsi1:4.vmdk" ide0:0.vmdk" scsi0:5.vmdk" scsi0:3.vmdk" ide0:1.deviceType = "plainDisk" ide1:1.present = "TRUE" ide0:0.generatedAddress = "00:0c:29:2d:88:ed" ethernet1.deviceType = "plainDisk" scsi1:0.present = "FALSE" scsi0:2.connectionType = "bridged" scsi0:2.dataCachePageSize = "4096" .fileName = "C:\RACVIRTUAL\ASMDISK\oradata1.dataCacheMaxSize = "0" diskLib.addressType = "generated" ethernet1.dataCacheMaxReadAheadSize = "0" diskLib.fileName = "C:\RACVIRTUAL\ASMDISK\oradata1.vmdk" scsi1:4.present = "FALSE" scsi0:3.deviceType = "plainDisk" Ethernet1.vmdk" scsi0:6.fileName = "Z:\ASMDISK\votingdisk2.

maxUnsyncedWrites = "0" config.fileName = "A:" Ethernet0.version = "3" scsi0.vmdk" ide1:0.addressType = "generated" uuid.deviceType = "cdrom-raw" floppy0.mode = "persistent" scsi1:3.present = "FALSE" scsi1:1.generatedAddressOffset = "0" tools.vmdk" scsi1:2.present = "TRUE" scsi0:0.present = "TRUE" sound.version = "7" virtualHW.deviceType = "plainDisk" scsi1:3.powerOff = "default" powerType.present = "TRUE" scsi1:3.fileName = "z:\ASMDISK\ocr1.suspend = "default" powerType.startConnected = "TRUE" Ethernet0.bios = "56 4d a4 62 67 78 2e 4e-bd 76 ea 69 ed 2d f9 03" ethernet0.fileName = "z:\ASMDISK\votingdisk1.virtualDev = "lsilogic" memsize = "524" scsi0:0.virtualDev = "lsilogic" scsi1.reset = "default" ide1:0.present = "TRUE" scsi0.vmdk" scsi1:1.present = "TRUE" ide1:0.fileName = "Z:\ASMDISK\oradata1.present = "TRUE" scsi0:1.sharedBus = "VIRTUAL" scsi1:1.fileName = "auto detect" ide1:0.fileName = "Windows Server 2003 Enterprise Edition (3).deviceType = "plainDisk" scsi1:2.present = "TRUE" scsi1.grabbed = "normal" priority.mode = "persistent" scsi1:1.vmdk" sound.diskLib.location = "56 4d a4 62 67 78 2e 4e-bd 76 ea 69 ed 2d f9 03" uuid.generatedAddress = "00:0c:29:2d:f9:03" ethernet0.ungrabbed = "normal" powerType.fileName = "Windows Server 2003 Enterprise Edition.fileName = "-1" displayName = "RAC3" guestOS = "winNetEnterprise" priority.present = "TRUE" sound.present = "TRUE" scsi1:2.powerOn = "default" powerType.virtualDev = "es1371" scsi1.mode = "persistent" scsi1:2.syncTime = "FALSE" scsi0:1.vmdk" .

fileName = "D:\RACVIRTUAL\ASMDISK\oradata2.deviceType = "plainDisk" Ethernet1.present = "TRUE" Ethernet1.present = "TRUE" ide0:0.vmdk" scsi1:0. · In Specify Home Details screen. Cd d:\ Cd D:\clusterware\cluvfy runcluvfy.deviceType = "plainDisk" ide0:1.bat .scsi1:3.present = "TRUE" scsi1:0.mode = "persistent" scsi1:0.present = "TRUE" scsi1:4.exe which will launch oracle installer for Cluster Services.addressType = "generated" ethernet1. This is achieved by running a verification utility provided by oracle called "runcluvfy.deviceType = "plainDisk" scsi1:0.vmdk" ide0:1.rac3 -verbose Verify that the output of the above command has only VIP verification failure and all tests should pass.present = "FALSE" redoLogDir = ".present = "TRUE" ide0:1.vmdk" ide0:0.fileName = "D:\RACVIRTUAL\ASMDISK\votingdisk2. set the Name as oracrs and location as .fileName = "Z:\ASMDISK\oradata1.fileName = "D:\RACVIRTUAL\ASMDISK\ocr2.deviceType = "plainDisk" Installation of Oracle Clusterware Services · Before we go on installing Oracle CRS. · Execute the D:\clusterware\Setup. navigate to the CD Rom where you have already inserted Oracle 10g Enterprise CD.connectionType = "bridged" ide0:0. lets verify that our two nodes fulfill all of the pre-requisites for CRS installation.generatedAddressOffset = "10" floppy0.mode = "persistent" scsi1:4. On RAC1 command prompt." Ethernet0.vmdk" scsi1:4.deviceType = "plainDisk" scsi1:4.bat stage -pre crsinst -n rac1.generatedAddress = "00:0c:29:2d:f9:0d" ethernet1.

· In the Specify Home Details screen, set the Name as oracrs and the location as I:\oracle\product\10.2.0\crs.
· Oracle will then run a check on all of the pre-requisites; make sure all tests pass and then press Next.
· In the Specify Cluster Configuration screen, you will have to add RAC3 as the second node for the RAC. As you can see in the installation screen, you have to add the two nodes along with all the Private, Public and VIP names. For example, RAC3-priv and RAC3-vip as the names of the Private and VIP interfaces.
· In the next screen, Specify Network Interface Usage, you have to edit and make sure there are two interconnects, one for Public (10.…) and a second for Private (10.…), in my case.
· In the Cluster Configuration Storage screen, you have to specify the two disks (main and its mirror) for both OCR and voting disks. You should be able to recognize the disks first by their sizes (remember we chose 4GB for data files, and 200 and 300 MB for voting and OCR). You can also identify these disks by their names, verifying the names against the VMware machine settings for the disks added. You will see that for OCR you have options like primary OCR and mirrored OCR locations, but for voting you have to specify its location multiple times. That means that for the OCR disks, one RAC node becomes the master (responsible for reads/writes to it and its mirror), and the other nodes communicate with the master node for OCR operations. For the voting disk, however, each node has to write to it separately. Here I should have specified three voting disks, but I chose only two and shall add the third later. The reason is simply that I would like to show how to change RAC configurations after installation.
· Next, Oracle will begin the installation. Oracle will go through the following steps:
1. Install successful
2. Setup successful
3. Remote Operation Pending
4. Configuration Pending
5. Oracle Clusterware configuration
6. Oracle Notification Server configuration
7. Oracle Private interconnect configuration
8. Virtual Private IP configuration
· Except step 8, all steps should complete OK. Ignore the failure and proceed to complete the install and exit.
· At this point we will have the following services configured in the Windows service manager:
1. Oracle Object service
2. Oracle cluster volume service
3. Oracle CR Service
4. Oracle CS service
5. Oracle EVM service
· You now need to run the VIP configuration assistant, as it was the one which failed. So run it from i:\oracle\product\10.2.0\crs\bin\vipca
· The assistant will show you a screen where you have to provide the RAC1-vip and RAC3-vip network names; then proceed to install the VIP services.
· Complete the installation until the end; you should not receive any errors this time.
· Basically this will install the following 3 resources (not as Windows services):
1. VIP application resource
2. GSD Apps resource (Global Services Daemon)
3. ONS Apps resource (Oracle Notification Service)
· At this point your CRS installation is complete and you should verify it by running:
cluvfy.bat stage -post crsinst -n RAC1,RAC3
· The following commands can also be used to verify cluster health on both nodes:
I:\oracle\product\10.2.0\crs\bin\crsctl check crs
I:\oracle\product\10.2.0\crs\bin\crs_stat -t
I:\oracle\product\10.2.0\crs\bin\ocrcheck
· Recycle both nodes and verify the CRS health again.

Installation of Oracle Software/ASM/DB
· Make sure that the cluster services are up and running on both nodes.
· Run from the CD-ROM: d:\database\setup.exe
· Select Enterprise Edition.
· Choose an Oracle Home (different from CRS) named oradb, with location i:\oracle\product\10.2.0\db_1
· Make sure to check-mark both nodes for the software installation.
· Make sure all pre-requisite tests pass.
· In the Select Configuration Option screen, choose Install Database Software only.
· Now launch the Oracle Database Configuration Assistant from the Programs group (not from the CD).
· Choose Oracle RAC Database in the Welcome screen.
· In the next screen select Configure Automatic Storage Management.
· Select both RAC nodes, provide a password for the ASM instance and choose pfile, which means each ASM instance on RAC1 and RAC3 will have its own init.ora located on NTFS.
· DBCA will then create the ASM instance, and then you will see a screen where you have to create disk groups.
· In the Create Disk Group screen, click on Stamp Disks and you should see all of your raw devices (from both nodes' storage). Select the two raw devices of 4GB on the two nodes and accept the defaults.
· Now you should be able to see both disks appear as candidates in the Create Disk Group screen.
· Select Redundancy as Normal and create the disk group.
· The main group is DATA and the two sub groups are DATAP and DATAS.
· Press OK and continue to complete.
· ASM instance setup is now completed and we can begin creating a database.
· Launch DBCA from the Programs group and follow the screens as under.
· Choose Create Database, select both nodes, select the General Purpose database template and name the database RACDB. The two instances will be RACDB1 and RACDB2.
· Select ASM as the storage for the new database.
· Select DATA as the ASM group for all database files. I will create more groups later and show how to distribute REDO, archived logs and the Flash Recovery Area to their respective ASM groups.
· Use Oracle Managed Files; it is always recommended to use OMF with ASM.
· Do not specify a Flash Recovery Area, as it will be done later, but specify the archived log location.
· Accept the defaults for the rest of the screens and continue until completion.

Post RAC Installation Health Check
Now that the RAC is installed, let's perform basic health checks.
--Assignment of Environment Variables
Right-click on My Computer, select Properties, go to the Advanced tab and define the following environment variables on both servers:
SET CRS_HOME=I:\oracle\product\10.2.0\crs
SET ORACLE_HOME=I:\oracle\product\10.2.0\db_1
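Once CRS_HOME is defined, the three cluster health commands used after the Clusterware installation (crsctl check crs, crs_stat -t and ocrcheck) can all be derived from that one variable. The sketch below is only an illustration in Python; the function name health_check_commands is hypothetical, and it only builds the command lines, it does not run them.

```python
from pathlib import PureWindowsPath

# Health-check tools named in the Clusterware verification step,
# each with the arguments the handbook uses.
HEALTH_TOOLS = (("crsctl", "check crs"), ("crs_stat", "-t"), ("ocrcheck", ""))


def health_check_commands(crs_home):
    """Return the full command lines for the checks in the bin folder of crs_home."""
    bin_dir = PureWindowsPath(crs_home) / "bin"
    commands = []
    for tool, args in HEALTH_TOOLS:
        # strip() removes the trailing space when a tool takes no arguments
        commands.append(f"{bin_dir / tool} {args}".strip())
    return commands


if __name__ == "__main__":
    for line in health_check_commands(r"I:\oracle\product\10.2.0\crs"):
        print(line)
```

On a RAC node the argument would normally come from the environment, for example os.environ["CRS_HOME"], so that the same script works on both servers.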

--Oracle Clusterware Services
Windows service                        Process
Oracle Object Service                  OracleOBJService.exe
OracleCRService                        crsd.exe
OracleCSService                        ocssd.exe
OracleEVMService                       evmd.exe
OracleClusterVolumeService             OcfsFindVol.exe
--Oracle ASM Services
OracleASMService+ASM1                  oracle.exe
--Oracle Database Services
OracleoradbTNSListenerLISTENER_RAC1    TNSLSNR.EXE
OracleServiceRACDB1                    oracle.exe
OracleDBConsoleRACDB1                  nmesrvc.exe
OracleJobSchedulerRACDB1

--Link between Windows Services and Processes (in Task Manager)
Run command: TASKLIST /SVC (see the process links above)

Here is a short description of each of the CRS daemon processes (taken from Metalink Note 259301.1):
CRSD:
- Engine for HA operation
- Manages 'application resources'
- Starts, stops, and fails 'application resources' over
- Spawns separate 'actions' to start/stop/check application resources
- Maintains configuration profiles in the OCR (Oracle Configuration Repository)
- Stores current known state in the OCR
- Runs as root
- Is restarted automatically on failure
OCSSD:
- OCSSD is part of RAC and Single Instance with ASM
- Provides access to node membership
- Provides group services
- Provides basic cluster locking
- Integrates with existing vendor clusterware, when present
- Can also run without integration to vendor clusterware
- Runs as Oracle
- Failure exit causes machine reboot. This is a feature to prevent data corruption in the event of a split brain.
EVMD:
- Generates events when things happen
- Spawns a permanent child evmlogger
- Evmlogger, on demand, spawns children
- Scans callout directory and invokes callouts
- Runs as Oracle
- Restarted automatically when it fails

--CRS_STAT: Check Health of Resources (ASM, Listener, Instance etc)
cd %CRS_HOME%
crs_stat -t
Name            Type          Target    State     Host
------------------------------------------------------------
ora....B1.inst  application   ONLINE    ONLINE    rac1
ora....B2.inst  application   ONLINE    ONLINE    rac3
ora.RACDB.db    application   ONLINE    ONLINE    rac3
ora....SM1.asm  application   ONLINE    ONLINE    rac1
ora....C1.lsnr  application   ONLINE    ONLINE    rac1
ora.rac1.gsd    application   ONLINE    ONLINE    rac1
ora.rac1.ons    application   ONLINE    ONLINE    rac1
ora.rac1.vip    application   ONLINE    ONLINE    rac1
ora....SM2.asm  application   ONLINE    ONLINE    rac3
ora....C3.lsnr  application   ONLINE    ONLINE    rac3
ora.rac3.gsd    application   ONLINE    ONLINE    rac3
ora.rac3.ons    application   ONLINE    ONLINE    rac3
ora.rac3.vip    application   ONLINE    ONLINE    rac3
crs_stat alone will provide a listing with full names.
crs_stat -f will provide detailed information about each of the components.

--Start/stop all oracle services
crs_start -all
crs_stop -all
Start/stop individual services:
crs_start resource_name -c cluster_member
crs_start resource_name
For example: crs_start ora.RACDB.RACDB1.inst
Please note that you can also use the srvctl command to achieve the same for starting or stopping, and it is recommended to use it as it has more control over each service group.

--CRSCTL: Controls RAC parameters
Checks health of cluster only:
crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
Query voting disk locations:
crsctl query css votedisk
0. 0 \\.\votedsk1
1. 0 \\.\votedsk2
Checks version of Clusterware:
crsctl query crs softwareversion/activeversion
CRS software version on node [rac1] is [10.2.0.1.0]
You can also use the following utility to find out the location of the ocr and voting disk:
Run I:\oracle\product\10.2.0\crs\BIN\GUIOracleOBJManager.exe
Dumps cluster state to crsd.log:
crsctl debug statedump crs
Debug specific components (level 2), see crsd.log:
crsctl debug log crs "CRSTIMER:2"
Please note that dumping the cluster state is a one-time snapshot, while the other debug commands are modes of tracing with different levels. You should see a document for understanding how to debug a real application cluster environment. See details at …rac.102/b14197/appsupport.htm
See the Appendix at the end of this document for more details.
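The three "appears healthy" lines printed by crsctl check crs are easy to check from a script. The following Python sketch assumes the output format shown above (one line per daemon, starting with CSS, CRS or EVM); the function names are hypothetical, and any line that does not match that format is simply ignored.

```python
DAEMONS = ("CSS", "CRS", "EVM")


def parse_crsctl_check(output):
    """Map each daemon named in the output to True if it 'appears healthy'."""
    status = {}
    for line in output.splitlines():
        words = line.split()
        if len(words) >= 3 and words[0] in DAEMONS:
            status[words[0]] = words[1:3] == ["appears", "healthy"]
    return status


def cluster_healthy(output):
    """True only when all three daemons are reported and all are healthy."""
    status = parse_crsctl_check(output)
    return len(status) == len(DAEMONS) and all(status.values())


if __name__ == "__main__":
    sample = "CSS appears healthy\nCRS appears healthy\nEVM appears healthy\n"
    print(parse_crsctl_check(sample))
    print(cluster_healthy(sample))
```

Requiring all three daemons to be present means a missing or unexpected line counts as unhealthy rather than being silently skipped.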

--OCRCHECK: Displays health of Oracle Cluster Registry
ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  : 2
Total space (kbytes)     : 192652
Used space (kbytes)      : 3800
Available space (kbytes) : 188852
ID                       : 1953799442
Device/File Name         : \\.\ocrcfg
Device/File integrity check succeeded
Device/File Name         : \\.\ocrmirrorcfg
Device/File integrity check succeeded
Cluster registry integrity check succeeded
Make sure to check the log: I:\oracle\product\10.2.0\db_1\log\rac3\client

--Export ocr (takes backup and restore and change location)
ocrconfig -export ocr.dmp -s online
ocrconfig -import ocr.dmp
ocrconfig -replace ocrmirror <new location>
ocrconfig -restore ocrbackup
ocrconfig -repair ocr <ocr_location>
ocrdump <file-name> (dumps ascii format)
The OCR backup is taken automatically every 4 hours, in the cdata folder under CRS_HOME, but only on the master node. You can use the command ocrconfig -showbackup to see existing backups. Please make sure to keep a copy of the backup files.

Manage Cluster Database: srvctl command
srvctl <command> <object> [<options>]
I will explain this with an example. Let us suppose we need to stop all RAC resources (not the Windows services like CSS, CRS and EVM).

Recycle RAC Environment
Step1: Stop agent processes
SET ORACLE_SID=RACDB1
emctl status agent
emctl status dbconsole
emctl stop dbconsole
Repeat the same on RAC3 and then run emctl status to verify.
Step2: Stop the database with all its instances
srvctl stop database -d RACDB
Step3: Stop all ASM instances
srvctl stop asm -n RAC1
srvctl stop asm -n RAC3
Step4: Stop VIP, GSD, ONS services
srvctl stop nodeapps -n RAC1
srvctl stop nodeapps -n RAC3
Step5: Start VIP, GSD, ONS and listener services
srvctl start nodeapps -n RAC1
srvctl start nodeapps -n RAC3
Step6: Start ASM instances
srvctl start asm -n RAC1
srvctl start asm -n RAC3
Step7: Start the database and its instances
srvctl start database -d RACDB
Step8: dbconsole and agent startup
set ORACLE_SID=RACDB1
emctl start dbconsole
Here you need to repeat this step on RAC3 as well. Make sure the RAC is up with the crs_stat -t command.

--Accessing RAC Environment from EM console
Make sure the agent is up and then open Internet Explorer: http://RAC1:1158/em where RAC1 is the dbconsole node. I would highly recommend that DBAs get familiar with EM; it is an excellent GUI tool to monitor and manage your RAC environment. Whatever actions you need to perform, you can also see the corresponding SQL that will be run.

Applying Oracle 10.2.0.3 Patch set
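Patching needs every RAC resource to be down first, so the stop and start order from the Recycle RAC Environment steps is worth having in scriptable form. The following is a minimal Python sketch that only generates the srvctl command lines in that order (database, then ASM, then nodeapps for stopping, and the reverse for starting). The function name recycle_commands is hypothetical, and the emctl steps are left out because they depend on ORACLE_SID being set per node.

```python
def recycle_commands(db_name, nodes):
    """Return (stop, start) lists of srvctl commands in the handbook's order."""
    # Stop order: database first, then ASM, then VIP/GSD/ONS (nodeapps).
    stop = [f"srvctl stop database -d {db_name}"]
    stop += [f"srvctl stop asm -n {node}" for node in nodes]
    stop += [f"srvctl stop nodeapps -n {node}" for node in nodes]

    # Start order is the reverse: nodeapps, then ASM, then the database.
    start = [f"srvctl start nodeapps -n {node}" for node in nodes]
    start += [f"srvctl start asm -n {node}" for node in nodes]
    start.append(f"srvctl start database -d {db_name}")
    return stop, start


if __name__ == "__main__":
    stop, start = recycle_commands("RACDB", ["RAC1", "RAC3"])
    for command in stop + start:
        print(command)
```

Keeping the node list as a parameter means the same sketch still applies after a third node such as RAC5 is added.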

Now that the RAC is installed on the two nodes, RAC1 and RAC3, and we are ready to add a third node as RAC5, I would recommend patching the two RAC nodes with the latest Oracle patch for both the Cluster layer and the Database (ASM inclusive). Applying an Oracle patch has a prerequisite that all Oracle services are down.

Major steps in applying the patch:
· On both nodes, make all RAC services on Windows Manual start except two services, Oracle Object Service and OracleClusterVolumeService, and bounce both nodes.
· After this point, make sure all services/components of RAC are down.
· Unzip and copy the patch contents to the RAC1 node. Create a folder C:\10203Patch and use that as the patch contents unzip folder.
· Run c:\10203Patch\Setup.exe.
· Select ORACRS as your first home to be patched, which is the clusterware stack.
· On the next screen make sure both RAC nodes are checked and proceed to complete the installation.
· After the installation is over, on each of the two nodes, run the following from the command prompt:
I:\oracle\product\10.2.0\crs\install\patch102.bat
· At this point, you should notice in the Windows service manager that the Cluster related services are up and running. You can verify that the Cluster layer is patched with 10.2.0.3 by running the following commands:
crsctl query crs softwareversion
crsctl query crs activeversion
· Stop all cluster services once again. Now run setup.exe again and this time choose the oradb_1 home to patch the Oracle ASM and Oracle database software. During db home patching, you may receive errors like file in use; you should go to the file folder from Windows Explorer, remove that file and then retry the operation. Complete the patch installation.
· After installation is over, you need to run the following on the remote node (RAC3): execute <Oracle Home>\bin\SelectHome.bat on remote nodes to activate the following products:
Oracle Data Provider for .NET
Oracle Provider for OLE DB
Oracle Objects for OLE
Oracle Counters for Windows Performance Monitor
Oracle Administration Assistant
· Once up, start the services in the

following order:
§ OracleCSService
§ OracleEVMService
§ OracleCRService
· Make sure database services are down, otherwise run the following command:
srvctl stop database -d RACDB
· Now you are ready to run catupgrd against the data dictionary as the last step in patching the database. However you need to make sure the SGA components are large enough (shared pool and java pool should be at least 150 MB each). Since I am using Automatic SGA memory management, I will increase SGA_TARGET from 150 to 300 MB for the instance; you can revert it back to 150 after the patch is deployed.
· From the RAC1 node, log on to sqlplus after setting ORACLE_SID=RACDB1. From sqlplus run create pfile from spfile. Save the pfile initRACDB1.ora (located at the local node location %ORACLE_HOME%/database) under another file name, then open initRACDB1.ora, change the parameter values and from sqlplus run create spfile='+DATA/RACDB/spfileRACDB.ora' from pfile. Then revert the saved init file back to the original initRACDB1.ora so that it only contains the following line:
SPFILE='+DATA/RACDB/spfileRACDB.ora'
· Also remove the local file spfileracdb1.ora from the database folder as it is not required.
· Now start up the instance in mount mode, turn archiving off by running alter database noarchivelog, also run alter system set cluster_database=FALSE scope=spfile, then shut down the instance.
· Again start up the instance as:
1. startup upgrade
2. spool patch10203.log
3. @I:\oracle\product\10.2.0\db_1\RDBMS\ADMIN\catupgrd.sql
· Review the log file for any errors and make sure all database components are shown as updated with the 10.2.0.3 patch set:
· Component  Status  Version
· Oracle Database Server  VALID  10.2.0.3.0
· JServer JAVA Virtual Machine  VALID  10.2.0.3.0
· Oracle XDK  VALID  10.2.0.3.0
· Oracle Database Java Packages  VALID  10.2.0.3.0
· Oracle Text  VALID  10.2.0.3.0

· Oracle XML Database  VALID  10.2.0.3.0
· Oracle Real Application Clusters  VALID  10.2.0.3.0
· Oracle Data Mining  VALID  10.2.0.3.0
· OLAP Analytic Workspace  VALID  10.2.0.3.0
· OLAP Catalog  VALID  10.2.0.3.0
· Oracle OLAP API  VALID  10.2.0.3.0
· Oracle interMedia  VALID  10.2.0.3.0
· Spatial  VALID  10.2.0.3.0
· Oracle Expression Filter  VALID  10.2.0.3.0
· Oracle Enterprise Manager  VALID  10.2.0.3.0
· Oracle Rule Manager  VALID  10.2.0.3.0
· RUN UTLRP.SQL TO COMPILE ALL INVALID OBJECTS (ELSE THEY BECOME VALID WHEN ACCESSED)
· alter system set cluster_database=TRUE scope=spfile;
· SHUTDOWN
· STARTUP MOUNT
· ALTER DATABASE ARCHIVELOG;
· SHUTDOWN
· STARTUP
· srvctl start database -d RACDB
· crs_stat -t should now show the database instances to be up and running.
At this point you have successfully deployed the Oracle 10.2.0.3 patch set.

Adding a third Node to the Cluster Database
Now that the RAC is installed on the two nodes, RAC1 and RAC3, we are ready to create RAC5 as the third node. RAC5 has already been created as a virtual machine on workstation XPWS3. RAC5 needs to be configured with the following parameters:
· IP addresses assigned; these also need to be replicated to the host file of the remaining two nodes, while the IP addresses of the existing two nodes need to be copied to the host file of RAC5. RAC5 will have the following IP addresses:
10.73 RAC5

powerOff = "default" powerType.powerOn = "default" powerType.location = "56 4d 49 04 57 26 bc 40-3f 27 76 e5 1c 6a 0b 18" uuid.bios = "56 4d 49 04 57 26 bc 40-3f 27 76 e5 1c 6a 0b 18" ethernet0.present = "TRUE" .startConnected = "TRUE" Ethernet0.0.73 RAC5-PRIV 10.dataCacheMaxSize = "0" diskLib.vmdk" ide1:0.virtualDev = "lsilogic" scsi1.sharedBus = "VIRTUAL" config.generatedAddressOffset = "0" tools.dataCacheMinReadAheadSize = "0" diskLib.fileName = "Windows Server 2003 Enterprise Edition (3).generatedAddress = "00:0c:29:6a:0b:18" ethernet0.present = "TRUE" sound.present = "TRUE" scsi0:1.10.version = "3" scsi0.maxUnsyncedWrites = "0" scsi1.present = "TRUE" scsi1.present = "TRUE" sound.reset = "default" ide1:0. · Following is the excerpt from RAC5 OS winNetEnterprise.fileName = "A:" Ethernet0.252 RAC5-VIP · Map network drives on XPWS3 as Y and W to point to ASM folders of RAC1 and RAC3 with full permission.ungrabbed = "normal" powerType.virtualDev = "lsilogic" memsize = "540" scsi0:0.fileName = "auto detect" ide1:0.42.dataCacheMaxReadAheadSize = "0" diskLib.grabbed = "normal" priority.vmx: disk.virtualDev = "es1371" Ethernet1.locking = "FALSE" diskLib.version = "7" virtualHW.vmdk" sound.addressType = "generated" uuid.10.present = "TRUE" scsi0:0.deviceType = "cdrom-raw" floppy0.present = "FALSE" ide1:0.fileName = "-1" displayName = "rac5" guestOS = "winNetEnterprise" priority.suspend = "default" powerType.syncTime = "TRUE" scsi0:1.dataCachePageSize = "4096" diskLib.fileName = "Windows Server 2003 Enterprise Edition.present = "TRUE" scsi0.10.

The remaining entries of the RAC5 .vmx file define the second network adapter and the shared disks:
ethernet1.addressType = "generated"
ethernet1.generatedAddress = "00:0c:29:6a:0b:22"
ethernet1.generatedAddressOffset = "10"
floppy0.present = "FALSE"
ide0:0.deviceType = "plainDisk"
ide0:1.deviceType = "plainDisk"
The disk devices scsi0:2, scsi0:3, scsi1:0, scsi1:1, ide0:0 and ide0:1 are all set to present = "TRUE" and point at the following files on the mapped drives:
Y:\ASMDISK\ocr1.vmdk
Y:\ASMDISK\oradata1.vmdk
Y:\ASMDISK\votingdisk1.vmdk
W:\ASMDISK\ocr2.vmdk
W:\ASMDISK\oradata2.vmdk
W:\ASMDISK\ d.vmdk

· Now we are ready to add RAC Node2 as RACDB3 to Server RAC5. Run the following commands from the existing node RAC1:
· cluvfy comp peer -refnode rac1 -n rac5 (Compare)
· Install the Clusterware stack software on RAC5 from RAC1 as:
cd I:\oracle\product\10.2.0\crs\oui\BIN
addnode.bat
· Press Next on the welcome screen, provide the public and private IP address of the new node and complete the installation.
· The above procedure will install I:\oracle\product\10.2.0\crs on RAC5 and also the cluster services, but will not start the cluster services (except the first two, Object Service and Cluster Volume Service).
· cd I:\oracle\product\10.2.0\crs\install\
· I:\oracle\product\10.2.0\crs\install>crssetup.bat
· You should receive the following messages; make sure there are no errors, even for VIP services.
Step 1: checking status of CRS stack
Step 2: Configuring basic cluster services
Step 3: configuring OCR repository with new nodes
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.

Attempting to add 1 new nodes to the configuration
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 3: rac5 rac5-priv rac5
Creating OCR keys for user 'administrator', privgrp ''..
Operation successful.
Step 4: configuring safe mode for CRS components
Step 5: starting up the CRS stack on new nodes
Step 6: configuring OCR with new node VIP information
Creating VIP application resource on (1) nodes...
Creating GSD application resource on (1) nodes...
Creating ONS application resource on (1) nodes...
Starting VIP application resource on (1) nodes...
Starting GSD application resource on (1) nodes...
Starting ONS application resource on (1) nodes...
· At this point all cluster services on the new node RAC5 should be automatically started, and this marks the end of the Cluster stack installation for the new node.
· If you run crs_stat -t from Node3 (RAC5) you should see the gsd, ons and vip services up and running.
· Now you are ready to install the Oracle software on the new node.
cd I:\oracle\product\10.2.0\db_1\oui\bin
addnode.bat
Complete the install process.
· Go to the RAC5 node, run the network configuration assistant and create the listener with default settings. Make sure listener creation is only for the RAC5 node. Run crs_stat -t and you should see the listener component also appearing besides the ons, vip and gsd services.
Now you can go back to the RAC1 node, run the DBCA GUI tool and follow the screens to add the RACDB3 instance on the RAC5 node (third node). However I always prefer the manual approach, which is explained below. Perform the following steps from the RAC5 node.
· Creating ASM instance on RAC5 Node:
Create an admin folder under I:\oracle\product\10.2.0
Copy all the contents of the admin folder from RAC1. You now have two subfolders, +ASM and RACDB.
Modify init+asm3.ora to include +ASM3.instance_number=3
Copy +ASM3.instance_number=3 to the init.ora of the other instances; this is not required, but useful in case you later would like to create an spfile for ASM.
Go to the command prompt of ORACLE_HOME/database
set ORACLE_SID=+ASM3

orapwd file=PWD+ASM3.ora password=password
oradim -new -ASMSID +ASM3 (create windows service)
Make sure asmtoolg shows the oradata ASM group disks.
Via sqlplus, mount the instance with startup. You should now see the ASM instance running and you can verify by running the command select * from v$asm_diskgroup
However you need to add the ASM service to the cluster stack, so shut down the ASM instance, go to crshome/bin and run the following command:
srvctl add asm -n RAC5 -i +ASM3 -o %ORACLE_HOME%
Now execute srvctl start asm -n RAC5 and the ASM will be started.
srvctl start asm -n rac5

· Creating Database instance on RAC5 Node:
set ORACLE_SID=RACDB3
orapwd file=PWDRACDB3.ORA password=password
create pfile from spfile, edit the contents of initRACDB3.ora as shown below and then copy it back to the spfile.
The generated file starts with the per-instance auto-tuned entries for racdb1, racdb2 and racdb3: __db_cache_size (201326592, 159383552, 159383552), __java_pool_size (4194304 each), __large_pool_size (4194304 each), __shared_pool_size (121634816, 96468992, 96468992) and __streams_pool_size (0 each), followed by:
*.audit_file_dest='I:\oracle\product\10.2.0/admin/RACDB/adump'
*.background_dump_dest='I:\oracle\product\10.2.0/admin/RACDB/bdump'
*.cluster_database=TRUE
*.cluster_database_instances=3
*.compatible='10.2.0.1.0'
*.control_files='+DATA/racdb/controlfile/current.260.626900241'
*.core_dump_dest='I:\oracle\product\10.2.0/admin/RACDB/cdump'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_file_multiblock_read_count=16
*.db_name='RACDB'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=RACDBXDB)'

RACDB3.instance_number=3
RACDB2.instance_number=2
RACDB1.instance_number=1
*.job_queue_processes=10
*.open_cursors=300
*.pga_aggregate_target=16777216
*.processes=150
*.remote_listener='LISTENERS_RACDB'
*.remote_login_passwordfile='exclusive'
*.sga_target=268435456
RACDB3.thread=3
RACDB2.thread=2
RACDB1.thread=1
*.undo_management='AUTO'
RACDB3.undo_tablespace='UNDOTBS3'
RACDB2.undo_tablespace='UNDOTBS2'
RACDB1.undo_tablespace='UNDOTBS1'
*.user_dump_dest='I:\oracle\product\10.2.0/admin/RACDB/udump'

Now create the Oracle db sid:
oradim -NEW -SID RACDB3
startup pfile=initRACDB3.ora nomount;
alter database mount;
ORA-01618: redo thread 3 is not enabled - cannot mount
To overcome this message you need to create the redo logs for the new node from RAC1 as:
alter database add logfile thread 3 group 5;
alter database add logfile thread 3 group 6;
alter database enable public thread 3;
Also create the undo tablespace as:
create undo tablespace UNDOTBS3;
Add the new db instance to the node:
srvctl add instance -d RACDB -i RACDB3 -n RAC5
Now shut down the database as
srvctl stop database -d RACDB
Now start the database as
srvctl start database -d RACDB
At this point all three instances are up and running. If you encounter issues like the cluster_database_instances value not being in sync for any of the instances, it means that instance was not started with the proper spfile, which has the value set as cluster_database_instances=3.

RAC Concepts Primer
Oracle Clusterware
With 10g, you do not necessarily need a third party cluster software for RAC implementation as

Oracle Clusterware provides the clustering support. Oracle Clusterware software enables RAC nodes to communicate with each other and work as a single logical RAC server.

Oracle Cluster Registry (OCR)
OCR maintains RAC application resources and availability. It is created on shared storage accessible to all nodes. Oracle Clusterware reads the ocr.loc file during system startup for the location of the OCR file, and to find out which resources need to be started on the RAC nodes after reading the OCR file contents. Each RAC node maintains a copy of the OCR in memory. It is important to note that only one OCR process (designated as the master) in the cluster performs any disk I/O activity. Once information is read by this master OCR process, it is then replicated from the local OCR cache to the OCR cache on the other nodes in the cluster. The OCR file contains information for all of the cluster layers. The layers include System, Database, and CRS. The information relating to System includes CSS, CRS, EVM, ORA_CRS_HOME etc.

Cluster Synchronization Services (CSS)
CSS maintains the membership of each RAC node in the cluster through the voting disk, which is also stored in the shared storage subsystem. The vote disk is required to determine the names/numbers of members in the cluster. This is the first process that is started in the Oracle Clusterware stack. CSS performs the following:
1. Oracle Clusterware determines the location of the OCR from the ocr.loc (on Unix) or registry values (on Windows).
2. It then reads the OCR file to determine the location of the voting disk.
3. CSS then brings the voting disks online.
4. CSS then establishes a connection to all RAC nodes using the private interconnect.
5. Once the connection is established between the various RAC node listeners, these nodes are changed to ACTIVE status if the node(s) is able to access the voting disk(s).
6. CSS authorizes the first node that attains the ACTIVE state as the MASTER node unless a MASTER node is already assigned.
7. All of the ACTIVE RAC nodes then register themselves with the MASTER node.
8. Finally, a new incarnation of the cluster is

established.

Oracle Clusterware Stack
The main processes that compose the Oracle Clusterware stack are:
1. Cluster Synchronization Service (CSS)
2. Event Manager Service (EVMD)
3. Cluster Ready Service (CRSD)
4. PROCD process
5. RACGIMON process

1. Cluster Synchronization Service Daemon:
Cluster Synchronization Service Daemon (CSSD) is responsible for synchronization between the various resources in the cluster. A failure of this process will cause the relevant RAC node to reboot. These services are performed by the Node Membership (NM) and the Group Membership (GM) services.
Node Membership Service (NM) has the following role:
o Check the heartbeat across RAC nodes every second
o Check the heartbeat of the disk by performing a read/write operation every second
o If the heartbeat is not received for more than 60 sec, the Master Node will evict the problematic node from the cluster.
o Query the voting disk to determine if any RAC node is not able to write to it.
The GM provides membership services. All clients that perform I/O operations register with the GM (e.g. LMON, DBWR). Reconfiguration of instances (when an instance joins or leaves the cluster) is also handled by GM. When a node fails, the GM sends out messages to other instances regarding the status.

2. Event Manager Daemon (EVMD)
The EVMD is an event-forwarding process that sends events through the Oracle Notification Service (ONS). If the daemon fails, it restarts automatically.

3. Cluster Ready Service Daemon (CRSD)
The CRSD process is used to define and manage resources. Resources have profiles that define metadata about them in OCR. This process manages the application resources, i.e. start, stop, and manage failover. The OCR information is cached inside CRS. All communications between the CRS and CSS happen via this process, so it also acts as a gateway for messages. Moreover this process also

starts and communicates with the RACGIMON process. Resources that are managed by the CRS include: Global Service Daemon (GSD), ONS Daemon, Virtual Internet Protocol (VIP), Listeners, Databases, Instances, 3rd party Services.

4. PROCD Process
PROCD is also a process monitor. It runs on hardware platforms that support other third-party cluster managers and is present only on platforms other than Linux, for example on AIX machines.

5. RACGIMON Daemon
RACGIMON is a database health check process monitor, and also performs the tasks of starting, stopping, and failing over services. When the node that houses it fails, the RACGIMON process is started on the MASTER node of the surviving nodes by the CRS process.

Additional Notes:
The voting disk is a shared disk that is accessed by all the nodes, is used as a central reference and keeps the heartbeat information between the nodes. If any of the nodes is unable to access the voting disk, the cluster immediately recognizes the communication failure and the Master node starts evicting the failed node from the cluster group to prevent data corruption. You should always have three voting disks on different locations to avoid the split brain issue, which can result in corruption.
Virtual IP is required to keep applications highly available. Database listeners are configured to listen on the VIP addresses instead of the public ones. When a node goes down, its VIP resource is failed over to another existing node; connection attempts to the failed node are rejected straight away, so clients do not wait for a TCP timeout and can be connected to the RAC through the surviving nodes.
Cluster Interconnect is a communication network used by the cluster nodes for the synchronization of resources and is also used to transfer instance-specific data from one

instance to another. The network layer should be dedicated to the RAC and have high bandwidth with low latency.

RAC Background Processes
A RAC instance will have the usual background processes that a single non-RAC instance has, plus additional processes specifically required for the RAC environment.

1. LMS (Lock Manager Service): Global Cache Services Process
LMS is the process used in Cache Fusion. Cache Fusion uses a high-speed interprocess communication network for cache-to-cache transfer of data blocks between RAC instances. It addresses transaction concurrency between instances. Its functions are:
· Enables consistent copies of blocks to be transferred between instances.
· Rolls back uncommitted transactions for blocks that are being requested for consistent read by the remote instance.
· The number of LMS processes running is driven by the GCS_SERVER_PROCESSES parameter, for example ora_lms0..ora_lms9

2. LMON (Lock Monitor): Global Enqueue Services Monitor
LMON is a monitor process which manages:
· Instance deaths and associated recovery for the failed node
· Cluster/Locks reconfiguration when a new instance joins or an existing instance gets evicted from the RAC
· Maintains consistency of GCS memory in case any LMSx dies.

3. LMD (Lock Manager Daemon): Global Enqueue Services Daemon
It is a process responsible for:
· Managing requests for resources and controlling access to blocks and global enqueues
· Handling global deadlock detection and remote resource requests.

4. LCK: Lock Process
Its primary function is to manage non-cache fusion resource requests such as library, row cache, and lock requests that are local to the instance.

5. DIAG: Diagnostic Daemon
Monitors health of the RAC instances

and captures diagnostic data regarding process failures in an instance.
· Note that PMON restarts a new DIAG process to continue its service in case the DIAG process dies.

Additional Notes:
The GCS and GES processes on each RAC node manage the cache synchronization by using the cluster interconnect network layer. In a clustered database environment, there exist different scenarios of block sharing, which can be categorized as follows:
· Concurrent Reads on multiple nodes occur when two or more instances are required to read the same block of data.
· Concurrent Reads and Writes on different nodes are a combination of I/O operations for a single block of data. A block available on any of the instances is modified by another instance while maintaining a different copy of data.
· Concurrent Writes on different nodes occur where multiple instances want to change the same data block frequently.

Troubleshooting RAC Environment
Now that you have created a three node RAC with storage extended from RAC1 to RAC3 with normal redundancy, we are ready to create a third voting disk. But before we do that, I would like to share some of the issues faced and the methods to resolve them. It may be possible that these issues will not arise on a stable environment like AIX/HP over SAN storage, but having the RAC tested over VMWare/Windows has its benefits in terms of troubleshooting.

On a Windows environment, you have the following services for each RAC node. Please start the services in the following order.
Oracle Object service (Keep Auto Start)
Oracle cluster volume service (Keep Auto Start)
Oracle CS service (Keep Auto Start)
Oracle EVM service
Oracle CR Service
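The start order listed above can be kept in a small helper so that it is applied the same way on every node. The following is only a minimal sketch in Python: the Windows service names in START_ORDER are assumptions (check the exact names in the Windows service manager on your nodes), and the script only builds the standard net start command lines without running them.

```python
# Start order taken from the service list above.
# NOTE: the exact Windows service names are assumptions; verify them
# in the Windows service manager before using the generated commands.
START_ORDER = [
    "OracleObjectService",         # Oracle Object service
    "OracleClusterVolumeService",  # Oracle cluster volume service
    "OracleCSService",             # Oracle CS service
    "OracleEVMService",            # Oracle EVM service
    "OracleCRService",             # Oracle CR service
]


def start_commands(already_running=()):
    """Return 'net start' command lines, in order, for services not yet running."""
    running = set(already_running)
    return ['net start "%s"' % name for name in START_ORDER if name not in running]


if __name__ == "__main__":
    # Print the commands so they can be reviewed or pasted into a command prompt.
    for command in start_commands():
        print(command)
```

Passing the names of services that are already up (for example the two that are kept on Auto Start) skips them, so the remaining commands still come out in the right order.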

1. Cluster components/services not starting.
Sometimes you will not be able to start the EVM service; please make sure that the CS service is started on all nodes. On Windows you are sometimes not able to start the CRSD service, and in the crsd.log file you will notice network timeout ORA errors. You need to add this parameter in sqlnet.ora to have the timeout increased from 10 secs:
sqlnet.inbound_connect_timeout=600

2. Sometimes you receive a message that the <name> resource is not registered with the cluster, although you are able to see the resource when you type crs_stat -t. I had this problem with the database resources and the instances, so I did the following to resolve it:
srvctl remove database -d RACDB (this will remove the db resource and also all instances registered)
crs_unregister ora.RACDB.db
srvctl add database -d RACDB -o %ORACLE_HOME%
srvctl add instance -d RACDB -i RACDB1 -n RAC1
srvctl add instance -d RACDB -i RACDB2 -n RAC1
srvctl start database -d RACDB
You can also remove a particular instance by running the command:
srvctl remove instance -d RACDB -i RACDB1

3. Creation of OCR mirror:
ocrconfig -replace ocrmirror \\.\ocrcfg

4. Miscellaneous cluster commands:
srvctl start instance -d RACDB -i RACDB2
srvctl status instance -d RACDB -i RACDB2
srvctl status database -d RACDB
crs_stat -t -v
srvctl add asm -n RAC5 -i +ASM3 -o %ORACLE_HOME%
srvctl start asm -n rac5
ocrcheck
--starts specific resource
crs_start <resource> -c <member>
crs_start ora.RACDB.oradb.RACDB RAC3

5. OCR Corrupted when starting crsd service
After I added the 3rd node, starting the crsd service failed with a message in crsd.log:
Incorrect SV stored in OCR. Key [SYSTEM.version.node_numbers.node3] Value []
I used ocrdump ocr.txt, opened the file in a text editor and found out that the value of

SYSTEM.version.node_numbers.node3 should be 10.2.0.3.0 like for the other nodes, and the only way out was to recreate the OCR or re-install the RAC5 node. What I did was export the OCR using ocrconfig -export ocr.dmp, open the ocr.dmp file in a hex editor and add the values. Later I imported it back using ocrconfig -import ocr.dmp and it worked. This method however is not supported by Oracle.

6. RAC Logs
While investigating various problems, you should be familiar with the following log files.

RAC Node Alert Log
Location: I:\oracle\product\10.2.0\crs\log\rac3\alertrac3.log
This log basically logs status information for the entire cluster. For example when you start cluster services, it will display status for the voting disks being brought online. You can also find information about all cluster members being active, and OCR information when it is configured for changes, like when upgrading the OCR. You will not find detailed information about individual cluster components, however executive information is logged here.

Location: I:\oracle\product\10.2.0\crs\log\rac3\client
Here you can find log files like cssn.log, which displays client information for any missing entries in the registry. For example when your mirrored OCR gets corrupted, you will see a log entry about not being able to find the corresponding location. For example:
2007-07-16 10:43:47.784: [ OCROSD][2744]utgdv:11:could not read reg value ocrmirrorconfig_loc os error= The system could not find the environment option that was entered.

Cluster Services Log
Location: I:\oracle\product\10.2.0\crs\log\rac3\crsd
This is the most important log of all (cluster ready services). Depending on the level of debug mode, you should be able to find all relevant information on cluster resources (asm, listener, database, ons etc.) as to why they failed to start, and a continuously running log of all failures. When you start the crsd service, you should check this log for any warnings or errors.

Location: I:\oracle\product\10.2.0\crs\log\rac3\css

Cluster stack log for the CSS service, which gets started before the CRSD service.

Location: I:\oracle\product\10.2.0\crs\log\rac3\evmd
Cluster event management log for the EVMD service, which gets started after CSS but before CRSD.

Location: I:\oracle\product\10.2.0\crs\log\rac3\racg
This folder has log files for the VIP service (even for other nodes when they failed over) as well as the main database service log ora.RACDB.db.log, which controls all RAC instances for high availability and monitoring.

Location: I:\oracle\product\10.2.0\crs\BIN\OOBJService.log
This is the log for the first Windows service, Oracle Object Service, which gets started and is responsible for the links to storage management (ocr disk, voting etc). You should look in this log (even if the Windows service gets started fine) for any errors relating to accessing the shared storage from a node.

You can control the information being logged with various trace levels. For example,
crsctl debug statedump crs
will dump the status of crsd.
Suppose you want to debug specific modules for a service; the first thing you should do is to find out all of the modules related to a service.
crsctl lsmodules css will list:
CSSD COMMCRS COMMNS
crsctl lsmodules crs will list:
CRSUI CRSCOMM CRSRTI CRSMAIN CRSPLACE CRSAPP CRSRES CRSCOMM CRSOCR CRSTIMER CRSEVT CRSD CLUCLS CSSCLNT COMMCRS

COMMNS
crsctl lsmodules evm will list:
EVMD EVMDMAIN EVMCOMM EVMEVT EVMAPP EVMAGENT CRSOCR CLUCLS CSSCLNT COMMCRS COMMNS
Now suppose you want to debug css modules at level 5 (detailed info):
crsctl debug log css CSSD:5,COMMCRS:5,COMMNS:5
crsctl debug log crs "CRSRTI:1,CRSCOMM:2"
crsctl debug log res ora.RACDB.db:5
crsctl debug log res ora.RACDB.RACDB1.inst:5
ocr logging: uncomment I:\oracle\product\10.2.0\crs\srvm\admin\ocrlog.ini

Database Services Log
Location: I:\oracle\product\10.2.0\db_1\log\rac3\*
This location holds several logs and it is worthwhile to look here when there is an issue with the cluster database. For example you find here a log about the OCR not being able to be initialized, with file name:
I:\oracle\product\10.2.0\db_1\log\rac3\\client\ocrconfig_1064.log
and ocrcheck_600.log, which holds useful information every time you run the ocrcheck utility.

Instance monitor/RACGIMON logs
Location: I:\oracle\product\10.2.0\db_1\log\rac3\racg\imon_RACDB.log
I:\oracle\product\10.2.0\db_1\log\rac3\racg\imon

Location: I:\oracle\product\10.2.0\db_1\RDBMS\log\ipcdbg.log
Here you can find information related to the Cache Fusion communication channel over the private interface card. Look here when there is an issue between nodes on the private interface channel.

Backup and Recovery - RAC Environment
Here I will discuss about backing up RAC

environment including all of its components, Cluster and Database. As far as the OS and Oracle software layer is concerned, you should have a cold backup for the OS system backup which should include the OS and Oracle software mount points.

Backup and Recovery for clustered database
You can always use an export method to back up the database or a specific schema, however this does not differ from single instance to RAC and I would not consider export to replace the standard backup procedures. Export is always used when your requirements are closer to the application level for specific objects. Therefore do not consider export as your backup strategy for RAC, or even for a single instance non-RAC database. RMAN should always be The Choice when considering backup strategies for a RAC database environment. There are many articles available on Metalink that talk about RAC Backup and Recovery procedures/commands. My viewpoint is that a DBA needs to be more aware of Backup and Recovery concepts in a RAC environment rather than the actual command differences. I have always maintained RAC databases in such a way that I did not have to issue different backup/recovery commands for single instance vs. RAC. The key here is that you should always define the Archived Log location in the shared storage (where the rest of the data files reside). Backup commands have specific switches when your archived logs are backed up locally on each node. For example, take a look at the following test case where archived logs are defined at a shared storage accessible to both nodes.

1. Create a folder in shared storage to hold your backup sets.
set ORACLE_SID=+ASM1
set ORACLE_HOME=I:\oracle\product\10.2.0\db_1
asmcmd -p
cd DATA
cd RACDB
mkdir BACKUP
+DATA/RACDB/BACKUP is the shared backup location unless you are using a Tape device; then you should make sure to use a MML like Veritas and have the backup registered in Veritas as well as in the RMAN repository.

2. Point your archived logs to be created at shared storage.
alter system set log_archive_dest='+DATA' scope=both

3. Create test data
show parameter db_create_file_dest
NAME                 TYPE    VALUE
-------------------- ------- -----
db_create_file_dest  string  +DATA
drop tablespace tbs_test1 including contents and datafiles;
drop user user1;
CREATE TABLESPACE tbs_test1 DATAFILE SIZE 20M;
create user user1 identified by user1;
alter user user1 default tablespace tbs_test1;
create table dept (id number)
--insert some values
Now from both nodes switch logfiles. Note down the archived logs created.
SELECT name, thread#, sequence#, completion_time FROM gV$ARCHIVED_LOG where completion_time > '21-JUL-2007 12:00:31' and name is not null ORDER BY SEQUENCE# DESC

4. Take full database backup
configure CONTROLFILE AUTOBACKUP on;
run { change archivelog all crosscheck; }
run { backup as compressed backupset database format = '+DATA/RACDB/BACKUP/FULLB1%u'; backup archivelog all format = 'i:\oracle\archbkup%u' delete input; }
run { delete obsolete; }
RMAN> list backup of database;

5. Simulate Crash
Shutdown database (only on windows)
From ASMCMD, remove the datafile for the tbs_test1 tablespace.
Startup database in mount state.
alter database datafile 7 offline;
alter database open;

Check which files need recovery, then restore and recover the datafile with RMAN:

   select * from v$recover_file;
   exit

   rman target /
   restore datafile 7;
   recover datafile 7;
   sql 'alter database datafile 7 online';

As you can see in the above example, we did not have to specify any extra commands for the RAC environment because our archived logs are located in the common storage. However, if you decide to put archived logs locally, these are the changes to be aware of:

   run {
   allocate channel d1 type disk connect 'sys/rac@node1';
   allocate channel d2 type disk connect 'sys/rac@node2';
   # your usual backup or recovery commands go here
   }

Or

   CONFIGURE CHANNEL 1 DEVICE TYPE DISK connect 'SYS/rac@node1';
   CONFIGURE CHANNEL 2 DEVICE TYPE DISK connect 'SYS/rac@node2';

Now you run commands as normal and those will have the scope for the relevant node specified above.

There is an excellent note on Metalink that deals with issues relating to recovery scenarios for RAC environments: Note:207059.1 and Note:220970.1. Document 207059.1 is an old one that deals with 9i and Parallel Server, but it does clear up some concepts about raw devices.

Backup and Recovery for OCR & VOTING disks

OCR Backup and Recovery

Reference: Metalink Note: 220970.1

OCR is the Oracle Cluster Registry; it holds all the cluster related information such as instances and services. The OCR file format is binary, and starting with 10.2 it is possible to mirror it. The location of the file(s) is recorded in /etc/oracle/ocr.loc, in the ocrconfig_loc and ocrmirrorconfig_loc variables.

The OCR raw device/file gets backed up every four hours on the master RAC node at the default location: $ORA_CRS_HOME\cdata\"clustername"\

   To display backups : ocrconfig -showbackup
   To restore a backup : ocrconfig -restore

The automatic backup mechanism keeps copies going back about a week. If you want to take a logical copy of the OCR at any time, use ocrconfig -export, and use the -import option to restore the contents back.

Obviously if you only have one copy of the OCR and it is lost or corrupt then you must restore a recent backup; see the ocrconfig utility for details, specifically the -showbackup and -restore flags. Until a valid backup is restored the Oracle Clusterware will not start up due to the corrupt/missing OCR file.

The interesting discussion is what happens if you have the OCR mirrored and one of the copies gets corrupt. You would expect that everything will continue to work seamlessly. Well, almost. The real answer depends on when the corruption takes place.

If the corruption happens while the Oracle Clusterware stack is up and running, then the corruption will be tolerated and the Oracle Clusterware will continue to function without interruptions. Despite the corrupt copy, the DBA is advised to repair the hardware/software problem that prevents OCR from accessing the device as soon as possible; alternatively, the DBA can replace the failed device with another healthy device using the ocrconfig utility with the -replace flag.

If however the corruption happens while the Oracle Clusterware stack is down, then it will not be possible to start it up until the failed device becomes online again or some administrative action using the ocrconfig utility with the -overwrite flag is taken. When the Clusterware attempts to start you will see messages similar to:

   total id sets (1), 1st set (1669906634,1958222370), 2nd set (0,0) my votes (1), total votes (2)
   2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprioini:disk 0 (/dev/raw/raw1) doesn't have enough votes (1,2)
   2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprseterror: Error in accessing physical storage [26]

This is because the software can't determine which OCR copy is the valid one. In the above example one of the OCR mirrors was lost while the Oracle Clusterware was down. There are 3 ways to fix this failure:

a) Fix whatever problem (hardware/software?) prevents OCR from accessing the device.

b) Issue "ocrconfig -overwrite" on any one of the nodes in the cluster. This command will overwrite the vote check built into OCR when it starts up. Basically, if the OCR device is configured with a mirror, OCR assigns each device one vote. The rule is to have more than 50% of the total votes (quorum) in order to safely make sure the available devices contain the latest data. In 2-way mirroring, the total vote count is 2, so it requires 2 votes to achieve the quorum. In the example above there aren't enough votes to start if only one device with one vote is available. (In the earlier example, where the device goes down while OCR is running, OCR assigns 2 votes to the surviving device, and that is why this surviving device, now with two votes, can start after the cluster is down.)

c) This method is not recommended to be performed by customers. It is possible to manually modify ocr.loc to delete the failed device and restart the cluster. OCR won't do the vote check if the mirror is not configured.

How to move the OCR location:

- Stop the CRS stack on all nodes.
- Edit /var/opt/oracle/ocr.loc (or the windows registry) on all nodes and set ocrconfig_loc=new OCR device.
- Restore from one of the automatic physical backups using ocrconfig -restore.
- Run ocrcheck to verify.
- Reboot to restart the CRS stack.

OCR locations can be changed with ocrconfig:

   ocrconfig -replace ocr|ocrmirror [<filename>]

In short these are the commands to administer OCR:

   ocrconfig -replace ocr destination_file or disk

To add a mirror file, do the following:

   ocrconfig -replace ocrmirror destination_file or disk

To replace the OCR do the following:

   ocrconfig -replace ocr destination_file or disk

and to replace the OCR mirror:

   ocrconfig -replace ocrmirror destination_file or disk

Repairing the OCR:

   ocrconfig -repair ocrmirror device_name

To remove an OCR, you need to have at least one OCR online:

   ocrconfig -replace ocr
   OR
   ocrconfig -replace ocrmirror

Voting Disk Backup and Recovery

On Unix, you can use the following to back up voting disks:

   dd if=voting_disk_name of=backup_file_name

In Windows environments you can use the ocopy command, along with the crsctl commands, to copy and administer the files.

   List existing voting disks: crsctl query css votedisk
   To delete existing voting disk: crsctl delete css votedisk path
   To add another voting disk: crsctl add css votedisk path

The above command should be run when crs is up, however use the force option if crs is down:

   crsctl add css votedisk path -force

Test Case: Let's apply what we have learned to the RAC environment we created earlier.

Adding 3rd voting disk: Our extended RAC environment already has one voting disk for RAC1 & RAC3 nodes. I would like to add a third voting disk on the RAC5 (3rd) node.

From the VMWare settings of the RAC5 node, create a new pre-allocated virtual disk (IDE) of 300MB in size. Now share the D:\RACVIRTUAL\RAC5 folder on XPWS5 to XPWS1 and XPWS2 with full access. On XPWS1 create a logical Y: drive to point to it, likewise on XPWS2 as well. Then from VMWare add the existing virtual disk from both workstations (XPWS1 and XPWS2) to point to Y:\votedisk3.vmdk.

You should reboot RAC1, RAC3 and RAC5 and then use GUIOracleOBJManager.exe to see that the new disk of 300MB is visible. Run ASMTOOLG before and after adding the disks, so you would be able to identify the raw partition name. Now you need to run GUIOracleOBJManager.exe under CRS HOME/bin to assign a logical name/link to the new candidate disk as VOTEDSK3, as shown in the following figure.

Now from the RAC1 node, make sure all cluster services are DOWN and then verify this with the command crs_stat -t. There is a bug, fixed in 10.2.0.4, that crashes the crs stack if voting disks are added online, therefore you need to use the force option:

From the node RAC5:

   I:\oracle\product\10.2.0\crs\BIN>crsctl add css votedisk \\.\votedsk3
   Cluster is not in a ready state for online disk addition

   I:\oracle\product\10.2.0\crs\BIN>crsctl add css votedisk \\.\votedsk3 -force
   Now formatting voting disk: \\.\votedsk3
   successful addition of votedisk \\.\votedsk3.

Verify this from all nodes by running crsctl query css votedisk:

   I:\oracle\product\10.2.0\crs\BIN>crsctl query css votedisk
   0.     0    \\.\votedsk1
   1.     0    \\.\votedsk2
   2.     0    \\.\votedsk3
   located 3 votedisk(s).

Now start the cluster node rac1 with all services and it should be up and running.

Taking a backup of voting disk: Shut down all cluster services across nodes. Use the Oracle supplied ocopy command to take a backup as shown below:

From RAC5:

   I:\oracle\product\10.2.0\crs\BIN>ocopy
   OCOPY v2.0 - Copyright 1989-1993 Oracle Corp. All rights reserved.
   Usage: ocopy from_file [to_file [a size_1 [size_n]]]
          ocopy -b from_file to_drive
          ocopy -r from_drive to_dir

   I:\oracle\product\10.2.0\crs\BIN>ocopy \\.\votedsk3 votedsk3.bak
   VOTEDSK3.BAK
   I:\oracle\product\10.2.0\crs\BIN>

Restoring a backup of voting disk: Suppose you lost your voting disk/device. Follow the procedures to create a new raw voting disk/device as described earlier, up to the point where you assign the link name. Then run the restore as shown below:

   I:\oracle\product\10.2.0\crs\BIN>ocopy votedsk3.bak \\.\VOTEDSK3

However, suppose you lost all of your voting disks but you had a backup: follow the same procedures as described above to re-create the new voting disk.

Changing Location of Voting disk: Use the add method described above.

What to do when OCR/Voting disks are lost and there is no backup: Reference Metalink ID: 399482.1
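The verification step used in this section (running crsctl query css votedisk on every node) can be scripted once you have captured the command output as text. The following is only a minimal sketch in Python, assuming the output has the same row layout as the listing shown in this handbook; the function names and the sample text are my own illustration and are not part of any Oracle tool.

```python
import re

# Sample text modelled on the `crsctl query css votedisk` listing in this handbook.
SAMPLE_OUTPUT = r"""
 0.     0    \\.\votedsk1
 1.     0    \\.\votedsk2
 2.     0    \\.\votedsk3
located 3 votedisk(s).
"""


def parse_votedisks(output):
    """Return the voting disk paths found in the captured query output."""
    disks = []
    for line in output.splitlines():
        # A disk row is assumed to look like: "<index>. <number> <path>"
        match = re.match(r"\s*(\d+)\.\s+(\d+)\s+(\S+)\s*$", line)
        if match:
            disks.append(match.group(3))
    return disks


def reported_count(output):
    """Return the number from the 'located N votedisk(s)' summary line, or None."""
    match = re.search(r"located\s+(\d+)\s+votedisk", output)
    return int(match.group(1)) if match else None


def check_votedisks(output, expected):
    """True when both the listed rows and the summary line match the expected count."""
    return len(parse_votedisks(output)) == expected and reported_count(output) == expected


if __name__ == "__main__":
    for path in parse_votedisks(SAMPLE_OUTPUT):
        print(path)
    print("all three voting disks visible:", check_votedisks(SAMPLE_OUTPUT, 3))
```

In practice you would save the output of the query from each node to a file and pass its contents to check_votedisks, so that a node which does not see the newly added disk is caught before the cluster services are started.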

