MIGRATING FROM SINGLE INSTANCE TO RAC AND UPGRADING
INTRODUCTION
Oracle provides many methods of migrating and upgrading its database systems, including CTAS, Export/Import, Data Pump, Streams, GoldenGate, and Data Guard. The purpose of this paper is to discuss the use of Data Guard to migrate from a single instance to a RAC database and explain the rationale behind this approach. The paper also covers the benefits we found in designing the architecture and infrastructure, how we were able to successfully migrate to new servers with only one hour of downtime, and the challenge of migrating to the newest patchset with the software stack we were using.

OLD INFRASTRUCTURE: THE PROBLEM

- The old infrastructure consisted of Active/Passive nodes using Veritas Clustering to provide failover capabilities
- It did not allow us to scale, and we had maxed out both CPU and memory capacity in the frame
- It could not connect to newer disk frames due to antiquated HBAs
- It was running Oracle 10.2.0.4 on Solaris 9
- One node sat idle due to the Active/Passive design of the environment
- Due to the application design, a Logical Standby was configured for reporting purposes only

The application, an e-commerce application, was experiencing scalability issues during busy times, with high I/O wait times as well as application concurrency issues. The database server was maxing out on CPU, which in turn caused the application to bottleneck.

- Various application tuning attempts yielded minimal gain, since this was a 3rd-party application with customized code
- The application team expected higher concurrency and transaction growth rates due to newer technologies being deployed at the application's front end
- The cost and timeline to upgrade the current infrastructure would have caused us to miss the holiday season and thus lose significant revenue
- The application team needed a plan with the least amount of downtime and the least amount of money spent
- The platform had to be scalable to meet our future needs
- The platform had to be certified by the 3rd-party application we were using
371
Database
DESIGN GOALS:
- A scalable architecture that can support our business growth
- Minimal code changes
- Support for our business, available in time for the holiday season
The pre-migration architecture consisted of hardware that the business had outgrown, and a model that required downtime for any change, which was not acceptable to the business. The need to be highly available, to patch with minimal downtime, to load balance the environment, and to utilize all hardware on the floor caused us to look for better strategies.
As part of the design process, and to meet the design goals, the team worked towards Oracle RAC as the technology/scalability platform for the application. This technology was selected after careful consideration: it was certified by the ATG product team (ATG is Art Technology Group, now part of Oracle), and it made sense as the platform to provide uptime and economies of scale. The new environment selected was:

- 3 Oracle SPARC Enterprise M5000 servers running Solaris 10
- Oracle Grid Infrastructure 11.2.0.2
- Oracle RAC 11.2.0.2
- EMC VMAX SAN
The environment was sized for an N+1 configuration, which allowed the application to function without any issues with one node out of service. The capacity planning exercise involved tools from several vendors, including AWR reports, to assess peak and non-peak capacity models.
MIGRATION PREPARATION
The migration goals were simple: we needed a platform to migrate to with very little downtime. The team started working on strategies that would provide not only a migration platform but also a pre-migration live test. The technologies considered were:

Disk-Based Replication
Disk-based replication would have worked, since it was between two similar frames, but we were moving from single instance to RAC and that would have caused complications.
Backup & Restore
Backup and restore was one of the options considered, but it was time consuming and the downtime window would have been extensive and thus unacceptable to the business.
Data Guard
Oracle Data Guard fit the bill: it provided not only a faster migration but also a repeatable process that allowed us to do a test run and perform pre-migration testing. As part of our Data Guard strategy we evaluated both logical and physical standby as a means to migrate. Logical was ruled out because our primary had objects without primary keys, and some load operations would have caused issues with the transient logical standby upgrade methodology. We already had a logical standby attached to the single instance, and our understanding of its implications made the decision to use a physical standby much easier.
The Cluster Verification Utility (CVU) can help in discovering the network interfaces and ensuring that the environment is fully configured prior to the install. The Oracle 11gR2 (11.2.0.2+) installer includes pre-checks that invoke CVU and validate all components of the environment prior to creating the cluster.
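As a sketch, CVU can also be run standalone from the installation media before launching the installer; the node names rac01 and rac02 below are hypothetical placeholders:

```shell
# Pre-install check for a Grid Infrastructure (CRS) installation
# across the intended cluster nodes; -verbose prints each check result
./runcluvfy.sh stage -pre crsinst -n rac01,rac02 -verbose
```

Running this before the install surfaces missing kernel parameters, user equivalence, and network problems while they are still cheap to fix.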
SOFTWARE INSTALLATION
After the environment has been validated it is time to install the software. Oracle 11gR2 installation differs from previous versions: many checks that were manual before are now incorporated into the installer, and the binary structure has changed as well. Oracle 11gR2 breaks the Oracle binaries into two distinct sets:

1. Oracle Clusterware / Grid Infrastructure binaries
The Oracle Clusterware or Grid Infrastructure (GI) binaries are now separate and include the Clusterware as well as the Automatic Storage Management (ASM) binaries. These binaries can be installed for a RAC installation as well as for a single instance installation where ASM and Oracle Restart are needed. They can also be installed under a separate user for job role separation.

2. Oracle Database binaries
The Oracle Database binaries are the standard binaries used for the operation and management of the Oracle Database.

Oracle GI has a specific issue at install time (11.2.0.2 only) caused by multicast problems, which requires a patch that can be applied during the software install. This issue is documented in MyOracleSupport (formerly Metalink) note #1212703.1 and is very frustrating to debug. The GI installation gets a little complicated if you create job role separation by creating a separate grid user and separating the Clusterware install from the DB install. MyOracleSupport has many useful documents that explain, in detail, how to set up the Clusterware and what the caveats are for job role separation; note #1376731.1 will help here.

Once the install is complete, a validation must be done to ensure all components are installed correctly. Oracle provides an audit tool called raccheck, which can be downloaded from note #1268927.1. Raccheck is a very useful tool and should be used not only for validating the install but also for routine validation of the environment. The following is an example of the checks Oracle has incorporated into raccheck.
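A hedged sketch of downloading and running the tool follows; the exact flags vary by raccheck version, so check the MOS note's user guide before relying on them:

```shell
# Unzip the raccheck kit from MOS note 1268927.1 and run all audit
# checks as the database software owner (-a = perform all checks)
unzip raccheck.zip
cd raccheck
./raccheck -a
```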
Begin RACCHECK example output
=============================================================
Node name - test
=============================================================
Collecting - ASM DIsk I/O stats
Collecting - ASM Disk Groups
Collecting - ASM disk partnership imbalance
Collecting - ASM diskgroup attributes
Collecting - ASM initialization parameters
Collecting - Active sessions load balance for test database
Collecting - Archived Destination Status for test database
Collecting - CONNECT Role Grantees for test database
Collecting - Cluster Interconnect Config for test database
Collecting - Data Files In Backup Mode for test database
Collecting - Database Archive Destinations for test database
Collecting - Database Component Status for test database
Collecting - Database Files for test database
Collecting - Database Instance Settings for test database
Collecting - Database Parameters for test database
Collecting - Database Properties for test database
Collecting - Database Registry for test database
Collecting - Database Sequences for test database
Collecting - Database Undocumented Parameters for test database
Collecting - Database Workload Services for test database
Collecting - Dataguard Status for test database
Collecting - Files Needing Media Recovery for test database
Collecting - Files not opened by ASM
Collecting - INVALID SYS and SYSTEM objects for test database
Collecting - INVALID application objects for test database
Collecting - Invalid Java Objects for test database
Collecting - Invalid Registry Components for test database
Collecting - JVM Roles for test database
Collecting - Log Sequence Numbers for test database
Collecting - Objects Duplicated in SYS and SYSTEM Schema for test database
Collecting - Percentage of asm disk Imbalance
Collecting - Process for shipping Redo to standby for test database
Collecting - Redo Log information for test database
Collecting - Standby redo log creation status before switchover for test database
Collecting - CRS active version
Collecting - CRS oifcfg
Collecting - CRS software version
Collecting - CSS Reboot time
Collecting - CSS diagwait
Collecting - CSS disktimout
Collecting - CSS miscount
Collecting - Cluster interconnect (clusterware)
Collecting - Clusterware OCR healthcheck
Collecting - Clusterware Resource Status
Collecting - Kernel parameters
Collecting - Multipath configuration
Collecting - Netstat for tcp and udp protocols
Collecting - OS Packages
Collecting - OS Patches
Collecting - Shared memory segments
Collecting - Solaris10 kernel parameters
Collecting - Solaris9 kernel parameters
Collecting - System configuration information
Collecting - Table of file system defaults
Collecting - Voting disks (clusterware)
End RACCHECK example output
The following is an example of sample raccheck findings:

INFO => $CRS_HOME/log/hostname/client directory has too many older log files
WARNING => Value of remote_listener parameter is not able to tnsping for test
WARNING => Value of remote_listener parameter is not able to tnsping for test
INFO => core_dump_dest has too many older core dump files for test
INFO => user_dump_dest has trace files older than 30 days for test
INFO => ORA-00600 errors found in alert log for test
INFO => ORA-07445 errors found in alert log for test
INFO => background_dump_dest has files older than 30 days for pwagdb
INFO => Some tablespaces do not have allocation type as SYSTEM for test
WARNING => Some tablespaces are not using Automatic segment storage management for test
THE MIGRATION
We have talked about getting the environment ready; now we turn to the actual process of migrating from single instance to RAC using Data Guard. Since this migration was from an Oracle 10.2.0.4 database, the first step was to install the 10.2.0.4 DB software on the new 11.2.0.2 RAC cluster. Once the software is successfully installed, follow these steps to get the environment up:

Pre-Migration Steps
1. Add redo threads to correspond to the number of instances in the new RAC cluster
2. Add undo tablespaces to correspond to the number of RAC instances you will have
3. Run ?/rdbms/admin/catclust.sql on the single instance during a quiet time to get the RAC catalog views in place
4. Back up the database as below:
RUN {
  set command id to 'stdby_test';
  allocate channel ch1 type disk format ='/usr/local/oracle/migration/standby_%U.bak';
  allocate channel ch2 type disk format ='/usr/local/oracle/migration/standby_%U.bak';
  BACKUP DATABASE PLUS ARCHIVELOG TAG for_standby;
}
BACKUP DEVICE TYPE DISK FORMAT '/usr/local/oracle/migration/test_ctl%U'
  CURRENT CONTROLFILE FOR STANDBY;
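Pre-migration steps 1 and 2 above can be sketched in SQL for a two-node cluster; the group numbers, sizes, and tablespace name below are hypothetical and should match your own layout:

```sql
-- Add a redo thread for the second instance (sizes are placeholders)
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 4 SIZE 512M;
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 5 SIZE 512M;
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 6 SIZE 512M;
ALTER DATABASE ENABLE PUBLIC THREAD 2;

-- Add an undo tablespace for the second instance
CREATE UNDO TABLESPACE UNDOTBS2 DATAFILE SIZE 2G AUTOEXTEND ON;
```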
5. Configure Oracle tnsnames.ora & sqlnet.ora on both the standby and the primary, so that each side has both the primary and standby node information
6. Set up the standby init.ora with parameters from the primary plus the add-ons below
-- Set up the init file on the standby
-- Copy the base file from the primary, then modify the following:
*.control_files
*.log_archive_config
*.log_archive_dest_2
*.db_file_name_convert
*.log_file_name_convert
*.standby_file_management=auto
*.fal_server
*.fal_client
*.service_names
*.cluster_database=true
*.db_unique_name
*.instance_name
*.instance_number
*.thread
*.remote_listener
*.local_listener
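A hedged sketch of what those parameters might look like on the standby side; every name here (test_prim, test_stby, the instance prefixes) is a hypothetical placeholder:

```
*.db_unique_name='test_stby'
*.log_archive_config='DG_CONFIG=(test_prim,test_stby)'
*.log_archive_dest_2='SERVICE=test_prim ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=test_prim'
*.fal_server='test_prim'
*.fal_client='test_stby'
*.standby_file_management='AUTO'
*.cluster_database=true
test1.instance_number=1
test2.instance_number=2
test1.thread=1
test2.thread=2
```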
7. On the new RAC node, as the grid user, run
$GRID_HOME/bin/setasmgidwrap o=/opt/oracle/product/10.2.0/db/bin/oracle
to get the Oracle binary to work with the 11.2.0.2 Clusterware

Migration Steps
1. Copy the backup to a location available to the new nodes
2. Restore the database as below:
$ rman target sys/oracle@pwagdb01 auxiliary /
RMAN> DUPLICATE TARGET DATABASE FOR STANDBY;
3. Once the duplicated database exists, perform the setup to make the Clusterware/OCR aware of the database:
srvctl add database -d test -o /usr/local/oracle/product/10.2 -r PHYSICAL_STANDBY  -- Tell the Clusterware this is a physical standby
srvctl add instance -d test -i test1 -n testdb01  -- Add node 1 instance
srvctl add instance -d test -i test2 -n testdb02  -- Add node 2 instance
srvctl modify instance -d test -i test1 -s +ASM1  -- Add node 1 ASM as a dependency
srvctl modify instance -d test -i test2 -s +ASM2  -- Add node 2 ASM as a dependency
4. Add standby redo logs identical in size to the primary database's online redo logs. The standby redo logs can be created before or after the duplicate, and are needed for real-time apply to function
5. Run ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT on instance 1 of the new RAC DB to start the physical standby apply

At this point we have a complete running standby database on the new RAC cluster. The database is 10.2.0.4 and in complete sync with the primary; the Data Guard Broker should be used to manage and maintain it while the primary single instance database is being synced with the new RAC clustered physical standby. After this point the upgrade portion of the exercise begins. In our scenario we chose to take downtime, since application configuration changes were also needed. The upgrade scenario was as follows:

Upgrade Steps
1. On the primary and standby, ensure all redo logs have been shipped
2. Put the database in restricted mode and bring the application down
3. Bounce the database to ensure all application connections are gone and put it in restricted mode. Switch a couple of redo logs and then defer the redo log shipping
4. Ensure all received archivelogs are applied on the standby database
5. Stop standby redo apply: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
6. Clean up old archive logs
7. Activate the standby database: ALTER DATABASE ACTIVATE STANDBY DATABASE;
8. Open the database
9. Shut down the database to enable flashback
10. Remove any obsolete parameters identified by utl112i.sql
11. Create a restore point
12. Open the database
13. Run DBUA to upgrade the database
14. Validate the upgrade
15. Go live

While the steps look tedious, if well planned and choreographed the total upgrade time can be 1-3 hours.
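Step 4 of the migration steps above might look like the following, assuming 512 MB online redo logs on the primary. The group numbers and sizes are hypothetical; a common practice is one more standby redo log group per thread than there are online groups:

```sql
-- Standby redo logs, same size as the primary's online redo logs
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 11 SIZE 512M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 12 SIZE 512M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 13 SIZE 512M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 GROUP 14 SIZE 512M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 GROUP 15 SIZE 512M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 GROUP 16 SIZE 512M;
```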
This, in our case, coincided with the application changes required to support the new architecture, including the addition of services to load balance the application. The upgrade was completed, the move to the new infrastructure went off without a hitch, and our go-live from single instance to RAC was a success.
REFERENCES AND FURTHER READING

The references below are provided courtesy of the Oracle RAC Assurance group, a team within Oracle whose goal is to provide input to customers on new and existing CRS, RAC, ASM, and MAA implementations.
22.1 Notes for 'My Oracle Support' (MOS) (The New MetaLink):
o Doc ID 736737.1 My Oracle Support - The Next Generation Support Platform o Doc ID 730283.1 Get the most out of My Oracle Support o Doc ID 747242.1 My Oracle Support Configuration Management FAQ
o Doc ID 296874.1 Configuring the HP-UX Operating System for the Oracle 10g VIP
o Doc ID 361323.1 HugePages on Linux: What It Is... and What It Is Not...
o Doc ID 401132.1 How to install Oracle Clusterware with shared storage on block devices
o Doc ID 357472.1 Configuring device-mapper for CRS/ASM
o Doc ID 332257.1 Using Oracle Clusterware with Vendor Clusterware FAQ
o Doc ID 397460.1 Oracle's Policy for Supporting RAC 10g with Symantec SFRAC
o Doc ID 790189.1 Oracle Clusterware and Application Failover Management
o Doc ID 391771.1 OCFS2 - Frequently Asked Questions
o Doc ID 359515.1 Mount Options for Oracle files when used with NAS devices
o Doc ID 759565.1 Turning NUMA on can cause database hangs
o Doc ID 220970.1 RAC: Frequently Asked Questions (9.2 through 11.2)
o Automatic Workload Management with Oracle Real Application Clusters (FAN/FCF)
o Oracle Clusterware 11g Release 2
o Doc ID 811306.1 RAC Assurance Support Team RAC Starter Kit (Linux) o Doc ID 811280.1 RAC Assurance Support Team RAC Starter Kit (Solaris) o Doc ID 811271.1 RAC Assurance Support Team RAC Starter Kit (Windows)
Oracle Global Customer Support, RAC Assurance, Walgreens
o Doc ID 811293.1 RAC Assurance Support Team RAC Starter Kit (AIX) o Doc ID 811303.1 RAC Assurance Support Team RAC Starter Kit (HP-UX)
o Doc ID 429855.1 CRSCTL STOP CRS issues Shutdown Abort in both ASM and database instances
o Doc ID 4598992.8 Bug 4598992: "Action script for resource 'ora.xxx.vip' stdout redirection" errors in crsd.log
o Doc ID 563905.1 Implementing LIBUMEM for CRS on Solaris 64 with 3rd Party Clusterware
o Doc ID 92602.1 How to Password Protect the Listener
o Doc ID 403743.1 VIP Failover Take Long Time After Network Cable Pulled
o Doc ID 359515.1 Mount Options for Oracle files when used with NAS devices
o Doc ID 561414.1 Transactional Sequences in Applications in a RAC environment
o Doc ID 395314.1 RAC Hangs due to small cache size on SYS.AUDSES$
o Doc ID 563566.1 gc lost blocks diagnostics
o Doc ID 949322.1 Oracle11g Data Guard: Database Rolling Upgrade Shell Script
o Doc ID 1053147.1 11gR2 Clusterware and Grid Home - What You Need to Know
o Doc ID 742060.1 Release Schedule of Current Database Releases
o Doc ID 276434.1 Modifying the VIP or VIP Hostname of a 10g or 11g Oracle Clusterware Node
o Doc ID 219361.1 Troubleshooting ORA-29740 in a RAC Environment
o Doc ID 226880.1 Configuration of Load Balancing and Transparent Application Failover
o Doc ID 864633.1 Enable Oracle NUMA support with Oracle Server Version 11.2.0.1
o RAC on IBM AIX: Best practices in memory tuning and configuring for system stability
o XA and Oracle controlled Distributed Transactions
o Doc ID 810915.1 How to Monitor, Detect and Analyze OS and RAC Resource Related Degradation and Failures on Windows
o Doc ID 433472.1 OS Watcher For Windows (OSWFW) User Guide
o Rapid Oracle RAC Standby Deployment: Oracle Database 11g Release 2
o Oracle Active Data Guard: Oracle Data Guard 11g Release 1
o Oracle Data Guard: Disaster Recovery for Sun Oracle Database Machine
o Platform Migration Using Transportable Database: Oracle Database 11g and 10g Release 2
o Switchover and Failover Best Practices: Oracle Data Guard 10g Release 2
o Client Failover Best Practices for Highly Available Databases: Oracle Database 10g Release 2
o Configuring Oracle BI EE Server with Oracle Active Data Guard
o Fast-Start Failover Best Practices: Data Guard 10g Release 2
o Data Guard Redo Apply and Media Recovery Best Practices: Oracle Database 10g Release 2
o SQL Apply Best Practices: Oracle Data Guard 11g Release 1
o SQL Apply Best Practices: Oracle Data Guard 10g Release 2
22.16 Cloning
o DB Cloning Solution Using Oracle's Sun ZFS Storage Appliance And Oracle Data Guard
o Cloning an Oracle Database to the Same Server Using Snapshot and Volume Copy
22.17 Exadata
o Doc ID 888828.1 Database Machine and Exadata Storage Server 11g Release 2 (11.2) Supported Versions
o Oracle Exadata Tips, Tricks, and Best Practices: Backup and Recovery (S316821)
o Oracle Data Guard: Disaster Recovery for Sun Oracle Database Machine
o Migrating Oracle E-Business Suite to Sun Oracle Database Machine Using Oracle Data Pump
o Tape Backup Performance and Best Practices for Exadata Storage and the HP Oracle Database Machine
o Doc ID 388577.1 Using Oracle 10gR2 RAC and Automatic Storage Management with Oracle E-Business Suite Release 12
o Doc ID 455398.1 Using Oracle Real Application Clusters and Automatic Storage Management with Oracle E-Business Suite Release 11i and Oracle Database 11g
o Doc ID 466649.1 Using Oracle 11g Release 1 (11.1.0.7) Real Application Clusters and Automatic Storage Management with Oracle E-Business Suite Release 12
o Doc ID 762024.1 How To Ensure Load Balancing Of Concurrent Manager Processes In PCP-RAC Configuration
o Doc ID 241370.1 Concurrent Manager Setup and Configuration Requirements in an 11i RAC Environment
o Doc ID 1057802.1 Best Practices for Performance for Concurrent Managers in E-Business Suite
o Doc ID 552028.1 How to Ensure That Source Nodes Are Not Used on Failover From RAC to Single Tier Dataguard Environment
o Doc ID 271090.1 Parallel Concurrent Processing Failover/Failback Expectations
o Maximum Availability Architecture (MAA): Oracle E-Business Suite Release 12
22.21 Patching
As part of an overall maintenance strategy, it is critical that customers have a formal strategy to stay in front of known issues and bugs. To make it easier for customers to obtain and deploy fixes for known critical issues, Oracle has created the following notes:
o Doc ID 1082394.1 11.2.0.X Grid Infrastructure PSU Information
o Doc ID 438314.1 Critical Patch Update - Introduction to Database n-Apply CPUs
o Doc ID 756671.1 Oracle Recommended Patches - Oracle Database
o Doc ID 850471.1 Oracle Announces First Patch Set Update For Oracle Database Release 10.2
o Doc ID 761111.1 Online Patches (Hot Patching)
o Doc ID 405820.1 10.2.0.X CRS Bundle Patch Information
o Doc ID 844983.1 Apply CRS Bundle Patch Using opatch auto Option
22.22 Weblogic and RAC
o Doc ID 1086009.1 Grid Control 11g Fusion Middleware and Weblogic Server Management New Features

22.23 Websphere and RAC
o Using Oracle Real Application Cluster (RAC) with WebSphere Process Server
o Enabling Oracle pooling in WebSphere Application Server V6.1
o Event notification and database connection failover support available to database clients when a broker-managed failover occurs
o Oracle RAC on Extended Distance Clusters