
Sun Enterprise Cluster Administration

ES-330

Student Guide With Instructor Notes

Sun Microsystems, Inc.
MS BRM01-209
500 Eldorado Boulevard
Broomfield, Colorado 80021
U.S.A.

Rev. A, September 1999

Copyright 1999 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, California 94303, U.S.A. All rights reserved. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California.

UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun Logo, Sun Enterprise, Sun StorEdge Volume Manager, Solstice DiskSuite, Solaris Operating Environment, Sun StorEdge A5000, Solstice SyMon, NFS, JumpStart, Sun VTS, OpenBoot, and AnswerBook are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.

The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements.

U.S. Government approval required when exporting the product.

RESTRICTED RIGHTS: Use, duplication, or disclosure by the U.S. Govt is subject to restrictions of FAR 52.227-14(g) (2)(6/87) and FAR 52.227-19(6/87), or DFAR 252.227-7015 (b)(6/95) and DFAR 227.7202-3(a).

DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

Please Recycle

Contents
About This Course ................................................................................... xvii Course Overview ........................................................................... xviii Course Map........................................................................................ xix Module-by-Module Overview ......................................................... xx Course Objectives........................................................................... xxiv Skills Gained by Module................................................................. xxv Guidelines for Module Pacing ..................................................... xxvi Topics Not Covered...................................................................... xxvii How Prepared Are You?............................................................. xxviii Introductions ................................................................................... xxix How to Use Course Materials ........................................................ xxx Course Icons and Typographical Conventions ......................... xxxii Icons .........................................................................................xxxii Typographical Conventions ............................................... xxxiii Notes to the Instructor................................................................. xxxiv Preparing for ES-330 Classes ...............................................xxxix Sun Cluster Overview ...............................................................................1-1 Objectives ........................................................................................... 1-1 Relevance............................................................................................ 1-2 Additional Resources ....................................................................... 1-3 Sun Cluster 2.2 Release Features .................................................... 1-4 Cluster Hardware Components...................................................... 1-6 Administration Workstation ...................................................1-7 Terminal Concentrator .............................................................1-7 Cluster Host Systems................................................................1-7 Redundant Private Networks .................................................1-8 Cluster Disk Storage .................................................................1-8 High Availability Features .............................................................. 1-9 High Availability Hardware Design ......................................1-9 Sun Cluster High Availability Software ..............................1-10 Software Redundant Array of Inexpensive Disks (RAID) Technology..............................................................1-10


Controller-Based RAID Technology ....................................1-10 Year 2000 Compliance ............................................................1-10 High Availability Strategies .......................................................... 1-11 Redundant Servers..................................................................1-12 Redundant Data ......................................................................1-12 Redundant Public Networks .................................................1-12 Redundant Private Networks ...............................................1-12 Cluster Configurations................................................................... 1-13 Highly Available Data Service Configuration ....................1-13 Parallel Database Configuration...........................................1-14 Sun Cluster Application Support ................................................. 1-15 Highly Available Data Service Support...............................1-16 Parallel Database Support .....................................................1-16 Logical Hosts ................................................................................... 1-17 Logical Host Failure Process .................................................1-18 Cluster Configuration Databases..........................................1-18 Fault Monitoring ............................................................................. 1-19 Data Service Fault Monitoring ..............................................1-20 Cluster Fault Monitoring .......................................................1-20 Failure Recovery Summary ........................................................... 1-22 Exercise: Lab Equipment Familiarization.................................... 1-25 Preparation...............................................................................1-25 Tasks .........................................................................................1-25 Check Your Progress ...................................................................... 1-26 Think Beyond .................................................................................. 1-27 Terminal Concentrator ..............................................................................2-1 Objectives ........................................................................................... 2-1 Relevance............................................................................................ 2-2 Additional Resources ....................................................................... 2-3 Cluster Administration Interface.................................................... 2-4 Major Elements..........................................................................2-6 Terminal Concentrator Overview .................................................. 2-7 Operating System Load............................................................2-9 Setup Port...................................................................................2-9 Terminal Concentrator Setup Programs................................2-9 Terminal Concentrator Setup ........................................................ 
2-10 Connecting to Port 1 ...............................................................2-11 Enabling Setup Mode .............................................................2-11 Setting the Terminal Concentrator IP Address...................2-12 Setting the Terminal Concentrator Load Source ................2-12 Specify the Operating System Image ...................................2-13 Setting the Serial Port Variables............................................2-14 Terminal Concentrator Troubleshooting..................................... 2-15 Manually Connecting to a Node...........................................2-15 Using the telnet Command to Abort a Node...................2-16


Connecting to the Terminal Concentrator CLI ...................2-16 Using the Terminal Concentrator help Command ...........2-16 Identifying and Resetting a Locked Port .............................2-17 Erasing Terminal Concentrator Settings..............................2-17 Exercise: Configuring the Terminal Concentrator ..................... 2-18 Preparation...............................................................................2-18 Tasks .........................................................................................2-18 Exercise Summary...................................................................2-28 Check Your Progress ...................................................................... 2-29 Think Beyond .................................................................................. 2-30 Administration Workstation Installation..............................................3-1 Objectives ........................................................................................... 3-1 Relevance............................................................................................ 3-2 Additional Resources ....................................................................... 3-3 Sun Enterprise Cluster Software Summary .................................. 3-4 Sun Cluster Software Installation ...........................................3-6 Administrative Workstation Software Packages..................3-6 scinstall Command Line Options......................................3-8 Sun Cluster Installation Program Startup ..................................... 3-9 Initial Installation Startup ......................................................3-10 Existing Installation Startup ..................................................3-11 Installation Mode ....................................................................3-12 Administration Workstation Environment................................. 3-13 New Search and Man Page Paths .........................................3-13 Host Name Resolution Changes...........................................3-14 Remote Login Control ............................................................3-14 Remote Display Enabling ......................................................3-15 Controlling rcp and rsh Access ...........................................3-15 Cluster Administration Tools Configuration.............................. 3-16 Cluster Administration Interface..........................................3-17 Administration Tool Configuration Files ............................3-18 Cluster Administration Tools........................................................ 3-19 The Cluster Control Panel .....................................................3-20 Cluster Console .......................................................................3-21 Cluster Administration Tools........................................................ 3-24 Cluster Help Tool....................................................................3-24 Exercise: Installing the Sun Cluster Client Software ................. 
3-25 Preparation...............................................................................3-25 Tasks .........................................................................................3-26 Updating the Name Service ..................................................3-26 Installing OS Patches ..............................................................3-26 Running the scinstall Utility ............................................3-27 Configuring the Administration Workstation Environment .........................................................................3-28


Verifying the Administration Workstation Environment .........................................................................3-28 Configuring the /etc/clusters File ..................................3-29 Configuring the /etc/serialports File............................3-29 Starting the cconsole Tool ...................................................3-30 Configuring the Cluster Host Systems Environment ........3-31 Verifying the Cluster Host Systems Environment.............3-31 Exercise Summary...................................................................3-33 Check Your Progress ...................................................................... 3-34 Think Beyond .................................................................................. 3-35 Preinstallation Configuration ..................................................................4-1 Objectives ........................................................................................... 4-1 Relevance............................................................................................ 4-2 Additional Resources ....................................................................... 4-3 Cluster Topologies ............................................................................ 4-4 Clustered Pairs Topology ........................................................4-5 Ring Topology ...........................................................................4-6 N+1 Topology............................................................................4-7 Shared-Nothing Topology .......................................................4-8 Scalable Topology .....................................................................4-9 Cluster Quorum Devices................................................................ 4-10 Disk Drive Quorum Device ...................................................4-11 Array Controller Quorum Device ........................................4-12 Quorum Device in a Ring Topology ....................................4-13 Quorum Device in a Scalable Topology ..............................4-14 Cluster Interconnect System Overview ....................................... 4-16 Interconnect Types..................................................................4-17 Interconnect Configurations..................................................4-17 Cluster Interconnect System Configuration................................ 4-18 Cluster Interconnect Addressing..........................................4-19 Point-to-Point Connections....................................................4-20 SCI High-Speed Switch Connection.....................................4-21 SCI Card Identification...........................................................4-22 SCI Card Self-Test Information.............................................4-22 SCI Card Scrubber Jumpers...................................................4-23 Ethernet Hub Connection ......................................................4-24 Ethernet Card Identification..................................................4-25 Public Network Management ....................................................... 4-26 PNM Configuration ................................................................4-28 Shared CCD Volume ...................................................................... 
4-29 Shared CCD Volume Creation ..............................................4-31 Disabling a Shared CCD ........................................................4-31 Cluster Configuration Information .............................................. 4-32 Using prtdiag to Verify System Configuration ................4-33


Interpreting prtdiag Output................................................4-35 Identifying Storage Arrays ....................................................4-36 Storage Array Firmware Upgrades .............................................. 4-37 Array Firmware Patches ........................................................4-38 Exercise: Preinstallation Preparation ........................................... 4-39 Preparation...............................................................................4-39 Tasks .........................................................................................4-39 Cluster Topology.....................................................................4-40 Quorum Device Configuration .............................................4-40 Ethernet Cluster Interconnect Configuration .....................4-41 SCI Cluster Interconnect Configuration ..............................4-43 Node Locking Configuration ................................................4-46 Check Your Progress ...................................................................... 4-48 Think Beyond .................................................................................. 4-49 Cluster Host Software Installation .........................................................5-1 Objectives ........................................................................................... 5-1 Relevance............................................................................................ 5-2 Additional Resources ....................................................................... 5-3 Sun Cluster Server Software Overview ......................................... 5-4 Server Package Set Contents ...................................................5-6 Sun Cluster Licensing...............................................................5-8 Sun Cluster Installation Overview ................................................. 5-9 Sun Cluster Volume Managers ..................................................... 5-10 Volume Manager Choices......................................................5-11 Sun Cluster Host System Configuration ..................................... 5-12 Cluster Host System Questions ............................................5-13 SCI Interconnect Configuration ............................................5-14 Ethernet Interconnect Configuration ...................................5-15 Sun Cluster Public Network Configuration................................ 5-16 Sun Cluster Logical Host Configuration ..................................... 5-18 Data Protection Configuration ...................................................... 5-20 Failure Fencing ........................................................................5-21 Node Locking ..........................................................................5-22 Quorum Device .......................................................................5-23 Application Configuration ............................................................ 5-26 Post-Installation Configuration..................................................... 5-28 Installation Verification..........................................................5-29 Correcting Minor Configuration Errors ..............................5-30 Software Directory Paths .......................................................5-31 SCI Interconnect Configuration ............................................5-32 Exercise: Installing the Sun Cluster Server Software................. 
5-34 Preparation...............................................................................5-34 Tasks .........................................................................................5-35 Update the Name Service ......................................................5-35 Installing Solaris Operating System Patches.......................5-35


Storage Array Firmware Revision ........................................5-36 Installation Preparation..........................................................5-36 Server Software Installation ..................................................5-37 SCI Interconnect Configuration ............................................5-38 Cluster Reboot .........................................................................5-40 Configuration Verification.....................................................5-40 Testing Basic Cluster Operation ...........................................5-41 Check Your Progress ...................................................................... 5-42 Think Beyond .................................................................................. 5-43 System Operation.......................................................................................6-1 Objectives ........................................................................................... 6-1 Relevance............................................................................................ 6-2 Additional Resources ....................................................................... 6-3 Cluster Administration Tools.......................................................... 6-4 Basic Cluster Control (scadmin).............................................6-6 Cluster Control Panel ....................................................................... 6-8 Starting the Cluster Control Panel..........................................6-9 Adding New Applications to the Cluster Control Panel.........................................................................................6-9 Console Tool Variations.........................................................6-10 The hastat Command................................................................... 6-11 General Cluster Status............................................................6-12 Logical Host Configuration ...................................................6-13 Private Network Status ..........................................................6-14 Public Network Status............................................................6-15 Data Service Status..................................................................6-16 Cluster Error Messages ..........................................................6-17 Sun Cluster Manager Overview ................................................... 6-18 Sun Cluster Manager Startup................................................6-19 Initial Sun Cluster Manager Display....................................6-20 Sun Cluster Manager Displays ..................................................... 6-21 SCM Cluster Configuration Display ....................................6-22 System Log Filter.....................................................................6-25 The SCM Help Display...........................................................6-26 Cluster SNMP Agent ...................................................................... 6-27 Cluster MIB Tables..................................................................6-28 SNMP Traps.............................................................................6-29 Configuring the Cluster SNMP Agent Port ........................6-30 Exercise: Using System Operations .............................................. 
6-31 Preparation...............................................................................6-31 Tasks .........................................................................................6-32 Starting the Cluster Control Panel........................................6-32 Using the hastat Command ................................................6-32 Using the Sun Cluster Manager............................................6-33 Exercise Summary...................................................................6-35


Check Your Progress ...................................................................... 6-36 Think Beyond .................................................................................. 6-37 Volume Management Using CVM and SSVM ....................................7-1 Objectives ........................................................................................... 7-1 Relevance............................................................................................ 7-2 Additional Resources ....................................................................... 7-3 Disk Space Management.................................................................. 7-4 CVM and SSVM Disk Space Management............................7-5 Private Region Contents ..........................................................7-7 Public Region Usage .................................................................7-7 Private and Public Region Format..........................................7-8 Initialized Disk Types...............................................................7-8 CVM and SSVM Encapsulation ...................................................... 7-9 Preferred Boot Disk Configuration ......................................7-10 Prerequisites for Boot Disk Encapsulation ..........................7-11 Primary and Mirror Configuration Differences .................7-11 The /etc/vfstab File ............................................................7-12 Boot PROM Changes ..............................................................7-12 Un-encapsulating the Boot Disk ...........................................7-13 CVM and SSVM Disk Grouping................................................... 7-14 Cluster Volume Manager Disk Groups ...............................7-15 Sun StorEdge Volume Manager Disk Groups ....................7-16 Volume Manager Status Commands ........................................... 7-17 Checking Disk Status..............................................................7-19 Saving Configuration Information .......................................7-19 Optimizing Recovery Times.......................................................... 7-20 Dirty Region Logging.............................................................7-21 The Veritas VxFS File System................................................7-21 CVM and SSVM Post-Installation ................................................ 7-22 Initializing the rootdg Disk Group......................................7-22 Matching the vxio Driver Major Numbers ........................7-23 StorEdge Volume Manager Dynamic Multi-Pathing ........7-24 Exercise: Configuring Volume Management.............................. 
7-26 Preparation...............................................................................7-26 Tasks .........................................................................................7-27 Installing the CVM or SSVM Software ................................7-28 Disabling Dynamic Multipathing (DMP)............................7-29 Creating a Simple rootdg Slice..............................................7-30 Encapsulating the Boot Disk .................................................7-31 Selecting Demonstration Volume Disks ..............................7-32 Configuring the CVM/SSVM Demonstration Volumes.................................................................................7-35 Verifying the CVM/SSVM Demonstration File Systems ..................................................................................7-36


Verifying the Cluster ..............................................................7-37 Exercise Summary...................................................................7-38 Check Your Progress ...................................................................... 7-39 Think Beyond .................................................................................. 7-40 Volume Management Using SDS ...........................................................8-1 Objectives ........................................................................................... 8-1 Relevance............................................................................................ 8-2 Additional Resources ....................................................................... 8-3 Disk Space Management.................................................................. 8-4 SDS Disk Space Management .................................................8-5 Solstice DiskSuite Initialization....................................................... 8-6 Replica Configuration Guidelines ..........................................8-7 SDS Disk Grouping........................................................................... 8-8 Dual-String Mediators.................................................................... 8-10 Shared Diskset Replica Placement........................................8-11 Metatrans Devices........................................................................... 8-12 Metatrans Device Structure ...................................................8-14 SDS Status ........................................................................................ 8-15 Checking Volume Status........................................................8-16 Checking Mediator Status......................................................8-16 Volume Manager Status................................................................. 8-17 Checking Replica Status.........................................................8-17 Volume Manager Status................................................................. 8-18 Recording SDS Configuration Information.........................8-18 SDS Post-Installation ...................................................................... 8-19 Configuring State Database Replicas ...................................8-19 Configuring the Disk ID (DID) Driver.................................8-20 Configuring Dual-String Mediators .....................................8-21 Exercise: Configuring Volume Management.............................. 
8-22 Preparation...............................................................................8-22 Tasks .........................................................................................8-23 Installing the SDS Software ...................................................8-24 Configuring the SDS Disk ID Driver....................................8-25 Resolving DID Driver Major Number Conflicts.................8-26 Initializing the SDS State Databases.....................................8-28 SDS Volume Overview...........................................................8-29 Selecting SDS Demo Volume Disks Drives.........................8-31 Configuring the SDS Demonstration Volumes ..................8-32 Configuring Dual-String Mediators .....................................8-32 Verifying the SDS Demonstration File Systems .................8-34 Verifying the Cluster ..............................................................8-35 Exercise Summary...................................................................8-36 Check Your Progress ...................................................................... 8-37 Think Beyond .................................................................................. 8-38


Cluster Configuration Database .............................................................9-1 Objectives ........................................................................................... 9-1 Relevance............................................................................................ 9-2 Additional Resources ....................................................................... 9-3 Cluster Configuration Information ................................................ 9-4 The CDB Database ....................................................................9-5 The CCD Database....................................................................9-6 Cluster Database Consistency ......................................................... 9-8 Data Propagation ......................................................................9-8 The CCD Update Protocol .......................................................9-9 Database Consistency Checking ...........................................9-10 Database Majority ...................................................................9-10 Shared CCD Volume ...................................................................... 9-11 Shared CCD Operation ..........................................................9-13 Creating a Shared CCD ..........................................................9-13 Disabling a Shared CCD ........................................................9-15 CCD Administration ...................................................................... 9-16 Verifying CCD Global Consistency.......................................9-16 Checkpointing the CCD .........................................................9-17 Restoring the CCD From a Backup Copy............................9-17 Creating a Purified Copy of the CCD ..................................9-17 Disabling the CCD Quorum..................................................9-18 Recommended CCD Administration Tasks........................9-18 Common Mistakes ..................................................................9-18 Exercise: CCD Administration...................................................... 9-19 Preparation...............................................................................9-19 Tasks .........................................................................................9-19 Maintaining the CCD Database ...........................................9-20 Exercise Summary...................................................................9-21 Check Your Progress ...................................................................... 9-22 Think Beyond .................................................................................. 9-23 Public Network Management................................................................10-1 Objectives ......................................................................................... 10-1 Relevance.......................................................................................... 10-2 Additional Resources ..................................................................... 10-3 Public Network Management ....................................................... 10-4 The Network Monitoring Process ................................................ 10-6 What Happens? .......................................................................10-7 How PNM Works ........................................................................... 
10-8 PNM Support Issues...............................................................10-9 TEST Routine..........................................................................10-11 FAILOVER Routine .................................................................10-12 DETERMINE_NET_FAILURE Routine ...................................10-13 The pnmset Command................................................................. 10-14 Other PNM Commands ............................................................... 10-17


The pnmstat Command.......................................................10-17 The pnmptor Command.......................................................10-19 The pnmrtop Command.......................................................10-19 Exercise: Configuring the NAFO Groups ................................. 10-20 Preparation.............................................................................10-20 Tasks .......................................................................................10-20 Creating a NAFO Group......................................................10-21 Disabling the Interface Groups Feature.............................10-22 Exercise Summary.................................................................10-23 Check Your Progress .................................................................... 10-24 Think Beyond ................................................................................ 10-25 Logical Hosts .............................................................................................11-1 Objectives ......................................................................................... 11-1 Relevance.......................................................................................... 11-2 Additional Resources ..................................................................... 11-3 Logical Hosts ................................................................................... 11-4 Configuring a Logical Host ........................................................... 11-7 Using the scconf -L Command Option ................................11-8 Logical Host Variations................................................................ 11-10 Basic Logical Host .................................................................11-10 Cascading Failover................................................................11-11 Disabling Automatic Takeover ...........................................11-12 Multiple Disk Group and hostnames ...................................11-12 Administrative File System Overview....................................... 11-13 Administrative File System Components..........................11-14 Using the scconf -F Command Option .........................11-15 Logical Host File Systems ............................................................ 11-17 Adding a New Logical Host File System...........................11-18 Sample Logical Host vfstab File .......................................11-18 Logical Host Control .................................................................... 11-19 Forced Logical Host Migration ...........................................11-19 Logical Host Maintenance Mode........................................11-20 Exercise: Preparing Logical Hosts .............................................. 
11-21 Preparation.............................................................................11-21 Tasks .......................................................................................11-21 Preparing the Name Service................................................11-22 Activating the Cluster ..........................................................11-22 Logical Host Restrictions .....................................................11-23 Creating the Logical Hosts ..................................................11-24 Creating the CVM/SSVM Administrative File System..................................................................................11-25 Creating the SDS Administrative File System ..................11-25 Exercise Summary.................................................................11-28 Check Your Progress .................................................................... 11-29 Think Beyond ................................................................................ 11-30


The HA-NFS Data Service......................................................................12-1 Objectives ......................................................................................... 12-1 Relevance.......................................................................................... 12-2 Additional Resources ..................................................................... 12-3 HA-NFS Overview.......................................................................... 12-4 HA-NFS Support Issues .........................................................12-5 Start NFS Methods .......................................................................... 12-7 Stop NFS Methods .......................................................................... 12-9 HA-NFS Fault Monitoring........................................................... 12-10 HA-NFS Fault Monitoring Probes......................................12-10 Fault Probes ................................................................................... 12-12 Local Fault Probes......................................................................... 12-13 Remote Fault Probes..................................................................... 12-14 Giveaway and Takeaway Process .............................................. 12-15 Sanity Checking.....................................................................12-16 Processes Related to NFS Fault Monitoring.............................. 12-17 HA-NFS Support Files.................................................................. 12-18 Adding Mount Information to the vfstab File................12-19 Adding Share Information to the dfstab File ................12-19 Sample vfstab and dfstab Files .......................................12-20 Removing HA-NFS File Systems From a Logical Host ......................................................................................12-20 Using the hareg Command......................................................... 12-21 Registering a Data Service ...................................................12-21 Unregistering a Data Service...............................................12-24 Starting and Stopping a Data Service.................................12-25 File Locking Recovery .................................................................. 12-26 Exercise: Setting Up HA-NFS File Systems............................... 12-27 Preparation.............................................................................12-27 Tasks .......................................................................................12-27 Verifying the Environment..................................................12-28 Preparing the HA-NFS File Systems ..................................12-29 Registering HA-NFS Data Service......................................12-30 Verifying Access by NFS Clients ........................................12-31 Observing HA-NFS Failover Behavior ..............................12-32 Check Your Progress .................................................................... 12-33 Think Beyond ................................................................................ 12-34 System Recovery ......................................................................................13-1 Objectives ......................................................................................... 13-1 Relevance.......................................................................................... 
13-2 Additional Resources ..................................................................... 13-3 Sun Cluster Reconfiguration Control........................................... 13-4 Cluster Membership Monitor................................................13-6 Switch Management Agent ...................................................13-6 Public Network Management ...............................................13-6


Failfast Driver (/dev/ff) .......................................................13-6 Data Service Fault Monitors ..................................................13-7 Disk Management Software ..................................................13-7 Database Management Software ..........................................13-7 Sun Cluster Failfast Driver ............................................................ 13-8 Failfast Messages...................................................................13-10 Sun Cluster Reconfiguration Sequence...................................... 13-11 Reconfiguration Triggering Events ....................................13-13 Independent Reconfiguration Processes ...........................13-13 Sun Cluster Reconfiguration Steps............................................. 13-14 Reconfiguration Process Priorities......................................13-16 Reconfiguration Step Summary..........................................13-17 Cluster Interconnect Failures ...................................................... 13-18 CIS Failure Description ........................................................13-18 CIS Failure Symptoms..........................................................13-19 Correcting Ethernet CIS Failures ........................................13-20 Correcting SCI Interconnect Failures .................................13-20 Two-Node Partitioned Cluster Failure ...................................... 13-21 CVM or SSVM Partitioned Cluster.....................................13-21 SDS Partitioned Cluster .......................................................13-22 Logical Host Reconfiguration ..................................................... 13-23 Sanity Checking.....................................................................13-24 Exercise: Failure Recovery ........................................................... 13-25 Preparation.............................................................................13-25 Tasks .......................................................................................13-25 Losing a Private Network Cable.........................................13-26 Partitioned Cluster (Split Brain) .........................................13-26 Public Network Failure (NAFO group).............................13-27 Logical Host Fault Monitor Giveaway ..............................13-27 Cluster Failfast.......................................................................13-28 Exercise Summary.................................................................13-29 Check Your Progress .................................................................... 13-30 Think Beyond ................................................................................ 13-31 Sun Cluster High Availability Data Service API ..............................14-1 Objectives ......................................................................................... 14-1 Relevance.......................................................................................... 14-2 Additional Resources ..................................................................... 14-3 Overview .......................................................................................... 14-4 Data Service Requirements............................................................ 
14-6 Client-Server Data Service .....................................................14-6 Data Service Dependencies ...................................................14-6 No Dependence on Physical Hostname of Server..............14-7 Handles Multi-homed Hosts.................................................14-7 Handles Additional IP Addresses for Logical Hosts.........14-7


Data Service Methods..................................................................... 14-9 START Methods ......................................................................14-9 STOP Methods.......................................................................14-10 ABORT Methods ...................................................................14-10 NET Methods.........................................................................14-11 Fault Monitoring Methods ..................................................14-12 Giveaway and Takeaway............................................................. 14-14 Giveaway Scenario................................................................14-15 Takeaway Scenario ...............................................................14-16 Method Considerations................................................................ 14-17 START and STOP Method Examples......................................... 14-19 Example 1 ...............................................................................14-19 Example 2 ...............................................................................14-20 Data Service Dependencies ......................................................... 14-21 The haget Command Options............................................14-24 The hactl Command ................................................................... 14-26 The hactl Command Options............................................14-27 The halockrun Command .......................................................... 14-28 The hatimerun Command .......................................................... 14-29 The pmfadm Command................................................................ 14-30 What Is Different From HA 1.3? ................................................. 14-31 The hads C Library Routines ..................................................... 14-32 Exercise: Using the Sun Cluster Data Service API ................... 14-33 Preparation.............................................................................14-33 Tasks .......................................................................................14-33 Using the haget Command.................................................14-34 Check Your Progress .................................................................... 14-36 Think Beyond ................................................................................ 14-37 Highly Available DBMS ........................................................................15-1 Objectives ......................................................................................... 15-1 Relevance.......................................................................................... 15-2 Additional Resources ..................................................................... 15-3 Sun Cluster HA-DBMS Overview ................................................ 15-4 Database Binary Placement ...................................................15-5 Supported Database Versions ...............................................15-5 HA-DBMS Components.........................................................15-6 Multiple Data Services ...........................................................15-6 Typical HA-DBMS Configuration ................................................ 15-7 Configuring and Starting HA-DBMS........................................... 15-8 Stopping and Unconfiguring HA-DBMS .................................... 
15-9 Removing a Logical Host.....................................................15-10 Removing a DBMS From a Logical Host...........................15-10 The HA-DBMS Start Methods..................................................... 15-11 The HA-DBMS Stop and Abort Methods.................................. 15-13 The HA-DBMS Stop Methods .............................................15-13


The HA-DBMS Abort Methods...........................................15-14 HA-DBMS Fault Monitoring ....................................................... 15-15 Local Fault Probe Operation ...............................................15-16 Remote Fault Probe Operation............................................15-16 HA-DBMS Action Files ........................................................15-17 HA-DBMS Failover Procedures..........................................15-19 Configuring HA-DBMS for High Availability ......................... 15-20 Multiple Data Services .........................................................15-20 Raw Partitions Versus File Systems ...................................15-21 Configuration Overview.............................................................. 15-22 General HA-DBMS Configuration Issues..........................15-22 User and Group Entries .......................................................15-23 Database Software Location ................................................15-23 Oracle Installation Preparation ................................................... 15-24 Sybase Installation Preparation .................................................. 15-26 Informix Installation Preparation ............................................... 15-28 Preparing the Logical Host.......................................................... 15-30 Preparing the Database Configuration Files .....................15-31 Enable Fault Monitoring Access .........................................15-31 Registering the HA-DBMS Data Service ...........................15-32 Adding Entries to the CCD..................................................15-32 Bring the HA-DBMS for Oracle Servers Into Service ......15-32 HA-DBMS Control........................................................................ 15-33 Setting HA-DBMS Monitoring Parameters.......................15-33 Starting and Stopping HA-DBMS Monitoring .................15-37 HA-DBMS Client Overview ........................................................ 15-38 Maintaining the List of Monitored Databases ..................15-39 HA-DBMS Recovery..................................................................... 15-40 Client Recovery .....................................................................15-40 HA-DBMS Recovery Time...................................................15-41 HA-DBMS Configuration Files ................................................... 15-42 HA-Oracle Configuration Files ...........................................15-43 HA-Sybase Configuration Files ..........................................15-44 HA-Informix Configuration Files .......................................15-45 Exercise: HA-DBMS Installation................................................. 15-46 Preparation.............................................................................15-46 Tasks .......................................................................................15-46 Exercise Summary.................................................................15-47 Check Your Progress .................................................................... 15-48 Think Beyond ................................................................................ 15-49 Cluster Configuration Forms ..................................................................A-1 Cluster Name and Address Information...................................... A-2 Multi-Initiator SCSI Configuration ...................................................... 
B-1 Preparing for Multi-Initiator SCSI................................................. B-2 Background............................................................................... B-2

Changing All Adapters ........................................................... B-3
    Changing the Initiator ID ................................................ B-3
    Drive Firmware ................................................................. B-3
The nvramrc Script ................................................................. B-4
Changing an Individual Initiator ID for Multipacks .......... B-5
Sun Storage Array Overviews ....................................................... C-1
    Disk Storage Concepts ........................................................... C-2
        Multi-host Access .......................................................... C-2
        Host-Based RAID (Software RAID Technology) ...... C-5
        Controller-Based RAID (Hardware RAID Technology) ... C-6
        Redundant Dual Active Controller Driver ................ C-7
        Dynamic Multi-Path Driver ......................................... C-8
        Hot Swapping ................................................................ C-9
    SPARCstorage Array 100 ..................................................... C-10
        SPARCstorage Array 100 Features ............................ C-10
        SPARCstorage Array 100 Addressing ...................... C-11
    RSM Storage Array ............................................................... C-12
        RSM Storage Array Features ...................................... C-12
        RSM Storage Array Addressing ................................ C-13
    SPARCstorage Array 214/219 ............................................. C-14
        SPARCstorage Array 214/219 Features .................... C-14
        SPARCstorage Array 214 Addressing ...................... C-15
    Sun StorEdge A3000 (RSM Array 2000) ............................ C-16
        StorEdge A3000 Features ............................................ C-16
        StorEdge A3000 Addressing ...................................... C-17
    StorEdge A1000/D1000 ........................................................ C-19
        Shared Features ............................................................ C-19
        StorEdge A1000 Differences ....................................... C-20
        StorEdge A1000 Addressing ...................................... C-20
        StorEdge D1000 Differences ....................................... C-21
        StorEdge D1000 Addressing ...................................... C-21
    Sun StorEdge A3500 ............................................................. C-22
        StorEdge A3500 Features ............................................ C-22
        StorEdge A3500 Addressing ...................................... C-24
    Sun StorEdge A5000 ............................................................. C-25
        A5000 Features ............................................................. C-25
        StorEdge A5000 Addressing ...................................... C-27
    Sun StorEdge A7000 ............................................................. C-29
        Sun StorEdge A7000 Enclosure .................................. C-29
        StorEdge A7000 Functional Elements ...................... C-31
        StorEdge A7000 Addressing ...................................... C-33
        Combining SSVM and A7000 Devices ...................... C-34
    SPARCstorage MultiPack .................................................... C-35
        SPARCstorage MultiPack Features ........................... C-36


        SPARCstorage MultiPack Addressing ..................... C-36
    Storage Configuration .......................................................... C-37
        Identifying Storage Devices ....................................... C-37
        Identifying Controller Configurations ..................... C-40
Oracle Parallel Server ..................................................................... D-1
    Oracle Overview ..................................................................... D-2
        Oracle 7.x and Oracle 8.x Similarities ......................... D-3
        Oracle 7.x and Oracle 8.x Differences ......................... D-5
    Oracle Configuration Files .................................................... D-7
        The /etc/system File ..................................................... D-8
        The /etc/opt/SUNWcluster/conf/clustname.ora_cdb File ... D-9
        The init_ora File ........................................................... D-10
    Oracle Database Volume Access ........................................ D-11
        Oracle Volume Types .................................................. D-13
        CVM Volume Pathnames ........................................... D-13
        Changing Permission or Ownership of Volumes ... D-13
    DLM Reconfiguration .......................................................... D-14
        DLM Reconfiguration Steps ....................................... D-15
    Volume Manager Reconfiguration with CVM ................. D-16
        Initial Volume Configuration ..................................... D-18
        Volume Reconfiguration With CVM ........................ D-18
    Oracle Parallel Server Specific Software ........................... D-19
        The SUNWudlm Package Summary ......................... D-19
Glossary ............................................................................... Glossary-1
Acronyms Glossary .......................................................... Acronyms-1


About This Course


Course Goal
This course provides students with the essential information and skills to install and administer a Sun Enterprise Cluster system running Sun Cluster 2.2 software.
 

Use this module to get the students excited about this course.

With regard to the overheads: to avoid confusion among the students, it is very important to tell them that the page numbers on the overheads have no relation to the page numbers in their course materials. They should use the title of each overhead as a reference.

The strategy provided by the About This Course is to introduce students to the course before they introduce themselves to you and one another. Familiarizing them with the content of the course first gives their introductions more meaning in relation to the course prerequisites and objectives.

Use this introduction to the course to determine how well students are equipped with the prerequisite knowledge and skills. The pacing chart in the Guidelines for Module Pacing section on page xxviii enables you to determine what adjustments you need to make in order to accommodate the learning needs of students.


Course Overview
This course provides students with the essential information and skills to install and administer Sun Enterprise Cluster hardware running Sun Cluster 2.2 software. The most important tasks for the system administrator are Sun Cluster software installation and configuration, hardware configuration, system operations, and system recovery. During this course, these topics are presented in the order in which a typical cluster installation takes place.


Course Map
The following course map enables you to see what you have accomplished and where you are going in reference to the course goal.


Module-by-Module Overview
This course contains the following modules:

•  Module 1: Sun Cluster Overview. This lecture-only module introduces all of the basic concepts associated with Sun Enterprise Cluster systems. Lab exercise: There is no lab for this module.

•  Module 2: Terminal Concentrator. This module introduces the critical administrative functions supported by the Terminal Concentrator interface, and explains basic Terminal Concentrator theory and the installation and configuration process. Lab exercise: Configure the Terminal Concentrator cabling and operating parameters for proper operation in the Sun Enterprise Cluster environment.


•  Module 3: Administration Workstation. The lecture portion of this module presents an overview of Sun Cluster (SC) administration workstation software files and the general process used to install all cluster software. Lab exercise: Install the SC software on the administration workstation, configure the cluster administrative files, and start one of the cluster administration tools, cconsole.

•  Module 4: Preinstallation Configuration. This module provides the information necessary to prepare a Sun Enterprise Cluster (SEC) system for the Sun Cluster software installation. It focuses on issues relevant to selecting and configuring an appropriate cluster topology. Lab exercise: Select and configure a target cluster topology. You will verify that the configuration is ready to start the cluster host software installation.

•  Module 5: Cluster Host Software Installation. The lecture presents an overview of the Sun Cluster host software files and distribution. Lab exercise: Install the Sun Cluster software on the cluster host systems.

•  Module 6: System Operation. This module discusses the Sun Cluster Manager graphical administration tool, along with the cluster administration command-line features. Lab exercise: Start and stop the cluster software using the scadmin command and verify cluster status with the SCM application and the hastat command.


•  Module 7: Volume Management with CVM and SSVM. This module reviews the basic space management techniques used by the Cluster Volume Manager and the Sun StorEdge Volume Manager. The installation and initialization processes for CVM and SSVM are presented along with post-installation issues. Lab exercise: Install and initialize either CVM or SSVM. You will use script files to create demonstration volumes.

•  Module 8: Volume Management with Solstice DiskSuite. This module reviews the basic space management techniques used by the Solstice DiskSuite (SDS) volume manager. The installation and initialization processes for SDS are presented along with post-installation issues. Lab exercise: Install and initialize SDS. You will use script files to create demonstration volumes.

•  Module 9: Cluster Configuration Database. This module discusses the purpose, structure, and administration of the cluster database (CDB) and the cluster configuration database (CCD). Lab exercise: Perform basic administration operations on the CDB and CCD files. This includes verifying consistency between cluster hosts, making backup copies, and checking for errors.

•  Module 10: Public Network Management. This module describes the operation, configuration, and management of the Sun Cluster Public Network Management (PNM) mechanism. The creation of network adapter failover groups (NAFO) and their relationship to logical hosts is also discussed. Lab exercise: Configure a NAFO group on each cluster host and disable the Solaris Operating System Interface Groups feature.


•  Module 11: Logical Hosts. This module discusses the purpose of Sun Cluster logical hosts and their relationship to data services. The structure and creation of logical hosts is presented along with some common variations. Lab exercise: Configure and test two logical hosts.

•  Module 12: The HA-NFS Data Service. This module describes and demonstrates the configuration and management of Sun Cluster HA-NFS file systems. Lab exercise: Create, register, and test a demonstration HA-NFS data service. Switch the data service between cluster hosts.

•  Module 13: System Recovery. This module summarizes the basic recovery process for a number of cluster failure scenarios. It includes background information and details about operator intervention. Lab exercise: Create and recover from cluster interconnect failures, a partitioned cluster, public network interface failures, a logical host failure, and a cluster node failfast failure.

•  Module 14: The Sun Cluster High Availability Data Service API. This module provides an overview of how to integrate applications into the Sun Cluster High Availability framework. It also describes key failover actions performed by the Sun Cluster High Availability software. Lab exercise: There is no lab for this module.

•  Module 15: Highly Available DBMS. This module describes the configuration and operation of a highly available database in the Sun Cluster environment. Lab exercise: There is no lab for this module.


Course Objectives
Upon completion of this course, you should be able to:

•  Describe major Sun Cluster components and functions
•  Verify system cabling
•  Configure the Terminal Concentrator for proper operation
•  Install, remove, and update Sun Cluster software
•  Troubleshoot software installation and configuration errors
•  Configure environmental variables for correct Sun Cluster operation
•  Use the Sun Cluster administration tools
•  Initialize one of the supported volume managers
•  Describe the differences between the supported volume managers
•  Prepare the Public Network Management failover environment
•  Create and configure logical hosts
•  Install and configure highly available data services
•  Describe the Sun Cluster failure recovery mechanisms
•  Identify and recover from selected Sun Cluster failures


Ask the students how many signed up for this course because of the information in the Sun Educational Services course catalog, and what their knowledge and expectations of the objectives stated there are. Use this information as a tool to manage your time in covering the material in this course.


Skills Gained by Module


The skills for Sun Enterprise Cluster Administration are shown in column 1 of the following matrix; the remaining columns correspond to Modules 2 through 14. The black boxes indicate the main coverage for a topic; the gray boxes indicate the topic is briefly discussed.

Skills Gained
•  Describe the major Sun Enterprise Cluster components and functions
•  Verify disk storage cabling
•  Configure the Terminal Concentrator
•  Configure the cluster interconnect system
•  Install the Sun Cluster 2.2 software
•  Troubleshoot software installation and configuration errors
•  Configure environmental variables for correct SEC operation
•  Use SEC administration tools
•  Initialize either the Enterprise Volume Manager, Cluster Volume Manager, or Solstice DiskSuite
•  Describe the SEC recovery mechanisms
•  Configure the Sun Cluster 2.2 Highly Available NFS data service
•  Create public network adapter backup groups with the public network management utility (PNM)
•  Identify and recover from selected SEC failures


Refer students to this matrix as you progress through the course to show them the progress they are making in learning the skills advertised for this course.


Guidelines for Module Pacing


The following table provides a rough estimate of pacing for this course:

Module                                            Day 1   Day 2   Day 3   Day 4   Day 5
About This Course                                 AM
Product Introduction                              AM
Terminal Concentrator                             AM/PM
Administration Workstation Installation          PM
Preinstallation Configuration                             AM
Cluster Host Software Installation                        AM/PM
System Operation                                          PM
Volume Management                                                 AM
Cluster Configuration Data                                        PM
Public Network Management                                         PM
Logical Hosts                                                             AM
HA-NFS Data Service                                                       AM/PM
System Recovery                                                           PM
Sun Cluster High Availability Data Service API                                    AM
HA-DBMS                                                                           AM/PM


Topics Not Covered


This course does not cover the topics shown on the above overhead. Many of the topics listed on the overhead are covered in other courses offered by Sun Educational Services:

•  Database management - Covered in database vendor courses
•  Network administration - Covered in SA-380: Solaris 2.x Network Administration
•  Solaris administration - Covered in SA-235: Solaris 2.X System Administration I and SA-286: Solaris System Administration II
•  Database performance and tuning - Covered in database vendor courses
•  Disk storage management - Covered in SO-352: Disk Management With Solstice DiskSuite, SA-345: Volume Manager with SPARCstorage Array, and SA-347: Volume Manager with StorEdge A5000

Refer to the Sun Educational Services catalog for specific information and registration.


How Prepared Are You?


To be sure you are prepared to take this course, can you answer yes to the questions shown on the above overhead?

•  Virtual volume management administration is a central portion of the Sun Enterprise Cluster functionality.
•  Solaris Operating Environment system administration is an integral part of Sun Enterprise Cluster administration. You cannot separate the two.
•  Resolving all Sun Enterprise Cluster issues requires more hardware knowledge than in most other system applications.
•  Sun Enterprise Cluster systems are frequently composed of enterprise-class components. You must be used to dealing with this type of high-end hardware.

If any students indicate they cannot meet these requirements, meet with them at the first break to decide how to proceed with the class. Do they want to take the class at a later date? Is there some way to get the extra help needed during the week? It might be appropriate here to recommend resources from the Sun Educational Services catalog that provide training for topics not covered in this course.


Introductions
Now that you have been introduced to the course, introduce yourself to each other and the instructor, addressing the items shown on the above overhead.


How to Use Course Materials


To enable you to succeed in this course, these course materials employ a learning model that is composed of the following components:

•  Course map - Each module starts with an overview of the content so you can see how the module fits into your overall course goal.
•  Relevance - The Relevance section for each module provides scenarios or questions that introduce you to the information contained in the module and provoke you to think about how the module content relates to cluster administration.
•  Overhead image - Reduced overhead images for the course are included in the course materials to help you easily follow where the instructor is at any point in time. Overheads do not appear on every page.
•  Lecture - The instructor will present information specific to the topic of the module. This information will help you learn the knowledge and skills necessary to succeed with the exercises.




•  Exercise - Lab exercises will give you the opportunity to practice your skills and apply the concepts presented in the lecture.
•  Check your progress - Module objectives are restated, sometimes in question format, so that before moving on to the next module you are sure that you can accomplish the objectives of the current module.
•  Think beyond - Thought-provoking questions are posed to help you apply the content of the module or predict the content in the next module.


Course Icons and Typographical Conventions


The following icons and typographical conventions are used in this course to represent various training elements and alternative learning resources.

Icons
Additional resources - Indicates additional reference materials are available.

Discussion - Indicates a small-group or class discussion on the current topic is recommended at this time.

Exercise objective - Indicates the objective for the lab exercises that follow. The exercises are appropriate for the material being discussed.

Note - Additional important, reinforcing, interesting, or special information.

Caution - A potential hazard to data or machinery.

Warning - Anything that poses personal danger or irreversible damage to data or the operating system.



Typographical Conventions
Courier is used for the names of commands, files, and directories, as well as on-screen computer output. For example:

    Use ls -al to list all files.
    system% You have mail.

Courier bold is used for characters and numbers that you type. For example:

    system% su
    Password:

Courier italic is used for variables and command-line placeholders that are replaced with a real name or value. For example:
    To delete a file, type rm filename.

Palatino italics is used for book titles, new words or terms, or words that are emphasized. For example:

    Read Chapter 6 in User's Guide.
    These are called class options.
    You must be root to do this.


Notes to the Instructor


Philosophy
The Sun Enterprise Cluster Administration course has been created to allow for interactions between the instructor and the student as well as between the students themselves. In an effort to enable you to accomplish the course objectives easily, and in the time frame given, a series of tools has been developed and support materials created for your discretionary use.

A consistent structure has been used throughout this course. This structure is outlined in the Course Goal section. The suggested flow for each module is:

1. Module objectives
2. Context questions/module rationale
3. Lecture information with appropriate overheads
4. Lab exercises
5. Discussion: either as a whole class or in small groups

To allow the instructor flexibility and give time for meaningful discussions during the relevance periods, the lectures, and the small-group discussions, a timing table is included in the Course Tools section.

Course Tools
To enable you to follow this structure, the following supplementary materials are provided with this course:

•  Relevance - These questions or scenarios set the context of the module. It is suggested that the instructor ask these questions and discuss the answers. The answers are provided only in the instructor's guide.


•  Course map - The course map allows the students to get a visual picture of the course. It also helps students know where they have been, where they are, and where they are going. The course map is presented in the About This Course section of the student's guide.

•  Lecture overheads - Overheads for the course are provided in two formats. The paper-based format can be copied onto standard transparencies and used on a standard overhead projector; these overheads are also provided in the student's guide. The Web browser-based format is in HTML and can be projected using a projection system which displays from a workstation. This format gives the instructor the ability to allow the students to view the overhead information on individual workstations. It also allows better random access to the overheads.

•  Small-group discussion - After the lab exercises, it is a good idea to debrief the students. Gather them back into the classroom and have them discuss their discoveries, problems, and issues in working through the lab solutions, in small groups of four or five, one-on-one, or one-on-many.

•  General timing recommendations - Each module contains a Relevance section. This section may present a scenario relating to the content presented in the module, or it may present questions that stimulate students to think about the content that will be presented. Engage the students in relating experiences or posing possible answers to the questions. Spend no more than 10-15 minutes on this section.

Module                                            Lecture    Lab        Total Time
                                                  (Minutes)  (Minutes)  (Minutes)
About This Course                                 30         0          30
Sun Cluster Overview                              135        0          135
Terminal Concentrator                             60         45         105
Administration Workstation Installation          60         60         120
Preinstallation Configuration                     90         60         150
Cluster Host Software Installation                60         45         105
Volume Management                                 60         60         120
System Operation                                  60         60         120
Cluster Configuration Database                    60         45         105
Public Network Management                         60         75         135
Logical Hosts                                     60         60         120
The HA-NFS Data Service                           60         75         135
System Recovery                                   60         75         135
Sun Cluster High Availability Data Service API    90         0          90
HA-DBMS                                           75         0          75

•  Module self-check - Each module contains a checklist for students under Check Your Progress. Give them a little time to read through this checklist before going on to the next lecture. Ask them to see you for items they do not feel comfortable checking off.


Instructor Setup Notes


Purpose of This Guide
This guide provides all of the details that an instructor needs to know both to initially set up the classroom environment and to prepare the environment for each class offering.

Minimum Resource Requirements


Network
Each two-node cluster and administration workstation combination needs a minimum of five network connections. This would normally be a private network in training, typically just network hub boxes. An additional two connections are needed for each additional node.

Hardware
This is far too complex to answer here. Check the cluster engineering Web sites for cluster configuration guides. Basically, this course can be run on any two-, three-, or four-node cluster with either an Ethernet or SCI interconnect system. Each cluster should have two dual-hosted storage arrays with a minimum of five disks each.

Software
If your students are going to use the Sun StorEdge Volume Manager or the Cluster Volume Manager, the cluster hosts can run only the Solaris 2.6 Operating Environment. If the students are going to use Solstice DiskSuite, the cluster hosts can run the Solaris 7 Operating Environment. You will need software for Sun Cluster 2.2, Sun StorEdge Volume Manager 2.6, Cluster Volume Manager 2.2.1, and Solstice DiskSuite 4.2. You MUST have the patches 107388-01, 107538-01, and 106627-03 available.
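Before class, it is worth confirming that the required patches are already installed on each system. The following is a minimal sketch using standard Solaris commands; the patch IDs are the ones listed above, and any later revision of each patch is also acceptable:

    showrev -p | egrep "107388|107538|106627"

If a patch is not listed in the output, install it with the patchadd command from wherever the patches are staged, for example:

    patchadd /export/patches/107388-01

The /export/patches path is only a placeholder for your own patch staging area.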



Storage Array Firmware
It is critical that the Solaris storage array drivers match the revision level of the storage array firmware. This is most important with the Sun StorEdge A5000 units. You should upgrade all storage arrays to the current firmware level and make sure that the operating system drivers are at the corresponding patch level when the class starts. If there is a mismatch between the operating system drivers and the storage array firmware revision, there can be many unpredictable problems when performing the lab exercises. This is especially true when installing and configuring the Solstice DiskSuite volume manager DID drivers.

Assumptions About the Lab


The following assumptions are made about the lab used for this course:

•  Each cluster/administration workstation combination will be used by no more than 2-3 students. Three students per cluster is really too many, but business demands sometimes dictate this.
•  There will not be a naming service in operation. The students are instructed to configure the /etc/hosts files on all systems for network name resolution, as shown in the example following this list.
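The following /etc/hosts entries are a minimal sketch of this kind of configuration; the host names and addresses are placeholders for whatever your training lab actually uses:

    # Administration workstation
    192.9.200.10    admin-ws
    # Cluster host systems (nodes)
    192.9.200.11    node0
    192.9.200.12    node1
    # Terminal Concentrator
    192.9.200.15    cluster-tc

Equivalent entries must exist in /etc/hosts on the administration workstation and on each cluster host so that all systems can resolve one another by name without a naming service.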


Setting Up for an ES-330 Class


Preparing for ES-330 Classes
The following procedures must be completed before each ES-330 class:

Student Workstations
The administration workstations must have a full Solaris 2.6 (Entire Distribution) load before class starts. Use the Sun Cluster scinstall script to remove all Sun Cluster software from the administration workstation.

Note - If you can use the JumpStart product and the Solaris 2.6 software, you will not need to remove the Sun Cluster software.

Application Server
The lab files should reside on a lab application server and must be exported for NFS mounting so that portions of the lab files can be copied onto the administration workstations and the cluster host systems. In some training centers, they use the JumpStart product on all the systems with the Solaris Operating Environment, and have the appropriate lab software already mounted with the NFS system. The lab instructions are general to account for this kind of variation.
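As a minimal sketch, assuming the lab files are staged under /export/ES330_LF on the application server, the directory can be shared read-only by adding the following line to /etc/dfs/dfstab on the server and then running shareall:

    share -F nfs -o ro /export/ES330_LF

The administration workstations and cluster hosts can then mount the share and copy the portions they need, for example:

    mount -F nfs labserver:/export/ES330_LF /mnt

The server name and path shown are placeholders; use whatever layout your training center has adopted.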

Cluster Host Systems


The cluster host systems should have all of the Sun Cluster software removed using the scinstall script. They must have the Solaris 2.6 Entire Distribution loaded. This can also be accomplished using a Solaris 2.6 JumpStart configuration, which eliminates the need to remove the Sun Cluster software.

Projection System and Workstation
If you have a projection system for projecting HTML slides and are planning to use the HTML slides, you need to do the following:

•  Install the HTML overheads on the workstation connected to the projection system so you can display them with a browser during lecture. To install the HTML overheads on the machine connected to your overhead projection system, copy the HTML and images subdirectories provided in the ES-330_OH directory to any directory on the overhead workstation machine. These files use approximately 20 Mbytes of disk space. Display the overheads in the browser by choosing Open File and typing the following in the Selection field of the pop-up window:
   /location_of_HTML_directory/OH.Title.doc.html

•  Set up an overhead-projection system that can project instructor workstation screens.

Note This document does not describe the steps necessary to set up an overhead projection system because it is unknown what will be available in each training center. This setup is the responsibility of each training center.


Course Files
All of the course files for this course are available from the education.central server.

Note - You can use ftp or the education.central Web site, http://education.central/Released.html, to download the files from education.central. Either of these methods requires you to know the user ID and password for ftp access. See your manager for these if you have not done this before.

Course Components
The ES330 course consists of the following components:

•  Instructor guide - The ES330_IG directory contains the FrameMaker files for the instructor's guide (student's guide with instructor notes). The ART directory is required for printing this guide.

•  Student guide - The ES330_SG directory contains the FrameMaker files for the student's guide. The ART directory is required for printing this guide.

•  Art - The ES330_ART directory contains the supporting images and artwork for the student's and instructor's guides. This directory is required in order to print the student's and instructor's guides and should be located in the same directory as ES330_IG and ES330_SG.

•  Instructor notes - The ES330_IN directory contains the document Instructor Setup Notes. This document is also appended as conditional text at the end of the Preface FrameMaker document for this course.


•  Overheads - The ES330_OH directory contains the instructor overheads. There are both HTML and FrameMaker versions of the overheads.

•  Lab files - The ES330_LF directory contains the following subdirectories:
   -  CVM221 - Cluster Volume Manager 2.2.1 software, about 35 Mbytes
   -  PCI_FC100 - Software drivers to support PCI-based FC100 interface boards, about 3 Mbytes
   -  Patches - Patches for Sun Cluster, SDS, SSVM, Solaris 2.6, and the A5000 array, about 60 Mbytes
   -  SC22 - Sun Cluster 2.2 software, about 65 Mbytes
   -  SDS42 - Solstice DiskSuite 4.2 software, about 50 Mbytes
   -  SSVM26 - Sun StorEdge Volume Manager software, about 105 Mbytes
   -  Scripts - Script files that are used in the volume manager labs


Sun Cluster Overview


Objectives
Upon completion of this module, you should be able to:

•  List the hardware elements that comprise a basic Sun Enterprise Cluster system
•  List the hardware and software components that contribute to the availability of a Sun Enterprise Cluster system
•  List the types of redundancy that contribute to the availability of a Sun Enterprise Cluster system
•  Identify the configuration differences between a high availability cluster and a parallel database cluster
•  Explain the purpose of logical host definitions in the Sun Enterprise Cluster environment
•  Describe the purpose of the cluster configuration databases
•  Explain the purpose of each of the Sun Enterprise Cluster fault monitoring mechanisms


The main goal of this module is to introduce the basic concepts associated with the Sun Enterprise Cluster environment.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion - The following questions are relevant to understanding the content of this module:

1. What is a highly available data service?
2. What must be done to make a data service highly available?
3. What type of system support would a highly available data service require?
4. How would you manage the group of resources required by a highly available data service?

Additional Resources
Additional resources The following references can provide additional details on the topics discussed in this module:

•  Sun Cluster 2.2 System Administration Guide, part number 805-4238
•  Sun Cluster 2.2 Software Installation Guide, part number 805-4239
•  Sun Cluster 2.2 Cluster Volume Manager Guide, part number 805-4240
•  Sun Cluster 2.2 API Developer's Guide, part number 805-4241
•  Sun Cluster 2.2 Error Messages Manual, part number 805-4242
•  Sun Cluster 2.2 Release Notes, part number 805-4243



Sun Cluster 2.2 New Features


The Sun Cluster 2.2 software release has the following new features:

•  Sun Cluster 2.2 is now fully internationalized and Year 2000 (Y2K) compliant.
•  Support for Solstice DiskSuite has been added - This provides an upgrade path for existing HA 1.3 customers. There are several features and restrictions associated with Solstice DiskSuite installations. They include:

   -  Solaris 7 Operating Environment compatibility
   -  A new Disk ID (DID) software package
   -  A new DID configuration command, scdidadm
   -  A special scadmin command option, reserve
   -  Shared CCD volume is not supported
   -  Quorum disk drives are not supported (or needed)



•  Solaris 7 Operating Environment is now supported - Currently, the Solaris 7 Operating Environment can be used only in conjunction with Solstice DiskSuite. The Cluster Volume Manager and the Sun StorEdge Volume Manager products are not yet compatible with the Solaris 7 Operating Environment.

•  The installation procedures have been changed - You can now fully configure the cluster during the host software installation process. This includes configuring public network backup groups and logical hosts.

•  Licensing is much simpler - Sun Cluster 2.2 requires no framework or HA data service licenses to run. However, you need licenses for Sun Enterprise Volume Manager (SEVM) if you use SEVM with any storage devices other than SPARCstorage Arrays or StorEdge A5000s. SPARCstorage Arrays and StorEdge A5000s include bundled licenses for use with SEVM. Contact the Sun License Center for any necessary SEVM licenses; see http://www.sun.com/licensing/ for more information. You might need to obtain licenses for DBMS products and other third-party products. Contact your third-party service provider for third-party product licenses.

•  There is a new cluster management tool, Sun Cluster Manager - A new Java technology-based cluster management tool replaces the previous Cluster Monitor tool. The new tool can be run from the cluster host systems as a standalone application, or can be accessed from a remote browser after the appropriate HTTP server software has been installed on the cluster host systems.


Cluster Hardware Components


The basic hardware components that are necessary for most cluster configurations include:

•  One administration workstation
•  One Terminal Concentrator
•  Two hosts (up to four)
•  One or more public network interfaces per system (not shown)
•  A redundant private network interface
•  At least one source of shared, mirrored disk storage

Note - E10000-based clusters do not use the Terminal Concentrator for host system access.

Administration Workstation
The administration workstation can be any Sun SPARC workstation, providing it has adequate resources to support graphics and compute-intensive applications, such as SunVTS and Solstice SyMON software. You can run several different cluster administration tools on the administration workstation.

Note - Typically, the cluster applications are available to users through networks other than the one used by the administration workstation.

Terminal Concentrator
The Terminal Concentrator (TC) is specially modified for use in the Sun Enterprise Cluster environment. No substitutions are supported. The TC provides direct translation from the network packet switching environment to multiple serial port interfaces. Each of the serial port outputs connects to a separate node in the cluster through serial port A. Because the nodes do not have frame buffers, this is the only access path when Solaris is not running.

Cluster Host Systems


A wide range of Sun hardware platforms is supported for use in the clustered environment. Mixed platform clusters are supported, but the systems should be of equivalent processor, memory, and I/O capability. You cannot mix SBus-based systems with PCI-based systems. Equivalent SBus and PCI interface cards are not compatible.

Redundant Private Networks
All nodes in a cluster are linked by a private communication network. This private interface is called the cluster interconnect system (CIS) and is used for a variety of reasons including:

•  Cluster status monitoring
•  Cluster recovery synchronization
•  Parallel database lock and query information

Cluster Disk Storage


Although a wide range of Sun storage products is available for use in the Sun Enterprise Cluster environment, they must all accept at least dual-host connections. Some models support up to four host system connections.


High Availability Features


The Sun Enterprise Cluster system is a general purpose cluster architecture focused on providing reliability, availability, and scalability. Part of the reliability and availability is inherent in the hardware and software used in the Sun Enterprise Cluster.

High Availability Hardware Design


Many of the supported cluster hardware platforms have the following features that contribute to maximum uptime:

•  Hardware is interchangeable between models.
•  Redundant system board power and cooling modules.
•  The systems contain automatic system reconfiguration; failed components, such as the central processing unit (CPU), memory, and input/output (I/O), can be disabled at reboot.
•  Several disk storage options support hot swapping of disks.

1-9

1
High Availability Features
Sun Cluster High Availability Software
The Sun Cluster software has monitoring and control mechanisms that can initiate various levels of cluster reconguration to help maximize application availability.

Software Redundant Array of Inexpensive Disks (RAID) Technology


The Sun StorEdge Volume Manager (SSVM) and the Cluster Volume Manager (CVM) software provide RAID protection in the following ways:
G G

Redundant mirrored volumes RAID-5 volumes (not for all applications)

The SDS software provides mirrored volume support. The SDS product does not support RAID-5 volumes.

Controller-Based RAID Technology


Several supported disk storage devices use controller-based RAID technology that is sometimes referred to as hardware RAID. This includes the following storage arrays:

•  Sun StorEdge A1000
•  Sun StorEdge A3000/3500

Year 2000 Compliance


The Sun Cluster 2.2 software and the associated Solaris Operating Environments are both Year 2000 compliant, which contributes to long-term cluster reliability.


High Availability Strategies


To provide the high level of system availability required by many Enterprise customers, the Sun Enterprise Cluster system uses the following strategies:

•  Redundant servers
•  Redundant data
•  Redundant public network access
•  Redundant private communications


Redundant Servers
The Sun Enterprise Cluster system consists of one to four interconnected systems that are referred to as cluster host systems and also as nodes. The systems can be almost any of the Sun Enterprise class of platforms. They use off-the-shelf, non-proprietary hardware.

Note - You cannot mix systems that use PCI bus technology, such as the E450, with SBus technology systems, such as the E3000. Many of the interface cards, such as the storage array interfaces, are not compatible when connected to the same storage unit.

Redundant Data
A Sun Enterprise Cluster system can use any one of several virtual volume management packages to provide data redundancy. The use of data mirroring provides a backup in the event of a disk drive or storage array failure.

Redundant Public Networks


The Sun Enterprise Cluster system provides a proprietary public network monitoring feature (PNM) that can transfer user I/O from a failed network interface to a predefined backup interface.

Redundant Private Networks


The cluster interconnect system (CIS) consists of dual high-speed private node-to-node communication links. Only one of the links is used at a time. If the primary link fails, the cluster software automatically switches to the backup link. This is transparent to all cluster applications.


Cluster Configurations
The Sun Enterprise Cluster system provides a highly available platform that is suitable for two general purposes:

•  Highly available data services
•  Parallel databases

Highly Available Data Service Configuration


The highly available data service (HADS) configuration is characterized by independent applications that run on each node or cluster host system. Each application accesses its own data. If there is a node failure, the data service application can be configured so that a designated backup node can take over the application that was running on the failed node.


HADS configurations are application failover platforms. When a node fails, the application moves to a designated backup system.


Parallel Database Configuration
The parallel database (PDB) configuration is characterized by multiple nodes that access a single database. The PDB application is not a data service. This is a less complex configuration than the HADS, and when a node fails, the database software resolves incomplete database transactions automatically. The Sun Cluster software initiates a portion of the database recovery process and performs minor recovery coordination between cluster members.


PDB configurations are generally throughput applications. When a node fails, an application does not move to a backup system.

Parallel database solutions, such as Oracle Parallel Server (OPS), can require special modifications to support shared concurrent access to a single image of the data that can be spread across multiple computer systems and storage devices. OPS uses a distributed lock management scheme to prevent simultaneous data modification by two hosts. The lock ownership information is transferred between cluster hosts across the cluster interconnect system.

Note - There is also a special configuration for use with the Informix Online XPS database. This is discussed in a later module.


Sun Cluster Application Support


The Sun Cluster software framework provides support for both highly available data services and parallel databases. Regardless of which application your cluster is running, the core Sun Cluster control and monitoring software is identical.


Highly Available Data Service Support
The Sun Cluster software provides preconfigured components that support the following highly available data services:

•  Oracle, Informix, and Sybase databases
•  NFS
•  SAP
•  Netscape Mail, News, HTTP, LDAP
•  DNS
•  Tivoli
•  Lotus Notes
•  HA-API for local applications

Parallel Database Support


The Sun Cluster software provides support for the following parallel database applications:

•  Oracle Parallel Server (OPS)
•  Informix XPS

Note - When the Sun Cluster software is installed on the cluster host systems, you must specify which of the above products you intend to run on your cluster.


Logical Hosts
A data service in the Sun Cluster environment must be able to migrate to one or more backup systems if the primary system fails. This should happen with as little disruption to the client as possible. At the heart of any highly available data service is the concept of a logical host. Logical host definitions are created by the system administrator and are associated with a particular data service such as highly available NFS (HA-NFS). A logical host definition provides all of the necessary information for a designated backup system to take over the data service(s) of a failed node. This includes the following items, which are illustrated in the sketch after this list:

•  The IP address/hostname that users use to access the application
•  The disk group that contains the application-related data
•  The application that must be started
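The following is only a sketch of how these pieces come together when a logical host is defined with the scconf command in Sun Cluster 2.2; the cluster name, node names, disk group, adapter names, and logical host name are all placeholders, and the exact option syntax should be verified against the scconf man page for your release:

    # Hypothetical example: logical host lhost1, mastered by node0 with
    # node1 as its backup, using disk group dg1 and adapter hme0 on each node
    scconf sc-cluster -L lhost1 -n node0,node1 -g dg1 -i hme0,hme0,lhost1

The logical host name resolves to the IP address that clients use, the disk group follows the data service between nodes, and the application itself is associated with the logical host when the data service is registered (for example, with the hareg command, which is covered in a later module).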


Logical Host Failure Process
There are several different fault monitoring processes that can detect the failure of a logical host. Once the failure is detected, the actions taken are as follows:

•  The IP address associated with the logical host is brought up on the designated backup system.

Note - There are also logical interface names associated with each logical host.

•  The disk group or diskset associated with a logical host migrates to the designated backup system.

Note - Depending on the configured volume manager, the group of disks is captured by the backup node using either the vxdg command or the metaset command, as sketched below.
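For illustration, the following commands show the general form of this operation; the disk group and diskset names are placeholders, and during a normal failover the cluster framework issues the appropriate command for you:

    # CVM/SSVM configuration: import the disk group on the backup node
    vxdg import dg1

    # Solstice DiskSuite configuration: take ownership of the diskset
    metaset -s setname -t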

•  The designated application is started on the backup node.

Cluster Configuration Databases


The logical host configuration information is stored in a global configuration file named ccd.database (CCD). The CCD must be consistent between all nodes. The CCD is a critical element that enables each node to be aware of its potential role as a designated backup system.

Note - There is another cluster database called the CDB that stores basic cluster configuration information, such as node names. This database is used during initial cluster startup. It does not require as high a level of functionality as the CCD database does.


Fault Monitoring
To ensure continued data availability, the Sun Enterprise Cluster environment has several different fault monitoring schemes that can detect a range of failures and initiate corrective actions. The fault monitoring mechanisms fall into two general categories:

•  Data service fault monitoring
•  Cluster fault monitoring (daemons)


Data Service Fault Monitoring
Each Sun-supplied data service, such as the highly available network file system (HA-NFS), automatically starts its own fault monitoring processes. The data service fault monitors verify that the data service is functioning correctly and providing its intended service. The well-being of the data service is checked both locally and on the designated backup node for the data service. Each data service always has a local and a remote fault monitor associated with it. The data service fault monitors primarily use the public network interfaces to verify functionality, but can also use the private cluster interconnect system (CIS) interfaces if there is a problem with the public network interfaces.

Cluster Fault Monitoring


There are several cluster fault monitoring mechanisms that verify the overall well-being of the cluster nodes and some of their hardware.

Public Network Management


The public network management (PNM) daemon, pnmd, monitors the functionality of designated public network interfaces and can transparently switch to backup interfaces in the event of a failure.
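On a running cluster node, the state of the NAFO backup groups that pnmd manages can be checked from the command line. The sketch below assumes the pnmstat utility supplied with Sun Cluster 2.2 and a backup group named nafo0; the option syntax shown is an assumption and should be verified against the pnmstat man page:

    # List the status of all NAFO backup groups on this node
    pnmstat -l

    # Check a specific backup group
    pnmstat -c nafo0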

Cluster Membership Monitor


The cluster membership monitor (CMM) daemon, clustd, runs on each cluster member and is used to detect major system failures. The clustd daemons communicate with one another across the private high-speed network interfaces by sending regular heartbeat messages. If the heartbeat from any node is not detected within a defined timeout period, the node is considered to have failed, and a general cluster reconfiguration is initiated by each of the functioning nodes.


Switch Management Agent
The switch management agent (SMA) daemon, smad, monitors the functionality of the current private network interface. If a node detects a failure on its private network interface, it switches to its backup private network interface. All cluster members must then switch to their backup interfaces.

Failfast Driver
Each node has a special driver, ff, that runs in memory. It monitors critical cluster processes and daemons. If any of the monitored processes or daemons hang or terminate, the ff driver performs its failfast function and forces a panic on the node.

Sun Cluster Overview

1-21

Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

Failure Recovery Summary


To achieve high availability, failed software or hardware components must be automatically detected and corrective action initiated. Some failures do not require a cluster reconfiguration.


Many of the components shown in Figure 1-1 have failure recovery capabilities. Some failures are less transparent than others and can result in a node crash. Although some of the failures do not disturb the cluster operation, they reduce the level of redundancy and therefore, increase the risk of data loss.

Figure 1-1   Failure Recovery Summary

(The figure shows Node 0 and Node 1, each running the RDBMS, disk management software, the failfast driver (ff), the CDB and CCD databases with consistency checking between nodes, and the CMM and SMA daemons exchanging heartbeats over the redundant private networks, with fiber-optic channels connecting each node to the shared storage arrays.)


The following describes the types of system failures and recovery solutions.

•  Individual disk drive failure - When virtual volumes are mirrored, you can replace a failed disk drive without interrupting the database or data service access.

•  Fibre Channel or array controller failure - A failure in any part of a fiber-optic interface or an array controller is treated as a multiple disk failure. All I/O is sent to the mirrors on another array. This is handled automatically by the disk management software.

•  Cluster interconnect failure - If a single private network failure is detected by the SMA, all traffic is automatically routed through the remaining private network. This is transparent to all programs that use the private network.

•  Node failure - If a node crashes or if both its private networks fail, the cluster membership monitor (CMM) daemons on all nodes detect the heartbeat loss and initiate a comprehensive reconfiguration.

•  Critical process or daemon failure - If certain critical processes or daemons hang or fail, the failfast driver, ff, forces a panic on the node. As a result, all other nodes in the cluster go through a reconfiguration.

•  Cluster configuration database file inconsistency - When a cluster reconfiguration takes place for any reason, a consistency check is done among all nodes to ensure that all of the CDB and CCD database files agree. If there is an inconsistency in one or more of the cluster database files, the CMM determines, by majority opinion, which nodes remain in clustered operation.


Exercise: Lab Equipment Familiarization
Exercise objective None

 

Although there is not a formal lab for this module, this would be a good time to familiarize the class with the location and configuration of the lab equipment. This can be beneficial even if the class is being taught at a customer site.

Preparation
None

Tasks
None


Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:

❑ List the hardware elements that comprise a basic Sun Enterprise Cluster system
❑ List the hardware and software components that contribute to the availability of a Sun Enterprise Cluster system
❑ List the types of redundancy that contribute to the availability of a Sun Enterprise Cluster system
❑ Identify the configuration differences between a high availability cluster and a parallel database cluster
❑ Explain the purpose of logical host definitions in the Sun Enterprise Cluster environment
❑ Describe the purpose of the cluster configuration databases
❑ Explain the purpose of each of the Sun Enterprise Cluster fault monitoring mechanisms


Think Beyond
What are some of the most common problems encountered during cluster installation?

How does a cluster installation proceed? What do you need to do first?

Do you need to be a database expert to administer a Sun Enterprise Cluster system?


Terminal Concentrator
Objectives
Upon completion of this module, you should be able to:

•  Describe the Sun Enterprise Cluster administrative interface
•  Explain the TC hardware configuration
•  Verify the correct TC cabling
•  Configure the TC IP address
•  Configure the TC to self-load
•  Verify the TC port settings
•  Verify that the TC is functional
•  Use the Terminal Concentrator help, who, and hangup commands
•  Describe the purpose of the telnet send brk command

This module should help you understand the critical functions provided by the Sun Enterprise Cluster central administration hardware interface called the Terminal Concentrator. You should also learn how to configure the TC for proper operation in a clustered environment.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion – The following questions are relevant to your learning the material presented in this module:

1. Why is this hardware covered so early in the course?

2. You are purchasing an Ultra Enterprise 10000-based cluster. Does the information in this module apply?


The E10000 does not use a Terminal Concentrator. The system service processor (SSP) workstation is used instead.

Additional Resources
Additional resources – The following references can provide additional details on the topics discussed in this module:

Sun Cluster 2.2 System Administration Guide, part number 805-4238
Sun Cluster 2.2 Software Installation Guide, part number 805-4239
Sun Cluster 2.2 Release Notes, part number 805-4243


Cluster Administration Interface


The TC is a hardware interface, consisting of several components, that provides the only access path to the cluster host systems when these systems are halted or before any operating system software is installed. The TC is typically the first component configured during a new cluster installation. The administration workstation is set up next.

As shown in Figure 2-1, the cluster administration interface is a combination of hardware and software that enables you to monitor and control one or more clusters from a remote location.

Figure 2-1   Cluster Administration Interface

Major Elements
The relationship of the following elements is shown in Figure 2-1.

Administration Workstation
The administration workstation can be any Sun SPARC workstation, providing it has adequate resources to support graphics and computer-intensive applications, such as SunVTS and Solstice SyMON software.

Administration Tools
There are several administration tools but only one of them, the Cluster Console, is functional when the cluster host systems are at the OpenBoot Programmable Read Only Memory (PROM) prompt. Cluster Console is a tool that automatically links to each node in a cluster through the TC. A text window is provided for each node. The connection is functional even when the nodes are at the OK prompt level. This is the only path available to boot the cluster nodes or initially load the operating system software.

Terminal Concentrator (TC)


The TC provides translation between the local area network (LAN) environment and the serial port interfaces on each node. The nodes do not have display monitors so the serial ports are the only means of accessing each node to run local commands.

Cluster Host Serial Port Connections


The cluster host systems do not have a display monitor or keyboard. The system firmware senses this when power is turned on and directs all system output through serial port A. This is a standard feature on all Sun systems.


Terminal Concentrator Overview


The TC used in the Sun Enterprise Cluster systems has its own internal OS and resident administration programs. The TC firmware is specially modified for Sun Enterprise Cluster installation. Although other terminal concentrators might seem similar, they should not be used as a substitute.

As shown in Figure 2-2, the TC is a self-contained unit with its own operating system. Part of its operating system is a series of administrative programs.

Figure 2-2   Terminal Concentrator Functional Diagram

Caution If the PROM-based operating system is older than version 52, it must be upgraded.

Operating System Load
You can set up the TC to load its operating system either internally from the resident PROM or externally from a server. In the cluster application, it is always set to load internally. Placing the operating system on an external server can actually decrease the reliability of the terminal concentrator.

When power is first applied to the TC, it performs the following steps:

1. A PROM-based self-test is run and error codes are displayed.

2. A PROM-based OS is loaded into the TC resident memory.

Setup Port
Serial port 1 on the TC is a special-purpose port that is used only during initial setup. It is used primarily to set up the IP address and load sequence of the TC. Port 1 access can be either from a tip connection or from a locally connected terminal.

Terminal Concentrator Setup Programs


You must configure the TC non-volatile random-access memory (NVRAM) with the appropriate IP address, boot path, and serial port information. You use the following resident programs to specify this information:

addr
seq
image
admin


Terminal Concentrator Setup


The TC must be configured for proper operation. Although the TC setup menus seem simple, they can be confusing and it is easy to make a mistake. You can use the default values for many of the prompts.

Connecting to Port 1
Before you can perform the initial TC setup, you must first make an appropriate connection to its setup port. Figure 2-3 shows a tip hardwire connection from the administration workstation.

Figure 2-3   Setup Connection to Port 1

Note – You can also connect an American Standard Code for Information Interchange (ASCII) terminal directly to the setup port instead of using the administration workstation.

Enabling Setup Mode


To enable Setup mode, press the TC Test button shown in Figure 2-4 until the TC power indicator begins to blink rapidly, then release the Test button and press it again briey.

Figure 2-4   Terminal Concentrator Test Button

After you have enabled Setup mode, a monitor:: prompt should appear on the setup device. Use the addr, seq, and image commands to complete the configuration.

Setting the Terminal Concentrator IP Address
The following example shows how to use the addr program to set the IP address of the TC. Usually this is set correctly when your cluster arrives, but you should always verify it.

monitor:: addr
Enter Internet address [192.9.22.98]:: 129.150.182.100
Enter Subnet mask [255.255.255.0]::
Enter Preferred load host Internet address [192.9.22.98]:: 129.150.182.100
Enter Broadcast address [0.0.0.0]:: 129.150.182.255
Enter Preferred dump address [192.9.22.98]:: 129.150.182.100
Select type of IP packet encapsulation (ieee802/ethernet) [<ethernet>]::
Type of IP packet encapsulation: <ethernet>
Load Broadcast Y/N [Y]::

Setting the Terminal Concentrator Load Source


The following example shows how to use the seq program to specify the type of loading mechanism to be used.

monitor:: seq
Enter a list of 1 to 4 interfaces to attempt to use for
downloading code or upline dumping. Enter them in the
order they should be tried, separated by commas or
spaces. Possible interfaces are:

Ethernet: net
SELF: self

Enter interface sequence [self]::

The self response configures the TC to load its OS internally from the PROM when you turn the power on. The PROM image is currently called OPER_52_ENET.SYS. Enabling the self-load feature negates other setup parameters that refer to an external load host and dump host, but you must still define them during the initial setup sequence.

Specify the Operating System Image


Even though the self-load mode of operation negates the use of an external load and dump device, you should still verify the operating system image name as shown by the following:

monitor:: image
Enter Image name [oper_52_enet]::
Enter TFTP Load Directory [9.2.7/]::
Enter TFTP Dump path/filename [dump.129.150.182.100]::
monitor::

Note – Do not define a dump or load address that is on another network because you will see additional questions about a gateway address. If you make a mistake, you can press Control-C to abort the setup and start again.

Setting the Serial Port Variables
The TC port settings must be correct for proper cluster operation. This includes the type and mode port settings. Port 1 requires different type and mode settings. You should verify the port settings before installing the cluster host software. The following is an example of the entire procedure:

admin-ws# telnet terminal_concentrator_name
Trying terminal concentrator IP address
Connected to sec-tc.
Escape character is '^]'.
Rotaries Defined:
    cli
Enter Annex port name or number: cli
Annex Command Line Interpreter * Copyright 1991 Xylogics, Inc.
annex: su
Password: type the password
annex# admin
Annex administration MICRO-XL-UX R7.0.1, 8 ports
admin: show port=1 type mode
Port 1:
type: hardwired   mode: cli
admin: set port=1 type hardwired mode cli
admin: set port=2-8 type dial_in mode slave
admin: quit
annex# boot
bootfile: <CR>
warning: <CR>

Note – This procedure is not performed through the special setup port but through public network access.

The default TC password is its IP address, including the periods. The imask_7bits variable masks out everything but the standard 7-bit character set.


Terminal Concentrator Troubleshooting


Occasionally it is useful to be able to manipulate the terminal concentrator manually. The commands to do this are not well documented in the cluster manuals.

Manually Connecting to a Node


If the cconsole tool is not using the terminal concentrator serial ports, you can use the telnet command to connect to a specific serial port as follows:

# telnet tc_name 5002

You can then log in to the node attached to port 5002. After you have finished and logged out of the node, you must break the telnet connection with the Control-] keyboard sequence and then type quit. If you do not, the serial port will be locked against use by other applications, such as the cconsole tool.
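A complete session might look like the following sketch. The name tc_name and the node prompts are placeholders for your own terminal concentrator and node names:

# telnet tc_name 5002
Escape character is '^]'.

node0 console login: root
Password:
node0# <run local commands>
node0# exit

(press Control-])
telnet> quit
Connection closed.
#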

Using the telnet Command to Abort a Node
If you have to abort a cluster node, you can either use the telnet command to connect directly to the node or use a cconsole window, and then enter the Control-] keyboard sequence. Once you have the telnet prompt, you can abort the node with the following command:

telnet> send brk
ok

Note – You might have to repeat the command more than once.

Connecting to the Terminal Concentrator CLI


As shown below, you can use the telnet command to connect directly to the TC, and then use the resident command line interpreter to perform status and administration procedures.

# telnet IPaddress
Trying 129.146.241.135...
Connected to 129.146.241.135
Escape character is '^]'.
Enter Annex port name or number: cli
Annex Command Line Interpreter * Copyright 1991 Xylogics, Inc.
annex:

Using the Terminal Concentrator help Command


After you connect directly to a terminal concentrator, you can get online help as shown below.

annex: help
annex: help hangup
annex: help stats

Identifying and Resetting a Locked Port
If a node crashes, it can leave a telnet session active that effectively locks the port from further use. You can use the telnet command to connect to the terminal concentrator, use the who command to identify which port is locked, and then use the admin utility to reset the locked port. The command sequence is as follows:

annex: who
annex: su
Password:
annex# admin
Annex administration MICRO-XL-UX R7.0.1, 8 ports
admin: reset 6
admin: quit
annex# hangup

Erasing Terminal Concentrator Settings


Using the TC erase command can be dangerous. It should be used only when you have forgotten the superuser password. It sets the password back to its default, which is the TC IP address, and returns all other settings to their default values. The erase command is available only through the port 1 interface. A typical procedure is as follows:

monitor:: erase

1) EEPROM (i.e. Configuration information)
2) FLASH  (i.e. Self boot image)

Enter 1 or 2 :: 1

Warning – Do not use option 2 of the erase command; it destroys the TC boot PROM-resident operating system.

Exercise: Configuring the Terminal Concentrator

Exercise objective – In this exercise you will:

Verify the correct TC cabling
Configure the TC IP address
Configure the TC to self-load
Verify the TC port settings
Verify that the TC is functional

Preparation
Before starting this lab, you must obtain an IP address assignment for your terminal concentrator. Record it below.

TC IP address: ____________________

If there are root passwords set on the lab TCs, tell students now. Stress that this is typically the first step in a real installation. If they are to use the tip hardwire method and the administration workstations in the lab have a shared A/B serial port connector, you must tell the students the correct port setting to use in the Connecting Tip Hardwire section of this exercise.

Tasks
The following tasks are explained in this section:
Verifying the network and host system cabling
Connecting a local terminal
Connecting tip hardwire
Achieving setup mode
Configuring the IP address
Configuring the Terminal Concentrator to self-load
Verifying the self-load process
Verifying the TC port settings

TCs can have one or two Ethernet connections, depending on their age. All TC generations have the same serial port connections. Before you begin to configure the TC, you must verify that the network and cluster host connections are correct. The port-to-node connection configuration will be used when configuring the cluster administration workstation in a later lab.

Verifying the Network and Host System Cabling


1. Inspect the rear of the TC and make sure it is connected to a public network.

Figure 2-5   Terminal Concentrator Network Connection

2. Verify that the serial ports are properly connected to the cluster host systems. Each output should go to serial port A on the primary system board of each cluster host system.

Figure 2-6   Concentrator Serial Port Connections

Note – In three- and four-node clusters, there are additional serial port connections to the TC.

To set up the TC, you can either connect a dumb terminal to serial port 1 or use the tip hardwire command from a shell on the administration workstation. If you are using a local terminal connection, continue with the next section, Connecting a Local Terminal. If you are using the administration workstation, proceed to the Connecting Tip Hardwire section on page 2-21.

Connecting a Local Terminal


1. Connect the local terminal to serial port 1 on the back of the TC using the cable supplied.

Figure 2-7   Concentrator Local Terminal Connection

Note – Do not use a cable length over 500 feet. Use a null modem cable.

2. Verify that the local terminal operating parameters are set to 9600 baud, 7 data bits, no parity, and one stop bit.

3. Proceed to the Achieving Setup Mode section on page 2-22.

Perform the following procedure only if you are using the tip connection method to configure the TC. If you have already connected to the TC with a dumb terminal, skip this procedure.

Connecting Tip Hardwire


1. Connect serial port B on the administration workstation to serial port 1 on the back of the TC using the cable supplied.

Figure 2-8   Concentrator to Tip Hardwire Connection

Note – Do not use a cable length over 500 feet. Use a null modem cable.

2. Verify that the hardwire entry in the /etc/remote file matches the serial port you are using:

hardwire:\
    :dv=/dev/term/b:br#9600:el=^C^S^Q^U^D:ie=%$:oe=^D:

    The baud rate must be 9600.
    The serial port designator must match the serial port you are using.

3. Open a shell window on the administration workstation and make the tip connection by typing the following command:

# tip hardwire
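If your administration workstation cabling uses serial port A instead, the entry would reference /dev/term/a. A minimal sketch of that variant follows (everything else unchanged; adjust the device name to match your actual cabling):

hardwire:\
    :dv=/dev/term/a:br#9600:el=^C^S^Q^U^D:ie=%$:oe=^D: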

Before the TC configuration can proceed, you must first place it in its Setup mode of operation. Once in Setup mode, the TC accepts configuration commands from a serial device connected to port 1.

Achieving Setup Mode


1. To enable Setup mode, press and hold the TC Test button until the TC power indicator begins to blink rapidly, then release the Test button and press it again briey.

Figure 2-9   Enabling Setup Mode on the Terminal Concentrator

2. After the TC completes its power-on self-test, you should see the following prompt on the shell window or on the local terminal:

monitor::

Note It can take a minute or more for the self-test process to complete.

In the next procedure, you will set up the TC's network address with the addr command. This address must not conflict with any other network systems or devices.

Configuring the IP Address

Verify that the TC Internet address and preferred load host address are set to your assigned value.

1. To configure the Terminal Concentrator IP address using the addr command, type addr at the monitor:: prompt.

monitor:: addr
Enter Internet address [192.9.22.98]:: 129.150.182.101
Enter Subnet mask [255.255.255.0]::
Enter Preferred load host Internet address [192.9.22.98]:: 129.150.182.101
***Warning: Local host and Internet host are the same***
Enter Broadcast address [0.0.0.0]:: 129.150.182.255
Enter Preferred dump address [192.9.22.98]:: 129.150.182.101
Select type of IP packet encapsulation (ieee802/ethernet) [<ethernet>]::
Type of IP packet encapsulation: <ethernet>
Load Broadcast Y/N [Y]:: Y
monitor::

When the TC is turned on, you must configure it to load a small operating system. You can use the seq command to define the location of the operating system and the image command to define its name.

Configuring the Terminal Concentrator to Self-Load

1. To configure the TC to load from itself instead of trying to load from a network host, type the seq command at the monitor:: prompt.

monitor:: seq
Enter a list of 1 to 4 interfaces to attempt to use for
downloading code or upline dumping. Enter them in the
order they should be tried, separated by commas or
spaces. Possible interfaces are:

Ethernet: net
SELF: self

Enter interface sequence [self]::

2. To configure the TC to load the correct operating system image, type the image command at the monitor:: prompt.

monitor:: image
Enter Image name [oper.52.enet]::
Enter TFTP Load Directory [9.2.7/]::
Enter TFTP Dump path/filename [dump.129.150.182.101]::

3. If you used a direct terminal connection, disconnect it from the TC when finished.

4. If you used the tip hardwire method, break the tip connection by typing the ~. sequence in the shell window.

Before proceeding, you must verify that the TC can complete its self-load process and that it will answer to its assigned IP address.

Verifying the Self-load Process


1. Turn off the TC power for at least 10 seconds and then turn it on again.

2. Observe the light emitting diodes (LEDs) on the TC front panel. After the TC completes its power-on self-test and load routine, the front panel LEDs should look like the following:

Table 2-1   LED Front Panel Settings

Power (Green)   Unit (Green)   Net (Green)   Attn (Amber)   Load (Green)   Active (Green)
ON              ON             ON            OFF            OFF            Intermittent blinking

Note It takes at least 1 minute for the process to complete. The Load light extinguishes after the internal load sequence is complete.

Verifying the Terminal Concentrator Pathway


Complete the following steps on the administration workstation from a shell or command tool window:

1. Test the network path to the TC using the following command:

# ping IPaddress

Note – Substitute the IP address of your TC for IPaddress.
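For instance, using the example address assigned earlier in this exercise (substitute your own TC address), the test and a typical successful response look like this sketch:

# ping 129.150.182.101
129.150.182.101 is alive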

You must set the TC port variable type to dial_in for each of the eight TC serial ports. If it is set to hardwired, the cluster console might be unable to detect when a port is already in use. There is also a related variable called imask_7bits that you must set to Y.

Verifying the TC Port Settings


You can verify, and if necessary modify, the type, mode, and imask_7bits variable port settings with the following procedure.

1. On the administration workstation, use the telnet command to connect to the TC. Do not use a port number.

# telnet IPaddress
Trying 129.146.241.135...
Connected to 129.146.241.135
Escape character is '^]'.

2. Enable the command line interpreter, su to the root account, and start the admin program.

Enter Annex port name or number: cli
Annex Command Line Interpreter * Copyright 1991 Xylogics, Inc.
annex: su
Password:
annex# admin
Annex administration MICRO-XL-UX R7.0.1, 8 ports
admin :

Note – By default, the superuser password is the TC IP address, including the periods.

3. Use the show command to examine the current setting of all ports.

admin : show port=1-8 type mode

4. Perform the following procedure to change the port settings and to end the TC session.

admin: set port=1 type hardwired mode cli
admin: set port=2-8 type dial_in mode slave
admin: quit
annex# boot
bootfile: <CR>
warning: <CR>
Connection closed by foreign host.

Note – It takes at least 1 minute for the process to complete. The Load light extinguishes after the internal load sequence is complete.

Terminal Concentrator Troubleshooting


1. On the administration workstation, use the telnet command to connect to the TC. Do not use a port number.

# telnet IPaddress
Trying 129.146.241.135...
Connected to 129.146.241.135
Escape character is '^]'.

2. Enable the command line interpreter.

Enter Annex port name or number: cli
Annex Command Line Interpreter * Copyright 1991 Xylogics, Inc.
annex:

3. Practice using the help and who commands.

4. End the session with the hangup command.

Exercise Summary
Discussion – Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.

Manage the discussion here based on the time allowed for this module, which was given in the About This Course module. If you find you do not have time to spend on discussion, then just highlight the key concepts students should have learned from the lab exercise.
G

Experiences

Ask students what their overall experiences with this exercise have been. You may want to go over any trouble spots or especially confusing areas at this time.
G

Interpretations

Ask students to interpret what they observed during any aspects of this exercise.
G

Conclusions

Have students articulate any conclusions they reached as a result of this exercise experience.
G

Applications

Explore with students how they might apply what they learned in this exercise to situations at their workplace.

Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:

Describe the Sun Enterprise Cluster administrative interface
Explain the TC hardware configuration
Verify the correct TC cabling
Configure the TC IP address
Configure the TC to self-load
Verify the TC port settings
Verify that the TC is functional
Use the terminal concentrator help, who, and hangup commands
Describe the purpose of the telnet send brk command

Think Beyond
Is there a significant danger if the TC port variables are not set correctly?

Is the Terminal Concentrator a single point of failure? What would happen if it failed?


Administration Workstation Installation


Objectives
Upon completion of this module, you should be able to:
Summarize the Sun Cluster administration workstation functions
Use the scinstall script features
Install the client software on the administration workstation
Set up the administration workstation environment
Configure the Sun Cluster administration tools

This module describes the installation process for the Sun Cluster software on the Administration Workstation.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion – The following questions are relevant to understanding this module's content:

1. How important is the administration workstation during the configuration of the cluster host systems?

Additional Resources
Additional resources – The following references can provide additional details on the topics discussed in this module:

Sun Cluster 2.2 System Administration Guide, part number 805-4238
Sun Cluster 2.2 Software Installation Guide, part number 805-4239


Sun Enterprise Cluster Software Summary


Sun Cluster software is installed on a Sun Enterprise Cluster hardware platform. The Sun Cluster software is purchased separately from each of the supported volume managers. The complete software collection consists of the following CDs:
Sun Cluster 2.2
Cluster Volume Manager 2.2.1
Sun StorEdge Volume Manager 2.6
Solstice DiskSuite 4.2

Note – You can use the Solstice DiskSuite volume manager with the Solaris 7 Operating Environment version of the Sun Cluster product. You must use the other supported volume managers with the Solaris 2.6 Operating Environment version of the Sun Cluster product. The administration workstation can be configured with either version of the Solaris Operating Environment independently of the cluster host systems.

As shown in Figure 3-1, the Sun Cluster client software is installed on the administration workstation, and the Sun Cluster server software is installed on each of the cluster host systems along with the appropriate volume management software.

Figure 3-1   Sun Cluster Software Distribution

Sun Cluster Software Installation
The Sun Cluster CD-ROM contains a software administration program called scinstall. The scinstall script is used to install all Sun Cluster software.



Caution There are many package dependencies. You should not try to manually add packages unless instructed to do so by formally released procedures.

Each of the volume management CD-ROMs has its own installation process.

Administrative Workstation Software Packages


The client Package Set
The client packages are installed only on the administration workstation and include the following:
SUNWscch – Sun Cluster common help files
SUNWccp – Sun Cluster control panel
SUNWscmgr – Sun Cluster manager
SUNWccon – Sun Cluster console
SUNWscsdb – Sun Cluster serialports/clusters database
SUNWcsnmp – Sun Cluster Simple Network Management Protocol (SNMP) agent

Note The Sun Cluster installation program package, SUNWscins, is also installed.


The server packages are discussed in the next module.


Software Installation Program


You use the scinstall command to install the Sun Cluster CD-ROM packages. Run in its normal interactive mode, scinstall prompts you for all of the information that it requires to properly install the Sun Cluster software. For the administrative workstation, there is little information required, although the scinstall command provides many options.

scinstall Command Line Options
The scinstall command recognizes the following options when initially started:

scinstall [-a|-c|-s] [-h] [-A admin_file] [[-i|-u|-o|-V][-l]] [-d package_dir]

-a              Loads all the packages (client and server)
-c              Loads the client administration packages
-s              Loads the server packages
-i              Installs the selected packages (use with [acs])
-u              Uninstalls the selected packages (use with [acs])
-o              Uninstalls obsolete packages
-V              Verifies the installation
-l              Lists installed client and server packages
-d package_dir  Indicates where to find the packages
-A admin_file   Uses the specified package administration file
-h              Prints a help message

Note If no options are provided, the program prompts interactively. This is the recommended method.
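As an illustration only, the options listed above could in principle be combined to install the client package set non-interactively from a software image; the path shown is an assumption and must match your site, and interactive mode remains the recommended method:

# ./scinstall -c -i -d /cdrom/suncluster_sc_2_2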


Sun Cluster Installation Program Startup


When you run the scinstall script without any command-line startup options, it manages several complex tasks using simple interactive prompts. It assumes that the operating system (OS) is properly installed and configured. When you start the scinstall script as shown below, the SUNWscins package is installed before anything else occurs. This program manages the installation.

# cd /cdrom/suncluster_sc_2_2/Sun_Cluster_2_2/Sol_2.6/Tools
# ./scinstall

Installing: SUNWscins

Installation of <SUNWscins> was successful.
Checking on installed package state
.............

None of the Sun Cluster software has been installed

Initial Installation Startup
During the next phase of the scinstall program shown below, you must define which Sun Cluster package set you wish to administer.

Choose one:
1) Upgrade            Upgrade to Sun Cluster 2.2 Server packages
2) Server             Install the Sun Cluster packages needed on a server
3) Client             Install the admin tools needed on an admin workstation
4) Server and Client  Install both Client and Server packages
5) Close              Exit this Menu
6) Quit               Quit the Program

Enter the number of the package set [6]: 3

Normally, the Server and Client packages are not installed on the same system.

Note – When you select the server option, a complex dialogue is displayed, which requires detailed input about the intended cluster configuration. Do not start the server installation until you are prepared to answer all questions.

As shown in the following example, you must define a legitimate source for the Sun Cluster software.

What is the path to the CD-ROM image [/cdrom/cdrom0]: /cdrom/suncluster_sc_2_2

You must define the full path name to the Sun_Cluster_2_2 directory.

Note – There are detailed upgrade procedures in the Sun Cluster 2.2 Software Installation Guide in Chapter 4, Upgrading Sun Cluster Software.

Existing Installation Startup
If you run the scinstall program on a cluster host that already has the current Sun Cluster software configured, you see a different startup dialogue.

1) Install/Upgrade  Install or Upgrade Server Packages or Install Client Packages.
2) Remove           Remove Server or Client Packages.
3) Change           Modify cluster or data service configuration
4) Verify           Verify installed package sets.
5) List             List installed package sets.
6) Quit             Quit this program.
7) Help             The help screen for this menu.

Please choose one of the menu items: [7]:

You can use option 3 (Change) to modify the cluster configuration if you made too many mistakes during the initial installation. The change option allows you to:

Change information about the nodes in the cluster
Add data service packages onto the system
Remove data service packages from the system
Remove the volume manager software
Change the logical host configurations
Reinitialize the NAFO group configurations

Installation Mode
Before you start the installation process, you must select the mode of installation. A typical dialogue is shown in the following output.

Installing Client packages

Installing the following packages: SUNWscch SUNWccon SUNWccp SUNWcsnmp SUNWscsdb

>>>> Warning <<<<
The installation process will run several scripts as root. In
addition, it may install setUID programs. If you choose automatic
mode, the installation of the chosen packages will proceed without
any user interaction. If you wish to manually control the install
process you must choose the manual installation option.

Choices:
    manual      Interactively install each package
    automatic   Install the selected packages with no user interaction.

In addition, the following commands are supported:
    list        Show a list of the packages to be installed
    help        Show this command summary
    close       Return to previous menu
    quit        Quit the program

Install mode [manual automatic] [automatic]: automatic

In most situations, you should select the automatic mode of installation because of complex package dependencies.


Administration Workstation Environment


New Search and Man Page Paths
Depending on which shell the system uses, add the following search path and man path entries to the .profile or .cshrc files for user root:

/opt/SUNWcluster/bin
/opt/SUNWcluster/man
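For example, with a Bourne-shell /.profile the additions might look like the following minimal sketch (csh users would instead extend path and MANPATH in .cshrc):

PATH=$PATH:/opt/SUNWcluster/bin
MANPATH=$MANPATH:/opt/SUNWcluster/man
export PATH MANPATH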

Host Name Resolution Changes
As shown in the following example, you can enter the IP addresses and the host names of the TC and all cluster hosts in the administration workstation's /etc/inet/hosts file even if you are using a naming service. This gives you access to the cluster host systems by name even if your naming service fails.

127.0.0.1       localhost
129.146.53.57   adminws1 loghost
129.146.53.60   sc-tc
129.146.53.61   sc-node0
129.146.53.62   sc-node1

Note To ensure that this works, specify files before any other name service on the hosts line of /etc/nsswitch.conf.
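For example, the hosts entry in /etc/nsswitch.conf might then look like the following sketch (the services listed after files depend on your site's naming service):

hosts: files nis dns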

Remote Login Control


You can control user root logins on the administration workstation in three ways. Edit the /etc/default/login file and modify the CONSOLE=/dev/console line in one of the following ways:

# CONSOLE=/dev/console   Allows the user to log in remotely as root from any other workstation

CONSOLE=/dev/console     Requires the user to log in as root only from the workstation keyboard

CONSOLE=                 Requires the user to log in as another user and then use the su command to change to root

Note – An advantage of the CONSOLE= form of login control is that a log is kept of all su logins in the /var/adm/sulog file.
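For example, you could review recent su activity with a standard command such as the following (a sketch; the log entries are written by su itself):

# tail /var/adm/sulog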

Remote Display Enabling
If the cluster host systems need to display window-based applications on the administration workstation, edit the .xinitrc file and, just above the last line (wait), type xhost hostname1 hostname2.
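A minimal sketch of the end of such an .xinitrc file follows; hostname1 and hostname2 are placeholders for your cluster node names:

xhost hostname1 hostname2
wait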

Controlling rcp and rsh Access


Figure 3-2 illustrates how rlogin, rcp, and rsh requests from the cluster host systems are authenticated when you move files between systems or check status. The flowchart shows that access without a password is granted only when the originating host appears in /etc/hosts.equiv or in the target user's $HOME/.rhosts file; otherwise rlogin prompts for a password, and rcp and rsh requests are denied.

Figure 3-2   Remote Authentication Flowchart


Cluster Administration Tools Configuration

All of the necessary information needed for the cluster administration tools that run on the administration workstation is configured in two files on the administration workstation:

/etc/clusters
/etc/serialports

When the Sun Cluster client software is installed on the administration workstation, two blank files are created. You must edit the files and supply the necessary information.

Cluster Administration Interface
As shown in Figure 3-3, the cluster administration interface is composed of both hardware and software elements.

Figure 3-3   Cluster Administration Components

Administration Tool Configuration Files

The following is a typical entry in the /etc/clusters file:

sc-cluster   sc-node0 sc-node1

The single-line entry defines a cluster named sc-cluster that has two nodes named sc-node0 and sc-node1.

Note – The cluster name is purely arbitrary, but it should agree with the name you use when you install the server software on each of the cluster host systems.

The following is a typical entry in the /etc/serialports file:

sc-node0   sc-tc   5002
sc-node1   sc-tc   5003

There is a line for each cluster host that describes the name of each host, the name of the terminal concentrator, and the terminal concentrator port to which each host is attached.

For the E10000, the /etc/serialports entries for each cluster domain are configured with the domain name, the System Service Processor (SSP) name, and (always) the number 23, which represents the telnet port.

sc-10knode0   sc10k-ssp   23
sc-10knode1   sc10k-ssp   23

Note – When upgrading the cluster software, the /etc/serialports and /etc/clusters files are overwritten. You should make a backup copy before starting the upgrade.


Cluster Administration Tools


The cluster administration tools are used to manage a cluster. They provide many useful features including:
Centralized tool bar
Command line interface to each cluster host
Overall cluster status tool

The cluster administration tools are accessed by using the ccp program.

Note – The Cluster Manager tool is not discussed until a later module. It is not operational until the cluster is fully configured.

The Cluster Control Panel
As shown in Figure 3-4, the Cluster Control Panel provides centralized access to several Sun Cluster administration tools.

Figure 3-4   Cluster Control Panel

Cluster Control Panel Start-up


To start the Cluster Control Panel, type the following command:

# /opt/SUNWcluster/bin/ccp [clustername] &
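Because /opt/SUNWcluster/bin was added to the root search path earlier in this module, you can also start the tool by name. Using the example cluster defined in /etc/clusters, a typical invocation would be:

# ccp sc-cluster &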

Adding New Applications to the Cluster Control Panel


The Cluster Control Panel New Item display is available under the Properties menu and allows you to add custom applications and icons to the Cluster Control Panel display.

Cluster Console
The Cluster Console tool uses the TC to access the cluster host systems through serial port interfaces. The advantage of this is that you can connect to the cluster host systems even when they are halted. This is useful when booting the systems and is essential during initial cluster host configuration when loading the Solaris operating system.

As shown in Figure 3-5, the Cluster Console tool uses xterm windows to connect to each of the cluster host systems.

Figure 3-5   Cluster Console Windows

Manually Starting the cconsole Tool
You can use cconsole manually to connect to individual host systems or to the entire cluster; the cluster form starts windows for all cluster hosts.

/opt/SUNWcluster/bin/cconsole node0
/opt/SUNWcluster/bin/cconsole eng-cluster
/opt/SUNWcluster/bin/cconsole node3

Cluster Console Host Windows


There is a host window for each node in the cluster. You can enter commands in each host window separately. The host windows all appear to be vt220 terminals. Set the TERM environment variable to vt220 to use the arrow and other special keys.
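For example, in a Bourne or Korn shell on a node you might set the variable as follows (a sketch; C-shell users would use setenv TERM vt220 instead):

TERM=vt220
export TERM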

Cluster Console Common Window


The common window shown in Figure 3-6 lets you enter commands to all host system windows at the same time. All of the windows are tied together, so that when the Common Window is moved, the host windows follow. The Options menu allows you to ungroup the windows, move them into a new arrangement, and group them again.

Figure 3-6   Cluster Console Common Window

Cluster Console Window Variations
There are three variations of the cluster console tool that each use a different method to access the cluster hosts. They all look and behave the same way.
Cluster Console (console mode) – Access to the host systems is made through the TC interface. The Solaris operating system does not have to be running on the cluster host systems. Only one connection at a time can be made through the TC to serial port A of a cluster node, so you cannot start a second instance of cconsole for the same cluster. For E10000 domains, a telnet connection is made to the ssp account on the domain's SSP, and then a netcon session is established. The SSP name and ssp account password are requested during the cluster node software installation process.

Cluster Console (rlogin mode) – Access to the host systems is made using the rlogin command, which uses the public network. The Solaris OS must be running on the cluster host systems.

Cluster Console (telnet mode) – Access to the host systems is made using the telnet command, which uses the public network. The Solaris OS must be running on the cluster host systems.

Cluster Help Tool
The Cluster Help Tool provides comprehensive assistance in the use of all Sun Cluster administration tools. Start the Cluster Help Tool by selecting the Help Tool icon in the ccp. Figure 3-7 shows a page from the Cluster Help Tool that describes the Cluster Control Panel.

Figure 3-7   Cluster Help Tool

Exercise: Installing the Sun Cluster Client Software
Exercise objective – In this exercise you will:

Install and configure the Sun Cluster client software on an administration workstation
Configure the Sun Cluster administration workstation environment for correct Sun Cluster client software operation
Start and use the basic features of the Cluster Control Panel, the Cluster Console, and the Cluster Help tools

Review the lab objectives and classroom setup with the students.

Preparation
This lab assumes that the Solaris 2.6 operating system software has already been installed on all of the systems.

1. Remove the Cluster Name and Address Information section on page A-2 and complete the first three entries.

Note – Ask your instructor for assistance with the information.

Even though all of the names and addresses might already be in the /etc/hosts files, it is still useful for the student to have them available in hardcopy form. Regardless of how the training support software is arranged, the Sun Cluster installation script needs to know only enough of the path to locate the Sun_Cluster_2_2 directory. A typical path that you furnish to the scinstall program is /SEC/2.2/SC_2.2. Do not include the Sun_Cluster_2_2 directory as part of the path name.

Tasks
The following tasks are explained in this section:
Updating the name service
Installing OS patches
Running the scinstall utility
Setting up the root user environment
Configuring the /etc/clusters file
Configuring the /etc/serialports file
Starting the cconsole tool

Updating the Name Service


1. If necessary, edit the /etc/hosts file on the administrative workstation and add the IP addresses and names of the Terminal Concentrator and the host systems in your cluster.

Installing OS Patches
1. Install any recommended OS patches on the administrative workstation before running scinstall. 2. Reboot the administrative workstation after installing the patches.

Running the scinstall Utility
1. Log in to your administration workstation as user root.

2. Move to the Sun_Cluster_2_2/Sol_2.6/Tools directory.

Note – Either load the Sun Cluster CD-ROM or move to the location provided by your instructor.

3. Start the Sun Cluster installation script.

# ./scinstall

4. Select option 3 to install the client software.

5. If necessary, enter the path to the Sun Cluster software packages.

Note – The software needs to know only the portion of the path that is needed to locate the Sun_Cluster_2_2 directory.

6. Select the automatic mode of installation.

7. After the SUNWscch, SUNWccon, SUNWccp, SUNWcsnmp, and SUNWscsdb packages have been installed successfully, the scinstall program should display the main menu again.

8. Use options 3 and 4 to verify and list the installed Sun Cluster software.

9. Select option 5 to quit the scinstall program.

Configuring the Administration Workstation Environment

Caution – Check with your instructor before performing the steps in this section. Your login files might already be properly configured by a JumpStart operation. The steps in this section will destroy that setup.

1. Navigate to the training Scripts/ENV subdirectory.

2. Copy the admenv.sh file to /.profile on the administration workstation.

3. Exit the window system and then log out and in again as user root.

Verifying the Administration Workstation Environment


1. Verify that the following search paths and variables are present:

PATH=/sbin:/usr/sbin:/usr/bin:/opt/SUNWcluster/bin:/usr/openwin/bin:/usr/ucb:/etc:.
MANPATH=/usr/man:/opt/SUNWcluster/man:/usr/dt/man:/usr/openwin/man
EDITOR=vi

2. Start the window system again on the administration workstation.

Exercise: Installing the Sun Cluster Client Software
Conguring the /etc/clusters File
The /etc/clusters le has a single line entry for each cluster you intend to monitor. The entries are in the form: ClusterName host0name host1name host2name host3name

Sample /etc/clusters File sec-cluster 1. sec-node0 sec-node1 sec-node2

Edit the /etc/clusters le and add a line using the cluster and node names assigned to your system.

Configuring the /etc/serialports File

The /etc/serialports file has an entry for each cluster host describing the connection path. The entries are in the form:

hostname tcname tcport

Sample /etc/serialports file:

sec-node0   sec-tc   5002
sec-node1   sec-tc   5003
sec-node2   sec-tc   5004

1. Edit the /etc/serialports file and add lines using the node and TC names assigned to your system.

Note – When you upgrade the cluster software, the /etc/serialports and /etc/clusters files are overwritten. You should make a backup copy of these files before starting an upgrade.

Starting the cconsole Tool
This section provides a good functional verification of the Terminal Concentrator in addition to the environment configuration.

1. Make sure power is on for the TC and all of the cluster hosts.

2. Start the cconsole application on the administration workstation.

# cconsole clustername &

3. Place the cursor in the cconsole Common window and press Return several times. You should see a response on all of the cluster host windows. If not, ask your instructor for assistance.

Note – The cconsole Common window is useful for simultaneously loading the Sun Cluster software on all of the cluster host systems.

4. If the cluster host systems are not booted, boot them now.

ok boot

5. After all cluster host systems have completed their boot, log in as user root.

6. Practice using the Common window Group Term Windows feature under the Options menu. You can ungroup the cconsole windows, rearrange them, and then group them together again.

Configuring the Cluster Host Systems Environment

Caution – Check with your instructor before performing the steps in this section. Your login files might already be properly configured by a JumpStart operation. The steps in this section will destroy that setup.

1. Navigate to the training Scripts/ENV subdirectory on each cluster host system.

2. Copy the nodenv.sh file to /.profile on each cluster host.

3. Edit the .profile file on each cluster host system and set the DISPLAY variable to the name of the administration workstation.

4. Log out and in again as user root on each cluster host system.

Verifying the Cluster Host Systems Environment


1. Verify that the following environment variables are present on each cluster host.
PATH=/usr/opt/SUNWmd/sbin:/opt/SUNWcluster/bin:/opt/SUNWsma/bin:/opt/SUNWsci/bin:/usr/bin:/usr/sbin:/sbin:/usr/ucb:/etc:.
MANPATH=/usr/man:/opt/SUNWcluster/man:/opt/SUNWsma/man:/opt/SUNWvxvm/man:/opt/SUNWvxva/man:/usr/opt/SUNWmd/man
TERM=vt220
EDITOR=vi
DISPLAY=adminworkstation:0.0
Note If necessary, edit the .profile file on each cluster host and set the DISPLAY variable to the name of the administration workstation.

2. On each cluster host system, create a /.rhosts file that contains a plus (+) sign.
3. Edit the /etc/default/login file on each cluster host system and comment out the console login control line as follows:
#CONSOLE=/dev/console
4. On the administration workstation, type xhost + in the console window to enable remote displays.
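A minimal sketch of steps 2 and 4 as root commands (step 3 is an ordinary edit of /etc/default/login with your preferred editor):
On each cluster host:
# echo + > /.rhosts
On the administration workstation:
# xhost +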

Exercise Summary
Discussion Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.

Manage the discussion here based on the time allowed for this module, which was given in the About This Course module. If you find you do not have time to spend on discussion, then just highlight the key concepts students should have learned from the lab exercise.
• Experiences
Ask students what their overall experiences with this exercise have been. You may want to go over any trouble spots or especially confusing areas at this time.
• Interpretations
Ask students to interpret what they observed during any aspects of this exercise.
• Conclusions
Have students articulate any conclusions they reached as a result of this exercise experience.
• Applications
Explore with students how they might apply what they learned in this exercise to situations at their workplace.

Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:
• Summarize the Sun Cluster administration workstation functions
• Use the scinstall script features
• Install the client software on the administration workstation
• Set up the administration workstation environment
• Configure the Sun Cluster administration tools

Think Beyond
What is the advantage of the /etc/clusters and /etc/serialports files?
Why is the Cluster SNMP agent installed on the administrative workstation?
What is the impact on the cluster if the administrative workstation is not available? What would you do for backup?


Preinstallation Configuration
Objectives
Upon completion of this module, you should be able to:
• Configure any supported cluster topology
• List the appropriate applications for each topology
• Configure the cluster interconnect system
• Explain the need for a simple quorum device
• Estimate the number of quorum devices needed for each cluster topology
• Describe the purpose of the public network monitor feature
• Describe the purpose of a mirrored CCD volume
• Explain the purpose of the terminal concentrator node locking port

This module provides the information necessary to prepare a Sun Enterprise Cluster (SEC) system for the Sun Cluster software installation.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion The following question is relevant to your learning the material presented in this module:
1. Why is so much preinstallation planning required for an initial software installation?

Additional Resources
Additional resources The following references can provide additional details on the topics discussed in this module:
• Sun Cluster 2.2 System Administration Guide, part number 805-4238
• Sun Cluster 2.2 Software Installation Guide, part number 805-4239
• Sun Cluster 2.2 Cluster Volume Manager Guide, part number 805-4240


Cluster Topologies
You can configure a Sun Enterprise Cluster system in several ways, called topologies. Topology configurations are determined by the types of disk storage devices used in a cluster and how they are physically connected to the cluster host systems. To be a supported configuration, all disk storage devices must have dual-ported system access at a minimum. You can use most of the topologies in different application environments, but some lend themselves better to particular applications.

Clustered Pairs Topology
The clustered pairs configuration, shown in Figure 4-1, is essentially a pair of two-node clusters that function independently of one another but can be managed as a single cluster. This configuration is easy to administer but the Sun Cluster software treats it as a single four-node cluster when a planned or unplanned reconfiguration happens. This causes service disruption on all of the nodes during a cluster reconfiguration.
Figure 4-1  Clustered Pairs Topology Configuration (diagram: Node 0 through Node 3 on a common CIS switch or hub, with each pair of nodes dual-attached to its own dual-ported storage arrays)

Target Applications
The clustered pairs configuration is suitable for applications in which there are two highly available data services that depend on one another. The HA-SAP application can effectively use the clustered pairs topology.


The HA-SAP application depends on a HA-DBMS application.

Ring Topology
The Ring topology, shown in Figure 4-2, allows each node to assume the workload of either of two neighbors, should that neighbor fail.
Figure 4-2  Ring Topology Configuration (diagram: Node 0 through Node 3 arranged in a ring on a common CIS switch or hub)

Target Applications
The ring topology configuration is used by Highly Available Data Service (HADS) installations. It allows a great deal of flexibility when selecting the backup node for a particular data service.
Note For clarity, mirror devices are not shown in the Figure 4-2 drawing.
  

With the ring configuration, each node has two choices for a data service backup. This configuration also allows the use of older storage arrays in a four-node cluster. Configuring data services in a ring topology can be very complex. Data services are most commonly configured in a two-node cluster.

N+1 Topology
The N+1 topology, shown in Figure 4-3, provides one system to act as the backup for every other system in the cluster. All of the secondary paths to the storage devices are connected to the redundant or N+1 system, which can be running a normal workload of its own. When any cluster system fails, it always fails over to the N+1 system. The N+1 system must be large enough to support the workload of any of the other systems in the cluster.
Figure 4-3  N+1 Topology Configuration (diagram: four nodes on a common CIS switch or hub; each of three storage arrays is dual-attached to its primary node and to the single N+1 backup node)

Target Applications
The N+1 topology is used by Highly Available Data Service (HADS) installations. Using this configuration, the backup node can take over without any performance degradation and the backup node is more cost effective because it does not require dedicated data storage.


In a HADS application, if a node fails, all of its data services must migrate to the designated backup node. If there is more than one node failure, the backup node can be overloaded.

Shared-Nothing Topology
The shared-nothing configuration, shown in Figure 4-4, requires only a single connection to each disk storage unit.
Figure 4-4  Shared-Nothing Topology Configuration (diagram: Node 0 through Node 3 on a common CIS switch or hub, each with a single connection to its own storage array)

Target Applications
The shared-nothing configuration is used only by the Informix Online XPS database. The following limitations apply:
• There are no failover capabilities in the event of a node failure

Although dual-ported disk storage is shown, the second port on each array would not be used in this configuration.

Scalable Topology
The scalable topology configuration features uniform access to storage array data from all nodes in a cluster. This configuration, shown in Figure 4-5, must use the A5000 storage arrays, which allow simultaneous connections from up to four nodes.
Figure 4-5  Scalable Topology Configuration (diagram: Node 0 through Node 3 on a common CIS switch or hub, all connected to the same A5000 storage array through connections a0, a1, b0, and b1)

Target Applications
The scalable topology can be used by all Sun Enterprise Cluster applications including: Oracle Parallel Server, Informix Online XPS, and all high availability databases and data services.


Very large data storage configurations are possible with the A5000 storage arrays.


Cluster Quorum Devices


Quorum devices are designated storage devices that are used to prevent data corruption in a cluster when a node suddenly crashes or the cluster interconnect fails. A quorum device can be either a single array disk or an array controller, but it must be a device that is physically connected to at least two cluster hosts. Using a quorum device is one of several cluster strategies to help ensure that storage array data is not corrupted by a failed cluster host system. Quorum devices must be configured during the Sun Cluster software installation. Depending on your topology, you will be asked to define from one to as many as four quorum devices.
Note Quorum disk drives are not used in Solstice DiskSuite (SDS) installations. SDS uses a different method to ensure data integrity.

Disk Drive Quorum Device
As shown in Figure 4-6, when there is a complete cluster interconnect failure in a two-node cluster, each node assumes the other is in an uncontrolled state and must be prevented from accessing data. Both nodes try to perform a Small Computer Systems Interface (SCSI) reservation of the designated quorum device. The first node to reserve the quorum disk remains as a cluster member. In a two-node cluster, the node that fails the race to reserve the quorum device will abort the Sun Cluster software. The quorum device information is stored in a cluster database file that is configured when the Sun Cluster software is installed.
Figure 4-6  Quorum Disk Drive (diagram: Node 0, which records the quorum device as c3t0d0 in its cluster database, and Node 1, which records it as c2t0d0, are joined by the cluster interconnect and both race to reserve the quorum disk Q in the shared disk storage array)

Note The physical device paths can be different for each attached host system, as shown in Figure 4-6.

Array Controller Quorum Device
As shown in Figure 4-7, you can also configure an array controller as the quorum device during the Sun Cluster software installation. The race for the quorum controller is the same as for the quorum disk drive. The losing node must abort the Sun Cluster software. The worldwide number (WWN) of the array controller is used during the SCSI reservation instead of the disk drive physical path.
Figure 4-7  Quorum Array Controller (diagram: Node 0 and Node 1, each with the controller WWN recorded as the quorum device in its cluster database, race across the cluster interconnect to reserve Controller A of the shared disk storage array)

Caution If an array controller is used as the quorum device, the array must not contain any disks that are private to one of the nodes, such as a boot disk. When the controller is reserved by one node, the other node can no longer access its private disk.

Quorum Device in a Ring Topology
Quorum devices are shared between nodes that can both master the device. As shown in Figure 4-8, the potential resource masters are more complicated in the ring topology.
Figure 4-8  Potential Masters in a Ring Topology (diagram: Node 0, Node 1, and Node 2, with Resource 1, Resource 2, and Resource 3 each shared by a different pair of nodes)
• Node 0 and Node 2 can master Resource 1
• Node 0 and Node 1 can master Resource 2
• Node 1 and Node 2 can master Resource 3

Note During the Sun Cluster software installation on a ring topology, you will be asked to select a quorum device for each pair of resource masters.

Quorum Device in a Scalable Topology
If your cluster is configured in a scalable topology that uses direct attach storage arrays, such as the StorEdge A5000, you will be asked to configure a single quorum disk during the Sun Cluster software installation. This quorum disk is not used in the same manner as on dual-ported storage arrays and is not really required. With the appearance of three- and four-node clusters using direct access storage devices, such as the A5000, all nodes have equal access to the disk drives. The SCSI reservation feature cannot be applied to multiple nodes. Only one node can exclusively reserve the storage array, but as many as three surviving nodes need access to the disks. The scalable topology instead uses the terminal concentrator in a scheme called failure fencing to ensure that a suspect node is prevented from corrupting data.

Failure Fencing and Node Locking


If your cluster has more than two nodes and uses direct attach storage devices, such as the StorEdge A5000, you are asked to select a node locking port during the Sun Cluster software installation. You are asked for the port number of an unused terminal concentrator port.
Does the cluster have a disk storage device that is connected to all nodes in the cluster [no]? yes
Which unused physical port on the Terminal Concentrator is to be used for node locking: 6

As shown in Figure 4-9, the solution to the failure fencing problem is to use the Terminal Concentrator to issue an abort sequence to the failed node, which ensures it cannot damage disk resident data.
Figure 4-9  Node Locking Overview (diagram: the node holding the cluster lock uses the TC, reached over the Ethernet, to abort Node 3 after its interconnect failure; all four nodes connect to the interconnect hub and to the direct attached storage)

The node that issues the abort is the one that first obtained a cluster lock by locking an unused TC port using the telnet utility. The cluster lock is obtained by the first node that joins the cluster. It can be transferred to another functional node in the event of a public network failure. The cluster lock is always owned by the node with the lowest Node ID.
Note The node locking function is used to prevent an operator from accidentally starting a second cluster on a node that has been partitioned by a complete cluster interconnect failure.


Cluster Interconnect System Overview


All Sun Enterprise Cluster installations must have a CIS, a dedicated high-speed interconnect system, to enable constant communication. The nodes need to exchange status and configuration information, as well as provide a fast, efficient path for some types of application data.
Note During the Sun Cluster software installation, the CIS is also referred to as the private network interface.

Interconnect Types
You can use two types of interconnect systems:
• 100base-T Ethernet-based interconnect
• Scalable Coherent Interface (SCI) interconnect

The SCI interconnect has a 100 Mbyte/sec bandwidth and low latency that is needed by the OPS and XPS parallel database applications.

Interconnect Configurations
Depending on the number of cluster host systems, you can use two interconnect configurations:
• Point-to-point for two-node clusters
• Interconnect hubs or switches for clusters with more than two nodes

Note You can implement both configurations with Ethernet or SCI hardware.
Figure 4-10 demonstrates the basic interconnect configuration rules in which corresponding physical interfaces are connected and redundant to avoid a cluster-wide single point of failure.
Figure 4-10  Basic Interconnect Configuration (diagram: hme0 on Node 0 connected to hme0 on Node 1, and hme1 connected to hme1, with each interface on a separate system board)

Note Ethernet crossover cables are required for the Ethernet-based point-to-point cluster interconnect.


Cluster Interconnect System Configuration


Before installing the Sun Cluster software on the cluster host systems, the CIS must be cabled correctly. This involves identifying the primary and backup interfaces, as well as making the appropriate connections. When the cluster software starts on each node, the lowest numbered physical interconnect interfaces are activated first and are considered the primary interfaces. The primary interfaces should all be connected to the same hub or switch. Any node that violates this configuration cannot join the cluster because it will attempt to communicate through the wrong hub or switch. Only one interface is used at a time. When CIS failure is detected on any node in the cluster, all nodes switch to their backup interconnects.

Cluster Interconnect Addressing
During the Sun Cluster software installation, fixed IP addresses are assigned to each interconnect interface in the cluster. Each interconnect pair on a node is also assigned a virtual IP address. This is used to switch from one interconnect to the backup in the event of failure. The physical and virtual addresses are shown in Figure 4-11. The addresses are licensed to Sun Microsystems for their exclusive use.
First Node    hme0/scid0  204.152.65.1   hme1/scid1  204.152.65.17   virtual  204.152.65.33
Second Node   hme0/scid0  204.152.65.2   hme1/scid1  204.152.65.18   virtual  204.152.65.34
Third Node    hme0/scid0  204.152.65.3   hme1/scid1  204.152.65.19   virtual  204.152.65.35
Fourth Node   hme0/scid0  204.152.65.4   hme1/scid1  204.152.65.20   virtual  204.152.65.36
Figure 4-11  Cluster Interconnect Address Assignments

Note The Sun Cluster 2.2 Software Installation Guide states, in chapter 3, that the three interconnect addresses for each active node must be placed in a .rhosts file on each node in the root (/) directory. This is only for use by the new hadsconfig utility that can be used to configure or edit the Netscape http, news, and mail data services. It can also be used to modify the HA-DNS data service.
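For illustration only, the /.rhosts entries for a two-node cluster would simply list the six interconnect addresses from Figure 4-11, one per line:
204.152.65.1
204.152.65.17
204.152.65.33
204.152.65.2
204.152.65.18
204.152.65.34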

Point-to-Point Connections
Point-to-point connections are used for two-node clusters only and can be Ethernet or SCI based. You must ensure that similar logical interfaces are connected. As shown in Figure 4-12, the logical interface hme0 on Node 0 must be physically connected to the logical interface hme0 on Node 1. The same is true for the logical hme1 interfaces.
Figure 4-12  Cluster Interconnect High Availability Cabling (diagram: the hme0 or scid0 interfaces form the primary link and the hme1 or scid1 interfaces form the backup link between Node 0 and Node 1, with each interface on its own system board)

Caution Any time a SCI component is moved or replaced, you must run the /opt/SUNWsma/bin/sm_config script or your system will not function reliably.
To avoid a single point of failure, there are always two separate interfaces on each node for the CIS or private network. If any cluster member loses communication through the cluster interconnect, all cluster members switch to their backup interfaces to try and reestablish communication with one another. The same rules apply when using the SCI interfaces.

SCI High-Speed Switch Connection
If a three- or four-node cluster uses the high-speed SCI switch configuration, shown in Figure 4-13, care must be taken when initially connecting the cables. If it is not clear which node is connected to a particular switch, mistakes can be made during problem resolution attempts.
• Node 0/scid0 should be connected to port 0 of switch 0
• Node 0/scid1 should be connected to port 0 of switch 1
• The same scheme should be used for all other nodes in the cluster
Figure 4-13  High-Speed SCI Switch Interconnect (diagram: the scid0 interface of each node connects to Switch 0 and the scid1 interface to Switch 1, using switch ports 0 through 3)

Note Trying to predict which SCI cards are configured as 0 and 1 is difficult. The best indicator that you can use before you install the software is to see what the reset command shows at the ok prompt. The first card to show in the listing is scid0.


You can use the SCI switch configuration in a two-node cluster if you plan to add more nodes in the future.

SCI Card Identification
To determine the version of the SCI cards, type reset at the ok prompt. This causes the SCI cards to run an internal self-test.
0} ok reset
Resetting ...
DOLPHIN SBus-to-SCI (SBus2b) Adapter - 9029, Serial #6342
FCode 9029 $Revision: 2.14 $ - d9029_61 $Date: 1997/08/19 13:41:23
Executing SCI adapter selftest. Adapter OK.

DOLPHIN SBus-to-SCI (SBus2b) Adapter - 9029, Serial #6340
FCode 9029 $Revision: 2.3 $ - d9029_52 $Date: 1996/10/30 07:47:53
Executing SCI adapter selftest. Adapter OK.

Note The SCI cards must be the current SBus2b version or they will not function correctly.

SCI Card Self-Test Information


The SCI card self-test routines display several important pieces of information that you should record for each system. You can determine the following from the self-test:
• There are two cards in the system and they are basically functional
• Both cards are the SBus2b model
• The first card to appear is addressed as scid0
• The second card to appear is addressed as scid1
• The cards have the serial numbers 6342 and 6340

Note There are serial number tags on the face of each SCI card.

SCI Card Scrubber Jumpers
As shown in Figure 4-14, each SCI interface card has a scrubber jumper that enables the card to perform link maintenance functions. This jumper needs to be set either on or off depending on your cluster configuration.
Figure 4-14  SCI Card Scrubber Jumper Location (diagram: the scrubber jumper on the card face, with its on and off positions)

Two-Node SCI Interconnect Jumper Settings


In a two-node point-to-point configuration, only one scrubber jumper per link is set to on.
Figure 4-15  SCI Card Scrubber Jumper Configuration (diagram: for each link, the scrubber jumper is on at the Node 0 end and off at the Node 1 end)

Note All SCI card scrubber jumpers must be set to on in three- and four-node interconnect configurations.

Ethernet Hub Connection
If a three- or four-node cluster uses the Ethernet hub configuration, shown in Figure 4-16, the interfaces must be connected to the Ethernet hubs as follows:
• All hme0 Ethernet interfaces must connect to Hub 0
• All hme1 Ethernet interfaces must connect to Hub 1
Figure 4-16  Ethernet Hub Interconnect (diagram: the hme0 interface of each of the four nodes connects to Hub 0 and the hme1 interface connects to Hub 1)

The Ethernet cluster interconnect interfaces must meet the following requirements:
• They are dedicated exclusively for use by the CIS software
• You cannot use the primary system interface
• Do not place the primary and backup interfaces on the Quad Ethernet cards
• Only 100base-T Ethernet cards and hubs are supported

Ethernet Card Identification
It is difficult to determine which physical address is assigned to a specific Ethernet card. The OpenBoot PROM firmware creates a device tree when the system is powered on and that tree is used by the Solaris operating system to assign physical interface numbers. The rules about hardware address assignment are complex and are different for virtually every hardware platform. Few individuals have sufficient hardware knowledge to accurately predict the physical address for a given Ethernet interface. Most people perform OpenBoot PROM testing on Ethernet interfaces while attaching them to an active network. When the interface that is being tested finally passes the firmware tests, you can make an appropriate notation on a hardware diagram.
Note On Exx00 systems that have the operating system installed and the cluster is not active, you can use ifconfig -a plumb to plumb all interfaces and help identify what is available. You can then use the ifconfig -a command to list the plumbed interfaces. This is a dangerous practice after the cluster is configured and should only be used prior to configuring the Sun Cluster software.
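For example, the check described in the note is just these two root commands, run only on a node where the Sun Cluster software is not yet configured:
# ifconfig -a plumb
# ifconfig -a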


Public Network Management


The Public Network Management software creates and manages designated groups of local network adapters. The PNM software is a Sun Cluster package that provides IP address and adapter failover within a designated group of local network adapters. It is designed for use in conjunction with the HA data services. By itself, it has limited functionality. The network adapter failover groups are commonly referred to as NAFO groups. If a cluster host network adapter fails, its associated IP address is transferred to a local backup adapter. A NAFO group can consist of any number of network adapter interfaces but usually contains only a few. Note This discussion of NAFO groups is to eliminate confusion during the Sun Cluster software installation on the cluster host systems. It is not intended as a complete NAFO group lecture.

As shown in Figure 4-17, the PNM daemon (pnmd) continuously monitors designated network adapters on a single node. If a failure is detected, pnmd uses information in the cluster configuration database (ccd) and the pnmconfig file to initiate a failover to a healthy adapter in the backup group.
Figure 4-17  Public Network Management Components (diagram: on each node, pnmd monitors the primary adapter of the local NAFO group, for example nafo12 on Node 0 and nafo7 on Node 1, and uses ifconfig to bring up the backup adapter; it draws on the NAFO group IP address stored in the ccd and the NAFO group configuration in /etc/pnmconfig)

PNM Configuration
You use the pnmset command to configure a network adapter backup group. The following shows the process of creating two separate backup groups. With pnmset, you can create all of the NAFO backup groups at the same time, or create them one at a time.
# /opt/SUNWpnm/bin/pnmset
In the following you will be prompted to do configuration for network adapter failover
do you want to continue...[y/n]: y
How many PNM backup groups on the host: 2
Enter backup group number: 7
Please enter all network adapters under nafo7
qe1 qe0
Enter backup group number: 12
Please enter all network adapters under nafo12
hme0

You can assign any number you wish to a NAFO group. However, you cannot have more than 255 NAFO groups on a system. The groups are given the name nafo followed by the number you furnish during the configuration. Typical group names are nafo7 or nafo12.
Note During the Sun Cluster software installation, you are asked if you want to configure public networks and NAFO groups. You do not have to do this at this time. You can configure them after the installation.
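Once the groups exist, you can list them to confirm the configuration. This sketch assumes the pnmstat status command is delivered in the same directory as pnmset; check the SUNWpnm package contents on your system:
# /opt/SUNWpnm/bin/pnmstat -l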


Shared CCD Volume


Each node in a cluster has an identical copy of the cluster configuration database file called ccd.database (CCD). When a node tries to join the cluster, its CCD must be consistent with the current cluster member or it cannot join. If there is a CCD inconsistency in a two-node cluster, there is no way to establish a majority opinion about which CCD is correct. Another problem is that when only one node is in the cluster you cannot modify the CCD database.
Note The shared CCD volume is not supported in Solstice DiskSuite cluster installations.

In a two-node cluster, you can create a third CCD database that is resident on the storage arrays. It is in a private disk group that is highly available. The disk group is imported to the assigned backup node when necessary.
Figure 4-18  Array-Resident ccd Database Configuration (diagram: each node keeps its own ccd.database copy, while a ccd primary volume and a ccd mirror volume reside on separate shared mass storage units reachable through the I/O interfaces of both nodes)

In the case of a single node failure, there are two CCD files to compare to ensure integrity.
Note The shared CCD can be used only with a two-node cluster.

Shared CCD Volume Creation
You can create the shared CCD volume either by replying yes to its creation during the scinstall process or by using the confccdssa script after installation. The confccdssa script, found in the /opt/SUNWcluster/bin directory, sets up a shared disk array-resident copy of the CCD that is mirrored. In either case, you are asked to dedicate two storage array drives for the mirrored CCD volume. If you specified a shared CCD during scinstall processing, you must run the confccdssa command after the Sun Cluster software installation has completed.
To create a shared CCD:
1. Identify two entire drives on shared disk storage to contain the CCD data. For reliability, they should be on two different storage devices.
2. If you did not reply yes to the shared CCD question during scinstall processing, run the following command on both nodes in the cluster:
# scconf clustername -S ccdvol
3. Run confccdssa on only one node.
# confccdssa
The confccdssa program furnishes a list of disk drives available for use. They do not yet belong to any disk group. After you select a pair of drives, the program creates a special disk group and volume for CCD use. The disks cannot be used for any other purpose.

Disabling a Shared CCD


To disable shared CCD operation, run: # scconf clustername -S none


Cluster Configuration Information


Before you install the Sun Cluster software, you should record the general system configuration information. This information can be useful if there are any problems during the software installation. Some of the utilities that you can use are:
• prtdiag
• finddevices
• luxadm

Using prtdiag to Verify System Configuration
The prtdiag command furnishes general information about your system configuration, but detailed analysis of the output requires considerable hardware knowledge. The following portion of the prtdiag output shows some basic system configuration information.
# /usr/platform/sun4u/sbin/prtdiag
System Configuration:  Sun Microsystems  sun4u 8-slot Sun Enterprise 4000/5000
System clock frequency: 84 MHz
Memory size: 512Mb

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     0      0     168     0.5   US-I    2.2
 0     1      1     168     0.5   US-I    2.2
 2     4      0     168     0.5   US-I    2.2
 2     5      1     168     0.5   US-I    2.2

========================= Memory =========================

                                               Intrlv.  Intrlv.
Brd   Bank   MB    Status   Condition  Speed   Factor   With
---  -----  ----  -------  ---------  -----   -------  -------
 0     0    256   Active       OK      60ns    2-way      A
 2     0    256   Active       OK      60ns    2-way      A

From the output you can determine that:
• The system is an 8-slot E4000/5000 system
• There are CPU/Memory boards in slots 0 and 2
• There is a total of 512 Mbytes of system memory

The following portion of the prtdiag output shows more detailed information about the system's interface board configuration.

========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot  Name                  Model
---  ----  ----  ----  --------------------  --------------
 1   SBus   25     0   DOLPHIN,sci           SUNW,270-2450
 1   SBus   25     1   qec/be (network)
 1   SBus   25     2   QLGC,isp/sd (block)   QLGC,ISP1000U
 1   SBus   25     3   SUNW,hme
 1   SBus   25     3   SUNW,fas/sd (block)
 1   SBus   25    13   SUNW,soc/SUNW,pln     501-2069

Detached Boards
===============
Slot  State     Type  Info
----  --------  ----  -----------------------------
  7   disabled  disk  Disk 0: Target: 14  Disk 1: Target: 15

From the output you can determine that:
• There is an I/O board in slot 1
• There are four option cards on the I/O board
  - A SCI card in option slot 0
  - A quad-Ethernet card in option slot 1
  - An intelligent SCSI interface card in option slot 2
  - An SOC optical module in option slot 13
• There is a disk board in slot 7

Interpreting prtdiag Output
Interpreting prtdiag command output in more detail requires additional information about the internal structure of the I/O boards. Figure 4-19 can help you understand the prtdiag output of several of the Ultra-based systems.
Figure 4-19  Ultra 3000/4000/5000/6000 I/O Board Configuration (diagram: SBus card slots 0, 1, and 2, the FEPS in slot 3 providing the TPE 10/100 and Fast/Wide SCSI connections, and the SOC module in slot 13 (d) providing the two FCOM connections, divided between the two onboard buses, SBus 0 and SBus 1)

Identifying Storage Arrays
The finddevices and luxadm commands are useful for identifying the storage arrays attached to a cluster system.

The finddevices Script Output


The /opt/SUNWcluster/bin/finddevices script is a standard feature of cluster software. It displays the controller number and the 12-digit array worldwide number for all storage arrays except the A5000. It does not display non-array controller information.
# /opt/SUNWcluster/bin/finddevices
c2:00000078BF60
c3:00000078B12D
c4:00000078BF9E

The luxadm Utility Output


The luxadm command is a standard Solaris Operating System command. It can display information about any supported storage array. The probe option is used only for A5000 storage arrays.
# luxadm probe
Found
SENA  Name:d  Node WWN:5080020000011df0
  Logical Path:/dev/es/ses0
  Logical Path:/dev/es/ses1
SENA  Name:a  Node WWN:50800200000291d8
  Logical Path:/dev/es/ses2
  Logical Path:/dev/es/ses3


Storage Array Firmware Upgrades


Before you upgrade the firmware in any storage array, you should first perform careful research. Upgrading the firmware revision of a storage array is not a single process. The following two components are upgraded when a storage array firmware patch is installed:
• The Solaris Operating Environment storage array driver software
• The storage array firmware revision

You must complete the process. If a new system driver is installed but the related array firmware is not downloaded, you can create a mismatch between the Solaris Operating Environment storage array driver and the array firmware that makes the storage array unavailable. Recovering from this problem can result in a considerable amount of cluster downtime. If you are contemplating array firmware upgrades, it is a good idea to get assistance from your authorized Sun field representative.

Array Firmware Patches
It is easy to underestimate the complexity of updating array firmware levels. The following example demonstrates what might be involved if you are considering updating the firmware on your Sun StorEdge A5000 storage arrays.
• Different patches are required for different Solaris Operating Environment versions
• Different patches are required for different types of host system interface cards (SBus and PCI based cards)
• Patches can be required for the storage array disk drives

Typical Sun StorEdge A5000 Patches


If your A5000 arrays are connected to SBus-based SOC+ cards and your system is running the Solaris 2.6 operating system, you might need to install all of the following patches:
• Patch 103346 to update CPU and I/O firmware, but only on Sun Enterprise I/O boards with onboard FCAL
• Patch 105356 to update the Solaris 2.6 Operating Environment /kernel/drv/ssd driver
• Patch 105357 to update the Solaris 2.6 Operating Environment /kernel/drv/ses driver
• Patch 105375 to update the sf and socal drivers, array interface card firmware, and array controller board firmware. If the version number of the firmware is too low, you must first install the 105375-04 version of the patch to bring it up to revision 1.03.
• Patch 106129 to update the A5000 disk drive firmware

Note All of the above patches must be installed in a certain order. For more information and comprehensive instructions, see the README notes for patch 105375.
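Before planning an upgrade, it is worth checking which of these patches are already installed on each node; for example:
# showrev -p | grep 105375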

Exercise: Preinstallation Preparation
Exercise objective In this exercise you will do the following:
• Select and configure a cluster topology
• Estimate the number of quorum devices needed
• Verify that the cluster interconnect is correctly cabled
• Select an appropriate terminal concentrator node locking port

Preparation
To begin this exercise, your cluster should be in the following state:
• You are connected to the cluster hosts through the cconsole tool
• The cluster host systems have been booted and you are logged into them as user root

  

Rather than the students blundering around trying to figure out what kind of cluster configuration they need, discuss their target system configuration. Your lab equipment might be any number of models and configurations. You have to assign systems based on their interest (HA two-node, PDB 3-node, SCI interconnect). Help the students identify the topology they will use on their assigned cluster. Have them record this in the cluster topology section of this exercise.

Tasks
The following tasks are explained in this section:
• Cluster topology
• Quorum device configuration
• SCI or Ethernet cluster interconnect configuration
• Node locking configuration for direct attach storage arrays

Cluster Topology
1. Record the desired topology configuration of your cluster.
   Topology configuration: __________
   Number of nodes: __________
   Number of storage arrays: __________
   Types of storage arrays: __________
2. Verify that the storage arrays in your cluster are connected in your target topology. Recable the storage arrays if necessary.

Quorum Device Configuration


1. Record the estimated number of quorum devices you must configure during the cluster host software installation.
   Estimated number of quorum devices: __________

Note Please consult with your instructor if you are not sure about your quorum device configuration.

Ethernet Cluster Interconnect Configuration
Point-to-Point Ethernet Interconnect
Skip this section if your cluster interconnect is not a point-to-point Ethernet configuration.
1. Ask your instructor for assistance in determining the logical names of your cluster interconnect interfaces.
2. Complete the form in Figure 4-20 if your cluster uses an Ethernet-based point-to-point interconnect configuration.
   Node 0  First Ethernet interface: __________   Second Ethernet interface: __________
   Node 1  First Ethernet interface: __________   Second Ethernet interface: __________
Figure 4-20  Ethernet Interconnect Point-to-Point Form

Hub-Based Ethernet Interconnect
Skip this section if your cluster interconnect is not an Ethernet interconnect with hubs configuration.
1. Complete the form in Figure 4-21 if your cluster uses an Ethernet-based cluster interconnect with hubs.
   Node 0  First Ethernet interface: __________   Second Ethernet interface: __________
   Node 1  First Ethernet interface: __________   Second Ethernet interface: __________
   Node 2  First Ethernet interface: __________   Second Ethernet interface: __________
   (First interfaces connect to Hub 0; second interfaces connect to Hub 1.)
Figure 4-21  Ethernet Interconnect With Hubs Form
2. Verify that each Ethernet interconnect interface is connected to the correct hub.

Note If you have any doubt about the interconnect cabling, consult with your instructor now. Do not continue this lab until you are confident that your system is cabled correctly.

SCI Cluster Interconnect Configuration
Point-to-Point SCI Interconnect
Skip this section if your cluster interconnect is not a SCI point-to-point configuration.
1. Using the cconsole common window, halt each of your cluster host systems. Type the init 0 command to halt the systems.
2. Type a reset command at the ok prompt on all cluster hosts.
3. Use the scroll bar on each cconsole host window and review the information from the first and second SCI card self-test of each node in the cluster.
Note The SCI card self-tests might repeat twice on some systems, so be careful to start recording at the first self-test output.
4. Complete the form in Figure 4-22 if your cluster uses a SCI-based point-to-point CIS configuration.
   Node 0  First self-test serial number (scid0): __________   Second self-test serial number (scid1): __________
   Node 1  First self-test serial number (scid0): __________   Second self-test serial number (scid1): __________
Figure 4-22  SCI Interconnect Point-to-Point Form

SCI Interconnect with Switches
Skip this section if your cluster interconnect is not a SCI interconnect with switches configuration.
1. Using the cconsole common window, halt each of your cluster host systems. Type the init 0 command to halt the systems.
2. Type a reset command at the ok prompt on all cluster hosts.
3. Use the scroll bar on each cconsole host window and review the information from the first and second SCI card self-test of each node in the cluster.
Note The SCI card self-tests might repeat twice on some systems, so be careful to start recording at the first self-test output.
4. Complete the SCI interconnect configuration form in Figure 4-23.

   Node 0  First self-test serial number (scid0): __________   Second self-test serial number (scid1): __________
   Node 1  First self-test serial number (scid0): __________   Second self-test serial number (scid1): __________
   Node 2  First self-test serial number (scid0): __________   Second self-test serial number (scid1): __________
   (The scid0 interfaces connect to Switch 0, ports 0 through 3; the scid1 interfaces connect to Switch 1, ports 0 through 3.)
Figure 4-23  SCI Interconnect With Switches Form
5. Each SCI card has a serial number tag on its face. Verify that each serial number is connected to the proper switch and port number.

Note If you have any doubt about the SCI interconnect cabling, please consult with your instructor now. Do not continue this lab until you are confident your system is cabled correctly.

Node Locking Configuration
Although the TC might have been set up prior to this exercise, the unused TC port that is used for node locking must be correctly configured.
1. Verify that serial port 6 on the TC is not connected to a cluster host system.
2. If serial port 6 is in use on the TC, record the port number of a TC port that is not in use.
   TC Locking Port: _______
Note Do not use serial port 1. It is a special purpose port and does not work for node locking.
3. Boot each of your cluster host systems if they are halted.
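If you are unsure whether a TC serial port already has a host console attached, one informal check is to telnet to the corresponding TCP port; port 6 maps to 5006 under the 500x convention used in /etc/serialports, and sec-tc below is just the sample TC name from the earlier exercise:
# telnet sec-tc 5006
If a cluster host console responds when you press Return, the port is in use. Press Control-] and type quit to close the telnet session.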

Exercise Summary
Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.


Manage the discussion here based on the time allowed for this module, which was given in the About This Course module. If you find you do not have time to spend on discussion, then just highlight the key concepts students should have learned from the lab exercise.
• Experiences
Ask students what their overall experiences with this exercise have been. You may want to go over any trouble spots or especially confusing areas at this time.
• Interpretations
Ask students to interpret what they observed during any aspects of this exercise.
• Conclusions
Have students articulate any conclusions they reached as a result of this exercise experience.
• Applications
Explore with students how they might apply what they learned in this exercise to situations at their workplace.

Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:
• Configure any supported cluster topology
• List the appropriate applications for each topology
• Configure the cluster interconnect system
• Explain the need for a simple quorum device
• Estimate the number of quorum devices needed for each cluster topology
• Describe the purpose of the public network monitor feature
• Describe the purpose of a mirrored CCD volume
• Explain the purpose of the terminal concentrator node locking port

Think Beyond
What additional preparation might be necessary before installing the Sun Cluster host software?


Cluster Host Software Installation


Objectives
Upon completion of this module, you should be able to:
• Install the Sun Cluster host system software
• Correctly interpret configuration questions during Sun Cluster software installation on the cluster host systems
• Perform post-installation configuration

This module reviews the process of installing and configuring the Sun Cluster software on each of the cluster host systems. Background information is furnished that will enable you to correctly interpret the installation questions.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion The following questions are relevant to understanding this module's content:
1. What configuration issues might control how the Sun Cluster software is installed?
2. What type of post-installation tasks might be necessary?
3. What other software might you need to finish the installation?

Additional Resources
Additional resources The following references can provide additional details on the topics discussed in this module:
• Sun Cluster 2.2 System Administration Guide, part number 805-4238
• Sun Cluster 2.2 Software Installation Guide, part number 805-4239


Sun Cluster Server Software Overview


The Sun Cluster server software packages include both the general cluster framework software and support for a number of optional configurations. The software types include:
• Sun Cluster server packages (basic framework)
• SCI interconnect system support
• Solstice DiskSuite support (driver, mediator)
• HA data services support
• HA databases support
• Oracle Parallel Server support (UNIX Distributed Lock Manager (UDLM))

Before upgrading to a new Sun Cluster release, you must obtain the most recent copies of the Sun Cluster software installation manual and the Sun Cluster product release notes. The upgrade process can be complex and must be performed according to published procedures.

As shown in Figure 5-1, the Sun Cluster server software is installed on each of the cluster host systems along with the appropriate volume management software.
Figure 5-1  Cluster Software Distribution (diagram: the administration workstation runs Solaris 2.6/7 and the Sun Cluster client software; Node 0 and Node 1 each have a private disk and run Solaris 2.6/7, the Sun Cluster server software, and volume management software, with both nodes attached to the shared disk storage arrays and all systems sharing the network)

Server Package Set Contents
The server package sets contain the following packages, with the descriptions reported by the pkginfo command:

Sun Cluster Framework Packages


The following packages are loaded in all installations:
• SUNWscman  Sun Cluster Man Pages
• SUNWccd  Sun Cluster Configuration Database
• SUNWsccf  Sun Cluster Configuration Database
• SUNWcmm  Sun Cluster Membership Monitor
• SUNWff  Sun Cluster FailFast Device Driver
• SUNWmond  Sun Cluster Monitor - Server Daemon
• SUNWpnm  Sun Cluster Public Network Management
• SUNWsc  Sun Cluster Utilities
• SUNWsclb  Sun Cluster Libraries

Sun Cluster SCI Interconnect Support


The following packages are loaded only if your cluster has a SCI-based cluster interconnect system (private network):
• SUNWsci  Sun Cluster SCI Driver
• SUNWscid  Sun Cluster SCI Data Link Provider Interface (DLPI) Driver
• SUNWsma  Sun Cluster Switch Management

Solstice DiskSuite Support
The following Solstice DiskSuite (SDS) support packages are loaded only if you select SDS as your volume manager during the Sun Cluster software installation:
• SUNWdid  Disk Identification (ID) Pseudo Device Driver
• SUNWmdm  Solstice DiskSuite (Mediator)

Highly Available Data Service Support


The following data service support packages are loaded only if you select their related data service during the Sun Cluster software installation:
• SUNWscds  Sun Cluster Highly Available Data Service Utility
• SUNWscdns  Sun Cluster Highly Available DNS
• SUNWschtt  Sun Cluster Highly Available Netscape Web Service
• SUNWsclts  Sun Cluster Highly Available Service For LOTUS
• SUNWscnew  Sun Cluster Netscape News Service
• SUNWscnsl  Sun Cluster Highly Available Netscape Directory Server
• SUNWscnsm  Sun Cluster Netscape Mail Service
• SUNWscpro  Sun Cluster Internet Pro Common Files
• SUNWscsap  Sun Cluster Highly Available Service For SAP R3
• SUNWsctiv  Sun Cluster Highly Available Service for Tivoli

Highly Available Database Support
The following highly available database support packages are only loaded if you select their related database during the Sun Cluster software installation:
• SUNWscor  Sun Cluster Highly Available Oracle
• SUNWscsyb  Sun Cluster Highly Available Sybase
• SUNWscinf  Sun Cluster Highly Available Informix

Oracle Parallel Server Support


• SUNWudlm  Sun Cluster UNIX Distributed Lock Manager

Sun Cluster Licensing


Paper licenses for the Sun Cluster 2.2 framework are distributed for each hardware platform on which Sun Cluster 2.2 runs. Paper licenses also are distributed for each Sun Cluster data service, one for each node. No licenses are required for SDS or CVM. The Sun StorEdge Volume Manager license is bundled with SPARCstorage Arrays and A5000 arrays. You need a license for SSVM if you use it in a MultiPack-only environment. The Sun Cluster 2.2 framework does not enforce these licenses, but the paper licenses should be retained as proof of ownership when technical support or other support services are needed.

Sun Cluster Installation Overview


The Sun Cluster installation script, scinstall, is used to install and configure each host node for the services that it provides. The scinstall script prompts you for various configuration information and then installs the appropriate software components based on your responses. Some of the packages are always installed; others are installed based on the responses that you provide to the prompts. (The prompts are issued from the SUNWsccf package installation process.)

If incorrect answers are provided to the prompts, the system installs inappropriate packages for the configuration. The incorrect packages can prevent certain components from operating, cause them to operate more slowly, or provide extra services that are not wanted. If you specify incorrect information, the Sun Cluster packages must be completely removed from the system and the entire Sun Cluster installation process re-run.
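For reference, the script is normally started from the Tools directory of the Sun Cluster 2.2 distribution, as in the following sketch. The media mount point shown is an assumption; use the actual location of the Sun Cluster software at your site.

# cd /cdrom/cdrom0/Sun_Cluster_2_2/Sol_2.6/Tools
# ./scinstall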


Sun Cluster Volume Managers


During the Sun Cluster server software installation, you must select one of three available volume managers. Although you can use the volume managers for more than one cluster application, a volume manager is typically used for a specific application. The most common uses for each supported volume manager are:

• Cluster Volume Manager for Oracle Parallel Server
• Sun StorEdge Volume Manager for all HA data services and Informix XPS
• Solstice DiskSuite for customers migrating from the older HA 1.3 data services product

Volume Manager Choices
The first configuration decision made during the server software installation is which volume manager you intend to use. This does not install the volume manager software but, in some cases, installs support software, as shown in the following example.

Volume Manager Selection

Please choose the Volume Manager that will be used
on this node:

1) Cluster Volume Manager (CVM)
2) Sun StorEdge Volume Manager (SSVM)
3) Solstice DiskSuite (SDS)

Choose the Volume Manager: 3

Installing Solstice DiskSuite support packages.
Installing SUNWdid ... done
Installing SUNWmdm ... done

---------WARNING---------
Solstice DiskSuite (SDS) will need to be installed before
the cluster can be started.

<<Press return to continue>>

The Cluster Volume Manager is selected only if it is intended for use with the Oracle Parallel Server. Other supported data services and applications can use either the Sun StorEdge Volume Manager or Solstice DiskSuite.

Sun Cluster Host System Configuration

During the cluster host software installation, you are asked to furnish the name of your cluster and the names of each cluster system. The names supplied should agree with those used in the /etc/clusters and /etc/serialports files on the administrative workstation.

Note The names do not have to agree; however, standardization helps eliminate confusion when trying to administer a cluster.
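As a reminder of what must stay consistent, the corresponding administration workstation files look similar to the following sketch. The cluster name, host names, terminal concentrator name, and port numbers shown are examples only and must match your own configuration.

# /etc/clusters
sc-cluster phys-hahost1 phys-hahost2

# /etc/serialports
phys-hahost1 sc-tc 5002
phys-hahost2 sc-tc 5003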

Cluster Host System Questions
The following cluster installation questions seem simple but, if not answered correctly, can cause a great deal of unnecessary effort later.

What is the name of the cluster? sc-cluster
How many potential nodes will sc-cluster have [4]? 3
How many of the initially configured nodes will be active [3]? 3

You can specify up to four nodes. The active nodes are those that you physically connect and include in the cluster now. The potential nodes are the number of nodes to which you will expand your cluster in the near future. Do not specify more potential nodes than active nodes unless you will be expanding your cluster in the near future.

Note If the cluster has two active nodes and only two disk strings and the volume manager is Solstice DiskSuite, you must configure mediators. You should do so after configuring Solstice DiskSuite, but before bringing up the cluster. See the Sun Cluster 2.2 System Administration Guide for the procedure.

Sun Cluster Private Network Configuration

There are two types of private network systems used in the cluster, and these must be configured during installation.

SCI Interconnect Configuration

The private network configuration is brief if you are using the SCI interconnect system. The node names furnished during the installation are checked by scinstall against the /etc/nodename files on each cluster host.

What type of network interface will be used for this
configuration? (ether|SCI) [SCI]? SCI
What is the hostname of node 0 [node0]? phys-hahost1
What is the hostname of node 1 [node1]? phys-hahost2

Note Additional SCI configuration is required after the Sun Cluster software installation is complete. The SCI post-installation process is discussed later in this module.
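If you are not sure which name scinstall will find, you can display the contents of /etc/nodename on each node before you begin. The output shown is only an example.

# cat /etc/nodename
phys-hahost1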

Ethernet Interconnect Configuration

If the interconnect on your cluster is Ethernet based, then you are asked additional questions about the interconnect paths on each node. The example shown is for the first node. A similar dialogue is repeated for each node in the cluster.

What is the hostname of node 0 [node0]? phys-hahost1
What is phys-hahost1's first private network interface [hme0]? hme0
What is phys-hahost1's second private network interface [hme1]? hme1

You will now be prompted for Ethernet addresses of the host.
There is only one Ethernet address for each host regardless
of the number of interfaces a host has. You can get this
information in one of several ways:

1. use the 'banner' command at the ok prompt,
2. use the 'ifconfig -a' command (need to be root),
3. use ping, arp and grep commands. ('ping exxon; arp | grep exxon')

Ethernet addresses are given as six hexadecimal bytes
separated by colons. ie, 01:23:45:67:89:ab

What is phys-hahost1's ethernet address? 01:23:45:67:89:ab
What is the hostname of node 1 [node1]?

You should record the Ethernet addresses in advance of the installation.

Sun Cluster Public Network Configuration

The controllers and names of primary and secondary network interfaces on each cluster host are collected during the Sun Cluster host system software installation. You can use part of this information during the Sun Cluster software installation to configure network adapter failover (NAFO) groups for use by the public network management software. The NAFO groups are mandatory if you are going to use any of the HA data services. However, you do not have to configure the NAFO groups during the Sun Cluster software installation. It might be easier to configure the NAFO groups after you complete the installation.

Note The configuration and use of NAFO groups is discussed in more detail in a later module.

5-16

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

5
Sun Cluster Public Network Conguration
The operator is prompted during installation for the names of any primary and secondary public networks. You are also asked if you want to establish a NAFO backup group.

What is the primary public network controller for
phys-hahost1? hme2
What is the primary public network controller for
phys-hahost2? hme2
Does the cluster serve any secondary public subnets
(yes/no) [no]? y
Please enter a unique name for each of these additional
subnets:
Subnet name (^D to finish): sc-cluster-net1
Subnet name (^D to finish): sc-cluster-net2
Subnet name (^D to finish): ^D
The list of secondary public subnets is:
    sc-cluster-net1
    sc-cluster-net2
Is this list correct (yes/no) [yes]?
For subnet sc-cluster-net1 ...
What network controller is used for "phys-hahost1"? qe0
What network controller is used for "phys-hahost2"? qe0
For subnet "sc-cluster-net2" ...
What network controller is used for "phys-hahost1"? qe1
What network controller is used for "phys-hahost2"? qe1
Initialize NAFO on "phys-hahost1" with one ctlr per group
(yes/no) [yes]? y
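If you defer NAFO setup, the backup groups are created later with the PNM utilities delivered in the SUNWpnm package (typically with the interactive pnmset command, covered in a later module). As a sketch, the following status command can be run after the groups exist to confirm that PNM knows about them; the exact output depends on your configuration.

# /opt/SUNWpnm/bin/pnmstat -l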

Sun Cluster Logical Host Configuration

A logical host is a complex structure that associates a group of virtual volumes with a data service and also defines the user access path using a network interface that is part of a NAFO group. Logical hosts do not have to be configured during the Sun Cluster software installation. It might be easier to configure the logical hosts after you complete the basic Sun Cluster installation. If you do decide to configure logical hosts during the Sun Cluster installation process, you must have configured appropriate NAFO backup groups earlier in the installation.

Note The configuration of logical hosts is a complex issue that is addressed in a later module.

Logical hosts can be configured during the installation.

Will this cluster support any HA data services (yes/no)
[yes]? yes
Okay to set up the logical hosts for those HA services now
(yes/no) [yes]? yes
Enter the list of logical hosts you want to add:
Logical host (^D to finish): hahost1
Logical host (^D to finish): ^D
The list of logical hosts is:
    hahost1
Is this list correct (yes/no) [yes]? y
What is the name of the default master for hahost1?
phys-hahost1
Enter a list of other nodes capable of mastering hahost1:
Node name: phys-hahost2
Node name (^D to finish): ^D
The list that you entered is:
    phys-hahost1 phys-hahost2
Is this list correct (yes/no) [yes]? y
Enable automatic failback for hahost1 (yes/no) [no]? y
What is the net name for hahost1 on subnet sc-cluster-net1?
hahost1-net1
What is the net name for hahost1 on subnet sc-cluster-net2?
hahost1-net2
Disk group name for logical host hahost1 [hahost1]? hahost1
Is it okay to add logical host hahost1 now (yes/no)

Data Protection Configuration

Preserving data integrity during cluster failures requires that you use data protection techniques. You must anticipate the potentially dangerous situations that can arise if certain key components in a cluster fail. Depending on the configuration of your cluster, you are asked one or more questions related to data protection during the Sun Cluster software installation.

Failure Fencing
Failure fencing is used in three-node and four-node clusters to prevent a failed node from attempting uncontrolled data access. This is done by forcing a UNIX abort through the terminal concentrator. The following shows the configuration process.

What type of architecture does phys-hahost1 have
(E10000|other) [other]? other
What is the name of the Terminal Concentrator connected to
the serial port of phys-hahost1 [NO_NAME]? sc-tc
Is 123.456.789.1 the correct IP address for this Terminal
Concentrator (yes | no) [yes]? yes
What is the password for root of the Terminal
Concentrator [?]
Please enter the password for root again [?]

This process is performed for each node in the cluster. When one of the nodes in a cluster suddenly ceases to respond across the cluster interconnect, it might be out of control and must be stopped immediately. In a smaller two-node cluster, you can use the SCSI reserve feature to deny data access to the failed node. In a three-node or four-node cluster, you cannot use SCSI reservation. One of the surviving nodes in a larger cluster uses the telnet command to communicate directly with the TC and send a UNIX abort to the failed node.

Note The domains on an E10000 do not have serial ports, so you must implement failure fencing differently from other servers. The E10000 mechanism requires access to the SSP, so you are prompted for SSP information if the E10000 architecture is specified.

Node Locking
If your cluster has more than two nodes and uses direct attach storage devices, such as the StorEdge A5000, you are asked to select a node locking port. You are asked for the number of an unused terminal concentrator port.

Does the cluster have a disk storage device that is
connected to all nodes in the cluster [no]? yes
Which unused physical port on the Terminal Concentrator is
to be used for node locking: 6

Figure 5-2 demonstrates the basic node locking feature. The first node entering the cluster makes a telnet connection to the designated locking port (for example, telnet tc_concentrator 5006). This prevents another node from making a similar connection. In the diagram, a node with a failed interconnect is no longer a cluster member. An operator might mistakenly try to start another cluster on the failed node. The startup fails because the locking port is already reserved.

Figure 5-2  Node Locking Overview

Quorum Device
In a two-node cluster with both nodes attached to the same storage array, if communication between the nodes is lost, you must prevent one node from accessing the storage array in an uncontrolled manner. Traditionally, a disk drive is assigned as a quorum device and both nodes race to reserve it. The first node to reserve the quorum device remains a cluster member. The losing node must abort clustered operation. The following example shows the quorum device selection for a two-node cluster.

Getting device information for reachable nodes in the
cluster.
This may take a few seconds to a few minutes...done
Select quorum device for the following nodes:
0 (phys-hahost1) and 1 (phys-hahost2)
1) SSA:000000779A16
2) SSA:000000741430
3) DISK:c0t1d0s2:01799413
Quorum device: 1
...
SSA with WWN 000000779A16 has been chosen as the quorum
device.
Finished Quorum Selection

Quorum devices are shared between nodes that can both master the device. As shown in Figure 5-3, the potential resource masters are more complicated in a ring topology configuration.

Figure 5-3  Potential Masters in a Ring Topology

• Node 0 and Node 2 can master Resource 1
• Node 0 and Node 1 can master Resource 2
• Node 1 and Node 2 can master Resource 3

Note During the Sun Cluster software installation on a ring topology, you are asked to select a quorum device for each pair of resource masters. If the cluster uses direct attach storage arrays, such as the Sun StorEdge A5000, all cluster hosts can master a single storage resource, so you must define only one quorum disk.

Partitioned Cluster Control
In a three-node or four-node cluster, simultaneous multiple cluster failures can create two separate clusters. This is called a partitioned cluster. Generally, potential partitioning is detected by a large and sudden change in cluster membership. One of the potential partitions must abort clustered operation. You can select either automatic partition selection or operator intervention.

In case the cluster partitions into subsets, which subset
should stay up?
ask) the system will always ask the operator.
select) automatic selection of which subset should stay up.
Please enter your choice (ask|select) [ask]: select
You have a choice of two policies:
lowest -- The subset containing the node with the lowest
node ID value automatically becomes the new cluster. All
other subsets must be manually aborted.
highest -- The subset containing the node with the highest
node ID value automatically becomes the new cluster. All
other subsets must be manually aborted.
Select the selection policy for handling partitions
(lowest|highest) [lowest]: highest

If you configure the cluster partitioning to ask, all nodes display continuous abortpartition/continuepartition messages until the cluster operator decides which pair of systems should continue.

Note If you do not configure a quorum device in a two-node cluster, it always initiates the partitioned behavior instead of simply racing for a quorum disk.

Application Configuration

The final stage of the Sun Cluster host software installation requires you to identify the combination of data services you intend to support on the cluster. Depending on the choices you make, various support packages are installed.

Note The HA-NFS data service support is installed by default in all installations. There are no configuration questions asked about it during the Sun Cluster host system installation.

You can select multiple data service support when installing the cluster host software. Select item 12 when you have selected all the data services you want.

==== Select Data Services Menu ====================
Please select which of the following data services are to
be installed onto this cluster.
Select singly, or in a space separated list.

Note: HA-NFS and Informix Parallel Server (XPS) are
installed automatically with the Server Framework.

You may de-select a data service by selecting it a second
time. Select DONE when finished selecting the
configuration.

1) Sun Cluster HA for Oracle
2) Sun Cluster HA for Informix
3) Sun Cluster HA for Sybase
4) Sun Cluster HA for Netscape
5) Sun Cluster HA for Netscape LDAP
6) Sun Cluster HA for Lotus
7) Sun Cluster HA for Tivoli
8) Sun Cluster HA for SAP
9) Sun Cluster HA for DNS
10) Sun Cluster for Oracle Parallel Server INSTALL
11) No Data Services
12) DONE

Choose a data service: 1 4 6 12

Note You will not see the Oracle Parallel Server (OPS) option unless you selected the Cluster Volume Manager earlier in the installation.

Post-Installation Configuration

Although post-installation can include a wide range of tasks, such as installing a volume manager, this section focuses only on the post-installation tasks that are most critical.

Note If your cluster uses the SCI cluster interconnect, you must complete its configuration before attempting to start or use the cluster.

Installation Verification

When you have completed the Sun Cluster software installation on the cluster host systems, you should verify that the basic cluster configuration information is present by using the scconf command. The following example is typical of a newly configured cluster.

# scconf sc-cluster -p
/etc/opt/SUNWcluster/conf/sc-cluster.cdb
Checking node status...
Current Configuration for Cluster sc-cluster:
Hosts in cluster: phys-node0 phys-node1 phys-node2
Private Network Interfaces for
    phys-node0: be0 be1
    phys-node1: be0 be1
    phys-node2: be0 be1
Quorum Device Information

Logical Host Timeout Values :
    Step10 : 720
    Step11 : 720
    Logical Host : 180
Cluster TC/SSP Information
    phys-node0 TC/SSP, port : 129.150.218.35, 2
    phys-node1 TC/SSP, port : 129.150.218.35, 3
    phys-node2 TC/SSP, port : 129.150.218.35, 4
    sc-cluster Locking TC/SSP, port : 129.150.218.35, 6

Note You should run the scconf command on each of the configured cluster host systems to verify that their configuration database files agree.

Correcting Minor Configuration Errors

When the Sun Cluster software is installed, some common mistakes are:

• Using incorrect node names
• Using incorrect node Ethernet addresses
• Using incorrect CIS interface assignments

These simple mistakes can be resolved using the scconf command. Some examples of post-installation corrections follow.

Correcting Node Names


To change one or more incorrect node names:

# scconf dbcluster -h dbms1 dbms2 dbms3 dbms4

Note Even if only one node name is incorrect, you should enter the names of the other nodes in the cluster.

Correcting Node Ethernet Addresses


To change an incorrect Ethernet address for a node:

# scconf dbcluster -N 1 80:40:33:ff:b0:10

Note The nodes are specified by their number (0,1,2,3).

Correcting Private Interconnect Assignments


To change the Ethernet interconnect assignment for a node:

# scconf dbcluster -i dbms2 hme2 hme4

Software Directory Paths
After the Sun Cluster host system software is installed, you should set new software directory paths.

General Search Paths and Man Paths

On all nodes, set your PATH to include:

• /sbin
• /usr/sbin
• /opt/SUNWcluster/bin
• /opt/SUNWpnm/bin

On all nodes, set your MANPATH to include:

• /opt/SUNWcluster/man

Volume Manager Specific Paths

For SSVM and CVM, set your PATH to include:

• /opt/SUNWvxva/bin
• /etc/vx/bin

For SSVM and CVM, set your MANPATH to include:

• /opt/SUNWvxva/man
• /opt/SUNWvxvm/man

For Solstice DiskSuite, set your PATH to include:

• /usr/opt/SUNWmd/sbin

For Solstice DiskSuite, set your MANPATH to include:

• /usr/opt/SUNWmd/man
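One common way to make the paths above persistent is to add them to root's shell startup file on each node. The following Bourne shell sketch assumes SSVM or CVM is the volume manager; substitute the Solstice DiskSuite directories if that is your choice.

# Example /.profile additions (adjust for your volume manager)
PATH=/sbin:/usr/sbin:/opt/SUNWcluster/bin:/opt/SUNWpnm/bin:/opt/SUNWvxva/bin:/etc/vx/bin:$PATH
MANPATH=/usr/share/man:/opt/SUNWcluster/man:/opt/SUNWvxva/man:/opt/SUNWvxvm/man:$MANPATH
export PATH MANPATH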

SCI Interconnect Configuration

After you install the Sun Cluster host system software and before you reboot the cluster host systems, you must perform an additional configuration on the SCI interconnects as follows:

1. Add the necessary SCI and cluster information to the template file on all nodes, located in the /opt/SUNWsma/bin/Examples directory.

2. Run the /opt/SUNWsma/bin/sm_config script on one node, specifying the template file name.

SCI Template File for a Two-Node Cluster

The following example shows a point-to-point SCI template file. Notice that the names of the future potential nodes have been included and commented out. This is mandatory.

Cluster is configured as = SC

HOST 0 = sec-0
HOST 1 = sec-1
HOST 2 = _%sec-2
HOST 3 = _%sec-3

Number of Switches in cluster = 0
Number of Direct Links in cluster = 2
Number of Rings in cluster = 0

host 0 :: adp 0 is connected to = link 0 :: endpt 0
host 0 :: adp 1 is connected to = link 1 :: endpt 0
host 1 :: adp 0 is connected to = link 0 :: endpt 1
host 1 :: adp 1 is connected to = link 1 :: endpt 1

Network IP address for Link 0 = 204.152.65
Network IP address for Link 1 = 204.152.65
Netmask = f0

SCI Template File for a Three-Node Cluster
The following example shows the SCI template file differences associated with using SCI switches in a three-node cluster.

Cluster is configured as = SC

HOST 0 = sec-0
HOST 1 = sec-1
HOST 2 = sec-2

Number of Switches in cluster = 2
Number of Direct Links in cluster = 0
Number of Rings in cluster = 0

host 0 :: adp 0 is connected to = switch 0 :: port 0
host 0 :: adp 1 is connected to = switch 1 :: port 0
host 1 :: adp 0 is connected to = switch 0 :: port 1
host 1 :: adp 1 is connected to = switch 1 :: port 1
host 2 :: adp 0 is connected to = switch 0 :: port 2
host 2 :: adp 1 is connected to = switch 1 :: port 2

Network IP address for Switch 0 = 204.152.65
Network IP address for Switch 1 = 204.152.65
Netmask = f0

Warning You must run the sm_config script any time SCI components have been moved or replaced, or cables have been switched. It should be run only from one node.

Exercise: Installing the Sun Cluster Server Software
Exercise objective: In this exercise you will do the following:

• Install the Sun Cluster server software
• Complete the post-installation SCI interconnect configuration, if appropriate
• Configure environment variables

Review the lab objectives and classroom setup with students. Mention to the students that the Solaris Operating Environment should already be installed on the administration workstation and cluster nodes. Also remind students that the Terminal Concentrator was installed during the hardware installation lab.

Preparation

Obtain the following information from your instructor:

1. Ask your instructor which volume manager is to be installed on your assigned cluster.

   Volume manager: _______________

Tasks
The following tasks are explained in this section:

• Updating the name service
• Installing Solaris operating system patches
• Verifying storage array firmware revisions
• Recording the cluster host Ethernet addresses
• Installing the Sun Cluster server software
• Performing post-installation SCI interconnect configuration
• Preparing the cluster host root environment
• Verifying basic cluster operation

Update the Name Service

1. Edit the /etc/hosts file on the administrative workstation and all cluster nodes and add the IP addresses and hostnames of the administrative workstation and cluster nodes.

2. If you are using NIS or NIS+, add the IP addresses and hostnames to the name service.

Note Your lab environment might already have all of the IP addresses and host names entered in the /etc/hosts file.
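The entries being added are ordinary host table lines similar to the following sketch; the addresses and names here are placeholders for your assigned lab systems.

129.150.218.10   admin-ws       # administration workstation
129.150.218.11   phys-hahost1   # cluster node 0
129.150.218.12   phys-hahost2   # cluster node 1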

Installing Solaris Operating System Patches


1. Ask your instructor if any Solaris operating system patches should be installed on the cluster host systems.

2. Reboot all server nodes after installing the patches.


Remind the students to install all patches on all of the server nodes.

Storage Array Firmware Revision
If you need to upgrade the SPARCstorage Array (SSA) or A5000 firmware, you must install the correct patch on all the cluster hosts. In addition, you must download the firmware to all downlevel storage arrays. Instructions are included in the README file for the patch. If you are using MultiPacks, you must ensure that the drive firmware is at the proper level. See the Sun Cluster Release Notes for more information.

Installation Preparation

Caution Remember to perform the installation in parallel on all cluster host systems. Wait until all nodes have completed each step before proceeding to the next installation step.

1. Record the Ethernet address of each of your cluster host systems in the Cluster Name and Address Information section on page A-2.

Note Type the ifconfig -a command in the cconsole common window.

Server Software Installation
Note Do not configure a shared CCD volume during installation.

1. In the cconsole common window, log in to each of the cluster host systems as user root.

2. Change to the Sun Cluster software location for the correct operating system version (Sun_Cluster_2_2/Sol_2.6/Tools).

3. Start the scinstall script at the same time on all cluster hosts.

4. As the installation proceeds, make the following choices:

   a. Install the Server option.

   b. Choose the automatic mode of installation.

   c. Choose your assigned volume manager.

   d. Use the assigned cluster name.

   e. Make the number of potential and initially configured nodes the same.

   f. Select the private network interface that is appropriate for your cluster (Ethernet or SCI).

   g. Supply the Ethernet interconnect controller names, cluster host names, and Ethernet addresses, as appropriate, for each of the cluster host systems.

   h. Answer yes for data service support.

   i. Answer no to setting up the logical hosts.

   j. Configure failure fencing, if you are prompted.

   k. Configure the node locking port, if you are prompted.

   l. Configure the quorum device(s) for your cluster, if you are prompted.

Note Ask your instructor for help with the quorum devices if you are unsure about how to respond to the installation questions.

   m. If you are asked about cluster partitioning behavior, choose the ask option.

   n. Select either the HA Oracle or Oracle Parallel Server (OPS) data service (OPS requires CVM to be installed).

5. After the installation is complete, practice using the scinstall List and Verify options.

6. Quit the scinstall program.

7. Apply any Sun Cluster patches as directed by your instructor. However, you must install either the Sun Cluster patch 107388 or 107538.

Note Do not reboot your cluster hosts at this time.

SCI Interconnect Configuration

Skip this task if your cluster uses the Ethernet-based interconnect. To complete the SCI interconnect installation, you must create a special template file and run the sm_config program.

1. On all nodes, copy the appropriate SCI template file into the /opt/SUNWsma/bin directory.

   # cd /opt/SUNWsma/bin/Examples
   # cp switch1.sc ..
   # cd ..

Note There are also sample SCI template files in the Scripts/SCI training file directory. There are versions for two-node and three-node clusters. You can use them as a reference.

2. On all nodes, edit the template files and change the host names and number of hosts associated with the HOST variables to match your cluster potential node configuration.

Warning The sm_config script should be run only on one node if all nodes in the cluster can communicate using a public network.

3. On Node 0, run the sm_config script.

   # ./sm_config -f ./template_name

   Although a lot of configuration information is output by this command, the most important point is that there are no obvious errors.

4. Verify the installation on all cluster host systems by checking the contents of the /etc/sma.ip and /etc/sma.config files.

   # more /etc/sma.config
   0 sec-0 0 8
   0 sec-0 1 c
   1 sec-1 0 48
   1 sec-1 1 4c
   2 sec-2 0 88
   2 sec-2 1 8c

   # more /etc/sma.ip
   Hostname = sec-0   scid0   IP address = 204.152.65.1
   Hostname = sec-0   scid1   IP address = 204.152.65.17
   Hostname = sec-1   scid0   IP address = 204.152.65.2
   Hostname = sec-1   scid1   IP address = 204.152.65.18
   Hostname = sec-2   scid0   IP address = 204.152.65.3
   Hostname = sec-2   scid1   IP address = 204.152.65.19

Cluster Host Software Installation

5-39

Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

5
Exercise: Installing the Sun Cluster Server Software
Cluster Reboot
At this point, you might have installed some patches and congured a SCI-based cluster interconnect. Regardless, you must reboot all of your cluster host systems. 1. Reboot all of your cluster host systems. 2. Verify the cluster interconnect functionality using the get_ci_status command.

Conguration Verication
1. Use the scconf command on all cluster host systems to verify that the basic CDB database information is correct. # scconf clustername -p Note A quorum disk is not congured in SDS installations. 2. If you identify any errors in the scconf -p command output, discuss the correction process with your instructor.

5-40

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

5
Exercise: Installing the Sun Cluster Server Software
Testing Basic Cluster Operation
As a checkpoint test, use the following procedure to verify the basic cluster software operation. Note You are using commands that have not yet been presented in the course. If you have any questions, please consult with your instructor. 1. Log in to each of your cluster host systems as user root. 2. Start the cluster software only on Node 0. Substitute the name of your node and cluster for node0_name and cluster_name, respectively. # scadmin startcluster node0_name cluster_name Note This can take 1 to 2 minutes. It must complete before proceeding. It is complete when you see the finished message. 3. After the initial cluster startup has completed on Node 0, start the Sun Cluster software on all remaining nodes. # scadmin startnode Warning When joining multiple nodes to the cluster, you must enter the scadmin startnode command on each node at exactly the same time. If you cannot start them at exactly the same time, then join them one at a time, letting each join complete before starting the next. If you are not careful, you can cause CCD database corruption. 3. After all the nodes have joined the cluster, check the cluster status by typing the get_node_status command on all nodes. Note The get_node_status command is located in the /opt/SUNWcluster/bin directory.

Cluster Host Software Installation

5-41

Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

5
Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following: K K K Install the Sun Cluster host system software Correctly interpret conguration questions during Sun Cluster software installation on the cluster host systems Perform post-installation conguration

5-42

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

5
Think Beyond
As you add additional nodes to the cluster, what might you need to do on the existing nodes? Can you do this while the nodes are running? What kinds of conguration changes need to be made simultaneously on all nodes? How can you tell? What would happen if you did not make them simultaneously? What would happen if you did not specify any quorum devices or use failure fencing?

Cluster Host Software Installation

5-43

Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

System Operation
Objectives
Upon completion of this module, you should be able to:

• Use the cluster administration tools
• Use the Sun Cluster Manager (SCM) Graphical User Interface (GUI)
• Use the get_node_status and hastat status commands
• List the Simple Network Management Protocol (SNMP) features
G G

This module introduces cluster monitoring, operations software, and new administrative features.

6-1
Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

6
Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion: The following questions are relevant to your understanding of the module's content:

1. What needs to be monitored in the Sun Cluster environment?

2. How current does the information need to be?

3. How detailed does the information need to be?

4. What types of information are available?

6-2

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

6
Additional Resources
Additional resources The following references can provide additional details on the topics discussed in this module:
G G

Sun Cluster 2.2 System Administration Guide, part number 805-4238 Sun Cluster 2.2 Software Installation Guide, part number 805-4239


Cluster Administration Tools


Several administration tools are used to monitor and operate a Sun Enterprise cluster system. These tools run either locally on cluster host nodes or from the administration workstation. This module reviews and expands on some previously presented concepts and material, but its main objective is to introduce new administrative tools and commands.

6-4

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

6
Cluster Administration Tools
As shown in Figure 6-1, the Cluster Control Panel is initiated on the cluster administration workstation, but the remainder of the cluster control and status tools are run locally on each of the host systems. Administration workstation

Cluster Control Panel

Network

Terminal Concentrator Network interface

Serial ports

Node 0

Sun Cluster Manager # scadmin # hastat

Figure 6-1

Cluster Administration Tools



Basic Cluster Control (scadmin)
You use the scadmin command to perform cluster control operations.

Note The scadmin command has a number of specialized options that you cannot use until the cluster is fully configured. For that reason, these options are not discussed until later in the course.

6-6

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

6
Cluster Administration Tools
Basic Cluster Control (scadmin)
Command Options for scadmin
The following scadmin command options are related only to starting or stopping the cluster. scadmin startcluster local_node clustname scadmin startnode [ clustname ] scadmin stopnode [ clustname ]

Starting the First Node in a Cluster


You must join the first node to the cluster with the startcluster option. The node name and cluster name must be furnished.

# scadmin startcluster node0 eng-cluster

Note This command must complete successfully before you can join other nodes to the cluster.

Adding and Removing Nodes in the Cluster


You can start and stop additional nodes simultaneously. The cluster and node names are assumed from the CDB database.

# scadmin startnode
# scadmin stopnode

Warning When starting or stopping additional nodes in the cluster, they must either be started or stopped at exactly the same time, or each node must be allowed to complete before starting or stopping the next node. If you do not do this, you can corrupt the CCD.

System Operation
Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services September 1999, Rev. A

6-7

Cluster Control Panel


The Cluster Control Panel is a convenient way to start some of the commonly used cluster administration tools without having to remember the program names and locations. You can customize the Cluster Control Panel to control any desired application.

6-8

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

6
Cluster Control Panel
Starting the Cluster Control Panel
To start the Cluster Control Panel, shown in Figure 6-2, type the following command on the administration workstation: # /opt/SUNWcluster/bin/ccp [clustname] &

Figure 6-2 Cluster Control Panel Display

Adding New Applications to the Cluster Control Panel


You can access the Cluster Control Panel New Item display, shown in Figure 6-3, from the Properties menu. It allows you to add custom applications and icons to the Cluster Control Panel display.

Figure 6-3 Cluster Control Panel New Item Menu

System Operation
Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services September 1999, Rev. A

6-9

6
Cluster Control Panel
Console Tool Variations
Although the following three console tools look and behave the same, the method used to access the cluster host systems is different for each.
G

Cluster console (console mode) You use the TC interface to access the host systems. The Solaris Operating Environment does not have to be running on the cluster host systems. Only one connection at a time can be made through the TC.

Cluster console (rlogin mode) You use the rlogin command to access the host systems through the public network. The Solaris Operating Environment must be running on the cluster host systems.

Cluster console (telnet mode) You use the telnet command to access the host systems through the public network. The Solaris operating system must be running on the cluster host systems.

Note The rlogin and telnet versions of the cluster console tool are useful when there is a problem with the primary access path that uses the TC.

6-10

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

The hastat Command


The hastat command is run locally on each cluster host and provides a large amount of information in a single output. It is presented in sections here although you see all of the information at once. You can run the hastat command on any node that is a cluster member. It provides global information about all of the cluster members.

General Cluster Status
The hastat command provides general cluster information about the names of congured hosts, the current membership, and the uptime of each cluster host. # hastat Getting Information from all the nodes ...... HIGH AVAILABILITY CONFIGURATION AND STATUS ------------------------------------------LIST OF NODES CONFIGURED IN <sc-cluster> CLUSTER sc-node0 sc-node1 sc-node2 CURRENT MEMBERS OF THE CLUSTER sc-node0 is a cluster member sc-node1 is a cluster member sc-node2 is a cluster member CONFIGURATION STATE OF THE CLUSTER Configuration State on sc-node0: Stable Configuration State on sc-node1: Stable Configuration State on sc-node2: Stable UPTIME OF NODES IN THE CLUSTER uptime of sc-node0: 12:47am up 1:38, 1 user, load average: 0.14, 0.12, 0.10 uptime of sc-node1: 12:47am up 1:37, 1 user, load average: 0.16, 0.12, 0.10 uptime of sc-node2: 12:50am up 1:38, 1 user, load average: 0.16, 0.12, 0.10

6-12

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

6
The hastat Command
Logical Host Conguration
The logical host portion of the hastat command output lists the logical hosts that are mastered by each node and indicates which node is the designated backup for each logical host. LOGICAL HOSTS MASTERED BY THE CLUSTER MEMBERS Logical Hosts Mastered on sc-node0: sc-dbms Loghost Hosts for which sc-node0 is Backup Node: sc-nfs Logical Hosts Mastered on sc-node1: sc-nfs Loghost Hosts for which sc-node1 is Backup Node: sc-inetpro Logical Hosts Mastered on sc-node2: sc-inetpro Loghost Hosts for which sc-node2 is Backup Node: sc-dbms LOGICAL HOSTS IN MAINTENANCE STATE None

Private Network Status
The hastat command provides information about the private network connections that are used in the cluster interconnect system. STATUS OF PRIVATE NETS IN THE CLUSTER Status of Interconnects on sc-node0: interconnect0: selected interconnect1: up Status of private nets on sc-node0: To sc-node0 - UP To sc-node1 - UP To sc-node2 - UP Status of Interconnects on sc-node1: interconnect0: selected interconnect1: up Status of private nets on sc-node1: To sc-node0 - UP To sc-node1 - UP To sc-node2 - UP Status of Interconnects on sc-node2: interconnect0: selected interconnect1: up Status of private nets on sc-node2: To sc-node0 - UP To sc-node1 - UP To sc-node2 - UP

6-14

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

6
The hastat Command
Public Network Status
The hastat command provides conguration and status information about the NAFO groups that are congured on each of the cluster hosts. STATUS OF PUBLIC NETS IN THE CLUSTER Status of Public Network On sc-node0: bkggrp nafo113 r_adp hme1 status OK fo_time NEVER live_adp hme1

Status of Public Network On sc-node1: bkggrp nafo113 r_adp hme1 status OK fo_time NEVER live_adp hme1

Status of Public Network On sc-node2: bkggrp nafo113 r_adp hme0 status OK fo_time NEVER live_adp hme0

Data Service Status
The hastat command provides conguration and status information about the data services that are congured on each of the cluster hosts. STATUS OF SERVICES RUNNING ON LOGICAL HOSTS IN THE CLUSTER Status Of Data Services Running On sc-node0 Data Service HA-SYBASE: No Status Method for Data Service dns Data Service HA-NFS: On Logical Host sc-dbms:

Ok

Status Of Data Services Running On sc-node1 Data Service HA-SYBASE: No Status Method for Data Service dns Data Service HA-NFS: On Logical Host sc-nfs:

Ok

Status Of Data Services Running On sc-node2 Data Service HA-SYBASE: No Status Method for Data Service dns Data Service HA-NFS: On Logical Host sc-inetpro:

Ok

6-16

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

6
The hastat Command
Cluster Error Messages
The hastat command gathers and displays the most recent error messages from each of the cluster hosts. RECENT ERROR MESSAGES FROM THE CLUSTER Recent Error Messages on sc-node0 Feb 2 00:24:20 sc-node0 unix: sbusmem51 at sbus3: SBus3 slot 0x3 offset 0x0 Feb 2 00:24:20 sc-node0 unix: sbusmem51 is /sbus@7,0/sbusmem@3,0 Feb 2 00:36:31 sc-node0 ID[SUNWcluster.ha.hareg.2004]: Service dns is registered Recent Error Messages on sc-node1 Feb 2 00:24:22 sc-node1 unix: sbusmem45 at sbus2: SBus2 slot 0xd offset 0x0 Feb 2 00:24:22 sc-node1 unix: sbusmem45 is /sbus@6,0/sbusmem@d,0 Feb 2 00:24:22 sc-node1 unix: sbusmem48 at sbus3: SBus3 slot 0x0 offset 0x0 Recent Error Messages on sc-node2 Feb 2 00:27:05 sc-node2 unix: sbusmem13 at sbus0: SBus0 slot 0xd offset 0x0 Feb 2 00:27:05 sc-node2 unix: sbusmem13 is /sbus@1f,0/sbusmem@d,0 Feb 2 00:27:05 sc-node2 unix: sbusmem14 at sbus0: SBus0 slot 0xe offset 0x0

Sun Cluster Manager Overview


The Sun Cluster Manager (SCM) provides detailed cluster status information. It is a Java application that you can run in the following three ways:
• Directly as a standalone application
• From a local browser
• From a remote browser

Note Remote browser operation requires that you congure the cluster host systems as HTTP servers. You must congure the Apache HTTP server software on each of the cluster hosts.

6-18

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

6
The Sun Cluster Manager Overview
Sun Cluster Manager Startup
When the Sun Cluster software is started on a node, the /opt/SUNWcluster/scmgr/lib/scmgr_start script runs automatically. This script starts a Java process that functions as the information server for the SCM graphical application. Currently, you must install the SUNWscmgr package on each cluster node along with the following patch:
G G

107388 if you are running Solaris 2.6 107538 if you are running Solaris 7

Once you install the appropriate patch, you can start the SCM user interface as follows: # /opt/SUNWcluster/bin/scmgr local_host_name Note The user application is started on only one node in the cluster. Running the application directly requires that you set the DISPLAY variable to the administration workstation. The Sun Cluster software must be running.

Initial Sun Cluster Manager Display
As shown in Figure 6-4, the initial SCM display shows an icon for the entire cluster. You can either click on the icon or select the Properties tab to display the cluster conguration.

Figure 6-4

Initial Sun Cluster Manager Display

6-20

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

Sun Cluster Manager Displays


The SCM user interface has several displays that can supply detailed information about the physical cluster components and their operational status. The SCM user interface provides a great deal of centralized information and can be a valuable time saving tool for the administrator.

SCM Cluster Configuration Display

The SCM Cluster Configuration display, shown in Figure 6-5 and accessed from the Properties tab, provides a visual tree representation of the cluster configuration. As you select each component in the configuration tree, detailed information about the component is displayed.

Figure 6-5

SCM Properties Display

6-22

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

6
Sun Cluster Manager Displays
SCM Cluster Conguration Display
Conguration Tree
You can expand the cluster conguration tree by selecting the node points that contain a plus (+) sign. A node that contains a minus (-) sign collapses when selected. The example in Figure 6-6 shows that the node p-n0-b has not been expanded yet. You can expand the private network for the node p-n1-b even further.

Figure 6-6

SCM Conguration Tree

SCM Cluster Events Viewer
The SCM Cluster Events viewer, shown in Figure 6-7 and accessed from the Events tab, displays a list of signicant events associated with the item selected in the conguration tree. The events are not system errors but are associated with state changes for the selected item.

Figure 6-7

SCM Events Viewer

6-24

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

6
Sun Cluster Manager Displays
System Log Filter
The Syslog function, shown in Figure 6-8 and accessed from the Syslog tab, displays system log le entries that are related to the item selected in the conguration tree. The entries are retrieved from the /var/adm/messages les on the cluster hosts.

Figure 6-8

SCM Syslog Filter

The SCM Help Display
The SCM help display is started from Help menu in the main SCM window. A HotJava browser is then displayed, as shown in Figure 6-9.

Figure 6-9

SCM Help Display

6-26

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

Sun Cluster SNMP Agent


The Sun Cluster software provides full SNMP support as an alternative to the GUI-based monitoring tools. The software collects detailed conguration and status information. You can obtain full cluster status from SNMP network management workstations.

SNMP Agent Overview
The sequence of events that occurs when the Management Information Base (MIB) tables gather cluster information is as follows:

1. The Super Monitor agent smond connects to in.mond on all of the requested cluster nodes.

2. smond passes the collected config and syslog information to the snmpd daemon.

3. snmpd fills in the cluster MIB tables, which are now available to clients through SNMP GET operations.

4. snmpd sends out enterprise-specific traps for critical cluster events, as notified by smond's syslog data.

Cluster MIB Tables


The following tables provide information about the clusters:
• clustersTable
• clusterNodesTable
• switchesTable
• portsTable
• lhostTable  Logical host characteristics
• dsTable  Configured data service characteristics
• dsinstTable  Data service instance information

Note Each table provides several attributes that describe specic information about major functional sections of the clusters.

6-28

Sun Enterprise Cluster Administration


Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, September 1999, Rev. A

6
Sun Cluster SNMP Agent
SNMP Traps
The following SNMP traps are generated for critical cluster events:

Table 6-1   Sun Cluster SNMP Traps

Trap No.   Trap Name
0          sc:stopped
1          sc:aborted
4          sc:excluded
11         vm:down
21         db:up
31         vm_on_node:slave
100        SOCKET_ERROR:node_out_of_system_resources
106        UNREACHABLE_ERROR:nodes_mond_unreachable:network_problems??
110        SHUTDOWN_ERROR:nodes_mond_shutdown
200        Fatal:super_monitor_daemon(smond)_exited!!

Note – For more information about using and troubleshooting SNMP features, see the Sun Cluster 2.2 System Administration Guide.

Configuring the Cluster SNMP Agent Port
By default, the cluster SNMP agent listens on User Datagram Protocol (UDP) port 161 for requests from the SNMP manager, for example, the SunNet Manager Console. You can change this port by using the -p option to the snmpd and smond daemons. You must configure both the snmpd and smond daemons on the same port for them to function properly.

Caution – If you are installing the cluster SNMP agent on an SSP or an administrative workstation running the Solaris 2.6 Operating Environment or compatible versions, always configure the snmpd and the smond programs on a port other than the default UDP port 161. For example, with the SSP, the cluster SNMP agent interferes with the SSP SNMP agent, which also uses UDP port 161. This interference could result in the loss of RAS features of the Sun Enterprise 10000 server.
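As a hedged sketch of the -p option described above (the port number 1161 is an arbitrary example, and the exact daemon paths and startup method depend on your installation), both daemons would be directed to the same alternate port:

# snmpd -p 1161
# smond -p 1161

Whatever port you choose, the SNMP management station must be pointed at that same port, and both daemons must always agree on it.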

Exercise: Using System Operations
Exercise objective – In this exercise you will do the following:

●  Start the Cluster Control Panel
●  Start the Sun Cluster Manager software
●  Use the SCM Cluster Configuration display
●  Use the SCM Events Viewer display
●  Use the SCM Syslog Filter display
●  Use the SCM Help display

Preparation
1. You must verify that the SCM software is installed on each of your cluster host systems:
   # pkginfo SUNWscmgr
2. Verify that the appropriate SCM patch is installed on each of your cluster host systems:
   # showrev -p | grep 107388     (for Solaris 2.6 Operating Environment)
   # showrev -p | grep 107538     (for Solaris 7 systems)
   Note – The scmgr program does not function without the appropriate patch installed.
3. If necessary, install the appropriate SCM patch on each cluster host. Be sure to reboot the cluster hosts after patch installation so that the scmgr process will start.
   Warning – You must stop the Sun Cluster software on all nodes before rebooting the system.
4. All nodes should be joined in the cluster and the cconsole tool should be running from the previous lab.

Tasks
The following tasks are explained in this section:
●  Starting the Cluster Control Panel
●  Using the hastat command
●  Using the Sun Cluster Manager

Starting the Cluster Control Panel


1. Start the Cluster Control Panel (CCP) on the administration workstation. Substitute your cluster name.
   # ccp clustername &
2. Start the telnet mode of the cluster console tool by double-clicking on the telnet mode icon.
3. After practicing using the telnet mode of cconsole, log out of all the telnet cluster host windows and quit the telnet cluster console tool.
4. Practice using the CCP cluster help tool. Pay particular attention to the glossary feature.
5. Quit the Cluster Control Panel.

Using the hastat Command


1. Use the cconsole common window and type the hastat command on each of the cluster hosts.
   # hastat | more
2. Compare the output of the cluster hosts.
3. Type the get_node_status command and compare its output to that of the hastat command. It is useful for a quick status check.

Using the Sun Cluster Manager
1. On each cluster host, verify that the DISPLAY variable is set to the administration workstation.
   # echo $DISPLAY
2. On the administration workstation, type the xhost command in the console window and verify that access control is disabled.
   # xhost
   access control disabled, clients can connect from any host
3. Log in to any cluster host that is in clustered operation.
4. Verify that the SCM manager software is running on all of the cluster hosts.
   # ps -ef | grep java
5. Start the SCM client software on one of the cluster hosts.
   # hostname
   # /opt/SUNWcluster/bin/scmgr hostname &
6. If you do not see the SCM display on your administration workstation after a short time, ask your instructor for help.
7. Select the Cluster Configuration display and practice navigating through the configuration tree.
8. Select the Cluster Events Viewer display and practice viewing the events related to different items in the configuration tree.
9. Select the System Log display and practice viewing system log file messages related to different items in the configuration tree.
   Note – The Previous and Next buttons scroll through the messages one buffer's contents at a time.

10. Display the Help menu and select some of the help items.
11. Practice using the HotJava Help viewer for a while, then quit it.
12. Quit the SCM application.
13. Stop the SC software on all cluster hosts using the scadmin stopnode command.

Exercise Summary
Discussion – Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.

Manage the discussion here based on the time allowed for this module, which was given in the About This Course module. If you find you do not have time to spend on discussion, then just highlight the key concepts students should have learned from the lab exercise.
●  Experiences

Ask students what their overall experiences with this exercise have been. You may want to go over any trouble spots or especially confusing areas at this time.
●  Interpretations

Ask students to interpret what they observed during any aspects of this exercise.
●  Conclusions

Have students articulate any conclusions they reached as a result of this exercise experience.
●  Applications

Explore with students how they might apply what they learned in this exercise to situations at their workplace.

Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:

❑  Use the Cluster administration tools
❑  Use the Sun Cluster Manager GUI
❑  Use the hastat status command
❑  List the Simple Network Management Protocol (SNMP) features

Think Beyond
What would it be like to administer a cluster with 16 nodes and 200 storage arrays?

What would it be like to administer a cluster with each cluster member located in a different city?

In general, how does SNMP interact with the cluster environment?

Volume Management Using CVM and SSVM


Objectives
Upon completion of this module, you should be able to:
●  Explain the disk space management technique used by the Cluster Volume Manager (CVM) and the Sun StorEdge Volume Manager (SSVM)
●  Describe the initialization process necessary for CVM and SSVM
●  Describe how the CVM and SSVM products group disk drives
●  List the basic status commands for CVM and SSVM
●  Describe the basic software installation process for CVM and SSVM
●  List post-installation issues relevant to CVM and SSVM
●  Install and configure either CVM or SSVM

This module introduces some of the basic concepts common to the Cluster Volume Manager and the Sun StorEdge Volume Manager.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion – The following questions are relevant to your learning the material presented in this module:

1. Which volume manager features are the most important to clustered systems?
2. What relationship do the volume managers have to normal cluster operation?
3. Are there any volume manager feature restrictions when they are used in the Sun Cluster environment?

Additional Resources
Additional resources – The following references can provide additional details on the topics discussed in this module:

●  Sun Cluster 2.2 System Administration Guide, part number 805-4238
●  Sun Cluster 2.2 Software Installation Guide, part number 805-4239
●  Sun Cluster 2.2 Cluster Volume Manager Guide, part number 805-4240


Disk Space Management


The Cluster Volume Manager (CVM) and the Sun StorEdge Volume Manager (SSVM) use the same method to manage disk storage space. CVM and SSVM manage data in a non-partitioned environment. They manage disk space by maintaining tables that associate a list of contiguous disk blocks with a data volume structure. A single disk drive can potentially be divided into hundreds of independent data regions.

CVM and SSVM Disk Space Management
As shown in Figure 7-1, CVM and SSVM maintain detailed configuration records that equate specific blocks on one or more disk drives with virtual volume structures.

Figure 7-1    CVM and SSVM Space Management

Both CVM and SSVM divide a disk into a single slice and then allocate portions of the slice to volume structures.
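As a quick sketch of what this non-partitioned approach looks like in practice (the disk group name, volume name, and size below are arbitrary examples, not part of the lab), a volume is simply requested from free space in a disk group and the volume manager allocates the underlying subdisks itself:

# vxassist -g hanfs make demovol 100m layout=mirror

There is no need to run format or define partitions first; the space bookkeeping is handled entirely by the volume manager configuration records described above.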


CVM and SSVM Initialization


When a physical disk drive is initialized by CVM or SSVM, it is divided into two sections called the private region and the public region. The private and public regions are used for different purposes.
●  The private region is used for configuration information.
●  The public region is used for data storage.

The private region is small. It is usually configured as slice 3 on the disk and at most is a few cylinders in size. The public region is the rest of the disk drive. It is usually configured as slice 4 on the disk.

Note – You must specially initialize all disk drives that are used by CVM or SSVM.

Private Region Contents
The size of the private region, by default, is 1024 sectors (512 Kbytes). It can be enlarged if a large number of Volume Manager (VM) objects are anticipated. If you anticipate having over 2000 VM objects, you should increase the size of the private region. The private region contents are:
●  Disk header: Two copies of the file that defines and maintains the host name or cluster name, unique disk ID, disk geometry information, and disk group association information.
●  Table of contents: The disk header points to this linked list of blocks.
●  Configuration database: This database contains persistent configuration information for all of the disks in a disk group. It is usually referred to as the configdb record.
●  Disk group log: This log is composed of kernel-written records of certain types of actions, such as transaction commits, plex detaches resulting from I/O failures, dirty region log failures, first write to a volume, and volume close. It is used after a crash or clean reboot to recover the state of the disk group just before the crash or reboot.
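If you expect to exceed the default object count, the private region size is normally chosen when the disk is initialized. A minimal sketch, assuming the vxdisksetup utility and its privlen attribute are available in your SSVM release (the device name and length are arbitrary examples):

# /usr/lib/vxvm/bin/vxdisksetup -i c2t4d0 privlen=2048

This simply doubles the default 1024-sector private region on that one disk; disks that are already initialized keep their original private region size.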

Public Region Usage


Volumes are created from plexes. A plex is built from one or more subdisks. Subdisks are areas of the public region that are mapped and controlled by CVM and SSVM. The public region can be a single large subdisk or many smaller subdisks. You can assemble subdisks from many different physical disk drives into a plex that is then associated with a volume name. Subdisks are not partitions.

Private and Public Region Format
The private and public region format of an initialized SSVM disk can be verified with the prtvtoc command. As shown in the following example, slice 2 is defined as the entire disk. Slice 3 has been assigned tag 15 and is 2016 sectors in size. Slice 4 has been assigned tag 14 and is the rest of the disk. In this example, the private region is the first two cylinders on the disk. The disk is a 1.05-Gbyte disk and a single cylinder only has 1008 sectors or blocks, which does not meet the 1024-sector minimum size for the private region. This is calculated by using the nhead=14 and nsect=72 values for the disk found in the /etc/format.dat file.

# prtvtoc /dev/rdsk/c2t4d0s2
Partition  Tag  Flags  First Sector  Sector Count  Last Sector
    2       5    01               0       2052288      2052287
    3      15    01               0          2016         2015
    4      14    01            2016       2050272      2052287
Initialized Disk Types


By default, SSVM initializes disk drives with the type Sliced. There are other possible variations. The three types of initialized disks are:
●  Simple: Private and public regions are on the same partition.
●  Sliced: Private and public regions are on different partitions (default).
●  nopriv: Does not have a private region.

Note – You should not use the nopriv type. It is normally used only for RAM disk storage on non-Sun systems.

CVM and SSVM Encapsulation


If you have existing data on the disk, you would not want to initialize the disk, because this destroys any data. Instead, you can choose to encapsulate the disk. When you install the Sun StorEdge Volume Manager software on a system, you can place your system boot disk under SSVM control using the vxinstall program. For Sun StorEdge Volume Manager to encapsulate the disk, there should be at least 1024 sectors in an unused slice at the beginning or end of the disk and two free partitions.

Preferred Boot Disk Configuration
Although there are many possible boot disk variations, the preferred boot disk configuration is shown in Figure 7-2.

Figure 7-2    Preferred Boot Disk Configuration

The preferred configuration has the following features:

●  The boot disk and mirror are on separate interfaces.
●  The boot disk and mirror are not in a storage array.
●  Only the boot disk and mirror are in the rootdg disk group.

Prerequisites for Boot Disk Encapsulation
For the boot disk encapsulation process to succeed, the following prerequisites must be met:
●  The disk must have at least two unused slices.
●  The boot disk must not have any slices in use other than the following:
   ●  root
   ●  swap
   ●  var
   ●  usr

The following is an additional prerequisite that is desirable but not mandatory:


●  There should be at least 1024 sectors of unused space on the disk. Practically, this is at least two full cylinders. This space is needed for the private region. SSVM takes the space from the end of the swap partition if necessary.

Primary and Mirror Configuration Differences


When you encapsulate your system boot disk, the location of all data remains unchanged even though the partition map is modified. When you mirror the encapsulated boot disk, the location of the data on the mirror is different from the original boot disk. During encapsulation, a copy of the system boot disk partition map is made so that the disk can be returned to a state that allows booting directly from a slice. The mirror of the boot disk cannot easily be returned to a sliced configuration.

The /etc/vfstab File
A backup copy of the /etc/vfstab file is made before the new boot disk path names are configured. The following /etc/vfstab file is typical for a boot disk with a single-partition root file system.

#device              device                mount    FS     fsck   mount    mount
#to mount            to fsck               point    type   pass   at boot  options
#
fd                   -                     /dev/fd  fd     -      no       -
/proc                -                     /proc    proc   -      no       -
/dev/vx/dsk/swapvol  -                     -        swap   -      no       -
/dev/vx/dsk/rootvol  /dev/vx/rdsk/rootvol  /        ufs    1      no       -
swap                 -                     /tmp     tmpfs  -      yes      -
#
#NOTE: volume rootvol (/) encapsulated partition c0t0d0s0
#NOTE: volume swapvol (swap) encapsulated partition c0t0d0s1

Boot PROM Changes


When the system boot disk is encapsulated, you can no longer boot directly from a boot disk partition. The SSVM software creates two new boot aliases for you so that you can boot from the primary system boot disk or, if a failure occurs, from the surviving mirror. You can examine the new boot aliases as follows:

# eeprom | grep devalias
devalias vx-rootdisk /sbus@1f,0/SUNW,fas@e,8800000/sd@1,0:a
devalias vx-rootmir /sbus@1f,0/SUNW,fas@e,8800000/sd@0,0:a

If your primary boot disk fails, you can boot from the surviving mirror as follows:

ok boot vx-rootmir

Un-encapsulating the Boot Disk
About the only time you might want to un-encapsulate the system boot disk is if you are removing the SSVM software. The vxunroot command is used to un-encapsulate the boot disk, but first you must make sure the following actions have been taken:

●  All boot disk volumes have been unmirrored.
●  All non-root file systems, volumes, plexes, and subdisks have been removed.

If you forget to prepare the boot disk, the vxunroot command performs a very thorough check before starting. The vxunroot command performs the following functions:
●  Checks for any unacceptable structures on the boot disk
●  Returns the boot disk partition map to its original state
●  Returns the /etc/system file to its original state
●  Returns the /etc/vfstab file to its original state
●  Returns the OpenBoot PROM device aliases to their original state

CVM and SSVM Disk Grouping


Disk groups are an arbitrary collection of physical disks that allow a backup host to assume a workload. The disk groups are given unique names and ownership is assigned either to a single cluster host system or to the name of the cluster. Ownership is dependent on the intended database platform and application. CVM and SSVM both use the term disk group to define a related collection of disk drives. The term dg is used frequently in related documentation.

Cluster Volume Manager Disk Groups
You install the CVM version of the Veritas Volume Manager only when using the Oracle Parallel Server database. As shown in Figure 7-3, the CVM disk groups are owned by the cluster and have the name of the cluster written in private regions on the physical disks. This means that any node in the cluster can read or modify data in a disk group volume.

Figure 7-3    CVM Disk Group Ownership

Note – To prevent simultaneous data access from two different cluster host systems and possible corruption, the Oracle Parallel Server database uses a software locking mechanism called distributed lock management.

Sun StorEdge Volume Manager Disk Groups
The SSVM can be used by all of the supported high availability databases and data services. As shown in Figure 7-4, SSVM disk groups are owned by an individual node and the hostname of that node is written onto private regions on the physical disks. Even though another node is physically connected to the same array, it cannot access data in the array that is part of a disk group it does not own. During a node failure, the disk group ownership can be transferred to another node that is physically connected to the array. This is the backup node scheme used by all of the supported high availability data services.

Figure 7-4    SSVM Disk Group Ownership

Volume Manager Status Commands


Although the graphical user interfaces for CVM, SSVM, and SDS furnish useful visual status information, there are times when the images might not update correctly or completely due to window interlocks or system loads. The most reliable and the quickest method of checking status is from the command line. Command-line status tools have the additional benefits of being easy to use in script files, cron jobs, and remote logins.

Checking Volume Status
Using the vxprint Command
The vxprint command, used with both CVM and SSVM, is the easiest way to check the status of all volume structures. The following sample vxprint output shows the status of two plexes in a volume as bad. One of the plexes is a log.

# vxprint
Disk group: sdg0

TY  NAME      ASSOC     KSTATE    LENGTH   PLOFFS  STATE
dg  sdg0      sdg0      -         -        -       -
dm  disk0     c4t0d0s2  -         8368512  -       -
dm  disk7     c5t0d0s2  -         8368512  -       -

v   vol0      fsgen     ENABLED   524288   -       ACTIVE
pl  vol0-01   vol0      DISABLED  525141   -       IOFAIL
sd  disk0-01  vol0-01   ENABLED   525141   0       -
pl  vol0-02   vol0      ENABLED   525141   -       ACTIVE
sd  disk7-01  vol0-02   ENABLED   525141   0       -
pl  vol0-03   vol0      DISABLED  LOGONLY  -       IOFAIL
sd  disk0-02  vol0-03   ENABLED   5        LOG     -

Note – You can use the vxprint -ht vol0 command to get a detailed analysis of the volume. This gives you all the information you need, including the physical path to the bad disk. You can also use the vxprint command to create a backup configuration file that is suitable for recreating the entire volume structure. This is useful as a worst-case disaster recovery tool.

Checking Disk Status
When disk drives fail, the CVM or SSVM software can lose complete contact with a disk and no longer display the physical path with the vxprint -ht command. At those times, you must find the media name of the failed disk from the vxprint command and then use the vxdisk list command to associate the media name with the physical device.

# vxdisk list
DEVICE       TYPE      DISK      GROUP     STATUS
c0t0d0s2     sliced    -         -         error
c0t1d0s2     sliced    disk02    rootdg    online
-            -         disk01    rootdg    failed was:c0t0d0s2

When a disk fails and becomes detached, the CVM or SSVM software cannot currently find the disk but still knows the physical path. This is the origin of the failed was status. This means the disk has failed and the physical path was the value displayed.

Saving Configuration Information


The vxprint and vxdisk commands can also be used to save detailed configuration information that is useful in disaster recovery situations. The output of the following commands should be copied into a file and stored on tape. You should also keep a printed copy of the files.

# vxprint -ht > filename
# vxdisk list > filename

Optimizing Recovery Times


In the Sun Cluster environment, data volumes are frequently mirrored to achieve a higher level of availability. If one of the cluster host systems fails while accessing a mirrored volume, the recovery process might involve several steps, including:

●  Mirrors must be synchronized.
●  File systems must be checked.

Mirror synchronization can take a long time and must be completed before file systems can be checked. If your cluster uses many large volumes, the complete volume recovery process can take hours. You can expedite mirror synchronization by using the CVM/SSVM dirty region logging feature. You can expedite file system recovery by using the Veritas VxFS file system software.

Note – Although you can also expedite file system recovery by using the Solaris 7 Operating Environment UFS logging feature, the current versions of CVM and SSVM do not run in the Solaris 7 Operating Environment.

Dirty Region Logging
A dirty region log (DRL) is a CVM or SSVM log file that tracks data changes made to mirrored volumes. The DRL is used to speed recovery time when a failed mirror needs to be synchronized with a surviving mirror.

●  Only those regions that have been modified need to be synchronized between mirrors.
●  Improper placement of DRLs can negatively affect performance.

A volume is divided into regions and a bitmap (where each bit corresponds to a volume region) is maintained in the DRL. When a write to a particular region occurs, the respective bit is set to on. When the system is restarted after a crash, this region bitmap is used to limit the amount of data copying that is required to recover plex consistency for the volume. The region changes are logged to special log subdisks linked with each of the plexes associated with the volume. Use of dirty region logging can greatly speed recovery of a volume.
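As a minimal sketch of enabling this feature (the disk group and volume names are arbitrary examples, and exact options can vary between CVM and SSVM releases), a DRL log plex is typically added to an existing mirrored volume with vxassist:

# vxassist -g hanfs addlog hanfs.1

The log subdisk is small, so placing it on a disk that is not heavily used by the volume's data plexes helps avoid the performance penalty mentioned above.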

The Veritas VxFS File System


The UNIX file system relies on full structural verification by the fsck command to recover from a system failure. This means checking the entire structure of a file system, verifying that it is intact, and correcting any inconsistencies that are found. This can be time consuming.

The VxFS file system provides recovery only seconds after a system failure by using a tracking feature called intent logging. Intent logging is a logging scheme that records pending changes to the file system structure. During system recovery from a failure, the intent log for each file system is scanned and operations that were pending are completed. The file system can then be mounted without a full structural check of the entire system. When the disk has a hardware failure, the intent log might not be enough to recover, and in such cases a full fsck check must be performed. Often, however, when failure is due to software rather than hardware, a system can be recovered in seconds.
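If the separately licensed VxFS software is installed, creating and mounting a VxFS file system on a volume uses the standard Solaris -F fstype syntax. A hedged sketch (the volume path and mount point are arbitrary examples):

# mkfs -F vxfs /dev/vx/rdsk/hanfs/hanfs.1
# mount -F vxfs /dev/vx/dsk/hanfs/hanfs.1 /hanfs1

Intent logging is intrinsic to VxFS, so no extra mount options are required to get the fast-recovery behavior described above.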
CVM and SSVM Post-Installation


You must complete additional configuration tasks before your cluster is operational.

Initializing the rootdg Disk Group


Until a disk group named rootdg is created on a system, the CVM or SSVM software cannot start. There are three ways to satisfy this requirement:
●  Create a dummy rootdg on a single small slice on any system disk.
●  Initialize any storage array disk and add it to the rootdg disk group.
●  Encapsulate the system boot disk.

Note – The dummy rootdg method is used in the lab exercise for this module.

Matching the vxio Driver Major Numbers
During software installation, device drivers are assigned a major number in the /etc/name_to_major file. Unless these numbers are the same on HA-NFS primary and backup host systems, the HA-NFS users receive Stale file handle error messages after an HA-NFS logical host migrates to a backup system. This effectively terminates the user session and destroys the high availability feature.

It makes no difference what the major numbers are as long as they agree on all of the host systems. All nodes associated with an HA-NFS logical host must be checked as follows:

# grep vxio /etc/name_to_major
vxio 45

Make sure that the vxio major number is the same in all of the files and is not already in use by another driver. Change one so that they all match or, if that is not possible, assign a completely new number in all of the files. If your boot disk is not encapsulated, you can stop all activity on the nodes and edit the /etc/name_to_major files so they all agree.

Note – If your boot disk has been encapsulated, the process is somewhat more complex. You should consult the Sun Cluster Software Installation Guide for detailed instructions.

Warning – You must stop the Sun Cluster software before making changes to the vxio driver major number.

StorEdge Volume Manager Dynamic Multi-Pathing
Dynamic Multi-Path Driver Overview
The dynamic multi-path driver (DMP) is unique to the SSVM product. It is used only with fiber-optic interface storage arrays. As shown in Figure 7-5, the DMP driver can access the same storage array through more than one path. The DMP driver automatically configures multiple paths to the storage array if they exist. Depending on the storage array model, the paths are used for load balancing or in a primary/backup mode of operation.

Figure 7-5    Dynamic Multi-Path Driver

The AP and DMP features are mutually exclusive. If they do a touch /kernel/drv/ap before they install SSVM, the DMP software will not install. The danger is that they think this will prevent it from operating post-SSVM install. It will not.

Disabling the Dynamic Multi-Path Feature
The DMP feature is not compatible with cluster operation and must be permanently disabled. During SSVM software installation, the DMP feature is automatically configured. DMP must be disabled and completely removed. The procedure is as follows:

1. Remove the vxdmp driver from the /kernel/drv directory.
   # rm /kernel/drv/vxdmp
2. Edit the /etc/system file and remove (comment out) the following line:
   forceload: drv/vxdmp
3. Remove the volume manager DMP files.
   # rm -rf /dev/vx/dmp /dev/vx/rdmp
4. Symbolically link /dev/vx/dmp to /dev/dsk and /dev/vx/rdmp to /dev/rdsk.
   # ln -s /dev/dsk /dev/vx/dmp
   # ln -s /dev/rdsk /dev/vx/rdmp
5. Perform a reconfiguration boot on the system.
   # reboot -- -r

Caution – There are versions of the patches 105463 and 106606 that partially re-enable DMP by installing /kernel/drv/vxdmp again. A reboot will fail. You will have to boot from CD-ROM to remove vxdmp again.

Exercise: Configuring Volume Management
Exercise objective – In this exercise you will do the following:

●  Install your target volume manager (CVM or SSVM)
●  Initialize your target volume manager (CVM or SSVM)
●  Create demonstration volumes using script files

Preparation
You must select the appropriate volume manager for installation during this exercise. When you installed the Sun Cluster software, you were required to select a volume manager at that time. Unless you install the same volume manager that you specified during the Sun Cluster installation, you might be missing critical support software. Ask your instructor about the location of the software that will be needed during this exercise. This includes software for CVM, SSVM, and some script files.

Caution – There are two methods of initializing the volume manager software in this exercise. Ask your instructor now which method you should use. Each of the cluster host boot disks should have a small unused slice. This is used for a dummy rootdg during the CVM/SSVM installation.

Creating a token rootdg disk group using a single slice on the system boot disk seems to be supported now. The procedure is in the Sun Cluster Software Installation Guide and is referenced in the Sun Cluster System Administration Guide. Impress on students the importance of getting the correct physical path to the small unused slice on the system boot disks. They can destroy the Solaris OS if they make a mistake.

Tasks
The following tasks are explained in this section:
●  Installing the CVM or SSVM software
●  Disabling Dynamic Multipathing
●  Initializing the CVM or SSVM software
●  Selecting disk drives for demonstration volumes
●  Configuring the demonstration volumes
●  Verifying cluster operation

Installing the CVM or SSVM Software
1. Move to the location of the CVM or SSVM software.
2. To install the Sun StorEdge Volume Manager (SSVM) software, verify that the following files are present:
   # ls
   SUNWasevm  SUNWvmdev  SUNWvmman  SUNWvmsa  SUNWvxvm  SUNWvxva
3. To install the Cluster Volume Manager (CVM) software, verify that the following files are present:
   # ls
   SUNWvmdev  SUNWvmman  SUNWvxva  SUNWvxvm
4. Run the pkgadd command on all cluster host systems to begin the volume manager installation.
   # pkgadd -d .
5. Select the all option unless you do not want to install the AnswerBook package SUNWasevm. You can enter a space-separated list of the package numbers you want to install.
6. Leave /opt as the default installation directory.
7. Do not install the Apache HTTPD package.
8. Reply yes to installing the StorEdge Volume Manager Server software.
9. Reply yes to installing the SUNWvxva package.
10. Answer yes to installing conflicting files.

Disabling Dynamic Multipathing (DMP)
1. Stop the Sun Cluster software on all of the cluster hosts.
   # scadmin stopnode
2. Remove the vxdmp driver from the /kernel/drv directory.
   # rm /kernel/drv/vxdmp
3. Edit the /etc/system file and remove (comment out) the following line:
   forceload: drv/vxdmp
4. Remove the volume manager DMP files.
   # rm -rf /dev/vx/dmp /dev/vx/rdmp
5. Symbolically link /dev/vx/dmp to /dev/dsk and /dev/vx/rdmp to /dev/rdsk.
   # ln -s /dev/dsk /dev/vx/dmp
   # ln -s /dev/rdsk /dev/vx/rdmp
6. Install the 106606 SSVM patch before proceeding.
7. Perform a reconfiguration boot on the system.
   # reboot -- -r

Creating a Simple rootdg Slice
Check with your instructor to see if you should use this procedure or the procedure in the Encapsulating the Boot Disk section that follows. The initrootdg script is in the training Scripts/VM directory.

Warning – Do not perform this procedure unless you are absolutely sure of the rootdg location on each of the nodes. This procedure will destroy an active, formatted partition.

The following is an example of the initrootdg script:

# more initrootdg
vxconfigd -m disable
wait
vxdctl init
wait
vxdg init rootdg
wait
vxdctl add disk $1 type=simple
wait
vxdisk -f init $1 type=simple
wait
vxdg adddisk $1
wait
vxdctl enable
wait
rm /etc/vx/reconfig.d/state.d/install-db

1. Locate and run the initrootdg script on all nodes, specifying the correct slice for each node's local boot disk.
   # initrootdg boot_disk_slice
   Note – A typical slice would be c0t0d0s7.
2. Reboot all cluster nodes.

Encapsulating the Boot Disk

Caution – Do not perform this section unless you have first checked with your instructor. The boot disks on your cluster host systems might not be properly configured for encapsulation.

1. On each cluster host, start the vxinstall utility.
2. Select installation option 2, Custom Installation.
3. Enter y (yes) at the Encapsulate Boot Disk prompt.
4. For all other disks and controllers, choose the Leave these disks alone option.
5. Read the summary of your choices carefully at the end of the interactive section. You can still quit without affecting any of the system disks.
6. After the boot disk has been encapsulated, you must reboot each cluster host system.
7. If your cluster host system has enough suitable disk drives, you can also mirror your system boot disk.

Note – Get assistance from your instructor before attempting to mirror the encapsulated boot disk. The vxdiskadm command is the easiest method of creating the mirror. You can use option 6, Mirror volumes on a disk.

Selecting Demonstration Volume Disks
The makedg.vm training script is used to create four mirrored volumes, each composed of two disks in one array and two mirror disks in another. Figure 7-6 shows the relationship of the disk drives.

Figure 7-6    Demonstration Volume Structure

Selecting Demonstration Volume Disks (Continued)
The makedg.vm script prompts you for the disk group name, the four disks to be put into the disk group, and then creates the disk group and two mirrored volumes in it. The drives are specified in the form c0t4d0.

●  Make sure that you run the volume creation script on an appropriate node for each disk group.

The makedg.vm script is run once for each disk group that you need to create.

Selecting Disks
Before you run the volume creation script, select and record the physical path to eight disks that are suitable for creating mirrored volumes.

Disk Group   Nodes   Volumes    Data Devices   Mirror Devices
hanfs                hanfs.1    disk01         disk02
                     hanfs.2    disk03         disk04
hadbms               hadbms.1   disk05         disk06
                     hadbms.2   disk07         disk08

Selecting Demonstration Volume Disks (Continued)
The following is an example of the makedg.vm script output. The hanfs disk group is created as a result.

# ./makedg.vm
what service would you like?
1. HA nfs
2. HA RDBMS
Enter choice (1|2) : 1
First Volume
Enter 2 disks: first data then mirror (Ex.c1t0d3 c2t0d4) c2t1d0 c3t33d0
Second Volume
Enter 2 disks: first data then mirror (Ex.c1t2d3 c2t2d4) c2t3d0 c3t35d0
Creating disk group hanfs
disk group hanfs built
Creating subdisks in disk group hanfs
Done with creating sd for hanfs
Creating plexes in disk group hanfs
Done with making plexes
Creating volumes in disk group hanfs
Done with creating volumes for hanfs
Enabling volumes in disk group
Done enabling volumes in group

Note – The script must be run a second time to create the hadbms disk group.

Configuring the CVM/SSVM Demonstration Volumes
Warning – The makedg.vm script can destroy existing data on the storage arrays if you specify incorrect drives. If you specify your boot drive, the script will destroy it.

The cluster does not need to be running during the creation of the disk groups.

Caution – Run the script on only one cluster node. Do not run it through the cluster console on multiple nodes.

1. Log in as user root on the cluster node.
2. After verifying that your DISPLAY environment variable is set to the Administration Workstation, start vxva on the cluster node and watch the creation process.
3. Change to the training scripts directory on the proper cluster node.
4. Run the makedg.vm script on Node 0 and create the hanfs disk group.

Caution – You must run this script twice, once for each disk group that is needed (hanfs, hadbms).

5. Run the makedg.vm script again on Node 0 and create the hadbms disk group.
6. Run newfs on each volume in each disk group you have created, as shown in the sketch following this list.
7. Examine the new volume structures using the vxprint command. Verify that all volumes are in an enabled state.
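The newfs step is run once per volume, using the raw volume device path. A sketch for the first volume (repeat for hanfs.2, hadbms.1, and hadbms.2, and answer y at the confirmation prompt):

# newfs /dev/vx/rdsk/hanfs/hanfs.1

The /dev/vx/rdsk/diskgroup/volume form is the raw counterpart of the /dev/vx/dsk/diskgroup/volume block paths used in the mount commands later in this exercise.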

Verifying the CVM/SSVM Demonstration File Systems
Before proceeding, you should verify that the file systems you created are functional. The file systems that you previously created are dependent on which volume manager you are using. They are as follows for the CVM/SSVM volume manager:

●  The hanfs.1 and hanfs.2 volumes in disk group hanfs
●  The hadbms.1 and hadbms.2 volumes in disk group hadbms

1. Create the file system mount points on each cluster host that will be associated with the logical hosts.
   # mkdir /hanfs1 /hanfs2
   # mkdir /hadbms1 /hadbms2
2. Manually mount each file system on one of the nodes to ensure that they are functional.
   # mount /dev/vx/dsk/hanfs/hanfs.1 /hanfs1
   # mount /dev/vx/dsk/hanfs/hanfs.2 /hanfs2
   # mount /dev/vx/dsk/hadbms/hadbms.1 /hadbms1
   # mount /dev/vx/dsk/hadbms/hadbms.2 /hadbms2

3. Verify the mounted file systems.
   # ls /hanfs1 /hanfs2 /hadbms1 /hadbms2
   Note – You should see a lost+found directory in each file system.

Verifying the CVM/SSVM Demonstration File Systems (Continued)
4. Create a test directory in the /hanfs1 file system for use later.
   # mkdir /hanfs1/test_dir
5. Make sure the directory permissions are correct.
   # cd /hanfs1
   # chmod 777 test_dir
   # cd /
6. Unmount all of the demonstration file systems.
   # umount /hanfs1
   # umount /hanfs2
   # umount /hadbms1
   # umount /hadbms2

Verifying the Cluster


1. Test the cluster by joining all nodes to the cluster.

Note – Remember to let the scadmin startcluster command finish completely before joining any other nodes to the cluster.

Exercise Summary
Discussion – Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.

Manage the discussion here based on the time allowed for this module, which was given in the About This Course module. If you find you do not have time to spend on discussion, then just highlight the key concepts students should have learned from the lab exercise.
●  Experiences

Ask students what their overall experiences with this exercise have been. You may want to go over any trouble spots or especially confusing areas at this time.
●  Interpretations

Ask students to interpret what they observed during any aspects of this exercise.
●  Conclusions

Have students articulate any conclusions they reached as a result of this exercise experience.
●  Applications

Explore with students how they might apply what they learned in this exercise to situations at their workplace.

Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:

❑  Explain the disk space management technique used by the Cluster Volume Manager (CVM) and the Sun StorEdge Volume Manager (SSVM)
❑  Describe the initialization process necessary for CVM and SSVM
❑  Describe how the CVM and SSVM products group disk drives
❑  List the basic status commands for CVM and SSVM
❑  Describe the basic software installation process for CVM and SSVM
❑  List post-installation issues relevant to CVM and SSVM
❑  Install and configure either CVM or SSVM

Think Beyond
Where does Volume Manager recovery fit into the high availability environment?

What planning issues are required for the Volume Manager in the high availability environment?

Is the use of the Volume Manager required for high availability functionality?

Volume Management Using SDS


Objectives
Upon completion of this module, you should be able to:
●  Explain the disk space management technique used by Solstice DiskSuite (SDS)
●  Describe the initialization process necessary for SDS
●  Describe how SDS groups disk drives
●  List the basic SDS status commands
●  Describe the basic SDS software installation process
●  List the post-installation issues relevant to SDS
●  Install and configure SDS

This module introduces some of the basic concepts of the Solstice DiskSuite volume manager.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion – The following questions are relevant to your learning the material presented in this module:

1. Which volume manager features are the most important to clustered systems?
2. What relationship do the volume managers have to normal cluster operation?
3. Are there any volume manager feature restrictions when they are used in the Sun Cluster environment?

Additional Resources
Additional resources – The following references can provide additional details on the topics discussed in this module:

●  Sun Cluster 2.2 System Administration Guide, part number 805-4238
●  Sun Cluster 2.2 Software Installation Guide, part number 805-4239
●  Sun Cluster 2.2 Cluster Volume Manager Guide, part number 805-4240


Disk Space Management


The Solstice DiskSuite software manages disk space by associating standard UNIX partitions with a data volume structure. A single disk drive can be divided into only seven independent data regions, which is the UNIX partition limit for each physical disk.

SDS Disk Space Management
As shown in Figure 8-1, SDS manages virtual volume structures by equating standard UNIX disk partitions with virtual volume structures.

Figure 8-1    SDS Space Management

Note – Slice 7 is reserved for state database storage on disks that are used in a diskset. Disks are automatically partitioned when they are first added to a diskset.
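As a minimal sketch of this partition-based approach (the metadevice name and slice are arbitrary examples, not part of the lab), a simple concatenated metadevice is built directly on an existing UNIX partition:

# metainit d6 1 1 c1t0d0s4

The single partition becomes the entire usable space of volume d6; any further subdivision of that disk has to come from additional partitions rather than from the volume manager itself.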


Solstice DiskSuite Initialization


Disk drives that are to be used by SDS do not need special initialization. The standard UNIX partitions are used without any modification. SDS needs a minimum of several small databases in which to store volume configuration information along with some error and status information. These are called state databases and are replicated on one or more disk drives. Another common term for the state databases is replicas. By default, SDS requires a minimum of three copies of the state database. The replicas are placed on standard unused partitions by a special command, metadb. The default size for each replica is 517 Kbytes (1034 disk blocks).
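A minimal sketch of creating the initial local replicas with the metadb command (the slice name is an arbitrary example of a small unused partition; -f forces creation of the first replicas and -c 3 places three copies on that slice):

# metadb -a -f -c 3 c0t2d0s7

On systems with more available drives, the same command without -c is simply repeated against slices on different disks to follow the distribution guidelines below.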

Replica Configuration Guidelines
At least one replica is required to start the SDS software. A minimum of three replicas is recommended. SDS 4.2 allows a maximum of 50 replicas. The following guidelines are recommended:
●  For one drive: Put all three replicas in one slice.
●  For two to four drives: Put two replicas on each drive.
●  For five or more drives: Put one replica on each drive.

Use your own judgment to gauge how many replicas are required (and how to best distribute the replicas) in your storage environment.

Note – You cannot store replicas on the root, swap, or /usr partitions, or on partitions containing existing file systems or data.

If multiple controllers exist on the system, replicas should be distributed as evenly as possible across all controllers. This provides redundancy in case a controller fails and also helps balance the load. If multiple disks exist on a controller, at least two of the disks on each controller should store a replica. Do not place more than one replica on a single disk unless that is the only way to reach the minimum requirement of three replicas.


SDS Disk Grouping


Disk groups are an arbitrary collection of physical disks that allow a backup host to assume a workload. The disk groups are given unique names and ownership is assigned to a single cluster host system. SDS uses the term diskset to define a related collection of disk drives. A shared diskset is a grouping of two hosts and disk drives that are accessible by both hosts. Each host can have exclusive access to a shared diskset; they cannot access the same diskset simultaneously.

Note – It is important to stress that the hosts do not share the disk drives in a shared diskset. They can take turns having exclusive access to a shared diskset, but they cannot concurrently access the drives in a shared diskset.

Disksets facilitate moving disks between host systems, and are an important piece in enabling high availability. Disksets also enable you to group storage by department or application.

Figure 8-2    Shared Disksets

●  A shared diskset is a grouping of two hosts and disk drives, which are physically accessible by both hosts and have the same device names on both hosts. Each host can have exclusive access to a shared diskset; they cannot access the same diskset simultaneously. Each host must have a local diskset that is separate from the shared diskset. There is one state database for each shared diskset and one state database for the local diskset.
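A minimal sketch of creating a shared diskset from the command line (the diskset name is an arbitrary example; the host names match the phys-mars and phys-venus hosts shown in Figure 8-2, and the drive names are illustrative):

# metaset -s hanfs -a -h phys-mars phys-venus
# metaset -s hanfs -a c1t0d0 c2t0d0

The first command defines the set and its two hosts; the second adds drives to the set, which is when SDS automatically repartitions them and places the diskset replicas described in the next section.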


Dual-String Mediators
Solstice DiskSuite has two different kinds of state databases. Initially, a local state database is replicated on each local boot disk. The local replicas are private to each host system. When a shared diskset is created, a different set of replicas is created that is unique to the diskset. Each shared diskset has its own set of replicas.

Shared Diskset Replica Placement
When a shared diskset is created on storage arrays, a different type of state database replica is automatically created for the diskset. The illustration in Figure 8-3 shows two different shared disksets that each have their own set of replicas. The replicas for each diskset are automatically balanced across storage arrays.

Figure 8-3    Diskset Replica Placement

Dual-String Mediation
With dual-string configurations (configurations with two disk strings, such as two SPARCstorage arrays or two SPARCstorage MultiPacks), it is possible that only one string is accessible at a given time. In this situation, it is impossible to guarantee a replica quorum. To resolve the dual-string limitation, the concept of mediators was introduced. Essentially, additional mediator data is stored in the memory of the host systems and is used to establish a replica quorum when one of the storage arrays fails.
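A minimal sketch of configuring mediators on a dual-string diskset (the diskset name is an arbitrary example, reusing the hosts from Figure 8-2), using the -m option of the metaset command to register the two hosts as mediator hosts:

# metaset -s hanfs -a -m phys-mars phys-venus

The mediator status can then be checked with the medstat command shown later in this module.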

Metatrans Devices
After a system panic or power failure, UFS file systems are checked at boot time with the fsck utility. The entire file system must be checked, and this can be time consuming. Solstice DiskSuite offers a feature called UFS Logging, sometimes referred to as journaling.

UFS logging takes the (logically) synchronous nature of updating a file system and makes it asynchronous. Updates to a file system are not made directly to the disk or partition containing the file system. Instead, user data is written to the device containing the file system, but the file system disk structures are not modified; they are logged instead. Updates to the file system structure are applied to the file system when: the log is detached and the file system is idle for 5 seconds; the log fills; or the device is cleared. Any changes made to the file system by unfinished system calls are discarded, ensuring that the file system is in a consistent state. This means that logging file systems do not have to be checked at boot time, speeding the reboot process.

As shown in Figure 8-4, a metatrans device is used to log a UFS file system. It has two components: a master device and a logging device. The master device contains the UFS file system, and has the same on-disk file system format as a non-logged UFS system. The logging device contains the log of file system transactions.

Figure 8-4	Solstice DiskSuite UFS Logging (metatrans device /dev/md/setname/dsk/d10, with the master device /dev/md/diskset/dsk/d11 holding the UNIX file system data and the logging device /dev/md/diskset/dsk/d14 holding the UFS log)

Both the master and logging devices must be mirrored to prevent data loss in the event of a device failure. Losing data in a log because of device errors can leave a file system in an inconsistent state, and user intervention might be required for repair.

Metatrans Device Structure
A typical metatrans device structure is illustrated in Figure 8-5. All components are mirrored; a sketch of the commands that could build this structure follows the figure.

Figure 8-5	Metatrans Device Structure (metatrans device /dev/md/disksetA/dsk/d10, with the UNIX file system on master mirror d11 (submirrors d12 and d13) and the UFS log on mirror d14 (submirrors d15 and d16), built on slices c1t0d0s0, c2t0d1s0, c1t0d1s6, and c2t1d0s6)
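The following is a minimal sketch of a metainit sequence that could build the structure in Figure 8-5. The diskset name and slice names are taken from the figure and are only illustrative, and the pairing of slices to submirrors is an assumption:

# metainit -s disksetA d12 1 1 c1t0d0s0
# metainit -s disksetA d13 1 1 c2t0d1s0
# metainit -s disksetA d11 -m d12
# metattach -s disksetA d11 d13
# metainit -s disksetA d15 1 1 c1t0d1s6
# metainit -s disksetA d16 1 1 c2t1d0s6
# metainit -s disksetA d14 -m d15
# metattach -s disksetA d14 d16
# metainit -s disksetA d10 -t d11 d14

The final metainit -t command ties the mirrored master (d11) and the mirrored log (d14) together into the metatrans device d10.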


SDS Status
Although the graphical user interface for SDS furnishes useful visual status information, there are times when the images might not update correctly or completely due to window interlocks or system loads. The most reliable and the quickest method of checking status is from the command line. Command line status tools have the additional benefits of being easy to use in script files, cron jobs, and remote logins.

Volume Manager Status
Checking Volume Status
Using metastat
The following metastat command output is for a mirrored metadevice, d0, and is used with the SDS volume manager.

# metastat d0
d0: Mirror
    Submirror 0: d80
      State: Okay
    Submirror 1: d70
      State: Resyncing
    Resync in progress: 15% done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 2006130 blocks

Note: You can also use the metastat command to create a backup configuration file that is suitable for recreating the entire volume structure. This is useful as a worst-case disaster recovery tool.

Checking Mediator Status


You use the medstat command to verify the status of mediators in a dual-string storage configuration.

# medstat -s boulder
Mediator        Status    Golden
dolphins        Ok        No
bills           Ok        No

Checking Replica Status
The status of the state databases is important and you can verify it using the metadb command, as shown in the following example. The metadb command can also be used to initialize, add, and remove replicas.

# metadb
        flags           first blk       block count
     a    u             16              1034            /dev/dsk/c0t3d0s5
     a    u             16              1034            /dev/dsk/c0t2d0s0
     a    u             16              1034            /dev/dsk/c0t2d0s1
 o - replica active prior to last configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/opt/SUNWmd/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors

The status flags for the replicas shown in the previous example indicate that all of the replicas are active and up to date.

Recording SDS Configuration Information

Diskset configuration information should be archived using the metastat -p command option. The configuration information is output in a format that can later be used to automatically rebuild your diskset volumes.

# metastat -s denver -p
denver/d100 -m denver/d0 denver/d10 1
denver/d0 1 1 /dev/did/dsk/d10s0
denver/d10 1 1 /dev/did/dsk/d28s0
denver/d101 -m denver/d1 denver/d11 1
denver/d1 1 1 /dev/did/dsk/d10s1
denver/d11 1 1 /dev/did/dsk/d28s1
denver/d102 -m denver/d2 denver/d12 1
denver/d2 1 1 /dev/did/dsk/d10s3
denver/d12 1 1 /dev/did/dsk/d28s3
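For disaster recovery, you can redirect this output to a file when you archive the configuration; the file name below is only an example:

# metastat -s denver -p > /var/denver.md.save

Later, if the diskset volumes ever have to be rebuilt, one approach is to append the saved lines to the /etc/opt/SUNWmd/md.tab file on the host that masters the recreated diskset and replay them:

# metainit -s denver -a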


SDS Post-Installation
Configuring State Database Replicas

Before you can perform any SDS configuration tasks, such as creating disksets on the multihost disks or mirroring the root (/) file system, you must create the metadevice state database replicas on the local (private) disks on each cluster node. The local disks are separate from the multihost disks. The state databases located on the local disks are necessary for basic SDS operation. A typical command to place three state database replicas on slice 7 of a system boot disk is as follows:

# metadb -a -c 3 -f c0t0d0s7

Configuring the Disk ID (DID) Driver

All new installations running Solstice DiskSuite require a DID pseudo driver to make use of disk IDs. DIDs enable metadevices to locate data independent of the device name of the underlying disk. Configuration changes or hardware updates are no longer a problem because the data is located by DID and not the device name. To create a mapping between a DID and a disk path, run the scdidadm -r command from only one node and then check the configuration using the scdidadm -L command. The following example shows the creation and verification of DID devices on a cluster with two dual-hosted StorEdge A5000 arrays.

# scdidadm -r
Configuring /devices and /dev; this may take a while.
# scdidadm -L
1    devsys1:/dev/rdsk/c0t0d0     /dev/did/rdsk/d1
2    devsys1:/dev/rdsk/c2t37d0    /dev/did/rdsk/d2
2    devsys2:/dev/rdsk/c3t37d0    /dev/did/rdsk/d2
3    devsys1:/dev/rdsk/c2t33d0    /dev/did/rdsk/d3
3    devsys2:/dev/rdsk/c3t33d0    /dev/did/rdsk/d3
4    devsys1:/dev/rdsk/c2t52d0    /dev/did/rdsk/d4
4    devsys2:/dev/rdsk/c3t52d0    /dev/did/rdsk/d4
5    devsys1:/dev/rdsk/c2t50d0    /dev/did/rdsk/d5
5    devsys2:/dev/rdsk/c3t50d0    /dev/did/rdsk/d5
6    devsys1:/dev/rdsk/c2t35d0    /dev/did/rdsk/d6
6    devsys2:/dev/rdsk/c3t35d0    /dev/did/rdsk/d6
7    devsys1:/dev/rdsk/c3t20d0    /dev/did/rdsk/d7
7    devsys2:/dev/rdsk/c2t20d0    /dev/did/rdsk/d7
8    devsys1:/dev/rdsk/c3t18d0    /dev/did/rdsk/d8
8    devsys2:/dev/rdsk/c2t18d0    /dev/did/rdsk/d8
9    devsys1:/dev/rdsk/c3t1d0     /dev/did/rdsk/d9
9    devsys2:/dev/rdsk/c2t1d0     /dev/did/rdsk/d9
10   devsys1:/dev/rdsk/c3t3d0     /dev/did/rdsk/d10
10   devsys2:/dev/rdsk/c2t3d0     /dev/did/rdsk/d10
11   devsys1:/dev/rdsk/c3t5d0     /dev/did/rdsk/d11
11   devsys2:/dev/rdsk/c2t5d0     /dev/did/rdsk/d11
12   devsys2:/dev/rdsk/c0t0d0     /dev/did/rdsk/d12
#

Configuring Dual-String Mediators

Although the shared-diskset replicas are automatically created and balanced between storage arrays in a dual-string configuration, the mediators must be configured manually.

1. Start the cluster software on all host systems.

2. Determine the hostname and private link address of the first mediator host.

   # hostname
   capri
   # ifconfig -a | grep 204.152.65
   inet 204.152.65.1 netmask fffffff0
   inet 204.152.65.33 netmask fffffff0
   inet 204.152.65.17 netmask fffffff0

3. Determine the hostname and private link address of the second mediator host.

   # hostname
   palermo
   # ifconfig -a | grep 204.152.65
   inet 204.152.65.2 netmask fffffff0
   inet 204.152.65.34 netmask fffffff0
   inet 204.152.65.18 netmask fffffff0

4. Use the hastat command to determine the current master of the diskset you are configuring for mediators.

5. Configure the mediators using the metaset command on the host that is currently mastering the diskset.

   capri# metaset -s disksetA -a -m capri,204.152.65.33
   capri# metaset -s disksetA -a -m palermo,204.152.65.34

6. Check the mediator status using the medstat command.

   capri# medstat -s disksetA

Note: The private links must be assigned as mediator host aliases.

Exercise: Configuring Volume Management
Exercise objective: In this exercise you will do the following:

•	Install the SDS volume manager
•	Initialize the SDS volume manager
•	Configure the DID driver
•	Create demonstration volumes using script files
•	Create dual-string mediators if appropriate

Preparation
You must select the appropriate volume manager for installation during this exercise. When you installed the Sun Cluster software you were required to select a volume manager at that time. Unless you install the same volume manager that you specified during the Sun Cluster installation you might be missing critical support software.

Ask your instructor about the location of the software that will be needed during this exercise. This includes software for SDS and some script files. Each of the cluster host boot disks must have a small unused slice that can be used for a state database during the SDS installation.


Impress on students the importance of getting the correct physical path to the small unused slice on the system boot disks. They can destroy the Solaris OS if they make a mistake.

Tasks
The following tasks are explained in this section:

•	Installing the SDS software
•	Configuring the SDS disk ID driver
•	Configuring the SDS state databases
•	Demonstration volume overview
•	Configuring the demonstration volumes
•	Cluster verification

Installing the SDS Software
1. Move to the location of the SDS software.

2. If you wish to install the SDS software you should see the following files:

   # ls
   SUNWmd    SUNWmdg    SUNWmdn

Note: The SUNWdid and SUNWmdn packages were added during the Sun Cluster software installation when you specified SDS as your volume manager.

3. Run the pkgadd command on all cluster host systems to begin the volume manager installation.

   # pkgadd -d `pwd`
   The following packages are available:
   1  SUNWmd     Solstice DiskSuite
                 (sparc) 4.2,REV=1998.02.09.12.47.28
   2  SUNWmdg    Solstice DiskSuite Tool
                 (sparc) 4.2,REV=1998.14.09.08.19.32
   3  SUNWmdn    Solstice DiskSuite Log Daemon
                 (sparc) 4.2,REV=1998.02.09.12.47.28
   Select package(s) you wish to process (or 'all' to process
   all packages). (default: all) [?,??,q]: 1 2

4. As shown in the previous step, install only the SUNWmd and SUNWmdg packages.

5. Install the current version of the SDS 4.2 patch, 106627, on all cluster host systems.

6. Stop the Sun Cluster software on all nodes.

7. Reboot all of the cluster hosts after you install the SDS patch.

Configuring the SDS Disk ID Driver

To balance device mapping between nodes as explained during the lecture, the Sun Cluster 2.2 software includes a new device driver and scripts to set up and manage the Disk ID (DID) pseudo devices. The scdidadm command is used to configure the Disk IDs.

Caution: Although you run the scdidadm command on only one of the cluster nodes, all nodes must be joined in the cluster.

1. Verify that all nodes are currently cluster members.

   # get_node_status

2. Start the scdidadm script on Node 0.

   # scdidadm -r

Note: You might see error messages about /etc/name_to_major number conflicts. Resolving the name_to_major number conflicts is addressed later in this exercise.

If the scdidadm command cannot discover the private links, you must run the command again and specify the names of the other nodes in the cluster.

   # scdidadm -r -H hostname1,hostname2,....

Note: Do not include the name of the node on which the scdidadm command is being run.

Resolving DID Driver Major Number Conflicts

The scdidadm command checks the /etc/name_to_major files on all nodes to verify that the DID driver major device numbers are identical. If they are not, you see the following message:

   The did entries in name_to_major must be the same on all nodes.

1. Examine the /etc/name_to_major files on all nodes and make sure that the DID driver major numbers are identical.

   host1# cat /etc/name_to_major | grep did
   did 158
   host2# cat /etc/name_to_major | grep did
   did 156
   host2# cat /etc/name_to_major | grep did
   did 156

If all DID driver major numbers are the same, skip to the Initializing the SDS State Databases section on page 8-28.

2. Look at the highest DID driver major number in the name_to_major files on all nodes and add 1 to it (one way to check the numbers in use is shown after these steps). Record the new number below.

   __________

3. Check the /etc/name_to_major files on all nodes and verify that the new number is not already being used.

4. Edit the /etc/name_to_major file on all the nodes and change the DID driver major number to the new value.

Warning: The new DID driver major number must not be in use by another driver. Consult with your instructor if you are not absolutely sure about this.
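One quick way to see the highest major number currently assigned on a node (useful for steps 2 and 3) is to sort the file numerically; this is only one possible approach:

# sort -k2 -n /etc/name_to_major | tail -1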

Once you have modified the name_to_major files, you must remove all the DID device structures that were created by the scdidadm command.

5. On each of the nodes where the name_to_major file was changed, run the following commands.

   # scadmin stopnode
   # rm -rf /devices/pseudo/did*
   # rm -rf /dev/did
   # rm -rf /etc/did.conf
   # reboot -- -r

6. After the reboot operations have completed, start the Sun Cluster software again on all cluster host systems.

7. On the node used to run the scdidadm command, run it again.

   # scdidadm -r

This should resolve any major device number conflicts across the nodes.

8. Use the following command to look at pseudo device names across all nodes in the cluster.

   # scdidadm -L
   1    devsys1:/dev/rdsk/c0t0d0     /dev/did/rdsk/d1
   2    devsys1:/dev/rdsk/c2t37d0    /dev/did/rdsk/d2
   2    devsys2:/dev/rdsk/c3t37d0    /dev/did/rdsk/d2
   3    devsys1:/dev/rdsk/c2t33d0    /dev/did/rdsk/d3
   3    devsys2:/dev/rdsk/c3t33d0    /dev/did/rdsk/d3
   4    devsys1:/dev/rdsk/c2t52d0    /dev/did/rdsk/d4
   4    devsys2:/dev/rdsk/c3t52d0    /dev/did/rdsk/d4
   5    devsys1:/dev/rdsk/c2t50d0    /dev/did/rdsk/d5
   5    devsys2:/dev/rdsk/c3t50d0    /dev/did/rdsk/d5
   6    devsys2:/dev/rdsk/c0t0d0     /dev/did/rdsk/d6

Initializing the SDS State Databases
Before you can use SDS to create disksets and volumes, the state database must be initialized and one or more replicas created. The system boot disk on each cluster host should be configured with a small unused partition. This should be slice 7.

1. On each node in the cluster, verify that the boot disk has a small unused slice available for use. Use the format command to verify the physical path to the unused slice. Record the paths of the unused slice on each cluster host. A typical path is c0t0d0s7.

   Node 0 Replica Slice: _______________
   Node 1 Replica Slice: _______________
   Node 2 Replica Slice: _______________

Warning: You must ensure that you are using the correct slice. A mistake can corrupt the system boot disk. Check with your instructor.

2. On each node in the cluster, use the metadb command to create three replicas on the boot disk slice.

   # metadb -a -c 3 -f c0t0d0s7

3. Verify that the replicas are configured on each node.

   # metadb
           flags           first blk       block count
        a    u             16              1034            /dev/dsk/c0t0d0s7
        a    u             1050            1034            /dev/dsk/c0t0d0s7
        a    u             2084            1034            /dev/dsk/c0t0d0s7

SDS Volume Overview
A script, makedg.sds, is used to create two disksets. Each diskset has three mirrored volumes. The relationship of the primary and mirror devices is shown in Figure 8-6.

Figure 8-6	Demonstration Volume Structure (Node 0 and Node 1 are attached through controllers c1 and c2 to two arrays; disksets hanfs and hadbms each contain volumes d100, d101, and d102, with the primary on one array and the mirror on the other)

The d100 volumes are 10-Mbytes and are used for a special administration file system in a later lab exercise. The d101 and d102 volumes are 250-Mbytes each.

The makedg.sds script is run twice, once for each diskset. It requires the DID path to pairs of disks along with the physical path to each of the disks. The script uses the physical path to partition the disk drives in an arrangement suitable for this lab exercise. The partition map is contained in a file called 9GB_vtoc. The disk partition structures must meet the following minimum requirements:

   Slice 0    10-Mbytes
   Slice 1    250-Mbytes
   Slice 2    The entire disk
   Slice 3    250-Mbytes
   Slice 7    ~ 3-Mbytes (2 cylinders)

The makedg.sds script file prompts for the name of the vtoc file. The 9GB_vtoc file is used for the 9-Gbyte disk drives found in the StorEdge A5000 arrays. There are also 1GB_vtoc and 2GB_vtoc files for use with the older SPARCstorage array product. If your arrays have different disk drives, you must manually repartition a disk and then save its vtoc information in a file. You can create a file as follows:

1. Repartition a disk drive to meet the space requirements outlined above.

2. Save the vtoc information in a file as shown in the following example:

   # prtvtoc /dev/rdsk/c3t0d0s2 > filename

3. Furnish the new vtoc file name to the makedg.sds script.
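If you ever need to copy the saved label to another disk of the same geometry by hand (the makedg.sds script normally handles the partitioning for you), the standard Solaris fmthard command can apply a saved vtoc file; the target device below is only an example:

# fmthard -s filename /dev/rdsk/c3t1d0s2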

Selecting SDS Demo Volume Disk Drives

1. Use the scdidadm command on Node 0 to list all of the available DID drives.

   # scdidadm -l

2. Record the physical and DID paths of four disks that are used to create the demonstration volumes. Remember to mirror across arrays. An entry might look like: d4 c2t50d0

   Diskset    Volumes             Primary    Mirror
   hanfs      d100, d101, d102    disk01     disk02
   hadbms     d100, d101, d102    disk03     disk04

Note: You need to record only the last portion of the DID path. The first part is the same for all DID devices: /dev/did/rdsk.

Caution: All disks used in each diskset must be the same geometry (that is, they must all be the same type of disk).

Configuring the SDS Demonstration Volumes

Slice 7 on the disks is used by Solstice DiskSuite to store diskset state databases. Using the list obtained from the scdidadm -l command, create two disksets using the makedg.sds script file in the Scripts/SDS directory (a rough sketch of the kind of commands such a script runs appears after these steps).

1. On Node 0, run the makedg.sds script file to create the diskset called hanfs.

2. On Node 0, run the makedg.sds script file again to create the diskset called hadbms.

3. Verify the status of the new disksets.

   # metaset -s hanfs
   # metastat -s hanfs
   # metaset -s hadbms
   # metastat -s hadbms
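The makedg.sds script itself is supplied in the classroom environment. The following is only a hypothetical sketch of the kind of commands such a script issues for one diskset and one mirrored volume; the node names and DID devices are placeholders:

# metaset -s hanfs -a -h node0 node1
# metaset -s hanfs -a /dev/did/rdsk/d4 /dev/did/rdsk/d7
# metainit -s hanfs d1 1 1 /dev/did/dsk/d4s1
# metainit -s hanfs d11 1 1 /dev/did/dsk/d7s1
# metainit -s hanfs d101 -m d1
# metattach -s hanfs d101 d11
# newfs /dev/md/hanfs/rdsk/d101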

Configuring Dual-String Mediators

If your cluster is a dual-string configuration, you must configure mediation for both of the disksets you have created.

1. Make sure the cluster software is running on the cluster hosts.

2. Determine the hostname and private link address of the first mediator host (Node 0) using the hostname and ifconfig -a commands. Record the results below.

   Node 0 Hostname: _______________
   Node 0 Private Link Address: _______________

Note: The private link address is either 204.152.65.33, 204.152.65.34, 204.152.65.35, or 204.152.65.36.

3. Record the hostname and private link address of the second mediator host (Node 1).

   Node 1 Hostname: _______________
   Node 1 Private Link Address: _______________

4. Use the metaset command to determine the current master of the disksets you are configuring for mediators.

Note: Both the hanfs and hadbms disksets should be mastered by Node 0.

5. Configure the mediators using the metaset command on the host that is currently mastering the diskset.

   # metaset -s hanfs -a -m node0_name,204.152.65.33
   # metaset -s hanfs -a -m node1_name,204.152.65.34
   # metaset -s hadbms -a -m node0_name,204.152.65.33
   # metaset -s hadbms -a -m node1_name,204.152.65.34

6. Check the mediator status using the medstat command.

   # medstat -s hanfs
   # medstat -s hadbms

Verifying the SDS Demonstration File Systems
Before proceeding, you should verify that the file systems you created are functional. The file systems that you previously created are dependent on which volume manager you are using. They are as follows for the SDS volume manager:

•	The d101 and d102 volumes in diskset hanfs
•	The d101 and d102 volumes in diskset hadbms

1. Create the file system mount points on each cluster host that will be associated with the logical hosts.

   # mkdir /hanfs1 /hanfs2
   # mkdir /hadbms1 /hadbms2

2. Manually mount each file system on one of the nodes to ensure that they are functional.

   # mount /dev/md/hanfs/dsk/d101 /hanfs1
   # mount /dev/md/hanfs/dsk/d102 /hanfs2
   # mount /dev/md/hadbms/dsk/d101 /hadbms1
   # mount /dev/md/hadbms/dsk/d102 /hadbms2

3. Verify the mounted file systems.

   # ls /hanfs1 /hanfs2 /hadbms1 /hadbms2

Note: You should see a lost+found directory in each file system.

4. Create a test directory in the /hanfs1 file system for use later.

   # mkdir /hanfs1/test_dir

5. Make sure the directory permissions are correct.

   # cd /hanfs1
   # chmod 777 test_dir
   # cd /

6. Unmount all of the demonstration file systems.

   # umount /hanfs1
   # umount /hanfs2
   # umount /hadbms1
   # umount /hadbms2

Verifying the Cluster


1. Test the cluster by joining all nodes to the cluster.

Note: Remember to let the scadmin startcluster command finish completely before joining any other nodes to the cluster.
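If the nodes are not already cluster members, the join sequence typically looks like the following sketch; the node name and cluster name are placeholders.

On the first node:

# scadmin startcluster node0 sc-cluster

On each remaining node, after the first command completes:

# scadmin startnode sc-cluster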

Exercise Summary
Discussion: Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.

Manage the discussion here based on the time allowed for this module, which was given in the About This Course module. If you find you do not have time to spend on discussion, then just highlight the key concepts students should have learned from the lab exercise.
•	Experiences

Ask students what their overall experiences with this exercise have been. You may want to go over any trouble spots or especially confusing areas at this time.

•	Interpretations

Ask students to interpret what they observed during any aspects of this exercise.

•	Conclusions

Have students articulate any conclusions they reached as a result of this exercise experience.

•	Applications

Explore with students how they might apply what they learned in this exercise to situations at their workplace.

Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:

❑	Explain the disk space management technique used by Solstice DiskSuite (SDS)
❑	Describe the initialization process necessary for SDS
❑	Describe how SDS groups disk drives
❑	List the basic SDS status commands
❑	Describe the basic SDS software installation process
❑	List the post-installation issues relevant to SDS
❑	Install and configure SDS

Think Beyond
Where does SDS fit into the high availability environment?

What planning issues are required for SDS in the high availability environment?

Is use of SDS required for high availability functionality?


Cluster Configuration Database


Objectives
Upon completion of this module, you should be able to:
•	Describe the Cluster Database (CDB) and its operation
•	Describe the Cluster Configuration Database (CCD) and its operation
•	List the advantages of a shared CCD volume
•	Manage the contents of the cluster configuration files

This module describes how the Sun Cluster environment configuration information is stored and managed.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion: The following questions are relevant to understanding the content of this module:

1. What type of information about the cluster configuration do you need to keep?

2. How do you update and share this information between nodes?

Additional Resources
Additional resources: The following references can provide additional details on the topics discussed in this module:

•	Sun Cluster 2.2 System Administration Guide, part number 805-4238
•	Sun Cluster 2.2 Software Installation Guide, part number 805-4239


Cluster Configuration Information

Each cluster node maintains local copies of the databases. The cluster configuration databases contain both local and cluster-wide configuration information. Critical Sun Enterprise Cluster configuration and status information is maintained in two cluster-wide database files, clustname.cdb and ccd.database, which are located in the /etc/opt/SUNWcluster/conf directory.

Warning: You should not manually modify either of these databases. The slightest mistake could leave your cluster unusable. You should regularly back up the directory where the databases reside.
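One simple way to take such a backup is to archive the entire configuration directory; the archive file name below is only an example:

# tar cvf /var/tmp/SUNWcluster-conf.tar /etc/opt/SUNWcluster/conf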

The CDB Database
The CDB database contains general cluster configuration information that is used during cluster reconfiguration, including:

•	The cluster name and all node names
•	A cluster application software profile
•	Pre-assigned interconnect addresses of all cluster hosts
•	Scheduling, priority, and timeout information for cluster software
•	Quorum device mappings

A CDB database template is loaded and configured during the Sun Cluster software installation on the cluster hosts. This database is seldom modified after the initial Sun Cluster software installation.

The CDB Format


The CDB database has a simple format for every line:

   variable name: value

The following shows typical entries in the CDB database file.

cluster.cluster_name: currency
cluster.number.nodes : 2
cluster.number.nets: 2
cluster.node.0.hostname: dollar
cluster.node.1.hostname: penny
# cluster interconnect section #
cluster.node.0.if.0: scid0
cluster.node.0.phost.0: 204.152.65.1
cluster.node.0.if.1: scid1
cluster.node.0.phost.1: 204.152.65.17
cluster.node.1.if.0: scid0
cluster.node.1.phost.0: 204.152.65.2
cluster.node.1.if.1: scid1
cluster.node.1.phost.1: 204.152.65.18

The CCD Database
The CCD database is a special purpose database that has multiple uses depending on your cluster application. The CCD database contains:

•	Logical host configuration information
•	Data service status
•	Data service instance management information

A CCD database template is loaded during the Sun Cluster software installation on the cluster hosts. Data is added to it as changes are made to disk groups, NAFO groups, or logical host configurations.

The CCD Database Format


The CCD database is a true database that uses many different formats. As shown in the following example, each entry is associated with a major key that has a unique associated format.

# Cluster disk group
CDG_fmt:prim_node:backup_node:dg
CDG:node0:node1:dg
# Logical IP address
LOGIP_fmt:nodelist:iflist:ipaddr:logif
LOGIP:node0,node1:hme0,hme0:129.146.237.136:2

Many different formats are used in the CCD database. Almost all of them store information about the structure, control, and status of logical hosts.

CCD Database Files
There are several different files associated with the CCD. All of them are located in the /etc/opt/SUNWcluster/conf directory. Following is a brief summary of the files.

•	ccd.database.init

The init CCD file contains static configuration parameters used to start the ccdd daemon. Only major configuration changes, such as adding an additional node, affect the ccd.database.init file.

•	ccd.database

This file contains the dynamic CCD database. When the SC software is installed, the dynamic and init CCD files are both created using default entry values. The ccd.database file is regularly modified by changes in things such as PNM status, logical host switches, and logical host state changes, such as when they are placed in maintenance mode.

•	ccd.database.pure

The ccdadm command can be used to verify the correctness of the ccd.database file. Any error lines in the original file are removed and a purified version of the original is created. The purified version does not have the embedded checksum. It can be used to create a new active ccd.database file with the ccdadm command.

•	ccd.database.shadow

Whenever the ccd.database file is to be updated, a shadow copy is made before the update process can continue. The shadow copy can be used if something goes wrong during the update process.


Cluster Database Consistency


The consistency of the CDB and CCD databases between cluster hosts is checked any time a node joins or leaves the cluster.

Data Propagation
The data in the CDB database is seldom modified after the initial Sun Cluster software installation. The CDB database file is modified as the result of using certain scadmin command options. You must be careful; the CDB changes that are made with some scadmin command options do not propagate to the CDB files on all nodes.

The CCD data is modified whenever you perform administrative duties such as creating a new logical host. When you make any modification to the CCD database, the changes are automatically propagated to the CCD databases on all current members of the cluster. If a node is not currently in the cluster, it does not get the changes. This can lead to consistency problems when the node tries to join.

The CCD Update Protocol
As shown in Figure 9-1, any CCD updates must be arbitrated by the CCD master. The master is the cluster member with the lowest node identifier. The update process is handled by the ccdd daemons that run on each node.

Figure 9-1	CCD Update Control (an update request to the ccdd master causes it to freeze CCD activity on the other nodes over the SCI switch and then propagate the change to each node's CCD)

The CCD master node handles a change request as follows:


•	The CCD master freezes all CCD activity on other nodes.
•	The CCD master incorporates the changes into its local CCD.
•	The CCD master propagates the CCD changes to all other nodes.

Note: Just prior to starting a ccd.database update, the ccdd daemon makes a shadow copy of the ccd.database file, called ccd.database.shadow, that can be used if anything goes wrong during the update process.

Database Consistency Checking
During cluster reconfiguration, the consistency of both databases is checked between the proposed cluster members. The majority opinion about the CDB and CCD content takes precedence. A node that does not agree with the majority opinion either cannot join the cluster or is dropped from current membership. Additionally, each node's CCD database contains a locally generated checksum that allows local detection of corruption. If a node detects corruption this way, it excludes itself from the cluster.

Database Majority
During cluster reconfiguration, the consistency of the CDB and CCD databases is checked between potential cluster members. If there is a majority consensus, a node with a CCD inconsistency can automatically get a new copy of the CCD downloaded from the current CCD master node. This is normally possible only when there are two nodes already joined in the cluster and a third node is trying to join.

If there is only one node in a cluster and a second node tries to join, it is not possible to have a majority opinion. Either node could have the wrong information. The new potential member is not allowed to join if there is any disagreement about CDB or CCD content.

Unless a special mirrored CCD volume has been configured, you cannot perform command operations that modify the contents of the CCD database unless two or more nodes are cluster members.

Note: There is no mechanism to automatically correct defective CDB databases.


Shared CCD Volume


In a two-node cluster, if only one node is in the cluster, you cannot modify the CCD database. Also, if the second node joins the cluster, there is no way to establish a majority opinion about CCD integrity. In a two-node cluster, you can create a third CCD database that is resident on the storage arrays. In the case of a single node failure, there are two CCD files to compare to ensure integrity. The shared CCD is not supported in Solstice DiskSuite installations.

Note: You can use the shared CCD only with a two-node cluster.

As shown in Figure 9-2, you can keep an additional copy of the CCD on a mirrored disk volume that is shared (physically connected) between two nodes.

Figure 9-2	Shared CCD Volume Configuration (each node has its local ccd.database, plus a shared copy on the mirrored ccdvol volume in the sc_dg disk group on the mass storage)

If you are running a two-node cluster, the scinstall procedure asks you if you want to have a shared CCD. Even if you reply no, you can create a shared CCD at any time. The advantage of a shared CCD is that you can update it with only one node in the cluster. With an unshared CCD, both nodes must be in the cluster to make any changes. You can convert a CCD to and from a shared configuration after installation if necessary.

Note: If you are upgrading from Sun Cluster 2.0, and you were using a shared CCD, you must rerun the confccdssa program or the upgraded cluster will not start properly.

Shared CCD Operation
The general functionality of a shared CCD volume is outlined below.

When Node A leaves:

•	Node B imports the sc_dg disk group
•	Node B copies its ccd.database to the shared CCD
•	Node B removes its local ccd.database file

When Node A returns:

•	The shared CCD is copied to both ccd.database files
•	The shared CCD is renamed ccd.shadow
•	Node A remounts the ccdvol file system

Creating a Shared CCD


You can create the shared CCD volume either by replying yes to its creation during the scinstall process and then using the confccdssa script after installation, or you can use the scconf and confccdssa commands together after the Sun Cluster installation. If you did not reply yes to the shared CCD question during scinstall processing, run the following command on both nodes in the cluster:

# scconf clustername -S ccdvol
Checking node status...
Purified ccd file written to /etc/opt/SUNWcluster/conf/ccd.database.init.pure
There were 0 errors found.

You must run the confccdssa command on only one node. The confccdssa command searches for suitable disk drives on the system and prompts you to select a pair as follows:

# confccdssa clustername
The disk group sc_dg does not exist.
Will continue with the sc_dg setup.
Please, select the disks you want to use from the following list:
1) SSA:00000078C8A0
2) SSA:000000722F83
Device 1: 1
1) t0d0
2) t0d1
3) t0d2
Disk: 3
Disk c0t0d2s2 with serial id 00142458 in SSA 00000078C8A0
has been selected as device 1.
Select devices from list.
1) SSA:00000078C8A0
2) SSA:000000722F83
Device 2: 2
1) t0d2
Disk: 1
Disk c2t0d2s2 with serial id 01186928 in SSA 000000722F83
has been selected as device 2.
newfs: construct a new file system /dev/vx/rdsk/sc_dg/ccdvol: (y/n)? y

Note: For clarity, several comment fields have been removed from the confccdssa command output.

Disabling a Shared CCD
To disable shared CCD operation, run:

# scconf clustername -S none

This will modify the ccd.database.init file.

Note: This does not remove the disk group.


CCD Administration
You use the ccdadm command to perform a number of administrative operations on the ccd.database files.

Caution: Do not manually modify the CCD database file. It has a checksum and other consistency information written at the end of it. Manual editing corrupts the database file.

Verifying CCD Global Consistency


Use the following command from any node to verify global CCD consistency:

# ccdadm clustername -v

Checkpointing the CCD
The following command makes a backup copy of the CCD:

# ccdadm clustername -c filename

Note: Do not copy the ccd.database file using the cp command. If there are any modifications in progress, using the cp command could cause corruption. The ccdadm -c option freezes all modifications while copying the file. This prevents corruption.

Restoring the CCD From a Backup Copy


The following command uses a CCD backup to restore the CCD:

# ccdadm clustername -r restore_filename

Caution: The restore operation can be global in nature. If possible, ask for assistance from your field representative before performing the ccdadm restore operation.

Creating a Purified Copy of the CCD

The following command can be used to check for syntax errors in the ccd.database file. A file named ccd.database.pure is created.

# ccdadm clustername -p ccd.database

Any errors will be reported during the purify operation.

Disabling the CCD Quorum
If you must perform operations that modify the CCD with only one node in the cluster, you can disable the mechanism that prevents CCD modification with the following command:

# ccdadm clustername -q off|on

If you have a mirrored CCD volume, you do not need to disable the CCD quorum mechanism.


Make a strong point about how dangerous disabling the CCD quorum can be.

Note: If there are any CCD processing errors, check the CCD log file in /var/opt/SUNWcluster/ccd/ccd.log.

Recommended CCD Administration Tasks


Regular backups of the CCD should be made as follows:

•	A daily cron job that uses the ccdadm -c command (an example crontab entry is shown below)
•	Use ccdadm -c manually before and after any CCD changes
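A minimal sketch of such a root crontab entry, assuming the default /opt/SUNWcluster/bin command location and an example backup directory, might be:

# run daily at 01:00; checkpoint the CCD to a dated backup file
0 1 * * * /opt/SUNWcluster/bin/ccdadm clustername -c /var/ccd_backup/ccd.backup.`date +\%Y\%m\%d`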

Common Mistakes
A common CCD-related error is to specify a shared CCD during the Sun Cluster installation and not complete the post-installation creation of the CCD mirrored volume. When you attempt to start the cluster, you will see CCD freeze errors. You must either complete the CCD mirrored volume creation or disable the shared CCD feature using the scconf command as follows:

# scconf clustername -S none

Exercise: CCD Administration
Exercise objective: In this exercise you will do the following:

•	Verify the global CCD consistency
•	Checkpoint a ccd.database file
•	Create a purified copy of the ccd.database file

Preparation
There is no preparation for this exercise. Your cluster systems should have been left in the appropriate state at the end of the previous module exercise.

Tasks
The following task is explained in this section:

•	Maintaining the CCD database

Maintaining the CCD Database
It is good practice to make backup copies of the CCD database before making configuration changes, such as setting up one of the high availability data services. This can be done with the ccdadm command. You can also use ccdadm to check the node-to-node consistency of the CCD database.

1. Verify that all nodes are cluster members with the get_node_status command.

2. On Node 0, verify the global CCD consistency.

   # ccdadm clustername -v

3. Make a backup copy of the CCD database on each of the cluster host systems.

   # cd /etc/opt/SUNWcluster/conf
   # ccdadm clustername -c ./ccd.backup

Note: Do not copy the ccd.database file using the cp command.

4. On any of your cluster hosts, create a purified copy of the ccd.database file.

   # ccdadm clustername -p ./ccd.database

Exercise Summary
Discussion: Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.

Manage the discussion here based on the time allowed for this module, which was given in the About This Course module. If you find you do not have time to spend on discussion, then just highlight the key concepts students should have learned from the lab exercise.
•	Experiences

Ask students what their overall experiences with this exercise have been. You may want to go over any trouble spots or especially confusing areas at this time.

•	Interpretations

Ask students to interpret what they observed during any aspects of this exercise.

•	Conclusions

Have students articulate any conclusions they reached as a result of this exercise experience.

•	Applications

Explore with students how they might apply what they learned in this exercise to situations at their workplace.

Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:

❑	Describe the Cluster Database and its operation
❑	Describe the Cluster Configuration Database and its operation
❑	List the advantages of a shared CCD volume
❑	Manage the contents of the cluster configuration files

Think Beyond
When would you disable the CCD update quorum requirement?

What would it take to have information defined for all nodes, even if the nodes are offline?


Public Network Management


Objectives
Upon completion of this module, you should be able to:

•	Explain the need for Public Network Management (PNM)
•	Describe how PNM works
•	Configure a network adapter failover (NAFO) group
•	Disable the Solaris Operating Environment Interface Groups feature

This module describes the operation, configuration, and management of the Sun Cluster Public Network Management mechanism.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion: The following questions are relevant to understanding this module's content:

1. What happens if a fully functional cluster node loses its network interface to a public network?

2. Are there alternatives to failing over all logical hosts from a cluster if the public network interface is lost?

Additional Resources
Additional resources: The following references can provide additional details on the topics discussed in this module:

•	Sun Cluster 2.2 System Administration Guide, part number 805-4238
•	Sun Cluster 2.2 Software Installation Guide, part number 805-4239


Public Network Management


The Public Network Management (PNM) software creates and manages designated groups of local network adapters. The PNM software is a Sun Cluster package that provides IP address and adapter failover within a designated group of local network adapters. It is designed for use in conjunction with the HA data services. The network adapter failover groups are commonly referred to as NAFO groups.

If a cluster host network adapter fails, its associated IP address is transferred to a local backup adapter. Each NAFO backup group on a cluster host is given a unique name, such as nafo12, during creation. A NAFO group can consist of any number of network adapter interfaces but usually contains only a few. The numbers assigned to each NAFO group can be any value as long as the total number of NAFO groups does not exceed 256 on a node.

Note: The full discussion of using NAFO groups in conjunction with logical hosts and data services is presented in another lecture.
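Once NAFO groups exist (they are created with the pnmset command described later in this module), their state can typically be checked from the command line; this sketch assumes the Sun Cluster PNM package is installed on the node and uses the nafo12 group name only as an example:

# pnmstat -l
# pnmptor nafo12

The first command lists the status of the local NAFO backup groups, and the second reports which physical adapter is currently active in the named group.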

As shown in Figure 10-1, the PNM daemon (pnmd) continuously monitors designated network adapters on a single node. If a failure is detected, pnmd uses information in the cluster configuration database (ccd) and the pnmconfig file to initiate a failover to a healthy adapter in the backup group.

Figure 10-1	Public Network Management Components (the pnmd daemon on each node monitors the primary adapter of its NAFO group, such as nafo12 on Node 0 and nafo7 on Node 1, and uses ifconfig to activate the backup adapter; the nodes exchange status over the CIS, NAFO group IP addresses come from the ccd, and NAFO group configuration comes from /etc/pnmconfig)

The PNM software functions at two levels. It is started at boot time but has limited failure detection until the Sun Cluster software is started. At boot time, the PNM software can detect a local interface failure and switch to the designated backup interface(s) but does not perform more complex testing until a data service associated with a NAFO group is running.


The Network Monitoring Process


If a public network adapter indicates a lack of connectivity, PNM determines whether the fault lies with the adapter or the public network itself. Clearly, if the fault lies with the network itself, there is no recovery action that the server can take. However, something can be done if the failure is in the host server's network adapter. A backup network adapter is activated to replace the failed network adapter on the cluster host, and the failed adapter's addresses fail over to the new adapter on the same host. This avoids having to move the entire server workload to another server due to the loss of a single network adapter.

What Happens?
When monitoring for network faults, the PNM software must determine where the failure is before taking action. The fault could be a general network failure and not a local adapter. PNM can use the cluster interconnect system (CIS) to find out if other nodes are also experiencing network access problems. If the problem is being seen by other nodes (peers), then the problem is probably a general network failure and there is no need to take action.

If the detected fault is determined to be the fault of the local adapter, PNM notifies the network failover component to begin an adapter failover, which is transparent to the highly available data services. If a general network failure is identified, or if a remote adapter has failed, PNM does not perform an unnecessary adapter failover.

Highly available data services, such as HA-NFS, can query the physical net status (using the HA-API framework) if they experience data service loss to their clients. The HA-API framework then uses this information to determine:
•	Whether to migrate the data service
•	Where to migrate the data service

The Sun Cluster Manager displays network adapter status, which system administrators can use during problem diagnosis.


How PNM Works


The PNM daemon is based on a Remote Procedure Call (RPC) client-server model. It is started at boot time in an /etc/rc3.d script and killed in /etc/rc2.d. PNM uses the CCD for storing distributed state information for the adapter monitoring test results on the various hosts. HA data services can query the status of the remote adapters at any time using the HA-API framework. The PNM software can work with backup network adapters on the same subnet that have different Ethernet or FDDI Media Access Control (MAC) addresses.


PNM checks even if the adapter is down. PNM turns the adapter back on if it starts working again.

PNM Support Issues
Network Trunking
Network trunking is the ability to use multiple network adapters to transfer data as one, providing a higher bandwidth. These adapters are logically grouped by special trunking software that provides this capability. Network trunking is not supported by the Sun Cluster software.

Interface Groups Feature


The Solaris 2.6 Operating Environment provides an Interface Groups feature that resolves an old routing problem with multiple network interfaces on the same network or subnet. The Interface Groups feature is enabled by default. The Interface Groups feature can cause logical host switchovers to fail if it remains enabled. You must disable the Interface Groups feature by adding the following line to the /etc/system file on all cluster hosts:

set ip:ip_enable_group_ifs=0
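A minimal way to add the line and confirm it is present is shown below; the change takes effect at the next reboot of each node:

# echo "set ip:ip_enable_group_ifs=0" >> /etc/system
# grep ip_enable_group_ifs /etc/system
set ip:ip_enable_group_ifs=0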


For the Solaris 2.6 Operating Environment, there is a bug (4077132) related to disabling Interface Groups; you must install the 105786-08 patch.

Supported Interface Types


PNM supports the following public network interface types:

•	le
•	qe
•	be
•	hme
•	qfe
•	FDDI bf and nf
•	ATM (LAN emulation mode only)
•	Token Ring


PNM Monitoring Routines


The three main PNM routines, TEST, DETERMINE_NET_FAILURE, and FAILOVER, monitor the local NAFO interfaces for traffic and, if necessary, ask other cluster members for information. If the routines determine that an interface is bad, they switch to the next backup interface. This progresses until there are no more backup interfaces in the group. This usually initiates the failover of the related logical host to another system.

TEST Routine
The TEST routine monitors for network traffic every 5 seconds. If there has been none over the last 5 seconds, it solicits traffic by sending ping commands. If after 5 more seconds there still has been no traffic, it marks the NAFO group in DOUBT status. The TEST routine performs the following steps as necessary:

1. Get the event count, old_count, on the adapter.


Event count comes from kstat. Events are I/O packets sent and received.

2. Sleep for 5 seconds.

3. Get the event count, new_1_count, on the adapter.

4. If new_1_count is not equal to old_count, set the backup group status to OK and return the status to caller.
 

Network traffic is coming in; everything is okay. Most of the time is spent cycling through steps 1-4.

5. Solicit network traffic with a ping command on the subnet:


•	First try IP routers multicast (224.0.0.2)
•	Then try IP hosts multicast (224.0.0.1)
•	Finally try broadcast ping (255.255.255.255)

Network must be very quiet to get this far, or adapter is down, or network is down.

6. Sleep for 5 seconds.

7. Get the event count, new_2_count, on the adapter.

8. If new_2_count is not equal to old_count, set the backup group status to OK and return the status to caller.


The ping packets arrived, everything is okay.

9. Set the backup group status to DOUBT.




Something is wrong. There must be a network or adapter failure.

10. Return the status to caller.
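You can observe the same kind of information the TEST routine relies on from an ordinary shell prompt. For example (a sketch only; the adapter name and the choice of multicast address depend on your configuration):

# netstat -i      (watch the input and output packet counters for the adapter)
# ping 224.0.0.1  (solicit traffic with the IP hosts multicast address)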

PNM Monitoring Routines
FAILOVER Routine
The FAILOVER routine fails over to another adapter in the group, if possible, when the failure has been determined to be the adapter itself. If there are no working adapters, the entire group is marked DOWN. This usually causes logical host failover. The FAILOVER routine performs the following steps as necessary:

1. Call the DETERMINE_NET_FAILURE routine.

2. If the backup group status is NET_DOWN, return to caller.


The network is bad, not just the adapter, so an adapter failover is not attempted.

3. For each adapter in the backup group:




Allow 30-50 seconds per adapter for failure confirmation.

a. Get the logical IPs configured on the failed adapter.

b. Configure the primary adapter as DOWN.

c. Configure the next adapter in the backup group as UP.

d. Configure the logical IPs on the new adapter.

e. If a test of the new adapter is OK, return to caller.

It takes about .5-5 seconds to configure the IP addresses on the new adapter, depending on the number of IP addresses being moved. Four hundred logical IP addresses take about four seconds. You are finished with failover.

4. Set the backup group status to DOWN.




The failover failed.

5. Return to caller.
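Conceptually, the adapter failover performed in Step 3 is similar to the following manual sequence (a sketch only; the adapter names and the logical IP address are hypothetical, and PNM performs the equivalent actions for you automatically):

# ifconfig hme0 down                                   (failed primary adapter)
# ifconfig hme1 plumb
# ifconfig hme1 129.50.20.3 netmask + broadcast + up   (logical IP now on the backup adapter)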

PNM Monitoring Routines
DETERMINE_NET_FAILURE Routine
The DETERMINE_NET_FAILURE routine asks peer nodes (other cluster nodes) if they have any information. It then determines whether the local adapter or something else has failed. The DETERMINE_NET_FAILURE routine performs the following steps as necessary:

1. Using the CCD, check the adapter status of your peer nodes' adapters.


A peer adapter is one which belongs to a remote host and shares the same subnet as the backup group you are checking.

2. If even one peer node adapter has a status of OK, return to the FAILOVER routine.


The network is okay.

3. If even one peer node claims a status of NET_DOWN, set the backup group status to NET_DOWN and return to the FAILOVER routine.


Someone already knows the net is down.

4. If all peers have a status of either DOWN or DOUBT, set the backup group status to NET_DOWN and return to the FAILOVER routine.


We declare the net down.


The pnmset Command


The pnmset command is used to configure a network adapter backup group. The example that follows shows the process of creating two separate backup groups. With pnmset, you can create all of the nafoX backup groups at the same time, or create them one at a time. The pnmset command can be run only by root.

The pnmset Command
Configuring Backup Groups
The following prompts are displayed by the pnmset program during NAFO group configuration. They each require special attention as follows:
- How many PNM backup groups on the host: 2

- Enter backup group number: 123

  The group number is arbitrary. The total number of groups cannot exceed 256. It is a good idea to make the backup group number unique on each node. If the group already exists, its configuration is overwritten with new information.

- Please enter all network adapters under nafo0: qe1 qe0

  The backup group should contain a minimum of two interfaces. If reconfiguring an existing group, you can add additional interfaces.

The pnmset program checks the status of the adapters and then selects a primary adapter. It then ensures that there is only one active adapter for each backup group. The configuration information is then written to the /etc/pnmconfig file. The pnmset program then signals the pnmd daemon to reread the new configuration information and monitor the adapters accordingly. If the backup group testing fails, you might have to perform the following steps on the primary interface in the group before running pnmset again:

1. # ifconfig hme0
2. # ifconfig hme0 plumb
3. # ifconfig hme0 up
4. # ifconfig hme0 down

The pnmset Command
Configuring Backup Groups (Continued)
The following shows the complete transcript of the creation of a single NAFO backup group.

# pnmset
In the following, you will be prompted to do configuration
for network adapter failover
do you want to continue ... [y/n]: y
How many NAFO backup groups on the host [1]: 1
Enter backup group number [0]: 113
Please enter all network adapters under nafo113
hme0 hme1 hme2
The following test will evaluate the correctness
of the customer NAFO configuration...
name duplication test passed
Check nafo113... < 20 seconds
hme1 is active
remote address = 192.9.10.222
nafo113 test passed
#

All backup interfaces that you specify are also activated and tested. The remote address is anyone who responded to a ping.

Note - If you want to add an additional interface to a group later, recreate the same group number again with the additional interfaces. The old group configuration information is overwritten.


Other PNM Commands


The pnmstat Command
The pnmstat command is used to verify backup group status. The following is the output after a backup group failure:

# pnmstat -c nafo0
OK
129
qe1

The output means:

- The nafo0 backup group is OK.
- It has been 129 seconds since the last failover.
- qe1 is the current active interface in nafo0.

Other PNM Commands
The pnmstat Command (Continued)
The following shows how to use the pnmstat command to check the status of all local backup groups:

# pnmstat -l
bkggrp  r_adp   status  fo_time  live_adp
nafo0   le0     OK      NEVER    le0

The following shows how to use the pnmstat command to check the status of a NAFO group on a remote host using the public network:

# pnmstat -h remote_host -c nafo1
OK
NEVER
le0

The output means:

- The nafo1 backup group is OK.
- There has been no failover.
- le0 is the current active interface in nafo1.

The following shows how to use the pnmstat command to check the status of a NAFO group on a remote host using the private network:

# pnmstat -s -h remote_host -c nafo1
OK
Never
hme0

Other PNM Commands
The pnmptor Command
The following shows how to use the pnmptor command to identify which adapter is active in a given backup group:

# pnmptor nafo1
hme0

The pnmrtop Command


The following shows how to use the pnmrtop command to determine which backup group contains a given adapter:

# pnmrtop qe0
nafo0

Exercise: Configuring the NAFO Groups
Exercise objective - In this exercise you will do the following:

- Create a NAFO group on each cluster host system
- Disable the Interface Groups feature on each cluster host system

Preparation
Ask your instructor for help with defining the NAFO groups that will be used on your assigned cluster system. You should create a single NAFO group on each cluster host that is configured as follows:

- It consists of two interfaces of the same type
- It should be numbered 0 (nafo0)

Tasks
The following tasks are explained in this section:
- Creating a NAFO group
- Disabling the Interface Groups feature

Exercise: Configuring the NAFO Groups
Creating a NAFO Group
1. Determine the primary network adapter on each cluster host using the ifconfig -a command. Record the results below (typically hme0).

   Node 0 Interface: __________
   Node 1 Interface: __________
   Node 2 Interface: __________
   Node 3 Interface: __________

2. Verify that no NAFO groups exist on each cluster host.

   # pnmstat -l

3. Create a single NAFO group, numbered 0, on each cluster host using the pnmset command.

   # pnmset

   Note - If at all possible, each group should consist of two interfaces, one of which can be the primary node interface.

4. Verify that the status of each new NAFO group is OK on all nodes.

   # pnmstat -l

Disabling the Interface Groups Feature
Perform the following steps on every cluster host system.

1. Disable the Interface Groups feature on each node by adding the following entry to each of their /etc/system files:

   set ip:ip_enable_group_ifs=0

2. Stop the Sun Cluster software on all nodes.

3. Install the 105786 patch.

4. Reboot all of your cluster hosts so that the /etc/system file changes take effect.

Exercise: Configuring the NAFO Groups
Exercise Summary
Discussion - Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.

Manage the discussion here based on the time allowed for this module, which was given in the About This Course module. If you find you do not have time to spend on discussion, then just highlight the key concepts students should have learned from the lab exercise.
- Experiences

  Ask students what their overall experiences with this exercise have been. You may want to go over any trouble spots or especially confusing areas at this time.

- Interpretations

  Ask students to interpret what they observed during any aspects of this exercise.

- Conclusions

  Have students articulate any conclusions they reached as a result of this exercise experience.

- Applications

  Explore with students how they might apply what they learned in this exercise to situations at their workplace.

Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:

- Explain the need for Public Network Management (PNM)
- Describe how PNM works
- Configure a NAFO group
- Disable the Solaris Operating Environment Interface Groups feature

Think Beyond
Are there other system components that would benefit from the approach taken to network adapters by PNM? What are the advantages and disadvantages of automatic adapter failover? Manual adapter failover? How will IP striping affect this model? Can you realize the dual goals of higher throughput and high availability through PNM/NAFO for the network connections?


Logical Hosts
Objectives
Upon completion of this module, you should be able to:
- Configure logical hosts
- Create the administrative file system for a logical host
- Switch logical hosts between physical nodes

This module describes how to configure logical hosts.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion - The following questions are relevant to understanding the content of this module:

1. What is the purpose of a logical host?

2. What needs to be defined for a logical host?

3. What are the restrictions on a logical host?

Additional Resources
Additional resources - The following references can provide additional details on the topics discussed in this module:

- Sun Cluster 2.2 System Administration Guide, part number 805-4238

Logical Hosts

To run a data service, a client system must be able to communicate with the data service over the network. This is usually done using a logical hostname that is converted to an IP address by a naming service or locally in the /etc/hosts files (preferred). The server for the data service must provide client access to data, both executables and stored data. A data service in the Sun Cluster HA environment must be able to migrate to one or more backup systems if the primary system fails. This should happen with as little disruption to the client as possible. A logical host in the Sun Cluster HA environment is a collection of network definitions and disk storage. A logical host, consisting of one or more IP addresses, assigned network adapters, and disk storage, is configured as the unit of failover. One or more data services are configured to run in a logical host, so that when the logical host moves, the data service follows it. The definition of a logical host also includes the list of physical hosts on which it can run.
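For example, the /etc/hosts entry for a logical hostname on each cluster node (and ideally on the clients as well) might look like the following; the name and address here are hypothetical:

129.50.20.3    dshost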

Logical Hosts
You should take the following information into account when designing a logical host:
- You can have more than one logical host for each physical host, and as many as three backup physical hosts for each logical host.

- You can have logical hosts that are primarily resident on a particular physical host fail over to different backup hosts. However, not all topologies support this.

- You can assign as many disk groups as you want to a particular logical host.

- You can assign multiple data services to a particular logical host, but remember that the logical host is the unit of failover. One application or data service per logical host is the norm.

- Each network adapter assigned to a logical host must reside in a PNM backup group.

- Logical hosts can share physical network adapters. This means that several logical hosts can specify the same PNM group in their definition. They must not share logical hostnames (IP addresses).

Ask the students why this is. You might have the logical hosts fail over to different nodes, potentially advertising the same host name from multiple systems.

Logical Hosts
As shown in Figure 11-1, the Sun Cluster HA framework provides the routines for the logical host and data services to be properly failed over to a designated backup system, and restarted. This is why it is critical that the contents of the ccd.database files agree on all cluster host systems.

Figure 11-1  Logical Host Components (diagram: a client workstation reaches the logical hostname dshost, 129.50.20.3, across the public network; Node 0 is the primary for the dg3 disk group and its Vol-02 volume, with the address configured on a NAFO group adapter; if Node 0 fails, the data service recovery routines on Node 1 detect the failure, import the dg3 disk group, fsck and mount Vol-02, configure the dshost IP address, and run the other recovery routines; the logical host, named lhost2, is recorded in the ccd.database on both nodes)


Configuring a Logical Host


The creation of a logical host is a three-step process:

1. Put all of the logical host's network adapters into PNM NAFO backup groups.

2. Use the scconf -L command to create the logical host.

3. Use the scconf -F command to create the administrative file system.

Note - In SDS installations, you must create the administrative file system manually. The scconf -F option does not function.

The logical host becomes active immediately after creation, which means:

- Assigned disk groups are imported and mounted
- Network interfaces are activated
- IP addresses are configured up

Note - The administrative file system is discussed in the Administrative File System Overview section on page 11-13.
Configuring a Logical Host
Using the scconf -L Command Option
A logical host is configured using the scconf -L command. You can run the command on only one node that is a running member of the cluster. The following is the format of the command using the -L option:

scconf clustername -L logical-host-name -n nodelist -g dglist -i logaddrinfo [-m]

where:

clustername
    Identifies the name of the cluster in which the logical host is being configured.

-L
    Precedes a logical host name.

logical-host-name
    Identifies the name of the new logical host.

-n
    Precedes a node name list.

nodelist
    Identifies the cluster nodes on which this logical host can be run. The logical host is run preferentially on the first node specified, with the others used as backups in order.

-g
    Precedes a disk group list.

dglist
    Identifies the disk groups that must migrate with the logical host.

-i
    Precedes a list of network interfaces. It can be specified multiple times if there is more than one interface to be used by this logical host on each node. The network adapter names specified must be contained in a NAFO group on the corresponding node. The adapters do not have to be the same type or name.

Configuring a Logical Host
Using the scconf -L Command Option (Continued)
logaddrinfo
    Identifies the primary NAFO interfaces on each node in the node list. These interfaces are specified in the same order as the nodes. The last parameter is the logical hostname (or IP address) assigned to this logical host.

-m
    Disables automatic takeover if a logical host is running on a backup node and the primary node rejoins the cluster.

Note - Disabling automatic failback can help avoid unexpected disruption of data services. You can manually switch the data service back to its primary host system when you think it is better for the data service users.

The scconf -L command activates the logical host on the node on which it is run. The specified disk groups are imported and the network interfaces are activated.

Deleting a Logical Host


To delete a logical host definition, type:

scconf clustname -L logical-host-name -r
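For example, to remove a logical host named lhost1 from a cluster named sc-cluster (both names hypothetical), you would type:

# scconf sc-cluster -L lhost1 -r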


Logical Host Variations


You can configure logical hosts with varying levels of complexity. It is helpful to examine different logical host configurations.

Basic Logical Host


You create a basic logical host with the following command:

# scconf clustername -L lhost1 -n node0,node1 -g dg1 \
-i hme0,hme0,usersys1

In this configuration, a logical host named lhost1 is being configured to run on primary system node0 and is taken over by the backup system node1 if node0 fails. If node0 is later repaired and joins the cluster again, the logical host lhost1 automatically switches back to node0. The disk group dg1 is imported by node1 if node0 fails. The primary NAFO interface for node0 is hme0. The primary NAFO interface for node1 is hme0. These interfaces might be only the first of several in their NAFO group. The names of the NAFO groups on each node are not referenced in this command.

Logical Host Variations
Basic Logical Host (Continued)
Clients reach the services offered by the lhost1 configuration by referencing the host name usersys1. You can also use an IP address instead of a logical hostname, but this will be difficult for users to remember. The IP address and name of usersys1 might be recorded locally in the /etc/hosts files or obtained from a network naming service. For highest availability, you should resolve logical hostnames locally. When you define this logical host on one of the cluster host systems, its configuration is automatically propagated to the CCD on all other nodes currently joined in the cluster.

Caution - When performing operations that modify the contents of the ccd.database file, you should have all nodes joined to the cluster. Otherwise you will have ccd.database inconsistencies between nodes.

Cascading Failover
You create a cascading failover configuration with the following command:

# scconf clustername -L lhost2 -n node0,node1,node2 \
-g dg2 -i hme0,hme0,hme0,usersys2

This is a variation that can contain three or four nodes instead of two in the node list. If necessary, the logical host migrates through to each node in the list and goes back to node0 again if the last node in the list fails.

Logical Host Variations
Disabling Automatic Takeover
You can disable the automatic takeover feature with the following command:

# scconf clustername -L lhost3 -n nodea,nodeb -g dg1 \
-i hme0,hme0,usersys3 -m

The -m option disables automatic takeover if a logical host is running on a backup node and the primary node rejoins the cluster.

Note - You cannot add the -m option later. You must delete the logical host and then recreate it using the -m option.

Multiple Disk Groups and Hostnames


You use the following command to associate more than one disk group or metaset with a logical host:

# scconf clustername -L lhost4 -n nodea,nodeb \
-g dg1,dg2 \
-i hme0,hme0,usersys4a -i qe0,qe0,usersys4b

You can have multiple logical hostnames associated with a logical host. This can provide multiple paths for users to access the resources under control of the logical host. Each -i list provides an additional client interface path and logical hostname.


Administrative File System Overview


Each logical host must have a small administrative file system associated with it. After you create a logical host, you can create the administrative file system using the scconf -F option. The administrative file system is created on a disk group or diskset that is associated with the logical host.

Note - In SDS-based clusters, you must create the administrative file systems manually for each logical host.

The administrative file system stores some cluster configuration information as well as logical host data service information. It stores file lock recovery information for the HA-NFS data service. Approximately 5 Mbytes of space are required for this file system and its mirror. The system administrator does not need to manage or modify the administrative file system. This file system should not be NFS-shared. You can create the dgname-stat administrative volume by hand if you wish to control construction and placement of the disk space.

Administrative File System Overview
Administrative File System Components
CVM and SSVM Installations
The following shows the relationship between logical hosts and their associated administrative file system in a CVM/SSVM installation.

                      Logical Host
Name                  dnslhost
Logical hostname      dnsip (129.55.30.10)
Disk group            dnsdg
Volume name           dnsdg-stat
Volume mount          /dnslhost
Volume path           /dev/vx/dsk/dnsdg/dnsdg-stat
Mount file            vfstab.dnslhost

SDS Installations
The following shows the relationship between logical hosts and their associated administrative file system in an SDS installation.

                      Logical Host
Name                  dnslhost
Logical hostname      dnsip (129.55.30.10)
Disk group            dnsdg
Volume name           d100
Volume mount          /dnslhost
Volume path           /dev/md/dnsdg/dsk/d100
Mount file            vfstab.dnslhost


Creating the Administrative File System


Using the scconf -F Command Option
In CVM and SSVM installations, you configure the administrative file system for a logical host using the scconf -F command. The command must be run on every node on which the logical host will run, the nodes must be in the cluster, and the logical host must be active on one of the cluster nodes. The following is the command format:

scconf clustername -F lhostname [dgname]

where:

clustername
    Indicates the name of the cluster in which the logical host is being configured.

lhostname
    Indicates the name of the logical host being configured.

dgname
    Specifies a disk group that will contain the administrative file system. If it is not specified, the first disk group specified in the scconf -L command that configured the logical host is used.
Creating the Administrative File System
scconf Command Functions
The scconf -F option performs the following functions:
- Creates a 2-Mbyte mirrored UNIX file system (UFS) volume in either the named disk group or the first disk group specified when the logical host was created. The administrative volume is named diskgroup-stat.

- Creates a mount point of the same name as the logical host and mounts the administrative file system on it (/lhostname).

- Creates a special vfstab file named vfstab.lhostname in the /etc/opt/SUNWcluster/conf/hanfs directory, which contains the mount information for the administrative volume.

Note - If you need to control the placement of the administrative volume, you can pre-create the logical host administrative file system by using a volume manager relevant to your installation. You must still run the scconf -F command on all nodes.
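After scconf -F completes, you can confirm the results on the node that currently masters the logical host. For example, for a hypothetical logical host named lhost1:

# mount | grep /lhost1
# cat /etc/opt/SUNWcluster/conf/hanfs/vfstab.lhost1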

scconf Command Precautions


To avoid CCD inconsistencies, the following criteria should be met any time a logical host is created or modified:

- You must run the scconf -F command on all nodes in the cluster.

- One of the nodes must currently master the logical host.

- All nodes should be cluster members when you use the scconf command to create or modify logical hosts.

If the CCD is not consistent between all cluster hosts, logical host migration between hosts can fail, or you can experience general cluster reconfiguration failures.


Logical Host File Systems


When you create a logical host, one or more disk groups are associated with the logical host and an administrative file system is created. The mount information for the administrative file system is automatically entered in the logical host-specific vfstab file in the /etc/opt/SUNWcluster/conf/hanfs directory. If there are additional file system volumes you want to have mounted automatically, you must enter the information manually in the logical host-specific vfstab file. When the logical host fails over to a backup system, all file system-specific information must be available to the backup host.

Note - You must record the additional file system information in the logical host-specific vfstab files on all cluster hosts that are configured as backup systems for the logical host.

Logical Host File Systems
Adding a New Logical Host File System
To add a new logical host file system, complete the following steps (an example sequence is shown after the list).

1. Create the file system volume in a disk group or shared diskset that belongs to the logical host.

2. Initialize the new file system with the newfs command.

3. Create a file system mount point on each cluster node that is configured for the logical host.

4. Test-mount the file system and then unmount it.

5. Add the new file system mount entries to the vfstab.lhname files on the appropriate Sun Cluster nodes.

The new file system is automatically mounted and managed as part of the logical host during the next logical host reconfiguration.
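For example, assuming a CVM or SSVM volume named vol3 in the dg1 disk group of a logical host named lhost1, with a mount point of /ha/fs3 (all names hypothetical), the sequence might look like this:

# newfs /dev/vx/rdsk/dg1/vol3
# mkdir -p /ha/fs3                        (on every node configured for lhost1)
# mount /dev/vx/dsk/dg1/vol3 /ha/fs3
# umount /ha/fs3

The matching mount entry would then be added to vfstab.lhost1 on each of those nodes, following the format shown in the next section.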

Sample Logical Host vfstab File


The format of the vfstab.lhname file is identical to that of the /etc/vfstab file. All entries in this file must correspond to file systems located on multi-host disks, and can specify either UFS or VxFS file systems. The mount at boot and fsck pass fields are ignored. A typical entry might look like the following:

/dev/vx/dsk/dg/dg1-v1 /dev/vx/rdsk/dg/dg1-v1 /abc ufs - no -

Caution - These files must be identical on all nodes that support the logical host.


Logical Host Control


Logical hosts migrate automatically between physical hosts when a data service in the logical host is determined to have failed and cannot be restarted, or when the physical host node has failed. The logical hosts fail over to a new physical host in the order specified when the logical host was created.

Forced Logical Host Migration


You can also switch logical hosts manually, using the haswitch or scadmin switch commands. The following is the syntax of these commands:

# haswitch new_phys_host logical_host
# scadmin switch clustername new_phys_host logical_host

With haswitch or scadmin switch, you can specify the physical host to which the logical host(s) are to fail over.

Note - You can initiate the logical host switch from any node that is currently a cluster member.
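For example, to move a logical host named lhost1 to the physical host phys-host2 in a cluster named sc-cluster (all names hypothetical), either of the following would work:

# haswitch phys-host2 lhost1
# scadmin switch sc-cluster phys-host2 lhost1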

Logical Host Control
Logical Host Maintenance Mode
Occasionally you might need to take a logical host down to perform administration functions, such as backing up file system volumes. You can place the logical host in maintenance mode using the command:

# scadmin switch clustername -m logical_host

The logical host is shut down and placed in maintenance mode until the haswitch command is executed again for the logical host. Before you take any action, all users must be informed that the related data service will be unavailable. Also, you will probably have to shut down the data service application. As shown in the following, you must perform additional steps if you want to back up logical host file systems.

1. Place the logical host in maintenance mode.

   # scadmin switch clustername -m logical_host

2. Import the disk group or diskset.

   # vxdg import diskgroup (metaset -s diskset -t)

3. Perform the volume backups.

4. Deport the disk group or diskset.

   # vxdg deport diskgroup (metaset -s diskset -r)

5. Put the logical host back in service.

   # scadmin switch clustername new_phys_host logical_host

A special variation of the switch option can be used to force a cluster reconfiguration without moving any logical hosts.

# scadmin switch clustername -r

This can be used to enable a shared CCD without stopping the cluster software. However, it will temporarily suspend all active data services.

Exercise: Preparing Logical Hosts
Exercise objective - In this exercise you will do the following:

- Prepare the name service for the logical hosts
- Create logical hosts
- Create logical host administrative file systems

Preparation
In this lab, you will create two logical hosts for use in later lab exercises. You will use the administrative file system volumes that were created in a previous exercise.

Tasks
The following tasks are explained in this section:
- Preparing the name service
- Activating the cluster
- Creating the logical hosts
- Testing the logical hosts

Exercise: Preparing Logical Hosts
Preparing the Name Service
You must assign logical IP addresses for each of your logical hosts. You must enter these addresses and logical host names in the name service so they are available to each node.

1. Using the IP addresses given to you by your instructor, create /etc/inet/hosts entries (or entries in the appropriate name service) for each of your new logical hosts on each cluster node.

   IP Address: ____________________   clustername-nfs
   IP Address: ____________________   clustername-dbms

These IP addresses are not active, and do not have interfaces assigned at this time.

Activating the Cluster


If your cluster is not already started, start it now.

1. On only one cluster node, type:

   # scadmin startcluster phys_nodename clustername

2. Wait for the cluster to activate and reconfiguration to complete on the first node.

3. On each other cluster node, type:

   # scadmin startnode

Caution - You must type the scadmin startnode commands simultaneously using the cconsole common window. If you do not start the additional nodes at exactly the same time, the CCD data can become corrupted.

Exercise: Preparing Logical Hosts
Logical Host Restrictions
Now that the network interfaces and disk groups are ready, you can create the logical hosts. You create two logical hosts:

- clustername-nfs
- clustername-dbms

Remember:

- At least one node in the cluster must be running to create a logical host. You do not need a CCD quorum.

- You must run the scconf command from a node joined to the cluster.

- You should run the scconf -L command only once for each logical host, on only one node. If you make a mistake, use scconf -L -r to delete the incorrect logical host definition and rerun the definition.

- Make sure that the nodes you assign to a logical host have physical access to that logical host's disk group.

- Logical host names and logical host IP host names are not always the same.

- Make sure that you have identified the switchable disk drives for use by the administrative file system for each logical host.

Exercise: Preparing Logical Hosts
Creating the Logical Hosts
1. Record your target logical host configurations.

                                 Logical Host          Logical Host
                                 clustername-nfs       clustername-dbms
   Primary Node                  __________            __________
   Backup Node                   __________            __________
   Disk group or diskset name    hanfs                 hadbms
   Primary node interface        __________            __________
   Backup node interface         __________            __________
   Logical hostname              clustername-nfs       clustername-dbms

2. Create the hanfs logical host on one of your nodes. Use the -m option to prevent automatic switchback.

   # scconf clustername -L clustername-nfs \
   -n node1,node2 -g hanfs -i \
   intf,intf,clustername-nfs -m

Caution - Do not use the cconsole common window.

3. Create the hadbms logical host on a different node so that each logical host is mastered by a different node. Use the -m option to prevent automatic switchback.

   # scconf clustername -L clustername-dbms \
   -n node2,node1 -g hadbms -i \
   intf,intf,clustername-dbms -m

Exercise: Preparing Logical Hosts
Creating the CVM/SSVM Administrative File System
Skip this section if your cluster is using the SDS volume manager.

1. If your cluster is running the CVM or SSVM volume managers, create the administrative file system for each logical host by running scconf -F for each logical host on each cluster node on which that logical host will run.

   # scconf clustername -F clustername-nfs
   # scconf clustername -F clustername-dbms

2. Verify that the mount information for each logical host administrative file system has been entered in the vfstab.clustername-nfs and vfstab.clustername-dbms files in the /etc/opt/SUNWcluster/conf/hanfs directory on each cluster host system.

Creating the SDS Administrative File System


Skip this section if your cluster is using either the CVM or SSVM volume manager.

The hanfs and hadbms administrative file system metadevices for SDS installations should have been created in an earlier exercise. The metadevice paths for the hanfs and hadbms diskset administrative file systems should be:

/dev/md/hanfs/dsk/d100 (or /dev/md/hanfs/rdsk/d100)
/dev/md/hadbms/dsk/d100 (or /dev/md/hadbms/rdsk/d100)

1. On both nodes, create the /clustername-nfs and /clustername-dbms administrative file system mount points.

   # mkdir /clustername-nfs /clustername-dbms

Exercise: Preparing Logical Hosts
Creating the SDS Administrative File System (Continued)
2. On both nodes, create the logical host-specific vfstab files.

   # cd /etc/opt/SUNWcluster/conf
   # mkdir ./hanfs
   # cd hanfs
   # touch vfstab.clustername-nfs
   # touch vfstab.clustername-dbms

3. On both nodes, enter the hanfs administrative file system mount information into the vfstab.clustername-nfs files.

   /dev/md/hanfs/dsk/d100 /dev/md/hanfs/rdsk/d100 /clustername-nfs ufs 1 no -

4. On both nodes, enter the hadbms administrative file system mount information into the vfstab.clustername-dbms file.

   /dev/md/hadbms/dsk/d100 /dev/md/hadbms/rdsk/d100 /clustername-dbms ufs 1 no -

Exercise: Preparing Logical Hosts
Testing the Logical Hosts
1. Use either the haswitch command or the scadmin switch command to move a logical host to a new physical host.

2. On the new physical host, verify that the administrative file system is mounted, using the mount command.

3. Put a logical host in maintenance mode.

   # haswitch -m logical_host_name

4. Take the logical host out of maintenance mode.

   # haswitch physical_node logical_host_name

5. Run the scadmin stopnode command on a cluster node with an active logical host. What happens?

6. Restart the node with scadmin startnode. What happens? Why?

7. Return all logical hosts to their home nodes using the haswitch or scadmin switch commands. Leave the cluster running for the next lab.

Exercise: Preparing Logical Hosts
Exercise Summary
Discussion - Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.

Manage the discussion here based on the time allowed for this module, which was given in the About This Course module. If you find you do not have time to spend on discussion, then just highlight the key concepts students should have learned from the lab exercise.
- Experiences

  Ask students what their overall experiences with this exercise have been. You may want to go over any trouble spots or especially confusing areas at this time.

- Interpretations

  Ask students to interpret what they observed during any aspects of this exercise.

- Conclusions

  Have students articulate any conclusions they reached as a result of this exercise experience.

- Applications

  Explore with students how they might apply what they learned in this exercise to situations at their workplace.

Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:

- Configure logical hosts
- Create the administrative file system for a logical host
- Switch logical hosts between physical nodes

Think Beyond
If the concept of a logical host did not exist, what would that imply for failover? What complexities does having multiple backup hosts for a single logical host add to the high availability environment?


The HA-NFS Data Service


Objectives
Upon completion of this module, you should be able to:
- Describe the function of the HA-NFS support files
- List the primary functions of the HA-NFS start and stop methods
- List the primary functions of the HA-NFS fault monitoring probes
- Configure HA-NFS in a Sun Cluster environment
- Add and remove HA-NFS file systems
- Switch an HA-NFS logical host between systems

This module describes and demonstrates the configuration and management of Sun Cluster HA-NFS file systems.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion - The following questions are relevant to understanding the contents of this module:

1. What does the system need to know about a highly available NFS environment?

2. What configuration information does HA-NFS require?

3. Do clients have any recovery issues after an NFS logical host switchover?

Additional Resources
Additional resources - The following references can provide additional details on the topics discussed in this module:

- Sun Cluster 2.2 System Administration Guide, part number 805-4238
- Online man pages for the scadmin and hareg commands


HA-NFS Overview
The HA-NFS environment is a simple set of modules that acts as an interface between the Sun Cluster high availability framework and the Solaris NFS environment. User applications continue to use NFS services as before, and there is no change to client administration. Because HA-NFS is designed to work in an off-the-shelf NFS environment, it does not improve upon existing NFS services beyond adding high availability. It also does not create any additional problems for existing NFS services. Clients see no operational differences.
 cachefs, automounter, and so on all continue to work unchanged on the client systems.

The NFS 2.0 and NFS 3.0 versions are supported by the HA-NFS data service. HA-NFS co-exists with all other Sun Cluster data services, including HA-DBMS and parallel database configurations, on the same cluster node.


HA-NFS is HA-API compliant. It supports UFS and VxFS file systems.

HA-NFS Overview
HA-NFS Support Issues
PC Client Support
HA-NFS was designed to work with a heterogeneous network of NFS clients. Clients must implement the lock recovery protocol (that is, they must provide a lockd and statd daemon). Not all third-party NFS implementations support this feature. This is most often a problem with NFS client implementations on PCs.

Caution - The NFS data service is still highly available to a client that does not support the NFS lock recovery protocol, but in the event of failover or switchover, the server loses track of the application's locks and might grant another application instance access to the locked files. This could lead to data corruption.

PrestoServe Support
PrestoServe is not supported on HA-NFS servers. PrestoServe caches data on the server; in the event the server fails, PrestoServe waits until that server comes back up and then resends the data. If the data is now serviced by another server, this can lead to synchronization problems.

Local HA-NFS File Systems Access


You cannot locally access HA-NFS file systems from either HA server. Local file locking interferes with the ability to run the kill command and restart the lockd daemon. Between the kill command and the restart, a blocked local process can be granted the lock, which prevents the client machine that owns that lock from reclaiming it.

Secure NFS and Kerberos


Secure NFS and the use of Kerberos with the NFS environment are not supported in Sun Cluster HA configurations. In particular, the secure and kerberos options to share_nfs(1M) are not supported.


HA-NFS Data Service


The HA-NFS data service is automatically available when the Sun Cluster software is installed. No specific responses are required. Three components compose the HA-NFS software:

- Start NFS methods

  The Sun Cluster framework uses the START methods to start or restart data services on a cluster host system.

- Stop NFS methods

  The STOP methods are used to cleanly shut down a data service for maintenance or before starting the data service on a designated backup system.

- NFS-oriented fault monitoring

  The fault monitors constantly monitor the health of an active data service. They can force a logical host to migrate to a backup.

The methods are pre-configured routines that run automatically during a cluster reconfiguration or when a data service is manually stopped or switched between cluster hosts.


Start NFS Methods




The discussion on the next several pages concerns the theory of operation behind the HA-NFS data service. It includes details on the methods to stop and start HA-NFS, and the HA-NFS fault monitoring process. Administrators will never directly call these methods; they are run automatically. However, it can be useful to understand what is happening behind the scenes. The instructor can decide how much detail, if any, to go into based on the students' backgrounds and needs.

The Start NFS methods run automatically during logical host reconfiguration (for example, during takeovers and switchovers). These methods do the following:

- Start or restart NFS-related daemons, as appropriate
- Force NFS daemons to go through a lock recovery protocol, just as if a server reboot has occurred
- Export shared file systems for the logical host

Start NFS Methods
Before the Start NFS methods are run, the High Availability framework takes ownership of the appropriate logical host's disk groups and mounts their file systems. After the Start NFS methods complete, the High Availability framework begins listening for clients of the logical host. NFS service is now available.

You should not start NFS manually. If NFS is started manually, the HA framework does not know which file systems are mounted and exported, and this could adversely affect the fault monitoring routines. You should not have any NFS file systems outside of the HA-NFS service. These NFS file systems do not fail over, and service is interrupted when the various NFS-related daemons are stopped and restarted by HA-NFS. If a file system is unknown to HA-NFS, it is not highly available. For example, assume a CD-ROM is mounted locally and shared. If the NFS daemons are stopped and restarted, NFS access to the CD-ROM is interrupted.

Caution - Starting NFS on Sun Clusters must be done by the High Availability framework. Do not manually start NFS daemons, mount directories, or add entries to startup scripts to perform these tasks.


Stop NFS Methods


The Stop NFS methods are executed automatically during logical host reconfigurations (for example, during switchovers). The Stop NFS methods stop the appropriate NFS-related daemons and unshare the file systems. NFS clients will begin to see "NFS server not responding" messages after the NFS-related daemons are stopped.


HA-NFS Fault Monitoring


The HA-NFS fault monitoring routines are run automatically to assess the health of the HA-NFS data service.

HA-NFS Fault Monitoring Probes


Each Sun Cluster HA server periodically probes the other servers. It checks if the NFS-related daemons (nfsd, mountd, lockd, statd, and rpcbind) are running and if a typical client is receiving service. To check if a typical client is receiving service, it uses the public net (not the private net) to access the peer server and attempts to mount a file system and read and write a file. When checking typical client services, it attempts to mount a file system approximately every 5 minutes, and it attempts to read and write a file approximately every minute. You cannot modify how often these HA-NFS fault monitoring probes are run.
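As a point of reference, you can check the same daemons the probes look for from a shell prompt on a cluster host. This is only an illustration, not part of the HA-NFS fault monitoring itself:

# ps -ef | grep nfsd
# rpcinfo -p localhost | egrep 'nfs|mountd|nlockmgr|status'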

HA-NFS Fault Monitoring
HA-NFS Fault Monitoring Probes (Continued)
An HA-NFS server is considered sick if its NFS service does not respond within a time-out period (approximately 5-10 minutes). This is in contrast to the CMM, which detects the failure of a node in a few seconds. Failed hosts can be detected quickly, but the system is more cautious when determining NFS failures, because there can be temporary server overload conditions. If the peer server appears sick, a takeover is considered. But first, the new host ensures that it is running without problems. To do this, the new host checks that its own NFS-related daemons are functional and that its own NFS-exported file systems can be mounted, read, and written. The HA framework checks the server's ability to communicate over the public network(s) and checks the name service for any problems. If everything is verified, a takeover procedure is initiated.


Fault Probes
As with any Sun Cluster High Availability data service, fault probes are used to determine whether the logical host is functioning correctly. There are two kinds of probes:

- Local probes (LP)
- Remote probes (RP)

The fault probes run on both the master and the first backup node to allow for both local and remote functionality checking. If the system fails over to the first backup, then the remote fault probes start running on the second backup, if one is configured. If there is not a second backup, then there will not be a remote probe. All probes act as NFS clients to the NFS data service, and perform mount, read, write, and locking operations.


Local Fault Probes


Local probes test the functionality of the data service without involving a network connection. This ensures that the service is running, and helps differentiate between a service failure and a network failure. The local HA-NFS probes run on the current physical host of the HA-NFS logical host. They are intended to ensure that the NFS file system is operational. There is one set of local probes per data service. The local probes do the following:

- Use the logical host IP address to ensure that the NFS daemons are running on the physical host
- Perform read, write, and locking operations to each shared file system

If these tests fail, a message is written to the console and giveaway is considered.


Remote Fault Probes


The remote probes help identify not only a service failure, but network failures as well. Coupled with the network adapter management process, this allows the probes to determine whether the data service, the local or remote network interface, the network itself, or the node has failed. Depending on the determined failure cause, the proper recovery action (network adapter switch, logical host failover, or no action) is taken. The remote probes ensure that the HA-NFS file system is visible, available, and operational from a remote node. The backup physical system for the HA-NFS logical host runs the remote probes. The remote probes do the following:

- Use the logical host IP address to ensure that the NFS daemons are running on the physical host
- Mount all of the NFS file systems from the logical host
- Perform read, write, and locking operations to each file system

If these tests fail, takeaway is considered.


Giveaway and Takeaway Process


If either the local or the remote fault monitors detect certain failures, they attempt to force a reconfiguration of the logical host. This is an attempt to migrate the logical host to a healthy system. If the fault is detected by the local fault monitor, it initiates a giveaway process. This might end with the designated backup system taking over the logical host. If a fault is detected by the remote fault monitor, it initiates a takeaway process. This might end the same as the giveaway process, with the backup system taking over the logical host.

Giveaway and Takeaway Process
As shown in Figure 12-1, either the local or remote fault monitors for a data service can initiate a logical host migration.

Figure 12-1  Logical Host Giveaway and Takeaway (diagram: phys-hostA and phys-hostB are connected by the public network; the local fault monitor on phys-hostA checks its own data service and can initiate a giveaway, while the remote fault monitor on phys-hostB checks the data service across the network and can initiate a takeaway)

Sanity Checking
Before a physical host can become the new master of a logical host, it must be certified as fully operational. This is called sanity checking and is performed by the FM_CHECK methods. If both the local and remote hosts are unhealthy, the logical host might shut down.


Processes Related to NFS Fault Monitoring


For each logical host, the local and remote fault monitoring processes, nfs_probe_loghost and nfs_mon, are running. In addition, there is one local probe per physical node, nfs_probe_local_start, that watches the local NFS daemons.

NFS Server Daemon Threads


If you do not specify enough nfsd server daemon threads in the /etc/rc3.d/S15nfs.server script, on the /usr/lib/nfs/nfsd line. you might have server throughput problems. Also, if there are insufcient nfsd server daemon threads, remote fault probes can fail and cause unnecessary failovers. The default value is 16 threads. Some performance and tuning books suggest that you adjust the threads as follows:
•	Use 2 NFS threads for each active client process
•	Use 16 to 32 NFS threads for each CPU (can be much higher)
•	Use 16 NFS threads for each 10-Mbits of network capacity
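
For example, raising the thread count involves editing the nfsd invocation in the NFS server startup script on each cluster node. The value 64 below is only an illustration, and the exact contents of the line can vary slightly between Solaris releases:

	# grep nfsd /etc/rc3.d/S15nfs.server
	/usr/lib/nfs/nfsd -a 16

	(edit the line to start more threads, for example)
	/usr/lib/nfs/nfsd -a 64

The change takes effect the next time the NFS server daemons are started.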


HA-NFS Support Files


In addition to the logical host-specific vfstab file, there is also a logical host-specific dfstab file that contains NFS share commands. Once you create the logical host, adding an HA-NFS file system is similar to adding any NFS file system: add mount entries to the vfstab.lhname file and share entries to the dfstab.lhname file. The mount and share entries are configured on the cluster hosts that support the particular HA file system. If HA-NFS is already running in the logical host, transition the logical host in and out of maintenance mode using the haswitch command. This automatically mounts and shares any new file systems.

Caution - If the file systems are mounted and shared manually, the local and remote fault monitoring processes are not started until you use the next haswitch command for that logical host. When using the VxVA GUI, make sure you do not accidentally create file systems with the automount option selected.
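
For example, after adding new vfstab.lhname and dfstab.lhname entries for a logical host named lhost1 (a placeholder name), switching the logical host between its physical hosts causes the new file systems to be mounted, shared, and fault monitored. The same technique is used in the exercise later in this module:

	# haswitch phys-hostB lhost1
	# haswitch phys-hostA lhost1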

Adding Mount Information to the vfstab File
The vfstab file specifies the mounting of file systems. There must be a vfstab.lhname file for each logical host, and the files must be the same on all cluster nodes that might support that logical host. There can be more than one disk group for each logical host. All entries in this file must correspond to file systems located on multihost disks, and can specify either UFS or VxFS file systems. File systems can have any name and mount point, as long as they are represented in the proper logical host vfstab file.


This is common to all logical hosts, HA-NFS or otherwise.

Adding Share Information to the dfstab File


The dfstab file specifies the sharing (exporting) of HA-NFS file systems. There must be a dfstab.lhname file for each logical host that is exporting HA-NFS file systems. These files must be the same on all cluster nodes that master that logical host. If you use the dfstab options to limit access to only certain NFS clients, you should also grant access to all physical hostnames of the servers. This enables these servers to act as clients, which is an important part of fault monitoring. For example:

	share -F nfs -o rw=client1:client2:sc-node0:sc-node1 /hanfs/export/fs3

If possible, you should confine share options to just the rw or ro forms that do not provide a list of clients or netgroups. This removes any dependency on the name service.

Note - You must also register and start the HA-NFS data service, which is discussed in the Registering a Data Service section on page 12-21 and in the Starting and Stopping a Data Service section on page 12-25.

Sample vfstab and dfstab Files
Compare the vfstab.lhost1 and dfstab.lhost1 entries shown in the following:

	# cat vfstab.lhost1
	/dev/vx/dsk/dg1/dg1-stat /dev/vx/rdsk/dg1/dg1-stat /hanfs ufs 1 no -
	/dev/vx/dsk/dg1/vol1 /dev/vx/rdsk/dg1/vol1 /ha/fs1 ufs - no -
	/dev/vx/dsk/dg1/vol2 /dev/vx/rdsk/dg1/vol2 /hahttp/home ufs - no -
	/dev/vx/dsk/dg3/volm /dev/vx/rdsk/dg1/vol4 /dbms/tbp1 ufs - no -
	#
	# cat dfstab.lhost1
	share -F nfs -d "HA file system 1" /ha/fs1
	share -F nfs -d "HTTP file system" /hahttp/home
	share -F nfs -d "DBMS file system" /dbms/tbp1

The administrative file system (/hanfs) is not shared. The administrative file system for a logical host is never shared.

Removing HA-NFS File Systems From a Logical Host


To remove a file system from HA-NFS control you must:

1.	Manually unshare the file system that is being removed.

2.	On all appropriate hosts, delete the related mount information from the /etc/opt/SUNWcluster/conf/hanfs/vfstab.lhname file for the logical host.

3.	On all appropriate hosts, delete the related share information from the /etc/opt/SUNWcluster/conf/hanfs/dfstab.lhname file for the logical host.
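
As a sketch, removing the /ha/fs1 file system (from the sample files shown above) from HA-NFS control might look like the following. The file edits must be repeated on all appropriate hosts, and the final umount is only needed if the file system should no longer be mounted at all:

	# unshare /ha/fs1
	# vi /etc/opt/SUNWcluster/conf/hanfs/vfstab.lhost1	(delete the /ha/fs1 entry)
	# vi /etc/opt/SUNWcluster/conf/hanfs/dfstab.lhost1	(delete the /ha/fs1 share line)
	# umount /ha/fs1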

Using the hareg Command


You use the hareg command to register a data service, change its on/off state, and display information about registered data services. It can configure the data service for all existing logical hosts, or just selected logical hosts (using -h).

Registering a Data Service


Before a data service, such as HA-NFS, can provide services, you must register it. Data services are typically registered only once, when the service is initially configured. You can register data services only when the cluster is running and all nodes have joined.
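
One way to confirm that the cluster is running with all nodes joined before registering is to check the general cluster status first. This is only a suggested check, using the hastat status command:

	# hastat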

Registering the HA-NFS Data Service
You can run the hareg command on any node currently in the cluster membership. It is run only once regardless of the number of HA-NFS logical hosts in the cluster.
•	To register the HA-NFS data service and associate one or more logical hosts with it:

	# hareg -s -r nfs [-h logical_host]

Note - The -s option indicates that this is a Sun-supplied data service.

Although data services are not required to be restricted to specific logical hosts, if they are not restricted, there can be inappropriate processes associated with a logical host, such as HA-NFS fault monitors associated with an HA-Oracle logical host.

•	To check the status of all data services:

	# hareg
	nfs	off

•	To associate a previously registered data service with a new logical host:

	# scconf clustname -s data-service-name logicalhost-name

	This command is run on only one of the nodes.

•	To obtain configuration information about a data service:

	# hareg -q nfs

Registering a Custom Data Service
Configuring a custom data service is a complex issue that requires a great deal of preparation. The following example shows how a custom data service might be registered.

	# hareg -r new_ha_svc \
	-v 2.7 \
	-m START=/var/new_ha_svc/my_start \
	-t START=30 \
	-m STOP=/var/new_ha_svc/my_stop \
	-t STOP=30 \
	-d NFS

The command options have the following meaning:
•	The -s option is not used; this is not a standard Sun data service.

•	The -r option precedes the data service name.

•	The -v 2.7 option defines the data service version number.

•	The -m START option is the path to the start method. The supported method names are START, START_NET, STOP, STOP_NET, ABORT, ABORT_NET, FM_INIT, FM_START, FM_STOP, and FM_CHECK.

•	The -t START option is the timeout value for the start method. The timeout value is the amount of time the method has to complete before it is terminated.

•	The -d NFS option indicates the custom data service is dependent on the NFS data service. A sketch of what the start and stop method scripts might contain follows this list.
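
The method programs themselves are supplied by the data service developer; the API only requires that they exist on every node, are executable, and exit with status 0 on success. The following is a minimal sketch of what /var/new_ha_svc/my_start and /var/new_ha_svc/my_stop might contain for a hypothetical daemon called new_ha_daemon (the daemon name, its location, and the decision to ignore the arguments that Sun Cluster passes to methods are assumptions made for illustration only):

	#!/bin/sh
	# my_start - sketch of a START method for the new_ha_svc data service.
	# Sun Cluster invokes this on each node during reconfiguration.
	/var/new_ha_svc/bin/new_ha_daemon &
	echo $! > /var/new_ha_svc/daemon.pid
	exit 0

	#!/bin/sh
	# my_stop - sketch of a STOP method; stops the daemon started by my_start.
	if [ -f /var/new_ha_svc/daemon.pid ]; then
	        kill `cat /var/new_ha_svc/daemon.pid`
	        rm -f /var/new_ha_svc/daemon.pid
	fi
	exit 0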

Unregistering a Data Service
To stop and unregister a data service, such as HA-NFS, you must perform the following steps:

1.	Stop the HA-NFS service on all nodes.

	# hareg -n nfs

Note - Data services are cluster-wide; you cannot stop HA-NFS on just one node. You can effectively stop a data service for a logical host by placing the logical host in maintenance mode.

2.	Unregister the HA-NFS data service.

	# hareg -u nfs [-h ...]

3.	If appropriate, remove the logical hosts themselves.

	# scconf clustername -L lghost1 -r

Note - You must turn a data service off before removing a logical host associated with it.

Starting and Stopping a Data Service
Each data service has an on/off state. The on/off state for a data service is cluster-wide, and persists through reboots, takeovers, and switchovers. This on/off state provides system administrators with a mechanism for temporarily shutting off a data service. When a data service is off, it is not providing services to clients.

Starting and Stopping the HA-NFS Data Service


To turn the HA-NFS data service on:

	# hareg -y nfs
	# hareg
	nfs	on

Note - Use the hareg -n option to turn a data service off. You can use multiple -n and -y options to turn some services on and other services off at the same time. However, the final on/off state must satisfy any data service dependencies. For example, if a data service depends on NFS, it is not legal to turn that data service on and turn NFS off at the same time.

Global Data Service Control


The hareg command has options that globally start or stop all configured data services.
•	To start all registered data services:

	# hareg -Y

•	To stop all registered data services:

	# hareg -N


File Locking Recovery


When the HA-NFS logical host fails over to a different physical host, the client sees no significant differences. The IP address it was communicating with before is active, and all of the file systems are available. A request could have timed out, and might need to be restarted, but there should be no other changes to the client.

However, if the client had locked files on the server, the locks might have been lost when the server failed. This would require an immediate termination of all NFS services for the active clients, because data integrity could no longer be guaranteed. This would defeat the purpose of HA-NFS.

When the logical host fails over, the NFS statd and lockd processes on the new physical system are restarted. Before serving any data, they contact all of the clients and request information about the locks that they were holding. This lock reestablishment process is a capability of the normal, non-HA operation of the NFS environment, and is not changed for HA-NFS.
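
After a switchover, you can confirm that the lock daemons were restarted on the new master. This check is only a convenience and is not part of the HA-NFS software:

	# ps -ef | egrep 'statd|lockd' | grep -v grep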

Exercise: Setting Up HA-NFS File Systems
Exercise objective - In this exercise you will do the following:
•	Register the HA-NFS data service
•	Verify that the HA-NFS data service is registered and turned on
•	Verify that the HA-NFS file systems are mounted and exported
•	Verify that clients can access HA-NFS file systems
•	Switch the HA-NFS data services from one server to another

Preparation
There is no preparation required for this exercise.

Tasks
The following tasks are explained in this section:
•	Verifying the environment
•	Preparing the HA-NFS file systems
•	Registering the HA-NFS data service
•	Verifying access by NFS clients
•	Observing HA-NFS failover behavior

Verifying the Environment
In earlier exercises, you created a logical host for HA-NFS, hanfs, using a disk group named hanfs. Confirm that this logical host is available and ready to configure for HA-NFS.

1.	Ensure that your cluster is active. If not, start it.

	a.	On only one cluster node, type:

		# scadmin startcluster phys_nodename clustername

Note - Wait for the cluster to activate and reconfiguration to complete on the first node.

	b.	Start each remaining node at exactly the same time.

		# scadmin startnode

Caution - If you cannot start the remaining nodes at exactly the same time, then wait for each node to complete its reconfiguration before starting the next node.

2.	Make sure that you have the hanfs switchable disk group that you can assign for use by HA-NFS. The hanfs disk group was created in an earlier lab.

	# vxprint

Note - Use the metastat and metaset -s hanfs commands to verify the hanfs diskset and volume status for SDS installations.

3.	Verify that the status of each new NAFO group is OK on all nodes.

	# pnmstat -l

Preparing the HA-NFS File Systems
Now that the logical host is ready, configure it for use with HA-NFS. The demo file system mount information is different for CVM/SSVM installations and SDS installations. Both examples are shown.

1.	Add the /hanfs1 and /hanfs2 file system mount information to the vfstab.clustername-nfs file on all nodes on which the HA-NFS logical host will run.

CVM/SSVM Mount Information


	/dev/vx/dsk/hanfs/hanfs.1 /dev/vx/rdsk/hanfs/hanfs.1 /hanfs1 ufs 1 no -
	/dev/vx/dsk/hanfs/hanfs.2 /dev/vx/rdsk/hanfs/hanfs.2 /hanfs2 ufs 1 no -

SDS Mount Information


	/dev/md/hanfs/dsk/d101 /dev/md/hanfs/rdsk/d101 /hanfs1 ufs 1 no -
	/dev/md/hanfs/dsk/d102 /dev/md/hanfs/rdsk/d102 /hanfs2 ufs 1 no -

2.	On both nodes, create the logical host-specific dfstab files.

	# cd /etc/opt/SUNWcluster/conf/hanfs
	# touch dfstab.clustername-nfs

3.	On both nodes, add share commands to the HA-NFS-specific dfstab.clustername-nfs files for your two HA-NFS test file systems.

	share -F nfs -o rw,anon=0 /hanfs1
	share -F nfs -o rw,anon=0 /hanfs2

4.	If any mountd and nfsd processes are running, stop them, or run /etc/init.d/nfs.server stop.

Registering HA-NFS Data Service
Now you are ready to activate the HA-NFS data service. Run these commands from only one node.

1.	Use the hareg command to see which data services are currently registered.

	# hareg

2.	Register HA-NFS by typing the following command on only one cluster node:

	# hareg -s -r nfs -h clustername-nfs

3.	Use the hareg command again to see that the HA-NFS data service is now registered.

	# hareg
	nfs	off

4.	Turn on the HA-NFS service for the cluster.

	# hareg -y nfs

5.	Verify that the logical host's HA-NFS file systems are now mounted and shared using the mount and dfshares commands.

Note - You may have to switch the nfs logical host between nodes before the nfs file systems are mounted and shared.

Verifying Access by NFS Clients
Verify that NFS clients can access the HA-NFS file systems.

1.	On the administration workstation, verify that you can access the nfs logical host file system.

	# ls /net/clustername-nfs/hanfs1
	lost+found  test_file

2.	On the administration workstation, copy the Scripts/test.nfs file into the root directory.

3.	On the administration workstation, edit the /test.nfs script and change the value of the clustername entry to match your logical hostname.

	#!/bin/sh
	cd /net/clustername-nfs/hanfs1
	while (true)
	do
		echo `date` > test_file
		cat test_file
		rm test_file
		sleep 1
	done

When this script is running, it creates and writes to an NFS-mounted file system. It also displays the time to standard output (stdout). This script makes it easy to informally time how long the NFS data service is interrupted during switchovers and takeovers.

Observing HA-NFS Failover Behavior
Now that the HA-NFS environment is working properly, test its high availability operation.

1.	On the administration workstation, start the test.nfs script.

2.	Use the scadmin switch or haswitch command to transfer control of the NFS logical host from one HA server to the other.

	# scadmin switch clustername dest-phys-host logical-host
	# haswitch dest-phys-host logical-host

3.	Observe the messages displayed by the test.nfs script.

4.	How long was the HA-NFS data service interrupted during the switchover from one physical host to another?

	__________________________________________________

5.	Use the mount and share commands on both nodes to verify which file systems they are now mounting and exporting.

	__________________________________________________
	__________________________________________________
	__________________________________________________

6.	Use the ifconfig command on both nodes to observe the multiple IP addresses (physical and logical) configured on the same physical network interface.

	# ifconfig -a

Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:

❑	Describe the function of the HA-NFS support files

❑	List the primary functions of the HA-NFS start and stop methods

❑	List the primary functions of the HA-NFS fault monitoring probes

❑	Configure HA-NFS in a Sun Cluster environment

❑	Add and remove HA-NFS file systems

❑	Switch a HA-NFS logical host between systems

Think Beyond
Are there restrictions on the file systems HA-NFS can support?

What types of NFS operations (if any) might be more difficult in the HA-NFS environment?


System Recovery
Objectives
Upon completion of this module, you will be able to:
•	List the functions of Sun Cluster control software

•	List the events that can trigger a cluster reconfiguration

•	Explain the failfast concept

•	Describe the general priorities during a cluster reconfiguration

•	Describe the recovery process for selected cluster failures

•	Recover from selected cluster failures

This module summarizes the basic recovery process for a number of typical failure scenarios. It includes background information and details about operator intervention.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion - The following questions are relevant to your learning the material presented in this module:

1.	How does the cluster recognize that there has been an error?

2.	What types of error detection mechanisms are there?

3.	How does the administrative workstation detect an error?

4.	How do you recover from the common Sun Cluster HA system failures?

Additional Resources
Additional resources - The following references can provide additional details on the topics discussed in this module:

•	Sun Cluster 2.2 System Administration Guide, part number 805-4238

•	Sun Cluster 2.2 Cluster Volume Manager Guide, part number 805-4240

•	Sun Cluster 2.2 Error Messages Manual, part number 805-4242


Sun Cluster Reconfiguration Control


Reconfiguration in a cluster environment can happen at several different levels. Some reconfigurations are independent of the cluster framework software. For example, the disk management software monitors the state of virtual volumes and can detach mirrors if there is a hardware failure. This is managed independently of any other cluster software. Other reconfigurations can range from a full reconfiguration, in which cluster membership is renegotiated, to a minor switchover to a backup network interface.

Many of the components shown in Figure 13-1 have failure recovery capabilities. Some failures are less transparent than others and can result in a node crash. Although some of the failures do not disturb the cluster operation, they reduce the level of redundancy and therefore increase the risk of data loss.

Figure 13-1	Cluster Reconfiguration Control

All of the following cluster software components have some level of error detection and recovery capabilities.

Cluster Membership Monitor


The CMM daemon, clustd, detects the loss of the heartbeat from a failed node and initiates a general cluster reconfiguration.

Switch Management Agent


The SMA detects the loss of the primary cluster interconnect and switches to the backup interconnect path. This is a minor local reconfiguration. SMA provides support for Ethernet private networks and for SCI private networks, as well as additional SCI switch management functions.

Public Network Management


The pnmd process monitors the state of the cluster's public network interfaces and network. It can fail over to a backup adapter or confirm a network outage. It initiates only a minor local reconfiguration if it is switching to a backup interface. It can also trigger a larger logical host reconfiguration if general public network problems are detected on a cluster host.
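
The NAFO groups that PNM is monitoring, and their current status, can be displayed with the pnmstat command used elsewhere in this course:

	# pnmstat -l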

Failfast Driver (/dev/ff)


The Sun Cluster failfast driver monitors critical processes or operations. If they do not respond or complete within defined limits, the failfast driver forces a system panic.

Data Service Fault Monitors
The local and remote fault monitors for a data service can force the data service to migrate to a new host system. This causes an intermediate-level cluster reconfiguration.

Disk Management Software


The CVM, SSVM, and SDS volume managers all monitor the state of their virtual volume structures. If a disk drive failure is detected for a mirrored or RAID5 volume, the disk management software can take the failed object out of active use. This type of reconfiguration is completely independent of any other cluster software and is transparent to all other cluster software.

Database Management Software


Some databases, such as Oracle Parallel Server, have resident recovery mechanisms that can automatically recover from an unexpected cluster host system crash and continue. This is an independent recovery feature. Most other databases do not have recovery capabilities of this kind.


Sun Cluster Failfast Driver


The failfast mechanism is a software watchdog usually described as a time-out on a time-out. Some failures are too critical to allow further node operation. The affected node must be stopped immediately to prevent database corruption. The failfast driver forces a UNIX panic. After the panic, the system automatically reboots. The panic ensures that the Sun Cluster software is aware that a problem occurred and will initiate appropriate actions.

As shown in Figure 13-2, the failfast driver is constantly monitoring the success of critical daemons or cluster operations. If the monitored operation times out, the failfast driver forces a UNIX panic.

Figure 13-2	Failfast Mechanism

After the panic, the system automatically tries to start again. However, if there is UNIX file system damage that cannot be automatically repaired by the fsck utility, you might have to run fsck manually to repair the damage.

Failfast Messages
When the failfast driver forces the UNIX panic, a panic message is displayed in the cconsole window of the system. As shown in the following example, the panic message contains an error message that might point to the source of the problem.

	panic[cpu3]/thread=0xf037c4a0: Failfast timeout - unit "comm_timeout Device closed while Armed"
	syncing file systems... [3] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 done
	rebooting...

You must record the failfast error if possible. It will soon scroll off the screen. The relevant portion of the error is displayed in quotes.


Sun Cluster Reconfiguration Sequence


Many events can trigger cluster reconfiguration. Regardless of how a general reconfiguration is initiated, the general process is controlled by a master script file named reconf_ener. The reconf_ener script has many subroutines and, depending on how it was initiated, the reconfiguration can be minor or it can be a major reconfiguration that results in the loss of one or more cluster members. The later stages of a full reconfiguration are reserved for logical host reconfigurations.

The /opt/SUNWcluster/bin/reconf_ener script is run any time a node reconfiguration is required.

Warning - Do not edit the reconf_ener script. Any change to the script can cause unreliable operation or database corruption.

As shown in Figure 13-3, once the reconf_ener script is initiated, it can perform many different operations that are dependent on current status information. The reconf_ener script can also initiate disk management software recovery procedures.

Figure 13-3	Reconfiguration Initiation

Reconfiguration Triggering Events
Many cluster events can trigger a reconfiguration, including:

•	The operator using the scadmin command to start or stop a node.

•	The CMM (clustd) detecting a failed node.

•	The CMM (clustd) detecting that another node is joining the cluster.

•	SMA detecting a failed private network. This generates a minor reconfiguration.

Independent Reconfiguration Processes


The following recovery processes are independent of the reconf_ener command:

•	CVM independently manages problems with virtual volumes.

•	RDBMS user application failures are detected and handled internally by the RDBMS software. Oracle data recovery using redo logs is handled by Oracle, but is initiated indirectly by the DLM recovery process.

•	PNM reconfiguration of NAFO groups

•	UNIX file system recovery is performed automatically by the fsck utility unless the errors are too severe.


Sun Cluster Reconfiguration Steps


When there is a change in cluster status, either because of a failed node or operator intervention, the reconfiguration process proceeds in steps. The steps are coordinated between all active nodes in the cluster, and all nodes must complete a given step before the reconfiguration can proceed to the next step.

As shown in Figure 13-4, the cluster interconnect system (CIS) is used for communication between nodes during a reconfiguration. It provides a critical link that is used to verify step changes between the cluster members.

Figure 13-4	Cluster Reconfiguration Coordination

Reconfiguration Process Priorities
The reconfiguration steps are prioritized. The first steps during a cluster reconfiguration resolve fundamental issues that are important to general cluster operation. The later steps address the more specialized cluster functionality. The steps proceed as follows:

1.	The general reconfiguration steps are completed first. This includes:

	•	Reserving quorum devices
	•	Arbitrating cluster membership
	•	Establishing cdb and ccd consistency
	•	Starting disk management recovery programs

2.	If appropriate for the cluster application, database reconfiguration steps are initiated next. This is important for Oracle Parallel Server installations and includes:

	•	Distributed lock management recovery

3.	The data services reconfiguration steps are completed in the final stages and include:

	•	Shutting down NFS daemons
	•	Exporting HA-NFS file systems
	•	Importing disk groups (to backup node)
	•	Using fsck to mount failed HA-NFS file systems
	•	Using ifconfig on logical IP addresses to backup node
	•	Restarting NFS daemons (initiating client lock recovery)

Reconfiguration Step Summary
As shown below, the cluster reconfiguration process varies depending on the data service configuration. Several different reconfiguration processes can take place in the same time period. Steps 0 through 12 proceed in approximate order, with the following activities in each area:

	General		Disk quorum or failure fencing resolved
			CIS issues resolved
			CCD issues resolved
			PNM issues resolved
			Logical host issues resolved

	CVM/SSVM	Begin
			Volume recovery initiated and recovery performed if necessary

	Oracle DLM	Distributed lock recovery initiated and coordinated

In a cluster that is running only OPS, there might not be any logical hosts configured and steps 10 and 11 would take very little time.


Cluster Interconnect Failures


CIS Failure Description
If the Ethernet or SCI interconnect fails on a node in the cluster, the smad daemon on that node detects the failure and initiates a minor reconfiguration that switches to the backup CIS interface. The other nodes in the cluster are aware that CIS communications have moved to the backup interface, so they also switch to their backup interfaces.

Note - A critical feature of a CIS failure is that switching to a backup interface can be done quickly. If it takes too long, then the cluster membership monitor daemon, clustd, times out and a major cluster reconfiguration is started.

CIS Failure Symptoms
Error messages are different for each type of cluster interconnect system. The following Ethernet-based interconnect failures issue continuously repeating error messages.

	ID[SUNWcluster.sma.down.5010]: link between node 0 and node 1 on net 0 is down
	Aug 16 11:08:55 eng-node0 ID[SUNWcluster.reconf.5010]: eng-cluster net 0 (be0) de-selected
	Aug 16 11:08:56 eng-node0 ID[SUNWcluster.reconf.1030]: eng-cluster net 1 (be1) selected
	be0: Link Down - cable problem?

The following SCI interconnect failure messages cease after the backup interface is operational.

	NOTICE: ID[SUNWcluster.sma.smak.4001]: SCI Adapter 0: Card not operational (1 2)
	NOTICE: ID[SUNWcluster.sma.smak.4051]: SCI Adapter 0: Link not operational (1 2)
	Nov 15 17:55:08 sec-0 ID[SUNWcluster.sma.smad.5010]: seccluster adapter 0 de-selected
	Nov 15 17:55:08 sec-0 ID[SUNWcluster.sma.smad.1030]: seccluster adapter 1 selected

Correcting Ethernet CIS Failures
The following actions are necessary to repair a failure in an Ethernet-based cluster interconnect:

•	You must determine if the problem is a cable or an interface card.

•	You might have to take the node with the failed Ethernet interface out of clustered operation while repairs are being made. After repairs are complete, you must bring the node into the cluster again. No further action is required.
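
A typical repair sequence, run on the node with the failed Ethernet interface, might look like the following. The scadmin operations are the same ones used to leave and rejoin the cluster elsewhere in this course; the repair step itself depends on the hardware:

	# scadmin stopnode
	(replace the failed cable or interface card)
	# scadmin startnode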

Correcting SCI Interconnect Failures


The following actions are necessary to repair a failure in a SCI-based cluster interconnect:

•	You must determine if the problem is a cable or an interface card.

•	You might have to take the node with the failed SCI interface out of clustered operation while repairs are being made. After repairs are completed, you must take all of the nodes out of clustered operation and run the sm_config program on one of the nodes, followed by a reboot of all nodes.

Caution - If any SCI card or switch is moved or replaced, you must run the /opt/SUNWsma/bin/sm_config script again to reprogram the SCI card flash PROMs. The sm_config script will tell you which cluster hosts must be rebooted.


Two-Node Partitioned Cluster Failure


If a complete cluster interconnect failure occurs in a two-node cluster, the clustd daemon on each node detects a heartbeat loss, and each node initiates a reconfiguration. The reconfiguration process is different depending on which disk management software your cluster is using.

CVM or SSVM Partitioned Cluster


If there is a complete CIS failure in a two-node cluster that is running either CVM or SSVM software, both nodes in the cluster race to reserve the designated quorum disk drive. The first node to reserve the quorum device remains in the cluster and takes over any additional logical hosts for which it is the backup. The other node aborts the Sun Cluster software. Once the CIS problem is repaired, both nodes can resume normal clustered operation.

SDS Partitioned Cluster
If there is a complete CIS failure in a two-node cluster that is running the SDS software, the concept of a quorum device does not exist. When the private network is broken, the nodes cannot communicate with each other, and each assumes it is alone in the cluster. The host owning the logical host or hosts completes its cluster reconfiguration as normal and takes no action, because it owns all the logical hosts. However, the other host also believes it is alone in the cluster, and it also goes through cluster reconfiguration. During its reconfiguration, the backup host assumes it has to take control of the logical host and, in doing so, takes ownership of the disksets. This causes a panic on the node that currently has ownership of the disksets. If each host is the designated backup for the other, it is a race as to which one panics first.


Logical Host Reconfiguration


Each logical host has both a local and a remote fault monitoring program associated with it.
•	The local fault monitor runs on the current logical host master.

•	The remote fault monitor runs on the designated backup system for the logical host.

If either the local or the remote fault monitors detect certain failures, they attempt to force a reconfiguration of the logical host. This is an attempt to migrate the logical host to a healthy system.

As shown in Figure 13-5, the local fault monitor runs on the logical host master and verifies the health of the master host. The remote fault monitor runs on the designated backup system for the logical host and also verifies the correct operation of the logical host master.

Figure 13-5	Logical Host Fault Monitoring

If the fault is detected by the local fault monitor, it initiates a giveaway process with the hactl command. This might end with the designated backup system taking over the logical host. If a fault is detected by the remote fault monitor, it initiates a takeaway process with the hactl command. This might end the same as the giveaway process, with the backup system taking over the logical host.

Sanity Checking
Before a physical host can become the new master of a logical host, it must be certified as fully operational. This is called sanity checking and is performed by the FM_CHECK methods.

Exercise: Failure Recovery
Exercise objective - In this lab you will perform a recovery for the following:

•	Failed cluster interconnect

•	Partitioned cluster

•	A NAFO group interface failure

•	A logical host fault monitor giveaway or takeaway

•	A failfast

Preparation
You should start the Sun Cluster Manager application on one of the cluster hosts and display it on the administration workstation. Use the Sun Cluster Manager application to observe the effects of the failures that are created in this lab.
 

The failfast exercise could result in permanent file system damage. Tell students now if you do not want them to perform the failfast procedure.

Tasks
The following tasks are explained in this section:
•	Recovering after losing a private network cable

•	Recovering from a cluster partition

•	Recovering after a public network failure

•	Recovering after a logical host fault monitor giveaway

•	Recovering from a cluster failfast

Losing a Private Network Cable
1.	Disconnect the active private network cable or turn off the associated SCI switch.

Note - Be careful with the fragile SCI cables if you move them.

•	Predicted behavior:

	__________________________________________________________
	__________________________________________________________
	__________________________________________________________
	__________________________________________________________

•	Observed behavior:

	__________________________________________________________
	__________________________________________________________
	__________________________________________________________
	__________________________________________________________

Partitioned Cluster (Split Brain)


Disconnect both private network cables from the same node, as close to simultaneously as possible, or turn off both SCI switches.

Note - Be careful with the fragile SCI cables if you move them.

•	Predicted behavior:

	__________________________________________________________
	__________________________________________________________
	__________________________________________________________
	__________________________________________________________

•	Observed behavior:

	__________________________________________________________
	__________________________________________________________
	__________________________________________________________
	__________________________________________________________

Public Network Failure (NAFO group)
1. Disconnect an external (public) network cable.
•	Predicted behavior:

	__________________________________________________________
	__________________________________________________________
	__________________________________________________________
	__________________________________________________________

•	Observed behavior:

	__________________________________________________________
	__________________________________________________________
	__________________________________________________________
	__________________________________________________________

Logical Host Fault Monitor Giveaway


1. On one node, use the kill command to kill the nfsd daemon on the physical host mastering the HA-NFS logical host.
•	Predicted behavior:

	__________________________________________________________
	__________________________________________________________
	__________________________________________________________
	__________________________________________________________

•	Observed behavior:

	__________________________________________________________
	__________________________________________________________
	__________________________________________________________
	__________________________________________________________

Cluster Failfast
1. Kill the clustd process on a node to create a cluster abort or kill the ccdd daemon to create a failfast panic.
•	Predicted behavior:

	__________________________________________________________
	__________________________________________________________
	__________________________________________________________
	__________________________________________________________

•	Observed behavior:

	__________________________________________________________
	__________________________________________________________
	__________________________________________________________
	__________________________________________________________

Caution - Creating a failfast causes a UNIX panic. This can cause permanent file system damage.

Exercise Summary
Discussion - Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.

Manage the discussion here based on the time allowed for this module, which was given in the About This Course module. If you find you do not have time to spend on discussion, then just highlight the key concepts students should have learned from the lab exercise.
•	Experiences

	Ask students what their overall experiences with this exercise have been. You may want to go over any trouble spots or especially confusing areas at this time.

•	Interpretations

	Ask students to interpret what they observed during any aspects of this exercise.

•	Conclusions

	Have students articulate any conclusions they reached as a result of this exercise experience.

•	Applications

	Explore with students how they might apply what they learned in this exercise to situations at their workplace.

Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:

❑	List the functions of Sun Cluster control software

❑	List the events that can trigger a cluster reconfiguration

❑	Explain the failfast concept

❑	Describe the general priorities during a cluster reconfiguration

❑	Describe the recovery process for selected cluster failures

❑	Recover from selected cluster failures

Think Beyond
What are the issues for split-brain failures with more than two nodes? Is it safe to have two subclusters running in a nominal four-node cluster?

What procedures should be documented for operations personnel?

Sun Cluster High Availability Data Service API


Objectives
Upon completion of this module, you should be able to:
•	Describe the available data service methods

•	Describe when each method is called

•	Describe how to retrieve cluster status information

•	Describe how to retrieve cluster configuration information

•	Describe how the fault methods work and how to request failovers

This module demonstrates how to integrate your applications into the Sun Cluster High Availability framework. It also describes key failover actions performed by the Sun Cluster High Availability software.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion - The following questions are relevant to understanding this module's content:

1.	What are the requirements to add a data service?

2.	What information do you have to provide to Sun Cluster?

3.	How do you interact with the cluster?

4.	How do you retrieve cluster status and configuration information?

Additional Resources
Additional resources - The following references can provide additional details on the topics discussed in this module:

•	Sun Cluster 2.2 System Administration Guide, part number 805-4238

•	Sun Cluster 2.2 API Developer's Guide, part number 805-4241

Overview
Sun Cluster High Availability provides an API to make a data service highly available. The API permits a client-server data service to be layered on top of Sun Cluster High Availability. Usually, the data service already exists and was developed in a non-HA environment. The API was designed to permit an existing data service to be easily added to the Sun Cluster HA environment.

The Sun Cluster HA Data Service API employs command-line utility programs and a set of C library routines. For convenience, all C library functionality is also available using the command-line utility programs. This gives the programmer the option to code shell scripts or to code in a compiled language.

Note - Custom-written HA data services are not supported by SunService unless they are written by the Sun Professional Services organization.

When a data service first registers with Sun Cluster High Availability, it registers a set of call-back programs or methods. Sun Cluster High Availability makes call-backs to the data service's methods when certain key events in the Sun Cluster High Availability cluster occur. Refer to the Sun Cluster 2.2 API Programmer's Reference Guide for more information.


In this release, only facilities for generic application failover (that is, starting and stopping a data service) are provided. In the event of a logical host failover from one High Availability server to another, generic applications will also experience failover; they will be shut down on the one server and automatically started on the other server. Enhancements such as support for fault monitoring of generic applications are planned for future releases.

Data Service Requirements


A data service must meet the requirements discussed in the following sections to participate in the Sun Cluster High Availability Data Service API.

Client-Server Data Service


Sun Cluster High Availability is designed for client-server applications. Time-sharing models in which clients remotely log in and run the application on the server have no inherent ability to handle a crash of the server.

Data Service Dependencies


The data service process(es) must be relatively stateless in that they write all updates to disk. When a physical host crashes and a new physical host takes over, Sun Cluster High Availability calls a method to perform any crash recovery of the on-disk data.

No Dependence on Physical Hostname of Server
If a data service needs to know the hostname of the server on which it is running, it should be modified to use the logical hostname rather than the physical hostname.

Handles Multi-homed Hosts


Multi-homed hosts are hosts that are on more than one public network. Sun Cluster High Availability servers can appear on multiple networks, and have multiple logical (and physical) hostnames and IP addresses. The data service must handle the possibility of multiple logical hosts on more than one public network.

Handles Additional IP Addresses for Logical Hosts


Even in hosts that are non-multi-homed, a High Availability server has multiple IP addresses: one for its physical host, and one additional IP address for each logical host it currently masters. A High Availability server dynamically acquires additional IP addresses when it becomes master of a logical host, and dynamically relinquishes IP addresses when it gives up mastery of a logical host. The START and STOP methods provide hooks for Sun Cluster HA, which inform a data service that the set of logical hosts has changed.

Reconfiguration Overview
Whenever a change in cluster state occurs, the Sun Cluster HA software performs a cluster reconfiguration. A cluster state change can be caused by a host crashing, or by the planned migration of a logical host using the haswitch or scadmin switch commands. The Sun Cluster reconfiguration process is a sequence of steps on all physical hosts that are currently up and in the cluster. The steps execute in lock-step, which means that all hosts complete one step before any host goes on to the next step. After general reconfiguration issues are resolved, such as the CDB and CCD consistency checks, a number of HA routines are performed that start, stop, or abort the operation of a logical host, as necessary. These HA routines are called methods.

Caution - If the methods are stored in a disk group on the multihost disks, only the server that currently owns that disk group has access to the methods. All servers in the high availability cluster must be able to execute the START, STOP, and ABORT methods.

Data Service Methods


Whenever any change in cluster state occurs as part of the cluster reconfiguration, Sun Cluster High Availability calls the data service's method programs on each host in the cluster.

Note - You must have methods for starting and stopping a Sun Cluster High Availability data service. ABORT methods are optional, and can be omitted.

START Methods
After a physical host crashes, Sun Cluster HA moves the logical host, which the physical host had been mastering, to a surviving host. Sun Cluster HA uses the START methods to restart the data services on the surviving hosts.

STOP Methods
The haswitch command moves a logical host from one physical host to a backup through cluster reconfiguration. When the haswitch command is executed, the STOP methods are used to cleanly shut down the data service on the original physical host before starting the data service on the backup. Similarly, the hastop command uses the STOP methods to cleanly shut down Sun Cluster HA data services.

Note - The STOP method should perform a smooth shutdown but does not wait for network clients to completely finish their work, because that could introduce an unbounded delay.

ABORT Methods
The ABORT methods are called when the cluster on a particular node aborts. All cluster activity on the node is stopped, but the physical node continues to run. The ABORT methods must immediately halt their data services.

If a fault probe detects that a high availability server is sick, it causes a failover of data services from that sick HA server to the healthy HA server. Before shutting down, the sick server attempts to call the ABORT methods for all currently registered data services. Sun Cluster HA monitors the health of the physical hosts, and can decide to halt or reboot a physical host, if necessary. The ABORT methods execute last wishes code before halting an HA server.

The ABORT and ABORT_NET methods are similar to the START and STOP methods. ABORT_NET methods are called while the logical host's network addresses are still configured UP. ABORT methods are called after the logical host's network addresses are configured in the DOWN state and are not available.

There is no guarantee that the ABORT or ABORT_NET methods are called. The HA server might panic, and no methods can be called. Or the HA server might be so sick that it cannot successfully execute the ABORT methods. You should use ABORT methods only to optimize performance. Data services must function correctly, even if ABORT methods are not called. ABORT and ABORT_NET methods might be called while one of the other four START/STOP methods is executing. The ABORT methods might find that one of the START/STOP methods was interrupted during its execution, and did not finish executing.

NET Methods
START_NET Method
For each registered data service whose ON/OFF state is ON, Sun Cluster HA first calls the data service's START method program. When the START method is called, the logical host's network addresses are not available because they have not been configured UP yet. Next, logical network addresses are configured UP and then the START_NET method is called. When the START_NET methods are called, the logical host's network addresses are configured UP and are available.

STOP_NET Method
For each registered data service, Sun Cluster HA calls the data service's STOP_NET method program. When the STOP_NET method is called, the logical host's network addresses are still configured UP. Next, the logical host's network addresses are configured DOWN, and then the STOP methods are called.

NET Method Workload
The data service can split the work of stopping between its STOP_NET and STOP methods any way it chooses. By the time the STOP method returns, all work associated with stopping the data service must be complete. In particular, the data service must cease using any data on the logical host's disk groups, because ownership of these disk groups is relinquished in subsequent reconfiguration steps. Similarly, it is up to each individual data service to decide how to split the startup work between the START and START_NET method programs. The data service can make one of them a no-op and do all the work in the other, or it can do some work in each method. All the work necessary to start the data service must be complete by the time START_NET returns control.

Fault Monitoring Methods


Sun Cluster HA software defines four methods for data services to use for their own fault monitoring:

•  FM_INIT
•  FM_START
•  FM_STOP
•  FM_CHECK

The FM_INIT, FM_START, and FM_STOP methods are called during the appropriate points of the logical host reconfiguration sequence. A data service can register any or all of these methods when it first registers itself with the hareg command.

FM_INIT and FM_START
For each registered data service whose ON/OFF state is ON, Sun Cluster HA calls the data service's FM_INIT method. FM_INIT initializes fault monitoring of a data service. The FM_INIT and FM_START methods can be used by the data service to start its own data-service-specific fault monitoring. This fault monitoring indicates whether the data service is available and performing useful work for its clients. FM_INIT and FM_START are called as two successive steps of the Sun Cluster HA reconfiguration. The FM_INIT step completes on all hosts in the cluster before any host executes the FM_START step. A data service fault monitor can leverage this sequencing if it needs to perform some initialization on all of the hosts before actually starting the fault monitoring. For example, the data service might need to create some dummy objects for the fault monitor to query or update or both.

FM_STOP
The FM_STOP method stops fault monitoring of a data service.
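The following Bourne shell fragments are a minimal sketch of how a data service might implement FM_START and FM_STOP, assuming a hypothetical monitor program (/opt/myds/bin/myds_probe) and a hypothetical name tag (myds.fmon); they use the pmfadm command described later in this module and are not the methods used by any bundled data service.

#!/bin/sh
# FM_START method sketch: run the data-service-specific fault monitor
# under pmfadm so it is restarted automatically if it exits unexpectedly.
# The monitor program and name tag are hypothetical.
pmfadm -c myds.fmon -n 5 /opt/myds/bin/myds_probe
exit 0

#!/bin/sh
# FM_STOP method sketch: stop the fault monitor by its name tag.
pmfadm -s myds.fmon
exit 0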

FM_CHECK
The FM_CHECK method checks the health of a data service. It is not called during HA cluster configuration. It can be called by a local or remote data service fault monitor that has detected a possible data service fault condition. It can be used to verify the health of either the current data service master or a potential new data service master. Depending on the results of the FM_CHECK methods, a potential logical host failover can be stopped or can continue.


Giveaway and Takeaway


A problem with a logical host can be detected by either the local fault monitor or the remote fault monitor for the logical host. If the fault is detected by the local fault monitor, it initiates a giveaway process with the hactl command. This can result in the designated backup system taking over the logical host. If a fault is detected by the remote fault monitor, it initiates a takeaway process with the hactl command. This also could result in the backup system taking over the logical host. Before a physical host can become the new master of a logical host, it must be certified as fully operational by the FM_CHECK method.

Giveaway Scenario
This scenario assumes that the local data service's fault monitor, running on the physical host phys-hostA, has detected a problem and concluded that phys-hostA is not healthy.

1. The local data service fault monitor (running on phys-hostA) requests that the logical host be given up using the following command:

   phys-hostA# hactl -g -s A -l mars

2. The potential new master for the logical host is phys-hostB.

3. The FM_CHECK methods for both data services are called on phys-hostB.

•  If all FM_CHECK methods exit zero, indicating they are healthy, the reconfiguration (transferring the logical host to the physical host phys-hostB) continues.
•  If any FM_CHECK method exits non-zero, indicating it is not healthy on phys-hostB, the logical host is not transferred to the physical host phys-hostB.

Takeaway Scenario
This scenario assumes the data service's remote fault monitor, running on phys-hostB, concludes that phys-hostA is unhealthy, and requests that the logical host be taken away by using the following command:

phys-hostB# hactl -t -s A -l mars

The potential new master for logical host mars is phys-hostB. The FM_CHECK method for the data service fault monitor is called on phys-hostB but not on phys-hostA. The exit status of the FM_CHECK methods is used in the same way as in the giveaway scenario described above. If all FM_CHECK methods exit zero, the reconfiguration continues. If any FM_CHECK method exits non-zero, the logical host is not transferred.

Method Considerations
It is important to remember the following rules when designing any HA method:

•  Do not blindly start or stop a data service.
•  Verify whether the data service has already been stopped or started.

Sun Cluster HA might call START and START_NET methods multiple times, with the same logical hosts being mastered, without an intervening STOP or STOP_NET method. Sun Cluster HA might call STOP and STOP_NET methods multiple times without an intervening START or START_NET method. The same applies to the fault methods. For example, START methods should check whether their work has already been accomplished (that is, whether the data service has already been started) before starting any processes. Similarly, STOP methods should check whether the data service has already been stopped before issuing commands to shut down the data service.
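As an illustration of this rule, the following Bourne shell fragment sketches an idempotent START method; the daemon path and pid file are hypothetical names used only for this example, and a production method would also use the arguments and helper commands described later in this module.

#!/bin/sh
# Idempotent START method sketch: check whether the data service is
# already running before starting it again.
PIDFILE=/var/run/mydsd.pid          # hypothetical pid file
DAEMON=/opt/myds/bin/mydsd          # hypothetical data service daemon

if [ -f $PIDFILE ] && kill -0 `cat $PIDFILE` 2>/dev/null; then
    exit 0                          # already started; nothing to do
fi

$DAEMON &
echo $! > $PIDFILE
exit 0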

STOP and START Method Parameters
If the data service is in the ON state, the STOP or START method is called with:

•  A comma-separated list of logical hosts for which this physical host is the master
•  A comma-separated list of logical hosts for which this node is the next backup node
•  The amount of time the method can take, in seconds
Note – If the data service is in the OFF state, the STOP method is called with an empty string, a comma-separated list of all logical hosts, and a timeout. The START methods are not called if the data service is in the OFF state.
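A minimal Bourne shell sketch of how a method might pick up these three arguments follows; the variable names and the echo placeholder are illustrative only.

#!/bin/sh
# START method argument handling sketch.
# $1 = comma-separated logical hosts this physical host masters
# $2 = comma-separated logical hosts for which this node is the next backup
# $3 = time, in seconds, the method is allowed to take
MASTERED=$1
NEXT_BACKUP=$2
TIMEOUT=$3

# Nothing to do if this node masters no logical hosts.
if [ -z "$MASTERED" ]; then
    exit 0
fi

# Act on each mastered logical host in turn.
for LHOST in `echo $MASTERED | tr ',' ' '`
do
    echo "starting the data service for logical host $LHOST"
done
exit 0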


START and STOP Method Examples


Example 1
Assume there is a symmetrically configured two-node high availability cluster. The two HA servers are named phys-venus and phys-mars, and the logical hosts are named mars and venus. Also, assume that phys-mars is mastering mars and phys-venus is mastering venus. When the high availability cluster goes through the reconfiguration process, the STOP_NET/STOP methods are called first, and then the START/START_NET methods are called. The identical STOP/START methods are called on both phys-mars and phys-venus. However, different arguments are passed to these methods on the two servers. The examples show how these argument lists differ. The STOP and START methods depend on their programmed logic to do the right thing based on the current cluster configuration and the current state of the data service.



Example 2
This example assumes a failover occurred and phys-venus now masters both venus and mars. Different arguments are passed to the STOP and START methods on the HA servers.

Note – The timeout was specified when the data service was registered. Sun Cluster runs each method in its own child process group, and if the timeout is exceeded, it sends a SIGTERM signal to the child process group.
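As a purely illustrative sketch (using the host names from Example 1, and a timeout of 300 seconds that is an assumption rather than a documented value), the START method on phys-venus, which now masters both logical hosts, might be invoked with arguments such as:

    "venus,mars"  ""  300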


Data Service Dependencies


The implementors of a data service must know the other data services upon which their service depends. Data service dependencies are specified when a data service is initially registered with Sun Cluster HA by using the hareg command. The registered dependencies induce a partial order in which the data service methods are called by Sun Cluster HA. For example, applications that access a database depend on that database: the database must be running before the application is started, and the application should be shut down before the database is shut down.

•  Data service A depends on data service B if, in order for A to provide its service, B must be providing its service.
•  You must supply data service dependencies to the hareg command when you register a data service.

If data service A depends on data service B, then the START method for data service B is called and completes before Sun Cluster HA calls the START method for data service A. The START_NET method for data service B similarly completes before Sun Cluster HA calls the START_NET method for data service A. For stopping the data services, the dependencies are considered in reverse order. The STOP_NET method of data service A is called and completes before the STOP_NET method of data service B. Similarly, the STOP method of data service A completes before the STOP method of data service B is called. The dependencies for the ABORT and ABORT_NET methods are the same as for the STOP and STOP_NET methods. In the absence of dependencies, Sun Cluster HA can call methods for different data services in parallel.


The haget Command


You use the haget command to extract configuration and state information about an SC HA configuration. Usually, the haget command is called by data service methods. The haget command allows you to program in a shell script, rather than in C using the library functions in ha_get_calls. You can call the haget command from any scripting language that can execute commands and redirect output. The haget command is designed to be called multiple times to get all the needed information. Each call is specific enough that parsing the output of the call should not be required.

The haget Command Options
The haget command outputs the extracted information on stdout. The syntax for the command is:

haget [-S] [-a API_version] -f fieldname [-h hostname] [-s dataservice]

The -f option takes a single argument that is one of a set of predefined field names. Specify the -f option only once on the command line. Some field names also require the name of a physical or logical host, specified with the -h option, or the name of a particular data service, specified with the -s option.

The following -f field names do not require a -h or -s switch:

all_logical_hosts
    Returns the names of all logical hosts in this SC HA configuration.

all_physical_hosts
    Returns the names of all physical hosts in this SC HA configuration.

mastered
    Returns the names of all logical hosts that this physical host currently masters. There is also a not_mastered option.

syslog_facility
    Outputs the name of the syslog facility that SC HA uses.

The following -f field name requires a -s switch:

service_is_on
    Outputs a line containing 1 if the data service is on, and a line containing 0 if the data service is off.

The following -f field names require a -h switch:

names_on_subnets
    Returns the hostnames associated with the named host's subnetworks.

private_links
    Returns the private links associated with the named host.

physical_hosts
    Returns the names of all physical hosts that can serve the named logical host.

pathprefix
    Returns the absolute path name of where the named logical host's administrative file system directory will get mounted.

vfstab_file
    Outputs the full path name of the vfstab file for the named logical host.

is_maint
    Queries whether the named logical host is in maintenance mode.

master
    Returns the current master of the named logical host.

The following are examples of the haget command using the options described previously:

# haget -f all_logical_hosts
venus
mars
# haget -f physical_hosts -h mars
phys-mars
phys-venus
# haget -f master -h mars
phys-venus
# haget -f service_is_on -s hasvc
1
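As a further illustration, a STOP method written for the Bourne shell might use haget to decide whether it has any work to do; the logical host name mars and the data service name myds are hypothetical.

#!/bin/sh
# Sketch: skip the shutdown work unless this physical host currently
# masters the logical host mars and the data service myds is switched on.
MASTERED=`haget -f mastered`
SERVICE_ON=`haget -f service_is_on -s myds`

case "$MASTERED" in
    *mars*) ;;                  # mars is mastered here; continue below
    *)      exit 0 ;;           # not mastered here; nothing to stop
esac

if [ "$SERVICE_ON" = "1" ]; then
    echo "stopping myds for logical host mars"
    # service-specific shutdown commands would go here
fi
exit 0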


The hactl Command


The hactl command provides control operations for use by Sun Cluster HA fault-monitoring programs. The control operations include requesting the movement of a logical host from one physical host to another (possibly forcibly), requesting the movement of all logical hosts that a physical host currently masters to other physical hosts, and requesting a Sun Cluster HA cluster reconfiguration. The hactl command applies several sanity checks before actually carrying out the request. If any of these sanity checks fail, the hactl command does not carry out the request and exits with no side effects.

The hactl Command Options
The following examples demonstrate the most common usage of the hactl command by fault-monitoring programs.

•  hactl -s NFS -r

   Reconfigures the logical hosts for the NFS environment (as called by the NFS fault methods).

•  hactl -k abcd -s nshttp -t -l lhost1

   Takes over logical host lhost1 because one of its data service fault monitors has detected that its current master is sick.

•  hactl -s nsmail -g -p phy-host1

   Gives away all logical hosts mastered by phy-host1.

•  hactl -s NFS -n -t -p phy-host1

   Does the sanity checks only; the logical hosts are not actually moved.
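Putting the pieces together, a local fault monitor might combine its own probe with haget and hactl roughly as follows; the probe program and the data service name myds are hypothetical, the sketch assumes a single mastered logical host, and a production monitor would perform considerably more checking.

#!/bin/sh
# Local fault monitor loop sketch: after three consecutive probe
# failures, request that the mastered logical host be given up.
FAILURES=0
while :
do
    if /opt/myds/bin/myds_probe; then
        FAILURES=0
    else
        FAILURES=`expr $FAILURES + 1`
    fi

    if [ $FAILURES -ge 3 ]; then
        LHOST=`haget -f mastered`       # assumes one mastered logical host
        hactl -g -s myds -l "$LHOST"
        FAILURES=0
    fi
    sleep 60
done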


The halockrun Command


The halockrun command provides a simple mechanism for serializing command execution from a shell script. The halockrun command runs a command while holding a file system lock on a specified file, using fcntl(2). If the file is already locked, the halockrun command is delayed by the file system locking mechanism until the file becomes free. Essentially, halockrun implements a mutex mechanism, using the lock file as the mutex.

The syntax for halockrun is:

halockrun [-vsn] [-e exitcode] lockfile prog [args]

Where:

lockfile
    The file serving as the serialization point

prog
    The program to be run

args
    The arguments passed to the program

See the halockrun man page for further information.
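For example, a method that must not update a shared state file at the same time as its fault monitor could wrap the update in halockrun; the lock file and update script shown here are hypothetical.

# halockrun /var/run/myds.lock /opt/myds/bin/update_state

If another process already holds the lock on /var/run/myds.lock, the command simply waits until the lock is released and then runs.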


The hatimerun Command


The hatimerun command provides a simple mechanism for limiting the execution time of a command from a shell script. The hatimerun command runs a command under a timer of duration timeoutsecs. If the command does not complete before the timeout occurs, the command is terminated with the SIGKILL signal (the default), or with the signal specified with the -k argument. The command is run in its own process group.

The syntax for the hatimerun command is:

hatimerun [-va] [-k signalname] [-e exitcode] -t timeoutsecs prog args

Where:

timeoutsecs
    Indicates how long the program has to finish

prog
    Identifies the program to be run

args
    Identifies the arguments to be run

The -a operand starts the program and allows it to finish asynchronously; the hatimerun command itself finishes immediately. See the hatimerun man page for further information.
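For example, a fault monitor could bound the time a probe of a remote server is allowed to take; the probe program and the 30-second limit are illustrative only.

# hatimerun -t 30 /opt/myds/bin/myds_remote_probe lhost1

If the probe does not complete within 30 seconds, it is killed (with SIGKILL by default) and hatimerun returns, so the monitor is never blocked indefinitely by a hung probe.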

The pmfadm Command
The pmfadm command allows you to start a process and have the process restarted if it fails. The process can be restarted a certain number of times, or continually. Each process monitored by the pmfadm command has a name tag, which is an identifier that describes a process to be monitored. You can use the name tag to stop the process if necessary. The pmfadm command is used as follows:

•  To start monitoring a process, use:

   pmfadm -c nametag [-n retries] command [args]

•  To stop a monitored process, use:

   pmfadm -s nametag
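For example, a START method could place a hypothetical data service daemon under pmfadm control, and the matching STOP method could later remove it by its name tag:

# pmfadm -c myds.daemon -n 10 /opt/myds/bin/mydsd
# pmfadm -s myds.daemon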

What Is Different From HA 1.3?
The haget command
You must be aware of the following haget command changes if you are migrating to the Sun Cluster environment:

•  The vfstab option prints a comma-separated list of the vfstab file contents of the logical host configuration. Sun Cluster HA does not maintain a configuration file; the exit code of 3 used in Solstice HA is thus not relevant.

The hactl command


You must be aware of the following hactl command changes if you are migrating to the Sun Cluster environment:

•  The hactl -p option does not abort the node after migrating all the logical hosts. It uses PNM for network monitoring.

The hads C Library Routines
When you write methods as scripts, you can access the functionality of the hads C library through commands such as hactl. If you write methods in C, you must include the library directly:

•  Synopsis

   # cc [flag ...] -I /opt/SUNWcluster/include file \
     -L /opt/SUNWcluster/lib -lhads -lintl -ldl [library ...]

   #include <hads.h>

•  Data structures

   - ha_network_host_t
   - ha_physical_host_t
   - ha_logical_host_t
   - ha_config_t
   - ha_lhost_dyn_t

•  Functions

   - ha_open, ha_close
   - ha_get_config, ha_getcurstate, ha_getmastered, ha_getnotmastered, ha_getonoff, ha_getlogfacility

For more information, see the Sun Cluster 2.2 API Programmer's Reference Guide.

Exercise: Using the Sun Cluster Data Service API
Exercise objective – In this exercise you will:

•  Use the haget command to gather cluster configuration and status information

Preparation

There is no preparation for this exercise.

Tasks

The following task is explained in this section:

•  Using the haget command

Using the haget Command
The haget command extracts information about the current state of a Sun Cluster HA configuration. It would typically be used within a START or STOP method that was written as a script for the C shell or the Bourne shell. Use the haget command to display the following information. Record the results below.

1. The names of all logical hosts:

   # haget -f all_logical_hosts

   __________________________________________________

2. The names of all physical hosts:

   # haget -f all_physical_hosts

   __________________________________________________

3. The names of all logical hosts that this physical host currently masters:

   # haget -f mastered

   __________________________________________________

4. The name of the physical host which is currently the master of the logical host clustername-nfs:

   # haget -f master -h clustername-nfs

   __________________________________________________

5. Whether or not the logical host clustername-nfs is in maintenance mode:

   # haget -f is_maint -h clustername-nfs

   __________________________________________________

6. Whether or not the NFS data service is currently on:

   # haget -f service_is_on -s nfs

   __________________________________________________

Note – See the man page for the haget command for more information on the various haget options.

Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:

❑  Describe the available data service methods
❑  Describe when each method is called
❑  Describe how to retrieve cluster status information
❑  Describe how to retrieve cluster configuration information
❑  Describe how the fault methods work and how to request failovers

Think Beyond
Are there other methods that might be needed for some data services? What would they be? Are there ways to make a non-HA compliant data service work with HA? How would you debug HA API problems when you were developing your data service?


Highly Available DBMS


Objectives
Upon completion of this module, you will be able to:
•  List the configuration issues for a highly available DBMS instance
•  Describe the general installation and configuration process for an HA-DBMS data service

This module describes the operation and configuration of a DBMS in the Sun Cluster High Availability environment.

Relevance


Present the following questions to stimulate the students and get them thinking about the issues and topics presented in this module. They are not expected to know the answers to these questions. The answers to these questions should be of interest to the students, and inspire them to learn the content presented in this module.

Discussion – The following questions are relevant to your learning the material presented in this module:

1. How is an HA-DBMS instance different from other High Availability data services?

2. What unique things need to be done for an HA-DBMS instance?

Additional Resources
Additional resources – The following references can provide additional details on the topics discussed in this module:

•  Sun Cluster 2.2 System Administration Guide, part number 805-4238
•  Sun Cluster 2.2 Software Installation Guide, part number 805-4239


Sun Cluster HA-DBMS Overview


The Sun Cluster HA-DBMS services for Oracle, Sybase, and Informix databases are a simple set of modules that act as an interface between the High Availability framework and off-the-shelf database software. User applications continue to use the database services as before. The Sun Cluster HA-DBMS data service components are designed to:

•  Leverage existing database crash recovery algorithms
•  Have minimal client-side impact
•  Be easily installed on servers

No changes to the database engines are required. Applications run against the DBMS normally. No change to database administration on the clients is required.

Database Binary Placement
You can install the database binaries either on the local disks of the physical hosts or on the multihost disks. Both locations have advantages and disadvantages. Consider the following points when selecting an install location. Placing database binaries on the multihost disk eases administration, since there is only one copy to administer. It ensures high availability of the binaries or server during a cluster reconfiguration. However, it sacrifices redundancy, and therefore availability, in case of some failures. Alternatively, placing database binaries on the local disk of the physical host increases redundancy, and therefore availability, in case of failure or accidental removal of one copy.

Supported Database Versions


The Sun Cluster HA-DBMS data service currently supports:

•  Oracle versions 7.3.3, 7.3.4, 8.0.4, and 8.0.5
•  Sybase version 11.5
•  Informix versions 7.23 and 7.30

Note – The supported versions can change without notice. You should always have your Sun field representative check the current product release notes for database revision support.

HA-DBMS Components
You can divide the Sun Cluster HA-DBMS support software into three primary components:

•  Database start methods
•  Database stop methods
•  Database-oriented fault monitoring

Each of these components is described in more detail on the subsequent pages.

Multiple Data Services


If you are running multiple data services in your Sun Cluster High Availability configuration (for example, HA-NFS and HA-DBMS for Informix), you can set them up in any order. You will need separate licenses for the HA-DBMS data service.


Typical HA-DBMS Configuration


The HA-DBMS runs entirely on one node. As is the case with any HA data service, its data is accessible from at least one other node. Should the primary node fail, the entire DBMS instance is restarted on the second node, conceptually as if it were restarting on the first node. A physical cluster node can support multiple instances (as separate logical hosts) of a DBMS server. They can be configured to fail over to different backup physical hosts if desired. The administrator can manually switch an entire DBMS logical host between physical hosts at any time, using the scadmin switch or the haswitch command. Should a cluster node fail, the logical hosts running on that node are automatically switched to the available configured backup node(s).


Configuring and Starting HA-DBMS


The general procedure used to prepare an HA-DBMS instance is:

1. Configure the logical host.

   # scconf clustername -L ...
   # scconf clustername -F ...

2. Register the HA-DBMS service and associate it with the logical hosts that will run instances of it.

   # hareg -s -r oracle -h lhost1[,lhost2,...]

3. Start the HA-DBMS service.

   # hareg -y oracle

4. Register the HA-DBMS instances.

   # haoracle insert ...

DBMS fault monitoring is started automatically when the HA-DBMS data service is started.


Stopping and Unconfiguring HA-DBMS


To stop and unconfigure a DBMS data service:

1. Stop the HA-DBMS service. All instances of the HA-DBMS in the cluster will now stop, and the DBMS fault monitors are automatically stopped. (Alternatively, put the logical host in maintenance mode.)

   # hareg -n sybase

2. Disconnect the service from the logical host.

   # scconf clustname -s -r syblhost1

3. Unregister the HA-DBMS service.

   # hareg -u sybase

4. Unconfigure the logical host if appropriate.

   # scconf clustname -L syblhost1 -r

5. Remove the logical host vfstab.lhname file and the administrative file system if appropriate.

Removing a Logical Host
To remove a logical host and leave the HA-DBMS data service active for other logical hosts in the cluster, you must not run the hareg commands from the previous procedure. To remove a logical host:

1. Stop the logical host's DBMS instance(s), or put the logical host into maintenance mode.

2. Stop the data service.

   # hareg -n sybase

3. Disassociate the data service from the logical host.

   # scconf clustname -s -r syblhost1

4. Unconfigure the logical host if appropriate.

   # scconf clustname -L lhost1 -r

5. Remove the logical host vfstab.lhname file if appropriate.

Note – A logical host can also be removed using the scinstall program. When run on an already configured cluster, there is an option to modify the cluster or data service configuration.

Removing a DBMS From a Logical Host


If you are removing only the DBMS instance from a logical host, you need to stop only the instance and disassociate the data service from the logical host.

# scconf clustname -s -r syblhost1

Remove any logical host vfstab.lhname file if appropriate.


The HA-DBMS Start Methods




The discussion on the next several pages concerns the theory of operations behind the HA-DBMS for the Oracle, Sybase, and Informix data services. It includes details on the Start, Stop, and fault monitoring procedures for the supported databases. Administrators will never directly call these procedures; they are run automatically. However, it is useful to understand what is happening in the background. You can decide how much detail, if any, to go into based on the students' backgrounds and needs.

The HA-DBMS Start methods are run automatically when the related data service is registered (for example, with the hareg command). They are also run automatically during takeovers and switchovers, or whenever a cluster reconfiguration occurs. These methods start or restart the database, as appropriate. The database software automatically goes through its crash recovery protocol, performing log roll forward, roll back, or both as appropriate.

Before the database Start methods are run, the Sun Cluster High Availability (HA) framework takes ownership of the appropriate logical hosts' disk groups and mounts their file systems if necessary. After the Start methods complete, the HA framework begins listening for clients of the logical host. HA-NFS service is also available now. The Sun Cluster HA-DBMS data service for Oracle, Sybase, or Informix is available as soon as crash recovery is complete.

Caution – You should not start the HA databases manually from the command line or script files. The HA databases on Sun Cluster servers should be started by the HA framework. The only exception to this is if the database becomes corrupt. Then you must stop the fault monitoring and manually start and stop the database.

•  The Start methods are executed automatically during logical host startup (for example, during takeovers and switchovers). Before the Start procedure is invoked, the HA framework takes ownership of the appropriate logical hosts' disk groups and mounts any file systems. The HA database Start methods:

   - Start or restart the database software, as appropriate
   - Cause the database to automatically go through a crash recovery protocol, rolling the log forward or back

After the Start methods complete, the HA framework begins listening for clients of the HA database logical host. The database service is available as soon as crash recovery is complete. The fault methods are started at this time as well.

Caution – If your cluster is configured for the HA Oracle data service, the /var/opt/oracle/oratab file must contain an N in the autostart field.


The HA-DBMS Stop and Abort Methods


The HA-DBMS Stop Methods
The HA-DBMS Stop methods are executed automatically during switchovers to cleanly shut down the database servers.

•  The Stop Oracle method issues a shutdown immediate to Oracle.
•  The Stop Sybase method issues a shutdown wait to Sybase.
•  The Stop Informix method issues an onmode command to Informix.

A timeout of 6 minutes is allowed for the Stop methods to complete. (This is the default timeout for the scadmin switch and haswitch commands, and cannot be modified.) The shutdown might not complete if clients have long transactions in process. If the Stop methods do not complete, the scadmin switch or haswitch command will not succeed. You can correct the situation (for example, inform clients that they should disconnect) and then initiate the switchover again. You can put the DBMS or logical host into maintenance mode to perform DBMS maintenance tasks.

The HA-DBMS Abort Methods
When a cluster node fails, the Abort methods are invoked instead of the Stop methods. The HA-DBMS Abort methods perform differently than the Stop methods: they force immediate termination of HA database activity. The Abort methods stop the databases as follows:

•  The HA-Oracle Abort method issues a shutdown abort to the Oracle server.
•  The HA-Sybase Abort method issues a shutdown nowait to the Sybase server.
•  The HA-Informix Abort method issues an onmode -k to the Informix server.


HA-DBMS Fault Monitoring


The HA-DBMS fault monitoring routines are run automatically for each database server. Only servers that have been added and started using an HA-specific command are monitored. For each HA server, there are two fault monitor processes: one local monitor and one remote monitor. The local probe ensures that the DBMS instance remains running on the local host. The remote probes act as clients to the DBMS, ensuring that basic operations can be performed.

Local Fault Probe Operation
A local HA-DBMS fault probe operates by scanning a DBMS message log file as follows:

•  It tails the alert.log (or equivalent) of the database.
•  For each message written to this file, it looks for a match in the action file: /etc/opt/SUNWcluster/packagename/hainformix|oracle|sybase_config_V1.
•  It takes action as directed in the action file (restart, stop, give up, or none).

Remote Fault Probe Operation


A remote HA-DBMS fault probe acts as a DBMS client and performs the following functions:

•  Performs queries to a DBMS table
•  Runs the following types of SQL queries:

   - create table
   - insert
   - update
   - drop

•  Checks for database failure, network failure, and node failure, which are then properly handled by the High Availability framework
•  Uses the vendor DBMS network connectivity product
•  Uses the public network interface

HA-DBMS Action Files
Each supported database has an alert file that is periodically checked for errors by the fault monitors. When an error is detected in the alert file, an action file for the database is consulted. The action file associates errors with actions that should be taken. The HA-DBMS action files are located and named as follows:

•  /etc/opt/SUNWscor/haoracle_config_V1
•  /etc/opt/SUNWscsyb/hasybase_config_V1
•  /etc/opt/SUNWscinf/hainformix_config_V1

Action File Fields

Each line in the action files consists of nine fields. The first six fields are input fields and the last three are output fields. The fields are as follows:

•  Database old state
•  Database error number
•  Process died
•  Log message
•  Timeout
•  Internal error
•  New state
•  Action
•  Message

Action File Entries
The following are action file entries from the HA-Informix action file:

di  50   *  *  *  *   co  takeover   Network is down
di  908  *  *  *  *   co  restart    Attempt to connect to database server failed
*   *    *  *  1  0   co  restart    A timeout has occurred during disconnect

The old state field defines which database operations have been performed in the current fault probe and are being evaluated now. The possible values are:

•  co – connect to the server
•  di – disconnect from the server
•  on – connection is established

The following are action file entries from the HA-Oracle action file:

*   2811  *  *  *  0   *   takeover   Unable to attach shared memory segment
co  1034  *  *  *  0   co  restart    Oracle is not available

The following are action file entries from the HA-Sybase action file:

*   822   *  *  *  *   *   takeover   Could not start I/O for request BLKIO %S_BLKIOPTR = 0x%lx, size = %ld, errcode = 0x%lx, %S_BUF. See errorlog file for further details.
on  7412  *  *  *  *   on  restart    Space available in the log segment has fallen critically low in database %s

HA-DBMS Failover Procedures
If the active database server responds with an error during the HA-DBMS fault monitoring, or if the database server has crashed on its current physical host, a takeover is considered by the designated backup server. Before a takeover occurs, the backup server uses the FM_CHECK method to ensure that it is healthy and that the appearance of ill health of the other server is not caused by a problem (for example, a network or name service problem) that would affect it as well.

•  The High Availability framework checks the server's ability to communicate over the client network.
•  The High Availability framework checks the health of the name service.

If everything checks out, the designated backup server initiates a takeover procedure.


Configuring HA-DBMS for High Availability


When the Sun Cluster High Availability configuration was created by scinstall, you were asked which data services you would be running. The following steps describe how to configure any of the supported HA-DBMS applications.

Multiple Data Services


If you are running multiple data services in your Sun Cluster High Availability configuration (for example, HA-NFS and HA-DBMS for Oracle), you can set them up in any order. You might need separate licenses for each data service.

Raw Partitions Versus File Systems
You can configure the supported HA-DBMS databases to use VxFS file systems or raw partitions. If you choose to use raw partitions, you still need at least one file system to store the related database configuration files. This file system must be in a disk group that is built on devices switchable between the primary and backup nodes. This is a strict requirement; no other configurations are supported. While Oracle, Sybase, and Informix each supports raw I/O to both raw physical devices and raw volume manager volumes (mirrored or non-mirrored), Sun Cluster High Availability supports only raw I/O on raw mirrored volumes. You cannot safely use devices such as /dev/rdsk/c1t1d1s2 to contain database data under Sun Cluster HA because they are not mirrored.


Configuration Overview
When installing and configuring an HA-DBMS data server, you must address both general and database-specific issues.

General HA-DBMS Configuration Issues

The following describes the general HA-DBMS configuration issues:

•  Make sure the cluster is up and running before starting the HA-DBMS installation and configuration.
•  Do not start the database manually or through another automated system. If it is not started by the HA framework, the fault monitors will not be operational.
•  All database files (data files, logs, redo logs, control files) must be located on a single switchable disk group on the multihost disks, with nothing on the local disks.

User and Group Entries
You must create database manager user and group entries. You can add these entries to name services, such as NIS or NIS+. However, you should also make these entries in the local /etc files to eliminate the dependency on the availability of the network name service.

Database Software Location


By default, some database installations place configuration files on the private disks. It is essential that these files do not reside in the default location; they must reside on the switchable, multihost file systems. Additionally, each database must have an instance configuration file created as follows:

•  Each Oracle database instance must be listed in the /var/opt/oracle/oratab file on all backup servers, and must contain N in the autostart field.
•  Each Informix server must be listed in the /var/opt/informix/inftab file on all cluster nodes.
•  Each Sybase server must be listed in the /var/opt/sybase/sybtab file on all High Availability servers. This table must be the same across all cluster nodes.

   - The RUN_server, RUN_backupserver, and interfaces files are the only Sybase files (in addition to the Sybase system software) that should reside on the local disks. All other Sybase files (data files, logs, and so on) must be located on a single switchable disk group. Copy the interfaces file to all HA servers that will support the Sybase data service.


Oracle Installation Preparation


To prepare for an Oracle installation:

1. Make sure the cluster is running on the node you are using, and that the Oracle logical host is active on it.

2. Create a passwd file entry (oracle) and group entry (dba) for Oracle. The users root and oracle both must be members of this group.

3. Choose a location for the ORACLE_HOME and ORACLE_DOC directories. They must be owned by oracle, be in the dba group, and be in the same location on each HA server.

   # chown oracle /oracle
   # chgrp dba /oracle

Note – You must use the vxedit command if the mount points are the root of a Volume Manager disk group.

4. Determine the database instance name (that is, the ORACLE_SID). If you are configuring more than one instance, you must use different database instance names on each logical host.

5. Install the Oracle software.

   - For complete instructions on installing Oracle, refer to the Oracle installation documentation.
   - Remember that all logs and databases must go on the logical host.
   - Changes to system files (such as /etc/system) must be made on all nodes.

6. Change the default values of the various shared memory parameters in /etc/system on all nodes, and reboot. Remember to specify enough resources to support all failed-over logical hosts.

Note – The passwd and group entries can be added to the name services (for example, NIS/NIS+), but they should also be added to the local /etc files to eliminate any dependency on the network name service. Sun Cluster High Availability only supports raw Oracle I/O on raw mirrored Enterprise Volume Manager volumes for availability reasons. Although Oracle supports raw I/O to raw physical devices and raw non-mirrored Enterprise Volume Manager volumes, these are not supported with Sun Cluster High Availability. Sun Cluster High Availability also supports Oracle on VxFS logging Sun Enterprise Volume Manager volumes (VxFS file systems).


Sybase Installation Preparation


1. Make sure the cluster is running on the node you are using, and that the Sybase logical host is active on it.

2. Create a passwd entry for the Sybase user ID (typically sybase).

3. Create a group entry for the database administrator group (typically dba).

4. Choose a location for the Sybase directories, $SYBASE.

   - If possible, install Sybase on a separate disk.
   - The Sybase directory should be owned by the Sybase user ID and be in the Sybase database administrator group.

   # chown sybase $SYBASE
   # chgrp dba $SYBASE

   You may have to use vxedit if this is a Volume Manager volume root directory.

5. Install the Sybase software.

   - For complete instructions on installing Sybase, refer to the Sybase documentation.
   - Remember that all logs and databases must go on the logical host.

6. Change the default values of the various shared memory parameters in /etc/system on all nodes, and reboot. Remember to specify enough resources to support all failed-over logical hosts.

Note – The HA-DBMS for Sybase fault monitor requires that the ctlib.loc file be present in the $SYBASE/locales/us_english/iso_1 directory. You can install this file by loading one of the Sybase connectivity tools, such as Open Client (DB-Library).

Note – You can add the passwd and group entries to the name services (such as NIS/NIS+), but they should also be added to the local /etc files to eliminate any dependency on the network name service. Remember that all of the file systems that you use for HA-Sybase should be mirrored for availability reasons.


Informix Installation Preparation


To prepare for an Informix installation:

1. Make sure the cluster is running on the node you are using, and that the Informix logical host is active on it.

2. Create a passwd entry for the Informix user ID (typically informix).

3. Create a group entry for the database administrator group (typically informix).

4. Choose a location for the Informix directories, $INFORMIXDIR.

   - If possible, install Informix on a separate disk.
   - The Informix root directory should be owned by the informix user ID and group ID:

   # chown informix $INFORMIXDIR
   # chgrp informix $INFORMIXDIR

5. Install the Informix software.

   - For complete instructions on installing Informix, refer to the Informix installation documentation.
   - Remember that all logs and databases must go on the logical host.

6. Set up connectivity by making the appropriate entries in the /etc/services and sqlhosts files on both High Availability servers.

7. Change the default values of the various shared memory parameters in /etc/system on all nodes, and reboot. Remember to specify enough resources to support all failed-over logical hosts.

Note – You can add the passwd and group entries to the name services (for example, NIS or NIS+), but they should also be added to the local /etc files to eliminate any dependency on the network name service. Sun Cluster High Availability only supports raw Informix I/O on raw mirrored Enterprise Volume Manager volumes for availability reasons. Although Informix supports raw I/O to raw physical devices and raw non-mirrored Enterprise Volume Manager volumes, these are not supported with Sun Cluster High Availability. Sun Cluster High Availability also supports Informix on VxFS logging Sun Enterprise Volume Manager volumes (VxFS file systems).


Preparing the Logical Host


To prepare the logical host:

1. Start the Sun Cluster High Availability software, and use the scadmin switch or haswitch command to switch all logical hosts to one High Availability server.

   phys-mars# scadmin switch sc-node1 sc-nfs sc-dbms
   phys-mars# haswitch sc-node1 sc-nfs sc-dbms

   You must switch the logical host only if you want to run the installation process on a particular cluster host. The only restriction is that you must run the installation process on the physical host that currently masters the HA-DBMS logical host.

2. Verify that the volume manager volumes have been created, and that they have the correct ownerships and permissions on both High Availability servers. You must change the ownership of the database volumes so that the database can access them; they are owned by root when they are created. The following shows an example of changing database volume ownership for CVM and SSVM volumes:

   # vxedit -g dgname set user=oracle group=dba mode=660 volumename
   # vxedit -g dgname set user=sybase group=dba mode=660 volumename

   Do not use chown or chmod, because CVM and SSVM recreate the device files after each boot and the database would be unable to restart after a reboot or cluster node restart.

Note – If you are using SDS as your volume manager, volume (metadevice) ownership is set using standard UNIX commands, such as chown, chgrp, and chmod.

Preparing the Database Configuration Files


The database configuration files cannot reside in the default location; they must reside on a switchable disk group file system (associated with the logical host). Each supported database has a different method for forcing the location of its configuration files.

Enable Fault Monitoring Access


You must grant access to a user ID and password that will be used by the fault monitoring routines. You do this with the following database-specific commands: haoracle, hasybase, and hainformix.

Registering the HA-DBMS Data Service
You must register each HA-DBMS data service using the hareg command. The following shows the variations for each supported database.
•  Use the hareg -s -r oracle command to register the HA-Oracle data service to the cluster.
•  Use the hareg -s -r sybase command to register the HA-Sybase data service to the cluster.
•  Use the hareg -s -r informix command to register the HA-Informix data service to the cluster.

Adding Entries to the CCD


You must add an entry for each HA-DBMS server to the CCD file. The command is database specific, as follows:

•  Use the haoracle insert command to add an entry for each Oracle server.
•  Use the hasybase insert command to add an entry for each Sybase server.
•  Use the hainformix insert command to add an entry for each Informix server.

Bring the HA-DBMS Servers Into Service

To bring the HA-DBMS servers into service, execute the appropriate database-specific command:

•  haoracle start
•  hasybase start
•  hainformix start


HA-DBMS Control
Setting HA-DBMS Monitoring Parameters
When each database is entered into the CCD, you must specify several parameters that affect fault monitoring. Depending on the database, you must use either the haoracle, hasybase, or hainformix command.

Setting Up HA-Oracle
The haoracle command is used to configure Oracle fault-probe monitoring. The general command syntax is:

# haoracle insert $ORACLE_SID logicalhost 60 10 120 300 \
user/password /logicalhost/.../init$ORACLE_SID.ora listener

The above command line includes the following:

•  haoracle insert – Command and subcommand
•  $ORACLE_SID – Name of the Oracle database instance
•  logicalhost – Logical host serving $ORACLE_SID (not the physical host)
•  60 10 120 300 – Parameters that specify a probe cycle time of 60 seconds, a connectivity probe cycle count of 10, a probe timeout of 120 seconds, and a restart delay of 300 seconds
•  user/password – User and password to be used for fault monitoring. They must agree with the Oracle authentication. To use the Solaris Operating Environment authentication, enter a slash (/) instead of the user name and password.
•  /logicalhost/.../init$ORACLE_SID.ora – The pfile to use to start up the database. This must be on a logical host's disk group.
•  listener – The SQL*Net V2 listener. The listener is started and monitored using this name. The default is LISTENER. This field is optional.

Note – If you are updating some of the parameters, you must specify all of the parameters, even if they will not change.
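For example, following the syntax above with purely illustrative values (an instance named hadb served by the logical host ora-lhost, a fault-monitoring account hamon/hamonpw, and a hypothetical pfile path), the registration might look like:

# haoracle insert hadb ora-lhost 60 10 120 300 \
hamon/hamonpw /ora-lhost/oracle/dbs/inithadb.ora LISTENER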

Setting Up HA-Sybase
The hasybase command is used to congure Sybase fault-probe monitoring. The general command syntax is: # hasybase insert sqlserver logicalhost 60 10 120 300 \ user/password $SYBASE/install/RUN_sqlserver backupserver \ $SYBASE/install/RUN_backupserver The above command line includes the following:
•  hasybase insert – Command and subcommand
•  sqlserver – Name of the SQL Server
•  logicalhost – Logical host serving sqlserver (not the physical host)
•  60 10 120 300 – Parameters that specify a probe cycle time of 60 seconds, a connectivity probe cycle count of 10, a probe timeout of 120 seconds, and a restart delay of 300 seconds
•  user/password – Login name and password for the Sybase account
•  $SYBASE/install/RUN_sqlserver – File used to start the SQL Server
•  backupserver (optional) – Name of the backup server
•  $SYBASE/install/RUN_backupserver (optional) – File used to start the backup server
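As a sketch only, a completed hasybase insert command might look like the following. The SQL Server name DBSRV1, backup server name DBSRV1_BCK, logical host dbhost1, monitoring account, and RUN file paths are hypothetical and must match your actual Sybase installation.

# hasybase insert DBSRV1 dbhost1 60 10 120 300 \
  sa/sa_password /dbhost1/sybase/install/RUN_DBSRV1 DBSRV1_BCK \
  /dbhost1/sybase/install/RUN_DBSRV1_BCK
# hasybase start DBSRV1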

Note If you are updating some of the parameters, you must specify all of the parameters, even if they will not change.

Setting Up HA-Informix
The hainformix command is used to configure Informix fault-probe monitoring. The general command syntax is:

# hainformix insert $ONCONFIG logicalhost 60 10 120 300 \
dbname $INFORMIXSERVER

The above command line includes the following:
•  hainformix insert – Command and subcommand
•  $ONCONFIG – Name of the Informix configuration file
•  logicalhost – Logical host serving $ONCONFIG (not the physical host)
•  60 10 120 300 – Parameters that specify a probe cycle time of 60 seconds, a connectivity probe cycle count of 10, a probe timeout of 120 seconds, and a restart delay of 300 seconds
•  dbname – Name of the database that HA-DBMS for Informix is to monitor
•  $INFORMIXSERVER – Name of the Informix server
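A hypothetical worked example follows; the configuration file name onconfig.ifx1, logical host dbhost1, database name salesdb, and server name ifx1 are illustrative only and must match your Informix installation.

# hainformix insert onconfig.ifx1 dbhost1 60 10 120 300 \
  salesdb ifx1
# hainformix list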

Note If you are updating some of the parameters, you must specify all of the parameters, even if they will not change.

Starting and Stopping HA-DBMS Monitoring
You can start and stop the fault monitoring for an HA database using database-specific commands. These commands do not start or stop the database, only the fault monitoring.

To start or stop fault monitoring for an Oracle instance:

# haoracle start my_db
# haoracle stop my_db

To start or stop fault monitoring for a Sybase server:

# hasybase start my_server
# hasybase stop my_server

To start or stop fault monitoring for an Informix instance:

# hainformix start my_server
# hainformix stop my_server

Note – You can use the list option with any of the above commands to show which servers are configured for HA-DBMS operation.

Using the stop command option effectively places the database into maintenance mode. The specified database is marked in the CCD so that a node or cluster restart does not restart monitoring of this instance. You must use the start command to re-enable and restart monitoring of that instance.
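For example, the following sketch places a hypothetical Oracle instance, ora1, into maintenance mode and then returns it to monitored operation. Only fault monitoring is affected; the database itself is not stopped by these commands.

# haoracle stop ora1
# haoracle list
(perform the required database or host maintenance)
# haoracle start ora1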


HA-DBMS Client Overview


When configuring DBMS applications to be highly available, keep in mind the following:

•  No special configuration should be necessary for HA-DBMS clients.
•  Client applications should access the database by using the logical host name and not the physical host name.
•  Client-server connections are stateful.
   –  Client connections (and active transactions) do not survive a logical host switchover. The client application must be prepared to cope with disconnects and reconnect or recover as appropriate.
•  Database recovery time depends on the application.

Maintaining the List of Monitored Databases
The haoracle/hasybase/hainformix commands are used to maintain the list of monitored servers in the HA-DBMS configuration. You can also use these commands to put individual HA databases into and out of maintenance mode by stopping and starting monitoring.

The haoracle/hasybase/hainformix list command is used to list the entries in the CCD that are currently being monitored. The insert command is used to add new entries, and the delete command is used to delete an existing entry. The update command is used to update an existing record.
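The update subcommand is assumed here to take the same full parameter list as insert (as implied by the note that all parameters must be specified). The sketch below changes only the restart delay of a hypothetical instance from 300 to 600 seconds; all other values are repeated unchanged.

# haoracle update ora1 dbhost1 60 10 120 600 \
  scott/tiger /dbhost1/oracle/dbs/initora1.ora LISTENER
# haoracle list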


HA-DBMS Recovery
Client Recovery
Client-server connections are stateful and cannot survive an HA-DBMS switchover. The client application must be prepared to cope with disconnects and reconnect or recover as appropriate. A transaction monitor might simplify the application. Applications should already be prepared for this, because this occurs occasionally during normal operation.

HA-DBMS Recovery Time
HA-DBMS server recovery time is application dependent, and depends in part on the database transaction recovery time. Recovery time is affected by the number of applications, the number of databases, the amount of rollback, and the load and type of the application. Remember that, before instance recovery can occur, disk group or diskset recovery must have completed.

In production, you might need to adjust the restart delay time specified with the haoracle/hasybase/hainformix update command if the database instance cannot complete its crash recovery in the specified interval. If it cannot, another takeover is requested, so make sure that you have allowed enough time for crash recovery to complete.

Make sure that you have adjusted the cluster reconfiguration step timings with scconf -T and scconf -l if necessary. These times should be coordinated with the values specified on the haoracle/hasybase/hainformix command. Typically, the logical host timeout value is adjusted by monitoring the time it takes to complete reconfiguration steps 10 and 11. The reconfiguration timeout must equal or exceed the logical host timeout; frequently, an additional 50 percent is added to the measured logical host reconfiguration time. The default value for the reconfiguration timeout is 720 seconds and the default value for the logical host timeout is 180 seconds.

You must also take into account such things as application timeout limits. For instance, Oracle allows six minutes for a shutdown.
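A tuning sketch follows, using a hypothetical cluster name sc-cluster and illustrative values (a logical host timeout of 300 seconds and a reconfiguration timeout of 720 seconds). The argument order shown for scconf is an assumption; confirm it against the scconf(1M) man page before use.

# scconf sc-cluster -l 300
# scconf sc-cluster -T 720
# haoracle update ora1 dbhost1 60 10 120 600 \
  scott/tiger /dbhost1/oracle/dbs/initora1.ora LISTENER

The restart delay given to haoracle update should be coordinated with the two scconf values so that crash recovery can complete before another takeover is requested.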


HA-DBMS Configuration Files


Each HA-DBMS installation has a number of Sun Cluster files that contain information about the HA-DBMS configuration. Some of the files also contain fault-recovery information. The CCD database also contains status and configuration information about each HA-DBMS application. You must not manually edit these files. Most of them are modified indirectly as a result of executing various HA-DBMS control and configuration commands.

HA-Oracle Configuration Files
The /etc/opt/SUNWscor directory contains information about an HA-DBMS for Oracle configuration. Only the haoracle command should be used to modify the Oracle information in these files. For more information, see the man page for haoracle(1M).
•  /var/opt/oracle/oratab – A list of all Oracle servers registered in a Sun Cluster HA configuration
•  /etc/opt/SUNWscor/haoracle_config_V1 – Configuration file for the Oracle HA fault monitor
•  CCD – Contains the list of HA-DBMS Oracle database instances
•  /etc/opt/SUNWscor/haoracle_support – Table of HA-DBMS for Oracle releases supported by the currently installed Sun Cluster release
Caution – You must never directly edit the haoracle_config_V1, CCD, or haoracle_support files.

HA-Sybase Configuration Files
The /etc/opt/SUNWscsyb directory contains information about the HA-DBMS for Sybase configuration. Only the hasybase command should be used to modify the Sybase information in these files. For more information, see the man page for hasybase(1M).
•  /var/opt/sybase/sybtab – Lists all Sybase servers registered in a Sun Cluster High Availability configuration
•  /etc/opt/SUNWscsyb/hasybase_config_V1 – Configuration file for the HA-DBMS for Sybase fault monitor
•  CCD – Tracks the registered Sybase instances
•  /etc/opt/SUNWscsyb/hasybase_support – Table of Sybase releases supported by Sun Cluster High Availability
Caution – You should never directly edit the hasybase_config_V1, CCD, or hasybase_support files.

HA-Informix Configuration Files
The /etc/opt/SUNWscinf directory contains information about an HA-DBMS for Informix configuration. Only the hainformix command should be used to modify the Informix information in these files. For more information, see the man page for hainformix(1M).
•  /var/opt/informix/inftab – List of all Informix instances registered in the cluster
•  /etc/opt/SUNWscinf/hainformix_config_V1 – Action file for the HA-DBMS for Informix fault probes
•  CCD – Tracks the registered Informix instances
•  /etc/opt/SUNWscinf/hainformix_support – List of Informix releases supported by the Sun Cluster HA release
Caution – You should never directly edit the hainformix_config_V1, CCD, or hainformix_support files.

Exercise: HA-DBMS Installation
Exercise objective – None. There is no lab associated with this module.

Preparation
None

Tasks
None

Exercise Summary
Discussion – Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.

Manage the discussion here based on the time allowed for this module, which was given in the About This Course module. If you find you do not have time to spend on discussion, then just highlight the key concepts students should have learned from the lab exercise.
•  Experiences

Ask students what their overall experiences with this exercise have been. You may want to go over any trouble spots or especially confusing areas at this time.
•  Interpretations

Ask students to interpret what they observed during any aspects of this exercise.
•  Conclusions

Have students articulate any conclusions they reached as a result of this exercise experience.
•  Applications

Explore with students how they might apply what they learned in this exercise to situations at their workplace.

Check Your Progress
Before continuing on to the next module, check that you are able to accomplish or answer the following:

❑  List the configuration issues for a highly available DBMS instance
❑  Describe the general installation and configuration process for an HA-DBMS data service

Think Beyond
Are there services associated with HA-DBMS that should be highly available?

Are there advantages to using multiple disk groups with the HA-DBMS?

Do you need application-specific fault probes for an HA-DBMS environment? Why or why not?

Why is the quorum mechanism so important with HA-DBMS?


Cluster Configuration Forms

This appendix contains forms that are used to record configuration information for cluster systems used during training. Some of the forms are referenced in more than one exercise. Portions of them will be completed in different lab exercises.

Cluster Name and Address Information

Ask your instructor for assistance in completing the following information about your assigned cluster.

1. The location of the Sun Cluster software: ___________________________________________

2. The name of your cluster: ________________________

3. The IP address and name of each system in your cluster:

   Administration Workstation   ______________   ______________
   Terminal Concentrator        ______________   ______________
   Cluster Host                 ______________   ______________
   Cluster Host                 ______________   ______________
   Cluster Host                 ______________   ______________
   Cluster Host                 ______________   ______________

4. The Ethernet addresses of each cluster host system:

   Cluster Node Hostname        Cluster Node Ethernet Address


Multi-Initiator SCSI Configuration


This appendix is designed to help you configure your server to support the use of Multi-Initiator SCSI (MIS) configurations. MIS configurations are used when Sun MultiPack storage devices or the StorEdge A3000 serve as your switchable disk devices. The discussion in this appendix does not apply to multi-path SCSI configurations.

Preparing for Multi-Initiator SCSI
Background
MIS allows the connection of regular SCSI devices to two host adapters. This capability has been available since SCSI-1, but has rarely been used due to OS restrictions. The two host adapters might be on the same system (if the OS supports this), or might be on two different systems. If the different systems are not aware of the presence of MIS devices, the devices must be used by only one system at a time.

MIS configurations are fairly simple. The SCSI bus originates at one host adapter and connects through the devices as normal. However, instead of using a terminator, the last device is connected to the second host adapter. Termination is provided by the host adapters, so the host adapters can reach all devices. The devices do not require any changes or special capabilities to operate with MIS.

Unfortunately, practical implementation is not quite this simple. All devices on a SCSI bus have a target address, essentially a controller number. These target addresses must be unique on the bus. If you have duplicate target numbers on the bus, the bus does not function properly. It could hang, randomly reset, garble data, or cause other problems. This is not due to the devices, which can easily be given different target numbers. The problem is with the host adapters themselves.

Host adapter target addresses are usually set by the hardware or OS, almost invariably to target 7. Because target 7 is the highest possible priority on a SCSI bus, this makes sense. Unfortunately, with two host adapters on one bus, by default both want to be target 7, causing a target collision and an inoperative bus. The solution is to change one of the host adapter target addresses from 7 to 6, the second highest priority.

On Sun systems, the SCSI initiator ID for any SCSI adapter is suggested by the adapter card itself, and confirmed by the boot PROM. To change the initiator ID, the OpenBoot PROM must make the change.

Changing All Adapters

Caution After reviewing this section, complete the procedure on only one system before any MIS devices are connected to the system. Remember that in a two-node cluster, only one of the adapters must be changed. Do not change both systems or you will be back to the original problem. In three- or four-node clusters, all but one of the nodes must be changed, each to a different value from the others.

Changing the Initiator ID


There are two ways to change the initiator ID. The first way involves changing the initiator ID for every SCSI adapter on the system to 6 (or some other value). This is done simply by setting the system's scsi-initiator-id OpenBoot PROM parameter to 6 at the OBP ok prompt.

ok setenv scsi-initiator-id 6

Unfortunately, this might cause problems with devices, such as CD-ROM drives and some SSA configurations. It also does not work if both SCSI adapters are connected to the same system. In these situations, you must change the individual adapter's ID.
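A brief verification sketch follows. The printenv command confirms the new setting, and reset-all (a standard OpenBoot PROM command) restarts the system so that the change takes effect; the output shown is illustrative.

ok setenv scsi-initiator-id 6
ok printenv scsi-initiator-id
scsi-initiator-id = 6
ok reset-all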

Drive Firmware
Some disk drive firmware does not properly support MIS. These drives either must have their firmware updated or must not be used. See the Sun Cluster release notes for more information.

The nvramrc Script
The OpenBoot PROM Monitor builds its own device tree based on the devices attached to the system when the boot sequence is invoked. The OBP Monitor has a set of default aliases for the commonly occurring devices in the system.

The nvramrc script contains a series of OpenBoot PROM commands that are executed during the reset or boot sequence. You can put any OBP commands in this script to be run at every reset. For this discussion, it is assumed that this file does not exist and must be created. The steps to create or edit an nvramrc script are similar.

You must be familiar with the nvramrc editor keystroke commands. These commands are similar to the vi editor's commands. Table B-1 below lists the most useful commands.

Table B-1  nvramrc Editor Keystroke Commands

   Keystroke         Command
   ^N                Go to next line
   ^P                Go to previous line
   ^A                Go to beginning of line
   Carriage Return   Insert after current line
   ^K                Delete until end of line
   ^R                Replace the current line
   ^C                Hold changes and exit editor

The caret symbol ^ in the preceding table indicates that you should press the Control key and the next character simultaneously. You invoke the nvramrc editor from the ok OBP prompt.

Changing an Individual Initiator ID for MultiPacks
To set an individual scsi-initiator-id for a Sun MultiPack storage device, complete the following steps. Do not use this procedure for the StorEdge A3000.

Note – It might be easier to use the OBP scsi-initiator-id property to set all of the SCSI adapters on the system to 6, and use this procedure only to change back to 7 those adapters that will not work as 6.

1. Enter the OBP Monitor. If the OS is running, use the init 0 command to stop the OS.

# init 0

2. Go through the OBP device tree until you reach the SCSI adapter that you want to change, or use the show-disks command to save its address.

3. Type the nvedit command to create and store an nvramrc script. The line numbers (0:, 1:, and so on) are printed by the OpenBoot PROM Monitor. See Table B-1 for the nvramrc editor keystroke commands.

ok nvedit
0: probe-all
1: cd /sbus@70,0/SUNW,fas@1,8800000
2: 6 encode-int " scsi-initiator-id" property
3: device-end
[Control-C]
ok

Caution – Insert exactly one space after the double quote and before scsi-initiator-id. You must change the device named in the cd command to correspond to the adapter that you want to change. Add the three lines, following the pattern of lines 1-3, for each host adapter that you want to change.

4. Store or discard the edit changes. The nvedit command does not save them automatically.
•  To store the changes, type nvstore.

   ok nvstore

•  To discard the changes, type nvquit.

   ok nvquit

The changes you make through the nvedit command are made on a temporary copy of the nvramrc script. You can continue to edit this copy without risk. Once you are finished editing, save the modifications. If you are not sure about the changes, discard them.

Note – Before proceeding any further, you must have successfully created the nvramrc script and saved it by using nvstore.

5. Verify the contents of the nvramrc script you created.

ok printenv nvramrc
nvramrc = probe-all
          cd /sbus@70,0/SUNW,fas@1,8800000
          6 encode-int " scsi-initiator-id" property
          device-end
ok

6. If the output differs from what you have entered, go back and edit the nvramrc script again.

7. Test the nvramrc script by typing the nvramrc evaluate command. Verify that the scsi-initiator-id property of the host has not been changed.

ok nvramrc evaluate
ok printenv scsi-initiator-id
scsi-initiator-id = 7
ok

8. Verify that the nvramrc script works properly. If the scsi-initiator-id for the on-board SCSI controller is now set to 6, re-edit the nvramrc script.

ok cd /sbus@70,0/SUNW,fas@1,8800000
ok .properties
hm-rev              00 00 00 22
scsi-initiator-id   00000006
device_type         scsi
clock-frequency     02625a00
intr                00000020 00000000
interrupts          00000020
reg                 0000000e 08800000 00000010 0000000e 08810000 00000040
name                SUNW,fas
ok

In the preceding output, the scsi-initiator-id has a value of 00000006, so the script worked properly.

9. Now tell the OpenBoot PROM Monitor to use the nvramrc script. You must set this property to have the nvramrc script executed at reset time.

ok setenv use-nvramrc? true
ok printenv use-nvramrc?
use-nvramrc? = true
ok

10. Ensure that the other adapter has the proper initiator value of 7, either by checking the scsi-initiator-id property or by checking the appropriate device tree node with .properties.

11. Load the latest appropriate Solaris MIS patches for the fas driver and the sd driver on both systems.

12. Connect the MIS devices to both hosts and reboot both systems using the -r option.

ok boot -r


Sun Storage Array Overviews

This appendix is designed to help you understand the physical organization and basic device addressing schemes for the storage arrays most commonly used in clustered systems.

Disk Storage Concepts
Multi-host Access
In the past, this feature was referred to as dual-porting. With the advent of current technology, such as the Sun StorEdge A5000, you can connect as many as four different hosts to the same storage device.

Multi-initiated SCSI
The Sun MultiPack storage devices support physical SCSI interface connections from two different host systems. The SCSI interface on each of the systems must have a different initiator ID setting. This is a system firmware configuration known as the scsi-initiator-id.

As shown in Figure C-1, the scsi-initiator-id on one of the host systems must be changed to eliminate the addressing conflict between the two host systems.

Figure C-1  SCSI Initiator Configuration (the diagram shows two host systems sharing an external SCSI bus of disks t9-t14; host system A's external SCSI interface is set to scsi-initiator-id 6, while its internal bus and both of host system B's interfaces remain at scsi-initiator-id 7)

The SCSI initiator values are changed using complex system firmware commands. The process varies with system hardware platforms. Do not change the external SCSI bus scsi-initiator-id globally; the scsi-initiator-id is best changed at the interface card level. Read the documentation carefully; the procedures are hardware-platform specific.

Multi-host Fiber Optic Interface
Two different fiber-optic interface storage arrays support multiple host connections. The SPARCstorage Array 100 allows up to two host systems to connect to a single storage array. The Sun StorEdge A5000 array allows up to four host system connections.

Figure C-2  Fiber-optic Multiple Host Connections (the diagram shows a StorEdge A5000 with four hosts attached through SOC+ host adapters to its two interface boards, and a SPARCstorage Array 100 with two hosts attached through SOC host adapters to its two interface boards)

Host-Based RAID (Software RAID Technology)
The Sun StorEdge Volume Manager (SSVM) is a good example of software RAID technology. As shown in Figure C-3, user applications access a virtual structure through a single path that is actually composed of three separate disk drives.

Figure C-3  Host-Based RAID Technology (the diagram shows a 3-Gbyte SSVM virtual volume presented to users and applications, built from three 1-Gbyte physical disks, t1-t3, on controller c4 in a storage array)

A typical virtual volume pathname would be similar to:

/dev/vx/dsk/dga/volume-01

Even though the physical paths to the three disk drives in Figure C-3 still exist, they are not accessed directly by users or applications. Only the virtual volume paths are referenced by users. The virtual structures are created and managed by software that runs on the host system.
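As an illustration of how such a virtual volume might be built, the following sketch uses SSVM (VxVM-style) commands to create a 3-Gbyte striped volume in a hypothetical disk group named dga; the disk group and volume names are examples only.

# vxassist -g dga make volume-01 3g layout=stripe
# vxprint -g dga volume-01
# newfs /dev/vx/rdsk/dga/volume-01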

Controller-Based RAID (Hardware RAID Technology)
Controller-based RAID solutions use firmware that runs in external controller boards to maintain virtual structures that are composed of one or more physical disk drives. As shown in Figure C-4, RAID Manager software running on the host system is used to configure virtual structures in the external controller board. After initial configuration, the controller board firmware manages the virtual structures.

Figure C-4  Controller-Based RAID Technology (the diagram shows a host system with RAID Manager software and an Ultra SCSI card, C2, connected to an external RAID hardware controller that manages several disk arrays; user access goes through the host to the controller)

A typical hardware RAID device appears to be the same as any physical path, such as /dev/dsk/c0t5d0s0. Applications are unaware of the underlying RAID structures. Hardware RAID solutions typically offer much better performance for some types of RAID structures because RAID overhead calculations are performed at very high speed by the controller-resident hardware instead of on the host system, as in host-based RAID.

Redundant Dual Active Controller Driver
Some Sun storage devices allow dual connections to a storage array from a single host system. One host adapter can be configured as a backup if the primary access path fails, or the two adapters can be used in a load-balancing configuration.

Figure C-5  Redundant Dual Active Controller Driver (the diagram shows a host system running the RDAC driver and RM6 RAID Manager, with two Ultra SCSI cards, C1 and C2, each connected to one of the two controllers in a storage array of drives)

The redundant dual active controller (RDAC) driver is a special-purpose driver that manages the dual interface connections. This driver is available with some of the Sun hardware RAID storage arrays, including the A3000 and A3500 models. Applications interface with the RDAC driver and are unaware of interface failure. If one of the dual-controller paths fails, the RDAC driver automatically directs I/O to the functioning path. The controller-based RAID solution is used only on SCSI hardware interfaces.

Dynamic Multi-Path Driver
The dynamic multi-path driver (DMP) is unique to the SSVM product. It is used only with fiber-optic interface storage arrays. As shown in Figure C-6, the DMP driver can access the same storage array through more than one path. The DMP driver automatically configures multiple paths to the storage array. Depending on the storage array model, the paths are used either for load balancing or in a primary/backup mode of operation.

Figure C-6  Dynamic Multi-Path Driver (the diagram shows a host system running the DMP driver with two SOC cards, C1 and C2, each connected over a fiber-optic interface to one of the two controllers in a storage array of drives)

The paths can be enabled and disabled with the SSVM vxdmpadm command.

Note – The DMP feature of SSVM is not compatible with the alternate pathing software of the operating system. During installation, SSVM checks to see if AP (AP 2.x) is configured and, if so, it does not install the DMP (driver) software.
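A sketch of typical path administration follows; the controller name c2 is hypothetical, and the exact vxdmpadm subcommands available depend on the SSVM release, so verify them with vxdmpadm(1M).

# vxdmpadm listctlr all
# vxdmpadm disable ctlr=c2
# vxdmpadm enable ctlr=c2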

Hot Swapping
Most Sun disk storage arrays are engineered so that a failed disk drive can be replaced without interrupting customer applications. The disk replacement process also includes one or more software operations that can vary with each disk storage platform.

Standard SSVM Disk Replacement Procedure


In its simplest form, the process to replace a failed disk drive that is under SSVM control is as follows:

1. Use the SSVM vxdiskadm utility to logically remove the disk.

2. Hot swap in a new disk drive.

3. Use the SSVM vxdiskadm utility to logically install the new disk.

Disk Replacement Variations


The basic SSVM disk replacement process is more complex for some storage arrays, such as the StorEdge A5000. The A5000 procedure is as follows:

1. Use the SSVM vxdiskadm utility options 4 and 11 to logically remove and offline the disk.

2. Use the luxadm utility to remove the physical disk drive path.

3. Hot swap in a new disk drive.

4. Use the luxadm utility to build a new physical disk drive path.

5. Use the SSVM vxdiskadm utility to logically install the new disk.

Note – There are other variations on the basic disk replacement process. You must be familiar with the exact process for your particular disk storage devices.
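A command-level sketch of steps 2 through 4 for an A5000 (SENA) enclosure might look like the following; the enclosure name box1 and front slot f3 are hypothetical, and the exact luxadm arguments should be confirmed against luxadm(1M) for your Solaris release.

# luxadm remove_device box1,f3
(hot swap in the new disk drive)
# luxadm insert_device box1,f3
# drvconfig; disks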

SPARCstorage Array 100

SPARCstorage Array 100 Features


The SPARCstorage Array 100 has the following features:

•  Thirty disk drives
•  Ten disk drives per removable tray
•  Dual-ported fiber-optic interfaces
•  Six internal SCSI target addresses
•  Warm pluggable, with restrictions and cautions

SPARCstorage Array 100 Addressing

(Diagram: two host systems connect through FCOM cards to the Array 100 controller board; the array presents six internal SCSI targets, t0-t5, two per tray across three trays, with five disks, d0-d4, behind each target.)


Typical Address Paths
•  c0t3d0s2
•  c4t2d4s4

Note – Notice the internal SCSI controllers in the array controller board; there is one per target. This storage device has multiple devices per target, which is unusual.

RSM Storage Array

RSM Storage Array Features


You can use the RSM storage tray as a standalone unit attached to a single differential SCSI SBus card, or it can be rack mounted and used with a special dual-ported controller assembly. It has the following features:

•  Seven disk drives in each array
•  Hot-pluggable disks
•  Redundant power modules
•  Redundant cooling modules
•  Individually removable drives


RSM Storage Array Addressing


If the RSM storage tray is attached to a differential-wide SCSI interface, the SCSI target ID corresponds to the slot number in the tray. Typical physical addresses are:

•  c2t0d0s2
•  c4t2d0s4
•  c2t5d0s3

The device number will always be zero.

Note – There is a switch that can change the address range to 8-14.

SPARCstorage Array 214/219


SPARCstorage Array 214/219 Features


The SPARCstorage Array Model 214/219 combines a SPARCstorage Array 200 disk controller with up to six RSM (Removable Storage Module) differential SCSI disk trays. It is contained in a 56-inch cabinet and has the following features:

•  Is rack mounted in a 56-inch expansion cabinet
•  Has a dual-port fiber-optic interface
•  Has six differential SCSI outputs
•  Is typically connected to RSM array trays (six-tray maximum)
•  Has seven devices per RSM tray, either 4 or 9 Gbytes each
•  Individual devices can be removed from a tray without affecting other drives in the tray

The 214 uses 4-Gbyte disks and the 219 uses 9-Gbyte disks.

SPARCstorage Array 214 Addressing

(Diagram: two host systems connect through FCOM cards to the Array 200 controller board, which drives six RSM arrays through two differential SCSI boards; targets t0, t2, and t4 are on one board and t1, t3, and t5 on the other, with devices d0-d6 in each RSM array.)


Typical Address Paths
•  c0t3d0s2
•  c4t2d5s4

In this configuration, the SCSI device number corresponds to the slot number in each RSM tray.

Sun StorEdge A3000 (RSM Array 2000)


The Sun StorEdge A3000 controller is a compact unit that provides hardware-based RAID technology. There are two SCSI controllers that manage up to five RSM storage arrays.

StorEdge A3000 Features


•  Redundant hot-plug RAID controllers
•  Redundant power and cooling
•  Data cache with battery backup
•  RAID 0, 1, 0+1, 1+0, 3, and 5
•  Dual UltraSCSI host interfaces (40 Mbytes/sec)
•  Hot-plug controllers, power supplies, and cooling

StorEdge A3000 Addressing

(Diagram: a host system with two Ultra SCSI cards, C1 and C2, and the RDAC driver and RM6 RAID Manager connects to the A3000's two RAID hardware controllers, targets T4 and T5; the controller pair manages five RSM trays whose slots use SCSI IDs 8-14.)


The Sun StorEdge A3000 is not directly addressed. A RAID manager GUI called RM6 is used to configure hardware RAID devices consisting of groups of RSM disks. The Redundant Dual Active Controller (RDAC) driver makes it possible for automatic failover to the backup access path through the second Ultra SCSI interface. Different hardware RAID volume access can be directed through each interface for load balancing. There are several utilities associated with managing the hardware RAID devices created with RM6.

Note – Once created, the RM6 devices can be referenced by other system utilities, such as virtual volume managers.

The RM6 RAID Manager software can take one or more physical disk drives in the storage trays and configure them as a single logical unit (LUN). This LUN can have a hardware RAID structure. Once configured, an RM6 LUN appears to be a regular physical address, such as c2t3d0s2. The underlying configuration is hidden from applications. There are potential problems with configuring an SSVM software RAID-5 device on top of an RM6 hardware RAID-5 device. You must read the array documentation carefully.
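For example, an RM6 LUN that appears as c2t3d0 might be placed under SSVM control with a sketch such as the following; the disk group dga and disk media name dga01 are hypothetical, and command locations can vary with the SSVM release.

# vxdisksetup -i c2t3d0
# vxdg -g dga adddisk dga01=c2t3d0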

StorEdge A1000/D1000

Shared Features
Except for controller boards, the StorEdge A1000 and D1000 models have the following features in common:

•  Eight 1.6-inch or twelve 1-inch UltraSCSI disk drives
•  Dual power supplies
•  Dual cooling modules

Note The disk drives, power supplies, and cooling modules are all hot-pluggable.

StorEdge A1000 Differences
The A1000 is often referred to as the desktop hardware RAID solution. It is a standalone hardware RAID device and is programmed by the RM6 RAID Manager software in exactly the same manner as the StorEdge A3000. As shown in Figure C-7, the StorEdge A1000 controller has two SCSI ports. Usually one port is connected to the host system through an Ultra Differential Fast/Wide Intelligent SCSI (UDWIS) adapter. The other port is terminated.

Figure C-7  StorEdge A1000 Connection (the diagram shows a host connected to one UDWIS port, at 40 Mbytes/sec, on the A1000 RAID controller; the second port is terminated)

StorEdge A1000 Addressing


The addressing scheme is identical to that used by the StorEdge A3000 unit. The RM6 RAID Manager software takes one or more physical disk drives in the storage tray and configures them as a single logical unit (LUN), which appears to be a regular physical address such as c2t3d0s2. The underlying configuration is hidden from applications. There are potential problems with configuring an SSVM software RAID-5 device on top of an RM6 hardware RAID-5 device. You must read the array documentation carefully.

StorEdge D1000 Differences
As shown in Figure C-8, the StorEdge D1000 controller has four SCSI ports. The controller can be configured so that half the disks are connected to one pair of ports and half to the other pair. It can also be configured so that all the disks are available through a single connection. Each pair of ports provides an Ultra Differential Fast/Wide Intelligent SCSI (UDWIS) connection.

Figure C-8  StorEdge D1000 Connection (the diagram shows the four D1000 SCSI ports, arranged as two IN/OUT pairs)

StorEdge D1000 Addressing


The StorEdge D1000 trays are used in exactly the same way the RSM trays are used in the StorEdge A3000, and with the same hardware RAID controller boards. The addressing scheme is identical to that used by the StorEdge A3000 unit. The RM6 RAID Manager software takes one or more physical disk drives in the storage tray and configures them as a single logical unit (LUN), which appears to be a regular physical address such as c2t3d0s2. The underlying configuration is hidden from applications.

Sun StorEdge A3500

(Diagram: the 1x5 configuration, a single A3000 two-board controller with five D1000 trays, provides up to 720 Gbytes of disk capacity.)

The StorEdge A3500 unit uses the StorEdge D1000 trays the same way the RSM trays are used in the StorEdge A3000. They are connected using the same two-board controller that is used in the StorEdge A3000. The main difference is the cabinet size.

StorEdge A3500 Features


Depending on its configuration, a StorEdge A3500 system can have up to 2.16 Tbytes of disk capacity. The main features are:

•  Hardware RAID controller(s)
•  Scalable configuration
•  A 72-inch rack that holds up to seven D1000 trays

As shown in Figure C-9, the StorEdge A3500 array has two additional configurations that can be purchased:

•  The 2x7 configuration, with two dual-board controllers and seven D1000 trays
•  The 3x15 configuration, with three dual-board controllers and fifteen D1000 trays

Figure C-9  StorEdge A3500 Scalability (the diagram shows the 2x7 configuration providing up to 1.008 Tbytes of disk capacity and the 3x15 configuration providing up to 2.16 Tbytes)

StorEdge A3500 Addressing

(Diagram: a host system with two Ultra SCSI cards, C1 and C2, and the RDAC driver and RM6 RAID Manager connects to the A3500's two RAID hardware controllers, targets T4 and T5, which manage five D1000 arrays.)


The RM6 RAID Manager software takes one or more physical disk drives in the storage trays and configures them as a logical unit (LUN). This LUN can have a hardware RAID structure. Once configured, an RM6 LUN appears to be a regular physical address, such as c2t4d0s2. The underlying configuration is hidden from applications.

Caution – There are potential problems with configuring an SSVM software RAID-5 device on top of an RM6 hardware RAID-5 device. The redundant parity calculations will cause extremely poor performance.

Sun StorEdge A5000


The Sun StorEdge A5000 is a highly available mass storage subsystem. The A5000 is the building block for high-performance and high-availability configurations with fully redundant, hot-swappable, active components. The A5000 offers the best RAS features of any Sun storage array to date.

A5000 Features
•  The Sun second-generation Fibre Channel storage subsystem
•  Tabletop units, or up to four mounted in a 56-inch rack or six in a 72-inch rack
   –  Each rack includes two hubs

The following describes the features of the Sun StorEdge A5000.
•  A new way of storing data that is:
   –  Extremely fast (100 Mbytes per second)
   –  Highly available, with the best RAS (reliability, availability, and serviceability) features
   –  Scalable in capacity, bandwidth, and I/O rate
•  It contains up to 14 half-height (HH, 1.6-inch) or 22 low-profile (LP, 1-inch) hot-pluggable, dual-ported, FC-AL disk drives.
•  Two interface boards with Gigabit Interface Converters (GBICs) provide dual-path capability to the dual-ported disk drives. Two hosts may be attached to each path.
•  A Front Panel Module (FPM) allows the configuration and status of the enclosure to be displayed and modified.
•  Active components in the disk enclosure are redundant and can be replaced while the subsystem is operating. Automatic reconfiguration bypasses whole failed components, or portions thereof.
•  The enclosure is designed for tabletop use or for mounting up to four units in a standard Sun rack.
•  123.75 Gbytes of usable raw formatted capacity in each unit gives over 495 Gbytes per loop (maximum of four units in a loop).

The Sun Enterprise Network Array connects to the host node using the SOC+ FC-AL interface card or built-in FC-AL interfaces in some Sun Enterprise Server I/O boards.

StorEdge A5000 Addressing

(Diagram: the enclosure holds seven drives on the front backplane and seven on the rear backplane, in slots numbered 0-6; interface boards A and B connect the backplanes to up to four hosts, Host 0 through Host 3.)


The A5000 storage has 14 disk drives. The physical locations are described in terms of front 0-6 and rear 0-6. Each box can be assigned a box identifier from 0-3. Each identifier determines a preconfigured address range for the box. Each address is directly related to a SCSI target number.

Box ID 0 Addressing
Rear drives:   t22  t21  t20  t19  t18  t17  t16
Front drives:  t0   t1   t2   t3   t4   t5   t6

A5000 Internal Addressing
Box ID 1 Addressing
Rear drives:   t54  t53  t52  t51  t50  t49  t48
Front drives:  t32  t33  t34  t35  t36  t37  t38

Box ID 2 Addressing
Rear drives:   t86  t85  t84  t83  t82  t81  t80
Front drives:  t64  t65  t66  t67  t68  t69  t70

Box ID 3 Addressing
Rear drives:   t118 t117 t116 t115 t114 t113 t112
Front drives:  t96  t97  t98  t99  t100 t101 t102

Physical Addresses
The Box ID addresses are available so that up to four A5000 storage arrays can be daisy-chained on a single controller without any SCSI address conflicts. Typical addresses for A5000 storage arrays are:

•  c0t3d0s2
•  c4t67d0s3
•  c3t98d0s2
•  c1t6d0s4
•  c5t113d0s0
•  c2t83d0s4

Sun StorEdge A7000


The Sun StorEdge A7000 intelligent storage server is a mainframe-class subsystem designed to address the storage needs of UNIX and NT hosts, as well as IBM and plug-compatible mainframes, on a single versatile platform.

Sun StorEdge A7000 Enclosure


In addition to fully redundant hardware, including controllers, cache, hot-pluggable disks, fans, power and power cords, the Sun StorEdge A7000 enclosure contains two high density storage array (HDSA) units and two data storage processor (DSP) units.

High Density Storage Array Units
Each HDSA unit can hold up to 54 9.1-Gbyte disk drives. They are housed in removable carriers and, together with software redundancy options, provide hot-swappable disks. They are arranged in six-packs and plug into an intelligent backplane that automatically sets the SCSI ID of the device according to its position in the six-pack. Capacity for the A7000 can be expanded from 24 to 324 disk drives (217 Gbytes to 2.93 Tbytes of total storage) by adding an expansion cabinet containing four additional HDSA units.

Data Storage Processor Units


Each DSP unit operates independently and controls one of the HDSA units. Each DSP unit has the following features:

•  14-slot chassis backplane
•  Multiple host system adapters
   –  Quad block multiplexor channel (BMC) adapter
   –  Dual-channel enterprise system connection (ESCON) adapter
   –  SCSI target emulation (STE) adapters
•  UNIX System Laboratories UNIX System V operating system

StorEdge A7000 Functional Elements

(Diagram: SCSI, ESCON, and BMC host connections attach through interface board options to two DSP units running UNIX and the DASD manager; the DSPs are linked by a memory channel, each controls an HDSA of 54 disks, and SCSI expanders cross-connect each DSP to the other DSP's HDSA.)


Most of the important StorEdge A7000 features can be adequately understood without lengthy discussions. Detailed technical discussions can distract from the more important conceptual understanding.

Host Adapter Options


Each DSP unit has five slots available for any mix of SCSI, ESCON, or BMC adapter boards. Simultaneous connections can be made from any of the supported host types.

Memory Channel Adapter
The two DSP units are connected by a high-speed memory bus. Each DSP unit has up to four memory channel board slots. The memory channel interconnect allows each DSP subsystem to keep the other informed of its state of operation including the presence of any unwritten data. In the event of a DSP failure, the partner DSP unit can take over operation and maintain data integrity.

Direct Access Storage Device Manager


The DASD manager is a graphical user interface (GUI) tool that allows service personnel to configure the storage on the A7000. The configuration information is stored in the master configuration database (MCD) in each of the DSP units. The DASD manager can be used to create and manage the following storage configurations:

•  Linear partitions
•  RAID spare devices
•  RAID 5
•  RAID 1
•  RAID 0
•  RAID 1+0
•  RAID 0+1

SCSI Expanders
The SCSI expanders allow each DSP unit to access the other DSP unit's disk storage. This path is used only if one of the DSP units fails.

StorEdge A7000 Addressing
Using the DASD manager, the HDSA disks can be configured in a variety of ways. Each type of configuration has associated special device files that can be referenced by SSVM commands and used to build software RAID devices on top of the A7000 RAID devices. The A7000 device types and their associated device names are shown in Table C-1.

Table C-1  StorEdge A7000 Device Addresses

•  /dev/rdsk/cd4 – Linear partitions that function as normal disk partitions. The last segment of the address is determined by the disk's physical location in the HDSA.
•  /dev/rdsk/0r3
•  /dev/rdsk/0r5
•  /dev/rdsk/mp0
•  /dev/rdsk/vp0

Combining SSVM and A7000 Devices
Probably the most compelling use for combined host-based and control unit-based RAID is the attainment of very high sequential throughput, such as for large decision support systems. These systems are usually limited by the bandwidth of the connection between the host and the storage subsystem. As shown in Figure C-10, a useful configuration in these cases takes suitable volumes implemented with A7000-based RAID 5 (or 1/0) and stripes them together with host-based RAID 0.

Figure C-10  Combining Host-based RAID 0 or 1 and A7000-based RAID 5 (the diagram shows a host striping, with RAID 0 or 1, across two SCSI adapters on its I/O bus; each adapter connects to an STE adapter on the A7000's VME bus, behind which RAID partitions are built on RAID 5 disk sets)

SPARCstorage MultiPack

The SPARCstorage MultiPack enclosure is a multiple-disk storage device equipped with a fast/wide SCSI interface. The MultiPack-2 provides an UltraWide interface. There are two versions of the device:

•  A SPARCstorage MultiPack unit that supports up to six 1.6-inch high, single-connector disk drives
•  A SPARCstorage MultiPack unit that supports up to twelve 1-inch high, single-connector disk drives

The MultiPack enclosure is 9 inches high. You can use the SPARCstorage MultiPack in a multi-initiated SCSI configuration.

Note – If you do not have SPARCstorage Arrays attached to your system, you will need a special license to use SSVM in a MultiPack-only configuration.

SPARCstorage MultiPack Features
The following describes the features of the SPARCstorage MultiPack.
•  68-pin Fast or Ultra Wide SCSI interface
•  Drive addresses determined by position (hardwired)
•  Six-drive units, which can be used on a standard 50-pin (narrow) SCSI bus
•  Twelve-drive unit only for use on a 68-pin (wide) SCSI bus
•  Twelve 1.0-inch, 7200-rpm disks (2.1 or 4.2 Gbytes)
•  Six 1.6-inch, 5400-rpm disks (9.1 or 18 Gbytes)

SPARCstorage MultiPack Addressing


The SPARCstorage MultiPack addressing is determined automatically, based on the type and physical position of the disks used. The address range is selectable with the six-drive model. The address ranges are as follows:

•  The six-drive model addresses are switch-selectable and are either 1-6 or 9-14.
•  The twelve-drive model addresses are designed so that addresses 6 and 7 are not used, to eliminate scsi-initiator-id conflicts:
   –  Addresses 2-5
   –  Addresses 8-15

The addresses directly relate to target numbers. A typical device address path would be /dev/dsk/c0t8d0s2.

Storage Configuration
Identifying Storage Devices
The best way to identify the type and model of storage devices connected to your system is to read the product model tag and study the related technical manuals. Occasionally, you might be working with systems remotely and need to identify the hardware configuration using operating system commands and other tools.

Using the luxadm Command


The luxadm program is an administrative command that manages the SENA, RSM, and SPARCstorage Array subsystems. It can find and report basic information about supported storage arrays, as follows:

# luxadm probe

Unfortunately, the probe option recognizes only certain types of storage arrays, so it is not comprehensive enough.

The luxadm command can give useful information if you know some basic controller addresses. This is still limited to certain storage models, and the command gives error messages if unsupported devices are examined. The following are examples of output from using the luxadm command.

# luxadm disp c0
luxadm: Error opening /devices/io-unit@f,e0200000/sbi@0,0/dma@0,81000/esp@0,80000:ctlr
No such file or directory

# luxadm disp c1
SPARCstorage Array 110 Configuration

# luxadm disp c2
SPARCstorage Array 100 Configuration

# luxadm disp c3
luxadm: Error opening /devices/io-unit@f,e3200000/sbi@0,0/SUNW,socal@3,0/sf@1,0:ctlr
No such file or directory

# luxadm probe
Found SENA  Name:kestrel  Node WWN:5080020000000878
  Logical Path:/dev/es/ses0
  Logical Path:/dev/es/ses1

In the previous examples, the c0 controller is a standard SCSI interface, so luxadm cannot identify it. The c1 and c2 controllers are for SPARCstorage Array 100 models, which luxadm can identify. The c3 controller is a luxadm-supported StorEdge A5000 array, but you must use a different luxadm option to see it. The probe option discovered the array successfully. You can use the luxadm display kestrel command format to display the A5000 details.

Using the format Utility
The Solaris Operating Environment format utility is the only reliable program for gathering basic storage configuration information. It is not the complete answer, but it reports all storage devices, regardless of type or model. Review the following sample output; there are three different types of storage devices shown.

AVAILABLE DISK SELECTIONS:
    0. c0t0d0 <SUN1.05 cyl 2036 alt 2 hd 14 sec 72>
       /io-unit@f,e0200000/sbi@0,0/dma@0,81000/esp@0,80000/sd@0,0
    1. c1t0d0 <SUN1.05 cyl 2036 alt 2 hd 14 sec 72>
       /io-unit@f,e1200000/sbi@0,0/SUNW,soc@3,0/SUNW,pln@a0000000,8023c7/ssd@0,0
    2. c1t0d1 <SUN1.05 cyl 2036 alt 2 hd 14 sec 72>
       /io-unit@f,e1200000/sbi@0,0/SUNW,soc@3,0/SUNW,pln@a0000000,8023c7/ssd@0,1
    3. c3t98d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
       /io-unit@f,e3200000/sbi@0,0/SUNW,socal@3,0/sf@1,0/ssd@62,0

You can determine the following from the previous example:
•  Device 0 is a standard SCSI interface because of the esp in the path name.
•  Devices 1 and 2 are SPARCstorage Array 100 disks because of the soc in the path name.
•  Device 3 is an FCAL storage array because of the socal in the path name.
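Because format is interactive, it can be awkward to run from a script. A common workaround, shown here only as a sketch, is to feed it an empty input stream so that it prints the disk list and exits; you can then search the device paths for the esp, soc, or socal strings described above.

# format < /dev/null
# format < /dev/null | grep socal

The first command lists all disks; the second shows only the paths that belong to FCAL (socal) interfaces.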

Identifying Controller Configurations
The format utility can also be used to identify storage arrays that have multi-path controller connections.

Warning – DMP configurations are not supported in the Sun Cluster environment. It is important that you always disable the DMP feature when installing the CVM or SSVM software.

Dynamic Multi-path Devices


Dynamic Multi-path Device (DMP) connections can be identified using the format utility, as shown in the following example.

AVAILABLE DISK SELECTIONS:
  0. c0t0d0 <SUN2.1G cyl 2733 alt 2 hd 19 sec 80>
     /sbus@3,0/SUNW,fas@3,8800000/sd@0,0
  1. c2t33d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
     /sbus@3,0/SUNW,socal@0,0/sf@0,0/ssd@w22000020370c0de8,0
  2. c3t33d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
     /sbus@3,0/SUNW,socal@0,0/sf@1,0/ssd@w21000020370c0de8,0

The device paths for devices 1 and 2 in the previous example have the same disk drive identifier, 20370c0de8. Because the controller numbers are different, they are connected to two different controller interfaces in the same system.
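A quick, hedged check from the command line is to search the format output for a given drive identifier; the identifier 20370c0de8 below is taken from the example above and is only a placeholder for your own disks.

# format < /dev/null | grep 20370c0de8

Two matching lines with different cN controller numbers indicate that the same disk is reachable through two paths.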


Oracle Parallel Server

This appendix is designed to help you understand some of the installation and functional elements of a Sun Enterprise Cluster system running the Oracle Parallel Server application.

Oracle Overview
Figure D-1   Oracle Configuration (diagram: the system global area (SGA) holds the database buffer cache, where database blocks to be modified are loaded, and the redo log buffer, which records the changes made; server processes work through the program global area (PGA) and the program interface to read the data files on behalf of user processes; the background processes DBWR and LGWR write to the data files, control files, and redo logs; PMON monitors user and server processes, SMON performs automatic instance recovery, and ARCH copies redo information to off-line storage)

Note – This diagram is for illustration purposes only. Your version of Oracle might be slightly different.

Oracle 7.x and Oracle 8.x Similarities
Oracle 7.x and Oracle 8.x Parallel Server are similar in several general areas.

System Global Area


The system global area (SGA) is maintained in memory and consists mostly of shared memory buffers. Its main features are:

•  The database buffer cache, where all data modifications are made
   Database blocks are loaded from the disks and are from 2 to 32 Kbytes in size. Their size is set by the db_block_size parameter during database creation and is permanent.

•  The redo log buffer
   It contains a record of all recent changes made to the database. Periodically, the changes are written to disk-resident redo logs that are used to recover from a system or database crash.

Program Global Area


The program global area (PGA) contains data and control areas used by server processes. The server processes handle requests from connected user processes.

File Types
All of the following three Oracle file types must be shared:

•  Data files
•  Control files (the physical makeup of the database, status, and links to volumes)
•  Redo logs (recovery information, separated by instance)

Oracle Background Processes
DBWR   Database Writer. Writes the modified blocks to disk, but only when too few database buffers are free.
CKPT   Checkpoint. At specific times, causes all modified database buffers and control files to be written to disk.
LGWR   Log Writer. Writes redo logs to disk. These are the changes that will eventually be written to disk.
SMON   System Monitor. Performs automatic instance recovery at startup or for other failed instances.
ARCH   Archiver. If enabled, copies the online redo logs to archival storage when they are full.
PMON   Process Monitor. Performs process recovery when a user process fails. Can also restart failed server processes.
LCK0   The Oracle process that manages most of the locks used by an instance and coordinates requests for those locks made by other instances. There can be up to 10 lock processes per instance.

Oracle 7.x and Oracle 8.x Differences
One of the major differences between Oracle 7.x and Oracle 8.x is the distributed lock management system.

Oracle 7.x UNIX Distributed Lock Manager


The UNIX Distributed Lock Manager (UDLM) used with Oracle 7.x is an external software package that performs global lock management between instances. The UDLM package is installed separately from the database software. Related system processes are:

•  dlmd
•  dlmmon

Oracle 8.x Integrated Distributed Lock Manager


The Integrated Distributed Lock Manager (IDLM) used with Oracle 8.x has been integrated directly into Oracle for increased efficiency. The IDLM still uses the lock process, but now you will see several new processes, such as:

•  ogms
   The Oracle Group Membership Services (GMS) process is used by the lock manager and other Oracle components for inter-instance initialization and coordination. The GMS software must be started before you can bring an instance up.

•  lmon
   Handles instance deaths and associated lock recovery for lock management. This is a background process.

•  lmd0
   Handles remote lock requests that originate from a different instance. This is a background process.
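A simple, hedged way to confirm which lock manager processes are running on a node is to look for them in the process list; the exact names as they appear in ps output can vary slightly between Oracle releases.

For an Oracle 7.x UDLM installation:
# ps -ef | egrep 'dlmd|dlmmon'

For an Oracle 8.x IDLM installation:
# ps -ef | egrep 'ogms|lmon|lmd0'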

Group Membership Services
The startup of GMS is automatic in some configurations and manual in others, as follows:

•  For PDB 1.2, there is a patch, 104840-03, that starts the ogms process within the PDB framework.
   # pdbadmin startnode

•  For SEC 2.0 and SEC 2.1, the ogms process must be started manually, as follows:
   # scadmin startnode (or scadmin startcluster)
   # ogmsctl start

Oracle Configuration Files
There are several configuration files that contain global settings and parameters that affect general OPS operation. Those that must be tuned for each installation are:

•  /etc/opt/SUNWcluster/conf/clustname.ora_cdb
•  /etc/system
•  init.ora

Note – The ora_cdb file is not used in Oracle 8 installations.

The /etc/system File
Typical /etc/system Shared Memory Entries
set shmsys:shminfo_shmmax=240000000
set semsys:seminfo_semmap=2000
set semsys:seminfo_semmni=2000
set semsys:seminfo_semmns=2000
set semsys:seminfo_semmsl=2000
set semsys:seminfo_semmnu=2000
set semsys:seminfo_semume=1500
set shmsys:shminfo_shmmin=1500
set shmsys:shminfo_shmmni=1500
set shmsys:shminfo_shmseg=1500
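After the node has been rebooted with these entries in place, one hedged way to confirm that the kernel picked them up is to inspect the IPC tunables; the exact output format of sysdef varies by Solaris release.

# sysdef | grep -i shm
# sysdef | grep -i sem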

Oracle 8 Differences
There is no difference in the /etc/system parameters for Oracle 8, except that value adjustments might be required.

The /etc/opt/SUNWcluster/conf/clustname.ora_cdb File
The parameters in this file affect the shared memory usage of the UDLM processes. The values shown in the following example are typical ora_cdb values only. Your installation will require different values.

oracle.maxproc : 50
oracle.maxres  : 950
oracle.maxlock : 1900
oracle.dba.gid : dba
oracle.useISM  : 1

Oracle 8 Parameter Differences


This file is not used in an Oracle 8 installation. The maxproc, maxres, and maxlock variables are called LM_PROCS, LM_RESS, and LM_LOCKS in Oracle 8 and are entered in the init.ora file.

Potential Problems
If there is insufficient shared memory in the /etc/system file, or if the configuration parameters in the ora_cdb file are greater than the shared memory configured in the system, you will be unable to start the Sun Cluster software when the scadmin startnode command is invoked.

Information Resources
If you are experiencing shared memory problems, the following suggestions may provide useful information:

•  Use ipcs -am to check shared memory usage.
•  For an Oracle 7 installation, check the /var/opt/SUNWcluster/dlm_clustname/logs file for information about the memory required by UDLM.
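For example, a hedged first pass at troubleshooting might look like the following, where clustname is replaced by your cluster name:

# ipcs -am
# ls -l /var/opt/SUNWcluster/dlm_clustname/logs

The first command shows the shared memory segments currently allocated; the second confirms that the UDLM log location exists so you can review it.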

The init.ora File
Typical init.ora Values
db_file_multiblock_read_count = 8
db_block_buffers = 60
shared_pool_size = 3500000
log_checkpoint_interval = 10000
processes = 50
dml_locks = 100
log_buffer = 8192
sequence_cache_entries = 10
sequence_cache_hash_buckets = 10
global_names = TRUE

Oracle 8 Parameter Differences


•  parallel_server = true
   The OPS instance will not start unless this parameter is present.

•  LM_RESS = value
   Defines the number of resources used by the DLM layer.

•  LM_LOCKS = value
   The number of locks to be used by the DLM layer.

•  LM_PROCS = value
   Controls the number of Oracle processes that can use the lock management service.
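Put together, the Oracle 8 specific portion of an init.ora file might look like the following sketch. The numeric values simply echo the earlier ora_cdb example and are placeholders only; they must be sized for your installation.

parallel_server = true
LM_PROCS = 50
LM_RESS  = 950
LM_LOCKS = 1900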

Oracle Database Volume Access
The connections and ownership of Oracle volumes must be configured correctly to allow database access. You must:

•  Create a master directory with links to the volumes if you prefer to access them by name instead of by device path
•  Use correct methods to set volume ownership

In the following example, the /tpcs master directory contains the entries acct00 through acct12, acct15, brch00, controlfile, hist00 through hist03, ibrch00, itllr00, log00 through log03, system, and tllr00; each entry is a link to one of the raw volumes v1 through v36 in /dev/vx/rdsk/tpcs.

The previous example shows the following:

•  The /tpcs directory and its subdirectories are owned by oracle/dba (user/group).
•  The volumes are owned by oracle/dba.

Caution – Do not use chown or chgrp to set volume ownership. This should be done with Volume Manager commands.
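A master directory such as /tpcs can be built with ordinary symbolic links. The following is only a sketch; the volume-to-name mapping (v1 to acct00 here) is entirely site specific, and chown is applied only to the directory and to the links themselves (-h), never to the volume device files, in keeping with the caution above.

# mkdir /tpcs
# ln -s /dev/vx/rdsk/tpcs/v1 /tpcs/acct00
# chown oracle:dba /tpcs
# chown -h oracle:dba /tpcs/acct00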

Oracle Volume Types
You can use the Oracle volumes as raw devices, such as /dev/vx/rdsk/tpcs/v11. The CVM volumes should be created with volume type gen. The volume type fsgen should not be used; it is for file systems (block devices).
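The usage type can be specified when the volume is created. This is only a sketch; the disk group, volume name, and size are placeholders, and you should check the vxassist man page for your Volume Manager release.

# vxassist -g tpcs -U gen make v11 2g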

CVM Volume Pathnames


The /dev/vx structures for volumes not in the rootdg disk group are not present until the cluster software is started. They disappear when the cluster software is stopped. They are built by the vxconfigd daemon each time the cluster software is started. This is true of all shared disk structures. The rootdg /dev/vx entries are created at boot time by the vxconfigd daemon.

Changing Permission or Ownership of Volumes


To run applications, such as Oracle7 Parallel Server, it might be necessary to change read/write permissions and ownership of the volumes. To change the permissions or ownership of CVM volumes, use the vxedit command. vxedit sets the necessary fields in a Volume Manager record. You can use vxedit as follows:

# vxedit -g tpcs set user=oracle group=dba mode=660 acct06

Caution – Do not use chown or chmod, because the device files are recreated at each boot, and Volume Manager will not restore any changes. One problem that can arise if you use chown or chmod is that Oracle might be unable to restart.
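You can confirm that the change was recorded by printing the long listing of the volume record; ownership set this way is stored in the Volume Manager configuration and therefore survives reboots. The disk group and volume names are from the example above, and the exact output fields vary by Volume Manager release.

# vxprint -g tpcs -l acct06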

DLM Reconfiguration
The clustd daemon initiates a reconfiguration whenever a node fails or rejoins the cluster. If another node is entering the cluster, all nodes must perform a DLM reconfiguration. The DLM reconfiguration is synchronized between all nodes so that each step of the reconfiguration is completed on all nodes before proceeding.

Figure D-2   DLM Recovery Mechanism (diagram: on Node 0 and Node 1, the clustd daemon initiates the reconfiguration and drives the reconfiguration steps, the reconf_ener program works with the lck0 process, and each DLM keeps its state in a shared memory segment; the steps are synchronized between the nodes over the cluster interconnect)

DLM Reconfiguration Steps
During a cluster reconfiguration, a portion of the reconfiguration steps are dedicated to DLM lock recovery.

RETURN   Disables regular processing during reconfiguration.
Step 1   Establishes new connections to the current nodes and recovers from requests that are pending on a master at the time reconfiguration began.
Step 2   Remasters the resource structures to their new master.
Step 3   Replays the lock requests.
Step 4   Re-enables normal processing.

Volume Manager Reconfiguration with CVM
One of the most distinct differences between the Cluster Volume Manager and the Enterprise Volume Manager is that CVM supports shared disk groups. All cluster hosts can access the same data. This means that each cluster member in a CVM configuration must be made aware of failures that are detected by other members. The primary elements involved are:

•  The vxio software driver
•  The vxconfigd configuration daemon
•  The cluster interconnect system

When a disk group is imported and marked as active, a copy of configdb is created in memory and used to keep track of any configuration changes in the volume structure. The vxconfigd daemon is responsible for updating the disk-resident copies of configdb. The kernel configuration table is checked by the vxio driver before it attempts to access a virtual structure; it is not necessary to examine the disk-resident copies.

Figure D-3   Volume Reconfiguration With CVM (diagram: on each node, the vxio driver consults the kernel configuration table before access; on a device access error it signals vxconfigd, which updates the disk-resident configdb in the storage array and propagates the update to the other nodes over the cluster interconnect using UDP)

Initial Volume Configuration
When SEC starts, the vxconfigd daemon imports the disk groups that belong to the node. When a disk group is imported into a node, vxconfigd reads the configdb records from the disk and creates the kernel configuration table.

Volume Reconfiguration With CVM


When the vxio driver is notified of a hard device error, it disables the volume it was trying to access and updates the Volume Manager kernel configuration table. It also signals vxconfigd that a configuration change has taken place, both locally and on all other nodes.

Oracle Parallel Server Specific Software
When you install the Sun Cluster software for an Oracle Parallel Server (OPS) installation, the only OPS-specific software that is loaded is:

•  The SUNWudlm package

Note – The Cluster Volume Manager is loaded later. It does not have any special ties to OPS.

The SUNWudlm Package Summary


The SUNWudlm package consists of the following components:

scogms         A script file that is used for Oracle8 Group Membership Services (GMS) control.
udlmctl        A compiled binary file.
ogmsreconfig   A script file that is run during cluster reconfiguration. Depending on the cluster status during a reconfiguration, it runs the scogms script with either a stop_pmg option or a start_pmf option.
udlmreconfig   A script file.
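If you need to confirm that the package is installed and see exactly which files it delivered, the standard Solaris packaging tools can be used; this is only a sketch.

# pkginfo -l SUNWudlm
# pkgchk -l SUNWudlm | more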


Glossary
Active server A node in the Sun Cluster conguration that is providing highly available data services. Administrative workstation A workstation that is either outside the cluster or one of the cluster nodes that is used to run cluster administrative software. Backup group Used by network adapter failover (NAFO). A set of network adapters on the same subnet. Adapters within a set provide backup for each other. CCD quorum The set of Cluster Conguration Databases needed to elect a valid and consistent copy of the Cluster Conguration Database. Cluster Two to four nodes congured together to run either parallel database software or highly available data services. Cluster Conguration Database (CCD) A highly-available, replicated database that can be used to store data for HA data services and other Sun Cluster conguration needs. Cluster interconnect The private network interface between cluster nodes. Cluster Membership Monitor (CMM) The software that maintains a consistent cluster membership roster to avoid database corruption and subsequent transmission of corrupted or inconsistent data to clients. When


nodes join or leave the cluster, thus changing the membership, CMM processes on the nodes coordinate global reconguration of various system services. Cluster node A physical machine that is part of a Sun cluster. Also referred to as a cluster host or cluster server. Cluster quorum The set of cluster nodes that can participate in the cluster membership. Cluster reconguration An ordered multistep process that is invoked whenever there is a signicant change in cluster state, such as takeover, switchover, or a physical host reboot. During cluster reconguration, the Sun Cluster software coordinates all of the physical hosts that are up and communicating. Those hosts agree on which logical host(s) should be mastered by which physical hosts. Cluster pair topology Two pairs of Sun Cluster nodes operating under a single cluster administrative framework. Cluster SNMP agent The cluster Simple Network Management Protocol (SNMP) agent is used to monitor several clusters (a maximum of 32) at the same time. CMM quorum See cluster quorum. Concatenation A metadevice created by sequentially mapping blocks on several physical slices (partitions) to a logical device. Two or more physical components can be concatenated. The slices are accessed sequentially rather than interlaced (as with stripes). Data service A network service that implements read-write access to diskbased data from clients on a network. NFS is an example of a data service. The data service may be composed of multiple processes that work together. Default master The node that is congured to master a disk group when the logical hosts are congured.

Direct attached device A disk storage unit that is physically connected to all nodes in the cluster. Distributed Lock Manager (DLM) Locking software used in a shared disk Oracle7 or Oracle8 Parallel Server (OPS) environment. The DLM enables Oracle processes running on different nodes to synchronize database access. The DLM is designed for high availability; if a process or node crashes, the remaining nodes do not have to be shut down and restarted. A quick reconguration of the DLM is performed to recover from such a failure. Disk expansion unit The physical storage enclosure that holds the multihost disks. For example, SPARCstorage Arrays, Sun StorEdge MultiPacks, Sun StorEdge A3000s and Sun StorEdge A5000s. Disk group A well dened group of multhost disks that move as a unit between two servers in an HA conguration. This can be either a Solstice DiskSuite diskset or a Sun StorEdge Volume Manager disk group. Diskset See disk group. DiskSuite state database A replicated database that is used to store the conguration of metadevices and the state of these metadevices. Fault detection Sun Cluster programs that detect two types of failures. The rst type includes low-level failures such as system panics and hardware faults (that is, failures that cause the entire server to be inoperable). These failures can be detected quickly. The second type of failures are related to data service. These types of failures take longer to detect. Fault monitor A fault daemon and the programs used to probe various parts of data services. Fibre channel connections Fibre connections connect the nodes with the SPARCstorage Arrays.


Golden mediator In Solstice DiskSuite congurations, the in-core state of a mediator host set if specic conditions were met when the mediator data was last updated. The state permits take operations to proceed even when a quorum of mediator hosts is not available. HA Administrative le system A special le system created on each logical host when Sun Cluster is rst installed. It is used by Sun Cluster and by layered data services to store copies of their administrative data. Heartbeat A periodic message sent between the several membership monitors to each other. Lack of a heartbeat after a specied interval and number of retries may trigger a takeover. Highly available data service A data service that appears to remain continuously available, despite single-point failures of server hardware or software components. Host A physical machine that can be part of a Sun cluster. In Sun Cluster documentation, host is synonymous with node. Hot standby server In an N+1 conguration, the node that is connected to all multihost disks in the cluster. The hot standby is also the administrative node. If one or more active nodes fail, the data services move from the failed node to the hot standby. However, there is no requirement that the +1 node cannot run data services in normal operation. Local disks Disks attached to a HA server but not included in a diskset. The local disks contain the Solaris distribution and the Sun Cluster and volume management software packages. Local disks must not contain data exported by the Sun Cluster data service. Logical host A set of resources that moves as a unit between HA servers. In the current product, the resources include a collection of network host names and their associated IP addresses plus a group of disks (a diskset). Each logical host is mastered by one physical host at a time.

Logical host name The name assigned to one of the logical network interfaces. A logical host name is used by clients on the network to refer to the location of data and data services. The logical host name is the name for a path to the logical host. Because a host may be on multiple networks, there may be multiple logical host names for a single logical host. Logical network interface In the Internet architecture, a host may have one or more IP addresses. HA congures additional logical network interfaces to establish a mapping between several logical network interfaces and a single physical network interface. This allows a single physical network interface to respond to multiple logical network interfaces. This also enables the IP address to move from one HA server to the other in the event of a takeover or haswitch(1M), without requiring additional hardware interfaces. Master The server with exclusive read and write access to a diskset. The current master host for the diskset runs the data service and has the logical IP addresses mapped to its Ethernet address. Mediator In a dual-string conguration, provides a third vote in determining whether access to the metadevice state database replicas can be granted or must be denied. Used only when exactly half of the metadevice state database replicas are accessible. Mediator host A host that is acting in the capacity of a third vote by running the rpc.metamed(1M) daemon and that has been added to a diskset. Mediator quorum The condition achieved when half + 1 of the mediator hosts are accessible. Membership monitor A process running on all HA servers that monitors the servers. The membership monitor sends and receives heartbeats to its sibling hosts. The monitor can initiate a takeover if the heartbeat stops. It also keeps track of which servers are active.


Metadevice A group of components accessed as a single logical device by concatenating, striping, mirroring, or logging the physical devices. Metadevices are sometimes called pseudo devices. Metadevice state database Information kept in nonvolatile storage (on disk) for preserving the state and conguration of metadevices. Metadevice state database replica A copy of the state database. Keeping multiple copies of the state database protects against the loss of state and conguration information. This information is critical to all metadevice operations. Mirroring Replicating all writes made to a single logical device (the mirror) to multiple devices (the submirrors), while distributing read operations. This provides data redundancy in the event of a failure. Multihomed host A host that is on more than one public network. Multihost disk A disk congured for potential accessibility from multiple servers. Sun Cluster software enables data on a multihost disk to be exported to network clients via a highly available data service. Multihost disk expansion unit See Disk expansion unit. N to N topology All nodes are directly connected to a set of shared disks. N+1 topology Some number (N) active servers and one (+1) hot-standby server. The active servers provide on-going data services and the hot-standby server takes over data service processing if one or more of the active servers fail. Node A physical machine that can be part of a Sun cluster. In Sun Cluster documentation, it is synonymous with host or node.


Nodelock The mechanism used in greater than two-node clusters using Cluster Volume Manager or Sun StorEdge Volume Manager to failure fence failed nodes. Parallel database A single database image that can be accessed concurrently through multiple hosts by multiple users. Partial failover Failing over a subset of logical hosts mastered by a single physical host. Potential master Any physical host that is capable of mastering a particular logical host. Primary logical host name The name by which a logical host is known on the primary public network. Primary physical host name The name by which a physical host is known on the primary public network. Primary public network A name used to identify the rst public network. Private links The private network between nodes used to send and receive heartbeats between members of a server set. Quorum device In SSVM or CVM congurations, the system votes by majority quorum to prevent network partitioning. Since it is impossible for two nodes to vote by majority quorum, a quorum device is included in the voting. This device could be either a controller or a disk. Replica See metadevice state database replica. Replica quorum A Solstice DiskSuite concept; the condition achieved when HALF + 1 of the metadevice state database replicas are accessible.


Ring topology One primary and one backup server is specied for each set of data services. Scalable Coherent Interface A high speed interconnect used as a private network interface. Scalable topology See N to N topology. Secondary logical host name The name by which a logical host is known on a secondary public network. Secondary physical host name The name by which a physical host is known on a secondary public network. Secondary public network A name used to identify the second or subsequent public networks. Server A physical machine that can be part of a Sun cluster. In Sun Cluster documentation, it is synonymous with host or node. Sibling host One of the physical servers in a symmetric HA conguration. Solstice DiskSuite A software product that provides data reliability through disk striping, concatenation, mirroring, UFS logging, dynamic growth of metadevices and le systems, and metadevice state database replicas. Stripe Similar to concatenation, except the addressing of the component blocks is non-overlapped and interlaced on the slices (partitions), rather than placed sequentially. Striping is used to gain performance. By striping data across disks on separate controllers, multiple controllers can access data simultaneously. Submirror A metadevice that is part of a mirror. See also mirroring.


Sun Cluster Software and hardware that enables several machines to act as read-write data servers while acting as backups for each other. Switch Management Agent (SMA) The software component that manages sessions for the SCI and Ethernet links and switches. Switchover The coordinated moving of a logical host from one operational HA server to the other. A switchover is initiated by an administrator using the haswitch(1M) command. Symmetric conguration A two-node conguration where one server operates as the hotstandby server for the other. Takeover The automatic moving of a logical host from one HA server to another after a failure has been detected. The HA server that has the failure is forced to give up mastery of the logical host. Terminal Concentrator A device used to enable an administrative workstation to securely communicate with all nodes in the Sun Cluster. Trans device In Solstice DiskSuite congurations, a pseudo device responsible for managing the contents of a UFS log. UFS An acronym for the UNIX le system. UFS logging Recording UFS updates to a log (the logging device) before the updates are applied to the UFS (the master device). UFS logging device In Solstice DiskSuite congurations, the component of a transdevice that contains the UFS log. UFS master device In Solstice DiskSuite congurations, the component of a transdevice that contains the UFS le system.


Acronyms Glossary
AC Autocheck + Automatic Computer + Alternating Current API Application Program Interface ATM Adobe Typeface Manager + Asynchronous Transfer Mode + Automated Teller Machine CCDD Cluster Conguration Database Demon CCM Cluster Membership Monitor CD Carrier Detect + Change Directory + Collision Detection + Color Display + Compact Disk CDB Cluster Database CDE Common Desktop Environment + Complex Data Entry CCD Cluster Conguration Database CIS Cluster Interconnect System CMM Cluster Membership Monitor


CPU Central Processing Unit CVM Cluster Volume Manager DAT Digital Audio Tape + Disk Array Technology DBMS Data Base Management System DBWR Database Writer DID DIsk ID DIMM Dual In-Line Memory Module DLM Distributed Lock Manager + Dynamic Link Module DLPI Data Link Provider Interface DNS Domain Naming System DR Dynamic Reconguration DRL Dirty Region Logging DRAM Dynamic Random Access Memory DSS Decision Support System DWS Data Warehousing System ECC Error Correcting Code + Error Checking and Correction ECF Enterprise Cluster Framework

EEPROM Electrically Erasable Programmable Read-Only Memory EVM Enterprise Volume Manager FCAL Fiber Channel Analog Loop + Fiber Channel ARbitrated Loop FCOM Fiber Channel Optical Module FDDI Fiber Digital Device Interface + Fiber Distributed Data Interface FF Failfast GBIC Gigabit Interface Converters GUI Graphical User Interface HA High Availability + Highly Available HADS HIghly Available Data Services HA-API High Availability - Application Program Interface HA-NFS Highly Availability - Network File System [Sun] HTTP HyperText Transfer Protocol ID Identication + Identier IDLM Integrated Distributed Lock Manager I/O Input/Output IP Internet Protocol

LAN Local Area Network LANE Local Area Network Emulation LDAP Lightweight Directory Access Protocol LGWR Log Writer LOFS Loopback File System LP Local Probes MAC Media Access Control MI Multi-initiator MIB Management Information Base MIS Management Information System + Multimedia Information Sources [Internet] MPP Massively Parallel Processing + Message Posting Protocol + Message Processing Program NAFO Network Adapter Failover NFS Network File System [Sun] NIS Network Information Service [Unix] NTP Network Time Protocol NVRAM Non-Volatile Random Access Memory


OBP OpenBoot PROM OGMS Oracle Group Membership Services OLAP On-Line Analytical Processing OLTP On-line Transaction Processing OPS Oracle Parallel Server + Open Proling Standard + Operations OS Operating System PCI Peripheral Component Interconnect/Interface PDB Parallel DataBase PGA Program Global Area PMON Process Monitor PNM Public Network Management PROM Programmable Read Only Memory RAID Redundant Arrays of Independent Disks + Redundant Arrays of Independent Drives + Redundant Arrays of Inexpensive Disks RDBMS Relational Database Management System RP Remote Probes ROM Read Only Memory


RPC Remote Procedure Call RSM Redundant Storage Module SAP Service Access Point SCI Scalable Coherent Interface + Serial Communications Interface SCM Sun Cluster Manager SCSI Small Computer Systems Interface SDS Solstice DiskSuite SEC Sun Enterprise Cluster SENA Sun Enterprise Network Array SGA System Global Area SIMM Single In-line Memory Module SMA Switch Management Agent SMON System Monitor SNM SunNet Manager SNMP Simple Network Management Protocol SOC Serial Optical Channel SPARC Scalable Processor Architecture

SSA SPARCstorage Array + Serial Storage Architecture SSP System Service Processor SSVM Sun StorEdge Volume Manager TC Terminal Concentrator TCP Transmission Control Protocol TCP/IP Transmission Control Protocol/Internet Protocol TPE Twisted-Pair Ethernet UDLM Unix Distributed Lock Manager UDP User Data Protocol UFS Unix File System VM Virtual Machine + Virtual Memory+ Volume Manager VxFS Veritas File System VxVA Volume Manager Visual Administrator


Index
Symbols
.rhosts 3-15

C
cabling Network Terminal Server 2-10 CCD database 11-16 cluster configuration 9-6, 9-7 disabling 4-31 volume creation 4-31, 9-13 CCD quorum disabling 9-18 ccdadm 9-16 ccp cluster administration tools 3-19 CDB database cluster configuration 9-5 cdb Format 9-5 client packages 3-6 cluster administration interface administration workstation 2-6 cluster administration tools ccp 3-19 cluster console 3-21 to 3-23 cluster console common window 3-22 cluster console windows 3-21 cluster control panel 3-20 cluster help tool 3-24 cluster configuration CCD database 9-6, 9-7 CDB database 9-5 hardware components

A
administration tools administration workstation 2-6 cluster control panel 3-20 administration workstation administration tools 2-6 cluster console 2-6 cluster administration interface 2-6 configuration files 3-16, 3-18 serial port connections 2-6 terminal concentrator 2-6 administration workstation environment access control 3-15 directory paths 3-13 host names 3-14 remote display 3-15 remote login 3-14 administrative file system creation 11-13 to 11-16

B
boot Network Terminal Server 2-9 boot disk encapsulated 7-23


public network interface system 1-6 terminal concentrator 1-6 hardware configurations administration workstation 1-6, 1-7 host systems 1-7 mirrored disk storage 1-6 private network interface 1-6 public network interface system 1-6 terminal concentrator 1-7 two hosts 1-6 cluster configuration information backup 9-4 cluster console administration tools administration workstation 2-6 cluster administration tools 3-21, 3-23 cluster console common window cluster administration tools 3-22 cluster console windows cluster administration tools 3-21 Cluster Control Panel 6-5, 6-8 adding applications 6-9 cluster console 6-10 startup 6-9 cluster control panel administration tools 3-20 cluster administration tools 3-20 cluster help tool cluster administration tools 3-24 cluster interconnect failures 13-18 error messages 13-19 Ethernet 13-20 Scalable Coherent Interface 13-20

cluster interconnect systems Ethernet 4-17 Scalable Coherent Interface Interconnect 4-17 cluster name configuration 5-12 server software 3-18 cluster software components recovery cluster membership monitor 13-6 Cluster Volume Manager 13-7 data service fault monitors 13-7 database management software 13-7 failfast driver 13-6 public network management 13-6 Solstice DiskSuite 13-7 Sun StorEdge Volume Manager 13-7 switch management agent 13-6 cluster software installation cluster interconnect system 4-16 Cluster Volume Manager 7-15, 8-9 disk groups 7-14 to 7-15, ?? to 8-9 initialization 7-6 recovery 13-7 Redundant Array of Inexpensive Disks 1-10 Sun Cluster software 3-4 volume management 5-10, 7-5 command confccdssa 9-12 dfstab 12-18 hactl 14-13, 14-14, 14-26, 14-27, 14-31 haget 14-23 to 14-24, 14-31 halockrun 14-28


hareg 14-12 custom data service registration 12-23 data service registration 12-21 global data service control 12-25 starting data service 12-25 stopping data service 12-25 haswitch 11-19, 14-10 hatimerun 14-29 initrootdg 7-30 metadb replica status 8-17, 8-18 metastat 8-16 nvramrc B-4 nvstore B-6 passwd 3-15 pmfadm 14-30 pnmd 10-5 public network management 4-27 pnmptor 10-19 pnmrtop 10-19 pnmset 10-14 public network management configuration 4-28 reconf_ener 13-11, 13-12, 13-15 scadmin 11-19, 13-12 scconf 5-29, 11-7, 11-13, 11-15 precautions 11-16 scdidadm 8-20 scinstall 5-9 software installation 3-7 sm_config cluster interconnect system 4-20 post-installation Scalable Coherent Interface interconnect 5-3 3 SUNWscins software installation 3-6

vxprint disk status 7-19 configuration cluster interconnect system point-to-point connections 4-20 Scalable Coherent Interface high-speed switch connections 4-21, 4-24 Scalable Coherent Interface identification 4-22, 4-23, 4-25 sm_config 4-20 cluster interconnect systems 4-17 cluster name 5-12 clustered pairs topology 4-5 data integrity failure fencing 5-21 node locking 5-22 partitioned cluster 5-25 quorum device 5-23 logical host 5-19 private network Ethernet interconnect 5-15 Scalable Coherent Interface interconnect 5-14 public network 5-16 to ?? quorum device ring topology 5-24 StorEdge Volume Manage 5-24 ring topology 4-6 Scalable Coherent Interface 5-28 scalable topology 4-9 volume management 7-4, 8-4 Solstice DiskSuite 5-13 consistency data propagation 9-8 database consistency checking 9-10 database majority 9-10 custom data service registration hareg command 12-23


D
data service 12-25 physical hostname 14-7 client server 14-6 dependencies 14-6, 14-21 global control hareg command 12-25 highly available 1-2 logical host IP addresses 14-7 multi-homed hosts 14-7 registration hareg command 12-21 starting hareg command 12-25 unconfiguring hareg command 12-24 definitions ABORT methods 14-10 cluster administration interface 2-5 cluster console 2-6 cluster membership monitor 13-6 clustered pairs topology 4-5 disk groups 7-14, 8-8 distributed lock management 7-15 failfast driver 13-6 FM_CHECK method 14-13 FM_INIT method 14-13 FM_STOP method 14-13 hactl command 14-26 haget command 14-23 halockrun command 14-28 hatimerun command 14-29 High Availability-Network File System 12-4 interface groups feature 10-9 logical host 5-18, 11-4 multi-homed hosts 14-7 N+1 topology 4-7 network trunking 10-9 partitioned cluster 5-25 pmfadm command 14-30 Public Network Management

software 10-4 public network management software 4-26 quorum devices 4-10 ring topology 4-6 scalable topology 4-9 START methods 12-6, 14-9 STOP methods 12-6, 14-10 switch management agent 13-6 terminal concentrator 2-4, 2-6 dfstab file High Availability-Network File System support files 12-20 directory paths, needed by Sun Cluster 5-31 disk groups Cluster Volume Manager 7-14, 7-15, 8-9 Sun StorEdge Volume Manager 7-14, 7-16 disk identification pseudo driver configuring 8-20 disksets planning the layout 8-32 Solstice DiskSuite 7-16 Domain Naming System 1-16 high availability support 1-16 drivers vxio post-installation 7-23, 7-25, 7-29 dual strings survival requirements 8-21

E
error messages cluster interconnect failures 13-19 failfast driver 13-10 Ethernet interconnect configuration private network 5-15


F
failfast driver 13-8 to 13-10 error messages 13-10 failure fencing configuration data integrity 5-21 failure recovery array controller 1-24 cluster interconnect 1-24 configuration database file 1-24 critical process 1-24 disk drive 1-24 fibre channel 1-24 node 1-24 fault monitoring process nfs_local_start 12-17 nfs_mon 12-17 nfs_probe_loghost 12-17 fault probes local 12-12 giveaway 12-13 remote 12-12, 12-14

G
giveaway 14-15

H
hads libraries 14-32 haoracle(1M) command, usage 15-34 hareg command unconfiguring data service 12-24 hastat 6-11 cluster error messages 6-17 data service status 6-16 general cluster status 6-12 logical host configuration 6-13 private network status 6-14 public network status 6-15 high availability

support HyperText Transfer Protocol 1-16 Lightweight Directory Access Protocol 1-16 Network File System 1-16 Oracle Parallel Server 1-16 Service Access Point 1-16 High Availability Server Sun Enterprise Cluster 1-1 high availability support high availability-application protocol interface 1-16 High Availability-Network File System components 12-6 fault monitoring 12-10 start methods 12-7 stop methods 12-9 support issues Kerbos 12-5 local file system access 12-5 PC client support 12-5 PrestoServe support PrestoServe support High Availability-Network File System 12-5 security 12-5 Highly Available Data Service topology clustered pairs 4-5 N+1 4-7 ring 4-6, 4-7 hosts.equiv 3-15 hot swapping 1-9 HyperText Transfer Protocol high availability support 1-16

I
Informix Online XPS topology


shared-nothing 4-8 initialization Cluster Volume Manager 7-6 Solstice DiskSuite 8-6 Sun StorEdge Volume Manager 7-6 installation data integrity 5-20

L
Lightweight Directory Access Protocol high availability support 1-16 logical host components 11-6 configuration basic 11-10 cascading failover 11-11 disabling automatic switchback 11-12 creation 11-7 to 11-9 deletion 11-9 design considerations 11-5 logical host file system adding 11-18 vfstab file 11-17, 11-18

14-17, 14-22 STOP 14-7, 14-10, 14-11, 14-12, 14-17, 14-18, 14-22 STOP_NET 14-11, 14-17, 14-22 Multi-Initiator Small Computer Systems Interface changing individual initiator identification B-5 drive firmware B-3 initiator identification B-3

N
network adapter failover groups public network management 4-26 Network File System high availability support 1-16 Network Terminal Server boot 2-7, 2-8 boot sequence 2-9 cabling 2-10 load parameters 2-9 node locking 4-14 configuration data integrity 5-22 non-volatile random access memory terminal concentrator configuration 2-9

M
metadevices state database replicas, configuring 8-19 methods 14-5, 14-8 ABORT 14-10, 14-11, 14-22 ABORT_NET 14-10, 14-22 fault monitoring 14-12 FM_CHECK 14-12, 14-13, 14-14, 14-16 FM_INIT 14-12, 14-13 FM_START 14-12, 14-13 FM_STOP 14-12, 14-13 net 14-11 START 14-7, 14-9 to 14-12, 14-17, 14-18, 14-22 START_NET 14-11, 14-12,

O
operating system load terminal concentrator 2-9 Oracle pfile 15-34 Oracle Parallel Server high availability support 1-16 topology clustered pairs 4-5


P
package set client 3-6 parallel database features and components 1-1 Sun Enterprise 1-1 partitioned cluster configuration data integrity 5-25 post-installation 5-29, 5-30 directory paths 5-31 Scalable Coherent Interface interconnect configuration 5-32 scconf command 5-29 vxio drivers 7-23, 7-25, 7-29 private region 7-7 Public Network Management interface support 10-9 network adapter failover 10-4 public network management configuration pnmset command 4-28 Ethernet 10-8 media access control 10-8 monitoring routines DETERMINE_NET_FAILU RE 10-13 FAILOVER 10-12 TEST 10-11 remote procedure call 10-8 support issues interface groups feature 10-9 network trunking 10-9

configuration 5-24

R
RAID Technology 1-10 reconfiguration independent 13-13 steps 13-14 to 13-16, ?? to 13-17 triggering 13-13 volume D-17 Redundant Array of Inexpensive Disks 1-10 redundant mirrored volumes 1-10 Solstice DiskSuite 1-10 StorEdge Volume Manager 1-10 replica requirement 8-7 replicas, See metadevices, state database replicas rootdg Cluster Volume Manager 7-22 Sun StorEdge Volume Manager 7-22

S
scadmin 6-6 adding nodes 6-7 command options 6-7 removing nodes 6-7 starting nodes 6-7 scalability (RAS) 1-9 Scalable Coherent Interface configuration 5-28 Scalable Coherent Interface interconnect configuration post-installation 5-32 private network 5-14 post-installation sm_config 5-33 serial port connections administration workstation 2-6

Q
quorum device configuration data integrity 5-23 ring topology 4-13 ring topology configuration 5-24 scalable topology 4-14, 4-15 StorEdge Volume Manager


server packages framework 5-6 highly available data service support 5-7 highly available database support 5-8 Oracle Parallel Server support 5-8 Scalable Coherent Interface interconnect support 5-6 Solstice DiskSuite support 5-7 Service Access Point high availability support 1-16 setup port terminal concentrator 2-9 setup programs terminal concentrator 2-9 shared CCD checkpointing 9-17 creating 9-13 disabling 9-15 restoring 9-17 verifying consistency 9-16 Simple Network Management Protocol Management Information Base tables 6-28 traps 6-29, 6-30 Small Computer Systems Interface quorum device 4-11 software distribution 6-4, 7-4, 8-4, 13-4, D-2 software installation options 3-8 package dependencies 3-6 scinstall command 3-7 server option 3-10 Solaris volume management 3-4 Solstice DiskSuite 1-10 configuration local metadevice state database replicas 8-19 configuration disk

identification pseudo driver 8-20 disksets 7-16 initialization 8-6 replicas 8-7 planning diskset layout 8-32 post-installation configure disk ID driver 8-20 configure mediators 8-21 configure state database replicas 8-19 recovery 13-7 Sun Cluster software 3-4 volume management 5-10, 8-5 configuration 5-13 stopping hareg command 12-25 StorEdge Volume Manager 7-16 Redundant Array of Inexpensive Disks Cluster Volume Manager 1-10 volume management 5-10 Sun Cluster directory paths needed 5-31 high availability products supported 1-16 parallel database products supported 1-16 server software configuration support 5-4 licensing 5-8 Sun Enterprise Cluster hardware 3-4 Sun Cluster Manager configuration display 6-23 display 6-20 event viewer 6-24 Graphical User Interface 6-1 help display 6-26 log filter 6-25 startup 6-19 Sun Cluster software


Cluster Volume Manager 3-4 Solstice DiskSuite 3-4 Sun StorEdge Volume Manager 3-4 Sun Enterprise Cluster High Availability Server 1-1 Sun Enterprise Cluster hardware Sun Cluster software 3-4 Sun StorEdge Volume Manager disk groups 7-14, 7-16 initialization 7-6 recovery 13-7 Sun Cluster software 3-4 volume management 7-5 symmetric multiprocessing (SMP) 1-9

T
takeaway 14-16 terminal concentrator administration workstation 2-6 cable length 2-20, 2-21 configuration 2-11 to 2-14 operating system load 2-9 setup port 2-9 setup programs 2-9 topology 4-4, 4-5 clustered pairs 4-5 Highly Available Data Service 4-5 Oracle Parallel Server 4-5 N+1 4-7 Highly Available Data Service 4-7 ring 4-6 Highly Available Data Service 4-6, 4-7 scalable 4-9 shared-nothing 4-8 Informix Online XPS 4-8

File System support files 12-19, 12-20 volume management Cluster Volume Manager 5-10, 7-5 Oracle Parallel Server 5-11 configuration 5-11 disk status 7-19 replica status 8-17, 8-18 Solaris 3-4 Solstice DiskSuite 5-10, 8-5 status 7-17, 8-15 StorEdge Volume Manager 5-10 Sun StorEdge Volume Manager 7-5 volume status 7-18 to 8-16 vxprint 7-18 volume reconfiguration D-17 vxconfigd D-18 vxprint volume status 7-18

V
vfstab file High Availability-Network

