
TruCluster Available Server

Configuration and Management


Course Guide
Order Number: EY-V355E-SG.0001

Digital Equipment Corporation October 1996.


This document is confidential and proprietary and is the property of Digital Equipment
Corporation.
The information in this document is subject to change without notice and should not
be construed as a commitment by Digital Equipment Corporation. Digital Equipment
Corporation assumes no responsibility for any errors that may appear in this document.
Possession, use, duplication, or dissemination of the software described in this
documentation is authorized only pursuant to a valid written license from Digital or
the third-party owner of the software copyright.
No responsibility is assumed for the use or reliability of software on equipment that is
not supplied by Digital Equipment Corporation or its affiliated companies.
Restricted Rights: Use, duplication, or disclosure by the U.S. Government is subject to
restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and
Computer Software clause.
All Rights Reserved.
Printed in U.S.A.
AdvantageCluster, AlphaGeneration, AlphaServer, AXP, Bookreader, CDA, DEC,
DECevent, DECnet, DECnsr, DECsafe, DECwindows, Digital, Digital UNIX, HSC,
LAT, LinkWorks, OpenVMS, PATHWORKS, POLYCENTER, PrintServer, StorageWorks,
TruCluster Software, TURBOchannel, ULTRIX, VAX, VAXcluster, VAX Notes, VMS,
VMScluster, XMI and the DIGITAL logo are trademarks of Digital Equipment
Corporation.
AIX and IBM are registered trademarks of International Business Machines
Corporation. AppleTalk is a registered trademark of Apple Computer, Inc. Global
Knowledge Network and the Global Knowledge Network logo are trademarks of Global
Knowledge Network, Inc. Hewlett-Packard, HP and HP-UX are registered trademarks
of Hewlett-Packard Company. MEMORY CHANNEL is a trademark of Encore Computer
Corporation. Microsoft is a registered trademark of Microsoft Corporation. MIPS is
a trademark of MIPS Computer Systems, Inc. Motif, OSF and OSF/1 are registered
trademarks of the Open Software Foundation. NFS, NEWS, Solaris and Sun are
registered trademarks of Sun Microsystems, Inc. Novell and NetWare are registered
trademarks of Novell, Inc. ORACLE is a registered trademark of Oracle Corporation.
ORACLE Parallel Server and ORACLE7 are trademarks of Oracle Corporation.
POSIX is a registered trademark of IEEE. PostScript is a registered trademark of
Adobe Systems, Inc. Sony is a registered trademark of Sony Corporation. SunOS
is a trademark of Sun Microsystems, Inc. UNIX is a registered trademark licensed
exclusively through X/Open Company Ltd. Windows and Windows NT are trademarks
of Microsoft Corporation. X/Open is a trademark of X/Open Company Ltd. X Window
System is a trademark of the Massachusetts Institute of Technology.

This document was prepared using VAX DOCUMENT Version 2.1.

Contents
About This Course . . . . .  xix

1 Introducing TruCluster Available Server Configuration and Management

    About This Chapter . . . . .  1-2
        Introduction . . . . .  1-2
        Objectives . . . . .  1-3
        Resources . . . . .  1-3
    Describing the TruCluster Available Server Product . . . . .  1-4
        Overview . . . . .  1-4
        Product Position . . . . .  1-5
    Configuring the TruCluster Available Server . . . . .  1-6
        Overview . . . . .  1-6
        Hardware Requirements . . . . .  1-6
        Software Requirements . . . . .  1-7
        Sample Available Server Configuration . . . . .  1-7
    Presenting the TruCluster Software . . . . .  1-9
        Software Components . . . . .  1-9
        ASE Services . . . . .  1-10
    Planning TruCluster Available Server Configurations . . . . .  1-11
        Overview . . . . .  1-11
        Network Configuration . . . . .  1-11
        Storage Configuration . . . . .  1-12
        Service Availability Configuration . . . . .  1-12
    Determining Configuration and Maintenance Phases . . . . .  1-14
        Overview . . . . .  1-14
        Planning the Available Server Configuration . . . . .  1-15
        Configuring ASE Hardware . . . . .  1-15
        Installing and Setting Up the Base Operating System . . . . .  1-16
        Installing the TruCluster Software . . . . .  1-16
        Configuring ASE Services . . . . .  1-16
        Testing the TruCluster Software Failover Sequences . . . . .  1-16
        Monitoring and Managing Available Server Configurations . . . . .  1-17
        Troubleshooting an Existing Available Server Configuration . . . . .  1-17
    Summary . . . . .  1-18
        Introduction to the Available Server Software . . . . .  1-18
        Configuring the Available Server Software . . . . .  1-18
        Presenting the TruCluster Software . . . . .  1-18
        Planning Available Server Configurations . . . . .  1-19
        Determining Configuration and Maintenance Phases . . . . .  1-19
    Exercises . . . . .  1-20
        Describing the TruCluster Available Server Software Product: Exercise . . . . .  1-20


        Describing the TruCluster Available Server Software Product: Solution . . . . .  1-20
        Configuring the TruCluster Software: Exercise . . . . .  1-20
        Configuring the TruCluster Software: Solution . . . . .  1-20
        Presenting the TruCluster Software: Exercise . . . . .  1-21
        Presenting the TruCluster Software: Solution . . . . .  1-21
        Planning Available Server Configurations: Exercise . . . . .  1-21
        Planning Available Server Configurations: Solution . . . . .  1-21
        Determining Configuration and Maintenance Phases: Exercise . . . . .  1-22
        Determining Configuration and Maintenance Phases: Solution . . . . .  1-22

2 Understanding TruCluster Software Interactions

    About This Chapter . . . . .  2-2
        Introduction . . . . .  2-2
        Objectives . . . . .  2-2
        Resources . . . . .  2-2
    Introducing Highly Available Services . . . . .  2-3
        Overview . . . . .  2-3
        Independence of Services from Servers . . . . .  2-3
        Action Scripts . . . . .  2-4
    Introducing the TruCluster Software Components . . . . .  2-5
        Overview . . . . .  2-5
        TruCluster Software Components . . . . .  2-5
        TruCluster Software Component Interaction . . . . .  2-7
    Understanding TruCluster Failure Detection and Response . . . . .  2-9
        Overview . . . . .  2-9
        Failure Events that Trigger a Response . . . . .  2-9
        Member Node Failure . . . . .  2-10
        SCSI Bus Failures . . . . .  2-11
        Critical SCSI Path Failure . . . . .  2-11
        Device Failure . . . . .  2-12
        ASE_PARTIAL_MIRRORING Parameter . . . . .  2-13
        Network Failures . . . . .  2-13
        Network Interface Failure Response . . . . .  2-14
        Network Partition Response . . . . .  2-15
        Monitored Network Failures . . . . .  2-15
        Service Failover . . . . .  2-15
        Reserving Devices . . . . .  2-17
        Choosing a New Director . . . . .  2-18
        Action Script Errors . . . . .  2-18
        LSM and TruCluster Failover . . . . .  2-19
    Summary . . . . .  2-20
        Introducing Highly Available Services . . . . .  2-20
        Introducing the TruCluster Software Components . . . . .  2-20
        Understanding TruCluster Software Failure Detection and Response . . . . .  2-21
    Exercises . . . . .  2-22
        Introducing Highly Available Services: Exercise . . . . .  2-22
        Introducing Highly Available Services: Solution . . . . .  2-22
        Introducing the TruCluster Software Components: Exercise . . . . .  2-22
        Introducing the TruCluster Software Components: Solution . . . . .  2-22
        TruCluster Software Failure Detection and Response: Exercise . . . . .  2-23
        TruCluster Software Failure Detection and Response: Solution . . . . .  2-23


3 Configuring TruCluster Available Server Hardware

    About This Chapter . . . . .  3-2
        Introduction . . . . .  3-2
        Objectives . . . . .  3-2
        Resources . . . . .  3-2
    Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions . . . . .  3-3
        Overview . . . . .  3-3
        Rules and Restrictions . . . . .  3-3
        SCSI Bus Overview . . . . .  3-5
    Determining Available Server Hardware Components . . . . .  3-10
        Overview . . . . .  3-10
        TruCluster Available Server Supported Systems . . . . .  3-10
        DECsafe Supported SCSI Controllers . . . . .  3-10
        BA350, BA353, and BA356 Storage Expansion Units . . . . .  3-11
        Supported Controllers for DEC RAID Subsystems . . . . .  3-14
        Supported Disk Devices . . . . .  3-15
        Signal Converters . . . . .  3-15
        SCSI Cables and Terminators for Available Server Configurations . . . . .  3-20
        Network Options . . . . .  3-23
    Configuring TruCluster Available Server Hardware . . . . .  3-25
        Overview . . . . .  3-25
        Installing the Network Interfaces . . . . .  3-25
        Firmware Update . . . . .  3-26
        Starting Your TruCluster Available Server Configuration . . . . .  3-26
        Setting Up a Single-Ended Available Server Configuration for Use with PMAZCs and a BA350 or BA353 . . . . .  3-27
        Setting Up a Differential Available Server Configuration for Use with PMAZCs and a BA350 or BA353 . . . . .  3-31
        Setting Up a Differential Available Server Configuration for Use with PMAZCs and a BA356 . . . . .  3-35
        Setting Up an Available Server Configuration for Use with PMAZCs and an HSZ40 . . . . .  3-38
        PMAZC Dual SCSI Module Jumpers . . . . .  3-41
        Verifying and Setting PMAZC and KZTSA SCSI ID and Bus Speed . . . . .  3-41
        Setting Up an Available Server Configuration Using a KZTSA TURBOchannel to SCSI Adapter . . . . .  3-44
        Setting Up an Available Server Configuration with KZMSA SCSI Controllers . . . . .  3-50
        Preparing a KZMSA for Use in an Available Server Environment . . . . .  3-56
        Setting Up an Available Server Configuration Using KZPSA PCI to SCSI Adapters . . . . .  3-61
        Setting KZPSA SCSI ID and Bus Speed . . . . .  3-67
        Setting Up an Available Server Configuration with Mixed Adapter Types and a BA350, BA353, or BA356 . . . . .  3-68
        Setting Up an Available Server Configuration with Mixed Adapter Types and an HSZ40 . . . . .  3-70
    Summary . . . . .  3-73
        Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions . . . . .  3-73
        Determining Available Server Hardware Components . . . . .  3-73
        Configuring TruCluster Available Server Hardware . . . . .  3-74
    Exercises . . . . .  3-76
        Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions: Exercise . . . . .  3-76
        Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions: Solution . . . . .  3-77
        Determining Available Server Hardware Components: Exercise . . . . .  3-77
        Determining Available Server Hardware Components: Solution . . . . .  3-78
        Configuring TruCluster Available Server Hardware: Exercise . . . . .  3-79
        Configuring TruCluster Available Server Hardware: Solution . . . . .  3-79

4 Installing TruCluster Software

    About This Chapter . . . . .  4-2
        Introduction . . . . .  4-2
        Objectives . . . . .  4-2
        Resources . . . . .  4-2
    Performing Preliminary Setup Tasks . . . . .  4-3
        Overview . . . . .  4-3
        Hardware . . . . .  4-3
        Subsets Required for TruCluster Available Server Operation . . . . .  4-3
        Before Installing TruCluster Software . . . . .  4-4
        Network Services . . . . .  4-5
    Preparing to Install TruCluster Software . . . . .  4-6
        Overview . . . . .  4-6
        Choosing the TruCluster Software Installation Procedure . . . . .  4-6
        Setting Up an ASE for the First Time . . . . .  4-8
        Performing a Rolling Upgrade . . . . .  4-9
        Simultaneous Upgrade . . . . .  4-13
        Adding a Member System to an Existing ASE with ASE V1.4 Operating Software . . . . .  4-16
    Installing TruCluster Software . . . . .  4-17
        Overview . . . . .  4-17
        Installing TruCluster Available Server Software Version 1.4 . . . . .  4-17
    Summary . . . . .  4-22
        Performing Preliminary Setup Tasks . . . . .  4-22
        Preparing to Install TruCluster Software . . . . .  4-22
        Installing TruCluster Software . . . . .  4-23
    Exercises . . . . .  4-24
        Performing Preliminary Setup Tasks: Exercise . . . . .  4-24
        Performing Preliminary Setup Tasks: Solution . . . . .  4-24
        Preparing to Install TruCluster Software: Exercise . . . . .  4-25
        Preparing to Install TruCluster Software: Solution . . . . .  4-26
        Installing TruCluster Software: Exercise . . . . .  4-27
        Installing TruCluster Software: Solution . . . . .  4-27

5 Setting Up and Managing ASE Members

    About This Chapter . . . . .  5-2
        Introduction . . . . .  5-2
        Objectives . . . . .  5-2
        Resources . . . . .  5-2
    Introducing the asemgr Utility . . . . .  5-3
        Overview . . . . .  5-3
        asemgr Command Syntax . . . . .  5-3
        Running Multiple Instances of the asemgr . . . . .  5-4
    Setting Up and Managing Members . . . . .  5-5
        Overview . . . . .  5-5
        Using asemgr the First Time . . . . .  5-5
        Initializing ASE Member Systems . . . . .  5-6
        Using asemgr to Manage Members . . . . .  5-7
        Adding a Member . . . . .  5-8
        Deleting a Member . . . . .  5-8
        Managing ASE Networks . . . . .  5-9
        Displaying ASE Member Status . . . . .  5-14
        Resetting the TruCluster Software Daemons . . . . .  5-15
        TruCluster Software Daemon Scheduling . . . . .  5-15
    Using TruCluster Event Logging . . . . .  5-17
        Overview . . . . .  5-17
        Starting the Logger . . . . .  5-17
        Stopping the Logger . . . . .  5-17
        Setting System Logging . . . . .  5-17
        Displaying Logger Location . . . . .  5-18
        Setting Log Level . . . . .  5-19
        Using an Alert Script . . . . .  5-20
        Examining Log Messages . . . . .  5-22
    Summary . . . . .  5-24
        Introducing the asemgr Utility . . . . .  5-24
        Setting Up and Managing Members . . . . .  5-24
        Using TruCluster Software Event Logging . . . . .  5-24
    Exercises . . . . .  5-25
        Introducing the asemgr Utility: Exercise . . . . .  5-25
        Introducing the asemgr Utility: Solution . . . . .  5-25
        Using asemgr to Manage Members: Exercise . . . . .  5-25
        Using asemgr to Manage Members: Solution . . . . .  5-25
        Using TruCluster Software Event Logging: Exercise . . . . .  5-26
        Using TruCluster Software Event Logging: Solution . . . . .  5-26

6 Writing and Debugging Action Scripts

    About This Chapter . . . . .  6-2
        Introduction . . . . .  6-2
        Objectives . . . . .  6-2
        Resources . . . . .  6-2
    Introducing Action Scripts . . . . .  6-3
        Overview . . . . .  6-3
        Types of Action Scripts . . . . .  6-3
        Available Services and Scripts . . . . .  6-4
        Script Exit Codes . . . . .  6-4
        Script Output . . . . .  6-4
        Skeleton Scripts . . . . .  6-5
        Start and Stop Scripts . . . . .  6-7
    Creating Action Scripts . . . . .  6-9
        Overview . . . . .  6-9
        Methods to Create Scripts . . . . .  6-9
        Specifying Your Own Script . . . . .  6-10
        Editing the Default Script . . . . .  6-10
        Pointing to an External Script . . . . .  6-11
        Additional Script Information . . . . .  6-12
    Testing and Debugging Action Scripts . . . . .  6-13
        Overview . . . . .  6-13
        Test First . . . . .  6-13
        Debugging Scripts in ASE . . . . .  6-13
    Summary . . . . .  6-14
        Introducing Action Scripts . . . . .  6-14
        Creating Action Scripts . . . . .  6-14
        Testing and Debugging Action Scripts . . . . .  6-14
    Exercises . . . . .  6-15
        Introducing Action Scripts: Exercise . . . . .  6-15
        Introducing Action Scripts: Solution . . . . .  6-15
        Creating Action Scripts: Exercise . . . . .  6-16
        Creating Action Scripts: Solution . . . . .  6-18
        Testing and Debugging Action Scripts: Exercise . . . . .  6-20
        Testing and Debugging Action Scripts: Solution . . . . .  6-20

7 Setting Up ASE Services

    About This Chapter . . . . .  7-2
        Introduction . . . . .  7-2
        Objectives . . . . .  7-2
        Resources . . . . .  7-2
    Understanding Highly Available Services . . . . .  7-3
        Overview . . . . .  7-3
        Introducing Supported Services . . . . .  7-3
        Describing Clients and Services . . . . .  7-3
        Setting Up a Service . . . . .  7-4
    Preparing to Set Up Services . . . . .  7-5
        Overview . . . . .  7-5
        Documentation . . . . .  7-5
        Automatic Service Placement Policy . . . . .  7-5
        Services and Disks . . . . .  7-6
        Using NFS . . . . .  7-6
        Using UFS . . . . .  7-6
        Using AdvFS . . . . .  7-6
        Quotas . . . . .  7-7
        Using LSM . . . . .  7-7
        Installing the Application . . . . .  7-8
        Configuration Example . . . . .  7-8
    Setting Up NFS Services . . . . .  7-11
        Overview . . . . .  7-11
        Describing an NFS Service . . . . .  7-11
        Discussing the NFS Service Setup Procedure . . . . .  7-11
        Setting Up an NFS Service for a Public Directory . . . . .  7-12
        Discussing the /etc/exports.ase File . . . . .  7-17
        NFS Mail Service . . . . .  7-18
    Setting Up a Disk Service . . . . .  7-19
        Overview . . . . .  7-19
        Describing a Disk Service . . . . .  7-19
        Describing the Set Up Procedure for a Disk Service . . . . .  7-19
        Using a Network Alias . . . . .  7-20
        Setting Up a Disk Service for a Database Application . . . . .  7-21
    Setting Up a User-Defined Service . . . . .  7-27
        Overview . . . . .  7-27
        User-Defined Service Setup Procedure . . . . .  7-27
        Adding a User-Defined Service . . . . .  7-27
        User-Defined Login Service . . . . .  7-31
    Using asemgr to Manage Services . . . . .  7-32
        Overview . . . . .  7-32
        Managing Services Menu . . . . .  7-32
        Displaying Service Status . . . . .  7-33
        Relocating a Service . . . . .  7-35
        Modifying a Service . . . . .  7-36
    Summary . . . . .  7-41
        Understanding Highly Available Services . . . . .  7-41
        Preparing to Set Up Services . . . . .  7-41
        Setting Up NFS Services . . . . .  7-41
        Setting Up a Disk Service . . . . .  7-41
        Setting Up a User-Defined Service . . . . .  7-41
        Using asemgr to Manage Services . . . . .  7-42
    Exercises . . . . .  7-43
        Understanding Highly Available Services: Exercise . . . . .  7-43
        Understanding Highly Available Services: Solution . . . . .  7-43
        Preparing to Set Up Services: Exercise . . . . .  7-43
        Preparing to Set Up Services: Solution . . . . .  7-43
        Setting Up NFS Services: Exercise . . . . .  7-44
        Setting Up NFS Services: Solution . . . . .  7-44
        Setting Up a Disk Service: Exercise . . . . .  7-44
        Setting Up a Disk Service: Solution . . . . .  7-44
        Setting Up a User-Defined Service: Exercise . . . . .  7-44
        Setting Up a User-Defined Service: Solution . . . . .  7-44
        Using asemgr to Manage Services: Exercise . . . . .  7-45
        Using asemgr to Manage Services: Solution . . . . .  7-45

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

727
731
732
732
732
733
735
736
741
741
741
741
741
741
742
743
743
743
743
743
744
744
744
744
744
744
745
745

8 Using the Cluster Monitor

About This Chapter . . . . . 8-2
Introduction . . . . . 8-2
Objectives . . . . . 8-2
Resources . . . . . 8-2
Setting Up the Cluster Monitor . . . . . 8-3
Overview . . . . . 8-3
Setup Procedure . . . . . 8-3
Sample Setup Script . . . . . 8-4
Updating the Cluster Map . . . . . 8-4
Using the Cluster Monitor . . . . . 8-5
Overview . . . . . 8-5
Starting the Cluster Monitor . . . . . 8-6
Top View . . . . . 8-6
Device View . . . . . 8-7
Services View . . . . . 8-11
Launching Other Tools . . . . . 8-15
Overview . . . . . 8-15
Included Tools . . . . . 8-15
External Tools . . . . . 8-15
Monitoring Available Server Configurations with the Cluster Monitor . . . . . 8-16
Overview . . . . . 8-16
Top View . . . . . 8-16
Device View . . . . . 8-16

Services View . . . . . 8-17
What to Do when You See an Error . . . . . 8-17
Summary . . . . . 8-19
Setting Up the Cluster Monitor . . . . . 8-19
Using the Cluster Monitor . . . . . 8-19
Launching Other Tools . . . . . 8-19
Monitoring Available Server Configurations with the Cluster Monitor . . . . . 8-20
Exercises . . . . . 8-21
Setting Up the Cluster Monitor: Exercise . . . . . 8-21
Setting Up the Cluster Monitor: Solution . . . . . 8-21
Using the Cluster Monitor: Exercise . . . . . 8-22
Using the Cluster Monitor: Solution . . . . . 8-22
Launching Other Tools: Exercise . . . . . 8-23
Launching Other Tools: Solution . . . . . 8-23
Monitoring Available Server Configurations with the Cluster Monitor: Exercise . . . . . 8-23
Monitoring Available Server Configurations with the Cluster Monitor: Solution . . . . . 8-24

9 Testing, Recovering, and Maintaining TruCluster Configurations


About This Chapter . . . . . 9-2
Introduction . . . . . 9-2
Objectives . . . . . 9-2
Resources . . . . . 9-2
Performing TruCluster Testing Procedures . . . . . 9-3
Overview . . . . . 9-3
System Configuration Assumptions . . . . . 9-3
Observing System Response . . . . . 9-3
System Power Off Test . . . . . 9-4
DWZZA-AA Power Off . . . . . 9-5
Removing a Shared Disk . . . . . 9-5
Removing Power from BA350 . . . . . 9-6
Removing One Member from the Network . . . . . 9-7
Removing All Members from the Network . . . . . 9-7
Recovering from Failures in the ASE . . . . . 9-9
Instructional Strategy . . . . . 9-9
Overview . . . . . 9-9
LSM Mirrored Disk Replacement . . . . . 9-9
Obtaining LSM Disk Group Information . . . . . 9-10
Removing Faulty Disk from LSM Database . . . . . 9-11
Restoring the Partition Table . . . . . 9-12
Initializing the Disk for LSM . . . . . 9-12
Associating the New Disk . . . . . 9-12
Recovering the Plex . . . . . 9-12
Rereserving the Service . . . . . 9-13
Replacing a Nonmirrored Disk . . . . . 9-13
Unassigned Service . . . . . 9-13
Resetting TruCluster Daemons . . . . . 9-14
Performing Ongoing Maintenance Tasks . . . . . 9-15
Overview . . . . . 9-15
Changing Hardware Configuration . . . . . 9-15
Stopping and Restarting TruCluster Daemon Activity . . . . . 9-16
Adding and Removing Member Nodes . . . . . 9-16
Adding and Removing Storage Boxes . . . . . 9-16
Adding and Removing Disks . . . . . 9-17
Summary . . . . . 9-18
Performing TruCluster Software Testing Procedures . . . . . 9-18
Performing Disk Recovery Procedures . . . . . 9-18
Performing Ongoing Maintenance Tasks . . . . . 9-18
Exercises . . . . . 9-19
Member Node Failure: Exercise . . . . . 9-19
Member Node Failure: Solution . . . . . 9-19
Network Interface Test: Exercise . . . . . 9-19
Network Interface Test: Solution . . . . . 9-19
Recovering from Failures in the ASE: Exercise . . . . . 9-20
Recovering from Failures in the ASE: Solution . . . . . 9-20
Performing Ongoing Maintenance Tasks: Exercise . . . . . 9-20
Performing Ongoing Maintenance Tasks: Solution . . . . . 9-21

10 Troubleshooting TruCluster Configurations

About This Chapter . . . . . 10-2
Introduction . . . . . 10-2
Objectives . . . . . 10-2
Resources . . . . . 10-2
Introducing ASE Troubleshooting Techniques . . . . . 10-3
Overview . . . . . 10-3
Troubleshooting Topics . . . . . 10-3
Troubleshooting Sequence . . . . . 10-4
Interpreting TruCluster Error Log Messages . . . . . 10-5
Overview . . . . . 10-5
TruCluster Logger Daemon . . . . . 10-5
Interpreting Log Messages . . . . . 10-6
Alert Messages . . . . . 10-7
Learning Troubleshooting Procedures . . . . . 10-8
Diagnosing Active Systems . . . . . 10-8
Diagnosing Nonactive Systems . . . . . 10-10
Using System Monitoring Tools . . . . . 10-12
Overview . . . . . 10-12
Using asemgr to Monitor Member Status . . . . . 10-13
Using asemgr to Monitor Service Status . . . . . 10-15
Using asemgr to Monitor the Network Configuration . . . . . 10-15
Determining Host Adapter Settings . . . . . 10-16
Using the uerf Utility to Monitor SCSI Bus Errors . . . . . 10-16
Monitoring Daemons . . . . . 10-17
Monitoring the Network . . . . . 10-18
Monitoring Disk I/O with iostat . . . . . 10-19
Monitoring LSM Configurations . . . . . 10-19
Monitoring AdvFS Configurations . . . . . 10-19
Summary . . . . . 10-21
Introducing ASE Troubleshooting Techniques . . . . . 10-21
Interpreting TruCluster Software Error Log Messages . . . . . 10-21
Learning Troubleshooting Procedures . . . . . 10-22
Using System Monitoring Tools . . . . . 10-23
Exercises . . . . . 10-24
Introducing ASE Troubleshooting Techniques: Exercise . . . . . 10-24
Introducing ASE Troubleshooting Techniques: Solution . . . . . 10-24
Interpreting TruCluster Software Error Log Messages: Exercise . . . . . 10-24
Interpreting TruCluster Software Error Log Messages: Solution . . . . . 10-24
Diagnosing a Nonactive System: Exercise . . . . . 10-25
Diagnosing a Nonactive System: Solution . . . . . 10-25
Generating CAM Error Information: Exercise . . . . . 10-25
Generating CAM Error Information: Solution . . . . . 10-25
Monitoring TruCluster Daemons: Exercise . . . . . 10-26
Monitoring TruCluster Daemons: Solution . . . . . 10-26

11 Resolving Common TruCluster Problems

About This Chapter . . . . . 11-2
Introduction . . . . . 11-2
Objectives . . . . . 11-2
Resources . . . . . 11-2
Recognizing Common Problems and Their Symptoms . . . . . 11-3
Overview . . . . . 11-3
Improperly Configured SCSI Bus . . . . . 11-5
Host Adapter Failure . . . . . 11-6
Member Crash . . . . . 11-7
Storage Device Failure . . . . . 11-8
Network Interface Failure . . . . . 11-9
Network Partition . . . . . 11-10
Invalid Script Format . . . . . 11-12
Multiple asemgr Processes . . . . . 11-13
Removing Disk Without Updating asemgr . . . . . 11-14
NFS Service and ASE Member with Same Name . . . . . 11-15
Service Alias not in /etc/hosts on All Members . . . . . 11-16
ASEROUTING not Set in NFS Service . . . . . 11-17
ASE Member not Added to TruCluster Database . . . . . 11-18
LSM not Configured on New Member . . . . . 11-19
Known TruCluster Limitations . . . . . 11-19
Users Occupying Mount Points . . . . . 11-20
Non-TruCluster Processes with Higher Priority . . . . . 11-21
Using BC09 Cables with KZTSA Controller . . . . . 11-22
Applying TruCluster Configuration Guidelines . . . . . 11-23
TruCluster Configuration Guidelines . . . . . 11-23
General Hardware Configuration . . . . . 11-24
SCSI Bus . . . . . 11-25
Termination . . . . . 11-25
Host Adapters . . . . . 11-26
Disk Storage Enclosures . . . . . 11-26
Signal Converters . . . . . 11-27
Tri-link Connectors . . . . . 11-27
Network Connections . . . . . 11-27
TruCluster Software Installation . . . . . 11-28
General Software Configuration . . . . . 11-29
TruCluster Software Configuration . . . . . 11-30
Service Configuration . . . . . 11-31
Disk Services . . . . . 11-32
LSM Configuration . . . . . 11-33
AdvFS Configuration . . . . . 11-34
Action Scripts . . . . . 11-35
Summary . . . . . 11-36
Recognizing Common Problems . . . . . 11-36
Applying TruCluster Configuration Guidelines . . . . . 11-36
Exercises . . . . . 11-37
TruCluster Message Interpretation: Exercise . . . . . 11-37
TruCluster Message Interpretation: Solution . . . . . 11-37
Problem Relocating Service: Exercise . . . . . 11-37
Problem Relocating Service: Solution . . . . . 11-37
SCSI ID Limits: Exercise . . . . . 11-37
SCSI ID Limits: Solution . . . . . 11-37
Applying TruCluster Configuration Guidelines: Exercise . . . . . 11-38
Applying TruCluster Configuration Guidelines: Solution . . . . . 11-38

12 Test

Questions . . . . . 12-2
Answers . . . . . 12-13

Index
Examples
3-1 Displaying DEC 3000 Configuration . . . . . 3-42
3-2 Displaying PMAZC Bus Speed and SCSI ID . . . . . 3-42
3-3 Displaying KZTSA SCSI ID and Bus Speed . . . . . 3-43
3-4 Setting PMAZC SCSI ID and Bus Speed . . . . . 3-43
3-5 Setting KZTSA SCSI ID and Bus Speed . . . . . 3-44
3-6 Booting the LFU Utility . . . . . 3-57
3-7 Using the LFU Utility to Display Hardware Configuration . . . . . 3-58
3-8 Using the LFU Utility to Update KZMSA Firmware . . . . . 3-59
3-9 Using the LFU Utility to Modify KZMSA Options . . . . . 3-60
3-10 Displaying Devices on AlphaServer 1000, 2000 or 2100 Systems . . . . . 3-66
3-11 Setting KZPSA SCSI ID and Bus Speed . . . . . 3-68
4-1 Installing TruCluster Available Server Software Version 1.4 . . . . . 4-17
5-1 asemgr Menus for Members . . . . . 5-7
5-2 System Logging Configuration File . . . . . 5-18
5-3 Displaying Logger Daemon Location . . . . . 5-19
5-4 Displaying Log Level . . . . . 5-20
5-5 Setting the Log Level . . . . . 5-20
5-6 Editing the Alert Script . . . . . 5-21
5-7 Testing the Alert Script . . . . . 5-21
5-8 daemon.log Entries . . . . . 5-22
6-1 Skeleton Start Action Script . . . . . 6-5
6-2 Skeleton Check Action Script . . . . . 6-6
6-3 Start Script . . . . . 6-7
6-4 Stop Script . . . . . 6-8
6-5 Specifying Your Own Action Script . . . . . 6-10
6-6 Editing the Default Action Script . . . . . 6-10
6-7 Pointing to an External Script . . . . . 6-11
7-1 Providing a Mirrored Stripe Set Using LSM . . . . . 7-9
7-2 Creating a File Domain Using AdvFS . . . . . 7-10
7-3 Adding an NFS Service . . . . . 7-12
7-4 /etc/exports.ase File . . . . . 7-18
7-5 Using a Network Alias . . . . . 7-20
7-6 Adding a Disk Service . . . . . 7-21
7-7 Setting Up a User-Defined Service . . . . . 7-27
7-8 Managing ASE Services Menu . . . . . 7-33
7-9 Displaying Service Status . . . . . 7-33
7-10 Relocating a Service . . . . . 7-35
7-11 Modifying a Service . . . . . 7-37
9-1 System Power Off Messages . . . . . 9-5
9-2 DWZZA Disconnection Error Messages . . . . . 9-5
9-3 Failed Device Error Messages . . . . . 9-6
9-4 BA350 Power Off Error Messages . . . . . 9-6
9-5 Error Messages When One Member Removed from Network . . . . . 9-7
9-6 Error Messages When All Members Removed from Network . . . . . 9-8
9-7 LSM Disk Group Information . . . . . 9-10
9-8 Confirming Failed Disk Information . . . . . 9-11
10-1 Member Status . . . . . 10-14
10-2 Service Status . . . . . 10-15
10-3 Network Configuration Status . . . . . 10-15
10-4 uerf CAM Error Display . . . . . 10-16
10-5 Using ps to Determine Daemon Status . . . . . 10-17
10-6 Using rpcinfo to Display Daemon and Port Information . . . . . 10-17
10-7 scu show edt Display . . . . . 10-18
10-8 volprint Display . . . . . 10-19
10-9 showfdmn Display . . . . . 10-19
10-10 showfsets Display . . . . . 10-20
11-1 Display when asemgr Cannot Modify Service . . . . . 11-12

Figures

1 TruCluster Available Server Configuration and Management Course Map . . . . . xxiii
1-1 Sample Available Server Configuration . . . . . 1-7
1-2 TruCluster Software Components . . . . . 1-10
1-3 Available Server Configuration and Maintenance Phases . . . . . 1-14
2-1 ASE Software Component Interaction . . . . . 2-7
2-2 Member Down Scenario . . . . . 2-10
2-3 Critical SCSI Path Failure Scenario . . . . . 2-12
2-4 Service Failover Scenario . . . . . 2-16
3-1 One SCSI Bus, Two Transmission Methods . . . . . 3-5
3-2 Legend for SCSI Bus Termination . . . . . 3-6
3-3 SCSI Buses with Devices on Bus Ends Only . . . . . 3-6
3-4 SCSI Bus with Device in the Middle of the Bus . . . . . 3-7
3-5 SCSI Buses Using Bus Segments with Different Transmission Methods . . . . . 3-8
3-6 Using External Termination on the SCSI Bus . . . . . 3-9
3-7 Disconnecting a Device from the SCSI Bus . . . . . 3-9
3-8 BA350 SCSI Bus . . . . . 3-12
3-9 BA353 Device Address Switches and SCSI Input and Output Connectors . . . . . 3-13
3-10 BA356 Storage Shelf SCSI Bus . . . . . 3-14
3-11 DWZZA-AA Signal Converter SCSI Bus Termination . . . . . 3-17
3-12 DWZZA-VA Signal Converter SCSI Bus Termination . . . . . 3-18
3-13 DWZZB-AA Signal Converter SCSI Bus Termination . . . . . 3-19
3-14 DWZZB-VW Signal Converter SCSI Bus Termination . . . . . 3-20
3-15 Available Server Configuration with Two DEC 3000 Model 500 Systems and a Single-Ended Shared Bus with a BA350 . . . . . 3-30
3-16 Available Server Configuration with Two DEC 3000 Model 500 Systems and Two Single-Ended Shared Buses Each with a BA350 . . . . . 3-30
3-17 Available Server Configuration with Three DEC 3000 Model 500 Systems with PMAZCs, Differential Shared Bus, and a BA350 . . . . . 3-33
3-18 Available Server Configuration with Three DEC 3000 Model 500 Systems with PMAZCs, Differential Shared Bus, and a BA356 . . . . . 3-37
3-19 Available Server Configuration with Two DEC 3000 Model 500 Systems with PMAZC SCSI Controllers and an HSZ40 . . . . . 3-40
3-20 PMAZC Dual SCSI Module Jumpers . . . . . 3-41
3-21 Available Server Configuration with Two DEC 3000 Model 500 Systems Using KZTSA SCSI Adapters and a Single-Ended Shared Bus with a BA350 . . . . . 3-46
3-22 Available Server Configuration with Two DEC 3000 Model 500 Systems Using KZTSA SCSI Adapters and a Single-Ended Shared Bus with a BA356 . . . . . 3-47
3-23 Two DEC 3000 Model 500 Systems with KZTSA TURBOchannel SCSI Adapters in an Available Server Configuration with an HSZ40 . . . . . 3-48
3-24 KZTSA Jumpers and Termination . . . . . 3-49
3-25 Available Server Configuration with Two DEC 7000 with KZMSA XMI SCSI Adapters and a BA350 . . . . . 3-53
3-26 Available Server Configuration with Two DEC 7000 with KZMSA XMI SCSI Adapters and a BA356 . . . . . 3-54
3-27 Available Server Configuration with Two DEC 7000 Systems Using KZMSA XMI to SCSI Adapters with an HSZ40 . . . . . 3-56
3-28 Available Server Configuration with Two AlphaServer 2100 Systems Using KZPSA PCI to SCSI Adapters with a BA350 . . . . . 3-63
3-29 Available Server Configuration with Two AlphaServer 2100 Systems Using KZPSA PCI to SCSI Adapters with an HSZ40 . . . . . 3-65
3-30 KZPSA Termination Resistor Locations . . . . . 3-65
3-31 Mixed Host Adapter Available Server Configuration with BA350 Storage Expansion Unit . . . . . 3-70
3-32 Mixed Host Adapter Available Server Configuration with an HSZ40 . . . . . 3-72
4-1 Upgrade Paths for Existing DECsafe Available Server Installation . . . . . 4-7
7-1 Client View of ASE . . . . . 7-4
8-1 Cluster Monitor Top View . . . . . 8-7
8-2 Cluster Monitor Configuration View . . . . . 8-8
8-3 SCSI Bus Configuration . . . . . 8-9
8-4 All Shared Connections . . . . . 8-10
8-5 Local Connections . . . . . 8-11
8-6 Cluster Monitor Services View . . . . . 8-12
8-7 Service Devices . . . . . 8-14
10-1 Troubleshooting Strategy . . . . . 10-4
10-2 A daemon.log File Entry . . . . . 10-6
10-3 Alert Message Paths . . . . . 10-7
10-4 Troubleshooting an Active TruCluster Configuration . . . . . 10-9
10-5 Troubleshooting Nonactive TruCluster Configurations . . . . . 10-11

Tables

Conventions Used in this Course
SCSI Bus Lengths in Some Devices and Systems
Available Server Supported System Types
DECsafe-Supported SCSI Controllers
TruCluster Available Server Supported Storage Expansion Units
Supported DEC RAID Subsystems
TruCluster Available Server Supported Disk Devices
Supported Signal Converters
Cables Used for Available Server Configurations
Terminators and Special Connectors
Supported Network Adapters
Setting Up TruCluster Available Server Configurations
PMAZC SCSI Controllers and a BA350 or BA353 with Single-Ended Available Server Configuration
Hardware Needed for a Single-Ended Available Server Configuration with PMAZC SCSI Controllers and a BA350 or BA353 (No DWZZAs)
Setting Up an Available Server Configuration with PMAZC SCSI Controllers and a BA350 or BA353 in a Differential Available Server Configuration
Hardware Needed for a Differential Available Server Configuration with PMAZC or KZMSA SCSI Controllers and a BA350 or BA353
Setting Up an Available Server Configuration with PMAZC SCSI Controllers and a BA356 in a Differential Available Server Configuration
Hardware Needed for a Differential Available Server Configuration with PMAZC or KZMSA SCSI Controllers and a BA356
Setting Up an Available Server Configuration with PMAZC SCSI Controllers and an HSZ10 or HSZ40
Hardware Needed for a Differential Available Server Configuration
with PMAZC or KZMSA SCSI Controllers and an HSZ40 or PMAZCs
and an HSZ10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Setting Up an Available Server Conguration with KZTSA
TURBOchannel to SCSI Adapters and a BA350, BA353, or BA356 . .

xxvi
34
310
311
311
314
315
316
321
323
324
327

Tables
1
31
32
33
34
35
36
37
38
39
310
311
312
313

314

315
316

317
318
319

320

xvi

328

331

331
334

335
337
339

340
345

321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
41
61
81
82
83
84
101
102
103
111
112
113

Hardware Needed for a KZPSA or KZTSA and BA350, BA353, or


BA356 Available Server Conguration . . . . . . . . . . . . . . . . . . . . . . . . .
Setting Up an Available Server Conguration with KZTSA
TURBOchannel to SCSI Adapters and an HSZ40 . . . . . . . . . . . . . . . .
Hardware Needed for an Available Server Conguration with a
KZPSA or KZTSA and an HSZ40 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
KZMSA Boot ROM Part Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Setting Up an Available Server Conguration with KZMSA XMI to
SCSI Adapters and a BA350, BA353, or BA356 . . . . . . . . . . . . . . . . .
Setting Up an Available Server Conguration with KZMSA XMI to
SCSI Adapters and an HSZ40 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Setting Up an Available Server Conguration Using KZPSA Adapters
and a BA350, BA353, or BA356 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Setting Up an Available Server Conguration Using KZPSA Adapters
and an HSZ40 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Setting Up an Available Server Conguration with Mixed Host
Adapters and a BA350, BA353, or BA356 . . . . . . . . . . . . . . . . . . . . . .
Hardware Needed for a Mixed Adapter Available Server
Conguration with BA350 or BA353 . . . . . . . . . . . . . . . . . . . . . . . . . .
Setting Up an Available Server Conguration with Mixed Host
Adapters and an HSZ40 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Hardware Needed for a Mixed Adapter Available Server
Conguration with an HSZ40 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Phase 1: Installing the KZPSA SCSI Adapters . . . . . . . . . . . . . . . . . .
Creating a Shared Bus with the BA350 . . . . . . . . . . . . . . . . . . . . . . .
Creating a Shared Bus with the HSZ40 . . . . . . . . . . . . . . . . . . . . . . .
Upgrade Paths for Existing DECsafe Available Server Installation . . .
Script Exit Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Available Service Icons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Main Window Failure Indicators . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Device View Failure Indicators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Services View Failure Indicators . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
System Monitoring Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Host Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Agent Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Frequently Reported Hardware Problems . . . . . . . . . . . . . . . . . . . . . .
Frequently Reported Software Problems . . . . . . . . . . . . . . . . . . . . . . .
Known Limitations of TruCluster Available Server Software Version
1.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

347
348
349
350
351
354
362
363
368
370
371
372
380
380
381
48
64
813
816
817
817
1012
1013
1014
114
1111
1119

xvii

About This Course



Introduction

The TruCluster Available Server Configuration and Management course includes topics on configuring and managing various configurations of the TruCluster Available Server Software Version 1.4 product.

This section describes the contents of the course, suggests ways in which you can most effectively use the materials, and sets up the conventions for the use of terms in the course. It includes:

Target audience: who should take this course

Prerequisites: the skills and knowledge needed to ensure your success in this course

Course goals and nongoals: what skills or knowledge the course will and will not provide

Course organization: the structure of the course

Course map: the sequence in which you should take each chapter

Chapter descriptions: brief descriptions of each chapter

Time schedule: an estimate of the amount of time needed to cover the chapter material and lab exercises

Course conventions: explanation of symbols and signs used throughout this course

Course description: a brief overview of the course contents

Resources: manuals and books to help you successfully complete this course

Course Description

This course describes tasks that support personnel and system administrators must perform to install, configure, and manage TruCluster Available Server configurations comprising as many as four member systems.

The course provides extensive discussion of how to plan TruCluster Available Server configurations, how to install the associated peripherals and cabling, and how to upgrade to or install the TruCluster Available Server Version 1.4 software.

Target Audience

This course is designed for system management and support people who have experience managing a Digital UNIX environment and require instruction in advanced system management topics associated with configuring and maintaining TruCluster Available Server.

Digital UNIX is a registered trademark of Digital Equipment Corporation.


Prerequisites

To get the most from this course, students should be able to:

Install and manage a Digital UNIX system

Install layered products and register PAKs

Troubleshoot the operating system and make adjustments to improve performance

Connect to a TCP/IP network and understand subnetworking concepts

Set up and maintain distributed network services such as time synchronization

Set up a Network File System

Set up a LAT server

Install and maintain AdvFS, LSM, and RAID functionality

These prerequisites can be satisfied by taking the following courses:

DEC OSF/1 System Administration lecture/lab or self-paced course

DEC OSF/1 Network Management lecture/lab or self-paced course

AdvFS, LSM, and RAID Configuration and Management lecture/lab or self-paced course

These courses are accessible for personal use, internally over the World Wide Web in PostScript format at the UNIX Course Development home page. The URL to this WWW location is:

http://mmstuf.zko.dec.com/

This course will also be taught by Global Knowledge Network, Inc.

Course Goals

To perform TruCluster Available Server tasks, the system manager should be able to:

Determine the hardware needs of various TruCluster Available Server Software Version 1.4 configurations

Install and configure TruCluster Available Server hardware

Install and configure the TruCluster Available Server Software Version 1.4 software on all member nodes

Set up and maintain ASE services

Generate and test ASE start and stop scripts

Manage and maintain TruCluster Available Server hardware and software

Use other layered and complementary product software in a TruCluster Available Server configuration, such as:
    The Logical Storage Manager (LSM)
    The Advanced File System (AdvFS) and its utilities

Identify and resolve TruCluster Available Server-related problems

Nongoals

This course will not cover the following topics:

Digital UNIX hardware installation

AdvFS, LSM, or RAID configuration and management

Configuring and setting up other cluster servers such as the production server

Installing or maintaining other cluster-related software such as the Oracle Parallel Server

TruCluster Production Server functionality such as distributed raw disk or Distributed Lock Manager functionality

The latter goals are met in other courses:

AdvFS, LSM, and RAID Configuration and Management

TruCluster Software Configuration and Management

Course Organization

This Course Guide is divided into chapters designed to cover a skill or related group of skills required to fulfill the course goals. Illustrations are used to present conceptual material. Examples are provided to demonstrate concepts and commands.

In this course, each chapter consists of:

An introduction to the subject matter of the chapter.

One or more objectives that describe the goals of the chapter.

A list of resources, or materials for further reference. Some of these manuals are included with your course materials. Others may be available for reference in your classroom or lab.

The text of each chapter, which includes outlines, tables, figures, and examples.

The summary, which highlights the main points presented in the chapter.

The exercises found at the end of each chapter, which enable you to practice your skills and measure your mastery of the information learned during the course.


Course Map

The Course Map shows how each chapter is related to other chapters and to the course as a whole. Before studying a chapter, you should master all of its prerequisite chapters. The prerequisite chapters are depicted before the following chapters on the Course Map. The direction of the arrows determines the order in which the chapters should be covered.

This material can be effectively presented in many different ways. By providing prerequisite information as needed, your instructor may choose to use a chapter organization different from the one shown in the following Course Map.

Figure 1 TruCluster Available Server Configuration and Management Course Map

[Flowchart of chapters: Introducing TruCluster Available Server Configuration and Management; Understanding TruCluster Software Interactions; Configuring TruCluster Available Server Hardware; Installing TruCluster Software; Setting Up and Managing ASE Members; Writing and Debugging Action Scripts; Setting Up ASE Services; Using the Cluster Monitor; Testing, Recovering, and Maintaining TruCluster Configurations; Troubleshooting TruCluster Configurations; Resolving Common TruCluster Problems]


Chapter Descriptions

A brief description of each chapter is listed below.

Introducing TruCluster Available Server Configuration and Management: Provides an overview of the TruCluster Available Server Software Version 1.4 product, and briefly describes the things you should know when planning an Available Server implementation.

Understanding TruCluster Software Interactions: Discusses the interactions among the TruCluster Software components.

Configuring TruCluster Available Server Hardware: Describes the general hardware configuration rules and restrictions, lists the supported hardware used to set up TruCluster Available Server Software Version 1.4, and explains how to set up the hardware for TruCluster Available Server.

Installing TruCluster Software: Describes how to upgrade to, or install, the TruCluster Available Server Software Version 1.4 software.

Setting Up and Managing ASE Members: Describes how to set up and manage ASE member systems.

Writing and Debugging Action Scripts: Discusses the types of action scripts that TruCluster Available Server uses, and the guidelines and conventions for creating and debugging action scripts.

Setting Up ASE Services: Explains ASE services and describes how to set them up.

Using the Cluster Monitor: Describes how to install and use the Cluster Monitor.

Testing, Recovering, and Maintaining TruCluster Configurations: Describes how to verify that your ASE services will behave as you expect when hardware failures occur.

Troubleshooting TruCluster Configurations: Describes ways to troubleshoot problems that occur in TruCluster Software configurations.

Resolving Common TruCluster Problems: Examines ways to recognize and solve common TruCluster problems, focusing on two main topics: common problems and configuration guidelines.


Time Schedule

The amount of time required for this course depends on each student's background knowledge, experience, and interest in the various topics. Use the following table as a guideline.

Course Chapter                                                  Lecture Hours   Lab Hours
About This Course                                               0.5
Introducing ASE Configuration and Management                    1.5
Understanding TruCluster Software Interaction
Configuring TruCluster Available Server Hardware
Installing TruCluster Software
Setting Up and Managing ASE Members
Writing and Debugging Action Scripts
Setting Up ASE Services
Using the Cluster Monitor
Testing, Recovering, and Maintaining the TruCluster Software    .5
Troubleshooting the TruCluster Software                         .5              .5
Resolving Common TruCluster Software Problems                   .5              .5


Course Conventions

Table 1 describes the conventions used in this course.

Table 1 Conventions Used in this Course

Convention      Meaning
keyword         Keywords and new concepts are displayed in boldface type.
examples        Examples, commands, options, and pathnames appear in
                monospace type.
command(x)      Cross-references to command documentation include the section
                number in the reference pages. For example, fstab(5) means
                fstab is referenced in Section 5.
$               A dollar sign represents the user prompt.
#               A number sign represents the superuser prompt.
bold            Within interactive examples, boldface type indicates user
                input.
key             The box symbol indicates that the named key on the keyboard
                is pressed.
. . .           In examples, a vertical ellipsis indicates that not all lines
                in the example are shown.
[ ]             In syntax descriptions, brackets indicate items that are
                optional.
variable        In syntax descriptions, italics indicate items that are
                variable.
...             In syntax descriptions, an ellipsis indicates the item may be
                repeated.
                Used to separate items to be selected by clicking a mouse
                button.


Resources

For more information on the topics in this course, see the following documentation:

TruCluster Available Server Software Version 1.4 SPD

TruCluster Available Server Software Release Notes

TruCluster Available Server Software Hardware Configuration and Software Installation

TruCluster Available Server Software Available Server Environment Administration

Digital UNIX Installation Guide

Digital UNIX Network Administration

Cluster Monitor online help

1
Introducing TruCluster Available Server Configuration and Management

About This Chapter

Introduction

This chapter provides an overview of the TruCluster Available Server Software (abbreviated as TruCluster Available Server), and provides a high-level discussion of the complexities of planning and configuring Available Server implementations. The TruCluster Available Server product is a high-availability solution that minimizes, but does not eliminate, the impact of hardware and software failures.

A TruCluster Available Server implementation consists of the following elements:

TruCluster Available Server Software

Supported hardware for an Available Server implementation

Supported system software for an Available Server implementation

Collectively, this group of elements constitutes an Available Server Environment (ASE).

TruCluster Available Server configurations significantly reduce downtime due to system hardware or software failures. They provide multihost access to SCSI disks and a generic failover mechanism for network-based services such as NFS, mail, and login, and for applications such as databases.

Failover is accomplished using a set of daemons to monitor the health of systems in the environment and an infrastructure for moving services from one node to another. System managers can also use this on-demand service migration capability for planned shutdowns before maintenance cycles and for load balancing during peak performance demands.

The TruCluster Available Server Software can be integrated with the POLYCENTER Advanced File System (AdvFS) and the Logical Storage Manager (LSM) to provide high availability for disks, storage reliability (disk mirroring, striping, and concatenation), and fast file system recovery.


Objectives

To understand TruCluster Available Server configurations, you should be able to:

Describe the basic hardware and software requirements for an Available Server configuration

Explain the concepts associated with the TruCluster Software

Describe the three most important considerations when configuring a TruCluster Software solution

Describe the TruCluster Available Server Software product, including its features, components, and position in the Digital UNIX cluster program

Describe the configuration and management phases involved in establishing an Available Server Software implementation

Resources

For more information on the topics in this chapter, see the following:

TruCluster Available Server Software Available Server Environment Administration

TruCluster Available Server Software Hardware Configuration and Software Installation

TruCluster Available Server Software Version 1.4 Release Notes

TruCluster Available Server Software Software Product Description, SPD 44.17.xx

Reference Pages

Describing the TruCluster Available Server Product


Overview

The TruCluster Available Server Software product significantly reduces downtime caused by hardware and software failures by responding to predefined failure events within an Available Server Environment (ASE). An ASE is an integrated organization of systems and external disks connected to one or more shared SCSI buses and networks that together provide highly available software and disk data to client systems.

The TruCluster Available Server provides multihost access to SCSI disks and a generic failover mechanism for disks and applications. After configuring the ASE and installing the TruCluster Software, administrators can set up services that make disks and applications highly available to client systems. For example, users can set up services for exported NFS disks, raw disks, disk-based applications such as database programs or mail, and nondisk-based applications such as a remote login service.

The TruCluster Available Server includes the following features:

Concurrently active servers
All the systems and disks in the ASE are connected to at least one shared SCSI bus. The services you set up in the ASE can use the disks, but a service must have exclusive access to a disk. A service runs on one system at a time, but any system can run a service.

Master/Standby configuration
You can set up a Master/Standby configuration in which the Master system runs all the services. If a failure prevents the Master system from running the services, the TruCluster Available Server relocates the services to the Standby system.

Transparent NFS failover
If the TruCluster Available Server relocates a service that provides access to exported NFS data, the change in the system exporting the data is virtually transparent. Clients experience only a temporary NFS server timeout.

Fast file system recovery
The POLYCENTER Advanced File System (AdvFS) provides rapid crash recovery, high performance, and a flexible structure.

Increased data integrity
Digital RAID technology and the Logical Storage Manager (LSM) software provide high data availability for disk storage devices, protecting against data loss and improving disk input/output (I/O) performance. You can also perform disk management tasks without disrupting access to disks.

Global event logging
You can log messages about events that occur in the ASE to one or more systems. You can also receive notification of critical problems through electronic mail.

Network failover and network monitoring
The systems in the ASE can be configured using multiple network adapters. This provides better availability and performance, as all known paths and routing connections are used when network path failures occur. Multiple network adapters also provide greater flexibility in client access to ASE services. Networks can also be monitored for proper operation. The status of monitored networks is passed to a TruCluster script that can be customized.

Cluster Monitor
The Cluster Monitor provides a graphical view of the Available Server implementation and may be used to determine the current state of availability and connectivity in the ASE.

Product Position

The TruCluster Available Server Software product was originally conceived as a mechanism for restarting NFS services in a new location following a hardware failure. Its capabilities have expanded into a general failover product. If a server providing a network-based service or application crashes or cannot access the disks or network, the service is relocated to another server. The TruCluster Available Server is also a first step in implementing full clustering capabilities on Digital UNIX platforms.

TruCluster Available Server configurations make distributed services such as NFS, mail, or remote login highly available and manageable. When the system providing a service fails, the TruCluster Software automatically relocates it to another server. Failover occurs quickly and transparently to users. The TruCluster Software also lets system managers relocate services from one server to another on demand.

The TruCluster Software provides a tool set for building available applications. Any application that runs on one system and can be stopped and started via scripts can be set up to be highly available in the ASE. You provide the scripts and the TruCluster Software provides the support for running and relocating services.
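The scripts that start and stop an application are ordinary shell scripts. The skeleton below is a sketch only: the service name and the single-script start/stop argument convention are illustrative, not the product's shipped template, and a real script would manipulate shared disks and the application rather than echo messages.

```shell
#!/bin/sh
# Sketch of an action script for a hypothetical ASE service. A zero exit
# status tells the caller the action succeeded; nonzero reports failure.
SERVICE=dbase_service          # hypothetical service name

start_service() {
    # A real start script would mount the service's shared file systems
    # and launch the application here.
    echo "starting $SERVICE"
    return 0
}

stop_service() {
    # A real stop script would shut the application down cleanly and
    # unmount the shared file systems here.
    echo "stopping $SERVICE"
    return 0
}

if [ $# -gt 0 ]; then
    case "$1" in
        start) start_service ;;
        stop)  stop_service ;;
        *)     echo "usage: $0 start|stop" >&2; exit 1 ;;
    esac
fi
```

Because the script is self-contained and exits with a meaningful status, it can be tested by hand on a member system before it is registered with a service.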
The TruCluster Available Server offers unified configuration and control functions and provides availability for distributed services. This is a step in the direction of an integrated clustering capability. A cluster is a group of systems that works collectively as a single system to provide fast, uninterrupted computing service. A cluster should provide high availability of services, scalable performance, and centralized management. You can cluster from two to four Alpha systems in a TruCluster Available Server configuration, using SCSI and TCP/IP as interconnects, and supporting multivendor client workstations. The Digital AdvantageCluster Available Server package includes the TruCluster Software in a preconfigured hardware package.

TruCluster Available Server configurations are ideal for supporting such high-availability applications as order processing, point-of-sale transaction processing, online customer service, reservation, catalog, and database query systems.

Configuring the TruCluster Available Server


Overview

The TruCluster Available Server coordinates groups of computer systems and their disks so they can function as a unified service environment rather than as an unintegrated collection of individual computers.

The Available Server hardware configuration consists of from two to four member systems, connected to at least one shared SCSI bus, with external disks. The systems communicate with each other and monitor the shared devices through both the shared bus and the network.

Hardware Requirements

An Available Server hardware configuration can contain from two to four systems, referred to as member systems. The systems are connected to at least one shared SCSI bus, and they access external disks that are connected to the shared bus. The systems communicate with each other and monitor the shared devices through both the shared bus and the configured networks.

The TruCluster Software supports a variety of Digital systems, as listed in the Software Product Description (SPD 44.17.xx). Because there are eight SCSI IDs (0-7), the TruCluster Software supports combinations of a number of systems and disks:

One to six shared disks per SCSI bus in a two-node environment

One to five shared disks per SCSI bus in a three-node environment

One to four shared disks per SCSI bus in a four-node environment
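The limits above follow directly from the bus arithmetic: each member's host adapter consumes one of the eight SCSI IDs, and each shared disk needs an ID of its own. A quick sketch of the rule:

```shell
# Each of the 8 SCSI IDs (0-7) is used either by a member system's host
# adapter or by a shared disk, so the disk maximum is 8 minus the number
# of members sharing the bus.
for members in 2 3 4; do
    echo "$members members: up to $((8 - members)) shared disks per bus"
done
```

For a two-node environment this yields six shared disks per bus, matching the first item in the list above.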

The external disks are mounted in a storage expansion box. This enables all the member systems to have access to the disks and provides a separate power source, not dependent on any system's power.

The TruCluster Software supports Ethernet and FDDI network controllers. The TruCluster Software does not support Prestoserve NVRAM failover; the cache would be stranded.

See the Software Product Description for detailed hardware requirements.


Software Requirements

The TruCluster Available Server Software Version 1.4 requires the Digital UNIX operating system Version 4.0a. Before you can install Digital UNIX V4.0a, you must first upgrade your systems to Digital UNIX Version 3.2g. The Digital UNIX Version 3.2g patches are located on the Complementary Products CD-ROM.

You must install the following subsets for the TruCluster Software to work correctly:

Basic Networking Services

Software Development Environment

The POLYCENTER Advanced File System (see SPD 46.16.xx) is recommended for fast file system recovery. The Logical Storage Manager (see SPD 51.24.xx) is recommended for creating mirrored and/or striped volumes.
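On Digital UNIX, installed subsets are listed with setld -i, so you can confirm the required subsets are present on each member before installing the TruCluster Software. The OSFCLINET and OSFPGMR name prefixes used below are assumptions for the two subsets named above; verify the exact subset names against your installation media.

```shell
# Check for the subsets TruCluster needs on this member. Subset names
# carry a version suffix (for example OSFCLINET405), so match on the
# prefix only. Assumed prefixes: OSFCLINET = Basic Networking Services,
# OSFPGMR = Software Development Environment.
for prefix in OSFCLINET OSFPGMR; do
    if setld -i 2>/dev/null | grep "^$prefix" | grep -q installed; then
        echo "$prefix: installed"
    else
        echo "$prefix: not found"
    fi
done
```

Run the same check on every member system; the TruCluster Software must be installed on each one.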

Sample Available Server Configuration

Figure 1-1 depicts an Available Server implementation with two member systems in the ASE, two network connections, two shared SCSI buses, and four ASE services configured.

Figure 1-1 Sample Available Server Configuration

[Diagram: clients on the network connect to Server 1 and Server 2. Each server has two SCSI controllers on shared buses; the shared disks on those buses host nfs_service, mail_service, dbase_service, and login_service. Each server also has private nonshared disks.]

The chapters in this course contain extensive system configuration descriptions and information.

You can also refer to TruCluster Available Server Software Hardware Configuration and Software Installation for other descriptions and examples of Available Server Software configurations.

Presenting the TruCluster Software

Software Components

The TruCluster Software must be installed on each member system. The components interact by means of TCP/IP socket connections. The TruCluster Software components include:

Manager utility (asemgr): user interface allowing system administrators to send commands to the Director daemon

Director daemon (asedirector): daemon that manages and coordinates activities within the ASE domain

Agent daemon (aseagent): daemon that runs on each ASE member; oversees ASE activities for a specific member

Host status monitor daemon (asehsm): daemon that runs on each ASE member; monitors the status of the other members in the ASE domain

Logger daemon (aselogger): daemon that writes error messages to the system log files

Availability manager driver: driver that supports server-to-server messages on the SCSI bus; reports I/O subsystem failures to the Agent daemon

ASE driver: driver that supports UFS file systems exported from LSM volumes

asecdb: database that describes all member nodes and services within the ASE domain

Action scripts: scripts that initialize, start, and stop ASE services

Cluster Monitor: GUI that allows you to monitor activity within the ASE
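A quick way to see which of these daemons are present on a member is to scan the process list. This is an informal sketch, not a supported diagnostic; the daemon names are taken from the component list above, and on a host that is not an ASE member every line will report "not running".

```shell
# Report which ASE daemons from the component list are running locally.
# Later chapters cover proper monitoring; this is only a first-look check.
for daemon in asedirector aseagent asehsm aselogger; do
    if ps -e 2>/dev/null | grep -w "$daemon" | grep -v grep >/dev/null; then
        echo "$daemon: running"
    else
        echo "$daemon: not running"
    fi
done
```

Remember that the Director daemon runs on only one member at a time, so "asedirector: not running" on a given member is not by itself a failure.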

Figure 1-2 shows the distribution of the TruCluster Software components in a configuration containing three ASE members. The asemgr utility and Director daemon run on only one member system at a time. The Logger daemon can run on more than one system.

Figure 1-2 TruCluster Software Components

[Diagram: three member systems serving clients over the network. Each member runs an Agent daemon, a Host status monitor daemon, an Availability manager driver, and action scripts. One member also runs the Director daemon and the Manager utility; another runs the Logger daemon.]

ASE Services

To make an application highly available, you must set up an ASE service for that application. Each service is assigned a unique name. The TruCluster Software supports three types of services:

NFS service

Disk service

User-defined service

If the member system running a service fails, or if a network or I/O bus failure prevents the system from providing the service to clients, the TruCluster Software automatically relocates the service to another member system under the following conditions:

A member system fails

Members cannot access a device due to a SCSI bus failure or a device failure

All network connections on a member system fail, while the network is still available to other ASE members

A member system that was down comes back up and has a more favored status to run a particular service, as defined in the TruCluster Software Automatic Service Placement (ASP) policy

You can use the asemgr utility to manually relocate a service from one member system to another.

Planning TruCluster Available Server Congurations

Planning TruCluster Available Server Congurations


Overview

Planning an Available Server implementation requires you to


consider three main areas of conguration:

Storage conguration

Network
Conguration

Network conguration
Service Availability conguration

When planning an Available Server network conguration, you


must determine:

How clients will receive the services provided by Available


Server

How failover will be performed so that the clients can continue


receiving the service

Available Server implementations can be congured with a


single primary network, or with primary and backup networks.
Backup networks provide higher availability by allowing services
and daemons to switch to auxiliary network paths if a network
failure occurs. The TruCluster Software will declare a network
partition or interface failure only after all congured network
paths are found to be nonfunctional.
To work properly, the primary and backup networks in the
ASE must provide equal access from any member to any client
requiring the available services. This gives any member the
ability to serve any client during a failover recovery sequence.
This is sometimes described as a symmetric network
conguration between clients and ASE members.
The TruCluster Software uses separate network IP addresses to
provide a common access for IP services like NFS. These service
addresses must be reserved so that no systems accidentally use
the same addresses.
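One way to reserve such addresses is to record them in the hosts database that every system consults. The entries below are a hypothetical sketch (the addresses and service names are invented, not taken from a real ASE):

```shell
# Hypothetical /etc/hosts entries reserving ASE service addresses.
# Writing to /tmp here keeps the sketch harmless to run.
cat > /tmp/hosts.ase <<'EOF'
16.140.64.101   mailer      # ASE service address -- reserved, not a host
16.140.64.102   nfs_accts   # ASE service address -- reserved, not a host
EOF

# Sanity check: no reserved address appears more than once.
awk '{print $1}' /tmp/hosts.ase | sort | uniq -d
```

An empty result from the final pipeline means no address is duplicated.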

Storage Configuration

When planning an Available Server configuration, you must also determine the shared storage requirements for the highly available set of services. For example, a configuration may need to provide a highly available database to a set of clients. You must know the size and availability constraints of the database to set up an appropriate Available Server configuration.
The Available Server storage configuration requires that there be at least one shared SCSI bus between all members. All storage used to support highly available services must be commonly accessible by each cluster member through a shared SCSI bus connection. The configuration of the ASE, with respect to shared SCSI buses and devices, must be such that each device is addressable from each member by the same name. This configuration of common, shared devices and SCSI buses is often referred to as a symmetric storage configuration.
You must also know the amount of storage to be shared and the degree to which it must withstand failures to successfully plan Available Server storage requirements. For example, if an ASE service must survive failures of a single bus and a single disk at the same time, the storage associated with that service must be mirrored across at least two SCSI buses, and each of those buses must be directly accessible from all ASE members.
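The symmetric-naming requirement can be checked mechanically by comparing the shared-device names each member reports. The sketch below uses invented device lists written to local files; in practice each list would come from the member itself:

```shell
# Hedged sketch: verify that all members see the same shared-device
# names. The device names below are invented for illustration.
cat > /tmp/member1.devs <<'EOF'
rz17
rz18
rz19
EOF
cat > /tmp/member2.devs <<'EOF'
rz17
rz18
rz19
EOF

# Identical lists indicate a symmetric storage configuration.
if diff /tmp/member1.devs /tmp/member2.devs > /dev/null; then
    echo "symmetric: shared device names match on all members"
else
    echo "asymmetric: shared device names differ between members"
fi
```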

Service Availability Configuration

Finally, when planning any Available Server configuration you must determine the services that are being made available. If there are no services, an Available Server solution is not required. Knowing the nature of the service and its availability requirements is essential when creating a TruCluster Available Server implementation to best suit the service.
Two characteristics are required for an application to be suitable for an ASE service:

- The application must run on only one system at a time.
- The application must be able to be started and stopped using a set of commands issued in a specific order, so that the commands can be used in an action script.

One of the more common applications that lends itself to an ASE service is a database application for a set of clients. Any time the database becomes inaccessible to the clients, the customer will complain that the TruCluster Available Server configuration is failing. Making the database highly available requires an understanding of the database design, storage requirements, and the communications paths used to provide access to clients.
A breakdown in any of the communications paths can prevent access to the ASE database service. To the customer, this is an availability problem, and because the TruCluster Available Server Software is the primary product that provides availability, it becomes a Digital problem. Unfortunately, the problem may not be directly related to the TruCluster Available Server. The problem is often caused by poorly configured networks that become noisy or saturated.
Be careful when configuring failover storage associated with a service. Ask the customer whether the storage needs are likely to grow. If so, the storage configuration plan should take this into account and provide for expansion. Storage can be expanded in many ways, depending on the application's storage requirements.

Determining Configuration and Maintenance Phases

Overview

Establishing and maintaining an Available Server implementation requires that you go through a number of distinct phases. You must know which phase you or the customer is in when a problem occurs. Many times problems occur because phases have been skipped or because issues associated with a particular phase have not been completely addressed.
As shown in the following figure, the phases associated with establishing and maintaining an Available Server implementation are:

- Planning the hardware and software configuration
- Configuring the hardware
- Installing and setting up the base operating system software
- Installing the TruCluster Software
- Configuring ASE services
- Testing ASE failover sequences
- Monitoring and managing the environment
- Troubleshooting

Figure 1-3 Available Server Configuration and Maintenance Phases
[Flowchart: Planning the Configuration -> Installing and Configuring the Hardware -> Installing the Base OS -> Installing TruCluster Software -> Configuring ASE Services -> Testing ASE Failover -> Monitoring and Managing an ASE -> Troubleshooting TruCluster Implementations]

To help determine whether a particular phase is complete, you must develop worksheets to collect the pertinent information for each phase. The following sections provide lists of questions associated with each phase. You can use these questions to develop the necessary information.

Planning the Available Server Configuration

When planning the hardware and software needs for an Available Server configuration, you must ask:

- What services are to be made available?
- What is the network configuration?
- What are the survivable failures?
  - Server => 2 to 4 nodes per ASE
  - Disk => mirroring or RAID (what kind?)
  - SCSI bus => mirroring across multiple SCSI buses
  - Power => UPS
  - Network => dual network controllers per served net (not currently supported on the primary net)
- What is the frequency of client requests?
- What other services are provided to clients?
- What is the security policy?
- Do any available services require custom scripts?
- How much available storage is required?
- Is this Available Server configuration expected to grow in members or storage? If so, how soon and to what extent?

Configuring ASE Hardware

When configuring the hardware for the ASE, you must ask:

- What is the primary network?
- What are the backup networks (if any)?
- If subnets are being used, are all members on the same subnet?
- What are the client connectivity networks?
- How many shared SCSI buses are required?
- How many systems or members are required?
- What are the storage requirements?
- How will the SCSI buses be configured (SNS, FWD)?
- Are DWZZAs required? If so, how many?
- What are the cable length restrictions?
- Is a UPS required?
- What are the cabling needs? Are Y cables needed?
- Are all hardware component revisions correct?

Installing and Setting Up the Base Operating System

When setting up the base operating system, you must answer the following questions:

- Are there any licensing issues?
- Which kernel options should be chosen?
- Which network addresses map to which interfaces?
- Which member system will be the NTP server used to synchronize time?
- What service addresses are set up?
- Are mirrored or striped volumes required?
- Are RAID setups required?
- How will the security policy be set up?
- Will BIND and/or NIS be run to distribute IP addresses?
- Are all members in each other's /etc/hosts file?

Installing the TruCluster Software

When installing the TruCluster Software, you must ask:


Is the TruCluster Software Product Authorization Key (PAK)
registered?

Is the Logger daemon running on this node?

Conguring
ASE Services

Is the membership list up and operational (pinging)?

When conguring the ASE services, you must ask:


What types of services are going to be set up to fail over?

Which networks do you want the TruCluster Software to


monitor?

What is the automatic service placement policy?

Testing the
TruCluster
Software
Failover
Sequences

What are the storage needs (NFS, AdvFS, LSM, UFS)?

When you have installed the TruCluster Software and set up


services, you must test your system to verify that it fails over
correctly. The questions you must answer are:

- How do you cause a server failure?
- How do you create a storage container failure?
- How do you invoke a SCSI bus failure?
- How do you test a network interface failure?
- How do you test a network partition?

Monitoring and Managing Available Server Configurations

After your system has been running for a while, you will be performing maintenance tasks. The questions you may be asking are:

- How will I get status on services and members?
- Will I be relocating services for load-balancing purposes?
- Will I be taking services off line?
- How do I restart services after a disk failure?
- How do I re-reserve LSM disks after a disk failure?
- How do I upgrade the TruCluster Software?

Troubleshooting an Existing Available Server Configuration

When problems occur in an existing Available Server configuration, you must be able to answer these troubleshooting questions:

- How do I read and evaluate the kern.log and daemon.log files?
- What corrective actions must I take?
- What are some of the most commonly experienced problems?
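As a starting point for log evaluation, entries of interest can be pulled out of the daemon log with standard tools. The log contents and message format below are invented for illustration; consult the actual daemon.log on a member system:

```shell
# Hedged sketch of scanning a daemon log for ASE error entries.
# A fabricated log is written to /tmp so the sketch is safe to run.
LOG=/tmp/daemon.log
cat > "$LOG" <<'EOF'
Oct 12 09:14:02 member1 ASE: local aseagent Error: lost connection to asehsm
Oct 12 09:14:05 member1 ASE: asedirector Notice: relocating service mailer
EOF

# Show only error-level ASE entries:
grep 'ASE:.*Error' "$LOG"
```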

Summary

Introduction to the Available Server Software

The TruCluster Available Server Software product is a high availability solution that minimizes, but does not eliminate, the impact of hardware and software failures.

Configuring the Available Server Software

The TruCluster Software coordinates groups of computer systems and their disks so that they can function as a unified service environment rather than as an unintegrated collection of loosely coupled processors. For the TruCluster Software to function properly, only supported hardware configurations are allowed.
The TruCluster Available Server Software was originally conceived as a mechanism for restarting NFS services in a new location following a hardware failure. Its capabilities have expanded into a general failover product. It is also a first step in implementing full clustering capabilities on Digital UNIX platforms.
An Available Server hardware configuration consists of two to four member systems connected to at least one shared SCSI bus, with external disks. The systems communicate with each other and monitor the shared devices through both the shared bus and the network.

Presenting the TruCluster Software

The TruCluster Software components include:

- Manager utility (asemgr) - user interface allowing system administrators to send commands to the Director daemon
- Director daemon (asedirector) - daemon that manages and coordinates activities within the ASE domain
- Agent daemon (aseagent) - daemon that runs on each ASE member; oversees ASE activities for a specific member
- Host status monitor daemon (asehsm) - daemon that runs on each ASE member; monitors the status of the other members in the ASE domain
- Logger daemon (aselogger) - daemon that writes error messages to the system log files
- Availability manager driver - driver that supports server-to-server messages on the SCSI bus; reports I/O subsystem failures to the Agent daemon
- ASE driver - driver that supports UFS file systems exported from LSM volumes
- Action scripts - scripts that initialize, start, and stop ASE services
- asecdb - database that describes all member nodes and services within the ASE domain
- Cluster Monitor - GUI that allows you to monitor activity within the ASE

Planning Available Server Configurations

There are three main areas you need to consider when you plan a TruCluster Software configuration:

- Network configuration
- Storage configuration
- Service Availability configuration

To work properly, the Available Server network configuration must provide equal access from any member to any client requiring the available services. This gives any member the ability to serve any client during a failover recovery sequence. This is sometimes described as a symmetric network configuration between clients and ASE members.
The Available Server storage configuration requires that there be at least one shared SCSI bus between all members. All storage used to support highly available services must be commonly accessible by each cluster member through a shared SCSI bus connection.
When planning any Available Server configuration, you must determine the services that are being made available. If there are no services, an Available Server Software solution is not required. Knowing the nature of the service and its availability requirements is essential to configuring the TruCluster Software to best suit the service.

Determining Configuration and Maintenance Phases

The phases associated with establishing and maintaining an Available Server implementation are:

- Plan the hardware and software requirements for the Available Server implementation
- Configure the hardware for the ASE
- Install and set up the base operating system software
- Install the TruCluster Software and configure the ASE services
- Test ASE failover sequences
- Monitor and manage the ASE
- Troubleshoot the Available Server configuration if a problem occurs

Exercises

Describing the TruCluster Available Server Software Product: Exercise

1. Describe the purpose of the TruCluster Available Server Software product.
2. Identify the TruCluster Software features.
3. Describe the TruCluster Software's position in the Digital UNIX cluster program.

Describing the TruCluster Available Server Software Product: Solution

1. The TruCluster Software is a high availability solution that minimizes, but does not eliminate, the impact of hardware and software failures.
2. The TruCluster Software features include concurrently active servers, master/standby configuration, transparent NFS failover, automatic restart, fast file system recovery (when used with AdvFS), and increased data integrity.
3. The TruCluster Software offers unified configuration and control functions as well as availability for distributed services. This is a step in the direction of an integrated clustering capability. The Digital AdvantageCluster Available Server package includes the TruCluster Software in a preconfigured hardware package.

Configuring the TruCluster Software: Exercise

Describe a basic Available Server hardware configuration.

Configuring the TruCluster Software: Solution

A basic Available Server hardware configuration consists of two to four member systems, connected to at least one shared SCSI bus, with external disks. The systems communicate with each other and monitor the shared devices through both the shared bus and the network.

Presenting the TruCluster Software: Exercise

1. Identify the TruCluster Software components.
2. Which components are located on all member systems? On only one member system?

Presenting the TruCluster Software: Solution

1. Available Server software components include the Manager utility (asemgr), the Director daemon (asedirector), the Agent daemon (aseagent), the Host Status Monitor daemon (asehsm), the Logger daemon (aselogger), the Availability Manager driver (AM), and action scripts.
2. The Agent daemon, Host Status Monitor daemon, and Availability Manager driver must be located on each member system. The Manager utility and Director daemon must run on only one member system. The Logger daemon can run on one or more member systems. The action scripts are on all member systems that can be servers for those services.

Planning Available Server Configurations: Exercise

List and describe the three areas of consideration when planning an Available Server implementation.

Planning Available Server Configurations: Solution

The three areas to consider when planning an Available Server Software implementation are:

- Network Configuration - Available Server network configurations must provide equal access from any member to any client requiring the available services.
- Storage Configuration - All storage used to support highly available services must be commonly accessible by each ASE member through a shared SCSI bus connection.
- Service Availability Configuration - One of the more common application services is providing highly available access to a database for a set of clients.

Determining Configuration and Maintenance Phases: Exercise

List the phases for establishing and maintaining an Available Server configuration.

Determining Configuration and Maintenance Phases: Solution

The phases associated with establishing and maintaining an Available Server implementation are:

- Plan the hardware and software configuration for the Available Server implementation
- Configure the hardware for the ASE
- Install and set up the base operating system software
- Install the TruCluster Software and configure ASE services
- Test ASE failover sequences
- Monitor and maintain the environment
- Troubleshoot the existing Available Server configuration

2
Understanding TruCluster Software Interactions

Understanding TruCluster Software Interactions 2-1

About This Chapter


Introduction

This chapter discusses the interactions among the TruCluster Software components. It concentrates on three major topics:

- Descriptions of the TruCluster Software components
- Basic concepts of highly available services
- Analysis of how the software components interact to detect failure events and respond when failure events occur

Objectives

To understand the TruCluster Software interactions, you should be able to:

- Describe the TruCluster Software components
- Define the basic concepts of highly available services
- Describe the ways in which the TruCluster Software components interact to detect and respond to failure events

Resources

For more information on the topics in this chapter, see the following:

- TruCluster Available Server Software Available Server Environment Administration
- TruCluster Available Server Software Hardware Configuration and Software Installation
- TruCluster Available Server Software Version 1.4 Release Notes
- Reference Pages

Introducing Highly Available Services

Overview

The TruCluster Available Server provides an infrastructure that makes applications and system services highly available to clients, minimizing downtime. High availability is achieved by decoupling application downtime from system downtime. If critical resources on a given server become unavailable, the TruCluster Software runs action scripts that can restart an application on another server. Different scripts are run in response to different failure events.
The TruCluster Software detects system failures only; it cannot detect software failures within an application, or corrupted data files. However, the TruCluster Software can recover from system software failures, provided they are severe enough to interrupt inter-server communication over at least one of the redundant communications paths monitored by the software. In some cases, the server that has suffered the failure may be capable of initiating the recovery process. However, when the damaged server is unable to recover, it is up to the surviving servers to recognize that a server went down and react appropriately.
The services that the TruCluster Software supports are configured within an Available Server Environment (ASE). An ASE consists of two to four servers that are loosely coupled through the sharing of one or more networks and one or more SCSI buses. By sending message packets across these redundant paths at regular intervals, the TruCluster Software can determine the current status of the ASE members and initiate appropriate actions when failures occur.

Independence of Services from Servers

An ASE service is an application that is provided to clients, such as a database, an NFS service, or an electronic mail service. Each ASE service has a unique name. Administrators use the name to manage the service, and clients also use the name when specifying their requests for service. This arrangement has the benefit that clients need not know which server is currently providing the service. For example, incoming mail addressed to user@server.site.com will be queued and delayed while machine server is taken off line for repair, while mail addressed to user@mailer.site.com will not be affected, because another server can provide the service named mailer while server is unavailable. A service's name is similar to a machine's hostname, although it cannot be a machine's hostname.

Action Scripts

ASE services are started and stopped through action scripts. Consequently, any application that is placed in an ASE service must be manageable through script-based commands. If the TruCluster Software determines that a resource critical to your service has failed and a redundant resource is available to replace it, the TruCluster Software executes a script that stops the running instance of the service, replaces or reassigns the failed resource, and executes other scripts to start a new instance of the service on another server.
The TruCluster Software does not promise that clients will not notice the interruption, nor does it promise that the client will not need to take action to resume use of the service. The TruCluster Software simply promises to try to restart a broken service.

Introducing the TruCluster Software Components

Overview

The TruCluster Software consists of four daemons, several scripts, a user interface, and two drivers. The presence of drivers in the kit requires that the kernel be rebuilt once the software is copied from the distribution media.
The TruCluster Software is installed in the following directories, some of which are links: /dev, /etc, /opt, /sbin, /sbin/init.d, /sbin/rc*.d, /usr/opt, /usr/bin, /usr/share/man, /usr/sys/kits, /usr/var/adm, /usr/var/ase, /usr/var/opt, /usr/var/run, /var/ase, and /var/opt.

TruCluster Software Components

The TruCluster Software contains the following components:

/usr/sbin/asedirector
The Director daemon has a global view of the state of the services provided by the servers in the ASE domain. Only one instance of the Director can run within a given domain; the Agents collaborate to start a new instance if the member running the Director goes down. The Director assigns services to servers according to the state of the domain, honoring preferences and constraints imposed by the system administrator.

/usr/sbin/aseagent
The Agent daemon oversees a single ASE member server. Each member runs an instance of the Agent, which maintains a near real-time view of the status of its host, communicating this information to the Director.
The Agent deduces the status of service availability from information reported by the Host Status Monitor and the Availability Manager. It starts and stops services on the local member under the Director's direction. The Agent daemon uses the Availability Manager driver interfaces to reserve disks and to receive notification of lost reservations.
The Agents on each member system are responsible for electing a Director. If an Agent exits, the rest of the ASE members consider the affected server to be "down."

/usr/sbin/asehsm
The Host Status Monitor (HSM) daemon monitors the up/down status of the other member nodes and the state of the local network interfaces, reporting any changes to the Agent. On the member where the Director is running, the HSM reports to the Director as well. Each member node runs a single instance of the HSM.


/usr/sbin/aselogger
The aselogger (Logger) daemon writes messages to the system log files under /var/adm/syslog.dated. One instance of the Logger can run on each server within the ASE, but the Logger is not required to run on any member. However, running an instance of the Logger on each ASE member is recommended.

Availability Manager Driver
The Availability Manager (AM) is a pseudodriver that lies on top of the base system's SCSI CAM drivers. It implements functions needed by the HSM that support server-to-server messages on the SCSI bus. The AM also reports I/O subsystem failures to the Agent.

ASE Driver
The ASE driver is a pseudodriver needed to support UFS file systems exported from LSM volumes. If no UFS file systems have LSM volumes assigned to them, the ASE driver is not used.

/usr/sbin/asemgr
The asemgr is the user interface to the Director. The system administrator can run the asemgr from any member node to control the operation of the TruCluster Software. If more than one instance of the asemgr is running within an ASE domain, the state of the database is protected by locks.
Unlike the TruCluster Software daemons, the asemgr is not run continuously. It can be accessed through a text-based interactive interface, and it can also be invoked noninteractively, as from a script.

/usr/var/ase/config/asecdb
The asecdb is the binary database that describes all member nodes and services known to the ASE. It is maintained using the asemgr.

Action Scripts
Action Scripts act as the interface between the TruCluster Software and the ASE services. To make an application ready for use within an ASE, you first write scripts to start and stop the application. You can then use the asemgr to create the service, assign the application's resources to the service, and copy the Action Scripts to the ASE database. The TruCluster Software invokes the scripts to restart the application in the event of a failure. Because Action Scripts are copied into the ASE database, editing the original does not alter the TruCluster Software's copy. Action Script templates are provided in the /var/opt/ASE*/ase/lib directory.
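The general shape of such a script can be sketched as follows. This is a minimal, hypothetical skeleton for illustration only: the start/stop argument convention and the service details are assumptions, not the exact template shipped with the product:

```shell
#!/bin/sh
# Minimal sketch of an ASE action script (hypothetical, not the
# shipped template). The service name and steps are invented.
SERVICE=mailer          # hypothetical service name

do_start() {
    # Steps must run in a fixed, repeatable order.
    echo "reserving storage for $SERVICE"
    echo "starting $SERVICE daemon"
}

do_stop() {
    echo "stopping $SERVICE daemon"
    echo "releasing storage for $SERVICE"
}

case "${1:-start}" in
    start) do_start ;;
    stop)  do_stop ;;
    *)     echo "usage: $0 {start|stop}" >&2
           exit 1 ;;
esac
```

Keeping each step as a plain command is what allows the same logic to be replayed unattended on whichever member takes over the service.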

You can also create user-defined Action Scripts to control the TruCluster Software through the asemgr command line interface. For instance, a cron script could be used to take an ASE service off line, back it up, and place it back on line.

Cluster Monitor
The Cluster Monitor, contained in an optional software subset, provides graphical assistance for TruCluster Software administration based on event reports from the TruCluster Software.

TruCluster Software Component Interaction

The TruCluster Software components work together to monitor the state of the ASE domain and to detect and respond to failures in hardware and system software.
Figure 2-1 depicts an Available Server configuration with two ASE member systems linked by two networks and two shared SCSI buses.

Figure 2-1 ASE Software Component Interaction
[Diagram: ASE Member 1 and ASE Member 2 exchange ICMP network pings and AM/CAM SCSI pings through their kernels. Each member runs aseagent, asehsm, aselogger, and action scripts, and holds a copy of asecdb; asemgr commands are directed to the asedirector, which runs on Member 1.]

The two ASE member systems shown in Figure 2-1 are configured in a symmetric manner. They share a common database, asecdb. Member 1 is running the asedirector and the aselogger. The Director daemon interacts with the Agent daemons on both members to provide coordination of ASE activities.
The Host Status Monitor communicates the status of the SCSI buses and the networks to the Agents and the Director. It receives information about the state of the SCSI bus from the AM driver, which is configured into the kernel.
User commands are sent to the asedirector from both member systems through the asemgr utility. However, if more than one instance of the asemgr is running at a given time, the asecdb becomes locked and management operations cannot be performed.
The Logger daemon logs errors detected on the SCSI bus by the AM driver to the kern.log file. It logs errors detected by the other software components to the daemon.log file.

Understanding TruCluster Failure Detection and Response

Overview

The TruCluster Software monitors the health of the member nodes within the ASE domain, and determines whether and how to react when a member's status appears to change. The decision depends upon the state of the two communication paths between the member nodes: the network and the shared SCSI bus. When a significant failure is detected on a server, the TruCluster Software reassigns the services that were being provided by the affected member to another server. Such an automatic reassignment is called a failover.
The Host Status Monitor (HSM) on each system monitors the state of the other member systems through Internet Control Message Protocol (ICMP) pings over the network and pings generated by the Availability Manager (AM) over the shared SCSI bus. The HSMs on each ASE member determine the status of the other ASE members in the following way:

- Every few seconds, one member node issues a SCSI send command to another member through their respective SCSI host adapters,(1) and the other node responds immediately.
- Over the network path, an ICMP echo, or ping, is exchanged. TruCluster-initiated network echoes carry data as well, to distinguish them from those initiated by other software.
- The failure of a node to respond to these periodic checks (with retries) may ultimately cause the alerted node to invoke the failover logic.

This section examines how the TruCluster Software responds to different types of failures within the ASE.
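The ping-with-retries decision described above can be sketched as a simple loop. The retry count and the always-failing probe below are invented so that the "declare down" path is exercised; they do not reflect the product's actual intervals:

```shell
# Sketch of the retry-then-declare-down logic applied to periodic
# pings. The probe always fails here, purely for demonstration.
MAX_RETRIES=3

probe_member() {
    # Stand-in for a real ICMP or SCSI ping of another member.
    return 1
}

member_status() {
    tries=0
    while [ "$tries" -lt "$MAX_RETRIES" ]; do
        if probe_member; then
            echo up
            return 0
        fi
        tries=$((tries + 1))
    done
    echo down           # all retries exhausted: invoke failover logic
    return 1
}

STATUS=$(member_status)
echo "remote member is $STATUS"
```

Only after every retry fails does the alerted node treat the remote member as down, which is what keeps a single dropped packet from triggering a failover.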

Failure Events that Trigger a Response

There are five types of failure conditions that the TruCluster Software detects and responds to, as follows:

- Member Node failure - An ASE member becomes inoperable (Host Down)
- Critical SCSI path failure - A member detects an I/O error to a shared disk, even though the disk is available
- Device failure - A disk on the shared SCSI bus fails to respond to I/O
- Network Interface failure - A member's network connection fails due to a bad controller, a pulled network cable, or a system crash
- Network Partition - At least two members cannot communicate with each other over the network, even though all members' network interfaces are functional

(1) The TruCluster Software uses the ICMP term ping to describe this interaction.

Member Node Failure

Figure 22 depicts the failure of an ASE domain member. In


this situation, the Host Status Monitor daemon on the local host
member determines that the remote host is down because the
host pings on both the network and the SCSI bus have timed
out.
Figure 2-2: Member Down Scenario (diagram: a local host and a remote host connected by both a shared I/O bus and a network)

When a member failure is detected, the TruCluster Software:

- Notifies the administrator that a member is unavailable by running the Alert script

- Restarts the Director on a surviving member (if it was running on the failed member)

- Restarts each affected service according to its Automatic Service Placement (ASP) policy

When a system crashes, the failed node does not run any of the stop scripts for the services that it was running. Consequently, when the member becomes available again, the ASE Agent will run all of the stop scripts for each service in the ASE database to be sure that any necessary "clean-up" has been performed.


SCSI Bus Failures

The TruCluster Software uses the SCSI buses for two purposes: carrying I/O to devices, and communicating state information between servers. A failure of a SCSI bus is not necessarily a reason to invoke a failover. If another working bus is available, the server state information can pass over it just as well. If there are no storage devices in active use on a failed bus, there is no reason to fail over services associated with those devices (the bus may become available before a disk I/O arrives). Furthermore, an unresponsive disk may belong to an LSM mirrored volume, in which case LSM may be able to field the failure, avoiding a failover. This line of reasoning leads to the establishment of the following conditions as being necessary and sufficient to initiate a failover on a SCSI bus:

- Failure of I/O to a device on a shared bus

- LSM cannot recover from the failure

Note that even if a member node fails to respond to SCSI pings over all SCSI buses, this is still not sufficient to trigger a failover. In the case of a device failure, whether the service fails over or is shut down depends on whether another member is able to reach the disk.

As indicated, failures can occur on a SCSI bus under two basic scenarios:

- Critical SCSI path failure

- Device failure

The next two sections discuss the TruCluster Software's response to each of these conditions.
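The two necessary-and-sufficient conditions above can be expressed as a predicate. The function below is a sketch of that reasoning with invented names; the real decision is made inside the TruCluster daemons, not in a script.

```shell
#!/bin/sh
# Sketch of the stated rule: fail over only when device I/O has failed
# AND LSM could not recover through another plex. Illustrative only.
should_failover() {
    io_failed=$1        # "yes" if I/O to a shared-bus device failed
    lsm_recovered=$2    # "yes" if LSM fielded the failure via another plex
    if [ "$io_failed" = yes ] && [ "$lsm_recovered" = no ]; then
        echo yes        # both conditions met: initiate failover
    else
        echo no         # missed SCSI pings alone never trigger a failover
    fi
}
should_failover yes no    # prints yes
should_failover yes yes   # prints no (LSM fielded the failure)
```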

Critical SCSI Path Failure

A critical SCSI path failure occurs when an ASE service cannot access a shared device even though the device itself is available. At the time of the I/O failure, the TruCluster Software is unable to distinguish a SCSI path failure from an actual device failure.

Figure 2-3 depicts a critical SCSI path failure. In this situation, the Availability Manager on the local host member notifies the Host Status Monitor daemon that the ping over the SCSI bus has timed out. In addition, the Availability Manager notifies the Agent daemon of a device path failure.
When an I/O failure is detected, the TruCluster Software:

- Runs an Alert script to notify the administrator that a device cannot be reached.

- Determines whether the service can continue to run without accessing the affected device (if it is part of an LSM mirrored volume), and either:

  - Leaves the service running, if possible, or

  - Stops the affected service and attempts to restart the service on each eligible ASE member until successful or until all eligible members are tried. The TruCluster Software selects members according to the affected service's ASP policy.

Figure 2-3: Critical SCSI Path Failure Scenario (diagram: a local host and a remote host connected by a shared I/O bus and a network)

If the SCSI path failure does not affect all eligible members, the service will be made available on one of the unaffected members. If the SCSI path failure affects all eligible members, the service will remain unassigned.

Note that if a service cannot be stopped after an I/O failure, the TruCluster Software will reboot the member running the service in order to make the service available on another member. This will occur if the file systems or filesets associated with the affected service cannot be unmounted.

Device Failure

A device failure occurs when a device cannot respond to I/O. The only difference between the TruCluster Software's response to a device failure and its response to a critical SCSI path failure is that when the TruCluster Software tries to restart the service on other eligible members, all attempts will fail if the device has failed, and the service will remain unassigned.


In the case where a failing disk is not mirrored, the TruCluster Software:

- Logs an alert message

- Stops the affected service

- Marks the service as unassigned and issues an alert message

If LSM mirroring is being used, and a plex is available, the TruCluster Software will continue to make the corresponding service available. No further response by the TruCluster Software is necessary. While the service is still available from the current member, it cannot be started on another member, which minimizes the chance of accessing stale data. This is a minor tradeoff between availability and data integrity.

ASE_PARTIAL_MIRRORING Parameter

There is a TruCluster Software runtime parameter named ASE_PARTIAL_MIRRORING which, when set to on, will enable an LSM service with a failed plex to start on another member, but this is not recommended.

The recommendation is as follows: when you set up a service that uses an LSM mirrored volume, the asemgr asks you to choose whether or not to allow the service to move if a more highly favored member becomes available. You should choose to not allow the TruCluster Software to move the service in this case. That way, if a plex has failed, the service will remain available on the member that it is running on. Otherwise, when the TruCluster Software tries to move the service, it will be unable to start it on any member and the service will then remain unavailable.

If you choose to enable ASE_PARTIAL_MIRRORING, you must realize that you are accepting a remote risk to data integrity in favor of service availability. To enable ASE_PARTIAL_MIRRORING, type the following on all ASE members:

# rcmgr set ASE_PARTIAL_MIRRORING on

Network Failures

Any network to which all ASE members are connected may be used for daemon communications or pings. If there are several such fully connected networks, one is designated as the primary network, while the others are considered to be backup networks. Unless you specify otherwise, the hostname network is the primary network.

An ASE member will always use its primary network if it is operating, and automatically switch to a backup upon failure. If one server switches to a backup, the other members will use the same backup to communicate with it, but will continue to use the primary network to communicate with each other. When a failed primary network is restored, the TruCluster Software resumes using it automatically.


When a network ping failure occurs, the Host Status Monitor waits to determine whether the failure is local, or if the network has been partitioned. A network interface failure occurs when the HSM finds that all of the local network interfaces cannot transmit onto the network (the TruCluster Software contains interface test state machines that test each monitored network controller periodically to determine its status).

In contrast, a network partition is declared between two members when another member cannot be pinged over any of the available networks, even though the member does respond to SCSI pings. Therefore, a network partition may or may not involve a local network interface failure.

When a network interface failure is detected, the TruCluster Software will fail ASE services over to another member whose network interface is still working. However, if the network has been partitioned, it does not make sense to fail over services because all other members are affected by the partition. The TruCluster Software's response to a partition is passive; it disables failovers. In short, the necessary and sufficient conditions to initiate a failover due to a network failure are as follows:

- A member node's network interface is down, as indicated by its failure to respond to pings

- A network ping to that member has timed out

Services cannot be restarted while the network is partitioned. The TruCluster Software continues to attempt pings over the failed network at a reduced frequency until the partition is repaired.

Network Interface Failure Response

When a network interface failure is detected, the TruCluster Software:

- Notifies the administrator that a network interface has failed by running the Alert script

- Moves the Director to an unaffected member (if it was running on the affected member)

- Stops all services that were running on the affected member

- Restarts each affected service according to its Automatic Service Placement (ASP) policy

It is important to note that if a service cannot be stopped after a network interface failure, the TruCluster Software will reboot the member running the service to make the service available on another member. This occurs if the file systems or filesets associated with the affected service cannot be unmounted.


Network Partition Response

When a network partition is detected, the TruCluster Software:

- Notifies the administrator that a network partition has occurred by running the Alert script

- Continues to run the existing services

- Ceases to provide further failover or administrative services

The TruCluster Software continues to attempt pings over the failed network at a reduced frequency until the network partition failure gets corrected. During a network partition, the asemgr cannot be used.

When the network partition gets corrected, the Agent daemons select a member node to run the Director daemon, and the Director daemon restarts services.

Monitored Network Failures

By default, all networks used by the TruCluster Software for daemon communications are monitored. However, if you prefer, you can specify the networks whose local interfaces the TruCluster Software maintains a watch on. A monitored network need not be fully connected to all ASE members; any network's interface may be monitored, provided it is an Ethernet or FDDI interface. You use the asemgr to designate which networks you want monitored. When a monitored network's interface fails, a script is run to give your services the opportunity to react.

A network used for daemon communications does not necessarily have to be monitored, nor does a monitored network necessarily have to be used for daemon communications. The notions of primary and backup network are independent of monitor and ignore. For example, you may choose to ignore a primary network if you do not need it for your services. In this case, the TruCluster Software will not notify your service if the interface fails, but it will route its own pings or daemon messages through a backup. Conversely, if you choose to monitor a network that is not shared by all the ASE members, the TruCluster Software will notify your services of a failure, but the TruCluster Software's ability to function will not be affected by that failure.

Service Failover

When the TruCluster Software relocates or fails over a service to another member, it performs a stop script to shut down the application (except in the case of a panic). If applicable, the file systems are then unmounted and the SCSI disk reservations released. The server designated to acquire the service then reserves the disks, checks (UFS) and mounts the file systems, and executes a start script to restart the application. For example, in the NFS service bundled with the product, the stop/start scripts manage an IP address-aliasing process whereby a server answers to the service's IP address (as well as to its own, genuine address).

If the served application is inherently stateless, simple one-line additions to the start and stop action scripts included in the TruCluster Software distribution will often prove sufficient. Where state information must be maintained, however, the scripts may need elaborate enhancement to place an application in the state it was in prior to the relocation or failover. For example, an application that keeps a journal file will likely need to be restarted with an option switch to read and execute the journal. This is the administrator's responsibility. Of course, the journal must be available to the application from the new server; the file should reside on a disk that migrates with the application.
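A start/stop pair of the kind just described might look like the following skeleton. The device, mount point, interface, and address here are invented for illustration, and RUN=echo keeps the commands inert so the dispatch logic can be exercised safely; a real ASE action script would run the commands directly.

```shell
#!/bin/sh
# Hypothetical ASE-style action script skeleton; not from the distribution.
RUN=echo                   # set RUN= (empty) to execute for real
SVC_ADDR=16.140.0.50       # invented service alias address
SVC_DEV=/dev/rz17c         # invented shared disk partition
SVC_MNT=/sources           # invented mount point

ase_action() {
    case "$1" in
    start)
        $RUN fsck -p $SVC_DEV              # check the UFS file system
        $RUN mount $SVC_DEV $SVC_MNT       # mount the service file system
        $RUN ifconfig tu0 alias $SVC_ADDR  # answer to the service address
        return 0 ;;
    stop)
        $RUN ifconfig tu0 -alias $SVC_ADDR # stop answering to the alias
        $RUN umount $SVC_MNT || return 99  # 99: service busy, cannot stop
        return 0 ;;
    *)
        return 1 ;;                        # unknown action: script failed
    esac
}
ase_action start            # echoes the commands it would run
```

The exit codes follow the conventions discussed under Action Script Errors later in this section: 0 for success, 1 for failure, and 99 from a stop script when the service is busy.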
Figure 2-4 depicts a failure in an NFS service. In this scenario, a bad controller on ASE Member 2 causes ASE service sources to fail over to ASE Member 3 (on which another ASE service is already running).

Figure 2-4: Service Failover Scenario (diagram: an ASE domain of three members on a shared SCSI bus; NFS service mail on ASE Member 1 uses /spool/mail, service sources on ASE Member 2 uses /sources, and service users on ASE Member 3 uses /usr/users; NFS clients reach the services over the network)

It is not currently possible to control the order in which several services being moved together are stopped and started. If order is important, they should be managed together as a single service, with a single pair of start and stop scripts.


Note

It is important to understand that as each member enters multiuser run level and runs asemember start, the Agent, as part of its initialization, runs the stop script of each service in its database, regardless of the fact that the service, if running at all, may be running on another server.

As previously described, only a few well-defined events will trigger a service failover. General I/O errors are not reported to the TruCluster Software by the system, so a crashed application or corrupted file system will not trigger a failover. For example, if an application's I/O fails because the file system is full, or because of insufficient privileges, the system will return an error to the application, which may in turn notify the user, but the system will not react further. An I/O failure at a lower level, however, such as a disk not responding to commands, will trigger a failover. (However, if a disk is truly inoperable, no other member would be able to use it either, and a failover would not succeed.) In such a case, the service will be left unassigned and an alert message logged.

Reserving Devices

SCSI reservations are taken out on a service's disks when the service is started. A SCSI device holding a reservation will reject most commands coming from hosts other than the one whose reservation it holds. The inquiry command, however, is always allowed.

When establishing a reservation on a device, the Agent uses the following procedure:

1. The Agent sends a SCSI inquiry command to the device (this is the only time the TruCluster Software pings a device). If the ping fails, the device is unreachable, and the Agent registers a path failure.

2. If the inquiry succeeds, the Agent next attempts to open the device's special file.

3. If the open succeeds, the Agent tries to reserve the device.

4. If this also succeeds, the event is logged and the process is complete.

If the call to open or the attempt to reserve the device fails, the Agent sends an ASE_RESERVATION_FAILURE message to the Director, and the Director simply writes a log message in the daemon.log file. No action is taken until a service actually tries to write to a device; only at that time does the TruCluster Software decide what, if anything, to do.
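The four-step sequence can be sketched as follows. The probe operations are stubbed out with shell functions so the control flow is visible; names like scsi_inquiry are invented, and a real Agent performs these operations through the AM driver, not a shell script.

```shell
#!/bin/sh
# Control-flow sketch of the Agent's reservation procedure (names invented).
scsi_inquiry() { true; }   # stub: step 1, the only "device ping"
open_special() { true; }   # stub: step 2, open the device special file
scsi_reserve() { true; }   # stub: step 3, take out the SCSI reservation

reserve_device() {
    dev=$1
    if ! scsi_inquiry "$dev"; then
        echo "path failure: $dev unreachable"   # Agent registers a path failure
        return 1
    fi
    if open_special "$dev" && scsi_reserve "$dev"; then
        echo "reserved $dev"                    # step 4: log success
        return 0
    fi
    echo "ASE_RESERVATION_FAILURE $dev"         # reported to the Director, log only
    return 1
}
reserve_device rz17   # prints "reserved rz17" with the success stubs
```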


There are two ways to break a reservation: by sending the device a bus device reset message, or by resetting the SCSI bus, which breaks all reservations on the bus. Either action causes an Agent to attempt to reestablish its reservations on all registered devices.

Choosing a New Director

When an Agent on an ASE domain member loses contact with the Director, it uses the following procedure to reconnect to a Director:

1. It walks down the list of members, attempting to connect to a Director on each.

2. If no Director is found, it walks down the list a second time looking for at least one member that satisfies three criteria, as follows:

   - The HSM thinks that the member is up

   - The member's IP address is greater than the local host's (this criterion is used to avoid starting multiple Directors)

   - There is an Agent on that host that responds to a test message (so it is not likely to be hung)

3. If no member meeting all three requirements is found, the local member will start a Director.

4. When the Agent connects to a Director, the Agent sends the Director an online message, which causes the Director to run its database consistency algorithm to ensure that all members use identical copies of the database.
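The three criteria in step 2 can be captured in a predicate. This is an illustration of the published rules with invented inputs; note how the IP-address comparison gives a total order, so no two members can each conclude the other should defer.

```shell
#!/bin/sh
# Sketch of the step-2 eligibility test (illustrative; inputs are invented).
# IP addresses are compared as plain integers here for simplicity.
can_host_director() {
    hsm_up=$1      # "yes" if the HSM believes the member is up
    remote_ip=$2   # candidate member's IP address, as an integer
    local_ip=$3    # this host's IP address, as an integer
    agent_ok=$4    # "yes" if the remote Agent answered a test message
    if [ "$hsm_up" = yes ] && [ "$remote_ip" -gt "$local_ip" ] \
       && [ "$agent_ok" = yes ]; then
        echo yes
    else
        echo no    # if no member qualifies, the local member starts a Director
    fi
}
can_host_director yes 200 100 yes   # prints yes
can_host_director yes 50  100 yes   # prints no (lower address than ours)
```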

Action Script Errors

When the TruCluster Software invokes an action script, it usually considers a 0 (zero) exit code a success. An exit code of 1 indicates that the script failed. A stop script can produce the exit code 99, which indicates that the service could not be stopped because the service was busy. The exception is the check action script, which exits with an exit code between 100 and 200 to indicate that the service is running, and an exit code that is less than 100 to indicate that the service is not running.

All standard output and standard error output from your script goes to the syslog daemon log. If a script exits with a 0 (zero), its messages are logged as informational. If a script exits with a nonzero exit code, the messages are logged as errors.

The timeout value for a script is the specified length of time the TruCluster Software waits for your script to finish running. This value should be the maximum amount of time that the script needs. If your script runs longer than the timeout value (for example, because it hangs), the TruCluster Software will consider the script failed and report the failure as a timeout of the script.
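A minimal check script following these exit-code conventions might look like this. The pidfile path is invented, and 150 is an arbitrary choice within the documented 100-200 "running" band.

```shell
#!/bin/sh
# Sketch of a check action script using the documented exit-code bands.
# The pidfile path is invented; a real script would probe its own service.
check_service() {
    pidfile=$1
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return 150     # 100-200 band: the service is running
    fi
    return 1           # below 100: the service is not running
}
echo $$ > /tmp/demo_svc.pid                   # pretend this shell is the service
rc=0; check_service /tmp/demo_svc.pid || rc=$?
[ "$rc" -ge 100 ] && echo "service running"   # prints: service running
rm -f /tmp/demo_svc.pid
```

Note that the "success" band for a check script is a nonzero range, so callers must test the numeric exit status rather than relying on the shell's usual zero-means-success convention.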


LSM and TruCluster Failover

LSM complements the TruCluster Software in that it is able to protect users and applications from the types of failures from which the TruCluster Available Server cannot recover. LSM can mirror a fileset so that if one of the underlying disks fails, the data is still available through the mirror fileset. For maximum protection, the mirrored fileset should reside on an I/O subsystem whose hardware substrate is completely independent of the failed disk's substrate. By placing the mirrors on different shared SCSI buses, using different cables and other components, the TruCluster Software and LSM together can protect the application from any single point of I/O subsystem hardware failure. LSM protects against disk failure, and the TruCluster Available Server protects against failures in the path to the disk.

When the I/O to a disk fails, the Agent consults the database to see if that disk belongs to a mirrored LSM plex. If so, the Agent runs a script causing LSM to manage the failure and continue I/O through the remaining plex(es). If the LSM script exits with error status, indicating no other plex is available, the Agent will stop the service and mark it unassigned.


Summary
Introducing Highly Available Services

The TruCluster Software provides an infrastructure that makes applications and system services highly available to clients, minimizing downtime. High availability is achieved by decoupling application downtime from system downtime.

The services that the TruCluster Software supports are configured within an Available Server Environment (ASE). An ASE consists of from two to four servers that are loosely coupled through the sharing of one or more networks and one or more SCSI buses.

An ASE service is an application provided to clients, such as a database, an NFS service, or an electronic mail service. Each ASE service has a unique name. Administrators use the name to manage the service, and clients also use the name when specifying their requests for service. This arrangement has the benefit that clients need not know which server is currently providing the service.

ASE services are started and stopped through action scripts. Consequently, any application placed in an ASE service must be manageable through script-based commands. If the TruCluster Software determines that a resource critical to your service has failed and a redundant resource is available to replace it, the TruCluster Software executes a script that stops the running instance of the service, replaces or reassigns the failed resource, and executes other scripts to start a new instance of the service on another server.

Introducing the TruCluster Software Components

All ASE member systems are configured in a symmetric manner. They share a common database, asecdb. One member runs the asedirector. The Director daemon interacts with the Agent daemons on all members to provide coordination of ASE activities.

The Host Status Monitor communicates the status of the SCSI buses and the networks to the Agents and the Director. It receives information about the state of the SCSI bus from the AM driver, which is configured into the kernel.

User commands are sent to the asedirector from any member system through the asemgr utility. However, if more than one instance of the asemgr is running at a given time, the asecdb becomes locked and management operations cannot be performed.

The Logger daemon logs errors detected on the SCSI bus by the AM driver to the kern.log file. It logs errors detected by the other software components to the daemon.log file. It is recommended, but not required, that you run the aselogger on all ASE members.


Understanding TruCluster Software Failure Detection and Response

The TruCluster Software monitors the health of the member nodes within the ASE domain, and determines if and how to react when a member appears to change state. The decision depends upon the state of the two communication paths between the member nodes: the network and the shared SCSI bus. When a significant failure is detected on a server, the TruCluster Software reassigns the services that were being provided from the affected member to another server. Such an automatic reassignment is called a failover.

There are five types of failure conditions that the TruCluster Software detects and responds to, as follows:

- Member Node Failure - An ASE member becomes inoperable (Host Down)

- Critical SCSI Path Failure - A member detects an I/O error to a shared disk, even though the disk is available

- Device Failure - A disk on the shared SCSI bus fails to respond to I/O

- Network Interface Failure - A member's network connection fails due to a bad controller, a pulled network cable, or a system crash

- Network Partition - At least two members cannot communicate with each other over the network, even though all members' network interfaces are functional


Exercises

Introducing Highly Available Services: Exercise

1. Describe the mechanism through which the TruCluster Software determines the status of the ASE members.

2. Discuss some limitations of ASE services.

Introducing Highly Available Services: Solution

1. An ASE consists of from two to four servers that are loosely coupled through the sharing of one or more networks and one or more SCSI buses. By sending message packets across these redundant paths at regular intervals, the TruCluster Software can determine the current status of the ASE members and initiate appropriate actions when failures occur.

2. The TruCluster Software cannot detect software failures within an application, or corrupted data files. In addition, when ASE services are relocated, the TruCluster Software does not promise that clients will not notice any interruption, nor does it promise that the client will not need to take action to resume use of the service. The TruCluster Software simply promises to try to restart a broken service.

Introducing the TruCluster Software Components: Exercise

Describe the function of the aseagent daemon.

Introducing the TruCluster Software Components: Solution

The Agent daemon oversees a single member server. Each member runs an instance of the Agent, which maintains a near real-time view of the status of its host, communicating this information to the Director.

The Agent deduces the status of service availability from information reported by the Host Status Monitor and the Availability Manager. It starts and stops services on the local member under the Director's supervision. The Agent daemon uses the Availability Manager driver interfaces to reserve disks and to receive notification of lost reservations.

The Agents on each member system are responsible for electing a Director. If an Agent exits, the rest of the ASE members consider the affected server to be down.


TruCluster Software Failure Detection and Response: Exercise

Describe the way in which the TruCluster Software monitors the state of the systems in an Available Server configuration.

TruCluster Software Failure Detection and Response: Solution

Each member of the ASE domain is connected to all the other members through both SCSI bus and network communication paths. The member systems use these paths to detect the state of the other systems as follows:

- Every few seconds, one member node issues a SCSI send command to another member through their respective SCSI host adapters.

- Over the network path, an ICMP echo, or ping, is exchanged. TruCluster-initiated network echoes carry data as well, to distinguish them from those initiated by other software.

- The failure of a node to respond to these periodic checks (with retries) may ultimately cause the alerted node to invoke the failover procedures.


3
Configuring TruCluster Available Server Hardware

Configuring TruCluster Available Server Hardware 3-1

About This Chapter


Introduction

This chapter describes the general hardware configuration rules and restrictions. It also lists the supported hardware used to set up TruCluster Available Server Software Version 1.4, and describes how to set up the hardware for TruCluster Available Server. Hardware configuration includes installing the cables, terminators, and signal converters necessary to connect a supported controller to a storage shelf subsystem containing Available Server-supported disks.

Detailed information about installing a computer system, SCSI controller, or storage box is not provided. See the documentation for the specific hardware for information on installing the hardware itself.

Objectives

To set up a TruCluster Available Server environment, you should be able to:

- Determine the supported SCSI controllers, cables, terminators, and storage boxes

- Describe TruCluster Available Server general configuration rules and restrictions

- Configure the hardware necessary to support various TruCluster Available Server configurations

Resources

For more information on the topics in this chapter, see the TruCluster Available Server Software documentation:

- Available Server Environment Administration

- Hardware Configuration and Software Installation

- TruCluster Available Server Software Version 1.4 SPD

Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions
Overview

There are configuration rules and restrictions for each of the supported components, but there are also general rules and restrictions that must be adhered to for any Available Server configuration. This topic covers the general rules and restrictions. Additionally, this topic contains a SCSI bus overview to ensure that you are familiar with SCSI bus terminology, cabling, and termination.

Rules and Restrictions

Following are the general TruCluster Available Server rules and restrictions that govern Available Server configurations:

- The number of systems allowed on a shared bus in an Available Server configuration ranges from two to four. If any system uses a KZMSA XMI-to-SCSI adapter or a PMAZC SCSI controller, the maximum number of systems is reduced to three.

- All the systems in a TruCluster Available Server configuration must be connected to at least one common IP subnet. Available Server allows you to connect redundant networks for failover situations.

- All Available Server member systems must be connected to at least one shared SCSI bus. All systems in the Available Server configuration must see the disks on the shared SCSI bus as the same device number. For instance, an RZ26 installed in BA350 slot 1 on SCSI bus 2 (scsi2) will have a device address of rz17. All member systems in the Available Server configuration must see this disk as rz17. Therefore, in this case, the shared SCSI bus must be SCSI bus 2 on all systems in the Available Server configuration.

- SCSI bus specifications limit the number of devices on an 8-bit (narrow) SCSI bus to 8, and on a 16-bit (wide) SCSI bus to 16. However, although TruCluster Available Server supports wide, differential SCSI devices, Digital UNIX supports only 8 devices on a SCSI bus.

- The length of the shared SCSI bus cannot exceed:

  - 3 meters for single-ended fast SCSI
  - 6 meters for single-ended slow SCSI
  - 25 meters for differential SCSI
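Two of the rules above lend themselves to quick arithmetic checks: total bus length against the per-type limit, and the rz unit numbering that must agree across members (on Digital UNIX, the rz unit number is the bus number times 8 plus the SCSI target ID, which is how a disk at target 1 on scsi2 becomes rz17). The helpers below are an illustration, not a supported tool; lengths are in centimeters to keep the arithmetic integral.

```shell
#!/bin/sh
# Illustrative helpers for two of the configuration rules above.
bus_within_limit() {
    limit=$1; shift           # 300 fast SE, 600 slow SE, 2500 differential
    total=0
    for seg in "$@"; do total=$((total + seg)); done
    [ "$total" -le "$limit" ] && echo yes || echo no
}
rz_unit() {
    echo $(( $1 * 8 + $2 ))   # bus number * 8 + SCSI target ID
}
bus_within_limit 300 90 80 100   # BA350 + host + cable segments: prints yes
rz_unit 2 1                      # target 1 on scsi2: prints 17
```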

The bus length includes the bus length within the system and host adapter and any storage box. Table 3-1 provides some SCSI bus lengths you must consider for Available Server configurations.

Table 3-1 SCSI Bus Lengths in Some Devices and Systems

  Device       SCSI Bus Length
  BA350        0.9 meter
  BA353        0.9 meter
  BA356        1.0 meter
  DEC 7000     0.8 meter
  DEC 10000    0.8 meter

For TURBOchannel-based systems, you must set the


boot_reset console variable to ON. If the variable is not set,
the TURBOchannel option self-test may fail and the system
may not reboot automatically.

When a system with a PMAZC installed is turned on, it
may hang during the PMAZC self-tests if the PMAZC is not
properly terminated.

Although it is not necessary, you should use DWZZA-AA signal
converters whenever KZMSA XMI to SCSI adapters (DEC
7000 and DEC 10000) are used because:
You cannot remove the KZMSA internal terminators. The
DWZZA-AA allows the system to be isolated from the
shared SCSI bus without bringing down the entire shared
SCSI bus.
Because the KZMSA uses almost 1 meter of SCSI cable
internally, you have only 2 meters left (for fast speed)
outside the cabinet. On machines the size of the DEC
7000 or DEC 10000, 2 meters are not enough to connect to
storage devices and another host.

You should use DWZZA-AA signal converters when using
PMAZC SCSI host adapters in fast mode.

You must remove the DWZZA-AA cover to remove termination.
When you replace the cover, ensure that star washers are in
place on all four screws that hold the cover in place. If the
star washers are missing, the DWZZA-AA is susceptible to
problems caused by excessive noise.

The devices connected to the shared SCSI bus must be
installed in a manner that allows you to disconnect them
without affecting bus operation.


An improperly terminated SCSI bus segment will cause
problems. It may operate properly for a period of time with no
error conditions, but cause problems when under heavy load
conditions. An unterminated SCSI bus will not operate.

SCSI Bus
Overview

You cannot use a tape drive on the shared SCSI bus in an
Available Server configuration.

This figure shows the SCSI bus adapter as a single-ended
device. The signal converter converts the single-ended bus to a
differential bus.
Figure 3-1 One SCSI Bus, Two Transmission Methods
[Figure: a single-ended host adapter on a single-ended bus connects through a DWZZA to a differential bus and a differential device.]

Single-ended: This transmission method uses one wire for
the signal and a second wire for ground for each signal. This
method is susceptible to noise. The data path can be 8 or 16
bits. Single-ended devices include:
BA350 (narrow)
BA353 (narrow)
BA356 (wide)
PMAZC
KZMSA
Single-ended side of a DWZZA or DWZZB signal converter

Differential: The signal in this transmission method is
determined by the potential difference between two wires and
is less susceptible to noise than single-ended transmission.
This enables the use of longer cables for a total bus length of
25 meters.
The data path used in a differential SCSI bus is usually 16
bits. Differential devices that can be used in a TruCluster
Available Server environment include:
KZPSA
KZTSA
HSZ10
HSZ40
Differential side of a DWZZA or DWZZB signal converter


A SCSI bus can consist of multiple SCSI bus segments, but
each segment must be terminated on each end.
Note

Figure 3-2 shows the legend used to indicate SCSI bus
termination in figures in this course.

Figure 3-2 Legend for SCSI Bus Termination
[Figure legend symbols:]
  SCSI bus is terminated on the host adapter or device
  The host adapter or device termination has been removed
  SCSI bus is terminated externally to a host adapter or device

Figure 3-3 shows examples of SCSI buses with a host adapter
and devices on the ends of the bus only.
Figure 3-3 SCSI Buses with Devices on Bus Ends Only
[Figure: a single-ended host adapter and single-ended device on a single-ended bus (bus length 6 meters slow, 3 meters fast), and a differential host adapter and differential device on a differential bus (bus length 25 meters).]

Host adapters and devices may be added to the SCSI bus as long
as the bus is terminated only at the bus ends. In Figure 3-4, a
host adapter has been added to the middle of the bus. Note that
termination has been removed from the host adapter in the
middle of the bus.


Figure 3-4 SCSI Bus with Device in the Middle of the Bus
[Figure: a single-ended bus and a differential bus, each with terminated host adapters and devices at the bus ends and an additional host adapter in the middle of the bus; the mid-bus host adapter on each bus has its termination removed.]

A single-ended SCSI bus can be connected to a differential SCSI
bus through a signal converter.
The DWZZA signal converter is a single-ended SCSI to differential
SCSI 8-bit bus converter capable of data transfer rates up to
10M bytes/sec. The DWZZA can be installed between an 8-bit,
single-ended SCSI bus and a 16-bit, differential SCSI bus
operating in 8-bit mode.
The DWZZB signal converter is a single-ended SCSI to differential
SCSI 16-bit bus converter capable of data transfer rates up to
20M bytes/sec. The DWZZB can be installed between a 16-bit,
single-ended SCSI bus and a 16-bit, differential SCSI bus.
A DWZZA (DWZZB) has a single-ended end and a differential
end, is bidirectional, and has termination on each end.
Figure 3-5 shows four examples of SCSI buses using mixed
transmission.

Conguring TruCluster Available Server Hardware 37

Examining TruCluster Available Server General Hardware Conguration Rules and Restrictions

Figure 3-5 SCSI Buses Using Bus Segments with Different
Transmission Methods
[Figure: four buses mixing single-ended and differential segments through DWZZA and DWZZB signal converters.]

  Connecting a single-ended host adapter to a differential device
  (HSZ40) through a DWZZA

  Connecting a differential host adapter (KZPSA) to an 8-bit
  single-ended device (BA350) through a DWZZA

  Connecting a differential host adapter (KZPSA) to a 16-bit
  single-ended device (BA356) through a DWZZB

  Connecting a single-ended host adapter to a single-ended
  device using DWZZAs to increase the bus length and allow
  using fast bus speed

In an Available Server configuration, there is a minimum of two
computer systems with supported SCSI adapters and one shared
storage device. This means that there is always at least one host
adapter or storage device in the middle of the SCSI bus.
One of the objectives of an Available Server environment is to
allow the removal of a host adapter or device without affecting
overall bus termination.
In Figure 3-4, if a host adapter or device is disconnected from
the SCSI bus, the SCSI bus is either opened or improperly
terminated and ceases to function properly.
The use of a "Y" cable or tri-link connector allows the SCSI bus
to be terminated external to a host adapter or device so the
host adapter or device can be removed from the bus without
disrupting SCSI bus operation. The device removal from the bus
is accomplished by disconnecting the "Y" cable from the device
that you want to remove from the bus.
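The termination rule at work here — terminators at exactly the two physical ends of the bus, with internal termination removed everywhere in between — can be sketched as a simple check. This is an illustrative Python model of the rule, not part of any Digital tool; the function name and node representation are our own.

```python
def termination_ok(nodes):
    """nodes: list of (name, terminated) tuples in physical bus
    order. A SCSI bus segment must be terminated at both ends
    and at no point in the middle."""
    if len(nodes) < 2:
        return False
    ends_ok = nodes[0][1] and nodes[-1][1]
    middle_ok = not any(term for _, term in nodes[1:-1])
    return ends_ok and middle_ok

# A host adapter at each end (terminated via "Y" cables) and a
# storage box in the middle with its termination removed:
bus = [("adapter A", True), ("BA350", False), ("adapter B", True)]
print(termination_ok(bus))  # True
```

Disconnecting an end node's "Y" cable from the node, rather than from the bus, leaves the terminated "Y" cable in place, so the list of nodes seen by the bus is unchanged and the check still passes.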

In Figure 3-6, "Y" cables are used to terminate the SCSI bus
external to the host adapter and device. Either host adapter
can be removed from the SCSI bus without disrupting SCSI bus
operation by removing the "Y" cable from the host adapter. Notice
that host adapter and device internal termination is removed
when external termination is used.
Figure 3-6 Using External Termination on the SCSI Bus
[Figure: a differential bus with a differential host adapter at each end and a differential device in the middle, each attached through a "Y" cable; the "Y" cables at the bus ends carry the terminators.]

In Figure 3-7, a "Y" cable is disconnected from one of the host
adapters to remove the host adapter from the SCSI bus. This
allows you to perform maintenance on the host adapter or system
without affecting the SCSI bus. Note that the SCSI bus still has
termination at both ends of the bus and continues to function
normally.
Figure 3-7 Disconnecting a Device from the SCSI Bus
[Figure: the same bus as Figure 3-6, with one host adapter's "Y" cable disconnected from the adapter; the terminated "Y" cable remains on the bus, so termination at both ends is preserved.]


Determining Available Server Hardware Components


Overview

The tables in this section provide lists of supported ASE hardware
components and hardware and firmware revisions (if applicable).

TruCluster
Available
Server
Supported
Systems

A TruCluster Available Server environment can consist of two to
four systems. Table 3-2 lists the systems supported by Available
Server, the type of I/O bus, and the SCSI controllers that can be
used in each system.

Table 3-2 Available Server Supported System Types

System Type                                  Minimum    I/O Bus        SCSI
                                             Firmware                  Controller
                                             Revision
AlphaServer 400 4/166, 4/233                 4.5-0      PCI            KZPSA-BB
AlphaServer 1000 4/200, 4/233, 4/266         4.5-69     PCI            KZPSA-BB
AlphaServer 1000A 4/233, 4/266, 5/300        4.5-72     PCI            KZPSA-BB
AlphaServer 2000 4/200, 4/233, 4/275,        4.5-51     PCI            KZPSA-BB
  5/250, 5/300, 5/350
AlphaServer 2100 4/200, 4/233, 4/275,        4.5-51     PCI            KZPSA-BB
  5/250, 5/300, 5/350
AlphaServer 2100A 4/275                      4.5-60     PCI            KZPSA-BB
AlphaServer 2100A 5/250, 5/300, 5/350        4.5-64     PCI            KZPSA-BB
DEC 3000 Models 300, 300L, 300X, 300LX,      6.7        TURBOchannel   PMAZC,
  400, 400S, 500, 500S, 500X, 600, 600S,                               KZTSA
  700, 800, 800S, and 900 (1)
AlphaServer 4000 5/300, 5/300E               2.0        PCI            KZPSA-BB
AlphaServer 4100 5/300, 5/300E               2.0        PCI            KZPSA-BB
DEC 7000 Models 6xx and 7xx                  4.4        XMI            KZMSA
AlphaServer 8200/8400 5/300, 5/350           3.1        PCI            KZPSA
DEC 10000 Models 6xx and 7xx                 4.4        XMI            KZMSA

(1) Firmware disk Version 3.6
(2) Firmware disk Version 3.7

DECsafe
Supported
SCSI
Controllers

Each system must have one or more SCSI controllers dedicated to
a shared SCSI bus, in addition to any SCSI controller shipped with
the system (which is used for the internal SCSI bus). Table 3-3
lists the supported SCSI controllers along with firmware and
hardware revision information, as well as the transmission
method used by each controller.

310 Conguring TruCluster Available Server Hardware

Determining Available Server Hardware Components

Table 3-3 DECsafe-Supported SCSI Controllers

Controller    Minimum    Minimum    Number of    Applicable System       Data Path
              Firmware   Hardware   SCSI Ports
              Revision   Revision   /Channels
KZMSA (1)     5.6        F03        Two          DEC 7000 or DEC 10000   Fast or Slow, Narrow, Single-ended
KZPSA (2)     A10        F01        One          AlphaServers            Fast, Wide, Differential
KZTSA         A09        F01        One          DEC 3000 series         Fast, Wide, Differential
PMAZC (1,3)   1.8        N/A        Two          DEC 3000 series         Fast or Slow, Narrow, Single-ended

(1) If you have a DECsafe configuration that includes a KZMSA XMI to SCSI adapter or a PMAZC TURBOchannel SCSI
controller, you can have only three systems in the configuration.
(2) You must have the Version 1.1 Firmware Update utility and the Version 1.0 Configuration Diagnostics utility.
(3) If you change the PMAZC firmware revision, the SCSI ID and bus speed may be reset to the defaults of 7 and slow.

BA350, BA353,
and BA356
Storage
Expansion
Units

Shared disks for an Available Server configuration must be
housed externally in storage expansion units (Table 3-4) or DEC
RAID subsystems (Table 3-5).
These storage subsystems house only single-ended disk drives;
you cannot use a tape drive on a shared SCSI bus in an Available
Server configuration.
The BA350, BA353, and BA356 storage expansion units have
an internal single-ended SCSI bus, and that bus length must be
considered when you compute the overall SCSI bus length.
Table 3-4 TruCluster Available Server Supported Storage Expansion Units

Storage Box   Data Path              Internal SCSI Bus Length
BA350         Single-ended, narrow   0.9 meter
BA353         Single-ended, narrow   0.9 meter
BA356         Single-ended, wide     1.0 meter

BA350

Up to seven narrow (8-bit) StorageWorks building blocks (SBBs)
can be installed in the BA350, and their SCSI IDs are based upon
the slot they are installed in. For instance, a disk installed in
BA350 slot 0 has SCSI ID 0; a disk installed in BA350 slot 1 has
SCSI ID 1, and so forth.


The BA350 storage expansion unit contains internal SCSI bus
termination and a SCSI bus jumper. There are occasions when
the termination must be removed from the BA350 (for example,
when daisy-chaining two BA350s together). The jumper is not
removed during normal operation.
The BA350 can be set up for two-bus operation, but that option is
not very useful for a shared SCSI bus and is not covered in this
course.
Figure 3-8 BA350 SCSI Bus
[Figure: the BA350 backplane SCSI bus with connectors JB1 and JA1, internal termination (T) near slot 0, the SCSI bus jumper (J) between slots 4 and 5, and the power supply in slot 7.]

BA353

The SCSI ID for disks installed in a BA353 is defined by device
address switches on the back of the BA353. The switches are
located to the left of the SCSI input and SCSI output connectors,
as shown in Figure 3-9.
The switches are marked as Left (Slot 1), Center (Slot 2), and
Right (Slot 3). Slot 1 is the left-most slot when the BA353 is
viewed from the front.
The On position of a switch generates a logic "1" in the device
address, and switch one is the least significant bit in the device
address. The SCSI IDs shown in Figure 3-9 would be 0, 1, and 2,
left to right.
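The switch arithmetic can be sketched as follows. This illustrative Python helper (the name is our own) treats On as 1 and Off as 0, with switch 1 as the least significant bit of the three-bit SCSI ID.

```python
def ba353_scsi_id(switch1, switch2, switch3):
    """Compute a BA353 slot's SCSI ID from its three device
    address switches; On = logic 1, switch 1 is the LSB."""
    return (switch3 << 2) | (switch2 << 1) | switch1

# IDs 0, 1, and 2 for the three slots, as in the example above:
print(ba353_scsi_id(0, 0, 0),
      ba353_scsi_id(1, 0, 0),
      ba353_scsi_id(0, 1, 0))   # 0 1 2
```

With three switches the ID range is 0 through 7, matching the eight addresses available on a narrow SCSI bus.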


Figure 3-9 BA353 Device Address Switches and SCSI Input and
Output Connectors
[Figure: three banks of on/off device address switches (Left, Center, Right) beside the SCSI input and SCSI output connectors on the rear of the BA353.]

The BA353 has an internal SCSI bus and internal SCSI bus
termination on the output end of the bus. If a cable (BN21H) is
connected to the output connector to connect two BA353 boxes
together, the termination is disabled. There are circumstances
that require the installation of a terminator on the BA353 SCSI
input connector.
BA356

The BA356, like the BA350, can hold up to seven StorageWorks
building blocks. But, unlike the BA350, these SBBs are wide
devices. Also, like the BA350, the SBB SCSI IDs are based upon
the slot they are installed in, but the switches on the personality
module must be set as follows: switches 4, 5, and 6 On and
switches 1, 2, 3, 7, and 8 Off. These are the default switch
positions.
Figure 3-10 shows the relative location of the BA356 SCSI bus
jumper, BA35X-MF. The jumper is accessed from the rear of
the box. For Available Server operations, the jumper, J, should
always be installed in the normal position, behind slot 6. Note
that the SCSI bus jumper is not in the same position in the
BA356 as in the BA350.
Termination for the BA356 is on the personality module, and
is active unless a cable is installed on JB1 to daisy-chain two
BA356s together. In this case, when the cable is connected to
JB1, the personality module terminator is disabled.
Like the BA350, the BA356 can be set up for two-bus operation
by installing a SCSI bus terminator (BA35X-ME) in place of the
SCSI bus jumper. However, like the BA350, two-bus operation in
the BA356 is not very useful for an Available Server environment.
The position behind slot 1 can be used to store the SCSI bus
jumper.


Figure 3-10 shows the relative locations of the BA356 SCSI bus
jumper and the position for storing the SCSI bus jumper if you
do install the terminator. For Available Server operations, the
jumper, J, should always be installed.
Figure 3-10 BA356 Storage Shelf SCSI Bus
[Figure: the BA356 backplane SCSI bus with personality module connectors JB1 and JA1, the jumper/terminator storage position (J/T) behind slot 1, the SCSI bus jumper (J) behind slot 6, and the power supply in slot 7.]

Note that JA1 and JB1 are located on the personality module
(in the top of the box when it is standing vertically). JB1, on
the front of the module, is visible. JA1 is on the left side of the
personality module as you face the front of the BA356, and is
hidden from the normal view.

Supported
Controllers
for DEC RAID
Subsystems

Table 3-5 lists the supported controllers for DEC RAID
subsystems.
Table 3-5 Supported DEC RAID Subsystems

Controller   Firmware Revision   Data Path
HSZ10 (1)    N/A                 Wide
HSZ40-Ax     2.0, 2.5 or later   Wide
HSZ40-Bx     2.5                 Wide
HSZ40-Cx     2.5                 Wide

(1) The HSZ10 must be used in a TruCluster Available Server environment that uses only
PMAZC SCSI controllers.
(2) TruCluster Available Server Software supports dual-redundant HSZ40 configurations.
Both HSZ40 controllers must be on the same shared SCSI bus.


Supported Disk
Devices

Table 3-6 lists supported disk devices. The RZ series of
single-ended disks can be housed in the BA350, BA353, or BA356
storage expansion units or HSZ40 DEC RAID controllers.

Table 3-6 TruCluster Available Server Supported Disk Devices

Disk     Firmware Revision Supported   Data Path
RZ26     T392 and 392A                 Narrow
RZ26L    442D                          Narrow
RZ26L    442E                          Wide
RZ26N    0466 or later                 Narrow
RZ26N    0568 or later                 Wide
RZ28     442C                          Narrow
RZ28     442E                          Wide
RZ28B    0006 or later                 Narrow
RZ28D    0008 or later                 Narrow and wide
RZ28M    0466 or later                 Narrow
RZ28M    0568 or later                 Wide
RZ29B    0011 or later                 Narrow

Signal
Converters

A signal converter is used to convert a single-ended SCSI bus
to a differential bus, or a differential bus to a single-ended bus.
When used in a TruCluster Available Server environment, a
signal converter converts a differential bus to single-ended
to allow the use of differential devices (KZPSA) with the
single-ended storage expansion units (BA350, BA353, and BA356).
The DWZZA and DWZZB signal converters are used in Available
Server configurations. The supported models are shown in
Table 3-7. The DWZZA is an 8-bit signal converter and the
DWZZB is a 16-bit signal converter.


Table 3-7 Supported Signal Converters

DWZZA-AA      Hardware revision: E01 or later
  Single-ended connector: 50-pin low density
  Differential connector: 68-pin high density
  A standalone box requiring connection to a power source.

DWZZA-VA (1)  Hardware revision: F01 or later
  Single-ended connector: StorageWorks compatible 96-pin DIN
  connector; plugs into the BA350 or BA353 backplane connector
  Differential connector: 68-pin high density
  Installed in BA350 slot 0 (2), or any slot in a BA353. Receives
  power from the BA350 or BA353. Does not take up a SCSI ID,
  but its presence prevents having a disk at SCSI ID 0 in a
  BA350 box.

DWZZB-AA (3)  Hardware revision: A01 or later
  Single-ended connector: 68-pin high density
  Differential connector: 68-pin high density
  A standalone box requiring connection to a power source.

DWZZB-VW (3)  Hardware revision: A01 or later
  Single-ended connector: StorageWorks compatible 96-pin DIN
  connector; plugs into the BA356 backplane connector
  Differential connector: 68-pin high density
  Installed in slot 0 (4) of a BA356. Receives power from the
  BA356. Does not take up a SCSI ID, but its presence prevents
  having a disk at SCSI ID 0 in the BA356 box.

(1) DWZZA-AAs and DWZZA-VAs with serial numbers in the range of CX444xxxxx to CX449xxxxx must be upgraded. See
FCO DWZZA-AA-F002 or DWZZA-VA-F001.
(2) If you plug a DWZZA-VA into any BA350 slot other than slot 0, you must install external terminator P/N 12-37004-04
into the JA1 connector and remove the DWZZA-VA's internal, single-ended termination.
(3) The DWZZB-series SCSI bus converters are SCSI-2 and draft SCSI-3 compliant single-ended SCSI to differential 16-bit
converters capable of data transfer rates of up to 20M bytes/s.
(4) If you plug a DWZZB-VW into any BA356 slot other than slot 0, you must install external terminator P/N 12-41768-02
(FR-PCXAR-WJ) into the JA1 connector and remove the DWZZB-VW's internal, single-ended termination.

DWZZA-AA

When you use a DWZZA-AA (standalone DWZZA) in an Available
Server configuration, you typically:

  Remove the differential termination resistor SIPs (see
  Figure 3-11).

  Ensure that the single-ended SCSI-2 termination jumper, J2,
  is installed.

  Attach an H885-AA tri-link connector (or BN21W-0B "Y"
  cable) to the differential connector to allow daisy-chaining
  the differential bus or the installation of bus termination if
  the DWZZA is attached to a controller or device on the end of
  the SCSI bus.

Figure 3-11 shows the location of the DWZZA-AA single-ended
termination SCSI bus jumper, J2, and the differential terminator
resistor SIPs.
Caution

After removing the DWZZA-AA cover to remove termination,
ensure when you replace the cover that the star washers are
in place on all four screws that hold the cover in place. If the
star washers are missing, the DWZZA-AA is susceptible to
problems caused by excessive noise.

Figure 3-11 DWZZA-AA Signal Converter SCSI Bus Termination
[Figure: the DWZZA-AA, showing its 50-pin low-density single-ended connector, 68-pin high-density differential connector, single-ended SCSI-2 termination jumper J2, and differential terminator resistor SIPs.]

DWZZA-VA

When you use a DWZZA-VA (Figure 3-12) in a BA350 storage
expansion unit:

  Remove the DWZZA differential terminator resistor SIPs.

  Ensure that the DWZZA single-ended SCSI-2 termination
  jumper, J2, is installed to provide single-ended termination.

  Attach a BN21W-0B "Y" cable (or H885-AA tri-link connector)
  to the differential connector to allow daisy-chaining the
  differential bus or bus termination if the BA350 is on the end
  of the SCSI bus.

  Install the DWZZA-VA in slot 0 of the BA350.

If you use the DWZZA-VA in a BA353 storage expansion unit:

  Remove the differential termination resistor SIPs.

  Remove the single-ended SCSI termination jumper, J2.

  Attach a BN21W-0B "Y" cable (or H885-AA tri-link connector)
  to the DWZZA differential connector to allow daisy-chaining
  the differential bus.

  Install the DWZZA-VA in any BA353 slot.

  Install terminator part number 12-37004-04 on the BA353
  input connector to terminate the input end of the BA353
  single-ended SCSI bus.


Note

The BA353 single-ended bus extends from the input


connector to the output connector. It has an internal active
terminator on the output end of the bus. The termination
is disabled if a cable is attached to the output connector to
connect one BA353 to another storage box.
The DWZZA-VA and any disks installed in the BA353 are
in the middle of the BA353 SCSI bus. As the bus must be
terminated on both ends, you must install a terminator on
the SCSI input connector. If the DWZZA-VA single-ended
termination is enabled, you create a stub between the
DWZZA-VA and the input connector and the SCSI bus will
not operate properly.

Figure 3-12 DWZZA-VA Signal Converter SCSI Bus Termination
[Figure: the DWZZA-VA, showing its 96-pin single-ended connector, 68-pin high-density differential connector, single-ended SCSI-2 termination jumper J2, and differential terminator resistor SIPs.]

DWZZB-AA

The DWZZB-AA is a 16-bit, single-ended SCSI to differential
converter. It can be connected to either a 16-bit SBB shelf
personality module (BA356-SB) or an SBB shelf SCSI bus. It is
used with the BA356 storage expansion unit to support the use of
wide disks such as the RZ26L-VW. It is a standalone unit and has
its own power supply.
When you use a DWZZB-AA (Figure 3-13) in an Available Server
configuration, you typically:

  Remove the differential termination resistor SIPs.

  Ensure that the single-ended SCSI-2 termination jumpers, W1
  and W2, are installed to provide single-ended termination.


  Attach an H885-AA tri-link connector (or BN21W-0B "Y"
  cable) to the differential connector to allow daisy-chaining
  the differential bus or the installation of bus termination if
  the DWZZB is attached to a controller or device on the end of
  the SCSI bus.

Figure 3-13 shows the location of the DWZZB-AA single-ended
termination SCSI bus jumpers, W1 and W2, and the differential
terminator resistor SIPs.
Figure 3-13 DWZZB-AA Signal Converter SCSI Bus Termination
[Figure: the DWZZB-AA, showing its 68-pin single-ended connector, 68-pin differential connector, single-ended SCSI-2 termination jumpers W1 and W2, and differential terminator resistor SIPs.]

DWZZB-VW

The DWZZB-VW is a single-ended SCSI to differential 16-bit
converter that you plug into slot 0 of a BA356-SB storage
expansion unit for use with wide disks such as the RZ26L-VW.
When you use the DWZZB-VW (Figure 3-14) in the BA356
storage expansion unit, you typically:

  Remove the DWZZB-VW differential terminator resistor SIPs.

  Ensure that the DWZZB-VW single-ended SCSI-2 termination
  jumpers, W1 and W2, are installed to provide single-ended
  termination.

  Attach a BN21W-0B "Y" cable (or H885-AA tri-link connector)
  to the differential connector to allow daisy-chaining the
  differential bus or bus termination if the BA356 is on the end
  of the SCSI bus.

  Install the DWZZB-VW in slot 0 of the BA356.


Figure 3-14 DWZZB-VW Signal Converter SCSI Bus Termination
[Figure: the DWZZB-VW, showing its 96-pin single-ended connector, 68-pin differential connector, single-ended SCSI-2 termination jumpers W1 and W2, and differential terminator resistor SIPs.]

SCSI
Cables and
Terminators
for Available
Server
Configurations

You must know the proper cable to use for a particular connection.
Always check the part number on the cable before you make the
connection.
One of the most critical aspects of cabling an Available Server
configuration is providing proper termination for each segment of
the SCSI bus.
Some configurations require the use of a special connector that
provides for the connection of both a cable and a terminator.
Table 3-8 describes the cables supported for use in an Available
Server configuration.


Table 3-8 Cables Used for Available Server Configurations

BC06P
  Connectors: two 50-pin, low density.
  Use: connect one leg of a BN21V-0B "Y" cable to one leg of
  another BN21V-0B "Y" cable (for instance, to connect two
  PMAZC SCSI controllers together).

BN21H/BN21J
  Connectors: two 50-pin, high density. The BN21J has a
  right-angle connector.
  Use: connect two BA350 or BA353 storage expansion units
  together.

BN21K/BN21L
  Connectors: two 68-pin, high density. The BN21K has one
  right-angle connector; the BN21L has two right-angle
  connectors.
  Use: connect differential adapters, for instance the KZPSA
  or KZTSA, to the differential side of a DWZZA. Connect
  one side of a BN21W-0B "Y" cable or an H885-AA tri-link
  connector to another BN21W-0B "Y" cable or H885-AA
  tri-link connector. Connect a BA356 to the single-ended
  end of a DWZZB-AA, or connect two BA356s together.

BN21R/BN23G
  Connectors: two 50-pin, one high density, one low density.
  Use: connect one leg of a BN21V-0B "Y" cable (for instance
  on a PMAZC) to a BA350 or BA353. Connect a single-ended
  device (KZMSA, PMAZC, BA350, or BA353) to the
  single-ended side of a DWZZA-AA.

BN21V-0B "Y" cable
  Connectors: three 50-pin, one high density, two low density.
  Use: attach the high-density connector to a PMAZC or KZMSA
  port. Attach a terminator to one low-density connector of the
  "Y" cable, and a cable between the "Y" cable and the storage
  box (BN21R) or one leg of another "Y" cable (BC06P). A "Y"
  cable allows the system to be powered down without affecting
  SCSI bus termination.

BN21W-0B "Y" cable
  Connectors: three 68-pin, high density.
  Use: attach to a differential device. Attach an H879-AA
  terminator to one leg of the "Y" cable. Connect a BN21K
  cable between this "Y" cable and another BN21W-0B "Y"
  cable (or a tri-link connector) attached to the differential
  side of a DWZZA or another differential SCSI controller.
  The BN21W-0B "Y" cable is equivalent to the H885-AA
  tri-link connector and enables you to disconnect a system
  from the shared SCSI bus without affecting bus termination.

The terminators and connectors supported for Available Server
configurations are shown in Table 3-9. Note that the tri-link
connector has the same function as a "Y" cable.


Table 3-9 Terminators and Special Connectors

H8574-A or H8860-AA (50 pins, low density)
  Use with a BN21V-0B "Y" cable to terminate a single-ended
  SCSI bus.

H879-AA (68 pins, high density)
  Use with an H885-AA tri-link connector or BN21W-0B "Y"
  cable to terminate a differential SCSI bus.

12-37004-04 (50 pins, high density)
  Installed on the BA353 input connector when a DWZZA-VA
  is installed in the BA353, to terminate the input end of the
  BA353 single-ended SCSI bus.

H885-AA tri-link connector (three connectors, each with 68 pins, high density)
  Attaches to a wide device. Attach an H879-AA terminator
  to one jack of the tri-link connector. Connect a BN21K cable
  between this tri-link connector and another tri-link connector
  (or a BN21W-0B "Y" cable) attached to the differential side of
  a DWZZA or another differential SCSI controller. The tri-link
  connector enables you to disconnect a system from the shared
  SCSI bus without affecting bus termination. The H885-AA
  tri-link connector is equivalent to the BN21W-0B "Y" cable.

Note

Care must be used with the H885-AA tri-link connectors
on AlphaServers due to the spacing between modules.

Network
Options

The systems in the Available Server environment can be
configured using multiple network adapters. This provides better
availability of Available Server functionality by providing failover
between network adapters, thus routing connections between
Available Server members around network path failures.
Multiple network adapters also provide greater flexibility in client
access to Available Server services.
Table 3-10 provides a list of supported network adapters for
TruCluster Available Server Software Version 1.4. Note that
TruCluster Available Server supports only Ethernet and FDDI
network adapters.


Table 3-10 Supported Network Adapters

Adapter      I/O Bus Type    Network Type
DEFEA        EISA            FDDI
DE422        EISA            Lance Ethernet
DE425        EISA            Ethernet
DEFPA        PCI             FDDI
DE435 (1)    PCI             Ethernet
DE500-XB     PCI             Fast Ethernet
DEMFA        XMI             FDDI
DEMNA        XMI             Ethernet
PMAD         TURBOchannel    Ethernet
DEFTA        TURBOchannel    FDDI
DEFZA        TURBOchannel    FDDI

(1) Occasional system failures have been seen when LAT functionality is enabled on
AlphaServer 8200 and 8400 systems which have DE435 PCI/Ethernet network adapters.
Refer to the release notes for further information.


Configuring TruCluster Available Server Hardware


Overview

This section describes the steps necessary to set up TruCluster
Available Server hardware. The steps refer to a specific section
and table. Perform the steps in the table in order, referring to the
figures and examples as necessary.
You must answer the following questions to generate an Available
Server configuration and order the correct hardware for the
configuration.

• What type of controllers do you have?
• What are the constraints on the placement of the systems and storage devices? In other words, how long is the shared SCSI bus? If you are using PMAZCs, do you have to run slow SCSI?
• Are you using single-ended or differential SCSI, or both?
• For greater availability of mirrored volumes, will you run two SCSI buses with separate SCSI controllers?
• Will you use a BA350, BA353, or BA356 storage expansion unit, or an HSZ10 or HSZ40 with a DEC RAID subsystem?
• How many systems are there in the Available Server configuration (two to four)? Keep in mind that you can have only eight devices on the shared SCSI bus. If you have three systems and are using a BA350 storage expansion unit, you can have only five disks in the storage shelf (or storage shelves if you daisy-chain more than one box). If you have a DWZZA-VA in slot 0 of the BA350, you cannot have a disk at SCSI ID 0.
• Will the Available Server configuration use multiple network adapters?

Installing the Network Interfaces

TruCluster Available Server Software Version 1.4 supports multiple network adapters, which may be used both for client access to Available Server services and for Available Server daemon-to-daemon communications.
If the systems did not arrive with the multiple network adapters installed, install each adapter using the documentation included with it, then configure the new network interface with netsetup.
All systems in an Available Server Environment (ASE) must be on the same primary and backup networks. The network interface names for all the common networks must be in each system's /etc/hosts file.
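As an illustration of that requirement, the /etc/hosts entries below sketch a two-member ASE with one primary and one backup network; the member names and addresses are hypothetical, not from this course:

```
# Primary network: one entry per ASE member
16.140.64.1    ase1
16.140.64.2    ase2

# Backup network: the same members' backup interfaces
16.140.80.1    ase1-bkp
16.140.80.2    ase2-bkp
```

Each member's /etc/hosts file must carry the same set of names for the common networks.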


Firmware Update

When you install a SCSI adapter on a system that will be a member of an ASE, you must check the SCSI adapter firmware; it may be out of date. Also, the update procedures may have changed since this course was written. Therefore, obtain the release notes for the applicable system and SCSI adapter and use them to update the firmware.

Note

To obtain the firmware release notes from the Firmware Update Utility CDROM, your kernel must be configured for the ISO 9660 Compact Disk File System (CDFS).
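As a quick way to check that requirement, you can search the kernel configuration file for the CDFS option before booting from the CDROM. This is a sketch: /usr/sys/conf/<HOSTNAME> is the conventional location of the kernel configuration file on Digital UNIX, and the host name in the usage line is hypothetical.

```shell
# Sketch: report whether a kernel configuration file enables the
# ISO 9660 Compact Disk File System (CDFS).
kernel_has_cdfs() {
    conf="$1"    # path to the kernel configuration file
    # When configured, CDFS appears as an "options CDFS" line.
    grep 'CDFS' "$conf" > /dev/null
}

# Hypothetical usage on a member named ASEHOST:
# kernel_has_cdfs /usr/sys/conf/ASEHOST && echo "CDFS configured"
```

If the option is missing, add it and rebuild the kernel (with doconfig on Digital UNIX) before trying to mount the CDROM.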

To obtain the release notes for the firmware update:

• At the console prompt, determine the drive number of the CDROM.
• Boot the Digital UNIX operating system.
• Log in as root.
• Place the Alpha Systems Firmware Update CDROM into the drive.
• Mount the CDROM as follows (/dev/rz4c is used as an example CDROM drive):
  # mount -rt cdfs -o noversion /dev/rz4c /mnt
• Copy the appropriate release notes to your system disk. In the example, we obtain the firmware release notes for the AlphaServer 2100:
  # cp "/mnt/cdrom/doc/alpha_2100_v45_relnote.txt" alpha_2100_v45_relnote.txt
• Unmount the CDROM drive:
  # umount /mnt

• Print the release notes.

Starting Your TruCluster Available Server Configuration

This section describes the steps necessary to set up the Available Server hardware. It assumes that the correct hardware has already been purchased, but you can use the information provided to determine the hardware components you need for any configuration.
Before you start, verify that the system or systems, controllers, and so on are supported by the version of TruCluster Available Server you are planning to install. Refer to Table 3-2 through Table 3-7.
When you are ready to begin your configuration, look for the SCSI controller/storage device combination in the left column of Table 3-11 and refer to the section and table shown in the right column.


Table 3-11 Setting Up TruCluster Available Server Configurations

If your controller is a:

• PMAZC in a single-ended Available Server configuration with a BA350 or BA353 storage expansion unit
  Refer to: Section "Setting Up a Single-Ended Available Server Configuration for Use with PMAZCs and a BA350 or BA353" and Table 3-12
• PMAZC in a differential Available Server configuration with a BA350 or BA353
  Refer to: Section "Setting Up a Differential Available Server Configuration for Use with PMAZCs and a BA350 or BA353" and Table 3-14
• PMAZC in a differential Available Server configuration with a BA356
  Refer to: Section "Setting Up a Differential Available Server Configuration for Use with PMAZCs and a BA356" and Table 3-16
• PMAZC in a differential Available Server configuration with an HSZ10 or HSZ40 and DEC RAID subsystem
  Refer to: Section "Setting Up an Available Server Configuration for Use with PMAZCs and an HSZ40" and Table 3-18
• KZTSA in an Available Server configuration with a BA350, BA353, or BA356
  Refer to: Section "Setting Up an Available Server Configuration Using a KZTSA TURBOchannel to SCSI Adapter" and Table 3-20
• KZTSA in an Available Server configuration with an HSZ40
  Refer to: Section "Setting Up an Available Server Configuration Using a KZTSA TURBOchannel to SCSI Adapter" and Table 3-22
• KZMSA in an Available Server configuration with a BA350, BA353, or BA356
  Refer to: Section "Setting Up an Available Server Configuration with KZMSA SCSI Controllers" and Table 3-25
• KZMSA in an Available Server configuration with an HSZ40
  Refer to: Section "Setting Up an Available Server Configuration with KZMSA SCSI Controllers" and Table 3-26
• KZPSA in an Available Server configuration with a BA350, BA353, or BA356
  Refer to: Section "Setting Up an Available Server Configuration Using KZPSA PCI to SCSI Adapters" and Table 3-27
• KZPSA in an Available Server configuration with an HSZ40
  Refer to: Section "Setting Up an Available Server Configuration Using KZPSA PCI to SCSI Adapters" and Table 3-28
• Mixed configuration with single-ended adapters and differential adapters with a BA350, BA353, or BA356
  Refer to: Section "Setting Up an Available Server Configuration with Mixed Adapter Types and a BA350, BA353, or BA356" and Table 3-29
• Mixed configuration with single-ended adapters and differential adapters with an HSZ40
  Refer to: Section "Setting Up an Available Server Configuration with Mixed Adapter Types and an HSZ40" and Table 3-31
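Whichever row of Table 3-11 applies, a useful cross-check after cabling is to compare the device view from every member: systems sharing a bus should report the same disks at the same SCSI IDs. The sketch below assumes the Digital UNIX scu utility and its show edt (equipment device table) command; the file names in the usage lines are illustrative.

```shell
# Sketch: capture the SCSI device view of one ASE member so it can
# be compared with the view from the other members.
snapshot_scsi_view() {
    out="$1"    # file to receive the equipment device table
    # "scu show edt" lists every device the host sees, per bus/target.
    scu show edt > "$out"
}

# Illustrative usage: run on each member, then compare the files.
# snapshot_scsi_view /tmp/edt.ase1    (on member ase1)
# snapshot_scsi_view /tmp/edt.ase2    (on member ase2)
# diff /tmp/edt.ase1 /tmp/edt.ase2
```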

Setting Up a Single-Ended Available Server Configuration for Use with PMAZCs and a BA350 or BA353

Note

If you have an Available Server configuration that includes a PMAZC SCSI controller, you can have only three systems in the configuration.


Table 3-12 covers the steps necessary to configure PMAZC SCSI controllers in a single-ended Available Server configuration using a BA350 or BA353. In this configuration there is no differential SCSI bus, so there are no DWZZAs.
Figure 3-15 shows two DEC 3000 Model 500 systems with PMAZC TURBOchannel SCSI controllers in an Available Server configuration with a BA350 storage expansion unit.
Figure 3-16 shows two DEC 3000 Model 500 systems with each PMAZC TURBOchannel SCSI controller channel being used for a shared bus in an Available Server configuration with a BA350 storage expansion unit.
Table 3-13 lists the hardware necessary to generate configurations using two or three systems with PMAZC TURBOchannel SCSI controllers and a BA350 or BA353 storage expansion unit.
Table 3-12 PMAZC SCSI Controllers and a BA350 or BA353 with Single-Ended Available Server Configuration

1. For each system using a PMAZC on the shared bus, shut down the system and install the PMAZC. (Refer to: Dual SCSI Module (PMAZC-AA).)
2. If necessary, install jumper W1 to enable the setid console utility to set the PMAZC SCSI ID or bus speed, or to update the firmware. (Refer to: Section "PMAZC Dual SCSI Module Jumpers" and Figure 3-20.)
3. Turn on system power and set the PMAZC SCSI ID and speed if necessary. (Refer to: Section "Verifying and Setting PMAZC and KZTSA SCSI ID and Bus Speed"; Examples 3-1, 3-2, and 3-4.) If firmware has to be updated, boot from the Alpha Systems Firmware Update CDROM. (Refer to the firmware release notes for the system.)
4. Turn off system power and remove jumper W1; store it on an empty jumper rest. (Refer to: Figure 3-20.) Disable the PMAZC internal termination by removing the jumper for the appropriate port (W2 for port A, W3 for port B).
5. Install a BN21V-0B "Y" cable on the appropriate PMAZC port for each PMAZC.
6. For any system containing a PMAZC on either end of the shared bus, attach an H8574-A terminator to one leg of the "Y" cable.
7. Connect adjacent PMAZC controllers together by installing a BC06P cable between the BN21V-0B "Y" cables.
8. If the BA350 or BA353 is on one end of the shared bus (refer to Figure 3-8 for the BA350 or Figure 3-9 for the BA353):
   BA350: Ensure that the BA350 SCSI terminator is installed (behind slot 1). Install a BN21R cable between the "Y" cable on the PMAZC adjacent to the BA350 and BA350 input connector JA1. You might have to remove the disks in slots 0 and 1 to provide room to attach the cable. Note that for normal single-bus operation the BA350 bus jumper (behind slot 5) must be installed.
   BA353: Install a BN21R cable between the "Y" cable on the PMAZC adjacent to the BA353 and the BA353 SCSI input connector.
9. If the BA350 or BA353 is in the middle of the shared bus (refer to Figure 3-8 for the BA350 or Figure 3-9 for the BA353):
   BA350: Remove the BA350 SCSI bus terminator (behind slot 1). Note that the BA350 bus jumper (behind slot 5) must be installed. Install a BN21R cable from each of the two adjacent PMAZC "Y" cables to BA350 SCSI connectors JA1 and JA2.
   BA353: Install a BN21R cable from each of the two adjacent PMAZC "Y" cables to the BA353 SCSI input and SCSI output connectors.


Figure 3-15 Available Server Configuration with Two DEC 3000 Model 500 Systems and a Single-Ended Shared Bus with a BA350

[Figure: two DEC 3000 Model 500 systems joined by a network interface; BN21V-0B "Y" cables on the PMAZCs with H8574-A terminators at the bus ends, BN21R or BN23G cabling between the systems and the BA350.]

Figure 3-16 Available Server Configuration with Two DEC 3000 Model 500 Systems and Two Single-Ended Shared Buses Each with a BA350

[Figure: two DEC 3000 Model 500 systems in which both channels of each PMAZC are used, forming two shared buses; each bus uses BN21V-0B "Y" cables, BN21R or BN23G cables, H8574-A terminators, and its own BA350.]

Table 3-13 provides a list of hardware needed for single-ended Available Server configurations using PMAZC SCSI bus controllers and BA350 or BA353 storage expansion units. As there is no differential SCSI bus, there are no DWZZAs.


Table 3-13 Hardware Needed for a Single-Ended Available Server Configuration with PMAZC SCSI Controllers and a BA350 or BA353 (No DWZZAs)

The table lists, for each number of systems (1), the required quantities of: BN21V-0B "Y" cables; H8574-A terminators; BC06P SCSI cables; and BN21R or BN23G SCSI cables. The quantities depend on whether the BA350 (2) or BA353 is on the end of the bus or in the middle of the bus.

(1) You can have only three systems in an Available Server configuration that includes a PMAZC SCSI controller.
(2) The BA350 internal terminator must be removed.

Setting Up a Differential Available Server Configuration for Use with PMAZCs and a BA350 or BA353

Note

If you have an Available Server configuration that includes a PMAZC SCSI controller, you can have a maximum of three systems in the configuration.

Use Table 3-14 to set up PMAZC SCSI controllers in a differential shared SCSI Available Server configuration with a BA350 or BA353.
Figure 3-17 shows three DEC 3000 Model 500 systems with PMAZC TURBOchannel SCSI controllers in a differential Available Server configuration with a BA350 storage expansion unit.
Table 3-15 lists the hardware necessary to generate configurations using two or three systems which use PMAZC TURBOchannel SCSI controllers, DWZZAs, and a BA350 or BA353 storage expansion unit.

Table 3-14 Setting Up an Available Server Configuration with PMAZC SCSI Controllers and a BA350 or BA353 in a Differential Available Server Configuration

1. For each system using a PMAZC on the shared bus, shut down the system and install the PMAZC. (Refer to: Dual SCSI Module (PMAZC-AA).)
2. If necessary, install jumper W1 to enable the setid console utility to set the PMAZC SCSI ID or bus speed, or to update the firmware. (Refer to: Figure 3-20.)
3. Turn on system power and set the SCSI ID and speed as necessary. (Refer to: Examples 3-1, 3-2, and 3-4.) If firmware has to be updated, boot from the Alpha Systems Firmware Update CDROM. (Refer to the firmware release notes for the system.)
4. Turn off system power and remove jumper W1; store it on an empty jumper rest. (Refer to: Figure 3-20.) Ensure that the appropriate PMAZC termination jumpers (W2 (Port A) or W3 (Port B)) are installed to provide termination for one end of each single-ended SCSI bus.
5. You will need one DWZZA-AA for each system with a PMAZC SCSI controller and one DWZZA for the BA350 or BA353; you can use a DWZZA-VA for the BA350 or BA353. For each DWZZA, remove the five differential terminator resistor SIPs. (Refer to: Figure 3-11.) For each DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed to provide termination for that end of the single-ended SCSI bus segment. (Refer to: Figure 3-11.) For a DWZZA-VA that is to be installed in a:
   BA350: Ensure that the DWZZA-VA single-ended SCSI termination jumper, J2, is installed.
   BA353: Remove the DWZZA-VA single-ended SCSI termination jumper, J2, and install terminator 12-37004-04 on the BA353 SCSI input connector. (Refer to: Figure 3-9.)
6. Connect a BN21W-0B "Y" cable or an H885-AA tri-link connector to the 68-pin differential connector on each DWZZA.
7. Connect an H879-AA terminator to one leg of the BN21W-0B "Y" cable or one side of the H885-AA tri-link connector for the two DWZZAs on the ends of the differential SCSI bus.
8. Connect a BN21R or BN23G cable between each PMAZC and the single-ended connector of a DWZZA-AA.
9. If you are using a DWZZA-AA to connect to the BA350 or BA353, connect the DWZZA-AA to the BA350 or BA353 by installing a BN21R or BN23G cable between the DWZZA-AA single-ended connector and JA1 on the BA350 or the BA353 SCSI input connector. If you are using a DWZZA-VA, install it in slot 0 of the BA350; remember, you no longer have SCSI ID 0 available for a disk in the BA350. (Refer to: Figure 3-8.) Install the DWZZA-VA in any slot in a BA353, and verify that terminator 12-37004-04 has been installed on the BA353 SCSI input connector. (Refer to: Figure 3-9.) You will need one less BN21R (or BN23G) cable if you are using a DWZZA-VA.
10. Connect a BN21K or BN21L cable between the BN21W-0B "Y" cables (or H885-AA tri-link connectors) of all the DWZZAs. Start at the "Y" cable or H885-AA tri-link on one end of the differential bus (one with a terminator) and daisy-chain until you reach the "Y" cable or tri-link with the other terminator. The number of these cables is the same as the number of PMAZC controllers in the Available Server configuration.
11. Ensure that the BA350 terminator and SCSI jumper are both installed. (Refer to: Figure 3-8.)

Figure 3-17 Available Server Configuration with Three DEC 3000 Model 500 Systems with PMAZCs, Differential Shared Bus, and a BA350

[Figure: three DEC 3000 Model 500 systems (PMAZC SCSI IDs 2, 3, and 4) cabled with BN21R or BN23G cables to DWZZA-AAs; BN21K or BN21L differential cables daisy-chain the H885-AA tri-link connectors, with H879-AA terminators at the bus ends; a DWZZA-VA with an H885-AA tri-link connector in the BA350.]

Table 3-15 provides a list of hardware needed for differential Available Server configurations using PMAZC SCSI bus controllers or KZMSA XMI to SCSI bus adapters and BA350 or BA353 storage expansion units. DWZZA signal converters are used, one of which may be a DWZZA-VA. If a DWZZA-VA is used, it must be installed in slot 0 of the BA350.


Table 3-15 Hardware Needed for a Differential Available Server Configuration with PMAZC or KZMSA SCSI Controllers and a BA350 or BA353

The table lists, for each number of systems (1), the required quantities of: DWZZA-AAs or DWZZA-VAs (2); BN21W-0B "Y" cables or H885-AA tri-link connectors; H879-AA terminators; BN21K or BN21L cables; and BN21R or BN23G SCSI cables (with and without a DWZZA-VA (3)).

(1) You can have only three systems in an Available Server configuration that includes a PMAZC SCSI controller or KZMSA XMI to SCSI adapter.
(2) One of the DWZZAs can be a DWZZA-VA; the rest are DWZZA-AAs.
(3) If you use a DWZZA-VA and a BA353 storage expansion unit, you must install terminator 12-37004-04 on the BA353 SCSI input connector.


Setting Up a Differential Available Server Configuration for Use with PMAZCs and a BA356

Note

If you have an Available Server configuration that includes a PMAZC SCSI controller, you can have only three systems in the configuration.

Use Table 3-16 to set up PMAZC SCSI controllers in a differential shared SCSI Available Server configuration with a BA356. Note that there has to be a signal converter for each system and for the BA356. You must use a DWZZA-AA for each PMAZC, but you can use either a DWZZA-VA, DWZZB-VW, or DWZZB-AA at the BA356. Use of a DWZZB-VW is recommended to allow conversion to wide SCSI. This section covers only the use of a DWZZB-VW with the BA356.
Figure 3-18 shows three DEC 3000 Model 500 systems with PMAZC TURBOchannel SCSI controllers in a differential Available Server configuration with a BA356 storage expansion unit.
Table 3-17 lists the hardware necessary to generate configurations using two or three systems which use PMAZC TURBOchannel SCSI controllers, DWZZAs, and a BA356 storage expansion unit.

Table 3-16 Setting Up an Available Server Configuration with PMAZC SCSI Controllers and a BA356 in a Differential Available Server Configuration

1. For each system using a PMAZC on the shared bus, shut down the system and install the PMAZC. (Refer to: Dual SCSI Module (PMAZC-AA).)
2. If necessary, install jumper W1 to enable the setid console utility to set the PMAZC SCSI ID or bus speed, or to update the firmware. (Refer to: Figure 3-20.)
3. Turn on system power and set the SCSI ID and speed as necessary. (Refer to: Examples 3-1, 3-2, and 3-4.) If firmware has to be updated, boot from the Alpha Systems Firmware Update CDROM. (Refer to the firmware release notes for the system.)
4. Turn off system power and remove jumper W1; store it on an empty jumper rest. (Refer to: Figure 3-20.) Ensure that the appropriate PMAZC termination jumpers (W2 (Port A) or W3 (Port B)) are installed to provide termination for one end of each single-ended SCSI bus.
5. You will need one DWZZA-AA for each PMAZC on the shared SCSI bus and a DWZZB for the BA356; you can use a DWZZB-AA or DWZZB-VW on the BA356 end of the shared SCSI bus. For each DWZZB, remove the five differential terminator resistor SIPs. (Refer to: Figure 3-13 or Figure 3-14.) For each DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed to provide termination for the single-ended SCSI bus segment. (Refer to: Figure 3-11.) If a DWZZB-AA is to be used (external to the BA356), ensure that the single-ended SCSI termination jumpers, W1 and W2, are installed. (Refer to: Figure 3-13.) If a DWZZB-VW is to be installed in a BA356 (slot 0), ensure that the single-ended SCSI termination jumpers, W1 and W2, are installed. (Refer to: Figure 3-14.)
6. Connect a BN21W-0B "Y" cable or an H885-AA tri-link connector to the 68-pin differential connector on each DWZZA or DWZZB.
7. Connect an H879-AA terminator to one leg of the BN21W-0B "Y" cable or one side of the H885-AA tri-link connector for the two DWZZ*s on the ends of the differential SCSI bus.
8. Connect a BN21R or BN23G cable between each PMAZC and the single-ended connector of a DWZZA-AA.
9. If you are using a DWZZB-VW, install it in slot 0 of the BA356; remember, you no longer have SCSI ID 0 available for a disk in the BA356.
10. Connect a BN21K or BN21L cable between the BN21W-0B "Y" cables (or H885-AA tri-link connectors) of all the DWZZ*s. Start at the "Y" cable or H885-AA tri-link on one end of the differential bus (one with a terminator) and daisy-chain until you reach the "Y" cable or tri-link with the other terminator. The number of these cables is the same as the number of PMAZC controllers in the Available Server configuration.
11. If you are using a DWZZB-AA, connect the DWZZB-AA to BA356 connector JA1 with a BN21K or BN21L cable. (Refer to: Figure 3-10.)


Figure 3-18 Available Server Configuration with Three DEC 3000 Model 500 Systems with PMAZCs, Differential Shared Bus, and a BA356

[Figure: three DEC 3000 Model 500 systems (PMAZC SCSI IDs 2, 3, and 4), each cabled with a BN21R or BN23G cable to a DWZZA-AA; BN21K or BN21L differential cables daisy-chain the H885-AA tri-link connectors, with H879-AA terminators at the bus ends; a DWZZB-VW with an H885-AA tri-link connector in the BA356.]

Table 3-17 provides a list of hardware needed for differential Available Server configurations using PMAZC SCSI bus controllers or KZMSA XMI to SCSI bus adapters and a BA356. DWZZA-AA, DWZZB-AA, or DWZZB-VW signal converters are used. If a DWZZB-VW is used, it must be installed in slot 0 of the BA356.
Table 3-17 Hardware Needed for a Differential Available Server Configuration with PMAZC or KZMSA SCSI Controllers and a BA356

The table lists, for each number of systems (1), the required quantities of: DWZZ* signal converters (2); BN21W-0B "Y" cables or H885-AA tri-link connectors; H879-AA terminators; BN21K or BN21L SCSI cables; and BN21R or BN23G SCSI cables (with and without a DWZZB-AA (3)).

(1) You can have only three systems in an Available Server configuration that includes a PMAZC SCSI controller or KZMSA XMI to SCSI adapter.
(2) There must be one DWZZA-AA for each system in the Available Server configuration. The other DWZZ* may be a DWZZB-AA or DWZZB-VW.
(3) If you do not use a DWZZB-AA, install the DWZZB-VW in BA356 slot 0.


Setting Up an Available Server Configuration for Use with PMAZCs and an HSZ40

Note

You can have only three systems in an Available Server configuration that includes a PMAZC SCSI controller.

Use Table 3-18 to set up PMAZC SCSI controllers in an Available Server configuration with an HSZ10 or HSZ40. Note that the HSZ10 and HSZ40 are differential devices, so the use of DWZZAs and a differential SCSI bus is required.
Figure 3-19 shows two DEC 3000 Model 500 systems with PMAZC TURBOchannel SCSI controllers in a differential Available Server configuration with an HSZ40 and a DEC RAID subsystem.
Table 3-19 lists the hardware necessary to generate configurations using two or three systems that use PMAZC TURBOchannel SCSI controllers (or KZMSA XMI SCSI adapters), DWZZAs, and an HSZ10 or HSZ40.


Table 3-18 Setting Up an Available Server Configuration with PMAZC SCSI Controllers and an HSZ10 or HSZ40

1. For each system using a PMAZC on the shared bus, shut down the system and install the PMAZC. (Refer to: Dual SCSI Module (PMAZC-AA).)
2. If necessary, install jumper W1 to enable the setid console utility to set the PMAZC SCSI ID or bus speed, or to update the firmware. (Refer to: Figure 3-20.)
3. Turn on system power and set the SCSI ID and speed as necessary. (Refer to: Examples 3-1, 3-2, and 3-4.) If firmware has to be updated, boot from the Alpha Systems Firmware Update CDROM. (Refer to the firmware release notes for the system.)
4. Turn off system power and remove jumper W1; store it on an empty jumper rest. (Refer to: Figure 3-20.) Ensure that the appropriate PMAZC termination jumpers (W2 or W3) are installed to provide termination for one end of each single-ended SCSI bus.
5. You will need one DWZZA-AA for each PMAZC SCSI controller. For each DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed to provide termination for that end of the single-ended SCSI bus segment. Remove the five differential terminator resistor SIPs from each DWZZA. (Refer to: Figure 3-11.)
6. Connect a BN21W-0B "Y" cable or an H885-AA tri-link connector to the input of the HSZ10 or HSZ40 and to the 68-pin differential connector on each DWZZA.
7. Connect an H879-AA terminator to one leg of the BN21W-0B "Y" cable or one side of the H885-AA tri-link connector for the host adapters (or the HSZ10 or HSZ40) that will be on the ends of the shared bus.
8. Connect a BN21R or BN23G cable between each PMAZC and the single-ended connector of a DWZZA-AA.
9. Connect a BN21K or BN21L cable between the BN21W-0B "Y" cables (or H885-AA tri-link connectors) of the DWZZA-AAs and the HSZ10 or HSZ40. The number of these cables is the same as the number of PMAZC controllers in the Available Server configuration. Make sure that you create a daisy-chain while keeping the terminators on both ends of the shared bus.


Figure 3-19 Available Server Configuration with Two DEC 3000 Model 500 Systems with PMAZC SCSI Controllers and an HSZ40

[Figure: two DEC 3000 Model 500 systems with PMAZCs, each connected by a BN21R or BN23G cable to a DWZZA-AA carrying an H885-AA tri-link connector with an H879-AA terminator; BN21K or BN21L cables run to the H885-AA tri-link connector on the HSZ40 and DEC RAID subsystem.]

Table 3-19 provides a list of hardware needed for differential Available Server configurations using PMAZC SCSI bus controllers or KZMSA XMI to SCSI adapters and an HSZ40 with DEC RAID subsystem. This table can also be used for configurations using PMAZC SCSI controllers and an HSZ10.
Table 3-19 Hardware Needed for a Differential Available Server Configuration with PMAZC or KZMSA SCSI Controllers and an HSZ40, or PMAZCs and an HSZ10

The table lists, for each number of systems (1), the required quantities of: DWZZA-AAs; BN21W-0B "Y" cables or H885-AA tri-link connectors; H879-AA terminators; BN21K or BN21L cables; and BN21R or BN23G cables.

(1) You can have only three systems in an Available Server configuration that includes a PMAZC SCSI controller or a KZMSA XMI to SCSI adapter.

You cannot use an HSZ10 in Available Server configurations with a KZMSA XMI to SCSI adapter.


PMAZC Dual SCSI Module Jumpers

Figure 3-20 shows the jumpers on a PMAZC SCSI controller.

Figure 3-20 PMAZC Dual SCSI Module Jumpers

[Figure: PMAZC dual SCSI module showing jumpers W1, W2, and W3 and the jumper rests.]

• Jumper rests are used for storing jumpers that have been removed.
• W2 and W3 terminator jumpers: when installed, these jumpers provide required termination to one end of the two SCSI buses. W2 is for Port A and is the left-most jumper in the figure; W3 is for Port B. They are shown as being removed.
• W1 is the flash memory write jumper. It should not be installed except to update the ROM code or when using the setid utility to change the SCSI ID or bus speed.

Verifying and Setting PMAZC and KZTSA SCSI ID and Bus Speed

This topic provides examples of how to display and change the SCSI ID or bus speed for the PMAZC and KZTSA SCSI controllers.
To display the SCSI ID and bus speed for a PMAZC or KZTSA, shut down the system. Use the console show config command to determine the PMAZC or KZTSA configurations.
Example 3-1 shows that the DEC 3000 Model 500 has PMAZC-AA SCSI controllers in two TURBOchannel slots, TC0 and TC1, and a KZTSA in TURBOchannel slot TC3.


Example 3-1 Displaying DEC 3000 Configuration

>>> show config
DEC 3000 - M500
Digital Equipment Corporation
VPP PAL X5.48-82000101/OSF PAL X1.35-82000201 - Build on 20-JUL-1994 11:07:03.31

TCINFO  DEVNAM      DEVSTAT
------  --------    -------
        CPU         OK KN15-AA -V5.1-5748-t19D-sV;.?-DECchip 21064 P2.1
                    150 MHZ
        OSC         OK
        ASIC        OK
        MEM         OK
8       CXT         OK
        NVR         OK
        SCC         OK
        NI          OK
        ISDN        OK
6       SCSI        OK
TC1     1-PMAZC-AA
TC0     0-PMAZC-AA
TC3     3-KZTSA-AA
>>>

To display the SCSI ID or bus speed for a specific PMAZC or KZTSA, use the t tc# cnfg console command shown in Example 3-2 and Example 3-3. In this command, the number sign (#) specifies the TURBOchannel slot number. For instance, in Example 3-2, the PMAZC-AA controllers in TURBOchannel slots 0 and 1 both have SCSI ID 7 and are set to slow speed.
Example 3-2 Displaying PMAZC Bus Speed and SCSI ID

>>> t tc0 cnfg
DEC PMAZC-AA V2.0    Port A Slow  Port B Slow  (Dual SCSI [53CF96])
BOOTDEV    ADDR    DEVTYPE   NUMBYTES   RM/FX  WP   DEVNAM   REV
-------    ----    -------   --------   -----  --   ------   ---
..HostID.. A/7     INITR
..HostID.. B/7     INITR
>>> t tc1 cnfg
DEC PMAZC-AA V2.0    Port A Slow  Port B Slow  (Dual SCSI [53CF96])
BOOTDEV    ADDR    DEVTYPE   NUMBYTES   RM/FX  WP   DEVNAM   REV
-------    ----    -------   --------   -----  --   ------   ---
..HostID.. A/7     INITR
DKB000     B/0/0   DISK      1GB        FX          RZ26     T386
DKB100     B/1/0   DISK      1GB        FX          RZ26     392A
..HostID.. B/7     INITR
>>>

Example 3-3 shows how to display the SCSI ID for the KZTSA. Note that the KZTSA has only one port.

342 Conguring TruCluster Available Server Hardware

Conguring TruCluster Available Server Hardware

Example 3-3 Displaying KZTSA SCSI ID and Bus Speed

>>> t tc3 cnfg
DEC KZTSA-AA A09    (SCSI = 7, Slow)
--------------------------------------------------
DEV      PID               VID   REV    SCSI DEV
=======  ================  ====  =====  ========
dka0000  HSZ40-Bx (C) DEC  DEC   V21Z   DIR
dka0100  RZ28     (C) DEC  DEC   442D   DIR
dka0300  RZ28B    (C) DEC  DEC   0006   DIR
>>>
To set the SCSI ID or bus speed for both PMAZC ports, use the t command with the following format:

t tc# setid x y

The number sign (#) is the TURBOchannel slot, x is the SCSI ID or speed (s = slow and f = fast) for port A, and y is the SCSI ID or speed for port B.
Example 3-4 shows commands for setting the SCSI ID to 6 for both ports and setting the speed to fast for the PMAZC in TURBOchannel slot 1, then verifying the changes.
Example 3-4 Setting PMAZC SCSI ID and Bus Speed

>>> t tc1 setid 6 6
Precharging
..................................................................
Erasing
..................................................................
Programming
..................................................................
Checksum GOOD
>>> t tc1 setid f f
Precharging
..................................................................
Erasing
..................................................................
Programming
..................................................................
Checksum GOOD
>>> t tc1 cnfg
DEC PMAZC-AA V2.0    Port A Fast  Port B Fast  (Dual SCSI [53CF96])
BOOTDEV    ADDR    DEVTYPE   NUMBYTES   RM/FX  WP   DEVNAM   REV
-------    ----    -------   --------   -----  --   ------   ---
..HostID.. A/6     INITR
..HostID.. B/6     INITR
>>>


Use the same command to set the SCSI ID or bus speed for the
KZTSA, except that the KZTSA has only one port. Example 35
shows how to set the SCSI ID to 5 and set bus speed to fast for
the KZTSA in TURBOchannel slot 1.
Note in Example 35, that after you change the KZTSA SCSI ID,
you must reset the SCSI bus to effect the ID change. The SCSI
bus reset is not needed to change the speed.
Example 3-5 Setting KZTSA SCSI ID and Bus Speed

>>> t tc3 setid 5
>>> t tc3 setid f
>>> t tc3 cnfg
DEC     KZTSA-AA     A09     (SCSI = 7, Fast)
-------------------------------------------------
DEV      PID               VID       REV      SCSI DEV
=======  ================  ========  =======  ========
dka0000  HSZ40-Bx (C) DEC  DEC       V21Z     DIR
dka0100  RZ28     (C) DEC  DEC       D41C     DIR
dka0300  RZ28B    (C) DEC  DEC       0006     DIR
>>> INIT
>>> t tc3 cnfg
DEC     KZTSA-AA     A09     (SCSI = 5, Fast)
-------------------------------------------------
DEV      PID               VID       REV      SCSI DEV
=======  ================  ========  =======  ========
dka0000  HSZ40-Bx (C) DEC  DEC       V21Z     DIR
dka0100  RZ28     (C) DEC  DEC       D41C     DIR
dka0300  RZ28B    (C) DEC  DEC       0006     DIR
>>>

Setting Up an Available Server Configuration Using a KZTSA TURBOchannel to SCSI Adapter

This section is specific to an Available Server configuration using KZTSA TURBOchannel to SCSI adapters. The KZTSA is a differential single-channel SCSI adapter. The use of a KZTSA in a DEC 3000 system simplifies hardware configuration and reduces the total number of required DWZZAs.
Use Table 3-20 to set up an Available Server configuration using KZTSA TURBOchannel to SCSI adapters with a BA350, BA353, or BA356 storage expansion unit.
Figure 3-21 shows an Available Server configuration with two DEC 3000 Model 500 systems with KZTSA TURBOchannel SCSI adapters on a shared bus with a BA350 storage expansion unit.
Figure 3-22 shows an Available Server configuration with two DEC 3000 Model 500 systems with KZTSA TURBOchannel SCSI adapters on a shared bus with a BA356 storage expansion unit.
Table 3-21 lists the hardware necessary to generate configurations using two, three, or four systems. These systems use KZTSA TURBOchannel SCSI adapters or KZPSA PCI SCSI adapters in an Available Server configuration with a BA350, BA353, or BA356 storage expansion unit on the shared bus.
Table 3-20 Setting Up an Available Server Configuration with KZTSA TURBOchannel to SCSI Adapters and a BA350, BA353, or BA356

Step 1. For each DEC 3000 system that will have a KZTSA on the shared SCSI bus, shut down the system and install the KZTSA. (Refer to: KZTSA SCSI Storage Adapter Installation and User's Guide)

Step 2. Disable the KZTSA internal SCSI termination by removing the J1, J2, J3, J6, and J7 terminator packs. (Refer to: Figure 3-24)

Step 3. The default SCSI ID for a KZTSA is 7. Turn on the system power and set the KZTSA SCSI ID if necessary. (Refer to: Examples 3-1, 3-3, and 3-5) If firmware has to be updated, boot from the Alpha Systems Firmware Update CD-ROM. (Refer to: the firmware release notes for the system)

Step 4. You will need one DWZZ*. It can be a DWZZA-AA, DWZZA-VA, or DWZZB-VW. For each DWZZ*, remove the five differential terminator resistor SIPs. (Refer to: Figure 3-11, Figure 3-12, or Figure 3-14, as appropriate) The DWZZ* single-ended termination is dependent on the type of StorageWorks device used:
   BA350: Ensure that the DWZZA-AA or DWZZA-VA single-ended termination jumper, J2, is installed.
   BA353: For a DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed. For the DWZZA-VA, remove the single-ended SCSI termination jumper, J2, and install terminator 12-37004-04 on the BA353 SCSI input connector.
   BA356: Ensure that the DWZZB-VW single-ended termination jumpers, W1 and W2, are installed.

Step 5. Install a BN21W-0B "Y" cable or H885-AA tri-link connector on the external SCSI P-connector of each KZTSA and on the differential connector of the DWZZA or DWZZB.

Step 6. Connect an H879-AA terminator to one leg of each "Y" cable or tri-link connector of the two controllers or the device (KZTSA, BA350, or BA353) that will be on the ends of the shared bus.

Step 7. Connect a BN21K or BN21L cable between all open connectors on BN21W-0B "Y" cables or tri-link connectors, daisy-chaining the devices.

(continued on next page)

Table 3-20 (Cont.) Setting Up an Available Server Configuration with KZTSA TURBOchannel to SCSI Adapters and a BA350, BA353, or BA356

Step 8. If you are using a DWZZA-AA, connect a BN21R or BN23G cable from the DWZZA single-ended connector to JA1 on the BA350 or to the BA353 input connector.

Step 9. Ensure that the BA350 terminator and SCSI jumper are both installed. (Refer to: Figure 3-8) Ensure that the BA356 SCSI jumper is installed. (Refer to: Figure 3-10)

Step 10. If you are using a BA353 with a DWZZA-VA, ensure that terminator 12-37004-04 is installed on the BA353 SCSI input connector. (Refer to: Figure 3-9)

Figure 3-21 Available Server Configuration with Two DEC 3000 Model 500 Systems Using KZTSA SCSI Adapters and a Single-Ended Shared Bus with a BA350

[Figure: each DEC 3000 Model 500 system's KZTSA carries an H885-AA tri-link connector with an H879-AA terminator; BN21K or BN21L cables daisy-chain from each KZTSA to the H885-AA tri-link connector on the DWZZA-VA installed in the BA350; both systems are connected to an Ethernet interface.]

Figure 3-22 Available Server Configuration with Two DEC 3000 Model 500 Systems Using KZTSA SCSI Adapters and a Single-Ended Shared Bus with a BA356

[Figure: each DEC 3000 Model 500 system's KZTSA carries an H885-AA tri-link connector with an H879-AA terminator; BN21K or BN21L cables daisy-chain from each KZTSA to the H885-AA tri-link connector on the DWZZB-VW installed in the BA356; both systems are connected to an Ethernet interface.]
Table 3-21 Hardware Needed for a KZPSA or KZTSA and BA350, BA353, or BA356 Available Server Configuration

[Table: for each number of systems (two, three, or four), lists the required quantities of BN21W-0B "Y" cables or H885-AA tri-link connectors; H879-AA terminators; BN21K or BN21L SCSI cables; DWZZA-AA and BN21R or BN23G cable (1)(2); and DWZZA-VA (1)(3) or DWZZB-VW (1).]

(1) Use either a DWZZA-AA or DWZZA-VA with a BA350 or BA353. Use a DWZZB-VW with a BA356.
(2) The BN21R (BN23G) cable is not needed if you use a DWZZA-VA or DWZZB-VW.
(3) If you use a DWZZA-VA with a BA353, you need a 12-37004-04 terminator installed on the BA353 SCSI input connector.
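The cabling steps above all enforce the same two shared-bus rules: every device needs a unique SCSI ID in the range 0 to 7, and only the two devices at the physical ends of the daisy-chain may be terminated. A small, hypothetical checker for those rules is sketched below; the function name and input format are ours, not from the guide, and for simplicity it treats a storage shelf as one device with one ID (real shelves hold several drives):

```python
# Hypothetical configuration checker for the shared-bus rules stated in the
# text: every device on the bus needs a unique SCSI ID in 0..7, and exactly
# the two devices at the ends of the daisy-chain may be terminated.
def check_shared_bus(devices):
    """devices: ordered list of (name, scsi_id, terminated) along the chain.
    Returns a list of rule violations (an empty list means the bus is legal)."""
    problems = []
    ids = [scsi_id for _, scsi_id, _ in devices]
    if any(not 0 <= i <= 7 for i in ids):
        problems.append("SCSI IDs must be in the range 0-7")
    if len(set(ids)) != len(ids):
        problems.append("SCSI IDs on a shared bus must be unique")
    for pos, (name, _, terminated) in enumerate(devices):
        at_end = pos in (0, len(devices) - 1)
        if at_end and not terminated:
            problems.append(f"{name} is at the end of the bus but unterminated")
        if not at_end and terminated:
            problems.append(f"{name} is mid-bus and must not be terminated")
    return problems

# Two KZTSAs on the ends of the bus, storage shelf daisy-chained in the middle:
bus = [("kztsa0", 6, True), ("ba350", 0, False), ("kztsa1", 7, True)]
print(check_shared_bus(bus))  # []
```

The same check applies unchanged to the PMAZC, KZMSA, and KZPSA configurations later in this chapter, because the ID and termination rules are properties of the shared bus, not of any particular adapter.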

Use Table 3-22 to set up an Available Server configuration using KZTSA TURBOchannel to SCSI adapters with an HSZ40. Remember, the KZTSA is a TURBOchannel to fast-wide differential SCSI adapter; therefore, when used with an HSZ40, you do not have to use a DWZZA in the Available Server configuration.
Figure 3-23 shows an Available Server configuration with two DEC 3000 Model 500 systems with KZTSA TURBOchannel SCSI adapters on a shared bus with an HSZ40.
Table 3-23 lists the hardware necessary to generate configurations using two, three, or four systems. These systems use KZTSA TURBOchannel SCSI adapters or KZPSA PCI SCSI adapters in an Available Server configuration with an HSZ40 on the shared bus.

Table 3-22 Setting Up an Available Server Configuration with KZTSA TURBOchannel to SCSI Adapters and an HSZ40

Step 1. For each DEC 3000 system that will have a KZTSA on the shared SCSI bus, shut down the system and install the KZTSA. (Refer to: KZTSA SCSI Storage Adapter Installation and User's Guide)

Step 2. Disable the KZTSA internal SCSI termination by removing the J1, J2, J3, J6, and J7 terminator packs. (Refer to: Figure 3-24)

Step 3. The default SCSI ID for a KZTSA is 7. Turn on the system power and set the KZTSA SCSI ID if necessary. (Refer to: Examples 3-1, 3-3, and 3-5)

Step 4. Install a BN21W-0B "Y" cable or H885-AA tri-link connector on the external SCSI P-connector of each KZTSA and on the HSZ40 input connector.

Step 5. Connect an H879-AA terminator to one leg of each "Y" cable or tri-link connector of the two controllers or the device (KZTSA, HSZ40) that will be on the ends of the shared bus.

Step 6. Connect a BN21K or BN21L cable between all open connectors on BN21W-0B "Y" cables or tri-link connectors, daisy-chaining the devices.

Figure 3-23 Two DEC 3000 Model 500 Systems with KZTSA TURBOchannel SCSI Adapters in an Available Server Configuration with an HSZ40

[Figure: each DEC 3000 Model 500 system's KZTSA carries an H885-AA tri-link connector with an H879-AA terminator; BN21K or BN21L cables daisy-chain to a plain H885-AA tri-link connector on the HSZ40 with DEC RAID subsystem; both systems are connected to a network interface.]

Table 3-23 Hardware Needed for an Available Server Configuration with a KZPSA or KZTSA and an HSZ40

[Table: for each number of systems (two, three, or four), lists the required quantities of BN21W-0B "Y" cables or H885-AA tri-link connectors; H879-AA terminators; and BN21K or BN21L SCSI cables.]
Figure 3-24 KZTSA Jumpers and Termination

[Figure: KZTSA module showing the internal SCSI bus P-connector; the near-end SCSI bus terminator packs (J1, J2, J3, J6, J7); the yellow LED (power-on self-test passed); the red LED (power-on self-test failed); the green LED (SCSI bus terminator power is functional); jumper W1 (installed: in-line fuse that protects the onboard SCSI bus terminator power supply); jumper W3 (installed: enables terminator power onto the SCSI bus); jumper W2 (not installed: manufacturing use only); and jumper W4 (not installed: manufacturing use only).]

Setting Up an Available Server Configuration with KZMSA SCSI Controllers

The KZMSA is an XMI to SCSI adapter used in DEC 7000 or DEC 10000 systems. It is a dual-channel, single-ended SCSI controller. The KZMSA internal termination cannot be removed. It must therefore be used with a DWZZA-AA signal converter to provide the proper SCSI bus termination and allow isolation of a KZMSA and its associated system for maintenance purposes. When using a KZMSA for a shared SCSI bus in an Available Server configuration, make sure that you are connecting the bus to the same KZMSA channel as on other KZMSA or PMAZC SCSI controllers.
Each KZMSA used for a shared SCSI bus in an Available Server configuration must have the revision F03 boot ROM. If necessary, a revision F01 or F02 boot ROM must be replaced with a revision F03 boot ROM. The part numbers for the various revisions of KZMSA boot ROMs are shown in Table 3-24.

Table 3-24 KZMSA Boot ROM Part Numbers

Part Number    Revision
23-368E9-01    F01
23-386E9-01    F02
23-419E9-01    F03

You can determine the KZMSA hardware revision by booting the LFU utility and using the console commands, or by examining the 23-class part number printed on the boot ROM located at module position E7. The LFU utility is covered later.
Only KZMSAs with Rev D NCR 53C710 chips can be used in an Available Server configuration. The chip must have part number 609-3400546 or 609-3400563.
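The two eligibility checks above (an F03 boot ROM and a Rev D NCR 53C710 chip) can be sketched as a small validation routine. The part-number tables come from the text; the function name and input format are hypothetical, for illustration only:

```python
# Hypothetical helper: decide whether a KZMSA module can be used on a shared
# SCSI bus, per the rules in the text. Boot ROM part numbers come from
# Table 3-24; chip part numbers from the NCR 53C710 requirement above.
BOOT_ROM_REVISIONS = {
    "23-368E9-01": "F01",
    "23-386E9-01": "F02",
    "23-419E9-01": "F03",
}
REQUIRED_BOOT_ROM = "F03"
SUPPORTED_CHIP_PARTS = {"609-3400546", "609-3400563"}

def kzmsa_supported(boot_rom_part, chip_part):
    """Return (ok, reason). Only an F03 boot ROM and a Rev D chip qualify."""
    revision = BOOT_ROM_REVISIONS.get(boot_rom_part)
    if revision != REQUIRED_BOOT_ROM:
        return False, f"boot ROM {boot_rom_part} is revision {revision}, need {REQUIRED_BOOT_ROM}"
    if chip_part not in SUPPORTED_CHIP_PARTS:
        return False, f"NCR 53C710 part {chip_part} is not a supported Rev D chip"
    return True, "supported"

print(kzmsa_supported("23-419E9-01", "609-3400546"))     # (True, 'supported')
print(kzmsa_supported("23-368E9-01", "609-3400546")[0])  # False (F01 boot ROM)
```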
Follow the steps in Table 3-25 to set up an Available Server configuration using only KZMSAs and a BA350, BA353, or BA356 storage expansion unit.
Figure 3-25 shows an Available Server configuration with two DEC 7000 systems with KZMSA XMI to SCSI adapters on a shared bus with a BA350.
Table 3-15 provides a list of hardware needed for differential Available Server configurations using PMAZC SCSI bus controllers or KZMSA XMI to SCSI bus adapters and BA350, BA353, or BA356 storage expansion units. DWZZ* signal converters are used, one of which may be a DWZZA-VA or DWZZB-VW. If a DWZZA-VA is used, it must be installed in slot 0 of the BA350. If a DWZZB-VW is used, it must be installed in slot 0 of the BA356.

Note

You can have only three systems in an Available Server configuration that includes a KZMSA XMI to SCSI adapter.
Table 3-25 Setting Up an Available Server Configuration with KZMSA XMI to SCSI Adapters and a BA350, BA353, or BA356

Step 1. For each DEC 7000 or DEC 10000 system using a KZMSA on the shared bus, shut down the system and install the KZMSA in an XMI slot, keeping in mind that all SCSI controllers on the shared SCSI bus in an Available Server configuration must be on the same logical SCSI bus. (Refer to: KZMSA Adapter Installation Guide) Then configure the KZMSA hardware:
   Boot the Loadable Firmware Update (LFU) utility to configure the KZMSA hardware. (Refer to: Example 3-6)
   Update the KZMSA firmware if necessary. (Refer to: Examples 3-7 and 3-8)
   Set the SCSI IDs for the KZMSA. (Refer to: Examples 3-7 and 3-9)
   Enable the Disable Reset configuration option for any KZMSA channel that will be used for a shared SCSI bus, and disable the option for any channel not used on a shared SCSI bus. (Refer to: Examples 3-7 and 3-9)
   Enable (or disable) fast SCSI speed for the KZMSA. (Refer to: Examples 3-7 and 3-9)

Step 2. You will need one DWZZA-AA for each KZMSA on the shared SCSI bus and a DWZZ* for the BA350, BA353, or BA356. For each DWZZ*, remove the five differential terminator resistor SIPs. (Refer to: Figure 3-11, Figure 3-12, or Figure 3-14) Then check the single-ended termination:
   For each DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed to provide termination for the single-ended SCSI bus segment. (Refer to: Figure 3-11)
   If a DWZZA-VA is to be installed in the BA350 or BA353, make sure that the single-ended SCSI termination jumper, J2, is installed. (Refer to: Figure 3-12)
   If a DWZZB-VW is to be installed in a BA356, make sure that the single-ended SCSI termination jumpers, W1 and W2, are installed. (Refer to: Figure 3-14)
   If a DWZZB-AA is to be used (external to the BA356), ensure that the single-ended SCSI termination jumpers, W1 and W2, are installed. (Refer to: Figure 3-13)

Step 3. For each KZMSA used in the Available Server configuration, install a BN21R or BN23G cable between the KZMSA connector for the appropriate channel and the single-ended connector on a DWZZA-AA.

Step 4. Install a BN21W-0B "Y" cable or an H885-AA tri-link connector on the differential connector of each DWZZ*. If you use a DWZZA-VA, install it in slot 0 of the BA350, or any BA353 slot. For a BA356, use a DWZZB-VW and install it in slot 0 of the BA356.

Step 5. Install an H879-AA differential terminator on one leg of the two BN21W-0B "Y" cables or tri-link connectors for the two adapters or the device that will be on the ends of the shared SCSI bus.

Step 6. If you are using a DWZZA-AA for the connection to the BA350 or BA353, connect a BN21R or BN23G cable between the single-ended DWZZA-AA connector and BA350 JA1 or the BA353 SCSI input connector. If you are using a DWZZB-AA for the connection to the BA356, connect a BN21K or BN21L cable between the single-ended DWZZB-AA connector and BA356 JA1.

Step 7. Install a BN21K or BN21L cable between the open connections on the BN21W-0B "Y" cables or tri-link connectors, creating a daisy-chain from one DWZZ* to another. Make sure that the "Y" cables or tri-link connectors with terminators are on the ends of the bus.

Step 8. Ensure that the BA350 terminator and SCSI jumper are both installed. Ensure that the BA356 SCSI jumper is installed. (Refer to: Figures 3-8 and 3-10)

Step 9. If you are using a BA353 with a DWZZA-VA, ensure that terminator 12-37004-04 is installed on the BA353 SCSI input connector. (Refer to: Figure 3-9)

Figure 3-25 Available Server Configuration with Two DEC 7000 with KZMSA XMI SCSI Adapters and a BA350

[Figure: each DEC 7000 system's KZMSA connects through a BN21R or BN23G cable to its own DWZZA-AA; the differential sides of the DWZZA-AAs carry H885-AA tri-link connectors (those on the ends of the bus with H879-AA terminators) and are daisy-chained with BN21K or BN21L cables; a third DWZZA-AA connects the shared bus to the BA350 through a BN21R or BN23G cable; both systems are connected to a network interface.]

Figure 3-26 Available Server Configuration with Two DEC 7000 with KZMSA XMI SCSI Adapters and a BA356

[Figure: each DEC 7000 system's KZMSA connects through a BN21R or BN23G cable to a DWZZA-AA; the differential sides of the DWZZA-AAs carry H885-AA tri-link connectors (one with an H879-AA terminator) and are daisy-chained with BN21K or BN21L cables to the DWZZB-VW installed in the BA356, which carries an H885-AA tri-link connector with an H879-AA terminator; both systems are connected to a network interface.]

Follow the steps in Table 3-26 to set up an Available Server configuration using only KZMSAs and an HSZ40.
Figure 3-27 shows an Available Server configuration with two DEC 7000 systems with KZMSA XMI SCSI adapters on a shared bus with an HSZ40.
Table 3-19 provides a list of hardware needed for differential Available Server configurations using PMAZC SCSI bus controllers or KZMSA XMI to SCSI adapters and an HSZ40 with DEC RAID subsystem.
Table 3-26 Setting Up an Available Server Configuration with KZMSA XMI to SCSI Adapters and an HSZ40

Step 1. For each DEC 7000 or DEC 10000 system using a KZMSA on the shared bus, shut down the system and install the KZMSA in an XMI slot, keeping in mind that all SCSI controllers on the shared SCSI bus in an Available Server configuration must be on the same logical SCSI bus. (Refer to: KZMSA Adapter Installation Guide) Then configure the KZMSA hardware:
   Boot the Loadable Firmware Update (LFU) utility to configure the KZMSA hardware. (Refer to: Example 3-6)
   Update the KZMSA firmware if necessary. (Refer to: Examples 3-7 and 3-8)
   Set the SCSI IDs for the KZMSA. (Refer to: Examples 3-7 and 3-9)
   Enable the Disable Reset configuration option for any KZMSA channel that will be used for a shared SCSI bus and disable the option for any channel not used on a shared SCSI bus. (Refer to: Examples 3-7 and 3-9)
   Enable (or disable) fast SCSI speed for the KZMSA. (Refer to: Examples 3-7 and 3-9)

Step 2. You will need a DWZZA-AA for each KZMSA XMI to SCSI adapter in the Available Server configuration. For each DWZZA-AA, ensure that the single-ended SCSI jumper, J2, is installed, and remove the five differential terminator resistor SIPs. (Refer to: Figure 3-11)

Step 3. For each KZMSA used in the Available Server configuration, install a BN21R or BN23G cable between the KZMSA connector for the appropriate channel and the DWZZA-AA single-ended connector.

Step 4. Install a BN21W-0B "Y" cable or an H885-AA tri-link connector on the HSZ40 input connector and on each DWZZA-AA differential connector.

Step 5. Install an H879-AA differential terminator on one leg of the BN21W-0B "Y" cables or H885-AA tri-link connectors for the two adapters or the device that will be on the ends of the shared SCSI bus.

Step 6. Install a BN21K or BN21L cable between the unused connections on the BN21W-0B "Y" cables or tri-link connectors, creating a daisy-chain from one DWZZA to another. Make sure that the "Y" cables or tri-link connectors with the terminators are on the ends of the bus.

Figure 3-27 Available Server Configuration with Two DEC 7000 Systems Using KZMSA XMI to SCSI Adapters with an HSZ40

[Figure: each DEC 7000 system's KZMSA connects through a BN21R or BN23G cable to a DWZZA-AA carrying an H885-AA tri-link connector with an H879-AA terminator; BN21K or BN21L cables daisy-chain the DWZZA-AAs to a plain H885-AA tri-link connector on the HSZ40 with DEC RAID subsystem; both systems are connected to a network interface.]

Preparing a KZMSA for Use in an Available Server Environment

If you are using a DEC 7000 or DEC 10000 system with a KZMSA in your Available Server configuration, you may have to update the KZMSA firmware, change the SCSI ID or bus speed, or enable or disable the Disable Reset option.
For the DEC 7000 and DEC 10000 systems, use the Loadable Firmware Update (LFU) utility to perform these hardware tasks. Shut down the system, then load the LFU as shown in Example 3-6:

1. At the console prompt, use the show device kzmsa0 command to determine the name of the RRD42 drive.
2. Load the CD-ROM into an RRD42 caddy and insert the caddy into the RRD42 drive. The CD-ROM that includes both the LFU utility and the KZMSA revision 5.6 firmware has the label: Alpha AXP Systems Firmware Update 2.9
3. Boot the LFU utility.

Example 3-6 Booting the LFU Utility

>>> show device kzmsa0                                    [1]
polling for units on kzmsa0, slot2, xmi0...
dka100.1.0.2.0     dka100     RRD42
>>> boot DKA100 -flag 0,80                                [2]
Booting...
Boot File: KZMSA_LFU.EXE                                  [3]

****** Loadable Firmware Update Utility ******
--------------------------------------------------------------
Function   Description
--------------------------------------------------------------
Display    Displays the system's configuration table.
Exit       Returns to loadable offline operating environment.
List       Lists the device types and firmware revisions
           supported by this revision of LFU.
Modify     Modifies port parameters and device attributes.
Show       Displays device mnemonic, hardware and firmware
           revisions.
Update     Replaces current firmware with loadable data image.
Verify     Compares loadable and device images.
? or Help  Scrolls the function table.
--------------------------------------------------------------
Function?                                                 [4]

[1] Determine the name of the RRD42 drive.
[2] Boot the LFU utility.
[3] When prompted, specify the name of the secondary bootstrap file (KZMSA_LFU.EXE).
[4] Enter the command for the task you want to perform.

You can display information about the hardware configuration with the LFU utility using the display command, as shown in Example 3-7.

Example 3-7 Using the LFU Utility to Display Hardware Configuration

Function? display                                         [1]
    Name      Type     Rev    Mnemonic    FW Rev   HW Rev
LSB
 0+ KN7AA    (8001)   0000   kn7aa0      1.0      E04
 5+ MS7AA    (4000)   0000   ms7aa0      N/A      A01
 7+ MS7AA    (4000)   0000   ms7aa1      N/A      A01
 8+ IOP      (2000)   0001   iop0        N/A      A
C0  XMI                      xmi0
 8+ DWLMA    (102A)   A5A6   dwlma0      N/A      A
 B+ KZMSA    (0C36)   5143   kzmsa0 [2]  4.3      F01
 C+ KZMSA    (0C36)   5143   kzmsa1 [2]  4.3      F01
 E+ DEMNA    (0C03)   060B   demna0      6.8
C1  XMI
 1+ KZMSA    (0C36)   5343   kzmsa2 [3]  4.3      F03
 2+ KZMSA    (0C36)   5343   kzmsa3 [3]  4.3      F03
 8+ DWLMA    (102A)   A5A6   dwlma1      N/A      A
Function?

[1] Enter the display command to display the configuration.
[2] kzmsa0 and kzmsa1 have the revision 4.3 firmware and the revision F01 hardware.
[3] kzmsa2 and kzmsa3 have the revision 4.3 firmware and the revision F03 hardware.

If the KZMSA firmware is not up to the correct revision, use the LFU utility update command to update it. Note that the CD-ROM containing the firmware must be installed in the RRD42. The update command has the format:

update kzmsa#

where the number sign (#) indicates the number of the KZMSA which is to have the firmware updated.
Example 3-8 shows how to update the firmware for kzmsa2 to version 5.6.

Example 3-8 Using the LFU Utility to Update KZMSA Firmware

Function? update kzmsa2                                   [1]
Update kzmsa2? [Y/(N)] Return
WARNING: updates may take several minutes to complete for each device.
DO NOT ABORT!
kzmsa2 Updating to 5.6... Reading Device... Verifying 5.6... PASSED.
Function? display                                         [2]
    Name      Type     Rev    Mnemonic    FW Rev   HW Rev
LSB
 0+ KN7AA    (8001)   0000   kn7aa0      1.0      E04
 5+ MS7AA    (4000)   0000   ms7aa0      N/A      A01
 7+ MS7AA    (4000)   0000   ms7aa1      N/A      A01
 8+ IOP      (2000)   0001   iop0        N/A      A
C0  XMI                      xmi0
 8+ DWLMA    (102A)   A5A6   dwlma0      N/A      A
 B+ KZMSA    (0C36)   5143   kzmsa0 [3]  4.3      F01
 C+ KZMSA    (0C36)   5143   kzmsa1 [3]  4.3      F01
 E+ DEMNA    (0C03)   060B   demna0      6.8
C1  XMI
 1+ KZMSA    (0C36)   5356   kzmsa2 [4]  5.6      F03
 2+ KZMSA    (0C36)   5343   kzmsa3 [5]  4.3      F03
 8+ DWLMA    (102A)   A5A6   dwlma1      N/A      A
Function?

[1] Update the firmware for kzmsa2.
[2] Display the configuration to verify that the firmware has been updated.
[3] kzmsa0 and kzmsa1 are still at firmware revision 4.3.
[4] kzmsa2 is now at firmware revision 5.6.
[5] kzmsa3 is still at firmware revision 4.3.

Use the LFU utility modify kzmsa# command to display detailed information about a specific KZMSA and to:

- Change the SCSI ID.
- Enable or disable the fast SCSI option for a particular channel.
- Enable or disable the Disable Reset option.

Example 3-9 shows how to use the LFU utility modify command to display detailed information, set the SCSI ID, enable fast SCSI bus speed, and enable the Disable Reset option for kzmsa2.

Example 3-9 Using the LFU Utility to Modify KZMSA Options

Function? modify kzmsa2                                   [1]
kzmsa2
Local Console: ENABLED
Log Selftest Errors: ENABLED
Log NCR 53C710 RBD Errors: ENABLED
Log XMI RBD Errors: ENABLED
Log XZA RBD Errors: ENABLED
RBD Error Logging: DISABLED
RBD Error Frame Overflow: DISABLED Read Only
Hard Error Frame Overflow: DISABLED Read Only
Soft Error Frame Overflow: DISABLED Read Only
FW Update Error Frame Overflow: DISABLED Read Only
Disable Reset Channel 0: DISABLED                         [2]
Disable Reset Channel 1: DISABLED                         [2]
Chnl 0 Fast SCSI: DISABLED                                [3]
Chnl 1 Fast SCSI: DISABLED                                [3]
Channel_0 ID: 07                                          [4]
Channel_1 ID: 07                                          [4]
Module Serial Numbers: *SG90XXX455*
Do you wish to modify any of these parameters? [y/(n)] y
Local Console: ENABLED Change? [y/(n)] Return
Log Selftest Errors: ENABLED Change? [y/(n)] Return
   .
   .
   .
Disable Reset Channel 0: DISABLED Change? [y/(n)] y       [5]
Disable Reset Channel 1: DISABLED Change? [y/(n)] y       [5]
Chnl 0 Fast SCSI: DISABLED Change? [y/(n)] y              [6]
Chnl 1 Fast SCSI: DISABLED Change? [y/(n)] y              [6]
Channel_0 ID: 07 Change? [y/(n)] y                        [7]
Valid ID is a value from 0 to 7.
Enter new Channel ID: 6                                   [7]
Channel_1 ID: 07 Change? [y/(n)] y                        [7]
Valid ID is a value from 0 to 7.
Enter new Channel ID: 6                                   [7]
Module Serial Numbers: *SG90XXX455* Change? [y/(n)] n
Local Console: ENABLED
Log Selftest Errors: ENABLED
Log NCR 53C710 RBD Errors: ENABLED
Log XMI RBD Errors: ENABLED
Log XZA RBD Errors: ENABLED
RBD Error Logging: DISABLED
RBD Error Frame Overflow: DISABLED Read Only
Hard Error Frame Overflow: DISABLED Read Only
Soft Error Frame Overflow: DISABLED Read Only
FW Update Error Frame Overflow: DISABLED Read Only
Disable Reset Channel 0: ENABLED                          [8]
Disable Reset Channel 1: ENABLED                          [8]
Chnl 0 Fast SCSI: ENABLED                                 [9]
Chnl 1 Fast SCSI: ENABLED                                 [9]
Channel_0 ID: 06                                          [10]
Channel_1 ID: 06                                          [10]
Module Serial Numbers: *SG909T1455*
Modify kzmsa2 with these parameter values? [y/(n)] y      [11]
Function? exit
>>>

[1] Execute the LFU modify command to modify the options for kzmsa2. The present options are displayed first.
[2] The Disable Reset option for both channels is disabled.
[3] The fast SCSI option is disabled for both channels.
[4] The ID for both channels is 7.
[5] Enable the Disable Reset option for channels 0 and 1.
[6] Enable the fast SCSI option for channels 0 and 1.
[7] Change the SCSI ID for channels 0 and 1 to 6.
[8] The LFU utility is set up to enable the Disable Reset option.
[9] The LFU utility is set up to enable the fast SCSI option.
[10] The LFU utility is set up to set the SCSI ID for both channels to 6.
[11] Cause the options to be changed to the requested values.

Setting Up an Available Server Configuration Using KZPSA PCI to SCSI Adapters

The KZPSA PCI to SCSI bus adapter is installed in a PCI slot of any supported AlphaServer for use in an Available Server environment. The KZPSA is a fast, wide differential adapter with only a single port, so only one differential shared SCSI bus can be connected to a KZPSA adapter.
The KZPSA operates at fast or slow speed and is compatible with narrow or wide SCSI. The fast speed is 10 MB/sec for a narrow SCSI bus and 20 MB/sec for a wide SCSI bus. The slow speed is 5 MB/sec for a narrow SCSI bus and 10 MB/sec for a wide SCSI bus.
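Those four data points follow a simple pattern: a wide bus moves two bytes per transfer instead of one, and fast mode doubles the transfer rate. A small illustrative calculation of that pattern (the function name is ours, not from the guide):

```python
# Illustrative only: reproduce the KZPSA transfer rates quoted in the text.
# Narrow SCSI moves 1 byte per transfer and wide moves 2; slow mode clocks
# 5 million transfers/sec and fast mode 10 million, giving 5/10/10/20 MB/sec.
def scsi_bandwidth_mb_per_sec(wide, fast):
    bytes_per_transfer = 2 if wide else 1
    transfers_per_sec_millions = 10 if fast else 5
    return bytes_per_transfer * transfers_per_sec_millions

print(scsi_bandwidth_mb_per_sec(wide=False, fast=False))  # 5  (slow, narrow)
print(scsi_bandwidth_mb_per_sec(wide=True, fast=True))    # 20 (fast, wide)
```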
Use Table 3-27 to set up an Available Server configuration with KZPSA adapters and a BA350, BA353, or BA356.
Figure 3-28 shows an Available Server configuration with two AlphaServer 2100 systems with KZPSA PCI to SCSI adapters on a shared bus with a BA350 storage expansion unit.
Table 3-21 shows the hardware components needed for configurations using KZPSA PCI to SCSI adapters (or KZTSA TURBOchannel to SCSI adapters) and a BA350, BA353, or BA356.

Table 3-27 Setting Up an Available Server Configuration Using KZPSA Adapters and a BA350, BA353, or BA356

Step 1. Install a KZPSA PCI to SCSI bus adapter in the PCI slot corresponding to the logical bus to be used for the shared SCSI bus. (Refer to: KZPSA SCSI Storage Adapter Installation and User's Guide)

Step 2. Remove the KZPSA internal termination resistors, Z1, Z2, Z3, Z4, and Z5. (Refer to: Figure 3-30)

Step 3. Use the show config, show device, and show pk#* console commands to display the installed devices and information about the KZPSAs on the AlphaServer 1000, 2000, or 2100 systems. (Refer to: Example 3-10) If necessary, update the KZPSA firmware by booting from the Alpha Systems Firmware Update CD-ROM. (Refer to: the firmware release notes for the system) Set the KZPSA SCSI bus ID and bus speed as necessary for this configuration. (Refer to: Example 3-11)

Step 4. You will need one DWZZ*. It can be a DWZZA-AA, DWZZA-VA, or DWZZB-VW; a DWZZB-VW is recommended. For each DWZZ*, remove the five differential terminator resistor SIPs. (Refer to: Figure 3-11) The DWZZ* single-ended termination is dependent on the type of StorageWorks device used:
   BA350: Ensure that the DWZZA-AA or DWZZA-VA single-ended termination jumper, J2, is installed.
   BA353: For a DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed. For the DWZZA-VA, remove the single-ended SCSI termination jumper, J2, and install terminator 12-37004-04 on the BA353 SCSI input connector.
   BA356: Ensure that the DWZZB-VW single-ended termination jumpers, W1 and W2, are installed.

Step 5. Install a BN21W-0B "Y" cable or H885-AA tri-link connector on each KZPSA in the configuration and on the differential end of the DWZZA or DWZZB.

Step 6. Install an H879-AA terminator on the two tri-link connectors or "Y" cables attached to the two adapters or the device that will be on the ends of the shared bus.

Step 7. Connect the other "Y" cables or tri-link connectors together with BN21K or BN21L cables. You will need one cable for each KZPSA in the configuration. Daisy-chain from one adapter or device to the next, keeping the "Y" cables or tri-link connectors with the installed terminators on the ends of the bus.

Step 8. If you are using a DWZZA-VA, install it in slot 0 of the BA350 or any BA353 slot. Install a DWZZB-VW in slot 0 of a BA356.

Step 9. If you are using a BA350, ensure that the BA350 terminator and jumper are both installed. (Refer to: Figure 3-8) For a BA353 with a DWZZA-VA, ensure that terminator 12-37004-04 is installed on the BA353 SCSI input connector. (Refer to: Figure 3-9) For a BA356, ensure that the SCSI jumper is installed. (Refer to: Figure 3-10)

Step 10. If you are using a DWZZA-AA, connect a BN21R or BN23G cable between the DWZZA-AA single-ended connector and the BA350 JA1 input connector or the BA353 SCSI input connector.

Figure 3-28 Available Server Configuration with Two AlphaServer 2100 Systems Using KZPSA PCI to SCSI Adapters with a BA350

[Figure: each AlphaServer 2100 system's KZPSA carries a BN21W-0B "Y" cable; BN21K or BN21L cables daisy-chain the "Y" cables to the DWZZA-VA installed in the BA350, which carries an H885-AA tri-link connector with an H879-AA terminator; the "Y" cable at the far end of the bus also carries an H879-AA terminator; both systems are connected to a network interface.]

Table 3-28 shows how to use KZPSA adapters in an Available Server configuration with an HSZ40. An example hardware configuration is shown in Figure 3-29.
Table 3-23 shows the hardware components needed for configurations using KZPSA PCI to SCSI adapters (or KZTSA TURBOchannel to SCSI adapters) and an HSZ40.
Table 3-28 Setting Up an Available Server Configuration Using KZPSA Adapters and an HSZ40

1. Install a KZPSA PCI to SCSI bus adapter in the PCI slot corresponding to the logical bus to be used for the shared SCSI bus. (Refer to the KZTSA SCSI Storage Adapter Installation and User's Guide.)

2. Remove the KZPSA internal termination resistors, Z1, Z2, Z3, Z4, and Z5. (Refer to Figure 3-30.)

3. Use the show config, show device, and show pk#* console commands to display the installed devices and information about the KZPSAs on the AlphaServer 1000, 2000, or 2100 systems. (Refer to Example 3-10.)

4. If necessary, boot from the Alpha Systems Firmware Update CD-ROM and update the KZPSA firmware. (Refer to the firmware release notes for the applicable system.)

5. Set the KZPSA SCSI bus ID and bus speed as necessary for this configuration. (Refer to Example 3-11.)

6. Install a BN21W-0B "Y" cable or H885-AA tri-link connector on the HSZ40 input connector and on each KZPSA in the configuration.

7. Install an H879-AA terminator on the two tri-link connectors or "Y" cables attached to the two adapters or the device that will be on the ends of the shared bus.

8. Connect the "Y" cables or tri-link connectors to each other with BN21K or BN21L cables. You will need one cable for each KZPSA in the configuration. Daisy-chain from one device to the next, making sure that you keep the "Y" cables or tri-link connectors with installed terminators at the ends of the bus.


Figure 3-29 Available Server Configuration with Two AlphaServer 2100 Systems Using KZPSA PCI to SCSI Adapters with an HSZ40

(Figure: two AlphaServer 2100 systems, their KZPSAs fitted with BN21W-0B "Y" cables, connected by BN21K or BN21L cables to an H885-AA tri-link connector on the HSZ40 DEC RAID subsystem; H879-AA terminators sit on the two ends of the shared bus.)

Figure 3-30 KZPSA Termination Resistor Locations

(Figure: locations of the Z1, Z2, Z3, Z4, and Z5 termination resistors on the KZPSA module.)

Use the show config and show device console commands shown in Example 3-10 to display information about installed devices on an AlphaServer 1000, 2000, or 2100 system. The show config command shows you which slots the KZPSAs are installed in, and their SCSI IDs, but does not indicate the hardware or firmware revision. Although the show device output does not call out the KZPSA by name, it provides the hardware and firmware revisions.


Unfortunately, neither the show config nor the show device command provides the KZPSA bus speed. Use the show pk#* console command to determine the bus speed. The number sign (#) stands for the letter designation that the show config or show device command output assigns to the KZPSA. Example 3-10 shows the results for pkb0.
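If console output is captured to a file (for example, through a console logger), the bus-speed variable can also be read mechanically. The helper below is a hypothetical sketch, not a Digital tool; it assumes the show pkb* output was saved as plain text lines such as "pkb0_fast 1".

```shell
# Hypothetical helper, assuming "show pkb*" console output was
# captured to a text file with lines like "pkb0_fast 1".
kzpsa_speed() {
    # The *_fast console variable is 1 for fast, 0 for slow.
    fast=$(awk '$1 ~ /_fast$/ {print $2}' "$1")
    [ "$fast" = "1" ] && echo fast || echo slow
}

# Demonstration with the values shown in Example 3-10:
printf 'pkb0_fast 1\npkb0_host_id 4\npkb0_termpwr 1\n' > /tmp/pkb0.capture
kzpsa_speed /tmp/pkb0.capture    # prints "fast"
```

The same capture file can be reused to record the host ID and termination-power settings for a site log.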
Example 3-10 Displaying Devices on AlphaServer 1000, 2000, or 2100 Systems

>>> show config  [1]
Digital Equipment Corporation
AlphaServer 2100 4/200
SRM Console X3.10-3020    VMS PALcode X5.48-91, OSF PALcode X1.35-59

Component   Status   Module ID
CPU 0       P        B2020-AA DECchip (tm) 21064-3
CPU 1       P        B2020-AA DECchip (tm) 21064-3
Memory 0    P        B2021-BA 64 MB
Memory 1    P        B2021-BA 64 MB
I/O                  B2110-AA

SLOT   Option             Hose 0, Bus 0, PCI
0      DECchip 21040-AA   ewa0.0.0.0.0      08-00-2B-E2-7C-81
1      NCR 53C810         pka0.7.0.1.0      SCSI Bus ID 7
                          dka0.0.0.1.0      RZ28
                          dka100.1.0.1.0    RZ28
                          dka600.6.0.1.0    RRD43
2      Intel 82375EB      Bridge to Bus 1, EISA
                          dva0.0.0.1000.0   RX26
6      DEC KZPSA          pkb0.4.0.6.0      SCSI Bus ID 4  [2]
                          dkb0.0.0.6.0      HSZ40-Bx
                          dkb100.1.0.6.0    RZ28
                          dkb300.3.0.6.0    RZ28B
8      DEC KZPSA          pkc0.7.0.8.0      SCSI Bus ID 7  [3]

>>> show device  [4]
dka0.0.0.1.0       DKA0     RZ28               D41C
dka100.1.0.1.0     DKA100   RZ28               D41C
dka600.6.0.1.0     DKA600   RRD43              1084
dkb0.0.0.6.0       DKB0     HSZ40-Bx           V21Z
dkb100.1.0.6.0     DKB100   RZ28               D41C
dkb300.3.0.6.0     DKB300   RZ28B              0006
dva0.0.0.1000.0    DVA0     RX26
ewa0.0.0.0.0       EWA      08-00-2B-E2-7C-81
pka0.7.0.1.0       PKA0     SCSI Bus ID 7
pkb0.4.0.6.0       PKB0     SCSI Bus ID 4      C01  A04  [5]
pkc0.7.0.8.0       PKC0     SCSI Bus ID 7      C01  A04  [6]

>>> show pkb*  [7]
pkb0_fast           1   [8]
pkb0_host_id        4   [9]
pkb0_termpwr        1   [10]
>>>

[1] Use the show config command to show the system configuration.

[2] The first KZPSA available for Available Server is pkb0, which has a SCSI ID of 4.

[3] This system has a second KZPSA, pkc0, which has SCSI ID 7.

[4] Use the show device command to get more information.

[5] KZPSA pkb0 is hardware revision C01 and has revision A04 firmware.

[6] KZPSA pkc0 also has hardware revision C01 and revision A04 firmware.

[7] Use the show pkb* command to show all variables set for KZPSA pkb0.

[8] The KZPSA bus speed is fast (1 = fast, 0 = slow).

[9] The KZPSA SCSI ID is 4.

[10] The KZPSA is generating termination power.

Setting KZPSA SCSI ID and Bus Speed

If the SCSI ID is not correct, if it was reset to 7 by the firmware update utility, or if you need to change the KZPSA bus speed, use the set console command.

Use the set command with the following format to set the SCSI bus ID:

set pkn0_host_id #

The n specifies the KZPSA ID, which you obtain from the show device console command. The number sign (#) is the SCSI bus ID for the KZPSA.

Use the set command with the following format to set the bus speed:

set pkn0_fast #

The number sign (#) specifies the bus speed. Use 0 for slow and 1 for fast.

Example 3-11 shows how to determine the present SCSI ID and bus speed, and then set the KZPSA SCSI ID to 5 and the bus speed to fast for pkc0.
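Before typing the set command at the console, it can help to check the proposed ID against the IDs already assigned on the bus. The shell sketch below is illustrative only (it is not a Digital utility): it encodes the rule that a SCSI bus ID is a single digit from 0 through 7 and must be unique on the shared bus.

```shell
# Illustrative check (not a Digital utility): a proposed SCSI ID must
# be a single digit 0-7 and must not collide with an ID already in
# use on the shared bus.
valid_scsi_id() {
    id=$1; shift
    case "$id" in
        [0-7]) ;;
        *) echo "ID $id is out of range 0-7"; return 1 ;;
    esac
    for used in "$@"; do
        if [ "$id" = "$used" ]; then
            echo "ID $id is already in use on the bus"
            return 1
        fi
    done
    echo "ID $id is available"
}

# IDs 7, 4, and 0 already taken, as in Example 3-10's system:
valid_scsi_id 5 7 4 0    # prints "ID 5 is available"
```

A check like this is most useful when several adapters on the same shared bus are being renumbered at once.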


Example 3-11 Setting KZPSA SCSI ID and Bus Speed

>>> show pkc0_host_id  [1]
7
>>> show pkc0_fast  [2]
0
>>> set pkc0_host_id 5  [3]
>>> set pkc0_fast 1  [4]
>>> show pkc0_host_id  [5]
5
>>> show pkc0_fast  [6]
1
>>>

[1] Display the present SCSI ID.

[2] Display the present bus speed, which is slow (pkc0_fast is 0).

[3] Set the SCSI ID to 5.

[4] Set the bus speed to fast.

[5] Verify that the SCSI ID is now 5.

[6] Verify that the bus speed is now fast (pkc0_fast is 1).

Setting Up an Available Server Configuration with Mixed Adapter Types and a BA350, BA353, or BA356

This section describes how to install an Available Server hardware configuration consisting of multiple host adapters that are not all the same type, using a BA350, BA353, or BA356 storage expansion unit. For instance, you may have an Available Server configuration consisting of two DEC 3000 Model 500 systems, one with a PMAZC TURBOchannel SCSI controller and the other with a KZTSA TURBOchannel SCSI adapter, and an AlphaServer 2100 with a KZPSA PCI SCSI adapter.
Table 3-29 provides the steps necessary to install the hardware for a mixed configuration with a BA350, BA353, or BA356. Note that you will be referring to steps in previous tables for host adapter installation and setup.
Table 3-30 provides a list of the hardware necessary for a mixed configuration with a BA350, BA353, or BA356. Figure 3-31 provides an illustration of a sample mixed configuration with a BA350 storage expansion unit.

Table 3-29 Setting Up an Available Server Configuration with Mixed Host Adapters and a BA350, BA353, or BA356

1. For each system with a:
   PMAZC: Table 3-14, Steps 1, 2, and 3
   KZTSA: Table 3-20, Steps 1 and 2
   KZMSA: Table 3-25, Step 1
   KZPSA: Table 3-27, Steps 1 and 2

2. You will need one DWZZA-AA for each system with a PMAZC TURBOchannel SCSI controller or KZMSA SCSI adapter, and a DWZZA for the BA350 or BA353. You can use a DWZZA-VA for the BA350 or BA353. Use a DWZZB-VW for a BA356.
   For each DWZZ*, remove the five differential terminator resistor SIPs. (Refer to Figure 3-11.)
   For each DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed to provide termination for that end of the single-ended SCSI bus segment. (Refer to Figure 3-11.)
   For a DWZZA-VA that will be installed in a:
   BA350: Ensure that the single-ended SCSI termination jumper, J2, is installed.
   BA353: Remove the single-ended SCSI termination jumper, J2. Install terminator 12-37004-04 on the BA353 SCSI input connector.
   For a DWZZB-VW that will be installed in a BA356, ensure that the single-ended termination jumpers, W1 and W2, are installed.

3. For each single-ended host adapter, install a BN21R or BN23G cable between the host adapter and the single-ended connector on a DWZZA-AA.

4. Install a BN21W-0B "Y" cable or an H885-AA tri-link connector on the differential connector of each DWZZ* and on the connector for each differential host adapter.

5. Install an H879-AA terminator on the two "Y" cables or tri-link connectors on the two devices that will be on the ends of the shared bus.

6. Connect the other "Y" cables or tri-link connectors together with BN21K or BN21L cables. You will need one cable for each system in the configuration. Daisy-chain from one host adapter or device to the next, keeping the "Y" cables or tri-link connectors with the installed terminators on the ends of the bus.

7. If you are using a DWZZA-AA for the connection to the BA350 or BA353, connect a BN21R or BN23G cable between the DWZZA-AA single-ended connector and BA350 connector JA1 or the BA353 SCSI input connector.
   If you are using a DWZZA-VA, install it in slot 0 of the BA350 or in any BA353 slot.
   Install a DWZZB-VW in slot 0 of a BA356.


Table 3-30 Hardware Needed for a Mixed Adapter Available Server Configuration with BA350 or BA353

(Table columns: number of single-ended and differential host adapters(1); BN21R or BN23G SCSI cables; DWZZA-AA, DWZZA-VA, or DWZZB-VW signal converters(2); BN21W-0B "Y" cables or H885-AA tri-link connectors; H879-AA terminators; and BN21K or BN21L cables, with separate columns for configurations using a DWZZA-VA(3) or DWZZB-VW and for configurations with no DWZZA-VA or DWZZB-VW.)

1 You can have only three systems in an Available Server configuration that include a PMAZC TURBOchannel SCSI controller or KZMSA XMI to SCSI adapter.

2 One of the DWZZ*s can be a DWZZA-VA or DWZZB-VW. The rest are DWZZA-AAs.

3 If you use a DWZZA-VA and a BA353 storage expansion unit, you must install terminator 12-37004-04 on the BA353 SCSI input connector.

Figure 3-31 Mixed Host Adapter Available Server Configuration with BA350 Storage Expansion Unit

(Figure: two DEC 3000 Model 500 systems, one with a PMAZC connected through a BN21R or BN23G cable to a DWZZA-AA and one with a KZTSA, plus an AlphaServer 2100; the differential bus is daisy-chained with BN21K or BN21L cables through H885-AA tri-link connectors and a BN21W-0B "Y" cable to a DWZZA-VA with an H885-AA tri-link connector in the BA350, with H879-AA terminators on the two ends of the shared bus.)

Setting Up an Available Server Configuration with Mixed Adapter Types and an HSZ40

This section describes how to install an Available Server hardware configuration consisting of multiple host adapters that are not all the same type, using an HSZ40 DEC RAID subsystem. For instance, you may have an Available Server configuration consisting of two DEC 3000 Model 500 systems with PMAZC TURBOchannel SCSI controllers and an AlphaServer 2100 with a KZPSA PCI SCSI adapter.

Table 3-31 provides the steps necessary to install the hardware for a mixed configuration with an HSZ40.
Table 3-32 provides a list of the hardware necessary for a mixed configuration with an HSZ40, and Figure 3-32 provides an illustration of a sample mixed configuration with an HSZ40.
Table 3-31 Setting Up an Available Server Configuration with Mixed Host Adapters and an HSZ40

1. For each system with a:
   PMAZC: Table 3-14, Steps 1, 2, and 3
   KZTSA: Table 3-20, Steps 1 and 2
   KZMSA: Table 3-25, Step 1
   KZPSA: Table 3-27, Steps 1 and 2

2. You will need one DWZZA-AA for each single-ended host adapter. (Refer to Figure 3-11.)
   For each DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed to provide termination for that end of the single-ended SCSI bus segment.
   Remove the five differential terminator resistor SIPs from each DWZZA.

3. Connect a BN21W-0B "Y" cable or an H885-AA tri-link connector to the:
   HSZ40 input connector
   Differential end of each DWZZA-AA
   Connector for each differential host adapter

4. Connect an H879-AA terminator to one leg of the BN21W-0B "Y" cable or one side of the H885-AA tri-link connector for the host adapters (or HSZ40) that will be on the ends of the shared bus.

5. Connect a BN21R or BN23G cable between each single-ended host adapter and the single-ended connector of a DWZZA-AA.

6. Connect a BN21K or BN21L cable between the BN21W-0B "Y" cables (or H885-AA tri-link connectors) of the DWZZA-AAs and HSZ40. The number of these cables will be the same as the number of systems on this shared bus in the Available Server configuration. Make sure that you create a daisy-chain while keeping the terminators on both ends of the shared bus.


Table 3-32 Hardware Needed for a Mixed Adapter Available Server Configuration with an HSZ40

(Table columns: number of single-ended and differential host adapters(1); DWZZA-AA signal converters; BN21W-0B "Y" cables or H885-AA tri-link connectors; H879-AA terminators; BN21K or BN21L cables; and BN21R or BN23G cables.)

1 You can have only three systems in an Available Server configuration that include a PMAZC SCSI controller or KZMSA XMI to SCSI adapter.

Figure 3-32 Mixed Host Adapter Available Server Configuration with an HSZ40

(Figure: two DEC 3000 Model 500 systems with PMAZC controllers, each connected through a BN21R or BN23G cable to a DWZZA-AA, and an AlphaServer 2100; the differential bus is daisy-chained with BN21K or BN21L cables through H885-AA tri-link connectors and a BN21W-0B "Y" cable to the HSZ40 DEC RAID subsystem, with H879-AA terminators on the two ends of the shared bus.)

Summary

Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions

Some of the more important general rules include:

- A maximum of four systems is allowed on each shared bus in a TruCluster Available Server configuration.
- A maximum of eight devices (SCSI host adapters and disks) are allowed on each bus.
- All the systems in a TruCluster Available Server configuration must be on the same network subnet.
- The SCSI host adapter must be installed in a logically equivalent I/O bus slot on each system. When the kernel boots, the SCSI bus number is determined by the order in which the SCSI host adapters are installed in I/O bus slots, starting with the first slot.
- For dual-ported SCSI host adapters, the shared SCSI bus must be on equivalent ports, for instance KZMSA channel 0 (1) and PMAZC port A (B).
- You must keep the shared SCSI bus within the length requirements.
- Improper termination will eventually cause problems.
- You should use DWZZA-AA signal converters when using PMAZC SCSI host adapters in fast mode.
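The first two rules lend themselves to a mechanical check when planning a bus. The shell sketch below is a teaching aid, not a supported tool: it encodes only the limits stated above (at most four systems per shared bus, at most eight devices counting adapters and disks), and it assumes each system contributes exactly one host adapter to the bus.

```shell
# Teaching sketch of the first two rules: no more than 4 systems and
# no more than 8 devices (SCSI host adapters plus disks) per shared bus.
# Assumes one host adapter per system on this bus.
check_shared_bus() {
    systems=$1   # number of host systems on this shared bus
    disks=$2     # number of disks on this shared bus
    devices=$((systems + disks))
    if [ "$systems" -gt 4 ]; then
        echo "too many systems ($systems > 4)"; return 1
    fi
    if [ "$devices" -gt 8 ]; then
        echo "too many devices ($devices > 8)"; return 1
    fi
    echo "ok: $systems systems, $devices devices"
}

# A two-system bus with five disks fits within both limits:
check_shared_bus 2 5    # prints "ok: 2 systems, 7 devices"
```

Running the same check with four systems and five disks reports nine devices, which exceeds the eight-device limit.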

Determining Available Server Hardware Components

- Ensure that DWZZA-AA star washers are in place on all four screws that hold the cover in place after removing terminator resistor SIPs.
- Only certain systems and SCSI controllers are supported for a TruCluster Available Server environment, and unsupported devices should not be used.
- When connecting the devices to form the shared SCSI bus, you must use the correct cables and keep within the cable length limits for your particular configuration.
- Each SCSI bus segment must be properly terminated or you will have problems.
- Each Available Server member system or storage device should be connected to the shared SCSI bus using a "Y" cable or H885-AA tri-link connector, to allow removal of a device without affecting SCSI bus operation.


Configuring TruCluster Available Server Hardware

Some of the most important things to consider when preparing for a TruCluster Available Server configuration are:

- How many systems are in the Available Server configuration?
- What kind of controllers do you have?
- What are the constraints dictated by the system placement as to the length of the SCSI buses?
- Are you using single-ended or differential SCSI controllers?
- Do you need to use DWZZA signal converters?
- Is the SCSI controller dual-ported?
- What kind of device is being used to house the disk devices?
- Ensure that the single-ended SCSI bus termination is correct between a single-ended controller and a DWZZA signal converter.
  Check that SCSI controller internal termination is present for a single-ended controller connected to a DWZZA.
  Check that the DWZZA single-ended termination jumper, J2, is installed.
- Ensure that the single-ended SCSI bus termination is correct between a DWZZ* (either a DWZZA-AA, DWZZA-VA, or DWZZB-VW) and a BA350 or BA356.
  The single-ended terminator jumper, J2, must be installed in the DWZZA.
  The single-ended terminator jumpers, W1 and W2, must be installed in the DWZZB.
  The BA350 terminator must be installed.
- Ensure that the single-ended SCSI bus termination is correct between a DWZZA-AA and a BA353.
  The single-ended terminator jumper, J2, must be installed in the DWZZA.
  The BA353 has internal termination for the other end of the single-ended bus.
- Ensure that the single-ended SCSI bus termination is correct between a DWZZA-VA and a BA353.
  The single-ended terminator jumper, J2, must be removed from the DWZZA-VA.
  Terminator 12-37004-04 must be installed on the BA353 SCSI input connector.
- If you have a BN21V-0B or BN21W-0B "Y" cable, or an H885-AA tri-link connector, attached to the SCSI bus controller:
  The controller internal termination has to be removed.
  A terminator has to be installed on the "Y" cables or tri-link connector for those two controllers, or the controller and device, that are on the ends of the SCSI bus.

- If you are using a BA350 or BA356, the internal jumper must be installed.
- If you are using a DWZZA-VA, it must be installed in slot 0 of a BA350 (any slot for a BA353).
- If you are using a DWZZB-VW, it must be installed in slot 0 of a BA356.
- If you have installed a BN21W-0B "Y" cable or an H885-AA tri-link connector on the differential end of a DWZZA-AA, DWZZA-VA, or DWZZB-VW, you must remove the differential termination from the DWZZ*.
- SCSI bus IDs must be properly set.
- SCSI bus speeds must be properly set.
- If you are using a PMAZC or KZMSA, the shared bus must be on the same port (channel).
- The shared SCSI bus must be on the same logical bus.
- The correct firmware must be installed in the SCSI bus controller.


Exercises

Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions: Exercise

1. You must use a DWZZA signal converter with a KZMSA because:
   a. The KZMSA has only one channel
   b. The KZMSA uses the differential mode of signal transmission
   c. You cannot remove the KZMSA internal terminators
   d. The KZMSA operates only on a wide SCSI bus

2. You should use a DWZZA signal converter with a PMAZC SCSI host adapter because:
   a. The use of the DWZZA increases the maximum shared SCSI bus length
   b. The PMAZC is a dual-ported SCSI host adapter
   c. The PMAZC operates as either a fast or slow SCSI host adapter
   d. Signal conversion is necessary to connect the PMAZC to a BA350 storage box

3. The maximum length of the shared SCSI bus in an Available Server configuration for Version 1.4 is:
   a. 3 meters
   b. 6 meters
   c. 25 meters
   d. 31 meters

4. How many systems may be used in an Available Server configuration for Version 1.4?
   a. 1
   b. 2
   c. 3
   d. 4


Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions: Solution

1. c: You must use a DWZZA signal converter with a KZMSA because:
   a. The KZMSA has only one channel
   b. The KZMSA uses the differential mode of signal transmission
   c. You cannot remove the KZMSA internal terminators
   d. The KZMSA operates only on a wide SCSI bus

2. a: You should use a DWZZA signal converter with a PMAZC SCSI host adapter because:
   a. The use of the DWZZA increases the maximum shared SCSI bus length
   b. The PMAZC is a dual-ported SCSI host adapter
   c. The PMAZC operates as either a fast or slow SCSI host adapter
   d. Signal conversion is necessary to connect the PMAZC to a BA350 storage box

3. d: The maximum length of the shared SCSI bus in an Available Server configuration for Version 1.4 is:
   a. 3 meters
   b. 6 meters
   c. 25 meters
   d. 31 meters

4. d: How many systems may be used in an Available Server configuration for Version 1.4?
   a. 1
   b. 2
   c. 3
   d. 4

Determining Available Server Hardware Components: Exercise

1. Which cable do you attach to a single-ended device to enable you to disconnect the system without affecting SCSI bus termination?
   a. BN21J
   b. BN21H
   c. BN21V-0B
   d. BN21W-0B

2. Which cable do you attach to a differential device to enable you to disconnect the system without affecting SCSI bus termination?
   a. BN21J
   b. BN21H
   c. BN21V-0B
   d. BN21W-0B

3. Which of the following is a signal converter that contains its own power supply?
   a. DWZZA-AA
   b. DWZZA-VA
   c. DWZZB-VW
   d. H885-AA

4. Which of the following could you use in place of a BN21W-0B?
   a. H8574-A
   b. H8660-AA
   c. H879-AA
   d. H885-AA

Determining Available Server Hardware Components: Solution

1. c: Which cable do you attach to a single-ended device to enable you to disconnect the system without affecting SCSI bus termination?
   a. BN21J
   b. BN21H
   c. BN21V-0B
   d. BN21W-0B

2. d: Which cable do you attach to a differential device to enable you to disconnect the system without affecting SCSI bus termination?
   a. BN21J
   b. BN21H
   c. BN21V-0B
   d. BN21W-0B

3. a: Which of the following is a signal converter that contains its own power supply?
   a. DWZZA-AA
   b. DWZZA-VA
   c. DWZZB-VW
   d. H885-AA

4. d: Which of the following could you use in place of a BN21W-0B?
   a. H8574-A
   b. H8660-AA
   c. H879-AA
   d. H885-AA

Configuring TruCluster Available Server Hardware: Exercise

Install the hardware necessary to create an Available Server configuration with two shared SCSI buses, using the following hardware:

- Two AlphaServer 2100 systems, each with two KZPSA PCI to SCSI adapters to provide two shared buses on each system
- One shared bus will have a BA350 with two RZ26L disks
- The other shared bus will have an HSZ40 RAID controller with four RZ26L disks

Configuring TruCluster Available Server Hardware: Solution

The hardware needed is:

- 2 AlphaServer 2100 systems
- 4 KZPSA PCI to SCSI adapters
- 1 BA350 StorageWorks enclosure
- 1 HSZ40 RAID controller
- 6 RZ26L 1.05 GB SCSI disk drives
- 4 BN21K (or BN21L) SCSI cables
- 6 H885-AA tri-link connectors, 6 BN21W-0B "Y" cables, or any combination of the two to make 6 total
- 4 H879-AA terminators

Note: Each system needs an Ethernet controller to provide network communications.

The solution will be performed in three phases:

1. Install the KZPSA SCSI adapters, two to a system.
2. Install the remaining hardware for the shared SCSI bus with the BA350.
3. Install the remaining hardware for the shared SCSI bus with the HSZ40.
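The quantities in the parts list follow a simple pattern for a daisy-chained shared bus with one storage connection: each of the N adapters and the storage connection gets a "Y" cable or tri-link connector (N + 1 in total), the two ends of the bus get terminators, and N cables join the links. The sketch below is illustrative arithmetic only, not part of the course solution.

```shell
# Illustrative arithmetic for one daisy-chained shared bus with a
# single storage connection: N adapters need N+1 "Y" cables or
# tri-link connectors, 2 terminators, and N BN21K/BN21L cables.
bus_parts() {
    adapters=$1
    echo "trilinks=$((adapters + 1)) terminators=2 cables=$adapters"
}

# Each of the exercise's two shared buses has two KZPSAs:
bus_parts 2    # prints "trilinks=3 terminators=2 cables=2"
# Doubling for the two buses gives 6 tri-link connectors or "Y"
# cables, 4 terminators, and 4 cables, matching the parts list above.
```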
Table 3-33 outlines the steps necessary to install the KZPSA SCSI adapters in the systems.

Table 3-33 Phase 1: Installing the KZPSA SCSI Adapters

1. Remove the KZPSA internal termination resistors, Z1, Z2, Z3, Z4, and Z5. (Refer to Figure 3-30.)

2. Install two KZPSA PCI to SCSI bus adapters in the PCI slot corresponding to the logical bus to be used for the shared SCSI bus on each AlphaServer 2100. (Refer to the KZPSA PCI-to-SCSI Storage Adapter documentation.)

3. Power up the systems and use the show config, show device, and show pk#* console commands to display the installed devices and information about the KZPSAs on the AlphaServer 2100. (Refer to Example 3-10.)

4. If necessary, update the KZPSA firmware. (Refer to the Firmware Release Notes for the AlphaServer 2100 system.)

5. Set the SCSI bus ID of both KZPSAs on one system to 6 and to 7 on the other system, and set the bus speed to fast on all KZPSA SCSI adapters. (Refer to Example 3-11.)

Table 3-34 provides the steps necessary to install the hardware for the shared SCSI bus with the BA350.

Table 3-34 Creating a Shared Bus with the BA350

1. Remove the five differential terminator resistor SIPs from the DWZZA-VA. (Refer to Figure 3-11.)

2. Ensure that the single-ended termination jumper, J2, is installed in the DWZZA-VA. (Refer to Figure 3-11.)

3. Install the DWZZA-VA in slot 0 of the BA350. (Refer to Figure 3-8.)

4. Ensure that the BA350 terminator and jumper are both installed. (Refer to Figure 3-8.)

5. Install an RZ26L SCSI disk in BA350 slots 1 and 2.

6. Install an H885-AA tri-link connector on the KZPSA in each system that will be on this shared SCSI bus.

7. Install an H885-AA tri-link connector on the differential end of the DWZZA-VA.


Table 3-34 (Cont.) Creating a Shared Bus with the BA350

8. Install an H879-AA terminator on the two tri-link connectors attached to the two KZPSA SCSI adapters, as they will be on the ends of the shared bus.

9. Install a BN21K cable from the open connector on the tri-link connector on each of the two KZPSAs to the tri-link connector on the DWZZA-VA.

Table 3-35 provides the steps necessary to install the hardware for the shared SCSI bus with the HSZ40.

Table 3-35 Creating a Shared Bus with the HSZ40

1. Install an H885-AA tri-link connector on the HSZ40 input connector and on both of the KZPSAs for this shared bus.

2. Install an H879-AA terminator on the two tri-link connectors attached to the two KZPSA SCSI adapters, as they will be on the ends of the shared bus.

3. Install BN21K cables between the tri-link connectors on the KZPSAs and the tri-link connector on the HSZ40.


4
Installing TruCluster Software

Installing TruCluster Software 4-1

About This Chapter
Introduction

The TruCluster Available Server environment uses two to four member systems, running TruCluster Software, as highly available servers. This chapter provides an overview of the installation of TruCluster Software.

Objectives

To set up and manage TruCluster Available Server servers, you should be able to:

- Determine the correct installation procedure
- List TruCluster Available Server system prerequisites
- Install TruCluster Software on all member systems

Resources

For more information on the topics in this chapter, see the following:

- TruCluster Available Server Software Version 1.4 SPD
- TruCluster Available Server Software Release Notes
- TruCluster Available Server Software Hardware Configuration and Software Installation
- Digital UNIX Installation Guide
- Digital UNIX Network Administration


Performing Preliminary Setup Tasks

Overview

Before you install TruCluster Available Server Software, you must determine if you are properly prepared for the installation.

Hardware

Before installing the TruCluster Software, you should read the SPD and release notes and verify any system prerequisites. Make sure the hardware is supported by TruCluster Available Server Software Version 1.4. The hardware should be set up and tested. Use console commands to make sure all the devices on the shared bus(es) are recognized.

Subsets Required for TruCluster Available Server Operation

There are several required Digital UNIX and TruCluster Software subsets that must be installed on each of the systems that will be Available Server Environment (ASE) members.

Digital UNIX Version 4.0A must be installed, with the following optional subsets:

- OSFCLINET405: Basic Networking Services
- OSFPGMR405: Standard Programmer Commands
- OSFCMPLRS405: Compiler Back End

TruCluster Available Server Software Version 1.4 must be installed:

- TCRCOMMON140: TruCluster Common Components
- TCRASE140: TruCluster Available Server Software
- TCRCONF140: TruCluster Configuration Software

If you want to use the Cluster Monitor, you will also need to install the following subsets. Note that the C++ Class Shared Libraries and CDE Minimum Runtime Environment subsets must be installed before TruCluster Available Server Software Version 1.4 is installed.

- CXLSHRDA405: C++ Class Shared Libraries
- OSFCDEMIN405: CDE Minimum Runtime Environment
- TCRCMS140: TruCluster Cluster Monitor

You may also want to install the following optional subsets/products:

- NFS utilities
- POLYCENTER Advanced File System utilities
- Logical Storage Manager
- Networker Version 3.2 client software
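One way to confirm the required subsets on each member is to scan the subset inventory. The sketch below is hypothetical, not part of the installation procedure: it assumes the output of the Digital UNIX setld -i command has been saved to a file and that installed subsets are marked with the word "installed" in that listing.

```shell
# Hypothetical check, assuming a saved "setld -i" inventory in which
# installed subsets appear on lines like "<SUBSET> installed ...".
missing_subsets() {
    inventory=$1
    for subset in OSFCLINET405 OSFPGMR405 OSFCMPLRS405 \
                  TCRCOMMON140 TCRASE140 TCRCONF140; do
        grep -q "^$subset  *installed" "$inventory" || echo "$subset"
    done
}

# Demonstration with a fabricated inventory that lacks one subset:
cat > /tmp/inventory <<'EOF'
OSFCLINET405 installed Basic Networking Services
OSFPGMR405 installed Standard Programmer Commands
OSFCMPLRS405 installed Compiler Back End
TCRCOMMON140 installed TruCluster Common Components
TCRASE140 installed TruCluster Available Server Software
EOF
missing_subsets /tmp/inventory    # prints "TCRCONF140"
```

Running the loop on each member before starting the installation catches a forgotten subset early, when it is still cheap to fix.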


Before Installing TruCluster Software

Some installation requirements and restrictions you should be aware of before you install TruCluster Software are:

- The TruCluster Software subsets for TruCluster Available Server Software Version 1.4 can be installed only on systems running the Digital UNIX Version 4.0A operating system. Therefore, you must upgrade the Digital UNIX operating system, and the system and SCSI interface adapter firmware, before you install TruCluster Software. The route you choose to get to Digital UNIX Version 4.0A depends upon the current operating system version and the version of DECsafe Available Server. You may want to perform an update installation, or you may decide to perform a complete installation.

- Do not use a disk connected to a shared SCSI bus as the root installation disk, even though it may be listed in the root installation menu.

- TruCluster Available Server relies on synchronized time on all member systems; it needs accurate timestamps for database versions. You must be running a distributed time service such as the Network Time Protocol (NTP) daemon (xntpd).

  Note: If you are using NTP to synchronize system times, ensure that the setting for the NTP version in the /etc/ntp.conf file on client systems matches the actual version of NTP running on the server. Digital UNIX Version 4.0A supports NTP Version 3. The Digital UNIX update installation procedure sets the value for the NTP version in the /etc/ntp.conf file to Version 3 if an entry for the setting does not exist. If your server is running NTP Version 2, change the setting in the /etc/ntp.conf file to:

  server alpha version 2

- You must register the ASE-OA PAK before you install TruCluster Available Server Software Version 1.4, or you will not be allowed to install the TruCluster Software subsets.

- Do not install TruCluster Available Server Software Version 1.4 into a dataless environment.

- Use console commands to determine which SCSI buses your shared disks are on before you start the TruCluster Software installation.

- If you generate a new system configuration file, either with the doconfig command without the -c option or with the sizer -n command, you must run the /var/ase/sbin/ase_fix_config script.

Network Services

Each member system must be included in every member system's
/etc/hosts file. The members must be able to communicate with
each other even if the network name resolution service you may
be using becomes unavailable.
Starting with DECsafe Available Server Version 1.3, the use
of multiple networks in an ASE increases the availability of
applications and data. Configuring multiple network paths
between member systems reduces the chance that a member
system will be erroneously considered unavailable, and if
a network interface fails, member systems can continue to
communicate over another network path.
All member systems must be on the same network subnets, and
all member systems must be able to access each network, so
clients can access services from any member.
You must set up the local network on each member system (see
netsetup). You should set up NIS or BIND if you intend to use it
for network name resolution on your network (see ypsetup and
bindsetup). You should set up NFS and start the daemons if you
intend to use NFS services (see nfssetup). You should set up mail
so root can receive alert messages from TruCluster Software (see
mailsetup).
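Before running asemgr, it can be worth verifying that every member name can be resolved from the flat file alone. The following portable sketch is not a TruCluster tool, and the member names tinker and evers are illustrative assumptions:

```shell
# check_members <hosts_file> <name>... : print "ok" if every name appears
# as a whole word on a non-comment line of the hosts file; otherwise
# print the names that are missing.
check_members() {
    hosts_file=$1; shift
    missing=""
    for m in "$@"; do
        # match the name as a whole word, skipping comment lines
        grep -v '^#' "$hosts_file" | grep -qw "$m" || missing="$missing $m"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
    else
        echo "ok"
    fi
}

# Example: verify two hypothetical members against the local file
check_members /etc/hosts tinker evers
```

Run the same check on each member so that every system agrees on the member names and addresses before the ASE is formed.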


Preparing to Install TruCluster Software


Overview

The TruCluster Software installation procedure provides multiple
choices of how the software will be installed. The choice depends
upon whether DECsafe Available Server is installed and, if it is,
which version.

Choosing the TruCluster Software Installation Procedure
The procedure you use to install TruCluster Available Server
Software Version 1.4 depends upon whether you are installing
TruCluster Available Server Software Version 1.4 for the first
time, upgrading an existing version of DECsafe Available Server,
or adding a new member to an existing ASE with member
systems already at TruCluster Available Server Software Version
1.4.
Use your present ASE configuration to determine the installation
procedure you will use:

Setting up an ASE for the first time: Use this procedure if you
are installing TruCluster Software on systems that are not
currently in an ASE and none of the systems has TruCluster
Software installed.

Rolling upgrade: This procedure allows you to upgrade
member systems without shutting down the ASE. You delete
from the ASE the member system that you will upgrade,
upgrade the system, then add the system back into the ASE.

Simultaneous upgrade: This procedure requires that you
shut down the ASE. Depending upon the current version of
DECsafe Available Server, you may be able to preserve the
existing ASE database.

Adding a member system to an existing ASE: If you have an
existing ASE, you can add a new member without shutting
down the ASE.

Figure 4-1 provides an overview of the paths an upgrade may
take for an existing DECsafe Available Server installation.


Figure 4-1 Upgrade Paths for Existing DECsafe Available Server Installation

[Flowchart: starting from the ASE version installed (DECsafe
Available Server V1.3 on Digital UNIX V3.2G or on V3.2D/V3.2F,
V1.2A or V1.2, versions prior to V1.2, or no DECsafe Available
Server installed), the chart shows whether to perform a rolling
upgrade, a simultaneous upgrade (with or without preserving the
database), or a fresh installation of Digital UNIX V4.0A and ASE
V1.4.]

To upgrade to TruCluster Available Server Software Version 1.4,
existing DECsafe Available Server configurations must be at, or
upgraded to, Digital UNIX Version 3.2G and DECsafe Available
Server Version 1.3. Table 4-1 provides more information about
the upgrade paths.
Before starting any upgrade, consider the present Digital UNIX
and DECsafe Available Server configuration. Determine whether
you need to provide continuous support for services, without
disruption, during the upgrade. You must also weigh the time
needed to upgrade from one version of the software to the next,
and the next, and so forth, against the time it would take to
install Digital UNIX Version 4.0A and TruCluster Available
Server Software Version 1.4 from scratch.
Remember that the format for the ASE configuration database
(asecdb) changed in DECsafe Available Server Version 1.3.
The ASE V1.3 upgrade procedures take the database format
change into consideration, but upgrades prior to ASE V1.3 and
the upgrade to ASE V1.4 do not. The V1.4 rolling upgrade
procedure assumes that there is no change in database format.

Installing TruCluster Software 47

Preparing to Install TruCluster Software

Table 4-1 Upgrade Paths for Existing DECsafe Available Server Installation

ASE V1.3 on Digital UNIX V3.2G
    Rolling upgrade to Digital UNIX Version 4.0A and TruCluster
    Available Server Software Version 1.4.

ASE V1.3 on Digital UNIX V3.2D or V3.2F
    Rolling upgrade to Digital UNIX Version 3.2G and DECsafe
    Available Server V1.3, then rolling upgrade to Digital UNIX
    Version 4.0A and TruCluster Available Server Software Version
    1.4.

ASE V1.2A on Digital UNIX V3.2C
    Simultaneous upgrade to Digital UNIX Version 3.2G and DECsafe
    Available Server V1.3, then rolling upgrade to Digital UNIX
    Version 4.0A and TruCluster Available Server Software Version
    1.4. Preserve the database. Upgrade install (setld) to Digital
    UNIX Version 3.2G.

ASE V1.2 on DEC OSF/1 V3.2A
    Simultaneous upgrade to Digital UNIX Version 3.2G and DECsafe
    Available Server V1.3, then rolling upgrade to Digital UNIX
    Version 4.0A and TruCluster Available Server Software Version
    1.4. Preserve the database. You must update install
    (installupdate) to Digital UNIX V3.2C before you can upgrade
    (setld) to Digital UNIX 3.2G. Or, you can do a complete
    installation of Digital UNIX 3.2C and then upgrade to Digital
    UNIX 3.2G. You can also do a complete installation of Digital
    UNIX Version 4.0A.

    Note: Because of the amount of time it would take to upgrade
    to TruCluster Available Server Software Version 1.4 from
    versions of DECsafe prior to V1.2, it is recommended that
    Digital UNIX Version 4.0A and TruCluster Available Server
    Software Version 1.4 be installed from scratch.

ASE V1.1 on DEC OSF/1 V3.0
    Simultaneous upgrade to Digital UNIX Version 3.2G and DECsafe
    Available Server V1.3, then rolling upgrade to Digital UNIX
    Version 4.0A and TruCluster Available Server Software Version
    1.4. You cannot reuse the database. You must update install
    (installupdate) to DEC OSF/1 V3.2, then update install to
    Digital UNIX V3.2C before you can upgrade (setld) to Digital
    UNIX 3.2G. Or, you can do a complete installation of Digital
    UNIX 3.2C and then upgrade to Digital UNIX 3.2G. You can also
    do a complete installation of Digital UNIX Version 4.0A.

ASE V1.0A on DEC OSF/1 V2.1, or ASE V1.0 on DEC OSF/1 V2.0
    Simultaneous upgrade to Digital UNIX Version 3.2G and DECsafe
    Available Server V1.3, then rolling upgrade to Digital UNIX
    Version 4.0A and TruCluster Available Server Software Version
    1.4. You cannot reuse the database. You must update install
    (installupdate) to DEC OSF/1 V3.0, then update install to DEC
    OSF/1 V3.2, then update install to Digital UNIX V3.2C before
    you can upgrade (setld) to Digital UNIX 3.2G. Or, you can do a
    complete installation of Digital UNIX 3.2C and then upgrade to
    Digital UNIX 3.2G. You can also do a complete installation of
    Digital UNIX Version 4.0A.

Setting Up an ASE for the First Time

If you are installing TruCluster Available Server Software Version
1.4 on systems that are not currently in an ASE, perform the
following tasks on each system that will be an ASE member:

1. Install Digital UNIX Version 4.0A along with the appropriate
   utility subsets and register the associated licenses.


2. Register the TruCluster Available Server Software Version 1.4
   software license before you install the software.

3. Use the setld -l command to load the TruCluster Available
   Server Software Version 1.4 subsets. When the subsets have
   been installed, the installation procedure starts.

4. Enter the necessary information at each of the prompts. If
   prompted, do not use a saved ASE database.

5. Rebuild the kernel. The doconfig utility is automatically
   run. The ase_fix_config utility also runs, providing you the
   opportunity to renumber the shared SCSI buses.

6. Move the new kernel to the root directory and reboot the
   system.

7. Ensure that the host name and IP address for each member
   system is listed in the /etc/hosts file of each member system.

8. After all the system software (Digital UNIX Version 4.0A and
   TruCluster Available Server Software Version 1.4) has been
   installed, run the asemgr utility on one system to add all the
   member systems to the ASE, then set up the ASE services.
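The per-system command sequence above can be summarized as the following dry-run sketch. The kit directory /mnt/TCR140 and the kernel name TINKER are illustrative assumptions, and run() only prints each command so the sequence can be reviewed anywhere; on the real Digital UNIX system you would run each command directly as root:

```shell
# Dry-run sketch of the first-time setup on one future ASE member.
# run() echoes the command instead of executing it.
run() { printf '# %s\n' "$*"; }

run lmf register                    # register the ASE-OA PAK (step 2)
run setld -l /mnt/TCR140            # load the V1.4 subsets (step 3)
# steps 4-5: answer the prompts; doconfig and ase_fix_config run here
run mv /sys/TINKER/vmunix /vmunix   # move the new kernel to / (step 6)
run shutdown -r now                 # reboot the system (step 6)
```

After every member has rebooted, asemgr is run once, on one system only, to form the ASE.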

Performing a Rolling Upgrade

A rolling upgrade allows you to upgrade one member of the ASE
at a time, without having to shut down the ASE. Rolling upgrades
can be performed only from:

Digital UNIX Version 3.2D or 3.2F/DECsafe Available Server
V1.3 to Digital UNIX Version 3.2G/DECsafe Available Server
V1.3

Digital UNIX Version 3.2G/DECsafe Available Server V1.3
to Digital UNIX Version 4.0A/TruCluster Available Server
Software Version 1.4

To perform a rolling upgrade, you must:

Delete one member system at a time from the ASE

Delete the ASE software subsets

Update the operating system on that system

Install the ASE software subsets

Add the system back into the ASE

The new ASE functionality is not enabled until all member
systems have been upgraded to the same versions of DECsafe
/TruCluster Available Server.
Because the rolling upgrade tasks depend upon the Digital
UNIX/ASE versions, there are two different procedures for
executing a rolling upgrade. The first procedure is a rolling
upgrade for systems in an ASE with Digital UNIX Version 3.2G
/DECsafe Available Server V1.3. The second procedure is a
rolling upgrade for systems in an ASE with Digital UNIX Version
3.2C/3.2D or 3.2F/DECsafe Available Server V1.2A/V1.3.

Rolling Upgrade from Digital UNIX Version 3.2G/ASE V1.3 to Digital
UNIX Version 4.0A/ASE V1.4

Perform the following tasks for a rolling upgrade if the ASE
member systems are running Digital UNIX Version 3.2G and
DECsafe Available Server V1.3.
1. On one member system, run the asemgr utility and delete the
   member system that is to be upgraded.
   If the system is included in the list of members favored to run
   a service, according to the service's Automatic Service
   Placement (ASP) policy, you cannot delete the member.

2. On the system being upgraded, use the
   /sbin/init.d/asemember stop command to stop the running
   DECsafe Available Server daemons.
3. Use the setld -i | grep ASE command to determine which
   DECsafe Available Server V1.3 subsets are installed.

4. Delete the DECsafe Available Server V1.3 subsets with the
   setld -d command. If desired, restore the original kernel.

5. Update install (installupdate) Digital UNIX Version 4.0A and
   register the appropriate licenses.

6. Register the TruCluster Available Server software license
   before you install the TruCluster Available Server Software
   Version 1.4 software.

7. Mount the Associated Products CDROM and use setld -l to
   load the TruCluster Available Server Software Version 1.4
   subsets.

8. Enter the necessary information at each of the prompts. If
   you are asked whether you want to use a saved ASE database,
   answer n.

9. Rebuild the kernel, then move the new kernel to the root
   directory.

10. Reboot the system.

11. Ensure that the host name and IP address of each existing
    ASE member system is in the /etc/hosts file.

12. Run the asemgr utility on an existing ASE member and add
    the upgraded system to the ASE.

13. Repeat the previous steps for the remaining systems in the
    ASE.
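Steps 2 through 7 on the member being upgraded can be sketched as a dry run. The subset name ASE130, the distribution device, and the kit directory are illustrative assumptions, and run() only prints each command so the sequence is safe to review anywhere:

```shell
# Dry-run sketch of the per-member rolling-upgrade commands.
# run() echoes the command instead of executing it.
run() { printf '# %s\n' "$*"; }

run /sbin/init.d/asemember stop   # stop the ASE daemons (step 2)
run 'setld -i | grep ASE'         # list installed ASE subsets (step 3)
run setld -d ASE130               # delete the old subsets (step 4; names vary)
run installupdate /dev/rz4c       # update install Digital UNIX V4.0A (step 5)
run setld -l /mnt/TCR140          # load the TruCluster V1.4 subsets (step 7)
```

The member is then rebooted on its new kernel and added back into the ASE with asemgr from an existing member.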
Rolling Upgrade from Digital UNIX Version 3.2D or 3.2F/ASE V1.3 to
Digital UNIX Version 4.0A/ASE V1.4

Perform the following tasks for a rolling upgrade if the ASE
member systems are running Digital UNIX Version 3.2D or 3.2F
and DECsafe Available Server V1.3.


Assuming a two-member ASE, when you start the upgrade, you
have the following ASE configuration:

Node A: ASE 1.3/Digital UNIX V3.2D
Node B: ASE 1.3/Digital UNIX V3.2D

1. On member system B, run the asemgr utility and delete
   member system A, the member that is to be upgraded first.
   If the system is included in the list of members favored to run
   a service, according to the service's Automatic Service
   Placement (ASP) policy, you cannot delete the member. Change
   the ASP, then delete the member to be updated.
2. On system A, use the /sbin/init.d/asemember stop command to
   stop running DECsafe Available Server daemons.

3. Use the setld -i | grep ASE command to determine which
   DECsafe Available Server V1.3 subsets are installed.

4. Delete the DECsafe Available Server V1.3 subsets with the
   setld -d command. If desired, restore the original kernel.

5. Upgrade install (setld -l) Digital UNIX Version 3.2G and
   register the appropriate licenses.

6. Mount the Digital UNIX V3.2D and Complementary Products
   CDROM and use setld -l to load the DECsafe Available
   Server V1.3 subsets.

7. Enter the necessary information at each of the prompts. If
   you are asked whether you want to use a saved ASE database,
   answer n.

8. Rebuild the kernel, then move the new kernel to the root
   directory.

9. Reboot the system.

10. Ensure that the host name and IP address of each existing
    ASE member system is in the /etc/hosts file.


11. Run the asemgr utility on an existing ASE member and add
    the upgraded system to the ASE.
    What you have accomplished at this point is to upgrade
    system A to ASE V1.3/Digital UNIX V3.2G.
    Note that this procedure was very specific about reinstalling
    DECsafe Available Server V1.3 and adding the upgraded
    system back into the ASE.
    For a two-member ASE, you have, at this point:

    Node A: ASE 1.3/Digital UNIX V3.2G
    Node B: ASE 1.3/Digital UNIX V3.2D

    You may be tempted to skip reinstalling ASE V1.3 after
    upgrading to Digital UNIX Version 3.2G, and instead update
    install to Digital UNIX Version 4.0A. Do not do it. What you
    would have in that case would be:

    Node A: ASE 1.4/Digital UNIX V4.0A
    Node B: ASE 1.3/Digital UNIX V3.2D

    This is an unqualified, unsupported configuration.
    You must reinstall ASE V1.3 and add the system back into the
    ASE.
    For rolling upgrades, the supported configurations are:

    Node A: ASE 1.3/Digital UNIX V3.2G
    Node B: ASE 1.3/Digital UNIX V3.2D

    and

    Node A: ASE 1.3/Digital UNIX V3.2G
    Node B: ASE 1.4/Digital UNIX V4.0A
12. Once the upgraded system has been made a member of
    the ASE, run the asemgr utility, preferably on the upgraded
    system, and delete another member system that is yet to be
    upgraded (in this case, system B).

13. On system B, use the /sbin/init.d/asemember stop command to
    stop running DECsafe Available Server daemons.

14. Use the setld -i | grep ASE command to determine which
    DECsafe Available Server V1.3 subsets are installed.

15. Delete the DECsafe Available Server V1.3 subsets with the
    setld -d command.

16. Upgrade install (setld -l) Digital UNIX Version 3.2G and
    register the appropriate licenses.

17. Update install (installupdate) Digital UNIX Version 4.0A and
    register the appropriate licenses.

18. Mount the Associated Products CDROM and use setld -l to
    load the TruCluster Available Server Software Version 1.4
    subsets.
    With the second system upgraded, you now have:

    Node A: ASE 1.3/Digital UNIX V3.2G
    Node B: ASE 1.4/Digital UNIX V4.0A

19. Enter the necessary information at each of the prompts. If
    you are asked whether you want to use a saved ASE database,
    answer n.

20. Rebuild the kernel, then move the new kernel to the root
    directory.

21. Reboot the system.

22. Run the asemgr utility on an existing ASE member (Node A in
    our case) and add the upgraded system to the ASE.

23. On the member system just updated (B), run the asemgr utility
    and delete member system A to complete the upgrade for that
    system.

24. On system A, use the /sbin/init.d/asemember stop command to
    stop running DECsafe Available Server daemons.

25. Use the setld -i | grep ASE command to determine which
    DECsafe Available Server V1.3 subsets are installed.

26. Delete the DECsafe Available Server V1.3 subsets with the
    setld -d command.

27. Update install (installupdate) Digital UNIX Version 4.0A and
    register the appropriate licenses.

28. Mount the Associated Products CDROM and use setld -l to
    load the TruCluster Available Server Software Version 1.4
    subsets.
    The ASE configuration is now:

    Node A: ASE 1.4/Digital UNIX V4.0A
    Node B: ASE 1.4/Digital UNIX V4.0A

29. Enter the necessary information at each of the prompts. If
    you are asked whether you want to use a saved ASE database,
    answer n.

30. Rebuild the kernel, then move the new kernel to the root
    directory.

31. Reboot the system.

32. Run the asemgr utility on an existing ASE member (Node B)
    and add the upgraded system to the ASE.

Simultaneous Upgrade

If you are willing to shut down the ASE, you can perform a
simultaneous upgrade on all the member systems of an existing
ASE.
You can use the simultaneous upgrade and preserve the ASE
database if you are currently running DECsafe Available Server
Version 1.2 or 1.2A. For versions of DECsafe prior to 1.2, you
cannot preserve the ASE database.
For configurations previous to DECsafe Available Server
Version 1.2, it is recommended that, instead of doing multiple
simultaneous upgrades to get to Digital UNIX Version 4.0A and
TruCluster Available Server Software Version 1.4, you simply
install Digital UNIX Version 4.0A and TruCluster Available
Server Software Version 1.4. It will take much less time.
This section covers two simultaneous upgrades to get to Digital
UNIX Version 4.0A and TruCluster Available Server Software
Version 1.4, starting from:

Digital UNIX Version 3.2G and DECsafe Available Server
Version 1.3

Digital UNIX Version 3.2A and DECsafe Available Server
Version 1.2. A simultaneous upgrade from Digital UNIX
Version 3.2C/DECsafe Available Server V1.2A would be very
similar.


Simultaneous Upgrade from Digital UNIX Version 3.2G/ASE V1.3 to
Digital UNIX Version 4.0A/ASE V1.4

Perform the following tasks for a simultaneous upgrade from
Digital UNIX Version 3.2G and DECsafe Available Server Version
1.3 to Digital UNIX Version 4.0A and TruCluster Available Server
Software Version 1.4:
1. Use the asemgr utility to put all the services off line if you
   want to preserve the ASE database, or to delete all the services
   if you do not want to preserve the existing ASE database.
   Deleting the services allows you to keep any AdvFS and LSM
   disks configured.

2. Use the setld -i | grep ASE command to determine which
   DECsafe Available Server subsets are installed.

3. Delete the DECsafe Available Server subsets with the setld -d
   command. You will be asked if you want to retain the existing
   ASE database. If you do not perform an update installation of
   Digital UNIX Version 4.0A, but instead install the operating
   system from scratch, move the database file
   (/usr/var/ase/config/asecdb) to a safe place prior to installing
   Digital UNIX Version 4.0A. This prevents the database from
   being overwritten during the installation.

4. Update install Digital UNIX to Version 4.0A, or install Digital
   UNIX Version 4.0A from scratch, and register the appropriate
   licenses.

5. If necessary, move the database file back to
   /usr/var/ase/config/asecdb after the operating system has
   been installed. Ensure that the protections, owner, and group
   are -rw-r--r-- root system.

6. Register the TruCluster Available Server Software Version 1.4
   software license before you install the TruCluster Software.

7. Install the TruCluster Available Server Software Version 1.4
   software subsets.

8. Specify the appropriate information at the prompts.

9. Rebuild the kernel for each system.

10. Ensure that the host name and IP address of each existing
    ASE member system is in the /etc/hosts file.

11. Reboot each system.

12. Run the asemgr utility on only one system to add the member
    systems and set up the ASE services.
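Steps 3 and 5 amount to stashing the asecdb file and restoring it with the right mode and ownership. A minimal sketch follows; the helper names are ours, and the chown/chgrp line is commented out so the sketch can be exercised without superuser privileges:

```shell
# stash_db <db> <copy>   : save the ASE database before the OS install
# restore_db <copy> <db> : put it back with mode -rw-r--r-- (644);
#                          on the real system, also chown root / chgrp system
stash_db() {
    cp "$1" "$2"
}

restore_db() {
    cp "$1" "$2"
    chmod 644 "$2"
    # chown root "$2" && chgrp system "$2"   # requires superuser
}
```

On the real system the database path is /usr/var/ase/config/asecdb, and the safe copy must live outside the file systems that the installation overwrites.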


Simultaneous Upgrade from Digital UNIX Version 3.2A/ASE V1.2 to
Digital UNIX Version 4.0A/ASE V1.4

Perform the following tasks for a simultaneous upgrade from
Digital UNIX Version 3.2A and DECsafe Available Server Version
1.2 to Digital UNIX Version 4.0A and TruCluster Available Server
Software Version 1.4.


Note that the steps provided here only upgrade to Digital UNIX
Version 3.2G and DECsafe Available Server Version 1.3. After
completing these steps, you must complete the steps of the
simultaneous upgrade from Digital UNIX Version 3.2G/ASE V1.3
to Digital UNIX Version 4.0A and TruCluster Available Server
Software Version 1.4.
1. Use the asemgr utility to put all the services off line if you
   want to preserve the ASE database. If you will not preserve
   the existing ASE database, delete all the services (this allows
   you to keep any AdvFS and LSM disks configured).

2. Use the setld -i | grep ASE command to determine which
   DECsafe Available Server subsets are installed.

3. Delete the DECsafe Available Server subsets with the setld -d
   command. You will be asked if you want to retain the existing
   ASE database. Move the database file
   (/usr/var/ase/config/asecdb) to a safe place prior to upgrading
   to Digital UNIX Version 3.2C.

4. Update install (installupdate) Digital UNIX to Version 3.2C
   and register the appropriate licenses.

5. Upgrade (setld -l) Digital UNIX to Version 3.2G and register
   the appropriate licenses.

6. Move the database file back to /usr/var/ase/config/asecdb
   after the operating system has been upgraded. Ensure that
   the protections, owner, and group are -rw-r--r-- root system.

7. Register the DECsafe Available Server Version 1.3 software
   license before you install the software.

8. Install the DECsafe Available Server Version 1.3 software
   subsets.

9. Enter s to select "Performing a simultaneous upgrade" when
   you receive the following prompt:

   ASE Installation Menu

   f) Setting up an ASE for the first time
   r) Performing a rolling upgrade
   s) Performing a simultaneous upgrade
   e) Adding to an ASE with a V1.3 operating version

   x) Quit installation

   ?) Help

   Enter your choice: s

10. Specify the appropriate information at the prompts.

11. When prompted, specify whether you want to use the
    preserved ASE database.

12. Rebuild the kernel for each system.

13. Reboot each system.

14. If you preserved an ASE database, run the asemgr utility and
    put the services on line.
    If you did not preserve a database, use the asemgr utility to
    add the member systems and set up the ASE services.

15. Perform the steps for a simultaneous upgrade from Digital
    UNIX Version 3.2G/ASE V1.3 to Digital UNIX Version
    4.0A/ASE V1.4.

Adding a Member System to an Existing ASE with ASE V1.4 Operating
Software

You can add a new member to an existing TruCluster Available
Server Software Version 1.4 configuration without shutting down
the ASE.
To install TruCluster Available Server Software Version 1.4 on
a system and add it to an existing ASE, perform the following
tasks:
1. Install the Digital UNIX Version 4.0A operating system and
   register the appropriate licenses.

2. Register the TruCluster Available Server Software Version 1.4
   license before you install the TruCluster Software.

3. Use the setld -l command to load the TruCluster Available
   Server Software Version 1.4 subsets on the new system.

4. Specify the appropriate information at the installation
   prompts.

5. When the installation has completed and the kernel has been
   rebuilt, move the new kernel to the root file system.

6. Reboot the system.

7. Ensure that the host name and IP address of each existing
   ASE member system is in the /etc/hosts file.

8. Run the asemgr utility on an existing member system and add
   the new system to the ASE.


Installing TruCluster Software


Overview

This section contains an example of installing TruCluster
Available Server Software Version 1.4 on a system that was
previously a member of an ASE.

Installing TruCluster Available Server Software Version 1.4

Use the following steps to install TruCluster Available Server
Software Version 1.4:

1. Log in as superuser.

2. Load the Complementary Products CDROM into the
   appropriate CDROM drive.

3. Mount the CDROM on the /mnt (or other appropriate)
   directory, for example:

   # mount -r /dev/rz4c /mnt

4. Load the TruCluster Software subsets using the setld -l
   utility. Specify the mount point and the directory where
   the TruCluster Available Server Software Version 1.4 kit is
   located:

   # setld -l /mnt/TCR140

5. The setld utility and installation procedure provide the output
   shown in Example 4-1.


Example 4-1 Installing TruCluster Available Server Software Version 1.4

# setld -l /mnt/TCR140

*** Enter subset selections ***


The following subsets are mandatory and will be installed automatically
unless you choose to exit without installing any subsets:
* TruCluster Available Server Software
* TruCluster Common Components
* TruCluster Configuration Software
The subsets listed below are optional:
There may be more optional subsets than can be presented on a single
screen. If this is the case, you can choose subsets screen by screen
or all at once on the last screen. All of the choices you make will
be collected for your confirmation before any subsets are installed.
- TruCluster(TM) Software:
1) TruCluster Cluster Monitor
--- MORE TO FOLLOW ---
Enter your choices or press RETURN to display the next screen.
Choices (for example, 1 2 4-6):

Return


2) TruCluster Reference Pages

The following choices override your previous selections:

3) ALL mandatory and all optional subsets
4) MANDATORY subsets only
5) CANCEL selections and redisplay menus
6) EXIT without installing any subsets

Add to your choices, choose an overriding action or
press RETURN to confirm previous selections.
Choices (for example, 1 2 4-6): 3
You are installing the following mandatory subsets:
TruCluster Available Server Software
TruCluster Common Components
TruCluster Configuration Software
You are installing the following optional subsets:
- TruCluster(TM) Software:
TruCluster Cluster Monitor
TruCluster Reference Pages
Is this correct? (y/n): y
Checking file system space required to install selected subsets:
File system space checked OK.
5 subset(s) will be installed.
Loading 1 of 5 subset(s)....
TruCluster Common Components
Copying from . (disk)
Verifying
Loading 2 of 5 subset(s)....
TruCluster Available Server Software
Copying from . (disk)
Working....Thu Sep 5 13:45:40 EDT 1996
Verifying
Loading 3 of 5 subset(s)....
TruCluster Cluster Monitor
Copying from . (disk)
Working....Thu Sep 5 13:46:13 EDT 1996
Verifying
Loading 4 of 5 subset(s)....
TruCluster Reference Pages
Copying from . (disk)
Verifying
Loading 5 of 5 subset(s)....
TruCluster Configuration Software
Copying from . (disk)
Verifying

5 of 5 subset(s) installed successfully.

Configuring "TruCluster Common Components " (TCRCOMMON140)
Configuring "TruCluster Available Server Software" (TCRASE140)
Configuring "TruCluster Cluster Monitor " (TCRCMS140)
Configuring "TruCluster Reference Pages " (TCRMAN140)
Configuring "TruCluster Configuration Software " (TCRCONF140)

Enter the IP name for the member network interface [tinker]: Return
You chose "tinker," IP 16.30.80.33 using interface ln0
Is this correct? [y]: Return
Do you want to run the ASE logger on this node? [n]: y

An old ASE database file has been found. Do you want to use this (y/n): n
Removing the local disk copy of the ASE database (services and members) ...

Initializing a new ASE V1.4 database ...

The kernel will now be configured using "doconfig".

Enter the name of the kernel configuration file. [TINKER]: Return

*** KERNEL CONFIGURATION AND BUILD PROCEDURE ***


Saving /sys/conf/TINKER as /sys/conf/TINKER.bck
Do you want to edit the configuration file? (y/n) [n]: n
*** PERFORMING KERNEL BUILD ***
The ASE I/O Bus Renumbering Tool has been invoked.
Select the controllers that define the shared ASE I/O buses.

     Name   Controller  Slot  Bus  Slot
 0)  scsi0  tcds0       0     tc0   6
 1)  scsi1  tcds0       1     tc0   6
 2)  scsi2  tcds1       0     tc0   1
 3)  scsi3  tcds1       1     tc0   1
 4)  scsi4  tza0        0     tc0   4

 q)  Quit without making changes

Enter your choices (comma or space separated): 2 4

scsi2  tcds1  0  tc0  1
scsi4  tza0   0  tc0  4

Are the above choices correct (y|n)? [y]: y
I/O Controller Name Specification Menu

All controllers connected to an I/O bus must be named the same on all ASE
members. Enter the controller names for all shared ASE I/O buses by assigning
them one at a time or all at once with the below options.

     Name   New Name  Controller  Slot  Bus  Slot
 2)  scsi2  scsi2     tcds1       0     tc0   1
 4)  scsi4  scsi4     tza0        0     tc0   4


 f)  Assign buses starting at a given number
 p)  Assign buses as was done in pre-ASE V1.3
 v)  View non shared controllers
 s)  Show previous assignments
 r)  Reapply previous assignments
 q)  Quit without making any changes
 x)  Exit (done with modifications)

Enter your choice [f]:

No changes made
Working....Thu Sep 5 13:51:44 EDT 1996
Working....Thu Sep 5 13:53:46 EDT 1996
Working....Thu Sep 5 13:55:48 EDT 1996

The new kernel is /sys/TINKER/vmunix

The kernel build was successful. Please perform the following actions:

o Move the new kernel to /.
o Before rebooting, make sure that the member network interface IP
  addresses for all cluster members are recorded in each member's
  /etc/hosts file.
o Reboot the system.
#
The numbered callouts in Example 4-1 correspond to the
following notes:

1. Use the setld utility to install the TruCluster Available Server
   Software Version 1.4 subsets.

2. This enables the ASE Logger daemon, which tracks all
   messages generated by the member systems.

3. The system on which the installation is taking place was
   previously in an ASE and has an existing database. You would
   not normally reuse the existing database.

4. A new kernel is automatically built.

5. Select the external controllers on the system that are to be
   used as the shared SCSI buses in the ASE.

6. All member systems must recognize the disks on a shared
   SCSI bus at the same device number. Because different
   systems have different numbers of internal SCSI buses, the
   /var/ase/sbin/ase_fix_config script is used to assign a specific
   bus number to each external SCSI controller installed on a
   system.
   You are given the opportunity to change the shared SCSI bus
   numbers to ensure that the shared buses are the same on all
   systems in the ASE.
   In this case, the shared SCSI buses are SCSI buses 2 and 4,
   and no changes are necessary.

7. After the installation is complete, move the new kernel to
   the root file system, ensure that the member network interface
   IP addresses for all cluster members are recorded in each
   member's /etc/hosts file, then reboot the system.


After TruCluster Available Server Software Version 1.4 has been installed, you need to run the asemgr on one (and only one) ASE member system to add the newly installed system to the ASE.


Summary
Performing Preliminary Setup Tasks

Before you install TruCluster Available Server Software Version 1.4, you must determine if you are properly prepared for the installation.

Read the release notes.

Verify any system prerequisites.

Set up and test the hardware.

Use console commands to make sure all the devices on the shared bus(es) are recognized.

Install Digital UNIX including the following subsets:

OSFCLINET405: Basic Networking Services

OSFPGMR405: Standard Programmer Commands

OSFCMPLRS405: Compiler Back End

NFS, LSM and AdvFS subsets if you will use those services.

Set up the local network, BIND, NFS, mail, and NTP.


Add each member system to all the member systems' /etc/hosts files.
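As a quick sanity check for that last item, a script along these lines can confirm that every planned member name appears in a hosts file. This is a sketch: the member names and addresses are the examples used in this course guide, and a temporary file stands in for the real /etc/hosts.

```shell
# Sketch: verify that each planned ASE member is listed in a hosts file.
# Point HOSTS_FILE at /etc/hosts on a real member system.
MEMBERS="tinker tailor"
HOSTS_FILE=$(mktemp)
cat > "$HOSTS_FILE" <<'EOF'
16.140.64.121 tinker.abc.def.com tinker
16.140.64.122 tailor.abc.def.com tailor
EOF

missing=0
for m in $MEMBERS; do
    if grep -qw "$m" "$HOSTS_FILE"; then
        echo "$m: found"
    else
        echo "$m: MISSING"
        missing=1
    fi
done
rm -f "$HOSTS_FILE"
```

Running the same loop on each member before installation catches hosts-file omissions early, before they surface as ASE communication failures.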
To use the Cluster Monitor, you must also install the following
subsets. Install the C++ Class Shared Libraries and CDE
Minimum Runtime Environment subsets before installing
TruCluster Available Server Software Version 1.4.

OSFCDEMIN405: CDE Minimum Runtime Environment
CXLSHRDA405: C++ Class Shared Libraries
TCRCMS140: TruCluster Cluster Monitor

Preparing to Install TruCluster Software

Choose one of the installation procedures based upon your configuration.

Setting up an ASE for the first time if you are installing TruCluster Available Server Software Version 1.4 on systems that are not currently in an ASE. Use this procedure if none of the systems have DECsafe Available Server or TruCluster Available Server Software Version 1.4 installed.

Performing a rolling upgrade if any of the following is true:

DECsafe Available Server Version 1.2A/Digital UNIX Version 3.2C is installed on a member of an existing ASE.
DECsafe Available Server Version 1.3/Digital UNIX Version 3.2D or 3.2F is installed on a member of an existing ASE.
DECsafe Available Server Version 1.3/Digital UNIX Version 3.2G is installed on a member of an existing ASE.

This procedure allows you to upgrade member systems without shutting down the ASE.
In the first two cases above, a rolling upgrade to Digital UNIX Version 3.2G and DECsafe Available Server Version 1.3 is required before a rolling upgrade to Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4 can be completed.

Performing a simultaneous upgrade if DECsafe Available Server Version 1.2A, Version 1.2, or a version previous to 1.2 is installed on an existing ASE. If the DECsafe version is 1.2A or 1.2, you can preserve the existing ASE database if desired. If the DECsafe version is prior to 1.2, you cannot preserve the ASE database. You must add the member systems and services after installation.

Adding to an ASE with a V1.4 operating version if you are adding a new member to an existing DECsafe Available Server Version 1.3 configuration. You can add the new member without shutting down the existing ASE.

Installing TruCluster Software

Use the following steps to install TruCluster Available Server Software Version 1.4:
1. Log in as superuser.
2. Load the Complimentary Products CDROM into the appropriate CDROM drive.
3. Mount the CDROM on the /mnt (or other appropriate) directory, for example:
   # mount -r /dev/rz4c /mnt
4. Load the TruCluster Software subsets using the setld -l utility. Specify the mount point and the directory where the TruCluster Available Server Software Version 1.4 kit is located:
   # setld -l /mnt/TCR140
5. Answer the questions at the prompts throughout the installation.
6. Rebuild the kernel.
7. Reboot the system.
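The steps above can be strung together in a small script. The sketch below is a dry run only: the `run` helper echoes each command instead of executing it, and the device name rz4c, kit directory TCR140, and kernel path come from the example in this chapter and will differ per site.

```shell
# Dry-run sketch of the installation sequence; on a real member system,
# redefine run() to execute its arguments instead of echoing them.
run() { echo "+ $*"; }

run mount -r /dev/rz4c /mnt        # step 3: mount the CD-ROM read-only
run setld -l /mnt/TCR140           # step 4: load the TruCluster subsets
                                   # steps 5-6: answer the prompts; the
                                   # kernel is rebuilt during installation
run mv /sys/TINKER/vmunix /vmunix  # move the new kernel to / (see text)
run shutdown -r now                # step 7: reboot the system
```

Keeping the sequence in a script makes it easy to repeat identically on every member during a rolling upgrade.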


Exercises
Performing Preliminary Setup Tasks: Exercise

1. Before installing the TruCluster Software, you should:
   a. Read the release notes
   b. Verify system prerequisites
   c. Install, set up, and test hardware
   d. All of the above

2. Which subset is not required for TruCluster Available Server Software Version 1.4?
   a. OSFCLINET405
   b. OSFPGMR405
   c. OSFCMPLRS405
   d. None of the above; they are all required.

3. To use the Cluster Monitor, you must install which of these subsets?
   a. CXLSHRDA405
   b. OSFCDEMIN405
   c. TCRCMS140
   d. All of the above

Performing Preliminary Setup Tasks: Solution

1. d  Before installing the TruCluster Software, you should:
   a. Read the release notes
   b. Verify system prerequisites
   c. Install, set up, and test hardware
   d. All of the above

2. d  Which subset is not required for TruCluster Available Server Software Version 1.4?
   a. OSFCLINET405
   b. OSFPGMR405
   c. OSFCMPLRS405
   d. None of the above; they are all required.

3. d  To use the Cluster Monitor, you must install which of these subsets?
   a. CXLSHRDA405
   b. OSFCDEMIN405
   c. TCRCMS140
   d. All of the above

Preparing to Install TruCluster Software: Exercise

1. For which environment can you use a rolling upgrade?
   a. ASE V1.2A/Digital UNIX Version 3.2C
   b. ASE V1.1/Digital UNIX Version 3.0
   c. ASE V1.0A/Digital UNIX Version 2.1
   d. ASE V1.0/Digital UNIX Version 2.0

2. A rolling upgrade allows you to upgrade ASE member systems without shutting down the ASE.
   a. True
   b. False

3. You can perform a rolling upgrade to Digital UNIX Version 4.0A from which operating system version?
   a. Digital UNIX Version 3.2C
   b. Digital UNIX Version 3.2D
   c. Digital UNIX Version 3.2F
   d. Digital UNIX Version 3.2G

4. If ASE member systems are at DECsafe Available Server Version 1.2, you can preserve the ASE database if desired.
   a. True
   b. False

5. To add a new member to an existing TruCluster Available Server Software Version 1.4 configuration, you must shut down the ASE before adding the new member.
   a. True
   b. False


6. Which configuration is supported during a rolling upgrade to TruCluster Available Server Software Version 1.4?
   a. ASE V1.3/Digital UNIX Version 3.2D and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A
   b. ASE V1.3/Digital UNIX Version 3.2F and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A
   c. ASE V1.3/Digital UNIX Version 3.2G and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A
   d. All of the above

Preparing to Install TruCluster Software: Solution

1. a  For which environment can you use a rolling upgrade?
   a. ASE V1.2A/Digital UNIX Version 3.2C
   b. ASE V1.1/Digital UNIX Version 3.0
   c. ASE V1.0A/Digital UNIX Version 2.1
   d. ASE V1.0/Digital UNIX Version 2.0

2. a  A rolling upgrade allows you to upgrade ASE member systems without shutting down the ASE.
   a. True
   b. False

3. d  You can perform a rolling upgrade to Digital UNIX Version 4.0A from which operating system version?
   a. Digital UNIX Version 3.2C
   b. Digital UNIX Version 3.2D
   c. Digital UNIX Version 3.2F
   d. Digital UNIX Version 3.2G

4. a  If ASE member systems are at DECsafe Available Server Version 1.2, you can preserve the ASE database if desired.
   a. True
   b. False

5. b  To add a new member to an existing TruCluster Available Server Software Version 1.4 configuration, you must shut down the ASE before adding the new member.
   a. True
   b. False

6. c  Which configuration is supported during a rolling upgrade to TruCluster Available Server Software Version 1.4?
   a. ASE V1.3/Digital UNIX Version 3.2D and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A
   b. ASE V1.3/Digital UNIX Version 3.2F and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A
   c. ASE V1.3/Digital UNIX Version 3.2G and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A
   d. All of the above

Installing TruCluster Software: Exercise

Register the ASE-OA license PAK and install the TruCluster Available Server Software Version 1.4 software on all member systems.

Installing TruCluster Software: Solution

If the system is presently a member of an ASE, remove the system from the ASE and delete the ASE software subsets before installing TruCluster Available Server Software Version 1.4.
Use lmfsetup, lmf, or the License Manager GUI to register the PAK.
Use setld to install the TruCluster Available Server Software Version 1.4 software subsets and rebuild the kernel. See the text for a sample script.


5
Setting Up and Managing ASE Members


About This Chapter

Introduction

This chapter describes how to set up and administer your ASE member systems.

Objectives

To understand how to set up and manage ASE members, you should be able to:

Set up ASE member systems
Describe the purpose and syntax of the asemgr utility
Manage TruCluster Software event logging

Resources

For more information on the topics in this chapter, see the following:

TruCluster Available Server Software Available Server Environment Administration
TruCluster Available Server Software Hardware Configuration and Software Installation
TruCluster Available Server Software Version 1.4 Release Notes
Reference Pages



Introducing the asemgr Utility

Overview

You use the asemgr utility to set up and administer the ASE. Tasks you can perform with the asemgr include the following:

Adding and deleting network interfaces
Creating and managing ASE services
Displaying the status of member systems and ASE services
Adding and deleting member systems
Specifying logger locations and levels

asemgr Command Syntax

The asemgr utility has an interactive menu mode and a limited command line interface. If you enter asemgr without options, it displays menus and prompts you for information. The command line interface allows you to use asemgr in shell scripts.
The syntax for the asemgr command is as follows:
/usr/sbin/asemgr [options]
Command line options are as follows:

-d [ -h ] [ member ] [ -v ] [ service ] [ -l ]
Displays the status of all the member systems (-h) and services (-v), or specific member systems and services. Also displays the member systems that are running the logger daemon (-l).

-m service member
Relocates the specied service to the specied member system.
When you relocate a service, you stop the service on the
member system currently running the service and start the
service on another member system.

-r service . . .
Restarts a service.

-s service . . .
Starts the specied service and sets it on line, making it
available to clients.

-x service . . .
Stops the specied service and sets it off line, making it
unavailable to clients.
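Because these options make asemgr scriptable, routine operations can be collected in a shell script. The sketch below is a dry run: ASEMGR is set so that each command line is echoed rather than executed, and the service name nfs_svc is hypothetical.

```shell
# Dry-run sketch of scripted asemgr use; on a real member system you
# would set ASEMGR=/usr/sbin/asemgr so the commands actually run.
ASEMGR="echo /usr/sbin/asemgr"

$ASEMGR -d -h -v           # display status of all members and services
$ASEMGR -x nfs_svc         # stop the service and set it off line
$ASEMGR -s nfs_svc         # start the service and set it on line
$ASEMGR -m nfs_svc tailor  # relocate the service to member tailor
```

A pattern like this is useful for planned maintenance, where a service must be taken off line, relocated, and brought back in a fixed order.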


Running Multiple Instances of the asemgr

Some ASE administrative tasks can lock the ASE. If you try
to run the asemgr utility and the ASE is locked, the following
message is displayed:
ASE is locked by hostname
This message indicates that the task cannot be performed because
another member system is running the asemgr utility.


Setting Up and Managing Members


Overview

After you have installed the TruCluster Software and rebooted the members, you can use the asemgr to add all the ASE member systems at the same time and from the same system. The configuration database created by asemgr (/usr/var/ase/config/asecdb) will be copied to each member system.
You can also add members one at a time from an existing member system once the TruCluster Software is installed and running.

Using asemgr the First Time

The first time you invoke the asemgr utility you will not see the main menu. Instead you will be prompted for member system names and asked to confirm the configuration, as shown in the following example:
# /usr/sbin/asemgr
Enter a comma separated list of all the host names you want
as ASE servers.
Enter Members: tinker, tailor
Member List: tinker, tailor
Is this correct (y/n) [y]: y
Would you like to define any other network interfaces to tinker
for ASE use (y/n)? [n]: n
Would you like to define any other network interfaces to tailor
for ASE use (y/n)? [n]: n
ASE Network Configuration

Member Name    Interface Name    Member Net    Monitor
___________    ______________    __________    _______
tinker         tinker            Primary       Yes
tailor         tailor            Primary       Yes

Is this configuration correct (y|n)? [y]: y


After you enter member names and verify the configuration, the asemgr main menu is displayed:
TruCluster Available Server (ASE)
ASE Main Menu
a) Managing the ASE
-->
m) Managing ASE Services -->
s) Obtaining ASE Status -->
x) Exit

?) Help

Enter your choice:


Initializing ASE Member Systems

If an ASE does not function properly when you attempt to add members, first make sure that you have adhered to the installation requirements. If this still does not allow you to fix the problem, you can initialize one or all of the member systems in an ASE.
Initializing a system stops any running ASE daemons and removes any member system and service information from the ASE database on the system. After you initialize a system, it can be added to an existing ASE or used in a new ASE.
Initializing a Single Member System

To initialize one member system in an ASE, use the following procedure:
1. If the system is already a member system, use the asemgr utility to delete the member system from the ASE. If you cannot delete the member system, you cannot initialize only that member.
2. If the system is not an ASE member system, delete the /usr/var/ase/config/asecdb ASE database file, if it exists, from the system.
3. Invoke the /usr/sbin/asesetup command on the system.
4. Run the asemgr utility on an existing member system and add the initialized system to the ASE.


Initializing All the Member Systems

Initializing all the member systems returns the ASE to a state that includes no member systems or services. After you do this, you must add the member systems and set up your services again.
To initialize all the member systems in an ASE, follow these steps:
1. If possible, use the asemgr utility to display the status of the member systems, networks, and services in the ASE. This information will help you to recreate your ASE.
2. If possible, use the asemgr utility to delete all the services from the ASE. This allows you to save any Logical Storage Manager (LSM) or Advanced File System (AdvFS) configurations on a specific system.
3. Delete the /usr/var/ase/config/asecdb ASE database file from all the systems.
4. Invoke the /usr/sbin/asesetup command on each system.
5. Run the asemgr utility on a system, add the other initialized systems to the ASE, one at a time, and set up your services.
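The per-system steps in that procedure can be looped over the member list. This is a dry-run sketch only: the `run` helper echoes each command, the member names are the examples from this chapter, and the use of rsh to reach each system is an assumption for illustration.

```shell
# Dry-run sketch of initializing every member (steps 3 and 4 above).
run() { echo "+ $*"; }
MEMBERS="tinker tailor"

for m in $MEMBERS; do
    run rsh "$m" rm -f /usr/var/ase/config/asecdb  # step 3: delete the database
    run rsh "$m" /usr/sbin/asesetup                # step 4: run asesetup
done
run /usr/sbin/asemgr   # step 5: re-add the initialized members interactively
```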


Using asemgr to Manage Members

To manage ASE member systems, invoke the asemgr utility and choose the Managing the ASE item from the main menu. The following example shows the Managing the ASE menu.

Example 5-1 asemgr Menus for Members

# /usr/sbin/asemgr
TruCluster Available Server (ASE)
ASE Main Menu
a) Managing the ASE
-->
m) Managing ASE Services -->
s) Obtaining ASE Status -->
x) Exit

?) Help

Enter your choice: a


Managing the ASE

a) Add a member
d) Delete a member
n) Modify the network configuration
m) Display the status of the members
l) Set the logging level
e) Edit the error alert script
t) Test the error alert script

x) Exit to the Main Menu          ?) Help

Enter your choice [x]:


Adding a Member

Choose the Add a member item from the Managing the ASE menu to add a member. The screen displays the current member systems and prompts you to enter a new member name and to confirm the new configuration, as shown in the following example:

Managing the ASE

a) Add a member
d) Delete a member
n) Modify the network configuration
m) Display the status of the members
l) Set the logging level
e) Edit the error alert script
t) Test the error alert script

x) Exit to the Main Menu          ?) Help

Enter your choice [x]: a
Member List: tinker, tailor
Enter a new member: weaver
Member List: tinker, tailor, weaver
Is this correct (y/n)? [y]: y
Would you like to define any other network interfaces
to weaver for ASE use? (y/n)? [n]: n

ASE Network Configuration

Member Name    Interface Name    Member Net    Monitor
___________    ______________    __________    _______
tinker         tinker            Primary       Yes
tailor         tailor            Primary       Yes
weaver         weaver            Primary       Yes

Is this configuration correct (y|n)? [y]: y

Deleting a Member

Choose the Delete a member item from the Managing the ASE menu to remove a member system. The screen displays the current member systems and prompts you to identify the system to remove. The following example shows the screen display for deleting a member:

Managing the ASE

a) Add a member
d) Delete a member
n) Modify the network configuration
m) Display the status of the members
l) Set the logging level
e) Edit the error alert script
t) Test the error alert script

x) Exit to the Main Menu          ?) Help

Enter your choice [x]: d
Select the member to delete:
1) tailor
2) weaver
3) tinker

x) Exit without deleting a member          ?) Help

Enter your choice [x]: 2

New member list: tinker, tailor
Member to delete: weaver
Is this correct? (y/n) [y]: y
Deleting member weaver ...
Member deleted

You cannot delete a member selected as a favored member for a service using the favor members or restrict to favored members placement policy. You must modify the service and remove that favored member from the list before you can delete the member from the available server environment.
You cannot delete the member running the asemgr utility. If there is only one member system in the ASE, you cannot delete that member using the asemgr utility. You can use setld -d to delete the TruCluster Software subset from the last member system.

Managing ASE Networks

When you add a member system to the ASE, the asemgr utility prompts you for additional network interface names. Before you add an interface, you must use the netsetup utility to define the network interface on the system.
Primary and backup networks in an ASE must be subnets that are common to all member systems. Network interface names used in an ASE for common networks must be included in the local /etc/hosts file on each member system.
The following example is part of an /etc/hosts file and shows two member systems, tinker and tailor, and multiple network interfaces for the systems.

# ASE member systems
#
16.140.64.121 tinker.abc.def.com tinker
16.140.64.122 tailor.abc.def.com tailor
#
#
# FDDI ring #1
#
16.140.64.121 tinker1.abc.def.com tinker1
16.140.64.122 tailor1.abc.def.com tailor1
#
#
# FDDI ring #2
#
16.140.64.121 tinker2.abc.def.com tinker2
16.140.64.122 tailor2.abc.def.com tailor2
You must specify the interface names for the primary and backup networks in the local /etc/routes file on each member system. For each member system, you must define a host route to all other member systems. This definition is needed to fail over IP traffic between member systems when a network path fails.


For example, if your member systems are tinker1 and tailor1, where the number in the name refers to the subnet, and each member system also has interface names tinker2 and tailor2, then each member system's /etc/routes file must contain the following information:

-host tinker tinker
-host tailor tailor
-host tinker1 tinker1
-host tinker2 tinker2
-host tailor1 tailor1
-host tailor2 tailor2
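The pattern in the host-route entries above is regular enough to generate mechanically. A sketch (the member names and numbered interface suffixes are the ones from the example; the output order groups routes by member rather than matching the listing line for line):

```shell
# Generate -host route entries for every member and every numbered
# interface, matching the /etc/routes example above.
MEMBERS="tinker tailor"
SUFFIXES="1 2"

routes=""
for m in $MEMBERS; do
    routes="${routes}-host $m $m
"
    for s in $SUFFIXES; do
        routes="${routes}-host $m$s $m$s
"
    done
done
printf '%s' "$routes"
```

Generating the entries once and distributing the result keeps the /etc/routes files identical on every member, which is what the failover requirement above depends on.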

Modifying the Network Configuration

Choose the Modify the network configuration item from the Managing the ASE menu to manage the network interface in an ASE configuration. The ASE Network Modify menu appears, containing options that allow you to add and delete network interfaces, display the current network configuration, specify primary and backup networks, and specify networks to be modified or ignored.

Enter your choice [x]: n

ASE Network Modify Menu

a) Add network interfaces
d) Delete network interfaces
s) Show the current configuration
p) Specify the primary ASE member network
b) Specify a backup ASE member network
i) Specify an ASE member network to be ignored
m) Specify network interfaces to be monitored

q) Quit without making changes
x) Exit
Adding and Deleting Network Interfaces

Before you specify a network interface for a member system, the interface must be defined and configured on the system.
Choose the Add network interfaces item from the ASE Network Modify Menu to add a network interface:

ASE Member Menu
Select a member to add an interface to:
0) tinker
1) tailor
q) Quit without making changes
Enter your choice [q]: 1
Enter interface names for member tailor
Interface name (return to exit): tailor1


To delete network interfaces, choose the Delete network interfaces item:

ASE Member Menu
Select a member to delete an interface from:
0) tinker
1) tailor
q) Quit without making changes
Enter your choice [q]: 1
Network interfaces for member tailor
Choose one or more network interfaces to delete:
1) tailor    16.140.64.121
2) tailor1   16.142.112.121
3) tailor2   16.142.96.122
q) Quit to previous menu

Enter your choices (comma or space separated): 1
Displaying the Current Network Configuration

Choose the Show the current configuration item from the ASE Network Modify menu to display the member systems, their interface names, whether an interface is designated as a primary or a backup network, and whether monitoring is enabled. For example:

ASE Network Configuration

Member Name    Interface Name    Member Net    Monitor
___________    ______________    __________    _______
tinker         tinker            Primary       Yes
tinker1        tinker1           Backup        Yes
tinker2        tinker2           Backup        Yes
tailor         tailor            Primary       Yes
tailor1        tailor1           Backup        Yes
tailor2        tailor2           Backup        Yes

Specifying Primary and Backup Networks

The primary network in an ASE is the network most frequently used to query other member systems. Backup networks are also used for queries, but at a slower rate. Interfaces for primary and backup networks must be common to all the ASE member systems and included in each member system's local /etc/hosts and /etc/routes files.
Choose the Specify the primary ASE member network menu item to select an interface for the primary network:

ASE Member Primary Network Menu
Choose one of the networks to be the ASE member
primary network:
0) 16.140.64.121  (tinker, tailor)
1) 16.142.112.121 (tinker1, tailor1)
2) 16.142.96.122  (tinker2, tailor2)
q) Quit to previous menu

Enter your choice: 1
16.142.112.121 (tinker1, tailor1)
Is the above choice correct? (y/n) [y]: y

Choose the Specify a Backup ASE Member Network menu item to select backup network interfaces for the ASE:

ASE Member Backup Network Menu
Choose the networks you want to be the ASE member
backup networks:
0) 16.140.64.121  (tinker, tailor)
1) 16.142.112.121 (tinker1, tailor1)
2) 16.142.96.122  (tinker2, tailor2)
q) Quit to previous menu
Enter your choices (comma or space separated): 0,2
16.140.64.121 (tinker, tailor)
16.142.96.122 (tinker2, tailor2)
Are the above choices correct? (y/n) [y]: y
Specifying a Network to be Ignored

Choose the Specify an ASE member network to be ignored menu item to specify a network that you want to configure but you do not currently want the member system to use:

Ignore ASE Member Network Menu
Choose a network not to be used as an ASE member
network:
0) 16.140.64.121  (tinker, tailor)
1) 16.142.112.121 (tinker1, tailor1)
2) 16.142.96.122  (tinker2, tailor2)
q) Quit to previous menu
Enter your choices (comma or space separated): 2
16.142.96.122 (tinker2, tailor2)
Are the above choices correct? (y/n) [y]: y
Specifying a Network to be Monitored

You should monitor an interface if you are concerned with client access on a particular interface. Monitoring an interface allows you to customize a TruCluster Available Server operation when a network interface fails. You can monitor the primary and backup network interfaces in an ASE. You can also monitor a network interface that is configured on only one system and is not common to all the ASE member systems.
If a monitored network interface fails, the TruCluster Software runs the error Alert script which invokes the /var/ase/lib/ni_status_awk script that is located on the member system. The default script causes the TruCluster Software to stop all the services running on that member system and start them on another member system if all the network interfaces on the first member system fail.
However, you can edit the /var/ase/lib/ni_status_awk script on each member system to specify a different action to take. For example, you can edit the script so that services relocate to another member system if any network interface fails or if a particular interface fails. In addition, because the error Alert script is propagated on all the member systems, you can edit the error Alert script itself, so the actions will be the same on all systems. Use the asemgr utility to edit the error Alert script.
Choose the Specify network interfaces to be monitored menu item to modify specific interfaces:

ASE Member Menu
Choose a member to modify:
0) tinker
1) tailor
q) Quit without making changes
Enter your choice: 0

Network Interfaces for Member tinker
Choose one or more network interfaces:
0) tinker   16.140.64.121   (monitored)
1) tinker1  16.142.112.121  (not monitored)
2) tinker2  16.142.96.122   (monitored)
q) Quit to previous menu
n) Do not monitor any interfaces

Enter your choice: 1
tinker1 16.142.112.121
Are the above choices correct? (y/n) [y]: y


Displaying ASE Member Status

Choose the Display the status of the members menu item to display the status of member systems. The screen displays the host status (UP or DOWN) and the status of the agent daemon for each member system, as shown in the following example:

Managing the ASE

a) Add a member
d) Delete a member
n) Modify the network configuration
m) Display the status of the members
l) Set the logging level
e) Edit the error alert script
t) Test the error alert script

x) Exit to the Main Menu          ?) Help

Enter your choice [x]: m

Member Status

Member:    Host Status:    Agent Status:
tinker     UP              RUNNING
tailor     UP              RUNNING

The Director daemon obtains system status from the Host Status Monitor (HSM) daemons running on all the member systems. The following table describes the information for the Host Status field:
Host Status     Description

UP              The member system is up and can be accessed by the member that is running the Director daemon using the primary network.
DOWN            The member system cannot be accessed by the member that is running the Director daemon using the primary network or the SCSI bus.
DISCONNECTED    The member system is disconnected from all monitored networks. Any services running on the member system are stopped, and no services can be added, deleted, or started on the member system.
NETPAR          There is a network partition between the member system and the member system running the Director daemon, although the member systems can communicate using SCSI bus queries. Services that are currently running on the member system remain running, but the member system cannot start or stop any service until it leaves this state.
The Director determines the status of the Agent daemons running on the member systems. The following table describes the information in the Agent Status field:

Agent Status    Description

RUNNING         The ASE Agent daemon is running on the member system.
DOWN            The ASE Agent daemon is not running on the member system.
INITIALIZING    The ASE Agent daemon that is running on the member system is in its initialization phase and will be running soon.
UNKNOWN         The ASE Director daemon cannot determine the state of the Agent daemon on the member system.
INVALID         The ASE Director daemon reports an invalid state for the Agent daemon on the member system.

Resetting the TruCluster Software Daemons

If you experience problems in your ASE, you can reset the TruCluster Software daemons. This stops the Director and Host Status Monitor daemons and initializes the Agent daemons. The Agent daemons then restart the other daemons to make the TruCluster Software fully operational. If resetting the TruCluster Software daemons does not fix the problem, you can initialize or reboot the system.
To reset the TruCluster Software daemons on a member system, use the following command:

/sbin/init.d/asemember restart

TruCluster Software Daemon Scheduling

The TruCluster Software daemons need to run with a scheduling priority that is higher than normal system processes because the daemons must be able to respond to administrative commands and time-sensitive events in the ASE. The Agent daemon (aseagent) and Logger daemon (aselogger) are started in the /sbin/init.d/asemember script with a nice value of -5, which raises the priority of those daemons and all processes they start.
If you see an ASE timeout error message in the daemon.log file, it means the TruCluster Software daemons are timing out while waiting to run.
There might be non-TruCluster processes with higher scheduling priority, forcing the TruCluster Software daemons to wait. In this case, you can raise the scheduling priority of the TruCluster Software daemons by increasing the nice value in the /sbin/init.d/asemember script. Refer to nice(1) for more information about scheduling priorities.
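To see whether a daemon is actually running at the intended nice value, you can check it with ps. The sketch below inspects the current shell (pid $$) as a stand-in; on a member system you would substitute the pid of a TruCluster daemon found with ps.

```shell
# Report the nice value of a process; substitute a TruCluster daemon's
# pid for $$ on a real member system.
pid=$$
ni=$(ps -o nice= -p "$pid")
echo "pid $pid nice value: $ni"
```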
Note that TruCluster Software daemons started with a nice priority will not always stay at that priority. Over time, if the member systems do not reboot, a daemon's priority may return to the average run priority. When the member systems reboot, the daemons' priority is raised again according to the nice value in the /sbin/init.d/asemember script.

Therefore, the default /sbin/init.d/asemember script contains the following command, which supersedes the nice value for the asehsm daemon and runs the daemon with a fixed high priority that does not degrade over time:

aseagent -p hsm

If you do not want the fixed high priority for the asehsm daemon, remove this command from the /sbin/init.d/asemember script.
You can also raise and fix the priority of all the TruCluster Software daemons by including the following command in the /sbin/init.d/asemember script:

aseagent -p all



Using TruCluster Event Logging

Overview

The Logger daemon (aselogger) tracks the TruCluster Available Server messages generated by all the member systems. It is recommended that you run an instance of the Logger daemon on each ASE member system.
The Logger daemon uses the Digital UNIX event logging facility, syslog, which collects messages logged by various kernel, command, utility, and application programs.
Alert level messages are critical and need immediate attention. You can specify special action to take in case an alert error occurs. You can also identify users to receive a mail message or other actions to take in an Alert script.
You can use the asemgr utility to manage TruCluster Software event logging and perform the following tasks:
• Set and display the level of message logging
• Display the logger location
• Edit and test the Alert script

Starting the Logger

During installation, the TruCluster Software prompts you to start
the Logger daemon on that member system. If you choose not to
run the Logger daemon during installation, you can invoke the
following commands to start the Logger daemon.
# rcmgr set ASELOGGER 1
# /sbin/init.d/asemember restart
Stopping the Logger

You can stop TruCluster Software logging on a member system by
entering the following commands:

# rcmgr set ASELOGGER 0
# /sbin/init.d/asemember restart
Setting System Logging

System event messages processed by syslog are logged to a
local file or forwarded to a remote system, as specified in the
/etc/syslog.conf file. If you use the default configuration, all
asemgr utility and TruCluster Software daemon messages are
logged to the /var/adm/syslog.dated/date/daemon.log file on the
system running the logger. The Availability Manager driver
messages are logged to the kern.log file in the same directory.
If no Logger daemon is running, or if the member running the
logger goes down, all TruCluster Software messages are logged
locally.
Example 5-2 System Logging Configuration File

kern.debug      /var/adm/syslog.dated/kern.log
daemon.debug    /var/adm/syslog.dated/daemon.log
*.emerg         *

The preceding example shows the format of the /etc/syslog.conf
file. For more information, see syslogd(8) and Digital UNIX
System Administration, Chapter 14. Each entry specifies:
• The severity level. The syslogd daemon logs all messages of
  the specified level or greater severity. Severity levels include
  emerg (panic), alert, crit, err, warn, notice, info, and debug.
• The part of the system generating the message. The asterisk
  (*) represents all parts of the system.
• The destination where the messages are logged. You can
  specify a full pathname to log to a file, @hostname to forward
  messages to that host, a comma-separated list of users to
  receive messages, or an asterisk (*) to write messages to all
  users who are logged in.
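These selector/destination pairs can be pulled apart with standard text tools; a small sketch that parses sample entries in the same two-field format (the sample data mirrors Example 5-2, not a live /etc/syslog.conf):

```shell
# Print "selector -> destination" for each entry of a
# syslog.conf-style sample (two whitespace-separated fields).
OUT=$(awk 'NF == 2 { printf "%s -> %s\n", $1, $2 }' <<'EOF'
kern.debug	/var/adm/syslog.dated/kern.log
daemon.debug	/var/adm/syslog.dated/daemon.log
*.emerg	*
EOF
)
echo "$OUT"
```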
Displaying Logger Location

To find the TruCluster Software message logs, identify a member
system running the Logger daemon, then check its system log
files. You can determine which member systems are running the
Logger daemon by choosing the Obtaining ASE Status item from
the asemgr main menu, and then choosing the Display the location
of the logger(s) item. The following example shows how to display
the location of the Logger daemon.
Example 5-3 Displaying Logger Daemon Location

Obtaining ASE Status

  m) Display the status of the members
  s) Display the status of a service
  l) Display the location of the logger(s)
  v) Display the level of logging

  x) Exit to the Main Menu        ?) Help

Enter your choice [x]: l

Location of logger(s)
The following member(s) are logging ASE information:
        tinker
        tailor
Setting Log Level
You can use the asemgr utility to specify the level of the messages
you want logged by the Logger daemon. The Logger daemon uses
four logging levels, as described in the following table.
Logging Level    Description

Error            Logs messages with Error and Alert severity
                 levels. Specifies critical conditions that need
                 immediate attention.

Warning          Logs messages with Warning and Error severity
                 levels. Includes potential error conditions.

Notice           Logs messages with Notice, Warning, and Error
                 severity levels. Includes informational messages
                 about significant activity. This is the default.

Informational    Logs messages of all severity levels. This is very
                 verbose, for debugging purposes.
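The level-to-severity mapping in the table can be expressed directly in the shell; a minimal sketch (the function name is illustrative, not part of asemgr):

```shell
# Map an asemgr logging level to the message severities it
# captures, mirroring the table above.
severities_for() {
    case $1 in
        error)         echo "error alert" ;;
        warning)       echo "warning error" ;;
        notice)        echo "notice warning error" ;;
        informational) echo "all" ;;
        *)             echo "unknown" ;;
    esac
}

echo "notice captures: $(severities_for notice)"
```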
You can display your current logging level by choosing the Display
the level of logging item from the Obtaining ASE Status menu, as
shown in the following example.
Example 5-4 Displaying Log Level

Obtaining ASE Status

  m) Display the status of the members
  s) Display the status of a service
  l) Display the location of the logger(s)
  v) Display the level of logging

  x) Exit to the Main Menu        ?) Help

Enter your choice [x]: v

Level of ASE Logging:
Notice, warning and error logging
To set the severity level, choose the Set the logging level item
from the Managing the ASE menu, as shown in the following
example.
Example 5-5 Setting the Log Level

Managing the ASE

  a) Add a member
  d) Delete a member
  m) Display the status of the members
  l) Set the logging level
  e) Edit the error alert script
  t) Test the error alert script
  res) Reset the ASE daemons

  x) Exit to the Main Menu        ?) Help

Enter your choice [x]: l

Enter the logging level for the ASE:
  i) Informational (log everything)
  n) Notice, warning and error logging
  w) Warning and error logging
  e) Error logging only

  x) Exit to Managing the ASE

Enter your choice [n]: n
Using an Alert Script

When an error of severity level alert occurs, the TruCluster
Software uses a special script to determine additional actions to
perform. The default Alert script sends mail to root. You can edit
the script to specify other users to receive mail. You can also edit
the script to specify some other action to take.
To edit the Alert script, choose the Edit the error Alert script
item from the Managing the ASE menu. The asemgr utility
puts you into the vi editor or the editor defined by the EDITOR
environment variable and allows you to edit the script as needed.
The following example shows how to edit the Alert script.
Example 5-6 Editing the Alert Script

Managing the ASE

  a) Add a member
  d) Delete a member
  m) Display the status of the members
  l) Set the logging level
  e) Edit the error alert script
  t) Test the error alert script
  res) Reset the ASE daemons

  x) Exit to the Main Menu        ?) Help

Enter your choice [x]: e

#! /bin/sh
#
# Script to log critical ASE errors
#
# Define ADMIN on next line to get mail
ADMIN="root,tom"
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ERR_FILE=/var/ase/tmp/alertMsg
TIME=`date +"%D %T"`
if [ -n "${ADMIN}" ]; then
    if [ ! -f "${ERR_FILE}" ]; then
        echo "Critical ASE error detected \
on `date`" > ${ERR_FILE}
    fi
    mailx -s "Critical ASE error detected." \
        ${ADMIN} < ${ERR_FILE}
fi
rm -f ${ERR_FILE}
:wq
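The mail logic in this script can be exercised outside the ASE by stubbing mailx; a sketch under that assumption (the stub function and the temporary file path are illustrative only):

```shell
#!/bin/sh
# Simulate the Alert script's mail logic with a stub mailx, so the
# flow can be checked without a mail system or an ASE.
ADMIN="root,tom"
ERR_FILE=${TMPDIR:-/tmp}/alertMsg.$$

mailx() {
    # Stub: record the arguments and message body instead of mailing.
    echo "mailx called with: $*"
    cat
}

send_alert() {
    if [ -n "${ADMIN}" ]; then
        if [ ! -f "${ERR_FILE}" ]; then
            echo "Critical ASE error detected on `date`" > ${ERR_FILE}
        fi
        mailx -s "Critical ASE error detected." ${ADMIN} < ${ERR_FILE}
    fi
    rm -f ${ERR_FILE}
}

OUT=$(send_alert)
echo "$OUT"
```

Replacing the stub with the real mailx restores the original behavior; editing ADMIN is all that is needed to notify additional users.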
After editing the Alert script, you should test it. To test the
Alert script, choose the Test the error Alert script item from the
Managing the ASE menu. A test message is sent to the Logger
daemon and the Alert script is invoked, as shown in the following
example.
Example 5-7 Testing the Alert Script

Managing the ASE

  a) Add a member
  d) Delete a member
  m) Display the status of the members
  l) Set the logging level
  e) Edit the error alert script
  t) Test the error alert script
  res) Reset the ASE daemons

  x) Exit to the Main Menu        ?) Help

Enter your choice [x]: t

Enter y to send a test alert message to logger (y/n): y
--- Test alert message sent to logger
Examining Log Messages

The system event logs are ASCII text files that are placed under
the /var/adm/syslog.dated directory. The files can be displayed
with commands such as cat, more, and tail.
The following example shows the messages logged to daemon.log
on system tinker when an NFS service running on system tinker
is relocated to system tailor.
Example 5-8 daemon.log Entries

Sep 10 09:24:07 tinker ASE: tinker Agent Notice: stopping service nfsusers
Sep 10 09:24:12 tinker ASE: tailor Director Notice: stopped nfsusers on tinker
Sep 10 09:24:12 tinker ASE: tailor Agent Notice: starting service nfsusers
Sep 10 09:24:22 tinker ASE: tailor Director Notice: started nfsusers on tailor
Sep 10 09:24:22 tinker ASE: tailor AseMgr Notice: Relocated service nfsusers to tailor
TruCluster Software messages include the following information:
• Time stamp
• Local system name
• ASE identifier (not used in messages from the Availability
  Manager driver)
• System that generated the message (or local)
• Source of the message:

  AseMgr        asemgr utility
  Director      Director daemon
  Agent         Agent daemon
  HSM           HSM daemon
  AseLogger     Logger daemon
  AM            Availability Manager driver
  vmunix        Availability Manager driver
  AseUtility    Command executed by an action script

• Severity of the message
• Message text
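Because each entry follows this fixed layout, the fields are easy to pull apart with awk; a sketch using sample lines in the Example 5-8 format (the field positions are inferred from the layout above):

```shell
# Extract the generating member, message source, and severity from
# sample ASE daemon.log entries, whose layout is:
#   Mon DD hh:mm:ss localhost ASE: member Source Severity: text
OUT=$(awk '$5 == "ASE:" { print $6, $7, $8 }' <<'EOF'
Sep 10 09:24:07 tinker ASE: tinker Agent Notice: stopping service nfsusers
Sep 10 09:24:22 tinker ASE: tailor AseMgr Notice: Relocated service nfsusers to tailor
EOF
)
echo "$OUT"
```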
The AseUtility source indicates the message has been produced
by a command or daemon not directly related to the TruCluster
Software. It is caused by some other software that the TruCluster
Software is using. For example, the following message was
produced by the LSM software and captured by a TruCluster
Software action script:

AseUtility Error: voldisk: Volume daemon is not accessible
You must examine the messages in the logs to determine
the severity and source. Alert messages and their meanings
are discussed further in the chapter on troubleshooting. For
more information on alert messages, see TruCluster Available
Server Software Available Server Environment Administration,
Appendix B.
Summary

Introducing the asemgr Utility
You use the asemgr utility to set up and administer the ASE.
Tasks you can perform with the asemgr include the following:
• Adding and deleting member systems
• Adding and deleting network interfaces
• Creating and relocating ASE services
• Displaying the status of member systems and ASE services
• Specifying logger locations and message levels

Setting Up and Managing Members

After you have installed the TruCluster Software and rebooted
the members, you can use the asemgr to add all the ASE member
systems at the same time and from the same system. The
configuration database created by asemgr (/usr/var/ase/config/asecdb)
will be copied to each member system.
The asemgr's Managing the ASE menu contains options that
allow you to add and delete ASE members, modify the network
configuration, and display the status of the ASE members.
Using TruCluster Software Event Logging

The Logger daemon (aselogger) tracks the TruCluster Software
messages generated by all the member systems. You can start the
Logger daemon during TruCluster Software installation or later.
The Logger daemon uses the Digital UNIX event logging facility,
syslog. You can specify special action to take in case an error
of level alert occurs, including sending a mail message to specified
users.
Exercises
Introducing the asemgr Utility: Exercise

If you run more than one instance of the asemgr on different ASE
member systems, what may happen?

Introducing the asemgr Utility: Solution

Some ASE administrative tasks can lock the ASE. If you try
to run the asemgr utility and the ASE is locked, the following
message is displayed:

ASE is locked by hostname

This message indicates that the task cannot be performed because
another member system is running the asemgr utility.
Using asemgr to Manage Members: Exercise

After installing and rebooting all member systems, run the asemgr
utility on one member to do the following:
1. Add all member system names
2. Display member status

Using asemgr to Manage Members: Solution

1. Sample solution. The first time you run asemgr, you will be
   prompted for member names, rather than see the menu.

   Enter a comma separated list of all the host names you want
   as ASE servers.
   Enter Members: alpha, omega
   Member List: alpha, omega
   Is this correct (y/n) [y]: y
   Would you like to define any other network interfaces to alpha
   for ASE use (y/n)? [n]: n
   Would you like to define any other network interfaces to omega
   for ASE use (y/n)? [n]: n

   ASE Network Configuration
   Member Name    Interface Name    Member Net    Monitor
   ___________    ______________    __________    _______
   alpha          alpha             Primary       Yes
   omega          omega             Primary       Yes

   Is this configuration correct (y|n)? [y]: y
2. Sample solution.

   Available Server Environment (ASE)

   ASE Main Menu
     a) Managing the ASE       -->
     m) Managing ASE Services  -->
     s) Obtaining ASE Status   -->
     x) Exit                   ?) Help

   Enter your choice: a

   Managing the ASE
     a) Add a member
     d) Delete a member
     n) Modify the network configuration
     m) Display the status of the members
     l) Set the logging level
     e) Edit the error alert script
     t) Test the error alert script

     x) Exit to the Main Menu  ?) Help

   Enter your choice [x]: m
   Member Status
   Member:    Host Status:    Agent Status:
   tinker     UP              RUNNING
   tailor     UP              RUNNING

Using TruCluster Software Event Logging: Exercise
1. If you did not start the Logger daemon during TruCluster
   Software installation, do so now with the following commands:

   # rcmgr set ASELOGGER 1
   # /sbin/init.d/asemember restart

2. Identify where system event messages are logged on your
   system.
3. Identify a member system running the Logger daemon.
4. Display the current TruCluster Software logging level and
   change it to Informational level.
5. Add your user name to the error Alert script to receive mail
   and then test the script. (Create a user account if you do not
   already have one.)
6. Shut down a member system to cause an alert.
7. Examine the daemon.log file and find the alert messages you
   caused in the previous steps. Also verify that root and your
   user name received mail about the alert.
Using TruCluster Software Event Logging: Solution

1. Sample solution.

   # ps ax | grep aselogger
     262 ??  I <   0:00.19 /usr/sbin/aselogger

   In this case, the logger is running.
2. By default, asemgr utility and ASE daemon messages are
   logged to the /var/adm/syslog.dated/date/daemon.log file and
   AM driver messages are logged to the kern.log file. Check
   the /etc/syslog.conf file for kern and daemon message
   destinations.
3. Choose the Obtaining ASE Status item from the asemgr main
   menu, then choose the Display the location of the logger(s)
   item.
4. Choose the Display the level of logging item from the
   Obtaining ASE Status menu to display the current level.
   Choose the Set the logging level item from the Managing the
   ASE menu to set the severity level.
5. Choose the Edit the error Alert script item from the Managing
   the ASE menu. Add your user name after "ADMIN=".
   Choose the Test the error Alert script item from the Managing
   the ASE menu. You should receive a mail message similar to
   the following.

   tinker AseMgr ***ALERT: Test of alert script

6. Turn the power off at another member system.
7. Sample example.

   Jan 18 14:11:48 tinker ASE: tailor AseMgr ***ALERT:
   Member tailor is not available
6
Writing and Debugging Action Scripts

Writing and Debugging Action Scripts 6-1
About This Chapter

Introduction

Action scripts control starting and stopping available services.
Application services require scripts to start and stop the
application; disk and NFS services may not need any scripts.
This chapter discusses the types of action scripts that TruCluster
Available Server uses and the guidelines and conventions for
creating and debugging them. It shows how to use the asemgr
utility to add scripts to a service.
Objectives

To write and debug action scripts, you should be able to:
• Create action scripts
• Describe the types of action scripts and the action script
  conventions
• Test and debug action scripts

Resources

For more information on the topics in this chapter, see the
following:
• TruCluster Available Server Software Available Server
  Environment Administration
Introducing Action Scripts

Overview
Action scripts contain the operations to set up, start, and stop a
service in an Available Server Environment (ASE), so that it can
fail over from one member to another.
TruCluster Available Server defines several types of action scripts.
Some available services do not require scripts; some services
require only two. There are a number of conventions to follow
when writing scripts.
Types of Action Scripts

An action script specifies a series of operations telling the system
what to do to manage an available service.
There are five types of action scripts:
• Add action script contains all the commands to configure the
  service on a system; for example, set up a system parameter,
  create a device special file, or edit a file. You may not need an
  add script for your service.
  This script is executed on all member systems when you
  configure the service and when a member system reboots.
• Delete action script contains the commands to reverse the
  service setup, to undo the add script; for example, if the add
  script sets a system parameter, the delete script resets the
  parameter.
  This script is executed on all member systems when you delete
  a service from the ASE.
• Start action script contains the commands to start the service
  on a member system; for example, invoke the application.
  This script is executed only on the member selected to provide
  the service, to start or restart (online) the service.
• Stop action script contains the commands to stop the service;
  for example, stop the application. The script must stop all
  processes accessing disks used in the service or it cannot
  unmount the disks and stop the service.
  This script is executed on the member selected to provide
  the service, to stop (offline) or relocate the service. It is also
  executed when a member reboots, in case the server crashed.
• Check action script determines if a service is running
  on a member. The default check action script checks
  for the existence of the file
  /var/opt/TCR*/ase/tmp/service_name_IS_RUNNING, which is
  created when TruCluster Software starts the service on that
  system.
  This script is executed when the TruCluster Software director
  daemon starts, when you check the status of a service, or
  when you stop or delete a service.
All members run the add script, while only the member selected
to offer the service runs the start script.
All the stop scripts are run on reboot to make sure the services
are cleaned up. (TruCluster Available Server does not know if
this system was running a service before reboot.)
Available Services and Scripts

For NFS services, you usually specify service-specific information
in response to asemgr prompts and you need no other scripts.
TruCluster Available Server includes internal scripts for NFS and
disk services to create devices, set up AdvFS file domains and
LSM logical volumes, and mount and unmount file systems.
You must create the action scripts for a user-defined (application)
service to define the operations to control your application. You
need at least start and stop scripts.
If your disk service includes an application, you must create
scripts to start and stop the application; scripts to fail over control
of the disks are created from your responses to asemgr prompts.
Script Exit Codes

The following table shows how TruCluster Software interprets
script exit codes.

Table 6-1 Script Exit Codes

Script    Exit Code              Meaning
Add       0 (zero)               Success
          ≠0 (non-zero)          Failure
Delete    0                      Success
          ≠0                     Failure
Start     0                      Success
          ≠0                     Failure
Stop      0                      Success
          ≠0                     Failure
          99                     Failure because service or device busy
Check     Between 100 and 200    Service is running
          Less than 100          Service is not running
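The Check convention can be tried directly in the shell; a minimal sketch using a temporary flag file in place of the real ASE directory (the paths and service name are illustrative):

```shell
#!/bin/sh
# Demonstrate the check-script convention: return 100-200 if the
# service's flag file exists (running), less than 100 otherwise.
ASETMPDIR=${TMPDIR:-/tmp}
svcName=demo_svc

check() {
    if [ -f "${ASETMPDIR}/${svcName}_IS_RUNNING" ]; then
        return 100      # service is running
    else
        return 0        # service is not running
    fi
}

check; echo "before: exit $?"    # flag file absent
touch "${ASETMPDIR}/${svcName}_IS_RUNNING"
check; echo "after:  exit $?"    # flag file present
rm -f "${ASETMPDIR}/${svcName}_IS_RUNNING"
```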
Script Output

All standard output and error output from a script goes to the
TruCluster Software Logger daemon, if one is running in the
environment. The Logger daemon passes messages to the syslog
daemon on the same system. If a Logger daemon is not running
in the environment, all messages are logged locally.
If a script exits with a 0 (zero), it is logged as an informational
message. Otherwise, it is logged as an error.
Skeleton Scripts

TruCluster Available Server provides skeleton scripts that you
can use as a base for your application-specific commands. They
are located in the /var/opt/TCR*/ase/lib directory. The file names
are:
• addAction
• checkAction
• deleteAction
• startAction
• stopAction

A sample skeleton start action script is shown in Example 6-1. A
sample skeleton check action script is shown in Example 6-2.
Example 6-1 Skeleton Start Action Script

# more /var/ase/lib/startAction
.
.
.
#
# A skeleton example of a start action script.
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1          # Service name to start
else
    svcName=
fi

#
# Any non zero exit will be considered a failure.
#
exit 0
Example 6-2 Skeleton Check Action Script
# more /var/ase/lib/checkAction
.
.
.
#
# A skeleton example of a check action script.
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1          # Service name to check
else
    svcName=
fi

#
# For check, exit with 100 to 200 if service is running on this member,
# else exit with < 100 if not running.
#
if [ -f ${ASETMPDIR}/${svcName}_IS_RUNNING ]; then
    exit 100
else
    exit 0
fi
Start and Stop Scripts
Example 6-3 shows a sample start script for an ASE 1.2 disk
service, a disk-based database application. This script is based
on the skeleton start action script. Example 6-4 shows the
corresponding stop script.
Example 6-3 Start Script

#!/bin/sh
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1
else
    svcName=
fi

echo "ASE: Starting $svcName start script..." | tee /dev/console

echo "ASE: Setting dbserver internet address ..." | tee /dev/console
#
# This sets up the alias: ifconfig alias dbserver
#
/var/ase/sbin/nfs_ifconfig $svcName start dbserver
echo "ASE: Starting $svcName service ..." | tee /dev/console
. /usr/local/dbadm/oracle_version/def_ora_syslog
/bin/su oracle -c $DBSTART | tee /dev/console
echo "ASE: Service $svcName is running on `hostname`" | tee /dev/console
#
# Any non zero exit will be considered a failure.
#
exit 0
The following example shows the corresponding stop script.
Example 6-4 Stop Script

#!/bin/sh
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1
else
    svcName=
fi

echo "ASE: Starting $svcName stop script" | tee /dev/console
echo "ASE: Killing polyserver processes ..." | tee /dev/console
/usr/local/bin/psvkill | tee /dev/console
echo "ASE: Stopping the database service ..." | tee /dev/console
. /usr/local/dbadm/oracle_version/def_ora_syslog
/bin/su oracle -c $DBSHUT | tee /dev/console
echo "ASE: Deleting the dbserver internet address ..." | tee /dev/console
/var/ase/sbin/nfs_ifconfig $svcName stop dbserver
#
# exit 0 = success
# exit 1 = failure
# exit 99 = failure because busy
#
exit 0
Note the following about these scripts:
• Much of each script is part of the skeleton provided by
  TruCluster Available Server.
• You can use any shell. It is good programming practice to
  identify the shell in the first line. Otherwise the root login
  shell executes the script.
• Define the path needed for your commands.
• If arguments were passed to the script, the first argument
  should be the service name.
• The echo status messages are user-defined actions.
• TruCluster Available Server provides the nfs_ifconfig script
  to alias the service name/address to the member system.
• The su command starts (or stops) the application.
• It would be more accurate to check the result of the
  user-defined action and return the result of that, rather than
  always returning success.
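One way to follow that last point is to propagate the application's own exit status instead of hard-coding exit 0; a hedged sketch (start_app is a placeholder for the real start command, such as the su invocation in Example 6-3):

```shell
#!/bin/sh
# Propagate the application's real exit status instead of always
# succeeding, so ASE can detect a failed start.
start_app() {
    true    # stand-in for e.g.: /bin/su oracle -c "$DBSTART"
}

if start_app; then
    status=0
    echo "ASE: service started"
else
    status=1
    echo "ASE: service start failed" >&2
fi
# a real action script would finish with:  exit $status
```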
Creating Action Scripts

Overview
To fail over an application, you must create a start action
script and a stop action script. If you need to set up the system
environment to run the service, you must also create add and
delete action scripts. Create a check action script so TruCluster
Available Server can determine if the service is running.
When you add or modify a service, you can use the asemgr utility
to specify your action scripts. You can create a script and specify
the pathname to asemgr, or you can edit a skeleton script through
asemgr.
You should test your action scripts outside the Available Server
Environment (ASE) to ensure they work correctly before making
them part of a service. To debug your script in TruCluster
Available Server, set the event logging level to ensure all
significant messages are logged, start and stop the service, then
check the system event logs.
The asemgr utility allows you to specify arguments passed to the
script and a timeout value representing the maximum length of
time the script needs to run. If the script takes longer than the
timeout value, Available Server will consider the script failed.
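To see why a hung script matters, the timeout idea can be sketched in plain shell; the ASE enforces timeouts itself, so this wrapper is purely illustrative:

```shell
#!/bin/sh
# Illustrative timeout wrapper: run a command in the background,
# kill it if it has not finished after the given number of seconds.
run_with_timeout() {
    secs=$1; shift
    "$@" &
    pid=$!
    ( sleep "$secs"; kill "$pid" 2>/dev/null ) >/dev/null 2>&1 &
    watcher=$!
    wait "$pid"
    rc=$?
    kill "$watcher" 2>/dev/null
    return $rc
}

OUT=$(run_with_timeout 5 sh -c 'echo "finished in time"')
echo "$OUT"
```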
Methods to Create Scripts

Use the asemgr utility to specify any user-defined action scripts
when adding or modifying a service.
There are three ways to create and specify a user-defined action
script for a service:
• Create the script outside TruCluster Available Server by
  copying a skeleton script and modifying it. Test the script
  thoroughly before running asemgr. Specify the pathname when
  asemgr prompts for the script name. Your script is then copied
  into the ASE database. Any further changes to the script
  must be done using the asemgr utility because Available Server
  uses only the database copy of the script.
• When the asemgr prompts you for a script name, specify
  default. You can then edit the TruCluster Available
  Server-provided skeleton action script with the commands the
  system needs to perform.
• Create the action script outside Available Server by copying
  the skeleton (default) script and modifying it. When you have
  completed the script, copy it to all member nodes. Test the
  script before running asemgr. Specify default when asemgr
  prompts for the script name. Edit the script, adding the
  pathname of the script you created. This method enables
  you to edit the script without interrupting the service by
  making changes with the asemgr utility. However, you must
  redistribute copies of the script to all member nodes after
  making changes.
Specifying Your Own Script

The following example shows how to specify an action script at
the asemgr prompt. The script must already exist at the specified
pathname. Verify that the script operates as expected before you
run asemgr.
Example 6-5 Specifying Your Own Action Script

Service Configuration
  a) Add a new service
  m) Modify a service
  .
  .
Modifying user-defined scripts for service1:
  1) Start action
  .
  .
Modifying the start action script for service1:
  f) Replace the start action script
  e) Edit the start action script
  g) Modify the start action script arguments [service1]
  t) Modify the start action script timeout [60]
  r) Remove the start action script
  x) Exit - done with changes

Enter your choice [x]: f

Enter the full pathname of your start action script or "default"
for the default script (x to exit): /usr/sbin/dbase_account
Editing the Default Script

The following example shows how to edit the default action
script. When you choose the "Edit the start action script" menu
option (e), asemgr loads the appropriate skeleton action script and
places you in the vi editor, or the editor defined by the EDITOR
environment variable. You can now make the appropriate changes
to the file.
Example 6-6 Editing the Default Action Script

Enter the full pathname of your start action script or "default"
for the default script (x to exit): default
e) Edit the start action script
.
.
#!/bin/sh
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1          # Service name to start
else
    svcName=
fi

# Start prophecy_1 application:
su - prophecy -c dbstart
#
# Any non zero exit will be considered a failure.
#
exit 0
:wq
Pointing to an External Script

The following example shows how to edit the default action script
to point to an external script. Be sure to copy the external script
to all member systems.
Example 6-7 Pointing to an External Script

Enter the full pathname of your start action script or "default"
for the default script (x to exit): default
e) Edit the start action script
.
.
#!/bin/sh
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1          # Service name to start
else
    svcName=
fi

# Start application with script
/usr/local/adm/start.ts
#
# Any non zero exit will be considered a failure.
#
exit 0
:wq
Additional Script Information

In addition to script pathnames, the asemgr utility allows you to
specify:
• Arguments that are passed to scripts, useful if you have a
  generic script that you need to pass the service name or an
  action to make the script work.
• Timeout value, the maximum number of seconds Available
  Server should wait for the script to complete. If the script
  runs longer than the timeout value (for example, because it
  has hung), Available Server considers the script failed and
  reports the failure as a timeout of the script.
Testing and Debugging Action Scripts

Overview
You should test your action scripts outside the TruCluster
Available Server service to ensure they work correctly. To debug
your script in Available Server, set the event logging level to
ensure all significant messages are logged, start and stop the
service, then check the system event logs.
Test First

Test your action scripts before you include them in the service. A
bug in a script can cause the service to hang; neither start nor
stop cleanly. Restarting a service that is not stopped cleanly can
cause data corruption or panic the system.
Your scripts should run without error. Some general debugging
tips include:
• Use the -n option (Bourne and C shells) to read commands
  and check them for syntax errors without executing them.
• Add echo commands to display variables or messages
  indicating activity.
• Use the -x option (Bourne, Korn and C shells) to print
  commands and their arguments as they are executed.

Debugging Scripts in ASE

You can set the Available Server Environment (ASE) logging level
to informational to log all messages, including script execution
success. Once you add the scripts, examine the syslog daemon
log for any problems. If you added any debugging echo commands
to the script, they will show up in the log.
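The -n and -x options can be demonstrated on a throwaway script; a self-contained sketch (the demo script and its location are illustrative):

```shell
#!/bin/sh
# First validate a script's syntax without executing it (-n),
# then run it with execution tracing (-x).
cat > /tmp/demo_action.sh <<'EOF'
#!/bin/sh
svcName=$1
echo "starting ${svcName}"
exit 0
EOF

sh -n /tmp/demo_action.sh && echo "syntax OK"
TRACE=$(sh -x /tmp/demo_action.sh mysvc 2>&1)
echo "$TRACE"
rm -f /tmp/demo_action.sh
```

The traced output interleaves each command (prefixed with +) with the script's own echo messages, which is exactly what you want to see in the syslog daemon log when chasing a misbehaving action script.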
Summary

Introducing Action Scripts
Action scripts contain the operations to set up, start, and stop
a service in an Available Server Environment (ASE), so that it
can fail over from one member to another. TruCluster Available
Server supports the following types of action scripts:
• Add (configure) service
• Delete service
• Start service
• Stop service
• Check if a service is running

You need at least start and stop action scripts for application
services.
Creating Action Scripts

When you add or modify a service, you can use the asemgr utility
to:
• Specify your action scripts
• Create a script and specify the pathname to asemgr
• Edit a skeleton script through asemgr:
  - Add your commands to the skeleton script
  - Add a pointer to an external script

The asemgr utility allows you to specify arguments passed to the
script and a timeout value representing the maximum length of
time the script needs to run.

Testing and Debugging Action Scripts

You should test your action scripts outside the TruCluster
Available Server service to ensure they work correctly. To debug
your script, set the event logging level to ensure all significant
messages are logged, start and stop the service, then check the
system event logs.


Exercises

Introducing Action Scripts: Exercise

1. Describe the five types of action scripts.

2. For each of the types of TruCluster Available Server services,
   which scripts must you define?

3. Where does script output go? How is it treated when the
   script succeeds; when it fails?

4. TruCluster Available Server provides skeleton scripts in the
   /var/opt/TCR*/ase/lib directory. Examine these scripts and
   determine what each does.

Introducing Action Scripts: Solution

1. There are five types of action scripts:

   Add action script contains all the commands to set up the
   system environment for the service to run.

   Delete action script contains all the commands to reverse
   actions in the add script.

   Start action script contains all the commands to start a
   service on a member system.

   Stop action script reverses the actions in the start script;
   for example, stop the application.

   Check action script enables TruCluster Available Server to
   determine if a service is running.

2. You do not generally need scripts for an NFS service. If a
   disk service includes an application, you must create scripts
   to start and stop the application. You need at least start and
   stop scripts for a user-defined (application) service.

3. Standard output and error output from a script goes to
   aselogger (then to syslog). Script success is logged as an
   informational message; script failure is logged as an error.

4. The skeleton action scripts addAction, deleteAction,
   startAction, and stopAction are similar. Here is the addAction
   script:


#
# A skeleton example of an add action script.
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp
if [ $# -gt 0 ]; then
    svcName=$1          # Service name to add
else
    svcName=
fi
#
# Any non zero exit will be considered a failure.
#
exit 0

They define a command path and a temporary directory,
determine the service name from the first argument, and exit
with a success code.
The stopAction script runs when the service is stopped, and
also when the TruCluster Available Server is initializing on a
member as it boots.
#
# A skeleton example of a stop action script.
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp
if [ $# -gt 0 ]; then
    svcName=$1          # Service name to stop
else
    svcName=
fi
case "${MEMBER_STATE}" in
BOOTING)    # Stopping ${svcName} as ASE member boots.
    ;;
RUNNING)    # This is a true stop of ${svcName}.
    ;;
esac
#
# exit 0  = success - service stopped successfully
# exit 1  = failure - could not stop service
# exit 99 = failure - could not stop service (service busy)
#
exit 0
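The skeletons above cover add, delete, start, and stop; the check action
script's job is simply to report, through its exit status, whether the
service is running. A minimal sketch of that logic, written here as a
shell function so it can be exercised directly (the PID-file location
and the function name are illustrative, not part of the shipped
checkAction skeleton):

```shell
# check_service: succeed (return 0) if the PID recorded for a service
# is still alive -- the core test a check action script performs.
# Assumes the start script wrote the PID to /tmp/<service>.pid
# (a real service would use /var/run/<service>.pid).
check_service() {
    pidFile=/tmp/$1.pid
    [ -f "$pidFile" ] && kill -0 "`cat $pidFile`" 2>/dev/null
}
```

A standalone check script would end with `check_service $1; exit $?`
so that ASE sees a zero exit when the service is up and a nonzero exit
when it is not.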

Creating Action Scripts: Exercise

1. Write start and stop scripts for the calculator application.
   Place the scripts in the /usr/local/adm directory.

   a. Create a start script (calc-start) to display the calculator
      on a client workstation. Use the command:

      /usr/bin/X11/dxcalc -d client &


      (Suggestion: you can use the skeleton scripts in
      /var/opt/TCR*/ase/lib.)

   b. Create a stop script (calc-stop) to stop the calculator. Use
      the command:

      kill `ps -e | grep "dxcalc" | grep -v grep | awk '{print $1}'`

   c. What are the disadvantages of this kill command?

   d. An alternative is for the start script to write the process
      ID (PID) of the application to a file (by convention in the
      /var/run directory). Then the stop script can kill the right
      process. Update your scripts. (Hint: use the service name
      for a file name.)
2. Write a single action script (calc-start-stop) that will start
   and stop the calculator application depending on the argument
   passed in.

3. There are problems with the following script. Can you identify
   them?

   #
   # calc start action script.4
   #
   PATH=/sbin:/usr/sbin:/usr/bin
   export PATH
   if [ $# -gt 0 ]; then
       svcName=$1
   else
       svcName=
   fi
   dxcalc -d tinker &
   exit 0
4. The following start script fails because of a timeout error. Can
   you identify the reason?

   #!/bin/sh
   # calc start action script.5
   #
   PATH=/sbin:/usr/sbin:/usr/bin
   export PATH
   ASETMPDIR=/var/ase/tmp
   if [ $# -gt 0 ]; then
       svcName=$1
   else
       svcName=
   fi
   /usr/bin/X11/dxcalc -d tinker
   exit 0


Creating Action Scripts: Solution

1. Write start and stop scripts for the calculator application.

   a. Sample solution

      #!/bin/sh
      # calc start action script.
      #
      PATH=/sbin:/usr/sbin:/usr/bin
      export PATH
      ASETMPDIR=/var/ase/tmp
      if [ $# -gt 0 ]; then
          svcName=$1
      else
          svcName=
      fi
      /usr/bin/X11/dxcalc -d tinker &
      exit $?

   b. Sample solution

      #!/bin/sh
      # calc stop action script.
      #
      PATH=/sbin:/usr/sbin:/usr/bin
      export PATH
      ASETMPDIR=/var/ase/tmp
      if [ $# -gt 0 ]; then
          svcName=$1
      else
          svcName=
      fi
      kill `ps -e | grep "dxcalc" | grep -v grep | awk '{print $1}'`
      exit $?
   c. The disadvantage of this kill command is that if several
      dxcalc processes are running, this command may stop the
      wrong one.

   d. Sample start script

      #!/bin/sh
      # calc start action script.2
      #
      PATH=/sbin:/usr/sbin:/usr/bin
      export PATH
      ASETMPDIR=/var/ase/tmp
      if [ $# -gt 0 ]; then
          svcName=$1
      else
          svcName=
      fi
      /usr/bin/X11/dxcalc -d tinker &
      pid=$!
      echo $pid > /var/run/${svcName}.pid
      exit $?


      Sample stop script

      #!/bin/sh
      # calc stop action script.2
      #
      PATH=/sbin:/usr/sbin:/usr/bin
      export PATH
      ASETMPDIR=/var/ase/tmp
      if [ $# -gt 0 ]; then
          svcName=$1
      else
          svcName=
      fi
      kill `cat /var/run/${svcName}.pid`
      exit $?
2. Sample solution

   #!/bin/sh
   # calc action script.3
   #
   PATH=/sbin:/usr/sbin:/usr/bin
   export PATH
   ASETMPDIR=/var/ase/tmp
   if [ $# -gt 0 ]; then
       svcName=$1          # Service name to start or stop
   else
       echo "$0: Insufficient arguments"
       echo "usage: service start|stop"
       exit 1
   fi
   if [ $# -gt 1 ]; then
       action=$2
   else
       echo "$0: Insufficient arguments"
       echo "usage: $1 start|stop"
       exit 1
   fi
   case "${action}" in
   start)
       /usr/bin/X11/dxcalc -d rdwngs:0 &
       pid=$!
       echo "Service name is " $svcName | tee /dev/console
       echo "$svcName PID is " $pid | tee /dev/console
       echo $pid > /var/run/${svcName}.pid
       exit $?
       ;;
   stop)
       echo "Stopping " $svcName "service" | tee /dev/console
       kill `cat /var/run/${svcName}.pid`
       exit $?
       ;;
   esac
   exit 0
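The dispatch pattern in that solution (one script, action selected by
the second argument) can be exercised on its own with harmless stand-ins
for the real start and stop commands. A sketch (the function name and
echoed messages are illustrative):

```shell
# dispatch: mimic the case-on-$2 structure of calc-start-stop with
# side-effect-free commands, so the argument handling can be tested.
dispatch() {
    svcName=$1
    action=$2
    case "${action}" in
    start) echo "would start ${svcName}" ;;
    stop)  echo "would stop ${svcName}" ;;
    *)     echo "usage: $0 service start|stop" >&2; return 1 ;;
    esac
}
```

Testing the argument handling this way, before wiring in dxcalc or a
database application, keeps the failure modes of the script separate
from those of the application.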


3. The shell is not defined; if the default shell of TruCluster
   Available Server (root) is not sh, the script may not run
   correctly. The application path is not defined. Even if the
   dxcalc command fails, the script returns a zero (success).

4. The application is invoked in foreground rather than
   background and does not return.

Testing and Debugging Action Scripts: Exercise

1. The following start script fails. Can you identify the problem?

   #!/bin/sh
   PATH=/sbin:/usr/sbin:/usr/bin
   export PATH
   ASETMPDIR=/var/ase/tmp
   if [ $# -gt 0 ]; then
       svcName=$1
   else
       svcName=
   fi
   if [ -f /usr/local/appstart ]; then
       su - dbmaster -c /usr/local/appstart
   exit 0
2. The following stop script stops the application and unmounts
   the disk. Can you identify any potential problems?

   #!/bin/sh
   PATH=/sbin:/usr/sbin:/usr/bin
   export PATH
   /usr/local/application stop
   /sbin/umount /appdata
   exit 0

Testing and Debugging Action Scripts: Solution

1. Syntax error; the second if statement is unmatched (it has no
   closing fi).

2. There is no error checking. If a file is open, the disk will fail
   to unmount. The stop script still succeeds. The start script on
   another member will mount and reserve the disk, which can
   then be seen as mounted on both systems. (However, only the
   system with the SCSI reserve can perform I/O to the disk.)
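One way to avoid that silent-failure pattern is to check every command
in the stop script and let the first failure determine the exit status.
A minimal, generic sketch (the helper name and the idea of driving it
with a command list are illustrative, not part of ASE):

```shell
# run_steps: run each argument as a command, in order; report and stop
# at the first failure so the script's exit status reflects reality.
run_steps() {
    for cmd in "$@"; do
        if ! eval "$cmd"; then
            echo "stop script failed at: $cmd" >&2
            return 1
        fi
    done
    return 0
}
```

A real stop script could call something like
run_steps '/usr/local/application stop' '/sbin/umount /appdata' || exit 1
so a busy file system makes the stop fail, rather than leaving the disk
mounted here while another member reserves it.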


7
Setting Up ASE Services

Setting Up ASE Services 71

About This Chapter

Introduction

To make an application highly available, set up an Available
Server Environment (ASE) service for that application. ASE
selects a member system to provide the service to client systems.
Clients refer to the service name rather than the server name. If
the server fails, ASE will relocate the service to another member
system.

This chapter discusses the types of services that ASE supports
and how to set them up. It examines the action scripts to control
a service, as well as using the asemgr utility to manage services.

Objectives

To set up and manage ASE services, you should be able to:

Describe the service control structure: the automatic service
placement policy

Set up an NFS service

Set up a disk service

Set up a user-defined service

Describe the services ASE supports

Manage services in the Available Server environment

Resources

For more information on the topics in this chapter, see the
following:

TruCluster Available Server Software Available Server
Environment Administration

Understanding Highly Available Services

Overview

To make an application highly available, you set up an Available
Server Environment (ASE) service for that application.
Each service is assigned a unique name.

Introducing Supported Services

ASE supports three types of services:

A Network File System (NFS) service provides highly available
access to exported disk data. When you create an NFS service,
specify the UNIX file systems, AdvFS filesets, or LSM logical
volumes to export.

A disk service provides highly available access to disks or a
disk-based application, such as a database program. A disk
service is similar to an NFS service except that no data is
exported. When you create a disk service, specify the UNIX
file systems, AdvFS filesets, or LSM logical volumes and any
application that you want to make available.

A user-defined service provides highly available access to
an application that is not disk based; for example, a login
service. When you create a user-defined service, specify the
application.

Describing Clients and Services

Clients refer to service names rather than server names. The
following figure shows the client view of the ASE environment.


Figure 7-1 Client View of ASE

[Figure: clients access the services nfs_service, mail_service,
dbase_service, and login_service by name. The member systems
currently running each service (NFS programs, sendmail, a
database application, a login service) respond to the service
addresses, which are aliased to an interface with ifconfig.]

To access the NFS service nfs_service, a client will have a line
such as the following in its /etc/fstab file:

/project@nfs_service /usr/project nfs rw,bg 0 0

It will also have an entry in its /etc/hosts file for nfs_service
with an Internet address. This is not the address associated
with the host name of any system in the network. It is a floating
address aliased to the member system currently running the
service.

If the service is relocated to another member system, the new
member system will respond to that Internet address. Clients are
unaware of the change in the system exporting the file systems
and experience only a temporary NFS server timeout.

Setting Up a Service

To set up a service, you must prepare your application and
disks. Provide information to ASE about the application, such as
commands to start and stop the application. You can restrict the
service so that it runs only on a select group of member systems.

After you use the asemgr utility to set up a service, ASE chooses
a member system to run the service for clients. Although each
member system can run any service, only one member system
runs a service at one time.


Preparing to Set Up Services

Overview

Before adding a service, you must plan how to set up the service
and perform some preparatory tasks. For example, you may need
to set up NFS, AdvFS, or LSM. You must install any application
you want to make highly available before you set up the service.
You must assign each service a unique name and an automatic
service placement (ASP) policy.

You may also need to generate start and stop action scripts to
start and stop an application.

Documentation

TruCluster Available Server Software Available Server
Environment Administration, Chapter 2, describes many of the
tasks to prepare to set up services. It includes information on
using AdvFS and LSM.

Automatic Service Placement Policy

You must designate an automatic service placement (ASP) policy
when you create a service. The automatic service placement
policy enables you to control which members can run the service.
The ASP policy is used when ASE automatically starts or relocates a
service. You can override the automatic placement policy by using
the asemgr utility to manually relocate a service.

There are three automatic service placement (ASP) policies:

Balanced service distribution tries to balance the service
load. ASE will choose the member running the least number
of services at the time the new service is started.

Favor members checks the specified members in order first.
If one of them is available, it is selected to run the service. If
none of the favored members are available, ASE will choose
the member running the least number of services.

Restrict to favored members checks the specified members
in order. If none of the favored members are available, ASE
will not start the service. This policy ensures that ASE never
moves the service to a member not on the list.
You can, however, manually relocate the service to another
member not on the list.

In addition, you must specify how you want ASE to react when a
more highly favored member system becomes available. You can
choose to relocate the service to the more highly favored member
or keep it running on the current member.


Services and Disks

NFS and disk services can use both applications and disks.
User-defined services use applications.

A disk cannot be used in more than one service because a service
must have exclusive access to the disk. When you use a disk in a
service, you use the entire disk. The availability manager driver
reserves a disk using the SCSI reserve command.

Once a disk is used in an ASE service, it must be managed within
the ASE.

To stop a disk-based service, ASE must be able to unmount the
file systems. This means ASE must be able to stop all processes
accessing the mounted file systems. You should ensure that all
processes invoked by the start action script are stopped by the
stop action script. Keep users from accessing the local mount point
(and preventing unmounting) by allowing access only to the
directory that is exported.

Using NFS

NFS service names cannot be member system names. Service
names and member names must be unique. Service names and
member names must have addresses on the same IP subnet, and
must be in all members' /etc/hosts files. The service name and
IP address must also be in all clients' /etc/hosts files.

Do not use the automount command option /net -hosts on any
client system to access an NFS service. It may cause a stale file
handle error if the service is relocated.

To enable client access through NFS on a member system, create
an entry in each member system's /etc/fstab file specifying the
exported path and the service name as the remote host.
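For instance, the entries might look like the following (the address,
service name, and mount point here are illustrative, not from a real
configuration):

```
# /etc/hosts on every member and client -- the service's floating address:
16.140.64.100   nfspublic

# /etc/fstab on a member or client -- the service name as the remote host:
/usr/nfspublic@nfspublic  /mnt/nfspublic  nfs  rw,bg  0  0
```

Because the remote host is the service name, the entry keeps working
no matter which member currently runs the service.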

Using UFS

To use UFS with ASE, set up the disks in the usual way with
disklabel and newfs. Do not locally mount the file systems
because ASE mounts them for you when the service is started.
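As a sketch, the preparation for a UFS disk used by a service looks
like the following (the device name and disk type are illustrative
and depend on your hardware):

```
# Label the shared disk and build a file system on it -- but do not
# mount it; ASE mounts it when the service starts.
# disklabel -rw rz3 rz26        # write a default label for an RZ26 disk
# newfs /dev/rrz3c rz26         # make a UFS file system on partition c
```

The commands are shown commented out because they act on real disks;
run them only against the shared bus devices intended for the service.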

Using AdvFS

To use AdvFS with ASE, set up the domains and filesets on the
same member on which you will run asemgr to add the service.
A service can use more than one AdvFS domain, but a domain
cannot be used by more than one service. A service should control
all the filesets in the domain; do not put one fileset in a service
and mount another locally. Do not locally mount the filesets
because ASE mounts them for you when the service is started.
All member systems should have AdvFS software installed.

Quotas

You can enable quotas on UFS file systems or AdvFS filesets used
in an ASE service. To use quotas you must mount the /proc file
system on each member system that you want to fail over.
Add the /proc file system to the /etc/fstab file as follows:

/proc /proc procfs rw

There are two methods to set up disk quotas on a UNIX file
system:

Using the quotacheck command:

1. Set up the quota.user or quota.group file.
2. Use the asemgr utility to specify the file system in a service.
   When prompted for quota files, specify the pathname of the
   quota.user or quota.group file.

Using the asemgr and edquota utilities:

1. Use asemgr to specify the file system.
2. When prompted for quota files, specify the pathname for
   quota.user or quota.group.
3. After you start the service, use the edquota command to
   specify the quota limits. The edquota command must be
   run on the system running the service.

For AdvFS filesets, when you use the mkfset command to create
an AdvFS fileset, the command sets up quota files in the root of
the fileset. To set up quotas on an AdvFS fileset in ASE:

1. Specify the quota.user or quota.group file.
2. Use the asemgr utility to specify the fileset.
3. After adding the service, use the vedquota command to specify
   quota limits in the files.

Using LSM

You can use LSM logical volumes in NFS and disk services. You
can also use a UNIX file system or an AdvFS fileset on top of an
LSM volume. Set up the disk groups, logical volumes, and file
systems or filesets on the same member on which you will run
asemgr to add the service.

All member systems need LSM software so that any of them can
run the service. Each member system needs a rootdg disk group
set up on a local (nonshared) disk. The rootdg disk group must
be active (imported) whenever ASE is active, to provide an active
disk group for LSM. Set up other disk groups using the shared
disks.

See TruCluster Available Server Software Available Server
Environment Administration, Chapter 2, for more information on
using LSM with ASE.

Installing the Application

If your service includes an application, install the application
before setting up the service. For example, if you are setting up
an NFS mail service, set up the mail hubs; if you are setting up
a database service, install the database program on all member
systems.

An application must have the following characteristics to be made
highly available with ASE:

The application must run on only one system at a time.

The application must be able to be started and stopped using
a series of commands (action script).

Configuration Example

This example configuration is based on information in
SUPER::USER6:[SES$PUBLIC]PARIBAS_DIR.TAR.

Consider a sample configuration (Example 7-1) using two DEC
7610 systems with four shared SCSI buses, each with three
RZ26L disks, to provide a disk service consisting of a 6-gigabyte,
mirrored Oracle database made up of two stripe sets of six disks.

This configuration uses LSM to provide a mirrored stripe set, the
logical volume voldb, as shown in the following volprint report.
This configuration uses an AdvFS domain (db_dom) on the logical
volume voldb and fileset db_fs (Example 7-2).

Example 7-1 Providing a Mirrored Stripe Set Using LSM

Disk group: rootdg                       (Local disk group)

TYPE  NAME           ASSOC    KSTATE    LENGTH     COMMENT
dg    rootdg         rootdg   -         -
dm    rz1g           rz1g     -         3584

Disk group: db                           (Shared disk group)

TYPE  NAME           ASSOC    KSTATE    LENGTH     COMMENT
dg    db             db       -         -
dm    pd-rz33        rz33     -         2050347
dm    pd-rz34        rz34     -         2050347
dm    pd-rz35        rz35     -         2050347
dm    pd-rz41        rz41     -         2050347
dm    pd-rz42        rz42     -         2050347
dm    pd-rz43        rz43     -         2050347
dm    pd-rz49        rz49     -         2050347
dm    pd-rz50        rz50     -         2050347
dm    pd-rz51        rz51     -         2050347
dm    pd-rz57        rz57     -         2050347
dm    pd-rz58        rz58     -         2050347
dm    pd-rz59        rz59     -         2050347

vol   voldb          fsgen    ENABLED   12301824   (Logical volume)
plex  db-01          voldb    ENABLED   12301824   (Mirror 1)
sd    pd-rz33-log    db-01    -         1
sd    pd-rz33-data   db-01    -         2050304
sd    pd-rz41-data   db-01    -         2050304
sd    pd-rz34-data   db-01    -         2050304
sd    pd-rz42-data   db-01    -         2050304
sd    pd-rz35-data   db-01    -         2050304
sd    pd-rz43-data   db-01    -         2050304
plex  db-02          voldb    ENABLED   12301824   (Mirror 2)
sd    pd-rz49-log    db-02    -         1
sd    pd-rz49-data   db-02    -         2050304
sd    pd-rz57-data   db-02    -         2050304
sd    pd-rz50-data   db-02    -         2050304
sd    pd-rz58-data   db-02    -         2050304
sd    pd-rz51-data   db-02    -         2050304
sd    pd-rz59-data   db-02    -         2050304
Example 7-2 Creating a File Domain Using AdvFS

# showfdmn -k db_dom

               Id               Date Created  LogPgs  Domain Name
2e6b3030.000b9620  Mon Sep  5 16:50:24 1994      512  db_dom

  Vol  1K-Blks     Free  % Used  Cmode  Rblks  Wblks  Vol Name
   1L  6150912  4851072     21%     on    128    128  /dev/vol/db/voldb

# showfsets db_dom
db_fs
        Id           : 2e6b3030.000b9620.1.8001
        Files        :     529,  SLim= 0,  HLim= 0
        Blocks (512) : 2587658,  SLim= 0,  HLim= 0
        Quota Status : user=on group=on
Setting Up NFS Services

Overview

This topic describes how to use the asemgr to add a Network File
System (NFS) service to an Available Server Environment (ASE).

Describing an NFS Service

An NFS service exports one or more file systems, AdvFS filesets,
or LSM logical volumes located on one or more entire disks. If
a hardware or software failure occurs, ASE relocates the file
systems, filesets, or logical volumes to another member system
for export to clients. To enable clients to access an NFS service,
the NFS service name is assigned its own Internet address. The
member system that runs the service responds to this address.

Both the client and member systems must be running NFS
Version 2.0 or Version 3.0 and use the Address Resolution Protocol
(ARP).

NFS Version 3.0 has an option to use TCP connections for NFS
mounts. This option cannot be used with an ASE service.

Discussing the NFS Service Setup Procedure

To set up an NFS service:

1. Specify the service name and its Internet address in all
   member and client systems' /etc/hosts files. The service name
   is a virtual host name making the service independent of the
   availability of any member system.

2. Use the asemgr utility to specify the following:

   Service name (already in /etc/hosts)

   Disks: UFS device special file names, AdvFS filesets, or
   LSM volumes

   Mount pathname to be exported to clients

   (Optional) netgroups or system names allowed access

   Read-write or read-only access

   (Optional) mount options

   NFS locking area: if you have more than one writable disk
   area in a service, you need to specify a writable space to
   store some state information for NFS locking that must be
   failed over

   Service automatic service placement (ASP) policy and
   favored members

3. Add the service name and export mount point to the clients'
   /etc/fstab files.

Setting Up an NFS Service for a Public Directory

Example 7-3 in this section covers how to use the asemgr to set
up an NFS service for a public directory to be exported to other
systems. This example uses an AdvFS domain, but could use a
UFS file system or LSM volume.

The procedure in Example 7-3 assumes that the AdvFS domain
has already been set up.

Example 7-3 shows how to use the asemgr utility to add the NFS
service nfspublic, which exports an AdvFS domain.

Example 7-3 Adding an NFS Service

# asemgr [1]

        TruCluster Available Server (ASE)

                ASE Main Menu

     a) Managing the ASE                      -->
     m) Managing ASE Services                 -->
     s) Obtaining ASE Status                  -->

     x) Exit                                  ?) Help

Enter your choice: m [2]
        Managing ASE Services

     c) Service Configuration                 -->
     r) Relocate a service
    on) Set a service on line
   off) Set a service off line
   res) Restart a service
     s) Display the status of a service
     a) Advanced Utilities                    -->

     x) Exit to the Main Menu                 ?) Help

Enter your choice [x]: c [3]

        Service Configuration

     a) Add a new service
     m) Modify a service
     d) Delete a service
     s) Display the status of a service

     x) Exit to Managing ASE Services         ?) Help

Enter your choice [x]: a [4]

Adding a service

Select the type of service:

     1) NFS service
     2) Disk service
     3) User-defined service

     x) Exit to Service Configuration         ?) Help

Enter your choice [1]: 1 [5]

You are now adding a new NFS service to ASE.

An NFS service consists of an IP host name and disk configuration that are
failed over together. The disk configuration can include UFS file systems,
AdvFS filesets, or LSM volumes.

NFS Service Name

The name of an NFS service is a unique IP host name that has been set up
for this service. This host name must exist in the local hosts database
on all ASE members.

Enter the NFS service name (q to quit): nfspublic [6]

Checking to see if nfspublic is a valid host...

Specifying Disk Information

Enter one or more UFS device special files, AdvFS filesets, or LSM volumes
to define the disk storage for this service.

For example:
        UFS device special file:  /dev/rz3c
        AdvFS fileset:            domain1#set1
        LSM volume:               /dev/vol/dg1/vol01

To end entering disk information, press the Return key at the prompt.

Enter a device special file, an AdvFS fileset, or an LSM volume as storage
for this service (press Return to end): public-domain#public [7]

ADVFS domain public-domain has the following volume(s):
        /dev/vol/public-dg/public-vol
Is this correct (y/n) [y]: y [8]

Following is a list of device(s) and pubpath(s) for disk group public-dg:

        DEVICE  PUBPATH
        rz18    /dev/rz18g
        rz34    /dev/rz34g
Is this correct (y/n) [y]: y [9]

Enter a directory pathname(s) to be NFS exported from the storage area
"public-domain#public". Press Return when done.

Directory pathname: /usr/nfspublic [10]

Enter a host name, NIS netgroup, or IP address for the NFS exports
list. (press Return for all hosts): [11]

Directory pathname: Return

AdvFS Fileset Read-Write Access and Quota Management

Mount public-domain#public fileset with read-write or read-only access?

     1) Read-write
     2) Read-only

Enter your choice [1]: 1 [12]

You may enable user, group, and fileset quotas on this file system by
specifying the full pathnames for the quota files. Quota files must reside
within the fileset. Enter "none" to disable quotas.

User quota file path [/var/ase/mnt/nfspublic/usr/nfspublic/quota.user]: Return
Group quota file path [/var/ase/mnt/nfspublic/usr/nfspublic/quota.group]: Return [13]

AdvFS Mount Options Modification

Enter a comma-separated list of any mount options you want to use for
the public-domain#public fileset (in addition to the defaults listed in the
mount.8 reference page). If none are specified, only the default mount
options are used.

Enter options (Return for none): Return [14]

Specifying Disk Information

Enter one or more UFS device special files, AdvFS filesets, or LSM volumes
to define the disk storage for this service.

For example:
        UFS device special file:  /dev/rz3c
        AdvFS fileset:            domain1#set1
        LSM volume:               /dev/vol/dg1/vol01

To end entering disk information, press the Return key at the prompt.

Enter a device special file, an AdvFS fileset, or an LSM volume as storage
for this service (press Return to end): Return [15]

Modifying user-defined scripts for nfspublic:

     1) Start action
     2) Stop action
     3) Add action
     4) Delete action

     x) Exit - done with changes

Enter your choice [x]: Return [16]

Selecting an Automatic Service Placement (ASP) Policy

Select the policy you want ASE to use when choosing a member
to run this service:

     b) Balanced Service Distribution
     f) Favor Members
     r) Restrict to Favored Members

     x) Exit to Service Configuration         ?) Help

Enter your choice [b]: b [17]

Selecting an Automatic Service Placement (ASP) Policy

Do you want ASE to consider relocating this service to another member
if one becomes available while this service is running (y/n/?): y [18]

Enter y to add Service nfspublic (y/n): y [19]

Adding service...
Starting service...
Service nfspublic successfully added...
        Service Configuration

     a) Add a new service
     m) Modify a service
     d) Delete a service
     s) Display the status of a service

     x) Exit to Managing ASE Services         ?) Help

Enter your choice [x]: s [20]

Service Status

Select the service whose status you want to display:

     1) nfsusers on tinker
     2) nfspublic on tailor

     x) Exit to previous menu                 ?) Help

Enter your choice [x]: 2 [21]

Status for NFS service nfspublic

Status:             on tailor
Relocate:           yes
Placement Policy:   Balance Services
Favored Member(s):  None

Storage configuration for NFS service nfspublic

NFS Exports list
        /usr/nfspublic
Mount Table (device, mount point, type, options)
        public-domain#public /var/ase/mnt/nfspublic/usr/nfspublic advfs rw,groupquota,userquota
Advfs Configuration
        Domain:     public-domain
        Volume(s):  /dev/vol/public-dg/public-vol
LSM Configuration
        Disk Group: public-dg
        Device(s):  rz18 rz34

Press Return to continue: Return

Service Status

Select the service whose status you want to display:

     1) nfsusers on tinker
     2) nfspublic on tailor

     x) Exit to previous menu                 ?) Help

Enter your choice [x]: Return

        Service Configuration

     a) Add a new service
     m) Modify a service
     d) Delete a service
     s) Display the status of a service

     x) Exit to Managing ASE Services         ?) Help

Enter your choice [x]: Return

        Managing ASE Services

     c) Service Configuration                 -->
     r) Relocate a service
    on) Set a service on line
   off) Set a service off line
   res) Restart a service
     s) Display the status of a service
     a) Advanced Utilities                    -->

     x) Exit to the Main Menu                 ?) Help

Enter your choice [x]: Return

        TruCluster Available Server (ASE)

                ASE Main Menu

     a) Managing the ASE                      -->
     m) Managing ASE Services                 -->
     s) Obtaining ASE Status                  -->

     x) Exit                                  ?) Help

Enter your choice: x

#
1. Invoke the asemgr utility.

2. From the ASE Main Menu, choose the Managing ASE Services
   item.

3. From the Managing ASE Services menu, choose the Service
   Configuration item.

4. From the Service Configuration menu, choose the Add a new
   service item.

5. From the Add a new service menu, choose the NFS service.

6. Enter the service name, the virtual host name already in the
   /etc/hosts file with an Internet address.

7. Enter the UFS device special file, AdvFS fileset, or LSM
   volume which defines the disk storage for this service. There
   can be more than one UFS device special file, and so forth.
   Other storage is added later on.

8. Because the storage is an AdvFS fileset on an LSM volume,
   you are required to verify that the LSM volume is correct.

9. Verify the devices which make up the disk group.

10. Enter the directory to be exported to clients.

11. You can specify a netgroup or system names allowed access.

12. Select the read-write or read-only mount option.

13. Enter the path names for user and group quota files or use the
    default. Enter "none" to disable quotas.

14. Enter any other mount options you want to use.

15. You can enter another UFS device special file, AdvFS fileset,
    or LSM volume for this service.

16. Action scripts are not generally needed for NFS services.

17. Choose a placement policy for this service. The help screens
    describe the various options. If you choose one of the favored
    members policies, you will be prompted to select the members.

18. Determine whether the service will be relocated to another
    member.

19. Confirm that you want to add the service. This updates the
    ASE database. If there are errors, the service will not be
    added. Check for errors in the /var/adm/syslog.dated daemon
    log.

20. If desired, select Display the status of a service from the
    Service Configuration menu to display the status of the new
    service, or any existing service.

21. Select the service you want to display.

Discussing the /etc/exports.ase File

To export the NFS service to clients, the asemgr creates an
/etc/exports.ase file, which is included in the /etc/exports file.
The /etc/exports.ase file includes an exports file for each service,
/etc/exports.ase.servicename. This specifies the device special
file, the pathname to export, the local mount point (-m option),
and the mount options.
Example 7-4 shows the exports files for the NFS service added in
Example 7-3 with the entry for the existing NFS service.


Example 7-4 /etc/exports.ase File

# more /etc/exports
.
.
.
.INCLUDE /etc/exports.ase
# more /etc/exports.ase
.INCLUDE /etc/exports.ase.nfsusers
.INCLUDE /etc/exports.ase.nfspublic
# more /etc/exports.ase.nfspublic
#
# ASE exports file for service nfspublic (ONLY EDIT THIS FILE WITH asemgr)
#
#public-domain#public exports (after this line) - DO NOT DELETE THIS LINE
/usr/nfspublic -m=/var/ase/mnt/nfspublic/usr/nfspublic
#

NFS Mail Service

An NFS mail service fails over mail hubs (servers) so that mail
service is highly available. If a hardware or software error occurs,
ASE relocates the queued mail and reroutes any new mail to a
new hub.
To set up a highly available mail service with ASE, the file
systems containing the mailbox directory /var/spool/mail and the
mail queue area /var/spool/mqueue must be set up as an NFS
service, and the mail hub's sendmail.cf configuration file must be
modified to ensure that the service name is treated as a virtual
host. The mail hub member systems then NFS mount the mail
directories from the service.
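The arrangement above can be sketched as /etc/fstab entries on each mail
hub member; the service name mailsvc and the mount options shown are
illustrative assumptions, not values from this course:

```
mailsvc:/var/spool/mail    /var/spool/mail    nfs  rw,bg,hard  0 0
mailsvc:/var/spool/mqueue  /var/spool/mqueue  nfs  rw,bg,hard  0 0
```

Because the entries name the service rather than a physical host, the
mounts follow the service when ASE relocates it to another hub.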


Setting Up a Disk Service


Overview

This topic describes how to use the asemgr to add a disk service in
an Available Server Environment (ASE).

Describing a Disk Service

A disk service includes one or more file systems, AdvFS filesets,
or LSM logical volumes on one or more entire disks, and usually
an application that utilizes the disks.
A database application is a popular disk service. Most commercial
database programs provide a rollback/commit function, where a
system failure will roll back uncommitted transactions on reboot.
This makes them good candidates for ASE failover.

Describing the Set Up Procedure for a Disk Service

Before setting up a disk service, install the application software
on all member systems, and prepare the shared disks. If you
want to fail over an application as well as disks, you will need
action scripts to start and stop the application.
Write start and stop action scripts and debug them before adding
the disk service that will use the action scripts.
A disk service is not the same as a distributed raw disk (DRD)
service, which is supported only if you have the TruCluster
Production Server Software product. Disk services typically
involve file system usage, while DRD services provide clusterwide
access to raw physical disks.
Before using the asemgr to set up a disk service, you must:

Include the disk service name, with the Internet address, in
the /etc/hosts file of each member system.

Ensure that the Internet address associated with a disk
service is on the same subnet as the member systems.

Client systems that will access the disk service must have the
disk service name and Internet address in their /etc/hosts
file.
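As an illustration, the dbase-01 service added in Example 7-6 would need
a line like the following in each member's and client's /etc/hosts; the
address shown is a placeholder, not a value from this course:

```
16.140.64.100   dbase-01
```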

Use the asemgr utility to specify the following:

Unique service name

Disks
    UFS device special file names
    AdvFS filesets
    LSM logical volumes

(Optional) mount points for each file system, fileset, or volume


Specify NONE if you do not want ASE to automatically mount
the devices. For example, an application using raw disks does
not use mounted devices. Client systems specify this mount
point in their /etc/fstab file to access the service's file system.

Automatic Service Placement (ASP) policy and favored
members

Read-write or read-only access and (optional) mount options

Action scripts to fail over the application

Using a Network Alias

Client access to the application may be able to use a service name
aliased to the member system providing the service.
Add the service (pseudo host) name and Internet address to all
member and client systems' /etc/hosts file. Use the /var/ase
/sbin/nfs_ifconfig script in the start and stop action scripts to
establish and remove the alias.
Example 7-5 shows a start script that establishes the service
alias.

Example 7-5 Using a Network Alias

#!/bin/sh
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp
if [ $# -gt 0 ]; then
svcName=$1
else
svcName=
fi
# This sets up the alias: ifconfig interface alias $svcName
/var/ase/sbin/nfs_ifconfig $svcName start aliasname
status=$?
if [ $status != 0 ]
then
echo "$0: Can not set alias; exit status of nfs_ifconfig = $status"
exit 1
fi
exit 0

nfs_ifconfig is a script that uses ifconfig to establish and
remove the alias.

$svcName is an internal variable set to the service name.

start is the keyword to establish the alias; stop is the keyword
to remove the alias.

aliasname is the pseudo host name in /etc/hosts.

720 Setting Up ASE Services

Setting Up a Disk Service

Setting Up a Disk Service for a Database Application

Consider a sample configuration with two member systems, each
with two shared SCSI buses. Each SCSI bus has two RZ28 disks
to provide a disk service consisting of a 4-gigabyte, mirrored
database, made up of two stripe sets of two disks each. This
configuration uses LSM to provide a mirrored stripe set, the disk
group disks-dbase, and AdvFS to create a file domain dbase-domain
and fileset dbase.
Example 7-6 shows how to use the asemgr utility to add the disk
service dbase-01 using the AdvFS fileset dbase-domain#dbase.
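Before running asemgr, the storage for this configuration could be
prepared roughly as follows. This is a hedged sketch using the standard
LSM (voldisksetup, voldg, volassist) and AdvFS (mkfdmn, mkfset)
commands; the exact options and disk names depend on your site, so
consult the reference pages before use:

```
# voldisksetup -i rz19 rz20 rz35 rz36            (initialize the shared disks for LSM)
# voldg init disks-dbase rz19 rz20 rz35 rz36     (create the disk group)
# volassist -g disks-dbase make database-vol 4g layout=stripe nstripe=2
# volassist -g disks-dbase mirror database-vol   (mirror the stripe set)
# mkfdmn /dev/vol/disks-dbase/database-vol dbase-domain
# mkfset dbase-domain dbase
```

The resulting fileset, dbase-domain#dbase, is what Example 7-6 supplies
to asemgr as the service's storage.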

Example 7-6 Adding a Disk Service

# asemgr

TruCluster Available Server (ASE)

ASE Main Menu

     a) Managing the ASE                 -->
     m) Managing ASE Services            -->
     s) Obtaining ASE Status             -->
     x) Exit                             ?) Help

Enter your choice: m  [1]

Managing ASE Services

     c) Service Configuration            -->
     r) Relocate a service
    on) Set a service on line
   off) Set a service off line
   res) Restart a service
     s) Display the status of a service
     a) Advanced Utilities               -->

     x) Exit to the Main Menu            ?) Help

Enter your choice [x]: c  [2]

Service Configuration

     a) Add a new service
     m) Modify a service
     d) Delete a service
     s) Display the status of a service

     x) Exit to Managing ASE Services    ?) Help

Enter your choice [x]: a  [3]

Adding a service

Select the type of service:

     1) NFS service
     2) Disk service
     3) User-defined service
     x) Exit to Service Configuration    ?) Help

Enter your choice [1]: 2  [4]

You are now adding a new disk service to ASE.

A disk service consists of a disk-based application and disk configuration
that are failed over together. The disk configuration can include UFS
file systems, AdvFS filesets, LSM volumes, or raw disk information.

Disk Service Name

The name of a disk service must be a unique service name. Optionally,
an IP address may be assigned to a disk service. In this case, the
name must be a unique IP host name set up for this service and present
in the local hosts database on all ASE members.

Enter the disk service name (q to quit): dbase-01  [5]

Assign an IP address to this service? (y/n): y  [6]

Checking to see if dbase-01 is a valid host...

Specifying Disk Information

Enter one or more device special files, AdvFS filesets, or LSM volumes
to define the disk storage for this service.

For example:  Device special file:  /dev/rz3c
              AdvFS fileset:        domain1#set1
              LSM volume:           /dev/vol/dg1/vol01

To end the list, press the Return key at the prompt.

Enter a device special file, an AdvFS fileset, or an LSM volume as storage
for this service (press Return to end): dbase-domain#dbase  [7]

ADVFS domain dbase-domain has the following volume(s):
    /dev/vol/disks-dbase/database-vol

Is this correct (y/n) [y]: Return  [8]

Following is a list of device(s) and pubpath(s) for disk group disks-dbase:

    DEVICE    PUBPATH
    rz19      /dev/rz19g
    rz20      /dev/rz20g
    rz35      /dev/rz35g
    rz36      /dev/rz36g

Is this correct (y/n) [y]: Return

Mount Point

The mount point is the directory on which to mount dbase-domain#dbase.
If you do not want it mounted, enter "NONE".

Enter the mount point or NONE: /usr/dbase  [9]

AdvFS Fileset Read-Write Access and Quota Management

Mount dbase-domain#dbase fileset with read-write or read-only access?

     1) Read-write
     2) Read-only

Enter your choice [1]: 1  [10]


You may enable user, group, and fileset quotas on this file system by
specifying the full pathnames for the quota files. Quota files must reside
within the fileset. Enter "none" to disable quotas.

User quota file path [/usr/dbase/quota.user]: Return
Group quota file path [/usr/dbase/quota.group]: Return  [11]

AdvFS Mount Options Modification

Enter a comma-separated list of any mount options you want to use for
the dbase-domain#dbase fileset (in addition to the defaults listed in the
mount.8 reference page). If none are specified, only the default mount
options are used.

Enter options (Return for none): Return  [12]

Specifying Disk Information

Enter one or more device special files, AdvFS filesets, or LSM volumes
to define the disk storage for this service.

For example:  Device special file:  /dev/rz3c
              AdvFS fileset:        domain1#set1
              LSM volume:           /dev/vol/dg1/vol01

To end the list, press the Return key at the prompt.

Enter a device special file, an AdvFS fileset, or an LSM volume as storage
for this service (press Return to end): Return  [13]
Modifying user-defined scripts for dbase-01:

     1) Start action
     2) Stop action
     3) Add action
     4) Delete action

     x) Exit - done with changes

Enter your choice [x]: 1  [14]

Modifying the start action script for dbase-01:

     a) Add a start action script
     e) Edit the start action script
     g) Modify the start action script arguments []
     t) Modify the start action script timeout [60]
     r) Remove the start action script
     x) Exit - done with changes

Enter your choice [x]: a  [15]

Enter the full pathname of your start action script or "default"
for the default script (x to exit): /usr/local/adm/database  [16]

Enter the argument list for the start action script
(x to exit): start  [17]

Enter the timeout in seconds for the start action script [60]: Return  [18]


Modifying the start action script for dbase-01:

     f) Replace the start action script
     e) Edit the start action script
     g) Modify the start action script arguments [start]
     t) Modify the start action script timeout [60]
     r) Remove the start action script
     x) Exit - done with changes

Enter your choice [x]: Return  [19]

Modifying user-defined scripts for dbase-01:

     1) Start action
     2) Stop action
     3) Add action
     4) Delete action

     x) Exit - done with changes

Enter your choice [x]: 2  [20]

Modifying the stop action script for dbase-01:

     a) Add a stop action script
     e) Edit the stop action script
     g) Modify the stop action script arguments []
     t) Modify the stop action script timeout [60]
     r) Remove the stop action script
     x) Exit - done with changes

Enter your choice [x]: a  [20]

Enter the full pathname of your stop action script or "default"
for the default script (x to exit): /usr/local/adm/database  [21]

Enter the argument list for the stop action script
(x to exit): stop  [22]

Enter the timeout in seconds for the stop action script [60]: Return  [23]

Modifying the stop action script for dbase-01:

     f) Replace the stop action script
     e) Edit the stop action script
     g) Modify the stop action script arguments [stop]
     t) Modify the stop action script timeout [60]
     r) Remove the stop action script
     x) Exit - done with changes

Enter your choice [x]: Return

Modifying user-defined scripts for dbase-01:

     1) Start action
     2) Stop action
     3) Add action
     4) Delete action

     x) Exit - done with changes


Enter your choice [x]: Return

Selecting an Automatic Service Placement (ASP) Policy

Select the policy you want ASE to use when choosing a member
to run this service:

     b) Balanced Service Distribution
     f) Favor Members
     r) Restrict to Favored Members
     x) Exit to Service Configuration    ?) Help

Enter your choice [b]: b  [24]

Selecting an Automatic Service Placement (ASP) Policy

Do you want ASE to consider relocating this service to another member
if one becomes available while this service is running (y/n/?): y  [25]

Enter y to add Service dbase-01 (y/n): y  [26]
Adding service...
Starting service...
Service dbase-01 successfully added...
.
.
.

1. After invoking asemgr, from the ASE main menu, choose the
   Managing ASE Services item.

2. From the Managing ASE Services menu, choose the Service
   Configuration item.

3. From the Service Configuration menu, choose the Add a new
   service item.

4. From the Adding a service menu, choose the Disk service.

5. Enter the service name, which is a virtual host name already
   in the /etc/hosts file of all member systems with an Internet
   address.

6. If you assign an IP address to the service, POLYCENTER
   NetWorker Save and Restore (NetWorker) can back up the
   disks associated with this service.

7. Enter the UFS device special file, AdvFS fileset, or LSM
   volume which defines the disk storage for this service. There
   can be more than one UFS device special file, and so forth.
   Additional storage is added later.

8. Verify that the LSM volume and list of disks being used is
   correct.

9. Enter the mount point for the disk service.

10. Select the read-write or read-only mount option.

11. Enter the pathnames for user and group quota files; use the
    default, or enter "none" to disable quotas.


12. Enter any other mount options you want to use.

13. You can enter another UFS device special file, AdvFS fileset,
    or LSM volume for this service.

14. Add action scripts if you want. There should be at least start
    and stop action scripts. Select the start action script.

15. Add the start action script.

16. Provide the full pathname for the start action script.

17. Provide the arguments needed to start the service.

18. Enter the timeout value or take the default of 60 seconds.

19. Finish the start action script.

20. Add the stop action script.

21. Provide the full pathname for the stop action script.

22. Provide the arguments needed to stop the service.

23. Enter the timeout value or take the default of 60 seconds.

24. Choose a placement policy for this service. The help screens
    describe the various options. If you choose one of the favored
    members policies, you will be prompted to select the members.

25. Determine whether the service will be relocated to another
    member.

26. Confirm that you want to add the service. This updates the
    ASE database.


Setting Up a User-Defined Service

Overview

A user-defined service consists only of an application that can fail
over.

User-Defined Service Setup Procedure

Before setting up a user-defined service, install the application on
all member systems and write, then test, the action scripts to start
and stop the application.
A user-defined application cannot use disks. If your application is
disk-based, set up a disk service or an NFS service instead.
Use the asemgr utility to specify the following:

Unique service name

Action scripts to fail over the application

Service placement policy and favored members

Adding a User-Defined Service

Example 7-7 shows how to set up a user-defined service. The
example uses dxcalc as the application.

Example 7-7 Setting Up a User-Defined Service

$ asemgr
.
.
.

Adding a service

Select the type of service:

     1) NFS service
     2) Disk service
     3) User-defined service
     x) Exit to Service Configuration    ?) Help

Enter your choice [1]: 3  [1]

You are now adding a new user-defined service to ASE.

User-defined Service Name
The name of a user-defined service must be a unique service name within
the ASE environment.

Enter the user-defined service name (q to quit): dxcalc  [2]

Modifying user-defined scripts for dxcalc:



     1) Start action
     2) Stop action
     3) Add action
     4) Delete action
     5) Check action
     x) Exit - done with changes

Enter your choice [x]: 1  [3]

Modifying the start action script for dxcalc:

     f) Replace the start action script
     e) Edit the start action script
     g) Modify the start action script arguments [dxcalc]
     t) Modify the start action script timeout [60]
     r) Remove the start action script
     x) Exit - done with changes

Enter your choice [x]: f  [4]

Enter the full pathname of your start action script or "default"
for the default script (x to exit): /usr/local/adm/calc-start-stop  [5]

Modifying the start action script for dxcalc:

     f) Replace the start action script
     e) Edit the start action script
     g) Modify the start action script arguments [dxcalc]
     t) Modify the start action script timeout [60]
     r) Remove the start action script
     x) Exit - done with changes

Enter your choice [x]: g  [6]

Enter the argument list for the start action script
(x to exit, NONE for none) [dxcalc]: dxcalc start  [7]

Modifying the start action script for dxcalc:

     f) Replace the start action script
     e) Edit the start action script
     g) Modify the start action script arguments [dxcalc start]
     t) Modify the start action script timeout [60]
     r) Remove the start action script
     x) Exit - done with changes

Enter your choice [x]: Return

Modifying user-defined scripts for dxcalc:

     1) Start action
     2) Stop action
     3) Add action
     4) Delete action
     5) Check action
     x) Exit - done with changes

Enter your choice [x]: 2  [8]


Modifying the stop action script for dxcalc:

     f) Replace the stop action script
     e) Edit the stop action script
     g) Modify the stop action script arguments [dxcalc]
     t) Modify the stop action script timeout [60]
     r) Remove the stop action script
     x) Exit - done with changes

Enter your choice [x]: f  [9]

Enter the full pathname of your stop action script or "default"
for the default script (x to exit): /usr/local/adm/calc-start-stop  [10]

Modifying the stop action script for dxcalc:

     f) Replace the stop action script
     e) Edit the stop action script
     g) Modify the stop action script arguments [dxcalc]
     t) Modify the stop action script timeout [60]
     r) Remove the stop action script
     x) Exit - done with changes

Enter your choice [x]: g  [11]

Enter the argument list for the stop action script
(x to exit, NONE for none) [dxcalc]: dxcalc stop  [12]

Modifying the stop action script for dxcalc:

     f) Replace the stop action script
     e) Edit the stop action script
     g) Modify the stop action script arguments [dxcalc stop]
     t) Modify the stop action script timeout [60]
     r) Remove the stop action script
     x) Exit - done with changes

Enter your choice [x]: Return

Modifying user-defined scripts for dxcalc:

     1) Start action
     2) Stop action
     3) Add action
     4) Delete action
     5) Check action
     x) Exit - done with changes

Enter your choice [x]: Return

Selecting an Automatic Service Placement (ASP) Policy

Select the policy you want ASE to use when choosing a member
to run this service:

     b) Balanced Service Distribution
     f) Favor Members
     r) Restrict to Favored Members
     x) Exit to Service Configuration    ?) Help

Enter your choice [b]: Return  [13]

Selecting an Automatic Service Placement (ASP) Policy

Do you want ASE to consider relocating this service to another member
if one becomes available while this service is running (y/n/?): n  [14]

Enter y to add Service dxcalc (y/n): y  [15]
Adding service...
Starting service...
Service dxcalc successfully added...
.
.
.
$

1. From the Adding a service menu, choose the User-defined
   service item.

2. Enter the service name, a virtual host name already in
   the /etc/hosts file of all member systems with an Internet
   address.

3. Select Start action to add the start action script.

4. Select Replace the start action script to allow using a script
   you have already written and debugged.

5. Provide the complete pathname of the start action script.

6. You must provide arguments to the start action script.

7. Provide the arguments to the start action script, the name of
   the service, and the action to be taken (start).

8. Select Stop action to add the stop action script.

9. Select Replace the stop action script to allow using a script
   you have already written and debugged.

10. Provide the complete pathname of the stop action script.

11. You must provide arguments to the stop action script.

12. Provide the arguments to the stop action script, the name of
    the service, and the action to be taken (stop).

13. Select the ASP policy.

14. Determine if you want ASE to relocate the service.

15. Confirm that you want to add the service. This adds the
    service and modifies the ASE database.
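The course guide does not show the /usr/local/adm/calc-start-stop
script itself. A minimal sketch of the dispatch logic such an action
script might use is shown below; the function name and messages are
assumptions, and a real script would launch or kill the application
where the comments indicate:

```shell
#!/bin/sh
# Sketch of an ASE action script: asemgr invokes it with the argument
# list configured in the menus, e.g. "dxcalc start" or "dxcalc stop".
calc_action() {
    svcName=$1
    action=$2
    case "$action" in
    start)
        # start the application here (e.g. run dxcalc for the service)
        echo "starting $svcName"
        ;;
    stop)
        # stop the application here (e.g. kill the dxcalc process)
        echo "stopping $svcName"
        ;;
    *)
        echo "usage: calc_action service {start|stop}" >&2
        return 1
        ;;
    esac
}

calc_action dxcalc start
```

Because the same script handles both actions, the menus pass the
action keyword as the second argument, matching the "dxcalc start" and
"dxcalc stop" argument lists entered in Example 7-7.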


User-Defined Login Service

You can set up a user-defined network or login service that uses
a pseudo host name for user login and network operations. The
pseudo host name is the name of the service and has an Internet
address; it is used as an alias of a member system. This service
name must be unique from any system name. Users can log in to
the pseudo host name.
To set up a user-defined login service, you must perform the
following steps:

1. Specify the pseudo host name and its Internet address in all
   member and client systems' /etc/hosts files.

2. Use the asemgr utility to specify the following:

   Service (pseudo host) name

   Action scripts to start and stop the service

   ASE provides a script, /var/ase/sbin/nfs_ifconfig, to
   establish and remove the host name alias. Refer to
   the description of Adding a User-Defined Login Service
   in Chapter 3 of TruCluster Available Server Software
   Available Server Environment Administration.

   Service placement policy and favored members


Using asemgr to Manage Services

Overview

Use the asemgr utility to manage services, including the following
activities:

Add a new service

Modify an existing service

Delete a service

Display the status of a service

Manually relocate a service to a specific member system

Temporarily stop and restart a service

Restart a stopped service

Rereserve a service's devices (LSM only)

Managing Services Menu

To manage services, invoke the asemgr utility and choose the
Managing ASE Services item from the main menu. The following
example shows the menus dealing with services.


Example 7-8 Managing ASE Services Menu

# asemgr

TruCluster Available Server (ASE)

ASE Main Menu

     a) Managing the ASE                 -->
     m) Managing ASE Services            -->
     s) Obtaining ASE Status             -->
     x) Exit                             ?) Help

Enter your choice: m

Managing ASE Services

     c) Service Configuration            -->
     r) Relocate a service
    on) Set a service on line
   off) Set a service off line
   res) Restart a service
     s) Display the status of a service
     a) Advanced Utilities               -->

     x) Exit to the Main Menu            ?) Help

Enter your choice [x]:

Displaying Service Status

To display the status of a service, choose the Display the status of
a service item from various ASE menus, as shown in the following
example, then select the service you want to display.

Example 7-9 Displaying Service Status

# asemgr

TruCluster Available Server (ASE)

ASE Main Menu

     a) Managing the ASE                 -->
     m) Managing ASE Services            -->
     s) Obtaining ASE Status             -->
     x) Exit                             ?) Help

Enter your choice: m

Managing ASE Services

     c) Service Configuration            -->
     r) Relocate a service
    on) Set a service on line
   off) Set a service off line
   res) Restart a service
     s) Display the status of a service
     a) Advanced Utilities               -->

     x) Exit to the Main Menu            ?) Help

Enter your choice [x]: s

Service Status

Select the service whose status you want to display:

     1) nfsusers on tinker
     2) nfspublic on tailor
     3) dxcalc on tailor
     x) Exit to previous menu            ?) Help

Enter your choice [x]: 1

Status for NFS service nfsusers

Status:             on tinker
Relocate:           yes
Placement Policy:   Balance Services
Favored Member(s):  None

Storage configuration for NFS service nfsusers

NFS Exports list
    /usr/nfsusers
Mount Table (device, mount point, type, options)
    users-domain#users /var/ase/mnt/nfsusers/usr/nfsusers advfs rw,groupquota,userquota
Advfs Configuration
    Domain:     users-domain
    Volume(s):  /dev/vol/users-dg/users-vol
LSM Configuration
    Disk Group: users-dg
    Device(s):  rz17 rz33

Press Return to continue: Return

Service Status

Select the service whose status you want to display:

     1) nfsusers on tinker
     2) nfspublic on tailor
     3) dxcalc on tailor
     x) Exit to previous menu            ?) Help

Enter your choice [x]:


.
.
.
#

Relocating a Service

A service is automatically relocated by ASE if a failure stops a
member system from providing the service. You can also use
the asemgr utility to manually relocate a service. This stops
the service on the member currently running the service, and
starts the service on the member you select. You can override the
service's placement policy when you select a member system to
run the service.
To relocate a service, choose the Relocate a service item from the
Managing ASE Services menu, as shown in Example 7-10.
Example 7-10 Relocating a Service

# asemgr

TruCluster Available Server (ASE)

ASE Main Menu

     a) Managing the ASE                 -->
     m) Managing ASE Services            -->
     s) Obtaining ASE Status             -->
     x) Exit                             ?) Help

Enter your choice: m

Managing ASE Services

     c) Service Configuration            -->
     r) Relocate a service
    on) Set a service on line
   off) Set a service off line
   res) Restart a service
     s) Display the status of a service
     a) Advanced Utilities               -->

     x) Exit to the Main Menu            ?) Help

Enter your choice [x]: r

Select the service you want to relocate

Services:
     1) nfsusers on tinker
     2) nfspublic on tailor
     3) dxcalc on tailor

     x) Exit to Managing ASE Services    ?) Help

Enter your choice [x]: 1

Select member to run nfsusers service:

     1) tailor
     2) tinker
     x) Exit without making changes      ?) Help

Enter your choice: 1

Relocating service nfsusers to member tailor...
Relocation successful.

Managing ASE Services

     c) Service Configuration            -->
     r) Relocate a service
    on) Set a service on line
   off) Set a service off line
   res) Restart a service
     s) Display the status of a service
     a) Advanced Utilities               -->

     x) Exit to the Main Menu            ?) Help

Enter your choice [x]: s

Service Status

Select the service whose status you want to display:

     1) nfsusers on tailor
     2) nfspublic on tailor
     3) dxcalc on tailor
     x) Exit to previous menu            ?) Help

Enter your choice [x]:
.
.
.

Modifying a Service

You can use the asemgr to modify any information that was
specified when a service was added to ASE.

Disk configuration: You can add UFS file systems, AdvFS
filesets, or LSM volumes to the service or delete them from
the service. You can change any disk information that was
specified when the service was added, including:
    Name of the file system, fileset, or volume
    Mount point
    Access mode and mount options
    Owner and mode of the mount point
    Exports file and locking area for an NFS service


Service information:
    Service name
    Automatic Service Placement (ASP) policy and favored
    members
    User-defined action scripts
    Exports file for an NFS service

To modify an LSM disk group or logical volume, or an AdvFS
domain or fileset being used in a service, you can modify the
configuration while the service is on line. Relocate the service to
the member on which you will change the configuration, and
ensure that the service will not relocate if a more highly favored
member becomes available. Use the AdvFS or LSM commands
to change the configuration. Then use the asemgr utility to
modify the service. When you select the fileset or volume, asemgr
displays the changed configuration. Enter y if the information is
correct, and the ASE database is updated.
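For example, growing an AdvFS domain used by a service might look
roughly like the following; addvol is the standard AdvFS command for
adding a volume to a domain, but the volume name here is hypothetical
and the asemgr steps are the menu operations described above:

```
# asemgr                                     (relocate the service to this member)
# addvol /dev/vol/users-dg/users-vol2 users-domain
# asemgr                                     (modify the service so the changed
                                              configuration is saved in the ASE database)
```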
After the service is modified by the asemgr, the TruCluster
Software:

Stops the service

Deletes the service

Propagates the database to all member systems

Starts the modified service

Example 7-11 uses the asemgr to modify the Automatic Service
Placement (ASP) policy of a service.
Example 7-11 Modifying a Service

# asemgr

TruCluster Available Server (ASE)

ASE Main Menu

     a) Managing the ASE                 -->
     m) Managing ASE Services            -->
     s) Obtaining ASE Status             -->
     x) Exit                             ?) Help

Enter your choice: m

Managing ASE Services

     c) Service Configuration            -->
     r) Relocate a service
    on) Set a service on line
   off) Set a service off line
   res) Restart a service
     s) Display the status of a service
     a) Advanced Utilities               -->

     x) Exit to the Main Menu            ?) Help

Enter your choice [x]: c

Service Configuration

     a) Add a new service
     m) Modify a service
     d) Delete a service
     s) Display the status of a service

     x) Exit to Managing ASE Services    ?) Help

Enter your choice [x]: m

Modifying a Service

Select the service you want to modify:

     1) nfsusers on tinker
     2) nfspublic on tailor
     3) dxcalc on tinker
     x) Exit to Service Configuration    ?) Help

Enter your choice [x]: 2

Select what you want to modify in service nfspublic:

     g) General service information
     a) Automatic service placement (ASP) policy
     x) Exit without modifications       ?) Help

Enter your choice [g]: a

Selecting an Automatic Service Placement (ASP) Policy

Select the policy you want ASE to use when choosing a member
to run this service:

     b) Balanced Service Distribution
     f) Favor Members
     r) Restrict to Favored Members
     x) Exit to Service Configuration    ?) Help

Enter your choice [b]: f

Selecting an Automatic Service Placement (ASP) Policy

Select the favored member(s) IN ORDER for service nfspublic:

     1) tinker
     2) tailor
     x) No favored members               ?) Help

Enter a comma-separated list [x]: 2

Selecting an Automatic Service Placement (ASP) Policy

Do you want ASE to relocate this service to a more highly favored member
if one becomes available while this service is running (y/n/?): y

NOTE: Modifying a service causes it to stop and then restart. If you do
not want to interrupt the service availability, do not modify the service.

Enter y to modify service nfspublic (y/n): y
Stopping service...
Deleting service...
Adding service...
Starting service...
Service successfully updated.

Service Configuration

     a) Add a new service
     m) Modify a service
     d) Delete a service
     s) Display the status of a service

     x) Exit to Managing ASE Services    ?) Help

Enter your choice [x]:

Managing ASE Services

     c) Service Configuration            -->
     r) Relocate a service
    on) Set a service on line
   off) Set a service off line
   res) Restart a service
     s) Display the status of a service
     a) Advanced Utilities               -->

     x) Exit to the Main Menu            ?) Help

Enter your choice [x]: s

Service Status

Select the service whose status you want to display:

     1) nfsusers on tinker
     2) nfspublic on tailor
     3) dxcalc on tinker
     x) Exit to previous menu            ?) Help

Enter your choice [x]: 2

Status for NFS service nfspublic

Status:               on tailor
        .
        .
        .
Relocate:             yes
Placement Policy:     Favor Member(s)
Favored Member(s):    tailor

Summary

Understanding Highly Available Services

To make an application highly available, set up an ASE service for that application.

ASE supports three types of services:

  • NFS service provides highly available access to exported disk data.
  • Disk service provides highly available access to disks or a disk-based application.
  • User-defined service provides highly available access to an application.

Clients refer to service names rather than server names.

Preparing to Set Up Services

Before adding a service, you must plan how to set up the service and perform some preparatory tasks. For example, you may need to set up NFS, AdvFS, or LSM, or install an application. You must assign each service a unique name and an automatic service placement policy.

Setting Up NFS Services

To set up an NFS service, you must specify the service name and its Internet address in all member and client systems' /etc/hosts files. Use the asemgr utility to specify the service, device, export path, and mount options, and add the service name and export mount point to the clients' /etc/fstab files.

To set up a mail service with ASE, the file systems containing the mailbox directory /var/spool/mail and the mail queue area /var/spool/mqueue must be set up as an NFS service, and the mail hub's sendmail.cf configuration file must be modified to ensure that the service name is treated as a virtual host.
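As a hedged illustration, the corresponding entries might look like the following; the Internet address, service name, and export path shown here are hypothetical and must match your own configuration:

```
# /etc/hosts entry on every member and client system (address is hypothetical)
16.140.64.100   nfspublic

# /etc/fstab entry on each client, mounting the service's exported path
nfspublic:/usr/public   /usr/public   nfs   rw   0   0
```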

Setting Up a Disk Service

To set up a disk service with ASE, install the application software on all member systems, prepare the shared disks, develop action scripts to start and stop the disk-based application, and use the asemgr utility to specify the service information.

Ensure that the service name is in the /etc/hosts file of all member and client systems.

Setting Up a User-Defined Service

To set up a user-defined service with ASE, install the application software on all member systems, develop action scripts to start and stop the application, and use the asemgr utility to specify the service information.


To set up a user-defined login service with ASE, add the pseudo host name and its Internet address to all member and client systems' /etc/hosts files, and use the asemgr utility to specify the service information. You can use the nfs_ifconfig script in your start and stop action scripts to set up and remove the host name alias.
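A start/stop action script along these lines is sketched below. The service name "logins" is invented for illustration, and the nfs_ifconfig calls are shown commented out because they apply only on an actual ASE member system:

```shell
#!/bin/sh
# Sketch of action-script logic for a user-defined login service.
# "logins" is a hypothetical pseudo host name; replace it with yours.
login_service_action() {
    svc=logins
    case "$1" in
    start)
        # nfs_ifconfig "$svc" start   # would add the host name alias
        echo "alias for $svc configured"
        ;;
    stop)
        # nfs_ifconfig "$svc" stop    # would remove the alias
        echo "alias for $svc removed"
        ;;
    *)
        echo "usage: start|stop" >&2
        return 1
        ;;
    esac
}

login_service_action start
```

The same script serves as both start and stop action script; asemgr passes the appropriate argument when it starts or relocates the service.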

Using asemgr to Manage Services

Use the asemgr utility to manage services, including the following activities:

  • Add a new service
  • Modify an existing service
  • Delete a service
  • Display the status of a service
  • Manually relocate a service to a specific member system
  • Temporarily stop and restart a service
  • Restart a stopped service
  • Rereserve a Logical Storage Manager (LSM) device


Exercises

Understanding Highly Available Services: Exercise

1. Describe the three types of services supported by ASE.
2. Explain the significance of the service name.

Understanding Highly Available Services: Solution

1. ASE supports three types of services:

   • NFS service provides highly available access to exported disk data.
   • Disk service provides highly available access to disks or a disk-based application.
   • User-defined service provides highly available access to an application.

2. The service name, which is what the client refers to, is distinct from any system name. This means the service is not tied to any system, and can be relocated if necessary.

Preparing to Set Up Services: Exercise

1. Describe the three automatic service placement policies.
2. Identify the characteristics an application must have to be made highly available with ASE.

Preparing to Set Up Services: Solution

1. The three automatic service placement policies are:

   • Balanced service distribution tries to balance the service load. ASE will choose the member running the least number of services at the time the service is started.
   • Favor members checks the specified members in order first. If one of them is available, it is selected to run the service. If none of the favored members are available, ASE will choose the member running the least number of services.
   • Restrict to favored members checks the specified members. However, if none of the favored members are available, ASE will not start the service. This policy ensures that ASE never moves the service to a member not on the list.

2. An application must have the following characteristics to be made highly available with ASE:

   • The application must run on only one system at a time.
   • The application must be able to be started and stopped using a series of commands (action script).


Setting Up NFS Services: Exercise

Set up an NFS service.

1. Set up a UFS file system or AdvFS fileset on a shared disk.
2. Choose a unique service name. Enter the service name and an Internet address into each server and client /etc/hosts file.
3. Use asemgr to specify the service name, device special file, fileset, or logical volume, mount pathname, and read-write access.
4. Use the balanced placement policy.
5. Add the service name and export mount point to the clients' /etc/fstab files.
6. Log in to a client system and create a file on this file system.

Setting Up NFS Services: Solution

Use Example 7–3 from the course guide as an example solution.

Setting Up a Disk Service: Exercise

Use the asemgr utility to set up a disk service. You do not have to include a disk-based application (therefore no start/stop action script is necessary).

Setting Up a Disk Service: Solution

No solution is necessary. Refer to Example 7–6 for a sample solution.

Setting Up a User-Defined Service: Exercise

Set up the /usr/bin/X11/dxcalc program as a user-defined service. If you have not already done so, write an action script that will both start and stop the service. Place the script in the /usr/local/adm directory. Verify that the script executes correctly before running asemgr to add the service.

Keep in mind when you write the script that the command to start /usr/bin/X11/dxcalc may need to include the -d option to designate your workstation.

Setting Up a User-Defined Service: Solution

No solution required. Refer to Example 7–7 as a sample solution.
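An illustrative outline of such an action script follows; this is only a sketch, not the course's reference answer, and the display name mywks:0.0 is hypothetical:

```shell
#!/bin/sh
# Sketch of a start/stop action script for the dxcalc service.
# mywks:0.0 is an invented display; point -d at your own workstation.
dxcalc_action() {
    display=mywks:0.0
    case "$1" in
    start)
        # /usr/bin/X11/dxcalc -d "$display" &   # real launch on a member
        echo "dxcalc started on $display"
        ;;
    stop)
        # kill the running dxcalc process here so relocation can proceed
        echo "dxcalc stopped"
        ;;
    esac
}

dxcalc_action start
dxcalc_action stop
```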


Using asemgr to Manage Services: Exercise

Use the asemgr utility to:

1. Set a service off line.
2. Set the same service back on line.
3. Relocate a service to another member.
4. Change the ASP policy of a service.

Using asemgr to Manage Services: Solution

No solution necessary. Refer to the examples in this section.


8
Using the Cluster Monitor

Using the Cluster Monitor 8–1

About This Chapter


Introduction

The Cluster Monitor monitors the status of an Available Server configuration and displays the configuration, including member systems, available services, storage devices, and interconnects. The Cluster Monitor provides a graphical interface to the cluster configuration map. It simplifies TruCluster Available Server management by allowing you to view the status of members and services, to relocate a service, and to launch other tools such as dxlsm and asemgr.

This chapter shows you how to set up and run the Cluster Monitor and launch other tools.

Objectives

To use the Cluster Monitor to monitor an Available Server configuration, you should be able to:

  • Use the Cluster Monitor to view the status of devices and services
  • Launch other management tools through the Cluster Monitor
  • Set up the Cluster Monitor
  • Use the Cluster Monitor to identify problems in a TruCluster Software configuration

Resources

For more information on the topics in this chapter, see the following:

  • TruCluster Available Server Software Available Server Environment Administration
  • TruCluster Available Server Software Hardware Configuration and Software Installation
  • TruCluster Available Server Software Version 1.4 Release Notes
  • Reference Pages
  • Cluster Monitor online help


Setting Up the Cluster Monitor


Overview

Before you can run the Cluster Monitor, the hardware and software must be installed, the ASE must be properly configured, and all member systems and devices must be up and available. On a new ASE configuration, or after you change the hardware configuration, you must also create the cluster map.

The Cluster Monitor obtains the ASE hardware configuration information from the cluster configuration map. The cluster map is formed by gathering hardware configuration information from each of the member systems in the ASE. This information is compiled into a text file, /etc/CCM. This file is copied to each member system.

Setup Procedure

To set up the Cluster Monitor, follow these steps:

1. Make sure that the required subsets for the product are installed:

   • TCRCMS140 (TruCluster Cluster Monitor)
   • CXLSHRDA405 (DEC C++ Class Shared Libraries)

   To use dxadvfs, you must also have the AFAADVGUI401 subset installed. In addition, LSM and other GUI tools may require licenses to be loaded on the member system running the monitor.

2. Set up the /.rhosts file to allow root access for the rsh command between any two member systems. You must include all member systems, including the local system, in the /.rhosts file. For example, the /.rhosts file on system tinker should list systems tailor and tinker. Refer to .rhosts(4) for more information.

3. Check that all member systems are UP by running the asemgr utility and displaying member status.

4. Use the cluster_map_create command to create the cluster configuration map on one member system in the ASE domain. To issue the cluster_map_create command, log in as superuser and use the following syntax:

   /usr/sbin/cluster_map_create clustername -full

   The following table describes the usage for the clustername variable and the -full option.


clustername    Name of the ASE domain, up to 64 characters; used to label
               the title bar on the Cluster Monitor main window. You can
               use any name you want for this variable.

-full          Option that forces all member systems to rescan for new
               components and update their cluster map.

After you issue the cluster_map_create command, the configuration information is merged into the cluster map, /etc/CCM, which is distributed to the kernels of all member systems. If any configuration errors are discovered, or if any member systems are down, the utility generates error messages. Refer to cluster_map_create(8) and CCM(5) for more information.

Sample Setup Script

The following is a sample script showing the commands you issue and the system prompts that are displayed when you set up the Cluster Monitor application.
# /usr/sbin/cluster_map_create ASE0 -full
Members running are: ( tinker tailor )
Doing device table scans.
...
Doing symmetry checks
...
Processing map input file.
...
Calling makeclmap to create /etc/CCM.
Distributing cluster map to all members.
Processing member tinker.
Processing member tailor.
Successful cluster map creation and distribution.

Updating the Cluster Map

If you add a member system later, you must perform the preceding tasks on that new system. If you make other changes to your hardware configuration, invoke the cluster_map_create command with the -full and -append options on one member system. This updates each member's cluster configuration map.

Using the Cluster Monitor


Overview

The Cluster Monitor provides a graphical interface for managing an Available Server configuration and detecting Available Server-related problems. It shows the current state of availability and connectivity, and visually alerts the administrator to problems.

The Cluster Monitor performs the following tasks:

  • Displays the status of each member system
  • Reports member system failures, ASE service failures, and hard and soft disk errors
  • Displays the configuration of an Available Server implementation, including its member systems, ASE services, storage devices, and network interfaces
  • Displays the devices on a member system's private SCSI buses
  • Displays the shared storage reserved by an ASE service
  • Stops, starts, and relocates ASE services
  • Launches external tools: asemgr, dxadvfs, dxlsm, pmgr

Starting the Cluster Monitor

To start the Cluster Monitor:

1. Set the session security to allow X access, or use the xhost + command to disable all X client security checks.
2. Log in to an ASE member system as root.
3. Set your DISPLAY variable to point to the desired workstation or PC (that supports the X Window System).
4. Run /usr/bin/X11/cmon.

For more information on the Cluster Monitor, see cmon(8) and the online help.
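From a root shell on a member system, steps 3 and 4 might look like the following sketch; "mywks" is an invented workstation name, so substitute the display you are actually sitting at:

```shell
# Sketch of the Cluster Monitor startup steps. mywks:0.0 is a
# hypothetical display name; on the workstation itself you would first
# allow X access, e.g. with "xhost +" or session security settings.
DISPLAY=${DISPLAY:-mywks:0.0}    # step 3: point X output at your screen
export DISPLAY
cmon=/usr/bin/X11/cmon           # step 4: the Cluster Monitor binary
echo "DISPLAY is $DISPLAY; Cluster Monitor binary is $cmon"
```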

Top View

The top view, or main window, presents an overview of the status of all the ASE member systems. If you have a four-member ASE domain, all four member systems will be visible in the top view. For each member system, icons show the status of the system, its ASE services, its interconnects, and its storage. If the Cluster Monitor detects a problem with one of these subsystems, it draws a line through the corresponding icon.


Figure 8–1 is a representation of the Cluster Monitor top view.

Figure 8–1  Cluster Monitor Top View
[Figure: main window titled "Cluster Monitor: ASE 0: tinker" with Monitor, Options, and Help menus, showing member systems tinker and tailor in domain ASE0]

The icons in each server indicate the following:

  • System status (up or down)
  • Status of ASE services
  • Status of shared storage devices
  • Local area network interfaces

Click a member system to display the device view for that member. Click a service icon to display the services view.

Device View

The device view displays the hardware configuration in the ASE domain. It displays the network interconnects, the member systems, the shared SCSI buses, and the shared storage devices.


Figure 8–2 is a representation of the Cluster Monitor configuration view.

Figure 8–2  Cluster Monitor Configuration View
[Figure: device view window with View, Device, Tools, Action, and Help menus; toolbar icons for Service, Pmgr, Dxlsm, Dxadvfs, Asemgr, and Xterm; members tinker and tailor connected to networks Net_199.155.0.0 and Net_199.156.0.0 and to shared buses SCSI_2 and SCSI_3 with disks rz17, rz18, rz25, and rz26; status log reads "08:44:57 Cluster Monitor started"]


Click MB1 on a device to display its connections. For example, click a SCSI bus and the monitor will display all devices attached to that bus, as shown in Figure 8–3.

Figure 8–3  SCSI Bus Configuration
[Figure: the device view with a shared SCSI bus selected, highlighting the lines from that bus to its attached disks and member systems]


Press Ctrl while clicking MB1 to add connections from another component, as shown in Figure 8–4.

Figure 8–4  All Shared Connections
[Figure: the device view with connections from a second component added, showing all shared connections among the members, buses, and disks]


Double-clicking an icon may open a dialog box with more detailed information. For example, double-clicking a member system displays that system's local bus and devices, as shown in Figure 8–5.

Figure 8–5  Local Connections
[Figure: the device view with a "Private Devices view: tailor" dialog box showing local bus SCSI0_tailor and devices tz5, Cdrom6, rz1, and rz0]

Services View

The services view displays the services that are registered in the ASE domain. It displays the member systems with the online services that are running on them. It also displays the offline and unavailable services.


Figure 8–6 is a representation of the Cluster Monitor services view.

Figure 8–6  Cluster Monitor Services View
[Figure: services view showing member systems tinker and tailor and the services users, projectX, and database]


Each type of available service has an icon, as shown in Table 8–1.

Table 8–1  Available Service Icons

[Icon]    Disk service
[Icon]    NFS service
[Icon]    User-defined service
[Icon]    Unknown type of service

Double-click a service to display the shared storage devices associated with the service, as shown in Figure 8–7.


Figure 8–7  Service Devices
[Figure: services view with a "Service Details View: users" dialog box listing the shared devices /dev/rz17c and /dev/rz25c]

You can relocate a service visually by dragging its icon to another member system in this window.

The Action menu (available on the menu bar or by pressing MB3) provides operations that can be performed on an ASE service, including:

  • Placing a service on line
  • Placing a service off line
  • Restarting a service
  • Relocating a service

Launching Other Tools


Overview

The Cluster Monitor provides a graphical view of the Available Server configuration, where ASE member systems and tools are represented as icons. It allows the administrator to launch tools via a drag and drop interface. The Cluster Monitor is extensible to allow integration of other graphical management applications. With this capability, an administrator can manage a set of systems from a local workstation.

Included Tools

The Cluster Monitor configuration and services windows include toolbar icons for the following utilities:

  • LSM Visual Administrator (dxlsm)
  • AdvFS manager (dxadvfs)
  • Performance Manager (pmgr)
  • ASE manager (asemgr)
  • X terminal window (xterm)

You can click a tool icon to activate the utility on a system in the current ASE domain, or drag the tool icon to a member system icon to run the utility on that system.

External Tools

You can drag icons from the CDE desktop or Application Manager to the Cluster Monitor window to invoke the application on that cluster member system. For example, dragging the System Information icon to the Cluster Monitor window and dropping the icon onto a member system would cause the Cluster Monitor to run that command on the member system and display the results.

Monitoring Available Server Configurations with the Cluster Monitor

Overview

The Cluster Monitor provides a useful tool for monitoring the health of the components of an Available Server configuration. Each view provides indicators of the status of the components it displays.

You can use the Cluster Monitor to:

  • Launch diagnostic tools to check the setup
  • Determine which components are failing
  • Launch management tools to reconfigure hardware or ASE services

Top View

The top view presents an overview of the status of an Available Server configuration. Problems or state changes are indicated by changes in the icons displayed, as shown in Table 8–2.
Table 8–2  Main Window Failure Indicators

Icon or Display                    Meaning

Outline around system graphic      Failure of that system, one of its
                                   devices, or one of its services
[Icon]                             Failure of the system
[Icon]                             Failure of one or more services on
                                   that system
[Icon]                             Failure of one or more shared devices

In addition, an attention light appears at the bottom of the main window when a failure has been reported.

Device View

The device view displays the status of the hardware configuration of the Available Server configuration. Problems or state changes are indicated by changes in the icons displayed, as shown in the following table.


Table 8–3  Device View Failure Indicators

Icon or Display    Meaning

[Icon]             Available Server Availability Manager is not reporting
                   that the system is a member of the ASE domain
[Icon]             Ten soft errors have occurred in a 15-minute interval;
                   this indicates the disk may be deteriorating
[Icon]             Hard error has occurred on the device; data on the
                   disk may be corrupted

In addition, the status log area reports status changes.

Services View

The services view displays the status of services registered in the ASE domain. Problems or state changes are indicated by changes in the icons displayed, as shown in the following table.

Table 8–4  Services View Failure Indicators

Icon or Display    Meaning

[Icon]             Service is off line
[Icon]             Service is unassigned due to a missing resource
[Icon]             Available Server Availability Manager is not reporting
                   that the system is a member of the ASE domain
[Icon]             Unknown type of service

In addition, the status log area reports status changes.

What to Do when You See an Error

If the Cluster Monitor indicates a problem, use the various views to gather more detailed information about the failed component. Double-click the component to see if further details are available. See the Cluster Monitor online help to find out what the displayed symbol means for that component. The online help may suggest some tools you can use to further diagnose the problem.


You can run the clu_ivp script to check the cluster setup. Run the asemgr utility to check the ASE member and service setup. Check the system event logs for cluster messages. Run other system monitoring utilities, such as ifconfig, netstat, and ping.

You can run character-based utilities without leaving the Cluster Monitor by selecting the Xterm option on the Tools menu, by clicking the xterm icon on the toolbar, or by dragging the xterm icon using MB2 to any member system.
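As a hedged illustration, a quick reachability check of the members from one node might look like this; tinker and tailor are the example member names used throughout this course, so substitute your own:

```shell
# Ping each ASE member once to check basic network connectivity.
# tinker and tailor are hypothetical member names from the course text.
members="tinker tailor"
for m in $members; do
    if ping -c 1 "$m" >/dev/null 2>&1; then
        echo "$m: reachable"
    else
        echo "$m: NOT reachable"
    fi
done
```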

Summary

Setting Up the Cluster Monitor

To set up the Cluster Monitor:

1. Make sure all systems and devices are properly connected.
2. Make sure the Cluster Management subset (TCRCMS140) is installed.
3. On each member system, set up the /.rhosts file to allow root access for the rsh command between any two member systems. You must include all member systems, including the local system, in the /.rhosts file.
4. Check that all member systems are up by running the asemgr utility and displaying member status.
5. Create the cluster configuration map on one ASE member system:

   /usr/sbin/cluster_map_create clustername -full

Using the Cluster Monitor

The Cluster Monitor shows the current state of availability and connectivity, and visually alerts the administrator to problems. Run the Cluster Monitor by executing /usr/bin/X11/cmon as root on an ASE member system.

The Cluster Monitor uses the following display windows:

  • The device view displays the hardware configuration. It displays the network interconnects, the member systems, the shared SCSI buses, and the shared storage devices.
  • The top view presents an overview of the status of the member systems. For each member system, icons show the status of the system, its ASE services, its interconnects, and its storage.
  • The services view displays the services that have been registered in the current environment.

Launching Other Tools

The Cluster Monitor displays ASE member systems, devices, and tools as icons. It allows you to drag a tool icon to a system icon to execute the utility on that system.

The Cluster Monitor includes toolbar icons for the following utilities:

  • LSM Visual Administrator (dxlsm)
  • AdvFS manager (dxadvfs)
  • Performance Manager (pmgr)
  • ASE manager (asemgr)
  • X terminal window (xterm)


In CDE, you can also drag application icons from the system
management applications window in the Application Manager to
the Cluster Monitor window.

Monitoring Available Server Configurations with the Cluster Monitor

The Cluster Monitor provides a useful tool for monitoring the health of the components of an Available Server configuration. Each view provides indicators of the status of the components it displays. In general, a diagonal line across an icon indicates the failure of that component.

Exercises
Setting Up the Cluster Monitor: Exercise

Set up the Cluster Monitor by following these steps:

1. Make sure the appropriate subsets are installed.
2. Set up the /.rhosts file to allow root access for the rsh command between any two member systems. You must include all member systems, including the local system, in the /.rhosts file.
3. Check that all member systems are "UP" by running the asemgr utility and displaying member status.
4. Create the cluster configuration map on one cluster member system.

Setting Up the Cluster Monitor: Solution

1. Sample solution for TruCluster Software:

   # /usr/sbin/setld -i | grep TCRCMS140

   You should also check for CXLSHRDA and any other subsets you require.

2. Sample solution:

   tinker# cat > /.rhosts
   tailor
   tinker
   Ctrl/D

3. Sample solution (in each ASE):

   # /usr/sbin/asemgr

   ASE Main Menu

        a) Managing the ASE          -->
        m) Managing ASE Services     -->
        s) Obtaining ASE Status      -->
        x) Exit

        ?) Help

   Enter your choice: a

   Managing the ASE

        a) Add a member
        d) Delete a member
        n) Modify the network configuration
        m) Display the status of the members
        .
        .
        .

   Enter your choice : m

   Member Status

   Member:    Host Status:    Agent Status:
   tinker     UP              RUNNING
   tailor     UP              RUNNING

4. Sample solution:

   # /usr/sbin/cluster_map_create myase -full

Using the Cluster Monitor: Exercise

1. Start the Cluster Monitor on your workstation.
   a. Log in to an ASE member system as root.
   b. Set your DISPLAY variable to point to the workstation.
   c. Set the session security to allow X access, or use the xhost + command to disable all X client security checks.
   d. Run /usr/bin/X11/cmon.
2. Click a member system to show the device view for that member.
3. Click a SCSI bus to display the devices attached to that bus.
4. Press Ctrl while clicking to display more connections.
5. Double-click a member system to display that system's local SCSI buses and devices.
6. Click the Service icon in the device view above the Cluster Map to switch to the services view.
7. Double-click an ASE service to display its shared storage devices.
8. Relocate an ASE service by dragging its icon to another member system. Use the Action menu (from the menu bar or by pressing MB3) to put a service off line.

Using the Cluster Monitor: Solution

1. The Cluster Monitor top view should appear. If you get a display error, check your DISPLAY environment variable and the session security.
2. See the online help for more information.
3. When you click a bus, the name should highlight, and lines should be drawn showing the connections.
4. When you press Ctrl while clicking, additional connections should be drawn.
5. When you double-click a member system, the host details dialog box appears, displaying the system's local bus and devices.
6. See the online help for more information.
7. When you double-click a service, the service details dialog box appears, displaying the service's shared storage devices.
8. See the online help for more information.

Launching Other Tools: Exercise

1. Start the Cluster Monitor and check the online help for information about launching tools.
2. Run each of the tools included in the Cluster Monitor by clicking their icons.
3. Drag the asemgr icon to a service icon to get detailed status of the service. Drag the xterm icon to a member system icon and use the hostname command to verify you are executing on that member system.
4. In CDE, launch the Application Manager by clicking its icon on the CDE Front Panel. Drag the System Information icon from the System Admin group and drop it on a member system icon.

Launching Other Tools: Solution

1. Use the Help menu to access online help. The Tasks section includes a topic on running applications by drag and drop.
2. Each tool should activate its own window.
3. The asemgr utility will run in an xterm window. The result of the hostname command should be the name of the system on which you dropped the xterm icon.
4. The Application Manager icon is a file drawer with tools in it. The System Information icon is in the Daily Admin group. The System Information application should run in its own window.

Monitoring Available Server Configurations with the Cluster Monitor: Exercise

1. Match each of the following Cluster Monitor failure indications with its correct meaning.

   a. Blank area in the shape of a system icon     a. Service is off line
   b. Diagonal line across a system icon           b. Service is unassigned
   c. Diagonal line across a storage icon          c. System is not reported as an ASE member
   d. Circle in the corner of a service icon       d. Disk error (may be soft errors)
   e. Diagonal lines crossing in the corner        e. Failure of the system
      of a service icon

2. Run the Cluster Monitor and check the online help to see what troubleshooting information it provides.

3. Pull a disk or shut down a member system and check the Cluster Monitor to see the effect.

Monitoring Available Server Configurations with the Cluster Monitor: Solution

1. Solution to matching:

   a. c.
   b. e.
   c. d.
   d. b.
   e. a.

2. An introduction to troubleshooting information is provided in the Reference section of the Cluster Monitor online help. It explains each of the icons and suggests tools to use to further diagnose problems.
3. Any services that depend on the failed resource should try to restart on another member in the same ASE domain. The failure of a member system should be indicated by an icon containing a blank space in the shape of a server.


9
Testing, Recovering, and Maintaining TruCluster Configurations

Testing, Recovering, and Maintaining TruCluster Configurations 9–1

About This Chapter


Introduction

This chapter describes how to verify that your ASE services will behave as you expect when hardware failures occur. The chapter also describes how to recover from hardware failures and how to modify your ASE hardware configuration.

The following failures are discussed:

  • Member node crash
  • Failure of a DWZZA
  • Shared disk failure
  • Removal of power from a storage enclosure
  • Network interface failure
  • Network partition

Knowing how ASE fails over services will help you understand how to best configure your ASE services. You must also know what steps are needed to perform ongoing maintenance tasks.

Objectives

To test TruCluster Available Server failover capability, you should be able to:

  • Recover from hardware failures in an ASE
  • Test the hardware failure conditions that the TruCluster Software detects, and predict how the Available Server Environment responds to these failures
  • Change the hardware configuration in an ASE

Resources

For more information on the topics in this chapter, see the following:

  • TruCluster Available Server Software Available Server Environment Administration
  • TruCluster Available Server Software Version 1.4 Release Notes
  • TruCluster Available Server Software Software Product Description, SPD 44.17.xx
  • Reference Pages


Performing TruCluster Testing Procedures


Overview

This section describes how to test whether your Available Server
Environment configuration responds properly when failure events
occur. The following six failure events are tested:

System Power Off

DWZZA-AA Power Off

Removal of a Shared Disk

Removing Power from a BA350

Removal of the Network From One Member

Removal of the Network From All Members

System Configuration Assumptions

Tests in this section assume that the configuration meets
minimum requirements. There must be at least two Alpha
system member nodes, with a shared SCSI bus that is properly
terminated. Either external or internal DWZZAs can be used.
Testing can be performed with shared disks that are mirrored
with LSM, with disks that are not mirrored, or both.

Observing System Response

You should use the Obtaining ASE Status submenu on the ASE
Manager's Main menu to see where a service resides before and
after performing tests. This will verify that the services failed
over as you expect.
You can also observe what messages the TruCluster Software
produces by using the tail -f command on a member node
running the Logger daemon and specifying the current daemon.log
file in the /var/adm/syslog.dated/date directory, where date is a
date and time stamp directory name such as 11-Jan-10:00.
For example, to observe messages in the directory for January 11
at 10:00 a.m., you would enter the following on a member node
running the Logger daemon:
# tail -f /var/adm/syslog.dated/11-Jan-10:00/daemon.log
As you introduce system failures, the Logger daemon sends
messages to the daemon.log file and tail displays them. This gives
you immediate feedback as you perform each test. Be aware that
network failures can interfere with the sending of log messages.
Whenever a message of severity level alert occurs, messages
are logged and the Alert script is invoked. This script sends
"Critical ASE error" mail to the users specified in the script.
The default user specified in the script is root.
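Because every TruCluster log line follows the same pattern (date, host, ASE:, subsystem, severity, message text), alert messages can be filtered out of a captured log with grep. The following is a minimal sketch; the sample lines are taken from this chapter's examples, and /tmp/daemon.log.sample is an illustrative path, not a real log location:

```shell
# Filter ASE alert messages from a captured daemon.log.
# Sample lines are drawn from this chapter's examples; on a live
# member you would point grep at the current daemon.log instead.
cat > /tmp/daemon.log.sample <<'EOF'
Sep 16 14:24:26 tinker ASE: local HSM ***ALERT: HSM_PATH_STATUS:30.14.80.33:DOWN
Sep 16 14:24:26 tinker ASE: local HSM Warning: member tailor is DOWN
Sep 16 14:25:13 tinker ASE: tinker Director Notice: started service nfsusers on tinker
EOF
grep 'ASE:.*ALERT' /tmp/daemon.log.sample
```

On a live member you can also pipe the output of tail -f through the same grep to watch for alerts as they arrive.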


Instructor Note

Relocation of each service is based on the service's failover
Automatic Service Placement (ASP) policy.
In general, all tests in this section are highly dependent
on how the hardware and software have been configured.
The test results you achieve will vary depending on the
configuration, but the error message examples should be
consistent.

System Power Off Test

This test consists of turning off the power on any member node
running a service to produce a Host Down condition. When you
power off a member node, the TruCluster Software relocates
services to surviving members based on the Automatic Service
Placement (ASP) policies in the Available Server Environment
configuration database. If the Director daemon was running on
the failed system, the Agent daemons on the remaining member
nodes elect a new node to start the Director daemon.
In a two-member environment, all services are relocated to the
surviving member node unless the service is restricted to the failed
member.
The daemon.log file will contain messages similar to those in the
following example. In the example, tailor is the member that
fails and tinker is a remaining member node. Note that the
Director daemon was running on tailor, so tinker starts a
new Director.


Example 9-1 System Power Off Messages

Sep 16 14:24:22 tinker ASE: local HSM Warning: Can't ping tailor over the SCSI bus
Sep 16 14:24:25 tinker ASE: local HSM Warning: Can't ping tailor over the network
Sep 16 14:24:26 tinker ASE: local HSM ***ALERT: HSM_PATH_STATUS:30.14.80.33:DOWN
Sep 16 14:24:26 tinker ASE: local HSM Warning: member tailor is DOWN
Sep 16 14:24:46 tinker ASE: tinker AseMgr Warning: timeout waiting on Reply
to ASE_INQ_SERVICES
Sep 16 14:24:47 tinker ASE: tinker AseMgr Notice: director request timed out,
retrying...
Sep 16 14:24:56 tinker ASE: tinker Agent Notice: agent on tailor should start
director, but isn't in RUN state
Sep 16 14:24:56 tinker ASE: tinker AseMgr Warning: timeout waiting on Reply
to ASE_INQ_SERVICES
Sep 16 14:24:57 tinker ASE: tinker Agent Notice: starting a new director
Sep 16 14:25:00 tinker ASE: tinker Agent Notice: starting service nfsusers
Sep 16 14:25:13 tinker ASE: tinker Director Notice: started service nfsusers
on tinker

DWZZA-AA Power Off

This test consists of removing power from one of the DWZZA-AA
signal converters located on a shared bus connected to a
controller. The purpose of the test is to simulate a SCSI path
failure. This test can also be accomplished by disconnecting a
tri-link connector on the DWZZA-AA connected to a controller.
The result of this action is that the disks located on the
corresponding SCSI bus will be inaccessible to the affected
member.
The TruCluster Software logs an error message, but takes no
further action until I/O to the affected devices is attempted.
The following example shows the error messages produced when
a DWZZA is disconnected.

Example 9-2 DWZZA Disconnection Error Messages

Sep 20 10:32:41 tinker ASE: local HSM Warning: Can't ping tailor over the SCSI bus
Sep 20 10:32:41 tinker ASE: local HSM ***ALERT: HSM_PATH_STATUS:30.14.80.33:UP
Sep 20 10:32:42 tinker ASE: local HSM ***ALERT: network ping to host tailor is
working but SCSI ping is not
Sep 20 10:22:53 tinker ASE: tinker Agent ***ALERT: device access failure on
/dev/rz17g from tinker
Sep 20 10:22:58 tinker ASE: tinker Agent Warning: AM can't ping /dev/rz17g
Sep 20 10:22:58 tinker ASE: tinker Agent Warning: can't reach device /dev/rz17g

Removing a Shared Disk

This test consists of removing a shared disk during I/O activity
without powering off the disk; it is also referred to as a hot
disk removal. This test creates a device failure condition.
If LSM mirroring is being used, the volumes should still be
available from the mirrored disk so that the service continues to
run. The Director daemon logs a message.
If mirroring is not being used, the Director daemon stops the
service.


When a service is using mirrored disks, use the asemgr's rereserve
function to restart the service if no plexes are available and the
service fails. When a service is not using mirrored disks, you
should use the asemgr's restart function to restart the service.
The following example shows sample device error messages in a
configuration that has mirrored volumes.

Example 9-3 Failed Device Error Messages

Sep 20 11:54:20 tailor ASE: tailor Agent ***ALERT: device access failure on
/dev/rz17g from tailor
Sep 20 11:54:25 tailor ASE: tailor Agent Warning: AM can't ping /dev/rz17g
Sep 20 11:54:25 tailor ASE: tailor Agent Warning: can't reach device
/dev/rz17g
Sep 20 11:54:35 tailor ASE: tailor Agent Notice: /var/ase/sbin/lsm_lv_action:
Using default setting of ASE_PARTIAL_MIRRORING=OFF
Sep 20 11:54:35 tailor ASE: tailor Agent Notice: /var/ase/sbin/lsm_lv_action:
LSM plex "users-01" is not enabled
Sep 20 11:54:35 tailor ASE: tailor Agent Notice: /var/ase/sbin/lsm_lv_action:
LSM plex "users-02" is OK, volume "users-vol" can continue to run
Sep 20 11:54:35 tailor ASE: tailor Agent Notice: /var/ase/sbin/lsm_lv_action:
Device "/dev/vol/users-dg/users-vol" passed the LSM volume query
Disk recovery procedures are discussed later in this chapter.

Removing Power from a BA350

This test removes power from a BA350 on a shared SCSI bus,
which also simulates a device failure. This is similar to removing
a shared disk. In a single-BA350 configuration, the other member
nodes cannot access any of the disks. Therefore, any associated
disk services become "unassigned" until power is reapplied to the
BA350 device.
If a second BA350 had been configured to mirror the failed
BA350, the result of this test would be that the service continues
to run, with data I/O continuing on the mirrored BA350.
The example shows a portion of the messages generated when
powering off a BA350.

Example 9-4 BA350 Power Off Error Messages

Sep 20 10:42:07 tailor ASE: tailor Agent Warning: AM can't ping /dev/rz33g
Sep 20 10:42:07 tailor ASE: tailor Agent Warning: can't reach device /dev/rz33g
Sep 20 10:42:07 tailor ASE: tailor Agent Warning: can't reserve /dev/rz33g
Sep 20 10:42:07 tailor ASE: tinker Agent Error: can't unreserve device
Sep 20 10:42:09 tailor ASE: tailor Agent Warning: AM can't ping /dev/rz33g
Sep 20 10:42:09 tailor ASE: tailor Agent Warning: can't reach device /dev/rz33g
Sep 20 10:42:09 tailor ASE: tailor Agent ***ALERT: possible device failure:
/dev/rz33g


Removing One Member from the Network

This test causes a network interface failure by disconnecting
all configured ASE network connections from one of the ASE
members. Because the ASE software continuously pings the
network, it knows immediately when the network connection to a
member is lost.
When the network is lost to a member, the Agent daemon stops
all services on that member and the Director daemon relocates
them to a remaining member, as defined by the Automatic
Service Placement policy for the ASE.
The following example shows the error messages for a network
interface failure.

Example 9-5 Error Messages When One Member Removed from Network

Sep 20 12:07:45 tinker ASE: local HSM Warning: Can't ping tailor over the network
Sep 20 12:07:45 tinker ASE: local HSM ***ALERT: HSM_PATH_STATUS:30.14.80.33:DOWN
Sep 20 12:07:49 tinker ASE: local HSM Warning: member tailor is disconnected
from the network
Sep 20 12:07:50 tinker ASE: tinker Agent ***ALERT: member tailor cut off from net
Sep 20 12:08:02 tinker ASE: tinker Agent Notice: starting service nfsusers
Sep 20 12:08:20 tinker ASE: tinker Director Notice: started service nfsusers
on tinker
Sep 20 12:08:20 tinker ASE: tinker Director Notice: finished processing agent
state change from HSM: agent tailor state NIT_DOWN

Removing All Members from the Network

This test causes a network partition by disconnecting all
configured ASE network connections from all the ASE members.
When the network is lost among all the ASE members, the ASE
services continue to run, but the Director daemon exits, so the
TruCluster Available Server configuration no longer provides
failover or administrative functions until the network partition is
repaired.
The following example shows the error messages for a network
partition.


Example 9-6 Error Messages When All Members Removed from Network


Sep 20 12:18:35 tinker ASE: local HSM Warning: \
Network interface ln0 30.14.80.33:DOWN
Sep 20 12:18:35 tinker ASE: local HSM ***ALERT: \
HSM_NI_STATUS:30.14.80.33:DOWN
Sep 20 12:18:37 tinker ASE: local Simulator Notice: \
snd: exiting...
Sep 20 12:18:38 tinker ASE: tinker Director ***ALERT: \
Network connection down... exiting
Sep 20 12:18:38 tinker ASE: tinker Director Warning: \
Director exiting...
Sep 20 12:18:45 tinker ASE: tinker Agent Notice: \
/var/ase/sbin/lsm_dg_action: voldg: Disk group \
users-dg: Some volumes in the disk group are in use
Sep 20 12:18:45 tinker ASE: tinker Agent Notice: \
/var/ase/sbin/lsm_dg_action: voldg deport \
of disk group users-dg failed
Sep 20 12:18:45 tinker ASE: tinker Agent Notice: \
/var/ase/sbin/lsm_dg_action: fsgen/volume: \
Warning: Volume users-vol in use by another utility
Sep 20 12:18:50 tinker ASE: tinker AseMgr Error: \
we're net partitioned from the director
Sep 20 12:18:51 tinker ASE: tinker AseMgr ***ALERT: \
Net partition or disconnect - cannot find a director.
Sep 20 12:18:51 tinker ASE: tinker AseMgr Error: \
Unable to open the database.

Recovering from Failures in the ASE


Instructional Strategy

Instructor Note

This section describes procedures that a system
administrator may need to use when certain disk errors
occur.
These procedures are provided as examples for student
consideration when recovering a failed disk. The
TruCluster Software automatically recovers from other
failures.

Overview

Recovery from the following failures is performed automatically:

Member node

SCSI path

Network interface

Network partition

Recovery from disk device failures must be performed manually.

This section describes procedures for replacing an LSM mirrored
disk and for replacing a nonmirrored disk if either disk fails during
normal operations.

LSM Mirrored Disk Replacement

When a disk under LSM control in an ASE becomes faulty,
the Available Server Environment Alert script generates Alert
messages. The messages identify the faulty disk by including its
/dev/rz device name along with error-specific information.
With mirrored disks, the plexes that remain after the faulty disk
is disabled provide users with transparent access to the data.
However, field service engineers must still replace the bad disk.
When the disk is replaced, you must bring the new disk on line.
The following steps explain how to use LSM bottom-up commands
to accomplish this task.
Perform the following steps to replace a shared disk under LSM
control in an ASE.
1. Obtain LSM disk group information.
2. Remove faulty disk from the LSM database.
3. Restore the partition table.
4. Initialize the new disk for use with LSM.


5. Associate the new disk to a disk media name and to a disk

group.
6. Recover the plex from the working plex.
7. Rereserve the corresponding service devices.

These steps are described in the following sections.

Obtaining LSM Disk Group Information

Use the volprint command on the member running the service to
obtain information on the failed disk. The example shows how to
display all records in all disk groups and their associations, with
an RZ26L disk named rz49 as the faulty disk.

Example 9-7 LSM Disk Group Information

# volprint -hA
TYPE  NAME          ASSOC    KSTATE    LENGTH    COMMENT
dg    rootdg        rootdg   -         -         Disk group
dm    rz2g          rz2g     -         4096      Disk media

dg    db            db       -         -         Disk group
dm    pd-rz33       rz33     -         2050347   Disk media
dm    pd-rz34       rz34     -         2050347   Disk media
dm    pd-rz35       rz35     -         2050347   Disk media
dm    pd-rz41       rz41     -         2050347   Disk media
 .
 .
 .
dm    pd-rz49       -        -         2050347   Disk media
dm    pd-rz50       rz50     -         2050347   Disk media
dm    pd-rz51       rz51     -         2050347   Disk media
 .
 .
 .
vol   vold          fsgen    ENABLED   12301824  Volume
plex  db-01         vold     ENABLED   12301824  Plex nbr 1
sd    pd-rz33-data  db-01    -         2050347   Sub-disk
sd    pd-rz34-data  db-01    -         2050347   Sub-disk
sd    pd-rz35-data  db-01    -         2050347   Sub-disk
sd    pd-rz41-data  db-01    -         2050347   Sub-disk
plex  db-02         vold     DISABLED  12301824  Plex nbr 2
sd    pd-rz49-log   db-02    -         2050347   Sub-disk
sd    pd-rz49-data  db-02    -         2050347   Sub-disk
 .
 .
 .

The listing provides the following information on rz49:

Disk group name - db

Physical disk name - rz49

Disk media name - pd-rz49

Subdisk names - pd-rz49-log and pd-rz49-data

Plex name - db-02

Volume name - vold

The listing also shows that because the disk failed during I/O, the
db-02 plex was automatically disabled as indicated in the KSTATE
column.
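Spotting the disabled plex in a long volprint listing can be automated with awk, since the kernel state appears in the fourth field of each plex record. This is a minimal sketch against inlined sample records modeled on the listing above; the field positions are assumed from that example, and on a live system the input would come from volprint -hA:

```shell
# List plexes whose kernel state is DISABLED.
# Sample records mirror the example listing; on a live system,
# replace the here-document with:  volprint -hA
cat > /tmp/volprint.sample <<'EOF'
vol  vold  fsgen ENABLED  12301824 Volume
plex db-01 vold  ENABLED  12301824 Plex nbr 1
plex db-02 vold  DISABLED 12301824 Plex nbr 2
EOF
awk '$1 == "plex" && $4 == "DISABLED" { print $2 }' /tmp/volprint.sample
```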
You can confirm the failed disk information using the voldisk
command, as shown in the following example.
Example 9-8 Confirming Failed Disk Information

# voldisk list
DEVICE  TYPE    DISK      GROUP   STATUS
rz1g    simple  rz1g      rootdg  online
rz33    sliced  pd-rz33   db      online
rz34    sliced  pd-rz34   db      online
rz35    sliced  pd-rz35   db      online
rz36    sliced  pd-rz36   db      online
rz41    sliced  pd-rz41   db      online
rz42    sliced  pd-rz42   db      online
rz43    sliced  pd-rz43   db      online
rz49    sliced  -         -       online
 .
 .
 .
rz50    sliced  pd-rz50   db      online
rz51    sliced  pd-rz51   db      online
-       -       pd-rz49   db      failed - was rz49

Note that the rz49 device is disassociated from its disk media
name, pd-rz49, and that a failed message appears in the STATUS
column.

Removing Faulty Disk from LSM Database

You must remove the faulty disk's disk media name from the disk
group and the disk access record from the LSM configuration
database so that you can use the disklabel command. The
following example shows how to remove rz49 from the LSM
configuration databases.
# voldg -g db rmdisk pd-rz49
# voldisk rm rz49

Testing, Recovering, and Maintaining TruCluster Congurations 911

Recovering from Failures in the ASE

Restoring the Partition Table

Once the disk is removed from the LSM database, you must
restore the partition table on the new disk. Use the disklabel
command on an RZ26L with a device name of rz49 as follows:
# disklabel -r -R rz49 /usr/ASE_SERVICES/disklabel_rz49 rz26l

Initializing the Disk for LSM

To initialize a sliced disk with rz49g as the public region and
rz49h as the private region:
# voldisk -f init rz49

Associating the New Disk

Next, associate the new disk to a disk media name and to a disk
group. Here is an example of associating rz49 to the db disk group
with a disk name of pd-rz49.
# voldg -g db -k adddisk pd-rz49=rz49

Recovering the Plex

The last step is to recover the plex from the working plex. This
operation will not interfere with normal database service provided
by the TruCluster Available Server, except for a temporary system
performance degradation while the plex is restored. The recovery
process can take a number of hours to complete, depending on the
size of the volume and the processing capability of the system.
The volrecover command restores the plex. For example, to
recover the db plex using volrecover in the background:
# volrecover -g db -sb

Note

You must never interrupt a volrecover operation with a
kill -9 command because the volume and the plex will
become locked.
If kill -9 is invoked, you must issue commands similar to
the following to restart the recovery process:
# voledit -g db -P set tutil0="" db-02
# voledit -g db -v set tutil0="" vold

Use the volprint command to check the recovery process as
follows:
# volprint -g db -mp db-02 | grep iomode
When the iomode equals RW, for Read/Write, the recovery
operation is complete.
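The iomode check can be scripted so that an operator does not have to re-run volprint by hand. The sketch below inlines sample text standing in for the command's output; on a live system the sample variable would be replaced by the output of volprint -g db -mp db-02:

```shell
# Decide whether plex recovery is complete from the iomode field.
# The sample text is a stand-in for real volprint -m output.
sample='plex   db-02
        iomode=RW'
if echo "$sample" | grep 'iomode' | grep -q 'RW'; then
    echo "plex db-02 recovery complete"
else
    echo "plex db-02 still recovering"
fi
```

Wrapping the same check in a loop with a sleep gives a simple poll that exits once recovery finishes.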


Rereserving the Service

You must finish incorporating the LSM disk into the ASE service
by rereserving the devices associated with the service.
1. Run the asemgr utility.
2. Choose the Advanced Utilities submenu item from the
   Managing ASE Services menu.
3. Choose Rereserve a service's devices (LSM only).
4. Select the service associated with the disk that you want to
   rereserve.

Replacing a Nonmirrored Disk

If a disk that is not part of an LSM mirrored volume or a
mirrored RAID device needs to be replaced, you must do one of
the following:

Remove the disk from the service by modifying the service
and deleting the disk. Once you replace the disk, modify the
service and add the disk.
Select the Modify a service item on the Service Configuration
menu.

Temporarily stop the service by setting the service off line.
Once you replace the disk, set the service on line.
Select the Set a service off line and the Set a service on line
items in the Managing ASE Services menu.

Unassigned Service

A service becomes unassigned when the TruCluster Software
cannot start the service on another eligible member, such as when
there is a disk failure. The Availability Manager notifies the
Agent daemon of a device path failure or device failure message.
If the affected disk is not mirrored with LSM, the Agent stops the
service and notifies the Director daemon. The Director daemon
sequentially attempts to start the service on another member.
If you see a service status of unassigned with asemgr, you should
check the log files that the Logger daemon writes to, such as the
daemon.log file, to determine the cause.
Once the hardware is working, the service must be manually
restarted by selecting the Restart a service item in the Managing
ASE Services menu.


Resetting TruCluster Daemons

If you experience problems, you can reset the TruCluster
daemons. This stops the Director and Host Status Monitor
daemons and initializes the Agent daemons. The Agent daemons
then restart the other daemons to make the TruCluster Software
operational.
To reset the TruCluster daemons, issue the following command:
# /sbin/init.d/asemember restart

Performing Ongoing Maintenance Tasks


Overview

After you have set up your TruCluster Available Server
configuration and it has been running awhile, events may occur
that require you to:

Change the hardware configuration

Add or remove systems

Add or remove storage boxes

Add or remove disks

This section discusses how to accomplish these tasks.

Changing Hardware Configuration

The biggest concern when changing the hardware configuration
is proper termination of shared buses. If the original
configuration used Y cables and DWZZAs, you may be able to
isolate a device and maintain termination of the bus so that you
can perform maintenance without affecting the rest of the system.
If the original system was not configured to isolate devices, you
will need to stop all TruCluster Software daemon activity before
changing the configuration, and then restart the TruCluster
Software once you have made the changes.


Stopping and Restarting TruCluster Daemon Activity

Before performing maintenance on a device that you cannot
isolate while maintaining a terminated shared bus, you must
set all the ASE services off line and then stop all TruCluster
Software daemon activity. To stop all TruCluster Software daemon
activity, invoke the asemember utility on all ASE members as
follows:
# /sbin/init.d/asemember stop
Once you have reconfigured the system, restart TruCluster
Software daemon activity as follows:
# /sbin/init.d/asemember start
After you issue the asemember start command, you can set the
ASE services back on line.
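During a reconfiguration window, the stop and start commands must be run on every member. A small loop can generate the per-member commands; this is a sketch in which the member names (tinker, tailor) are the chapter's example hosts, and executing over rsh would assume the usual trust setup between members:

```shell
# Print the asemember stop/start command for each ASE member.
# Replace the echo with rsh (or run each line at the member's
# console) to actually execute the commands.
MEMBERS="tinker tailor"
for m in $MEMBERS; do
    echo "rsh $m /sbin/init.d/asemember stop"
done
# ...reconfigure the hardware, then:
for m in $MEMBERS; do
    echo "rsh $m /sbin/init.d/asemember start"
done
```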

Adding and Removing Member Nodes

If you want to add a member node, the shared bus will be
unterminated at some position. Therefore, you must stop all
TruCluster Software daemon activity before you add the new
member.
When removing a system, you do not have to stop TruCluster
Software daemon activity if the shared bus is properly terminated.
With a properly terminated bus, you can:

Shut down and power off the system

Delete the member system from the ASE

Disconnect the system from the shared bus

Adding and Removing Storage Boxes

When adding a storage box, you must stop all TruCluster
Software daemon activity because the shared bus will be
unterminated at some point in the configuration and will need to
be reconfigured.
When removing a storage box, you do not have to shut down
TruCluster Software daemon activity provided that removing the
storage box will not cause the bus to be unterminated. Make
sure that no services are using the disks in the storage box that
you are removing, unless the disks are part of an LSM mirrored
volume or a hardware-mirrored RAID device.
To take a shared disk off line, ensure that a running service is not
using the disk, unless the disk is part of a mirrored volume or a
mirrored RAID device. If the disk is being used by a service, use
asemgr to put the service off line. Use the asemgr to restart the
service when you bring the disk back on line.
If the disk is part of an LSM mirrored set and the service is
running, you must use the ASE Manager rereserve function when
you bring the disk back on line. To use this function, choose a)
Advanced Utilities from the Managing ASE Services menu. The
Advanced Utilities submenu has the following menu item:
r) Rereserve a service's devices (LSM only)


Adding and Removing Disks

To add a disk to your hardware configuration:

1. Install the disk in the storage box.
2. Note the disk's unique SCSI ID.
   The SCSI ID for a disk in a BA350 storage box corresponds to
   its slot number. The SCSI IDs for disks in a BA353 storage
   box are set by the device address switch on the back of the
   box. On an 8-bit SCSI bus, the SCSI specification limits the
   number of nodes or devices to eight, and each SCSI device or
   controller must have a unique SCSI ID from 0 to 7.
   The HSZ10 uses only one SCSI ID, while the HSZ40 can be
   configured with 1 to 4 SCSI IDs.
3. If necessary, update the system configuration files to ensure
   that the system recognizes the new disk.
4. If you are using the disk in an AdvFS domain or LSM volume,
   you must also perform the appropriate steps to add the disk to
   the file system or volume.

If you are removing a disk from a storage box, ensure that a
running service is not using the disk, unless the disk is part of
an LSM mirrored volume or a mirrored RAID device. Use the
asemgr to stop any services using a disk that you want to remove
or replace.
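On Digital UNIX the rz unit number encodes both the bus and the target (unit = bus * 8 + SCSI ID, as described in the rz reference page), so the ID occupied by an existing disk can be computed from its device name. A minimal sketch, using disk names from this chapter's examples:

```shell
# Derive the bus number and SCSI ID from rz unit numbers,
# assuming the Digital UNIX convention unit = bus * 8 + id.
for dev in rz33 rz41 rz49; do
    unit=`echo $dev | sed 's/^rz//'`
    bus=`expr $unit / 8`
    id=`expr $unit % 8`
    echo "$dev: bus $bus, SCSI ID $id"
done
```

All three example disks sit at target 1, on buses 4, 5, and 6 respectively; a disk added to one of those buses must therefore use a different ID.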

Summary

Performing TruCluster Software Testing Procedures

Testing procedures give you confidence that the TruCluster
Software configuration you have built will fail over as expected.
Some of the tests that you can run are:

Power off a member node to create a Host Down condition.

Power off a DWZZA-AA to cause a SCSI path failure.

Remove a shared disk to create a device failure condition.

Remove power from a BA350 to also produce a device failure.

Remove a member node from the network to create a network
interface failure.

Remove all member nodes from the network to create a
network partition.

Performing Disk Recovery Procedures

When a disk becomes faulty, you must perform different
procedures to recover from the failure, depending on whether the
disk is mirrored with LSM or is a nonmirrored disk.

Performing Ongoing Maintenance Tasks

There are a number of maintenance tasks you must perform as
conditions change with your system. These tasks include:

Changing the hardware configuration

Adding and removing disk storage boxes

Adding and removing disks

Adding and deleting members

Exercises
Member Node Failure: Exercise

Perform a member node failure test. How does the TruCluster
Software respond?

Member Node Failure: Solution

To test a member node failure:

a. Use the tail utility on an unaffected member node running
   the Logger daemon to observe the entries in the daemon.log
   file. Locate the messages associated with services that failed
   over to another member node (for example, tinker).
   # tail -f /var/adm/syslog.dated/26-Mar-16:22/daemon.log
b. Turn off the power on one of the ASE members (for example,
   tailor) running a particular service (such as the ase4 service).
c. Observe the error messages that were generated.
d. Confirm that services on the failed member node tailor
   failed over, as defined by the Automatic Service Placement
   policy, to another member node, tinker. This assumes that the
   service is not restricted to run only on tailor. Use the asemgr
   menu selections to display the status of the ase4 service:

Status for NFS service ase4
Status:               on tinker
Relocate:             no
Placement Policy:     Favor Member(s)
Favored Member(s):    tinker

NFS service ase4 exports
#
# ASE exports file for service ase4
#
#loopback#fset1 exports (after this line) - DO NOT DELETE THIS LINE
/share/test/loopback -r=0 cluster_dev
File system(s):
AdvFS set: loopback#fset1 mount options: rw
AdvFS domain: loopback devices: /dev/vol/dg2/vol01
LSM Disk Group: dg2 devices: rz19c rz25c

Network Interface Test: Exercise

Perform a network interface test. How does the TruCluster
Software respond?

Network Interface Test: Solution

To test a network interface failure:

a. Use the tail utility on an unaffected ASE member to observe
   the entries in the daemon.log file.
b. Disconnect all configured ASE network connections from an
   ASE member node (for example, tailor) running a particular
   service (such as the ase4 service).
c. Confirm that services on the disconnected member node tailor
   failed over, as defined by the Automatic Service Placement
   policy, to another member node, tinker. This assumes that the
   service is not restricted to run on only tailor. Use the asemgr
   menu selections to display the status of the ase4 service:

Status for NFS service ase4
Status:               on tinker
Relocate:             no
Placement Policy:     Favor Member(s)
Favored Member(s):    tinker

NFS service ase4 exports
#
# ASE exports file for service ase4
#
#loopback#fset1 exports (after this line) - DO NOT DELETE THIS LINE
/share/test/loopback -r=0 cluster_dev
File system(s):
AdvFS set: loopback#fset1 mount options: rw
AdvFS domain: loopback devices: /dev/vol/dg2/vol01
LSM Disk Group: dg2 devices: rz19c rz25c

Recovering from Failures in the ASE: Exercise

What procedure would you use to remove a nonmirrored disk
from an ASE service?

Recovering from Failures in the ASE: Solution

To remove a nonmirrored disk from an ASE service:

a. Choose Modify a service from the Service Configuration item
   in the asemgr Managing ASE Services menu and delete the
   disk from the service.
b. Set the service off line.
c. Replace the disk.
d. Choose Modify a service from the Service Configuration item
   in the asemgr Managing ASE Services menu and add the new
   disk to the service.
e. Set the service back on line.

Performing Ongoing Maintenance Tasks: Exercise

Describe the commands used to start and stop all TruCluster
Software daemon activity.


Performing Ongoing Maintenance Tasks: Solution

To stop all TruCluster Software daemon activity, use the
following command:
# /sbin/init.d/asemember stop

To restart TruCluster Software daemon activity, use the
following command:
# /sbin/init.d/asemember start


10
Troubleshooting TruCluster Configurations

Troubleshooting TruCluster Configurations 10-1

About This Chapter


Introduction

This chapter describes ways to troubleshoot a TruCluster
Available Server configuration that is not functioning properly. It
focuses on three main topics:

Interpreting error messages

Learning troubleshooting procedures

Using system monitoring tools to diagnose problems

Used in combination, the information covered in these topics can
help you to isolate and resolve problems in TruCluster Available
Server configurations.

Objectives

To understand how to troubleshoot a TruCluster Available Server
configuration, you should be able to:

Interpret TruCluster Software error log messages and use that
information to assess problem causes

Describe the recommended procedures for troubleshooting
TruCluster Software configurations

Describe basic techniques for troubleshooting TruCluster
configurations

Use the system monitoring tools to gauge whether a
TruCluster Software implementation is functioning properly

Resources

For more information on the topics in this chapter, see the
following:

TruCluster Available Server Software Available Server
Environment Administration

TruCluster Available Server Software Hardware Configuration
and Software Installation

TruCluster Available Server Software Version 1.4 Release
Notes

Reference Pages


Introducing ASE Troubleshooting Techniques


Overview

TruCluster Available Server implementations involve the complex
interaction of hardware and software in customized solutions.
Because of the variability of possible TruCluster configurations, it
is difficult to generalize about malfunctions that may occur.
This chapter presents a general overview of the techniques and
tools available for TruCluster Available Server troubleshooting.

Troubleshooting Topics

The contents of the three main troubleshooting topics in this
chapter are as follows:

Interpreting error messages

The TruCluster Software keeps a record of significant events
that occur by generating error log messages. This section
describes how to interpret these messages.

Following troubleshooting procedures

It is important to identify and resolve failures in TruCluster
Available Server configurations with as little down time as
possible. This section describes procedures that promote
efficient troubleshooting of TruCluster problems.

Using system monitoring tools

System monitoring tools allow you to determine whether
your TruCluster Available Server configuration is functioning
properly. This section describes some of the most useful tools
for monitoring TruCluster configurations.


Troubleshooting Sequence

The key to successful troubleshooting is to quickly isolate the
source of the problem. To promote this goal, the information in
this chapter is presented in an order that supports the following
recommended troubleshooting strategy:

1. If possible, examine the error log messages to isolate
the component of your TruCluster configuration that is
malfunctioning.

2. Follow troubleshooting procedures to pinpoint the problem
cause.

3. Use system monitoring tools to diagnose the problem.

4. Formulate and apply a recovery plan.

Following this strategy can help you to find and correct a problem
with a minimum of extraneous effort.
Figure 10-1 Troubleshooting Strategy

[Flowchart: Examine Error Log and Alert Messages -> Follow
Troubleshooting Procedures -> Apply System Monitoring Tools ->
Formulate and Apply Recovery Plan]

Interpreting TruCluster Error Log Messages


Overview

Examining the TruCluster log messages is one of the best ways to
determine the cause of a problem in an ASE. This section reviews
how to access and interpret these messages.

TruCluster Logger Daemon

The TruCluster Logger daemon (aselogger) tracks the messages
generated by all the ASE member systems. The Logger daemon
uses the event logging facility, syslog, which collects messages
logged by various kernel, command, utility, and application
programs.

Messages sent to the daemon.log file can be generated by the
following software components:

AseMgr - The asemgr utility
Director - The Director daemon
Agent - The Agent daemon
AseLogger - The Logger daemon
HSM - The Host Status Monitor daemon
AseUtility - A process or daemon unrelated to TruCluster

Error messages generated by the Availability Manager (AM)
driver are sent to the kern.log file.

Log messages are logged to a local file or forwarded to a remote
system, as specified in the system's /etc/syslog.conf file. Log
messages generated by the HSM daemon and the AM driver are
logged only to the local system. If all Logger daemons in the ASE
stop, daemon messages continue to be logged, but only locally.

To find the TruCluster message logs, identify a member system
running the Logger daemon, then check its system log files.
Log files are located at /var/adm/syslog.dated/date, where date
is a date and time stamp subdirectory (such as 10-Feb-09:07).
The daemon.log files contain messages generated by the various
TruCluster Software daemons according to the severity level set
with the asemgr. Error messages generated by the Availability
Manager do not contain severity levels.
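The log-location rule above can be wrapped in a small helper. The following is a sketch (latest_ase_log is our own name, not a TruCluster command) that assumes the newest dated subdirectory is the one with the most recent modification time:

```shell
# Hypothetical helper: print the TruCluster entries from the daemon.log
# in the newest dated subdirectory. Pass an alternate log root (for
# example, a copied log tree) as the first argument.
latest_ase_log() {
    logroot=${1:-/var/adm/syslog.dated}
    # List subdirectories newest-first by modification time
    latest=$(ls -1t "$logroot" | head -1)
    grep 'ASE:' "$logroot/$latest/daemon.log"
}
```

Running this on a member hosting the Logger daemon prints only the TruCluster-tagged lines from the current day's daemon.log.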


Interpreting Log Messages

TruCluster log messages include the following information:

Time stamp

Local system name

ASE identifier (not used in messages from the Availability
Manager driver)

System that generated the message (or local)

Source of the message:

AseMgr - asemgr utility
Director - Director daemon
Agent - Agent daemon
HSM - HSM daemon
AseLogger - Logger daemon
AM - Availability Manager driver
vmunix - Kernel image
AseUtility - Command executed by an action script

Severity of the message (not provided in AM messages)

Message text

Figure 10-2 shows the location of the information in a sample
daemon.log message.

Figure 10-2 A daemon.log File Entry

Sep 27 15:38:20 tailor ASE: tailor Agent Notice: AseMgr on tailor disconnected

[Callouts, left to right: date and time stamp; name of the local
system; ASE identifier; member node on which the event originated;
daemon that generated the message; event severity level; message
text]
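Because each field occupies a fixed position, an entry can be split apart with standard text tools. The following sketch assumes the positional layout shown in the sample above (parse_ase_entry is our own helper name, not part of TruCluster):

```shell
# Split one daemon.log line into the fields called out in Figure 10-2.
# Assumed positional layout: month day time local-system ASE-id origin
# source severity, followed by the message text.
parse_ase_entry() {
    echo "$1" | awk '{
        printf "timestamp: %s %s %s\n", $1, $2, $3
        printf "local system: %s\n", $4
        printf "ase id: %s\n", $5
        printf "origin: %s\n", $6
        printf "source: %s\n", $7
        printf "severity: %s\n", $8
    }'
}

parse_ase_entry "Sep 27 15:38:20 tailor ASE: tailor Agent Notice: AseMgr on tailor disconnected"
```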


Alert Messages

An alert is triggered when the TruCluster Software detects a
circumstance requiring user intervention to maintain normal
operation of the ASE. Such events include disk access failure,
network access failure, and failure to read the TruCluster
configuration database.

When an alert condition occurs, the TruCluster Software prints
an Alert message in the log. It also writes the Alert message into
the file /var/ase/tmp/alertMsg and executes a user-defined script.
The example script shipped with TruCluster mails the error text
to a list of users. However, users can write their own scripts to
perform other tasks in response to an alert, such as dialing a
beeper number. Scripts can also be used to parse alert
text from the alertMsg file.
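A user-defined alert script along these lines might look as follows. This is a sketch, not the script shipped with TruCluster; the MAILER variable and the notify_alert name are our own additions, introduced so the mail command can be substituted for testing or replaced with a pager dialer:

```shell
# Hypothetical alert hook: mail the alert text to each recipient.
# MAILER defaults to mail(1); a site can substitute another command.
MAILER=${MAILER:-mail}

notify_alert() {
    alert_file=${1:-/var/ase/tmp/alertMsg}
    recipients=${2:-root}
    # Nothing to do if the alert file is absent or unreadable
    [ -r "$alert_file" ] || return 1
    for user in $recipients; do
        "$MAILER" "$user" < "$alert_file"
    done
}
```

Only the body of the loop would change for a different notification action, such as dialing a beeper number.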
Figure 10-3 depicts the two paths an Alert message takes when
an alert condition occurs.

Figure 10-3 Alert Message Paths

[Flowchart: an alert error condition goes to the TruCluster Logger
daemon, which writes it to the log files in
/var/adm/syslog.dated/date, and also triggers the alert script
actions]

Learning Troubleshooting Procedures


Diagnosing Active Systems

This section discusses procedures that can help you to diagnose
and fix TruCluster Available Server problems with a minimum of
wasted effort.

You should try to identify and resolve TruCluster Available Server
problems while the TruCluster Software is still up and running.
The TruCluster Available Server configuration can continue
providing ASE services (if possible), and you have access to data
about the ongoing status of the system.

As Figure 10-4 indicates, troubleshooting active TruCluster
Available Server configurations involves the following steps:

Check Error Logs - Start by examining the error log files.
Error log messages can often identify a problem cause and
suggest a solution.

Apply System Monitoring Tools - Use available system
monitoring tools to look for problems.
Utilities that can be helpful in troubleshooting TruCluster
Available Server configurations are discussed in the system
monitoring section of this chapter.

Reset Daemons - If all else fails, you can reset the TruCluster
Software daemons. This stops the Director, Logger, and Host
Status Monitor (HSM) daemons and initializes the Agent
daemons. The Agent daemons then restart all the daemons to
make the TruCluster Software fully operational.


Figure 10-4 Troubleshooting an Active TruCluster Configuration

[Flowchart: examine error log messages; use system monitoring
tools (asemgr for ASE member and service status; AdvFS and LSM
utilities to monitor disk services; file and scu show edt to check
SCSI devices; uerf to examine errors on the SCSI bus; netstat to
diagnose network problems); reset TruCluster daemons]


Diagnosing Nonactive Systems

If a problem cannot be resolved while the TruCluster Software is
up and running, you may need to shut down some or all of the
member systems to look for the problem cause. As Figure 10-5
indicates, you should use the following procedure:

1. Stop all TruCluster activity (procedures for stopping
TruCluster activity are described later in this section).

2. Shut down the systems (procedures for turning off member
systems are described later in this section).

3. Running from the console of each ASE member, check to see
that all devices are recognized. Determine that the basic
hardware setup, SCSI bus configuration, and network setup
are all correct.

4. Boot each machine, then make sure that each disk is
reachable by the operating system. Check to see that you can
ping each host member over the network. Use uerf to make
sure devices are numbered consistently. If you discover a
problem, you may need to reconfigure the kernel.

Note

If steps 3 and 4 reveal no problems, you have high
assurance that the basic setup is OK.

5. Do some data transfers to make sure that the system can
do reliable I/O. If you get parity errors, you probably have a
cabling problem.

6. Do some network transfers to make sure that the network is
up and running properly.

7. If these steps reveal no problems, but failures in the
TruCluster configuration persist, follow the procedures for
diagnosing active systems.


Figure 10-5 Troubleshooting Nonactive TruCluster Configurations

[Flowchart: stop all TruCluster activity -> shut down ASE member
systems -> use console commands to check devices -> boot member
systems and check network and SCSI bus (reconfigure the kernel if
necessary) -> try some data transfers -> follow procedures for
diagnosing active systems]

Using System Monitoring Tools


Overview

You can use various system monitoring tools to examine your
TruCluster Available Server configuration and gauge the status
of system components. This section identifies system monitoring
tools that can be helpful in diagnosing TruCluster problems. It
also describes procedures to apply these tools.

Table 10-1 lists some of the system monitoring utilities that can
be useful in troubleshooting TruCluster configurations.
Table 10-1 System Monitoring Utilities

Utility        Function
asemgr         Displays ASE member, service, and network status
ps             Displays daemon status
rpcinfo        Displays daemon status and port information
uerf           Searches CAM logs for SCSI bus problems
netstat        Diagnoses network problems
iostat         Determines disk status
file           Checks devices on the shared SCSI bus
scu show edt   Checks devices in the Equipment Device Table on
               the shared SCSI bus
showfdmn       Monitors AdvFS file domains
showfsets      Monitors AdvFS file sets
volprint       Monitors LSM disk services


Using asemgr to Monitor Member Status

You can use the asemgr utility to display the host status and the
status of the Agents for each member system in the ASE domain.
Table 10-2 describes possible values displayed in the Host Status
field.
Table 10-2 Host Status Values

Host Status     Description
UP              The member system is up and can be accessed by
                the member running the Director using the network
                and the SCSI bus.
DOWN            The member system cannot be accessed by the
                member running the Director using the network or
                the SCSI bus.
DISCONNECTED    The member system is disconnected from the
                network.
NETPAR          There is a network partition between the member
                system running the Director and the specified
                member.


Table 10-3 describes possible values displayed in the Agent Status
field.

Table 10-3 Agent Status Values

Agent Status    Description
RUNNING         The Agent is running on the member system.
DOWN            The Agent is not running on the member system.
INITIALIZING    The Agent running on the member system is in its
                initialization phase and will be running soon.
UNKNOWN         The Director cannot determine the state of the
                Agent on the member system.
INVALID         The Director reports an invalid state for the
                Agent on the member system.

Example 10-1 shows the asemgr member status display of a
two-member TruCluster configuration where one member is
disconnected.

Example 10-1 Member Status

Member:    Host Status:     Agent Status:
tailor     DISCONNECTED     UNKNOWN
tinker     UP               RUNNING


Using asemgr to Monitor Service Status

You can also use the asemgr to display the status of existing ASE
services. The service status includes the following information:

The type of service; either NFS, disk, or user-defined

The service name

The member on which the service is running, or OFFLINE if
the service is off line

The service's ASP policy

The disk configuration that the service uses

Example 10-2 shows the asemgr status display for a disk service
named nfsusers.

Example 10-2 Service Status

Status for NFS service nfsusers

Status:            on tinker
Relocate:          yes
Placement Policy:  Balance Services
Favored Member(s): None

Storage configuration for NFS service nfsusers
NFS Exports list
  /usr/nfsusers
Mount Table (device, mount point, type, options)
  users-domain#users /var/ase/mnt/nfsusers/usr/nfsusers \
    advfs rw,groupquota,userquota
Advfs Configuration
  Domain:     users-domain
  Volume(s):  /dev/vol/users-dg/users-vol
LSM Configuration
  Disk Group: users-dg
  Device(s):  rz17 rz33

Using asemgr to Monitor the Network Configuration

You can choose the Show the current configuration item from the
asemgr ASE Network Configuration menu to show the current
network status, as shown in Example 10-3.

Example 10-3 Network Configuration Status

ASE Network Configuration

Member Name   Interface Name   Member Net   Monitor
___________   ______________   __________   _______
tinker        tinker           Primary      Yes
tailor        tailor           Primary      Yes

Is this configuration correct (y|n)? [y]: y


Determining Host Adapter Settings

To determine host adapter settings, follow the instructions for
the particular host adapter component you are using in your
TruCluster configuration. There is no generic way to obtain
this information; each host adapter has its own set of command
requirements.

Using the uerf Utility to Monitor SCSI Bus Errors

You can use the uerf utility to inspect information about CAM
errors on the SCSI bus. In addition, when a system reboots, uerf
displays useful information about the bus configuration. Note
that the information displayed by the uerf utility varies according
to the host adapter being used.

To generate uerf information, log in as superuser and enter the
following command string:

# uerf -o full | more

Example 10-4 shows uerf output indicating a bus reset.
Example 10-4 uerf CAM Error Display

----- EVENT INFORMATION -----

EVENT CLASS                      ERROR EVENT
OS EVENT TYPE             199.   CAM SCSI
SEQUENCE NUMBER             2.
OPERATING SYSTEM                 DEC OSF/1
OCCURRED/LOGGED ON               Mon Apr 17 18:10:09 1995
OCCURRED ON SYSTEM               tinker
SYSTEM ID            x00020004   CPU TYPE: DEC 3000
SYSTYPE              x00000000

----- UNIT INFORMATION -----

CLASS                    x0022   DEC SIM
SUBSYSTEM                x0000   DISK
BUS #                    x0003   LUN x0
                         x00C8   TARGET x1

----- CAM STRING -----

ROUTINE NAME             ss_device_reset_done

Bus device reset has been performed

If the uerf output displays a repeated pattern of messages
that has no obvious explanation, this often indicates a
termination problem. It can also be useful to look for errors
registered by different targets (devices) on the same bus; this
indicates that there is a problem with the specified bus.
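One quick way to spot such repeated patterns is to count duplicate lines in captured uerf output. The following is a sketch using standard tools (repeated_messages is our own helper name):

```shell
# Read captured log or uerf text on stdin and print the most frequently
# repeated lines first, so a message storm stands out at the top.
repeated_messages() {
    sort | uniq -c | sort -rn | head -5
}
```

For example, `uerf -o full | repeated_messages` surfaces the most common lines in the error log, which is often enough to spot a termination-style message storm.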


Monitoring Daemons

Use the ps command to determine where the TruCluster daemons
are running. The following command string allows you to isolate
TruCluster-related process information:

# ps -ax | grep ase

This command generates the following information display.
Example 10-5 Using ps to Determine Daemon Status

tailor.zko.dec.com> ps -ax | grep ase
  PID TTY  S        TIME COMMAND
  265 ??   I <    0:01.02 /usr/sbin/aselogger
  267 ??   I <    0:02.00 /usr/sbin/aseagent -b
  358 ??   S <   23:09.48 asehsm
11491 ??   I <    0:00.40 asedirector

In addition, you can use rpcinfo -p to display a list of the
daemons running and the ports the daemons are using.
Example 10-6 Using rpcinfo to Display Daemon and Port Information

tailor.zko.dec.com> rpcinfo -p | grep ase
   program vers proto  port
    395179    1   tcp  1023  aselogger
    395177    1   tcp  1021  asehsm
    395176    1   tcp  1022  aseagent
    395175    1   tcp  1017  asedirector
You can also use rpcinfo to determine if applications are
registered and running. For more information, see the
rpcinfo(8nfs) man page.
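The rpcinfo output can also be checked mechanically. This sketch (missing_ase_daemons is our own name, not a TruCluster command) reads `rpcinfo -p | grep ase` output on stdin and reports any of the four TruCluster daemons that are not registered:

```shell
# Report TruCluster daemons absent from rpcinfo output read on stdin.
missing_ase_daemons() {
    found=$(cat)
    for d in aseagent asedirector aselogger asehsm; do
        echo "$found" | grep -q "$d" || echo "missing: $d"
    done
}
```

A silent result means all four daemons are registered; any "missing:" line points at the daemon to investigate.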


Monitoring the Network

If a problem exists with a service using an IP "pseudo-address",
you can go to the server on which the service is running and use
netstat to make sure it is configured. For example, the netstat -i
command generates the following information about service nfs2:

Name Mtu  Network  Addr Ipkts Ierrs Opkts Oerrs Coll
nfs2 1500 10.30.50 none 32221     0 12843     6 6989
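The error columns in that display can be extracted for a given interface. A sketch (iface_errors is our own helper; the column positions match the netstat -i header shown above):

```shell
# Read `netstat -i` output on stdin and print "Ierrs Oerrs" for one
# interface: columns 6 and 8 per the header shown above.
iface_errors() {
    awk -v ifname="$1" '$1 == ifname { print $6, $8 }'
}
```

For example, `netstat -i | iface_errors nfs2` prints the input and output error counts for the service's pseudo-interface.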
You can also use the scu command to monitor devices listed
in the Equipment Device Table (edt). Example 10-7 displays
information about devices at bus 4, lun 0 generated by the scu
show edt command.
Example 10-7 scu show edt Display

# scu
scu> show edt bus 4 lun 0 full
Inquiry Information:
SCSI Bus ID: 4
SCSI Target ID: 1
SCSI Target LUN: 0
Peripheral Device Type: 0 (Direct Access)
Peripheral Qualifier: 0 (Peripheral Device
Connected)
Device Type Modifier: 0
Removable Media: No
ANSI Version: 2 (Complies to ANSI
X3.131-1994, SCSI-2)
ECMA Version: 0
ISO Version: 0
Response Data Format: 2 (SCSI-2)
Terminate I/O Process: 0
Asynchronous Notification: 0
Additional Length: 91
Soft Reset Support: No
Command Queuing Support: Yes
Target Transfer Disable: Yes
Linked Command Support: No
Synchronous Data Transfers: Yes
Support for 16 Bit Transfers: No
Support for 32 Bit Transfers: No
Relative Addressing Support: No
Vendor Identification: DEC
Product Identification: RZ28
(C) DEC
Firmware Revision Level: 442D


Monitoring Disk I/O with iostat

You can use iostat to determine if you are able to read and write
to a specific disk. The following output was generated by the
iostat rz24 command:

tailor.zko.dec.com> iostat rz24
      tty        rz1       rz2      rz24          cpu
 tin tout   bps tps   bps tps   bps tps   us ni sy id
   0    5     1   0     0   0     0   0   23  0  3 74

For more information about iostat, see the iostat(1) man page.

Monitoring LSM Configurations

Use the volprint command to generate information about an LSM
configuration. Example 10-8 shows the information generated by
the volprint command.

Example 10-8 volprint Display

# /usr/sbin/volprint
TYPE  NAME    ASSOC   KSTATE  LENGTH  COMMENT
dg    rootdg  rootdg  -
dm    rz3a    rz3a    -       130544

You can also use the dxlsm utility to display the current status of
an LSM configuration. For more information, refer to the section
on LSM System Administration.

Monitoring AdvFS Configurations

To monitor an AdvFS configuration, use the showfdmn and showfsets
commands. Example 10-9 shows the output of the showfdmn
command.

Example 10-9 showfdmn Display

tinker.zko.dec.com> showfdmn dbfdmn1

               Id   Date Created              LogPgs  Domain Name
2f620925.00038f30   Sat Mar 11 15:33:41 1995     512  dbfdmn1

 Vol  512-Blks    Free  %Used  Cmode  Rblks  Wblks  Vol Name
  1L    102400   93760     8%     on    128    128  /dev/vol/dbdg/vol2
   2    102400  101456     1%     on    128    128  /dev/vol/dbdg/vol3
      --------  ------  -----
        204800  195216     5%

Example 10-10 shows the output of the showfsets command.


Example 10-10 showfsets Display

tailor.zko.dec.com> /sbin/showfsets dbfdmn1
dbfs1
    Id           : 2f620925.00038f30.1.8001
    Files        :  2, SLim= 0, HLim= 0
    Blocks (512) : 32, SLim= 0, HLim= 0
    Quota Status : user=on group=on
dbfs2
    Id           : 2f620925.00038f30.2.8001
    Files        :  2, SLim= 0, HLim= 0
    Blocks (512) : 32, SLim= 0, HLim= 0
    Quota Status : user=on group=on

You can also use the dxadvfs command to display the current
status of an AdvFS configuration.


Summary

Introducing ASE Troubleshooting Techniques

Troubleshooting TruCluster Software configurations involves
three main activities:

Interpreting error messages

Learning troubleshooting procedures

Using system monitoring tools to diagnose problems

Use the following strategy to troubleshoot an ASE configuration:

Examine error log and alert messages

Apply system monitoring tools

Check common problem scenarios

Review configuration guidelines

Interpreting TruCluster Software Error Log Messages

The TruCluster Logger daemon (aselogger) tracks the messages
generated by all the ASE member systems.

Log files are located at /var/adm/syslog.dated/date, where date is
a date and time stamp subdirectory (such as 23-Sep-09:07).
TruCluster Software messages include the following information:

Time stamp

Local system name

ASE identifier (not used in messages from the Availability
Manager driver)

System that generated the message (or local)

Source of the message:

AseMgr - asemgr utility
Director - Director daemon
Agent - Agent daemon
HSM - HSM daemon
AseLogger - Logger daemon
AM - Availability Manager driver
vmunix - Kernel image
AseUtility - Command executed by an action script

Severity of the message (not provided in AM messages)

Message text

When an alert condition occurs, the TruCluster Software prints
an Alert message in the log. It also writes the Alert message into
the file /var/ase/tmp/alertMsg and executes a user-defined script.


Learning Troubleshooting Procedures

Troubleshooting active TruCluster implementations involves the
following steps:

Check Error Logs - First, examine the error log files. Error
log messages can often identify a problem cause and suggest a
solution.

Apply System Monitoring Tools - Use available system
monitoring tools to look for problems.

Reset Daemons - If all else fails, you can reset the TruCluster
Software daemons. This stops the Director, Logger, and Host
Status Monitor (HSM) daemons and initializes the Agent
daemons. The Agent daemons then restart all the daemons to
make the TruCluster Software fully operational.

Use the following procedure to diagnose inoperative TruCluster
Software implementations:

1. Stop all TruCluster activity.

2. Shut down the systems.

3. Running from the console of each ASE member, check to see
that all devices are recognized. Determine that the basic
hardware setup, SCSI bus configuration, and network setup
are all correct.

4. Boot each machine, then make sure that each disk is
reachable by the operating system. Check to see that you can
ping each host member over the network. Use uerf to make
sure devices are numbered consistently. If you discover a
problem, you may need to reconfigure the kernel.

Note

If steps 3 and 4 reveal no problems, you have high
assurance that the basic setup is OK.

5. Do some data transfers to make sure that the system can
do reliable I/O. If you get parity errors, you probably have a
cabling problem.

6. Do some network transfers to make sure that the network is
up and running properly.


Using System Monitoring Tools

The following system monitoring tools can be helpful in
diagnosing problems with ASE configurations:

asemgr for viewing ASE member and service status

uerf for monitoring the SCSI bus

ps and rpcinfo for determining daemon status

netstat for monitoring the network

iostat for monitoring disk I/O

volprint and dxlsm for monitoring LSM configurations

showfdmn and dxadvfs for monitoring AdvFS configurations

Exercises

Introducing ASE Troubleshooting Techniques: Exercise

Describe the recommended troubleshooting strategy for isolating
problems in a TruCluster Software configuration.

Introducing ASE Troubleshooting Techniques: Solution

The following troubleshooting strategy is recommended:

1. If possible, examine the error log messages to isolate
the component of your TruCluster configuration that is
malfunctioning.

2. Follow troubleshooting procedures to pinpoint the problem
cause.

3. Use system monitoring tools to diagnose the problem.

4. Formulate and apply a recovery plan.

Interpreting TruCluster Software Error Log Messages: Exercise

Describe the format of a TruCluster error log message.

Interpreting TruCluster Software Error Log Messages: Solution

TruCluster Software error messages include the following
information:

Time stamp

Local system name

ASE identifier (not used in messages from the Availability
Manager driver)

System that generated the message (or local)

Source of the message

Severity of the message (not provided in AM messages)

Message text


Diagnosing a Nonactive System: Exercise

Describe the procedure for diagnosing a nonactive TruCluster
configuration.

Diagnosing a Nonactive System: Solution

Use the following procedure to diagnose a nonactive system:

1. Stop all TruCluster activity (procedures for stopping
TruCluster activity are described later in this section).

2. Shut down the systems (procedures for turning off member
systems are described later in this section).

3. Running from the console of each ASE member, check to see
that all devices are recognized. Determine that the basic
hardware setup, SCSI bus configuration, and network setup
are all correct.

4. Boot each machine, then make sure that each disk is
reachable by the operating system. Check to see that you can
ping each host member over the network. Use uerf to make
sure devices are numbered consistently. If you discover a
problem, you may need to reconfigure the kernel.

5. Do some data transfers to make sure that the system can
do reliable I/O. If you get parity errors, you probably have a
cabling problem.

6. Do some network transfers to make sure that the network is
up and running properly.

7. If these steps reveal no problems, but failures in the
TruCluster configuration persist, follow the procedures for
diagnosing active systems.

Generating CAM Error Information: Exercise

Describe the command sequence you use to generate CAM error
information about activity on the SCSI bus.

Generating CAM Error Information: Solution

Use the following command sequence to generate CAM error
information about activity on the SCSI bus:

# uerf -o full | more


Monitoring TruCluster Daemons: Exercise

Describe the utilities you can use to monitor the TruCluster
Software daemons.

Monitoring TruCluster Daemons: Solution

You can use the ps utility and the rpcinfo utility to monitor
the TruCluster Software daemons. The command syntax for
determining the TruCluster daemon status with the ps utility is
as follows:

# ps -ax | grep ase

The command syntax for determining the TruCluster daemon
status with the rpcinfo utility is as follows:

# /usr/sbin/rpcinfo -p | grep ase


11
Resolving Common TruCluster Problems

Resolving Common TruCluster Problems 11-1

About This Chapter


Introduction

This chapter describes ways to diagnose and recover from
frequently reported problems in TruCluster Available Server
configurations. The information is divided into two main topics:

Recognizing and solving common problems

Applying TruCluster configuration guidelines

Objectives

To understand how to troubleshoot a TruCluster Available Server
configuration, you should be able to:

Describe some of the most common problems found in
TruCluster Software configurations, and steps you can take to
remedy them

Apply TruCluster Software configuration guidelines to
determine if an implementation is properly set up

Resources

For more information on the topics in this chapter, see the
following:

TruCluster Available Server Software Available Server
Environment Administration

TruCluster Available Server Software Hardware Configuration
and Software Installation

TruCluster Available Server Software Version 1.4 Release
Notes

Reference Pages

Recognizing Common Problems and Their Symptoms


Overview

This section identifies some of the more common problems that
can arise in TruCluster Available Server implementations.
TruCluster problems can be grouped into two main categories:

Hardware problems

Hardware problems usually involve either faulty hardware
configuration or failure of a hardware component. Table 11-1
lists common hardware problems and their symptoms.

Software problems

Software problems usually involve faulty configuration or
failure to comply with TruCluster requirements. Table 11-2
lists common software problems and their symptoms.

Problems can also arise due to limitations of the hardware and
software used in a TruCluster configuration. Table 11-3 lists
problems related to these limitations. Note that the limitations
may not apply to future versions.

After each table, discussions of the listed problems contain the
following information:

Problem description

Symptoms

Message patterns

Possible solutions

If a malfunction in your TruCluster Available Server configuration
corresponds to one of the listed entries, you can use this
information to diagnose and recover from the problem.


Table 11-1 lists frequently reported hardware problems.

Table 11-1 Frequently Reported Hardware Problems

Problem                     Symptom
Improperly configured       Faulty I/O
SCSI bus                    Cannot ping member over SCSI bus
                            Cannot reach devices
                            SCSI CAM errors in uerf
Host adapter failure        Cannot ping member over SCSI bus
Member crash                Cannot ping member over network and
                            SCSI bus
Disk failure                Device unreachable
Improperly configured       Frequent SCSI bus resets
storage device              I/O timeouts
                            Device unreachable
Network interface failure   Cannot ping member over network
Network partition           Cannot ping member over network
                            Cannot access network services


Improperly Configured SCSI Bus

Problem Description

SCSI bus difficulties are among the most frequently
encountered problems in TruCluster Available Server
configurations. Common causes include:

Faulty bus connections

Improperly terminated bus segments

Cable lengths too long

Incorrect bus configuration

Note that an improperly configured bus may operate properly
for a period of time with no error conditions, but then cause
problems when under heavy load conditions.

Symptoms

Common symptoms of SCSI bus problems include:

TruCluster cannot ping devices over the SCSI bus

I/O errors and device failures

Services taken off line or undefined

Message Patterns

The following daemon.log excerpt contains TruCluster
messages sent during a SCSI bus failure:

Apr 17 18:54:16 tinker ASE: local HSM Warning: Can't ping tailor over the SCSI bus
Apr 17 18:54:17 tinker ASE: local HSM ***ALERT: network ping to host tailor
is working but SCSI ping is not

Possible Solutions
Review the SCSI bus configuration guidelines to ensure
that all requirements are met.
Check that the SCSI bus segments are properly
connected and terminated.
Determine that the cable lengths are within the prescribed
limits.
From a member console, use system monitoring
tools to ensure that the bus setup is correct.
Note

If many otherwise unrelated failures occur on a single
bus (for example, failures on disks belonging to different
services), it is a strong indication that there is a problem
with that bus.
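The note above can be checked mechanically. Under the classic Digital UNIX rz naming convention, the unit number encodes the bus (unit = bus x 8 + target), so /dev/rz24g and /dev/rz25g both sit on bus 3. The sketch below groups device-failure alerts by bus against sample log lines; it is not a supported tool, and it assumes the rz naming applies to your configuration.

```shell
# Sample alert lines; on a real member, read the live daemon.log instead.
cat > ./daemon.log.sample <<'EOF'
Apr 28 12:18:51 tailor ASE: tinker Agent ***ALERT: possible device failure: /dev/rz24g
Apr 28 12:18:58 tailor ASE: tinker Agent ***ALERT: possible device failure: /dev/rz25g
EOF

# Group "possible device failure" alerts by SCSI bus, assuming the
# Digital UNIX convention unit = bus*8 + target (rz24 -> bus 3).
awk '/possible device failure/ {
        dev = $NF
        sub(".*rz", "", dev)       # /dev/rz24g -> 24g
        sub("[a-h]$", "", dev)     # 24g -> 24 (strip partition letter)
        printf "bus %d: %s\n", int(dev / 8), $NF
     }' ./daemon.log.sample
```

Many distinct devices reported on the same bus number points at the bus itself rather than any one disk.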


Host Adapter Failure

Problem Description
A host adapter failure isolates the affected ASE member from
the SCSI bus(es) to which the adapter is connected. However,
a host adapter failure does not cause the TruCluster network
to stop functioning.

Symptoms
Cannot ping affected member over SCSI bus
Services and/or the TruCluster Director fail over
If a service is running on the affected member under the Restrict to
Favored Member policy, the service goes offline

Message Patterns
The following daemon.log message is generated when a host
adapter failure occurs.
Apr 17 14:05:44 tailor ASE: local HSM Warning:
Can't ping tinker over the SCSI bus

Possible Solutions
If you suspect that a host adapter has failed, first check the
cable(s) connecting the adapter to the SCSI bus. If this does
not resolve the problem, disconnect the affected member from
the TruCluster configuration. If the TruCluster configuration
now functions properly, examine the suspect host adapter in
the following manner:
Ensure that the proper firmware is installed.
Confirm that the host adapter is recognized by the system.
If an adapter has been removed and reinstalled, make sure
it has been placed in the proper slot.
If the configuration is correct and the host adapter still
does not work properly, replace the host adapter.
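Confirming that the system recognized the adapter can be scripted. The boot-message strings below only mimic the general shape of Digital UNIX probe output (actual driver names vary by adapter model); on a live member you would search the boot messages file or the uerf output rather than a sample file.

```shell
# Hypothetical boot messages; on a real member, grep /var/adm/messages
# (or the uerf report) for the adapter's probe line instead.
cat > ./messages.sample <<'EOF'
vmunix: psiop0 at pci0 slot 6
vmunix: scsi0 at psiop0 slot 0
vmunix: rz24 at scsi3 target 0 lun 0 (LID=4)
EOF

# An adapter the kernel never probed cannot reach the shared bus.
if grep -q 'scsi0 at' ./messages.sample; then
    echo "adapter scsi0 probed at boot"
else
    echo "adapter scsi0 missing -- check slot, firmware, and cabling"
fi
```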


Member Crash

Problem Description
A member system crash removes the affected member
from the TruCluster configuration, eliminating any failover
capability or performance enhancement provided by that
member. However, if the TruCluster system is properly
configured, a member system crash does not cause TruCluster
to fail.

Symptoms
Common symptoms of a member system crash include:
Cannot ping member over SCSI bus or network
Services and/or TruCluster Director fail over
If a service is running on the crashed member under the Restrict to
Favored Member policy, the service goes offline
uerf error reports indicate SCSI bus resets

Message Patterns
The following daemon.log excerpt shows the message pattern
that occurs when member tinker fails and the TruCluster
Director and service ds1 fail over to member tailor.

Apr 17 18:09:58 tailor ASE: local HSM Warning: Can't ping tinker
over the SCSI bus
Apr 17 18:10:05 tailor ASE: local HSM Warning: Can't ping tinker
over the network
Apr 17 18:10:05 tailor ASE: local HSM Warning: member tinker is DOWN
Apr 17 18:10:05 tailor ASE: tailor Agent Notice: starting a new
director
Apr 17 18:10:07 tailor ASE: local Director ***ALERT: Member tinker
is not available
Apr 17 18:10:08 tailor ASE: tailor Agent Notice: starting service
ds1
Apr 17 18:10:27 tailor ASE: tailor Agent Notice:
/var/ase/sbin/ase_filesystem: /dev/rvol/dbdg/vol1: 4 files, 23857 used, 24974 free
(14 frags, 3120 blocks, 0.0% fragmentation)
Apr 17 18:10:27 tailor ASE: tailor Agent Notice:
/var/ase/sbin/ase_filesystem: /sbin/ufs_fsck -P /dev/rvol/dbdg/vol1
Apr 17 18:10:27 tailor ASE: tailor Director Notice: started service
ds1 on tailor

Possible Solutions
Potential causes of member crashes are too numerous
to discuss in this chapter. One TruCluster-specific
member crash, however, should be mentioned: a reboot
can occur when a service relocation is attempted while a user
is occupying a mount point. This problem is discussed at
greater length in the Known TruCluster Limitations section.


Storage Device Failure

Problem Description
When a failing disk is not mirrored, the TruCluster
Software:
Logs an Alert message
Stops the affected service
Marks the service as unassigned and issues an Alert
message
If the service is mirrored on a device that has not failed,
a storage device failure may not become immediately apparent.
However, the failure still causes an Alert message to be sent.

Symptoms
Disk or NFS service fails to start up
Device access failure messages
The following asemgr status display shows the status of a disk
service in which the storage device has failed.

Status for DISK service ds1

Status:            UNKNOWN
Relocate:          yes
Placement Policy:  Balance Services
Favored Member(s): None

disk service "ds1"
File system(s):
  UFS device: /dev/vol/dbdg/vol1    mount options: rw
  AdvFS set: dbfdmn1#dbfs2          mount options: rw
  AdvFS set: dbfdmn1#dbfs1          mount options: rw
  AdvFS domain: dbfdmn1  devices: /dev/vol/dbdg/vol2 /dev/vol/dbdg/vol3

Message Patterns
The following daemon.log error message pattern indicates a
storage device failure.

Apr 19 17:42:06 tailor ASE: tinker Agent ***ALERT: device access failure on
/dev/rz24g from tinker.zko.dec.com
Apr 19 17:42:13 tailor ASE: tinker Agent Warning: AM can't ping /dev/rz24g
Apr 19 17:42:13 tailor ASE: tinker Agent Warning: can't reach device /dev/rz24g
Apr 19 17:42:19 tailor ASE: tinker Agent Warning: AM can't ping /dev/rz25g
Apr 19 17:42:19 tailor ASE: tinker Agent Warning: can't reach device /dev/rz25g

Possible Solutions
Check the storage devices for proper connections and
configuration.
Determine the location of the failed disk and follow the
disk replacement procedures described in the section on
TruCluster failure recovery.


Network Interface Failure

Problem Description
A network interface error occurs when an individual member
becomes isolated from the primary TruCluster network.
Although this problem may cause services to fail over from the
isolated member, the other members of a properly configured
TruCluster system should continue to function.

Symptoms
The following asemgr display occurs during a network interface
failure on member tinker.
Member Status

Member:    Host Status:    Agent Status:
tinker     DISCONNECTED    UNKNOWN
tailor     UP              RUNNING

If you try to examine the network status by choosing the Show
the current configuration item from the ASE Network Modify
menu, the TruCluster Software issues the message: Net
partition or disconnect - cannot find a director.

Message Patterns
The following daemon.log error message pattern was recorded
when member tinker experienced a network interface failure.

Apr 17 19:02:54 tailor ASE: local HSM Warning: Can't ping tinker
over the network
Apr 17 19:02:56 tailor ASE: local HSM Warning: Can't ping tinker
over the network
Apr 17 19:04:39 tailor last message repeated 2 times
Apr 17 19:04:43 tailor ASE: local HSM Warning: member tinker is
disconnected from the network
Apr 17 19:04:44 tailor ASE: tailor Agent ***ALERT: member tinker
cut off from net
Apr 17 19:04:45 tailor ASE: tailor Director Notice: finished
processing agent state change from HSM: agent tinker state NIT_DOWN

Possible Solutions
Check the cable connections from the affected ASE member
system to the network.
Use netstat to diagnose the problem on the affected
system.
If necessary, replace the network controller or cable
connections.
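A saved netstat -i snapshot can be screened for climbing error counters, which point at the controller or cabling. The column layout below is an assumption modeled on Digital UNIX output; verify the Ierrs/Oerrs column positions on your system before relying on the field numbers.

```shell
# Sample "netstat -i" snapshot; on a member, capture the real output.
cat > ./netstat.sample <<'EOF'
Name  Mtu   Network     Address       Ipkts   Ierrs  Opkts   Oerrs
ln0   1500  16.140.112  tinker        971233  1842   801223  7
lo0   4096  loop        localhost     10233   0      10233   0
EOF

# Report any interface with nonzero input or output error counts
# (fields 6 and 8 in this column layout).
awk 'NR > 1 && ($6 > 0 || $8 > 0) {
        printf "%s: %d input / %d output errors\n", $1, $6, $8
     }' ./netstat.sample
```

A steadily rising error count on the ASE interface, with the loopback clean, is a strong hint to replace the controller or cable.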


Network Partition

Problem Description
A network partition occurs when all ASE member systems
become isolated from the local subnet.

Symptoms
The following symptoms occur during a network partition:
Cannot ping member over network
Cannot access network services
If you try to examine the network status by choosing the Show
the current configuration item from the ASE Network Modify
menu, the TruCluster Software issues the message: Net
partition or disconnect - cannot find a director.

Message Patterns
The following is an example of the daemon.log message pattern
recorded during a network partition on node tailor, the
member on which the TruCluster Director was running.

Apr 17 19:23:15 tailor ASE: local HSM Warning: Network interface DOWN
Apr 17 19:23:15 tailor ASE: tailor Director ***ALERT: Network connection
down... exiting
Apr 17 19:23:16 tailor ASE: tailor Director Warning: Director exiting...
Apr 17 19:23:20 tailor ASE: local HSM Warning: Can't ping tinker over the
network
Apr 17 19:23:26 tailor ASE: local HSM Warning: Can't ping tinker over the
network

The next example shows the daemon.log message pattern
recorded during a network partition on node tinker (on which
the TruCluster Director was not running).

Apr 17 19:25:56 tinker ASE: local HSM Warning: Can't ping tailor over
the network
Apr 17 19:25:58 tinker ASE: local HSM Warning: Can't ping tailor over
the network
Apr 17 19:26:00 tinker ASE: local HSM Warning: network partition detected
between local host and member tailor
Apr 17 19:26:01 tinker ASE: tinker Agent ***ALERT: network is partitioned
between local host and tailor

The following kern.log excerpt contains a TruCluster message
sent during a network partition.

Apr 17 19:23:09 tailor vmunix: ln0: lost carrier: check connector


Apr 17 19:23:39 tailor last message repeated 17 times
Apr 17 19:24:25 tailor last message repeated 21 times

Possible Solutions
Use the following procedure to troubleshoot a network
partition:
Check to see that all network cable connections are
properly attached.
Use netstat to troubleshoot the problem.
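The kern.log evidence can be checked alongside daemon.log: repeated lost-carrier reports implicate the cable or transceiver rather than the software. The sample file below reproduces the kern.log excerpt from this section so the filter can be run anywhere; on a member you would grep the live kernel log.

```shell
# Sample lines reproducing the kern.log excerpt above; substitute the
# live kernel log path on a real member.
cat > ./kern.log.sample <<'EOF'
Apr 17 19:23:09 tailor vmunix: ln0: lost carrier: check connector
Apr 17 19:23:39 tailor last message repeated 17 times
Apr 17 19:24:25 tailor last message repeated 21 times
EOF

# Carrier loss plus "last message repeated" bursts = physical-layer fault.
grep -E 'lost carrier|last message repeated' ./kern.log.sample
```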


Table 11-2 lists frequently reported software problems.


Table 11-2 Frequently Reported Software Problems

Problem                                      Symptom
---------------------------------------------------------------------------
Invalid script format                        Script fails

Multiple asemgr processes                    asemgr locks

Removing a disk without updating asemgr      System crashes

NFS service and ASE member with same name    Agent daemon fails to initialize

Service alias not in /etc/hosts on all       Cannot add service
members

ASEROUTING not set in NFS service            NFS service cannot be added

ASE member not added to ASE database         New member cannot connect to
                                             ASE services

LSM not configured on new member             Disk service relocation fails

Invalid Script Format

Problem Description
An action script that contains an error will make it impossible
to start or stop the associated service.

Symptoms
The following output is generated by the asemgr when an
invalid script is added to a service.

Example 11-1 Display when asemgr Cannot Modify Service

NOTE: Modifying a service causes it to stop and then restart. If you do


not want to interrupt the service availability, do not modify the service.
Enter y to modify service ds1 (y/n): y
Stopping service...
Deleting service...
Adding service...
Starting service...
Start failed - Unable to start service.
Check syslog's daemon log to determine the error.
This service uses either AdvFS or LSM in its storage configuration.
You must select a member on which to leave the storage configured.
1) tinker
2) tailor
x) Exit to Service Configuration

?) Help

Enter your choice [1]: x


Enter o to restore the old service configuration, n to retry the new service
configuration, or d to delete the service [n]:

Message Patterns
The following log message pattern is generated when an
invalid script is added to a service.

Apr 20 10:38:03 tinker ASE: tailor Agent Notice: starting service ds1
Apr 20 10:38:12 tinker ASE: tailor Agent Notice: /var/ase/sbin/ase_filesystem: /dev/rvol/dbdg/vol1: File system unmounted cleanly - no fsck needed
Apr 20 10:38:12 tinker ASE: tailor Agent Notice: /var/ase/sbin/ase_filesystem: /sbin/ufs_fsck -P /dev/rvol/dbdg/vol1
Apr 20 10:38:12 tinker ASE: tailor Agent Error: user script: /tmp/ase_sh5164[20]: B: not found
Apr 20 10:38:16 tinker ASE: tailor AseMgr Error: Start failed - Unable to start service.

Possible Solutions
To solve a problem with an invalid action script, fix the
script and test it by executing it outside of the TruCluster
configuration. As a general rule, test all scripts this way
before adding them to an ASE service.
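Testing an action script outside ASE can be as simple as invoking it by hand and checking its exit status. The script below is a hypothetical skeleton, not a template mandated by the product; the point is that ASE passes an argument such as start or stop and judges success by the exit status, so both paths and the failure case should be exercised before the script is handed to asemgr.

```shell
# Hypothetical action-script skeleton; real scripts would mount file
# systems, start daemons, and so on.
cat > ./ds1_start.sh <<'EOF'
#!/bin/sh
case "$1" in
    start) echo "mounting ds1 filesystems" ;;
    stop)  echo "unmounting ds1 filesystems" ;;
    *)     echo "usage: $0 start|stop" >&2; exit 1 ;;
esac
EOF
chmod +x ./ds1_start.sh

# Exercise both entry points by hand and confirm a zero exit status.
./ds1_start.sh start && echo "start exit status OK"
./ds1_start.sh stop  && echo "stop exit status OK"
```

A syntax error like the "B: not found" in the log above would surface here as a shell diagnostic and a nonzero exit, before it can take a service down.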


Multiple asemgr Processes

Problem Description
If you run multiple asemgr processes, the TruCluster database
becomes locked, and you cannot perform any modications on
the system.

Symptoms
The asemgr exhibits the following display if you try to access
the database when multiple asemgr processes are running.

ASE is locked by tailor.zko.dec.com


ASE is locked - you have the option of forcing ASE to reset
Enter y to force ASE to reset (y/n): n
ASE is locked - you have the option of forcing ASE to reset
Enter y to force ASE to reset (y/n): y

Message Patterns
The following message pattern is displayed in the daemon.log
file when you try to access the TruCluster database while
multiple asemgr processes are running.

Apr 17 17:05:48 tinker ASE: tinker Director Notice: DB lock is in
use by an ASEmgr on tailor
Apr 17 17:05:48 tinker ASE: tinker AseMgr Error: ASE is locked by
tailor.zko.dec.com
Apr 17 17:05:50 tinker ASE: tinker AseMgr Error: Unable to freeup
ASE database.
Apr 17 17:05:50 tinker ASE: unknown client Notice: restarting Agent!
Apr 17 17:05:50 tinker ASE: tinker Agent Notice: restarting Agent!

Possible Solutions
To resolve this problem, stop the multiple asemgr processes
until only one is running.
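Locating the stray processes is straightforward. This sketch uses the portable ps -e -o form and filters in awk so the filter itself cannot match (as a plain "grep asemgr" would); on a healthy configuration the list should come back with exactly one asemgr per active administrator session.

```shell
# List every asemgr process by PID so extras can be stopped.
# Filtering in awk avoids the classic "grep matches itself" pitfall.
ps -e -o pid= -o comm= | awk '$2 == "asemgr" { print "asemgr pid " $1 }'
echo "leave exactly one asemgr running; stop any others by PID"
```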


Removing a Disk Without Updating asemgr

Problem Description
If you remove a disk from an ASE service and forget to update
the asemgr to redefine the service, the service will fail.

Symptoms
If the service is running when you remove the disk, the system
running the service will crash when you try to access a mount
point. However, if the service is mirrored, it may fail over to
another member.
If the service is not running when you remove the disk, the
service becomes unreachable.

Message Patterns
The following daemon.log message pattern is generated when
you access a mount point on a disk that has been removed.

Apr 28 12:18:47 tailor ASE: tinker Agent Error: can't unreserve device
Apr 28 12:18:50 tailor ASE: tinker Agent Warning: AM can't ping /dev/rz24g
Apr 28 12:18:50 tailor ASE: tinker Agent Warning: can't reach device /dev/rz24g
Apr 28 12:18:51 tailor ASE: tinker Agent ***ALERT: possible device failure: /dev/rz24g
Apr 28 12:18:51 tailor ASE: tinker Agent Error: can't unreserve device /dev/rz24g
Apr 28 12:18:54 tailor ASE: tinker Agent Error: can't unreserve device
Apr 28 12:18:57 tailor ASE: tinker Agent Warning: AM can't ping /dev/rz24g
Apr 28 12:18:57 tailor ASE: tinker Agent Warning: can't reach device /dev/rz24g
Apr 28 12:18:58 tailor ASE: tinker Agent ***ALERT: possible device failure: /dev/rz24g
Apr 28 12:18:58 tailor ASE: tinker Agent Error: can't unreserve device /dev/rz24g
Apr 28 12:19:15 tailor ASE: tinker Agent Notice: can't unreserve ds1's
devices, stopping it anyway
Apr 28 12:19:25 tailor ASE: local HSM Warning: Can't ping tinker over the network

Possible Solutions
If possible, set a service offline before you replace a
disk used by the service. In any case, you must ensure that no
users are occupying the mount point or otherwise attempting
to access the disk while it is being replaced.
After you replace the disk and turn on the storage device,
rereserve the device (if necessary) and set the service back
online.


NFS Service and ASE Member with Same Name

Problem Description
NFS services require the configuration of a pseudo host that is
associated with an Internet address so that the service can fail
over. If the service name is the same as that of one of the ASE
members, the Agent daemon process for that host name
becomes confused and fails to initialize properly.

Symptoms
If this problem arises, the asemgr host status for the affected
host will be UP, while the Agent status will be DOWN.
However, the Agent daemon will be running on the host.

Message Patterns
The following message is recorded when a system with ASE
member name tinker has an NFS service name tinker.
Apr 22 11:40:38 tinker ASE: tinker AseMgr Error:
AseMgr failed to initialize

Possible Solutions
Reconfigure the NFS service with a unique name.


Service Alias not in /etc/hosts on All Members

Problem Description
ASE services require that you associate the service name with
an IP address by placing an entry in the /etc/hosts file on all
ASE member systems. If you do not do this, you will not be
able to configure the service.

Symptoms
If this problem arises, the attempt to add the service will fail.

Message Patterns
The following message pattern is recorded when the service
nfsusers is added, but the service and IP address are not
properly entered in the /etc/hosts files on all the member
systems:
Sep 20 16:24:25 tailor ASE: tinker Agent Notice: adding
service nfsusers
Sep 20 16:24:26 tailor ASE: tailor Agent Notice: adding
service nfsusers
Sep 20 16:24:28 tailor ASE: tinker Agent Error:
/var/ase/sbin/nfs_ifconfig: nfsusers not in hosts
database
Sep 20 16:24:29 tailor ASE: tailor Agent Error:
/var/ase/sbin/nfs_ifconfig: nfsusers not in hosts
database
Sep 20 16:24:30 tailor ASE: tinker Agent Error:
/var/ase/sbin/nfs_ifconfig: nfsusers not in hosts
database
Sep 20 16:24:31 tailor ASE: tailor Agent Error:
/var/ase/sbin/nfs_ifconfig: nfsusers not in hosts
database
Sep 20 16:24:31 tailor ASE: tinker Director Error:
can't add service
Sep 20 16:24:31 tailor ASE: tailor AseMgr Error:
Add failed - Unable to add service.

Possible Solutions
Add the proper service alias and IP address to the /etc/hosts
file on all member systems.
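The check can be automated before you run asemgr. The loop below stands in for each member's /etc/hosts with a local sample file (the member names and the nfsusers alias are the examples from this section, and the addresses are invented); in practice you would run the grep on each member itself.

```shell
# Confirm the service alias resolves on every member before adding the
# service. Sample hosts files stand in for each member's /etc/hosts;
# all addresses here are illustrative.
ALIAS=nfsusers
cat > ./hosts.tinker <<'EOF'
127.0.0.1     localhost
16.140.112.4  tinker
16.140.112.9  nfsusers
EOF
cat > ./hosts.tailor <<'EOF'
127.0.0.1     localhost
16.140.112.5  tailor
EOF

for member in tinker tailor; do
    if grep -qw "$ALIAS" "./hosts.$member"; then
        echo "$member: $ALIAS present"
    else
        echo "$member: $ALIAS MISSING -- add it before running asemgr"
    fi
done
```

In this sample, tailor's file is missing the alias, which is exactly the state that produces the "not in hosts database" errors above.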


ASEROUTING not Set in NFS Service

Problem Description
As described in Available Server Environment Administration,
it is possible to configure an NFS service to broadcast host
names on networks that are not native to the ASE NFS
service name. If you set up this configuration but fail to
set ASEROUTING=yes in the rc.config file on all ASE member
systems, the NFS service cannot be added to the configuration
database.

Symptoms
The asemgr issues a message that the NFS service cannot be
added.

Message Patterns
The following message is recorded when ASEROUTING is not
set in an NFS service.
tailor ASE: tailor Agent Error:
Must be IP router and run gated to use
ASE routing; run netsetup.

Possible Solutions
Run netsetup again, following the instructions in the Available
Server Environment Administration for setting up TruCluster
routing. After you have completed the netsetup process, make
sure you run the following command on all ASE member
systems:
# rcmgr set ASEROUTING yes
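You can verify the result of the rcmgr command by inspecting rc.config on each member. The VAR="value" layout shown is an assumption about the file's format, and a local sample file is used so the check runs anywhere; on a member you would read the real rc.config.

```shell
# Sample rc.config fragment showing what "rcmgr set ASEROUTING yes"
# is expected to leave behind; substitute the real file on a member.
cat > ./rc.config.sample <<'EOF'
ASEROUTING="yes"
export ASEROUTING
EOF

if grep -q '^ASEROUTING="yes"' ./rc.config.sample; then
    echo "ASEROUTING enabled"
else
    echo "ASEROUTING not set -- run: rcmgr set ASEROUTING yes"
fi
```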


ASE Member not Added to TruCluster Database

Problem Description
If the TruCluster Software is installed on a member system,
but the member is not added to the TruCluster configuration
database on the original member system, the new system will
not be recognized as a legitimate member.

Symptoms
If the asemgr is run on one of the preexisting ASE members,
the new member will not be displayed in the member status
display. Since the new member is essentially a foreign system,
any requests by the new member to connect to ASE services
will fail.

Message Patterns
The following message pattern is recorded when system tinker
tries to access the TruCluster configuration without having
been added to the TruCluster database.

May 9 07:31:50 tailor ASE: tailor Agent ***ALERT: ***Possible security
breach attempt: connect request from non-member node tinker
May 9 07:31:50 tailor ASE: tailor Agent Notice: connection refused by
connect callback

Possible Solutions
Run the asemgr on the original member and add the new
member to the TruCluster configuration database.


LSM not Configured on New Member

Problem Description
For a disk service to fail over to a new member, LSM must be
set up on the new system, with rootdg configured on a local
disk.

Symptoms
If LSM has not been configured on a new member, and a
service using LSM attempts to relocate to the new system, the
relocation will fail.

Message Patterns
The following daemon.log message patterns are displayed if
LSM has not been configured on a system and a service using
LSM attempts to relocate.

May 9 10:27:06 tailor ASE: tinker Agent Error: /var/ase/sbin/lsm_dg_action:
voldg: Volume daemon is not accessible
May 9 10:27:07 tailor ASE: tinker Agent Error: /var/ase/sbin/lsm_dg_action:
voldg deport of disk group dbdg failed
May 9 10:27:07 tailor ASE: tinker Agent Error: /var/ase/sbin/lsm_dg_action:
forcing a deport of disk group
May 9 10:27:07 tailor ASE: tinker Agent Error: /var/ase/sbin/lsm_dg_action:
voldg: Volume daemon is not accessible
May 9 10:27:07 tailor ASE: tinker Agent Error: /var/ase/sbin/lsm_dg_action:
voldg deport of disk group dbdg failed
May 9 10:27:07 tailor ASE: tinker Agent Error: /var/ase/sbin/lsm_dg_action:
force deport of dbdg failed

Possible Solutions
Use volsetup to configure the rootdg disk group on the new
system, then restart the vold daemon.

Known TruCluster Limitations

Table 11-3 lists known limitations of TruCluster Available
Server Software Version 1.4. Note that these limitations may be
addressed in future versions.

Table 11-3 Known Limitations of TruCluster Available Server Software Version 1.4

Problem                                    Symptom
---------------------------------------------------------------------------
Users occupying mount points               System reboots when service is
                                           relocated

Non-TruCluster processes with higher       ASE services time out
priority

BC09D cables used with KZTSA               Errors on SCSI bus
controller

Users Occupying Mount Points

If you try to relocate a service while users are occupying a mount
point, the system on which the service is running will shut
down.

Problem Description
To stop a disk-based service, the TruCluster Software must
be able to unmount the file systems. This means that the
TruCluster Software must be able to stop all processes
accessing the mounted file systems. You should ensure that all
processes invoked by the start action script are stopped by the
stop action script. To keep users from occupying the local mount
point (and preventing unmounting), recommend that they
access only the directory that is exported.

Symptoms
The system shuts down when you try to relocate a service.

Message Patterns

Mar 1 10:05:05 member1 ASE: member1 Agent Notice: stopping service
service1
Mar 1 10:05:37 member1 ASE: member1 Agent Error: /var/ase/sbin/ase_mount_action: /share/test/service1: Device busy
Mar 1 10:05:38 member1 last message repeated 9 times
Mar 1 10:05:38 member1 ASE: member1 Agent Error: /var/ase/sbin/ase_mount_action: Unable to umount /share/test/service1
Mar 1 10:05:39 member1 ASE: member1 Agent Notice: /var/ase/sbin/lsm_dg_action: voldisk: Device rz16c: Device is in use
Mar 1 10:05:39 member1 ASE: member1 Agent Notice: /var/ase/sbin/lsm_dg_action: voldisk: Device rz19c: Device is in use
Mar 1 10:05:39 member1 ASE: member1 Agent Notice: /var/ase/sbin/lsm_dg_action: voldisk: Device rz25c: Device is in use
Mar 1 10:05:39 member1 ASE: member1 Agent Notice: /var/ase/sbin/lsm_dg_action: voldg: Disk group dg2: import failed: Disk group exists and is imported
Mar 1 10:05:41 member1 ASE: member1 Director Error: can't stop service
Mar 1 10:05:41 member1 ASE: member1 AseMgr Error: Stop failed - Unable to stop service.
Mar 1 10:05:41 member1 ASE: member1 AseMgr Error: Unable to stop service service1 - Relocation not successful.
Mar 1 10:05:41 member1 ASE: member1 AseMgr Error: Unable to relocate service1 to member2.

Possible Solutions
Avoid accessing the mount point on systems running ASE
services.
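Before a relocation, it is worth confirming that nothing holds the mount point open. This sketch uses fuser, which reports the processes using a file system; /tmp stands in for the real service mount point (for example, /share/test/service1 from the log above), and the -u flag appends the owning user so stray sessions are easy to identify.

```shell
# Substitute the service mount point, e.g. /share/test/service1.
MNT=/tmp

# fuser exits zero (and prints PIDs) when processes hold the mount point;
# any holder would block the unmount and make the "Device busy" errors
# above appear during relocation.
if fuser -u "$MNT" 2>/dev/null; then
    echo "processes still hold $MNT -- stop them before relocating"
else
    echo "$MNT reports no holders; safer to relocate the service"
fi
```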


Non-TruCluster Processes with Higher Priority

Problem Description
If there are non-TruCluster processes with a scheduling
priority higher than the priority of the TruCluster daemons,
the daemons could time out while waiting to run. If this
occurs, a TruCluster timeout error appears in the daemon.log
file.

Symptoms
If the TruCluster Software daemons do not start even though
the TruCluster Software is properly configured, you may have
a timeout problem.

Message Patterns
The following message pattern indicates a timeout error due
to a scheduling priority problem.
Mar 8 13:09:28 surry ASE: surry AseMgr error:
ASE timeout - Unable to stop service.

Possible Solutions
If necessary, you can raise the scheduling priority of the
TruCluster daemons by changing the lines in the /sbin
/init.d/asemember file that start the asedirector, aseagent,
and aselogger daemons, or by fixing a higher priority with the
aseagent -p command. For more information about scheduling
priorities, see Available Server Environment Administration.
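A quick way to see whether application processes outrank the daemons is to sort a ps listing by priority. The pri and comm format specifiers are the portable ps -o names; confirm how your system numbers priorities (higher-is-better versus lower-is-better) before acting on the ordering.

```shell
# Show the highest-priority processes; if busy application processes sit
# above aseagent/asedirector/aselogger here, the daemons can time out.
ps -e -o pri= -o comm= | sort -rn | head -5
echo "confirm the ASE daemons are not starved below application processes"
```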


Using BC09D Cables with KZTSA Controller

Problem Description
BC09D narrow cables are built to an earlier SCSI specification,
and they are conditionally supported in TruCluster
configurations; the newer BN family of cables is preferred. If
BC09D cables are used in an ASE configuration, they should
be limited to slow-speed operation, with a maximum length of
3 meters.
If you use BC09D cables with a KZTSA controller, you may
get errors on the SCSI bus.

Symptoms
If you have a problem with a BC09D cable, common symptoms
include irregular I/O and reports of CAM errors.

Message Patterns
Possible daemon.log message patterns associated with
problems with a BC09D cable include errors accessing devices
on the shared SCSI bus, such as the following:
May 02 12:42:06 tailor ASE:
tinker Agent ***ALERT: device access failure on
/dev/rz24g from tinker.zko.dec.com

Possible Solutions
Replace the BC09D cable(s) with equivalent cables from the
BNxx family of cables.

Applying TruCluster Configuration Guidelines


TruCluster Configuration Guidelines

This section contains a series of checklists you can use to help
determine whether a TruCluster Available Server implementation
is properly configured. The information is divided into the
following sections:

General Hardware Configuration
SCSI Bus
Termination
Host Adapters
Disk Storage Enclosures
Signal Converters
Tri-link Connectors
Network Connections
TruCluster Software Installation
General Software Configuration
TruCluster Configuration
Service Configuration
Disk Services
LSM Configuration
AdvFS Configuration
Action Scripts

For specific information about configuring TruCluster components,
see the appropriate section in the TruCluster Available Server
Software documentation.


General Hardware Configuration

Use the following checklist to determine if your basic hardware
configuration complies with TruCluster requirements:¹

TruCluster Available Server configurations can have from
two to four member systems (if your system has a PMAZC or
KZMSA host adapter, the upper limit is three).

All member systems and disk storage boxes must be connected
by means of a shared SCSI bus.

Only eight devices (SCSI controllers, disks, or RAID
controllers) are allowed on each shared bus.

Devices must be connected in a way that enables you to
disconnect them from the bus without affecting the bus
termination.

All members in your ASE must be configured to be on at least
one common network subnet.

A member system may not boot properly if it does not have a
graphics head attached. If a member does not have a graphics
head, you must set the SERVER console variable to ON.

¹These checklists assume that your configuration uses only hardware and
software supported for TruCluster Available Server Software.


SCSI Bus

Use this checklist to determine whether your SCSI bus
configuration complies with TruCluster requirements.

Configurations can use no more than 8 SCSI IDs (0-7) on each
shared SCSI bus (devices that require SCSI IDs include host
adapters and storage devices, but not signal converters).

Member systems must see all devices on a given SCSI bus at
the same SCSI ID.

If you are using a dual-ported controller, the shared bus must
be on the same channel.

If narrow and wide devices are used, a signal converter must
be placed between them.

Tape devices are not permitted on the shared SCSI bus.

The length of the shared SCSI bus must be within the
TruCluster limits:
Sub-buses cannot exceed 3 meters for single-ended fast, 6
meters for single-ended slow, and 25 meters for differential.
Bus length calculations must include the internal bus
lengths of devices.
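The cable-length budget above can be checked with simple arithmetic. The segment lengths below are illustrative only; the point of the sketch is that internal device paths count toward the same single-ended limit as the external cables.

```shell
# Rough bus-length budget for a single-ended fast bus (3 m limit).
# Lengths are in decimeters to stay in integer shell arithmetic; the
# four segments (two cables, two internal device paths) are examples.
LIMIT_DM=30
total=0
for seg_dm in 10 9 5 4; do
    total=$((total + seg_dm))
done
echo "total: ${total} dm (limit ${LIMIT_DM} dm)"
[ "$total" -le "$LIMIT_DM" ] && echo "within limit" || echo "OVER LIMIT"
```

With these example segments the bus totals 2.8 m, just inside the 3 m fast single-ended limit; adding one more meter of cable would push it over.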

All the SCSI controllers on a bus do not have to be set to the
same bus speed. However, using fast SCSI bus speed on any
SCSI controller connected to a bus decreases the total amount
of single-ended cable that you can use for that bus from 6
meters to 3 meters. If one or more SCSI controllers are set
to fast SCSI bus speed, your shared bus must adhere to this
cable length restriction.

Termination

Use this checklist to determine whether your SCSI bus is properly
terminated:

An improperly terminated bus segment will cause problems.
The bus may operate properly for a period of time with no
error conditions, but cause problems under heavy load
conditions.

Check that the shared SCSI buses are properly
terminated at both ends (this includes sub-buses and bus
segments).

Make sure that terminators that should be removed are not in
place.

Check that all movable resistors are installed correctly
(they must not be installed backwards).


Host Adapters

Use this checklist to determine whether the host adapters in your
configuration comply with TruCluster requirements.

The correct firmware must be installed in all SCSI host
adapters.

SCSI IDs and mode settings must be correct for each host
adapter.

Depending on your configuration, you may have to remove the
internal termination for a SCSI controller port.

The SCSI host adapters must be installed in logically
equivalent I/O bus slots in each system. When the kernel
boots, the SCSI bus number is determined by the order in
which the SCSI controllers are installed in SCSI bus slots,
starting with the first slot.

Unused ports on dual-ported host adapters must be
terminated.

Disk Storage Enclosures

Use this checklist to determine whether the storage devices in
your configuration comply with TruCluster requirements.

Check that all systems on a given shared SCSI bus see
the disks at the same device name.

If you alter the logical configuration of a bus, you must allow
for the fact that previously configured disk device numbers
may have changed.

Be careful when performing maintenance on any device
located on the shared bus because of the constant activity
on the bus. In general, to perform maintenance on a device
without shutting down the ASE, you must isolate the device
from the shared bus without affecting bus termination.

To take a disk that is on the shared bus offline, you must
ensure that no service is using the disk, unless the disk is part
of an LSM-mirrored logical volume or a mirrored RAID device.

If you disconnect a storage box from the SCSI bus, TruCluster
behavior is unpredictable, unless one of the following
conditions is met:
The disks are part of a mirrored Logical Storage Manager
(LSM) volume.
The disks are contained in a RAID set.

When an offline disk goes online again, you must use the
asemgr utility to manually restart the service. You should do
this even if the disk was part of a mirrored volume because it
may not be reserved.


Signal Converters

Use this checklist to determine whether the signal converters in your configuration comply with TruCluster requirements.

If the SCSI host adapter on an ASE member system is connected to a DWZZA, be sure to follow the proper sequence when turning the system on or off:
Turn on the system and allow it to complete startup diagnostics before you turn on any DWZZA attached to the system.
Before turning off a system, you must turn off all DWZZAs connected to the system.

If you take off the DWZZA-AA cover to remove termination, ensure that star washers are in place on all four screws that hold the cover in place when you replace the cover. (If the star washers are missing, the DWZZA-AA is susceptible to problems caused by excessive noise.)

Tri-link Connectors

Tri-link connectors must adhere to the following restrictions:

If you connect a cable to a tri-link connector, do not block access to the screws that mount the tri-link, or you will be unable to disconnect the tri-link from a device.

If a device to which a tri-link connector is connected is at the end of a shared bus, you can attach a terminator to one of the tri-link's connectors to terminate the bus.

Network Connections

Use this checklist to determine whether the network in your configuration complies with TruCluster requirements:

Systems must be on the same network subnet.

Member system names must correspond to the name returned by the /sbin/hostname command.

A DEMFA network controller cannot be the network controller associated with the host name of a member system.

If the primary network connected to the systems becomes saturated, TruCluster operation is impaired. If you receive messages indicating that you are out of mbufs, you can:
Use a dedicated network as the primary network for the member systems.
Adjust the ubcmaxpercent and ubcminpercent configuration file parameters. Refer to the Digital UNIX manual System Tuning and Performance Management for more information.


TruCluster Software Installation

Use this checklist to determine that the software installation of your TruCluster Available Server implementation meets TruCluster requirements.

The TruCluster Software subset can be installed only on systems running the supported operating system version.

Each member system should have at least 64 MB of memory.

You must install all required operating system subsets.

The TruCluster Software subset and license must be installed on each member system.

If you install the PAK after you install the TruCluster Software, you must reboot the system after installing the PAK.

If you use AdvFS, LSM, NFS, or RIS, additional subsets are required:
For AdvFS: POLYCENTER AdvFS, Utilities, and GUI
For LSM: Logical Storage Manager Advanced Utilities and GUI
For NFS: NFS Utilities (client and server)
For RIS: Remote Installation Service (RIS) subset
Use setld -i to determine which subsets are installed on your system.

Check the TruCluster Available Server Software release notes and cover letter to see if there are operating system patches for TruCluster.
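The subset check above can be sketched as a quick shell filter. The `setld -i` output below is a hypothetical sample (real output lists every subset known to the system); on a live member you would pipe the command itself, as the comment shows.

```shell
# On a real member system you would run:  setld -i | grep installed
# The sample below stands in for that output; subset names are taken
# from this course guide, and the "installed" column is illustrative.
sample_output='OSFCLINET405  installed  Basic Networking Services
OSFPGMR405    installed  Standard Programmer Commands
TCRCMS140                Cluster Monitor Station'

# List only the subsets marked as installed.
printf '%s\n' "$sample_output" | grep installed | awk '{print $1}'
```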


General Software Configuration

Use this checklist to determine that the general software configuration for your TruCluster implementation meets TruCluster requirements.

You must set up the local network on each member system (see netsetup(8)).

Set up NIS or BIND if you intend to use them for network name resolution on your network (see bindsetup(8) and ypsetup(8)).

If you intend to use NFS services, you should set up NFS and start the daemons (see nfssetup(8)).

Set up MAIL so that root can receive alert messages from TruCluster (see mailsetup(8)).

You must set up a distributed time service such as the Network Time Protocol (NTP) daemon (xntpd). TruCluster relies on synchronized time on all member systems; it needs accurate timestamps for database versions.

The host names and IP addresses of each member system must be included in each member system's /etc/hosts file.
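As a sketch of this last requirement, the fragment below builds a sample hosts file and checks that every member name appears in it. The member names (tinker, tailor, borrowed from the exercises later in this chapter) and the addresses are illustrative only; substitute your own members.

```shell
# Hypothetical /etc/hosts entries for two ASE members; on a real
# system these lines must appear in /etc/hosts on EVERY member.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
16.140.64.11   tinker
16.140.64.12   tailor
EOF

# Quick consistency check: every member must be listed.
for member in tinker tailor; do
    grep -qw "$member" "$hosts_file" && echo "$member: present"
done
```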


TruCluster Software Configuration

Use this checklist to determine that your TruCluster Software configuration meets TruCluster requirements.

When you use the asemgr utility to add the member systems to the ASE, add all the member systems at the same time and from the same system. Do not run the asemgr utility on one system and add one member system, then run the asemgr utility on another system and add a different member system.

When adding a new member to an existing TruCluster configuration, do not run asemgr on the new member, as this will create a new ASE domain and a new database.

To change the name of a member system, you must delete the member system from the ASE, initialize the system, and then add the new member system to the ASE.

You cannot delete a member if it is included in the list of members that are favored to run the service, according to the service's Automatic Service Placement (ASP) policy. You must use the asemgr utility to change the service's ASP before deleting the member. This restriction does not apply if the service allows you to change the only favored member to the one specified when you manually relocate a service. In this situation, use the asemgr utility to relocate the service before deleting the member.

You cannot delete the member running the asemgr utility. If there is only one member system in the ASE, you cannot delete that member using asemgr. Use setld -d to delete the TruCluster Software subset from the last member system.


Service Configuration

Use this checklist to determine that the service configuration for your TruCluster implementation meets TruCluster requirements.

The maximum number of services TruCluster can handle is 256.

Only one member system at a time can run a given service.

You cannot use an NFS service name that is the same as the name of a member system. You cannot use a service name that has a slash (/) in it.

Only certain types of applications can be made available with an ASE service. The application must:
Run on only one system at a time.
Be able to be started and stopped using a set of commands performed in a specific order.
When you set up a service, these commands are included in the action scripts for the service.

The balanced service policy attempts to balance the service at the time a new service is started. It does not relocate services to continue balancing the service load.

TruCluster Software clients refer to service names rather than server names. For instance, to access an NFS service nfs_service, a client will have a line such as the following in its /etc/fstab file:
/project@nfs_service /usr/project nfs rw,bg 0 0
The client must also have an entry in its /etc/hosts file for nfs_service with an Internet address. This is a floating address aliased to the member system currently running the service.

When adding a service, be sure to configure the service so that it can run on all member systems.
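A minimal sketch of the client-side view: the fstab entry names the service rather than a member system, and that service name must resolve through /etc/hosts. The address shown is a hypothetical floating address; only the fstab line itself comes from the text above.

```shell
# Client-side fragments for the NFS service example above.  The
# 16.140.64.20 address is a made-up floating address for illustration.
hosts_fragment='16.140.64.20   nfs_service'
fstab_fragment='/project@nfs_service /usr/project nfs rw,bg 0 0'

# The mount source names the service, not a member system:
service=$(printf '%s\n' "$fstab_fragment" | sed 's|.*@\([^ ]*\) .*|\1|')
echo "service referenced by fstab: $service"

# And that service name must resolve via the client's /etc/hosts:
printf '%s\n' "$hosts_fragment" | grep -qw "$service" && echo "resolvable"
```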


Disk Services

Use this checklist to determine that the disk services in your TruCluster Available Server implementation meet TruCluster requirements.

NFS service names and member names must have addresses on the same IP subnet, and must be in all members' /etc/hosts files.

Do not manually edit a service's /etc/exports.ase file; you must use the asemgr utility to edit it.

To use UFS with TruCluster, set up the disks in the usual way with disklabel and newfs. Do not locally mount the file systems because TruCluster mounts them for you when the service is started.

A disk cannot be used in more than one service because a service must have exclusive access to the disk. When you use a disk in a service, you use the entire disk.

Applications in user-defined services cannot use disks. If your application is disk based, set up a disk service instead.
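The UFS preparation sequence can be sketched as a dry run. The device name rz10c is hypothetical, and the exact disklabel arguments depend on your disk type, so each command is echoed for review rather than executed.

```shell
# Dry-run sketch of preparing a UFS disk for an ASE service.  "run"
# only echoes each command; remove the wrapper to execute for real.
run() { echo "would run: $*"; }

disk=rz10c   # hypothetical shared-bus disk
run disklabel -rw $disk    # write a disk label (arguments vary by disk type)
run newfs /dev/r$disk      # create the UFS file system
# Do NOT mount it locally; TruCluster mounts it when the service starts.
```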


LSM Configuration

Use this checklist to determine that the LSM configuration in your TruCluster Available Server implementation meets TruCluster requirements.

When using LSM in a TruCluster Available Server configuration, all member systems need LSM software so that any of them can run the service.

All member systems need the rootdg disk group set up on a local (nonshared) disk. The rootdg disk group must be active (imported) whenever TruCluster is active, to provide an active disk group for LSM. Set up another disk group using the shared disks.

You must set up a service's LSM disk groups and volumes on the same member on which you will be running the asemgr utility to set up the service.

All LSM disk group names in the TruCluster configuration must be unique.

When adding a service that uses LSM or modifying a service's LSM configuration, the disk groups used in the service must be imported to the machine on which you are running the asemgr utility. LSM configuration changes can be made only on an imported disk group.

A disk or disk group can be used in only one service, but a service can use more than one disk or disk group.

Rereserving an LSM device allows you to replace a newly synchronized part of an LSM mirrored volume without stopping the service.

If a service uses LSM mirrored volumes, do not modify the service while a mirrored volume is resynchronizing because the resynchronization will abort and then restart. The abort will not corrupt the volume, but it will delay the volume resynchronization.


AdvFS Configuration

Use this checklist to determine that the AdvFS configuration in your TruCluster Available Server implementation meets TruCluster requirements.

When using AdvFS in a TruCluster configuration, all member systems must have AdvFS software installed.

To use AdvFS with TruCluster, set up the domains and filesets on the same member on which you will run asemgr to add the service.

If you create a disk service that uses AdvFS and choose not to have the TruCluster Software automatically mount the filesets, a member system may panic unless the following conditions are met:
Before you add the disk service, make sure that the fileset is not already mounted.
If you mount a fileset in your own user-defined action scripts, make sure that the user-defined stop action script unmounts the file system and returns an error if the unmount fails.

To modify an AdvFS configuration that a service uses, the disks must be configured on the system on which you make the modifications.

AdvFS domain names must be unique on all the member systems.

A service can use more than one AdvFS domain, but a domain cannot be used by more than one service.

A service should control all the filesets in the domain; do not put one fileset in a service and mount another locally.

Do not locally mount the filesets because TruCluster mounts them for you when the service is started.

Do not use quotas on a UFS file system or enable quotas on an AdvFS fileset that you want to fail over.


Action Scripts

Use this checklist to determine that the action scripts in your TruCluster Available Server implementation meet TruCluster requirements.

When you add a disk service, you are not prompted for action scripts. To fail over an application, modify the service and specify the action scripts.

If you create your own user-defined action scripts, you must install them locally on each member system.

If you specify a pathname for a script when prompted by the asemgr utility, you can edit the script only by using the asemgr. This is because the TruCluster Software uses the copy of the script that is in the TruCluster database and not the one located on the system.

Ensure that only processes started and stopped by the service's action scripts can access the disks used in a service. Make sure that all the processes invoked by the start action script are stopped by the stop action script. The actions in the start script must be reversed by the actions in the stop script.

If you must allow users access to the local mount point, ensure that the service's stop action script is able to stop these processes.

The TruCluster Software does not report messages generated by applications running within an ASE service. However, TruCluster action scripts capture any output from the commands that they execute. If the action script fails, the command output is logged in the daemon.log file.
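A skeletal start/stop action script might look like the following. This is a sketch, not the exact ASE script interface: the echo statements stand in for the real commands that start and stop your application, and the stop branch must undo the start branch in reverse order, as the checklist above requires.

```shell
# Minimal start/stop action-script sketch for a user-defined service.
# The messages are placeholders for real commands (launch daemons,
# mount volumes, add an interface alias, and so on).
action_script() {
    case "$1" in
        start)
            echo "starting service: launch application, add alias"
            ;;
        stop)
            # Reverse of the start branch, in reverse order.
            echo "stopping service: remove alias, kill application"
            ;;
        *)
            echo "usage: action_script {start|stop}" >&2
            return 1
            ;;
    esac
}

action_script start
action_script stop
```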


Summary

Recognizing Common Problems

You should be familiar with the common TruCluster hardware and software problems that have been described in this chapter. In addition, you should be aware of the TruCluster limitations which cause problem situations.

Applying TruCluster Configuration Guidelines

Use the TruCluster configuration guideline checklists to help determine whether your TruCluster Software configuration is properly set up. Checklists are provided for the following topics:

General Hardware Configuration

SCSI Bus

Termination

Host Adapters

Disk Storage Enclosures

Signal Converters

Tri-Link Connectors

Network Connections

TruCluster Software Installation

General Software Configuration

ASE Configuration

Service Configuration

Disk Services

LSM Configuration

AdvFS Configuration

Action Scripts


Exercises

TruCluster Message Interpretation: Exercise

Describe the common problem that generates the following TruCluster error message:

May 9 07:31:50 tailor ASE: tailor Agent ***ALERT:
***Possible security breach attempt:
connect request from non-member node tinker
May 9 07:31:50 tailor ASE: tailor Agent Notice:
connection refused by connect callback

TruCluster Message Interpretation: Solution

A common cause of this message is a situation where ASE has been installed on a member system, but the member has not been added to the ASE configuration database on the original member system. The new system is not recognized as a legitimate member.

Problem Relocating Service: Exercise

If you try to relocate a disk service and get a device busy message, what is the likely cause?

Problem Relocating Service: Solution

The message may indicate that the TruCluster Software could not unmount a disk, possibly because it could not stop all the processes accessing the disk.

The TruCluster Software cannot stop a service that uses mounted file systems, filesets, or volumes unless it can unmount them. TruCluster may not be able to unmount a disk in the following situations:

A process that accesses the disk was started by the service's start action script, but was not stopped by the service's stop action script.

A process that the start action script did not start and that is unrelated to TruCluster is accessing the disk. This could occur if a user logs in to the system on which the file system is locally mounted and changes directory to the mount point.

SCSI ID Limits: Exercise

Describe the limitations that TruCluster places on the use of SCSI IDs.

SCSI ID Limits: Solution

TruCluster Software configurations can use no more than 8 SCSI IDs (0-7) on each shared SCSI bus (devices that require SCSI IDs include host adapters and storage devices, but not signal converters).


Applying TruCluster Configuration Guidelines: Exercise

Describe important service configuration guidelines.

Applying TruCluster Configuration Guidelines: Solution

Service configuration guidelines are as follows:

The maximum number of services TruCluster can handle is 256.

Only one member system at a time can run a given service.

You cannot use an NFS service name that is the same as the name of a member system. You cannot use a service name that has a slash (/) in it.

Only certain types of applications can be made available with an ASE service. The application must:
Run on only one system at a time.
Be able to be started and stopped using a set of commands performed in a specific order.
When you set up a service, these commands are included in the action scripts for the service.

The balanced service policy attempts to balance the service at the time a new service is started. It does not relocate services to continue balancing the service load.

TruCluster Software clients refer to service names rather than server names. For instance, to access an NFS service nfs_service, a client will have a line such as the following in its /etc/fstab file:
/project@nfs_service /usr/project nfs rw,bg 0 0
The client must also have an entry in its /etc/hosts file for nfs_service with an Internet address. This is a floating address aliased to the member system currently running the service.

When adding a service, be sure to configure the service so that it can run on all member systems.


12
Test

Test 12-1

Questions

In the space provided, write the letter corresponding to the best answer to each multiple-choice, matching, or true/false question.

1. What is not a feature of the TruCluster Software product?
a. Concurrently Active Servers
b. Network Failover
c. Distributed Lock Manager
d. Transparent NFS Failover

2. The following is a hardware requirement for TruCluster Software.
a. A shared SCSI bus
b. External disks in an expansion box
c. Ethernet or FDDI network
d. All of the above

3. Identify the Available Server component that controls Available Server operations on one member system.
a. aseagent
b. asedirector
c. asehsm
d. asemgr

4. The Available Server component that controls the ASE and coordinates the ASE activities.
a. aseagent
b. asedirector
c. asehsm
d. asemgr

5. The following characteristic is not required for an application to be suitable for an ASE service:
a. The application must be able to be stopped using a set of commands issued in a specific order
b. The application must read and write data from an NFS file system
c. The application must be able to be started using a set of commands issued in a specific order
d. The application must run on only one system at a time


6. When you make initial plans for an Available Server implementation, you must consider:
a. Services to be made available
b. Survivable failures
c. Do services require custom scripts?
d. All of the above

7. Which feature is not provided by the TruCluster Software?
a. Failover of services
b. Decoupling of host name and service name
c. Restarting failed applications
d. Determining status of ASE members

8. What provides the user interface to the ASE software?
a. asedirector
b. asemgr
c. Cluster Monitor
d. Availability Manager driver

9. A failure condition that does not result in a service relocation:
a. Network Interface Failure
b. Host Down Scenario
c. Device Failure
d. Network Partition

10. You must use a DWZZA signal converter with a KZMSA because:
a. The KZMSA has only one channel
b. The KZMSA uses the differential mode of signal transmission
c. You cannot remove the KZMSA internal terminators
d. The KZMSA operates on only a wide SCSI bus

11. You should use a DWZZA signal converter with a PMAZC SCSI host adapter because:
a. The DWZZA increases the maximum shared SCSI bus length
b. The PMAZC is a dual-ported SCSI host adapter
c. The PMAZC operates as either a fast or slow SCSI host adapter
d. Signal conversion is necessary to connect the PMAZC to a BA350 storage box


12. The maximum length of the shared SCSI bus in an Available Server configuration for Version 1.4 is:
a. 3 meters
b. 6 meters
c. 25 meters
d. 31 meters

13. Which cable do you attach to a single-ended device to enable you to disconnect the system without affecting SCSI bus termination?
a. BN21J
b. BN21H
c. BN21V-0B
d. BN21W-0B

14. You cannot mix single-ended and differential SCSI bus segments on a shared SCSI bus for ASE configurations.
a. True
b. False

15. What could you use in place of a BN21W-0B?
a. H8574-A
b. H8660-AA
c. H879-AA
d. H885-AA

16. You are configuring a shared SCSI bus on port A of a PMAZC. Which jumper do you remove to disable the PMAZC single-ended bus termination?
a. W1
b. W2
c. W3
d. W4


17. What is the most important thing to consider when configuring DEC 3000 Model 500 systems in a single-ended Available Server configuration with a BA350 without using a DWZZA?
a. Remove the PMAZC single-ended termination jumper for the port being used
b. Remove the flash memory write jumper
c. SCSI bus length is appropriate for the bus speed

18. Which console command sets the SCSI ID for a KZTSA?
a. t tc cnfg
b. t tc setid
c. t tc speed
d. t tc id

19. When using a KZMSA XMI to SCSI bus adapter in an Available Server configuration, you must use a DWZZA because you cannot remove the KZMSA single-ended bus termination.
a. True
b. False

20. The lfu utility modifies the SCSI ID or bus speed for which adapter?
a. KZMSA
b. KZPSA
c. KZTSA
d. PMAZC

21. Use the set console command to set the SCSI ID or bus speed for which adapter?
a. KZMSA
b. KZPSA
c. KZTSA
d. PMAZC


22. You have an Available Server configuration with two AlphaServer 2100 systems with KZPSA PCI to SCSI adapters, an HSZ40, a DWZZA-VA, and a BA350 containing four RZ28s. Why is the HSZ40 likely to be assigned SCSI ID zero (0)?
a. The DWZZA-VA is installed in BA350 slot 0
b. SCSI ID 0 is reserved for the HSZ40
c. The HSZ40 must have a higher priority than the RZ28s
d. SCSI ID 0 is reserved for the HSZ40
e. None of the above

23. Before installing the TruCluster Software:
a. Read the release notes
b. Verify system prerequisites
c. Install, set up, and test hardware
d. All of the above

24. Which subsets are not required for TruCluster Available Server Software Version 1.4?
a. OSFCLINET405
b. OSFPGMR405
c. OSFCMPLRS405
d. None of the above; all are required

25. To use the Cluster Monitor, you must install which subsets?
a. CXLSHRDA405
b. OSFCDEMIN405
c. TCRCMS140
d. All of the above

26. A rolling upgrade allows you to upgrade ASE member systems without shutting down the ASE.
a. True
b. False

27. For which environment can you use a rolling upgrade?
a. ASE V1.2A/Digital UNIX Version 3.2C
b. ASE V1.1/Digital UNIX Version 3.0
c. ASE V1.0A/Digital UNIX Version 2.1
d. ASE V1.0/Digital UNIX Version 2.0


28. You can perform a rolling upgrade to Digital UNIX Version 4.0A from which operating system version?
a. Digital UNIX Version 3.2C
b. Digital UNIX Version 3.2D
c. Digital UNIX Version 3.2F
d. Digital UNIX Version 3.2G

29. If ASE member systems are at DECsafe Available Server Version 1.2, you can preserve the ASE database if desired.
a. True
b. False

30. To add a new member to an existing TruCluster Available Server Software Version 1.4 configuration, you must shut down the ASE before adding the new member.
a. True
b. False

31. Which configuration is supported during a rolling upgrade to TruCluster Available Server Software Version 1.4?
a. ASE V1.3/Digital UNIX Version 3.2D and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A
b. ASE V1.3/Digital UNIX Version 3.2F and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A
c. ASE V1.3/Digital UNIX Version 3.2G and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A
d. All of the above

32. Action script containing commands to stop an application.
a. Add
b. Delete
c. Start
d. Stop
e. Check


33. Action script containing commands to determine if a service is running.
a. Add
b. Delete
c. Start
d. Stop
e. Check

34. Minimum scripts you must create for an application service.
a. Add and delete
b. Start and stop
c. Add, delete, start, and stop
d. Add, delete, start, stop, and check

35. How do you specify an action script that will remain external to TruCluster Available Server in asemgr?
a. Specify the pathname when asemgr prompts for the script name
b. Specify default, then add the commands to the skeleton script
c. Specify default, then add a pointer to the external script
d. Copy the external script to the ASE database

36. You create an action script and specify its pathname to asemgr. The next day you make changes to the script, but TruCluster Available Server does not use the new version. What action must you take?
a. Delete and add the service again
b. Modify the script through asemgr to update its database
c. Edit the default script
d. Stop and start the service

37. If you restrict ASE service A to one favored member and that member crashes:
a. ASE relocates service A on the member running the least number of services
b. ASE relocates service A on another favored member
c. ASE finds no other favored member and relocates service A on the member running the least number of services
d. ASE will not relocate service A


38. If you select the balanced service distribution policy for service A and the member system running it crashes:
a. ASE relocates service A on the member running the least number of services
b. ASE relocates service A on another favored member
c. ASE finds no other favored member and relocates service A on the member running the least number of services
d. ASE will not relocate service A

39. Before using asemgr to add an NFS service you must:
a. Specify the service name and Internet address in all member /etc/hosts files
b. Specify the service name and Internet address in all client system /etc/hosts files
c. Set up the UFS device special files, AdvFS, or LSM volumes
d. All of the above

40. You are adding a disk service and the service name is in the /etc/hosts file only for the member system on which asemgr is being run. The service will be set up and started.
a. True
b. False

41. When you write an action script to both start and stop the user-defined service, it will require what parameters?
a. Service name only
b. Action only
c. Service name and action
d. None of the above

42. When setting up a highly available user-defined application, the application must be installed on all member systems.
a. True
b. False

43. If you stop a service that uses the Logical Storage Manager, the disk groups are deported, are inaccessible, and the volumes are deleted.
a. True
b. False


44. If the status of a service is unassigned, how would you manually restart the service?
a. Relocate a service
b. Restart a service
c. Set a service on line
d. Set a service off line

45. The command to create the cluster map.
a. /etc/CCM
b. cluster_map_create
c. cmon
d. tractd

46. The command to start the Cluster Monitor.
a. asemgr
b. cluster_map_create
c. cmon
d. monitor

47. You are logged in to ASE member alpha from workstation gamma and run the Cluster Monitor. You see ASE domain members alpha and beta. To run the LSM utility on beta you must:
a. Drag the beta icon and drop it on the dxlsm icon
b. Drag the dxlsm icon and drop it on the alpha icon
c. Drag the dxlsm icon and drop it on the beta icon
d. Drag the dxlsm icon and drop it on the gamma icon

48. To check the status of the shared disks associated with a particular ASE service, go to this view of the Cluster Monitor.
a. Top view, or main window
b. Devices view
c. Services view
d. Any of the above

49. In the Cluster Monitor, this symbol indicates the system is not reported as an ASE member.
a. Blank area in the shape of the system icon
b. Outline around the system graphic
c. Diagonal line across the system icon
d. Question mark in the middle of the system icon


50. When a network partition occurs, the TruCluster Software:
a. Stops all services
b. Fails over services to the member on which the Director is running
c. Continues running all services on the servers on which they are located
d. Reboots all systems on which services are running

51. Which command initializes a disk for LSM?
a. voldisk init
b. voldg init
c. voldg -g db
d. volrecover -g db -sb

52. If your ASE has a properly terminated bus, without stopping TruCluster Software activity, you can:
a. Add a storage box to the system
b. Add a new member system
c. Remove a DWZZA from the bus
d. Disconnect a member system from the bus

53. The file in which the Availability Manager logs error messages:
a. daemon.log
b. asecdb
c. kern.log
d. asemgr.log

54. When troubleshooting an active TruCluster implementation, the first thing you should do is:
a. Reset the daemons
b. Examine the error log messages
c. Stop the services
d. Turn off the DWZZAs

55. Which utility displays the status of an ASE service?
a. uerf
b. showfdmn
c. ps
d. asemgr

56. One common cause of problems on a TruCluster SCSI bus:
a. Improperly terminated bus segments
b. Cable lengths too long
c. Improperly configured SCSI IDs
d. All of the above

57. The host names and IP addresses of each member system must be included in which file on each member system?
a. rc.local
b. asecdb
c. /etc/hosts
d. /etc/fstab


Answers
1. c What is not a feature of the TruCluster Software product?
a. Concurrently Active Servers
b. Network Failover
c. Distributed Lock Manager
d. Transparent NFS Failover

2. d The following is a hardware requirement for TruCluster Software.
a. A shared SCSI bus
b. External disks in an expansion box
c. Ethernet or FDDI network
d. All of the above

3. a Identify the Available Server component that controls Available Server operations on one member system.
a. aseagent
b. asedirector
c. asehsm
d. asemgr

4. b The Available Server component that controls the ASE and coordinates the ASE activities.
a. aseagent
b. asedirector
c. asehsm
d. asemgr

5. b The following characteristic is not required for an application to be suitable for an ASE service:
a. The application must be able to be stopped using a set of commands issued in a specific order
b. The application must read and write data from an NFS file system
c. The application must be able to be started using a set of commands issued in a specific order
d. The application must run on only one system at a time


6.

d When you make initial plans for an Available Server


implementation, you must consider:
a. Services to be made available
b. Survivable failures
Test 1213

Answers

c.

Do services require custom scripts?

d. All of the above


7. c  Which feature is not provided by the TruCluster Software?
   a. Failover of services
   b. Decoupling of host name and service name
   c. Restarting failed applications
   d. Determining status of ASE members

8. b  What provides the user interface to the ASE software?
   a. asedirector
   b. asemgr
   c. Cluster Monitor
   d. Availability Manager driver


9. d  A failure condition that does not result in a service relocation:
   a. Network Interface Failure
   b. Host Down Scenario
   c. Device Failure
   d. Network Partition

10. c  You must use a DWZZA signal converter with a KZMSA because:
   a. The KZMSA has only one channel
   b. The KZMSA uses the differential mode of signal transmission
   c. You cannot remove the KZMSA internal terminators
   d. The KZMSA operates on only a wide SCSI bus


11. a  You should use a DWZZA signal converter with a PMAZC SCSI host adapter because:
   a. The DWZZA increases the maximum shared SCSI bus length
   b. The PMAZC is a dual-ported SCSI host adapter
   c. The PMAZC operates as either a fast or slow SCSI host adapter
   d. Signal conversion is necessary to connect the PMAZC to a BA350 storage box

12. d  The maximum length of the shared SCSI bus in an Available Server configuration for Version 1.4 is:
   a. 3 meters
   b. 6 meters
   c. 25 meters
   d. 31 meters

13. c  Which cable do you attach to a single-ended device to enable you to disconnect the system without affecting SCSI bus termination?
   a. BN21J
   b. BN21H
   c. BN21V-0B
   d. BN21W-0B

14. b  You cannot mix single-ended and differential SCSI bus segments on a shared SCSI bus for ASE configurations.
   a. True
   b. False

15. d  What could you use in place of a BN21W-0B?
   a. H8574-A
   b. H8660-AA
   c. H879-AA
   d. H885-AA

16. b  You are configuring a shared SCSI bus on port A of a PMAZC. Which jumper do you remove to disable the PMAZC single-ended bus termination?
   a. W1
   b. W2
   c. W3
   d. W4

17. c  What is the most important thing to consider when configuring DEC 3000 Model 500 systems in a single-ended Available Server configuration with a BA350 without using a DWZZA?
   a. Remove the PMAZC single-ended termination jumper for the port being used
   b. Remove the flash memory write jumper
   c. SCSI bus length is appropriate for the bus speed

18. b  Which console command sets the SCSI ID for a KZTSA?
   a. t tc cnfg
   b. t tc setid
   c. t tc speed
   d. t tc id

19. a  When using a KZMSA XMI to SCSI bus adapter in an Available Server configuration, you must use a DWZZA because you cannot remove the KZMSA single-ended bus termination.
   a. True
   b. False

20. a  The lfu utility modifies the SCSI ID or bus speed for which adapter?
   a. KZMSA
   b. KZPSA
   c. KZTSA
   d. PMAZC

21. b  Use the set console command to set the SCSI ID or bus speed for which adapter?
   a. KZMSA
   b. KZPSA
   c. KZTSA
   d. PMAZC

22. a  You have an Available Server configuration with two AlphaServer 2100 systems with KZPSA PCI to SCSI adapters, an HSZ40, a DWZZA-VA, and a BA350 containing four RZ28s. Why is the HSZ40 likely to be assigned SCSI ID zero (0)?
   a. The DWZZA-VA is installed in BA350 slot 0
   b. SCSI ID 0 is reserved for the HSZ40
   c. The HSZ40 must have a higher priority than the RZ28s
   d. SCSI ID 0 is reserved for the HSZ40
   e. None of the above

23. d  Before installing the TruCluster Software:
   a. Read the release notes
   b. Verify system prerequisites
   c. Install, set up, and test hardware
   d. All of the above


24. d  Which subsets are not required for TruCluster Available Server Software Version 1.4?
   a. OSFCLINET405
   b. OSFPGMR405
   c. OSFCMPLRS405
   d. None of the above; all are required


25. d  To use the Cluster Monitor, you must install which subsets?
   a. CXLSHRDA405
   b. OSFCDEMIN405
   c. TCRCMS140
   d. All of the above


26. a  A rolling upgrade allows you to upgrade ASE member systems without shutting down the ASE.
   a. True
   b. False

27. a  For which environment can you use a rolling upgrade?
   a. ASE V1.2A/Digital UNIX Version 3.2C
   b. ASE V1.1/Digital UNIX Version 3.0
   c. ASE V1.0A/Digital UNIX Version 2.1
   d. ASE V1.0/Digital UNIX Version 2.0


28. d  You can perform a rolling upgrade to Digital UNIX Version 4.0A from which operating system version?
   a. Digital UNIX Version 3.2C
   b. Digital UNIX Version 3.2D
   c. Digital UNIX Version 3.2F
   d. Digital UNIX Version 3.2G


29. a  If ASE member systems are at DECsafe Available Server Version 1.2, you can preserve the ASE database if desired.
   a. True
   b. False

30. b  To add a new member to an existing TruCluster Available Server Software Version 1.4 configuration, you must shut down the ASE before adding the new member.
   a. True
   b. False

31. c  Which configuration is supported during a rolling upgrade to TruCluster Available Server Software Version 1.4?
   a. ASE V1.3/Digital UNIX Version 3.2D and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A
   b. ASE V1.3/Digital UNIX Version 3.2F and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A
   c. ASE V1.3/Digital UNIX Version 3.2G and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A
   d. All of the above


32. d  Action script containing commands to stop an application.
   a. Add
   b. Delete
   c. Start
   d. Stop
   e. Check

33. e  Action script containing commands to determine if a service is running.
   a. Add
   b. Delete
   c. Start
   d. Stop
   e. Check

34. b  Minimum scripts you must create for an application service.
   a. Add and delete
   b. Start and stop
   c. Add, delete, start, and stop
   d. Add, delete, start, stop, and check


35. c  How do you specify an action script that will remain external to TruCluster Available Server in asemgr?
   a. Specify the pathname when asemgr prompts for the script name
   b. Specify default, then add the commands to the skeleton script
   c. Specify default, then add a pointer to the external script
   d. Copy the external script to the ASE database
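Answer (c) above, keeping the real script outside TruCluster and pointing the default skeleton at it, can be sketched as follows. This is an illustrative sketch only: the helper name and the path /var/ase/local/myapp.sh are assumptions, not code from the course.

```shell
#!/bin/sh
# Sketch of a "default" action script body that hands off to a script
# maintained outside the ASE database. Path and names are hypothetical.
run_external() {
    ext="$1"; shift               # path to the externally maintained script
    if [ -x "$ext" ]; then
        "$ext" "$@"               # pass any arguments straight through
    else
        echo "external action script $ext not found" >&2
        return 1
    fi
}

# Hand off; fall back gracefully if the external script is absent.
run_external /var/ase/local/myapp.sh start || :
```

Because only the small pointer script lives in the ASE database, later edits to the external script take effect without re-registering anything through asemgr.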


36. b  You create an action script and specify its pathname to asemgr. The next day you make changes to the script, but TruCluster Available Server does not use the new version. What action must you take?
   a. Delete and add the service again
   b. Modify the script through asemgr to update its database
   c. Edit the default script
   d. Stop and start the service


37. d  If you restrict ASE service A to one favored member and that member crashes:
   a. ASE relocates service A on the member running the least number of services
   b. ASE relocates service A on another favored member
   c. ASE finds no other favored member and relocates service A on the member running the least number of services
   d. ASE will not relocate service A


38. a  If you select the balanced service distribution policy for service A and the member system running it crashes:
   a. ASE relocates service A on the member running the least number of services
   b. ASE relocates service A on another favored member
   c. ASE finds no other favored member and relocates service A on the member running the least number of services
   d. ASE will not relocate service A


39. d  Before using asemgr to add an NFS service you must:
   a. Specify the service name and Internet address in all member /etc/hosts files
   b. Specify the service name and Internet address in all client system /etc/hosts files
   c. Set up the UFS device special files, AdvFS, or LSM volumes
   d. All of the above


40. b  You are adding a disk service and the service name is in the /etc/hosts file only for the member system on which asemgr is being run. The service will be set up and started.
   a. True
   b. False

41. c  When you write an action script to both start and stop the user-defined service, it will require what parameters?
   a. Service name only
   b. Action only
   c. Service name and action
   d. None of the above
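The two parameters named in question 41 can be made concrete with a minimal combined start/stop script: the manager passes the service name and the requested action, and the script branches on the action. This is a sketch, not course code; the function name is invented and the echo lines stand in for the application commands, which a real script would issue in the required order.

```shell
#!/bin/sh
# Minimal combined start/stop action script sketch (names illustrative).
ase_action() {
    svc="$1"       # service name passed as the first argument
    action="$2"    # requested action passed as the second argument
    case "$action" in
    start) echo "starting $svc" ;;   # real start commands go here, in order
    stop)  echo "stopping $svc" ;;   # real stop commands go here, in order
    *)     echo "usage: ase_action service {start|stop}" >&2; return 1 ;;
    esac
}

ase_action dbsvc start
ase_action dbsvc stop
```

One script handling both actions is why the action argument is required in addition to the service name.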


42. a  When setting up a highly available user-defined application, the application must be installed on all member systems.
   a. True
   b. False

43. b  If you stop a service that uses the Logical Storage Manager, the disk groups are deported, are inaccessible, and the volumes are deleted.
   a. True
   b. False

44. b  If the status of a service is unassigned, how would you manually restart the service?
   a. Relocate a service
   b. Restart a service
   c. Set a service on line
   d. Set a service off line


45. b  The command to create the cluster map.
   a. /etc/CCM
   b. cluster_map_create
   c. cmon
   d. tractd

46. c  The command to start the Cluster Monitor.
   a. asemgr
   b. cluster_map_create
   c. cmon
   d. monitor

47. c  You are logged in to ASE member alpha from workstation gamma and run the Cluster Monitor. You see ASE domain members alpha and beta. To run the LSM utility on beta you must:
   a. Drag the beta icon and drop it on the dxlsm icon
   b. Drag the dxlsm icon and drop it on the alpha icon
   c. Drag the dxlsm icon and drop it on the beta icon
   d. Drag the dxlsm icon and drop it on the gamma icon


48. c  To check the status of the shared disks associated with a particular ASE service, go to this view of the Cluster Monitor.
   a. Top view, or main window
   b. Devices view
   c. Services view
   d. Any of the above


49. a  In the Cluster Monitor, this symbol indicates the system is not reported as an ASE member.
   a. Blank area in the shape of the system icon
   b. Outline around the system graphic
   c. Diagonal line across the system icon
   d. Question mark in the middle of the system icon


50. c  When a network partition occurs, the TruCluster Software:
   a. Stops all services
   b. Fails over services to the member on which the Director is running
   c. Continues running all services on the servers on which they are located
   d. Reboots all systems on which services are running


51. a  Which command initializes a disk for LSM?
   a. voldisk init
   b. voldg init
   c. voldg -g db
   d. volrecover -g db -sb

52. d  If your ASE has a properly terminated bus, without stopping TruCluster Software activity, you can:
   a. Add a storage box to the system
   b. Add a new member system
   c. Remove a DWZZA from the bus
   d. Disconnect a member system from the bus

53. c  The file in which the Availability Manager logs error messages:
   a. daemon.log
   b. asecdb
   c. kern.log
   d. asemgr.log

54. b  When troubleshooting an active TruCluster implementation, the first thing you should do is:
   a. Reset the daemons
   b. Examine the error log messages
   c. Stop the services
   d. Turn off the DWZZAs

55. d  Which utility displays the status of an ASE service?
   a. uerf
   b. showfdmn
   c. ps
   d. asemgr

56. d  One common cause of problems on a TruCluster SCSI bus:
   a. Improperly terminated bus segments
   b. Cable lengths too long
   c. Improperly configured SCSI IDs
   d. All of the above


57. c  The host names and IP addresses of each member system must be included in which file on each member system?
   a. rc.local
   b. asecdb
   c. /etc/hosts
   d. /etc/fstab
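The /etc/hosts requirement in question 57 looks like the fragment below in practice. The addresses and names are invented placeholders, not values from the course; each member lists every member system plus any ASE service names (pseudo host names) clients will use.

```
# /etc/hosts fragment (example addresses and names only)
16.140.64.11   alpha    # ASE member system
16.140.64.12   beta     # ASE member system
16.140.64.20   nfs1     # ASE service name (pseudo host name)
```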


Index

A
action scripts, 2-6, 6-9, 7-20
  add, 6-3
  check, 6-3
  delete, 6-3
  start, 6-3
  stop, 6-3
Address Resolution Protocol
  See ARP
AdvFS, 1-2, 7-21
  Use with ASE, 7-3, 7-6, 7-11, 7-37
  Use with Available Server, 1-7
  Use with the TruCluster Available Server, 1-4
Alert messages, 10-7
arc, 3-67
ARC console, 3-67
ARP, 7-11
ASE database
  /usr/var/ase/config/asecdb, 4-14
ase driver, 2-6
ASE Logger daemon, 4-20
aseagent daemon, 2-5, 5-15
asecdb, 4-7
asedirector daemon, 2-5, 2-8
asehsm daemon, 2-5
aselogger daemon, 2-6, 5-15, 5-17, 6-4
asemgr, 7-2
asemgr utility, 4-10, 4-14, 4-16, 5-5, 6-4, 6-9, 6-12, 7-4, 7-5, 7-11, 7-12, 7-17, 7-19, 7-21, 7-27, 7-32, 7-35, 7-36, 10-13
aseprod, 1-5
ase_fix_config, 4-4, 4-9
ase_fix_config script, 4-20
ASP, 7-5, 7-20
Automatic Service Placement policy
  See ASP
automount, 7-6
Availability Manager driver, 2-6
Available Server, 1-2
  troubleshooting, 1-17
Available Server configuration
  differential with PMAZC, 3-31, 3-35
  single-ended with PMAZC, 3-28
Available Server configuration planning, 1-15
Available Server Environment
  See ASE
Available Server Management Phases, 1-14

B
BA350, 3-11
  jumper, 3-12
  termination, 3-12
BA353, 3-11
BA356, 3-11
  jumper, 3-13
  termination, 3-13
Base operating system setup, 1-16
BC06P, 3-21
bindsetup, 4-5
BN21H, 3-21
BN21K, 3-21
BN21L, 3-21
BN21R, 3-21
BN21V-0B, 3-21
BN21W-0B, 3-21
BN23G, 3-21
Bus speed
  setting for KZPSA, 3-67

C
Cables, 3-20
  BC06P, 3-21
  BC09, 11-22
  BN21H, 3-21
  BN21K, 3-21
  BN21L, 3-21
  BN21R, 3-21
  BN21V-0B, 3-21
  BN21W-0B, 3-21
  BN23G, 3-21
CDFS, 3-26
cluster, 1-5
cluster configuration map
  See /etc/CCM
cluster map, 8-3
Cluster Monitor, 8-6
  setup, 8-3
cluster_map_create command, 8-3
cmon utility, 8-6
Commands
  arc, 3-67
  iostat, 10-19
  netstat, 10-18
  ps, 10-17
  rpcinfo, 10-17
  scu, 10-18
  set, 3-67
  set pkn, 3-67
  show config, 3-64, 3-66
  show device, 3-64, 3-66
  show pk#*, 3-64, 3-66
  t, 3-42
  t Test TURBOchannel command, 3-42
  uerf, 10-16
Compact Disk File System
  See CDFS
Configuration
  starting, 3-26
Configuring ASE hardware, 1-15
Configuring ASE Services, 1-16
Configuring Available Server
  with KZMSA and BA350, 3-51
  with KZMSA and HSZ40, 3-54
  with PMAZC and an HSZ10 or HSZ40, 3-38
  with PMAZC, differential bus, and BA350, 3-31
  with PMAZC, differential bus, and BA356, 3-35
Configuring TruCluster Available Server
  with PMAZC, single-ended bus, and BA350, 3-27
Console Utility
  t Test TURBOchannel command, 3-42

D
daemon.log file, 10-5
Database format change, 4-7
director daemon, 2-5
Disk devices, 3-15
disk service, 6-4, 7-3, 7-19, 7-21
disklabel, 7-6
Displaying devices on an AlphaServer 1000, 2000 or 2100, 3-66
doconfig, 4-4, 4-9
DWZZA, 3-15
  in BA350 slot 0, 3-32
  termination, 3-17, 3-32, 3-35, 3-39, 3-51, 3-71
DWZZA-AA, 3-36
DWZZA-VA
  in BA356 slot 0, 3-36
DWZZB, 3-15
  termination, 3-19, 3-51
DWZZB-VW
  in BA356 slot 0, 3-36

E
edquota, 7-7
/etc/CCM, 8-3, 8-4
/etc/exports, 7-17
/etc/exports.ase, 7-17
/etc/exports.ase.servicename, 7-17
/etc/fstab, 7-4, 7-6, 7-7, 7-11, 7-20
/etc/host, 3-25
/etc/hosts, 4-5, 4-9, 4-10, 4-11, 7-4, 7-11
/etc/ntp.conf, 4-4
/etc/syslog.conf, 5-17
event logging, 10-5

F
failover, 2-9
Fast SCSI, 3-3
firmware update
  KZPSA, 3-67
Firmware Update utility, 3-26
fwupdate.exe, 3-67

G
Global Event Logging, 1-4

H
H6660-AA, 3-21, 3-22
H8574-A, 3-21
H879-AA, 3-21, 3-22
H885-AA, 3-21, 3-22
Hardware components, 3-10
  disk devices, 3-15
  SCSI cables, 3-20
  SCSI controllers, 3-10
  signal converters, 3-15
  storage expansion units, 3-11
  systems, 3-10
  terminators, 3-20
  tri-link connector, 3-20
Host Status Monitor, 2-5
HSM daemon, 2-5

I
Installing TruCluster Software, 1-16, 4-17, 4-23
installupdate, 4-10, 4-11

K
kern.log file, 10-5
Kernel build, 4-20
KZMSA, 3-50
  Available Server configuration with BA350, 3-51
  Available Server configuration with HSZ40, 3-54
  boot ROM part numbers, 3-50
  Disable Reset configuration option, 3-51, 3-55
  hardware revision, 3-50
  NCR chips, 3-50
  setting SCSI ID, 3-51, 3-55
  setting SCSI speed, 3-51, 3-55
  updating firmware, 3-51, 3-55
KZMSA and DWZZA, 3-4
KZPSA
  bus speed, 3-67
  SCSI bus ID, 3-67
  setting bus speed, 3-67
  setting SCSI ID, 3-67
KZTSA
  displaying and changing SCSI ID, 3-41
  setting up an Available Server configuration, 3-44, 3-47
  t command, 3-42

L
LFU, 3-56
LFU utility, 3-50, 3-58
Loadable Firmware Update utility
  See LFU
login service, 7-31
LSM, 1-2, 7-21
  Use with ASE, 7-3, 7-7, 7-11, 7-37
  Use with the TruCluster Available Server, 2-19

M
mailsetup, 4-5
member systems, 1-6
mirrored stripe set, 7-21
mirroring, 7-21
mkfset, 7-7
mount command, 4-17, 4-23

N
netsetup, 3-25, 4-5
Network adapters, 3-25
Network Time Protocol
  See xntpd
newfs, 7-6
NFS, 1-5, 7-4, 7-6, 7-7, 7-11
NFS service, 6-4, 7-3, 7-11
nfssetup, 4-5
NTP
  See xntpd

P
PMAZC
  Available Server configuration with an HSZ10 or HSZ40, 3-38
  Available Server differential configuration with BA350, 3-31
  Available Server differential configuration with BA356, 3-35
  Available Server single-ended configuration with BA350, 3-27
  configuring for a differential configuration, 3-31, 3-35, 3-38
  displaying and changing SCSI ID, 3-41
  displaying and changing speed, 3-41
  in single-ended configuration, 3-27
  install, 3-28, 3-31, 3-35, 3-39
  internal jumpers, 3-41
  jumpers, 3-28, 3-31, 3-35, 3-39, 3-41
  setting SCSI bus speed, 3-28
  setting SCSI ID, 3-28
  t command, 3-42
  termination, 3-28, 3-31, 3-35, 3-39
  used with DWZZA, 3-32, 3-39
  used with DWZZA-AA, 3-35
/proc, 7-7
pseudo host name, 7-31

Q
quota, 7-7, 11-34
quota.group, 7-7
quota.user, 7-7
quotacheck, 7-7

R
Replacing an LSM Shared Disk, 9-9
Required subsets, 4-3
/.rhosts, 8-3

S
/sbin/init.d/asemember script, 5-15
/sbin/init.d/asemember stop, 4-10, 4-11
SCSI Bus ID
  setting for KZPSA, 3-67
SCSI bus length, 3-3
SCSI bus termination, 3-20
SCSI cables, 3-20
SCSI controllers, 3-10
sendmail.cf, 7-18
service
  highly available, 7-3
set, 3-67
setid, 3-28, 3-31, 3-35, 3-39
setld -d, 4-10, 4-11
setld -i, 4-10, 4-11
setld utility, 4-8, 4-10, 4-14, 4-16, 4-17, 4-23, 5-9
Setting bus speed
  KZPSA, 3-67
Setting SCSI ID
  KZPSA, 3-67
Shared SCSI bus selection, 4-20
show config, 3-64, 3-66
show device, 3-64, 3-66
show pk#*, 3-64, 3-66
showfdmn command, 10-19
showfsets command, 10-19
Signal converters, 3-15
sizer, 4-4
Slow SCSI, 3-3
Software subsets, 4-3
Starting an Available Server configuration, 3-26
Storage expansion units, 3-11
stripe set, 7-21
Supported hardware
  cables, 3-20
  disk devices, 3-15
  signal converters, 3-15
  storage expansion units, 3-11
  terminators, 3-20
  tri-link connector, 3-20
Supported systems, 3-10
syslog, 5-17, 6-4, 10-5

T
t, 3-42
Terminators, 3-20
  H6660-AA, 3-21, 3-22
  H8574-A, 3-21, 3-22
  H879-AA, 3-21, 3-22
  H885-AA, 3-21
tri-link connector, 3-20
  H885-AA, 3-22
Troubleshooting, 10-2
  See TruCluster Available Server troubleshooting
TruCluster Available Server
  troubleshooting, 10-2
  troubleshooting procedures, 10-8
TruCluster Available Server installation
  Adding a member system to an existing ASE, 4-16
  Rolling upgrade, 4-10
  Setting up an ASE for the first time, 4-8
  Simultaneous upgrade, 4-14
TruCluster Available Server troubleshooting, 11-2
  common problems, 11-3
  configuration guidelines, 11-23
TruCluster Software
  configuring services, 1-16
  failover testing, 1-16
  installing, 1-16
TruCluster troubleshooting
  system monitoring tools, 10-12

U
user-defined service, 6-4, 7-3, 7-27, 7-31
/usr/bin/X11/cmon, 8-6
/usr/sbin/asemgr, 2-6
/usr/var/ase/config/asecdb, 2-6, 5-5
Utilities
  firmware update utility, 3-67
  LFU, 3-50, 3-56
  setid, 3-28, 3-31, 3-35, 3-39

V
/var/adm/syslog.dated, 7-12
/var/adm/syslog.dated/date/daemon.log, 5-15, 5-17, 5-22
/var/adm/syslog.dated/date/kern.log, 5-17
/var/spool/mail, 7-18
/var/spool/mqueue, 7-18
vedquota, 7-7
volprint, 10-19

X
X Window System, 8-6
xntpd, 4-4

Y
Y cable
  BN21V-0B, 3-21
  BN21W-0B, 3-21
ypsetup, 4-5
