AlphaServer DS20E

AlphaStation DS20E
Service Guide
Order Number: EK-K8F6W-SV. A01

Compaq Computer Corporation

Notice
First Printing, February 2000
© 2000 Compaq Computer Corporation.
COMPAQ, Compaq Insight Manager, the Compaq logo, and OpenVMS are registered in the U.S. Patent and Trademark
Office. AlphaServer and Tru64 are trademarks of Compaq Information Technologies Group, L.P. in the United States
and/or other countries. Linux is a registered trademark of Linus Torvalds. UNIX is a registered trademark in the U.S.
and other countries, licensed exclusively through X/Open Company Ltd. All other product names mentioned herein
may be the trademarks or registered trademarks of their respective companies.
Compaq shall not be liable for technical or editorial errors or omissions contained herein. The information in this
publication is subject to change without notice.

FCC Notice: The equipment described in this manual generates, uses, and may emit radio frequency energy. The
equipment has been type tested and found to comply with the limits for a Class A digital device pursuant to Part 15 of
FCC rules, which are designed to provide reasonable protection against such radio frequency interference. Operation
of this equipment in a residential area may cause interference, in which case the user at his own expense will be required
to take whatever measures may be required to correct the interference. Any modifications to this device,
unless expressly approved by the manufacturer, can void the authority to operate this equipment under Part 15 of the
FCC rules.
Shielded Cables: If shielded cables have been supplied or specified, they must be used on the system in order to
maintain international regulatory compliance.

Warning! This is a Class A product. In a domestic environment this product may cause radio interference in which
case the user may be required to take adequate measures.
Achtung! Dieses ist ein Gerät der Funkstörgrenzwertklasse A. In Wohnbereichen können bei Betrieb dieses Gerätes
Rundfunkstörungen auftreten, in welchen Fällen der Benutzer für entsprechende Gegenmaßnahmen verantwortlich
ist.
Attention! Ceci est un produit de Classe A. Dans un environnement domestique, ce produit risque de créer des
interférences radioélectriques, il appartiendra alors à l'utilisateur de prendre les mesures spécifiques appropriées.

Contents

About This Guide
  Intended Audience
  Document Organization
  DS20E Documentation
  Symbols in Text
  Rack Stability
  Alpha Web Site

Chapter 1
System Overview
  Introduction
  Product Description
  Product Packaging
  Memory and I/O
  System Components
    Operator Control Panel
    Rear Panel
    Power Supplies
    Storage Subsystem
    Side Cover Interlock
    Removable Media
    Server Features Module (SFM2)
    System Board
    PCI Options
    CPU Modules
    Speaker
    Doors
  Standard Components and Features
  Mechanical Specifications
  Electrical Specifications
  Environmental Specifications
  Differences Between DS20E and DS20

Chapter 2
Technical Overview
  System Block Diagram
    P-Chips
    C-Chip
    D-Chips
    Bcache
    Memory DIMMs
    CPU
  CPU Subsystem
    Bcache Interface
    Clock Interface
    SROM Interface
    System Interface
  Cross-Bar Switch
    D-chip (data slice)
    C-chip (controller chip)
    P-chip (peripheral interface chip)
  Memory Subsystem
    Memory Configuration Rules
    Qualified DIMMs
    Typical Memory Configurations
  I/O Subsystem
    PCI Interface
    ISA Interface
    Timing Interrupt and General (TIG) Interface
  Environmental Logic
    SFM2 Status LEDs
    SFM2 Power Supplies
    SFM2 Inverter
    SFM2 PAL
    SFM2 System Fans Sense Logic
    SFM2 CPU Fans Sense Logic
    SFM2 30-Second Shutdown
    SFM2 Temperature Sensor
    SFM2 Remote Management Controller Microprocessor
  Maintenance Bus (I²C Bus)
    Monitoring System Conditions
    Fault Display
    Error State
    Configuration Tracking


Chapter 3
System Installation
  Introduction
  Preparing for Installation
  Positioning the System
  Connecting the System
  Verifying Hardware Installation
  Shutting Down the System
    Shutting Down the Tru64 UNIX Operating System
    Shutting Down the OpenVMS Operating System
  Updating the Firmware
  Locking the System
  Installing a Rackmount System
    Marking the Installation Area in the Rack
    Rack Accessories
    Preparing the System
    Preparing the Rack
    Attaching Slide Brackets to Rails
    Stabilizing the Rack
    Installing the System
    Installing U-Nuts
    Installing the Interlock System
    Installing the Cable Management Arm
    Dressing the Cables
    Attaching the Front Bezel
  Starting a Tru64 UNIX Installation
  Booting Tru64 UNIX
    Verifying the Firmware Version
    Changing Startup and Boot Defaults
    Ensuring that Environment Variables Match System Configuration
  Installing OpenVMS
  Booting OpenVMS
    Booting OpenVMS from the local CD-ROM Drive
    Booting OpenVMS from an InfoServer
  Installing Linux
  Booting Linux
    Linux Boot Example

Chapter 4
System Configuration
  Introduction
  Base System Configuration
  Switch Settings
    System Board SW2
    System Board SW3
    CPU SW1
    CPU SW2
  Memory Configurations
  Addressing Considerations
    CPU to PCI Address Translation
  SCSI Configuration
    SCSI IDs
    SCSI Termination
    SCSI Cable Length
  PCI and ISA Configuration
    PCI Slot Numbering
    Graphics Options
    SCSI Controllers
    PCI Restrictions
    ISA Bus
    ISA Restrictions
    PCI Assignment Tables
  Interrupt Configuration
    TIG Bus Interrupt Assignments
    Real-Time Clock Interrupt
    Halt Interrupt
    ISA Interrupt Assignments
    Fan Fault Interrupt
    TIG Interrupt Processing
  DMA Configuration
  Firmware Configuration
  System Options and Upgrades
    Obtaining Options
    Upgrading the CPU to EV67 Operating at 667 MHz
    Upgrading Memory
    Updating Firmware and Device Drivers
    Adding Third-Party Devices
    Adding PCI Options

Chapter 5
Firmware
  Introduction
  Firmware in the DS20E
    SRM Console
    AlphaBIOS Console
    Updating Firmware and Device Drivers
  Using the SRM Console
    SRM Console Start Sequence
    Displaying System Configuration
    Showing and Setting Environment Variables
    Initializing the System
    Listing and Reading a File
    Editing Files
    Depositing and Examining Data
    Creating a Power-Up Script
    Booting an Operating System
    Forcing a System Crash Dump
    Obtaining Help
  Using the AlphaBIOS Console
    Starting AlphaBIOS
    Keyboard Conventions and Help
    Running AlphaBIOS from a Serial Terminal
    Utilities
    To Run a Configuration Utility
  Remote Management Console
    First-Time Setup
    RMC Commands
    Using the RMC Switchpack
    Troubleshooting the RMC

Chapter 6
Troubleshooting
  Introduction
  Basic Troubleshooting
    Considerations Before Troubleshooting
    Steps for Isolating Faults
    Troubleshooting Strategy
  Problem Categories
    Power Problems
    No Access to Console Mode
    Console-Reported Failures
    Boot Problems
    Thermal Problems
    Operating System-Reported Failures
    Memory Problems
    PCI Bus Problems
    SCSI Problems
  Power Up/Down Sequence
  Troubleshooting Tools and Utilities
    Fail-Safe Booter (FSB) Utility
    Power-On Self-Test (POST)
    LEDs and Beep Codes
  Using Firmware to Troubleshoot
    Using SRM Commands to Test the System
  Changing the System Type
  For More Information

Chapter 7
Error Registers
  Introduction
  Ibox Status Register
  Memory Management Status Register
  Dcache Status Register
  Cbox Read Register
  Miscellaneous Register
  Device Interrupt Request Registers
  P-Chip Error Register
  Failure Register
  Function Register

Chapter 8
OS Diagnostics Overview
  Introduction
  Tru64 UNIX Diagnostic Tools
  DEC VET
  Machine Checks
    Operating System
    Error Classes
    Error Types
    Machine Check Logout Frame (SCB 660 and 670)
  Compaq Analyze
    Compaq Analyze Operation
    Compaq Analyze Analysis Components
    Compaq Analyze Interface
    Using Compaq Analyze with a Standard Browser
    Compaq Analyze Error Report

Chapter 9
Removal and Replacement Procedures
  Introduction
  FRU Part Numbers
  Precautions
  Side Cover
  Operator Control Panel
  PCI/ISA Options
  Storage Subsystem
  Removable Media Drive Bay
  CPU Daughter Card
  CPU Guide Brackets
  System Board
  DIMMs
  Battery
  Fans
  Speaker
  Power Supply
  Power Supply Backplane
  Side Cover Interlock
  Server Features Module
  System Cabling
    Data and Signal Cabling Information
    Power Cabling Information

Chapter 10
Compaq Insight Manager
  Introduction
  Overview
  Functions of Compaq Insight Manager
  Insight Manager Components
  For More Information


About This Guide

Intended Audience
This manual is for service providers who are responsible for servicing Compaq
AlphaServer DS20E and AlphaStation DS20E systems.

Document Organization
This manual has ten chapters.
Chapter 1, System Overview, introduces the physical components of the
system.
Chapter 2, Technical Overview, describes the switch-based interconnect; the
system board logic; the CPU, memory, and I/O subsystems; the environmental
logic; and the maintenance bus.
Chapter 3, System Installation, explains how to install the system, how to
install the rackmount system, and how to boot an operating system.
Chapter 4, System Configuration, describes the base system configuration;
configuring CPUs, memory, and I/O; interrupt and DMA configuration; and
system options and upgrades.
Chapter 5, Firmware, covers the SRM firmware and the remote management
console (RMC).
Chapter 6, Troubleshooting, presents troubleshooting steps as well as
troubleshooting with LEDs and beep codes and with the SRM console.

Compaq Confidential Need to Know Required


Writer: Jill Hackett Stevens / Bob Young Project: DS20E Reference Guide Comments:
File Name: frnt.doc Last Saved On: 2/16/00 10:47 AM


Chapter 7, Error Registers, presents the registers that contain information
critical to the troubleshooting process.
Chapter 8, OS Diagnostics Overview, explains the operating system
diagnostics and the Compaq Analyze error analysis and reporting tool.
Chapter 9, Removal and Replacement Procedures, lists the steps for removing and
replacing FRUs (field-replaceable units).
Chapter 10, Compaq Insight Manager, introduces the server management
tool used to monitor and control the operation of DS20E systems.

DS20E Documentation
Title                                   Order Number
DS20E Reference Guide                   ER-K8F6W-UA
DS20E Basic Installation                ER-K8F6W-IM
DS20E Processor Upgrade                 ER-PD12U-UG
KN311 CPU Installation Card             EK-DSCPU-IN
Memory Option Installation Card         EK-MS340-IN
H9A10/H9A15 Rack-mounting Template      EK-DS20E-TP
Release Notes                           EK-K8F6W-RN

Symbols in Text
These symbols are found in the text of this guide. They have the following
meanings.
WARNING: Text set off in this manner indicates that failure to
follow directions in the warning could result in bodily harm or loss
of life.
CAUTION: Text set off in this manner indicates that failure to
follow directions could result in damage to equipment or loss of
information.

NOTE: Text set off in this manner presents commentary, sidelights, or
interesting points of information.

Rack Stability
WARNING: To reduce the risk of personal injury or damage to the
equipment, be sure that:

The leveling jacks are extended to the floor.

The full weight of the rack rests on the leveling jacks.

The stabilizing feet are attached to the rack if it is a single rack installation.

The racks are coupled together in multiple rack installations.

A rack may become unstable if more than one component is extended. Extend only
one component at a time.

Alpha Web Site


The Compaq Alpha Web site has information on this product as well as the
latest drivers and flash ROM images.
http://www.digital.com/alphaserver/ds20e/index.html


Chapter 1

System Overview

Introduction
The Compaq DS20E is a low-end server that provides 64-bit performance for compute-intensive
applications. It is a departmental server with a design that supports expansion without
necessarily taking up additional floor space. It provides high performance, comprehensive
system management, high availability, and easy access for servicing.
The system ships with one processor, but can be upgraded to a dual-processor system. Its single
system board, also known as the main logic board (MLB), contains the I/O subsystem, including
the PCI/ISA slots and the cabling. The system also provides internal mounting for four disk
storage units (six units in the future) and an open removable media bay.
This chapter covers the following components:

Product Description

Product Packaging

Memory and I/O

System Components

Standard Components and Features

Mechanical Specifications

Electrical Specifications

Environmental Specifications

Differences Between the DS20E and DS20 Systems


Product Description
The DS20E system is a departmental class server that runs the Tru64 UNIX, OpenVMS, and
Linux operating systems. It can have up to two Alpha 21264 processors, the EV6 (500 MHz) or
EV67 (667 MHz).
DS20E memory can be increased up to 4 GB. The system uses a PCI and ISA bus architecture,
and network clustering technology. For system management, the DS20E includes Compaq
Insight Manager, a GUI-based tool for monitoring and controlling system operation. The DS20E
system is available in a pedestal or an industry-standard chassis for mounting in a rack. The side
cover can be removed to expose most system components for maintenance. The drives, fans,
and power supplies are hot-swap devices that can be replaced while the system is running.


Product Packaging
The system can be used as a deskside pedestal in the vertical position or, with the addition of
brackets, mounted in the horizontal position in a standard rack.

[Figure: Rackmount and pedestal configurations]

Memory and I/O


All memory and I/O components are on a single system board that contains a memory
subsystem, PCI bus, ISA bus, integrated SCSI F/W/U I/O controllers, and option slots for
PCI-based and ISA-based option modules. A separate server features module V2 (SFM2) monitors
environmental conditions.


System Components
[Figure: System components; the callouts correspond to the legend that follows]

Figure Legend

Component

Floppy disk drive

CD-ROM

Storage subsystem

Hard disk drives

Operator control panel (OCP)

CPUs

System board

Power supplies (2 minimum)

System features module

SymBios Adapter 895 SCSI Card


Operator Control Panel

[Figure: Operator control panel; the callouts correspond to the legend that follows]

Figure Legend

Function               Description
Power On/Standby       When the system is plugged in to a power source and powered
                       on, pressing the button to On allows power to the OCP.
                       Pressing the button to Standby turns off all DC voltages
                       except Aux 5 volts. This is a latching button.
Reset button           A momentary contact button that initializes the system.
Power Indicator LED    Green power OK indicator.
Fault LEDs             Programmable by software. Blink at various console states.
Halt button            Terminates system operation. This is a momentary contact
                       button.
Halt LED               Halt condition (yellow).


Rear Panel

[Figure: Rear panel; the callouts correspond to the legend that follows]

Figure Legend

Connector/Port                 Description
Five 64-bit PCI slots          For option modules such as high performance
                               network, video, or disk controllers.
One shared 64-bit PCI/         For option modules such as high performance
16-bit ISA slot                network, video, or disk controllers.
Parallel port                  To parallel devices such as a printer.
Serial port (COM2)             Extra port to modem or any serial device.
Keyboard port                  To PS/2-compatible keyboard.
Ethernet port                  To network (or optional NIC card).
Mouse port                     To PS/2-compatible mouse.
Serial port (COM1)             Standard port to modem or any serial device.
AC power inlet                 To power outlet.
SCSI breakouts                 For SCSI options.
System fan 0                   System fan.
System fan 1                   System fan.
Universal Serial Bus (USB)     Not supported.


Power Supplies
The system comes standard with two 375-watt power supplies connected in parallel and
can accommodate a third power supply for redundancy. A power backplane integrates the three
supplies for power distribution, monitoring and control. All three supplies are removable and
accessible through the front of the enclosure. When a third redundant supply is present, the power
supplies can be replaced while the power is on.


Storage Subsystem
The system comes with a storage subsystem that holds four 1.6-inch drives. Six 1-inch drives
will be supported in the future.


Side Cover Interlock


When the system enclosure is open, the system does not operate. System power cannot be
turned on until the cover is closed. If the cover is opened while the system is running, power
shuts off immediately.


Removable Media
The removable media area contains the removable media bay, which accommodates one
5.25-inch, half-height tape device and a combination CD-ROM/FDD drive.


Server Features Module V2 (SFM2)


The server features module V2 (SFM2) monitors environmental conditions in the system. The
SFM2 supports the two system fans and three power supplies. The SFM2 also monitors the
state of the CPU fans on the system board. The side cover interlock is on the SFM2.


System Board
All memory and I/O components are on a single system board that contains a memory
subsystem, PCI bus, ISA bus, integrated SCSI F/W/U I/O controllers, and option slots for
PCI-based and ISA-based option modules.


PCI Options
The system has six physical, 64-bit PCI slots, one of which is a combination PCI/ISA slot. The
callouts show the PCI slot numbering on the system board.

Slot 1 supports a half-length card only.

Slots 2 through 6 support a full-length card.

Slot 6 is shared with an ISA slot.



CPU Modules
The system supports up to two processor modules that can be installed on the system board.
Each processor module contains a 21264 microprocessor, either 500 MHz or 667 MHz. The
21264 microprocessor is a superscalar chip with out-of-order execution and speculative
execution to maximize speed and performance. It contains four integer execution units and
dedicated execution units for floating-point add, multiply, and divide. The chip also has an
integrated instruction cache and a data cache. Each cache consists of a 64 KB two-way set
associative, virtually addressed cache divided into 64-byte blocks. The data cache is a
physically tagged, write-back cache.
The EV6 500 MHz processor module contains a 4 MB secondary B-cache consisting of late-write
synchronous DRAMs (dynamic random access memory) that provide low latency and high
bandwidth. The EV67 667 MHz processor module has an 8 MB DDR (dual data rate) cache.
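
The on-chip cache figures above (64 KB, two-way set associative, 64-byte blocks) imply a particular number of sets and address bit split. The following is a minimal sketch of that arithmetic; the function name and the returned field names are illustrative assumptions, not terms from this guide.

```python
def cache_geometry(size_bytes, ways, block_bytes):
    """Derive block count, set count, and address bit widths for a
    set-associative cache. Sizes are assumed to be powers of two."""
    blocks = size_bytes // block_bytes        # total cache blocks
    sets = blocks // ways                     # blocks are grouped into sets
    offset_bits = block_bytes.bit_length() - 1  # log2(block size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    return {
        "blocks": blocks,
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
    }

# 21264 I-cache or D-cache as described in the text: 64 KB, 2-way, 64-byte blocks
geom = cache_geometry(64 * 1024, 2, 64)
print(geom)
```

With these inputs each cache holds 1024 blocks arranged as 512 sets, so 6 address bits select the byte within a block and 9 bits select the set.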

Speaker
An internal speaker produces audio output for error beep codes and other audible messages.

Doors
The pedestal has an upper door and a full door. The upper door swings open for access to the
OCP and media drives. The full door provides access to the storage subsystems and hot-swap
power supplies.


Standard Components and Features


Component                 Description
Processor                 The 21264 microprocessor is a superscalar, superpipelined
                          implementation of the Alpha architecture and runs at an
                          optimized price:performance speed of 500 MHz and above.
                          The chip contains a 64 KB set-associative I-cache and a
                          64 KB set-associative D-cache. A 4 MB L2 backup cache
                          (Bcache) supports each 500 MHz processor. The EV67
                          667 MHz processor module has an 8 MB DDR (dual data
                          rate) cache.
Memory                    The system supports the following memory option sizes:
                          256 MB, 512 MB, and 1 GB. Each option is made up of four
                          200-pin DIMM modules. System memory is expandable up to
                          4 GB.
Expansion Slots           There are six physical, 64-bit PCI slots, one of which is
                          a combination PCI/ISA slot. Five slots are full length.
                          Slot 6 is shared with an ISA slot.
Diskette Drive            One 1.44 MB diskette drive standard on all models.
Internal Storage          The system is preconfigured with a 4-slot storage
                          subsystem. The 4-slot subsystem accommodates 1.6-inch
                          drives. A 6-slot cage will be available in the future.
Removable Media Storage   A modular storage system accommodates one 5.25-inch
                          slim-height CD/FDD combination drive and one 5.25-inch
                          half-height tape device.
Network Controller        None
Hard Drives               9.1 GB or 18 GB 10K RPM LVD drives
Interfaces                Two serial ports and one parallel port support external
                          options such as printer, modem, or local terminal.
Power Supply              Two 375-watt power supplies connected in parallel are
                          standard. A third power supply can be installed for
                          redundancy.
Operating Systems         Tru64 UNIX 4.0E
                          OpenVMS
                          Linux


Features                  Description
Upgrades                  Supports upgrades of Alpha processor to next generation,
                          with retention of memory, graphics memory, and hard
                          drives.
Memory Architecture       PCI/ISA
System Architecture       Dual processor with a switch-based interconnect.
Manageability             Compaq Insight Manager in conjunction with the embedded
                          remote management console (RMC) provides for easy system
                          management and lower cost of ownership.
Security                  SRM console password and lockable front panel.
Design for Easy Service   Includes easily removable single side cover for entry
                          without tools. Most of the components are visible when
                          the side cover is removed.
Flexible Packaging        Available as free-standing pedestal or rack-mountable
                          chassis.
Hot-Plug Fans             Fans can be replaced while the system is running.
Service and Support       Protected by Compaq Services, including a three-year
                          limited warranty with three-year parts coverage, one year
                          of on-site service (second business day response) and one
                          year of labor; 8 x 5, Monday-Friday (excluding local
                          holidays) hardware technical phone support; and online
                          support through the Internet.

Mechanical Specifications
Measurement/Weight   Pedestal                        Rackmount
Depth                66 cm (26 in.)                  66 cm (26 in.)
Width                22.15 cm (8.7 in.)              Standard EIA 310D (RETMA)
Height               44.6 cm (17.55 in.)             22.15 cm (8.7 in.)
Weight               27.3 to 40.9 kg (60 to 90 lb)   34 to 36.7 kg (75 to 81 lb)


Electrical Specifications
Power and Voltage       Specification (Each Power Supply)
Maximum input power     380W

System input power
Rated Voltage Range   Rated Input Current   Operating Frequency   Maximum Inrush Current
100 to 120VAC         7 A RMS               47 to 63Hz            75A
220 to 240VAC         3 A RMS               47 to 63Hz            75A

DC Outputs
Output Voltage   Output Range (Min. to Max.)   Maximum Load   PARD
+3.3V            3.2 to 3.4                    40A            50mV
+5.25 sV         4.80 to 5.25                  42.5A          50mV
+12.0V           11.50 to 12.60                6A             150mV
-12.0V           -10.9 to -13.20               1.0A           150mV
+5.5 VSB         4.75 to 5.25                  0.5A           50mV

Output voltage tolerances are total tolerance at the output connector or remote sense point, as
applicable. Total tolerance shall be the sum of periodic and random disturbances (PARD), peak response
voltage, and the root sum square of all static tolerances.


Environmental Specifications
Dimension           Condition      Measurement
Temperature range   Operating      10 to 35 °C (41 to 95 °F)
                    Nonoperating   -40 to 66 °C (-40 to 151 °F)
Altitude            Operating      2000 m (6,562 ft) maximum
                    Nonoperating   3600 m (12,000 ft) maximum
Acoustics           Idle           6.5 LWAd,B (0 or 1 x HDD); 6.9 (6 x HDD)
                    Operating      6.5 LWAd,B (0 or 1 x HDD); 6.9 (6 x HDD)

Differences Between DS20E and DS20


System Board
    DS20E: Same functionality as DS20 system board. Onboard SCSI controller enabled.
    DS20:  Contains a memory subsystem, PCI bus, ISA bus, integrated SCSI F/W/U I/O
           controllers, and option slots for PCI-based and ISA-based option modules.

CPU
    DS20E: 500 MHz CPU modules; 667 MHz CPU modules.
    DS20:  500 MHz CPU modules.

Memory Configurations
    DS20E: Same as DS20.
    DS20:  Memory can be configured from a minimum of 256 MB (1 MS340-CA) to
           4 GB (4 MS340-EA).

PCI slots and hoses
    DS20E: Same as DS20.
    DS20:  Two PCI buses, PCI-0 and PCI-1. Both are 64-bit buses with three PCI
           slots each.

Enclosure
    DS20E: Rackmount that can also be a pedestal system.
    DS20:  Deskside system that can be mounted on a shelf in a rack.

Storage
    DS20E: Four removable hard drives (six in the future). Drives not
           interchangeable between DS20 and DS20E.
    DS20:  Seven removable hard drives. Drives not interchangeable between DS20
           and DS20E.

Power Supplies
    DS20E: N+1 hot swap (2 required, 3 for N+1).
    DS20:  N+1 non-hot swap (1 required, 2 for N+1).

System Fans
    DS20E: Hot-swap system fans.
    DS20:  Must be powered off and side cover removed to access fans.

CD and Floppy
    DS20E: Combo CD/floppy.
    DS20:  Separate CD and floppy.

OCP Display
    DS20E: LED display.
    DS20:  Alphanumeric display.

Chapter 2

Technical Overview

The DS20E architecture features a switch-based interconnect system using a cross-bar switch
chipset that allows data to move directly from place to place in the system. Its single, large
system board, the main logic board (MLB), contains the DS20E subsystems. A separate
component, called the server features module V2 (SFM2), contains the environmental logic.
Topics in this chapter are:

System Block Diagram

CPU Subsystem

Cross-Bar Switch

Memory Subsystem

I/O Subsystem

Environmental Logic

Maintenance Bus (I2C Bus)


System Block Diagram


The system uses Alpha symmetric multiprocessing technology. A combined system bus
bandwidth of 4+ GB/s can be realized with two 500 MHz CPUs, each with a 200 MHz 4 MB
cache and a memory bus clocked at 83.3 MHz. Peak raw bandwidth is 5.2 GB/s. The DS20E is
a one or two processor system that supports up to 4 GB of DIMM memory in several DIMM
sizes. It has two independent 256 bit wide memory buses. The system architecture is shown in
the diagram and described in the following sections.


P-Chips
The P-chips are the PCI interface chips of the Tsunami core logic chipset. There are two
P-chips, each a 33-MHz, 64-bit PCI implementation.
The P-chip has a cycle time of 10 ns for the system interface and a cycle time of 30 ns for the
PCI interface. It is able to run a 30 ns PCI bus with a 12 ns to 15 ns system interface. It has the
following interfaces:

PCI bus: A single 64-bit PCI implementation running at 33 MHz.

PCI central resource functions: Arbitration and PCI clock sourcing.

D-chip port to the PADbus: 40 bits for 4 bytes of data plus check bits. In standard mode,
the P-chip receives 4 bytes of data and their 4 associated check bits each cycle (36 pins
used). To support a system with eight D-chips, the P-chip has an additional mode where it
receives 8 bytes over two cycles, but receives all 8 associated check bits in one cycle (40
pins used). Quadword-based transfers are always used because that is the unit on which
the ECC is calculated.

C-chip command and address port to the CAPbus: It takes two cycles (20 ns minimum)
to transfer a command and address in either direction.

Test, reset, and clocks.

C-Chip
The C-chip is the control chip of the Tsunami core logic chipset. It provides the interface to the
CPU, the main memory, and the I/O subsystem.

D-Chips
D-chips are the data chips of the cross-bar switch. The system has eight D-chips. They provide
the interface with the memory data bus, the SysData bus, and the I/O subsystem data bus.

Bcache
Each 500 MHz processor module contains 4 MB of secondary Bcache (backup cache). Each
667 MHz processor module contains 8 MB of DDR cache (dual data rate). The Bcache consists
of late-write synchronous dynamic random access memory (SDRAMs) that provide low latency
and high bandwidth.

Memory DIMMs
The DS20E supports up to four banks of memory on the system board. Each bank contains four
slots with a total of 16 slots on the system board. The system uses 200-pin, buffered,
synchronous, dual in-line memory modules (DIMMs).


CPU
The CPU is a 21264 Alpha chip with the following features:

500 MHz or 667 MHz operating frequency

1800 MB/s McCalpin STREAM

DVD encode in real time

Pipeline organization

Four instructions mapped per cycle
    - Out-of-order execution
    - Quad integer execution
    - Dual floating-point execution

Motion video instructions

Tournament predictor

Dynamic JSR/JMP predictor

64 KB I-cache, 2-way set associative

64 KB D-cache, 2-way set associative

CMOS6 0.35 µm, 2.0 V, 6LM

Four major interfaces:

Bcache interface
    - Supports external data and tag stores; data path 128 bits (16 bytes)
      wide, 16 bits of ECC. Tag size is 14 bits of address, 4 bits of control
      (including parity bits). The EV6 cache line is 64 bytes.

System interface
    - Bidirectional SysData bus
    - Unidirectional SysAddOut bus
    - Unidirectional SysAddIn bus

Clock interface
    - Internal PLL
    - Y-Divisor

SROM interface


CPU Subsystem
The CPU subsystem is a module that plugs into the main logic module and consists of the Alpha
processor and its board-level cache (Bcache). The DS20E system supports the EV6 and EV67
processors at various speeds. The EV6 and EV67 share a common pin interface to the system
and use the same 587-pin CPGA package.
The DS20E supports up to two processor modules. Each processor module contains a 21264
microprocessor. The 21264 microprocessor is a superscalar, superpipelined implementation of
the Alpha architecture with out-of-order execution and speculative execution to maximize speed
and performance. It runs at an optimized price/performance speed of 500 MHz or 667 MHz. It
contains four integer execution units and dedicated execution units for floating-point add,
multiply, and divide. The chip contains a 64 KB set-associative instruction cache (I-cache)
and a 64 KB set-associative data cache (D-cache). Support for a larger L2 cache is provided
by a private Bcache.

NOTE: If two CPUs are installed, the first CPU must be in CPU slot 0, and both
CPUs must have the same Alpha chip clock speed selected.
The EV6/EV67 chip has the following four interfaces to external logic:

Bcache Interface
This interface supports external data and tag stores. The data path is 128 bits (16 bytes) wide
with 16 bits of ECC (8 bits per 8 bytes). The tag size supported in the DS20E system has 14 bits
of address and 4 bits of control (including parity bits). The 22-bit index is used to address the
data store and tag store. The EV6/EV67 cache line is 64 bytes.

Clock Interface
The EV6/EV67 has an internal phase-locked loop (PLL) that it uses to generate all its internal
and external clocks. To drive the PLL, a clock signal (CLKIN) is provided by the system. The
system also multiplies CLKIN to produce the internal EV6/EV67 clock (GCLK). GCLK is used
for all internal clocking. It is also used to derive the clocks for the Bcache interface and the
system interface, both of which use forwarded clocks (BC_CLK and FWD_CLK).

SROM Interface
The SROM interface has two uses:

To read a serial bit stream of initialization information and code after power-up reset

To communicate with an RS232 style serial port as a backdoor console mechanism


System Interface
The system interface consists of the following:

SysData bus
    - Bidirectional
    - 64 bits wide
    - 8 bits of ECC

    This bus uses forwarded clocking in either direction. There is a clock
    in each direction for each group of 16 SysData + 2 SySECC signals.
    The data is transferred on either edge of the forwarded clock;
    therefore, if the FWD_CLK is 166.66 MHz, the data-rate or bit-rate on
    this bus is 333.33 MHz.

SysAddOUT bus
    - Unidirectional from the EV6/EV67 to the system
    - 15 bits wide with clock

    This bus is used to send a command/address packet to the system.
    The clock is the same rate as the FWD_CLK.

SysAddIN bus
    - Unidirectional from the system to the EV6/EV67
    - 15 bits wide with clock

    This bus is used to send response/probe packets from the system to
    the EV6/EV67. The forwarded clock scheme is necessary to produce
    the extremely high data rates. For a detailed description of the
    forward clocking scheme, refer to the EV6/EV67 specifications and
    the Tsunami specifications. This point-to-point interface is terminated
    on both ends.
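
The SysData description above states that data moves on either edge of the forwarded clock, so a 166.66 MHz FWD_CLK yields a 333.33 MHz data rate. As a rough sketch, the same reasoning can be extended to a peak raw byte rate for the 64 data bits. The resulting MB/s value is derived arithmetic, not a specification quoted in this guide, and the function name is an illustrative assumption.

```python
def sysdata_rates(fwd_clk_mhz=166.66, data_bits=64):
    """Return (transfers per second in millions, peak raw MB/s) for a bus
    that moves one data word on each edge of the forwarded clock.
    ECC bits are carried alongside the data and are not counted here."""
    transfers_mhz = fwd_clk_mhz * 2          # two edges per clock period
    bytes_per_transfer = data_bits // 8      # 64 data bits = 8 bytes
    return transfers_mhz, transfers_mhz * bytes_per_transfer

rate_mhz, peak_mb_s = sysdata_rates()
print(f"data rate {rate_mhz:.2f} MHz, peak raw {peak_mb_s:.0f} MB/s")
```

This is an upper bound per CPU port under the stated assumptions; sustained throughput depends on protocol overhead that the sketch does not model.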


Cross-Bar Switch
The system switch has the following performance features:

Supports 21264 CPUs

Supports two 64-bit, 33-MHz PCI buses, each with its own PCI address space

Supports a large range of main memory capacity using 16 MB or 64 MB synchronous DRAMs

Provides low-latency memory access (120 ns CPU access using 83-MHz DRAMs)

Supports ECC in main memory

Provides system clock periods of 12 ns

Provides very high bandwidth
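
Two of the figures in this list are related: an 83.33 MHz memory clock has a period of about 12 ns, and the quoted 120 ns CPU access then corresponds to roughly ten system clock periods. The short sketch below shows the conversion; the helper names are illustrative, and the cycle count is an inference from the two quoted numbers rather than a stated specification.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

def cycles_for(latency_ns, freq_mhz):
    """Number of whole clock periods closest to a given latency."""
    return round(latency_ns / period_ns(freq_mhz))

system_period = period_ns(83.33)          # memory/system clock from the text
access_cycles = cycles_for(120, 83.33)    # 120 ns CPU memory access
print(f"{system_period:.2f} ns per cycle, about {access_cycles} cycles per access")
```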

The cross-bar switch consists of the following components:

D-chip (data slice)


The data chip provides the interface between the memory data bus and the EV6/EV67 data bus
and the I/O subsystem data bus. Each D-chip accommodates 18 bits of EV6/EV67 data, 72 bits
of memory data and 18 bits of I/O data. The DS20E system uses eight D-chips to produce two
256-bit (+ECC) memory data buses, four 72-bit EV6 data buses, and two 36-bit I/O data buses.
The D-chip has the following interfaces:

Two memory-bus data ports operating as a single 72-bit wide port

Four CPU data ports

Two P-chip data ports to the P-chip and D-chip bus (PADbus)

Control from C-chip (CPM/PAD). (The D-chip receives all of its commands from the C-chip.)

Test, reset, and clocks
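
The memory side of the data slicing can be checked with simple arithmetic: eight D-chips at 72 memory bits each should account for two memory buses of 256 data bits plus ECC. The sketch below covers only the memory data path and takes the 32 ECC bits per bus from the Memory Subsystem description; the constant and function names are illustrative assumptions.

```python
D_CHIPS = 8                 # D-chips in the DS20E
MEM_BITS_PER_DCHIP = 72     # memory data bits handled by each D-chip
MEMORY_BUSES = 2            # independent memory data buses
DATA_BITS_PER_BUS = 256
ECC_BITS_PER_BUS = 32       # per the Memory Subsystem description

def memory_bits_per_bus(d_chips=D_CHIPS,
                        bits_per_chip=MEM_BITS_PER_DCHIP,
                        buses=MEMORY_BUSES):
    """Total memory-side bits across all D-chips, split evenly per bus."""
    return (d_chips * bits_per_chip) // buses

per_bus = memory_bits_per_bus()
print(per_bus, per_bus == DATA_BITS_PER_BUS + ECC_BITS_PER_BUS)
```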


C-chip (controller chip)


The controller chip provides the interface with the EV6/EV67 command/address buses,
implements the main memory control, and provides the interface to the I/O subsystem. All
system control and addressing information passes through this chip. The C-chip also provides an
interface to the TIG bus, which is used for accessing flash ROM, I/O system interrupts, and
some general purpose system I/O registers.
The C-chip has the following interfaces:

Two independent CPU SysAdd ports

Four independent DRAM command and address ports

One bidirectional P-chip command and address port

One D-chip control port

TIG bus port for interrupts, flash ROM, etc.

Miscellaneous test, reset, and clock interfaces

P-chip (peripheral interface chip)


The PCI interface chip uses the CAP bus as its interface to the C-chip for address control
information and the PAD bus as its interface to the D-chips for data. The interface at the other
end of this chip is a 64-bit/33-MHz PCI bus. The Tsunami chipset can be configured to have
one or two P-chips. Because each P-chip needs a 36-bit PAD Bus, at least four D-chips are
needed to support two P-chips. The DS20E system uses two P-chips to produce two 64-bit PCI
buses (PCI-0 and PCI-1), both of which operate at 33 MHz.
The P-chip has a cycle time of 10 ns for the system interface and a cycle time of 30 ns for the
PCI interface. It is able to run a 30 ns PCI bus with a 12 ns to 15 ns system interface.
It has the following interfaces:

PCI bus: A single 64-bit PCI implementation running at 33 MHz.

PCI central resource functions: Arbitration and PCI clock sourcing.

D-chip port to the PAD bus: 40 bits for 4 bytes of data plus check bits.

C-chip command and address port to the CAP bus: It takes two cycles to transfer a
command and address in either direction.

Test, reset, and clocks.


Memory Subsystem
The system main memory is a synchronous DRAM-based memory with a maximum clock rate
of 83.33 MHz. The DRAMs are mounted on 200-pin JEDEC standard DIMMs. Each DIMM
supports a 72-bit data bus (64 bits of data and 8 checkbits). The checkbits provide single-bit
correction/double-bit detection across each 64 bits. The system supports 16 DIMMs arranged in
four arrays of four DIMMs each. Memory is organized on two 256-bit plus ECC-bit buses. Each
bus can support up to two memory banks (a memory option) made up of four DIMMs. Memory
can be configured from a minimum of 256 MB (1 MS340-CA) to 4 GB (4 MS340-EA).
The two memory buses transfer data between the cross-bar switch and main memory. Each
DIMM bank provides 256 bits of data plus 32 ECC bits for the 32 bytes of data transferred. Two
modules in each bank provide the odd bytes of data and the other two modules provide the even
bytes of data.
The interface to the main memory subsystem occurs by means of the cross-bar switch. Each
array has a unique address port from the cross-bar C-chip. A four-DIMM array provides a data
bus width of 256 bits (+ECC). The DS20E uses two four-DIMM arrays.
For the memory subsystem to work properly, you must follow configuration rules and use only
qualified DIMMs.
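
The 8 check bits per 64 data bits match what a Hamming-style single-error-correct, double-error-detect (SECDED) code requires. The guide does not name the specific ECC code used, so the following is only a sketch of the standard bound under that assumption.

```python
def secded_check_bits(data_bits):
    """Minimum check bits for a Hamming SECDED code over data_bits.

    A single-error-correcting Hamming code needs the smallest r with
    2**r >= data_bits + r + 1; one extra overall parity bit adds
    double-error detection.
    """
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1

check_bits = secded_check_bits(64)
print(check_bits, 64 + check_bits)
```

For 64 data bits the bound gives 8 check bits, which agrees with the 72-bit DIMM data bus described above.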

Memory Configuration Rules

A memory option consists of four DIMMs.

All four DIMMs in an option must be the same size.

The largest memory option goes in bank 0.

Memory options are installed in the following bank order: 0, 1, 2, and 3.

Other memory options must be the same size or smaller than the first memory option.

Only DIMMs qualified by Compaq are guaranteed to work.

Use DIMMs made by the same supplier in all slots.

Holes in memory are supported.
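
The rules above can be expressed as a small checker. This is a hypothetical sketch, not a Compaq tool: the function name, the input layout (a list indexed by bank number, each entry a list of DIMM sizes in MB or an empty list), and the reading of the bank-order rule as "no populated bank after an empty one" are all assumptions. It does not check DIMM qualification or supplier.

```python
def check_memory_config(banks):
    """Return a list of rule violations for a proposed bank layout.

    banks[n] holds the DIMM sizes (in MB) installed in bank n,
    or an empty list if the bank is unpopulated.
    """
    problems = []
    seen_empty = False
    first_size = None
    for n, dimms in enumerate(banks):
        if not dimms:
            seen_empty = True
            continue
        if seen_empty:
            problems.append(f"bank {n} is populated but a lower bank is empty")
        if len(dimms) != 4:
            problems.append(f"bank {n} has {len(dimms)} DIMMs, expected 4")
        if len(set(dimms)) > 1:
            problems.append(f"bank {n} mixes DIMM sizes")
        size = max(dimms)
        if first_size is None:
            first_size = size
        elif size > first_size:
            problems.append(f"bank {n} is larger than the first installed option")
    return problems

ok = check_memory_config([[256] * 4, [128] * 4])
bad_order = check_memory_config([[64] * 4, [256] * 4])
print(ok, bad_order)
```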

Qualified DIMMs
The following DIMMs are qualified; others may be added in the future.

Sales Part Number   Description
FR-MS340-CA    256MB ECC Memory DIMMs (4x64MB)
FR-MS340-DA    512MB ECC Memory DIMMs (4x128MB)
FR-MS340-EA    1GB ECC Memory DIMMs (4x256MB)


Typical Memory Configurations


Array 0      Array 1      Array 2      Array 3      Total
32MB x 4     32MB x 4                               256MB
32MB x 4     32MB x 4     32MB x 4     32MB x 4     512MB
32MB x 4     64MB x 4     64MB x 4                  640MB
64MB x 4     64MB x 4                               512MB
64MB x 4     64MB x 4     64MB x 4     64MB x 4     1GB
128MB x 4    128MB x 4                              1GB
128MB x 4    128MB x 4    128MB x 4    128MB x 4    2GB
256MB x 4    128MB x 4                              1.5GB
256MB x 4    256MB x 4                              2GB
256MB x 4    256MB x 4    256MB x 4    256MB x 4    4GB
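
Each total in the typical-configurations table is the sum over populated arrays of four DIMMs times the DIMM size. The sketch below recomputes the totals from the per-array DIMM sizes listed in the table; the row layout (a tuple of per-array DIMM sizes in MB, with 0 for an empty array) is an illustrative assumption.

```python
DIMMS_PER_ARRAY = 4

# Per-array DIMM size in MB for arrays 0 through 3; 0 means not populated.
TYPICAL_CONFIGS = [
    (32, 32, 0, 0),
    (32, 32, 32, 32),
    (32, 64, 64, 0),
    (64, 64, 0, 0),
    (64, 64, 64, 64),
    (128, 128, 0, 0),
    (128, 128, 128, 128),
    (256, 128, 0, 0),
    (256, 256, 0, 0),
    (256, 256, 256, 256),
]

def total_mb(config):
    """Total system memory in MB for one row of the table."""
    return sum(size * DIMMS_PER_ARRAY for size in config)

totals = [total_mb(row) for row in TYPICAL_CONFIGS]
for row, total in zip(TYPICAL_CONFIGS, totals):
    print(row, f"{total} MB")
```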

I/O Subsystem
The DS20E I/O subsystem comprises the following I/O bus interfaces along with the Cypress
Bridge chip:

PCI

ISA

TIG

PCI Interface
This interface consists of two PCI buses, PCI-0 and PCI-1. Both are 64-bit buses with three PCI
slots each, but PCI-0 also connects to a Cypress chip and an Adaptec SCSI controller. The
Cypress bridge chip (also referred to as the CY82C693U or simply the 693U) is a multifunction
Type 0 device. The 693U is on the PCI-0 bus and implements the following important
functions:

ISA Bridge
This chip implements a bus that performs most of the functions of an ISA bus. The chip is used
primarily for legacy purposes and is not expected to be used for any new devices. The Cypress
bridge implementation is not Intel SIO compatible, but it includes direct memory access (DMA)
and an interrupt controller.

Keyboard/Mouse
These interfaces are ISA-based and 8242 compatible. An 8051 style microcontroller with a
built-in ROM is present. The DS20E system implements PS/2 style keyboard/mouse ports that
are transparent to existing drivers for such ports.

Real-Time Clock
The Dallas 1287A compatible real-time clock (RTC) interface that is implemented in the
Cypress bridge (693U) is transparent to existing drivers. A square wave output is provided by
the Cypress bridge and is routed to the Tsunami C-chip to generate an RTC interrupt to the
processor. Routing the RTC interrupt through ISA interrupt request priority 8 (IRQ8) can be
masked under program control.

Interrupt Controller
This dual-stage interrupt controller is program-compatible with the one used in the Intel SIO
bridge. This controller can accept 16 edge-triggered interrupt requests (IRQs) from the ISA
sources (including internal ISA sources) and four PCI level interrupts. (The PCI interrupt inputs
are not used in the DS20E system.) The Cypress bridge gathers all these interrupts and
compares them to a programmable mask to produce a level interrupt signal INTR to present to
the system interface.
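
The gather-and-mask step described above can be modeled as a bitwise operation over the 16 ISA request lines. This is a simplified sketch that ignores priority resolution and the two-stage cascade. It assumes the 8259-style convention in which a set mask bit disables the corresponding IRQ; the guide does not state the mask polarity, and the function name is illustrative.

```python
IRQ_LINES = 16

def intr_asserted(pending, mask):
    """Return True if any pending IRQ is not masked.

    pending and mask are 16-bit integers, bit n corresponding to IRQn.
    Assumption: a 1 in mask disables that IRQ (8259-style convention).
    """
    all_lines = (1 << IRQ_LINES) - 1
    return (pending & ~mask & all_lines) != 0

rtc_pending = 1 << 8                        # IRQ8, used here for the RTC
print(intr_asserted(rtc_pending, 0))        # IRQ8 unmasked
print(intr_asserted(rtc_pending, 1 << 8))   # IRQ8 masked
```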

Enhanced IDE
This two-channel PCI-based enhanced IDE interface operates at up to 16.67 MB/s (Type 4
transfers) and implements bus mastership. The DS20E system uses only one of the two
channels, allowing a maximum of two IDE drives in the system.

Power Management Support


This feature of the Cypress bridge implements all the advanced configuration and power
interface (ACPI) registers and generates a system management interrupt (SMI) on various
programmable wake-up conditions. A sleep mode is implemented that allows parts of the chip to
have no power while the remaining parts have power through special auxiliary power pins.

ISA Interface
This interface provides a link to legacy ISA options. The Cypress PCI-ISA bridge provides the
interface between PCI-0 and this ISA bus. The ISA bus supports the following components:

Super I/O Chip


This 5V chip (FDC37C669) provides support for the following legacy functions:

Serial Ports: The Super I/O chip contains two 16550-compatible UARTs that provide
16-byte send/receive FIFOs. The maximum achievable baud rate on this port is 230 K. It has
a programmable baud rate generator and modem control circuitry. The port is fully
compliant with legacy ISA standards.

Multimode Parallel Port: This port supports standard, enhanced, and extended
modes of operation. In standard mode, it is a PS/2, PC/AT compatible, bidirectional
parallel port. In enhanced mode (EPP), it is an IEEE 1284 compliant interface. In extended
high-speed mode, it is an extended capabilities port (ECP) that is also IEEE 1284
compliant.

Floppy Disk Interface: This is a 2.88 MB Super I/O Floppy Disk Controller that is
software and register compatible with the 82077A. It supports two floppy drives directly and
a vertical recording format (VRF). It has a 16-byte data FIFO and detects all
overrun/underrun conditions. It has direct memory access (DMA) enable logic and a
nonburst DMA option. It is IBM compatible.


ISA Option Slot


The ISA option slot is provided for ISA options such as internal FAX modems.

Timing Interrupt and General (TIG) Interface


The TIG interface is a private 8-bit data/address bus off the C-chip. It includes some control
signals and is used to access the flash ROM, general purpose I/O registers, and interrupts.
The TIG bus is controlled by the cross-bar C-chip. It is used for gathering and sending interrupts
to the EV6/EV67 processor and for accessing I/O space in what is defined as TIG address space.
The TIG bus is operated asynchronously with programmable timing for accesses available in the
C-chip CSRs.


Flash ROM
The flash ROM contains the diagnostics, a fail-safe loader, and console firmware. It sits on the
TIG bus and interfaces with the system through the cross-bar C-chip.

Configuration Registers and Switchpack


The CSR registers and switchpack data include module information, interrupts, and clock
information.

IRQs
System interrupts for PCI devices and all onboard devices, including the CPUs and memory, are
passed through the TIG bus to the cross-bar C-chip.


Environmental Logic
The server features module V2 (SFM2) monitors environmental conditions. This module
supports two system fans and three power supplies in an N+1 configuration. The power supplies
may be hot-swapped. The SFM2 also monitors the state of the CPU fans.


SFM2 Status LEDs


The SFM2 has 7 status LEDs. The VAUX5 LED is lit whenever AC power is supplied to the
system.
SFM2 Status LEDs

On SFM2 Module    Description
POK0              When lit, indicates power supply 0 has passed its self-test and is
                  running okay.
POK1              When lit, indicates power supply 1 has passed its self-test and is
                  running okay.
POK2              When lit, indicates power supply 2 has passed its self-test and is
                  running okay.
SYS FANS OK       When lit, indicates both system fans (fans 0 and 1) are operating.
CPU FANS OK       When lit, indicates CPU fans are operating. If a CPU fan fails, it
                  will result in a fatal system error and the system will shut down
                  in 30 seconds or less.
TEMP OK           Indicates the system temperature is below 55 °C.

SFM2 Power Supplies


The DS20E is designed to operate with two power supplies even when fully configured. A third
supply can be added for redundancy. Each power supply indicates its status by asserting POK
when it is operating properly. If a power supply is removed or fails, its POK signal is removed
and the change is detected by the PAL the next time the power is cycled. If the loss of a power
supply results in an invalid power supply configuration, the system will not power up.


SFM2 Inverter
The inverter inverts the PSn OK signals sent to the LEDs so that each LED lights when its
power supply is good.

SFM2 PAL
The function of the SFM2 PAL is to monitor the power environment to determine whether or
not the power supplies can be enabled. It also determines the power supply configuration and
signals a shutdown if the configuration is invalid.
To enable the power supplies, PS_EN is sent to all power supplies. It is generated from signals
produced by the following:

On/Off switch

Door interlock

Environmental fault shutdown signal

Disable remote shutdown switch setting

Power enable from the remote management console (RMC) controller.

The state machine cycles as follows:


1. From the reset state (STA_0) to the delay state (STA_1) to allow the power supplies to
respond
2. From the delay state (STA_1) to the powered state (STA_3)
3. From the powered state, the state machine can go directly back to STA_0 (normal
shutdown) or through STA_2 to STA_0 for abnormal shutdowns. For abnormal
shutdowns, the state machine will wait in the STA_2 state until the power is cycled off
and on.
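
The cycle described above can be sketched as a small transition table. This is an illustrative Python model only, not the PAL equations: the state names come from the text, while the event names (`enable`, `delay_done`, `normal_off`, `fault`, `power_cycled`) are hypothetical labels introduced for the example.

```python
from enum import Enum


class State(Enum):
    STA_0 = "reset"
    STA_1 = "delay"
    STA_2 = "abnormal shutdown, waiting for a power cycle"
    STA_3 = "powered"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.STA_0, "enable"): State.STA_1,        # allow supplies time to respond
    (State.STA_1, "delay_done"): State.STA_3,    # supplies are up
    (State.STA_3, "normal_off"): State.STA_0,    # normal shutdown
    (State.STA_3, "fault"): State.STA_2,         # abnormal shutdown
    (State.STA_2, "power_cycled"): State.STA_0,  # leaves STA_2 only on a power cycle
}


def next_state(state, event):
    """Return the next state; events with no listed transition are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.STA_0):
    """Apply a sequence of events starting from the reset state."""
    for event in events:
        state = next_state(state, event)
    return state
```

In this sketch, an `enable` event received while in STA_2 is ignored, which mirrors the statement that the state machine waits there until power is cycled off and on.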
To determine if the power supply configuration is valid, the PAL monitors the POK signals
from each power supply. PS_OK is asserted if there are at least two good power supplies.
To determine which power supplies are present, the PAL monitors the POK signals from each
power supply. When a POK is initially asserted high, the corresponding PS_PRESENT is
asserted and remains asserted until the ON/OFF switch is cycled or AC power is interrupted.
PS_PRESENT is a state bit that represents whether a particular POK has ever been asserted
during a particular powered-up session.
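
The PS_OK and PS_PRESENT behavior described above can be modeled in a few lines. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, POK signals are represented as booleans, and the real logic is implemented in the PAL rather than in software.

```python
class PowerSupplyMonitor:
    """Illustrative model of the PS_OK and PS_PRESENT logic (names are hypothetical)."""

    def __init__(self, supplies=3):
        # PS_PRESENT state bits, one per supply; cleared at the start of a session.
        self.ps_present = [False] * supplies

    def sample(self, pok):
        """Take the current POK values and return PS_OK.

        A PS_PRESENT bit is set the first time its POK is seen asserted and
        stays set for the rest of the powered-up session.
        """
        for index, asserted in enumerate(pok):
            if asserted:
                self.ps_present[index] = True
        good = sum(1 for asserted in pok if asserted)
        return good >= 2  # PS_OK needs at least two good supplies

    def power_cycle(self):
        """Model cycling the On/Off switch or interrupting AC power."""
        self.ps_present = [False] * len(self.ps_present)
```

Note that a supply that fails after it was first seen keeps its PS_PRESENT bit, so the model can tell a failed supply apart from one that was never installed.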

SFM2 System Fans Sense Logic


The system is configured with two system fans. Both fans should be installed and operating for
optimal system cooling; however, if one fan fails, the system will continue to function with the
other fan running. The system fans sense logic monitors the system fans, and if both fans are
operating, this logic asserts SYS_FAN_OK_L. If one or both fans fail, the system fans sense
logic asserts SYS_FANFAIL_L. Scan logic common to the system fans and the CPU fans
determines which of the system or CPU fans failed.


SFM2 CPU Fans Sense Logic


Each CPU has a fan to help cool the Alpha processor chip. These fans are monitored by the
CPU fans sense logic, which also monitors the CPU Present signals to determine which CPUs
are present. If the fan for each CPU that is present is operating, this logic asserts
CPU_FANS_OK_L. If a fan fails, the CPU fans sense logic asserts CPU_FANFAIL_L.
CPU_FANS_OK_L is sent to the 30-second error shutdown logic and is used to cause a
shutdown if it becomes non-asserted. Similarly, CPU_FANFAIL_L is used to power the LED as
long as it remains non-asserted. The CPU and system fans sense logic together assert
FAN_FAIL_L if either a CPU or a system fan fails. Scan logic common to the system fans and
the CPU fans determines which of the system or CPU fans failed. This information is sent to the
I²C logic.
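
The relationship between the fan signals can be summarized as follows. This is a sketch that models logical assertion only (True means "asserted"); it does not model the electrical level of the `_L` signals, and the function and parameter names are hypothetical.

```python
def fan_sense(sys_fans_running, cpu_present, cpu_fans_running):
    """Return which fan signals are asserted (True = asserted).

    sys_fans_running: one boolean per system fan.
    cpu_present / cpu_fans_running: one boolean per CPU slot.
    """
    sys_ok = all(sys_fans_running)
    # Only fans belonging to CPUs that are present are considered.
    cpu_ok = all(
        running
        for present, running in zip(cpu_present, cpu_fans_running)
        if present
    )
    return {
        "SYS_FAN_OK_L": sys_ok,
        "SYS_FANFAIL_L": not sys_ok,
        "CPU_FANS_OK_L": cpu_ok,
        "CPU_FANFAIL_L": not cpu_ok,
        "FAN_FAIL_L": not (sys_ok and cpu_ok),
    }
```

For example, a single-CPU system with both system fans running asserts both OK signals even though the second CPU fan is not running, because that CPU is not present.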

SFM2 30-Second Shutdown


Fault shutdown normally occurs when a CPU fan fails, the internal ambient temperature exceeds
a preset threshold (set to 50 or 55 °C), or a power supply failure results in an invalid power
supply configuration. Invalid power supply configuration means there is only one good power
supply remaining. It can take multiple cycles of the 30-second shutdown timer for the power
supply configuration to result in a final shutdown. Overcurrent sensing in the power supply
protects the system from damaging itself and will shut off power more quickly if necessary.
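
The three fault conditions listed above can be expressed as a simple check. This is a hedged sketch, not the SFM2 logic itself: the function name, parameters, and reason strings are hypothetical, and the 55 °C default is one of the two threshold values mentioned in the text.

```python
SHUTDOWN_DELAY_SECONDS = 30  # length of the shutdown timer named in the text


def shutdown_reasons(cpu_fans_ok, temperature_c, good_supplies, threshold_c=55):
    """Return the list of conditions that would start a fault shutdown."""
    reasons = []
    if not cpu_fans_ok:
        reasons.append("CPU fan failure")
    if temperature_c > threshold_c:
        reasons.append("overtemperature")
    if good_supplies < 2:  # only one good supply left is an invalid configuration
        reasons.append("invalid power supply configuration")
    return reasons
```

An empty list means no fault shutdown is pending. A system fan failure is deliberately absent from the checks, since the text lists only CPU fan failure, overtemperature, and an invalid power supply configuration as shutdown causes.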

SFM2 Temperature Sensor


This circuitry senses the internal ambient temperature and asserts TEMP_OK if the temperature
remains below a preset threshold set by the firmware or software (usually set to 50 or 55 °C).

SFM2 Remote Management Controller Microprocessor


The remote management controller (RMC) microprocessor allows the system to be managed
from a remote terminal. The RMC micro generates control signals in response to commands
from the remote console. Among these control signals is SYSPWR_ENABLE_L, which is used
to power the system on and off remotely. Other functions of this circuit include:

Interfacing with the I²C bus

Handling remote reset and halt requests

Transferring of serial data over the communications port


Maintenance Bus (I²C Bus)


There are two maintenance (I²C) buses. The standard or internal I²C bus performs the following
functions:

Monitor system conditions

Display faults

Log error state

Track system configuration information

The private I²C bus between memory and the C-chip is used to provide memory configuration to
the consoles and operating systems.


Monitoring System Conditions


The I²C bus monitors the state of system conditions scanned by the power control (PC) logic. The
PC logic writes data to two registers:

One records the state of the fans and power supplies and is latched when there is a fault.

The other causes an interrupt on the I²C bus when a CPU or system fan fails, an
overtemperature condition exists, or power supplied to the system changes from N+1 to
N or from N to N+1.

The interrupt received by the I²C bus controller and passed on to P-chip 0 alerts the system to a
power system event that may or may not cause a power shutdown. If power loss is imminent, the
controller has 30 seconds to read the two registers and store the information in the NVRAM on
the server features module. The SRM console show power command reads these registers.
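
The interrupt conditions for the second register can be sketched as below. This is an illustrative model only: the function and parameter names are hypothetical, and N is taken to be 2 because the text states that a fully configured system operates with two power supplies.

```python
N = 2  # supplies required by a fully configured system, per the text


def raises_interrupt(prev_good, cur_good, fan_failed, overtemperature):
    """Return True if the described conditions would raise an I2C interrupt.

    prev_good / cur_good: number of good power supplies before and after
    the latest sample.
    """
    # A change between N+1 and N in either direction counts as a power event.
    redundancy_changed = {prev_good, cur_good} == {N, N + 1}
    return fan_failed or overtemperature or redundancy_changed
```

Both losing redundancy (3 to 2 supplies) and regaining it (2 to 3) raise the interrupt in this model, which is why the event "may or may not" lead to a shutdown.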

Fault Display
The OCP display is written by means of the I²C bus.

Error State
Error state is logged by the I²C controller. The error state for power, fan, and overtemperature
conditions is stored for access when there is a fault.

Configuration Tracking
Each CPU, each logical section of the system board (the PCI bridge, the PCI backplane, the
power control logic, the remote console manager), and the system board itself has an EEPROM
that contains information about the module. This information can be written and read over the
I²C bus. All EEPROMs contain the following information:

Module type

Module serial number

Hardware revision for the logical block

Firmware revision
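
The four fields listed above map naturally onto a small record type. The sketch below is only an illustration of how such records might be handled by a tool; the class name, field names, and field types are assumptions, and the actual EEPROM byte layout is not described in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleEeprom:
    """One EEPROM record; field names and types are hypothetical."""
    module_type: str
    serial_number: str
    hardware_revision: str
    firmware_revision: str


def inventory(records):
    """Group serial numbers by module type for a configuration report."""
    result = {}
    for record in records:
        result.setdefault(record.module_type, []).append(record.serial_number)
    return result
```

A configuration report built this way lists every module type once, with the serial numbers of all modules of that type.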

Chapter 3

System Installation

Introduction
This chapter explains how to install a DS20E system and boot the operating system. Topics
in this chapter are:

Preparing for Installation

Positioning the System

Connecting the System

Verifying Hardware Installation

Updating the Firmware

Locking the System

Installing a Rackmount System

Starting a Tru64 UNIX Installation

Booting Tru64 UNIX

Installing OpenVMS

Booting OpenVMS

Installing Linux

Booting Linux

Compaq Confidential Need to Know Required


Writer: Bob Young Project: Compaq AlphaServer DS20E Comments:
Part Number: xxxxxx-xxx File Name: DS20E_svc3.doc Last Saved On: 2/7/00 3:44 PM


Preparing for Installation


System Inventory
For a new installation, check to make sure you have the components listed on the shipping
list that came with the system, and note the items for later reference.
For a reinstallation, make sure that all the components for this system configuration are
available.

Preinstallation Checklist
Before you install the system, perform the following checks:
1. Review the information supplied with the system, including user documentation.

2. Select a well-ventilated site for the system near a grounded power outlet and away from
   sources of excessive heat. The site should also be isolated from electric noise (for example,
   spikes, sags, and surges) produced by devices such as air conditioners, large fans, radios,
   and televisions.

   WARNING: When unpacking and moving system components, be aware
   that some components may be too heavy for you to lift alone safely. If you
   are doubtful about whether you can lift these items alone, please get
   assistance.

   CAUTION: Remove any removable media (for example, a CD) from the
   drives before moving the system, to prevent damage to the media or the
   drives.

3. Save all shipping containers and packing material for repackaging or moving the system
   later.

NOTE: Before installing optional hardware or application software, start the
system and verify that the base system is working correctly.


Positioning the System


CAUTION: To ensure proper cooling, position the system so that air can flow freely
to and from the vents.

Keep in mind the environmental conditions, the power requirements, and the clearance
needed to access the system for servicing.


Connecting the System


Connect all devices as shown.
NOTE: All connectors are keyed and have icons to indicate the type of device
to be connected.


Verifying Hardware Installation


Perform the following steps to start the system:
1. Plug the power cord into the system and then into the wall outlet.

2. Apply power to any external devices, including the monitor.

3. Press the system unit power button.

4. After waiting for the monitor to warm up, if necessary, adjust the contrast and brightness to
   obtain a readable screen display.

5. See the information supplied with the monitor for adjustment instructions.

6. Allow the system to complete the power-on self-test (POST) and device initialization.
   (This takes about one minute.)

The POST firmware runs basic hardware tests on the following system components to make
sure the operating system firmware can start:

Memory

Cache

PCI data path

ISA data path

Flash ROM

During initialization, LED and beep codes show the current status and indicate initialization
problems. Initialization occurs during the power-up sequence or when an SRM init
command is issued.
If the system completes the POST and device initialization with no errors, the system was
correctly installed. If errors occur, refer to the Troubleshooting chapter for troubleshooting
procedures.


Shutting Down the System


Before turning off the system, save and close all open files according to the steps outlined
for the specific operating system. If you turn the system off without saving and closing
files, you might lose some or all of the work in process.
CAUTION: Do not turn off power to the system or peripherals until the shutdown
sequence is complete.

WARNING: Always disconnect the power cord from the wall before servicing
the system.

Shutting Down the Tru64 UNIX Operating System


Close any open application data files as well as any running applications. Most application
programs prompt you to save the information before closing.

NOTE: You must be a superuser to shut down the system.


1. Open a terminal window.

2. Type shutdown -h now and press Enter.

3. The system returns to the SRM console.

4. The system displays the prompt P00>>> when it is safe to turn off the power or restart the
   system.

5. To turn off the power, press the system unit power button.

WARNING: Always disconnect the power cord from the wall before
servicing the system.

Shutting Down the OpenVMS Operating System


Close any open application data files as well as any running applications. Most application
programs prompt you to save the information before closing.
To shut down the operating system, enter: @sys$system:shutdown


Updating the Firmware


The system flash ROM contains the power-on self-test (POST), AlphaBIOS console
firmware, and the SRM console firmware.
To update the firmware, see the Updating Firmware and Device Drivers section of the
Firmware chapter.
Consult the upgrade documentation for more information.

Locking the System


Systems have a key lock that is located on the front door to prevent unauthorized access.
The removable media devices and the system control panel are accessible through the upper
front door that is opened by sliding down the lock latch shown.


Installing a Rackmount System


This section contains information for installing the DS20E server into the H9A10/H9A15
rack. Consult the documents listed below, if needed.

DS20E Rackmount Documentation

Title                                         Order Number
Rackmount Installation Template               EK-DS20E-TP (included in 3X-BA56RRC/RD/RA)
H9A10 M-Series Cabinet Interconnect           B-IC-H9A10-5-DBM
H9A10 M-Series Cabinet Configurations         B-IB-H9A10-5-DBM
H9A10 M-Series Illustrated Parts Breakdown    EK-H9A10-IP
H9A15 M-Series Interconnect                   B-IC-H9A15-3-DBM
H9A15 M-Series Configurations                 B-IB-H9A15-3-DBM
H9A15 M-Series Illustrated Parts Breakdown    EK-H9A15-IP


Marking the Installation Area in the Rack


Determine the installation area as shown in the illustration.

Figure 3-1. Rackmount Installation Area
(The illustration labels the rail hole spacings as 0.500 inch, 0.625 inch, and 0.625 inch, with
1U equal to 1.75 inches.)

The installation of the rackmount system requires 8.75 inches (5U) of vertical height
in the rack.
1. Mark the midpoint hole on the vertical rail as shown in Figure 3-1. The midpoint hole
   must be selected so that the holes immediately above and immediately below are
   equidistant (0.625 inch).

2. Mark the corresponding hole on the other three rails.
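
The rack dimensions used in this procedure can be checked with simple arithmetic. The sketch below assumes the hole spacing pattern of 0.500, 0.625, and 0.625 inch per 1U stated for the rail; the function and constant names are hypothetical.

```python
U_INCHES = 1.75  # height of one rack unit (1U)

# Hole-to-hole spacings that repeat within each 1U of rail.
HOLE_SPACINGS_INCHES = (0.5, 0.625, 0.625)


def rack_height_inches(units):
    """Vertical space, in inches, occupied by the given number of rack units."""
    return units * U_INCHES


def is_midpoint_hole(gap_above, gap_below):
    """A midpoint hole has a 0.625 inch gap to the hole above and the hole below."""
    return gap_above == 0.625 and gap_below == 0.625
```

With these values, `rack_height_inches(5)` gives the 8.75 inches required for the system, and the three spacings add up to exactly one rack unit.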


Rack Accessories

Figure 3-2. M-Series Rack Accessories


Accessories List

Reference Number    Mounting Hardware
                    Vertical nut bar
                    10-32 x .375-inch hex head screw
                    Bracket slide, right
                    Chassis slide
                    Nut plate, horizontal, slide
                    Screw, M4 x 10 mm, Bossard
                    Bracket slide, left
                    Bar nut
                    Screw, flat head, M3 x 6 mm
                    Mounting rail, EIA (bars)
                    Front bezel
                    Actuator bracket, interlock
                    M5 x 8 mm pan head, square cone washer
                    Nut keps, M4


Preparing the System


To prepare the system for installation, attach the mounting brackets to the chassis and
attach the slide brackets to the slides.

Attaching Mounting Brackets

Figure 3-3. Attaching Mounting Brackets and Slides

CAUTION: The slides are lightly greased. Handle them carefully
to avoid soiling your clothing.


1. Attach the front mounting brackets along each edge, using three flat head Phillips
   screws per bracket.

2. Pull the narrow segment of the slide out and detach it completely by pressing the green
   release button and continuing to pull.

3. Attach the narrow segment of the slide to the system with five M4 x 10 mm Bossard screws.

4. Repeat the procedure for the other slide.


Attaching Slide Brackets to Slides


The illustration below provides steps for attaching the slide brackets to the slides.

Figure 3-4. Attaching Slide Brackets to Slides


The sliding segment of the slide has an access hole that provides access to three
mounting holes in the stationary segment. You use two of the mounting holes.

Front
1. Insert a cap screw through the access hole and the first (forward-most) mounting hole
   in the slide and through the hole in the slide bracket. Fasten with one two-hole nut
   bar and tighten.

2. Align the access hole with the third mounting hole in the slide.

3. Insert a cap screw through the access hole and the third hole in the slide and through the
   slot in the slide bracket. Fasten through the nut bar and tighten.

Back
1. Insert a screw through the two holes in the stationary segment of the slide and
   through a slot in the slide bracket. Attach to a two-hole nut bar.

Repeat the entire procedure for the other slide.


Preparing the Rack


Prepare the rack by attaching the slides to the rack rails.

Figure 3-5. Attaching Slide Brackets to Rails


Attaching Slide Brackets to Rails


Front
1. Starting at the top marked hole, put two hex screws through the rack rail and the slide
   bracket. Fasten with a 2-hole nut bar.

2. Fit the posts of a 2-post nut bar into the holes in the cabinet rail and slide bracket
   and fasten with nuts.

3. Repeat the procedure for the other rail.

Back
1. Starting at the top marked hole, put two hex screws through the rack rail and the slide
   bracket. Fasten with a 2-hole nut bar.

2. Fit the posts of a 2-post nut bar into the holes in the cabinet rail and slide bracket
   and fasten with nuts.

3. Repeat the procedure for the other rail.


Stabilizing the Rack


Figure 3-6. Activating the Stabilizer Foot

The system is intended for installation in one of the following racks, which are equipped
with a stabilizer bar:

H9A10 M-Series Medium Rack

H9A15 M-Series Tall Rack

Pull out the stabilizer bar and extend the leveler foot to the floor before installing the
system.
If you are using a rack other than those listed above, install rack stabilizing feet or provide
other means to stabilize the rack before installing the system.


Installing the System


Figure 3-7. Installing the System into the M-Series Rack

1. Extend the fixed portion of the chassis slide until you hear a click. Ensure that the inner
   ball bearing slide on the chassis slide is pulled to the front of the rail.

2. Align the narrow segment of the slides attached to the system with the slides attached to the
   rack, and slide the system onto the rail.

3. Depress the green release button on each side and slide the system completely into the rack.

WARNING: Make sure that all other hardware in the rack is pushed in and attached.
The system is very heavy. Do not attempt to lift it manually. Use a material lift or
other mechanical device.


Installing U-Nuts
Install U-nuts and shipping screws as follows:

Figure 3-8. Attaching the System to the Rack

1. Install U-nuts at marked locations for two shipping screws.

2. Install two 10-32 x .500-inch hex head shipping screws and tighten.


Installing the Interlock System


The interlock system ensures rack stability by allowing only one rackmount server at a time
to be pulled out of the rack. The stabilizer bracket and actuator latch work only in a rack
equipped with the interlock system. Follow the instructions on the next page for installing
the interlock system.

Figure 3-9. Installing the Interlock System


CAUTION: If you are installing a rack that does not have the
interlock system, you must ensure rack stability by installing rack
stabilizing feet or by some other means.


1. At the back of the rack, release the vertical bar of the interlock system.

2. Insert the stabilizer bracket and the actuator latch into the vertical bar so that the
   actuator latch is below the stabilizer bracket.

3. Reinstall the vertical bar.

4. Secure the stabilizer bracket to the two remaining marked holes on the right rack rail with
   two 10-32 x .500-inch hex screws. Tighten into the U-nuts.

5. Install the trip mechanism onto the chassis using two M5 x 8 mm screws.

6. Vertically position the actuator latch such that the trip mechanism on the system
   aligns with the actuator latch.

7. Rotate the actuator latch to orient it like the other actuator latches on the vertical bar.

8. Tighten the Allen screws on the actuator latch.


Installing the Cable Management Arm


Attach the cable management arm to the rear rails of the rack as shown below.

Figure 3-10. Installing the Cable Management Arm


NOTE: Be sure that you have attached all cables to the rear of the unit
before installing the cable management arm.

1. Clip U-nuts over the holes in the vertical rail corresponding to the holes in the cable
   management bracket.

2. Attach the cable management bracket to the rack with two 10-32 x .5-inch screws.

3. Attach the cable management bracket to the chassis with two M3 x 6 mm screws.


Dressing the Cables


Dress the cables through the cable clamps on the cable retractor assembly at the rear of the
system.

Figure 3-11. Dressing the Cables

1. Dress the cables through the cable clamps or tie wrap them to the cable retractor assembly.

2. Attach all cables to the member of the cable management arm that is attached to the
   system.

   CAUTION: Failure to attach the cables to the attached member of
   the management arm may cause cables to become disconnected.


Attaching the Front Bezel


To complete the installation, attach the front bezel as shown below.

Figure 3-12. Attaching the Front Bezel

1. Align the front bezel with the front of the system and snap it into place.


Starting a Tru64 UNIX Installation


To start an installation of Tru64 UNIX, follow the steps in the table below.

NOTE: The SRM console must be running to install Tru64 UNIX.


Step 1
  Action: At the SRM prompt, type:
          P00>>>set boot_osflags " "
  Result: Clears the boot_osflags variable.

Step 2
  Action: At the SRM prompt, type:
          P00>>>set auto_action halt
  Result: Halts the system at the console prompt each time the system is turned on,
          crashes, or when the Reset button is pushed.

Step 3
  Action: At the SRM prompt, type:
          P00>>>set os_type unix
          P00>>>init
  Result: Sets the operating system to Tru64 UNIX.

Step 4
  Action: Insert the Tru64 UNIX CD into the CD-ROM drive.
  Result: The CD-ROM drive is ready.

Step 5
  Action: At the SRM prompt, type:
          P00>>>show device
  Result: List of devices is displayed:
          dkc0.0.0.8.0       DKC0    SEAGATE ST39102LC  7B04
          dqa0.0.0.105.0     DQA0    TOSHIBA CD-ROM XM1702B  1150
          dva0.0.0.0.0       DVA0
          eia0.0.0.2005.0    EIA0    00-06-2B-00-6E-56
          pka0.7.0.6.0       PKA0    SCSI Bus ID 7
          pkb0.7.0.106.0     PKB0    SCSI Bus ID 7
          pkc0.7.0.8.0       PKC0    SCSI Bus ID 7

Step 6
  Action: From the SRM console, boot the Tru64 UNIX CD:
          P00>>>boot dka400
  Result: Installation information is displayed, and you are prompted to select an option.
          For more information, see the Tru64 UNIX installation guide.


Booting Tru64 UNIX


Before booting the operating system for the first time, you may have to perform some of the
following tasks:

Verify that the console firmware version is correct.

Use the show and set console commands to check and set the required environment
variable. See the firmware chapter for more information about these commands.

Change the system startup or boot defaults.

Ensure that settings for environment variables match the system configuration.

Verifying the Firmware Version


The show version command displays the version of the SRM console that is installed on the
system. An example of the show version command is shown below. The show config
command includes console firmware version information as shown in the following partial
show config command example. Use this information to verify that the correct version of
firmware is installed.
P00>>>show version
version                 V5.4-4481 Aug 11 1999 11:41:48
P00>>> show config
                        AlphaPC 264DP 500 MHz

SRM Console:    V5.4-x
PALcode:        OpenVMS PALcode V1.42-32, Digital UNIX PALcode V1.40-35

Processors
CPU 0           Alpha 21264-3 500 MHz SROM Revision: V1.82
                Bcache size: 4 MB
CPU 1           Alpha 21264-4 500 MHz SROM Revision: V1.82
                Bcache size: 4 MB

Core Logic
Cchip           DECchip 21272-CA Rev 2
Dchip           DECchip 21272-DA Rev 2
Pchip 0         DECchip 21272-EA Rev 2
Pchip 1         DECchip 21272-EA Rev 2


Changing Startup and Boot Defaults


Default settings cause preinstalled operating systems to boot automatically from the system disk
after successful startup tests. To change default settings, use SRM console commands. For
example, you can reset the system to do the following:

Halt at the console prompt after the startup tests.

Boot the operating system from a different device.

To change how the system starts or boots the operating system, change default values for
environment variables.

Examples:
Set the system to autoboot.
P00>>>set auto_action boot
Set the system to halt at the console prompt after the startup tests.
P00>>>set auto_action halt
Change the default boot device.
P00>>>set bootdef_dev dka0

Set the operating system to UNIX.
P00>>>set os_type unix
Set autoboot, for a system that should come up automatically after a power failure.
P00>>>set boot_osflags a

Ensuring that Environment Variables Match System Configuration
Compare the system configuration with the environment variable settings. If necessary, use the
show config command and record the configuration. To see all environment variables, type
show * at the SRM prompt. The following example shows how to display variables one screen
at a time.

Example:
P00>>> show | more


Installing OpenVMS
After you boot the operating system CD, an installation menu is displayed on the screen.
1. Boot the OpenVMS operating system CD.

2. Choose option 1 (Install or upgrade OpenVMS Alpha).

To create the system disk, see the OpenVMS installation guide.

OpenVMS (TM) Alpha Operating System, Version V7.1-2


Copyright 1999 Digital Equipment Corporation. All rights reserved.
Installing required known files...
Configuring devices...
****************************************************************
You can install or upgrade the OpenVMS Alpha operating system
or you can install or upgrade layered products that are included
on the OpenVMS Alpha operating system CD-ROM.

Booting OpenVMS
OpenVMS can be booted from a CD on a local drive (the CD-ROM drive connected to the
system) or from a CD-ROM drive on an InfoServer.

Booting OpenVMS from the local CD-ROM Drive


1. Power up the system. The system stops at the SRM console prompt, P00>>>.

2. Set boot environment variables, if desired.

3. Install the boot medium. For a network boot, see Booting OpenVMS from the InfoServer.

4. Enter the show device command to determine the unit number of the drive for your device.

5. Enter the boot command. (If you have not set the associated environment variables, enter the
   command-line parameters along with the boot command.)


Example
P00>>>show device
dkc0.0.0.8.0       DKC0    SEAGATE ST39102LC  7B04
dqa0.0.0.105.0     DQA0    TOSHIBA CD-ROM XM-1702B  1150
dva0.0.0.0.0       DVA0
eia0.0.0.2005.0    EIA0    00-06-2B-00-6E-56
pka0.7.0.6.0       PKA0    SCSI Bus ID 7
pkb0.7.0.106.0     PKB0    SCSI Bus ID 7
pkc0.7.0.8.0       PKC0    SCSI Bus ID 7
P00>>>
.
.
.
P00>>> boot -flags 0,0 dqa0
(boot dqa0.0.0.105.1 -flags 0,0)
block 0 of dqa0.0.0.105.1 is a valid boot block
reading 898 blocks from dqa0.0.0.105.1
bootstrap code read in
base = 200000, image_start = 0, image_bytes = 70400
initializing HWRPB at 2000
initializing page table at 3ffee000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
OpenVMS (TM) Alpha Operating System, Version V7.1-2

Booting OpenVMS from an InfoServer


You can boot OpenVMS from a LAN device on an InfoServer. The devices are designated
EW*0 or EI*0. The asterisk stands for the adapter ID (a, b, c, and so on).
1. Power up the system. The system stops at the P00>>> console prompt.
2. Insert the operating system CD into the CD-ROM drive connected to the InfoServer.
3. Enter the show device command to determine the unit number of the drive for your device.
4. Enter the boot command and any command-line parameters.
5. The InfoServer ISL program displays a menu.
6. Respond to the menu prompts, using the selections shown in the InfoServer example.

For complete instructions on booting OpenVMS from an InfoServer, see the OpenVMS
installation document.


Installing Linux
The procedure for installing Linux on a DS20E is documented in the Linux Installation and
Configuration Guide for AlphaServer DS10, DS20, and AlphaStation XP1000 Computers.
http://www.digital.com/alphaserver/linux/install_guide.html
Power up the system to the SRM console and enter the show version command.
P00>>>show version
version V5.4-2 May 19 1999 14:53:22
P00>>>
You need V5.4-2 or higher of the SRM console to install Linux. If you have a lower version of
the firmware, you will need to upgrade.

Booting Linux
Before booting Linux, enter the show device command to determine the unit number of the
drive for your boot device. In the following example DKA300 is a hard disk, DKA500 is a CD,
and DVA0 is a floppy drive.
P00>>>show device
dka300.3.0.7.1 DKA300 RZ1CF-CF 1614
dka500.5.0.7.1 DKA500 TOSHIBA CD-ROM XM-5701TA 0557
dva0.0.0.0.0 DVA0
pka0.7.0.7.1 PKA0 SCSI Bus ID 7 5.57
. . .
Set the following SRM environment variables to configure boot parameters. This example
shows configuration commands to boot the floppy created by the Linux installation.
P00>>>set bootdef_dev dva0
P00>>>set boot_file vmlinux.gz
P00>>>set boot_osflags "root=/dev/hda"
P00>>>show boot*
boot_dev dva0.0.0.0.0
boot_file vmlinux.gz
boot_osflags root=/dev/hda
boot_reset OFF
bootdef_dev dva0.0.0.0.0
booted_dev
booted_file
booted_osflags
Insert the boot floppy and enter the boot command.


Linux Boot Example


P00>>>b
(boot dkb0.0.0.3000.0 -file boot/vmlinux.gz -flags root=/dev/hda)
block 0 of dkb0.0.0.3000.0 is a valid boot block
reading 152 blocks from dkb0.0.0.3000.0
bootstrap code read in
base = 200000, image_start = 0, image_bytes = 13000
initializing HWRPB at 2000
initializing page table at 3ff8e000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
Linux version 2.2.12 (jestabro@linux04.mro.dec.com) (gcc version egcs-2.90.29
980515 (egcs-1.0.3 release)) #21 Fri Sep 10 16:55:01 EDT 1999
Booting on Tsunami variation Clipper using machine vector Brick
Command line: root=/dev/hda bootdevice=scd0 bootfile=boot/vmlinux.gz
setup_smp: 2 CPUs probed, cpu_present_map 0x3, boot_cpu_id 0
Console: colour VGA+ 80x25
Calibrating delay loop... 996.15 BogoMIPS
Memory: 1033720k available
POSIX conformance testing by UNIFIX
Entering SMP Mode.
secondary_console_message: on 0 from 1 HALT_REASON 0x0 FLAGS 0x1ee
secondary_console_message: on 0 message is P01>>>START P01>>>
smp_boot_cpus: Total of 2 Processors activated (1992.29 BogoMIPS).
start_secondary: commencing CPU 1 current fffffc003ffe0000
Alpha PCI BIOS32 revision 0.04
PCI: Probing PCI hardware
Linux NET4.0 for Linux 2.2
. . .
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x24c9f043)
. . .
Red Hat Linux release 6.0 (Hedwig)
Kernel 2.2.12 on an alpha
peng1 login:

Chapter

System Configuration

Introduction
To configure a DS20E system, you need to know which configuration options you can use and
how to configure the components to interact for optimum system performance. This chapter
describes configuration options, guidelines, requirements, and procedures. Topics in this
chapter are:

Base System Configuration

Switch Settings

Memory Configurations

Addressing Considerations

SCSI Configuration

PCI and ISA Configuration

Interrupt Configuration

DMA Configuration

Firmware Configuration

System Options and Upgrades

Base System Configuration


Customers can order the DS20E system as system building blocks. System building blocks are
designed for a la carte ordering and distributor integration. Each system building block
contains a system kernel, 4 MB cache memory, and a Tru64 UNIX, OpenVMS, or Linux
operating system. All DS20E system building blocks require a minimum 128 MB of memory, a
hard disk drive, and a country kit. All system building blocks require the selection of a
country-specific keyboard for full system operation. See the Systems and Options Catalog:
http://ftp.digital.com/pub/DEC/info/SOC/Systems_and_Options_Catalog_25jun1999SOHOMEHM.htm


Switch Settings
Two switchpacks configure functions on the system board (or main logic board). They are
located at the lower right corner of the board.
Switch SW2 is used to control the writing of the flash ROM, the speed of the cross-bar switch,
cache memory timing, and the debug monitor output path. The switch positions are identified in
the table.

System Board SW2

Number  Name         Default  Function
1       FSB          Off      Causes the SROM to jump to the Fail-Safe Booter program
                              (currently the debug monitor).
2       CACHE_OFF_A  Off      Sets the cache memory timing.
3       CACHE_OFF_B  Off      Sets the cache memory timing.
4       MINI_DEB     Off      Causes the SROM to jump to the SROM Mini debugger.
5       TS_SPD0      On       Sets the speed of the Tsunami chipset and the CPU (bit 0)
                              (see next table).
6       TS_SPD1      On       Sets the speed of the Tsunami chipset and the CPU (bit 1)
                              (see next table).
7       TS_SPD2      Off      Sets the speed of the Tsunami chipset and the CPU (bit 2)
                              (see next table).
8       PASS_BY      Off      Causes the debug monitor to send its output through the
                              debug port on the CPU daughter card.

TS_SPD2  TS_SPD1  TS_SPD0  Cross-Bar Speed (MHz)  CPU Speed (MHz)
0        0        0        66.6                   400
0        0        1        75                     450
0        1        0        77                     500
0        1        1        79                     550
1        0        0        83.3                   600
1        0        1        87.5                   666
1        1        0        91.7                   833
1        1        1        100                    1000


System Board SW3


Only switch positions 2 and 8 are used on SW3. Positions 1 and 3 through 6 are reserved.
Position 7 is a spare. Switch position 8 is used to enable or disable writing to the flash ROM.
Writing to the flash ROM is enabled when this switch is OFF. The flash ROM is protected when
the switch position is ON.
Switch position 2 is used to select between an EV6 and EV67 processor. Set switch position 2 to
ON if upgrading to an EV67 (667 MHz) processor.


CPU SW1
The CPU daughter card on the system board also contains two switchpacks. They configure
functions for the CPU subsystem.

CPU switch SW1 is used to set the Bcache configuration, the CPU speed, and the SROM flash
enable. The switch positions are identified in the following table:

Position  Description
1-4       Bcache configuration
5-7       CPU speed
8         SROM flash select

Bcache Configuration
SW1-4  SW1-3  SW1-2  SW1-1  Function
Off    On     X      On     Reserved


CPU Speed
SW1-7  SW1-6  SW1-5  Function
On     On     Off    500MHz

SROM Flash Select
SW1-8  Description
Off    Flash select disabled
On     Flash select enabled

Typical Switch Settings


The following switch settings represent a CPU set for 4 MB Bcache at 200 MHz, processor
speed of 500 MHz, and SROM flash select enabled.


CPU SW2
This switchpack sets the CPU voltage and selects the flash. Default settings are shown in bold.

CPU Voltage
SW2-4  SW2-3  SW2-2  SW2-1  VDC
Off    Off    Off    Off    1.429
Off    Off    Off    On     1.500
Off    Off    On     Off    1.571
Off    Off    On     On     1.643
Off    On     Off    Off    1.714
Off    On     Off    On     1.786
Off    On     On     Off    1.857
Off    On     On     On     1.929
On     Off    Off    Off    2.000
On     Off    Off    On     2.071
On     Off    On     Off    2.143
On     Off    On     On     2.214
On     On     Off    Off    2.286
On     On     Off    On     2.357
On     On     On     Off    2.429
On     On     On     On     2.500
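
The sixteen CPU voltage settings can be captured in a small lookup. The sketch below is illustrative only: the function name cpu_vdc and the convention that On counts as 1 with SW2-4 as the most significant bit are assumptions chosen to follow the row order of the table, not something this guide states. The listed voltages also happen to equal (20 + n) / 14 rounded to three places, where n is the row index; that is an observation about the table, not a documented formula.

```python
# Voltages copied from the CPU Voltage table, in row order
# (SW2-4..SW2-1 = Off,Off,Off,Off first; On,On,On,On last).
VDC_TABLE = [
    1.429, 1.500, 1.571, 1.643,
    1.714, 1.786, 1.857, 1.929,
    2.000, 2.071, 2.143, 2.214,
    2.286, 2.357, 2.429, 2.500,
]


def cpu_vdc(sw2_4, sw2_3, sw2_2, sw2_1):
    """Return the table voltage for the given switch states (True = On).

    Assumption: On is treated as 1 and SW2-4 is the most significant bit,
    which reproduces the row order of the table.
    """
    index = (int(sw2_4) << 3) | (int(sw2_3) << 2) | (int(sw2_2) << 1) | int(sw2_1)
    return VDC_TABLE[index]
```

For example, cpu_vdc(True, False, False, False) looks up the ninth row of the table.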

Flash Select Settings

Flash0 (SW2-7)  Flash1 (SW2-6)  Flash2 (SW2-5)
Off             Off             Off
On              Off             Off
Off             On              Off
On              On              Off
Off             Off             On
On              Off             On
Off             On              On
On              On              On

Flash Bypass Settings

SW2-8  Description
Off    Flash bypass disabled
On     Flash bypass enabled

The following example shows the switches set appropriately for the system.


Memory Configurations
The DS20E system has 16 memory slots for 4 arrays of DIMMs as shown in the following
illustration. The system has a memory capacity of up to 4 GB.

Memory Configuration Rules

A memory option consists of four DIMMs.

All four DIMMs in an option must be the same size.

The largest memory option goes in bank 0.

Memory options are installed in the following bank order: 0, 1, 2, and 3.

Other memory options must be the same size or smaller than the first memory option.

Only DIMMs qualified by Compaq are guaranteed to work.

Use DIMMs made by the same supplier in all slots.
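
The ordering rules above lend themselves to a quick check before installation. The following is a minimal sketch, not a Compaq tool: the function name check_memory_config and the list-of-banks input format are hypothetical, and the check covers only the size and ordering rules (it cannot verify DIMM qualification or supplier).

```python
def check_memory_config(banks):
    """Check a planned DIMM layout against the size and ordering rules.

    banks: list indexed by bank number (0-3). Each entry is a list of four
    DIMM sizes in MB, or None for an empty bank.
    Returns a list of rule violations; an empty list means no violations found.
    """
    problems = []
    if len(banks) > 4:
        problems.append("more than four banks specified")
    seen_empty = False
    first_size = None
    for number, dimms in enumerate(banks):
        if dimms is None:
            seen_empty = True
            continue
        if seen_empty:
            # Options are installed in bank order 0, 1, 2, 3.
            problems.append(f"bank {number} is populated after an empty bank")
        if len(dimms) != 4:
            problems.append(f"bank {number} does not hold four DIMMs")
            continue
        if len(set(dimms)) != 1:
            problems.append(f"bank {number} mixes DIMM sizes")
            continue
        if first_size is None:
            first_size = dimms[0]
        elif dimms[0] > first_size:
            # Later options must be the same size or smaller than the first.
            problems.append(f"bank {number} is larger than the first option")
    return problems
```

A layout such as check_memory_config([[256] * 4, [64] * 4]) returns an empty list, while placing the smaller option first is reported as a violation.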

Qualified DIMMs
The following DIMMs are qualified; others may be added in the future.
Sales Part Number  Description
FR-MS340-CA        256MB ECC Memory DIMMs (4x64MB)
FR-MS340-DA        512MB ECC Memory DIMMs (4x128MB)
FR-MS340-EA        1GB ECC Memory DIMMs (4x256MB)


Addressing Considerations
Addresses are generated either by the CPU or an I/O device on the PCI bus. A CPU-generated
address can be targeted at system memory, PCI memory, or PCI I/O space. Similarly, an I/O
device's address can select system memory or other PCI devices. Because the addressing
capabilities of CPU and I/O devices are different, a scheme to map them to the appropriate
target address space is required.
From the CPU's perspective, the PCI I/O and memory space are linear and byte accessible.
Because the EV6/EV67 supports byte mode accesses, a single linear I/O space is used. CPU
address space is defined as the map of CPU-generated addresses used to access system memory
and I/O space.

CPUAddr[43:0]
From           To             Size     Space
000 0000 0000  000 FFFF FFFF  4 GB     System Memory (Cacheable, Prefetchable)
001 0000 0000  7FF FFFF FFFF  8188 GB  Reserved
800 0000 0000  800 FFFF FFFF  4 GB     P-Chip0 PCI Memory (Linear Addressing, Non-cacheable)
801 0000 0000  801 3FFF FFFF  1 GB     TIG BUS (addr[5:0]=0. 1 byte/64 bytes. Effective space is 16 MB)
801 4000 0000  801 7FFF FFFF  1 GB     Reserved
801 8000 0000  801 8FFF FFFF  256 MB   P-Chip0 CSRs (addr[5:0]=0. Quadword access only)
801 9000 0000  801 9FFF FFFF  256 MB   Reserved
801 A000 0000  801 AFFF FFFF  256 MB   C-Chip CSRs (addr[5:0]=0. Quadword access only. Non-cacheable)
801 B000 0000  801 BFFF FFFF  256 MB   D-Chip CSRs (addr[5:0]=0. Each byte per quadword points at 1 of 8
                                       D-Chips. All bytes must be identical. Non-cacheable)
801 C000 0000  801 F7FF FFFF  896 MB   Reserved
801 F800 0000  801 FBFF FFFF  64 MB    P-Chip 0 PCI IACK/Special (Linear addressing, No address extension
                                       using HAE)
801 FC00 0000  801 FDFF FFFF  32 MB    P-Chip 0 PCI I/O (Linear addressing. No HAE. Non-cacheable)
801 FE00 0000  801 FEFF FFFF  16 MB    P-Chip 0 PCI Configuration (Linear addressing. No HAE. Non-cacheable)
803 0000 0000  803 7FFF FFFF  2 GB     Reserved
803 8000 0000  803 8FFF FFFF  256 MB   P-Chip 1 CSRs (Linear addressing. No HAE. Non-cacheable)
803 9000 0000  803 F7FF FFFF  1664 MB  Reserved
803 F800 0000  803 FBFF FFFF  64 MB    P-Chip1 PCI IACK/Special (Linear addressing. No HAE. Non-cacheable)
803 FC00 0000  803 FDFF FFFF  32 MB    P-Chip 1 PCI I/O (Linear addressing. No HAE. Non-cacheable)
803 FE00 0000  803 FEFF FFFF  16 MB    P-Chip 1 Configuration (Linear addressing. No HAE. Non-cacheable)
804 0000 0000  FFF FFFF FFFF  8188 GB  Reserved


CPU to PCI Address Translation


CPU address mapping to PCI memory space must be translated into a PCI memory address.
Translation occurs in the C-chip, with the CPU address and byte mask provided by the
EV6/EV67 processor.

The following table is used to translate the CPU mask into PCI AD[1:0] and PCI BE[3:0].

Type  Mask       PCI_AD[2:0]  PCI_BE[7:0]  PCI_AD[2:0]  PCI_BE[3:0]
                 64-bit       64-bit       32-bit       32-bit
Byte  0000 0001  000          1111 1110    000          1110
Byte  0000 0010  001          1111 1101    001          1101
Byte  0000 0100  010          1111 1011    010          1011
Byte  0000 1000  011          1111 0111    011          0111
Byte  0001 0000  100          1110 1111    100          1110
Byte  0010 0000  101          1101 1111    101          1101
Byte  0100 0000  110          1011 1111    110          1011
Byte  1000 0000  111          0111 1111    111          0111
Word  0000 0011  000          1111 1100    000          1100
Word  0000 1100  010          1111 0011    010          0011
Word  0011 0000  100          1100 1111    100          1100
Word  1100 0000  110          0011 1111    110          0011
LW    xxxx xxx1  000          xxxx 0000    000          0000
LW    xxxx xx10  100          0000 1111    100          0000
LW    xxxx x100  000          xxxx 0000    000          0000
LW    xxxx 1000  100          0000 1111    100          0000
LW    xxx1 0000  000          1111 0000    000          0000
LW    xx10 0000  100          0000 1111    100          0000
LW    x100 0000  000          1111 0000    000          0000
LW    1000 0000  100          0000 1111    100          0000
QW    xxxx xxxx  000          0000 0000    000          0000
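
For the Byte and Word rows, the table follows a simple pattern: the byte enables are the bitwise complement of the CPU mask (active low), and PCI_AD[2:0] is the index of the lowest selected byte. The sketch below reproduces only those rows; the function name pci_byte_enables is hypothetical, and the LW and QW rows are deliberately not covered because their masks are interpreted differently in the table.

```python
def pci_byte_enables(mask, bus_width=64):
    """Derive (PCI_AD[2:0], byte enables) for a Byte or Word access.

    mask: 8-bit CPU byte mask with a single byte or one aligned word set.
    bus_width: 64 returns PCI_BE[7:0]; 32 returns PCI_BE[3:0].
    Covers only the Byte and Word rows of the table (an assumption of
    this sketch); LW and QW rows are not modeled.
    """
    if not 0 < mask <= 0xFF:
        raise ValueError("mask must be a non-zero 8-bit value")
    if bus_width not in (32, 64):
        raise ValueError("bus_width must be 32 or 64")
    lowest = (mask & -mask).bit_length() - 1  # index of lowest selected byte
    enables = ~mask & 0xFF                    # byte enables are active low
    if bus_width == 32:
        # Use the nibble that contains the selected byte(s).
        enables = (enables >> 4) & 0xF if lowest >= 4 else enables & 0xF
    return lowest, enables
```

For instance, the Byte row with mask 0001 0000 corresponds to pci_byte_enables(0b00010000), which yields address bits 100 and enables 1110 1111.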


CPU address mapping to PCI I/O space uses the CPU mask. (The PCI host controller in the
P-chip does not recognize I/O space accesses directed to it from other PCI or ISA devices.)

CPU address mapping must also be translated into PCI configuration space 0 and space 1.

Space 0

Binary encoding is used to decode the 5-bit Device # field in the CPU address to generate the
field IDSEL[20:0] in PCI AD[31:11], as shown here:

When Device # is 00000, IDSEL bit 0 (PCI AD[11]) is set to "1".

When Device # is 00001, IDSEL bit 1 (PCI AD[12]) is set to "1", and so on up to Device
# 10100.

Device field encodings 10101 through 11111 are not used and result in IDSEL[20:0] being set to
0s.
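
The decode described above is a one-hot expansion of the device number. As a minimal sketch (the function name idsel_from_device is an assumption made for illustration):

```python
def idsel_from_device(device):
    """Expand a 5-bit Device # into (IDSEL[20:0], PCI AD[31:0] contribution).

    Device numbers 0 through 20 (binary 10100) set exactly one IDSEL bit;
    the unused encodings 21 through 31 produce all zeros.
    """
    if not 0 <= device <= 0b11111:
        raise ValueError("Device # is a 5-bit field")
    idsel = (1 << device) if device <= 0b10100 else 0
    return idsel, idsel << 11  # IDSEL[20:0] occupies PCI AD[31:11]
```

Device 7, for example, drives PCI AD[18], which matches the IDSEL line listed for the Cypress bridge in the PCI assignment tables.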


The following table is used to translate CPU Mask into PCI AD[1:0] and PCI BE[3:0].

Type  Mask       PCI AD[2]  PCI BE[3:0]
Byte  0000 0001  0          1110
Byte  0000 0010  0          1101
Byte  0000 0100  0          1011
Byte  0000 1000  0          0111
Byte  0001 0000  1          1110
Byte  0010 0000  1          1101
Byte  0100 0000  1          1011
Byte  1000 0000  1          0111
Word  0000 0011  0          1100
Word  0000 1100  0          0011
Word  0011 0000  1          1100
Word  1100 0000  1          0011
LW    xxxx xxx1  0          0000
LW    xxxx xx10  1          0000
LW    xxxx x100  0          0000
LW    xxxx 1000  1          0000
LW    xxx1 0000  0          0000
LW    xx10 0000  1          0000
LW    x100 0000  0          0000
LW    1000 0000  1          0000
QW    xxxx xxxx  0          0000

Space 1

CPU address mapping must also be translated to TIG address space. The TIG address is sparse
in that each aligned 64-byte region has only one byte of information.

PCI Direct Memory Access (DMA) Address Translation
Both PCI-0 and PCI-1 have identical translation mechanisms located in P-chip 0 and P-chip 1,
respectively. DMA address translation refers to taking a PCI-device-generated PCI memory
address and mapping that into the system memory address.
The P-chips ignore the following PCI commands from a PCI device:

Interrupt acknowledge

Special Cycle

I/O Read & Write

Configuration Read & Write

The P-chips respond to PCI memory read/write and invalidate commands if the PCI address
maps to system memory.
Each P-chip supports four DMA address windows and one DMA monster window. Each of the
normal DMA windows is capable of mapping to system memory (or to other PCI devices, as
long as another P-chip exists). If the window selected by the PCI address is not a peer-to-peer
window, it can translate the incoming address to the system memory address by either direct
mapping or scatter/gather mapping.


Direct Mapping
The incoming address is compared to a Window Base Address register and a Window Mask
register, which determines the size of the window. If the address fits into this window, the
address bits that are not part of the compare are concatenated to a Translated Base address
register to form the System Memory Address. Note that the PCI Address bits that are not part of
the compare are the lower-order bits, and represent the size of the window:
System Address[34:2] = T_Base[34:20+n]:PCI_AD[19+n:2]
The variable n varies from 0 to 11. Thus, the size of the window can vary from 1MB to 2GB.
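
The concatenation in the direct-mapping formula can be written out with shifts and masks. This is a sketch of the arithmetic only: the function name direct_map and passing the Translated Base register as a plain integer are assumptions, and the window-hit comparison against the Window Base and Window Mask registers is omitted because their layouts are not given here.

```python
def direct_map(pci_addr, t_base, n):
    """Form System Address[34:2] = T_Base[34:20+n] : PCI_AD[19+n:2].

    n selects the window size, 1 MB << n (n = 0..11, so 1 MB to 2 GB).
    Bits [1:0] of the result are returned as zero.
    """
    if not 0 <= n <= 11:
        raise ValueError("n must be in the range 0 to 11")
    offset_bits = 20 + n
    offset_mask = (1 << offset_bits) - 1
    upper = t_base & ((1 << 35) - 1) & ~offset_mask  # T_Base[34:20+n]
    lower = pci_addr & offset_mask & ~0x3            # PCI_AD[19+n:2]
    return upper | lower
```

With n = 0 the low 20 bits of the PCI address pass through and everything above comes from the Translated Base register.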

Scatter/Gather Mapping
This scheme also uses the Window Base, Window Mask and Translated Base Address
Registers. The difference is that the translated address from the scheme for direct mapping
results in a quadword address. This quadword address is fed into a system memory-based page
table to produce a Page Table Entry (PTE). The PTE produces the top 21 bits of the system
memory address while the PCI AD[12:0] are sent through untranslated as the Page Offset. The
translation is illustrated as follows:

V is the valid bit in the PTE and must be a 1 to indicate a valid PTE. Each P-chip caches a
number of PTEs in a scatter/gather table to avoid the memory fetch on every DMA transaction.
The page size is fixed at 8 KB. The window size determines the size of the page table for a
given window. The size of the page table determines the number of bits used from the
Translated Base Address[34:10] and the PCI AD[31:13]:
PTE Address[34:3] = T_Base[34:10+n]:PCI_AD[19+n:13]
The variable n varies from 0 to 11. Thus the window size ranges from 1MB to 2GB.
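
The PTE address formula can be sketched the same way. The names pte_address and page_offset are hypothetical, and the PTE layout itself is not modeled because it is not reproduced in this text; the code only locates the page table entry and extracts the untranslated page offset.

```python
def pte_address(pci_addr, t_base, n):
    """Form PTE Address[34:3] = T_Base[34:10+n] : PCI_AD[19+n:13].

    Each PTE is a quadword (8 bytes), so the page index lands in bits
    [9+n:3]. A 1 MB window (n = 0) with 8 KB pages needs 128 entries.
    """
    if not 0 <= n <= 11:
        raise ValueError("n must be in the range 0 to 11")
    index_bits = 7 + n                                   # width of PCI_AD[19+n:13]
    page_index = (pci_addr >> 13) & ((1 << index_bits) - 1)
    upper = t_base & ((1 << 35) - 1) & ~((1 << (10 + n)) - 1)  # T_Base[34:10+n]
    return upper | (page_index << 3)


def page_offset(pci_addr):
    """Return PCI AD[12:0], which passes through untranslated (8 KB pages)."""
    return pci_addr & 0x1FFF
```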

Monster Window
This window is used only with a PCI dual address cycle. A monster window is selected if the
PCI AD[63:40] equals 0x0000_01 (only bit 40 is a 1). In this case, the low-order PCI AD [34:0]
is used untranslated to address system memory.
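
The monster window test reduces to a single comparison on the upper address bits. A minimal sketch, with the function name monster_window_address chosen for illustration:

```python
def monster_window_address(pci_addr_64):
    """Return the system memory address for a monster-window access.

    The window is selected when PCI AD[63:40] equals 0x000001 (only bit 40
    set); PCI AD[34:0] is then used untranslated. Returns None when the
    address does not select the monster window.
    """
    if (pci_addr_64 >> 40) & 0xFFFFFF != 0x000001:
        return None
    return pci_addr_64 & ((1 << 35) - 1)
```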


SCSI Configuration
The DS20E system supports up to four internal SCSI devices. Systems that include a SCSI bus
currently use the Qlogic 1040UW Ultra Wide SCSI PCI host adapter.

SCSI IDs
The DS20E system currently supports up to four internal SCSI devices. The adapter can support
SCSI IDs ranging from 0 to 15. Each SCSI device must have a unique SCSI ID. Typically, the
host adapter is ID 7.
NOTE: The CD-ROM drive is an ATAPI device attached to the IDE port.

SCSI Termination
Termination on the host adapter is controlled by software commands through an adapter utility.
The default setting is Automatic. In Automatic mode, the adapter detects attached cables to
either one or two of its three connectors (cables on all three connectors is an illegal
configuration). The adapter then sets termination accordingly, as shown in the following table:

Device Connected to the Adapter's:              Host Adapter Termination
                                                Low-Byte  High-Byte
68-pin internal connector only                  ON        ON
68-pin external connector only                  ON        ON
68-pin internal and external connectors         OFF       OFF
50-pin internal connector only                  ON        ON
50-pin and 68-pin internal connectors           OFF       ON
50-pin internal and 68-pin external connectors  OFF       ON

NOTE: You can also select these settings manually, using the SCSISelect utility.

No internal SCSI devices should be terminated. (Termination must be disabled if a device is in
the middle of the bus.) The internal SCSI bus is terminated by the SCSI controller card at one
end, and a SCSI terminator plug on the cable at the other end.
You can disable termination on SCSI devices in several ways, including removable resistor
packs, switch settings, or a terminator plug. Refer to the device documentation for details.

SCSI Cable Length


For reliable operation of Ultra SCSI devices, do not let the cable length exceed 1.5 meters (4.9
feet).


PCI and ISA Configuration


PCI Slot Numbering
The DS20E has an unusual PCI slot numbering scheme. The slot numbers are not included on
the MLB. Only the J numbers are on the MLB (this applies to the DIMM slots as well). Compaq
Analyze will include the reference designator (J number) in its callout. The slot at J46 is the PCI
slot and J47 is the ISA slot.
The slot assignments for PCI and ISA devices are shown in the following table:

PCI Slot Numbering from the top as mounted in the pedestal chassis
1  64 bit PCI - bus 0, slot 0             Slot 7  J35
2  64 bit PCI - bus 0, slot 1             Slot 8  J40
3  64 bit PCI - bus 0, slot 2             Slot 9  J41
4  64 bit PCI - bus 1, slot 0             Slot 7  J42
5  64 bit PCI - bus 1, slot 1             Slot 8  J44
6  Shared ISA/64 bit PCI - bus 1, slot 2  Slot 9  PCI J46/ISA J47

The following installation order must be followed:

Step  Install this option:  In this slot:
1     ISA                   6
2     64 bit PCI options    First available slots 3, 1
3     Graphics options      First available slots 3, 2, 1
4     SCSI controllers      First available slots 3, 2, 1, 4, 5
5     Network adapters      First available slots 3, 2, 1, 4, 5

Graphics Options

The primary VGA device must be installed on bus 0.

All graphics adapters should be installed on bus 0 for optimum performance.

PowerStorm 4D51T and 3D30 must be installed on bus 0 only.

Multihead systems must be homogeneous (all ELSA, all PowerStorm 300, and so on).

Maximum number of graphics adapters supported:

                                           Tru64 UNIX
SN-PBXGK-BB  ELSA Gloria Synergy (note 1)  3
SN-PBXGB-AA  PowerStorm 3D30 (note 2)      1/3
SN-PBXGI-AD  PowerStorm 4D51T              2
SN-PBXGD-AD  PowerStorm 300                2

Note 1: UNIX 1 to 3 heads is 2D only.
Note 2: UNIX supports 1 head for 2D/3D, 2 or 3 heads is 2D only.


SCSI Controllers

On-board controller for internal devices only.

Two additional storage adapters maximum per system.

Add-on RAID SCSI controllers must be connected only to external buses/devices.

PCI Restrictions
The DS20E system will not operate properly unless the following restrictions are observed:

Do not use 64-bit cards in 32-bit slots.

On systems using the Tru64 UNIX operating system, all TGA-2 and PowerStorm 4D51T
options must be installed in PCI bus 0.

In a multi-head environment with ELSA Gloria modules, the VGA-enabled ELSA Gloria
must be installed in PCI bus 0.

ISA Bus
The DS20E system has a built-in ISA device and a slot for an additional ISA option (if the slot
is not already occupied by a PCI device). The built-in device is a super I/O chip (SMC FDC
37C669), which provides controllers for the floppy disk bus, two serial lines, keyboard, mouse,
and the bidirectional enhanced parallel port.

ISA Restrictions
IRQ7 is reserved for use by the parallel port. IRQ10 is reserved for use by the USB controller,
regardless of whether it is enabled.

PCI Assignment Tables


To read a PCI configuration register, you need to know which bus (primary or secondary) the
device is on and which PCI address line is tied to its IDSEL pin.

PCI Device      IDSEL       PCI Bus Number
Cypress 693U    PCI AD[18]  PCI Bus 0
PCI Slot 1      PCI AD[22]  PCI Bus 0
PCI Slot 2      PCI AD[23]  PCI Bus 0
PCI Slot 3      PCI AD[24]  PCI Bus 0
Ethernet 21143  PCI AD[14]  PCI Bus 1
SCSI 1040C      PCI AD[17]  PCI Bus 1
P2P Bridge      PCI AD[19]  PCI Bus 1
PCI Slot 4      PCI AD[25]  Secondary PCI Bus 1
PCI Slot 5      PCI AD[26]  Secondary PCI Bus 1

The following assignment tables show the DS20E system with PCI dense configuration space.


Primary Bus 0 PCI IDSEL

IDSEL Bit  PCI Base Address  CPU Base Address  MLB Device
AD<18>     0004.0000         801.FE00.3800     PCI-ISA
AD<22>     0040.0000         801.FE00.5800     Slot 1
AD<23>     0080.0000         801.FE00.6000     Slot 2
AD<24>     0100.0000         801.FE00.6800     Slot 3

Primary Bus 1 PCI IDSEL

IDSEL Bit  PCI Base Address  CPU Base Address  MLB Device
AD<14>     0000.4000         803.FE00.1800     PCI-Ethernet
AD<17>     0002.0000         803.FE00.3000     SCSI 1040C
AD<19>     0008.0000         803.FE00.4000     P2P Bridge

Secondary Bus PCI IDSEL

IDSEL Bit  Secondary PCI Base Address  Primary PCI Base Address  CPU Base Address  MLB Device
AD<25>     0200.0000                   0001.4801                 803.0001.4800     Slot 4
AD<26>     0400.0000                   0001.5001                 803.0001.5000     Slot 5

For either bus, you create a CPU address from the CPU base address (above), the function
number of the device, and the offset into the PCI configuration space, using the following
formula:
CPU address = CPU base address + function number * 256 + offset
To read from this address, use the following SRM console command:
>>> e -p {size} {CPU address}
{size}          -l, -w, -b (for longword, word, or byte access)
{CPU address}   The address calculated above
For example, to read the status register (offset 6, word wide) from the USB section (function 3)
of the Cypress PCI-ISA bridge (primary bus, IDSEL hooked to AD<18>), the CPU address (in
hex) is:
801.FE00.3800 + 300 + 6 -> 801.FE00.3B06
The console command is:
>>> e -p -w 801.fe00.3b06
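
The address formula is easy to get wrong by hand because the function number is multiplied by 256 decimal (100 hex). The sketch below works the same example using the CPU base address listed for AD<18> in the Primary Bus 0 table; the helper names config_cpu_address and dotted are hypothetical, and the dotted grouping simply mirrors the notation used in the tables.

```python
def config_cpu_address(cpu_base, function, offset):
    """CPU address = CPU base address + function number * 256 + offset."""
    if not 0 <= function <= 7:
        raise ValueError("PCI function numbers range from 0 to 7")
    if not 0 <= offset <= 0xFF:
        raise ValueError("offset must fit in the 256-byte function space")
    return cpu_base + function * 256 + offset


def dotted(address):
    """Format a 44-bit address in the 3.4.4 hex grouping used in the tables."""
    digits = f"{address:011X}"
    return f"{digits[:3]}.{digits[3:7]}.{digits[7:]}"


# USB function (3) of the Cypress bridge, status register at offset 6.
usb_status = config_cpu_address(0x801FE003800, function=3, offset=6)
print(dotted(usb_status))
```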


Interrupt Configuration
The main logic board collects error interrupts and interrupts from various other sources and
routes them to the appropriate IRQ.

Interrupt Source     CPU Interrupt  Description
P-chips              Cpu_irq 0      Error interrupts
PCI and ISA devices  Cpu_irq 1      PCI and ISA interrupts
rtc_irq              Cpu_irq 2      Real-time clock interrupt
C_chip_csr           Cpu_irq 3      Interprocessor
Halt jumper          Cpu_irq 4      Halt is through the Halt button or RMC halt command;
                                    software can enable this

These interrupts (IRQ_0 through IRQ_2) need to be posted through the TIG bus to the Tsunami
C-chip where they are collected in the device interrupt register before being sent to the
EV6/EV67 through the TIG bus. Programmable masking of interrupts is also done in the C-chip
using the interrupt mask register.
All interrupts collected through the TIG bus are level interrupts, and the interrupt conditions
remain present until cleared at the source through programmed I/O. The interrupt posting
buffers of the TIG bus are not the real source of the interrupts. Reading them will not clear the
interrupt condition.
For optimum system performance and to prevent conflicts, all bus master devices are assigned
interrupt request levels. The following sections provide assignment and other interrupt
information.


TIG Bus Interrupt Assignments


C-Chip Device Interrupt
Register Bit             Interrupt Source
63                       C-Chip Internal Error
62                       P-Chip0 Error
61                       P-Chip1 Error
60-56                    Reserved for future use
55                       INTR from Cypress. Includes ISA, PCI-IDE, PCI-USB
54                       SMI from Cypress
53                       NMI from Cypress
52-49                    Unused
48                       I2C_IOINT_l (680 MCHK)
47-44                    PCI1-INTA0,INTB0,INTC0,INTD0
43-40                    PCI1-INTA1,INTB1,INTC1,INTD1
39-36                    PCI1-INTA2,INTB2,INTC2,INTD2
35-32                    PCI1-INTA3,INTB3,INTC3,INTD3
31-28                    PCI0-INTA0,INTB0,INTC0,INTD0
27-24                    PCI0-INTA1,INTB1,INTC1,INTD1
23-20                    PCI0-INTA2,INTB2,INTC2,INTD2
19                       PCI0-IRQ_ADPTA (Adaptec 7895)
18                       PCI0-IRQ_ADPTB (Adaptec 7895)
17                       I2C_INT_L Server management I2C controller device bit (OpenVMS only)
16-0                     Unused

Real-Time Clock Interrupt


The main logic board sends the square wave from the Cypress Real-Time Clock to the C-chip.
The C-chip uses the square wave positive edges to record Interval Timer Interrupts through the
C-Chip MISC register. This interrupt is routed to IRQ_2.

Halt Interrupt
Various catastrophic or operator-induced conditions cause a halt to the SRM console.

Code  Description
1     Hardware Halt button pushed*
2     Kernel Stack Pointer invalid
5     Software Halt instruction executed
6     Double machine check
* The front panel Reset button can optionally be configured to generate a Halt
interrupt to the EV6/EV67 processor. The processor will receive this interrupt
on IRQ_4, which is a dedicated interrupt line for this function. (This interrupt is
not routed through the C-chip.)


ISA Interrupt Assignments


The ISA interrupts are collected in the Cypress bridge using two 8259-compatible interrupt
controllers in the same fashion as the Intel SuperIO bridge. Some of the ISA devices
(keyboard/mouse, real-time clock) are embedded inside the Cypress chip, whereas others are
external. Each interrupt condition is maskable.

ISA Device                 ISA IRQ               Notes
Keyboard (Cypress)         IRQ_1
Real Time Clock (Cypress)  Not an ISA interrupt  Maskable in Cypress
COM 2 Port                 IRQ_3
COM 1 Port                 IRQ_4
FDC                        IRQ_6
Parallel Port              IRQ_7
ISA Option Slot            IRQ_9,11,13           Programmable Selection
USB                        IRQ_10
Mouse (Cypress)            IRQ_12
PCI-IDE Primary            IRQ_14
PCI-IDE Secondary          IRQ_15

Fan Fault Interrupt


The Fan Fault interrupt is generated when the fan fault circuitry on the MLB detects a fan
failure. This interrupt cannot be cleared until the error condition is corrected. The normal
response to this interrupt is to record the error and perform a graceful shutdown.

TIG Interrupt Processing


The C-chip is responsible for collecting interrupts off the TIG bus. It does so continuously in the
background as a low-priority activity on the TIG bus. An interrupt gathering cycle consists of
reading all the interrupt bytes selected (the DS20E has five); updating the Raw Device Interrupt
Register and the Masked Device Interrupt Register in the C-chip; and performing a write to the
EV6/EV67 IRQ register on the TIG DATA Bus. This entire interrupt gathering cycle is
automatic.


DMA Configuration
Direct memory access (DMA) allows devices to access memory directly without going through
the CPU. DMA channels must be unique, and addresses must not conflict. The following table
summarizes the DMA channel assignments. Certain channels are hardwired (H), and others are
program selected (P):

Device
Floppy disk controller
Parallel
ISA

DMA Channels
0
1
2
H
P

3
H
P

The floppy interface in the Super I/O chip is hardwired to DMA channel 2. The parallel
interface is allowed to use DMA channel 3 during certain modes of operation.
DMA channels 0 through 3 and 5 through 7 can be used by the ISA slot on the MLB.

Firmware Configuration
The DS20E firmware is used to:

Display system configuration information

Set boot parameters

Boot the operating system (OS) or distribution media

Control PCI parity

Configuration information is stored in the NVRAM section of the flash ROM. If the MLB is
replaced, the information in the NVRAM is lost.

NOTE: Keep records of the configuration information to facilitate restoring the system.
DS20E firmware resides in a serial ROM (SROM). This SROM contains the power-on self-test
(POST) and the SRM firmware for Tru64 UNIX and OpenVMS systems.


Using SRM Commands to Configure the System


To view and verify the configuration, use these commands at the SRM P00>>> prompt:
P00>>> show config   Displays the buses on the system and the devices found on
                     those buses. Identifies target devices for commands, such
                     as boot and test, and checks that systems see all installed
                     devices.
P00>>> show device   Displays the devices and controllers in the system.
P00>>> show memory   Displays main memory configuration.
P00>>> show pal      Displays current versions of Tru64 UNIX and OpenVMS
                     PALcode.
P00>>> show version  Displays the version of the SRM console program that is
                     installed on the system.
Use the set and show commands to set and then verify environment variable settings with
SRM. Initialize with the init command or press Reset after changing an environment variable.

Example:
P00>>> set auto_action halt
P00>>> init
The system will halt.
P00>>> show auto_action
halt


System Options and Upgrades


This section provides information about supported options, system requirements, and sources of
information about the DS20E.

Obtaining Options
Before installing any options or upgrading the system, you should:

Get an accurate list of the modules and devices in the current system configuration. Use
the show config command to display the current system configuration.

To display information on the devices and controllers installed in the system, enter the
show device command.

Determine what options are to be added to the system and ensure that they are supported.

Refer to the Compaq Systems and Options Catalog for the latest information on base
system components, configuration guidelines, packages, and available system options.
http://ftp.digital.com/pub/DEC/info/SOC/Systems_and_Options_Catalog_25jun1999SOHOMEHM.htm

For the latest list of supported options, see the DS20E Supported Options List at:
http://www.digital.com/alphaserver/ds20e/options/asds20e_options.html

Upgrading the CPU to EV67 Operating at 667 MHz


The DS20E system supports the EV6 and EV67 processors at various speeds. The EV6 and
EV67 share a common pin interface to the system and use the same 587-pin CPGA package.
However, VDD for EV67 is different from that for EV6. The system platform uses a
programmable regulator to accommodate a range of CPU VDD requirements.
To upgrade a CPU:
1. Replace the CPU daughter card with the new version.
2. On the system board, set SW3-2 to ON for the EV67 processor. See the SW3 illustration
   earlier in this chapter.

For information about replacing a CPU or adding a second CPU, see the CPU Daughter Card
procedure in the FRU Removal and Replacement chapter.

Upgrading Memory
Minimum standard memory capacity is 128 MB (using four 32 MB DIMMs), or 256 MB (using
four 64 MB DIMMs). Memory can be upgraded to as much as 4 GB by removing the standard
DIMMs and installing the optional 256 MB DIMMs.
To upgrade memory, see the DIMMs procedure in the FRU Removal and Replacement chapter.
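
The capacities quoted above follow from simple multiplication. The Python sketch below reproduces them. The helper names are illustrative only, and the 16-DIMM figure is derived on the assumption that the 4 GB maximum is reached using 256 MB DIMMs exclusively.

```python
MB_PER_GB = 1024


def total_mb(dimm_count, dimm_size_mb):
    """Total capacity, in MB, of dimm_count identical DIMMs."""
    return dimm_count * dimm_size_mb


def dimms_needed(target_mb, dimm_size_mb):
    """Smallest number of identical DIMMs whose capacity reaches target_mb."""
    return -(-target_mb // dimm_size_mb)  # ceiling division


standard_small = total_mb(4, 32)    # four 32 MB DIMMs
standard_large = total_mb(4, 64)    # four 64 MB DIMMs
max_config_dimms = dimms_needed(4 * MB_PER_GB, 256)  # 256 MB DIMMs for 4 GB

print(standard_small, standard_large, max_config_dimms)
```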


Updating Firmware and Device Drivers


The DS20E system contains flash ROM for the console firmware. The flash ROM contains the
power-on self-test (POST), AlphaBIOS console firmware, and the SRM console firmware.
Periodically, you may need to update the firmware.
See Updating Firmware and Device Drivers in the Firmware chapter.

Adding Third-Party Devices


The following types of options are supported by the DS20E system:

Memory DIMMs

Removable-media devices

Fixed-media drives

ISA devices

PCI devices

Before attempting to connect third-party devices or install third-party devices inside the system
unit, check to ensure that the operating system supports the device. All compatible third-party
devices use standard mounting hardware and connectors.

NOTE: Third-party memory DIMMs are not supported on the DS20E system.

Adding PCI Options


The system supports six 64-bit peripheral component interconnect (PCI) options.
PCI boards are installed according to instructions supplied with the option and require no
additional configuration procedures. The system automatically recognizes the boards and
assigns the appropriate system resources.

Chapter 5

Firmware

Introduction
Firmware for the DS20E system includes the SRM console and the AlphaBIOS console. The
system also has a remote management console (RMC) for remote monitoring.
Topics in this chapter are:

Firmware

Updating Firmware and Device Drivers

Using the SRM Console

Using the AlphaBIOS Console

Remote Management Console (RMC)

Firmware in the DS20E


Firmware is software that becomes a permanent part of a computing device. It is used to
initialize and set up a computer before the operating system is loaded. Knowledge of the
firmware is important in configuring, managing, and troubleshooting DS20E systems.
The DS20E firmware is located in the flash ROM on the main logic board. This chip can be
electronically reprogrammed, allowing you to upgrade the firmware without replacing the chip.

SRM Console
The SRM console is the command-line interface (CLI) that controls and sets up the operation of
a DS20E system running the Tru64 UNIX or OpenVMS operating system. This interface is a
shell, similar to a UNIX shell, that provides a set of commands as well as a scripting facility.
You enter SRM console commands at the console prompt, P00>>>.
The SRM console allows you to boot the operating system, and perform other system
management tasks, such as:

Displaying the system configuration

Setting environment variables

Depositing and examining data in memory

AlphaBIOS Console
The AlphaBIOS console is an enhanced BIOS graphical user interface for Alpha systems. It is
used to run certain utilities, such as the RAID configuration utility.

Updating Firmware and Device Drivers


The SRM firmware resides in the flash ROM located on the main logic board (MLB). The
firmware comes preinstalled from the factory. On occasion, you may need to update the
firmware and device drivers.
You can obtain firmware updates and instructions from:
ftp://ftp.digital.com/pub/Digital/Alpha/firmware/index.html
This site is updated about once a quarter when new Update CDs are released.

Using the SRM Console


To invoke the SRM console, make sure the auto_action environment variable is set to
halt. With this setting, the system enters SRM console mode following any error, halt, power
cycle, or system reset.
When you are in console mode, the SRM console displays this prompt: P00>>>.

SRM Console Start Sequence


A DS20E system that has successfully booted to console mode displays a message like the
following:
Testing the System
Testing the Memory
Testing the Disks (read only)
System Temperature is 31 degrees C
initializing GCT/FRU at 1ea000
COMPAQ AlphaServer DS20E 500 MHz
Console v5.5-3, Aug 9 1999 10:46:40
P00>>>
When you initialize or power up the system, it performs power-on self-tests and produces a
hexadecimal display at the front panel LEDs. Each hex number represents a state of the console
program during the start sequence. The LED codes are helpful if you are troubleshooting a
system that:

Breaks before the console can display information on the screen

Has a faulty monitor

Has a bad video card, and no serial monitor is attached to the system

If the processor stops during startup, the hex countdown also stops, allowing you to determine
the state of the system.

Displaying System Configuration


The show config command displays summary information about the system. Other show
commands display specific system information, such as devices attached, memory
configuration, PALcode revisions, and firmware revisions.

This command...   Displays...

show config       List of devices found on the system bus and I/O buses. This
                  configuration was in effect when you initialized the system.
show cpu          Status of each CPU.
show device       Status of devices and controllers in the system.
show memory       Information about the capacity of each memory bank, the size
                  of the DIMMs used in the memory bank, and the starting
                  address of each bank.
show pal          Versions of Tru64 UNIX and OpenVMS PALcode.
show power        Status information about the power supplies, system fans,
                  CPU fans, and temperature.
show version      Version of the SRM console program that is installed on the
                  system.

Show Config
The show config command displays a list of devices found on the system interconnect and
I/O buses. This is the configuration at the most recent initialization. The syntax is:
P00>>> show config
                       AlphaPC 264DP 500 MHz

SRM Console:    V5.5-9
PALcode:        OpenVMS PALcode V1.42-32, Digital UNIX PALcode V1.40-35

Processors
CPU 0           Alpha 21264-3 500 MHz  SROM Revision: V1.82
                Bcache size: 4 MB
CPU 1           Alpha 21264-4 500 MHz  SROM Revision: V1.82
                Bcache size: 4 MB

Core Logic
Cchip           DECchip 21272-CA Rev 2
Dchip           DECchip 21272-DA Rev 2
Pchip 0         DECchip 21272-EA Rev 2
Pchip 1         DECchip 21272-EA Rev 2
TIG             Rev 4.11
Arbiter         Rev 2.8 (0x1)

MEMORY
Array #       Size     Base Addr
   1         128 MB    000000000

Total Bad Pages = 0
Total Good Memory = 128 MBytes

PCI Hose 00
     Bus 00 Slot 05/0: Cypress 82C693
                                 Bridge to Bus 1, ISA
     Bus 00 Slot 05/1: Cypress 82C693 IDE
                                 dqa.0.0.105.0
     Bus 00 Slot 05/2: Cypress 82C693 IDE
                                 dqb.0.1.205.0
     Bus 00 Slot 05/3: Cypress 82C693 USB
     Bus 00 Slot 06/0: Adaptec AIC-7895
     Bus 00 Slot 06/1: Adaptec AIC-7895
     Bus 00 Slot 08: 00E31091
     Bus 00 Slot 09: Cirrus CL-GD5430
PCI Hose 01
     Bus 00 Slot 07: DECchip 21152-AA
                                 Bridge to Bus 2, PCI
     Bus 02 Slot 00: NCR 53C875
                                 pka0.7.0.2000.1      SCSI Bus ID 7
                                 dka0.0.0.2000.1      RZ1CB-CS
     Bus 02 Slot 01: NCR 53C875
                                 pkb0.7.0.2001.1      SCSI Bus ID 7
                                 dkb500.5.0.2001.1    RRD47
     Bus 02 Slot 02: DE500-AA Network Controller
                                 ewa0.0.0.2002.1      00-06-2B-00-13-47

ISA
Slot  Device  Name     Type      Enabled  BaseAddr  IRQ  DMA
0
      0       MOUSE    Embedded  Yes      60        12
      1       KBD      Embedded  Yes      60        1
      2       COM1     Embedded  Yes      3f8       4
      3       COM2     Embedded  Yes      2f8       3
      4       LPT1     Embedded  Yes      3bc       7
      5       FLOPPY   Embedded  Yes      3f0       6    2
P00>>>

Show Cpu
The show cpu command displays the status of each CPU. The syntax is:
P00>>> show cpu
Primary CPU:        00
Active CPUs:        00      01
Configured CPUs:    00      01
SROM Revision:      V1.82   V1.82
P00>>>

Show Device
The show device command displays status for devices and controllers in the system: SCSI
and MSCP devices, the internal floppy drive, and the network.
The syntax for this command is:
P00>>>show device controller_name
In this command, controller_name is the controller name or abbreviation. When
abbreviations or wildcards are used, the system displays all controllers that match that type. If
you do not specify a name, the system displays all devices and controllers in the system.
This show device example shows the devices and controllers on a DS20E system.
P00>>>show device
dkc0.0.0.8.0        DKC0    SEAGATE ST39102LC  7B04
dqa0.0.0.105.0      DQA0    TOSHIBA CD-ROM XM-1702B  1150
dva0.0.0.0.0        DVA0
eia0.0.0.2005.0     EIA0    00-06-2B-00-6E-56
pka0.7.0.6.0        PKA0    SCSI Bus ID 7
pkb0.7.0.106.0      PKB0    SCSI Bus ID 7
pkc0.7.0.8.0        PKC0    SCSI Bus ID 7
P00>>>

Show Memory
The show memory command displays information about each memory bank: slot number,
size in megabytes, and the starting address.
P00>>> show memory
Array #       Size     Base Addr
-------    ----------  ---------
   0         128 MB    000000000
   1         128 MB    008000000
   2         128 MB    010000000
   3         128 MB    018000000

Total Bad Pages = 0
Total Good Memory = 512 MBytes
P00>>>
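
The base addresses in this listing are consistent with the arrays being laid out back to back from address 0. The Python sketch below recomputes them from the array sizes. It assumes a contiguous, ascending layout, which matches this example but is not stated as a general rule in this guide, and the function name is illustrative.

```python
MB = 1024 * 1024


def base_addresses(array_sizes_mb):
    """Return the starting byte address of each array, assuming the arrays
    are placed contiguously in ascending order starting at address 0."""
    bases = []
    next_base = 0
    for size_mb in array_sizes_mb:
        bases.append(next_base)
        next_base += size_mb * MB
    return bases


sizes_mb = [128, 128, 128, 128]          # the four arrays shown above
bases = base_addresses(sizes_mb)
formatted = [f"{base:09x}" for base in bases]  # nine hex digits, as displayed
total_good_mb = sum(sizes_mb)

print(formatted, total_good_mb)
```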

Show Pal
P00>>> show pal
pal OpenVMS PALcode V1.61-49, Tru64 UNIX PALcode V1.54-58
P00>>>

Show Power
The show power command displays status information about the power supplies, system
fans, CPU fans, and temperature. This command is useful for displaying the error state of a
system that shuts down because of a fan, temperature, or power supply failure. If the system can
be restarted, use this command; if it cannot, use the Remote Console Manager's status
command, described later in this chapter.
P00>>>show power
                            Status
Power Supply 0              good
Power Supply 1/Fan Tray     not present
Power Supply 2/Fan Tray     good
System Fans                 good
CPU Fans                    good
Temperature                 good

Current ambient temperature is 34 degrees C
System shutdown temperature is set to 42 degrees C

0 Environmental events are logged in nvram

Show Version
The show version command displays the version of the SRM console program that is
currently installed on the system.
P00>>>show version
version                 v5.5-1 Jul 30 1999 10:04:02
P00>>>

Showing and Setting Environment Variables


An environment variable is a name and a value association maintained by the console program.
Some environment variables pass configuration information from the console to the operating
system. Their settings determine how the system starts up, boots the operating system, and
operates. Other environment variables control the operation of various console functions and
are specific to a particular implementation.
You can set or change environment variables with the set command. You can view the value
of environment variables with the show command.
NOTE: Whenever you modify the value of a console environment variable, you must reset the
system by pressing the Reset button or issuing the init command for the new value to take effect.

Environment Variables List


The following table shows selected environment variables and their uses.
This variable:        Does this:

auto_action           Sets/shows the console action following an error, halt, or
                      power-up. The action can be halt, boot, or restart. Halt
                      is the default.
boot_file             Sets/shows the file name to be used when a bootstrap
                      requires a filename. The default setting is null.
boot_osflags          Sets/shows additional parameters to be passed to system
                      software. When using Tru64 UNIX software, the following
                      parameters are valid:
                          i = interactive boot
                          s = boot to single user
                          a = autoboot to multiuser
bootdef_dev           Sets/shows the default device or device list from which
                      the system will attempt to boot. If the system software is
                      preloaded, the variable is preset to point to the device
                      containing the preloaded software. Otherwise, the default
                      value is null.
com_baud              Changes the default baud rate of the COM1 or COM2 serial
                      port.
console               Sets the console output to either serial port or the
                      graphics controller.
cpu_enabled           Enables or disables a specific secondary CPU.
ewa0_inet_init        Allows network booting operations. (In this case, possible
                      values are BOOTP or MOP.)
ew*0_mode or          Selects which Ethernet port to use: AUI (ThinWire);
ei*0_mode             twisted-pair; Full Duplex, twisted-pair; BNC; Fast (for
                      Fast Ethernet controllers); or FastFD (for Fast Ethernet
                      controllers that support Full Duplex). AUI is the default.
                      (Auto-sensing is not supported.)
ew*0_protocols or     Determines the Ethernet protocol, which can be either MOP
ei*0_protocols        or BOOTP. MOP is the default.
kbd_hardware          Specifies the default console keyboard type.


language              The language environment variable associates a language
                      with the system. To specify a language, enter:
                          >>> set language n
                      Value of n    Language
                      0             None
                      30            Dansk
                      32            Deutsch (Deutschland/Osterreich)
                      34            Deutsch (Schweiz)
                      36            English (American)
                      38            English (British/Irish)
                      3a            Espanol
                      3c            Francais
                      3e            Francais (Canadian)
                      40            Francais (Suisse Romande)
                      42            Italiano
                      44            Nederlands
                      46            Norsk
                      48            Portugues
                      4a            Suomi
                      4c            Svenska
                      4e            Belgisch-Nederlands
                      50            Japanese (JIS)
                      52            Japanese (ANSI)
                      After you change the language, you must turn the system
                      off and on again for the change to take effect.
ocp_text              Overrides the default OCP display text with specified
                      text.
os_type               Identifies the operating system being used on the system.
password              A password stored in NVRAM used to secure the console.
pci_parity            This variable controls PCI parity checking at the PCI
                      bridge chip. Parity checking is performed if on, disabled
                      if off, and dependent on the SCSI controller revision if
                      sniff. The default is off.
                      Certain PCI adapters have been known to generate bad
                      parity on the PCI under certain loading conditions,
                      resulting in system errors. Ensure that your specific PCI
                      configuration will operate correctly prior to turning on
                      parity checking.
pk*0_fast             Enables fast SCSI mode.
pk*0_host_id          Specifies the default value for a controller host bus node
                      ID.
pk*0_soft_term        Enables or disables SCSI terminators on systems that use
                      the QLogic ISP1040 SCSI controller.
tt_allow_login        Enables or disables login to the SRM console firmware on
                      other console ports.

Initializing the System


The system might occasionally need to be reset, for example, if the operating system hangs.
NOTE: If the system hangs, press the Halt button to return to the console.

You can reset the system with the init command. The syntax is:
P00>>> init
Executing the init command is equivalent to pressing the Reset button. The system
performs self-tests and autoboots. The init command restarts the current in-memory console
image and resets all devices on the PCI bus. The system will not autoboot following an init
command if either of these conditions exists:

The Halt button on the control panel is pushed in.

The auto_action environment variable is set to halt.


NOTE: If the auto_action environment variable is set to boot or restart and the Halt button
is not pushed in, the system will autoboot. In all other cases, the system stops in console
mode and does not attempt to boot.
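
The autoboot rule in the note above reduces to a two-input decision. The Python sketch below encodes it. The function and parameter names are hypothetical and are not part of the SRM console; only the three auto_action values named in this chapter are accepted.

```python
VALID_AUTO_ACTIONS = {"halt", "boot", "restart"}


def autoboots_after_init(auto_action, halt_button_in):
    """Return True if the system autoboots after init or Reset.

    auto_action    -- value of the auto_action environment variable
    halt_button_in -- True if the Halt button on the control panel is pushed in
    """
    if auto_action not in VALID_AUTO_ACTIONS:
        raise ValueError(f"unknown auto_action value: {auto_action!r}")
    # Autoboot requires boot or restart AND the Halt button released.
    return auto_action in {"boot", "restart"} and not halt_button_in


print(autoboots_after_init("boot", halt_button_in=False))
print(autoboots_after_init("halt", halt_button_in=False))
```

In every case where the function returns False, the system stops at the P00>>> prompt instead of booting.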

Example
P00>>>init
Initializing...
128 Meg of system memory
probing hose 1, PCI
probing hose 0, PCI
probing PCI-to-ISA bridge, bus 1
bus 0, slot 5, function 1 -- dqa -- Cypress 82C693 IDE
bus 0, slot 5, function 2 -- dqb -- Cypress 82C693 IDE
bus 0, slot 6, function 0 -- pka -- Adaptec AIC-7895
bus 0, slot 6, function 1 -- pkb -- Adaptec AIC-7895
bus 0, slot 7 -- vga -- ELSA GLoria Synergy
bus 0, slot 8 -- pkc -- NCR 53C895
bus 0, slot 9 -- ewa -- DE500-AA Network Controller
Testing the System
Testing the Memory
Testing the Disks (read only)
Testing the Network
System Temperature is 34 degrees C
initializing GCT/FRU at 1ec000
COMPAQ AlphaServer DS20E Console v5.5-9, Aug 31 1999 11:52:26
P00>>>

Listing and Reading a File


You can list the files on the system using the ls command.
To read the contents of a file, use the cat command. When you need to read a file that scrolls
too quickly to be viewed, you can use the more command to display the file one page at a
time.
The syntax is:
P00>>> more [-<pagesize>] [<file>...]
where -pagesize specifies the number of lines to print before the prompt, and file
specifies the file(s) to be displayed.
P00>>>ls
P00>>>cat test
P00>>>more -9 test

Example
This example shows the contents of a file called "test". After a page is displayed, press the
space bar to see the next page of text.
P00>>>more test
echo "Requires diskette and loopback connectors on COM2 and parallel port"
echo "type kill_diags to halt testing"
echo "type show_status to display testing progress"
echo "type cat el to redisplay recent errors"
set d_group field
set d_harderr halt
set d_softerr halt
echo "Start exer on COM2"
serial1
echo "Start nettest on EWA0"
network
show memory
echo "Start Memory test "
memory
echo "Start exer on PARA"
parallel
echo "Start exer on DVA0"
floppy
if (ls dk*.* >nl) then
echo "Start exer on dk*"
disks
fi
dqtest
--More-- (SPACE - next page, ENTER - next line, Q - quit)

Editing Files
The edit command invokes a console editor, similar to a line editor in BASIC. It is used to
add, insert, and delete lines in RAM files or the NVRAM (nonvolatile RAM) power-up script.
CAUTION: Use caution when editing the NVRAM script. For example, if you include the
init command in the script, you will put the system into an endless loop. To correct this
error, press the Halt button while the system is powering up. When the P00>>> prompt is
displayed, edit the nvram script to remove the illegal command.

For more information on the edit command, see Creating a Power-up Script, later in this
chapter.

Depositing and Examining Data


You may need to examine or deposit data to help you in troubleshooting a system. For example,
you may need to examine or deposit data to verify that a device in a given PCI slot is responding
to configuration cycles. This might indicate whether the device is working. You may also
want to examine physical memory when you have been told that the operating system has left
special data at a known signature that will help in analyzing a crash.
Many console commands act on a device. The console treats a device as an address space or a
sequential byte stream. A device may represent an extent of memory (pmem), a set of registers
(gpr), a physical device (eerom), or a file (nvram). The console manipulates these byte streams
by performing typical device operations: open, read, write, close.
The examine and deposit commands manipulate devices when accessing data in the
system. The default device for these commands before booting the operating system is physical
memory (pmem). After booting the operating system, the default device is virtual memory
(vmem). The implied device is sticky. That is, all implicit examine/deposit references access
either the default device or the last referenced device. When another device is explicitly
specified in a command, that device becomes the default.

Internally, the console uses drivers as the access mechanism for referencing different devices.
Specifically, the console provides drivers for the following generic devices or address spaces:

pmem: physical memory

vmem: virtual memory

gpr: general purpose registers

fpr: floating point registers

ipr: internal processor registers

These hardware devices are also accessible using the device names shown here:

eerom: 8 KB of EEROM, of which the first 6 KB is used to store ARC/AlphaBIOS
nonvolatile data and the last 2 KB is used to store SRM nonvolatile data

toy: 64 bytes, of which the first 14 bytes are Time-of-Year Clock registers and the last 40
bytes are private BBU RAM

Deposit Command
The deposit command stores data in a specified location. For example, depositing the
hexadecimal number 9b (155 in decimal) at address 0 of physical memory stores it as
000000000000009B. By default, data is stored as a quadword (eight bytes).
CAUTION: Before experimenting with memory, find a safe area in memory to alter. The
console and other critical data structures reside in memory. Be careful not to alter them
inadvertently. Use the alloc command to allocate a block of memory for experimentation.

The syntax of the deposit command is:


P00>>>deposit [-{b,w,l,q,o,h}] [-{p,v,g,f,i}] [-n <count>] [-s <step>] [<device>:]<address> <data>
NOTE: If you do not specify options with a deposit or examine command, the system
uses the options from the preceding command.

Options

Value          Definition

-b             Defines data size as a byte (8 bits).
-w             Defines data size as a word (16 bits).
-l             Defines data size as a longword (32 bits).
-q (default)   Defines data size as a quadword (64 bits). All values default to
               8 bytes.
-o             Defines data size as an octaword.
-h             Defines data size as a hexword.
-d             NOTE: This option applies to the examine command only.
               The data is displayed as decoded macro instructions. Alpha
               instruction decode (-d) does not recognize machine-specific PAL
               instructions.
-p             The address space is physical memory.
-v             The address space is virtual memory.
-g             The address space is general purpose registers.
-f             The address space is floating-point registers.
-i             The address space is internal processor registers.
-n <count>     The number of consecutive locations to modify.
-s <step>      The address increment size. (The default is the data size.)

Example
The deposit command can be abbreviated to d.
Clear first 512 bytes of physical memory.
P00>>>d -b -n 1FF pmem:0 0
Deposit 5 into four longwords starting at virtual memory address 1234.
P00>>>d -l -n 3 vmem:1234 5

Load GPRs R0 through R8 with -1.


P00>>>d -n 8 R0 FFFFFFFF
Deposit 8 into the first longword of physical memory. Then repeat this operation 16 times,
incrementing each address by 200 for each operation. Result: The value 8 is deposited into
addresses 0, 200, 400, 600, 800, a00, c00, e00, 1000, 1200, 1400, 1600, 1800, 1a00, 1c00, 1e00,
and 2000.
P00>>>d -l -n 10 -s 200 pmem:0 8
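
The examples above imply two conventions: the -n and -s values are hexadecimal, and a count of n touches n + 1 locations (the first deposit plus n repeats). The Python sketch below computes the addresses a deposit would touch under those conventions. The function name is hypothetical, and the octaword (16 bytes) and hexword (32 bytes) sizes are the conventional values rather than figures given in this guide.

```python
# Data size in bytes for each size option. The -o and -h sizes are the
# conventional octaword/hexword widths and are an assumption here.
DATA_SIZES = {"b": 1, "w": 2, "l": 4, "q": 8, "o": 16, "h": 32}


def deposit_addresses(start, size_flag="q", count_hex="0", step_hex=None):
    """List the addresses touched by a deposit.

    start     -- starting address (integer)
    size_flag -- one of b, w, l, q, o, h (default q, a quadword)
    count_hex -- the -n value as typed (hexadecimal string)
    step_hex  -- the -s value as typed, or None to step by the data size
    """
    size = DATA_SIZES[size_flag]
    count = int(count_hex, 16)
    step = int(step_hex, 16) if step_hex is not None else size
    # The initial deposit plus `count` repeats gives count + 1 locations.
    return [start + i * step for i in range(count + 1)]


# d -b -n 1FF pmem:0 0       clears bytes 0 through 1ff
cleared = deposit_addresses(0, "b", "1FF")
# d -l -n 3 vmem:1234 5      four longwords starting at 1234
longwords = deposit_addresses(0x1234, "l", "3")
# d -l -n 10 -s 200 pmem:0 8 seventeen locations, 200 apart
stepped = deposit_addresses(0, "l", "10", "200")

print(len(cleared), [f"{a:x}" for a in longwords], f"{stepped[-1]:x}")
```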

Examine Command
The examine command displays the contents of an address you specify: a memory location, a
register, a device, or a file.
As with the deposit command, if you do not specify options in an examine command, the
system uses the options from the last examine command that was entered. Also, if you specify
conflicting address space or data size, the system ignores the command and issues an error
message.
NOTE: For data lengths longer than a longword, data should be separated by a space.

The examine command uses the same options and arguments as the deposit command with
two exceptions:

It also uses a -d option, which specifies an instruction decode.

It does not use the data argument.

The syntax for an examine command is the same as for the deposit command, with the
exceptions noted above.

Examples
These examples show how you can use the examine command to view the contents of
different devices.
Examine physical memory location 0.
P00>>>examine pmem:0
pmem: 0 0000000000000000
Deposit the hex number 9b to location 0 in physical memory and then view its contents.
NOTE: By default, data is stored as a quadword, so the number stored is padded with
leading zeros to the full data length.

P00>>>deposit pmem:0 9b
P00>>>examine pmem:0
pmem: 0 000000000000009B
Examine the next location.
NOTE: An examine or deposit command without an explicit address always references
the next address (computed as the last referenced address plus the current data size).

P00>>>examine
pmem: 8 0000000000000000
Examine location 0 again.
P00>>>examine 0
pmem: 0 000000000000009B
Examine the contents of the TOY register.
P00>>>examine toy:0
toy: 0 1C06AF026F37002E
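
The quadword padding and the implicit next-address rule used in these examples can be reproduced with a few lines of Python. This is a sketch only: the helper names are illustrative, and the output format simply imitates the pmem lines shown above.

```python
QUADWORD_BYTES = 8


def format_quadword(value):
    """Render a value as a 64-bit quadword: 16 uppercase hex digits."""
    return f"{value & 0xFFFFFFFFFFFFFFFF:016X}"


def next_address(last_address, data_size_bytes=QUADWORD_BYTES):
    """Address referenced by an examine or deposit with no explicit address:
    the last referenced address plus the current data size."""
    return last_address + data_size_bytes


# deposit pmem:0 9b, then examine pmem:0
first_line = f"pmem: {0:x} {format_quadword(0x9B)}"
# examine with no address moves on by one quadword
second_line = f"pmem: {next_address(0):x} {format_quadword(0)}"

print(first_line)
print(second_line)
```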

Creating a Power-Up Script


The system has a special nonvolatile file named "nvram" that is stored in EEROM. Nvram is a
user-created power-up script (set of commands) that is always invoked during the power-up
sequence. You use the SRM edit command to create or alter the nvram script.
NOTE: It is possible to disable the system by editing the nvram script. For example, if you
include the init command in the script, the system will go into an endless loop. To fix this,
press the Halt button while the system is powering up. You can then edit the script to
delete the offending command.

Editing the Nvram Script


Using the SRM edit command, you can create an nvram script to include any commands you
want the system to execute at power-up. The edit command provides editing commands that
allow you to list the file, renumber lines, delete lines, write over lines, and so on.
The syntax for the edit command is:
P00>>>edit file
where file is the name of the file to be edited.
This Command:   Does This:

help            Displays the brief help file.
list            Lists the current file prefixed with line numbers.
renumber        Renumbers the lines of the file in increments of 10.
exit            Leaves the editor and closes the file, saving all changes.
quit            Leaves the editor and closes the file without saving changes.
nn              Deletes line number nn.
nn text         Adds or overwrites line number nn with text.
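
The numbered-line behavior in the table above works like a classic BASIC line editor. The Python sketch below models only the nn, nn text, list, and renumber operations so the semantics are easy to see. It is not the SRM editor itself: the class is hypothetical, line numbers are assumed to be decimal, and renumbering is assumed to start at 10.

```python
class LineEditor:
    """Toy model of a numbered-line editor."""

    def __init__(self):
        self.lines = {}  # line number -> text

    def enter(self, command):
        """'nn text' adds or overwrites line nn; 'nn' alone deletes it."""
        number, _, text = command.strip().partition(" ")
        line_no = int(number)
        if text:
            self.lines[line_no] = text
        else:
            self.lines.pop(line_no, None)

    def list(self):
        """Return the file with each line prefixed by its number."""
        return [f"{n} {self.lines[n]}" for n in sorted(self.lines)]

    def renumber(self):
        """Renumber lines in increments of 10 (assumed to start at 10)."""
        ordered = [self.lines[n] for n in sorted(self.lines)]
        self.lines = {10 * (i + 1): text for i, text in enumerate(ordered)}


editor = LineEditor()
editor.enter("10 set ewa0_protocols bootp")
editor.enter("15 show config")
editor.renumber()
print(editor.list())
```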

Example
This example shows how to modify the user-created power-up script, nvram. The pound sign
(#) indicates explanatory comments. In this example the script is edited to include a command
that allows you to boot the Tru64 UNIX operating system over the network.
P00>>> edit nvram                #Modify user power-up script, nvram
editing nvram
0 bytes read in
*10 set ewa0_protocols bootp
*list                            #List current file with line numbers
10 set ewa0_protocols bootp
*exit                            #Close file and save changes
27 bytes written out to nvram
P00>>> nvram                     #Execute the script.

Booting an Operating System


The boot command initializes the processor, loads a program image from the specified boot
device, and transfers control to that image.
Command qualifiers and arguments allow you to modify the boot sequence.
The syntax is:
P00>>>boot [-file <filename>] [-flags <value>[,<value>]] [-protocols <enet_protocol>]
      [-halt] [<boot_device>][,<boot_device>]

Qualifier           Meaning

-file <filename>    This qualifier specifies the name of a file to load into
                    the system. The default is null.
                        P00>>> boot -file vmunix
                    Use the set boot_file command to specify a default boot
                    file. If no default file has been specified, you may have
                    to enter the boot filename.

-flags <value>      This qualifier specifies additional information to the
                    loaded image or operating system.
                        P00>>> boot -flags a
                    In Tru64 UNIX, it specifies boot flags. This qualifier
                    overrides the setting of the boot_osflags environment
                    variable.
                    The following parameters are used with the Tru64 UNIX
                    operating system:
                    a   Autoboot. Use this for a system that should come up
                        automatically after a power failure.
                    s   Stop in single-user mode.
                    i   Interactive boot. Request the name of the image to
                        boot from the specified boot device.
                    D   Full memory dump, implies "s" as well.

-protocols          This qualifier specifies the Ethernet protocol(s) to be
<enet_protocol>     used for a network boot. Either mop or bootp may be
                    specified. If both are specified, the console attempts
                    each protocol to solicit a boot server.
                        P00>>> boot -protocols mop
                    The qualifier overrides the setting of the ew*0_protocols
                    environment variable.
                    The qualifier -protocols boot_dev defines a device path
                    list of devices from which the console program attempts to
                    boot, or a saved boot specification in the form of an
                    environment variable (option). This qualifier overrides
                    the setting of the bootdef_dev environment variable.
                    NOTE: Use the bootdef_dev environment variable to define
                    the default boot device string.

-halt               This qualifier forces the bootstrap operation to halt
                    after the load and to return to the console prompt.
                        P00>>> boot -halt
                    The console program is started after the system has loaded
                    the bootstrap program and has set up page tables and other
                    necessary data structures. During this process, console
                    device drivers are not shut down. You can transfer control
                    to the image by entering the continue command.

<boot_device>       This argument specifies a device path or list of devices
                    from which the firmware attempts to boot, or a saved boot
                    specification in the form of an environment variable.
                        P00>>> boot dka0
                    Use the set bootdef_dev command to define the default boot
                    device. If no default device has been specified, then the
                    user must enter it at the command prompt.
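
When boot commands are generated by a script (for example, one driving a serial console), it can help to assemble the command line from its parts in the order given by the syntax line. The Python sketch below does that. The function is hypothetical and exists only for illustration; it accepts a single protocol because this guide does not show how two protocols are separated on the command line.

```python
def build_boot_command(boot_devices=(), file=None, flags=(), protocol=None,
                       halt=False):
    """Assemble an SRM boot command string from its qualifiers.

    boot_devices -- device names, joined with commas as in the syntax line
    file         -- value for -file, or None to omit it
    flags        -- values for -flags, joined with commas
    protocol     -- "mop" or "bootp" for -protocols, or None to omit it
    halt         -- True to append -halt
    """
    parts = ["boot"]
    if file:
        parts += ["-file", file]
    if flags:
        parts += ["-flags", ",".join(flags)]
    if protocol is not None:
        if protocol not in ("mop", "bootp"):
            raise ValueError("protocol must be 'mop' or 'bootp'")
        parts += ["-protocols", protocol]
    if halt:
        parts.append("-halt")
    if boot_devices:
        parts.append(",".join(boot_devices))
    return " ".join(parts)


print(build_boot_command(["dka0"], file="vmunix", flags=["a"]))
print(build_boot_command(["ewa0"], protocol="bootp", halt=True))
```

Any qualifier left out of the generated command falls back to the corresponding environment variable (boot_file, boot_osflags, ew*0_protocols, bootdef_dev), as described in the table.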

Forcing a System Crash Dump


The crash command forces a crash dump at the operating system level. This command is
used when an error has caused the system to hang and the system can be halted with the Halt
button or the Remote Console Manager's halt command. This command restarts the operating
system and forces a crash dump to the selected device.

Obtaining Help
You can use the SRM console's on-line help system for reference.
NOTE: The on-line help may display commands that are not supported on the DS20E
system, and it may not display some commands that are supported.

You invoke help with either the help or man command.


The following commands are documented in SRM Help:
alloc         boot          break         cat           check
chmod         clear         continue      crash         debug1
deposit       dynamic       echo          edit          eval
examine       exer          exit          false         find_field
free          grep          halt          hd            help
init          isacfg        isp1020_edit  kill          kill_diags
line          ls            man           memexer       memtest
more          net           nettest       ps            rm
sa            semaphore     set           set host      shell
show          show cluster  show config   show iobq     show map
show memory   show_status   sleep         sp            start
stop          sys_exer      true          update        wc

For more information about SRM console commands, consult the reference manual,
AlphaServer 800, 1000/A, 2x00/A, 4x00, 8x00 SRM Console Command Line Interface:
http://prosic.cxo.dec.com/PUBS/SYSTEMS/EK-ASCLI-SRM-04.pdf

Using the AlphaBIOS Console


Starting AlphaBIOS
Start the AlphaBIOS Setup by pressing F2 from the Boot screen displayed at power-up or reset.
AlphaBIOS Version 5.68
Please select the operating system to start:
Tru64 UNIX

Press Enter to choose.

AlphaPowered
Press F8 For Windows 2000 Advanced Startup Options

Press <F2> to enter SETUP



To invoke AlphaBIOS, enter the following command at the SRM console:


P00>>> alphabios
The AlphaBIOS Setup screen shown below is displayed. From this screen you can select the
tasks to perform. Use the arrow keys to select the menu item you want and press Enter.
NOTE: Only the following choices are applicable for the Tru64 UNIX and
OpenVMS operating systems:
Utilities
About AlphaBIOS

AlphaBIOS Setup                                          F1=Help

    Display System Configuration...
    Upgrade AlphaBIOS
    Hard Disk Setup...
    CMOS Setup...
    Install Windows NT
    Utilities                    Run Maintenance Program...
    About AlphaBIOS...

ESC=Exit

Keyboard Conventions and Help


AlphaBIOS uses DOS and Windows keyboard conventions for navigating the interface and
selecting items. The valid keystrokes are listed in the keyboard help screens.
Two levels of keyboard help are available. The first level, reached by pressing F1 once, shows
explanations of the keystrokes available for the specific part of AlphaBIOS currently displayed.
The second level of keyboard help, shown below, is reached by pressing F1 from the first help
screen. The second level screen shows the keystrokes for navigating AlphaBIOS.
AlphaBIOS Setup: Help: Action Keys

Key               Action
TAB               Move highlight forward between fields of a dialog.
SHIFT+TAB         Move highlight backward between fields of a dialog.
Up/Down arrow     Move highlight within a menu, or cycle through available field
                  values in a dialog window.
ALT+Down arrow    Drop down a menu of choices from a drop-down listbox. A
                  drop-down listbox can be recognized by the down-arrow symbol.
HOME              Move to the beginning of a text entry field.
END               Move to the end of a text entry field.
Left/Right arrow  Move to the left or right in a text entry field.
ESC               Discard changes and/or back up to previous screen.

ENTER=Continue

Running AlphaBIOS from a Serial Terminal


Utilities can be run from either a VGA monitor or a serial terminal (VT200 or higher, or
equivalent). The following table gives the serial terminal key equivalents of the graphics
monitor keyboard commands.

Serial Terminal Key Commands for AlphaBIOS

Graphics Terminal Command    Serial Terminal Key Equivalent
F1                           Ctrl + A
F2                           Ctrl + B
F3                           Ctrl + C
F4                           Ctrl + D
F5                           Ctrl + E
F6                           Ctrl + F
F7                           Ctrl + P
F8                           Ctrl + R
F9                           Ctrl + T
F10                          Ctrl + U
Insert                       Ctrl + V
Delete                       Ctrl + W
Backspace                    Ctrl + H
Esc                          Ctrl + [
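
The key equivalents above can be captured as a lookup table, which may be handy when scripting a terminal emulator session. The following is a minimal sketch, not part of AlphaBIOS; the names SERIAL_EQUIVALENTS and control_byte are hypothetical, and the byte values assume the usual ASCII convention for control characters.

```python
# Serial-terminal equivalents of the graphics-keyboard keys, transcribed
# from the table above. Each value is the character pressed with Ctrl.
SERIAL_EQUIVALENTS = {
    "F1": "A", "F2": "B", "F3": "C", "F4": "D", "F5": "E",
    "F6": "F", "F7": "P", "F8": "R", "F9": "T", "F10": "U",
    "Insert": "V", "Delete": "W", "Backspace": "H", "Esc": "[",
}


def control_byte(key: str) -> bytes:
    """Return the single byte a terminal sends for the key's Ctrl equivalent."""
    symbol = SERIAL_EQUIVALENTS[key]
    # ASCII convention: Ctrl+X is the low five bits of X's character code.
    return bytes([ord(symbol) & 0x1F])


if __name__ == "__main__":
    for name in SERIAL_EQUIVALENTS:
        print(f"{name:10s} Ctrl + {SERIAL_EQUIVALENTS[name]}  {control_byte(name)!r}")
```

For example, F1 maps to Ctrl + A, which a terminal sends as byte 0x01, and Esc maps to Ctrl + [, which is byte 0x1B.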


Utilities
Configuration utilities are run directly from the AlphaBIOS Utilities menu.
Run Maintenance Program

F1-Help

Program Name: __________________

Location: A:

ENTER=Execute

ESC=Quit

If you change your system configuration, for example, by adding another RAID drive, you will
have to run the RAID configuration utility. As you modify your system, you might be required
to run other types of configuration utilities as well. Configuration utilities (also called
maintenance programs) are run directly from the AlphaBIOS Utilities menu.


To Run a Configuration Utility


1. From AlphaBIOS Setup, select Utilities. From the submenu that is displayed, select
Run Maintenance Program and press Enter.
2. In the Run Maintenance Program dialog box, type the name of the program to be run
in the Program Name field. (The program must be an executable.) Then tab to the
Location list box, and select the hard disk partition, diskette, or CD-ROM drive from
which to run the program.
3. Press Enter to execute the program.
NOTE: If you are running a utility from a diskette, you can type the utility's
name into the Program Name field, and press Enter. The diskette drive is
the default selection in the Location field. Use Alt+Down arrow when a list
box is selected to open the list.


Remote Management Console


The remote management console (RMC) monitors and controls the system remotely. The
control logic resides on the system board. The RMC firmware resides on the server features
module and can be accessed only through COM1. The RMC is run from a serial console
terminal or terminal emulator. A command interface lets you reset, halt, and power the system
on or off, regardless of the state of the operating system or hardware. You can also use the RMC
to monitor system power and temperature.
You can invoke the RMC either remotely or through the local serial console terminal. Once in
RMC command mode, you can enter commands to control and monitor the system. Only one
RMC session can be active at a time.
CAUTION: Do not issue RMC commands until the system has powered up. If you enter
certain RMC commands during power-up or reset, the system may hang. In that case, you
would have to disconnect the power cord at the power outlet. You can, however, use the
RMC halt command during power-up to force a halt assertion.

First-Time Setup
Using RMC Locally or with a Modem on COM1
To connect to the RMC locally, the console terminal has to be connected to COM1. You type
the escape sequence at the SRM console prompt on the local serial console terminal to enter
RMC command mode. You can invoke RMC from the SRM console, the operating system, or
an application.

To invoke the RMC locally, type the RMC escape sequence.

To exit RMC and reconnect to the system console port, enter the quit command.

Press Return to get a prompt from the operating system or system console.

Example: Invoking and Leaving RMC Locally


P00>>> ^]^]RCM
RCM>
RCM> quit
Focus returned to COM port
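
The escape sequence in the example is written in caret notation, where ^] stands for Ctrl plus the right bracket key. The sketch below shows one way to turn such a string into the bytes a terminal would send. It is an illustration only: the function name expand_caret_notation is hypothetical, and the conversion assumes the standard ASCII control-character convention rather than anything specific to the RMC firmware.

```python
def expand_caret_notation(text: str) -> bytes:
    """Convert caret notation such as '^]^]RCM' to the bytes a terminal sends."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "^" and i + 1 < len(text):
            # '^X' stands for Ctrl+X: the low five bits of X's ASCII code.
            out.append(ord(text[i + 1].upper()) & 0x1F)
            i += 2
        else:
            # Any other character is sent unchanged (ASCII assumed).
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


# Default escape sequence as shown in the example above.
DEFAULT_ESCAPE = expand_caret_notation("^]^]RCM")

if __name__ == "__main__":
    print(DEFAULT_ESCAPE)
```

With this convention, each ^] becomes the single byte 0x1D, followed by the three letters.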


RMC Commands
The following RMC commands are used to control and monitor a system remotely:

Command     Function
halt        Halts the server. Emulates pressing the Halt button and immediately
            releasing it.
haltin      Causes a halt assertion. Emulates pressing the Halt button and
            holding it in.
haltout     Terminates a halt assertion created with the haltin command.
            Emulates releasing the Halt button after holding it in.
help or ?   Displays the list of commands.
poweroff    Turns off power. Emulates pressing the On/Off button to the off
            position.
poweron     Turns on power. Emulates pressing the On/Off button to the on
            position.
quit        Exits console mode and returns to system console port.
reset       Resets the server. Emulates pressing the Reset button.
setesc      Changes the escape sequence for invoking command mode.
status      Displays system status and sensors.

Command Conventions

The commands are not case sensitive.

A command must be entered in full.

You can delete an incorrect command with the Backspace key before you press Enter.

If you type a valid RMC command, followed by extra characters, and press Enter, the
RMC accepts the correct command and ignores the extra characters.

If you type an incorrect command and press Enter, the command fails with the message:
*** ERROR - unknown command ***
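
The conventions above can be modelled in a few lines. This is an illustrative sketch only, not the RMC firmware's actual parser: the function name interpret is hypothetical, the command names are those used in the command sections of this chapter, and the prefix-matching rule is an assumption about how trailing extra characters are ignored.

```python
RMC_COMMANDS = ("halt", "haltin", "haltout", "help", "?", "poweroff",
                "poweron", "quit", "reset", "setesc", "status")

UNKNOWN = "*** ERROR - unknown command ***"


def interpret(line: str) -> str:
    """Return the recognised command name, or the error message."""
    text = line.strip().lower()  # commands are not case sensitive
    # Try longer names first so that 'haltin' is not mistaken for 'halt'.
    for name in sorted(RMC_COMMANDS, key=len, reverse=True):
        if text.startswith(name):  # trailing extra characters are ignored
            return name
    # Abbreviations such as 'stat' do not match: commands must be in full.
    return UNKNOWN


if __name__ == "__main__":
    for entry in ("HALT", "haltin now", "stat", "PowerOn"):
        print(f"{entry!r:15} -> {interpret(entry)}")
```

Checking longer names first matters because several commands share a prefix (halt, haltin, haltout).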

Halt
The halt command halts the managed system. The halt command is equivalent to pressing
and then immediately releasing the Halt button on the control panel. The RMC firmware exits
command mode and reconnects the user's terminal to the system COM1 serial port.
RCM>halt
Focus returned to COM port
NOTE: The halt command can be used to force a halt assertion.

Haltin
The haltin command halts a managed system and forces a halt assertion. The haltin
command is equivalent to pressing and holding in the Halt button on the control panel. This
command can be used at any time after system power-up to allow you to perform system
management tasks.


Haltout
The haltout command terminates a halt assertion that was done with the haltin
command. It is equivalent to releasing the Halt button on the control panel after holding it in
(rather than pressing it once and releasing it immediately). This command can be used at any
time after system power-up.

Help or ?
The help or ? command displays all of the RMC firmware commands.

Poweroff
The poweroff command requests the RMC to power off the system. The poweroff
command is equivalent to pressing the On/Off button on the control panel to the off position.
RCM>poweroff
If the system is already powered off or if switch 3 (RPD DIS) on the switchpack has been set to
the on setting (disabled), this command has no immediate effect.
To power the system on again after using the poweroff command, you must issue the
poweron command.
If you are not able to issue the poweron command, the local operator can start the system as
follows:
1. Press the On/Off button to the off position and disconnect the power cord.
2. Reconnect the power cord and press the On/Off button to the on position.

Poweron
The poweron command requests the RMC to power on the system. The poweron command
is equivalent to setting the On/Off button on the control panel to the ON position. For the
system to power on, the following conditions must be met:

AC power must be present at the power supply inputs.

The On/Off button must be in the on position.

All system interlocks must be set correctly.

The RMC exits command mode and reconnects the user's terminal to the system console port.
RCM>poweron
Focus returned to COM port
NOTE: If the system is powered off with the On/Off button, the system will not power up
from the RMC. The RMC will not override the "off" state of the On/Off button. If the system
is already powered on, the poweron command has no effect.


Quit
The quit command exits the user from command mode and reconnects the serial terminal to
the system console port. The following message is displayed:
Focus returned to COM port
The next display depends on what the system was doing when the RMC was invoked. For
example, if the RMC was invoked from the SRM console prompt, the console prompt is
displayed when you enter a carriage return. If the RMC was invoked from the operating system
prompt, the operating system prompt is displayed when you enter a carriage return.

Reset
The reset command requests the RMC to reset the hardware. The reset command is
equivalent to pressing the Reset button on the control panel.
RCM>reset
Focus returned to COM port
The following events occur when the reset command is executed:

The system restarts and the system console firmware reinitializes.

The console exits RMC command mode and reconnects the serial terminal to the system
COM1 serial port.

The power-up messages are displayed, and then the console prompt is displayed or the
operating system boot messages are displayed, depending on how the startup sequence has
been defined.

Setesc
The setesc command resets the default escape sequence for invoking RMC. The escape
sequence can be any character string. A typical sequence consists of 2 or more characters, to a
maximum of 15 characters. The escape sequence is stored in the module's on-board NVRAM.
NOTE: Be sure to record the new escape sequence. Although the factory defaults can be
restored if you forget the escape sequence, this requires resetting the EN RMC switch on
the RMC switchpack. See Using the RMC Switchpack.

The following sample escape sequence consists of five iterations of the Ctrl key and the letter
"o".
RCM>setesc
^o^o^o^o^o
RCM>


If the escape sequence entered exceeds 15 characters, the command fails with the message:
*** ERROR ***
When changing the default escape sequence, avoid using special characters that are used by the
system's terminal emulator or applications.
Control characters are not echoed when entering the escape sequence. Use the status
command to verify the complete escape sequence.
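
Before entering a new escape sequence, it can help to check its length against the 15-character limit described above. The following is a minimal sketch under stated assumptions: check_escape_sequence is a hypothetical helper, each control character is counted as one character, and sequences shorter than the "typical" two characters are only flagged, since the text does not say they are rejected.

```python
MAX_ESCAPE_LENGTH = 15  # maximum length stated in the text


def check_escape_sequence(sequence: bytes) -> str:
    """Return 'ok' if the sequence fits, else the error text shown above."""
    if len(sequence) > MAX_ESCAPE_LENGTH:
        return "*** ERROR ***"
    if len(sequence) < 2:
        # Assumption: 2 or more characters is described only as typical,
        # so this sketch warns instead of rejecting.
        return "ok (shorter than the typical 2 characters)"
    return "ok"


if __name__ == "__main__":
    five_ctrl_o = b"\x0f" * 5  # the sample sequence: Ctrl+o typed five times
    print(check_escape_sequence(five_ctrl_o))
    print(check_escape_sequence(b"x" * 16))
```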

Status
The status command displays the current state of the system sensors, as well as the current
escape sequence and alarm information. The following is an example of the display:
RCM>status
Firmware Rev: V2.0
Escape Sequence: ^]^]RCM
Remote Access: ENABLE
Temp (C): 26.0
RCM Power Control: ON
RCM Halt: Deasserted
External Power: ON
Server Power: ON
RCM>

RMC Status Command Fields


Item                 Description
Firmware Rev:        Revision of RMC firmware.
Escape Sequence:     Current escape sequence to invoke RMC.
Remote Access:       Modem remote access state. (ENABLE/DISABLE)
Temp (C):            Current system temperature in degrees Celsius.
RCM Power Control:   Current state of RMC system power control. (ON/OFF)
RCM Halt:            "Asserted" indicates that halt has been asserted with the
                     haltin command. "Deasserted" indicates that halt has been
                     deasserted with the haltout command or by cycling power
                     with the On/Off button on the control panel. The RCM Halt
                     field does not report halts caused by pressing the Halt
                     button.
External Power:      Current state of power to RMC. Always on.
Server Power:        Indicates whether power to the system is on or off.
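
If the status display is captured from a terminal session, its fields can be split into name and value pairs for logging or threshold checks. The sketch below is an assumption-laden illustration: parse_status is a hypothetical helper, and the sample text is simply the example display from this section.

```python
SAMPLE = """\
RCM>status
Firmware Rev: V2.0
Escape Sequence: ^]^]RCM
Remote Access: ENABLE
Temp (C): 26.0
RCM Power Control: ON
RCM Halt: Deasserted
External Power: ON
Server Power: ON
RCM>
"""


def parse_status(text: str) -> dict:
    """Split 'Name: value' lines into a dictionary; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so values may contain colons.
        name, separator, value = line.partition(": ")
        if separator:
            fields[name.strip()] = value.strip()
    return fields


status = parse_status(SAMPLE)
temperature_c = float(status["Temp (C)"])

if __name__ == "__main__":
    print(status)
    print("Temperature:", temperature_c)
```

Prompt lines such as RCM> contain no "name: value" pair and are ignored by the parser.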


Using the RMC Switchpack


The RMC operating mode is controlled by a switchpack on the server features module V2
(SFM2). Use the switches to enable or disable certain RMC functions, if desired.


Switch: SW1-1
Function: PIC Enable
Description: This switch redirects the system COM1 transmit and receive
serial lines to the SFM2, enabling the RMC to communicate
with the device attached to COM1. Note that the hardware
flow control lines are not passed to the SFM2 from the DP264.

Switch: SW1-2
Function: Fault Shutdown Disable
Description: This switch allows the user to bypass the critical fault
shutdown that normally occurs when a CPU fan fails, the
internal ambient temperature exceeds a preset threshold (set
to 50 or 55 degrees C), or a power supply failure results in an
invalid power supply configuration. Invalid power supply
configuration basically means there is only one good power
supply remaining. It can take multiples of the 30-second
shutdown timer for the power supply configuration to result in
a final shutdown. Overcurrent sensing in the power supply
protects the system from damaging itself and will shut off
power more quickly if needed.

Switch: SW1-3
Function: PIC SYSPWR_ENABLE Bypass
Description: This switch bypasses the control signal from the PIC that
enables the system to operate. With this switch in the
CLOSED or ON position, software cannot shut the system
down. Use this to prevent remote shutdown.

Switch: SW1-4
Function: Load PIC Defaults
Description: This switch forces the PIC to default settings.

Changing a Switch Setting


1. Turn off the system.
2. Unplug the AC power cords.
NOTE: If you do not unplug the power cord, the new setting will not take effect when you
power up the system.

3. Remove the system covers.


4. Locate the RMC switchpack on the server features module and change the switch setting
as desired.
5. Replace the system covers and plug in the power cords.
6. Power up the system to the SRM console prompt and type the escape sequence to enter
RMC command mode, if desired.


Resetting the RMC to Factory Defaults


You can reset the RMC to factory settings, if desired. You would need to do this if you forgot
the escape sequence for the RMC. Follow the steps below.
1. Turn off the system.
2. Unplug the AC power cords.
NOTE: If you do not unplug the power cord, the new setting will not take effect when you
power up the system.

3. Remove the system covers.


4. Locate the RMC switchpack on the server features module and set switch 4 to ON.
5. Replace the system covers and plug in the power cords.
6. Power up the system to the SRM console prompt.
NOTE: Powering up with switch 4 set to ON resets the escape sequence, password, and
modem enable states to the factory defaults.

7. Power down the system, unplug the AC power cords, and remove the system covers.
8. Set switch 4 to OFF.
9. Replace the system covers and plug in the power cords.
10. Power up the system to the SRM console prompt, and type the default escape sequence to
invoke RMC command mode. The escape sequence is the Ctrl key + right bracket key,
typed twice, followed by the letters rcm:
^]^]RCM


Troubleshooting the RMC


The following table provides a list of possible causes and suggested solutions for problem
symptoms you might see with the RMC:

Symptom: The local console terminal is not accepting input.
Possible Cause: Cables not correctly installed.
Suggested Solution: Check external cable installation.
Possible Cause: Switch 1 on the switchpack is set to disable.
Suggested Solution: Set switch 1 to ON.

Symptom: The console terminal is displaying unrecognizable characters.
Possible Cause: System and terminal baud rate set incorrectly.
Suggested Solution: Disable RMC and set the system and terminal baud rates to 9600
baud.

Symptom: After the system and RMC are powered up, the COM port seems to hang
briefly.
Possible Cause: This delay is normal behavior.
Suggested Solution: Wait a few seconds for the COM port to start working.

Symptom: RMC installation is complete, but the system does not power up.
Possible Cause: RMC Power Control is set to DISABLE.
Suggested Solution: Invoke RMC and issue the poweron command.
Possible Cause: Cables are not correctly installed.
Suggested Solution: Reseat the cables.

Symptom: You reset the system to factory defaults, but the factory settings did
not take effect.
Possible Cause: AC power cords were not removed before you reset switch 4 on the
RMC switchpack.
Suggested Solution: Refer to Using the RMC Switchpack.

Symptom: The message "unknown command" is displayed when the user enters a
carriage return by itself.
Possible Cause: The terminal or terminal emulator is including a linefeed
character with the carriage return.
Suggested Solution: Change the terminal or terminal emulator setting so that
"new line" is not selected.

Chapter

Troubleshooting

Introduction
As a service engineer, you are responsible for troubleshooting a system. Following a simple
checklist or strategy can help minimize the time you spend or help ensure that an obvious
problem is not overlooked. Many resources are available to help you isolate problems.
Topics in this chapter are:

Basic Troubleshooting

Problem Categories

Power-Up/Down Sequence

Troubleshooting Tools and Utilities

Using Firmware to Troubleshoot

Changing the System Type

For More Information


Basic Troubleshooting
Considerations Before Troubleshooting
Before troubleshooting a problem, check the site maintenance log for service history. Ask the
system manager the following questions:
1. Has the system been used before and did it work correctly?
2. Have any recent changes to hardware or updates to firmware or software occurred?
3. If changes or updates were made, are the revision numbers compatible for the system and
the operating system?
4. What is the state of the system?
a. Operating system is down and you cannot bring it up.
b. Operating system is running.

If the operating system is down and you cannot bring it up, use power-up information and
console environment tools:

Power-up display

LEDs and beep codes

ROM-based diagnostics

If the operating system is running, use the operating system (OS) to gather information from
crash dumps, error logs, and the operator log. Run OS-based diagnostics.

Steps for Isolating Faults


Follow these steps to isolate a fault. Also refer to the chart on the next page.
1. Define the problem.
2. Gather information about the problem, including system status, revision levels of firmware,
and the operating system.
3. Evaluate the information.
4. Isolate the problem.
5. Solve the problem. If necessary, escalate.
6. Clean up any modifications made in testing.
7. Verify the solution.
8. Document the problem and record the solution.

[Figure: Troubleshooting Strategy (flowchart not reproduced)]

Problem Categories
System problems can be classified into the following categories. Using these categories, you can
quickly determine a starting point for diagnosis and eliminate the unlikely sources of the
problem.

Power problems

No access to console mode

Console-reported failures

Boot problems

Thermal problems

Operating system-reported failures

System hung at power-on

Memory problems

PCI bus problems

SCSI problems

Power Problems
If the system does not power on, perform the following steps:
1. Check the power source and power cord.
2. Ensure that the power switch connector is plugged in.
3. Check the magnetic interlock switch.
4. Check that the ambient room temperature is within environmental specifications
(10°C to 40°C, 50°F to 104°F).
5. Check the remote management console using the status command. Look for fan status,
system temperatures, or power supply failures.
6. Check that the cables on the system board are connected properly.
7. Check that the internal power supply cables are plugged in at both the power supply and
system board.
8. Ensure that both fans are plugged in and operating properly. Any non-operational fan will
cause automatic OS shutdown and can prevent power-on.
9. Look for short circuits or overcurrent if the power will not stay on.
10. Wait two or more seconds after plugging the unit in before powering it on.
If the power button seems to be flashing, ACPI sleep mode is on. Push and hold the system
power button for more than four seconds if you want to shut the power off completely.
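
The ambient temperature limits in the steps above are given in both Celsius and Fahrenheit. As a quick worked check, the sketch below converts the Celsius limits and tests a reading against the range. The function names are hypothetical, and the reading could come from the Temp (C) field of the RMC status command, although that field reports the system's internal temperature rather than room temperature.

```python
AMBIENT_MIN_C = 10  # lower limit from the environmental specification
AMBIENT_MAX_C = 40  # upper limit from the environmental specification


def celsius_to_fahrenheit(celsius: float) -> float:
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


def ambient_ok(celsius: float) -> bool:
    """True if the reading lies within the specified ambient range."""
    return AMBIENT_MIN_C <= celsius <= AMBIENT_MAX_C


if __name__ == "__main__":
    print(celsius_to_fahrenheit(AMBIENT_MIN_C))  # lower limit in Fahrenheit
    print(celsius_to_fahrenheit(AMBIENT_MAX_C))  # upper limit in Fahrenheit
    print(ambient_ok(26.0))
```

The conversion reproduces the Fahrenheit limits quoted in the specification: 10 x 9/5 + 32 = 50 and 40 x 9/5 + 32 = 104.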

Troubleshooting Suggestions

If the power indicator is OFF, check:
Front-panel power switch
Power at the wall receptacle
AC cord

If the power indicator is ON for a few seconds and then goes OFF, check:
Fans
There are three main fans in the system: two
are at the front of the system (top and bottom),
and one is in the power supply.
NOTE: The power supply shuts OFF within one
second if its internal fan fails.

If the power indicator is ON, but the display monitor is blank, check:
Monitor power indicator is ON.
Raster is present if the brightness is turned up.
NOTE: On systems with multiple video cards, a
raster will be present on the primary card only.
Fuses and circuit breakers.
Video cable is properly connected.
Power is present at the system board.
SRM console environment variable.
NOTE: Systems using the SRM console display a
black raster if the console environment variable is
set to serial mode rather than graphics mode.

No Access to Console Mode

Consult LED code charts and try suggested fixes.

Floppy light illuminated indicates firmware corrupted. Create firmware update floppy
disk. Then insert disk, power cycle system, and update firmware to repair system.

CPU fan failure. Replace CPU that has faulty fan. If a CPU fan is frozen, you can access
the RMC, but the system will not respond to a reset. If the CPU fan is open or
disconnected, the system powers off within 30 seconds.

Interpret the error beep codes at power-up for a failure detected during self-tests.

Check that the keyboard and monitor are properly connected and turned on.

If the power-up screen is not displayed, yet the system enters console mode when you
press return, check that the console environment variable is set correctly. If you are
using a VGA monitor as the console terminal, the console variable should be set to
graphics. If you are using a serial console terminal, the console variable should be set to
serial.

If a VGA controller other than the standard VGA controller is being used, ensure that the
VGA device conforms to the specified VGA legacy addressing. The P2P bridge will not
pass non-VGA legacy ISA addresses through to the secondary side. Also, the VGA BIOS
ROM must be readable by means of the standard PCI expansion ROM space.

If the console is set to serial mode, the power-up screen is routed to the COM1 serial
communication port and cannot be viewed from the VGA monitor. Try connecting a
console terminal to the COM1 serial communication port. If necessary, use an
MMJ-to-9-pin adapter (H8571-J). Check the baud rate setting for the console terminal and the
system. The system baud rate setting is 9600. When using the COM1 port, you must set
the console environment variable to serial.

If you suspect a firmware problem, use the fail-safe boot mechanism described later in this
chapter to load new console firmware from a diskette.


Console-Reported Failures
Symptom: Power-up tests do not complete.
Solution: Use error beep codes or console serial terminal to determine what
error occurred and what FRU to replace.
Check the power-up screen for error messages.

Symptom: Console program reports an error.
Solution: Interpret the error beep codes at power-up and check the power-up
screen for a failure detected during self-tests.
Use the error beep codes and/or console terminal to determine the
error.
Examine the console event log (enter the more el command) or the
power-up screen to check for embedded error messages recorded
during power-up.
If the power-up screen or console event log indicates problems with
mass storage devices, or if storage devices are missing from the
show config display, see SCSI Problems in this chapter.
If the power-up screen or console event log indicates problems with
PCI devices, or if PCI devices are missing from the show config
display, see PCI Bus Problems in this chapter.
Use console commands such as test, show config, and more
el to verify the problem.


Boot Problems
Problem/Possible Cause: Installation fails with an "inaccessible boot dev"
restart installation message.
Action: If the ATAPI CD-ROM is not shown as a SCSI device, obtain the driver and
setup file from the Web page for Alpha systems and specify it as an
additional controller.

Problem/Possible Cause: Operating system (OS) software is not installed on the
hard disk drive.
Action: Install the operating system and license key.

Problem/Possible Cause: Target boot device is not listed in the SRM show device
or show config display because of a SCSI bus problem.
Action: Check the cables. Are the cables oriented properly and not cocked? Are
there bent pins? Check all the SCSI devices for incorrect or conflicting IDs.
Refer to the device's documentation.
SCSI termination: The SCSI bus must be terminated at the end of the internal
cable and at the last external SCSI peripheral.

Problem/Possible Cause: System cannot find the boot device.
Action: Check the system configuration for correct device parameters. Use SRM
firmware to display the hardware configuration.
Use the SRM show config and show device commands. Use the displayed
information to identify target devices for the boot command, and verify that
the system sees all of the installed devices. If you are attempting to use
bootp, first set the following variables as shown:
P00>>>set ewa0_inet_init BOOTP
P00>>>set ewa0_protocols BOOTP

Problem/Possible Cause: System does not boot.
Action: Verify that no unsupported adapters are installed.

Problem/Possible Cause: Environment variables are incorrectly set. (This could
happen if the MLB has been replaced, which would cause a loss of the previous
configuration information.)
Action: Configuration information is stored in the flash ROM and RTC memory on
the MLB. If the MLB is replaced, the information in the flash ROM and real-time
clock is lost. If the battery is replaced, the information in RTC memory is
lost. Keeping records of the configuration (IRQs, DMAs, I/O addresses, and so
on) will facilitate getting the system back into use.
Check and set the environment variables, if necessary.
Use the SRM console show and set commands to check and set the values assigned
to boot-related variables such as auto_action, bootdef_dev, and boot_osflags.

Problem/Possible Cause: System will not boot over the network.
Action: For problems booting over a network, check the ew*0_protocols or
ei*0_protocols environment variable settings: Systems booting from a Tru64 UNIX
server should be set to bootp; systems booting from an OpenVMS server should be
set to mop. Run the device tests to check that the boot device is operating.
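
The network boot settings described above depend on the operating system of the boot server. The following sketch generates the corresponding SRM set commands as text, for pasting into a console session or a checklist. It is a hedged illustration: network_boot_commands is a hypothetical helper, the default device name ewa0 is taken from the bootp example in this section, and the sketch only builds strings; it does not talk to a console.

```python
# Protocol values as they appear in this section: BOOTP for systems booting
# from a Tru64 UNIX server, mop for systems booting from an OpenVMS server.
PROTOCOL_BY_SERVER = {"tru64 unix": "BOOTP", "openvms": "mop"}


def network_boot_commands(server_os: str, device: str = "ewa0") -> list:
    """Return the SRM 'set' commands for booting from the given server type."""
    protocol = PROTOCOL_BY_SERVER[server_os.lower()]
    commands = []
    if protocol == "BOOTP":
        # The text sets the inet_init variable first when bootp is used.
        commands.append(f"set {device}_inet_init {protocol}")
    commands.append(f"set {device}_protocols {protocol}")
    return commands


if __name__ == "__main__":
    for server in ("Tru64 UNIX", "OpenVMS"):
        print(server)
        for command in network_boot_commands(server):
            print("  P00>>>" + command)
```

For an ei-type Ethernet device, a name such as eia0 would be passed as the device argument; that name is an assumption based on the ei*0_protocols pattern in the text.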


Thermal Problems
The DS20E system operates in an ambient temperature range of 10°C to 40°C. The system unit is
cooled by as many as 10 fans. There are two enclosure fans in the rear of the enclosure, two fans
in each power supply, and one fan on each CPU. A minimum configuration would have seven
fans.
Intermittent problems could result from overheating. Check that the airflow path is clear. Make
sure nothing is blocking the input grill. Also check to see that the cables inside the system are
properly dressed. A dangling cable can impede airflow to the system.

Operating System-Reported Failures


Symptom: System is hung or has crashed.
Solution: Examine the crash dump file.
Refer to the Guide to Kernel Debugging (AA-PS2TD-TE)
for information on using the Tru64 UNIX Crash utility.

Symptom: Errors have been logged and the operating system is up.
Solution: Examine the operating system error log files to isolate the problem.

System Hung at Power-On

Consult diagnostic LED code charts and try suggested fixes.

Floppy light illuminated; firmware corrupted. Follow instructions for creating firmware
update floppy disk. Update firmware to repair system.

CPU fan failure. Replace CPU that has bad fan. If a CPU fan is frozen, you can access the
RMC, but the system will not respond to a reset. If the CPU fan is open or disconnected,
the system powers off within 30 seconds.

Memory Problems
Symptom: DIMMs ignored by system, or system unstable. System hangs or crashes.
Solution: Ensure each memory bank has identical DIMMs installed.

Symptom: DIMMs failing memory POST testing.
Solution: Try another pair of DIMMs.

Symptom: DIMMs may not have ECC bits.
Solution: Ensure that ECC capable PC100 buffered DIMMs are installed. (Look for
72-bit, not 64-bit DIMMs.) See note.

Symptom: Noticeable performance degradation. The system may appear hung or run
very slowly.
Solution: This could be a result of hard single bit ECC errors on a particular
DIMM. Check the error logs for memory errors. Ensure memory DIMMs are qualified
and check field blitz data to ensure DIMM was not from a bad lot. Passing POST
does not ensure that DIMMs are good.

NOTE: Some third-party DIMMs may not be compatible with DS20E systems:

POST checks 24+ SDRAM DIMM performance parameters for each DIMM to ensure
system reliability.

All DS20E system qualified DIMMs meet performance requirements.

SDRAM DIMMs come in many quality grades, which do not all meet the performance
requirements of these machines.

PCI Bus Problems


PCI bus problems at startup are usually indicated by the inability of the system to detect the PCI
device. The following steps can be used to diagnose the likely cause of PCI bus problems:
1. Confirm that the PCI option card is supported and has the correct firmware and software
versions.
2. Confirm that the PCI option card and any cabling are properly seated.
3. Check for a bad PCI slot by moving the last installed PCI controller to a different slot.
4. Call the option manufacturer for help.

PCI Parity Error


Some PCI devices do not implement PCI parity, and some have a parity generating scheme that
may not comply with the PCI specification. In such cases, the device should function properly if
parity is not checked.
Parity checking can be turned off with the set pci_parity off command so that false
PCI parity errors do not result in machine check errors. However, if you disable PCI parity, no
parity checking is implemented for any device. Turning off PCI parity is therefore not
recommended or supported.
1. All PCI devices must correctly handle PCI parity to enable checking feature.
2. Parity errors indicate a PCI option card is probably at fault.
3. PCI option card may be defective or not fully PCI compliant.
4. If the card must be used, try to disable PCI parity checking in SRM firmware.
5. If the problem is not specific to the PCI option cards, replace CPU card or the system
board.


SCSI Problems
SCSI problems are generally manifested as:

Data corruption

Boot problems

Poor performance

Check SCSI bus termination.

Cable is properly seated at system board or option connector.

Bus must be terminated at last device on cable or at physical cable end.

No terminators in between.

Old 50-pin (narrow) devices must be connected with wide-to-narrow adapter (SNPBXKP-BA). Do not cable from the connector on the card.

Using 50-pin devices on the bus may significantly degrade performance.

Any external drives must be connected to their associated card, and these cards must have no
internal drives connected to them. Use a separate external controller card.

Ultra-wide SCSI has strict bus length requirements.

SCSI bus itself cannot handle internal plus external cable.

Use a separate card for external devices and terminate properly.

Power Up/Down Sequence


When AC is applied to the system, VAUX (auxiliary voltage) is asserted and is sensed on the
server features module. If the On/Standby button is on, and RMC OK and Interlock OK are
asserted, the OCP asserts DC_ENABLE_L, starting the power supplies. If there is a hard fault
on power-up, the power supplies shut down immediately; otherwise, the power system powers
up and remains up until the system is shut off or the server features module senses a fault. If a
power fault is sensed, the signal SHUTDOWN is asserted after a 30-second delay. Cycling the
On/Standby button can restore power. If the system powers up and shuts off in approximately
30 seconds, the server features module has sensed a fault and a fan (system or CPU) is likely
broken.
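
The power-up conditions described above amount to a simple AND of four inputs. The sketch below models that gating as a way to reason about which condition is blocking power-on. It is a simplified illustration, not a description of the OCP or server features module logic: the class and function names are hypothetical, and the model ignores timing other than recording the 30-second shutdown delay stated in the text.

```python
from dataclasses import dataclass

SHUTDOWN_DELAY_SECONDS = 30  # delay before SHUTDOWN is asserted on a power fault


@dataclass
class PowerInputs:
    vaux_present: bool   # AC applied and VAUX sensed on the server features module
    button_on: bool      # On/Standby button is on
    rmc_ok: bool         # RMC OK asserted
    interlock_ok: bool   # Interlock OK asserted


def dc_enable_asserted(inputs: PowerInputs) -> bool:
    """True when every condition for starting the power supplies is met."""
    return all((inputs.vaux_present, inputs.button_on,
                inputs.rmc_ok, inputs.interlock_ok))


def blocking_conditions(inputs: PowerInputs) -> list:
    """Names of the inputs that currently prevent power-up."""
    return [name for name, value in vars(inputs).items() if not value]


if __name__ == "__main__":
    state = PowerInputs(vaux_present=True, button_on=True,
                        rmc_ok=True, interlock_ok=False)
    print(dc_enable_asserted(state), blocking_conditions(state))
```

Listing the blocking inputs mirrors the manual checks earlier in this chapter: power source, On/Off button, RMC state, and the interlock switch.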


Troubleshooting Tools and Utilities


Use the following tools for acceptance testing, diagnosis, and service. Refer to the following
chapters for additional information: Firmware, Error Registers, and OS Diagnostics
Overview.

Tool: Error handling/logging tools
Description: Use error logs as the primary method of diagnosis and fault
isolation. If the system is up or you can bring it up, look here first.

Tool: ROM-based diagnostics (RBDs)
Description: RBDs execute automatically at power-up and can be invoked in
console mode. Use RBDs to test the console environment and to diagnose CPU,
memory, Ethernet, I/O buses, and SCSI subsystems. Use in acceptance
testing when installing a system, adding memory, or replacing these
components: CPU, memory, main logic board, I/O bus device, or storage
device.

Tool: Loopback tests
Description: Use this RBD subset to isolate a failure by testing segments of a
control or data path. Use to isolate problems with the COM2 serial port,
parallel port, and Ethernet controllers.

Tool: Firmware console commands
Description: Use SRM commands to set and examine environment variables and
invoke RBDs and exercisers.

Tool: Crash dumps
Description: Crash dumps are created when the operating system hangs and is
manually halted by pressing and holding the Halt button for at least one
second. The SRM crash command provides a display of the crash dump that can be
used to determine why the system crashed.

Tool: Fail-Safe Booter (FSB) Utility
Description: The fail-safe booter allows you to restore console firmware that
may have become corrupted. Use the FSB when one of the following failures at
power-up prohibits you from getting to the console program:
Firmware image in flash memory corrupted
Power failure or accidental power-down during a firmware upgrade
Error in the nonvolatile RAM (NVRAM) file
Incorrect environment variable setting
Driver error


Fail-Safe Booter (FSB) Utility


The fail-safe booter (FSB) provides an emergency recovery mechanism when the firmware
image contained in flash memory has become corrupted. You can run the FSB and boot another
image from a diskette that is capable of reprogramming the flash ROM.

Starting the FSB


You can start the FSB automatically or manually.

If the firmware image is unavailable when the system is powered on or reset, the FSB runs
automatically. When the FSB runs, the system emits a series of beeps through the speaker
as beep code 1-2-3; that is, one beep and a pause, followed by two beeps and a pause,
followed by three beeps. Then proceed as follows:

1. After the diskette activity light flashes, insert the FSB diskette that you created, which
   contains the file DP264SRM.ROM. (See the next section, Preparing Diskettes.)

2. Reset the system to restart the FSB. The FSB loads the SRM console from the diskette.

3. Proceed to the section Updating Firmware.

Start the FSB manually as follows:

1. Power the system off.

2. Set switch 1 (FSB) of SW2 on the main board to the on position (see the following
   illustration).

3. Insert the FSB diskette that you created, which contains the file DP264SRM.ROM. (See
   the next section, Preparing Diskettes.)

4. Power on the system and let it come up to the SRM console.

5. Proceed to the section Updating Firmware.


Preparing Diskettes
The required firmware for your system is preloaded onto the flash ROM. Copies of the firmware
files may be included on your distribution CD, in case you need to refresh the firmware. If they
are not included, you can download them from the Alpha OEM World Wide Web Internet site
at:
http://www.digital.com/alphaoem
Click on Technical Information, then click on Alpha Drivers and Firmware.
The utilities that are used to reload or update the firmware expect to find the files on a diskette,
so you need to prepare a diskette for each utility with the correct files from the CD or the Web.
For FSB: Copy the file PC264SRM.ROM onto a diskette, renaming it DP264SRM.ROM.
For Updating Firmware: Copy the file PC264SRM.ROM and the file PC264FW.TXT onto
a diskette.
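
The two diskette layouts described above can be prepared with a short script on a host system. This Python sketch is only one possible way to do it; the function names and the source and diskette directory arguments are hypothetical, and only the file names come from this section.

```python
import shutil
from pathlib import Path


def prepare_fsb_diskette(source_dir, diskette_dir):
    """Copy PC264SRM.ROM to the diskette, renaming it DP264SRM.ROM for the FSB."""
    destination = Path(diskette_dir) / "DP264SRM.ROM"
    shutil.copyfile(Path(source_dir) / "PC264SRM.ROM", destination)
    return destination


def prepare_update_diskette(source_dir, diskette_dir):
    """Copy PC264SRM.ROM and PC264FW.TXT to the diskette with their names unchanged."""
    copied = []
    for name in ("PC264SRM.ROM", "PC264FW.TXT"):
        destination = Path(diskette_dir) / name
        shutil.copyfile(Path(source_dir) / name, destination)
        copied.append(destination)
    return copied
```

Use a separate diskette (a separate `diskette_dir`) for each utility, as the section requires.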


Updating Firmware
Be sure to read the information on starting the FSB and preparing diskettes before continuing
with this section.
At the Alpha SRM console prompt, issue the lfu command. This command invokes the
Loadable Firmware Update (LFU) utility.
Perform the following steps to update the console firmware. Refer to the example below.
1. Load the firmware diskette you created.

2. Enter the device name dva0 when prompted for the location of the update files.

3. Enter the filename PC264FW.TXT when prompted. Note that the LFU has already checked
   the contents of the diskette and should provide PC264FW.TXT as the default.
   PC264FW.TXT specifies which firmware is to be updated and passes the names of the files
   that contain updated firmware.

4. At the UPD> prompt, enter the update command.

Example of Running LFU

P00>>>lfu
Checking dka400.4.0.7.1 for the option firmware files. . .
Checking dva0 for the option firmware files. . .
Option firmware files were not found on CD or floppy.
If you want to load the options firmware,
please enter the device on which the files are located(ewa0),
or just press <return> to proceed with a standard console update: dva0
Please enter the name of the options firmware files list, or
Press <return> to use the default filename (pc264fw.txt) : pc264fw.txt
Copying PC264FW.TXT from dva0. . .
Copying PC264SRM.ROM from dva0. . .
***** Loadable Firmware Update Utility *****
---------------------------------------------------------------------------
Function    Description
---------------------------------------------------------------------------
Display     Displays the systems configuration table.
Exit        Done exit LFU (reset).
List        Lists the device, revision, firmware name, and update revision.
Readme      Lists important release information.
Update      Replaces current firmware with loadable data image.
Verify      Compares loadable and hardware images.
? or Help   Scrolls this function table.
---------------------------------------------------------------------------
UPD> update


Power-On Self-Test (POST)


When power is applied to the DS20E system, a power-up sequence begins. During the power-up
sequence, power-on self-tests (POST) are performed to initialize the hardware and test its ability
to perform basic tasks. Understanding this process can help you isolate power-up problems.

POST Sequence
The power-on self-test (POST) firmware is loaded into the CPU from the SROM at cold power-on
or after a hard reset. POST then performs the following functions:

Puts the CPU, Bcache, memory, and I/O into a state that can be used by the operating
system firmware.

Runs some basic hardware tests to ensure the operating system firmware can start:

    Memory test
    Cache test
    PCI data path test
    ISA data path test
    Flash ROM test

Reports by means of diagnostic LED and beep codes.

Starts up the SRM firmware.

Memory Testing

Ensures that each array consists of four identical DIMMs in every other socket.

Checks the parameters in the onboard ROM of each DIMM for compatibility with the
system.

Rejects any whole array that contains any incompatible DIMM.

Ensures that there are identical DIMMs in each bank and rejects the array if the test fails.

Performs the Checkerboard test to detect data path errors. The pattern is alternated with
each write to memory. The first 32 MB of the first array is tested.

Performs an addressing test to check all address lines in both arrays. The memory array is
rejected if memory testing fails. Rejected arrays are rendered inoperative and are not
reported to the operating system firmware. If no usable memory is detected, a beep code is
emitted.
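
To illustrate the idea behind a checkerboard test, the following Python sketch writes alternating bit patterns to a simulated memory and reads them back. It is a teaching model under stated assumptions, not the POST firmware: the 64-bit patterns, the two-pass structure, and the `StuckBitMemory` class are all invented for the example.

```python
PATTERNS = (0x5555555555555555, 0xAAAAAAAAAAAAAAAA)


class StuckBitMemory(list):
    """Simulated memory in which selected bits of one word are stuck at 0."""

    def __init__(self, size, bad_index, stuck_mask):
        super().__init__([0] * size)
        self.bad_index = bad_index
        self.stuck_mask = stuck_mask

    def __setitem__(self, index, value):
        if index == self.bad_index:
            value &= ~self.stuck_mask  # the stuck bits never store a 1
        super().__setitem__(index, value)


def checkerboard_test(memory, word_count):
    """Return the sorted indexes of words that fail the checkerboard test.

    The pattern alternates with each write; a second pass swaps the two
    patterns so that every bit is tested with both a 0 and a 1.
    """
    bad = set()
    for first in (0, 1):
        for i in range(word_count):
            memory[i] = PATTERNS[(i + first) % 2]
        for i in range(word_count):
            if memory[i] != PATTERNS[(i + first) % 2]:
                bad.add(i)
    return sorted(bad)
```

A fault-free list passes with no failing indexes, while `StuckBitMemory(16, 5, 0x1)` is reported as failing at word 5 on the pass that writes a 1 to the stuck bit.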

Cache Test
In this test, the data path integrity is tested. A thorough pattern test is performed on all Bcache
SRAM cells.


PCI Data Path Test


This test reads the configuration registers in PCI configuration address space on both buses. It
writes and reads back all registers being used to initialize PCI to the ISA bridge.

ISA Data Path Test


This test reads the ISA Super I/O chip ID register.

Flash ROM Test


The contents of the flash ROM are fully checksummed. If the flash ROM image is corrupt, a
fail-safe floppy load procedure is started. Recovery firmware can be loaded from the emergency
recovery floppy.

LEDs and Beep Codes


During power-up, reset, initialization, or testing, diagnostics are run on the CPUs, memory, P-chips, and the PCI backplane and its embedded options. System LEDs and beep codes can help
troubleshoot a system if a problem occurs during power-up. The LEDs are on the front panel,
CPUs, and on the server features module V2 (SFM2). The LEDs on the CPU and SFM2 can be
viewed when the side cover is removed. Beep codes represent system errors during power-up.

Beep Codes
Beeps  Meaning of the Code                           Action to Repair

1-2-3  One beep and a pause, followed by two beeps   Update the firmware. See the procedure in
       and a pause, followed by three beeps.         the Fail-Safe Booter section of this
       Indicates that the firmware in flash ROM is   chapter.
       unavailable and fail-safe booter has begun
       running.

4      The header in the ROM is not valid.           Replace the ROM.

6      A checksum error occurred after the ROM       Check memory configuration.
       image was copied into memory. Either          Reseat or replace DIMM.
       memory is misconfigured or a memory DIMM
       needs to be reseated.


LEDs on the Front Panel


The LED codes are described as a hex digit. For the pedestal system, the left LED is the least
significant bit (LSB) and the right LED is the most significant bit (MSB). For the rack-mounted
system, the top LED is the MSB. For example, the illustration indicates the hex digit 5.

The LED codes are generated in the order indicated in the following table. Notice that the 4
follows the 0, and that the F and E are ambiguous unless you are watching the sequence. They
have a different meaning the second time they appear.


Hex                                            LED State
Code  Meaning                                  MSB          LSB
F     Starting console                          1    1    1    1
E     Initialized idle PCB                      1    1    1    0
D     Initializing semaphores                   1    1    0    1
C     Initializing heap                         1    1    0    0
B     Setting heap base address                 1    0    1    1
A     Setting memory low limit                  1    0    1    0
9     Initializing driver structures            1    0    0    1
8     Initializing idle process PID             1    0    0    0
7     Initializing file system                  0    1    1    1
6     Initializing timer data structures        0    1    1    0
5     Lowering IPL                              0    1    0    1
3     Create dead_eater                         0    0    1    1
2     Create poll                               0    0    1    0
1     Create timer                              0    0    0    1
0     Create power-up                           0    0    0    0
4     Entering idle loop                        0    1    0    0
F     Probing I/O                               1    1    1    1
E     Starting drivers                          1    1    1    0
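
The relationship between a hex code and the four LED states, and the need to watch the sequence to tell the two F and E codes apart, can be sketched in Python. The function and variable names below are hypothetical; the sequence and meanings are copied from the table above.

```python
POWER_UP_SEQUENCE = [
    ("F", "Starting console"), ("E", "Initialized idle PCB"),
    ("D", "Initializing semaphores"), ("C", "Initializing heap"),
    ("B", "Setting heap base address"), ("A", "Setting memory low limit"),
    ("9", "Initializing driver structures"), ("8", "Initializing idle process PID"),
    ("7", "Initializing file system"), ("6", "Initializing timer data structures"),
    ("5", "Lowering IPL"), ("3", "Create dead_eater"), ("2", "Create poll"),
    ("1", "Create timer"), ("0", "Create power-up"), ("4", "Entering idle loop"),
    ("F", "Probing I/O"), ("E", "Starting drivers"),
]


def led_pattern(hex_digit):
    """Return (MSB, bit 2, bit 1, LSB) for one hex digit; 1 means the LED is on."""
    value = int(hex_digit, 16)
    if not 0 <= value <= 0xF:
        raise ValueError("expected a single hex digit")
    return tuple((value >> shift) & 1 for shift in (3, 2, 1, 0))


def hex_digit_from_leds(leds):
    """Convert four LED states, given MSB first, back to a hex digit."""
    if len(leds) != 4:
        raise ValueError("expected four LED states")
    value = 0
    for state in leds:
        value = (value << 1) | (1 if state else 0)
    return format(value, "X")


def current_stage(observed):
    """Return the meaning of the last code observed, using the whole sequence
    seen so far to resolve the repeated F and E codes. Returns None if the
    observed codes do not match the start of the documented sequence."""
    expected = [code for code, _ in POWER_UP_SEQUENCE[:len(observed)]]
    if not observed or list(observed) != expected:
        return None
    return POWER_UP_SEQUENCE[len(observed) - 1][1]
```

For instance, `led_pattern("5")` gives the on/off states for the hex digit 5, and `current_stage` reports "Probing I/O" only when the F code appears after the 4.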


LEDs on the CPU


Normally, all CPU LEDs are on except the SROM clock LED. These LEDs are shown in the
diagram and troubleshooting information is provided below:

Indication                       Possible Cause and Corrective Action

5V OK LED ON and any of the      Replace the CPU.
following are OFF:
    CPU DC OK
    2V OK

5V OK LED OFF                    Power is not getting to the CPU. The problem could
                                 be the power harness, the power supply, or the
                                 CPU.

CPU Self-Test LED OFF            The self-test for the CPU failed or was not
                                 completed. Try self-test again or replace the CPU.


LEDs on the Server Features Module


The server features module has seven LEDs that indicate the environmental status of the system.
Looking at the features module from the side of the system with the side cover removed (cover
interlock disabled), the LEDs are as follows:

POST LED Codes


The SRM console contains diagnostic LED codes that are displayed after POST firmware passes
control to the SRM firmware. After the user interface to the console is started on the selected
console device (COM1, COM2, or the PC keyboard/mouse and graphics device), these LEDs
appear as a countdown until the console prompt P00>>> appears.
NOTE: Generally, a defective DIMM or Bcache can cause console startup to fail or crash at
any point. Also, most PCI devices are used for the first time during boot; therefore,
defective or unsupported devices can cause problems that may prevent SRM startup.
Occasionally, a bad MLB or CPU daughter card is able to complete POST but is not able to
start SRM.

Hints

If the front panel power LED quickly changes from off to on, then off again after you
press the power button, there may be a short in the system.

If the power button seems to be flashing, the ACPI sleep mode is on. Push and hold the
system power button for more than four seconds.


1 indicates that the LED is on, and 0 indicates the LED is off.

Code  Meaning of the Code                     Possible Cause/Action to Repair

F     Starting console                        A defective DIMM or Bcache can cause
E     Process queue hdrs init idle PCB        console startup to fail at any point, or to
D     Initializing semaphores                 crash. Most PCI devices are used for
C     Initializing heap                       the first time during boot, and defective
B     Initial heap                            or unsupported devices can cause
A     Memory low limit                        problems that prevent SRM startup.
9     Initializing driver structures          Occasionally, a bad MLB or CPU can
8     Initializing idle process PID           complete POST, but fail to start SRM.
7     Initializing file system
6     Initializing timer data structures

5     Lowering IPL - begin taking interrupts  Stuck interrupt, remove options one by
                                              one, MLB, CPU0, CPU1

4     Entering idle loop
3     Create dead_eater
2     Create poll
1     Create timer                            Check that DIMMs are identical within
0     Create powerup                          bank, seated properly, good.

F     Probing I/O                             Check for bad or unsupported options.
E     Starting drivers                        Remove options one by one, or
                                              disconnect storage cables to isolate
                                              problem.


Using Firmware to Troubleshoot


The SRM firmware contains ROM-based diagnostics (RBDs) that can assist you in
troubleshooting the DS20E system. These RBDs offer powerful diagnostic utilities that allow
you to examine error logs and run specific system or device exercisers.

Using SRM Commands to Test the System


NOTE: You must have a terminal or other device connected to the COM1 serial port to
display SRM command output. A good practice is to connect your laptop to the COM1
serial port and use it as a terminal emulator to access the SRM console.
CAUTION: If the console is set to serial mode, and the system is set to autoboot, the
graphics terminal is inaccessible.

RBDs run using console commands and rely on exerciser modules to isolate errors. They report
errors to the console terminal and/or the console event log. Exercisers run concurrently,
providing maximum bus interaction between the console drivers and the target devices. The
test, init, and more el commands are particularly useful in troubleshooting.
The test command requires a diskette in the floppy disk drive and loopback connectors on
COM2 and parallel port. The test command:

Is the primary diagnostic.

Quickly tests the core system.

The init command causes a system restart and initialization. The firmware begins initializing
and testing the system and the console displays the test countdown.
The more el command displays the contents of the event log one page at a time.
Use the following command sequence:
P00>>> init
.
.
.
P00>>> test
P00>>> more el
This sequence of commands provides a convenient way to test the system for hardware errors. If
the tests that are run when the init command is executing fail, they are indicated during the
test countdown and an error message is displayed. If these tests do not find any errors, the test
command may find errors because it runs firmware diagnostics for the entire core system. Fatal
errors are reported to the console terminal.


Init Command
The init command resets the system.
P00>>>init
Initializing...
128 Meg of system memory
probing hose 1, PCI
probing hose 0, PCI
probing PCI-to-ISA bridge, bus 1
bus 0, slot 5, function 1 -- dqa -- Cypress 82C693 IDE
bus 0, slot 5, function 2 -- dqb -- Cypress 82C693 IDE
bus 0, slot 6, function 0 -- pka -- Adaptec AIC-7895
bus 0, slot 6, function 1 -- pkb -- Adaptec AIC-7895
bus 0, slot 7 -- vga -- ELSA GLoria Synergy
bus 0, slot 8 -- pkc -- NCR 53C895
bus 0, slot 9 -- ewa -- DE500-AA Network Controller
Testing the System
Testing the Memory
Testing the Disks (read only)
Testing the Network
System Temperature is 34 degrees C
initializing GCT/FRU at 1ec000
COMPAQ AlphaServer DS20E Console v5.5-9, Aug 31 1999 11:52:26
P00>>>

Test Command
The test command tests the entire system, a portion of the system (subsystem), or a specified
device. If no device or subsystem is specified, the entire system is tested. The command syntax
is:
t[est] [-write] [-nowrite "list"] [-omit "list"] [-t time] [-q] [dev_arg]
where:
-write specifies that data will be written to the specified device.
-nowrite specifies that data will not be written to the device specified in the "list".
-omit specifies that the devices in the "list" are not to be tested.
-t specifies the amount of time the test command is to run.
-q defines data size as a quadword (64 bits). All values default to 8 bytes.
<dev_arg> specifies the target device, group of devices, or subsystem to test.
For example:
P00>>> t pci0 -t 60
In this example, the test command tests all devices associated with the PCI0 subsystem. Test
run time is 60 seconds.
When a subsystem or device is specified, tests are executed on the associated modules first, then
the appropriate exercisers are run.


More El (or Cat El) Command


Event logs contain information on system events that occur during operation of the system.
The commands cat el and more el display the current contents of the console event log at
power-up, during normal system operation, and while running system tests. Error messages are
indicated by asterisks (***). Use these two commands to display event logs:

cat el     Displays the console event log.

more el    Displays the console event log one screen at a time.

When viewing the scrolling log with cat el, use Ctrl/S and Ctrl/Q to pause and resume
display.
P00>>>more el
256 Meg of system memory
probing hose 1, PCI
bus 0, slot 7 -- pka -- NCR 53C895
probing hose 0, PCI
probing PCI-to-ISA bridge, bus 1
probing PCI-to-PCI bridge, bus 2
bus 0, slot 5, function 1 -- dqa -- Cypress 82C693 IDE
bus 0, slot 5, function 2 -- dqb -- Cypress 82C693 IDE
bus 0, slot 6, function 0 -- pkb -- Adaptec AIC-7895
bus 0, slot 6, function 1 -- pkc -- Adaptec AIC-7895
bus 2, slot 5 -- eia -- Intel 8255x Ethernet
resetting the SCSI bus on pka0.7.0.7.1
port dqa.0.0.105.0 initialized
port pka0.7.0.7.1 initialized, scripts are at 1d03c0
port dqb.0.1.205.0 initialized
device dqa0.0.0.105.0 (CD-224E) found on dqa0.0.0.105.0
device dka0.0.0.7.1 (SEAGATE ST39102LC) found on pka0.0.0.7.1
device dka100.1.0.7.1 (QUANTUM VIKING II 4.5SCA) found on pka0.1.0.7.1
device dka200.2.0.7.1 (QUANTUM VIKING II 4.5SCA) found on pka0.2.0.7.1
device dka300.3.0.7.1 (QUANTUM VIKING II 4.5SCA) found on pka0.3.0.7.1
environment variable aa_value_bcc created
environment variable aa_2x_cache_size created
environment variable mstart created
--More-- (SPACE - next page, ENTER - next line, Q - quit)
environment variable mend created
sense key = Unit Attention (29|02) from dka0.0.0.7.1
P00>>>
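
When an event log has been captured from the serial console to a file, the lines flagged with asterisks can be picked out with a short script. This Python sketch assumes only what the text states, namely that error messages are marked with "***"; the function name and the captured-text argument are hypothetical.

```python
def error_lines(event_log_text):
    """Return the lines of a captured console event log that contain '***'."""
    return [line for line in event_log_text.splitlines() if "***" in line]
```

A log with no flagged lines returns an empty list, which is the expected result for a clean power-up such as the example above.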


Showit and Show_Status Commands


You can view the status of the currently executing test/exercisers by displaying one line per
executing diagnostic or displaying status periodically. Use these two commands to display
status:

showit         Displays status continuously. Use Ctrl/C to quit the report.

show_status    Displays system status once.

The showit command continually displays the status of currently running diagnostics. The
syntax is:
P00>>>showit

Showit Example
In this example, the showit command is terminated using Ctrl/C (^C) after two iterations. The
information displayed includes the process ID, device under test, passes completed, and error
counts (hard/soft).
P00>>>showit
ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001 idle         system            0    0    0             0             0
00000057 exer_kid     tta1              0    0    0             1             0
000001bc memtest      memory           47    0    0   12331253760   12331253760
000001f2 memtest      memory           29    0    0    7398752256    7398752256
00000229 exer_kid     dva0.0.0.0.0      0    0    0        158208        157696
ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001 idle         system            0    0    0             0             0
00000057 exer_kid     tta1              0    0    0             1             0
000001bc memtest      memory           48    0    0   12535726080   12535726080
000001f2 memtest      memory           29    0    0    7603224576    7603224576
00000229 exer_kid     dva0.0.0.0.0      0    0    0        164864        164864
^C


Show_status Example
The show_status command displays the status of currently running diagnostics. The syntax
is:
P00>>> show_status
Many diagnostics run in the background and display information only if an error is detected
during the testing. Use the show_status command whenever you need to display the
progress of these diagnostics.
In this example, show_status displays one line of information for each currently executing
diagnostic. This information includes the process ID, device under test, passes completed, and
error counts.
P00>>> show_status
ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001 idle         system            0    0    0             0             0
00000057 exer_kid     tta1              0    0    0             1             0
000001bc memtest      memory           47    0    0   12331253760   12331253760
000001f2 memtest      memory           29    0    0    7398752256    7398752256
00000229 exer_kid     dva0.0.0.0.0      0    0    0        158208        157696
P00>>>
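
If the console session is logged on a host system, the status report can be parsed to flag diagnostics that have recorded errors. The following Python sketch assumes the one-line-per-diagnostic layout with eight whitespace-separated fields (ID, program, device, passes, hard errors, soft errors, bytes written, bytes read); the names `DiagStatus`, `parse_status`, and `diagnostics_with_errors` are hypothetical.

```python
from typing import NamedTuple


class DiagStatus(NamedTuple):
    pid: str
    program: str
    device: str
    passes: int
    hard_errors: int
    soft_errors: int
    bytes_written: int
    bytes_read: int


def parse_status(text):
    """Parse captured show_status or showit output into DiagStatus rows.

    Header, separator, and prompt lines are skipped because they do not
    consist of a hexadecimal ID followed by two names and five integers.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue
        try:
            int(fields[0], 16)  # the process ID is printed in hexadecimal
            numbers = [int(field) for field in fields[3:]]
        except ValueError:
            continue
        rows.append(DiagStatus(fields[0], fields[1], fields[2], *numbers))
    return rows


def diagnostics_with_errors(rows):
    """Return the rows that report at least one hard or soft error."""
    return [row for row in rows if row.hard_errors or row.soft_errors]
```

Applied to the report above, `diagnostics_with_errors` returns an empty list because every hard and soft error count is 0.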

Scripts and Loopback Testing


Running tests concurrently and indefinitely (until stopped with the kill_diags command) is
useful in flushing out intermittent hardware problems. The script tests devices in this order:

Console loopback tests if -lb argument is specified

Network external loopback tests for E*A0

Memory tests (one pass)

Read-only tests: DK* disks, DR* disks, DU* disks, MK* tapes, DV* floppy

VGA console tests, only if the console environment variable is set to serial.
NOTE: You must have a serial terminal if the console environment variable is set to
serial.


Use the test command with the following arguments for loopback testing or running
scripts:

The test command with
this argument:           Does this:

-lb                      Conducts loopback tests for COM2 and the parallel
                         port in addition to quick core system tests. (Requires
                         diskette and loopback connectors on COM2 and
                         parallel port.)

memory                   Starts memory test.

serial1                  Starts exercise on COM2.

network                  Starts network test on EW*0 or EI*0.

parallel                 Starts exercise on PARA.

floppy                   Starts exercise on DV*0.

disks                    Starts exercise on DK*.

Type show_status or showit to display test progress. Type cat el to redisplay recent
errors.

Terminating Test Processes


There are two commands to terminate a test process:

kill <PID>    Ends a specific test process, where <PID> is the ID of that process.

kill_diags    Ends all test processes.

Use the showit or ps command to find <PID>.


After you terminate the tests, use the init command to set all system registers to a defined
state.


Changing the System Type


The DS20E can be ordered as either a server or a workstation. The console banner displayed
upon completion of power-up identifies the system type:
COMPAQ AlphaServer DS20E 666 MHz Console V5.6-3, Nov 29 1999 10:32:53

Under some circumstances, you may need to change the system type. For example, you need to
change it if it was set incorrectly by Manufacturing. You should also check the setting if you
replace the server features module.
The system type is set with the following deposit commands.
This command sets the system type to AlphaServer DS20E:
P00>>> d -b iic_rcm_nvram0:11 00

This command sets the system type to AlphaStation DS20E:
P00>>> d -b iic_rcm_nvram0:11 01

If you need to reset the system type, use a procedure similar to the following. This example
changes the system type to a workstation.
1.

Enter the following command:

P00>>> d -b iic_rcm_nvram0:11 01

2.

Initialize the system and observe the console banner when power-up has been completed. In
this example, the system type is now a workstation.

P00>>>init
Initializing...
128 Meg of system memory
probing hose 1, PCI
probing PCI-to-PCI bridge, bus 2
bus 2, slot 0 -- pka -- NCR 53C875
bus 2, slot 1 -- pkb -- NCR 53C875
bus 2, slot 2 -- ewa -- DE500-AA Network Controller
probing hose 0, PCI
probing PCI-to-ISA bridge, bus 1
bus 0, slot 5, function 1 -- dqa -- Cypress 82C693 IDE
bus 0, slot 5, function 2 -- dqb -- Cypress 82C693 IDE
bus 0, slot 6, function 0 -- pkc -- Adaptec AIC-7895
bus 0, slot 6, function 1 -- pkd -- Adaptec AIC-7895
bus 0, slot 7 -- vga -- ELSA GLoria Synergy
bus 0, slot 8 -- pke -- QLogic ISP10x0
Testing the System
Testing the Memory
Testing the Disks (read only)
Testing the Network
System Temperature is 24 degrees C
initializing GCT/FRU at 1e2000
COMPAQ AlphaStation DS20E 666 MHz Console V5.6-3, Nov 29 1999 10:32:53

You can also examine the I2C ROM on the server features module to determine the system type.
The example below identifies the system as a workstation.
P00>>> e -b iic_rcm_nvram0:11
iic_rcm_nvram0:               11 01
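
The mapping between the byte at offset 11 and the system type can be captured in a small helper, for example when checking captured console output after replacing the server features module. This Python sketch uses only the two values given in this section; the function names are hypothetical, and the parser simply assumes that the last token of the examine output is the byte value in hexadecimal.

```python
SYSTEM_TYPES = {0x00: "AlphaServer DS20E", 0x01: "AlphaStation DS20E"}


def system_type_from_examine(output):
    """Interpret captured output of 'e -b iic_rcm_nvram0:11'."""
    value = int(output.split()[-1], 16)
    return SYSTEM_TYPES.get(value, "unknown (0x%02X)" % value)


def deposit_command(system_type):
    """Return the SRM deposit command that selects the given system type."""
    for value, name in SYSTEM_TYPES.items():
        if name == system_type:
            return "d -b iic_rcm_nvram0:11 %02X" % value
    raise ValueError("unsupported system type: %r" % system_type)
```

After depositing a new value, initialize the system and confirm the result against the console banner, as in the procedure above.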

For More Information


For more service information, use the following online resources:

Alpha firmware updates, training, and support:
    http://www.digital.com/info/alphaserver/support.html

Latest ECU revision kits: 1-800-DIGITAL

Product and late-breaking technical information:
    http://www.digital.com/alphaserver/ds20e/index.html
    Product Support Information Collection (ProSIC) for Alpha systems:
    http://prosic.cxo.dec.com/SYSTEMS/systems.html#alpha

Supported options:
    http://www.digital.com/alphaserver/ds20e/options/asds20e_options.html
    Configuration information and examples:
    http://www.digital.com/info/alphaserver/configure.html
    Compaq Systems and Options Catalog:
    http://www.digital.com/info/SOHOME/SOHOMEHM.HTM


Chapter 7

Error Registers

Introduction
When diagnosing a DS20E system error, you may have to use the contents of one or more
system error or status registers to determine the specific cause of the error. This chapter presents
the registers that contain information critical to troubleshooting.
Topics in this chapter are:

Ibox Status Register

Memory Management Status Register

Dcache Status Register

Cbox Read Register

Miscellaneous Register

Device Interrupt Request Registers

P-Chip Error Register

Failure Register

Function Register


Ibox Status Register


The Ibox status register (I_STAT) is a read/write-1-to-clear register that contains Ibox status
information. This register can be used in performance monitoring.


Field

Description

PAR (Icache parity error)

This bit can be written with a one or cleared. It indicates whether or not the
Icache encountered a parity error on instruction fetch. When a parity error is
detected, the Icache is flushed, a replay trap back to the address of the error
instruction is generated, and a correctable read interrupt is requested.

OVR [1:0] (ProfileMe Counter 0 Overcount)

This read-only bit indicates a value (0-7) that must be subtracted from the
counter 0 result to obtain an accurate count of the number of instructions
retired in the interval beginning three cycles after the profiled instruction
reaches pipeline stage 2 and ending four cycles after the profiled instruction
is retired.

MIS (ProfileMe Mispredict Trap)

If the I_STAT[TRP] bit is set, this read-only bit indicates that the profiled
instruction caused a mispredict trap. JSR/JMP/RET/COR or
HW_JSR/HW_JMP/ HW_RET/HW_COR mispredicts do not set this bit but
can be recognized by the presence of one of these instructions at the PMPC
location with the I_STAT[TRP] bit set. This identification is exact in all cases
except error condition traps. Hardware corrected Icache parity or Dcache
ECC errors and machine check traps can occur on any instruction in the
pipeline.

LSO (ProfileMe Load-Store Order Trap)

If the profiled instruction caused a replay trap, this read-only bit indicates that
the precise trap cause was an Mbox load-store order replay trap. If clear, this
bit indicates that the replay trap was any one of the following:
    Mbox load-load order
    Mbox load queue full
    Mbox store queue full
    Mbox wrong size trap (such as, STL -> LDQ)
    Mbox Bcache alias (2 physical addresses map to same Bcache line)
    Mbox Dcache alias (2 physical addresses map to same Dcache line)
    Icache parity error
    Dcache ECC error


Field

Description

TRP (ProfileMe Trap)

This read-only bit indicates that the profiled instruction caused a trap. The
trap type field, PMPC register, and instruction at the PMPC location are
needed to distinguish all trap types.

Trap Type (ProfileMe Trap Types)

If the profiled instruction caused a trap (indicated by I_STAT[TRP]), this field
indicates the trap type as listed here:

Value   Trap Type
0       Replay
1       Invalid (unused)
2       DTB Double miss (3 level page tables)
3       DTB Double miss (4 level page tables)
4       Floating point disabled
5       Unaligned Load/Store
6       DTB Single miss
7       Dstream Fault
8       OPCDEC
9       Invalid (use PMPC)
10      Machine Check
11      Invalid (use PMPC)
12      Arithmetic
13      Invalid (use PMPC)
14      MT_FPCR
15      Reset

Traps due to ITB miss, Istream access violation, or interrupts are not reported
in the trap type field because they do not cause pipeline aborts. Instead,
these traps cause pipeline redirection and can be distinguished by examining
the PMPC value for the presence of the corresponding PALcode entry offset
addresses indicated below. In these cases, the ProfileMe interrupt will
normally be delivered when exiting the trap PALcode flow and the
EXC_ADDR register will contain the original PC that encountered the redirect
trap.
PC[14:0]   Trap
0581       ITB miss
0481       Istream Access Violation
0681       Interrupt

ICM (ProfileMe Icache Miss)

This read-only bit indicates that the profiled instruction was contained in an
aligned 4-instruction Icache fetch block that requested a new Icache fill
stream.

OVR [2] (ProfileMe Counter 0 Overcount)

This read-only bit indicates a value (0-7) that must be subtracted from the
counter 0 result to obtain an accurate count of the number of instructions
retired in the interval beginning three cycles after the profiled instruction
reaches pipeline stage 2 and ending four cycles after the profiled instruction
is retired.
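
The trap type values and PALcode entry offsets listed for I_STAT lend themselves to a simple lookup when interpreting logged register contents. The Python sketch below is illustrative: the function names are hypothetical, values 1 through 9 are taken from the order in which the trap types are listed, and the PC[14:0] offsets are assumed to be hexadecimal.

```python
TRAP_TYPES = {
    0: "Replay",
    1: "Invalid (unused)",
    2: "DTB Double miss (3 level page tables)",
    3: "DTB Double miss (4 level page tables)",
    4: "Floating point disabled",
    5: "Unaligned Load/Store",
    6: "DTB Single miss",
    7: "Dstream Fault",
    8: "OPCDEC",
    9: "Invalid (use PMPC)",
    10: "Machine Check",
    11: "Invalid (use PMPC)",
    12: "Arithmetic",
    13: "Invalid (use PMPC)",
    14: "MT_FPCR",
    15: "Reset",
}

# PALcode entry offsets in PC[14:0], assumed to be hexadecimal.
REDIRECT_TRAPS = {
    0x0581: "ITB miss",
    0x0481: "Istream Access Violation",
    0x0681: "Interrupt",
}


def trap_type_name(trap_type):
    """Return the name for a 4-bit ProfileMe trap type value."""
    return TRAP_TYPES[trap_type & 0xF]


def redirect_trap(pmpc):
    """Return the redirect trap matching PC[14:0] of the PMPC value, or None."""
    return REDIRECT_TRAPS.get(pmpc & 0x7FFF)
```

The trap type field is meaningful only when I_STAT[TRP] is set; `redirect_trap` covers the cases that the field does not report.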


Memory Management Status Register


The memory management status register (MM_STAT) is a read-only register that stores
information on Dstream faults and Dcache parity errors. The VA, VA_FORM, and MM_STAT
registers are locked against further updates until software reads the VA register.
The MM_STAT bits are modified by hardware only when the register is not locked and a
memory management error, DTB miss, or Dcache parity error occurs. The MM_STAT register
is not unlocked or cleared on reset.

Field

Description

OPCODE

Opcode field of the faulting instruction.

RA

RA field of the faulting instruction.

BAD_VA

Set if reference had a bad virtual address.

DTB_MISS

Set if reference resulted in a DTB miss.

FOW

Set if reference was a write and the FOW bit of the PTE was set.

FOR

Set if reference was a read and the FOR bit of the PTE was set.

ACV

Set if reference caused an access violation. Includes bad VA.

WR

Set if reference that caused the error was a write operation.


Dcache Status Register


The Dcache status register (DC_STAT) is a read-write register. If a Dcache tag parity error or
data ECC error occurs, information about the error is latched in this register. The register is read
only by PALcode and is an element in the CPU or System Uncorrectable Machine Check Error
Logout frame.

Field

Description

SEO

Second error occurred. When set, this bit indicates that a second Dcache
store ECC error occurred within 6 cycles of the previous Dcache store ECC
error.

ECC_ERR_LD

ECC error on load. When set, this bit indicates that a single-bit ECC error
occurred while processing a load from the Dcache or any fill.

ECC_ERR_ST

ECC error on store. When set, this bit indicates that an ECC error occurred
while processing a store.

TPERR_P1

Tag parity error, pipe 1. When set, this bit indicates that a Dcache tag probe
from pipe 1 resulted in a tag parity error. The error is uncorrectable and
results in a machine check.

TPERR_P0

Tag parity error, pipe 0. When set, this bit indicates that a Dcache tag probe
from pipe 0 resulted in a tag parity error. The error is uncorrectable and
results in a machine check.

Error Registers

Cbox Read Register


The Cbox read register is read 6 bits at a time. The table shows the ordering from LSB to MSB.
The register is read only by PALcode and is an element in the CPU or System Uncorrectable
Machine Check Error Logout frame.

Name                Description

C_SYNDROME_1[7:0]   Syndrome for upper QW in OW of victim that was scrubbed.

C_SYNDROME_0[7:0]   Syndrome for lower QW in OW of victim that was scrubbed.

C_STAT[4:0]         Bits    Error Status
                    00000   Either no error, or error on a speculative load, or a
                            Bcache victim read due to a Dcache/Bcache miss.
                    00001   BC_PERR (Bcache tag parity error)
                    00010   DC_PERR (duplicate tag parity error)
                    00011   DSTREAM_MEM_ERR
                    00100   DSTREAM_BC_ERR
                    00101   DSTREAM_DC_ERR
                    0011X   PROBE_BC_ERR
                    01000   Reserved
                    01001   Reserved
                    01010   Reserved
                    01011   ISTREAM_MEM_ERR
                    01100   ISTREAM_BC_ERR
                    01101   Reserved
                    0111X   Reserved
                    10011   DSTREAM_MEM_DBL
                    10100   DSTREAM_BC_DBL
                    11011   ISTREAM_MEM_DBL
                    11100   ISTREAM_BC_DBL

C_STS[3:0]          If C_STAT equals xxx_MEM_ERR or xxx_BC_ERR, then C_STS
                    contains the status of the block as follows; otherwise, the value of
                    C_STS is X:
                    Bit Value   Status of Block
                    7:4         Reserved
                    3           Parity
                    2           Valid
                    1           Dirty
                    0           Shared

C_ADDR[6:42]        Address of last reported ECC or parity error. If C_STAT value is
                    DSTREAM_DC_ERR, only bits 6:19 are valid.
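
The C_STAT encoding above can be turned into a small lookup for use when reading a logout frame by hand. The following is a minimal sketch in Python, not part of any Compaq tool; the function names decode_c_stat and c_sts_is_valid are hypothetical, and the table contents are copied from the C_STAT rows above ("X" marks a don't-care bit).

```python
# C_STAT[4:0] codes copied from the table above; "X" marks a don't-care bit.
C_STAT_CODES = {
    "00000": "no error, speculative-load error, or Bcache victim read",
    "00001": "BC_PERR",
    "00010": "DC_PERR",
    "00011": "DSTREAM_MEM_ERR",
    "00100": "DSTREAM_BC_ERR",
    "00101": "DSTREAM_DC_ERR",
    "0011X": "PROBE_BC_ERR",
    "01011": "ISTREAM_MEM_ERR",
    "01100": "ISTREAM_BC_ERR",
    "10011": "DSTREAM_MEM_DBL",
    "10100": "DSTREAM_BC_DBL",
    "11011": "ISTREAM_MEM_DBL",
    "11100": "ISTREAM_BC_DBL",
}


def decode_c_stat(value):
    """Return the error status name for a C_STAT value (hypothetical helper)."""
    bits = format(value & 0x1F, "05b")
    for pattern, name in C_STAT_CODES.items():
        # A pattern matches when every position is either "X" or the same bit.
        if all(p in ("X", b) for p, b in zip(pattern, bits)):
            return name
    # Codes not listed in the table are treated as reserved.
    return "Reserved"


def c_sts_is_valid(c_stat):
    """C_STS is meaningful only for xxx_MEM_ERR and xxx_BC_ERR codes."""
    name = decode_c_stat(c_stat)
    return name.endswith("_MEM_ERR") or name.endswith("_BC_ERR")
```

For example, decode_c_stat(0b00111) matches the 0011X row, and c_sts_is_valid(0b00101) is false because DSTREAM_DC_ERR is neither a memory nor a Bcache error code.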

Miscellaneous Register
The miscellaneous register (MISC) is designed so that only writes of 1 affect it. When a 1 is
written to any bit in the register, there is no need to be concerned with read-modify-write or the
status of any other bits in the register. Once NXM is set, the NXS field is locked. It is unlocked
when software clears the NXM field. The ABW (arbitration won) field is locked if either ABW
bit is set, so the first CPU to write it locks out the other CPU. Writing a 1 to ACL (arbitration
clear) clears both ABW bits and both ABT (arbitration try) bits and unlocks the ABW field.
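
The write rules for the arbitration fields can be illustrated with a toy software model. This is only a sketch of the behavior described in the paragraph above, not a description of the hardware implementation; the class name MiscModel is hypothetical, the bit positions are taken from the field table for this register, and the mapping of ABW bit 16 to CPU0 and bit 17 to CPU1 is an assumption made for the example.

```python
ACL = 1 << 24          # arbitration clear, bit <24>
ABT_MASK = 0xF << 20   # arbitration try, bits <23:20>
ABW_MASK = 0xF << 16   # arbitration won, bits <19:16>


class MiscModel:
    """Toy model of the MISC arbitration fields (illustrative only)."""

    def __init__(self):
        self.value = 0

    def write(self, data):
        # ABT is write-1-to-set; zeros in the written data change nothing.
        self.value |= data & ABT_MASK
        # ABW is write-1-to-set, but the write is ignored once any ABW bit is set.
        if not (self.value & ABW_MASK):
            self.value |= data & ABW_MASK
        # Writing 1 to ACL clears ABT and ABW, which also unlocks ABW.
        if data & ACL:
            self.value &= ~(ABT_MASK | ABW_MASK)


misc = MiscModel()
misc.write(1 << 16)   # first CPU claims arbitration (assumed to be CPU0)
misc.write(1 << 17)   # second CPU's write is ignored because ABW is locked
first_winner = misc.value & ABW_MASK
misc.write(ACL)       # release: both fields cleared, ABW unlocked
```

Because only writes of 1 have any effect, neither CPU needs a read-modify-write sequence to update its own bit.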

Field    Bits      Type     Init   Description

DEVSUP   <43:40>   WO       0      Development support.

REV      <39:32>   RO       1      Latest revision of the C-chip: 1 = Tsunami.

NXM      <28>      R, W1C   0      Nonexistent memory address detected. Sets DRIR<63> and
                                   locks the NXS field until it is cleared.

NXS      <31:29>   RO       0      NXM source. Device that caused the NXM. Unpredictable if
                                   NXM not set. 0 = CPU0, 1 = CPU1.

ACL      <24>      WO       0      Arbitration clear. Writing a 1 to this bit clears the ABT and
                                   ABW fields.

ABT      <23:20>   R, W1S   0      Arbitration try. Writing a 1 to these bits sets them.

ABW      <19:16>   R, W1S   0      Arbitration won. Writing a 1 to these bits sets them unless
                                   one is already set, in which case the write is ignored.

IPREQ    <15:12>   WO       0      Interprocessor interrupt request. Write a 1 to the bit
                                   corresponding to the CPU you want to interrupt. Writing a 1
                                   here sets the corresponding bit in the IPINTR.

IPINTR   <11:8>                    Interprocessor interrupt pending, one bit per CPU. Pin irq<3>
                                   is asserted to the CPU corresponding to a 1 in this field.

ITINTR   <7:4>     R, W1C   0      Interval timer interrupt pending, one bit per CPU. Pin
                                   irq<2> is asserted to the CPU corresponding to a 1 in this field.

CPUID    <1:0>                     ID of the CPU performing the read.
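
When a MISC value appears in an error log, the fields can be separated with a few shifts and masks. The sketch below is illustrative only: decode_misc is a hypothetical helper, the bit ranges are copied from the table above, and the sample value is the Cchip_MISC entry from the Compaq Analyze sample report later in this guide.

```python
# (high bit, low bit) for each MISC field, copied from the table above.
MISC_FIELDS = {
    "DEVSUP": (43, 40),
    "REV": (39, 32),
    "NXS": (31, 29),
    "NXM": (28, 28),
    "ACL": (24, 24),
    "ABT": (23, 20),
    "ABW": (19, 16),
    "IPREQ": (15, 12),
    "IPINTR": (11, 8),
    "ITINTR": (7, 4),
    "CPUID": (1, 0),
}


def decode_misc(value):
    """Split a 64-bit MISC value into its named fields (hypothetical helper)."""
    fields = {}
    for name, (high, low) in MISC_FIELDS.items():
        width = high - low + 1
        fields[name] = (value >> low) & ((1 << width) - 1)
    return fields


# Cchip_MISC value from the sample error report in this guide.
sample = decode_misc(0x0000000100000020)
for name, field_value in sample.items():
    print(f"{name:7s} = {field_value:#x}")
```

For the sample value, REV decodes to 1 (Tsunami) and ITINTR has its second bit set, while NXM is clear, so the NXS field carries no meaning for that entry.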

Device Interrupt Request Registers


The device interrupt request registers (DIRn) indicate which interrupts are pending to the CPUs
and indicate the presence of an I/O error condition.

Field

Description

ERR

<63:62> IRQ0 error interrupts


<63> C-chip detected MISC <NXM>
<62> P-chip 0 error
<62> P-chip 1 error

NXS

<55:0> IRQ1 PCI interrupts pending to the CPU.

P-Chip Error Register


If any of bits <11:0> are set, the P-chip error register (PERROR) is frozen. Only bit <0> can be set
after that. All other values are held until all of bits <11:0> are clear. When an error occurs and one
of the <11:0> bits is set, the associated information is captured in bits <63:16>. After the
information is captured, the INV bit is cleared; the information is not valid and should not be
used if INV is set.

Field    Description

INV      <51> Info Not Valid. This bit is meaningful when one of the bits <11:0> is
         set. This bit indicates the validity of SYN, CMD, and ADDR bits.
         Valid = 0, Invalid = 1.

CMD      <55:52> This field represents the PCI command when the error occurred if
         the error is not a correctable ECC error (CRE) or uncorrectable ECC error
         (UECC).
         If the error is a CRE or UECC, then the value of this field is defined as
         follows:
         Value    Command
         0000     DMA read
         0001     DMA read-modify-write
         0011     SGTE read
         Others   Reserved

SYN      <63:56> ECC syndrome of error if CRE or UECC.

RES      <15:12> Reserved.

CRE      <11> Correctable ECC error.

UECC     <10> Uncorrectable ECC error.

RES      <9> Reserved.

NDS      <8> No b_devsel_l as PCI master.

RDPE     <7> PCI read data parity error as PCI master.

TA       <6> Target abort as PCI master.

APE      <5> Address parity error detected as potential PCI target.

SGE      <4> Scatter-gather had invalid page table entry.

DCRTO    <3> Delayed completion retry timeout as PCI target.

PERR     <2> b_perr_l sampled asserted.

SERR     <1> b_serr_l sampled asserted.

LOST     <0> An error was lost because it was detected after this register was frozen,
         or while in the process of clearing this register.
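
The following Python sketch shows one way to pull the error bits, INV, CMD, and SYN out of a raw PERROR value. It is an illustration under stated assumptions rather than the decoding used by any Compaq tool: decode_perror is a hypothetical helper, only the fields listed in the table above are decoded, and the sample value is the P0_Perror entry from the Compaq Analyze sample report later in this guide.

```python
# Error bits <11:0> that have names in the table above (bit <9> is reserved).
PERROR_ERROR_BITS = {
    11: "CRE", 10: "UECC", 8: "NDS", 7: "RDPE", 6: "TA", 5: "APE",
    4: "SGE", 3: "DCRTO", 2: "PERR", 1: "SERR", 0: "LOST",
}

# Meaning of CMD<55:52> when the error is a CRE or UECC.
ECC_COMMANDS = {
    0b0000: "DMA read",
    0b0001: "DMA read-modify-write",
    0b0011: "SGTE read",
}


def decode_perror(value):
    """Decode a 64-bit PERROR value into a dictionary (hypothetical helper)."""
    errors = [name
              for bit, name in sorted(PERROR_ERROR_BITS.items(), reverse=True)
              if (value >> bit) & 1]
    # INV is meaningful only when an error bit is set; 0 means the info is valid.
    info_valid = bool(errors) and not ((value >> 51) & 1)
    result = {"errors": errors, "info_valid": info_valid, "syn": None, "cmd": None}
    if info_valid:
        cmd = (value >> 52) & 0xF
        if "CRE" in errors or "UECC" in errors:
            result["syn"] = (value >> 56) & 0xFF
            result["cmd"] = ECC_COMMANDS.get(cmd, "Reserved")
        else:
            # For non-ECC errors CMD holds the raw PCI command code.
            result["cmd"] = f"PCI command {cmd:#x}"
    return result


report = decode_perror(0xDC10135700000800)
print(report)
```

For the sample value, bit <11> (CRE) is the only error bit set, INV is clear, the CMD field decodes as a DMA read-modify-write, and the syndrome is DCh.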

Failure Register

The failure register, located on the I2C bus, is locked when there is a power supply or fan failure.
Together with the function register, it identifies fan and power supply failures and reports them to
the operating system, notifying it that the system will shut down in 30 seconds. The results
of reading this register are displayed by the SRM show power command.

Field           Description

Reserved        <0>

C/SFAN0_L       <1> When set, this bit indicates that either the system fan 0 or the fan on the
                heatsink on CPU0 failed. Which of the two fans failed is determined by the
                state of SYSFAN_OK and CPUFANS_OK in the function register.

Reserved        <2>

Reserved        <3>

PS1_PRESENT_L   <4> If this bit is clear, power supply 1 is present.

C/SFAN1_L       <5> When set, this bit indicates that either the system fan 1 or the fan on the
                heatsink on CPU1 failed. Which of the two fans failed is determined by the
                state of SYSFAN_OK and CPUFANS_OK in the function register.

PS2_PRESENT_L   <6> If this bit is clear, power supply 2 is present.

PS0_PRESENT_L   <7> If this bit is clear, power supply 0 is present.

Function Register

The function register generates an interrupt on the I2C bus if one of the critical functions
monitored (power, temperature, fan operation) goes beyond predetermined limits. When such an
interrupt is generated, the contents of bits <0, 1, 2, and 5> in the failure register are frozen. The
system shuts down 30 seconds after the interrupt is posted. The results of reading this register
are displayed by the SRM show power console command.

Field        Description

TEMP_OK      <0> When set, this bit indicates that the temperature inside the system
             enclosure is below the temperature limit.

SYSFAN_OK    <1> When this bit is 0, C/SFAN0_L and C/SFAN1_L indicate which
             system fan failed. When set, this bit indicates that the system fans
             are functioning properly.

Reserved     <2>

CPUFANS_OK   <3> When this bit is 0, C/SFAN0_L and C/SFAN1_L indicate which CPU
             fan failed. When set, this bit indicates that the fans on CPU heatsinks are
             functioning properly.

Reserved     <4>

PS0_FAIL     <5> When set, this bit indicates that power supply 0 has failed. This bit is
             valid only if the corresponding PS0_PRESENT_L bit is 0.

PS1_FAIL     <6> When set, this bit indicates that power supply 1 has failed. This bit is
             valid only if the corresponding PS1_PRESENT_L bit is 0.

PS2_FAIL     <7> When set, this bit indicates that power supply 2 has failed. This bit is
             valid only if the corresponding PS2_PRESENT_L bit is 0.
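
The failure and function registers have to be read together to name the failing part. The sketch below combines the two tables in Python. It is a reading aid under stated assumptions, not the logic used by the SRM show power command: summarize_environment is a hypothetical helper, the bit positions come from the two tables above, and the sample register values are invented for illustration.

```python
# Function register bits (from the function register table).
TEMP_OK = 1 << 0
SYSFAN_OK = 1 << 1
CPUFANS_OK = 1 << 3
PS_FAIL = {0: 1 << 5, 1: 1 << 6, 2: 1 << 7}

# Failure register bits (from the failure register table).
CSFAN_L = {0: 1 << 1, 1: 1 << 5}
PS_PRESENT_L = {0: 1 << 7, 1: 1 << 4, 2: 1 << 6}


def summarize_environment(function_reg, failure_reg):
    """Return a list of problem descriptions (hypothetical helper)."""
    problems = []
    if not (function_reg & TEMP_OK):
        problems.append("temperature not below limit")
    for n, mask in CSFAN_L.items():
        if failure_reg & mask:
            # SYSFAN_OK and CPUFANS_OK tell which of the two fans failed.
            if not (function_reg & SYSFAN_OK):
                problems.append(f"system fan {n} failed")
            if not (function_reg & CPUFANS_OK):
                problems.append(f"CPU{n} heatsink fan failed")
    for n, mask in PS_FAIL.items():
        # PSn_FAIL is valid only when PSn_PRESENT_L is 0 (supply present).
        present = not (failure_reg & PS_PRESENT_L[n])
        if present and (function_reg & mask):
            problems.append(f"power supply {n} failed")
    return problems


# Invented sample: system fan 1 and power supply 0 failed; supplies 1 and 2 absent.
problems = summarize_environment(function_reg=0x29, failure_reg=0x70)
print(problems)
```

A PSn_FAIL bit for a supply that is not installed is ignored, which follows the validity rule stated in the function register table.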

Chapter

OS Diagnostics Overview

Introduction
Each operating system supports tools and has features that can assist you in troubleshooting.
The following tools are described in this chapter:

Tru64 UNIX Diagnostic Tools

DEC VET

Machine Checks

Compaq Analyze

Tru64 UNIX Diagnostic Tools


Under the Tru64 UNIX operating system, you have access to these diagnostic tools:

SCU-SCSI CAM utility

Crash dumps

SCU-SCSI CAM Utility


The SCU-SCSI CAM utility is used to maintain or diagnose SCSI peripherals or the Tru64
UNIX CAM I/O subsystem.
To list full SCSI configuration, including device names and revisions, type:
# scu show edt lun 0 full
For more information on SCU, type:
# man scu
or
# scu help

Crash Dumps
Under the Tru64 UNIX operating system, you can initiate a crash dump and analyze a system-generated crash dump. For example, if the system is hung (no response from keyboard, mouse,
or network), press the Halt button for a second. The system should exit to the SRM console. If
the system crashes, it exits UNIX and writes a crash dump to disk. To display the crash dump,
enter crash at the SRM console prompt.

DEC VET
The DIGITAL Verifier Exerciser Tool (DEC VET) is an application that is used to verify a
system installation and to exercise the components of the system.
DEC VET provides both a graphical user interface and a command-line interface. Both
interfaces allow a user to exercise the hardware and software in the same way for any system on
a network, regardless of the operating system of the remote system.
Because DEC VET runs on an operating system, it can be used as the first level of testing when
troubleshooting a system. You can run DEC VET to exercise one or more system components
without having to shut down the operating system. This may be an advantage when
troubleshooting a system if the customer does not want to shut down the operating system just
to test the system.
DEC VET has a generic set of exercisers that can be used to test the installation of hardware and
base operating system software. The DEC VET exercisers can be configured to test a single
device or to exercise all the devices on a system simultaneously. With this tool, you can:

Troubleshoot system hardware and software.

Perform tests on single systems or on multiple nodes on a network.

Compose, edit, and run script files.

Select multiple processes for each device.

Set the run time or pass count.

Set the level of error message detail.

View a test status summary during testing or after testing is complete.

The exercisers check the following components:

This exerciser:   Does this:

CPU               Tests system processor functions, including binary operations, integer and
                  floating-point computations, and data conversion.

Memory            Dynamically allocates and deallocates virtual memory; writes and verifies test
                  patterns.

Disks             Tests both logical and physical disk I/O by performing read and write
                  operations. Verifies the test patterns written to the disks.

Files             Writes to and reads from disk files and verifies the test patterns written.

Tape              Writes to and reads from tape device files and verifies the test patterns written.
                  The operations include file mark detection, spacing, rewinding, and end-of-tape
                  detection.

Network           Tests the underlying protocols, physical network adapters, local and remote
                  networks, destination adapters, network services, and echo daemons. Both
                  TCP/IP and DECnet networks are supported.

Printer           Prints out a file containing a test pattern of all the ASCII characters from " "
                  (blank space) to "~" (tilde). This pattern is shifted one character to the right on
                  each subsequent line. Enough lines are printed to verify that all the ASCII
                  characters can be printed at each position. Other tests are available to test
                  PostScript output.

Terminal          Displays to the terminal screen a file containing a test pattern of all the ASCII
                  characters from " " (blank space) to "~" (tilde). This pattern is shifted one
                  character to the right on each subsequent line. Enough lines are displayed to
                  verify that all the ASCII characters can be displayed at each position.

Video             Displays several video test patterns and graphics. These verify the console's
                  ability to display graphics, text, and shades of color accurately.

Machine Checks
Machine checks are usually associated with hardware error conditions. Machine checks can
represent:

Correctable (soft) errors

Uncorrectable (hard) errors

Correctable errors do not usually affect system operation. Uncorrectable errors usually result in
a system crash.
When a system error is detected, the PALcode usually classifies it as a machine check. PALcode
collects error information from module control and status registers and formats it into a logout
frame that is passed to the operating system. The operating system uses the information in the
logout frame to determine the action to take when an error occurs. Some errors are fatal; they
can cause the entire system or a specific process to fail. Other errors can be corrected and do not
halt processing. The operating system writes the error information in an entry in a binary file
that can be used to produce an error log. Most of the errors occur during the transmission of
commands or data along the system bus or in buses or storage internal to a particular module.
In handling errors, the PALcode is responsible for parsing the exception and building the
machine check logout frame. During error checking, the following activities take place:

All error bits are examined.

The nature of the machine check or interrupt is determined.

Corrective action is taken, if possible.

A Corrected Error Frame or Machine Check Logout Frame is built.

A Stack Frame is built.

Error bits are cleared.

Control passes to the operating system through the system control block (SCB).

Operating System
The operating system (OS) machine check handler is responsible for the following actions:

Creating the Kernel Event Header Frame

Appending the Corrected Error Frame or Machine Check Logout Frame

Updating the error log

Executing fault analysis using the saved machine check frame and context

The operating system logs events and errors that occur while the system is running. You can use
the information in these event logs to help troubleshoot system problems.
The error handlers in an operating system generate entries in the binary system error log. All
error log entries are written immediately, except for correctable memory errors. The size of an
error log entry depends on the type of error that occurs and the error handling mechanisms used
to log it.

Common OS Header
The common operating system header is a segment of the error log entry for systems using the
Alpha 21264 (EV6/EV67) processor. This header is used by the error analysis tools to:

Identify events

Control error parsing

Dispatch errors

This header contains similar information for all supported operating systems. However, some of
the fields contain different information depending on the operating system.

Error Logout Frame


The error logout frame is generated by the PALcode for one of the following machine check
types:

System or processor correctable error

System or processor uncorrectable error

Environmental correctable or uncorrectable error

The error logout frame includes the values of several system and/or processor registers, which
can be used to determine the cause of the error.

Extended Error Log Block


The extended error log capture block is used to provide error capture packets for diagnostic
FRU analysis. This additional information helps the error analysis tools to determine which
FRU needs to be replaced.

Termination Block
Each error log entry must be terminated with an error event termination block. This block tells
the software reading the error log entry that it has reached the end of the entry.

Error Classes
Four classes of errors are handled by the system bus:

Soft errors, hardware corrected, transparent to the software (for example, single-bit ECC
errors).

Soft errors requiring PALcode/software support to correct.

Hard errors restricted to the failing transaction (for example, a double-bit error in a
memory location in a user's process, which would result in the process being aborted and
the page being marked as bad). The system can continue operation.

System fatal hard errors. The system integrity has been compromised and continued
system operation cannot be guaranteed. All outstanding transactions are aborted, and the
state of the system is unknown.

Error Types
Six types of machine checks can occur in DS20E systems.

Correctable Machine Checks


The correctable machine checks are:

System-detected        This type of machine check is caused by correctable single-bit errors that
correctable errors     occur in the system. These errors are detected by the crossbar P-chip and are
(SCB 620)              typically correctable read data (CRD) errors. Possible causes for this type of
                       machine check include:
                         DMA read errors
                         Read errors in the PCI memory space
                       This is known as a 620 machine check for the associated SCB number.

Processor-detected     This type of machine check is caused by system-independent, single-bit errors
correctable errors     that are detected and corrected by the processor. Possible causes for this type
(SCB 630)              of machine check include:
                         Bcache ECC errors
                         Processor I-stream ECC errors
                         Processor D-stream ECC errors
                         Processor Icache parity errors
                         Processor Dcache parity errors
                       This is known as a 630 machine check for the associated SCB number.

Environmental          This type of machine check is caused either by a system-detected hardware
correctable errors     failure or by an environmental condition. The system may recover from these
(SCB 680)              conditions if there is redundant hardware present (for example, a redundant
                       power supply). Possible causes for this type of machine check include:
                         Power supply failure error
                         Overtemperature warning condition
                       This is known as a 680 correctable machine check for the associated SCB
                       number.

Uncorrectable Machine Checks


The uncorrectable machine checks are:

System                 This type of machine check is caused by a system-detected or
uncorrectable          processor-detected uncorrectable error. These errors are the result of a request
errors (SCB 660)       that was made external to the processor. These errors may cause the machine to
                       crash. Possible causes for this type of machine check include:
                         Nonexistent memory reference error
                         Fatal PCI error
                         PCI data parity error
                       This is known as a 660 machine check for the associated SCB number.

Processor              This type of machine check is caused by internal processor errors detected by
uncorrectable          the processor. These errors always result in a system crash. Possible causes
errors (SCB 670)       for this type of machine check include:
                         Uncorrectable Bcache errors
                         Bugcheck errors
                         Dcache tag parity errors
                         Double-bit ECC memory errors
                       This is known as a 670 machine check for the associated SCB number.

Environmental          This type of machine check is caused either by a system-detected hardware
uncorrectable          failure or by an environmental condition. The system is unable to recover from
errors (SCB 680)       these conditions. Possible causes for this type of machine check include:
                         Complete power supply failure
                         System or CPU fan failure
                         Overtemperature condition
                       This is known as a 680 uncorrectable machine check for the associated SCB
                       number.
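
The mapping from SCB vector to machine check type, and to the kind of frame that carries the error data, can be summarized in a small lookup. The Python sketch below is illustrative only; the names MACHINE_CHECK_VECTORS and frame_for_vector are hypothetical, and the frame names follow the section titles used later in this chapter.

```python
# SCB vector (hex) to machine check type, as described in the tables above.
MACHINE_CHECK_VECTORS = {
    0x620: "system-detected correctable error",
    0x630: "processor-detected correctable error",
    0x660: "system uncorrectable error",
    0x670: "processor uncorrectable error",
    0x680: "environmental error (correctable or uncorrectable)",
}


def frame_for_vector(vector):
    """Return the name of the frame described in this chapter for an SCB vector."""
    if vector in (0x620, 0x630):
        return "Corrected Error Frame"
    if vector in (0x660, 0x670):
        return "Machine Check Logout Frame"
    if vector == 0x680:
        return "Environmental Error Logout Frame"
    raise ValueError(f"{vector:#x} is not a machine check vector")


for vector, kind in MACHINE_CHECK_VECTORS.items():
    print(f"SCB {vector:03X}: {kind} -> {frame_for_vector(vector)}")
```

Note that vector 680 covers both correctable and uncorrectable environmental errors, so the vector alone does not say whether the system can recover.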

Machine Check Logout Frame (SCB 660 and 670)


Machine check logout frame maps are used for system control block (SCB) 660 system uncorrectable
errors and SCB 670 processor uncorrectable errors. The software machine check handlers for
uncorrectable errors read the error context captured by dedicated system and processor
error-detection hardware. The system segment offset locations in the CPU
uncorrectable logout frames are always zeroes, and the first eight quadwords of the EV6/EV67
segment offset locations in the system uncorrectable logout frames are always zeroes.

Machine Check Logout Frame Header

The r in the table below, when set, indicates that the error is retryable.

The s in the table below, when set, indicates that this is the second error.

The offset from the CEF below is shown for reference only.

                                                     Offset from
63   62   61              32   31                0   MCLF   CEF

r    s    SBZ                  Frame Size            000h   068h

System Offset                  EV6/EV67 Offset       008h   070h

MCHK Frame Revision            MCHK Code             010h   078h
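
The three header quadwords can be unpacked mechanically once the frame has been captured as raw bytes. The following Python sketch assumes the bytes were copied from memory in little-endian order, which is the byte order Alpha systems use; parse_frame_header is a hypothetical helper and the sample values are invented for illustration.

```python
import struct


def parse_frame_header(raw):
    """Decode the first three quadwords of a logout frame (hypothetical helper).

    raw must hold at least 24 bytes in little-endian order (an assumption
    about how the frame was captured).
    """
    qw0, qw1, qw2 = struct.unpack("<3Q", raw[:24])
    return {
        "retryable": bool((qw0 >> 63) & 1),      # r, bit <63>
        "second_error": bool((qw0 >> 62) & 1),   # s, bit <62>
        "frame_size": qw0 & 0xFFFFFFFF,          # bits <31:0> at offset 000h
        "system_offset": qw1 >> 32,              # bits <63:32> at offset 008h
        "cpu_offset": qw1 & 0xFFFFFFFF,          # bits <31:0> at offset 008h
        "frame_revision": qw2 >> 32,             # bits <63:32> at offset 010h
        "mchk_code": qw2 & 0xFFFFFFFF,           # bits <31:0> at offset 010h
    }


# Invented sample header: retryable, size 0E8h, CPU data at 018h,
# system data at 0A0h, revision 1, machine check code 098h.
sample_raw = struct.pack("<3Q",
                         (1 << 63) | 0xE8,
                         (0xA0 << 32) | 0x18,
                         (1 << 32) | 0x98)
header = parse_frame_header(sample_raw)
print(header)
```

The same layout applies to the Corrected Error Frame header, so the helper can be reused for frames vectored through SCB 620 and 630.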

Machine Check Logout Frame CPU Data

The table applies to both EV6 and EV67.

                                                             Offset from CPU Data

EV6 Ibox Status (I_STAT<31:29>)                              000h
EV6 DCache Status (DC_STAT<4:0>)                             008h
EV6 Cbox (C_ADDR<43:6>)                                      010h
EV6 Cbox (C_SYNDROME_1<7:0>)                                 018h
EV6 Cbox (C_SYNDROME_0<7:0>)                                 020h
EV6 Cbox (C_STAT<4:0>)                                       028h
EV6 Cbox (C_STS<3:0>)                                        030h
EV6 TB Miss or Fault Status (MM_STAT<10:0>)                  038h
EV6 Exception Address (EXC_ADDR)                             040h
EV6 Interrupt Enable and Current Processor Mode (IER_CM)     048h
EV6 Interrupt Summary Register (ISUM)                        050h
EV6 Reserved 0                                               058h
EV6 PAL Base Address (PAL_BASE)                              060h
EV6 Ibox Control (I_CTL)                                     068h
EV6 Ibox Process Context (PCTX)                              070h
EV6 Reserved 1                                               078h
EV6 Reserved 2                                               080h

Machine Check Logout Frame System Data

The offsets below are shown for reference only.

                                                        Offset from
                                                        System Data   MCLF

Software Error Summary Flags                            000h          0A0h
CChip CPU0 Device Interrupt Request Register (DIR0)     008h          0A8h
CChip Miscellaneous Register (MISC)                     010h          0B0h
PChip 0 Error Register (P0_PERROR)                      018h          0B8h
PChip 1 Error Register (P1_PERROR)                      020h          0C0h
Tsunami/Typhoon Reserved 1                              028h          0C8h
Tsunami/Typhoon Reserved 2                              030h          0D0h
Tsunami/Typhoon Reserved 3                              038h          0D8h
Tsunami/Typhoon Reserved 4                              040h          0E0h

Locating the Logout Frames


The location of the Machine Check Logout Frame (MCLF) varies from system to system, and
could possibly change for different versions of the same system. For that reason, the following
information should be used to find the structure:

The SRM console command, show hwrpb, returns the physical address of the Hardware
Restart Parameter Block (HWRPB). That address contains its own physical address.

HWRPB offset A0h contains an offset relative to the beginning of the HWRPB that points
to the Per-CPU Slot data structure for the DS20E system.

Offset D8h from the beginning of the Per-CPU Slot information contains the physical
address of the Corrected Error Frame (CEF).

Offset 00h of the CEF contains its length. Since the MCLF immediately follows the
Corrected Error Frame, add this value to get the address of the MCLF.

The following is an example of the console commands:

Command                    Description

P00>>> show hwrpb
HWRPB is at 2000           <- physical address of HWRPB
P00>>> e -p 20a0           2000+A0
PMEM: 20A0 000001C0        <- offset of Per-CPU Slot
P00>>> e -p 2298           2000+1c0+D8
PMEM: 2298 00006000        <- physical address of CEF
P00>>> e -p 6000           6000
PMEM: 6000 00000068        <- offset to MCLF
P00>>> e -p 6078           6000+68+10
PMEM: 6078 00000098        <- machine check code
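
The pointer walk in the steps above can be written out as a short calculation. The Python sketch below is only an illustration of the arithmetic: locate_mclf and the read_long callback are hypothetical names, and the memory dictionary stands in for the console examine commands, using the values from the example above.

```python
def locate_mclf(read_long, hwrpb_addr):
    """Follow the HWRPB pointers to the CEF and MCLF (hypothetical helper).

    read_long is any callable that returns the value stored at a physical
    address, for example a wrapper around console examine output.
    """
    # HWRPB offset A0h holds the offset of the Per-CPU Slot structure.
    slot_offset = read_long(hwrpb_addr + 0xA0)
    # Offset D8h within the Per-CPU Slot holds the CEF physical address.
    cef_addr = read_long(hwrpb_addr + slot_offset + 0xD8)
    # Offset 00h of the CEF is its length; the MCLF follows immediately.
    cef_length = read_long(cef_addr)
    return cef_addr, cef_addr + cef_length


# Values taken from the console example above.
memory = {0x20A0: 0x1C0, 0x2298: 0x6000, 0x6000: 0x68, 0x6078: 0x98}
cef, mclf = locate_mclf(memory.__getitem__, 0x2000)
mchk_code = memory[mclf + 0x10]   # MCHK Code is at offset 010h of the MCLF
print(f"CEF at {cef:X}, MCLF at {mclf:X}, machine check code {mchk_code:X}")
```

With the example values, the CEF is at 6000, the MCLF at 6068, and the machine check code read from 6078 is 98.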

Detailed Logout Frame Field Descriptions


The following descriptions of the fields within the logout frames provide additional clarification
of their use:

Frame Size        Provides the physical size of the entire logout error frame in bytes.

Second Flags      If bit <63> is set to 1, a retryable error has occurred. If bit <63> is set to 0,
                  a nonretryable error, typically fatal in nature, has occurred.
                  Bit <62> is the Second Error flag. If bit <62> is set to 1, a second error has
                  occurred. If it is set to 0, only one error has occurred.

EV6/EV67          Offset from the base address of the frame to the start of the 21264 internal
Offset            processor registers error content information.

System Offset     Offset from the base address of the frame to the start of the system-specific
                  diagnostic, control, status, or error content information.

MCHK Code         Indicates the specific type of correctable or uncorrectable machine check error
                  that has occurred. (See Machine Checks in this chapter for specific
                  details.)

MCHK Frame        The revision level of the logout frame.
Revision

EV6/EV67          Specifically refers to a specified 21264 internal processor control, diagnostic,
xxx xxx xxx       error, or status register. Refer to for a complete description and layout of the
                  processor registers contained in the logout frame.

Software Error    Software flags used to signal system error handler(s) execution processing
Summary Flags     and control.

CChip, PChip      Specifically refers to a Tsunami Core Logic Chipset system diagnostic, control,
                  status, or error register.

TIG               Specifically refers to a Timing, Interrupt and General bus (TIG) Controller Chip
                  system diagnostic, control, status, or error register.

Xxx               Reserved.

Yy                Reserved for additional logout frame error state capture.

Environmental Error Logout Frames (SCB 680)


These logout frames are used for all EV6/EV67 SCB 680 system environmental correctable and
uncorrectable errors.
This table shows the format of the Environmental Error Logout Frame (MCLF) built by the
console firmware used for Environmental Errors vectored through SCB entry 680h. Currently,
the only environmental fault supported is fan failure.

Frame_Flags                      Frame_Size
System_Area_Offset               CPU_Area_Offset
Frame_Rev                        Mchk_Error_Code
SW_Sum_Flags
Cchip_DIR
Environ_QW_1   [63:8] MBZ        Fan, Temperature, and Power Supply Function Register [7:0]
Environ_QW_2   [63:8] MBZ        Failing Fan/Power Supply Register [7:0]
Environ_QW_3 (Reserved)
Environ_QW_4 (Reserved)
Environ_QW_5 (Reserved)
Environ_QW_6 (Reserved)
Environ_QW_7 (Reserved)
Environ_QW_8 (Reserved)
Environ_QW_9 (Reserved)

Corrected Error Frame (SCB 620 and 630)


These tables show the format of the Corrected Error Frame built by the console firmware for
correctable errors vectored through SCB entries 620h and 630h.

The r in the table below indicates that the error is retryable when set.

The s in the table below indicates that this is the second error.

Corrected Error Frame Header

63   62   61              32   31                0   Offset (Hex)

r    s    SBZ                  Frame Size            000h

System Offset                  CPU Offset            008h

MCHK Frame Revision            MCHK Code             010h

Corrected Error Frame CPU Data


CPU data are meaningless when the MCHK code is for a system-corrected error.
The offset from the CEF below is shown for reference only; check the CPU offset field in the
header.

                                                   Offset from
                                                   CPU Data   CEF

EV6 Ibox Status (I_STAT<31:29>)                    000h       018h
EV6 DCache Status (DC_STAT<4:0>)                   008h       020h
EV6 Cbox (C_ADDR<43:6>)                            010h       028h
EV6 Cbox (C_SYNDROME_1<7:0>)                       018h       030h
EV6 Cbox (C_SYNDROME_0<7:0>)                       020h       038h
EV6 Cbox (C_STAT<4:0>)                             028h       040h
EV6 Cbox (C_STS<3:0>)                              030h       048h
EV6 TB Miss or Fault Status (MM_STAT<10:0>)        038h       050h

The table above also applies to EV67.

Corrected Error Frame System Data


System data are meaningless when the MCHK code is for a CPU-corrected error.
The offset from the CEF below is shown for reference only; check the system offset field in
the header.

                                                        Offset from
                                                        System Data   CEF

Software Error Summary Flags                            000h          058h
CChip CPU0 Device Interrupt Request Register (DIR0)     008h          060h
CChip Miscellaneous Register (MISC)                     010h          068h
PChip 0 Error Register (P0_PERROR)                      018h          070h
PChip 1 Error Register (P1_PERROR)                      020h          078h
Tsunami/Typhoon Reserved 1                              028h          080h
Tsunami/Typhoon Reserved 2                              030h          088h
Tsunami/Typhoon Reserved 3                              038h          090h
Tsunami/Typhoon Reserved 4                              040h          098h

Compaq Analyze
Compaq Analyze is an error analysis and reporting tool for systems using the Alpha 21264
(EV6/EV67) processors with the Tsunami chipset. Compaq Analyze is intended as the successor
to the DECevent utility.
Compaq Analyze is designed to be used by:

System managers

Service engineers

Customer Support Center (CSC) specialists

Compaq Analyze runs on all operating systems supported by DS20E systems. Compaq Analyze
contains sets of rules (analysis rulesets) that are used to analyze errors in the event log based on
input from the FRU table. The rules contain knowledge about the possible causes of errors in the
system.

Compaq Analyze Operation


Compaq Analyze must be installed on the customer's system. The Compaq Analyze software is
automatically started when the operating system is booted. Compaq Analyze continuously
monitors the system event log for error conditions. When one or more events trigger one or
more of the rules in the ruleset, an analysis is performed on the event and a problem report may
be generated.
A report of the error is sent to the users on the notification mailing list. If DSNlink is installed
on the system, a call is also logged with the local Customer Support Center (CSC).

Compaq Analyze Analysis Components


The components of Compaq Analyze use a messaging model to communicate with each other.

1. The system event log is the source of system event information. When one or more events
are logged, they are sent to the Decomposer for translation.
2. The Decomposer performs the bit-to-text translation of the events sent from the system
event log. The Director routes the event and data packets among the different services in
the Compaq Analyze system.
3. The Director sends the translated event from the Decomposer to the Analysis Engine. The
Analysis Engine can operate on multiple events at a time.
4. The Analysis Engine consults the ruleset database to see if one of the rules applies to the
event. Not all events will cause a rule to indicate that a problem has occurred. If a rule
indicates that an error has occurred, the Analysis Engine sends the analysis results to the
Director.
5. The Director sends the analysis results to the graphical user interface (if it is running) and
to the Notification Service.

Compaq Analyze Interface


Compaq Analyze uses a browser-style graphical user interface (GUI). Versions of this GUI are
available for OpenVMS and Tru64 UNIX. The GUI allows the user to:

View event problem reports

View the translated output of event log entries

Configure tool components

Create user groups and add or remove nodes

Connect to specific nodes to view event logs from other systems

When an error is detected, it is reported to the console with a series of Problem Found
statements.

Using Compaq Analyze with a Standard Browser


To use Compaq Analyze with a standard browser:
1. Start up the browser.
2. Point to the node URL, such as http://reza.zko.dec.com:7902, where the 7902 points it to
the Compaq Analyze director.
Compaq Analyze operates basically the same from a browser as it does from the interface that
comes with the tool. The benefit of using Compaq Analyze with a browser is that you have the
additional functionality of your standard browser, including Print and Save functions.
For more information about Compaq Analyze, see the lecture/lab course (EY-Z487E-LO-0001)
available on the Learning Utility.

Compaq Analyze Error Report


A full analysis of the error can be displayed by double-clicking on the Problem Found: hot
spot on the active screen. This analysis provides details about the error, including a full
description, the most likely FRU, and supporting evidence.
Evidence provided depends upon the type of error detected (machine check code). The Evidence
section of the Compaq Analyze report provides information that leads the tool to identify the
failing FRU and its location. For more information, see the Regatta Platform Fault Management
Specification at: http://innov8.ogo.dec.com/ds20e/pfms.doc
Problem Found: System correctable DMA memory event threshold exceeded. at
Wed Aug 04 15:29:38 EDT 1999
Managed Entity:
System Entity: enugu - Compaq AlphaServer DS20
Event ID_Prefix: 42943 Event ID_Count: 54
Brief Description:
System correctable DMA memory event threshold exceeded.
Callout ID:
000C010000078005
Severity:
2
Reporting Node:
enugu
Full Description:
A System correctable direct mapped DMA ECC Memory event at system address
x013570000 has been diagnosed. This is a proactive maintenance indictment
is due to exceeding a single-bit ECC threshold. The system still
continues to normally operate, however, it is recommended that a service
call be scheduled within the near future. This will avoid unexpected
system unavailability. This System event requires replacement of the
Array 0 Set 0 DIMM 1 field replaceable unit. This FRU is physically
located in the system motherboard interconnect slot J11.
FRU List:
Probability: High
Manufacturer: Compaq
Device Type: 128Mb Memory DIMM
Physical Location: Slot J11
FRU Part Number: 54-24941-EA
FRU Serial Number: Fru SN not available
FRU Firmware Rev: Fru FW Rev not available
Evidence:
Entry Errlog: SMM_1920 SysType_34 OS_Type_1 Entry_Type_620 Entry_Type_Ana
Entry_Type_No_Disp Mchk_Error_Code_516
Event_Header_Common_Fields_V2_0
OS_Type: 1
Logging_CPU: 0
DSR_Msg_Num: 1920
Unique_ID_Count: 54
Unique_ID_Prefix: 42943
Event_Header_UNIX_Specific_Fields_V2_0
TLV_Processing_Support
TLV_Time_as_Local: Thu, 27 May 1999 10:47:00 -0400
TLV_Computer_Name: enugu
SMM_Decode_Support
Member_ID: 6
Entry_Type_Support
Entry_Type: 620
CPU_EV6_Corr_Regs_V1
Mchk_Error_Code: x00000204
I_STAT: x0000000000000000
DC_STAT: x0000000000000000
C_ADDR: x0000000000000000
C_SYNDROME_1: x0000000000000000
C_SYNDROME_0: x0000000000000000
C_STAT: x0000000000000000
MM_STAT: x0000000000000000
Systype34_Sys_Regs_V1
Cchip_DIR: x4000000000000000
Cchip_MISC: x0000000100000020
P0_Perror: xDC10135700000800
P1_Perror: x0000000000000000
Subpacket_Support
Subpacket_Header_Support
Subpkt_C12_T07_V1
AAR_0: x0000000000006005
AAR_1: x0000000000000000
AAR_2: x0000000000000000
AAR_3: x0000000000000000
Subpacket_Header_Support
Trailer_Frame_Support
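
When comparing several reports, it can help to turn the register lines of the Evidence section into numbers. The sketch below is a minimal Python example, not part of Compaq Analyze: it assumes only that register lines have the form "Name: xHEX" as in the sample report, and the helper name parse_registers is hypothetical. The last check is an observation about this one sample, not a documented register layout; see the PERROR register description in the Error Registers chapter for the actual field definitions.

```python
import re

# A few lines copied from the Evidence section of the sample report above.
EVIDENCE = """\
Mchk_Error_Code: x00000204
Cchip_DIR: x4000000000000000
Cchip_MISC: x0000000100000020
P0_Perror: xDC10135700000800
P1_Perror: x0000000000000000
"""

# Matches lines of the form "Name: xHEXDIGITS".
REGISTER_LINE = re.compile(r"^\s*(\w+):\s*x([0-9A-Fa-f]+)\s*$")


def parse_registers(text: str) -> dict:
    """Return {register name: integer value} for every 'Name: xHEX' line."""
    registers = {}
    for line in text.splitlines():
        match = REGISTER_LINE.match(line)
        if match:
            registers[match.group(1)] = int(match.group(2), 16)
    return registers


regs = parse_registers(EVIDENCE)

# x00000204 in hexadecimal is 516 in decimal, which matches the
# Mchk_Error_Code_516 tag in the Entry Errlog line of the report.
print(regs["Mchk_Error_Code"])

# Assumption, checked against this sample only: shifting P0_Perror right
# by 16 bits and keeping 32 bits gives the system address x013570000
# quoted in the Full Description.
print(hex((regs["P0_Perror"] >> 16) & 0xFFFFFFFF))
```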


Chapter 9

Removal and Replacement Procedures

Introduction
As a service engineer, you often need to add components, upgrade the system, or remove and
replace faulty FRUs to restore the system to error-free operation. Becoming familiar with the
procedures for FRU removal and replacement can minimize the time you spend upgrading or
repairing the system.
In addition to precautions to follow before beginning any procedure, this chapter includes the
removal and replacement procedures for all major system FRUs (listed below) as well as
information on system cabling.

Side cover

Operator control panel

PCI/ISA options

Storage subsystem

Removable media drive bay

CPU guide bracket

CPU daughter card

DIMMs

System board

Fans

Battery

Power supply

Speaker

Side cover interlock

Power supply backplane

Server features module V2 (SFM2)


FRU Part Numbers


SN-PBXGK-BB  400712-B21  ELSA GLORIA SYNERGY 8MB GFX
SN-KZPCA-AA  136237-B21  1 CHANNEL WIDE ULTRA-2 ADAPTER
3X-DE602-AA  134718-B21  PCI-TO-DUAL PORT FAST ETHERNET "TX" ADAPTER BASE MODULE
74-60528-01  BUTTON OCP
74-60336-01  BEZEL OCP
74-60247-05  COVER, TOP, RACK
74-53914-01  BUTTON OCP CUSTOM
70-40255-01  BEZEL RACK ASSY
70-40254-01  BEZEL PEDESTAL ASSY
70-40168-01  BEZEL ASSY RACK
70-40142-01  FAN ASSEMBLY
70-40136-01  ASM BLANK 1.6"
70-31349-01  SPEAKER ASSEMBLY (WIDETOWER)
54-30358-01  SERVER FEATURES MODULE V2
54-25657-01  MODULAR FOUR DRIVE LVD/SE SCSI BACKPLANE SUPPORTING 1.6"SC
54-25651-01  OPERATOR CONTROL PANEL
54-25649-01  PADDLE MODULE
54-25647-01  POWER BACKPLANE
54-24941-HA  256MB 200 PIN 3.3V 128M-BIT SDRAM(32MX4) SINGLE DENSITY
54-24941-FA  256 MB 200 PIN 3.3V 64M-BIT SDRAM(16MX4) DOUBLE DENSITY
54-24941-EA  128 MB 200 PIN 3.3V 64M-BIT SDRAM (16MX4) SINGLE DENSITY
54-25053-BA  64 MB 200 PIN DIMM 100 MHz
54-24758-35  DP264 500MHZ 4MB IBM L2/HAN (P2.5 IBM)
54-24758-33  DP264 500MHZ 4MB L2 CACHE/HAN (P2.5 Motor)
54-30060-01  667 MHZ 8 MB DDR EV67 CPU
17-03970-04  CABLE ASSY FLAT 34 COND W/2 POL IDC FEM SKTS (Floppy)
17-03971-09  CABLE ASSY,FLAT,10COND,(2)POL RCPT CONNS
17-04867-01  CABLE ASSY,FLAT,68CND,BUNDLED,(3)MALE MICRO D CONNS (LVD Ultra)
17-04678-02  CABLE ASSY,40COND, BUNDLED,(2)40 POS. RECPT,28AWG
17-04901-01  HARNESS ASSY, 24COND 18AWG (2)LOCKING RCPT CONN (Power)
17-04902-01  HARNESS ASSY,16COND,18AWG,(2)LOCKING RCPT CONN
17-04903-01  HARNESS ASSY,18COND,18AWG,(2)LOCKING RCPT CONN
17-04904-01  HARNESS ASSY, 20COND, 18AWG, (1)20POS, (1) 24 POS RCPT CONN
17-04905-01  HARNESS ASSY,3COND,22AWG,(1)4POS,(1)3POS RCPT CONN

17-04907-01  HARNESS ASSY,8COND,18AWG,(2) POL RCPT CONNS (3.5V sense)
17-04908-01  HARNESS ASSY,14COND,18/20AWG,(4) POL RCPT CONNS
17-04909-01  CABLE ASSY,FLAT 20COND,28AWG,(2) POL RCPT CONNS
17-04912-01  HARNESS ASSY,3 POS CONN (2) RING TERM (3) QUICK TERM
17-04917-01  HARNESS ASSY, 2COND,24AWG,6POS. CONN., MAG. SENSOR
30-50662-01  POWER SUPPLY,375 WATTS,PFC,WITH I SQUARED C OPTIONS
30-50871-01  ENCLOSURE ASSY
30-50827-01  ENCLOSURE, ASSY,SYSTEM BOX, RACK MOUNT
30-50871-02  PEDESTAL KIT
30-50827-02  PEDESTAL KIT
30-50871-04  RACK MOUNT KIT
30-50827-04  RACK MOUNT KIT, MARQUEE
30-50827-05  CD ROM / FLOPPY ASSY (BLUE)
30-50835-01  KIT,FAN ASSEMBLY,FOR POWER SUPPLY (Blank for 30-50662-01)
30-50846-01  CARD, ETHERNET AND VIDEO COMBO
54-24756-03  ALPHADP264 PLANAR; 21271 CHIPSET W/MEMORY & I/O EXPANSION (MLB)
For the most up-to-date list of part numbers, consult the DS20E Spare Kits.

Precautions
Before beginning FRU removal and replacement procedures, be sure to:

Put on your electrostatic discharge (ESD) wrist strap to avoid damaging any circuitry.

Record system configuration information.


NOTE: Configuration information stored in the flash ROM and TOY clock on the system
board is lost when the system board is replaced or if the battery is replaced. If possible,
use the SRM console commands, show config and show *, to display the configuration
information, and record that information on paper.


Side Cover
To remove the side cover from the system:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Open the front bezel.
5. Loosen the screw that secures the side cover.
6. Slide the side cover back, and then out and away from the enclosure.

Reverse steps 1 through 6 to replace the side cover.


Operator Control Panel


To remove the OCP:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Unplug the OCP connector from the OCP board.
6. Pull the OCP board away from the chassis.
7. Press down on the top release of the OCP and then pull it away from the chassis.
Reverse steps 1 through 7 to replace the OCP.


PCI/ISA Options
To remove a PCI/ISA option card:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Unplug all external cables connected to the PCI/ISA option card.
6. Remove the screw securing the PCI/ISA option card to the rear of the chassis.
7. Gently pull the PCI/ISA option card out of the system board socket.
Reverse steps 1 through 7 to replace the PCI/ISA option card.


Storage Subsystem
To remove a storage subsystem:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Remove all hot plug hard drives.
6. Remove the four screws securing the cage to the enclosure.
7. Loosen the two captive screws on the power supply FCC door and swing the door open.
8. Slide the storage subsystem forward to gain access to the back of the subsystem.
9. Unplug the cables from the back of the storage subsystem.
10. Remove the four screws securing the storage subsystem to the chassis.
11. Pull the storage subsystem out and away from the chassis.
12. If necessary, remove the six screws securing the storage subsystem backplane to the
storage subsystem.
13. Pull the storage subsystem backplane away from the storage subsystem.
Reverse steps 1 through 11 to replace the storage subsystem. If you removed the storage
subsystem backplane (steps 12 and 13), reinstall it first.


Removable Media Drive Bay


To remove the media drive bay:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Unplug the combination CD-ROM/diskette drive from the removable media drive bay.
6. Remove the screws securing the removable media drive bay to the chassis.
7. Slide the removable media drive bay forward and out of the chassis.
8. If necessary, remove the screws securing the backplane to the removable media drive bay.
Reverse steps 1 through 8 to replace the removable media drive bay.


CPU Daughter Card


To remove the CPU daughter card:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Unplug all cables from the CPU daughter card.
6. Loosen the captive screws securing the CPU daughter card to the chassis.
7. Carefully pull the CPU daughter card out and away from the system board.
Reverse steps 1 through 7 to replace the CPU daughter card.


CPU Guide Brackets


To remove the CPU guide brackets:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Remove the four screws securing the CPU guide brackets to the chassis.
6. Remove the CPU guide brackets.
Reverse steps 1 through 6 to replace the CPU guide brackets.


System Board
Removing the system board requires the removal of other FRUs. Review the removal
procedures for the items listed in steps 1 through 11 before you begin.
1. Record the configuration information.
2. Shut down the system and all peripheral devices.
3. Turn the system power off.
4. Unplug the power cord.
5. Remove the side cover.
6. Remove all PCI/ISA options.
7. Remove the CPU daughter cards.
8. Remove the CPU guide brackets.
9. Remove the CPU card guides.
10. Unplug all cables connected to the system board.
11. Remove the storage subsystem.
12. Pull all power cables out of the system board compartment.
13. Remove the screws securing the system board to the chassis.
14. Tip the system board to allow the serial and parallel port connectors to clear the opening.
15. Tip the board from the PCI end, and carefully slide it out and away from the chassis.
Reverse steps 1 through 15 to replace the system board.


DIMMs
To remove DIMMs, you may need to remove a CPU daughter card.
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Remove the CPU daughter card, if necessary.
6. To release a DIMM from the system board, press down on the two latches (one at each
side of the DIMM).
7. Gently pull the DIMM out of the system board socket.
To install a replacement DIMM:
1. Orient the key notches, and then insert the DIMM straight into the socket.
2. Press down firmly until both retaining levers engage the DIMM.
3. Replace the side cover and power cord.
4. Turn on the power to the system.
NOTE: Follow the Memory Configuration Rules when installing DIMMs.


Battery
CAUTION: Take care not to bend the battery hold-down spring when removing or
replacing the battery. A bent spring could result in intermittent system problems due to
poor contact with the battery.

1. Record the configuration information.
2. Shut down the system and all peripheral devices.
3. Turn the system power off.
4. Unplug the power cord.
5. Remove the side cover.
6. Gently pull out on the tab and then hold it open to release the system battery from the
chassis.
7. Carefully slide the battery out and away from its holder.
8. Release the tab.

Reverse steps 1 through 8 to replace the system battery.


WARNING: Discard used batteries according to the manufacturer's instructions.
Follow any country, state, or local statutes for proper battery disposal.
If the battery is incorrectly replaced, there is a danger of explosion. Replace the
battery only with the same or equivalent type recommended by the
manufacturer (Rayovac BR2032, 12-41476-06).


Fans
To remove a fan:
1. Loosen the captive screws securing the system fan to the rear of the chassis.
2. Pull the system fan out to unplug it from its power socket and pull it away from the
chassis.
Reverse steps 1 and 2 to replace the fan.


Speaker
To remove the speaker:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Unplug the speaker cable from the system board.
6. Gently route the speaker cable up and through the interior of the chassis.
7. With the speaker cable free, slide the speaker toward the front of the system and then pull
it away from the chassis.
Reverse steps 1 through 7 to replace the speaker.


Power Supply
To remove a power supply:
1. For a dual power supply configuration, complete the preparation procedures. If you have
an N+1 power configuration (three power supplies), you do not need to turn off the power
for a hot plug power supply replacement.
2. Loosen the thumbscrews securing the power supply grid, and remove the grid.
3. Loosen the thumbscrew on the power supply handle, and then pull it down to release it
from the power supply backplane.
4. Using the handle, pull the power supply from the system.
Reverse steps 1 through 4 to replace the power supply.


Power Supply Backplane


To remove the power supply backplane:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Remove the power supplies.
6. Unplug all cables from the power supply backplane.
7. Remove the four screws securing the power supply backplane to the chassis.
8. Disconnect the A/C harness from the rear of the power supply backplane.
9. Slide the power supply backplane out and away from the chassis.
Reverse steps 1 through 9 to replace the power supply backplane.


Side Cover Interlock


To remove the side cover interlock:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Unplug the side cover interlock from the system board.
6. Remove the screw securing the side cover interlock to the chassis.
7. Pull the side cover interlock out and away from the chassis.

Reverse steps 1 through 7 to replace the side cover interlock.


Server Features Module (SFM2)


To remove the server features module:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Remove all cables from the server features module.
6. Gently pry the server features module off the snap standoffs.

Reverse steps 1 through 6 to replace the server features module.


NOTE: After replacing the SFM2, power up the system and use the following
examine command to check that the system type is set correctly. See the
Troubleshooting chapter for instructions on resetting the system type.

P00>>> e -b iic_rcm_nvram0:11


System Cabling
Data and power cables for the DS20E system include those attached to SCSI devices, IDE
devices, the floppy drive, the power supply, the front cover, and the server features module.

Data and Signal Cabling Information


Cable Description           2-5-2 Part Number  Source     Destination
SFM2 to MLB                 17-04909-01        MLB J37    SFM2 J5
Speaker Cable                                  MLB J39    Speaker
SFM2 to OCP                 17-03971-01        SFM2 J3    OCP J1
Side Cover Interlock Cable  17-04971-01        Interlock  SFM2 J2
IDE                         17-04678-02        MLB J45    Removable Media Drive J1
Floppy                      17-03970-04        MLB J43    Removable Media Drive J3

Power Cabling Information


The power supply provides six regulated outputs to the system:
+3.3V, +5.0V, +12.0V, -5.0V, -12.0V, and +5VAUX

PS Cable Connector  Destination               2-5-2 Part Number  Description
J3 P1               MLB J3 P2                 17-04901-01        3V Power
J5 P3               MLB J33 P4                17-04902-01        5V Power
J1 P5               CPU0 J1 P6                17-04903-01        CPU0 Power
J2 P5               CPU1 J1 P6                17-04903-01        CPU1 Power
J8 P7               SFM2 J1 P8                17-04904-01        SFM2 Power
J4 P9               MLB J4 P10                17-04907-01        3/5V Sense
J7 P13              Storage Subsystem         17-04908-01        Storage Power
                    Removable Media Drive J2                     RMD Power
                    SCSI Drive Board J1 P11                      SCSI Power
SFM2 J6, J7         J6 - Fan0, J7 - Fan1      17-04905-01        Fan Power


Chapter 10

Compaq Insight Manager

Introduction
Compaq Insight Manager (CIM) is a comprehensive management tool used to monitor and
control the operation of Compaq Alpha-based servers.
Topics in this chapter are:

Overview

Functions of Compaq Insight Manager

Insight Manager Components

Characteristics of Compaq Insight Manager


Overview
Compaq Insight Manager consists of two components:

Windows-based console application

Server- or client-based management data collection agents

A Compaq Insight Manager XE environment consists of three major components:

Server running the Compaq Insight Manager XE application

SQL database

Browser console

Management Agents monitor over 1,000 management parameters, encompassing health,
configuration, and performance data. The agents act upon the data by initiating alarms when
faults occur and by providing updated management information, such as network interface or
storage subsystem performance statistics. Compaq Insight Manager monitoring and alerting
capabilities provide real control over the critical systems in your area.
Compaq Insight Manager XE can monitor and manage network devices that are running
Compaq Insight Agents. These agents provide real-time status information about the hardware
and software on each node. Insight Agents allow you to perform such functions as remote
shutdown and remote configuration.
Network devices that are unable to use Insight Agents do not report as many details. These
devices can be remotely monitored but not remotely managed.


Functions of Compaq Insight Manager


Compaq Insight Manager (CIM) is a comprehensive management tool that:

Manages Compaq servers, desktops, portables, and networking devices

Manages third-party SNMP devices and DMI

Performs discovery, identification, and fault management

Serves as a database repository for asset and status reporting

Offers cluster monitoring and administration (currently MSCS only)


Insight Manager Components


Compaq Insight Manager XE can monitor and manage network devices that are running
Compaq Insight Agents.
Three tasks are associated with managing events:

Notification

Control

Polling

The Agents consist of several sub-agents that report the health and status of various
sub-components of a managed device. If HTTP auto-discovery finds a web-enabled agent on the
network, Compaq Insight Manager XE can discover devices outside the specified IP range.

Management Tasks
Compaq Insight Manager XE can monitor and manage network devices that are running
Compaq Insight Agents. These agents provide real-time status information about the hardware
and software on each node. Insight Agents allow you to perform such functions as remote
shutdown and remote configuration.
Network devices that are unable to use Insight Agents do not report as many details. These
devices can be remotely monitored, but not remotely managed.

Security
When Compaq Insight Manager XE is first installed, an administrator account is created by
default with the password administrator. Change this password immediately on the accounts page
of the Administer Insight Manager XE menu.


Event Manager
Event management is accomplished by creating categories for logical groupings of devices. The
groups are polled to check their status. This information is stored in the SQL database.
Three tasks are associated with managing events:

Notification

Control

Polling

After these tasks are configured, event information is collected and stored, and notifications are
sent as needed. Any control task, such as launching an application to virus-check the system,
becomes automated. The administrator sets the frequency and level of the event thresholds. For
example, if a disk reaches 80% capacity, the administrator might want an alert generated so that
the disk can either be purged or taken off-line.
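
The 80% disk example can be pictured as a threshold check run on each polling pass. The sketch below is a minimal Python illustration of that idea only. It is not how Compaq Insight Manager XE implements event thresholds, and the names disk_alert and poll_disk are hypothetical.

```python
import shutil


def disk_alert(used_bytes: int, total_bytes: int, threshold_pct: float = 80.0):
    """Return an alert message if usage is at or above the threshold, else None."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_pct = 100.0 * used_bytes / total_bytes
    if used_pct >= threshold_pct:
        return f"ALERT: disk {used_pct:.1f}% full (threshold {threshold_pct:.0f}%)"
    return None


def poll_disk(path: str = "/", threshold_pct: float = 80.0):
    """One polling pass: read current usage for path and evaluate the threshold."""
    usage = shutil.disk_usage(path)
    return disk_alert(usage.used, usage.total, threshold_pct)


# Fixed figures keep the example independent of the local disk.
print(disk_alert(85, 100))
print(disk_alert(50, 100))
```

In this sketch the administrator's choices map to two parameters: how often poll_disk is called (the polling frequency) and threshold_pct (the event threshold level).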

Device Manager
Compaq Insight Manager XE can monitor and manage network devices that are running
Compaq Insight Agents. These agents provide real-time status information about the hardware
and software on each node. Insight Agents allow you to perform such functions as remote
shutdown and remote configuration.
Network devices that are unable to use Insight Agents do not report as many details. These
devices can be remotely monitored, but not remotely managed.

HTTP Server/UI Server


If HTTP auto-discovery is enabled, it is possible that Compaq Insight Manager XE will discover
devices outside the specified IP range due to an HTTP auto-discovery of a web-enabled agent
on the network. Therefore, if you delete a device from the Manage device list, disable HTTP
auto-discovery to prevent the device from being re-discovered.
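
To see which discovered devices fall outside the configured range, the addresses can be compared against the range bounds. The sketch below is a minimal Python illustration using the standard ipaddress module. The function names and the sample addresses (taken from documentation-only address blocks) are assumptions for illustration and have no connection to the Insight Manager XE software.

```python
import ipaddress


def outside_range(address: str, first: str, last: str) -> bool:
    """Return True if address falls outside the inclusive range first..last."""
    ip = ipaddress.ip_address(address)
    return not (ipaddress.ip_address(first) <= ip <= ipaddress.ip_address(last))


def unexpected_devices(discovered, first, last):
    """List the discovered addresses that lie outside the specified IP range."""
    return [addr for addr in discovered if outside_range(addr, first, last)]


# Hypothetical discovery result; only the first address is inside the range.
found = ["192.0.2.10", "192.0.2.200", "198.51.100.7"]
print(unexpected_devices(found, "192.0.2.1", "192.0.2.100"))
```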

SQL 6.5 Server Requirements


Component            Requirement
Hardware             All hardware must be on the Microsoft hardware compatibility list
System Memory        96MB RAM with SQL Server running on a remote system
Disk Space           25MB for the master SQL database
                     100MB for the Insight Device database
                     200MB for the Insight Device log
Operating System     Microsoft Windows NT Server 4.0 with Service Pack 3 or later
                     Compaq SSD 2.08 for Windows NT 4.0
Installed Services   TCP/IP, SNMP, IPX installed for management of IPX devices
Internet Browser     Internet Explorer 4.01 with Service Pack 1 or later
Relational Database  Microsoft SQL Server 6.5 with Service Pack 4 or SQL 7.0
                     (can be installed on a separate server)
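
The three disk space figures add up to 325MB. The sketch below is a minimal Python illustration of that arithmetic, assuming all three items are placed on the same volume, which the requirements list does not state. The names used are hypothetical.

```python
# Disk space figures (in MB) from the requirements listed above.
DISK_REQUIREMENTS_MB = {
    "master SQL database": 25,
    "Insight Device database": 100,
    "Insight Device log": 200,
}


def total_required_mb(requirements=DISK_REQUIREMENTS_MB) -> int:
    """Sum the listed disk space requirements."""
    return sum(requirements.values())


def has_enough_space(free_mb: int) -> bool:
    """Check a free-space figure against the combined requirement."""
    return free_mb >= total_required_mb()


print(total_required_mb())
print(has_enough_space(300))
```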


Compaq Web Enhanced Agents


The Insight Agents consist of several sub-agents that report the health and status of various
sub-components of a managed device. This agent information can be managed by using the
standard Compaq Insight Manager console product, which is based on SNMP.
Some of the management sub-components consist of the following:

Insight Host Agent

Insight Server Agent

Insight Storage Agent

Insight Agents Event Notifier

For More Information


To learn more about Compaq Insight Manager, see the Compaq Insight Manager XE Web-Based
Training at:
http://mcsww1.das.dec.com/mcsl_abs/0b0008ae800e260e.asp

Index

A
Accessories, 3-10
Addressing considerations, 4-9
AlphaBIOS
running from a serial terminal, 5-25
starting, 5-22
AlphaBIOS console, 5-2
AlphaServer DS20E system type, 6-30
AlphaStation DS20E system type, 6-30
Architecture, 2-2

B
Battery
removing, 9-14
Bcache, 2-3
Bcache configuration, 4-5
Bcache interface, 2-5
Beep codes, 6-18
Bezel, attaching, 3-25
boot command, 5-19
Boot problems, 6-8
Booting OpenVMS, 3-29
Buses, 2-6
Buttons, on OCP, 1-7

C
Cable management arm, installing, 3-23
Cables, dressing, 3-24
Cabling
data and signal, 9-21
power, 9-21
system, 9-21
cache test, 6-17
cat command, 5-11
cat el command, 6-26
C-chip, 2-3, 2-8
Clearance, system, 3-3
Clock interface, 2-5
Common OS header, 8-6
Compaq Analyze, 8-18

components, 8-20
error report, 8-22
interface, 8-21
operation, 8-19
using with browser, 8-22
Compaq Insight Manager, 10-2
components, 10-4
functions, 10-3
SQL requirements, 10-5
web-enhanced agents, 10-6
Components, 1-4, 1-17
Configuration tracking, 2-19
Configuration Utility
running, 5-26
Connecting the system, 3-4
Conventions
keyboard, 5-24
Corrected error frame, 8-16
CPU data, 8-16
header, 8-16
system data, 8-17
CPU
features, 2-4
subsystem, 2-5
CPU daughter card
removing, 9-10
CPU fans, 2-17
CPU guide bracket
removing, 9-11
CPU modules, 1-16
CPU speed switch settings, 4-6
CPU SW1, 4-5
CPU SW2, 4-7
CPU to PCI address translation, 4-11
CPU voltage settings, 4-7
CPU, upgrading to EV67, 4-25
crash command, 5-21
Crash dump, 5-21
Crash dumps, 8-2
Cross-bar switch, 2-7

D
D-chips, 2-3, 2-7
DEC VET, 8-3

deposit command, 5-12
Device driver
updates, 5-2
Dimensions, system, 3-3
DIMM configuration rules, 2-9
DIMMs, 2-3
qualified, 2-9
removing, 9-13
DIMMs, qualified, 4-8
Direct mapping, 4-15
DMA configuration, 4-23
Documentation, rackmount, 3-8
Doors, 1-16
DS20E, compared to DS20, 1-20

E
edit command, 5-12, 5-17
Electrical specifications, 1-19
Environment variables
verifying, 3-28
Environment variables list, 5-8
Environmental error logout frames, 8-15
Environmental logic, 2-14
Environmental specifications, 1-20
Error classes, 8-7
Error logout frame, 8-6
Error state logging, 2-19
examine command, 5-12
Extended error log block, 8-7

F
Fail-safe booter utility, 6-13
Failure register, 7-13
Failures reported by OS, 6-10
Failures reported on console, 6-7
Fan fault interrupt, 4-22
Fans
LEDs on SFM2 module, 2-15
removing, 9-15
system, 2-16
Fault shutdown, 2-17
Faults, isolating, 6-2
Firmware
location, 5-2
troubleshooting with, 6-24
updates, 5-2
updating, 3-7
Firmware configuration, 4-23
Firmware version, verifying, 3-27
Flash bypass settings, 4-7
Flash ROM test, 6-18
Flash select settings, 4-7
FRUs
part numbers, 9-2
precautions, 9-3
removal and replacement, 9-1
Function register, 7-14

G
Graphics options, 4-17

H
halt command (RMC), 5-29
Halt interrupt, 4-21
haltin command (RMC), 5-29
haltout command (RMC), 5-30
Hang, at power-on, 6-10
help or ? command, 5-21
help or ? command (RMC), 5-30

I
I/O, 1-3
I/O subsystem, 2-10
I2C bus, 2-18
init command, 6-24, 6-25
Initializing system, 5-10
Installation
checklist, 3-2
verifying, 3-5
Interlock, 1-11
installing, 3-21
removing, 9-19
Interrupt configuration, 4-20
ISA bus, 4-18
ISA data path test, 6-18
ISA interface, 2-11
option slot, 2-12
super I/O chip, 2-11
ISA interrupt assignments, 4-22
ISA restrictions, 4-18

K
kill command, 6-29
kill_diags command, 6-29

L
LED codes, POST, 6-22
LEDs
CPU, 6-21
front panel, 6-19
on OCP, 1-7
server features module, 6-22
Linux
boot example, 3-32
booting, 3-31
installing, 3-31
Locking the system, 3-7
Logout frame field descriptions, 8-14
Logout frames, locating, 8-13
Loopback tests
commands for running, 6-29
ls command, 5-11


M
Machine check logout frame, 8-10
CPU data, 8-11
header, 8-10
system data, 8-12
Machine checks, 8-5
correctable, 8-8
uncorrectable, 8-9
Maintenance bus, 2-18
Mechanical specifications, 1-18
Memory
configuration, 2-9
subsystem, 2-9
upgrading, 4-25
Memory, 1-3
Memory configuration rules, 4-8
Memory configurations, 2-10, 4-8
Memory DIMMs, 2-3
Memory problems, 6-10
Memory test, 6-17
more command, 5-11
more el command, 6-24, 6-26
Mounting brackets, attaching, 3-12
Mounting hardware, 3-11

N
Nvram script, editing, 5-17

O
OCP, 1-6
removing, 9-5
OCP display, 2-19
Online resources, 6-30
OpenVMS
booting from InfoServer, 3-30
booting from local CD, 3-29
installing, 3-29
shutting down, 3-6
Operating system
shutting down, 3-6
Operator control panel. See OCP
Option cards
removing, 9-6
Options, obtaining, 4-25
OS machine check handler, 8-5

P
Packaging, 1-3
PAL, functions, 2-16
PALcode, error checking, 8-5
P-chip, 2-8
P-chips, 2-3
PCI and ISA configuration, 4-17
PCI assignment tables, 4-18
PCI bus problems, 6-11
PCI data path test, 6-18

PCI DMA translation, 4-14


PCI interface, 2-10
enhanced IDE, 2-11
interrupt controller, 2-11
ISA bridge, 2-10
keyboard/mouse, 2-10
power management support, 2-11
real-time clock, 2-10
PCI options, adding, 4-26
PCI parity error, 6-11
PCI restrictions, 4-18
PCI slot numbering, 4-17
PCI slots, 1-15
PCI space 0, 4-12
PCI space 1, 4-14
Positioning system, 3-3
POST, 3-5
Power control logic, 2-19
Power problems, 6-4
Power sequence, 6-12
Power supplies, 1-10
LEDs on SFM2 module, 2-15
Power supply
backplane removal, 9-18
removing, 9-17
poweroff command (RMC), 5-30
poweron command (RMC), 5-30
Power-on self-test, 6-17
Power-up script, 5-17
Problem categories, 6-4
Problems accessing console mode, 6-6

Q
quit command (RMC), 5-31

R
Rackmount
accessories, 3-10
documentation, 3-8
installation area, 3-9
Real-time clock interrupt, 4-21
Rear panel connections, 1-8
Registers
Cbox Read, 7-7
DC_STAT, 7-6
DIRn, 7-10
failure, 7-13
function, 7-14
I_STAT, 7-2
MISC, 7-8
MM_STAT, 7-5
PERROR, 7-11
Remote management console. See RMC
Removable media, 1-12
Removable media drive bay
removing, 9-9
reset command (RMC), 5-31
Resetting RMC to defaults, 5-35


RMC, 5-28
resetting to defaults, 5-35
setting up, 5-28
troubleshooting, 5-36
RMC commands, 5-29
RMC microprocessor, 2-17
RMC switchpack, 5-33
changing a setting, 5-34

S
Scatter/gather mapping, 4-15
Scripts, 6-28
commands for running, 6-29
SCSI cable length, 4-16
SCSI configuration, 4-16
SCSI controllers, 4-18
SCSI IDs, 4-16
SCSI problems, 6-12
SCSI termination, 4-16
SCU-SCSI CAM utility, 8-2
Server features module, 1-13
removing, 9-20
set command, 5-7
setesc command (RMC), 5-31
SFM2, 1-13
30-second shutdown, 2-17
CPU fans sense logic, 2-17
inverter, 2-16
logic, 2-14
power supplies, 2-15
RMC microprocessor, 2-17
status LEDs, 2-15
system fans sense logic, 2-16
temperature sensor, 2-17
SFM2 PAL, 2-16
show command, 5-7
show config command, 5-4
show cpu command, 5-5
show device command, 5-6
show hwrpb command, 8-13
show memory command, 5-6
show pal command, 5-6
show power command, 5-7
show version command, 5-7
show_status command, 6-27
showit command, 6-27
Shutting down, 3-6
Side cover, removing, 9-4
Slide brackets
attaching to rails, 3-16
attaching to slides, 3-14
Speaker, 1-16
removing, 9-16
SRM commands
for configuring system, 4-24
SRM console, 5-2
invoking, 5-3
startup sequence, 5-3
SROM flash select, 4-6

SROM interface, 2-5


Stabilizer foot, 3-18
Startup and boot defaults, changing, 3-28
status command (RMC), 5-32
Storage subsystem, 1-11
removing, 9-7
Super I/O chip, 2-11
Switchpacks, 4-2
System
attaching to rack, 3-20
clearance, 3-3
connecting, 3-4
dimensions, 3-3
installing into rack, 3-19
locking, 3-7
shutting down, 3-6
System block diagram, 2-2
System board, 1-14
removing, 9-12
System board SW2, 4-2
System board SW3, 4-4
System building blocks, 4-1
System configuration, displaying, 5-4
System description, 1-2
System interface, 2-6
System overview, 1-1
System, positioning, 3-3

System type, changing, 6-30

T
Temperature
LEDs on SFM2 module, 2-15
Temperature threshold, 2-17
Termination block, 8-7
test command, 6-24, 6-25
Tests, terminating, 6-29
Thermal problems, 6-10
Third-party devices, adding, 4-26
TIG bus interrupt assignments, 4-21
TIG interface, 2-12
CSR registers and switchpack, 2-13
flash ROM, 2-13
IRQs, 2-13
TIG interrupt processing, 4-22
Tools and utilities, 6-13
Troubleshooting
considerations, 6-2
strategy, 6-3
Troubleshooting RMC, 5-36
Tru64 UNIX
booting, 3-27
shutting down, 3-6
starting installation, 3-26

U
Updating device drivers, 4-26
Updating firmware, 3-7, 4-26
Upgrading CPU to EV67, 4-25
Upgrading memory, 4-25

V
VGA option configuration, 4-17
