AlphaStation DS20E
Service Guide
Order Number: EK-K8F6W-SV. A01
Notice
First Printing, February 2000
© 2000 Compaq Computer Corporation.
COMPAQ, Compaq Insight Manager, the Compaq logo, and OpenVMS Registered in U.S. Patent and Trademark
Office. AlphaServer and Tru64 are trademarks of Compaq Information Technologies Group, L.P. in the United States
and/or other countries. Linux is a registered trademark of Linus Torvalds. UNIX is a registered trademark in the U.S.
and other countries, licensed exclusively through X/Open Company Ltd. All other product names mentioned herein
may be the trademarks or registered trademarks of their respective companies.
Compaq shall not be liable for technical or editorial errors or omissions contained herein. The information in this
publication is subject to change without notice.
FCC Notice: The equipment described in this manual generates, uses, and may emit radio frequency energy. The
equipment has been type tested and found to comply with the limits for a Class A digital device pursuant to Part 15 of
FCC rules, which are designed to provide reasonable protection against such radio frequency interference. Operation
of this equipment in a residential area may cause interference, in which case the user at his own expense will be required
to take whatever measures may be required to correct the interference. Any modifications to this device,
unless expressly approved by the manufacturer, can void the authority to operate this equipment under Part 15 of the
FCC rules.
Shielded Cables: If shielded cables have been supplied or specified, they must be used on the system in order to
maintain international regulatory compliance.
Warning! This is a Class A product. In a domestic environment this product may cause radio interference in which
case the user may be required to take adequate measures.
Achtung! Dieses ist ein Gerät der Funkstörgrenzwertklasse A. In Wohnbereichen können bei Betrieb dieses Gerätes
Rundfunkstörungen auftreten, in welchen Fällen der Benutzer für entsprechende Gegenmaßnahmen verantwortlich
ist.
Attention! Ceci est un produit de Classe A. Dans un environnement domestique, ce produit risque de créer des
interférences radioélectriques, il appartiendra alors à l'utilisateur de prendre les mesures spécifiques appropriées.
Contents
About This Guide
Intended Audience
Document Organization
DS20E Documentation
Symbols in Text
Rack Stability
Alpha Web Site
Chapter 1
System Overview
Introduction
Product Description
Product Packaging
Memory and I/O
System Components
  Operator Control Panel
  Rear Panel
  Power Supplies
  Storage Subsystem
  Side Cover Interlock
  Removable Media
  Server Features Module (SFM2)
  System Board
  PCI Options
  CPU Modules
  Speaker
  Doors
Standard Components and Features
Mechanical Specifications
Chapter 2
Technical Overview
System Block Diagram
  P-Chips
  C-Chip
  D-Chips
  Bcache
  Memory DIMMs
  CPU
CPU Subsystem
  Bcache Interface
  Clock Interface
  SROM Interface
  System Interface
Cross-Bar Switch
  D-chip (data slice)
  C-chip (controller chip)
  P-chip (peripheral interface chip)
Memory Subsystem
  Memory Configuration Rules
  Qualified DIMMs
  Typical Memory Configurations
I/O Subsystem
  PCI Interface
  ISA Interface
  Timing Interrupt and General (TIG) Interface
Environmental Logic
  SFM2 Status LEDs
  SFM2 Power Supplies
  SFM2 Inverter
  SFM2 PAL
  SFM2 System Fans Sense Logic
  SFM2 CPU Fans Sense Logic
  SFM2 30-Second Shutdown
  SFM2 Temperature Sensor
  SFM2 Remote Management Controller Microprocessor
Maintenance Bus (I²C Bus)
  Monitoring System Conditions
  Fault Display
  Error State
  Configuration Tracking
Chapter 3
System Installation
Introduction
Preparing for Installation
Positioning the System
Connecting the System
Verifying Hardware Installation
Shutting Down the System
  Shutting Down the Tru64 UNIX Operating System
  Shutting Down the OpenVMS Operating System
Updating the Firmware
Locking the System
Installing a Rackmount System
  Marking the Installation Area in the Rack
  Rack Accessories
  Preparing the System
  Preparing the Rack
  Attaching Slide Brackets to Rails
  Stabilizing the Rack
  Installing the System
  Installing U-Nuts
  Installing the Interlock System
  Installing the Cable Management Arm
  Dressing the Cables
  Attaching the Front Bezel
Starting a Tru64 UNIX Installation
Booting Tru64 UNIX
  Verifying the Firmware Version
  Changing Startup and Boot Defaults
  Ensuring that Environment Variables Match System Configuration
Installing OpenVMS
Booting OpenVMS
  Booting OpenVMS from the Local CD-ROM Drive
  Booting OpenVMS from an InfoServer
Installing Linux
Booting Linux
  Linux Boot Example
Chapter 4
System Configuration
Introduction
Base System Configuration
Switch Settings
  System Board SW2
  System Board SW3
  CPU SW1
Chapter 5
Firmware
Introduction
Firmware in the DS20E
  SRM Console
  AlphaBIOS Console
  Updating Firmware and Device Drivers
Using the SRM Console
  SRM Console Start Sequence
  Displaying System Configuration
  Showing and Setting Environment Variables
  Initializing the System
  Listing and Reading a File
  Editing Files
Chapter 6
Troubleshooting
Introduction
Basic Troubleshooting
  Considerations Before Troubleshooting
  Steps for Isolating Faults
  Troubleshooting Strategy
Problem Categories
  Power Problems
  No Access to Console Mode
  Console-Reported Failures
  Boot Problems
  Thermal Problems
  Operating System-Reported Failures
  Memory Problems
  PCI Bus Problems
  SCSI Problems
Power Up/Down Sequence
Troubleshooting Tools and Utilities
  Fail-Safe Booter (FSB) Utility
  Power-On Self-Test (POST)
  LEDs and Beep Codes
Using Firmware to Troubleshoot
  Using SRM Commands to Test the System
Changing the System Type
For More Information
Chapter 7
Error Registers
Introduction
Ibox Status Register
Memory Management Status Register
Dcache Status Register
Cbox Read Register
Miscellaneous Register
Device Interrupt Request Registers
P-Chip Error Register
Failure Register
Function Register
Chapter 8
OS Diagnostics Overview
Introduction
Tru64 UNIX Diagnostic Tools
DEC VET
Machine Checks
  Operating System
  Error Classes
  Error Types
  Machine Check Logout Frame (SCB 660 and 670)
Compaq Analyze
  Compaq Analyze Operation
  Compaq Analyze Analysis Components
  Compaq Analyze Interface
  Using Compaq Analyze with a Standard Browser
  Compaq Analyze Error Report
Chapter 9
Removal and Replacement Procedures
Introduction
FRU Part Numbers
Precautions
Side Cover
Operator Control Panel
PCI/ISA Options
Storage Subsystem
Removable Media Drive Bay
CPU Daughter Card
CPU Guide Brackets
System Board
DIMMs
Battery
Fans
Chapter 10
Compaq Insight Manager
Introduction
Overview
Functions of Compaq Insight Manager
Insight Manager Components
For More Information
Intended Audience
This manual is for service providers who are responsible for servicing Compaq
AlphaServer DS20E and AlphaStation DS20E systems.
Document Organization
This manual has ten chapters.
Chapter 1, System Overview, introduces the physical components of the
system.
Chapter 2, Technical Overview, describes the switch-based interconnect; the
system board logic; the CPU, memory, and I/O subsystems; the environmental
logic; and the maintenance bus.
Chapter 3, System Installation, explains how to install the system, how to
install the rackmount system, and how to boot an operating system.
Chapter 4, System Configuration, describes the base system configuration;
configuring CPUs, memory, and I/O; interrupt and DMA configuration; and
system options and upgrades.
Chapter 5, Firmware, covers the SRM firmware and the remote console
manager (RCM).
Chapter 6, Troubleshooting, presents troubleshooting steps as well as
troubleshooting with LEDs and beep codes and with the SRM console.
DS20E Documentation
Title
Order Number
ER-K8F6W-UA
ER-K8F6W-IM
ER-PD12U-UG
EK-DSCPU-IN
EK-MS340-IN
EK-DS20E-TP
Release Notes
EK-K8F6W-RN
Symbols in Text
These symbols are found in the text of this guide. They have the following
meanings.
WARNING: Text set off in this manner indicates that failure to
follow directions in the warning could result in bodily harm or loss
of life.
CAUTION: Text set off in this manner indicates that failure to
follow directions could result in damage to equipment or loss of
information.
Rack Stability
WARNING: To reduce the risk of personal injury or damage to the
equipment, be sure that:
Chapter 1
System Overview
Introduction
The Compaq DS20E is a low-end server that provides 64-bit performance for compute-intensive
applications. It is a departmental server with a design that supports expansion without
necessarily taking up additional floor space. It provides high performance, comprehensive
system management, high availability, and easy access for servicing.
The system ships with one processor, but can be upgraded to a dual-processor system. Its single
system board, also known as the main logic board (MLB), contains the I/O subsystem, including
the PCI/ISA slots and the cabling. The system also provides internal mounting for four disk
storage units (six units in the future) and an open removable media bay.
This chapter covers the following components:
Product Description
Product Packaging
System Components
Mechanical Specifications
Electrical Specifications
Environmental Specifications
Product Description
The DS20E system is a departmental class server that runs the Tru64 UNIX, OpenVMS, and
Linux operating systems. It can have up to two Alpha 21264 processors, the EV6 (500 MHz) or
EV67 (667 MHz).
DS20E memory can be increased up to 4 GB. The system uses a PCI and ISA bus architecture,
and network clustering technology. For system management, the DS20E includes Compaq
Insight Manager, a GUI-based tool for monitoring and controlling system operation. The DS20E
system is available in a pedestal or an industry-standard chassis for mounting in a rack. The side
cover can be removed to expose most system components for maintenance. The drives, fans,
and power supplies are hot-swap devices that can be replaced while the system is running.
Product Packaging
The system can be used as a deskside pedestal in the vertical position or, with the addition of
brackets, mounted in the horizontal position in a standard rack.
[Figure CAT0039: rackmount and pedestal configurations]
System Components
[Figure MR0300A: system components; callout numbers omitted]
Figure Legend
Component
CD-ROM
Storage subsystem
CPUs
System board
[Figure CAT0018; callout numbers omitted]
Figure Legend
Function                 Description
Power On/Standby
Reset button
Power Indicator LED
Fault LEDs
Halt button
Halt LED
Rear Panel
[Figure CAT0019A: rear panel; callout numbers omitted]
Figure Legend
Connector/Port               Description
Parallel port
Serial port (COM2)
Keyboard port                To PS/2-compatible keyboard.
Ethernet port
Mouse port                   To PS/2-compatible mouse.
Serial port (COM1)
AC power inlet               To power outlet.
SCSI breakouts
System fan 0                 System fan.
System fan 1                 System fan.
Universal Serial Bus (USB)   Not supported.
Power Supplies
The system comes standard with two 375-watt power supplies connected in parallel and
can accommodate a third power supply for redundancy. A power backplane integrates the three
supplies for power distribution, monitoring and control. All three supplies are removable and
accessible through the front of the enclosure. When a third redundant supply is present, the power
supplies can be replaced while the power is on.
[Figure CAT0043]
Storage Subsystem
The system comes with a storage subsystem that holds four 1.6-inch drives. Six 1-inch drives
will be supported in the future.
[Figure DVA00047a]
Removable Media
The removable media area contains the removable media bay, which accommodates one
5.25-inch, half-height tape device and a combination CD-ROM/FDD drive.
[Figure CAT0050A]
[Figure PK1216e]
System Board
All memory and I/O components are on a single system board that contains a memory
subsystem, PCI bus, ISA bus, integrated SCSI F/W/U I/O controllers, and option slots for PCI-based and ISA-based option modules.
[Figure CAT0030]
PCI Options
The system has six physical, 64-bit PCI slots, one of which is a combination PCI/ISA slot. The
callouts show the PCI slot numbering on the system board.
[Figure CAT0046: PCI slot numbering on the system board]
CPU Modules
The system supports up to two processor modules that can be installed on the system board.
Each processor module contains a 21264 microprocessor, either 500 MHz or 667 MHz. The
21264 microprocessor is a superscalar chip with out-of-order execution and speculative
execution to maximize speed and performance. It contains four integer execution units and
dedicated execution units for floating-point add, multiply, and divide. The chip also has an
integrated instruction cache and a data cache. Each cache consists of a 64 KB two-way set
associative, virtually addressed cache divided into 64-byte blocks. The data cache is a
physically tagged, write-back cache.
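The cache figures quoted above (64 KB, two-way set associative, 64-byte blocks) imply a particular number of sets and address-bit split. The following is a minimal Python sketch of that arithmetic only; the function and field names are illustrative and do not come from the 21264 documentation, and the code assumes all sizes are powers of two.

```python
# Cache parameters quoted in the text for each on-chip cache.
CACHE_BYTES = 64 * 1024   # 64 KB
WAYS = 2                  # two-way set associative
BLOCK_BYTES = 64          # 64-byte blocks


def cache_geometry(cache_bytes, ways, block_bytes):
    """Derive block count, set count, and address-bit widths.

    Assumes every argument is a power of two (illustrative helper,
    not part of any Compaq tool).
    """
    blocks = cache_bytes // block_bytes
    sets = blocks // ways
    # For a power of two n, n.bit_length() - 1 equals log2(n).
    offset_bits = block_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return {
        "blocks": blocks,
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
    }


geometry = cache_geometry(CACHE_BYTES, WAYS, BLOCK_BYTES)
print(geometry)
```

Under these assumptions each cache holds 1024 blocks arranged in 512 sets, so an address uses 6 bits to select a byte within a block and 9 bits to select a set.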
The EV6 500 MHz processor module contains 4 MB of secondary B-cache consisting of late-write
synchronous DRAMs (dynamic random access memory) that provide low latency and high
bandwidth. The EV67 667 MHz processor module has an 8 MB DDR (dual data rate) cache.
Speaker
An internal speaker produces audio output for error beep codes and other audible messages.
Doors
The pedestal has an upper door and a full door. The upper door swings open for access to the
OCP and media drives. The full door provides access to the storage subsystems and hot-swap
power supplies.
Description
Processor
The 21264 microprocessor is a superscalar, superpipelined implementation of the Alpha architecture and
runs at an optimized price:performance speed of 500 MHz and above. The chip contains a 64 KB
set-associative instruction cache (I-cache) and a 64 KB set-associative data cache (D-cache).
A 4 MB L2 backup cache (Bcache) supports each 500 MHz processor. The EV67 667 MHz processor
module has an 8 MB DDR (dual data rate) cache.
Memory
Expansion Slots
Diskette Drive
Internal Storage
Network Controller
None
Hard Drives
Interfaces
Power Supply
Operating Systems
Features
Description
Upgrades
Memory Architecture
PCI/ISA
System Architecture
Manageability
Security
Flexible Packaging
Hot-Plug Fans
Mechanical Specifications
Measurement   Pedestal                        Rackmount
Depth         66 cm (26 in.)                  66 cm (26 in.)
Width         22.15 cm (8.7 in.)              Standard EIA 310D (RETMA)
Height        44.6 cm (17.55 in.)             22.15 cm (8.7 in.)
Weight        27.3 to 40.9 kg (60 to 90 lb)   34 to 36.7 kg (75 to 81 lb)
Electrical Specifications
Power and Voltage
Maximum input power
System input power
Rated input current       7 A RMS       3 A RMS
Operating frequency       47 to 63 Hz   47 to 63 Hz
Maximum inrush current    75 A          75 A

DC Outputs
Output voltage   Range (min. to max.)   Maximum load   PARD
+3.3V            3.2 to 3.4             40 A           50 mV
+5.25 sV         4.80 to 5.25           42.5 A         50 mV
+12.0V           11.50 to 12.60         6 A            150 mV
-12.0V           -10.9 to -13.20        1.0 A          150 mV
+5.5 VSB         4.75 to 5.25           0.5 A          50 mV
Output voltage tolerances are total tolerance at the output connector or remote sense point, as
applicable. Total tolerance shall be the sum of periodic and random disturbances (PARD), peak response
voltage, and the root sum square of all static tolerances.
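The total-tolerance rule stated above can be written as a short calculation. The following Python sketch is illustrative only: the function name is made up for this example, and apart from the 50 mV PARD figure listed for the +3.3 V output, the input values are hypothetical placeholders rather than measured or specified numbers.

```python
import math


def total_tolerance_mv(pard_mv, peak_response_mv, static_tolerances_mv):
    """Total tolerance = PARD + peak response voltage
    + root sum square (RSS) of the static tolerances, all in millivolts."""
    rss = math.sqrt(sum(t * t for t in static_tolerances_mv))
    return pard_mv + peak_response_mv + rss


# 50 mV PARD is the value listed for the +3.3 V output.
# The peak response (20 mV) and static tolerances (18 mV, 24 mV)
# are hypothetical values chosen only to make the arithmetic easy to follow.
example_total = total_tolerance_mv(50.0, 20.0, [18.0, 24.0])
print(example_total)
```

With these placeholder inputs the RSS term is 30 mV, so the total comes to 100 mV. Note that static tolerances combine as a root sum square rather than a plain sum, which is why two static terms of 18 mV and 24 mV contribute 30 mV rather than 42 mV.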
Environmental Specifications
Dimension            Measurement/Weight
Temperature range
  Operating          10 to 35 °C (41 to 95 °F)
  Nonoperating       -40 to 66 °C (-40 to 151 °F)
Altitude
  Operating          2000 m (6,562 ft) maximum
  Nonoperating       3600 m (12,000 ft) maximum
Acoustics
  Idle               6.5 LWAd, B (0 or 1 x HDD); 6.9 (6 x HDD)
  Operating          6.5 LWAd, B (0 or 1 x HDD); 6.9 (6 x HDD)
DS20E
DS20
System Board
CPU
Memory
Configurations
CD and Floppy
OCP Display
LED display
Enclosure
Storage
Power Supplies
System Fans
Alphanumeric display.
Chapter 2
Technical Overview
The DS20E architecture features a switch-based interconnect system using a cross-bar switch
chipset that allows data to move directly from place to place in the system. Its single, large
system board, the main logic board (MLB), contains the DS20E subsystems. A separate
component, called the server features module V2 (SFM2), contains the environmental logic.
Topics in this chapter are:
CPU Subsystem
Cross-Bar Switch
Memory Subsystem
I/O Subsystem
Environmental Logic
P-Chips
The P-chips are the PCI interface chips of the Tsunami core logic chipset. There are two 33MHz 64-bit PCI implementation P-chips:
The P-chip has a cycle time of 10 ns for the system interface and a cycle time of 30 ns for the
PCI interface. It is able to run a 30 ns PCI bus with a 12 ns to 15 ns system interface. It has the
following interfaces:
D-chip port to the PADbus: 40 bits for 4 bytes of data plus check bits. In standard mode,
the P-chip receives 4 bytes of data and their 4 associated check bits each cycle (36 pins
used). To support a system with eight D-chips, the P-chip has an additional mode where it
receives 8 bytes over two cycles, but receives all 8 associated check bits in one cycle (40
pins used). Quadword-based transfers are always used because that is the unit on which
the ECC is calculated.
C-chip command and address port to the CAPbus: It takes two cycles (20 ns minimum)
to transfer a command and address in either direction.
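The cycle times quoted for the P-chip convert directly into clock frequencies. The short Python sketch below is only an illustration of that arithmetic; the function name `cycle_ns_to_mhz` is hypothetical and does not come from this guide.

```python
def cycle_ns_to_mhz(cycle_ns: float) -> float:
    """Convert a clock cycle time in nanoseconds to a frequency in MHz."""
    if cycle_ns <= 0:
        raise ValueError("cycle time must be positive")
    return 1000.0 / cycle_ns


# Cycle times quoted in the P-chip description.
for label, ns in [
    ("system interface, fastest", 10.0),
    ("system interface, 12 ns", 12.0),
    ("system interface, 15 ns", 15.0),
    ("PCI interface", 30.0),
]:
    print(f"{label}: {ns} ns -> {cycle_ns_to_mhz(ns):.1f} MHz")

# A CAPbus command/address transfer takes two system-interface cycles,
# which gives the 20 ns minimum quoted in the text.
capbus_transfer_ns = 2 * 10.0
```

A 30 ns PCI cycle corresponds to the 33 MHz PCI bus, and a 12 ns system cycle is close to the 83.3 MHz memory clock mentioned elsewhere in this chapter.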
C-Chip
The C-chip is the control chip of the Tsunami core logic chipset. It provides the interface to the
CPU, the main memory, and the I/O subsystem.
D-Chips
D-chips are the data chips of the cross-bar switch. The system has eight D-chips. They provide
the interface with the memory data bus, the SysData bus, and the I/O subsystem data bus.
Bcache
Each 500 MHz processor module contains 4 MB of secondary Bcache (backup cache). Each
667 MHz processor module contains 8 MB of DDR cache (dual data rate). The Bcache consists
of late-write synchronous dynamic random access memory (SDRAMs) that provide low latency
and high bandwidth.
Memory DIMMs
The DS20E supports up to four banks of memory on the system board. Each bank contains four
slots with a total of 16 slots on the system board. The system uses 200-pin, buffered,
synchronous, dual in-line memory modules (DIMMs).
CPU
The CPU is a 21264 Alpha chip with the following features:
Pipeline organization
- Out-of-order execution
- Quad integer execution
- Dual floating-point execution
Tournament predictor
64 KB I-cache 2-set
- Bcache interface. Supports external data and tag stores; data path 128 bits (16 bytes)
wide, 16 bits of ECC. Tag size is 14 bits of address, 4 bits of control (including parity
bits). The EV6 cache line is 64 bytes.
System interface
Clock interface
- Internal PLL
- Y-Divisor
SROM interface
CPU Subsystem
The CPU subsystem is a module that plugs into the main logic module and consists of the Alpha
processor and its board-level cache (Bcache). The DS20E system supports the EV6 and EV67
processors at various speeds. The EV6 and EV67 share a common pin interface to the system
and use the same 587-pin CPGA package.
The DS20E supports up to two processor modules. Each processor module contains a 21264
microprocessor. The 21264 microprocessor is a superscalar, superpipelined implementation of
the Alpha architecture with out-of-order execution and speculative execution to maximize speed
and performance. It runs at an optimized price/performance speed of 500 MHz or 667 MHz. It
contains four integer execution units and dedicated execution units for floating-point add,
multiply, and divide. The chip contains a 64 KB set-associative instruction cache (I-cache) and a
64 KB set-associative data cache (D-cache). Support for a larger L2 cache is provided by a private Bcache.
NOTE: If two CPUs are installed, the first CPU must be in CPU slot 0, and both
CPUs must have the same Alpha chip clock speed selected.
The EV6/EV67 chip has the following four interfaces to external logic:
Bcache Interface
This interface supports external data and tag stores. The data path is 128 bits (16 bytes) wide
with 16 bits of ECC (8 bits per 8 bytes). The tag size supported in the DS20E system has 14 bits
of address and 4 bits of control (including parity bits). The 22-bit index is used to address the
data store and tag store. The EV6/EV67 cache line is 64 bytes.
Clock Interface
The EV6/EV67 has an internal phase-locked loop (PLL) that it uses to generate all its internal
and external clocks. To drive the PLL, a clock signal (CLKIN) is provided by the system. The
system also multiplies CLKIN to produce the internal EV6/EV67 clock (GCLK). GCLK is used
for all internal clocking. It is also used to derive the clocks for the Bcache interface and the
system interface, both of which use forwarded clocks (BC_CLK and FWD_CLK).
SROM Interface
The SROM interface has two uses:
To read a serial bit stream of initialization information and code after power-up reset
System Interface
The system interface consists of the following:
SysData bus
Bidirectional
64 bits wide
8 bits of ECC
Cross-Bar Switch
The system switch has the following performance features:
Supports two 64-bit, 33-MHz PCI buses, each with its own PCI address space
Provides low-latency memory access (120 ns CPU access using 83-MHz DRAMs)
Two P-chip data ports to the P-chip and D-chip bus (PADbus)
Control from C-chip (CPM/PAD) (The D-chip receives all of its commands from the C-chip.)
D-chip port to the PAD bus: 40 bits for 4 bytes of data plus check bits.
C-chip command and address port to the CAP bus: It takes two cycles to transfer a
command and address in either direction.
Memory Subsystem
The system main memory is a synchronous DRAM-based memory with a maximum clock rate
of 83.33 MHz. The DRAMs are mounted on 200-pin JEDEC standard DIMMs. Each DIMM
supports a 72-bit data bus (64 bits of data and 8 checkbits). The checkbits provide single-bit
correction/double-bit detection across each 64 bits. The system supports 16 DIMMs arranged in
four arrays of four DIMMs each. Memory is organized on two 256-bit plus ECC-bit buses. Each
bus can support up to two memory banks (a memory option) made up of four DIMMs. Memory
can be configured from a minimum of 256 MB (1 MS340-CA) to 4 GB (4 MS340-EA).
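The 8 check bits per 64 data bits match the minimum needed for a Hamming-style single-error-correct, double-error-detect (SEC-DED) code. The guide does not name the code that is actually used, so the sketch below is only a check of the bit-count arithmetic under that assumption; the function names are hypothetical.

```python
def sec_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (Hamming single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits: int) -> int:
    """SEC check bits plus one overall parity bit for double-error detection."""
    return sec_check_bits(data_bits) + 1


data_bits = 64
check_bits = secded_check_bits(data_bits)   # check bits per 64 data bits
dimm_bus_width = data_bits + check_bits      # data bus width per DIMM
print(check_bits, dimm_bus_width)
```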
The two memory buses transfer data between the cross-bar switch and main memory. Each
DIMM bank provides 256 bits of data plus 32 ECC bits for the 32 bytes of data transferred. Two
modules in each bank provide the odd bytes of data and the other two modules provide the even
bytes of data.
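As a rough illustration, the bus width and clock rate given above set an upper bound on memory bandwidth. The sketch below assumes one 32-byte transfer per bus per clock, which the guide does not state; the result is a theoretical ceiling, not a measured figure.

```python
BUS_DATA_BITS = 256          # data bits per memory bus (ECC bits excluded)
MEMORY_CLOCK_HZ = 83.33e6    # maximum memory clock rate from the text
NUM_BUSES = 2                # two memory buses


def peak_bandwidth_bytes_per_s(data_bits: int, clock_hz: float, buses: int = 1) -> float:
    """Upper bound on bandwidth, assuming one full-width transfer per clock (assumption)."""
    return (data_bits // 8) * clock_hz * buses


per_bus = peak_bandwidth_bytes_per_s(BUS_DATA_BITS, MEMORY_CLOCK_HZ)
both_buses = peak_bandwidth_bytes_per_s(BUS_DATA_BITS, MEMORY_CLOCK_HZ, NUM_BUSES)
print(f"per bus: {per_bus / 1e9:.2f} GB/s, both buses: {both_buses / 1e9:.2f} GB/s")
```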
The interface to the main memory subsystem occurs by means of the cross-bar switch. Each
array has a unique address port from the cross-bar C-chip. A four-DIMM array provides a data
bus width of 256 bits (+ECC). The DS20E uses two four-DIMM arrays.
For the memory subsystem to work properly, you must follow configuration rules and use only
qualified DIMMs.
Other memory options must be the same size or smaller than the first memory option.
Qualified DIMMs
The following DIMMs are qualified; others may be added in the future.
Sales Part Number   Description
FR-MS340-CA         256MB ECC Memory DIMMs (4x64MB)
FR-MS340-DA         512MB ECC Memory DIMMs (4x128MB)
FR-MS340-EA         1GB ECC Memory DIMMs (4x256MB)
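The ordering rule and the qualified options can be checked mechanically. The sketch below covers only what is stated here: up to four arrays, qualified MS340 options, and no later option larger than the first. Any other configuration rules are not modeled, and the function name `total_memory_mb` is hypothetical.

```python
# Option sizes in MB, taken from the qualified DIMM table (each option is four DIMMs).
QUALIFIED_OPTION_MB = {"MS340-CA": 256, "MS340-DA": 512, "MS340-EA": 1024}
MAX_ARRAYS = 4


def total_memory_mb(options):
    """Validate a list of memory options (in array order) and return the total in MB.

    The first entry is the first memory option; later options must be the
    same size or smaller.
    """
    if not 1 <= len(options) <= MAX_ARRAYS:
        raise ValueError(f"expected 1 to {MAX_ARRAYS} memory options")
    sizes = []
    for name in options:
        if name not in QUALIFIED_OPTION_MB:
            raise ValueError(f"{name} is not a qualified option")
        sizes.append(QUALIFIED_OPTION_MB[name])
    if any(size > sizes[0] for size in sizes[1:]):
        raise ValueError("later options must not be larger than the first option")
    return sum(sizes)


print(total_memory_mb(["MS340-CA"]))        # minimum configuration
print(total_memory_mb(["MS340-EA"] * 4))    # maximum configuration
```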
Array 1
32MB x 4
32MB x 4
32MB x 4
64MB x 4
64MB x 4
128MB x 4
128MB x 4
256 MB x 4
256MB x 4
256MB x 4
32MB x 4
32MB x 4
64 MB x 4
64MB x 4
64MB x 4
128MB x 4
128MB x 4
128MB x 4
256MB x 4
256MB x 4
Array 2
Array 3
Total
256MB
512MB
640MB
512MB
64MB x 4 64MB x 4 1GB
1GB
128MB x 4 128MB x 4 2GB
1.5GB
2GB
256MB x 4 256MB x 4 4GB
32MB x 4
64MB x4
32MB x 4
I/O Subsystem
The DS20E I/O subsystem comprises the following I/O bus interfaces along with the Cypress
Bridge chip:
PCI
ISA
TIG
PCI Interface
This interface consists of two PCI buses, PCI-0 and PCI-1. Both are 64-bit buses with three PCI
slots each, but PCI-0 also connects to a Cypress chip and an Adaptec SCSI controller. The
Cypress bridge chip (also referred to as the CY82C693U or simply the 693U) is a multifunction
Type 0 device. The 693U is on the PCI-0 bus and implements the following important
functions:
ISA Bridge
This chip implements a bus that performs most of the functions of an ISA bus. The chip is used
primarily for legacy purposes and is not expected to be used for any new devices. The Cypress
bridge implementation is not Intel SIO compatible, but it includes direct memory access (DMA)
and an interrupt controller.
Keyboard/Mouse
These interfaces are ISA-based and 8242 compatible. An 8051 style microcontroller with a
built-in ROM is present. The DS20E system implements PS/2 style keyboard/mouse ports that
are transparent to existing drivers for such ports.
Real-Time Clock
The Dallas 1287A compatible real-time clock (RTC) interface that is implemented in the
Cypress bridge (693U) is transparent to existing drivers. A square wave output is provided by
the Cypress bridge and is routed to the Tsunami C-chip to generate an RTC interrupt to the
processor. Routing the RTC interrupt through ISA interrupt request priority 8 (IRQ8) can be
masked under program control.
Interrupt Controller
This dual-stage interrupt controller is program-compatible with the one used in the Intel SIO
bridge. This controller can accept 16 edge-triggered interrupt requests (IRQs) from the ISA
sources (including internal ISA sources) and four PCI level interrupts. (The PCI interrupt inputs
are not used in the DS20E system.) The Cypress bridge gathers all these interrupts and
compares them to a programmable mask to produce a level interrupt signal INTR to present to
the system interface.
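The mask comparison can be pictured as a bitwise operation over the 16 ISA request lines. The sketch below is a simplified model, not the Cypress register interface: it assumes bit n represents IRQn and that a mask bit of 1 blocks the request, which is the usual convention for this class of controller but is not stated in this guide.

```python
NUM_ISA_IRQS = 16


def intr_asserted(pending: int, mask: int) -> bool:
    """Return True if any unmasked IRQ is pending.

    pending: bit n set means IRQn is requesting service.
    mask:    bit n set means IRQn is blocked (assumed polarity).
    """
    all_irqs = (1 << NUM_ISA_IRQS) - 1
    return bool(pending & ~mask & all_irqs)


rtc_irq = 1 << 8                                 # IRQ8, the RTC request mentioned earlier
print(intr_asserted(rtc_irq, mask=0))            # request passes through
print(intr_asserted(rtc_irq, mask=rtc_irq))      # request is masked
```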
Enhanced IDE
This two-channel PCI-based enhanced IDE interface operates at up to 16.67 MB/s (Type 4
transfers) and implements bus mastership. The DS20E system uses only one of the two
channels, allowing a maximum of two IDE drives in the system.
ISA Interface
This interface provides a link to legacy ISA options. The Cypress PCI-ISA bridge provides the
interface between PCI-0 and this ISA bus. The ISA bus supports the following components:
Serial Ports: The Super I/O chip contains two 16550-compatible UARTs that provide 16-byte send/receive FIFOs. The maximum achievable baud rate on this port is 230 K. It has
a programmable baud rate generator and modem control circuitry. The port is fully
compliant with legacy ISA standards.
Multimode Parallel Port: This port supports standard, enhanced, and extended
modes of operation. In standard mode, it is a PS/2, PC/AT compatible, bidirectional
parallel port. In enhanced mode (EPP), it is an IEEE1284 compliant interface. In extended
high-speed mode, it is an extended capabilities port (ECP) that is also IEEE1284
compliant.
Floppy Disk Interface: This is a 2.88 MB Super I/O Floppy Disk Controller that is
software and register compatible to the 82077A. It supports two floppy drives directly and
a vertical recording format (VRF). It has a 16-byte data FIFO and detects all
overrun/underrun conditions. It has direct memory access (DMA) enable logic and a nonburst DMA option. It is IBM compatible.
Flash ROM
The flash ROM contains the diagnostics, a fail-safe loader, and console firmware. It sits on the
TIG bus and interfaces with the system through the cross-bar C-chip.
IRQs
System interrupts for PCI devices and all onboard devices, including the CPUs and memory, are
passed through the TIG bus to the cross-bar C-chip.
Environmental Logic
The server features module V2 (SFM2) monitors environmental conditions. This module
supports two system fans and three power supplies in an N+1 configuration. The power supplies
may be hot-swapped. The SFM2 also monitors the state of the CPU fans.
On SFM2 Module   Description
POK0             When lit, indicates power supply 0 has passed its self-test and is running okay.
POK1             When lit, indicates power supply 1 has passed its self-test and is running okay.
POK2             When lit, indicates power supply 2 has passed its self-test and is running okay.
SYS FANS OK
CPU FANS OK
TEMP OK
SFM2 Inverter
Inverts the PSn OK signals sent to the LEDs to light the LED when the power supply is good.
SFM2 PAL
The function of the SFM2 PAL is to monitor the power environment to determine whether or
not the power supplies can be enabled. It also determines the power supply configuration and
signals a shutdown if the configuration is invalid.
To enable the power supplies, PS_EN is sent to all power supplies. It is generated from signals
produced by the following:
On/Off switch
Door interlock
Display faults
The private I2C bus between memory and the C-chip is used to provide memory configuration to
the consoles and operating systems.
One records the state of the fans and power supplies and is latched when there is a fault.
The other causes an interrupt on the I2C bus when a CPU or system fan fails, an
overtemperature condition exists, or power supplied to the system changes from N + 1 to
N or from N to N + 1.
The interrupt received by the I2C bus controller and passed on to P-chip 0 alerts the system of a
power system event that may or may not cause a power shutdown. If power loss is imminent, the
controller has 30 seconds to read the two registers and store the information in the NVRAM on
the server features module. The SRM console show power command reads these registers.
Fault Display
The OCP display is written by means of the I2C bus.
Error State
Error state is logged by the I2C controller. The error state for power, fan, and overtemperature
conditions is stored for access when there is a fault.
Configuration Tracking
Each CPU and each logical section of the system board (the PCI bridge, the PCI backplane, the
power control logic, the remote console manager), and the system board itself has an EEPROM
that contains information about the module that can be written and read over the I2C bus. All
EEPROMs contain the following information:
Module type
Firmware revision
Chapter
System Installation
Introduction
This chapter explains how to install a DS20E system and boot the operating system. Topics
in this chapter are:
Installing OpenVMS
Booting OpenVMS
Installing Linux
Booting Linux
Preinstallation Checklist
Before you install the system, perform the following checks:
1. Review the information supplied with the system, including user documentation.
2. Select a well-ventilated site for the system near a grounded power outlet and away from
sources of excessive heat. The site should also be isolated from electric noise (for example,
spikes, sags, and surges) produced by devices such as air conditioners, large fans, radios,
and televisions.
WARNING: When unpacking and moving system components, be aware
that some components may be too heavy for you to lift alone safely. If you
are doubtful about whether you can lift these items alone, please get
assistance.
3. Save all shipping containers and packing material for repackaging or moving the system
later.
Keep in mind the environmental conditions, the power requirements, and the clearance
needed to access the system for servicing.
to be connected.
Plug the power cord into the system and then into the wall outlet.
2.
3.
4. After waiting for the monitor to warm up, if necessary, adjust the contrast and brightness to
obtain a readable screen display.
5. See the information supplied with the monitor for adjustment instructions.
6. Allow the system to complete the power-on self-test (POST) and device initialization.
(This takes about one minute.)
The POST firmware runs basic hardware tests on the following system components to make
sure the operating system firmware can start:
Memory
Cache
Flash ROM
During initialization, LED and beep codes show the current status and indicate initialization
problems. Initialization occurs during the power-up sequence or when an SRM init
command is issued.
If the system completes the POST and device initialization with no errors, the system was
correctly installed. If errors occur, refer to the Troubleshooting chapter for troubleshooting
procedures.
WARNING: Always disconnect the power cord from the wall before servicing
the system.
2.
3.
4. The system displays the prompt P00>>> when it is safe to turn off the power or restart the
system.
5. To turn off the power, press the system unit power button.
WARNING: Always disconnect the power cord from the wall before
servicing the system.
Order Number
Rackmount Installation
Template
B-IC-H9A10-5-DBM
B-IB-H9A10-5-DBM
EK-H9A10-IP
B-IC-H9A15-3-DBM
B-IB-H9A15-3-DBM
EK-H9A15-IP
[Figure: vertical rail hole spacing. Within each 1U (1.75 inches), the hole intervals are 0.625 inch, 0.625 inch, and 0.500 inch.]
The installation of the rackmount system requires 8.75 inches (5U) of vertical height
in the rack.
1. Mark the midpoint hole on the vertical rail as shown in Figure 3-1. The midpoint hole
must be selected so that the holes immediately above and immediately below are
equidistant (0.625 inch).
2.
Rack Accessories
Accessories List
Reference
Number
Mounting Hardware
Chassis slide
Bar nut
Front bezel
1. Attach the front mounting brackets along each edge, using three flat head Phillips
screws per bracket.
2. Pull the narrow segment of the slide out and detach it completely by pressing the green
release button and continuing to pull.
3. Attach the narrow segment of the slide to the system with five M4 x 10 Bossard screws.
4.
The sliding segment of the slide has an access hole that provides access to three
mounting holes in the stationary segment. You use two of the mounting holes.
Front
1. Insert a cap screw through the access hole and the first (forward-most) mounting hole
in the slide and through the hole in the slide bracket. Fasten with one two-hole nut
bar and tighten.
2. Align the access hole with the third mounting hole in the slide.
3. Insert a cap screw through the access hole and the third hole in the slide and through the
slot in the slide bracket. Fasten through the nut bar and tighten.
Back
1. Insert a screw through the two holes in the stationary segment of the slide and
through a slot in the slide bracket. Attach to a two-hole nut bar.
Fit the posts of a 2-post nut bar into the holes in the cabinet rail and slide bracket
and fasten with nuts.
3.
Back
1. Starting at the top marked hole, put two hex screws through the rack rail and the slide
bracket. Fasten with a 2-hole nut bar.
2. Fit the posts of a 2-post nut bar into the holes in the cabinet rail and slide bracket and
fasten with nuts.
3.
The system is intended for installation in one of the following racks, which are equipped
with a stabilizer bar:
Pull out the stabilizer bar and extend the leveler foot to the floor before installing the
system.
If you are using a rack other than those listed above, install rack stabilizing feet or provide
other means to stabilize the rack before installing the system.
1. Extend the fixed portion of the chassis slide until you hear a click. Ensure that the inner
ball bearing slide on the chassis slide is pulled to the front of the rail.
2. Align the narrow segment of the slides attached to the system with the slides attached to the
rack, and slide the system onto the rail.
3. Depress the green release button on each side and slide the system completely into the rack.
WARNING: Make sure that all other hardware in the rack is pushed in and attached.
The system is very heavy. Do not attempt to lift it manually. Use a material lift or
other mechanical device.
Installing U-Nuts
Install U-nuts and shipping screws as follows:
1.
2. Install two 10-32 x .500-inch hex head shipping screws and tighten.
1. At the back of the rack, release the vertical bar of the interlock system.
2. Insert the stabilizer bracket and the actuator latch into the vertical bar so that the
actuator latch is below the stabilizer bracket.
3.
4. Secure the stabilizer bracket to the two remaining marked holes on the right rack rail with
two 10-32 x .500-inch hex screws. Tighten into the U-nuts.
5. Install the trip mechanism onto the chassis using two M5 x 8 mm screws.
6. Vertically position the actuator latch such that the trip mechanism
aligns with the actuator latch.
7. Rotate the actuator latch to orient it like the other actuator latches on the vertical bar.
8.
on the system
1. Clip U-nuts over the holes in the vertical rail corresponding to the holes in the cable
management bracket.
2. Attach the cable management bracket to the rack with two 10-32 x .5-inch screws.
3. Attach the cable management bracket to the chassis with two M3 x 6 mm screws.
1. Dress the cables through the cable clamps or tie wrap them to the cable retractor assembly.
2. Attach all cables to the member of the cable management arm that is attached to the
system.
CAUTION: Failure to attach the cables to the attached member of
the management arm may cause cables to become disconnected.
1. Align the front bezel with the front of the system and snap it into place.
Result
Clears the boot_osflags variable.
4
5
P00>>>show device
Use the show and set console commands to check and set the required environment
variable. See the firmware chapter for more information about these commands.
Ensure that settings for environment variables match the system configuration.
21272-CA
21272-DA
21272-EA
21272-EA
Rev
Rev
Rev
Rev
2
2
2
2
To change how the system starts or boots the operating system, change default values for
environment variables.
Examples:
Set the system to autoboot.
P00>>>set auto_action boot
Set the system to halt at the console prompt after the startup tests.
P00>>>set auto_action halt
Change the default boot device.
P00>>>set bootdef_dev dka0
Example:
P00>>> show | more
Installing OpenVMS
After you boot the operating system CD, an installation menu is displayed on the screen.
1.
2.
Booting OpenVMS
OpenVMS can be booted from a CD on a local drive (the CD-ROM drive connected to the
system) or from a CD-ROM drive on an InfoServer.
Power up the system. The system stops at the SRM console prompt, P00>>>.
2.
3. Install the boot medium. For a network boot, see Booting OpenVMS from the InfoServer.
4. Enter the show device command to determine the unit number of the drive for your device.
5. Enter the boot command. (If you have not set the associated environment variables, enter the
command-line parameters along with the boot command.)
Example
P00>>>show device
dkc0.0.0.8.0       DKC0   SEAGATE ST39102LC        7B04
dqa0.0.0.105.0     DQA0   TOSHIBA CD-ROM XM-1702B  1150
dva0.0.0.0.0       DVA0
eia0.0.0.2005.0    EIA0   00-06-2B-00-6E-56
pka0.7.0.6.0       PKA0   SCSI Bus ID 7
pkb0.7.0.106.0     PKB0   SCSI Bus ID 7
pkc0.7.0.8.0       PKC0   SCSI Bus ID 7
P00>>>
.
.
.
P00>>> boot -flags 0,0 dqa0
(boot dqa0.0.0.105.1 -flags 0,0)
block 0 of dqa0.0.0.105.1 is a valid boot block
reading 898 blocks from dqa0.0.0.105.1
bootstrap code read in
base = 200000, image_start = 0, image_bytes = 70400
initializing HWRPB at 2000
initializing page table at 3ffee000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
OpenVMS (TM) Alpha Operating System, Version V7.1-2
Power up the system. The system stops at the P00>>> console prompt.
2. Insert the operating system CD into the CD-ROM drive connected to the InfoServer.
3. Enter the show device command to determine the unit number of the drive for your device.
4.
5.
6. Respond to the menu prompts, using the selections shown in the InfoServer example.
For complete instructions on booting OpenVMS from an InfoServer, see the OpenVMS
installation document.
Installing Linux
The procedure for installing Linux on a DS20E is documented in the Linux Installation and
Configuration Guide for AlphaServer DS10, DS20, and AlphaStation XP1000 Computers.
http://www.digital.com/alphaserver/linux/install_guide.html
Power up the system to the SRM console and enter the show version command.
P00>>>show version
version V5.4-2 May 19 1999 14:53:22
P00>>>
You need V5.4-2 or higher of the SRM console to install Linux. If you have a lower version of
the firmware, you will need to upgrade.
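If console output is captured (for example over a serial line), the version requirement can be checked with a short script. The sketch below is illustrative only: it assumes the version string always has the `Vmajor.minor-patch` shape seen in the example above and that the three fields compare numerically. The function names are hypothetical.

```python
import re

MIN_SRM_FOR_LINUX = (5, 4, 2)   # V5.4-2, the minimum stated in the text


def parse_srm_version(line: str):
    """Extract (major, minor, patch) from a 'show version' output line."""
    match = re.search(r"V(\d+)\.(\d+)-(\d+)", line)
    if match is None:
        raise ValueError(f"no SRM version found in: {line!r}")
    return tuple(int(part) for part in match.groups())


def srm_supports_linux(line: str) -> bool:
    """True if the reported SRM console version meets the stated minimum."""
    return parse_srm_version(line) >= MIN_SRM_FOR_LINUX


print(srm_supports_linux("version V5.4-2 May 19 1999 14:53:22"))
```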
Booting Linux
Before booting Linux, enter the show device command to determine the unit number of the
drive for your boot device. In the following example DKA300 is a hard disk, DKA500 is a CD,
and DVA0 is a floppy drive.
P00>>>show device
dka300.3.0.7.1 DKA300 RZ1CF-CF 1614
dka500.5.0.7.1 DKA500 TOSHIBA CD-ROM XM-5701TA 0557
dva0.0.0.0.0 DVA0
pka0.7.0.7.1 PKA0 SCSI Bus ID 7 5.57
. . .
Set the following SRM environment variables to configure boot parameters. This example
shows configuration commands to boot the floppy created by the Linux installation.
P00>>>set bootdef_dev dva0
P00>>>set boot_file vmlinux.gz
P00>>>set boot_osflags "root=/dev/hda"
P00>>>show boot*
boot_dev dva0.0.0.0.0
boot_file vmlinux.gz
boot_osflags root=/dev/hda
boot_reset OFF
bootdef_dev dva0.0.0.0.0
booted_dev
booted_file
booted_osflags
Insert the boot floppy and enter the boot command.
Chapter
System Configuration
Introduction
To configure a DS20E system, you need to know which configuration options you can use and
how to configure the components to interact for optimum system performance. This chapter
describes configuration options, guidelines, requirements, and procedures. Topics in this
chapter are:
Switch Settings
Memory Configurations
Addressing Considerations
SCSI Configuration
Interrupt Configuration
DMA Configuration
Firmware Configuration
Switch Settings
Two switchpacks configure functions on the system board (or main logic board). They are
located at the lower right corner of the board.
Switch SW2 is used to control the writing of the flash ROM, the speed of the cross-bar switch,
cache memory timing, and the debug monitor output path. The switch positions are identified in
the table.
Name          Default   Function
FSB           Off       Causes the SROM to jump to the Fail-Safe Booter program (currently the debug monitor).
CACHE_OFF_A   Off       Sets the cache memory timing.
CACHE_OFF_B   Off       Sets the cache memory timing.
MINI_DEB      Off       Causes the SROM to jump to the SROM Mini debugger.
TS_SPD0       On        Sets the speed of the Tsunami chipset and the CPU (bit 0) (see next table).
TS_SPD1       On        Sets the speed of the Tsunami chipset and the CPU (bit 1) (see next table).
TS_SPD2       Off       Sets the speed of the Tsunami chipset and the CPU (bit 2) (see next table).
PASS_BY       Off       Causes the debug monitor to send its output through the debug port on the CPU daughter card.

TS_SPD2   TS_SPD1   TS_SPD0   Cross-Bar Speed (MHz)   CPU Speed (MHz)
0         0         0         66.6                    400
0         0         1         75                      450
0         1         0         77                      500
0         1         1         79                      550
1         0         0         83.3                    600
1         0         1         87.5                    666
1         1         0         91.7                    833
1         1         1         100                     1000
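The speed-select encoding can be treated as a 3-bit index into the speed table. The sketch below reproduces the table values as a lookup; it takes the bit values directly and makes no assumption about whether a switch set to On corresponds to 0 or 1, which this section does not state. The units are assumed to be MHz, and the function name is hypothetical.

```python
# Indexed by (TS_SPD2 << 2) | (TS_SPD1 << 1) | TS_SPD0, values from the speed table.
CROSS_BAR_MHZ = [66.6, 75, 77, 79, 83.3, 87.5, 91.7, 100]
CPU_MHZ = [400, 450, 500, 550, 600, 666, 833, 1000]


def speeds_for(ts_spd2: int, ts_spd1: int, ts_spd0: int):
    """Return (cross-bar speed, CPU speed) for the three speed-select bits."""
    for bit in (ts_spd2, ts_spd1, ts_spd0):
        if bit not in (0, 1):
            raise ValueError("each speed-select bit must be 0 or 1")
    index = (ts_spd2 << 2) | (ts_spd1 << 1) | ts_spd0
    return CROSS_BAR_MHZ[index], CPU_MHZ[index]


print(speeds_for(1, 0, 0))
```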
CPU SW1
The CPU daughter card on the system board also contains two switchpacks. They configure
functions for the CPU subsystem.
CPU switch SW1 is used to set the Bcache configuration, the CPU speed, and the SROM flash
enable. The switch positions are identified in the following table:
Position   Description
1-4        Bcache configuration
5-7        CPU speed
8          SROM flash select
Bcache Configuration
SW1-4   SW1-3   SW1-2   SW1-1   Function
Off     On      X       On      Reserved
CPU Speed
SW1-7   SW1-6   SW1-5   Function
On      On      Off     500MHz
Description
Flash select disabled
Flash select enabled
CPU SW2
This switchpack sets the CPU voltage and selects the flash. Default settings are shown in bold.
CPU Voltage
SW2-4   SW2-3   SW2-2   SW2-1   VDC
Off     Off     Off     Off     1.429
Off     Off     Off     On      1.500
Off     Off     On      Off     1.571
Off     Off     On      On      1.643
Off     On      Off     Off     1.714
Off     On      Off     On      1.786
Off     On      On      Off     1.857
Off     On      On      On      1.929
On      Off     Off     Off     2.000
On      Off     Off     On      2.071
On      Off     On      Off     2.143
On      Off     On      On      2.214
On      On      Off     Off     2.286
On      On      Off     On      2.357
On      On      On      Off     2.429
On      On      On      On      2.500
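The voltage table is a 4-bit binary count in which SW2-4 is the most significant bit, SW2-1 the least significant, and On counts as 1. The sketch below looks up the voltage for a switch setting. It also checks an observation that is not stated in the guide: the table values fit (20 + index) / 14 volts to three decimal places. The function names are hypothetical.

```python
# VDC values from the CPU voltage table, in row order (index 0 = all switches Off).
VOLTAGE_TABLE = [1.429, 1.500, 1.571, 1.643, 1.714, 1.786, 1.857, 1.929,
                 2.000, 2.071, 2.143, 2.214, 2.286, 2.357, 2.429, 2.500]


def switch_index(sw2_4: bool, sw2_3: bool, sw2_2: bool, sw2_1: bool) -> int:
    """Row index in the table; True means the switch is On, SW2-4 is the MSB."""
    return (int(sw2_4) << 3) | (int(sw2_3) << 2) | (int(sw2_2) << 1) | int(sw2_1)


def cpu_voltage(sw2_4: bool, sw2_3: bool, sw2_2: bool, sw2_1: bool) -> float:
    """CPU voltage (VDC) for a given SW2-4..SW2-1 setting, read from the table."""
    return VOLTAGE_TABLE[switch_index(sw2_4, sw2_3, sw2_2, sw2_1)]


def fitted_voltage(index: int) -> float:
    """Fit observed from the table values (not from the guide): steps of 1/14 V."""
    return (20 + index) / 14


print(cpu_voltage(True, False, False, False))
```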
Flash1
(SW2-6)
Off
Off
On
On
Off
Off
On
On
Flash2
(SW2-5)
Off
Off
Off
Off
On
On
On
On
Description
Flash bypass disabled
Flash bypass enabled
The following example shows the switches set appropriately for the system.
Memory Configurations
The DS20E system has 16 memory slots for 4 arrays of DIMMs as shown in the following
illustration. The system has a memory capacity of up to 4 GB.
Other memory options must be the same size or smaller than the first memory option.
Qualified DIMMs
The following DIMMs are qualified; others may be added in the future.
Sales Part Number   Description
FR-MS340-CA         256MB ECC Memory DIMMs (4x64MB)
FR-MS340-DA         512MB ECC Memory DIMMs (4x128MB)
FR-MS340-EA         1GB ECC Memory DIMMs (4x256MB)
Addressing Considerations
Addresses are generated either by the CPU or an I/O device on the PCI bus. A CPU-generated
address can be targeted at system memory, PCI memory, or PCI I/O space. Similarly, an I/O
device's address can select system memory or other PCI devices. Because the addressing
capabilities of CPU and I/O devices are different, a scheme to map them to the appropriate
target address space is required.
From the CPU's perspective, the PCI I/O and memory space are linear and byte accessible.
Because the EV6/EV67 supports byte mode accesses, a single linear I/O space is used. CPU
address space is defined as the map of CPU-generated addresses used to access system memory
and I/O space.
CPUAddr[43:0]
Space
Size
From
To
System Memory
(Cacheable, Prefetchable)
4 GB
Reserved
8188 GB
4 GB
P-Chip0 PCI Memory
(Linear Addressing, NonCacheable)
1GB
Reserved
1GB
256 MB
P-Chip0 CSRs
(addr[5:0]=0. Quadword
access only)
Reserved
256 MB
Reserved
896MB
P-Chip 0 PCI
IACK/Special (Linear
addressing, No address
extension using HAE)
64MB
CPUAddr[43:0]
From
To
16 MB
P-Chip 0 PCI
Configuration (Linear
addressing. No HAE. Noncacheable)
Reserved
P-Chip 1 Configuration
(Linear addressing. No
HAE. Non- cacheable)
32 MB
Reserved
16 MB
8188 GB
Space
Size
Reserved
256 MB
The following table is used to translate the CPU mask into PCI AD[1:0] and PCI BE[3:0].
Type   Mask        PCI_AD[2:0]   PCI_BE[7:0]   PCI_AD[2:0]   PCI_BE[3:0]
                   64-bit        64-bit        32-bit        32-bit
Byte   0000 0001   000           1111 1110     000           1110
Byte   0000 0010   001           1111 1101     001           1101
Byte   0000 0100   010           1111 1011     010           1011
Byte   0000 1000   011           1111 0111     011           0111
Byte   0001 0000   100           1110 1111     100           1110
Byte   0010 0000   101           1101 1111     101           1101
Byte   0100 0000   110           1011 1111     110           1011
Byte   1000 0000   111           0111 1111     111           0111
Word   0000 0011   000           1111 1100     000           1100
Word   0000 1100   010           1111 0011     010           0011
Word   0011 0000   100           1100 1111     100           1100
Word   1100 0000   110           0011 1111     110           0011
LW     xxxx xxx1   000           xxxx 0000     000           0000
LW     xxxx xx10   100           0000 1111     100           0000
LW     xxxx x100   000           xxxx 0000     000           0000
LW     xxxx 1000   100           0000 1111     100           0000
LW     xxx1 0000   000           1111 0000     000           0000
LW     xx10 0000   100           0000 1111     100           0000
LW     x100 0000   000           1111 0000     000           0000
LW     1000 0000   100           0000 1111     100           0000
QW     xxxx xxxx   000           0000 0000     000           0000
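For the Byte and Word rows, the table follows a simple pattern: the PCI byte enables are the bitwise complement of the CPU mask (they are active low), and PCI_AD[2:0] is the position of the lowest set mask bit. The sketch below reproduces only those rows. The LW and QW rows use don't-care bits and are not modeled, and the function names are hypothetical.

```python
def lowest_set_bit(mask: int) -> int:
    """Index of the least significant 1 bit in a nonzero 8-bit mask."""
    if not 0 < mask <= 0xFF:
        raise ValueError("mask must be a nonzero 8-bit value")
    return (mask & -mask).bit_length() - 1


def pci64_fields(mask: int):
    """(PCI_AD[2:0], PCI_BE[7:0]) on a 64-bit bus, for Byte and Word masks."""
    return lowest_set_bit(mask), ~mask & 0xFF


def pci32_fields(mask: int):
    """(PCI_AD[2:0], PCI_BE[3:0]) on a 32-bit bus, for Byte and Word masks."""
    ad = lowest_set_bit(mask)
    # Select the longword half of the mask that holds the enabled bytes.
    half = (mask >> 4) if ad >= 4 else mask
    return ad, ~half & 0xF


ad, be = pci64_fields(0b0000_1100)      # Word row with mask 0000 1100
print(f"{ad:03b} {be:08b}")
```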
CPU address mapping to PCI I/O space uses the CPU mask. (The PCI host controller in the P-chip does not recognize I/O space accesses directed to it from other PCI or ISA devices.)
CPU address mapping must also be translated into PCI configuration space 0 and space 1.
Space 0
Binary encoding is used to decode the 5-bit Device # field in the CPU address to generate the
field IDSEL[20:0] in PCI AD[31:11], as shown here:
When Device # is 00002, IDSEL bit 1 (PCI AD[12]) is set to "1", and so on up to Device
# 10100.
Device field encodings 10101 through 11111 are not used and result in IDSEL[20:0] being set to
0s.
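The Device # decode amounts to a one-hot selection within PCI AD[31:11]. The sketch below assumes that device number n drives IDSEL bit n, which is PCI AD[11 + n]. That is consistent with IDSEL bit 1 mapping to PCI AD[12] and with 10100 being the highest used encoding, but the general rule is an inference rather than a statement from this guide. The function name is hypothetical.

```python
IDSEL_LSB = 11          # IDSEL[0] is carried on PCI AD[11]
MAX_DEVICE = 0b10100    # highest device number that selects an IDSEL line


def idsel_ad_bits(device: int) -> int:
    """Return the PCI AD[31:0] pattern carrying the one-hot IDSEL for a device number."""
    if not 0 <= device <= 0b11111:
        raise ValueError("Device # is a 5-bit field")
    if device > MAX_DEVICE:
        return 0                       # unused encodings: IDSEL[20:0] all zeros
    return 1 << (IDSEL_LSB + device)   # assumed mapping: device n -> IDSEL bit n


print(hex(idsel_ad_bits(1)))
```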
The following table is used to translate CPU Mask into PCI AD[1:0] and PCI BE[3:0].
Type   Mask        PCI AD[2]   PCI BE[3:0]
Byte   0000 0001   0           1110
Byte   0000 0010   0           1101
Byte   0000 0100   0           1011
Byte   0000 1000   0           0111
Byte   0001 0000   1           1110
Byte   0010 0000   1           1101
Byte   0100 0000   1           1011
Byte   1000 0000   1           0111
Word   0000 0011   0           1100
Word   0000 1100   0           0011
Word   0011 0000   1           1100
Word   1100 0000   1           0011
LW     xxxx xxx1   0           0000
LW     xxxx xx10   1           0000
LW     xxxx x100   0           0000
LW     xxxx 1000   1           0000
LW     xxx1 0000   0           0000
LW     xx10 0000   1           0000
LW     x100 0000   0           0000
LW     1000 0000   1           0000
QW     xxxx xxxx   0           0000
Space 1
CPU address mapping must also be translated to TIG address space. The TIG address is sparse
in that each aligned 64-byte region has only one byte of information.
Interrupt acknowledge
Special Cycle
The P-chips respond to PCI memory read/write and invalidate commands if the PCI address
maps to system memory.
Each P-chip supports four DMA address windows and one DMA monster window. Each of the
normal DMA windows is capable of mapping to system memory (or to other PCI devices, as
long as another P-chip exists). If the window selected by the PCI address is not a peer-to-peer
window, it can translate the incoming address to the system memory address by either direct
mapping or scatter/gather mapping.
Direct Mapping
The incoming address is compared to a Window Base Address register and a Window Mask
register, which determines the size of the window. If the address fits into this window, the
address bits that are not part of the compare are concatenated to a Translated Base address
register to form the System Memory Address. Note that the PCI Address bits that are not part of
the compare are the lower-order bits, and represent the size of the window:
System Address[34:2] = T_Base[34:20+n]:PCI_AD[19+n:2]
The variable n varies from 0 to 11. Thus, the size of the window can vary from 1 MB to 2 GB.
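As a rough illustration of the direct-mapping formula, the Python sketch below concatenates the upper bits of the Translated Base with the low-order PCI address bits. The function names are hypothetical, and the window-hit comparison against the Window Base and Window Mask registers is assumed to have already succeeded.

```python
def window_size(n: int) -> int:
    """Window size in bytes for a given n (0 to 11)."""
    if not 0 <= n <= 11:
        raise ValueError("n must be in the range 0 to 11")
    return 1 << (20 + n)


def direct_map(pci_addr: int, t_base: int, n: int) -> int:
    """System Address[34:2] = T_Base[34:20+n] : PCI_AD[19+n:2]."""
    offset_mask = window_size(n) - 1           # bits [19+n:0]
    system = (t_base & ~offset_mask) | (pci_addr & offset_mask)
    return system & 0x7FFFFFFFC                # keep bits [34:2] only


print(hex(direct_map(0x00123456, 0x40000000, 0)))
```

With n = 0 the window is 1 MB, so only PCI_AD[19:2] pass through; with n = 11 the window is 2 GB.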
Scatter/Gather Mapping
This scheme also uses the Window Base, Window Mask and Translated Base Address
Registers. The difference is that the translated address from the scheme for direct mapping
results in a quadword address. This quadword address is fed into a system memory-based page
table to produce a Page Table Entry (PTE). The PTE produces the top 21 bits of the system
memory address while the PCI AD[12:0] are sent through untranslated as the Page Offset. The
translation is illustrated as follows:
V is the valid bit in the PTE and must be a 1 to indicate a valid PTE. Each P-chip caches a
number of PTEs in a scatter/gather table to avoid the memory fetch on every DMA transaction.
The page size is fixed at 8 KB. The window size determines the size of the page table for a
given window. The size of the page table determines the number of bits used from the
Translated Base Address[34:10] and the PCI AD[31:13]:
PTE Address[34:3] = T_Base[34:10+n]:PCI_AD[19+n:13]
The variable n varies from 0 to 11. Thus the window size ranges from 1 MB to 2 GB.
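The PTE address formula can be sketched the same way. In this illustration (hypothetical function names), PCI_AD[19+n:13] is treated as a page index of 7 + n bits that selects one 8-byte PTE, so the page table occupies 2^(10+n) bytes. How the fetched PTE is then combined with the page offset is not shown, because the original figure for that step is not reproduced here.

```python
def pte_address(pci_addr: int, t_base: int, n: int) -> int:
    """PTE Address[34:3] = T_Base[34:10+n] : PCI_AD[19+n:13]."""
    if not 0 <= n <= 11:
        raise ValueError("n must be in the range 0 to 11")
    index_bits = 7 + n                               # width of PCI_AD[19+n:13]
    page_index = (pci_addr >> 13) & ((1 << index_bits) - 1)
    table_mask = (1 << (10 + n)) - 1                 # bits replaced by the index
    return ((t_base & ~table_mask) | (page_index << 3)) & 0x7FFFFFFF8


def page_table_bytes(n: int) -> int:
    """Page table size: one 8-byte PTE per 8 KB page in the window."""
    return (1 << (20 + n)) // (8 * 1024) * 8


print(hex(pte_address(0x00006000, 0x00100000, 0)))
```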
Monster Window
This window is used only with a PCI dual address cycle. A monster window is selected if the
PCI AD[63:40] equals 0x0000_01 (only bit 40 is a 1). In this case, the low-order PCI AD[34:0]
is used untranslated to address system memory.
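The monster window test reduces to a comparison on the upper address bits. A minimal Python sketch, with a hypothetical function name, assuming the 64-bit address from the dual address cycle has already been assembled:

```python
def monster_window(pci_addr_64: int):
    """Return the system memory address, or None if the monster window is not hit."""
    if (pci_addr_64 >> 40) & 0xFFFFFF != 0x000001:   # PCI AD[63:40] must be 0x0000_01
        return None
    return pci_addr_64 & ((1 << 35) - 1)             # PCI AD[34:0], untranslated


print(hex(monster_window((1 << 40) | 0x12345678)))
```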
SCSI Configuration
The DS20E system supports up to four internal SCSI devices. Systems that include a SCSI bus
currently use the Qlogic 1040UW Ultra Wide SCSI PCI host adapter.
SCSI IDs
The DS20E system currently supports up to four internal SCSI devices. The adapter can support
SCSI IDs ranging from 0 to 15. Each SCSI device must have a unique SCSI ID. Typically, the
host adapter is ID 7.
NOTE: The CD-ROM drive is an ATAPI device attached to the IDE port.
SCSI Termination
Termination on the host adapter is controlled by software commands through an adapter utility.
The default setting is Automatic. In Automatic mode, the adapter detects attached cables to
either one or two of its three connectors (cables on all three connectors is an illegal
configuration). The adapter then sets termination accordingly, as shown in the following table:
NOTE: You can also select these settings manually, using the SCSISelect utility.
PCI Slot Numbering from the top as mounted in the pedestal chassis
1   64 bit PCI - bus 0, slot 0              Slot 7   J35
2   64 bit PCI - bus 0, slot 1              Slot 8   J40
3   64 bit PCI - bus 0, slot 2              Slot 9   J41
4   64 bit PCI - bus 1, slot 0              Slot 7   J42
5   64 bit PCI - bus 1, slot 1              Slot 8   J44
6   Shared ISA/64 bit PCI - bus 1, slot 2   Slot 9   PCI J46/ISA J47
The following installation order must be followed:
Step   In this slot:
1      6
2      First available slots 3, 1
3      First available slots 3, 2, 1
4      First available slots 3, 2, 1, 4, 5
5      First available slots 3, 2, 1, 4, 5
Graphics Options
Multihead systems must be homogeneous (all ELSA, all PowerStorm 300, and so on).
              Tru64 UNIX
SN-PBXGK-BB   3
SN-PBXGB-AA   1/3
SN-PBXGI-AD   2
SN-PBXGD-AD   2
SCSI Controllers
PCI Restrictions
The DS20E system will not operate properly unless the following restrictions are observed:
On systems using the Tru64 UNIX operating system, all TGA-2 and PowerStorm 4D51T
options must be installed in PCI bus 0.
In a multi-head environment with ELSA Gloria modules, the VGA-enabled ELSA Gloria
must be installed in PCI bus 0.
ISA Bus
The DS20E system has a built-in ISA device and a slot for an additional ISA option (if the slot
is not already occupied by a PCI device). The built-in device is a super I/O chip (SMC FDC
37C669), which provides controllers for the floppy disk bus, two serial lines, keyboard, mouse,
and the bidirectional enhanced parallel port.
ISA Restrictions
IRQ7 is reserved for use by the parallel port. IRQ10 is reserved for use by the USB controller,
regardless of whether it is enabled.
PCI Device       IDSEL
Cypress 693U     PCI AD[18]
PCI Slot 1       PCI AD[22]
PCI Slot 2       PCI AD[23]
PCI Slot 3       PCI AD[24]
Ethernet 21143   PCI AD[14]
SCSI 1040C       PCI AD[17]
P2P Bridge       PCI AD[19]
PCI Slot 4       PCI AD[25]
PCI Slot 5       PCI AD[26]
The following assignment tables show the DS20E system with PCI dense configuration space.
PCI Base Address   CPU Base Address   MLB Device
0004.0000          801.FE00.3800      PCI-ISA
0040.0000          801.FE00.5800      Slot 1
0080.0000          801.FE00.6000      Slot 2
0100.0000          801.FE00.6800      Slot 3

PCI Base Address   CPU Base Address   MLB Device
0000.4000          803.FE00.1800      PCI-Ethernet
0002.0000          803.FE00.3000      SCSI 1040C
0008.0000          803.FE00.4000      P2P Bridge

Secondary          Primary
PCI Base Address   PCI Base Address   CPU Base Address   MLB Device
0200.0000          0001.4801          803.0001.4800      Slot 4
0400.0000          0001.5001          803.0001.5000      Slot 5
For either bus, you create a CPU address from the CPU base address (above), the function
number of the device, and the offset into the PCI configuration space, using the following
formula:
CPU address = CPU base address + function number * 256 + offset
To read from this address, use the following SRM console command:
>>> e -p {size} {CPU address}
{size}          -l, -w, -b (for longword, word, or byte access)
{CPU address}   The address calculated above
For example, to read the status register (offset 6, word wide) from the USB section (function 3)
of the Cypress PCI-ISA bridge (primary bus, IDSEL hooked to AD<18>), the CPU address (in
hex) is:
801.FE00.3800 + 300 + 6 -> 801.FE00.3B06
The console command is:
>>> e -p -w 801.fe00.3b06
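The address arithmetic can be checked with a short Python sketch. The function names are hypothetical, and the base address used is the PCI-ISA entry (801.FE00.3800) from the bus 0 assignment table; the range checks assume the standard PCI limits of eight functions per device and 256 bytes of configuration space per function.

```python
def config_cpu_address(cpu_base: int, function: int, offset: int) -> int:
    """CPU address = CPU base address + function number * 256 + offset."""
    if not 0 <= function <= 7:
        raise ValueError("a PCI device has functions 0 through 7")
    if not 0 <= offset <= 0xFF:
        raise ValueError("offset must fall within the 256-byte configuration space")
    return cpu_base + function * 256 + offset


def dotted(addr: int) -> str:
    """Format an address in the dotted hex style used in this guide."""
    s = f"{addr:011X}"
    return f"{s[:-8]}.{s[-8:-4]}.{s[-4:]}"


# USB function (3) of the PCI-ISA bridge, status register at offset 6
addr = config_cpu_address(0x801FE003800, 3, 6)
print(f">>> e -p -w {dotted(addr).lower()}")
```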
Interrupt Configuration
The main logic board collects error interrupts and interrupts from various other sources and
routes them to the appropriate IRQ.
Interrupt Source      CPU Interrupt   Description
P-chips               Cpu_irq 0       Error interrupts
PCI and ISA devices   Cpu_irq 1       PCI and ISA interrupts
rtc_irq               Cpu_irq 2       Real-time clock interrupt
C_chip_csr            Cpu_irq 3       Interprocessor
Halt jumper           Cpu_irq 4       Halt is through the Halt button or RMC halt command;
                                      software can enable this
These interrupts (IRQ_0 through IRQ_2) need to be posted through the TIG bus to the Tsunami
C-chip where they are collected in the device interrupt register before being sent to the
EV6/EV67 through the TIG bus. Programmable masking of interrupts is also done in the C-chip
using the interrupt mask register.
All interrupts collected through the TIG bus are level interrupts, and the interrupt conditions
remain present until cleared at the source through programmed I/O. The interrupt posting
buffers of the TIG bus are not the real source of the interrupts. Reading them will not clear the
interrupt condition.
For optimum system performance and to prevent conflicts, all bus master devices are assigned
interrupt request levels. The following sections provide assignment and other interrupt
information.
Interrupt Source
Halt Interrupt
Various catastrophic or operator-induced conditions cause a halt to the SRM console.
Code   Description
1      Hardware Halt button pushed*
2      Kernel Stack Pointer invalid
5      Software Halt instruction executed
6      Double machine check
* The front panel Reset button can optionally be configured to generate a Halt
interrupt to the EV6/EV67 processor. The processor receives this interrupt
on IRQ_4, which is a dedicated interrupt line for this function. (This interrupt is
not routed through the C-chip.)
ISA Device                  ISA IRQ                Notes
Keyboard (Cypress)          IRQ_1
Real Time Clock (Cypress)   Not an ISA interrupt   Maskable in Cypress
COM 2 Port                  IRQ_3
COM 1 Port                  IRQ_4
FDC                         IRQ_6
Parallel Port               IRQ_7
ISA Option Slot             IRQ_9,11,13            Programmable Selection
USB                         IRQ_10
Mouse (Cypress)             IRQ_12
PCI-IDE Primary             IRQ_14
PCI-IDE Secondary           IRQ_15
DMA Configuration
Direct memory access (DMA) allows devices to access memory directly without going through
the CPU. DMA channels must be unique, and addresses must not conflict. The following table
summarizes the DMA channel assignments. Certain channels are hardwired (H), and others are
program selected (P):
Device
Floppy disk controller
Parallel
ISA
DMA Channels
0
1
2
H
P
3
H
P
The floppy interface in the Super I/O chip is hardwired to DMA channel 2. The parallel
interface is allowed to use DMA channel 3 during certain modes of operation.
DMA channels 0 through 3 and 5 through 7 can be used by the ISA slot on the MLB.
Firmware Configuration
The DS20E firmware is used to:
Configuration information is stored in the NVRAM section of the flash ROM. If the MLB is
replaced, the information in the NVRAM is lost.
NOTE: Keep records of the configuration information to facilitate restoring the system.
DS20E firmware resides in a serial ROM (SROM). This SROM contains the power-on self-test
(POST) and the SRM firmware for Tru64 UNIX and OpenVMS systems.
Example:
P00>>> set auto_action halt
P00>>> init
The system will halt.
P00>>> show auto_action
halt
Obtaining Options
Before installing any options or upgrading the system, you should:
Get an accurate list of the modules and devices in the current system configuration. Use
the show config command to display the current system configuration.
To display information on the devices and controllers installed in the system, enter the
show device command.
Determine what options are to be added to the system and ensure that they are supported.
Refer to the Compaq Systems and Options Catalog for the latest information on base
system components, configuration guidelines, packages, and available system options.
http://ftp.digital.com/pub/DEC/info/SOC/Systems_and_Options_Catalog_25jun1999SOHOMEHM.htm
For the latest list of supported options, see the DS20E Supported Options List at:
http://www.digital.com/alphaserver/ds20e/options/asds20e_options.html
2.
On the system board, set SW3-2 to ON for the EV67 processor. See the SW3 illustration
earlier in this chapter.
For information about replacing a CPU or adding a second CPU, see the CPU Daughter Card
procedure in the FRU Removal and Replacement chapter.
Upgrading Memory
Minimum standard memory capacity is 128 MB (using four 32 MB DIMMs), or 256 MB (using
four 64 MB DIMMs). Memory can be upgraded to as much as 4 GB by removing the standard
DIMMs and installing the optional 256 MB DIMMs.
To upgrade memory, see the DIMMs procedure in the FRU Removal and Replacement chapter.
Memory DIMMs
Removable-media devices
Fixed-media drives
ISA devices
PCI devices
Before attempting to connect third-party devices or install third-party devices inside the system
unit, check to ensure that the operating system supports the device. All compatible third-party
devices use standard mounting hardware and connectors.
NOTE: Third-party memory DIMMs are not supported on the DS20E system.
Chapter
Firmware
Introduction
Firmware for the DS20E system includes the SRM console and the AlphaBIOS console. The
system also has a remote management console (RMC) for remote monitoring.
Topics in this chapter are:
Firmware
SRM Console
The SRM console is the command-line interface (CLI) that controls and sets up the operation of
a DS20E system running the Tru64 UNIX or OpenVMS operating system. This interface is a
shell, similar to UNIX, that provides a set of commands as well as a scripting facility. You enter
SRM console commands at the console prompt, P00>>>.
The SRM console allows you to boot the operating system, and perform other system
management tasks, such as:
AlphaBIOS Console
The AlphaBIOS console is an enhanced BIOS graphical user interface for Alpha systems. It is
used to run certain utilities, such as the RAID configuration utility.
Has a bad video card, and no serial monitor is attached to the system
If the processor stops during startup, the hex countdown also stops, allowing you to determine
the state of the system.
This command...   Displays...
show config       List of devices found on the system bus and I/O buses. This
                  configuration was in effect when you initialized the system.
show cpu
show device
show memory       Information about the capacity of each memory bank, the size of the
                  DIMMs used in the memory bank, and the starting address of each
                  bank.
show pal
show power        Status information about the power supplies, system fans, CPU fans,
                  and temperature.
show version
Show Config
The show config command displays a list of devices found on the system interconnect and
I/O buses. This is the configuration at the most recent initialization. The syntax is:
P00>>> show config
SRM Console:
PALcode:
Processors
CPU 0
CPU 1
Core Logic
Cchip       DECchip 21272-CA Rev 2
Dchip       DECchip 21272-DA Rev 2
Pchip 0     DECchip 21272-EA Rev 2
Pchip 1     DECchip 21272-EA Rev 2
TIG         Rev 4.11
Arbiter     Rev 2.8 (0x1)
MEMORY
Array #     Size       Base Addr
1           128 MB     000000000
Total Bad Pages = 0
Total Good Memory = 128 MBytes
PCI Hose 00
Bus 00 Slot 05/0: Cypress 82C693
Bridge to Bus 1, ISA
Show Cpu
The show cpu command displays the status of each CPU. The syntax is:
P00>>> show cpu
Primary CPU:       00
Active CPUs:       00      01
Configured CPUs:   00      01
SROM Revision:     V1.82   V1.82
P00>>>
Show Device
The show device command displays status for devices and controllers in the system: SCSI
and MSCP devices, the internal floppy drive, and the network.
The syntax for this command is:
P00>>>show device controller_name
In this command, controller_name is the controller name or abbreviation. When
abbreviations or wildcards are used, the system displays all controllers that match that type. If
you do not specify a name, the system displays all devices and controllers in the system.
This show device example shows the devices and controllers on a DS20E system.
P00>>>show device
dkc0.0.0.8.0       DKC0
dqa0.0.0.105.0     DQA0
dva0.0.0.0.0       DVA0
eia0.0.0.2005.0    EIA0
pka0.7.0.6.0       PKA0
pkb0.7.0.106.0     PKB0
pkc0.7.0.8.0       PKC0
P00>>>
Show Memory
The show memory command displays information about each memory bank: slot number,
size in megabytes, and the starting address.
P00>>> show memory
Array #   Size         Base Addr
-------   ----------   ---------
0         128 MB       000000000
1         128 MB       008000000
2         128 MB       010000000
3         128 MB       018000000
Total Bad Pages = 0
Total Good Memory = 512 MBytes
P00>>>
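In the example above, each base address equals the sum of the sizes of the preceding arrays. The Python sketch below reproduces the Base Addr column under that assumption (arrays packed contiguously in array order, which holds for this display but is not stated as a rule by the guide). The function name is hypothetical.

```python
def array_bases(sizes_mb):
    """Return base addresses as 9-digit hex strings, assuming contiguous arrays."""
    bases = []
    addr = 0
    for size in sizes_mb:
        bases.append(f"{addr:09X}")
        addr += size * 1024 * 1024   # convert MB to bytes
    return bases


for index, base in enumerate(array_bases([128, 128, 128, 128])):
    print(index, "128 MB", base)
```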
Show Pal
P00>>> show pal
pal OpenVMS PALcode V1.61-49, Tru64 UNIX PALcode V1.54-58
P00>>>
Show Power
The show power command displays status information about the power supplies, system
fans, CPU fans, and temperature. This command is useful for displaying the error state of a
system that shuts down because of a fan, temperature, or power supply failure. If the system can
be restarted, use this command; if it cannot, use the Remote Console Manager's status
command, described later in this chapter.
P00>>>show power
                          Status
Power Supply 0            good
Power Supply 1/Fan Tray   not present
Power Supply 2/Fan Tray   good
System Fans               good
CPU Fans                  good
Temperature               good
Show Version
The show version command displays the version of the SRM console program that is
currently installed on the system.
P00>>>show version
version                 v5.5-1 Jul 30 1999 10:04:02
P00>>>
Does this:
auto_action
boot_file
boot_osflags
bootdef_dev
com_baud
console
cpu_enabled
ewa0_inet_init
ew*0_mode or ei*0_mode
ew*0_protocols or ei*0_protocols
kbd_hardware
This variable:
Does this:
language
ocp_test
os_type
password
pci_parity
pk*0_fast
pk*0_host_id
pk*0_soft_term
tt_allow_login
You can reset the system with the init command. The syntax is:
P00>>> init
Executing the init command is equivalent to pressing the Reset button. The system
performs self-tests and autoboots. The init command restarts the current in-memory console
image and resets all devices on the PCI bus. The system will not autoboot following an init
command if either of these conditions exists:
Example
P00>>>init
Initializing...
128 Meg of system memory
probing hose 1, PCI
probing hose 0, PCI
probing PCI-to-ISA bridge, bus 1
bus 0, slot 5, function 1 -- dqa -- Cypress 82C693 IDE
bus 0, slot 5, function 2 -- dqb -- Cypress 82C693 IDE
bus 0, slot 6, function 0 -- pka -- Adaptec AIC-7895
bus 0, slot 6, function 1 -- pkb -- Adaptec AIC-7895
bus 0, slot 7 -- vga -- ELSA GLoria Synergy
bus 0, slot 8 -- pkc -- NCR 53C895
bus 0, slot 9 -- ewa -- DE500-AA Network Controller
Testing the System
Testing the Memory
Testing the Disks (read only)
Testing the Network
System Temperature is 34 degrees C
initializing GCT/FRU at 1ec000
COMPAQ AlphaServer DS20E Console v5.5-9, Aug 31 1999 11:52:26
P00>>>
Example
This example shows the contents of a file called "test". After a page is displayed, press the
space bar to see the next page of text.
P00>>>more test
echo "Requires diskette and loopback connectors on COM2 and parallel port"
echo "type kill_diags to halt testing"
echo "type show_status to display testing progress"
echo "type cat el to redisplay recent errors"
set d_group field
set d_harderr halt
set d_softerr halt
echo "Start exer on COM2"
serial1
echo "Start nettest on EWA0"
network
show memory
echo "Start Memory test "
memory
echo "Start exer on PARA"
parallel
echo "Start exer on DVA0"
floppy
if (ls dk*.* >nl) then
echo "Start exer on dk*"
disks
fi
dqtest
--More-- (SPACE - next page, ENTER - next line, Q - quit)
Editing Files
The edit command invokes a console editor, similar to a line editor in BASIC. It is used to
add, insert, and delete lines in RAM files or the NVRAM (nonvolatile RAM) power-up script.
CAUTION: Use caution when editing the NVRAM script. For example, if you include the
init command in the script, you will put the system into an endless loop. To correct this
error, press the Halt button while the system is powering up. When the P00>>> prompt is
displayed, edit the nvram script to remove the illegal command.
For more information on the edit command, see Creating a Power-up Script, later in this
chapter.
Internally, the console uses drivers as the access mechanism for referencing different devices.
Specifically, the console provides drivers for the following generic devices or address spaces:
These hardware devices are also accessible using the device names shown here:
toy: 64 bytes, of which the first 14 bytes are Time-of-Year Clock registers and the last 40
bytes are private BBU RAM
Deposit Command
The deposit command stores data in a specified location. In this example, the hexadecimal
number 9b (155 in decimal) is stored in physical memory at address 0 as 000000000000009B.
By default, data is stored as a quadword. All values default to eight bytes.
CAUTION: Before experimenting with memory, find a safe area in memory to alter. The
console and other critical data structures reside in memory. Be careful not to alter them
inadvertently. Use the alloc command to allocate a block of memory for experimentation.
Options
Value
Definition
-b
-w
-l
-q (default)
Defines data size as a quadword (64 bits). All values default to 8 bytes.
-o
-h
-d
-p
-v
-g
-f
-i
-n <count>
-s <step>
Example
The deposit command can be abbreviated to d.
Clear first 512 bytes of physical memory.
P00>>>d -b -n 1FF pmem:0 0
Deposit 5 into four longwords starting at virtual memory address 1234.
P00>>>d -l -n 3 vmem:1234 5
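Both examples suggest that the -n count is the number of additional locations written after the first one, so the total span is (count + 1) times the data size. The Python sketch below applies that reading; it is inferred from the two examples above rather than from a stated definition, and the function name is hypothetical. The count is taken as hexadecimal, as in the 1FF example.

```python
# Data sizes in bytes for the size options (Alpha conventions)
DATA_SIZE = {"-b": 1, "-w": 2, "-l": 4, "-q": 8}


def bytes_written(size_option: str, count_hex: str) -> int:
    """Total bytes touched by a deposit with the given size option and -n count."""
    count = int(count_hex, 16)
    return (count + 1) * DATA_SIZE[size_option]


print(bytes_written("-b", "1FF"))   # d -b -n 1FF pmem:0 0
print(bytes_written("-l", "3"))     # d -l -n 3 vmem:1234 5
```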
Examine Command
The examine command displays the contents of an address you specify: a memory location, a
register, a device, or a file.
As with the deposit command, if you do not specify options in an examine command, the
system uses the options from the last examine command that was entered. Also, if you specify
conflicting address space or data size, the system ignores the command and issues an error
message.
NOTE: For data lengths longer than a longword, data should be separated by a space.
The examine command uses the same options and arguments as the deposit command with
two exceptions:
The syntax for an examine command is the same as for the deposit command, with the
exceptions noted above.
Examples
These examples show how you can use the examine command to view the contents of
different devices.
Examine physical memory location 0.
P00>>>examine pmem:0
pmem: 0 0000000000000000
Deposit the hex number 9b to location 0 in physical memory and then view its contents.
NOTE: By default, data is stored as a quadword, so the value stored is 9b zero-padded to the
full quadword length.
P00>>>deposit pmem:0 9b
P00>>>examine pmem:0
pmem: 0 000000000000009B
Examine the next location.
NOTE: An examine or deposit command without an explicit address always references
the next address (computed as the last referenced address plus the current data size).
P00>>>examine
pmem: 8 0000000000000000
Examine location 0 again.
P00>>>examine 0
pmem: 0 000000000000009B
Examine the contents of the TOY register.
P00>>>examine toy:0
toy: 0 1C06AF026F37002E
Does This:
help
list
renumber
exit
quit
nn
nn text
Example
This example shows how to modify the user-created power-up script, nvram. The pound sign
(#) indicates explanatory comments. In this example the script is edited to include a command
that allows you to boot the Tru64 UNIX operating system over the network.
P00>>> edit nvram               #Modify user power-up script, nvram
editing nvram
0 bytes read in
*10 set ewa0_protocols bootp
*list                           #List current file with line numbers
10 set ewa0_protocols bootp
*exit                           #Close file and save changes
27 bytes written out to nvram
P00>>> nvram                    #Execute the script.
Qualifier
Meaning
-file <filename>
-flags <value>
Qualifier
Meaning
-protocols <enet_protocol>
Obtaining Help
You can use the SRM console's on-line help system for reference.
NOTE: The on-line help may display commands that are not supported on the DS20E
system, and it may not display some commands that are supported.
boot           break          cat            check
clear          continue       crash          debug1
dynamic        echo           edit           eval
exer           exit           false          find_field
grep           halt           hd             help
isacfg         isp1020_edit   kill           kill_diags
ls             man            memexer        memtest
net            nettest        ps             rm
semaphore      set            set host       shell
show cluster   show config    show iobq      show map
show_status    sleep          sp             start
sys_exer       true           update         wc
For more information about SRM console commands, consult the reference manual,
AlphaServer 800, 1000/A, 2x00/A, 4x00, 8x00 SRM Console Command Line Interface:
http://prosic.cxo.dec.com/PUBS/SYSTEMS/EK-ASCLI-SRM-04.pdf
[Figures: AlphaBIOS boot screen ("Press F8 For Windows 2000 Advanced Startup Options") and AlphaBIOS Setup screen]
F1          Ctrl + A
F2          Ctrl + B
F3          Ctrl + C
F4          Ctrl + D
F5          Ctrl + E
F6          Ctrl + F
F7          Ctrl + P
F8          Ctrl + R
F9          Ctrl + T
F10         Ctrl + U
Insert      Ctrl + V
Delete      Ctrl + W
Backspace   Ctrl + H
Esc         Ctrl + [
Utilities
Configuration utilities are run directly from the AlphaBIOS Utilities menu.
[Figure: Run Maintenance Program dialog (Location: A:)]
If you change your system configuration, for example, by adding another RAID drive, you will
have to run the RAID configuration utility. As you modify your system, you might be required
to run other types of configuration utilities as well. Configuration utilities (also called
maintenance programs) are run directly from the AlphaBIOS Utilities menu.
First-Time Setup
Using RMC Locally or with a Modem on COM1
To connect to the RMC locally, the console terminal has to be connected to COM1. You type
the escape sequence at the SRM console prompt on the local serial console terminal to enter
RMC command mode. You can invoke RMC from the SRM console, the operating system, or
an application.
To exit RMC and reconnect to the system console port, enter the quit command.
Press Return to get a prompt from the operating system or system console.
RMC Commands
The following RMC commands are used to control and monitor a system remotely:
Command      Function
halt         Halts the server. Emulates pressing the Halt button and immediately releasing it.
haltin       Causes a halt assertion. Emulates pressing the Halt button and holding it in.
haltout      Terminates a halt assertion created with the haltin command. Emulates
             releasing the Halt button after holding it in.
help or ?    Displays the list of commands.
poweroff     Turns off power. Emulates pressing the On/Off button to the off position.
poweron      Turns on power. Emulates pressing the On/Off button to the on position.
quit         Exits console mode and returns to system console port.
reset        Resets the server. Emulates pressing the Reset button.
set escape   Changes the escape sequence for invoking command mode.
status       Displays system status and sensors.
Command Conventions
You can delete an incorrect command with the Backspace key before you press Enter.
If you type a valid RMC command, followed by extra characters, and press Enter, the
RMC accepts the correct command and ignores the extra characters.
If you type an incorrect command and press Enter, the command fails with the message:
*** ERROR - unknown command ***
Halt
The halt command halts the managed system. The halt command is equivalent to pressing
and then immediately releasing the Halt button on the control panel. The RMC firmware exits
command mode and reconnects the user's terminal to the system COM1 serial port.
RCM>halt
Focus returned to COM port
NOTE: The halt command can be used to force a halt assertion.
Haltin
The haltin command halts a managed system and forces a halt assertion. The haltin
command is equivalent to pressing and holding in the Halt button on the control panel. This
command can be used at any time after system power-up to allow you to perform system
management tasks.
Haltout
The haltout command terminates a halt assertion that was done with the haltin
command. It is equivalent to releasing the Halt button on the control panel after holding it in
(rather than pressing it once and releasing it immediately). This command can be used at any
time after system power-up.
Help or ?
The help or ? command displays all of the RMC firmware commands.
Poweroff
The poweroff command requests the RMC to power off the system. The poweroff
command is equivalent to pressing the On/Off button on the control panel to the off position.
RCM>poweroff
If the system is already powered off or if switch 3 (RPD DIS) on the switchpack has been set to
the on setting (disabled), this command has no immediate effect.
To power the system on again after using the poweroff command, you must issue the
poweron command.
If you are not able to issue the poweron command, the local operator can start the system as
follows:
1. Press the On/Off button to the off position and disconnect the power cord.
2. Reconnect the power cord and press the On/Off button to the on position.
Poweron
The poweron command requests the RMC to power on the system. The poweron command
is equivalent to setting the On/Off button on the control panel to the ON position. For the
system to power on, the following conditions must be met:
The RMC exits command mode and reconnects the user's terminal to the system console port.
RCM>poweron
Focus returned to COM port
NOTE: If the system is powered off with the On/Off button, the system will not power up
from the RMC. The RMC will not override the "off" state of the On/Off button. If the system
is already powered on, the poweron command has no effect.
Quit
The quit command exits the user from command mode and reconnects the serial terminal to
the system console port. The following message is displayed:
Focus returned to COM port
The next display depends on what the system was doing when the RMC was invoked. For
example, if the RMC was invoked from the SRM console prompt, the console prompt is
displayed when you enter a carriage return. If the RMC was invoked from the operating system
prompt, the operating system prompt is displayed when you enter a carriage return.
Reset
The reset command requests the RMC to reset the hardware. The reset command is
equivalent to pressing the Reset button on the control panel.
RCM>reset
Focus returned to COM port
The following events occur when the reset command is executed:
The console exits RMC command mode and reconnects the serial terminal to the system
COM1 serial port.
The power-up messages are displayed, and then the console prompt is displayed or the
operating system boot messages are displayed, depending on how the startup sequence has
been defined.
Setesc
The setesc command resets the default escape sequence for invoking RMC. The escape
sequence can be any character string. A typical sequence consists of 2 or more characters, to a
maximum of 15 characters. The escape sequence is stored in the module's on-board NVRAM.
NOTE: Be sure to record the new escape sequence. Although the factory defaults can be
restored if you forget the escape sequence, this requires resetting the EN RMC switch on
the RMC switchpack. See Using the RMC Switchpack.
The following sample escape sequence consists of five iterations of the Ctrl key and the letter
"o".
RCM>setesc
^o^o^o^o^o
RCM>
If the escape sequence entered exceeds 15 characters, the command fails with the message:
*** ERROR ***
When changing the default escape sequence, avoid using special characters that are used by the
system's terminal emulator or applications.
Control characters are not echoed when entering the escape sequence. Use the status
command to verify the complete escape sequence.
Status
The status command displays the current state of the system sensors, as well as the current
escape sequence and alarm information. The following is an example of the display:
RCM>status
Firmware Rev: V2.0
Escape Sequence: ^]^]RCM
Remote Access: ENABLE
Temp (C): 26.0
RCM Power Control: ON
RCM Halt: Deasserted
External Power: ON
Server Power: ON
RCM>
Description
Firmware Rev:
Escape Sequence:
Remote Access:
Temp (C):
RCM Power Control:
RCM Halt:
External Power:
Server Power:
Switch
Function
Description
SW1-1
PIC Enable
SW1-2
SW1-3
PIC SYSPWR_ENABLE
Bypass
SW1-4
7. Power down the system, unplug the AC power cords, and remove the system covers.
8. Set switch 4 to OFF.
9. Replace the system covers and plug in the power cords.
10. Power up the system to the SRM console prompt, and type the default escape sequence to
invoke RMC command mode. The escape sequence is the Ctrl key + right bracket key,
typed twice, followed by the letters rcm:
^]^]RCM
Symptom: RMC installation is complete, but the system does not power up.
Possible Cause: RMC Power Control is set to DISABLE.
Suggested Solution: Invoke RMC and issue the poweron command.
Possible Cause: Cables are not correctly installed.
Suggested Solution: Reseat the cables.

Symptom: You reset the system to factory defaults, but the factory settings did not take effect.
Possible Cause: AC power cords were not removed before you reset switch 4 on the RMC switchpack.
Chapter
Troubleshooting
Introduction
As a service engineer, you are responsible for troubleshooting a system. Following a simple
checklist or strategy can help minimize the time you spend or help ensure that an obvious
problem is not overlooked. Many resources are available to help you isolate problems.
Topics in this chapter are:
Basic Troubleshooting
Problem Categories
Power-Up/Down Sequence
Basic Troubleshooting
Considerations Before Troubleshooting
Before troubleshooting a problem, check the site maintenance log for service history. Ask the
system manager the following questions:
1.
Has the system been used before and did it work correctly?
2.
3.
If changes or updates were made, are the revision numbers compatible for the system and
the operating system?
4.
If the operating system is down and you cannot bring it up, use power-up information and
console environment tools:
Power-up display
ROM-based diagnostics
If the operating system is running, use the operating system (OS) to gather information from
crash dumps, error logs, and the operator log. Run OS-based diagnostics.
2.
Gather information about the problem, including system status, revision levels of firmware,
and the operating system.
3.
4.
5.
6.
7.
8.
Troubleshooting Strategy
Problem Categories
System problems can be classified into the following categories. Using these categories, you can
quickly determine a starting point for diagnosis and eliminate the unlikely sources of the
problem.
Power problems
Console-reported failures
Boot problems
Thermal problems
Memory problems
SCSI problems
Power Problems
If the system does not power on, perform the following steps:
1.
2.
3.
4.
Check that the ambient room temperature is within environmental specifications (10°C to
40°C, 50°F to 104°F).
5.
Check the remote management console using the status command. Look for fan status,
system temperatures, or power supply failures.
6.
Check that the cables on the system board are connected properly.
7.
Check that the internal power supply cables are plugged in at both the power supply and
system board.
8.
Ensure that both fans are plugged in and operating properly. Any non-operational fan will
cause automatic OS shutdown and can prevent power-on.
9.
Look for short circuits or overcurrent if the power will not stay on.
10. Wait two or more seconds after plugging the unit in before powering it on.
If the power button seems to be flashing, ACPI sleep mode is on. Push and hold the system
power button for more than four seconds if you want to shut the power off completely.
Troubleshooting Suggestions
If the power indicator is:
OFF
Check:
Front-panel power switch
Power at the wall receptacle
AC cord
Fans
There are three main fans in the system: two
are at the front of the system (top and bottom),
and one is in the power supply.
NOTE: The power supply shuts OFF within one
second if its internal fan fails.
An illuminated floppy light indicates that the firmware is corrupted. Create a firmware update
floppy disk, insert the disk, power-cycle the system, and update the firmware to repair the system.
CPU fan failure. Replace CPU that has faulty fan. If a CPU fan is frozen, you can access
the RMC, but the system will not respond to a reset. If the CPU fan is open or
disconnected, the system powers off within 30 seconds.
Interpret the error beep codes at power-up for a failure detected during self-tests.
Check that the keyboard and monitor are properly connected and turned on.
If the power-up screen is not displayed, yet the system enters console mode when you
press return, check that the console environment variable is set correctly. If you are
using a VGA monitor as the console terminal, the console variable should be set to
graphics. If you are using a serial console terminal, the console variable should be set to
serial.
If a VGA controller other than the standard VGA controller is being used, ensure that the
VGA device conforms to the specified VGA legacy addressing. The P2P bridge will not
pass non-VGA legacy ISA addresses through to the secondary side. Also, the VGA BIOS
ROM must be readable by means of the standard PCI expansion ROM space.
If the console is set to serial mode, the power-up screen is routed to the COM1 serial
communication port and cannot be viewed from the VGA monitor. Try connecting a
console terminal to the COM1 serial communication port. If necessary, use an MMJ-to-9-pin
adapter (H8571-J). Check the baud rate setting for the console terminal and the
system. The system baud rate setting is 9600. When using the COM1 port, you must set
the console environment variable to serial.
If you suspect a firmware problem, use the fail-safe boot mechanism described later in this
chapter to load new console firmware from a diskette.
Console-Reported Failures
Symptom
Power-up tests do not
complete.
Solution
Use error beep codes or console serial terminal to determine what
error occurred and what FRU to replace.
Check the power-up screen for error messages
Interpret the error beep codes at power-up and check the power-up
screen for a failure detected during self-tests.
Use the error beep codes and/or console terminal to determine the
error.
Examine the console event log (enter the more el command) or the
power-up screen to check for embedded error messages recorded
during power-up.
If the power-up screen or console event log indicates problems with
mass storage devices, or if storage devices are missing from the
show config display, see SCSI Problems in this chapter.
If the power-up screen or console event log indicates problems with
PCI devices, or if PCI devices are missing from the show config
display, see PCI Problems in this chapter.
Use console commands such as test, show config, and more
el to verify the problem.
Boot Problems
Problem/Possible Cause: Installation fails with an "inaccessible boot dev" restart installation
message.
Action: If the ATAPI CD-ROM is not shown as a SCSI device, obtain the driver and setup file
from the Web page for Alpha systems and specify it as an additional controller.

Problem/Possible Cause: Operating system (OS) software is not installed on the hard disk drive.

Problem/Possible Cause: Target boot device is not listed in the SRM show device or show
config display because of a SCSI bus problem.
Action: Check the cables. Are the cables oriented properly and not cocked? Are there bent pins?
Check all the SCSI devices for incorrect or conflicting IDs. Refer to the device's documentation.
SCSI termination: The SCSI bus must be terminated at the end of the internal cable and at the
last external SCSI peripheral.

Problem/Possible Cause: System cannot find the boot device.
Action: Check the system configuration for correct device parameters. Use SRM firmware to
display the hardware configuration.
Use the SRM show config and show device
commands. Use the displayed information to identify
target devices for the boot command, and verify that
the system sees all of the installed devices. If you are
attempting to use bootp, first set the following
variables as shown:
P00>>>set ewa0_inet_init BOOTP
P00>>>set ewa0_protocols BOOTP
Problem/Possible Cause: System does not boot.
Action: Verify that no unsupported adapters are installed.

Problem/Possible Cause: Environment variables are incorrectly set. (This could happen if the
MLB has been replaced, which would cause a loss of the previous configuration information).
Action: Configuration information is stored in the flash ROM and RTC memory on the MLB. If
the MLB is replaced, the information in the flash ROM and real-time clock is lost. If the battery
is replaced, the information in RTC memory is lost. Keeping records of the configuration (IRQs,
DMAs, I/O addresses, and so on) will facilitate getting the system back into use.
Check and set the environment variables, if necessary.
Use the SRM console show and set commands to check and set the values assigned to
boot-related variables such as auto_action, bootdef_dev, and boot_osflags.

Problem/Possible Cause: System will not boot over the network.
Thermal Problems
The DS20E system operates in an ambient temperature range of 10°C to 40°C. The system unit is
cooled by as many as 10 fans. There are two enclosure fans in the rear of the enclosure, two fans
in each power supply, and one fan on each CPU. A minimum configuration would have seven
fans.
Intermittent problems could result from overheating. Check that the airflow path is clear. Make
sure nothing is blocking the input grill. Also check to see that the cables inside the system are
properly dressed. A dangling cable can impede airflow to the system.
Solution
Examine the crash dump file.
Refer to the Guide to Kernel Debugging (AA-PS2TD-TE)
for information on using the Tru64 UNIX Crash utility.
Floppy light illuminated; firmware corrupted. Follow instructions for creating firmware
update floppy disk. Update firmware to repair system.
CPU fan failure. Replace CPU that has bad fan. If a CPU fan is frozen, you can access the
RMC, but the system will not respond to a reset. If the CPU fan is open or disconnected,
the system powers off within 30 seconds.
Memory Problems
Symptom
DIMMs ignored by system, or system
unstable. System hangs or crashes.
Solution
Ensure each memory bank has identical DIMMs installed.
NOTE: Some third-party DIMMs may not be compatible with DS20E systems:
POST checks 24+ SDRAM DIMM performance parameters for each DIMM to ensure
system reliability.
SDRAM DIMMs come in many quality grades, which do not all meet the performance
requirements of these machines.
Confirm that the PCI option card is supported and has the correct firmware and software
versions.
2.
Confirm that the PCI option card and any cabling are properly seated.
3.
Check for a bad PCI slot by moving the last installed PCI controller to a different slot.
4.
All PCI devices must correctly handle PCI parity to enable checking feature.
2.
3.
4.
If the card must be used, try to disable PCI parity checking in SRM firmware.
5.
If the problem is not specific to the PCI option cards, replace CPU card or the system
board.
SCSI Problems
SCSI problems are generally manifested as:
Data corruption
Boot problems
Poor performance
No terminators in between.
Old 50-pin (narrow) devices must be connected with wide-to-narrow adapter (SNPBXKP-BA). Do not cable from the connector on the card.
Any external drives must be connected to their associated card, and these cards must have no
internal drives connected to them. Use a separate external controller card.
Tools
Error handling/logging tools
Description
Use error logs as the primary method of diagnosis and fault isolation. If the
system is up or you can bring it up, look here first.
ROM-based diagnostics
(RBDs)
Loopback tests
Firmware console commands Use SRM commands to set and examine environment variables and invoke
RBDs and exercisers.
Crash dumps
Crash dumps are created when the operating system hangs and is manually
halted by pressing and holding the Halt button for at least one second. The
SRM crash command provides a display of the crash dump that can be
used to determine why the system crashed.
The fail-safe booter allows you to restore console firmware that may have
become corrupted. Use the FSB when one of the following failures at
power-up prohibits you from getting to the console program:
Firmware image in flash memory corrupted
Power failure or accidental power-down during a firmware upgrade
Error in the nonvolatile RAM (NVRAM) file
Incorrect environment variable setting
Driver error
If the firmware image is unavailable when the system is powered on or reset, the FSB runs
automatically. When the FSB runs, the system emits a series of beeps through the speaker
as beep code 1-2-3; that is, one beep and a pause, followed by two beeps and a pause,
followed by three beeps.
1.
After the diskette activity light flashes, insert the FSB diskette named DP264SRM.ROM
that you created. (See the next section, Preparing Diskettes.)
2.
Reset the system to restart the FSB. The FSB loads the SRM console from the diskette.
3.
1.
2.
Set switch 1 (FSB) of SW2 on the main board to the on position (see the following
illustration).
3.
Insert the FSB diskette named DP264SRM.ROM that you created. (See the next section,
Preparing Diskettes.)
4.
5.
Preparing Diskettes
The required firmware for your system is preloaded onto the flash ROM. Copies of the firmware
files may be included on your distribution CD, in case you need to refresh the firmware. If they
are not included, you can download them from the Alpha OEM World Wide Web Internet site
at:
http://www.digital.com/alphaoem
Click on Technical Information, then click on Alpha Drivers and Firmware.
The utilities that are used to reload or update the firmware expect to find the files on a diskette,
so you need to prepare a diskette for each utility with the correct files from the CD or the Web.
For FSB: Copy the file PC264SRM.ROM onto a diskette, renaming it DP264SRM.ROM.
For Updating Firmware: Copy the file PC264SRM.ROM and the file PC264FW.TXT onto
a diskette.
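The two copy operations above can be sketched as a small script. This is only an illustration under stated assumptions: the function names are hypothetical, `source_dir` stands for wherever the files from the CD or the Web were saved, and `diskette` stands for the mounted diskette path on the machine preparing the media.

```python
import shutil
from pathlib import Path
from typing import List


def prepare_fsb_diskette(source_dir: Path, diskette: Path) -> Path:
    """Copy PC264SRM.ROM to the diskette under the name DP264SRM.ROM."""
    target = diskette / "DP264SRM.ROM"
    shutil.copyfile(source_dir / "PC264SRM.ROM", target)
    return target


def prepare_update_diskette(source_dir: Path, diskette: Path) -> List[Path]:
    """Copy PC264SRM.ROM and PC264FW.TXT to the diskette, names unchanged."""
    copied = []
    for name in ("PC264SRM.ROM", "PC264FW.TXT"):
        target = diskette / name
        shutil.copyfile(source_dir / name, target)
        copied.append(target)
    return copied
```

Note that only the FSB diskette uses the renamed file; the update diskette keeps the original file names.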
Updating Firmware
Be sure to read the information on starting the FSB and preparing diskettes before continuing
with this section.
At the Alpha SRM console prompt, issue the lfu command. This command invokes the
Loadable Firmware Update (LFU) utility.
Perform the following steps to update the console firmware. Refer to the example below.
1.
2.
Enter the device name dva0 when prompted for the location of the update files.
3.
Enter the filename PC264FW.TXT when prompted. Note that the LFU has already checked
the contents of the diskette and should provide PC264FW.TXT as the default.
PC264FW.TXT specifies which firmware is to be updated and passes the names of the files
that contain updated firmware.
4.
P00>>>lfu
Checking dka400.4.0.7.1 for the option firmware files. . .
Checking dva0 for the option firmware files. . .
Option firmware files were not found on CD or floppy.
If you want to load the options firmware,
please enter the device on which the files are located(ewa0),
or just press <return> to proceed with a standard console update: dva0
Please enter the name of the options firmware files list, or
Press <return> to use the default filename (pc264fw.txt) : pc264fw.txt
Copying PC264FW.TXT from dva0. . .
Copying PC264SRM.ROM from dva0. . .
***** Loadable Firmware Update Utility *****
------------------------------------------------------------------------------
 Function    Description
------------------------------------------------------------------------------
 Display     Displays the system's configuration table.
 Exit        Done exit LFU (reset).
 List        Lists the device, revision, firmware name, and update revision.
 Readme      Lists important release information.
 Update      Replaces current firmware with loadable data image.
 Verify      Compares loadable and hardware images.
 ? or Help   Scrolls this function table.
------------------------------------------------------------------------------
UPD> update
POST Sequence
The power-on self-test (POST) firmware is loaded into the CPU from the SROM at cold power-on
or after a hard reset. POST then performs the following functions:
Puts the CPU, Bcache, memory, and I/O into a state that can be used by the operating
system firmware.
Runs some basic hardware tests to ensure the operating system firmware can start:
Memory test
Cache test
PCI data path test
ISA data path test
Flash ROM test
Memory Testing
Ensures that each array consists of four identical DIMMs in every other socket.
Checks the parameters in the onboard ROM of each DIMM for compatibility with the
system.
Ensures that there are identical DIMMs in each bank and rejects the array if the test fails.
Performs the Checkerboard test to detect data path errors. The pattern is alternated with
each write to memory. The first 32 MB of the first array is tested.
Performs an addressing test to check all address lines in both arrays. The memory array is
rejected if memory testing fails. Rejected arrays are rendered inoperative and are not
reported to the operating system firmware. If no usable memory is detected, a beep code is
emitted.
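The idea behind a checkerboard data path test can be sketched as follows. This is a generic illustration, not the SROM code: the 64-bit patterns, the two-pass structure, and the function name are assumptions, and a Python list stands in for the memory under test.

```python
PATTERN_A = 0xAAAAAAAAAAAAAAAA  # 1010... bit pattern
PATTERN_B = 0x5555555555555555  # 0101... bit pattern (the inverse)


def checkerboard_test(memory: list) -> list:
    """Return the indexes of words that fail a two-pass checkerboard test.

    Pass 1 writes A, B, A, B, ... to successive words and reads them back.
    Pass 2 swaps the patterns so every bit is tested as both 0 and 1.
    """
    bad = set()
    for first, second in ((PATTERN_A, PATTERN_B), (PATTERN_B, PATTERN_A)):
        # Alternate the pattern with each write.
        for i in range(len(memory)):
            memory[i] = first if i % 2 == 0 else second
        # Read back and compare against what was written.
        for i in range(len(memory)):
            expected = first if i % 2 == 0 else second
            if memory[i] != expected:
                bad.add(i)
    return sorted(bad)
```

Alternating inverse patterns on adjacent words means a stuck or shorted data line shows up as a mismatch on at least one of the two passes.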
Cache Test
In this test, the data path integrity is tested. A thorough pattern test is performed on all Bcache
SRAM cells.
Beep Codes

Beeps: 1-2-3
Meaning of the Code: One beep and a pause followed by two beeps and a pause, followed by
three beeps. Indicates that the firmware in flash ROM is unavailable and fail-safe booter has
begun running.
Action to Repair: Update the firmware. See the procedure in the Fail-Safe Booter section of
this chapter.

Beeps: 4
Meaning of the Code: The header in the ROM is not valid.
Action to Repair: Replace the ROM.

Beeps: 6
The LED codes are generated in the order indicated in the following table. Notice that the 4
follows the 0, and that the F and E are ambiguous unless you are watching the sequence. They
have a different meaning the second time they appear.
Hex
Code  Meaning                              MSB            LSB
F     Starting console                      1    1    1    1
E     Initialized idle PCB                  1    1    1    0
D     Initializing semaphores               1    1    0    1
C     Initializing heap                     1    1    0    0
B     Setting heap base address             1    0    1    1
A     Setting memory low limit              1    0    1    0
9     Initializing driver structures        1    0    0    1
8     Initializing idle process PID         1    0    0    0
7     Initializing file system              0    1    1    1
6     Initializing timer data structures    0    1    1    0
5     Lowering IPL                          0    1    0    1
3     Create dead_eater                     0    0    1    1
2     Create poll                           0    0    1    0
1     Create timer                          0    0    0    1
0     Create power-up                       0    0    0    0
4     Entering idle loop                    0    1    0    0
F     Probing I/O                           1    1    1    1
E     Starting drivers                      1    1    1    0
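The relationship between a hex code and the four LEDs is plain binary, and the order of the codes matters for F and E. A minimal sketch, with hypothetical helper names and the sequence copied from the LED code table:

```python
# Power-up sequence in the order given by the LED code table.
POWER_UP_SEQUENCE = [
    ("F", "Starting console"), ("E", "Initialized idle PCB"),
    ("D", "Initializing semaphores"), ("C", "Initializing heap"),
    ("B", "Setting heap base address"), ("A", "Setting memory low limit"),
    ("9", "Initializing driver structures"),
    ("8", "Initializing idle process PID"),
    ("7", "Initializing file system"),
    ("6", "Initializing timer data structures"),
    ("5", "Lowering IPL"), ("3", "Create dead_eater"),
    ("2", "Create poll"), ("1", "Create timer"),
    ("0", "Create power-up"), ("4", "Entering idle loop"),
    ("F", "Probing I/O"), ("E", "Starting drivers"),
]


def led_states(hex_code: str) -> tuple:
    """Return (MSB, bit 2, bit 1, LSB); 1 means the LED is on."""
    value = int(hex_code, 16)
    if not 0 <= value <= 0xF:
        raise ValueError("expected a single hex digit")
    return tuple((value >> shift) & 1 for shift in (3, 2, 1, 0))


def stages_for(hex_code: str) -> list:
    """Return every stage that displays the given code, in sequence order."""
    code = hex_code.upper()
    return [meaning for c, meaning in POWER_UP_SEQUENCE if c == code]
```

Calling `stages_for("F")` returns two stages, which is the ambiguity the text warns about: a static F or E cannot be interpreted without having watched the sequence.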
Indication
5V OK LED ON and any of the
following are OFF:
CPU DC OK
2V OK
5V OK LED OFF
Hints
If the front panel power LED quickly changes from off to on, then off again after you
press the power button, there may be a short in the system.
If the power button seems to be flashing, the ACPI sleep mode is on. Push and hold the
system power button for more than four seconds.
1 indicates that the LED is on, and 0 indicates the LED is off.
D   Initializing semaphores
C   Initializing heap
B   Initial heap
A   Memory low limit
9   Initializing driver structures
8   Initializing idle process PID
7   Initializing file system
6   Initializing timer data structures
5   Lowering IPL - begin taking            Stuck interrupt, remove options one by
    interrupts                             one, MLB, CPU0, CPU1
4   Entering idle loop
3   Create dead_eater
2   Create poll
1   Create timer
0   Create powerup                         Check that DIMMs are identical within
                                           bank, seated properly, good.
F   Probing I/O                            Check for bad or unsupported options.
E   Starting drivers                       Remove options one by one, or
                                           disconnect storage cables to isolate
                                           problem.
RBDs run using console commands and rely on exerciser modules to isolate errors. They report
errors to the console terminal and/or the console event log. Exercisers run concurrently,
providing maximum bus interaction between the console drivers and the target devices. The
test, init, and more el commands are particularly useful in troubleshooting.
The test command requires a diskette in the floppy disk drive and loopback connectors on
COM2 and parallel port. The test command:
The init command causes a system restart and initialization. The firmware begins initializing
and testing the system and the console displays the test countdown.
The more el command displays the contents of the event log one page at a time.
Use the following command sequence:
P00>>> init
.
.
.
P00>>> test
P00>>> more el
This sequence of commands provides a convenient way to test the system for hardware errors. If
the tests that are run when the init command is executing fail, they are indicated during the
test countdown and an error message is displayed. If these tests do not find any errors, the test
command may find errors because it runs firmware diagnostics for the entire core system. Fatal
errors are reported to the console terminal.
Init Command
The init command resets the system.
P00>>>init
Initializing...
128 Meg of system memory
probing hose 1, PCI
probing hose 0, PCI
probing PCI-to-ISA bridge, bus 1
bus 0, slot 5, function 1 -- dqa -- Cypress 82C693 IDE
bus 0, slot 5, function 2 -- dqb -- Cypress 82C693 IDE
bus 0, slot 6, function 0 -- pka -- Adaptec AIC-7895
bus 0, slot 6, function 1 -- pkb -- Adaptec AIC-7895
bus 0, slot 7 -- vga -- ELSA GLoria Synergy
bus 0, slot 8 -- pkc -- NCR 53C895
bus 0, slot 9 -- ewa -- DE500-AA Network Controller
Testing the System
Testing the Memory
Testing the Disks (read only)
Testing the Network
System Temperature is 34 degrees C
initializing GCT/FRU at 1ec000
COMPAQ AlphaServer DS20E Console v5.5-9, Aug 31 1999 11:52:26
P00>>>
Test Command
The test command tests the entire system, a portion of the system (subsystem), or a specified
device. If no device or subsystem is specified, the entire system is tested. The command syntax
is:
t[est] [-write] [-nowrite "list"] [-omit "list"] [-t time] [-q] [dev_arg]
where:
-write specifies that data will be written to the specified device.
-nowrite specifies that data will not be written to the device specified in the "list".
-omit specifies that the devices in the "list" are not to be tested.
-t specifies the amount of time the test command is to run.
-q defines data size as a quadword (64 bits). All values default to 8 bytes.
<dev_arg> specifies the target device, group of devices, or subsystem to test.
For example:
P00>>> t pci0 -t 60
In this example, the test command tests all devices associated with the PCI0 subsystem. Test
run time is 60 seconds.
When a subsystem or device is specified, tests are executed on the associated modules first, then
the appropriate exercisers are run.
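When test commands are sent from a script over a serial console session, it can help to assemble the command line from named options. The sketch below is only an illustration: `build_test_command` is a hypothetical helper, not part of the console, and it emits the options in the order of the syntax line rather than the order used in the example above.

```python
def build_test_command(dev_arg=None, seconds=None, write=False,
                       nowrite=None, omit=None, quad=False):
    """Compose an SRM test command line from the documented options."""
    parts = ["test"]
    if write:
        parts.append("-write")
    if nowrite:
        parts.append('-nowrite "%s"' % nowrite)
    if omit:
        parts.append('-omit "%s"' % omit)
    if seconds is not None:
        parts.append("-t %d" % seconds)
    if quad:
        parts.append("-q")
    if dev_arg:
        parts.append(dev_arg)
    return " ".join(parts)


# Equivalent in intent to the PCI0 example: test pci0 for 60 seconds.
print(build_test_command("pci0", seconds=60))
```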
When viewing the scrolling log with cat el, use Ctrl/S and Ctrl/Q to pause and resume
display.
P00>>>more el
256 Meg of system memory
probing hose 1, PCI
bus 0, slot 7 -- pka -- NCR 53C895
probing hose 0, PCI
probing PCI-to-ISA bridge, bus 1
probing PCI-to-PCI bridge, bus 2
bus 0, slot 5, function 1 -- dqa -- Cypress 82C693 IDE
bus 0, slot 5, function 2 -- dqb -- Cypress 82C693 IDE
bus 0, slot 6, function 0 -- pkb -- Adaptec AIC-7895
bus 0, slot 6, function 1 -- pkc -- Adaptec AIC-7895
bus 2, slot 5 -- eia -- Intel 8255x Ethernet
resetting the SCSI bus on pka0.7.0.7.1
port dqa.0.0.105.0 initialized
port pka0.7.0.7.1 initialized, scripts are at 1d03c0
port dqb.0.1.205.0 initialized
device dqa0.0.0.105.0 (CD-224E) found on dqa0.0.0.105.0
device dka0.0.0.7.1 (SEAGATE ST39102LC) found on pka0.0.0.7.1
device dka100.1.0.7.1 (QUANTUM VIKING II 4.5SCA) found on pka0.1.0.7.1
device dka200.2.0.7.1 (QUANTUM VIKING II 4.5SCA) found on pka0.2.0.7.1
device dka300.3.0.7.1 (QUANTUM VIKING II 4.5SCA) found on pka0.3.0.7.1
environment variable aa_value_bcc created
environment variable aa_2x_cache_size created
environment variable mstart created
--More-- (SPACE - next page, ENTER - next line, Q - quit)
environment variable mend created
sense key = Unit Attention (29|02) from dka0.0.0.7.1
P00>>>
The showit command continually displays the status of currently running diagnostics. The
syntax is:
P00>>>showit
Showit Example
In this example, the showit command is terminated using Ctrl/C (^C) after two iterations. The
information displayed includes the process ID, device under test, passes completed, and error
counts (hard/soft).
P00>>>showit
ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001 idle         system            0    0    0             0             0
00000057 exer_kid     tta1              0    0    0             1             0
000001bc memtest      memory           47    0    0   12331253760   12331253760
000001f2 memtest      memory           29    0    0    7398752256    7398752256
00000229 exer_kid     dva0.0.0.0.0      0    0    0        158208        157696
ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001 idle         system            0    0    0             0             0
00000057 exer_kid     tta1              0    0    0             1             0
000001bc memtest      memory           48    0    0   12535726080   12535726080
000001f2 memtest      memory           29    0    0    7603224576    7603224576
00000229 exer_kid     dva0.0.0.0.0      0    0    0        164864        164864
^C
Show_status Example
The show_status command displays the status of currently running diagnostics. The syntax
is:
P00>>> show_status
Many diagnostics run in the background and display information only if an error is detected
during the testing. Use the show_status command whenever you need to display the
progress of these diagnostics.
In this example, show_status displays one line of information for each currently executing
diagnostic. This information includes the process ID, device under test, passes completed, and
error counts.
P00>>> show_status
ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001 idle         system            0    0    0             0             0
00000057 exer_kid     tta1              0    0    0             1             0
000001bc memtest      memory           47    0    0   12331253760   12331253760
000001f2 memtest      memory           29    0    0    7398752256    7398752256
00000229 exer_kid     dva0.0.0.0.0      0    0    0        158208        157696
P00>>>
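If a console session is captured to a file, the show_status display can be checked for nonzero error counts with a short script. This is a minimal sketch that assumes each diagnostic occupies one line with eight whitespace-separated fields (ID, program, device, passes, hard errors, soft errors, bytes written, bytes read); the names `DiagRow`, `parse_status`, and `failing` are hypothetical.

```python
from typing import List, NamedTuple


class DiagRow(NamedTuple):
    pid: str
    program: str
    device: str
    passes: int
    hard: int
    soft: int
    bytes_written: int
    bytes_read: int


def parse_status(text: str) -> List[DiagRow]:
    """Extract diagnostic rows, skipping prompts, headers, and rules."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue
        try:
            int(fields[0], 16)                    # process ID is hexadecimal
            numbers = [int(f) for f in fields[3:]]
        except ValueError:
            continue
        rows.append(DiagRow(fields[0], fields[1], fields[2], *numbers))
    return rows


def failing(rows: List[DiagRow]) -> List[DiagRow]:
    """Return the rows that report any hard or soft errors."""
    return [row for row in rows if row.hard or row.soft]
```

Comparing the pass counts and byte counts of two successive captures also shows whether a background diagnostic is still making progress.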
Read-only tests: DK* disks, DR* disks, DU* disks, MK* tapes, DV* floppy
VGA console tests, only if the console environment variable is set to serial.
NOTE: You must have a serial terminal if the console environment variable is set to
serial.
Use the test command with the following arguments for loopback testing or running
scripts:
Does this:
-lb
memory
serial1
network
parallel
floppy
disks
Type show_status or showit to display test progress. Type cat el to redisplay recent
errors.
kill <PID>    Ends a specific test process, where <PID> is the ID of that process.
Under some circumstances, you may need to change the system type. For example, you need to
change it if it was set incorrectly by Manufacturing. You should also check the setting if you
replace the server features module.
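The system type is set with one of the following deposit commands. A value of 00 selects the
server system type, and a value of 01 selects the workstation system type.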
P00>>> d -b iic_rcm_nvram0:11 00
P00>>> d -b iic_rcm_nvram0:11 01
If you need to reset the system type, use a procedure similar to the following. This example
changes the system type to a workstation.
1.
P00>>> d -b iic_rcm_nvram0:11 01
2.
Initialize the system and observe the console banner when power-up has been completed. In
this example, the system type is now a workstation.
P00>>>init
Initializing...
128 Meg of system memory
probing hose 1, PCI
probing PCI-to-PCI bridge, bus 2
bus 2, slot 0 -- pka -- NCR 53C875
bus 2, slot 1 -- pkb -- NCR 53C875
bus 2, slot 2 -- ewa -- DE500-AA Network Controller
probing hose 0, PCI
probing PCI-to-ISA bridge, bus 1
bus 0, slot 5, function 1 -- dqa -- Cypress 82C693 IDE
bus 0, slot 5, function 2 -- dqb -- Cypress 82C693 IDE
bus 0, slot 6, function 0 -- pkc -- Adaptec AIC-7895
bus 0, slot 6, function 1 -- pkd -- Adaptec AIC-7895
bus 0, slot 7 -- vga -- ELSA GLoria Synergy
bus 0, slot 8 -- pke -- QLogic ISP10x0
Testing the System
Testing the Memory
Testing the Disks (read only)
Testing the Network
System Temperature is 24 degrees C
initializing GCT/FRU at 1e2000
COMPAQ AlphaStation DS20E 666 MHz Console V5.6-3, Nov 29 1999 10:32:53
You can also examine the I²C ROM on the server features module to determine the system type.
The example below identifies the system as a workstation.
P00>>> e -b iic_rcm_nvram0:11
iic_rcm_nvram0:
11 01
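When the examine output is captured in a log, the system type byte can be decoded with a few lines of code. This sketch is an illustration only: the function name is hypothetical, the value 01 is taken from the workstation example above, and treating 00 as the server type is an assumption.

```python
# 01 -> workstation comes from the example; 00 -> server is an assumption.
SYSTEM_TYPES = {0x00: "server", 0x01: "workstation"}


def system_type_from_examine(output: str) -> str:
    """Decode the byte at offset 11 from 'e -b iic_rcm_nvram0:11' output."""
    for line in output.splitlines():
        fields = line.split()
        # The offset and value are the last two fields on the line.
        if len(fields) >= 2 and fields[-2] == "11":
            return SYSTEM_TYPES.get(int(fields[-1], 16), "unknown")
    raise ValueError("no value for offset 11 found")


print(system_type_from_examine("iic_rcm_nvram0:\n11 01"))
```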
http://www.digital.com/alphaserver/ds20e/index.html
Product Support Information Collection (ProSIC) for Alpha systems:
http://prosic.cxo.dec.com/SYSTEMS/systems.html#alpha
Supported options
http://www.digital.com/alphaserver/ds20e/options/asds20e_options.html
Configuration information and examples:
http://www.digital.com/info/alphaserver/configure.html
Compaq Systems and Options Catalog:
http://www.digital.com/info/SOHOME/SOHOMEHM.HTM
Chapter
Error Registers
Introduction
When diagnosing a DS20E system error, you may have to use the contents of one or more
system error or status registers to determine the specific cause of the error. This chapter presents
the registers that contain information critical to troubleshooting.
Topics in this chapter are:
Miscellaneous Register
Failure Register
Function Register
Field
Description
This bit can be written with a one or cleared. It indicates whether or not the
Icache encountered a parity error on instruction fetch. When a parity error is
detected, the Icache is flushed, a replay trap back to the address of the error
instruction is generated, and a correctable read interrupt is requested.
This read-only bit indicates a value (0-7) that must be subtracted from the
counter 0 result to obtain an accurate count of the number of instructions
retired in the interval beginning three cycles after the profiled instruction
reaches pipeline stage 2 and ending four cycles after the profiled instruction
is retired.
If the I_STAT[TRP] bit is set, this read-only bit indicates that the profiled
instruction caused a mispredict trap. JSR/JMP/RET/COR or
HW_JSR/HW_JMP/ HW_RET/HW_COR mispredicts do not set this bit but
can be recognized by the presence of one of these instructions at the PMPC
location with the I_STAT[TRP] bit set. This identification is exact in all cases
except error condition traps. Hardware corrected Icache parity or Dcache
ECC errors and machine check traps can occur on any instruction in the
pipeline.
If the profiled instruction caused a replay trap, this read-only bit indicates that
the precise trap cause was an Mbox load-store order replay trap. If clear, this
bit indicates that the replay trap was any one of the following:
Mbox load-load order
Mbox load queue full
Mbox store queue full
Mbox wrong size trap (such as STL → LDQ)
Field
Description
This read-only bit indicates that the profiled instruction caused a trap. The
trap type field, PMPC register, and instruction at the PMPC location are
needed to distinguish all trap types.
Replay
Invalid (unused)
Unaligned Load/Store
Dstream Fault
OPCDEC
10
Machine Check
11
12
Arithmetic
13
14
MT_FPCR
15
Reset
Traps due to ITB miss, Istream access violation, or interrupts are not reported
in the trap type field because they do not cause pipeline aborts. Instead,
these traps cause pipeline redirection and can be distinguished by examining
the PMPC value for the presence of the corresponding PALcode entry offset
addresses indicated below. In these cases, the ProfileMe interrupt will
normally be delivered when exiting the trap PALcode flow and the
EXC_ADDR register will contain the original PC that encountered the redirect
trap.
PC[14:0]   Trap
0581       ITB miss
0481       Istream Access Violation
0681       Interrupt
ICM (ProfileMe Icache Miss)
This read-only bit indicates that the profiled instruction was contained in an
aligned 4-instruction Icache fetch block that requested a new Icache fill
stream.
This read-only bit indicates a value (0-7) that must be subtracted from the
counter 0 result to obtain an accurate count of the number of instructions
retired in the interval beginning three cycles after the profiled instruction
reaches pipeline stage 2 and ending four cycles after the profiled instruction
is retired.
Field
Description
OPCODE
RA
BAD_VA
DTB_MISS
FOW
Set if reference was a write and the FOW bit of the PTE was set.
FOR
Set if reference was a read and the FOR bit of the PTE was set.
ACV
WR
Field
Description
SEO
Second error occurred. When set, this bit indicates that a second Dcache
store ECC error occurred within 6 cycles of the previous Dcache store ECC
error.
ECC_ER_LD
ECC error on load. When set, this bit indicates that a single-bit ECC error
occurred while processing a load from the Dcache or any fill.
ECC_ERR_ST
ECC error on store. When set, this bit indicates that an ECC error occurred
while processing a store.
TPERR_P1
Tag parity error, pipe 1. When set, this bit indicates that a Dcache tag probe
from pipe 1 resulted in a tag parity error. The error is uncorrectable and
results in a machine check.
TPERR_P0
Tag parity error, pipe 0. When set, this bit indicates that a Dcache tag probe
from pipe 0 resulted in a tag parity error. The error is uncorrectable and
results in a machine check.
Name
Description
C_STS[3:0]
C_ADDR[6:42]
Bits    Error Status
00000
00001
00010
00011   DSTREAM_MEM_ERR
00100   DSTREAM_BC_ERR
00101   DSTREAM_DC_ERR
0011X   PROBE_BC_ERR
01000   Reserved
01001   Reserved
01010   Reserved
01011   ISTREAM_MEM_ERR
01100   ISTREAM_BC_ERR
01101   Reserved
0111X   Reserved
10011   DSTREAM_MEM_DBL
10100   DSTREAM_BC_DBL
11011   ISTREAM_MEM_DBL
11100   ISTREAM_BC_DBL
Status of Block
Bits     7:4        3        2       1       0
Field    Reserved   Parity   Valid   Dirty   Shared
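The error status codes and block status bits listed above lend themselves to a simple lookup. The following Python sketch is illustrative only: the function names are hypothetical, codes 00000 through 00010 are left unnamed because their descriptions are not given in the table, and an X in a bit pattern is treated as a don't-care bit.

```python
# Error status patterns as listed in the table; "X" marks a don't-care bit.
C_STAT_PATTERNS = [
    ("00011", "DSTREAM_MEM_ERR"), ("00100", "DSTREAM_BC_ERR"),
    ("00101", "DSTREAM_DC_ERR"), ("0011X", "PROBE_BC_ERR"),
    ("01000", "Reserved"), ("01001", "Reserved"), ("01010", "Reserved"),
    ("01011", "ISTREAM_MEM_ERR"), ("01100", "ISTREAM_BC_ERR"),
    ("01101", "Reserved"), ("0111X", "Reserved"),
    ("10011", "DSTREAM_MEM_DBL"), ("10100", "DSTREAM_BC_DBL"),
    ("11011", "ISTREAM_MEM_DBL"), ("11100", "ISTREAM_BC_DBL"),
]

# Block status bit positions: 3 = Parity, 2 = Valid, 1 = Dirty, 0 = Shared.
BLOCK_STATUS_BITS = {"parity": 3, "valid": 2, "dirty": 1, "shared": 0}


def decode_error_status(code):
    """Return the error status name for a 5-bit code, or None if not listed."""
    bits = format(code & 0x1F, "05b")
    for pattern, name in C_STAT_PATTERNS:
        if all(p in ("X", b) for p, b in zip(pattern, bits)):
            return name
    return None


def decode_block_status(value):
    """Split the low four block status bits into named booleans."""
    return {name: bool((value >> bit) & 1)
            for name, bit in BLOCK_STATUS_BITS.items()}
```

For example, `decode_error_status(0b00110)` matches the `0011X` pattern and returns `PROBE_BC_ERR`.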
Miscellaneous Register
The miscellaneous register (MISC) is designed so that only writes of 1 affect it. When a 1 is
written to any bit in the register, there is no need to be concerned with read-modify-write or the
status of any other bits in the register. Once NXM is set, the NXS field is locked. It is unlocked
when software clears the NXM field. The ABW (arbitration won) field is locked if either ABW
bit is set, so the first CPU to write it locks out the other CPU. Writing a 1 to ACL (arbitration
clear) clears both ABW bits and both ABT (arbitration try) bits and unlocks the ABW field.
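The write-one semantics described above can be modeled in a few lines. This Python sketch is a software model only, not hardware access code: the class and function names are hypothetical, only the fields whose bit positions appear in the field table (ABW <19:16>, ACL <24>, ITINTR <7:4>) are modeled, and ABT is omitted because its bit position is not given here.

```python
ABW_SHIFT = 16
ABW_MASK = 0xF << ABW_SHIFT    # arbitration won, write 1 to set
ACL_BIT = 1 << 24              # arbitration clear, write only
ITINTR_SHIFT = 4
ITINTR_MASK = 0xF << ITINTR_SHIFT  # interval timer pending, write 1 to clear


class MiscModel:
    """Toy model of MISC: written zeros never change any bit."""

    def __init__(self):
        self.value = 0

    def write(self, data):
        # W1C: each 1 written to ITINTR clears that pending bit.
        self.value &= ~(data & ITINTR_MASK)
        # W1S: ABW is set only if no ABW bit is already set (field locked).
        if (data & ABW_MASK) and not (self.value & ABW_MASK):
            self.value |= data & ABW_MASK
        # ACL clears the ABW field, which also unlocks it.
        if data & ACL_BIT:
            self.value &= ~ABW_MASK

    def post_timer_interrupt(self, cpu):
        # Stands in for the hardware setting a pending bit.
        self.value |= 1 << (ITINTR_SHIFT + cpu)


def try_arbitration(misc, cpu):
    """Write this CPU's ABW bit and report whether it won."""
    misc.write(1 << (ABW_SHIFT + cpu))
    return bool(misc.value & (1 << (ABW_SHIFT + cpu)))
```

Because a write of 0 leaves every bit unchanged, each CPU can write only its own bit without a read-modify-write sequence, which is the point the paragraph above makes.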
Field
Description
DEVSUP
REV
NXM
NXS
ACL
<24> WO 0 Arbitration clear. Writing a 1 to this bit clears the ABT and ABW
fields.
ABT
ABW
<19:16> R, W1S 0 Arbitration won. Writing a 1 to these bits sets them unless
one is already set, in which case the write is ignored.
IPREQ
IPINTR
<11:8> Interprocessor interrupt pending one bit per CPU. Pin irq<3> is
asserted to the CPU corresponding to a 1 in this field.
ITINTR
<7:4> R, W1C 0 Interval timer interrupt pending one bit per CPU. Pin
irq<2> is asserted to the CPU corresponding to a 1 in this field.
CPUID
Field
Description
ERR
NXS
Field
Description
INV
<51> Info Not Valid. This bit is meaningful when one of the bits <11:0> is
set. This bit indicates the validity of SYN, CMD, and ADDR bits.
Valid = 0, Invalid = 1.
CMD
<55:52> This field represents the PCI command when the error occurred if
the error is not a correctable ECC error (CRE) or uncorrectable ECC error
(UECC).
If the error is a CRE or UECC, then the value of this field is defined as
follows:
Value
Command
0000
DMA read
0001
DMA read-modify-write
0011
SGTE read
Others
Reserved
SYN
RES
<15:12> Reserved.
CRE
UECC
RES
<15:12> Reserved.
NDS
RDPE
TA
APE
SGE
DCRTO
PERR
SERR
LOST
<0> An error was lost because it was detected after this register was frozen,
or while in the process of clearing this register.
Failure Register
The failure register, located on the I2C bus, is locked when there is a power supply or fan failure.
Together with the function register, fan and power supply failures are identified and reported to
the operating system thus notifying it that the system will shut down in 30 seconds. The results
of reading this register are displayed by the SRM show power command.
Field
Description
Reserved
<0>
C/SFAN0_L
<1> When set, this bit indicates that either the system fan 0 or the fan on the
heatsink on CPU0 failed. Which of the two fans failed is determined by the
state of SYSFAN_OK and CPUFANS_OK in the function register.
Reserved
<2>
Reserved
<3>
PS1_PRESENT_L
C/SFAN1_L
<5> When set, this bit indicates that either the system fan 1 or the fan on the
heatsink on CPU1 failed. Which of the two fans failed is determined by the
state of SYSFAN_OK and CPUFANS_OK in the function register.
PS2_PRESENT_L
PS0_PRESENT_L
Function Register
The function register generates an interrupt on the I2C bus if one of the critical functions
monitored (power, temperature, fan operation) goes beyond predetermined limits. When such an
interrupt is generated, the contents of bits <0, 1, 2, and 5> in the failure register are frozen. The
system shuts down 30 seconds after the interrupt is posted. The results of reading this register
are displayed by the SRM show power console command.
Field
Description
TEMP_OK
<0> When set, this bit indicates that the temperature inside the system
enclosure is below the temperature limit.
SYSFAN_OK
<1> When this bit is zero, C/SFAN0_L and C/SFAN1_L indicate which
system fan failed. When set to one, this bit indicates that the system fans
are functioning properly.
Reserved
<2>
CPUFANS_OK
<3> When this bit is 0, C/SFAN0_L and C/SFAN1_L indicate which CPU
fan failed. When set, this bit indicates that the fans on CPU heatsinks are
functioning properly.
Reserved
<4>
PS0_FAIL
<5> When set, this bit indicates that power supply 0 has failed. This bit is
valid only if the corresponding PS0_PRESENT_L bit is 0.
PS1_FAIL
<6> When set, this bit indicates that power supply 1 has failed. This bit is
valid only if the corresponding PS1_PRESENT_L bit is 0.
PS2_FAIL
<7> When set, this bit indicates that power supply 2 has failed. This bit is
valid only if the corresponding PS2_PRESENT_L bit is 0.
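The two registers are meant to be read together, which a short decoder makes concrete. The following Python sketch is a hedged illustration: the function names are hypothetical, the bit positions are the ones listed in the field tables above, and the set of installed power supplies is passed in by the caller because the bit positions of the PSn_PRESENT_L fields are not listed here.

```python
# Function register bit positions, as listed in the field table.
FUNCTION_BITS = {
    "TEMP_OK": 0, "SYSFAN_OK": 1, "CPUFANS_OK": 3,
    "PS0_FAIL": 5, "PS1_FAIL": 6, "PS2_FAIL": 7,
}
# Failure register: fan index -> bit (C/SFAN0_L is <1>, C/SFAN1_L is <5>).
FAILURE_FAN_BITS = {0: 1, 1: 5}


def decode_function_register(value):
    """Return each named function register bit as a boolean."""
    return {name: bool((value >> bit) & 1)
            for name, bit in FUNCTION_BITS.items()}


def describe_faults(function_value, failure_value, supplies_present=(0, 1, 2)):
    """List the faults implied by the two register values."""
    flags = decode_function_register(function_value)
    faults = []
    if not flags["TEMP_OK"]:
        faults.append("temperature is not below the limit")
    for fan, bit in FAILURE_FAN_BITS.items():
        fan_flagged = bool((failure_value >> bit) & 1)
        # SYSFAN_OK and CPUFANS_OK select which fan the shared bit refers to.
        if fan_flagged and not flags["SYSFAN_OK"]:
            faults.append(f"system fan {fan} failed")
        if fan_flagged and not flags["CPUFANS_OK"]:
            faults.append(f"CPU{fan} heatsink fan failed")
    for supply in supplies_present:
        # A PSn_FAIL bit is only meaningful for a supply that is present.
        if flags[f"PS{supply}_FAIL"]:
            faults.append(f"power supply {supply} failed")
    return faults
```

On a running system the same information is reported by the SRM show power command; the sketch only shows how the bits combine.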
Chapter
OS Diagnostics Overview
Introduction
Each operating system supports tools and has features that can assist you in troubleshooting.
The following tools are described in this chapter:
DEC VET
Machine Checks
Compaq Analyze
Crash dumps
Crash Dumps
Under the Tru64 UNIX operating system, you can initiate a crash dump and analyze a system-generated crash dump. For example, if the system is hung (no response from keyboard, mouse,
or network), press the Halt button for a second. The system should exit to the SRM console. If
the system crashes, it exits UNIX and writes a crash dump to disk. To display the crash dump,
enter crash at the SRM console prompt.
DEC VET
The DIGITAL Verifier Exerciser Tool (DEC VET) is an application that is used to verify a
system installation and to exercise the components of the system.
DEC VET provides both a graphical user interface and a command-line interface. Both
interfaces allow a user to exercise the hardware and software in the same way for any system on
a network, regardless of the operating system of the remote system.
Because DEC VET runs on an operating system, it can be used as the first level of testing when
troubleshooting a system. You can run DEC VET to exercise one or more system components
without having to shut down the operating system. This may be an advantage when
troubleshooting a system if the customer does not want to shut down the operating system just
to test the system.
DEC VET has a generic set of exercisers that can be used to test the installation of hardware and
base operating system software. The DEC VET exercisers can be configured to test a single
device or to exercise all the devices on a system simultaneously. With this tool, you can:
This exerciser:
Does this:
CPU
Tests system processor functions, including binary operations, integer and
floating-point computations, and data conversion.
Memory
Dynamically allocates and deallocates virtual memory; writes and verifies test
patterns.
Disks
Tests both logical and physical disk I/O by performing read and write
operations. Verifies the test patterns written to the disks.
Files
Writes to and reads from disk files and verifies the test patterns written.
Tape
Writes to and reads from tape device files and verifies the test patterns written.
The operations include file mark detection, spacing, rewinding, and end-of-tape detection.
Network
Tests the underlying protocols, physical network adapters, local and remote
networks, destination adapters, network services, and echo daemons. Both
TCP/IP and DECnet networks are supported.
Printer
Prints out a file containing a test pattern of all the ASCII characters from " "
(blank space) to "~" (tilde). This pattern is shifted one character to the right on
each subsequent line. Enough lines are printed to verify that all the ASCII
characters can be printed at each position. Other tests are available to test
PostScript output.
Terminal
Displays to the terminal screen a file containing a test pattern of all the ASCII
characters from " " (blank space) to "~" (tilde). This pattern is shifted one
character to the right on each subsequent line. Enough lines are displayed to
verify that all the ASCII characters can be displayed at each position.
Video
Displays several video test patterns and graphics. These verify the console's
ability to display graphics, text, and shades of color accurately.
Machine Checks
Machine checks are usually associated with hardware error conditions. Machine checks can
represent:
Correctable errors do not usually affect system operation. Uncorrectable errors usually result in
a system crash.
When a system error is detected, the PALcode usually classifies it as a machine check. PALcode
collects error information from module control and status registers and formats it into a logout
frame that is passed to the operating system. The operating system uses the information in the
logout frame to determine the action to take when an error occurs. Some errors are fatal; they
can cause the entire system or a specific process to fail. Other errors can be corrected and do not
halt processing. The operating system writes the error information in an entry in a binary file
that can be used to produce an error log. Most of the errors occur during the transmission of
commands or data along the system bus or in buses or storage internal to a particular module.
In handling errors, the PALcode is responsible for parsing the exception and building the
machine check logout frame. During error checking, the following activities take place:
Control passes to the operating system through the system control block (SCB)
Operating System
The operating system (OS) machine check handler is responsible for the following actions:
Executing fault analysis using the saved machine check frame and context
The operating system logs events and errors that occur while the system is running. You can use
the information in these event logs to help troubleshoot system problems.
The error handlers in an operating system generate entries in the binary system error log. All
error log entries are written immediately, except for correctable memory errors. The size of an
error log entry depends on the type of error that occurs and the error handling mechanisms used
to log it.
Common OS Header
The common operating system header is a segment of the error log entry for systems using the
Alpha 21264 (EV6/EV67) processor. This header is used by the error analysis tools to:
Identify events
Dispatch errors
This header contains similar information for all supported operating systems. However, some of
the fields contain different information depending on the operating system.
The error logout frame includes the values of several system and/or processor registers, which
can be used to determine the cause of the error.
Termination Block
Each error log entry must be terminated with an error event termination block. This block tells
the software reading the error log entry that it has reached the end of the entry.
Error Classes
Four classes of errors are handled by the system bus:
Soft errors, hardware corrected, transport to the software (for example, single-bit ECC
errors).
Hard errors restricted to the failing transaction (for example, a double-bit error in a
memory location in a user's process. This would result in the process being aborted and
the page being marked as bad). The system can continue operation.
System fatal hard errors. The system integrity has been compromised and continued
system operation cannot be guaranteed. All outstanding transactions are aborted, and the
state of the system is unknown.
Error Types
Six types of machine checks can occur in DS20E systems.
System-detected correctable errors (SCB 620)
This type of machine check is caused by correctable single-bit errors that
occur in the system. These errors are detected by the crossbar P-chip and are
typically correctable read data (CRD) errors. Possible causes for this type of
This is known as a 630 machine check for the associated SCB number.
Environmental correctable errors (SCB 680)
This type of machine check is caused either by a system-detected hardware
failure or by an environmental condition. The system may recover from these
conditions if there is redundant hardware present (for example, a redundant
power supply). Possible causes for this type of machine check include:
Power supply failure error
Overtemperature warning condition
This is known as a 680 correctable machine check for the associated SCB
number.
System uncorrectable errors (SCB 660)
This type of machine check is caused by a system-detected or processor-detected
uncorrectable error. These errors are the result of a request that was
made external to the processor. These errors may cause the machine to
crash. Possible causes for this type of machine check include:
Nonexistent memory reference error
Fatal PCI error
PCI data parity error
This is known as a 660 machine check for the associated SCB number.
Processor uncorrectable errors (SCB 670)
Environmental uncorrectable errors (SCB 680)
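The SCB numbers named in the headings above can be kept in a small lookup when sorting error log entries by type. This Python sketch is illustrative only: the names are hypothetical, the keys are kept as strings exactly as printed, and only the SCB numbers that appear in the headings are included.

```python
# SCB number -> machine check type, taken from the headings above.
MACHINE_CHECK_TYPES = {
    "620": "system-detected correctable error",
    "660": "system uncorrectable error",
    "670": "processor uncorrectable error",
    # 680 is shared by the correctable and uncorrectable environmental types.
    "680": "environmental error (correctable or uncorrectable)",
}


def machine_check_type(scb):
    """Return the machine check type for an SCB number, or 'unknown'."""
    return MACHINE_CHECK_TYPES.get(str(scb), "unknown")
```

Because SCB 680 covers both environmental types, the SCB number alone does not say whether an environmental error was correctable.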
The r in the table below, when set, indicates that the error is retryable.
The s in the table below, when set, indicates that this is the second error.
The offset from the CEF below is shown for reference only.
63  62  61           32 31                 0   Offset from MCLF   Offset from CEF
r   s   SBZ             Frame Size             000h               068h
System Offset           EV6/EV67 Offset        008h               070h
                        MCHK Code              010h               078h
008h
010h
018h
020h
028h
030h
038h
040h
048h
050h
EV6 Reserved 0
058h
060h
068h
070h
EV6 Reserved 1
078h
EV6 Reserved 2
080h
Offset from
System Data
MCLF
000h
0A0h
008h
0A8h
010h
0B0h
018h
0B8h
020h
0C0h
Tsunami/Typhoon Reserved 1
028h
0C8h
Tsunami/Typhoon Reserved 2
030h
0D0h
Tsunami/Typhoon Reserved 3
038h
0D8h
Tsunami/Typhoon Reserved 4
040h
0E0h
CEF
The SRM console command, show hwrpb, returns the physical address of the Hardware
Restart Parameter Block (HWRPB). That address contains its own physical address.
HWRPB offset A0h contains an offset relative to the beginning of the HWRPB that points
to the Per-CPU Slot data structure for the DS20E system.
Offset D8h from the beginning of the Per-CPU Slot information contains the physical
address of the Corrected Error Frame (CEF).
Offset 00h of the CEF contains its length. Since the MCLF immediately follows the
Corrected Error Frame, add this value to get the address of the MCLF.
Command                    Description
P00>>> show hwrpb
HWRPB is at 2000           <- physical address of HWRPB
P00>>> e -p 20a0           2000+A0
PMEM: 20A0 000001C0        <- offset of Per-CPU Slot
P00>>> e -p 2298           2000+1c0+D8
PMEM: 2298 00006000        <- physical address of CEF
P00>>> e -p 6000           6000
PMEM: 6000 00000068        <- offset to MCLF
P00>>> e -p 6078           6000+68+10
PMEM: 6078 00000098        <- machine check code
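The address arithmetic in this procedure can be checked offline. The following Python sketch is a minimal illustration, not a console tool: the function name and the `read` callback are hypothetical, and the memory contents are simply the values shown in the console example above.

```python
HWRPB_SLOT_OFFSET_FIELD = 0xA0  # HWRPB field holding the Per-CPU Slot offset
SLOT_CEF_ADDRESS_FIELD = 0xD8   # Per-CPU Slot field holding the CEF address
MCHK_CODE_OFFSET = 0x10         # machine check code within the MCLF header


def locate_frames(read, hwrpb):
    """Follow the pointers from the HWRPB to the CEF and the MCLF.

    `read` is any callable that returns the value stored at a
    physical address (a stand-in for the console examine command).
    """
    slot = hwrpb + read(hwrpb + HWRPB_SLOT_OFFSET_FIELD)
    cef = read(slot + SLOT_CEF_ADDRESS_FIELD)
    mclf = cef + read(cef)  # the MCLF immediately follows the CEF
    return {
        "per_cpu_slot": slot,
        "cef": cef,
        "mclf": mclf,
        "mchk_code_address": mclf + MCHK_CODE_OFFSET,
    }


# Values taken from the console example above.
example_memory = {0x20A0: 0x1C0, 0x2298: 0x6000, 0x6000: 0x68, 0x6078: 0x98}
frames = locate_frames(example_memory.__getitem__, 0x2000)
```

With the example values, the MCLF is found at 6068 and the machine check code at 6078, matching the last examine command in the table.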
Frame Size
Provides the physical size of the entire logout error frame in bytes
Second Flags
If bit <63> is set to 1, a retryable error has occurred. If bit <63> is set to 0, a nonretryable error, typically fatal in nature, has occurred.
Bit <62> is the second error flag. If bit <62> is set to 1, a second error has
occurred. If it is set to 0, only one error has occurred.
EV6/EV67
Offset
Offset from the base address of the frame to the start of 21264 internal
processor registers error content information.
System Offset
Offset from the base address of the frame to the start of the system-specific
diagnostic, control, status, or error content information.
MCHK Code
MCHK Frame
Revision
EV6/EV67
xxx xxx xxx
Software Error
CChip, PChip
TIG
Specifically refers to a Timing, Interrupt and General bus (TIG) Controller Chip
system diagnostic, control, status, or error registers.
Xxx
Reserved.
Yy
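The header fields described above can be unpacked from the first three quadwords of a logout frame. This Python sketch rests on stated assumptions: the function name is hypothetical, the flags and frame size are taken to share the quadword at offset 000h, the system offset is taken to occupy the upper half and the EV6/EV67 (CPU) offset the lower half of the quadword at 008h, and the machine check code is taken from the lower half of the quadword at 010h.

```python
LOW32 = 0xFFFFFFFF


def decode_frame_header(qw0, qw1, qw2):
    """Unpack the three header quadwords of a logout frame."""
    return {
        "retryable": bool((qw0 >> 63) & 1),     # bit <63>, the r flag
        "second_error": bool((qw0 >> 62) & 1),  # bit <62>, the s flag
        "frame_size": qw0 & LOW32,              # size of the frame in bytes
        "system_offset": (qw1 >> 32) & LOW32,   # start of system data
        "cpu_offset": qw1 & LOW32,              # start of EV6/EV67 data
        "mchk_code": qw2 & LOW32,               # machine check code
    }


# Illustrative values only; they do not come from a real error log.
header = decode_frame_header((1 << 63) | 0xE8, (0xA0 << 32) | 0x18, 0x98)
```

The two offsets are relative to the base address of the frame, so adding them to the frame address gives the start of the CPU and system data areas.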
Frame_Flags
System_Area_Offset
Frame_Rev
SW_Sum_Flags
Cchip_DIR
Environ_QW_1
[64:8]MBZ
Environ_QW_2
[64:8]MBZ
Environ_QW_3 (Reserved)
Environ_QW_4 (Reserved)
Environ_QW_5 (Reserved)
Environ_QW_6 (Reserved)
Environ_QW_7 (Reserved)
Environ_QW_8 (Reserved)
Environ_QW_9 (Reserved)
Frame_Size
CPU_Area_Offset
Mchk_Error_Code
The r in the table below indicates that the error is retryable when set.
The s in the table below indicates that this is the second error.
63  62  61           32 31                 0   Offset(Hex)
r   s   SBZ             Frame Size             000h
System Offset           CPU Offset             008h
                        MCHK Code              010h
Offset from
CPU
Data
000h
018h
008h
020h
010h
028h
018h
030h
020h
038h
028h
040h
030h
048h
038h
050h
CEF
Offset from
System Data
CEF
000h
058h
008h
060h
010h
068h
018h
070h
020h
078h
Tsunami/Typhoon Reserved 1
028h
080h
Tsunami/Typhoon Reserved 2
030h
088h
Tsunami/Typhoon Reserved 3
038h
090h
Tsunami/Typhoon Reserved 4
040h
098h
Compaq Analyze
Compaq Analyze is an error analysis and reporting tool for systems using the Alpha 21264
(EV6/EV67) processors with the Tsunami chipset. Compaq Analyze is intended as the successor
to the DECevent utility.
Compaq Analyze is designed to be used by:
System managers
Service engineers
Compaq Analyze runs on all operating systems supported by DS20E systems. Compaq Analyze
contains sets of rules (analysis rulesets) that are used to analyze errors in the event log based on
input from the FRU table. The rules contain knowledge about the possible causes of errors in the
system.
1. The system event log is the source of system event information. When one or more events
are logged, they are sent to the Decomposer for translation.
2. The Decomposer performs the bit-to-text translation of the events sent from the system
event log. The Director routes the event and data packets among the different services in
the Compaq Analyze system.
3. The Director sends the translated event from the Decomposer to the Analysis Engine. The
Analysis Engine can operate on multiple events at a time.
4. The Analysis Engine consults the ruleset database to see if one of the rules applies to the
event. Not all events will cause a rule to indicate that a problem has occurred. If a rule
indicates that an error has occurred, the Analysis Engine sends the analysis results to the
Director.
5. The Director sends the analysis results to the graphical user interface (if it is running) and
to the Notification Service.
When an error is detected, it is reported to the console with a series of Problem Found
statements.
Chapter
Introduction
As a service engineer, you often need to add components, upgrade the system, or remove and
replace faulty FRUs to restore the system to error-free operation. Becoming familiar with the
procedures for FRU removal and replacement can minimize the time you spend upgrading or
repairing the system.
In addition to precautions to follow before beginning any procedure, this chapter includes the
removal and replacement procedures for all major system FRUs (listed below) as well as
information on system cabling.
Side cover
PCI/ISA options
Storage subsystem
DIMMs
System board
Fans
Battery
Power supply
Speaker
Precautions
Before beginning FRU removal and replacement procedures, be sure to:
Put on your electrostatic discharge (ESD) wrist strap to avoid damaging any circuitry.
Side Cover
To remove the side cover from the system:
1.
2.
3.
4.
5.
6. Slide the side cover back, and then out and away from the enclosure.
PCI/ISA Options
To remove a PCI/ISA option card:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Unplug all external cables connected to the PCI/ISA option card.
6. Remove the screw securing the PCI/ISA option card to the rear of the chassis.
7. Gently pull the PCI/ISA option card out of the system board socket.
Reverse steps 1 through 7 to replace the PCI/ISA option card.
Storage Subsystem
To remove a storage subsystem:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Remove all hot plug hard drives.
6. Remove the four screws securing the cage to the enclosure.
7. Loosen the two captive screws on the power supply FCC door and swing the door open.
8. Slide the storage subsystem forward to gain access to the back of the subsystem.
9. Unplug the cables from the back of the storage subsystem.
10. Remove the four screws securing the storage subsystem to the chassis.
11. Pull the storage subsystem out and away from the chassis.
12. If necessary, remove the six screws securing the storage subsystem backplane to the
storage subsystem.
13. Pull the storage subsystem backplane away from the storage subsystem.
Reverse steps 1 through 11 to replace the storage subsystem.
System Board
Removing the system board requires the removal of other FRUs. Review the removal
procedures for the items listed in steps 1 through 11 before you begin.
1. Record the configuration information.
2. Shut down the system and all peripheral devices.
3. Turn the system power off.
4. Unplug the power cord.
5. Remove the side cover.
6. Remove all PCI/ISA options.
7. Remove the CPU daughter cards.
8. Remove the CPU guide brackets.
9. Remove the CPU card guides.
10. Unplug all cables connected to the system board.
11. Remove the storage subsystem.
12. Pull all power cables out of the system board compartment.
13. Remove the screws securing the system board to the chassis.
14. Tip the system board to allow the serial and parallel port connectors to clear the opening.
15. Tip the board from the PCI end, and carefully slide it out and away from the chassis.
Reverse steps 1 through 15 to replace the system board.
DIMMs
To remove DIMMs, you may need to remove a CPU daughter card.
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Remove the CPU daughter card, if necessary.
6. To release a DIMM from the system board, press down on the two latches (one at each
side of the DIMM).
7. Gently pull the DIMM out of the system board socket.
To install a replacement DIMM:
1. Orient the key notches, and then insert the DIMM straight into the socket.
2. Press down firmly until both retaining levers engage the DIMM.
3. Replace the side cover and power cord.
4. Turn on the power to the system.
NOTE: Follow the Memory Configuration Rules when installing DIMMs.
Battery
CAUTION: Take care not to bend the battery hold-down spring when removing or
replacing the battery. A bent spring could result in intermittent system problems due to
poor contact with the battery.
1.
2.
3.
4.
5.
6. Gently pull out on the tab and then hold it open to release the system battery from the
chassis.
7. Carefully slide the battery out and away from its holder.
8.
Fans
To remove a fan:
1. Loosen the captive screws securing the system fan to the rear of the chassis.
2. Pull the system fan out to unplug it from its power socket and pull it away from the
chassis.
Reverse steps 1 and 2 to replace the fan.
Speaker
To remove the speaker:
1. Shut down the system and all peripheral devices.
2. Turn the system power off.
3. Unplug the power cord.
4. Remove the side cover.
5. Unplug the speaker cable from the system board.
6. Gently route the speaker cable up and through the interior of the chassis.
7. With the speaker cable free, slide the speaker toward the front of the system and then pull
it away from the chassis.
Reverse steps 1 through 7 to replace the speaker.
Power Supply
To remove a power supply:
1. For a dual power supply configuration, complete the preparation procedures. If you have
an N+1 power configuration (three power supplies), you do not need to turn off the power
for a hot plug power supply replacement.
2. Loosen the thumbscrews securing the power supply grid, and remove the grid.
3. Loosen the thumbscrew on the power supply handle, and then pull it down to release it
from the power supply backplane.
4. Using the handle, pull the power supply from the system.
Reverse steps 1 through 4 to replace the power supply.
2.
3.
4.
5.
6. Remove the screw securing the side cover interlock to the chassis.
7. Pull the side cover interlock out and away from the chassis.
2.
3.
4.
5.
6. Gently pry the server features module off the snap standoffs.
P00>>> e -b iic_rcm_nvram0:11
System Cabling
Data and power cables for the DS20E system include those attached to SCSI devices, IDE
devices, the floppy drive, the power supply, the front cover, and the server features module.
SFM2 to MLB
Speaker Cable
SFM2 to OCP
Side Cover Interlock Cable
IDE
Floppy
2-5-2 Part
Number
17-04909-01
17-03971-01
17-04971-01
17-04678-02
17-03970-04
Source
Destination
MLB J37
MLB J39
SFM2 J3
Interlock
MLB J45
MLB J43
SFM2 J5
Speaker
OCP J1
SFM2 J2
Removable Media Drive J1
Removable Media Drive J3
PS Cable
Connector
J3 P1
J5 P3
J1 P5
J2 P5
J8 P7
J4 P9
J7 P13
SFM2 J6, J7
Destination
MLB J3 P2
MLB J33 P4
CPU0 J1 P6
CPU1 J1 P6
SFM2 J1 P8
MLB J4 P10
Storage
Subsystem
Removable Media
Drive J2
SCSI Drive Board
J1 P11
J6 - Fan0, J7 Fan1
2-5-2
Part Number
17-04901-01
17-04902-01
17-04903-01
17-04903-01
17-04904-01
17-04907-01
17-04908-01
Description
3V Power
5V Power
CPU0 Power
CPU1 Power
SFM2 Power
3/5V Sense
Storage Power
RMD Power
SCSI Power
17-04905-01
Fan Power
Chapter
10
Introduction
Compaq Insight Manager (CIM) is a comprehensive management tool used to monitor and
control the operation of Compaq Alpha-based servers.
Topics in this chapter are:
Overview
10-2
Overview
Compaq Insight Manager consists of two components:
SQL database
Browser console
Notification
Control
Polling
The Agents consist of several sub-agents that report the health and status of various subcomponents of a managing device. Compaq Insight Manager XE can discover devices outside
the specified IP range due to an HTTP auto-discovery of a web-enabled agent on the network.
Management Tasks
Security
By default when Compaq Insight Manager XE is initially installed, an administrator account is
created with a password of administrator. Change this immediately on the accounts page of the
Administer Insight Manager XE menu.
Event Manager
Event management is accomplished by creating categories for logical groupings of devices. The
groups are polled to check their status. This information is stored in the SQL database.
Three tasks are associated with managing events:
Notification
Control
Polling
After these tasks are configured, event information is collected and stored, and notifications are
sent as needed. Any control task, such as launching an application to virus-check the system,
becomes automated. The administrator sets the frequency and level of the event thresholds. For
example, if a disk reaches 80% capacity, the administrator might want an alert generated so that
the disk can either be purged or taken off-line.
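The disk capacity example can be expressed as a simple threshold check. This Python sketch only illustrates the idea of an administrator-set threshold; it is not part of Compaq Insight Manager, and the function name, the input format, and the device names are hypothetical.

```python
def disks_needing_alert(usage_percent, threshold=80.0):
    """Return the names of disks at or above the capacity threshold.

    `usage_percent` maps a disk name to its used capacity in percent.
    """
    return sorted(name for name, used in usage_percent.items()
                  if used >= threshold)


# Hypothetical polling result for three disks.
alerts = disks_needing_alert({"dka0": 42.0, "dka100": 80.0, "dka200": 91.5})
```

A polling task would run a check like this at the configured frequency and hand the resulting list to the notification task.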
Device Manager
Compaq Insight Manager XE can monitor and manage network devices that are running
Compaq Insight Agents. These agents provide real-time status information about the hardware
and software on each node. Insight Agents allow you to perform such functions as remote
shutdown and remote configuration.
Network devices that are unable to use Insight Agents do not report as many details. These
devices can be remotely monitored, but not remotely managed.
Requirement
All hardware must be on the Microsoft hardware compatibility list
96MB RAM with SQL Server running on a remote system
25MB for the master SQL database
100MB for the Insight Device database
200MB for the Insight Device log
Operating System
Microsoft Windows NT Server 4.0 with Service Pack 3 or later
Compaq SSD 2.08 for Windows NT 4.0
Installed Services
TCP/IP, SNMP, IPX installed for management of IPX devices
Internet Browser
Internet Explorer 4.01 with Service Pack 1 or later
Relational Database
Microsoft SQL Server 6.5 with Service Pack 4 or SQL 7.0
(can be installed on a separate server)
Index
A
Accessories, 3-10
Addressing considerations, 4-9
AlphaBIOS
running from a serial terminal, 5-25
starting, 5-22
AlphaBIOS console, 5-2
AlphaServer DS20E system type, 6-30
AlphaStation DS20E system type, 6-30
Architecture, 2-2
B
Battery
removing, 9-14
Bcache, 2-3
Bcache configuration, 4-5
Bcache interface, 2-5
Beep codes, 6-18
Bezel, attaching, 3-25
boot command, 5-19
Boot problems, 6-8
Booting OpenVMS, 3-29
Buses, 2-6
Buttons, on OCP, 1-7
C
Cable management arm, installing, 3-23
Cables, dressing, 3-24
Cabling
data and signal, 9-21
power, 9-21
system, 9-21
cache test, 6-17
cat command, 5-11
cat el command, 6-26
C-chip, 2-3, 2-8
Clearance, system, 3-3
Clock interface, 2-5
Common OS header, 8-6
Compaq Analyze, 8-18
components, 8-20
error report, 8-22
interface, 8-21
operation, 8-19
using with browser, 8-22
Compaq Insight Manager, 10-2
components, 10-4
functions, 10-3
SQL requirements, 10-5
web-enhanced agents, 10-6
Components, 1-4, 1-17
Configuration tracking, 2-19
Configuration Utility
running, 5-26
Connecting the system, 3-4
Conventions
keyboard, 5-24
Corrected error frame, 8-16
CPU data, 8-16
header, 8-16
system data, 8-17
CPU
features, 2-4
subsystem, 2-5
CPU daughter card
removing, 9-10
CPU fans, 2-17
CPU guide bracket
removing, 9-11
CPU modules, 1-16
CPU speed switch settings, 4-6
CPU SW1, 4-5
CPU SW2, 4-7
CPU to PCI address translation, 4-11
CPU voltage settings, 4-7
CPU, upgrading to EV67, 4-25
crash command, 5-21
Crash dump, 5-21
Crash dumps, 8-2
Cross-bar switch, 2-7
D
D-chips, 2-3, 2-7
DEC VET, 8-3
E
edit command, 5-12, 5-17
Electrical specifications, 1-19
Environment variables
verifying, 3-28
Environment variables list, 5-8
Environmental error logout frames, 8-15
Environmental logic, 2-14
Environmental specifications, 1-20
Error classes, 8-7
Error logout frame, 8-6
Error state logging, 2-19
examine command, 5-12
Extended error log block, 8-7
F
Fail-safe booter utility, 6-13
Failure register, 7-13
Failures reported by OS, 6-10
Failures reported on console, 6-7
Fan fault interrupt, 4-22
Fans
LEDs on SFM2 module, 2-15
removing, 9-15
system, 2-16
Fault shutdown, 2-17
Faults, isolating, 6-2
Firmware
location, 5-2
troubleshooting with, 6-24
updates, 5-2
updating, 3-7
Firmware configuration, 4-23
Firmware version, verifying, 3-27
Flash bypass settings, 4-7
Flash ROM test, 6-18
Flash select settings, 4-7
FRUs
part numbers, 9-2
precautions, 9-3
removal and replacement, 9-1
Function register, 7-14
G
Graphics options, 4-17
H
halt command (RMC), 5-29
Halt interrupt, 4-21
haltin command (RMC), 5-29
haltout command (RMC), 5-30
Hang, at power-on, 6-10
help or ? command, 5-21
help or ? command (RMC), 5-30
I
I/O, 1-3
I/O subsystem, 2-10
I2C bus, 2-18
init command, 6-24, 6-25
Initializing system, 5-10
Installation
checklist, 3-2
verifying, 3-5
Interlock, 1-11
installing, 3-21
removing, 9-19
Interrupt configuration, 4-20
ISA bus, 4-18
ISA data path test, 6-18
ISA interface, 2-11
option slot, 2-12
super I/O chip, 2-11
ISA interrupt assignments, 4-22
ISA restrictions, 4-18
K
kill command, 6-29
kill_diags command, 6-29
L
LED codes, POST, 6-22
LEDs
CPU, 6-21
front panel, 6-19
on OCP, 1-7
server features module, 6-22
Linux
boot example, 3-32
booting, 3-31
installing, 3-31
Locking the system, 3-7
Logout frame field descriptions, 8-14
Logout frames, locating, 8-13
Loopback tests
commands for running, 6-29
ls command, 5-11
M
Machine check logout frame, 8-10
CPU data, 8-11
header, 8-10
system data, 8-12
Machine checks, 8-5
correctable, 8-8
uncorrectable, 8-9
Maintenance bus, 2-18
Mechanical specifications, 1-18
Memory
configuration, 2-9
subsystem, 2-9
upgrading, 4-25
Memory, 1-3
Memory configuration rules, 4-8
Memory configurations, 2-10, 4-8
Memory DIMMs, 2-3
Memory problems, 6-10
Memory test, 6-17
more command, 5-11
more el command, 6-24, 6-26
Mounting brackets, attaching, 3-12
Mounting hardware, 3-11
N
Nvram script, editing, 5-17
O
OCP, 1-6
removing, 9-5
OCP display, 2-19
Online resources, 6-30
OpenVMS
booting from InfoServer, 3-30
booting from local CD, 3-29
installing, 3-29
shutting down, 3-6
Operating system
shutting down, 3-6
Operator control panel. See OCP
Option cards
removing, 9-6
Options, obtaining, 4-25
OS machine check handler, 8-5
P
Packaging, 1-3
PAL, functions, 2-16
PALcode, error checking, 8-5
P-chip, 2-8
P-chips, 2-3
PCI and ISA configuration, 4-17
PCI assignment tables, 4-18
PCI bus problems, 6-11
PCI data path test, 6-18
Q
quit command (RMC), 5-31
R
Rackmount
accessories, 3-10
documentation, 3-8
installation area, 3-9
Real-time clock interrupt, 4-21
Rear panel connections, 1-8
Registers
Cbox Read, 7-7
DC_STAT, 7-6
DIRn, 7-10
failure, 7-13
function, 7-14
I_STAT, 7-2
MISC, 7-8
MM_STAT, 7-5
PERROR, 7-11
Remote management console. See RMC
Removable media, 1-12
Removable media drive bay
removing, 9-9
reset command (RMC), 5-31
Resetting RMC to defaults, 5-35
RMC, 5-28
resetting to defaults, 5-35
setting up, 5-28
troubleshooting, 5-36
RMC commands, 5-29
RMC microprocessor, 2-17
RMC switchpack, 5-33
changing a setting, 5-34
S
Scatter/gather mapping, 4-15
Scripts, 6-28
commands for running, 6-29
SCSI cable length, 4-16
SCSI configuration, 4-16
SCSI controllers, 4-18
SCSI IDs, 4-16
SCSI problems, 6-12
SCSI termination, 4-16
SCU-SCSI CAM utility, 8-2
Server features module, 1-13
removing, 9-20
set command, 5-7
setesc command (RMC), 5-31
SFM2, 1-13
30-second shutdown, 2-17
CPU fans sense logic, 2-17
inverter, 2-16
logic, 2-14
power supplies, 2-15
RMC microprocessor, 2-17
status LEDs, 2-15
system fans sense logic, 2-16
temperature sensor, 2-17
SFM2 PAL, 2-16
show command, 5-7
show config command, 5-4
show cpu command, 5-5
show device command, 5-6
show hwrpb command, 8-13
show memory command, 5-6
show pal command, 5-6
show power command, 5-7
show version command, 5-7
show_status command, 6-27
showit command, 6-27
Shutting down, 3-6
Side cover, removing, 9-4
Slide brackets
attaching to rails, 3-16
attaching to slides, 3-14
Speaker, 1-16
removing, 9-16
SRM commands
for configuring system, 4-24
SRM console, 5-2
invoking, 5-3
startup sequence, 5-3
SROM flash select, 4-6
T
Temperature
LEDs on SFM2 module, 2-15
Temperature threshold, 2-17
Termination block, 8-7
test command, 6-24, 6-25
Tests, terminating, 6-29
Thermal problems, 6-10
Third-party devices, adding, 4-26
TIG bus interrupt assignments, 4-21
TIG interface, 2-12
CSR registers and switchpack, 2-13
flash ROM, 2-13
IRQs, 2-13
TIG interrupt processing, 4-22
Tools and utilities, 6-13
Troubleshooting
considerations, 6-2
strategy, 6-3
Troubleshooting RMC, 5-36
Tru64 UNIX
booting, 3-27
shutting down, 3-6
starting installation, 3-26
U
Updating device drivers, 4-26
Updating firmware, 3-7, 4-26
Upgrading CPU to EV67, 4-25
V
VGA option configuration, 4-17