
Hardware Platform Monitoring Guide

NetApp, Inc.
495 East Java Drive
Sunnyvale, CA 94089 USA
Telephone: +1 (408) 822-6000
Fax: +1 (408) 822-4501
Support telephone: +1 (888) 4-NETAPP
Documentation comments: doccomments@netapp.com
Information Web: www.netapp.com
Part number: 215-06474_A0
November 2011

Table of Contents | 3

Contents
Sources of troubleshooting information .... 27
  Where LEDs appear .... 27
  Where messages are displayed .... 27
  How AutoSupport e-mail messages help with troubleshooting .... 28
  Forms and use of diagnostic tools .... 28
  Where to find documentation .... 29

System LEDs .... 33
  FAS20xx and SA200 system LEDs .... 33
    Location and meaning of LEDs on the front of FAS20xx and SA200 chassis .... 33
    Location and meaning of LEDs on the back of FAS20xx and SA200 controller modules .... 35
    Location and meaning of FAS20xx and SA200 PSU LEDs .... 37
  2240 system LEDs .... 38
    Location and meaning of LEDs on the front of 2240 systems .... 39
    Location and meaning of LEDs on the back of 2240 controllers .... 40
    Location and meaning of 2240 PSU LEDs .... 43
    Location and meaning of 2240 internal FRU LEDs .... 45
  30xx and SA300 and C2300 and C3300 NetCache system LEDs .... 45
    Location and meaning of LEDs on the front of 30xx, SA300, and C2300 and C3300 NetCache controllers .... 45
    Location and meaning of LEDs on the back of 30xx, SA300, and C2300 and C3300 NetCache controllers .... 47
    Location and meaning of 30xx, SA300, and C2300 and C3300 NetCache PSU LEDs .... 48
  31xx system LEDs .... 49
    Location and meaning of LEDs on the front of 31xx chassis .... 49
    Location and meaning of LEDs on the back of 31xx controllers .... 51
    Location and meaning of 31xx fan LEDs .... 52
    Location and meaning of 31xx PSU LEDs .... 53
    Location and meaning of 31xx FRU LEDs .... 54
  32xx and SA320 system LEDs .... 55

    Location and meaning of LEDs on the front of 32xx and SA320 chassis .... 55
    Location and meaning of LEDs on the back of 32xx and SA320 controllers .... 56
    Location and meaning of the LED on the back of 32xx and SA320 I/O expansion modules .... 59
    Location and meaning of 32xx and SA320 fan LEDs .... 60
    Location and meaning of 32xx and SA320 PSU LEDs .... 60
    Location and meaning of 32xx and SA320 internal FRU LEDs .... 61
  60xx and SA600 system LEDs .... 62
    Location and meaning of LEDs on the front of 60xx and SA600 controllers .... 62
    Location and meaning of LEDs on the back of 60xx and SA600 controllers .... 63
    Location and meaning of 60xx and SA600 fan LEDs .... 64
    Location and meaning of 60xx and SA600 PSU LEDs .... 65
  62xx and SA620 system LEDs .... 66
    Location and meaning of LEDs on the front of 62xx and SA620 chassis .... 66
    Location and meaning of LEDs on the back of 62xx and SA620 controllers .... 68
    Location and meaning of the 62xx and SA620 I/O expansion module LED .... 72
    Location and meaning of 62xx and SA620 fan LEDs .... 73
    Location and meaning of 62xx and SA620 PSU LEDs .... 73
    Location and meaning of 62xx and SA620 internal FRU LEDs .... 74
  HBA LEDs .... 75
    Location and meaning of dual-port Fibre Channel HBA LEDs .... 75
    Location and meaning of dual-port, 4-Gb or 8-Gb, target-mode Fibre Channel HBA LEDs .... 76
    Location and meaning of dual-port, 8-Gb Fibre Channel Virtual Interface HBA LEDs .... 78
    Location and meaning of quad-port, 4-Gb, Fibre Channel HBA LEDs: four-LED version .... 79
    Location and meaning of quad-port, 4-Gb, Fibre Channel HBA LEDs: 12-LED version .... 81
    Location and meaning of fiber-optic iSCSI target HBA LEDs .... 82
    Location and meaning of copper iSCSI target HBA LEDs .... 83

    Location and meaning of dual-port, 10-Gb, FCoE unified target HBA LEDs .... 85
    Location of dual-port, 3-Gb SAS HBA ports .... 87
    Location of quad-port, 3-Gb SAS HBA ports .... 87
  MetroCluster adapter LEDs .... 88
    Location and meaning of dual-port, 2-Gb VI-MetroCluster adapter LEDs .... 88
    Location and meaning of dual-port, 4-Gb MetroCluster adapter LEDs .... 90
    Location and meaning of dual-port, 8-Gb MetroCluster adapter LEDs .... 91
  GbE NIC LEDs .... 93
    Location and meaning of single-port GbE NIC LEDs .... 93
    Location and meaning of single-port, 10-GbE NIC LEDs (FAS2050 systems only) .... 95
    Location and meaning of LEDs on the dual-port 10-GbE NIC that supports fiber optic cables with SFP+ modules or copper SFP+ cables .... 96
    Location and meaning of LEDs on the dual-port 10-GbE NIC that supports fiber optic cables with X6569 SFP+ modules or copper SFP+ cables .... 97
    Location and meaning of multiport GbE NIC LEDs .... 99
  TOE NIC LEDs .... 101
    Location and meaning of single-port TOE NIC LEDs .... 101
    Location and meaning of dual-port, 10GBase-SR TOE NIC LEDs .... 103
    Location and meaning of dual-port, 10GBase-CX4 TOE NIC LEDs .... 104
    Location and meaning of quad-port TOE NIC LEDs .... 105
  NVRAM adapter LEDs .... 106
    Location and meaning of NVRAM5 and NVRAM6 LEDs .... 107
    Location and meaning of NVRAM7 LEDs .... 108
    Location and meaning of NVRAM5 and NVRAM6 media converter LEDs .... 109
    Location and meaning of NVRAM8 LEDs .... 109
  Flash Cache module and PAM LEDs .... 115
    Location and meaning of PAM LEDs .... 115
    Location and meaning of Flash Cache module LEDs .... 115

Startup messages .... 117
  POST messages .... 117

  Boot messages .... 118
  FAS20xx and SA200 startup progress .... 118
    Method of viewing progress on the console .... 118
    Method of viewing progress through the BIOS Status sensor .... 119
  3020 and 3050 system and C2300 and C3300 NetCache appliance POST error messages .... 120
    Abort Autoboot-POST Failure(s): CPU .... 120
    Abort Autoboot-POST Failure(s): MEMORY .... 121
    Abort Autoboot-POST Failure(s): RTC, RTC_IO .... 121
    Abort Autoboot-POST Failure(s): UCODE .... 121
    Autoboot of backup image aborted .... 121
    Autoboot of backup image failed .... 122
    Autoboot of primary image aborted .... 122
    Autoboot of primary image failed .... 122
    Invalid FRU EEPROM Checksum .... 123
    Memory init failure .... 123
    No Memory found .... 123
    Unsupported system bus speed .... 124
  3040, 3070, 31xx, 60xx, SA300, and SA600 system POST error messages .... 124
    0200: Failure Fixed Disk .... 124
    0230: System RAM Failed at offset: .... 125
    0231: Shadow RAM failed at offset .... 125
    0232: Extended RAM failed at address line .... 125
    0235: Multiple-bit ECC error occurred .... 126
    023C: Bad DIMM found in slot # .... 126
    023E: Node Memory Interleaving disabled .... 127
    0241: Agent Read Timeout .... 127
    0242: Invalid FRU information .... 128
    0250: System battery is dead .... 128
    0251: System CMOS checksum bad .... 128
    0253: Clear CMOS jumper detected .... 129
    0260: System timer error .... 129
    0280: Previous boot incomplete .... 129
    02C2: No valid Boot Loader in System Flash - Non Fatal .... 129
    02C3: No valid Boot Loader in System Flash - Fatal .... 130
    02F9: FGPA jumper detected .... 130

    02FA: Watchdog Timer Reboot (PciInit) .... 131
    02FB: Watchdog Timer Reboot (MemTest) .... 131
    02FC: LDTStop Reboot (HTLinkInit) .... 131
    No message on console .... 132
  2240, 32xx, 62xx, SA320, and SA620 system POST error messages .... 132
    0200: Failure Fixed Disk .... 132
    0230: System RAM Failed at offset: .... 133
    0231: Shadow RAM Failed at offset: .... 133
    0232: Extended RAM Failed at address line: .... 133
    BIOS detected uncorrectable ECC error in DIMM slot: .... 133
    No message on the console .... 133
    BIOS detected errors or invalid configuration in DIMM slot: .... 134
    BIOS detected unknown errors in DIMM slot: .... 134
    023A: ONTAP Detected Bad DIMM in slot: .... 134
    023B: BIOS detected SPD checksum error in DIMM slot: .... 134
    BIOS detected pattern write/read mismatch in DIMM slot: .... 134
    0241: SMBus Read Timeout .... 135
    0242: Invalid FRU information .... 135
    0250: System battery is dead - Replace and run SETUP .... 135
    0251: System CMOS checksum bad .... 135
    0260: System timer error .... 135
    0271: Check date and time settings .... 136
    0280: Previous boot incomplete - Default configuration used .... 136
    02A1: SP Not Found .... 136
    02A2: BMC System Error Log (SEL) Full .... 136
    02A3: No Response From SP To FRU ID Read Request .... 137
    SP FRU Entry is Blank or Checksum Error .... 137
    No Response to Controller FRU ID Read Request via IPMI .... 137
    No Response to Midplane FRU ID Read Request via IPMI .... 137
    02C2: No valid Boot Loader in System Flash - Non Fatal .... 137
    02C3: No valid Boot Loader in System Flash - Fatal .... 138
    Fatal Error: No DIMM detected and system can not continue boot! .... 138
    Fatal Error! All channels are disabled! .... 139
    Software memory test failed! .... 139
    Fatal Error! RDIMMs and UDIMMs are mixed! .... 139
    Fatal Error! UDIMM in 3rd slot is not supported! .... 139

    Fatal Error! All DIMM failed and system can not continue boot! .... 140
  C1300 NetCache appliance POST error messages .... 140
    8042-gate A20 failure .... 140
    A: drive failure .... 140
    B: drive failure .... 141
    base 64KB memory failure .... 141
    Boot failure .... 141
    BootSector write!! .... 142
    Cache error/external cache bad .... 142
    Checking NVRAM...update failed .... 142
    CMOS battery low .... 142
    CMOS checksum bad .... 143
    CMOS date/time not set .... 143
    CMOS settings wrong .... 143
    CMOS shutdown register read/write error .... 143
    display memory read/write error .... 144
    DMA-2 error .... 144
    DMA controller error .... 144
    Drive not ready .... 145
    Gate20 error .... 145
    Insert BOOT diskette in A .... 145
    Interrupt controller-N error .... 145
    Invalid boot diskette .... 146
    Keyboard error .... 146
    Keyboard/interface error .... 146
    Microcode error .... 147
    Multi-bit ECC error .... 147
    NVRAM bad .... 147
    NVRAM checksum bad .... 147
    NVRAM cleared .... 148
    NVRAM ignored .... 148
    parity error (beep code) .... 148
    Parity error (no beep code) .... 149
    PCI I/O conflict .... 149
    PCI IRQ conflict .... 149
    PCI IRQ routing table error .... 149

    PCI ROM conflict .... 150
    processor error .... 150
    processor exception interrupt error .... 150
    Reboot and select proper boot device ... .... 151
    refresh failure .... 151
    Resource conflict .... 151
    ROM checksum error .... 152
    Static resource conflict .... 152
    System halted .... 152
    Timer error .... 152
    timer not operational .... 153
    VIRUS: continue (y/n) .... 153
    X hard disk error .... 153
  Boot error messages .... 154
    Boot device err .... 154
    Cannot initialize labels .... 154
    Cannot read labels .... 154
    Configuration exceeds max PCI space .... 154
    DIMM slot # has correctable ECC errors .... 155
    Dirty shutdown in degraded mode .... 155
    Disk label processing failed .... 155
    Drive %s.%d not supported .... 155
    Error detection detected too many errors to analyze at once .... 156
    FC-AL loop down, adapter %d .... 156
    File system may be scrambled .... 156
    Halted disk firmware too old .... 157
    Halted: Illegal configuration .... 157
    Invalid PCI card slot %d .... 157
    No /etc/rc .... 157
    No disk controllers .... 158
    No disks .... 158
    No /etc/rc, running setup .... 158
    No network interfaces .... 158
    No NVRAM present .... 159
    NVRAM #n downrev .... 159
    NVRAM: wrong pci slot .... 159

    Panic: DIMM slot #n has uncorrectable ECC errors .... 159
    This platform is not supported on this release .... 159
    Too many errors in too short time .... 160
    Warning: Motherboard Revision not available .... 160
    Warning: Motherboard Serial Number not available .... 160
    Warning: system serial number is not available .... 160
    Watchdog error .... 160
    Watchdog failed .... 161

EMS and operational messages .... 163
  Environmental EMS messages .... 163
    Chassis fan FRU failed .... 163
    Chassis over temperature on XXXX .... 164
    Chassis over temperature shutdown on XXXX .... 164
    Chassis Power Degraded: 3.3V in warn high state .... 164
    Chassis power degraded: PS# .... 165
    Chassis Power Fail: PS# .... 165
    Chassis Power Shutdown .... 165
    Chassis power shutdown: 3.3V in warn low state .... 166
    Chassis Power Supply: PS# removed .... 166
    Chassis power supply degraded: PS# .... 167
    Chassis power supply fail: PS# .... 167
    Chassis power supply off: PS# .... 167
    Chassis power supply off: PS# .... 168
    Chassis power supply OK: PS# .... 168
    Chassis power supply removed: PS# .... 168
    Chassis under temperature on XXXX .... 169
    Chassis under temperature shutdown on XXXX .... 169
    Fan: # is spinning below tolerable speed .... 169
    monitor.chassisFan.degraded .... 170
    monitor.chassisFan.ok .... 170
    monitor.chassisFan.removed .... 170
    monitor.chassisFan.slow .... 170
    monitor.chassisFan.stop .... 171
    monitor.chassisFan.warning .... 171
    monitor.chassisFanFail.xMinShutdown .... 171
    monitor.chassisPower.degraded .... 171

    monitor.chassisPower.ok .... 172
    monitor.chassisPowerSupplies.ok .... 172
    monitor.chassisPowerSupply.degraded .... 172
    monitor.chassisPowerSupply.notPresent .... 172
    monitor.chassisPowerSupply.off .... 173
    monitor.chassisPowerSupply.ok .... 173
    monitor.chassisTemperature.cool .... 173
    monitor.chassisTemperature.ok .... 173
    monitor.chassisTemperature.warm .... 173
    monitor.cpuFan.degraded .... 174
    monitor.cpuFan.failed .... 174
    monitor.cpuFan.ok .... 174
    monitor.ioexpansionPower.degraded .... 175
    monitor.ioexpansionPower.ok .... 175
    monitor.ioexpansionTemperature.cool .... 175
    monitor.ioexpansionTemperature.ok .... 175
    monitor.ioexpansionTemperature.warm .... 176
    monitor.ioexpansion.unpresent .... 176
    monitor.nvmembattery.warninglow .... 176
    monitor.nvramLowBattery .... 176
    monitor.power.unreadable .... 177
    monitor.shutdown.cancel .... 177
    monitor.shutdown.cancel.nvramLowBattery .... 177
    monitor.shutdown.chassisOverTemp .... 177
    monitor.shutdown.chassisUnderTemp .... 178
    monitor.shutdown.emergency .... 178
    monitor.shutdown.ioexpansionOverTemp .... 178
    monitor.shutdown.chassisUnderTemp .... 178
    monitor.shutdown.nvramLowBattery.pending .... 179
    monitor.temp.unreadable .... 179
    Multiple chassis fans have failed .... 179
    Multiple fan failure on XXXX .... 180
    Multiple power supply fans failed .... 180
    nvmem.battery.capacity.low .... 180
    nvmem.battery.capacity.low.warn .... 181
    nvmem.battery.capacity.normal .... 181

    nvmem.battery.current.high .... 181
    nvmem.battery.current.high.warn .... 181
    nvmem.battery.sensor.unreadable .... 182
    nvmem.battery.temp.high .... 182
    nvmem.battery.temp.low .... 182
    nvmem.battery.temp.normal .... 183
    nvmem.battery.voltage.high .... 183
    nvmem.battery.voltage.high.warn .... 183
    nvmem.battery.voltage.normal .... 183
    nvmem.voltage.high .... 184
    nvmem.voltage.high.warn .... 184
    nvmem.voltage.normal .... 184
    nvram.bat.missing.error .... 184
    nvram.battery.capacity.low .... 185
    nvram.battery.capacity.low.critical .... 185
    nvram.battery.capacity.low.warn .... 185
    nvram.battery.capacity.normal .... 185
    nvram.battery.charging.nocharge .... 186
    nvram.battery.charging.normal .... 186
    nvram.battery.charging.wrongcharge .... 186
    nvram.battery.current.high .... 186
    nvram.battery.current.high.warn .... 187
    nvram.battery.current.low .... 187
    nvram.battery.current.low.warn .... 187
    nvram.battery.current.normal .... 188
    nvram.battery.end_of_life.high .... 188
    nvram.battery.end_of_life.normal .... 188
    nvram.battery.fault .... 188
    nvram.battery.fault.warn .... 189
    nvram.battery.fcc.low .... 189
    nvram.battery.fcc.low.critical .... 189
    nvram.battery.fcc.low.warn .... 189
    nvram.battery.fcc.normal .... 190
    nvram.battery.power.fault .... 190
    nvram.battery.power.normal .... 190
    nvram.battery.sensor.unreadable .... 190

nvram.battery.temp.high ............................................................................. 191 nvram.battery.temp.high.warn .................................................................... 191 nvram.battery.temp.low ............................................................................... 191 nvram.battery.temp.low.warn ...................................................................... 192 nvram.battery.temp.normal ......................................................................... 192 nvram.battery.voltage.high .......................................................................... 192 nvram.battery.voltage.high.warn ................................................................. 192 nvram.battery.voltage.low ........................................................................... 193 nvram.battery.voltage.low.warn .................................................................. 193 nvram.battery.voltage.normal ..................................................................... 193 nvram.hw.initFail ........................................................................................ 193 SAS EMS messages ................................................................................................ 194 ds.sas.config.warning .................................................................................. 194 ds.sas.crc.err ................................................................................................ 194 ds.sas.drivephy.disableErr ........................................................................... 194 ds.sas.element.fault ..................................................................................... 195 ds.sas.element.xport.error ............................................................................ 195 ds.sas.hostphy.disableErr ............................................................................ 
196 ds.sas.invalid.word ...................................................................................... 196 ds.sas.loss.dword ......................................................................................... 196 ds.sas.multPhys.disableErr .......................................................................... 197 ds.sas.phyRstProb ........................................................................................ 197 ds.sas.running.disparity ............................................................................... 197 ds.sas.ses.disableErr .................................................................................... 198 ds.sas.xfer.element.fault .............................................................................. 198 ds.sas.xfer.export.error ................................................................................ 198 ds.sas.xfer.not.sent ...................................................................................... 199 ds.sas.xfer.unknown.error ........................................................................... 199 sas.adapter.bad ............................................................................................ 200 sas.adapter.bootarg.option ........................................................................... 200 sas.adapter.debug ........................................................................................ 200 sas.adapter.exception ................................................................................... 200 sas.adapter.failed ......................................................................................... 201 sas.adapter.firmware.download ................................................................... 201 sas.adapter.firmware.fault ........................................................................... 201 sas.adapter.firmware.update.failed .............................................................. 201

sas.adapter.not.ready ................................................................................... 202 sas.adapter.offline ........................................................................................ 202 sas.adapter.offlining .................................................................................... 202 sas.adapter.online ........................................................................................ 203 sas.adapter.online.failed .............................................................................. 203 sas.adapter.onlining ..................................................................................... 203 sas.adapter.reset ........................................................................................... 203 sas.adapter.unexpected.status ...................................................................... 204 sas.cable.error .............................................................................................. 204 sas.cable.pulled ............................................................................................ 204 sas.cable.pushed .......................................................................................... 204 sas.config.mixed.detected ........................................................................... 205 sas.device.invalid.wwn ................................................................................ 205 sas.device.quiesce ........................................................................................ 205 sas.device.resetting ...................................................................................... 206 sas.device.timeout ....................................................................................... 206 sas.initialization.failed ................................................................................. 
207 sas.link.error ................................................................................................ 207 sas.port.disabled .......................................................................................... 207 sas.port.down ............................................................................................... 207 sas.shelf.conflict .......................................................................................... 208 sasmon.adapter.phy.disable ......................................................................... 208 sasmon.adapter.phy.event ........................................................................... 209 sasmon.disable.module ................................................................................ 209 SES EMS messages ................................................................................................. 209 ses.access.noEnclServ ................................................................................. 209 ses.access.noMoreValidPaths ...................................................................... 210 ses.access.noShelfSES ................................................................................ 211 ses.access.sesUnavailable ............................................................................ 211 ses.badShareStorageConfigErr .................................................................... 212 ses.bridge.fw.getFailWarn ........................................................................... 212 ses.bridge.fw.mmErr ................................................................................... 212 ses.channel.rescanInitiated .......................................................................... 213 ses.disk.pctl.timeout .................................................................................... 213 ses.config.drivePopError ............................................................................. 
213 ses.config.IllegalEsh270 .............................................................................. 213

ses.config.shelfMixError ............................................................................. 214 ses.config.shelfPopError ............................................................................. 214 ses.disk.configOk ........................................................................................ 214 ses.disk.illegalConfigWarn ......................................................................... 214 ses.disk.pctl.timeout .................................................................................... 215 ses.download.powerCyclingChannel .......................................................... 215 ses.download.shelfToReboot ...................................................................... 215 ses.download.suspendIOForPowerCycle .................................................... 215 ses.drive.PossShelfAddr .............................................................................. 216 ses.drive.shelfAddr.mm ............................................................................... 216 ses.exceptionShelfLog ................................................................................. 217 ses.extendedShelfLog .................................................................................. 217 ses.fw.emptyFile .......................................................................................... 218 ses.fw.resourceNotAvailable ....................................................................... 218 ses.giveback.restartAfter ............................................................................. 218 ses.giveback.wait ......................................................................................... 218 ses.psu.coolingReqError .............................................................................. 219 ses.psu.powerReqError ................................................................................ 
219 ses.remote.configPageError ........................................................................ 219 ses.remote.elemDescPageError ................................................................... 220 ses.remote.faultLedError ............................................................................. 220 ses.remote.flashLedError ............................................................................ 220 ses.remote.shelfListError ............................................................................ 220 ses.remote.statPageError ............................................................................. 220 ses.shelf.changedID ..................................................................................... 221 ses.shelf.ctrlFailErr ...................................................................................... 221 ses.shelf.em.ctrlFailErr ................................................................................ 222 ses.shelf.IdBasedAddr ................................................................................. 222 ses.shelf.invalNum ...................................................................................... 222 ses.shelf.mmErr ........................................................................................... 223 ses.shelf.OSmmErr ...................................................................................... 223 ses.shelf.powercycle.done ........................................................................... 223 ses.shelf.powercycle.start ............................................................................ 223 ses.shelf.sameNumReassign ........................................................................ 224 ses.shelf.unsupportAllowErr ....................................................................... 224 ses.shelf.unsupportedErr ............................................................................. 224

ses.startTempOwnership ............................................................................. 225 ses.status.ATFCXError ............................................................................... 225 ses.status.ATFCXInfo ................................................................................. 225 ses.status.currentError ................................................................................. 225 ses.status.currentInfo ................................................................................... 226 ses.status.currentWarning ............................................................................ 226 ses.status.displayError ................................................................................. 226 ses.status.displayInfo ................................................................................... 227 ses.status.displayWarning ........................................................................... 227 ses.status.driveError .................................................................................... 227 ses.status.driveOk ........................................................................................ 228 ses.status.driveWarning ............................................................................... 228 ses.status.electronicsError ........................................................................... 228 ses.status.electronicsInfo ............................................................................. 229 ses.status.electronicsWarn ........................................................................... 229 ses.status.ESHPctlStatus ............................................................................. 229 ses.status.fanError ....................................................................................... 229 ses.status.fanInfo ......................................................................................... 
230 ses.status.fanWarning .................................................................................. 230 ses.status.ModuleError ................................................................................ 230 ses.status.ModuleInfo .................................................................................. 230 ses.status.ModuleWarn ................................................................................ 231 ses.status.psError ......................................................................................... 231 ses.status.psInfo ........................................................................................... 231 ses.status.psWarning ................................................................................... 232 ses.status.temperatureError ......................................................................... 232 ses.status.temperatureInfo ........................................................................... 233 ses.status.temperatureWarning .................................................................... 233 ses.status.upsError ....................................................................................... 233 ses.status.upsInfo ......................................................................................... 234 ses.status.volError ....................................................................................... 234 ses.status.volWarning .................................................................................. 234 ses.system.em.mmErr .................................................................................. 235 ses.tempOwnershipDone ............................................................................. 235 sfu.adapterSuspendIO ................................................................................. 235 sfu.auto.update.off.impact ........................................................................... 235

sfu.ctrllerElmntsPerShelf ............................................................................ 236 sfu.downloadCtrllerBridge .......................................................................... 236 sfu.downloadError ....................................................................................... 236 sfu.downloadingController .......................................................................... 236 sfu.downloadingCtrllerR1XX ..................................................................... 237 sfu.downloadStarted .................................................................................... 237 sfu.downloadSuccess ................................................................................... 237 sfu.downloadSummary ................................................................................ 237 sfu.downloadSummaryErrors ...................................................................... 237 sfu.FCDownloadFailed ............................................................................... 238 sfu.firmwareDownrev ................................................................................. 238 sfu.firmwareUpToDate ............................................................................... 238 sfu.partnerInaccessible ................................................................................ 239 sfu.partnerNotResponding ........................................................................... 239 sfu.partnerRefusedUpdate ........................................................................... 239 sfu.partnerUpdateComplete ......................................................................... 239 sfu.partnerUpdateTimeout ........................................................................... 240 sfu.rebootRequest ........................................................................................ 
240 sfu.rebootRequestFailure ............................................................................. 240 sfu.resumeDiskIO ........................................................................................ 240 sfu.SASDownloadFailed ............................................................................. 241 sfu.statusCheckFailure ................................................................................ 241 sfu.suspendDiskIO ...................................................................................... 241 sfu.suspendSES ........................................................................................... 241 Flash Cache module and PAM module EMS messages ......................................... 242 extCache.io.BlockChecksumError .............................................................. 242 extCache.io.cardError .................................................................................. 242 extCache.io.readError .................................................................................. 242 extCache.io.writeError ................................................................................ 243 extCache.offline .......................................................................................... 243 extCache.ReconfigComplete ....................................................................... 243 extCache.ReconfigFailed ............................................................................ 243 extCache.ReconfigStart ............................................................................... 244 extCache.UECCerror ................................................................................... 244 extCache.UECCmax ................................................................................... 244 fal.chan.offline.comp ................................................................................... 245

fal.chan.online.erase.warn ........................................................................... 245 fal.chan.online.fail ....................................................................................... 245 fal.chan.online.read.warn ............................................................................ 245 fal.chan.online.rep.fail ................................................................................. 246 fal.chan.online.rep.part ................................................................................ 246 fal.chan.online.rep.succ ............................................................................... 246 fal.chan.online.rep.ver.err ........................................................................... 246 fal.chan.online.write.warn ........................................................................... 247 fal.init.failed ................................................................................................ 247 fmm.bad.block.detected .............................................................................. 247 fmm.device.stats.missing ............................................................................ 247 fmm.domain.card.failure ............................................................................. 248 fmm.domain.core.failure ............................................................................. 248 fmm.hourly.device.report ............................................................................ 248 fmm.threshold.bank.degraded ..................................................................... 248 fmm.threshold.bank.offline ......................................................................... 249 fmm.threshold.card.degraded ...................................................................... 249 fmm.threshold.card.failure .......................................................................... 
249 fmm.threshold.core.offline .......................................................................... 249 iomem.bbm.bbtl.overflow ........................................................................... 250 iomem.bbm.init.failed ................................................................................. 250 iomem.bbm.new.flash ................................................................................. 250 iomem.card.disable ...................................................................................... 250 iomem.card.enable ...................................................................................... 251 iomem.card.fail.cecc ................................................................................... 251 iomem.card.fail.data.crc .............................................................................. 251 iomem.card.fail.desc.crc .............................................................................. 251 iomem.card.fail.dimm ................................................................................. 252 iomem.card.fail.firmware.primary .............................................................. 252 iomem.card.fail.fpga ................................................................................... 252 iomem.card.fail.fpga.primary ...................................................................... 253 iomem.card.fail.fpga.rev ............................................................................. 253 iomem.card.fail.internal .............................................................................. 254 iomem.card.fail.pci ...................................................................................... 254 iomem.card.fail.uecc ................................................................................... 254 iomem.dimm.log.checksum ........................................................................ 255

iomem.dimm.log.init ................................................................................... 255 iomem.dimm.log.read ................................................................................. 255 iomem.dimm.log.sync ................................................................................. 255 iomem.dimm.log.write ................................................................................ 256 iomem.dimm.mismatch.banks ..................................................................... 256 iomem.dimm.mismatch.burst ...................................................................... 256 iomem.dimm.mismatch.casLatency ............................................................ 256 iomem.dimm.mismatch.columns ................................................................ 257 iomem.dimm.mismatch.dataWidth ............................................................. 257 iomem.dimm.mismatch.eccWidth ............................................................... 257 iomem.dimm.mismatch.ranks ..................................................................... 257 iomem.dimm.mismatch.rows ...................................................................... 258 iomem.dimm.mismatch.vendor ................................................................... 258 iomem.dimm.spd.banks ............................................................................... 258 iomem.dimm.spd.burst ................................................................................ 258 iomem.dimm.spd.casLatency ...................................................................... 259 iomem.dimm.spd.checksum ........................................................................ 259 iomem.dimm.spd.columns .......................................................................... 259 iomem.dimm.spd.dataWidth ....................................................................... 
259 iomem.dimm.spd.detect .............................................................................. 260 iomem.dimm.spd.eccWidth ......................................................................... 260 iomem.dimm.spd.ranks ............................................................................... 260 iomem.dimm.spd.read ................................................................................. 260 iomem.dimm.spd.rows ................................................................................ 261 iomem.dma.crc.data .................................................................................... 261 iomem.dma.crc.desc .................................................................................... 261 iomem.dma.internal ..................................................................................... 261 iomem.dma.stall .......................................................................................... 262 iomem.ecc.cecc ........................................................................................... 262 iomem.ecc.correct.off .................................................................................. 262 iomem.ecc.correct.on .................................................................................. 262 iomem.ecc.detect.off ................................................................................... 263 iomem.ecc.detect.on .................................................................................... 263 iomem.ecc.inject .......................................................................................... 263 iomem.ecc.summary .................................................................................... 263 iomem.ecc.uecc ........................................................................................... 264

iomem.fail.stripe .......................................................................................... 264 iomem.firmware.package.access ................................................................. 264 iomem.firmware.primary ............................................................................ 265 iomem.firmware.program.complete ............................................................ 265 iomem.firmware.program.fail ..................................................................... 265 iomem.firmware.program.reboot ................................................................ 265 iomem.firmware.program.start .................................................................... 265 iomem.firmware.rev .................................................................................... 266 iomem.flash.mismatch.id ............................................................................ 266 iomem.fru.badInfo ....................................................................................... 266 iomem.fru.checksum ................................................................................... 266 iomem.fru.read ............................................................................................ 267 iomem.fru.write ........................................................................................... 267 iomem.i2c.link.down ................................................................................... 267 iomem.i2c.read.addrNACK ......................................................................... 267 iomem.i2c.read.dataNACK ......................................................................... 268 iomem.i2c.read.timeout ............................................................................... 268 iomem.i2c.write.addrNACK ....................................................................... 
268 iomem.i2c.write.dataNACK ........................................................................ 268 iomem.i2c.write.timeout ............................................................................. 269 iomem.init.detect.fpga ................................................................................. 269 iomem.init.detect.pci ................................................................................... 269 iomem.init.fail ............................................................................................. 269 iomem.memory.flash.syndrome .................................................................. 269 iomem.memory.none ................................................................................... 270 iomem.memory.power.high ........................................................................ 270 iomem.memory.power.low ......................................................................... 270 iomem.memory.scrub.start .......................................................................... 270 iomem.memory.size .................................................................................... 271 iomem.memory.zero.complete .................................................................... 271 iomem.memory.zero.start ............................................................................ 271 iomem.nor.op.failed .................................................................................... 271 iomem.pci.error.config.bar .......................................................................... 271 iomem.pio.op.failed ..................................................................................... 272 iomem.remap.block ..................................................................................... 272 iomem.remap.target.bad .............................................................................. 272

Table of Contents | 21 iomem.temp.report ...................................................................................... 272 iomem.train.complete .................................................................................. 273 iomem.train.fail ........................................................................................... 273 iomem.train.notReady ................................................................................. 273 iomem.train.start .......................................................................................... 273 iomem.vmargin.high ................................................................................... 274 iomem.vmargin.low .................................................................................... 274 iomem.vmargin.nominal ............................................................................. 274 monitor.extCache.failed .............................................................................. 274 monitor.flexscale.noLicense ........................................................................ 274 USB boot device EMS messages ............................................................................ 275 usb.adapter.debug ........................................................................................ 275 usb.adapter.exception .................................................................................. 275 usb.adapter.failed ........................................................................................ 275 usb.adapter.reset .......................................................................................... 276 usb.device.failed .......................................................................................... 276 usb.device.initialize.failed ........................................................................... 276 usb.device.maximum.connected ................................................................. 
277 usb.device.protocol.mismatch ..................................................................... 277 usb.device.removed ..................................................................................... 278 usb.device.timeout ....................................................................................... 278 usb.device.unsupported ............................................................................... 278 usb.device.unsupported.speed ..................................................................... 279 usb.external.device.not.used ........................................................................ 279 usb.externalHub.notSupported .................................................................... 279 usb.port.error ............................................................................................... 279 usb.port.reset ............................................................................................... 280 usb.port.state.indeterminate ......................................................................... 280 usb.port.status.inconsistent .......................................................................... 280 usbmon.boot.device.failed ........................................................................... 281 usbmon.boot.device.pfa ............................................................................... 281 usbmon.disable.module ............................................................................... 281 usbmon.unable.to.monitor ........................................................................... 282 FCoE HBA EMS messages ..................................................................................... 282 ispcna.mpi.dump ......................................................................................... 282 ispcna.mpi.dump.saved ............................................................................... 282

ispcna.mpi.initFailed .......... 283
Operational error messages .......... 283
Disk hung during swap .......... 283
Disk n is broken .......... 284
Dumping core .......... 284
Error dumping core .......... 284
FC-AL LINK_FAILURE .......... 284
FC-AL RECOVERABLE ERRORS .......... 284
Panicking .......... 285
RMC Alert: Boot Error .......... 285
RMC Alert: Down Appliance .......... 285
RMC Alert: OFW POST Error .......... 285

RLM messages .......................................................................................... 287


When and how RLM AutoSupport e-mail messages are sent ................................. 287 What RLM AutoSupport e-mail messages include ................................................. 288 When and how RLM EMS messages are sent ........................................................ 288 RLM-generated AutoSupport messages .................................................................. 288 Heartbeat loss warning ................................................................................ 288 Reboot (power loss) critical ........................................................................ 289 Reboot warning ........................................................................................... 289 Reboot (watchdog reset) warning ............................................................... 289 RLM heartbeat loss ..................................................................................... 289 RLM heartbeat stopped ............................................................................... 290 System boot failed (POST failed) ............................................................... 290 User triggered (RLM test) ........................................................................... 290 User_triggered (system nmi) ....................................................................... 290 User_triggered (system power cycle) .......................................................... 290 User_triggered (system power off) ............................................................. 291 User_triggered (system power on) .............................................................. 291 User_triggered (system reset) ...................................................................... 291 EMS messages about the RLM ............................................................................... 291 rlm.driver.hourly.stats ................................................................................. 
291 rlm.driver.mailhost ...................................................................................... 292 rlm.driver.network.failure ........................................................................... 292 rlm.driver.timeout ........................................................................................ 292 rlm.firmware.update.failed .......................................................................... 293

rlm.firmware.upgrade.reqd .......... 293
rlm.firmware.version.unsupported .......... 294
rlm.heartbeat.bootFromBackup .......... 294
rlm.heartbeat.resumed .......... 294
rlm.heartbeat.stopped .......... 295
rlm.network.link.down .......... 295
rlm.notConfigured .......... 296
rlm.orftp.failed .......... 296
rlm.snmp.traps.off .......... 297
rlm.systemDown.alert .......... 297
rlm.systemDown.notice .......... 297
rlm.systemDown.warning .......... 298
rlm.systemPeriodic.keepAlive .......... 298
rlm.systemTest.notice .......... 299
rlm.userlist.update.failed .......... 299

BMC messages .......................................................................................... 301


How and when BMC AutoSupport e-mail notifications are sent ............................ 301 What BMC e-mail notifications include ................................................................. 301 BMC-generated AutoSupport messages ................................................................. 301 BMC_ASUP_UNKNOWN ......................................................................... 302 REBOOT (abnormal) .................................................................................. 302 REBOOT (power loss) ................................................................................ 302 REBOOT (watchdog reset) ......................................................................... 302 SYSTEM_BOOT_FAILED (POST failed) ................................................ 302 SYSTEM_POWER_OFF (environment) .................................................... 303 USER_TRIGGERED (bmc test) ................................................................. 303 USER_TRIGGERED (system nmi) ............................................................ 303 USER_TRIGGERED (system power cycle) ............................................... 303 USER_TRIGGERED (system power off) ................................................... 303 USER_TRIGGERED (system power on) ................................................... 304 USER_TRIGGERED (system power soft-off) ........................................... 304 USER_TRIGGERED (system reset) ........................................................... 304 EMS messages about the BMC ............................................................................... 304 bmc.asup.crit ............................................................................................... 304 bmc.asup.error ............................................................................................. 305 bmc.asup.init ............................................................................................... 305

24 | Platform Monitoring Guide bmc.asup.queue ........................................................................................... 305 bmc.asup.send ............................................................................................. 305 bmc.asup.smtp ............................................................................................. 306 bmc.batt.id ................................................................................................... 306 bmc.batt.invalid ........................................................................................... 306 bmc.batt.mfg ................................................................................................ 306 bmc.batt.rev ................................................................................................. 307 bmc.batt.seal ................................................................................................ 307 bmc.batt.unknown ....................................................................................... 307 bmc.batt.unseal ............................................................................................ 307 bmc.batt.upgrade ......................................................................................... 307 bmc.batt.upgrade.busy ................................................................................. 308 bmc.batt.upgrade.failed ............................................................................... 308 bmc.batt.upgrade.failure .............................................................................. 308 bmc.batt.upgrade.ok .................................................................................... 309 bmc.batt.upgrade.power-off ........................................................................ 309 bmc.batt.upgrade.voltagelow ...................................................................... 
309 bmc.batt.voltage .......................................................................................... 309 bmc.config.asup.off ..................................................................................... 310 bmc.config.corrupted .................................................................................. 310 bmc.config.default ....................................................................................... 310 bmc.config.default.pef.filter ........................................................................ 310 bmc.config.default.pef.policy ...................................................................... 311 bmc.config.fru.systemserial ........................................................................ 311 bmc.config.mac.error .................................................................................. 311 bmc.config.net.error .................................................................................... 311 bmc.config.upgrade ..................................................................................... 312 bmc.power.on.auto ...................................................................................... 312 bmc.reset.ext ................................................................................................ 312 bmc.reset.int ................................................................................................ 312 bmc.reset.power .......................................................................................... 312 bmc.reset.repair ........................................................................................... 313 bmc.reset.unknown ...................................................................................... 313 bmc.sensor.batt.charger.off ......................................................................... 313 bmc.sensor.batt.charger.on .......................................................................... 
313 bmc.sensor.batt.time.run.invalid ................................................................. 313

bmc.ssh.key.missing .......... 314

Service Processor messages ..................................................................... 315


When and how SP AutoSupport e-mail messages are sent ..................................... 315 What SP AutoSupport e-mail messages include ..................................................... 316 When and how SP EMS messages are sent ............................................................. 316 SP-generated AutoSupport messages ...................................................................... 316 HEARTBEAT_LOSS ................................................................................. 316 REBOOT (abnormal) .................................................................................. 317 SYSTEM_BOOT_FAILED (POST failed) ................................................ 317 USER_TRIGGERED (sp test) .................................................................... 317 USER_TRIGGERED (system nmi) ............................................................ 317 USER_TRIGGERED (system power cycle) ............................................... 318 USER_TRIGGERED (system power off) ................................................... 318 USER_TRIGGERED (system reset) ........................................................... 318 EMS messages about the SP ................................................................................... 318 sp.firmware.upgrade.reqd ............................................................................ 318 sp.firmware.version.unsupported ................................................................ 319 sp.heartbeat.resumed ................................................................................... 319 sp.heartbeat.stopped .................................................................................... 319 sp.network.link.down .................................................................................. 320 sp.notConfigured ......................................................................................... 
320 sp.orftp.failed .............................................................................................. 321 sp.snmp.traps.off ......................................................................................... 321 sp.userlist.update.failed ............................................................................... 321 spmgmt.driver.hourly.stats .......................................................................... 322 spmgmt.driver.mailhost ............................................................................... 323 spmgmt.driver.network.failure .................................................................... 323 spmgmt.driver.timeout ................................................................................ 323

Abbreviations .......... 325
Copyright information .......... 341
Trademark information .......... 343
How to send your comments .......... 345
Index .......... 347


Sources of troubleshooting information


Your storage system alerts you when problems occur and informs you of events that do not pose problems. It does so with LEDs and messages that appear on your system console. Monitoring messages and LEDs and using this guide to determine the meaning of messages and LEDs can help you prevent or correct problems on your system.

The following systems are included in this guide:

- 20xx and SA200
- 2240
- 30xx, SA300, and C3300 NetCache
- 31xx
- 32xx and SA320
- 60xx
- 62xx and SA620

Where LEDs appear


LEDs appear on the front of system chassis, the back of controllers, on PSUs, and on fan FRUs. They also appear on adapters that might be installed on your system. LEDs for one system family differ from LEDs for another system family. For example, LEDs on FAS20xx and SA200 systems differ from those on 60xx and SA600 systems.

Where messages are displayed


Your system displays messages in different places, depending on the type of message. The following table lists the types of messages your system might generate and where you can see them on your system.

| Error message type | Where the type of message is displayed |
| --- | --- |
| POST error messages | System console |
| Boot error messages | System console |
| EMS environmental messages and other operational messages | System console or LCD display |
| RLM notifications about the system and EMS messages about the RLM | AutoSupport e-mail messages and the system console |
| BMC notifications about the system and EMS messages about the BMC | AutoSupport e-mail messages and the system console |
| SP notifications about the system and EMS messages about the SP | AutoSupport e-mail messages and the system console |
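The message-type table above can be captured as a small lookup, which might be convenient in a monitoring script that needs to know where to look for a given kind of message. This is a minimal sketch: the dictionary keys (`post_error`, `ems_operational`, and so on) and the function name `where_displayed` are hypothetical labels invented for illustration, not identifiers used by Data ONTAP.

```python
# Hypothetical lookup built from the message-type table in this guide.
# The keys are shorthand labels invented for this sketch.
MESSAGE_DESTINATIONS = {
    "post_error": ("system console",),
    "boot_error": ("system console",),
    # EMS environmental and other operational messages
    "ems_operational": ("system console", "LCD display"),
    # Notifications about the system and EMS messages about each processor
    "rlm": ("AutoSupport e-mail", "system console"),
    "bmc": ("AutoSupport e-mail", "system console"),
    "sp": ("AutoSupport e-mail", "system console"),
}


def where_displayed(message_type):
    """Return the places where a message type is displayed."""
    try:
        return MESSAGE_DESTINATIONS[message_type]
    except KeyError:
        raise ValueError("unknown message type: %r" % (message_type,)) from None


if __name__ == "__main__":
    for name in sorted(MESSAGE_DESTINATIONS):
        print(name, "->", ", ".join(where_displayed(name)))
```

Raising an error for an unrecognized label, rather than returning an empty result, keeps a typo from being mistaken for "displayed nowhere".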

Your system also logs messages. See the System Administration Guide for the version of Data ONTAP that your system is running for information about message logs. Additional information about messages that appear on your system console or in logs may be available through the Syslog Translator on the NOW site.

How AutoSupport e-mail messages help with troubleshooting


Your system has an AutoSupport feature, which sends e-mail containing information about your system to technical support. AutoSupport provides customized real-time support to monitor the performance of your system. AutoSupport messages are generated and sent when specific events occur within a system or a cluster. Messages also are sent weekly to provide support personnel information about system performance. If necessary, technical support contacts you at the e-mail address that you specify to help resolve a potential system problem. You also can have AutoSupport messages sent to addresses that you designate, such as your internal support organization. Descriptions of the AutoSupport messages that you receive are available through the Message Matrices page on the NOW site. For information about configuring AutoSupport, see the System Administration Guide for the version of Data ONTAP that your system is running.
Note: AutoSupport is enabled by default. You should keep it enabled because it can significantly speed the determination and resolution of problems if they occur on your system.

Forms and use of diagnostic tools


Diagnostic tools enable you to troubleshoot problems with your storage system hardware. Forms and use of diagnostics differ, depending on your system model. You need to understand how to use the applicable form of diagnostics for your system. The following list describes the forms of diagnostics available on different systems:

System-level diagnostics
System-level diagnostics are available on 32xx and 62xx systems by entering sldiag commands at the Maintenance mode prompt. The sldiag commands enable you to specify devices, tests, and options; run diagnostics based on the command; and then view the results. They are documented in man pages and in the command reference documents on the NetApp Support Site at support.netapp.com. Additional information about system-level diagnostics is available in the System-Level Diagnostics Guide on the NetApp Support Site at support.netapp.com.

SYSDIAG tool
The SYSDIAG tool is available on systems earlier than 32xx and 62xx by entering the boot_diags command at the boot environment prompt and then navigating menu options. The command boots the diagnostic program and then displays the Diagnostic Monitor, the interface providing access to diagnostic menus. After you select and run a test, the SYSDIAG tool generates a message and displays it on the system console if the test finds an error. Additional information about the SYSDIAG tool is available in the Diagnostics Guide on the NetApp Support Site at support.netapp.com.
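The choice between the two forms of diagnostics described above can be sketched as a lookup by system family. The function name, the returned dictionary layout, and the membership of `SYSDIAG_FAMILIES` are assumptions made for illustration: this guide states only that SYSDIAG applies to "systems earlier than 32xx and 62xx", so the sketch returns `None` for any family it cannot place (for example 2240) rather than guessing.

```python
# Families for which this guide names the tool explicitly.
SLDIAG_FAMILIES = {"32xx", "62xx"}

# Assumption for this sketch: these families from the guide's system list
# are treated as "earlier than 32xx and 62xx".
SYSDIAG_FAMILIES = {"20xx", "30xx", "31xx", "60xx"}


def diagnostic_entry_point(family):
    """Return the diagnostic tool, command, and prompt for a system family.

    Returns None when this sketch cannot tell which form applies.
    """
    if family in SLDIAG_FAMILIES:
        return {
            "tool": "system-level diagnostics",
            "command": "sldiag",
            "prompt": "Maintenance mode prompt",
        }
    if family in SYSDIAG_FAMILIES:
        return {
            "tool": "SYSDIAG",
            "command": "boot_diags",
            "prompt": "boot environment prompt",
        }
    return None


if __name__ == "__main__":
    for family in ("62xx", "31xx", "2240"):
        print(family, "->", diagnostic_entry_point(family))
```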

Where to find documentation


Documentation is available for specific system families and disk shelves that might be attached to your storage system. You can find documentation on the NetApp Support Site at support.netapp.com. Use the following table to learn what documents contain information that might assist you with troubleshooting specific systems or disk shelves.

| Platform or disk shelf type | System or disk shelf model | Document |
| --- | --- | --- |
| FAS systems | 62xx systems, 60xx systems, 32xx systems, 31xx systems, 30xx systems, 2240 systems, 20xx systems | Platform Monitoring Guide (This guide) |
| FAS systems | FAS900 series | FAS900 Hardware Service Guide |
| FAS systems | FAS250 and FAS270 systems | FAS250/FAS270 Hardware and Service Guide |
| Filer systems | F800 filers | F800 Hardware Installation Guide |
| Filer systems | F87 filers | F87 Hardware and Service Guide |
| Filer systems | F85 filers | F85 Hardware and Service Guide |
| V-Series systems and gFiler gateways | V30xx systems, V31xx systems, V32xx systems, V60xx systems, V62xx systems | Platform Monitoring Guide (This guide) |
| V-Series systems and gFiler gateways | V900 gFiler, V270c gFiler, GF825 gFiler | gFiler Hardware Maintenance Guide |
| SA systems | SA200 systems, SA300 systems, SA320 systems, SA600 systems, SA620 systems | Platform Monitoring Guide (This guide) |
| NearStore systems | R200 systems | R200 Hardware and Service Guide |
| NearStore systems | R150 systems | R150 Hardware and Service Guide |
| NearStore systems | R100 systems | R100 Hardware and Service Guide |
| NetCache appliances | C1300/C2300/C3000 appliances | Platform Monitoring Guide (This guide) |
| NetCache appliances | C6200 appliances | C6200 Hardware and Service Guide |
| NetCache appliances | C6100/C3100 appliances | C6100/C3100 Hardware and Service Guide |
| NetCache appliances | C1200/C2100 appliances | C1200/C2100 Hardware and Service Guide |
| Disk shelves | DS2246 | DS2246 Installation and Service Guide |
| Disk shelves | DS4243 | DS4243 Installation and Service Guide |
| Disk shelves | DS14mk2 FC | DS14mk2 FC Hardware Guide |
| Disk shelves | DS14mk2 AT | DS14mk2 AT Hardware Guide |
| Disk shelves | FC9 | FC9 Hardware Guide |
| Third-party hardware | Switches, routers, storage subsystems, and tape backup devices | Applicable third-party hardware documentation |


System LEDs
LEDs enable you to monitor your storage system and its components. Each storage system platform has LEDs on the chassis, controller, fans, and PSUs. These LEDs provide high-level status of your system and network activity. Your system might have adapters installed and configured on it. These adapters also have LEDs, which show you whether the adapter has power, whether there is a network connection, and whether data is being transmitted.
Note: For information about disk shelf LEDs, see the appropriate disk shelf guide on the NetApp Support Site at support.netapp.com.

FAS20xx and SA200 system LEDs


FAS20xx and SA200 systems have LEDs that you can check to learn whether the system and its individual components are turned on and are operating normally. LEDs are visible on the front and the back of the system and on the power supply.

Location and meaning of LEDs on the front of FAS20xx and SA200 chassis
You can check the LEDs on the front of the system to learn whether the power is turned on, whether there is activity on the controller, whether the system is halted, or whether there is a fault in the chassis. The following illustration shows the LEDs on the front of the FAS20xx and SA200 chassis.

1. Power LED
2. Fault LED
3. Controller module A LED
4. Controller module B LED

The following table explains what the LEDs on the front of the chassis mean.

| LED name | Status indicator | Description |
| --- | --- | --- |
| Power | Green | The system is receiving power. |
| Power | Off | The system is not receiving power. |
| Fault | Amber | The system halted or a fault occurred in the chassis. The error might be in a PSU, fan, controller module, or internal disk. The LED also is lit when there is a field-replaceable unit failure, Data ONTAP is not running on a controller module, or the system is in Maintenance mode. |
| Fault | Off | The system is operating normally. |
| A/B (Controller A or B) | Green | The controller is operating and is active. |
| A/B (Controller A or B) | Blinking | This LED blinks in proportion to activity; the greater the activity, the more frequently the LED blinks. When activity is absent or very low, the LED does not blink. |
| A/B (Controller A or B) | Off | No activity is detected. |

Note: If an internal disk drive fails or is disabled, the fault light on the front of the chassis turns on. When you remove the faulty or disabled disk drive, the fault light turns off. However, the failure of disk drives in expansion disk shelves does not affect the fault light on the front of the chassis.
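The front-panel LED table above lends itself to a simple lookup, for example in a checklist tool used during a data-center walk-through. The following is a minimal sketch: the key names (`power`, `fault`, `controller`), the abbreviated descriptions, and the function `describe_led` are hypothetical and exist only for illustration.

```python
# Hypothetical lookup keyed by (LED name, observed state), with descriptions
# abbreviated from the FAS20xx and SA200 front-panel table in this guide.
FRONT_PANEL_LEDS = {
    ("power", "green"): "The system is receiving power.",
    ("power", "off"): "The system is not receiving power.",
    ("fault", "amber"): (
        "The system halted or a fault occurred in the chassis "
        "(PSU, fan, controller module, or internal disk), a FRU failed, "
        "Data ONTAP is not running on a controller module, "
        "or the system is in Maintenance mode."
    ),
    ("fault", "off"): "The system is operating normally.",
    ("controller", "green"): "The controller is operating and is active.",
    ("controller", "blinking"): "The LED blinks in proportion to activity.",
    ("controller", "off"): "No activity is detected.",
}


def describe_led(led, state):
    """Return the documented meaning of an LED state, ignoring case."""
    key = (led.strip().lower(), state.strip().lower())
    return FRONT_PANEL_LEDS.get(
        key, "Unknown LED or state; check the table in this guide."
    )


if __name__ == "__main__":
    print(describe_led("Fault", "Amber"))
    print(describe_led("Controller", "off"))
```

Returning a fallback string for unrecognized input keeps the helper usable when an operator records a state the table does not list.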

Location and meaning of LEDs on the back of FAS20xx and SA200 controller modules
You can check the LEDs on the back of the controller module to learn whether the controller module is functioning properly, or to learn the status of the system network or disk shelf connections or NVMEM. The following LEDs are on the back of the controller module:

- Fibre Channel port
- Remote management port
- Ethernet port
- NVMEM
- Controller module fault

The following illustration shows the location of LEDs on the rear of FAS2050 and SA200 controller modules.

The LEDs on the back of FAS2020 controller modules are the same as on the back of FAS2050 and SA200 controller modules, except for the placement of some labels.

The following illustration shows the location of LEDs on the back of FAS2040 controller modules.

The following table explains what the LEDs on the back of the controller modules mean.

| Port type | LED type | Status indicator | Description |
| --- | --- | --- | --- |
| Fibre Channel | LNK | Green | Link is established and communication is happening. |
| Fibre Channel | LNK | Off | No link is established. |
| SAS | LNK | Green | Link is established on at least one external SAS lane. |
| SAS | LNK | Off | No link is established on any external SAS lane. |
| Remote management | LNK (Left) | Green | A valid network connection is established. |
| Remote management | LNK (Left) | Off | There is no network connection present. |
| Remote management | ACT (Right) | Amber | There is data activity. |
| Remote management | ACT (Right) | Off | There is no network activity present. |
| Ethernet | LNK (Left) | Green | A valid network connection is established. |
| Ethernet | LNK (Left) | Off | There is no network connection present. |
| Ethernet | ACT (Right) | Amber | There is data activity. |
| Ethernet | ACT (Right) | Off | There is no network activity present. |
| N/A | NVMEM status LED | Blinking green | NVMEM is in battery-backed standby mode. |
| N/A | NVMEM status LED | Off (power on) | The system is running normally, and NVMEM is armed if Data ONTAP is running. |
| N/A | NVMEM status LED | Off (power off) | The system is shut down, NVMEM is not armed, and the battery is not enabled. |
| N/A | Controller module fault LED | Amber | The controller module is starting up, Data ONTAP is initializing, the controller module is in Maintenance mode, or a controller module fault is detected. |
| N/A | Controller module fault LED | Off | The controller module is functioning properly. |

Attention: Do not replace DIMMs or any other system hardware when the NVMEM LED is blinking. Doing so might cause you to lose data. Always flush NVMEM contents to disk by entering a halt command at the system prompt before replacing the hardware.

Attention: To protect critical data in NVMEM, you cannot update BIOS or BMC firmware when NVMEM is in use. Before updating firmware, ensure that NVMEM no longer contains critical data by entering a halt command to cleanly shut down Data ONTAP. When the system reboots to the boot environment prompt, you can update your firmware.
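The two Attention notes above amount to a short pre-service checklist, which can be sketched as a function that lists reasons not to proceed. The function name `service_blockers` and its two parameters are hypothetical; the sketch encodes only the two conditions stated in this guide and is not a substitute for the documented procedure.

```python
def service_blockers(nvmem_led, halted_cleanly):
    """List reasons to postpone hardware replacement or a firmware update.

    nvmem_led      -- observed NVMEM LED state, for example "blinking green"
                      or "off" (free-form text; an assumption of this sketch)
    halted_cleanly -- True if a halt command was entered at the system prompt
                      and Data ONTAP shut down cleanly
    """
    blockers = []
    if nvmem_led.strip().lower() == "blinking green":
        blockers.append(
            "NVMEM LED is blinking: NVMEM is in battery-backed standby mode, "
            "and replacing hardware might cause data loss."
        )
    if not halted_cleanly:
        blockers.append(
            "Data ONTAP has not been halted: enter a halt command at the "
            "system prompt to flush NVMEM contents to disk first."
        )
    return blockers


if __name__ == "__main__":
    for reason in service_blockers("blinking green", halted_cleanly=False):
        print("BLOCKED:", reason)
```

An empty list means that neither of the two documented conditions applies; it does not confirm that the system is otherwise ready for service.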

Location and meaning of FAS20xx and SA200 PSU LEDs


You can check the LEDs on each PSU in your system to see whether the PSU has power and is functioning properly. The following illustration shows the location of the PSU LEDs, which are visible at the back of the system.
Note: The following illustration shows the PSU of FAS2050 and SA200 systems. The location of the PSU LEDs in FAS2020 and FAS2040 systems is different, but the LEDs are functionally identical.

1. AC LED
2. Fault LED

The following table explains what the PSU LEDs mean.

| LED name | LED color | Description |
| --- | --- | --- |
| AC | Green | AC input is good and the switch is on. |
| AC | Off | AC input is bad or the switch is off. |
| Fault | Amber | The power supply is not functioning properly and needs service. See the system console for any applicable error messages. |
| Fault | Off | The power supply is functioning properly. |

2240 system LEDs


2240 systems have LEDs that you can check to learn whether the system and its individual components are turned on and are operating normally. LEDs are visible on the front of the chassis, on the back of controllers, and on the PSUs.

2240 systems are available in two models: the 2U 2240-2 system and the 4U 2240-4 system.

Location and meaning of LEDs on the front of 2240 systems


You can check the LEDs on the front of the chassis to learn whether the power is turned on, the controller is active, the system is halted, or a fault in the chassis has occurred. The following illustration shows the LEDs on the front of a FAS2240-2 system with the bezel in place.

1 LEDs
2 Shelf ID digital display

2240-4 systems have 4U chassis, but the placement and function of the LEDs are the same as on 2240-2 systems. The following table shows what the LED labels look like and explains what the LEDs mean.

LED name | Status indicator | Description
Power | Green | Power is being supplied to the system.
Power | Off | No power is being supplied to the system.
Fault | Amber | A fault has occurred in the controller, PSU, or onboard storage, or Data ONTAP is not running.
Fault | Off | The system is operating normally.

The shelf ID digital display shows the shelf ID of the chassis, which contains disk drives.



Note: If the 2240 system has no attached disk shelves, then the chassis can have any ID number. However, if disk shelves are attached, the chassis shelf and attached disk shelves must have unique ID numbers.
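The shelf ID rule in the preceding note can be checked mechanically when you plan a configuration. The following Python sketch is illustrative only: the function name `find_duplicate_shelf_ids` and the idea of passing the IDs as a list are assumptions made for this example and are not part of any NetApp tool.

```python
from collections import Counter


def find_duplicate_shelf_ids(chassis_id, attached_shelf_ids):
    """Return the sorted shelf IDs that are used more than once.

    chassis_id: ID shown on the 2240 chassis shelf ID digital display.
    attached_shelf_ids: IDs of the attached disk shelves (may be empty).
    """
    if not attached_shelf_ids:
        # With no attached disk shelves, the chassis can have any ID.
        return []
    counts = Counter([chassis_id, *attached_shelf_ids])
    return sorted(shelf_id for shelf_id, n in counts.items() if n > 1)


print(find_duplicate_shelf_ids(0, []))         # no disk shelves attached
print(find_duplicate_shelf_ids(0, [1, 2]))     # all IDs unique
print(find_duplicate_shelf_ids(1, [1, 2, 2]))  # IDs 1 and 2 are reused
```

An empty result means the planned IDs satisfy the rule; any returned ID must be changed on one of the shelves that share it.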

When the bezel is removed, a third LED, indicating activity, is revealed below the fault LED. The following table shows what the activity LED label looks like and explains what the LED means.

LED name | Status indicator | Description
Activity | Green | A link is established between the controller and storage.

Location and meaning of LEDs on the back of 2240 controllers


You can check the LEDs on the back of the controller to learn the status of its network or disk shelf connections, or, in an HA pair, to identify the controller where a fault occurred. The following illustration shows the ports and LEDs on the back of the controller.

1 SAS port LEDs
2 SAS ports
3 Controller fault LED
4 NVMEM status LED
5 Optional mezzanine card LEDs (either 2/4/8 Gbps Fibre Channel or 10 GbE)
6 Optional mezzanine card ports (either 2/4/8 Gbps Fibre Channel or 10 GbE)
7 Serial port
8 USB port
9 Remote management Ethernet 10/100 Mb port LEDs
10 Remote management Ethernet 10/100 Mb port
11 Private management 10/100 Mb Ethernet port LEDs
12 Private management 10/100 Mb Ethernet port
13 GbE Ethernet port LEDs
14 GbE Ethernet port

If the optional mezzanine card is installed, it provides one of the following sets of ports:
- Two 2/4/8 Gbps Fibre Channel ports, each with one LNK LED
- Two 10-GbE ports, each with one activity LED and one LNK LED

The following table describes the meaning of the LEDs on the back of the controller.

Name | Type | Status indicator | Description
Serial attached SCSI (SAS) | Link | Green | Link is established on at least 1 external SAS lane.
Serial attached SCSI (SAS) | Link | Off | No link is established on any external SAS lane.
Controller fault | Activity | Amber | The controller module is starting up, Data ONTAP is initializing, the controller module is in Maintenance mode, or a controller module fault is detected. Note: The LED might be illuminated on both controllers.
Controller fault | Activity | Off | The controller is functioning properly.
NVMEM | NVMEM status | Blinking green | NVMEM is in battery-backed standby mode.
NVMEM | NVMEM status | Off (power on) | The system is running normally, and NVMEM is armed if Data ONTAP is running.
Fibre Channel | Link | Green | A connection is established on the port.
Fibre Channel | Link | Off | No connection is established on the port.
Ethernet | Link | Green | A link is established between the port and some upstream device.
Ethernet | Link | Off | No link is established.
Ethernet | Activity | Blinking amber | Traffic is flowing over the connection.
Ethernet | Activity | Off | No traffic is flowing over the connection.
Remote management | Link | Green | A link is established between the port and some upstream device.
Remote management | Link | Off | No link is established.
Remote management | Activity | Blinking amber | Traffic is flowing over the connection.
Remote management | Activity | Off | No traffic is flowing over the connection.
Private management | Link | Green | A link is established between the port and a downstream disk shelf.
Private management | Link | Off | No link is established.
Private management | Activity | Blinking amber | Traffic is flowing over the connection.
Private management | Activity | Off | No traffic is flowing over the connection.

Location and meaning of 2240 PSU LEDs


You can check the LEDs on each PSU to see whether its power is on and whether the PSU and integrated fan modules are working properly. The PSUs on 2240-2 systems and 2240-4 systems differ in appearance, as do their LEDs, but the LEDs function the same way. The following illustration shows the location of PSU LEDs on the back of the 2240-2 system.

1 PSU OK
2 DC fault
3 AC fault
4 Fan fault

The following illustration shows the location of PSU LEDs on the back of the 2240-4 system.

1 AC fault
2 Fan fault
3 PSU OK
4 DC fault

The following table describes what the PSU LEDs on 2240 systems mean.

Name | Status indicator | Description
PSU OK | Green | The PSU is functioning normally. Note: The other three LEDs are not illuminated.
DC fault | Amber | The PSU cannot provide DC voltage to the disk shelf within margin.
AC fault | Amber | The PSU is not turned on or the AC power cord is not plugged in.
Fan fault | Amber | An error occurred with the function of the fan.
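If you record LED observations in a script or checklist tool, the 2240 PSU table above can be expressed as a small lookup. This is a minimal Python sketch, not a NetApp utility: the identifiers `psu_ok`, `dc_fault`, `ac_fault`, and `fan_fault` and the function `describe_2240_psu` are hypothetical names chosen for the example, and the descriptions are copied from the table.

```python
# Descriptions copied from the 2240 PSU LED table; the keys are
# hypothetical identifiers chosen for this sketch.
FAULT_MEANINGS = {
    "dc_fault": "The PSU cannot provide DC voltage to the disk shelf within margin.",
    "ac_fault": "The PSU is not turned on or the AC power cord is not plugged in.",
    "fan_fault": "An error occurred with the function of the fan.",
}


def describe_2240_psu(lit_leds):
    """Translate the names of the lit 2240 PSU LEDs into descriptions."""
    lit = set(lit_leds)
    unknown = lit - set(FAULT_MEANINGS) - {"psu_ok"}
    if unknown:
        raise ValueError(f"unknown LED names: {sorted(unknown)}")
    # Any lit amber LED is reported first, in table order.
    faults = [text for name, text in FAULT_MEANINGS.items() if name in lit]
    if faults:
        return faults
    if "psu_ok" in lit:
        return ["The PSU is functioning normally."]
    return ["No LED is lit; the table does not describe this state."]


print(describe_2240_psu(["psu_ok"]))
print(describe_2240_psu(["ac_fault", "fan_fault"]))
```

The sketch reports fault LEDs ahead of PSU OK because the table states that the other three LEDs are not illuminated when the PSU is functioning normally.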

Location and meaning of 2240 internal FRU LEDs


2240 systems contain LEDs inside the controller that assist in troubleshooting the FRUs inside it. The following FRUs are in the controller and have LEDs on or near them:
- DIMMs (2)
- RTC battery
- Boot media device
- Mezzanine card

The FRU LEDs remain unlit when the FRU is functioning normally and turn amber when a problem occurs. They stay lit for at least 10 minutes even after you remove the controller from the chassis.

30xx and SA300 and C2300 and C3300 NetCache system LEDs
30xx and SA300 systems and C2300 and C3300 NetCache appliances have LEDs that you can check to learn whether the system and its components are turned on and are operating normally. LEDs are visible on the front and rear of each system and on the power supplies.
Note: 30xx systems do not include 3140, 3160, and 3170 systems, which are referred to collectively as 31xx systems. 30xx systems also do not include 3210, 3240, and 3270 systems, which are referred to collectively as 32xx systems.

Location and meaning of LEDs on the front of 30xx, SA300, and C2300 and C3300 NetCache controllers
You can check the LEDs on the front of the controller to learn whether the power is turned on, whether there is activity on the controller, whether the system is halted, or whether a fault has occurred. The following illustration shows the LEDs on the front of the controller.

1 Activity LED
2 Status LED
3 Power LED

The following table explains the meaning of the LEDs.

LED label | Status indicator | Description
Activity | Green | The system is operating and is active.
Activity | Blinking | The system is actively processing data.
Activity | Off | No activity is detected.
Status | Green | The system is operating normally.
Status | Amber | The system halted or a fault occurred. The fault is displayed in the LCD. Note: This LED remains lit during boot, while the operating system loads.
Power | Green | The system is receiving power.
Power | Off | The system is not receiving power.


Location and meaning of LEDs on the back of 30xx, SA300, and C2300 and C3300 NetCache controllers
You can check the LEDs on the back of the controller to learn the status of the controller network connections. The following LEDs are visible on the back of the controller:
- Fibre Channel port LEDs
- GbE port LEDs
- RLM LEDs

The following illustration shows the location of LEDs on the back of the controller.

1 Fibre Channel port LEDs
2 GbE port LEDs
3 RLM LEDs

The following table explains what the LEDs on the back of the controller mean.

Port type | LED type | Status indicator | Description
Fibre Channel | LNK | Off | No link with the Fibre Channel is established.
Fibre Channel | LNK | Green | A link is established.
GbE and RLM | LNK | On | A valid network connection is established.
GbE and RLM | LNK | Off | There is no network connection.
GbE and RLM | ACT | On | There is data activity.
GbE and RLM | ACT | Off | There is no network activity present.

Location and meaning of 30xx, SA300, and C2300 and C3300 NetCache PSU LEDs
You can check the LEDs on the PSUs to learn whether they are functioning normally. The following illustration shows the location of the PSU LEDs on the back of the system.

1 PSU 1
2 PSU 2
3 PSU LEDs

The following table explains what the PSU LEDs mean.

LED label AC OK or Status AC OK or Status AC OK or Status Status indicator Amber Green Off Off Amber Off There is no external power; check the connections and the power source. (3020 and 3050 systems) CFE prompt. (3040, 3070, and SA300 systems) The system displays the LOADER> prompt because it has not booted Data ONTAP. Description No fault is indicated.

AC OK or Status

Flashing amber Amber

There is a power supply fault; replace the power supply.

31xx system LEDs


31xx systems have LEDs that you can check to learn whether the system and its individual components are turned on and are operating normally. LEDs are visible on the front and rear of each system and on the fan FRUs and the power supplies.

Location and meaning of LEDs on the front of 31xx chassis


You can check the LEDs on the front of the chassis to learn whether the power is turned on, the controller is active, the system is halted, or a fault in the chassis has occurred. The following illustration shows the LEDs on the front of the chassis.


LEDs on the front of the system

When the bezel is in place, the LEDs are arranged horizontally in the following left-to-right order:
- Power
- Fault
- Controller A activity
- Controller B activity

Controller A is the controller in the top of the chassis, and Controller B is the controller in the bottom of the chassis.

Note: When the bezel is removed, the LEDs are arranged vertically in the following top-to-bottom order:
- Power
- Fault
- Controller A activity
- Controller B activity

The following table shows what the LED labels look like and explains what the LEDs mean.

LED name | Status indicator | Description
Power | Green | At least one of the two PSUs is delivering power to the system.
Power | Off | Neither PSU is delivering power to the system.
Fault | Amber | The system halted or a fault occurred in the chassis. The error might be in a PSU, fan, or controller. The LED also is lit when there is a FRU failure, Data ONTAP is not running on a controller, or the system is in Maintenance mode. Note: You can check the fault light on the back of each controller to see where the problem occurred. Note: The fault light does not come on when you remove the controller from a dual-controller system in an HA pair.
Fault | Off | Both controllers are operating normally.
A/B Activity | Blinking green | Data ONTAP is running on the controller. The length of time that the light remains on is proportional to the controller's activity.
A/B Activity | Off | Data ONTAP is not running on the controller.

Location and meaning of LEDs on the back of 31xx controllers


You can check the LEDs on the back of the controller to learn the status of its network or disk shelf connections, or, in an HA pair, to identify the controller where a fault occurred. The following LEDs are visible on the back of the controller:
- Ethernet port
- Fault
- Fibre Channel port

The following illustration shows the location of the LEDs on the back of the controller.

The following table explains the behavior of the LEDs on the back of the controller.

Type name | LED type | Status indicator | Description
Ethernet port | Link (left) | Green | A link is established between the port and some upstream device.
Ethernet port | Link (left) | Off | No link is established.
Ethernet port | Activity (right) | Amber | Traffic is flowing over the connection.
Ethernet port | Activity (right) | Off | No traffic is flowing over the connection.
Management port (Ethernet) | Link (left) | Green | A link is established between the port and some upstream device.
Management port (Ethernet) | Link (left) | Off | No link is established.
Management port (Ethernet) | Activity (right) | Amber | Traffic is flowing over the connection.
Management port (Ethernet) | Activity (right) | Off | No traffic is flowing over the connection.
Controller fault | Activity | Amber | The controller is the one causing the front panel LED to be illuminated. Note: This LED might be illuminated on both controllers.
Controller fault | Activity | Off | The controller is functioning properly.
Fibre Channel | Link | Green | A loop connection is established on the port.
Fibre Channel | Link | Off | No loop connection is established on the port.

Location and meaning of 31xx fan LEDs


You can check the LED on each fan module FRU to pinpoint problems that can occur in the FRU. When the bezel is removed, the fan module FRUs and their LEDs are visible. The following illustration shows the LED on a fan module FRU.


Fan module FRU LED

The fan module FRU LED is amber and turns on when a problem occurs in the fan. If you see error messages indicating a fan problem, you can remove the bezel and use the illuminated fan FRU LED to locate the FRU where the problem occurred.

Location and meaning of 31xx PSU LEDs


You can check the LEDs on each AC PSU or DC PSU to see whether its power is on and whether the PSU is working properly. The following illustration shows the location of AC PSU LEDs on the back of the system. DC PSUs have different power connectors, but their LEDs are the same.

1 Fault LED
2 Power LED

The following table describes what the AC PSU and DC PSU LEDs mean.

PSU type | PSU condition | Power LED status | Fault LED status
AC, -48VDC | PSU is present and switched on. Normal mode. | Green | Off
AC, -48VDC | PSU is missing or switched off. The other PSU is off or functioning normally. | Off | Off
AC, -48VDC | PSU fault: AC in or -48VDC is out of range, or there is a DC fault or fan fault. | Off | Blinking amber

Location and meaning of 31xx FRU LEDs


31xx systems have 15 internal LEDs that assist in troubleshooting FRUs. Eleven LEDs are next to FRUs on the controller board: (up to eight) DIMMs, CompactFlash, RLM, and the RTC battery. When an LED is lit, it indicates that the FRU next to it needs to be replaced.

Four LEDs are on the PCIe riser, one per PCIe slot. When one of the LEDs is lit, it indicates that there is a problem with the card in that particular PCIe slot. The FRU LEDs stay lit for at least 10 minutes even after you remove the controller from the system.

32xx and SA320 system LEDs


32xx and SA320 systems have LEDs that you can check to learn whether the system and its individual components are turned on and are operating normally. LEDs are visible on the front of the chassis, on the back of controllers and I/O expansion modules, and on fan FRUs and power supplies.

Location and meaning of LEDs on the front of 32xx and SA320 chassis
You can check the LEDs on the front of the chassis to learn whether the power is turned on, the controller is active, the system is halted, or a fault in the chassis has occurred. The following illustration shows the LEDs on the front of the chassis.

LEDs

When the bezel is in place, the LEDs are arranged horizontally in the following left-to-right order:
- Power
- Fault
- Controller A activity
- Controller B activity

When two controllers are installed in the chassis, Controller A is the controller in the top bay, and Controller B is the controller in the bottom bay. When a controller and an I/O expansion module are installed in the chassis, the controller is always in the top bay and the I/O expansion module is always in the bottom bay. The following table shows what the LED labels look like and explains what the LEDs mean.

LED name | Status indicator | Description
Power | Green | Power is being supplied to the system.
Power | Off | No power is being supplied to the system.
Fault | Amber | The system halted, or a fault occurred in the chassis.
Fault | Off | The controllers are operating normally, or the controller and the I/O expansion module are operating normally.
Controller A/B | Blinking green | Data ONTAP is running on the controller. The length of time that the light remains on is proportional to the controller's activity. Note: If an I/O expansion module is installed in the chassis, the corresponding controller activity LED is not lit.
Controller A/B | Off | Data ONTAP is not running on the controller.

Location and meaning of LEDs on the back of 32xx and SA320 controllers
You can check the LEDs on the back of the controller to learn the status of its network or disk shelf connections, or, in an HA pair, to identify the controller where a fault occurred. The following illustration shows the ports and LEDs on the back of the controller.

1 SAS port LEDs
2 SAS ports
3 HA port LEDs (LEDs pointing up belong to the upper port; LEDs pointing down belong to the lower port.)
4 HA ports
5 Fibre Channel port LEDs (LED pointing up belongs to the upper port; LED pointing down belongs to the lower port.)
6 Fibre Channel ports
7 1-GbE port LEDs
8 1-GbE ports
9 Management Ethernet 10/100 Mb port LEDs
10 Private management 10/100 Mb Ethernet port
11 USB (top) and serial console (bottom) ports (External USB devices are not currently supported.)
12 Controller fault LED
13 NVMEM LED

The following table describes the meaning of the LEDs on the back of the controller.

Name | Type | Status indicator | Description
Serial attached SCSI (SAS) | Link | Green | Link is established on at least 1 external SAS lane.
Serial attached SCSI (SAS) | Link | Off | No link is established on any external SAS lane.
Fibre Channel | Link | Green | A connection is established on the port.
Fibre Channel | Link | Off | No connection is established on the port.
Ethernet | Link | Green | A link is established between the port and some upstream device.
Ethernet | Link | Off | No link is established.
Ethernet | Activity | Amber | Traffic is flowing over the connection.
Ethernet | Activity | Off | No traffic is flowing over the connection.
Remote management | Link | Green | A link is established between the port and some upstream device.
Remote management | Link | Off | No link is established.
Remote management | Activity | Amber | Traffic is flowing over the connection.
Remote management | Activity | Off | No traffic is flowing over the connection.
Private management | Link | Green | A link is established between the port and a downstream disk shelf.
Private management | Link | Off | No link is established.
Private management | Activity | Amber | Traffic is flowing over the connection.
Private management | Activity | Off | No traffic is flowing over the connection.
Controller fault | Activity | Amber | A problem has occurred in the controller. This in turn has caused the system fault LED on the front of the chassis to be illuminated. Note: The LED might be illuminated on both controllers.
Controller fault | Activity | Off | The controller is functioning properly.
NVMEM | NVMEM status | Blinking green | NVMEM is in battery-backed standby mode.
NVMEM | NVMEM status | Off (power on) | The system is running normally, and NVMEM is armed if Data ONTAP is running.

Location and meaning of the LED on the back of 32xx and SA320 I/O expansion modules
You can check the back of the I/O expansion module to detect whether a fault has occurred. The following illustration shows the ports and LEDs on the back of an I/O expansion module.

1 PCIe slots (labeled 3, 4, 5, and 6)
2 Fault LED

The following table describes the meaning of the LED on the I/O expansion module.

Name | Type | Status indicator | Description
I/O expansion module fault | Activity | Amber | A fault has occurred.
I/O expansion module fault | Activity | Off | The I/O expansion module is functioning normally.


Location and meaning of 32xx and SA320 fan LEDs


You can check the LED on each fan module FRU to pinpoint problems that can occur in the FRU. When the bezel is removed, the fan module FRUs and their LEDs are visible. The following illustration shows the LED on a fan module FRU.

1 LED

The fan module FRU LED is amber and illuminates when a problem occurs in the fan. If you see error messages indicating a fan problem, you can remove the bezel and use the illuminated fan FRU LED to locate the FRU where the problem occurred.

Location and meaning of 32xx and SA320 PSU LEDs


You can check the LEDs on each PSU to see whether its power is on and whether the PSU is working properly. The following illustration shows the location of PSU LEDs on the back of the system.

1 Fault LED
2 Power LED

The following table describes what the PSU LEDs mean.

Power LED status | Fault LED status | PSU condition
Green | Off | PSU is present and switched on. Normal mode.
Off | Off | PSU is missing or switched off. The other PSU is off or functioning normally.
Off | Blinking amber | PSU fault: AC in is out of range, or there is a DC fault or fan fault.

Location and meaning of 32xx and SA320 internal FRU LEDs


32xx systems contain LEDs inside the controller and I/O expansion module that assist in troubleshooting FRUs inside of them. The following FRUs are in the controller and have LEDs on or near them:
- DIMMs (up to 4)
- RTC battery
- USB device
- PCIe slots (2)

The I/O expansion module has four PCIe slots, each with an LED. The FRU LEDs remain unlit when the FRU is functioning normally and turn amber when a problem occurs. They stay lit for at least 10 minutes even after you remove the controller or I/O expansion module from the chassis.

60xx and SA600 system LEDs


60xx and SA600 systems have LEDs that you can check to learn whether the system and its components are turned on and are operating normally. LEDs are visible on the front and rear of each system, and on the fan FRUs and the power supplies.

Location and meaning of LEDs on the front of 60xx and SA600 controllers
You can check the LEDs on the front of the controller to learn whether the power is turned on, whether the system is active, whether the system is halted, or whether there is a fault in the chassis. The following illustration shows the LEDs on the front of the controller.

1 Activity LED
2 Status LED
3 Power LED

The following table explains what the LEDs on the front of the controller mean.

LED label | Status indicator | Description
Activity | Green | The system is operating and is active.
Activity | Blinking | The system is actively processing data.
Activity | Off | No activity is detected.
Status | Green | The system is operating normally.
Status | Amber | The system halted or a fault occurred. The fault is displayed in the LCD. Attention: The LED remains lit during boot, while the operating system loads.
Power | Green | The system is receiving power.
Power | Off | The system is not receiving power.

Location and meaning of LEDs on the back of 60xx and SA600 controllers
You can check the LEDs on the back of the controller to learn the status of network and disk shelf connections. The following illustration shows the location of LEDs on the back of the controller.

1 GbE port LEDs
2 RLM port LEDs
3 Fibre Channel port LEDs

The following table explains what the LEDs on the rear of the controller mean.

Port type | LED type | Status indicator | Description
Fibre Channel | LNK (Green) | Off | No link with the Fibre Channel is established.
Fibre Channel | LNK (Green) | Blinking (6030 and 6070 systems) or Solid (6040, 6080, and SA600 systems) | A link is established and communication is happening.
GbE and RLM | LNK | On | A valid network connection is established.
GbE and RLM | LNK | Off | There is no network connection.
GbE and RLM | ACT | On | There is data activity.
GbE and RLM | ACT | Off | There is no network activity present.
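Because the Fibre Channel LNK LED signals an established link differently depending on the model, a per-model lookup avoids misreading it. The following Python sketch is an illustration under stated assumptions: the function name `interpret_fc_lnk` and the state strings `off`, `blinking`, and `solid` are invented for this example, and combinations that the table does not list are reported as undocumented rather than guessed.

```python
# LNK LED state that indicates an established link, per model,
# as listed in the Fibre Channel rows of the table above.
LINK_UP_STATE = {
    "6030": "blinking",
    "6070": "blinking",
    "6040": "solid",
    "6080": "solid",
    "SA600": "solid",
}


def interpret_fc_lnk(model, led_state):
    """Interpret the green Fibre Channel LNK LED for a 60xx or SA600 system."""
    if model not in LINK_UP_STATE:
        raise ValueError(f"model not covered by the table: {model}")
    if led_state not in ("off", "blinking", "solid"):
        raise ValueError(f"unknown LED state: {led_state}")
    if led_state == "off":
        return "No link with the Fibre Channel is established."
    if led_state == LINK_UP_STATE[model]:
        return "A link is established and communication is happening."
    # For example, a solid LED on a 6030: the table does not describe this.
    return "This state is not documented for this model."


print(interpret_fc_lnk("6030", "blinking"))
print(interpret_fc_lnk("6080", "solid"))
print(interpret_fc_lnk("6080", "off"))
```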

Location and meaning of 60xx and SA600 fan LEDs


You can check the fan LEDs to learn whether the fan is functioning properly. The following illustration shows the location of the fan LEDs, which you can see when you remove the bezel from the system.

Fan
LEDs

The following table describes the behavior of the fan LEDs.

LED status | Description
Orange blinking | The fan failed.
Off | There is no power to the system, or the fan is operational.

Location and meaning of 60xx and SA600 PSU LEDs


You can check the LEDs to learn whether the PSUs are providing power to your system and whether they are functioning properly. The following illustration shows the location of the PSU LEDs on your system.

1 LEDs
2 Power supply

The following table explains what the PSU LEDs mean.

Amber (AC input) | Green (PSU status) | Description | Corrective action
On | On | The AC power source is good, and the PSU is providing power to the system. | N/A
On | Off | AC power is present, but the PSU is not delivering power to the system. | Ensure that the PSU is properly seated and that its cables are connected and secure.
On | Blinking | AC power is present, but the power supply is not enabled. | 1. Log in to the RLM and enter the following command: system power on 2. If the problem persists, contact technical support.
Off | Off | AC power is either not present or not within operational limits. | Check the AC switch, AC power cable, and upstream circuit breakers.

Note: Using the system power command might cause an improper shutdown of the storage system. During power-cycling, a brief pause occurs before power is turned back on.
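The 60xx and SA600 PSU table pairs each LED combination with a corrective action, which maps naturally onto a dictionary keyed by the two LED states. The sketch below is a hedged illustration, not a NetApp tool: `diagnose_60xx_psu` and the state strings are hypothetical, and the descriptions and actions are condensed from the table.

```python
# (amber AC-input LED, green PSU-status LED) -> (description, corrective action).
# Text condensed from the 60xx and SA600 PSU LED table.
PSU_STATES = {
    ("on", "on"): (
        "The AC power source is good, and the PSU is providing power to the system.",
        "N/A",
    ),
    ("on", "off"): (
        "AC power is present, but the PSU is not delivering power to the system.",
        "Ensure that the PSU is properly seated and that its cables are connected and secure.",
    ),
    ("on", "blinking"): (
        "AC power is present, but the power supply is not enabled.",
        "Log in to the RLM and enter 'system power on'. "
        "If the problem persists, contact technical support.",
    ),
    ("off", "off"): (
        "AC power is either not present or not within operational limits.",
        "Check the AC switch, AC power cable, and upstream circuit breakers.",
    ),
}


def diagnose_60xx_psu(amber_ac_input, green_psu_status):
    """Return (description, corrective_action) for a pair of LED states."""
    key = (amber_ac_input.lower(), green_psu_status.lower())
    if key not in PSU_STATES:
        raise ValueError(f"LED combination not covered by the table: {key}")
    return PSU_STATES[key]


description, action = diagnose_60xx_psu("On", "Blinking")
print(description)
print(action)
```

Combinations that the table does not list raise an error instead of returning a guess, so an unexpected reading is escalated rather than silently misclassified.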

62xx and SA620 system LEDs


62xx and SA620 systems have LEDs that you can check to learn whether the system and its individual components are turned on and are operating normally. LEDs are visible on the front of the chassis, the rear of controllers and I/O expansion modules, and on fan FRUs and power supplies.

Location and meaning of LEDs on the front of 62xx and SA620 chassis
You can check the LEDs on the front of the chassis to learn whether the power is turned on, the controller is active, the system is halted, or a fault in the chassis has occurred. The following illustration shows the LEDs on the front of the 62xx and SA620 chassis.


Chassis LEDs

When the bezel is in place, the LEDs are arranged horizontally in the following left-to-right order:
- Power
- Fault
- Controller A activity
- Controller B activity

When two controllers are installed in the chassis, Controller A is the controller in the top bay, and Controller B is the controller in the bottom bay. When a controller and an I/O expansion module are installed in the chassis, the controller is always in the top bay and the I/O expansion module is always in the bottom bay.

Note: When the bezel is removed, the LEDs are arranged vertically in the following top-to-bottom order:
- Power
- Fault
- Controller A activity
- Controller B activity

The following table shows what the LED labels look like and explains what the LEDs mean.

LED name | Status indicator | Description
Power | Green | At least one of the two PSUs is delivering power to the system.
Power | Off | Neither PSU is delivering power to the system.
Fault | Amber | The system halted or a fault occurred in the chassis. The error might be in a PSU, fan, controller, or I/O expansion module. The LED also is lit when there is a FRU failure, Data ONTAP is not running on a controller, or the system is in Maintenance mode. Note: You can check the fault light on the back of each controller to see where the problem occurred. Note: The fault light does not come on when you remove the controller from a dual-controller system in an HA pair.
Fault | Off | The system is operating normally.
Activity | Blinking green | Data ONTAP is running on the controller. The length of time that the light remains on is proportional to the controller's activity.
Activity | Off | Data ONTAP is not running on the controller.

Location and meaning of LEDs on the back of 62xx and SA620 controllers
You can check the LEDs on the back of the controller to learn the status of its network or disk shelf connections, or, in an HA pair, to identify the controller where a fault occurred. The following illustration shows the LEDs on the left side of the back of the 62xx and SA620 controllers.

1 Remote management port LEDs
2 Private management port LEDs
3 GbE port LEDs
4 Controller fault LED
5 Remote management port
6 Private management port
7 GbE port
8 10-GbE ports
9 10-GbE port LEDs

The following table describes the meaning of the LEDs on the left side of the back of the controller.

LED name | LED type | Status indicator | Description
Fault | Activity | Amber | The controller is the one causing the front panel fault LED to be illuminated. Note: The LED might be illuminated on both controllers.
Fault | Activity | Off | The controller is functioning properly.
Remote management | Link (Left) | Green | A link is established between the port and some upstream device.
Remote management | Link (Left) | Off | No link is established.
Remote management | Activity (Right) | Amber | Traffic is flowing over the connection.
Remote management | Activity (Right) | Off | No traffic is flowing over the connection.
Private management | Link (Left) | Green | A link is established between the port and a downstream disk shelf.
Private management | Link (Left) | Off | No link is established.
Private management | Activity (Right) | Amber | Traffic is flowing over the connection.
Private management | Activity (Right) | Off | No traffic is flowing over the connection.
GbE (label: port number and icon) | Link (Left) | Green | A link is established between the port and some upstream device.
GbE (label: port number and icon) | Link (Left) | Off | No link is established.
GbE (label: port number and icon) | Activity (Right) | Amber | Traffic is flowing over the connection.
GbE (label: port number and icon) | Activity (Right) | Off | No traffic is flowing over the connection.
10 GbE (label: port number and icon) | Activity (Top) | Amber | Traffic is flowing over the connection.
10 GbE (label: port number and icon) | Activity (Top) | Off | No traffic is flowing over the connection.
10 GbE (label: port number and icon) | Link (Bottom) | Green | A link is established between the port and some upstream device.
10 GbE (label: port number and icon) | Link (Bottom) | Off | No link is established.

The following illustration shows the location of ports and LEDs on the right side of the back of the controller.

1 USB port
2 8-Gb Fibre Channel port LED
3 8-Gb Fibre Channel ports
4 Console port

The following table describes the meaning of the LEDs on the right side of the back of the controller.

LED name | LED type | Status indicator | Description
8-Gb Fibre Channel (label: port number and icon) | Link | Green | A connection is established on the port.
8-Gb Fibre Channel (label: port number and icon) | Link | Off | No connection is established on the port.

Location and meaning of the 62xx and SA620 I/O expansion module LED
You can check the back of the I/O expansion module to see whether a fault has occurred. The following illustration shows the ports and LEDs on the back of a 62xx and SA620 I/O expansion module.

1 Fault LED
2 PCIe slots
3 Vertical I/O slots

The following table describes the meaning of LEDs on the I/O expansion module.

LED name | LED type | Status indicator | Description
Fault | Activity | Amber | A fault has occurred.
Fault | Activity | Off | The I/O expansion module is operating properly.

Location and meaning of 62xx and SA620 fan LEDs


You can check the LED on each fan module FRU to pinpoint problems that can occur in the FRU. When the bezel is removed, the fan module FRUs and their LEDs are visible. The following illustration shows the LED on a fan module FRU.

LED

The fan module FRU LED is amber and turns on when a problem occurs in the fan. If you see error messages indicating a fan problem, you can remove the bezel and use the illuminated fan FRU LED to locate the FRU where the problem occurred.

Location and meaning of 62xx and SA620 PSU LEDs


You can check the LEDs on each PSU to see whether its power is on and whether the PSU is working properly. The following illustration shows the location of PSU LEDs on the back of the system.

1  Fault LED
2  Power LED

The following table describes what the PSU LEDs mean.

Power LED status  Fault LED status  PSU condition
Green             Off               PSU is present and switched on. Normal mode.
Off               Off               PSU is switched off.
Off               Blinking amber    PSU fault: AC in is out of range, or there is a DC fault or fan fault.
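The PSU LED table lends itself to a simple lookup, for example when recording LED observations in a troubleshooting script. The following Python sketch is illustrative only: the function name `psu_condition` and the lowercase state strings are assumptions, not part of any NetApp tool. It returns a fallback message for LED combinations that the table does not list.

```python
# Illustrative sketch: names and state strings are assumptions, not a NetApp API.
# Keys are (power LED status, fault LED status) exactly as listed in the PSU table.
PSU_CONDITIONS = {
    ("green", "off"): "PSU is present and switched on. Normal mode.",
    ("off", "off"): "PSU is switched off.",
    ("off", "blinking amber"): (
        "PSU fault: AC in is out of range, or there is a DC fault or fan fault."
    ),
}


def psu_condition(power_led: str, fault_led: str) -> str:
    """Return the documented PSU condition for an observed pair of LED states."""
    key = (power_led.strip().lower(), fault_led.strip().lower())
    # Combinations that the table does not document get an explicit fallback.
    return PSU_CONDITIONS.get(key, "Combination not listed in the PSU LED table.")


if __name__ == "__main__":
    print(psu_condition("Off", "Blinking amber"))
```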

Location and meaning of 62xx and SA620 internal FRU LEDs


62xx and SA620 systems contain LEDs near FRUs inside the controller and I/O expansion module that assist in troubleshooting the FRUs.

The following FRU LEDs are in the controller:
- DIMMs (up to 12)
- RTC battery
- USB boot device
- PCIe slots
- 10-GbE slot
- I/O slots (2)

The following FRU LEDs are in the I/O expansion module:
- PCIe slots
- I/O slots

FRU LEDs are off when the FRU is functioning normally and turn amber when a problem occurs. They stay lit for at least 10 minutes even after you remove the controller or I/O expansion module from the chassis.

HBA LEDs
HBAs have LEDs that you can check to learn whether the adapter has power, whether a link is established, or whether an error has occurred. Storage systems might have Fibre Channel or iSCSI host bus adapters installed and configured on them.

Location and meaning of dual-port Fibre Channel HBA LEDs


You can check the LEDs on the HBA to learn the status of the Fibre Channel connection. The following illustration shows the location of the LED on a dual-port Fibre Channel HBA.

Illustration labels: Green LED, Amber LED.

The following table explains what the LEDs on a dual-port Fibre Channel HBA mean.

Green     Amber     Description
On        On        The power is on.
Off       Blinking  Sync is lost.
Off       On        Signal is acquired.
On        Off       Ready.
Flashing  Off       4 seconds solid followed by one flash: 1-Gb link speed. 4 seconds solid green link followed by two flashes: 2-Gb link speed.
Flashing  Blinking  Adapter firmware error has been detected.

Location and meaning of dual-port, 4-Gb or 8-Gb, target-mode Fibre Channel HBA LEDs
You can check the LEDs to learn whether the HBA power is on, whether a firmware error has been detected, and whether a link has been established. The following illustration shows the location of LEDs on a dual-port, 4-Gb or 8-Gb, target-mode Fibre Channel HBA.

1   Amber
2   Green
3   Yellow
4   Port a
5   Port b
6   Yellow
7   Green
8   Amber
9   TX
10  RX
11  TX
12  RX

The following table explains what the LEDs mean.

Yellow       Green        Amber        Description
Off          Off          Off          Power is off.
On           On           On           Power is on, before firmware initialization.
Blinking     Blinking     Blinking     Power is on, after firmware initialization.
All three LEDs blinking alternately    A firmware error is detected.
Off          Off          On/Blinking  4-Gb HBA: 1-Gbps link/I/O is established. 8-Gb HBA: On for 2 Gbps link up. If there is I/O activity, the LED blinks several times per second.
Off          On/Blinking  Off          4-Gb HBA: 2-Gbps link/I/O is established. 8-Gb HBA: On for 4 Gbps link up. If there is I/O activity, the LED blinks several times per second.
On/Blinking  Off          Off          4-Gb HBA: 4-Gbps link/I/O is established. 8-Gb HBA: On for 8 Gbps link up. If there is I/O activity, the LED blinks several times per second.
Blinking     Off          Blinking     Beacon.
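Because the same three LEDs signal different link speeds on 4-Gb and 8-Gb HBAs, the table can be read as a small set of rules. The following Python sketch illustrates that reading. The function name, the `"4-Gb"`/`"8-Gb"` keys, and the lowercase state strings are assumptions made for illustration, and the sketch treats a blinking LED as I/O activity for both generations, which the table states explicitly only for the 8-Gb HBA.

```python
# Illustrative sketch: names and state strings are assumptions, not a vendor API.
# Link speed (Gbps) signalled by each LED color, per HBA generation, as read from the table.
LINK_SPEED_GBPS = {
    "4-Gb": {"amber": 1, "green": 2, "yellow": 4},
    "8-Gb": {"amber": 2, "green": 4, "yellow": 8},
}


def describe_hba_leds(hba: str, yellow: str, green: str, amber: str) -> str:
    """Map observed LED states to the table's description.

    hba must be "4-Gb" or "8-Gb"; each state is "off", "on", "blinking",
    or "blinking alternately".
    """
    states = {"yellow": yellow.lower(), "green": green.lower(), "amber": amber.lower()}
    distinct = set(states.values())
    if distinct == {"off"}:
        return "Power is off."
    if distinct == {"on"}:
        return "Power is on, before firmware initialization."
    if distinct == {"blinking"}:
        return "Power is on, after firmware initialization."
    if distinct == {"blinking alternately"}:
        return "A firmware error is detected."
    if states == {"yellow": "blinking", "green": "off", "amber": "blinking"}:
        return "Beacon."
    # Exactly one LED lit (on or blinking) indicates an established link.
    lit = [color for color, state in states.items() if state != "off"]
    if len(lit) == 1 and states[lit[0]] in ("on", "blinking"):
        speed = LINK_SPEED_GBPS[hba][lit[0]]
        suffix = " with I/O activity" if states[lit[0]] == "blinking" else ""
        return f"{speed}-Gbps link is up{suffix}."
    return "Combination not listed in the table."


if __name__ == "__main__":
    print(describe_hba_leds("8-Gb", "off", "on", "off"))
```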

Location and meaning of dual-port, 8-Gb Fibre Channel Virtual Interface HBA LEDs
You can check the LEDs to learn whether the HBA power is on, whether a firmware error has been detected, and whether a link has been established. The following illustration shows the location of LEDs on the dual-port, 8-Gb Fibre Channel Virtual Interface HBA.

1  Amber LED
2  Green LED
3  Yellow LED
4  Port a
5  Port b
6  Transmitter port
7  Receiver port

The following table explains what the LEDs mean.

Yellow       Green        Amber        Description
Off          Off          Off          Power off
On           On           On           Power on, before firmware initialization
Blinking     Blinking     Blinking     Power on, after firmware initialization
Yellow, green, and amber LEDs blinking alternately   Firmware error
Off          Off          On/blinking  Online, 2 Gbps link/ I/O activity
Off          On/blinking  Off          Online, 4 Gbps link/ I/O activity
On/blinking  Off          Off          Online, 8 Gbps link/ I/O activity

Location and meaning of quad-port, 4-Gb, Fibre Channel HBA LEDs: four-LED version
You can check the LEDs on the HBA to learn the status of the storage system Fibre Channel link and whether data is being transferred. The following illustration shows the location of LEDs.

1  Port A (as identified by Data ONTAP)
2  Port B (as identified by Data ONTAP)
3  Port C (as identified by Data ONTAP)
4  Port D (as identified by Data ONTAP)
5  Port A LED
6  Port C LED
7  Port B LED
8  Port D LED

The following table describes what the LEDs mean.

LED label       Status indicator  Description
By port letter  White             There is a loss of sync or no link.
                Blinking white    There is a fault.
                Amber             1-Gbps link is established.
                Blinking amber    1-Gbps data transfer is taking place.
                Green             2-Gbps link is established.
                Blinking green    2-Gbps data transfer is taking place.
                Blue              4-Gbps link is established.
                Blinking blue     4-Gbps data transfer is taking place.

Location and meaning of quad-port, 4-Gb, Fibre Channel HBA LEDs: 12-LED version
You can check the LEDs on the HBA to learn the status of the Fibre Channel connection and whether data is being transferred. The following illustration shows the location of LEDs.

1  Port A (as identified in Data ONTAP)
2  Port B (as identified in Data ONTAP)
3  Port C (as identified in Data ONTAP)
4  Port D (as identified in Data ONTAP)
5  Ports A through D yellow LEDs
6  Ports A through D green LEDs
7  Ports A through D amber LEDs

The following table describes what the LEDs mean.

Yellow LEDs  Green LEDs  Amber LEDs  Description
Off          Off         Off         The power is off.
On           On          On          The power is on (before firmware initialization).
Blinking     Blinking    Blinking    The power is on (after firmware initialization).
All LEDs blinking alternately        A firmware error is detected.
Off          Off         On          1-Gbps link is established.
Off          Off         Blinking    1-Gbps data transfer is taking place.
Off          On          Off         2-Gbps link is established.
Off          Blinking    Off         2-Gbps data transfer is taking place.
On           Off         Off         4-Gbps link is established.
Blinking     Off         Off         4-Gbps data transfer is taking place.

Location and meaning of fiber-optic iSCSI target HBA LEDs


You can check the LEDs on the HBA to learn whether the HBA is on, whether it is connected to the network, and whether there is data activity. The following illustration shows the location of LEDs on a fiber optic, iSCSI, target HBA.

1  LINK LED
2  ACT LED
3  Port 2
4  Port 1

The following table explains what the LEDs on a fiber optic, iSCSI, target HBA mean.

LED label  Status indicator  Description
LINK       Yellow            The HBA is on and connected to the network.
           Off               The HBA is not connected to the network.
ACT        Green             A connection is established.
           Blinking green    There is data activity.

Location and meaning of copper iSCSI target HBA LEDs


You can check the HBA LEDs to learn whether the HBA is running at 1 Gbps, whether a connection is established, and whether there is data activity. The following illustration shows the location of LEDs on a copper iSCSI target HBA.

1  Speed LED
2  ACT LED
3  Port 2
4  Port 1

The following table explains what the LEDs on a copper iSCSI target HBA mean.

LED label  Status indicator  Description
Speed      Green             The HBA is running at 1 Gbps.
           Off               The HBA is not running at 1 Gbps.
ACT        Amber             A connection is established.
           Blinking amber    There is data activity.

Location and meaning of dual-port, 10-Gb, FCoE unified target HBA LEDs
You can check the LEDs on the HBA to learn about SAN or LAN traffic over the HBA and the status of the HBA and the connection. The following illustration shows the location of LEDs on a dual-port, 10-Gb, FCoE (Fibre Channel over Ethernet) HBA.

1  One of two LAN LEDs
2  One of two SAN LEDs
3  Port a
4  Port b
5  One of two transmitter ports
6  One of two receiver ports

The ports in the preceding illustration are labeled a and b because Data ONTAP identifies ports alphabetically. The physical ports are labeled Port 1 for Port a and Port 2 for Port b.

Note: These HBAs are supported only in target mode and single system image controller failover cfmode. You cannot use this HBA as an initiator to connect to disks or tape, and you cannot use it for Fabric MetroCluster interconnect configurations.

The following table explains what the LEDs on a dual-port, 10-Gb, FCoE HBA mean.

Port  SAN traffic green LED                      LAN traffic green LED                      Hardware state
a     Off                                        Off                                        Power off.
      Slow flashing (unison)                     Slow flashing (unison)                     Power on/no link.
      On                                         On                                         Power on/link established, no activity.
      On                                         Flashing                                   Power on/link established, Rx/Tx Ethernet activity only.
      Flashing                                   On                                         Power on/link established, Rx/Tx storage activity only.
      Flashing                                   Flashing                                   Power on/link established, Rx/Tx Ethernet and storage activity.
      Slow flashing, alternating with other LED  Slow flashing, alternating with other LED  Beaconing.
b     Off                                        Off                                        Power off.
      Slow flashing (unison)                     Slow flashing (unison)                     Power on/no link.
      On                                         On                                         Power on/link established, no activity.
      On                                         Flashing                                   Power on/link established, Rx/Tx Ethernet activity only.
      Flashing                                   On                                         Power on/link established, Rx/Tx storage activity only.
      Flashing                                   Flashing                                   Power on/link established, Rx/Tx Ethernet and storage activity.
      Slow flashing, alternating with other LED  Slow flashing, alternating with other LED  Beaconing.

Location of dual-port, 3-Gb SAS HBA ports


Dual-port, 3-Gb SAS HBAs do not have LEDs that you can monitor. The following illustration shows the location of ports on a dual-port 3-Gb SAS HBA and its cable.

1  Port A
2  Port B
3  QSFP-to-Mini-SAS copper cable: Mini-SAS connector (to card)
4  QSFP-to-Mini-SAS copper cable: QSFP connector (to disk shelf)

Location of quad-port, 3-Gb SAS HBA ports


Quad-port, 3-Gb SAS HBAs do not have LEDs that you can monitor. The following illustration shows the location of ports on a quad-port, 3-Gb SAS HBA and its cable.

1  Port A
2  Port B
3  Port C
4  Port D
5  SAS QSFP-to-QSFP copper cable

MetroCluster adapter LEDs


MetroCluster adapters have LEDs that you can check to learn whether the adapter has power and whether an error has occurred.

Location and meaning of dual-port, 2-Gb VI-MetroCluster adapter LEDs


You can check the LEDs on the adapter to learn whether the power is on, whether a signal has been acquired, or whether an error has occurred. The following illustration shows the location of LEDs on a dual-port 2-Gb VI-MetroCluster adapter.

1  One of two amber LEDs
2  One of two green LEDs
3  Port A
4  Port B
5  One of two transmitter ports
6  One of two receiver ports

The following table explains what the LEDs mean.

Green                              Amber                              Description
Off                                Off                                Power is off.
On                                 On                                 Power is on.
Off                                Blinking at half-second intervals  Synchronization has been lost.
Off                                On                                 A signal has been acquired.
On                                 Off                                Adapter is online.
Blinking at half-second intervals  Blinking at half-second intervals  A system error has occurred.

Location and meaning of dual-port, 4-Gb MetroCluster adapter LEDs


You can check the LEDs on the adapter to learn whether power is on, whether there is activity, or whether an error has occurred. The following illustration shows the LEDs on the dual-port, 4-Gb MetroCluster adapter.

1   Amber LED
2   Green LED
3   Yellow LED
4   Port a
5   Port b
6   Yellow LED
7   Green LED
8   Amber LED
9   Transmitter port
10  Receiver port
11  Transmitter port
12  Receiver port

The following table describes what the LEDs mean.

Yellow       Green        Amber        Description
Off          Off          Off          Power is off.
On           On           On           Power is on, before firmware initialization.
Blinking     Blinking     Blinking     Power is on, after firmware initialization.
Yellow, green, and amber LEDs blinking alternately   A firmware error has occurred.
Off          Off          On/blinking  Online, 1 Gbps link/ I/O activity.
Off          On/blinking  Off          Online, 2 Gbps link/ I/O activity.
On/blinking  Off          Off          Online, 4 Gbps link/ I/O activity.

Location and meaning of dual-port, 8-Gb MetroCluster adapter LEDs


You can check the LEDs on the adapter to learn whether power is on, whether there is activity, or whether an error has occurred. The following illustration shows the LEDs on the dual-port, 8-Gb MetroCluster adapter.

1  Amber LED
2  Green LED
3  Yellow LED
4  Port a
5  Port b
6  Transmitter port
7  Receiver port

The following table describes what the LEDs mean.

Yellow       Green        Amber        Description
Off          Off          Off          Power off
On           On           On           Power on, before firmware initialization
Blinking     Blinking     Blinking     Power on, after firmware initialization
Yellow, green, and amber LEDs blinking alternately   Firmware error
Off          Off          On/blinking  Online, 2 Gbps link/ I/O activity
Off          On/blinking  Off          Online, 4 Gbps link/ I/O activity
On/blinking  On           Off          Online, 8 Gbps link/ I/O activity

GbE NIC LEDs


Gigabit Ethernet NICs have LEDs that you can check to learn the status of the Ethernet connection and, in some cases, transfer speeds. The GbE NICs in your system might be fiber optic-based or copper-based. They might have one, two, or four ports.

Location and meaning of single-port GbE NIC LEDs


You can check the LEDs on your single-port copper or fiber GbE NIC to learn whether there is a network connection and whether there is data activity. On copper GbE NICs, you also can learn how fast data is being transmitted. The following illustration shows the location of LEDs on copper and fiber single-port GbE NICs.

1  Copper 10Base-T/100Base-TX/1000Base-T NIC
2  Fiber 1000Base-SX NIC

The following table explains what the LEDs on single-port copper GbE NICs mean.

LED type                 Status indicator                  Description
ACT/LNK                  Green                             A valid network connection is established.
                         Blinking green or blinking amber  There is data activity.
                         Off                               There is no network connection.
10=OFF 100=GRN 1000=YLW  Off                               Data transmits at 10 Mbps.
                         Green                             Data transmits at 100 Mbps.
                         Yellow                            Data transmits at 1000 Mbps.

The following table explains what the LEDs on single-port fiber GbE NICs mean.

LED type  Status indicator  Description
LNK       On                A valid network connection is established.
          Off               There is no network connection.
ACT       On                There is data activity.
          Off               There is no network activity present.

Location and meaning of single-port, 10-GbE NIC LEDs (FAS2050 systems only)
You can check the LEDs on your single-port, 10-GbE NIC to learn whether there is a network connection and whether there is data activity. This NIC is used only in FAS2050 systems. The following illustration shows the location of LEDs on the single-port, 10-GbE NIC.

1  LINK/ACT LED
2  Port A

The following table explains what the LEDs on the single-port, 10-GbE NIC mean.

LED label  Status indicator  Description
LINK/ACT   Green             A valid network connection is established.
           Blinking amber    There is data activity.
           Off               There is no network connection present.

Location and meaning of LEDs on the dual-port 10-GbE NIC that supports fiber optic cables with SFP+ modules or copper SFP+ cables
You can check the LEDs on your dual-port 10-GbE NIC that supports fiber optic cables and SFP+ optical modules or copper SFP+ cables to learn whether there is a network connection and whether there is data activity. The following illustration shows the location of LEDs and ports on the NIC.

1  LINK/ACT LED for Port A
2  LINK/ACT LED for Port B
3  Port A
4  Port B
5  SFP module latches

The following table explains what the LEDs on the NIC mean.

LED label  Status indicator  Description
LINK/ACT   Green             A valid network connection is established.
           Blinking amber    There is data activity.
           Off               There is no network connection present.

Location and meaning of LEDs on the dual-port 10-GbE NIC that supports fiber optic cables with X6569 SFP+ modules or copper SFP+ cables
You can check the LEDs on your dual-port 10-GbE NIC that supports fiber optic cables and X6569 SFP+ optical modules or copper SFP+ cables to learn whether there is a network connection, whether there is data activity, and whether the card is operating at 10-Gb speed. The following illustration shows the location of LEDs and ports on the NIC.

Faceplate labels in the illustration: GRN=10G, ACT/LNK A.

1  Port A 10-Gb link LED
2  Port A ACT/Link LED
3  Port A with SFP+ installed
4  Port B with no SFP+ connector
5  Port B 10-Gb link LED
6  Port B ACT/Link LED

The following table explains what the LEDs on the card mean.

LED label  Status indicator  Description
GRN=10G    Green             The NIC is operating at 10 Gb speed.
LINK/ACT   Green             A valid network connection is established.
           Blinking amber    There is data activity.
           Off               There is no network connection present.

Location and meaning of multiport GbE NIC LEDs


You can check the LEDs on your multiport copper or fiber GbE NIC to learn whether there is a network connection and whether there is data activity. On copper GbE NICs, you also can learn how fast data is being transmitted. The following illustration shows the location of LEDs on copper and fiber dual-port GbE NICs.

1  Copper 10Base-T/100Base-TX/1000Base-T NIC
2  Fiber 1000Base-SX NIC
3  Network speed LEDs

The following illustration shows the location of LEDs on copper quad-port GbE NICs.

Note: The orientation of the ports on NICs might differ.

1  ACT LED
2  LNK LED
3  Port a
4  Port b
5  Port c
6  Port d

The following table explains what the LEDs on a copper multiport GbE NIC mean.

LED type  Status indicator                  Description
ACT       Green                             A valid network connection is established.
          Blinking green or blinking amber  There is data activity.
          Off                               There is no network connection.
LNK       Off                               Data transmits at 10 Mbps.
          Green                             Data transmits at 100 Mbps.
          Amber                             Data transmits at 1000 Mbps.

The following table explains what the LEDs on the fiber multiport GbE NICs mean.

LED type  Status indicator  Description
LNK       On                A valid network connection is established.
          Off               There is no network connection.
ACT       On                There is data activity.
          Off               There is no network activity present.

TOE NIC LEDs


TOE NICs have LEDs that you can check to learn the state of the network connection. TOE NICs might have one port or multiple ports.

Location and meaning of single-port TOE NIC LEDs


The single-port TCP offload engine is a 10GBase-SR fiber optic NIC. You can check the NIC LEDs to learn whether it is on, whether there is a network connection, or whether the operating system has booted. The following illustration shows the location of LEDs on the NIC.

1  Fiber optic LC port
2  LINK LED
3  ACT LED
4  STAT (power) LED

The following table explains what the LEDs mean.

LED type  Status indicator  Description
ACT/LNK   Green             A valid network connection is established.
          Blinking green    There is data activity.
          Off               There is no network connection.
STAT      Red               The NIC is receiving power and is on.
          Off               The operating system has booted.

Location and meaning of dual-port, 10GBase-SR TOE NIC LEDs


You can check the LEDs on the TOE NIC to learn whether there is a network connection or data activity. The following illustration shows the location of LEDs on the TOE NIC.

1  LINK/ACT LED, port A
2  LINK/ACT LED, port B
3  Fiber optic LC, port A
4  Fiber optic LC, port B

The following table explains what the LEDs on the TOE NIC mean.

LED label  Status indicator  Description
LINK/ACT   Green             A valid network connection is established.
           Green             There is data activity.
           Off               There is no network connection present.

Location and meaning of dual-port, 10GBase-CX4 TOE NIC LEDs


You can check the LEDs on the TOE NIC to learn whether there is a network connection or data activity.
Note: The 10GBase-CX4 dual-port TOE NIC is for use only on systems running Data ONTAP 10.0.3 or later.

The following illustration shows the location of LEDs on the TOE NIC.

1  LINK/ACT LED A
2  Port A
3  LINK/ACT LED B
4  Port B

The following table explains what the LEDs on the TOE NIC mean.

LED type  Status indicator  Description
LINK/ACT  Green             A valid network connection is established.
          Blinking green    There is data activity.
          Off               There is no network connection present.

Location and meaning of quad-port TOE NIC LEDs


You can check the LEDs on the TOE NIC to learn whether there is data activity and the speed of data transmission. The following illustration shows the location of LEDs on the TOE NIC.

1   Activity LEDs: LED 1 corresponds to port a, LED 2 corresponds to port b, and so on.
2   Port a
3   Port b
4   Port c
5   Port d
6   Activity LEDs: LED 1 corresponds to port a, LED 2 corresponds to port b, and so on.
7   Port d
8   Port c
9   Port b
10  Port a

The following table explains what the LEDs on the TOE NIC mean.

LED label               Status indicator  Description
Labeled by port number  Yellow            Data transmits at 1 Gbps.
                        Green             Data transmits at 10/100 Mbps.
                        Blinking          There is data activity.

NVRAM adapter LEDs


NVRAM adapter LEDs enable you to determine whether NVRAM is holding unwritten data and, in HA pairs, to check the connection between the two nodes. NVRAM preserves unwritten data if your system loses power. NVRAM also is the HA interconnect when your system is in an HA pair, except when you use MetroCluster.

Different systems have different kinds of NVRAM adapters. NVRAM5, NVRAM6, and NVRAM8 adapters plug into the motherboard. NVRAM7 is integrated into the motherboard. The following table shows the type of NVRAM that different systems support.

NVRAM type  Systems
NVRAM5      3020 and 3050
NVRAM6      3040, 3070, and SA300
            60xx and SA600
NVRAM7      31xx
NVRAM8      62xx

Location and meaning of NVRAM5 and NVRAM6 LEDs


You can check the LEDs to learn whether there is valid data in NVRAM when your system loses power. When you use the NVRAM adapter as an HA interconnect, you also can check the LEDs to learn whether there is a connection between the nodes. Two sets of LEDs by each port on the faceplate operate when you use the NVRAM5 or NVRAM6 adapter as an HA interconnect. NVRAM adapters also have an internal LED that you can see through the faceplate. The following illustration shows LEDs on the NVRAM5 and NVRAM6 adapter.

L01 PH1

L02 PH2

NVRAM5

The following table explains what the LEDs on an NVRAM5 or NVRAM6 adapter mean.

LED type  Indicator  Status    Description
Internal  Red        Blinking  There is valid data in NVRAM.
                               Note: The LED might blink red if your system did not shut down properly, as in the case of a power failure or panic. The data is replayed when the system boots again.
PH1       Green      On        The physical connection is working.
                     Off       No physical connection exists.
LO1       Yellow     On        The logical connection is working.
                     Off       No logical connection exists.

Location and meaning of NVRAM7 LEDs


You can check the LEDs to learn if there is any unwritten data in NVRAM if your controller loses power.

Each 31xx controller has two NVRAM7 LEDs:
- One is near the left front corner of the motherboard next to the NVRAM DIMM. The LED is labeled "D35" and "NVRAM Data Valid When Lit." You can see the LED only after you remove the controller from the chassis.
- One is near the right rear corner of the motherboard. It is labeled "D87." You can see the LED through the rear grille of the controller, as shown in the following illustration.

NVRAM7 LED

NVRAM7 LEDs flash red if unwritten data is being held in NVRAM when power to the controller is turned off. If you remove the NVRAM7 battery or NVRAM7 DIMM when the red LEDs are flashing, you lose data that is being held in NVRAM.
Note: In an HA pair, each node continually monitors its partner and mirrors its partner's NVRAM data. Therefore, if you remove a controller from a 31xx system in an HA pair without first shutting it down, you can disregard the illuminated NVRAM LEDs on the motherboard of the removed controller.

Location and meaning of NVRAM5 and NVRAM6 media converter LEDs


You can check the LED to learn whether the media converter has power, whether a link is present, and whether the converter is operating normally. The following illustration shows the location of the LED on NVRAM5 and NVRAM6 media converters.

1  LED
2  Media converter

The following table explains what the LED on NVRAM5 and NVRAM6 media converters means.

Indicator    Status             Description
Green        On                 Normal operation
Green/amber  On                 Power is present but link is down.
Green        Flickering or off  Power is present but link is down.

Location and meaning of NVRAM8 LEDs


You can check the LEDs on the NVRAM8 adapter to check the connection between controllers in an HA pair and to learn the status of data when the system loses power. Five LEDs are on the faceplate, and one LED on the adapter board is visible through the faceplate grille. The following illustration shows the LEDs on the NVRAM8 adapter.

1  InfiniBand port 0 link LED
2  InfiniBand port 0 activity LED
3  InfiniBand port 0 connector
4  Internal link select LED
5  InfiniBand port 1 link LED
6  InfiniBand port 1 activity LED
7  InfiniBand port 1 connector

Port 0 link and activity LEDs are relevant when port 0 of the controller is connected to a partner in an HA pair. The following table explains the meaning of the port 0 LEDs.

LED name         Status indicator  Description
Port 0 link      Green             A physical connection is working on the port 0 connector.
                 Off               A physical connection is not working on the port 0 connector.
Port 0 activity  Amber             A logical connection is working on the port 0 connector.
                 Off               A logical connection is not working on the port 0 connector.

Port 1 LEDs reflect the state of the port 1 connector used between two controllers installed in different chassis or the state of the internal InfiniBand connection used between two controllers installed in the same chassis. The following table explains the meaning of the port 1 LEDs.

Port 1 link, Green
  Internal link select LED On (internal midplane connection): An internal physical connection is working over the midplane.
  Internal link select LED Off (external cable connection): An external physical connection is working on the port 1 connector.
Port 1 link, Off
  Internal link select LED On (internal midplane connection): An internal physical connection is not working over the midplane.
  Internal link select LED Off (external cable connection): An external physical connection is not working on the port 1 connector.
Port 1 activity, Amber
  Internal link select LED On (internal midplane connection): An internal logical connection is working over the midplane.
  Internal link select LED Off (external cable connection): An external logical connection is working on the port 1 connector.
Port 1 activity, Off
  Internal link select LED On (internal midplane connection): An internal logical connection is not working over the midplane.
  Internal link select LED Off (external cable connection): An external logical connection is not working on the port 1 connector.

Port 1 LEDs depend on the state of the Internal link select LED, which in HA pair configurations depends on how the controllers are connected. The following table explains the meaning of the internal link select LED.

LED name              Status indicator  Description
Internal link select  Green             The HA pair consists of two controllers in the same chassis connected over the internal midplane.
                      Off               The HA pair consists of two controllers in different chassis connected by an external cable.
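Because the meaning of the port 1 LEDs depends on the internal link select LED, it can help to read the three LEDs together. The following Python sketch composes the two sentences that the port 1 table gives for one observation. The function name and the boolean parameters are assumptions chosen for illustration; they do not correspond to any NetApp command or API.

```python
# Illustrative sketch: the function and parameter names are assumptions.
from typing import List


def port1_status(internal_link_select_on: bool,
                 link_green: bool,
                 activity_amber: bool) -> List[str]:
    """Return the physical and logical connection state for NVRAM8 port 1.

    internal_link_select_on: True if the internal link select LED is green.
    link_green: True if the port 1 link LED is green, False if it is off.
    activity_amber: True if the port 1 activity LED is amber, False if it is off.
    """
    if internal_link_select_on:
        # Two controllers in the same chassis, connected over the midplane.
        path, where = "internal", "over the midplane"
    else:
        # Two controllers in different chassis, connected by an external cable.
        path, where = "external", "on the port 1 connector"
    physical = "is working" if link_green else "is not working"
    logical = "is working" if activity_amber else "is not working"
    return [
        f"An {path} physical connection {physical} {where}.",
        f"An {path} logical connection {logical} {where}.",
    ]


if __name__ == "__main__":
    for sentence in port1_status(False, True, False):
        print(sentence)
```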

A destage status LED, on the top of the adapter board, is visible through the grille of the faceplate halfway between the top of the faceplate and the InfiniBand port 0 LEDs. The LED shows the status of NVRAM8 data after an unexpected loss of system power.

Data may need to be destaged, or saved from active DRAM to nonvolatile flash memory, after an unexpected power loss. Destaging lasts about one minute. Once data has been destaged, it must be restaged, or restored from nonvolatile flash memory to active DRAM, during system initialization.

The destage LED may be lit as red or green. Its behavior depends on whether the system power is on or off. When the system power is off, the LED behavior depends on whether the NVRAM8 adapter is running on battery power. The battery automatically turns off after data is destaged.

The following table explains the meaning of the destage status LED when the NVRAM8 adapter is in the controller.

Red
  System power on: The NVRAM8 adapter has destage data that needs to be restored.
  System power off, battery power on: Invalid
  System power off, battery power off: N/A
Green
  System power on: The NVRAM8 adapter has restored data and is ready for the next destage.
  System power off, battery power on: Invalid
  System power off, battery power off: N/A
Alternating red and green
  System power on: Invalid
  System power off, battery power on: The NVRAM8 adapter is destaging data.
  System power off, battery power off: N/A
Off
  System power on: Invalid
  System power off, battery power on: Invalid
  System power off, battery power off: The NVRAM8 adapter has finished destaging data.
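The destage status table has three conditions per LED state (system power on, system power off with battery power on, and system power off with battery power off). The following Python sketch shows one way to encode it. The names `DESTAGE_LED_MEANING` and `destage_meaning` are assumptions for illustration, and the entries follow the table as reconstructed from this guide.

```python
# Illustrative sketch: names are assumptions, not a NetApp API.
# Each tuple holds the meaning for:
#   (system power on, system power off + battery on, system power off + battery off)
DESTAGE_LED_MEANING = {
    "red": (
        "The NVRAM8 adapter has destage data that needs to be restored.",
        "Invalid",
        "N/A",
    ),
    "green": (
        "The NVRAM8 adapter has restored data and is ready for the next destage.",
        "Invalid",
        "N/A",
    ),
    "alternating red and green": (
        "Invalid",
        "The NVRAM8 adapter is destaging data.",
        "N/A",
    ),
    "off": (
        "Invalid",
        "Invalid",
        "The NVRAM8 adapter has finished destaging data.",
    ),
}


def destage_meaning(led: str, system_power_on: bool, battery_power_on: bool = False) -> str:
    """Look up the destage LED meaning; battery state matters only when system power is off."""
    if system_power_on:
        column = 0
    elif battery_power_on:
        column = 1
    else:
        column = 2
    return DESTAGE_LED_MEANING[led.strip().lower()][column]


if __name__ == "__main__":
    print(destage_meaning("Alternating red and green", system_power_on=False, battery_power_on=True))
```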

You can use the destage status LED when the adapter is removed from the system to determine whether destage data is in the NVRAM8 adapter. The following illustration shows the location of the destage status LED.

1  Destage status LED
2  InfiniBand port 0 LEDs

You activate the destage status LED when the NVRAM8 adapter is removed from the controller by pressing and holding the button marked SW6 and STATUS on the bottom of the adapter board. The following illustration shows the location of the button.

Illustration label: Button for activating destage status LED (marked STATUS and SW6).

The LED consists of a red LED and a green LED that might turn on separately or together, creating a light that appears amber. The following table explains the meaning of the destage status LED when the button is pressed.

LED color   Description
None (Off)  No status; no battery power.
Amber       Miscellaneous status for debugging.
Green       No data in flash memory; not destaged.
Red         Data in flash memory; destaged.

Flash Cache module and PAM LEDs


Flash Cache modules and Performance Acceleration Modules (PAMs) have LEDs that you can check to ensure that the card has power or to learn about its performance. Flash Cache modules are available in capacities of 256 GB, 512 GB, and 1 TB. PAMs have a capacity of 16 GB. This document uses the term Flash Cache module to refer to caching modules with capacities greater than 16 GB. Before the release of Data ONTAP 7.3.5, such adapters were called Performance Acceleration Modules (PAM II). The name of the 16-GB caching module remains Performance Acceleration Module (PAM I).

Location and meaning of PAM LEDs


The PAM has two LEDs, both visible through the perforations of the PCIe bracket. You can check the LEDs to ensure that the module is in place and has power. The position of the LEDs relative to the system depends on the model of the system it is installed in. Different systems can have horizontal or vertical expansion slots.

The following table describes the behavior of the module LEDs.

LED            Description
Green          Power ready indicator. Replace the card if the LED is off.
Blinking blue  Indicates the presence of the card. The LED dims slightly on heavy loads. Replace the card if it does not blink after you boot Data ONTAP.

Location and meaning of Flash Cache module LEDs


Each Flash Cache module has two LEDs, which you can check to see if the module is operating properly and to view its performance. The illustration shows the LEDs on a module.

1  Fault
2  Activity

The following table explains what the LEDs on the module mean.

LED type  Status indicator  Description
Fault     Solid amber       A fault has occurred.
Activity  Blinking green    There is activity on the card. The LED blinks once every two seconds when the card is idle and increases the blink rate as its performance increases up to 10 times per second.

Startup messages
When you apply power to your system, it verifies the hardware that is in the system, loads the operating system, and displays startup informational and error messages on the system console. There are two types of startup error messages:
- POST error messages
- Boot error messages

Both error message types are displayed on the system console, and an e-mail notification is sent out by the remote management subsystem, if it is configured to do so.

POST messages
POST is a series of tests run from the motherboard PROM. These tests check the hardware on the motherboard and differ depending on your system configuration. POST messages appear on the system console before Data ONTAP software is loaded. The following text is an example of a POST message on the console on a system that uses the LOADER boot environment. Systems using the CFE boot environment display similar messages.
Phoenix TrustedCore(tm) Server
Copyright 1985-2005 Phoenix Technologies Ltd. All Rights Reserved
Portions Copyright (c) 2005-2009 NetApp All Rights Reserved
BIOS Version: 1.7X9
CPU= Dual Core AMD Opteron(tm) Processor 885 X 4
Testing RAM.
512MB RAM tested
32768MB RAM installed
Fixed Disk 0: NACF1GBJU-A11
Boot Loader version 1.6.1X2
Copyright (C) 2000-2003 Broadcom Corporation.
Portions Copyright (C) 2002-2009 NetApp
CPU Type: Dual Core AMD Opteron(tm) Processor 885
Starting AUTOBOOT press Ctrl-C to abort...

Note: If your system has an LCD, it displays POST messages without a header.


Boot messages
After the boot is successfully completed, your system loads the operating system. Messages provide information about your system and alert you to errors that occur during boot.
Note: The exact boot messages that appear on your system console depend on your system configuration.

The following message is an example of the start of a boot message that appears on the system console of a FAS6030 storage system at first boot.

NetApp Release 7.3.1X19: Sat Nov 22 02:04:05 PST 2008
Copyright (C) 1992-2008 NetApp.
Starting boot on Wed Mar 25 00:51:31 GMT 2009
Wed Mar 25 00:52:13 GMT [diskown.isEnabled:info]: Software ownership has been enabled
...
Wed Mar 25 00:51:17 GMT [fmmb.current.lock.disk:info]: Disk 0b17 is a local HA mailbox disk
Wed Mar 25 00:51:17 GMT [fmmb.current.lock.disk:info]: Disk 0b16 is a local HA mailbox disk
...
Wed Mar 25 00:51:17 GMT [cf.fm.partner:info]: Cluster monitor: partner 'node2'
...
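The timestamped lines in the boot message example share a recognizable shape: a timestamp, a bracketed event name and severity, and free text. As a minimal sketch (not a NetApp tool), the following Python snippet splits one such line into its parts. The regular expression and the names BootMessage and parse_boot_message are illustrative assumptions inferred only from the sample lines shown here; real console output might differ.

```python
import re
from typing import NamedTuple, Optional


class BootMessage(NamedTuple):
    timestamp: str
    event: str
    severity: str
    text: str


# Pattern inferred from the sample lines only, for example:
# "Wed Mar 25 00:51:17 GMT [cf.fm.partner:info]: Cluster monitor: ..."
_LINE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+) "
    r"\[(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: (?P<text>.*)$"
)


def parse_boot_message(line: str) -> Optional[BootMessage]:
    """Return the parsed fields, or None for lines that are not event messages."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return BootMessage(**match.groupdict())
```

Banner lines such as the release and copyright lines do not match the pattern, so the function returns None for them and a caller can skip them.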

FAS20xx and SA200 startup progress


FAS20xx and SA200 systems do not display POST error messages on the system console. You can track BIOS and boot loader progress by watching a progress indicator on the system console and by monitoring a sensor through the BMC.

Method of viewing progress on the console


You can view BIOS and boot loader progress by monitoring the progress indicator on your system console. The initial BIOS message appears on the console about five seconds after the system starts. After that, and before the boot loader runs, continued POST progress is indicated by a line of dots (.) or plus signs (+). These dots or plus signs follow the line showing the BIOS version, as shown in the console output below:
AMI BIOS8 Modular BIOS
Copyright (C) 1985-2006, American Megatrends, Inc. All Rights Reserved
Portions Copyright (C) 2006 Network Appliance, Inc. All Rights Reserved
BIOS Version 3.0
...................
Boot Loader version 1.3
Copyright (C) 2000,2001,2002,2003 Broadcom Corporation.
Portions Copyright (C) 2002-2005 Network Appliance Inc.
CPU Type: Mobile Intel(R) Celeron(R) CPU 2.20GHz
Starting AUTOBOOT press Ctrl-C to abort...

The dots or plus signs are a progress indicator to show that the BIOS is not hung. If the system restarts after a fault, the dots are replaced by plus signs to indicate that the system NVMEM is armed, or being protected, during the boot process. The BIOS should begin loading Data ONTAP within about 25 seconds after the initial greeting.
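If you capture console output to a log, the progress line can be classified with a few lines of code. The following Python sketch is only an illustration of the rule described above (dots for a normal start, plus signs when NVMEM is armed after a fault). The function name classify_post_progress and the returned strings are assumptions, not part of any NetApp software.

```python
def classify_post_progress(line: str) -> str:
    """Classify a console progress line made up of dots or plus signs."""
    marks = line.strip()
    if not marks:
        return "no progress shown"
    if set(marks) == {"."}:
        # Dots: POST is progressing and the BIOS is not hung.
        return "POST in progress"
    if set(marks) == {"+"}:
        # Plus signs: restart after a fault, NVMEM is armed (protected).
        return "POST in progress, NVMEM armed"
    return "not a progress line"
```

A monitoring script could call this function on the line that follows the BIOS version line and treat a missing progress line as a reason to check the BIOS Status sensor.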

Method of viewing progress through the BIOS Status sensor


The BMC monitors boot progress; you can determine the boot progress status through the BIOS Status sensor by entering the sensors show BMC command. The following text shows partial output of the BMC sensors show command:
bmc shell -> sensors show
name State ID Reading Crit-Low Warn-Low Warn-Hi Crit-Hi
------------------------------------------------------------------------------------
1.1V Normal #77 1121 mV 95 mV --1239 mV
1.2V Normal #76 1239 mV 1038 mV --1357 mV
1.5V Normal #75 1522 mV 1309 mV --1699 mV
1.8V Normal #74 1829 mV 1569 mV --2029 mV
12.0V Normal #70 12080 mV 10160 mV --13840 mV
2.5V Normal #73 2520 mV 2116 mV --2870 mV
3.3V Normal #72 3374 mV 2808 mV --3799 mV
BIOS Status Normal #f0 Loader #20 ----
Batt 8.0V Normal #50 7552 mV --8512 mV 8576 mV
Batt Amp Normal #59 0 mA --2112 mA 2208 mA

In the sensors show output, the BIOS Status sensor displays one of three states: Normal, Hung, or Error. In the Reading column, the sensor displays BIOS and boot loader progress. In the example output, the BIOS Status sensor displays a state of Normal and a reading of Loader #20, indicating that the boot loader is running normally. The following table lists the BIOS and boot loader progress values.

Status  Description
0x00    System software has cleanly shut down. (Sent only by Data ONTAP.)
0x01    Memory initialization is in progress.
0x02    NVMEM initialization is in progress (when NVMEM is armed).
0x05    User has entered setup.
0x13    Booting to Data ONTAP (or boot loader).
0x1F    BIOS is starting up. (Special message to the BMC.) This is the first BIOS status message. It might be quickly followed by another.
0x20    Boot loader is running.
0x21    Boot loader is programming the primary firmware hub. The BMC does not allow the system to be powered down at this time.
0x22    Boot loader is programming the alternate firmware hub. The BMC does not allow the system to be powered down at this time.
0x2F    Boot loader has transferred control to Data ONTAP. Data ONTAP might send this periodically to inform the BMC that Data ONTAP is running, if the BMC has rebooted.
0x60    BMC has shut power off.
0x61    BMC has turned power on.
0x62    BMC has reset the system.
0x63    BMC Watchdog power cycle.
0x64    BMC Watchdog cold reset.

The BIOS Status sensor also displays BIOS and boot loader error codes. If the BIOS status sensor displays a Hung or Error state, contact technical support for interpretation of the codes.
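The progress values above lend themselves to a simple lookup. The following Python sketch shows one way to pull the BIOS Status line out of captured sensors show output and translate the reading. It assumes that the hexadecimal number after "#" in the Reading column (for example, #20) corresponds to the status value in the table (0x20), which matches the example but is not stated as a general rule. The function name and the regular expression are hypothetical.

```python
import re
from typing import Optional, Tuple

# Progress values copied from the table in this section (descriptions shortened).
BIOS_STATUS = {
    0x00: "System software has cleanly shut down.",
    0x01: "Memory initialization is in progress.",
    0x02: "NVMEM initialization is in progress (when NVMEM is armed).",
    0x05: "User has entered setup.",
    0x13: "Booting to Data ONTAP (or boot loader).",
    0x1F: "BIOS is starting up.",
    0x20: "Boot loader is running.",
    0x21: "Boot loader is programming the primary firmware hub.",
    0x22: "Boot loader is programming the alternate firmware hub.",
    0x2F: "Boot loader has transferred control to Data ONTAP.",
    0x60: "BMC has shut power off.",
    0x61: "BMC has turned power on.",
    0x62: "BMC has reset the system.",
    0x63: "BMC Watchdog power cycle.",
    0x64: "BMC Watchdog cold reset.",
}

# Assumed line layout: "BIOS Status <State> #<ID> <Label> #<hex code>".
_BIOS_LINE = re.compile(r"BIOS Status\s+(\w+)\s+#\w+\s+\w+\s+#([0-9A-Fa-f]{2})")


def decode_bios_status(text: str) -> Optional[Tuple[str, int, str]]:
    """Return (state, code, description), or None if no BIOS Status line is found."""
    match = _BIOS_LINE.search(text)
    if match is None:
        return None
    state = match.group(1)
    code = int(match.group(2), 16)
    if state != "Normal":
        # Hung or Error: the reading is an error code, not a progress value.
        return state, code, "Contact technical support to interpret this code."
    return state, code, BIOS_STATUS.get(code, "Value not listed in the table.")
```

The Hung and Error states are deliberately not decoded, because this guide directs you to technical support for those codes.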

3020 and 3050 system and C2300 and C3300 NetCache appliance POST error messages
POST error messages might appear on the system console if your system encounters errors while the CFE initializes the hardware.

Abort Autoboot - POST Failure(s): CPU

Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Abort Autoboot - POST Failure(s): CPU
Description: At least one CPU fails to start up properly.
Corrective action:
1. Power-cycle the system to see whether the problem persists.
2. Replace the motherboard tray if the problem persists.


Abort Autoboot - POST Failure(s): MEMORY

Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for that error message.

Message: Abort Autoboot - POST Failure(s): MEMORY
Description: The memory test failed.
Corrective action:
1. Make sure that DIMMs are seated properly, then power-cycle your system.
2. Replace the DIMM if the problem persists.
Note: There is an LED next to each DIMM on the motherboard. When a DIMM fails, the LED lights help you find the failed DIMM.

Abort Autoboot - POST Failure(s): RTC, RTC_IO

Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Abort Autoboot - POST Failure(s): RTC, RTC_IO
Description: The Common Firmware Environment (CFE) cannot read the real-time clock (RTC_IO) or the RTC date is invalid (RTC).
Corrective action:
1. Use the set date and set time commands to set the date and time.
2. Make sure that the RTC battery is still good.

Abort Autoboot - POST Failure(s): UCODE

Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Abort Autoboot - POST Failure(s): UCODE
Description: At least one CPU fails to load the microcode.
Corrective action:
1. Power-cycle your system to see whether the problem persists.
2. Replace the motherboard tray if the problem persists.

Autoboot of backup image aborted


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Autoboot of backup image aborted
Description: Autoboot is stopped due to a key being pressed during the autoboot process.
Corrective action: Power-cycle the system and avoid pressing any keys during the autoboot process.

Autoboot of backup image failed


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Autoboot of backup image failed
Description: The kernel could not be found on the CompactFlash card.
Corrective action:
1. Check the CompactFlash card connection.
2. Make sure that the CompactFlash card content is valid; if it is not, replace the CompactFlash card.
3. Follow the netboot procedure in your CompactFlash card documentation to download a new kernel.

Autoboot of primary image aborted


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Autoboot of primary image aborted
Description: Autoboot is stopped due to a key being pressed during the autoboot process.
Corrective action: Power-cycle the system and avoid pressing any keys during the autoboot process.

Autoboot of primary image failed


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Autoboot of primary image failed
Description: The kernel could not be found on the CompactFlash card.
Corrective action:
1. Check the CompactFlash card connection.
2. Make sure that the CompactFlash card content is valid; if it is not, replace the CompactFlash card.
3. Follow the netboot procedure in your CompactFlash card documentation to download a new kernel.

Invalid FRU EEPROM Checksum


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Invalid FRU EEPROM Checksum
Description: The system backplane or motherboard Electrically Erasable Programmable Read-Only Memory (EEPROM) is corrupted.
Corrective action: Call technical support.

Memory init failure


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for that error message.

Message: Memory init failure: Data segment does not compare at XXXX
(XXXX denotes memory address.)
Description: The Common Firmware Environment (CFE) failed to initialize the system memory properly.
Corrective action:
1. Make sure that the DIMM is supported.
2. Make sure that the DIMM is seated properly.
3. Replace the DIMM if the problem persists.
Note: There is an LED next to each DIMM on the motherboard. When a DIMM fails, the LED lights help you find the failed DIMM.

No Memory found
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: No Memory found
Description: The Common Firmware Environment (CFE) cannot detect the system DIMMs.
Corrective action:
1. Make sure that the DIMM is seated properly and power-cycle your system.
2. Replace the DIMM if the problem persists.


Note: There is an LED next to each DIMM on the motherboard. When a DIMM fails, the LED lights help you find the failed DIMM.

Unsupported system bus speed


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Unsupported system bus speed 0xXXXX defaulting to 1000Mhz
Description: The Common Firmware Environment (CFE) detects an unsupported DIMM.
Corrective action:
1. Make sure that the DIMM is seated properly.
2. Replace the DIMM if the problem persists.
Note: There is an LED next to each DIMM on the motherboard. When a DIMM fails, the LED lights help you find the failed DIMM.

3040, 3070, 31xx, 60xx, SA300, and SA600 system POST error messages
POST error messages might appear on the system console if your system encounters errors while the BIOS and boot loader initialize the hardware.

0200: Failure Fixed Disk


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 0200: Failure Fixed Disk
Description: A disk error occurred.
Corrective action: Complete the following steps to see if the CompactFlash card is bad.
1. Enter the following command at the boot environment prompt:
boot_diags
2. Select the cf-card test.
3. If the test shows that the CompactFlash card is bad, replace it. If the CompactFlash card is good, replace the motherboard.


0230: System RAM Failed at offset:


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 0230: System RAM Failed at offset
Description: The BIOS cannot initialize the system memory or a DIMM has failed.
Corrective action: Check the DIMMs and replace any bad ones by completing the following steps:
1. Make sure that each DIMM is seated properly, then power-cycle the system.
2. If the problem persists, run the diagnostics to determine which DIMMs failed. Enter the following command at the boot environment prompt:
boot_diags
3. Select the following test: mem.
4. Replace the failed DIMMs.

0231: Shadow RAM failed at offset


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 0231: Shadow RAM failed at offset
Description: The BIOS cannot initialize the system memory or a DIMM has failed.
Corrective action: Check the DIMMs and replace any bad ones by completing the following steps:
1. Make sure that each DIMM is seated properly, then power-cycle the system.
2. If the problem persists, run the diagnostics to determine which DIMMs failed. Enter the following command at the boot environment prompt:
boot_diags
3. Select the following test: mem.
4. Replace the failed DIMMs.

0232: Extended RAM failed at address line


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 0232: Extended RAM failed at address line
Description: The BIOS cannot initialize the system memory or a DIMM has failed.
Corrective action: Check the DIMMs and replace any bad ones by completing the following steps:
1. Make sure that each DIMM is seated properly, then power-cycle the system.
2. If the problem persists, run the diagnostics to determine which DIMMs failed. Enter the following command at the boot loader prompt:
boot_diags
3. Select the following test: mem.
4. Replace the failed DIMMs.

0235: Multiple-bit ECC error occurred


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 0235: Multiple-bit ECC error occurred
Description: The BIOS cannot initialize the system memory or a DIMM has failed.
Corrective action: Check the DIMMs and replace any bad ones by completing the following steps:
1. Make sure that each DIMM is seated properly, then power-cycle the system.
2. If the problem persists, run the diagnostics to determine which DIMMs failed. Enter the following command at the boot loader prompt:
boot_diags
3. Select the following test: mem.
4. Replace the failed DIMMs.

023C: Bad DIMM found in slot #


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 023C: Bad DIMM found in slot #
Description: The BIOS cannot initialize the system memory or a DIMM has failed.
Corrective action: Check the DIMMs and replace any bad ones by completing the following steps:
1. Make sure that each DIMM is seated properly, then power-cycle the system.
2. If the problem persists, run the diagnostics to determine which DIMMs failed. Enter the following command at the boot loader prompt:
boot_diags
3. Select the following test: mem.
4. Replace the failed DIMMs.

023E: Node Memory Interleaving disabled


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 023E: Node Memory Interleaving disabled
Description: A bad DIMM was detected, which causes BIOS to disable Node Interleaving.
Corrective action: Check the DIMMs and replace any bad ones by completing the following steps:
1. Make sure that each DIMM is seated properly, then power-cycle the system.
2. If the problem persists, run the diagnostics to determine which DIMMs failed. Enter the following command at the boot loader prompt:
boot_diags
3. Select the following test: mem.
4. Replace the failed DIMMs.

0241: Agent Read Timeout


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 0241: Agent Read Timeout
Description: Timeout occurs when BIOS tries to read or write information through System Management Bus (SMBUS) or Inter-Integrated Circuit (I2C).
Corrective action: Run the Agent diagnostic test.
1. Enter the following command at the boot loader prompt:
boot_diags
2. Select and run the following tests: agent, 2, and 6.
3. Select and run the following tests: mb, 2, and 8.


0242: Invalid FRU information


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 0242: Invalid FRU information
Description: The information from the field-replaceable unit (FRU) Electrically Erasable Programmable Read-Only Memory (EEPROM) is invalid.
Corrective action:
1. Enter the following command at the boot environment prompt:
boot_diags
2. To determine the FRU involved, select the following tests: mb and 74.
3. Check whether the FRU's model name, serial number, part number, and revision are correct in one of the following ways:
• Visually inspect the FRU.
• Look for error messages indicating that the FRU information is invalid or could not be read.
4. Contact technical support if you suspect a misprogrammed FRU.

0250: System battery is dead


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 0250: System battery is dead - Replace and run SETUP
Description: The real-time clock (RTC) battery is dead.
Corrective action:
1. Reboot the system.
2. If the problem persists, replace the RTC battery.
3. Reset the RTC.

0251: System CMOS checksum bad


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 0251: System CMOS checksum bad - Default configuration used
Description: CMOS checksum is bad, possibly because the system was reset during BIOS boot or because of a dead RTC battery.
Corrective action:
1. Reboot the system.
2. If the problem persists, replace the RTC battery.
3. Reset the RTC.

0253: Clear CMOS jumper detected


Note: This message occurs only on 60xx and SA600 systems.
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 0253: Clear CMOS jumper detected - Please remove for normal operation
Description: The clear CMOS jumper is installed on the main board.
Corrective action: Remove the clear CMOS jumper and reset the system.

0260: System timer error


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 0260: System timer error
Description: The system clock is not ticking.
Corrective action: Replace the HT1000 chip.

0280: Previous boot incomplete


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 0280: Previous boot incomplete - Default configuration used
Description: The previous boot was incomplete, and the default configuration was used.
Corrective action: Reboot the system.

02C2: No valid Boot Loader in System Flash - Non Fatal

Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 02C2: No valid Boot Loader in System Flash - Non Fatal
Description: No valid boot loader is found in system flash memory while the option to Halt For Invalid Boot Loader is disabled in setup. As a result, the system can still boot from CompactFlash if it has a valid boot loader.
Corrective action: Enter the update_flash command two times to place a good boot loader in the system flash.

02C3: No valid Boot Loader in System Flash - Fatal

Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 02C3: No valid Boot Loader in System Flash - Fatal
Description: No valid boot loader is found in system flash memory while the option to Halt For Invalid Boot Loader is enabled in setup. As a result, the system halts. Users should take corrective action.
Corrective action: Place a valid version of the boot loader in the system flash by completing either of the following series of steps:
1. Boot from the backup boot image.
2. Enter the update_flash command.
or
1. Enter BIOS setup and disable boot from system flash.
2. Save the setting.
3. Reboot to the boot environment prompt, and then enter the update_flash command two times.

02F9: FGPA jumper detected

Note: This message occurs only on 60xx and SA600 systems.
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 02F9: FGPA jumper detected - Please remove for normal operation
Description: The Field Programmable Gate Array (FPGA) jumper was installed on the motherboard.
Corrective action:
1. Remove the FPGA jumper.
2. Reboot the system.


02FA: Watchdog Timer Reboot (PciInit)


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 02FA: Watchdog Timer Reboot (PciInit)
Description: The watchdog times out while BIOS is doing PCI initialization.
Corrective action:
1. Power-cycle the system a few times or reset the system through the RLM.
2. If the problem persists, check the PCI interface. At the boot environment prompt, enter the following command:
boot_diags
3. Select and run the following tests: mb, 4, and 71.
4. Replace the motherboard if the diagnostics show a problem.

02FB: Watchdog Timer Reboot (MemTest)


Note: This message appears only on 60xx and SA600 systems.
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 02FB: Watchdog Timer Reboot (MemTest)
Description: The watchdog times out while BIOS is testing the extended memory.
Corrective action:
1. Power-cycle the system a few times or reset the system through the RLM.
2. If the problem persists, check the memory interface. At the boot loader prompt, enter the following command:
boot_diags
3. Select and run the following tests: mem and 1.
4. Replace the DIMMs if the diagnostics show a problem.
5. Replace the motherboard if the problem persists.

02FC: LDTStop Reboot (HTLinkInit)


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 02FC: LDTStop Reboot (HTLinkInit)
Description: The watchdog times out while BIOS is setting up the HT link speed.
Corrective action:
1. Power-cycle the system a few times or reset the system through the Remote LAN Module (RLM).
2. If the problem persists, replace the motherboard.

No message on console
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: No message on console. Problem might be reported in the Remote LAN Module (RLM) system event log with the code 037h or in the SMBIOS system event log (SEL) with the error code 237h.
Description: There is not enough memory to accommodate SMBIOS structure.
Corrective action: Perform one of the following steps:
• Remove some adapters from PCI slots.
• Check the DIMMs and replace any bad ones by completing the following steps:
1. Make sure that each DIMM is seated properly, then power-cycle the system.
2. If the problem persists, run the diagnostics to determine which DIMMs failed. Enter the following command at the boot loader prompt:
boot_diags
3. Select the following test: mem.
4. Replace the failed DIMMs.

2240, 32xx, 62xx, SA320, and SA620 system POST error messages
POST error messages might appear on the system console if your system encounters errors while the BIOS and boot loader initialize the hardware.

0200: Failure Fixed Disk


Message: 0200: Failure Fixed Disk
Description: A disk error occurred.
Corrective action: Replace the USB boot device.
SP error code: 000h

0230: System RAM Failed at offset:


Message: 0230: System RAM Failed at offset:
Description: The BIOS cannot initialize the system memory, or a DIMM has failed.
Corrective action: Check and replace the bad DIMM modules.
SP error code: 030h

0231: Shadow RAM Failed at offset:


Message: 0231: Shadow RAM Failed at offset:
Description: The BIOS cannot initialize the system memory or a DIMM has failed.
Corrective action: Check and replace the bad DIMM modules.
SP error code: 031h

0232: Extended RAM Failed at address line:


Message: 0232: Extended RAM Failed at address line:
Description: The BIOS cannot initialize the system memory, or a DIMM has failed.
Corrective action: Check and replace the bad DIMM modules.
SP error code: 032h

BIOS detected uncorrectable ECC error in DIMM slot:


Message: BIOS detected uncorrectable ECC error in DIMM slot:
Description: BIOS detected an uncorrectable ECC error in the displayed DIMM slot.
Corrective action: Check and replace the bad DIMM modules.
SP error code: 035h

No message on the console


Message: No message on the console.
Description: There is not enough memory to accommodate SMBIOS structure.
Corrective action: Check and replace the bad DIMM modules.
SP error code: 037h

BIOS detected errors or invalid configuration in DIMM slot:


Message: BIOS detected errors or invalid configuration in DIMM slot:
Description: BIOS detected unknown errors in the displayed DIMM.
Corrective action: Check and replace the bad DIMM modules.
SP error code: 038h

BIOS detected unknown errors in DIMM slot:


Message: BIOS detected unknown errors in DIMM slot:
Description: BIOS detected unknown errors in the displayed DIMM.
Corrective action: Check and replace the bad DIMM modules.
SP error code: 038h

023A: ONTAP Detected Bad DIMM in slot:


Message: 023A: ONTAP Detected Bad DIMM in slot:
Description: Data ONTAP detected a bad DIMM and disabled it in the displayed DIMM slot.
Corrective action: Check and replace the bad DIMM modules.
SP error code: 03Ah

023B: BIOS detected SPD checksum error in DIMM slot:


Message: 023B: BIOS detected SPD checksum error in DIMM slot:
Description: BIOS detected an SPD checksum error in the displayed DIMM slot.
Corrective action: Check and replace the bad DIMM modules.
SP error code: 03Bh

BIOS detected pattern write/read mismatch in DIMM slot:


Message: BIOS detected pattern write/read mismatch in DIMM slot:
Description: BIOS detected a pattern write/read mismatch in the displayed DIMM slot.
Corrective action: Check and replace the bad DIMM modules.
SP error code: 03Ch

0241: SMBus Read Timeout


Message: 0241: SMBus Read Timeout
Description: Timeout occurs when BIOS tries to read or write information through System Management Bus (SMBUS) or Inter-Integrated Circuit (I2C).
Corrective action: Run system-level diagnostics to check the SMBUS.
SP error code: 041h

0242: Invalid FRU information


Message: 0242: Invalid FRU information
Description: The information from the field-replaceable unit (FRU) Electrically Erasable Programmable Read-Only Memory (EEPROM) is invalid.
Corrective action: Program the FRU information through the SP or system-level diagnostics.
SP error code: 042h

0250: System battery is dead - Replace and run SETUP


Message: 0250: System battery is dead - Replace and run SETUP
Description: The real-time clock (RTC) battery is dead.
Corrective action: Replace the CMOS battery.
SP error code: 050h

0251: System CMOS checksum bad


Message: 0251: System CMOS checksum bad -- Default configuration used
Description: CMOS checksum is bad, possibly because the system was reset during BIOS boot or because of a dead RTC battery.
Corrective action: None. BIOS corrects the error automatically, and the system continues normal boot.
SP error code: 051h

0260: System timer error


Message: 0260: System timer error
Description: The system clock is not ticking.
Corrective action: Replace the chipset.
SP error code: 060h

0271: Check date and time settings


Message: 0271: Check date and time settings
Description: Date or time setting is invalid.
Corrective action:
1. Set date and time in a proper range.
2. Make sure that the RTC battery is in and not dead.
SP error code: 071h

0280: Previous boot incomplete - Default configuration used


Message: 0280: Previous boot incomplete -- Default configuration used
Description: The previous boot was incomplete, and the default configuration is used.
Corrective action: Reboot the system.
SP error code: 080h

02A1: SP Not Found


Message: 02A1: SP Not Found
Description: SP does not respond or SP hangs.
Corrective action: Check and replace the SP.
SP error code: 0A2h

02A2: BMC System Error Log (SEL) Full


Message: 02A2: BMC System Error Log (SEL) Full
Description: SP system error log (SEL) is full.
Corrective action: Clear the SEL log for SP.
SP error code: 0A2h


02A3: No Response From SP To FRU ID Read Request


Message: 02A3: No Response From SP To FRU ID Read Request
Description: Service Processor fails to respond to the FRU ID read request.
Corrective action: Check and replace the Service Processor.
SP error code: 0A3h

SP FRU Entry is Blank or Checksum Error


Message: SP FRU Entry is Blank or Checksum Error
Description: FRU information is invalid.
Corrective action: Check and replace the FRU.
SP error code: 0A3h

No Response to Controller FRU ID Read Request via IPMI


Message: No Response to Controller FRU ID Read Request via IPMI
Description: The SP does not respond to a controller FRU information inquiry.
Corrective action: Check and replace the SP.
SP error code: 0A4h

No Response to Midplane FRU ID Read Request via IPMI


Message: No Response to Midplane FRU ID Read Request via IPMI
Description: The SP does not respond to a midplane FRU information inquiry.
Corrective action: Check and replace the SP.
SP error code: 0A5h

02C2: No valid Boot Loader in System Flash - Non Fatal


Message: 02C2: No valid Boot Loader in System Flash - Non Fatal
Description: No valid boot loader is found in system flash memory while the Halt For Invalid Boot Loader option is disabled in setup. As a result, the system can still boot from the boot media if the boot media has a valid boot loader.
Corrective action: Take one of the following actions:

If the system can boot to the boot loader prompt through the boot media, run the following command to place a good boot loader in system flash:
flash

If the system cannot boot to the boot loader prompt through the boot media, boot from the backup image through the SP and then enter the following command to place a good boot loader in the corrupted portion of system flash:
flash

SP error code: 0C2h

02C3: No valid Boot Loader in System Flash - Fatal


Message: 02C3: No valid Boot Loader in System Flash - Fatal
Description: No valid boot loader is found in system flash memory while the Halt For Invalid Boot Loader option is enabled in setup. As a result, the system halts, and you must take corrective action.
Corrective action: Place a valid version of the boot loader in the system flash by completing the following steps:
1. Boot the system from the backup boot image.
2. Enter the following command:
flash

SP error code: 0C3h
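The numbered POST messages above pair a four-digit message code with an SP error code. As a minimal sketch, the pairs can be kept in a lookup table keyed by the message prefix. The names `SP_ERROR_CODES` and `sp_error_code` are illustrative assumptions, not part of any NetApp tool, and the values are copied exactly as printed in the tables above.

```python
import re

# SP error codes exactly as printed in the tables above.
# The table and function names are hypothetical, for illustration only.
SP_ERROR_CODES = {
    "0251": "051h",
    "0260": "060h",
    "0271": "071h",
    "0280": "080h",
    "02A1": "0A2h",
    "02A2": "0A2h",
    "02A3": "0A3h",
    "02C2": "0C2h",
    "02C3": "0C3h",
}

# A numbered POST message starts with four hex digits followed by a colon.
_PREFIX = re.compile(r"^\s*([0-9A-Fa-f]{4}):")


def sp_error_code(console_line):
    """Return the SP error code for a numbered POST message, or None."""
    match = _PREFIX.match(console_line)
    if match is None:
        return None
    return SP_ERROR_CODES.get(match.group(1).upper())
```

For example, `sp_error_code("0271: Check date and time settings")` looks up the `0271` prefix, while unnumbered messages such as the "Fatal Error!" entries return `None` and must be matched on their text instead.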

Fatal Error: No DIMM detected and system can not continue boot!
Message: Fatal Error: No DIMM detected and system can not continue boot!
Description: All DIMM serial presence detect (SPD) EEPROMs are inaccessible because the Inter-Integrated Circuit (I2C) switch for the System Management Bus (SMBus) is hung. The system treats the condition as if there were no DIMMs in the system.
Corrective action: Complete the following steps:
1. If the message persists, power-cycle the system.
2. If the problem persists after power-cycling the system, replace the motherboard.
SP error code: 0E8h


Fatal Error! All channels are disabled!


Message: Fatal Error! All channels are disabled!
Description: All DIMM channels are disabled.
Corrective action: Complete the following steps:
1. Clear CMOS.
2. Power-cycle the system.
3. If the problem persists, replace all DIMMs.
SP error code: 0EAh

Software memory test failed!


Message: Software memory test failed!
Description: The software memory test failed in the memory reference code (MRC).
Corrective action: Check for and replace the bad DIMMs.
SP error code: 0EBh

Fatal Error! RDIMMs and UDIMMs are mixed!


Message: Fatal Error! RDIMMs and UDIMMs are mixed!
Description: Registered dual inline memory modules (RDIMMs) and unregistered dual inline memory modules (UDIMMs) are mixed in the system.
Corrective action: Make sure that the RDIMMs and UDIMMs are not mixed.
SP error code: 0EDh

Fatal Error! UDIMM in 3rd slot is not supported!


Message: Fatal Error! UDIMM in 3rd slot is not supported!
Description: An unregistered dual inline memory module (UDIMM) is populated in the third slot.
Corrective action: Make sure that a UDIMM is not plugged into the third slot.
SP error code: 0EEh
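The two DIMM population rules above (no mixing of RDIMMs and UDIMMs, no UDIMM in the third slot) can be checked before installing memory. The following is a rough sketch only: the function name, the `(slot_position, kind)` input format, and the order in which the rules are tested are assumptions made for illustration, not the BIOS's actual logic.

```python
def check_dimm_population(dimms):
    """Return the fatal message a population would trigger, or None.

    dimms is a list of (slot_position, kind) tuples, where slot_position
    is 1, 2, or 3 within a channel and kind is "RDIMM" or "UDIMM".
    This input format is a hypothetical convention for the sketch.
    """
    kinds = {kind for _, kind in dimms}
    # Rule 1: registered and unregistered modules must not be mixed.
    if {"RDIMM", "UDIMM"} <= kinds:
        return "Fatal Error! RDIMMs and UDIMMs are mixed!"
    # Rule 2: an unregistered module must not occupy the third slot.
    if any(kind == "UDIMM" and slot == 3 for slot, kind in dimms):
        return "Fatal Error! UDIMM in 3rd slot is not supported!"
    return None
```

A population of `[(1, "UDIMM"), (2, "UDIMM")]` passes both checks, whereas moving one of those modules to slot 3 returns the second message.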


Fatal Error! All DIMM failed and system can not continue boot!
Message: Fatal Error! All DIMM failed and system can not continue boot!
Description: All DIMMs are mapped out, either as bad or as having the disable flag set. The system has no memory to continue.
Corrective action: Complete the following steps:
1. Clear CMOS.
2. Power-cycle the system.
3. If the problem persists, replace all DIMMs.
SP error code: N/A

C1300 NetCache appliance POST error messages


POST error messages might appear on the system console if your appliance encounters errors while BIOS initiates the hardware. Some messages are accompanied by beeps.

8042-gate A20 failure


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: 8042-gate A20 failure
Beep code: 6
Description: The keyboard controller (8042) might be functioning incorrectly. The BIOS cannot switch to protected mode.
Corrective action: Remove all network interface cards (NICs) and power-cycle the system. If the beeps do not occur, reinstall each NIC one at a time, and then power-cycle the system to find and replace any problematic NICs. If beeps persist when the NICs are absent, run diagnostics on the motherboard to determine whether the motherboard needs replacement.
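The NIC isolation procedure in this corrective action recurs for several beep codes, so it may help to see it as a loop. The sketch below assumes a hypothetical callback, `beeps_with(installed)`, that stands for "install exactly these NICs, power-cycle, and report whether the beeps occur"; in practice that step is performed by hand.

```python
def find_problem_nics(nics, beeps_with):
    """Sketch of the remove-all, reinstall-one-at-a-time NIC procedure.

    nics: the NICs that were removed, in the order they will be reinstalled.
    beeps_with: hypothetical callback; takes the list of installed NICs and
        returns True if the beeps occur after a power-cycle.
    Returns the NICs to replace, or None if the beeps persist with no NICs
    installed (in which case run diagnostics on the motherboard).
    """
    if beeps_with([]):
        return None
    installed, faulty = [], []
    for nic in nics:
        if beeps_with(installed + [nic]):
            faulty.append(nic)      # leave it out and replace it
        else:
            installed.append(nic)   # known good, keep it installed
    return faulty
```

Reinstalling one NIC per power-cycle keeps each test attributable to a single card, which is why the procedure does not reinstall several NICs at once.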

A: drive failure
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: A: drive failure
Description: The BIOS failed to configure the specified drive during POST.
Corrective action:
1. Power-cycle the system.
2. Replace the specified drive.

B: drive failure
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: B: drive failure
Description: The BIOS failed to configure the specified drive during POST.
Corrective action:
1. Power-cycle the system.
2. Replace the specified drive.

base 64KB memory failure


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: base 64KB memory failure
Beep code: 3
Description: The system experienced a memory failure.
Corrective action:
1. Check whether the DIMMs are seated properly and reseat the DIMMs, as needed.
2. If the problem persists, replace the DIMMs.

Boot failure
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Boot failure
Description: The BIOS could not boot from a particular device. This message is usually followed by other information concerning the device.
Corrective action:
1. Wait for the message that follows, which provides more information.
2. Replace the device specified, and then power-cycle the system.

BootSector write!!
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: BootSector write!!
Description: The BIOS has detected software attempting to write to a drive's boot sector. This message appears if virus detection is enabled in the AMIBIOS setup.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

Cache error/external cache bad


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Cache error/external cache bad
Beep code: 11
Description: The external cache is faulty.
Corrective action: Remove all network interface cards (NICs) and power-cycle the system. If the beeps do not occur, reinstall each NIC one at a time, and then power-cycle the system to find and replace any problematic NICs. If beeps persist when the NICs are absent, run diagnostics on the motherboard to determine whether the motherboard needs replacement.

Checking NVRAM...update failed


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Checking NVRAM...update failed
Description: The BIOS could not write to the NVRAM.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

CMOS battery low


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: CMOS battery low
Description: The CMOS battery is low.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

CMOS checksum bad


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: CMOS checksum bad
Description: The CMOS data has been changed by a program other than the BIOS.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

CMOS date/time not set


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: CMOS date/time not set
Description: The CMOS date and time are invalid.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

CMOS settings wrong


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: CMOS settings wrong
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

CMOS shutdown register read/write error


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: CMOS shutdown register read/write error
Beep code: 10
Description: The shutdown register for CMOS RAM failed.
Corrective action: Remove all network interface cards (NICs) and power-cycle the system. If the beeps do not occur, reinstall each NIC one at a time, and then power-cycle the system to find and replace any problematic NICs. If beeps persist when the NICs are absent, run diagnostics on the motherboard to determine whether the motherboard needs replacement.

display memory read/write error


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: display memory read/write error
Beep code: 8
Description: The system video adapter is either missing or its memory is faulty. This is not a fatal error.
Corrective action: Remove all network interface cards (NICs) and power-cycle the system. If the beeps do not occur, reinstall each NIC one at a time, and then power-cycle the system to find and replace any problematic NICs. If beeps persist when the NICs are absent, run diagnostics on the motherboard to determine whether the motherboard needs replacement.

DMA-2 error
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: DMA-2 error
Description: An error occurred when the system attempted to initialize the secondary direct memory access (DMA) controller.
Corrective action: Call technical support.

DMA controller error


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: DMA controller error
Description: An error occurred when the system attempted to initialize the secondary direct memory access (DMA) controller.
Corrective action: Call technical support.

Drive not ready


Message: Drive not ready
Description: The BIOS cannot access the drive because it was not ready for data transfer. This is often reported by drives when no media is present.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

Gate20 error
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Gate20 error
Description: The BIOS cannot control the gate A20 function, which controls access to memory above 1 MB.
Corrective action:
1. Run diagnostics on the motherboard.
2. If the problem persists, replace the motherboard.
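The effect of gate A20 on memory above 1 MB comes down to one address bit. As general x86 background rather than anything specific to this appliance: when the gate is disabled, address line 20 is forced low, so an address at or just above 1 MB aliases to the same offset below it. The sketch below models that masking; the function name is illustrative.

```python
A20_BIT = 1 << 20  # address line 20: the first bit needed to address 1 MB and above


def physical_address(addr, a20_enabled):
    """Return the address seen by memory for a given gate A20 state."""
    if a20_enabled:
        return addr
    # With the gate disabled, address line 20 is held at 0.
    return addr & ~A20_BIT
```

With the gate disabled, `physical_address(0x100000, False)` wraps to `0x0`, which is why a BIOS that cannot control the gate cannot reliably use memory above 1 MB.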

Insert BOOT diskette in A


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Insert BOOT diskette in A
Description: The BIOS could not boot from the A drive.
Corrective action: Replace the disk drive.

Interrupt controller-N error


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Interrupt controller-N error
Description: The BIOS could not initialize an interrupt controller.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

Invalid boot diskette


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Invalid boot diskette
Description: A diskette was found in the drive, but it is not configured as a bootable diskette.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

Keyboard error
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Keyboard error
Description: The keyboard controller is not responding when the BIOS attempts to initialize it.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

Keyboard/interface error
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Keyboard/interface error
Description: The keyboard controller is not responding when the BIOS attempts to initialize it.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.


Microcode error
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Microcode error
Description: The BIOS could not find or load the CPU microcode update to the CPU.
Corrective action:
1. Update to the correct version of BIOS.
2. If this problem persists, call technical support.

Multi-bit ECC error


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Multi-bit ECC error
Description: A multiple-bit corruption of memory has occurred that cannot be corrected.
Corrective action: Replace the DIMMs.

NVRAM bad
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: NVRAM bad
Description: There was an error in configuring the nonvolatile RAM (NVRAM).
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

NVRAM checksum bad


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: NVRAM checksum bad
Description: There was an error while validating the nonvolatile RAM (NVRAM) data. This causes POST to clear the NVRAM data.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

NVRAM cleared
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: NVRAM cleared
Description: There was an error while validating the nonvolatile RAM (NVRAM) data. This causes POST to clear the NVRAM data.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

NVRAM ignored
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: NVRAM ignored
Description: There was an error in configuring the nonvolatile RAM (NVRAM).
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

parity error (beep code)


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: parity error (beep code)
Beep code: 2
Description: The system experienced a parity error.
Corrective action:
1. Check whether the DIMMs are seated properly and reseat the DIMMs, as needed.
2. If the problem persists, replace the DIMMs.


Parity error (no beep code)


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Parity error (no beep code)
Description: A fatal parity error has occurred. The system halts after displaying this message.
Corrective action:
1. Run diagnostics on all components.
2. If this message persists, call technical support.

PCI I/O conflict


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: PCI I/O conflict
Description: A PCI adapter generated an I/O resource conflict when configured by BIOS.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

PCI IRQ conflict


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: PCI IRQ conflict
Description: A PCI adapter generated an I/O resource conflict when configured by BIOS.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

PCI IRQ routing table error


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: PCI IRQ routing table error
Description: There was an error in configuring a PCI device.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

PCI ROM conflict


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: PCI ROM conflict
Description: A PCI adapter generated an I/O resource conflict when configured by BIOS.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

processor error
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: processor error
Beep code: 5
Description: The CPU on the motherboard experienced an error.
Corrective action: Remove all network interface cards (NICs) and power-cycle the system. If the beeps do not occur, reinstall each NIC one at a time, and then power-cycle the system to find and replace any problematic NICs. If beeps persist when the NICs are absent, run diagnostics on the motherboard to determine whether the motherboard needs replacement.

processor exception interrupt error


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: processor exception interrupt error
Beep code: 7
Description: The CPU generated an exception interrupt.
Corrective action: Remove all network interface cards (NICs) and power-cycle the system. If the beeps do not occur, reinstall each NIC one at a time, and then power-cycle the system to find and replace any problematic NICs. If beeps persist when the NICs are absent, run diagnostics on the motherboard to determine whether the motherboard needs replacement.

Reboot and select proper boot device ...


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Reboot and select proper boot device or insert boot media in selected boot device.
Description: The BIOS could not find a bootable device in the system, or the removable media drive does not contain media, or both.
Corrective action: Call technical support.

refresh failure
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: refresh failure
Beep code: 1
Description: The memory refresh circuitry on the motherboard is faulty.
Corrective action:
1. Check whether the DIMMs are seated properly and reseat the DIMMs, as needed.
2. If the problem persists, replace the DIMMs.

Resource conflict
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Resource conflict
Description: More than one system device is trying to use the same resources.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.


ROM checksum error


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: ROM checksum error
Beep code: 9
Description: The ROM checksum value does not match the value encoded in the BIOS.
Corrective action: Remove all network interface cards (NICs) and power-cycle the system. If the beeps do not occur, reinstall each NIC one at a time, and then power-cycle the system to find and replace any problematic NICs. If beeps persist when the NICs are absent, run diagnostics on the motherboard to determine whether the motherboard needs replacement.

Static resource conflict


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Static resource conflict
Description: More than one system device is trying to use the same resources.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

System halted
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: System halted
Description: The system was halted. This message appears when a fatal error occurs.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

Timer error
Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: Timer error
Description: There was an error in initializing the system hardware.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

timer not operational


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: timer not operational
Beep code: 4
Description: The system either experienced a memory failure or the motherboard failed.
Corrective action: Remove all network interface cards (NICs) and power-cycle the system. If the beeps do not occur, reinstall each NIC one at a time, and then power-cycle the system to find and replace any problematic NICs. If beeps persist when the NICs are absent, run diagnostics on the motherboard to determine whether the motherboard needs replacement.

VIRUS: continue (y/n)


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: VIRUS: continue (y/n)
Description: The BIOS detects virus activity. This message appears if virus detection is enabled in the AMIBIOS setup.
Corrective action:
1. Power-cycle the system.
2. If the problem persists, call technical support.

X hard disk error


Note: Always power-cycle your system when you receive this message. If the system repeats the error message, follow the corrective action for the error message.

Message: X hard disk error
Description: A configuration error occurred; X represents the specified device.
Corrective action: Call technical support.
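Because the C1300 POST messages are listed alphabetically, the beep codes are scattered across the section. As a convenience sketch, the codes given above can be gathered into one lookup table. The table and function names are illustrative assumptions, and the action strings are short paraphrases of the corrective actions in this section, not official wording.

```python
# Beep counts and messages as listed in the POST error message entries above.
BEEP_CODES = {
    1: "refresh failure",
    2: "parity error (beep code)",
    3: "base 64KB memory failure",
    4: "timer not operational",
    5: "processor error",
    6: "8042-gate A20 failure",
    7: "processor exception interrupt error",
    8: "display memory read/write error",
    9: "ROM checksum error",
    10: "CMOS shutdown register read/write error",
    11: "Cache error/external cache bad",
}

# Codes whose corrective action is to reseat and, if needed, replace the DIMMs.
MEMORY_BEEPS = {1, 2, 3}


def describe_beeps(count):
    """Return (message, paraphrased action) for a beep count, or None."""
    message = BEEP_CODES.get(count)
    if message is None:
        return None
    if count in MEMORY_BEEPS:
        action = "reseat the DIMMs, then replace them if the problem persists"
    else:
        action = "isolate the NICs, then run diagnostics on the motherboard"
    return message, action
```

Grouping the codes this way makes the pattern in the section visible: beep codes 1 through 3 point to the DIMMs, and codes 4 through 11 all share the NIC isolation procedure.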

Boot error messages


Boot error messages might appear after the hardware passes all POSTs and your system encounters errors while loading the operating system.

Boot device err


Message: Boot device err
Description: A CompactFlash card could not be found to boot from.
Corrective action: Insert a valid CompactFlash card.

Cannot initialize labels


Message: Cannot initialize labels
Description: When the system tries to create a new file system, it cannot initialize the disk labels.
Corrective action: Usually, you do not need to create and initialize a file system; do so only after consulting technical support.

Cannot read labels


Message: Cannot read labels
Description: When your system tries to initialize a new file system, it has a problem reading the disk labels it wrote to the disks. This problem can occur because the system failed to read the disk size or because the written disk labels were invalid.
Corrective action: Usually, you do not need to create and initialize a file system; do so only after consulting technical support.

Configuration exceeds max PCI space


Message: Configuration exceeds max PCI space
Description: The memory space for mapping PCI adapters has been exhausted for one of the following reasons:
There are too many PCI adapters in the system.
An adapter is demanding too many resources.
Corrective action:
1. Verify that all expansion adapters in your system are supported.
2. Contact technical support for help. Have a list ready of all expansion adapters installed in your system.

DIMM slot # has correctable ECC errors


Message: DIMM slot # has correctable ECC errors
Description: The specified DIMM slot has correctable error correction code (ECC) errors.
Corrective action: Run diagnostics on your DIMMs. If the problem persists, replace the specified DIMM.

Dirty shutdown in degraded mode


Message: Dirty shutdown in degraded mode
Description: The file system is inconsistent because you did not shut down the system cleanly when it was in degraded mode.
Corrective action: Contact technical support for instructions about repairing the file system.

Disk label processing failed


Message: Disk label processing failed
Description: Your system detects that the disk is not in the correct drive bay.
Corrective action: Make sure that the disk is in the correct bay.

Drive %s.%d not supported


Message: Drive %s.%d not supported
Description: The system detects an unsupported disk drive. %s is the disk number; %d is the disk ID number.
Corrective action:
1. Remove the drive immediately; otherwise, the system drops down to the programmable ROM (PROM) monitor within 30 seconds.
2. Check the System Configuration Guide at http://now.netapp.com to verify support for your disk drive.


Error detection detected too many errors to analyze at once


Message: Error detection detected too many errors to analyze at once
Description: This message occurs when other error messages occur at the same time.
Corrective action: See the other error messages and their respective corrective actions. If the problem persists, contact technical support.

FC-AL loop down, adapter %d


Message: FC-AL loop down, adapter %d
Description: The system cannot detect the Fibre Channel-Arbitrated Loop (FC-AL) loop or adapter.
Corrective action:
1. Identify the adapter by entering the following command:
storage show adapter
2. Turn off the power on your system and verify that the adapter is properly seated in the expansion slot.
3. Verify that all Fibre Channel cables are connected.

File system may be scrambled


Message: File system may be scrambled
Descriptions and corrective actions: The following entries list errors that cause the file system to become inconsistent and steps you can take to correct the problem.

Description: An unclean shutdown occurred when your system was in degraded mode and when NVRAM was not working.
Corrective action: Contact technical support to learn how to start the system from a system boot diskette and repair the file system.

Description: The number of disks detected in the disk array is different from the number of disks recorded in the disk labels. The system cannot start when more than one disk is missing.
Corrective action: Make sure that all disks on the system are properly installed in the disk shelves.

Description: The system encounters a read error while reconstructing parity.
Corrective action: Contact technical support for help.

Description: A disk failed at the same time the system crashed.
Corrective action: Contact technical support to learn how to repair the file system.

Halted disk firmware too old


Message: Halted disk firmware too old
Description: The disk firmware is an old version.
Corrective action: Update the disk firmware by entering the following command:
disk_fw_update

Halted: Illegal configuration


Message: Halted: Illegal configuration
Description: The HA pair is configured incorrectly.
Corrective action:
1. Check the console for details.
2. Verify that all cables are correctly connected.

Invalid PCI card slot %d


Message: Invalid PCI card slot %d
Description: The system detects an adapter that is not supported. %d is the expansion slot number.
Corrective action: Replace the unsupported adapter with an adapter that is included in the System Configuration Guide at http://now.netapp.com.
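Several boot messages in this section contain printf-style placeholders (%s, %d) that the system fills in at run time. If you scan saved console output for these messages, one possible approach is to turn each template into a regular expression. The sketch below is illustrative only: the function names are assumptions, only three templates from this section are included, and the sample line `Drive 0a.16 not supported` is a made-up value, not captured output.

```python
import re

# Message templates copied from the entries in this section.
TEMPLATES = [
    "Drive %s.%d not supported",
    "FC-AL loop down, adapter %d",
    "Invalid PCI card slot %d",
]


def template_to_regex(template):
    """Convert a printf-style template into an anchored regular expression."""
    parts = re.split(r"(%s|%d)", template)
    pieces = []
    for part in parts:
        if part == "%s":
            pieces.append(r"(\S+?)")   # shortest run of non-space characters
        elif part == "%d":
            pieces.append(r"(\d+)")    # decimal number
        else:
            pieces.append(re.escape(part))  # literal text
    return re.compile("^" + "".join(pieces) + "$")


_COMPILED = [(t, template_to_regex(t)) for t in TEMPLATES]


def parse_boot_message(line):
    """Return (template, captured fields) for a known message, else None."""
    for template, regex in _COMPILED:
        match = regex.match(line.strip())
        if match:
            return template, match.groups()
    return None
```

Calling `parse_boot_message("Invalid PCI card slot 3")` identifies the template and captures the slot number as a string, which can then be looked up against the corrective action for that entry.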

No /etc/rc
Message: No /etc/rc
Description: The /etc/rc file is corrupted.
Corrective action:
1. At the hostname> prompt, enter the following command:
setup
2. As the system prompts for system configuration information, use the information you recorded in your system configuration information worksheet in the Getting Started Guide.
For more information about your system setup program, see the appropriate system administration guide.

No disk controllers
Message: No disk controllers
Description: The system cannot detect any Fibre Channel-Arbitrated Loop (FC-AL) disk controllers.
Corrective action:
1. Turn off your system power.
2. Verify that all NICs are properly seated in the appropriate expansion slots.

No disks
Message: No disks
Description: The system cannot detect any Fibre Channel-Arbitrated Loop (FC-AL) disks.
Corrective action: Verify that all disks are properly seated in the drive bays.

No /etc/rc, running setup


Message: No /etc/rc, running setup
Description: The system cannot find the /etc/rc file and automatically starts setup.
Corrective action: As the system prompts for system configuration information, use the information you recorded in your system configuration information worksheet in the Getting Started Guide. For more information about your system setup program, see the appropriate system administration guide.

No network interfaces
Message: No network interfaces
Description: The system cannot detect any network interfaces.
Corrective action:
1. Turn off the system and verify that all network interface cards (NICs) are seated properly in the appropriate expansion slots.
2. Run diagnostics to check the onboard Ethernet port.
3. If the problem persists, contact technical support.

No NVRAM present
Message No NVRAM present
Description The system cannot detect the NVRAM adapter.

Corrective action Make sure that the NVRAM adapter is securely installed in the appropriate expansion slot.

NVRAM #n downrev
Message NVRAM #n downrev
n: The serial number of the nonvolatile RAM (NVRAM) adapter.
Description The NVRAM adapter is an early revision that cannot be used with the system.

Corrective action Check the console for information about which revision of the NVRAM adapter is required. Replace the NVRAM adapter.

NVRAM: wrong pci slot


Message NVRAM: wrong pci slot
Description The system cannot detect the nonvolatile RAM (NVRAM) adapter.

Corrective action For a stand-alone 3020 or 3050 system, make sure that the NVRAM adapter is in slot 1. For a 3020 or 3050 system in an HA pair, make sure that the NVRAM adapter is in slot 2.

Note: C2300 and C3300 NetCache appliances do not support NVRAM5.
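
The slot rule for 3020 and 3050 systems can be written as a small lookup. The following is a minimal sketch only; the function name `expected_nvram_slot` and its arguments are hypothetical and are not part of any NetApp tool.

```python
def expected_nvram_slot(model: str, ha_pair: bool) -> int:
    """Return the expansion slot where the NVRAM adapter belongs.

    Hypothetical helper: covers only the 3020 and 3050 systems
    named in this entry.
    """
    if model not in ("3020", "3050"):
        raise ValueError(f"no slot rule is given here for model {model!r}")
    # Stand-alone systems use slot 1; systems in an HA pair use slot 2.
    return 2 if ha_pair else 1


if __name__ == "__main__":
    print(expected_nvram_slot("3020", ha_pair=False))
    print(expected_nvram_slot("3050", ha_pair=True))
```

The helper raises an error for any other model rather than guessing a slot, because this entry states the rule only for 3020 and 3050 systems.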

Panic: DIMM slot #n has uncorrectable ECC errors


Message Panic: DIMM slot #n has uncorrectable ECC errors. Replace these DIMMS.
Description The specified DIMM has uncorrectable ECC errors.

Corrective action Replace the specified DIMM.

This platform is not supported on this release


Message This platform is not supported on this release. Please consult the release notes. Please downgrade to a supported release! Shutting down: EOL platform
Description This platform is not supported on this release. Please consult the release notes for your software.

Corrective action You must downgrade your software version to a compatible release. Verify that you have the correct URL for software download.

Too many errors in too short time


Message Too many errors in too short time
Description The error detection system is experiencing problems. This message occurs when other error messages occur at the same time.

Corrective action See the other error messages and their respective corrective actions. If the problem persists, call technical support.

Warning: Motherboard Revision not available


Message Warning: Motherboard Revision not available. Motherboard is not programmed.
Description The system motherboard is not programmed with the correct revision.

Corrective action Replace the motherboard.

Warning: Motherboard Serial Number not available


Message Warning: Motherboard Serial Number not available. Motherboard is not programmed
Description The system motherboard is not programmed with the correct serial number.

Corrective action Replace the motherboard.

Warning: system serial number is not available


Message Warning: system serial number is not available. System backplane is not programmed.
Description The backplane of your system does not have the correct system serial number.

Corrective action Report the problem to technical support so that your system can be replaced.

Watchdog error
Message Watchdog error
Description An error occurred during the testing of the watchdog timer.
Corrective action Replace the motherboard.

Watchdog failed
Message Watchdog failed
Description Your system watchdog reset hardware, used to reset your system from a system hang condition, is not functioning properly.

Corrective action Replace the motherboard.

EMS and operational messages


You might encounter various messages on your system during normal operation. The EMS collects event data from various parts of the Data ONTAP kernel and displays information about those events in AutoSupport messages. EMS messages appear on your system console or LCD and provide information about disk drives, disk shelves, system power supply, system fans, and acceleration modules. Operational error messages might appear on your system console or LCD when the system is operating, when it is halted, or when it is restarting because of system problems.

Environmental EMS messages


EMS messages appear on the console and in AutoSupport messages if your system encounters extremes in its operational environment. They also appear on the LCD display if your system has one.
Note: In 31xx systems, both controllers in a chassis share the power supplies. As a result, the system is never shut down because of a single power supply failure. Removing one power supply does not shut down the system.

Note: Degraded power might be caused by bad power supplies, bad wall power, or bad components on the motherboard. If spare power supplies are available, try replacing them to see whether that alleviates the problem.

Chassis fan FRU failed


Message Chassis fan FRU failed: current speed is 4272 RPM, on [time stamp].
LCD display Fans stopped; replace them
LED behavior FRU LED: Green if problem is PSU; off if problem is fan.
Description This message occurs when a system fan fails.
Corrective action Check LEDs on the fans and power supply. If both fan LEDs are green, run diagnostics on the power supplies. If the fan LED is off, replace the fan.
SNMP trap ID #414: Chassis fan is degraded
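
The LED check in this corrective action is a simple two-way decision. The sketch below illustrates it under stated assumptions: the function name `fan_fru_action` and the representation of each fan LED as a boolean (True for green, False for off) are hypothetical.

```python
from typing import Sequence


def fan_fru_action(fan_leds_green: Sequence[bool]) -> str:
    """Suggest the next step from the fan LED states.

    Hypothetical helper: each entry is True if that fan's LED is green
    and False if it is off.
    """
    if not fan_leds_green:
        raise ValueError("at least one fan LED state is required")
    if all(fan_leds_green):
        # All fan LEDs are green, so the fans are not the likely cause.
        return "run diagnostics on the power supplies"
    # At least one fan LED is off: that fan is the one to replace.
    off = [i for i, green in enumerate(fan_leds_green) if not green]
    return f"replace fan(s) at index {off}"


if __name__ == "__main__":
    print(fan_fru_action([True, True]))
    print(fan_fru_action([True, False]))
```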

Chassis over temperature on XXXX


Message Chassis over temperature on XXXX at [time stamp].
LCD display Temperature exceeds limits
Description This message occurs when the system is operating above the high-temperature threshold.
Corrective action 1. Make sure that the system has proper ventilation.
2. Power-cycle the system and run diagnostics on the system.
SNMP trap ID #372: Chassis temperature is too hot

Chassis over temperature shutdown on XXXX


Message Chassis over temperature shutdown on XXXX at [time stamp].
LCD display Temperature exceeds limits
Description This message occurs when the system is operating above the high-temperature threshold. The system shuts down immediately.
Corrective action 1. Make sure that the system has proper ventilation.
2. Power-cycle the system and run diagnostics on the system.
SNMP trap ID #371: Chassis temperature is too hot

Chassis Power Degraded: 3.3V in warn high state


Message Chassis Power Degraded: 3.3V is in warn high state current voltage is 3273 mV on XXXX at [time stamp].
LCD display Power supply degraded
Description This message occurs when the system is operating above the high-voltage threshold.
Corrective action Your action depends on whether the power supply is present. If the power supply is not inserted, insert it. If the power supply is inserted, power-cycle your system and run diagnostics on the identified power supply. If the problem persists, replace the identified power supply.
SNMP trap ID #403: Chassis power is degraded
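
The same power supply corrective action recurs in several entries in this chapter, so it may help to see it as an ordered decision. This is a sketch only; the function name `power_supply_action` and its two flags are hypothetical.

```python
def power_supply_action(inserted: bool, persists_after_diagnostics: bool = False) -> str:
    """Return the next corrective step for a degraded or failed power supply.

    Hypothetical helper that follows the order of the corrective action text:
    presence is checked first, then diagnostics, then replacement.
    """
    if not inserted:
        return "insert the power supply"
    if persists_after_diagnostics:
        return "replace the identified power supply"
    return "power-cycle the system and run diagnostics on the identified power supply"


if __name__ == "__main__":
    print(power_supply_action(inserted=False))
    print(power_supply_action(inserted=True))
    print(power_supply_action(inserted=True, persists_after_diagnostics=True))
```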

Chassis power degraded: PS#


Message Chassis Power degraded: PS#
LCD display Power supply degraded
LED behavior FRU LED: Amber
Description This message occurs when there is a problem with one of the power supplies.
Corrective action 1. Check that the power supply is seated properly in its bay and that all power cords are connected.
2. Power-cycle your system and run diagnostics on the identified power supply.
3. If the problem persists, replace the identified power supply.
SNMP trap ID #392: Chassis power supply is degraded

Note: In 31xx systems, both controllers in a chassis share the power supplies. As a result, the system is never shut down because of a single power supply failure. Removing one power supply does not shut down the system.

Chassis Power Fail: PS#


Message Chassis Power Fail: PS#
LCD display Power supply degraded
Description This message occurs when the power supply fails.
Corrective action Your action depends on whether the power supply is present. If the power supply is not inserted, insert it. If the power supply is inserted, power-cycle your system and run diagnostics on the identified power supply. If the problem persists, replace the identified power supply.
SNMP trap ID #6: Chassis power is degraded

Chassis Power Shutdown


Message Chassis Power Shutdown: Chassis Power Supply Fail: PS#
LCD display Power supply degraded
LED behavior FRU LED: Amber
Description This message occurs when the system is in a warning state. The system shuts down immediately.
Corrective action Your action depends on whether the power supply is present. If the power supply is not inserted, insert it. If the power supply is inserted, power-cycle your system and run diagnostics on the identified power supply. If the problem persists, replace the identified power supply.
SNMP trap ID #392: Chassis power supply is degraded

Note: In 31xx systems, both controllers in a chassis share the power supplies. As a result, the system is never shut down because of a single power supply failure. Removing one power supply does not shut down the system.

Chassis power shutdown: 3.3V in warn low state


Message Chassis power shutdown: 3.3V is in warn low state current voltage is 3273 mV on XXXX at [time stamp].
LCD display Power supply degraded
Description This message occurs when the system is operating below the low-voltage threshold. The system shuts down immediately.
Corrective action Your action depends on whether the power supply is present. If the power supply is not inserted, insert it. If the power supply is inserted, power-cycle your system and run diagnostics on the identified power supply. If the problem persists, replace the identified power supply.
SNMP trap ID #403: Chassis power is degraded

Chassis Power Supply: PS# removed


Message Chassis Power Supply: PS# removed system will shutdown in 2 minutes
LCD display Power supply degraded
LED behavior FRU LED: Amber
Description This message occurs when the power supply unit is removed from the system. The system will shut down unless the power supply is replaced.
Corrective action Your action depends on whether the power supply is present. If the power supply is not inserted, insert it. If the power supply is inserted, power-cycle your system and run diagnostics on the identified power supply. If the problem persists, replace the identified power supply.
SNMP trap ID #501: Chassis power supply is degraded

Chassis power supply degraded: PS#


Note: This message appears only on 31xx systems.

Message Chassis power supply degraded: PS#
LED behavior FRU LED: Amber
Description This message occurs when there is a problem with one of the power supplies.
Corrective action 1. Check that the power supply is seated properly in its bay and that all power cords are connected.
2. Power-cycle your system and run diagnostics on the identified power supply.
3. If the problem persists, replace the identified power supply.
SNMP trap ID #392: Chassis power supply is degraded

Chassis power supply fail: PS#


Message Chassis power supply fail: PS#
LCD display Power supply degraded
Description This message occurs when the system is operating below the low-voltage threshold. The system shuts down immediately.
Corrective action Your action depends on whether the power supply is present. If the power supply is not inserted, insert it. If the power supply is inserted, power-cycle your system and run diagnostics on the identified power supply. If the problem persists, replace the identified power supply.
SNMP trap ID N/A

Chassis power supply off: PS#


Note: This message appears only on 31xx systems.

Message Chassis Power supply off: PS#
LED behavior FRU LED: Off
Description This message occurs when the power supply unit is turned off.
Corrective action Your action depends on whether the power supply is present. If the power supply is present and is switched off, turn the switch on. If the power supply is present and turned on, power-cycle your system and run diagnostics on the identified power supply. If the problem persists, replace the identified power supply.
SNMP trap ID #395: Power supply not present

Chassis power supply off: PS#


Message Chassis power supply off: PS#
LCD display Power supply degraded
Description This message occurs when one or more chassis power supplies are turned off.
Corrective action Your action depends on whether the power supply is present. If the power supply is not inserted, insert it. If the power supply is inserted, power-cycle your system and run diagnostics on the identified power supply. If the problem persists, replace the identified power supply.
SNMP trap ID #395: Power supply not present

Chassis power supply OK: PS#


Note: This message appears only on 31xx systems.

Message Chassis power supply OK: PS#
LED behavior FRU LED: Green
Description This message occurs when the power supply is operating normally.
Corrective action None.
SNMP trap ID #397: Chassis power supply (%id) is OK

Chassis power supply removed: PS#


Note: This message appears only on 31xx systems.

Message Chassis power supply removed: PS#
LED behavior N/A
Description This message occurs when the power supply unit is removed from the system.
Corrective action Your action depends on whether the power supply is present. If the power supply is not inserted, insert it. If the power supply is inserted, power-cycle your system and run diagnostics on the identified power supply. If the problem persists, replace the identified power supply.
SNMP trap ID #394: I/O expansion module is not present in the chassis

Chassis under temperature on XXXX


Message Chassis under temperature on XXXX at [time stamp].
LCD display Temperature exceeds limits
Description This message occurs when the system is operating below the low-temperature threshold.
Corrective action 1. Raise the ambient temperature around the system.
2. Power-cycle the system and run diagnostics on the system.
SNMP trap ID #372: Chassis temperature is too cold

Chassis under temperature shutdown on XXXX


Message Chassis under temperature shutdown on XXXX at [time stamp].
LCD display Temperature exceeds limits
Description This message occurs when the system is operating below the low-temperature threshold. The system shuts down immediately.
Corrective action 1. Check that the system has proper ventilation. You might need to raise the ambient temperature around the system.
2. Power-cycle the system and run diagnostics on the system.
SNMP trap ID #371: Chassis temperature is too cold

Fan: # is spinning below tolerable speed


Message Fan: # is spinning below tolerable speed replace immediately to avoid overheating
LCD display Fans stopped; replace them
Description This message occurs when one or more chassis fans are spinning too slowly.
Corrective action Check LEDs on the fans. If both fan LEDs are green, run diagnostics on the motherboard. If the fan LED is off, replace the fan.
SNMP trap ID #415: Chassis fan is degraded

monitor.chassisFan.degraded
Message monitor.chassisFan.degraded
Severity ALERT
Description This message is issued when a chassis fan is degraded.
Corrective action The fan unit should be replaced.
SNMP trap ID #412 Chassis fan is degraded: %s

monitor.chassisFan.ok
Message monitor.chassisFan.ok
Severity NOTICE
Description This message occurs when the chassis fans are OK.
Corrective action N/A
SNMP trap ID #366 Chassis FRU is OK

monitor.chassisFan.removed
Message monitor.chassisFan.removed
Severity ALERT
Description This message occurs when a chassis fan is removed.
Corrective action Replace the fan unit.
SNMP trap ID #363 Chassis FRU is removed

monitor.chassisFan.slow
Message monitor.chassisFan.slow
Severity ALERT
Description This message occurs when a chassis fan is spinning too slowly.
Corrective action Replace the fan unit.
SNMP trap ID #365 Chassis FRU contains at least one fan spinning slowly

monitor.chassisFan.stop
Message monitor.chassisFan.stop
Severity ALERT
Description This message occurs when a chassis fan is stopped.
Corrective action Replace the fan unit.
SNMP trap ID #364 Chassis FRU contains at least one stopped fan

monitor.chassisFan.warning
Message monitor.chassisFan.warning
Severity ALERT
Description This message is issued when a chassis fan is spinning either too slowly or too fast. This is a warning message.
Corrective action The fan unit should be replaced.
SNMP trap ID #415 Chassis fan is in warning state

monitor.chassisFanFail.xMinShutdown
Message monitor.chassisFanFail.xMinShutdown
Severity EMERG
Description This message indicates that multiple chassis fans have failed and the system will shut down in a few minutes unless the condition is corrected.
Corrective action Make sure the system fans are working.
SNMP trap ID #511 Multiple Chassis Fan failure: System will shut down in 2 minutes.
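
For scripted monitoring, the chassis fan events above can be collected into a lookup table keyed by event name. The following sketch copies the severities and SNMP trap IDs from the entries in this section; the names `FAN_EVENTS` and `fan_events_with_severity` are hypothetical, and nothing here queries a real system.

```python
# Event name -> (severity, SNMP trap ID), copied from the entries above.
FAN_EVENTS = {
    "monitor.chassisFan.degraded": ("ALERT", 412),
    "monitor.chassisFan.ok": ("NOTICE", 366),
    "monitor.chassisFan.removed": ("ALERT", 363),
    "monitor.chassisFan.slow": ("ALERT", 365),
    "monitor.chassisFan.stop": ("ALERT", 364),
    "monitor.chassisFan.warning": ("ALERT", 415),
    "monitor.chassisFanFail.xMinShutdown": ("EMERG", 511),
}


def fan_events_with_severity(severity: str) -> list:
    """Return the event names that carry the given severity, sorted by name."""
    return sorted(name for name, (sev, _trap) in FAN_EVENTS.items() if sev == severity)


if __name__ == "__main__":
    for name in fan_events_with_severity("ALERT"):
        print(name, FAN_EVENTS[name][1])
```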

monitor.chassisPower.degraded
Message monitor.chassisPower.degraded
Severity NOTICE
Description This message indicates that a power supply is degraded.
Corrective action 1. If spare power supplies are available, try replacing them to see whether that alleviates the problem.
2. Otherwise, contact technical support for further instruction.
SNMP trap ID #403 Chassis power is degraded

monitor.chassisPower.ok
Message monitor.chassisPower.ok
Severity NOTICE
Description This message indicates that the motherboard power is OK.
Corrective action N/A
SNMP trap ID #406 Normal operation

monitor.chassisPowerSupplies.ok
Message monitor.chassisPowerSupplies.ok
Severity INFO
Description This message indicates that all power supplies are OK.
Corrective action N/A
SNMP trap ID #396 Normal operation

monitor.chassisPowerSupply.degraded
Message monitor.chassisPowerSupply.degraded
Severity INFO
Description This message indicates that a power supply is degraded.
Corrective action A replacement power supply might be required. Contact technical support for further instruction.
SNMP trap ID #392 Chassis power supply is degraded

monitor.chassisPowerSupply.notPresent
Message monitor.chassisPowerSupply.notPresent
Severity NOTICE
Description This message indicates that a power supply is not present.
Corrective action Replace the power supply.
SNMP trap ID #394 Power supply not present

monitor.chassisPowerSupply.off
Message monitor.chassisPowerSupply.off
Severity NOTICE
Description This message indicates that a power supply is turned off.
Corrective action Turn on the power supply.
SNMP trap ID #395 Power supply not present

monitor.chassisPowerSupply.ok
Message monitor.chassisPowerSupply.ok
Severity INFO
Description This message indicates that the power supply is OK.
Corrective action None.
SNMP trap ID #397 Chassis power supply (%id) is OK

monitor.chassisTemperature.cool
Message monitor.chassisTemperature.cool
Severity ALERT
Description This message occurs when the chassis temperature is too cool.
Corrective action Raise the temperature around the system.
SNMP trap ID #372 Chassis temperature is too cool

monitor.chassisTemperature.ok
Message monitor.chassisTemperature.ok
Severity NOTICE
Description This message occurs when the chassis temperature is normal.
Corrective action N/A
SNMP trap ID #376 Normal operation

monitor.chassisTemperature.warm
Message monitor.chassisTemperature.warm
Severity ALERT
Description This message occurs when the chassis temperature is too warm.
Corrective action Check to see whether air conditioning units are needed, or whether they are functioning properly.
SNMP trap ID #372 Chassis temperature is too warm

monitor.cpuFan.degraded
Message monitor.cpuFan.degraded
Severity NOTICE
Description This message indicates that a CPU fan is degraded.
Corrective action 1. Replace the identified fan.
2. Power-cycle the system and run diagnostics on the system.
SNMP trap ID #383 A CPU fan is not operating properly

monitor.cpuFan.failed
Message monitor.cpuFan.failed
Severity NOTICE
Description This message indicates that a CPU fan is degraded.
Corrective action 1. Replace the identified fan.
2. Power-cycle the system and run diagnostics on the system.
SNMP trap ID #381: CPU fan is stopped

monitor.cpuFan.ok
Message monitor.cpuFan.ok
Severity INFO
Description This message indicates that a CPU fan is OK.
Corrective action N/A
SNMP trap ID #386 Normal operation

monitor.ioexpansionPower.degraded
Message monitor.ioexpansionPower.degraded
Severity NOTICE
Description This message indicates that power on the I/O expansion module is degraded.
Corrective action Degraded power might be caused by bad power supplies, bad wall power, or bad components on the motherboard. If spare power supplies are available, try exchanging them to see whether the problem is resolved. Otherwise, contact technical support.
SNMP trap ID #403 Power on IO expansion is degraded:

monitor.ioexpansionPower.ok
Message monitor.ioexpansionPower.ok
Severity NOTICE
Description This message indicates that power on the I/O expansion module is OK.
Corrective action None.
SNMP trap ID #406 Power on IO expansion module is OK

monitor.ioexpansionTemperature.cool
Message monitor.ioexpansionTemperature.cool
Severity ALERT
Description This warning message occurs when the I/O expansion module is too cold.
Corrective action The system cannot function in an environment that is too cold; find ways to warm the system.
SNMP trap ID #372 I/O expansion module is too cold:

monitor.ioexpansionTemperature.ok
Message monitor.ioexpansionTemperature.ok
Severity NOTICE
Description This message occurs when the temperature of the I/O expansion module is normal. It can occur in two cases: 1) as LOG_NOTICE, to show that a bad condition has reverted to normal; 2) as LOG_INFO, hourly, to indicate that the temperature is OK.
Corrective action None.
SNMP trap ID #376 Temperature of the I/O expansion module is OK.

monitor.ioexpansionTemperature.warm
Message monitor.ioexpansionTemperature.warm
Severity ALERT
Description This warning message occurs when the I/O expansion module is too warm.
Corrective action Evaluate the environment in which the system is functioning: Are air conditioning units needed or is the current air conditioning not functioning properly?
SNMP trap ID #372 I/O expansion module is too warm:

monitor.ioexpansion.unpresent
Message monitor.ioexpansion.unpresent
Severity NOTICE
Description This message occurs when the I/O expansion module is not inserted into the chassis.
Corrective action None.
SNMP trap ID #394: I/O expansion module is not present in the chassis.

monitor.nvmembattery.warninglow
Message monitor.nvmembattery.warninglow
Severity WARNING
Description This message occurs when the NVMEM (nonvolatile memory) lithium battery is low on power.
Corrective action Replace the NVMEM battery as soon as practical.
SNMP trap ID #63 NVMEM battery is low on power and should be replaced as soon as practical.

monitor.nvramLowBattery
Message monitor.nvramLowBattery
Severity NODE_ERROR
Description This message occurs when the NVRAM batteries are discovered to be at a dangerously low power level.
Corrective action Contact technical support.
SNMP trap ID N/A

monitor.power.unreadable
Message monitor.power.unreadable
Severity INFO
Description This message occurs when a power sensor in the controller module is not readable.
Corrective action Shut down the system and power-cycle the controller module. If the sensor is still not readable, replace the controller module.
SNMP trap ID N/A

monitor.shutdown.cancel
Message monitor.shutdown.cancel
Severity WARNING
Description This message is issued when an automatic shutdown sequence has been canceled.
Corrective action None.
SNMP trap ID #6 Automatic shutdown sequence canceled

monitor.shutdown.cancel.nvramLowBattery
Message monitor.shutdown.cancel.nvramLowBattery
Severity WARNING
Description This message is issued when an automatic shutdown sequence has been postponed due to RAID reconstruction.
Corrective action Unknown
SNMP trap ID #6 NVRAM battery is dangerously Low. Halt delayed until %s finishes.

monitor.shutdown.chassisOverTemp
Message monitor.shutdown.chassisOverTemp
Severity CRIT
Description This message occurs just before shutdown, indicating that the chassis temperature is too hot.
Corrective action Check to see whether air conditioning units are needed, or whether they are functioning properly.
SNMP trap ID #371 Chassis temperature is too hot

monitor.shutdown.chassisUnderTemp
Message monitor.shutdown.chassisUnderTemp
Severity CRIT
Description This message occurs just before shutdown, indicating that the chassis temperature is too cold.
Corrective action Raise the temperature around the system.
SNMP trap ID #371 Chassis temperature is too cold

monitor.shutdown.emergency
Message monitor.shutdown.emergency
Severity NODE_FAULT
Description This message is issued when an emergency shutdown is initiated.
Corrective action None.
SNMP trap ID #6 Emergency shutdown: %s

monitor.shutdown.ioexpansionOverTemp
Message monitor.shutdown.ioexpansionOverTemp
Severity CRIT
Description This message occurs when the I/O expansion module is too hot. This message is sent just before shutdown.
Corrective action The system environment is too hot; cool the environment.
SNMP trap ID #371 I/O expansion module is too hot:

monitor.shutdown.nvramLowBattery.pending
Message monitor.shutdown.nvramLowBattery.pending
Severity WARNING
Description This message is issued when an automatic shutdown sequence is pending due to a low battery.
Corrective action Replace the battery.
SNMP trap ID #62 Emergency shutdown: NVRAM battery dangerously low in degraded mode. Replace the battery immediately!

monitor.temp.unreadable
Message monitor.temp.unreadable
Severity INFO
Description This message occurs when the controller module temperature is not readable. The system does not automatically shut down if it becomes too hot for reliable operation.
Corrective action Shut down the system and power-cycle the controller module. If the temperature is still not readable, replace the controller module.
SNMP trap ID N/A

Multiple chassis fans have failed


Message Multiple chassis fans have failed; system will shut down in 2 minutes.
LCD display Fans stopped; replace them.
Description This message occurs during a multiple chassis fan failure. The system shuts down in two minutes if this condition is uncorrected.
Corrective action 1. Replace both fans.
2. Power-cycle and run diagnostics on the system.
SNMP trap ID #511: Chassis fan is degraded

Multiple fan failure on XXXX


Message Multiple fan failure on XXXX at [time stamp].
LCD display Fans stopped; replace them.
LED behavior FRU LED: Amber
Description This message occurs when both system fans fail. The system shuts down immediately.
Corrective action 1. Replace both fans.
2. Power-cycle and run diagnostics on the system.
SNMP trap ID #6 Emergency shutdown

Multiple power supply fans failed


Message Multiple power supply fans failed; system will shut down in 2 minutes.
LCD display Power supply degraded
Description This message occurs when multiple power supplies and fans have failed. The system shuts down in two minutes if this condition is uncorrected.
Corrective action Your action depends on whether the power supply is present. If the power supply is not inserted, insert it. If the power supply is inserted, power-cycle your system and run diagnostics on the identified power supply. If the problem persists, replace the identified power supply.
SNMP trap ID #521: Chassis power is degraded

nvmem.battery.capacity.low
Message nvmem.battery.capacity.low
Severity NODE_ERROR
Description This message occurs when the NVMEM battery lacks the capacity to preserve the NVMEM contents for the required minimum of 72 hours. The system is at risk of data loss if the power fails. This message repeats every hour while the problem continues, and the system shuts down in 24 hours if automatic recharging of the battery does not restore its charge.
Corrective action Correct any environmental problems, such as chassis over-temperature. The battery charges automatically. If the capacity is not restored in several hours, replace the battery pack. If the problem persists, replace the controller module.
SNMP trap ID N/A
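
The hourly repeat and the 24-hour shutdown in this description can be laid out as a timeline. The sketch below is illustrative only: it assumes the 24 hours are counted from the first occurrence of the message, which this entry does not state explicitly, and the function name `capacity_low_timeline` is hypothetical.

```python
from datetime import datetime, timedelta

REPEAT_INTERVAL = timedelta(hours=1)   # the message repeats every hour
SHUTDOWN_DELAY = timedelta(hours=24)   # shutdown if the charge is not restored


def capacity_low_timeline(first_seen: datetime):
    """Return (repeat_times, shutdown_time) for a low-capacity condition.

    Assumption: the shutdown delay is measured from the first message.
    """
    shutdown_at = first_seen + SHUTDOWN_DELAY
    # Repeats fall on each whole hour after the first message, before shutdown.
    repeats = [first_seen + i * REPEAT_INTERVAL for i in range(1, 24)]
    return repeats, shutdown_at


if __name__ == "__main__":
    repeats, shutdown_at = capacity_low_timeline(datetime(2011, 11, 1, 8, 0))
    print("first repeat:", repeats[0])
    print("shutdown at:", shutdown_at)
```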

nvmem.battery.capacity.low.warn
Message nvmem.battery.capacity.low.warn
Severity INFO
Description This message occurs when the NVMEM battery capacity is below normal.
Corrective action None.
SNMP trap ID N/A

nvmem.battery.capacity.normal
Message nvmem.battery.capacity.normal
Severity INFO
Description This message occurs when the NVMEM battery capacity is normal.
Corrective action None.
SNMP trap ID N/A

nvmem.battery.current.high
Message nvmem.battery.current.high
Severity NODE_ERROR
Description This message occurs when the NVMEM battery current is excessively high and the system will shut down.
Corrective action First, correct any environmental problems, such as chassis overtemperature. If the NVMEM battery current is still too high, replace the battery pack. If the problem persists, replace the controller module.
SNMP trap ID N/A

nvmem.battery.current.high.warn
Message nvmem.battery.current.high.warn
Severity INFO
Description This message occurs when the NVMEM battery current is above normal.
Corrective action None.
SNMP trap ID N/A

nvmem.battery.sensor.unreadable
Message nvmem.battery.sensor.unreadable
Severity INFO
Description This message occurs when the battery state of the battery-backed memory (NVMEM) is unknown. One of the battery sensors is not readable.
Corrective action Shut down the system and power-cycle the controller module. If the problem is not corrected, replace the battery. If the sensor is still not readable, replace the controller module.
SNMP trap ID N/A

nvmem.battery.temp.high
Message nvmem.battery.temp.high
Severity NODE_ERROR
Description This message occurs when the NVMEM battery is too hot and the system is at a high risk of data loss if power fails.
Corrective action If the system is excessively warm, allow it to cool gradually. If the NVMEM battery temperature reading is still too high, replace the battery pack. If the problem persists, replace the controller module.
SNMP trap ID N/A

nvmem.battery.temp.low
Message nvmem.battery.temp.low
Severity NODE_ERROR
Description This message occurs when the NVMEM battery is too cold and the system is at a high risk of data loss if power fails.
Corrective action If the system is excessively cold, allow it to warm gradually. If the NVMEM battery temperature reading is still too low, replace the battery pack. If the problem persists, replace the controller module.
SNMP trap ID N/A

nvmem.battery.temp.normal
Message nvmem.battery.temp.normal
Severity INFO
Description This message occurs when the NVMEM battery temperature is normal.
Corrective action None.
SNMP trap ID N/A

nvmem.battery.voltage.high
Message nvmem.battery.voltage.high
Severity NODE_ERROR
Description This message occurs when the NVMEM battery voltage is excessively high and the system will shut down.
Corrective action First, correct any environmental problems, such as chassis overtemperature. If the NVMEM battery voltage is still too high, replace the battery pack. If the problem persists, replace the controller module.
SNMP trap ID N/A

nvmem.battery.voltage.high.warn
Message nvmem.battery.voltage.high.warn
Severity INFO
Description This message occurs when the NVMEM battery voltage is above normal.
Corrective action None.
SNMP trap ID N/A

nvmem.battery.voltage.normal
Message nvmem.battery.voltage.normal
Severity INFO
Description This message occurs when the NVMEM battery voltage is normal.
Corrective action None.
SNMP trap ID N/A

nvmem.voltage.high
Message: nvmem.voltage.high
Severity: NODE_ERROR
Description: This message occurs when the NVMEM supply voltage is high and the system is at a high risk of data loss if power fails.
Corrective action: First, correct any environmental or battery problems. If the problem continues, replace the controller module.
SNMP trap ID: N/A

nvmem.voltage.high.warn
Message: nvmem.voltage.high.warn
Severity: INFO
Description: This message occurs when the NVMEM supply voltage is above normal.
Corrective action: None.
SNMP trap ID: N/A

nvmem.voltage.normal
Message: nvmem.voltage.normal
Severity: INFO
Description: This message occurs when the NVMEM supply voltage is normal.
Corrective action: None.
SNMP trap ID: N/A

nvram.bat.missing.error
Message: nvram.bat.missing.error
Severity: NODE_ERROR
Description: This message occurs when the battery in the chassis is degrading.
Corrective action: Contact technical support.
SNMP trap ID: N/A

nvram.battery.capacity.low
Message: nvram.battery.capacity.low
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery lacks the capacity to preserve the NVRAM contents for the required minimum of 72 hours. The system is at risk of data loss if the power fails. This message repeats every hour while the problem continues, and the system shuts down in 24 hours if automatic recharging of the battery does not restore its charge.
Corrective action: Correct any environmental problems, such as chassis over-temperature. The battery charges automatically. If the capacity is not restored in several hours, replace the battery pack. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.capacity.low.critical
Message: nvram.battery.capacity.low.critical
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery capacity is dangerously low. To prevent data loss, the system will shut down in 20 minutes.
Corrective action: Correct any environmental problems, such as chassis over-temperature. The battery charges automatically. If the capacity is not restored automatically, replace the battery pack. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.capacity.low.warn
Message: nvram.battery.capacity.low.warn
Severity: INFO
Description: This message occurs when the NVRAM battery capacity is below normal.
Corrective action: None.
SNMP trap ID: N/A

nvram.battery.capacity.normal
Message: nvram.battery.capacity.normal
Severity: INFO
Description: This message occurs when the NVRAM battery capacity is normal.
Corrective action: None.
SNMP trap ID: N/A

nvram.battery.charging.nocharge
Message: nvram.battery.charging.nocharge
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery is requesting to be charged but the charger is not charging the battery. To prevent data loss, the system will shut down in 20 minutes.
Corrective action: Replace the NVRAM battery/card. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.charging.normal
Message: nvram.battery.charging.normal
Severity: INFO
Description: This message occurs when the NVRAM battery charging status is normal.
Corrective action: None.
SNMP trap ID: N/A

nvram.battery.charging.wrongcharge
Message: nvram.battery.charging.wrongcharge
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery charger is charging the battery even though the battery is not requesting to be charged. To prevent data loss, the system will be shut down in 20 minutes.
Corrective action: Replace the NVRAM battery. If the problem persists, replace the NVRAM card.
SNMP trap ID: N/A

nvram.battery.current.high
Message: nvram.battery.current.high
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery current is excessively high and the system will shut down.
Corrective action: First, correct any environmental problems, such as chassis over-temperature. If the NVRAM battery current is still too high, replace the battery pack. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.current.high.warn
Message: nvram.battery.current.high.warn
Severity: INFO
Description: This message occurs when the NVRAM battery current is above normal.
Corrective action: None.
SNMP trap ID: N/A

nvram.battery.current.low
Message: nvram.battery.current.low
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery has a short circuit.
Corrective action: Replace the NVRAM battery/card. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.current.low.warn
Message: nvram.battery.current.low.warn
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery current is below normal.
Corrective action: First, correct any environmental problems. If the NVRAM battery current is still below normal, replace the NVRAM battery/card. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.current.normal
Message: nvram.battery.current.normal
Severity: INFO
Description: This message occurs when the NVRAM battery current is normal.
Corrective action: None.
SNMP trap ID: N/A

nvram.battery.end_of_life.high
Message: nvram.battery.end_of_life.high
Severity: INFO
Description: This message occurs when the NVRAM battery-cycle count indicates that the battery has reached its anticipated life expectancy.
Corrective action: None.
SNMP trap ID: N/A

nvram.battery.end_of_life.normal
Message: nvram.battery.end_of_life.normal
Severity: INFO
Description: This message occurs when the NVRAM battery-cycle count indicates that the battery is well below its anticipated life expectancy.
Corrective action: None.
SNMP trap ID: N/A

nvram.battery.fault
Message: nvram.battery.fault
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery is reporting a fatal fault condition. To prevent data loss, the system will shut down in 2 minutes.
Corrective action: Correct any environmental problems, such as chassis over-temperature. If the battery still reports a fatal fault condition, replace the NVRAM battery/card. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.fault.warn
Message: nvram.battery.fault.warn
Severity: INFO
Description: This message occurs when the NVRAM battery is reporting a non-fatal fault condition.
Corrective action: Correct any environmental problems, such as chassis over-temperature.
SNMP trap ID: N/A

nvram.battery.fcc.low
Message: nvram.battery.fcc.low
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery full-charge capacity is low. To prevent data loss, the system will shut down in 24 hours.
Corrective action: First, correct any environmental problems, such as chassis over-temperature. If the NVRAM full-charge capacity is still dangerously low, replace the NVRAM battery/card. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.fcc.low.critical
Message: nvram.battery.fcc.low.critical
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery full-charge capacity is dangerously low. To prevent data loss, the system will shut down in 20 minutes.
Corrective action: First, correct any environmental problems, such as chassis over-temperature. If the NVRAM full-charge capacity is still dangerously low, replace the NVRAM battery/card. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.fcc.low.warn
Message: nvram.battery.fcc.low.warn
Severity: INFO
Description: This message occurs when the NVRAM battery full-charge capacity is below normal.
Corrective action: Replace the NVRAM battery/card during your next scheduled down-time (within 3 months).
SNMP trap ID: N/A

nvram.battery.fcc.normal
Message: nvram.battery.fcc.normal
Severity: INFO
Description: This message occurs when the NVRAM battery full-charge capacity is normal.
Corrective action: None.
SNMP trap ID: N/A

nvram.battery.power.fault
Message: nvram.battery.power.fault
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery is not receiving power.
Corrective action: Correct any environmental problems, such as chassis over-temperature. If the NVRAM battery is still not receiving power, replace the NVRAM battery/card. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.power.normal
Message: nvram.battery.power.normal
Severity: INFO
Description: This message occurs when the NVRAM battery power is normal.
Corrective action: None.
SNMP trap ID: N/A

nvram.battery.sensor.unreadable
Message: nvram.battery.sensor.unreadable
Severity: INFO
Description: This message occurs when the battery state of the battery-backed memory (NVRAM) is unknown. One of the battery sensors is not readable.
Corrective action: Shut down the system and power-cycle the controller module. If the problem is not corrected, replace the NVRAM battery/card. If the sensor is still not readable, replace the controller module.
SNMP trap ID: N/A

nvram.battery.temp.high
Message: nvram.battery.temp.high
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery is too hot and the system is at a high risk of data loss if power fails.
Corrective action: If the system is excessively warm, allow it to cool gradually. If the NVRAM battery temperature reading is still too high, replace the battery pack. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.temp.high.warn
Message: nvram.battery.temp.high.warn
Severity: INFO
Description: This message occurs when the NVRAM battery temperature is high.
Corrective action: None.
SNMP trap ID: N/A

nvram.battery.temp.low
Message: nvram.battery.temp.low
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery is too cold and the system is at a high risk of data loss if power fails.
Corrective action: If the system is excessively cold, allow it to warm gradually. If the NVRAM battery temperature reading is still too low, replace the battery pack. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.temp.low.warn
Message: nvram.battery.temp.low.warn
Severity: INFO
Description: This message occurs when the NVRAM battery temperature is low.
Corrective action: None.
SNMP trap ID: N/A

nvram.battery.temp.normal
Message: nvram.battery.temp.normal
Severity: INFO
Description: This message occurs when the NVRAM battery temperature is normal.
Corrective action: None.
SNMP trap ID: N/A

nvram.battery.voltage.high
Message: nvram.battery.voltage.high
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery voltage is excessively high and the system will shut down.
Corrective action: First, correct any environmental problems, such as chassis over-temperature. If the NVRAM battery voltage is still too high, replace the battery pack. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.voltage.high.warn
Message: nvram.battery.voltage.high.warn
Severity: INFO
Description: This message occurs when the NVRAM battery voltage is above normal.
Corrective action: None.
SNMP trap ID: N/A

nvram.battery.voltage.low
Message: nvram.battery.voltage.low
Severity: NODE_ERROR
Description: This message occurs when the NVRAM battery voltage is critically low. To prevent data loss, the system will shut down in 2 minutes.
Corrective action: First, correct any environmental problems, such as chassis over-temperature. If the NVRAM battery voltage is still critically low, replace the NVRAM battery/card. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.voltage.low.warn
Message: nvram.battery.voltage.low.warn
Severity: INFO
Description: This message occurs when the NVRAM battery voltage is below normal. To prevent data loss, the system will shut down in 24 hours.
Corrective action: First, correct any environmental problems, such as chassis over-temperature. If the NVRAM battery voltage is still below normal, replace the NVRAM battery/card. If the problem persists, replace the controller module.
SNMP trap ID: N/A

nvram.battery.voltage.normal
Message: nvram.battery.voltage.normal
Severity: INFO
Description: This message occurs when the NVRAM battery voltage is normal.
Corrective action: None.
SNMP trap ID: N/A

nvram.hw.initFail
Message: nvram.hw.initFail
Severity: ERR
Description: This message occurs when the Data ONTAP NVRAM hardware fails to initialize.
Corrective action: Typically, this type of error is unexpected and indicates that the NVRAM hardware is failing and should be replaced. Contact technical support for assistance with the replacement.
SNMP trap ID: N/A
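
Several of the NVRAM battery messages above state a time until the system shuts down. As a minimal sketch, the following Python lookup collects those timers so that a monitoring script could sort incoming events by urgency. The event names and delays are taken from the message descriptions in this section; the names SHUTDOWN_DELAYS and most_urgent are hypothetical and are not part of Data ONTAP.

```python
from datetime import timedelta

# Shutdown timers as stated in the message descriptions in this section.
# SHUTDOWN_DELAYS and most_urgent are hypothetical names, not Data ONTAP APIs.
SHUTDOWN_DELAYS = {
    "nvram.battery.fault": timedelta(minutes=2),
    "nvram.battery.voltage.low": timedelta(minutes=2),
    "nvram.battery.capacity.low.critical": timedelta(minutes=20),
    "nvram.battery.charging.nocharge": timedelta(minutes=20),
    "nvram.battery.charging.wrongcharge": timedelta(minutes=20),
    "nvram.battery.fcc.low.critical": timedelta(minutes=20),
    "nvram.battery.capacity.low": timedelta(hours=24),
    "nvram.battery.fcc.low": timedelta(hours=24),
    "nvram.battery.voltage.low.warn": timedelta(hours=24),
}


def most_urgent(event_names):
    """Return the events that carry a shutdown timer, shortest timer first."""
    timed = [name for name in event_names if name in SHUTDOWN_DELAYS]
    return sorted(timed, key=lambda name: SHUTDOWN_DELAYS[name])
```

Events without a stated timer, such as the .normal messages, are dropped from the result rather than being assigned a guessed delay.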

SAS EMS messages


SAS EMS messages inform you of events and problems involving your system's SAS disk drives.

ds.sas.config.warning
Message: ds.sas.config.warning
Severity: WARNING
Description: This message occurs when the system detects a configuration problem on the shelf I/O module.
Corrective action:
1. Reseat the disk shelf I/O module.
2. If that does not fix the problem, replace the disk shelf I/O module.
SNMP trap ID: N/A

ds.sas.crc.err
Message: ds.sas.crc.err
Severity: DEBUG
Description: This message occurs when a serial-attached SCSI (SAS) cyclic redundancy check (CRC) error is detected.
Corrective action: N/A
SNMP trap ID: N/A

ds.sas.drivephy.disableErr
Message: ds.sas.drivephy.disableErr
Severity: ERR
Description: This message occurs when a physical layer device (PHY) on a serial-attached SCSI (SAS) I/O module is disabled because of one of the following reasons:
- Manually bypassed
- Exceeded loss of double word synchronization threshold
- Exceeded running disparity threshold
- Transmitter fault
- Exceeded cyclic redundancy check (CRC) error threshold
- Exceeded invalid double word threshold
- Exceeded PHY reset problem threshold
- Exceeded broadcast change threshold
- Mirroring disabled on the other I/O module
Corrective action: Replace the disabled disk drive.
SNMP trap ID: #574

ds.sas.element.fault
Message: ds.sas.element.fault
Severity: ERR
Description: This message indicates a transport error.
Corrective action:
1. Check cabling to the disk shelf.
2. Check the status LED on the disk shelf and make sure that fault LEDs are not on.
3. Clear any fault condition, if possible.
4. See the quick reference card beneath the disk shelf for information about the meanings of the LEDs.
SNMP trap ID: N/A

ds.sas.element.xport.error
Message: ds.sas.element.xport.error
Severity: ERR
Description: This message indicates a transport error.
Corrective action:
1. Check cabling to the disk shelf.
2. Check the status LED on the disk shelf and make sure that fault LEDs are not on.
3. Clear any fault condition, if possible.
4. See the quick reference card beneath the disk shelf for information about the meanings of the LEDs.
SNMP trap ID: N/A

ds.sas.hostphy.disableErr
Message: ds.sas.hostphy.disableErr
Severity: ERR
Description: This message occurs when a host physical layer device (PHY) on a serial-attached SCSI (SAS) I/O module is disabled because of one of the following reasons:
- Manually bypassed
- Exceeded loss of double word synchronization threshold
- Exceeded running disparity threshold
- Transmitter fault
- Exceeded cyclic redundancy check (CRC) error threshold
- Exceeded invalid double word threshold
- Exceeded PHY reset problem threshold
- Exceeded broadcast change threshold
- Mirroring disabled on the other I/O module
Corrective action: Replace the disk shelf module to which the host physical layer device belongs.
SNMP trap ID: N/A

ds.sas.invalid.word
Message: ds.sas.invalid.word
Severity: DEBUG
Description: This message occurs when a serial-attached SCSI (SAS) word error is detected in a SAS primitive. These errors can be caused by the disk drive, the cable, the host bus adapter (HBA), or the shelf I/O module.
Corrective action: The SAS specification allows for a certain bit error rate, so these errors can occur. There is no cause for alarm if these individual errors show up occasionally.
SNMP trap ID: N/A
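
To illustrate the difference between occasional errors and a sustained burst, the following Python sketch counts error events inside a sliding time window. The class name ErrorRateWatch, the one-hour window, and the limit of 10 events are assumptions chosen for illustration only; this guide does not state any thresholds, and the sketch does not describe how Data ONTAP itself evaluates these errors.

```python
from collections import deque


class ErrorRateWatch:
    """Flag error bursts with a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds=3600.0, max_events=10):
        # Both defaults are illustrative assumptions, not documented limits.
        self.window_seconds = window_seconds
        self.max_events = max_events
        self._events = deque()

    def record(self, timestamp):
        """Record one error and return True if the window now holds
        more than max_events errors."""
        self._events.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop errors that fell out of the window.
        while self._events and self._events[0] < cutoff:
            self._events.popleft()
        return len(self._events) > self.max_events
```

A caller would feed the timestamp of each ds.sas.invalid.word event to record() and investigate cabling or hardware only when it returns True.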

ds.sas.loss.dword
Message: ds.sas.loss.dword
Severity: DEBUG
Description: This message occurs when a serial-attached SCSI (SAS) loss of double word synchronization error is detected in a SAS primitive.
Corrective action: N/A
SNMP trap ID: N/A

ds.sas.multPhys.disableErr
Message: ds.sas.multPhys.disableErr
Severity: ERR
Description: This message occurs when physical layer devices (PHYs) are disabled on multiple disk drives in a serial-attached SCSI (SAS) disk shelf.
Corrective action:
1. Check whether the problems on the physical layer devices are valid.
2. If multiple physical layer devices are disabled at the same time, replace the disk shelf module.
SNMP trap ID: N/A

ds.sas.phyRstProb
Message: ds.sas.phyRstProb
Severity: DEBUG
Description: This message occurs when a serial-attached SCSI (SAS) physical layer device (PHY) reset error is detected in a SAS primitive.
Corrective action: N/A
SNMP trap ID: N/A

ds.sas.running.disparity
Message: ds.sas.running.disparity
Severity: DEBUG
Description: This message occurs when a serial-attached SCSI (SAS) running disparity error is detected in a SAS primitive. These errors occur when the numbers of logical 1s and 0s are too far out of balance.
Corrective action: N/A
SNMP trap ID: N/A

ds.sas.ses.disableErr
Message: ds.sas.ses.disableErr
Severity: NODE_ERROR
Description: This message occurs when a virtual SCSI Enclosure Services (SES) physical layer device (PHY) on a serial-attached SCSI (SAS) I/O module is disabled due to one of the following reasons:
- Manually bypassed
- Exceeded loss of double word synchronization threshold
- Exceeded running disparity threshold
- Transmitter fault
- Exceeded cyclic redundancy check (CRC) error threshold
- Exceeded invalid double word threshold
- Exceeded PHY reset problem threshold
- Exceeded broadcast change threshold
Corrective action: Replace the shelf module to which the concerned SES physical layer device belongs.
SNMP trap ID: N/A
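
The three PHY-disable messages in this section share the same list of reasons but call for different replacements. As a small sketch, the following Python lookup records which part each message points to, based on the corrective actions stated above. The names REPLACEMENT_PART and part_to_replace are hypothetical.

```python
# Part to replace for each PHY-disable message, taken from the corrective
# actions in this section. The names below are hypothetical helpers.
REPLACEMENT_PART = {
    "ds.sas.drivephy.disableErr": "disk drive",
    "ds.sas.hostphy.disableErr": "disk shelf module",
    "ds.sas.ses.disableErr": "disk shelf module",
}


def part_to_replace(event_name):
    """Return the part named in the corrective action, or None when the
    message is not one of the PHY-disable messages listed above."""
    return REPLACEMENT_PART.get(event_name)
```

ds.sas.multPhys.disableErr is left out on purpose, because its corrective action depends on first checking whether the reported PHY problems are valid.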

ds.sas.xfer.element.fault
Message: ds.sas.xfer.element.fault
Severity: ERR
Description: This message indicates that an element had a fault during an I/O request. It might be because of a transient condition in link connectivity.
Corrective action:
1. Check cabling to the shelf.
2. Check the status LED on the shelf, and make sure that fault LEDs are not on.
3. Clear any fault condition, if possible.
4. See the quick reference card beneath the shelf for information about the meanings of the LEDs.
SNMP trap ID: N/A

ds.sas.xfer.export.error
Message: ds.sas.xfer.export.error
Severity: ERR
Description: This message indicates a transport error during an I/O request. It might be due to a transient condition in link activity.
Corrective action:
1. Check cabling to the shelf.
2. Check the status LED on the shelf, and make sure that fault LEDs are not on.
3. Clear any fault condition, if possible.
4. See the quick reference card beneath the shelf for information about the meanings of the LEDs.
SNMP trap ID: N/A

ds.sas.xfer.not.sent
Message: ds.sas.xfer.not.sent
Severity: ERR
Description: This message indicates that an I/O transfer could not be sent. It might be because of a transient condition in link connectivity.
Corrective action:
1. Check cabling to the shelf.
2. Check the status LED on the shelf, and make sure that fault LEDs are not on.
3. Clear any fault condition, if possible.
4. See the quick reference card beneath the shelf for information about the meanings of the LEDs.
SNMP trap ID: N/A

ds.sas.xfer.unknown.error
Message: ds.sas.xfer.unknown.error
Severity: ERR
Description: This message indicates that an unknown error occurred during an I/O request.
Corrective action: N/A
SNMP trap ID: N/A

sas.adapter.bad
Message: sas.adapter.bad
Severity: ALERT
Description: This message occurs when the serial-attached SCSI (SAS) adapter fails to initialize.
Corrective action:
1. Reseat the adapter.
2. If reseating the adapter does not help, replace the adapter.
SNMP trap ID: N/A

sas.adapter.bootarg.option
Message: sas.adapter.bootarg.option
Severity: INFO
Description: The serial-attached SCSI (SAS) adapter driver is setting an option based on the setting of a bootarg/environment variable.
Corrective action: None.
SNMP trap ID: N/A

sas.adapter.debug
Message: sas.adapter.debug
Severity: INFO
Description: This message occurs during a serial-attached SCSI (SAS) adapter driver debug event.
Corrective action: None.
SNMP trap ID: N/A

sas.adapter.exception
Message: sas.adapter.exception
Severity: WARNING
Description: This message occurs when the serial-attached SCSI (SAS) adapter driver encounters an error with the adapter. The adapter is reset to recover.
Corrective action: None.
SNMP trap ID: N/A

sas.adapter.failed
Message: sas.adapter.failed
Severity: ERR
Description: This message occurs when the serial-attached SCSI (SAS) adapter driver cannot recover the adapter after resetting it multiple times. The adapter is taken offline.
Corrective action:
1. If the adapter is in use, check the cabling.
2. If the adapter is connected to disk shelves, check the seating of IOM cards and disks.
3. If the problem persists, try replacing the adapter.
4. If the issue is still not resolved, contact technical support.
SNMP trap ID: N/A
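
The behavior described for sas.adapter.failed, where the driver resets the adapter several times before taking it offline, follows a bounded-retry pattern. The Python sketch below shows that pattern in general terms. The function name recover_adapter, the reset_adapter callback, and the default of three resets are assumptions for illustration; the guide says only that the adapter is reset "multiple times".

```python
def recover_adapter(reset_adapter, max_resets=3):
    """Try to recover an adapter with a bounded number of resets.

    reset_adapter is a hypothetical callback that performs one reset and
    returns True when the adapter comes back. max_resets is an assumed
    limit, not a documented Data ONTAP value.
    """
    for attempt in range(1, max_resets + 1):
        if reset_adapter():
            return ("online", attempt)
    # Every reset failed: give up and report the adapter as offline.
    return ("offline", max_resets)
```

Bounding the number of resets keeps a failing adapter from being reset indefinitely and gives a clear point at which the corrective actions above apply.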

sas.adapter.firmware.download
Message: sas.adapter.firmware.download
Severity: INFO
Description: This message occurs when firmware is being updated on the serial-attached SCSI (SAS) adapter.
Corrective action: None.
SNMP trap ID: N/A

sas.adapter.firmware.fault
Message: sas.adapter.firmware.fault
Severity: WARNING
Description: This message occurs when a firmware fault is detected on the serial-attached SCSI (SAS) adapter and it is being reset to recover.
Corrective action: None.
SNMP trap ID: N/A

sas.adapter.firmware.update.failed
Message: sas.adapter.firmware.update.failed
Severity: CRIT
Description: This message occurs when firmware on the serial-attached SCSI (SAS) adapter cannot be updated.
Corrective action: Replace the adapter as soon as possible. The SAS adapter driver attempts to continue using the adapter without updating the firmware image.
SNMP trap ID: N/A

sas.adapter.not.ready
Message: sas.adapter.not.ready
Severity: ERR
Description: This message occurs when the serial-attached SCSI (SAS) adapter does not become ready after being reset.
Corrective action: The SAS adapter driver automatically attempts to recover from this error. If the error keeps occurring, the adapter might need to be replaced.
SNMP trap ID: N/A

sas.adapter.offline
Message: sas.adapter.offline
Severity: INFO
Description: This message indicates the name of the associated serial-attached SCSI (SAS) host bus adapter (HBA).
Corrective action: None.
SNMP trap ID: N/A

sas.adapter.offlining
Message: sas.adapter.offlining
Severity: INFO
Description: This message occurs when the serial-attached SCSI (SAS) adapter is going offline after all outstanding I/O requests have finished.
Corrective action: None.
SNMP trap ID: N/A

sas.adapter.online
Message: sas.adapter.online
Severity: INFO
Description: This message indicates that the serial-attached SCSI (SAS) adapter is now online.
Corrective action: None.
SNMP trap ID: N/A

sas.adapter.online.failed
Message: sas.adapter.online.failed
Severity: LOG_ERR
Description: This message indicates the name of the associated serial-attached SCSI (SAS) host bus adapter (HBA).
Corrective action:
1. If the HBA is in use, check the cabling.
2. If the HBA is connected to disk shelves, check the seating of IOM cards.
SNMP trap ID: N/A

sas.adapter.onlining
Message: sas.adapter.onlining
Severity: INFO
Description: This message indicates that the serial-attached SCSI (SAS) adapter is in the process of going online.
Corrective action: None.
SNMP trap ID: N/A

sas.adapter.reset
Message: sas.adapter.reset
Severity: INFO
Description: This message occurs when the Data ONTAP serial-attached SCSI (SAS) driver is resetting the specified HBA. This can occur during normal error handling or by user request.
Corrective action: None.
SNMP trap ID: N/A

sas.adapter.unexpected.status
Message: sas.adapter.unexpected.status
Severity: WARNING
Description: This message occurs when the serial-attached SCSI (SAS) adapter returns an unexpected status and is reset to recover.
Corrective action: None.
SNMP trap ID: N/A

sas.cable.error
Message: sas.cable.error
Severity: WARNING
Description: The system failed to retrieve information about the cable attached to the serial-attached SCSI (SAS) adapter port.
Corrective action: None.
SNMP trap ID: N/A

sas.cable.pulled
Message: sas.cable.pulled
Severity: INFO
Description: The cable attached to the serial-attached SCSI (SAS) adapter port was pulled out.
Corrective action: None.
SNMP trap ID: N/A

sas.cable.pushed
Message: sas.cable.pushed
Severity: INFO
Description: The cable attached to the serial-attached SCSI (SAS) adapter port was pushed in.
Corrective action: None.
SNMP trap ID: N/A

sas.config.mixed.detected
Message: sas.config.mixed.detected
Severity: WARNING
Description: This message occurs when a serial-attached SCSI (SAS) disk shelf contains a mixture of SAS drives, serial advanced technology attachment (SATA) drives, or bridged SAS drives. Mixing drive types within a disk shelf is not supported.
Corrective action: Ensure that each SAS disk shelf is populated with drives of only one type.
SNMP trap ID: N/A
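
The rule behind sas.config.mixed.detected can be checked against an inventory before drives are installed. The following Python sketch assumes a hypothetical inventory format, a dictionary that maps a shelf name to the list of drive types in that shelf; neither the format nor the function name mixed_shelves comes from Data ONTAP.

```python
def mixed_shelves(shelves):
    """Return the names of shelves that hold more than one drive type.

    shelves is a hypothetical inventory: a dict that maps a shelf name to
    a list of drive-type strings such as "SAS", "SATA", or "bridged SAS".
    """
    return sorted(
        name for name, drive_types in shelves.items()
        if len(set(drive_types)) > 1
    )
```

For example, an inventory in which shelf "2" holds both "SAS" and "SATA" drives would return ["2"], which identifies the shelf to repopulate with a single drive type.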

sas.device.invalid.wwn
Message: sas.device.invalid.wwn
Severity: ERR
Description: This message occurs when the serial-attached SCSI (SAS) device responds with an invalid worldwide name.
Corrective action: Power-cycling the device might allow it to recover from this problem.
SNMP trap ID: N/A

sas.device.quiesce
Message: sas.device.quiesce
Severity: INFO
Description: This message indicates that at least one command to the specified device has not completed in the normally expected time. In this case, the driver stops sending additional commands to the device until all outstanding commands have had an opportunity to be completed. This condition is automatically handled by the Data ONTAP serial-attached SCSI (SAS) driver.
Corrective action: This condition by itself does not mean that the target device is problematic. High workloads might cause link saturation leading to device contention for the bus. Transport issues might also cause link throughput to decrease, thereby causing I/Os to take longer than normal. If you see this message only on occasion, no action is required. The system handles the condition automatically.
SNMP trap ID: N/A
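
The quiesce behavior described for sas.device.quiesce, in which the driver holds back new commands until outstanding ones have completed, can be pictured with a small queue model. The Python sketch below is an illustration of that idea only; the class DeviceQueue and its methods are hypothetical and do not reflect the actual Data ONTAP SAS driver.

```python
from collections import deque


class DeviceQueue:
    """Toy model of a per-device command queue that can quiesce."""

    def __init__(self):
        self.pending = deque()    # commands waiting to be sent
        self.outstanding = set()  # commands sent but not yet completed
        self.quiesced = False

    def submit(self, command):
        self.pending.append(command)
        self._dispatch()

    def _dispatch(self):
        # Send queued commands unless the device is quiesced.
        while self.pending and not self.quiesced:
            self.outstanding.add(self.pending.popleft())

    def on_slow_command(self):
        # A command exceeded its expected time: stop sending new commands.
        self.quiesced = True

    def complete(self, command):
        self.outstanding.discard(command)
        # Resume only after every outstanding command has finished.
        if self.quiesced and not self.outstanding:
            self.quiesced = False
            self._dispatch()
```

In this model, a command submitted while the device is quiesced waits in pending and is sent as soon as the last outstanding command completes.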

sas.device.resetting
Message: sas.device.resetting
Severity: WARNING
Description: This message indicates device level error recovery has escalated to resetting the device. It is usually seen in association with error conditions such as device level timeouts or transmission errors. This message reports the recovery action taken by the Data ONTAP serial-attached SCSI (SAS) driver when evaluating associated device-related or link-related error conditions.
Corrective action: None.
SNMP trap ID: N/A

sas.device.timeout
Message: sas.device.timeout
Severity: ERR
Description: This message occurs when not all outstanding commands to the specified device were completed within the allotted time. As part of the standard error handling sequence managed by the Data ONTAP serial-attached SCSI (SAS) driver, all commands to the device are aborted and reissued.
Corrective action: Device level timeouts are a common indication of a SAS link stability problem. In some cases, the link is operating normally and the specified device is having trouble processing I/O requests in a timely manner. In such cases, the specified device should be evaluated for possible replacement. Quite often the problem results from the partial failure of a component involved in the SAS transport. Common things to check include the following:
- Complete seating of drive carriers in enclosure bays
- Properly secured cable connections
- IOM seating
- Crimped or otherwise damaged cables
SNMP trap ID: N/A

sas.initialization.failed
Message: sas.initialization.failed
Severity: ERR
Description: This message occurs when the serial-attached SCSI (SAS) adapter fails to initialize the link and appears to be unattached or disconnected.
Corrective action:
1. If the adapter is in use, check the cabling.
2. If the adapter is connected to disk shelves, check the seating of IOM cards.
SNMP trap ID: N/A

sas.link.error
Message: sas.link.error
Severity: ERR
Description: This message occurs when the serial-attached SCSI (SAS) adapter cannot recover the link and is going offline.
Corrective action:
1. If the adapter is in use, check the cabling.
2. If the adapter is connected to disk shelves, check the seating of IOM cards and disks.
3. If this does not resolve the issue, contact technical support.
SNMP trap ID: N/A

sas.port.disabled
Message: sas.port.disabled
Severity: WARNING
Description: The serial-attached SCSI (SAS) adapter port went down because it was disabled by the operator.
Corrective action: None.
SNMP trap ID: N/A

sas.port.down
Message: sas.port.down
Severity: WARNING
Description: The serial-attached SCSI (SAS) adapter port went down through no action by the operator.
Corrective action: None.
SNMP trap ID: N/A

sas.shelf.conflict
Message: sas.shelf.conflict
Severity: ERR
Description: This message occurs when the system detects that two or more serial-attached SCSI (SAS) disk shelves have the same shelf ID. The SAS domain is functional, but references to disk shelves will be based on disk shelf serial numbers, not disk shelf IDs.
Corrective action: Reassign disk shelf IDs so that no conflict exists.
SNMP trap ID: N/A
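
Finding which shelves share an ID is the first step in reassigning IDs. As a minimal sketch, the following Python function groups shelves by shelf ID and reports any ID used more than once. The input format, a list of (serial number, shelf ID) pairs, and the function name find_shelf_id_conflicts are assumptions for illustration.

```python
from collections import defaultdict


def find_shelf_id_conflicts(shelves):
    """Return {shelf_id: [serial numbers]} for every shelf ID used by two
    or more shelves.

    shelves is a hypothetical list of (serial_number, shelf_id) pairs.
    """
    serials_by_id = defaultdict(list)
    for serial_number, shelf_id in shelves:
        serials_by_id[shelf_id].append(serial_number)
    return {
        shelf_id: sorted(serials)
        for shelf_id, serials in serials_by_id.items()
        if len(serials) > 1
    }
```

The result is keyed by the conflicting ID and lists serial numbers, which matches how the message says shelves are referenced while the conflict exists.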

sasmon.adapter.phy.disable
Message: sasmon.adapter.phy.disable
Severity: ERR
Description: This message occurs when a serial-attached SCSI (SAS) transceiver (physical layer device) attached to a SAS host bus adapter (HBA) is disabled due to one of the following reasons:
- Exceeded loss of double word synchronization error threshold
- Exceeded running disparity error threshold
- Exceeded invalid double word error threshold
- Exceeded physical layer device reset problem threshold
- Exceeded broadcast change threshold
Corrective action:
1. If the adapter is in use, check the cabling.
2. If the adapter is connected to the disk shelves, check the seating of the IOM cards.
3. If that does not fix the problem, contact technical support.
SNMP trap ID: N/A

sasmon.adapter.phy.event
Message: sasmon.adapter.phy.event
Severity: DEBUG
Description: This message occurs when a serial-attached SCSI (SAS) transceiver (physical layer device) attached to a SAS host bus adapter (HBA) experiences a transient error. These errors are observed on a received double word (dword) or when resetting a PHY. Types of these errors are disparity errors, invalid dword errors, physical layer device (PHY) reset problem errors, loss of dword synchronization errors, and PHY change events. The SAS specification allows for a certain bit error rate, so these errors can occur under normal operating conditions. There is no cause for concern if these individual errors show up occasionally.
Corrective action: None.
SNMP trap ID: N/A

sasmon.disable.module
Message: sasmon.disable.module
Severity: INFO
Description: This message occurs when the Data ONTAP module responsible for monitoring transient errors in the serial-attached SCSI (SAS) domain is disabled because the environment variable disable-sasmon? is set to true.
Corrective action: Set the environment variable disable-sasmon? to false to enable this monitor module.
SNMP trap ID: N/A

SES EMS messages


SES messages appear in AutoSupport messages if failures or warning conditions occur in your system's storage components.

ses.access.noEnclServ
Message: ses.access.noEnclServ
Severity: NODE_ERROR
Description: This message occurs when SCSI Enclosure Services (SES) in the storage system cannot establish contact with the enclosure monitoring process in any disk shelf on the channel. Some disk shelves require that disks be installed and functioning in particular shelf bays.
Note: This message applies to DS14/DS14mk2/DS14mk4 disk shelves that are not -AT-type shelves. DS14mk2 is used in this message as an example.
Corrective action:
1. In disk shelves that require certain disk placement, verify that disks are installed in the indicated bays: DS14/DS14mk2 FC: bays 0 and/or 1
Note: SCSI-based shelves, serial-attached SCSI (SAS) shelves, and DS14mk2 AT shelves do not rely on disk placement for SES.
SES in the storage system tries periodically to reestablish contact with the disk shelf.
2. If disks are placed correctly but the error persists for more than an hour, halt the storage system, power-cycle the disk shelf, and reboot.
3. If the error persists, then SES hardware (for example, VEM, LRC, or IOM) might need to be replaced. In SCSI-based shelves, replace the shelf.

ses.access.noMoreValidPaths
Message: ses.access.noMoreValidPaths
Severity: NODE_ERROR
Description: This message occurs when SCSI Enclosure Services (SES) in the storage system loses access to the enclosure monitoring process in the disk shelf. Some disk shelves require that disks be installed and functioning in particular shelf bays.
Note: This message applies to DS14/DS14mk2/DS14mk4 disk shelves that are not -AT-type shelves. DS14mk2 is used in this message as an example.
Corrective action:
1. In disk shelves that require certain disk placement, verify that disks are installed in the indicated bays: DS14/DS14mk2 FC: bays 0 and/or 1
Note: SCSI-based shelves, serial-attached SCSI (SAS) shelves, and DS14mk2 AT shelves do not rely on disk placement for SES.
SES in the storage system tries periodically to reestablish contact with the disk shelf.
2. If disks are placed correctly but the error persists for more than an hour, halt the storage system, power-cycle the disk shelf, and reboot.
3. If the error persists, then SES hardware (for example, VEM, LRC, or IOM) might need to be replaced. In SCSI-based shelves, replace the shelf.

ses.access.noShelfSES
Message: ses.access.noShelfSES
Severity: NODE_ERROR
Description: This message occurs when SCSI Enclosure Services (SES) in the storage system cannot establish contact with the SES process in the indicated disk shelf. Some disk shelves require that disks be installed and functioning in particular disk shelf bays.
Note: This message applies to DS14/DS14mk2/DS14mk4 disk shelves that are not -AT-type shelves. DS14mk2 is used in this message as an example.
Corrective action:
1. In disk shelves that require certain disk placement, verify that disks are installed in the indicated bays: DS14/DS14mk2 FC: bays 0 and/or 1
Note: SCSI-based shelves, serial-attached SCSI (SAS) shelves, and DS14mk2 AT shelves do not rely on disk placement for SES.
SES in the storage system tries periodically to reestablish contact with the disk shelf.
2. If disks are placed correctly but the error persists for more than an hour, halt the storage system, power-cycle the disk shelf, and reboot.
3. If the error persists, then SES hardware (for example, VEM, LRC, or IOM) might need to be replaced. In SCSI-based shelves, replace the shelf.

ses.access.sesUnavailable
Message: ses.access.sesUnavailable
Severity: NODE_ERROR
Description: This message occurs when SCSI Enclosure Services (SES) in the storage system cannot establish contact with the enclosure monitoring process in one or more disk shelves on the channel. Some disk shelves require that disks be installed and functioning in particular disk shelf bays.
Note: This message applies to DS14/DS14mk2/DS14mk4 disk shelves that are not -AT-type shelves. DS14mk2 is used in this message as an example.
Corrective action:
1. In disk shelves that require certain disk placement, verify that disks are installed in the indicated bays: DS14/DS14mk2 FC: bays 0 and/or 1
Note: SCSI-based shelves, serial-attached SCSI (SAS) shelves, and DS14mk2 AT shelves do not rely on disk placement for SES.
SES in the storage system tries periodically to reestablish contact with the disk shelf.
2. If disks are placed correctly but the error persists for more than an hour, halt the storage system, power-cycle the disk shelf, and reboot.
3. If the error persists, then SES hardware (for example, VEM, LRC, or IOM) might need to be replaced. In SCSI-based shelves, replace the shelf.

ses.badShareStorageConfigErr
Message: ses.badShareStorageConfigErr
Severity: NODE_ERROR
Description: This message occurs when a disk shelf module that is not supported in a SharedStorage system, such as an LRC module, is detected in a SharedStorage system.
Corrective action: Replace the unsupported module with one that is supported, such as an ESH, ESH2, or AT-FCX module.

ses.bridge.fw.getFailWarn
Message: ses.bridge.fw.getFailWarn
Severity: WARNING
Description: This message occurs when the bridge firmware revision cannot be obtained.
Corrective action: Check the connection to the bank of Maxtor drives.

ses.bridge.fw.mmErr
Message: ses.bridge.fw.mmErr
Severity: SVC_ERROR
Description: This message occurs when the bridge firmware revision is inconsistent.
Corrective action: Check the firmware revision numbers and make sure that they are consistent. You might have to update the firmware.


ses.channel.rescanInitiated
Message: ses.channel.rescanInitiated
Severity: INFO
Description: This message identifies the name of the adapter port or switch port being rescanned; for example, 7a or myswitch:5.
Corrective action: None.


ses.config.drivePopError
Message: ses.config.drivePopError
Severity: WARNING
Description: This message occurs when the channel has more disk drives on it than are allowed. Systems using synchronous mirroring allow more disk drives per channel than other systems.
Corrective action: Your action depends on whether you intend to use synchronous mirroring. If you intend to use synchronous mirroring, make sure that the license is installed. If you do not intend to use synchronous mirroring, reduce the number of disk drives on the channel to no more than the maximum allowed.

ses.config.IllegalEsh270
Message: ses.config.IllegalEsh270
Severity: NODE_ERROR
Description: This message occurs when Data ONTAP detects one or more ESH disk shelf modules in a disk shelf that is attached to a FAS270 system. This is not a supported configuration.
Corrective action: Replace the ESH modules with ESH2 modules.

ses.config.shelfMixError
Message: ses.config.shelfMixError
Severity: NODE_ERROR
Description: This message occurs when the channel has a mixture of ATA and Fibre Channel disk shelves; this is not a supported configuration. Mixed-mode operation of ATA and Fibre Channel disks on the system is only supported on separate loops.
Corrective action: Move all Fibre Channel-based disk shelves to one loop and place all Fibre Channel-to-ATA-based disk shelves on another loop.

ses.config.shelfPopError
Message: ses.config.shelfPopError
Severity: NODE_ERROR
Description: This message occurs when the channel has more shelves on it than are allowed.
Corrective action: Reduce the number of disk shelves on the channel to the number specified.

ses.disk.configOk
Message: ses.disk.configOk
Severity: INFO
Description: This message occurs when there are no longer any drives in slots 20 through 23 of a FAS2050 or an SA200 system.
Corrective action: None.

ses.disk.illegalConfigWarn
Message: ses.disk.illegalConfigWarn
Severity: WARNING
Description: This message occurs when disk drives are inserted into the bottom row of a FAS2050 or an SA200 system. Disk drives are not supported in those slots.
Corrective action: None.


ses.disk.pctl.timeout
Message: ses.disk.pctl.timeout
Severity: DEBUG
Description: This message occurs when a power control request submitted to the specified SCSI Enclosure Services (SES) module is not completed within 60 seconds.
Corrective action: Normally, there is no corrective action required for this error because the timeout might be due to a transient error. However, if you see this message frequently, there might be an issue with the I/O module in the shelf, which might need to be replaced.

ses.download.powerCyclingChannel
Message: ses.download.powerCyclingChannel
Severity: INFO
Description: This message occurs when the power-cycling channel event is issued after a disk shelf firmware download to disk shelves that require a power-cycle to activate the new code.
Corrective action: None.

ses.download.shelfToReboot
Message: ses.download.shelfToReboot
Severity: INFO
Description: This message occurs after the completion of shelf firmware transfer to the DS14mk2 AT disk shelf. At this point, the disk shelf requires about another five minutes to transfer the new firmware to its nonvolatile program memory, whereupon it reboots to begin to execute the new firmware. During this reboot, a Fibre Channel loop reinitialization occurs, temporarily interrupting the loop.
Corrective action: None.

ses.download.suspendIOForPowerCycle
Message: ses.download.suspendIOForPowerCycle
Severity: INFO
Description: This message occurs when the suspending I/O event signals that the storage subsystem is temporarily stopping I/O to disks while one or more disk shelves have their power cycled after a download, if required by the disk shelf design.
Corrective action: None.

ses.drive.PossShelfAddr
Message: ses.drive.PossShelfAddr
Severity: WARNING
Description: This message occurs in conjunction with the message ses.drive.shelfAddr.mm when there are devices that have apparently taken a wrong address; the adapter shows device addresses that SCSI Enclosure Services (SES) indicates should not exist, and vice versa. This error is not a fatal condition. It means that SES cannot perform certain operations on the affected disk drives, such as setting failure LEDs, because it is not certain which disk shelf the affected disk drive is in.
Corrective action:
1. If the problem is throughout the disk shelf, replace the disk shelf.
2. If the error is only one disk drive per disk shelf, the drive might have taken an incorrect address at power-on.
3. Arrange to make this disk drive a spare, and then reseat it to cause it to take its address again.
4. If the problem persists, insert a different spare disk drive into the slot. If the error then clears, replace the original disk drive.
5. If the problem persists, there is a hardware problem with the individual disk bay. Replace the disk shelf.

ses.drive.shelfAddr.mm
Message: ses.drive.shelfAddr.mm
Severity: NODE_ERROR
Description: This message occurs when there is a mismatch between the position of the drives detected by the disk shelf and the address of the drives detected by the Fibre Channel loop or SCSI bus. This error indicates that a disk drive took an address other than what the disk shelf should have provided, or that SCSI Enclosure Services (SES) in a disk shelf cannot be contacted for address information, or that a disk drive unexpectedly does not participate in device discovery on the loop or bus. If the message EMS_ses_drive_possShelfAddr subsequently appears, follow the corrective actions in that message. In this condition, the SES process in the system might be unable to perform certain operations on the disk, such as setting failure LEDs or detecting disk swaps.
Note: This message applies to DS14/DS14mk2/DS14mk4 disk shelves that are not -AT-type shelves. DS14mk2 is used in this message as an example.
Corrective action:
1. If this occurs to multiple disk drives on the same loop, check the I/O modules at the back of the disk shelves on that loop for errors.
2. In disk shelves that require certain disk placement, verify that disks are installed in the indicated bays: DS14/DS14mk2 FC: bays 0 and/or 1
Note: SCSI-based disk shelves and DS14mk2 AT disk shelves do not rely on disk placement for SES.
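The mismatch described for ses.drive.shelfAddr.mm and ses.drive.PossShelfAddr amounts to two address lists that disagree in either direction. The sketch below illustrates that comparison only; the function name, the result keys, and the sample addresses are hypothetical and do not reflect how Data ONTAP performs the check.

```python
def address_mismatches(adapter_addresses, ses_addresses):
    """Compare device addresses seen on the loop or bus with those SES expects.

    Both arguments are hypothetical lists of integer device addresses:
    one as discovered by the adapter, one as reported by the disk shelf.
    """
    adapter = set(adapter_addresses)
    ses = set(ses_addresses)
    return {
        # Addresses the adapter shows that SES says should not exist.
        "unexpected_on_adapter": sorted(adapter - ses),
        # Addresses SES expects that did not appear in device discovery.
        "missing_from_adapter": sorted(ses - adapter),
    }


# Hypothetical case: one drive took address 19 instead of 18.
result = address_mismatches([16, 17, 19], [16, 17, 18])
print(result)
```

A single pair of entries, one in each list, is consistent with one drive having taken an incorrect address; many entries on the same loop point toward the I/O modules instead.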

ses.exceptionShelfLog
Message: ses.exceptionShelfLog
Severity: DEBUG
Description: This message occurs when an I/O module encounters an exception condition.
Corrective action:
1. Check the system logs to see whether any disk errors recently occurred.
2. Pull an AutoSupport message file that contains the latest copy of the shelf log information from each disk shelf.
3. Try to correlate the date and time from the errors in the message file with the date and time of events in the shelf log file.
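The correlation step can be done by hand, but with many entries a small script may be easier. This is a minimal sketch under stated assumptions: the entries have already been parsed into (timestamp, text) pairs, the five-minute window is an arbitrary choice, and the sample log lines are invented for illustration rather than taken from real AutoSupport or shelf log output.

```python
from datetime import datetime, timedelta


def correlate(disk_errors, shelf_events, window=timedelta(minutes=5)):
    """Pair each disk error with the shelf log events that fall within `window` of it.

    Both inputs are lists of (datetime, text) tuples. Parsing the real
    files into this shape is left out because their formats are not
    described here.
    """
    pairs = []
    for err_time, err_text in disk_errors:
        for ev_time, ev_text in shelf_events:
            if abs(ev_time - err_time) <= window:
                pairs.append((err_text, ev_text))
    return pairs


# Hypothetical, already-parsed entries
disk_errors = [(datetime(2011, 11, 3, 14, 2, 10), "disk error on 7a.18")]
shelf_events = [
    (datetime(2011, 11, 3, 14, 1, 55), "module A exception"),
    (datetime(2011, 11, 3, 9, 30, 0), "module B exception"),
]
matches = correlate(disk_errors, shelf_events)
print(matches)
```

Widening or narrowing `window` trades missed correlations against coincidental ones; clock differences between the storage system and the shelf modules may also need to be accounted for.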

ses.extendedShelfLog
Message: ses.extendedShelfLog
Severity: DEBUG
Description: This message occurs when a disk encounters an error and the system requests that additional log information be obtained from both modules in the disk shelf reporting the error to aid in debugging problems.
Corrective action:
1. Check the system logs to see whether any disk errors recently occurred.
2. Pull an AutoSupport message file that contains the latest copy of the shelf log information from each disk shelf.
3. Try to correlate the date and time from the errors in the message file with the date and time of events in the shelf log file.

ses.fw.emptyFile
Message: ses.fw.emptyFile
Severity: WARNING
Description: This message occurs when a firmware file is found to be empty during a disk shelf firmware update.
Corrective action: Obtain the correct firmware file and place it in the etc/shelf_fw directory. You can download the firmware file from the NOW site at http://now.netapp.com/.

ses.fw.resourceNotAvailable
Message: ses.fw.resourceNotAvailable
Severity: ERR
Description: This message occurs when there is not enough contiguous memory available to download disk shelf firmware.
Corrective action:
1. Reduce the amount of system activity before performing a manual disk shelf firmware update.
2. If the disk shelf firmware update fails again, reboot the storage system.

ses.giveback.restartAfter
Message: ses.giveback.restartAfter
Severity: INFO
Description: This message occurs when SCSI Enclosure Services (SES) is restarted after giveback.
Corrective action: None.

ses.giveback.wait
Message: ses.giveback.wait
Severity: INFO
Description: This message occurs when SCSI Enclosure Services (SES) information is not available because the system is waiting for giveback.
Corrective action: None.

ses.psu.coolingReqError
Message: ses.psu.coolingReqError
Severity: LOG_CRIT
Description: This message occurs when the installed power supplies are placed so that air-flow requirements of the disk shelf are not met. The power supply chassis and their power supplies are an integral part of the disk shelf cooling and air-flow design.
Corrective action: Verify that the power supplies are placed in the locations required to provide proper air flow according to the disk shelf specifications. DS14-style shelves always require both power supplies. SAS-Shelf24 requires power supplies in power supply bays 1 and 4 for proper air flow and cooling.

ses.psu.powerReqError
Message: ses.psu.powerReqError
Severity: LOG_CRIT
Description: This message occurs when too few power supplies are installed to redundantly satisfy the current-draw requirements of the disk drives in the disk shelf. This might occur if a power supply is removed or fails. Some disk drive models require more power than others. If the disk shelf specifications for the installed drive models specify more power supplies to support that disk type, then this condition can also occur at disk swap or insertion in some disk shelves.
Corrective action: Verify that the number of power supplies installed satisfies the power requirements of the installed disk drives. DS14-style shelves always require both power supplies. SAS-Shelf24 requires power supplies in power supply bays 1 and 4 for proper cooling and air flow. If any disk drives are 10K RPM or faster, then power supply bays 2 and 3 must also have power supplies.
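The SAS-Shelf24 placement rule stated for ses.psu.powerReqError (bays 1 and 4 always, bays 2 and 3 as well when any drive is 10K RPM or faster) can be written as a short check. This is a sketch of that rule only; the function names and the inputs (a list of populated bay numbers and a list of drive speeds in RPM) are assumptions, and the disk shelf specifications remain the authority.

```python
def required_psu_bays(drive_rpms):
    """Return the set of SAS-Shelf24 power supply bays that must be populated."""
    required = {1, 4}  # always needed for cooling and air flow
    if any(rpm >= 10000 for rpm in drive_rpms):
        required |= {2, 3}  # needed when any drive is 10K RPM or faster
    return required


def missing_psu_bays(populated_bays, drive_rpms):
    """Return the sorted list of required bays that have no power supply."""
    return sorted(required_psu_bays(drive_rpms) - set(populated_bays))


# Hypothetical shelf: power supplies in bays 1 and 4, one 15K RPM drive installed
print(missing_psu_bays([1, 4], [7200, 15000]))
```

An empty result means the placement rule is satisfied for the drives listed.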

ses.remote.configPageError
Message: ses.remote.configPageError
Severity: INFO
Description: This message occurs when a request to another system in a SharedStorage configuration fails. This request was for a specific disk shelf's SCSI Enclosure Services (SES) configuration page.
Corrective action: Contact technical support.


ses.remote.elemDescPageError
Message: ses.remote.elemDescPageError
Severity: INFO
Description: This message occurs when a request to another system in a SharedStorage configuration fails. This request was for the element descriptor pages that the other system has local access to.
Corrective action: Contact technical support.

ses.remote.faultLedError
Message: ses.remote.faultLedError
Severity: INFO
Description: This message occurs when a request to another system to have it set the fault LED of a disk drive on a disk shelf fails.
Corrective action: Contact technical support.

ses.remote.flashLedError
Message: ses.remote.flashLedError
Severity: INFO
Description: This message occurs when a request to another system to have it flash the LED of a disk drive on a disk shelf fails.
Corrective action: Contact technical support.

ses.remote.shelfListError
Message: ses.remote.shelfListError
Severity: INFO
Description: This message occurs when a request to another system in a SharedStorage configuration fails. This request was for a list of the disk shelves that the other system has local access to.
Corrective action: Contact technical support.

ses.remote.statPageError
Message: ses.remote.statPageError
Severity: INFO
Description: This message occurs when a request to another system in a SharedStorage configuration fails. This request was for the SCSI Enclosure Services (SES) status pages that the other system has local access to.
Corrective action: Contact technical support.

ses.shelf.changedID
Message: ses.shelf.changedID
Severity: WARNING
Description: This message occurs on a SAS disk shelf when the disk shelf ID changes after power is applied to the disk shelf.
Corrective action:
1. Verify that the disk shelf ID displayed in this message is the same as the disk shelf ID shown on the disk shelf.
2. If they are different, perform one of the following steps:
- If the disk shelf ID displayed in this message is the one you want, reset the disk shelf ID on the thumbwheel to match it.
- If you want the new disk shelf ID instead of the disk shelf ID displayed in the message, verify that the disk shelf ID you want does not conflict with other disk shelves in the domain.
3. Power-cycle the disk shelf chassis. You can wait to perform this procedure until your next maintenance window.
4. If the warning persists on both disk shelf modules after you complete the procedure, replace the disk shelf chassis. If it persists on only one disk shelf module, replace the disk shelf module.

ses.shelf.ctrlFailErr
Message: ses.shelf.ctrlFailErr
Severity: SVC_ERROR
Description: This message occurs when the adapter and loop ID of the SCSI Enclosure Services (SES) target for which the SES has control fail.
Corrective action:
1. Check the LEDs on the disk shelf and the disk shelf modules on the back of the disk shelf to see whether there are any abnormalities. If the modules appear to be problematic, replace the applicable module.
2. If the SES target is a disk drive, check to see whether the disk drive failed. If it failed, replace the disk drive.

ses.shelf.em.ctrlFailErr
Message: ses.shelf.em.ctrlFailErr
Severity: SVC_ERROR
Description: This message occurs when SCSI Enclosure Services (SES) control to the internal disk drives of a system fails.
Corrective action:
1. Enter the environment shelf command to see whether that disk shelf is still being actively monitored.
2. If the environment shelf command indicates a failure, there is a hardware failure in the system's internal disk shelf.

ses.shelf.IdBasedAddr
Message: ses.shelf.IdBasedAddr
Severity: WARNING
Description: This message occurs on a serial-attached SCSI (SAS) disk shelf when the SAS addresses of the devices are based on the disk shelf ID instead of the disk shelf backplane serial number. This indicates problems communicating with the disk shelf backplane.
Corrective action:
1. Reseat the master disk shelf module, as indicated by the output of the environment shelf command.
2. If the problem persists, reseat the slave disk shelf module.
3. If the problem persists, find the new master disk shelf module and replace it.
4. If the problem persists, replace the other disk shelf module.
5. If the problem persists, replace the disk shelf enclosure.

ses.shelf.invalNum
Message: ses.shelf.invalNum
Severity: WARNING
Description: This message occurs when Data ONTAP detects that a serial-attached SCSI (SAS) shelf connected to the system has an invalid shelf number.
Corrective action:
1. Power-cycle the shelf.
2. If the problem persists, replace the shelf modules.
3. If the problem persists, replace the shelf.

ses.shelf.mmErr
Message: ses.shelf.mmErr
Severity: SVC_FAULT
Description: This message occurs when there is a disk shelf that is not supported by the platform it was booted on.
Corrective action:
1. Check whether the current version of Data ONTAP supports the disk shelf.
2. If the current version of Data ONTAP does not support the disk shelf, install a version that does support the disk shelf.
If the disk shelf is supported, the error might be cleared by hourly attempts by Data ONTAP to establish proper contact with the disk shelf.

ses.shelf.OSmmErr
Message: ses.shelf.OSmmErr
Severity: SVC_ERROR
Description: This message occurs when there are incompatible Data ONTAP versions in a SharedStorage configuration that would cause SCSI Enclosure Services (SES) not to function properly.
Corrective action: Update the system that has an earlier Data ONTAP version to match the one that has the latest Data ONTAP version.

ses.shelf.powercycle.done
Message: ses.shelf.powercycle.done
Severity: INFO
Description: This message occurs when a disk shelf power-cycle finishes.
Corrective action: None.

ses.shelf.powercycle.start
Message: ses.shelf.powercycle.start
Severity: INFO
Description: This message occurs when a disk shelf is power-cycled and SCSI Enclosure Services (SES) needs to wait for it to finish.
Corrective action: None.

ses.shelf.sameNumReassign
Message: ses.shelf.sameNumReassign
Severity: WARNING
Description: This message occurs when Data ONTAP detects more than one serial-attached SCSI (SAS) disk shelf connected to the same adapter with the same shelf number.
Corrective action:
1. Change the shelf number on the shelf to one that does not conflict with other shelves attached to the same adapter. Halt the system and reboot the shelf.
2. If the problem persists, contact technical support.

ses.shelf.unsupportAllowErr
Message: ses.shelf.unsupportAllowErr
Severity: SVC_FAULT
Description: This message occurs when a disk shelf is not supported by Data ONTAP. Data ONTAP will continue to use the disk shelf, but environmental monitoring of the disk shelf is not possible.
Corrective action:
1. Check whether the current version of Data ONTAP supports the disk shelf.
2. If the current version of Data ONTAP does not support the disk shelf, install a version that does support the disk shelf.
If the disk shelf is supported, the error might be cleared by hourly attempts by Data ONTAP to establish proper contact with the disk shelf.

ses.shelf.unsupportedErr
Message: ses.shelf.unsupportedErr
Severity: SVC_FAULT
Description: This message occurs when there is a disk shelf that is not supported by Data ONTAP.
Corrective action: Check whether this disk shelf is supported by a newer version of Data ONTAP. If it is, upgrade to the appropriate version.

ses.startTempOwnership
Message: ses.startTempOwnership
Severity: DEBUG
Description: This message occurs when SCSI Enclosure Services (SES) is starting temporary ownership acquisition of disks owned by other nodes. This involves removing the disk reservations while the SES operations are in progress.
Corrective action: Contact technical support.

ses.status.ATFCXError
Message: ses.status.ATFCXError
Severity: NODE_ERROR
Description: This message occurs when the reporting disk shelf detects an error in the indicated AT-FCX module. The module might not be able to perform I/O to disks within the disk shelf.
Corrective action:
1. Verify that the AT-FCX module is fully seated and secured.
2. If the problem persists, replace the AT-FCX module.

ses.status.ATFCXInfo
Message: ses.status.ATFCXInfo
Severity: INFO
Description: This message occurs when a previously reported error in the AT-FCX module is corrected, or the system reports other information that does not necessarily require customer action.
Corrective action: None.

ses.status.currentError
Message: ses.status.currentError
Severity: NODE_ERROR
Description: This message occurs when a critical condition is detected in the indicated storage shelf current sensor. The shelf might be able to continue operation.
Corrective action:
1. Verify that the power supply and the AC line are supplying power.
2. Monitor the power grid for abnormalities.
3. Replace the power supply.
4. If the problem persists, contact technical support.

ses.status.currentInfo
Message: ses.status.currentInfo
Severity: INFO
Description: This message occurs when an error or warning condition previously reported by or about the disk shelf current sensor is corrected, or the system reports other information about the current in the disk shelf that does not necessarily require customer action.
Corrective action: None.

ses.status.currentWarning
Message: ses.status.currentWarning
Severity: WARNING
Description: This message occurs when a warning condition is detected in the indicated storage shelf current sensor. The shelf might be able to continue operation.
Corrective action:
1. Verify that the power supply and the AC line are supplying power.
2. Monitor the power grid for abnormalities.
3. Replace the power supply.
4. If the problem persists, contact technical support.

ses.status.displayError
Message: ses.status.displayError
Severity: NODE_ERROR
Description: This message occurs when the SCSI Enclosure Services (SES) module in the disk shelf detects an error in the disk shelf display panel. The disk shelf might be unable to provide correct addresses to its disks.
Corrective action:
1. If possible, verify that the connection between the disk shelf and the display is secure.
2. Verify that the SES module or modules are fully seated; replacing them might solve the problem.
3. If the problem persists, the SES module that detected the warning condition might be faulty.
4. If the problem persists after the module or modules are replaced, replace the disk shelf.
5. If the problem persists, contact technical support.

ses.status.displayInfo
Message: ses.status.displayInfo
Severity: INFO
Description: This message occurs when a previous condition in the display panel is corrected.
Corrective action: None.

ses.status.displayWarning
Message: ses.status.displayWarning
Severity: WARNING
Description: This message occurs when the SCSI Enclosure Services (SES) module detects a warning condition for the disk shelf display panel. The disk shelf might be unable to provide correct addresses to its disks.
Corrective action:
1. If possible, verify that the connection between the disk shelf and the display is secure.
2. Verify that the SES module or modules are fully seated; replacing them might solve the problem.
3. If the problem persists, the SES module that detected the warning condition might be faulty.
4. If the problem persists after the module or modules are replaced, replace the disk shelf.
5. If the problem persists, contact technical support.

ses.status.driveError
Message: ses.status.driveError
Severity: NODE_ERROR
Description: This message occurs when a critical condition is detected for the disk drive in the shelf. The drive might fail.
Corrective action:
1. Make sure that the drive is not running on a degraded volume. If it is, then add as many spares as necessary into the system, up to the specified level.
2. After the volume is no longer in degraded mode, replace the drive that is failing.

ses.status.driveOk
Message: ses.status.driveOk
Severity: INFO
Description: This message occurs when a disk drive that was previously experiencing a problem returns to normal operation.
Corrective action: None.

ses.status.driveWarning
Message: ses.status.driveWarning
Severity: NODE_ERROR
Description: This message occurs when a non-critical condition is detected for the disk drive in the shelf. The drive might fail.
Corrective action:
1. Make sure that the drive is not running on a degraded volume. If it is, then add as many spares as necessary into the system, up to the specified level.
2. After the volume is no longer in degraded mode, replace the drive that is failing.

ses.status.electronicsError
Message: ses.status.electronicsError
Severity: NODE_ERROR
Description: This message occurs when a failure has been detected in the module that provides disk SCSI Enclosure Services (SES) monitoring capability.
Corrective action: Replace the module. In some disk shelf types, this function is integrated into the Fibre Channel, SCSI, or serial-attached SCSI (SAS) interface modules.


ses.status.electronicsInfo
Message: ses.status.electronicsInfo
Severity: INFO
Description: This message occurs when a problem previously reported about the disk shelf SCSI Enclosure Services (SES) electronics is corrected or when other information about the enclosure electronics that does not necessarily require customer action is reported.
Corrective action: None.

ses.status.electronicsWarn
Message: ses.status.electronicsWarn
Severity: WARNING
Description: This message occurs when a non-fatal condition is detected in the module that provides disk SCSI Enclosure Services (SES) monitoring capability.
Corrective action: Replace the module. In some disk shelf types, this function is integrated into the Fibre Channel, SCSI, or serial-attached SCSI (SAS) interface modules.

ses.status.ESHPctlStatus
Message: ses.status.ESHPctlStatus
Severity: DEBUG
Description: This message occurs when a change in the power control status is detected in the indicated disk shelf.
Corrective action: None.

ses.status.fanError
Message: ses.status.fanError
Severity: NODE_ERROR
Description: This message occurs when the indicated disk shelf cooling fan or fan module fails, and the shelf or its components are not receiving required cooling airflow.
Corrective action:
1. Verify that the fan module is fully seated and secured. (The fan is integrated into the power supply module in some disk shelves.)
2. If the problem persists, replace the fan module.
3. If the problem persists, contact technical support.

ses.status.fanInfo
Message: ses.status.fanInfo
Severity: INFO
Description: This message occurs when a condition previously reported about the disk shelf cooling fan or fan module is corrected or when other information about the fans that does not necessarily require customer action is reported.
Corrective action: None.

ses.status.fanWarning
Message: ses.status.fanWarning
Severity: WARNING
Description: This message occurs when a disk shelf cooling fan is not operating to specification, or a component of a fan module has stopped functioning. The disk shelf components continue to receive cooling airflow but might eventually reach temperatures that are out of specification.
Corrective action:
1. Verify that the fan module is fully seated and secured. (The fan is integrated into the power supply module in some disk shelves.)
2. If the problem persists, replace the fan module.
3. If the problem persists, contact technical support.

ses.status.ModuleError
Message: ses.status.ModuleError
Severity: NODE_ERROR
Description: This message occurs when the reporting disk shelf detects an error in the indicated disk shelf module.
Corrective action:
1. Verify that the shelf module is fully seated and secure.
2. If the problem persists, replace the disk shelf module.

ses.status.ModuleInfo
Message: ses.status.ModuleInfo
Severity: INFO
Description: This message occurs when a previously reported error in the shelf module is corrected or when other information that does not necessarily require customer action is reported.
Corrective action: None.

ses.status.ModuleWarn
Message: ses.status.ModuleWarn
Severity: WARNING
Description: This message occurs when the reporting disk shelf detects a warning in the indicated disk shelf module.
Corrective action:
1. Verify that the shelf module is fully seated and secure.
2. If the problem persists, replace the disk shelf module.

ses.status.psError
Message: ses.status.psError
Severity: NODE_ERROR
Description: This message occurs when a critical condition is detected in the indicated storage shelf power supply. The power supply might fail.
Corrective action:
1. Verify that power input to the shelf is correct. If separate events of this type are reported simultaneously, the common power distribution point might be at fault.
2. If the shelf is in a cabinet, verify that the power distribution unit is ON and functioning properly. Make sure that the shelf power cords are fully inserted and secured, the supply is fully seated and secured, and the supply is switched ON.
3. Verify that power supply fans, if any, are functioning. If the problem persists, replace the power supply.
4. If the problem persists, contact technical support.

ses.status.psInfo
Message: ses.status.psInfo
Severity: INFO
Description: This message occurs when a condition previously reported about the disk shelf power supply is corrected or when other information about the power supply that does not necessarily require customer action is reported.
Corrective action: None.

ses.status.psWarning
Message: ses.status.psWarning
Severity: WARNING
Description: This message occurs when a warning condition is detected in the indicated storage shelf power supply. The power supply might be able to continue operation.
Corrective action:
1. Verify that the disk shelf is receiving power. If separate events of this type are reported simultaneously, the common power distribution point might be at fault.
2. If the disk shelf is in a cabinet, verify that the power distribution unit status is ON and functioning properly. Make sure that the disk shelf power cords are fully inserted and secured, the power supply is fully seated and secured, and the power supply is switched on.
3. If the problem persists, replace the power supply.
4. If the problem persists, contact technical support.

ses.status.temperatureError
Message: ses.status.temperatureError
Severity: NODE_ERROR
Description: This message occurs when the indicated disk shelf temperature sensor reports a temperature that exceeds the specifications for the disk shelf or its components.
Corrective action:
1. Verify that the ambient temperature where the shelf is installed is within equipment specifications using the environment shelf [adapter] command, and that airflow clearances are maintained.
2. If the same disk shelf also reports fan or fan module failures, correct that problem now. If the problem is reported by the ambient temperature sensor (located on the operator panel), verify that the connection between the disk shelf and the panel is secure, if possible.
3. If the problem persists, and if the shelf has multiple temperature sensors of which only one exhibits the problem, replace the module that contains the sensor that reports the error. If the problem persists, contact technical support for assistance.
Note: You can display temperature thresholds for each shelf through the environment shelf command.

ses.status.temperatureInfo
Message: ses.status.temperatureInfo
Severity: INFO
Description: This message occurs when an error or warning condition previously reported by or about the disk shelf temperature sensor is corrected or when other information about the temperature in the disk shelf that does not necessarily require customer action is reported.
Corrective action: None.

ses.status.temperatureWarning
Message: ses.status.temperatureWarning
Severity: WARNING
Description: This message occurs when the indicated disk shelf temperature sensor reports a temperature that is close to exceeding the specifications for the disk shelf or its components.
Corrective action:
1. Verify that the ambient temperature where the disk shelf is installed is within equipment specifications by using the environment shelf [adapter] command, and that airflow clearances are maintained.
2. If this disk shelf also reports fan or fan module errors or warnings, correct those problems now.
3. If the problem persists, and the shelf has multiple temperature sensors and only one of them exhibits the problem, replace the module that contains the sensor.
4. If the problem persists, contact technical support.
Note: Temperature thresholds for each shelf can be displayed through the environment shelf command.

ses.status.upsError
Message: ses.status.upsError
Severity: NODE_ERROR
Description: This message occurs when the disk shelf detects a failure in the uninterruptible power supply (UPS) attached to it. This might occur, for example, if power to the UPS is lost.
Corrective action:
1. Restore power to the UPS.
2. Verify that the connection from the UPS to the disk shelf is in place and secured and that the UPS is enabled.
3. If the problem persists, contact technical support.

ses.status.upsInfo
Message: ses.status.upsInfo
Severity: INFO
Description: This message occurs when a condition previously reported about the uninterruptible power supply (UPS) attached to the disk shelf is corrected or when other information about the UPS that does not necessarily require customer action is reported.
Corrective action: None.

ses.status.volError
Message: ses.status.volError
Severity: NODE_ERROR
Description: This message occurs when a critical condition is detected in the indicated disk storage shelf voltage sensor. The shelf might be able to continue operation.
Corrective action:
1. Verify that the power supply and the AC line are supplying power.
2. Monitor the power grid for abnormalities.
3. Replace the power supply.
4. If the problem persists, contact technical support.

ses.status.volWarning
Message: ses.status.volWarning
Severity: WARNING
Description: This message occurs when a warning condition is detected in the indicated storage shelf voltage sensor. The shelf might be able to continue operation.
Corrective action:
1. Verify that the power supply and the AC line are supplying power.
2. Monitor the power grid for abnormalities.
3. Replace the power supply.
4. If the problem persists, contact technical support.
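
Most ses.status.* names in this section combine a component (fan, ps, temperature, and so on) with a condition suffix (Error, Warning or Warn, Info). The following Python sketch splits a name into those two parts, which can help when grouping events by component. The function and type names are illustrative assumptions and are not part of Data ONTAP. The suffix is not the EMS severity: for example, ses.status.driveWarning is logged at NODE_ERROR severity.

```python
import re
from typing import NamedTuple, Optional

# Both "Warning" and the shorter "Warn" appear as suffixes in this section.
_SES_STATUS = re.compile(
    r"^ses\.status\.(?P<component>[A-Za-z]+?)(?P<level>Error|Warning|Warn|Info)$"
)


class SesStatus(NamedTuple):
    component: str  # for example "fan", "ps", "temperature"
    level: str      # "Error", "Warning", or "Info"


def parse_ses_status(event_name: str) -> Optional[SesStatus]:
    """Split a ses.status.* event name into component and condition suffix.

    Returns None for names that do not follow the pattern, such as
    ses.status.ESHPctlStatus.
    """
    match = _SES_STATUS.match(event_name)
    if match is None:
        return None
    level = match.group("level")
    if level == "Warn":  # electronicsWarn and ModuleWarn use the short form
        level = "Warning"
    return SesStatus(match.group("component"), level)
```

For example, parse_ses_status("ses.status.ModuleWarn") yields the component "Module" with the level "Warning".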

ses.system.em.mmErr
Message: ses.system.em.mmErr
Severity: NODE_FAULT
Description: This message occurs when Data ONTAP does not support this system with internal disk drives.
Corrective action: Check whether this system is currently supported. If it is, upgrade to the appropriate Data ONTAP version.

ses.tempOwnershipDone
Message: ses.tempOwnershipDone
Severity: DEBUG
Description: This message occurs when SCSI Enclosure Services (SES) completes temporary ownership acquisition.
Corrective action: Contact technical support.

sfu.adapterSuspendIO
Message: sfu.adapterSuspendIO
Severity: INFO
Description: This message occurs during a disk shelf firmware update on a disk shelf that cannot perform I/O while updating firmware. Typically, the shelves involved are bridge-based as opposed to LRC-based or ESH-based.
Corrective action: None.

sfu.auto.update.off.impact
Message: sfu.auto.update.off.impact
Severity: WARNING
Description: This message occurs when the automated disk shelf firmware update cannot be completed on a downrev disk shelf enclosure because the (hidden) global option shelf.fw.auto.update is set to off.
Corrective action: Use the storage download shelf command to update the firmware. To enable automatic updates, set the hidden option shelf.fw.auto.update to on.

sfu.ctrllerElmntsPerShelf
Message: sfu.ctrllerElmntsPerShelf
Severity: INFO
Description: This message occurs when a disk shelf firmware download determines the number of controller elements per shelf that can be downloaded.
Corrective action: None.

sfu.downloadCtrllerBridge
Message: sfu.downloadCtrllerBridge
Severity: INFO
Description: This message occurs when a disk shelf firmware download starts on a particular disk shelf.
Corrective action: None.

sfu.downloadError
Message: sfu.downloadError
Severity: ERR
Description: This message occurs when a disk shelf firmware update fails to successfully download firmware to a disk shelf or shelves in the system.
Corrective action:
1. Redownload the latest disk shelf firmware from the NOW site at http://now.netapp.com/NOW/download/tools/diskshelf/.
2. Attempt to download disk shelf firmware again by using the storage download shelf command.

sfu.downloadingController
Message: sfu.downloadingController
Severity: INFO
Description: This message occurs when a disk shelf firmware download starts on a particular disk shelf.
Corrective action: None.

sfu.downloadingCtrllerR1XX
Message: sfu.downloadingCtrllerR1XX
Severity: INFO
Description: This message occurs when a disk shelf firmware download starts on a particular disk shelf.
Corrective action: None.

sfu.downloadStarted
Message: sfu.downloadStarted
Severity: INFO
Description: This message occurs when a disk shelf firmware update starts to download disk shelf firmware.
Corrective action: None.

sfu.downloadSuccess
Message: sfu.downloadSuccess
Severity: INFO
Description: This message occurs when disk shelf firmware is updated successfully.
Corrective action: None.

sfu.downloadSummary
Message: sfu.downloadSummary
Severity: INFO
Description: This message occurs when a disk shelf firmware update is completed successfully.
Corrective action: None.

sfu.downloadSummaryErrors
Message: sfu.downloadSummaryErrors
Severity: ERR
Description: This message occurs when a disk shelf firmware update is completed but the firmware was not successfully downloaded to all of the shelves on which the update was attempted.
Corrective action: Issue the storage download shelf command again.

sfu.FCDownloadFailed
Message: sfu.FCDownloadFailed
Severity: ERR
Description: This message occurs when a disk shelf firmware update fails to download shelf firmware to a Fibre Channel or an ATA shelf successfully.
Corrective action:
1. Redownload the latest disk shelf firmware from the NOW site at http://now.netapp.com/NOW/download/tools/diskshelf/.
2. Attempt to download disk shelf firmware again by using the storage download shelf command.

sfu.firmwareDownrev
Message: sfu.firmwareDownrev
Severity: WARNING
Description: This message occurs when disk shelf firmware is downrev and therefore cannot be updated automatically.
Corrective action:
1. Copy updated disk shelf firmware into the /etc/shelf_fw directory on the storage appliance.
2. Manually issue the storage download shelf command.

sfu.firmwareUpToDate
Message: sfu.firmwareUpToDate
Severity: INFO
Description: This message occurs when a disk shelf firmware update is requested but the system determines that all shelves are already updated to the latest version of firmware available.
Corrective action: None.

sfu.partnerInaccessible
Message: sfu.partnerInaccessible
Severity: ERR
Description: This message occurs in an HA pair in which communication between partner nodes cannot be established.
Corrective action:
1. Verify that the HA pair interconnect is operational.
2. Retry the storage download shelf command.

sfu.partnerNotResponding
Message: sfu.partnerNotResponding
Severity: ERR
Description: This message occurs in an HA pair in which one node does not respond to firmware download requests from another node. In this case, the other node cannot download disk shelf firmware.
Corrective action: Verify that the HA pair interconnect is up and running on both nodes of the configuration and then attempt to redownload the disk shelf firmware, using the storage download shelf command.

sfu.partnerRefusedUpdate
Message: sfu.partnerRefusedUpdate
Severity: ERR
Description: This message occurs in an HA pair in which one node refuses firmware download requests from its partner node. In this case, the partner node cannot download disk shelf firmware.
Corrective action:
1. Verify that both partners are running the same version of Data ONTAP and that the HA pair interconnect is up and running on all nodes of the configuration.
2. Attempt the storage download shelf command again.

sfu.partnerUpdateComplete
Message: sfu.partnerUpdateComplete
Severity: INFO
Description: This message occurs in an HA pair in which a partner downloads disk shelf firmware and the download is completed. At this point, this notification is sent and SCSI Enclosure Services (SES) are resumed by the partner.
Corrective action: None.

sfu.partnerUpdateTimeout
Message: sfu.partnerUpdateTimeout
Severity: INFO
Description: This message occurs in an HA pair in which a partner downloads disk shelf firmware but the download times out. At this point, this notification is sent and SCSI Enclosure Services (SES) are resumed by the partner.
Corrective action:
1. Verify that the HA pair interconnect is operational.
2. Retry the storage download shelf command.

sfu.rebootRequest
Message: sfu.rebootRequest
Severity: INFO
Description: This message occurs when the disk shelf firmware update is completed. The disk shelf reboots to run the new code.
Corrective action: None.

sfu.rebootRequestFailure
Message: sfu.rebootRequestFailure
Severity: ERR
Description: This message occurs when an attempt to issue a reboot request after downloading shelf firmware fails, indicating a software error.
Corrective action: Reboot the storage system, if possible, and try the firmware update again.

sfu.resumeDiskIO
Message: sfu.resumeDiskIO
Severity: INFO
Description: This message occurs when a disk shelf firmware update is completed and disk I/O is resumed.
Corrective action: None.

sfu.SASDownloadFailed
Message: sfu.SASDownloadFailed
Severity: ERR
Description: This message occurs when a disk shelf firmware update fails to download shelf firmware to a shelf successfully.
Corrective action:
1. Redownload the latest disk shelf firmware from the NOW site at http://now.netapp.com/NOW/download/tools/diskshelf/.
2. Download disk shelf firmware again by using the storage download shelf command.

sfu.statusCheckFailure
Message: sfu.statusCheckFailure
Severity: ERR
Description: This message occurs when the storage download shelf command encounters a failure while attempting to read the status of the firmware update in progress.
Corrective action: Retry the storage download shelf command.

sfu.suspendDiskIO
Message: sfu.suspendDiskIO
Severity: INFO
Description: This message occurs when a disk shelf firmware update is started and disk I/O is suspended.
Corrective action: None.

sfu.suspendSES
Message: Suspending enclosure services -- partner is updating disk shelf firmware.
Severity: INFO
Description: This message occurs when a disk shelf firmware update is requested in an HA pair environment. In this case, one partner node updates the firmware on the disk shelf module, and the other partner node temporarily disables SCSI Enclosure Services (SES) while the firmware update is in progress.
Corrective action: None.
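
The sfu.* entries that call for action fall into a few recurring corrective actions. As a quick reference, the following Python sketch collects them in a lookup table keyed by event name. The action text is paraphrased from the entries above; the table and function names are illustrative assumptions, not part of any NetApp tool.

```python
from typing import Dict, List

# Corrective actions paraphrased from the sfu.* entries in this section.
REDOWNLOAD_AND_RETRY = [
    "Redownload the latest disk shelf firmware from the NOW site.",
    "Run the storage download shelf command again.",
]
CHECK_INTERCONNECT_AND_RETRY = [
    "Verify that the HA pair interconnect is operational.",
    "Retry the storage download shelf command.",
]

SFU_ACTIONS: Dict[str, List[str]] = {
    "sfu.downloadError": REDOWNLOAD_AND_RETRY,
    "sfu.FCDownloadFailed": REDOWNLOAD_AND_RETRY,
    "sfu.SASDownloadFailed": REDOWNLOAD_AND_RETRY,
    "sfu.partnerInaccessible": CHECK_INTERCONNECT_AND_RETRY,
    "sfu.partnerNotResponding": CHECK_INTERCONNECT_AND_RETRY,
    "sfu.partnerUpdateTimeout": CHECK_INTERCONNECT_AND_RETRY,
    "sfu.downloadSummaryErrors": ["Issue the storage download shelf command again."],
    "sfu.statusCheckFailure": ["Retry the storage download shelf command."],
    "sfu.rebootRequestFailure": [
        "Reboot the storage system, if possible, and try the firmware update again."
    ],
}


def corrective_actions(event_name: str) -> List[str]:
    """Return the documented steps for an sfu.* event.

    Events that are not in the table (for example, informational events
    whose corrective action is None) return an empty list.
    """
    return list(SFU_ACTIONS.get(event_name, []))
```

The table is deliberately partial: sfu.firmwareDownrev, sfu.partnerRefusedUpdate, and sfu.auto.update.off.impact have their own specific steps and are best read from their entries.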

Flash Cache module and PAM module EMS messages


The caching module WAFL cache, hardware driver, and system monitoring can generate error messages. All messages are reported through the EMS. This document uses the term Flash Cache module to refer to caching modules with capacities greater than 16 GB. Before the release of Data ONTAP 7.3.5, such adapters were called Performance Acceleration Modules (PAM II). The name of the 16-GB caching module remains Performance Acceleration Module (PAM I).

extCache.io.BlockChecksumError
Message: extCache.io.BlockChecksumError
Severity: NODE_ERROR
Description: This message occurs when the external cache detects a block checksum verification error while performing a read operation. The operation will be retried from persistent storage (RAID).
Corrective action: Contact technical support.

extCache.io.cardError
Message: extCache.io.cardError
Severity: NODE_ERROR
Description: This message occurs when the external cache detects a card failure on read or write I/O. If the I/O was a read, the operation will be retried from persistent storage (RAID).
Corrective action: Contact technical support.

extCache.io.readError
Message: extCache.io.readError
Severity: NODE_ERROR
Description: This message occurs when the external cache detects an I/O error on a read. The operation will be retried from persistent storage (RAID).
Corrective action: Contact technical support.

extCache.io.writeError
Message: extCache.io.writeError
Severity: NODE_ERROR
Description: This message occurs when the external cache detects an I/O error on a write. This causes the external cache component to be disabled and might result in degraded performance until the problem is corrected.
Corrective action: Contact technical support.

extCache.offline
Message: extCache.offline
Severity: SVC_ERROR
Description: This message occurs when the external cache is automatically taken offline and disabled. This can happen after an I/O error on the external cache and might result in degraded performance until the problem is corrected. Check the Event Management System (EMS) log for earlier errors.
Corrective action: Contact technical support.

extCache.ReconfigComplete
Message: extCache.ReconfigComplete
Severity: NODE_ERROR
Description: This message occurs when the Write Anywhere File Layout (WAFL) external cache has detected a failure of one or more cache memory cards, and was able to successfully reconfigure to continue operation with the remaining cards.
Corrective action: None.

extCache.ReconfigFailed
Message: extCache.ReconfigFailed
Severity: NODE_ERROR
Description: This message occurs when an attempt to reconfigure the external cache has failed. The message identifies what step of the reconfiguration failed.
Corrective action: Contact technical support.

extCache.ReconfigStart
Message: extCache.ReconfigStart
Severity: NODE_ERROR
Description: This message occurs when the Write Anywhere File Layout (WAFL) external cache has detected a failure of one or more cache memory cards. An attempt will be made to restart the cache with the remaining cards. Even if the cache is restarted, performance might be degraded because less cache is available. See related EMS messages for details about the failing unit.
Corrective action: Contact technical support.

extCache.UECCerror
Note: If you have a 16-GB Performance Acceleration Module, and if more than 10 uncorrectable ECC memory errors occur per day for three consecutive days, replace the module.

Message: extCache.UECCerror
Severity: NODE_ERROR
Description: This message occurs when an uncorrectable multi-bit ECC memory error is reported to the Write Anywhere File Layout (WAFL) file system external cache. When this event occurs the data will be re-read from persistent storage (RAID) and operation continues. See related EMS messages for details about the failing unit.
Corrective action: If multiple uncorrectable multi-bit ECC errors are issued, this indicates that a hardware component might be failing and should be considered for replacement.
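
The replacement rule in the note for the 16-GB Performance Acceleration Module (more than 10 uncorrectable ECC memory errors per day for three consecutive days) can be checked against a list of daily error counts. The following Python sketch is one way to express that rule. The function name and the idea of collecting one count per day are assumptions for illustration; the guide does not describe how the counts are gathered.

```python
from typing import Sequence


def pam_replacement_due(daily_uecc_counts: Sequence[int],
                        per_day_limit: int = 10,
                        consecutive_days: int = 3) -> bool:
    """Return True if the replacement rule in the note is met.

    daily_uecc_counts holds one uncorrectable ECC error count per day,
    oldest first. The rule is met when more than per_day_limit errors
    occur on each of consecutive_days successive days.
    """
    run = 0  # length of the current streak of days over the limit
    for count in daily_uecc_counts:
        run = run + 1 if count > per_day_limit else 0
        if run >= consecutive_days:
            return True
    return False
```

A day with exactly 10 errors does not count toward the streak, because the note says "more than 10".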

extCache.UECCmax
Note: If you have a 16-GB Performance Acceleration Module, and if more than 10 uncorrectable ECC memory errors occur per day for three consecutive days, replace the module.

Message: extCache.UECCmax
Severity: NODE_ERROR
Description: This message occurs when the Write Anywhere File Layout (WAFL) file system external cache has detected excessive multi-bit uncorrectable ECC memory errors in a recent period. When too many multi-bit ECC errors are reported, WAFL disables the external cache until the failing component is replaced, resulting in degraded performance. See related EMS messages for details about the failing unit.
Corrective action: Contact technical support.

fal.chan.offline.comp
Message: fal.chan.offline.comp
Severity: INFO
Description: This message occurs when the FAL (Flash Adaptation Layer) finishes taking a channel offline.
Corrective action: None.

fal.chan.online.erase.warn
Message: fal.chan.online.erase.warn
Severity: INFO
Description: This message occurs when an erase of a label block fails while attempting to bring online a channel of a card. This could lead to a failure to read the label (see the fal.chan.online.read.warn event).
Corrective action: None.

fal.chan.online.fail
Message: fal.chan.online.fail
Severity: SVC_ERROR
Description: This message occurs when the FAL (Flash Adaptation Layer) fails to bring online a channel of a card for the mentioned reason.
Corrective action: None.

fal.chan.online.read.warn
Message: fal.chan.online.read.warn
Severity: INFO
Description: This message occurs when the read of a label fails while attempting to bring online a channel of a module. This is expected on the first boot with a Flash Cache module. Otherwise, it means that existing FAL (Flash Adaptation Layer) label information is lost. The current version of software does not depend on label information, so this loss is not currently a problem. However, future versions of software might store cache data persistently. If persistent data is stored on a card and this version of software is booted on such a system, failure to read the label might lead to loss of some cached data.
Corrective action: None.

fal.chan.online.rep.fail
Message: fal.chan.online.rep.fail
Severity: SVC_ERROR
Description: This message occurs when the FAL (Flash Adaptation Layer) fails to bring online all channels in a caching module. The reasons for failure are listed in the accompanying fal.chan.online.fail events.
Corrective action: Contact technical support.

fal.chan.online.rep.part
Message: fal.chan.online.rep.part
Severity: SVC_ERROR
Description: This message occurs when the FAL (Flash Adaptation Layer) fails to bring online some channels in a caching module. The reasons for failure are listed in the accompanying fal.chan.online.fail events.
Corrective action: Contact technical support.

fal.chan.online.rep.succ
Message: fal.chan.online.rep.succ
Severity: INFO
Description: This message occurs when the FAL (Flash Adaptation Layer) successfully brings online all channels in a card.
Corrective action: None.

fal.chan.online.rep.ver.err
Message: fal.chan.online.rep.ver.err
Severity: SVC_ERROR
Description: This message occurs when the FAL (Flash Adaptation Layer) fails to bring online all channels in a caching module because of version mismatch.
Corrective action: Follow the documented revert procedure.
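
The fal.chan.online.rep.* events summarize how many channels of a caching module came online. The following Python sketch shows one reading of how the three outcome reports relate to channel counts. It assumes that fal.chan.online.rep.fail means that no channel came online and that fal.chan.online.rep.part means that some, but not all, did; the guide does not state this explicitly. The function name is hypothetical, and the version-mismatch case (fal.chan.online.rep.ver.err) is not modeled.

```python
def channel_online_report(total_channels: int, online_channels: int) -> str:
    """Map channel counts to the summary event name (assumed semantics)."""
    if total_channels <= 0 or not 0 <= online_channels <= total_channels:
        raise ValueError("channel counts are out of range")
    if online_channels == total_channels:
        return "fal.chan.online.rep.succ"  # all channels online
    if online_channels == 0:
        return "fal.chan.online.rep.fail"  # assumed: no channel online
    return "fal.chan.online.rep.part"      # assumed: only some online
```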

fal.chan.online.write.warn
Message: fal.chan.online.write.warn
Severity: INFO
Description: This message occurs when a write of a label block fails while attempting to bring online a channel of a module. This could lead to a failure to read the label (see the fal.chan.online.read.warn event).
Corrective action: None.

fal.init.failed
Message: fal.init.failed
Severity: SVC_ERROR
Description: This message occurs when the FAL (Flash Adaptation Layer) fails to initialize. This error likely indicates a software bug.
Corrective action: Contact technical support.

fmm.bad.block.detected
Message: fmm.bad.block.detected
Severity: DEBUG
Description: This message occurs when Flash Management Module (FMM) gets a message from a flash device driver reporting that a bad block is detected.
Corrective action: None.

fmm.device.stats.missing
Message: fmm.device.stats.missing
Severity: DEBUG
Description: This message occurs when the onboard copy of statistics maintained by Flash Management Module (FMM) is missing. This can happen when a device is initially activated in the controller.
Corrective action: None.

fmm.domain.card.failure
Message: fmm.domain.card.failure
Severity: SVC_ERROR
Description: This message occurs when the Flash Management Module (FMM) detects that a flash device failed. Typically, this is the result of a hardware failure on the flash device itself.
Corrective action: Repair or replace the failed flash device.

fmm.domain.core.failure
Message: fmm.domain.core.failure
Severity: DEBUG
Description: This message occurs when Flash Management Module (FMM) detects that a core domain on a flash device managed by FMM has failed. Typically, this is the result of a hardware failure on the flash device itself. Core failure is not considered to be fatal.
Corrective action: None.

fmm.hourly.device.report
Message: fmm.hourly.device.report
Severity: DEBUG
Description: This message is sent by Flash Management Module (FMM) every hour, to report the status of a flash device that FMM manages.
Corrective action: None.

fmm.threshold.bank.degraded
Message: fmm.threshold.bank.degraded
Severity: DEBUG
Description: This message occurs when Flash Management Module (FMM) detects that in a flash device, the percentage of a bank that is offline is above a warning threshold. FMM responds with the action described by the action parameter.
Corrective action: None.

fmm.threshold.bank.offline
Message: fmm.threshold.bank.offline
Severity: DEBUG
Description: This message occurs when Flash Management Module (FMM) detects that in a flash device, a critical percentage of a bank is offline, beyond which the bank cannot operate. FMM responds with the action described by the action parameter.
Corrective action: None.

fmm.threshold.card.degraded
Message: fmm.threshold.card.degraded
Severity: SVC_ERROR
Description: This message occurs when the Flash Management Module (FMM) detects that the offline percentage of a flash device exceeds a specified warning threshold. FMM responds with the action described by the action parameter.
Corrective action: Repair or replace this degraded flash device.

fmm.threshold.card.failure
Message: fmm.threshold.card.failure
Severity: SVC_ERROR
Description: This message occurs when Flash Management Module (FMM) detects that the offline percentage of a flash device exceeds a specified critical threshold beyond which the device cannot operate. FMM responds with the action described by the action parameter. This flash device can no longer operate and will be taken offline.
Corrective action: Repair or replace the flash device.

fmm.threshold.core.offline
Message: fmm.threshold.core.offline
Severity: DEBUG
Description: This message occurs when Flash Management Module (FMM) detects that an excessive number of blocks in a core of a flash device have gone bad. The threshold for a core is defined as a percentage of bad blocks, and when that threshold is exceeded, FMM responds with the action described by the action parameter.
Corrective action: None.
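
The fmm.threshold.* events compare an offline or bad-block percentage against a warning threshold and a critical threshold. The following Python sketch illustrates that two-level comparison for a flash device. The actual threshold values are not given in this guide, so they are parameters here, and the function name and return values are illustrative assumptions rather than FMM behavior.

```python
def classify_offline_percentage(offline_pct: float,
                                warning_pct: float,
                                critical_pct: float) -> str:
    """Classify an offline percentage against two thresholds.

    Returns "failure" above the critical threshold (compare
    fmm.threshold.card.failure), "degraded" above the warning threshold
    (compare fmm.threshold.card.degraded), and "ok" otherwise.
    """
    if not 0 <= warning_pct <= critical_pct <= 100:
        raise ValueError("thresholds must satisfy 0 <= warning <= critical <= 100")
    if offline_pct > critical_pct:
        return "failure"
    if offline_pct > warning_pct:
        return "degraded"
    return "ok"
```

The comparisons are strict because the entries describe a percentage that "exceeds" the threshold.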

iomem.bbm.bbtl.overflow
Message: iomem.bbm.bbtl.overflow
Severity: NODE_ERROR
Description: This message occurs when the caching module driver detects that the Bad Block Transaction Log has overflowed.
Corrective action: None.

iomem.bbm.init.failed
Message: iomem.bbm.init.failed
Severity: NODE_ERROR
Description: This message occurs when the caching module driver detects that an operation to a NOR flash memory has failed.
Corrective action: None.

iomem.bbm.new.flash
Message: iomem.bbm.new.flash
Severity: DEBUG
Description: This message occurs when the caching module driver detects that a NAND flash package has been replaced.
Corrective action: None.

iomem.card.disable
Message: iomem.card.disable
Severity: WARNING
Description: This message occurs when the caching module has been disabled as a result of an explicit diagnostic command.
Corrective action: None.

iomem.card.enable
Message: iomem.card.enable
Severity: INFO
Description: This message occurs when the caching module has been enabled as a result of an explicit diagnostic command.
Corrective action: None.

iomem.card.fail.cecc
Note: If you have a 16-GB Performance Acceleration Module, and if more than 10 uncorrectable ECC memory errors occur per day for three consecutive days, replace the module.

Message: iomem.card.fail.cecc
Severity: NODE_ERROR
Description: This message occurs when the caching module driver takes an acceleration card offline due to an excessive number of correctable memory errors.
Corrective action: Replace the caching module.

iomem.card.fail.data.crc
Message: iomem.card.fail.data.crc
Severity: NODE_ERROR
Description: This message occurs when the caching module driver takes a caching module offline due to an excessive number of detected data cyclic redundancy check (CRC) errors.
Corrective action: Replace the caching module.

iomem.card.fail.desc.crc
Message: iomem.card.fail.desc.crc
Severity: NODE_ERROR
Description: This message occurs when the caching module driver takes a caching module offline due to an excessive number of detected descriptor cyclic redundancy check (CRC) errors.
Corrective action: Replace the caching module.

iomem.card.fail.dimm
Message: iomem.card.fail.dimm
Severity: NODE_ERROR
Description: This message occurs when the caching module driver takes a caching module offline due to failure of a memory DIMM.
Corrective action: Replace the caching module.

iomem.card.fail.firmware.primary
Message: iomem.card.fail.firmware.primary
Severity: NODE_ERROR
Description: This message occurs when the caching module driver detects that the module is not running on the primary firmware image. The card does not function unless it is running on the primary image.
Note: The following steps are for systems that use the SYSDIAG diagnostic tool. 32xx and 62xx systems use system-level diagnostics, which is a different diagnostic tool. For details about using system-level diagnostics, see the System-Level Diagnostics Guide on the NetApp Support Site at support.netapp.com.
Corrective action:
1. Enter the following command at the boot environment prompt:

boot_diags

2. Select xtnd yes on the diagnostic main menu.
3. Take one of the following actions:
   - If your system has a 16-GB Performance Acceleration Module, select the iomem submenu and then run test 62, Update FPGA [Extended].
   - If your system has a 256-GB or 512-GB Performance Acceleration Module, select the pam2 submenu and then run test 61, Update FPGA [Extended].
4. Exit diagnostics and reboot the system.

iomem.card.fail.fpga
Message Severity Description
iomem.card.fail.fpga

NODE_ERROR This message occurs when the caching module driver detects a fatal operational error with the onboard field programmable gate array (FPGA) hardware and is taking the caching module offline.

Corrective action Contact technical support.

iomem.card.fail.fpga.primary
Message Severity Description
iomem.card.fail.fpga.primary

NODE_ERROR This message occurs when the acceleration card driver detects that the card is not running on the primary firmware image. The card does not function unless it is running on the primary image.

Corrective action
Note: The following steps are for systems that use the SYSDIAG diagnostic tool. 32xx and 62xx systems use system-level diagnostics, which is a different diagnostic tool. For details about using system-level diagnostics, see the System-Level Diagnostics Guide on the NetApp Support Site at support.netapp.com.

Take one of the following actions:

If you have a 16-GB Performance Acceleration Module, complete the following steps:
1. Enter the following command at the boot environment prompt:

boot_diags

2. Select xtnd yes on the diagnostic main menu.
3. Run test 62, Update FPGA [Extended].
4. Exit diagnostics and reboot the system.

If you have a Flash Cache module, the FPGA firmware should be programmed automatically. Other EMS messages earlier in the log should indicate why programming failed.

iomem.card.fail.fpga.rev
Message Severity Description
iomem.card.fail.fpga.rev

NODE_ERROR This message occurs when the caching module driver detects that the field programmable gate array (FPGA) firmware image is a revision not supported by the driver.

Corrective action
Note: The following steps are for systems that use the SYSDIAG diagnostic tool. 32xx and 62xx systems use system-level diagnostics, which is a different diagnostic tool. For details about using system-level diagnostics, see the System-Level Diagnostics Guide on the NetApp Support Site at support.netapp.com.

Take one of the following actions:

If you have a 16-GB Performance Acceleration Module, complete the following steps:
1. Enter the following command at the boot environment prompt:

boot_diags

2. Select xtnd yes on the diagnostic main menu.
3. Run test 62, Update FPGA [Extended].
4. Exit diagnostics and reboot the system.

If you have a Flash Cache module, the FPGA firmware should be programmed automatically. Other EMS messages earlier in the log should indicate why programming failed.

iomem.card.fail.internal
Message Severity Description
iomem.card.fail.internal

NODE_ERROR This message occurs when the caching module driver detects a fatal internal error on the caching module and is taking the module offline.

Corrective action Contact technical support.

iomem.card.fail.pci
Message Severity Description
iomem.card.fail.pci

NODE_ERROR This message occurs when the caching module driver detects a fatal PCI error on the caching module and is taking the module offline.

Corrective action Contact technical support.

iomem.card.fail.uecc
Note: If you have a 16-GB Performance Acceleration Module, and if more than 10 uncorrectable ECC memory errors occur per day for three consecutive days, replace the module.

Message Severity Description

iomem.card.fail.uecc

NODE_ERROR This message occurs when the caching module driver takes a caching module offline due to an excessive number of uncorrectable memory errors.

Corrective action Replace the caching module.
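
The replacement threshold in the Note for this message (more than 10 uncorrectable ECC memory errors per day for three consecutive days) can be expressed as a simple check. The following Python sketch is illustrative only: the function name and the list of daily error counts are assumptions, not part of Data ONTAP, and the counts would have to come from your own review of the EMS log.

```python
def should_replace_16gb_module(daily_uecc_counts, threshold=10, days=3):
    """Return True if more than `threshold` errors were seen on each of
    `days` consecutive days.

    daily_uecc_counts: one integer per day, oldest first (hypothetical input).
    """
    run = 0  # length of the current streak of days above the threshold
    for count in daily_uecc_counts:
        run = run + 1 if count > threshold else 0
        if run >= days:
            return True
    return False


if __name__ == "__main__":
    print(should_replace_16gb_module([11, 12, 13]))  # three days above 10
    print(should_replace_16gb_module([11, 12, 10]))  # 10 is not "more than 10"
```

The check uses a strict comparison because the Note says "more than 10"; a day with exactly 10 errors resets the streak.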

iomem.dimm.log.checksum
Message Severity Description
iomem.dimm.log.checksum

NODE_ERROR This message occurs when the caching module driver detects a checksum error in the error log for a DIMM on the caching module.

Corrective action Replace the caching module.

iomem.dimm.log.init
Message Severity Description
iomem.dimm.log.init

INFO This message occurs when the caching module driver initializes the error log for a DIMM.

Corrective action None.

iomem.dimm.log.read
Message Severity Description
iomem.dimm.log.read

NODE_ERROR This message occurs when the caching module driver fails to read the error log for a DIMM on the caching module.

Corrective action Replace the caching module.

iomem.dimm.log.sync
Message Severity Description
iomem.dimm.log.sync

INFO This message occurs when the caching module driver is writing the error log for a DIMM to persistent storage.

Corrective action None.

iomem.dimm.log.write
Message Severity Description
iomem.dimm.log.write

NODE_ERROR This message occurs when the caching module driver fails to write the error log for a DIMM on the caching module.

Corrective action Replace the caching module.

iomem.dimm.mismatch.banks
Message Severity Description
iomem.dimm.mismatch.banks

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a number of banks that does not match that of the other installed DIMMs on the caching module.

Corrective action Replace the caching module.

iomem.dimm.mismatch.burst
Message Severity Description
iomem.dimm.mismatch.burst

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a burst size that does not match that of the other installed DIMMs on the caching module.

Corrective action Replace the caching module.

iomem.dimm.mismatch.casLatency
Message Severity Description
iomem.dimm.mismatch.casLatency

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a column address select (CAS) latency that does not match that of the other installed DIMMs on the caching module.

Corrective action Replace the caching module.

iomem.dimm.mismatch.columns
Message Severity Description
iomem.dimm.mismatch.columns

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a number of columns that does not match that of the other installed DIMMs on the caching module.

Corrective action Replace the caching module.

iomem.dimm.mismatch.dataWidth
Message Severity Description
iomem.dimm.mismatch.dataWidth

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a data synchronous dynamic RAM (SDRAM) width that does not match that of the other installed DIMMs on the caching module.

Corrective action Replace the caching module.

iomem.dimm.mismatch.eccWidth
Note: If you have a 16-GB Performance Acceleration Module, and if more than 10 uncorrectable ECC memory errors occur per day for three consecutive days, replace the module.

Message Severity Description

iomem.dimm.mismatch.eccWidth

NODE_ERROR This message occurs when the caching module driver detects a DIMM with an ECC synchronous dynamic RAM (SDRAM) width that does not match that of the other installed DIMMs on the caching module.

Corrective action Replace the caching module.

iomem.dimm.mismatch.ranks
Message Severity Description
iomem.dimm.mismatch.ranks

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a number of ranks that does not match that of the other installed DIMMs on the caching module.

Corrective action Replace the caching module.

iomem.dimm.mismatch.rows
Message Severity Description
iomem.dimm.mismatch.rows

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a number of rows that does not match that of the other installed DIMMs on the caching module.

Corrective action Replace the caching module.

iomem.dimm.mismatch.vendor
Message Severity Description
iomem.dimm.mismatch.vendor

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a manufacturer ID that does not match that of the other installed DIMMs on the caching module.

Corrective action Replace the caching module.
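
Each iomem.dimm.mismatch.* message corresponds to one DIMM attribute that differs from the other installed DIMMs. The following Python sketch shows one way such a comparison could be modeled. Comparing every DIMM against the first one, the dictionary layout, and the function name are assumptions for illustration; they do not describe the actual driver logic.

```python
# Attribute names mirror the suffixes of the iomem.dimm.mismatch.* messages.
MISMATCH_ATTRIBUTES = (
    "banks", "burst", "casLatency", "columns", "dataWidth",
    "eccWidth", "ranks", "rows", "vendor",
)


def find_mismatches(dimms):
    """Compare each DIMM (a dict of attributes) against the first DIMM.

    Returns a list of (dimm_index, message_name) tuples, one per attribute
    that differs from the reference DIMM.
    """
    if not dimms:
        return []
    reference = dimms[0]  # assumption: the first DIMM is the reference
    events = []
    for index, dimm in enumerate(dimms[1:], start=1):
        for attr in MISMATCH_ATTRIBUTES:
            if dimm[attr] != reference[attr]:
                events.append((index, f"iomem.dimm.mismatch.{attr}"))
    return events


if __name__ == "__main__":
    base = dict(banks=8, burst=8, casLatency=5, columns=10, dataWidth=64,
                eccWidth=8, ranks=2, rows=14, vendor="A")
    odd = dict(base, ranks=1)  # same as base except for the rank count
    for index, name in find_mismatches([base, odd]):
        print(index, name)
```

Whichever attribute differs, the documented corrective action is the same: replace the caching module.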

iomem.dimm.spd.banks
Message Severity Description
iomem.dimm.spd.banks

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a number of banks incompatible with the memory controller of the caching module.

Corrective action Replace the caching module.

iomem.dimm.spd.burst
Message Severity Description
iomem.dimm.spd.burst

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a burst size incompatible with the memory controller of the caching module.

Corrective action Replace the caching module.

iomem.dimm.spd.casLatency
Message Severity Description
iomem.dimm.spd.casLatency

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a column address select (CAS) latency incompatible with the memory controller of the caching module.

Corrective action Replace the caching module.

iomem.dimm.spd.checksum
Message Severity Description
iomem.dimm.spd.checksum

NODE_ERROR This message occurs when the caching module driver detects a checksum error for the identifying information read from the serial presence detect (SPD) electrically erasable programmable read-only memory (EEPROM) of a DIMM installed on the caching module.

Corrective action Replace the caching module.

iomem.dimm.spd.columns
Message Severity Description
iomem.dimm.spd.columns

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a number of columns incompatible with the memory controller of the caching module.

Corrective action Replace the caching module.

iomem.dimm.spd.dataWidth
Message Severity Description
iomem.dimm.spd.dataWidth

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a data synchronous dynamic RAM (SDRAM) width incompatible with the memory controller of the caching module.

Corrective action Replace the caching module.

iomem.dimm.spd.detect
Message Severity Description
iomem.dimm.spd.detect

INFO This message occurs when the caching module driver detects the presence of an installed DIMM during initialization.

Corrective action None.

iomem.dimm.spd.eccWidth
Note: If you have a 16-GB Performance Acceleration Module, and if more than 10 uncorrectable ECC memory errors occur per day for three consecutive days, replace the module.

Message Severity Description

iomem.dimm.spd.eccWidth

NODE_ERROR This message occurs when the caching module driver detects a DIMM with an ECC synchronous dynamic RAM (SDRAM) width incompatible with the memory controller of the caching module.

Corrective action Replace the caching module.

iomem.dimm.spd.ranks
Message Severity Description
iomem.dimm.spd.ranks

NODE_ERROR This message occurs when the acceleration card driver detects a DIMM with a number of ranks incompatible with the memory controller of the acceleration card.

Corrective action Replace the acceleration card.

iomem.dimm.spd.read
Message Severity Description
iomem.dimm.spd.read

NODE_ERROR This message occurs when the caching module driver fails to read the identifying information from the serial presence detect (SPD) electrically erasable programmable read-only memory (EEPROM) of a DIMM installed on the caching module.

Corrective action Replace the acceleration card.

iomem.dimm.spd.rows
Message Severity Description
iomem.dimm.spd.rows

NODE_ERROR This message occurs when the caching module driver detects a DIMM with a number of rows incompatible with the memory controller of the caching module.

Corrective action Replace the caching module.

iomem.dma.crc.data
Message Severity Description
iomem.dma.crc.data

WARNING This message occurs when the caching module driver detects a data checksum error for data in transit across the PCI link between the system and the caching module.

Corrective action Contact technical support.

iomem.dma.crc.desc
Message Severity Description
iomem.dma.crc.desc

WARNING This message occurs when the caching module driver detects a descriptor checksum error for data in transit across the PCI link between the system and the caching module.

Corrective action Contact technical support.

iomem.dma.internal
Message Severity Description
iomem.dma.internal

WARNING This message occurs when the caching module driver detects an internal direct memory access (DMA) error during data transfer.

Corrective action Contact technical support.

iomem.dma.stall
Message Severity Description
iomem.dma.stall

WARNING This message occurs when the acceleration card driver detects a direct memory access (DMA) channel has unexpectedly stalled and is attempting to restart the DMA channel for normal operation.

Corrective action None.

iomem.ecc.cecc
Note: If you have a 16-GB Performance Acceleration Module, and if more than 10 uncorrectable ECC memory errors occur per day for three consecutive days, replace the module.

Message Severity Description
iomem.ecc.cecc

WARNING This message occurs when a correctable ECC memory error is detected while accessing the memory of a caching module. Frequent correctable ECC errors usually indicate that a hardware memory component of the caching module is failing.

Corrective action None.

iomem.ecc.correct.off
Message Severity Description
iomem.ecc.correct.off

WARNING This message occurs when the error correction code (ECC) memory error correction has been disabled for a caching module. ECC error correction should never be disabled for the caching module under normal operating conditions. This can occur only if error correction has been explicitly disabled through a private diagnostic interface.

Corrective action If this message is encountered under normal operating conditions, contact technical support.

iomem.ecc.correct.on
Message Severity Description
iomem.ecc.correct.on

INFO This message occurs when the error correction code (ECC) memory error correction has been enabled for a caching module.

Corrective action None.

iomem.ecc.detect.off
Message Severity Description
iomem.ecc.detect.off

WARNING This message occurs when the error correction code (ECC) memory error detection has been disabled for an acceleration card. ECC error detection should never be disabled for the caching module under normal operating conditions. This can occur only if error detection has been explicitly disabled through a private diagnostic interface.

Corrective action If this message is encountered under normal operating conditions, contact technical support.

iomem.ecc.detect.on
Message Severity Description
iomem.ecc.detect.on

INFO This message occurs when the error correction code (ECC) memory error detection has been enabled for a caching module.

Corrective action None.

iomem.ecc.inject
Message Severity Description
iomem.ecc.inject

WARNING This message occurs when an error correction code (ECC) memory error is manually injected into the memory of a caching module. This injection event will only occur during diagnostic testing.

Corrective action None.

iomem.ecc.summary
Message Severity Description
iomem.ecc.summary

WARNING This message occurs when the caching module driver makes its periodic error summary report indicating that uncorrectable memory errors have been detected on the acceleration card.

Corrective action Replace the acceleration card.

iomem.ecc.uecc
Message Severity Description
iomem.ecc.uecc

NODE_ERROR This message occurs when an uncorrectable ECC memory error is detected while accessing the memory of a caching module. Uncorrectable ECC errors indicate that a hardware memory component of the caching module has failed or is failing. Uncorrectable memory errors can only be isolated to a pair of DIMMs on the caching module.

Corrective action None.
Note: If you have a 16-GB Performance Acceleration Module, and if more than 10 uncorrectable ECC memory errors occur per day for three consecutive days, replace the module.

iomem.fail.stripe
Message Severity Description
iomem.fail.stripe

INFO An erase stripe is being failed.

Corrective action None.

iomem.firmware.package.access
Message Severity Description
iomem.firmware.package.access

NODE_ERROR This message occurs when the caching module driver encounters a problem while accessing the firmware package. The caching module might continue to function, but you should perform the corrective action at the earliest opportunity.

Corrective action Reinstall the Data ONTAP software package or service image.

iomem.firmware.primary
Message Severity Description
iomem.firmware.primary

WARNING This message occurs when the caching module driver detects that the card is not running on the primary firmware image. The card does not function unless it is running on the primary image.

Corrective action None.

iomem.firmware.program.complete
Message Severity Description
iomem.firmware.program.complete

INFO This message occurs when the caching module driver finishes the programming procedure for the caching module firmware.

Corrective action None.

iomem.firmware.program.fail
Message Severity Description
iomem.firmware.program.fail

NODE_ERROR This message occurs when the caching module driver fails to program the card firmware.

Corrective action Contact technical support.

iomem.firmware.program.reboot
Message Severity Description
iomem.firmware.program.reboot

INFO This message occurs when the caching module driver triggers a reboot due to programming firmware on one or more caching modules.

iomem.firmware.program.start
Message Severity Description
iomem.firmware.program.start

INFO This message occurs when the caching module driver begins the programming procedure for the module firmware.

Corrective action None.

iomem.firmware.rev
Message Severity Description
iomem.firmware.rev

WARNING This message occurs when the caching module driver detects that the field programmable gate array (FPGA) firmware image is a revision not supported by the driver.

Corrective action None.

iomem.flash.mismatch.id
Message Severity Description
iomem.flash.mismatch.id

NODE_ERROR This message occurs when the caching module driver detects a flash device with an identifier that does not match the identifier contained in the field-replaceable unit (FRU) information. The caching module is not functional until you resolve this issue.

Corrective action Contact technical support.

iomem.fru.badInfo
Message Severity Description
iomem.fru.badInfo

WARNING This message occurs when the caching module driver detects invalid information in the field-replaceable unit (FRU) electrically erasable programmable read-only memory (EEPROM) of the caching module.

Corrective action Replace the caching module.

iomem.fru.checksum
Message Severity Description
iomem.fru.checksum

WARNING This message occurs when the caching module driver detects a checksum error in the card field-replaceable unit (FRU) information for the caching module.

Corrective action Replace the caching module.

iomem.fru.read
Message Severity Description
iomem.fru.read

WARNING This message occurs when the caching module driver encounters an error reading the field-replaceable unit (FRU) electrically erasable programmable read-only memory (EEPROM) of the caching module.

Corrective action Replace the caching module.

iomem.fru.write
Message Severity Description
iomem.fru.write

WARNING This message occurs when the caching module driver encounters an error writing the field-replaceable unit (FRU) electrically erasable programmable read-only memory (EEPROM) of the caching module.

Corrective action Replace the caching module.

iomem.i2c.link.down
Message Severity Description
iomem.i2c.link.down

WARNING This message occurs when the caching module driver detects the failure of Inter-Integrated Circuit (I2C) serial link on the caching module.

Corrective action Replace the caching module.

iomem.i2c.read.addrNACK
Message Severity Description
iomem.i2c.read.addrNACK

WARNING This message occurs when the caching module driver detects an address negative acknowledgment (NACK) error condition when reading data from an Inter-Integrated Circuit (I2C) device on the caching module.

Corrective action Replace the caching module.

iomem.i2c.read.dataNACK
Message Severity Description
iomem.i2c.read.dataNACK

WARNING This message occurs when the caching module driver detects a data negative acknowledgment (NACK) error condition when reading data from an Inter-Integrated Circuit (I2C) device on the caching module.

Corrective action Replace the caching module.

iomem.i2c.read.timeout
Message Severity Description
iomem.i2c.read.timeout

WARNING This message occurs when the caching module driver times out while trying to read data from an Inter-Integrated Circuit (I2C) device on the caching module.

Corrective action Replace the caching module.

iomem.i2c.write.addrNACK
Message Severity Description
iomem.i2c.write.addrNACK

WARNING This message occurs when the caching module driver detects an address negative acknowledgment (NACK) error condition when writing data to an Inter-Integrated Circuit (I2C) device on the caching module.

Corrective action Replace the caching module.

iomem.i2c.write.dataNACK
Message Severity Description
iomem.i2c.write.dataNACK

WARNING This message occurs when the caching module driver detects a data negative acknowledgment (NACK) error condition when writing data to an Inter-Integrated Circuit (I2C) device on the caching module.

Corrective action Replace the caching module.

iomem.i2c.write.timeout
Message Severity Description
iomem.i2c.write.timeout

WARNING This message occurs when the caching module driver times out while trying to write data to an Inter-Integrated Circuit (I2C) device on the caching module.

Corrective action Replace the caching module.

iomem.init.detect.fpga
Message Severity Description
iomem.init.detect.fpga

INFO This message occurs when the field-programmable gate array (FPGA) on a caching module is detected and initialized for use by the driver.

Corrective action None.

iomem.init.detect.pci
Message Severity Description
iomem.init.detect.pci

INFO This message occurs when a caching module is detected in a PCI slot and is being initialized for use by the system.

Corrective action None.

iomem.init.fail
Message Severity Description
iomem.init.fail

NODE_ERROR This message occurs when the caching module driver fails to initialize a caching module.

Corrective action Look for the specific failure log messages in the EMS log prior to this message; they identify the reason for the failure.

iomem.memory.flash.syndrome
Message Severity Description
iomem.memory.flash.syndrome

DEBUG This message occurs when the caching module driver detects a syndrome code associated with a flash memory access.

Corrective action None.

iomem.memory.none
Message Severity Description
iomem.memory.none

NODE_ERROR This message occurs when the caching module driver cannot detect any installed memory on a caching module.

Corrective action Replace the caching module.

iomem.memory.power.high
Message Severity Description
iomem.memory.power.high

WARNING This message occurs when the memory of the caching module has been configured to operate in high power mode. Memory high power mode should never be enabled for the caching module under normal operating conditions. This can occur only if high power mode has been explicitly enabled through a private diagnostic interface.

Corrective action If this message is encountered under normal operating conditions, contact technical support.

iomem.memory.power.low
Message Severity Description
iomem.memory.power.low

INFO This message occurs when the memory DIMMs of the caching module have been configured to operate in low power mode.

Corrective action None.

iomem.memory.scrub.start
Message Severity Description
iomem.memory.scrub.start

INFO This message occurs when the background error correction code (ECC) memory scrubbing process on a caching module is starting.

Corrective action None.

iomem.memory.size
Message Severity Description
iomem.memory.size

INFO This message occurs when the caching module driver has determined the amount of memory installed on a caching module.

Corrective action None.

iomem.memory.zero.complete
Message Severity Description
iomem.memory.zero.complete

INFO This message occurs when the boot-time zeroing of the memory of a caching module is complete.

Corrective action None.

iomem.memory.zero.start
Message Severity Description
iomem.memory.zero.start

INFO This message occurs when the boot-time zeroing of the memory of a caching module is starting.

Corrective action None.

iomem.nor.op.failed
Message Severity Description
iomem.nor.op.failed

NODE_ERROR This message occurs when the caching module driver detects that an operation to a NOR flash memory has failed.

Corrective action None.

iomem.pci.error.config.bar
Message Severity Description
iomem.pci.error.config.bar

NODE_ERROR This message occurs when the caching module driver detects a misconfigured Base Address Register (BAR) on the caching hardware.

Corrective action Boot into diagnostics and use the applicable menu option to reprogram the primary field-programmable gate array (FPGA) image on the caching module. If the problem persists, replace the caching module.

iomem.pio.op.failed
Message Severity Description
iomem.pio.op.failed

NODE_ERROR This message occurs when the caching module driver detects that a programmed I/O (PIO) NAND flash access failed.

Corrective action None.

iomem.remap.block
Message Severity Description
iomem.remap.block

INFO This message occurs when a bad erase block is being remapped to a spare block.

Corrective action None.

iomem.remap.target.bad
Message Severity Description
iomem.remap.target.bad

INFO This message occurs when the target of a remap is found to be bad.

Corrective action None.

iomem.temp.report
Message Severity Description
iomem.temp.report

INFO This message occurs periodically to report the operating temperature of the field-programmable gate array (FPGA) on the caching module.

Corrective action None.

iomem.train.complete
Message Severity Description
iomem.train.complete

INFO This message occurs when the caching module driver has successfully trained one of the memory controllers for a memory DIMM bank. The message reports the calibrated idelay setting.

Corrective action None.

iomem.train.fail
Message Severity Description
iomem.train.fail

NODE_ERROR This message occurs when the caching module driver detects that the card memory controllers have failed to train for the installed DIMMs.

Corrective action Replace the caching module.

iomem.train.notReady
Message Severity Description
iomem.train.notReady

NODE_ERROR This message occurs when the caching module driver detects that a caching module memory controller has failed to become ready for operation after calibration.

Corrective action Replace the caching module.

iomem.train.start
Message Severity Description
iomem.train.start

INFO This message occurs when the caching module driver initiates training of the memory controllers on the acceleration card to calibrate them to the installed memory modules.

Corrective action None.

iomem.vmargin.high
Message Severity Description
iomem.vmargin.high

WARNING This message occurs when the acceleration card driver has been configured to margin a voltage level high for testing purposes.

Corrective action None.

iomem.vmargin.low
Message Severity Description
iomem.vmargin.low

WARNING This message occurs when the caching module driver has been configured to margin a voltage level low for testing purposes.

Corrective action None.

iomem.vmargin.nominal
Message Severity Description
iomem.vmargin.nominal

INFO This message occurs when voltage margining has been returned to nominal level on the caching module.

Corrective action None.
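
When reviewing an EMS log, it can help to sort caching module events by severity and look up the documented corrective action. The following Python sketch covers only a handful of the messages listed in this section. The table name, the function name, and the severity ordering (NODE_ERROR first, DEBUG last) are assumptions made for illustration, not a Data ONTAP interface.

```python
# A small excerpt of this section's messages: name -> (severity, corrective action).
KNOWN_EVENTS = {
    "iomem.card.fail.uecc": ("NODE_ERROR", "Replace the caching module."),
    "iomem.card.fail.pci": ("NODE_ERROR", "Contact technical support."),
    "iomem.init.fail": ("NODE_ERROR", "Look for the specific failure log "
                        "messages in the EMS log prior to this message."),
    "iomem.dma.stall": ("WARNING", "None."),
    "iomem.dimm.log.init": ("INFO", "None."),
    "iomem.memory.flash.syndrome": ("DEBUG", "None."),
}

# Assumed ordering, most urgent first. Unknown events sort ahead of everything
# so that they are looked up in the guide rather than silently ignored.
SEVERITY_RANK = {"NODE_ERROR": 0, "WARNING": 1, "INFO": 2, "DEBUG": 3}


def triage(event_names):
    """Return (name, severity, action) rows sorted by assumed urgency."""
    rows = []
    for name in event_names:
        severity, action = KNOWN_EVENTS.get(
            name, ("UNKNOWN", "Not in this table; consult the guide."))
        rows.append((name, severity, action))
    return sorted(rows, key=lambda row: SEVERITY_RANK.get(row[1], -1))


if __name__ == "__main__":
    for row in triage(["iomem.dimm.log.init", "iomem.card.fail.pci",
                       "iomem.dma.stall"]):
        print(row)
```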

monitor.extCache.failed
Message Severity Description
monitor.extCache.failed

LOG_WARNING This message occurs if the monitor detects the Write Anywhere File Layout (WAFL) external cache subsystem (FlexScale) has failed and is no longer available for use.

Corrective action Consult the system logs to determine the original cause of the error.

monitor.flexscale.noLicense
Message Severity Description
monitor.flexscale.noLicense

INFO This message occurs if the monitor detects that the caching module is present but the FlexScale product is not licensed. FlexScale requires a license for use.

Corrective action Obtain a license for the FlexScale product, or remove the caching module.

USB boot device EMS messages


The universal serial bus boot device on 32xx and 62xx systems can generate informational, warning, and error messages. All messages are reported through the EMS.

usb.adapter.debug
Message Severity Description
usb.adapter.debug

INFORMATION This message indicates a Data ONTAP universal serial bus (USB) adapter driver debug event.

Corrective action None.

usb.adapter.exception
Message Severity Description
usb.adapter.exception

WARNING This message occurs when the Data ONTAP universal serial bus (USB) adapter driver encounters an error with the adapter. The adapter is reset to recover.

Corrective action None.

usb.adapter.failed
Message Severity Description
usb.adapter.failed

ERROR This message occurs when the Data ONTAP universal serial bus (USB) adapter driver cannot recover the adapter after resetting it multiple times. The adapter and the devices attached to it are no longer used.

Corrective action Take the following actions:
1. If the adapter is in use, verify that all attached devices are supported devices and that they are seated correctly.
2. If the problem persists, replace the attached devices.
3. If the problem still persists, contact technical support for help in diagnosing a USB issue.

usb.adapter.reset
Message Severity Description
usb.adapter.reset

INFORMATION This message occurs when the Data ONTAP universal serial bus (USB) driver resets the specified adapter. This can occur during normal error handling.

Corrective action If the problem persists, then contact technical support.

usb.device.failed
Message Severity Description
usb.device.failed

ERROR This message occurs when multiple consecutive commands to the specified universal serial bus (USB) device are not completed within the allotted time. All recovery actions have been taken and the device can no longer be used.

Corrective action Take the following actions:
1. Ensure that all attached devices are supported devices and that they are seated correctly.
2. If the problem persists, replace the attached devices.
3. If the problem still persists, contact technical support for help in diagnosing a USB issue.
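
The escalation described for this message, in which repeated command timeouts eventually cause the device to be failed, can be sketched as a counter of consecutive timeouts. The following Python example is a minimal illustration under stated assumptions: the class name, the threshold of three timeouts, and the choice to log usb.device.timeout for each individual timeout are hypothetical and are not taken from the Data ONTAP driver.

```python
from typing import Optional


class UsbDeviceMonitor:
    """Tracks consecutive command timeouts for one device (illustrative only)."""

    def __init__(self, max_consecutive_timeouts: int = 3):
        # The real limit is not documented here; 3 is an assumed value.
        self.max_consecutive_timeouts = max_consecutive_timeouts
        self.consecutive_timeouts = 0
        self.failed = False

    def record(self, completed_in_time: bool) -> Optional[str]:
        """Return the EMS event name to log for this command, if any."""
        if self.failed:
            return None  # a failed device is no longer used
        if completed_in_time:
            self.consecutive_timeouts = 0  # any success resets the streak
            return None
        self.consecutive_timeouts += 1
        if self.consecutive_timeouts >= self.max_consecutive_timeouts:
            self.failed = True
            return "usb.device.failed"
        return "usb.device.timeout"


if __name__ == "__main__":
    monitor = UsbDeviceMonitor()
    for ok in [False, True, False, False, False]:
        print(monitor.record(ok))
```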

usb.device.initialize.failed
Message Severity Description
usb.device.initialize.failed

ERROR This message occurs when the Data ONTAP universal serial bus (USB) adapter driver fails to initialize the device attached to the associated port in the associated adapter for one of the following reasons:
The driver cannot set a unique address for the device.
The device descriptor is invalid or contains incorrect data.
The driver cannot set an active configuration for the device.
The device has multiple interfaces.

Note that the Data ONTAP USB driver only supports USB 2.0 bulk-only mass storage devices.

Corrective action Take one of the following actions:
1. If the device is connected to an external USB port, try reinserting the device.
2. If that fails, try replacing the device with a device from a different product family.
3. If the device is connected to the motherboard and the problem persists, contact technical support for help in diagnosing a USB issue.

usb.device.maximum.connected
Message usb.device.maximum.connected
Severity WARNING
Description This message occurs when the Data ONTAP universal serial bus (USB) adapter driver detects a new USB device inserted into the associated port in the associated adapter. This new device cannot be initialized because the maximum number of USB devices supported by the Data ONTAP USB adapter driver is already connected to the system.
Corrective action Take the following actions:
1. Remove a USB device that is already connected but is not being used.
2. Wait for 10 seconds, then reinsert the new device.

usb.device.protocol.mismatch
Message usb.device.protocol.mismatch
Severity ERROR
Description This message occurs when the Data ONTAP universal serial bus (USB) adapter driver detects a protocol mismatch in the device attached to the associated port in the associated adapter. It can be due to one of the following reasons:
- Unsupported interface.
- Unsupported device class or device subclass.
- Does not support the required pipes.
- Does not support the required end points.
- Does not support the required maximum transfer packet size.
Note that the Data ONTAP USB driver only supports USB 2.0 bulk-only mass storage devices.
Corrective action Take one of the following actions:
- If the device is connected to an external USB port, try replacing the device with a device from a different product family.
- If the device is connected to the motherboard, contact technical support for help in diagnosing a USB issue.

usb.device.removed
Message usb.device.removed
Severity INFORMATION
Description This message occurs when the Data ONTAP universal serial bus (USB) adapter driver successfully detects and handles the removal of the associated device, and the device is no longer accessible.

Corrective action None.

usb.device.timeout
Message usb.device.timeout
Severity ERROR
Description This message occurs when an outstanding command to the specified universal serial bus (USB) device is not completed within the allotted time. As part of the standard error handling sequence managed by the Data ONTAP USB adapter driver, this command to the device is aborted and reissued. Device-level timeouts are a common indication of a USB link stability problem. In some cases, the link is operating normally and the specified device is having internal trouble processing I/O requests in a timely manner. In such cases, evaluate the specified device for possible replacement. Quite often the problem results from the partial failure of a component involved in the USB transport. The most common thing to check is the seating of the USB device in the USB port or the header.
Corrective action Take one of the following actions:
- If the device is connected to an external USB port, try replacing the device with a device from a different product family.
- If the device is connected to the motherboard, contact technical support for help in diagnosing the USB issue.

usb.device.unsupported
Message usb.device.unsupported
Severity ERROR
Description This message occurs when the Data ONTAP universal serial bus (USB) adapter driver detects an unsupported device attached to the default boot device port on the motherboard.

Corrective action Contact technical support for a replacement USB boot device.

usb.device.unsupported.speed
Message usb.device.unsupported.speed
Severity ERROR
Description This message occurs when the Data ONTAP universal serial bus (USB) adapter driver detects a non high-speed device in the associated port.
Corrective action Remove all non high-speed devices attached to the system because the Data ONTAP USB adapter driver does not support non high-speed devices.

usb.external.device.not.used
Message usb.external.device.not.used
Severity WARNING
Description This message occurs when the Data ONTAP universal serial bus (USB) adapter driver detects a USB device connected to the external port.

Corrective action Remove the external USB device connected to the system.

usb.externalHub.notSupported
Message usb.externalHub.notSupported
Severity WARNING
Description This message occurs when the Data ONTAP universal serial bus (USB) adapter driver detects a USB hub device.

Corrective action Remove all hub devices attached to the system because the USB adapter driver does not support USB hub devices.

usb.port.error
Message usb.port.error
Severity ERROR
Description This message occurs when the Data ONTAP universal serial bus (USB) adapter driver detects an unrecoverable error on the associated port.
Corrective action Take the following actions:
1. If a device is attached to the associated port, try reinserting the device.
2. If the problem persists, try replacing the device.
3. If the problem still persists, contact technical support for assistance in diagnosing a USB issue.

usb.port.reset
Message usb.port.reset
Severity INFORMATION
Description This message occurs when the Data ONTAP universal serial bus (USB) adapter driver resets the specified port on the associated adapter. This can occur during normal error handling.

Corrective action If the problem persists, contact technical support.

usb.port.state.indeterminate
Message usb.port.state.indeterminate
Severity WARNING
Description This message occurs when the Data ONTAP universal serial bus (USB) adapter driver cannot determine the status of the associated port.
Corrective action Take the following actions:
1. If a device is attached to the associated port, try reinserting the device.
2. If the problem persists, try replacing the device.
3. If the problem still persists, contact technical support for assistance in diagnosing a USB issue.

usb.port.status.inconsistent
Message usb.port.status.inconsistent
Severity ERROR
Description This message occurs when the Data ONTAP universal serial bus (USB) adapter driver detects an inconsistent state of the associated port and cannot communicate with the attached device.
Corrective action If a device is attached to the associated port, try reinserting the device. If that fails, try replacing the device. If the problem persists, contact technical support for assistance in diagnosing a USB issue.

usbmon.boot.device.failed
Message usbmon.boot.device.failed
Severity ERROR
Description This message occurs when the Data ONTAP module that is responsible for monitoring the health of the universal serial bus (USB) boot devices determines that the associated boot device will fail all writes to the media.
Corrective action Take the following actions:
1. Replace the device.
2. If the problem persists, contact technical support for help in diagnosing the USB issue.

usbmon.boot.device.pfa
Message usbmon.boot.device.pfa
Severity WARNING
Description This message occurs when the Data ONTAP universal serial bus (USB) boot device health monitor PFA (predictive failure analysis) determines that failure is forthcoming for the associated boot device.
Corrective action Take the following actions:
1. Replace the device.
2. If the problem persists, contact technical support for help in diagnosing the USB issue.

usbmon.disable.module
Message usbmon.disable.module
Severity INFORMATION
Description This message occurs when the Data ONTAP module that is responsible for monitoring the health of the universal serial bus (USB) boot devices is disabled.
Corrective action
1. Halt the system by entering the following command at the system prompt:
halt
2. After the system boots to the LOADER prompt, run the setenv disableusbmon? false command at the LOADER prompt.
3. Continue to boot the system by entering the following command at the LOADER prompt:
boot_ontap

usbmon.unable.to.monitor
Message usbmon.unable.to.monitor
Severity WARNING
Description This message occurs when the Data ONTAP module that is responsible for monitoring the health of the universal serial bus (USB) boot devices cannot extract health information from the monitored device.
Corrective action Take the following actions:
1. Replace the device.
2. If the problem persists, contact technical support.

FCoE HBA EMS messages


FCoE messages appear if the CNA (Converged Network Adapter) MPI (Management Port Interface) driver detects an unexpected event or illegal condition or if the HBA fails to initialize.

ispcna.mpi.dump
Message ispcna.mpi.dump
Severity SVC_ERROR
Description This message occurs when an unexpected event or illegal condition is detected by the CNA (Converged Network Adapter) Management Port Interface (MPI) driver and the contents of the adapter's Static RAM and memory must be dumped. After the dump, the adapter is reset and the contents of the dump are stored in a file in the /etc/log/ql8mpi directory.

Corrective action None; the adapter was reset.

ispcna.mpi.dump.saved
Message ispcna.mpi.dump.saved
Severity SVC_ERROR
Description This message occurs when an unexpected event or illegal condition is detected by the CNA (Converged Network Adapter) Management Port Interface (MPI) driver and the contents of the adapter's Static RAM and memory are saved. The dump files are stored on the system's root volume in the /etc/log/ql8mpi directory, with the following file name format: mpi[adapter]_[date]_[time].bin
Corrective action Send the dump file to technical support for analysis.
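When collecting dump files to send to technical support, it can help to split a file name into its parts. The following Python sketch is not part of Data ONTAP. It assumes that the mpi[adapter]_[date]_[time].bin format means three underscore-separated fields after the "mpi" prefix. The guide does not state how the date and time fields are encoded, so the sketch returns them as raw strings. The function name parse_dump_name and the sample file name are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed reading of "mpi[adapter]_[date]_[time].bin": three fields separated
# by underscores after the "mpi" prefix. The date and time encodings are not
# documented here, so they are kept as uninterpreted strings.
_DUMP_RE = re.compile(
    r"^mpi(?P<adapter>[^_]+)_(?P<date>[^_]+)_(?P<time>[^_]+)\.bin$"
)


def parse_dump_name(filename: str) -> Optional[Dict[str, str]]:
    """Return the adapter, date, and time fields, or None if the name does not fit."""
    match = _DUMP_RE.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical file name, invented for illustration only.
    print(parse_dump_name("mpi2a_20111105_143000.bin"))
```

A name that does not fit the assumed pattern yields None, so unrelated files in the directory can be skipped.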

ispcna.mpi.initFailed
Message ispcna.mpi.initFailed
Severity NODE_ERROR
Description This message occurs when the CNA (Converged Network Adapter) fails to initialize.

Corrective action Take corrective actions based on the indicated reason for the failure.

Operational error messages


Operational error messages might appear on your system console or LCD when the system is operating, when it is halted, or when it is restarting because of system problems.

Disk hung during swap


Message Disk hung during swap
Description A disk error occurred as you were hot-swapping a disk.
Fatal? Yes

Corrective action
1. Disconnect the disk from the power supply by opening the latch and pulling it halfway out.
2. Wait 15 seconds to allow all disks to spin down.
3. Reinstall the disk.
4. Restart the system by entering the following command:
boot

284 | Platform Monitoring Guide

Disk n is broken
Message Disk n is broken
Description n is the RAID group disk number. The solution depends on whether you have a hot spare in the system.
Fatal? No

Corrective action See the appropriate system administration guide for information about how to locate a disk based on the RAID group disk number and how to replace a faulty disk.

Dumping core
Message Dumping core
Description The system is dumping core after a system crash.
Fatal? Yes

Corrective action Write down the system crash message on the system console and report the problem to technical support.

Error dumping core


Message Error dumping core
Description The system cannot dump core during a system crash and restarts without dumping core.
Fatal? Yes

Corrective action Report the problem to technical support.

FC-AL LINK_FAILURE
Message FC-AL LINK_FAILURE
Description Fibre Channel arbitrated loop has link failures.
Fatal? No
Corrective action Report the problem to technical support.

FC-AL RECOVERABLE ERRORS


Message FC-AL RECOVERABLE ERRORS
Description Fibre Channel arbitrated loop has been determined to be unreliable. The link errors are recoverable in the sense that the system is still up and running.
Fatal? No

Corrective action Report the problem to technical support.

Panicking
Message Panicking
Description The system is crashing. If the system does not hang while crashing, the message Dumping core appears.
Fatal? Yes

Corrective action Report the problem to technical support.

RMC Alert: Boot Error


Message RMC Alert: Boot Error
Description RMC card sent a DOWN APPLIANCE message. Causes might be a down system, a boot error, or an OFW POST error.
Fatal? Yes

Corrective action Harness script filters them and creates a case. Contact technical support.

RMC Alert: Down Appliance


Message RMC Alert: Down Appliance
Description RMC card sent a DOWN APPLIANCE message. Causes might be a down system, a boot error, or an OFW POST error.
Fatal? Yes

Corrective Action Harness script filters them and creates a case. Contact technical support.

RMC Alert: OFW POST Error


Message RMC Alert: OFW POST Error
Description RMC card sent a DOWN APPLIANCE message. Causes might be a down system, a boot error, or an OFW POST error.
Fatal? Yes

Corrective action Harness script filters them and creates a case. Contact technical support.


RLM messages
The RLM provides remote management capabilities for some storage systems and continuously monitors system health. Two types of messages are associated with the RLM and can help you monitor your system and troubleshoot problems.
The following systems contain RLMs:
- 30xx and SA300 systems
- 31xx systems
- 60xx and SA600 systems

The RLM sends AutoSupport messages when certain problems occur with the system. These might include a reboot failure or a user-triggered power cycle. Data ONTAP generates EMS messages when RLM events and errors occur. These might include a firmware update failure or a communication error.
Note: For more information about what the RLM does, see the System Administration Guide for the version of Data ONTAP that your system is running.

When and how RLM AutoSupport e-mail messages are sent


The RLM generates AutoSupport e-mail messages when the system goes down or when certain problems occur.
The RLM sends AutoSupport e-mail messages under the following conditions:
- The system reboots unexpectedly
- The system stops communicating with the RLM
- A watchdog reset occurs
- The system is power-cycled
- Firmware POST errors occur
- A user-initiated AutoSupport message occurs

The subject line of e-mail messages contains the words "System Notification" and includes the host name of the system and the message type. The following text shows an example of an RLM AutoSupport e-mail subject line:
System Notification from system (RLM HBT STOPPED) CRITICAL
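If you filter these notifications in a mail-handling script, the subject line can be split into its parts. The following Python sketch generalizes from the single example above, so the pattern is an assumption and not a documented format: it expects the host name to contain no spaces, the message type to appear in parentheses, and the log level to be the final word. The function name parse_rlm_subject is hypothetical.

```python
import re
from typing import Dict, Optional

# Pattern inferred from one example subject line in this guide (an assumption):
#   System Notification from <host> (<message type>) <LEVEL>
# The whitespace before the level is optional so that a subject line wrapped
# or joined without a space still parses.
_SUBJECT_RE = re.compile(
    r"^System Notification from (?P<host>\S+) "
    r"\((?P<message_type>[^)]*)\)\s*(?P<level>\w+)$"
)


def parse_rlm_subject(subject: str) -> Optional[Dict[str, str]]:
    """Return host, message_type, and level, or None for other subjects."""
    match = _SUBJECT_RE.match(subject.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_rlm_subject("System Notification from system (RLM HBT STOPPED) CRITICAL"))
```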

Messages are sent to recipients that you designate when you configure AutoSupport in Data ONTAP.
Note: The RLM must be properly configured to send AutoSupport messages. For information about configuring the RLM, see the System Administration Guide and the Software Setup Guide for the version of Data ONTAP that your system is running.


What RLM AutoSupport e-mail messages include


RLM AutoSupport e-mail messages have different sections that contain different kinds of information about your system.
RLM e-mail messages include the following sections and information:
- Subject line: a system notification from the RLM of the system, stating the system condition or event that caused the AutoSupport message and the log level.
- Message body: the RLM configuration and version information, the system ID, serial number, model number, and host name.
- Attachments: SELs, the system sensor state as determined by the RLM, and console logs.
Note: For more information about the contents of AutoSupport messages, see the System Administration Guide for the version of Data ONTAP running on your system.

When and how RLM EMS messages are sent


Data ONTAP generates EMS messages when problems occur with the RLM and displays them on the system console. Problems that trigger EMS messages might include failed network configuration, failed RLM heartbeat, or firmware update errors. The console message includes the name of the EMS message and a brief description of the event or problem. The following text contains an example of an RLM EMS message:
[rlm.orftp.failed:warning]: RLM communication error, unsupported send request
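If you capture console output for later review, a line such as the one above can be split into the EMS message name, the severity, and the description. The following Python sketch is illustrative only. The bracketed name:severity layout is inferred from this one example, and the names EmsLine and parse_ems_line are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class EmsLine(NamedTuple):
    name: str       # EMS message name, for example rlm.orftp.failed
    severity: str   # severity as printed on the console, lowercased
    text: str       # brief description of the event or problem


# Layout assumed from the single console example in this guide:
#   [<message name>:<severity>]: <description>
_EMS_RE = re.compile(r"\[(?P<name>[\w.]+):(?P<severity>\w+)\]:\s*(?P<text>.*)")


def parse_ems_line(line: str) -> Optional[EmsLine]:
    """Return the EMS fields found in a console line, or None if there are none."""
    match = _EMS_RE.search(line)
    if match is None:
        return None
    return EmsLine(match["name"], match["severity"].lower(), match["text"].strip())


if __name__ == "__main__":
    sample = "[rlm.orftp.failed:warning]: RLM communication error, unsupported send request"
    print(parse_ems_line(sample))
```

The message name returned by the parser can then be looked up in the EMS message sections of this guide.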

RLM-generated AutoSupport messages


The RLM continuously monitors the system's health and generates AutoSupport messages when the system goes down or when other problems, such as startup errors, occur.

Heartbeat loss warning


Message Heartbeat loss warning
Description The Remote LAN Module (RLM) detects that the system is offline, possibly because the system stopped serving data.
Corrective action If this system shutdown was manually triggered, no action is necessary. Otherwise, complete the following steps.
1. Check the status of your system and verify that the system and disk shelves are operational.
2. Contact technical support if the problem persists.

Reboot (power loss) critical


Message Reboot (power loss) critical
Description The Remote LAN Module (RLM) detects that the system lost AC power.

Corrective action If you switched off the system before you received the notification, no action is necessary. Otherwise, restore power to the system.

Reboot warning
Message Reboot warning
Description The Remote LAN Module (RLM) detects an abnormal system reboot.

Corrective action If this was a manually triggered or expected reboot, no action is necessary. Otherwise, complete the following steps.
1. Check the status of the system and determine the cause of the reboot.
2. Contact technical support if the system fails to reboot.

Reboot (watchdog reset) warning


Message Reboot (watchdog reset) warning
Description The Remote LAN Module (RLM) detects a watchdog reset error.
Corrective action
1. Check the system to verify that it is operational.
2. If your system is operational, run diagnostics on your entire system.
3. Contact technical support if the storage system is not serving data.

RLM heartbeat loss


Message RLM heartbeat loss
Description The Remote LAN Module (RLM) detects the loss of heartbeat from Data ONTAP. The system possibly stopped serving data.

Corrective action
1. Connect to the RLM command-line interface (CLI) to check whether the RLM is operational.
2. Contact technical support if the problem persists.

RLM heartbeat stopped


Message RLM heartbeat stopped
Description The system software cannot see the RLM.

Corrective action
1. Connect to the RLM command-line interface (CLI) to check whether the RLM is operational.
2. Contact technical support if the problem persists.

System boot failed (POST failed)


Message System boot failed (POST failed)
Description The Remote LAN Module (RLM) detects that a system error occurred during the POST and the system software cannot be booted.

Corrective action
1. Run diagnostics on your system.
2. Contact technical support if running diagnostics does not detect any faulty components.

User triggered (RLM test)


Message User triggered (RLM test)
Description The Remote LAN Module (RLM) received the rlm test command, which tests the RLM configuration.

Corrective action No action is necessary.

User_triggered (system nmi)


Message User_triggered (system nmi)
Description A user is initiating a system core dump (nmi) through the Remote LAN Module (RLM).

Corrective action No action is necessary.

User_triggered (system power cycle)


Message User_triggered (system power cycle)
Description A user is initiating a system power-cycle through the Remote LAN Module (RLM).

Corrective action No action is necessary.

User_triggered (system power off)


Message User_triggered (system power off)
Description A user is powering off the system through the Remote LAN Module (RLM).

Corrective action No action is necessary.

User_triggered (system power on)


Message User_triggered (system power on)
Description A user is powering on the system through the Remote LAN Module (RLM).

Corrective action No action is necessary.

User_triggered (system reset)


Message User_triggered (system reset)
Description A user is resetting the system through the Remote LAN Module (RLM).
Corrective action No action is necessary.

EMS messages about the RLM


Data ONTAP generates EMS messages when problems occur with the RLM. These problems might include failed network configuration or firmware update errors.

rlm.driver.hourly.stats
Message rlm.driver.hourly.stats
Severity Warning
Description The system encountered an error while trying to get hourly statistics from the Remote LAN Module (RLM).

Corrective action
1. Check whether the RLM is online by entering the following command at the Data ONTAP prompt:
rlm status
2. If the RLM is operational and the problem persists, enter the following command to reboot the RLM:
rlm reboot

rlm.driver.mailhost
Message rlm.driver.mailhost
Severity Warning
Description This message occurs when Remote LAN Module (RLM) setup verifies whether a mailhost specified in Data ONTAP can be reached. In this case, RLM setup cannot connect to the specified mailhost.

Corrective action
1. Verify that a valid mailhost is configured in Data ONTAP by checking the system AutoSupport configuration.
2. Ensure that Data ONTAP can successfully connect to the specified mailhost by entering a test AutoSupport command.

rlm.driver.network.failure
Message rlm.driver.network.failure
Severity Warning
Description A failure occurred during the network configuration of the Remote LAN Module (RLM). The system could not assign the RLM a Dynamic Host Configuration Protocol (DHCP) or fixed IP address.
Corrective action
1. Check whether the RLM is online by entering the following command at the Data ONTAP prompt:
rlm status
2. If the RLM is operational and the problem persists, enter the following command to reboot the RLM:
rlm reboot

rlm.driver.timeout
Message rlm.driver.timeout
Severity Warning
Description A failure occurred during communication with the Remote LAN Module (RLM).

Corrective action 1. Check whether the RLM is online by entering the following command at the Data ONTAP prompt:
rlm status

2. If the RLM is operational and the problem persists, enter the following command to reboot the RLM:
rlm reboot

rlm.firmware.update.failed
Message rlm.firmware.update.failed
Severity SVC_ERROR
Description An error occurred during an update to the Remote LAN Module (RLM) firmware. The firmware update might have failed for the following reasons:
- An incorrect RLM firmware image or a corrupted image file
- A communication error while sending new firmware to the RLM
- An update failure while applying new firmware at the RLM
- A system reset or loss of power during an update
Corrective action
1. Download the firmware image by entering the following command:
software install http://pathto/RLM_FW.zip -f
2. Make sure that the RLM is still operational by entering the following command at the system prompt:
rlm status
3. Retry updating the RLM firmware. For more information, see the section on updating RLM firmware in the System Administration Guide.
4. If the failure persists, contact technical support.

rlm.firmware.upgrade.reqd
Message rlm.firmware.upgrade.reqd
Severity WARNING
Description The Remote LAN Module (RLM) firmware version and the version of Data ONTAP are incompatible and cannot communicate correctly about a particular capability.

Corrective action Update the firmware version of the RLM to the version recommended for your version of Data ONTAP. For more information, see the section on upgrading RLM firmware in the System Administration Guide.

rlm.firmware.version.unsupported
Message rlm.firmware.version.unsupported
Severity WARNING
Description The firmware on the Remote LAN Module (RLM) is an unsupported version and must be upgraded.
Corrective action Update the firmware version of the RLM to the version recommended for your version of Data ONTAP. For more information, see the section on upgrading RLM firmware in the System Administration Guide.

rlm.heartbeat.bootFromBackup
Message rlm.heartbeat.bootFromBackup
Severity WARNING
Description The system rebooted the Remote LAN Module (RLM) from its backup firmware to restore RLM availability. The RLM is considered unavailable when the system stops receiving heartbeat notifications from the RLM. To restore availability, the system tries to reboot the RLM from the RLM's primary firmware. If that fails, the system tries to reboot the RLM from the RLM's backup firmware. This message is generated if the reboot from backup firmware restores availability.
Corrective action Update the firmware version of the RLM to the version recommended for your version of Data ONTAP. For more information, see the section on upgrading RLM firmware in the System Administration Guide.

rlm.heartbeat.resumed
Message rlm.heartbeat.resumed
Severity WARNING
Description The system detected the resumption of Remote LAN Module (RLM) heartbeat notifications, indicating that the RLM is now available. The earlier issue indicated by the rlm.heartbeat.stopped message was resolved.

Corrective action None needed.

rlm.heartbeat.stopped
Message rlm.heartbeat.stopped
Severity WARNING
Description The system did not receive an expected heartbeat message from the Remote LAN Module (RLM). The RLM and the system exchange heartbeat messages, which they use to detect when one or the other is unavailable.
Corrective action
1. Connect to the RLM CLI.
2. Collect debugging information by entering the following commands:
rlm version
rlm config
priv set advanced
rlm log debug
rlm log messages
3. Run the RLM diagnostics:
a. From the boot loader prompt, enter
boot_diags
b. When the diagnostics main menu appears, select agent.
c. To test the syst/agent/RLM interface, select tests 2 and 6.
4. See the section on troubleshooting RLM problems in the System Administration Guide.
5. If the problem persists, contact technical support.

rlm.network.link.down
Message rlm.network.link.down
Severity WARNING
Description The Remote LAN Module (RLM) detected a link error on the RLM network port. This can happen if a network cable is not plugged into the RLM network port. It can also happen if the network that the RLM is connected to cannot run at 10/100 Mbps.
Corrective action
1. Check whether the network cable is correctly plugged into the RLM network port.
2. Check the link status LED on the RLM.
3. Verify that the network that the RLM is connected to supports autonegotiation to 10/100 Mbps or is running at one of those speeds; otherwise, RLM network connectivity does not work.

rlm.notConfigured
Message rlm.notConfigured
Severity WARNING
Description This message occurs weekly to remind you to configure the Remote LAN Module (RLM). The RLM is a physical device that is incorporated into your system to provide remote access and remote management capabilities. To use the full functionality of the RLM, you need to configure it first.
Corrective action
1. Use the rlm setup command to configure the RLM. If necessary, use the rlm status command to obtain its MAC address.
2. Use the rlm status command to verify the RLM network configuration.
3. Use the rlm test autosupport command to verify that the RLM can send AutoSupport e-mail. Note that AutoSupport mailhosts and recipients must be properly configured in Data ONTAP before issuing this command.

rlm.orftp.failed
Message rlm.orftp.failed
Severity WARNING
Description A communication error occurred while sending or receiving information from the Remote LAN Module (RLM).

Corrective action 1. Check whether the RLM is operational by entering the following command at the Data ONTAP prompt:
rlm status

2. If the RLM is operational and this error persists, enter the following command to reboot the RLM:
rlm reboot


3. If this message persists after you reboot the RLM, contact technical support.

rlm.snmp.traps.off
Message rlm.snmp.traps.off
Severity INFO
Description The advanced privilege level in Data ONTAP was used to disable the SNMP trap feature of the Remote LAN Module (RLM). This message occurs at boot. This message also occurs when the SNMP trap capability was disabled and a user invokes a Data ONTAP command to use the RLM to send an SNMP trap.
Corrective action To enable RLM SNMP trap support, set the rlm.snmp.traps option to On.

rlm.systemDown.alert
Message rlm.systemDown.alert
Severity ALERT
Description System remote management detected a system down event. This is only an SNMP trap that is sent out by the Remote LAN Module (RLM) firmware. The trap includes a string describing the specific event that triggered the trap. The string is structured in the following form with key=value pairs:
Remote Management Event: type={system_down|system_up|test|keep_alive}, severity={alert|warning|notice|normal|debug|info}, event={post_error|watchdog_reset|power_loss}
Corrective action
1. Check the system to verify that it has power and is operational.
2. If your system is operational, run diagnostics on your entire system.
3. Contact technical support if the system is not serving data.

rlm.systemDown.notice
Message rlm.systemDown.notice
Severity NOTICE
Description System remote management detected a system down event. This is only an SNMP trap that is sent out by the Remote LAN Module (RLM) firmware. The trap includes a string describing the specific event that triggered the trap. The string is structured in the following form with key=value pairs:
Remote Management Event: type={system_down|system_up|test|keep_alive}, severity={alert|warning|notice|normal|debug|info}, event={power_off_via_rlm|power_cycle_via_rlm|reset_via_rlm}
Corrective action
1. Check the system to verify that it has power and is operational.
2. If your system is operational, run diagnostics on your entire system.
3. Consult technical support if the system is not serving data.

rlm.systemDown.warning
Message rlm.systemDown.warning
Severity WARNING
Description System remote management detected a system down event. This is only an SNMP trap that is sent out by the Remote LAN Module (RLM) firmware. The trap includes a string describing the specific event that triggered the trap. The string is structured in the following form with key=value pairs:
Remote Management Event: type={system_down|system_up|test|keep_alive}, severity={alert|warning|notice|normal|debug|info}, event={loss_of_heartbeat}
Corrective action
1. Check the system to verify that it has power and is operational.
2. If your system is operational, run diagnostics on your entire system.
3. Consult technical support if the system is not serving data.

rlm.systemPeriodic.keepAlive
Message rlm.systemPeriodic.keepAlive
Severity INFO
Description System remote management sent a periodic keep-alive event. This is only an SNMP trap that is sent out by the Remote LAN Module (RLM) firmware. The trap includes a string describing the specific event that triggered the trap. The string is structured in the following form with key=value pairs:
Remote Management Event: type={system_down|system_up|test|keep_alive}, severity={alert|warning|notice|normal|debug|info}, event={periodic_message}
Corrective action None needed.


rlm.systemTest.notice
Message rlm.systemTest.notice
Severity NOTICE
Description System remote management detected a test event. This is only an SNMP trap that is sent out by the Remote LAN Module (RLM) firmware. The trap includes a string describing the specific event that triggered the trap. The string is structured in the following form with key=value pairs:
Remote Management Event: type={system_down|system_up|test|keep_alive}, severity={alert|warning|notice|normal|debug|info}, event={test}
Corrective action None needed.
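A trap receiver can split the key=value string carried by these traps and map the event value back to the EMS message it belongs to. The following Python sketch is illustrative only. It assumes that a real trap string carries exactly one value per key (for example, type=system_down) and that pairs are separated by commas, as in the forms shown for the rlm.systemDown, rlm.systemPeriodic.keepAlive, and rlm.systemTest.notice messages. The names parse_trap_string and EVENT_TO_EMS are hypothetical. The event-to-message table only restates the event values listed in this guide.

```python
from typing import Dict

PREFIX = "Remote Management Event:"

# Event values as listed in this guide for each RLM trap-related EMS message.
EVENT_TO_EMS = {
    "post_error": "rlm.systemDown.alert",
    "watchdog_reset": "rlm.systemDown.alert",
    "power_loss": "rlm.systemDown.alert",
    "power_off_via_rlm": "rlm.systemDown.notice",
    "power_cycle_via_rlm": "rlm.systemDown.notice",
    "reset_via_rlm": "rlm.systemDown.notice",
    "loss_of_heartbeat": "rlm.systemDown.warning",
    "periodic_message": "rlm.systemPeriodic.keepAlive",
    "test": "rlm.systemTest.notice",
}


def parse_trap_string(text: str) -> Dict[str, str]:
    """Split a 'Remote Management Event:' string into its key=value pairs."""
    text = text.strip()
    if not text.startswith(PREFIX):
        raise ValueError("not a Remote Management Event string")
    fields: Dict[str, str] = {}
    for pair in text[len(PREFIX):].split(","):
        key, sep, value = pair.partition("=")
        if sep:  # ignore fragments that contain no '='
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    # Sample string built from the documented form; an assumption, not a capture.
    sample = "Remote Management Event: type=system_down, severity=alert, event=post_error"
    fields = parse_trap_string(sample)
    print(fields, EVENT_TO_EMS.get(fields.get("event", "")))
```

An event value that is not in the table returns None from the lookup, so unknown events can be logged rather than misclassified.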

rlm.userlist.update.failed
Message rlm.userlist.update.failed
Severity WARNING
Description There was an error while updating user information for the Remote LAN Module (RLM). When user information is updated on Data ONTAP, the RLM is also updated with the new changes. This enables users to log in to the RLM.
Corrective action
1. Check whether the RLM is operational by entering the following command at the Data ONTAP prompt:
rlm status
2. If the RLM is operational and this error persists, reboot the RLM by entering the following command:
rlm reboot
3. Retry the operation that caused the error message.
4. If this message persists after you reboot the RLM, contact technical support.


BMC messages
The BMC provides remote platform management capabilities on FAS20xx and SA200 systems. BMC capabilities include remote access, monitoring, troubleshooting, logging, and alerting features. The BMC sends AutoSupport messages through its independent management interface, regardless of the state of the system.

How and when BMC AutoSupport e-mail notifications are sent


BMC e-mail notifications are sent to configured recipients designated by the AutoSupport feature. The e-mail notifications have the title "System Alert from BMC of filer serial number," followed by the message type. The serial number is that of the controller with which the BMC is associated. Typical BMC-generated AutoSupport messages occur under the following conditions:
- The system reboots unexpectedly.
- A system reboot fails.
- A user-issued action triggers an AutoSupport message.

What BMC e-mail notifications include


The different parts of BMC e-mail messages contain information about your system. BMC e-mail notifications include the following information:
- Subject line: a system notification from the BMC of the system, listing the system condition or event that caused the AutoSupport message and the log level.
- Message body: the IP address, netmask, and other information about the system.
- Attachments: system configuration and sensor information.

BMC-generated AutoSupport messages


The BMC can generate a variety of messages telling you of problems or events occurring on your system.


BMC_ASUP_UNKNOWN
Message: BMC_ASUP_UNKNOWN
Description: Unknown Baseboard Management Controller (BMC) error.
Corrective action: Report the problem to technical support.

REBOOT (abnormal)
Message: REBOOT (abnormal)
Description: An abnormal reboot occurred.
Corrective action: Verify that the system has returned to operation.

REBOOT (power loss)


Message: REBOOT (power loss)
Description: A power failure was detected, and the system restarted. This occurs when the system is power-cycled by the external switches or in a true power loss.
Corrective action: Verify that the system has returned to operation.

REBOOT (watchdog reset)


Message: REBOOT (watchdog reset)
Description: The system stopped responding and was rebooted by the Baseboard Management Controller (BMC). This occurs when the BMC watchdog is triggered.
Corrective action: Verify that the system has returned to operation.

SYSTEM_BOOT_FAILED (POST failed)


Message: SYSTEM_BOOT_FAILED (POST failed)
Description: The system failed to pass the BIOS POST. This occurs when the BIOS status sensor is in a failed or hung state.
Corrective action:
1. Issue a system reset backup command from the Baseboard Management Controller (BMC) console, and if the system can come up to the boot loader, issue the flash command to update the primary BIOS firmware.
2. If the system is still nonresponsive, contact technical support.


SYSTEM_POWER_OFF (environment)
Message: SYSTEM_POWER_OFF (environment)
Description: An environmental sensor entered a critical, nonrecoverable state, and Data ONTAP has been requested to power off the system.
Corrective action: Verify the environmental conditions of the system.

USER_TRIGGERED (bmc test)


Message: USER_TRIGGERED (bmc test)
Description: A user triggered the Baseboard Management Controller (BMC) AutoSupport internal test through the BMC console, Systems Management Architecture for Server Hardware (SMASH), or Intelligent Platform Management Interface (IPMI).
Corrective action: Verify that the command was issued by an authorized user.

USER_TRIGGERED (system nmi)


Message: USER_TRIGGERED (system nmi)
Description: A user requested a core dump through the BMC console, SMASH, or IPMI.
Corrective action: Verify that the command was issued by an authorized user.

USER_TRIGGERED (system power cycle)


Message: USER_TRIGGERED (system power cycle)
Description: A user issued a power-cycle command through the Baseboard Management Controller (BMC) console, Systems Management Architecture for Server Hardware (SMASH), or Intelligent Platform Management Interface (IPMI).
Corrective action: Verify that the command was issued by an authorized user.

USER_TRIGGERED (system power off)


Message: USER_TRIGGERED (system power off)
Description: A user issued a power off command through the Baseboard Management Controller (BMC) console, Systems Management Architecture for Server Hardware (SMASH), or Intelligent Platform Management Interface (IPMI).
Corrective action: Verify that the command was issued by an authorized user.


USER_TRIGGERED (system power on)


Message: USER_TRIGGERED (system power on)
Description: A user issued a power on command through the Baseboard Management Controller (BMC) console, Systems Management Architecture for Server Hardware (SMASH), or Intelligent Platform Management Interface (IPMI).
Corrective action: Verify that the command was issued by an authorized user.

USER_TRIGGERED (system power soft-off)


Message: USER_TRIGGERED (system power soft-off)
Description: A user issued a power soft-off command through the Baseboard Management Controller (BMC) console, Systems Management Architecture for Server Hardware (SMASH), or Intelligent Platform Management Interface (IPMI).
Corrective action: Verify that the command was issued by an authorized user.

USER_TRIGGERED (system reset)


Message: USER_TRIGGERED (system reset)
Description: A user issued a reset command through the Baseboard Management Controller (BMC) console, Systems Management Architecture for Server Hardware (SMASH), or Intelligent Platform Management Interface (IPMI).
Corrective action: Verify that the command was issued by an authorized user.

EMS messages about the BMC


The EMS might send messages to your system console about the BMC.

bmc.asup.crit
Message: bmc.asup.crit
Description: This message occurs when the Baseboard Management Controller (BMC) sends an AutoSupport message of a CRITICAL priority.
Corrective action: The action you take depends on whether the operating environment for the system, storage, or associated cabling has changed. If the operating environment has changed, shut down and power off the system until the environment is restored to normal operations. If the operating environment has not changed, check for previous errors and warnings. Also check for hardware statistics from Fibre Channel, SCSI, disk drives, other communications mechanisms, and previous administrative activities.

bmc.asup.error
Message: bmc.asup.error
Description: This message occurs when the Baseboard Management Controller (BMC) fails to construct the necessary attachments of an AutoSupport message.
Corrective action: This message indicates an internal error with the BMC's AutoSupport processing. Contact technical support.

bmc.asup.init
Message: bmc.asup.init
Description: This message occurs when the Baseboard Management Controller (BMC) fails to initialize its AutoSupport subsystem due to a lack of resources.
Corrective action: This message indicates an internal error with the BMC's AutoSupport processing. Contact technical support.

bmc.asup.queue
Message: bmc.asup.queue
Description: This message occurs when the Baseboard Management Controller (BMC) has too many outstanding AutoSupport messages and no longer has enough resources to service them.
Corrective action: This message might indicate an issue with your AutoSupport configuration.
1. Ensure that your system is configured to use the correct AutoSupport SMTP mail host, and that the mail host is properly configured to handle AutoSupport messages originating from the BMC.
2. For additional help, contact technical support.

bmc.asup.send
Message: bmc.asup.send
Description: This message occurs when the Baseboard Management Controller (BMC) sends an AutoSupport message.
Corrective action:
1. Follow the corrective action recommended for the AutoSupport message that was sent.
2. For additional help, contact technical support.

bmc.asup.smtp
Message: bmc.asup.smtp
Description: This message occurs when the Baseboard Management Controller (BMC) fails to contact the mail host when attempting to send an AutoSupport message.
Corrective action: This message indicates an issue with your AutoSupport configuration.
1. Ensure that your system is configured to use the correct AutoSupport SMTP mail host and that the mail host is properly configured to handle AutoSupport messages originating from the BMC.
2. For additional help, contact technical support.

bmc.batt.id
Message: bmc.batt.id
Description: This message occurs when the Baseboard Management Controller (BMC) cannot read the part number information stored in the battery configuration firmware.
Corrective action: Contact technical support for the current procedure to determine whether the battery failed.

bmc.batt.invalid
Message: bmc.batt.invalid
Description: This message occurs when the Baseboard Management Controller (BMC) determines that the battery installed is not the correct model for your system.
Corrective action: Contact technical support to request the appropriate replacement battery for your model of system.

bmc.batt.mfg
Message: bmc.batt.mfg
Description: This message occurs when the Baseboard Management Controller (BMC) cannot read the manufacturer information stored in the battery configuration firmware.
Corrective action: Contact technical support for the current procedure to determine whether the battery failed.


bmc.batt.rev
Message: bmc.batt.rev
Description: This message occurs when the Baseboard Management Controller (BMC) cannot read the revision code stored in the battery configuration firmware.
Corrective action: Contact technical support for the current procedure to determine whether the battery failed.

bmc.batt.seal
Message: bmc.batt.seal
Description: This message occurs when the Baseboard Management Controller (BMC) cannot seal the battery's configuration information after a battery upgrade.
Corrective action: Contact technical support for the current procedure to determine whether the battery failed.

bmc.batt.unknown
Message: bmc.batt.unknown
Description: This message occurs when the Baseboard Management Controller (BMC) determines that the installed battery is not a recognized part that is approved for use in your system.
Corrective action: Contact technical support to request the appropriate replacement battery for your model of system.

bmc.batt.unseal
Message: bmc.batt.unseal
Description: This message occurs when the Baseboard Management Controller (BMC) cannot unseal the battery's configuration information to determine whether the battery firmware requires an upgrade.
Corrective action: Contact technical support for the current procedure to determine whether the battery failed.

bmc.batt.upgrade
Message: bmc.batt.upgrade
Description: The Baseboard Management Controller (BMC) generates this message before an upgrade of the battery's configuration firmware to show the present and new revisions of the battery configuration.
Corrective action: None.

bmc.batt.upgrade.busy
Message: bmc.batt.upgrade.busy
Description: This message occurs when the Baseboard Management Controller (BMC) determines that the battery configuration firmware requires an upgrade, but that the BMC is too busy to perform the upgrade.
Corrective action: It is normal to get this message one time after a BMC upgrade. However, if this message is issued more than once, it indicates a problem with your system. Contact technical support for the current procedure to determine whether your system needs to be replaced.

bmc.batt.upgrade.failed
Message: bmc.batt.upgrade.failed
Description: This message occurs when the Baseboard Management Controller (BMC) cannot upgrade the battery configuration firmware to the latest revision.
Corrective action: In most cases, this error does not impact the functionality of your system, but replacing the battery might be advised at your next maintenance window. Contact technical support for the current procedure to determine whether the battery needs to be replaced.

bmc.batt.upgrade.failure
Message: bmc.batt.upgrade.failure
Description: The Baseboard Management Controller (BMC) generates this message for every configuration item in the battery configuration firmware that could not be updated during a battery upgrade.
Corrective action:
1. Remove and reinsert the controller module. In most cases, this forces the BMC to reattempt and successfully upgrade the battery.
2. If you see this message more than once, contact technical support for the current procedure to determine whether the battery needs to be replaced.


bmc.batt.upgrade.ok
Message: bmc.batt.upgrade.ok
Description: This message occurs when the entire battery upgrade process is complete.
Corrective action: None.

bmc.batt.upgrade.power-off
Message: bmc.batt.upgrade.power-off
Description: This message occurs in the rare event where the Baseboard Management Controller (BMC) cannot turn on system power, and the battery has not been checked to determine whether it requires a configuration upgrade.
Corrective action:
1. Remove and reinsert the controller module.
2. If you continue to see this message, contact technical support for the current procedure to determine whether the controller module needs to be replaced.

bmc.batt.upgrade.voltagelow
Message: bmc.batt.upgrade.voltagelow
Description: The Baseboard Management Controller (BMC) generates this message when the battery is discharged to below 6.0V and the battery requires a configuration firmware update. This message is printed every 10 minutes until the battery is recharged.
Corrective action: If you continue to see this message after one hour, contact technical support for the current procedure to determine whether the battery needs to be replaced.
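
The one-hour rule for bmc.batt.upgrade.voltagelow can be checked from the times at which the message was logged. The following Python sketch is an illustration under stated assumptions: the function name should_contact_support is hypothetical, the timestamps are assumed to have been extracted from the console log already, and "after one hour" is read as one hour or more between the first and the latest occurrence.

```python
from datetime import datetime, timedelta

# Threshold from the corrective action: message still seen after one hour.
ESCALATE_AFTER = timedelta(hours=1)


def should_contact_support(timestamps):
    """Return True if the message has kept recurring for one hour or more.

    Hypothetical helper: 'timestamps' holds the datetime of every logged
    bmc.batt.upgrade.voltagelow message, in any order.
    """
    if len(timestamps) < 2:
        return False
    ordered = sorted(timestamps)
    return ordered[-1] - ordered[0] >= ESCALATE_AFTER


if __name__ == "__main__":
    first = datetime(2011, 11, 1, 8, 0)
    # Seven messages at the documented 10-minute interval span 60 minutes.
    seen = [first + timedelta(minutes=10 * i) for i in range(7)]
    print(should_contact_support(seen))
```

At the documented 10-minute interval, the seventh consecutive message is the first one that falls a full hour after the initial message.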

bmc.batt.voltage
Message: bmc.batt.voltage
Description: This message occurs in the rare event where the Baseboard Management Controller (BMC) determines that the battery configuration firmware requires an update and the battery is successfully prepared for the update, but the BMC cannot read the battery voltage sensor.
Corrective action: Contact technical support for the current procedure to determine whether the battery needs to be replaced.


bmc.config.asup.off
Message: bmc.config.asup.off
Description: This message occurs in the rare event that the Baseboard Management Controller (BMC) detects corruption in the BMC's internal cached copy of the AutoSupport mail host and/or configured destinations. AutoSupport messages from the BMC are disabled until the system boots.
Corrective action: Boot the system to ensure that the BMC's cache of the AutoSupport configuration is correct.

bmc.config.corrupted
Message: bmc.config.corrupted
Description: This message occurs in the rare event that the Baseboard Management Controller (BMC) internal configuration is corrupted and is being reset to defaults. Notably, the Secure Shell (SSH) service on the BMC LAN interface is disabled until the system boots.
Corrective action:
1. Boot the system. Upon boot, the SSH host keys for the BMC are regenerated. The previous host keys for the BMC are no longer valid and cannot be used for logins.
2. Contact technical support to determine whether your system needs maintenance.

bmc.config.default
Message: bmc.config.default
Description: This message occurs in the rare event that the Baseboard Management Controller (BMC) internal configuration is corrupted and is being reset to defaults. Notably, the Secure Shell (SSH) service on the BMC LAN interface is disabled until the system boots.
Corrective action:
1. Boot the system. Upon boot, the SSH host keys for the BMC are regenerated. The previous host keys for the BMC are no longer valid and cannot be used for logins.
2. Contact technical support to determine whether your system needs maintenance.

bmc.config.default.pef.filter
Message: bmc.config.default.pef.filter
Description: This message occurs in the rare event that the Baseboard Management Controller (BMC) internal configuration is corrupted and is being reset to defaults. Notably, the BMC's Platform Event Filter (PEF) tables are being cleared to factory defaults.
Corrective action: Most users need to take no action. However, if you want to use custom Intelligent Platform Management Interface (IPMI) PEF tables, you need to reenable the BMC IPMI LAN interface, and reload any custom PEF tables that might be defined for your site.

bmc.config.default.pef.policy
Message: bmc.config.default.pef.policy
Description: This message occurs in the rare event that the Baseboard Management Controller (BMC) internal configuration is corrupted and is being reset to defaults. Notably, the BMC's Platform Event Filter (PEF) tables are being cleared to factory defaults.
Corrective action: Most users need to take no action. However, if you want to use custom Intelligent Platform Management Interface (IPMI) PEF tables, you need to reenable the BMC IPMI LAN interface, and reload any custom PEF tables that might be defined for your site.

bmc.config.fru.systemserial
Message: bmc.config.fru.systemserial
Description: This message occurs when the Baseboard Management Controller (BMC) detects an invalid System Serial Number field in the system's field-replaceable unit (FRU) configuration area.
Corrective action: Contact technical support to determine the maintenance procedure for your system.

bmc.config.mac.error
Message: bmc.config.mac.error
Description: This message occurs when the Baseboard Management Controller (BMC) Ethernet Media Access Control (MAC) identifier is invalid.
Corrective action: Contact technical support to determine the corrective procedure for your system.

bmc.config.net.error
Message: bmc.config.net.error
Description: This message occurs when the Baseboard Management Controller (BMC) cannot start networking support on the BMC LAN interface.
Corrective action: Contact technical support to determine the corrective procedure for your system.

bmc.config.upgrade
Message: bmc.config.upgrade
Description: This message occurs when the Baseboard Management Controller (BMC) internal configuration defaults are updated.
Corrective action: None.

bmc.power.on.auto
Message: bmc.power.on.auto
Description: This message occurs when, upon power up, the Baseboard Management Controller (BMC) detects that the system was previously soft powered-off.
Corrective action: None.

bmc.reset.ext
Message: bmc.reset.ext
Description: This message occurs when the Baseboard Management Controller (BMC) detects that a bmc reboot command was issued on the system previously.
Corrective action: None.

bmc.reset.int
Message: bmc.reset.int
Description: This message occurs when the Baseboard Management Controller (BMC) was reset through the BMC command sequence ngs smash; set reboot=1; priv set diag.
Corrective action: None.

bmc.reset.power
Message: bmc.reset.power
Description: This message occurs when the Baseboard Management Controller (BMC) detects a system power up, or after the BMC is upgraded.
Corrective action: None.

bmc.reset.repair
Message: bmc.reset.repair
Description: This message occurs when the Baseboard Management Controller (BMC) detects and corrects an internal BMC error.
Corrective action: If you receive this message frequently, contact technical support to determine the corrective procedure for your system.

bmc.reset.unknown
Message: bmc.reset.unknown
Description: This message occurs when the Baseboard Management Controller (BMC) cannot determine why it was reset.
Corrective action: This message usually indicates a BMC internal error. Contact technical support to determine the corrective procedure for your system.

bmc.sensor.batt.charger.off
Message: bmc.sensor.batt.charger.off
Description: This message occurs when the Baseboard Management Controller (BMC) detects that the battery charger cannot be disabled for the hourly battery load test.
Corrective action: Contact technical support to determine the corrective procedure for your system.

bmc.sensor.batt.charger.on
Message: bmc.sensor.batt.charger.on
Description: This message occurs when the Baseboard Management Controller (BMC) cannot reenable the battery charger after the hourly battery load test.
Corrective action: Contact technical support to determine the corrective procedure for your system.

bmc.sensor.batt.time.run.invalid
Message: bmc.sensor.batt.time.run.invalid
Description: This message occurs when the Baseboard Management Controller (BMC) detects that the battery's calculated run time differs substantially from the battery's run-time sensor.
Corrective action: None.

bmc.ssh.key.missing
Message: bmc.ssh.key.missing
Description: This message occurs when the Baseboard Management Controller (BMC) detects that the Secure Shell (SSH) host keys for the BMC are corrupted or missing.
Corrective action: Reboot the system. The boot sequence regenerates the host key and makes the BMC SSH service available again.


Service Processor messages


The Service Processor (SP) enables you to access, monitor, and troubleshoot 2240, 32xx, 62xx, SA320, and SA620 storage systems remotely. Two types of messages are associated with the SP and can help you monitor your system and troubleshoot problems:
- The SP sends AutoSupport messages when certain problems occur. These might include a loss of heartbeat or a reboot failure.
- Data ONTAP generates EMS messages when SP events and errors occur. These might include a reminder to configure the SP or an alert to an SP communication problem.
Note: For more information about what the SP does, see the System Administration Guide for the version of Data ONTAP that your system is running.

When and how SP AutoSupport e-mail messages are sent


The SP generates AutoSupport e-mail messages when the system goes down or when certain problems occur. The SP sends the messages under the following conditions:
- The storage system reboots unexpectedly.
- The storage system stops communicating with the SP.
- A watchdog reset occurs. The watchdog is a built-in hardware sensor that monitors the storage system for a hung or unresponsive condition. If the watchdog detects such a condition, it resets the storage system so that the system can automatically reboot and begin functioning.
- The storage system is power-cycled.
- Firmware power-on self-test (POST) errors occur.
- A user initiates an AutoSupport message.
- A user resets the system using the SP.

The subject line of e-mail messages contains the word Notification and includes the host name of the system and the message type. The following text shows an example of an SP AutoSupport e-mail subject line:
System Notification from host_name (HEARTBEAT_LOSS) [WARNING]
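
A mail filter can pull the host name, message type, and log level out of such a subject line. The following Python sketch is illustrative only: the function name parse_sp_subject is hypothetical, and the pattern is inferred from the single example above rather than from a published format. The closing parenthesis after the message type is treated as optional, and message types that contain their own parentheses, such as REBOOT (abnormal), are assumed to appear inside the outer pair.

```python
import re

# Pattern inferred from the example subject line; an assumption,
# not a documented format.
SUBJECT_RE = re.compile(
    r"^System Notification from (?P<host>\S+)\s+"
    r"\((?P<message_type>.*?)\)?\s*"
    r"\[(?P<level>[A-Z]+)\]\s*$"
)


def parse_sp_subject(subject):
    """Return host, message type, and log level, or None if no match."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    return match.groupdict()
```

Returning None for subjects that do not match lets a filter pass unrelated mail through untouched.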

Messages are sent to recipients that you designate when you configure AutoSupport in Data ONTAP.
Note: The SP must be properly configured to send AutoSupport messages. For information about configuring the SP, see the System Administration Guide and the Software Setup Guide for the version of Data ONTAP that your system is running.


What SP AutoSupport e-mail messages include


SP AutoSupport e-mail messages have different sections that contain different kinds of information about your system. SP e-mail messages include the following sections and information:
- Subject line: a system notification from the SP of the system, stating the system condition or event that caused the AutoSupport message and the log level.
- Message body: the SP configuration and version information, the system ID, serial number, model, and host name.
- Attachments: System Event Logs, the system sensor state as determined by the SP, and console logs.

When and how SP EMS messages are sent


Data ONTAP generates EMS messages when problems occur with the SP and displays them on the system console. Problems that trigger EMS messages might include installation of the wrong version of firmware, communication failure, or a network configuration failure. The console message includes the name of the EMS message and a brief description of the event or problem. The following text contains an example of an SP EMS message:
Date [sp.notConfigured:warning] The system's Service Processor (SP) is not configured. Use the 'sp setup' command to configure it.
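
Console lines in this form can be split into the date, the EMS message name, the severity, and the description. The following Python sketch is illustrative only: the function name parse_ems_line is hypothetical, and the pattern assumes the layout of the example above, with the name and severity enclosed in square brackets and separated by a colon.

```python
import re

# Layout assumed from the example line: <date> [<name>:<severity>] <text>
EMS_RE = re.compile(
    r"^(?P<date>.*?)\s*"
    r"\[(?P<name>[A-Za-z0-9_.\-]+):(?P<severity>[A-Za-z]+)\]\s*"
    r"(?P<text>.*)$"
)


def parse_ems_line(line):
    """Return the parts of an EMS console line, or None if no match."""
    match = EMS_RE.match(line.strip())
    return match.groupdict() if match else None
```

The name field can then be looked up in the EMS message entries later in this chapter to find the corrective action.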

SP-generated AutoSupport messages


The SP continuously monitors the system's health and generates AutoSupport messages when problems occur.

HEARTBEAT_LOSS
Message: HEARTBEAT_LOSS
Description: This message is sent by the Service Processor (SP) when it detects loss of heartbeat from Data ONTAP, possibly because the system has stopped serving data.
Corrective action: If this was a manually triggered or expected reboot, no action is needed. Otherwise, complete the following steps:
1. Check the status of the system and determine whether it is operational.
2. Contact technical support.

REBOOT (abnormal)
Message: REBOOT (abnormal)
Description: This message is sent by the Service Processor (SP) when it detects an abnormal reboot of the system.
Corrective action: If this was a manually triggered or expected reboot, no action is needed. Otherwise, complete the following steps:
1. Check the status of the system and determine the cause of reboot.
2. If the system fails to boot, contact technical support.

SYSTEM_BOOT_FAILED (POST failed)


Message: SYSTEM_BOOT_FAILED (POST failed)
Description: This message is sent by the Service Processor (SP) when the system firmware has a power-on self-test (POST) failure and cannot load and run Data ONTAP.
Corrective action:
1. Run diagnostics on your system.
2. Contact technical support.

USER_TRIGGERED (sp test)


Message: USER_TRIGGERED (sp test)
Description: This message is sent by the Service Processor (SP) when the sp test autosupport command is run from the Data ONTAP CLI. This is a test mechanism to verify the SP configuration.
Corrective action: None.

USER_TRIGGERED (system nmi)


Message: USER_TRIGGERED (system nmi)
Description: This message is sent by the Service Processor (SP) when a user issues a system core dump (NMI) SP command.
Corrective action: None.


USER_TRIGGERED (system power cycle)


Message: USER_TRIGGERED (system power cycle)
Description: This message is sent by the Service Processor (SP) when a user power-cycles the system using the SP.
Corrective action: None.

USER_TRIGGERED (system power off)


Message: USER_TRIGGERED (system power off)
Description: This message is sent by the Service Processor (SP) when a user powers off the system using the SP.
Corrective action: None.

USER_TRIGGERED (system reset)


Message: USER_TRIGGERED (system reset)
Description: This message is sent by the Service Processor (SP) when a user resets the system using the SP.
Corrective action: None.

EMS messages about the SP


Data ONTAP generates EMS messages when problems occur with the SP.

sp.firmware.upgrade.reqd
Message: sp.firmware.upgrade.reqd
Severity: WARNING
Description: This message occurs when the Service Processor (SP) firmware version and the Data ONTAP software version are incompatible and cannot communicate correctly about a particular capability.
Corrective action: Update the firmware version of the SP to the version recommended for your version of Data ONTAP. The firmware and update instructions are available on the NetApp Support Site. After you update the firmware, this message should no longer occur. If the message occurs again, contact technical support and explain that you already updated the firmware to the recommended version.


sp.firmware.version.unsupported
Message: sp.firmware.version.unsupported
Severity: WARNING
Description: This message occurs when the firmware on the Service Processor (SP) is an unsupported version and must be upgraded.
Corrective action: The firmware and instructions are available on the NOW site. After the SP is running the new firmware, this message should no longer occur. If the message occurs again, contact technical support and explain that you already updated the firmware to the recommended version.

sp.heartbeat.resumed
Message: sp.heartbeat.resumed
Severity: INFO
Description: This message occurs when the system detects resumption of Service Processor (SP) heartbeat notifications, indicating that the SP is now available. The earlier issue indicated by the sp.heartbeat.stopped event has been resolved.
Corrective action: None.

sp.heartbeat.stopped
Message: sp.heartbeat.stopped
Severity: WARNING
Description: This message occurs when Data ONTAP does not receive expected Service Processor (SP) heartbeat notifications. The SP and Data ONTAP exchange heartbeat messages so that they can detect when one or the other is unavailable. This event is generated when Data ONTAP has not received an expected heartbeat message from the SP.

Corrective action:
1. Connect to the SP CLI and enter the following commands:
sp version
priv set advanced
sp log debug
sp log messages
2. Run SP system diagnostics.
3. If you still see this EMS message, contact technical support.
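
Because sp.heartbeat.resumed reports that an earlier sp.heartbeat.stopped condition has cleared, the current heartbeat state can be derived from the order of the two events. The following Python sketch is illustrative only: the function name heartbeat_is_down is hypothetical, and it assumes the EMS message names have already been extracted from the console log in chronological order.

```python
def heartbeat_is_down(event_names):
    """Return True if the most recent heartbeat event is a stop.

    Hypothetical helper: 'event_names' is a chronological sequence of
    EMS message names; names other than the two heartbeat events are
    ignored.
    """
    down = False
    for name in event_names:
        if name == "sp.heartbeat.stopped":
            down = True
        elif name == "sp.heartbeat.resumed":
            down = False
    return down
```

Only when the function returns True does the corrective action for sp.heartbeat.stopped still apply.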

sp.network.link.down
Message: sp.network.link.down
Severity: WARNING
Description: This message occurs when the Service Processor (SP) detects a link error on the SP network port. This can happen if a network cable is not plugged into the SP network port. It can also happen if the network that the SP is connected to cannot run at 10/100 Mbps.

Corrective action:
1. Check whether the network cable is correctly plugged into the SP network port.
2. Check the link status LED on the SP.
3. Verify that the network that the SP is connected to supports autonegotiation to 10/100 Mbps or is running at one of those speeds; otherwise, SP network connectivity does not work. The SP supports a 10/100 Mbps Ethernet network in autonegotiation mode.

sp.notConfigured
Message: sp.notConfigured
Severity: WARNING
Description: This message occurs weekly to remind you to configure the Service Processor (SP). The SP is a physical device that is incorporated into your system to provide remote access and remote management capabilities. To use the full functionality of the SP, you must configure it first.

Corrective action: Ensure that AutoSupport mail hosts and recipients are properly configured in Data ONTAP, and then take the following actions:
1. Configure the SP by entering the following command:
sp setup
If necessary, use the sp status command to obtain the SP's MAC address.
2. Verify the SP network configuration by entering the following command:
sp status
3. Verify that the SP can send AutoSupport messages by entering the following command:
sp test autosupport


sp.orftp.failed
Message: sp.orftp.failed
Severity: WARNING
Description: This message occurs when there is a communication error while sending information to or receiving information from the Service Processor (SP). This error could be due to the following reasons:
- Communication error while the information is being sent or received.
- SP is nonoperational.

Corrective action:
1. Check whether the SP is operational by entering the following command at the Data ONTAP prompt:
sp status
2. If the SP is operational and this message persists, reboot the SP by entering the following command at the Data ONTAP prompt:
sp reboot
3. If this message persists after you reboot the SP, contact technical support.

sp.snmp.traps.off
Message: sp.snmp.traps.off
Severity: INFO
Description: This message occurs each time a system boots, if the advanced privilege level in Data ONTAP was used to disable the SNMP Trap feature of the Service Processor (SP). This message also occurs when the SNMP Trap capability is disabled and a user invokes a Data ONTAP command to use the SP to send an SNMP trap.
Corrective action: SP SNMP Trap support is currently disabled. To enable this feature, set the sp.snmp.traps option to On.

sp.userlist.update.failed

Message
sp.userlist.update.failed

Severity
WARNING

Description
This message occurs when there is an error updating user information for the Service Processor (SP). When user information is updated in Data ONTAP, the SP is also updated with the new changes. This enables users to log in to the SP. The user information update might have failed for the following reasons:
• Communication error with the SP.
• SP might not be operational.

Corrective action
1. Check whether the SP is operational by entering the following command at the Data ONTAP prompt:
   sp status
2. If the SP is operational and this message persists, reboot the SP by entering the following command at the Data ONTAP prompt:
   sp reboot
3. Retry the operation that caused the error message.
4. If this message persists after you reboot the SP, contact technical support.

spmgmt.driver.hourly.stats

Message
spmgmt.driver.hourly.stats

Severity
WARNING

Description
This message occurs when the system encounters an error while trying to get hourly statistics from the Service Processor (SP). The error could be due to the following reasons:
• Communication error with the SP.
• SP is not operational.

Corrective action
1. Check whether the SP is online by entering the following command at the Data ONTAP prompt:
   sp status
2. If the SP is online and this message persists, reboot the SP by entering the following command at the Data ONTAP prompt:
   sp reboot
3. If this message persists after you reboot the SP, contact technical support.

spmgmt.driver.mailhost

Message
spmgmt.driver.mailhost

Severity
WARNING

Description
This message occurs when the Service Processor (SP) setup attempts to verify whether a mailhost specified in Data ONTAP can be reached. In this case, SP setup cannot connect to the specified mailhost.

Corrective action
1. Verify that a valid mailhost is configured in Data ONTAP by checking the system AutoSupport configuration.
2. Ensure that Data ONTAP can successfully connect to the specified mailhost by invoking an AutoSupport test command.
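As a supplementary check, you can test whether the mailhost accepts TCP connections at all. The following Python sketch is illustrative only: it assumes the mailhost listens for SMTP on TCP port 25, and it runs from an administration host rather than from Data ONTAP, so it does not replace the AutoSupport test described above. The function name mailhost_reachable and the host name in the usage note are hypothetical.

```python
import socket


def mailhost_reachable(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Port 25 (SMTP) is an assumption; use the port your mailhost listens on.
    """
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

For example, mailhost_reachable("mailhost.example.com") returns False if the name does not resolve or the connection is refused, which points to a mailhost or network problem rather than an SP problem.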

spmgmt.driver.network.failure

Message
spmgmt.driver.network.failure

Severity
WARNING

Description
This message occurs when the system encounters a failure during network configuration of the Service Processor (SP). The system cannot assign the SP a DHCP (Dynamic Host Configuration Protocol) or fixed IP address.

Corrective action
1. Check whether the network cable is correctly plugged into the SP network port.
2. Check the link status LED on the SP.
3. Verify that the network that the SP is connected to supports autonegotiation to 10/100 Mbps or is running at one of those speeds; otherwise, SP network connectivity does not work.

The SP supports a 10/100 Mbps Ethernet network in autonegotiation mode.

spmgmt.driver.timeout

Message
spmgmt.driver.timeout

Severity
WARNING

Description
This message occurs when there is a failure during communication with the Service Processor (SP) firmware. The failure could be due to the following reasons:
• Communication error with the SP.
• SP is not operational.

Corrective action
1. Check whether the SP is online by entering the following command at the Data ONTAP prompt:
   sp status
2. If the SP is operational and this message persists, reboot the SP by entering the following command at the Data ONTAP prompt:
   sp reboot
   After the reboot, this message should no longer occur. If the message occurs again, contact support and explain that you already performed the preceding steps.
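The SP messages in this section can be summarized as a lookup table for quick triage. The following Python sketch is illustrative only: the names SP_EMS_MESSAGES and triage are hypothetical, and the starting points are shortened paraphrases of the corrective actions in this guide, not a substitute for them.

```python
# Severity and a starting point for each SP message, taken from this section.
SP_EMS_MESSAGES = {
    "sp.network.link.down": ("WARNING", "check the cable on the SP network port"),
    "sp.notConfigured": ("WARNING", "run sp setup"),
    "sp.orftp.failed": ("WARNING", "run sp status"),
    "sp.snmp.traps.off": ("INFO", "set the sp.snmp.traps option to On"),
    "sp.userlist.update.failed": ("WARNING", "run sp status"),
    "spmgmt.driver.hourly.stats": ("WARNING", "run sp status"),
    "spmgmt.driver.mailhost": ("WARNING", "verify the mailhost in the AutoSupport configuration"),
    "spmgmt.driver.network.failure": ("WARNING", "check the cable on the SP network port"),
    "spmgmt.driver.timeout": ("WARNING", "run sp status"),
}


def triage(event_name):
    """Return a one-line summary for an SP EMS message name."""
    try:
        severity, starting_point = SP_EMS_MESSAGES[event_name]
    except KeyError:
        return f"{event_name}: not covered by this sketch"
    return f"{event_name} [{severity}]: {starting_point}"
```

A table like this keeps severity and first step together, so a script that scans logged message names can print a consistent hint for each one.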

Abbreviations
A list of abbreviations and their spelled-out forms is included here for your reference.

A

ABE (Access-Based Enumeration)
ACE (Access Control Entry)
ACL (access control list)
ACP (Alternate Control Path)
AD (Active Directory)
ALPA (arbitrated loop physical address)
ALUA (Asymmetric Logical Unit Access)
AMS (Account Migrator Service)
API (Application Program Interface)
ARP (Address Resolution Protocol)
ASCII (American Standard Code for Information Interchange)
ASP (Active Server Page)
ATA (Advanced Technology Attachment)

B

BCO (Business Continuance Option)
BIOS (Basic Input Output System)
BCS (block checksum type)
BLI (block-level incremental)
BMC (Baseboard Management Controller)

C

CD-ROM (compact disc read-only memory)
CDDI (Copper Distributed Data Interface)
CDN (content delivery network)
CFE (Common Firmware Environment)
CFO (controller failover)
CGI (Common Gateway Interface)
CHA (channel adapter)
CHAP (Challenge Handshake Authentication Protocol)
CHIP (Client-Host Interface Processor)
CIDR (Classless Inter-Domain Routing)
CIFS (Common Internet File System)
CIM (Common Information Model)
CLI (command-line interface)
CP (consistency point)
CPU (central processing unit)
CRC (cyclic redundancy check)
CSP (communication service provider)

D

DAFS (Direct Access File System)
DBBC (database consistency checker)
DCE (Distributed Computing Environment)
DDS (Decru Data Decryption Software)
dedupe (deduplication)
DES (Data Encryption Standard)
DFS (Distributed File System)
DHA (Decru Host Authentication)
DHCP (Dynamic Host Configuration Protocol)
DIMM (dual-inline memory module)
DITA (Darwin Information Typing Architecture)
DLL (Dynamic Link Library)
DMA (direct memory access)
DMTF (Distributed Management Task Force)
DNS (Domain Name System)
DOS (Disk Operating System)
DPG (Data Protection Guide)
DTE (Data Terminal Equipment)

E

ECC (Elliptic Curve Cryptography) or (EMC Control Center)
ECDN (enterprise content delivery network)
ECN (Engineering Change Notification)
EEPROM (electrically erasable programmable read-only memory)
EFB (environmental fault bus)
EFS (Encrypted File System)
EGA (Enterprise Grid Alliance)
EISA (Extended Infrastructure Support Architecture)
ELAN (Emulated LAN)
EMU (environmental monitoring unit)
ESH (embedded switching hub)

F

FAQs (frequently asked questions)
FAS (fabric-attached storage)
FC (Fibre Channel)
FC-AL (Fibre Channel-Arbitrated Loop)
FC SAN (Fibre Channel storage area network)
FC Tape SAN (Fibre Channel Tape storage area network)
FC-VI (virtual interface over Fibre Channel)
FCP (Fibre Channel Protocol)
FDDI (Fiber Distributed Data Interface)
FQDN (fully qualified domain name)
FRS (File Replication Service)
FSID (file system ID)
FSRM (File Storage Resource Manager)
FTP (File Transfer Protocol)

G

GbE (Gigabit Ethernet)
GID (group identification number)
GMT (Greenwich Mean Time)
GPO (Group Policy Object)
GUI (graphical user interface)
GUID (globally unique identifier)

H

HA (high availability)
HBA (host bus adapter)
HDM (Hitachi Device Manager Server)
HP (Hewlett-Packard Company)
HTML (hypertext markup language)
HTTP (Hypertext Transfer Protocol)

I

IB (InfiniBand)
IBM (International Business Machines Corporation)
ICAP (Internet Content Adaptation Protocol)
ICP (Internet Cache Protocol)
ID (identification)
IDL (Interface Definition Language)
ILM (information lifecycle management)
IMS (If-Modified-Since)
I/O (input/output)
IP (Internet Protocol)
IP SAN (Internet Protocol storage area network)
IQN (iSCSI Qualified Name)
iSCSI (Internet Small Computer System Interface)
ISL (Inter-Switch Link)
iSNS (Internet Storage Name Service)
ISP (Internet storage provider)

J

JBOD (just a bunch of disks)
JPEG (Joint Photographic Experts Group)

K

KB (Knowledge Base)
Kbps (kilobits per second)
KDC (Kerberos Distribution Center)

L

LAN (local area network)
LBA (Logical Block Access)
LCD (liquid crystal display)
LDAP (Lightweight Directory Access Protocol)
LDEV (logical device)
LED (light emitting diode)
LFS (log-structured file system)
LKM (Lifetime Key Management)
LPAR (system logical partition)
LREP (logical replication tool utility)
LUN (logical unit number)
LUSE (Logical Unit Size Expansion)
LVM (Logical Volume Manager)

M

MAC (Media Access Control)
Mbps (megabits per second)
MCS (multiple connections per session)
MD5 (Message Digest 5)
MDG (managed disk group)
MDisk (managed disk)
MIB (Management Information Base)
MIME (Multipurpose Internet Mail Extension)
MMC (Microsoft Management Console)
MMS (Microsoft Media Streaming)
MPEG (Moving Picture Experts Group)
MPIO (multipath network input/output)
MRTG (Multi-Router Traffic Grapher)
MSCS (Microsoft Cluster Service)
MSDE (Microsoft SQL Server Desktop Engine)
MTU (Maximum Transmission Unit)

N

NAS (network-attached storage)
NDMP (Network Data Management Protocol)
NFS (Network File System)
NHT (NetApp Health Trigger)
NIC (network interface card)
NMC (Network Management Console)
NMS (network management station)
NNTP (Network News Transport Protocol)
NTFS (New Technology File System)
NTLM (NetLanMan)
NTP (Network Time Protocol)
NVMEM (nonvolatile memory management)
NVRAM (nonvolatile random-access memory)

O

OFM (Open File Manager)
OFW (Open Firmware)
OLAP (Online Analytical Processing)
OS/2 (Operating System 2)
OSMS (Open Systems Management Software)
OSSV (Open Systems SnapVault)

P

PC (personal computer)
PCB (printed circuit board)
PCI (Peripheral Component Interconnect)
pcnfsd (storage daemon)
(PC)NFS (Personal Computer Network File System)
PDU (protocol data unit)
PKI (Public Key Infrastructure)
POP (Post Office Protocol)
POST (power-on self-test)
PPN (physical path name)
PROM (programmable read-only memory)
PSU (power supply unit)
PVC (permanent virtual circuit)

Q

QoS (Quality of Service)
QSM (Qtree SnapMirror)

R

RAD (report archive directory)
RADIUS (Remote Authentication Dial-In Service)
RAID (redundant array of independent disks)
RAID-DP (redundant array of independent disks, double-parity)
RAM (random access memory)
RARP (Reverse Address Resolution Protocol)
RBAC (role-based access control)
RDB (replicated database)
RDMA (Remote Direct Memory Access)
RIP (Routing Information Protocol)
RISC (Reduced Instruction Set Computer)
RLM (Remote LAN Module)
RMC (remote management controller)
ROM (read-only memory)
RPM (revolutions per minute)
rsh (Remote Shell)
RTCP (Real-time Transport Control Protocol)
RTP (Real-time Transport Protocol)
RTSP (Real Time Streaming Protocol)

S

SACL (system access control list)
SAN (storage area network)
SAS (storage area network attached storage) or (serial-attached SCSI)
SATA (serial advanced technology attachment)
SCSI (Small Computer System Interface)
SFO (storage failover)
SFSR (Single File SnapRestore operation)
SID (Secure ID)
SIMM (single inline memory module)
SLB (Server Load Balancer)
SLP (Service Location Protocol)
SNMP (Simple Network Management Protocol)
SNTP (Simple Network Time Protocol)
SP (Storage Processor)
SPN (service principal name)
SPOF (single point of failure)
SQL (Structured Query Language)
SRM (Storage Resource Management)
SSD (solid state disk)
SSH (Secure Shell)
SSL (Secure Sockets Layer)
STP (shielded twisted pair)
SVC (switched virtual circuit)

T

TapeSAN (tape storage area network)
TCO (total cost of ownership)
TCP (Transmission Control Protocol)
TCP/IP (Transmission Control Protocol/Internet Protocol)
TOE (TCP offload engine)
TP (twisted pair)
TSM (Tivoli Storage Manager)
TTL (Time To Live)

U

UDP (User Datagram Protocol)
UI (user interface)
UID (user identification number)
Ultra ATA (Ultra Advanced Technology Attachment)
UNC (Uniform Naming Convention)
UPS (uninterruptible power supply)
URI (universal resource identifier)
URL (uniform resource locator)
USP (Universal Storage Platform)
UTC (Universal Coordinated Time)
UTP (unshielded twisted pair)
UUID (universal unique identifier)
UWN (unique world wide number)

V

VCI (virtual channel identifier)
VCMDB (Volume Configuration Management Database)
VDI (Virtual Device Interface)
VDisk (virtual disk)
VDS (Virtual Disk Service)
VFM (Virtual File Manager)
VFS (virtual file system)
VI (virtual interface)
vif (virtual interface)
VIRD (Virtual Router ID)
VLAN (virtual local area network)
VLD (virtual local disk)
VOD (video on demand)
VOIP (voice over IP)
VRML (Virtual Reality Modeling Language)
VTL (Virtual Tape Library)

W

WAFL (Write Anywhere File Layout)
WAN (wide area network)
WBEM (Web-Based Enterprise Management)
WHQL (Windows Hardware Quality Lab)
WINS (Windows Internet Name Service)
WORM (write once, read many)
WWN (worldwide name)
WWNN (worldwide node name)
WWPN (worldwide port name)
www (worldwide web)

Z

ZCS (zoned checksum)


Copyright information
Copyright 1994-2011 NetApp, Inc. All rights reserved. Printed in the U.S.A.

No part of this document covered by copyright may be reproduced in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system) without prior written permission of the copyright owner.

Software derived from copyrighted NetApp material is subject to the following license and disclaimer:

THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

NetApp reserves the right to change any products described herein at any time, and without notice. NetApp assumes no responsibility or liability arising from the use of products described herein, except as expressly agreed to in writing by NetApp. The use or purchase of this product does not convey a license under any patent rights, trademark rights, or any other intellectual property rights of NetApp.

The product described in this manual may be protected by one or more U.S.A. patents, foreign patents, or pending applications.

RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.277-7103 (October 1988) and FAR 52-227-19 (June 1987).


Trademark information
NetApp, the NetApp logo, Network Appliance, the Network Appliance logo, Akorri, ApplianceWatch, ASUP, AutoSupport, BalancePoint, BalancePoint Predictor, Bycast, Campaign Express, ComplianceClock, Cryptainer, CryptoShred, Data ONTAP, DataFabric, DataFort, Decru, Decru DataFort, DenseStak, Engenio, Engenio logo, E-Stack, FAServer, FastStak, FilerView, FlexCache, FlexClone, FlexPod, FlexScale, FlexShare, FlexSuite, FlexVol, FPolicy, GetSuccessful, gFiler, Go further, faster, Imagine Virtually Anything, Lifetime Key Management, LockVault, Manage ONTAP, MetroCluster, MultiStore, NearStore, NetCache, NOW (NetApp on the Web), Onaro, OnCommand, ONTAPI, OpenKey, PerformanceStak, RAID-DP, ReplicatorX, SANscreen, SANshare, SANtricity, SecureAdmin, SecureShare, Select, Service Builder, Shadow Tape, Simplicity, Simulate ONTAP, SnapCopy, SnapDirector, SnapDrive, SnapFilter, SnapLock, SnapManager, SnapMigrator, SnapMirror, SnapMover, SnapProtect, SnapRestore, Snapshot, SnapSuite, SnapValidator, SnapVault, StorageGRID, StoreVault, the StoreVault logo, SyncMirror, Tech OnTap, The evolution of storage, Topio, vFiler, VFM, Virtual File Manager, VPolicy, WAFL, Web Filer, and XBB are trademarks or registered trademarks of NetApp, Inc. in the United States, other countries, or both.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. A complete and current list of other IBM trademarks is available on the Web at www.ibm.com/legal/copytrade.shtml.

Apple is a registered trademark and QuickTime is a trademark of Apple, Inc. in the U.S.A. and/or other countries. Microsoft is a registered trademark and Windows Media is a trademark of Microsoft Corporation in the U.S.A. and/or other countries. RealAudio, RealNetworks, RealPlayer, RealSystem, RealText, and RealVideo are registered trademarks and RealMedia, RealProxy, and SureStream are trademarks of RealNetworks, Inc. in the U.S.A. and/or other countries.

All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.

NetApp, Inc. is a licensee of the CompactFlash and CF Logo trademarks. NetApp, Inc. NetCache is certified RealSystem compatible.


How to send your comments


You can help us to improve the quality of our documentation by sending us your feedback. Your feedback is important in helping us to provide the most accurate and high-quality information. If you have suggestions for improving this document, send us your comments by e-mail to doccomments@netapp.com. To help us direct your comments to the correct division, include in the subject line the product name, version, and operating system.

You can also contact us in the following ways:
• NetApp, Inc., 495 East Java Drive, Sunnyvale, CA 94089
• Telephone: +1 (408) 822-6000
• Fax: +1 (408) 822-4501
• Support Telephone: +1 (888) 4-NETAPP


Index
2240 systems power LED 39 chassis fault LED 39 controller activity LED 39 controller fault LED 40 Fibre Channel port LEDs 40 GbE port LEDs 40 internal FRU LEDs 45 LEDs on the back of the controller 40 LEDs on the front of the chassis 39 management port LEDs 40 mezzanine card 40 NVMEM LED 40 POST error messages 132 PSU LEDs 43 SAS port LEDs 40 3020 and 3050 systems POST error messages 120 3040 and 3070 systems POST error messages 124 30xx systems activity LED 45 Fibre Channel port LEDs 47 GbE port LEDs 47 LEDs on the back of the controller 47 LEDs on the front of the controller 45 power LED 45 PSU LEDs 48 RLM LEDs 47 status LED 45 31xx systems controller activity LED 49 Ethernet port LEDs 51 fan LED 52 fault LED 49, 51 Fibre Channel port LED 51 FRU LEDs 54 LEDs on the back of the controller 51 LEDs on the front of the chassis 49 POST error messages 124 power LED 49 PSU LEDs 53 32xx systems chassis fault LED 55 controller activity LED 55 controller fault LED 56 controller- I/O expansion module configuration 55 dual-controller configuration 55 fan LED 60 Fibre Channel port LEDs 56 GbE port LEDs 56 I/O expansion module fault LED 59 internal FRU LEDs 61 LEDs on the back of the controller 56 LEDs on the back of the I/O expansion module 59 LEDs on the front of the chassis 55 management port LEDs 56, 59 NVMEM LED 56 POST error messages 132 power LED 55 PSU LEDs 60 SAS port LEDs 56 60xx systems activity LED 62 fan LEDs 64 Fibre Channel port LEDs 63 GbE port LEDs 63 LEDs on the back of the controller 63 LEDs on the front of the controller 62 POST error messages 124 power LED 62 PSU LEDs 65 RLM LEDs 63 status LED 62 62xx systems 10-GbE port LEDs 68 8-Gb Fibre Channel port LEDs 68 chassis fault LED 66 console port 68 controller activity LED 66 controller fault LED 68 controller-I/O expansion module configuration 66 dual-controller configuration 66 fan LEDs 73 GbE port LEDs 68 I/O expansion module fault LED 72 internal FRU LEDs 74 LEDs on the back of the controller 68 LEDs on 
the back of the I/O expansion module 72 POST error messages 132 power LED 66 private management port LEDs 68, 72


PSU LEDs 73 remote management port LEDs 68 USB port 68 No /etc/rc 157 No /etc/rc, running setup 158 No disk controllers 158 No disks 158 No network interfaces 158 No NVRAM present 159 NVRAM #n downrev 159 NVRAM:wrong pci slot 159 Panic:DIMM slot #n has uncorrectable ECC errors 159 This platform is not supported on this release 159 Too many errors in too short time 160 Warning:Motherboard Revision not available 160 Warning:Motherboard Serial Number not available 160 Warning:system serial number is not available 160 Watchdog error 160 Watchdog failed 161

A
AutoSupport messages 28

B
BIOS and boot loader progress Method of viewing progress on the console 118 method of viewing progress through the Bios Status sensor 119 BMC e-mail contents 301 function 301 how and when e-mail AutoSupport messages are sent 301 systems containing the 301 BMC-generated messages BMC_ASUP_UNKNOWN 302 REBOOT (abnormal) 302 REBOOT (power loss) 302 REBOOT (watchdog reset) 302 SYSTEM_BOOT_FAILED (POST failed) 302 SYSTEM_POWER_OFF (environment) 303 USER_TRIGGERED (bmc test 303 USER_TRIGGERED (system nmi) 303 USER_TRIGGERED (system power cycle) 303 USER_TRIGGERED (system power off) 303 USER_TRIGGERED (system power on) 304 USER_TRIGGERED (system power soft-off) 304 USER_TRIGGERED (system reset) 304 Boot error messages Boot device err 154 Cannot initialize labels 154 Cannot read labels 154 Configuration exceeds max PCI space 154 DIMM slot # has correctable ECC errors 155 Dirty shutdown in degraded mode 155 Disk label processing failed 155 Drive %s.%d not supported 155 Error detection detected too many errors to analyze at once 156 FC-AL loop down, adapter %d 156 File system may be scrambled 156 Halted disk firmware too old 157 Halted:Illegal configuration 157 Invalid PCI card slot %d 157

C
C1300 NetCache appliances POST error messages 140 C2300 and C3300 NetCache appliances activity LED 45 Fibre Channel port LEDs 47 GbE port LEDs 47 LEDs on the back of the controller 47 LEDs on the front of the controller 45 POST error messages 120 power LED 45 PSU LEDs 48 RLM LEDs 47 status LED 45

D
degraded power, possible cause and remedy 163 diagnostics, forms and use 28

E
EMS messages what information they provide 163 EMS messages about the BMC bmc.asup.crit 304 bmc.asup.error 305 bmc.asup.init 305 bmc.asup.queue 305

bmc.asup.send 305 bmc.asup.smtp 306 bmc.batt.id 306 bmc.batt.invalid 306 bmc.batt.mfg 306 bmc.batt.rev 307 bmc.batt.seal 307 bmc.batt.unknown 307 bmc.batt.unseal 307 bmc.batt.upgrade 307 bmc.batt.upgrade.busy 308 bmc.batt.upgrade.failed 308 bmc.batt.upgrade.failure 308 bmc.batt.upgrade.ok 309 bmc.batt.upgrade.power-off 309 bmc.batt.upgrade.voltagelow 309 bmc.batt.voltage 309 bmc.config.asup.off 310 bmc.config.corrupted 310 bmc.config.default 310 bmc.config.default.pef.filter 310 bmc.config.default.pef.policy 311 bmc.config.fru.systemserial 311 bmc.config.mac.error 311 bmc.config.net.error 311 bmc.config.upgrade 312 bmc.power.on.auto 312 bmc.reset.ext 312 bmc.reset.int 312 bmc.reset.power 312 bmc.reset.repair 313 bmc.reset.unknown 313 bmc.sensor.batt.charger.off 313 bmc.sensor.batt.charger.on 313 bmc.sensor.batt.time.run.invalid 313 bmc.ssh.key.missing 314 EMS messages about the RLM rlm.driver.hourly.stats 291 rlm.driver.mailhost 292 rlm.driver.network.failure 292 rlm.driver.timeout 292 rlm.firmware.update.failed 293 rlm.firmware.upgrade.reqd 293 rlm.firmware.version.unsupported 294 rlm.heartbeat.bootFromBackup 294 rlm.heartbeat.resumed 294 rlm.heartbeat.stopped 295 rlm.network.link.down 295 rlm.notConfigured 296 rlm.orftp.failed 296 rlm.snmp.traps.off 297 rlm.systemDown.alert 297 rlm.systemDown.notice 297 rlm.systemDown.warning 298 rlm.systemPeriodic.keepAlive 298 rlm.systemTest.notice 299 rlm.userlist.update.failed 299 EMS messages about the SP sp.network.link.down 320 sp.notConfigured 320 sp.firmware.upgrade.reqd 318 sp.firmware.version.unsupported 319 sp.heartbeat.resumed 319 sp.heartbeat.stopped 319 sp.orftp.failed 321 sp.snmp.traps.off 321 sp.userlist.update.failed 321 spmgmt.driver.hourly.stats 322 spmgmt.driver.mailhost 323 spmgmt.driver.network.failure 323 spmgmt.driver.timeout 323 environmental EMS messages nvmem.battery.capacity.low 180 nvram.battery.capacity.low 185 nvram.battery.capacity.low.critical 185 nvram.battery.capacity.low.warn 185 
nvram.battery.capacity.normal 185 nvram.battery.charging.nocharge 186 nvram.battery.charging.normal 186 nvram.battery.charging.wrongcharge 186 nvram.battery.current.high.warn 187 nvram.battery.current.low 187 nvram.battery.current.low.warn 187 nvram.battery.current.normal 188 nvram.battery.end_of_life.high 188 nvram.battery.fault 188 nvram.battery.fault.warn 189 nvram.battery.fcc.low 189 nvram.battery.fcc.low.critical 189 nvram.battery.fcc.low.warn 189 nvram.battery.fcc.normal 190 nvram.battery.power.fault 190 nvram.battery.power.normal 190 nvram.battery.sensor.unreadable 190 nvram.battery.temp.high.warn 191 nvram.battery.temp.low 191 nvram.battery.temp.normal 192 nvram.battery.voltage.high 192 nvram.battery.voltage.high.warn 192 nvram.battery.voltage.low 193


nvram.battery.voltage.low.warn 193 nvram.battery.voltage.normal 193 Chassis fan FRU failed 163 Chassis over temperature on XXXX 164 Chassis over temperature shutdown on XXXX 164 Chassis Power Degraded:3.3V in warn high state 164 Chassis power degraded:PS# 165 Chassis Power Fail:PS# 165 Chassis Power Shutdown 165 Chassis power shutdown:3.3V is in warn low state 166 Chassis power supply degraded:PS# 167 Chassis power supply fail:PS# 167 Chassis power supply off:PS# 167, 168 Chassis power supply OK:PS# 168 Chassis power supply removed:PS# 168 Chassis Power Supply:PS# removed 166 Chassis under temperature on XXXX 169 Chassis under temperature shutdown on XXXX 169 Fan:# is spinning below tolerable speed 169 monitor.chassisFan.degraded 170 monitor.chassisFan.ok 170 monitor.chassisFan.removed 170 monitor.chassisFan.slow 170 monitor.chassisFan.stop 171 monitor.chassisFan.warning 171 monitor.chassisFanFail.xMinShutdown 171 monitor.chassisPower.degraded 171 monitor.chassisPower.ok 172 monitor.chassisPowerSupplies.ok 172 monitor.chassisPowerSupply.degraded 172 monitor.chassisPowerSupply.notPresent 172 monitor.chassisPowerSupply.off 173 monitor.chassisPowerSupply.ok 173 monitor.chassisTemperature.cool 173 monitor.chassisTemperature.ok 173 monitor.chassisTemperature.warm 173 monitor.cpuFan.degraded 174 monitor.cpuFan.failed 174 monitor.cpuFan.ok 174 monitor.ioexpansion.unpresent 176 monitor.ioexpansionPower.degraded 175 monitor.ioexpansionPower.ok 175 monitor.ioexpansionTemperature.cool 175 monitor.ioexpansionTemperature.ok 175 monitor.ioexpansionTemperature.warm 176 monitor.nvmembattery.warninglow 176 monitor.nvramLowBattery 176 monitor.power.unreadable 177 monitor.shutdown.cancel 177 monitor.shutdown.cancel.nvramLowBattery 177 monitor.shutdown.chassisOverTemp 177 monitor.shutdown.chassisUnderTemp 178 monitor.shutdown.emergency 178 monitor.shutdown.ioexpansionOverTemp 178 monitor.shutdown.nvramLowBattery.pending 179 monitor.temp.unreadable 179 Multiple chassis fans have failed 
179 Multiple fan failure on XXXX 180 Multiple power supply fans failed 180 nvmem.battery.capacity.low.warn 181 nvmem.battery.capacity.normal 181 nvmem.battery.current.high 181 nvmem.battery.current.high.warn 181 nvmem.battery.sensor.unreadable 182 nvmem.battery.temp.high 182 nvmem.battery.temp.low 182 nvmem.battery.temp.normal 183 nvmem.battery.voltage.high 183 nvmem.battery.voltage.high.warn 183 nvmem.battery.voltage.normal 183 nvmem.voltage.high 184 nvmem.voltage.high.warn 184 nvmem.voltage.normal 184 nvram.bat.missing.error 184 nvram.battery.current.high 186 nvram.battery.end_of_life.normal 188 nvram.battery.temp.high 191 nvram.hw.initFail 193

F
fas2020 FAS20xx systems controller module fault LED 35 controller module LEDs 33 Ethernet port LEDs 35 fault LED 33 Fibre Channel port LEDs 35 LEDs on the back of the controller module 35 LEDs on the front of the chassis 33 NVMEM LED 35 power LED 33 PSU LEDs 37 remote management port LEDs 35 startup progress, viewing 118 FCoE HBA EMS messages ispcna.mpi.dump.saved 282

ispcna.mpi.initFailed 283 ispcna.mpi.dump 282 Flash Cache module and PAM EMS messages fal.chan.online.write.warn 247 fmm.threshold.bank.degraded 248 fmm.threshold.card.degraded 249 fmm.threshold.core.offline 249 extCache.io.BlockChecksumError 242 extCache.io.cardError 242 extCache.io.readError 242 extCache.io.writeError 243 extCache.offline 243 extCache.ReconfigComplete 243 extCache.ReconfigFailed 243 extCache.ReconfigStart 244 extCache.UECCerror 244 extCache.UECCmax 244 fal.chan.offline.comp 245 fal.chan.online.erase.warn 245 fal.chan.online.fail 245 fal.chan.online.read.warn 245 fal.chan.online.rep.fail 246 fal.chan.online.rep.part 246 fal.chan.online.rep.succ 246 fal.chan.online.rep.ver.err 246 fal.init.failed 247 fmm.bad.block.detected 247 fmm.device.stats.missing 247 fmm.domain.card.failure 248 fmm.domain.core.failure 248 fmm.hourly.device.report 248 fmm.threshold.bank.offline 249 fmm.threshold.card.failure 249 iomem.bbm.bbtl.overflow 250 iomem.bbm.new.flash 250 iomem.card.disable 250 iomem.card.enable 251 iomem.card.fail.cecc 251 iomem.card.fail.data.crc 251 iomem.card.fail.desc.crc 251 iomem.card.fail.dimm 250, 252 iomem.card.fail.firmware.primary 252 iomem.card.fail.fpga 252 iomem.card.fail.fpga.primary 253 iomem.card.fail.fpga.rev 253 iomem.card.fail.internal 254 iomem.card.fail.pci 254 iomem.card.fail.uecc 254 iomem.dimm.log.checksum 255 iomem.dimm.log.init 255 iomem.dimm.log.read 255 iomem.dimm.log.sync 255 iomem.dimm.log.write 256 iomem.dimm.mismatch.banks 256 iomem.dimm.mismatch.burst 256 iomem.dimm.mismatch.casLatency 256 iomem.dimm.mismatch.columns 257 iomem.dimm.mismatch.dataWidth 257 iomem.dimm.mismatch.eccWidth 257 iomem.dimm.mismatch.ranks 257 iomem.dimm.mismatch.rows 258 iomem.dimm.mismatch.vendor 258 iomem.dimm.spd.banks 258 iomem.dimm.spd.burst 258 iomem.dimm.spd.casLatency 259 iomem.dimm.spd.checksum 259 iomem.dimm.spd.columns 259 iomem.dimm.spd.dataWidth 259 iomem.dimm.spd.detect 260 iomem.dimm.spd.eccWidth 260 iomem.dimm.spd.ranks 260 
iomem.dimm.spd.read 260 iomem.dimm.spd.rows 261 iomem.dma.crc.data 261 iomem.dma.crc.desc 261 iomem.dma.internal 261 iomem.dma.stall 262 iomem.ecc.cecc 262 iomem.ecc.correct.off 262 iomem.ecc.correct.on 262 iomem.ecc.detect.off 263 iomem.ecc.detect.on 263 iomem.ecc.inject 263 iomem.ecc.summary 263 iomem.ecc.uecc 264 iomem.fail.stripe 264 iomem.firmware.package.access 264 iomem.firmware.primary 265 iomem.firmware.program.complete 265 iomem.firmware.program.fail 265 iomem.firmware.program.reboot 265 iomem.firmware.program.start 265 iomem.firmware.rev 266 iomem.flash.mismatch.id 266 iomem.fru.badInfo 266 iomem.fru.checksum 266 iomem.fru.read 267 iomem.fru.write 267 iomem.i2c.link.down 267 iomem.i2c.read.addrNACK 267


iomem.i2c.read.dataNACK 268 iomem.i2c.read.timeout 268 iomem.i2c.write.addrNACK 268 iomem.i2c.write.dataNACK 268 iomem.i2c.write.timeout 269 iomem.init.detect.fpga 269 iomem.init.detect.pci 269 iomem.init.fail 269 iomem.memory.flash.syndrome 269 iomem.memory.none 270 iomem.memory.power.high 270 iomem.memory.power.low 270 iomem.memory.scrub.start 270 iomem.memory.size 271 iomem.memory.zero.complete 271 iomem.memory.zero.start 271 iomem.nor.op.failed 271 iomem.pci.error.config.bar 271 iomem.pio.op.failed 272 iomem.remap.block 272 iomem.remap.target.bad 272 iomem.temp.report 272 iomem.train.complete 273 iomem.train.fail 273 iomem.train.notReady 273 iomem.train.start 273 iomem.vmargin.high 274 iomem.vmargin.low 274 iomem.vmargin.nominal 274 message generation and reporting 242 monitor.extCache.failed 274 monitor.flexscale.noLicense 274

H
HBA LEDs
  copper, iSCSI, target 83
  dual-port, 8-Gb Fibre Channel Virtual Interface HBA 78
  dual-port Fibre Channel 75
  dual-port, 10-Gb, FCoE unified target 85
  dual-port, 3-Gb SAS 87
  dual-port, 4-Gb, target-mode Fibre Channel 76
  dual-port, 8-Gb, target-mode Fibre Channel 76
  fiber-optic iSCSI target 82
  quad-port, 4-Gb, Fibre Channel, 12-LED version 81
  quad-port, 4-Gb, Fibre Channel, four-LED version 79
  quad-port, 8-Gb SAS 87

L
LEDs
  2240 system internal FRU LEDs 45
  2240 system LEDs on the back of the controller 40
  2240 system LEDs on the front of the chassis 39
  2240 system PSU LEDs 43
  30xx system LEDs on the front of the controller 45
  30xx system PSU LEDs 48
  31xx system fan LEDs 52
  31xx system FRU LEDs 54
  31xx system LEDs on the back of the controller 51
  31xx system LEDs on the front of the chassis 49
  31xx system PSU LEDs 53
  32xx system fan LEDs 60
  32xx system internal FRU LEDs 61
  32xx system LEDs on the back of the controller 56
  32xx system LEDs on the back of the I/O expansion module 59
  32xx system PSU LEDs 60
  60xx system fan LEDs 64
  60xx system LEDs on the back of the controller 63
  60xx system LEDs on the front of the controller 62
  60xx system PSU LEDs 65
  62xx LEDs on the back of the controller 68
  62xx PSU LEDs 73
  62xx system fan LEDs 73
  62xx system internal FRU LEDs 74
  62xx system LEDs on front of chassis 66
  62xx system LEDs on the back of the I/O expansion module 72
  C2300 and C3300 NetCache appliance LEDs on the back of the controller 47
  C2300 and C3300 NetCache appliance LEDs on the front of the controller 45
  C2300 and C3300 NetCache appliance PSU LEDs 48
  copper, iSCSI, target HBA 83
  dual-port, 8-Gb Fibre Channel Virtual Interface HBA 78
  dual-port Fibre Channel HBA 75
  dual-port GbE NICs 96, 97
  dual-port, 10-Gb, FCoE unified target HBA 85
  dual-port, 10GBase-CX4 TOE NICs 104
  dual-port, 2-Gb VI-MetroCluster adapter 88
  dual-port, 3-Gb SAS 87
  dual-port, 4-Gb MetroCluster adapter 90
  dual-port, 4-Gb, target-mode Fibre Channel HBA 76
  dual-port, 8-Gb MetroCluster adapter 91
  dual-port, 8-Gb, target-mode Fibre Channel HBA 76
  FAS20xx system LEDs on the back of the controller module 35
  FAS20xx system LEDs on the front of the chassis 33
  FAS30xx system LEDs on the back of the controller 47
  fiber-optic iSCSI target HBA 82
  Flash Cache module 115
  multiport GbE NICs 99
  NVRAM5 adapter 107
  NVRAM5 and NVRAM6 media converter 109
  NVRAM6 adapter 107
  NVRAM7 adapter 108
  NVRAM8 adapter 109
  onboard drive failures, FAS20xx systems 33
  Performance Acceleration Module (PAM) 115
  PSU, FAS20xx systems 37
  PSU, SA200 systems 37
  quad-port TOE NICs 105
  quad-port, 3-Gb SAS 87
  quad-port, 4-Gb, Fibre Channel HBA, 12-LED version 81
  quad-port, 4-Gb, Fibre Channel HBA, four-LED version 79
  SA200 system LEDs on the back of the controller module 35
  SA200 system LEDs on the front of the chassis 33
  SA300 system LEDs on the back of the controller 47
  SA300 system LEDs on the front of the controller 45
  SA300 system PSU LEDs 48
  SA320 system fan LEDs 60
  SA320 system internal FRU LEDs 61
  SA320 system LEDs on the back of the controller 56
  SA320 system LEDs on the back of the I/O expansion module 59
  SA320 system PSU LEDs 60
  SA600 system fan LEDs 64
  SA600 system LEDs on the back of the controller 63
  SA600 system LEDs on the front of the controller 62
  SA600 system PSU LEDs 65
  SA620 LEDs on the back of the controller 68
  SA620 PSU LEDs 73
  SA620 system fan LEDs 73
  SA620 system internal FRU LEDs 74
  SA620 system LEDs on front of chassis 66
  SA620 system LEDs on the back of the I/O expansion module 72
  single-port GbE NICs 93
  single-port GbE NICs, FAS2050 systems only 95
  single-port TOE NICs 101

M
MetroCluster adapter LEDs
  dual-port, 2-Gb VI-MetroCluster adapter 88
  dual-port, 4-Gb MetroCluster adapter 90
  dual-port, 8-Gb MetroCluster adapter 91

N
NIC LEDs
  dual-port GbE 96, 97
  multiport GbE 99
  single-port GbE 93
  single-port GbE, FAS2050 systems only 95
NVRAM5 adapter
  LEDs 107
  which systems support the 106
NVRAM6 adapter
  LEDs 107
  which systems support the 106
NVRAM7 adapter
  LEDs 108
  which systems support the 106
NVRAM8 adapter
  destage status 109
  HA pair 109
  LEDs 109
  which systems support the 106

O
operational error messages
  Disk hung during swap 283
  Disk n is broken 284
  Dumping core 284
  Error dumping core 284
  FC-AL LINK_FAILURE 284
  FC-AL RECOVERABLE ERRORS 284
  Panicking 285
  RMC Alert:Boot Error 285
  RMC Alert:Down Appliance 285
  RMC Alert:OFW POST Error 285
  when they appear 163

P
POST error messages, 2240 systems
  0200:Failure Fixed Disk 132
  0230:System RAM Failed at offset: 133
  0232:Extended RAM Failed at address line: 133
  0250:System battery is dead - Replace and run SETUP 135
  0260:System timer error 135
  0271:Check date and time settings 136
  02A2:BMC System Error Log (SEL) Full 136
  02C3:No valid Boot Loader in System Flash - Fatal 138
  Fatal Error:No DIMM detected and system can not continue boot! 138
  Fatal Error! All channels are disabled! 139
  Fatal Error! UDIMM in 3rd slot is not supported! 139
  No Response to Controller FRU ID Read Request via IPMI 137
  SP FRU Entry is Blank or Checksum Error 137
  0231:Shadow RAM Failed at offset: 133
  0251:System CMOS checksum bad 135
  0280:Previous boot incomplete - Default configuration used 136
  02A1:SP Not Found 136
  02C2:No valid Boot Loader in System Flash - Non Fatal 137
  BIOS detected pattern write/read mismatch in DIMM slot: 134
  BIOS detected uncorrectable ECC error in DIMM slot: 133
  BIOS detected unknown errors in DIMM slot 134
  BIOS detected unknown errors in DIMM slot: 134
  Fatal Error! RDIMMs and UDIMMs are mixed! 139
  No message on the console 133
  No Response to Midplane FRU ID Read Request via IPMI 137
POST error messages, 3020 and 3050 systems
  Abort AutobootPOST Failure(s):CPU 120
  Abort AutobootPOST Failure(s):MEMORY 121
  Abort AutobootPOST Failure(s):RTC, RTC_IO 121
  Abort AutobootPOST Failure(s):UCODE 121
  Autoboot of backup image aborted 121
  Autoboot of backup image failed 122
  Autoboot of primary image aborted 122
  Autoboot of primary image failed 122
  Invalid FRU EEPROM Checksum 123
  Memory init failure 123
  No Memory found 123
  Unsupported system bus speed 124
POST error messages, 3040, 3070, and SA300 systems
  0200:Failure Fixed Disk 124
  0230:System RAM Failed at offset: 125
  0231:Shadow RAM failed at offset 125
  0232:Extended RAM failed at address line 125, 130
  0235:Multiple-bit ECC error occurred 126
  023C:Bad DIMM found in slot # 126
  023E:Node Memory Interleaving disabled 127
  0241:Agent Read Timeout 127
  0242:Invalid FRU information 128
  0250:System battery is dead 128
  0251:System CMOS checksum bad 128
  0253:Clear CMOS jumper detected 129
  0260:System timer error 129
  0280:Previous boot incomplete 129
  02C2:No valid Boot Loader in System Flash - Non Fatal 129
  02C3:No valid Boot Loader in System Flash - Fatal 130
  02FA:Watchdog Timer Reboot (PciInit) 131
  02FB:Watchdog Timer Reboot (MemTest) 131
  02FC:LDTStop Reboot (HTLinkInit) 131
  No message on console 132
POST error messages, 31xx systems
  0200:Failure Fixed Disk 124
  0230:System RAM Failed at offset: 125
  0231:Shadow RAM failed at offset 125
  0232:Extended RAM failed at address line 125
  0235:Multiple-bit ECC error occurred 126
  023C:Bad DIMM found in slot # 126
  023E:Node Memory Interleaving disabled 127
  0241:Agent Read Timeout 127
  0242:Invalid FRU information 128
  0250:System battery is dead 128
  0251:System CMOS checksum bad 128
  0253:Clear CMOS jumper detected 129
  0260:System timer error 129
  0280:Previous boot incomplete 129
  02C2:No valid Boot Loader in System Flash - Non Fatal 129
  02C3:No valid Boot Loader in System Flash - Fatal 130
  02FA:Watchdog Timer Reboot (PciInit) 131
  02FB:Watchdog Timer Reboot (MemTest) 131
  02FC:LDTStop Reboot (HTLinkInit) 131
  No message on console 132
POST error messages, 32xx and SA320 systems
  023A:ONTAP Detected Bad DIMM in slot: 134
  023B:BIOS detected SPD checksum error in DIMM slot: 134
  0280:Previous boot incomplete - Default configuration used 136
  BIOS detected pattern write/read mismatch in DIMM slot: 134
  Fatal Error:No DIMM detected and system can not continue boot! 138
  Fatal Error! All channels are disabled! 139
  Fatal Error! All DIMM failed and system can not continue boot! 140
  Software memory test failed! 139
  0200:Failure Fixed Disk 132
  0230:System RAM Failed at offset: 133
  0231:Shadow RAM Failed at offset: 133
  0232:Extended RAM Failed at address line: 133
  0241:SMBus Read Timeout 135
  0242:Invalid FRU information 135
  0250:System battery is dead - Replace and run SETUP 135
  0251:System CMOS checksum bad 135
  0260:System timer error 135
  0271:Check date and time settings 136
  02A2:BMC System Error Log (SEL) Full 136
  02A3:No Response From SP To FRU ID Read Request 137
  02C2:No valid Boot Loader in System Flash - Non Fatal 137
  02C3:No valid Boot Loader in System Flash - Fatal 138
  BIOS detected uncorrectable ECC error in DIMM slot: 133
  BIOS detected unknown errors in DIMM slot 134
  Fatal Error! RDIMMs and UDIMMs are mixed! 139
  Fatal Error! UDIMM in 3rd slot is not supported! 139
  No message on the console 133
POST error messages, 60xx and SA600 systems
  0200:Failure Fixed Disk 124
  0230:System RAM Failed at offset: 125
  0231:Shadow RAM failed at offset 125
  0232:Extended RAM failed at address line 125
  0235:Multiple-bit ECC error occurred 126
  023C:Bad DIMM found in slot # 126
  023E:Node Memory Interleaving disabled 127
  0241:Agent Read Timeout 127
  0242:Invalid FRU information 128
  0250:System battery is dead 128
  0251:System CMOS checksum bad 128
  0253:Clear CMOS jumper detected 129
  0260:System timer error 129
  0280:Previous boot incomplete 129
  02C2:No valid Boot Loader in System Flash - Non Fatal 129
  02C3:No valid Boot Loader in System Flash - Fatal 130
  02F9:FGPA jumper detected 130
  02FA:Watchdog Timer Reboot (PciInit) 131
  02FB:Watchdog Timer Reboot (MemTest) 131
  02FC:LDTStop Reboot (HTLinkInit) 131
  No message on console 132
POST error messages, 62xx and SA620 systems
  023A:ONTAP Detected Bad DIMM in slot: 134
  023B:BIOS detected SPD checksum error in DIMM slot: 134
  0271:Check date and time settings 136
  0280:Previous boot incomplete - Default configuration used 136
  Fatal Error:No DIMM detected and system can not continue boot! 138
  Fatal Error! All channels are disabled! 139
  Fatal Error! All DIMM failed and system can not continue boot! 140
  Software memory test failed! 139
  0200:Failure Fixed Disk 132
  0230:System RAM Failed at offset: 133
  0231:Shadow RAM Failed at offset: 133
  0232:Extended RAM Failed at address line: 133
  0241:SMBus Read Timeout 135
  0242:Invalid FRU information 135
  0250:System battery is dead - Replace and run SETUP 135
  0251:System CMOS checksum bad 135
  0260:System timer error 135
  02A2:BMC System Error Log (SEL) Full 136
  02A3:No Response From SP To FRU ID Read Request 137
  02C2:No valid Boot Loader in System Flash - Non Fatal 137
  02C3:No valid Boot Loader in System Flash - Fatal 138



  BIOS detected pattern write/read mismatch in DIMM slot: 134
  BIOS detected uncorrectable ECC error in DIMM slot: 133
  BIOS detected unknown errors in DIMM slot 134
  Fatal Error! RDIMMs and UDIMMs are mixed! 139
  Fatal Error! UDIMM in 3rd slot is not supported! 139
  No message on the console 133
POST error messages, C1300 NetCache appliances
  8042-gate A20 failure 140
  A:drive failure 140
  B:drive failure 141
  base 64KB memory failure 141
  Boot failure 141
  BootSector write!! 142
  Cache error/external cache bad 142
  Checking NVRAM...update failed 142
  CMOS battery low 142
  CMOS checksum bad 143
  CMOS date/time not set 143
  CMOS settings wrong 143
  CMOS shutdown register read/write error 143
  display memory read/write error 144
  DMA-2 error 144
  DMA-controller error 144
  Drive not ready 145
  Gate20 error 145
  Insert BOOT diskette in A 145
  Interrupt controller-N error 145
  Invalid boot diskette 146
  Keyboard error 146
  Keyboard/interface error 146
  Microcode error 147
  Multi-bit ECC error 147
  NVRAM bad 147
  NVRAM checksum bad 147
  NVRAM cleared 148
  NVRAM ignored 148
  parity error (beep code) 148
  Parity error (no beep code) 149
  PCI I/O conflict 149
  PCI IRQ conflict 149
  PCI IRQ routing table error 149
  PCI ROM conflict 150
  processor error 150
  processor exception interrupt error 150
  Reboot and select proper boot device... 151
  refresh failure 151
  Resource conflict 151
  ROM checksum error 152
  Static resource conflict 152
  System halted 152
  Timer error 152
  timer not operational 153
  VIRUS:continue (y/n) 153
  X hard disk error 153
PSU LEDs
  2240 systems 43
  30xx systems 48
  31xx systems 53
  32xx systems 60
  60xx systems 65
  62xx systems 73
  C2300 and C3300 NetCache appliances 48
  FAS20xx systems 37
  SA200 systems 37
  SA300 systems 48
  SA320 systems 60
  SA600 systems 65
  SA620 systems 73

R
RLM
  AutoSupport e-mail contents 288
  types of messages 287
  when AutoSupport messages are sent 287
  when RLM EMS messages are sent 288
RLM-generated messages
  Heartbeat loss warning 288
  Reboot (power loss) critical 289
  Reboot (watchdog reset) warning 289
  Reboot warning 289
  RLM heartbeat loss 289
  RLM heartbeat stopped 290
  System boot failed (POST failed) 290
  User triggered (RLM test) 290
  User_triggered (system nmi) 290
  User_triggered (system power cycle) 290
  User_triggered (system power off) 291
  User_triggered (system power on) 291
  User_triggered (system reset) 291

S
SA200 systems
  controller module fault LED 35
  controller module LEDs 33
  Ethernet port LEDs 35
  fault LED 33
  Fibre Channel port LEDs 35
  LEDs on the back of the controller module 35
  LEDs on the front of the chassis 33
  NVMEM LED 35
  power LED 33
  PSU LEDs 37
  remote management port LEDs 35
  startup progress, viewing 118
SA300 systems
  activity LED 45
  Fibre Channel port LEDs 47
  GbE port LEDs 47
  LEDs on the back of the controller 47
  LEDs on the front of the controller 45
  POST error messages 124
  power LED 45
  PSU LEDs 48
  RLM LEDs 47
  status LED 45
SA320 systems
  chassis fault LED 55
  controller activity LED 55
  controller fault LED 56
  controller-I/O expansion module configuration 55
  dual-controller configuration 55
  fan LED 60
  Fibre Channel port LEDs 56
  GbE port LEDs 56
  I/O expansion module fault LED 59
  internal FRU LEDs 61
  LEDs on the back of the controller 56
  LEDs on the back of the I/O expansion module 59
  LEDs on the front of the chassis 55
  management port LEDs 56, 59
  NVMEM LED 56
  POST error messages 132
  power LED 55
  PSU LEDs 60
  SAS port LEDs 56
SA600 systems
  activity LED 62
  fan LEDs 64
  Fibre Channel port LEDs 63
  GbE port LEDs 63
  LEDs on the back of the controller 63
  LEDs on the front of the controller 62
  POST error messages 124
  power LED 62
  PSU LEDs 65
  RLM LEDs 63
  status LED 62
SA620 systems
  10-GbE port LEDs 68
  8-Gb Fibre Channel port LEDs 68
  chassis fault LED 66
  console port 68
  controller activity LED 66
  controller fault LED 68
  controller-I/O expansion module configuration 66
  dual-controller configuration 66
  fan LED 73
  GbE port LEDs 68
  I/O expansion module fault LED 72
  internal FRU LEDs 74
  LEDs on the back of the controller 68
  LEDs on the back of the I/O expansion module 72
  POST error messages 132
  power LED 66
  private management port LEDs 68, 72
  PSU LEDs 73
  remote management port LEDs 68
  USB port 68
SAS EMS messages
  ds.sas.config.warning 194
  ds.sas.crc.err 194
  ds.sas.drivephy.disableErr 194
  ds.sas.element.fault 195
  ds.sas.element.xport.error 195
  ds.sas.hostphy.disableErr 196
  ds.sas.invalid.word 196
  ds.sas.loss.dword 196
  ds.sas.multPhys.disableErr 197
  ds.sas.phyRstProb 197
  ds.sas.running.disparity 197
  ds.sas.ses.disableErr 198
  ds.sas.xfer.element.fault 198
  ds.sas.xfer.export.error 198
  ds.sas.xfer.not.sent 199
  ds.sas.xfer.unknown.error 199
  sas.adapter.bad 200
  sas.adapter.bootarg.option 200
  sas.adapter.debug 200
  sas.adapter.exception 200
  sas.adapter.failed 201
  sas.adapter.firmware.download 201
  sas.adapter.firmware.fault 201
  sas.adapter.firmware.update.failed 201
  sas.adapter.not.ready 202



  sas.adapter.offline 202
  sas.adapter.offlining 202
  sas.adapter.online 203
  sas.adapter.online.failed 203
  sas.adapter.onlining 203
  sas.adapter.reset 203
  sas.adapter.unexpected.status 204
  sas.cable.error 204
  sas.cable.pulled 204
  sas.cable.pushed 204
  sas.config.mixed.detected 205
  sas.device.invalid.wwn 205
  sas.device.quiesce 205
  sas.device.resetting 206
  sas.device.timeout 206
  sas.initialization.failed 207
  sas.link.error 207
  sas.port.disabled 207
  sas.port.down 207
  sas.shelf.conflict 208
  sasmon.adapter.phy.disable 208
  sasmon.adapter.phy.event 209
  sasmon.disable.module 209
SAS HBAs
  dual-port, 3-Gb SAS HBA ports and cable 87
  quad-port, 3-Gb SAS HBA ports and cable 87
SES EMS messages
  ses.shelf.unsupportAllowErr 224
  ses.access.noEnclServ 209
  ses.access.noMoreValidPaths 210
  ses.access.noShelfSES 211
  ses.access.sesUnavailable 211
  ses.badShareStorageConfigErr 212
  ses.bridge.fw.getFailWarn 212
  ses.bridge.fw.mmErr 212
  ses.channel.rescanInitiated 213
  ses.config.drivePopError 213
  ses.config.IllegalEsh270 213
  ses.config.shelfMixError 214
  ses.config.shelfPopError 214
  ses.disk.configOk 214
  ses.disk.illegalConfigWarn 214
  ses.disk.pctl.timeout 213, 215
  ses.download.powerCyclingChannel 215
  ses.download.shelfToReboot 215
  ses.download.suspendIOForPowerCycle 215
  ses.drive.PossShelfAddr 216
  ses.drive.shelfAddr.mm 216
  ses.exceptionShelfLog 217
  ses.extendedShelfLog 217
  ses.fw.emptyFile 218
  ses.fw.resourceNotAvailable 218
  ses.giveback.restartAfter 218
  ses.giveback.wait 218
  ses.psu.coolingReqError 219
  ses.psu.powerReqErrorr 219
  ses.remote.configPageError 219
  ses.remote.elemDescPageError 220
  ses.remote.faultLedError 220
  ses.remote.flashLedError 220
  ses.remote.shelfListError 220
  ses.remote.statPageError 220
  ses.shelf.changedID 221
  ses.shelf.ctrlFailErr 221
  ses.shelf.em.ctrlFailErr 222
  ses.shelf.IdBasedAddr 222
  ses.shelf.invalNum 222
  ses.shelf.mmErr 223
  ses.shelf.OSmmErr 223
  ses.shelf.powercycle.done 223
  ses.shelf.powercycle.start 223
  ses.shelf.sameNumReassign 224
  ses.shelf.unsupportedErr 224
  ses.startTempOwnership 225
  ses.status.ATFCXError 225
  ses.status.ATFCXInfo 225
  ses.status.currentError 225
  ses.status.currentInfo 226
  ses.status.currentWarning 226
  ses.status.displayError 226
  ses.status.displayInfo 227
  ses.status.displayWarning 227
  ses.status.driveError 227
  ses.status.driveOk 228
  ses.status.driveWarning 228
  ses.status.electronicsError 228
  ses.status.electronicsInfo 229
  ses.status.electronicsWarn 229
  ses.status.ESHPctlStatus 229
  ses.status.fanError 229
  ses.status.fanInfo 230
  ses.status.fanWarning 230
  ses.status.ModuleError 230
  ses.status.ModuleInfo 230
  ses.status.ModuleWarn 231
  ses.status.psError 231
  ses.status.psInfo 231
  ses.status.psWarning 232
  ses.status.temperatureError 232
  ses.status.temperatureInfo 233

  ses.status.temperatureWarning 233
  ses.status.upsError 233
  ses.status.upsInfo 234
  ses.status.volError 234
  ses.status.volWarning 234
  ses.system.em.mmErr 235
  ses.tempOwnershipDone 235
  sfu.adapterSuspendIO 235
  sfu.auto.update.off.impact 235
  sfu.ctrllerElmntsPerShelf 236
  sfu.downloadCtrllerBridge 236
  sfu.downloadError 236
  sfu.downloadingController 236
  sfu.downloadingCtrllerR1XX 237
  sfu.downloadStarted 237
  sfu.downloadSuccess 237
  sfu.downloadSummary 237
  sfu.downloadSummaryErrors 237
  sfu.FCDownloadFailed 238
  sfu.firmwareDownrev 238
  sfu.firmwareUpToDate 238
  sfu.partnerInaccessible 239
  sfu.partnerNotResponding 239
  sfu.partnerRefusedUpdate 239
  sfu.partnerUpdateComplete 239
  sfu.partnerUpdateTimeout 240
  sfu.rebootRequest 240
  sfu.rebootRequestFailure 240
  sfu.resumeDiskIO 240
  sfu.SASDownloadFailed 241
  sfu.statusCheckFailure 241
  sfu.suspendDiskIO 241
  sfu.suspendSES 241
SP
  AutoSupport e-mail contents 316
  EMS messages about the SP 318
  SP-generated AutoSupport messages 316
  types of messages 315
  when AutoSupport messages are sent 315
  when SP EMS messages are sent 316
SP-generated messages
  HEARTBEAT_LOSS 316
  REBOOT (abnormal) 317
  SYSTEM_BOOT_FAILED (post failed) 317
  USER_TRIGGERED (sp test) 317
  USER_TRIGGERED (system nmi) 317
  USER_TRIGGERED (system power cycle) 318
  USER_TRIGGERED (system power off) 318
  USER_TRIGGERED (system reset) 318
startup error messages
  boot messages 118
  POST messages 117
  types of 117

T
TOE NIC LEDs
  dual-port, 10GBase-CX4 104
  quad-port 105
  single-port 101
Troubleshooting
  How AutoSupport messages help with troubleshooting 28
  sources of 27
  Where LEDs appear 27
  where messages are displayed 27
  where to find documentation 29

U
USB boot device EMS messages 275
USB EMS messages
  usb.adapter.debug 275
  usb.adapter.exception 275
  usb.adapter.failed 275
  usb.adapter.reset 276
  usb.device.failed 276
  usb.device.initialize.failed 276
  usb.device.maximum.connected 277
  usb.device.protocol.mismatch 277
  usb.device.removed 278
  usb.device.timeout 278
  usb.device.unsupported 278
  usb.device.unsupported.speed 279
  usb.external.device.not.used 279
  usb.externalHub.notSupported 279
  usb.port.error 279
  usb.port.reset 280
  usb.port.state.indeterminate 280
  usb.port.status.inconsistent 280
  usbmon.boot.device.failed 281
  usbmon.boot.device.pfa 281
  usbmon.disable.module 281
  usbmon.unable.to.monitor 282
