
How to clear faults in FMA after component replacement on Sun Fire[TM] servers.

Solaris[TM] 10 FMD (Fault Management Daemon) reports a failed or suspect component (called a FRU). After the component is replaced, it may still be reported as faulty or suspect in the fmadm output for Solaris 10, or the system may still print a self-healing message during boot.

Cause
There are three cases in which you have to clear the fault manually:
1. The component has no fruid/serial number support (e.g. PCI cards).
2. The fruid/serial number support of the given platform was not implemented in FMA for this part (e.g. memory on Sun Fire 3800 through Sun Fire[TM] E25K).
3. A self-healing message is printed during boot even though the fmadm faulty list is empty (caused by CR 6369961, "fmd emits identical diagnosis after repair when case was never closed").

Solution
Procedure: As the root user on the domain in question, run the following commands:

fmadm faulty
  o This will display a list of components and their associated resource/uuid's that are categorized as faulty or degraded.
  o The resource/uuid is required in order to clear the fault tags.
fmadm repair
  o This will clear the suspect or fault tags associated with the resource/uuid's in the faulty list.

The following is an example of how to clear the fault tags on a Host Bus Adapter (HBA) in a Sun Fire[TM] 6800 that has been replaced but is still reporting in FMA as degraded:
# fmadm faulty
   STATE RESOURCE / UUID
-------- ---------------------------------------------------------------------
degraded dev:////ssm@0,0/pci@19,700000
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ---------------------------------------------------------------------
degraded dev:////ssm@0,0/pci@19,700000/lpfc@1
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ---------------------------------------------------------------------
degraded mod:///mod-name=lpfc/mod-id=54
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ---------------------------------------------------------------------
degraded mod:///mod-name=pcisch/mod-id=25
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ---------------------------------------------------------------------

NOTE: Once you see the faulty components, run the fmadm repair command to clear the fault.
# fmadm repair dev:////ssm@0,0/pci@19,700000
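When several resources are listed, the lookup-then-repair step can be sketched as a small script. This is only an illustration: the function names list_faulty_resources and print_repair_commands are hypothetical, the parsing assumes the listing format shown in the example above (state in the first column, resource in the second), and it assumes a POSIX shell and awk (on Solaris 10 that may mean /usr/xpg4/bin/sh and nawk). It prints the repair commands for review instead of running them, so that only components that were actually replaced get cleared.

```shell
# Hypothetical helpers; they are not part of fmadm.

# Read `fmadm faulty` output on stdin and print the resource of every
# entry. A resource line is recognized by a second field that starts
# with a scheme such as "dev:/" or "mod:/".
list_faulty_resources() {
    awk '$2 ~ /^[a-z]+:\// { print $2 }'
}

# Turn resource names on stdin into repair commands.
# Nothing is executed; the commands are only printed for review.
print_repair_commands() {
    while read -r fmri; do
        printf 'fmadm repair %s\n' "$fmri"
    done
}

# Example use, as root on the affected domain:
#   fmadm faulty | list_faulty_resources | print_repair_commands
```

Review the printed list and run only the lines that correspond to replaced components.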

NOTE: After you have run the repair command on each component that has been replaced, rerun the fmadm faulty command to ensure that the fault has been cleared. If there are no faults, you will not see any output other than the column headings:
# fmadm faulty
   STATE RESOURCE / UUID
-------- ---------------------------------------------------------------------
#
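The "headings only" check can also be scripted. The sketch below assumes the same listing format as the example above and a POSIX shell and awk; the function name faults_cleared is hypothetical. It succeeds only when the input contains no resource lines.

```shell
# Hypothetical helper; it is not part of fmadm.

# Read `fmadm faulty` output on stdin. Exit status 0 means no resource
# lines were found (only headings or nothing at all); exit status 1
# means at least one faulty or degraded resource is still listed.
faults_cleared() {
    awk '$2 ~ /^[a-z]+:\// { found = 1 } END { exit (found ? 1 : 0) }'
}

# Example use, as root on the affected domain:
#   if fmadm faulty | faults_cleared; then
#       echo "no faults listed"
#   else
#       echo "faults still listed"
#   fi
```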
