
OptiX SDH System Troubleshooting Methods

www.huawei.com

Copyright 2006 Huawei Technologies Co., Ltd. All rights reserved.

Objectives

Upon completion of this course, you will be able to:

List the common analysis methods for fault locating.
Outline the fault handling flow.
Analyze typical faults such as traffic interruption and bit errors.


Contents
1. Troubleshooting Preparation
2. Troubleshooting Idea and Methods
3. Classified Troubleshooting Examples


1. Troubleshooting Preparation


Requirements for Maintenance Staff-I


Professional Skills

Be familiar with the hardware system and SDH fundamentals

Be familiar with the alarm generation mechanism and the signal flow in the transmission system

Be familiar with the basic maintenance instruments and tools

Be familiar with the network under maintenance: network topology, network protection, and traffic


Requirements for Maintenance Staff-II


Professional Skills

Be familiar with common alarms

SDH line alarms (R_LOS, R_LOF, R_OOF, AU_AIS, AU_LOP, MS_AIS, MS_RDI, B1_EXC, B2_EXC, HP_LOM, HP_SLM, HP_TIM, HP_UNEQ)

PDH tributary alarms (TU_AIS, TU_LOP, T_ALOS, T_DLOS, P_LOS, EXT_LOS, UP_E1_AIS, LP_RDI, LP_SLM, LP_TIM, LP_UNEQ, B3_EXC)

Protection switching alarms (PS)

Clock alarms (LTI, SYNC_C_LOS, SYN_BAD)

Equipment alarms (POWER_FAIL, FAN_FAIL, BD_STATUS)

Collect and save on-site data

System alarms, performance event data, configurations, and NMS operation records



Fault Handling Flow


Flow Chart
Start → Record the fault trace → External cause?
  Yes → Other handling flows
  No → Analyze the fault to locate it → Fault removed?
    Yes → (Continue 2)
    No → Report the fault to Huawei → (Continue 1)

Fault Handling Flow - cont.


Flow Chart
(Continue 1) → Make a solution together → Try the solution → Service recovered?
  No → return to "Make a solution together"
  Yes → Observe the service running → Fault removed?
    No → return to "Make a solution together"
    Yes → Archive the fault handling report → End
(Continue 2) → Observe the service running

2. Troubleshooting Idea and Methods


Question

What is the key to troubleshooting?

To locate the failure ACCURATELY to a single station


How to Locate a Fault?


Basic Principles of Fault Localization

External first, then transmission
  External causes include broken fibers, switch failures, power failures, and grounding problems.

Network first, then network elements
  Try your best to locate the trouble to one node.

LU first, then TU
  LU (line board) alarms can lead to TU (tributary board) alarms.

Higher-severity alarms first, then lower-severity alarms
  First analyze critical/major alarms, then minor/warning alarms.
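The severity and LU/TU principles above amount to an ordering rule for the alarm list. A minimal Python sketch, assuming a hypothetical Alarm record with a severity and a board type; the numeric ranks, and the choice to make severity the primary key with LU/TU as the tie-breaker, are illustrative assumptions rather than an NMS feature:

```python
from dataclasses import dataclass

# Illustrative ranking (an assumption): a lower number is analyzed earlier.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "warning": 3}
BOARD_RANK = {"LU": 0, "TU": 1}  # line board alarms before tributary board alarms


@dataclass
class Alarm:
    name: str      # alarm name, e.g. "R_LOS"
    severity: str  # "critical", "major", "minor" or "warning"
    board: str     # "LU" or "TU"


def analysis_order(alarms):
    """Sort alarms for analysis: higher severity first, then LU before TU."""
    return sorted(alarms, key=lambda a: (SEVERITY_RANK[a.severity], BOARD_RANK[a.board]))


# Example data only; real severities come from the NMS alarm list.
alarms = [
    Alarm("TU_AIS", "major", "TU"),
    Alarm("B1_EXC", "minor", "LU"),
    Alarm("R_LOS", "critical", "LU"),
    Alarm("LP_RDI", "minor", "TU"),
]
print([a.name for a in analysis_order(alarms)])
```

Because Python's sort is stable, alarms with equal rank keep the order in which they were collected.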


Common Methods of Fault Localization


Keys of Fault Localization
1. Alarm and performance analysis
2. Loopback
3. Replacement
4. Configuration data analysis
5. Configuration modification
6. Test with instruments
7. Experience


Alarm and Performance Analysis-I

How to obtain alarms and performance events?

Use the NMS: evaluate the whole network
  Comprehensive: all alarms/performance events from the whole network
  Accurate: current alarms, history alarms, occurrence time, and performance event data can be queried

Observe the indicators on boards and cabinets
  Not detailed
  No history alarms


Alarm and Performance Analysis-II


Main Steps

Obtain alarms and performance events

Select the key alarms or performance events

Analyze the causes

Limit the trouble to a certain range or a single node


Alarm and Performance Analysis-III


Case
(Figure: example network showing the alarms R-LOS, LP-RDI, MS-RDI, and TU-AIS.)
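In this case R-LOS is the key alarm: the node that loses the signal sends MS-RDI back upstream and AIS downstream, the path sink then reports TU-AIS, and the path source receives LP-RDI. A minimal Python sketch of "select the key alarm", assuming a hypothetical correlation table that simplifies SDH alarm propagation and ignores which node raised each alarm:

```python
# Hypothetical, simplified correlation table: each root alarm maps to the
# alarms it typically triggers elsewhere in the network.
CONSEQUENCES = {
    "R_LOS": {"MS_RDI", "AU_AIS", "TU_AIS", "LP_RDI"},
    "R_LOF": {"MS_RDI", "AU_AIS", "TU_AIS", "LP_RDI"},
    "TU_AIS": {"LP_RDI"},
}


def split_root_causes(reported):
    """Split reported alarm names into (root causes, consequential alarms)."""
    reported = set(reported)
    consequential = set()
    for alarm in reported:
        # Keep only the consequences that were actually reported.
        consequential |= CONSEQUENCES.get(alarm, set()) & reported
    root = reported - consequential
    return sorted(root), sorted(consequential)


root, consequential = split_root_causes(["R_LOS", "LP_RDI", "MS_RDI", "TU_AIS"])
print("analyze first:", root, "explained by it:", consequential)
```

The table is deliberately small; extending it with further entries (for example B2_EXC or AU_LOP) is left to the maintainer's knowledge of the equipment.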


Loopback
What is Loopback?

Loopback is the most common and most efficient method in troubleshooting.

(Figure: SDH equipment with inloop and outloop points on the line side and on the tributary side.)


Loopback
Where Do We Loop?

Board involved | Loopback options | Loopback tools | Loopback level | Application
Tributary board | Inloop/outloop | Loopback cable, NMS | Path level | Separates switching faults from transmission faults; roughly determines a tributary board failure; no need to modify the service configuration
Line board | Inloop/outloop | Patch fiber, NMS | Optical interface level | Locates single-station faults; roughly determines a line board failure; no need to modify the service configuration

Software loopback is NOT an absolute method. Why?

Notes
A loopback may interrupt the traffic and the ECC.
The loopback will be removed automatically after 5 minutes (provisional).


Loopback
Procedure

1. Select one NE from the faulty NEs.
2. Choose one affected traffic path on the selected NE.
3. Draw the traffic flow diagram (source, sink, pass-through).
4. Connect the test devices.
5. Check the alarms.

Replacement
When to Use?

Objects
Fiber, cable, board, modules

Application
External faults, board faults

Replacement by switching to standby
MSP switch, SNCP switch, active/standby XC switch, TPS switch



Configuration Data Analysis


Query & Analyze the Configuration

Timeslot configuration
J1 or C2 bytes
Loopbacks on LU and TU paths
SNCP or MSP switching conditions
External commands (e.g. locked switch)
Consistent configuration in both the NMS and the NEs
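For the J1 and C2 check, the relationship between a configuration mismatch and the resulting alarm can be sketched as below. This is a minimal Python sketch, assuming the J1 trace is compared as a plain string (real equipment uses a fixed-length trace) and that C2 = 0x00 is reported as HP_UNEQ in preference to HP_SLM; the trace strings are hypothetical:

```python
def check_path_overhead(expected_j1, received_j1, expected_c2, received_c2):
    """Return the higher-order path alarms implied by J1/C2 mismatches.

    expected_* are the values configured at the sink; received_* are the
    values arriving in the VC-4 path overhead.
    """
    alarms = []
    if received_j1 != expected_j1:
        alarms.append("HP_TIM")   # J1 path trace does not match the expected trace
    if received_c2 == 0x00:
        alarms.append("HP_UNEQ")  # C2 = 0x00 signals an unequipped VC-4
    elif received_c2 != expected_c2:
        alarms.append("HP_SLM")   # C2 signal label does not match the expected label
    return alarms


# 0x02 is the C2 value for a TUG structure, as used when E1s are carried as VC-12s.
print(check_path_overhead("NODE1-VC4-2", "NODE3-VC4-2", 0x02, 0x02))
```

An empty result means the overhead bytes agree with the configuration, so the analysis moves on to timeslots and switching conditions.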


Configuration Modification
Fast Solution

Objects
Port, timeslot, sub-rack, slots

Application Examples
No spare boards available
Restoring the traffic temporarily


Testing Instrument
Accurate Judgments

Instrument | Test item
Bit error testing device | Bit errors/traffic
Optical power meter | Optical power
SDH analyzer | Bit errors/traffic/overhead bytes
Multimeter | Voltage/current/resistance

This method is the most reliable one, but the instruments must be at hand.
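A bit error tester reports errored bits over a test period; turning that into a bit error ratio (BER) is a single division by the number of bits sent. A minimal Python sketch using the nominal ITU-T rates (E1 2.048 Mbit/s, STM-1 155.52 Mbit/s, STM-4 622.08 Mbit/s); the error count and the 15-minute duration are made-up example values:

```python
# Nominal line rates in bit/s.
RATES_BPS = {"E1": 2_048_000, "STM-1": 155_520_000, "STM-4": 622_080_000}


def bit_error_ratio(errored_bits, rate_bps, seconds):
    """BER = errored bits / total bits transmitted during the test."""
    total_bits = rate_bps * seconds
    return errored_bits / total_bits


# Example: 123 errored bits on an E1 during a 15-minute test.
ber = bit_error_ratio(123, RATES_BPS["E1"], 15 * 60)
print(f"{ber:.2e}")  # prints 6.67e-08
```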


Experience
Rule of Thumb

Reset the board
Power off and on
Resend the configuration

Do not consider them a cure-all. They do not help us find the real cause of the failure.


Summary
Methods | Application | Features
Alarm and performance analysis | Universal | 1. Evaluates the whole network situation. 2. Locates the faulty point preliminarily based on the collected data. 3. Causes no negative effect on normal services. 4. Depends on the NMS.
Loopback | Locates the fault to a single station or board | 1. Independent of alarm and performance event analysis. 2. Rapid and effective.
Replacement | Locates the fault to a board or isolates external faults | 1. Convenient. 2. Requires spare parts/equipment. 3. Applied with other methods.
Configuration data analysis | Locates the fault to a single station or board | 1. Can find the fault cause. 2. Fault locating takes longer. 3. Depends on the NMS.
Configuration modification | Locates the fault to a board | 1. High risk. 2. Depends on the NMS.
Test with instruments | Isolates external faults and resolves interconnection problems | 1. A general method with high accuracy. 2. Places certain requirements on the meters. 3. Applied with other methods.
Experience | Special cases | 1. Fast fault handling. 2. High probability of mistakes. 3. Requires accumulated experience.

3. Classified Troubleshooting Examples


Troubleshooting Sequence
1. Exclude external troubles: switching problem? Fiber problems? Trunk cable? Power supply system? Grounding problem?
   Methods: alarm/performance analysis, loopback, instrument testing, replacement
2. Locate the troubles to one NE
   Methods: alarm/performance analysis, loopback, replacement, configuration analysis, configuration modification, rule of thumb
3. Locate the troubles to one board
   Methods: alarm/performance analysis, loopback


Classified Troubleshooting Examples


Traffic Interruption

Bit Errors


Traffic Interruption
Possible Causes

External causes
  Power supply system: equipment power-off, undervoltage, etc.
  Switch problems
  Fiber or trunk cables: excessive attenuation, fiber cut, cable disconnection

Operation causes
  Loopback
  Data modification

Equipment failure
  Faulty board
  Performance degradation


Traffic Interruption
Operations

Equipment operator
  Check the indicator status on each board
  Analyze the alarms
  Hardware loopback
  Replacement

NMS operator
  Check the login of each station
  Query and analyze alarms
  Loopback section by section
  Configuration modification
  Implement switching

Traffic Interruption
No-protection Line-Case
(Figure: chain network of nodes 1 to 4; the E1 service uses t2:1 at nodes 1 and 4; node 1 reports LP-RDI and node 4 reports TU-AIS.)

Network Configuration
Node 1 is the centralized service node. Each station has E1 services with node 1.

Failure Description
The E1 service between nodes 1 and 4 is interrupted.
Node 4: TU-AIS
Node 1: LP-RDI
Other services are normal.

Traffic Interruption
Where is the Problem?
(Figure: the same chain network; LP-RDI at node 1, TU-AIS at node 4.)

Query alarms

Alarm analysis
TU-AIS appears at node 4 only.
Node 4 cannot receive the traffic from node 1.
The other traffic between nodes 1, 2, and 3 is normal.
The failure is therefore located between nodes 3 and 4.
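The reasoning above, where every node's service with node 1 passes through the nodes in between, can be sketched as a scan along the chain. A minimal Python sketch, assuming a single fault and a hypothetical dictionary of per-node service status taken from the alarm query:

```python
def locate_chain_failure(chain, service_ok):
    """Return (last_good, first_bad), the nodes bounding the fault on a chain.

    chain: node IDs ordered outward from the hub node chain[0].
    service_ok: node ID -> True if that node's service with the hub is normal.
    Assumes a single fault, so services fail from some node onward.
    """
    last_good = chain[0]  # the hub itself is the starting reference
    for node in chain[1:]:
        if service_ok[node]:
            last_good = node
        else:
            return last_good, node
    return None  # every service is normal


# The case above: services of nodes 2 and 3 are normal, node 4 has TU-AIS.
print(locate_chain_failure([1, 2, 3, 4], {2: True, 3: True, 4: False}))
```

The result (3, 4) still covers node 3's east side, the fiber, and node 4, which is why the loopback analysis follows.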


Traffic Interruption
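Analysis
(Figure: BER tester connected at t2:1 of node 1; loopbacks set along the chain toward node 4.)
Loopback, with the BER tester connected at node 1:
1. Outloop on VC4 #2 at node 4. Normal? Yes: the failure is in node 4. No: the failure is between nodes 3 and 4; go to step 2.
2. Software inloop on VC4 #2 at the east LU of node 3. Normal? No: the failure is in node 3. Yes: the failure is between nodes 3 and 4; go to step 3.
3. Hardware optical port inloop at the east LU of node 3. Normal? Yes: the failure is in node 4. No: the failure is in node 3.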
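The three-step loopback flow can be written as a small decision function, which makes the order of the loopbacks explicit. A minimal Python sketch; the function name and arguments are hypothetical, each argument is the BER tester verdict for that loopback (None meaning "not performed yet"), and the comment on why a hardware loop is needed is an assumption consistent with "software loopback is NOT an absolute method":

```python
def locate_by_loopback(outloop_node4_ok, soft_inloop_node3_ok=None,
                       hard_inloop_node3_ok=None):
    """Follow the loopback flow; True means the tester saw a normal signal."""
    if outloop_node4_ok:
        return "failure in node 4"
    # The fault lies between nodes 3 and 4: loop back at node 3 next.
    if soft_inloop_node3_ok is None:
        return "next: software inloop on VC4 #2 at east LU of node 3"
    if not soft_inloop_node3_ok:
        return "failure in node 3"
    # Assumption: the software loop may not exercise the optical interface,
    # so confirm with a hardware (patch fiber) loop on the optical port.
    if hard_inloop_node3_ok is None:
        return "next: hardware optical port inloop at east LU of node 3"
    return "failure in node 4" if hard_inloop_node3_ok else "failure in node 3"


print(locate_by_loopback(False, True, False))
```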


Traffic Interruption
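Final Solution
(Figure: chain network with LP-RDI at node 1 and TU-AIS at node 4; the faulty node is handled by replacement.)
After the failure is located to one node, the LU, TU, or XC may be faulty:
1. Perform a TPS switch. Traffic normal? Yes: replace the faulty TU. No: go to step 2.
2. Perform an active/standby XC switch. Traffic normal? Yes: replace the faulty XC. No: replace the faulty LU.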
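The switch-then-replace order can be sketched as a tiny function: each switch moves the traffic onto a standby board, and whichever switch restores the traffic points at the board to replace. A minimal Python sketch with hypothetical argument names:

```python
def faulty_board(traffic_ok_after_tps, traffic_ok_after_xc_switch):
    """Which board to replace, following the switch-then-replace flow.

    traffic_ok_after_tps: traffic restored after the TPS switch.
    traffic_ok_after_xc_switch: traffic restored after the active/standby
    XC switch; only consulted when the TPS switch did not help.
    """
    if traffic_ok_after_tps:
        return "TU"  # the standby tributary board works, so the working TU is faulty
    if traffic_ok_after_xc_switch:
        return "XC"  # the standby cross-connect board works
    return "LU"      # neither switch helped, so suspect the line board


print(faulty_board(False, True))
```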


Traffic Interruption
SNCP Ring-Case
(Figure: SNCP ring of nodes 1 to 4; nodes 2, 3, and 4 report TU-AIS and node 1 reports LP-RDI.)

Network Configuration
Node 1 is the centralized service node. Each station has E1 services with node 1.

Failure Description
All E1 services are interrupted.
Nodes 2, 3, 4: TU-AIS
Node 1: LP-RDI


Traffic Interruption
Where is the Problem?
(Figure: the same SNCP ring with TU-AIS at nodes 2, 3, 4 and LP-RDI at node 1.)

Thoughts and methods
Alarm/performance analysis
Analyze the configuration correctness
Disconnect the ring and convert it to a chain
Loopback
Replacement


Traffic Interruption
MSP Ring-Case
(Figure: STM-4 MSP ring of nodes 1 to 5; R-LOS is reported at both ends of the broken fibers between nodes 2 and 3; nodes 1 and 3 report TU-AIS.)

Network Configuration
Node 1 is the centralized service node. Each station has E1 services with node 1. The services are configured on the shortest route.

Failure Description
The fibers between NE2 and NE3 are broken (R-LOS).
The E1 services between nodes 1 and 3 are interrupted.
Nodes 1, 3: TU-AIS
Other services are normal.


Traffic Interruption
Where is the Problem?
MSP switching process: each part must work for the switch to succeed.
LU: detects SF or SD and transmits the K1 and K2 bytes
SCC: processes the APS protocol normally (APS controller started, correct switch state)
XC: implements the switching
Protection channels: available
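When checking whether the LU transmits the right K1 and K2 bytes, it helps to decode them. A minimal Python sketch, assuming the MS shared protection ring byte layout of ITU-T G.841 (K1 bits 1-4 bridge request, bits 5-8 destination node ID; K2 bits 1-4 source node ID, bit 5 short/long path, bits 6-8 status, bit 1 being the most significant bit); the node IDs in the example are hypothetical and simply reuse the node numbers of the case:

```python
# K1 bits 1-4: bridge request codes for MS shared protection rings.
K1_REQUESTS = {
    0b1111: "LP-S/SF-P", 0b1110: "FS-S", 0b1101: "FS-R", 0b1100: "SF-S",
    0b1011: "SF-R", 0b1010: "SD-P", 0b1001: "SD-S", 0b1000: "SD-R",
    0b0111: "MS-S", 0b0110: "MS-R", 0b0101: "WTR", 0b0100: "EXER-S",
    0b0011: "EXER-R", 0b0010: "RR-S", 0b0001: "RR-R", 0b0000: "NR",
}
# K2 bits 6-8: status codes (100 and 101 are reserved).
K2_STATUS = {0b111: "MS-AIS", 0b110: "MS-RDI", 0b011: "Extra traffic",
             0b010: "Br&Sw", 0b001: "Br", 0b000: "Idle"}


def decode_k1_k2(k1, k2):
    """Decode the K1/K2 APS bytes (each 0-255) into readable fields."""
    return {
        "request": K1_REQUESTS[k1 >> 4],
        "destination_node": k1 & 0x0F,
        "source_node": k2 >> 4,
        "path": "long" if (k2 >> 3) & 1 else "short",
        "status": K2_STATUS.get(k2 & 0x07, "reserved"),
    }


# Node 3 signals a ring signal fail to node 2 on the short path.
print(decode_k1_k2(0xB2, 0x30))
```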


Traffic Interruption
Analysis
(Figure: the STM-4 MSP ring with R-LOS at the broken span and APS-INDI at the ring nodes.)
1. Query and check the alarms. Are R-LOS and APS-INDI reported? If yes, check the switch status.
2. Switch status normal? Yes: draw the switched traffic flow diagram, then loop back section after section to locate the faulty LU/XC. No: the APS protocol may have stopped; restart it.
3. Switch status normal after the restart? Yes: continue as in step 2. No: resend the configuration.
4. Switch status normal after resending? Yes: continue as in step 2. No: restart the APS protocol node by node to locate the faulty LU/XC.
5. Replace the faulty LU/XC.
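Drawing the switched traffic flow diagram means working out which way the traffic goes once the nodes next to the cut loop it onto the protection channels. A minimal Python sketch, assuming a ring switch in which the node before the failed span sends the traffic back the long way round and the node after it picks the traffic up again, as the switched route of this case shows; only the node sequence is computed, not timeslots:

```python
def ring_switched_route(ring, route, failed_span):
    """Node sequence after a ring switch around one failed span.

    ring: node IDs in ring order; the last node connects back to the first.
    route: the normal working route as consecutive ring nodes.
    failed_span: (a, b), adjacent nodes on the route, a before b, whose fibers are cut.
    """
    a, b = failed_span
    i = route.index(a)
    if i + 1 >= len(route) or route[i + 1] != b:
        raise ValueError("the failed span must lie on the normal route")
    n = len(ring)
    pos = ring.index(a)
    # Go around the ring away from b (the long way), starting at a.
    step = -1 if ring[(pos + 1) % n] == b else 1
    detour = []
    while ring[pos] != b:
        pos = (pos + step) % n
        detour.append(ring[pos])
    # Working route up to a, loop back at a, the long way round to b, then the rest.
    return route[: i + 1] + detour + route[i + 2 :]


# The MSP ring case: ring 1-2-3-4-5, normal route 1 -> 2 -> 3, fibers 2-3 broken.
print(ring_switched_route([1, 2, 3, 4, 5], [1, 2, 3], (2, 3)))
```

The output [1, 2, 1, 5, 4, 3] passes node 1 twice, once on the working channel and once as a pass-through on the protection channel.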


Traffic Interruption
Normal route
1 (t2:1 → e1:17) → 2 (w1:17 → e1:17) → 3 (w1:17 → t2:1)

Switched route (fibers between nodes 2 and 3 broken)
1 (t2:1 → e1:17) → 2 (w1:17 looped back to w3:17) → 1 (e3:17 → w3:17) → 5 (w3:17) → 4 (w3:17) → 3 (t2:1)

Notes
The switched route is one long, complex line; dichotomy can be used to locate the fault.
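Dichotomy here means binary search over the loopback points along the route: loop back at the middle point, and keep the half that still contains the fault. A minimal Python sketch, assuming a single fault (so loopbacks are normal up to the fault and abnormal beyond it) and a hypothetical loopback_ok callback standing in for "set the loopback and read the BER tester":

```python
def first_failing_point(points, loopback_ok):
    """Binary search ("dichotomy") for the first loopback point that fails.

    points: loopback points ordered from the tester outward along the route.
    loopback_ok: callable(point) -> True if the signal is normal when looped there.
    Returns the index of the first failing point, or None if all pass.
    """
    lo, hi = 0, len(points)  # search the half-open range [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if loopback_ok(points[mid]):
            lo = mid + 1  # the fault lies beyond this point
        else:
            hi = mid      # the fault lies at or before this point
    return lo if lo < len(points) else None


# Hypothetical points on the switched route, tester at node 1 t2:1.
points = ["NE2", "NE1 (pass-through)", "NE5", "NE4", "NE3"]
reachable = {"NE2", "NE1 (pass-through)", "NE5"}  # assume the fault is between NE5 and NE4
tried = []


def loopback_ok(point):
    tried.append(point)
    return point in reachable


idx = first_failing_point(points, loopback_ok)
print("fault between", points[idx - 1], "and", points[idx], "after", len(tried), "loopbacks")
```

Three loopbacks suffice here instead of up to five when looping section after section, and the saving grows with the length of the route.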



Bit Errors
Possible Causes

External causes
  Performance degradation of fibers, excessive attenuation
  Dirty fiber connector or incorrect connector
  Poor equipment grounding
  Strong interference source near the equipment
  Poor ventilation, high operating temperature

Equipment failure
  Transmitter or receiver failure in the LU
  Poor synchronization
  Poor coordination between the XC and the LU/TU
  Fan failure
  Faulty boards or poor board performance


Bit Errors
Operations

Equipment operator
  Measure the optical power
  Check the cable or fiber connections and grounding
  Clean the fiber connectors
  Check the ventilation and temperature
  Hardware loopback
  Replace boards
  Exclude interference sources

NMS operator
  Query and analyze alarms/performance events
  Loopback section by section
  Configuration modification
  Implement switching


Bit Errors
No-protection Line-Case
(Figure: chain network; the performance events RSBBE, MSBBE, HPBBE, MSFEBBE, HPFEBBE, LPBBE, and LPFEBBE are reported.)

Network Configuration
Node 1 is the centralized service node. Each station has E1 services with node 1.

Failure Description
Too many bit errors


Bit Errors
Where is the Problem?
(Figure: the same chain network with the reported performance events.)
1. Check and exclude external causes.
2. Performance event analysis:
   LPBBE on the signal from node 4 to node 1
   RSBBE/MSBBE/HPBBE on the signal from node 4 to node 3
   LU first, then TU: the failure is located between nodes 3 and 4.
3. Continue with the next analysis step.
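"LU first, then TU" applied to performance events can be sketched as follows: if any line-board event appears, the span it was counted on is the suspect, and the tributary (path) errors are treated as a consequence. A minimal Python sketch, assuming events are given as hypothetical (event, from_node, to_node) tuples and grouping RSBBE (from B1), MSBBE (from B2), and HPBBE (from B3) as line-board events, as on the slide, with LPBBE (from the V5 BIP-2) as a tributary-board event:

```python
# Near-end background block error events, grouped by the board that reports them.
LU_EVENTS = {"RSBBE", "MSBBE", "HPBBE"}
TU_EVENTS = {"LPBBE"}


def suspect_spans(events):
    """Apply "LU first, then TU" to near-end BBE reports.

    events: (event, from_node, to_node) tuples, meaning to_node counted the
    event on the signal it receives from from_node. Far-end events
    (MSFEBBE, HPFEBBE, LPFEBBE) are ignored in this sketch.
    Returns ("LU", spans) if any line-board event exists, else ("TU", paths).
    """
    lu = {(src, dst) for ev, src, dst in events if ev in LU_EVENTS}
    if lu:
        return "LU", sorted(lu)
    tu = {(src, dst) for ev, src, dst in events if ev in TU_EVENTS}
    return "TU", sorted(tu)


# The case above: LPBBE from 4 to 1, section/path events from 4 to 3.
events = [("LPBBE", 4, 1), ("RSBBE", 4, 3), ("MSBBE", 4, 3), ("HPBBE", 4, 3)]
print(suspect_spans(events))
```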

Bit Errors
Analysis
(Figure: the same chain network with the reported performance events.)
1. Check the fans and temperature. Normal? No: solve the problems. Yes: go to step 2.
2. Measure or query the optical power. Normal? No: check and replace the transmitter/fiber/connector/receiver. Yes: continue.
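"Optical power normal?" means the received power lies between the receiver's sensitivity and its overload point, preferably with some margin. A minimal Python sketch; the -28 dBm sensitivity and -8 dBm overload in the example are illustrative figures only (take the real values from the board's optical interface specification), and the 3 dB margin is an assumption:

```python
def check_rx_power(rx_dbm, sensitivity_dbm, overload_dbm, margin_db=3.0):
    """Classify received optical power (dBm) against the receiver's range."""
    if rx_dbm > overload_dbm:
        return "overload"           # too strong: add an attenuator
    if rx_dbm < sensitivity_dbm:
        return "below sensitivity"  # too weak: check fiber, connectors, transmitter
    if rx_dbm - sensitivity_dbm < margin_db:
        return "low margin"         # works, but may cause occasional bit errors
    return "normal"


print(check_rx_power(-26.5, sensitivity_dbm=-28.0, overload_dbm=-8.0))
```

A "low margin" result is a reasonable reason to clean the connectors and check the attenuation before looping back or replacing boards.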


Bit Errors
Final Solution
(Figure: the same chain network with a BER tester connected.)
Connect the BER tester, then use loopback and replacement:
  Loopback
  Active/standby XC switch
  Modify the configuration
Locate and replace the faulty LU/XC.


Bit Errors
Think About It!
(Figure: the same performance events RSBBE, MSBBE, HPBBE, MSFEBBE, HPFEBBE, LPBBE, and LPFEBBE.)

Question
How do you solve occasional bit errors? You cannot keep a loopback in place for a long time.
Hint: interchange the fibers or the LUs.


Questions

What is the key to troubleshooting?

To locate the failure ACCURATELY to a certain station

What is the principle of troubleshooting?

External first, then internal
Station first, then boards
LU first, then TU
Higher-severity alarms first, then lower-severity alarms


Summary

Which methods are used for troubleshooting?

1. Alarm and performance analysis
2. Loopback
3. Replacement
4. Configuration data analysis
5. Configuration modification
6. Test with instruments
7. Rule of thumb

