ZXWR RNC

Radio Network Controller


Emergency Maintenance

Version: V3.12.10

ZTE CORPORATION
No. 55, Hi-tech Road South, ShenZhen, P.R.China
Postcode: 518057
Tel: +86-755-26771900
Fax: +86-755-26770801
URL: http://ensupport.zte.com.cn
E-mail: support@zte.com.cn
LEGAL INFORMATION
Copyright 2013 ZTE CORPORATION.
The contents of this document are protected by copyright laws and international treaties. Any reproduction or
distribution of this document or any portion of this document, in any form by any means, without the prior written
consent of ZTE CORPORATION is prohibited. Additionally, the contents of this document are protected by
contractual confidentiality obligations.
All company, brand and product names are trade or service marks, or registered trade or service marks, of ZTE
CORPORATION or of their respective owners.
This document is provided as is, and all express, implied, or statutory warranties, representations or conditions
are disclaimed, including without limitation any implied warranty of merchantability, fitness for a particular purpose,
title or non-infringement. ZTE CORPORATION and its licensors shall not be liable for damages resulting from the
use of or reliance on the information contained herein.
ZTE CORPORATION or its licensors may have current or pending intellectual property rights or applications
covering the subject matter of this document. Except as expressly provided in any written license between ZTE
CORPORATION and its licensee, the user of this document shall not acquire any license to the subject matter
herein.
ZTE CORPORATION reserves the right to upgrade or make technical change to this product without further notice.
Users may visit ZTE technical support website http://ensupport.zte.com.cn to inquire related information.
The ultimate right to interpret this product resides in ZTE CORPORATION.

Revision History

Revision No. Revision Date Revision Reason

R1.0 2013-03-07 First Edition

Serial Number: SJ-20121213161606-020

Publishing Date: 2013-03-07(R1.0)

SJ-20121213161606-020|2013-03-07(R1.0) ZTE Proprietary and Confidential


Contents
About This Manual
Chapter 1 Overview to Emergency Maintenance
1.1 Purpose of Emergency Maintenance
1.2 Basic Principles of Emergency Maintenance
1.3 Precautions of Emergency Maintenance

Chapter 2 Normal Running Characteristics of the System
2.1 Criteria to Judge If the OMM Is in Normal Condition
2.2 Criteria to Judge If the RNC Is in Normal Condition
2.3 Criteria to Judge If the Base Station Is in Normal Condition

Chapter 3 Emergency Maintenance Flow
3.1 Flow of Emergency Maintenance
3.2 Checking Services
3.3 Fault Records
3.4 Initial Location and Analysis of Fault Causes
3.5 Emergency Aid
3.6 Service Recovery
3.7 Service Observation
3.8 Information Records

Chapter 4 Emergency Maintenance on Abnormal Services
4.1 Handling Flow
4.2 Power Supply Check
4.2.1 Checking Power Supply in Equipment Room
4.2.2 Checking Power Supply of Rack
4.2.3 Checking Power Supply of Shelf
4.2.4 Powering-on for Check
4.3 Handling Service Interruption Caused by Board Abnormality
4.4 Checking Working Status of System Clock
4.5 Handling Service Interruption Caused by Transmission Abnormality
4.5.1 Handling CSTM-1 Interconnection Fault
4.6 Analyzing RNC Fault Coverage
4.7 Handling RNC Service Abnormality and Interruption
4.7.1 Handling Iu Interface Faults
4.7.2 Handling Clock System Faults
4.7.3 Handling Call Failures
4.7.4 Handling Mute Calls
4.7.5 Handling Download and Webpage Access Failures after Activating PS Services
4.8 Handling Node B Service Abnormality and Interruption
4.8.1 Handling Large-Scale Cell Outages
4.8.2 Handling Absence of Cell Signals and Low Success Rate of RRC Establishments
4.8.3 Handling Service Interruption Caused by Radio Cell Abnormality
4.8.4 Handling Service Interruption Caused by Radio Configuration Data Modification Error
4.9 Handling OMM/M31 Abnormality and Interruption
4.9.1 Handling OMM and NetNumen U31 Abnormality and Interruption
4.9.2 Handling OMM and NetNumen U31 Performance Data Delay and Reporting Failure
4.9.3 Handling Database Access Failure
4.9.4 Handling Oversize Rollback Segment Caused by Mass Data Deletion
4.9.5 Handling Free Disk Space Insufficiency Caused by Improper Partition
4.10 Handling Overload
4.10.1 Handling MP CPU Overload
4.11 Data Restoration

Appendix A Data Backup and Recovery
A.1 Overview to Board Reset and Changeover
A.2 Influence of Reset and Changeover
A.3 Changeover Modes

Appendix B Emergency Maintenance Tables and Common Information Description
B.1 Abnormality Record Table
B.2 Troubleshooting Record Table
B.3 Equipment Emergency Maintenance Requisite
B.4 Common Panel Indicators
B.5 Link/Cell Fault Confirmation Methods
B.5.1 Checking Whether NCP Link is Normal
B.5.2 Checking Whether Cell Establishment is Normal

Figures
Tables
Glossary


About This Manual
Purpose
This manual describes the process and methods for emergency maintenance of the
ZXWR RNC base station controller, and the maintenance tables used to record emergency
maintenance information.

Intended Audience
This manual is intended for:

• System engineers
• Maintenance engineers

What Is in This Manual


This manual contains the following chapters:

Chapter                                       Summary

1, Overview                                   Describes the basic principles and precautions of
                                              ZXWR RNC emergency maintenance.

2, Characteristics of System Normal Running   Describes the characteristics when the system
                                              runs normally.

3, Emergency Maintenance Flow                 Describes the ZXWR RNC emergency maintenance
                                              flow and describes each step in detail.

4, Emergency Maintenance of RNC Service       Describes the emergency handling methods for
Abnormalities                                 ZXWR RNC service abnormalities.

Appendix A, Board Resetting and Switchover    Describes board resetting and switchover.

Appendix B, Emergency Maintenance Record      Describes the RNC emergency maintenance
Table and Common Information Description      record table and common information.

Related Documentation
The following documentation is related to this manual:

• ZXWR RNC Radio Network Controller Hardware Description
• ZXWR RNC Radio Network Controller Parts Replacement Guide
• ZXWR RNC Radio Network Controller Status Management Operation Guide

Conventions
This manual uses the following typographical conventions:

Typeface Meaning

Note: provides additional information about a certain topic.


Chapter 1 Overview to Emergency Maintenance
Table of Contents
Purpose of Emergency Maintenance
Basic Principles of Emergency Maintenance
Precautions of Emergency Maintenance

1.1 Purpose of Emergency Maintenance


Emergency maintenance deals with emergency faults. When an emergency fault occurs on the
system or the equipment, emergency measures remove the fault quickly and restore the system
or the equipment, which helps recover or reduce the loss.
During operation, external or internal causes may produce critical faults on some parts
and functions of ZXWR RNC. In these cases, start the emergency troubleshooting flow
immediately. Use the prompt messages, signaling trace (that is, call trace), and error logs
to determine the fault range, find the fault cause, and handle the fault.

1.2 Basic Principles of Emergency Maintenance


Emergency maintenance aims to recover the normal running of the equipment quickly. The
premise is that the system ran normally before the emergency occurred.
Observe the following basic emergency maintenance principles:
• In routine maintenance, operators can refer to the ZXWR RNC emergency
maintenance documents, past fault analyses, and experience in handling faults.
• Operators should regularly organize the related management and maintenance
personnel for study and drills. Maintenance personnel should become familiar with
the system during routine maintenance, especially the common exception information
in OMC alarms and the flashing patterns of the ZXWR RNC panel indicators.
They should be skilled in using common tools such as the data backup and recovery tool.
• When an emergency occurs, maintenance personnel should first stay calm. Check
whether the hardware and transmission of ZXWR RNC are normal, and judge whether
the accident results from ZXWR RNC. If so, handle the fault according to the
emergency handling plan or the related procedures provided in this manual.

• Before, during, and after handling the emergency, maintenance personnel should
collect the equipment alarm information related to the accident and send the
fault handling report, equipment alarm file, and log file to ZTE CORPORATION for fault
analysis and location, so that ZTE can provide better after-sales service for carriers.
• When a major fault occurs on site, recover the services in as short a time as
possible. Before performing any switchover, reset, or reboot, open the fault
location and analysis tools, such as NM alarms and signalling tracing, and keep the
information needed for fault location and analysis.

1.3 Precautions of Emergency Maintenance


Observe the following precautions during emergency maintenance:
• Locate the faults quickly. Some restoration operations (for example, board reset) may
have a huge impact on the running of the system, so such operations must be
performed by, or under the guidance of, operators with sufficient maintenance experience.
• Post all contact information of ZTE CORPORATION in a prominent position so that
users can contact ZTE easily.
• If a fault has a huge impact on network operation, maintenance personnel
should contact the ZTE Customer Support Center or the local ZTE office for technical
support as early as possible, whether or not they believe they can solve the fault
themselves.


Chapter 2 Normal Running Characteristics of the System
Table of Contents
Criteria to Judge If the OMM Is in Normal Condition
Criteria to Judge If the RNC Is in Normal Condition
Criteria to Judge If the Base Station Is in Normal Condition

2.1 Criteria to Judge If the OMM Is in Normal Condition


The following criteria can be used to judge if the OMM is in normal condition:
• The user can log in to the server from the client.
• The user can query performance statistics on the performance management interface.
• The user can view the current alarms and notifications on the alarm management
interface. All alarms are reported accurately and in time.
• The user can check the real-time status of all managed objects on the dynamic
management interface.
• The user can configure data through the configuration management function.

2.2 Criteria to Judge If the RNC Is in Normal Condition


The following criteria can be used to judge if the RNC is in normal condition:
1. When the user checks the running status of the RNC through the EMS system:
• The OMM server is linked to the RNC OMP, and the user can ping the RNC
OMP address from the SBCX. For example, check the link status in the EMS
system, where the link-status icon shows whether the link is established
successfully or the link establishment has failed.

Run the command below to ping the RNC OMP address from the SBCX. For
example, if the RNC OMP address is 129.1.1.1:

#ping 129.1.1.1

• There is no board alarm in the alarm management interface of the EMS
system, especially no alarm saying that the control-plane communication link is
broken.
• In the alarm management interface of the EMS system, all boards are in normal
active/standby status.

• In the dynamic data management interface of the EMS system, all units, subunits,
neighboring office signaling points, No. 7 links, IMA groups, IMA links, PVCs,
signalling links, AAL2 channels, SCTP connections, AS & ASP, Node B ports,
cells, and channels are in unblocked / activated / available status.
• In the dynamic data management interface of the EMS system, all base stations
are unblocked.
2. When the user checks the working status of the board indicators at the RNC rack side:
• No red indicator is ON on any board.
• For the two ROMB/RCB boards of each module, the Active indicator is ON
on one MP and the Standby indicator is ON on the other. The
Run indicator flashes slowly (ON/OFF once every second; the same applies
hereinafter). The active/standby slots conform to what the EMS indicates.
• The Run indicators of the ROMB/RCB flash slowly.
• For the two ICMGs, the Active indicator is ON on one ICMG and
the Standby indicator is ON on the other. The Run indicator flashes
slowly. The reference selection indicator and the trace indicator are always ON.
• The Run indicator on the DTB flashes slowly. The E1 indicator flashes slowly at
1 Hz.

If the indicator of a port configured with E1 flashes slowly, the E1 is correctly
connected; if the indicator is always ON, data is configured but the E1 is not
connected; if the indicator is OFF, no data is configured.
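
The OMP reachability check in item 1 can also be scripted. The following is a minimal sketch in Python, assuming the host that runs it has a standard ping utility on its path; the function names and the default packet count are illustrative and are not part of any ZTE tool.

```python
import platform
import subprocess


def build_ping_command(address, count=3):
    """Return the ping argument list for the current operating system."""
    # Windows ping uses -n for the packet count; most other systems use -c.
    count_flag = "-n" if platform.system().lower() == "windows" else "-c"
    return ["ping", count_flag, str(count), address]


def omp_reachable(address, runner=subprocess.run):
    """Return True when ping exits with status 0, that is, the address answered."""
    result = runner(
        build_ping_command(address),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

For the example address in this section, the call would be omp_reachable("129.1.1.1"). The runner parameter exists only so that the check can be exercised without network access.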

2.3 Criteria to Judge If the Base Station Is in Normal Condition

The following criteria can be used to judge if the base station is in normal condition:

• There is no alarm at the EMS alarm management station.
• The performance indices in the EMS statistics are normal.
• All cells in the base station that is managed dynamically by the EMS are in normal
status.
• The KPI statistics show that the number of RRC connection requests is not 0 and the
success rate of RRC establishments is above 98%.
• There are no board alarms and all indicators flash normally at the base station side.
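
The RRC criterion above can be expressed as a small check. This is a sketch only: the function name and its inputs are hypothetical, and the counters would have to be read from the KPI statistics by whatever means the site uses.

```python
def rrc_kpi_normal(requests, successes, min_success_rate=0.98):
    """Return True when RRC requests are non-zero and the success rate exceeds the threshold."""
    # Zero requests means the cell carries no traffic, which the manual treats as abnormal.
    if requests <= 0:
        return False
    return successes / requests > min_success_rate
```

For example, 1000 requests with 990 successes pass the check, while 1000 requests with 970 successes, or no requests at all, do not.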


Chapter 3 Emergency Maintenance Flow
Table of Contents
Flow of Emergency Maintenance
Checking Services
Fault Records
Initial Location and Analysis of Fault Causes
Emergency Aid
Service Recovery
Service Observation
Information Records

3.1 Flow of Emergency Maintenance


The flow of the emergency maintenance is as shown in Figure 3-1.

Figure 3-1 Flow of Emergency Maintenance

It involves the following steps:


1. Check services.
2. Record the abnormalities and fill in the Abnormality Record Table.
3. Make an initial location and analysis of the faults.
4. Request emergency aid, and fill in and send the Equipment Emergency Maintenance
Requisite.
5. Recover services.
6. Observe services.
7. Record the information and fill in the Troubleshooting Record Table.

3.2 Checking Services


Context
When an emergency fault occurs, check the services according to the following steps:

Steps
1. Go to the cabinet immediately to check the power supply. If a power failure occurs over
a large area, inform the power supply maintenance personnel to recover the power supply.
Shut down the power supply of the cabinets one by one. Power on after the power
supply is stable.
2. If the external power supply is normal, read the user complaints and then observe
the call status of all offices from the performance statistics console. Determine
whether the fault occurs in all offices or only in some offices. If the fault occurs only
in some offices, contact the personnel in those offices, check the interface state and
link state, narrow down the fault range, and determine whether the fault is in the local
office. If not, work with the peer office to handle it. If so, go to Step 3.

3. Check whether the indicator status on the hardware boards is normal. Check whether
the physical connections and links with other network elements are normal. If so,
contact the maintenance personnel of the other network element for troubleshooting, or
find the possible source by referring to the emergency maintenance manual of that
network element.
4. If there is no obvious hardware fault on the boards, check whether the software or the
data has a problem. Observe the alarm information on the OMC client and check whether
there is any board abnormality or link abnormality alarm. If all is normal, check whether
the radio resource cell status is normal and whether the physical connections and links
with other network elements are normal. Try to recover quickly: check the operation logs
to see whether the system is down because data was modified by mistake or deleted
(compare the MML operation logs with the alarm time to judge whether the operation and
the fault are related). If so, recover the data.

5. If all is normal, contact the personnel of the other network element (such as Node B or
CN) for troubleshooting, or find the possible source by referring to the emergency
maintenance manual of that network element.
End of Steps
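
The comparison of MML operation logs with the alarm time described in Step 4 can be sketched as follows. This is an illustration under stated assumptions: the operation records are taken to be dictionaries with a "time" and a "command" field, the 30-minute look-back window is an assumed default, and neither reflects the actual export format of the log management subsystem.

```python
from datetime import datetime, timedelta


def operations_before_fault(operations, fault_time, window_minutes=30):
    """Return the operations executed within the window before the fault, newest first."""
    window_start = fault_time - timedelta(minutes=window_minutes)
    # Keep only operations that ran inside [window_start, fault_time].
    suspects = [op for op in operations if window_start <= op["time"] <= fault_time]
    return sorted(suspects, key=lambda op: op["time"], reverse=True)
```

An operation that was executed shortly before the first alarm is only a candidate cause; confirm it against the configuration data before recovering anything.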

Follow-Up Action

Caution!

When a fault occurs, it is important to locate it, especially to determine whether it is
in the local office or in another office. This is essential for fast troubleshooting.

3.3 Fault Records


Before or while starting the emergency recovery plan or the fault recovery, record
the running version and the fault phenomena in the abnormality record table.
Back up the OMC configuration data properly.

Caution!
The abnormality record is very useful in emergency aid and in the subsequent problem
analysis and summary. Therefore, be sure to fill in a complete abnormality record.

3.4 Initial Location and Analysis of Fault Causes


Collect the relevant alarm, performance, and printed log data, and analyze the obvious
symptoms of the network fault. Observe the equipment operation information and the
board indicators. Determine whether the fault is caused by the ZXWR RNC equipment or by
other reasons, and determine its scope.
If the fault is caused by the ZXWR RNC equipment, analyze the field alarms, performance
data, signalling, and printed logs, and start troubleshooting after finding the fault
point.

Locate and analyze the fault based on the following three aspects:
1. Service faults often begin with user complaints, so register the user number.
Use the tools available at the radio and CN sides to identify the base station where
the complaining user is located, and then locate and analyze the fault.
• Use signalling trace and probes to find the CN, RNC, or Node B where the
complaining user is located, to determine the fault-related equipment.

• If you cannot determine the location of the complaining user at the RNC side,
ask the CN side for help.
2. Determine the fault scope through KPI analysis.
• Query the relevant KPI indices to determine which base stations are affected by
the fault.
• Determine whether it is a global fault based on the faulty base stations.
• Determine whether the fault is associated with a module or a specific board based
on the faulty base stations.
3. Test arrangement.
If possible, arrange tests in specific areas to provide more accurate information for
emergency maintenance.
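
The scope decision in item 2 can be sketched as a simple classification. The sketch below assumes a hypothetical mapping from base station name to the RNC module that serves it; the site names, module numbers, and result labels are illustrative and do not come from the EMS.

```python
def classify_fault_scope(faulty_sites, site_to_module):
    """Return a (scope, module) tuple describing how widely the fault is spread."""
    faulty = set(faulty_sites)
    if not faulty:
        return ("none", None)
    # Every managed base station is affected: treat it as a global fault.
    if faulty >= set(site_to_module):
        return ("global", None)
    modules = {site_to_module[site] for site in faulty}
    # All faulty base stations hang off one module: suspect that module or its boards.
    if len(modules) == 1:
        return ("single-module", modules.pop())
    return ("multi-module", None)
```

A "single-module" result points the check toward the boards of that module, while a "global" result points toward shared resources such as power, clock, or core network links.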

3.5 Emergency Aid


If the services are not recovered after starting the emergency recovery plan or the
troubleshooting procedure provided in this document, or if the fault is a critical system
fault, ask for emergency aid immediately. ZTE CORPORATION provides emergency aid
through 7 x 24 hour service hotlines, remote technical support, and on-site technical
support.
• Service hotline
Hotline of ZTE's customer service center: 800-830-1118
International service hotline: +8675526771900
Service fax: (0755)26770801
International service fax: +8675526770801

In addition, provide the on-site fault record table wherever possible so that ZTE
maintenance personnel can understand and locate the fault more easily.

• Remote technical support
Based on the information provided through the service hotline, ZTE technical support
experts log in to the problematic site remotely. Experts can solve common problems over
the phone. If the problem is complex, ZTE CORPORATION sends maintenance
experts to the site for on-site technical support.

• On-site technical support
After the maintenance experts arrive at the site, they take emergency maintenance
actions to recover communication as soon as possible.

3.6 Service Recovery


If the methods provided in this manual and remote emergency aid cannot help to locate the
faults and recover the services, switch over, reset, or replace boards to recover the
system service. These operations may have a great impact on the system. Refer to
Appendix E and Appendix A of ZXWR RNC (V3.07.310) Radio Network Controller Trouble
Shooting before the operation.

Caution!
Board switchover, reset, and replacement may have a great influence on system operation.
Therefore, refer to Appendix E and Appendix A of ZXWR RNC (V3.07.310) Radio Network
Controller Trouble Shooting before the operation.
Record the current status before any board switchover or change of physical
location.
Record each step taken and each symptom that occurs during the service recovery on site.

3.7 Service Observation


After the service recovery, check further whether the system has recovered completely, to
avoid any other problems. Observe the services by referring to ZXWR RNC (V3.07.310)
Radio Network Controller Trouble Shooting and make sure that they run normally.
In addition, arrange for personnel to be on duty during service peak periods so that any
problem that occurs can be solved in time.

3.8 Information Records


Collecting fault information is important for requesting technical support, analyzing and
locating the fault cause, and preventing such faults from recurring. It provides valuable
maintenance experience for operators and a good reference for manufacturers to improve
the equipment.
After the service recovery, compile the fault recovery procedures and analyze the fault
causes according to the fault handling record table. If emergency aid was requested,
feed back the recovery results to the ZTE service hotline.
In addition, whether or not the fault has been resolved, maintenance personnel
should collect the information in time. The information to collect includes the following
items:

1. Brief notice
The operator writes a brief notice that includes the fault occurrence time, fault
properties, fault symptoms, and detailed troubleshooting steps. If the fault cannot be
removed, record the detailed handling steps taken so far to speed up future
troubleshooting.

2. System debugging information
Copy all logs on the EMS server and save them to a new folder.

Save the log files on the ZXWR RNC SBCX with the file manager.
3. Alarm information

Collect the history alarms from thirty minutes before to thirty minutes after the fault.
Maintenance personnel can query and save them in the alarm browsing window.
4. Command log information
Collect the command log information from thirty minutes before to thirty minutes after
the fault. Maintenance personnel can query the operation log, security log, and system
log in the log management subsystem of the EMS.
5. RNC abnormality log
When a fault occurs, export all log files under the directory /IDE0/ExcInfo, and
Exc_omp.txt and Exc_pp.txt under the directory /DOC0, of the active/standby OMP
boards. You can then analyze the fault according to these log files.

Caution!
The log files from the active and standby OMP boards must be saved separately to
avoid overwriting.
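
One way to keep the active and standby exports apart is to archive them under separate folders. The following sketch assumes the log files have already been exported to the maintenance PC; the function name, the folder layout, and the role labels are illustrative choices, not a ZTE procedure.

```python
import shutil
from pathlib import Path


def archive_omp_logs(source_files, archive_root, board_role):
    """Copy exported log files into archive_root/<board_role>/ and return the new paths."""
    if board_role not in ("active", "standby"):
        raise ValueError("board_role must be 'active' or 'standby'")
    # A per-role folder keeps same-named files (such as Exc_omp.txt) from colliding.
    target_dir = Path(archive_root) / board_role
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in source_files:
        destination = target_dir / Path(source).name
        shutil.copy2(source, destination)
        copied.append(destination)
    return copied
```

Calling the function once per board, with the role passed explicitly, leaves two copies of each file name on disk instead of one.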


Chapter 4 Emergency Maintenance on Abnormal Services
Table of Contents
Handling Flow
Power Supply Check
Handling Service Interruption Caused by Board Abnormality
Checking Working Status of System Clock
Handling Service Interruption Caused by Transmission Abnormality
Analyzing RNC Fault Coverage
Handling RNC Service Abnormality and Interruption
Handling Node B Service Abnormality and Interruption
Handling OMM/M31 Abnormality and Interruption
Handling Overload
Data Restoration

4.1 Handling Flow


The following describes the procedure for checking ZXWR RNC emergency faults. The
procedure can change with the specific situation. For example, skip Steps 3 and 4 if there
are no modifications on the configuration data.
1. Check the power supply.
2. Handle the user service interruption caused by a ZXWR RNC board fault.
3. Check the working status of the system clock.
4. Handle the user service interruption caused by abnormal transmission.
5. Handle the user service interruption caused by an abnormal radio cell.
6. Handle the user service interruption caused by an incorrect modification of the ZXWR
RNC radio configuration data.

If the above steps do not resolve the fault, refer to Emergency Aid.

4.2 Power Supply Check


A system power failure may result from a power supply fault in the equipment room. When a sudden power failure occurs, first confirm the source of the fault. Check in the following order:
1. Check whether the power supply in the equipment room is normal.

2. Check whether the power supply of all racks is normal.


3. Check whether the power supply of all shelves is normal.
4. Power on the equipment again.
5. Validate the service.
The flow is as shown in Figure 4-1.

Figure 4-1 Handling Process During System Power Failure

4.2.1 Checking Power Supply in Equipment Room


Prerequisite
To recover from a power failure caused by a power supply fault in the equipment room, maintenance personnel should follow the steps below and then power on the system again.

Steps
1. Before the power supply in the equipment room recovers, power off all switches on the cabinet power distribution subrack that connects to the external power supply system, to prevent accidents. Keep the dual-path power supply on the rack in the OFF position.

2. Check the power supply system in the equipment room.


3. After the power supply in the equipment room recovers, power on the system again in the following order.

a. Power on the power distribution subrack. Check whether the power supply voltage is within the normal range: -57 V to -40 V.
b. Power on the dual-path supply on the racks. Check whether the power supply voltage is within the normal range: -57 V to -40 V.
c. Restore the power supply of the network cabinet and the server cabinet. Start the EMS server, the charging dual-machine server, and the disk array.
End of Steps
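The voltage check in the power-on steps amounts to a simple range test. The following is a minimal sketch in Python, assuming the reading is taken manually (for example with a multimeter) and entered in volts. The function name and constants are illustrative and are not part of any ZTE tool.

```python
# Acceptable supply range taken from the power-on steps: -57 V to -40 V.
V_MIN = -57.0
V_MAX = -40.0


def supply_voltage_ok(volts: float) -> bool:
    """Return True if the measured voltage lies within -57 V to -40 V."""
    return V_MIN <= volts <= V_MAX


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    for reading in (-48.0, -39.5, -58.2):
        status = "normal" if supply_voltage_ok(reading) else "out of range"
        print(f"{reading:6.1f} V: {status}")
```

Because the range is negative, the comparison keeps the more negative limit on the left; a reading such as -39.5 V is rejected even though its magnitude is close to the limit.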

4.2.2 Checking Power Supply of Rack


1. Check whether the power supply on the rack is normal.
a. If the RUN indicator on the rack power distribution subrack panel is green and flashes at 1 Hz, the power supply of the rack is normal. That is, the inlet cable and the internal circuit of the power distribution subrack are normal.
b. If the RUN indicator on the rack power distribution subrack panel is OFF, the power supply of the rack is abnormal. That is, both inlet paths on the power distribution subrack are under-voltage. Check whether the cable from the power distribution subrack to the rack is loose or broken. If so, replace the cable promptly.
2. Recover the power supply on the rack.
Check whether the dual-path power supply voltage of the power distribution subrack is within the normal range (-57 V to -40 V). If so, set the power switch to the ON position to resume the rack power supply.

4.2.3 Checking Power Supply of Shelf


Steps
1. Check whether the power supply on the shelf is normal.
i. If all LED indicators on the shelf are OFF, the power supply on the shelf is abnormal.
ii. If the switch on the back of the shelf is in the OFF position, the power supply on the shelf is abnormal.

2. Recover the power supply on the shelf.
Check whether the dual-path power supply voltage of the power distribution subrack is within the normal range. If so, set the power switch on the back of the shelf to the ON position to restore the shelf power supply.

End of Steps

4.2.4 Powering-on for Check


Prerequisite
The order in which to power on the system again, and the related precautions, are described below.

Steps
1. Power on the OMC server and the OMC client.
Start the related service programs. If the connection is normal, the login main interface pops up at the client. After a successful login, the client can operate normally.
2. Power on the ZXWR RNC rack, shelf, and boards.

Power-on order: power on the master control shelf first, and then the other shelves. Power on and start ZXWR RNC. Observe the RUN indicator to check whether the system starts normally (no alarm, and RUN flashes slowly). After making sure that the OMC server has started normally, check the following items:
• Through fault management, check whether there is any communication fault, for example, whether the communication between modules is normal. If there is a fault, deal with it.
• Through fault management, check whether any signaling link in the office reports an error. If so, deal with it.
• Perform the basic service test. Use signaling tracing and failure observation to make sure that services have recovered.
• Check that the upper-level EMS alarms and the performance statistics are reported normally.
End of Steps

4.3 Handling Service Interruption Caused by Board Abnormality

This section describes the types of board that are closely tied to the normal running of services, to help locate and troubleshoot faults rapidly.
1. Interface unit: APBE, DTA, DTT, SDTT, GIPI4, SDTA2, and GIPI3. These boards mainly provide data access on the ZXWR RNC Iu/Iub/Iur interfaces and terminate AAL2/AAL5/ATM and IP over E1 link layer processing. APBE provides optical fiber access (STM-1), and the optical interface SD indicator on its panel shows the connection status. DTA/DTT supports E1 access, and the E1 indicator on the panel shows the E1 connection status. SDTI and SDTA2 provide channelized CSTM-1 access. GIPI4 is the Giga IP interface board of ZXWR RNC and provides IP access and the OMCB gateway.
2. Switching unit: PSN, GLI, UIMC, UIMU, CHUB, THUB, and GUIM, which provide the inter-board service exchange platform.
2. Switching unit: PSN, GLI, UIMC, UIMU, CHUB, THUB, and GUIM, which provide the
inter-board service exchange platform.

3. Processing unit: RCB and RUB, which process the upper layer protocols of ZXWR
RNC control plane and user plane.
Generally, the alarm function of the OMC client and the indicator status of the ZXWR RNC boards help to identify the failed board and the cause of the failure.
1. Log on to the EMS client and click Tool > Alarm Management. In the alarm management function of the ZXWR RNC EMS, check whether there is any board alarm, and note the type of the board that reports it.
2. Observe the indicators on the board.
The following are examples of common indicator states.

a. Check ENUM on the board. In normal cases, it is solid OFF. If the indicator is solid ON or flashes, the board is not seated properly. Unplug the board, plug it in again, and observe the status again.

b. If RUN flashes slowly (1 time/s) and ALM is solid OFF, the board is running normally. Any other flashing pattern means that the board is not running normally. If RUN is solid OFF, the board has failed its self-test. If both RUN and ALM flash slowly (1 time/s), the board is in an active/standby changeover. Wait for a while to see whether the board returns to its normal status.
c. Check ACT on the board. If it is solid ON, the board is the active board; if it is solid OFF, the board is the standby one. This indicator helps to locate active/standby changeover failures.
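The indicator rules above can be written down as a small decision function, which is convenient when recording observations from many boards. This is only a sketch: the state names ('off', 'on', 'slow_flash', 'fast_flash') and the function itself are hypothetical, and the rules are limited to those stated in this section.

```python
def diagnose_board(run: str, alm: str, enum: str) -> str:
    """Interpret the ENUM, RUN and ALM indicators of one board.

    Each argument is one of 'off', 'on', 'slow_flash', 'fast_flash'
    (slow_flash means about 1 time/s).
    """
    if enum != "off":
        # ENUM solid ON or flashing: board not seated properly.
        return "not seated: unplug and re-insert the board"
    if run == "off":
        return "self-test failed"
    if run == "slow_flash" and alm == "slow_flash":
        return "active/standby changeover: wait and re-check"
    if run == "slow_flash" and alm == "off":
        return "running normally"
    return "abnormal: check the EMS alarms"


if __name__ == "__main__":
    print(diagnose_board(run="slow_flash", alm="off", enum="off"))
    print(diagnose_board(run="slow_flash", alm="slow_flash", enum="off"))
    print(diagnose_board(run="off", alm="on", enum="off"))
```

ENUM is tested first because a board that is not seated properly can show misleading RUN and ALM states.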
Proposals for handling such faults:
1. The alarm management information of the ZXWR RNC EMS generally indicates the cause of an alarm and the recommended operation to eliminate it. Perform the related operations according to this information.
2. Wait for the ZXWR RNC board to return to its normal status, and observe whether the user service returns to normal.
If the indicators flash abnormally for a long time while the board is running and the alarm persists, try the following operations:
1. Save the alarm information.
2. Reset the alarm board or replace the board.

Caution!
Resetting ZXWR RNC boards may have a huge impact on services. For example, if you reset a RUB, all cells and user services on the board must be re-created; if you reset an interface board, all bearers allocated on the board must be re-created. Therefore, proceed with caution.

4.4 Checking Working Status of System Clock


A clock system fault may interrupt services globally. Check the clock system as follows:
• Check the LED indicators of the clock board (CLKG/ICM) to see whether the board is running normally.
• If the clock board is normal, check whether its connection with GUIM on the BGSN shelf is normal.
• If the clock board and its connection with GUIM are normal, check whether GUIM is normal.
If the clock fault persists, contact the related personnel for timely recovery.

4.5 Handling Service Interruption Caused by Transmission Abnormality

Use the following methods to determine whether a user service interruption is caused by abnormal transmission:
1. On the EMS client, check the status of the transmission links, such as NCP, CCP, ALCAP, and MTP3B links and associations, to see whether any of them has failed.
2. On the EMS client, check whether there is any resource alarm for the cell common transport channel, No.7 link, NCP, CCP, or association. Check whether the alarm persists and cannot recover.
3. In ATM transmission mode, check the optical interface SD indicator and the E1 indicator of the interface board to determine whether the transmission line is normal.
a. For APBE, check the optical interface SD indicator. It is solid ON during normal communication. Otherwise, the optical fiber may be faulty.
b. For DTA, make sure that the E1 indicator flashes slowly (1 time/s) during normal communication. Otherwise, the E1 connection may be faulty.
c. For DTT, make sure that the E1 indicator flashes slowly (1 time/s) during normal communication. Otherwise, the connection may be faulty. For example, solid ON indicates that the E1 link is configured but blocked.
4. In IP transmission mode, check GIPI4, GIPI3, and CMP.
CMP processes the system signaling data. GIPI4 provides the external IP interface of ZXWR RNC. When a fault occurs on GIPI4, the communication between ZXWR RNC and other NEs is disconnected.
Use the following methods to check the working status of GIPI4, GIPI3, and CMP.
a. Check RUN on the panel. When communication is normal, RUN flashes slowly (1 time/s). If it is abnormal, first check whether the IP cable connection is normal, and then check whether there is any failure alarm for the port on the GIPI4.

b. Check whether ALM on the panel is ON.
c. On the cabinet diagram in Equipment Resource Management of the EMS, query whether the CPU occupation ratio of GIPI4/CMP reaches 100%.
Proposals for handling link resource faults:
1. Check whether the data negotiated between ZXWR RNC and external NEs such as Node B and CN is consistent (for example, NCP, CCP, MTP3B link, ATM address, and IP address). If any configuration data is abnormal, the local NE or another NE may have modified it. Confirm this and correct the data.
2. If no abnormality is found, perform a self-loop on the optical interface or IMA group at the ZXWR RNC side.
3. If conditions allow (for example, the NEs are very close to each other), perform a self-loop at the corresponding remote NE according to the location of the link fault. For example, for an Iub link, perform the self-loop on the optical interface of the interface board at the Node B side. For the Iu interface, perform the self-loop on the optical interface of the interface board at the CN side.
4. If the fault disappears after the local self-loop, the cause may be abnormal running of the peer NE. If the peer NE becomes normal after the self-loop, the cause is a transmission network configuration fault.
5. If the fault persists after the self-loop, check the optical fiber for damage and exposure.
6. For an IP network, if global services are disconnected while all equipment is running normally, maintenance personnel should first examine whether the IP network is running normally.
a. Check the association status in EMS configuration management. If the association is not in service, re-create it. If the creation fails, connect the cable from the interface to a debugging machine. Set the IP address of the debugging machine to the local interface IP address, and check the IP network by pinging the peer interface IP address.
b. In the performance counters, check the QoS statistics by office IP link type. Judge the reachability of the peer IP address from the packet loss rate.
c. With a dedicated instrument or software, test the transmission delay, bit error rate, and jitter of the IP network to confirm whether faults such as network congestion, network storms, or virus attacks are occurring in the IP network.
7. If none of the above methods solves the problem, launch the emergency aid or reset the interface board.
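Judging reachability from the packet loss rate, as in step 6, comes down to a ratio of two counters. The sketch below assumes the sent and received packet counts are read manually from a ping summary or from the QoS statistics; the function names are illustrative, and no loss threshold is implied beyond "none", "some", and "all".

```python
def packet_loss_rate(sent: int, received: int) -> float:
    """Fraction of packets lost, computed from sent/received counters."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return (sent - received) / sent


def peer_status(sent: int, received: int) -> str:
    """Summarize the reachability of the peer IP address."""
    loss = packet_loss_rate(sent, received)
    if loss == 0.0:
        return "reachable, no loss"
    if loss == 1.0:
        return "unreachable"
    return f"reachable with {loss:.1%} packet loss"


if __name__ == "__main__":
    # Hypothetical counters, for illustration only.
    print(peer_status(100, 100))
    print(peer_status(100, 97))
    print(peer_status(100, 0))
```

What loss rate is acceptable depends on the service and the network, so the sketch reports the rate rather than judging it.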

4.5.1 Handling CSTM-1 Interconnection Fault


ESDTI/ESDTA is the SDH/SONET network interface board of the platform. It accesses the ATM/IP link through the CSTM-1 interface. This section summarizes the interconnection faults of ESDTI/ESDTA in this mode and how to troubleshoot them.

In the SDH transport network shown in Figure 4-2, REG is the regenerative repeater, ADM
is the add/drop multiplexer, DXC is the digital cross-connection equipment, and TM is the
terminal multiplexer.

Figure 4-2 Network Location and Alarm Structure of ESDTI, ESDTG, and ESDTT

The optical interface board terminates the regenerator section overhead between the REG and the optical interface board. Alarms for maintaining the regenerator section include: LOS (Loss Of Signal), LOF (Loss Of Frame), and RS-TIM (Regenerator Section - Trace Identifier Mismatch).

The optical interface board terminates the multiplex section overhead between the nearest ADM and the optical interface board. Alarms for maintaining the multiplex section include: MS-AIS (Multiplex Section - Alarm Indication Signal), MS-FERF (Multiplex Section - Far End Receive Failure), SF (Signal Failure), and SD (Signal Degrade).

The optical interface board terminates the higher order path overhead between the DXC and the optical interface board. Alarms for maintaining the higher order path include: AU-AIS (Administrative Unit - Alarm Indication Signal), AU-LOP (Administrative Unit - Loss Of Pointer), HP-TIM (Higher-order Path - Trace Identifier Mismatch), HP-UNEQ (Higher-order Path - Unequipped), HP-PLM (Higher-order Path - Payload Label Mismatch), HP-FERF (Higher-order Path - Far End Receive Failure), and LOM (Loss Of Multiframe).

The optical interface board terminates the lower order path overhead between the TM and the optical interface board. Alarms for maintaining the lower order path include: TU-AIS (Tributary Unit - Alarm Indication Signal), TU-LOP (Tributary Unit - Loss Of Pointer), LP-RDI (Lower-order Path - Remote Defect Indication), LP-RFI (Lower-order Path - Remote Failure Indication), LP-TIM (Lower-order Path - Trace Identifier Mismatch), LP-UNEQ (Lower-order Path - Unequipped), and LP-PLM (Lower-order Path - Payload Label Mismatch).

The optical interface board terminates the E1 circuit overhead between the opposite switch and the optical interface board. Alarms for maintaining the E1 circuit include: E1-AIS (E1 Alarm Indication Signal), E1-LOF (E1 Loss Of Frame), E1-LOM (E1 Loss Of Multiframe), E1-RAI (E1 Remote Alarm Indicator), E1-FEBE (E1 Far End Block Error), and E1-SLIP.
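For quick reference during fault location, the alarm lists above can be kept as a lookup from alarm name to the layer it belongs to. The following sketch is illustrative only: the dictionary and function names are hypothetical, alarm names are written in their hyphenated form, and the MS-prefixed alarms are grouped under the multiplex section, as their expansions indicate.

```python
ALARM_LAYERS = {
    "regenerator section": ["LOS", "LOF", "RS-TIM"],
    "multiplex section": ["MS-AIS", "MS-FERF", "SF", "SD"],
    "higher order path": ["AU-AIS", "AU-LOP", "HP-TIM", "HP-UNEQ",
                          "HP-PLM", "HP-FERF", "LOM"],
    "lower order path": ["TU-AIS", "TU-LOP", "LP-RDI", "LP-RFI",
                         "LP-TIM", "LP-UNEQ", "LP-PLM"],
    "E1 circuit": ["E1-AIS", "E1-LOF", "E1-LOM", "E1-RAI",
                   "E1-FEBE", "E1-SLIP"],
}

# Invert the table: alarm name -> layer name.
LAYER_OF = {alarm: layer
            for layer, alarms in ALARM_LAYERS.items()
            for alarm in alarms}


def layer_of(alarm: str) -> str:
    """Return the layer an alarm belongs to, or 'unknown'."""
    return LAYER_OF.get(alarm.strip().upper(), "unknown")


if __name__ == "__main__":
    for name in ("LOS", "MS-AIS", "tu-lop", "E1-RAI", "XYZ"):
        print(f"{name}: {layer_of(name)}")
```

Knowing the layer tells the maintainer which network element terminates the other end of that section or path (REG, ADM, DXC, TM, or the opposite switch).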

4.5.1.1 Principles for Handling Transmission Alarms


• Handle the near-end alarms before the far-end alarms
Near-end alarms are reported by the optical interface board when it detects an abnormality in the signals it receives. Far-end alarms are reported to the optical interface board when the opposite device detects an abnormality in the signals it receives. Handle the near-end alarms first because they are the original alarms that indicate the abnormality.
• Look for what the alarms have in common
When several optical interface boards generate alarms of the same type, or when alarms of the same type occur on several E1 links, observe whether the boards or E1 links belong to the same office, or whether the opposite exchanges are the same model.
• Contact the peer end to handle the far-end alarms
Far-end alarms are reported by the peer equipment. When far-end alarms occur on the optical interface board, ask the maintenance personnel of the far-end equipment what the original alarms are and have those alarms eliminated. The far-end alarms on the optical interface board then disappear.
• Ping each transmission section of the Ethernet
Locate the transmission fault. That is, find the faulty section (for example, a section that cannot be pinged or a section on which packet loss occurs), or locate the faulty equipment.
• Test the internal media plane
Test the media plane within the RNC, between boards, and between shelves, to check whether any packet loss results from a hardware fault in the RNC.
• Handle transmission alarms
Clear the alarms for packet loss at the bottom bearer layer caused by abnormal or unstable transmission.
• Capture packets
Capturing packets at the external switch is a way to find out on which NE packet loss and disorder occur. Packets captured at the site and at the RNC are compared with each other. The PTN equipment at the Iub interface can also capture packets.
The packet capture function varies between manufacturers, so contact the transmission equipment manufacturer for support.

4.5.1.2 Methods for Handling Transmission Alarms


• Determine the fault type through comparison
When the alarm exists on some interface boards, swap the boards or the cable connections, if allowed, to determine whether the alarm is related to the board or to the office.
• Locate the fault through loopback
The EMS test management interface provides different loopback settings for the interface board, including line loopback of optical path, test loopback of optical path, line loopback of optical path at the system side, line loopback of E1, and test loopback of E1.

4.5.1.3 Causes for Transmission Alarms


• LOS, LOF
The REG device directly connected to the interface board may be faulty, or the pigtail/flange between the local-end ODF and the equipment may be faulty.
• AU-AIS, AU-LOP, HP-UNEQ, HP-PLM
The SDH transport network has not enabled or configured the higher order path.
• TU-AIS, TU-LOP, LP-UNEQ, LP-PLM
The lower order path is not established in the SDH transport network, or the DXC configuration does not meet the networking requirements.
• E1-AIS, E1-LOF
There is a connection fault between the opposite exchange and the SDH transport device, such as an E1 cable connection fault.
• RS-TIM, HP-TIM, LP-TIM
The local values of J0, J1, and J2 are inconsistent with the configuration of the SDH transport device. Alarms of these three types do not affect services.
To eliminate the alarms, query the opposite configuration to obtain the J0, J1, and J2 values of the transport device, and then modify the values in the database.
• MS-FERF, HP-FERF, LP-RDI, E1-RAI
First check whether there are near-end alarms on the corresponding layer. If there are, eliminate them first; otherwise, have the near-end alarms on the opposite end eliminated.
For E1-RAI alarms in particular, contact the maintenance personnel of the opposite exchange to confirm whether the E1 frame format is the same as at the local end.
• E1-SLIP
If E1-SLIP occurs while the board is running normally, the cause is a clock fault.
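The rule of handling near-end alarms before far-end alarms can be sketched as a stable sort over the active alarm list. The far-end set below is taken from the far-end indication alarms named in this section; it accepts both the MS-FERF and the RS-FERF spelling. The function name is hypothetical.

```python
# Far-end indication alarms named in this section.
FAR_END_ALARMS = {"MS-FERF", "RS-FERF", "HP-FERF", "LP-RDI", "E1-RAI"}


def handling_order(active_alarms):
    """Return the alarms with near-end ones first.

    sorted() is stable, so the original order is preserved within
    the near-end group and within the far-end group.
    """
    return sorted(active_alarms, key=lambda alarm: alarm in FAR_END_ALARMS)


if __name__ == "__main__":
    # Hypothetical alarm list, for illustration only.
    print(handling_order(["E1-RAI", "LOS", "HP-FERF", "TU-AIS"]))
    # ['LOS', 'TU-AIS', 'E1-RAI', 'HP-FERF']
```

The sort key is a boolean, and False sorts before True, which places every near-end alarm ahead of every far-end alarm.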

4.5.1.4 E1 Array Mode Fault


The E1 links of the interface board can be arranged on the optical path in two array modes, G.707 and Tributary. Both interconnection ends must use the same array mode. If they do not, the probable symptom is that services borne on E1 1, 4, 7, 10, 13, 16, 19, 23, 26, 29, 32, 35, 38, 41, 45, 48, 51, 54, 57, 60, and 63 are normal while services on the other E1 links are disconnected.
To confirm whether the array modes are the same, insert an E1-AIS alarm on E1 2 at the opposite end. If E1 22 detects the alarm, the array modes at the two interconnection ends are different.
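The list of unaffected E1 links and the E1 2 / E1 22 test can be reproduced with a short calculation. The sketch below assumes that the two array modes differ as sequential numbering versus interleaved numbering of the 3 x 7 x 3 (TUG-3, TUG-2, TU-12) structure that carries 63 E1s in one STM-1. This assumption is not stated in this document; it is used here because it yields exactly the numbers given above. The function name is hypothetical.

```python
def other_mode_number(n: int) -> int:
    """Map an E1 number (1..63) from one array mode to the other.

    The position is split into (TUG-3, TUG-2, TU-12) indices and the
    TUG-3 and TU-12 weights are exchanged. The mapping is its own
    inverse, so it works in both directions.
    """
    if not 1 <= n <= 63:
        raise ValueError("E1 number must be in 1..63")
    tug3, rest = divmod(n - 1, 21)   # 3 TUG-3 groups of 21 E1s
    tug2, tu12 = divmod(rest, 3)     # 7 TUG-2 groups of 3 E1s
    return 1 + tu12 * 21 + tug2 * 3 + tug3


# E1s that keep the same number in both modes stay in service.
unaffected = [n for n in range(1, 64) if other_mode_number(n) == n]

if __name__ == "__main__":
    print(unaffected)
    # [1, 4, 7, 10, 13, 16, 19, 23, 26, 29, 32, 35, 38, 41, 45, 48,
    #  51, 54, 57, 60, 63]
    print(other_mode_number(2))   # 22
```

Only the 21 E1s whose TUG-3 and TU-12 indices are equal map to themselves, which is why one third of the links keep working when the modes are mismatched.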

4.6 Analyzing RNC Fault Coverage


How to analyze the RNC fault coverage is described in Table 4-1.

Table 4-1 RNC Fault Coverage Analysis

Fault Coverage | Possible Causes | Recommended Solutions
All CS and PS services in the whole network are blocked. | Power failure; CN-side failure | Check the power supply. Check the CN side.
All CS services in the whole network are blocked. | CN-side failure | Check the CN side.
All PS services in the whole network are blocked. | CN-side failure | Check the CN side.
All CS and PS services in a single RNC are blocked. | APBE fault; incorrect configurations corresponding to the office at the CN side | Check the board and replace it if necessary. Modify office direction configurations.
All CS services in a single RNC are blocked. | APBE fault; SS7 link fault | Check the board and replace it if necessary. Check SS7 configurations.
All PS services in a single RNC are blocked. | APBE fault; SS7 link fault | Check the board and replace it if necessary. Check SS7 configurations.
All services of a resource shelf are blocked. | UIM fault; GLI fiber fault; CHUB connection fault | Check the UIM and replace it if necessary. Check the GLI fiber and the GLI port. Check the CHUB connection and the CHUB port.
All services of a CMP module are blocked. | RCB fault | Switch over the RCB. Replace the failed RCB.
All services of an IMA are blocked. | IMA fault; media plane fault | Check the IMA and replace it if necessary. Take further measures as required according to the media plane test.
All services of an SDTA2/SDTI are blocked. | SDTA2/SDTI fault; fiber channel fault | Check the SDTA2/SDTI and replace it if necessary. Check the fiber channel to which the SDTA2/SDTI corresponds.
All services of a DTA/DTI are blocked. | DTA/DTI fault; RDTA fault | Check the DTA/DTI and replace it if necessary. Check the RDTA and replace it if necessary.
All services of a Node B are blocked. | IMA group fault; Node B fault | Check the IMA group and analyze the symptoms. Check the Node B.
All services of a cell are blocked. | Incorrect cell configurations; manual blocking | Check cell configurations. Unblock the cell.

4.7 Handling RNC Service Abnormality and Interruption


4.7.1 Handling Iu Interface Faults
Iu interface faults mainly include: a) the SS7 cannot reach the Iu interface; b) services cannot be connected; c) calls cannot get through; d) downloading or browsing cannot be activated; and e) the signalling point unreachable alarm occurs in the background. Iu interface faults are basically signalling link faults, which are usually caused by incorrect data modifications, board failures, or transmission link abnormalities.
How to analyze Iu interface faults is described in Figure 4-3.

Figure 4-3 Analyzing Iu Interface Faults

1. Many calls cannot get through, or the Internet cannot be accessed and the terminal cannot be activated.
2. Check the EMS alarm management interface to see whether there is any office direction unreachable alarm and whether the alarm occurs in all RNCs. If it occurs in all RNCs, the fault lies in the CN. If it occurs in only one or several RNCs, the fault is possibly caused by RNC-side problems.
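The second check above is a comparison between the set of RNCs that report the alarm and the set of all RNCs. A minimal sketch, assuming the RNC names are collected manually from the EMS alarm view; the function name and the RNC identifiers are hypothetical.

```python
def locate_iu_fault(alarming_rncs, all_rncs):
    """Suggest which side to check first, based on how widely the
    'office direction unreachable' alarm is spread across RNCs."""
    alarming = set(alarming_rncs)
    everyone = set(all_rncs)
    if not alarming:
        return "no alarm: continue with other checks"
    if alarming >= everyone:
        # Every RNC reports the alarm.
        return "check the CN side"
    return "check the RNC side of: " + ", ".join(sorted(alarming))


if __name__ == "__main__":
    network = ["RNC1", "RNC2", "RNC3"]
    print(locate_iu_fault(["RNC1", "RNC2", "RNC3"], network))
    print(locate_iu_fault(["RNC2"], network))
```

The result is only a starting point: an alarm limited to a few RNCs is described in this section as possibly, not certainly, an RNC-side problem.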

Recommended Solutions
1. Check whether all tables were synchronized for a data modification of the whole network or of a single RNC. If so, restore the data.
2. Check whether alarms about inaccessible calls or unreachable signalling occur in all RNCs. If so, check the CN side.
3. Check whether SSCOP links are frequently established and disconnected (the messages are BGN and END). Make sure that the PVC bandwidth and the PVC type are identical on both sides of the Iu interface.
4. Check the optical interface indicator of the RNC interface board. If the SD indicator is off, check whether the fiber connection is correct. If it is, reset or replace the APBE and the interface board. If the SD indicator is still off, check the CN side.
5. If the SD indicator is on, replace the interface board. If the problem persists, check the CN side.

4.7.2 Handling Clock System Faults


Fault Analysis
1. The clock reference lost alarm occurs on the EMS alarm management interface. The
indicator on the clock board is not in the tracing or holdover status.
2. The 16M clock lost alarm or the clock drive lost alarm occurs on the UIM/interface
board of the resource shelf. The UIM alarm indicator is always on.
How to analyze the clock system faults is described in Figure 4-4.

Figure 4-4 Analyzing Clock System Faults

Recommended Solutions
1. If the clock reference lost alarm occurs on the clock board, check whether the clock output connection on the RGIM is correct and whether the connection is loose.
2. Perform an active/standby changeover of the interface board or the optical interface.
3. If the alarm persists after step 2, perform an active/standby changeover of the CLK clock board.

4. If the alarm persists after the above three steps, replace the rear board of the CLK clock board and replace the RGIM.
5. If the resource shelf reports the 16M clock driving alarm, take the following measures:
a. Check whether the clock cables on the rear board of the UIM are connected correctly and whether any connection is loose.
b. Perform an active/standby changeover of the UIM, so that the standby UIM provides the driving clock.
c. Replace the UIM, or replace the board whose driving clock fails.

4.7.3 Handling Call Failures


Call failures can have many causes, including faults in RCB/RSB control plane and signalling processing, in the Iu interface board, and at the CN side. Identify the fault coverage of call failures from subscriber complaints, on-site tests, and signalling tracing. If the CS service cannot be connected in only a few cells, the fault is possibly local. If no call can get through in any cell of the Node Bs in an RNC, the Iu interface has probably failed, possibly because of an RNC interface board fault or a CN processing fault. If the CS service cannot be processed in only a single cell, fix it through routine maintenance and troubleshooting.
How to analyze call failures is described in Figure 4-5.

Figure 4-5 Analyzing Call Failures

1. If no call can get through in many RNCs or throughout the network, the problem lies in the CN side. If the failure occurs only in some areas, the problem lies in the RNC.
2. Check the SS7 link and the AAL2 channel (Iu office direction) through the background dynamic management interface to see whether they are in normal condition.
3. Check whether the APBE operates normally. Check the background alarm management interface to see whether there is any APBE fault alarm.
4. Check the background alarm management interface to see whether there are many alarms about failed common channels or out-of-service cells.
5. Check whether the cells in which no call can get through belong to the same interface board or RCP.

6. Check whether call failures occur regularly. If one call fails out of every several calls, one of the AAL2 channels at the Iu interface has possibly failed.

Recommended Solutions
1. Check whether the RNC data configuration was modified before the failure occurred. If so, restore the configuration by importing the backup data.
2. Check the SS7 link. If it is abnormal, handle it by following the criteria for analyzing RNC fault coverage.
3. Reset or replace the interface board.
4. If step 3 does not work, perform an active/standby changeover between module No.3 and module No.4, so that the active module becomes the standby board.
5. Reset the interface board to which the failed cell belongs.

4.7.4 Handling Mute Calls


Fault Description
Unilateral (one-way) or voiceless conversations occur during speech calls. These faults can be caused by a failure in the UE, air interface, Node B, RNC user plane, or CN. In a unilateral conversation, data packets are not transmitted correctly between the calling party and the called party, so only one party can hear the other. The problem is difficult to find because many network elements are involved. Generally, such a problem can be located by two means: checking statistics, or making a CS loopback test.
How to analyze mute calls is described in Figure 4-6.

Figure 4-6 Analyzing Mute Calls

Fault Analysis
1. When either party or both parties cannot be heard in a speech call, replace the UE first, and then make a test call in the same environment. If the fault no longer occurs, the problem probably lies in the UE.

2. If unilateral conversations still occur after repeated tests with UEs of different brands, the problem possibly lies in the system.
3. Use two UEs to make a test call, and run an uplink loopback test and a downlink loopback test on the calling party or the called party in the signalling tracing system. If you can hear your own voice from the calling UE during the uplink loopback test, there is no problem between the UE and the RNC, and the problem possibly lies in the interface board or the CN side. If not, the problem possibly lies in the user plane or the Iub interface.
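The outcome of the uplink loopback test splits the speech path into two halves. The sketch below records that decision as stated in step 3; the function name is hypothetical, and the returned segments are suspects to examine, not a confirmed diagnosis.

```python
def suspects_after_uplink_loopback(own_voice_heard: bool):
    """Return the segments to examine after the uplink loopback test."""
    if own_voice_heard:
        # The path from the UE to the RNC carries voice correctly.
        return ["interface board", "CN side"]
    return ["RNC user plane", "Iub interface"]


if __name__ == "__main__":
    print(suspects_after_uplink_loopback(True))
    print(suspects_after_uplink_loopback(False))
```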

Recommended Solutions
1. Check whether a global data modification was made before the failure occurred. If so, restore the pre-modification data.
2. Replace the UE. If the failure no longer occurs, the problem lies in the UE. Report it to the UE maker for a solution.
3. Reset the APBE (Iu interface board).
4. If the fault persists after step 3, reset the RUB on which the services are borne. To find the RUB, enter the command UcpmcGetInstNo IMSI in the RDS to get the instance number, and then enter the command UcpmcShow InstNo, 3 (InstNo is the instance number) to find the slot of the RUB corresponding to that instance number.
5. Reset the IMA/APBI/DTA to which the failed cell belongs.
6. If the fault still exists, reset
7. If the problem remains after all these steps, contact personnel at the CN side for
troubleshooting.

4.7.5 Handling Download and Webpage Access Failures after


Activating PS Services
Fault Analysis
1. When a data card or a mobile phone processes PS services, it cannot open webpages
or download data through FTP after the PS service is activated.
Through the signalling tracing system, it is found that the signalling service can run
correctly. No webpage can be accessed through the UE. There is no alarm on the
EMS alarm management interface. If the webpage access failure occurs in all cells,
the problem possibly lies in the Iu-interface user plane. If the failure only occurs in
several cells, the problem possibly lies in the poor quality of the air interface. It is
recommended to handle it by following the instructions in troubleshooting manuals.

2. Make a packet transmission test to the UE by using the tool in the signalling tracing
system. If the UE downloads data at a normal rate during the test, it means that there
is no problem from the UE to the RNC user plane.
3. Make a ping packet test. If no problem is found during the test, the problem possibly
lies in the Iu interface, or the IP packet limitation made at the CE/CN side.

4. Replace the UE. If the download and webpage access failures no longer occur,
the problem lies in the UE. Contact the UE maker for a solution.
How to analyze download and webpage access failures after activating PS services is
described in Figure 4-7.

Figure 4-7 Analyzing Download and Webpage Access Failures after Activating PS
Services
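
The analysis steps above can be summarized as a rough sketch. The function and argument names are hypothetical; the returned texts only restate the conclusions of steps 1 to 3 and are not an output of any ZTE tool.

```python
def ps_failure_findings(fails_in_all_cells, packet_test_ok, ping_test_ok):
    """Collect the conclusions of analysis steps 1 to 3 for a PS access failure."""
    findings = []
    # Step 1: the scope of the failure separates Iu problems from radio problems.
    if fails_in_all_cells:
        findings.append("suspect: Iu-interface user plane")
    else:
        findings.append("suspect: air interface quality in the affected cells")
    # Step 2: a normal packet transmission test clears the UE-to-RNC user plane.
    if packet_test_ok:
        findings.append("cleared: UE to RNC user plane")
    # Step 3: a clean ping test points beyond the RNC.
    if ping_test_ok:
        findings.append("suspect: Iu interface or IP packet limitation at the CE/CN side")
    return findings
```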

Recommended Solutions
1. Check to see if the data configuration is modified before the failure occurs. If so,
recover the configuration by importing the backup data.
2. Reset the GIPI, which segments and regroups packets. If the failure still exists, replace
the interface board.
3. If the failure remains, conduct an active/standby changeover to the UIM.
4. If the changeover does not work, reset the RUB where the PS service is established.
5. If the failure remains after all these resets, ask personnel at the CE and the CN sides
for troubleshooting to see if the problem is caused by the MTU packet limitation.

4.8 Handling Node B Service Abnormality and Interruption
4.8.1 Handling Large-Scale Cell Outages
Cell outages are mainly caused by NCP link or CCP link disconnections, SCTP
disconnections, and common channel establishment failures, which then result in
cell establishment failures or repeated deletions and creations of common channels.
Generally, the alarms about NCP/CCP/SCTP link disconnections are caused by
transmission- and signalling processing-related problems, which should be analyzed
through such information as the location where the alarm is generated and the module to
which the cell belongs.

Fault Analysis
1. Check the EMS to see if large-scale cell outages occur to all RNCs, and if all
transmission-related boards generate alarms. If so, the problem probably lies in
transmission.
2. Check the alarms on the EMS alarm management interface. If the interface board
generates many E1/IMA/SCTP link alarms, the cell outage is possibly caused by
transmission-related problems. For IP transmission, check to see if there is any
conflict in terms of MAC address or IP address.
3. If there are cell outage alarms but no interface board transmission failure alarms in the
EMS system, the problem may be caused by RCP failure.
4. If cell outages only occur to several interface boards, the problem possibly lies in the
Iub interface board.
How to analyze large-scale cell outages is described in Figure 4-8.

Figure 4-8 Analyzing Large-Scale Cell Outages

Recommended Solutions
1. Check to see if a global parameter modification is made before the failure occurs. If
so, recover the configuration by importing the backup data.
2. If all out-of-service cells belong to the same module and the transmission interface
board generates no alarms, conduct an active/standby changeover to the home RCB
module.
3. If all out-of-service cells belong to the same resource shelf and the transmission
interface board generates no alarms, conduct an active/standby changeover to the
UIMU/GUIM/GUIM2.
4. If all cells that belong to an interface board are out of service, reset or replace the
APBE/SDTA.
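
The order of the recommended solutions can be sketched as follows. This is a minimal illustration under the assumption that the four conditions are checked in the order listed; the function and argument names are hypothetical.

```python
def outage_action(global_params_changed, transmission_alarms,
                  same_module, same_shelf, whole_interface_board):
    """Return the first matching recovery action, in the order of the list above."""
    if global_params_changed:
        return "restore the configuration from the backup data"
    if same_module and not transmission_alarms:
        return "active/standby changeover of the home RCB module"
    if same_shelf and not transmission_alarms:
        return "active/standby changeover of the UIMU/GUIM/GUIM2"
    if whole_interface_board:
        return "reset or replace the APBE/SDTA"
    return "no rule matched, continue the fault analysis"
```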

4.8.2 Handling Absence of Cell Signals and Low Success Rate of RRC Establishments
The absence of cell signals is mainly caused by failures arising from common transmission
channel establishments, system message broadcasts, and UE-dedicated radio link (on
Node B) releases, or by transmission bandwidth resource leakage. Such problems are
analyzed by checking fault notifications, QoS alarms, the success rate of RRC establishments,
and user complaints, or by making tests.
How to analyze the absence of cell signals and low success rate of RRC establishments
is described in Figure 4-9.

Figure 4-9 Analyzing Absence of Cell Signals and Low Success Rate of RRC
Establishments

Fault Analysis
1. Check the EMS interface to see if there are QoS alarms about the success rate of
RRC establishments. If so, it means that the current common transmission channels
are established successfully and the UE has initiated RRC establishments.
2. Check the EMS alarm management interface to see if there are notifications about
system message update failure. If so, it means that broadcast messages cannot be
delivered and the UE cannot access the network correctly due to the update failure.
3. Connect an LMT to the site to see if the BCH packet transmission increases normally.
If not, it means that the Node B fails to deliver broadcast messages.
4. Conduct ALCAP and FP signalling tracing through RNC or LMT signalling tracing
to see if the transmission allocation and the FP synchronization fail during RRC
establishments.

Recommended Solutions
1. Check to see if a global parameter modification is made before the failure occurs. If
so, recover the configuration by importing the backup data.
2. If there are notifications about system message update failure, modify the SIB1 value
of the cell and trigger the system message once to refresh the updating process.
3. If the Node B fails to deliver broadcasts, or if the transmission allocation and FP
synchronization fail, block and unblock the cell.
4. If none of these steps works, reset the Node B.

4.8.3 Handling Service Interruption Caused by Radio Cell Abnormality
Use the following methods to determine whether a user service interruption is caused by
an abnormal radio cell:

1. On OMC unified UMS client, check whether the cell establishment is normal.
2. Through Node B LMT, check whether the cell establishment is normal.
3. The abnormal activities take place in one or more cells, and all the activities in this
cell are abnormal or have a quite low success rate, while radio processes originated
in other cells run normally.

Proposals for handling radio cell establishment abnormality:

1. A ZXWR RNC/CN failure is unlikely if the radio processes originated
in other cells run normally.
2. Check whether the cell is in block status.
3. Reset the cell.
4. Wait for the system to recover, and then check whether the fault still exists.
5. Check whether the Node B transceiver antenna is properly connected and whether the power
amplifier works normally.

4.8.4 Handling Service Interruption Caused by Radio Configuration Data Modification Error
Modifying radio configuration data may not cause service abnormality immediately.
As the number of users increases (for example, at traffic peak time) and user service
types change, various radio problems can arise, such as a low access
rate of user services and an unstable service rate.

Proposals for handling such faults:


1. Log on to EMS client, click Tool > Log Management, and check ZXWR RNC log
management to see whether there is radio resource configuration data modification.
2. If there is a radio resource configuration data modification, back up the current ZXWR RNC
configuration data.
3. Recover the radio configuration data.
4. Wait for the system to recover.

Caution!
The radio resource data is determined through network planning and optimization, based on
such factors as the onsite call model and onsite landforms, so do not modify it. If
parameters must be adjusted, back up the data properly beforehand.

4.9 Handling OMM/M31 Abnormality and Interruption


4.9.1 Handling OMM and NetNumen U31 Abnormality and Interruption
Fault Description
Generally, the symptom is that the Client cannot log in to the Server.
How to analyze OMM and NetNumen U31 abnormality and interruption is described in
Figure 4-10.

Figure 4-10 Analyzing OMM and NetNumen U31 Abnormality and Interruption

Handling Steps
1. Check to see if the communication between the Client and the Server is normal.
a. Ping the IP addresses of the Client and the Server to see if the communication is
normal.
If the IP address can be pinged, but the packet loss rate is high and the
network is intermittent, check whether another computer uses the same
IP address, whether the DHCP function is enabled without authorization on any
computer in the internal network, and whether the physical connections of all NEs are correct.
b. If the IP address cannot be pinged, check the physical connection
between the Client and the Server for abnormality.
If the Server and the Client are not in the same subnetwork, use the command
netstat -r to check if the Server and the Client can communicate through the
router. If not, add a route by running this command: route add xx.xx.xx.xx (network
IP address) -netmask xx.xx.xx.xx (subnet mask) xx.xx.xx.xx (gateway IP address); for
example:
#route add 192.168.0.0 -netmask 255.255.255.0 10.11.201.254
This command adds a route to the 192.168.0.0 network segment, with the
gateway IP address being 10.11.201.254. Routes added in this way are lost
when the operating system is restarted. Therefore, write the route configuration
command in the startup script; for example, at the
end of the /etc/rc3 file.
c. Check to see if the router is configured correctly.
2. Use another Client to log in to the Server. If the login succeeds, the problem lies in the
Client; if the login fails, the problem lies in the Server.
3. Make sure that there is enough space in the system disk and the disk where the Client
software is installed. If the space is not enough, delete unnecessary files to make
more space.
4. Check to see if the Client is infected by a virus and if the operating system runs normally.
a. Use the latest virus definitions to remove the virus.

b. If the operating system does not run normally, reinstall it or find another computer
to install the Client software.
5. Check to see if the Server hard disk is fully used.
a. Run the command df -k to check if any hard disk partition in the Server is fully
occupied.
b. If so, use the command rm to delete unnecessary files, such as redundant log
files, or use the command rm -r to delete useless folders to make more room.
For example, to delete a log file and a folder under the directory $OMCHOME/log, run the
following commands:

$rm 123.txt
$rm -r 1234
6. Check to see if the Server process is normal.

a. Check to see if the Server process is running.

Run the command ps -u womcr to check the real-time OMM process. For
example, the following shows a normal OMM process:
bash-3.00$ ps -u gomcr
PID TTY TIME CMD
5844 ? 00:00:00 run-linux.sh
5851 ? 00:00:00 ftpserver-linux
5855 ? 00:18:44 java
5872 ? 00:00:00 java

b. Check to see if the Server logs are still being printed.


The NetNumen U31 server logs are saved under the directory $OMCHOME/log.
To check the log output, run the following command:
bash-3.00$ tail -f server-20090112-0935-00020.log
2009-01-12 15:16:03,108 INFO [class
com.zte.ums.zxgomcr.emf.fm.bsc.MsgsDispatcher]
Receive msg: -2
2009-01-12 15:16:03,108 INFO [class
com.zte.ums.zxgomcr.emf.fm.bsc.MsgsDispatcher] Receive link
break message.

c. If there is no OMM log output, use the command ps -ef|grep java to check
whether any java process of the OMM is running. Run the command kill -9 to
terminate the java process. For example, if the java process number is 1209, run the
following command to terminate it:

kill -9 1209

Note:
There may be more than one java process in the system. Kill them all.

7. Check to see if the database runs normally.


a. Check to see if the database has been started.
Run the command ps -ef|grep oracle to check the database process. The output,
for example, is shown as follows:
oracle 1273 1 0 Jan 15 ? 0:00 ora_q000_womcr
oracle 1275 1 0 Jan 15 ? 0:01 ora_q001_womcr
oracle 1338 1 0 Jan 15 ? 0:07 oraclewomcr (LOCAL=NO)
iomcr 13141 13127 0 21:11:39 pts/3 0:00 grep oracle
oracle 13134 1 0 21:10:42 ? 0:00 ora_j000_womcr
oracle 12274 1 0 16:44:02 ? 0:47 oraclewomcr (LOCAL=NO)
b. Check to see if the database can be connected. Run the command sqlplus /nolog
as an ORACLE user, and then enter the command connect sys/oracle as
sysdba in sql command mode to connect to the database. If Connected is displayed,
the connection is successful, for example:
$ sqlplus /nolog
SQL*Plus: Release 10.2.0.1.0 - Production on Thu Jan 17
16:42:43 2008
Copyright (c) 1982, 2005, Oracle. All rights reserved.
SQL> connect sys/oracle as sysdba
Connected.
SQL>
c. If the database runs normally, restart the EMS Server programme only.
d. If the database does not run normally, quit the database.
e. Restart the database.

Connect to the database, and then shut down ORACLE by running the command
shutdown immediate. Connect to the database again and run the command startup
to start the database.

Note:
It is required to log in to sqlplus as an ORACLE user.

Shut down ORACLE
SQL> connect sys/oracle as sysdba
Connected.
SQL>shutdown immediate
Start ORACLE
SQL> connect sys/oracle as sysdba
Connected.
SQL>startup;
f. Restart the iOMCR.
8. If all steps above do not work, restart the Server.
a. Shut down and then restart the Server.
It is recommended to quit the EMS programme process and the database before
the restart.
The command to restart the Server is (it is required to shut down the Server as a
root user):
#/usr/sbin/shutdown -y -g 0 -i 6
or #init 6.
b. Start the database (optional; the database may start automatically).
c. Start the EMS programme.
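
Two of the checks in the handling steps above (the subnetwork check in step 1 and the disk check in step 5) can be sketched in Python. This is a minimal sketch, not a ZTE utility: the function names are hypothetical, and the df -k column layout (capacity in the fifth column, mount point in the sixth) is an assumption that should be verified against the actual Server output.

```python
import ipaddress

def same_subnet(client_ip, server_ip, netmask):
    """Return True if the Client and the Server share one subnetwork."""
    network = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(server_ip) in network

def full_partitions(df_output, threshold=100):
    """Return mount points whose capacity column is at or above threshold percent.

    Assumes df -k style rows: filesystem, kbytes, used, avail, capacity, mount point.
    """
    full = []
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue  # ignore rows that do not match the assumed layout
        if int(fields[4].rstrip("%")) >= threshold:
            full.append(fields[5])
    return full
```

If `same_subnet` returns False, a route through the gateway is needed as described in step 1b. A non-empty result from `full_partitions` indicates partitions to clean up as described in step 5b.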

4.9.2 Handling OMM and NetNumen U31 Performance Data Delay and Reporting Failure
Fault Description
The typical symptom is that performance data cannot be queried, or the performance data
delay alarm occurs in the system.
How to analyze OMM and NetNumen U31 performance data delay and reporting failure is
described in Figure 4-11.

Figure 4-11 Analyzing OMM and NetNumen U31 Performance Data Delay and Reporting
Failure

Fault Analysis
1. Process of collecting the OMM performance data
Generally, performance data is collected in this order: OMP -> SBCX; EMS <-(ftp)-
OMM <-(ftp)- SBCX. The following steps describe how performance data is collected.
a. The RNC collects performance data according to the measurement tasks created
in OMM.
b. The RNC uploads the collected data to the log server.
c. The OMM server takes the performance data (.dat) from the log server through
FTP.
d. The OMM resolves the data files, and then saves them into the database.
e. The OMM creates EMS data files under its FTP directory according to the
measurement tasks created by the EMS and notifies the EMS to take the files.
f. The EMS server takes the performance data (.xml) from the OMM server through
FTP.
g. The EMS resolves the data files, and then saves them into the database.
2. Process of checking the OMM performance data
A fault in any of the steps above can lead to a data query failure.
Check the data by selectively following the steps below as required.
a. The RNC collects performance data according to the measurement tasks created
in OMM.

On the performance management interface, check if this type of measurement
task is created. If not, create the measurement task and deliver it to the NE.

Note:
Measurement tasks are classified into basic measurement and general
measurement. The basic measurement is to collect all data by default and the
user does not have to create measurement tasks. For the basic measurement,
the name of the measurement task has a suffix "(Base)".

b. The RNC uploads the collected data to the log server.


i. Check to see if the network connection from the ROMP to the log server is
correct (network cable, IP).
ii. Check to see if the LogService programme on the log server runs normally.
iii. Check to see if the home directory, username, and password for the FTP
service are correctly set on the log server. The home directory is IDE0. The
username and the password are RNCV3PM and RNCV3PM respectively.
iv. Check to see if the performance data file is created under the home FTP
directory \Rnc\Rms\Pm\[Measurement Object Type]\ on the log
server, and if the file size is normal.
c. The OMM server takes the performance data (.dat) from the log server through
FTP.
i. On the performance management interface, check if the IP address of the log
server is configured correctly.
ii. Check to see if the network connection from the OMM server to the log server
is correct (network cable, IP).
iii. Check to see if the FTP service from the OMM server to the log server is
normal. The FTP username and password are RNCV3PM and RNCV3PM
respectively. If the FTP fails, check the FTP setting in step 2 to see if it is
correct.
iv. Check to see if the OMM server has taken data files from the log server. Go
to the following directory to check the last time when the data files were taken
(if the latest file is missing, check the steps above for any problem):
\ums-svr\zxwomc\RNS\RNC\zxwomc-pm-emf-rnc.par\[NE No.]\DATA\FILE\[Measurement Object Type]\bak
\ums-svr\tmp\RNCPMDATABAK\[NE No.]\DATA\[Measurement Object Type]\bak
(This path is used for 3.17.300k, 3.17.310d, or later versions.)
d. The OMM resolves the data files, and then saves them into the database.
i. Check to see if the performance table space (the username is RNS_PM) is
insufficient, and if automatic extension is set.

ii. On the OMM NM interface, click System Management > Database
Management > View Data Resource to check the usage of the database.
For more details, use the corresponding Oracle tools.
iii. Check to see if there is any data file that is too large to be saved into the
database. Output such as the following can be found in the log files under
the directory ums-svr\log.
Caused by:
com.zte.ums.uep.api.pal.pm.common.PmException:
callSaveProc SQLException java.sql.SQLException:
ORACLE:ORA-06502: PL/SQL: numeric or value error:
number precision too large
ORA-06512: at "RNS_PM.PROC_RNC_RNC_RAB", line 1046
ORA-06512: at line 1
Caused by: java.sql.SQLException: ORACLE:ORA-06502:
PL/SQL: numeric or value error: number precision too
large
ORA-06512: at "RNS_PM.PROC_RNC_RNC_RAB", line 1046
ORA-06512: at line 1

Note:
When this happens, send the logs to the troubleshooting team promptly.

iv. Check to see if there is repeated data.
v. Check to see if there is any file resolution failure.
If a file cannot be resolved, it will be saved under the following directory:
\ums-svr\zxwomc\RNS\RNC\zxwomc-pm-emf-rnc.par\[NE No.]\DATA\FILE\[Measurement Object Type]\err
\ums-svr\tmp\RNCPMDATABAK\[NE No.]\DATA\[Measurement Object Type]\err
(This path is used for 3.17.300k, 3.17.310d, or later versions.)

Note:
When this happens, send the file that cannot be resolved to the
troubleshooting team.

vi. If you cannot query the data about some radio objects, such as cells, check to
see if they are set to the debugging status.
If you cannot query the data about a cell, a Node B, or a cell pair, check the
corresponding Node B to see if it is set to the debugging status. If so, the data
of the cells, Node Bs, and cell pairs that correspond to the Node B will be
saved in the debugging table.
To check the debugging data, enter the engineering mode (by pressing
Ctrl+Shift+P) on the Performance Management interface in the OMM NM
system. If the site is set to the debugging status, check to see if it is necessary
to modify the status.
e. The OMM creates EMS data files under its FTP directory according to the
measurement tasks created by the EMS and notifies the EMS to take the files.
Check to see if the OMM creates EMS data files on time.
Open the logs created during the corresponding period under the directory
\ums-svr\log\, and then search for the keyword .xml in the output. If there is
output such as the following, it means the OMM has created files and notified
the EMS.
INFO [SocketTransport] send a notify:10100035:Code:10100035
,sequenceId:E5048513-C97F-9928-CAFC-597478C908CD
destUrl:socket://172.22.96.3:21125/ Object0:\tmp\ftp\nmi\pm
\WRNC\30117\PM200903101447+080024A20090310.1430+0800-200903
10.1445+0800_30117_103_-_1.xml
INFO [com.zte.ums.csp.nmi.pm.adapter.EmsPmAdapterNotifMea
s]fileName= \tmp\ftp\nmi\pm\WRNC\30117\PM200903101447+08002
4A20090310.1430+0800-20090310.1445+0800_30117_103_-_1.xml
f. The EMS server takes the performance data (.xml) from the OMM server through
FTP.
g. The EMS resolves the data files, and then saves them into the database.
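
The log check in step e can be sketched as a small script that lists the .xml files announced in the OMM logs. This is a minimal sketch under two assumptions: each log entry occupies one line in the real log file (the sample above is wrapped by the page layout), and the file path follows the Object0: field as in that sample. The function name is hypothetical.

```python
import re

# Assumed entry layout, taken from the sample log: "send a notify:... Object0:<path>.xml"
NOTIFY_XML = re.compile(r"send a notify:.*Object0:(\S+\.xml)")

def notified_xml_files(log_lines):
    """Return the .xml paths announced in 'send a notify' log entries."""
    files = []
    for line in log_lines:
        match = NOTIFY_XML.search(line)
        if match:
            files.append(match.group(1))
    return files
```

An empty result for the period in question suggests that the OMM did not create the EMS data files or did not notify the EMS on time.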

Typical Solutions
1. Typical case 1: The parameters for the FTP service are modified, which leads to a
failure to save the performance data into the database.
Symptoms: No result is displayed during the query of the latest performance statistics
at the NM Client.

a. Check the network connection of the OMC server and the log server. The
connection from the OMC server to the log server, as well as the connection from
the log server to the NE ROMP, is correct.

b. Log in to the database on the OMC server by using the username rns_pm. Run the
corresponding command to check the last time when performance data was saved.

c. Check the available space of the hard disk where Oracle is installed. There is
enough space.
d. Check to see if the OMC has taken the data files.
e. Check to see if the log server has created files.
f. Check to see if the FTP service of the log server is configured correctly.
After these steps, it can be concluded that the OMC server fails to take performance
statistics from the log server. The OMC server takes performance data through the
FTP service of the log server, so the troubleshooting should focus on the settings of
the FTP service.
The home directory for the anonymous user of the FTP service on the log server is
modified, and this is the reason for the problem.
2. Typical case 2: The FTP connection of the OMC fails frequently.
Symptoms: There are many data delay alarms on the EMS and these alarms mostly
last for less than 30 minutes. The EMS logs show that the OMM fails to notify the EMS
to take files on time.
First, check the network connection of the OMC server and the log server. The FTP
connection from the OMC server to the log server and from the log server to the NE
ROMP is incorrect. Restart the FTP programme and the problem is solved.

4.9.3 Handling Database Access Failure


Fault Description
The database is inaccessible after a sudden power-off.
This is because the database needs to do some work to recover after a sudden power-off.
Generally, the work will be done automatically during the database startup.

Recommended Solutions
Recover the database manually by running proper commands.
Use sqlplus to log in to the system as sysdba. Run the command alter database open and
check whether it is executed successfully. If so, the database is
back to normal. If not, run the command recover database. If the database is accessible
after this command is run, the database has recovered.

4.9.4 Handling Oversize Rollback Segment Caused by Mass Data Deletion
Fault Description
After some data is deleted from the database through the DELETE command, a large amount
of hard disk space is occupied and there is not enough free space.

This is because the data deleted through the DELETE command goes to the rollback
segment first and is removed from the rollback segment after the transaction is
committed, but the space occupied by the UNDO table space is not released.

Recommended Solutions
Log in to the database as the SYS user with the SYSDBA privilege.
Run the following commands:
SQLPLUS /NOLOG
CONN SYS/ORACLE@WOMC AS SYSDBA
Where ORACLE is the password and WOMC is the SID of the database.
1. Create a new UNDO table space by running the following command:
CREATE UNDO TABLESPACE "UNDOTBS3"
DATAFILE 'E:\ORACLE\ORADATA\WOMC\UNDOTBS03.DBF' SIZE 200M
REUSE AUTOEXTEND ON NEXT 5120K MAXSIZE 32767M
2. Switch the current UNDO table space to the new UNDO table space by running the
following command:
alter system set undo_tablespace=undotbs3 scope=both
3. Run the following command and wait until all the UNDO segments in the previous
UNDO table space are offline.
select usn,xacts,status from v$rollstat
Check to see if there is the value PENDING OFFLINE in the STATUS field. If so,
wait for a moment and run the command above again; if not, go to the next step.
4. Run the following command to delete the previous UNDO table space.
drop tablespace undotbs1 including contents and datafiles
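
The waiting condition in step 3 can be sketched as a check on the rows returned by the v$rollstat query. This is a minimal sketch: how the rows are fetched is left open, and the function name and the tuple layout (usn, xacts, status) are assumptions that mirror the column list of the query.

```python
def old_undo_drained(rows):
    """Return True when no UNDO segment is still in the PENDING OFFLINE status.

    rows: iterable of (usn, xacts, status) tuples from the query in step 3.
    """
    return all(status.strip().upper() != "PENDING OFFLINE"
               for _usn, _xacts, status in rows)
```

Only when this returns True should the previous UNDO table space be dropped as described in step 4.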

4.9.5 Handling Free Disk Space Insufficiency Caused by Improper Partition
Fault Description
Performance data cannot be saved into the database. There are alarms about shortage
of free disk space.

This is because the hard disk is unreasonably partitioned and planned, so that the
database is filled with performance data or alarms, especially after the database extends
itself automatically.

Recommended Solutions
Follow the steps below to move a data file off the disk that does not have enough free
space.
1. Take the table space to which the data file belongs offline by running the command
alter tablespace (table space name) offline.
2. Cut the data file and paste it into the new directory (the file can be renamed).
3. Run the following command to reconnect the table space and the data file: alter
tablespace (table space name) rename datafile 'the previous absolute path of the data file +
the previous file name' to 'the new absolute path of the data file + the new file name'.
4. Bring the table space online by running this command: alter tablespace (table space
name) online
The data file is moved.
The following command can be used to check the location of a table space data file.
select file_name from dba_data_files where tablespace_name='table space name (capital letters)'.
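
The SQL statements used in steps 1, 3, and 4 can be generated with a small helper, which avoids quoting mistakes in long paths. This is a minimal sketch: the function name is hypothetical, and the table space name and paths in the usage note are made-up examples. The file itself still has to be moved at the operating system level between the first and the second statement (step 2).

```python
def move_datafile_sql(tablespace, old_path, new_path):
    """Build the statements for taking a table space offline, renaming its
    data file, and bringing it online again."""
    def quote(path):
        # Single quotes inside a path are doubled, as SQL string literals require.
        return "'" + path.replace("'", "''") + "'"

    return [
        f"ALTER TABLESPACE {tablespace} OFFLINE;",
        # Move the file on disk (step 2) before running the next statement.
        f"ALTER TABLESPACE {tablespace} RENAME DATAFILE "
        f"{quote(old_path)} TO {quote(new_path)};",
        f"ALTER TABLESPACE {tablespace} ONLINE;",
    ]
```

For example, `move_datafile_sql("RNS_PM_DATA", r"E:\old\pm01.dbf", r"F:\new\pm01.dbf")` returns the three statements in the order in which they are to be run.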

4.10 Handling Overload


4.10.1 Handling MP CPU Overload
Fault Description
The MP CPU overload alarm occurs.

The performance statistics show that the average MP load is above 60%.
The fault is mainly caused by insufficient traffic planning, traffic bursts, and UE registration
bursts.

How to handle MP CPU overload is described in Figure 4-12.

Figure 4-12 Handling MP CPU Overload

Recommended Solutions
1. How to handle MP overload caused by increased traffic.
During the MP overload period, keep a close eye on the MP load. If the load is above
80%, block some cells manually to lower the load.
Modify the corresponding parameters when the MP load is relatively low.
Modify the access parameters to reduce the retransmissions of RRC connection
requests.
Modify the location update parameters to reduce the periodic location updates. Make
the modifications according to the MSC. The modified parameters must be lower than
the values set in the MSC.
If the RCP modules are not evenly loaded, modify the number of sites that belong to
these RCP modules.

2. How to handle MP overload not caused by increased traffic.


Check to see if the MP runs normally. Check the history alarms of the MP. If there is
any abnormality, conduct an active/standby MP changeover, or replace the MP.

In the rack diagram of status management, check the active/standby status of the MP
to see if the MP board is in an abnormal status. If so, click the MP board to perform
an active/standby changeover. Check to see if signalling tracing and RTV measurement
are enabled. If so, disable them.

Collect the MP-related logs and send them to the UMTS troubleshooting team.
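
The two load thresholds mentioned in this section (60% for the overload symptom and 80% for blocking cells) can be sketched as a simple classification. The function and constant names are hypothetical, and the returned texts only restate the guidance above.

```python
OVERLOAD_THRESHOLD = 60  # average MP load (%) described above as overload
BLOCK_THRESHOLD = 80     # load (%) above which some cells are blocked manually

def mp_load_action(load_percent):
    """Map an MP load value to the action suggested in this section."""
    if load_percent > BLOCK_THRESHOLD:
        return "block some cells manually"
    if load_percent > OVERLOAD_THRESHOLD:
        return "overloaded: watch the load closely"
    return "normal"
```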

4.11 Data Restoration


This section describes the importance and the method of data restoration.
1. On the EMS configuration management interface, select Configuration Management
> Data Restoration on the menu bar. The Data Restoration dialog box is displayed,
see Figure 4-13.

Figure 4-13 Data Restoration Dialog Box

2. From the Server drop-down list, select the backup configuration data file, and click
Restore.

Before handling ZXWR RNC emergency faults, back up the configuration data first. On
the one hand, fault recovery may involve configuration data modification, and the backup
makes it possible to restore the onsite status if the situation gets worse during the
recovery. On the other hand, the backup preserves first-hand information for ZTE's
maintenance and technical support personnel and the technicians at the home front, which
helps analyze and locate problems and improve the system performance.

Appendix A
Data Backup and Recovery
Table of Contents
Overview to Board Reset and Changeover
Influence of Reset and Changeover
Changeover Modes

A.1 Overview to Board Reset and Changeover


Board reset is required for handling some faults. A hot reset restarts the board while
it remains powered on, that is, it is a software reset. All resets mentioned below are
hot resets. To keep the system running normally and to avoid abnormal system running
when a board is reset due to software or hardware faults, configure backup for
important boards. The table below describes the board backup.

Table A-1 Backup Mode of Boards

Backup Type    Boards that Support the Backup Mode
1+1 backup     APBI, GIPI3, GIPI4, SDTA2, DTA, IMAB, DTI, EIPI, SDTI, RCB, ROMB, UIMC, GUIM, THUB, CLKG, ICM, ICMG, SBCX, RCB
1:1 backup     APBE, APBE2, APBI, SDTA, SDTA2, SDTI, SDTB, POSI
Load sharing   RUB, GLI, PSN
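
Table A-1 can also be held as a small lookup structure, for example when scripting a check before a board reset. The following is a minimal Python sketch. The names BACKUP_MODES and backup_modes_for are hypothetical and are not part of any ZTE tool; the board lists are copied from Table A-1.

```python
from typing import List

# Board backup modes transcribed from Table A-1.
# BACKUP_MODES and backup_modes_for are illustrative names, not a ZTE API.
BACKUP_MODES = {
    "1+1 backup": {
        "APBI", "GIPI3", "GIPI4", "SDTA2", "DTA", "IMAB", "DTI", "EIPI",
        "SDTI", "RCB", "ROMB", "UIMC", "GUIM", "THUB", "CLKG", "ICM",
        "ICMG", "SBCX",
    },
    "1:1 backup": {
        "APBE", "APBE2", "APBI", "SDTA", "SDTA2", "SDTI", "SDTB", "POSI",
    },
    "Load sharing": {"RUB", "GLI", "PSN"},
}


def backup_modes_for(board: str) -> List[str]:
    """Return every backup mode that Table A-1 lists for the board type."""
    name = board.strip().upper()
    return [mode for mode, boards in BACKUP_MODES.items() if name in boards]
```

Some boards appear in more than one row of the table, so the function returns a list. A board that the table does not mention yields an empty list, which a caller should treat as "unknown" rather than "no backup".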

A.2 Influence of Reset and Changeover


When boards such as ROMB, UIMC, UIMU, GIPI, CHUB, THUB, CLKG, ICM, GIPI4, POSI,
SDTA2, SDTT, GLI, and PSN are in M/S configuration, resetting them does not affect
services.
In active/standby board configuration, an RCB that acts as DMP has no backup. Resetting
the active RCB that acts as DMP may cause the loss of the services currently carried on
that board, but new services can access the system after a successful changeover.
Therefore, perform the RCB active/standby changeover when traffic is low. The RUB has no
backup, and the effect of resetting an RUB is similar to that of an RCB, that is, the
services carried on the board may be lost. Therefore, reset it with care.


A.3 Changeover Modes


1. Manual changeover at EMS
   - Log on to the NetNumen client, right-click the board to reset, and select the
     reset option from the pop-up menu.
   - Log on to the EMS client. In the rack diagram of status management, right-click
     the board to change over, which must be in the active status, and select
     Normal Switch from the pop-up menu.
2. Manual changeover at RNC
   - Reset the active board on the RNC rack.
   - Press the EXCH key on the board panel.
3. Automatic changeover due to RNC fault
   The RNC system triggers a changeover automatically after detecting a fault.



Appendix B
Emergency Maintenance Tables and Common Information Description
Table of Contents
Abnormality Record Table
Troubleshooting Record Table
Equipment Emergency Maintenance Requisite
Common Panel Indicators
Link/Cell Fault Confirmation Methods

B.1 Abnormality Record Table


The table below serves as an example only. Adapt it to the actual ZXWR RNC
maintenance items.

Table B-1 Abnormality Record Table

Equipment name:                                   Equipment No.:

Item                                              Abnormality description
Fault occurrence time
Fault occurrence scope
Program version of active ROMP
Version of connected Node B/CN
OMC version
Board whose ZXWR RNC indicator shows abnormality
Serious alarm item reported by OMC
Operation log information of OMC
Project information                               For example, the environment of the equipment room: temperature and humidity changes. Record it if any.
Abnormality query information                     The name of any resource (such as service cell ID or ATM No.) whose status is found abnormal through query, and the abnormal contents. Record it if any.
Information of signal tracing                     Obtain this information with the assistance of ZTE's remote technical support personnel after the emergency assistance request is raised.

B.2 Troubleshooting Record Table


Table B.1 Troubleshooting Record Table

Equipment name Equipment No.

Fault occurrence time (HH-MM-DD-YY) Fault elimination time (HH-MM-DD-YY)

Fault type:

Fault source:

Fault phenomena:


Solution:

Summary:

Signature of the attendant: Signature of the handling person:

B.3 Equipment Emergency Maintenance Requisite


Use this requisition to notify the ZTE technical support center by fax when the carrier
cannot resolve the fault on its own. Attach the on-site fault record table to the fax
so that ZTE personnel can locate and eliminate the fault more easily.

Table B-2 Equipment Emergency Maintenance Requisite

The user should fill in the following fields.

Name:                               No.:                    Software version:
Complaint time (HH-MM-DD-YY):       Complainant:            Telephone:
Complaint company or organization:  In the warranty period or not: ( )Y ( )N

Abnormality Record Table (Please attach it on the blank below):


Details of the handling process (as detailed as possible):


Reviewed by: Stamp of the department:

ZTE personnel should fill in the following fields:

Solution:            ( ) Guide through telephone   ( ) Remote maintenance   ( ) On-site support
Time of settlement:  (HH-MM-DD-YY)

Handling result:
Handled by: Stamp of the department:

Unresolved problems:


B.4 Common Panel Indicators


Table B-3 Common Panel Indicators

Status Name          RUN Status     ALM Status     Priority  Description
Initial              Solid ON       Solid OFF      0         Initial status
Normal               Flash at 1 Hz  Solid OFF      1         Normal running
Downloading version  Flash at 5 Hz  Solid OFF      Booting   The version is being downloaded.
                     Flash at 1 Hz  Flash at 5 Hz  Booting   The version download failed. The board does not match the configuration and cannot download its version.
                     Solid ON       Solid OFF      Booting   DEBUG version: VxWorks has been downloaded successfully, and the board is waiting to download and run the version. RELEASE version: the version download is successful and the version is starting.
Self-test failure    Solid OFF      Flash at 5 Hz  7         The board self-test failed.
                     Solid OFF      Flash at 2 Hz  8         The startup of the operation support system failed.
                     Flash at 5 Hz  Flash at 5 Hz  9         Getting the logical address failed.
                     Flash at 5 Hz  Flash at 2 Hz  10        The power-on of basic processes failed or timed out.
                     Flash at 5 Hz  Flash at 1 Hz  11        The kernel data area failed to initialize.
Running fault alarm  Flash at 2 Hz  Flash at 5 Hz  6         Alarm of mismatch among version, hardware, and configurations.
                     Flash at 2 Hz  Flash at 2 Hz  2         The media plane communication is disconnected.
                     Flash at 2 Hz  Flash at 1 Hz  3         HW is disconnected.
                     Flash at 1 Hz  Flash at 2 Hz  4         The link to OMP is broken.
                     Flash at 1 Hz  Flash at 1 Hz  5         Active/standby changeover is in process.
                     No change      Solid ON       12        The 8K and 16M hardware clocks are lost.

Besides common indicators mentioned above, different boards have their own indicators.
For the detailed description of the indicators, refer to the related hardware description
manual.
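
Table B-3 can be read as a lookup from the observed RUN and ALM states to the candidate board conditions. The following is a minimal Python sketch of that lookup. The names IndicatorRow, INDICATOR_ROWS, and decode are hypothetical and are not part of any ZTE tool; the rows are transcribed from the table, with the descriptions lightly reworded.

```python
from typing import List, NamedTuple


class IndicatorRow(NamedTuple):
    run: str          # RUN indicator state
    alm: str          # ALM indicator state
    priority: str     # Priority column as printed ("0".."12" or "Booting")
    description: str


# Rows transcribed from Table B-3. INDICATOR_ROWS and decode() are
# illustrative names, not part of any ZTE tool.
INDICATOR_ROWS = [
    IndicatorRow("Solid ON", "Solid OFF", "0", "Initial status"),
    IndicatorRow("Flash at 1 Hz", "Solid OFF", "1", "Normal running"),
    IndicatorRow("Flash at 5 Hz", "Solid OFF", "Booting",
                 "The version is being downloaded."),
    IndicatorRow("Flash at 1 Hz", "Flash at 5 Hz", "Booting",
                 "The version download failed."),
    IndicatorRow("Solid ON", "Solid OFF", "Booting",
                 "The version is downloaded and waiting to run or starting."),
    IndicatorRow("Solid OFF", "Flash at 5 Hz", "7",
                 "The board self-test failed."),
    IndicatorRow("Solid OFF", "Flash at 2 Hz", "8",
                 "The startup of the operation support system failed."),
    IndicatorRow("Flash at 5 Hz", "Flash at 5 Hz", "9",
                 "Getting the logical address failed."),
    IndicatorRow("Flash at 5 Hz", "Flash at 2 Hz", "10",
                 "The power-on of basic processes failed or timed out."),
    IndicatorRow("Flash at 5 Hz", "Flash at 1 Hz", "11",
                 "The kernel data area failed to initialize."),
    IndicatorRow("Flash at 2 Hz", "Flash at 5 Hz", "6",
                 "Mismatch among version, hardware, and configurations."),
    IndicatorRow("Flash at 2 Hz", "Flash at 2 Hz", "2",
                 "The media plane communication is disconnected."),
    IndicatorRow("Flash at 2 Hz", "Flash at 1 Hz", "3", "HW is disconnected."),
    IndicatorRow("Flash at 1 Hz", "Flash at 2 Hz", "4",
                 "The link to OMP is broken."),
    IndicatorRow("Flash at 1 Hz", "Flash at 1 Hz", "5",
                 "Active/standby changeover is in process."),
    IndicatorRow("No change", "Solid ON", "12",
                 "The 8K and 16M hardware clocks are lost."),
]


def decode(run: str, alm: str) -> List[IndicatorRow]:
    """Return every Table B-3 row matching the observed RUN/ALM states.

    A row whose RUN state is "No change" matches any observed RUN state.
    """
    return [
        row for row in INDICATOR_ROWS
        if row.alm == alm and row.run in (run, "No change")
    ]
```

The function returns a list rather than a single row because the table lists the combination RUN Solid ON with ALM Solid OFF twice, once for the initial status and once for a booting state. For that combination, decode("Solid ON", "Solid OFF") returns both rows and the operator has to decide from context.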

B.5 Link/Cell Fault Confirmation Methods


B.5.1 Checking Whether NCP Link is Normal
Steps
1. On the EMS status management interface, select UMTS Radio Resource > Iub Link
to check the status of the NCP link and the CCP link.
End of Steps


B.5.2 Checking Whether Cell Establishment is Normal


On the NetNumen status management interface, check the cell status under
UMTS Radio Resource > Cell, and check the common channel status under UMTS Radio
Resource > Channel.



Figures
Figure 3-1 Flow of Emergency Maintenance
Figure 4-1 Handling Process During System Power Failure
Figure 4-2 Network Location and Alarm Structure of ESDTI, ESDTG, and ESDTT
Figure 4-3 Analyzing Iu Interface Faults
Figure 4-4 Analyzing Clock System Faults
Figure 4-5 Analyzing Call Failures
Figure 4-6 Analyzing Mute Calls
Figure 4-7 Analyzing Download and Webpage Access Failures after Activating PS Services
Figure 4-8 Analyzing Large-Scale Cell Outages
Figure 4-9 Analyzing Absence of Cell Signals and Low Success Rate of RRC Establishments
Figure 4-10 Analyzing OMM and NetNumen U31 Abnormality and Interruption
Figure 4-11 Analyzing OMM and NetNumen U31 Performance Data Delay and Reporting Failure
Figure 4-12 Handling MP CPU Overload
Figure 4-13 Data Restoration Dialog Box



Tables
Table 4-1 RNC Fault Coverage Analysis
Table A-1 Backup Mode of Boards
Table B-1 Abnormality Record Table
Table B-2 Equipment Emergency Maintenance Requisite
Table B-3 Common Panel Indicators



Glossary
ALCAP
- Access Link Control Application Protocol
APBE
- ATM Process Board Enhanced version
BGSN
- Backplane of Giga universal Service Network
CHUB
- Control plane HUB
CLKG
- Clock Generator
CN
- Core Network
EMS
- Element Management System
GIPI
- GE IP Interface

GLI
- Gigabit Line Interface

GUIM
- Gigabit Universal Interface Module
ICM
- Integrated Clock Module
LMT
- Local Maintenance Terminal
MP
- Main Processor
MTP3B
- B-ISDN Message Transfer Part level 3
OMC
- Operation & Maintenance Center

POSI
- POS Interface Board
PSN
- Packet Switched Network


RCB
- RNC Control Plane Processing Board
ROMB
- RNC Operation & Maintenance Board
RUB
- RNC User Plane Board
SBCX
- X86 Single Board Computer
THUB
- Trunk HUB
UIMC
- Universal Interface Module for Control plane (BCTC or BPSN)
UIMU
- Universal Interface Module for User Plane
