
Implementing ITIL for Incident Management

Hi-Tech HTTD Storage CoE


Brijesh Das M.K.
brijesh.mk@tcs.com

Date Month Year

CONFIDENTIAL
Table of Contents

Why Use ITIL
Incident Management Process
Customer
Challenge
Implementation
Incident Management Goal
Who Reports an Incident
Escalation Policy
Reporting an Incident
Incident Management Activity Workflow
Evaluation Process
Escalation Process
Resolution Process
Inputs
Activities
Outputs
Roles & Responsibilities
Incident Record Keeping
Incident Report
The Results

Why Use ITIL?

Organizations are increasingly dependent upon IT to satisfy their corporate
aims and meet their business needs. This growing dependency leads to
growing needs for quality IT services – quality that is matched to business
needs and user requirements as they emerge.

ITIL provides a comprehensive, consistent and coherent set of best practices
for IT Service Management processes, promoting a quality approach to
achieving business effectiveness and efficiency in the use of information
systems. ITIL processes are intended to be implemented so that they
underpin, but do not dictate, the business processes of an organization.

ITIL codifies IT service management best practices. Among the benefits
associated with adopting the library's best practices, clients have identified
improved customer satisfaction with IT services, better communication and
information flows between IT staff and customers, and reduced costs in
developing procedures and practices within an enterprise.

ITIL can help with IT provision by providing:

• Better Customer Service: ITIL helps deliver services that are
tailored to the needs of the customer.

• Better Cost Effectiveness: ITIL assists organizations in providing a
quality IT service within a business environment affected by both
budgetary constraints and growing user expectations.

• Better Motivation and Productivity: ITIL encourages staff to view IT
Service Management as a recognized professional skill, ultimately
increasing effective performance.

Incident Management Process

An incident is an unplanned interruption to an IT service, or a reduction in
the quality of an IT service. Failure of a configuration item that has not yet
impacted service is also an incident.

The purpose of Incident Management is to restore normal service as quickly
as possible, and to minimize the adverse impact on business operations.
Incidents are often detected by event management, or by users contacting
the service desk. Incidents are categorized to identify who should work on
them and for trend analysis, and they are prioritized according to urgency
and business impact.
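
As an illustration of prioritizing by urgency and business impact, the sketch below derives a priority from the two ratings. The three-point scale, the 1 to 5 priority range and the rule priority = impact + urgency - 1 are assumptions borrowed from a commonly used priority matrix; they are not taken from the client's policy.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-point scale used for both impact and urgency."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def priority(impact: Level, urgency: Level) -> int:
    """Return a priority from 1 (most critical) to 5 (least critical).

    Assumed rule: priority = impact + urgency - 1.
    """
    return int(impact) + int(urgency) - 1


if __name__ == "__main__":
    # A plant-wide outage that blocks production right now.
    print(priority(Level.HIGH, Level.HIGH))
    # A minor fault with a moderately urgent deadline.
    print(priority(Level.LOW, Level.MEDIUM))
```

Keeping the rule in one function means the scale can be replaced by a lookup table if the agreed SLA defines priorities differently.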

If an incident cannot be resolved quickly, it may be escalated. Functional
escalation passes the incident to a technical support team with appropriate
skills; hierarchical escalation engages appropriate levels of management.

After the incident has been investigated and diagnosed, and the resolution
has been tested, the Service Desk should ensure that the user is satisfied
before the incident is closed.

An Incident Management tool is essential for recording and managing
incident information.

Incident Management links with other processes, activities and functions:
• Configuration Management Process
• Change Management Process
• Problem Management Process
• Process, Procedures and documentation for Training and Knowledge
Management.

Incident Management is most closely tied to Change Management and Problem
Management. Change Management tries to limit incidents caused by changes,
whereas Problem Management eliminates recurring incidents by making the
known error database available.

Customer:

The client is a market leader in specialty areas of global supply chain management, providing
end-to-end logistics and supply chain solutions based on leading-edge technology, innovative thinking
and design to deliver cost savings and competitive advantage to its customers. The client delivers
intelligent logistics solutions for both inbound and outbound material flow requirements.

Challenge:

The company's IT infrastructure is based on the ITIL framework, and the Service Desk forms the centre
of all activities. The desk is responsible for running Incident, Problem and Change
Management.

Incident Management is also a major activity across IT. Any downtime can put a significant dent in
the SLA parameters, and only efficient handling of incidents can reduce service downtime
and provide high availability of IT services for both internal and external customers.

The challenge in the client environment was to develop an incident management policy with
incremental escalation timelines, so that all stakeholders are kept informed and downtime is
reduced through efficient handling of incidents.

Implementation

Incident and Change Management were implemented by the Service Desk. It is important to note that
ITIL defines the Service Desk as a function and not as a process.

The Service Desk is the front line of IT, acting as the liaison between IT and the business units or end
users. It is responsible for logging problem reports and service requests, forwarding them to the
responsible services, tracking progress, reporting status to requesters, escalating to management
if necessary, and closing requests when the work has been completed.

Incident Management Goal:

The primary goal of the Incident Management process was to restore normal service operation as
quickly as possible and minimize the adverse impact on business operations, thus ensuring that the
best possible levels of service quality and availability are maintained. ‘Normal service operation’ is
defined here as service operation within Service Level Agreement (SLA) limits.

[Figure: Incident Management Process Overview]

Who Reports an Incident?

➢ Customer: Customers are the individuals who commission, pay for, and own the IT services. A
customer is likely to report a service deficiency within the SLA.

➢ User: Users are the people who use IT services on a daily basis. A user is likely to report a
software application incident or a printer malfunction.

Escalation Policy

‘Escalation’ is the mechanism that assists timely resolution of an incident. It can take place during
any activity in the resolution process and is of two types:

➢ Functional

Transferring an incident from 1st Level to 2nd Level support groups or beyond is called
‘functional escalation’ and primarily takes place because of a lack of knowledge or expertise.
Functional escalation was also to be followed when agreed time intervals elapsed; these
intervals must not exceed the SLA-agreed resolution times.

➢ Hierarchical

‘Hierarchical escalation’ could take place at any moment during the resolution process when
it was likely that an incident would not be resolved in time. In case of a lack of knowledge or
expertise, hierarchical escalation was performed manually (by the Service Desk or other
support staff). Automatic hierarchical escalation could be considered after a certain critical
time interval, when it was likely that a timely resolution would fail. Preferably, this took
place long enough before the SLA-agreed resolution time was exceeded for authorized line
management to carry out corrective actions.
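
The two escalation types can be expressed as a pair of checks. The following is a minimal sketch only: the `IncidentState` fields, the `warning_margin` parameter and its default of 10 minutes are hypothetical names and values chosen for illustration, not part of the client's policy.

```python
from dataclasses import dataclass


@dataclass
class IncidentState:
    """Hypothetical snapshot of an open incident; all times in minutes."""
    elapsed: float                 # time since the incident was reported
    functional_interval: float     # agreed interval before hand-off to the next level
    sla_resolution: float          # SLA-agreed resolution time
    estimated_remaining: float     # current estimate of the time still needed
    lacks_expertise: bool = False  # current support level cannot resolve it


def escalations_due(state: IncidentState, warning_margin: float = 10.0) -> set[str]:
    """Return which escalation types are currently due."""
    due = set()
    # Functional: lack of expertise, or the agreed time interval has elapsed.
    if state.lacks_expertise or state.elapsed >= state.functional_interval:
        due.add("functional")
    # Hierarchical: timely resolution looks unlikely. The margin (an assumption)
    # triggers it early enough for line management to take corrective action.
    projected = state.elapsed + state.estimated_remaining
    if projected > state.sla_resolution - warning_margin:
        due.add("hierarchical")
    return due
```

Both checks run independently because an incident may need a hand-off to a more skilled team and management attention at the same time.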

Reporting an Incident

End users could report incidents to the Service Desk through the following channels:

• Phone / Fax / Walk-in

• Monitoring tool (HP OpenView)

• Messaging services (email / messenger services)

Evaluation Process

Incident captured through monitoring or reported by the plant

➢ Customer Impacted

  • Determine which systems are impacted
  • Determine the level of the impact (customer down?)
  • Determine if a workaround exists and is being implemented
  • Immediately escalate to
    o 2nd Level Business Analyst
  • Inform
    o SD Manager
    o Site Analyst
    o Plant Manager
    o Regional IT Manager

➢ Plant Production Impacted

  • Determine the extent of the impact
  • Determine the type of system impacted
    o Database: Immediately escalate to 2nd level DBA
    o SAP: Utilize 1st level SAP troubleshooting matrix
    o E-Mail: Utilize 1st level email troubleshooting matrix
    o WAN: Utilize 1st level WAN troubleshooting matrix
    o LAN: Utilize 1st level LAN troubleshooting matrix
    o WLAN: Utilize 1st level WLAN troubleshooting matrix
    o Hardware: Utilize 1st level hardware troubleshooting matrix
  • Was the problem resolved using the 1st level troubleshooting matrix?
    o No: escalate to appropriate 2nd level support
    o Yes:
      ▪ log incident in HPOV
      ▪ assign incident to SD Mgr for review
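
The routing for a plant-production incident described above can be sketched as a small function. The function name `triage` and the action strings are illustrative assumptions; only the system types and the routing rules come from the evaluation steps.

```python
# System types handled first with a 1st level troubleshooting matrix.
FIRST_LEVEL_MATRIX = {"SAP", "E-Mail", "WAN", "LAN", "WLAN", "Hardware"}


def triage(system_type: str, resolved_at_first_level: bool = False) -> list[str]:
    """Return the ordered actions for a plant-production incident."""
    # Database incidents skip 1st level troubleshooting entirely.
    if system_type == "Database":
        return ["escalate to 2nd level DBA"]
    if system_type not in FIRST_LEVEL_MATRIX:
        raise ValueError(f"unknown system type: {system_type}")

    actions = [f"use 1st level {system_type} troubleshooting matrix"]
    if resolved_at_first_level:
        actions += ["log incident in HPOV", "assign incident to SD Manager for review"]
    else:
        actions.append("escalate to appropriate 2nd level support")
    return actions
```

For example, `triage("WAN", resolved_at_first_level=True)` returns the matrix step followed by the logging and review steps.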

Escalation Process

During all incidents, the service desk representative would do the following:

➢ Track important events during the incident

1. Ask for clarification as required
2. Include the root cause if identified during the incident
3. Include immediate problem resolution activities
4. Include any permanent problem resolutions that may be identified
5. Coordinate resources as required during the incident

➢ < 10 Minutes

Initial call or monitoring notification

1. Determine the cause of the incident
2. Determine if production is down or impaired
3. Escalate to Tier II (BA, Infrastructure, DBA) using the on-call matrix
4. Track incident in HPOV
5. Open the conference bridge if required by Tier II support
6. Transfer the phones to another SD team member, if available
➢ @ 10 Minutes

1. Telephonically notify the Site Analyst
   a. For SAP or E-Mail notify all affected sites
2. Telephonically notify the OSC
3. For SAP or E-Mail notify all affected sites

➢ <= 15 Minutes (incident resolved, no plant/customer downtime)

1. Log incident in HPOV
2. Continue to follow incident management process

➢ @ 15 Minutes

1. Track the incident using the incident record
2. Contact the following telephonically
   a. SD Manager
      i. If unavailable, contact the Director of Infrastructure
         1. If unavailable, contact the V.P. of Software Development
   b. Site Analyst
   c. OSC or Site contact
      i. If unable to reach the OSC or site contact, call the Plant Manager
3. Send the Incident Notification E-Mail to the following
   a. CIO
   b. Plant Manager
   c. Site Analyst
   d. OSC
   e. Regional IT Manager
   f. Vice President of Application Software
   g. Technical Services Director
   h. Software Development Manager
   i. Application and Software Analysis Manager
   j. Database and Application Manager
   k. Service Desk Manager

➢ @ 20 Minutes

1. Telephonically contact:
   a. Regional IT Director
   b. Plant Manager
   c. Database Manager (as required)
   d. Software Manager (as required)

➢ @ 25 Minutes

1. Telephonically contact
   a. V.P. Software
   b. Director Infrastructure

➢ @ 30 Minutes

1. Telephonically contact CIO

➢ @ 45 Minutes (and every 30 minutes thereafter)

1. Send the Incident Notification E-Mail to the following
   a. CIO
   b. Plant Manager
   c. Site Analyst
   d. OSC
   e. Regional IT Manager
   f. Vice President of Application Software
   g. Technical Services Director
   h. Software Development Manager
   i. Application and Software Analysis Manager
   j. Database and Application Manager
   k. Service Desk Manager
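
The escalation timeline above lends itself to a data-driven check of which notifications are due at a given moment. The sketch below is an illustration under assumptions: the names `TIMELINE`, `actions_due` and `first_available` are hypothetical, and the action strings are abbreviated summaries of the steps listed in the timeline.

```python
# (minute, action) pairs summarizing the escalation timeline.
TIMELINE = [
    (10, "phone: Site Analyst, OSC; for SAP or E-Mail, all affected sites"),
    (15, "phone: SD Manager, Site Analyst, OSC or site contact"),
    (15, "email: Incident Notification"),
    (20, "phone: Regional IT Director, Plant Manager, Database/Software Manager as required"),
    (25, "phone: V.P. Software, Director Infrastructure"),
    (30, "phone: CIO"),
]
FIRST_REPEAT_EMAIL = 45  # first repeated notification e-mail
REPEAT_INTERVAL = 30     # and every 30 minutes thereafter

# Fallback order when the SD Manager cannot be reached at 15 minutes.
SD_MANAGER_CHAIN = ["SD Manager", "Director of Infrastructure", "V.P. of Software Development"]


def actions_due(elapsed_minutes: int) -> list[str]:
    """Return every action whose trigger time has been reached so far."""
    due = [action for minute, action in TIMELINE if minute <= elapsed_minutes]
    minute = FIRST_REPEAT_EMAIL
    while minute <= elapsed_minutes:
        due.append(f"email: Incident Notification (update at {minute} min)")
        minute += REPEAT_INTERVAL
    return due


def first_available(chain, is_available):
    """Return the first contact for whom is_available(name) is true, else None."""
    return next((name for name in chain if is_available(name)), None)
```

Keeping the schedule as data rather than as nested conditionals makes it straightforward to review the timings against the agreed policy and to change them without touching the logic.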

Resolution Process

Send the Incident Resolved email to:

1. CIO
2. Plant Manager
3. Site Analyst
4. OSC
5. Regional IT Manager
6. Vice President of Application Software
7. Technical Services Director
8. Software Development Manager
9. Application and Software Analysis Manager
10. Database and Application Manager
11. Service Desk Manager

Debrief the individuals involved to ensure that the following have been captured in the
Incident Report:

1. The confirmed timeline
2. The root cause
3. Immediate corrective actions
4. Permanent corrective actions
5. The incident owner (the SD Manager will assign one as required)
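
A debrief checklist of this kind can be verified mechanically before the report is signed off. This is a minimal sketch: the field names in `REQUIRED_FIELDS` are hypothetical keys chosen to mirror the five debrief items above, and the report is assumed to be a plain dictionary.

```python
# Hypothetical keys mirroring the five debrief items.
REQUIRED_FIELDS = (
    "timeline",
    "root_cause",
    "immediate_corrective_actions",
    "permanent_corrective_actions",
    "incident_owner",
)


def missing_fields(report: dict) -> list[str]:
    """Return the required debrief fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not report.get(name)]


if __name__ == "__main__":
    draft = {"timeline": "09:02 call received; 09:40 service restored", "root_cause": ""}
    # Lists every item still to be collected during the debrief.
    print(missing_fields(draft))
```

An empty result indicates that every debrief item has been recorded.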

Inputs

• Incident details sourced from End Users via the Service Desk, and from network or computer
operations via monitoring tools and manual detection, during the operational hours defined in the
Service Catalogue

• Configuration item (CI) details from the Configuration Management Database (CMDB)

• Response from Incident matching against Problems and Known Errors

• Resolution details

• Response on Request for Change (RFC) to effect resolution for Incident(s)

Activities

• Incident detection, recording and alerting

• Interrogation, classification, prioritization and initial support

• Investigation and diagnosis: a resolution or workaround had to be established as
quickly as possible in order to restore the service to End Users with minimum disruption to their
work.

• Resolution and recovery: resolution of the Incident and restoration of the agreed service.

• Closure

• Incident ownership, monitoring, tracking and communication.

Outputs

• Resolved (via Workarounds or Known Errors) and closed Incidents

• RFC for Incident resolution

• Incident record information (including linkages to resolutions and/or Workarounds and/or CI
data)

• Communication to Clients and End Users.

• Management information (reports and procedural information)

Roles & Responsibilities

1st Level - Service Desk

The Service Desk was responsible for monitoring the resolution process of all registered
incidents; in effect, the Service Desk was the owner of all incidents. The Service Desk played an
important role in the Incident Management process, as follows:

• The Service Desk was an independent function, monitoring the resolution progress of all
registered and reported incidents.

• All Incidents were reported to and registered by the Service Desk. Where Incidents were detected
and generated automatically, the process still included registration of the incident by the Service
Desk (automatically or manually).

• The primary goal of the Service Desk was to resolve the majority of issues at the 1st level.

On receipt of an incident notification, the responsibilities and main actions carried out by the
Service Desk were:

• Incident detection and recording: basic details were recorded, including timing data and details of
the symptoms observed.

• Routing service requests to support groups when incidents were not closed within the time
stipulated in the SLA. If a service request had been made, the request was handled
in conformance with the organization’s standard procedures.

• Initial support, classification and prioritization: the Configuration Item (CI) reported as the
cause of the Incident was selected from the CMDB to complete the Incident record, the
appropriate priority was derived, and the End User was given a unique system-generated Incident
number for all future communications.

• Tracking of incidents assigned to 2nd level support following unsuccessful resolution at 1st level:
in this case the history was updated and the incident was assigned to 2nd level with the relevant
details; it was later assigned back to the Service Desk, which notified the End User.

• Closure of incidents: following a review of the classification, details of the resolution action
and the appropriate category code were added and the incident record was closed.

• Ownership, monitoring, tracking and communication.

Specialist Support Groups

The IT department within the company had specialist groups which contributed to the handling and
investigation of incidents at critical times. Incidents that could not be resolved immediately by the
Service Desk were assigned to specialists within the 2nd and 3rd Level Support groups. These groups
were involved in tasks such as:

• Monitoring Incident details, including the Configuration Items affected

• Incident investigation and diagnosis (including resolution where possible)

• Detection of possible Problems and their assignment to the appropriate Problem
Management team, which raises the Problem records

• The resolution and recovery of assigned Incidents.

2nd Level and 3rd Level support were defined as follows:

➢ 2nd Level Support

IT 2nd Level Support comprised the Network, Database, Application and Systems teams, which
were part of the internal workforce. When an incident required additional 2nd Level
resources from internal support teams to assist with investigation and resolution of the
error, the Service Desk was responsible for engaging the help of other 2nd Level resources as
required.

➢ 3rd Level Support

IT 3rd Level Support referred to support personnel who were external to the organization,
i.e. they worked for an external company, supplier or vendor. When an incident required
3rd Level resources from external support to assist with investigation of the error, the 2nd
Level support group assigned to the incident was responsible for engaging the help of those
extra resources.

Service Desk Manager

The Service Desk Manager played an important role in Incident Management and had prime
responsibility for ensuring compliance with the process and the highest standards for
ongoing delivery of 1st Level support services. The Service Desk Manager also had the following
responsibilities:

• Ownership, monitoring, and keeping effective records of the incident.

• Monitoring the status and progress towards resolution of all open Incidents.

• Keeping affected End Users informed about progress.

• Following the escalation procedure as and when required.

Incident Manager

The Incident Manager had the responsibility for:

• Driving the efficiency and effectiveness of the Incident Management process

• Producing management information

• Managing the workflow of the Incident Management Process

• Monitoring the effectiveness of Incident Management and making recommendations for
improvement

• Developing and maintaining the Incident Management systems.

Incident Record Keeping

The record must be maintained throughout the Incident lifecycle so that Service Desk
agents can provide an End User with the most up-to-date progress report. Record maintenance
activities included:

• Modify status (e.g. ‘new’ to ‘work-in-progress’ or ‘pending’)

• Modify business impact/priority

• Monitor escalation status.

• Update history details

• Enter time spent and costs

HP OpenView was used as the authoritative tool to record this information.
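
To make the record-keeping activities concrete, the sketch below models an incident record whose every change is appended to a history. It is not a representation of the HP OpenView data model: the class, its fields and the `resolved` and `closed` statuses are assumptions added for illustration, while `new`, `work-in-progress` and `pending` come from the list above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# 'resolved' and 'closed' are assumed statuses; the others are named in the text.
STATUSES = {"new", "work-in-progress", "pending", "resolved", "closed"}


@dataclass
class IncidentRecord:
    incident_id: str
    status: str = "new"
    priority: int = 3
    minutes_spent: int = 0
    history: list = field(default_factory=list)

    def _log(self, entry: str) -> None:
        # Every change is timestamped so the full lifecycle can be reported.
        self.history.append((datetime.now(timezone.utc), entry))

    def set_status(self, new_status: str) -> None:
        if new_status not in STATUSES:
            raise ValueError(f"unknown status: {new_status}")
        self._log(f"status: {self.status} -> {new_status}")
        self.status = new_status

    def set_priority(self, new_priority: int) -> None:
        self._log(f"priority: {self.priority} -> {new_priority}")
        self.priority = new_priority

    def add_time(self, minutes: int) -> None:
        self.minutes_spent += minutes
        self._log(f"time spent: +{minutes} min")
```

Because each update method writes to `history`, a Service Desk agent reading the record sees the same sequence of changes that later feeds the Incident Report.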

Incident Report

The Incident Report showed the entire life-cycle of the case, so it was one of the most important
aspects of an Incident to keep up to date. Without an incident report, ongoing process
improvements would not have been possible. The report had an Immediate Corrective
Actions field, which enumerated the steps taken to resolve the issue, and a Permanent Corrective
Action field, which described the future course of action to prevent the incident from recurring. The
report was made available to the Problem Management team during the Corrective Action meeting.

The Results

The benefits of implementing Incident Management were as follows:

a) Timelier incident resolution, resulting in reduced business impact

b) Improved user productivity

c) SLA-focused production information

d) Independent, customer-focused incident monitoring
