
As with the other Processes, Problem Management policies are a matter for each organisation to define according to its specific requirements. These will depend on the criticality of the Services and the consequences of their loss or disruption. A basic principle is that Problem Management has two distinct but complementary aspects: reactive and proactive. Reactively, Problem Management supports Incident Management and Service Level Management in the diagnosis and resolution of Major Incidents. Proactively, Problem Management offers significantly wider benefits to both the Service Lifecycle and the business. It does this by identifying and eliminating weaknesses in the Services and supporting infrastructure, which helps to prevent Incidents from occurring or recurring and mitigates the effect of those that do occur.

In doing so, the Process makes use of Problem Models. A Problem Model is similar in concept to an Incident Model: it provides a standardised approach to managing a particular recurring type of Problem, which is more efficient than devising a new approach to each Problem as it occurs. A Problem Model would typically include:

the steps needed to manage the Problem; the order in which these steps should be undertaken; defined responsibilities for each step; timescales and thresholds for each step; escalation procedures; documentation, logging and reporting requirements.
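Because a Problem Model is essentially structured data, it can be helpful to picture it as a small record type. The sketch below is a minimal illustration only. The class names, the recurring Problem type, the roles and the timescales are all hypothetical, and ITIL does not prescribe any particular format. It captures the ordered steps, responsibilities, timescales with an escalation threshold, and documentation requirements listed above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelStep:
    """One step of a Problem Model; every value used below is hypothetical."""
    order: int              # position in the sequence of steps
    description: str        # what has to be done
    responsible: str        # role that owns the step
    timescale_hours: float  # target time for completing the step
    escalate_to: str        # who is notified if the timescale is exceeded


@dataclass
class ProblemModel:
    problem_type: str                                       # the recurring type of Problem
    steps: list[ModelStep] = field(default_factory=list)
    documentation: list[str] = field(default_factory=list)  # logging and reporting requirements

    def ordered_steps(self) -> list[ModelStep]:
        """Return the steps in the order in which they must be undertaken."""
        return sorted(self.steps, key=lambda step: step.order)

    def escalation_target(self, step: ModelStep, elapsed_hours: float) -> Optional[str]:
        """Return whom to escalate to once the step's timescale is exceeded, else None."""
        return step.escalate_to if elapsed_hours > step.timescale_hours else None


model = ProblemModel(
    problem_type="Recurring disk-full alerts on a file server",
    steps=[
        ModelStep(2, "Identify what is filling the disk", "Storage team", 8, "Problem Manager"),
        ModelStep(1, "Log the Problem and link related Incidents", "Problem analyst", 2, "Problem Manager"),
    ],
    documentation=["Problem record", "Known Error record once the root cause is known"],
)
first = model.ordered_steps()[0]
print(first.description)                    # Log the Problem and link related Incidents
print(model.escalation_target(first, 3.0))  # Problem Manager
```

Keeping the escalation rule inside the model means each recurring Problem type carries its own thresholds, rather than relying on one global rule.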

PROCESS ACTIVITIES, METHODS AND TECHNIQUES


A typical reactive Problem Management Process flow is shown in Figure 10.1.

Source: OGC ITIL Service Operation ISBN 978-0-113310-46-3

Figure 10.1: Problem Management Process flow

The key steps in this Process are described below.

Problem Detection
Problems can be detected through one or more of the following sources:

the Service Desk, while investigating an Incident; a technical support group, during its investigation of an Incident; event monitors; notification from a third party (e.g. a Supplier); Problem Management itself, as part of Incident analysis and trending.
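The last source above, detection through Problem Management's own trend analysis of Incidents, can be sketched simply. The approach is to group Incidents by the affected Configuration Item and category, then flag any combination that recurs at least a threshold number of times as a candidate Problem. This is a minimal sketch under stated assumptions: the field names, the CI identifiers and the default threshold of three are all hypothetical, and real trending would also consider time windows and free-text symptoms.

```python
from collections import Counter
from typing import NamedTuple


class Incident(NamedTuple):
    ref: str
    ci: str        # affected Configuration Item (hypothetical identifiers)
    category: str  # multi-level category path


def candidate_problems(incidents: list[Incident], threshold: int = 3) -> list[tuple[str, str]]:
    """Return (ci, category) pairs that recur at least `threshold` times (assumed rule)."""
    counts = Counter((incident.ci, incident.category) for incident in incidents)
    return sorted(key for key, n in counts.items() if n >= threshold)


incidents = [
    Incident("INC001", "MAILSRV01", "Software/Email/Delivery"),
    Incident("INC002", "MAILSRV01", "Software/Email/Delivery"),
    Incident("INC003", "PRN-07", "Hardware/Printer/Paper jam"),
    Incident("INC004", "MAILSRV01", "Software/Email/Delivery"),
]
print(candidate_problems(incidents))  # [('MAILSRV01', 'Software/Email/Delivery')]
```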

Problem Logging
All Problems and their relevant details should be logged, regardless of the source. The Problem Log should contain:

date and time stamps; cross reference to relevant Incidents and their details; details of what is affected (e.g. user, Service, equipment); priority and category; work undertaken on diagnosis and/or recovery.
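A Problem record holding the details listed above could be modelled roughly as follows. This is a hypothetical sketch, not the schema of any particular service management tool. The reference format, the numeric priority scale (1 = highest) and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ProblemRecord:
    ref: str                # hypothetical reference format
    opened: datetime        # date and time stamp
    affected: list[str]     # what is affected: users, Services, equipment
    priority: int           # 1 = highest (assumed scale)
    category: str           # multi-level category path
    related_incidents: list[str] = field(default_factory=list)
    work_log: list[tuple[datetime, str]] = field(default_factory=list)

    def link_incident(self, incident_ref: str) -> None:
        """Cross-reference a related Incident, ignoring duplicates."""
        if incident_ref not in self.related_incidents:
            self.related_incidents.append(incident_ref)

    def log_work(self, when: datetime, note: str) -> None:
        """Record diagnosis or recovery work with its own time stamp."""
        self.work_log.append((when, note))


problem = ProblemRecord(
    ref="PRB0001",
    opened=datetime(2024, 3, 4, 9, 30),
    affected=["Email Service", "MAILSRV01"],
    priority=2,
    category="Software/Email/Delivery",
)
problem.link_incident("INC001")
problem.link_incident("INC002")
problem.link_incident("INC001")  # duplicate link is ignored
problem.log_work(datetime(2024, 3, 4, 11, 0), "Mail queue logs collected for diagnosis")
print(problem.related_incidents)  # ['INC001', 'INC002']
```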

Problem Categorisation
The purposes of Problem categorisation are:

to support diagnosis and recovery; to provide meaningful management information to support Activities such as trend analysis, Supplier Management and Continual Service Improvement.

As with Incidents, Problem categories are typically multi-level. Two examples are shown in Figure 10.2, one for a hardware Problem and one for a software Problem.

Source: OGC ITIL Service Operation ISBN 978-0-113310-46-3

Figure 10.2: Two examples of categorising a Problem
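Multi-level categories support both of the purposes given above. Diagnosis can use the most specific level, while trend analysis can roll Problems up to a higher level. The sketch below assumes categories are stored as "/"-separated paths. The category names are hypothetical illustrations and are not those shown in Figure 10.2.

```python
from collections import Counter


def levels(category_path: str) -> list[str]:
    """Split a multi-level category such as 'Hardware/Server/Disk' into its levels."""
    return [part.strip() for part in category_path.split("/")]


def count_at_level(category_paths: list[str], depth: int) -> Counter:
    """Count Problems grouped by their category truncated to `depth` levels."""
    return Counter("/".join(levels(path)[:depth]) for path in category_paths)


problems = [
    "Hardware/Server/Memory board/Card failure",
    "Hardware/Server/Disk/Controller failure",
    "Software/Application/Finance suite/Purchasing module",
]
print(count_at_level(problems, 1))  # Counter({'Hardware': 2, 'Software': 1})
print(count_at_level(problems, 2))  # Counter({'Hardware/Server': 2, 'Software/Application': 1})
```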

Problem Prioritisation

Problem prioritisation is important for the same reasons as Incident prioritisation and the same approach can be used. However, Problem priorities must additionally take into account the frequency and impact of related Incidents. In addition to impact and urgency, ITIL also recognises Problem severity. Severity refers to the cost or effort needed to resolve the Problem in terms of people, actions and time.
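One way to reuse an Incident-style impact and urgency matrix while also reflecting the frequency of related Incidents is sketched below. Both the matrix values and the rule that raises priority by one level above a frequency threshold are assumptions for illustration. Each organisation defines its own scheme. Severity, as defined above, is not modelled here; it could be recorded alongside priority to inform resourcing.

```python
# Hypothetical impact x urgency matrix (1 = highest priority, 5 = lowest).
PRIORITY_MATRIX = {
    ("high", "high"): 1, ("high", "medium"): 2, ("high", "low"): 3,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 4,
    ("low", "high"): 3, ("low", "medium"): 4, ("low", "low"): 5,
}


def problem_priority(impact: str, urgency: str, related_incidents_per_month: int,
                     frequent_threshold: int = 10) -> int:
    """Start from the Incident-style priority, then raise it by one level when
    related Incidents recur frequently (the threshold is an assumed value)."""
    priority = PRIORITY_MATRIX[(impact, urgency)]
    if related_incidents_per_month >= frequent_threshold:
        priority = max(1, priority - 1)  # never go above the highest priority
    return priority


print(problem_priority("medium", "medium", related_incidents_per_month=2))   # 3
print(problem_priority("medium", "medium", related_incidents_per_month=25))  # 2
```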

Problem Investigation and Diagnosis


The purpose of this step is to find the root cause of the Problem. The initial focus should be on the Configuration Management System (CMS), which helps to identify the point of failure. Problems can also be matched against the Known Error Database (KEDB). Beyond this, a number of techniques are available, depending on the situation and priority. These are described below.

Chronological order: List the key Events by time to provide clues about cause and effect.

Pain value analysis: Analyse the impact of the Problem in terms of people, Services and cost to understand the business consequences better, so that the response can be prioritised and appropriate resources allocated to the resolution. (Strictly, this is not a diagnosis technique.)

Kepner and Tregoe: A formal method for analysing complex Problems, based on defining the Problem, describing it, and then identifying the true cause from the possible and probable causes.

Brainstorming: Simply gathering ideas about the root cause from people who may have relevant knowledge or experience.

Ishikawa diagrams: A way of documenting the results of, for instance, a brainstorming session to help further identify the root cause. An example is shown in Figure 10.3.

Source: OGC ITIL Service Operation ISBN 978-0-113310-46-3

Figure 10.3: Example of a completed Ishikawa diagram

Pareto analysis: A method for identifying and focusing on the typically small number of causes that are responsible for the majority of failures (often referred to as the 80:20 rule). If you rank the causes of Incidents by frequency and calculate each cause's cumulative share of the total, you will typically find that some 20 per cent of the causes are responsible for 80 per cent of the failures.
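The cumulative calculation described above can be made concrete with a short sketch. The Incident counts per cause are invented for illustration, and the 80 per cent cutoff is simply the conventional value. The sketch ranks the causes, accumulates their shares, and stops at the first cause whose cumulative share reaches the cutoff.

```python
def pareto_table(cause_counts: dict[str, int]) -> list[tuple[str, int, float]]:
    """Rank causes by frequency (highest first) with their cumulative share of failures."""
    total = sum(cause_counts.values())
    ranked = sorted(cause_counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for cause, count in ranked:
        running += count
        rows.append((cause, count, running / total))
    return rows


def vital_few(cause_counts: dict[str, int], cutoff: float = 0.8) -> list[str]:
    """Return the leading causes whose cumulative share first reaches `cutoff`."""
    causes = []
    for cause, _count, share in pareto_table(cause_counts):
        causes.append(cause)
        if share >= cutoff:
            break
    return causes


# Hypothetical Incident counts per root cause over one period (total = 100).
counts = {
    "Disk full": 50, "Expired certificate": 30, "Memory leak": 5,
    "Configuration drift": 4, "Printer driver": 3, "DNS timeout": 2,
    "Batch job overrun": 2, "Licence expiry": 2, "User error": 1, "Backup failure": 1,
}
for cause, count, share in pareto_table(counts):
    print(f"{cause:<20} {count:>3} {share:>6.0%}")
print(vital_few(counts))  # ['Disk full', 'Expired certificate']
```

In this invented data set, 2 of the 10 causes (20 per cent) account for 80 per cent of the Incidents. Real data will rarely split so neatly.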

Workarounds
A workaround applies to the Incidents caused by the Problem, not the Problem itself. Where a workaround for an Incident exists, it should be documented in the Problem record and the Problem record should remain open until a permanent resolution for the underlying root cause can be found.

Raising a Known Error


There is contradictory guidance within ITIL on whether a workaround needs to be in place before a Known Error record can be created. The formal definition requires this ("A Known Error is a Problem that has a documented root cause and a workaround."). However, section 4.4.5.7 advises (correctly) that the Known Error record may be created for information purposes "even though the diagnosis may not be complete or a workaround found", and that "it is inadvisable to set a concrete procedural point exactly when a Known Error record must be raised. It should be done as soon as it becomes useful to do so."[*] The purpose of the Known Error record is to provide information against which other Incidents and Problems can be correlated, both at the time and later. This builds up a more accurate picture of the cause and so aids permanent resolution. If a workaround does exist, it can conveniently be applied to new Incidents to restore the Service before the underlying root cause is resolved.
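The sketch below illustrates how a KEDB supports this correlation. It shows a Known Error record that can exist without a workaround, and a simple match of a new Incident's symptoms against the KEDB. Keyword-overlap matching, the field names and the sample records are all hypothetical simplifications. Real tools typically match on categories, CIs and free text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownError:
    ref: str
    symptoms: set[str]                # keywords describing the symptoms
    root_cause: Optional[str] = None  # may still be under investigation
    workaround: Optional[str] = None  # may be raised before a workaround exists


def match(incident_symptoms: set[str], kedb: list[KnownError],
          min_overlap: int = 2) -> list[KnownError]:
    """Return Known Error records sharing at least `min_overlap` symptom keywords, best first."""
    scored = [(len(incident_symptoms & ke.symptoms), ke) for ke in kedb]
    hits = [(score, ke) for score, ke in scored if score >= min_overlap]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [ke for _score, ke in hits]


kedb = [
    KnownError("KE001", {"login", "timeout", "error"}, workaround="Clear the browser cache and retry"),
    KnownError("KE002", {"report", "blank", "pdf"}),  # raised for information; no workaround yet
]
for ke in match({"login", "timeout", "slow"}, kedb):
    print(ke.ref, "->", ke.workaround or "no workaround yet; link the Incident for correlation")
```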

Problem Resolution
A Problem is usually resolved by implementing a Change. Depending on the priority and severity, an emergency Change may be justified. Occasionally a Problem may remain unresolved for a long time if, for instance, the solution is not cost-justified. In these circumstances, the workaround continues to be applied.

EXAMPLE

A particular PC application running on a certain type of PC occasionally produces an error message. There is no significant impact on the Service or the user, and the workaround is to reboot the PC. If the organisation's refresh programme will release a newer version of the application that does not cause this Problem within the next six months, the organisation may not consider it worthwhile to spend the time and money resolving the Problem in the interim.
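The cost-justification reasoning in this example can be expressed as a simple comparison. The one-off cost of a fix is set against the cost of continuing to apply the workaround until the planned replacement arrives. All figures are hypothetical and in unspecified currency units. The model deliberately ignores risk and other intangible factors that a real assessment would weigh.

```python
def fix_is_cost_justified(fix_cost: float, incidents_per_month: float,
                          cost_per_workaround: float,
                          months_to_replacement: float) -> tuple[bool, float]:
    """Compare a one-off fix with applying the workaround until the planned replacement.

    Deliberately simple: it ignores risk, reputational impact and other intangibles.
    """
    interim_workaround_cost = incidents_per_month * cost_per_workaround * months_to_replacement
    return fix_cost < interim_workaround_cost, interim_workaround_cost


# Hypothetical figures: 20 reboots a month at 15 units each, new version due in six months.
justified, interim = fix_is_cost_justified(fix_cost=12_000, incidents_per_month=20,
                                           cost_per_workaround=15, months_to_replacement=6)
print(justified, interim)  # False 1800
```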

Problem Closure
A Problem should be closed once the Change raised through the RFC to resolve it has been successfully implemented. Any related Incident records that are still open should also be closed. Both the Problem and the Incident records should fully document the resolution actions. Any associated Known Error records and Configuration Item records should also be updated as appropriate.
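The closure rules above can be sketched as one function that refuses to close the Problem until the resolving Change has succeeded. Once it has, the function closes the related Incidents and updates the associated Known Error records. The record classes and the "resolved" status for Known Errors are hypothetical, and updates to Configuration Item records are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    ref: str
    status: str = "open"
    resolution: str = ""


@dataclass
class Problem(Record):
    incidents: list[Record] = field(default_factory=list)     # related Incident records
    known_errors: list[Record] = field(default_factory=list)  # associated Known Error records


def close_problem(problem: Problem, change_succeeded: bool, resolution: str) -> None:
    """Close a Problem and its related Incidents once the resolving Change has succeeded."""
    if not change_succeeded:
        raise ValueError(f"{problem.ref}: resolving Change not yet successful; keep it open")
    problem.status, problem.resolution = "closed", resolution
    for incident in problem.incidents:
        incident.status = "closed"
        incident.resolution = resolution  # both records document the resolution
    for known_error in problem.known_errors:
        known_error.status = "resolved"   # hypothetical status: updated, not deleted
        known_error.resolution = resolution


problem = Problem("PRB0001",
                  incidents=[Record("INC001"), Record("INC002", "closed", "Rebooted PC")],
                  known_errors=[Record("KE001")])
close_problem(problem, change_succeeded=True, resolution="Driver upgraded via RFC")
print([(r.ref, r.status) for r in problem.incidents])  # [('INC001', 'closed'), ('INC002', 'closed')]
```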

Major Problem Review


After the successful resolution of any Problem that the organisation defines as major, a review should be conducted to learn from the Processes and actions involved and to contribute to continual improvement. The review should identify what went well, what went badly and any follow-up actions. The results should be documented, for instance in procedures, work instructions and configuration records. The Problem Manager is responsible for ensuring that the appropriate actions are taken. The resulting understanding should be fed back to the business via the service review meeting.

Errors Detected in the Development Environment


Known deficiencies in a new or changed application Release, together with their workarounds or resolutions, should be logged as Problem records and recorded in the KEDB to minimise subsequent support costs.
[*] In the event that an exam question refers to a Known Error without defining the term, it is safer to assume that it requires a workaround.

TRIGGERS, INPUTS, OUTPUTS AND PROCESS INTERFACES


Most Problem records are triggered by Incidents, but they may also arise from the testing of new or changed applications. Another source is Supplier product release information.

Within Service Transition, the Change and Problem Management Processes refer to each other, both for RFCs raised to resolve a Problem and for Problems resulting from failed Changes. Configuration Management helps to identify the CIs at fault and to determine the impact of a Problem. The KEDB is also part of the CMS and can hold Problem records. The Release and Deployment Management Process ensures that known deficiencies in new Releases, together with their workarounds, are transferred from the development KEDB to the production KEDB.

Within Service Design, the Problem Management Process interfaces with the Availability and Capacity Management Processes, typically in support of proactive Problem prevention. The proactive side of Problem Management also supports Continual Service Improvement. Problem Management helps to improve Service Levels and contributes to Service reviews by Service Level Management. Financial Management provides some of the cost and Service guidelines to which Problem Management adheres. It also contributes to assessing the cost-effectiveness of proposed resolution actions.

INVOLVEMENT IN INFORMATION MANAGEMENT


The CMS contributes to impact assessment, and therefore to prioritisation, and provides information to support trend analysis. The KEDB speeds up resolution by matching new Problems and Incidents against existing records, which helps diagnosis and the identification of workarounds. ITIL recommends that only the Problem Manager should add new records to the KEDB, to avoid duplication and ensure consistency. The CMS and the KEDB are component parts of the Service Knowledge Management System (SKMS).

USING METRICS TO CHECK EFFECTIVENESS AND EFFICIENCY


Note: Please also refer to Appendix 3 for information on the generic use of metrics to check and improve efficiency and effectiveness.

The following metrics can help Problem Management assess and improve its effectiveness and efficiency:

the percentage of Problems resolved within the timescales set out in the SLA, by time period; the average cost of resolving a Problem, by time period; the percentage of Major Problems for which Major Problem reviews have been carried out, by time period; the percentage of actions from completed Major Problem reviews that have been completed, by time period; the number of Known Errors identified, by time period.

The actual number of Problems identified during a period is useful to give an indication of the scale of issues and the resources required, but on its own it is not a measure of the effectiveness or efficiency of the Process.
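The metrics listed above are straightforward ratios. The sketch below computes them for a single time period, so it would be called once per period to produce a trend. The per-Problem fields and the sample data are hypothetical. As a simplifying assumption, Known Errors are counted only when raised for the Problems in the list. Ratios whose denominator is zero are returned as None rather than as misleading zeros.

```python
from dataclasses import dataclass


@dataclass
class ProblemSummary:
    """Hypothetical per-Problem facts for Problems resolved in one time period."""
    within_sla: bool   # resolved within the SLA timescale?
    cost: float        # cost of resolution
    major: bool        # classed as a Major Problem?
    review_held: bool  # Major Problem review carried out?
    review_actions: int = 0
    review_actions_done: int = 0
    known_error_raised: bool = False


def period_metrics(problems: list[ProblemSummary]) -> dict:
    """Compute the Problem Management metrics for one time period."""
    if not problems:
        return {}  # nothing resolved in this period
    n = len(problems)
    majors = [p for p in problems if p.major]
    reviewed = [p for p in majors if p.review_held]
    actions = sum(p.review_actions for p in reviewed)
    done = sum(p.review_actions_done for p in reviewed)
    return {
        "pct_resolved_within_sla": 100 * sum(p.within_sla for p in problems) / n,
        "average_cost": sum(p.cost for p in problems) / n,
        "pct_major_problems_reviewed": 100 * len(reviewed) / len(majors) if majors else None,
        "pct_review_actions_completed": 100 * done / actions if actions else None,
        "known_errors_identified": sum(p.known_error_raised for p in problems),
    }


period = [
    ProblemSummary(True, 500, major=False, review_held=False),
    ProblemSummary(True, 1500, major=True, review_held=True, review_actions=4,
                   review_actions_done=3, known_error_raised=True),
    ProblemSummary(False, 3000, major=True, review_held=False, known_error_raised=True),
    ProblemSummary(True, 1000, major=False, review_held=False),
]
print(period_metrics(period))
```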

CHALLENGES, CRITICAL SUCCESS FACTORS AND RISKS


Note: Please also refer to Appendix 4 for generic challenges, Critical Success Factors and Risks.

Challenges
While Incident Management focuses on restoring Service as quickly as possible, Problem Management is concerned with identifying and removing the root cause of one or more Incidents. The two Processes work closely together, but at times there can be tension between them. The Problem investigation and diagnosis phase is often time-consuming. If Incident Management has a quick workaround to restore Service, it will want to use it. This may not help Problem Management, which needs to understand the root cause. Problem Management may also require an outage or a data dump, which again may be at odds with Incident Management's aim of getting the Service back up and running as soon as possible. Other challenges include:

ensuring that the Incident and Problem tools are compatible and communicate with each other; understanding the real business impact of Problems.

Critical Success Factors


Problem Management is clearly reliant on an effective Incident Management Process and on appropriate interfaces and toolsets. It must be possible to correlate records from each Process, and the users of each Process should be familiar with the procedures, documentation and outputs of both. A business perspective is also an essential dimension of Problem Management.

Risks
Two main Risks exist:

Undertaking Problem Management as part of Incident Management. The two Processes should not be undertaken by the same team, for two reasons. First, their objectives are inconsistent: Incidents require a rapid fix or workaround, whereas Problems require more time for the investigative work needed to diagnose the root cause and resolve the underlying issue. Second, the need to respond quickly to Incidents is likely to consume all the available resources, so Problem investigation often becomes only a secondary priority.

Adopting only reactive Problem Management. Many organisations adopt only reactive Problem Management, which is little more than Major Incident Management. Most of the benefits of Problem Management come from the proactive aspect, which must therefore receive appropriate resources and focus.
