
CERTIFICATE

This is to certify that the Project Work titled "Building Disaster Recovery Capabilities" has been successfully completed at MphasiS Ltd. by Dinesh Kumar, under my guidance, in partial fulfillment of the Post Graduate Diploma in Industrial Management at the National Institute of Industrial Engineering (NITIE), Mumbai. I wish him a bright and prosperous future.

Mr. V. B. Khanapuri
Assoc. Professor
Faculty Guide
NITIE, Mumbai

Acknowledgement
Executive Summary
Company Overview
    Applications
    BPO
    Infrastructure Services
    Payments Solutions
Project Overview
    Need for the project
    Objectives of the project
    Key Deliverables
Literature Survey
    BCP Planning Methodology
Building Disaster recovery capability in Service desk business
    Identification of critical requirements
    Defining Maximum Tolerable Period of Disruption
    Defining Recovery Time Objectives
    Identification of Alternate Site
    Identification of resource and infrastructure requirements
    BCM Response
    Training and testing
    Maintenance and updating
Identification and prioritization of possible risks in Disaster Recovery strategy
    Risk Identification
    Risk Prioritization
Analysis of current recovery strategies
    Account A
    Account B
    Account C
    Account D
    Miscellaneous

Acknowledgement
This project is the result of two months of work during which I have been accompanied and supported by many people. I am pleased to now have the opportunity to express my sincere thanks to all of them. I would like to express my gratitude to all my professors, my institute, team members and the company. Their generous help and support enabled me to complete this study within the stipulated time period. I am very obliged to the National Institute of Industrial Engineering (NITIE) and MphasiS Ltd. for giving me the opportunity to work on this project, and I would like to specially thank Dr. V. B. Khanapuri (Associate Professor, NITIE), Mr. Makarand Vaze (Project Manager, MphasiS Ltd.) and Mr. Ritiesh Sethi (Associate Vice President, MphasiS Ltd.), who provided guidance, help and cooperation at every stage of the project. I would like to thank my colleagues and friends who have directly or indirectly assisted me and helped to materialize this project. However, I take responsibility for all my shortcomings. I, in all sincerity, hope that my efforts will be appreciated.

Executive Summary
The Mphasis India BPO Service Delivery Centre provides Service Desk support to various corporate clients. To ensure that Service Desk services are available at all times, the BPO has developed a disaster recovery plan to provide basic Service Desk service in the event of a disaster. The objective of the project is to analyze the disaster recovery strategies of four accounts so that the current strategies can be improved. The expected outcome was the identification and prioritization of possible risks and an analysis of the recovery strategies of four corporate accounts.

Company Overview

MphasiS (BSE: 526299, NSE: MPHASIS), a unit of Hewlett-Packard Co., is an information technology services company based in Bangalore, India. MphasiS is certified to ISO 9001:2008 and ISO/IEC 27001:2005 and is assessed at CMMI v1.2 Level 5. It is the sixth largest IT company in India, with more than 38,000 employees as of 2010. The company has 29 offices in 14 countries, with delivery centers in India, Sri Lanka, China, North America and Europe. MphasiS provides information technology services to customers around the world through integrated solutions that include business process outsourcing, infrastructure technology and application services. The application services offered by the company include application development as well as application maintenance and support. The markets served by the company are financial services & insurance, healthcare, manufacturing, government, transportation, communications, and consumer & retail industries.

Applications
MphasiS is positioned to cater to the global demand for strategic outsourcing and has integrated changing environments into its Applications service portfolio. Its full range of application services helps clients invest more strategically in core business activities while improving ROI from tight IT budgets.

BPO
MphasiS, a leader in providing end-to-end Business Process Outsourcing services, provides high quality, value-added voice and transaction-based services to Fortune 500 companies worldwide. As an early entrant in the BPO space, the company has expertise in providing strategy, solutions and services to solve complex business issues and achieve results. By coupling its Business Process Outsourcing services with in-depth industry-specific knowledge, it provides business-focused solutions tailored to clients' strategic goals.

Infrastructure Services
MphasiS Infrastructure Services combine the HP pedigree with a skilled, scalable workforce and a firm focus on service delivery quality. Whether the client is looking to reduce costs, improve business-to-IT alignment, drive innovation or reduce risks, MphasiS offers a comprehensive suite of remote infrastructure management services. MphasiS helps optimize clients' infrastructure through a mature global delivery model, standardized processes, adoption of the IT Infrastructure Library (ITIL) framework, a global service network and a robust private network with built-in redundancies. This network enables secure connection with clients 24x7x365.

Payments Solutions
MphasiS offers a range of services in the payments arena, such as consulting, application development, modernization, integration and testing, to clients worldwide. The solutions from MphasiS for wholesale payments, retail payments, PCI DSS compliance and other value-added services are designed to help clients reduce processing costs, retain investments, streamline processes, transform applications, and identify and deploy new channels, thereby increasing their revenue streams.

Applications: Applications Development; Applications Management; Enterprise Applications Services; Business Practices; Technology Practices; Solutions
BPO: Industry Offerings; Service Offerings
Infrastructure Services: Remote Monitoring Center; Data Center Services; Workplace Services; Network Services; Security Services; Consulting Services
Payments Solutions: Retail & Wholesale Payments; PCI & DSS Compliance; Mobile Transaction Processing Solution; Value Added Services

Project Overview
Need for the project
Business continuity plans were not taken very seriously, and the disaster recovery strategies of many Mphasis BPO accounts were not defined with a proper rationale, because most of the accounts have not faced a disaster situation since their inception. This project aims to identify the possible risks in the disaster recovery strategies and to recommend improvements to the disaster recovery strategies of the four accounts.

Objectives of the project
- To identify the possible risks in the recovery strategies of the service desk business.
- To suggest corrective actions for the identified risks.
- To prioritize the identified risks.
- To analyze the recovery strategies of the four accounts of the company.

Key Deliverables
- List of the possible risks with corrective actions
- Prioritized list of the identified risks
- Risks in the current recovery strategies of the four accounts
- Recommendations to avoid the risks associated with the current recovery strategies of the four accounts

Literature Survey
BCP Planning Methodology

Building Disaster recovery capability in Service desk business


Building disaster recovery capabilities in Service Desk business involves the following steps:

Identification of critical requirements


Following are the critical requirements for the service desk business:
- Human resources: Since the business requires calls to be handled by agents, agents are the most critical requirement for the service desk business.
  o Number of agents to provide the expected/agreed level of service.
- Infrastructure requirements: Following is the supporting infrastructure required for an agent to perform the job:
  o Seats
  o Desktops and laptops
  o VOIP (Voice over Internet Protocol)
  o Network / LAN, power points
  o Landline, mobile, fax, printer
- Application requirements:
  o DW software: The software used to maintain the status of the agents and route calls to the agents.
  o EKMS: The Enterprise Knowledge Management System contains all the supporting documents agents need to find solutions to the problems faced by callers.

Defining Maximum Tolerable Period of Disruption


The Maximum Tolerable Period of Disruption (MTPoD) for the project/account is the duration of time after which an organization's viability will be irrevocably threatened if delivery of products/services cannot be resumed. It can either be defined in the contract or be calculated through business impact analysis.

Defining Recovery Time Objectives


The Recovery Time Objective (RTO) is the target period of time within which the project must resume operation, at a defined service level, so that irrevocable damage is not incurred. It should be defined together with the service level requirements, and it should not exceed the MTPoD.

Identification of Alternate Site


The alternate site should be chosen considering the availability of the required infrastructure and the time and cost of transportation.

Identification of resource and infrastructure requirements


Requirements of resources and infrastructure should be quantified according to the service level requirements.

BCM Response
Project BCP Team
The Project BCP team members required to respond to an incident and subsequently initiate recovery procedures are as follows:

Project/Account Owners

Project owners are the personnel responsible for all the decisions made regarding the project. These personnel are to be informed when the BCP is to be invoked, as they are to take the key decisions during the incident and the recovery phases.

Project/Account BCM Coordinator(s)

BCM Coordinators are the personnel responsible for authorizing the activities, communications and response teams that may be necessary for supporting the incidents, in consultation with the IMT and the Project Owners. They aid the Project Owners in determining whether a disaster should be declared and which teams should be mobilized. They also report progress and problems as required.

Recovery Team

The Recovery Team consists of resources that are critical to the delivery of the project, including team members, team leads and project managers. Their role is to initiate the recovery of the critical activities identified during the Business Impact Analysis (BIA). The team's contact details are summarized in Table 1, and the full call tree is given in Table 2: Recovery Team Call Tree.
Project Team Detail

Role: Project Owner
  Name: Name of the project owner
  Contact Details: Office: Contact No.; Home: Contact No.; Cell Phone: Contact No.
  Backup Team Member: Name of the backup team person
  Backup Contact Details: Office: Contact No.; Home: Contact No.; Cell Phone: Contact No.

Role: BCMS Coordinator
  Name: Name of BCMS Coordinator
  Contact Details: Office: Contact No.; Home: Contact No.; Cell Phone: Contact No.
  Backup Team Member: Name of the backup team person
  Backup Contact Details: Office: Contact No.; Home: Contact No.; Cell Phone: Contact No.

Role: Recovery Team
  Refer to Table 2: Recovery Team Call Tree for more details

Table 1: Project/Account BCP Team Contact Details

Recovery Team Call Tree


Recovery Function: Project/Account Delivery Head
  Name of Individual: Name of Project/Account Delivery Head
  Contact Numbers: Home: Contact No.; Office: Contact No.; Cell: Contact No.
  Alternate Name: Name of Project/Account Delivery Head Delegate
  Alternate Contact Details: Home: Contact No.; Office: Contact No.; Cell: Contact No.

Recovery Function: Project/Account Delivery Manager
  Name of Individual: Name of Project/Account Delivery Manager
  Contact Numbers: Home: Contact No.; Office: Contact No.; Cell: Contact No.
  Alternate Name: Name of Project/Account Delivery Manager Delegate
  Alternate Contact Details: Home: Contact No.; Office: Contact No.; Cell: Contact No.

Recovery Function: BCMS Coordinator
  Name of Individual: Name of BCMS Coordinator
  Contact Numbers: Home: Contact No.; Office: Contact No.; Cell: Contact No.
  Alternate Name: Name of Alternate BCMS Coordinator
  Alternate Contact Details: Home: Contact No.; Office: Contact No.; Cell: Contact No.

Recovery Function: Member 1
  Name of Individual: Name of project member

Recovery Function: Member 2
  Name of Individual: Name of project member

Table 2: Recovery Team Call Tree

Notification and Escalation Process


In the event of a disaster, the steps to be followed from initial notification to the recovery of the identified business processes are detailed below in Figure 1:

Figure 1: Notification and Escalation Process

BCP Invocation Plan


Invoking this plan implies that a recovery operation has begun and will continue with top priority until workable recovery support has been established. The critical systems to be recovered will depend on Maximum Tolerable Period of Disruption (MTPoD) defined by the project/account team. The specific project BCP will be invoked by the Project Manager. The recovery team will be mobilized using the call tree presented in Table 2: Recovery Team Call Tree. The recovery teams will meet at the alternate location as described in the Detailed Recovery Procedures and Requirements.


Incident Management Plan


Each project or account is associated with a facility. Each facility owns a site level Incident Management Plan.

Training and testing


Training and testing of the business continuity management procedures should be carried out at regular intervals so that everyone involved in the recovery procedure knows their roles and responsibilities.

Maintenance and updating


BCM (business continuity management) procedures and the BCP should be updated regularly based on the feedback from testing. The BCP also needs to be updated whenever the organization changes, for example when contact details, service level agreements or call volumes change.

Identification and prioritization of possible risks in Disaster Recovery strategy


Possible flaws or risks in the recovery strategy were identified, and they were prioritized based on the responses of the operations managers.

Risk Identification
Following major risks with the recovery strategy were identified:

MTPoD is not defined


If the MTPoD is not defined, there is no target time for recovery, and the company may exceed the time after which the organization's viability will be irrevocably threatened. This may lead to loss of business and loss of goodwill.
Corrective action: The MTPoD is either defined in the service level agreement and contract, or it should be calculated using business impact analysis. In the former case, the MTPoD is agreed upon with the customer and defined in the SLA. In the latter case, it is calculated by comparing the revenue loss or penalty with the expected profit from the account.

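The operational impact is assessed for each of the following disruption periods: 0-4 hours; 4 hours - 24 hours; 1 day - 3 days; 3 days - 7 days; 1 week - 2 weeks; 2 weeks and above.

For each period the following factors are recorded:
- Number of days considered (for adjusting 24x7 ops), e.g. 1, 10, 15
- Number of billable hours lost
- Primary operational impact
- Any other financial impact
- Total operational impact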

This operational impact could be compared with the expected profit from the account or the profit above minimum profit margin required. MTPoD could be defined for each service separately.
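The comparison described above can be sketched as follows. This is only an illustrative sketch: it assumes the total operational impact has already been estimated for each disruption period, and it treats the first period whose impact reaches the expected profit as the period containing the MTPoD. The function name `estimate_mtpod` and all monetary figures are hypothetical and are not taken from any account.

```python
from typing import Optional, Sequence, Tuple

# Disruption periods used in the business impact analysis, shortest first.
PERIODS = [
    "0-4 hours",
    "4 hours - 24 hours",
    "1 day - 3 days",
    "3 days - 7 days",
    "1 week - 2 weeks",
    "2 weeks and above",
]


def estimate_mtpod(
    impacts: Sequence[Tuple[str, float]], expected_profit: float
) -> Optional[str]:
    """Return the first period whose total operational impact reaches the
    expected profit from the account, or None if no period does.

    Assumption: `impacts` is ordered from the shortest to the longest period.
    """
    for period, total_impact in impacts:
        if total_impact >= expected_profit:
            return period
    return None


# Hypothetical total operational impact per period (any currency unit).
sample_impacts = list(zip(PERIODS, [5, 30, 90, 210, 420, 900]))
mtpod_period = estimate_mtpod(sample_impacts, expected_profit=200)
print(mtpod_period)
```

The same function could be called once per service if the MTPoD is to be defined for each service separately.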

RTO is not defined


If the RTO is not defined, the recovery process could exceed the MTPoD, which could result in irrevocable loss. The RTO should be defined together with the service level requirements to be resumed.
Corrective action: To determine the time to resume an activity, some thought needs to be given to defining the level of performance at resumption (e.g. number of personnel, manufacturing throughput, invoices produced) and determining the time required to return to normal levels of operation. It is believed that the standard is not looking for a scientific calculation of the latter (BCM is not a science) but just for an indication from someone who understands the activity. The RTO should generally be less than 50% of the MTPoD.
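The two checks on the RTO (it must not exceed the MTPoD, and it should generally be less than 50% of it) can be expressed as a small helper. This is a sketch only: the function name `check_rto` and the `max_fraction` parameter are hypothetical, and the 50% threshold is the rule of thumb stated above rather than a fixed requirement.

```python
def check_rto(rto_hours: float, mtpod_hours: float, max_fraction: float = 0.5) -> dict:
    """Compare an RTO with the MTPoD.

    Returns the RTO as a fraction of the MTPoD, whether the RTO stays
    within the MTPoD, and whether it is below the rule-of-thumb fraction.
    """
    if mtpod_hours <= 0:
        raise ValueError("MTPoD must be a positive number of hours")
    fraction = rto_hours / mtpod_hours
    return {
        "fraction": fraction,
        "within_mtpod": rto_hours <= mtpod_hours,
        "within_guideline": fraction < max_fraction,
    }


# Example: an RTO of 90 hours against an MTPoD of 120 hours.
result = check_rto(rto_hours=90, mtpod_hours=120)
print(result)
```

In this example the RTO is within the MTPoD but, at 75% of it, does not satisfy the 50% guideline.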

Resource requirements are not quantified


If resource requirements are not quantified, there will be confusion at the time of disaster because no clear requirement is defined for the number of agents that need to be present. It would also be difficult to identify the infrastructure requirements, as these depend on the number of agents. If infrastructure requirements are not clearly defined, requests may be raised for more or less than what is actually required.
Corrective action: The number of agents required to meet the service level requirements defined in the RTO should be clearly defined. It can be derived from the peak level requirement and the service level of the RTO. For example, if a 100% service level is required and the peak level requirement is 45 agents, then 45 seats need to be booked. If a 33% service level is required, then 15 seats need to be reserved, and this should be scaled up to 45 within the MTPoD.

Seats required = (No. of agents in peak hours) x (service level requirement)

One or two extra seats may need to be added for the operations manager and the assistant operations manager. Other infrastructure requirements can be derived from the number of agents as follows:
Desktops = No. of agents
Landlines = 1 or 2 (for the operations manager and the assistant operations manager)
Printers = 1
No. of ports = Total number of seats
VOIP = Total number of seats
LAN = Total number of seats
Number of power points = 2 x (number of agents) + 2 (each agent requires two power points, while a manager requires only one)
Applications: Internet Explorer, DW, Avaya Interaction Center, EKMS
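The formulas above can be collected into a small calculator. This is a sketch under stated assumptions: fractional seat counts are rounded up, the number of manager seats defaults to two, and the power point formula is generalized to two per agent plus one per manager (which reduces to 2 x agents + 2 for two managers). The names `RecoveryRequirements` and `recovery_requirements` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RecoveryRequirements:
    agent_seats: int
    total_seats: int
    desktops: int
    landlines: int
    printers: int
    network_ports: int
    voip: int
    lan: int
    power_points: int


def recovery_requirements(
    peak_agents: int, service_level: float, managers: int = 2
) -> RecoveryRequirements:
    """Derive seat and infrastructure counts from the peak-hour agent count
    and the service level (a fraction between 0 and 1) to be met at RTO."""
    if not 0 < service_level <= 1:
        raise ValueError("service_level must be in the range (0, 1]")
    # Round before taking the ceiling to avoid floating point artefacts.
    agent_seats = math.ceil(round(peak_agents * service_level, 6))
    total_seats = agent_seats + managers
    return RecoveryRequirements(
        agent_seats=agent_seats,
        total_seats=total_seats,
        desktops=agent_seats,
        landlines=managers,
        printers=1,
        network_ports=total_seats,
        voip=total_seats,
        lan=total_seats,
        # Two power points per agent, one per manager.
        power_points=2 * agent_seats + managers,
    )


full = recovery_requirements(peak_agents=45, service_level=1.0)
partial = recovery_requirements(peak_agents=45, service_level=0.33)
print(full.agent_seats, partial.agent_seats)
```

With the figures from the example above, the full service level gives 45 agent seats and the 33% service level gives 15.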

Alternate site is quite far


If the alternate site is far away, travelling to it would take a lot of time or transportation would be quite expensive. This could lead to the RTO not being met or to high expenditure on transportation. If the alternate site is in a different country, it would be difficult or quite expensive to have enough resources there.
Corrective action: The recovery site should be reachable within the RTO so that operations can be resumed, and transportation to it should not be very expensive. Facilities such as the Mphasis network should be available at the alternate site. As Mphasis has offices in Mumbai and Pune, they can act as alternate sites for each other.
Mumbai-Pune:
Distance = 160 km
Travel time by road = 3 to 4 hrs
Travel time by train = 3 hrs

Alternate site is in the same city


Natural or civil disasters usually affect the whole city, so an alternate site in the same city is likely to be affected by the same disaster.
Corrective action: The alternate site should be located in a different city.

Alternate contact is not defined in the BCP team


If the primary contact is not available at the time of disaster for any reason, it would lead to a lot of chaos and mismanagement and may cause the disaster recovery to fail completely.
Corrective action: Every BCP team member should have an alternate contact. This is crucial and cannot be neglected.

Unavailability of BCP copies at the time of disaster


If BCP copies are not accessible at the time of disaster, it would be difficult to carry out the disaster recovery procedure.
Corrective action: Usually a soft copy of the BCP is kept at a shared location, and local copies are kept by recovery team members on their laptops. A hard copy is also kept at the recovery location. In addition, hard copies could be kept at the residences of the recovery team members.

Unawareness and inadequate training of BCP


If team members are not aware of the recovery process or have not practiced the BCP procedure, they may find it difficult to execute it at the time of disaster.
Corrective action: The BCP procedures and the escalation process should be exercised regularly. Currently they are practiced annually or semi-annually. Quarterly tests could be conducted to assess the understanding of the disaster recovery procedures. If a member's test results are not satisfactory, training needs to be given to that member.

BCP is not updated with latest contacts


If the BCP is not updated with the latest contacts, it would be difficult to contact the BCP team members, and time would be lost in finding the latest contact details.
Corrective action: Whenever a team member leaves or joins the company, the contact details should be updated. Whenever the contact information of a member changes, the BCP needs to be updated with the latest contact information.

Availability of infrastructure requirements at alternate site is not verified


If the availability of all the infrastructure requirements is not verified, the service level requirements may not be met. The following could be the causes:
- Network availability: Availability of the Mphasis network is the most crucial requirement. If it is not available, no calls can be received.
- Desktops, VOIP, power points, etc.: All these are crucial for an agent to handle a call.
- Applications (DW, EKMS): DW is necessary for routing the calls, and EKMS is necessary for the agents to find solutions to the customers' problems.
Corrective action: Network and application availability needs to be verified. The assignment of seats and infrastructure requirements needs to be verified with the chief risk officer at the alternate site.


Risk Prioritization
Operations managers and assistant operations managers were asked to rate each risk as high, medium or low, based on the following descriptions:
- High risk: Risks that could lead to failure of recovery or to the MTPoD being exceeded. These risks could lead to serious business loss.
- Medium risk: Risks that cause large delays and lead to the RTO, and sometimes the MTPoD, being exceeded. These could cause business loss and could affect the service levels.
- Low risk: Risks that could cause operational problems and delays in meeting the RTO. Business is not affected, but recovery could be delayed or interrupted.

The following values were assigned to the three levels:
High: 3
Medium: 2
Low: 1

The average is calculated for each risk, and the risk is categorized as follows:
1 to 1.5: Low
1.5 to 2.5: Medium
2.5 to 3: High
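The scoring scheme above can be sketched in a few lines. The ranges overlap at 1.5 and 2.5, so this sketch assumes that a boundary value falls into the higher category; that choice, the function names and the sample responses are illustrative assumptions rather than part of the survey.

```python
RISK_VALUES = {"High": 3, "Medium": 2, "Low": 1}


def average_score(responses):
    """Average the numeric values of a list of High/Medium/Low responses."""
    return sum(RISK_VALUES[r] for r in responses) / len(responses)


def categorize(average: float) -> str:
    """Map an average score to a risk category.

    Assumption: boundary values (1.5 and 2.5) go to the higher category.
    """
    if not 1 <= average <= 3:
        raise ValueError("average must lie between 1 and 3")
    if average < 1.5:
        return "Low"
    if average < 2.5:
        return "Medium"
    return "High"


# Hypothetical responses from eight respondents for one risk.
responses = ["High"] * 7 + ["Medium"]
score = average_score(responses)
print(score, categorize(score))
```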

Sl. No. | Risks | Average (of R1 to R8) | Risk category
1 | MTPoD is not defined | 2.9 | High
2 | RTO is not defined | 2.3 | Medium
3 | Resource requirements are not quantified | 2.4 | Medium
4 | Alternate site is quite far | 1.4 | Low
5 | Alternate site is in the same city | 1.6 | Medium
6 | Alternate contact is not defined in the BCP team | 1.9 | Medium
7 | Unavailability of BCP copies at the time of disaster | 1.8 | Medium
8 | Unawareness and inadequate training of BCP | 2.4 | Medium
9 | BCP is not updated with latest contacts | 1.6 | Medium
10 | Availability of infrastructure requirements at alternate site is not verified | 2.8 | High

The following prioritized list was created from the average scores:
1) MTPoD is not defined
2) Availability of infrastructure requirements at alternate site is not verified
3) Resource requirements are not quantified
4) Unawareness and inadequate training of BCP
5) RTO is not defined
6) Alternate contact is not defined in the BCP team
7) Unavailability of BCP copies at the time of disaster
8) Alternate site is in the same city
9) BCP is not updated with latest contacts
10) Alternate site is quite far
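The ordering above can be reproduced by sorting the risks on their average scores in descending order. The sketch below uses the averages reported in the risk table; keeping tied risks in their original table order is an assumption, although it matches the list above.

```python
# (risk, average score) in the order in which the risks were identified.
risk_averages = [
    ("MTPoD is not defined", 2.9),
    ("RTO is not defined", 2.3),
    ("Resource requirements are not quantified", 2.4),
    ("Alternate site is quite far", 1.4),
    ("Alternate site is in the same city", 1.6),
    ("Alternate contact is not defined in the BCP team", 1.9),
    ("Unavailability of BCP copies at the time of disaster", 1.8),
    ("Unawareness and inadequate training of BCP", 2.4),
    ("BCP is not updated with latest contacts", 1.6),
    ("Availability of infrastructure requirements at alternate site is not verified", 2.8),
]

# Python's sort is stable, so risks with equal averages keep their table order.
ranked = sorted(risk_averages, key=lambda item: -item[1])

for position, (risk, average) in enumerate(ranked, start=1):
    print(f"{position}) {risk} ({average})")
```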

Analysis of current recovery strategies


Four accounts were analyzed to find the risks in their current recovery strategies, and corrective actions were suggested for each account.

Account A
Account Details:
- Number of agents = 96 approx.
- Approx. no. of calls = 22,000 per month
- Contract is with HP, and HP deals with the customer

Current recovery strategy:
- For US/Canada and EMEA, the alternate recovery site is Tower 4, Magarpatta, Pune.
- The strategy is to have the calls routed through a DR skill to a specific, defined set of employee IDs.
- 3 agents would create tickets with the customer details and problems.
- Tickets would be sent to the Mumbai site when it is up.

Risks with the strategy:
- No business impact analysis has been done, and the MTPoD and RTO are not defined.
- There are no plans for a disaster that affects the functioning of the Mumbai site for more than 24 hrs.
- The plan has been left to discussion at the time of disaster.
- Three agents would not be enough to handle peak hour volumes, which require around 40 agents.

Recommendations:
- Business impact analysis must be done to define the MTPoD and RTO, or they should be agreed upon with HP.
- The alternate site and recovery requirements should be set based on the RTO and the service level agreements.

Account B
Account Details:
- Number of agents = 102 approx.
- Approx. no. of calls = 24,000 per month
- New account, started in Oct 2010
- BCP is yet to be signed off

Current recovery strategy:
- MTPoD is 120 hrs according to the contract
- RTO is set as 90 hrs
- 30% of services need to be rendered within 24 hrs
- Alternate site 1: Tower CC4, Magarpatta, Pune
- Alternate site 2: Infinity Park, Mumbai
- Number of seats required = 30
- LAN (30), VOIP (30), power points (58), laptops (2), printer (1), etc. are required

Risks with current strategy:
- The RTO of 90 hrs is not defined with the service level requirements. At 90 hrs, the RTO is 75% of the MTPoD, which is quite large.
- There is no rationale behind the requirement of 30 seats. It would not be possible to handle the peak hour call volume, which requires 45 agents.

Recommendations:
- The RTO must be defined with the service level requirements. The RTO is generally less than 50% of the MTPoD.
- 45 seats are required if a 100% service level needs to be met, because the peak hour requirement is 45 agents.
- If 30 seats are used, the service level that can be met should be defined and agreed upon.

Account C
Account Details:
- Number of agents = 125 approx.
- Approx. no. of calls = 27,000 per month
- BCP is yet to be signed off although the account is quite old

Current recovery strategy:
- RTO = 9 hrs
- Calls are transferred to HP, Budapest, and HP handles the calls

Risks with current strategy:
- MTPoD is not defined
- Alternate site is not defined in India
- Recovery requirements have not been identified

Recommendations:
- BIA must be done to define the MTPoD
- The alternate site could be Mumbai. Mangalore agents, who handle only mails and chats, can be trained to manage voice calls in case of disaster
- Service level should be defined with the RTO

Account D
Account Details:
- Number of agents = 66 approx.
- Approx. no. of calls = 19,000 per month
- Standard format is not used for the BCP

Current recovery strategy:
- Calls will be routed to Pune EON Kharadi, and as many agents as possible will be moved to Pune EON Kharadi.
- Critical Service Desk support is to be provided within 1 hour
- Basic Service Desk services would be available within 4 hours

Risks with current strategy:
- MTPoD is not defined
- Alternate site is in the same city
- Recovery requirements have not been identified

Recommendations:
- BIA must be done to define the MTPoD
- The alternate site could be Mumbai
- Requirements should be defined


Miscellaneous
The following observations are common to all the accounts:
- When a recovery site serves as the alternate site for several accounts located at the same primary site, the total number of seats required by those accounts should be compared with the seats available at the recovery site. If there is a significant difference between the requirement and the availability, the alternate sites should be adjusted.
- Tests of BCP awareness should be conducted quarterly, and training should be conducted if the results are not satisfactory.
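The seat comparison in the first observation can be sketched as follows. The function name `seat_shortfalls`, the account and site names, and all seat counts are hypothetical; the sketch only illustrates summing the seats required per alternate site and comparing the total with the seats available there.

```python
from collections import defaultdict


def seat_shortfalls(requirements, available):
    """Return {site: shortfall} for sites where total demand exceeds supply.

    requirements: iterable of (account, alternate_site, seats_required)
    available: dict mapping alternate_site to the number of seats available
    """
    demand = defaultdict(int)
    for _account, site, seats in requirements:
        demand[site] += seats
    return {
        site: needed - available.get(site, 0)
        for site, needed in demand.items()
        if needed > available.get(site, 0)
    }


# Hypothetical accounts sharing alternate sites.
requirements = [
    ("Account X", "Site P", 30),
    ("Account Y", "Site P", 45),
    ("Account Z", "Site Q", 20),
]
available = {"Site P": 60, "Site Q": 25}
shortfalls = seat_shortfalls(requirements, available)
print(shortfalls)
```

A non-empty result indicates the sites for which the alternate site assignments may need to be adjusted.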
