
THE CLARIION SSE HANDBOOK

This document is current as of 12/30/04. Every effort has been made to ensure the accuracy of all the items in this document. However, if you come across information that is incomplete or incorrect, please contact this handbook's current author, David Yellope (yellope_david@emc.com), to get this book updated.

Table of Contents
Section I: Basic Case Handling Procedures
1.1 Reviewing Clarify Case Text
1.2 Dialing into CLARiiON Remotely
1.3 Setting Up a WebEx Session
1.4 Finding Correct Array in Navisphere
1.5 Array Health Checklist

Section II: Escalation Requirements
2.1 Time Frame for DU/DL Severity 1 Cases
2.2 Basic Requirements for CLARiiON TS2

Section III: Case Movement
3.1 Turning over cases for weekend or vacation
3.2 Turning over cases at end of shift
3.3 Warm Transfers/Consulting with other Groups
3.4 Dispatching case to Unisys-Supported sites
3.5 Engaging the TSG Implementation Support Group

Section IV: Types of Cases
4.1 Switch Zoning Issues
4.2 Host Issues
4.3 Root Cause Analysis (RCA) Cases
4.4 Performance Request Cases
4.5 CLARiiON Disk Library (CDL) Cases
4.6 VMWare/Legato cases
4.7 AX series Cases
4.8 Host Implementation Cases

Section V: Support Lab Procedures
5.1 Receiving Information from the Field
5.2 If you cannot make a scheduled shift

Section VI: Updates and Upcoming Changes
6.1 Latest Code Levels
6.2 Updates to Error Codes in Latest Release
6.3 Upcoming Changes in Procedure

Section VII: Useful Tools for Troubleshooting

Section I: Basic Case Handling Procedures


1.1 Reviewing Clarify Case Text
Please review all text in a Clarify case before dialing into a CLARiiON array or calling a customer. This prevents duplicating prior efforts and gives you an update on the status of the array. There are also useful tools for determining the best time to call the customer (if this is not already noted in the case). Check the Prod/Contr. tab; at the bottom of that screen, there's a field that notes the customer's local time. If that field is blank, you can also check the site/contact tab to determine the customer's location and make an educated guess as to the customer's local time.

1.2 Dialing into a CLARiiON Remotely


When it becomes necessary to dial into a remote array, click the three dots next to the site name. In the sub-window that opens, click the modem button. This brings up a screen listing all the modems linked to arrays at this site. Find the modem entry for the array listed in the case (sometimes you will have to make an educated guess about which array it is from the information in the case). Double-click that entry to open a screen with all the dial-in details for that array. Sometimes the information is incomplete or missing, but there are some standard values that can be tried in its place. Look for the following fields (going from the top of the modem screen to the bottom):

Connection Number: The phone number to dial to reach the remote array. Usually you will copy that phone number (usually with a 1 in front for long distance) straight into the modem field in EMCRemote.

Login: The login to use in EMCRemote. Enter it in the USER field. This field may be blank; if so, leave it blank in EMCRemote when attempting to dial in to the remote array.

Password: The EMCRemote password. Copy and paste it into the Password field in EMCRemote. You need a password to attempt to dial out to a remote CLARiiON array; if this field is blank, try the password RAII (without quotes).

Dial in Conditions: This field tells you whether or not we can dial in to this box remotely. For CLARiiON cases it can be set to one of the following:

    Unrestricted Access: The customer places no restrictions on dial in. We can dial in as necessary to troubleshoot array issues.

    Customer Permission Req: The customer requires that we get their permission before we dial into the box. Usually, we call the contact person for the case or the Site Contact (if different) and request permission to dial in. They may need to enable dial in, for example by connecting a modem or setting their copy of EMCRemote to accept incoming calls.

    No Dial in Allowed: The customer does not allow remote modem access to the CLARiiON. This usually means that either there is no modem hooked up to the CLARiiON, or the customer has restricted access to it due to site security processes. You will have to work with the customer over the phone, dispatch a field engineer, or ask the customer for a WebEx session (see section 1.3).

Notes Field: This is usually where information pertaining to the workstation and Navisphere is kept, including anything needed to unlock the Windows workstation and Navisphere. The standard login/password for workstations and Navisphere (if there is no information already in Clarify) is Administrator/password (or admin/password). The best way to gather information that is not in Clarify is to contact the customer directly and request it. If the customer provides this information, please make sure to update the modem screen with it. If the customer cannot provide it, you can also dispatch a field engineer to gather the information and update Clarify.

1.3 Setting up a WebEx session


If we are not able to dial into a remote array, or in the interests of working a case faster, we can set up a WebEx remote session. This is faster than remote dial-up, as it works over the Web via Internet Explorer. It is very useful if you need to gather large files (such as dump files after an SP reboot). You will need someone at the remote end to set up the connection. This can be the customer, or you can do it yourself if you are already dialed in to the array. Set up a session using WebEx (note the session number, you will need it), and on the remote CLARiiON workstation, open an Internet Explorer window and go to http://emcsupport.webex.com

Once there, choose "Join a support session" and have the customer enter (or enter yourself) the information needed to join a session (session #, remote person's name, email address and company) and click "Submit". That will link the remote workstation to the WebEx session you created. From there, click "Take Control" under the Desktop menu. (The remote workstation will need to grant you permission for this action.)

Once you control the workstation, you can work directly on the remote computer via the WebEx window. If you are dialed in remotely, you can disconnect at this point.

1.4 Finding correct array in Navisphere


The customer could have multiple arrays linked to the remote workstation, making it necessary to open Navisphere and confirm that you are working with the correct array. Usually, the correct IP for Navisphere is listed in the notes section of Navisphere. If that information is not available, we can hunt through Internet Explorer on the remote workstation for the correct IP address. Usually (but not always), this will be an all-numeric IP pointing at an internal network address. The address may also end in /start.html or /setup. Most arrays are listed in Navisphere at least by the last four digits of the CLARiiON serial #. If there is no identifying information in Navisphere, you can still get the serial # for the array by right-clicking the array name in Navisphere and choosing "Properties".

1.5 Array Health Checklist


A) Check that there are no "F" icons on the array (an "F" icon indicates that a physical part of the array has faulted).
B) Under the Tools menu at the top of the screen, choose Faults. If there are no hardware or software faults at the current time, it should read "The array is operating normally".
C) Next, click the Tools pull-down menu, select Trespassed LUNs, and check for LUNs that have failed over from one SP to the other. If there are no failed-over LUNs, move on to the next step.
D) If you have not found anything wrong at this point, look in the SP logs (right-click the SP in Navisphere and choose View Events) for suspicious error messages, for example SP panic, SP removed/inserted, or disk drive removed/inserted.
E) If you find a suspicious error message and do not know what it is, try looking it up in the Primus knowledge database. Make sure to note what you found in the Clarify case and link the Primus article if available.
F) If you cannot find your answer in the Primus knowledge database, consult a senior SSE or your shift lead before going to T2.

Section II: Escalation Requirements


2.1 Time Frame for DU/DL Severity 1 Cases
DU/DL cases are extremely critical issues, as the customer has lost data or access to their data (possible causes include a double-faulted RAID group, dirty cache, or both SPs being down). It is imperative that the case be moved up the line as quickly as possible.

From the time the case enters Level 1 Support, Level 1 has thirty minutes to either resolve the situation or escalate to Level 2. With all DU/DL cases, it is much preferred to have the SSE work with the customer directly, be it over the phone, via remote dial-up, or via a WebEx session, since it can take a CE hours to get on site and gather information that we could have collected remotely. While the case itself must be moved to Level 2 within 30 minutes, the original SSE can continue to work with the customer (for example, gathering information for Level 2, or trying alternative methods to solve the customer's issues) while consulting with TS2. CLARiiON Level 1 support can continue working on the case for up to one hour after escalation while aiding in the proper direction of the case. If the issue has reached the 1:30 mark since opening without resolution, the SSE should disengage from the case if possible.
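The time limits above can be sketched as a small Python helper. This is only an illustration, not a tool used by the Support Lab: the function names are hypothetical, and it assumes the 1:30 mark is measured from the moment the case enters Level 1.

```python
from datetime import datetime, timedelta

ESCALATE_BY = timedelta(minutes=30)   # Level 1 must resolve or escalate by here
DISENGAGE_BY = timedelta(minutes=90)  # the 1:30 mark (assumed to start at Level 1 entry)


def du_dl_deadlines(entered_level1: datetime) -> dict:
    """Return the two clock times that matter for a DU/DL Severity 1 case."""
    return {
        "escalate_by": entered_level1 + ESCALATE_BY,
        "disengage_by": entered_level1 + DISENGAGE_BY,
    }


def du_dl_action(entered_level1: datetime, now: datetime, escalated: bool) -> str:
    """Suggest what the Level 1 SSE should be doing at time `now`."""
    elapsed = now - entered_level1
    if elapsed >= DISENGAGE_BY:
        return "disengage if possible"
    if not escalated and elapsed >= ESCALATE_BY:
        return "escalate to Level 2 now"
    if escalated:
        return "assist TS2"
    return "work case; escalate by deadline"
```

For a case that entered Level 1 at 9:00, `du_dl_deadlines` gives 9:30 as the escalation deadline and 10:30 as the point at which the SSE should disengage.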

2.2 Basic requirements for CLARiiON TS2


Before escalating a case to Level 2 support, all appropriate logs must be available and readable. TS2 will NOT work a case if there is no information for them to work with. This includes SP Collects, Host grabs (for host cases), dump files (for bug checks), etcetera. Only when we have all this information can we escalate the case to Level 2. Once we have this information, dial x.70450, press 1 for Platforms, and 2 for CLARiiON TS2. You will be transferred either to a CLARiiON TS2 person, or to a mod/cell phone of management. Explain that you are escalating a case to Level 2, and why. They will ask for the case # and any information you may have already. Once they say it's OK to escalate the case, place a note in the case stating who you spoke with in Level 2 support and that they OK'd transferring the case to Level 2. Use the Escalation AIM key (Ctrl-Shift-E) and enter all information on the form. All applicable information must also be entered on the environment screen prior to escalation. Only then can you transfer the case to the CLARiiON TS2 queue. As a generally accepted rule, once a case is in Level 2 support, it should NOT come back to Level 1 to be worked further.
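As a rough illustration of the "all logs available" rule, the pre-escalation check can be sketched in Python. The names below are hypothetical, and treating SP Collects as required for every case type is an assumption; the text lists them as examples rather than as a complete rule.

```python
# Assumption: SP Collects are needed for every escalation; the handbook
# lists them first but does not say so explicitly.
REQUIRED_ALWAYS = {"sp_collects"}

# Extra artifacts named in the text for specific case types.
REQUIRED_BY_CASE_TYPE = {
    "host": {"host_grabs"},
    "bug_check": {"dump_files"},
}


def missing_for_escalation(case_type, available):
    """Return the artifacts still missing before the case can go to TS2."""
    required = REQUIRED_ALWAYS | REQUIRED_BY_CASE_TYPE.get(case_type, set())
    return sorted(required - set(available))
```

An empty result means the logs side of the checklist is satisfied; the phone call to TS2 and the escalation form are still required.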

Section III: Case Movement


3.1 Turning over cases for weekend or vacation
We have several queues (based on customer location) for turning over cases. Cases should be placed in the appropriate queue when more work needs to happen on the case and the SST or SSE who owns the case will not be in for several days (vacation, comp time, weekends or scheduled time off).

A) Change the title of the case (you must own the case to change the title), adding one of the notations below to reflect the current status of the case.

CCB (Date): Customer Callback. Call the customer on the date specified. Usually the date will be the next business day, but this can also be used when a customer is out of the office for several days and requests a call back at another time.

CE Call Back (Date): Similar to Customer Callback, but for contacting field engineers.

LFR: Logs For Review. We have received logs for review, but were unable to get them reviewed by the end of the current shift. US Note: We do NOT turn over Eastern US logs cases to the Australian shift, instead handing them out first thing the next morning.

WFL: Waiting for Logs. We have requested some kind of documentation on the error (be it SP Collects, Host grabs, dump files, etcetera). Every day or two, the case must be touched by an SST or an SSE to determine whether the requested logs have been uploaded. Once the information has been confirmed, the case should be handed to the SST who hands out cases, and priority on the case should go to the person who originally requested the logs.

NRC (Date): No Response, closing on (Date). We have attempted to reach the contact several times on this case. If a week has gone by without the customer responding to our attempts to contact them, the SST/SSE will make a final call to the customer and inform them that if we do not hear back within 24 hours, we will close out the call (Ctrl-Shift-S, C, G for case text in AIM Keys). They then put the case into the appropriate turnover queue with a title of NRC and tomorrow's date. Any NRC cases in the turnover queue with today's date in the title should be reviewed to determine whether the customer has gotten back in touch with the support lab and, if not, closed out for no response.

B) Make a note in the case about the current status of the case and the customer's current expectations (if known).

C) Place the case in one of the following queues (based on the location of the customer):

CLARIION TS1 AMER TURNOV: Place calls in this queue if the customer is in North, Central, or South America (examples: USA, Mexico, Brazil).

CLARIION TS1 APAC TURNOV: Place calls in this queue if the customer is in Asia or the Pacific (examples: Japan, China, Australia).

CLARIION TS1 EMEA TURNOV: Place calls in this queue if the customer is in Europe, the Middle East, or Africa (examples: United Kingdom, Germany, Israel, South Africa).
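The queue choice and title notation above can be sketched as a lookup in Python. This is only an illustration: the function names are hypothetical, the country table contains just the examples given in this section, and the MM/DD/YY date format is an assumption (the handbook does not specify one).

```python
from datetime import date

TURNOVER_QUEUES = {
    "AMER": "CLARIION TS1 AMER TURNOV",
    "APAC": "CLARIION TS1 APAC TURNOV",
    "EMEA": "CLARIION TS1 EMEA TURNOV",
}

# Only the example countries from the text; extend as needed.
COUNTRY_REGION = {
    "USA": "AMER", "Mexico": "AMER", "Brazil": "AMER",
    "Japan": "APAC", "China": "APAC", "Australia": "APAC",
    "United Kingdom": "EMEA", "Germany": "EMEA",
    "Israel": "EMEA", "South Africa": "EMEA",
}

# Notations that carry a date in the case title.
DATED_CODES = {"CCB", "CE Call Back", "NRC"}


def turnover_queue(country):
    """Return the turnover queue name for the customer's country."""
    return TURNOVER_QUEUES[COUNTRY_REGION[country]]


def status_notation(code, when=None):
    """Build the title notation, e.g. 'NRC (01/04/05)' or 'WFL'."""
    if code in DATED_CODES:
        if when is None:
            raise ValueError(f"{code} needs a date")
        return f"{code} ({when.strftime('%m/%d/%y')})"
    return code
```

For example, a no-response case for a customer in Japan that is due to close on January 4, 2005 would be titled with `NRC (01/04/05)` and placed in `CLARIION TS1 APAC TURNOV`.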

3.2 Turning over cases at end of shift


If you are unable to finish a case by the end of your shift, you can turn the case over to the oncoming shift, if necessary. The SST who is handing out cases (Book Duty) also covers turnover cases. Make sure the book SST is aware of the case at least half an hour before the end of your shift, and make sure your notes in the case are clear and understandable. At approximately half an hour before the end of the shift (for example, 5:30 PM here in the US), the book SST will make a final check with the SSE for turnover and, using the appropriate form, send an email to the oncoming shift to inform them of the turnover cases. If a case is in the field or in the Unisys-Dispatch queue, it should be pulled to either the CLARiiON TS1 VIC queue or the appropriate turnover queue (AMER for America (if logs), EMEA for Europe, Middle East or Africa, and APAC for Asia and the Pacific). If it is in the VIC queue, please leave it there. If a case cannot be worked by the oncoming shift, it should be turned back over to the appropriate queue with a note explaining why it could not be worked.

3.3 Warm Transfers/Consulting with other Groups


Sometimes, when working a case, it becomes necessary to consult with people from another group or, if a case does not involve a problem on the CLARiiON, to move the case to another queue. Before dispatching/forwarding to another queue, somebody from that queue MUST be aware that you are transferring the case to their queue. Dial extension 70900 and follow the prompts as follows.

A: Press 1 for Platforms, or 2 for Solutions.
A1: Under the Platforms menu, press 1 for Symmetrix, 2 for CLARiiON (this is us), or 3 for Centera.
A2: Under the Solutions menu, press 1 for Info Safety, 2 for Intelligent Supervision, 3 for Infrastructure Services, 4 for OSAPI UNIX, 5 for OSAPI Windows, 6 for Networks Storage, or 7 for Info Mover/SBS.

You may get kicked out to the cell phone for that group. Let the person on the other end know the situation with the case, and either consult with that person or, if transferring the case to the other group, put a note in the case saying who you talked with in the other group and that you told them you were going to transfer the case over to their queue.

3.4 Dispatching case to Unisys-Supported sites


If it is necessary to send a case to the field, check to see if the customer is Unisys-supported. Unisys only does break/fix work on the array itself and, in some cases, McData and Brocade switches. Unisys WILL NOT do host attach, host environment, switch zoning or Cisco switch cases. For these cases, if you need something from the field, dial the CSTs (70402), request that they page out the local SM, and when the SM calls back in, explain the situation and that we need an EMC CE to attend the site for work that Unisys cannot do.

3.5 Engaging the TSG Implementation Support Group


You may engage this group on Implementation cases only if the case's TSG is an EMC Paid Engagement.

Section IV: Types of Cases


4.1 Switch Zoning Issues
If a customer calls in with a switch zoning issue, you can attempt to work the issue for a few minutes. You can also consult with Level 1 SSC Infrastructure Services (70900, 1 at the first voice menu, and 3 at the second voice menu) for assistance. However, if the issue looks to be lengthy or ongoing, it is preferred to engage the field for the customer. Unisys will not handle switch zoning issues. Call the CSTs (x.70402) and request that the EMC Service Manager be paged out on this case. Once the Service Manager calls back, update him on the case and have him page out an EMC CE on the issue.

4.2 Host Issues


Many of the cases we get here in CLARiiON deal with attached hosts. With active host issues, it is recommended that you get a Host Grab from the host in question (with the CLARiiON IP option, which collects the optional information from the CLARiiON). Combined with SP Collects, it may give us a clear picture of what is causing the problem on the host. With new host installs, it is recommended that you at least steer the customer to Powerlink and the CLARiiON Procedure Generator (CPG), and have the customer generate a procedure to attach the host to the CLARiiON. If the case is a Severity 1 and/or the customer's temperature is hot, it is OK to volunteer to help the customer go through the generated procedure, but at a minimum point them to the CPG and ask them to generate and follow the procedure, as this may forestall future calls.

4.3 Root Cause Analysis (RCA) Cases


After a case has been worked by the field and resolved (the array is operating normally), the customer may request a Root Cause Analysis (RCA). To start an RCA, do the following. (Note: You MUST own the case to do this.) Open a log window and hit Ctrl-Shift-S to bring up the AIM Key Main Menu. On the next menu, choose (I) RCA TEXT.

The following text will be pasted into Clarify:


Problem Statement:
RCA Initiated By :
Location of Logs :
See primus emc94469 for more details
For more information on RCA's including RCA Forms click this link RCA Home Page. http://www.cs.isus.emc.com/config/process/rca/rca.htm

Fill out the requested fields, and change the New Case Status field to RCA. Dispatch the case directly to the regional queue for the case (you can get this info by pressing the three dots next to the site name on the main case screen and choosing Support Info on the next screen). For example, if the region is 59000 and the district is 52117, you would dispatch the case to 59000-52117.

4.4 Performance Request Cases


We need the following information to follow up on any customer request for a performance issue:

A) SP Collects
B) Host Grabs/EMC Grabs
C) Switch Logs (if available)
D) Navisphere Analyzer (.nar) files. The CE must install Analyzer and run it for two hours if it is not already on the array.
E) Performance Support Request Form (found on clariis)

Verify all information is up on a publicly accessible location, and is readable. A Level 1 SSE must review SP Collects/Host Grabs to see if there are any hardware issues with array before escalating case to Level 2 for performance request.

4.5 CLARiiON Disk Library (CDL) cases


As of the writing of this document, plans have been made to get every SST and SSE a short training program in CDL basics. The training (scheduled for 1/05) will give a basic understanding of the CDL GUI, how to run x-rays, find version numbers, as well as information on the basic architecture of a backup SAN, what it looks like, where the DL fits in the configuration, and more. There also will be a queue in Clarify for CDL cases (much like the current AX TS1 queue) in the near future. Good Primus Solutions with CDL information: emc98560, emc96882 and emc94967

SSEs need the following information to work a CDL case, per Primus emc97708:


A) Disk Library Version (w/Patches Installed)
B) Current State
C) Is CDL Up or Down?
D) Is the box in production?
E) Are clients able to access CDL?
F) Can console access CDL?
G) Backup Server Version and OS
H) Is Backup server currently connected to CDL?
I) Backup software name and version #
J) Is compression enabled on CDL?
K) Description of customer's current issue.
L) Gather X-Rays
M) Gather EMCGrab files from host
N) Gather SP Collects

Once we have this information (or as much as you can gather), turn the case over to a CDL-trained SSE for troubleshooting. Currently, Scott Gauthier and Mark Kersbergen are CDL-trained here in the US.

4.6 VMWare/Legato cases


CLARiiON is not responsible for cases involving VMWare and Legato. Still check the CLARiiON as normal to determine whether there are any faults on the CLARiiON itself. If the CLARiiON is functioning normally, however, we cannot work the case further. Call a CST Shift Lead at 70440 and inform them of the mis-queued case. They will move the case to the proper queue.

4.7 AX Series cases


The AX Series of CLARiiONs has its own queue, AX Series TS1, and currently we should not be receiving calls for them in the Support Lab, as they are supported directly by the reseller. However, if a case comes in, we can attempt to work it. There are several people trained on AX Series arrays here in the US, and they should be given any AX Series case that comes into the queue. Currently, the AX Series trained SSEs are Jim Ferreira, Neil MacLeod and Michael Valenti. Currently, if an AX Series case has been worked for more than 1 hour, the case is to be escalated to Level 2 Support, who will immediately open a DIMS on the case for Engineering.

4.8 Host Implementation Cases


Customers often call in looking for help with implementing a new host into their environment. Whether or not they explicitly indicate that they are implementing a new host is another issue altogether. It is important to follow these steps before attempting to help any customer with a new host install:

A) Have the customer check eLab Navigator to ensure the entire configuration is supported.
B) Direct the customer to the CLARiiON Procedure Generator (CPG) if it is not already in use. You must encourage the customer to use the CPG and perform the work on their own. We are here to help those customers who have checked eLab, have a supported config, and have issues successfully installing a host while following the CPG.
C) Once eLab and the CPG are found to be in use, do not complete the call without informing the customer that it is highly recommended they test failover (if in use) before going live with the host. We have found that over 25% of DUs are related to improper configurations of failover.

If the customer creates a Severity 1 case for a new host implementation, it is important to follow this procedure:

A) As the SM and CE are being paged out anyway, request the CST to direct them to you. If Unisys is involved, work with the EMC SM directly.
B) Inform the EMC SM and/or CE that the activity needs to go through Change Control.
C) Inform the SM that the case is going to be assigned to the primary CE listed in Clarify for the account while CCA is being performed.
D) If the field team has any issues with this process, contact your shift lead or manager to work with you on the case.

Section V: Support Lab Procedures


5.1 Receiving information from the Field
When we need to receive logs, host grabs, or SP Collects, it is preferred that the customer or CE NOT send the information directly to the CLARiiON SST or SSE (it is much harder to get at the information if that person is not in the office). Instead, we prefer that they use one of two public methods to get us the data needed.

A) The GCSC email box (all CLARiiON SSEs and SSTs should have access to this mailbox; inform a shift lead if you do NOT have this email box in Outlook). The incoming email address is GCSC@emc.com
B) The CLARiiON FTP. The IP address is 128.222.1.2, the login is: CLARiiON, the password is: 314ispi. The CE or customer should create a folder on the FTP under the case number (if one does not already exist) and upload the information to that directory.

No matter how they upload the data, the customer or CE should call into the Support Lab as soon as the information is uploaded to get it looked at by a CLARiiON technician. The SST or SSE should verify that the information is at the location indicated by the customer or CE, and that all the data is readable. This should be done while the customer or CE is on the line with the SST or SSE.
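The FTP upload step can be sketched with Python's standard ftplib module. This is a minimal illustration rather than an official upload tool: the function name is hypothetical, and the caller is assumed to pass in an already logged-in connection so that the host and credentials stay out of the script.

```python
import ftplib
import os


def upload_case_files(ftp, case_number, paths):
    """Upload files into a folder named after the case number.

    `ftp` is a connected, logged-in ftplib.FTP object. The folder is
    created only if it does not already exist.
    """
    try:
        ftp.cwd(case_number)          # folder already exists
    except ftplib.error_perm:
        ftp.mkd(case_number)          # create it, then enter it
        ftp.cwd(case_number)
    uploaded = []
    for path in paths:
        name = os.path.basename(path)
        with open(path, "rb") as fh:
            ftp.storbinary(f"STOR {name}", fh)
        uploaded.append(name)
    return uploaded
```

A typical call would open the connection with `ftplib.FTP(host)`, call `login(user, password)`, and then pass the connection to `upload_case_files` along with the case number and the list of files. After uploading, the customer or CE should still call the Support Lab so the data can be verified while they are on the line.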

5.2 If you cannot make a scheduled shift


If you cannot make a scheduled shift due to illness or other reasons, a shift lead MUST be aware of your absence. The preferred method here in the US for SSTs and SSEs is to call 508-435-1000 and speak with either Ron Natoli (x70650) or Dornell Burns (x70597). A secondary method is to contact them via email (natoli_ron@emc.com or burns_dornell@emc.com). If you cannot email Dornell and Ron (for example, on weekends), call the support lab and request that one of the SSTs or SSEs inform Ron and Dornell of your absence. A missed shift may only be made up at the shift lead's discretion.

Section VI: Updates and Upcoming Changes


6.1 Latest Code Versions
The latest version of code released is Release 16, which was released to the public on January 13th, 2005. There are several patches to Release 13 and 14, which add a lot of the functional changes of Release 16 as well.

6.2 Updates to Error Codes in Latest Release


In late December, two updates in the latest version of FLARE code for CLARiiONs changed the meaning of certain status codes:

801 SOFT SCSI BUS ERRORS: 801 Soft SCSI Bus Errors with a sense code of 0x09 (which means the drive is to be considered faulted and in need of replacement) will now dial home as an 803 Recommend Disk Replacement Error. We still treat the error the same way (running Background Verify on all LUNs associated with the drive before replacing the drive).

920 HARD MEDIA ERRORS: On ATA (181, 250 and 320 GB) drives, a 920 Hard Media Error with extended status of 0x3d is the exact same error as an 820 Soft Media Error with extended status of 0x05. Therefore, on ATA disks, a single 920 error is NOT cause to replace the drive. Only replace an ATA disk for 920 errors if there are three or more errors on the same drive within one week. At that point, follow the rules for proactive drive replacements. Non-ATA drives should be replaced after a Hard Media Error as normal.
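The 920 replacement rule above can be sketched as a small Python check. This is only an illustration with a hypothetical function name, and it assumes "within one week" means any three errors falling inside a rolling seven-day window rather than a calendar week.

```python
from datetime import datetime, timedelta


def should_replace_for_920(is_ata, error_times, window=timedelta(days=7)):
    """Decide whether 920 Hard Media Errors justify replacing a drive.

    error_times: datetimes of 920 errors logged against one drive.
    """
    if not error_times:
        return False
    if not is_ata:
        return True  # non-ATA drives are replaced after a hard media error as normal
    times = sorted(error_times)
    # ATA: look for any three errors inside the (assumed rolling) window.
    for i in range(len(times) - 2):
        if times[i + 2] - times[i] <= window:
            return True
    return False
```

A single 920 on an ATA drive returns False, while three errors logged over six days on the same ATA drive return True, at which point the proactive drive replacement rules apply.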


6.3 Upcoming Changes in Procedure


As of January 2005, several members of the Support Lab are testing a new policy that would reduce the number of cases sent out to the field to gather information from the array (SP Collects, Host Grabs, Memory Dumps, and the like). In these cases, the Support Lab is not to dispatch the case to the field to gather this information. Instead, they are to contact the customer and attempt to get the desired information remotely (there are some programs designed to make it easier for the customer to gather this information for us remotely). Once this policy has been finalized, it will be added to this handbook.

Section VII: Useful tools for troubleshooting


There are several useful tools for working cases in the Cork Toolbox, available at http://toolbox.isus.emc.com/

Amongst the tools available (with URLs that link directly to the tool in question):

HEAT (Host Environment Analysis Tool): http://toolbox.isus.emc.com/heat.php
SWAT (Switch Analysis Tool): http://toolbox.isus.emc.com/swat.php
EMCGrab: http://www.aspac.isus.emc.com/emcgrab/
EMCReports: http://toolbox.isus.emc.com/emcreports/index.php
SPLAT (SP Logs Analysis Tool): http://www.cs.isus.emc.com/csweb2/dgweb/utilities/SPLAT/index.htm
