Rev. 3.41
Training
Student Guide
© 2003 Hewlett-Packard Company

All other product names mentioned herein may be trademarks of their respective companies. Hewlett-Packard Company shall not be liable for technical or editorial errors or omissions contained herein. The information is provided as is without warranty of any kind and is subject to change without notice. The warranties for HP products are set forth in the express limited warranty statements accompanying such products. Nothing herein should be construed as constituting an additional warranty.

Servicing HP ProLiant Server Products
June 2003
3 Service Resources
    Introduction
    Objectives
    Serial Numbers
    Standard Warranties
    Service and parts information resources
    HP PartSurfer
    Service Parts Information (SPI) CD-ROM
    Electronic and Telephone Support Services
    Information and notification services
    Learning Check

4 Server Technology
    Introduction
    Objectives
    PCI
    SCSI Architecture
    Server Subsystems
    Processor Subsystem
    Memory Subsystem
    Power Subsystem
    Input/Output Subsystem
    Software Subsystem
    Fault Prevention and Recovery Management
    Learning Check
Rev. 3.31
Contents
6 Array Products
    Introduction
    Objectives
    Drive array technology
    RAID levels supported by HP array controllers
    Smart Array controllers
    Smart Array controller features
    Smart Array 6402/6404
    Smart Array 641/642
    Smart Array 5312
    Smart Array 5304 and 5302
    Smart Array 532
    Smart Array 5i and 5i plus
    Smart Array 4200
    Smart Array 431
    Integrated Smart Array a.k.a. RAID on a chip (ROC)
    RAID LC2
    Smart Array 3200
    Smart Array 4250ES and 3100ES
    SMART 2/E, 2/P, 2DH, 221, and 2SL Array Controllers
    Array Controller Service Considerations
    Array configuration utilities
    Learning Check

8 Troubleshooting Methodology
    Introduction
    Objectives
    Troubleshooting Prerequisites
    HP Troubleshooting Methodology Overview
    Step 1 - Collecting Data
    Step 2 - Evaluating Information to Isolate Mode of Failure
    Step 3 - Developing an Optimized Action Plan
    Step 4 - Executing the Action Plan
    Step 5 - Evaluating Results
    Step 6 - Implementing Preventive Measures
    Learning Check
Appendix F - Sample Inspect Report
Appendix G - Sample ADU Report
Appendix H - Cabling Guidelines
Appendix I - Error Codes
Course Overview
Module 1
Introduction
Servicing HP ProLiant Server Products is a training program that focuses on customer communication skills, service tools, software utilities, troubleshooting, and repair/replacement procedures. It also gives an overview of the features of the ProLiant server product line and major service considerations. Learning concepts are reinforced through a series of comprehensive lab exercises that provide the opportunity to gain valuable hands-on experience with HP products. The training curriculum has two major segments:
Each of these segments requires demonstrable proficiency and is measured by comprehensive certification testing.
Demonstrate customer communication skills.
Demonstrate ability to quickly and accurately find ProLiant service documentation and product-specific information.
Demonstrate knowledge of and ability to use HP service resources in configuring, upgrading, and servicing ProLiant server products.
Demonstrate technical proficiency and skills by working with HP products in a lab environment.
Demonstrate logical troubleshooting skills by problem recognition, problem isolation, solution development, and testing for proper operation.
Course Design
Servicing HP ProLiant Server Products is a three-day class made up of a series of product modules, lab assignments, classroom presentations, and product demonstrations. The lab assignments are designed to reinforce the concepts presented in the course and give you the opportunity to enhance your proficiency with ProLiant server products. Learning checks and review questions let you confirm your understanding of the information. To make this course a successful learning experience, you are expected to do the following:

Actively participate in all class presentations. Your experience and expertise are of value to the entire class and will greatly supplement the information being presented.
Complete the lab exercises assigned by your instructor to the best of your ability. The exercises are your opportunity to demonstrate that you have developed the skills and knowledge to service and support ProLiant server products.
Course Content
The following is a brief overview of the information and topics presented in the course modules:
Module 1: Course Overview covers course goals, objectives, class materials, class expectations, lab assignments, requirements for certification, and final testing requirements.
Module 2: Maximizing Customer Satisfaction provides an overview of effective customer communication skills.
Module 3: Service Resources covers documentation, software, utilities, online services, and technical support.
Module 4: Server Technology addresses server technologies and subsystems.
Module 5: Server Product Line Overview describes the ProLiant server product family, including:
    ML, DL, CL, and BL server products
    Product positioning and server introduction timeline
    Features and service considerations
Module 6: Smart Array Products covers the HP Smart Array controller family.
Module 7: Tools and Utilities covers ProLiant server installation and configuration.
Module 8: Troubleshooting Methodology covers the six steps to logical troubleshooting and troubleshooting flowcharts.
Module 9: Server Diagnostic Tools provides information on the tools used to diagnose faults in various ProLiant server subsystems.
Legacy products are covered in the following appendices:
    A. Entry Level Servers
    B. Workgroup Servers
    C. Enterprise Servers
    D. Appliance Servers and Storage Systems
Appendix E on Fault Isolation helps you determine subsystems or Field Replaceable Units (FRUs) that could be causing a problem.
Other appendices provide sample reports from various tools.
Lab exercises provide hands-on experience to reinforce the classroom material.
Classroom Facilities
Your instructor will provide additional information about class specifics. However, it is important to note a few general guidelines that must be followed:
Be sure to locate fire exits.
The classroom is smoke free. Your instructor will provide information regarding available smoking facilities.
Set pagers and cell phones in silent mode. Your instructor will point out the location of phones for use by class members.
There will be scheduled AM and PM breaks. However, feel free to take a break whenever you need to. Your instructor will review the restroom and telephone locations with you before the break.
Your instructor will present lunch information. Please ensure that you return from lunch on time.
Because a lot of information and activities must be covered in three days, it is essential that you:
    Begin class promptly.
    Use your time wisely.
    Attend the entire class time allotted.
The instructor will cover only a portion of the information in this student guide during class. It is to serve as a reference manual for you to use in the future.
Exam Preparation
The best way to prepare for the Server Certification Exam is to complete the three-day instructor-led training (ILT) course Servicing HP ProLiant Server Products or the Servicing HP ProLiant Server Products web-based training (WBT). Server+ certification and training are also recommended preparation. To improve your chance of success on the certification exam, use the preparation guide, which is accessible via the Internet at
http://h18014.www1.hp.com/training/service/ACT/010_500_epg.html
Sample questions are provided in the exam preparation guide to demonstrate the types of questions to expect at the testing center.
Certification Testing
Certification testing takes place at proctored testing centers. The test is a 75-minute competency verification exam in which the student demonstrates mastery of the skills and experience needed to service ProLiant server products. Exam questions are developed by experienced service engineers based on the skills and content defined in the exam preparation guide. The passing score is set by benchmarking the scores of practicing service engineers. This is your assurance that your Accredited Platform Specialist (APS) certification will be valued. To register for the ProLiant Server Maintenance Exam, you may call:
Thomson Prometric at 1-800-366-EXAM
APS Application
After passing the Server Certification Exam and obtaining Server+ certification, complete and submit an APS application along with the supporting documentation. The application can be found at
http://h10017.www1.hp.com/certification/na/application.html
Certification Maintenance
The final segment of the training program centers upon the necessity for Accredited Platform Specialists to maintain their knowledge of new HP products that are introduced after certification. HP will proactively provide new product self-paced training. This may be in the form of computer-based training distributed on CD, print-based materials, or new product training implemented through the Internet. HP will determine which new products require self-paced training for the APS. When this determination is made, Accredited Platform Specialists will be notified.
Current Information
You will find the most current APS Certification information at
www.hp.com/go/certification
Maximizing Customer Satisfaction
Module 2
Introduction
Our customers have the right to expect their HP equipment to perform properly. When customers call for help, it is because their systems are functioning below their level of expectation. This situation is an opportunity to gain new customers and to win their loyalty through great service and support.

Because the customer is usually unhappy with the situation, it is essential to turn the service event into a positive experience. Conveying concern and determination in a positive and confident manner is as important as resolving the problem quickly and professionally. Customers remember a positive service experience, which becomes a very influential factor for future equipment and service needs. Take every opportunity possible to reinforce the customer's decision to buy HP products and services.

Companies are vulnerable when their database, financial, application, gateway, or e-mail systems are malfunctioning. It is difficult to measure the true cost of downtime for a company. However, we know that productivity suffers when systems are down for repair or maintenance. Maintaining the quality and consistent availability of HP systems is a critical task.

Two essential skills, or areas of expertise, are needed to provide superior service. The first is outstanding customer service skills. The second is outstanding troubleshooting skills. These two skills overlap, and each is an essential part of the other.

This module presents an approach for maximizing customer satisfaction, including an overview of effective customer communication techniques. Topics include:
Objectives
To maximize customer satisfaction, field service engineers should be able to:
Define the elements of effective customer communication.
List ways to provide effective customer service.
Computers are critical to the day-to-day functioning of a company. People are adversely affected by failure. It is imperative that you use that voice when dealing with our customers.
Why Use
Customers may make initial purchasing decisions based on price, but good customer service is what builds loyal customers. Customers are also more likely to recommend HP to others if they receive excellent, professional, customer service. Conversely, dissatisfied customers quickly spread the word, and discourage others from choosing HP. Effective customer service skills are not only good for creating customer satisfaction, but are also a key component of troubleshooting. These skills provide the technician or service engineer with an effective diagnostic tool.
Preparing for the service call
During the service call
Following up
Preparing for the Service Call

Research the company to become familiar with their business, their computer systems, and how they use the systems. If this is not possible before the service call, learn as much as you can while on-site.
Research the problem. Try to get as much information as possible so that you arrive with the appropriate tools, parts, hardware, software, and so on.
Try these questioning and listening strategies to help you research the problem, whether over the phone or on-site:
    Seek the most effective person or persons to help you understand the perceived problem. This person is usually the system administrator or the person assigned to network operating system support responsibilities.
    Maintain eye contact.
    Ask questions, summarize, and rephrase the answers to make sure you understand what the customer has said.
    Let the customer do most of the talking. Do not interrupt the customer even if you think you already understand the problem. Additional details may change your mind.
    Respond to the customer's comments so that they know you are listening.
    Listen to what the customer is saying, and how they are saying it. Be aware of the customer's tone. If you are meeting in person, also pay attention to gestures and facial expressions.
    Keep an open mind and do not jump to conclusions.
    Use terms and expressions that your customer understands.
    Be professional and positive. Do not argue with a customer.
    Listen carefully for pertinent information related to the problem, such as:
        Symptoms
        How often the problem is occurring
        When it started
        What the desktop or workstation was doing when the problem occurred
        Information that does not seem to fit. The customer may be misinterpreting the situation.
During the Service Call

Arrive on time and prepared. If you are unavoidably delayed, let the customer know as soon as possible.
Treat all employees at the customer's company with courtesy and respect.
Maintain a caring attitude while gathering information to understand the problem and the context of the failure. Avoid showing a belittling, know-it-all attitude, using ill manners, or presenting a non-caring attitude.
Maintain a professional appearance at all times.
Use an effective greeting:
    Project a positive, helpful attitude at the beginning to help ensure customer cooperation and smooth communication as the service event progresses. Do not act rushed or hurried.
    A proper introduction lets the customer know how you fit into the organization, and gives them a sense that a professional is taking responsibility for helping to solve the problem.
    Include all four parts of an effective greeting, whether in person or over the phone:
Part: Salutation
Purpose: A greeting such as "Hi," "Hello," "Good morning," or "Good afternoon," spoken sincerely, provides acknowledgment of the customer and starts the conversation in a positive direction.

Part: Your name
Purpose: Stating your name:
    Implies you are responsible for your work.
    Implies you stand behind what you do.
    Gives the customer a path for future assistance.

Part: Company name
Purpose:
    Lets the customer know you represent your company proudly.
    Implies that your company stands behind their work.
    Gives the customer a path for future assistance.

Part: Purpose
Purpose: Tell the customer why you are there or why you are calling. Be sure to include the word "HP." The customer can let you know if you are at, or calling, the correct location and save valuable time. If the customer calls you, state your purpose before they ask to let them know they have reached the correct location.
Ask questions and listen to the customer. (See the previous tips for questioning and listening strategies.)
Apply the steps in the Troubleshooting Methodology Flowchart (covered in Module 8).
Be positive. Focus on what you can do, not what you cannot do. Negative words project an uncaring, unhelpful attitude. Instead of "I can't replace your diskette drive right now," say "I can order a replacement diskette drive right now." Keep negative words out of the conversation as much as possible, including: no, not, never, won't, can't, doesn't, wrong, and but. Eliminate these words from your vocabulary.
Avoid assigning blame. Instead of "You've called the wrong extension," say "Let me get you to the person who can help."
Use confident words and phrases. Do not say "This is the first time I've ever seen one of these." Instead, say "This is a great machine" or "I'll get this problem resolved for you."
Explain clearly what you are there to do and get down to business.
Set the customer's expectations with a timeframe and an outline of your approach to diagnose and resolve the issue. By doing this, you are taking charge of the situation, while discouraging any unreasonable demands.
Stay focused on the goal to resolve the issue. If the customer drifts into unrelated topics, politely turn the conversation back to the problem. After you do this a few times, the customer usually takes the hint and either sticks to the subject or leaves you alone to do your job.
Provide a timeline, schedule, or plan, as appropriate.
If you realize that you need additional time to troubleshoot, that downtime needs to be scheduled, or that parts need to be ordered, update the customer as soon as possible and reset any expectations so that the customer can adequately plan or prepare.
Update the chain of command, if appropriate. The customer may take care of this task if he or she feels the situation is under control and progressing.
Following Up
After the meeting or service call:
Keep the customer updated and informed. (See the previous tips.)
Follow up with any work that needs to be done.
Follow through with any promises made.
Once you have solved the problem, remind the customer of the problem and the steps you took to solve it. Use the same terms and phrases that the customer originally used, so that it will be clear that you solved the original problem.
Use terms and language the customer understands when explaining the solution.
Answer any additional questions the customer may have.
Include the words "Thank you" and "HP" as well as your name and, if you are an authorized service provider, your company's name, in the closure. This assures that the customer knows we appreciate the opportunity to solve the problem and that we are available for future opportunities.
Check back with the customer to make sure the problem is truly solved. This is good public relations for HP and will reinforce their decision to choose HP.
If you notify the customer about the possibility of a customer satisfaction survey, they can address any issues they have with the service call before it is ended. You might say, "An HP representative may be contacting you to conduct a survey on the service that my company has provided you on this issue. Is there anything that would keep you from scoring our service a five out of five?" If the customer has any issues that have not been addressed, you can address them at this time. This could also lead to a discussion about who should be contacted for the survey, and where, which avoids the frustration of the wrong person being contacted. In addition, this gives the customer an opportunity to ask questions about the survey process.
Acknowledge the customer's feelings. Be sympathetic and empathetic. For example, tell the customer "I can understand why you are upset." Rephrasing and repeating the customer's complaint lets them know that you understand their problem. Remember, agreeing with the customer and understanding the customer are two different things.
Put yourself on the customer's team by using "we" and "us." This lets the customer know you are working together to resolve the problem.
Apologize for any inconvenience.
Accept responsibility for resolving the problem, even if you did not have anything to do with creating it. You are representing HP, not just yourself. Let the customer know that you are there to help, and accept responsibility for resolving the problem.
If it is necessary to refer the customer to someone else, let the customer know that you will stay on as part of the team to help solve the problem.
Stress what you can do for the customer, rather than what you cannot do. Give the customer as many options as possible and stress them all. Try to find an agreeable solution among the options. Stay positive.
Contact your service manager if you are still unable to satisfy the customer. Allow your service manager to handle the situation and to set the customer's expectations. Most service managers are experienced in managing these situations.
Never escalate the anger in an already difficult situation.
Remember, when dealing with an angry customer, don't get hooked by their anger! It only complicates the situation if you become defensive. Try not to take their anger or the situation personally.
Learning Check
1. What voice should HP service engineers use?
2.
3.
4. What types of information should you research before the service call?
5.
6.
7.
8.
9.
HP Service Resources
Module 3
Introduction
A thorough understanding of the available service and support resources is key to providing superior customer satisfaction. This module provides an overview of these resources. Topics include:
Serial numbers
Standard Warranties
Maintenance information
Parts information
Web-based resources
Electronic and telephone support services
Training
Enterprise Computing Services
Objectives
To use HP service resources effectively, service engineers should be able to:
Interpret a serial number.
Locate part numbers for HP products.
Locate product information for HP products.
Access product and service resources online.
Serial Numbers
A serial number is an integral part of the product label, on which the serial number must be presented along with the HP product and/or part number. Pre-merger Compaq serial numbers used a 12-digit format, while pre-merger HP serial numbers used 10 digits. On January 1, 2003, the 10-digit policy took effect for new product design and introduction in the new HP. Both the 12-digit and 10-digit serial number codes provide valuable information such as:

Country code (pre-merger and new HP)
Model configuration (pre-merger Compaq)
Date of manufacture
Supply site/Vendor code
Unique sequential identifier

Two different serial number formats were previously used for pre-merger Compaq servers. One identifies machines built at Compaq sites that existed before 1998; the other identifies products built at Compaq sites added between 1998 and the merger with HP. Since January 1, 2003, the 10-digit serial number policy has applied to ProLiant servers built by HP. As an exception to the policy, an interim interpretation of the 10-digit format will be used until the transition to the standard 10-digit format is complete.
Refurbished Units
Units with an R in the first position of the serial number are refurbished products and sold through a reseller.
CC S YWW ZZZZ

CC = Country code
S = Supply site
YWW = Date of manufacture
ZZZZ = Unique sequential ID
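As an illustration only, the CC S YWW ZZZZ layout above can be split into its fields with a few lines of Python. This is a minimal sketch, not an HP tool: the field widths are assumed from the number of letters in the layout (2 + 1 + 3 + 4 = 10 characters), the function name, class name, and sample serial number are hypothetical, and the YWW date field is returned undecoded because its encoding is not defined in this section. The interim interpretation of the 10-digit format mentioned earlier may not match this layout.

```python
from typing import NamedTuple


class SerialFields(NamedTuple):
    country_code: str      # CC
    supply_site: str       # S
    manufacture_date: str  # YWW, left undecoded
    sequence_id: str       # ZZZZ


def split_serial(serial: str) -> SerialFields:
    """Split a 10-character serial number laid out as CC S YWW ZZZZ.

    Field widths are an assumption taken from the letter counts
    in the layout, not from an official specification.
    """
    compact = serial.replace(" ", "").upper()
    if len(compact) != 10:
        raise ValueError(f"expected 10 characters, got {len(compact)}")
    return SerialFields(
        country_code=compact[0:2],
        supply_site=compact[2],
        manufacture_date=compact[3:6],
        sequence_id=compact[6:10],
    )


# Hypothetical serial number, invented for illustration only.
print(split_serial("US A 325 0001"))
```

Always confirm the interpretation against current HP service documentation before relying on any decoded field.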
S ZZZZ AAAA YW

S = Supply site
ZZZZ = Unique identifier
AAAA = Configuration code
YW = Date of manufacture
Standard Warranties
HP offers a variety of warranty programs:
Servers: For most servers, HP Services provides a three-year limited warranty, including Pre-Failure Warranty (coverage of hard drives, memory, and processors), fully supported by a worldwide network of resellers and service providers, and lifetime toll-free 7 x 24 hardware technical phone support. The limited warranty includes three years of parts, three years of labor, and three years of on-site support. Complete information on the warranty of any product is available in the product QuickSpecs on HP's worldwide website at http://h18004.www1.hp.com/products/servers/platforms/warranty/index.html

Refurbished products: Products identified by an R in the first character of the serial number are refurbished by HP and sold through a reseller or through HP Factory Outlet (HP Works). A one-year parts and carry-in limited warranty covers them.

Spare and option parts: Spare and option parts are covered by a 90-day warranty or the warranty of the computer into which they are installed, whichever is longer. Most options carry a one-year warranty. Once installed in an HP machine, they assume the remainder of that machine's warranty or the remainder of the option warranty, whichever is longer. (Some options have three-year warranties.)

NOTE ABOUT WARRANTIES: Do not make assumptions about warranty based on a system's serial number. For example, the serial number of a system that has a failed hard drive may show a manufacture date of four years ago, but the proof of purchase on the hard drive may show that it is under parts warranty.
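The "whichever is longer" rule for spare and option parts amounts to taking the later of two end dates. The sketch below shows that arithmetic only; it is not an official HP warranty calculator. It assumes the 90-day period is counted from the part's purchase date (the section does not say where the period starts), and the function names and dates are hypothetical.

```python
from datetime import date, timedelta

# "90-day warranty" from the text; the start date is an assumption.
SPARE_PART_WARRANTY = timedelta(days=90)


def spare_part_coverage_end(purchased_on: date,
                            machine_warranty_end: date) -> date:
    """Later of the 90-day part warranty and the machine's warranty."""
    return max(purchased_on + SPARE_PART_WARRANTY, machine_warranty_end)


def installed_option_coverage_end(option_warranty_end: date,
                                  machine_warranty_end: date) -> date:
    """Later of the option's own warranty and the machine's warranty."""
    return max(option_warranty_end, machine_warranty_end)


# Hypothetical dates, invented for illustration only.
print(spare_part_coverage_end(date(2003, 6, 1), date(2003, 7, 15)))
print(installed_option_coverage_end(date(2004, 6, 1), date(2006, 1, 1)))
```

In practice the input dates come from proof of purchase and the product's warranty statement, not from the serial number, as the note above cautions.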
Service Announcements, Advisories, and Bulletins
Maintenance and Service Guides (MSGs)
QuickFind 2000
HP PartSurfer
Service Parts Information (SPI)
Service Announcements
These publications describe new HP products and new or modified service programs for our Authorized Service Providers. A good example of this type of announcement is Service Product Announcement 3341N, which announces the ProLiant DL380 G3 server.
Service Product Announcements describe new HP products.
Service Program Announcements describe new or modified service programs.
Service Announcements can be found in the QuickFind Support Reference Library.
Service Advisories
Service Advisories (SAs) are problem/solution pair documents. They are the official communications from HP to our Service Partners regarding issues and repair instructions. Special return procedures or warranty extension information is also communicated through these documents.

The schedule for publishing Service Advisories depends on your access method. External to HP, SAs are available on the FTP site for QuickFind CD subscribers to update their local copies. This site is updated once every other week. SAs are also available to external service partners through the CSN Service Partner website. This site is updated daily, and partners can search by serial number, date, document type, and more.

High-priority SAs are printed and distributed every two weeks to subscribers. They are simultaneously published to OARS. Notice that OARS also contains SAs that are informative and address noncritical technical issues.

Service Bulletins
These publications are urgent proactive documents of extreme importance to Service Providers. They are printed and delivered immediately, regardless of the biweekly mailing schedule for regular Service Advisories. They are also published directly in OARS.
NOTE: A Service Advisory or Service Bulletin that is distributed on yellow paper is to be considered critically important. It must be given priority attention. Electronically distributed copies highlight a warning message visually separated in the document from regular text.
For a detailed comparison of these service document types and modifications to the numbering system, see Service Advisory 1900(A) and Service Announcement 2000. Service Advisories and Bulletins are available from two sources:
All Advisories and Bulletins are available on OARS.
All Advisories and Bulletins are available on CSN.
Illustrated Parts Catalog: Provides an illustrated reference for specific HP personal computer spare parts; a good resource for part numbers.
Service Preliminaries: Provides preliminary warnings and cautions, information about necessary equipment, and warranty information.
Removal and Replacement Procedures: Describes how to remove and replace field subassemblies for specific HP personal computers.
Switch and Jumper Settings: Provides detailed information for setting switches and jumpers. It notes specific settings for each board.
Power-On Self-Test (POST): Describes the internal system diagnostic programs that are executed automatically when you power on the system.
Error Messages and Codes: Lists the POST and HP Diagnostics error codes, and the required course of action to resolve the problem described by each error code.
Specifications: Provides operating and performance specifications for the specific HP personal computer for which a particular guide is developed.
Index: Assists in locating specific information throughout the guide.

http://www3.compaq.com/support/reference_library/
HP PartSurfer
HP PartSurfer provides fast, easy access to parts information for a wide range of HP products. With this application you can:

Search for part information by product name or model number
Look up part information by keyword, category, or part type
Cross-reference exchange part numbers to new part numbers
Identify all HP products that use or reference a specific part number
Generate on-screen and hard-copy reports
Display exploded product and part views
The HP Service Parts Information (SPI) CD-ROM complements HP PartSurfer. The SPI CD-ROM accesses the same data as HP PartSurfer, but with a slightly different interface. HP-SPI CD-ROMs can be purchased individually or as a subscription (which includes four quarterly updates). You can subscribe to HP-SPI by fax, mail, or phone. For information on how to order the SPI CD-ROM, see the SPI website.
This CD-ROM-based information tool for Windows provides fast, easy access to all the latest information on HP parts in one location. The application includes a Parts Database and Parts Information Reports.

Parts Database features enable you to:

Search for parts information by product model name or number.
Look up parts information by keyword, category, or part type.
Cross-reference any exchange part number to its new part number.
Identify all HP products that use or refer to a specific part number.
Locate part information when product information is unknown.
Display exploded product and part views.

Parts Information Reports include:

On-screen and hard-copy reports by model name, number, and family.
Parts ID lists that give you product breakdown by category and keyword.
Parts price lists with product breakdown by part number.
Websites
ActiveUpdate
ActiveAnswers
Online services
Reseller services
Technical Support Center
Training
HP Websites
Having access to the latest HP technical product information, diagnostic software, and SoftPaq solution files is critical for all service providers. HP provides this information through a variety of websites, including:
ActiveUpdate: http://www.compaq.com/products/servers/management/activeupdate/
ActiveAnswers: http://h71019.www7.hp.com/ActiveAnswers/
Channel Services Network: http://web7.compaq.com/csn/
Customer Profile Center: http://compaq.mycustomprofile.com
HP PartSurfer: http://partsurfer.hp.com/cgi-bin/spi/main
HP Services: http://www.hp.com/services
HP Support: http://h71025.www7.hp.com/support/home/
HP Worldwide: http://www.hp.com
Product Change Notification: http://h18000.www1.hp.com/pcn/
QuickFind 2000: http://h18018.www1.hp.com/Cas-Catalog/quickfind.html
Training: http://h18004.www1.hp.com/training/
HP Worldwide Website
The HP worldwide website is located at http://www.hp.com.
The HP worldwide website is a major source for information and resources. It provides access to:
HP Services Website
Access the HP Services website at http://www.hp.com/services or select Services from the HP home page.
The HP Services page provides details about the most comprehensive service and support programs available in the computer industry:
Hardware and Software Infrastructure eBusiness Platform eBusiness Solutions Industry Focused
HP Support Home
Access the HP Support website by selecting Support and Drivers from the HP worldwide website and then selecting Compaq and HP ProLiant Servers, or by using the following URL: http://h71025.www7.hp.com/support/home/index.asp
From the Support home page you can access information about the following topics:
Software and drivers
Natural language search
Reference library
Forums and communities
Support tools
Warranty information
Contact support
Parts
Feedback
Following are expanded descriptions for the type of information associated with each link:
Software and drivers: This link takes you to a page where you can search for software and drivers by server or by operating system, as well as locate SoftPaqs for specific products.
Natural language search: On this page you can use a search engine to find answers to questions phrased as you would ask them in conversation, for example, "Where can I find information on the Rapid Deployment Pack?"
Reference library: In the Reference library you can get information for specific products, including service notifications, frequently asked questions, white papers, and manuals.
Forums and communities: This link takes you to a listing of forums such as business customer discussion groups, customer communities, and forums for particular HP products.
Support tools: Among the support tools that you can find on this page are proactive notification tools, Internet call logging services, Internet connection services, Insight Manager, and ActiveAnswers.
Warranty information: Here you can use a product serial number to determine the warranty expiration date for that product.
Contact support: This link takes you to a page where you can email technical support engineers with questions about ProLiant servers, find and order spare parts, locate resellers and service centers, and obtain telephone numbers for pre-sales and post-sales technical support worldwide.
Parts: Here you will find links to HP PartSurfer, the End User Replaceable Parts (EURP) program, and the Spare Parts Store, as well as spare part information and illustrations.
Feedback: This link takes you to a form where you can fill out a survey and tell HP how to make your support experience better in the future. Note: This is not the place to obtain technical support; for that type of question, use Contact support.
In the Reference library you select from Product Category, Product Family and Product Series to target a particular server model. Once you have completed the selections a search engine proceeds to locate documents associated with the product.
As displayed here the Reference Library search locates all of the documents associated with the product including:
Service notifications
Frequently asked questions
Parts documentation
Manuals
Service links
White papers
Support Information

A more detailed summary of the information in the Reference Library is shown in the table below.
Category Information
Customer Advisories
Controllers and adapters
Hardware
Operating systems and utilities
Internet solutions
Communications and networks
Clusters
System management
Software information referenced by product family, model and operating system or by operating system and software category including
Support Paq Display Management Agents Management Applications and Utilities Network Storage System ROMpaqs/BIOS Utilities
Manuals
Setup and Installation Guide
User and Reference Guides
Option Related Guides
Other
HP Channel Services Network (CSN) provides real-time access to service, sales, and support information. Various business transactions can be conducted on CSN, including parts ordering and tracking, electronic claims, CarePaq registration, and Depot registration. Various reports and metrics, sales assistance, and support tools are also available.

Access CSN at http://web7.compaq.com/csn/. Sign up online or call 1-800-231-9977, option 8.

Based on their business model, partners can choose the appropriate partner program. Each partner program has a specific set of service offerings that partners can sell or deliver. Service delivery partners have access to training, service delivery methodologies, and varying levels of technical support, along with additional opportunities to partner with HP to deliver a complete solution to their customers. HP is currently streamlining the authorization process for partners; check with your local channel account manager for more information.
Information and Support Tools

In addition to the various ordering and reporting functions of CSN, there are many tools that are valuable to service personnel. To access these tools, log in to CSN and select Tools List from the menu. Next, select Information and Support Tools. The following selections are available:

Diagnostic Tools/Utilities: Provides a search engine by operating system for HP Setup, Diagnostics, and Insight Manager. Links to any training or references for the utilities are available, as well as links to the SoftPaqs needed to load these utilities.
Technical Information/QuickFind: Provides a search engine to QuickFind and other technical databases. Search by serial number, document number, product, operating system, and geographic location. Also allows for selection of Service Advisories and Service Bulletins.
Technicians Toolbox: Provides subscriptions to technician tools. Also available is a Basic Toolkit, which includes QuickFind, SmartStart for Servers, and the Support Software CD for Portables, Desktops, and Professional Workstations.
Vendor Links: Provides direct access to vendor support websites (Comm, Baan, Cisco, Intel, Microsoft, Nortel, Novell, Oracle, and SCO).
Vendor Support Forum: Provides access to online forums (Baan, Cisco, Intel, Microsoft, and Novell), allowing technicians to discuss products or service challenges with other IT professionals.
HP partnership web
How to become a partner
How to find a local HP partner
Referral tools for business partners
Training and certification
ActiveUpdate

ActiveUpdate is a web-based client application that provides proactive notification and automatic delivery of software updates for HP servers, desktops, portables, workstations, and handheld PCs. It connects you to a secure HP server that delivers the latest updates and notifications based on your subscription profile. Once the updates are downloaded to your local or networked database repository, you choose which updates to implement and when to deploy them.

Product Change Notification

The Product Change Notification (PCN) system uses a secure website for proactively communicating product changes via email. Based on a customer-provided profile, PCN notifies customers 30 to 60 days in advance of upcoming critical changes that may impact their computing environment.
Customer Profile Center

The Customer Profile Center allows customers to receive information via email that is relevant to their needs: the latest product announcements, updates, news, and special offers that will help users make more informed purchasing decisions.

The following table summarizes and compares the features of the information and notification services described above.

Comparison of information services

ActiveUpdate: Proactive delivery of information and software for specified SoftPaqs via web-based subscription service. Customer controls which updates to deploy and when. Requires the installation of the ActiveUpdate client application. Available for Microsoft Windows platforms only.
PCN: Proactive delivery of information on planned hardware and software changes via email. Notification is sent 30 to 60 days in advance. Does not require the installation of client software.
Profile Center: Proactive delivery of information and links for response to marketing offers via email.
ActiveUpdate
ActiveUpdate is a web-based application that proactively notifies and delivers the latest software updates for HP servers, desktops, workstations, and portables.
Saves you time by downloading and storing new updates automatically
Delivers information customized to your needs
Provides easy-to-understand descriptions of the software updates
Simplifies access to the latest software updates for HP servers, desktops, portables, and workstations by providing a single point of access
System administrators can subscribe to software updates by server, desktop, workstation or portable models at http://h18000.www1.hp.com/products/servers/management/activeupdate/. Select the models, operating systems, and languages for the SoftPaq files you want downloaded. You must submit your subscription in order to receive downloads. Minimum requirements for using ActiveUpdate are as follows:
Operating System: Windows 95/98, Windows 2000 Professional, or Windows NT Workstation/Server 4.0.
Minimum Hardware: Pentium or higher recommended.
Memory: Minimum 32 MB RAM for Windows 95/98; 64 MB for Windows 2000 Professional or Windows NT 4.0.
Disk Space: 20 MB for the ActiveUpdate software and 1 GB for the local cache.
Internet Connection: Internet connection required (direct or dialup).
Web Browser: Microsoft Internet Explorer 5.0 or higher.
ActiveAnswers
ActiveAnswers provides a dynamic set of tools, e-services, and information to help customers plan, deploy, and operate business solutions. Designed for CIOs, IT managers, VARs, systems integrators, and consultants, ActiveAnswers simplifies solutions to help you achieve faster returns on your IT investments. First-time users are asked to register. Previous users can log on and begin their research at http://h71019.www7.hp.com/ActiveAnswers/

Categories available for research include:

Customer Relationship Management
Database and Business Intelligence
ERP and Supply Chain
Infrastructure and Architecture
Internet and E-Commerce
Messaging/Collaboration and Portals
Telecom and Service Providers
Reseller Services
HP provides 24-hour-a-day, 7-day-a-week support and services for its Authorized Service Providers at 1-800-231-9977. (Outside of North America, contact your local Geo.) You must supply your Authorized Reseller ID to use this toll-free service. A call prompter answers your call with a recorded message, then instructs you to select the type of service needed. Sections available include:
Technical Support
Spare part information, warranty verification, Service Order Management, or field return receiving
Product features and configuration information
Accredited Systems Engineer Support or HPCare Systems Partner (special IDs are required to access these services)

When you call for technical assistance with your HP server or server option, be sure to have the following information available:

HP Reseller ID#
Product name, model number, and serial number
Hardware configuration and expansion boards installed
Detailed description of any error messages and any associated error codes
Knowledge of the conditions under which the problem occurred
Familiarity with any previous troubleshooting steps taken
Hard copy of data (INSPECT)
Hard copy of System Configuration Resource Map
Version of network operating systems
Printouts of software configuration files setup
Updated ROM and drivers, and recorded versions
All network operating systems patches installed and up to date
An INSPECT, Survey, or Insight Manager report ready to fax or e-mail
Provide pre-sales and post-sales product information for commercial and consumer desktops, workstations, and portable units.
Send brochures and QuickSpecs.
Provide part number, configuration, and upgrade information.
Provide assistance with dealer locations if you are unable to access them from the option on the menu.
Training
You can obtain HP training information and registration from the following sources:
U.S.: 1-800-732-5741
Canada: 1-800-392-7024
Outside of North America, contact your local Geo.

Information and registration are also available on the Education and Training website at http://h18014.www1.hp.com/training/

The following information and services are available:

Training schedule or information via fax
Self-paced training or training installation video information and ordering
Sales and technical training information and registration
Student ID number for registering in Service, Sales, or HP Accredited Professional classes
Learning Check
1. A unit configuration code is embedded in the 12-digit serial number used in pre-merger HP.
   True
   False

2. The MSG includes an illustrated parts catalog and is a good place to find spare part numbers.
   True
   False

3. Which service resource provides partners opportunities to enhance business capabilities to sell and deliver global IT services through a Web-based management system?
   a. ActiveUpdate
   b. ActiveAnswers
   c. Channel Services Network
   d. Support website

4. Which service is a web-based client application that provides proactive notification and automatic delivery of software updates for HP servers?
   a. ActiveUpdate
   b. ActiveAnswers
   c. Service Parts Information
   d. Support website

5. What information resource would contain parts removal procedures?
   a. Service Bulletin
   b. Support website
   c. Service Parts Information
   d. Maintenance and Service Guide

6. Which of the following would have information about new products?
   a. Service Advisories
   b. Service Announcements
   c. Service Bulletins
   d. All of the above
7. What information resource would allow you to identify all HP products that use or reference a specific part number?
   a. Maintenance and Service Guide
   b. ActiveAnswers
   c. Channel Services Network
   d. HP PartSurfer

8. Where would you find a link to the Spare Parts Store?
   a. HP support website
   b. ActiveAnswers
   c. Channel Services Network
   d. HP PartSurfer

9. What resource would you use to locate white papers for a particular ProLiant server?
   a. HP support website
   b. ActiveAnswers
   c. Channel Services Network
   d. HP PartSurfer

10. What resource provides a dynamic set of tools, e-services, and information to help customers plan, deploy, and operate business solutions?
   a. ActiveUpdate
   b. ActiveAnswers
   c. Channel Services Network
   d. HP Support website
Server Technology
Module 4
Introduction
To develop outstanding troubleshooting skills, it is important to know how the servers function normally. The remaining modules focus on the tools needed to develop those troubleshooting skills. This module provides an overview of Server Technology specific to ProLiant servers. It describes the various devices and subsystems that make up a ProLiant server. Topics include:
Objectives
To demonstrate an understanding of ProLiant server technology, service engineers should be able to:
Describe the features of PCI, PCI-X, and SCSI bus architectures
Identify the various subsystems that make up a server
Identify the components of each subsystem
Describe the interaction between the various server subsystems
Describe how the server will react in a failure situation
PCI
PCI architecture and features
The PCI bus (Peripheral Component Interconnect) is a local-bus design developed by Intel, Compaq, DEC, IBM and NCR in late 1991. The focus is oriented around electrical specifications at the expense of ease of integration. The PCI standard offers a number of features and advantages. The fundamental design of PCI involves a buffered local bus. The bus always utilizes some sort of PCI bridge which provides a number of advantages, most importantly making the PCI bus processor independent. Bridging allows buffering and concurrent bus master access, suiting the bus to multitasking environments. By placing a bridge between the bus and the microprocessor the bus frequency may be standardized, eliminating the issues caused by varying processor frequencies in different systems. PCI also offers 64-bit support, making the bus very well suited to Pentium implementations.
[Figure: system block diagram showing the L2 cache, system bus, PCI bus, SCSI, PCI slots, EISA bus, and diskette]
PCI Features

The most significant features and characteristics of the PCI bus are:

Provides switchless and jumperless support.
Plug and play capable; there is no requirement to run a configuration utility in a PCI-only system. (A PCI/EISA system will require the configuration utility.)
Utilizes a multiplexed bus, meaning addresses and data move over the same set of wires. This requires less system board space and results in fewer traces or wires than ISA/EISA.
The processor-independent design allows the bus to be supported under many processors. One option card will work in computers with different processors.
The bridging requirement protects against potential problems associated with high-speed host/local buses.
Supports intelligent I/O devices and burst mode transfers of 133 MB/second.
The PCI bus may buffer read or write activity to allow the processor to continue with other tasks rather than wait for the I/O operation to complete.
Currently, HP uses 32-bit and 64-bit wide PCI buses. 64-bit PCI cards can be installed in a 32-bit bus; however, they will only run at 32-bit.
Parity checking is done on all server PCI buses (control, address, data).
PCI Bus Speeds and Transfer Rates

The transfer rates of ProLiant systems are determined by the speed of the various buses, which are derived from the system clock. Peripherals such as the graphics subsystem, network controllers, and hard drives take advantage of the PCI local bus for faster system throughput. The PCI bus provides a 32-bit data path operating at 33 MHz or 66 MHz, or a 64-bit data path operating at 33 MHz or 66 MHz. This greatly increases the performance of peripherals such as the graphics subsystem and the network controller.
PCI Bus Performance

Card      Clock     Transfer rate
32-bit    33 MHz    133 MB/s
32-bit    66 MHz    267 MB/s
64-bit    33 MHz    267 MB/s
64-bit    66 MHz    533 MB/s
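The figures in the performance table follow from multiplying the width of the data path in bytes by the bus clock. The short Python sketch below illustrates that arithmetic. The function name is hypothetical, and the exact clock values (100/3 MHz and 200/3 MHz for the nominal 33 MHz and 66 MHz buses) are an assumption used to reproduce the rounded values in the table.

```python
def pci_peak_mb_per_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak transfer rate in MB/s: bytes moved per clock times clocks per microsecond."""
    return (bus_width_bits / 8) * clock_mhz


# Assumed exact clocks behind the nominal 33 MHz and 66 MHz labels.
NOMINAL_CLOCKS = {33: 100 / 3, 66: 200 / 3}

for width in (32, 64):
    for label, exact_mhz in NOMINAL_CLOCKS.items():
        rate = pci_peak_mb_per_s(width, exact_mhz)
        print(f"{width}-bit card at {label} MHz: about {rate:.0f} MB/s")
```

These are peak burst rates; sustained throughput on a shared bus is lower.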
PCI and PCI-X bus specifications (peak throughput in MB/s)

Bus          Year    Clock (MHz)    Level (volts)    32-bit    64-bit
PCI          2002    33             5 or 3.3         133       266
                     66             3.3                        533
PCI-X 1.0    1999    66             3.3                        533
                     100            3.3                        800
                     133            3.3                        1070
PCI-X 2.0    2002    66             3.3                        533
                     100            3.3                        800
                     133            3.3                        1070
                     266                                       2130
                     533                                       4270
PCI bus slots

The PCI interface provides two bus widths (32-bit and 64-bit) and two signaling levels (5-volt and 3.3-volt). Below are the four types of PCI expansion slots used on personal computers and servers.

[Figure: the four PCI slot types (32-bit 5 V, 32-bit 3.3 V, 64-bit 5 V, 64-bit 3.3 V), with the 32-bit and 64-bit connector sections labeled]
Adapter bus widths and slot bus widths are completely interoperable; a 32-bit card can be used in a 64-bit slot and a 64-bit card can be used in a 32-bit slot (although its operation will be limited to 32-bit transfers). However, the signaling level may restrict where a card can be installed. A keyed scheme is used on the 32-bit connector to determine the signaling level.

Correspondingly, PCI cards are keyed in one of three ways: 5-volt, Universal, or 3.3-volt (shown below). A Universal card can be installed in either a 5-volt or 3.3-volt slot, but a 5-volt or 3.3-volt card must be installed in a slot that specifically supports its level of signaling. The latest systems designed to support faster slots with 3.3-volt signaling will accept only Universal and 3.3-volt PCI cards. Legacy cards keyed for 5-volt signaling will not work in systems that provide only 3.3-volt slots.
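The keying and width rules above can be summarized as a small compatibility check. The following Python sketch is illustrative only; the class and function names are hypothetical and simply encode the rules stated in the text (Universal cards fit either signaling level, and bus width never blocks installation but limits the transfer width).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PciCard:
    width_bits: int  # 32 or 64
    keying: str      # "5V", "3.3V", or "universal"


@dataclass(frozen=True)
class PciSlot:
    width_bits: int  # 32 or 64
    signaling: str   # "5V" or "3.3V"


def can_install(card: PciCard, slot: PciSlot) -> bool:
    """Only the signaling level restricts installation; bus width does not."""
    return card.keying == "universal" or card.keying == slot.signaling


def transfer_width(card: PciCard, slot: PciSlot) -> int:
    """Transfers run at the narrower of the card and slot widths."""
    return min(card.width_bits, slot.width_bits)


legacy_card = PciCard(width_bits=32, keying="5V")
new_slot = PciSlot(width_bits=64, signaling="3.3V")
print(can_install(legacy_card, new_slot))  # a 5-volt card is rejected by a 3.3-volt slot
```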
SCSI Architecture
SCSI (Small Computer System Interface) is a system-level parallel channel or I/O bus designed for interconnecting peripheral devices that have intelligent controllers, thereby allowing control signals and data to flow to all peripherals. The (Smart Array) controller does not need to know how many cylinders, heads, or sectors are available on each device. The local device intelligence is capable of managing these functions, including errors. Servers and workstations use SCSI to communicate with a variety of external RAID storage devices. The SCSI system contains three main components:

SCSI Controller

The SCSI controller is the interface between the computer and the other devices on the bus. The controller may be built into the motherboard or housed on a SCSI host bus adapter card in a PCI or PCI-X slot.
SCSI Cables

SCSI cables consist of 34 pairs of multi-stranded flexible copper wires for a total of 68 conductors. SCSI devices inside the server are connected to the SCSI controller using a 68-pin ribbon cable. The ribbon cable has a connector at each end and one or more connectors along its length. External SCSI devices are connected to the SCSI controller on the SCSI host bus adapter using a round 68-pin cable. Two terminators, one at each end of the SCSI bus, prevent signal reflections within the cables.

SCSI Devices

All SCSI devices share the same data and control lines, and only two devices (an initiator and a target) can communicate at a time. To facilitate communication on the bus, each device must have a unique address or ID number. The number of physical addresses on a bus is a function of the bus width. There can be up to eight devices on an 8-bit bus (ID numbers 0 to 7) and up to 16 devices on a 16-bit bus (ID numbers 0 to 15).
SCSI Protocols
SCSI-1

SCSI-1 devices used proprietary commands and were very often incompatible with each other. SCSI-1 supports 5 MB/s transfers in synchronous mode and 3 MB/s transfers in asynchronous mode. In synchronous mode the target acknowledges a block of bytes as a whole. In asynchronous mode the target acknowledges each byte. Today asynchronous mode is used only during the command phase.

SCSI-2

New speed levels allow for 20 MB/s (Fast Wide). One of the greatest achievements of the SCSI-2 specification is the Common Command Set, which allows devices from various vendors to work together.

SCSI-3

SCSI-3 defines Wide Ultra (40 MB/s), Wide Ultra2 (80 MB/s), Ultra3 (160 MB/s), and Ultra320 (320 MB/s) transfers as well as the new LVD interface. The following table summarizes the characteristics of SCSI protocols.

Fast / Fast Wide SCSI

Fast SCSI reduces the signal length from 200 ns to 100 ns and doubles the transfer rate to 10 MB/s. Reducing the signal length makes the signals much more prone to distortion. Fast SCSI requires active termination and high-quality cables, and supports cable lengths up to 3 meters. Fast Wide SCSI allows a transfer rate of 20 MB/s.

Ultra / Wide Ultra SCSI

By reducing the signal length to 50 ns, the transfer rate reaches 40 MB/s for a Wide Ultra SCSI transfer. The cable length is reduced to 1.5 meters (about 5 feet). High-quality cables with a matched impedance are required.
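The relationship between signal length and transfer rate quoted above can be checked with simple arithmetic. The Python sketch below assumes one transfer per signal period and a 16-bit data path for the Wide variants; the function name is hypothetical.

```python
def scsi_rate_mb_per_s(signal_period_ns: float, bus_width_bits: int = 8) -> float:
    """Transfer rate in MB/s, assuming one transfer per signal period."""
    transfers_per_microsecond = 1000 / signal_period_ns  # equals millions of transfers per second
    return transfers_per_microsecond * (bus_width_bits / 8)


# (name, signal period in ns, bus width in bits), using the values given in the text
modes = [
    ("SCSI-1 synchronous", 200, 8),
    ("Fast SCSI", 100, 8),
    ("Fast Wide SCSI", 100, 16),
    ("Wide Ultra SCSI", 50, 16),
]
for name, period_ns, width in modes:
    print(f"{name}: {scsi_rate_mb_per_s(period_ns, width):.0f} MB/s")
```

Halving the signal period doubles the rate, and moving from an 8-bit to a 16-bit path doubles it again, which is how 5 MB/s becomes 40 MB/s across these generations.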
Wide Ultra2 SCSI

Wide Ultra2 is the first SCSI protocol that uses low voltage differential (LVD) signaling instead of single-ended (SE) signaling. The transfer rate is 80 MB/s and cables can be up to 12 meters long.

Ultra3 SCSI

Also called Ultra 160, Ultra3 SCSI increases the transfer rate to 160 MB/s and improves data reliability by adding a CRC checksum. It also supports domain validation. Ultra3 SCSI is only available as Wide SCSI.

Ultra 320 SCSI

Ultra 320 increases throughput to 320 MB/s and adds technologies that improve bus utilization and data integrity. These include:

Higher clock frequency: Ultra 320, like Ultra3, uses double transition clocking to trigger data transfer on both the rising and falling edges of the bus clock signal. It also operates at 80 MHz, twice the frequency of Ultra3 (Ultra 160).
Data streaming: Read data streaming minimizes the overhead of data transfer by allowing the target to send one data stream packet followed by multiple data packets. Write data streaming performance is also increased because the bus turnaround delay is not incurred between each data packet.
Packetization and QAS: During arbitration no data is transferred on the bus, so decreasing arbitration time improves SCSI system performance. Quick arbitration and selection (QAS) eliminates the bus free phase and reduces the number of times arbitration must occur. In other words, QAS allows a device waiting for the bus to grab the bus without arbitration after the previous initiator and target disconnect. Together, QAS and packetization increase performance by 20 to 30%.
Flow control: Flow control allows the initiator to optimize its pre-fetching of data during writes and flushing of data FIFOs during reads. The target device indicates when the last packet of a data stream will be transferred, which allows the initiator to terminate the data pre-fetch or begin flushing data FIFOs sooner than previously possible.
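Double transition clocking accounts for the Ultra3 and Ultra 320 transfer rates: data moves on both clock edges across a 16-bit (Wide) path. The sketch below works through that calculation. The function name is hypothetical, and the 40 MHz Ultra3 clock is inferred from the statement that Ultra 320 runs at twice the Ultra3 frequency.

```python
def double_transition_rate_mb_per_s(clock_mhz: float, bus_width_bits: int = 16) -> float:
    """Transfer rate in MB/s when data is clocked on both rising and falling edges."""
    edges_per_cycle = 2
    bytes_per_transfer = bus_width_bits / 8
    return clock_mhz * edges_per_cycle * bytes_per_transfer


print("Ultra3 (assumed 40 MHz):", double_transition_rate_mb_per_s(40), "MB/s")
print("Ultra 320 (80 MHz):", double_transition_rate_mb_per_s(80), "MB/s")
```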
Electrical Interface
There are three electrical levels of SCSI: Single Ended (SE), High Voltage Differential (HVD), and Low Voltage Differential (LVD).

[Figure: single-ended signaling (one data line referenced to common ground) compared with LV differential signaling (Data+ and Data- lines with common ground)]
Single Ended (SE) SCSI

Single-ended SCSI uses the ground line as a signal reference. The receiver detects the magnitude of the signal (TTL technology) and decides whether the signal is a logical one or a logical zero. SE SCSI is very sensitive to ground shifts and electromagnetic interference (EMI). For this reason, single-ended SCSI allows only short cables.

High Voltage Differential (HVD) SCSI

A much less commonly used SCSI technology is High Voltage Differential (HVD) SCSI, where the signals travel on two wires. The difference in voltage between the wire pairs determines whether the signal is a logical one or zero. HVD technology has excellent noise immunity and a maximum cable length of 25 meters, but requires external transceivers, making it more expensive than SE and LVD. The fastest transfer mode supported by HVD is Wide Ultra. Older tape libraries used HVD. HVD SCSI devices cannot be mixed with SE or LVD SCSI devices.

Low Voltage Differential (LVD) SCSI

Low Voltage Differential SCSI takes all the advantages of HVD and adds new features. Low signal voltage swings allow the whole technology to be integrated on a single chip. LVD SCSI is backward compatible with single-ended SCSI. However, once a single-ended device is connected to the bus, all devices operate in SE mode. LVD was first implemented in Ultra2 SCSI technology. Maximum cable length is 12 meters. All newer SCSI developments starting with Ultra2 SCSI must use LVD signaling.
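The mixing rules for the three electrical interfaces can be expressed as a short decision function. This Python sketch is an illustration of the rules stated above, not a description of how any controller actually negotiates; the names are hypothetical.

```python
from enum import Enum


class Signaling(Enum):
    SE = "single-ended"
    LVD = "low voltage differential"
    HVD = "high voltage differential"


def bus_operating_mode(devices: list[Signaling]) -> Signaling:
    """Return the mode a bus runs in, given the signaling type of each attached device."""
    kinds = set(devices)
    if not kinds:
        raise ValueError("no devices on the bus")
    if Signaling.HVD in kinds:
        if kinds != {Signaling.HVD}:
            raise ValueError("HVD devices cannot be mixed with SE or LVD devices")
        return Signaling.HVD
    if Signaling.SE in kinds:
        # One single-ended device drops every device on the bus to SE mode.
        return Signaling.SE
    return Signaling.LVD


mixed_bus = [Signaling.LVD, Signaling.LVD, Signaling.SE]
print(bus_operating_mode(mixed_bus).value)
```

In practice this means a single older SE drive on an LVD bus costs every device on that bus the LVD cable length and speed benefits.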
ProLiant servers and options do not use termination at the device level. Internal termination for ProLiant servers is active and handled on the bus. Use cables with integrated active termination. Drive cages have active termination on the backplane board. Disk drive enclosures are also terminated by default, so no action is required when a disk enclosure is connected to a server. Devices placed on the hot-plug backplane should not be terminated. Termination is handled actively onboard the controller, with active termination applied at the cable end.
SCSI configuration
SCSI IDs

SCSI IDs are usually set on the drives by selecting a unique ID number through an array of jumpers. ID 7 is reserved for the SCSI host bus adapter (Smart Array Controller). IDs are set automatically on HP hot-pluggable hard drives.

SCSI devices

SCSI devices are daisy-chained together using a common conductor or cable. This conductor is a hot-plug backplane in most ProLiant servers. All signals are common between all SCSI devices on the 50-pin or 68-pin cable.

Internal and External Connectors

The internal and external connectors of a single SCSI bus cannot be used at the same time. If you have both internal and external devices, two separate SCSI channels must be used. This requires two controllers or a multi-channel controller.
Maximum Supported Devices per Bus
Single-ended SCSI supports up to 7 devices; a repeater is required to support up to 15 devices. LVD-based SCSI supports up to 15 devices per bus without a repeater. Many documents state the maximum as 14 because the largest StorageWorks drive enclosures support only 14 drives.
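The ID and device-count rules in this section lend themselves to a simple consistency check. The sketch below is a hypothetical helper, not an HP tool; in particular, the valid ID range of 0 through `max_devices` is an assumption of the example.

```python
HOST_ADAPTER_ID = 7  # reserved for the SCSI host bus adapter, per the text


def check_scsi_ids(device_ids, max_devices=15):
    """Return a list of problems found in a proposed set of drive IDs.

    max_devices is 15 for an LVD bus and 7 for a single-ended bus
    without a repeater. The ID range 0..max_devices is an assumption
    made for this sketch.
    """
    problems = []
    if len(device_ids) > max_devices:
        problems.append(
            f"{len(device_ids)} devices exceed the limit of {max_devices}"
        )
    seen = set()
    for dev_id in device_ids:
        if dev_id == HOST_ADAPTER_ID:
            problems.append("ID 7 is reserved for the host bus adapter")
        elif not 0 <= dev_id <= max_devices:
            problems.append(f"ID {dev_id} is outside 0..{max_devices}")
        elif dev_id in seen:
            problems.append(f"ID {dev_id} is used more than once")
        seen.add(dev_id)
    return problems
```

An empty list means that the jumper settings are consistent with the rules above. Hot-pluggable HP drives do not need such a check because their IDs are assigned automatically.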
SCSI Connectors
These illustrations show the various wide and narrow internal and external SCSI cable connectors.
External SCSI cables have a round wire with securable connectors.
Internal SCSI cables have a ribbon wire with push-on connectors.
SCSI cables are keyed to deter improper installation.
Internal Wide SCSI cables are narrower than standard internal SCSI cables.
The external 68-pin (Fast-Wide) cable is wider, and the internal 68-pin (Fast-Wide) cable is smaller, than the 50-pin cable used by Fast-SCSI.
Hot-Plug Drives
Hot-plug drive support allows a failed physical drive in a hardware or software fault tolerant volume to be replaced while the computer is still running. This support requires an array controller supporting hot-plug drives and a hot-plug drive bus. The family of SMART Array Controllers provides this capability.
Hot-plug support enhances the capability of the On-Line Spare drive because the failed drive can be replaced while the computer is still running. The On-Line Spare may then become available for any further failed drives. A replacement hot-plug drive may be equal to or larger than the original drive in capacity. Note the following about hot-pluggable hard drives:
The drives require support by the SCSI controller and bus.
They support a variety of both hardware and software fault tolerance.
In a fault-tolerant environment, a failed drive can be replaced without bringing the system down.
Server Subsystems
The main components of a system are the buses, the controllers, and the subsystems. A server has five subsystems:
Subsystem: Components or FRUs
Processor: Processor(s); processor board(s), if any; system board; processor bus circuitry, including GTL (Gunning Transceiver Logic) bus; terminator board
Memory: Memory modules; memory expansion board(s); processor boards that have SIMM sockets and soldered memory. The memory controller chip is usually on the system board, but some system boards have memory module sockets or soldered-on memory.
Power (internal): Power supply; 230/115 V AC switch; On/Off switch; Voltage Regulator Module (VRM) or Power Processor Module (PPM); fans; system board thermistors; access panels
Power (external): Uninterruptible Power Supply (UPS) and cables; power cord; power strip or power distribution unit; outlet; line voltage; wall switch; rack-mountable blanking panels
Input/Output (input devices): Keyboard; mouse; video display (touch screen)
Input/Output (output devices): Video display; IMD (Integrated Management Display)
Input/Output (input/output devices): Serial and parallel ports; mouse and keyboard ports; expansion cards (SCSI controller, video controller, Network Interface Controller, SMART Array controllers, modem); storage devices (hard drive, CD-ROM drive, diskette drive)
Software: Operating system/network operating system; applications; device drivers; user data files; tools and utilities
Processor Subsystem
In general, the processor controls all the activity between the elements that make up a computer. The other devices in the computer are controlled by the program running in the processor. The processor controls the devices by placing a control signal and an address onto the system bus. If the device controller sees the address and control signal it has been configured to react to, it then responds, either reading the data bus (processor WRITE) or placing data onto the data bus in response (processor READ).
Processor Types
Pentium 4
The Intel Pentium 4 processor has a new hyper-pipelined design (a 20-stage pipeline versus a 10-stage pipeline for the Pentium III). The deeper pipeline enables instructions inside the processor to be queued and executed at a much faster rate, allowing processors to achieve higher clock speeds. Intel's name for the new Pentium 4 features is NetBurst.
Following are features of the Pentium 4 processor:
Single Instruction Multiple Data (SIMD): A single instruction operates on multiple data elements that are packed together. MMX instructions operate simultaneously on two 32-bit integers, while SSE instructions operate simultaneously on four 32-bit floats. Eight new 128-bit registers were added for SSE.
SSE2: SSE2 extends MMX and SSE technology. It is a set of 144 new instructions that are compatible with the original 70 SSE instructions and 57 MMX instructions. The 64-bit MMX instructions are extended to 128 bits, and SSE2 supports two 64-bit double-precision floating-point operations at the same time. This accelerates encryption, video, speech, and scientific applications.
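The packed-data idea behind MMX, SSE, and SSE2 can be illustrated with a small model. The sketch below is plain Python and does not execute real SIMD instructions; the helper names `lanes` and `packed_add` are assumptions introduced only for this example.

```python
import struct


def lanes(register_bits, element_bits):
    """Number of data elements that fit in one SIMD register."""
    return register_bits // element_bits


def packed_add(a, b):
    """Add two equal-length tuples element by element, modeling how one
    packed SIMD add processes every lane with a single instruction."""
    return tuple(x + y for x, y in zip(a, b))


# Four 32-bit floats occupy exactly one 128-bit SSE register (16 bytes).
register_image = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
```

Here `lanes(64, 32)` gives the two 32-bit integers of an MMX operation, `lanes(128, 32)` the four floats of SSE, and `lanes(128, 64)` the two double-precision values of SSE2.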
Double-pumped ALUs: Two arithmetic logic units (ALUs), called double-pumped ALUs, have twice the effective performance of the ALUs in Pentium III processors because each ALU can execute an operation every half clock cycle.
Execution Trace Cache: The instruction trace cache is an L1 cache that stores decoded IA-32 instructions and helps to remove decoder pipeline latency.
Quad-Pumped Front Side Bus (FSB): The FSB is still clocked at 100 MHz. The Pentium 4 processor, however, can transfer four data sets (one set is 64 bits) per clock cycle. As a result, the FSB has a bandwidth of 3.2 GB/s. The 100 MHz quad-pumped bus is also referred to as the 400 MHz FSB. Pentium 4 processors running at 2.53 GHz and faster have a 533 MHz FSB (approximately 4.2 GB/s).
Xeon and Xeon MP processors
The dual processor-capable (DP) and multiprocessor-capable (MP) versions of the Pentium 4 processor are called Intel Xeon and Xeon MP. The dual processor-capable Xeon processor allows two processors to work together in a single system. It offers a larger cache (512 KB or 1 MB) than the single-processor version (256 KB). The pin layout is different from the single-processor version.
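The quad-pumped FSB bandwidth quoted above follows from multiplying the clock rate, the number of transfers per clock, and the bus width. A minimal sketch of that arithmetic, with a function name chosen only for this example and 1 MB taken as 10**6 bytes:

```python
def peak_bandwidth_mb_s(clock_mhz, transfers_per_clock, bus_width_bits=64):
    """Peak bus bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return clock_mhz * transfers_per_clock * bus_width_bits / 8


# Quad-pumped 100 MHz FSB: 100 * 4 * 8 bytes = 3200 MB/s, i.e. 3.2 GB/s.
print(peak_bandwidth_mb_s(100, 4))
```

With one transfer per clock, the same calculation gives 800 MB/s for a 64-bit bus at 100 MHz, which shows how much of the Pentium 4 gain comes from quad pumping rather than from a faster clock.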
The multiprocessor-capable Xeon MP processor allows four processors to work together on a single front side bus. The L3 cache is available in 2 MB, 1 MB, or 512 KB versions. Using special chipsets, multiple groups of four processors can be combined into 8-way, 16-way, and 32-way systems. Each physical Xeon processor consists of two logical processors. With Hyper-Threading technology, the two logical processors can execute different tasks simultaneously using shared hardware resources. From a software or architecture perspective, this means operating systems and user programs can schedule threads to logical processors as they would on multiple physical processors.
Hyper-Threading Technology
A system with processors that use Hyper-Threading technology appears to software to have twice as many processors as it physically has. The two logical processors per chip can execute different threads simultaneously using shared hardware resources.
Figure: four physical processors (physical processor 1 through 4), each presenting two logical processors, LP-1 and LP-2.
From a software or architecture perspective, this means that operating systems and user programs can schedule threads to logical processors as they would on multiple physical processors. From a hardware perspective, instructions from both logical processors execute simultaneously on shared execution resources. The end result is a performance boost for multi-threaded and multi-tasked software.
Hyper-Threading can be switched off in RBSU for software that cannot profit from it. Otherwise, the operating system may run an idle loop on the unused logical processor that repeatedly checks for work to do, consuming significant execution resources.
Operating Systems and Hyper-Threading
A server with four physical processors may exceed the license limit of the OS if the OS cannot differentiate between physical and logical processors. Once Windows 2000 reaches the license limit, it uses only the number of processors supported by the OS license. In the example above, Windows 2000 Server would use only logical processors 1, 2, 3, and 4. Windows 2000 Advanced Server would use all eight. A four-processor license for Windows .NET Server would use all 8 logical processors. NetWare 5.0 and higher supports Hyper-Threading but requires a special driver (CPQMPK.PSM). Linux and Solaris 8 also support Hyper-Threading.
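The licensing behavior described above can be modeled as follows. This is a simplified sketch: the function name and the `ht_aware` flag are assumptions for illustration, and the license limits of 4 and 8 processors are inferred from the four-processor example in the text.

```python
def logical_processors_used(physical, license_limit, ht_aware):
    """Logical processors an OS will schedule on a Hyper-Threading system.

    ht_aware=False: the OS counts every logical processor against its
    license (the Windows 2000 behavior described in the text).
    ht_aware=True: the OS licenses physical processors and then uses
    both logical processors of each licensed one.
    """
    logical = physical * 2  # two logical processors per physical processor
    if ht_aware:
        return min(physical, license_limit) * 2
    return min(logical, license_limit)
```

For a four-processor server, a limit of 4 without Hyper-Threading awareness yields 4 logical processors, a limit of 8 yields all 8, and an aware OS with a four-processor license also yields all 8.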
Pentium III processor (Coppermine)
Second-generation Pentium III processors (code name "Coppermine") are based on 0.18-micron technology. They have 28 million transistors and use a new 370-pin packaging called FlipChip Pin Grid Array (FC-PGA). FC-PGA Pentium III processors are available with a 133 or 100 MHz FSB and a 256 KB 8-way set-associative L2 cache that runs at full processor speed. The Coppermine core is identical to the Coppermine core in a Slot-1 cartridge.
Service Issue
When the processor heat sink is removed and the same heat sink is reinstalled, the thermal contact between the processor and the heat sink is damaged. This causes the processor to overheat. To resolve this issue, install a new heat sink.
Pentium III processor (Tualatin)
The latest Pentium III processors are based on 0.13-micron technology (code name "Tualatin"). The Pentium III processor now has a 512 KB L2 cache and still uses the FC-PGA package. The package has an Integrated Heat Spreader (IHS) and is labeled FC-PGA2. The 370-pin zero insertion force socket (PGA370) is the same as the socket used for Coppermine processors. Coppermine and Tualatin processors are not compatible.
All Pentium III processors implement a Dynamic Execution microarchitecture, a unique combination of multiple branch prediction, data flow analysis, and speculative execution. The processor can execute MMX instructions for enhanced media and communication performance. Additionally, streaming single-instruction, multiple-data (SIMD) extensions increase the performance of floating-point and 3-D applications. Multiple low-power states can significantly reduce power consumption. The processor includes an integrated on-die 512 KB 8-way set-associative L2 cache. The L2 cache implements the Advanced Transfer Cache Architecture with a 256-bit wide bus. The processor also includes a 16 KB L1 instruction cache and a 16 KB L1 data cache. All caches run at full processor speed. The Tualatin has a cacheable memory space of 64 GB and allows systems with more than 4 GB of RAM. The 0.13-micron Pentium III processor uses a lower voltage on the front side bus than the 0.18-micron processors. As a result, a Tualatin processor with 512 KB L2 cache will not work in a previous-generation platform because of incompatible system bus signal levels.
ULV Version
The ultra low voltage (ULV) version of the Tualatin has a 100 MHz front side bus and is not compatible with the standard version of the Pentium III processor. It is used in systems that require ultra low power consumption (for example, notebooks and the ProLiant BL10e).
Pentium III Xeon processors
The Pentium III Xeon processor is based on the Pentium III core with a few additions. The L2 cache is 512 KB, 1 MB, or 2 MB and operates at full speed. The Pentium III Xeon uses the same SC330 package as Pentium II Xeon processors and has all the new features that were introduced with the Pentium III:
72 new instructions designed especially to enhance the performance of floating-point operations and to accelerate memory access.
Eight new 128-bit registers have been added to the IA-32 architecture.
The Pentium III Xeon requires different voltages for the processor core and the L2 cache.
Older Pentium II Xeon-based systems can be upgraded with Pentium III Xeon processors, which are supported by the existing PPMs (processor power modules), but the two processor types cannot be mixed.
The Pentium III has a fixed core speed multiplier; the Pentium III Xeon, however, still requires external setting of the core speed. Thus it is possible to mix Pentium III Xeon processors with different speeds.
The cache address limit is 64 GB, and the internal multiprocessor support is limited to four processors per GTL+ bus. 8-way systems have a total of three GTL+ buses and support two groups of four processors each.
The 100 MHz FSB has a bandwidth of only 800 MB/s and can be a severe performance bottleneck in 4-processor servers. This can be compensated for with a large L2 cache.
Pentium III Xeon processors (DP version)
The 800 MHz, 866 MHz, 933 MHz, and 1 GHz Pentium III Xeon processors are based on the Pentium III core. They do not have the typical Xeon features and support only two processors per system. The L2 cache size is 256 KB. The FSB speed is 133 MHz. In other words, this processor is a standard Pentium III processor in a Slot-2 cartridge instead of Slot-1 or FC-PGA packaging. The only differences between an 800 MHz Pentium III and an 800 MHz Pentium III Xeon processor are the packaging and the integrated processor power module, called On Cartridge Voltage Regulation (OCVR).
Caution Pentium III Xeon processors with a 133 MHz FSB are only supported in the ProLiant ML530. These processors have a gold-colored heat sink.
Processor Steppings
Processor steppings are versions of the same processor model that vary only slightly. Each stepping requires changes to the System ROM. For each processor stepping, Intel provides a microcode patch for inclusion in the System ROM. Within the System ROM there is a table where the patches are stored. HP continually adds newly released Intel patches to keep the ROMs up to date.
"Unsupported Processor" Message
When a processor is upgraded or the system board is replaced, the server may stop responding during POST (Power-On Self Test) and display the following message: "Unsupported Processor. System Halted". This happens when the System ROM does not recognize the stepping of the processor. The only solution is to upgrade the System ROM. The RomPaq diskette, however, will not boot after the error message has been displayed. The server must be set to disaster recovery mode.
Disaster Recovery Procedure
Some servers have a DIP switch labeled "Disaster Recovery". Other systems require the system configuration DIP switch to be set to 1=on, 4=on, 5=on, 6=on. This setting is not documented in some service manuals. After setting the appropriate switch, insert the RomPaq diskette (CD is not supported). After the system has been powered on, wait for the beep code that indicates the end of the ROM upgrade. This may take up to 5 minutes. There may be no video output on the monitor during disaster recovery.
Note Upgrade the System ROM before upgrading a processor to avoid the need to use Disaster Recovery.
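The relationship between the ROM patch table and the "Unsupported Processor" message can be pictured as a simple lookup. The sketch below is a conceptual model only; the stepping identifiers, patch names, and function names are hypothetical and do not reflect the actual System ROM layout.

```python
class UnsupportedProcessor(Exception):
    """Raised when the ROM has no microcode patch for a stepping."""


# Hypothetical patch table: stepping identifier -> microcode patch name.
ROM_PATCH_TABLE = {"stepping-A": "patch-01", "stepping-B": "patch-02"}


def post_check(stepping, patch_table):
    """Return the patch for this stepping, or halt as POST would."""
    try:
        return patch_table[stepping]
    except KeyError:
        raise UnsupportedProcessor("Unsupported Processor. System Halted") from None


def upgrade_rom(patch_table, new_patches):
    """Model a ROM upgrade: the newer ROM carries additional patches."""
    updated = dict(patch_table)
    updated.update(new_patches)
    return updated
```

In this model, a processor with an unknown stepping fails `post_check` until `upgrade_rom` has added its patch, which mirrors the advice to upgrade the System ROM before installing the new processor.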
Profusion 8-Way Architecture
The Profusion chipset joins the two processor buses, the I/O bus, and the two memory ports together through a crossbar switch. The otherwise independent processor and I/O buses are joined by a logical connection that is made only when required to transfer data. The AGTL+ bus running at 100 MHz can support a maximum of five loads per bus. This allows four processors and one connection to the memory controller on each processor bus, and up to four host-to-PCI bridges with a connection to the memory controller on the I/O bus. Each of the three AGTL+ buses has independent access to the two memory ports. This architecture prevents I/O traffic from consuming bandwidth on the processor buses. In addition, the use of 100 MHz buses and five independent paths allows the crossbar switch to deliver an aggregate instantaneous peak throughput of 4 GB/s. The following figure shows a block diagram of the 8-way SMP architecture.
Dual 100 MHz processor buses with dedicated 100 MHz I/O bus (AGTL+)
8-way multiprocessing with Pentium III Xeon processors
Multiported system architecture (five-point crossbar switch)
Dual-ported, interleaved memory
Uniform memory access for all eight processors
Dual cache accelerators and up to three host-to-PCI bridges
Up to 32 GB of synchronous dynamic random access memory (SDRAM)
Intel Itanium Processors
Large Memory Addressability
The number of lines (in bits) available to the address bus determines the maximum addressable memory size for a processor. With 64-bit addressing capabilities, the Itanium processor leapfrogs the memory addressing capabilities of the 32-bit processors that preceded it.
Intel 32-bit Processors
Previous 32-bit processors added four bits of Physical Address Extension (PAE) to translate between 32-bit linear addresses and 36-bit physical addresses. This allowed a theoretical maximum addressable memory size of 64 GB.
Intel Itanium Processor
The Itanium processor steps up to what is commonly referred to as large memory addressability. Though a theoretical maximum of about 16,000,000 TB could be reached with 64 address lines in the Itanium, chipset and space constraints allow only 44 physical address pins on the processor. Even with this limitation, the maximum addressable memory is 16 TB. The DL590/64 is the first ProLiant server to support the Intel Itanium IA64
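The memory limits quoted in this section follow directly from the number of address bits, because each additional bit doubles the addressable range. A short sketch of the arithmetic, using binary units (1 GB = 2**30 bytes, 1 TB = 2**40 bytes) and a function name chosen only for this example:

```python
GB = 2 ** 30
TB = 2 ** 40


def addressable_bytes(address_bits):
    """Bytes reachable with the given number of address lines."""
    return 2 ** address_bits


print(addressable_bytes(32) // GB)  # 4 GB with plain 32-bit addressing
print(addressable_bytes(36) // GB)  # 64 GB with the four extra PAE bits
print(addressable_bytes(44) // TB)  # 16 TB with 44 physical address pins
print(addressable_bytes(64) // TB)  # 16,777,216 TB with all 64 lines
```

The last value is the exact binary figure behind the rounded "16,000,000 TB" theoretical maximum.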
EPIC Explicitly Parallel Instruction Computing (EPIC) is a design philosophy. The Itanium architecture is based on EPIC which is a unique combination of innovative features such as predication, speculation and explicit parallelism. Speculation allows the compiler to schedule load instructions ahead of branches and stores to reduce memory latency. Predication eliminates branches and associated branch misprediction penalties. Parallelism enables the compiler to provide more information to the processor allowing it to execute multiple operations simultaneously on a sustained basis.
Though designed for optimal performance with 64-bit operating systems and software, the Itanium processor supports 32-bit binary compatibility in hardware and does not require software emulation.
Machine Check Architecture
Enhanced Machine Check Architecture provides advanced error detection, correction, and containment, which improves the processor's ability to contain and fix errors in the caches and on the system bus, reducing downtime.
Three Caches
The Itanium includes three levels of cache. L1 and L2 caches are integrated into the processor die. The L3 cache is off the processor die, on the cartridge, but runs at full processor frequency.
L3: 2 MB or 4 MB of unified, on-cartridge L3 cache organized as 4-way set-associative with 64-byte cache line size. Fully pipelined and optimized to provide fast access to data at a bandwidth of 12.8 GB/s using a 128-bit wide cache bus.
L2: The L2 cache is 96 KB, 6-way set-associative, and fully pipelined with 64-byte cache line size.
L1: The L1 cache is 32 KB (16 KB data and 16 KB instruction), 4-way set-associative, and fully pipelined with 32-byte cache line size.
Double-Pumped Data Bus
The Itanium processor is compatible with a double-pumped data bus. Double pumping a 133 MHz bus provides a bus speed of 266 MHz, enabling 64-bit system bus transactions between the system controllers and processors at 2.1 GB/s.
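The cache geometries listed above (size, associativity, and line size) determine how many sets each cache has, since the number of sets equals the size divided by the product of ways and line size. The sketch below works through that calculation; treating each 16 KB half of the L1 cache as its own 4-way cache is an assumption of the example.

```python
KB = 1024
MB = 1024 * KB


def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size is not a multiple of ways * line size")
    return sets


print(cache_sets(16 * KB, 4, 32))  # one 16 KB half of the L1 cache
print(cache_sets(96 * KB, 6, 64))  # L2 cache
print(cache_sets(2 * MB, 4, 64))   # 2 MB L3 cache
```

The 6-way organization is what lets the unusual 96 KB L2 size still divide into a power-of-two number of sets.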
Figure: points at which data is latched during a clock cycle on a normal bus and on a double-pumped bus.
A double-pumped bus performs twice as many transfers per clock cycle as a normal bus. Instead of sending, or latching, data on only one edge of the clock cycle, a double-pumped bus sends data on both the rising and the falling edge. With Itanium systems, there are two overlapping clock strobes, each operating 180 degrees out of phase with the other. Data is sent at each intersection of the two strobes, which happens twice per clock cycle.
Memory Subsystem
Memory stores information for future use. Random Access Memory (RAM) is defined as memory in which the data can be read by the processor, modified through processing, and then written back for storage. The amount of time required to either read data from or write data to memory is referred to as access time and is measured in nanoseconds. RAM types include FPM DRAM, EDO RAM, SDRAM, Rambus DRAM (RDRAM), DDR SDRAM, and SRAM. The following comparison illustrates the relative speed of the various technologies.
Speed comparison, from faster to slower: SRAM, DDR SDRAM, Rambus, SDRAM, EDO, FPM
Memory Packaging
Memory is used in several areas of the computer, including the main system, cache, and video. Systems use either SIMMs (Single In-Line Memory Modules) or DIMMs (Dual In-Line Memory Modules). These are small circuit boards on which integrated circuits (ICs) are mounted.
SIMMs (Single In-Line Memory Modules)
SIMMs were developed to be an easy way to upgrade and downgrade system memory. Modern PCs are designed for a larger 72-pin SIMM, and older system boards use a 30-pin SIMM. The additional pins allow each SIMM to deliver four bytes of data (plus parity) in every memory request. SIMMs are inserted at an angle and pushed back.
Incompatibility
SIMM connectors can be either gold or tin plated. Contact reliability can be affected if the different metal types are mixed, for example, by placing a tin-plated SIMM into a gold-plated memory socket. This metal mixing can cause accelerated corrosion, which results in bad connections and can ultimately cause system failure. Contacts must be the same: gold to gold and tin to tin.
SIMMs must be installed at the specified speed for the system. Mixing SIMMs of the specified speed with SIMMs of a lower speed can produce timing differences. When SIMMs of the specified speed are mixed with SIMMs of a higher speed, the higher-speed SIMMs run at the specified speed, not the higher speed. Parity and nonparity SIMMs should not be mixed.
DIMMs (Dual In-Line Memory Modules)
DIMMs are the next advancement in memory packaging. DIMMs with parity are 72 bits wide, while SIMMs are 36 bits wide with parity. DIMMs offer greater capacity. DIMMs are available in 5-volt and 3.3-volt versions. Systems are designed for a specific voltage, and the sockets and DIMMs are keyed to prevent installation of the wrong ones. Buffered DIMMs use a buffer to help reduce loading on the bus and improve signal quality at the DRAM. All address signals and most control signals are buffered. Data is not buffered. Unbuffered DIMMs have no buffering between the bus and DRAMs.
Figure: Dual Inline Memory Module (DIMM), showing the key positions for buffered and unbuffered modules and for 5 V and 3.3 V modules.
DRAM Technologies
Over the past few years, improvements in DRAM storage density have increased capacity from just 1 kilobit (Kb) per chip to 512 megabits (Mb) per chip. This improvement in storage capacity has reduced the number of DRAM chips required for a particular module capacity. Until recently, computer memory components operated at 5 volts, the industry standard. Today, computer memory components operate at 3.3 volts, which allows them to run faster and consume less power.
Memory Access Time
Memory access time is measured in billionths of a second (nanoseconds, ns). Although DRAM density has improved significantly over the last few years, DRAM speed has not kept pace with processor performance because there is a physical limit to how fast DRAM can handle data requests.
System Bus Timing
A system bus clock controls all computer components that execute instructions or transfer data. The smallest unit of time measured by the system bus clock is called a clock tick, or cycle. A complete clock cycle is measured from one rising edge to the next rising edge. The clock speed, or clock frequency, is measured in megahertz (MHz).
Components operate more efficiently when they are in sync, or synchronized, with the system bus clock. If a component is not synchronized (asynchronous) with the system bus clock, either the rest of the system or the component itself must wait one or more additional clock cycles for data or instructions because of clock resynchronization.
Memory Bus Speed
The speed of the DRAM is not the same as the true speed (or frequency) of the overall memory subsystem. The memory subsystem operates at the memory bus speed, which has the same frequency (in MHz) as the main system bus clock. The two main factors that control the speed of the memory subsystem are the memory timing and the maximum DRAM speed.
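Because memory access time is quoted in nanoseconds while bus speed is quoted in megahertz, it helps to convert between the two. The sketch below is illustrative; the function names and the example wait-cycle count are assumptions, not values from a specific system.

```python
def cycle_time_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def wait_penalty_ns(clock_mhz, wait_cycles):
    """Time lost when a component waits extra cycles to resynchronize."""
    return wait_cycles * cycle_time_ns(clock_mhz)


print(cycle_time_ns(100))       # one cycle of a 100 MHz bus, in ns
print(wait_penalty_ns(100, 2))  # cost of two wait cycles, in ns
```

At 100 MHz one cycle lasts 10 ns, so every additional wait cycle imposed on an asynchronous component adds another 10 ns to the transfer.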
SDRAM Technologies
The original DRAM took approximately six system bus clock cycles for each memory access. FPM, EDO, and SDRAM improved performance by automatically retrieving data from additional memory locations on the assumption that they too will be requested. FPM and EDO DRAMs are controlled asynchronously. When processor speeds were less than 66 MHz, FPM and EDO DRAMs were fast enough to keep pace. But as processors became faster, they had to wait more often for data from FPM and EDO DRAMs. SDRAM uses a clock to synchronize the input and output signals on the memory chip. This clock is synchronized with the system bus clock so that the memory chips and processor coordinate the execution of commands and the transmission of data.
SDRAM Features
SDRAM is the most prevalent memory being used in systems today. In addition to synchronous operation, SDRAM has other features that accelerate data retrieval: multiple memory banks, burst mode access, greater bandwidth, and registers.
Multiple Memory Banks
SDRAM divides memory into two to four banks for simultaneous access to more data. While one memory bank is being accessed, the other bank remains ready to be accessed. This allows the processor to initiate a new memory access before the previous access has been completed, resulting in continuous data flow.
Burst Mode Access
The architectural enhancement of SDRAM allows data to be accessed with each clock cycle after the initial request has been satisfied. SDRAM uses this process, called data bursting, to achieve greater data throughput.
Increased Bandwidth
The bandwidth (capacity) of the memory bus increases with its width (in bits) and its frequency (in MHz). By transferring 8 bytes (64 bits) at a time and running at 100 MHz, SDRAM increases memory bandwidth to 800 MB/s, 50 percent more than EDO DRAMs (533 MB/s at 66 MHz).
Registered SDRAM Modules
To achieve higher memory subsystem densities, registers have been added to memory modules. These registers isolate the module's heavily loaded address and control buses from the rest of the system. The fewer loads that the memory bus sees, the greater the amount of memory that can be added to the system.
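The benefit of burst mode access can be estimated by counting clock cycles. In the sketch below, the initial latency of five cycles is a hypothetical value chosen for illustration, as are the function names; the text states only that each access after the first completes in one clock cycle.

```python
def burst_cycles(words, initial_latency_cycles):
    """Cycles to read `words` consecutive bus-width words in one burst:
    the first word costs the initial latency, each later word one cycle."""
    if words < 1:
        raise ValueError("a burst transfers at least one word")
    return initial_latency_cycles + (words - 1)


def non_burst_cycles(words, initial_latency_cycles):
    """Cycles when every word pays the full initial latency."""
    return words * initial_latency_cycles


# Hypothetical example: 4 words with an assumed 5-cycle initial latency.
print(burst_cycles(4, 5))      # 5 + 3 = 8 cycles
print(non_burst_cycles(4, 5))  # 4 * 5 = 20 cycles
```

The longer the burst, the closer the average cost per word comes to a single clock cycle, which is the source of the higher throughput.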
Rambus DRAM
The Rambus design provides higher performance than traditional SDRAM because RDRAM transfers data on both edges of a synchronous, high-speed clock pulse. RDRAM is capable of operating at 800 MHz and providing a peak bandwidth of 1.6 GB/s. Current RDRAMs use the first generation of signaling technology, called Rambus Signaling Level (RSL), which allows data to be transferred on both edges of a synchronous clock pulse, effectively sending two bits every clock cycle. Quad Rambus Signaling Levels (QRSL), the next-generation technology, transfers two bits of data per clock edge, theoretically doubling peak bandwidth.
Double Data Rate SDRAM
Double Data Rate (DDR) SDRAM has the same core design as SDRAM with two basic differences: more advanced synchronization circuitry and a delay-locked loop. These allow data to be read on both the rising and falling edges of the clock, thus delivering twice the bandwidth of standard SDRAM without increasing the clock frequency. DDR SDRAM has peak data transfer rates of 1.6 and 2.1 GB/s at clock frequencies of 100 MHz and 133 MHz, respectively. Because of the different signaling technology, it is not possible to mix SDRAM and DDR SDRAM within the same memory subsystem. Although the specification is still being finalized, DDR II will be backward compatible with DDR SDRAM and will improve bus utilization to increase performance and bandwidth, yielding a theoretical peak bandwidth of 6.4 GB/s. DDR II is also expected to provide improvements in cost, power requirements, I/O, packaging, and clocking. This table summarizes the various types of DDR SDRAM and associated naming conventions.
Online Spare Memory mode for G2 servers
Online Spare Memory mode provides a higher level of memory protection than Standard Memory mode. Online Spare Memory is beneficial to businesses with sites that do not have sufficient IT staff available to service a failure, do not always have replacement memory on hand, or cannot bring down the server before a scheduled shutdown.
To enable Online Spare mode, customers use the ROM-Based Setup Utility (RBSU) at startup to designate bank C as Online Spare memory. For the ProLiant ML370 G2 and DL380 G2 servers, bank C must be populated before the server can be configured in Online Spare mode. Banks A and B are considered system memory, with a total capacity of 4 GB if 1-GB DIMMs are used; however, bank B does not have to be populated. The DIMMs installed in bank C must be of equal or greater capacity than those in the remaining banks. For example, if 512-MB DIMMs are used in bank A and 1-GB DIMMs are used in bank B, the DIMMs in bank C would have to be at least 1-GB DIMMs.
The next generation of the Online Spare implementation will not use a dedicated memory bank. Rather, the last populated bank will be the Online Spare bank. For example, if banks A and B are populated, the DIMMs in bank B can be used as Online Spare Memory. The memory socket configuration may also differ from the current generation product. Refer to the Setup and Installation Guide for memory socket configuration and Online Spare Memory population requirements.
In Online Spare mode, if a DIMM in bank A or B exceeds a predefined error threshold, an amber attention LED in front of the failed DIMM lights. The error is corrected, and the data from the entire bank that contains the failed DIMM is copied to the Online Spare memory bank. The failed bank is deactivated, but the server remains available until the customer can replace the failed DIMM during a scheduled shutdown.
Online Spare Memory mode for G3 servers
The Online Spare implementation for the ProLiant ML370 G3 and DL380 G3 servers does not require bank C to be populated. The Online Spare bank is always the last populated bank.
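The G2 population rule for Online Spare mode can be expressed as a short check. This is a sketch under stated assumptions: the function name is hypothetical, each argument is the capacity in MB of the DIMMs in that bank, and `None` stands for an unpopulated bank.

```python
def online_spare_allowed(bank_a_mb, bank_b_mb, bank_c_mb):
    """Check the G2 rule: bank C must be populated, and its DIMMs must be
    at least as large as the DIMMs in the remaining populated banks."""
    if bank_c_mb is None:
        return False  # bank C must be populated to act as the spare
    others = [mb for mb in (bank_a_mb, bank_b_mb) if mb is not None]
    return all(bank_c_mb >= mb for mb in others)
```

With 512-MB DIMMs in bank A and 1-GB DIMMs in bank B, `online_spare_allowed(512, 1024, 1024)` is true while `online_spare_allowed(512, 1024, 512)` is false, matching the example in the text.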
Advanced Memory Protection for ProLiant 500 series servers
The HP ProLiant 500 series servers come standard with a primary memory board. The primary memory board has eight DIMM sockets for a total capacity of 8 GB if 1-GB DIMMs are used in Standard Memory mode. The HP ProLiant ML570 G2 and ML530 G2 are examples of servers that use 2-way interleaving, while the ProLiant DL580 G2 is an example of a server that uses 4-way interleaving. In systems using 2-way interleaving, the sockets are organized into four banks (A, B, C, and D) with two sockets in each bank. Systems using 4-way interleaving are organized into two banks with four sockets each. The DIMMs must be installed in banks of four, one bank at a time, and the DIMMs in each bank must be of the same type and capacity for the system to operate properly.
No operating system support is required for this option. All software and drivers are in the system BIOS. With a single memory board, customers can also enable Online Spare Memory mode and Single-Card Memory Mirroring. Customers can purchase an optional memory board to increase the available memory in Standard or Online Spare Memory modes or to enable Hot Plug Mirrored mode. The following sections explain the memory protection options for both single-board and dual-board configurations.
Online Spare Memory mode (Single memory board configuration)
Using RBSU, customers can designate bank D as Online Spare Memory and designate the remaining banks (A, B, and C) as system memory. Bank D on the primary memory board is always the Online Spare bank, even if the optional memory board is also installed. Bank D must be populated before the server can be configured in Online Spare mode. If one of the DIMMs in banks A, B, or C reaches a predefined error threshold, the system copies the data from the entire memory bank that contains the failed DIMM to the Online Spare Memory bank.
The system then deactivates the failed bank and illuminates the memory board LED indicator in front of the failed DIMM. HP Insight Manager provides system warnings on the monitor or by other means, such as paging. This operation maintains server availability and memory reliability without service intervention. The DIMM that exceeded the error threshold can be replaced at the customer's convenience during a scheduled shutdown.
Online Spare Memory mode (Dual memory board configuration) The ProLiant ML570 G2 and DL580 G2 servers support dual memory boards. Using dual memory boards in Online Spare mode, users can increase system memory up to 16 GB and maintain a higher level of memory protection than with Standard Memory mode. If the optional memory board is installed prior to booting the server, bank D on the primary memory board can still be designated as the Online Spare bank using RBSU. Using Online Spare mode in a 2-way interleaving configuration, the server can support up to 2 GB of Online Spare memory in bank D on the primary board and up to 14 GB of system memory in the remaining banks (using 1-GB DIMMs).
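The 14 GB and 2 GB figures above follow from simple socket arithmetic. The sketch below reproduces them under the assumption that every socket is populated with DIMMs of the same size and that exactly one bank is reserved as the Online Spare bank; the function and parameter names are illustrative only.

```python
def online_spare_capacity_gb(boards, banks_per_board, sockets_per_bank, dimm_gb):
    """Return (system_memory_gb, online_spare_gb) for a fully populated server.

    Assumes identical DIMMs in every socket and a single bank (bank D on
    the primary board in the text) reserved as Online Spare memory.
    """
    total = boards * banks_per_board * sockets_per_bank * dimm_gb
    spare = sockets_per_bank * dimm_gb  # one whole bank is held in reserve
    return total - spare, spare


# Dual boards, 2-way interleaving: four banks of two sockets per board, 1-GB DIMMs.
print(online_spare_capacity_gb(boards=2, banks_per_board=4, sockets_per_bank=2, dimm_gb=1))
```

For the dual-board, 2-way interleaved case this yields 14 GB of system memory and 2 GB of Online Spare memory, matching the figures in the text.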
Systems with 4-way interleaving have only two memory banks per board (four sockets per bank) and can therefore only support failover from a maximum of three banks to the Online Spare bank. Mirrored memory mode Mirrored Memory mode is a fault-tolerant memory option that provides a higher level of availability than Online Spare Memory. Online Spare Memory mode protects against single-bit errors and entire DRAM failure, but Mirrored Memory mode provides full protection against single-bit and multi-bit errors. For this reason, Mirrored Memory mode is beneficial to businesses that cannot afford downtime and cannot risk waiting until scheduled downtime to replace degraded memory modules. Mirrored memory mode single memory board configuration (non-hot plug) Customers can enable Mirrored Memory mode using the primary memory board that comes standard with the server. This capability provides customers with full protection against single-bit and multi-bit errors using a single memory board. Customers can designate up to two banks (C and D) as mirrored memory. Servers operating in Mirrored Memory mode with a single memory board can support up to 4 GB of system memory (and an equivalent amount of redundant memory) using 1-GB DIMMs. To enable Mirrored Memory mode in a server with 2-way interleaving (ProLiant ML570 G2 or ML530 G2), banks A and B must be configured identically to banks C and D, respectively. To enable Mirrored Memory mode in a server with 4-way interleaving (ProLiant DL580 G2), bank A must be configured identically to bank B. The same data is written to both system memory and mirrored memory banks, but data is read only from the system memory banks. If a DIMM in the system memory banks experiences a multi-bit error or reaches the pre-defined error
threshold for single-bit errors, banks C and D are automatically designated as system memory and banks A and B are designated as mirrored memory. Data is still written to the system and mirrored memory banks, but it is read only from the system memory banks. This will allow continuous operation and maintain the level of server availability except in the highly unlikely case of a simultaneous error in exactly the same location on a DIMM and its mirrored DIMM. The system illuminates the memory board LED indicators of the DIMMs in the bank that experienced the multi-bit error. These DIMMs can be replaced at the customer's convenience during a scheduled shutdown.
Mirrored memory mode dual memory board configuration Hot Plug Mirrored Memory mode uses the optional memory board to provide complete redundancy and a higher level of memory protection than Online Spare mode. Hot Plug Mirrored Memory also provides hot-add and hot-replace capability to increase server availability. Hot-add allows the customer to increase memory capacity by adding DIMMs to open slots, while hot-replace allows a customer to replace a failed DIMM while the system continues to operate. This capability is especially useful for businesses that cannot afford downtime and cannot risk waiting until scheduled downtime to replace degraded memory modules. Servers operating in mirrored memory mode support up to 8 GB of system memory (and an equivalent amount of redundant memory) using 1-GB DIMMs. To enable Hot Plug Mirrored mode, the two boards must be configured identically. The same data is written to both boards, but data is only read from the primary board.
Hot Plug Mirrored Memory configuration requirements
For hot-plug support, the second memory board must meet the following requirements:
- Same number of memory banks populated as the first board.
- Same amount (total capacity) of memory in each bank as the first board.
- Same type of memory in each bank as the first board (single-sided or double-sided).

If a DIMM on the primary board experiences a multi-bit error or reaches the error threshold for single-bit errors, the data is read from the optional board. This will enable the customer to hot-replace the failed DIMMs on the primary board without shutting down the server. HP will use Hot Plug Mirrored Memory along with Advanced ECC to provide protection against all memory errors except in the highly unlikely case of a simultaneous error in exactly the same location on a DIMM and its mirrored DIMM.
On the membrane of the memory board, a Ready to Hot Plug light will indicate when it is safe to remove one of the memory boards. When the light is green, the user can remove either memory board with the following restrictions:
- If no errors have occurred, either board can be removed.
- If one of the memory banks has a failure, the user can only remove the board that contains the failed bank.
- If both boards have a failed bank, the user cannot remove either board. While this type of failure is highly unlikely, this restriction will protect the customer from entering a risky configuration with one memory board that has known multi-bit errors. The server must be shut down to service this type of failure.

Hot-plug capabilities
The ProLiant 500 Series servers feature hot-add functionality, which allows the customer to increase memory capacity by adding DIMMs to open slots. Hot-add capability requires support from the operating system to recognize the additional memory that is installed. Microsoft Windows Server 2003 supports hot-add capability in the HP ProLiant 500 Series servers.
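The Ready to Hot Plug removal restrictions described earlier in this section can be expressed as a short decision function. This is a sketch of the stated rules only, not HP firmware logic; the function name and board labels are hypothetical, and treating a non-green light as "remove nothing" is an assumption drawn from the statement that the light indicates when removal is safe.

```python
def removable_boards(ready_light_green, board_has_failed_bank):
    """Return the sorted list of memory boards that may be hot-removed.

    ready_light_green:     True when the Ready to Hot Plug light is green.
    board_has_failed_bank: hypothetical mapping of board name -> True if a
                           bank on that board has failed.
    """
    if not ready_light_green:
        return []  # assumption: never remove a board unless the light is green
    failed = sorted(b for b, bad in board_has_failed_bank.items() if bad)
    if not failed:
        return sorted(board_has_failed_bank)  # no errors: either board
    if len(failed) == 1:
        return failed  # only the board that contains the failed bank
    return []  # both boards failed: the server must be shut down


print(removable_boards(True, {"primary": True, "optional": False}))
```

In the printed example only the primary board may be removed, because it is the board that contains the failed bank.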
Advanced Memory Protection for ProLiant 700 Series servers
HP Hot Plug RAID Memory is available for the ProLiant 700 Series servers (such as the ProLiant DL740 and ProLiant DL760 G2). HP Hot Plug RAID Memory allows the memory subsystem to operate continuously, even in the event of a complete memory device failure. RAID, in this case, stands for Redundant Array of Industry-standard DIMMs, which should not be confused with the Redundant Array of Independent Disks (RAID) schemes used for hard disk drive storage. While HP Hot Plug RAID Memory is conceptually similar to RAID Level 4 disk storage technology, there are some key performance and implementation differences, which are described next.

HP Hot Plug RAID Memory
Hot Plug RAID Memory is conceptually similar to RAID Level 4 in that it generates parity for an entire cache line of data during write operations and records the parity information on a dedicated parity cartridge. This parity information is checked during read operations.
This is where the similarity between HP Hot Plug RAID Memory and RAID disk storage technology ends. Hot Plug RAID Memory does not have the mechanical delays of seek time and rotational latency associated with disk drive arrays. Storage subsystem arrays use a single bus to write the stripes sequentially across multiple drives. In contrast, Hot Plug RAID Memory uses parallel, point-to-point connections to write data simultaneously across multiple memory cartridges. Also, Hot Plug RAID Memory eliminates the write bottleneck associated with typical storage subsystem RAID implementations. In a storage array, the RAID controller generally performs a read operation of existing parity before a write operation can be completed. If a dedicated parity drive is being used, a bottleneck occurs. However, because Hot Plug RAID Memory usually operates on an entire cache line of data, there is no need to read existing parity before a write operation. Therefore, no performance bottleneck occurs.
HP Hot Plug RAID Memory Operation How does HP Hot Plug RAID Memory work? Servers with HP Hot Plug RAID Memory use five memory controllers to control five memory cartridges. Each cartridge can hold up to eight industry-standard DIMMs. When the memory controllers need to write data to memory, they split the data into four blocks and write them to four of the memory cartridges. A RAID engine calculates parity information, which is stored on the fifth cartridge. With the four data cartridges and the parity cartridge, the data subsystem is completely redundant such that if the data from any DIMM is incorrect or any cartridge is removed, the data can be recreated from the remaining four cartridges.
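The four-data-plus-one-parity layout described above can be illustrated in a few lines. The text does not state how the RAID engine computes parity; this sketch assumes bytewise XOR parity, as is typical of RAID Level 4, and all function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte blocks (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_cache_line(line):
    """Split a cache line into four data blocks and append one parity block."""
    if len(line) % 4:
        raise ValueError("cache line length must be divisible by 4")
    size = len(line) // 4
    data = [line[i * size:(i + 1) * size] for i in range(4)]
    return data + [xor_blocks(data)]  # index 4 plays the parity cartridge


def rebuild(cartridges, missing):
    """Recreate the block of one missing cartridge from the remaining four."""
    survivors = [blk for i, blk in enumerate(cartridges) if i != missing]
    return xor_blocks(survivors)


cartridges = write_cache_line(bytes(range(32)))
# Any single cartridge, data or parity, can be recreated from the other four.
print(all(rebuild(cartridges, i) == cartridges[i] for i in range(5)))
```

Because XOR is its own inverse, the same `rebuild` routine recovers either a data block or the parity block, which is why the loss of any one cartridge is tolerated in this model.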
Hot-plug capabilities The redundancy in HP Hot Plug RAID Memory allows customers to hot-replace, hot-add, and hot-upgrade DIMMs without shutting down the server. Hot replace is replacing a failed DIMM while the system continues to operate. HP Hot Plug RAID memory offers hot-replace capability in a driverless implementation that requires no support from the operating system. Servers will have hot-replace capability directly out of the box, regardless of the operating system. Hot-add and hot-upgrade capabilities allow customers to scale the server's available memory. Hot-add allows the customer to increase memory capacity by adding DIMMs to open slots. Hot-upgrade allows the customer to replace smaller capacity DIMMs with larger capacity DIMMs. Hot-add and hot-upgrade capabilities require support from the operating system to recognize the additional memory that is installed. Microsoft Windows Advanced Server, Windows Data Center, Novell NetWare 6.0, and SCO UnixWare 7.1.2 will support these capabilities in the HP ProLiant 700 Series servers. HP is working with other operating system vendors to ensure that these capabilities will be supported in their future releases. When a hot-plug operation is completed, HP Hot Plug RAID Memory automatically rebuilds the data across all the memory cartridges. The process to rebuild 4 GB of memory takes less than 30 seconds.
Online Spare Memory Configuration
Configuration procedure:
1. It is highly recommended that you test new memory when first adding it to the system. Follow these three steps:
   a. Under Advanced Options in the ROM-Based Setup Utility (RBSU), change the Post Speed Up setting to disabled (it is enabled by default).
   b. Make sure that Online Spare Memory is disabled (it is by default). This option is also in RBSU under Advanced Options and Advanced Memory Protection.
   c. Reboot. All the memory will be tested. This may take a few minutes, depending on how much memory is installed in your system. Once the memory has been tested, you can enable Post Speed Up again for faster system boot.
2. Once the memory has been tested, power down the system and make sure that bank C is populated with memory no smaller than either bank A or B.
3. Power on your server. Online Spare Memory is disabled by default; therefore, all the memory is initially counted and configured as available primary memory.
4. At the prompt, press F9 to enter RBSU.
5. From the RBSU main menu, select Advanced Options.
6. Using the arrow key, move down and select Advanced Memory Protection.
7. To activate Online Spare Memory, highlight Online Spare and press Enter. Once you press Enter, your choice is saved. (The default option is Standard ECC, giving maximum memory size for applications that require large memory.)
8. Press ESC twice to go back to the main RBSU menu.
9. Press F10 to exit RBSU. Your server will automatically reboot.

As your server reboots after Online Spare Memory is enabled, it will display the following message:
xxxxMB System Memory and xxxxMB memory reserved for Online Spare

Note: If the memory size requirements for proper operation are not met, RBSU will not allow you to enable Online Spare Memory and will display the message:
Caution: Current memory configuration does not support Online Spare. See documentation.
Online Spare Memory Troubleshooting
The system will inform you when the ECC threshold has been exceeded by the following means:
1. Integrated Management Log
   The IML will have the following entry:
   Online Spare Memory Engaged for Faulty Module (Slot x, Memory Module y)
2. OS Console
   Depending on your OS, the console will display one of the following messages:
   a. NT/Windows 2000: The System Management Driver has determined that memory module x in slot n has exceeded the memory error threshold and Online Spare Memory has been engaged.
   b. NetWare: CPQHLTH: Excessive ECC memory errors detected and automatically corrected. Online-Spare Memory engaged.
   c. UNIX / Linux: Excessive ECC memory errors detected and automatically corrected. Online-Spare Memory engaged.
3. LEDs
   The following LEDs will light:
   a. The amber LED next to the failed DIMM inside the server. The LED will stay on, to signify which DIMM has exceeded the single-bit error threshold, until the system is rebooted.
   b. The Internal Health LED on the front panel will light amber to signify the ECC error and switchover.
4. Insight Manager
   Insight Manager will display Degraded or Failed status under the Advanced Memory Protection section.
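When reviewing exported log text, the IML entry quoted above can be located with a regular expression. This is a minimal sketch: it assumes the log is available as plain text and that the slot and module placeholders (x and y) are simple alphanumeric tokens; the function name is hypothetical.

```python
import re

# Pattern built from the IML entry text quoted in this section.
IML_PATTERN = re.compile(
    r"Online Spare Memory Engaged for Faulty Module "
    r"\(Slot (\w+), Memory Module (\w+)\)"
)


def find_engaged_modules(log_text):
    """Return (slot, module) pairs for every Online Spare entry in log_text."""
    return [(m.group(1), m.group(2)) for m in IML_PATTERN.finditer(log_text)]


sample = "Online Spare Memory Engaged for Faulty Module (Slot 1, Memory Module 3)"
print(find_engaged_modules(sample))
```

The returned slot and module values identify which DIMM to replace during the next scheduled shutdown.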
Power Subsystem
The power subsystem includes everything related to power, thermal issues and adequate airflow. It sometimes helps when isolating a failure to think of this subsystem in terms of two groups: everything related to power internal to the system and everything related to power external to the system. The power supply is switch controlled.

Hot Pluggable Power Supplies
Built-in hot plug power supply support allows users to insert or remove power supplies in fault tolerant configurations while the system is still up and running.
- Embedded microcontroller
- Automatic load sharing
- Automatic line sensing
- Independent line cord
- Hot plug, N+1 redundant
- All failure conditions sent to IMD and CIM
- Common design throughout workgroup servers
- Common design throughout high end servers
Systems with traditional power supplies do not perform a power supply self-test. With intelligent power supply technology, the microcontroller performs a self-test upon startup that checks the power supply temperature sensors, RAM integrity, ROM revision, analog to digital (A/D) and digital to analog (D/A) accuracy, and non-volatile memory integrity. In case of a failed self-test, the power supply will not enable and will indicate failure by flashing an amber status LED. The inclusion of a self-test at system startup greatly increases system reliability. A system administrator can now discover possible power supply problems before a system runs and performs functions. This could prevent the power supply from failing during a critical function. For example, if the D/A accuracy was outside tolerances the power supply status LED would indicate a failure.
Hot-pluggable power supply assemblies can be identified by a port wine colored removal and insertion latch assembly.
Power supply LEDs: AC Power and DC Power (LED states shown include Green and Off). Conditions indicated:
- AC power is connected to this power supply.
- Fault detected in this power supply.
- Failed self-test.
- Power supply failed to restart after a prolonged fault.
- Power supply will restart within 20 seconds.
- DC power not switched on or interlock open.
Hot Pluggable Fans
Redundant fans ensure proper airflow around temperature-sensitive components in case of fan failure. Server fans speed up as the temperature rises and alert the operating system through Insight Manager if the temperature approaches a critical point. Hot-pluggable, redundant fans are standard on today's servers. Like other hot plug components, these fans can be individually powered down and replaced in the event of a failure, while the redundant fan takes over. This helps ensure that a fan failure will not take the server down.

Fan Status
Check the fan LEDs to determine fan status.
LED     Fan Status
Green   Power to fan. Fan OK.
Amber   Replace fan.
Off     No power to fan. Ensure fan is properly seated. Ensure power to fan is good. Replace fan.
Redundant Processor Power Modules (PPMs) or Voltage Regulator Modules (VRMs) A processor requires tightly controlled power from a dedicated power supply. If a power supply module supporting a processor fails, the system goes down. To prevent that, ProLiant servers have either three processor power supply modules to support every two processors (two active and one redundant) or fully redundant power modules. If one power module fails, the redundant power module takes over operation without interrupting system operation. Some redundant PPMs are actually multiple physical PPMs. Some, such as the one pictured here, have two PPMs on one physical board. If one PPM fails, the second one takes over.
Input/Output Subsystem
I/O devices link the user with the system. I/O devices can be uni-directional or bidirectional.

PCI Hot Plug
HP PCI Hot Plug technology enables the removal and replacement of PCI controllers without shutting down the system or interfering with other controllers on the PCI bus. The operating system, the system hardware, and the device driver must all support PCI Hot Plug for this function to be used. This is currently available with Microsoft Windows NT, Microsoft Windows 2000/2003, and Novell IntranetWare. SCO's UnixWare operating system also gives administrators full hot-plug capability.

The first generation of hot-pluggable PCI slots required a utility to turn off the driver in the operating system. The second generation of slots turns off the driver when the slot is powered down. The utility is provided whether first or second generation slots are in the server.

PCI Hot Plug systems incorporate the following features that differentiate them from conventional systems:
- Advanced system circuitry that permits software control of the PCI Hot Plug slots
- LED status indicators for each PCI Hot Plug slot that indicate if a slot has power, and if the device driver reported an attention condition
- Slot release levers that automatically disable power to the hot plug slot when opened
- Wider PCI slot spacing and dividers between hot plug slots that permit safe insertion and removal of controllers, while avoiding contact with active adjacent PCI options
- Each hot plug slot can be isolated from the PCI bus, allowing uninterrupted service on adjacent adapters
- Adapter locks prevent removing adapters while they are powered
- Backward compatible with existing PCI cards
- Networking must be installed in order to use hot plug PCI, because RPC calls are used
Board Slot Status The LEDs at each expansion slot indicate the board slot status.
What the Slot LEDs Indicate

Power is currently applied to the slot. Do not open the slot release lever. The slot is functioning normally.

Power is currently applied to this slot, but the slot needs attention, such as when there is a problem with the slot, the board, or the driver. Do not open the slot release lever. Follow these steps:
1. Through the PCI Hot Plug application, turn power off to the slot (the green LED turns off).
2. Open the slot release lever (the amber LED turns off).
3. Remove or replace the board.
4. Connect the cables to the PCI board.
5. Close the slot release lever.
6. Return power to the slot through the PCI Hot Plug application (the green LED turns on).

Power to this slot is turned off, but this slot needs attention, such as when there is a problem with the slot, the board, or the driver. Follow these steps:
1. Open the slot release lever (the amber LED turns off).
2. Remove or replace the board.
3. Connect the cables to the PCI board.
4. Close the slot release lever.
5. Return power to the slot through the PCI Hot Plug application (the green LED turns on).

The power to the slot is off. If you need to replace the card in this slot, follow these steps:
1. Open the slot release lever.
2. Remove or replace the board.
3. Connect the cables to the PCI board.
4. Close the slot release lever.
5. Return power to the slot through the PCI Hot Plug application (the green LED turns on).
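The slot states and procedures listed above reduce to a small lookup keyed on two conditions: whether the slot has power and whether it needs attention. The sketch below models only what the table states; the function name and the boolean inputs are hypothetical, and returning an empty list for a normally functioning slot is an assumption (the table gives no procedure for that state).

```python
COMMON_STEPS = [
    "Open the slot release lever.",
    "Remove or replace the board.",
    "Connect the cables to the PCI board.",
    "Close the slot release lever.",
    "Return power to the slot through the PCI Hot Plug application.",
]


def slot_procedure(power_on, needs_attention):
    """Return the service steps for a PCI Hot Plug slot in the given state."""
    if power_on and not needs_attention:
        # Functioning normally: the table only says not to open the lever.
        return []
    if power_on:
        # Powered slots must be powered down in software before the lever is opened.
        first = "Through the PCI Hot Plug application, turn power off to the slot."
        return [first] + COMMON_STEPS
    return list(COMMON_STEPS)


for step in slot_procedure(power_on=True, needs_attention=True):
    print(step)
```

The only difference between the powered and unpowered procedures is the initial software power-off step, which is why the remaining steps are shared.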
PCI Hot Plug with Microsoft Windows NT On Microsoft Windows NT servers, installation of the hot-plug user interface creates a new icon in the Control Panel called Hot-Plug. This utility can also be accessed through a shortcut in the Administrative Tools folder. The utility provides a means for managing the hot-plug PCI slots on the local server and on remote nodes. A built-in filter permits the user to select the chassis and slots being viewed. The hot-plug utility provides information about the controllers plugged into the hot-plug slots, such as card location, board specific information, driver name, duplex status, and board status. The administrator can use the hot-plug utility to perform the following maintenance tasks:
- Turn the power to individual slots off and on to permit controller replacement
- View the properties page(s) for the controllers
- Mark devices failed when they are suspect and remove that status once repaired
- Run diagnostics on the controllers to determine their current status
PCI Hot Plug with Novell IntranetWare The PCI Hot Plug architecture takes advantage of the inherent modularity of IntranetWare to minimize the changes required of third party adapter card software. The system relies on a new central component, the Novell Event Bus, which facilitates communications between the different software modules. The Event Bus is first implemented as a NetWare Loadable Module (NLM), allowing implementation of PCI Hot Plug on existing versions of IntranetWare. These components include:
- Novell Event Bus (NEB)
- Novell Configuration Manager (NCM)
- OEM Specific System Bus Driver (SBD)
- Novell Configuration Manager Console (NCMCON)
- CPQHLTH.NLM
- Device Drivers
  - ODI-Compliant network adapters
  - NWPA-Compliant storage adapters
  - Other Adapters
- Installation Tools
Hot Pluggable Drives Hot-pluggable drive support allows easier servicing and high availability. Built-in hot plug drive support allows users to insert or remove drives in fault tolerant configurations while the system is still up and running. Inserting new hot plug drives is necessary for on-line capacity expansion. Removing hot plug drives is required when a disk drive fails and needs to be replaced.
LED            Status     Meaning
On-Line        ON         Hard drive online. Power to hard drive. Do not remove hard drive.
               Flashing   Hard drive being rebuilt. Do not remove hard drive.
               OFF        Hard drive off.
Drive Access   Flashing   Drive is being accessed.
               OFF        Drive is not being accessed.
Drive Failure  ON         Problem with hard drive. Replace drive.
               OFF        Hard drive functioning normally.
The following illustration gives LEDs on the front of LVD drives and their meanings:
Indicators (including the Fault indicator; states shown are Off, On, and X) and their meanings:
- OK to remove drive if not part of a fault-tolerant configuration
- OK to remove failed drive
- Drive is online, do not remove
Network Interface Controllers All HP network device drivers have integrated error recovery features that allow the drivers to detect failure events and recover from these errors. The drivers can reset the NIC and continue running, usually without noticeable interruption, after the following types of errors:
- Adapter check interrupt: When the hardware detects a problem, a detailed console error message is generated and an immediate attempt to recover begins.
- Link status change: Link status changes occur when a cable is unplugged or there is a hub problem. If a fatal link status change occurs, the driver attempts to recover from it.
- Transmit integrity check failure: If the driver receives indication that the interface integrity is compromised (by a cable or hub failure, for example), it reports the failure and attempts to recover.
An RJ45 connection is used on a network interface controller for 100TX. HP network interface controllers support a network speed of 1000 Mb/s.

Optional Redundant NIC Support
Under Windows NT 4.0, Novell IntranetWare, and SCO UNIX, NICs can be installed in redundant controller pairs, sharing a driver. For example, dual-port Fast Ethernet network interface controllers can support redundant NICs. When the device driver detects an error on the NIC and cannot effect recovery, the driver switches the roles of the active and standby interfaces (standby becomes active) without interruption of service, allowing conveniently scheduled replacement of the failed controller. In systems with hot plug capability, the failed NIC can be replaced without shutting down the system.
- Same subnet
- Network and MAC (Media Access Control) address failover
- Detection of adapter and cabling faults
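The role switch performed by the shared driver for a redundant NIC pair can be modeled as a simple swap. This is a conceptual sketch only, not the HP driver's implementation; the class, method, and port names are hypothetical.

```python
class RedundantNicPair:
    """Toy model of an active/standby NIC pair that shares one driver."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby
        self.failed = []  # interfaces awaiting scheduled replacement

    def report_unrecoverable_error(self):
        """Swap roles when the active NIC cannot be recovered by a reset."""
        self.failed.append(self.active)
        # Standby becomes active; the failed NIC takes the standby role.
        self.active, self.standby = self.standby, self.active
        return self.active


pair = RedundantNicPair(active="port1", standby="port2")
print(pair.report_unrecoverable_error())
```

After the swap, traffic continues on the former standby interface while the failed controller waits for replacement, which on hot plug systems does not require a shutdown.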
CPQSET
The CPQSET installation utility allows the user to run initial diagnostics and configure NIC teams. A CPQSET icon is usually placed in the Control Panel when an HP NIC driver is installed.
Software Subsystem
The following components make up the software subsystem:
- Operating System/Network Operating System
- Applications
- Insight Manager
- Device Drivers
- User data files
- Tools and Utilities
- Systems and Options ROMPaq
- ROM-based Configuration Utility
- Array Configuration Utility
- Diagnostics
- Virus Protection
1. Fault Prevention: Predicts and avoids failures
2. Fault Tolerance: Keeps server running in event of component failure
3. Rapid Recovery: Quickly and automatically recovers from critical failures

Figure labels: Insight Manager, ECC Memory, RAID, ASR
Controller Duplexing
Some operating systems support controller duplexing, a fault tolerance feature that requires two SMART Array Controllers. With duplexing, the two controllers each have their own drives that contain identical data. In the unlikely event of a Controller failure, the remaining drives and Array Controller service all requests. Controller duplexing is not the same as duplexing the SCSI buses on a single SMART Controller. Controller duplexing is a function of the operating system and takes the place of other fault tolerance methods. Refer to the documentation included with the operating system for implementation. HP recommends using hardware-based fault tolerance instead of controller duplexing. Hardware-based fault tolerance provides a much more robust and controlled environment for fault tolerance protection. If controller duplexing is used, configure each SMART Controller with RAID 0 to achieve maximum storage capacity. In addition, the following fault-tolerant features will not be available: Online Spare, Auto Reliability Monitoring, Interim Data Recovery, and Automatic Data Recovery.
- Perform an automatic restart in case of a system lockup, thermal issue, or UPS activation.
- Switch to a recovery server in the event of system failure.
- Send notification to a pager when ASR has been activated.
- Allow remote control of the server through a serial port, network connection, or a remote insight board that has an onboard modem.
ASR-2 can be configured to page an administrator when the system restarts. ASR-2 depends on the application and driver that routinely notify the ASR-2 hardware of proper system operation. If the time between ASR-2 notifications exceeds the specified period, ASR-2 assumes a fault has occurred and initiates the recovery process.
ASR-2 (Log-Reboot-Analyze): Reboots server after a H/W or S/W failure
1. Logs error to the Critical Error Log
2. Resets the server
3. Pages the administrator
4. Tests devices automatically, deallocates bad components
5. Reboots server
6. If server reboot is successful, pages a second time

- Software Error Recovery automatically restarts the server after a software-induced server failure
- Environmental Recovery allows the server to restart when temperature, fan, or AC power conditions return to normal
Unattended Recovery
For unattended recovery, ASR-2 logs the error information to the Critical Error Log, resets the server, pages the system administrator (if a modem is present and paging is selected), and tries to restart the operating system. Often the server restarts successfully, making unattended recovery the ideal choice for remote locations where trained service personnel are not immediately available. ASR-2 tries to restart the server up to 10 times. If ASR-2 cannot restart the server within 10 attempts, it places a critical error in the Critical Error Log, starts the server into HP Utilities, and enables remote access if configured. ASR-2 must be configured to load the operating system after restart.
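The unattended recovery sequence above is essentially a bounded retry loop. The sketch below models that control flow only; it is not ASR-2 firmware, and the function name, the `try_restart` callback, and the log strings are all hypothetical.

```python
MAX_RESTART_ATTEMPTS = 10  # limit stated in the text


def unattended_recovery(try_restart, max_attempts=MAX_RESTART_ATTEMPTS):
    """Model ASR-2 unattended recovery.

    try_restart: hypothetical callback taking the attempt number and
                 returning True if the operating system restarted.
    Returns (final_boot_target, event_log).
    """
    log = ["error information logged to Critical Error Log"]
    for attempt in range(1, max_attempts + 1):
        if try_restart(attempt):
            log.append(f"operating system restarted on attempt {attempt}")
            return "operating_system", log
    # All attempts failed: record a critical error and fall back to HP Utilities.
    log.append("critical error logged: restart attempts exhausted")
    return "hp_utilities", log


target, events = unattended_recovery(lambda attempt: attempt == 3)
print(target, len(events))
```

A server that never restarts successfully ends in HP Utilities after the tenth attempt, where remote access can be enabled if it has been configured.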
Attended Recovery
For attended recovery, ASR-2 performs the following actions:
- Logs the error information to the Critical Error Log
- Resets the server
- Pages (if a modem is present and Paging is selected)
- Starts HP Utilities from the hard drive
- Enables remote access
During system configuration, these utilities are placed on the system partition of the hard drive. If dial-in access has been configured and a modem with an auto-answer feature is installed, the system administrator can dial in and remotely diagnose or reconfigure the server. If HP Utilities has been configured for network access, the utilities can be accessed over the network. Insight Manager can be used for dial-in or network access.
Remote Options
Remote Options enables you to remotely control the server through a modem or a network. Most of the options are self-explanatory, but those that are ambiguous in meaning are explained below:
- Serial interface: The communications port for the modem that is used by Server Failure Notification and Remote Options. COM1 and COM2 are the only available selections.
- Network status: Enable remote control of the server through the network.
- Network frame type: Make sure this option is set correctly for your network; otherwise, no remote communication will occur. Ethernet II is the selection that will work on a standard Microsoft TCP/IP network.
NOTE: In Remote Options, the modem and network access should not be used at the same time. The remote connection function may not work properly when both are enabled.
Hardware Requirements
To use ASR-2 over a modem, you need the following:
- HP modem or optional Hayes modem; the communication parameters must be set for 8 data bits, no parity, and 1 stop bit
- System Configuration Utility, version 2.24 or later, and Diagnostics Utility installed on the system partition of the hard drive
- ASR-2 configured to load HP Utilities after restart
ASR-2 Security
The standard HP password features function differently during ASR-2 than during a typical system startup. During ASR-2, the system does not prompt for the Power-On Password. This allows ASR-2 to restart the operating system or HP Utilities without user intervention.

To maintain system security, set the server to boot in Network Server Mode (an option in the System Configuration Utility). This option ensures that the server keyboard is locked until the Keyboard Password is entered. Also select an Administrator Password (an option in the System Configuration Utility). During attended ASR-2 (local or remote), the Administrator Password must be entered before any modifications can be made to the server configuration.
ASR-2 recovery flow (from the figure):
If the server continues experiencing hardware or software errors and the number of ASR cycles exceeds the specified number of recovery attempts, the server logs an error to the Server Health Log or the Integrated Management Log and boots the HP Utilities from the system partition on the hard drive.
If a modem is installed, ASR puts the modem on auto answer so that the Server Administrator can dial in using third-party terminal emulator software to remotely run the HP Utilities and identify the source of the fault.
---Or---
The local Server Administrator runs HP Utilities from the server console to identify the source of the fault.
Simplified ASR
Servers with ROM based setup have Simplified ASR. Simplified ASR is enabled when the Server Management Driver is loaded. It can be disabled through the Insight Manager Recovery icon. The timer is automatically set to 10 minutes. In case of a thermal shutdown, UPS shutdown, or OS hang, the server will attempt to reboot to the operating system after 10 minutes. Simplified ASR does not have the paging features or the configuration features of ASR-2.
Health Driver
SYSMGMT.SYS/CPQHLTH.NLM (called the System Management Driver) provides support for:
ASR
Thermal Protection
Health Log/Critical Error Log Support
Remote Control of server from Insight Manager PC
Requires configuration in System Configuration
I2C Bus implementation
Driver uses IRQ13 for fan and temperature alerting; possible conflict with other devices.
The Health Driver continually resets the ASR-2 timer according to the frequency you specified in the System Configuration Utility (for example, 10 minutes). If the ASR-2 timer counts down to zero before being reset, because of an operating system crash or a server lock-up, ASR-2 restarts the server into either HP Utilities or the operating system (as indicated by the System Configuration parameters). The default value is 10 minutes. The allowable settings are 5, 10, 20, and 30 minutes.
For remote and off-site (unattended) servers, setting the software error recovery time-out to 5 minutes reduces server downtime and allows the server to recover quickly. For local (attended) servers located onsite, you can set the software error recovery time-out to 20 or 30 minutes, giving you time to arrive at the server if you wish to manually diagnose the problem.
The Health Driver is independent of the ASR-2 timer. You should load it even if you do not enable the ASR-2 timer. This allows the driver to detect and log information about numerous hardware and software errors in the Integrated Management Log. However, you cannot enable the ASR-2 timer without loading the Health Driver.
Before ASR-2 restarts the server, it records any information available about the condition of the operating system in the Critical Error Log or the Integrated Management Log, depending on the server support. This information can be used to diagnose an operating system crash or server lock-up, while still allowing the server to be restarted.
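The timer behavior described above is a watchdog pattern, and it can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class and function names (AsrTimer, simulate) are illustrative and are not part of any HP driver or utility, and time advances in whole minutes.

```python
from dataclasses import dataclass

ALLOWED_TIMEOUTS_MIN = (5, 10, 20, 30)  # allowable settings listed in the text


@dataclass
class AsrTimer:
    """Toy model of the ASR-2 countdown timer (hypothetical names)."""
    timeout_min: int = 10   # default value given in the text
    remaining_min: int = 10
    restarts: int = 0

    def __post_init__(self):
        if self.timeout_min not in ALLOWED_TIMEOUTS_MIN:
            raise ValueError(f"timeout must be one of {ALLOWED_TIMEOUTS_MIN}")
        self.remaining_min = self.timeout_min

    def reset(self):
        """Called by the simulated Health Driver while the OS is healthy."""
        self.remaining_min = self.timeout_min

    def tick(self, minutes=1):
        """Count down; return True if the timer reached zero and forced a restart."""
        self.remaining_min -= minutes
        if self.remaining_min <= 0:
            self.restarts += 1
            self.remaining_min = self.timeout_min
            return True
        return False


def simulate(timer, os_alive_by_minute):
    """One step per minute: the driver resets the timer only while the OS is alive."""
    restart_minutes = []
    for minute, alive in enumerate(os_alive_by_minute, start=1):
        if alive:
            timer.reset()
        elif timer.tick():
            restart_minutes.append(minute)
    return restart_minutes
```

With a 5-minute time-out, an operating system that runs for three minutes and then hangs triggers a restart five minutes after the last reset, which is why the shorter setting suits unattended servers.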
Learning Check
1. PCI provides switchless and jumperless support, plug and play capability, and processor independent design. True False
2. Conventional PCI adapters will operate in PCI-X slots, and vice versa. True False
3. Legacy PCI cards keyed for 5-volt signaling will work in systems that provide only 3.3-volt slots. True False
4. Which of the following statements about parallel SCSI is true?
a. The (smart array) controller knows how many cylinders, heads, or sectors are available on each device.
b. The SCSI host bus adapter must be built into the mother board, not in a PCI or PCI-X slot.
c. Ultra 320 SCSI operates at twice the frequency of Ultra3.
d. Low-Voltage Differential (LVD) devices are not backward compatible with Single Ended (SE) devices.
5. Which of the following statements about SCSI configuration is true?
a. On HP hot pluggable hard drives SCSI IDs are usually set up by selecting a unique ID number through an array of jumpers.
b. With proper termination the internal and external connectors of a single SCSI bus can be used at the same time.
c. Single ended (SE) SCSI supports up to 15 devices per bus without a repeater.
d. If you have both internal and external devices, two separate SCSI channels must be used.
6. Serial Attached SCSI (SAS) will use the same electrical and physical interface as Serial ATA (SATA), which will allow its controller to accept either a SATA or SAS hard drive. True False
7. Which of the following is a true statement about processors?
a. Processor steppings are versions of the same processor model that vary only slightly. Each stepping requires changes to System ROM.
b. The Pentium 4 has a Quad Pumped Front Side Bus (FSB) that provides an effective speed of 400MHz with a 100 MHz clock.
c. A system with processors that use Hyper-Threading technology appears to software as having twice the number of processors than are physically present.
d. All of the above.
8. Which of the following is a true statement about memory?
a. Despite different signaling technology, it is possible to mix SDRAM and DDR SDRAM within the same memory subsystem.
b. ECC Memory subsystems can correct a two-bit failure.
c. The redundancy in Hot Plug RAID Memory allows customers to hot-replace, hot-add, and hot-upgrade DIMMs without shutting down the server.
d. Hot-update allows the customer to replace smaller capacity DIMMs with larger capacity DIMMs.
9. HP PCI Hot Plug technology enables the removal and replacement of PCI controllers without shutting down the system or interfering with other controllers on the PCI bus. What three components must provide support to make this possible?
10. While working on a ProLiant server you notice that the speed of the power supply fan is changing. This is an indication of an impending fan failure. True False
Introduction
ProLiant server products deliver top performance for a variety of business applications. This module describes the rationale for server model designations and covers the chronology of product introductions. This module also provides information on features and service considerations for the newest servers in the product line. Legacy models are covered in appendices A, B, and C; appendix D focuses on appliance servers and related products. Topics in this module include:
Product positioning framework
Server introduction timeline
Maximized Expansion servers (ML)
Density-Optimized servers (DL)
Ultra-dense server blades (BL)
Packaged cluster servers (CL)
Objectives
To demonstrate an awareness of the ProLiant server product line, service personnel should be able to:
Identify the major categories of ProLiant servers.
Explain the organizing principles of the ProLiant server product line.
Describe the features and characteristics of the newest ProLiant servers.
Locate configuration and service information relative to each product.
The needs of our customers are rapidly changing, driven by the Internet and other accelerating technologies. To meet those needs, ProLiant has continued to evolve by taking on a new positioning framework for the entire family of servers. This positioning framework better addresses our customers' target needs and more clearly reflects the breadth of our offering in a way that ties directly to what customers want.
ProSignia Servers have been rebranded ProLiant and aligned with existing ProLiant servers.
ProLiant servers have transitioned to a new positioning framework, and have taken on a new numbering system:
ML - Maximized Expansion Servers
DL - Density-Optimized Servers
BL - Ultra-dense, power-efficient server blades
CL - Cluster Servers
Appliance Servers
Renumbering
Next-generation platforms of current ProLiant servers have been given new numbers based upon the new positioning framework. Only those ProLiant servers that have been announced with new platform architecture have been renumbered. Servers have not been renumbered retroactively. We will continue to sell our current ProLiant servers, with their existing numbering, until they are discontinued. The ProLiant 6000, ProLiant 6500 and ProLiant 7000 have not undergone a platform transition, and have not been renumbered. They will maintain their naming until they reach end-of-life.
Organizing Principles
The new positioning framework for the ProLiant family is based on two organizing principles:
Customer environment, designated by prefix, e.g., ML
Customer application type, designated by series number, e.g., 330
Customer environment: Customer environment is indicated by the model prefix, e.g., ML denotes emphasis on maximum expansion and DL denotes emphasis on maximum density.
The ML line denotes a line of ProLiant servers that offer maximum internal expansion. They are ideal for remote and branch offices and offer all-inclusive server/storage solutions. They are available in both rack and tower models.
ProLiant ML Line Transition Table
Transitioned From                   To New Models
ProLiant 400, ProSignia 720         ProLiant ML330
ProLiant 800, ProSignia 740         ProLiant ML350
ProLiant 1600, ProLiant 1600R       ProLiant ML370
ProLiant 3000, ProLiant 3000R       ProLiant ML530
ProLiant 5500, ProLiant 5500R       ProLiant ML570
ProLiant 8000                       ProLiant ML750
The DL line denotes a line of ProLiant servers that are density-optimized for space-constrained and rack-mounting environments. They are intended for data center and external storage environments as well as efficient clustering. They are available only in rack-optimized models.
ProLiant DL Line Transition Table
New Models:
ProLiant DL320
ProLiant DL360
ProLiant DL380
ProLiant DL580
ProLiant DL590
ProLiant DL760
The BL line denotes a line of ProLiant servers that are ultra-dense, power-efficient server blades, which integrate a server-class chipset, ultra-low voltage processor, and other power-saving components in an ultra-dense design that reduces power and cooling costs and saves space. Customers can install up to 280 ProLiant BL10e server blades in a standard 42U rack for better utilization of valuable data center space. BL systems range from power-efficient single processor blades to high-performance SMP server blades.
The CL line denotes a line of ProLiant servers that are packaged for simplified clustering. They are a self-contained, ready-to-go clustering solution and are ideal for a variety of high-availability environments, such as data centers and remote offices. They fit in standard racks or can be configured as a stand-alone tower.
New Models:
ProLiant CL380
Customer application type: The three series of servers in the ProLiant ML and DL lines are defined by the level of performance and availability they achieve.
The ProLiant 300 series offers cost-effective servers to run small databases and applications, to serve as web servers, or to support infrastructure needs such as file/print and domain server functions.
The ProLiant 500 series offers more performance and availability to handle complex web applications and large databases, and to serve as critical file servers.
The ProLiant 700 series offers maximum performance and availability for industry-standard computing to support very large databases, multi-application needs, and mid-range applications. The 700 series servers are also an effective solution for server consolidation.
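The two organizing principles can be illustrated with a small parser that splits a model designation into its prefix (customer environment) and series number (application type). This is a sketch only: the function name parse_model, the regular expression, and the treatment of letter suffixes and "G2"-style generation markers are assumptions made for illustration, not an HP-defined format.

```python
import re

# Prefix meanings taken from the positioning framework described in the text.
ENVIRONMENTS = {
    "ML": "Maximized Expansion",
    "DL": "Density-Optimized",
    "BL": "Ultra-dense server blade",
    "CL": "Packaged cluster",
}

# The text defines the three series for the ML and DL lines only.
SERIES = {
    3: "300 series: cost-effective",
    5: "500 series: more performance and availability",
    7: "700 series: maximum performance and availability",
}

# Hypothetical pattern: prefix, 2-3 digit number, optional letter suffix,
# optional generation marker such as "G2".
MODEL_RE = re.compile(
    r"^(?:ProLiant\s+)?([A-Z]{2})\s?(\d{2,3})([a-z]?)\s?(?:G(\d+))?$"
)


def parse_model(name):
    """Split a designation such as 'ML370 G2' into its framework parts."""
    match = MODEL_RE.match(name.strip())
    if not match or match.group(1) not in ENVIRONMENTS:
        raise ValueError(f"unrecognized model designation: {name!r}")
    prefix, number, suffix, generation = match.groups()
    series = None
    if prefix in ("ML", "DL") and len(number) == 3:
        series = SERIES.get(int(number[0]))
    return {
        "environment": ENVIRONMENTS[prefix],
        "number": int(number),
        "series": series,
        "suffix": suffix or None,
        "generation": int(generation) if generation else None,
    }
```

For example, parse_model("ProLiant DL380") reports the Density-Optimized environment and the 300 series, while a pre-framework name such as "ProLiant 5500" is rejected because it carries no prefix.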
Generation Identifier
As ProLiant servers transition from one generation to the next there is a need to visually identify which generation of server is being serviced to ensure that the correct documentation, options and parts are used. A one-square-centimeter label will be affixed to the server to identify the generation, e.g.:
Racks: left rack screw opposite Intel logo
Towers: top left chassis behind door/bezel
The identifier will be used in documentation where the generation difference is relevant, e.g., technical documentation. It may appear in one of several formats depending on constraints such as available space in a database field:
The identifier may not be used in certain marketing documentation such as brochures and pictures. It will not be applied retroactively but will be implemented with new generations of servers going forward.
[Figure: Server introduction timeline, 1996 to 1999. Models shown: ProLiant 5000, ProLiant 6400R, ProLiant 8500; Workgroup: ProLiant 2500, ProLiant 850R, ProLiant 1850R; Entry Level: ProSignia 200, ProLiant 800, ProLiant 1200.]
[Figure: Server introduction timeline, 2000. Categories: Maximized Expansion, Cluster, Density Optimized. Models shown: ProLiant DL380, ProLiant DL360, ProLiant DL580, ProLiant DL320.]
[Figure: Server introduction timeline, 2001. Models shown: ProLiant ML750, ProLiant ML330e, ProLiant ML370 G2, ProLiant ML330 G2, ProLiant ML350 G2, ProLiant DL760, ProLiant DL380 G2, ProLiant DL590.]
[Figure: Server introduction timeline, 2002 to 2003. Models shown: ProLiant ML310, ProLiant DL360 G2, ProLiant DL580 G2, ProLiant DL320 G2, ProLiant DL380 G3, ProLiant DL360 G3.]
[Figure: Server introduction timeline, 2003 to 2004. Models shown: ProLiant BL40p, ProLiant BL20p G2, ProLiant ML330 G3, ProLiant DL760 G2, ProLiant DL740, ProLiant DL560.]
Learning Check
1. What are the current organizing principles for the ProLiant positioning framework?
2. Describe the levels of performance and availability offered by the three series of servers in the ProLiant ML/DL line.
ProLiant ML310
ProLiant ML330
ProLiant ML350
ProLiant ML370
ProLiant ML530
ProLiant ML570
ProLiant ML750
Objectives
To demonstrate an awareness of ProLiant maximized expansion server products, service personnel should be able to:
Describe the features and characteristics of ProLiant ML servers.
Locate configuration and service information relative to each product.
ProLiant ML310
1P Intel Pentium 4 2.0/2.2/2.8GHz
400/533MHz front side bus
256MB 266MHz PC2100 DDR SDRAM standard on 2.53/2.8GHz models
128MB 266MHz PC2100 DDR SDRAM standard on 2.0/2.2GHz models
Four DIMM slots, expandable to 4GB maximum
512KB second level ECC cache
Four 64-bit/33MHz PCI
NC7760 PCI Gigabit Server Adapter (integrated/embedded)
Wake On LAN support
Integrated Dual Channel Ultra ATA-100 IDE Adapter with Integrated ATA RAID 0, 1, & 1+0 (ATA Models) OR
48X CD-ROM and 1.44MB disk drive assembly
Support for up to five 1-inch Wide Ultra3 NHP SCSI hard drives or four 1-inch ATA NHP drives (depending on model)
Two serial ports
One parallel port
Two RJ-45 Ethernet ports
Two USB ports
Video: Integrated ATI RAGE XL Video Controller with 8-MB SDRAM Video Memory
Warranty: One-year, limited warranty, Next Business Day 1 year on-site limited Global warranty and Pre-Failure Warranty, which covers processors, memory, and hard drives. Certain restrictions and exclusions apply.
Reference   Description
1           48X CD-ROM
2           Removable media bays
3           1.44 MB floppy drive
4           Two 1-inch Non Hot Plug
5           Four 64-bit/33MHz PCI
6           System fan
The NMI Debug button is located near the center of the system board. The Non-Maskable Interrupt (NMI) is a diagnostic mechanism that allows crash dump files to be created when a system is hung and unable to respond to traditional debug mechanisms. The NMI Debug button can be used to diagnose software failures by forcing the operating system to invoke the NMI handler and generate a crash dump log. This log can provide critical troubleshooting information that may be difficult or impossible to obtain through other means.
The user initiates an NMI by pressing the NMI Debug button. The NMI can allow a hung system to become responsive enough to generate a crash dump log. The button is enabled or disabled in RBSU.
Warning! The NMI Debug button causes the unit to abruptly fail, as it is designed to do. Therefore, it should never be used during normal operation.
It may be necessary at some time to clear and reset system configuration settings. When the system configuration switch position 6 is set to the ON position, the system is prepared to erase all system configuration settings from both CMOS and NVRAM.
Clearing NVRAM
Warning! Clearing nonvolatile RAM (NVRAM) deletes the system configuration. Refer to Chapter 5, "Server Configuration and Utilities," in the Server Setup and Installation Guide for instructions on configuring the server.
To switch to the backup ROM:
1. Power down the server.
2. Set the system configuration switch positions 1, 5, and 6 to the On position.
3. Power up the server. (The ROM will beep and halt when the ROM images have been swapped.)
4. Power down the server, and reset all switches to the default Off position.
The system ID switchbank, located on the system board, is reserved for use by authorized service providers only. All switches default to the Off position.
No two SCSI devices connected to the same SCSI controller can have the same SCSI ID. If another SCSI device is connected to the controller, check its SCSI ID before beginning the installation procedure for the additional device. The SCSI ID is set by jumpers located on each device.
ATA devices
When installing any ATA devices, make sure that the jumper on the device is set to Cable Select (CS). This setting allows the cable to automatically assign the device ID of an ATA drive attached to the cable.
The ProLiant ML330 replaced the ProLiant 400 and ProSignia 720 servers. ML330e is a lower cost version of the ML330 with support for ATA drives instead of SCSI. ML330 G2 is an entry-level two-processor server. ML330 G3 has Xeon processors, a 533MHz front side bus and an embedded gigabit NIC. The standard features include:
ML330 Processor 667MHz1.0GHz Pentium III 256K L2 1 2 non-hot-plug ML330e 800MHz, 933MHz or 1.0GHz PIII 256K L2 1 64MB/2GB 2 non-hot-plug ML330G2 ML330G3 1.26GHz or 1.4GHz 2.4GHz or 2.8GHz Pentium III Xeon 512K L2 1 or 2 128MB/4GB 2 non-hot-plug 512K L2 1 or 2 256MB/4GB 2 non-hot-plug
Removable Media 4x1.5 (3 available) 4x1.5 (3 available) 4x1.5 (3 available) 5 SCSI or 4ATA Bays for NHP Drives, for NHP Drives, for NHP Drives, 2- NHP Drives, AIT, bay HP SCSI Cage, DAT; Optional 2AIT, DAT AIT, DAT bay HP SCSI Cage AIT, DAT Network Controller Integrated NC3163 Integrated NC3163 Fast Ethernet Fast Ethernet Storage Controllers Integrated, singlechannel Ultra2 SCSI Integrated, dualchannel Ultra ATA 100 Integrated NC3163 Integrated Fast Ethernet NC7760 10/100/1000 Integrated, dualchannel Ultra3 SCSI or Integrated, dual-channel Ultra ATA 100 RAID 4x64bit; 1x32bit (33MHz) Integrated Single Channel Ultra320 SCSI Adapter in a PCI slot 4x64bit (33MHz) Serial , RJ-45, Parallel, Graphics, Keyboard, Mouse, Two USB ports Integrated ATI Rage XL PCI with 8MB RAM 1-1-1
2x64bit; 3x32bit (33MHz) Two serial , RJ-45, Parallel, Graphics, Keyboard, Mouse Integrated ATI Rage XL PCI with 4MB RAM 3-3-3
Two serial , RJ-45, Two serial , RJ-45, Parallel, Graphics, Parallel, Graphics, Keyboard, Mouse Keyboard, Mouse, Two USB ports Integrated ATI Rage Integrated ATI XL PCI with 4MB Rage XL PCI with RAM 8MB RAM 3-1-1 1-1-1
Video
Warranty
Amber indicates pre-failure of processor or DIMM. Red indicates failure of processor, PPM or fan.
Language choice is selected after the F10 setup is invoked. This eliminates the need for separate images of the ROM.
When flashing the ROM, the ROMPaq flashes both the System ROM and the integrated SCSI controller's ROM.
Only PC133MHz ECC registered DIMMs can be used for the server to boot successfully.
The server feature board must be installed in slot 3 for the system to boot successfully. Failure to do this will generate an 800 POST error.
The server management information cable must be installed. Failure to do this will generate an 801 POST error.
If Wide Ultra2 or Wide Ultra3 drives are mixed with Wide Ultra devices on the embedded controller, all drives will run at Wide Ultra speeds.
If Wide Ultra2 and Wide Ultra3 drives are mixed on the embedded controller, the devices will run at the maximum speed of the controller and drive.
All 64-bit PCI slots support only 3.3V PCI cards.
POST error messages are non-standard for ML330 and ML330e only. Always refer to the MSG for POST errors.
Before loading the operating system, it must be selected through the System menu of the BIOS Setup Utility.
There is no system utility partition; therefore, Diagnostics must be run from BIOS for the ML330. The ROM-Based Setup Utility (RBSU) is resident in ROM in the ML330e and ML330 G2.
There is a battery for CMOS on the system board and a battery for NVRAM on the server feature board. Both are removable.
All Remote Insight Boards must be installed in slot 4, a 32-bit PCI slot (slot 5 for ML330 G2).
No option kits for updating to the latest processor technology are currently being offered on any of the previous ProLiant ML330 models.
1GHz processor spared with heatsink and 110 CFM fan.
ML330 and ML330e do not support the same set of operating systems; to determine the difference, see the OS support matrix at ftp://ftp.compaq.com/pub/products/servers/os-support-matrix-310.pdf
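The two drive-mixing rules above can be expressed as a short function. This is a sketch under stated assumptions: the transfer rates are the commonly quoted nominal figures for each SCSI generation and are not taken from this guide, and the function name negotiated_speeds is hypothetical.

```python
# Nominal bus transfer rates in MB/s. These are commonly quoted figures for
# each SCSI generation and are an assumption, not values from the guide.
NOMINAL_MB_S = {"Wide Ultra": 40, "Wide Ultra2": 80, "Wide Ultra3": 160}


def negotiated_speeds(controller, drives):
    """Return the nominal speed (MB/s) each drive would run at on one channel.

    Rule 1: any Wide Ultra device on the channel holds every drive to
            Wide Ultra speed.
    Rule 2: otherwise each drive runs at the lower of its own speed and the
            controller's speed.
    """
    for kind in (controller, *drives):
        if kind not in NOMINAL_MB_S:
            raise ValueError(f"unknown SCSI type: {kind!r}")
    cap = NOMINAL_MB_S[controller]
    if "Wide Ultra" in drives:
        cap = min(cap, NOMINAL_MB_S["Wide Ultra"])
    return [min(cap, NOMINAL_MB_S[drive]) for drive in drives]
```

The practical consequence is that a single older Wide Ultra device slows every drive on the same channel, so it is worth moving such devices to a separate channel where one is available.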
Mass Storage
The ProLiant ML350 replaced the ProLiant 800 servers. The standard features of the ProLiant ML350, ML350 1GHz, ML350 G2 and ML350 G3 include:
Generation 1
Processors: Intel Pentium III 933 MHz, 866 MHz, 800 MHz, 733MHz, 667MHz, 600EB MHz (extended bus); upgradeable to dual processing
Memory: 128 MB PC133MHz ECC Registered SDRAM DIMM memory; maximum 2GB
Cache memory: Integrated 256KB Level 2 ECC cache
Expansion slots: Two 64bit/33MHz PCI; four 32bit/33MHz PCI; one dedicated ISA slot
Network controller: HP NC3163 Fast Ethernet NIC (embedded) PCI 10/100 WOL (Wake On LAN)
Storage: Support for up to four 1-inch hot-plug or non-hot-plug drives
Removable drives: Two 5.25-inch available; 1.44MB diskette drive; one 32X Max or faster IDE CD-ROM drive
Interfaces: Two serial ports, RJ-45 port, parallel port, graphics port, keyboard port, mouse port
Video: Integrated ATI RAGE IIC Video Controller with 4MB Video Memory
Warranty: Next-business-day three-year on-site limited warranty; coverage is for parts, labor, and onsite repair. Pre-Failure Warranty on hard drives, memory and processor

1GHz
Processors: Intel Pentium III 1GHz FCPGA (Flip Chip); upgradeable to dual processing
Memory: 128 MB PC133MHz ECC Registered SDRAM DIMM memory; maximum 4GB
Cache memory: Integrated 256KB Level 2 ECC cache
Expansion slots: Four 64bit/33MHz PCI; two 32bit/33MHz PCI; no ISA slot
Network controller: HP NC3163 Fast Ethernet NIC (embedded) PCI 10/100 WOL (Wake On LAN)
Storage: Support for up to four 1-inch hot-plug or non-hot-plug drives
Removable drives: Two 5.25-inch available; 1.44MB diskette drive; one 32X Max or faster IDE CD-ROM drive
Interfaces: Two serial ports, RJ-45 port, parallel port, graphics port, keyboard port, mouse port, two USB ports
Video: Integrated ATI RAGE XL Video Controller with 4MB SDRAM Video Memory
Warranty: Next-business-day three-year on-site limited warranty; coverage is for parts, labor, and onsite repair. Pre-Failure Warranty on hard drives, memory and processor

Generation 2 and G2 Array
Processors: Intel Pentium III 1.4GHz, 1.26 GHz or 1.13 GHz (Array model available with 1.4 and 1.26 GHz only); upgradeable to dual processing
Memory: 128 MB PC133MHz ECC Registered SDRAM (256 MB for array models); maximum 4GB
Cache memory: Integrated 512KB Level 2 ECC cache
Expansion slots: Five 64bit/33MHz PCI (one used by SmartArray 532 in array model); one 32bit/33MHz PCI; no ISA slot
Network controller: HP NC3163 Fast Ethernet NIC (embedded) PCI 10/100 WOL (Wake On LAN)
Storage controller: Integrated Dual Channel Wide Ultra3 SCSI (Smart Array 532 RAID controller in array model)
Storage: Support for up to six 1-inch hot plug drives
Removable drives: Two 5.25-inch available; 1.44MB diskette drive; one 40X Max or faster IDE CD-ROM drive
Interfaces: Two serial ports, RJ-45 port, parallel port, graphics port, keyboard port, mouse port, two USB ports
Video: Integrated ATI RAGE XL Video Controller with 8MB SDRAM Video Memory
Warranty: Next-business-day three-year on-site limited warranty; coverage is for parts, labor, and onsite repair. Pre-Failure Warranty on hard drives, memory and processor

Generation 3 and G3 Array
Processors: Intel Xeon Processor 3.06, 2.8, 2.4 GHz; 533MHz FSB; Hyper-Threading and NetBurst; upgradeable to dual processing
Memory: 256 MB PC2100 ECC DDR SDRAM (512 MB for array models); maximum 8GB
Cache memory: Integrated 512KB Level 2 cache (full speed)
Expansion slots: Four 64bit/33MHz PCI-X (one used by SmartArray 532 in array model); one 32bit/33MHz PCI; no ISA slot
Network controller: Broadcom NC7760 (embedded) PCI 10/100/1000 WOL (Wake On LAN)
Storage controller: Integrated Dual Channel Wide Ultra3 SCSI (Smart Array 641 RAID controller in array model)
Storage: Support for up to six 1-inch hot plug drives
Removable drives: Two 5.25-inch available; 1.44MB diskette drive; one 48X Max or faster IDE CD-ROM drive
Interfaces: One serial port, RJ-45 port, parallel port, graphics port, keyboard port, mouse port, two USB ports
Video: Integrated ATI RAGE XL Video Controller with 8-MB SDRAM Video Memory
Warranty: Next-business-day three-year on-site limited warranty; coverage is for parts, labor, and onsite repair. Pre-Failure Warranty on hard drives, memory and processor
Both processors must be the same speed. Pentium III processors can no longer be down-clocked or up-clocked. Intel now locks in the speed.
All processor sockets must be populated with a processor or terminator board in order for the server to boot successfully. Failure to do this will generate an 802 POST error.
If 2 processors are installed, the processor in slot 2 must have the same or lower stepping as the processor in slot 1 in order for the server to boot successfully. Failure to do this will generate an 805 POST message. The processors can be exchanged between processor slots to remedy this.
Only PC133MHz ECC registered DIMMs can be used for the server to boot successfully. The system will generate an 804 POST error with incorrect memory installed.
PC133MHz ECC registered SDRAM DIMMs are downward compatible in systems using 100 MHz SDRAM.
The server feature board must be installed in slot 1 for the system to boot successfully. Failure to do this will generate an 800 POST error.
The server feature board on the ProLiant 400 and ProSignia 720 is not interchangeable with the board for the ML350.
When mixing drives, it is recommended that you connect Wide Ultra2 drives to channel 1/A of the integrated SCSI controller, and other drives to channel 2/B.
The connectors on the integrated SCSI controller support 2 internal cables, or one internal and one external cable, or 2 external cables.
The cable for SCSI channel 2/B is optional (not standard with the machine).
PCI slots 2 and 3 do not support 5V PCI cards.
POST error messages are non-standard. Always refer to the MSG for POST errors.
Before loading the operating system, it must be selected through the System menu of the BIOS Setup Utility.
Remove the system fan before removing the system board.
There is no system utility partition; therefore, Diagnostics must be run from diskette.
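The boot requirements above map onto specific POST error codes, which can be summarized as a pre-service checklist. The following is an illustrative sketch only: the data model (Processor, TERMINATOR, a numeric stepping where a higher number means a later stepping) and the function name post_errors are hypothetical and do not reflect how the system ROM is implemented. The same-speed rule is not modeled because the text gives no POST code for it.

```python
from dataclasses import dataclass

TERMINATOR = "terminator"               # stands in for a terminator board
REQUIRED_DIMM = "PC133 ECC registered"  # only DIMM type the text allows


@dataclass
class Processor:
    speed_mhz: int
    stepping: int  # hypothetical encoding: higher number = later stepping


def post_errors(sockets, dimms, feature_board_slot):
    """Return the POST error codes the text associates with a configuration.

    sockets: one entry per processor socket, each a Processor, TERMINATOR,
             or None (empty socket).
    """
    errors = []
    if feature_board_slot != 1:
        errors.append(800)  # server feature board not in slot 1
    if any(socket is None for socket in sockets):
        errors.append(802)  # socket without processor or terminator board
    if any(dimm != REQUIRED_DIMM for dimm in dimms):
        errors.append(804)  # incorrect memory installed
    cpus = [s for s in sockets if isinstance(s, Processor)]
    if len(cpus) == 2 and cpus[1].stepping > cpus[0].stepping:
        errors.append(805)  # slot 2 stepping higher than slot 1
    return errors
```

When only error 805 is reported, the remedy given in the text is to exchange the two processors between slots, which reverses the stepping order.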
Memory
Mass Storage
There is a battery for CMOS on the system board and a battery for NVRAM on the server feature board. Both are removable.
There is an option kit available to upgrade an existing ProLiant 800 6/350/400/450 to a ProLiant ML350.
To upgrade an existing ProLiant 800 Model 6/350/400/450 with the 6/500 Processor Upgrade Option Kit number 401268B21, the server must have:
A minimum System ROM revision of 2/18/1999.
The Processor Core Frequency Switch reset to 1=ON, 2=OFF, 3=OFF, 4=ON.
The standard features of the ProLiant ML370, ML370 G2 and ML370 G3 include:
ML370 Processors ML370 G2 ML370 G3
Intel Pentium III 800MHz , 866MHz , 933MHz, 1GHz 128MB (expandable to 4GB) of 133MHz ECC Registered SDRAM
Intel Pentium III 1.13 GHz, 1.26GHz , 1.4GHz 256MB (expandable to 6GB) of 133MHz ECC Registered SDRAM Dual interleaved memory Online spare memory capable RBSU configurable
Intel Xeon 3.06GHz w 533MHz FSB Intel Xeon 2.4, 2.8 GHz w 400MHz FSB 1GB of 2-way interleaved capable PC2100 DDR SDRAM running at 266MHz on 3.06GHz models 12GB max 512 MB of 2-way interleaved capable PC2100 DDR SDRAM running at 200MHz on 2.8GHz models and lower 12GB max Online spare memory capable RBSU configurable 512KB L2 ECC cache all models 1MB L3 cache avail on 3.06GHz models Six 100MHz PCI-X slots (non hot plug) Embedded NC7781 NIC 10/100/1000 supporting Wake On LAN
Memory
Cache memory
Six PCI slots (four 32-bit/33MHz, two 64-bit/33MHz) Embedded NC3163 Fast Ethernet NIC 10/100 supporting Wake On LAN
Six PCI slots (four 64-bit/33MHz, two HP 64-bit/66MHz Embedded NC3163 Fast Ethernet NIC 10/100 supporting Wake On LAN
Storage controllers
Integrated dual channel Wide Ultra2 embedded RAID option Optional Integrated Array Controller RAID 0, 1, 1 + 0, 5 (RAID On Chip ROC) Optional Integrated Smart Array 5i Controller 0, 1, 0+1, 5 436.8GB (Six 72.8GB drives) or 509.6GB with 2 (36.4) NHP drives in removable media Six 1 hard drive bays, four removable media bays
Integrated dual channel Wide Ultra3 - embedded RAID option Optional Integrated Array Controller RAID 0, 1, 1 + 0, 5 (RAID On Chip ROC) Optional Integrated Smart Array 5i Controller 0, 1, 0+1, 5 436.8GB (Six 72.8GB drives) or 582.4GB with 2 HP (72.8GB) hard drives in optional drive cage for removable media area. Six 1 hard drive bays, four removable media bays Rack or tower 5U chassis 8MB Upgradeable to dual processing
Internal Storage
582.4 GB ((6 x 72.8 GB 1 with standard internal hot plug drive cage + (2 x 72.8 GB 1) with optional ML3xx Internal Two Bay Hot Plug Wide Ultra2/Ultra3 SCSI Drive Cage) Rack or tower 5U chassis 8MB Upgradeable to dual processing
Rack or tower 5U chassis 4MB Upgradeable to dual processing The Pentium III 1GHz processor option kit is supported on all previously shipped and currently shipping ProLiant ML370 servers. Next-business-day Three-year on-site limited warranty. Coverage is for parts, labor, and onsite repair. Pre-Failure Warranty (on hard drives, memory, and processor)
Warranty
Next-business-day Three-year on-site limited warranty. Coverage is for parts, labor, and onsite repair. Pre-Failure Warranty (on hard drives, memory, and processor)
Next-business-day Three-year on-site limited warranty. Coverage is for parts, labor, and onsite repair. Pre-Failure Warranty (on hard drives, memory, and processor)
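The internal storage maximums quoted for the ML370 family follow directly from the drive counts and sizes. The short sketch below reproduces that arithmetic; the helper name capacity_gb is hypothetical.

```python
def capacity_gb(drive_groups):
    """Total capacity for groups of (drive_count, drive_size_gb)."""
    return round(sum(count * size for count, size in drive_groups), 1)


# Six 72.8GB drives in the standard hot plug cage.
standard_cage = capacity_gb([(6, 72.8)])

# Standard cage plus two 72.8GB drives in the optional two-bay cage.
with_optional_cage = capacity_gb([(6, 72.8), (2, 72.8)])

# Standard cage plus two 36.4GB non-hot-plug drives in the removable media area.
with_nhp_drives = capacity_gb([(6, 72.8), (2, 36.4)])

print(standard_cage, with_optional_cage, with_nhp_drives)
```

These correspond to the 436.8GB, 582.4GB, and 509.6GB figures listed in the specifications.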
The ProLiant ML530 is the next generation of the ProLiant 3000 servers. ML530 G2 is a 2P enterprise server with mirrored memory.
ML530
Processors: 1GHz, 933MHz, 866MHz, 800MHz Pentium III Xeon; supports dual processing; 133MHz front side bus; Highly Parallel System Architecture
Processor upgrade: Pentium III Xeon 1GHz processor option kit is supported on the ProLiant ML530 800MHz, 866MHz and 933MHz models.
Memory: 128MB or 256MB (depending on model) 133MHz ECC registered SDRAM DIMMs, upgradeable to 4GB maximum
Cache memory: 256KB L2
Expansion slots: Eight slots: five 64-bit PCI (33MHz), two 64-bit PCI (66MHz), one 32-bit PCI (33MHz)
Network controller: Integrated 10/100 NC3163 Fast Ethernet; Wake On LAN support
Storage controller: Integrated dual channel Wide Ultra2 SCSI controller; Smart Array 5302/32 Array Controller (Array Models only)
Storage: Support for up to six 1.0-inch drives; optional drive cage adds support for 6 additional 1.0-inch drives
Removable drives: Two 5.25-inch available; 1.44MB diskette drive; one 32X Max or faster IDE CD-ROM drive
Interfaces: Two serial ports, RJ-45 port, parallel port, graphics port, keyboard port, mouse port
Video: ATI Rage IIC, 4MB video RAM
Warranty: Next-business-day, three-year onsite limited warranty. Coverage is for parts, labor, and onsite repair. Pre-Failure Warranty (on hard drives, memory, and processor)

ML530 G2
Processors: 2.4 GHz, 3.0 GHz Xeon; supports dual processing; 400MHz system bus; Highly Parallel System Architecture
Processor upgrade: Upgradeable to dual processing
Memory: 1GB (2 x 512 MB) 200MHz DDR SDRAM DIMMs, upgradeable to 16 GB maximum; Advanced Memory Protection including Mirrored Memory and Online Spare Memory; 2:1 interleaved memory
Cache memory: 512KB L2
Expansion slots: Seven PCI-X slots: four 64-bit/100MHz Hot Plug, three 64-bit/100MHz Non-Hot Plug
Network controller: Integrated 10/100 NC3163 Fast Ethernet; Wake On LAN and PXE support
Storage controller: Integrated Dual Channel Wide Ultra3 SCSI controller
Storage: Two Ultra3/Ultra4-ready SCSI Drive Cages standard, supporting up to twelve 1.0-inch hot plug hard drives; optional ML5xx Internal Two Bay Hot Plug Wide Ultra2/Ultra3 SCSI Drive Cage (with fan) adds support for 2 additional 1.0-inch drives
Removable drives: 1.44MB diskette drive; one 40X IDE CD-ROM drive; two bays for optional tape backup, DVD, or SCSI devices
Interfaces: Two serial ports, RJ-45 port, parallel port, graphics port, keyboard port, mouse port, two USB ports
Video: Integrated Rage XL, 8MB SDRAM video memory
Warranty: Next-business-day, three-year onsite limited warranty. Coverage is for parts, labor, and onsite repair. Pre-Failure Warranty (on hard drives, memory, and processor)
Processor
Both processors must be the same speed. Pentium III Xeon processors can no longer be down-clocked or up-clocked. Intel now locks in the speed.
In a single processor system, always install the processor in slot 1. Slot 2 is terminated on the system board; slot 1 is not.
Pentium III Xeon processors with the gold colored heat sinks must be installed. Gold colored heat sinks indicate a 133MHz bus. Older processors (black or green heat sinks, 100MHz bus) will not boot.
Memory
Only PC133MHz ECC registered DIMMs can be used for the server to boot successfully. The system will generate an 804 POST error with incorrect memory installed.
PC133MHz ECC registered SDRAM DIMMs are downward compatible in systems using 100 MHz SDRAM.
Cables
Cables are color-coded to reduce service time and support.
The routing of cables is very important. When removing or replacing cables, make sure that you route them in the same manner as the original, including the use of any cable clips. This ensures that no cables are pinched when the system board tray is moved.
When replacing the system board tray or cables, the PCI retainer and PCI bracket can be removed to provide room to maneuver.
PCI Retainer
PCI Bracket
The standard features of the ProLiant ML570 and ML570 G2 include the following:
ML570 Processors Intel Pentium III Xeon processor 900MHz, 700MHz Upgradeable to quad processing Expansion slots Six total, five available: Two 64-bit 66MHz PCI Hot Plug Two 64-bit 33MHz PCI Hot Plug One 64-bit 33MHz PCI Non-Hot Plug (not available) One 32-bit 33MHz PCI Non-Hot Plug Storage controller Integrated 10/100 NC3163 Fast Ethernet Wake On LAN support Integrated dual channel Wide Ultra2 SCSI controller Optional integrated Smart Array controller ML570 G2 Intel Pentium III Xeon processor 1.4GHz, 1.5GHz, 1.9GHz, 2.0GHz, 2.5 GHz, 2.8GHz Hyper-Threading technology 400MHz frontside bus Upgradeable to quad processing 2MB L3 (2.0GHz, 2.8GHz only) 1MB L3 (1.5GHz, 1.9GHz, 2.0GHz, 2.5 GHz) 512KB (1.4GHz) 1024MB (PC1600-MHz Registered ECC SDRAM DIMM Memory) (Standard on 2P Rack Models only) 512MB (PC1600-MHz Registered ECC SDRAM DIMM Memory) (Standard on 1P Rack Models only) Support for a maximum of 32GB Seven 64-bit/100MHz PCI-X slots (four hot-pluggable)
Cache memory
2MB L2 per processor (900MHz, 700MHz) 1MB L2 per processor (700MHz only) 1024MB PC100MHz Advanced ECC SDRAM (900MHz) 512MB PC100MHz Advanced ECC SDRAM (700MHz) Support for a maximum of 16GB
Memory
Network controller
Integrated 10/100 NC3163 Fast Ethernet Wake On LAN support Integrated dual channel Wide Ultra3 SCSI controller Optional integrated Smart Array controller
Storage
One 1.44MB diskette drive One 32X Max or faster IDE CDROM drive Twelve 1 hard drive bays
One 1.44MB diskette drive One 32X Max or faster IDE CDROM drive Twelve 1 hard drive bays Optional two additional 1 hotpluggable drives Two serial ports RJ-45 port Parallel port Graphics port Keyboard port Mouse port Integrated ATI Rage IIC Video Controller with 4MB Video Memory Next-business-day, three-year on-site limited warranty. Coverage is for parts, labor, and onsite repair. Pre-Failure Warranty (on hard drives, memory, and processor)
Interfaces
Two serial ports RJ-45 port Parallel port Graphics port Keyboard port Mouse port Integrated ATI Rage IIC Video Controller with 4MB Video Memory Next-business-day, three-year on-site limited warranty. Coverage is for parts, labor, and onsite repair. Pre-Failure Warranty (on hard drives, memory, and processor)
Video
Warranty
Tower Models
1. 1.44-MB Floppy Drive
2. 32x or 40x IDE CD-ROM
3. Front Bezel
4. Wide Ultra2/Ultra3 Hot Plug Drive Cage (12 x 1") (Two 6 x 1" drive cages ship standard)
5. Diagnostic Lighting
6. 5.25-inch Removable Media Bays
7. Hot Plug Fans
8. Peripheral Board
9. Processors
10. Memory Board
Rack Models
1. Rack handles
2. Sliding Rails
3. 1.44MB Floppy Drive
4. 32x or 40x IDE CD-ROM Drive
5. Diagnostic Lighting
6. Wide Ultra2/Ultra3 Hot Plug Drive Cage (12 x 1") (Two 6 x 1" drive cages ship standard)
7. 5.25-inch Removable Media Bays
Power: The Power LED will be flashing amber if there is a temporary shutdown due to a thermal event. If it is steady amber, the system is in standby and no +5V, +12V or +3.3V power is available. Auxiliary power is supplied to the system and a portion of the system logic may still be active. LEDs will have power and may be used for diagnosis. If the LED is off, no AC power is provided to the system.
Memory: Flashing amber memory status indicates a processor or memory failure which can be pinpointed by checking the Internal Diagnostics Display (IDD) on the peripheral board (discussed later).
Fan: Flashing amber fan status indicates a fan failure. LED indicators on the individual fans will enable you to identify the one that is failing.
Power supply: Flashing amber power supply status indicates a failure. LEDs on the individual supplies will identify which one has failed.
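The status LED states described above can be collected into a small lookup table for quick reference during diagnosis. The following is a minimal sketch only; the dictionary layout and the function name describe_led are illustrative assumptions, not part of any HP tool.

```python
# Status LED meanings transcribed from the description above.
# The data layout and names are hypothetical, for illustration only.
STATUS_LED_MEANINGS = {
    ("power", "flashing amber"): "temporary shutdown due to a thermal event",
    ("power", "steady amber"): "standby; auxiliary power only, no +5V, +12V or +3.3V",
    ("power", "off"): "no AC power is provided to the system",
    ("memory", "flashing amber"): "processor or memory failure; check the IDD",
    ("fan", "flashing amber"): "fan failure; check the LEDs on the individual fans",
    ("power supply", "flashing amber"): "power supply failure; check the LEDs on the supplies",
}


def describe_led(indicator, state):
    """Return the documented meaning of a status LED state, if any."""
    key = (indicator.strip().lower(), state.strip().lower())
    return STATUS_LED_MEANINGS.get(key, "not described in this section")


if __name__ == "__main__":
    print(describe_led("Power", "steady amber"))
```

Any combination that the text does not describe returns a neutral message rather than a guess.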
An improperly seated component in the interlock chain causes the associated LEDs on the system board to light. There are seven LEDs to monitor seven components: four processor boards, the memory board, the peripheral board and the power supply backplane board. All of the LEDs are extinguished if there are no interlock errors. One or more LEDs are lit when a board is not properly seated.
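As a rough illustration of the interlock logic above, the sketch below lists which of the seven interlock LEDs would be lit for a given set of properly seated boards. The component labels (for example "processor board 1") and the function name are assumptions made for this example; the guide does not give the LED designators.

```python
# The seven interlock-monitored components named in the text.
# The numbering of the processor boards is a hypothetical labeling.
INTERLOCK_COMPONENTS = (
    "processor board 1",
    "processor board 2",
    "processor board 3",
    "processor board 4",
    "memory board",
    "peripheral board",
    "power supply backplane board",
)


def lit_interlock_leds(seated):
    """Return the components whose interlock LED would be lit.

    `seated` is the set of components that are properly seated.
    An empty result means all LEDs are extinguished (no interlock errors).
    """
    unknown = set(seated) - set(INTERLOCK_COMPONENTS)
    if unknown:
        raise ValueError(f"unknown component(s): {sorted(unknown)}")
    return [name for name in INTERLOCK_COMPONENTS if name not in seated]


if __name__ == "__main__":
    seated = set(INTERLOCK_COMPONENTS) - {"memory board"}
    print(lit_interlock_leds(seated))
```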
There is an Internal Diagnostic Display on the peripheral board which indicates the failure of a memory module or processor. It displays a two-digit alphanumeric code that corresponds to a specific memory module or processor. The Diagnostic jumper must be removed before the IDD will display a code. Internal Diagnostic Display (IDD) Indicator Codes
Serial Port B is not installed in the factory. A cable is included in the country kit that can be connected from the peripheral board to the back of the chassis. There is a blank plate installed which can be removed when the connector is installed.
The WOL feature is only supported by operating systems that support ACPI. At this time, that is only Windows 2000. The WOL feature is enabled through the System Configuration Utility. Use the following steps to enable WOL.
1. Press the Ctrl and A keys before the Continue message displays. This will take you to the Advanced Mode.
2. Scroll down to find the Enable WOL selection.
All Remote Insight Boards, including the new Remote Insight Lights-Out Edition, must be installed in PCI slot 6. Cabling the board to J8 of the system board provides the Lights-Out Edition with full control over the server power state.
There are no VRM or PPM slots. This machine has On-Chip Voltage Regulation (OCVR). The VRM or PPM is part of the processor cartridge.
The entire system board tray is a field replaceable unit including the system board itself.
When lifting the ProLiant ML570 server, do not handle the server by the bezel because damage to the bezel may result. (Ribs in the plastic may get broken and cause the bezel to vibrate.)
To use the IRC capability, an external modem must be connected to one of the serial ports.
ProLiant ML750
Memory Capacities
Processors 4 4 2
Eleven hot-pluggable 64-bit PCI slots (9 x 33MHz, 2 x 66MHz); ten 64-bit, one 32-bit
Integrated HP NC3134 Fast Ethernet NIC 64 PCI Dual Port 10/100, upgradeable to Gigabit
Cableless Smart Array 4250ES Controller with optional redundancy
Two half-height removable media bays
Support for up to 21 one-inch hot-plug Wide Ultra3 SCSI hard drives in three combinable drive cages
Integrated 1280 x 1024 x 256 color on PCI local bus, 2-MB video memory
Rack-optimized 14U chassis
Global three-year on-site limited warranty for parts and labor with next-business-day response
Pre-Failure Warranty coverage of hard drives, memory and processors
Reference 1 2 3 4
Description 2 x 66MHz SCSI/PCI hot-pluggable expansion slots 9 x 33MHz PCI hot-pluggable expansion slots Rack rail slots Support for up to eight 500MHz Pentium III processors with redundant Processor Power Modules (one per processor) Rear hot-pluggable redundant fan Rear redundant processor fans Memory expansion board Redundant internal processor fans
Reference 10 11 12 13
Description High speed IDE CD-ROM drive (low-profile) 1.44MB diskette drive (low-profile) Front hot-pluggable processor fan Hot-plug drive bay; three internal drive cages standard for 21 x 1-inch Wide-Ultra2 SCSI hot-pluggable hard drives On/Standby power switch Integrated Management Display (IMD) Hot-pluggable redundant I/O fans Smart Array 4250ES Controller (optional redundant array controller shown)
5 6 7 8
14 15 16 17
9
Miscellaneous
Density-Optimized Servers
HP's DL line denotes a line of ProLiant servers that are density-optimized for space-constrained and rack-mounting environments. DL server products include:
ProLiant DL320
ProLiant DL360
ProLiant DL380
ProLiant DL560
ProLiant DL580
ProLiant DL590
ProLiant DL760
Objectives
To demonstrate an awareness of HP density-optimized server products, service personnel should be able to:
Describe the features and characteristics of HP DL servers.
Locate configuration and service information relative to each product.
2.26-GHz, 2.66, 3.06 GHz Pentium 4 FCPGA, 533-MHz Front Side Bus
Integrated Dual Channel Ultra ATA/100 Optional slotless single channel Wide
Ultra3 SCSI controller module
ROM and 1.44MB disk drive assembly for controlled software updates and maximized in-rack security in ATA Models (2 x 80 GB 1" ATA/100 Non-Hot Plug Drives) or up to 72.8 GB in SCSI models (2 x 36.4 GB 1" SCSI nonHot Plug drives)
up to 80 GB in ATA Models (2 x 40 GB 1" ATA/100 Non-Hot Plug Drives) or up to 72.8 GB in SCSI Models (2 x 36.4 GB 1" SCSI non-Hot Plug drives) Serial port Two RJ-45 ports Graphics port Keyboard port Mouse port Two USB ports
Interfaces
Serial port Two RJ-45 ports Graphics port Keyboard port Mouse port Two USB ports
Chassis
Video
Controller w 8 MB SDRAM memory Standard global 3/1/1 next business day (3-year parts, 1-year labor, 1-year on-site) Extended, Pre-Failure Warranty which covers processors, memory, hard drives
Warranty
Standard global 3/1/1 next business day (3-year parts, 1-year labor, 1-year on-site)
Warranty upgrades available to 3/3/3 and 4-hour response time
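The warranty shorthand used above (3/1/1 meaning 3-year parts, 1-year labor, 1-year on-site) can be expanded mechanically. The helper below is a small sketch; the function name and the dictionary keys are illustrative assumptions.

```python
def parse_warranty_code(code):
    """Expand a parts/labor/on-site warranty code such as "3/1/1".

    The field order (parts, labor, on-site) follows the guide's own
    expansion of 3/1/1. A code without exactly three numeric fields
    raises ValueError.
    """
    parts, labor, onsite = (int(field) for field in code.split("/"))
    return {"parts_years": parts, "labor_years": labor, "onsite_years": onsite}


if __name__ == "__main__":
    print(parse_warranty_code("3/1/1"))
    print(parse_warranty_code("3/3/3"))
```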
Reference  Description
1   Thumb Tabs
2   Power Supply
3   LED Indicators
4   Removable CD-ROM/Diskette Drive Assembly (included in some models)
5   Two 3.5" x 1" ATA or SCSI non-hot plug drive bays
6   Fixed Rails
7   Fan (7 Total)
8   Processor (populated)
9   DIMM Memory Slots (4 total)
10  Ultra ATA/100 Controller Module (ATA Models)
11  Single Channel Wide Ultra2 SCSI Controller Module (SCSI Models)
12  64-bit/33MHz PCI Slot
Reference  Description
1   Up to two 1-inch height HP ATA or SCSI non-hot-plug hard drives
2   An optional removable CD-ROM/diskette drive assembly including a low-profile 3.5-inch diskette drive and a low-profile CD-ROM drive
This capability is available with Windows 2000 and Windows NT only.
The two rear and three center wall fans are interchangeable; all must be operational for the system to run (there is no redundancy). Single fans are available as spares; the center wall spare has three fans already mounted on it.
A Fan 6 Error indicates an error from either one of the power supply fans. When this error occurs, you must replace the entire power supply unit.
Although there is no interlock LED, there is an interlock circuit which prevents power up if the PCI riser board is not seated.
Trip Caution, 43°C: the server saves all running data and then shuts down one minute later.
Trip Deadly, 49°C: the server shuts down immediately.
The installation of a Smart Array Controller to manage external SCSI hard drives is the same as installing any other PCI expansion card. If the Smart Array Controller is being used for the internal drives, the existing internal controller module must first be removed.
The ProLiant DL320 drive activity LED does not flash when the Linux operating system is in use. (There will, however, be LED activity on the drives themselves.)
Novell is not supported because it is primarily a file and print server OS. With only two internal drive bays and no parallel port, this would not be a good platform for Novell.
The non-maskable interrupt (NMI) switch on the system board is for manufacturing use only.
Always apply a new thermal pad and heat sink before reseating the processor. Failure to use a new heat sink may result in damage to the processor.
When replacing the processor, remove the plastic cover to expose the adhesive side of the thermal pad on the new heat sink before placing the heat sink on the processor. The system will not continue to operate if the plastic cover is left in place.
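The two thermal trip points in the notes above (caution at 43C, deadly at 49C) amount to a simple threshold check. The sketch below shows one way to express it. It assumes that each threshold is inclusive and that the reading is in degrees Celsius; the guide states neither explicitly, and the function name is hypothetical.

```python
TRIP_CAUTION_C = 43  # server saves running data, shuts down one minute later
TRIP_DEADLY_C = 49   # server shuts down immediately


def thermal_action(temp_c):
    """Map a temperature reading to the documented server response.

    Assumption: a reading equal to a trip point triggers that trip.
    """
    if temp_c >= TRIP_DEADLY_C:
        return "immediate shutdown"
    if temp_c >= TRIP_CAUTION_C:
        return "save running data, then shut down after one minute"
    return "normal operation"


if __name__ == "__main__":
    for reading in (35, 43, 50):
        print(reading, thermal_action(reading))
```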
Novell Support
1.26GHz, 1.13GHz, 1GHz, 933MHz, 866MHz, 800MHz or 550MHz processor Dual processor capability (except for 550MHz processor) Customers who choose to upgrade from 1GHz and below will require an upgrade kit in addition to the processor option kit. 128 MB 133-MHz ECC registered SDRAM DIMM memory expandable to 4GB 256KB One 64-bit 33MHz PCI slot One 32-bit 33MHz PCI slot
Intel Xeon 1.4GHz, processor with 133MHz front side bus Dual processor capability Customers who choose to upgrade from 1GHz and below will require an upgrade kit in addition to the processor option kit. 256 MB 133-MHz ECC registered SDRAM DIMM memory expandable to 4GB 512KB Level 2 Two full length expansion PCI slots: 64-bit/66MHz
Processor upgrades
Intel Xeon 2.4GHz, 2.8GHz or 3.06GHz processor with 533MHz front side bus Dual processor capability Option kits available for Intel Xeon 3.06GHz, 2.80 GHz, 2.40 GHz processors
Memory
512 MB or 1024 MB 266MHz PC2100 DDR SDRAM expandable to 8GB 512KB Level 2 1024KB Level 3 Two full length expansion PCI-X slots: 64-bit/100MHz Note: One PCI-X slot if redundant power supply installed
Network controller
Two NC7780 PCI-X 10/100/1000-T Server Adapter Note: 64-bit/133MHz PCI-X bus speeds not supported - will run at 64-Bit/66MHz. Smart Array 5i Controller (integrated on system board) Note: External SCSI port not offered Wide Ultra2/Ultra3 SCSI Drive Cage supports up to two 1 hot plug hard drives Maximum internal storage 293.6 GB (2 x 146.8 GB Ultra320, 1" drives) Optional removable CD-ROM/Diskette Drive Assembly Serial port Two RJ-45 ports External SCSI connector Keyboard port Mouse port Two USB ports iLO remote management port
Storage controller
Smart Array 5i Plus Controller (integrated on system board) Note: External SCSI port not offered Wide Ultra320 SCSI Drive Cage supports up to two 1 hot plug hard drives Maximum internal storage 293.6 GB (2 x 146.8 GB Ultra320, 1" drives) Optional removable CD-ROM/Diskette Drive Assembly Serial port Two RJ-45 ports External SCSI connector
Wide Ultra2/Ultra3 SCSI Drive Cage supports up to two 1 hot plug hard drives Maximum internal storage 145.6 GB (internal drive cage) (2 x 72.8 GB Wide Ultra3, 1 drives) Optional removable CD-ROM/Diskette Drive Assembly Serial port Two RJ-45 ports External SCSI connector
Interfaces
management port
Chassis Warranty
Three-year on-site Next-Business-Day limited Global warranty Extended Pre-Failure Warranty covers Pentium III processors, memory, and hard drives
Protected by HP Services, including a three-year, next business day on-site limited global warranty and extended Pre-Failure Warranty which covers processors, memory, and hard drives Certain restrictions and exclusions apply.
Protected by HP Services, including a three-year, next business day on-site limited global warranty and extended Pre-Failure Warranty which covers processors, memory, and hard drives Certain restrictions and exclusions apply.
The DL360 has dual processor capabilities with all but 550MHz processors. Installing two 550MHz processors will generate a halt and POST error.
The system automatically detects and configures settings when a processor is added or replaced.
Processor socket 1 must be populated at all times for the system to complete POST.
1.26/1.13GHz SKUs use a different system board than the 1GHz and below SKUs. The new system board is NOT backwards compatible with the 1GHz and below processors. This is because the 1.26 and 1.13GHz processors have 512K of level-2 cache and require a new VRM (PPM) and socket.
To upgrade 550 MHz, 800 MHz, 866 MHz or 933 MHz Models to 1.0 GHz, the HP ProLiant DL360 P1000 Upgrade Kit (PN 225352-B21) is required.
When upgrading 550 MHz, 800 MHz, 866 MHz, 933 MHz, or 1.0 GHz Models to a 1.266 GHz or 1.133 GHz Model, the HP ProLiant DL360 P1133/P1126 Upgrade Kit (236122-B21) is required.
Only PC133MHz ECC registered DIMMs can be used in this server.
External drives support RAID 0 only when connected to the integrated SCSI controller.
The integrated SCSI controller supports only single tape drives, not tape libraries.
If Wide Ultra2 and Wide Ultra3 drives are mixed on the embedded array controller, all drives will operate at Wide Ultra2 speeds.
The shipping pin must be removed before the CD-ROM/Floppy drive assembly can be ejected.
The server must be placed in power standby mode before removing the CD-ROM/Floppy drive assembly.
The drive assembly bay should always have either the CD-ROM/Floppy drive assembly or a bezel blank installed for proper air flow. Failure to do so may result in thermal damage.
All Remote Insight Boards must be installed in the 32-bit PCI slot.
To allow LAN access to the Remote Insight Lights-Out Edition (RILOE), a LAN cable must be attached to the RJ-45 connector on the RILOE board. The RJ-45 connector on the rear panel will not provide network access to the RILOE.
The CMOS/NVRAM battery is not soldered down and is replaceable.
Running the server without an expansion board or an expansion slot cover in each of the expansion slots may cause thermal damage.
Some POST error messages are non-standard. Always refer to the Maintenance and Service Guide for POST error messages.
A Unit Identification Switch on the front and back of the server allows easy identification of the server in a rack.
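The DL360 processor upgrade rules in the notes above can be summarized as a lookup from current and target processor speed to the required upgrade kit. This is a sketch under stated assumptions: speeds are given in MHz, the function name is hypothetical, and only the two part numbers quoted in the notes are used. Combinations the notes do not cover return None rather than a guess.

```python
KIT_P1000 = "225352-B21"        # P1000 Upgrade Kit, per the notes above
KIT_P1133_P1266 = "236122-B21"  # upgrade kit for the 1.133/1.266 GHz models


def required_upgrade_kit(current_mhz, target_mhz):
    """Return the upgrade kit part number for a DL360 processor upgrade.

    Returns None when the notes do not list a kit for the combination.
    """
    if target_mhz == 1000 and current_mhz in (550, 800, 866, 933):
        return KIT_P1000
    if target_mhz in (1133, 1266) and current_mhz in (550, 800, 866, 933, 1000):
        return KIT_P1133_P1266
    return None


if __name__ == "__main__":
    print(required_upgrade_kit(866, 1000))
    print(required_upgrade_kit(1000, 1266))
```

The processor option kit itself is still needed in addition to the upgrade kit, as the feature table states.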
Miscellaneous
The standard features of the ProLiant DL380 G1, G2 and G3 servers include:
DL380 G1 Processor DL380 G2 DL380 G3
Pentium III 1GHz, 933MHz, 866MHz, 800MHz, 733MHz or 667MHz Upgradeable to dual processing 128 MB 133-MHz ECC SDRAM memory expandable to 4GB
Intel Xeon Processor 3.06 GHz, 2.8GHz, 2.4GHz Upgradeable to dual processing 1024 MB 266MHz PC2100 DDR SDRAM on 3.06GHz models expandable to 12GB or 512 MB 200MHz PC2100 DDR SDRAM on 2.8GHz models or lower expandable to 6GB Advanced ECC and online spare capable 512KB Level 2 1024KB Level 3 Two 64-bit/ 100MHz hot plug One 64-bit/133MHz non-hot plug Two integrated NC7781 PCI-X Gigabit NICs Integrated Smart Array 5i+ controller
Memory
512KB per processor Two 64-bit/ 66MHz hot plug One 64-bit/33MHz non-hot plug Two HP NC3163 Fast Ethernet NIC 64 PCI dual base controller Integrated Smart Array 5i controller
Network controller
Embedded HP NC3163 Fast Ethernet 10/100 PCI NIC with Wake on LAN Integrated Smart Array controller
Storage controller
1.44MB diskette drive, low-profile 24X max CD-ROM drive Support for up to six 1 Wide Ultra2/Ultra3 hot plug hard drives: 4 in the standard drive cage; 2 in an optional 2x1inch drive cage Two serial ports/one parallel port RJ-45 port External SCSI port (for tape only) ports
1.44MB diskette drive, low-profile 24X max CD-ROM drive Support for up to six Ultra3 hot plug hard drives: five 1 drives and one 1.6 (for disks or tape).
1.44MB diskette drive, low-profile 24X max CD-ROM drive Support for up to 6 drives with single or dual channel (using either the embedded Smart Array 5i Plus controller or a PCIbased controller) One serial port/two USB ports Three RJ-45 ports (one for iLO remote management) External SCSI port (for tape only) Keyboard and mouse ports XL video controller with 8MB video memory
Interfaces
One serial port/two USB ports Two RJ-45 ports External SCSI port (for tape only) ports
Graphics
IIC video controller with 4MB video memory 3U form factor rackmount Three-year on-site limited warranty Extended Pre-Failure Warranty covers Pentium III processors, memory, and hard drives
IIC video controller with 8MB video memory 2U form factor rackmount Three-year on-site limited warranty Extended Pre-Failure Warranty covers Pentium III processors, memory, and hard drives
Chassis Warranty
Three-year on-site limited warranty Extended Pre-Failure Warranty covers processors, memory, and hard drives
Ref. #  Description
1   Hot-plug drive cage accommodating four 1-inch height SCSI hot-plug hard drives
2   Two 5.25-inch wide x half-height drives
3   Optional drive cage that supports two 1-inch media devices
4   Low-profile IDE CD-ROM drive
5   Diskette drive

Ref. #  Description
1   Support for up to six 1-inch, hot-plug SCSI hard drives
2   Support for one optional 1.6-inch HP Universal Hot-Plug Tape drive with five hot-plug SCSI hard drives installed
3   One bay occupied by a slimline 1.44-MB diskette drive
4   One CD MultiBay occupied by a removable 24X IDE CD-ROM drive
Processor slot 1 must be populated at all times. If it is necessary to remove the processor from slot 1, install the second processor in slot 1.
Both processors must be the same speed. Pentium III processors can no longer be down-clocked or up-clocked. Intel now locks in the speed.
The system ROM maintains a primary and redundant image of the BIOS. If one image is corrupt, POST error 105 "Current System ROM is corrupt - now booting redundant system ROM" will appear. If both ROM images are corrupt, enable disaster recovery mode by setting SW2 positions 1, 4, 5, and 6 to the ON position and rebooting.
Memory
Only PC133MHz ECC registered DIMMs can be used.
IMPORTANT: Do not force the installation of DIMMs. If the alignment does not match, it is probably the wrong type of DIMM.
CMOS and NVRAM Battery
This battery is not soldered onto the board and is replaceable.
Storage
If only 1 SCSI drive is used, it should be installed in Bay 0.
Wide Ultra2 and Wide Ultra3 drives can be mixed on the embedded controller, but all drives will operate at Wide Ultra2 speeds.
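The redundant ROM behavior described in the DL380 notes can be read as a three-way decision. The sketch below is one interpretation, not firmware logic: it assumes that a healthy current image boots normally even if the redundant image is corrupt, which the guide does not state, and the function name is hypothetical.

```python
def rom_boot_action(current_image_ok, redundant_image_ok):
    """Return the expected outcome for a given state of the two ROM images.

    Assumption: a good current image boots normally regardless of the
    state of the redundant image.
    """
    if current_image_ok:
        return "boot from current system ROM"
    if redundant_image_ok:
        return "POST error 105; boot from redundant system ROM"
    return "set SW2 positions 1, 4, 5 and 6 to ON and reboot (disaster recovery mode)"


if __name__ == "__main__":
    print(rom_boot_action(False, True))
    print(rom_boot_action(False, False))
```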
ProLiant DL560
Three PCI-X slots total: Two 64-bit 100MHz non-hot plug One 64-bit 133MHz non-hot plug Two embedded 10/100/1000 HP NC7781 Dual channel Ultra3 controller Smart Array 5i Plus controller (with 64MB memory) One slimline 1.44MB ejectable drive One slimline 24X IDE ejectable CD-ROM Up to two internal 1 U320 hot plug hard drives Serial, mouse, keyboard and video ports One RJ-45 iLO connector Two RJ-45 NIC connectors Two USB ports
Interfaces
Video Warranty
Integrated 1280x1024, 16M color Video Controller with 8MB Video Memory Three-year on-site limited warranty. 3-years parts, 3-years labor, and 3-years onsite repair. Pre-Failure Warranty (on hard drives, memory, and processor)
The DL560 G2 has an embedded array controller (Smart Array 5i) which connects to the two drives on the front of the server. The array controller does not have a connection to support external drives, so customers must install an array controller in an expansion slot to use external SCSI storage.
Fans
When the fans are in a fully redundant configuration, all of the fans spin up at power up and self-test. When the redundant fans spin down after the server completes POST, there is a sudden reduction in fan noise. This is normal.
Memory
Install memory in pairs of identical DIMMs. All DIMMs installed must be the same speed.
Install DIMMs into both slots of the next available memory bank, beginning with bank A, then bank B, and lastly bank C.
207-Memory Configuration Warning - DIMM In DIMM Socket X does not have Primary Width of 4 and only supports standard ECC.
209-Online Spare Memory Configuration - Spare bank is invalid. Mixing of DIMMs with Primary Width of x4 and x8 is not allowed in this mode.
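The DL560 memory population rules above (identical pairs, same speed throughout, banks filled in the order A, B, C) lend themselves to a simple pre-installation check. The following is a minimal sketch; representing a DIMM as a (size in MB, speed in MHz) tuple and the function name check_population are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Dimm = Tuple[int, int]  # (size in MB, speed in MHz); hypothetical representation
BANK_ORDER = ("A", "B", "C")


def check_population(
    banks: Dict[str, Tuple[Optional[Dimm], Optional[Dimm]]]
) -> List[str]:
    """Return a list of rule violations; an empty list means the plan is valid."""
    problems = []
    speeds = set()
    seen_empty = False
    for name in BANK_ORDER:
        first, second = banks.get(name, (None, None))
        if first is None and second is None:
            seen_empty = True
            continue
        if seen_empty:
            problems.append(f"bank {name} is populated but an earlier bank is empty")
        if first is None or second is None:
            problems.append(f"bank {name} has only one DIMM installed")
        elif first != second:
            problems.append(f"bank {name} DIMMs are not identical")
        speeds.update(d[1] for d in (first, second) if d is not None)
    if len(speeds) > 1:
        problems.append("installed DIMMs do not all run at the same speed")
    return problems


if __name__ == "__main__":
    plan = {"A": ((512, 200), (512, 200)), "B": ((1024, 200), (1024, 200))}
    print(check_population(plan))
```

The check does not cover the x4/x8 primary width restrictions reported by POST messages 207 and 209.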
The standard features of the ProLiant DL580 and DL580 G2 include the following:
DL580
Processors: Intel Pentium III Xeon processor 900MHz, 700MHz; upgradeable to quad processing
Cache memory: 2MB L2 per processor (900MHz, 700MHz); 1MB L2 per processor (700MHz only)
Memory: 1024MB PC100MHz Advanced ECC SDRAM (900MHz); 512MB PC100MHz Advanced ECC SDRAM (700MHz); support for a maximum of 16GB

DL580 G2
Processors: 2.8, 2.5, 2.0, 1.9, 1.6, 1.5, 1.4 GHz Xeon MP; upgradeable to quad processing
Cache memory: 2MB (2.0, 2.8GHz) iL3 or 1MB (2.5, 2.0, 1.9, 1.6, 1.5 GHz) iL3 or 512KB (1.40 GHz) iL3
Memory: 2048MB 200MHz DDR, Advanced ECC, 4:1 interleaved (2P model); 1024MB 200MHz DDR, Advanced ECC, 4:1 interleaved (1P model); support for a maximum of 32GB; Online Spare Memory, Single Board Mirrored Memory, Hot-Plug Mirrored Memory
Expansion slots
Six total, five available: Two 64-bit 66MHz PCI Hot Plug (one available) Two 64-bit 33MHz PCI Hot Plug One 64-bit 33MHz PCI Non-Hot Plug One 32-bit 33MHz PCI Non-Hot Plug Integrated 10/100 NC3134 Fast Ethernet Wake On LAN support Integrated dual channel Wide Ultra2 SCSI controller Optional integrated Smart Array controller
Four full length hot pluggable 64-bit/100 MHz PCI-X slots Two full length non-hot pluggable 64bit/100 MHz PCI-X slots (one available, one used for the NIC)
Integrated HP NC7770 PCI-X Gigabit Server Adapter in a slot Integrated Smart Array 5i Plus Controller (Dual Channel, Ultra3) with 64-MB total memory on 5i Plus Memory Module Battery-Backed Write Cache Enabler module on all 2P models (optional on 1P model)
Storage
One 1.44MB diskette drive One 32X Max or faster IDE CD-ROM drive Four 1" hard drive bays One serial port (2nd available with auxiliary serial connector provided) Parallel port External Wide Ultra2 SCSI RJ-45 port Keyboard port Mouse port Integrated ATI RAGE IIC Video Controller with 4MB Video Memory Next-business-day, three-year on-site limited warranty. Coverage is for parts, labor, and onsite repair. Pre-Failure Warranty (on hard drives, memory, and processor)
One 1.44MB diskette drive 24x IDE CD-ROM Drive (slim line) ejectable for security and serviceability Four 1 hard drive bays One serial port Keyboard port Mouse port Graphics port iLO remote management RJ-45 port USB ports (2)
Interfaces
Video Warranty
Integrated ATI RAGE IIC Video Controller with 8MB Video Memory Next-business-day, three-year on-site limited warranty. Coverage is for parts, labor, and onsite repair. Pre-Failure Warranty (on hard drives, memory, and processor)
There is an Internal Diagnostic Display on the peripheral board which indicates the failure of a memory module or processor. It displays a two-digit alphanumeric code that corresponds to a specific memory module or processor. The Diagnostic jumper must be removed before the IDD will display a code. Internal Diagnostic Display (IDD) Indicator Codes
Serial Port B is not installed in the factory. A cable is included in the country kit that can be connected from the peripheral board to the back of the chassis. There is a blank plate installed which can be removed when the connector is installed.
The WOL feature is only supported by operating systems that support ACPI. At this time, that is only Windows 2000. The WOL feature is enabled through the System Configuration Utility. Use the following steps to enable WOL.
1. Press the Ctrl and A keys before the Continue message displays. This will take you to the Advanced Mode.
2. Scroll down to find the Enable WOL selection.
All Remote Insight Boards, including the new HP Remote Insight Lights-Out Edition, must be installed in PCI slot 6. Cabling the board to J8 of the system board provides the Lights-Out Edition with full control over the server power state.
There are no VRM or PPM slots. This machine has On-Chip Voltage Regulation (OCVR). The VRM or PPM is part of the processor cartridge.
To use the IRC capability, an external modem must be connected to one of the serial ports.
In the event that the server locks up during normal operation because of software or hardware problems, depress and hold the power switch for a minimum of four seconds. This causes the system to transition to an off state.
ProLiant DL590/64
Expansion slots
Eleven total, ten available: Eight 64-bit 66MHz PCI Hot Plug, eight available Three 64-bit 33MHz PCI Hot Plug, two available Integrated dual port 10/100 NC3134 Fast Ethernet Upgradeable to quad port Gigabit Integrated, dual channel, Wide-Ultra2 SCSI Smart Array Controller Optional support of Ultra3 and Fibre Channel in an I/O slot
Integrated LS-120 Drive Integrated 24x IDE CD-ROM (Slim Line) Drive Four 1 hard drive bays Two serial ports Parallel port Two USB ports RJ-45 port Keyboard port Mouse port
Interfaces
Video Warranty
Embedded ATI Rage XL Video Controller with 8-MB SDRAM Video Memory Next-business-day, three-year on-site limited warranty. Coverage is for parts, labor, and onsite repair. Pre-Failure Warranty (on hard drives, memory, and processor)
The actual power supplies and the power supply blanks look very similar. To install a new hot-plug power supply, you must remove a blank. If the server is powered on, make sure that you remove a blank and not the actual power supply. If you mistakenly remove a power supply and it is the only power supply in the server, then the server loses power, resulting in loss of data. (A power supply has a release lever on the left side, whereas a blank has a release tab at the top instead of a release lever.)
If you are using a 120-volt AC power source to power the ProLiant DL590/64 server, do not install more than 2 power supply/SPM pairs. Using 3 power supply/SPM pairs requires more current than a 120-volt power source can provide. Consequently, the server will not power up under these conditions.
HP does not support transferring memory modules from other non-Itanium platforms.
To prevent damage to equipment or loss of information, HP strongly recommends using DIMMs supplied by HP. The ProLiant DL590/64 also supports third-party industry standard 168-pin PC100 CL2 ECC Registered DIMMs with the following restrictions: TDAL=3 Tclks, TRC=6 Tclks when operating at Tclk = 15 ns (66.67 MHz). Other third-party memory may result in error messages and possible loss of data.
Do not grasp the components of the memory board VRM when removing it. These components are very fragile and can easily break. Hold the memory board VRM by the circuit board only. The ejection mechanism moves the VRM away from the memory board sufficiently to reduce the force needed to complete the memory board VRM removal.
The server will not power up if the bus-to-core ratio jumpers are set to a speed higher than the slowest processor installed in the server. Failure to properly set the bus-to-core ratio jumpers can cause damage to the server and void the warranty.
To prevent damage to the processor board, always ensure that a processor/PPM blank is installed in any unused processor slot before securing the triple beam in place.
To avoid damaging components when replacing the I/O board, ensure that the I/O board is attached to the subpan along with the System Power Module basket, PCI basket insulator, and I/O board latch release lever assembly.
To avoid damage to the I/O board and sideplane board connector when replacing the sideplane board assembly, ensure that the sideplane board assembly is fully pushed to the rear of the server to fully seat into the connectors before tightening the thumbscrew.
Never remove more than one of the hot-plug redundant I/O fans at a time while the server is powered up.
Loss of BIOS settings occurs when the battery is removed. BIOS settings must be reconfigured whenever the battery is replaced.
As a precaution, place a sheet of paper on the metal surface below the battery. This will prevent the battery from shorting out if it gets dropped on the metal surface.
When maintenance mode is enabled (the maintenance switch is set to on) and the system is powered up, NVRAM configuration is invalidated.
Power Source
Memory
Processor
I/O Board
ProLiant DL740
Memory
Expansion slots: Six 64bit/100MHz PCI-X hot pluggable
Network controller: Dual integrated (10/100/1000)
Storage controllers: Smart Array 5i
Storage and expansion: Four 1.0" Ultra320 Wide Ultra3 hard drives
Video: Integrated ATI RAGE XL Video Controller with 8-MB SDRAM Video Memory
Form factor: 4U (7") rack form factor - standard 19" rack-mountable
Management: Integrated Lights Out Standard (iLO)
Warranty: Limited Warranty includes 3 year Parts, 3 year Labor, 3-year on-site support; Pre-Failure Warranty
Racks
with 2MB L2 cache
700MHz Pentium III Xeon processor with 1MB or 2MB L2 cache
Supports up to eight processors for 8-way symmetric multiprocessing (SMP)
Memory
2048 MB ECC protected 100MHz ECC SDRAM DIMMs (4P Models)
1024 MB ECC protected 100MHz ECC SDRAM DIMMs (2P Models)
Support for a maximum of 16GB
Eleven 64-bit hot-plug (8 PCI-X, 3 PCI)
Dual 10/100 NC3134 NIC
Integrated Smart Array Controller: one channel for RAID support with internal drives, one for external tape drive
1.44MB diskette, 24X or greater IDE CD-ROM drive
Support for up to four one-inch hot-pluggable Wide Ultra2 SCSI hard drives in three combinable drive cages
Integrated ATI Rage IIC graphics controller with 2MB Synchronous Graphics RAM (SGRAM)
Rack-optimized 7U chassis

with 1MB L3 cache
Support for up to 8 processors - 4 processors ship standard
133MHz ECC SDRAM Hot Plug RAID Memory
2048MB min on 1.5GHz models (2560MB total)
4096MB min on >1.5GHz models (5120 MB total) / 64GB max (80GB total)
Eleven 64-bit hot-plug: ten 64-bit 100MHz PCI-X, one 64-bit 133MHz PCI
Integrated NC7770 PCI-X Gigabit
Integrated Smart Array 5i Controller
1.44MB diskette, 24X or greater IDE CD-ROM drive
Support for up to four one-inch hot-pluggable Ultra 320 SCSI hard drives
Integrated ATI RAGE XL Video Controller with 8-MB SDRAM Video Memory
Video
Form Warranty
Global three-year on-site limited warranty for parts and labor with next-business-day response
Extended Pre-Failure Warranty (if Insight Manager is installed on the server) covers processors, memory, and hard drives
Upgrade Kit
An upgrade option kit (PN 190756-B21) is available to upgrade any ProLiant 8500 or ProLiant DL760 G1 to a ProLiant DL760 G2.
Reference  Description
1          Processor/memory module
2          One to eight 550MHz Pentium III Xeon processors with redundant processor power module
3          Integrated Management Display (IMD), optional for Model 1
4          Integrated 1.44MB diskette drive
5          Integrated high speed IDE CD-ROM drive (low-profile)
6          Media module
7          Four 1-inch Wide-Ultra2 SCSI hot-pluggable drive bays
8          Integrated lift handles
9          Two redundant hot-plug power supplies (single power supply on Model 1)
10         I/O module
11         Redundant hot-pluggable fans
12         Eleven hot-plug I/O slots including eight PCI-X and three PCI
13         System interconnect status indicators
Cables
Setup
PCI vs PCI-X
The ProLiant DL760 has several sets of LEDs that assist in troubleshooting:
Front panel (power status, fan status, Integrated Management Display)
System interconnect
Fan
PCI Hot Plug
Power supply

When one of the connected components in the interconnect chain is improperly seated in its connector or is missing, the System Interconnect LED associated with the fault origination point illuminates on the system midplane board and is displayed on the top access panel. When any of the interconnect LEDs is lit, the front panel power status LED illuminates amber.

System Interconnect Status Indicators
Indicator  Component            Indicator  Component
1          Emergency shutdown   10         Memory board
2          Processor MP8        11         Processor power module
3          Processor MP7        12         Processor and memory module
4          Processor MP6        13         I/O module and fans
5          Processor MP5        14         Media module
6          Processor MP4        15         SCSI backplane 1
7          Processor MP3        16         SCSI backplane 2
8          Processor MP2        17         Reset
9          Processor MP1
The ProLiant DL760 comes standard with two 1150/500W Redundant Hot Plug Power Supplies. Each power supply generates 500W in the 110VAC configuration or 1150W in the 220VAC configuration. The power supplies autosense 110VAC or 220VAC and are auto load balancing. They are microprocessor controlled which allows them to be monitored for advanced health and configuration management. Memory can be expanded to a maximum of 16GB. Install SDRAM DIMM modules two at a time in the proper sockets. When installing or replacing memory, you must use only 256MB, 512MB or 1GB SDRAM DIMMs. Each DIMM of a given bank must be the same size, type, and speed.
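The memory rules above can be expressed as a small pre-installation check. This is only an illustrative sketch: the function name, the tuple layout, and the reading of "two at a time" as a two-DIMM bank are assumptions, not part of any HP tool.

```python
ALLOWED_DIMM_MB = {256, 512, 1024}  # DIMM sizes the text permits
MAX_TOTAL_MB = 16 * 1024            # 16 GB maximum from the text
DIMMS_PER_BANK = 2                  # assumption: "two at a time" means a two-DIMM bank


def check_memory(banks):
    """Return a list of rule violations; an empty list means the layout passes.

    banks: list of banks, each a list of (size_mb, dimm_type, speed_mhz) tuples.
    """
    problems = []
    total_mb = 0
    for number, bank in enumerate(banks, start=1):
        if len(bank) != DIMMS_PER_BANK:
            problems.append(f"bank {number}: install DIMMs two at a time")
        # Identical tuples collapse to one set entry, so more than one entry
        # means the DIMMs differ in size, type, or speed.
        if len(set(bank)) > 1:
            problems.append(f"bank {number}: DIMMs differ in size, type, or speed")
        for size_mb, _dimm_type, _speed_mhz in bank:
            if size_mb not in ALLOWED_DIMM_MB:
                problems.append(f"bank {number}: {size_mb} MB DIMM is not supported")
            total_mb += size_mb
    if total_mb > MAX_TOTAL_MB:
        problems.append(f"total {total_mb} MB exceeds the 16 GB maximum")
    return problems


if __name__ == "__main__":
    layout = [[(512, "SDRAM", 100), (512, "SDRAM", 100)]]
    print(check_memory(layout) or "layout passes")
```

A matched pair passes, while a pair that mixes a 512 MB and a 256 MB DIMM is reported as a mismatched bank.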
The Processor Power Module (PPM) must be installed before you install the accompanying processor. Attempting to install the PPM afterward could damage the electronic components on the PPM.

On the host board are three rows of contacts. If a processor is not fully seated, these contacts will not line up and the unit will not function. To ensure proper seating for this sensitive connection, HP designed the ejectors on these processors to also serve as injectors that fully seat the processor. As you push down on the levers, they cam down to seat the processor completely.

To remove a processor:
1. Lift up and rotate the front and rear ejector levers on the processor outward.
2. Use the tabs to pull out the processor.

If you remove a processor, you must install a processor terminator board before powering up the server. The system will not power up if there are empty slots.

If more than 4 processors are installed, NT Enterprise is required.

Mixing of PII Xeon and PIII Xeon processors is not supported. Pentium III Xeon 550MHz, 700MHz, or 900MHz processors cannot be mixed in the same server; all processors must be the same speed.

When processors are installed on the second of the two system (processor) buses, a pair of Cache Coherency Accelerators must be installed. If the coherency accelerator memory fails or the modules are not installed properly, the system will initialize only processor bus 1.

There is up to a 40-second delay between power on and video.

F9 is used to access the ROM-based configuration utility.

This machine has a remote-flash redundant ROM, which allows recovery in the event of ROM failure.
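Two of the processor rules above (no empty slots, and all processors at the same speed) lend themselves to a quick check before power-up. The sketch below is hypothetical: the slot representation and function name are invented for illustration, and it does not model the Cache Coherency Accelerator or operating system requirements.

```python
def check_processor_slots(slots):
    """Return a list of rule violations for a set of processor slots.

    slots: one entry per slot. Each entry is a processor speed in MHz (int),
    the string "terminator" for a terminator board, or None for an empty slot.
    """
    problems = []
    # The text says the system will not power up with empty slots.
    if any(entry is None for entry in slots):
        problems.append("empty slot: install a processor or a terminator board")
    # The text says all processors must be the same speed.
    speeds = sorted({entry for entry in slots if isinstance(entry, int)})
    if len(speeds) > 1:
        mixed = ", ".join(f"{speed} MHz" for speed in speeds)
        problems.append(f"mixed processor speeds: {mixed}")
    return problems


if __name__ == "__main__":
    print(check_processor_slots([550, 550, 550, 550] + ["terminator"] * 4))
    print(check_processor_slots([550, 700, None]))
```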
Miscellaneous
Customers should install only 4 or 8 of the same processor in the DL760 G2. Mixing processors of different frequencies and cache sizes is not supported on the DL760 G2. Unpredictable behavior may result if processors are mixed. At the very least, this will cause the system to run at the slowest processor speed.

Shipping screws
The Processor & Memory Module needs to be re-extended about 2.5 inches to disconnect it from the System Midplane Assembly, and re-bolted with the orange shipping screws, before shipping the unit. The standard procedure for installing a DL760 G2 into a rack now involves removing the shipping screws prior to removing all modules and power supplies, attaching slides to the empty chassis and rails to the rack, mounting the chassis in the rack, reinstalling the modules, and verifying that all critical components are properly seated by reviewing the system status indicator lights. Alert customers and resellers to keep these shipping screws for future use.

Do not attempt a PCI hot plug procedure if your operating system does not provide PCI hot plug support or if you do not have the appropriate device drivers installed. Failure to take these precautions causes system shutdown and risks data integrity. Hot plug capability is only functional when using a hot plug aware expansion board and after installing:
The PCI-X Hot Plug device drivers (located on the SmartStart CD and HP website), and
An operating system that supports PCI-X Hot Plug technology (support levels vary)
Hot-Replace capability is operating system independent; Hot Add or Hot Upgrade require operating system and application support.

All models come standard with all five memory cartridges populated and Hot Plug RAID Memory enabled. Total memory consists of addressable memory plus the redundant memory. DIMMs are installed in bank pairs of ten. A bank of memory is five DIMMs, one in the corresponding slot across each of the five cartridges, and in order to achieve interleaving performance advantages, memory must be installed two banks at a time (a bank pair).

An LED bar is located directly underneath the memory cartridges. This LED bar includes an LED for all 40 DIMM slots in the memory subsystem. When a cartridge is removed from the server to replace a failed DIMM, the LED for the failed DIMM remains lit so that it can be matched with the label of the DIMM slot inside the memory cartridge. When a cartridge is removed and the server is therefore running in non-redundant memory mode, any attempt to unlock a second cartridge will not bring down the power to that second cartridge. Instead, audible and visual alarms will indicate the need to relock that second cartridge. If a second cartridge is unlocked and removed from the server while the server is running, the server will fail.
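The DL760 G2 specifications quote both addressable and total memory for Hot Plug RAID Memory (for example, 2048MB addressable and 2560MB total). The sketch below reproduces those figures under one assumption that the guide implies but does not state outright: with five cartridges, one cartridge's worth of capacity holds the redundant data, so total installed memory is 5/4 of addressable memory. The function names are illustrative.

```python
CARTRIDGES = 5            # memory cartridges in the DL760 G2
DIMM_SLOTS = 40           # DIMM slots covered by the LED bar
DIMMS_PER_BANK_PAIR = 10  # two banks of five DIMMs each


def total_installed_mb(addressable_mb):
    """Total installed memory, assuming one of five cartridges is redundancy."""
    return addressable_mb * CARTRIDGES // (CARTRIDGES - 1)


def max_bank_pairs():
    """Number of bank pairs that fit in the 40 DIMM slots."""
    return DIMM_SLOTS // DIMMS_PER_BANK_PAIR


if __name__ == "__main__":
    for addressable in (2048, 4096, 64 * 1024):
        print(addressable, "MB addressable ->", total_installed_mb(addressable), "MB total")
    print("bank pairs:", max_bank_pairs())
```

The three addressable sizes listed for the server (2048MB, 4096MB, and 64GB) map to the 2560MB, 5120MB, and 80GB totals given in the specifications, which is what suggests the 5/4 ratio.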
Blade Servers
HP's BL line is a line of ProLiant servers that offers power-efficient servers in ultra-dense, space-saving packaging. The offerings in the BL line include:
Objectives
To demonstrate an awareness of HP blade server products, service personnel should be able to:
Describe the features and characteristics of HP BL servers. Locate configuration and service information relative to each product.
ProLiant BL10e
3U form factor (5.25" high x 17.5" wide x 28.35" deep)
Bays for up to 20 ProLiant BL10e server blades
RJ-45 Patch Panel (with 40 RJ45 connectors) or RJ21 Patch Panel (with 4 RJ21 connectors) or ProLiant BL e-Class C-GbE interconnect switch (4 Gigabit Ethernet uplinks)
Local console and remote network access
Remote power control for enclosure and server blades
Remotely toggle on/off unit identification LEDs for blades and enclosure
Monitors/controls enclosure fans, temperature sensors, blade status
Connects to each blade's serial console
Two diagnostic adapter interfaces provide ProLiant BL10e server blade with diagnostic LEDs, buttons and the following external ports: mouse, keyboard, video, serial, USB (2)
Three-year limited warranty on enclosure and interconnect trays
Interconnect Tray
Warranty
ProLiant BL10e Server Blade

BL10e G1
Processor: Single ultra-low voltage (ULV) Pentium III 900MHz, 100MHz front side bus
Cache memory: 512KB L2
Memory: 512MB PC133MHz registered ECC SDRAM; expandable to 1GB maximum using a total of two DIMM slots
None available
Two NC3163 10/100 Fast Ethernet NICs 64 with Wake on LAN (WOL)
40-GB Ultra ATA/100 5,400 rpm non-hot-plug hard drive, 2.5"
Diagnostic port support for local keyboard, video, mouse, diskette drive
Also supports USB devices: keyboard, mouse, CD-ROM, floppy disk
One-year limited warranty on server blades

BL10e G2
Processor: Single ultra-low voltage (ULV) Pentium M 1GHz, 400MHz front side bus
Cache memory: 1MB L2
Memory: 512MB PC2100 registered ECC DDR; expandable to 1GB maximum using a total of two DIMM slots
None available
Two NC3163 10/100 Fast Ethernet NICs 64 with Wake on LAN (WOL)
40-GB Ultra ATA/100 5,400 rpm non-hot-plug hard drive, 2.5"
Diagnostic port support for local keyboard, video, mouse, diskette drive
Also supports USB devices: keyboard, mouse, CD-ROM, floppy disk
One-year limited warranty on server blades
Reference  Description
1          Hot Plug Power Supply (600W)
2          Center wall assembly
3          ProLiant BL e-Class Integrated Administrator module
4          Fan backplane assembly
5          Fan cage
6          Hot Plug fan
7          Enclosure status assembly
8          ProLiant BL e-Class C-GbE Interconnect Switch interconnect tray
9          RJ-21 patch panel interconnect tray
10         RJ-45 patch panel interconnect tray
Reference  Description
1          ProLiant BL10e server blade
2          ATA hard drive assembly
3          133MHz SDRAM DIMM
4          3.3-V Lithium battery
Use only 128-MB, 256-MB, or 512-MB, 72-bit wide, 3.3 V, registered ECC SDRAM. SDRAM can be either 100 or 133 MHz. Use HP SDRAM only.

You can perform a graceful shutdown of a ProLiant BL10e server blade or a ProLiant BL e-Class enclosure by using the Power Off option in the Integrated Administrator. You can also perform a graceful shutdown of a ProLiant BL e-Class enclosure and all the server blades by pressing the enclosure power button on the rear of the enclosure if your operating system is Microsoft Windows 2000. If your operating system is Red Hat Linux, you must have the HP Linux Health driver installed in order for the server blades to shut down gracefully.

You can perform an emergency shutdown of a ProLiant BL10e server blade by pressing and holding the power button on the front of the server blade for four seconds. You can also perform an emergency shutdown of a ProLiant BL e-Class enclosure and all server blades by pressing and holding the enclosure power button for four seconds.

Note: Performing an emergency shutdown of a server blade may result in the loss of any unsaved data.

The Integrated Administrator performs an emergency shutdown of the enclosure and all server blades only after trying for five minutes to perform a graceful shutdown. If, after five minutes, the Integrated Administrator cannot perform a graceful shutdown on the enclosure and all server blades, it performs an automatic emergency shutdown. Performing an emergency shutdown on the enclosure may result in the loss of any unsaved data on all server blades in that enclosure.

Integrated Administrator security settings are assigned to server blade bays, not to server blades. If server blades change locations within the enclosure, Integrated Administrator settings must also be adjusted to ensure accurate security.

To avoid a thermal event, do not remove a failed power supply until a replacement power supply is available.
Intel Pentium III FC-PGA processor 1.40GHz Upgradable to dual processing 512-KB Level 2 Cache 133MHz bus
2.8GHz or 3.06GHz Intel Xeon processor Upgradeable to dual processing 1MB Level 3 cache (3.06 GHz only) 512KB Level 2 cache (3.06GHz, 2.8Ghz) 533MHz front side bus 512MB ECC PC2100 DDR Std/8GB max Integrated Smart Array 5i Plus with optional battery-backed write cache Three 10/100/1000 NICs 1 Dedicated iLO Port
512MB PC133 ECC SDRAM Std/4GB max Integrated Smart Array 5i with optional battery-backed write cache Three 10/100 NICs 2 upgradeable to 10/100/1000T 1 Dedicated iLO Port
Hard Drive Bays: Two 3.5" SCSI hot plug drive bays
Slots: No PCI slots - all features are integrated
Chassis: 1U X 6U form factor plugs vertically into 6U server enclosure
Server mgmt: Integrated Lights-Out
Power: Rack-centralized External shared redundant hot-plug power
Server Blade Enclosure:
10 bays available - 8 bays for server blades plus 2 outside bays for interconnect modules
Server blades blind mate into the server blade enclosure backplane for power and data connections
Up to 6 BL p-Class 6U server blade enclosures fit in a 42U rack
Server blade management module attached to the back of each server blade enclosure to report events for all servers and provide asset and inventory information
While installing a Service Pack, the following dialog box may display: "Your computer vendor installed this file on your computer. Do you want this Service Pack to replace this file?" Click the NO button. Do not overwrite Compaq or HP software when prompted while installing a service pack, unless instructed to do so.

Enclosure insertion precaution
When servicing the power and server blade enclosures, be aware that the server blade enclosure and the power enclosure do not have locking mechanisms that prevent them from sliding out of the rack while servicing.

iLO default address
The default address for iLO is 192.168.1.1. Ensure this address is not used before plugging a new blade into the network.

iLO port connect precaution
Do not connect the front iLO ports to a hub. All server blades have the same IP address through the diagnostic port. Multiple blades on a hub are indistinguishable on the network.

Blade removal procedure
When removing server blades, physically label each server blade to ensure it will be installed back into the same position in the enclosure.

Diagnostic cable connection
Connecting to the diagnostic port with the diagnostic cable automatically disables the iLO connection on the rear of the server blade.

Power up failure after blade insertion
If the server does not automatically power up and POST after being inserted in the enclosure, press and hold the power button on the front of the server blade for at least 6 seconds. If using the Virtual Power Button in iLO, always use the Press and Hold selection when powering up a server blade for the first time.

iLO and blade failure to respond
If both iLO and the blade are not responding, view iLO on an adjacent blade. If the adjacent blade appears normal, reset the out-of-service blade and iLO by operating the release lever and backing the blade out enough to disconnect power entirely from the blade for about 10 seconds. Re-install the blade and view iLO again. If iLO is still inoperable, remove the blade and test it in the diagnostic station (or replace the blade with a spare for testing purposes, or insert the blade into a spare blade slot in the enclosure, if any, and try again).

The new ProLiant BL20p G2 blade fits into the same enclosure as the BL20p and BL40p blades and shares the same power. SAN connectivity on the ProLiant BL20p G2 is provided using a Dual Port Fibre Channel Mezzanine Card specifically designed for it. The card cannot be installed in the ProLiant BL20p G1.
ProLiant BL40p
Processor:
Xeon MP 2.8GHz, 2.0GHz, 1.5GHz
400MHz bus
2MB Level 3 cache (2.8GHz, 2.0GHz only)
1MB Level 3 cache (2.0GHz, 1.5GHz)
Up to 4 processors
RAM Std/Max:
1GB PC2100 ECC DDR Std/12GB max (2p model)
512MB PC2100 ECC DDR Std/12GB max (1p model)
Advanced memory protection with online spare
Drive controller: Integrated Smart Array 5i Plus with optional battery-backed write cache
NIC:
Five 10/100/1000T Ethernet PXE enabled connections
One dedicated iLO port
Hard Drive Bays: Four 3.5" SCSI hot plug drive bays
Slots: Two PCI-X slots for SAN connectivity
Chassis:
Plugs vertically into p-Class server enclosure
Up to 12 BL40p blades fit in 42U rack
Server mgmt: Integrated Lights-Out
Power: Rack-centralized External shared redundant hot-plug power
Server Blade Enclosure: Four bays wide X 6U high form factor - plugs vertically into 6U server enclosure
Server Blades (by default) may be configured to power up upon insertion; however, this setting can be changed through iLO to manual power-up using the power button. Use the setting on the iLO Rack Settings page called Enable Automatic Power On. If a server is removed for any reason, ensure a blank is inserted in its place. When removing server blades, physically label each server blade to ensure it will be installed back into the same position in the enclosure. The default address for the iLO front port on all blades is always 192.168.1.1. Change this port to an unused address before plugging it into a network.
Cluster Line
HP's cluster line offers simple and affordable packaged clusters powered by ProLiant servers and Smart Array technology. The current offerings in the packaged cluster line include:
ProLiant DL380 G3 Cluster
ProLiant DL380 G3 Integrated Cluster
ProLiant DL380 G2 Cluster
ProLiant CL380
Objectives
To demonstrate an awareness of HP packaged cluster products, service personnel should be able to:
Describe the features and characteristics of HP packaged cluster servers. Locate configuration and service information relative to each product.
The standard features of the ProLiant DL380 G2 and DL380 G3 packaged clusters include:
DL380 G2
Servers: Two ProLiant DL380 G2 servers; server features listed under DL380 G2
Packaging: 8U configuration fixture
Cables: Two VHDCI SCSI cables (one per server); Ethernet crossover cable (cluster heartbeat for MSCS)
Warranty: Three-year on-site limited warranty; Extended Pre-Failure Warranty covers processors, memory, and hard drives

DL380 G3
Servers: Two ProLiant DL380 G3 servers; server features listed under DL380 G3
Packaging: 8U configuration fixture; 14U configuration fixture available for racked version
Storage: Smart Array Cluster storage; now supports U320 SCSI 10K and 15K rpm Universal Hard Drives; 4-Port Shared storage module option for the highest level of availability (multipath software included)
Cables: Two VHDCI SCSI cables (one per server); Ethernet crossover cable (cluster heartbeat for MSCS)
Warranty: Three-year on-site limited warranty; Extended Pre-Failure Warranty covers processors, memory, and hard drives
3. 4U Shared Storage with 14 1" Hot Plug Drives
4. Open rack space for options (6U total)
The standard features of the ProLiant CL380 packaged cluster include the following (all features are per server unless otherwise specified):
Processor: Intel Pentium III 1.0 GHz
Cache Memory: 256KB level 2 writeback cache per processor
Upgradeability: Upgradeable to dual processing
Memory: 128MB PC 133MHz registered ECC DRAM; maximum 4GB
Network Controller:
Embedded NC3163 Fast Ethernet PCI 10/100 WOL for heartbeat monitoring
NC3123 Fast Ethernet PCI 10/100 for public LAN (occupies one PCI slot per server)
Expansion Slots:
Four total, two available
Three 64-bit/66MHz PCI 3.3V or universal cards (two available)
One 32-bit/33MHz PCI 5V or universal cards (not available)
64-bit dual channel Wide Ultra2 in PCI slot (interface to shared storage)
Integrated Smart Array Controller (utilized for server boot)
1.44MB diskette drive, 24x IDE CD-ROM, no hard drives
Shared internal storage can accommodate six 1" Wide Ultra3 drives
Optional non-shared internal storage can accommodate two 1" Ultra2/Ultra3 drives
One RAID CR3500 controller ships standard; second controller optional (per cluster)
Up to six drives in cluster server cabinet (per cluster)
One parallel, two serial, mouse, keyboard, external SCSI (tape only), RJ45
Integrated ATI RAGE IIC video controller with 4MB memory
Three year limited; pre-failure coverage of processors, memory, hard drives
Introduction
This module gives an overview of HP Smart Array products. Topics include:
Drive array technology
RAID and fault tolerance
HP Smart Array controller features
HP Smart Array controller service considerations
HP Smart Array controller configuration utilities
Objectives
To demonstrate knowledge of HP array products and utilities, service personnel should be able to:
Describe the features and benefits of drive array technology.
Explain how HP array controllers support RAID and fault tolerance.
Describe the features and characteristics of current HP array controllers.
Describe the key features of HP array configuration utilities.
List general service considerations for array controllers.
Drive array technology distributes data across a series of drives that have been configured as a single logical volume. This data distribution scheme makes it possible to access data from multiple drives more quickly than from any one physical drive. It also allows the arrayed drives to service multiple requests simultaneously.
In addition to having multiple drives logically configured as a single drive, drive arrays provide the following features:
Data striping across multiple drives. A file is divided into a selected number of sectors and then written across a series of drives. The process of writing (or reading) a file across multiple drives is much faster.
Multiple channels. The drive array has up to four channels that can be used at the same time, thus increasing performance.
Request processing. Because multiple commands can be issued across multiple devices, the commands can be processed at the same time and the requests are processed in the most logical order (Tagged Command Queuing).
Advanced Data Guarding with Two Sets of Parity; ADG is sometimes called RAID 6. The following pages describe these levels in detail.
NOTE: The HP RAID implementations are achieved at the hardware level. Some operating systems support RAID configurations implemented in software at the operating system level. Software implementations of RAID add additional overhead to the CPU and are less efficient than hardware implementations.

NOTE: RAID levels 2 and 3 (Complex Error Correction and Parallel Transfer, Parity Drive) are no longer used in the industry.
The above illustration shows how a file is broken into stripes (or segments) and then written across multiple disks. This greatly improves the disk latency (the amount of time a disk head has to wait for the target sector to move under the head). In addition, 100 percent of the disk space is available for data and overall disk performance is improved.
Striping Factor
[Figure: four drives, each receiving a 16-KB stripe block]
In this figure the drive array is striping 64KB of data across a four-drive array with no fault tolerance. Striping unites multiple physical drives into a single logical drive. The logical drive is arranged so blocks of data are written alternately across all physical drives in the logical array. The number of sectors per block is referred to as the striping factor. Depending on the array controller in use, the striping factor can be modified with the Array Configuration Utility.

Any change to the logical volume geometry (such as striping factor, volume size, or RAID level) may be data destructive. Changes such as these require a complete backup before, and a restore after, the modifications are made.

Example
If the striping factor is 32, intelligence in the array controller writes 32 sectors to one physical drive and 32 sectors to the next physical drive in the array. Cycling continues through the drives until the write is complete. Since a sector is 512 bytes, a striping factor of 32 is equivalent to a stripe block of 16 KB.

Limitations
Data striping is faster than conventional file writing to a single disk; however, there is no fault protection should a drive fail. In the above illustration, if disk 1 should fail, the entire file could not be retrieved, nor could additional information be written to the drives. As more drives are added to the array, the potential for drive failure rises. For example, calculating the Mean Time Between Failure (MTBF) for a physical disk and then for a RAID 0 implementation yields interesting results.
If the MTBF of a single drive is 200,000 hours, the MTBF of an array with five similar drives is figured as 200,000 divided by 5, for a total array MTBF of 40,000 hours. The number drops simply because there are more physical spindles that are subject to failure. Therefore, a RAID 0 implementation is not suited for fault-tolerant environments.
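The striping example and the MTBF estimate above can both be worked through in a few lines. This is a minimal sketch under stated assumptions: the function names are illustrative, the round-robin mapping is the simple scheme the text describes rather than the firmware of any particular controller, and the MTBF formula is the same rough division the guide uses.

```python
SECTOR_BYTES = 512  # sector size stated in the text


def stripe_block_bytes(striping_factor):
    """Size of one stripe block for a striping factor given in sectors."""
    return striping_factor * SECTOR_BYTES


def locate_sector(logical_sector, drives, striping_factor):
    """Map a logical sector to (drive index, sector offset on that drive).

    Blocks of `striping_factor` sectors are written to the drives in turn.
    """
    block = logical_sector // striping_factor
    drive = block % drives
    blocks_before_on_drive = block // drives
    offset_in_block = logical_sector % striping_factor
    return drive, blocks_before_on_drive * striping_factor + offset_in_block


def raid0_mtbf_hours(drive_mtbf_hours, drives):
    """Rough array MTBF: single-drive MTBF divided by the number of drives."""
    return drive_mtbf_hours / drives


if __name__ == "__main__":
    print(stripe_block_bytes(32))        # 16384 bytes, i.e. 16 KB
    print(locate_sector(32, 4, 32))      # first sector of the second block
    print(raid0_mtbf_hours(200_000, 5))  # 40000.0 hours
```

With a striping factor of 32 on four drives, logical sectors 0 to 31 land on the first drive, 32 to 63 on the second, and sector 128 wraps back to the first drive.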
[Figure: RAID 1, data on Disk 0 mirrored to Disk 1]
Limitations
Although RAID 1 is a viable fault-tolerant solution, it is an expensive solution in that it requires twice as much drive storage (only 50 percent of the total disk space is available for data storage).
NOTE: The HP implementation of drive mirroring is done with hardware. Drive mirroring can also be implemented in software at the operating system level. However, note that software mirroring adds additional overhead to the CPU and is often less efficient than hardware mirroring.
Limitations
The biggest limitation of RAID 4 is the time required to encode the parity information and then access a single dedicated parity drive to store the information. While it does provide fault tolerance, it does require a dedicated parity drive.
File striped across multiple disks, parity sum also written across multiple disks
RAID 5 is best suited for I/O-intensive applications and transaction processing, thus making it an ideal solution for high-performance fault-tolerant servers. The biggest limitation of RAID 5 is the increased read time after a drive failure. Regardless of which disk fails, data has to be recalculated on each read from the remaining disks. RAID 5 has the same drive requirements as RAID 4, except that the space used for parity is distributed across all the drives in the volume.
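The recalculation described above can be illustrated with a toy parity example. The guide only speaks of a "parity sum"; the sketch below assumes bytewise XOR, which is the usual parity function for RAID 5, and the function name and sample blocks are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    # Three data blocks from one stripe (hypothetical contents).
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(data)

    # Simulate losing the second drive: XOR the surviving data blocks
    # with the parity block to recompute the missing block.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])
```

Every read of a block on the failed drive needs this extra pass over the surviving drives, which is why read time increases in a degraded array.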
RAID 1+0 is a combination of striping and mirroring data. RAID 1+0 writes data across the drives in the same fashion as RAID 0, and achieves redundancy by mirroring data similar to RAID 1. Unlike RAID 1, the data disks are also the mirror disks. RAID 1+0 mirrors data back onto the data disks rotated by one drive. An odd number of drives can be used in a RAID 1+0 configuration, whereas RAID 1 requires an even number of drives. You can continue to access data in a RAID 1+0 configuration with a single drive failure or multiple drive failures. As long as 1 drive of each mirrored pair is functioning, the set will function.
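As a rough comparison of the RAID levels covered in this module, the sketch below computes how many drives' worth of capacity remain for data and checks the RAID 1+0 survival rule quoted above. The function names are illustrative. The capacity figures follow the text (50 percent for mirroring, one drive's worth of parity for RAID 4 and RAID 5, two sets of parity for ADG). The survival check assumes drives are grouped into fixed mirrored pairs and does not model the rotated layout mentioned for odd drive counts.

```python
def usable_drive_count(level, drives):
    """Drives' worth of capacity available for data at a given RAID level."""
    if level == "RAID 0":
        return drives
    if level in ("RAID 1", "RAID 1+0"):
        return drives / 2          # half the space holds the mirror copy
    if level in ("RAID 4", "RAID 5"):
        return drives - 1          # one drive's worth of parity
    if level == "RAID ADG":
        return drives - 2          # two sets of parity
    raise ValueError(f"unknown RAID level: {level}")


def mirrored_set_survives(mirrored_pairs, failed_drives):
    """True if every mirrored pair still has at least one working drive."""
    failed = set(failed_drives)
    return all(any(drive not in failed for drive in pair) for pair in mirrored_pairs)


if __name__ == "__main__":
    for level in ("RAID 0", "RAID 1+0", "RAID 5", "RAID ADG"):
        print(level, usable_drive_count(level, 8))
    pairs = [(0, 1), (2, 3)]
    print(mirrored_set_survives(pairs, {0, 2}))  # one drive lost from each pair
    print(mirrored_set_survives(pairs, {0, 1}))  # both drives of one pair lost
```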
Inter-generation data compatibility for ease of migration
Standard configuration and management tools across product line
Automatic data transfer from a failed drive to an online spare
Redundant ROM protection against firmware image corruption
Pre-failure notification of impending hard disk failure
Capacity expansion allows the addition of drives to an existing array
Volume extension increases the space on an existing logical drive
RAID migration allows online reconfiguration to a new level of fault tolerance
Stripe size migration to tune performance
Modular, easy-to-upgrade design lets you optimize performance as needed, from 64-MB to 128-MB battery-backed cache.
Battery-backed cache protects cached data in the event of a power outage, server failure or controller failure, and redundant, replaceable batteries take that protection even further.
Ultra320 SCSI technology delivers high performance and data bandwidth up to 320 MB/s per channel.
Dual channels (SA-642 only) provide up to 1.5TB of storage with 20 drives.
Mix-and-match LVD SCSI compatibility protects your investments and lets you deploy drives as needed.
64-bit, 133 MHz PCI-X interface boosts bandwidth above 1GB/s burst transfer rate using the PCI-X bus.
64-bit memory addressing supports servers with greater than 4 GB of memory.
Online management features: capacity expansion, RAID level migration, stripe size migration, online spares (global), user selectable read/write cache and user selectable expand and rebuild priority.
Hot plug tape support (AIT100/200, 50/100, 35/70; DAT 20/40)
Multiple logical drives per array
S.M.A.R.T. support (Drive Pre-Failure Warranty)
Auto-Reliability Monitoring (ARM)
Dynamic Sector Repair
Background Parity Initialization
Modular, easy-to-upgrade design lets you optimize performance as needed, from 128-MB to 256-MB battery-backed cache.
High-performance, fifth-generation architecture offers a new hardware RAID engine and a new memory architecture for increased performance over previous controllers.
Ultra3 SCSI technology delivers high performance and data bandwidth up to 160 MB/s per channel.
Dual channels provide the ability to support up to 2TB with 28 drives.
Mix-and-match LVD SCSI compatibility protects your investments and lets you deploy drives as needed.
Battery-backed cache protects cached data in the event of a power outage, server failure or controller failure, and redundant, replaceable batteries take that protection even further.
64-bit, 133 MHz PCI-X interface boosts bandwidth above 1GB/s burst transfer rate over the PCI-X bus.
64-bit memory addressing supports servers with greater than 4 GB of memory.
Online management features: capacity expansion, RAID level migration, stripe size migration, online spares (global), user selectable read/write cache and user selectable expand and rebuild priority.
Note: The Smart Array 5312 is basically a PCI-X version of the SMART 5302. The 5312, however, does not support the ADAM module (required for RAID ADG) and cannot be upgraded to four ports or with a SAN module.
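The PCI and PCI-X transfer rates quoted for these controllers can be approximated as bus width multiplied by clock rate. The sketch below is illustrative only: the function name is hypothetical, it assumes one transfer per clock cycle, and it assumes nominal clocks of 66.66 MHz and 133.33 MHz for the buses this guide labels 66 MHz and 133 MHz. The Wide Ultra2 SCSI figures (16-bit, 40 MHz) are the ones given elsewhere in this module.

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak rate of a parallel bus in MB/s.

    Assumes one transfer per clock cycle (an assumption of this sketch).
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz


# Nominal clock values are assumptions; the guide rounds them to 66 and 133 MHz.
pci_64_66 = peak_bandwidth_mb_s(64, 66.66)
pcix_64_133 = peak_bandwidth_mb_s(64, 133.33)
wide_ultra2 = peak_bandwidth_mb_s(16, 40)

print(f"64-bit, 66 MHz PCI:     {pci_64_66:.0f} MB/s")
print(f"64-bit, 133 MHz PCI-X:  {pcix_64_133:.0f} MB/s")
print(f"Wide Ultra2 SCSI:       {wide_ultra2:.0f} MB/s")
```

These are peak burst figures; sustained throughput on a real server is lower.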
A new hardware RAID engine and new performance memory architecture to significantly improve performance over previous controllers. Modular design allows optimizing performance and increasing capacity from two to four channels with 32, 64, 128, or 256 MB battery backed cache. RAID ADG (Advanced Data Guarding) delivers high fault tolerance similar to RAID 1 while keeping capacity utilization high like RAID 5.
This feature protects data from multiple drive failures while requiring only the capacity of two drives to store parity information. This higher level of protection is ideal where large logical volumes are required. RAID ADG can withstand two simultaneous hard drive failures without data loss or downtime, twice as many as RAID 5. RAID ADG is a standard feature of the Smart Array 5304/128 and an option for the SA-5302/64 and SA-5302/32 models. RAID ADG requires a minimum of 64 MB of battery-backed cache.
Ultra3 SCSI delivers up to 160 MB/s per channel bandwidth and up to 4 channels provides the highest storage capacity per PCI slot in the industry. Mix-and-match LVD SCSI compatibility protects the investments and allows for drives to be deployed as needed. Battery-backed cache protects cached data during power outages, server failure or controller failure, and uses redundant batteries. A 64 bit, 66 MHz PCI interface boosts bandwidth up to a 533 MB/s total transfer rate. 64-bit memory addressing supports servers with greater than 4 GB of memory. Online management features: capacity expansion, RAID level migration, stripe size migration, online spares (global), user selectable read/write cache and user selectable expand and rebuild priority.
Recovery ROM
Upgradeable firmware: 2 MB flashable ROM
Support for HP universal hot-plug tape drives
SAN Access, the industry's first integrated SCSI controller and Fibre Channel SAN adapter, offers:
Centralized, consolidated backup solutions
Incremental SAN-based primary storage for groups of 5 to 10 servers driving direct-attached storage (SCSI) with Smart Array 5302 controllers
Notes:
1. The Smart Array 5304 supports RAID ADG as standard. ADG resembles RAID 5 but requires two parity drives and can handle the simultaneous failure of two drives without data loss. The cache module uses AECC technology, which can handle the failure of a single memory chip without causing data loss or system interruption.
2. The SMART 5302 can be upgraded to a 5304 by adding a 2-channel Ultra3 module. The SMART 5302 controller requires a hardware upgrade to enable RAID ADG: a minimum of 64 MB of cache and an ADAM module (ADAM is the ADG Activation Module). The batteries of the cache module are redundant. The cache module uses AECC technology, which can handle the failure of a single memory chip without causing data loss or system interruption.
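The capacity trade-off between RAID 5 and RAID ADG described in this section can be worked through with a short calculation. This is a minimal sketch, not an HP tool: the function and table names are hypothetical, and the 14-drive, 72.8 GB array is an arbitrary example. It relies only on the facts stated in the text, namely that RAID 5 uses the capacity of one drive for parity and RAID ADG uses the capacity of two.

```python
# Number of drives' worth of capacity consumed by parity, per the text above.
PARITY_DRIVES = {"RAID 5": 1, "RAID ADG": 2}


def usable_capacity_gb(level: str, drive_count: int, drive_gb: float) -> float:
    """Usable capacity of an array after parity overhead is subtracted."""
    parity = PARITY_DRIVES[level]
    if drive_count <= parity:
        raise ValueError(f"{level} needs more than {parity} drive(s)")
    return (drive_count - parity) * drive_gb


def survives(level: str, failed_drives: int) -> bool:
    """True if the array keeps its data after this many simultaneous failures."""
    return failed_drives <= PARITY_DRIVES[level]


# Hypothetical example array: 14 drives of 72.8 GB each.
for level in PARITY_DRIVES:
    capacity = usable_capacity_gb(level, 14, 72.8)
    print(f"{level}: {capacity:.1f} GB usable, "
          f"survives two failures: {survives(level, 2)}")
```

In this example RAID ADG gives up one additional drive of capacity compared with RAID 5 in exchange for tolerating a second simultaneous drive failure.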
Compatibility with all Ultra2 and Ultra3 LVD family products
Recovery ROM protects against ROM corruption
Ultra3 SCSI technology delivers data bandwidth of up to 160 MB/s per channel
Mix-and-match LVD SCSI compatibility protects your investments and lets you deploy drives as needed
Dual SCSI channels allow for up to 2 TB of storage per server slot
Software consistency among all Smart Array family products: Array Configuration Utility XE (ACU-XE), Array Configuration Utility (ACU), Insight Manager (IM), Array Diagnostic Utility (ADU), and SmartStart
64-bit, 66 MHz PCI interface boosts bandwidth up to a 533 MB/s total transfer rate
64-bit memory addressing supports servers with more than 4 GB of memory
3.3-volt slot support only (provides the latest in low-voltage, 64-bit support)
32 MB of memory optimizes performance and data throughput
Pre-Failure Warranty support for hard disk drives (requires Insight Manager)
Notes:
1. If a SMART 221 or 2SL is upgraded with a SMART 532, some features (RAID level migration, drive expansion, stripe set migration) cannot be used. Enabling these features requires deleting the existing arrays and creating new arrays.
2. The SMART 532 does NOT support all single-ended SCSI devices. The ProLiant Storage enclosures F1, F2, U1, U2 and UE are not supported by the SMART 532.
Increased performance over the Integrated Smart Array Controller
Transportable, hardware-based Wide Ultra3 SCSI RAID controller with 32 MB of memory (64 MB on the Smart Array 5i Plus)
More robust and easier to use than software-based RAID
Compatibility with all Ultra2 and Ultra3 LVD family products and a seamless upgrade to next-generation Ultra3 Smart Array controllers
32-bit architecture
Dual SCSI channels (one internal and one external)
Universal hot-plug tape drive support and native SCSI pass-through for tape backup
Recovery ROM protects against ROM image corruption
Smart Array 5i vs. Integrated Smart Array controller (ROC) The main advantages of Smart Array 5i Controller over ROC include:
Dual Ultra3 SCSI channels (1 internal and 1 external) 32 MB of cache memory used for code, transfer buffers, and non-battery backed read cache Recovery ROM to protect against a ROM image corruption
Note: Because the Integrated Smart Array Controller is embedded on the system board and cannot be removed, you cannot upgrade it to the Smart Array 5i Controller, but you can move any hard drives to a Smart Array 5i controller.
High performance - 320 MB/s parallel SCSI bandwidth High capacity - 4 SCSI channels with (14) 18.2-GB drives per channel externally = 1 TB of external storage.
Note: Requires the use of the StorageWorks Enclosure Model 4214R or 4214T products.
2. I/O architecture benchmarked at up to 4x the current I/Os per second of the award-winning SA-3200 controller. This architecture includes:
Super-scalar RISC processor for greater processing power Split memory architecture for greater internal bandwidth 64-bit PCI standard for greater system bus bandwidth
Feature summary
4 external SCSI ports; 2 internal SCSI ports
Over 1 TB of external storage per server slot supported
64 MB ECC-protected, battery-backed, removable cache daughter board for up to 4 days of data protection
Note: 56 MB useable for Read/Write cache, 8 MB used for transfer buffer and scripts memory.
Ultra2 SCSI: up to 320 MB/s total SCSI bandwidth
Performance architecture
64-bit PCI design
Data compatible with previous Smart Array family products, for a seamless upgrade from previous generations of Smart Array family controllers
Online management features: capacity expansion, RAID level migration, stripe size migration, online spares (global), user-selectable read/write cache, and user-selectable expand and rebuild priority
Note: The SMART 4200 is basically a SMART 3200 with two additional SCSI channels. As a result of using 64-bit PCI technology and a 64-bit RISC processor the SMART 4200 has a higher performance than the SMART 3200.
Compatibility with all Wide Ultra2 and Wide Ultra3 LVD family products and a seamless upgrade to next-generation Wide Ultra3 Smart Array controllers
Software consistency with all Smart Array family products, including ACU-XE, ACU, IM, ADU, and SmartStart
Wide Ultra3 SCSI with up to 160 MB/s SCSI bandwidth
64-bit PCI bus design
Up to 254.8 GB of storage per server slot
16 MB of controller cache is used to create a high-performance engine and optimize data throughput
16 MB of DRAM used for code, transfer buffers, and non-battery-backed read cache
Notes:
1. To utilize capacity expansion and RAID level migration features when upgrading or migrating from previous-generation Smart Array controllers, the customer must perform a backup of the existing data and recreate logical drives using the Smart Array 431 controller.
2. The SMART 431 does NOT support all single-ended SCSI devices (Wide Ultra and Fast Wide). The ProLiant Storage enclosures F1, F2, U1, U2 and UE are not supported by the SMART 431.
3. If a SMART 221 or 2SL is upgraded with a SMART 431, some features (RAID level migration, drive expansion, stripe set migration) cannot be used. Enabling these features requires deleting the existing arrays and creating new arrays.
Better performance and ease of use than software-based RAID
Standard on ProLiant DL360, DL380, DL580, ProLiant 8500, and ProLiant 8500 Data Center Solution servers; available as an optional upgrade for the ProLiant ML370 and ML570 servers
Embedded hardware RAID controller
Wide Ultra2 SCSI
32-bit architecture
Fault-tolerant RAID supported for all internal hard disk drives
Dual-channel Wide Ultra2 SCSI performance, 80 MB/s per channel:
Channel 1: internal disk drive cage
Channel 2: external, for tape drive support including Hot Plug tape options
Supports a maximum of four 1" Wide Ultra2/Ultra3 SCSI Hot Plug drives
8 MB read cache (write cache is not available due to lack of battery backup)
Support for RAID 0, 1+0, and 5 from Channel 1 only; Channel 2 supports standard SCSI operation only
ROC does not support tape libraries or tape autoloaders from either channel
Only external tape devices are supported on Channel 2; internal tape devices are not supported
Channel 2 external connector is VHDCI
LVD interface required by Wide Ultra2/Ultra3 SCSI protocol
Supports migration to higher-performance PCI Smart Array controllers
Backward compatible with Fast, Fast-Wide SCSI, and Wide Ultra SCSI-3 devices
Note: The Integrated SMART controller converts the on-board SCSI to RAID protected SCSI. It does not add extra channels. If the RAID features of the Integrated SMART controller are not used, the controller should be removed.
RAID LC2
The HP RAID LC2 Controller is a single channel PCI RAID controller targeted at entry-level and workgroup servers that need hardware RAID. Qualified on the HP ProLiant ML330, ML350 and the DL320 servers, the RAID LC2 Controller offers entry-level RAID functionality. With support for Wide Ultra2 SCSI and up to six internal disk drives, the RAID LC2 Controller provides data compatibility and an upgrade path to all HP Smart Array controllers.
Compatibility with all Wide Ultra2, Wide Ultra3, Ultra320 LVD disk drives (Ultra3 and Ultra 320 attached drives will transfer data at a maximum of 80 MB/s) Seamless Upgrade to all HP Smart Array controllers Wide Ultra2 SCSI: Up to 80 MB/s SCSI bandwidth Single Internal Channel 8-MB Read Cache 32-bit PCI Bus Design Up to 218.4 GB of storage using six 36.4 GB internal disk drives
Note: The ProLiant ML310, ProLiant ML330, and ProLiant DL320 servers use the 36.4 GB Wide Ultra3 or Ultra320 drives
Note: The ProLiant ML330 G2, ProLiant ML350, ProLiant ML350 G2, and ProLiant ML350 G3 servers use the 146.8 GB Ultra320 drives
Auto Rebuild feature
Online Spare support
Pre-Failure Warranty support for hard disk drives (requires that Insight Manager be installed)
Notes:
1. The RAID LC2 controller is a single channel PCI RAID controller targeted at entry-level and workgroup servers that need low cost hardware RAID. The RAID LC2 controller is data compatible with other Smart Array controllers and can be upgraded without data loss.
2. The RAID LC2 controller does not support ACU and ACU XE and must be configured before the SmartStart CD is used to install the server.
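The maximum-capacity figures quoted for the controllers in this module are simply drive count multiplied by drive size. The sketch below reproduces three of them as a cross-check; the function and variable names are hypothetical, and the drive counts and sizes are the ones stated in the text (six 36.4 GB drives for the RAID LC2, twenty-one 72.8 GB drives for the SA-4250ES, and four channels of fourteen 18.2 GB drives for the Smart Array 4200).

```python
def raw_capacity_gb(drive_count: int, drive_gb: float) -> float:
    """Raw (pre-RAID) capacity in GB: number of drives times drive size."""
    return drive_count * drive_gb


raid_lc2 = raw_capacity_gb(6, 36.4)                # quoted as 218.4 GB
sa_4250es = raw_capacity_gb(21, 72.8)              # quoted as 1.528 TB
sa_4200_external = raw_capacity_gb(4 * 14, 18.2)   # quoted as about 1 TB

print(f"RAID LC2:            {raid_lc2:.1f} GB")
print(f"SA-4250ES:           {sa_4250es:.1f} GB")
print(f"SA-4200 (external):  {sa_4200_external:.1f} GB")
```

These are raw capacities; the usable capacity of a fault-tolerant logical drive is lower because part of the space holds mirror or parity data.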
Supports Wide-Ultra2 SCSI, a 16-bit, 40 MHz bus with a data transfer rate of 80 MB/s Has two channels with support for up to 30 drives (15 per channel) Supports two external Wide-Ultra2 SCSI connections or can be custom configured for internal or external connections using daughter boards Has a removable Array Accelerator battery-backed 64 MB read/write cache board with ECC (Error Checking and Correcting) memory* Has read ahead caching Supports hot-plug PCI Allows multiple logical drives per drive array Supports RAID 0, 1+0 (also called RAID 10), 1, 4, and 5 fault tolerance options Supports Wide-Ultra2 SCSI, Wide-Ultra SCSI-3, Fast-Wide SCSI-2, and Fast SCSI-2 hard drives Allows performance monitoring through Insight Manager Is available in 32 bit PCI Bus Master interface
Notes: 1. If all cabling is external, remove the daughterboard. 2. Upgrading to firmware 4.44 or higher will reduce the number of array controller failures.
Record-setting RISC processor architecture enhances controller performance
Hot-plug, cable-free design provides greater online availability in redundant controller environments
Data compatibility with all previous Smart Array controllers for simpler, more cost-effective upgrades
Processing tasks are divided between two engines. One generates fault tolerance information and manages data flow, while the other prepares and sorts array storage commands.
Three SCSI channels provide up to 1.528 TB of high-availability fault-tolerant storage using 72.8 GB Wide Ultra3 hard drives.
64 MB ECC-protected and battery-backed removable read/write cache module maximizes I/O performance without sacrificing data integrity.
Redundant controller boards enable you to hot-plug a redundant SA-4250ES controller without bringing down the server.
Maximum internal capacity for the ProLiant 8000 server is twenty-one 1 in (2.54 cm) 72.8 GB Wide Ultra3 drives, resulting in 1.528 TB of storage.
Notes:
1. The SMART 3100ES and 4250ES are specifically designed for the internal drive cages of a ProLiant 6000, 7000, 8000 and ML750. Drive arrays can be spanned over two or three drive cages.
2. No cables are attached to the controller. All SCSI signals are routed through a special ES (extended SCSI) PCI slot. The SMART 4250ES and 3100ES controllers have no support for external drives. Data is compatible with other SMART controller models. Two SMART 4250ES or two 3100ES controllers can be set up as a redundant hot-pluggable pair.
SCSI Channel 2
SMART-2/E
The internal card top connector provides connection to SCSI Channel 1. The external card edge connector provides connection to SCSI Channel 2. To connect an external storage system to Channel 1, use the provided cable and punch-out block to pass the internal Fast-Wide bus to an external connection point. A slot cover pass-out point can also be used for systems that lack the punch-out block provision.
The SMART-2SL has two internal connectors (a 50-pin for Fast-SCSI-2 devices and a 68-pin for Fast-Wide or Wide Ultra devices) and one external connector (68-pin). Because the SMART-2SL is a single-channel controller, only one of these three connectors may be used at any given time.
It is possible to upgrade from a SMART-2SL to a SMART-2/P, /E, or /DH array controller. However, you may encounter a configuration error when moving drives from the SMART-2SL to the external channel of one of the dual-channel controllers. To allow movement of drives from port 1 to port 2 between controllers, make sure the new controller firmware is revision 1.78 or later.
Although the SMART-2/E has the same capabilities as the SMART-2/P, it does not include the Symbios Logic 875 chipset and does not support Wide-Ultra transfer rates.
SMART-2 array controllers with a firmware update support up to 15 drives.
The PCI controllers in the SMART-2 family are bridged controllers and will cause the renumbering of secondary PCI buses when installed on the primary bus in select servers.
WARNING: After adding one of these controllers, it is important to check the configuration of affected controllers on the subsequent buses.
The SMART-2/P, SMART-2SL, and the SMART-2DH support Wide-Ultra transfer rates internally only. HP only guarantees Fast-Wide-SCSI-2 transfer rates on external drives connected to these controllers. Wide-Ultra transfer rates are supported when these controllers are connected to a ProLiant/U Storage System.
Both green drive lights on hot-pluggable drives attached to the SMART-2 family of array controllers may illuminate periodically while the server is idle. In most cases, this is a normal condition and indicates that the controller is performing a test on the drives called Dynamic Sector Repair. This test runs in the background only while the server is idle and does not necessarily mean that there are problems with a drive; it is only a test.
The expansion card on a SMART-2 controller (none on 2SL or 221) contains the battery-backed cache. In cases where the server fails, data is kept in the onboard cache for up to four days. It can be written to disk upon restoration of the unit within that period of time. The expansion card can also be transferred to a new controller in cases where the controller itself fails.
The SMART-2 family of controllers is configured with the Array Configuration Utility. With the SMART-2, several configuration limitations of the SMART were overcome, allowing addition of drives, array expansion, and drive configuration without closing the operating system (online). Online array expansion is not supported with the SMART-2SL.
The HP SMART SCSI Array Controller and all models of the SMART-2 SCSI Array Controller support SCSI hard drives only. Connecting any other SCSI device to these controllers will not work and may permanently damage the controller.
During periods of inactivity, drives attached to an array controller will run Dynamic Sector Repair (DSR). This is normal activity. Make sure to check controller order any time a controller is added or hardware changes are made to the controller or server. Any controller containing bootable drives must be first in the controller order. The logical geometry of the drives is determined by the operating system selected. Previously stored data will be lost if the operating system setting is changed. To change operating systems, the data must be backed up, and then reinstalled. Any changes to logical volume size or RAID level may be data destructive. There is no way to recover from data loss in these situations. All SMART-2 controller families support up to four online spares, except the single channel controllers, which support only two online spares. All HP array controllers are bus mastering devices. Adding an array controller can free up system resources for other activities as well as increase disk read-write performance. SCSI IDs are not displayed at POST when the SCSI devices are connected to an array controller.
On-Line Spare
Hot-plug drive support enhances the capability of the On-Line Spare drive, because the failed drive may be replaced while the computer is still running and the array can return to its original configuration. The On-Line Spare is an effective way to return to a fault-tolerant condition after a drive fails. An On-Line Spare is a redundant physical drive that takes the place of any drive that may fail in a hardware fault-tolerant logical volume. An array controller supporting On-Line Spare drives can not only detect unrecoverable drive errors but also initiate a background rebuild to the On-Line Spare. The entire process is managed by the processor on the array controller and is independent of the operating system.
Mirrored Pair
Once an On-Line Spare is automatically activated and data from the failed drive rebuilt, the failed drive may be replaced. Following replacement, the data on the spare is spooled onto the new drive and the spare is again available to failed volumes. The On-Line Spare is required to be equal or larger in size than the drive it is replacing. However, once configured as an On-Line Spare, it may become a replacement for any fault-tolerant logical volume on the array controller. Rebuild times vary depending on overall array controller activity. With minimal server activity a 1GB drive takes approximately 10 to 20 minutes to rebuild on the SMART Array Controller.
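The two rules of thumb above (the spare must be at least as large as the drive it replaces, and a 1 GB drive rebuilds in roughly 10 to 20 minutes under minimal load) can be turned into a rough planning estimate. The sketch below is illustrative only: the function names are hypothetical, and it assumes that rebuild time scales linearly with drive size, which the text does not state. Actual rebuild times depend on controller model, RAID level, and server activity.

```python
from typing import Tuple


def spare_can_replace(spare_gb: float, drive_gb: float) -> bool:
    """An On-Line Spare must be equal to or larger than the drive it replaces."""
    return spare_gb >= drive_gb


def rebuild_minutes_range(
    drive_gb: float, minutes_per_gb: Tuple[float, float] = (10, 20)
) -> Tuple[float, float]:
    """Rough (low, high) rebuild estimate in minutes.

    Assumes linear scaling from the 10 to 20 minutes per 1 GB figure quoted
    for a lightly loaded server; this scaling is an assumption of the sketch.
    """
    low, high = minutes_per_gb
    return drive_gb * low, drive_gb * high


# Hypothetical example: a 9.1 GB drive protected by an 18.2 GB spare.
low, high = rebuild_minutes_range(9.1)
print(f"Spare eligible: {spare_can_replace(18.2, 9.1)}")
print(f"Estimated rebuild: {low:.0f} to {high:.0f} minutes")
```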
Option ROM Configuration for Arrays (ORCA) is an off-line, ROM-based configuration utility that runs independently of the operating system. ORCA can be started during the boot process and uses a menu-driven interface for minimal configuration needs by experienced users. ORCA is accessible by pressing F8 after system POST. It allows the user to create and delete logical drives and to set the boot controller order. ORCA does not support drive expansion, RAID level migration, or stripe size migration. Smart Array controllers with ORCA support include:
All embedded RAID controllers (ROC, 5i, 5i Plus, future products)
All fifth-generation Smart Array controllers (532, 5302, 5304, 5312)
All future Smart Array generations
Array Configuration Utility (ACU) is a configuration utility that can be run or installed from SmartStart 5.5 or earlier. ACU has a graphical interface for extensive configuration needs, and wizards are available to support novice users. ACU can be started from within the OS, from the SmartStart CD, or from a bootable diskette. Under Windows 2000, Windows NT, and Novell NetWare this utility can be started online. The server does not have to be powered down when disks are configured.
Array Configuration Utility XE (ACU-XE) is a browser-based utility that has both wizard-based assistance and different operating modes for different skill levels or faster configuration.
ACU-XE combines the power of the Internet and the features of ACU to provide local or remote, web-based array configuration and management. ACU-XE has an easy-to-use browser-based interface and allows you to manage all Smart Array controllers as well as StorageWorks RA4100, RA4000, and MSA1000 enclosures from one central location. ACU-XE is also shipped with SmartStart 6.x.
Learning Check
1. Although RAID 5 can handle up to 14 drives, HP recommends considering RAID ADG when the number of drives exceeds eight.
True
False
2. Hot plug hard drives support which of the following?
a. Replacement of a failed drive in a fault tolerant array
b. Addition of drives and arrays
c. Expansion of arrays
d. Replacement of an array controller while the machine is on-line
3. Which of the following are features of the Smart Array 6404?
a. 32-bit array controller
b. Supports up to 56 drives
c. 133 MHz PCI-X
d. 256 MB cache
4. Which of the following statements about RAID are true?
a. RAID 1+0 is the least expensive fault tolerant RAID method
b. RAID 5 stores parity across all drives in the array
c. RAID 5 is a more expensive solution than RAID 1+0
d. RAID ADG performance exceeds that of RAID 5
5. HP Smart Array Controllers provide a software level RAID solution.
True
False
6. Ultra320 SCSI protocol is backward compatible with Ultra2 and Ultra3 drives.
True
False
7. Which of the following Smart Array controllers support RAID ADG?
a. Smart Array 6404
b. Smart Array 5312
c. Smart Array 641
d. Smart Array 5304
8. The ability to recover fully from a drive failure without taking the server offline requires the use of hot-plug drives with an array controller.
True
False
9. If a customer has a failed drive in a RAID 5 set and another drive in pre-failure mode, which should be replaced first?
10. If a customer has two failed drives in a RAID 5 set, what should you do?
Introduction
This module gives an overview of various tools and utilities that can aid in servicing HP products. Topics include:
SmartStart
System Erase
ROM-Based Setup Utility (RBSU)
Obtaining current device drivers
Objectives
To use HP tools and utilities, service engineers should be able to:
List the functions performed by SmartStart
Describe key differences between SmartStart 5.x and 6.x
Locate and install a ProLiant Support Pack (PSP)
Compare system erase for SmartStart 5.x and 6.x
Access and use the ROM-Based Setup Utility (RBSU)
List at least three sources of HP device drivers
SmartStart
Server Integration Tool
SmartStart is a set of server integration tools and utilities that optimizes platform configuration and simplifies setup of servers. It also provides functionality for integrating operating system installations on ProLiant servers to achieve optimum reliability and performance. Intelligent Manageability features extend the benefits of SmartStart and facilitate consistency and reliability of server deployment and ongoing system maintenance.
Server Configuration and Installation
SmartStart provides intelligent server configuration and software installation and tuning assistance through a graphical tool, ensuring a streamlined, optimized, and reliable setup of ProLiant servers. This tool provides navigation and a summary screen that tracks details of how the system will be configured. The walk-through graphical interface guides the user through every step of the configuration process, providing ease of use and confidence that the system is configured properly.
Diagnostics and Drivers
SmartStart includes the suite of ProLiant server software, from diagnostics to drivers, and supports the integration of "off-the-shelf" versions of leading operating system software. SmartStart for Servers is shipped standard with every ProLiant server. You can stay up to date with SmartStart releases through one of the subscription services.
SmartStart Functions
SmartStart performs the following functions:
Automatic Hardware Detection
SmartStart automatically detects and configures ProLiant hardware appropriately for the selected software and displays a summary of the configuration and selected parameters to review before any of the software is installed.
Drive Array Configuration
SmartStart configures physical and logical drive volumes and advanced RAID options.
Assisted Operating System Integration
The SmartStart assisted install tunes the configuration precisely for the host ProLiant platform and performs the software installation without any further user intervention when the appropriate CD is inserted.
Utilities
SmartStart automatically installs and configures Insight Management Agents. Insight Manager can then be installed on the management workstation directly from the Management CD.
ProLiant Support Packs (PSPs)
ProLiant Support Packs (PSPs) allow you to manually install or upgrade drivers and utilities from Windows NT, Windows 2000, Windows 2003, or NetWare.
SmartStart Setup
When using SmartStart 5.x to set up a server, there are three installation paths to choose from:
Assisted Integration Path
Replicated Install Path
Manual Configuration Path
Assisted Integration
The Assisted Integration path provides the full hardware and software integration benefits of SmartStart. This path guides the user through the collection of information needed for configuring the hardware and installing the system software, providing validation, online help, and recommended defaults along the way. A summary is available at any time to review the installation settings and is saved for later reference. A server profile diskette is required for Assisted Integration. To create a server profile diskette, create an empty SPD.ini file using Notepad or the edit utility at the command prompt.
Replicated Install
In SmartStart 5.x, the Replicated Install path allows the user to replicate saved operating system configurations across multiple servers. Replicated install captures and saves parameters during the installation of supported software. The configuration information is then saved into "profiles". These profiles can be used over and over to accelerate the installation of software. By using replicated install, users save time and gain a consistent way to deploy NT across the enterprise. SmartStart 6.x does not include a method to perform replicated installations. The SmartStart 6.x deployment process for ProLiant servers configured with RBSU is faster and the interview questions have been streamlined. Performing attended replications for a small number of servers at one time does not require a complicated replicated installation path.
Manual Configuration
The Manual Configuration path allows the user to run the System Configuration Utility manually and follow the installation procedures of the software manufacturer to install the software. However, full integration benefits are only achieved with the Assisted Integration path. This path may be used to install an operating system using CDs which are not SmartStart enabled. It may also be used for installing software from the Software Product CDs, if more flexibility with the installation settings is desired.
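The server profile diskette step described under Assisted Integration amounts to placing an empty file named SPD.ini on a diskette. The guide suggests Notepad or the edit utility; the sketch below shows an equivalent scripted approach for illustration only. The function name and the target directory are hypothetical (a temporary directory stands in for the diskette drive), and nothing here is an HP-supplied tool.

```python
import tempfile
from pathlib import Path


def create_server_profile(diskette_root: str) -> Path:
    """Create an empty SPD.ini file in the given directory and return its path."""
    profile = Path(diskette_root) / "SPD.ini"
    profile.write_bytes(b"")  # zero-length file, as the guide describes
    return profile


# A temporary directory stands in for the diskette drive (for example A:\).
with tempfile.TemporaryDirectory() as diskette_root:
    profile = create_server_profile(diskette_root)
    profile_name = profile.name
    profile_size = profile.stat().st_size
    print(f"Created {profile_name} ({profile_size} bytes)")
```

On a real system the directory passed in would be the root of the formatted diskette.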
HP Website The latest PSP deployment utilities, PSPs, and individual components for supported Microsoft Windows and Novell NetWare operating systems are always available on the HP website http://h71025.www7.hp.com/support/swdrivers/index which is accessible from any system with a Web browser and access to the Internet.
HP ActiveUpdate
The latest HP deployment utilities, PSPs, and individual components for Microsoft Windows NT 4.0, Windows 2000, and Novell NetWare are also available from HP ActiveUpdate v2.0. ActiveUpdate is a Web-based client application for Windows systems only. The ActiveUpdate client reduces the time that administrators spend searching the Web for the latest server updates by proactively delivering updates to a centralized software repository. You can obtain the ActiveUpdate client from the HP website: http://h18000.www1.hp.com/products/servers/management/activeupdate/.
NOTE: Although you can use ActiveUpdate to maintain a centralized, network-based software repository for all of the operating systems discussed in this guide, the ActiveUpdate client does not run on Novell NetWare systems. ActiveUpdate requires initial configuration on a Windows-based system.
SmartStart for Servers CD
When Web access is not available or download speeds are too slow, the PSP deployment utilities, PSPs, and individual components for Microsoft Windows NT 4.0, Windows 2000, Novell NetWare 4.2, NetWare 5.1, and NetWare 6.0 can also be obtained from the SmartStart for Servers CD 5.3 or later.
The URL for the SmartStart home page is http://h18000.www1.hp.com/products/servers/management/smartstart/index.html
SmartStart New Product Support Pages
From the SmartStart home page you can link to specific versions of SmartStart that provide the following information:
New server products supported
New option products supported
Links to current versions of configuration tools
Links to updated drivers and support software
Links to ROM updates for specific servers
Links to customer advisories
SmartStart CD Contents
SmartStart contains optimized drivers and utilities that give you maximum performance on all leading operating systems. The SmartStart 6.x CD contains:
Support Software
Microsoft Windows 2000/2003
Microsoft Windows NT 4.0
Linux
Novell NetWare
Utilities
ROM Update Utility provides customized options for updating system, option, and hard drive firmware.
Array Configuration Utility (ACU) enables you to configure newly added array controllers and associated storage devices.
Array Diagnostics Utility (ADU) performs device tests on HP array controller hardware.
Insight Diagnostics performs tests on system components and displays information about a server's hardware and software configuration.
Erase Utility provides options to clean different areas of the system: attached drives, non-attached drives, BIOS, and non-volatile RAM (NVRAM).
Management CD Contents
HP's suite of Intelligent Manageability products is an integral part of the ProLiant Essentials Foundation Pack. Visit http://h18013.www1.hp.com/products/servers/management/index.html for more information about HP Management Software. The Management CD includes:
Insight Manager 7 SP2
ActiveUpdate v2.0
Version Control Agent
Version Control Repository Manager
Survey
Management Agents for:
Subscription Service
The SmartStart subscription service provides customers 8 new releases of the SmartStart CD and the Management CD for a period of approximately one year from the date of purchase. Order by phone at 1-800-573-1099 or online at http://hp.productorder.com/smartstart/.
SmartStart Server Packs
SmartStart and Insight Manager 7 ship standard with every ProLiant server, packaged in the new ProLiant Essentials Foundation Pack.
SmartStart Request Pack
Customers who have received defective media can request a single replacement Request Pack. In the United States, SmartStart replacement CDs can be ordered by calling 1-800-OK-Compaq (1-800-652-6672). In other countries, customers should contact a Compaq Authorized Supplier or local HP Services Center for Request Pack availability and ordering information.
ML/DL G2 and G3 servers and some ML/DL G1 servers:
  RBSU
  ROM Update Utility
  Array Configuration Utility
  Array Diagnostic Utility
  Erase Utility
  Insight Diagnostics
  Survey Utility

Pre-ML/DL and most ML/DL G1 servers:
  SCU
  Replicated installation
  A requirement for a Server Profile Diskette or System Partition
WARNING: If you start a previously configured server with SmartStart and it prompts you to run the System Erase Utility, do not run the System Erase Utility unless you want to clear all existing server configuration and data. The System Erase Utility destroys all configuration information and data. The System Erase Utility completely erases all hard drives.
System Erase with SmartStart 6.x
SmartStart 6.x includes the System Erase Utility. The System Erase Utility provides options to clean different areas of the system: attached drives, non-attached drives, BIOS, and non-volatile RAM (NVRAM). Unlike previous versions of SmartStart, the System Erase Utility for SmartStart 6.3 does not erase the Smart Array controllers. For legacy systems, the Erase Utility is still available for download. To access the latest release of the Erase Utility, go to the Software and Drivers download area on www.hp.com.
RBSU is updateable and it is resident in ROM. The table below illustrates some of the feature differences between RBSU and SCU:
ROM-Based Setup Utility | System Configuration Utility
Saves changes to NVRAM as they are made | Does not save changes until the user exits
Silent conflict resolution | Displays warnings when conflicts are resolved
Embedded in system ROM; does not use disk | Disk-based; can be installed on system partition
Customized for each server, resulting in a smaller, faster utility | Comprehensive utility; one version supports all servers
Configuration oriented and table driven | Device oriented and file driven
Replication utility support with configuration info in RBST table | No direct replication utility support except through configuration backup
Utility update through RBSU ROM flash or physical ROM change | Utility update through new version of the software
Running RBSU on a 32-bit server:
1. Press the F9 key when prompted during the startup sequence.
2. Modify configuration settings as desired.
3. Exit RBSU by pressing the Escape key at the main menu.
The system must be restarted when configuration settings are changed. A confirmation to exit appears on the screen, and the current boot controller is also displayed for reference purposes. To confirm exiting RBSU, press the F10 key. The server restarts using the new configuration settings.

Running RBSU on a 64-bit server:
1. Select System Maintenance from the Boot menu.
2. Select ROM-Based Setup Utility.
3. Modify configuration settings as desired.
4. Exit RBSU by pressing the Escape key.
If you have made any changes that require the system to be restarted, a box will appear stating that the system must be restarted. Restart the server. The server powers up using the new configuration settings.

Initial Boot
On initial boot (for a system that has not yet been configured) you will be required to enter the following information:
Language and Operating System
Primary boot controller
Date and time
NOTE: To bypass this step you must insert a Diagnostics ROMpaq diskette into the floppy drive before booting the server. This would enable you to upgrade the ROM or run the SmartStart scripting tools.
This menu, located on the left-hand side of the screen, allows you to choose which configuration setting to view or modify. The choices are:
System Options
PCI Devices
Standard Boot Order (IPL) (applies only to 32-bit servers)
Boot Controller Order
Date and Time
Automatic Server Recovery (ASR)
Server Passwords
Server Asset Text (and IMD Text, which applies only to 64-bit servers)
Advanced Options
BIOS Serial Console (applies only to 32-bit servers)
Utility Language
On the right-hand side of the screen, a window displays basic information about the server. This information includes the server model, serial number, BIOS version, backup BIOS version, memory installed, and processors installed. Pressing the F1 key when any menu option is highlighted will allow you to view a description of that feature.
System Options
Following are the options available from the System Options choice on the main menu:
OS Selection
Serial Number
Embedded COM Port A
Embedded COM Port B
Embedded LPT Port
Integrated Diskette Controller
NUMLOCK Power-On State
Embedded NIC Port Pre-Boot Execution Environment (PXE) Support (applies to 32-bit servers only)
Diskette Write Control
Diskette Boot Control
Advanced Memory Protection
PCI Devices
The PCI Devices option displays the configuration settings of the PCI devices installed in the server and allows you to modify the IRQ. Multiple PCI devices can share an interrupt. To disable a device, press Enter while the device is highlighted. A menu will appear with options to change the IRQ, as well as to disable the device. If the device cannot be disabled on your system, only the IRQ options will be available.

IMPORTANT: Disabling a PCI controller on a server with the PCI hot-plug driver installed will disable all controllers on that PCI driver if the server is running Windows 2000 or Windows .NET. To avoid this issue, remove the controller instead of disabling it.

IMPORTANT: For 64-bit servers, devices can only be viewed, and no changes can be made.
Server Passwords
The Set Power-On Password option sets a password that controls access to the server during power-up. The server cannot be powered up until the correct password is entered. The power-on password is a simple character string with a maximum of seven characters. To disable or clear the password, enter the password followed by / (slash) when prompted to enter the password.

The Set Admin Password option sets a password to control access to the administrative features of the server. The admin password is a simple character string with a maximum of seven characters. To disable or clear the password, enter the password followed by / (slash) when prompted to enter the password.

The Network Server Mode option is a simple toggle setting that sets the server to operate in network server mode. This feature works in conjunction with the power-on password. When set to Disabled, the server operates normally. When it is set to Enabled, the following actions occur:
1. The local keyboard remains locked until the power-on password is entered.
2. The power-on password prompt is bypassed.
3. When a diskette is in the diskette drive, the server does not start unless the power-on password is entered locally.

NOTE: Network Server Mode cannot be enabled until the power-on password has been established.

The Quicklock option is a simple toggle setting that either enables or disables the Quicklock feature. When set to Enabled, the keyboard is locked by pressing the Ctrl+Alt+L keys. The keyboard remains locked until the power-on password is typed.

NOTE: If the power-on password is disabled at the power-on key prompt, the Quicklock feature remains inactive until the password is changed in RBSU.
Advanced Options
The Advanced Options menu includes options that allow you to configure the advanced features of the server. These include:
MPS Table Mode (applies only to 32-bit servers)
Hot-Plug Resources (applies only to 32-bit servers)
POST Speed Up (applies only to 32-bit servers)
POST F1 Prompt
Redundant ROM Selection
Erase Non-volatile Memory
Set CPU Corrected
Wake-On LAN (applies only to 32-bit servers)
Advanced Memory Protection
IDE EDD 3.0 (applies only to 64-bit servers)
NMI Debug Button (applies only to 32-bit servers)
Custom POST Message
Processor Hyper-Threading
Secondary IDE Channel Support (applies only to the ProLiant ML530 G2 server)
System Partition
SmartStart Automated Installation still creates and populates a System Partition for User Diagnostic Utilities. If a System Partition is available, the system will reboot and automatically run RBSU if you press the F10 key to enter the System Partition, and then select Configure Hardware.

Embedded System Maintenance Utilities are found in some Compaq Generation 2 and later servers and can be run by pressing the F10 key when prompted from the Power-On Self Test (POST) sequence. Systems with a System Maintenance Menu do not have an accessible System Partition, so all system utilities should be run from the System Maintenance Menu.
Device Drivers
Current device drivers are essential for proper operation and can be obtained from:
SmartStart CD (PSP)
HP Website http://www.hp.com/country/us/eng/support.html
Internet FTP site ftp.Compaq.com
Download Facility 281-518-1418, US and Canada (outside North America, contact your local Geo)
Online Services
  CompuServe (keyword GO COMPAQ)
  America Online (keyword COMPAQ)
  Prodigy (keyword COMPAQ)
Technical Support Center 1-800-OK-COMPAQ (1-800-652-6672, US and Canada); outside North America, contact your local Geo
Learning Check
1. The streamlined installation process for SmartStart 6.x has eliminated the need for which installation path found in SmartStart 5.x?
_____________________________________________________________
2. What utility provides the means of updating system, option and hard drive firmware?
_____________________________________________________________
3. What component erased by the system erase utility in SmartStart 5.x is not erased by the system erase utility in SmartStart 6.x?
_____________________________________________________________
4. List three sources of ProLiant Support Paqs (PSPs):
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
5. List four functions of the ROM-Based Setup Utility (RBSU):
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
6. The RBSU main menu is accessed by pressing which function key during the system boot process?
_____________________________________________________________
7.
8. What RBSU main menu choice includes the options of erasing non-volatile memory and setting advanced memory protection?
_____________________________________________________________
HP Troubleshooting Methodology
Module 8
Introduction
The high degree of interaction between the system, options hardware, operating system, and software can make it difficult to isolate the root cause of a problem. Intermittent problems and problems generated by multiple subsystem malfunctions can be especially difficult to troubleshoot.

Minimizing the time to problem resolution is critical to attaining and maintaining a high level of customer satisfaction by maximizing the availability of HP equipment. Use of this methodology will enable service providers to distinguish themselves in the marketplace by being able to provide this higher level of customer satisfaction.

This methodology provides a logical framework to troubleshoot system problems and reach problem resolution. A logical framework also provides a consistent and solid foundation for other technicians and system engineers to work from when escalation is necessary.

This module presents the HP troubleshooting methodology, used to diagnose and resolve HP system issues. Topics include:
Troubleshooting prerequisites
HP troubleshooting methodology overview
Collecting data
Evaluating information to isolate mode of failure
Developing an optimized action plan
Implementing the action plan
Evaluating results
Implementing preventive measures
Objectives
To use the HP troubleshooting methodology, service personnel should be able to:
Identify the troubleshooting prerequisites.
Explain the HP troubleshooting methodology for diagnosing and troubleshooting HP systems.
Explain the importance of collecting data.
Identify effective techniques for data collection.
Evaluate information to isolate the specific mode of failure.
Develop an optimized action plan with possible primary and alternate solutions.
Implement the action plan.
Implement preventive measures.
A Learning Check at the end of this module will test your understanding of the information and concepts presented.
Troubleshooting Prerequisites
Observing Safety Precautions
The first step in troubleshooting must always include personal and data safety. Your personal safety is the single most important factor to protect when servicing equipment. Never work under unsafe conditions. If you feel your personal safety may be at risk, contact your service manager immediately. Protect yourself and HP equipment from contact with unintentional live voltage or ESD damage. Observe the following precautions when servicing HP equipment:
Electrical shock protection
Physical injury and equipment protection
Electrostatic discharge awareness and precautions

Electrical Shock Protection
It is critically important to read and abide by the following electrical shock warnings to avoid the risk of personal injury. Contact HP technical assistance if you have any questions regarding electrical shock protection before servicing HP equipment.
WARNING: No one should attempt to make any repairs at the component level or to make any modifications to any printed circuit board. Improper repairs can create a safety hazard.
WARNING: Never disassemble or attempt to repair HP power supplies, UPSs, or monitors. The yoke and deflection coils of a CRT typically have 20 kV to 40 kV applied, and the charge is often held by capacitors. Severe injury could occur by accidental contact with this circuitry.
WARNING: Before servicing system products, disconnect the power cord. In systems that have multiple power supplies, disconnect all the power cords. The high-end systems do not completely shut off with the front panel Power On/Standby switch. The standby position removes power from most of the electronics and the drives, but portions of the power supply and some internal circuitry remain active.
WARNING: Safety interlocks are installed on some HP servers. To avoid the risk of personal injury, do not attempt to permanently defeat the safety interlocks that prevent access to hazardous energy.
Physical Injury and Equipment Protection
HP equipment needs to be handled, installed, removed, and disassembled properly to avoid the risk of personal injury or possible damage to the equipment. Observe the following warnings to protect against personal injury or product damage:
WARNING: Avoid the risk of personal injury by not lifting or moving heavy items without assistance. This includes large monitors, systems, and rack components such as uninterruptible power supplies. For components installed above shoulder height in a rack, have assistance removing the component to a work surface for repair or preventive maintenance. After the work on the system has been completed, have assistance replacing the component in the rack.
WARNING: Before working on high-end systems in a tower form factor that have casters, lock the casters in place to prevent the system from rolling during disassembly.
WARNING: Allow internal components to cool before handling them to prevent the risk of personal injury.
WARNING: To reduce the risk of personal injury or damage to the rack, be sure that:
The leveling jacks are extended to the floor.
The full weight of the rack rests on the leveling jacks.
The stabilizers are attached to the rack if it is a single rack installation.
The racks are coupled together in multiple rack installations.
Do not overlook this next warning. A fully loaded 42U rack is extremely heavy, with a load capacity of 1,000 pounds.
WARNING: To reduce the risk of personal injury, always ensure that the rack is adequately stabilized before extending a component outside the rack. A rack may become unstable if more than one component is extended for any reason. Extend only one component at a time.
Electrostatic Discharge Awareness and Precautions
Static electricity is an electrical charge at rest. The Triboelectric Effect is the generation of static electricity caused by rubbing two substances together (mechanical friction). Static electricity is generated every time you walk across a carpet or pull tape from a roll. Most of the time you are not aware of it unless the air is dry and you can hear or see the static charge crackle and spark its way to a new location. At humidity levels of 40% or lower, just by moving around, you can build up a static potential in your body of hundreds of volts.

ESD Precautions
Observe electrostatic discharge (ESD) precautions when servicing HP equipment. Every time a system is opened or a part is handled, there is a risk of damaging system components with electrostatic charges or of harming yourself by accidentally coming into contact with live voltage if working on a system with power applied.

When handling boards, use a wrist strap with safety resistance connected to an earth ground. The wrist strap keeps the static charge at near zero volts as it drains off the charge to earth ground. This keeps the chips safe from static charges during handling.

The safety resistance, a 1-MΩ or 2-MΩ resistor, is installed in series between the wrist strap and the earth ground. This is important because if an accidental shock does occur (for example, if you came into contact with the 120 V AC line), the line voltage would otherwise push a large current through the wrist strap to ground. The resistor in the wrist strap limits this current, so it is not driven through the low resistance of your body.

ESDS Precautions
Maintain and transport electrostatic discharge sensitive (ESDS) components in closed ESD protective packages (bags or containers).
Keep ESDS items in their original ESD protective containers until they are needed, to avoid unnecessary handling.
Unpack and handle parts only at an ESD approved workstation.
Tape documentation to the outside of the bag to avoid direct contact with the ESDS item.
ESDS material returned for use must be packaged in ESD protective containers.
Do not use any damaged ESD protective packages, that is, bags that are ripped, torn, crumpled, punctured, and so on.
Keep packing material away from the ESD safe workstation.
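To see why the series resistor in the wrist strap matters, the worst-case current can be estimated with Ohm's law (I = V / R). The Python sketch below is illustrative only: the function name is hypothetical, the 120 V figure comes from the ESD Precautions text, and the two resistor values are taken to be 1 and 2 megohms. Body resistance is ignored, which would only lower the current further.

```python
def strap_current_ma(voltage_v: float, resistance_ohms: float) -> float:
    """Ohm's law: current in milliamperes through the series resistor."""
    return voltage_v / resistance_ohms * 1000.0


if __name__ == "__main__":
    # Resistor values assumed to be 1 and 2 megohms, line voltage 120 V.
    for megohms in (1, 2):
        current = strap_current_ma(120.0, megohms * 1_000_000)
        print(f"{megohms} megohm resistor: {current:.2f} mA")
```

Under these assumptions the current stays well below one milliampere for either resistor value.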
Operating system backup
System configuration backup
Documenting existing software settings

Operating System Backup
If the system contains valuable data, verify that the customer has at least two complete known-good backups of the operating system and data, a copy of the backup software, and a functional tape drive that can read the backup. Two backups ensure complete data recovery in case something happens to the first tape or during the first restore attempt.

Ensure that the customer understands that backups and restores are their responsibility. If the customer is not willing to take responsibility for these actions, do not put yourself in jeopardy. Contact your service manager for further direction.

Do not overlook the importance of verifying this pre-work. Losing a company's data without any means to recover it is truly the worst possible scenario. Avoid it at all costs. Even seemingly trivial circumstances can lead to disastrous consequences if a backup is not available.

System Configuration Backup
Document the system settings, if this has not been done already. If the system configuration will be changed, first obtain a record of the current system configuration settings.
Create a backup.sci file to a diskette before making any changes. To do this, go into the system partition by using the F10 key during the boot up process. Select System Configuration, Configuration Backup, Backup to a System Configuration SCI file, Enter filename (backup.sci), and press Enter to write the data to a diskette.
Documenting Existing Software Settings
If software settings will be changed as part of the action plan, first record the existing settings and parameters. If the action plan does not work, the original settings can be restored. If new files will replace old files, first rename the original files so that they can be reused later if the action plan fails. Generally, the file extension can be changed to something distinguishable such as .old or .bad. Record the original and the new name of the files changed.

General Server Shutdown and Startup Procedures
Ask the system administrator to follow HP's recommended general procedures for server shutdown and startup, as listed in the following table:

Operation | Procedure
General server shutdown | 1. Exit applications. 2. Exit operating system. 3. Power down the server. 4. Power down the peripherals.
General server startup | 1. Start up peripherals. 2. Start up server. 3. Look for errors.
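The rename-before-replace practice described under Documenting Existing Software Settings can be sketched in a few lines of Python. This is a minimal illustration, not an HP tool: the function names are hypothetical, and the default .old extension is simply one of the examples the text suggests. Each call returns the original and new names so that both can be recorded.

```python
from pathlib import Path
from typing import Tuple


def set_aside(original: Path, new_ext: str = ".old") -> Tuple[Path, Path]:
    """Rename a file by changing its extension; return (original, renamed)."""
    renamed = original.with_suffix(new_ext)
    if renamed.exists():
        # Never overwrite an earlier saved copy.
        raise FileExistsError(f"{renamed} already exists")
    original.rename(renamed)
    return original, renamed


def restore(record: Tuple[Path, Path]) -> None:
    """Put the saved file back under its original name (action plan failed)."""
    original, renamed = record
    renamed.replace(original)  # overwrites the replacement file if present
```

A typical sequence would call set_aside on each file before copying in its replacement, keep the returned pairs in the field notes, and call restore on each pair only if the action plan fails.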
[Figure: HP troubleshooting methodology flowchart. Recoverable labels: "Problem Solved?" (YES / NO); "Action plan did not resolve expected issue or identify alternate cause of issue."]
Ability to ask the right questions
Ability to determine and use the most appropriate tools for each situation
Understanding of how the system will react in a failure scenario
Identifying hardware components in the system
Identifying software components in the system
Asking questions to understand what failed and in what context
Continuing to ask questions to learn as much detail as possible
Gathering failure information such as:
  Stop/Abend/Trap messages
  Insight Manager error conditions
  Critical error log messages
  POST messages
Step 2 Evaluate the Data to Determine Potential Subsystems Causing the Issue
After you collect data and identify the symptoms, evaluate all of these facts and symptoms to:
Determine which components could cause what happened.
Isolate faults to a hardware or software subsystem.
Understand the mode of failure.
Identify specific root causes for the specified mode of failure.
Identify possible solutions for each possible root cause.
Order the solutions by balancing the time and cost it will take to implement each solution against the likelihood that the solution will fix the issue, or by the potential value of the information gained if the solution is inadequate.
Identify the steps necessary to implement each solution.
Compile all the steps into an optimized action plan by eliminating redundancy and ensuring that only one variable is being manipulated at a time.
Incorporate an escalation plan into the master action plan:
  Be prepared to escalate for technical assistance.
  List the order of whom to contact and the information needed by each.
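One way to picture the ordering step in the list above is to score each candidate solution by its estimated likelihood of success divided by its estimated cost, then try the highest scores first. The Python sketch below is only an illustration under that assumption: the scoring rule, the class and function names, and the sample figures are all hypothetical, and the sketch leaves out the information-value criterion that the text also mentions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    likelihood: float    # estimated chance (0 to 1) that this fixes the issue
    cost_minutes: float  # estimated time to implement


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates by likelihood per unit of cost, best first."""
    return sorted(candidates,
                  key=lambda c: c.likelihood / c.cost_minutes,
                  reverse=True)


# Hypothetical estimates, for illustration only.
EXAMPLE = [
    Candidate("Replace system board", likelihood=0.5, cost_minutes=120),
    Candidate("Reseat memory", likelihood=0.2, cost_minutes=10),
    Candidate("Update firmware", likelihood=0.3, cost_minutes=30),
]

if __name__ == "__main__":
    for c in rank(EXAMPLE):
        print(f"{c.name}: {c.likelihood / c.cost_minutes:.4f}")
```

With these sample figures, the quick and cheap step ranks ahead of the more likely but far more expensive board replacement, which matches the balance the list describes.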
Carefully execute each step.
Apply only one solution or variable at a time.
Observe and record the results of each step, including any error messages or changes in functionality.
Collect more data.
Utilize the information gathered from implementation of the action plan.
Evaluate the information.
Develop another optimized action plan.
Implement the optimized action plan.
Repeat these steps as additional information is gathered and new action plans are optimized, executed, and evaluated, until problem resolution is reached.
Determine the root cause of the problem.
Determine proactive steps that can prevent the problem from recurring.
Devise a system test to verify changes and procedures before implementing them into production.
Implement a new set of procedures, software, and administrative maintenance to attain a higher level of availability.
Perform preventive maintenance, including checking for loose cables, reseating boards, and checking for proper airflow.
Add fault tolerant elements to critical subsystems, where applicable.
Questioning
Questioning is a valuable technique, but you should understand that not everything you hear will be one hundred percent accurate. Much of the reply will be perception or from memory. Questioning is useful to understand the customer's perception of the problem and will provide valuable clues that may lead you in the right direction. Information gained from questioning must be validated.

Questioning also involves careful listening skills. Do not interrupt a customer and never assume you know what the customer is going to say. Questioning is the art of polite interrogation.

Open-Ended Questions
When beginning your data collection, ask open-ended questions. Open-ended questions are those that will provoke and permit spontaneous and unguided responses. The customer's complete explanation will provide more details than a short answer. A customer may mention something that turns out to be a valuable clue. Ask the customer to explain what happened and listen instead of asking a series of short questions looking for specific details.

The Right Questions
Asking the right questions depends on the immediate issue. First, center your questions on identifying failure symptoms. Once all the symptoms have been identified, the questions should then center on identifying what may have occurred before the symptoms appeared. This line of questioning will help isolate the problem to a subsystem or to a defective field replaceable unit within that subsystem.
Controlled Questioning
Once generic open-ended questioning is finished, use controlled questioning to dig deeper into the situation. The results of the open-ended questioning should provide you with a beginning baseline of what occurred and what symptoms appeared. Controlled questions should be used when one of the answers provided to an open-ended question is either not logical or provides a clue to a malfunctioning subsystem.

When a clue is provided, examine further by asking specific, focused, probing questions. The answers to these questions should allow you to ask even more specific and relevant questions until you understand how the system is functioning and can define precisely when, where, and in what context the error occurs.

The answers will taper off as you exhaust the customer's knowledge of the failing system and situation. At this stage, use the appropriate tools and utilities to collect the information that the customer was unable to provide and to validate or invalidate the information.
Look into all the error logs for information.
Look up operating system errors in the appropriate tools. These tools usually offer valuable information on the root cause of the problem and offer suggestions on which items to check.
Probe related components for possible links.
Continue in this manner until you have collected as many facts as possible regarding the condition of the system both before and after the problem occurred.

Examples
Collect enough data to pull the pieces together to see the big picture:
Can you describe the problem in detail?
What happened prior to the point of malfunction?
Look for discrepancies. What does not fit?
Is there a system log, an Inspect report, a network map?
Was there an error message? What was it?
Observation
Observation is an important and useful data collection technique. Using your powers of observation and your senses can provide critical information on failures. If you are in front of the system, observation should always be one of the first techniques used. This can be accomplished by physical inspection. Look for something that is not connected or that is out of place.

Visual Indicators
Look for:
Something that appears wrong
Charred or discolored components
Unconnected plugs or cables
Switches in an incorrect position
Smoke
Physical damage
LED activity

Anything that sounds different
Anything that sounds wrong
Beeps
Clicks
Whirring sounds
Grinding sounds
Olfactory Indicators
Sniff for bad or unusual odors. (Acrid electrical odor indicates burnout.)
Tactile Indicators
Carefully touch components to learn if something is cool when it should be warm.
Carefully touch components to learn if something is overheating.
Toggle switches to find out if they click in place or are loose.
Wiggle cables or wires to find out if they are loose.
Press down on boards to find out if they are seated correctly or reseat them.
Use your fingers to detect frayed cables (better than visual observation).
A complete baseline is a document or set of documents defining all the facts that can be known about the hardware, software, and firmware configuration of a system including its environmental conditions. It also includes the version of diagnostic tools and utilities used on the system. There are no guesses, estimations, or assumptions in a baseline. All details are researched and verified. A working baseline is the set of facts needed to understand the problem. The more facts that are collected, the more accurate the assessment of the problem will be.
Process of Baselining
Document all the facts you collect about the system. If you can, print a screen image of any error messages that may have been produced for further reference. Create useful drawings such as a diagram indicating which boards are in which slots or a diagram of nodes with their network addresses.

As you go through the data collection process of asking questions, making observations, using diagnostic tools, and controlled questioning to gather more detail, document all the facts. Together, these documents, printouts, and drawings form your accurate field notes: a snapshot of the system before any changes are made.

As you gain experience in creating a baseline, you will automatically gather the information needed to produce a working baseline defining how the system is functioning and precisely when, where, and in what context the error occurs.

The baseline can be carefully evaluated in the next step of the HP troubleshooting methodology to determine which subsystems have the potential to produce the symptoms recorded, based on the current hardware, software, and firmware configuration and the environmental conditions. The troubleshooting direction you take next will be determined by the evaluation of this information. Once solutions are tried and tested, the changes and the results will be compared against this baseline.
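Comparing later results against the baseline is easier when the baseline is recorded in a structured form. The Python sketch below assumes, purely for illustration, that the baseline has been written down as a dictionary of setting names and values; the function name, keys, and sample values are hypothetical and do not come from any HP tool.

```python
from typing import Any, Dict, Tuple


def diff_baseline(baseline: Dict[str, Any],
                  current: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (before, after)} for every setting that differs."""
    keys = baseline.keys() | current.keys()
    return {key: (baseline.get(key), current.get(key))
            for key in sorted(keys)
            if baseline.get(key) != current.get(key)}


# Hypothetical snapshot values, for illustration only.
BEFORE = {"system_rom": "version A", "slot_1": "array controller", "memory_mb": 1024}
AFTER = {"system_rom": "version B", "slot_1": "array controller", "memory_mb": 1024}

if __name__ == "__main__":
    for setting, (old, new) in diff_baseline(BEFORE, AFTER).items():
        print(f"{setting}: {old} -> {new}")
```

A setting that appears in only one snapshot shows up with None on the other side, which flags items that were added or removed as well as items that changed.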
Field Journal
Many technicians and system engineers maintain notebooks or electronic journals filled with field cases and solutions so they have written records to draw on in the future. Most organize journals into sections, one for each subsystem, with subsections for problems and solutions. This is one of the most valuable reference tools because it is filled with real problems the author lived through.

Some create their own journals by creating a template and then binding many copies of this template in one notebook. As each new problem is attacked, relevant information is filled in on the template. This is entirely a personal system; there is no one right way of doing this. What works for one technician may not work for another. The following information is common to many of these journals. The information listed here is only to provide an example.

Pertinent Hardware Items
Model of the system or subsystem
Serial number
Version of the System Configuration Utility
Version of the System Diagnostics
System ROM date or version
Options ROM date or version
Type, quantity, size, speed, and layout of RAM
HP boards and slot locations
Third-party boards and slot locations, if any
Externally connected hardware, if any
POST error code, if any
Other errors reported, if any
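For an electronic journal, the hardware items listed above could be captured in a reusable template. The Python sketch below shows one possible layout as a dataclass; the class name, field names, and the missing() helper are hypothetical choices for illustration, not a prescribed format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HardwareEntry:
    model: str
    serial_number: str
    scu_version: Optional[str] = None
    diagnostics_version: Optional[str] = None
    system_rom: Optional[str] = None
    options_rom: Optional[str] = None
    ram_layout: Optional[str] = None  # type, quantity, size, speed, layout
    hp_boards: Dict[int, str] = field(default_factory=dict)           # slot -> board
    third_party_boards: Dict[int, str] = field(default_factory=dict)  # slot -> board
    external_hardware: List[str] = field(default_factory=list)
    post_error_code: Optional[str] = None
    other_errors: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Names of single-valued items that have not been filled in yet."""
        return [name for name, value in vars(self).items() if value is None]
```

Calling missing() on a partly completed entry lists the items still to be collected, which can serve as a checklist during data collection.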
Once you have evaluated the data, determine if a true failure exists before proceeding.
Elimination
The process of elimination is an important part of troubleshooting. The elimination technique simplifies the variables or FRUs that make up the present configuration by removing suspect FRUs. Remove suspect FRUs to observe how the system operates when they are not part of it. If the system still malfunctions, these FRUs are probably not contributing to the problem. However, if the problem is resolved, one or more of these FRUs is a contributing factor. Eliminating FRUs is also valuable to see if the problem changes once they are removed, thus identifying new potential FRU failures.
Example
Some of the older ROM versions in switchboxes caused jerky pointing device movements and freeze-ups. The firmware can be changed out to correct this. To verify that the switchbox is causing these symptoms, temporarily remove the switchbox from the system. Removing the switchbox and switchbox cables can also be used to isolate miscellaneous video, keyboard, and pointing device problems. If the problem disappears when the switchbox is eliminated, then either the switchbox or the switchbox cables are the problem FRU. Further elimination will isolate the defective part.
Minimum Configuration
Minimum configuration is the process of removing all FRUs except those the system requires to operate. This is a drastic but effective way to eliminate a large number of FRUs at once. If the problem still occurs, all the removed components have been eliminated as contributors to the problem. If the problem goes away, add components back one at a time, retesting after each, until the cause or causes are discovered.
Variations on this technique that can be useful in troubleshooting include reducing just memory or just processors to the minimum hardware configuration, or removing only the added expansion boards. Reducing the system to just HP components is a fairly common variation that quickly identifies any conflicts with third-party components. This technique is good to use if you have no indication of which FRU may be contributing to the problem. In some cases, removing unnecessary hardware also immediately provides a set of spares if duplicate parts are installed in the system.
Example
Temporarily remove all options installed in the expansion slots, any additional memory DIMMs and processors, and their processor power modules (PPMs). Boot the system back up and test if the problem still occurs.
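The add-back phase of this technique can be pictured as a simple loop. The following Python sketch is illustrative only: the component names and the `problem_occurs` check are hypothetical stand-ins for physically reinstalling a FRU and retesting the system, and the sketch stops at the first component whose return brings the problem back.

```python
from typing import Callable, List, Optional


def find_first_culprit(
    removed: List[str],
    problem_occurs: Callable[[List[str]], bool],
) -> Optional[str]:
    """Add removed components back one at a time, retesting after each.

    Returns the first component whose reinstallation makes the problem
    reappear, or None if the problem never returns.
    """
    installed: List[str] = []
    for component in removed:
        installed.append(component)      # change only one variable
        if problem_occurs(installed):    # retest before adding anything else
            return component
    return None


# Hypothetical example: the fault is simulated as being tied to one board.
removed_parts = ["DIMM 3", "DIMM 4", "third-party NIC", "processor 2 + PPM"]
simulated_fault = lambda installed: "third-party NIC" in installed
culprit = find_first_culprit(removed_parts, simulated_fault)
print(culprit)
```

If more than one component contributes to the problem, the same loop can be repeated with the identified component left out.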
If the customer expects a system or option to perform to a specific level and it cannot, the customer will view this as a failure. The system or option may not have the capability to perform what the customer expects or may not have the capability to perform to the degree that the customer is expecting. For example, the customer expects the integrated SCSI controller to be able to perform as a SCSI array controller; but because it does not have that capability, it cannot. There is absolutely nothing wrong with the integrated SCSI controller. To resolve this issue, educate the customer. A careful and accurate explanation of what the system or option is capable of performing needs to be relayed. The customer may need to purchase additional equipment or select a different system to attain the functionality or level of performance desired.
Understanding how a system functions when running properly can quickly resolve calls from customers who believe that something is wrong with their system when it is performing properly. The customer's lack of understanding of how a system operates, or inaccurate knowledge of how a system should operate, can lead the customer to believe that a properly functioning system is malfunctioning. For example, if the customer does not understand the significance of the various LED illuminations, the customer may believe that the system is defective when, in fact, the system is functioning normally. The customer's perception that the system is malfunctioning can be resolved with a detailed explanation of why a particular behavior is observed, what it indicates, and why it is normal. Point the customer to documentation, if possible.
Identifying possible root causes
Identifying possible solutions
Planning and scheduling
Identifying potential problems in implementing a solution
Identifying how to test results
Optimizing the action plan
Prioritize the possibilities. Avoid backtracking. Eliminate redundant steps. Change one variable at a time and implement one solution at a time.
Prioritizing the Possibilities
After you have listed all the possible solutions to the root causes, prioritize them according to their likelihood of correcting the root cause. Once the most likely solution is selected, continue down the list, selecting the next best proposed solution until they are all in the most sensible order. Take into consideration the side effects the solutions could cause. Weigh the following criteria when prioritizing the potential solutions:
The time it will take to execute
If a possible solution takes a very long time to execute, it may not be a reasonable solution to execute. The cost of downtime is very high and must be kept to a minimum. A solution that involves a great deal of time to execute may be a great solution, but because it is time consuming it may not be ordered as the first solution to act on.
The monetary cost and the amount of work involved
Costly or difficult solutions are usually ordered after less costly or easier ones. These solutions may also require more preparation time. When calculating monetary cost, remember to think about the cost of downtime to the customer.
The value of the information gained from the failure of the solution
Even when a solution fails, valuable information may be gained. The failure may make the problem worse. If it does, you are probably on the right track, and you just need to pick a different variable. New error messages generated by the system may provide the clues you were missing. No change at all may indicate that you have selected the wrong subsystem to troubleshoot. The failure may eliminate a subsystem or FRU as the cause of failure and point you toward a different subsystem as the root cause.
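The ordering described above can be sketched as a sort. The following Python example is a minimal illustration, not a prescribed formula: the `CandidateSolution` fields, the sample values, and the rule "most likely first, then shortest, then cheapest" are all assumptions made for the example. The information value of a failed attempt is a judgment call and is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateSolution:
    name: str
    likelihood: float        # estimated chance of fixing the root cause, 0.0 to 1.0
    hours_to_execute: float  # includes expected downtime
    cost: float              # parts and labor plus estimated cost of downtime


def prioritize(candidates: List[CandidateSolution]) -> List[CandidateSolution]:
    """Most likely first; ties broken by shorter time, then lower cost."""
    return sorted(
        candidates,
        key=lambda c: (-c.likelihood, c.hours_to_execute, c.cost),
    )


# Hypothetical candidate list for illustration.
plan = prioritize([
    CandidateSolution("Replace system board", 0.3, 2.0, 900.0),
    CandidateSolution("Reseat memory", 0.6, 0.25, 0.0),
    CandidateSolution("Update system ROM", 0.6, 0.5, 0.0),
])
ordered_names = [c.name for c in plan]
print(ordered_names)
```

In practice a technician may deliberately move a quick, low-cost step ahead of a more likely but expensive one; the sort key is the place to express that preference.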
Avoiding Backtracking
Backtracking can waste valuable time. Watch for it when you create the action plan and try to eliminate it as much as possible during the optimization phase. An example of backtracking is going to an error log twice to look for different error messages.
Eliminating Redundant Steps
Look for ways to eliminate redundant steps completely. If that is not possible, reduce them as much as possible. Performing redundant steps that are not strictly necessary wastes your time. Look for ways of linking two solutions together into one master solution if it can be done in a logical way.
Changing One Variable at a Time and Implementing One Solution at a Time
Execute each step of the plan by applying only one solution or by changing only one variable at a time. Understanding exactly what corrected the problem can lead to understanding what may have caused the problem to occur in the first place. When multiple changes are made at one time, it is impossible to know which one solved or modified the problem. Whenever possible, avoid applying multiple changes at one time.
Schedule time with the customer. Implement the optimized action plan.
Set up a timeframe to execute the action plan, observe, and evaluate the results. Include enough time to implement preventive measures and perform preventive maintenance. Underestimating the time needed to perform the necessary work or to recover from a disaster will result, at best, in rushing through a job or, at worst, in failing to complete a job coupled with customer dissatisfaction. Explain to the customer exactly what the action plan involves, how much time is required to execute it, and the risks associated with it. Depending on the guidelines set out by the service center manager, your service center may also need to be informed. The customer should be familiar with how much time a complete restore will take in a worst-case scenario and be able to estimate the total amount of time required. If the customer selects non-business hours for warranty work to be scheduled, or cannot provide an adequate window to perform the necessary work, contact your service manager for direction.
Take a few minutes to look at the action plan to make sure that any necessary precautions and pre-work have been adhered to and completed and that the steps appear complete and in the correct order. Anything that needs to be completed, fixed, or taken care of should be done now. Make sure that everything needed to execute and complete the action plan is ready. Gather tools needed for system disassembly, necessary spare parts, and configuration utilities, if appropriate. Now it is time to begin. Change only one variable at a time. Record the output. Observe the results. Test the solution.
Changing Only One Variable at a Time
Implement each step of the plan by applying only one solution or by changing only one variable at a time. It is important to work with only one change at a time, regardless of whether that change involves a modification, addition, or deletion of a specified item. By following this simple guideline, you can observe exactly which variable corrected or modified the existing problem or a symptom of the problem. You can also observe which variable appears to have no visible impact at all on the problem or symptom. This is equally important information to collect to understand and resolve the system problem.
Understanding exactly what corrected the problem can lead to understanding what may have caused the problem to occur in the first place. Understanding the root cause is important for preventing the problem from occurring again and for putting the needed preventive measures in place. If you make multiple changes at one time, you do not know which one solved or modified the problem. Avoid applying multiple changes at one time whenever possible.
Recording the Output
On a hard copy of the optimized action plan, draw a column beside the steps. This column will be used to record the results of the steps executed. Immediately following the complete execution of a possible solution, these results or output will be used as input to the evaluation process, which judges if the solution completely solves the problem under every situation.
Record the results of each step after it is executed, making sure to include any error messages or additional information collected. For future reference, record the date and time as well as how long it took to complete these steps. If these written records are complete and accurate, they will serve to eliminate future guesswork and possible confusion regarding what happened in what order. If the action plan solved the problem, it is a recorded solution for future use if the problem reappears.
Observing the Results
Carefully observe each step of the action plan as it is implemented. Watch for the occurrence of new symptoms or the elimination of existing ones. Some results are obvious, such as the introduction of informational or error messages, or significant changes in functionality. Other changes may not be as obvious and may require checking system logs for any new events recorded after the change was made. If the action plan calls for a reboot, watch for changes at POST if relevant. If the system has been failing at a certain point, watch if the system can now go past it or if it still fails at the same point. The following are examples of the types of observations that should be recorded on the hard copy of the action plan.
What are the results of the step? Watch for and record new symptoms, such as error messages or informational messages.
Did anything change? If so, what? Check system logs. Look for any type of change, no matter how insignificant.
Was any functionality gained or diminished? Functionality changes are an important indicator of the effectiveness of the action plan.
Were any errors made in implementing the step? Was more than one variable changed at a time? Watch for and record any mistakes made while executing the step or the action plan.
Were any steps skipped or completed out of order? Circle the steps not executed and number the true order in which the steps were executed.
Were any steps accidentally added? Were any steps added intentionally to complete or correct another step? Place check marks against the steps as they are executed to avoid this. If steps had to be added in order to proceed, record why and indicate what step they were added after.
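For technicians who keep an electronic journal instead of a hard copy, the results column can be modeled as one record per step. The following Python sketch is only one possible layout: the `StepRecord` class, its field names, and the sample values are hypothetical and simply mirror the items listed above (result, date and time, duration, messages, and deviations from the plan).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class StepRecord:
    step_number: int
    action: str
    result: str
    started: datetime
    minutes_taken: int
    messages: List[str] = field(default_factory=list)    # error or informational messages
    deviations: List[str] = field(default_factory=list)  # skipped, added, or reordered steps

    def summary(self) -> str:
        """One-line entry for the results column of the action plan."""
        line = (
            f"Step {self.step_number} "
            f"[{self.started:%Y-%m-%d %H:%M}, {self.minutes_taken} min]: "
            f"{self.action} -> {self.result}"
        )
        if self.messages:
            line += " | messages: " + "; ".join(self.messages)
        if self.deviations:
            line += " | deviations: " + "; ".join(self.deviations)
        return line


# Hypothetical entry for illustration.
record = StepRecord(
    step_number=1,
    action="Reseat DIMM 1",
    result="POST completes",
    started=datetime(2003, 6, 2, 14, 30),
    minutes_taken=10,
    messages=["No POST errors"],
)
print(record.summary())
```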
Testing the Solution
Once a solution has been completely applied, the action plan should provide for running a test or series of tests to be evaluated later. The tests may include running diagnostic utilities that up to this point have failed and displayed a particular error message. If the problem is with an application or database, it may be necessary to have the customer perform the test and evaluate the results.
It is important to observe and record the results of each test executed. Also, notice if the test finishes or stops at a certain point. If a test does not complete, run it again and record those results as well.
It is equally important to notice if the solution has produced no visible change to the system or system logs. In that case the solution produced no results, which may warrant reexamination of the subsystem or other subsystems.
Evaluation Criteria
Consider the following criteria in evaluating if the solution adequately corrects and addresses the problem:
Are the results logical and consistent?
Are the results what you expected to see?
Is there an indication that steps were left out or added, or that other deviations were made in executing the action plan?
Are there any error messages or new symptoms among the results?
Do the results indicate if any side effects have been inadvertently introduced?
Do the results prove that the problem is completely resolved?
If the problem is only partially resolved, were there any functionality gains or losses or did any error message or symptom change?
Is the solution only a temporary patch?
Is the solution actually a workaround?
Were sufficient tests run to check if the solution works correctly under all conditions?
The result of this evaluation should clearly indicate whether the problem has been resolved. If the results are inconclusive, perform additional tests. Ask the customer to test the solution. It may take several days to determine if the solution actually fixes the problem and does not generate any side effects.
Customer involvement
High-availability features
System management features
Software management
On-site spare parts
Service offerings
Preventive maintenance
Customer Involvement
Recognize that the customer can be a tremendous asset in the prevention of problems. The cause of many problems is operator error, some type of human intervention, or lack of human intervention. The customer is already on-site and has a significant interest in preventing downtime and problems. Most customers would rather be proactive than reactive.
Telling the Customer Where Error Information Can Be Found
Most customers are willing to help if the action items are not time consuming. Some customers want to be as self-sufficient as possible and will pick up any tasks that will assist in this goal. Take a few minutes to show the customer where the error information is recorded on the system and how to read the IMD and error logs. This step empowers the customer to call and schedule service on a warranty prefailure and prevents a failed system condition. At the very least, if customers are instructed in where to look for error information, the next time they have a service issue they will want to engage your services and they will be able to provide accurate and useful data.
Explaining the Resolution
Explain the resolution to the problem that was just solved and write it down for the customer in clear and simple steps. This information will serve as a guide if the problem reappears. Depending on the complexity of the problem, the customer may be able to complete the steps, or at least will be able to refer to them when calling for service the next time the problem occurs.
Items the Customer Should Implement
If the data on the system is important and the system does not currently have a scheduled backup routine, advise the customer of the necessity of implementing one. Suggest instituting a complete library of backups with off-site storage as part of a disaster recovery plan. Also, explain that the backups should be periodically tested to verify that they are functional and complete.
A problem logging and resolution notebook is a helpful thing to have beside each system. When properly maintained, these notebooks provide a complete history of the system. Problems can then be categorized by failure types, such as hardware failure, operating system error, application error, user error, and malicious problems caused by virus programs or sabotage. Preventive maintenance actions should be recorded. Every hardware and software installation, modification, or removal should be written down as well. System configuration printouts, as well as utility diskettes, can also be stored with the resolution notebook. These items can save a great deal of time in the future and ensure accuracy, especially when dealing with future part replacement.
If the problem involved the network cabling or IP addresses, suggest that the customer keep an up-to-date network topology map in an accessible location.
If the system has little available hard disk space, suggest that the customer check for this periodically and routinely archive or remove unneeded files. Also, suggest the available options for expanding the system to accommodate more hard disk storage.
If the customer has a DAT drive, explain the value of a scheduled cleaning program.
If the customer has a DLT drive, describe what a dropped leader is and how to look for it. Also, explain the importance of tape cartridge label placement. If the label is placed on the exposed surface, it cannot fall off or become lodged inside the tape drive.
High-Availability Features
If a server is capable of supporting fault tolerant options and they are not implemented, it may be because the customer is not aware that they are available. Fault tolerant features include:
Redundant power supplies
Redundant fans
RAID array controllers
On-line spare drives for RAID arrays
Duplexed SCSI controllers
Redundant PPMs
Redundant network adapters
Off-line backup processors
All of these features are designed to minimize downtime. Check QuickFind for information on which features are available for the customer's system.
Automatic System Recovery (ASR)
If it is not already set up, suggest using the Automatic System Recovery feature to restart a system after a critical hardware or software error occurs. This feature is especially useful when the error occurs while no one is on-site to service the system. ASR requires loading the HP Health Driver and enabling the Automatic System Recovery-2 (ASR-2) feature in the system configuration utility. If a critical error occurs, the system will record the error information in the System Health Logs, reboot the system, and page the system administrator. The system can be configured for automatic recovery or for attended local or remote access to diagnostic and configuration tools.
NOTE: The version control update is available by downloading SoftPaq SP0965.exe. SoftPaq SP0965.exe is consistently updated to reflect the latest version.
If the customer has not already implemented an Integration System, explain the advantage of one. Integration System is a network system that acts as a repository of approved system software and configuration standards that can be implemented across distributed systems. Access to the latest software for the update of an Integration System is enabled through a dedicated HP Support Software System on the World Wide Web and through the HP SmartStart Subscription Service with periodic releases of CD updates. Through Integration System Maintenance in Insight Manager, the administrator can compare the latest software versions available via the Internet or CD to those stored on the Integration System and use the information provided to assess the need for any new versions. The administrator can then select the versions desired for download to the Integration System. Once the Integration System is updated, the new software is available for both new SmartStart installations and for update of production systems.
Software Management
Keep abreast of operating system updates and patches. Many customers already do this, but doing so yourself may provide the extra edge needed to solve a problem or understand a conflict. It is vital to weigh the risk of implementing the change versus the added functionality it provides. It is advisable to test all changes first on a test system to check for functionality changes before implementing these changes on a production system. Ask the customer if any virus protection software is installed on the system, how long ago it was updated, and the frequency it is set to scan. If no virus protection software is installed or if it is out of date or infrequently used, advise the customer of the need to install it, keep it up to date, and use it. Even though macro viruses have grown exponentially in the last two years, boot-sector viruses still account for four out of the ten most common infections. Boot-sector viruses are a leading cause of corruption on systems running the Microsoft Windows NT operating system. Suggest subscribing to SmartStart to ensure that the customer has the latest drivers and utilities on-site. It also provides the necessary license to update Insight Manager to the latest version.
Service Offerings
Whenever you see that a customer could benefit from a service plan or contract, advise the customer of the possibility of obtaining service level agreements to enhance their environment. This means that you will need to be familiar with the various service offerings and contracts that your service center offers. A timely suggestion of a service to fill a customer's need can greatly increase customer satisfaction as well as increase business revenue for your company. An updated listing of warranty upgrades and service offerings (CarePaqs) is available at http://www.compaq.com/services/carepaq/. Service offerings include:
Preventive Maintenance
Here are some suggestions for preventive maintenance measures that can prevent problems from occurring:
While you have the system cover off, take a few extra moments to get rid of any dust build-up with a can of anti-static air, tighten any loose connections, reseat boards, and inspect any cables for frays. Move the cables away from sources of heat and give them more slack if possible. Check for adequate airflow, and dislodge anything blocking the fans.
Do not clean connectors with erasers. Erasers remove the gold, cause static discharge, and leave residue. If connectors need to be cleaned, use isopropyl alcohol or a special cleaning solution applied with a cotton-tipped swab.
Make sure systems are not positioned tightly up against walls and that there is adequate space around them for proper airflow.
Move magnetized office items such as magnetized screwdrivers and telephones with electromagnetic ringers away from the system.
Advise the customer if you find any of these conditions: the system sharing a power line with high-current machines (for example, laser printers, air conditioners, copiers, and coffee machines), ungrounded power strips, or outlets in need of repair.
Check the adequacy of the power back-up system. Besides having Uninterruptible Power Supply (UPS) protection for the system, consider the power protection requirements for the hubs, bridges, routers, and gateways to avoid network functionality loss. Also, check that no UPS is overloaded.
Before adding faster or larger hard drives, make sure that a thermal upgrade or power supply upgrade is not necessary. The heat generated by some of the larger and faster hard drives may cause a thermal overload unless there are provisions for additional cooling.
If a terminator board needs to be removed to add a processor board, give the board to the customer to store. Later, in the event that there is a processor problem, replace the failed processor board with the terminator board to keep the system functioning with the remaining processors until a replacement can be installed.
Learning Check
1. What are the six steps of the HP Troubleshooting Methodology?
2.
3.
4.
5.
6. What is the main difference between the elimination technique and the minimum configuration technique?
7. Which is more important, understanding the customer's reported problem or understanding the true failure?
8. What criteria should you consider when optimizing your action plans?
9.
10. When evaluating results, you should never ask the customer to test the solution. True False
11. Are the results to a solution always immediately available and visible? Why or why not?
12. If executing the entire action plan did not solve the problem, what is the next step to try?
13. List at least three things you should do after solving the problem.
15. It is safe to field repair monitors as long as you have been fully trained. True False
Introduction
Tools can serve multiple purposes in the diagnostic methodology. They can assist in the collection of data. They can assist in the evaluation process and in fault isolation between subsystems. They can be an essential part of implementing a troubleshooting action plan.
This module covers features of various diagnostic tools and highlights their use in troubleshooting servers, focusing on how the various tools can serve the data collection process. This module assumes a basic knowledge of the tools. Therefore, only selected portions and advanced aspects of these tools are presented. Tools that are specific to ProLiant servers are also covered.
The tool sections presented are those that are considered the primary sources for the information given in the section. While the suggested tool may not be the only source, it is the most efficient to use in finding the information.
Topics include:
HP Insight Diagnostics
Survey
Array Diagnostic Utility
Insight Manager
Remote Insight Lights-Out Edition
Server Troubleshooting Guide
Summary of resources and tools
Objectives
To use server diagnostic tools, service personnel should be able to:
HP Insight Diagnostics
Survey
Array Diagnostic Utility (ADU)
Insight Manager
Remote Insight Lights-Out Edition
ROM update utility
Select the correct tool for the task.
Interpret the results of the tool reports.
Use diagnostic tools for troubleshooting and preventive maintenance.
HP Insight Diagnostics
SmartStart Home Page
HP Insight Diagnostics are accessed from the SmartStart CD. They will also be available in online mode, accessible from the operating system. When the online version is available, it will be distributed as a SoftPaq. To access ProLiant server maintenance utilities, boot from the SmartStart CD and choose the Install button. You will see the SmartStart Home page, which offers you the choice to Setup the server. Because the server has already been set up, do not select this choice. Instead, choose the Maintenance tab to gain access to the diagnostics and utilities on the SmartStart CD.
95
Survey utility example - memory
By choosing the Advanced mode of Survey and selecting one of the components, you can obtain detailed information about that component. In the example shown here you can determine a number of facts about the memory on the server system board. For example, four of the six DIMMs are populated with 256 MB each of ECC memory, for a total of 1 GB. You will also note that the system has been configured for online spare memory.
Quick Test All Devices
Selecting the Test tab provides you with a mechanism for testing the various server components and subassemblies. In the center of the screen you can see that there is a choice of testing All devices or specific individual devices. At the left of the screen you can then select the Type of Test and Test Mode. The choices of test type include Quick, Complete, or Custom. The mode may be either Interactive or Unattended. In this example the system board DIMMs 1 and 2 are selected for a quick test.
Test Status
Choosing the test Status tab allows you to keep track of the test progress. In this example, a Noise test is in progress and a Cache test will be run next. The condition legend at the left shows you that the blue dot to the left of the test item indicates the status is unknown at this point.
Test Log
Choosing the test Log tab enables you to see the test results after the test is complete. In this example you will note that both tests were successful, as indicated by the green icon before each test item.
Integrated Management Log (IML)
Also built into this suite of diagnostic utilities is the IML viewer, which was previously a separate utility run under the operating system. The IML gives you details of error conditions generated during or after POST. Included are the severity, class, date, count, and description of each error. A check box is provided to allow you to note when the condition causing the error message has been repaired.
Insight Diagnostics Help
A Help tab on the maintenance menu gives you useful information about the diagnostic utilities. In this example you can see a description of the theory of operation for Insight Diagnostics. You will note that the diagnostics can be run in both online and offline mode but that different information is available depending on the mode. A detailed description of Survey tells you about the sessions that enable you to note any changes that may have occurred during the interval between them.
A description of the different types of test is also shown on this screen. Here you can see the differences between Quick, Complete, and Custom test modes.
Erase Utility
The system erase utility is another feature of the server diagnostics available from the Maintenance menu. Here you can choose to erase all drives, system NVRAM or CMOS on your server. Unlike the previous server erase utility, however, this utility does not erase the smart array controller NVRAM.
Computer Model: ProLiant 5500
System ROM Version: 10/01/1997

SLOT SUMMARY:
Slot Num   Slot Type
--------   ---------
Slot 3     PCI

Array Controllers and Host Adapters Detected
--------------------------------------------
Smart Array 3200 Controller
Version Date Time System ROM
Slot and controller identification
Identifies installed array controllers and shows the slots in which they are installed. Unlike DAAD, only filled slots are shown.
Subsystem Information
SUBSYSTEM INFORMATION:
Chassis Serial Num:        D745BRZ10018
This Controller
   Array Serial Number:    P165C0BBFH16VR
   Cache Serial Number:    P19200BBFH1C4E
Other Controller
   Array Serial Number:    Not Available
   Cache Serial Number:    Not Available
Serial number information is available here for the server as well as the array controller.
CONTROLLER IDENTIFICATION:
Configured Logical Drives:     2
Configuration Signature:       0xaceb0678
Adapter Firmware Revision:     '3.08'
Adapter ROM Revision:          '3.08'
Adapter Hardware Revision:     0x01
Boot Block Version:            '3.08'
Drive Present Map:             0x0000000f
External Drive Map:            0x00000000
Board ID:                      0x40320e11
Cable or Config Error:         0x00 (No)
Non-disk map:                  0x00000000
Invalid Host RAM Address:      No
CPU Revision:                  0x75
CPU to PCI ASIC Rev:           0x03
Cache Controller ASIC Rev:     0xff
PCI to Host ASIC Rev:          0x01
Marketing Revision:            0x41 (Rev A)
Expand Disable Code:           0x01
SCSI Chip Count:               2
Max SCSI ID's per Bus:         16
Big Drive Map:
Big Ext Drive Map:
Big Non-Disk Drive Map:
0x000f 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0080 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
The ROM and firmware revisions for the array controller are available here, as well as the number of logical drives and the maximum number of SCSI IDs per SCSI bus.
0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
Drive status
Current condition of the logical drive. As shown, this logical drive is OK.
Blocks to rebuild
Blocks re-mapped
Replaced marked OK map
Media was exchanged
Cache failure
Expand failure
WS7000172246.... ....1.52....COMP AQ WDE2170S ............ .... WS7000172246.... ....1.52....COMP AQ WDE2170S ............ ....
Use this section to determine if a drive is in pre-failure or has failed. This section is important as it allows you to compare service times with various errors, and various errors with each other. SCSI port x, drive ID x:
Factory, since power, and threshold
The Factory column is the main one of concern, as it holds all of the failures (or data) that have ever occurred with the drive. The Since Power column holds failures only since the last power-on (or last driver load) and may register no faults, depending on when the unit was last rebooted.
Service
The Factory service time is the number of minutes (in hexadecimal notation) that the drive has been in use. Use it to determine the age of the drive, as some faults may occur only after a certain number of hours. Also, compare this time with other drives showing similar faults. The faults may be false, or from another source (such as Insight Manager agents).
Rtry read
This entry should report all 0s.
ECC read
Note that this field shows one error. However, when comparing this with the Service time, it is not found to be an issue, as the drive has been in service for quite some time.
Rtry write
This field should report all 0s.
Seek errors
This field should report all 0s. An error here would indicate a definite hardware problem with the drive.
Spin time
The Factory spin time column should always be lower than the Threshold column.
Re-mapped
The number of sectors that needed to be remapped due to being bad. An increasing number of remapped sectors indicates that the drive should be replaced. Note that there is a threshold number, and it may vary with different types of drives.
Timeouts
These timeouts occur when the system tries to access the drive, and should rarely (if ever) happen. A relatively small number in relation to the service time of the drive is not a problem, but if the number continues to increase, or timeouts occur on some of the other drives, see whether errors are also listed in other places.
Rebuilds
This number will increment if:
   A drive fails, is removed and reinstalled, being rebuilt in the process.
   A failed drive is replaced. The new drive will be rebuilt and the counter incremented.
   A rebuild occurs on the specified drive ID.
Bus flts Double-digit bus faults are generally acceptable, but triple-digit ones are not.
Bus faults should have been an interesting statistic, but unfortunately system software (Insight Manager agents) is responsible for most of them. If you suspect real SCSI bus problems, compare this value for drives that have similar service times. If the value is the same for these drives, the bus faults were probably due to Insight Manager agents. If the values differ between drives with similar service times, then look at Bad tgt cnt (bad target count). A bad SCSI bus can also result in target (drive) selection errors that will show up there.
Hot plgs The hot-plug counter acts as a re-insertion counter. If a drive failed, was pulled out, and reinstalled, the counter would then increment. If a new drive were put in instead, its counter would not change.
Hrdw errors
This field should report all 0s. In the sample, two errors are shown. These errors could be either transient errors or actual hard errors. Those shown were possibly caused by heat problems or power fluctuations. However, as the number increases, so does the likelihood that the drive actually has hard errors.
Bad tgt cnt (bad target count)
This entry could be an indication of a SCSI bus signal integrity problem. However, we have some drives that we suspect should not be in the system due to thermal and (possibly) power supply considerations. It is possible that they could be a factor here as well. Compare this entry with the Bus flts entry. A bad SCSI bus can result in errors being shown in both the Bus flts and Bad tgt cnt fields.
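The comparison described under Bus flts and Bad tgt cnt can be sketched in Python. This is only an illustration of the reasoning, not an HP utility: the DriveStats record, the 10 percent tolerance used to decide that two service times are "similar", and the wording of the results are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DriveStats:
    drive_id: int
    service_minutes: int   # Factory service time, already converted from hex
    bus_faults: int        # Factory "Bus flts" value
    bad_target_count: int  # Factory "Bad tgt cnt" value


def similar_service_time(a, b, tolerance=0.10):
    """Treat two drives as peers if their service times differ by at most
    `tolerance` (10 percent is an assumed figure, not one from the guide)."""
    longest = max(a.service_minutes, b.service_minutes)
    if longest == 0:
        return True
    return abs(a.service_minutes - b.service_minutes) / longest <= tolerance


def assess_bus_faults(drive, all_drives):
    """Apply the guide's rule of thumb to one drive."""
    peers = [d for d in all_drives
             if d.drive_id != drive.drive_id and similar_service_time(drive, d)]
    if not peers:
        return "no peer drives with similar service time to compare against"
    if all(p.bus_faults == drive.bus_faults for p in peers):
        return "bus faults match peers: probably caused by Insight Manager agents"
    if drive.bad_target_count > 0:
        return "bus faults differ and bad target count is nonzero: suspect the SCSI bus"
    return "bus faults differ from peers: check Bad tgt cnt on the other drives"
```

For example, two drives of similar age that both show 12 bus faults would be reported as agent-related, while a third drive of the same age with 140 bus faults and a nonzero bad target count would be flagged for a closer look at the bus.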
Block(Vl)      Time      Op  Info
-----------    --------  --  ----
001d77e0(0)    000000dd  28  0000
001d77e0(0)    000000dd  28  0000
00000000(0)    000000dd  00  0000
001d77e0(0)    000000dd  28  0000
003d1be2(1)    0000012b  28  0000
003d9740(0)    0000012b  2a  0000
00000000(1)    0000012b  2a  0000
003dc0c0(1)    00000137  28  0000
00396dd7(1)    0000014f  28  0000
Total errors logged
Codes:
Error
SCSI stat
SCSI CAM
Sense key
Sense code/qual/block
Time
The time stamp for the error. Use this entry to compare the time stamp on other drives with the same or related errors to find causes that may be external to the drive (thermals, bus errors, and so on).
Op
Info
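The advice to compare time stamps across drives can be sketched as follows. This is a minimal illustration only: the dictionary layout, the drive labels, and the second drive's values are made up for the example, and the guide does not state the unit of the Time column, so the code compares the values without interpreting them.

```python
from collections import defaultdict


def shared_error_times(logs):
    """Return the time stamps at which more than one drive logged an error.

    `logs` maps a drive label to the list of hexadecimal Time values
    copied from that drive's error log.
    """
    drives_by_time = defaultdict(set)
    for drive, stamps in logs.items():
        for stamp in stamps:
            drives_by_time[int(stamp, 16)].add(drive)
    return {time: sorted(drives)
            for time, drives in drives_by_time.items()
            if len(drives) > 1}


logs = {
    # Time values taken from the sample error log above
    "port 1, ID 0": ["000000dd", "0000012b", "00000137", "0000014f"],
    # Hypothetical second drive, invented for the comparison
    "port 1, ID 1": ["0000012b", "00000200"],
}
print(shared_error_times(logs))
```

Errors that share a time stamp on several drives point toward a cause outside any single drive, which is the case the Time entry asks you to look for.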
Accelerator Status
ACCELERATOR STATUS:
Logical Drive Disable Map:     0xfffffffc
Read Cache Size:               28672 KBytes
Posted Write Size:             28672 KBytes
Disable Flag:                  0x00
Status:                        0x00000001
Disable Code:                  0x0000
Total Memory Size:             57344 KBytes
Battery Count:                 3
Battery Status:                0x0007
Parity Read Errors:            0000
Parity Write Errors:           0000
Error Log:                     N/A
Failed Batteries:              0x0000
Board Present:                 Yes
Accelerator Failure Map:       0x00000000
Max Error Log Entries:         16
NVRAM Load Status:             0x00
Memory Size Shift Factor:      0x00
Non Battery Backed Memory:     0 KBytes
Memory State:                  0x00
Read cache size
The amount of the controller's cache dedicated as a read cache. This value is configured in the Controller Settings - Accelerator Ratio in the Array Configuration Utility.
Posted write size
The amount of the controller's cache dedicated as a posted-write cache. It is configured in the same way as the read cache. The read and write caches are mutually exclusive.
Total memory size
Total memory (cache) on the controller.
Battery count
Number of batteries on the controller.
Battery status
Parity read/Parity write errors
Any errors accessing the cache will be indicated here.
Failed batteries (number of)
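Because the read and posted-write caches are mutually exclusive portions of the controller memory, the sizes in the sample can be cross-checked against the total. The sketch below does that arithmetic; the function names are illustrative, and the assumption that the two sizes always add up to Total Memory Size is taken from the sample values only.

```python
def parse_kbytes(text):
    """Convert a report value such as '28672 KBytes' to an integer number of KB."""
    number, unit = text.split()
    if unit != "KBytes":
        raise ValueError(f"unexpected unit: {unit}")
    return int(number)


def cache_split(read_cache, posted_write, total):
    """Return the (read, write) percentages, checking that the parts add up."""
    read_kb = parse_kbytes(read_cache)
    write_kb = parse_kbytes(posted_write)
    total_kb = parse_kbytes(total)
    if read_kb + write_kb != total_kb:
        raise ValueError("read and posted-write sizes do not add up to the total")
    read_pct = round(100 * read_kb / total_kb)
    return read_pct, 100 - read_pct


# Values from the ACCELERATOR STATUS sample
print(cache_split("28672 KBytes", "28672 KBytes", "57344 KBytes"))
```

With the sample values, 28672 KB plus 28672 KB equals the 57344 KB total, which corresponds to an even read/write split.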
00 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 00 00 40 ec 62 00 00 00 01 00 00 00 00
The entries here are self-explanatory and are mentioned for information-gathering purposes only. SCSI port x, drive ID y:
Vendor ID
Product ID
Product revision
Serial number
Drive capacity
Device supports (lists supported features)
Physical drive flags (such as SMART Enabled)
This portion of the ADU report gives general information concerning the device attached to the designated SCSI port:
Installed drive map
Hot plug counts (by drive ID)
Fan alert counts
Alarm status
SCSI device type
Ultra bus faults
ADU Reader
The ADU Reader website is accessed at http://stain.cca.cpqcorp.net/. The Reader menu provides you with a sample report that you can save to a file and then submit for analysis. The menu also has a Help feature that provides access to information about error conditions and report terminology. Another feature of this utility is the Drive Model decoder. Simply feed it the model number and you will get back a display of information about the drive including option and spare part numbers. To use the Reader for report analysis, direct it to the drive location containing the ADU text report previously generated and select Analyze Report. A sample analysis is displayed on the following slides.
Analyzing an ADU Report Controller Info
The first part of an ADU report provides User Information that includes the date and time of the report, the server model, and its system ROM version. Controller Information is the next section and contains information on the type of controller and the slot in which it is installed, as well as its firmware revision and the number of logical drives associated with it. The next section is the Controller Error Report. Here you would see a list of errors if any were found. In this example, none were detected.
Analyzing an ADU Report Drive Statistics
Drive Statistics provide information on a number of parameters including:
Service Time - The Factory service time is the number of minutes (in hexadecimal notation) that the drive has been in use. Use it to determine the age of the drive, as some faults may occur only after a certain number of hours. Also, compare this time with other drives showing similar faults. The faults may be false, or from another source (such as Insight Manager agents).
Read Blocks - Number of sectors read as requested.
Write Blocks - Number of sectors written to media.
Seeks - Number of seeks.
Spin Cycle - Number of spin-up cycles.
Spin time - Spin-up time in tenths of a second.
Re-mapped - The number of sectors that needed to be remapped due to being bad. An increasing number of remapped sectors indicates that the drive should be replaced. Note that there is a threshold number, and it may vary with different types of drives.
Rebuilds - This number will increment if:
   A drive fails, is removed and reinstalled, being rebuilt in the process.
   A failed drive is replaced. The new drive will be rebuilt and the counter incremented.
   A rebuild occurs on the specified drive ID.
Hot-plugged - The hot-plug counter acts as a re-insertion counter. If a drive failed, was pulled out, and reinstalled, the counter would then increment. If a new drive were put in instead, its counter would not change.
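Since the Service Time is reported as a hexadecimal count of minutes, it usually has to be converted before a drive's age can be judged. The short Python sketch below shows the arithmetic; the function name and the sample value 0007A120 are made up for illustration and do not come from the report shown in this module.

```python
def service_time(hex_minutes):
    """Convert a hexadecimal minute count to (total minutes, days, hours, minutes)."""
    total_minutes = int(hex_minutes, 16)
    total_hours, minutes = divmod(total_minutes, 60)
    days, hours = divmod(total_hours, 24)
    return total_minutes, days, hours, minutes


total, days, hours, minutes = service_time("0007A120")  # hypothetical value
print(f"{total} minutes = {days} days, {hours} hours, {minutes} minutes")
```

Converting every drive's Service Time the same way makes it easier to compare drives of similar age, as recommended above.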
Insight Manager
Next generation of HP management technology
Insight Manager 7 represents the next generation of HP management technology. It incorporates the strengths of Insight Manager (Win32) and Insight Manager XE, and delivers new functionality designed not only to help with fault diagnosis, performance management, and configuration management, but also to facilitate system software maintenance throughout the server life cycle.
Easy to set up and use
Insight Manager 7 is easy to set up and use. It may be installed on EVO or Deskpro systems running Microsoft Windows 2000 Professional or Windows XP (with Service Pack 1) or on ProLiant servers running Microsoft Windows NT 4 Server, Windows 2000 Server, Windows 2000 Advanced Server, Windows 2000 Professional, or Windows .NET Server 2003.
Accessible through Microsoft Internet Explorer
Insight Manager 7 is accessible through Microsoft Internet Explorer and provides seamless access to the HP Insight Management Agents, the Integrated Lights-Out, and the Remote Insight Lights-Out Edition. With Insight Manager 7, critical management information is available from any location accessible via a LAN, WAN, or secure remote connection, so systems administrators have the information and tools that they need when they need them.
Capable of managing a wide variety of systems
Insight Manager 7 is capable of managing a wide variety of systems. It manages HP servers, clusters, desktops, workstations, and portables. It also manages non-HP devices instrumented to the Simple Network Management Protocol (SNMP) or the Distributed Management Interface (DMI). Insight Manager 7 is the perfect management tool for customers with heterogeneous management needs.
Provides Secure Socket Layer (SSL) encryption
Finally, Insight Manager 7 provides Secure Socket Layer (SSL) encryption for data privacy as well as user administration and authentication integrated with local, NT domain, or Windows 2000 Active Directory accounts. It also makes extensive use of RSA Public Key technology to ensure that only authorized users can take advantage of sensitive and potentially data-destructive features. HP wants to ensure that powerful security goes hand in hand with powerful management functionality.
Insight Manager 7 Management Server The Insight Manager 7 Management Server sits at the center of the systems management architecture. It aggregates fault, asset, performance, and configuration data from all discovered systems attached to the network. It is also responsible for management tasks conducted against groups of servers, such as SNMP status polling, e-mail and paging notification, and system software update. Finally, Insight Manager 7 discovers and provides linkages to management applications that run at the agent layer. This enables users to access all management capabilities available to them from a single point of access.
Management Agents and Management Processors
Management Agents are applications that typically run on each server within the managed environment. Examples of management agents are the HP Insight Management Agents, the Survey Utility, the Version Control Agent, and the Version Control Repository Manager. Agents perform numerous functions including fault and performance management, configuration management, system software version control and update, policy-based fault recovery, and cluster management. A management processor provides hardware-based, remote management and administration capabilities for individual servers. The Remote Insight Lights-Out Edition is an example of a management processor. Through its graphical remote console, users may take full control of servers located in secure data centers or servers located in remote offices with no dedicated IT staff. The Remote Insight Lights-Out Edition also provides the ability to remotely power on and power off servers.
Web Browser User Interface
The Web browser serves as the primary user interface for all HP management products. HP has chosen to Web-enable its products in order to offer systems administrators flexibility and mobility. Because users are not tied to a specific management console, they are free to manage from any location with a LAN, WAN, or secure remote connection.
and updating HP system software. The software maintenance architecture allows customers to version manage HP system software based on internally established baselines. It also allows customers to distribute BIOS, driver, and management application updates to multiple servers through a single software update task.
Remote Insight Integration
Insight Manager 7 discovers all Remote Insight Lights-Out Edition (RILOE) boards, Remote Insight Lights-Out Edition II (RILOE II) boards, and Integrated Lights-Out (iLO) management processors running in the managed environment. Users may access RILOE, RILOE II, or iLO from the Insight Manager 7 home page by clicking on the status icon in the management processor column. RILOE provides functionality such as graphical remote console, Virtual Power, and Virtual Floppy Drive. To learn more about the Remote Insight Lights-Out Edition, visit the HP Management website at http://www.compaq.com/manage.
Blade Server Visual Locator
Insight Manager 7 provides blade server visualization that pinpoints the exact position of blade servers within their enclosure and rack. It also correlates alerts generated by shared infrastructure elements, and associates the Remote Insight Integrated Lights-Out and Integrated Administration management processors with the servers that they manage.
Queries and Tasks
Insight Manager 7 queries and tasks enable group management of HP servers and other devices connected to the network.
Queries are device or event groups based on user-defined criteria (for example, all servers, all important events, all servers running Windows 2000, etc.) Insight Manager 7 automatically updates all queries as new devices are added to the managed network and as new events are saved in the database. Tasks are operations, such as Software Update or SNMP Status Polling, performed against groups of managed devices. All tasks are based on queries and therefore self-updating. When a new device is added to the managed network it will automatically be added to the appropriate set of tasks. All tasks, including Group Configuration and System Software Update, may be scheduled to happen either immediately, periodically, or at some specified time in the future.
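The relationship between queries and tasks can be pictured with a small Python model. It is a conceptual sketch only and does not reflect how Insight Manager 7 is implemented: the Device, Query, and Task classes, their fields, and the sample device names are all assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Device:
    name: str
    kind: str              # for example "server" or "desktop"
    operating_system: str


@dataclass
class Query:
    name: str
    matches: Callable[[Device], bool]

    def run(self, devices):
        # Evaluated on demand, so devices added later are picked up automatically
        return [d for d in devices if self.matches(d)]


@dataclass
class Task:
    name: str
    query: Query
    action: Callable[[Device], str]

    def execute(self, devices):
        # A task always operates on the current result of its query
        return [self.action(d) for d in self.query.run(devices)]


managed = [Device("srv-a", "server", "Windows 2000"),
           Device("desk-1", "desktop", "Windows XP")]
all_servers = Query("All Servers", lambda d: d.kind == "server")
polling = Task("SNMP Status Polling", all_servers, lambda d: f"polled {d.name}")

print(polling.execute(managed))
managed.append(Device("srv-b", "server", "Windows NT 4"))
print(polling.execute(managed))
```

Because the task holds a reference to the query rather than a fixed device list, the second run includes the newly added server without any change to the task, which is the self-updating behavior described above.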
Group Configuration This feature enables administrators to change important configuration settings on groups of ProLiant servers. For example, users may use Group Configuration to change SNMP Settings, Management Agent passwords and security settings, and Version Control Agent settings across multiple devices.
E-mail and Paging Notification
Insight Manager 7 provides the ability to send both e-mail and paging notifications based on the receipt of a specified event or a change in device status. This gives systems administrators unrestricted mobility and removes the need for constant monitoring of a management console.
Cluster Monitor
The Cluster Monitor provides enhanced monitoring capabilities for Microsoft Cluster Server (MSCS), Tru64 UNIX, OpenVMS, SCO UnixWare 7, NonStop clustered servers running on ProLiant and AlphaServer systems. The Cluster Monitor navigation pane displays all discovered clusters and the data pane displays detailed information regarding CPU and disk utilization as well as environmental status on individual cluster nodes. Cluster Monitor will discover and link to the Intelligent Cluster Administrator to allow systems administrators to manage cluster policies, to take cluster resources on and off line, and to replicate cluster settings across multiple MSCS clusters.
Reporting
Insight Manager 7 provides inventory-reporting capabilities. Through a simple report creation wizard, you are able to display asset information across groups of servers. Asset information includes CPU, disk, memory, system, option boards, system software information, and operating system data. In addition to generating default reports, you can create customer-defined report configurations, edit report configurations, and delete report configurations. Insight Manager 7 allows exporting of inventory reports in CSV format for easy importing into most well-known reporting tools. You also have the option to save the source of reports and import the resulting text file into tools such as Microsoft Excel. All users with login access to Insight Manager 7 will have the ability to generate reports.
Language Support Insight Manager 7 may be installed on English, French, German, Spanish, and Japanese versions of Microsoft Windows 2000 Professional, Windows XP Professional, Microsoft Windows NT Server version 4, Windows .NET Server 2003 Standard and Enterprise Edition, Microsoft Windows 2000 Server and Microsoft Windows 2000 Advanced Server. Database support also extends to English, French, German, Spanish, and Japanese. Service Integration Insight Manager 7 integration with Intelligent Service Link software provides automatic, secure reporting of service events for systems under service contracts directly to HP Customer Support Centers or qualified service providers.
Device Status bar Click an underlined number link to view the devices with the associated status. The red, orange, and yellow color-coded status blocks indicate the general health of your network. Uncleared Events bar Click an underlined number link to view a list of uncleared events with Major, Minor, or Critical status. The red, orange, and yellow color-coded status blocks indicate the general health of your network.
Device Search
When the home page loads, the cursor is positioned in the Device Search field. Enter the name of the device that you would like to find. The Device Search feature allows you to quickly retrieve details about a device using its name. Click Search to locate the indicated device. If an exact match is found, the device page is displayed for that device. If an exact match is not found, the device page displays a list of devices in the database whose names closely resemble the target name. Each device name in this list is a hyperlink, and clicking a name brings up the device page for that device. If no devices in the database resemble the target device, the device page will indicate that the device was not found.
Note
The search field only allows the following characters to be entered: letters, numbers, tilde, dash, period, underscore, apostrophe, and space.
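The search behavior and the character restriction in the note can be sketched in Python. This is an approximation for illustration only: the guide does not say how Insight Manager 7 decides that names "closely resemble" each other, so the sketch substitutes the standard library's difflib matching, and it assumes that "letters" means the ASCII letters A to Z. The function and device names are hypothetical.

```python
import difflib
import re

# Letters, numbers, tilde, dash, period, underscore, apostrophe, and space
ALLOWED = re.compile(r"[A-Za-z0-9~\-._' ]+")


def search_devices(target, known_names):
    """Return ('exact' | 'similar' | 'not found', list of matching names)."""
    if not ALLOWED.fullmatch(target):
        raise ValueError("search text contains characters the field does not accept")
    if target in known_names:
        return "exact", [target]
    # difflib stands in for the product's unspecified resemblance test
    close = difflib.get_close_matches(target, known_names, n=5, cutoff=0.6)
    if close:
        return "similar", close
    return "not found", []


names = ["proliant-dl360-01", "proliant-dl380-02", "evo-desk-7"]
print(search_devices("proliant-dl360-01", names))
print(search_devices("proliant-dl360", names))
print(search_devices("zzz", names))
```

The three calls correspond to the three outcomes described above: an exact match, a list of similar names, and a device that is not found.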
Results from Query
The first time that you log in to Insight Manager 7, the Results from Query section displays the All Server query results. However, you may customize this section by clicking the Configure Me! link, which is located on the Results from Query bar. The Configure Me! link allows you to view only the devices or events in which you are interested. The query results include an Actions menu that allows you to create new queries and tasks, print the query results list, delete devices or events, create reports, ping devices, assign users to events, add comments to an event, or clear events. The query results window also includes: a View menu that allows you to choose between a Details view and an Icon view; the ability to sort the query results by column; and the ability to choose which columns you would like to view in the query results table. All columns can be dragged and dropped to any location in the Results from Query section.
Devices and Events
The Devices and Events box explains the difference between devices and events. This box contains a hyperlink to the Overview page, which displays Device Status and Uncleared Event Status. You can also reach the Overview page by clicking the Devices tab from the toolbar. Click the Reports hyperlink and the Reports page is displayed. From here you can Create/Run New Report or use an existing report. You may also reach the Reports page by selecting the Devices tab and then clicking Reports.
Queries
The Queries box provides an explanation of queries and provides separate links for devices, events, clusters, and favorite queries. When you click devices, events, or clusters, the Queries page is displayed for the link that you chose. You will be able to view your own personal queries along with the other queries that you have access to. When you click favorite queries, the Configure Folders page is displayed, listing your folders.
Tasks
The Tasks box provides a link to the Tasks page through the Task link. It also provides links to example tasks. When you click an example task, the Create/Edit Task page is displayed for the chosen task. The Tasks box is displayed only if you have operator or administrator rights.
Resource Center
The Resource Center box offers links to management-related websites at www.hp.com.
Administration
The Administration box allows you to fine-tune Insight Manager 7 for your environment. The links provided here are to the Automatic Discovery page, the Discovery Filter Configuration page, the Accounts page, and the Protocols page. You may also reach these pages by clicking the Settings icon from the toolbar. Additional messages may display in this section if you have not initiated Discovery. The Administration box is displayed only if you have administrator rights.
As you navigate through the console you will be using some of the most widely used features of Insight Manager 7. It provides access to a list of devices defined by a pre-defined or a custom query, and allows users the ability to search for devices. See the table below for more details.
Device Status
Expanded and collapsed lists in the menu frame
Larger edit fields for functions, such as IP Address Range
Help icon
Edit fields for entering settings, such as Retries and Timeout
Buttons that initiate an action, such as Execute Discovery Now
Submenus indicated by a plus/minus button next to the menu
The Discovery Process
Discovery is the process of finding and identifying a device at a specific address on the network (IP or IPX), and collecting information about that device. Insight Manager 7 discovers and identifies devices on your network and maintains a database of the information. You can run discovery at any time from the Automatic Discovery page and set your own schedule. You must visit this page at least once to set the initial range used for Discovery before the discovery process can begin.
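The idea of walking an address range and recording what is found at each address can be sketched in Python. This is a conceptual illustration, not the Insight Manager 7 discovery engine: it covers IPv4 only (not IPX), and the probe function is a hypothetical placeholder for whatever identification the real product performs, stubbed here with a fixed dictionary so that nothing is sent on the network.

```python
import ipaddress


def addresses_in_range(first, last):
    """List every IPv4 address from `first` to `last`, inclusive."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    if end < start:
        raise ValueError("range end is lower than range start")
    return [str(ipaddress.IPv4Address(n)) for n in range(int(start), int(end) + 1)]


def discover(first, last, probe):
    """Probe each address in the range and keep a record of what answered.

    `probe` takes an address and returns a description, or None when
    nothing was identified at that address.
    """
    database = {}
    for address in addresses_in_range(first, last):
        info = probe(address)
        if info is not None:
            database[address] = info
    return database


# Stub probe with made-up devices
fake_network = {"192.168.1.11": "ProLiant server", "192.168.1.13": "Deskpro desktop"}
print(discover("192.168.1.10", "192.168.1.14", fake_network.get))
```

Setting the initial range on the Automatic Discovery page corresponds to choosing the first and last addresses in this sketch.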
Remote Insight Lights-Out Edition (RILOE) Remote Insight Lights-Out Edition II (RILOE II) Integrated Lights Out (iLO)
RILOE and RILOE II are PCI-based options that provide remote server management capability. RILOE II has replaced RILOE and offers enhanced performance and greater functionality. Starting in early 2002, HP began to integrate Remote Insight Lights-Out capabilities into ProLiant servers. The ProLiant DL360 G2 was the first server to offer Integrated Lights-Out (iLO), the next generation of HP's technology integrated directly into the server architecture. Standard iLO provides a text-based interface to the customer as an integral part of the server at no extra charge. Advanced iLO has a graphical interface and is available to the customer through the purchase of a license key. This module will focus on the RILOE II and iLO products.
Up to 25 users
User administration features provide the capability to define up to 25 users with customizable access rights.
Supports .NET
Fully compatible with all ProLiant DL and ML servers and now supports Microsoft .NET (when available from vendor) and SuSE Linux operating systems.
Direct access to EMS console
Integration with Microsoft .NET allows access to the EMS console directly from the RILOE II user interface.
RILOE and RILOE II Differences
Processor speed
Remote Insight Lights-Out Edition II has an IBM 405GP PowerPC embedded processor running at 200 MHz for faster remote console performance.
User interface
Remote Insight Lights-Out Edition II features a new tab-based user interface for improved browser navigation.
Security
Remote Insight Lights-Out Edition II uses 128-bit encryption for the remote console for improved security.
Virtual functions
Remote Insight Lights-Out Edition II provides USB-based Virtual Floppy and Virtual CD functionality. This functionality is supported on ProLiant servers with the Remote Insight 30-pin connector, running a USB-supported operating system.
RILOE II Screens RILOE II/RILOE Login Screen The RILOE II Login Screen has a distinctly different appearance from that of the RILOE. In either case you will be prompted for a user name and password.
RILOE II Interface
Although the RILOE II provides the same functions as the RILOE with some additional enhancements, the user interface has a different look and feel. As you can see below, there are four tabs along the top for System Status, Remote Console, Virtual Devices, and Administration. Each of these tabs has a set of submenus that display along the left side of the screen when the tab is selected. The Status Summary screen is the first item on the System Status menu and is the first screen to be displayed after the user logs in. The remaining tabs and submenus provide functions similar to the RILOE.
Operational Overview During normal operation, the RILOE II passes the keyboard and mouse signals to the server and functions as the primary video controller. This configuration allows the following operations to occur:
Transparent substitution of a remote keyboard and mouse for the server keyboard and mouse
Saving of video captures of reset sequences and failure sequences in the RILOE II memory for later replay
Simultaneous transmission of video to the server monitor and to a Remote Console monitor
Accessing the RILOE II for the First Time
RILOE II is preconfigured with a default user name, password, and DNS name. A network settings tag with the preconfigured values is attached to the board. Use these values to access the board remotely from a network client using a standard Web browser.
IMPORTANT: For security reasons, HP recommends changing these default settings after accessing Remote Insight Lights-Out Edition II for the first time.
Default values:
User name: Administrator
Password: The last eight digits of the serial number
DNS name: RIBXXXXXXXXXXXX, where the 12 Xs are the MAC address of RILOE II
NOTE: User names and passwords are case sensitive.
After the default user name and password are verified, the Remote Insight Status Summary screen is displayed. The Remote Insight Status Summary provides general information about the RILOE II, such as the user currently logged on, server name and status, Remote Insight IP address and name, and latest log entry data. The summary home page also shows whether the RILOE II has been configured to use HP Web-based Management and Insight Management Web agents.
Insight Manager 7 link to RILOE
Insight Manager 7 discovers all Remote Insight Lights-Out Edition (RILOE) boards, Remote Insight Lights-Out Edition II (RILOE II) boards, and Integrated Lights-Out (iLO) management processors running in the managed environment. Users may access RILOE, RILOE II, or iLO from the Insight Manager 7 home page by clicking on the status icon in the management processor column. This integration provides the following capabilities:
Insight Manager 7's ability to automatically launch the Lights-Out Configuration Utility on multiple cards simplifies configuration of multiple Lights-Out ports.
Server administration is more efficient and centralized.
Combining Virtual Media features and the SmartStart Scripting Toolkit enables the remote deployment of servers in an unattended fashion.
The ProLiant Essentials Rapid Deployment Value Pack automates deploying and provisioning server software configurations through the RILOE interface.
Insight Agents provide system monitoring and pre-failure alerting through the RILOE II network interface.
Insight Agents on the remote server can be accessed directly from RILOE.
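Returning to the default values listed under Accessing the RILOE II for the First Time, the way the default DNS name and password are derived can be sketched in Python. The function names, the sample MAC address, and the sample serial number are made up for illustration, and writing the MAC digits in uppercase is an assumption. The network settings tag on the board remains the authoritative source for the real values.

```python
import re


def default_dns_name(mac_address):
    """Build the 'RIB' + 12-hex-digit name from a MAC address."""
    digits = re.sub(r"[:.\-]", "", mac_address)  # drop common separators
    if not re.fullmatch(r"[0-9A-Fa-f]{12}", digits):
        raise ValueError("a MAC address must contain exactly 12 hexadecimal digits")
    return "RIB" + digits.upper()  # uppercase is an assumption


def default_password(serial_number):
    """Return the last eight characters of the serial number."""
    if len(serial_number) < 8:
        raise ValueError("serial number is shorter than eight characters")
    return serial_number[-8:]


print(default_dns_name("00:50:8b:d3:2a:7f"))   # hypothetical MAC address
print(default_password("ABCD12345678"))        # hypothetical serial number
```

Because these defaults can be derived from information printed on the board, changing them after the first login, as HP recommends, matters for security.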
Insight Manager 7 displays RILOE II events
Insight Manager 7 provides options to manage the recovery options of remote servers. The recovery options of Insight Manager 7 also provide the status of RILOE II and access to the diagnostics on RILOE II. In addition to useful information about the RILOE II itself, the status screen provides network information and information about power cable status. Events that are recorded include system resets, ASR, system power loss, user logins to the RILOE II, and unsuccessful login attempts.
RILOE II Survey Report
The RILOE II provides features for proactive system management and efficient troubleshooting of server problems. In addition to the Remote Console, you have access to overall server status information, video replay of previous server resets, and other information gathered by the Survey utility.
RILOE II Global Settings
Session Timeout (minutes): Controls how long a session can remain inactive before the Remote Insight board forces the user to log in again. The default is 15 minutes, and the value can be set as high as 120 minutes.
ROM Configuration Utility (F8): Enables or disables the use of the F8 key during POST to access the Remote Insight ROM-Based Configuration Utility.
Emergency Management Services: Enables or disables the use of Windows Server 2003 EMS through the RILOE II.
Bypass reporting of external power cable: Enables or disables reporting by the RILOE II board to the operating system agent of whether the external power cable is connected.
Remote Console Port Configuration: Enables or disables configuring of the port address.
Remote Access with Pocket PC: Enables or disables access to the RILOE II from a Pocket PC.
Remote Console Data Encryption: Enables encryption of Remote Console data. If using a standard telnet client to access the RILOE II board, this setting must be Disabled.
SSL Encryption Strength: Allows you to set a 40-bit or 128-bit cipher strength. The most secure is 128-bit (High).
Current Cipher: Displays the encryption algorithm currently being used to protect data during transmission between the browser and the RILOE II.
Remote Insight HTTP Port: Allows you to change this setting, if required by your environment.
Remote Insight HTTPS Port: Allows you to change this setting, if required by your environment.
Remote Insight Remote Console Port: Allows you to change this setting, if required by your environment.
Host Keyboard: Enables or disables the host keyboard.
Level of Data Returned: Allows you to select the amount of data that is returned to an HTTP identification request from Insight Manager 7.
iLO/RILOE differences
iLO is embedded on the ProLiant server. RILOE is a PCI option card.
iLO integrates system board management and diagnostics functionality with Lights-Out technology. System board management is not part of the RILOE functionality.
iLO does not require any internal or external cables for its operation. RILOE does require internal or external cable(s) for operation.
iLO has Standard and Advanced features. RILOE features are all standard.
iLO Standard and Advanced features
iLO Standard Features
  Virtual text remote console
  Virtual power button control
  Dedicated LAN connectivity
  Automatic IP configuration via DHCP/DNS/WINS
  Industry-standard 128-bit SSL
  IML and iLO event logging
  Support for up to 12 user accounts
iLO Advanced Features
  Virtual graphic remote console
  Virtual floppy drive
iLO Status Summary
The iLO Status Summary provides general information about iLO, such as the user currently logged on, server name and status, iLO IP address and name, and latest log entry data. The Status Summary also shows whether iLO has been configured to use HP Web-Based Management and Insight Management Web agents.
iLO Integrated Management Log
The Integrated Management Log (IML) allows you to view logged remote server events. Logged events include all server-specific events recorded by the system health driver, including operating system information and ROM-based POST codes.
Server and iLO diagnostics
As an integrated management processor, iLO monitors the progress of the boot process of the server. The host server ROM writes Port 84 codes as it is booting. Integrated Lights-Out records and displays these codes. Selected POST codes are considered to be milestone codes and have a description associated with them. You may use these milestones and descriptions to determine how far the server progressed through the boot process.
HP uses non-volatile RAM to store server environment variable information. This information may be useful to HP engineers and advanced customers who have detailed knowledge of HP System Management architecture.
RILOE Features
Hardware-based graphical console
A hardware-based graphical console turns a standard browser on a management PC into a virtual desktop of the host server. This enables full control of the host server's display, keyboard, and mouse even if the server's operating system is not responding or the server is without power.
Browser support for Internet Explorer and Netscape
Browser support enables access to the Remote Insight Board through an integrated HTML Remote Insight menu. This menu resides in the firmware of the Remote Insight Board.
LAN access through onboard network interface card
LAN access enables access to the Remote Insight Board through the network. An integrated 10/100 Ethernet NIC on the Remote Insight Board supports TCP/IP, allowing you to access the Lights-Out Edition through the network without having to use a phone line. The NIC can auto-select between 10 Mb/s and 100 Mb/s.
Server failure alerting
Remote Insight detects when the server has lost power or has been reset by the Automatic Server Recovery (ASR) circuitry after the operating system has stopped
responding. Alerts can be sent to up to 12 management accounts through SNMP traps. SNMP is the network management protocol used by Insight Manager and other industry-standard network management applications.
Reset and failure sequence replay
Video text sequences stored on the Lights-Out Edition allow you to play back, pause, and replay server startup and shutdown sequences. These sequences include all system and operating system error messages and fatal error screens, such as NetWare Abend screens and Windows NT blue screens.
Remote reset
Remote reset enables you to initiate a cold reset from the management PC to bring the host server back on line when it is not responding. This type of restart does not shut down the server operating system gracefully but is useful in situations when the operating system is unresponsive.
Integration with Insight Manager
Integration with Insight Manager provides hardware-based asynchronous manageability, including access to Insight Management Agents, and support for full in-band SNMP management under key operating environments.
User administrator security
To ensure security, RILOE supports up to 12 users with customizable access rights and individual login names and passwords, implements MD5 encrypted password security, and provides event generation for invalid login attempts.
External power even when server is powered down
An external power connector ensures continuous power to the Remote Insight Lights-Out Edition, even during a server power failure or when the server is turned off.
Auto configuration of IP address via DNS/DHCP
RILOE provides automatic network configuration. It comes with a default name and a DHCP client that leases an IP address from the DNS/DHCP server on the network.
Survey
Using industry-standard browsers, RILOE users can access the Survey configuration file, providing the latest server configuration information to assist in the diagnostic process.
Virtual power button and virtual floppy drive
With the virtual power button, authenticated users can remotely turn the host server on or off using any standard browser interface. Virtual floppy drive allows an administrator to remotely reboot a server from a diskette inserted in a remote
management PC from anywhere on the network by capturing and transferring an image of a diskette over the network into the memory of the host server's RILOE. The board then redirects diskette read/write requests to diskette sectors in memory instead of the local diskette drive.
RILOE Option Kit
The option kit includes (clockwise starting on left):
External power adapter: Provides power to the Remote Insight Lights-Out Edition when the server power is off.
Power cord for the external power adapter.
Keyboard/mouse adapter cable: Allows local and remote use of a keyboard and mouse.
Virtual power button cables for ProLiant ML and DL servers: Allow remote control of the power switch on the host server.
Virtual power button cables for ProLiant 1850R and ProLiant 8000: Allow remote control of the power switch on the host server.
Network settings tag: Contains pre-configured values for default user name, password, and DNS name. HP recommends changing these default settings after accessing the RILOE board for the first time.
Note: PCI slot, cables, and video switch settings vary among servers. In some cases the cable ships with the server; in others, the cable from the RILOE option kit is used. Refer to table 2-1 in the Remote Insight Lights-Out Edition User Guide for the
data for your server. You can also find the information on the HP website as seen below for RILOE II.
Important! All servers support the keyboard/mouse external cable as well as the AC adapter. However, the default configuration always relies on having the internal cable connected so RILOE II can provide the virtual power buttons, Virtual Floppy, and Virtual Media USB applet. Whenever the 16- or 30-pin internal cables are used, the external cables should not be used. Frequently, customers try to use the external mouse/keyboard cables with the internal cables, causing conflicts with the mouse and keyboard functions.
RILOE LEDs, Switch and Connectors
The Remote Insight Lights-Out Edition external connectors are shown below:
[Figure: RILOE external connectors, labeled Video connector, Keyboard/mouse connector, and NIC LEDs]
Under normal conditions, the Remote Insight Lights-Out Edition takes power from the PCI slot and does not require an external source. However, attaching the board to a separate AC power source using the included AC adapter provides backup power should power within the server fail.
Video connector
The Remote Insight Lights-Out Edition video circuit becomes the primary video. Therefore, the host server monitor should be connected to the Remote Insight Lights-Out Edition.
Keyboard/mouse connector
To provide remote keyboard and mouse control, the keyboard and mouse signals must pass through the Remote Insight Lights-Out Edition.
LAN connector
The network connector provides a full-time 10 Mb/s or 100 Mb/s network connection to the host server. This provides a management PC with access to the host server without the need for separate telephone lines or modem sharing devices.
Two LEDs
Two LEDs are located on the RJ-45 connector to indicate network connectivity:
1. A green LED on the bottom right illuminates when a link is present from the Ethernet hub and flashes to indicate network traffic.
2. An amber LED on the top right illuminates to indicate a 100 Mb/s connection and is off to indicate a 10 Mb/s connection.
[Figure: RILOE board layout, labeled LEDs, J11, SW3, and J12]
Nine LEDs are located in the upper left corner of the Lights-Out Edition board (viewing it component side up). During the initial boot of the Remote Insight Lights-Out Edition, the LED indicators flash randomly. After the board is booted, LED 7 flashes once a second. If any combination of the LEDs illuminates after the initial boot, it indicates a hardware failure. Under this circumstance, try resetting the Remote Insight Lights-Out Edition.
SW3, a four-position DIP switch, is located near the right end of the board. It allows the user to enable and disable video and to put the Lights-Out Edition board into a flash recovery mode.
J11 is the virtual power button 16-pin connector. Refer to the appropriate server Setup and Installation Guide or Maintenance and Service Guide installation instructions.
J12 is the virtual power button 4-pin connector. Refer to the appropriate server Setup and Installation Guide or Maintenance and Service Guide installation instructions.
RILOE F8 Setup
When the server boots, you are given the option of configuring the RILOE board. Pressing F8 at the prompt displays a menu-driven interface from which you can change information related to the user, network, and so on. Exiting the menu returns you to the boot process.
RILOE Remote Console Login
When you set up the RILOE option, you specify the network address of the RILOE board along with the names and passwords for any users. Using the network address for the RILOE board from a web browser will invoke a login screen as shown here.
RILOE Home Page
When you log in to the RILOE, the home page displays information about the board and the server being managed. Notice that the Remote Console menu at the left includes a Remote Console (Frame) selection. It displays Remote Insight information within a frame, which allows you to maintain a view of the RILOE menu while observing the activity on the server display.
RILOE Remote Console Frame View
This is a view of the Microsoft Windows 2000 Advanced Server desktop as seen from the RILOE console in frame view. The Remote Console redirects the host server console to the remote client to provide the user with full video and keyboard access.
RILOE Status
In the Server Information menu there is a Status selection that displays the screen shown here. This provides status information about both the server and the Remote Insight board.
RILOE Global Settings
Global Settings is a choice under the Administration menu that allows the user to view and modify miscellaneous information, including security and keyboard settings. Among these is the ability to set a time-out limit for the RILOE session. For security reasons, RILOE automatically logs out the user if the session exceeds this period with no activity. Clicking the Refresh button on your browser brings up the dialog box that allows the user to log in again. If the Remote Console Port Configuration is set to Auto, the Remote Console Port is enabled only when a Remote Console Session is in progress.
RILOE Event Log
The Logs menu allows you to view either the Event log or the Integrated Management Log. A sample Event Log is shown here. Logged events include major server events, such as a server power outage or server reset, and Remote Insight events, such as a loose cable or unauthorized login attempt. User actions are also logged, such as server power on/off, power (reset) cycle, virtual floppy activity, and clearing of the event log.
RILOE Integrated Management Log
This slide provides an example of what you would see when invoking the Integrated Management Log from the Logs menu. Logged events include all server-specific events recorded by the health driver (OS information, ROM POST codes, and so on).
RILOE Reset Sequences
Reset Sequences provides the user with video replay capability of critical host server sequences. This includes the previous two boot sequences, with ROM POST messages and OS load information. The user can also view the video sequences leading up to the last host server reset, including any abend information generated by the OS.
Survey Utility
The Survey Utility is an online information gathering agent that runs on servers, gathering critical hardware and software information from various sources and saving it as a history of multiple sessions. It was developed to enable you to resolve problems without taking the server offline.
Survey Utility now includes a web browser interface that enables remote control of the utility and facilitates transfer of survey information from remote machines to a service provider. This online access is available from supported operating systems. Survey is also available from a tab on the SmartStart-based HP Insight Diagnostics menu.
The Survey Utility is an agent, similar to other HP management agents, and is supported on all ProLiant servers. In addition to its text output file, it gathers up to 10 configuration captures (or sessions) and can report on changes that have occurred to the system hardware or software over time.
Survey captures data as sessions, where a session is defined as an organized group of data describing the configured state of the system at a specific point in time. It will keep up to 10 distinct sessions, organized as 3 distinct types:
The original session (always session number 2) is the first session sampled, is treated as a master configuration, and will never be overwritten by the utility.
The checkpoint sessions (session numbers 3 to 10) are the next 8 samples that differ significantly from the previous session. They are maintained in a first in, first out (FIFO) fashion and may be deleted as the number of checkpoints increases. Checkpoints are generated only when something that would not change under normal operation of the server is changed; thus, not all items that change will generate checkpoints.
The active session (always session number 1) is the last information captured, and is overwritten each time a sample is taken.
The session information is maintained in a file called SURVEY.IDI in the same directory as the executable portion of the program. This file contains all of the binary information captured for every session and can be analyzed locally by the Survey Utility, or it can be sent to another location, such as a help center or HP, where the Survey Utility can generate custom reports on the information.
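The session bookkeeping described above can be sketched in Python as a toy model. This is an illustration only, not the real SURVEY.IDI format or the utility's logic: the class name, the `capture` method, and the `significant_change` flag (standing in for the utility's own decision about what counts as a significant difference) are all hypothetical.

```python
from collections import deque


class SurveyHistory:
    """Toy model of the original, checkpoint, and active sessions."""

    MAX_CHECKPOINTS = 8  # the text describes 8 checkpoint sessions

    def __init__(self):
        self.original = None  # first sample, never overwritten
        # A bounded deque gives first in, first out behavior:
        # appending to a full deque silently drops the oldest entry.
        self.checkpoints = deque(maxlen=self.MAX_CHECKPOINTS)
        self.active = None  # last sample, overwritten on every capture

    def capture(self, config, significant_change=False):
        """Record one sample; the caller decides what is 'significant'."""
        if self.original is None:
            self.original = config
        elif significant_change:
            self.checkpoints.append(config)
        self.active = config


history = SurveyHistory()
for sample in range(10):  # samples 0..9, each treated as a significant change
    history.capture(sample, significant_change=True)
```

After ten significant samples in this model, the original is still the first sample, the oldest checkpoint has been dropped to make room, and the active session holds the latest sample.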
Diagnosis of a server without shutting down the unit.
Remote server diagnosis, where the customer or field technician may send you the Survey files, or you may use Remote Insight to view the Survey files through your web browser.
Determining if changes to the server have caused a problem. For example, if the server was working correctly yesterday and has a problem today, you can generate a Survey file that compares the current configuration with the last known good one. Some examples of system changes that may contribute to system failures and that may be detected by using Survey are:
  Were any cards recently added?
  Has memory been added (or removed)?
  Have any services or devices stopped that were running in the last known good session?
  Was a ROM upgrade performed?
  Were any hard drives hot-plugged?
Accessing the Integrated Management Log (IML) on a server that does not have an Integrated Management Display (IMD). After running the Survey Utility, you can view the Integrated Management Log by loading the output of the utility (typically called SURVEY.TXT) into a text viewer such as Microsoft Notepad. The event list follows the system slot information. Once you have opened the file in a text viewer, you can print its contents using the print feature of the viewer.
These examples illustrate the types of failure isolation functions that can be performed with Survey.
Where does Survey reside?
The default location is C:\Compaq\Survey for NT. The default location for Novell is the system directory.
How do I know what sessions exist?
Run Survey -v from the DOS command prompt to get a listing as shown in the diagram above.
How do I view the IDI file?
Survey.idi cannot be read directly because it is in binary format. Survey uses this file to generate Survey.txt. See the command line options section for more details.
Excerpt from Survey report using survey -o10,9 fdifference to generate the report.
Explanation of Sample Survey Report
This Survey report is very short because two Survey sessions were compared and only the differences were listed, as specified in the command line parameters. By comparing two Survey sessions, one from when the server was operating properly and one from when it was not, it is possible to see what has changed that may have affected the unit.
In the sample, session 10 was the primary session, and it was compared with session 9 for differences. Thus, a plus beside an item marks its setting in session 10, and a minus marks its setting in session 9. Session 9 was created when the system was fully functional. Then a hot-plug hard drive was pulled from a storage system, simulating a drive failure, and session 10 was created. When the two sessions were compared, the Survey.txt file showed the drive information from session 9 with minus signs alongside. The minus means that these parameters have changed. In this case, the plus beside "empty" in the Storage Slot 1 field shows that the drive is no longer there. If the drive had been replaced, the new drive information would be shown with plus signs.
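The plus/minus convention explained above can be illustrated with a short Python sketch. It is not the Survey Utility's algorithm or report layout; the function name, the dictionary representation of a session, and the drive values are assumptions made for illustration.

```python
def compare_sessions(primary, compare):
    """List only the differences between two sessions.

    '+' marks a value from the primary session,
    '-' marks a value from the compare session.
    """
    lines = []
    for key in sorted(set(primary) | set(compare)):
        new, old = primary.get(key), compare.get(key)
        if new == old:
            continue  # unchanged items are left out of a difference report
        if old is not None:
            lines.append(f"- {key}: {old}")
        if new is not None:
            lines.append(f"+ {key}: {new}")
    return lines


# Hypothetical session contents: a drive is pulled between session 9 and 10.
session_9 = {"Storage Slot 1": "9.1 GB drive", "Storage Slot 2": "9.1 GB drive"}
session_10 = {"Storage Slot 1": "empty", "Storage Slot 2": "9.1 GB drive"}

report = compare_sessions(session_10, session_9)
for line in report:
    print(line)
```

Only Storage Slot 1 appears in the output, once with a minus (the drive recorded in session 9) and once with a plus (the "empty" value recorded in session 10), which mirrors how the sample report is read.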
Minimum browser requirements include support for tables, frames, Java, JavaScript and Java Development Kit. In addition, all of the following options must be enabled:
The following login accounts are available for use with the web-enabled Survey Utility:
Operator or Administrator access is necessary to capture a new configuration sample.
Pointing the Browser to the Device Home Page
Survey Utility allows you to view information from a web browser, either locally or remotely, using the following procedure:
1. Determine the address of the target machine for the Survey Utility information that you want to view:
   a. To view data locally, use the URL: http://localhost:2301/
   b. To view data remotely, use the URL: http://machine:2301/ where machine is the IP address or the computer name under DNS.
2. Enter the address in the Address field of the browser. The Insight Manager Web-Based Management Device Home page displays for that machine.
3. You may select the Login Account link to log in as another user. Use this option if you need to perform operations that require additional access rights, such as capturing a new configuration sample.
4. Select Survey Utility. The most recent Survey report displays.
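As a small illustration of the addressing rule in step 1 of the procedure above, the following Python sketch builds the Device Home page URL. The function and constant names are hypothetical; only the port number and URL shape come from the procedure, and the sample address is a documentation placeholder.

```python
WEB_AGENT_PORT = 2301  # port used in the URLs given in the procedure


def device_home_url(machine=None):
    """Return the Device Home page URL.

    With no argument the local machine is addressed; otherwise `machine`
    is an IP address or a computer name known to DNS.
    """
    host = machine if machine else "localhost"
    return f"http://{host}:{WEB_AGENT_PORT}/"


print(device_home_url())              # local view
print(device_home_url("192.0.2.10"))  # remote view, placeholder address
```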
Navigation
The default browser view for the Survey Utility contains the following three frames:
1. Title Frame
   Located in the upper left corner of the browser window
   Contains the following links:
      Help displays the Survey Utility User Guide.
      Report displays the Survey Utility Report in the data frame.
      Options displays the Options page for the Survey Utility.
      Device Home displays the device home page from where the Survey Utility was selected.
2. Navigation Frame
   Located below the Title Frame on the left side of the window
   Contains a tree applet that allows navigation of the Data frame on the right
3. Data Frame
   Located on the right side of the window
   Used for:
      Setting configuration options
      Displaying a Survey Utility Report
Using Survey Utility Options
The Options page of the web-enabled Survey Utility displays all captured Survey sessions and allows you to control the utility from a browser to perform the following functions:
  Select Different Configuration History Files
  Select Primary and Compare Session
  Download the Configuration History File
  Select a Report Type
  Generate a New Report
  Capture a New Configuration Sample
SCU diagnostics
On legacy systems (pre-ML/DL), the precursor to RBSU was the System Configuration Utility (SCU), which provided a menu-based interface to certain configuration and diagnostic functions. Following is a brief description of some of the diagnostic capabilities found in SCU.
Test Computer
TEST Menu
The Test Computer menu provides three types of diagnostics routines:
Quick Check Diagnostics: This option will run high-level diagnostics on all detected hardware in the system. However, any errors found with this test would most likely be hard failures and would have already been reported during POST. In such a case, running diagnostics would not be necessary as the POST code will detail the failure. (A listing of POST error codes can be found in Appendix G.)
Automatic Diagnostics: This option is most useful for burn-in testing of all devices. The continuous looping feature continues testing all devices until it is stopped by pressing CTRL+BREAK or, if the Stop on errors option is checked, until it finds an error with a device. Unattended testing without continuous looping will run all tests once and then stop.
Prompted Diagnostics: Use this option to test individual devices. This is most useful in troubleshooting intermittent errors, such as a high number of ECC memory errors or problems with a drive. It can also be used to verify that a particular component has failed and that its replacement is fully operational.
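The looping behavior described for Automatic Diagnostics can be sketched as a simple control loop. This is an illustrative Python model, not the SCU implementation: the function name and parameters are hypothetical, `max_passes` stands in for the operator pressing CTRL+BREAK, and applying Stop on errors to a single pass as well is an assumption.

```python
def run_automatic(tests, continuous=False, stop_on_errors=False, max_passes=None):
    """Run every test in order and collect (pass_number, test_name) failures.

    tests: list of (name, callable) pairs; each callable returns True on pass.
    """
    failures = []
    passes = 0
    while True:
        passes += 1
        for name, test in tests:
            if not test():
                failures.append((passes, name))
                if stop_on_errors:
                    return failures  # halt at the first failing device
        if not continuous:
            return failures  # unattended single pass: run once, then stop
        if max_passes is not None and passes >= max_passes:
            return failures  # simulated operator interrupt


# Hypothetical device test that starts failing on its third run,
# the kind of intermittent fault burn-in looping is meant to expose.
runs = {"count": 0}


def flaky_device():
    runs["count"] += 1
    return runs["count"] < 3


result = run_automatic([("flaky device", flaky_device)],
                       continuous=True, stop_on_errors=True)
```

In this model a single pass would have reported no errors, while continuous looping with Stop on errors halts on the third pass and records the failing device.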
Diagnostic Tests
Any of the three modes of Test Computer can perform the following tests:
Primary Processor Test: 100 Series Error Codes. Identifies failures
Memory Test: 200 Series Error Codes. The System Memory Test will
  Write, Read, Compare Test uses static patterns to exercise memory.
  Noise Test checks the integrity of data transfer through data lines.
  Random Data Pattern Test uses random data patterns to exercise
  Random Address Test uses random data patterns written to random
  Random Long Test uses four patterns to exercise long memory. It
Keyboard Test: 300 Series Error Codes.
Parallel Printer Test: 400 Series Error Codes.
Diskette Drive Test: 600 Series Error Codes.
SMART Array Controller Test: The following options are under the
  Drive Monitoring Diagnostic Test
  Controller Diagnostic Test verifies that the hard drive controller can
  Seek Test performs sequential seeks over the hard drive and then
  Read Test performs a random head seek test followed by a test of the
  Select All the Above Tests runs the Drive Monitoring Diagnostic Test,
  Surface Analysis performs multiple write/read/compares on each track
Serial Test: 1100 Series Error Codes.
Modem Communications Test: 1200 Series Error Codes.
Fixed Disk Drive Test: 1700 Series Error Codes.
Tape Drive Test: 1900 Series Error Codes.
Advanced VGA Board Test: 2400 Series Error Codes.
32-Bit DualSpeed NetFlex-2 Controller and 32-Bit DualSpeed Token Ring Controller Test: 6000 Series Error Codes.
SCSI Fixed Disk Drive Test: 6500 Series Error Codes.
CD-ROM Drive Test: 6600 Series Error Codes.
SCSI Tape Drive Test: 6700 Series Error Codes.
Server Manager/R Board Test: 7000 Series Error Codes.
Pointing Device Interface Test: 8600 Series Error Codes.
Network Controller(s) Test
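The error code series listed above lend themselves to a lookup table. The Python sketch below maps a numeric error code to its test, assuming that a code belongs to the series obtained by rounding it down to the nearest hundred (for example, 1701 falls in the 1700 series). That rounding rule and the function name are assumptions for illustration; the series numbers and test names are taken from the list.

```python
ERROR_SERIES = {
    100: "Primary Processor Test",
    200: "Memory Test",
    300: "Keyboard Test",
    400: "Parallel Printer Test",
    600: "Diskette Drive Test",
    1100: "Serial Test",
    1200: "Modem Communications Test",
    1700: "Fixed Disk Drive Test",
    1900: "Tape Drive Test",
    2400: "Advanced VGA Board Test",
    6000: "32-Bit DualSpeed NetFlex-2 Controller and "
          "32-Bit DualSpeed Token Ring Controller Test",
    6500: "SCSI Fixed Disk Drive Test",
    6600: "CD-ROM Drive Test",
    6700: "SCSI Tape Drive Test",
    7000: "Server Manager/R Board Test",
    8600: "Pointing Device Interface Test",
}


def diagnostic_for_code(code):
    """Return the test name for an error code, or None if the series is not listed."""
    series = (code // 100) * 100  # assumed rule: round down to the hundred
    return ERROR_SERIES.get(series)


print(diagnostic_for_code(201))
print(diagnostic_for_code(1701))
```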
Inspect
The Inspect menu contains the following items: System, ROM, Keyboard, System Ports, System Storage, Graphics, Memory, Operating System, System Files, Network, System Configuration, Server Health, Miscellaneous, Print, Save to File, Add Comments, and Exit Inspect.
ROM: Determines the ROM revisions of various components, including the system board.
System Storage: Gathers information concerning drives and mass storage controllers.
Memory: Identifies the number, type, and position of installed memory modules.
Network: Determines the I/O address, IRQ, speed, and MAC address of installed NICs.
Server Health: Displays the health logs, which include Standby Recovery Server status, Critical Error log, Correctable Memory Error log, and the Revisions table that details the system board and riser card revisions.
Test ASR
Use Test ASR to verify a new ASR configuration or troubleshoot an existing one. It causes the following to happen:
1. A test alert will be generated in the Server Health Log.
2. The system will be restarted.
3. The system ROM will check for bad memory.
4. Depending on the selections made in the ASR options menu, the following may occur:
   The pager number and message will be dialed.
   The modem will be set to auto-answer or will dial out to another computer.
5. The system will boot into the configuration partition (if installed) or will boot into the operating system.
6. Either remotely or from the host, you can run the Diagnostics utilities to view the Server Health Log.
Depending on the version of ASR, a successful reboot will generate a page.
Upgrade Firmware
The Upgrade Firmware option starts the ROMPaq Firmware Upgrade Utility, which allows you to upgrade the system and option ROMs in the server. ROM upgrades can fix many issues such as those that can arise when a new card with the latest firmware is installed in a server that has an older system ROM. The symptoms for these issues vary widely and range from solid failures to intermittent problems. It may not be readily apparent that the cause of an issue is ROM-related. Therefore, it is recommended that a system and its options all be upgraded to the latest ROM revisions when issues arise.
WARNING: Powering down a system or otherwise interrupting a ROM upgrade may result in inoperative system boards or components, and may require replacement of the failed part! Some options do have boot-block on the ROM, and in the event of an upgrade interruption or ROM corruption can be re-flashed by powering up the unit with a ROMPaq diskette inserted.
Remote Utilities
The Remote Utilities option allows connection to the computer either through a modem or through a network if enabled. The ASR feature must be configured before using this feature (see Automatic Server Recovery in the System Configuration section). Remote control will be handled through an ANSI terminal emulation program, such as ProComm or Windows HyperTerminal. Once the remote service session is established, the Remote Utilities menu options for uploading or downloading files to the server will be operational.
The IMD is a standard feature on the current high-end servers (except the 6400R).
  64-character backlit LCD (16 characters x 4 rows)
  Menu driven
  Four user navigation buttons
  Allows F1 POST entry
  Auxiliary power supply
  Displays POST, system alerts, fan failures, and user information
  Continuous event wrap
  Common design throughout high-end servers
The IMD is off unless AC power is applied to the power supply and +5V AUX is on. When it powers up, the IMD displays:
COMPAQ LCD MODEL #56022 LCD FIRMWARE 1.9
Initialization
On systems that have an ON/STANDBY switch, when AC power is first applied, the buttons have no effect on the display until the system is powered on. Once the system is powered on, the IMD starts its initialization sequence. The LCD screen clears, it displays the model number and the LCD firmware revision, and then the MAIN MENU appears. The initialization sequence begins displaying:
  System Initialization
  EISA Initialization
  PCI Auto Cfg.
  Processors
  Video
  Memory Test
  Cache Test
  Memory Initialization
  Drive Arrays
  Floppy Drive
  Option ROMs
  SCSI Devices
  F10 Prompt
A rotating line appears by each of the above prompts to indicate that section of POST is being executed. When POST completes that section, the rotating line is replaced by a check mark. When the system is powered down, the display indicates System Powered OFF.
Displaying Events
The following is an example of how an event is displayed on the Integrated Management Display:
  **001 of 010**
  --CAUTION--
  03/19/1997 12:54 PM
  FAN INSERTED
  Main System
  Location: System Board
  Fan ID: 03
  **END OF EVENT**
Advantages of IMD
Because you can access the IMD directly from the server instead of going to the management console, you can achieve a greater level of system uptime and serviceability. For example, if your data center consists of racks of HP servers and Insight Manager on the management console has notified you that one of the servers is down, the IMD can display the user-defined server name, making it easier for you to identify from among all of the data center servers, the server that has gone down. The IMD can also store and display information about the system administrator who services the server. To customize how the IMD displays information, you can set user-definable options without taking the server offline. You can also add custom menu items that provide a more comprehensive status of the server. For example, if you wish to keep track of when the server was last serviced, you can enter this information and later have access to it through the IMD. These functions are currently supported only under Windows NT.
  View the IML of either local or remote systems
  Obtain a single historical record of recent system events and errors for post-diagnosis review
  View detailed system event information in a readable format
  Save an IML as a binary file so that users can view the saved IML file at a later date or possibly even at a different location
  Filter or sort IML entries to find specific information quickly
  Save the IML to a comma-separated file for viewing at a later date using a third-party application, such as a spreadsheet program
  Print out a hard copy of the IML
Install the IML viewer from the SSD for either NT or Novell (CPQIML.NLM).
Severity Levels in the IML
Severity levels displayed in the Integrated Management Log are as follows:
Level            Description
Informational    A comprehensive chronicle of past hardware or software system events. This type of event requires no action by the administrator.
Repaired         An action has taken place to fix this system event and the user marked this event as being repaired.
Caution          A non-critical system error has occurred and may or may not require action by the administrator. However, it is recommended to take action if possible, then mark the event as repaired.
Critical         A system component on the unit has failed and requires action by the administrator. Replace the system component, then mark the event as repaired.
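The filtering and comma-separated export described above can be combined outside the viewer. The following is a minimal sketch, not the viewer's own behavior: it assumes an exported file whose header row contains a column named Severity holding the level names from the table. The column names, the sample rows, and the treatment of the table order as increasing urgency are all assumptions made for illustration; check the header of an actual export before relying on them.

```python
import csv
import io

# Severity names from the table above. Treating this order as
# least-to-most urgent is an assumption made for this sketch.
SEVERITY_ORDER = ["Informational", "Repaired", "Caution", "Critical"]


def load_rows(text):
    """Parse comma-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_by_severity(rows, minimum="Caution", column="Severity"):
    """Return rows at or above `minimum`, most severe first.

    Rows with a missing or unrecognized severity value are skipped.
    """
    threshold = SEVERITY_ORDER.index(minimum)
    kept = [
        row for row in rows
        if row.get(column) in SEVERITY_ORDER
        and SEVERITY_ORDER.index(row[column]) >= threshold
    ]
    return sorted(kept, key=lambda row: SEVERITY_ORDER.index(row[column]),
                  reverse=True)


# Hypothetical sample data; a real export may use different columns.
SAMPLE = """Severity,Class,Description
Informational,System,Example informational entry
Caution,Environment,Example caution entry
Critical,Power,Example critical entry
Repaired,Memory,Example repaired entry
"""

urgent = filter_by_severity(load_rows(SAMPLE))
for row in urgent:
    print(row["Severity"], "-", row["Description"])
```

With a real export, the text would be read from the saved file instead of the SAMPLE string, and the entries that still need administrator action (Caution and Critical) would be listed first.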
Learning Check
1. What four tabs are found on the initial SmartStart screen?
_____________________________________________________________
_____________________________________________________________

2. What six pieces of information about the host server are displayed on the SmartStart Home screen?
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________

3. What tab on the Server Diagnostics menu would enable you to determine the location and size of installed memory?
_____________________________________________________________

4. What HP Insight Diagnostics screen provides a list of errors detected during POST?
_____________________________________________________________

5. What three components are erased by the Erase utility?
_____________________________________________________________
_____________________________________________________________

6. After saving an Array Diagnostic Utility (ADU) report, what web-based utility could you use to format and summarize it?
_____________________________________________________________
7. If SCSI bus fault values for drives with similar service times are the same, what is the likely cause?
_____________________________________________________________

8. What feature provides the means to configure the type of systems Insight Manager 7 will discover?
_____________________________________________________________

9. What iLO Advanced features are not found in iLO Standard?
_____________________________________________________________
_____________________________________________________________
10. What key is used during POST to access the Remote Insight ROM-Based Configuration Utility?
_____________________________________________________________

11. RILOE II is preconfigured with a default user name, password, and DNS name. Where are these found?
_____________________________________________________________

12. What iLO screen can be used to determine how far a server progressed during the boot process before failing?
_____________________________________________________________