SEMATECH
Technology Transfer 92031014A-GEN
SEMATECH and the SEMATECH logo are registered service marks of SEMATECH, Inc.
Product names and company names used in this publication are for identification purposes only and may be trademarks or service marks of their respective companies.
May 5, 1992
Abstract:
This guideline was developed by a task force composed of reliability experts and users of reliability methodologies from the SEMI/SEMATECH member companies. The document was written to address the needs of semiconductor equipment manufacturers and their customers. It includes a description of the principles of a cost-effective reliability program, instructions on how to get started, and details on what needs to be done. A large portion of the document is dedicated to analysis and testing methodologies. The analysis methodologies include Failure Modes and Effects Analysis (FMEA), Fault Tree Analysis (FTA), Component Failure Analysis (CFA), and Human Reliability Analysis (HRA). The testing methodologies include Reliability Testing, Component Testing, Accelerated Testing (Sudden Death and Step-Stress Testing), Burn-in Testing, Life Testing, Environmental Stress Screening, Qualification Testing, and Acceptance Testing.
Keywords: Life Cycle Phases, Reliability Testing, RAMP, Failure, FRACAS, Failure Modes and Effects Analysis, Quality Function Deployment (QFD), Design of Experiment, Cost of Ownership, Infant Mortality, Reliability Qualification Testing (RQT), Taguchi, User Groups, Reliability Block Diagram Modeling (RBD), Environmental Stress Screening (ESS), Fault Tree Analysis (FTA)
Authors: Dhudshia, Vallabh
Approvals:
Vallabh Dhudshia, Project Manager & Author
Keith Erickson, Director
Dan McGowan, Technical Information Transfer Team Leader
Table of Contents
1 SUMMARY
2 THE RELIABILITY IMPROVEMENT PROCESS AND EQUIPMENT LIFE CYCLE
2.1 Introduction
2.2 The Equipment Life Cycle
2.3 Life Cycle Phases
2.4 Life Cycle Cost
2.5 The Reliability Improvement Process
2.6 Applying the Reliability Improvement Process
2.7 Summary
2.8 References
3 IMPLEMENTATION OF THE RELIABILITY IMPROVEMENT PROCESS
3.1 Introduction
3.2 Management's Role
3.3 Applying the Reliability Improvement Process
3.4 Specific Applications of the Reliability Improvement Process
3.4.1 Starting with Equipment in the Design Phase
3.4.2 Starting with Equipment in the Prototype Phase
3.4.3 Starting with Equipment in the Pilot Production Phase
3.4.4 Starting with Equipment in the Production and Operation Phase
3.4.5 Starting with Equipment in the Phase Out Phase
3.5 Functional Responsibilities
3.6 Where to Begin
3.7 Reliability Plans
3.8 Application of Resources and Communicating Value
3.9 Summary
3.10 References
4 ACTIVITIES AND TOOLS IN THE RELIABILITY IMPROVEMENT PROCESS
4.1 Introduction
4.2 Reliability Activities
List of Figures
Figure 2-1. Percent of Total Life Cycle Costs vs Locked-in Costs
Figure 2-2. Impact of a reliability program on life cycle cost
Figure 2-3. Optimizing Life Cycle Costs
Figure 2-4. Decrease in Life Cycle Costs in New Generations of Equipment
Figure 2-5. The Reliability Improvement Process
Figure 2-6. Application of Reliability Improvement Process
Figure 3-1. Multiple Equipment and Their Life Cycle Phase Status
Figure 4-1. A Block Model Developed in RAMP for the SETEC Generic Wafer Handler System
Figure 4-2. An Estimate of the Cumulative Distribution Function for MTBF
Figure 4-3. A Pareto Diagram for Component Contribution to System Failure
Figure 4-4. A Revised Block Diagram for the SETEC Generic Wafer Handler System, showing the Addition of the Redundant Wafer Sensor
Figure 4-5. An Estimate of the Cumulative Distribution Function for MTBF after Modifying the Generic Wafer Handler System
Figure 4-6. A Pareto Diagram for Component Contribution to System Failure after Modifying the Generic Wafer System
Further analysis reveals that C2 fails if parts 1 and 2 (P1 and P2) fail. C4 fails if parts 3 or 4 (P3 or P4) fail. The block diagram model now looks like:
List of Tables
Table 3-1. Reliability Improvement Process Applied at Six Different Starting Points
Table 3-2. Reliability Improvement Process Activities
Table 3-3. Reliability Improvement Process Activities for the Design Phase
Table 3-4. Reliability Improvement Process Activities for the Prototype Phase
Table 3-5. Reliability Improvement Process Activities for the Pilot Production Phase
Table 3-6. Reliability Improvement Process Activities for the Production and Operation Phase
Table 3-7. Reliability Improvement Process Activities for the Phase-Out Phase
Table 3-8. Design Phase Reliability Improvement Process Activities
Table 3-9. Prototype Phase Reliability Improvement Process Activities
Table 3-10. Pilot Production Phase Reliability Improvement Process Activities When Initiated in Pilot Production Phase
Table 3-11. Production and Operation Phase Reliability Improvement Process Activities When Initiated in Production and Operation Phase
Table 3-12. Phase Out Phase Reliability Improvement Process Activities When Initiated in Phase-Out Phase
Table 3-13. Current Product Line Status
Acknowledgements
To assist in the development of these guidelines, a task force of representatives from the semiconductor industry was assembled to provide guidance on structure and content. Their contributions and dedication to this effort have been excellent and beyond the call of duty. Our thanks go to each of the task force members, reviewers, and contributors for their commitment to such an ambitious effort, which made the development of these guidelines both possible and more enjoyable.

TASK FORCE MEMBERS
Sandia National Labs. - SETEC: Wallis Cramond, Dennis Huffman
SEMATECH: Dr. Vallabh H. Dhudshia, Texas Instruments, Inc.; David Seekon, National Semiconductor Corp.; Mario Villacourt
SEMATECH Member Companies: Denny Johnson, International Business Machines (IBM); Karl Koch, Digital Equipment Corp. (DEC); John O'Reilly, DEC; Richard Talbot, IBM; Larry Waite, National Semiconductor Corp.; Chuck Woodard, IBM
SEMI/SEMATECH: Dr. Michael McGraw
SEMI/SEMATECH Member Companies: Ron Dornseif, Genus; Jack Olivieri, MKS Instruments; Dr. Ralph Dudley, Applied Materials
Dr. Robert Cranwell, Dr. Ron Iman, Dr. Irving Hall, Teresa Sype
REVIEWERS and CONTRIBUTORS
Samuel Becktel, Genus
Richard E. Howard, Luxtron Products
Dr. Samuel Keene, IBM
Richard Gerstner, SEMATECH
David Troness, Intel
Sue Howell, SEMI/SEMATECH
Dennis R. Hoffman, TI
Bob Holmstrom, ATEQ Corp.
Dr. David J. Klinger, AT&T
Dr. Jerry Brandwie, RI
Dr. Richard Prairie, SETEC
Debra Vogler, Varian Associates
The SEMATECH Perspective
Statement from Bill Spencer, CEO of SEMATECH:
Today's competitive environment demands an increasing level of reliability in semiconductor manufacturing equipment. The industry has made great strides in the last four years in improving reliability. In fact, VLSI Research reports that in its annual customer survey, reliability has fallen to sixth place on the list of biggest problems, after being number one for 10 years. VLSI is quick to give SEMATECH credit for much of the improvement. And while the existence of SEMATECH was a key element, the supplier industry should receive added praise for stepping up and solving a major problem. But, as with so much of this business today, reliability is a race without an end. And the formula for improved reliability is to build it into every stage of development. This Reliability Guideline will assist in development of a program to ensure consideration of reliability factors at every stage of product development from inception through qualification. The Guideline was developed by a task force comprised of reliability experts and users of reliability methodologies from the SEMI/SEMATECH member companies. As a result, it offers best-of-breed concepts and is written to meet the needs of semiconductor equipment manufacturers and their customers. I'm sure it will prove an excellent tool.
Preface
These guidelines have been written for use by semiconductor equipment suppliers and customers. They are intended as a road map that these groups can refer to for assistance in improving the reliability of their semiconductor manufacturing equipment, as part of a long-term strategy aimed at regaining worldwide market share. Although an abundance of reliability information is available in textbooks, military handbooks and standards, and guidebooks directed at specific products, no concise, single-source document exists for the semiconductor equipment industry. The purpose of these guidelines is to fill this gap. To assist in this effort, a task force consisting of representatives from the semiconductor industry was assembled to provide guidance on the structure and content of these guidelines. The guidelines do not provide comprehensive instruction on the details of reliability engineering; rather, they provide a description of the principles of a cost-effective reliability program, instructions on how to get started, and details on what needs to be done. Descriptions of necessary program activities and reliability concepts are provided along with references for those who desire additional information. The focus of the guidelines is on hardware reliability. Software reliability is also an important aspect of reliability for a large segment of semiconductor manufacturing equipment, but because other guidelines address it, the topic is discussed only briefly here.
The guidelines:
• Are intended to be of value to managers, reliability engineers, and designers
• Are not a "detailed how-to" document, but rather a "roadmap of how to"
• Are centered around a continuous improvement process referred to as the Reliability Improvement Process
• Cover the entire equipment life cycle as it applies to the semiconductor equipment industry
Even though emphasis is placed on designing in reliability, the guidelines show how to incorporate reliability into every phase of the equipment life cycle.
The guidelines are broken into three sections:
Section 2.0, The Reliability Improvement Process and Equipment Life Cycle, describes the Reliability Improvement Process and the Equipment Life Cycle. Life cycle phases are defined and discussed, as well as life cycle costs. The five steps of the Reliability Improvement Process are defined and discussed.
Section 3.0, Implementation of the Reliability Improvement Process, describes the activities involved in applying each step of the Reliability Improvement Process to each phase of the Equipment Life Cycle. Its activity tables show how to apply the Reliability Improvement Process continuously throughout the entire life cycle. The section also discusses the activities involved when the Reliability Improvement Process is first applied during a later phase of the life cycle.
Section 4.0, Activities and Tools in the Reliability Improvement Process, provides a description of the activities and tools that are part of the Reliability Improvement Process. Activities are grouped under engineering, data, and testing. Specific tools used in the application of certain activities are also discussed. Section 4.0 is meant to provide more information and guidance on activities and tools used in the application of the Reliability Improvement Process.
1 SUMMARY
These guidelines focus on a continuous improvement process, referred to as the Reliability Improvement Process, and on the Equipment Life Cycle. These two concepts are introduced and discussed in Section 2.0 of the guidelines.

Knowledge of the equipment life cycle is important because it provides a basis for understanding how and where reliability engineering enters into the process of designing, producing, and operating the equipment. In this document, the life cycle has been broken into six distinct phases, each representing a unique portion of the life cycle. These six life cycle phases are:
1. Concept and Feasibility Phase
2. Design Phase
3. Prototype (alpha-site) Phase
4. Pilot Production (beta-site) Phase
5. Production and Operation Phase
6. Phase-out Phase
These phases provide the framework for tracking reliability improvement throughout the equipment life cycle and guidance on when and where to apply resources. Life cycle cost concepts are introduced to help explain the impact on expenditures and cost of ownership when reliability improvement is initiated at different phases of the life cycle.

The Reliability Improvement Process provides a means for systematically improving reliability throughout the equipment life cycle. It is an iterative process of setting goals, evaluating, comparing, and improving, directed toward continuous reliability improvement. It consists of five basic steps:
1. Establish reliability goals and requirements for the equipment
2. Apply reliability engineering or improvement activities, as needed
3. Conduct an evaluation of the equipment or equipment design
4. Compare the results of the evaluation to the goals and requirements, and decide on the next step
5. Identify problems and root causes
The process then returns to Step 2 and repeats Steps 2 through 5 until the goals and requirements are met.
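The loop formed by Steps 2 through 5 can be sketched in a few lines of Python. This is a minimal illustration, not a method prescribed by these guidelines: it assumes that a single numeric reliability measure (here a hypothetical MTBF in hours) stands in for the goals and requirements, and the callbacks `evaluate`, `improve`, and `find_root_cause` are hypothetical placeholders for real engineering, data, and testing activities. The `max_iterations` cap is also an assumption, standing in for a management decision in Step 4 to stop and re-plan.

```python
from typing import Callable, List


def reliability_improvement_process(
    goal: float,
    evaluate: Callable[[], float],
    improve: Callable[[str], None],
    find_root_cause: Callable[[], str],
    max_iterations: int = 10,
) -> List[float]:
    """Run Steps 2-5 of the Reliability Improvement Process.

    Step 1 (establish goals and requirements) is represented by `goal`.
    Returns the result of every evaluation, in order.
    """
    history: List[float] = []
    focus = "initial activities"  # what Step 2 works on in the first pass
    for _ in range(max_iterations):
        improve(focus)             # Step 2: apply reliability activities as needed
        result = evaluate()        # Step 3: evaluate the equipment or its design
        history.append(result)
        if result >= goal:         # Step 4: compare with the goal and decide
            break
        focus = find_root_cause()  # Step 5: identify problems and root causes
    return history


# Usage with hypothetical numbers: each pass adds 150 h to a 200 h MTBF.
state = {"mtbf_hours": 200.0}


def evaluate() -> float:
    return state["mtbf_hours"]


def improve(focus: str) -> None:
    state["mtbf_hours"] += 150.0


history = reliability_improvement_process(
    goal=500.0,
    evaluate=evaluate,
    improve=improve,
    find_root_cause=lambda: "hypothetical root cause",
)
print(history)  # [350.0, 500.0]
```

The function returns the full evaluation history rather than only the final value, so the trend of each pass through the loop stays visible for later review.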
The role of management in implementing the Reliability Improvement Process is introduced in Section 3.0. Management has responsibilities in establishing and implementing the Reliability Improvement Process, including establishing the right environment and choosing individuals to champion the effort. Section 3.0 provides details on preparing for and implementing the Reliability Improvement Process, including a discussion of the various activities associated with each step of the Reliability Improvement Process and each phase of the life cycle. The Reliability Improvement Process can be used for a piece of equipment regardless of its placement in the life cycle. The discussion in Section 3.0 includes information on how to select equipment for initiating reliability improvement, the importance of data, and the choice of activities when resources are limited.
Activities and tools used in applying the Reliability Improvement Process are discussed in more detail in Section 4.0. Three types of activities are listed: engineering, data-related, and testing. Many of the activities require tools for implementation. These tools come from various disciplines, such as probability and statistics and reliability engineering. References that have detailed information on each tool or activity are provided at the end of each activity in Section 4.0.

2 THE RELIABILITY IMPROVEMENT PROCESS AND EQUIPMENT LIFE CYCLE
2.1 Introduction
The reliability improvement process and the equipment life cycle form the basis for these guidelines and are introduced in this section. The reliability improvement process is an iterative process that provides:
• An effective and systematic way to include reliability in equipment design
• A structure for making reliability improvements throughout the equipment life cycle
The reliability improvement process provides a means for making revolutionary advancements when it is applied to equipment early in the design stage or during major design upgrades, and for making evolutionary improvements to existing equipment. Knowledge of the equipment life cycle is important because it provides:
• The framework for applying the reliability improvement process
• A basis for understanding the best practice for improving equipment reliability and the cost of the improvement
Life cycle costs are introduced in this section to provide a perspective on the impact of initiating the reliability improvement process early in the equipment life cycle. A thorough knowledge of life cycle costs and life cycle phase relationships helps to achieve better equipment at lower total cost.

2.2 The Equipment Life Cycle
The equipment life cycle begins when the idea for the equipment is conceived and ends when the equipment is no longer useful. The life cycle consists of phases that describe the state of design, process of development, and production of the equipment. A working knowledge of these phases enables proper planning and execution of the activities and functions necessary for designing, manufacturing, and operating reliable equipment in a cost-effective manner.

2.3 Life Cycle Phases
In this document, the life cycle has been divided into the six phases listed below. As indicated, these six phases can be grouped under three macro phases. The three macro phases are sometimes used in place of the six phases for illustrative purposes; this in no way impacts the concepts and methodology presented.
1. Concept and Feasibility
2. Design
3. Prototype (alpha (α)-site)
4. Pilot Production (beta (β)-site)
5. Production and Operation
6. Phase-out
[Figure: the six life cycle phases grouped into three macro phases]
A discussion of each of the six life cycle phases follows.

1. Concept and Feasibility. The life cycle begins with this phase: the need for new equipment is identified and alternative approaches to fulfilling that need are explored. The need for new equipment may be based on existing equipment that can no longer perform its intended function or on customer requirements for which the necessary equipment does not exist.
During this phase, marketing and sales personnel, customer service representatives, design and reliability engineers, and manufacturing engineers work together with the customer to:
• Determine the need for new equipment
• Establish reliability goals
• Evaluate the feasibility of meeting these goals
• Estimate resource requirements
• Examine alternative design concepts
• Select those concepts to be studied in more detail during the design phase
• Estimate cost trade-offs
The concept and feasibility phase, and the design phase that follows, are the optimal times for using design-for-reliability practices.

2. Design. The alternative design concepts selected during the concept and feasibility phase are explored in more detail by the design engineers during this phase of the life cycle. A design disclosure package is prepared and evaluated by all concerned parties. Design engineers generally call on reliability and manufacturing engineers, as well as quality assurance and field service personnel, for input concerning parts selection, components, serviceability, and manufacturing processes. Also, reliability goals set for the equipment during the concept and feasibility phase are translated into requirements very early in the design phase. These requirements are used to make preliminary reliability allocations to subsystems and components and to understand cost impacts. This phase of the life cycle can be separated into two parts: preliminary design and final design.
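A preliminary reliability allocation can be illustrated with a short Python sketch. The guidelines do not prescribe an allocation method; this example uses proportional (ARINC-style) apportionment, one common approach, and assumes a series system with constant failure rates, so that the system failure rate is the sum of the subsystem failure rates and MTBF is the reciprocal of the failure rate. The subsystem names and predicted failure rates are hypothetical.

```python
from typing import Dict


def allocate_mtbf(
    system_mtbf_goal: float, predicted_failure_rates: Dict[str, float]
) -> Dict[str, float]:
    """Apportion a system MTBF goal to subsystems in proportion to their
    predicted failure rates (ARINC-style apportionment).

    Assumes a series system with constant failure rates, so the system
    failure rate is the sum of subsystem failure rates and MTBF = 1 / rate.
    """
    system_rate_budget = 1.0 / system_mtbf_goal  # allowed failures per hour
    total_predicted = sum(predicted_failure_rates.values())
    allocated_mtbf: Dict[str, float] = {}
    for name, rate in predicted_failure_rates.items():
        share = rate / total_predicted           # fraction of the rate budget
        allocated_mtbf[name] = 1.0 / (share * system_rate_budget)
    return allocated_mtbf


# Hypothetical subsystems and predicted failure rates (failures per hour).
allocation = allocate_mtbf(
    500.0,
    {"wafer handler": 0.002, "vacuum system": 0.001, "RF generator": 0.001},
)
for name, mtbf in allocation.items():
    print(f"{name}: {mtbf:.0f} h")
```

With these numbers the wafer handler receives an allocation of roughly 1000 h and each of the other two subsystems roughly 2000 h. Because the allocated failure rates add back up to the system budget, meeting every allocation meets the 500 h system goal under the stated assumptions; other schemes (for example, equal apportionment) distribute the budget differently.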
During the preliminary design process, design and reliability engineers:
• Modify goals to meet customer requirements
• Evaluate a number of design alternatives
• Make preliminary reliability allocations to subsystems and components
• Prepare a design disclosure package of requirements and specifications
• Estimate cost considerations
More than one design alternative may be selected for the final design phase if serious questions remain about the best choice. During the final design process, customer and supplier representatives, design and reliability engineers, project managers, field service personnel, manufacturing engineers, and quality assurance personnel:
• Update reliability allocations to subsystems and components
• Carry out design reviews
• Implement design-for-reliability practices
• Update the design disclosure package to reflect these reviews
• Select specific designs for prototype construction
Several iterations of design review and redesign are usually required before a design is ready for prototype construction. Design reviews are important in measuring progress against design requirements and gaining management approval to proceed with the prototype phase of the life cycle. These reviews are carried out in parallel with the design process and are often categorized as follows:
• Requirements Review - review the equipment's design requirements
• Preliminary Design Review - evaluate the preliminary design against requirements
• Critical Design Review - provide the design to the customer(s) for review

3. Prototype. Specific designs selected during the design phase are built and tested during this phase to determine whether all design requirements will be met. The prototype phase provides the first opportunity to validate the entire design, and is therefore commonly called alpha-site evaluation. Selected customers are included in alpha-site evaluations and are asked to provide feedback on all aspects of the equipment.
Multiple design alternatives may require prototyping and testing if serious questions exist about the best overall choice. It is common for reliability engineers to be responsible for performing these tests; manufacturing personnel, however, are responsible for determining that parts and components conform to specifications within financial guidelines. During the prototype phase, design, reliability, test, and manufacturing engineers, as well as quality assurance personnel:
• Build and test one or more prototypes of a design
• Present the test results for a pilot production design review
• Redesign as needed to fix weaknesses or make other desirable changes
• Conduct additional design reviews as appropriate
The design reviews should include another critical design review to give the customer an opportunity to review the latest design being considered. Concurrent with redesigns and design reviews, reliability engineers, quality assurance personnel, and manufacturing engineers develop quality assurance plans, design inspection and testing programs, set up production facilities, and develop production plans in preparation for the pilot production phase.
4. Pilot Production. This phase of the life cycle serves as a bridge between the prototype phase and the production and operation phase. This is the first opportunity for the equipment to be evaluated in an extended customer environment, and is therefore commonly called beta-site evaluation. In fact, it may be the first time that the equipment is exposed to a customer's processes.
The purpose of the pilot production phase is to help identify and correct problems with the equipment before full-scale production begins. Design and reliability engineers should evaluate the actual level of equipment reliability and determine what needs to be accomplished to meet requirements in a cost-effective manner. During the pilot production phase, project management, reliability engineers, manufacturing and test personnel, and customer service representatives:
• Qualify the equipment manufacturing process
• Establish field trials and customer applications of the equipment
• Monitor the equipment's performance
• Identify root causes of failures
• Implement a "corrective action" program for reliability problems
• Determine cost of ownership
Prior to the production and operation phase of the life cycle, reliability and design engineers should evaluate equipment reliability and make the appropriate recommendations. If the actual equipment reliability level is less than desired, the specific reliability improvement activities identified in the corrective action program should be implemented. This is the last opportunity to make design changes and other improvements before full-scale production. Design reviews conducted at this point are often broken down into:
• Qualification Review - verify that the final design meets requirements
• Production Readiness Review - determine readiness for full-scale production
• Reliability Budget Review - verify the reliability goal allocations
If any design changes were made at this point, another critical design review may be appropriate.
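Evaluating the actual reliability level against requirements can be illustrated with a point estimate of MTBF computed from pilot production data. This Python sketch assumes MTBF is the reliability measure of interest and that operating hours and failures are counted consistently at every beta site; the site data and the 600 h requirement are hypothetical. A point estimate ignores statistical uncertainty, so with few observed failures confidence bounds should also be considered before drawing conclusions.

```python
from typing import Sequence


def mtbf_point_estimate(
    operating_hours: Sequence[float], failures: Sequence[int]
) -> float:
    """Point estimate of MTBF: total operating time / total failures.

    Each position in the two sequences describes one unit (for example,
    one beta-site tool).
    """
    if len(operating_hours) != len(failures):
        raise ValueError("need one failure count per unit")
    total_failures = sum(failures)
    if total_failures == 0:
        raise ValueError("no failures observed; the point estimate is undefined")
    return sum(operating_hours) / total_failures


# Hypothetical pilot production data for three beta-site units.
hours = [1200.0, 800.0, 1000.0]
failure_counts = [3, 1, 2]
requirement_hours = 600.0

estimate = mtbf_point_estimate(hours, failure_counts)
meets_requirement = estimate >= requirement_hours
print(estimate, meets_requirement)  # 500.0 False
```

Here the estimate falls short of the requirement, which is the situation in which the corrective action program's improvement activities would be implemented before full-scale production.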
5. Production and Operation. This phase of the life cycle represents the time when units are produced and sold. All major reliability problems should have been identified and corrected prior to the production and operation phase. A formal program must be in place for collecting and analyzing field service and performance data from customers' units, including their cost impact.
During the production and operation phase, field service personnel, management, quality assurance personnel, and reliability engineers:
• Implement a field tracking, customer feedback, and customer satisfaction program
• Provide training and technical assistance to customers
• Document and employ installation testing and operation procedures
• Identify and report operation and maintenance problems
• Record failure data in a formal database
• Manage continuous improvement efforts
• Determine cost of ownership impacts
Recorded failure data should account for uncertainty due to variations in site, product vintage, and customer procedures. After proper review, decisions are made on allocating resources to continuous reliability improvement. The supplier and customer should function as partners in these efforts and may participate in user groups. Once equipment is in the field, it is important to continually monitor reliability, analyze failures and identify root causes, implement corrective actions, and eliminate known causes of failure in both the current and the next generation of equipment.

6. Phase Out. During this final phase of the life cycle, the equipment product line is approaching the end of its useful life. The end of useful life naturally occurs earlier for the supplier than for the customer; for the customer, it can occur because of obsolescence, wear, or a change in business plans. To remain competitive, the supplier must plan the next generation of equipment before phasing out production of the current generation.
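For the failure data recorded in a formal database during the production and operation phase, a Pareto ranking is one common way to decide which root causes to address first. This Python sketch is an illustration only: it assumes each failure record has already been reduced to a single cause label, whereas a real database would also carry site, product vintage, and customer procedure fields. The cause names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def pareto(failure_causes: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Rank failure causes by frequency.

    Returns (cause, count, cumulative percent of all failures) tuples,
    most frequent cause first; ties keep first-seen order.
    """
    counts = Counter(failure_causes)
    total = sum(counts.values())
    ranked: List[Tuple[str, int, float]] = []
    cumulative = 0
    for cause, count in counts.most_common():
        cumulative += count
        ranked.append((cause, count, 100.0 * cumulative / total))
    return ranked


# Hypothetical failure records pulled from a field failure database.
records = [
    "wafer sensor", "robot arm", "wafer sensor",
    "vacuum valve", "wafer sensor", "robot arm",
]
for cause, count, cumulative_pct in pareto(records):
    print(f"{cause:<14}{count:>3}{cumulative_pct:>8.1f}%")
```

In this hypothetical data set the wafer sensor alone accounts for half of all failures, so it would be the first candidate for root cause analysis and corrective action.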
The information gained during the six phases of the life cycle should be retained so that it can be used to improve future generations of similar or new equipment. This completes the life cycle for the current generation of equipment; each new generation of equipment experiences basically the same life cycle.

Supplier Cost Implications. The early life cycle phases typically account for the smallest portion of the total life cycle costs borne by the supplier, yet they are generally where the greatest impact on equipment reliability can be made. As a design moves toward completion, design details become increasingly fixed, so the cost in time and dollars to correct reliability problems increases. Figure 2-1 shows that, typically, by the end of the design/development macro phase of the life cycle only 15% of the life cycle costs have been consumed, but approximately 95% of the total life cycle costs have been determined (i.e., locked in).[2] Thus, changes made to improve reliability after the design/development macro phase have little impact on overall life cycle costs, but can be very expensive in terms of design changes, retrofits, service calls, warranty claims, and customer goodwill. This does not mean that equipment already in the production/operation macro phase should be ignored in terms of reliability improvement; reliability improvement activities should continue throughout the life cycle.
Figure 2-1. Percent of Total Life Cycle Costs vs Locked-in Costs
Although reliability improvements made earlier in the life cycle can increase initial supplier costs, they generally result in lower support costs for the supplier and lower operational costs for the customer. Early improvement could also reduce the supplier's production, warranty, and service costs.
2.4 Life Cycle Cost
Semiconductor manufacturers use two criteria to select equipment for a manufacturing step or process:
1. Technical
2. Economic[1]
The question asked for the technical criterion is, "Can a particular piece of equipment or equipment line perform the required manufacturing step or process?" The question asked for the economic criterion is, "Does the result of the manufacturing process justify the cost and ongoing expense of a particular piece of equipment or equipment line?" It is increasingly common for several pieces of equipment to meet the technical criterion, so the economic criterion is becoming increasingly important. Customers consider not only the initial purchase price, but also the costs associated with operating the equipment over its entire life (i.e., life cycle costs).
Life cycle costs include both equipment supplier costs, which are passed on to the customer in the purchase price of the equipment, and all costs incurred by the customer over the equipment life. Supplier costs plus the supplier's gross profit margin are referred to as acquisition costs, and include:
- Research and development
- Marketing and sales
- Testing and manufacturing
- Supplier shipping and installation
- Supplier training and support
- Supplier service and spare parts
- Warranty costs
- Continuous improvement
Costs incurred by the customer are referred to as operational costs, and include:
- Customer installation and training
- Operating costs
- Customer service costs and spares inventory
- Customer-performed maintenance
- Customer space costs
- Scheduled maintenance
- Equipment improvements and upgrades
- Downtime and scrap costs
- Disposal costs
Life cycle cost implications for both the supplier and the customer are discussed in the following paragraphs.
Customer Cost Implications. Reliability improvements made by the supplier early in the equipment life cycle may result in higher development costs being passed on to the customer in the equipment acquisition costs. However, this can be more than offset by lower operational costs, because increased reliability and uptime result in greater productivity for the customer. Figure 2-2 illustrates how a reliability program affects acquisition and operational costs. As the figure indicates, acquisition costs may increase due to efforts to improve reliability.
Figure 2-2. Impact of a reliability program on life cycle cost (acquisition, operational, and total life cycle costs with no formal reliability program and with a formal reliability program)
However, operational costs and, more importantly, total life cycle costs decrease. It is important for the customer to make equipment purchase decisions based on total life cycle costs and not just on initial purchase price.
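To make the purchase-price versus life-cycle-cost point concrete, here is a minimal sketch comparing two hypothetical tools. The tool names and all dollar amounts are invented placeholders, the operational categories are a simplified subset of the list above, and costs are summed without discounting, which a fuller cost of ownership analysis would usually add.

```python
# Sketch: compare two hypothetical tools on total life cycle cost (LCC)
# rather than purchase price alone. LCC = acquisition + operational costs.
# All dollar amounts are hypothetical; no discounting is applied.

def life_cycle_cost(acquisition: float, operational: dict[str, float]) -> float:
    """Total LCC: acquisition cost plus the sum of operational cost categories."""
    return acquisition + sum(operational.values())

tools = {
    # Lower price, no formal reliability program: more downtime and service.
    "tool_A": (1_400_000.0, {"service_and_spares": 400_000.0,
                             "maintenance": 250_000.0,
                             "downtime_and_scrap": 600_000.0,
                             "other_operating": 500_000.0}),
    # Higher price, formal reliability program: lower operational costs.
    "tool_B": (1_600_000.0, {"service_and_spares": 200_000.0,
                             "maintenance": 150_000.0,
                             "downtime_and_scrap": 200_000.0,
                             "other_operating": 500_000.0}),
}

lcc = {name: life_cycle_cost(acq, ops) for name, (acq, ops) in tools.items()}
cheapest_to_buy = min(tools, key=lambda name: tools[name][0])
cheapest_to_own = min(lcc, key=lcc.get)
print(f"Lowest purchase price: {cheapest_to_buy}; lowest life cycle cost: {cheapest_to_own}")
for name, total in lcc.items():
    print(f"{name}: ${total:,.0f}")
```

With these made-up figures the tool that is cheaper to buy is the more expensive one to own, which is exactly the trap the text warns against.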
Optimizing Life Cycle Costs. Increasing acquisition costs to improve equipment reliability and lower operational and total life cycle costs is clearly a recommended practice. However, there is a point beyond which increasing acquisition costs to obtain higher reliability is no longer beneficial. Figure 2-3 shows an optimal point beyond which total life cycle costs begin to increase with further improvements in reliability.
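The optimal point described above can be located numerically once acquisition and operational costs are expressed as functions of reliability. The sketch below assumes hypothetical cost curves: acquisition cost rising linearly with design MTBF, and operational cost proportional to the number of failures expected over an assumed equipment life. Real cost curves would come from the supplier's and customer's own data.

```python
# Sketch: find the reliability level (expressed as MTBF) that minimizes total
# life cycle cost. The cost curves and constants are hypothetical.
import math

BASE_ACQUISITION = 1_000_000.0  # hypothetical fixed acquisition cost ($)
COST_PER_MTBF_HOUR = 500.0      # hypothetical added acquisition cost per MTBF hour ($/h)
LIFE_HOURS = 40_000.0           # hypothetical operating hours over equipment life
COST_PER_FAILURE = 5_000.0      # hypothetical repair plus downtime cost per failure ($)

def acquisition_cost(mtbf: float) -> float:
    return BASE_ACQUISITION + COST_PER_MTBF_HOUR * mtbf

def operational_cost(mtbf: float) -> float:
    expected_failures = LIFE_HOURS / mtbf
    return COST_PER_FAILURE * expected_failures

def total_cost(mtbf: float) -> float:
    return acquisition_cost(mtbf) + operational_cost(mtbf)

# Coarse grid search over candidate MTBF values (hours).
candidates = range(50, 2001, 10)
best = min(candidates, key=total_cost)

# For these particular curves the optimum also has a closed form.
analytic = math.sqrt(LIFE_HOURS * COST_PER_FAILURE / COST_PER_MTBF_HOUR)
print(f"Grid optimum: {best} h, analytic optimum: {analytic:.0f} h")
```

Beyond the optimum, each extra hour of MTBF adds more acquisition cost than it saves in operational cost, which is the condition under which a more reliable technology, rather than more spending on the current one, is needed.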
Figure 2-3. Optimizing Life Cycle Costs (life cycle, acquisition, and operational costs plotted against reliability, with the optimized cost point marked)
When this point is reached, a more reliable technology is required for further improvement. Reliability insights from a technology used in one generation of equipment should be documented so they can be used to improve the next generation. Improved technology transfer between equipment generations will generally decrease life cycle costs in each succeeding generation of equipment, as shown in Figure 2-4.
Figure 2-4. Decrease in Life Cycle Costs in New Generations of Equipment (life cycle cost curves for Generations 1 through 4 plotted against reliability)
2.5 The Reliability Improvement Process
The reliability improvement process is an iterative process that is applied at each phase of the equipment life cycle. It consists of five basic steps:
1. Establish reliability goals and requirements for equipment
2. Apply reliability engineering or improvement activities, as needed
3. Conduct an evaluation of the equipment or equipment design
4. Compare the results of the evaluation to the goals and requirements and make a decision to move either to the next step or the next phase
5. Identify problems and root causes
The process then returns to Step 2, and Steps 2 through 5 are repeated until goals and requirements are met. The reliability improvement process steps are shown in the flowchart in Figure 2-5.
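The loop through Steps 2 to 5 can be sketched in code. This is a hypothetical illustration, not a procedure from this guideline: the design is reduced to subsystem failure rates in a series model, Step 5 simply picks the largest contributor, and Step 2 is modeled as halving that subsystem's failure rate. Subsystem names, rates, and the improvement factor are all assumptions.

```python
# Sketch of the iterative reliability improvement loop (Steps 2 through 5).
# Hypothetical assumptions: series model with constant failure rates, the goal
# is an MTBF in hours, and each improvement cuts the worst subsystem's rate.

def evaluate(failure_rates: dict[str, float]) -> float:
    """Step 3: predict equipment MTBF for a series system (1 / sum of rates)."""
    return 1.0 / sum(failure_rates.values())

def improvement_loop(failure_rates, mtbf_goal, improvement_factor=0.5, max_cycles=10):
    rates = dict(failure_rates)            # Step 1 goal is supplied as mtbf_goal
    history = []
    for _ in range(max_cycles):
        predicted = evaluate(rates)        # Step 3: conduct evaluation
        history.append(predicted)
        if predicted >= mtbf_goal:         # Step 4: are goals/requirements met?
            return rates, history
        worst = max(rates, key=rates.get)  # Step 5: identify the largest contributor
        rates[worst] *= improvement_factor # Step 2: apply an improvement to it
    return rates, history

# Hypothetical subsystem failure rates (failures per hour).
rates = {"wafer_handler": 0.004, "rf_generator": 0.002, "vacuum": 0.001, "controls": 0.001}
final_rates, history = improvement_loop(rates, mtbf_goal=240.0)
print([round(m) for m in history])
```

The `max_cycles` limit stands in for the management decision to stop iterating and revisit the goals themselves when repeated cycles do not converge.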
1. Establish Reliability Goals and Requirements. The first step in the reliability improvement process is to establish reliability goals and requirements. A distinction is made between goals and requirements. Goals are more internally driven and may or may not be met. Requirements, on the other hand, are more specific and customer driven, and are usually included as deliverables in contractual agreements. Goals are the starting point, but they are modified early in the equipment life cycle to satisfy customer requirements.
All goals have certain common characteristics. The following criteria can be used to assist in establishing goals[3]:
Attainability: Goals should be set at levels reasonably attainable within the available time span. Large goals over long periods should be avoided to maintain interest and commitment. Subgoals over shorter times are more attainable and more cost effective.
Supportability: Support and resources must be available at the time they are needed to achieve goals. Advance planning is needed to determine the resources and the extent to which they can or will be provided.
Acceptability: Goals must be acceptable to those who will be actively involved in pursuing these goals. Acceptance is influenced by relevance, perceived importance, reasonableness, and desirability of outcome.
Measurability: Goals provide standards against which performance may be assessed and, therefore, should be selected for suitability and defined in a way that enables measurement. To make them measurable, goals must be defined qualitatively, quantitatively, and in terms of performance parameters, values, and time scales.
2. Reliability Engineering and Improvements. Once goals and requirements have been established, design-for-reliability practices or reliability improvement activities are applied to enhance the reliability of equipment in any phase of the life cycle, including equipment already in existence.
There are some basic practices that can be applied to improve reliability. These include:
Simplicity. Simplification of equipment configuration is one of the basic principles of designing for reliability. Added parts or features increase the number of failure modes. A common practice in simplification is referred to as component integration (the use of a single component to perform multiple functions).
Redundancy. Another reliability improvement practice is to include more than one way to accomplish a function by having certain components or subassemblies in parallel, rather than in series. Beyond a certain point, redundancy may be the only cost-effective way to design reliable equipment.
Proven Components and Methods. To the extent possible, designers should use components and methods that have been shown to work in similar applications. Using proven components can minimize analyses and testing to verify reliability, thus reducing time and costs of demonstrating reliability of the equipment.
Derating. Derating is the practice of using components or materials at environmental conditions or loads that are less severe than their limiting condition. Under these conditions, the component or material is expected to be more reliable.
Eliminating Known Causes of Failure (Fault Avoidance). This can be accomplished through screening and burn-in procedures to eliminate weak components before equipment is actually shipped to the customer.
Failure Detection Techniques. Reliability of equipment can be improved by incorporating failure detection methods or self-healing devices, such as periodic maintenance schedules, monitoring procedures, and automatic sensing and switching devices.
Ergonomics or Human Factors Engineering. Human activities can be very important to equipment reliability. The equipment design must consider human factors aspects such as the person-machine interface, human reliability, and maintainability.
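The effect of simplicity and redundancy can be seen with the standard series and parallel combination rules, sketched below under the assumption that components fail independently. The 0.90 component reliability over a mission is a hypothetical value chosen only for illustration.

```python
# Sketch: series vs. parallel (redundant) arrangements, assuming independent
# component failures. The 0.90 component reliability is hypothetical.
from math import prod

def series(reliabilities):
    """All blocks must work: R = R1 * R2 * ... * Rn."""
    return prod(reliabilities)

def parallel(reliabilities):
    """At least one block must work: R = 1 - (1 - R1)(1 - R2)...(1 - Rn)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

r = 0.90
print(f"single unit:           {r:.3f}")
print(f"two units in series:   {series([r, r]):.3f}")
print(f"two units in parallel: {parallel([r, r]):.3f}")
```

Adding a second part in series lowers reliability (the simplicity argument), while placing it in parallel raises reliability (the redundancy argument), at the cost of the extra part.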
3. Conduct Evaluation. The next step in the reliability improvement process is to conduct an evaluation of the equipment or equipment design to assess its reliability level. A powerful tool for conducting this evaluation is reliability modeling. For equipment in the early phases of the life cycle, reliability modeling can be used to predict the equipment's performance, providing information for design changes or for evaluating design alternatives. For equipment already in production or operating in the field, reliability modeling, combined with testing and failure data analysis, can be used to identify critical components and help guide resource allocation and reliability improvement decisions.
There are a number of reliability prediction models. These include:
Block diagram models. A block diagram is used to logically represent the equipment being modeled by breaking it down into subsystems and components. Equipment reliability is modeled using failure data on the subsystems and components.
State transition (Markov) models. Equipment reliability is modeled by identifying the various operating conditions (states) that the equipment, subsystem, or component can experience, and the probability of transition from one state to another.
Other techniques for evaluating equipment reliability and identifying design weaknesses include:
Fault tree analysis (FTA). A "top down" approach beginning with an undesirable event (usually equipment failure) at the top or system level and identifying the events at subsequent lower levels that can cause the undesirable top event.
Failure modes and effects analysis (FMEA). A technique for systematically identifying, analyzing, and documenting the possible failure modes within a design and the effects of such failures on equipment performance.
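A fault tree can also be evaluated quantitatively. The sketch below assumes independent basic events and uses the usual AND/OR gate rules; the tree itself (a chamber failing to reach base pressure because of a pump failure, or a leak that requires both a seal failure and a missed leak check) and its probabilities are hypothetical.

```python
# Sketch: quantitative evaluation of a small fault tree, assuming independent
# basic events. Tree structure and probabilities are hypothetical.
from math import prod

def and_gate(probs):
    """Output event occurs only if all input events occur."""
    return prod(probs)

def or_gate(probs):
    """Output event occurs if any input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)

p_pump = 0.02           # basic event: pump failure
p_seal = 0.05           # basic event: seal failure
p_missed_check = 0.10   # basic event: leak check misses the leak

p_leak = and_gate([p_seal, p_missed_check])  # intermediate event: undetected leak
p_top = or_gate([p_pump, p_leak])            # top event: no base pressure
print(f"P(top event) = {p_top:.4f}")
```

Working top down this way also shows which branch dominates: here the pump failure contributes far more than the leak branch.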
Testing is another tool for evaluating equipment reliability. Typically, three different categories of testing are applied:
1. Component tests, useful in flushing out basic weaknesses in critical components
2. System tests, intended to explore the effects of component interactions
3. Reliability demonstration tests, used to demonstrate equipment capability
The above concepts are discussed in more depth in Sections 2.0 and 3.0.
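For a reliability demonstration test, one common plan (not specified by this guideline) is the zero-failure test under a constant failure rate assumption: if no failures occur during total test time T, an MTBF goal is demonstrated at confidence C when T is at least MTBF_goal × ln(1 / (1 − C)). The 500-hour goal and 90% confidence below are hypothetical.

```python
# Sketch: total test time for a zero-failure reliability demonstration test,
# assuming a constant failure rate (exponential life distribution).
import math

def zero_failure_test_hours(mtbf_goal: float, confidence: float) -> float:
    """Failure-free test time needed to demonstrate mtbf_goal at the given confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return mtbf_goal * math.log(1.0 / (1.0 - confidence))

# Hypothetical goal: demonstrate a 500 h MTBF at 90% confidence.
hours = zero_failure_test_hours(500.0, 0.90)
print(f"Required failure-free test time: {hours:.0f} h")
```

The required time grows quickly with confidence, which is one reason component and accelerated tests are used to find weaknesses before a demonstration test is run.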
4. Are Goals and Requirements Met? Results of the evaluation process are compared to reliability goals and requirements. If goals and requirements are not met, the problems and root causes should be identified as described in Step 5, and reliability improvement activities should be initiated. If goals and requirements are met or exceeded, approval can be given to move to the next phase of the life cycle, or goals and requirements can be updated and additional analyses carried out. For example, if the equipment is in the concept and feasibility or design phase of the life cycle, sensitivity analyses can be conducted to evaluate design and cost trade-offs such as:
- Design complexity versus reliability
- Maintainability versus reliability
- Increased costs versus reliability
If goals are exceeded, or can be exceeded, by a significant margin, the supplier should capitalize on the situation by turning it into a competitive leadership position. Upon completing design trade-off studies, approval can be given to move to the next phase of the equipment life cycle, where the reliability improvement process is again initiated.

5. Identify Problems and Root Causes. If reliability goals and requirements are not met, the reasons need to be identified and corrective actions taken. Test data on prototypes or on actual equipment in the field can supplement the reliability information generated by predictive modeling. Testing can also help identify causes of failure and potential reliability problems.
A key tool for reporting and analyzing failure data is the failure reporting, analysis, and corrective action system (FRACAS). This tool is discussed in more detail in Sections 2.0 and 3.0. Test data and all reported failures should be investigated to verify that a failure actually occurred. Failure verification can be performed by subjecting the component to the same conditions as those reported when the "failure" occurred.
The reliability improvement process now returns to Step 2, where reliability improvement and growth activities are initiated, or reliability goals and requirements are upgraded or modified. Reliability growth activities generally fall into the following major categories:
- Strengthening the existing design, by testing or modeling (or both) to identify optimal design changes to improve reliability. The process of identifying weak areas can be aided by performing sensitivity studies using the reliability model of the system.
- Redesigning part or all of the system (fault tolerance), which includes studying ergonomics, enhancing software, adding redundancy, and incorporating error detection techniques.
- Eliminating known causes of failure (fault avoidance), which includes using screening and burn-in procedures to eliminate weak components, derating parts, and using more reliable parts.
Steps 2 through 5 are repeated until goals and requirements are met. The process may require several cycles of goal setting, evaluating, comparing, and improving. Approval can then be given to move to the next phase of the life cycle, where the reliability improvement process is again applied.
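A sensitivity study on a reliability model can be as simple as asking how much equipment reliability rises per unit gain in each subsystem's reliability (the partial derivative, often called Birnbaum importance). The sketch below assumes a series model with independent failures; the subsystem names and reliabilities are hypothetical.

```python
# Sketch: sensitivity study on a series reliability model. For each subsystem,
# dR_sys/dR_i shows where an improvement would pay off most.
# Subsystem values are hypothetical; failures are assumed independent.
from math import prod

# Hypothetical subsystem reliabilities over a common mission time.
subsystems = {"wafer_handler": 0.90, "rf_generator": 0.95, "vacuum": 0.98, "controls": 0.99}

def system_reliability(rel: dict[str, float]) -> float:
    """Series model: every subsystem must work."""
    return prod(rel.values())

def sensitivity(rel: dict[str, float]) -> dict[str, float]:
    """For a series system, dR_sys/dR_i is the product of all other reliabilities."""
    return {name: prod(r for other, r in rel.items() if other != name) for name in rel}

print(f"System reliability: {system_reliability(subsystems):.4f}")
ranked = sorted(sensitivity(subsystems).items(), key=lambda kv: kv[1], reverse=True)
for name, s in ranked:
    print(f"{name:14s} dR_sys/dR_i = {s:.4f}")
```

In a pure series model the least reliable subsystem always ranks first, since the derivative equals R_sys / R_i; once redundancy is added the ranking can change, so the derivative should be taken from the actual model.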
2.6 Applying the Reliability Improvement Process
Optimal benefits from the reliability improvement process are realized when the process is applied to equipment in the concept and feasibility phase of the life cycle and then continuously applied thereafter. Benefits can also be realized when the process is applied to equipment that is in some advanced phase of its life cycle. It is important to address equipment reliability throughout the life cycle. For example, reliability improvements may be necessary:
- Following the Prototype Phase, because of design deficiencies or parts problems uncovered during prototype testing
- Beginning the Pilot Production Phase, due to reliability-related issues resulting from manufacturing a new equipment line
- During the Production and Operation Phase, because feedback from field personnel and customers indicates reliability problems due to unanticipated failure mechanisms

Activities. Some activities associated with applying the reliability improvement process remain basically the same from one phase of the life cycle to the next. Others, however, vary because of the change in focus from phase to phase. For example, the focus in the concept and feasibility macro phase is primarily on "planning and allocating"; in the design and development macro phase, on "predicting and verifying"; and in the production and operation macro phase, on "evaluating and improving." The activities also vary depending on whether the improvement process has been applied continuously as the equipment moved through its life cycle, from concept and feasibility to phase out, or whether it is being applied for the first time to equipment in some advanced phase. For example, consider equipment in the prototype phase: if the reliability improvement process was applied continuously in the concept and feasibility phase and in the design phase, then the reliability goals and requirements already exist.
Thus, the reliability goals and requirements activity consists primarily of updating the goals and requirements, and the primary focus would be on prototype testing and corrective action activities. However, if the reliability improvement process is applied to equipment for the first time during the prototype phase, then developing reliability goals and requirements should be a major focus because these goals and requirements do not yet exist. These concepts are discussed in more detail in Section 3. Figure 2-6 provides a high-level view of the main activities associated with applying the reliability improvement process to each of the three macro phases of the life cycle; it is provided primarily to illustrate the flow from one macro phase to the next. A more detailed discussion of applying the reliability improvement process to all six phases of the life cycle, with a list of the associated activities, is presented in Section 3. Some of the activities will vary as the reliability improvement process is tailored to a particular need or equipment line; however, the reliability improvement process itself remains unchanged.
[Figure: main activities of the reliability improvement process in each macro phase; chart only partially recoverable. Each macro phase cycles through Establish Goals/Requirements; Step 4, Are Goals/Requirements Met? (Go/No Go Decision on Next Phase); and Step 5, Identify Problems & Root Causes.]
Concept and feasibility macro phase activities:
- Set Reliability Goals
- Create Reliability Program Plan
- Develop Conceptual Designs
- Develop Preliminary Model
- Evaluate Conceptual Designs
- Next Phase Go/No Go Approval
- Identify Problems and Root Causes
- Develop Corrective Actions
Production and operation macro phase activities:
- Revise Goals/Requirements
- Implement Field Tracking System
- Begin Customer Feedback Program
- Start Corrective Action Program
- Upgrade Reliability Model
- Identify Problems and Root Causes
- Develop Corrective Actions
- Begin Phase Out Activities
2.7 Summary
Knowledge of the equipment life cycle is important because it provides a basis for understanding how and where reliability engineering enters into the process of designing, producing, and operating the equipment. The equipment life cycle is broken into distinct phases, each representing a unique portion of the equipment life. These phases provide the framework for tracking reliability throughout the life cycle of the equipment and guidance on when and where to apply resources. Awareness of life cycle costs helps equipment owners understand the impact on expenditures and cost of ownership when reliability efforts are initiated at different life cycle phases.
The reliability improvement process provides a means for systematically improving reliability throughout the equipment life cycle. Optimal benefits are realized when reliability is designed into a piece of equipment. However, it is important to improve reliability throughout the life of the equipment to meet reliability goals and objectives. The reliability improvement process is an iterative process of setting goals, evaluating (predicting) reliability, comparing the results with the goals, and improving reliability. Central to the reliability improvement process are data collection and analysis, design improvements, and operations and maintenance procedure improvements.
About Section 3.0
The next section provides details on preparing for and implementing the reliability improvement process. It includes a discussion of the various activities associated with each step of the improvement process and each phase of the life cycle. In preparation for this discussion, the following questions may assist in assessing current reliability practices and focus:
1. Is the importance of reliability conveyed throughout the company?
2. Is the approach to reliability improvement reactive or proactive?
3. Is the equipment development process life cycle oriented?
4. Have specific goals and requirements been established for equipment reliability and its growth?
5. Does the organization have technical and executive managers who champion the reliability cause?
6. Is demonstrated achievement of reliability goals a part of the criteria for deciding when equipment is ready for release to market?
7. Does the organization collect data that can readily be used in measuring and providing guidance for equipment reliability performance?
8. Do indicators of reliability performance exist for all equipment?
9. Are these indicators routinely monitored to ensure achievement of improvement goals?
10. Is a closed-loop failure reporting and corrective action system in place?
2.8 References
1. SI Staff, "Selecting a Product: The Task at Hand," Semiconductor International, March 1991, pages 7-8.
2. J. E. Arsenault and J. A. Roberts, Reliability and Maintainability of Electronic Systems, Potomac, MD: Computer Science Press, 1980.
3. W. Grant Ireson and Clyde F. Coombs, Jr. (Editors in Chief), Handbook of Reliability Engineering and Management, McGraw-Hill, 1988.
3 IMPLEMENTATION OF THE RELIABILITY IMPROVEMENT PROCESS
3.1 Introduction
To ensure that maximum benefits are achieved when implementing the reliability improvement process, it is important to understand:
- Management's role in the implementation process
- The activities associated with applying the process
- Functional responsibilities in the implementation process
- Where to start the process
- How to use limited resources and communicate the value of the process
Each of these topics is discussed in this section, with primary focus on applying the reliability improvement process. Activities associated with applying the process to equipment in the concept and feasibility phase and continuing throughout its life cycle are discussed first. The discussion then turns to activities associated with applying the process to equipment in an advanced phase (other than concept and feasibility) of the life cycle.
3.2 Management's Role
Management plays a vital role in implementing the reliability improvement process. It is responsible for establishing the right environment and for choosing individuals to champion the effort. The champions provide leadership and are accountable for the success of the reliability improvement process.
Management's Responsibility. One of management's primary responsibilities is to convey the importance of reliability throughout the company. Institutionalizing the reliability improvement process may require a cultural change and even an organizational change. Therefore, management leadership and commitment to this change are essential to ensure success. Success also depends on management's understanding of the activities involved in the reliability improvement process and on its support of these activities.
Reliability Champions. Selection of reliability champions is critical to the success of the reliability improvement process. Two reliability champions are recommended for moderate-to-large companies: an executive champion and a technical champion. In a small company, these two roles may be combined in one person.
Executive Champion. The role of the executive champion is to:
- Provide executive leadership in reliability improvement matters
- Promote reliability improvement throughout the company
- Provide assurance that the reliability improvement process is supported
- Work closely with the technical champion to develop reliability activities
- Mentor the reliability improvement process and ensure that accomplishments are acknowledged
Depending on the size of the company, the executive champion could occupy any of a number of upper management positions. The following are a few examples:
- President or vice president
- Chief operations officer
- Chief technical officer
- Corporate total quality management executive
Technical Champion. The technical champion establishes the reliability improvement process and is held accountable for its success. The technical champion takes an active role in:
- Providing both managerial and technical leadership
- Ensuring the implementation of an effective cross-functional improvement process
- Selecting the reliability activities to be performed and the tools that will be used
- Ensuring that the reliability improvement process is continuously applied
- Training participants in reliability concepts and tools
If not already experienced in reliability, the technical champion should be trained in reliability principles. This training should include a full understanding of the equipment life cycle and life cycle cost concepts as well as the reliability improvement process activities. This provides the background needed to guide the application of the activities and tools associated with implementing the reliability improvement process. The technical champion could be the manager of, or chief engineer within, one of the following organizations:
- Systems engineering
- Reliability engineering
- Product engineering
- Customer engineering
3.3 Applying the Reliability Improvement Process
The reliability improvement process can be applied continuously as equipment moves through its life cycle phases. Activities associated with applying the process may vary as the equipment moves from one phase of the life cycle to the next. This variation results from a change in focus from phase to phase, and from the fact that an activity performed in one phase lays the foundation for activities in subsequent phases. Activities will also vary depending on whether the improvement process is applied continuously as equipment moves through its life cycle (from concept and feasibility to phase out), or whether it is applied for the first time to equipment that is in some advanced (other than concept and feasibility) phase. The following table lists the sections that contain descriptions of the reliability improvement process for each of the starting points (process applied for the first time):
Table 3-1. Reliability Improvement Process Applied at Six Different Starting Points

Starting Point (Life Cycle Phase in Which the Process Is Applied for the First Time) | Reference Section
Concept and Feasibility | Section 3.3.1
Design | Section 3.4.1
Prototype | Section 3.4.2
Pilot Production | Section 3.4.3
Production/Operation | Section 3.4.4
Phase Out | Section 3.4.5
3.3.1 Starting with Equipment in the Concept and Feasibility Phase
The following paragraphs discuss the activities performed when the reliability improvement process is first applied to equipment in the concept and feasibility phase and then continuously applied in subsequent phases. The discussion for each life cycle phase concludes with a list of objectives that will have been met as a result of applying the reliability improvement process, and a table summarizing the activities associated with applying the process to that phase of the life cycle.
Concept and Feasibility
Step 1. Establish Goals and Requirements. In the concept and feasibility phase, the focus of Step 1 is on establishing goals that meet customer requirements. These goals may later be revised to reflect changes in customer requirements or in response to observations regarding the equipment performance level.
Goals can be established based on:
Customer Voice. When establishing reliability goals, it is important to consider who the customers are and what aspects of reliability they regard as most important. The supplier must fully understand the customers' needs and be able to translate these needs into equipment-specific information for setting goals.
Competitive Benchmarking. Competitive benchmarking is a process used by suppliers to measure and compare their products, services, and operations against competitors and world-class performers.
Reverse Engineering. The systematic dismantling of equipment with a high reliability ranking is referred to as reverse engineering. It provides information about the actual reliability of similar equipment and the technology used to achieve that reliability.
Warranty Requirements. To remain competitive, the reliability goals must support the established warranty requirements.
Equipment Maintenance. It is essential to discuss maintenance aspects of the equipment with field personnel when establishing reliability goals. Improperly addressing maintenance issues can lead to a design with very high user-perceived reliability but prohibitive maintenance costs.
Once goals have been established, a reliability program plan is created that documents how these goals will be achieved. It defines:
- Activities to be performed
- Resources required to fulfill the activities
- Schedule for these activities
- Procedures by which the activities will be performed
- Organizations and interfaces required to perform the activities
The program plan provides management and the customer with a means of measuring progress and assuring that requirements will be accomplished.
Step 2. Reliability Engineering and Improvements. In the concept and feasibility phase, Step 2 of the reliability improvement process focuses first on developing alternative design concepts. All possible alternatives should be identified and evaluated to ensure that those selected for the design phase are capable of fulfilling goals and requirements. Functional block diagrams are used to develop the basic concepts for the equipment and to evaluate their feasibility, and are updated as the concept changes.
The next step is to develop a preliminary model of the equipment from the functional block diagrams. The initial model is created at a gross level; that is, the equipment is broken into a few (approximately 10 to 20) major subsystems. This model is used to make initial predictions of the equipment reliability (Step 3).
A reliability allocation is conducted to apportion the equipment reliability goal among the individual major subsystems. This makes equipment reliability requirements more manageable and establishes individual reliability requirements for each major subsystem. Since no detailed information on the equipment is yet available, the allocation is approximate; it is used to guide the designer when developing various concepts.
Because the equipment has not yet been built in this phase, other sources of data are required. Historical data can be used for those subsystems that are similar to those in previous generations of equipment.
For subsystems with no historical data, expert judgement can be used. Expert judgement draws on the opinions of individuals considered knowledgeable about a subsystem or component to create initial reliability values.
Another reliability engineering activity for identifying conceptual design weaknesses is failure modes and effects analysis (FMEA), a technique for systematically identifying, analyzing, and documenting the possible failure modes within a design and the effects of such failures on equipment performance. The FMEA is set up in this step, but it is used later in Step 5 to help identify problems and root causes.
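The reliability allocation described for Step 2 can be sketched for a series model with constant failure rates. Two commonly used schemes are shown: equal apportionment, and a weighted (ARINC-style) apportionment in proportion to historical failure rates. Neither scheme is prescribed by this guideline, and the goal, subsystem names, and historical rates are hypothetical; only four subsystems are used here for brevity, rather than the 10 to 20 mentioned above.

```python
# Sketch: allocating an equipment MTBF goal to major subsystems, assuming a
# series model with constant failure rates (so subsystem rates add up).
# Goal, subsystem names, and historical rates are hypothetical.

MTBF_GOAL = 200.0                # hypothetical equipment goal (hours)
lambda_goal = 1.0 / MTBF_GOAL    # allowed equipment failure rate (per hour)

historical = {                   # hypothetical historical failure rates (per hour)
    "wafer_handler": 0.0040,
    "rf_generator": 0.0020,
    "vacuum": 0.0010,
    "controls": 0.0010,
}

def equal_allocation(names, lam_goal):
    """Every subsystem receives the same share of the allowed failure rate."""
    return {n: lam_goal / len(names) for n in names}

def weighted_allocation(hist, lam_goal):
    """Shares proportional to each subsystem's historical failure rate."""
    total = sum(hist.values())
    return {n: lam_goal * lam / total for n, lam in hist.items()}

for n, lam in weighted_allocation(historical, lambda_goal).items():
    print(f"{n:14s} allocated MTBF >= {1.0 / lam:7.0f} h")
```

The weighted scheme asks less of subsystems that have historically been weak, which tends to produce allocations designers can actually meet; either way, the allocated rates sum to the equipment goal.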
Step 3. Conduct Evaluation. The subsystem failure data and the reliability prediction model are used to evaluate the reliability of the conceptual design. A reality check assures that the predicted reliability value makes sense. Evaluate the following:
- Predicted versus anticipated reliability value
- Historical and expert opinion data used to calculate equipment reliability
- Reliability prediction model
Conceptual design review(s) of the concepts that will be carried to the design phase are conducted at this point. These design reviews are also useful in evaluating the current level of predicted reliability of the concepts being considered.
Step 4. Are Goals and Requirements Met? A comparison is made between established goals and the predicted reliability values. If the goals are not met, continue to Step 5, where problems and root causes are identified. If the goals are met or exceeded, approval is eventually given to move to the design phase of the life cycle, where goals may be modified to meet customer requirements.
Step 5. Identify Problems and Root Causes. If goals are not met, problems and root causes should be identified. Sensitivity analyses can be conducted to direct attention to those subsystems that have the greatest impact on equipment reliability. If an FMEA was developed in Step 2, use it to examine the potential failure modes identified and to establish possible root causes.
The reliability improvement process then returns to Step 2, where reliability improvement and growth activities are initiated. These might include:
- Adding high-level redundancy
- Using proven high-reliability components and parts
- Forming partnerships with sub-tier suppliers
- Derating
Once the conceptual design improvements have been selected and incorporated, both the functional block diagram and the reliability prediction model are re-evaluated. The model and the data used in it are changed to reflect the conceptual design improvements.
If an FMEA was initiated, it is also updated to reflect design changes. Steps 2 through 5 are repeated until goals are met and approval is given to move to the design phase of the life cycle.
At the end of the concept and feasibility phase, the following objectives have been met:
- Reliability goals have been established and allocated to major subsystems
- A reliability program plan has been initiated
- Conceptual designs that form the basis of the equipment design have been determined
- Feasibility of the selected conceptual designs meeting the goals has been demonstrated
Table 3-2 summarizes the activities associated with applying the reliability improvement process to the concept and feasibility phase. Three designators are used for the activities: E (engineering), D (data), and T (testing). Each designator, followed by a number, gives the location of the activity in Section 3.0.
Table 3-2. Reliability Improvement Process Activities for the Concept and Feasibility Phase
1. Establish Goals and Requirements
- Establish reliability goals (E1)
- Create reliability program plan (E2)
2. Reliability Engineering and Improvements
- Develop functional block diagrams (E3)
- Create preliminary reliability model (E4)
- Allocate reliability goals (E5)
- Collect historical failure data (D1)
- Develop preliminary FMEA (E14)
- Develop preliminary Life Cycle Cost (AT19)
3. Conduct Evaluation
- Preliminary prediction of equipment reliability (E6)
- Conceptual design review(s) (E7)
4. Are Goals and Requirements Met?
- Compare goals to predicted reliability values
- If goals are not met, continue to Step 5
- If goals are met, move to design phase of life cycle
5. Identify Problems and Root Causes
Design
Step 1. Establish Goals and Requirements. The reliability goals established in the concept and feasibility phase of the life cycle are modified and become reliability requirements in the design phase. Requirements need to be well-defined so that design engineers and manufacturers can understand them. Requirements should be broad in nature and both qualitative (e.g., definition of responsibilities and program requirements) and quantitative (e.g., mean time between failures and uptime).
System-level requirements are allocated to major subsystems and components. Once reliability requirements have been established, the reliability program plan is updated to reflect them.
Step 2. Reliability Engineering and Improvements. Design-for-reliability practices are applied at this step in the improvement process. Applying these practices creates a proactive environment for the design team. Some of the more basic practices include:
- Simplicity. Simplifying the equipment configuration is one of the basic principles of designing for reliability, because added parts or features increase the number of failure modes. A common simplification practice is component integration, the use of a single component to perform multiple functions.
- Proven Components. To the extent possible, designers should use components that have been shown to work in similar applications. Using proven components can minimize the analyses and testing needed to demonstrate equipment reliability.
- Derating. Derating is the practice of using components or materials at environmental conditions or loads that are less severe than their limiting conditions. Under these conditions, the component or material is expected to be more reliable.
- Redundancy. Another reliability improvement practice is to provide more than one way of accomplishing a function by placing certain components or subassemblies in parallel rather than in series. Beyond a certain point, redundancy may be the only cost-effective way to design reliable equipment.
- Failure Detection. Equipment reliability can be improved by incorporating failure detection methods such as automatic sensing and switching devices.
- Ergonomics or Human Factors Engineering. The equipment design must consider human factors such as the person-machine interface, human reliability, and maintainability.
The functional block diagram is updated as the design develops.
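The redundancy practice above can be quantified with a short calculation. This sketch assumes (the guideline does not) identical, independent units with a constant failure rate, so a single unit survives a mission of length t with probability exp(-lam * t); it compares one unit, two units in series, and two units in active parallel. The 1000 h MTBF and one-week mission are hypothetical.

```python
"""Series versus active-parallel (redundant) arrangements of two units.

Illustrative assumptions: identical, independent units with a constant
failure rate lam, so one unit survives a mission of length t with
probability exp(-lam * t). Numbers below are hypothetical.
"""
import math


def unit_reliability(lam: float, t: float) -> float:
    """Probability that a single unit survives to time t."""
    return math.exp(-lam * t)


def series_reliability(rs: list[float]) -> float:
    """Every unit must survive: the product of the unit reliabilities."""
    return math.prod(rs)


def parallel_reliability(rs: list[float]) -> float:
    """Active redundancy: the function fails only if every unit fails."""
    return 1.0 - math.prod(1.0 - r for r in rs)


if __name__ == "__main__":
    r = unit_reliability(lam=1 / 1000, t=168)  # 1000 h MTBF, one-week mission
    print(f"single unit:     {r:.3f}")
    print(f"two in series:   {series_reliability([r, r]):.3f}")
    print(f"two in parallel: {parallel_reliability([r, r]):.3f}")
```

With these numbers one unit survives the week with probability of about 0.845, two in series about 0.715, and two in parallel about 0.976, which illustrates why added parts in series lower reliability while redundancy raises it. Standby redundancy and k-out-of-n arrangements need different formulas.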
The gross reliability model, which consists of major subsystems, is expanded, and each subsystem is broken down in more detail. For example, a wafer handler subsystem could be broken down into software, electronics, arm, and casing components. The reliability allocated to a subsystem is further allocated to the component level. As in the concept and feasibility phase, this allocation is based on the limited information available during the early phases of the life cycle; it is used as a guide when developing the various designs. As the design progresses, the allocation is finalized. If an FMEA was not developed in the concept and feasibility phase of the life cycle, initiate it in this phase.
As in the concept and feasibility phase, equipment in the design phase has not yet been built, so actual component failure data may not be available. Here again, historical data can be used for components that are similar to those in previous generations of equipment. Use standard handbooks (such as MIL-HDBK-217[1] or the NPRD-91 handbook[2]) or expert opinion to obtain data for components for which no historical data is available.
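The expanded model for the wafer handler example can be sketched as a roll-up of component estimates, each tagged with its data source so that source validity can be checked in Step 3. This assumes, beyond what the guideline states, that the components are in series with constant failure rates, so the subsystem rate is the sum of component rates. All rates, sources and the 2400 h allocated MTBF are hypothetical; no values are taken from MIL-HDBK-217 or NPRD-91.

```python
"""Roll component failure-rate estimates up to a subsystem.

Illustrative assumptions: the components of a subsystem are in series with
constant failure rates, so the subsystem failure rate is the sum of the
component rates. Rates, data sources and the allocation are hypothetical.
"""
from dataclasses import dataclass


@dataclass
class ComponentEstimate:
    name: str
    failures_per_million_h: float
    source: str  # where the value came from: historical, handbook, expert, life test


def subsystem_rate(components: list[ComponentEstimate]) -> float:
    """Subsystem failure rate in failures per 10^6 hours."""
    return sum(c.failures_per_million_h for c in components)


def meets_allocation(components: list[ComponentEstimate],
                     allocated_mtbf_h: float) -> bool:
    """True if the predicted subsystem MTBF meets its allocated MTBF."""
    return 1e6 / subsystem_rate(components) >= allocated_mtbf_h


if __name__ == "__main__":
    wafer_handler = [
        ComponentEstimate("software", 100.0, "expert"),
        ComponentEstimate("electronics", 150.0, "handbook"),
        ComponentEstimate("arm", 120.0, "historical"),
        ComponentEstimate("casing", 10.0, "historical"),
    ]
    rate = subsystem_rate(wafer_handler)
    print(f"wafer handler: {rate:.0f} failures per 10^6 h, MTBF {1e6 / rate:.0f} h")
    print("meets 2400 h allocation:", meets_allocation(wafer_handler, 2400.0))
    for c in wafer_handler:
        print(f"  {c.name}: {c.failures_per_million_h} ({c.source})")
```

Keeping the source next to each value makes it easy to see which parts of the prediction rest on expert opinion and may deserve a simulated life test.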
If a critical component is used for the first time and life data is not available, run a simulated life test to generate life data under the expected use conditions.
Step 3. Conduct Evaluation. Use the subsystem and component failure data, and the updated reliability prediction model, to evaluate the reliability of the current equipment design. As in the concept and feasibility phase, evaluate the following:
- Data sources and their validity
- Predicted versus anticipated reliability value
- Historical and expert opinion data used in determining equipment reliability
- Reliability prediction model
Conduct design review(s) of the design(s) that will be carried to the prototype phase at this time. These reviews are often broken down into:
- Requirements Review - review the equipment's design requirements
- Preliminary Design Review - evaluate the preliminary design against requirements
- Critical Design Review - provide the design to the customer(s) for review
Step 4. Are Goals and Requirements Met? Compare the reliability requirements with the predicted reliability values. If requirements are not met, continue to Step 5, where problems and root causes are identified. If requirements are met, approval is given to move to the prototype phase of the life cycle.
Step 5. Identify Problems and Root Causes. If requirements are not met, sensitivity analyses can be conducted to direct attention to those subsystems and components that have the greatest impact on equipment reliability. Evaluate the FMEA developed in Step 2 to determine potential failure modes of the subsystems and components. The process then returns to Step 2, where reliability improvement activities are initiated. Steps 2 through 5 are repeated until requirements are met. Approval can then be given to move to the prototype phase of the life cycle.
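One simple form of the sensitivity analysis mentioned in Step 5 is a one-at-a-time sweep: reduce each subsystem's failure rate by the same fraction and record how much the predicted equipment MTBF improves. The sketch below assumes a series model with constant failure rates (an assumption, not something the guideline prescribes); the subsystem names, rates and the 50 % reduction are hypothetical.

```python
"""One-at-a-time sensitivity analysis on a series reliability model.

Illustrative assumptions: constant failure rates in series, so the
equipment failure rate is the sum of subsystem rates. Each subsystem's rate
is reduced by the same fraction, one at a time, and the resulting gain in
equipment MTBF is recorded; the largest gains show where to focus.
"""


def equipment_mtbf(rates: dict[str, float]) -> float:
    """Predicted equipment MTBF (hours) from failure rates (per hour)."""
    return 1.0 / sum(rates.values())


def sensitivity(rates: dict[str, float],
                reduction: float = 0.5) -> list[tuple[str, float]]:
    """Return (subsystem, MTBF gain in hours) pairs, largest gain first."""
    baseline = equipment_mtbf(rates)
    gains = []
    for name, rate in rates.items():
        improved = dict(rates)
        improved[name] = rate * (1.0 - reduction)
        gains.append((name, equipment_mtbf(improved) - baseline))
    return sorted(gains, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical subsystem failure rates, in failures per hour.
    rates = {"wafer handler": 1 / 2000, "process chamber": 1 / 1000,
             "vacuum system": 1 / 4000, "control software": 1 / 4000}
    for name, gain in sensitivity(rates):
        print(f"{name}: +{gain:.0f} h MTBF if its failure rate is halved")
```

In a pure series model the ranking simply follows the failure rates; the same sweep becomes more informative once the model contains redundancy or more detailed component structure.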
At the end of the design phase, the following objectives have been met:
- The core architecture of the equipment design has been finalized
- Design(s) have been chosen for prototype
Table 3-3 summarizes the activities associated with applying each step of the reliability improvement process to the design phase.
Table 3-3. Reliability Improvement Process Activities for the Design Phase
1. Establish Goals and Requirements
- Modify goals to match customer requirements (E1)
- Update reliability program plan (E2)
2. Reliability Engineering and Improvements
- Apply design-for-reliability practices (E9)
- Update functional block diagram (E3)
- Expand reliability model to include more detailed subsystems (E4)
- Allocate subsystem requirements to subsystem components (E5)
- Collect failure data for components within subsystems (D1)
- Evaluate reliability of purchased components (E11)
- Run life test on new and critical components (AT18)
- Update Life Cycle Cost (AT19)
- Perform ergonomics and human factors studies (E12)
- Conduct software reliability studies (E13)
- Implement FMEA (E14)
3. Conduct Evaluation
- Predict equipment reliability (E6)
- Conduct design reviews (E7)
4. Are Goals and Requirements Met?
- Compare reliability requirements to predicted values
- If requirements are not met, continue to Step 5
- If requirements are met, move to prototype phase of life cycle
Prototype
Step 1. Establish Goals and Requirements. At this point in the life cycle, requirements have been established, and little remains to be done other than to upgrade them as the design moves toward completion and prototypes are built. Modeling, as well as failure data analysis, can be used to appraise current equipment reliability levels and evaluate what levels are achievable.
As in the previous two phases, the reliability program plan is updated.
Step 2. Reliability Engineering and Improvements. The functional block diagram is again updated in the prototype phase to reflect any design changes. Subsystems and components having the greatest impact on equipment reliability are expanded further in the reliability prediction model. If reliability requirements were revised in Step 1, re-allocation to major subsystems and components may be necessary. For those subsystems and components that are modeled in more detail, reliability allocations need to be made to lower levels. If more than one prototype is built, a reliability model may be needed for each prototype design.
Conduct a test to generate subsystem- and system-level reliability data for each of the prototypes. Aspects of the test program to be considered include:
- Test objectives
- Test parameters
- Test sample size
- Test duration
- Test environments
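Test duration and sample size from the list above can be sized with a standard calculation for one common kind of plan: a time-terminated demonstration test that passes only if no failures occur. Assuming exponential lifetimes (constant failure rate), which the guideline does not specify, the total test time needed to demonstrate an MTBF of at least the target with confidence C is T = MTBF_target x ln(1 / (1 - C)). Plans that tolerate some failures use the chi-square distribution instead. The 500 h target, 90 % confidence and three units are hypothetical.

```python
"""Size a zero-failure, time-terminated reliability demonstration test.

Illustrative assumption: constant failure rate (exponential lifetimes).
If total accumulated test time T passes with zero failures, the lower
one-sided confidence bound on MTBF at confidence C is T / ln(1 / (1 - C)).
"""
import math


def zero_failure_test_hours(target_mtbf_h: float, confidence: float) -> float:
    """Total unit-hours needed to demonstrate target_mtbf_h with zero failures."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return target_mtbf_h * math.log(1.0 / (1.0 - confidence))


def hours_per_unit(total_hours: float, units_on_test: int) -> float:
    """Split the total across units (valid only under the exponential assumption)."""
    return total_hours / units_on_test


if __name__ == "__main__":
    total = zero_failure_test_hours(target_mtbf_h=500.0, confidence=0.90)
    print(f"total test time: {total:.0f} unit-hours")
    print(f"per unit with 3 prototypes: {hours_per_unit(total, 3):.0f} h")
```

At 90 % confidence the factor is ln 10, about 2.3, giving roughly 1151 unit-hours for a 500 h target, or about 384 h on each of three units. Spreading the time across units relies on the constant-failure-rate assumption and will not expose wear-out failure modes.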
Component tests are useful for identifying basic weaknesses in critical components, whereas system tests are useful for exploring the effects of component interactions. Results from component tests alone should not be used to predict system reliability performance, since component tests rarely duplicate system interactions.
A failure reporting and corrective action system (FRACAS) can be initiated to record failure data gathered during the testing program. The FRACAS is a closed-loop reporting system that is useful for:
- Identifying failures and establishing a historical database
- Analyzing failures to determine their causes
- Documenting the corrective action required to minimize recurrence of the failures
Maximum benefits from a FRACAS are realized when it is implemented early in a test program and is directly coupled to the modeling effort. Failures identified during in-house testing (e.g., prototype tests) are easier to analyze than failures in the field. Furthermore, it is more cost-effective to identify and correct failures earlier in the life cycle.
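The closed-loop nature of a FRACAS can be sketched as a record that may only be closed after a root cause and a corrective action have been recorded and the fix verified. The comparison of observed failure modes against FMEA-predicted modes, which the prototype phase calls for, reduces to two set differences. The field names, status values and example modes are hypothetical and do not represent a standard FRACAS schema.

```python
"""Minimal closed-loop FRACAS record with an FMEA cross-check.

Illustrative only: the field names and status values are hypothetical, not a
standard FRACAS schema. A report can be closed only after a root cause and a
corrective action have been recorded and the fix has been verified.
"""
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"
    ANALYZED = "root cause identified"
    ACTION_DEFINED = "corrective action defined"
    CLOSED = "verified and closed"


@dataclass
class FailureReport:
    report_id: str
    subsystem: str
    failure_mode: str
    root_cause: str = ""
    corrective_action: str = ""
    status: Status = Status.OPEN

    def record_root_cause(self, cause: str) -> None:
        self.root_cause = cause
        self.status = Status.ANALYZED

    def record_corrective_action(self, action: str) -> None:
        if not self.root_cause:
            raise ValueError("determine the root cause before defining an action")
        self.corrective_action = action
        self.status = Status.ACTION_DEFINED

    def close(self, fix_verified: bool) -> None:
        # Closing the loop: only a verified corrective action ends the record.
        if self.status is not Status.ACTION_DEFINED or not fix_verified:
            raise ValueError("cannot close without a verified corrective action")
        self.status = Status.CLOSED


def compare_with_fmea(reports: list[FailureReport],
                      fmea_modes: set[str]) -> tuple[set[str], set[str]]:
    """Return (observed but not predicted, predicted but not yet observed)."""
    observed = {r.failure_mode for r in reports}
    return observed - fmea_modes, fmea_modes - observed


if __name__ == "__main__":
    report = FailureReport("FR-001", "wafer handler", "arm misalignment")
    report.record_root_cause("worn bearing")
    report.record_corrective_action("change bearing specification")
    report.close(fix_verified=True)
    unexpected, not_seen = compare_with_fmea([report], {"arm misalignment", "vacuum leak"})
    print(report.status.value, unexpected, not_seen)
```

Modes observed but not predicted point to gaps in the FMEA; predicted modes not yet observed are candidates for further testing rather than proof that they cannot occur.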
The actual failure modes uncovered during testing should be recorded in the FRACAS and compared with the predicted failure modes established in the FMEA. Where differences occur, the reasons should be identified.
Step 3. Conduct Evaluation. The reliability of the various prototypes is evaluated based on the test data. Results of the prototype test are then presented at a design review prior to pilot production.
Step 4. Are Goals and Requirements Met? Compare the results of the prototype testing with the requirements to see whether they have been met. If the requirements are not met, move to Step 5, where problems and root causes are identified. If requirements are met, a design review is performed, including a management go/no-go decision to continue to the pilot production phase of the life cycle.
Step 5. Identify Problems and Root Causes. A sensitivity analysis is conducted to direct attention to those subsystems and components that have the greatest impact on equipment reliability. Root causes of the failures recorded in the FRACAS are identified and corrective actions implemented. A more detailed failure analysis might also be performed on those subsystems and components that are failing at a significantly higher rate than previously anticipated.
The process then returns to Step 2, where improvement activities are initiated. If a FRACAS was initiated, it might identify corrective actions that could be implemented to eliminate failures. Other possibilities include:
- Derating
- Procedural changes
- Process changes
A preventive maintenance (PM) program can be developed for subsystems and components that degrade equipment performance. Partnerships established with suppliers are continually nurtured, and purchased subsystems and components are continually evaluated.
Human capabilities and limitations are considered, and changes are made to the equipment to eliminate failures due to human error. The software reliability program is continued. For critical subsystems and components, the optimal operating range is found, and the impact of that range on other components is evaluated. Steps 2 through 5 are repeated until requirements are met. Approval can then be given to move to the pilot production phase of the life cycle.
At the end of the prototype phase, the following objectives have been met:
- The prototype(s) have been tested and evaluated to determine their capability of achieving the requirements. This includes redesigning and re-evaluating until a go/no-go decision is reached.
- The core subsystem and component designs are finalized.
Table 3-4 summarizes the activities associated with applying the reliability improvement process to the prototype phase.
Table 3-4. Reliability Improvement Process Activities for the Prototype Phase
1. Establish Goals and Requirements
- Update reliability requirements (E1)
- Update reliability program plan (E2)
2. Reliability Engineering and Improvements
- Update functional block diagram (E3)
- Expand reliability model, as needed (E4)
- Re-allocate subsystem and component reliability requirements (E5)
- Establish test plan (T1)
- Conduct prototype test (T2)
- Establish FRACAS (E17)
- Perform human reliability analysis (D2)
- Develop preventive maintenance program (E10)
- Continue to evaluate the reliability of purchased components (E11)
- Perform ergonomics studies (E12)
- Conduct software reliability studies (E13)
- Update Life Cycle Cost (AT19)
3. Conduct Evaluation
- Evaluate prototype reliability (T2)
- Conduct design review(s) (E7)
4. Are Goals and Requirements Met?
- Compare reliability requirements to predicted values
- If requirements are not met, continue to Step 5
- If requirements are met, move to pilot production phase of life cycle
5. Identify Problems and Root Causes
- Perform sensitivity analyses (E8)
- Evaluate FRACAS to identify problems and root causes (E17)
- Evaluate FMEA to identify potential failure modes (E14)
- Perform failure analyses on critical components (E16)
Pilot Production
Step 1. Establish Goals and Requirements. During the pilot production phase, goals and requirements are upgraded as appropriate, and the reliability program plan is updated to reflect these and other changes. Modeling and failure data analyses are used to assess current and potential levels of equipment performance.
Step 2. Reliability Engineering and Improvements. Functional block diagrams and the reliability model are once again updated to reflect any changes that occurred during the prototype phase. If a FRACAS was not implemented during the prototype phase, it should be implemented at this time. The test program is evaluated and updated as needed; any aspects of the test program that were not clearly defined during the prototype phase should be established here. Additional tests that should be implemented at this time are:
- Burn-in tests
- Reliability qualification tests (RQT)
Burn-in tests are useful for identifying weak components or subsystems prior to field use. An RQT is useful in initial customer applications of the equipment to evaluate equipment performance in actual operating environments. The RQT is also useful for verifying compliance with contractual objectives, whereby equipment is tested according to a predetermined plan under specified environmental conditions and pass/fail criteria prior to a full-scale production decision[3]. Testing equipment in an environment that represents usage throughout its service life allows reasonable correlations to be established between test results and actual field experience.
The manufacturing processes should be qualified at this time so that the manufacturing problems identified during pilot production do not carry over into full-scale production. Qualifying manufacturing processes before full-scale production reduces manufacturing costs and prevents equipment performance degradation[4]. Qualifying manufacturing processes includes:
- Performing a process capability study
- Establishing process control
- Monitoring the defect level
- Reducing the defect level
- Periodically assessing and controlling the processes[5]
Both new and existing manufacturing processes should be requalified periodically to ensure that requirements are maintained. Personnel involved in the manufacturing process should be properly trained before the equipment is introduced.
Step 3. Conduct Evaluation. The pilot production phase of the life cycle is generally the first time the equipment is evaluated in a customer environment. Thus, in addition to reliability modeling and prototype testing, engineers should work closely with customer service and field service personnel to evaluate the equipment's performance in initial customer applications under actual operating environments. A reliability qualification test (RQT) is performed to verify compliance with contractual objectives. Problems and failures occurring during testing should be carefully analyzed, and recommendations for corrective action should be issued as part of the FRACAS. Failure modes identified in the FMEA are compared with failures reported during testing, and any differences should be analyzed. Definitions of failures should be issued, and pass/fail criteria should be established. Failures generally fall into four categories[5]:
1. Catastrophic/hard failures - failures that are permanent. For equipment, these failures reflect an irreversible physical change. These failures are easily identified and replicated.
2. Marginal failures - failures that are due to dirty or degraded performance of critical components. The equipment is operational, but the output is not within acceptable limits.
3. Intermittent failures - failures that occur only because of unstable equipment or varying software conditions. Intermittent failures occur randomly and are difficult to replicate.
4. Soft failures - failures that result from temporary environmental conditions. Like intermittent failures, soft failures occur randomly and are difficult to replicate.
The pilot production phase provides the last opportunity to make design changes and other improvements before full-scale production begins.
Step 4. Are Goals and Requirements Met? Results of field testing are compared with requirements to determine whether they are met. If requirements are not met, the process moves to Step 5, where problems and root causes are identified. If requirements are met, a design review is conducted, and a go/no-go decision is made on continuing to the production and operation phase of the life cycle.
Step 5. Identify Problems and Root Causes. Sensitivity analyses, as well as feedback from the FRACAS and FMEA, are used to direct attention to problem areas and root causes. Techniques such as Pareto analysis help focus effort on the major problems first and on lower-level problems later. The process then returns to Step 2, where improvement activities and corrective actions are initiated. Steps 2 through 5 are repeated until requirements are met. Approval can then be given to move to the production and operation phase of the life cycle.
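The Pareto analysis mentioned in Step 5 can be sketched as a count of failures per cause, sorted from most to least frequent, with a running cumulative percentage. The cause names and counts below are hypothetical FRACAS entries, and the 80 % cut-off used to pick the causes to address first is a common rule of thumb rather than a requirement of this guideline.

```python
"""Pareto analysis of failure reports.

Illustrative only: counts failures per cause, orders causes from most to
least frequent, and tracks the cumulative percentage so that the few causes
accounting for most failures can be addressed first. The 80 % cut-off is a
common rule of thumb, not a requirement.
"""
from collections import Counter


def pareto(failure_causes: list[str]) -> list[tuple[str, int, float]]:
    """Return (cause, count, cumulative percent) rows, largest count first."""
    counts = Counter(failure_causes).most_common()
    total = sum(count for _, count in counts)
    rows = []
    running = 0
    for cause, count in counts:
        running += count
        rows.append((cause, count, 100.0 * running / total))
    return rows


def vital_few(rows: list[tuple[str, int, float]], threshold: float = 80.0) -> list[str]:
    """Causes needed to reach the cumulative-percent threshold."""
    selected = []
    for cause, _, cumulative in rows:
        selected.append(cause)
        if cumulative >= threshold:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical causes logged in the FRACAS during pilot production testing.
    log = (["robot arm jam"] * 9 + ["RF generator fault"] * 5
           + ["software hang"] * 3 + ["door interlock"] * 2 + ["gas leak"])
    rows = pareto(log)
    for cause, count, cumulative in rows:
        print(f"{cause:20s} {count:3d} {cumulative:6.1f} %")
    print("address first:", vital_few(rows))
```

With these hypothetical counts, the first three causes account for 85 % of the 20 failures, so they would be worked first and the remaining causes later.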
At the end of the pilot production phase of the life cycle, the following objectives have been met:
- The capability of the pilot production design has been tested and evaluated to determine whether the design can achieve the end-use requirements in the customer's operating environment.
- The equipment design for full-scale production and deployment has been finalized.
Table 3-5 summarizes the activities associated with applying the reliability improvement process to the pilot production phase of the life cycle.
Table 3-5. Reliability Improvement Process Activities for the Pilot Production Phase
1. Establish Goals and Requirements
- Update reliability requirements, as needed (E1)
- Update reliability program plan (E2)
2. Reliability Engineering and Improvements
- Update functional block diagram, if needed (E3)
- Update reliability model, if needed (E4)
- Re-allocate reliability requirements, as needed (E5)
- Upgrade testing program, as needed (T1)
- Implement FRACAS, if not already done (E17)
- Perform human reliability analyses (D2)
- Perform software reliability studies (E13)
- Perform ergonomic studies (E12)
- Update preventive maintenance program, as needed (E10)
- Continue to evaluate reliability of purchased components (E11)
- Update Life Cycle Cost (AT19)
3. Conduct Evaluation
- Conduct tests of equipment (T2)
- Evaluate equipment reliability (E6)
- Conduct design review(s) (E7)
4. Are Goals and Requirements Met?
- Compare reliability requirements to observed values
- If requirements are not met, continue to Step 5
- If requirements are met, move to production and operation phase of life cycle
5. Identify Problems and Root Causes
- Perform sensitivity analyses (E8)
- Evaluate FRACAS (E17)
- Evaluate FMEA (E14)
- Perform failure analyses on critical components (E16)
Production/Operation
Step 1. Establish Goals and Requirements. Final updates to the reliability requirements and the reliability program plan are made at this point. All major reliability problems should have been identified and corrected prior to full-scale production and deployment of the equipment.
Step 2. Reliability Engineering and Improvements. Functional block diagrams and the reliability model are updated to reflect any design changes that occurred during the pilot production phase. The FRACAS database is updated to reflect failure modes uncovered during pilot production testing, and the observed failures are also used to update the reliability model. A field tracking and customer feedback program is initiated to record operation and maintenance problems in the field. This information should account for uncertainty due to variations in site, equipment vintage, and customer procedures.
Step 3. Conduct Evaluation. Evaluation of the equipment's performance at this point consists primarily of feedback from maintenance records. However, the effect of pending corrective actions should be taken into account when predicting the equipment's future performance.
Step 4. Are Goals and Requirements Met? Here again, if requirements are not being met, problems and root causes are identified in Step 5. If requirements are being met, it is important to continually monitor equipment performance and to implement a process of continuous improvement until decisions are made to phase out the current generation of equipment and begin development of the next generation.
Step 5. Identify Problems and Root Causes. Failures and problems reported during full-scale production and deployment in the field are fed through the FRACAS to verify the failure(s) and to identify root causes and corrective actions. Pareto analyses can be used to prioritize problems. The process then returns to Step 2, where improvements and corrective actions are implemented. Steps 2 through 5 are repeated until requirements are met.
At the end of the equipment's production and operation phase, the following objectives have been met:
- The equipment is manufactured in a manner that uniformly meets the customer and supplier requirements.
- Continuous improvement goals and requirements are established and demonstrated.
Table 3-6 summarizes the activities associated with applying the reliability improvement process to the production and operation phase of the life cycle.
Table 3-6. Reliability Improvement Process Activities for the Production and Operation Phase
1. Establish Goals and Requirements
- Final update of reliability requirements, if needed (E1)
- Final update of reliability program plan (E2)
2. Reliability Engineering and Improvements
- Update FRACAS database (E17)
- Implement field tracking, customer feedback (D1), and corrective action program
- Update human reliability analyses (D2)
- Update software reliability studies (E13)
- Update ergonomic studies (E12)
- Update preventive maintenance program, as needed (E10)
- Continue to evaluate reliability of purchased components (E11)
- Update Life Cycle Cost, if required (AT19)
3. Conduct Evaluation
- Assess equipment reliability based on the field data (E6)
- Evaluate feedback from field tracking and maintenance records (D1)
4. Are Goals and Requirements Met?
- Compare requirements to observed values
- If requirements are not met, continue to Step 5
- If requirements are met:
  * Continually monitor equipment performance
  * Implement process of continuous improvement
  * Revise goals and requirements, as appropriate (E1)
  * Eventually phase out current generation equipment
5. Identify Problems and Root Causes
- Perform sensitivity analyses (E8)
- Perform failure analyses on field failures (E16)
Phase Out
Step 1. Establish Goals and Requirements. At this point in the life cycle, there are no goals or requirements to establish for the current generation of equipment. A general goal would be to set requirements for subsystems and components to be carried over to the next generation of equipment. It is also important to have documented and retained all the information gained during the life cycle phases of the current generation of equipment so that similar mistakes will not be repeated.
Step 2. Reliability Engineering and Improvements. There are no reliability engineering activities or reliability improvements to be made at this point. Phase-out alternatives should be offered to customers of current generation equipment. Possible alternatives might include:
- Training and spare parts availability for current generation equipment
- Trade-ins on new generation equipment (customer discounts)
Inventory of current generation equipment could be phased out in stages such as:
- Stage 1 - spare parts requirements are maintained
- Stage 2 - spare parts are sold to customers who still want them (last chance)
- Stage 3 - remaining spare parts are scrapped
Step 3. Conduct Evaluation. At this point, there is nothing to evaluate except the past performance of the generation of equipment being phased out. The failure rate database of the subsystems and components is carried over to the next generation of equipment for future reliability modeling.
Step 4. Are Goals and Requirements Met? Since no goals or requirements have been established, there are none to compare.
Step 5. Identify Problems and Root Causes. As previously mentioned, it is important to retain all information on the performance of the equipment being phased out so that it can be used to improve future generations of similar or new equipment.
At the end of the phase-out phase of the life cycle, the following objectives have been met:
- The discontinuation of production and field support is planned and implemented in a manner that satisfies both customer and supplier needs.
- Subsystems and components carried over to the next generation of equipment are evaluated for information that can improve the next generation.
- A failure rate database has been developed for subsystems and components for the next generation of equipment.
Table 3-7 summarizes the activities involved in applying the reliability improvement process to the phase-out phase of the life cycle.
Table 3-7. Reliability Improvement Process Activities for the Phase-Out Phase
1. Establish Goals and Requirements
- Set requirements for subsystems and components to be carried to next generation of equipment
- Document and retain all information gathered during generation of equipment being phased out
2. Reliability Engineering and Improvements
- Offer phase-out alternatives to customers of equipment being phased out
- Phase out current generation equipment in stages
3. Conduct Evaluation
- Assess reliability of the current generation (E6) and carry information to next generation of equipment
4. Are Goals and Requirements Met?
- There are no goals or requirements to meet
5. Identify Problems and Root Causes
- Retain all information on equipment being phased out so that it can be used in future generations of equipment
3.4 Specific Applications of the Reliability Improvement Process
When the reliability improvement process is applied for the first time to equipment in a later phase of the life cycle (other than concept and feasibility), the activities will vary from those discussed earlier. This is because the activities that would have been performed in the previous life cycle phase(s) have not been performed and must, to some extent, be made up.
The following discussion is based on starting the reliability improvement process in some phase of the life cycle other than the concept and feasibility phase and then continuously applying it throughout the remaining phases. For example, if the process is being applied for the first time to equipment that is already in the prototype phase, the activities associated with each step of the process for that phase and all subsequent phases (pilot production, production and operation, and phase out) are considered. The activities associated with phases beyond the one in which the process is initiated are, however, basically the same as those discussed earlier. Therefore, not every process step in every life cycle phase is discussed in detail; only the differences are highlighted.
3.4.1 Starting with Equipment in the Design Phase
When equipment has reached the design phase, the basic concept has already been established and is fixed in the minds of the design engineers. It is more difficult to incorporate customer needs into the design in this phase than in the concept and feasibility phase. However, it is not too late, and it is clearly important, to incorporate customer needs and requirements when establishing reliability goals. If a reliability program plan has not been initiated, initiate one at this time.
If the functional block diagrams and the corresponding reliability model were not initiated in the concept and feasibility phase, develop them now. Equipment reliability requirements are then allocated to the individual major subsystems in the model, and failure data are collected for use in the reliability model. The other activities associated with applying the reliability improvement process to the remaining process steps and life cycle phases are identical to those discussed earlier and listed in Tables 3-4 through 3-7, so they are not repeated here. Table 3-8 summarizes the activities associated with applying the reliability improvement process to equipment that is in the design phase. These activities are similar to those listed in Table 3-3; the differences are in the activities listed under Steps 1 and 2.
Table 3-8. Design Phase Reliability Improvement Process Activities
Reliability Improvement Process Step / Activities
1. Establish Goals and Requirements
   - Establish reliability goals and requirements (E1)
   - Establish reliability program plan (E2)
2. Reliability Engineering and Improvements
   - Apply design-for-reliability practices (E9)
   - Develop functional block diagram (E3)
   - Develop reliability model (E4)
   - Allocate requirements to subsystems and components (E5)
   - Collect failure data for subsystems and components (D1)
   - Evaluate reliability of purchased components (E11)
   - Perform ergonomic studies (E12)
   - Conduct software reliability studies (E13)
   - Implement FMEA (E14)
   - Develop life cycle cost (AT19)
3. Conduct Evaluation
   - Predict equipment reliability (E6)
   - Conduct design reviews (E7)
4. Are Goals and Requirements Met?
   - Compare reliability requirements to predicted values
   - If requirements are not met, continue to Step 5
   - If requirements are met, move to the prototype phase of the life cycle
5. Identify Problems and Root Causes
   - Perform sensitivity analyses (E8)
   - Evaluate FMEA (E14)
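Steps 2 and 3 of Table 3-8 call for a reliability model (E4) and a reliability prediction (E6). The following is a minimal sketch of the simplest case, assuming a series block diagram (every subsystem must work for the equipment to work) and constant failure rates, so the subsystem failure rates add and the predicted MTBF is the reciprocal of the total. The subsystem names, failure rates, and the 400-hour requirement are hypothetical, not values from this guideline.

```python
import math

def predict_series(failure_rates):
    """Series model, constant failure rates: system rate is the sum of subsystem rates."""
    system_rate = sum(failure_rates.values())  # failures per hour
    return system_rate, 1.0 / system_rate      # (rate, predicted MTBF in hours)

def reliability(system_rate, hours):
    """Probability of running `hours` without failure, R(t) = exp(-rate * t)."""
    return math.exp(-system_rate * hours)

# Hypothetical subsystem failure rates, in failures per hour.
subsystems = {
    "wafer handler": 1.0e-3,
    "vacuum system": 0.5e-3,
    "RF generator": 0.25e-3,
    "controller": 0.25e-3,
}
required_mtbf = 400.0  # hypothetical requirement, hours

rate, mtbf = predict_series(subsystems)
print(f"predicted MTBF: {mtbf:.0f} h, R(100 h) = {reliability(rate, 100):.3f}")
# Step 4: compare the requirement with the predicted value.
print("requirement met" if mtbf >= required_mtbf else "requirement not met: go to Step 5")
```

A series model ignores redundant or standby paths; equipment with redundancy needs a fuller reliability block diagram (AT15).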
3.4.2 Starting with Equipment in the Prototype Phase
For equipment already in the prototype phase of the life cycle, the design is fixed, and cost and time constraints leave little opportunity for major design changes. However, it is still important to set goals and to understand and establish customer requirements. Furthermore, available failure data can be used to assess the current performance of the equipment when upgrading goals and requirements. If a reliability program plan has not been developed, create one that identifies and ties together all of the reliability improvement process activities to be performed during the prototype phase and subsequent phases of the life cycle. Develop the functional block diagrams and reliability models to better understand and predict the reliability of the equipment designs being prototyped. Update these models as the design changes, recognizing that they may become more complex as the design evolves. Develop detailed breakdowns of the subsystems that are significant contributors to system unreliability. Allocate reliability requirements to the individual subsystems, and then divide the subsystem allocations further into component allocations. The allocation process serves as a guide for improving the reliability of the equipment components and subsystems. Table 3-9 summarizes the activities associated with applying the reliability improvement process to equipment that is in the prototype phase. The activities associated with applying the process to the remaining life cycle phases are identical to those discussed earlier and listed in Tables 3-5 through 3-7; therefore, details are not listed here.
5. Identify Problems and Root Causes
   - Perform sensitivity analyses (E8)
   - Evaluate FRACAS (E17)
   - Evaluate FMEA (E14)
   - Perform failure analyses on critical components (E16)
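The allocation described for the prototype phase, in which subsystem allocations are further divided into component allocations, can be sketched as follows. The guideline does not prescribe an allocation method; this sketch assumes a series system with constant failure rates and uses a weighted (ARINC-style) apportionment, in which each item's share of the parent failure-rate goal is proportional to its predicted failure rate. All names and numbers are hypothetical.

```python
def allocate(goal_rate, predicted_rates):
    """Split a failure-rate goal among items in proportion to their predicted rates.

    ARINC-style weighting (an assumed method). For a series system with constant
    failure rates, the allocated rates sum to the parent goal.
    """
    total = sum(predicted_rates.values())
    return {name: goal_rate * rate / total for name, rate in predicted_rates.items()}

# Hypothetical equipment goal: MTBF of 1000 h, i.e. 1e-3 failures per hour.
system_goal_rate = 1.0 / 1000.0
subsystem_prediction = {"wafer handler": 1.0e-3, "vacuum system": 0.5e-3, "controller": 0.5e-3}
subsystem_alloc = allocate(system_goal_rate, subsystem_prediction)

# Divide the wafer handler's allocation among its (hypothetical) components.
handler_components = {"robot arm": 0.6e-3, "end effector": 0.3e-3, "sensors": 0.1e-3}
component_alloc = allocate(subsystem_alloc["wafer handler"], handler_components)

for name, rate in subsystem_alloc.items():
    print(f"{name}: allocated MTBF >= {1 / rate:.0f} h")
for name, rate in component_alloc.items():
    print(f"  {name}: allocated MTBF >= {1 / rate:.0f} h")
```

Because the allocated rates of a series system sum to the parent goal, meeting every component allocation meets the subsystem allocation, and meeting every subsystem allocation meets the equipment goal.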
3.4.3 Starting with Equipment in the Pilot Production Phase
For equipment in the pilot production phase of the life cycle, the focus should be on appraising the actual level of equipment reliability (from available data) and determining what levels are desired and obtainable. Establishing goals and requirements remains an important step for meeting customer requirements. A reliability program plan can still be created to identify and tie together all of the reliability improvement process activities that will be performed during the pilot production phase and subsequent phases of the equipment life cycle.
Once the equipment is evaluated, most of this effort should be directed at making needed design improvements. It is not too late to incorporate some design-for-reliability practices, but the focus should be on reliability growth activities directed at the existing design. A method for collecting, tracking, and storing reliability data should be established. A FRACAS can be initiated to track failures reported during pilot production and to identify the corrective actions needed to eliminate them. It is still not too late to initiate an FMEA, and ergonomic studies can be used very effectively at this point. Table 3-10 summarizes the activities associated with applying the reliability improvement process to equipment starting in the pilot production phase.
Table 3-10. Pilot Production Phase Reliability Improvement Process Activities When Initiated in Pilot Production Phase
Reliability Improvement Process Step / Activities
1. Establish Goals and Requirements
   - Establish reliability goals and requirements (E1)
   - Establish reliability program plan (E2)
2. Reliability Engineering and Improvements
   - Create functional block diagram (E3)
   - Create reliability model (E4)
   - Allocate reliability goals and requirements (E5)
   - Establish data collection and tracking system (D1)
   - Establish testing program (T1)
   - Establish FRACAS (E17)
   - Establish FMEA (E14)
   - Perform human reliability analyses (D2)
   - Perform ergonomic studies (E12)
   - Perform software reliability studies (E13)
   - Establish preventive maintenance program (E10)
   - Evaluate reliability of purchased components (E11)
3. Conduct Evaluation
   - Evaluate equipment reliability (E6)
   - Conduct tests of equipment (T2)
   - Conduct design review(s) (E7)
4. Are Goals and Requirements Met?
   - Compare goals and requirements to observed values
   - If requirements are not met, continue to Step 5
   - If requirements are met, move to the production and operation phase
5. Identify Problems and Root Causes
   - Perform sensitivity analyses (E8)
   - Evaluate FRACAS (E17)
   - Evaluate FMEA (E14)
   - Perform failure analyses on critical components (E16)
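During pilot production, the FRACAS (E17) and the Pareto diagram (AT9) work together: each reported failure is logged along with its corrective action, and ranking the logged failure modes by frequency shows where corrective action pays off first. This is a minimal sketch; the record fields and the sample reports are hypothetical and are not a prescribed FRACAS format.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class FailureReport:
    """One FRACAS entry (illustrative fields only)."""
    report_id: int
    subsystem: str
    failure_mode: str
    corrective_action: str = ""  # empty until a corrective action is assigned

def pareto(reports):
    """Rank (subsystem, failure mode) pairs by count, with cumulative percentage."""
    counts = Counter((r.subsystem, r.failure_mode) for r in reports)
    total = sum(counts.values())
    cumulative = 0
    ranked = []
    for key, n in counts.most_common():
        cumulative += n
        ranked.append((key, n, 100.0 * cumulative / total))
    return ranked

def open_items(reports):
    """IDs of reports that still lack a corrective action."""
    return [r.report_id for r in reports if not r.corrective_action]

# Hypothetical failure reports from pilot production.
log = [
    FailureReport(1, "wafer handler", "wafer drop", "Replaced gripper pads"),
    FailureReport(2, "wafer handler", "wafer drop"),
    FailureReport(3, "vacuum system", "slow pumpdown"),
    FailureReport(4, "wafer handler", "wafer drop"),
    FailureReport(5, "controller", "software hang", "Patched watchdog timer"),
]

for (subsystem, mode), n, cum in pareto(log):
    print(f"{subsystem:15} {mode:15} {n}  {cum:5.1f}%")
print("reports without corrective action:", open_items(log))
```

Here the wafer drop accounts for 60% of the reports, so it is the first candidate for root cause analysis in Step 5.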
3.4.4 Starting with Equipment in the Production and Operation Phase
For equipment in the production and operation phase of the life cycle, the design is fixed, and there is no opportunity to make major design changes. Thus, the focus of Step 1 should be on appraising the actual reliability of the equipment, determining the desired levels, and evaluating whether those levels are achievable. Upgrades to existing equipment can be made based on analyses of failure data. Although it is rather late in the life cycle, creating a reliability program plan to track the activities to be performed during this phase and the phase-out period is still beneficial. Because it is too late to design reliability into the system, efforts should focus on making needed improvements to the existing design and on reliability growth activities. Table 3-11 summarizes the activities associated with applying the reliability improvement process to equipment that is in the production and operation phase of the life cycle. The activities associated with applying the improvement process to the phase-out phase are identical to those discussed earlier and listed in Table 3-7; therefore, they are not listed here.
Table 3-11. Production and Operation Phase Reliability Improvement Process Activities When Initiated in Production and Operation Phase
Reliability Improvement Process Step / Activities
1. Establish Goals and Requirements
   - Establish reliability goals and requirements (E1)
   - Establish reliability program plan (E2)
2. Reliability Engineering and Improvements
   - Develop functional block diagram (E3)
   - Create reliability model (E4)
   - Allocate goals and requirements (E5)
   - Establish FRACAS (E17)
   - Establish FMEA (E14)
   - Implement field tracking and customer feedback program (D1)
   - Perform human reliability analyses (D2)
   - Perform ergonomic studies (E12)
   - Perform software reliability studies (E13)
   - Establish preventive maintenance program (E10)
   - Evaluate reliability of purchased components (E11)
3. Conduct Evaluation
   - Assess equipment reliability using the field data (E6)
   - Evaluate feedback from field tracking and maintenance records (D1)
   - Use FRACAS to evaluate field failures (E17)
4. Are Goals and Requirements Met?
   - Compare goals and requirements to observed values
   - If requirements are not met, continue to Step 5
   - If requirements are met:
     * Continually monitor equipment performance
     * Implement a process of continuous improvement
     * Eventually phase out current generation equipment
5. Identify Problems and Root Causes
   - Perform sensitivity analyses (E8)
   - Perform failure analyses (E16)
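Steps 3 and 4 of Table 3-11 assess equipment reliability from field data and compare it with the goals. A minimal sketch, assuming each tool's field record gives operating hours, failure count, and repair hours for one reporting period; MTBF and MTTR are computed as simple pooled point estimates (total hours divided by total failures). The record layout, tool IDs, and goal values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FieldRecord:
    """Hypothetical per-tool field summary for one reporting period."""
    tool_id: str
    operating_hours: float
    failures: int
    repair_hours: float  # total hours spent repairing those failures

def observed_mtbf_mttr(records):
    """Pooled point estimates: MTBF = hours / failures, MTTR = repair hours / failures."""
    hours = sum(r.operating_hours for r in records)
    failures = sum(r.failures for r in records)
    repair = sum(r.repair_hours for r in records)
    if failures == 0:
        raise ValueError("no failures observed; MTBF point estimate is undefined")
    return hours / failures, repair / failures

def goals_met(mtbf, mttr, mtbf_goal, mttr_goal):
    """Step 4: MTBF must reach its goal and MTTR must not exceed its goal."""
    return mtbf >= mtbf_goal and mttr <= mttr_goal

field = [
    FieldRecord("A-01", 2000.0, 5, 15.0),
    FieldRecord("A-02", 1800.0, 3, 9.0),
    FieldRecord("A-03", 2200.0, 4, 12.0),
]
mtbf, mttr = observed_mtbf_mttr(field)
print(f"observed MTBF = {mtbf:.0f} h, MTTR = {mttr:.1f} h")
if goals_met(mtbf, mttr, mtbf_goal=600.0, mttr_goal=4.0):
    print("goals met: monitor performance and continue improving")
else:
    print("goals not met: continue to Step 5")
```

With these numbers the observed MTBF is 500 hours against a 600-hour goal, so the process continues to Step 5 even though the MTTR goal is met.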
3.4.5 Starting with Equipment in the Phase-Out Phase
It is much too late to make any changes to the equipment during the phase-out phase. The goal in this phase is limited to collecting reliability data on the equipment to gain insight for the next generation of equipment. This information can save tremendous amounts of time and money in the concept and feasibility phase of the next generation. There are no reliability engineering activities or reliability improvements to be made at this point. Phase-out alternatives should be offered to customers of current generation equipment. Table 3-12 summarizes the activities involved in applying the reliability improvement process to equipment that is in the phase-out phase of the life cycle. This table is nearly identical to Table 3-7.
Table 3-12. Phase-Out Phase Reliability Improvement Process Activities When Initiated in Phase-Out Phase
Reliability Improvement Process Step / Activities
1. Establish Goals and Requirements
   - Set requirements for subsystems and components to be carried to the next generation of equipment
   - Document and retain all information gathered for the generation of equipment being phased out
2. Reliability Engineering and Improvements
   - Offer phase-out alternatives to customers of equipment being phased out
   - Phase out current generation equipment in stages
3. Conduct Evaluation
   - Create reliability model of subsystems and components carried to next generation equipment (E4)
4. Are Goals and Requirements Met?
   - There are no goals or requirements to meet
5. Identify Problems and Root Causes
   - Retain all information on equipment being phased out so that it can be used in future generations of equipment
3.5 Functional Responsibilities
The executive and technical reliability champions own the reliability improvement process. However, various groups are assigned responsibility for implementing and maintaining the process activities during the life cycle of a piece of equipment. The group that should be held accountable depends on the life cycle phase and the activity being performed. Both managers and engineers are given responsibility for activities. Although a particular group is assigned overall responsibility for an activity, other groups may assist or actually perform it. Because each company has a unique management structure, the reliability champions' responsibilities include choosing the appropriate groups to assist with, participate in, and own each activity. For companies that have a reliability engineering group, the following paragraphs present recommended practices and organizational guidelines that will help make the reliability improvement process activities successful.
Recommended practices for reliability engineers:
- The engineering group and designers (not the reliability engineers) are accountable for the reliability of the design and the cost of poor reliability.
- All designers are trained in basic reliability methods and tools by the reliability group.
- Reliability engineers are part of the design team.
- Reliability engineers assist designers.
- The reliability group is accountable for reliability planning, program development, and assuring adherence to program policy.
Organizational guidelines for the reliability engineering group:
- The group reports to the development engineering manager, not to quality assurance.
- The group reports to the systems engineering manager, not to field service.
- Reliability engineer(s) report to the program manager of the equipment, along with the other members of the design team, not to operations.
- The group exists as a separate peer group with engineering, not as part of sales. (Caution: this can lead to reliability engineers being held accountable for reliability and becoming isolated from the design team.)
3.6 Where to Begin
One of the most difficult problems facing a company is where to begin. In an ideal environment, a reliability program would evolve along with the formation of the company and the development of its first product. A master plan for continuous reliability improvement would have been established, and reliability activities would have been initiated as needed throughout the equipment's life cycle. In a more typical situation, a company has an informal reliability effort. This effort may be applied sporadically, based on the personal style and management priorities of the equipment development manager. If the company's equipment has poor reliability in the field, a major engineering project may be initiated to fix specific reliability problems. Otherwise, the company faces losing business to the competition. The management team frequently does not recognize the need for, or require development of, a core reliability program that ensures ongoing attention to reliability requirements for all equipment. Even if management recognizes the need for the reliability process, they often find themselves in a reactive mode with current equipment problems and limited resources. Management may also be unwilling to wait for the benefits of a reliability program that is developed at the same time as its next product. Although each company's situation is unique, there are some general guidelines that can be used to determine where implementing a reliability program would be most effective. The first step is to assess where in the life cycle each equipment line falls and to determine its current reliability performance. The ultimate goal is to choose one equipment line on which to focus reliability improvement activities. In general, the earlier in the equipment life cycle reliability improvement activities are implemented, the greater the benefits.
A supplier is likely to be developing more than one equipment line at any given time, each in a different phase of its life cycle. For example, Figure 3-1 shows three equipment lines, each in a different phase of its life cycle:
- Equipment A is in full production.
- Equipment B is in the design phase.
- Equipment C is just beginning the concept and feasibility phase.
Benefits can be gained by applying the reliability improvement process to any of these three equipment lines. However, some starting points are more favorable than others.
Figure 3-1. Multiple Equipment and Their Life Cycle Phase Status [figure: life cycle phase timelines, from Design through Phase Out, for three equipment lines plotted against Time, with a marker at Today]
Equipment C has the greatest potential for cost-effective reliability improvements because it is in the earliest phase of its life cycle. However, this does not mean that it is too late to improve the reliability of Equipment A and B; reliability improvements can and should be considered in every phase of the life cycle. When starting a reliability improvement process, though, it is generally advantageous to choose equipment that will show immediate successes. If sufficient resources exist, address all equipment in all life cycle phases. Because that is unlikely, the following priorities are recommended:
1. Equipment in the Production and Operation Phase. Although this is a reactive strategy, it is the most customer oriented and can demonstrate quick benefits. Another benefit of starting with equipment in this phase is that field data on the equipment is available and can be used to determine current reliability performance. If you cannot determine your current situation, it is difficult to set realistic goals and to determine whether they have been met. It is also important to assess the impact of upgrades to equipment in this phase using the reliability model and existing failure data.
2. Equipment in the Design Phase. This is a proactive strategy and has the greatest long-term benefits. In this phase, it is difficult to determine what the reliability performance of the equipment will be unless the previous generation has a database and a significant number of similar parts. If this information exists, it can be used with modeling to evaluate the potential performance of the designs being considered.
3. Equipment in the Prototype or Pilot Production Phase. Starting in these phases is reactive, and the benefits fall between those of the first two priorities. Some data is available, so the anticipated reliability performance of the equipment in the field can be determined. The drawback is the expense and time involved if major design changes are necessary.
4. Equipment in the Concept and Feasibility Phase. This is a proactive strategy, and this is the least expensive phase in which to act. Significant reliability improvements can be made to equipment in this phase with minimal use of resources. However, as with the design phase, the lack of data makes it difficult to determine reliability performance.
5. Equipment in or near Phase Out. In general, ignore this equipment; activities should be limited to customer requests. However, if the product being phased out has future generations that are significant to the company's strategic plan, collecting data and analyzing failures of the product will yield tremendous insight into development of the next generation.
When making a choice, choose equipment that you know will have future generations. As mentioned in Section 1.0, the cost of improving equipment reliability decreases as the equipment moves from generation to generation. Knowing the reliability performance of existing equipment is essential for evaluating current equipment status and for setting reliability goals for current and future equipment; without this knowledge, it is difficult to set realistic and attainable performance goals. Table 3-13 illustrates the type of reliability performance information that is available for the three equipment lines shown in Figure 3-1.
Table 3-13. Current Product Line Status
Equipment   Current Life Cycle Phase    Current Reliability Performance
A           Production and Operation    Actual MTBFp, Actual MTTR
B           Design                      Predicted MTBFp, Predicted MTTR
C           Concept and Feasibility     Goal MTBFp, Goal MTTR
Mean time between failures (MTBFp) and mean time to repair (MTTR) are the two measures of reliability performance used in this illustration. SEMI Standard E10-90 [6] provides several other measures of reliability. Table 3-13 indicates that the MTBFp and MTTR values are known for Equipment A. Actual data are not available for Equipment B and C because they are in early stages of development. However, Equipment B has predicted values based on the design, and Equipment C has goals that it is targeted to meet.
Reliability and design engineers determine current reliability performance by collecting and analyzing data received from a number of sources, including:
- Field service reports
- Customer feedback
- In-house testing
Where data is not available but reliability performance needs to be determined, preliminary engineering judgements, mathematical predictions, and expert consensus can be used as a first cut at data values.
As discussed previously, one of the cornerstones of reliability improvement is the reliability data reporting system: an organized means of gathering factual data about equipment performance, both good and bad. Although useful data estimates can be made during the concept and feasibility phase and the design and development phases of the equipment life cycle, the most meaningful data is collected during the production and operation phase, when the equipment is operating in its intended environment. Nevertheless, information gathered in any phase of the life cycle can be used to help attain the reliability goals with minimal time and expense. Section 4.0 discusses in detail the activities associated with data collection and analysis. These activities include determining:
- What data to collect
- How to use this data
- The most effective format to use when collecting data
- How to transform the data into failure rates
- How to get numerical values for human errors
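One of the data activities listed above is transforming the data into failure rates. A minimal sketch, assuming each record is a hypothetical (site, operating hours, failures) tuple and a constant failure rate, so the rate is simply failures divided by operating hours; it is expressed here in failures per million hours. Comparing the per-site rates with the pooled rate gives a first look at site-to-site variation. The sites and numbers are invented for illustration.

```python
from collections import defaultdict

def failure_rates(records):
    """Return (per-site rates, pooled rate) in failures per million operating hours."""
    hours = defaultdict(float)
    failures = defaultdict(int)
    for site, operating_hours, n_failures in records:
        hours[site] += operating_hours
        failures[site] += n_failures
    per_site = {site: 1e6 * failures[site] / hours[site] for site in hours}
    pooled = 1e6 * sum(failures.values()) / sum(hours.values())
    return per_site, pooled

# Hypothetical records for one equipment model: (site, operating hours, failures).
records = [
    ("site 1", 4000.0, 8),
    ("site 1", 6000.0, 12),
    ("site 2", 5000.0, 5),
    ("site 3", 5000.0, 15),
]
per_site, pooled = failure_rates(records)
for site, rate in per_site.items():
    print(f"{site}: {rate:.0f} failures per million hours")
print(f"pooled: {pooled:.0f} failures per million hours (MTBF = {1e6 / pooled:.0f} h)")
```

Here site 3 fails three times as often as site 2, which would prompt a look at differences in site procedures or equipment vintage before relying on the pooled figure.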
It is important to note that an effective reliability improvement process includes a central database that holds data collected for all equipment of the same model or type and accounts for uncertainty due to variations in site, equipment vintage, and customer procedures.
3.7 Reliability Plans
The supplier should develop several reliability plans: a general company plan covering all products, and specific product plans for individual equipment lines. The following six elements must be included in these plans:
1. Objectives
2. Constraints, limitations, and requirements that exist at the time the plan is written
3. Basic assumptions made
4. Activities to be performed to meet the objectives
5. Resources required to perform the planned activities
6. A schedule showing when the activities will be started and completed
General Company Plan
An overall reliability plan is tailored to the company, taking into account the company's size and available resources. The plan addresses the following issues:
- The company's reliability policy
- Identification of reliability champions
- The overall strategy
- How reliability skills will be acquired within the company
- A description of organizational activities
Specific Product Plans
Each equipment line requires a reliability plan based on the life cycle phase of the equipment line, reliability goals and requirements, schedule limitations, and available resources. The more stringent the goals, the more activities, tools, and resources are required to achieve them. Also, the shorter the schedule, the more resources must be applied over the scheduled period. The plan identifies the specific reliability activities and tools that will be used for a specific equipment line, and who (or which department) is responsible for performing them.
3.8 Application of Resources and Communicating Value
There are typically two difficult problems facing an organization at this point:
- Applying limited and already allocated resources to what appears to be a monumental undertaking
- Communicating the value of the reliability improvement process to key decision makers and participants
In an ideal environment, a master plan for continuous reliability improvement would have been established, and reliability activities would have been initiated as needed throughout the equipment's life cycle. In a more typical situation, a company has an informal reliability effort. This effort may be applied sporadically, based on the personal style and management priorities of the equipment development manager. If the company's equipment has poor reliability recorded in the field, a major engineering project may be initiated to fix specific reliability problems. Otherwise, the company faces losing business to the competition. The management team frequently does not recognize the need for, or require development of, a core reliability plan that ensures ongoing attention to reliability requirements for all equipment. Even if management recognizes the need for the reliability process, they often find themselves in a reactive mode with current equipment problems and limited resources. Management may also be unwilling to wait for the benefits of implementing a reliability improvement process that is developed at the same time as its next product. Ideally, once a piece of equipment has been selected for the reliability improvement process, responsible individuals or groups would perform all the activities within the process steps. If resources are limited, individuals or groups would perform selected activities. The choice of activities depends on the company, and ultimately only the company's people know what resources can be successfully deployed and the best time frame for employing these activities.
However, the following items should be considered:
- Select activities that require various groups to work together on reliability improvement. This extends ownership of the reliability mission and shows success across multiple fronts.
- Initially choose activities that will give immediate benefits. Implementing the reliability improvement process requires a long-term vision and commitment, but the engineer needs to "sell" management and participants on the advantages of the activities, which generally requires demonstrating some improvements almost immediately.
- If portions of an activity are already in place, build on them.
- Specific reliability skills training should be given to individuals as they become directly involved and are ready to apply new skills to real issues.
- The vision of reliability for the equipment, and the plan for how that reliability is going to be met, should be discussed early to orient everyone in the company to the reliability effort.
Implemented as described, the reliability process comes into place in a somewhat piecemeal fashion. However, this approach offers an effective means of applying limited resources to real and timely issues. When this approach is used, it is particularly important to have a technical champion manage the entire equipment reliability effort, which ensures a coherent and well-coordinated development effort. It is best to start small: start with one piece of equipment, implementing the activities that fit best in your company. Attempting to implement the process for all equipment simultaneously generally does not work. Once the reliability process for one piece of equipment is in place and the next piece of equipment is targeted for reliability improvement, find the activities that overlap. For example, if components or subsystems in the first piece of equipment are identical or very similar to those in the next, combine the databases and reliability models for those parts.
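As the items above note, the engineer needs to "sell" early improvements to management, and stating an improvement as a cost saving is one direct way to do that. A minimal arithmetic sketch, assuming a constant failure rate so that the expected number of failures per year is operating hours divided by MTBF; the operating hours, cost per failure, MTBF values, and installed base are all hypothetical.

```python
def annual_failure_cost(operating_hours, mtbf_hours, cost_per_failure):
    """Expected yearly failure cost: (operating hours / MTBF) failures x cost per failure."""
    return operating_hours / mtbf_hours * cost_per_failure

HOURS_PER_YEAR = 6000.0     # hypothetical operating hours per tool per year
COST_PER_FAILURE = 5000.0   # hypothetical parts, labor, and lost production per failure
INSTALLED_BASE = 10         # hypothetical number of tools in the field

before = annual_failure_cost(HOURS_PER_YEAR, 200.0, COST_PER_FAILURE)  # MTBF before
after = annual_failure_cost(HOURS_PER_YEAR, 300.0, COST_PER_FAILURE)   # MTBF after
savings = (before - after) * INSTALLED_BASE
print(f"per tool: ${before:,.0f} -> ${after:,.0f} per year")
print(f"installed base savings: ${savings:,.0f} per year")
```

Raising MTBF from 200 to 300 hours cuts expected failures per tool from 30 to 20 a year in this example; a real estimate would also count repair time and lost wafer output, which this sketch folds into a single cost per failure.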
Communicating Value
Communicating the value of the reliability effort to key decision makers and participants is vitally important and can be accomplished in three ways:
1. Translate the reliability efforts and benefits into measures such as cost savings, resource or cost avoidance, time to market, or market share gain.
2. Demonstrate a series of immediate short-term improvements, and document those improvements, noting the benefits gained.
3. Develop a champion in senior management who will support your reliability efforts when top-level support is needed. The champion has the respect of decision makers and also the authority to influence and encourage participants.
3.9 Summary
The role management plays in the reliability improvement process is vital. Management has unique responsibilities in establishing and implementing the process, and it assigns individuals to the role of reliability champions. The executive champion provides reliability leadership with the full support of upper management. The technical champion establishes the reliability improvement process and is responsible for its success.
The five steps of the reliability improvement process can be applied to a piece of equipment no matter what life cycle phase it is in. This section discussed the activities associated with each step of the reliability improvement process for each phase of the life cycle, and how to select a piece of equipment on which to implement a reliability program based on its life cycle phase. It also covered the importance of data, the choice of activities when resources are limited, rules for reliability plans, and suggestions on how to communicate the value of the reliability effort to key decision makers and participants in the reliability program. Section 4.0 provides more detailed descriptions of the reliability-related activities and presents some of the tools and techniques available for planning, developing, and implementing a reliability improvement program.
3.10 References
1. MIL-HDBK-217E, Reliability Prediction of Electronics Components.
2. Non-Electronics Part Reliability Data, Reliability Analysis Center, Rome, NY, 1991.
3. RMS Committee, RMS, Reliability, Maintainability & Supportability Guidebook, SAE G-11, Society of Automotive Engineers, Inc., Warrendale, PA, 1990.
4. DOD 4245.7-M, Transition from Development to Production, September 1985.
5. William W. Everett, et al., Reliability by Design: A Guide to Reliability Management, Issue 1, AT&T, Indianapolis, IN, November 1990.
6. SEMI E10-90, Guideline for Definition and Measurement of Equipment Reliability, Availability, and Maintainability (RAM), SEMI, 1990.
4.1 Introduction
The preceding sections of these guidelines provided an overview of the reliability improvement process and the equipment life cycle. This section describes the activities and tools that are part of the reliability improvement process. The reliability activities are grouped as:
- Engineering
- Data
- Testing
Engineering activities form the foundation of the reliability improvement process. Data activities also play an important role because the engineering activities depend on data, and testing activities provide a valuable source of data and information. Three designators are used for the activities: E (engineering), D (data), and T (testing). Each designator, followed by a number, gives the location of the activity in this section. Some of the activities stand alone; that is, they do not require any formally recognized tools of the trade. However, many of the activities use standard methods and techniques, referred to as tools, which come from academic disciplines such as probability and statistics, and reliability engineering. The designator used for the tools is AT, followed by a number.
4.2 Reliability Activities
The following lists summarize the reliability activities that are discussed in this section.
Engineering Activities
E1 Reliability Goals
E2 Reliability Program Plan
E3 Functional Block Diagrams
E4 Equipment Reliability Modeling
E5 Reliability Goal Allocation
E6 Equipment Reliability Quantification
E7 Design Reviews
E8 Sensitivity Analysis
E9 Design for Reliability Practices
E10 Preventive Maintenance Program (PM)
E11 Reliability of Purchased Components
E12 Ergonomic Studies
E13 Software Reliability Studies
E14 Failure Modes and Effects Analysis (FMEA)
E15 Equipment Characterization
E16 Component Failure Analysis
E17 Failure Reporting and Corrective Action System (FRACAS)
Data Activities
D1 Data Collection and Data Base Management
D2 Human Reliability Analysis (HRA)
Testing Activities
T1 Test Plans
T2 Reliability Tests
Reliability Tools
The following list summarizes the reliability tools that are discussed in this section.
AT1 Accelerated Testing
AT2 Burn-In Testing
AT3 Cause & Effect (Fishbone) Diagram
AT4 Competitive Benchmarking
AT5 Design of Experiments (DOE)
AT6 Environmental Stress Screening (ESS)
AT7 Fault Tree Analysis (FTA)
AT8 Life Testing
AT9 Pareto Diagram
AT10 Process Capability
AT11 Quality Function Deployment (QFD)
AT12 Reliability, Analysis and Modeling Program (RAMP) Software
AT13 Reliability Development/Growth Testing (RD/GT)
AT14 Reliability Qualification Testing (RQT)
AT15 Reliability Block Diagram Modeling (RBD)
AT16 Repairable Systems Analysis
AT17 Taguchi Methodology
AT18 User Groups
AT19 Cost of Ownership Calculations
The following pages discuss each activity, followed by descriptions of the tools in enough detail that the reader can either use each tool or understand what it can be used for. References are provided at the end of each activity or tool description that requires more detail. Much of the material used in the activity and tool descriptions comes directly from the references. The purpose of this section is not to recreate work that has already been done well, but rather to tell the reader what each activity or tool is about and where to go for more information.
Even though safety and maintainability goals are not addressed in these guidelines, they must be mentioned because of the key interactive role they play with reliability. Designers should identify safety, maintainability, and reliability goals at the same time. Since maintainability is built into equipment, it is primarily addressed in the concept and feasibility phase and the design phase. Maintainability is achieved by carefully considering and balancing numerous factors, such as the basic physical configuration and layout of the design, test provisions for quick fault location, interchangeability of replaceable parts, adequate maintenance procedures, and the skill levels of technicians. As with reliability, pertinent data is collected to estimate the maintainability measures and to ensure that the maintainability goals are being achieved.
It is important to remember that setting reliability goals is not a one-time affair; it is a continuous process of gradual improvement toward the goals over time.
Applicable Tools
AT4 Competitive Benchmarking
AT11 Quality Function Deployment (QFD)
References
Ireson, W., C. Coombs, Jr., editors, Handbook of Reliability Engineering and Management, NY: McGraw-Hill, 1988, pp. 2.3-2.8.
Equipment-Specific Plan. Each piece of equipment requires a reliability plan based on the equipment's life cycle phase, reliability goals, schedule, and available resources. The loftier the goal, the more activities, tools, and resources are required to meet it. Also, the shorter the schedule, the more resources must be applied over the scheduled period. The plan identifies the specific reliability activities and tools to be used for the particular equipment, and who (or what department) is responsible for performing them.
References
MIL-STD-785B, Reliability Program For Systems And Equipment Development And Production, Task 101, 15 September 1980.
An example of a functional block diagram is given in the figure above. The functional block diagram represents a hypothetical personal computer (PC). As the diagram shows, the PC has two hard disk drives (HD1, HD2) and two floppy drives (FD1, FD2). The keyboard, I/O board, RAM card, disk controller card, and video control card all derive their power from the power supply via the mother board. The CRT (monitor) is a separate unit with its own power supply. The need for schematics and flow diagrams is well recognized, but these are typically too complex to use directly. It is important to construct diagrams that depict clearly and simply how the equipment functions. When the functional block diagram is constructed properly, the subsystems, components, support systems, and human actions that lead to equipment failure should be obvious.
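The power relationships in the PC example can be captured in a small data structure. The following Python sketch is illustrative only: the dictionary `POWER_FEEDS` and the function `blocks_losing_power` are hypothetical names, and only the power paths stated in the text are encoded (the drive power paths are not described, so the drives are omitted). It walks the diagram to list every block that loses power when a given block fails.

```python
from collections import deque

# Hypothetical encoding of the PC functional block diagram: each key supplies
# power to the blocks listed under it. Only the power paths stated in the
# text are included; the hard and floppy drives are omitted because their
# power paths are not described.
POWER_FEEDS = {
    "power supply": ["mother board"],
    "mother board": [
        "keyboard",
        "I/O board",
        "RAM card",
        "disk controller card",
        "video control card",
    ],
    "CRT power supply": ["CRT"],
}


def blocks_losing_power(failed_block, feeds=POWER_FEEDS):
    """Return every block downstream of failed_block in the power path."""
    affected = set()
    queue = deque([failed_block])
    while queue:  # breadth-first walk of the power-feed graph
        block = queue.popleft()
        for child in feeds.get(block, []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


print(sorted(blocks_losing_power("power supply")))
print(sorted(blocks_losing_power("CRT power supply")))
```

Because the CRT has its own supply, it sits on a separate branch in this sketch, so a failure of the main power supply does not reach the monitor.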
failure rate does not change over time. The model also assumes that the components, parts, and subsystems being modeled are repairable and that the repaired items are as good as new.
(Figure: failure rate plotted against time.)
If a block diagram is used to model the equipment, the equipment model will consist of series blocks (when the failure of one subsystem, component, or part causes the equipment to fail), parallel blocks (when every subsystem, component, or part in the group must fail for the equipment to fail), or a combination of these. The following paragraphs discuss how to create a reliability model.
The first step involves clearly defining what is meant by equipment failure. For example, one might define failure as any occurrence that causes the equipment to be down for at least a given period of time (e.g., 6 minutes) or any occurrence that results in wafer damage. This step also involves identifying all of the failure mechanisms that lead to the defined equipment failure. If, for example, equipment failure is defined as a down time of 6 minutes or more, all failure mechanisms that cause the equipment to be down at least 6 minutes are included in the reliability model. If equipment failure is defined as any occurrence that results in wafer damage, all failure mechanisms that result in wafer damage or scrap are identified. Field data is often useful in defining what is meant by equipment failure and in identifying the mechanisms that lead to failure.
The next step involves creating the reliability model. Fault trees and reliability block diagrams are the tools used to do this. RAMP is a software package created to help document and analyze a reliability model; it uses reliability block diagrams. RAMP allows one to create the reliability model on a personal computer, provides a means of documenting failures, and performs the Boolean algebra necessary to solve the model. A reasonable starting point is to create a coarse model made up of the equipment's major subsystems.
If a block diagram is used as the modeling tool, the model would consist of approximately 10 to 20 major subsystems; that is, one block in the model would represent each major subsystem. Later versions of the model add detail only to those subsystems identified as important; that is, only the subsystems that cause the equipment to fail are broken down into components and parts. Adding detail to unimportant subsystems for the sake of completeness simply increases the modeling effort without adding to the usefulness of the results. Careful examination of field data helps determine the appropriate level of detail for the model. In general, the model should not be more detailed than the available information will support. If the modeling effort is for equipment not yet in the field, field data for a previous generation of equipment can yield valuable insights into improvements in the next generation.
Once the model is completed, it can be transformed into an equation for quantification, which is discussed in engineering activity E6. The equipment reliability is calculated using the failure data collected for the subsystems, components, and parts.
The following tips will make the modeling effort easier.
1. Think carefully about the subsystem divisions for the equipment being modeled. The choice of subsystems will vary from company to company and equipment to equipment; however, it is best to base the choice on functional considerations, not on parts-count methods. Choose subsystems based on the functions they perform, and group components and parts under the subsystems where they make functional sense.
2. Avoid parts list modeling; that is, do not represent the equipment as a collection of parts. It is important to include failure modes such as operator errors, software failures, and failures that result from drifting out of specification. In addition, valuable insight into the equipment is gained by thinking about failure modes and interactions between subsystems. Parts list modeling does not encourage this kind of thinking.
3. It is best to begin by modeling an existing piece of equipment. Good reliability modeling practice comes through experience. If the first model is for equipment that is well understood, the model can be validated in terms of the failure rate and failure mechanisms. Also, introducing a reliability modeling program will almost always cause the data collection and data management procedures to be revised, and it is generally better to sort out data problems with an existing system than with a new one.
4. No matter what phase of the life cycle the equipment is in, keep the model as simple as possible. The more complicated the model becomes, the more difficult it is to interpret.
5. As the reliability process proceeds, continually change, expand, and improve the model. This allows the model to be used throughout the life of the equipment.
Applicable Tools
AT7 Fault Tree Analysis (FTA)
AT12 Reliability, Analysis and Modeling Program (RAMP) Software
AT15 Reliability Block Diagram Modeling (RBD)
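The series and parallel block rules described for E4 can be quantified in a few lines. This Python sketch assumes the constant-failure-rate model mentioned earlier, so each block's reliability at time t is exp(-λt), and assumes that blocks fail independently. The function names and the failure rates in the example are hypothetical, not data from any equipment; `math.prod` requires Python 3.8 or later.

```python
import math


def block_reliability(failure_rate, t):
    """Survival probability to time t for a block with a constant failure rate."""
    return math.exp(-failure_rate * t)


def series(reliabilities):
    """Series blocks: the equipment survives only if every block survives."""
    return math.prod(reliabilities)


def parallel(reliabilities):
    """Parallel blocks: the group fails only if every block in it fails."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Hypothetical coarse model: subsystems A and B in series with a redundant
# pair of identical units C1/C2. Failure rates are per hour; t is in hours.
t = 100.0
r_a = block_reliability(1e-4, t)
r_b = block_reliability(2e-4, t)
r_c_pair = parallel([block_reliability(5e-4, t)] * 2)
print(f"Equipment reliability at {t:.0f} h: {series([r_a, r_b, r_c_pair]):.4f}")
```

The redundant pair raises the reliability of that block above that of a single unit, while every additional series block can only lower the equipment reliability.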
References
Campbell, J. R., R. Iman, D. Longsine, B. Thompson, A Tutorial on Reliability Modeling Using RAMP, Albuquerque, NM: SETEC, Sandia National Laboratories, SETEC91-030, 1991.
MIL-HDBK-217E, Reliability Prediction of Electronic Equipment, Griffiss AFB, NY: Rome Air Development Center, October 1986.
References
Ireson, W., C. Coombs, Jr., editors, Handbook of Reliability Engineering and Management, NY: McGraw-Hill, 1988, pp. 18.34-18.42.
Juran, J., F. Gryna, editors, Juran's Quality Control Handbook, Fourth edition, NY: McGraw-Hill, 1988, pp. 13.21-13.22.
Kapur, K., L. Lamberson, Reliability in Engineering Design, NY: John Wiley and Sons, 1977, pp. 405-422.
Lloyd, D., M. Lipow, Reliability: Management, Methods, and Mathematics, Second edition, Milwaukee, WI: The American Society for Quality Control, 1984, pp. 25-27, 267-270.
O'Connor, P., Practical Reliability Engineering, Third edition, NY: John Wiley and Sons, 1991, p. 136.
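(Figure: reliability block diagram with blocks A, B, C, D, and E; C and D are in parallel, and the remaining blocks are in series.)
Equipment Failure = A + B + [ C * D ] + E
In this Boolean expression, "+" denotes OR (the equipment fails if any one of the terms occurs) and "*" denotes AND (blocks C and D must both fail).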
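To show how an equation like this is quantified, the following Python sketch assumes that the five blocks fail independently and assigns them hypothetical failure probabilities (for example, over a fixed operating period). It computes the equipment failure probability in closed form and checks the result by enumerating all 32 failed/working states. The function names are illustrative and are not part of RAMP.

```python
from itertools import product

# Boolean failure equation: Equipment Failure = A + B + [C * D] + E
# ("+" = OR, "*" = AND). Blocks are assumed to fail independently.


def equipment_fails(a, b, c, d, e):
    """True when the given combination of block failures fails the equipment."""
    return a or b or (c and d) or e


def failure_probability_closed_form(p):
    """P(failure) = 1 - P(none of the OR terms occurs), for independent blocks."""
    return 1 - (1 - p["A"]) * (1 - p["B"]) * (1 - p["C"] * p["D"]) * (1 - p["E"])


def failure_probability_enumerated(p):
    """Brute-force check: sum the probabilities of every failing state."""
    names = ["A", "B", "C", "D", "E"]
    total = 0.0
    for state in product([False, True], repeat=len(names)):
        weight = 1.0
        for name, failed in zip(names, state):
            weight *= p[name] if failed else 1 - p[name]
        if equipment_fails(*state):
            total += weight
    return total


probs = {"A": 0.01, "B": 0.02, "C": 0.10, "D": 0.10, "E": 0.005}  # hypothetical
print(failure_probability_closed_form(probs))
print(failure_probability_enumerated(probs))
```

The parallel pair contributes only pC × pD to the failure probability, which is why redundancy on C and D is far less damaging than a single series block with the same failure probability.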
Applicable Tools
AT7 Fault Tree Analysis (FTA)
AT12 Reliability, Analysis and Modeling Program (RAMP) Software
AT15 Reliability Block Diagram Modeling (RBD)
Ingredients for a successful design review include:
An emphasis on constructive input to designers instead of criticism. The purpose of a review is not to challenge the work of a designer, but to anticipate weak areas in a design and eliminate them as early in the life cycle as possible.
Avoiding an environment in which the designer feels threatened. The designer listens to the results of the review and, along with line management, makes the final decision on the design.
A design review team drawn from a variety of areas. These may include manufacturing, field service, reliability and quality engineering, procurement, materials engineering, shipping, marketing, and design engineering personnel who are not directly associated with the design under review. Customer involvement in a post-design review meeting, in which the program is reviewed, may yield insight into what the customer values in the equipment.
Adequate planning for and emphasis on design review meetings. A formal agenda and advance documentation are distributed.
A focus on the unproven and untried features of a design.
Sufficient structure in the design review process. Identified design weaknesses are documented and provisions are made for their elimination; subsequent review meetings include a discussion of these weaknesses.
A realization that the design review may uncover areas of conflict between departments.
Management support. Management is responsible for emphasizing the importance of a carefully planned design.
References
Everett, W., et al., Reliability by Design: A Guide to Reliability Management, Issue 1, Indianapolis, IN: AT&T Bell Laboratories, November 1990, pp. 55-56.
Juran, J., F. Gryna, Juran's Quality Control Handbook, Fourth edition, NY: McGraw-Hill, 1988, pp. 13.7-13.11, 16.5-16.6.
Lloyd, D., M. Lipow, Reliability: Management, Methods, and Mathematics, Second edition, Milwaukee, WI: The American Society for Quality Control, 1991, pp. 28-30.
O'Connor, P., Practical Reliability Engineering, Third edition, NY: John Wiley and Sons, 1991, pp. 160-162.
References
Campbell, J., R. Iman, D. Longsine, B. Thompson, A Tutorial on Reliability Modeling Using RAMP, Albuquerque, NM: SETEC, Sandia National Laboratories, SETEC91-030, 1991, pp. 43-50.
Standby with Changeover Redundancy. This system has one component operating and one or more identical components in standby. When the operating component fails, the next component takes over. The assumption here is that no repairs are carried out on failed components until all of the components have failed; that is, until the first component and all of the standby components have failed.
Standby with Several Operating Components. In this system, there are N operating components and n components in standby. For example, a system might consist of 5 identical components, 3 of which must work for the system to be successful. If one of the operating components fails, a standby component takes its place. This continues until no components are left to take over for a failed component; then repairs occur.
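Under some additional assumptions, both standby configurations have simple closed forms. This Python sketch assumes identical components with a constant failure rate, perfect switching, and standby components that cannot fail while idle (cold standby); none of these assumptions comes from the text above, and the function name and numbers are hypothetical. With k operating components, failures then arrive as a Poisson process with rate k times the component failure rate, and the system survives to time t as long as no more than n failures (the number of standby components) have occurred.

```python
import math


def standby_reliability(failure_rate, t, operating=1, standby=1):
    """Reliability of a cold-standby system with no repair before system failure.

    Assumes identical components, a constant failure rate, perfect switching,
    and no failures while in standby. Failures then follow a Poisson process
    with rate operating * failure_rate, and the system survives while at most
    `standby` failures have occurred.
    """
    m = operating * failure_rate * t  # expected number of failures by time t
    return math.exp(-m) * sum(m**k / math.factorial(k) for k in range(standby + 1))


rate, t = 1e-3, 100.0  # hypothetical: failures per hour, hours
# Standby with changeover: one operating component and two in standby.
print(standby_reliability(rate, t, operating=1, standby=2))
# The 5-component example: 3 operating components and 2 in standby.
print(standby_reliability(rate, t, operating=3, standby=2))
# For comparison: 3 components with no standby at all.
print(standby_reliability(rate, t, operating=3, standby=0))
```

In the 5-component example, the two standby components make the system much more likely to survive the period than three components with no standby, because it must accumulate three failures rather than one before it fails.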
Procedural changes. Procedural changes involve creating a new procedure or changing an existing one to prevent reliability degradation. Examples include improving procedures for handling electrostatic-sensitive parts or for aligning dimensionally critical components.
Process control. Process control involves modifying a manufacturing process that is degrading reliability. The idea, simply stated, is that if the manufacturing process is understood and controlled, the equipment will come out right. J. Tunner discusses five basic steps which, if followed, lead to total manufacturing process control:
1. Clearly defining what is required of the equipment
2. Understanding the production process
3. Improving the process so that acceptable equipment is manufactured
4. Controlling and monitoring the process itself
5. Searching out new quality improvement opportunities
These steps are applicable to any manufacturing operation. Numerous tools are useful for process control: Cause & Effect (Fishbone) Diagrams, Design of Experiments, Pareto Diagrams, Process Capability, and Taguchi Methodology. The success of these steps depends on a team approach; that is, operators, engineers, scientists, supervisors, and other key persons throughout the company are involved in all steps.
Design for Maintainability. Equipment maintainability is defined as a measure of the ease and rapidity with which equipment can be restored to or maintained in an operational status. It is important that equipment be designed so that maintenance tasks are easily performed and the skill level required for diagnosing, repairing, and scheduling maintenance is not too high. Desirable features include:
Making access and handling easy
Using standard tools and equipment
Eliminating the need for delicate adjustments or calibrations
The repairable systems analysis tool is useful here in establishing maintenance policies and in highlighting subsystems, components, and parts that need to be more maintainable. While the designer has no control over the performance of maintenance personnel, he or she can directly affect the inherent maintainability of the equipment.
Deployment considerations. Reliability degradation during deployment is typically a result of the interaction between people and the equipment or between the equipment and the environment. Some of these problems can be prevented if appropriate measures are taken. These measures include:
Documenting deployment procedures
Training personnel and users
Testing during installation
Providing technical assistance
Identifying and correcting problems
Establishing equipment change procedures
Improper handling of equipment during delivery and installation can degrade the inherent reliability that has been designed into the equipment. To prevent handling problems, procedures are created specifically for storage and shipping, installation, and handling and operation. In addition, training installation and maintenance personnel and users in the installation, operation, and maintenance of the equipment can significantly reduce reliability problems. A plan for testing during installation verifies that the installed equipment operates properly and according to specifications, and that its performance has not been degraded by shipping and handling. Appropriate technical assistance helps customers solve problems. It is also important to identify and correct problems that occur during shipping, installation, operation, and maintenance: problems are reported and recorded, carefully analyzed, and then reported to the design and manufacturing staff to prevent their recurrence. Equipment change procedures are the methods by which the equipment is changed in the field to meet or enhance the original performance specifications. These procedures are established to assure the customer that any such changes maintain compatibility with existing equipment and do not adversely affect customer requirements.
Use of preferred and proven processes, components, and materials. The reliability of a piece of equipment depends on the reliability of its processes, components, and materials. Concepts and procedures for ensuring process, component, and material reliability include:
Selecting, specifying, qualifying, and controlling materials and processes
Qualifying and requalifying components
Conducting a supplier testing and reliability monitoring program
Monitoring subcontractors and suppliers
Screening and derating components and materials
Applicable Tools
AT3 Cause & Effect (Fishbone) Diagram
AT5 Design of Experiments (DOE)
AT9 Pareto Diagram
AT10 Process Capability
AT16 Repairable Systems Analysis
AT17 Taguchi Methodology
References
Arsenault, J., J. Roberts, editors, Reliability & Maintainability of Electronic Systems, Potomac, MD: Computer Science Press, 1980, pp. 280-293, 365-393.
Boothroyd, G., P. Dewhurst, Product Design for Assembly, Wakefield, RI: Boothroyd Dewhurst, Inc.
Burgess, J. A., "Improving Product Reliability," Quality Progress, December 1987, pp. 47-54.
Davidson, J., editor, The Reliability of Mechanical Systems, London: Mechanical Engineering Publications Limited for The Institution of Mechanical Engineers, 1988, pp. 47-57.
O'Connor, P., Practical Reliability Engineering, Third edition, NY: John Wiley & Sons, 1991, pp. 219-220, 117-125, 328-329.
Skrabec, Q., Jr., "The Transition for 100% Inspection to Process Control," Quality Progress, April 1989, pp. 35-36.
Smith, J. R., "Reliability Analysis by Simulation," 41st Annual Quality Congress Transactions, May 4-6, 1987, pp. 654-662.
Tunner, J., "Total Manufacturing Process Control: The High Road to Product Control," Quality Progress, October 1987, pp. 43-50.
Vanderbei, K., et al., Reliability by Design, Indianapolis, IN: AT&T, 1990, pp. 105-114, 61-71.
MIL-STD-470B, Maintainability Program for System and Equipment, Irvine, CA: Global Engineering Documents, 30 May 1989.
References
MIL-HDBK-338-1A, Electronic Reliability Design Handbook, Oct. 1984, pp. 11-87 to 11-93, 12-47 to 12-49.
One of the most important steps in the supplier reliability program is measurement and feedback. Measurements provide a means of determining whether the supplier is meeting the agreed-upon reliability requirements. Feedback gives the supplier the information necessary to improve the product. Finally, for every product that does not meet the reliability requirements, suppliers are asked what corrective action they will take. They should be able to answer the following questions:
What caused the product not to meet the reliability requirements?
What changes need to be made for the product to meet the requirements?
How will these changes be made foolproof?
How will the customer know that these changes have been made?
It is the customer's responsibility to ensure that the supplier works to find the root cause of each failure to meet the requirements and takes the necessary action to permanently eliminate the cause. The role of the supplier in improving the reliability of the equipment is critical; for the supplier to continuously improve the product's reliability, the customer must demand it.
Applicable Tools
AT6 Environmental Stress Screening (ESS)
AT8 Life Testing
AT18 User Groups
References
Broeker, E., "Build a Better Supplier-Customer Relationship," Quality Progress, September 1989, pp. 67-68.
Juran, J., F. Gryna, editors, Juran's Quality Control Handbook, Fourth edition, NY: McGraw-Hill, 1988, pp. 15.1-15.46, 30.18-30.21.
Klock, J., "How to Manage 3,555 (or Fewer) Suppliers," Quality Progress, June 1990, pp. 43-47.
Richardson, J., "Vendor Quality Assurance in a Process Industry," Quality Progress, November 1984, pp. 60-63.
Ergonomics (or human factors engineering) is a discipline concerned with designing equipment, operations, and work environments to match human capabilities and limitations. Ultimately, everything one designs has an impact on people in one way or another. Someone will have to fabricate the equipment, package it, distribute it, unpack it and prepare it for use, operate or use it, service and maintain it, and finally dispose of it. For this reason, designers should be constantly alert to the human factors implications of their proposed design. Keep in mind that the ultimate success of the equipment depends on how well the user performs the tasks associated with it.
The intent of human factors engineering in this document is to focus on and resolve human-equipment interface problems wherever they occur. Philosophically, then, human factors engineering looks at a design from the standpoint of user efficiency, or total human-equipment output effectiveness. Inherent in this philosophy are the following objectives:
To make the user's contribution to the equipment output as efficient as possible so that the basic equipment output is not compromised by human failures.
To make the combined user-equipment involvement as safe as possible so that neither human nor equipment failures will compromise the user's health or damage the hardware. Inherent in this objective is the avoidance of injury to others and of damage to adjacent hardware.
To minimize the stress that the equipment imposes on the user as he or she uses, operates, services, or maintains it. This includes such stresses as an undue energy demand, frustration in trying to deal with the equipment at any point in the human-equipment interaction, and worry about whether one is using the equipment properly.
To maximize the acceptability of the equipment, not only in terms of its attractiveness, but also in terms of giving users the feeling that the equipment allows them to use it efficiently and keep it in good working order with a minimum of effort.
The methods of ergonomics are based on a logical and systematic process of: (1) establishing the proper role of the human with the equipment, (2) designing the human-equipment interfaces to fit the human's capabilities and limitations, (3) evaluating and testing to see that the design does fit these capabilities and limitations, and (4) properly training the human to operate the equipment. If the equipment uses ergonomically sound human-equipment interfaces, the following will have been accomplished:
The equipment conforms to population stereotypes and user expectations.
It is easy to learn how to operate the equipment.
Easily perceived displays and simple controls allow effective and efficient communication between humans and the equipment.
The tasks allocated to humans and to the equipment are based on their known relative strengths and weaknesses.
The equipment provides relevant information to the user, which avoids reliance on the user's memory.
Effective and efficient performance of equipment functions is facilitated.
Whenever practicable, human engineering specialists should be used to help identify and solve human engineering problems; however, this is not always possible. Numerous human factors references are available, but most are directed to human factors or human engineering specialists. The reference provided at the end of this activity is directed specifically toward the engineer or designer and provides a number of guidelines to assist designers in doing their own human engineering. Its purpose is to provide a general reference to key human factors questions and human-equipment interface design suggestions in a form that engineers and designers can use with a minimum of searching or study.
References
Woodson, W., Human Factors Design Handbook: Information and Guidelines for the Design of Systems, Facilities, Equipment, and Products for Human Use, NY: McGraw-Hill Book Company, 1981.
CHECK 3: Train Personnel in Priority Areas
Software requirements
Software testing
Software configuration management
Software inspections
The primary indicator of process improvement at this time is the use of software inspections to identify and classify defects throughout the software life cycle. The intent is to find as many defects as possible, conduct a root cause analysis to identify how the process might be improved to reduce future defects, and measure the resources (time, personnel, and costs) required to correct the defects. Emerging research is attempting to link early defect identification with software operational reliability failure data.
A checklist of activities that will improve software operational reliability includes:
CHECK 1: Define equipment and software reliability goals
Probability
Failure intensity
Fault density
CHECK 2: Analyze failure data from equipment test/operation
Equipment identification data
Equipment Identification/Version: [name & version#]
Subsystem Identification/Version: [three characters]
Location of Equipment: [site name]
Software Release #/Version: [release #]
Software Component Version: [version #]
[id#] [mo/da/yr] [hh:mm:ss] [hh:mm:ss] [hh:mm:ss] [hh:mm:ss] [mo/da/yr] [1,2,3,4,5] [text description] [task logs]
CHECK 3: Apply failure classification scheme
Code. Severity: Description of Failure
1. Equipment Abort: A software or firmware problem that results in an equipment abort or crash.
2. A software or firmware problem that severely degrades the equipment and for which no alternative workaround exists; restarts are not acceptable.
3. A software or firmware problem that severely degrades the equipment but for which an alternative workaround exists; the process can continue with more operator action; restarts are not acceptable.
4. An indicated software or firmware problem that does not severely degrade the equipment or any essential function; a restart is acceptable.
5. Minor Fault: All other minor problems or non-functional faults due to software or firmware problems.
CHECK 4: Apply an operational reliability model for the decision process. Poisson process models are typical. Such a model helps answer questions like these:
When will the software meet its reliability goals?
When can the software release be delivered?
What level of support will be required?
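As one example of a Poisson process model for CHECK 4, the following Python sketch implements Musa's basic execution time model (described in the Musa, Iannino, and Okumoto reference), in which failure intensity decays exponentially with execution time. Choosing this particular model is an assumption, as are the parameters: lambda0 (initial failure intensity) and nu0 (total expected failures) would normally be estimated from the failure data collected under CHECK 2, and the values here are hypothetical.

```python
import math

# Musa basic execution time model (one common Poisson-process software
# reliability model). tau is execution time (e.g., CPU hours).
#   lambda0: initial failure intensity (failures per CPU hour)
#   nu0:     total failures expected over unlimited execution time


def failure_intensity(tau, lambda0, nu0):
    """Failure intensity after tau units of execution time."""
    return lambda0 * math.exp(-lambda0 * tau / nu0)


def expected_failures(tau, lambda0, nu0):
    """Mean cumulative number of failures experienced by time tau."""
    return nu0 * (1.0 - math.exp(-lambda0 * tau / nu0))


def time_to_goal(present, goal, lambda0, nu0):
    """Additional execution time to reduce intensity from present to goal."""
    if goal >= present:
        return 0.0  # goal already met
    return (nu0 / lambda0) * math.log(present / goal)


def failures_to_goal(present, goal, lambda0, nu0):
    """Additional failures expected before the goal intensity is reached."""
    return max(0.0, (nu0 / lambda0) * (present - goal))


# Hypothetical parameters: 10 failures/CPU hour initially, 100 total failures.
lambda0, nu0 = 10.0, 100.0
present = failure_intensity(5.0, lambda0, nu0)  # after 5 CPU hours of test
goal = 0.5
print(f"present intensity: {present:.2f} failures/CPU h")
print(f"more test time needed: {time_to_goal(present, goal, lambda0, nu0):.1f} CPU h")
print(f"more failures expected: {failures_to_goal(present, goal, lambda0, nu0):.1f}")
```

The time-to-goal estimate bears on when the software may meet its reliability goal, and the expected number of remaining failures gives a rough indication of the support effort still required.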
An example set of data collection, analysis, and reporting process flow steps includes:
STEP 1: Begin test sequence.
STEP 2: Collect equipment and execution data for each failure.
STEP 3: Send collected data to analysis personnel at the end of the test sequence.
STEP 4: Respond to queries from analysis personnel for more information.
STEP 5: Record failure and management status data.
STEP 6: Update the software operational reliability data base.
STEP 7: Generate failure/fault count summary reports.
STEP 8: Update the software operational reliability model.
STEP 9: Generate software operational reliability measures and graphs.
STEP 10: Provide a summary of results to management on a regular basis.
The references provide more detail about software reliability.
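STEP 2 and STEP 7 can be illustrated with a minimal record type and a count summary. In this Python sketch, the class `FailureRecord`, its fields, the function `failure_count_summary`, and the sample records are hypothetical; the severity codes are the 1 to 5 codes of the CHECK 3 classification scheme.

```python
from collections import Counter
from dataclasses import dataclass

SEVERITY_CODES = range(1, 6)  # CHECK 3: 1 = equipment abort ... 5 = minor fault


@dataclass
class FailureRecord:
    """One failure observed during a test sequence (illustrative fields)."""
    equipment_id: str
    software_release: str
    severity_code: int
    description: str

    def __post_init__(self):
        if self.severity_code not in SEVERITY_CODES:
            raise ValueError(f"severity code must be 1-5, got {self.severity_code}")


def failure_count_summary(records, release=None):
    """Count failures per severity code, optionally for a single release."""
    selected = [r for r in records if release is None or r.software_release == release]
    counts = Counter(r.severity_code for r in selected)
    return {code: counts[code] for code in SEVERITY_CODES}


records = [  # hypothetical sample data
    FailureRecord("EQ-01", "R2.1", 1, "controller crash during wafer transfer"),
    FailureRecord("EQ-01", "R2.1", 4, "status display freezes; restart clears it"),
    FailureRecord("EQ-02", "R2.2", 4, "spurious alarm message"),
]
print(failure_count_summary(records))
print(failure_count_summary(records, release="R2.1"))
```

Reporting zero counts for every code keeps successive summaries directly comparable from one test sequence to the next.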
References
Ireson, W., C. Coombs, Jr., editors, Handbook of Reliability Engineering and Management, NY: McGraw-Hill, 1988.
Musa, J., A. Iannino, K. Okumoto, Software Reliability: Measurement, Prediction, Application, NY: McGraw-Hill, 1987.
SETEC, "Software Reliability for SEMI/SEMATECH Companies (Draft)," SEMATECH, SETEC-91-032, December 20, 1991.
(Figure: FMEA worksheet form. Header fields: Equipment, Subsystem, Reference Drawing, Date, Sheet, Prepared By. Columns: Subsystem/Module & Function; Potential Failure Mode; Potential Local Effect(s) of Failure; FMEA Fault Code #; Potential End Effect(s) of Failure; SEV; Potential Cause(s) of Failure; OCC; Current Controls/Fault Detection; Recommended Action(s).)
The complexity of the equipment and the availability of data dictate the FMEA approach that will be used. There are two primary approaches for accomplishing an FMEA. One is the hardware approach, which lists individual hardware components and analyzes their possible failure modes. The other is the functional approach, which recognizes that every component is designed to perform a number of functions that can be classified as outputs; these outputs are listed and their failure modes are analyzed. For complex systems, a combination of the functional and hardware approaches may be used. The FMEA may start at the highest equipment level and proceed down to lower levels (top-down), or start at the lowest level and proceed up to the highest equipment level (bottom-up). The hardware approach is normally used when hardware components can be uniquely identified from schematics, drawings, and other engineering and design data; it is generally done bottom-up. The functional approach is normally used when hardware components cannot be uniquely identified or when equipment complexity requires analysis from the highest equipment level down through succeeding levels; it is generally done top-down.
An FMEA analysis is used to:
Ensure that all conceivable failure modes and their effects are understood
Assist in the identification of design weaknesses
Select design alternatives
Select design improvements
Prioritize corrective actions
Select test programs
Assist in troubleshooting existing equipment with operating problems
Since an FMEA concentrates on identifying possible component failures and their effects on the equipment, design deficiencies can be identified and improvements can be made. Identification of potential failures also leads to recommendations for an effective test program. Failure modes can be prioritized according to their frequency so that effort can be concentrated on the higher priority components; that is, on those components with the most failures. A limitation of the FMEA is that it considers each failure mode individually: if a single failure does not affect the equipment but two or more failures together do, the FMEA is not well suited to assessing the combined effects of these failures on the equipment. As the equipment proceeds through the life cycle phases, progressively more detailed FMEAs may be conducted.
An FMEA analysis consists of four steps:
1. Establishing the scope of the analysis
2. Collecting data
3. Preparing a components list
4. Preparing the FMEA worksheets
It is important to clearly state the scope of the FMEA. An important part of the scope is clearly identifying the boundaries of the equipment so that no component within the equipment is left out. The scope also includes identifying the underlying causes of failures and the possible effects of these failures on the equipment. Information on failure detection, safeguards, failure frequency, and the criticality of failure effects may also be included. The information needed to perform the analysis includes equipment configurations, designs, specifications, and operating procedures. To gather as much information as possible, data may also be collected by interviewing design personnel; operations, testing, and maintenance personnel; component vendors; and outside experts. A list of all components in the equipment is prepared before the potential failure modes of each component are examined.
Functions, operating conditions (such as temperature, loads, and pressure), and environmental conditions of each component may be included in the components list. According to C. Sundararajan, the following questions are answered for every component of the equipment:
1. How can the component fail? (There could be more than one failure mode.)
2. What are the consequences (effects) of the failure?
3. How critical are the consequences?
4. How is the failure detected?
5. What are the safeguards against the failure?
Which of these questions are asked depends on the scope and purpose of the analysis. When they are answered, all significant failure modes of the components are identified, their detection and safeguards are documented, and their effects on the equipment are determined.
Findings of the FMEA are recorded in tabular form in FMEA worksheets. MIL-STD-1629A describes the worksheets in detail.
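Sundararajan's five questions map naturally onto a worksheet row, and prioritizing by frequency is then a simple sort. The Python sketch below is a hypothetical, simplified worksheet, not the MIL-STD-1629A format: the field names, the 1 to 4 criticality scale (4 = most severe), the example components, and the tie-break on criticality are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    """One row of a hypothetical, simplified FMEA worksheet."""
    component: str
    failure_mode: str         # 1. How can the component fail?
    effect: str               # 2. What are the consequences (effects)?
    criticality: int          # 3. How critical? (assumed scale: 1 = minor, 4 = most severe)
    detection: str            # 4. How is the failure detected?
    safeguards: str           # 5. What are the safeguards?
    failures_per_year: float  # observed or estimated frequency of this failure mode


def prioritize(rows):
    """Sort failure modes by frequency (highest first), breaking ties by criticality."""
    return sorted(rows, key=lambda r: (r.failures_per_year, r.criticality), reverse=True)


# Hypothetical worksheet entries for illustration only.
worksheet = [
    FmeaRow("robot arm", "belt slips", "wafer mishandled", 3,
            "position sensor alarm", "belt tension check", 4.0),
    FmeaRow("vacuum pump", "seizes", "chamber cannot pump down", 4,
            "pressure interlock", "scheduled maintenance", 1.5),
    FmeaRow("door sensor", "stuck open", "false interlock trip", 2,
            "operator report", "none", 4.0),
]

for row in prioritize(worksheet):
    print(f"{row.component:12} {row.failure_mode:11} "
          f"{row.failures_per_year:4.1f}/yr  criticality={row.criticality}")
```

Because FMEA treats each failure mode individually, each row stands alone; combinations of failures are better handled with fault tree analysis.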
References
Sundararajan, C., Guide to Reliability Engineering: Data, Analysis, Applications, Implementation, and Management, New York: Van Nostrand Reinhold, 1991, pp. 146-152.
MIL-HDBK-338-1A, Electronic Reliability Design Handbook, Irvine, CA: Global Engineering Documents, 12 October 1988, pp. 7-100 to 7-121.
MIL-STD-1629A, Procedures for Performing a Failure Mode, Effects, and Criticality Analysis, Washington, DC: Department of Defense, 24 November 1980.
Engineering Activity E17: Failure Reporting, Analysis and Corrective Action
A Failure Reporting, Analysis and Corrective Action System (FRACAS) provides a closed-loop feedback path by which data on failures occurring during field tests and operation are collected, recorded, and analyzed to determine where problems are concentrated in the design. This promotes continuous improvement in equipment reliability. A FRACAS is also used to track internal test performance, and it provides a good historical basis for comparison with external equipment performance.
[Figure: FRACAS closed loop. Failure Reporting, Analysis, and Corrective Action (Reports, Analysis, Actions) connect through a central Database to the Test, Inspect, Correct cycle that produces a Reliable Product]
A FRACAS is used to:
• Establish a closed-loop failure reporting system
• Establish procedures for determining the cause of subsystem and component failures
• Document the corrective actions taken
A closed-loop system allows failures to be collected, analyzed, and recorded down to a specified level, that is, to the subsystem, component, and part level. Procedures are identified for initiating failure reports, analyzing failures, and feeding corrective action back into the design, manufacturing, and test processes. The closed-loop system includes provisions to ensure that effective corrective actions are taken on a timely basis: a follow-up audit reviews all open failure reports and the failure analysis and corrective action suspense dates, and delinquencies are reported to management. The cause of each failure is clearly stated.
The objectives of a FRACAS are to:
• Assess historical reliability performance
• Develop a pattern of deficiencies
• Provide engineering data for corrective action
• Develop statistical data for:
    - component failure rates and downtime
    - component selection suitability criteria
    - component application reviews
    - future designs and design reviews
    - product improvement programs
    - spares provisioning
    - life cycle costing
• Develop contractual performance data
• Provide warranty information
• Furnish safety and regulatory compliance data
• Assess liability-claim information
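The follow-up audit that keeps a FRACAS loop closed can be pictured as a query over open failure reports. The Python sketch below is a minimal illustration under assumed names: the `FailureReport` fields, report IDs, and dates are hypothetical and are not taken from any particular FRACAS product or from MIL-STD-785B.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FailureReport:
    """Hypothetical FRACAS failure report; field names are illustrative only."""
    report_id: str
    subsystem: str
    component: str
    failure_cause: Optional[str]      # filled in once failure analysis is complete
    corrective_action: Optional[str]  # filled in once corrective action is defined
    suspense_date: date               # date by which analysis and action are due
    closed: bool = False


def delinquent_reports(reports, today):
    """Follow-up audit: open reports whose suspense date has already passed."""
    return [r for r in reports if not r.closed and r.suspense_date < today]


reports = [
    FailureReport("FR-001", "wafer handler", "arm", "worn belt", "revised belt spec",
                  date(1992, 3, 1), closed=True),
    FailureReport("FR-002", "vacuum", "pump", None, None, date(1992, 4, 1)),
    FailureReport("FR-003", "RF", "generator", "loose connector", None, date(1992, 6, 1)),
]

# Delinquencies found by the audit would then be reported to management.
for r in delinquent_reports(reports, today=date(1992, 5, 5)):
    print(f"Delinquent: {r.report_id} ({r.subsystem}/{r.component}), due {r.suspense_date}")
```

Recording the subsystem and component on every report is what allows failures to be analyzed "down to a specified level", as the text requires.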
References
A Reliability Guide to Failure Reporting, Analysis, and Corrective Action Systems, Milwaukee, WI: American Society for Quality Control, 1977.
MIL-STD-785B, Reliability Program for Systems and Equipment Development and Production, Task 104, Philadelphia, PA: Naval Publications and Forms Center, 1980.
One of the building blocks of FRACAS is the collection of data and the management of that data with a database management system. Together, they provide an organized way to gather factual data about equipment performance, both good and bad. A shopping list for data is established based on the reliability model for the equipment. Each component or subsystem modeled in the fault tree or block diagram requires data in the form of a failure probability or frequency. Several types of data are needed to determine the failure probability and to assess product reliability:
• Cumulative operating time
• Number of failures
• Conditions present at the time of failure
There are three methods for collecting reliability data. The first method uses a standardized reporting form that is filled out by the engineers and technicians involved in equipment testing, troubleshooting, and repair. These forms need to be simple to use and ask only for needed information. An example of a reliability reporting form is on the following page. So that they better understand the final use and importance of the data, the personnel involved in collecting it, final test technicians, and field service engineers are part of the team that designs the data collection form and are involved in analyzing the data.
The second method uses customer databases and equipment tracking information. This requires an excellent ongoing customer-supplier relationship. Great care must be taken to ensure compatibility between the supplier's data and the data of multiple customers. Simply agreeing to the SEMI E10-90 specifications will not suffice, although basing the specifications on E10-90 makes them compatible across the industry. In addition, a standard way of identifying failures and assists and assigning them to subsystems and components should be devised. Including key customer equipment engineers in evaluating the validity of the collected data is very useful.
The third method uses the on-board CPU to monitor and track equipment status, faults, and errors. With the customer's agreement, this information can be downloaded to a floppy disk and removed from the site. Time-stamping this information and matching it to the customer's database information provides useful data.
[Figure: Example reliability reporting form with fields for Project/Model, Impact/Effect/Consequences of Problem, Remarks, Reported By, Date, and Referred Problem To]
If there is no equipment in the field from which to collect data, several other sources of data are available:
• Historical data
• Sub-tier supplier data
• In-house data
• Expert judgement
Historical data is data collected for a previous generation of equipment or for similar equipment. Its use is limited to those subsystems and components that are similar to those in the current equipment. Attention must also be paid to trends: if the subsystem or component was undergoing improvements, or if the methods of collecting the data were changing, these must be accounted for. When a subsystem or component is purchased from a supplier, that supplier should be able to provide the data it has collected for that part to date. Once a testing program exists for the equipment, in-house data is available. For subsystems and components with none of these sources available, expert judgement can be used to create initial reliability values: the opinions of individuals considered knowledgeable about a subsystem or component are used to estimate failure rates. Note that these sources of data do not always represent the environment and operating conditions that the equipment will see in the field; the preferred source of data is therefore always field data. When collecting data, it is important to keep all of the data. This makes it possible to represent subsystem and component failure rates over a range of values, which more accurately reflects the variety of environments and users that the subsystems and components will see. It cannot be stressed enough that the validity of the reliability model and its predictions depends on the validity of the data. The saying common among software users, "Garbage In, Garbage Out," is just as applicable here.
Replace historical and expert-judgement data as soon as possible with data collected during testing and field operation.
It is now appropriate to discuss how the collected data is translated into the failure rates used to improve the equipment's reliability. In a typical piece of equipment, some components are under stress or used continuously while others are used cyclically. Thus, failure rates can be defined as a function of time (per hour) or of cycles (per wafer). In either case, the collected data includes the number of cycles, wafers, or hours during which the failures occurred. Failures are evaluated to ensure that they were genuine and resulted in equipment shutdown or lost production time. Once the evaluation is done, translating the data into failure rates is fundamentally simple. Suppose that a database covers 25 machines operating over a 9-month period. If component A failed 20 times and the average operational time for the 25 machines was 70 percent (that is, the utilization factor is 0.70), the failure rate of component A would be

$$\lambda_A = \frac{20}{25\,(9\ \text{mo})(30\ \text{days/mo})(24\ \text{hr/day})(0.70)} = \frac{20}{113{,}400\ \text{hr}} \approx 1.8\times10^{-4}\ \text{failures/hr}$$

The corresponding MTBF is the reciprocal, 1/λ_A ≈ 5,670 hr.
Suppose a second component, B, failed 12 times, but its failures relate to wafers, and each machine averages 10 wafers/hr. The failure rate of component B would be

$$\lambda_B = \frac{12}{25\,(9\ \text{mo})(30\ \text{days/mo})(24\ \text{hr/day})(10\ \text{wafers/hr})(0.70)} = \frac{12}{1{,}134{,}000\ \text{wafers}} \approx 1.06\times10^{-5}\ \text{failures/wafer processed}$$

Alternatively, expressed per hour of operation,

$$\lambda_B = (1.06\times10^{-5}\ \text{failures/wafer})(10\ \text{wafers/hr}) \approx 1.06\times10^{-4}\ \text{failures/hr}$$

The key, of course, is knowing or estimating the utilization factor. It can be determined by tabulating and averaging the operational times of all 25 machines, or it can be estimated for groups of machines from general production information.
Applicable Tools: AT18 User Groups
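The two worked examples reduce to dividing a failure count by total operating exposure (hours, or hours times throughput). The Python sketch below recomputes them under the text's assumptions of 30-day months, 24-hour days, and a fleet-wide utilization factor; the function names are hypothetical. Note that component B works out to about 1.06 × 10^-5 failures/wafer, or about 1.06 × 10^-4 failures/hr.

```python
HOURS_PER_MONTH = 30 * 24  # the text's assumption: 30 days/mo x 24 hr/day


def operating_hours(machines, months, utilization):
    """Total fleet operating hours: calendar hours scaled by the utilization factor."""
    return machines * months * HOURS_PER_MONTH * utilization


def failure_rate_per_hour(failures, machines, months, utilization):
    return failures / operating_hours(machines, months, utilization)


def failure_rate_per_wafer(failures, machines, months, utilization, wafers_per_hour):
    wafers = operating_hours(machines, months, utilization) * wafers_per_hour
    return failures / wafers


lam_a = failure_rate_per_hour(20, machines=25, months=9, utilization=0.70)
lam_b = failure_rate_per_wafer(12, machines=25, months=9, utilization=0.70,
                               wafers_per_hour=10)

print(f"Component A: {lam_a:.2e} failures/hr, MTBF = {1 / lam_a:,.0f} hr")
print(f"Component B: {lam_b:.2e} failures/wafer = {lam_b * 10:.2e} failures/hr")
```

Keeping exposure in one helper makes it easy to swap in a measured utilization factor per machine group instead of the single fleet average used here.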
References
Bigelow, J., "Tailored Data Collection," Quality, August 1991, pp. 21-22.
Burgess, J.A., "Improving Product Reliability," Quality Progress, December 1987, pp. 47-54.
SEMI E10-90, Guideline for Definition and Measurement of Equipment Reliability, Availability, and Maintainability (RAM), SEMI, 1990, pp. 69-75.
many failures will occur in components and parts that have not been sufficiently proven out, which makes failure tracking difficult. Another disadvantage of starting equipment testing too early is that if too many component and part failures occur, the remaining components will be subjected to too many start operations, which may be more severe than steady-state operation. Consequently, the observed failure distributions will give a false impression of those expected in operation. Equipment testing focuses on the question, "Is the component or part reliable within the subsystem or equipment?" Equipment testing does not eliminate component testing, but it helps to pinpoint faulty components or parts so that they can be replaced or modified with superior products. Equipment testing is a way of realistically evaluating reliability and of guiding component and part improvement by systematically discovering problems and weaknesses. Several tools are useful for testing subsystems and equipment. As with component tests, accelerated testing can be used to gather reliability data in a shorter period of time. Environmental Stress Screening (ESS) can also be used for subsystems, and Reliability Development/Growth Testing (RD/GT) for both subsystems and equipment. ESS is not done at the equipment level, but it is useful at the subsystem level: it stresses the subsystem to stimulate failures so that early failures can be detected and removed. RD/GT is used to identify and correct failure modes and then to verify that they have been eliminated. Reliability Qualification Testing (RQT) is used to verify that critical subsystems and the equipment meet design goals and comply with contractual/program objectives. Life testing can be used to evaluate the useful life or reliability of a subsystem or the equipment. Burn-in testing is used to screen out defects during a subsystem's or the equipment's infant mortality period.
Reliability demonstration tests are used to demonstrate, often to the customer, that the equipment is capable of meeting its specified performance and reliability for a stated period of operation. This type of test can be very expensive and requires careful planning and execution. The equipment and its associated subsystems, components, and parts that are to be tested, and the test conditions to be used, must be closely controlled to ensure the validity of the final results. It is common practice to disassemble the items completely after the tests are completed and inspect each one for wear, damage, or signs of impending failure. Reliability Qualification Testing (RQT) is a very useful tool for reliability demonstration tests, because it verifies that the equipment will meet design goals and comply with contractual/program requirements.
Applicable Tools:
AT1 Accelerated Testing
AT2 Burn-In Testing
AT6 Environmental Stress Screening (ESS)
AT8 Life Testing
AT13 Reliability Development/Growth Testing (RD/GT)
AT14 Reliability Qualification Testing (RQT)
References
Burgess, J.A., "Improving Product Reliability," Quality Progress, December 1987, pp. 51-52.
Lloyd, D.K., M. Lipow, RELIABILITY: Management, Methods, and Mathematics, Second Edition, Milwaukee, WI: The American Society for Quality Control, 1991, pp. 349-354.
The left, decreasing portion of the failure-rate (bathtub) curve is the infant mortality period, in which a disproportionate number of failures occur early in the equipment's lifetime. The flat portion represents the constant failure rate during the useful life of the equipment. The right, increasing portion is the wear-out period. It is useful to know, as closely as possible, where infant mortality ends and wear-out starts, even when burn-in tests are not performed. Burn-in has proven to be an effective means of screening out defects during a component's infant mortality period. A typical burn-in test combines electrical stresses with temperature cycling for short periods of time to activate temperature- and voltage-dependent failure mechanisms. The two types of burn-in tests are static and dynamic. In static burn-in, a bias may be applied to the device under test at very high temperatures. In dynamic burn-in, entire circuit cards may be operated to simulate actual equipment operation. Screening out infant mortality failures results in more reliable components, and because most failures occur during the infant mortality phase of a component's life, this method of testing improves the reliability of the equipment. Burn-in tests are usually conducted on 100% of production units to weed out production errors related to minor variations in workmanship and to process fluctuations that result from engineering changes. Burn-in tests also uncover some residual design errors. In these tests, the applied stresses are usually within published performance limits and are applied for short periods of time. Their purpose is to prevent production-related defects from being shipped. Products that have undergone burn-in should therefore be free of infant mortality failures.
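One way to picture "where infant mortality ends" is to model the bathtub curve as the sum of a decreasing Weibull hazard (shape below 1), a constant rate, and an increasing Weibull hazard (shape above 1), then find where the combined failure rate stops falling. The Python sketch below does this with a simple grid search. All parameter values are hypothetical and chosen only to give a bathtub shape; real values would come from life-test or field data.

```python
# Hypothetical parameters for illustration only (not from this guideline).
INFANT_BETA, INFANT_ETA = 0.5, 10_000.0    # decreasing hazard (beta < 1)
CONSTANT_RATE = 1.0e-4                      # useful-life failure rate, failures/hr
WEAROUT_BETA, WEAROUT_ETA = 3.0, 50_000.0  # increasing hazard (beta > 1)


def weibull_hazard(t, beta, eta):
    """Weibull hazard h(t) = (beta/eta) * (t/eta)**(beta - 1), for t > 0 hours."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def bathtub_hazard(t):
    """Hazards of independent competing failure modes add."""
    return (weibull_hazard(t, INFANT_BETA, INFANT_ETA)
            + CONSTANT_RATE
            + weibull_hazard(t, WEAROUT_BETA, WEAROUT_ETA))


def minimum_hazard_time(t_max=100_000.0, step=10.0):
    """Grid search for the operating hour at which the failure rate stops decreasing."""
    grid = [step * i for i in range(1, int(t_max / step) + 1)]
    return min(grid, key=bathtub_hazard)


for t in (100, 1_000, 10_000, 50_000, 100_000):
    print(f"h({t:>7,} hr) = {bathtub_hazard(t):.2e} failures/hr")
print(f"Failure rate stops decreasing near {minimum_hazard_time():,.0f} hr")
```

A burn-in period would be chosen well inside the decreasing region; the grid search is deliberately simple rather than an analytic minimum.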
References
Klinger, D., Y. Nakada, M. Menendez, AT&T Reliability Manual, New York: Van Nostrand Reinhold, 1990, pp. 52-57.
Punches, K., "Burn-In and Strife Testing," Quality Progress, May 1986, pp. 93-94.
For each cause ask, "Why does it happen?" and list responses as branches off the major causes. The causes shown as branches can have sub-causes, indicated by sub-branches, and so on.
References
Ishikawa, K., Guide to Quality Control, White Plains, NY: Quality Resources, 1982, pp. 8-29.
O'Connor, P., Practical Reliability Engineering, Third Edition, New York: John Wiley & Sons, 1991, pp. 311-312.
The Memory Jogger, Methuen, MA: GOAL/QPC, 1988, pp. 24-29.
The process itself is straightforward; Industry Week outlines the benchmarking process in a list of 10 steps. However, the simplicity of the process belies its true power. One aspect of benchmarking that sets it apart is that it directs a company's focus outside its own walls, aimed squarely at the marketplace and the competition. This leads to setting goals geared toward being the best in the world, not just slightly better than last year. Another benefit of benchmarking is that it can provide the blueprint for how a company can leap ahead of even the best of its competitors. The improvements are not only in the equipment but also in secondary and supporting systems and processes. Other benefits of benchmarking include:
• Identifying the keys to success for each area studied
• Providing specific quantitative targets
• Creating an awareness of state-of-the-art approaches
• Cultivating a culture in which change, adaptation, and continuous improvement are actively sought
• Spotting emerging competitors and seeing where the company should be going in the future
References
Altany, D., "Copycats," Industry Week, November 5, 1990, pp. 11-18.
Camp, R., Benchmarking: The Search for Best Practices That Lead to Superior Performance, Milwaukee, WI: ASQC Quality Press, 1989.
Pryor, L., Beating the Competition: A Practical Guide to Benchmarking, Washington, DC: Kaiser Associates, 1988.
Competitive Benchmarking: What It Is and What It Can Do for You, Stamford, CT: Xerox Corporate Quality Office, Reference No. 700P90201, May 1987.
In particular, in the presence of interactions, full-factorial and fractional-factorial designs are superior to one-at-a-time strategies. Fractional-factorial designs are useful for screening and are highly efficient for large numbers of factors; however, they assume that only low-order interactions are present. When the experiment is run with center points, both full-factorial and fractional-factorial designs can signal curvature or nonlinearity. When used with steepest-ascent methods, factorial designs provide efficient first-order optimization. The final stage of optimization can be achieved using response-surface methods, which are usually based on a second-degree polynomial model that allows curvature to be estimated. Although multi-level factorial designs could be used to fit higher-order surfaces, the family of central-composite designs is instead built up from fractional-factorial or full-factorial designs by adding selected axial points.
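The construction of a central-composite design, and the way center points flag curvature, can be shown in a few lines. The Python sketch below works in coded units and starts from a full 2^k factorial; the choice α = (2^k)^(1/4) is one common (rotatable) choice and is an assumption here, as are the function names and the example response. With a two-level design, the mean response at the factorial corners minus the mean at the center estimates the sum of the pure quadratic coefficients.

```python
from itertools import product
from statistics import mean


def central_composite(k, n_center=4, alpha=None):
    """CCD in coded units: 2**k factorial corners, 2k axial points, and center points."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # rotatable choice for a full 2**k factorial (assumption)
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial, axial, center


def curvature_estimate(factorial_y, center_y):
    """Corner mean minus center mean; close to zero if the response has no curvature."""
    return mean(factorial_y) - mean(center_y)


# Hypothetical response with pure quadratic terms (coefficients 1 and 1).
def response(x1, x2):
    return 5 + x1 + 2 * x2 + x1 ** 2 + x2 ** 2


factorial, axial, center = central_composite(2)
print("axial points:", axial)
print("curvature:", curvature_estimate([response(*p) for p in factorial],
                                        [response(*p) for p in center]))
```

A curvature estimate well away from zero is the signal to add the axial points and fit the second-degree model; otherwise a first-order model from the factorial alone may suffice.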
References
Box, G., W. Hunter, J. Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, New York: John Wiley and Sons, 1978.
Taguchi, G., Introduction to Quality Engineering: Designing Quality into Products and Processes, White Plains, NY: UNIPUB/Kraus International Publications, 1987.
[Figure: Fault tree icon. Top event Equipment Failure; subsystems SS1, SS2, SS3; components C1 to C5; parts P1 to P4]
FTA is used to determine the combinations of events, that is, component-level failures, that could result in equipment failure. Component-level failures include hardware failures, human errors, and software errors. A failure can range from noncompliance with specifications to the inability of a component to perform its intended function. In fault tree (FT) terminology, component-level failures are called primary events. Equipment failure refers to an undesired state of the equipment, for example, the equipment stops functioning or produces bad product; in fault tree terminology, it is called the top event. A fault tree is not a model of all possible equipment failures or all possible causes of equipment failure. A fault tree is tailored to its top event; that is, it includes only those failures that cause that top event to occur. Construction of an FT begins by defining the top event, for example, failure of the equipment at less than 1000 hours. The next step is to determine the various ways in which this failure can occur. This is initially done at a fairly gross level (for example, equipment failure due to failure of the wafer handler subsystem). Once the equipment is modeled at a gross level, that is, as 10 to 20 major subsystems, the next step is to determine which subsystems should be modeled in more detail. If a particular subsystem rarely fails and this is not expected to change, modeling it in more detail would be a waste of time and effort. Concentrate instead on those subsystems that cause the equipment to fail frequently or catastrophically. The subsystems that are targeted as reliability problems for the equipment are broken down in more detail. For example, the wafer handler subsystem could be
broken down into the arm, the associated software, and the electrical components. Only those portions of the wafer handler subsystem that contribute significantly to its failure are broken down further. This process is continued for all identified subsystems until all potential ways of failing the equipment are identified. The remainder of this tool description gives a general description of fault tree analysis and of the Boolean algebra needed to quantify the fault tree into an equipment failure rate. The references at the end of the description provide more detailed information. The top event is listed within a rectangle at the top of the FT. The icon at the beginning of this tool description labels its top event Equipment Failure. Next, the question "How can the equipment fail?" is asked. All events (here, subsystems) that can cause equipment failure are placed in the FT under the top event; see Subsystem 1 (SS1), Subsystem 2 (SS2), and Subsystem 3 (SS3) in the icon. Gates are used to connect the events. The OR gate between the top event (equipment failure) and the subsystem events SS1, SS2, and SS3 indicates that failure of SS1, SS2, or SS3 will cause the equipment to fail. Some of the symbols used in a fault tree include:
Primary Events
    Basic Event: A basic failure requiring no further development.
    Undeveloped Event: An event that is not further developed, either because it is insignificant or because information is unavailable.
Gates
    AND Gate: Output fault occurs if all the input faults occur.
    OR Gate: Output fault occurs if at least one of the input faults occurs.
Transfer Symbols
    Transfer In: Indicates that the tree is developed further on another page.
    Transfer Out: Indicates that this portion of the tree connects at the corresponding Transfer In.
There are other, less-used events and gates that are described in texts on FTA. As can be seen in the icon, SS1 fails if component 1 or component 2 (C1 or C2) fails. C2 fails only if both parts 1 and 2 (P1 and P2) fail. SS3 fails if components 3, 4, and 5 (C3, C4, and C5) all fail. Failure of C4 requires either part 3 (P3) or part 4 (P4) to fail. Once construction of the fault tree is completed, it is translated into an equation that is used to quantify the equipment failure rate. Fault trees are based on Boolean algebra, the mathematical manipulation of events according to the rules of logic. The references discuss Boolean algebra in detail; it will not be discussed here. The Boolean equations for the icon fault tree are:
$$\begin{aligned}
\text{Equipment Failure} &= SS_1 + SS_2 + SS_3\\
SS_1 &= C_1 + C_2\\
C_2 &= P_1 \cdot P_2\\
SS_3 &= C_3 \cdot C_4 \cdot C_5\\
C_4 &= P_3 + P_4
\end{aligned}$$

where + means OR and · means AND. Substituting into the equipment failure equation gives

$$\text{Equipment Failure} = C_1 + P_1 \cdot P_2 + SS_2 + C_3 \cdot (P_3 + P_4) \cdot C_5$$

and expanding with the associative and distributive laws gives

$$\text{Equipment Failure} = C_1 + P_1 \cdot P_2 + SS_2 + C_3 \cdot P_3 \cdot C_5 + C_3 \cdot P_4 \cdot C_5$$

Each term in this equation is a scenario that leads to the top event; for example, C1, a failure of component 1, leads to equipment failure. In the IC equipment industry, the fault tree typically consists almost entirely of OR gates, which means that every primary event is a scenario leading to the top event. AND gates are used when there is redundant equipment; redundancy is a principle often used for critical safety functions.

Now that the fault tree has been translated into an equation, the probability of the top event can be quantified as a function of the primary events. The term probability is often used when what is really meant is frequency. Probabilities must lie between 0 and 1, whereas a frequency can be any number greater than or equal to 0, depending on the number of events that occur and the time scale. For example, if a component fails twice per year, its frequency is 2/yr, or about 0.17/mo. Using the previous example, the probability of equipment failure can be written

$$P(\text{Equipment Failure}) = P(C_1 + P_1 \cdot P_2 + SS_2 + C_3 \cdot P_3 \cdot C_5 + C_3 \cdot P_4 \cdot C_5)$$

But how does one deal with the right-hand side of this equation? Applying the basic laws of probability and the small-probability (rare-event) approximation, and assuming that the events are independent, the example equation becomes

$$P(\text{Equipment Failure}) \approx P(C_1) + P(P_1)\,P(P_2) + P(SS_2) + P(C_3)\,P(P_3)\,P(C_5) + P(C_3)\,P(P_4)\,P(C_5)$$
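The Boolean reduction and the rare-event approximation for the icon fault tree can be checked numerically. The Python sketch below encodes the tree's gates, lists the five minimal cut sets from the expanded equation, and compares the rare-event sum with the exact top-event probability obtained by enumerating every combination of independent basic events. SS2 is treated as an undeveloped basic event, and the failure probabilities are hypothetical. By Boole's inequality, the rare-event sum is never smaller than the exact value.

```python
import math
from itertools import product

BASIC_EVENTS = ["C1", "C3", "C5", "P1", "P2", "P3", "P4", "SS2"]

# Minimal cut sets read off the expanded Boolean equation.
CUT_SETS = [("C1",), ("P1", "P2"), ("SS2",), ("C3", "P3", "C5"), ("C3", "P4", "C5")]


def top_event(s):
    """Evaluate the icon fault tree's gates for one state s (event -> failed?)."""
    c2 = s["P1"] and s["P2"]            # AND gate
    c4 = s["P3"] or s["P4"]             # OR gate
    ss1 = s["C1"] or c2                 # OR gate
    ss3 = s["C3"] and c4 and s["C5"]    # AND gate
    return ss1 or s["SS2"] or ss3       # OR gate under Equipment Failure


def rare_event_approximation(p):
    """Sum of cut-set probabilities, assuming independent, low-probability events."""
    return sum(math.prod(p[e] for e in cut) for cut in CUT_SETS)


def exact_probability(p):
    """Brute-force sum over all 2**n states of the independent basic events."""
    total = 0.0
    for states in product((False, True), repeat=len(BASIC_EVENTS)):
        s = dict(zip(BASIC_EVENTS, states))
        if top_event(s):
            total += math.prod(p[e] if failed else 1.0 - p[e] for e, failed in s.items())
    return total


# Hypothetical failure probabilities over some fixed operating period.
p = {"C1": 0.01, "C3": 0.02, "C5": 0.02, "P1": 0.05, "P2": 0.05,
     "P3": 0.01, "P4": 0.01, "SS2": 0.005}

print(f"rare-event approximation:   {rare_event_approximation(p):.6f}")
print(f"exact (independent events): {exact_probability(p):.6f}")
```

Brute-force enumeration is only practical for small trees like this one; it is used here purely to show how close the approximation is when probabilities are small.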
References
Dhillon, B.S., Quality Control, Reliability, and Engineering Design, New York: Marcel Dekker, Inc., 1985, pp. 154-163.
Roberts, N., W. Vesely, D. Haasl, F. Goldberg, Fault Tree Handbook, NUREG-0492, Washington, DC: U.S. Nuclear Regulatory Commission, January 1981.
Sundararajan, C., Guide to Reliability Engineering: Data, Analysis, Applications, Implementation, and Management, New York: Van Nostrand Reinhold, 1991, pp. 153-285.
[Figure: Example Pareto diagram; number of failures for Parts 1 through 7]
The Pareto diagram is a vertical or horizontal bar chart used to identify and quantify problems and to determine which problems should be worked on first. The bars give a graphic picture of the problems related to the equipment and are arranged in descending order of importance from left to right. Analyzing failure data and using it to create a Pareto diagram shows how to solve the largest proportion of the overall reliability problem with the most economical use of resources.
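Building a Pareto diagram from failure data amounts to sorting categories by count and tracking the cumulative share. The Python sketch below prints a text version with cumulative percentages; the part names echo the figure, but the failure counts are hypothetical.

```python
def pareto(counts):
    """Sort categories by count (descending) and attach cumulative percentages."""
    total = sum(counts.values())
    rows, running = [], 0
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((name, n, 100.0 * running / total))
    return rows


# Hypothetical failure counts per part.
failures = {"Part 1": 4, "Part 2": 18, "Part 3": 7, "Part 4": 31,
            "Part 5": 2, "Part 6": 11, "Part 7": 27}

for name, n, cumulative in pareto(failures):
    print(f"{name:7} {n:3d} {'#' * n:31} {cumulative:5.1f}%")
```

With these counts, the first two bars account for more than half of all failures, which is exactly the kind of concentration the diagram is meant to expose.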
References
Harrington, H., The Improvement Process, New York: McGraw-Hill, 1987, pp. 108-110, 207.
Ishikawa, K., Guide to Quality Control, White Plains, NY: Quality Resources, 1982, pp. 42-49.
O'Connor, P., Practical Reliability Engineering, Third Edition, New York: John Wiley & Sons, 1991, pp. 270-271.
The Memory Jogger, Second Edition, Methuen, MA: GOAL/QPC, 1988, p. 17.
If equipment, subsystems, components, or parts have a tolerance (or specification) width and are produced by a process that generates variation in the parameter(s) of interest, it is important that the process variation be less than the tolerance width. The ratio of the tolerance width to the process variation is called the process capability index, Cp, and is expressed as
$$C_p = \frac{T}{6\sigma}$$
where T is the tolerance width and 6σ represents an interval of six standard deviations, that is, plus or minus three standard deviations about the process mean. A Cp of 1 indicates that a process will generate approximately 3 out-of-specification units in 1000, given the following assumptions. First, the process is normally distributed and stable. Any systematic divergence, due for example to set-up errors, movement of the process mean during the manufacturing cycle, or other causes, could significantly affect the output. Therefore, Cp is appropriate for characterizing a production process only when the process is under statistical control; that is, there are no special causes of variation such as those just mentioned, only common causes. Common cause variation is the random variation inherent in the process when it is under statistical control. Second, the Cp index assumes that the tolerance center and the process mean coincide; that is, the process average is centered on the nominal value.
The Cpk index uses the Cp index as a starting point for stating a process's capability; however, it also accounts for the process center not being at the nominal value. Cpk is expressed as
$$C_{pk} = (1 - K)\,C_p, \qquad K = \frac{2(D - \bar{x})}{T}$$

if D > x̄; otherwise replace D - x̄ with x̄ - D. D is the design center, x̄ is the process mean, and T is the tolerance width. Ideally, Cp = Cpk, which occurs when the process is centered (K = 0). There are several things to keep in mind when using the Cp and Cpk indices:
• If the process is not stable, Cp and Cpk are meaningless statistics.
• Not all processes can be assumed to be normally distributed; a naive user may then incorrectly assess the fraction of process output that will be out of specification.
• Cp and Cpk do not yield the same information about a process.
• Both Cp and Cpk are closely tied to traditional 0-1 loss and do not account for losses incurred by being off target; each measures distance from the specification limits, not distance from the target.
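The Cp and Cpk definitions translate directly into code. The Python sketch below assumes specification limits LSL and USL (so T = USL - LSL and D is their midpoint) and uses K = 2|D - x̄|/T, which covers both cases of the sign rule. The out-of-specification fraction uses the normal distribution and is meaningful only for a stable, normally distributed process, as the cautions above stress. Function names and example numbers are hypothetical.

```python
import math


def cp(lsl, usl, sigma):
    """Cp = T / (6 sigma), with T = USL - LSL."""
    return (usl - lsl) / (6.0 * sigma)


def cpk(lsl, usl, mean, sigma):
    """Cpk = (1 - K) * Cp, with K = 2|D - mean| / T and D the design center."""
    t = usl - lsl
    d = (usl + lsl) / 2.0
    k = 2.0 * abs(d - mean) / t
    return (1.0 - k) * cp(lsl, usl, sigma)


def fraction_out_of_spec(lsl, usl, mean, sigma):
    """Expected nonconforming fraction, valid only for a stable, normal process."""
    def upper_tail(z):
        return 0.5 * math.erfc(z / math.sqrt(2.0))
    return upper_tail((usl - mean) / sigma) + upper_tail((mean - lsl) / sigma)


# Centered process with T = 6 sigma: Cp = Cpk = 1, about 3 nonconforming per 1000.
print(cp(0, 6, 1), cpk(0, 6, 3, 1), fraction_out_of_spec(0, 6, 3, 1))
# Off-center process: Cp = 2, but Cpk is reduced because the mean is 1 sigma from D.
print(cp(0, 12, 1), cpk(0, 12, 7, 1))
```

As a cross-check, (1 - K)Cp equals the familiar form min(USL - x̄, x̄ - LSL)/(3σ) whenever the mean lies within the specification limits.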
References
Gitlow, H., S. Gitlow, A. Oppenheim, R. Oppenheim, Tools and Methods for the Improvement of Quality, Boston, MA: Irwin, 1989, pp. 451-457.
Kane, V.E., "Process Capability Indices," Journal of Quality Technology, Vol. 18, No. 1, January 1986, pp. 41-52.
O'Connor, P., Practical Reliability Engineering, Third Edition, New York: John Wiley & Sons, 1991, pp. 302-303.
Sullivan, L., "Reducing Variability: A New Approach to Quality," Quality Progress, July 1985, pp. 15-21.
The Memory Jogger, Second Edition, Methuen, MA: GOAL/QPC, 1988, pp. 64-68.
SEMATECH
121
[Figure: QFD "house of quality" icon, with regions labeled What?, How?, Priority, Relationship Matrix, Correlation Matrix, Importance Ratings, and How much?]
The focus of QFD is almost entirely on the customer, that is, on the voice of the customer. QFD promotes an attitude of problem avoidance rather than problem solving. QFD is best used in a team or group context, because the information required to complete a QFD matrix is usually spread across many disciplines and skill sets. It ranges from a few simple (but presumably accurate) statements of customer needs all the way to the most detailed manufacturing process description. Therefore, QFD cannot be used effectively by a single person.
Advantages of QFD include:
Promoting careful planning of the equipment through all life cycle phases, with attention paid to customer needs
Eliminating spurious engineering and process requirements, that is, those that play no role in meeting customer needs
Shortening the time it takes to move from the concept and feasibility phases to the production and operation phases, by avoiding late life cycle changes that stretch out the cycle time
Identifying problem areas early, exposing areas for improvement, and providing documentation for these activities
Difficulties with QFD include:
Being semi-quantitative, QFD does not replace good engineering judgement and good sense
An inability to compensate for an inaccurate or incomplete list of customer needs
Not being designed to promote innovation in the sense of new or radical product ideas
Requiring a wide variety of expertise and a team environment
The basics of the QFD matrix are simple, although in practice it takes a great deal of work to collect the information needed to create it. Generally the QFD matrix consists of seven parts: What?, How?, the relationship matrix, priority, the correlation matrix, importance ratings, and How much?
What? is a collection of simple statements of customer wants, needs, or requirements, that is, the voice of the customer. These statements are easy for the customer to identify with and understand, and they accurately and simply list the characteristics or properties that make the customer happy. How? is a list of the engineering, design, and technical properties necessary to develop the equipment. The What? list becomes the row titles of the QFD matrix, and the How? list becomes the column titles (see the icon at the beginning of the QFD discussion).
The relationship matrix relates the What? rows to the How? columns. A relevance number or symbol is assigned to each intersection of a row and a column, which establishes the relationship between what the customer wants and how the equipment will meet those wants. Usually an extra column, called priority, is placed just to the left of the relationship matrix. It assigns importance weights to the customer wants; that is, it identifies which wants matter most to the customer and therefore which characteristics will get the most focus. These weights are determined with the customer, or at least with very good knowledge of what the customer wants.
Engineering, design, and technical properties are not independent of one another, so it is necessary to examine how they relate to one another. This produces the roof of the house of quality, the correlation matrix. It is also necessary to determine whether properties are positively or negatively correlated; strength and flexibility are an example of negatively correlated properties. The matrix is usually expanded further to include the importance ratings and the How much? section. The importance ratings are numbers derived from the relationship matrix values and the priority column; they indicate the importance of each property with respect to the customer wants. The How much? section contains the target value for every property listed under How? and answers the question, "How much is enough?"
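The text says only that the importance ratings are derived from the relationship matrix and the priority column. A common way to do this, assumed here rather than taken from this guideline, is a weighted column sum: each How? column's rating is the sum over the What? rows of priority times relationship strength, with strengths on a 9/3/1 (strong/medium/weak) scale. All wants, properties, and numbers below are hypothetical.

```python
from typing import Dict, List

# Hypothetical What? rows with customer priority weights (1-5 scale assumed).
PRIORITIES: Dict[str, int] = {
    "high uptime": 5,
    "easy maintenance": 3,
    "low particle count": 4,
}

# Hypothetical How? columns (engineering properties).
HOWS: List[str] = ["MTBF", "MTTR", "filter efficiency"]

# Relationship matrix: 9 = strong, 3 = medium, 1 = weak; a missing entry = no relationship.
RELATIONSHIPS: Dict[str, Dict[str, int]] = {
    "high uptime": {"MTBF": 9, "MTTR": 3},
    "easy maintenance": {"MTTR": 9, "MTBF": 1},
    "low particle count": {"filter efficiency": 9, "MTBF": 1},
}


def importance_ratings(priorities, hows, relationships):
    """Weighted column sums: rating[how] = sum over wants of priority * strength."""
    return {
        how: sum(weight * relationships[what].get(how, 0)
                 for what, weight in priorities.items())
        for how in hows
    }


if __name__ == "__main__":
    ratings = importance_ratings(PRIORITIES, HOWS, RELATIONSHIPS)
    for how, rating in sorted(ratings.items(), key=lambda item: item[1], reverse=True):
        print(f"{how:18s} {rating}")
```

With these illustrative numbers MTBF rates highest (52), followed by MTTR (42) and filter efficiency (36), so MTBF would receive the most design attention.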
Modeling produces its maximum economic benefit when performed during the design phase of the equipment life cycle, although it can also provide economic benefits when applied to existing equipment. Developing a system model depends heavily on the user's understanding of the equipment being modeled, and using the model properly also requires the analyst to have a working knowledge of several concepts in statistics, probability, and reliability.
Version 1.0 of RAMP provides the capability for developing, editing, and evaluating reliability models of equipment used in semiconductor manufacturing. This capability is supported by an integrated data management system and an integrated graphics output capability. The following features were included to make the software as user friendly as possible:
Menu driven. All options available to the user can be accessed from on-screen menus.
Help screens. Context-sensitive help is available to the user at all times.
Mouse support. Mouse support is provided on all screens where use of the mouse significantly improves the user interface.
Graphics output. Graphics output is fully integrated into the software.
Modular design. The software package is modular to allow easy modification or addition of capabilities.
Integrated data management. Management of component data is fully integrated into the software.
File management. Management of file names and file identification is transparent to the user.
[Figure 4-1 block labels: WHS-TC-VS, WHS-ROBTARM, WHS-ROBSERV, WHS-ROBWSEN, WHS-ROBELEC, WHS-ELEC PS, WHS-ELEC CIB]
Figure 4-1. A Block Model Developed in RAMP for the SETEC Generic Wafer Handler System
A system model for the equipment is easily developed in RAMP in the form of a block diagram. Figure 4-1 gives an example of a block model representation of a SETEC generic wafer handler as developed in RAMP by the analyst. The system is represented with 14 components in series (7 of which are shown in Figure 4-1). Component failure rate information, including a characterization of its uncertainty, is entered into the component data library in RAMP. RAMP converts the block diagram model in Figure 4-1 into a mathematical equation and uses random sampling techniques to draw the component failure rates from the component data library. The output from RAMP provides complete sensitivity and uncertainty analysis results for various performance measures associated with a reliability analysis of the modeled system, including:
System MTBF: the MTBF of the modeled system. A range of values for the MTBF, and the distribution associated with that range, is provided.
Component contribution to system failure: the fractional contribution that a component makes to the failure of the system.
Component contribution to subsystem failure: the fractional contribution that a component makes to the failure of its subsystem.
Subsystem contribution to system failure: the fractional contribution that a subsystem makes to the failure of the system.
Reliability improvement: for a component, the system MTBF (in hours) that would result if that component's failure rate were zero, that is, if the component were perfectly reliable or nearly so.
Uncertainty importance: a measure of the contribution of a component to the uncertainty in the probability of system failure.
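The sampling approach described above can be sketched in Python for a purely series system. The sketch assumes constant failure rates (so component rates add and the system MTBF is the reciprocal of their sum) and lognormal uncertainty distributions described by a median and an error factor; these distributions, the component subset, and all numbers are hypothetical and are not RAMP's actual library or algorithm. For each sample it computes the system MTBF, each component's fractional contribution to system failure, and the reliability-improvement MTBF obtained by setting that component's rate to zero.

```python
import math
import random
import statistics

# Hypothetical component data: (median failure rate per hour, error factor),
# where error factor = 95th percentile / median of a lognormal distribution.
COMPONENTS = {
    "WHS-ROBSERV": (2.0e-3, 3.0),
    "WHS-ROBWSEN": (1.5e-3, 3.0),
    "WHS-ELEC PS": (5.0e-4, 2.0),
    "WHS-ELEC CIB": (3.0e-4, 2.0),
}
Z_95 = 1.6449  # standard normal 95th percentile


def sample_rates(components, rng):
    """Draw one failure rate per component from its lognormal distribution."""
    rates = {}
    for name, (median, error_factor) in components.items():
        sigma = math.log(error_factor) / Z_95
        rates[name] = rng.lognormvariate(math.log(median), sigma)
    return rates


def simulate_series_system(components, n_samples=5000, seed=1):
    rng = random.Random(seed)
    mtbf_samples = []
    contribution = {name: [] for name in components}
    improvement = {name: [] for name in components}
    for _ in range(n_samples):
        rates = sample_rates(components, rng)
        system_rate = sum(rates.values())  # series system: failure rates add
        mtbf_samples.append(1.0 / system_rate)
        for name, rate in rates.items():
            contribution[name].append(rate / system_rate)
            # System MTBF if this component were perfectly reliable (rate = 0).
            improvement[name].append(1.0 / (system_rate - rate))
    return mtbf_samples, contribution, improvement


if __name__ == "__main__":
    mtbf, contrib, improve = simulate_series_system(COMPONENTS)
    cuts = statistics.quantiles(mtbf, n=20)  # cuts[0] = 5th, cuts[-1] = 95th percentile
    print(f"mean MTBF {statistics.fmean(mtbf):.0f} h, "
          f"5th {cuts[0]:.0f} h, 95th {cuts[-1]:.0f} h")
    for name in COMPONENTS:
        print(name, round(statistics.fmean(contrib[name]), 3),
              round(statistics.fmean(improve[name])))
```

Repeating the calculation with a modified library entry (for example, a halved failure rate for one component) is the same kind of what-if comparison the text describes for the improved elevator.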
Results produced by RAMP are available in various types of displays, including:
Histograms: a histogram is a graphical presentation of sample data using classes (that is, intervals) on the x axis and relative frequency on the y axis.
Cumulative distribution functions (CDFs): a CDF is a graph of the cumulative relative frequency (cumulative fraction) of observations less than or equal to a given value.
Pareto diagrams: a Pareto diagram is a bar chart with the displayed values ordered from largest to smallest. RAMP orders displayed values by their mean and also displays the 5th and 95th percentiles when they are available.
Summary statistics: a written list of all the statistics calculated by RAMP, such as the average MTBF, the standard deviation of MTBF, and selected quantiles of the uncertainty distribution for MTBF.
Input samples: this option allows the analyst to view or print the input failure rates as sampled from the component failure rate distributions.
Output results from samples: this option allows the analyst to view or print the numerical results calculated for each set of sampled failure rates.
Statistical results: this option allows the analyst to view or print selected statistical results, such as the mean value for all components.
Based on the characterization of the failure rates in the component data library for the SETEC generic wafer handler system shown in Figure 4-1, the summary statistics produced by RAMP give a mean MTBF of 93 hrs, with about a 5 percent chance of it being less than 50 hrs and a 5 percent chance of it exceeding 178 hrs. A graph of the estimated cumulative distribution function for MTBF produced by RAMP is given in Figure 4-2.
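A Pareto display of the kind described above can be sketched by ordering components by the mean of their sampled contributions and reporting the 5th and 95th percentiles alongside. The input format (a dictionary of per-component sample lists), the text-bar rendering, and the illustrative data are assumptions; in practice the samples would come from an uncertainty analysis.

```python
import random
import statistics
from typing import Dict, List, Tuple


def pareto_rows(samples: Dict[str, List[float]]) -> List[Tuple[str, float, float, float]]:
    """Return (name, 5th percentile, mean, 95th percentile), largest mean first."""
    rows = []
    for name, values in samples.items():
        cuts = statistics.quantiles(values, n=20)  # 19 cut points: 5%, 10%, ..., 95%
        rows.append((name, cuts[0], statistics.fmean(values), cuts[-1]))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


def text_bar(value: float, scale: float = 50.0) -> str:
    return "#" * int(round(value * scale))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sampled contributions to system failure (fractions).
    demo = {
        "elevator door": [rng.uniform(0.05, 0.15) for _ in range(200)],
        "robot servo": [rng.uniform(0.30, 0.50) for _ in range(200)],
        "sensor amplifiers": [rng.uniform(0.10, 0.20) for _ in range(200)],
    }
    for name, p5, mean, p95 in pareto_rows(demo):
        # Three bars per component: 95th percentile, mean, and 5th percentile.
        print(f"{name:18s} 95th {text_bar(p95)}")
        print(f"{'':18s} mean {text_bar(mean)}")
        print(f"{'':18s}  5th {text_bar(p5)}")
```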
Figure 4-2. An Estimate of the Cumulative Distribution Function for MTBF
The Pareto diagram in Figure 4-3 identifies the components that are the dominant contributors to system failure, such as the robot servo, robot wafer sensor, elevator door, and sensor amplifiers. The Pareto diagram uses three horizontal bars for each component name rather than the usual one bar, in order to display the uncertainty associated with each component's contribution to system failure. The three bars represent the 95th percentile, the mean, and the 5th percentile of the distribution of the component's contribution to system failure.
Now assume that the engineers working on the SETEC generic wafer handler have developed a new and improved elevator whose MTBF is twice that of the original. The component data library is modified to reflect the new MTBF for the elevator. In addition, the engineers would like to evaluate the impact on system reliability of a design change that adds redundancy by placing a second robot wafer sensor in parallel with the first. Because the sensors are in parallel, both must fail before the system fails, which improves the system MTBF. The block diagram model is modified to include this design change; the modified block diagram is shown in Figure 4-4.
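The effect of placing a second sensor in parallel can be illustrated with a simple non-repairable calculation. The sketch assumes two identical, independent sensors with a constant (exponential) failure rate and ignores repair, so it shows the direction of the improvement rather than reproducing RAMP's MTBF results; the failure rate and mission time are hypothetical.

```python
import math


def reliability_exponential(failure_rate: float, hours: float) -> float:
    """Probability that one unit with a constant failure rate survives `hours`."""
    return math.exp(-failure_rate * hours)


def parallel_reliability(unit_reliability: float, n_units: int = 2) -> float:
    """n identical, independent units in parallel fail only if all of them fail."""
    return 1.0 - (1.0 - unit_reliability) ** n_units


def mttf_parallel_pair(failure_rate: float) -> float:
    """MTTF of two identical exponential units in active parallel, with no repair."""
    return 1.0 / failure_rate + 1.0 / (2.0 * failure_rate)


if __name__ == "__main__":
    rate, mission = 1.0e-3, 100.0  # hypothetical: failures per hour, hours
    single = reliability_exponential(rate, mission)
    pair = parallel_reliability(single)
    print(f"single sensor R = {single:.4f}, parallel pair R = {pair:.4f}")
    print(f"MTTF single = {1.0 / rate:.0f} h, parallel pair = {mttf_parallel_pair(rate):.0f} h")
```

The probability of losing the sensing function over the mission drops from 1 − R to (1 − R)², which is why a redundant pair can fall out of the top contributors in a Pareto ranking.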
[Figure 4-4 block label: WHS-ROBWSENP]
Figure 4-4. A Revised Block Diagram for the SETEC Generic Wafer Handler System, Showing the Addition of the Redundant Wafer Sensor
After these modifications, the summary statistics produced by RAMP give a mean MTBF of 137 hrs, an increase of about 47 percent. There is approximately a 5 percent chance of the MTBF being less than 64 hrs and a 5 percent chance of it exceeding 249 hrs. A graph of the estimated cumulative distribution function for MTBF produced by RAMP is given in Figure 4-5.
Figure 4-5. An Estimate of the Cumulative Distribution Function for MTBF after Modifying the Generic Wafer Handler System
The new Pareto diagram, given in Figure 4-6, shows that the wafer sensor is no longer a problem: it has dropped out of the top ten components contributing to system failure. In addition, the elevator door has now dropped behind the sensor amplifiers in the rankings.
Figure 4-6. A Pareto Diagram for Component Contribution to System Failure after Modifying the Generic Wafer System
This example has illustrated how RAMP provides a prediction of the system MTBF, including the uncertainty in that prediction, after two improvements are made to the system. Modeling has thus provided a tool for adopting a proactive rather than a reactive position with respect to changing the system to improve its reliability. That is, before committing the company's resources, the analyst has a good idea of how the proposed changes will affect the performance of the system and knows where to spend those resources to obtain an even greater improvement. This simple example gives a flavor of how RAMP works and demonstrates the usefulness of modeling. Modeling alone does not make a system reliable, but it does provide an organized means of understanding the system and a tool to guide the wise expenditure of resources for improved reliability.
Reliability Block Diagram (RBD) models are one of the tools that can be used to create a reliability model of equipment. One of the easiest ways to describe the basic ideas behind RBD models is to create a simple RBD; for a more detailed description of the diagrams, see the sources listed in the references. Construction of a reliability block diagram begins by defining what is meant by equipment failure; for example, equipment failure may be defined as any failure that causes the equipment to be down for 8 minutes or longer. Once this is done, the next step is to determine the various ways that this failure can occur. This is initially done at a gross level; that is, 10 to 20 subsystems are defined that can lead to equipment failure. A block diagram model that consists of 3 subsystems (SS1, SS2, and SS3) follows:
[Block diagram: SS1, SS2, and SS3 connected in series]
In this example SS2 is not a significant contributor to the unreliability of the equipment, so it will not be broken down in any more detail. SS1 and SS3, however, are contributors to equipment unreliability. SS1 fails if component 1 or 2 (C1 or C2) fails. SS3 fails only if components 3, 4, and 5 (C3, C4, and C5) all fail. The block diagram model now looks like:
[Block diagram: C1 and C2 in series (replacing SS1), then SS2, then C3, C4, and C5 in parallel (replacing SS3), all connected in series]
Further analysis reveals that C2 fails if parts 1 and 2 (P1 and P2) both fail, and that C4 fails if part 3 or 4 (P3 or P4) fails. The block diagram model now looks like:
[Block diagram: C1, then P1 and P2 in parallel (replacing C2), then SS2, then a parallel group consisting of C3, P3 and P4 in series (replacing C4), and C5, all connected in series]
Once construction of the model is complete, it is translated into a Boolean equation, which is then used to quantify the equipment reliability. The references discuss Boolean algebra in detail, so it is not covered here; in the equations below, + denotes OR and * denotes AND. The Boolean equation for the RBD is:
Equipment Failure = C1 + P1 * P2 + SS2 + [C3 * (P3 + P4) * C5]
Expanding and using the associative and distributive laws gives
Equipment Failure = C1 + P1 * P2 + SS2 + C3 * P3 * C5 + C3 * P4 * C5
Each term in this equation represents a way that the equipment can fail. For example, if both part 1 and part 2 (P1 and P2) fail, the equipment fails.
With the reliability block diagram translated into an equation, the next step is to quantify the probability that the equipment fails as a function of its subsystems, components, and parts. The term probability is often used when what is really meant is frequency. Probabilities must lie between 0 and 1, whereas a frequency can be any number greater than or equal to 0, depending on the number of failures and the time scale used. For example, if a component fails twice per year, its frequency is 2/yr, or about 0.17/mo. Using the previous example, the probability of equipment failure can be written
P(Equipment Failure) = P(C1 + P1 * P2 + SS2 + C3 * P3 * C5 + C3 * P4 * C5)
But how does one deal with the right-hand side of this equation? Applying the basic laws of probability and the small probability approximation, and assuming that the events are independent, the example equation becomes
P(Equipment Failure) = P(C1) + P(P1)*P(P2) + P(SS2) + P(C3)*P(P3)*P(C5) + P(C3)*P(P4)*P(C5)
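The quantification step can be checked numerically. This sketch encodes each term of the expanded Boolean equation as a cut set, applies the small probability (rare event) approximation used above, and compares it with an exact result obtained by enumerating every combination of failed and working events, assuming independence. The event probabilities are hypothetical.

```python
import math
from itertools import product

# Hypothetical failure probabilities for the basic events in the example RBD.
PROBS = {"C1": 0.01, "P1": 0.05, "P2": 0.05, "SS2": 0.002,
         "C3": 0.10, "P3": 0.02, "P4": 0.03, "C5": 0.10}

# One cut set per term of: C1 + P1*P2 + SS2 + C3*P3*C5 + C3*P4*C5
CUT_SETS = [("C1",), ("P1", "P2"), ("SS2",), ("C3", "P3", "C5"), ("C3", "P4", "C5")]


def rare_event_approx(probs, cut_sets):
    """Sum of cut set probabilities (independent events, small probabilities)."""
    return sum(math.prod(probs[event] for event in cut) for cut in cut_sets)


def exact_probability(probs, cut_sets):
    """Exact P(equipment failure) by enumerating all failed/working combinations."""
    names = list(probs)
    total = 0.0
    for states in product((False, True), repeat=len(names)):
        failed = dict(zip(names, states))
        if any(all(failed[event] for event in cut) for cut in cut_sets):
            total += math.prod(probs[n] if failed[n] else 1.0 - probs[n] for n in names)
    return total


if __name__ == "__main__":
    print(f"approximate: {rare_event_approx(PROBS, CUT_SETS):.6f}")
    print(f"exact:       {exact_probability(PROBS, CUT_SETS):.6f}")
```

Because the approximation simply adds the cut set probabilities, it can only overstate the exact value, and the gap shrinks as the individual probabilities get smaller.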
References
Campbell, J., R. Iman, D. Longsine, B. Thompson, A Tutorial on Reliability Modeling Using RAMP, SETEC91-030, Albuquerque, NM: Sandia National Laboratories, pp. 9-31.
Klinger, D., Y. Nakada, M. Menendez, AT&T Reliability Manual, New York: Van Nostrand Reinhold, 1990, pp. 78-91.
MIL-STD-756B, Reliability Modeling and Prediction, Washington, DC: Department of Defense, 18 November 1981, pp. 1001-1 to 1001-11.
3. Quality improvement requires never-ending reduction of variation in product and process performance around desired values. The quality of a product cannot be improved unless its quality characteristics can be identified and measured and their ideal values are known. Each quality characteristic varies from unit to unit and over time. The objective of a continuous quality improvement process is to reduce this variation, that is, to make the quality characteristics as close to their ideal values as possible. However, it is generally neither economical nor necessary to improve all quality characteristics, since not all characteristics are of equal importance. Performance characteristics are those characteristics that determine the product's performance in satisfying the customer's requirements. The ideal value of a performance characteristic is called the target value. If a product is of high quality, its performance characteristics remain close to their target values under all operating conditions. The variation of a performance characteristic about its target value is referred to as performance variation; the smaller the performance variation about the target value, the better the quality. Target specifications are typically stated in terms of nominal values and tolerances about these values. It is not acceptable to state target values only as interval specifications, because this encourages the idea that it is fine to be anywhere within the interval and that performance somehow deteriorates abruptly once a characteristic moves outside it. The goal is for the performance characteristics to always be at their target values.
4. Society's loss due to performance variation is frequently proportional to the square of the deviation of the performance characteristic from its target value. Any variation of a product's performance characteristic about its target value causes a loss to society. This loss can range from inconvenience to monetary loss and physical harm.
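The quadratic loss idea of point 4 can be made concrete with a short sketch. It assumes the nominal-is-best form, loss = k(Y − τ)² for a performance characteristic Y with target τ, with k fixed from one known loss (here a hypothetical $20 loss at a deviation of 2 units). It shows that the average loss over a set of units equals k times the mean squared error about the target, that is, k[variance + (mean − target)²], so two processes with the same spread can have very different losses. The measurements are hypothetical.

```python
import statistics


def loss_coefficient(known_loss: float, known_deviation: float) -> float:
    """k in loss = k * (Y - target)**2, fixed from one known (deviation, loss) pair."""
    return known_loss / known_deviation ** 2


def quadratic_loss(y: float, target: float, k: float) -> float:
    return k * (y - target) ** 2


def average_loss(samples, target: float, k: float) -> float:
    """Average loss per unit, computed directly from the samples."""
    return statistics.fmean(quadratic_loss(y, target, k) for y in samples)


def average_loss_from_moments(samples, target: float, k: float) -> float:
    """Same quantity via k * [variance + (mean - target)**2], using the population variance."""
    mean = statistics.fmean(samples)
    return k * (statistics.pvariance(samples, mean) + (mean - target) ** 2)


if __name__ == "__main__":
    k = loss_coefficient(known_loss=20.0, known_deviation=2.0)  # $5 per unit squared
    target = 10.0
    on_target = [9.0, 11.0, 9.5, 10.5]     # mean 10
    off_target = [11.0, 13.0, 11.5, 12.5]  # same spread, mean shifted by 2
    for label, data in (("on target", on_target), ("off target", off_target)):
        print(label, average_loss(data, target, k), average_loss_from_moments(data, target, k))
```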
Variation is represented mathematically in the following manner. Let Y be a performance characteristic measured on a continuous scale, and let the target value of Y be τ. Let ℓ(Y) represent the dollar losses suffered by society at some time during the product's life span due to the deviation of Y from τ. Generally, the larger the deviation of the performance characteristic Y from its target value τ, the larger the loss to society, ℓ(Y). However, it is usually difficult to determine the actual mathematical form of ℓ(Y). Often, a quadratic approximation to ℓ(Y) adequately represents the economic losses due to the deviation of Y from τ. The simplest quadratic loss function is
$$\ell(Y) = k\,(Y - \tau)^2,$$
where k is an unknown constant that can be determined when ℓ(Y) is known for a particular value of Y. Three cases of the loss function are typically used: nominal is best, when a specific target value is best and the loss increases symmetrically as the performance characteristic deviates from the target; smaller is better, for example when the performance characteristic is the amount of an impurity and the target value is zero, so that the smaller the impurity, the better; and larger is better, for example when the performance characteristic is strength, so that the larger the strength, the better. The average loss to society due to performance variation is obtained by statistically averaging the quadratic loss ℓ(Y) = k(Y − τ)² over the possible values of Y. For quadratic loss functions, the average loss due to performance variation is proportional to the mean squared error of Y about its target value τ; if Y has mean μ and variance σ², then
$$E[\ell(Y)] = k\,E\left[(Y - \tau)^2\right] = k\left[\sigma^2 + (\mu - \tau)^2\right].$$
Therefore the fundamental measure of variability is the mean squared error, not the variance. The concept of quadratic loss emphasizes the importance of continuously reducing performance variation.
5. The final quality and cost of a manufactured product are determined to a large extent by the engineering design of the product and its manufacturing process. The number of manufacturing imperfections in a product, and hence its manufacturing cost, is significantly affected by the design of the product and of the process used to produce it. Generally, a product's field performance is affected by environmental variables, human variations in operating the product, product deterioration, and manufacturing imperfections. Note that these sources of variation are chronic problems. Manufacturing imperfections are the deviations of the actual parameters of a manufactured product from their nominal values. These imperfections are caused by inevitable uncertainties in a manufacturing process and are responsible for performance variation across different units of a product. Variation due to environmental factors and product deterioration can be addressed only in the product's concept and design phases. The manufacturing costs and imperfections in a product are largely determined by the design of the manufacturing process. Increasing process controls can reduce manufacturing imperfections; however, process controls cost money. It is therefore necessary to reduce both manufacturing imperfections and the need for process controls. Once the process is under statistical control, it can be improved. Without a stable process it is almost impossible to discover a means of reducing variation due to chronic problems.
6. Performance variation can be reduced by exploiting the nonlinear effects of a product's or process's parameters on the product's desired performance characteristics. Because of the importance of product and process design, quality control must begin in the concept phase of the life cycle and continue through all phases. There are two types of quality control methods:
Off-line methods, which are technical aids for quality and cost control in product and process design. These are used to improve product quality and manufacturability, and to reduce product development, manufacturing, and lifetime costs.
On-line methods, which are technical aids for quality and cost control in manufacturing.
As with performance characteristics, all specifications of product and process parameters should be stated in terms of ideal values and tolerances around these ideal values. The idea is not to produce products whose parameters are barely inside the tolerance intervals; such products are likely to be of poor quality because of the interdependencies among the parameters. A product performs best when all of its parameters are at their ideal values. Further, knowing the ideal values of product and process parameters encourages continuous quality improvement. Taguchi has introduced a three-step approach to assigning nominal values and tolerances to product and process parameters: system design, parameter design, and tolerance design.
System design involves applying scientific and engineering knowledge to produce a basic functional prototype design. The prototype model defines the initial settings of the product or process parameters. System design requires an understanding of both the customer's needs and the manufacturing environment: a product cannot satisfy the customer's needs unless it is designed to do so, and designing for manufacturability requires an understanding of the manufacturing environment. Parameter design involves identifying the settings of product or process parameters that reduce the sensitivity of engineering designs to the sources of variation. Adjusting the mean value of a performance characteristic to its target value is usually a much easier engineering problem than reducing performance variation. Using the nonlinear effects of product or process parameters on the performance characteristics to reduce the sensitivity of engineering designs to the sources of variation is the essence of parameter design. Because parameter design reduces performance variation by reducing the influence of the sources of variation rather than by controlling them, it is a very cost-effective technique for improving engineering designs. It is economically advantageous for a designer to provide designs that are tolerant of statistical variation. Tolerance design involves determining tolerances around the nominal settings identified by parameter design. Industry commonly assigns tolerances by convention rather than by science. Narrow tolerances increase manufacturing costs, while wide tolerances increase performance variation. Thus, tolerance design is a trade-off between society's loss due to performance variation and the increase in manufacturing costs.
7. Statistically planned experiments can be used to identify the settings of product (and process) parameters that reduce performance variation. This is the portion of Taguchi's methodology that is subject to criticism.
Engineers tend to like Taguchi's statistical methods because he has made a serious effort to develop methods that are easy for non-statisticians to use. However, Taguchi's experiments can be enormous and extremely inefficient. Taguchi's approach to the use of statistically planned experiments for parameter design involves classifying the variables that affect the performance characteristics of a product or process into two categories: design parameters and sources of noise. Design parameters are those product or process parameters whose nominal settings can be chosen by the responsible engineer; these nominal settings define the product or process design specifications, and vice versa. The sources of noise are all those variables that cause the performance characteristics to deviate from their target values. The noise factors are those sources of noise that can be systematically varied in a parameter design experiment. The key noise factors, those that represent the major sources of noise affecting a product's performance in the field and a process's performance in the manufacturing environment, should be identified and included in the experiment.
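The idea behind parameter design (exploit a nonlinear response so that noise has less effect, then use an easily adjusted parameter to put the mean on target) can be sketched with a toy crossed experiment. Everything here is hypothetical: the saturating response function, the design parameter levels (inner array), the noise factor levels (outer array), and the linear adjustment factor. The nominal-is-best signal-to-noise ratio 10·log10(mean²/variance) is one of the ratios commonly associated with Taguchi's methods; it is used here as an assumption, not as this guideline's prescription.

```python
import math
import statistics

TARGET = 5.0
DESIGN_LEVELS = [0.5, 1.0, 2.0, 3.0]  # hypothetical settings of design parameter x
NOISE_LEVELS = [-0.2, 0.0, 0.2]       # hypothetical noise factor levels


def response(x: float, noise: float, gain: float = 1.0) -> float:
    """Toy performance characteristic: saturating in x, so its slope shrinks as x grows.

    `gain` is a hypothetical adjustment factor that scales the output linearly.
    """
    return gain * (1.0 - math.exp(-(x + noise)))


def sn_nominal_the_best(values) -> float:
    """Signal-to-noise ratio 10*log10(mean^2 / variance); larger means less sensitive to noise."""
    return 10.0 * math.log10(statistics.fmean(values) ** 2 / statistics.variance(values))


def parameter_design(design_levels, noise_levels, target):
    # Step 1: choose the design setting whose output is least sensitive to the noise factors.
    best_x = max(design_levels,
                 key=lambda x: sn_nominal_the_best([response(x, n) for n in noise_levels]))
    # Step 2: use the adjustment factor to move the mean onto the target.
    mean_y = statistics.fmean(response(best_x, n) for n in noise_levels)
    return best_x, target / mean_y


def mean_squared_error(x, gain, noise_levels, target) -> float:
    return statistics.fmean((response(x, n, gain) - target) ** 2 for n in noise_levels)


if __name__ == "__main__":
    x, gain = parameter_design(DESIGN_LEVELS, NOISE_LEVELS, TARGET)
    print(f"chosen x = {x}, gain = {gain:.3f}, "
          f"MSE = {mean_squared_error(x, gain, NOISE_LEVELS, TARGET):.4f}")
```

Because the toy response flattens as x grows, the largest x is least affected by the noise levels; scaling the gain then removes the bias, leaving the smallest mean squared error about the target.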