
Guidelines for Equipment Reliability

SEMATECH
Technology Transfer 92031014A-GEN

SEMATECH and the SEMATECH logo are registered service marks of SEMATECH, Inc.
Product names and company names used in this publication are for identification purposes only and may be trademarks or service marks of their respective companies.

© 1997 SEMATECH, Inc.


May 5, 1992

Abstract:

This guideline was developed by a task force comprised of reliability experts and users of reliability methodologies from the SEMI/SEMATECH member companies. The document was written to address the needs of semiconductor equipment manufacturers and their customers. It includes a description of the principles of a cost-effective reliability program, instructions on how to get started, and details on what needs to be done. A large portion of the document is dedicated to analysis and testing methodologies. These include: Failure Modes and Effects Analysis (FMEA), Fault Tree Analysis (FTA), Component Failure Analysis (CFA), Human Reliability Analysis (HRA); and Reliability Testing, Component Testing, Accelerated Testing (Sudden Death, Step-Stress Testing), Burn-in Testing, Life Testing, Environmental Stress Screening, Qualification Testing, and Acceptance Testing.

Keywords: Life Cycle Phases, Reliability Testing, RAMP, Failure, FRACAS, Failure Modes and Effects Analysis, Quality Function Deployment (QFD), Design of Experiment, Cost of Ownership, Infant Mortality, Reliability Qualification Testing (RQT), Taguchi, Users Groups, Reliability Block Diagram Modeling (RBD), Environmental Stress Screening (ESS), Fault Tree Analysis (FTA)

Authors: Dhudsia, Vallabh

Approvals:

Vallabh Dhudsia, Project Manager & Author
Keith Erickson, Director
Dan McGowan, Technical Information Transfer Team Leader

Table of Contents

1 SUMMARY
2 THE RELIABILITY IMPROVEMENT PROCESS AND EQUIPMENT LIFE CYCLE
  2.1 Introduction
  2.2 The Equipment Life Cycle
  2.3 Life Cycle Phases
  2.4 Life Cycle Cost
  2.5 The Reliability Improvement Process
  2.6 Applying the Reliability Improvement Process
  2.7 Summary
  2.8 References
3 IMPLEMENTATION OF THE RELIABILITY IMPROVEMENT PROCESS
  3.1 Introduction
  3.2 Management's Role
  3.3 Applying the Reliability Improvement Process
  3.4 Specific Applications of the Reliability Improvement Process
    3.4.1 Starting with Equipment in the Design Phase
    3.4.2 Starting with Equipment in the Prototype Phase
    3.4.3 Starting with Equipment in the Pilot Production Phase
    3.4.4 Starting with Equipment in the Production and Operation Phase
    3.4.5 Starting with Equipment in Phase Out Phase
  3.5 Functional Responsibilities
  3.6 Where to Begin
  3.7 Reliability Plans
  3.8 Application of Resources and Communicating Value
  3.9 Summary
  3.10 References
4 ACTIVITIES AND TOOLS IN THE RELIABILITY IMPROVEMENT PROCESS
  4.1 Introduction
  4.2 Reliability Activities


List of Figures

Figure 2-1. Percent of Total Life Cycle Costs vs Locked-in Costs
Figure 2-2. Impact of a reliability program on life cycle cost
Figure 2-3. Optimizing Life Cycle Costs
Figure 2-4. Decrease in Life Cycle Costs in New Generations of Equipment
Figure 2-5. The Reliability Improvement Process
Figure 2-6. Application of Reliability Improvement Process
Figure 3-1. Multiple Equipment and Their Life Cycle Phase Status
Figure 4-1. A Block Model Developed in RAMP for the SETEC Generic Wafer Handler System
Figure 4-2. An Estimate of the Cumulative Distribution Function for MTBF
Figure 4-3. A Pareto Diagram for Component Contribution to System Failure
Figure 4-4. A Revised Block Diagram for the SETEC Generic Wafer Handler System, showing the Addition of the Redundant Wafer Sensor
Figure 4-5. An Estimate of the Cumulative Distribution Function for MTBF after Modifying the Generic Wafer Handler System
Figure 4-6. A Pareto Diagram for Component Contribution to System Failure after Modifying the Generic Wafer System


List of Tables

Table 3-1. Reliability Improvement Process Applied at Six Different Starting Points
Table 3-2. Reliability Improvement Process Activities
Table 3-3. Reliability Improvement Process Activities for the Design Phase
Table 3-4. Reliability Improvement Process Activities for the Prototype Phase
Table 3-5. Reliability Improvement Process Activities for the Pilot Production Phase
Table 3-6. Reliability Improvement Process Activities for the Production and Operation Phase
Table 3-7. Reliability Improvement Process Activities for the Phase Out Phase
Table 3-8. Design Phase Reliability Improvement Process Activities
Table 3-9. Prototype Phase Reliability Improvement Process Activities
Table 3-10. Pilot Production Phase Reliability Improvement Process Activities When Initiated in Pilot Production Phase
Table 3-11. Production and Operation Phase Reliability Improvement Process Activities When Initiated in Production and Operation Phase
Table 3-12. Phase Out Phase Reliability Improvement Process Activities When Initiated in Phase-Out Phase
Table 3-13. Current Product Line Status


Acknowledgements

To assist in the development of these guidelines, a task force of representatives from the semiconductor industry was assembled to provide guidance in the structure and content. Their contributions and dedication to this effort have been excellent and beyond the call of duty. Our thanks to each of the task force members, reviewers, and contributors for their commitment to such an ambitious effort. It has made the development of these guidelines more enjoyable and possible.

TASK FORCE MEMBERS

Sandia National Labs. - SETEC
Wallis Cramond
Dennis Huffman

SEMATECH
Dr. Vallabh H. Dhudshia, Texas Instruments, Inc.
David Seekon, National Semiconductor Corp.
Mario Villacourt

SEMATECH Member Companies
Denny Johnson, International Business Machines (IBM)
Karl Koch, Digital Equipment Corp. (DEC)
John O'Reilly, DEC
Richard Talbot, IBM
Larry Waite, National Semiconductor Corp.
Chuck Woodard, IBM

SEMI/SEMATECH
Dr. Michael McGraw

SEMI/SEMATECH Member Companies
Ron Dornseif, Genus
Jack Olivieri, MKS Instruments
Dr. Ralph Dudley, Applied Materials

Dr. Robert Cranwell
Dr. Ron Iman
Dr. Irving Hall
Teresa Sype


REVIEWERS and CONTRIBUTORS

Samuel Becktel, Genus
Richard E. Howard, Luxtron Products
Dr. Samuel Keene, IBM
Richard Gerstner, SEMATECH
David Troness, Intel
Sue Howell, SEMI/SEMATECH
Dennis R. Hoffman, TI
Bob Holmstrom, ATEQ Corp.
Dr. David J. Klinger, AT&T
Dr. Jerry Brandwie, RI
Dr. Richard Prairie, SETEC
Debra Vogler, Varian Associates


The SEMATECH Perspective

Statement from Bill Spencer, CEO of SEMATECH:

Today's competitive environment demands an increasing level of reliability in semiconductor manufacturing equipment. The industry has made great strides in the last four years in improving reliability. In fact, VLSI Research reports that in its annual customer survey, reliability has fallen to sixth place on the list of biggest problems, after being number one for 10 years. VLSI is quick to give SEMATECH credit for much of the improvement. And while the existence of SEMATECH was a key element, the supplier industry should receive added praise for stepping up and solving a major problem.

But, as with so much of this business today, reliability is a race without an end. And the formula for improved reliability is to build it into every stage of development. This Reliability Guideline will assist in development of a program to ensure consideration of reliability factors at every stage of product development from inception through qualification.

The Guideline was developed by a task force composed of reliability experts and users of reliability methodologies from the SEMI/SEMATECH member companies. As a result, it offers best-of-breed concepts and is written to meet the needs of semiconductor equipment manufacturers and their customers. I'm sure it will prove an excellent tool.

William J. Spencer
President and Chief Executive Officer


Preface

These guidelines have been written for use by semiconductor equipment suppliers and customers. They are intended as a road map that these groups can refer to for assistance in improving the reliability of their semiconductor manufacturing equipment as part of a long-term strategy aimed at regaining an increased worldwide market share.

Although there is an abundance of reliability information available in textbooks, military handbooks and standards, and guidebooks directed at specific products, there is no concise, single-source document available for the semiconductor equipment industry. The purpose of these guidelines is to fill this gap. To assist in this effort, a task force consisting of representatives from the semiconductor industry was assembled to provide guidance in the structure and content of these guidelines.

The guidelines do not provide comprehensive instruction on the details of reliability engineering; rather, they provide a description of the principles of a cost-effective reliability program, instructions on how to get started, and details on what needs to be done. Descriptions of necessary program activities and reliability concepts are provided along with references for those who desire additional information.

The focus of the guidelines is on hardware reliability, recognizing that software reliability is an important aspect of reliability for a large segment of semiconductor manufacturing equipment. However, other guidelines exist that address the issue of software reliability. Thus, the software reliability topic is discussed only briefly.

The guidelines:
- Are intended to be of value to managers, reliability engineers, and designers
- Are not a "detailed how-to" document, but rather a "roadmap of how to"
- Are centered around a continuous improvement process referred to as the Reliability Improvement Process
- Cover the entire equipment life cycle as it applies to the semiconductor equipment industry

Even though emphasis is placed on designing in reliability, the guidelines show how to incorporate reliability into every phase of the equipment life cycle.


The guidelines are broken into three sections:

Section 2.0, The Reliability Improvement Process and Equipment Life Cycle, describes the Reliability Improvement Process and the Equipment Life Cycle. Life cycle phases are defined and discussed, as well as life cycle costs. The five steps of the Reliability Improvement Process are defined and discussed.

Section 3.0, Implementation of the Reliability Improvement Process, describes the activities involved in applying each step of the Reliability Improvement Process to each phase of the Equipment Life Cycle. The section associated with activities provides information on applying the Reliability Improvement Process continuously throughout the entire life cycle. Also discussed are the activities associated with applying the Reliability Improvement Process during later phases of the life cycle.

Section 4.0, Activities and Tools in the Reliability Improvement Process, provides a description of the activities and tools that are part of the Reliability Improvement Process. Activities are grouped under engineering, data, and testing. Specific tools used in the application of certain activities are also discussed. Section 4.0 is meant to provide more information and guidance on activities and tools used in the application of the Reliability Improvement Process.


1 SUMMARY

These guidelines focus on a continuous improvement process referred to as the Reliability Improvement Process, and on the Equipment Life Cycle. These two concepts are introduced and discussed in Section 2.0 of the guidelines.

Knowledge of the equipment life cycle is important because it provides a basis for understanding how and where reliability engineering enters into the process of designing, producing, and operating the equipment. In this document, the life cycle has been broken into six distinct phases, each representing a unique portion of the life cycle. These six life cycle phases are:

1. Concept and Feasibility Phase
2. Design Phase
3. Prototype (alpha-site) Phase
4. Pilot Production (beta-site) Phase
5. Production and Operation Phase
6. Phase-out Phase

These phases provide the framework for tracking reliability improvement throughout the equipment life cycle and guidance on when and where to apply resources. Life cycle cost concepts are introduced to help understand the impact on expenditures and cost of ownership when reliability is initiated at different phases of the life cycle.

The Reliability Improvement Process provides a means for systematically improving reliability throughout the equipment life cycle. It is an iterative process of setting goals, evaluating, comparing, and improving, directed toward continuous reliability improvement. It consists of five basic steps:

1. Establish reliability goals and requirements for equipment
2. Apply reliability engineering or improvement activities, as needed
3. Conduct an evaluation of the equipment or equipment design
4. Compare the results of the evaluation to the goals and requirements and make a decision for the next step
5. Identify problems and root causes

The process then returns to Step 2, and repeats Steps 2 through 5 until goals and requirements are met.
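The five-step loop described above can be sketched in code. The following Python sketch is only an illustration, not part of the guidelines: the `Equipment` class, the use of MTBF as the single reliability measure, the callback names, and the iteration cap are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Equipment:
    """Hypothetical stand-in for a piece of equipment or its design."""
    mtbf_hours: float                                  # assumed reliability measure
    history: List[str] = field(default_factory=list)   # causes already addressed


def reliability_improvement_process(
    equipment: Equipment,
    goal_mtbf_hours: float,                        # Step 1: goal / requirement
    improve: Callable[[Equipment, str], None],     # Step 2: improvement activity
    evaluate: Callable[[Equipment], float],        # Step 3: evaluation
    find_root_cause: Callable[[Equipment], str],   # Step 5: problem identification
    max_iterations: int = 10,                      # assumed safety cap
) -> bool:
    """Repeat Steps 2 through 5 until the goal is met or iterations run out."""
    root_cause = "initial design-for-reliability work"
    for _ in range(max_iterations):
        improve(equipment, root_cause)             # Step 2
        measured = evaluate(equipment)             # Step 3
        if measured >= goal_mtbf_hours:            # Step 4: compare and decide
            return True
        root_cause = find_root_cause(equipment)    # Step 5, then back to Step 2
    return False


# Toy callbacks with invented numbers, used only to exercise the loop.
def toy_improve(eq: Equipment, cause: str) -> None:
    eq.mtbf_hours *= 1.25          # pretend each fix yields a 25% MTBF gain
    eq.history.append(cause)


def toy_evaluate(eq: Equipment) -> float:
    return eq.mtbf_hours


def toy_root_cause(eq: Equipment) -> str:
    return f"dominant failure mode at {eq.mtbf_hours:.0f} h MTBF"


handler = Equipment(mtbf_hours=100.0)
goal_met = reliability_improvement_process(
    handler, 200.0, toy_improve, toy_evaluate, toy_root_cause
)
print(goal_met, len(handler.history), handler.mtbf_hours)
```

In this toy run the goal is reached on the fourth pass through Steps 2 to 4. In practice each callback would stand for a substantial engineering activity rather than a single function call.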


The role of management in implementing the Reliability Improvement Process is introduced in Section 3.0. Management has responsibilities in establishing and implementing the Reliability Improvement Process. These responsibilities include establishing the right environment and choosing individuals to champion the effort. Section 3.0 provides details on preparing for and implementing the Reliability Improvement Process, including a discussion on the various activities associated with each step of the Reliability Improvement Process and each phase of the life cycle. The Reliability Improvement Process can be used for a piece of equipment regardless of its placement in the life cycle. The discussion in Section 3.0 includes information on how to select equipment for initiating reliability improvement, the importance of data, and the choice of activities when resources are limited.

Activities and tools used in applying the Reliability Improvement Process are discussed in more detail in Section 4.0. Three types of activities are listed: engineering, data-related, and testing. Many of the activities require tools for implementation. These tools come from various disciplines such as probability and statistics and reliability engineering. References that have detailed information on the tool or activity are provided at the end of each activity in Section 4.0.

2 THE RELIABILITY IMPROVEMENT PROCESS AND EQUIPMENT LIFE CYCLE

2.1 Introduction

The reliability improvement process and the equipment life cycle form the basis for these guidelines and are introduced in this section. The reliability improvement process is an iterative process that provides:
- An effective and systematic way to include reliability in equipment design
- A structure for making reliability improvements throughout the equipment life cycle

The reliability improvement process provides a means for making revolutionary advancements when it is applied to equipment early in the design stage or during major design upgrades, or for making evolutionary improvements to existing equipment.

Knowledge of the equipment life cycle is important because it provides:
- The framework for applying the reliability improvement process
- A basis for understanding the best practice for improving equipment reliability and the cost of the improvement

Life cycle costs are introduced in this section to provide a perspective on the impact of initiating the reliability improvement process early in the equipment life cycle. A thorough knowledge of life cycle costs and life cycle phase relationships helps to achieve better equipment at lower total costs.

2.2 The Equipment Life Cycle

The equipment life cycle begins when the idea for the equipment is conceived and ends when the equipment is no longer useful. The life cycle consists of phases that describe the state of design, process of development, and production of the equipment. A working knowledge of these phases enables proper planning and execution of the activities and functions necessary for designing, manufacturing, and operating reliable equipment in a cost-effective manner.

2.3 Life Cycle Phases

In this document, the life cycle has been divided into the six phases listed below. As indicated, these six phases can be grouped under three macro phases. The three macro phases are sometimes used in place of the six phases for illustrative purposes; this in no way impacts the concepts and methodology presented.

1. Concept and Feasibility
2. Design
3. Prototype (alpha-site)
4. Pilot Production (beta-site)
5. Production and Operation
6. Phase-out

Macro phases: Concept and Feasibility; Design and Development; Production and Operation

A discussion of each of the six life cycle phases follows.

1. Concept and Feasibility. The life cycle begins with this phase; the need for new equipment is identified and alternative approaches to fulfilling that need are explored. The need for new equipment may be based on existing equipment that can no longer perform its intended function or on customer requirements for which the necessary equipment does not exist.

During this phase, marketing and sales personnel, customer service representatives, design and reliability engineers, and manufacturing engineers work together with the customer to:
- Determine the need for new equipment
- Establish reliability goals
- Evaluate the feasibility of meeting these goals
- Estimate resource requirements
- Examine alternative design concepts
- Select those concepts to be studied in more detail during the design phase
- Estimate cost trade-offs

The concept and feasibility phase, and the design phase that follows, are the optimal times for using design-for-reliability practices.

2. Design. The alternative design concepts selected during the concept and feasibility phase are explored in more detail by the design engineers during this phase of the life cycle. A design disclosure package is prepared and evaluated by all concerned parties. Reliability and manufacturing engineers, as well as quality assurance and field service personnel, are generally called on by the design engineers for input concerning parts selection, components, serviceability, and manufacturing processes. Also, reliability goals set for the equipment during the concept and feasibility phase are translated into requirements very early in the design phase. Requirements are useful in making preliminary reliability allocations to subsystems and components to understand cost impacts. This phase of the life cycle can be separated into two parts: preliminary design and final design.

During the preliminary design process, design and reliability engineers:
- Modify goals to meet customer requirements
- Evaluate a number of design alternatives
- Make preliminary reliability allocations to subsystems and components
- Prepare a design disclosure package of requirements and specifications
- Estimate cost considerations

More than one design alternative may be selected for the final design phase if serious questions remain about the best choice.

During the final design process, customer and supplier representatives, design and reliability engineers, project managers, field service personnel, manufacturing engineers, and quality assurance personnel:
- Update reliability allocations to subsystems and components
- Carry out design reviews
- Implement design-for-reliability practices
- Update the design disclosure package to reflect these reviews
- Select specific designs for prototype construction
- Estimate cost trade-offs and considerations

Several iterations of design review and redesign are usually required before a design is ready for prototype construction. Design reviews are important in measuring the progress against design requirements and gaining management approval to proceed with the prototype phase of the life cycle. These reviews are carried out in parallel with the design process and are often categorized as follows:
- Requirements Review: review the equipment's design requirements
- Preliminary Design Review: evaluate the preliminary design against requirements
- Critical Design Review: provide design to the customer(s) for review

3. Prototype. Specific designs selected during the design phase are built and tested during this phase to determine if all design requirements will be met. The prototype phase provides the first opportunity to validate the entire design, and is therefore commonly called alpha-site evaluation. Selected customers are included in alpha-site evaluations and are asked to provide feedback on all aspects of the equipment.

Multiple design alternatives may require prototyping and testing if serious questions exist about the best overall choice. It is common for reliability engineers to have responsibility for performing these tests. However, manufacturing personnel will have responsibility for determining that parts and components conform to specifications within financial guidelines.

During the prototype phase, design, reliability, test, and manufacturing engineers, as well as quality assurance personnel:
- Build and test one or more prototypes of a design
- Present the test results for a pilot production design review
- Redesign as needed to fix weaknesses or make other desirable changes
- Conduct additional design reviews as appropriate

The design reviews should include another critical design review to give the customer an opportunity to review the latest design being considered. Concurrent with redesigns and design reviews, reliability engineers, quality assurance personnel, and manufacturing engineers will develop quality assurance plans, design inspection and testing programs, set up production facilities, and develop production plans in preparation for the pilot production phase.

4. Pilot Production. This phase of the life cycle serves as a bridge between the prototype phase and the production and operation phase. This is the first opportunity for the equipment to be evaluated in an extended customer environment, and is therefore commonly called beta-site evaluation. In fact, it may be the first time that the equipment is exposed to a customer's processes.

The purpose of the pilot production phase is to help identify and correct problems with the equipment before full-scale production begins. Design and reliability engineers should evaluate the actual level of equipment reliability and determine what needs to be accomplished to meet requirements in a cost-effective manner.

During the pilot production phase, project management, reliability engineers, manufacturing and test personnel, and customer service representatives:
- Qualify the equipment manufacturing process
- Establish field trials and customer applications of equipment
- Monitor the equipment's performance
- Identify root causes of failures
- Implement a "corrective action" program for reliability problems
- Determine cost of ownership

Prior to the production and operation phase of the life cycle, reliability and design engineers should evaluate equipment reliability and make the appropriate recommendations. If the actual equipment reliability level is less than desired, specific reliability improvement activities that were identified in the corrective action program should be implemented. This is the last opportunity to make design changes and other improvements before full-scale production. Design reviews conducted at this point are often broken down into:
- Qualification Review: verify that the final design meets requirements
- Production Readiness Review: determine the readiness for full production
- Reliability Budget Review: verify the reliability goal allocations

If any design changes were made at this point, another critical design review may be appropriate.


5. Production and Operation. This phase of the life cycle represents the time when units are produced and sold. All major reliability problems should have been identified and corrected prior to the production and operation phase. A formal program must be in place for collecting and analyzing field service data and performance data for the customer's unit, as well as for the cost impact.

During the production and operation phase, field service personnel, management, quality assurance personnel, and reliability engineers:
- Implement a field tracking and customer feedback and satisfaction program
- Provide training and technical assistance to customers
- Document and employ installation testing and operation procedures
- Identify and report operation and maintenance problems
- Record failure data in a formal database
- Manage continuous improvement efforts
- Determine cost of ownership impacts

Recorded failure data should account for uncertainty due to variations in site, product vintage, and customer procedures. After proper review, decisions are made for resource allocation for continuous improvement in the reliability process. The supplier and customer should function as partners in these efforts and may participate in user groups. Once equipment is in the field, it is important to continually monitor reliability, analyze failures and identify root causes, implement corrective actions, and improve known causes of failures both for the current and the next generation of equipment.

6. Phase Out. The equipment product line is approaching the end of its useful life during this final phase of the life cycle. The end of useful life naturally occurs earlier for the supplier than it does for the customer. The end of useful equipment life for the customer can occur due to obsolescence, wear, or a change in business plans. To remain competitive, the supplier must make plans for the next generation of equipment before phasing out current generation production.


The information gained during the six phases of the life cycle should be retained so that it can be used to improve future generations of similar or new equipment. This completes the life cycle for the current generation of equipment. Each new generation of equipment would experience basically the same life cycle.

Supplier Cost Implications. The early life cycle phases typically represent the smallest portion of those total life cycle costs borne by the supplier, yet generally represent the region where the greatest impact on equipment reliability can be made. As a design moves toward completion, design details become increasingly fixed. Thus, the cost in time and dollars to correct reliability problems increases. Figure 2-1 shows that typically, toward the end of the design/development macro phase of the life cycle, only 15% of the life cycle costs are consumed, but approximately 95% of the total life cycle costs have been determined (i.e., locked in).[2] Thus, changes made to improve reliability after the design/development macro phase have little impact on overall life cycle costs, but can be very expensive in terms of costly design changes, retrofits, service calls, warranty claims, and customer goodwill. This is not meant to imply that equipment already in the production/operation macro phase should be ignored in terms of improving reliability. Reliability improvement activities should continue throughout the life cycle.
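The gap between costs spent and costs locked in can be made concrete with a little arithmetic. The Python sketch below uses the percentages from the paragraph above and from the cost figure it cites; treating 85% as the locked-in level at the end of Concept/Feasibility is an assumption read from the figure labels, and the function and variable names are illustrative only.

```python
# Percent of total life cycle cost spent within each macro phase.
SPENT_PERCENT = {
    "Concept/Feasibility": 3,
    "Design/Development": 12,
    "Production/Operation": 85,   # Production (35%) + Operation (50%)
}

# Percent of total life cycle cost locked in by the end of each macro phase.
# The 95% value is stated in the text; assigning 85% to the end of
# Concept/Feasibility is an assumption based on the figure labels.
LOCKED_IN_PERCENT = {
    "Concept/Feasibility": 85,
    "Design/Development": 95,
    "Production/Operation": 100,
}


def cost_profile():
    """Return (phase, cumulative spent, locked in, locked in but unspent)."""
    rows = []
    spent_so_far = 0
    for phase, spent in SPENT_PERCENT.items():
        spent_so_far += spent
        locked = LOCKED_IN_PERCENT[phase]
        rows.append((phase, spent_so_far, locked, locked - spent_so_far))
    return rows


for phase, spent, locked, gap in cost_profile():
    print(f"{phase:22s} spent {spent:3d}%  locked in {locked:3d}%  "
          f"locked in but unspent {gap:3d}%")
```

Under these numbers, about 80% of the total life cycle cost is already committed but not yet spent when design/development ends, which is why reliability decisions made before that point carry so much leverage.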

Figure 2-1. Percent of Total Life Cycle Costs vs Locked-in Costs
(Chart not reproduced. Axes: % Total Costs and % Locked-In Costs, 0 to 100, across the Concept/Feasibility, Design/Development, and Production/Operation macro phases. Labeled values: total costs 3%, 12%, Production (35%), Operation (50%); locked-in costs 85%, 95%. Source: Arsenault and Roberts, Reliability and Maintainability of Electronic Systems.)

Although reliability improvements made earlier in the life cycle can increase initial supplier costs, they generally result in lower support costs for the supplier and lower operational costs for the customer. Also, early improvement could reduce the supplier's costs of production, warranty, and service.

2.4 Life Cycle Cost

Two criteria used by semiconductor manufacturers to select equipment for a manufacturing step or process are: 1. Technical 2. Economical[1] The question asked for the technical criterion is, "Can a particular piece of equipment or equipment line do the manufacturing step or process required?" The question asked for the economical criterion is, "Does the result of the manufacturing process justify or support the cost and on-going expense of a particular piece of equipment or equipment line?" It is increasingly common for several pieces of equipment to be able to meet the technical criterion. Thus, the economical criterion is becoming increasingly important. Customers consider not only the initial purchase price, but the costs associated with equipment operations over its entire life (i.e., life cycle costs).

Life cycle costs include both equipment supplier costs, which are passed on to the customer in the purchase price of the equipment, and all costs incurred by the customer over the equipment life. Supplier costs plus the supplier's gross profit margin are referred to as acquisition costs, and include:
- Research and development
- Marketing and sales
- Testing and manufacturing
- Supplier shipping and installation
- Supplier training and support
- Supplier service and spare parts
- Warranty costs
- Continuous improvement

Costs incurred by the customer are referred to as operational costs, and include:
- Customer installation and training
- Operating costs
- Customer service costs and spares inventory
- Customer performed maintenance
- Customer space costs
- Scheduled maintenance
- Equipment improvements and upgrades
- Down time and scrap costs
- Disposal costs

Life cycle cost implications to both the supplier and the customer are discussed in the following paragraphs.
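The split between acquisition and operational costs can be illustrated with a short Python sketch. The class name, the category keys, and every dollar figure below are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from this guideline.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LifeCycleCost:
    """Life cycle cost split into the two groups described in the text."""
    # Supplier costs plus the supplier's gross profit margin.
    acquisition: Dict[str, float] = field(default_factory=dict)
    # Costs incurred by the customer over the equipment life.
    operational: Dict[str, float] = field(default_factory=dict)

    def acquisition_total(self) -> float:
        return sum(self.acquisition.values())

    def operational_total(self) -> float:
        return sum(self.operational.values())

    def total(self) -> float:
        return self.acquisition_total() + self.operational_total()


# Hypothetical figures (assumption, for illustration only).
example = LifeCycleCost(
    acquisition={
        "research_and_development": 200_000.0,
        "testing_and_manufacturing": 500_000.0,
        "warranty": 50_000.0,
        "supplier_gross_margin": 150_000.0,
    },
    operational={
        "operating": 300_000.0,
        "scheduled_maintenance": 120_000.0,
        "down_time_and_scrap": 400_000.0,
        "disposal": 10_000.0,
    },
)

print(example.acquisition_total())  # 900000.0
print(example.operational_total())  # 830000.0
print(example.total())              # 1730000.0
```

In this made-up case the operational costs are close to the purchase price, which is why a comparison based on purchase price alone can be misleading.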

Customer Cost Implications. Improvements in reliability made by the supplier early in the equipment life cycle may result in higher development costs being passed on to the customer in the equipment acquisition costs. However, this can be more than offset as the customer benefits by having lower operational costs with increased reliability and up time that results in greater productivity. Figure 2-2 illustrates how a reliability program impacts acquisition and operational costs. As this figure indicates, acquisition costs may increase due to efforts to improve reliability.

Figure 2-2. Impact of a reliability program on life cycle cost
(Chart not reproduced. It compares acquisition costs, operational costs, and total life cycle costs for two cases: no formal reliability program and with a formal reliability program.)

However, operational costs and, even more important, total life cycle costs decrease. It is important for the customer to make equipment purchase decisions based on total life cycle costs and not just on initial purchase price.

Optimizing Life Cycle Costs. Increasing acquisition costs to improve equipment reliability and lower operational and total life cycle costs is clearly a recommended practice. However, there is a point at which increasing acquisition costs to obtain higher levels of reliability is no longer beneficial. Figure 2-3 shows an optimal point beyond which total life cycle costs begin increasing with further improvements in reliability.

Figure 2-3. Optimizing Life Cycle Costs
(Chart not reproduced. It plots life cycle costs, acquisition costs, and operational costs against reliability, with the optimized cost point marked on the life cycle cost curve.)

When this occurs, a more reliable technology is required for further improvement. Reliability insights from a technology used in one generation of equipment should be documented so they can be used to improve the next generation. Improvements in technology transfer between equipment generations will generally produce a decrease in the life cycle costs in each succeeding generation of equipment as shown in Figure 2-4.
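The idea of an optimized cost point can be sketched numerically. The two cost curves below are assumptions made purely for illustration (acquisition cost rising steeply as reliability approaches 1, operational cost falling linearly); the guideline does not give functional forms, and the constants are arbitrary cost units.

```python
from typing import Iterable, Tuple


def acquisition_cost(reliability: float, a: float = 1.0) -> float:
    """Hypothetical curve: cost grows without bound as reliability nears 1."""
    return a / (1.0 - reliability)


def operational_cost(reliability: float, b: float = 100.0) -> float:
    """Hypothetical curve: cost falls as reliability improves."""
    return b * (1.0 - reliability)


def total_cost(reliability: float) -> float:
    return acquisition_cost(reliability) + operational_cost(reliability)


def optimal_reliability(candidates: Iterable[float]) -> Tuple[float, float]:
    """Return the candidate reliability with the lowest total cost."""
    best = min(candidates, key=total_cost)
    return best, total_cost(best)


# Search reliabilities from 0.50 to 0.99 in steps of 0.01.
candidates = [i / 100 for i in range(50, 100)]
best_r, best_total = optimal_reliability(candidates)
print(best_r, round(best_total, 3))
```

With these assumed curves the minimum falls at a reliability of 0.90; pushing reliability higher raises acquisition cost faster than it lowers operational cost, which is the behavior the optimized cost point describes.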

Figure 2-4. Decrease in Life Cycle Costs in New Generations of Equipment
(Chart not reproduced. It plots life cycle costs against reliability for Generation 1 through Generation 4.)

2.5 The Reliability Improvement Process

The reliability improvement process is an iterative process that is applied at each phase of the equipment life cycle. It consists of five basic steps:
1. Establish reliability goals and requirements for equipment
2. Apply reliability engineering or improvement activities, as needed
3. Conduct an evaluation of the equipment or equipment design
4. Compare the results of the evaluation to the goals and requirements and make a decision to move either to the next step or the next phase
5. Identify problems and root causes

The process then returns to Step 2, and Steps 2 through 5 are repeated until goals and requirements are met. The reliability improvement process steps are shown in the flowchart in Figure 2-5.

Figure 2-5. The Reliability Improvement Process
(Flowchart not reproduced. Step 1, Establish Goals/Requirements; Step 2, Reliability Engineering/Improvements; Step 3, Conduct Evaluation; Step 4, Are Goals/Requirements Met? "Yes" leads to the Go/No Go Decision on Next Phase; "No" leads to Step 5, Identify Problems & Root Causes, which returns to Step 2.)
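The iteration through Steps 2 to 5 can be expressed as a simple loop. The following Python sketch is only a schematic of the control flow; the function names, the use of MTBF as the goal metric, and the 25% gain per cycle are hypothetical placeholders rather than anything prescribed by this guideline.

```python
from typing import Callable, List, Tuple


def run_improvement_cycles(
    goal: float,
    improve: Callable[[List[str]], None],
    evaluate: Callable[[], float],
    find_root_causes: Callable[[], List[str]],
    max_cycles: int = 10,
) -> Tuple[bool, List[float]]:
    """Repeat Steps 2 through 5 until the goal set in Step 1 is met."""
    history: List[float] = []
    problems: List[str] = []
    for _ in range(max_cycles):
        improve(problems)              # Step 2: engineering/improvements
        result = evaluate()            # Step 3: conduct evaluation
        history.append(result)
        if result >= goal:             # Step 4: goals/requirements met?
            return True, history       # go/no go decision on next phase
        problems = find_root_causes()  # Step 5: problems and root causes
    return False, history


# Toy stand-ins (assumptions): each cycle raises MTBF by 25 percent.
state = {"mtbf_hours": 100.0}


def improve(problems: List[str]) -> None:
    state["mtbf_hours"] *= 1.25


def evaluate() -> float:
    return state["mtbf_hours"]


def find_root_causes() -> List[str]:
    return ["hypothetical root cause"]


met, history = run_improvement_cycles(180.0, improve, evaluate, find_root_causes)
print(met, history)  # True [125.0, 156.25, 195.3125]
```

The `max_cycles` limit is a design choice of the sketch: in practice, a goal that cannot be reached after several cycles would prompt a review of the goal or the technology rather than endless iteration.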

1. Establish Reliability Goals and Requirements. The first step in the reliability improvement process is to establish reliability goals and requirements. A distinction is made between goals and requirements. Goals are more internally driven and may or may not be met. Requirements, on the other hand, are more specific and are customer driven. Requirements are usually included as deliverables in contractual agreements. Goals are the starting point, but are modified to satisfy customer requirements early in the equipment life cycle.

All goals have certain common characteristics. The following criteria can be used to assist in establishing goals[3]:
- Attainability: Goals should be set at levels reasonably attainable within the available time span. Large goals over long periods should be avoided to maintain interest and commitment. Subgoals over shorter times are more attainable and more cost effective.
- Supportability: Support and resources must be available at the time they are needed to achieve goals. Advance planning is needed to determine the resources and the extent to which they can or will be provided.
- Acceptability: Goals must be acceptable to those who will be actively involved in pursuing these goals. Acceptance is influenced by relevance, perceived importance, reasonableness, and desirability of outcome.
- Measurability: Goals provide standards against which performance may be assessed and, therefore, should be selected for suitability and defined in a way that enables measurement. To make them measurable, goals must be defined qualitatively, quantitatively, and in terms of performance parameters, values, and time scales.

2. Reliability Engineering and Improvements. Once goals and requirements have been established, design-for-reliability practices or reliability improvement activities are applied to enhance the reliability of equipment that is in any phase of the life cycle, or of equipment already in existence.

There are some basic practices that can be applied to improve reliability. These include:
- Simplicity. Simplification of equipment configuration is one of the basic principles of designing-for-reliability. Added parts or features increase the number of failure modes. A common practice in simplification is referred to as component integration (the use of a single component to perform multiple functions).
- Redundancy. Another reliability improvement practice is to include more than one way to accomplish a function by having certain components or subassemblies in parallel, rather than in series. Beyond a certain point, redundancy may be the only cost-effective way to design reliable equipment.
- Proven Components and Methods. To the extent possible, designers should use components and methods that have been shown to work in similar applications. Using proven components can minimize analyses and testing to verify reliability, thus reducing time and costs of demonstrating reliability of the equipment.
- Derating. Derating is the practice of using components or materials at environmental conditions or loads that are less severe than their limiting condition. Under these conditions, the component or material is expected to be more reliable.
- Eliminating Known Causes of Failure (Fault Avoidance). This can be accomplished through screening and burn-in procedures to eliminate weak components before equipment is actually shipped to the customer.

- Failure Detection Techniques. Reliability of equipment can be improved by incorporating failure detection methods or self-healing devices such as periodic maintenance schedules, monitoring procedures, and automatic sensing and switching devices.
- Ergonomics or Human Factors Engineering. The activities of humans can be very important to equipment reliability. The equipment design must consider human factors aspects such as the person-machine interface, human reliability, and maintainability.
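The effect of the simplicity and redundancy practices listed above can be shown with the standard series and parallel reliability formulas. This is a minimal sketch that assumes components fail independently; the component reliabilities are hypothetical.

```python
from math import prod
from typing import Sequence


def series_reliability(reliabilities: Sequence[float]) -> float:
    """All components must work: R = R1 * R2 * ... * Rn."""
    return prod(reliabilities)


def parallel_reliability(reliabilities: Sequence[float]) -> float:
    """At least one component must work: R = 1 - (1-R1)(1-R2)...(1-Rn)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Simplicity: every added series part lowers equipment reliability.
print(round(series_reliability([0.9, 0.9]), 4))       # 0.81
print(round(series_reliability([0.9, 0.9, 0.9]), 4))  # 0.729

# Redundancy: two parts in parallel outperform either part alone.
print(round(parallel_reliability([0.9, 0.9]), 4))     # 0.99
```

The independence assumption matters: redundant units that share a power supply or a common failure cause will not achieve the parallel figure computed here.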

3. Conduct Evaluation. The next step in the reliability improvement process is to conduct an evaluation of the equipment or equipment design to assess its reliability level. A powerful tool for conducting this evaluation is reliability modeling. For equipment in the early phases of the life cycle, reliability modeling can be used to predict the equipment's performance to provide information for design changes or for evaluating design alternatives. For equipment that is already in production or is operational in the field, reliability modeling, combined with testing and failure data analysis, can be used to identify critical components and help guide resource allocation and reliability improvement decisions.

There are a number of reliability prediction models. These include:
- Block diagram models. A block diagram is used to logically represent the equipment being modeled by breaking it down into subsystems and components. Equipment reliability is modeled using failure data on the subsystems and components.
- State transition (Markov) models. Equipment reliability is modeled by identifying the various operating conditions (states) that the equipment, subsystem, or component can experience, and the probability of transition from one state to another.

Other techniques for evaluating equipment reliability and identifying design weaknesses include:
- Fault tree analysis (FTA). A "top down" approach beginning with an undesirable event (usually equipment failure) at the top or system level and identifying the events at subsequent lower levels that can cause the undesirable top event.
- Failure modes and effects analysis (FMEA). A technique for systematically identifying, analyzing, and documenting the possible failure modes within a design and the effects of such failures on equipment performance.
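Two of the techniques above lend themselves to a small numerical sketch. The first part evaluates a tiny fault tree with AND and OR gates, assuming independent basic events. The second part gives the steady-state availability of the simplest state transition model, a two-state (up/down) model with constant failure and repair rates. The event names and all numbers are hypothetical.

```python
from math import prod
from typing import Sequence


def and_gate(probabilities: Sequence[float]) -> float:
    """Output event occurs only if every input event occurs."""
    return prod(probabilities)


def or_gate(probabilities: Sequence[float]) -> float:
    """Output event occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Two-state Markov model: A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical fault tree: the top event (equipment failure) occurs if the
# pump fails OR both redundant power supplies fail.
p_pump = 0.01
p_supply = 0.05
p_both_supplies = and_gate([p_supply, p_supply])
p_top = or_gate([p_pump, p_both_supplies])
print(round(p_top, 6))  # 0.012475

# Hypothetical up/down model: 200 h between failures, 4 h to repair.
print(round(steady_state_availability(200.0, 4.0), 4))  # 0.9804
```

A real fault tree would also need to handle repeated events and common-cause failures, which the simple gate formulas here do not cover.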

Testing is another tool for evaluating equipment reliability. Typically, three different categories of testing are applied:
1. Component tests - useful in flushing out basic weaknesses in critical components
2. Systems tests - intended to explore effects of component interactions
3. Reliability demonstration tests - used to demonstrate equipment capability

The above concepts are discussed in more depth in Sections 3.0 and 4.0.
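As an illustration of how a reliability demonstration test might be sized, the sketch below computes the failure-free test time needed to demonstrate a target MTBF at a given confidence level. It assumes a constant failure rate (exponential times between failures) and a test plan that allows zero failures; these assumptions, the function name, and the numbers are illustrative and are not taken from this guideline.

```python
import math


def zero_failure_test_hours(target_mtbf_hours: float, confidence: float) -> float:
    """Failure-free test time T such that exp(-T / MTBF) = 1 - confidence.

    If the true MTBF were only equal to the target, the chance of surviving
    T hours with no failure would be 1 - confidence, so passing the test
    supports the claim that MTBF is at least the target.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    if target_mtbf_hours <= 0.0:
        raise ValueError("target MTBF must be positive")
    return -target_mtbf_hours * math.log(1.0 - confidence)


# Hypothetical case: demonstrate a 500-hour MTBF at 80 percent confidence.
hours = zero_failure_test_hours(500.0, 0.80)
print(round(hours, 1))  # 804.7
```

The required time is about 1.6 times the target MTBF at 80 percent confidence and grows quickly at higher confidence levels, which is one reason demonstration testing is costly late in the life cycle.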

4. Are Goals and Requirements Met? Results of the evaluation process are compared to reliability goals and requirements. If goals and requirements are not met, the problems and root causes should be identified as described in Step 5, and reliability improvement activities should be initiated. If goals and requirements are met or exceeded, then approval can be given to move to the next phase of the life cycle, or goals and requirements can be updated and additional analyses carried out. For example, if the equipment is in the concept and feasibility or design phase of the life cycle, sensitivity analyses can be conducted to evaluate design and cost trade-offs such as:
- Design complexity versus reliability
- Maintainability versus reliability
- Increased costs versus reliability

If goals are, or can be, exceeded by a significant margin, then the supplier should capitalize on the situation by turning it into a competitive leadership position. Upon completing design trade-off studies, approval can be given to move to the next phase of the equipment life cycle where the reliability improvement process is again initiated.

5. Identify Problems and Root Causes. If reliability goals and requirements are not met, the reasons need to be identified and corrective actions should be taken. Test data on prototypes or actual equipment in the field can be used to supplement information on equipment reliability generated from predictive modeling. Testing can also help to identify causes of failure and any potential reliability problems.

A key tool for reporting and analyzing failure data is the failure reporting, analysis, and corrective action system (FRACAS). This tool is discussed in more detail in Sections 3.0 and 4.0. Test data and all reported failures should be investigated to verify that a failure occurred. Failure verification can be performed by subjecting the component to the same conditions as those reported when the "failure" occurred.

The reliability improvement process now returns to Step 2, where reliability improvement and growth activities are initiated, or upgrades and modifications to reliability goals and requirements are made. Reliability growth activities generally fall into the following major categories:
- Strengthening the existing design, by testing or modeling (or both) to identify optimal design changes to improve reliability. The process of identifying weak areas can be aided by performing sensitivity studies using the reliability model of the system.
- Redesigning part or all of the system (fault tolerance), which includes studying ergonomic-enhancing software, adding redundancy, and incorporating error detection techniques.
- Eliminating known causes of failure (fault avoidance), which includes using screening and burn-in procedures to eliminate weak components, derating parts, and using more reliable parts.

Steps 2 through 5 are repeated until goals and requirements are met. The process may require several cycles of goal setting, evaluating, comparing, and improving. Approval can then be given to move to the next phase of the life cycle, where the reliability improvement process is again applied.
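A failure reporting system of the kind described above can be pictured as a collection of structured records that are verified and then summarized. The sketch below is a minimal, hypothetical data layout; the field names, sites, and counts are assumptions for illustration and do not represent the FRACAS format used in this guideline.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FailureReport:
    site: str
    subsystem: str
    root_cause: str
    verified: bool               # True once the failure has been reproduced
    corrective_action: str = ""  # filled in when the loop is closed


def verified_failures_by_subsystem(reports: List[FailureReport]) -> List[Tuple[str, int]]:
    """Rank subsystems by number of verified failures (a simple Pareto list)."""
    counts = Counter(r.subsystem for r in reports if r.verified)
    return counts.most_common()


def observed_mtbf_hours(operating_hours: float, reports: List[FailureReport]) -> Optional[float]:
    """Operating hours divided by verified failures; None if there are none."""
    failures = sum(1 for r in reports if r.verified)
    return operating_hours / failures if failures else None


# Hypothetical field data.
reports = [
    FailureReport("site_a", "robot", "worn belt", True, "belt redesign"),
    FailureReport("site_b", "robot", "sensor drift", True),
    FailureReport("site_a", "vacuum", "seal leak", True, "new seal supplier"),
    FailureReport("site_b", "controller", "operator error", False),
]

print(verified_failures_by_subsystem(reports))  # [('robot', 2), ('vacuum', 1)]
print(observed_mtbf_hours(1500.0, reports))     # 500.0
```

Excluding unverified reports from the counts mirrors the verification step in the text; a production system would also track report dates and closure status.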

2.6 Applying the Reliability Improvement Process

Optimal benefits from use of the reliability improvement process are clearly realized when the process is applied to equipment in the concept and feasibility phase of the life cycle and then continuously applied thereafter. Benefits can also be realized when the improvement process is applied to equipment that is in some advanced phase of its life cycle. It is important to address equipment reliability throughout the life cycle. For example, reliability improvements may be necessary:
- Following the Prototype Phase, because of design deficiencies or parts problems uncovered during prototype testing
- Beginning the Pilot Production Phase, due to reliability related issues resulting from manufacturing a new equipment line
- During the Production and Operation Phase, because feedback from field personnel and customers indicates reliability problems due to unanticipated failure mechanisms

Activities

Some activities associated with applying the reliability improvement process to the equipment life cycle remain basically the same from one phase of the life cycle to the next. Others, however, vary because of the change in focus from phase to phase. For example, focus in the concept and feasibility macro phase is primarily on "planning and allocating;" focus in the design and development macro phase is primarily on "predicting and verifying;" and focus in the production and operation macro phase is primarily on "evaluating and improving."

The activities also vary depending on whether the improvement process has been continuously applied to equipment as it moved through its life cycle from concept and feasibility to phase out, or whether it is being applied for the first time to equipment that is in some advanced phase. For example, consider equipment in the prototype phase:
- If the reliability improvement process has been applied continuously to the equipment in the concept and feasibility phase and in the design phase, then the reliability goals and requirements already exist. Thus, the reliability goals and requirements activity consists, primarily, of updating the goals and requirements; the primary focus would be on prototype testing and corrective action activities.
- However, if the reliability improvement process is applied to equipment for the first time during the prototype phase, then developing reliability goals and requirements should be a major focus because these goals and requirements do not yet exist.

These concepts are discussed in more detail in Section 3.0. Figure 2-6 provides a high-level view of the main activities associated with applying the reliability improvement process to each of the three macro phases of the life cycle. This is provided primarily to illustrate the flow from one macro phase to the next. A more detailed discussion of applying the reliability improvement process to all six phases of the life cycle, and a list of the associated activities, is presented in Section 3.0. Some of the activities will vary as the reliability improvement process is tailored to a particular need or equipment line. However, the reliability improvement process remains unchanged.

Figure 2-6. Application of Reliability Improvement Process
(Flowchart not reproduced. The five-step reliability improvement process is repeated once for each of the three macro phases, with a Go/No Go decision leading from one macro phase to the next. The activities listed beside each repetition are given below in the order shown.)

Concept/Feasibility:
- Set Reliability Goals
- Create Reliability Program Plan
- Develop Conceptual Designs
- Develop Preliminary Model
- Evaluate Conceptual Designs
- Next Phase Go/No Go Approval
- Identify Problems and Root Causes
- Develop Corrective Actions

Design/Development:
- Translate Goals into Requirements
- Apply Design-For-Reliability Practices
- Carry out Design Reviews
- Upgrade Reliability Model
- Predict Equipment Performance
- Next Phase Go/No Go Approval
- Identify Problems and Root Causes
- Develop Corrective Actions

Production/Operation:
- Revise Goals/Requirements
- Implement Field Tracking System
- Begin Customer Feedback Program
- Start Corrective Action Program
- Upgrade Reliability Model
- Identify Problems and Root Causes
- Develop Corrective Actions
- Begin Phase Out Activities

2.7 Summary

Knowledge of the equipment life cycle is important because it provides a basis for understanding how and where reliability engineering enters into the process of designing, producing, and operating the equipment. The equipment life cycle is broken into distinct phases, each representing a unique portion of the equipment life. These phases provide the framework for tracking reliability throughout the life cycle of the equipment and guidance on when and where to apply resources. Awareness of life cycle costs helps equipment owners understand the impact on expenditures and cost of ownership when reliability efforts are initiated at different life cycle phases.

The reliability improvement process provides a means for systematically improving reliability throughout the equipment life cycle. Optimal benefits are realized when reliability is designed into a piece of equipment. However, it is important to improve reliability throughout the life of the equipment to meet reliability goals and objectives. The reliability improvement process is an iterative process of setting goals, then evaluating (predicting), comparing results against those goals, and improving. Central to the reliability improvement process are data collection and analysis, design improvements, and operations and maintenance procedure improvements.

About Section 3.0

The next section provides details on preparing for and implementing the reliability improvement process. It includes a discussion of the various activities associated with each step of the improvement process and each phase of the life cycle. In preparation for this discussion, the following questions may assist in assessing current reliability practices and focus.
1. Is the importance of reliability conveyed throughout the company?
2. Is the approach to reliability improvement reactive or proactive?
3. Is the equipment development process life cycle oriented?
4. Have specific goals and requirements been established for equipment reliability and its growth?
5. Does the organization have technical and executive managers who champion the reliability cause?
6. Is demonstrated achievement of reliability goals a part of the criteria for deciding when equipment is ready for release to market?
7. Does the organization collect data that can readily be used in measuring and providing guidance for equipment reliability performance?
8. Do indicators of reliability performance exist for all equipment?
9. Are these indicators routinely monitored to ensure achievement of improvement goals?
10. Is a closed-loop failure reporting and corrective action system in place?

2.8 References

1. SI Staff, "Selecting a Product: The Task at Hand," Semiconductor International, March 1991, pages 7-8.
2. J. E. Arsenault and J. A. Roberts, Reliability and Maintainability of Electronic Systems, Potomac, MD: Computer Science Press, 1980.
3. W. Grant Ireson and Clyde F. Coombs, Jr., Editors in Chief, Handbook of Reliability Engineering and Management, McGraw-Hill, 1988.

3 IMPLEMENTATION OF THE RELIABILITY IMPROVEMENT PROCESS

3.1 Introduction

To ensure that maximum benefits are achieved when implementing the reliability improvement process, it is important to have an understanding of:
- Management's role in the implementation process
- The activities associated with applying the process
- Functional responsibilities in the implementation process
- Where to start the process
- How to use limited resources and communicate the value of the process

Each of these topics is discussed in this section. Primary focus is given to applying the reliability improvement process. Activities associated with applying the reliability improvement process to equipment in the concept and feasibility phase and continuing throughout its life cycle are discussed first. Later, the discussion focuses on activities associated with applying the reliability improvement process to equipment in an advanced phase (other than concept and feasibility) of the life cycle.

3.2 Management's Role

Management plays a vital role in implementing the reliability improvement process. It is responsible for establishing the right environment and for choosing individuals to champion the effort. The champions provide leadership and are accountable for the success of the reliability improvement process.

Management's Responsibility

One of management's primary responsibilities is to convey the importance of reliability throughout the company. Institutionalizing the reliability improvement process may require a cultural change and even an organizational change. Therefore, management leadership and commitment to this change are essential to ensure success. Success also depends on management's understanding of the activities involved in the reliability improvement process and on their support of these activities.

Reliability Champions

Selection of reliability champions is critical to the success of the reliability improvement process. Two reliability champions are recommended for moderate-to-large sized companies: an executive champion and a technical champion. In a small company, these two roles may be combined in one person.

Executive Champion. The role of the executive champion is to:
- Provide executive leadership in reliability improvement matters
- Promote reliability improvement throughout the company
- Provide assurance that the reliability improvement process is supported
- Work closely with the technical champion to develop reliability activities
- Mentor the reliability improvement process and ensure that accomplishments are acknowledged

Depending on the size of the company, the executive champion could occupy any of a number of upper management positions. The following are a few examples:
- President or vice president
- Chief operations officer
- Chief technical officer
- Corporate total quality management executive

Technical Champion. The technical champion establishes the reliability improvement process and is held accountable for its success. The technical champion takes an active role in:
- Providing both managerial and technical leadership
- Ensuring the implementation of an effective cross-functional improvement process
- Selecting the reliability activities to be performed and the tools that will be used
- Ensuring that the reliability improvement process is continuously applied
- Training participants in reliability concepts and tools

If not already experienced in reliability, the technical champion should be trained in reliability principles. This training should include a full understanding of the equipment life cycle and life cycle cost concepts as well as reliability improvement process activities. This ensures the background necessary to provide proper guidance for application of the activities and tools associated with implementing the reliability improvement process. The technical champion could be the manager of, or chief engineer within, one of the following organizations:
- Systems engineering
- Reliability engineering
- Product engineering
- Customer engineering

3.3 Applying the Reliability Improvement Process

The reliability improvement process can be applied continuously as equipment moves through its life cycle phases. Activities associated with applying the process may vary as the equipment moves from one phase of the life cycle to the next. This variation results from a change in focus from phase to phase, and from the fact that an activity performed in one phase lays the foundation for activities in subsequent phases. Activities will also vary depending on whether the improvement process is applied continuously as equipment moves through its life cycle (from concept and feasibility to phase out), or whether it is applied for the first time to equipment that is in some advanced (other than concept and feasibility) phase. The following table lists the sections that contain descriptions of the reliability improvement process for each of the starting points (process applied for the first time):

Table 3-1. Reliability Improvement Process Applied at Six Different Starting Points

Starting Point (Life Cycle Phase in Which the
Process Is Applied for the First Time)           Reference Section
Concept and Feasibility                          Section 3.3.1
Design                                           Section 3.4.1
Prototype                                        Section 3.4.2
Pilot Production                                 Section 3.4.3
Production/Operation                             Section 3.4.4
Phase Out                                        Section 3.4.5

Starting with Equipment in the Concept and Feasibility Phase

The following paragraphs discuss the activities that are performed when the reliability improvement process is first applied to equipment in the concept and feasibility phase and then continuously applied in subsequent phases. The discussion for each life cycle phase concludes with a list of objectives that will have been met as a result of applying the reliability improvement process, and a table summarizing the activities associated with applying the process to that phase of the life cycle.

Concept and Feasibility

Step 1. Establish Goals and Requirements. In the concept and feasibility phase, the focus of Step 1 is on establishing goals to meet customer requirements. Later these goals may be revised, and are eventually modified to reflect changes in customer requirements, or in response to observations regarding equipment performance level.


Goals can be established based on:
- Customer Voice. When establishing reliability goals, it is important to consider who the customers are and what aspects of reliability they regard as most important. The supplier must fully understand customers' needs, and be able to translate these needs into equipment-specific information for setting goals.
- Competitive Benchmarking. Competitive benchmarking is a process used by suppliers to measure and compare their products, services, and operations against competitors and world class performers.
- Reverse Engineering. The systematic dismantling of equipment with a high reliability ranking is referred to as reverse engineering. The results provide information about the actual reliability of similar equipment and the technology used to achieve that reliability.
- Warranty Requirements. To remain competitive, the reliability goals must support the established warranty requirements.
- Equipment Maintenance. It is essential to discuss maintenance aspects of the equipment with field personnel when establishing reliability goals. Improperly addressing maintenance issues can lead to a design with very high user-perceived reliability, but prohibitive maintenance costs.


Once goals have been established, a reliability program plan is created that documents how these goals will be achieved. It defines:

- Activities to be performed
- Resources required to fulfill the activities
- Schedule for these activities
- Procedures by which the activities will be performed
- Organizations and interfaces required to perform the activities

The program plan provides management and the customer with a means of measuring progress and assuring that requirements will be met.

Step 2. Reliability Engineering and Improvements. In the concept and feasibility phase, Step 2 of the reliability improvement process focuses first on developing alternative design concepts. All possible alternatives should be identified and evaluated to ensure that those selected for the design phase are capable of fulfilling goals and requirements.

Functional block diagrams are used to develop the basic concepts for the equipment and to evaluate their feasibility. The functional block diagram is updated as the concept changes.

The next step is to develop a preliminary model of the equipment using the functional block diagrams. The initial model is created at a gross level; that is, the equipment is broken into a few (approximately 10 to 20) major subsystems. This model is used to make initial predictions of the equipment reliability (Step 3).

A reliability allocation is conducted to apportion the equipment reliability goal among the individual major subsystems. This is done to make equipment reliability requirements more manageable and to establish individual reliability requirements for each major subsystem. Since no detailed information on the equipment is yet available, the allocation process is approximate; it is used to guide the designer when developing various concepts.

In this phase, the equipment has not been built, so other sources of data are required. Historical data can be used for those subsystems that are similar to previous generations of equipment. For those subsystems for which no historical data is available, expert judgement can be used. Expert judgement draws on the opinions of individuals who are considered knowledgeable about a subsystem or component to create initial reliability values.

Another reliability engineering activity available for identifying conceptual design weaknesses is a failure modes and effects analysis (FMEA). This is a technique for systematically identifying, analyzing, and documenting the possible failure modes within a design and the effects of such failures on equipment performance. The process of setting up an FMEA is initiated in this step, but it is used later in Step 5 to help identify problems and root causes.
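The allocation and preliminary prediction described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the guideline's prescribed method: it assumes a series model (any subsystem failure stops the equipment) with constant failure rates, and the subsystem names other than the wafer handler, the weights, the goal, and the estimated values are all hypothetical.

```python
def allocate_failure_rates(system_mtbf_goal_h, weights):
    """Split a system failure-rate goal among subsystems of a series model.

    weights maps subsystem name -> relative share of the allowed failure rate.
    Assumes constant failure rates, so the system rate is the sum of the
    subsystem rates.
    """
    system_rate = 1.0 / system_mtbf_goal_h
    total_weight = sum(weights.values())
    return {name: system_rate * w / total_weight for name, w in weights.items()}


def predict_series_mtbf(failure_rates):
    """Predicted system MTBF in hours for a series model."""
    return 1.0 / sum(failure_rates.values())


# Hypothetical goal and weights (a larger weight allows a larger failure rate).
goal_mtbf_h = 500.0
weights = {"wafer_handler": 2.0, "process_chamber": 1.0, "controller": 1.0}
allocated = allocate_failure_rates(goal_mtbf_h, weights)

# Hypothetical initial estimates (failures per hour) standing in for
# historical data or expert judgement.
estimated = {
    "wafer_handler": 1 / 800,
    "process_chamber": 1 / 2500,
    "controller": 1 / 4000,
}
predicted_mtbf_h = predict_series_mtbf(estimated)

for name in weights:
    print(f"{name}: allocated MTBF {1 / allocated[name]:.0f} h, "
          f"estimated MTBF {1 / estimated[name]:.0f} h")
print(f"predicted system MTBF {predicted_mtbf_h:.0f} h vs goal {goal_mtbf_h:.0f} h")
```

Comparing each subsystem's estimate with its allocation shows where the concept is weakest even when the system-level prediction meets the goal, which is the kind of guidance the approximate allocation is meant to give the designer.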


Step 3. Conduct Evaluation. The subsystem failure data and the reliability prediction model are used to evaluate the reliability of the conceptual design. A reality check confirms that the predicted reliability value makes sense. Evaluate the following:

- Predicted versus anticipated reliability value
- Historical and expert opinion data used to calculate equipment reliability
- Reliability prediction model

Conceptual design review(s) of the concepts that will be carried to the design phase are conducted at this point. These design reviews are also useful in evaluating the current level of predicted reliability of the concepts being considered.

Step 4. Are Goals and Requirements Met? A comparison is made between established goals and the predicted reliability values. If the goals are not met, continue to Step 5, where problems and root causes are identified. If the goals are met or exceeded, approval is eventually given to move to the design phase of the life cycle, where goals may be modified to meet customer requirements.

Step 5. Identify Problems and Root Causes. If goals are not met, problems and root causes should be identified. Sensitivity analyses can be conducted to direct attention to those subsystems that have the greatest impact on the equipment reliability. If an FMEA was developed in Step 2, use it to examine the potential failure modes identified and to establish possible root causes.

The reliability improvement process now returns to Step 2, where reliability improvement and growth activities are initiated. These might include:

- Adding high-level redundancy
- Using proven high-reliability components and parts
- Forming partnerships with sub-tier suppliers
- Derating

Once the conceptual design improvements have been selected and incorporated, both the functional block diagram and the reliability prediction model are re-evaluated. The model and the data used in the model are changed to reflect the conceptual design improvements. If an FMEA was initiated, it is also updated to reflect design changes. Steps 2 through 5 are repeated until goals are met and approval is given to move to the design phase of the life cycle.

At the end of the concept and feasibility phase, the following objectives have been met:

- Reliability goals have been established and allocated to major subsystems
- A reliability program plan has been initiated
- Conceptual designs that form the basis of the equipment design are determined
- Feasibility that selected conceptual designs will meet goals is demonstrated

Table 3-2 summarizes the activities associated with applying the reliability improvement process to the concept and feasibility phase. There are three designators used for the activities:


E (engineering), D (data), and T (testing). Each designator, followed by a number, gives the location of the activity in Section 3.0.

Table 3-2. Reliability Improvement Process Activities for the Concept and Feasibility Phase

Reliability Improvement Process Step: Activities

1. Establish Goals and Requirements
- Establish reliability goals (E1)
- Create reliability program plan (E2)

2. Reliability Engineering and Improvements
- Develop functional block diagrams (E3)
- Create preliminary reliability model (E4)
- Allocate reliability goals (E5)
- Collect historical failure data (D1)
- Develop preliminary FMEA (E14)
- Develop preliminary Life Cycle Cost (AT19)

3. Conduct Evaluation
- Preliminary prediction of equipment reliability (E6)
- Conceptual design review(s) (E7)

4. Are Goals and Requirements Met?
- Compare goals to predicted reliability values
- If goals are not met, continue to Step 5
- If goals are met, move to design phase of life cycle

5. Identify Problems and Root Causes
- Perform sensitivity analyses using reliability model (E8)
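One simple way to carry out the sensitivity analysis named in Step 5 is to perturb each subsystem in the reliability model and see how much the system-level prediction moves. The sketch below is a hedged illustration only: it assumes a series model with constant failure rates, and the subsystem names, rates, and the 10% improvement step are hypothetical choices rather than values from this guideline.

```python
def series_mtbf(failure_rates):
    """System MTBF in hours for a series model with constant failure rates."""
    return 1.0 / sum(failure_rates.values())


def sensitivity_ranking(failure_rates, improvement=0.10):
    """Rank subsystems by the system MTBF gained when one rate is reduced.

    Each subsystem's failure rate is cut by `improvement` (a fraction) while
    the others are held fixed; the result is sorted with the largest gain first.
    """
    baseline = series_mtbf(failure_rates)
    gains = {}
    for name, rate in failure_rates.items():
        trial = dict(failure_rates)
        trial[name] = rate * (1.0 - improvement)
        gains[name] = series_mtbf(trial) - baseline
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical subsystem failure rates in failures per hour.
rates = {
    "wafer_handler": 1 / 800,
    "process_chamber": 1 / 2500,
    "controller": 1 / 4000,
}

for name, gain_h in sensitivity_ranking(rates):
    print(f"{name}: +{gain_h:.1f} h system MTBF for a 10% lower failure rate")
```

The subsystem at the top of the ranking is where an improvement of a given relative size buys the most system reliability, so it is the natural first candidate for the improvement activities in Step 2.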

Design

Step 1. Establish Goals and Requirements. The reliability goals established in the concept and feasibility phase of the life cycle are modified and become reliability requirements in the design phase. Requirements need to be well defined so that they are understandable by design engineers and manufacturers. Requirements should be broad in nature and be both qualitative (e.g., definition of responsibilities and program requirements) and quantitative (e.g., mean time between failures and uptime).


System-level requirements are allocated to major subsystems and components. Once reliability requirements have been established, the reliability program plan is updated to reflect these requirements.

Step 2. Reliability Engineering and Improvements. Design-for-reliability practices are applied at this step in the improvement process. Applying them creates a proactive environment for the design team. Some of the more basic practices include:

- Simplicity. Simplification of equipment configuration is one of the basic principles of design for reliability. Added parts or features increase the number of failure modes. A common simplification practice is component integration, which is the use of a single component to perform multiple functions.
- Proven Components. To the extent possible, designers should use components that have been shown to work in similar applications. Using proven components can minimize the analyses and testing needed to demonstrate reliability of equipment.
- Derating. Derating is the practice of using components or materials at environmental conditions or loads that are less severe than their limiting condition. Under these conditions, the component or material is expected to be more reliable.
- Redundancy. Another reliability improvement practice is to include more than one method for accomplishing a function by having certain components or subassemblies in parallel, rather than in series. Beyond a certain point, redundancy may be the only cost-effective way to design reliable equipment.
- Failure Detection. Reliability of equipment can be improved by incorporating failure detection methods such as automatic sensing and switching devices.
- Ergonomics or Human Factors Engineering. The equipment design must consider human factors aspects such as the person-machine interface, human reliability, and maintainability.

The functional block diagram is updated as the design develops. The gross reliability model, which consists of major subsystems, is expanded. Each subsystem is broken into more detail. For example, a wafer handler subsystem could be broken down into software, electronics, arm, and casing components.

The reliability allocated to a subsystem is further allocated to the component level. As was the case in the concept and feasibility phase, this allocation is based on the limited information available during the early phases of the life cycle; it is used as a guide when developing the various designs. As the design progresses, the allocation becomes finalized.

If an FMEA was not developed in the concept and feasibility phase of the life cycle, initiate it in this phase.

As was the case in the concept and feasibility phase, equipment in the design phase has not yet been built, so actual component failure data may not be available. Here again, historical data can be used for those components that are similar to previous generations of equipment. Use standard handbooks (such as MIL-HDBK-217[1] or NPRD-91 Handbook[2]) or expert opinion to obtain data for those components where no historical data is available.
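The effect of the redundancy practice described above, placing components in parallel rather than in series, can be illustrated numerically. The sketch below is an assumption-laden example rather than a design rule: it assumes independent failures and active redundancy in which any one of the parallel units is sufficient, and the component names and reliability values are hypothetical.

```python
from math import prod


def series_reliability(reliabilities):
    """All items must work: multiply the individual reliabilities."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """At least one item must work: one minus the chance that all fail."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical mission reliabilities for two components of a subsystem.
pump = 0.90
controller = 0.99

single = series_reliability([pump, controller])
redundant = series_reliability([parallel_reliability([pump, pump]), controller])

print(f"single pump:    {single:.4f}")
print(f"redundant pump: {redundant:.4f}")
```

Under these assumptions the duplicated pump raises subsystem reliability, but the gain has to be weighed against the added parts, cost, and failure modes that the simplicity practice warns about.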

If a critical component is used for the first time and life data is not available, run a simulated life test to generate the life data under the expected use conditions.

Step 3. Conduct Evaluation. Use the subsystem and component failure data, and the updated reliability prediction model, to evaluate the reliability of the current equipment design. As was the case in the concept and feasibility phase, evaluate the following:

- Data sources and their validity
- Predicted versus anticipated reliability value
- Historical and expert opinion data used in determining equipment reliability
- Reliability prediction model

Conduct design review(s) of the design(s) that will be carried to the prototype phase at this time. These reviews are often broken down into:

- Requirements Review - review the equipment's design requirements
- Preliminary Design Review - evaluate the preliminary design against requirements
- Critical Design Review - provide design to the customer(s) for review

Step 4. Are Goals and Requirements Met? Compare the reliability requirements and the predicted reliability values. If requirements are not met, continue to Step 5, where problems and root causes are identified. If requirements are met, approval is given to move to the prototype phase of the life cycle.

Step 5. Identify Problems and Root Causes. If requirements are not met, sensitivity analyses can be conducted to direct attention to those subsystems and components that have the greatest impact on the equipment reliability. Evaluate the FMEA that was developed in Step 2 to determine potential failure modes of the subsystems and components.

The process now returns to Step 2, where reliability improvement activities are initiated. Steps 2 through 5 are repeated until requirements are met. Approval can then be given to move to the prototype phase of the life cycle.

At the end of the design phase, the following objectives have been met:

- The core architecture of the equipment design has been finalized
- Design(s) have been chosen for prototype


Table 3-3 summarizes the activities associated with applying each step of the reliability improvement process to the design phase.

Table 3-3. Reliability Improvement Process Activities for the Design Phase

Reliability Improvement Process Step: Activities

1. Establish Goals and Requirements
- Modify goals to match customer requirements (E1)
- Update reliability program plan (E2)

2. Reliability Engineering and Improvements
- Apply design-for-reliability practices (E9)
- Update functional block diagram (E3)
- Expand reliability model to include more detailed subsystems (E4)
- Allocate subsystem requirements to subsystem components (E5)
- Collect failure data for components within subsystems (D1)
- Evaluate reliability of purchased components (E11)
- Run life test on new and critical components (AT18)
- Update Life Cycle Cost (AT19)
- Perform ergonomics and human factors studies (E12)
- Conduct software reliability studies (E13)
- Implement FMEA (E16)

3. Conduct Evaluation
- Predict equipment reliability (E6)
- Conduct design reviews (E7)

4. Are Goals and Requirements Met?
- Compare reliability requirements to predicted values
- If requirements are not met, continue to Step 5
- If requirements are met, move to prototype phase of life cycle

5. Identify Problems and Root Causes
- Perform sensitivity analyses (E8)
- Evaluate FMEA (E14)

Prototype

Step 1. Establish Goals and Requirements. At this point in the life cycle, requirements have been established and little remains to be done other than to update them as the design moves toward completion and prototypes are built. Modeling and failure data analyses can be used to appraise current equipment reliability levels and evaluate what levels are achievable.


As was the case in the previous two phases, the reliability program plan is updated.

Step 2. Reliability Engineering and Improvements. The functional block diagram is again updated in the prototype phase to reflect any design changes. Subsystems and components having the greatest impact on equipment reliability are further expanded in the reliability prediction model.

If reliability requirements were revised in Step 1, re-allocation to major subsystems and components may be necessary. For those subsystems and components that are modeled in more detail, reliability allocations need to be made to lower levels. If more than one prototype is built, a reliability model for each prototype design may be needed.

Conduct a test to generate subsystem and system level reliability data for each of the prototypes. Aspects of the test program that are considered include:

- Test objectives
- Test parameters
- Test sample size
- Test duration
- Test environments

Component tests are useful for identifying basic weaknesses in critical components, whereas system tests are useful in exploring the effects of component interactions. Results from component tests alone should not be used for predicting system reliability performance, since component tests rarely duplicate system interactions.

A failure reporting and corrective action system (FRACAS) can be initiated to record failure data gathered during the testing program. The FRACAS is a closed-loop reporting system that is useful in:

- Identifying failures and establishing a historical data base
- Analyzing failures to determine the cause
- Documenting the corrective action required to minimize recurrence of the failures

Maximum benefits from a FRACAS are realized when it is implemented early in a test program and is directly coupled to the modeling effort. Failures identified during in-house testing (e.g., prototype tests) are easier to analyze than failures in the field. Furthermore, it is more cost effective to identify and correct failures earlier in the life cycle.
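The closed-loop character of a FRACAS described above can be made concrete with a small data structure. This is a minimal sketch under stated assumptions, not a specification of any real FRACAS tool: the class names, field names, and the rule that a report is closed only once a root cause, a corrective action, and a verification are all recorded are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FailureReport:
    report_id: int
    subsystem: str
    failure_mode: str
    hours_at_failure: float
    root_cause: Optional[str] = None
    corrective_action: Optional[str] = None
    verified: bool = False  # True once the corrective action is confirmed

    @property
    def is_closed(self) -> bool:
        """The loop is closed only when cause, action, and verification exist."""
        return bool(self.root_cause and self.corrective_action and self.verified)


class Fracas:
    def __init__(self) -> None:
        self.reports: List[FailureReport] = []

    def report(self, subsystem: str, failure_mode: str,
               hours_at_failure: float) -> FailureReport:
        """Record a failure and return the new report."""
        entry = FailureReport(len(self.reports) + 1, subsystem,
                              failure_mode, hours_at_failure)
        self.reports.append(entry)
        return entry

    def close(self, report_id: int, root_cause: str,
              corrective_action: str) -> None:
        """Document the cause and the verified corrective action."""
        entry = self.reports[report_id - 1]
        entry.root_cause = root_cause
        entry.corrective_action = corrective_action
        entry.verified = True

    def open_reports(self) -> List[FailureReport]:
        return [r for r in self.reports if not r.is_closed]


# Hypothetical prototype test failures.
fracas = Fracas()
jam = fracas.report("wafer_handler", "arm jam", 120.0)
leak = fracas.report("process_chamber", "vacuum leak", 310.0)
fracas.close(jam.report_id, root_cause="worn belt",
             corrective_action="belt added to preventive maintenance list")
print([r.failure_mode for r in fracas.open_reports()])
```

Keeping every report in the list, open or closed, is what builds the historical data base, while the open-report query shows which failures still lack an analyzed cause or a documented corrective action.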

The actual failure modes that are uncovered during testing should be recorded in the FRACAS and compared to the predicted failure modes established in the FMEA. Where differences occur, the reasons should be identified.

Step 3. Conduct Evaluation. Reliability of the various prototypes is evaluated based on the test data. Results of the prototype test are then presented for a design review prior to pilot production.

Step 4. Are Goals and Requirements Met? Compare the results of the testing of the prototype(s) to the requirements to see if they have been met. If the requirements are not met, move to Step 5, where problems and root causes are identified. If requirements are met, a design review is performed, including a management go/no go decision to continue to the pilot production phase of the life cycle.

Step 5. Identify Problems and Root Causes. A sensitivity analysis is conducted to direct attention to those subsystems and components that have the greatest impact on the equipment reliability. Root causes of the failures recorded in the FRACAS are identified and corrective actions implemented. A more detailed failure analysis might also be performed on those subsystems and components that are failing at a significantly higher rate than previously anticipated.

The process now returns to Step 2, where improvement activities are initiated. If a FRACAS was initiated, it might identify corrective actions that could be implemented to eliminate failures. Other possibilities include:

- Derating
- Procedural changes
- Process changes

A preventive maintenance (PM) program can be developed for subsystems and components that degrade equipment performance. Partnerships established with suppliers are continually nurtured, and purchased subsystems and components are continually evaluated. Human capabilities and limitations are considered, and changes are made to the equipment to eliminate failures due to human errors. The software reliability program is continued. For critical subsystems and components, the optimal operating range is found and the impact of the optimal range on other components is evaluated.

Steps 2 through 5 are repeated until requirements are met. Approval can then be given to move to the pilot production phase of the life cycle.

At the end of the prototype phase, the following objectives have been met:

- The prototype(s) has been tested and evaluated to determine its capability of achieving the requirements. This includes redesigning and re-evaluating until a go/no go decision is reached.
- The core subsystem and component designs are finalized.

Table 3-4 summarizes the activities associated with applying the reliability improvement process to the prototype phase.


Table 3-4. Reliability Improvement Process Activities for the Prototype Phase

Reliability Improvement Process Step: Activities

1. Establish Goals and Requirements
- Update reliability requirements (E1)
- Update reliability program plan (E2)

2. Reliability Engineering and Improvements
- Update functional block diagram (E3)
- Expand reliability model, as needed (E4)
- Re-allocate subsystem and component reliability requirements (E5)
- Establish test plan (T1)
- Conduct prototype test (T2)
- Establish FRACAS (E17)
- Perform human reliability analysis (D2)
- Develop preventive maintenance program (E10)
- Continue to evaluate the reliability of purchased components (E11)
- Perform ergonomics studies (E12)
- Conduct software reliability studies (E13)
- Update Life Cycle Cost (AT19)

3. Conduct Evaluation
- Evaluate prototype reliability (T2)
- Conduct design review(s) (E7)

4. Are Goals and Requirements Met?
- Compare reliability requirements to predicted values
- If requirements are not met, continue to Step 5
- If requirements are met, move to pilot production phase of life cycle

5. Identify Problems and Root Causes
- Perform sensitivity analyses (E8)
- Evaluate FRACAS to identify problems and root causes (E17)
- Evaluate FMEA to identify potential failure modes (E14)
- Perform failure analyses on critical components (E16)
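Evaluating prototype reliability from test data (Step 3) and comparing it with the requirement (Step 4) can be sketched as follows. This is a deliberately simple illustration, assuming a constant failure rate and using only a point estimate; the unit hours, failure count, and required values are hypothetical, and a real evaluation would normally also attach a confidence bound to the estimate.

```python
def mtbf_point_estimate(unit_hours, failures):
    """Point estimate of MTBF: total operating hours divided by failures.

    unit_hours is a list of operating hours accumulated by each prototype.
    """
    if failures < 1:
        # With no failures the point estimate is undefined; a confidence
        # bound on MTBF would be needed instead.
        raise ValueError("at least one failure is needed for a point estimate")
    return sum(unit_hours) / failures


def meets_requirement(unit_hours, failures, required_mtbf_h):
    """Step 4 check: does the observed MTBF reach the required MTBF?"""
    return mtbf_point_estimate(unit_hours, failures) >= required_mtbf_h


# Hypothetical prototype test: three units, four failures in total.
hours = [400.0, 350.0, 450.0]
observed = mtbf_point_estimate(hours, 4)
print(f"observed MTBF: {observed:.0f} h")
print("meets 250 h requirement:", meets_requirement(hours, 4, 250.0))
print("meets 350 h requirement:", meets_requirement(hours, 4, 350.0))
```

When the check fails, the failures behind the count are the input to Step 5, where the FRACAS records and the FMEA are used to find root causes.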

Pilot Production

Step 1. Establish Goals and Requirements. During the pilot production phase, goals and requirements are updated as appropriate, and the reliability program plan is updated to reflect these and any other changes. Modeling and failure data analyses are used to assess current and potential levels of equipment performance.


Step 2. Reliability Engineering and Improvements. Functional block diagrams and the reliability model are once again updated to reflect any changes that occurred during the prototype phase. If a FRACAS was not implemented during the prototype phase, it should be implemented at this time.

The test program is evaluated and updated as needed. Any aspects of the test program that were not clearly defined during the prototype phase should be established here. Additional tests that should be implemented at this time are:

- Burn-in tests
- Reliability qualification tests (RQT)

Burn-in tests are useful in identifying weak components or subsystems prior to field use. An RQT is useful in initial customer applications of the equipment to evaluate equipment performance in actual operating environments. The RQT is also useful in verifying compliance with contractual objectives: equipment is tested according to a predetermined plan, under specified environmental conditions and pass/fail criteria, prior to a full-scale production decision[3]. Testing equipment in an environment that represents usage throughout its service life allows for establishing reasonable correlations between test results and actual field experience.

The manufacturing processes should be qualified at this time to avoid repeating the manufacturing problems identified during pilot production. Qualifying manufacturing processes before full-scale production reduces manufacturing costs and prevents equipment performance degradation[4]. Qualifying manufacturing processes includes:

- Performing a process capability study
- Establishing process control
- Monitoring the defect level
- Reducing the defect level
- Periodically assessing and controlling the processes[5]

Both new and existing manufacturing processes should be requalified periodically to ensure requirements are maintained. Personnel involved in the manufacturing process should be properly trained before introduction of the equipment.

Step 3. Conduct Evaluation. The pilot production phase of the life cycle is generally the first time equipment is evaluated in a customer environment. Thus, in addition to reliability modeling and prototype testing, engineers should work closely with customer service and field service personnel to evaluate the equipment's performance in initial customer applications and actual operating environments. A reliability qualification test (RQT) is performed to verify compliance with contractual objectives.

Problems and failures occurring during testing should be carefully analyzed, and recommendations for corrective action should be issued as part of the FRACAS. Failure modes identified in the FMEA are compared to failures reported during testing. Differences that occur should be analyzed.

Definitions of failures should be issued, and pass/fail criteria should be established. Failures generally fall into four categories[5]:

1. Catastrophic/Hard failures - failures that are permanent. For equipment, these failures reflect an irreversible physical change. These failures are easily identified and replicated.
2. Marginal failures - failures that are due to dirty or degraded critical components. The equipment is operational, but the output is not within the acceptable limits.
3. Intermittent failures - failures that occur only because of unstable equipment or varying software conditions. Intermittent failures occur randomly and are difficult to replicate.
4. Soft failures - failures that result from temporary environmental conditions. Like intermittent failures, soft failures occur randomly and are difficult to replicate.

The pilot production phase provides the last opportunity to make design changes and other improvements before full-scale production begins.

Step 4. Are Goals and Requirements Met? Results of field testing are compared to requirements to determine if they are met. If requirements are not met, the process moves to Step 5, where problems and root causes are identified. If requirements are met, a design review is conducted, and a go/no go decision to continue to the production and operation phase of the life cycle is made.

Step 5. Identify Problems and Root Causes. Sensitivity analyses, as well as feedback from a FRACAS and FMEA, are used to direct attention to problem areas and root causes. Techniques such as Pareto analysis can help focus effort on major problems first and on lower-level problems later.

The process now returns to Step 2, where improvement activities and corrective actions are initiated. Steps 2 through 5 are repeated until requirements are met. Approval can then be given to move to the production and operation phase of the life cycle.

At the end of the pilot production phase of the life cycle, the following objectives have been met:

- Capability of the pilot production design is tested and evaluated to determine if the design can achieve the end use requirements in the customer's operating environment.
- The equipment design for full-scale production and deployment is finalized.

Table 3-5 summarizes the activities associated with applying the reliability improvement process to the pilot production phase of the life cycle.


Table 3-5. Reliability Improvement Process Activities for the Pilot Production Phase

Reliability Improvement Process Step: Activities

1. Establish Goals and Requirements
- Update reliability requirements, as needed (E1)
- Update reliability program plan (E2)

2. Reliability Engineering and Improvements
- Update functional block diagram, if needed (E3)
- Update reliability model, if needed (E4)
- Re-allocate reliability requirements, as needed (E5)
- Upgrade testing program, as needed (T1)
- Implement FRACAS, if not already done (E17)
- Perform human reliability analyses (D2)
- Perform software reliability studies (E13)
- Perform ergonomic studies (E12)
- Update preventive maintenance program, as needed (E10)
- Continue to evaluate reliability of purchased components (E11)
- Update Life Cycle Cost (AT19)

3. Conduct Evaluation
- Conduct tests of equipment (T2)
- Evaluate equipment reliability (E6)
- Conduct design review(s) (E7)

4. Are Goals and Requirements Met?
- Compare reliability requirements to observed values
- If requirements are not met, continue to Step 5
- If requirements are met, move to production and operation phase of life cycle

5. Identify Problems and Root Causes
- Perform sensitivity analyses (E8)
- Evaluate FRACAS (E17)
- Evaluate FMEA (E14)
- Perform failure analyses on critical components (E16)
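The Pareto analysis mentioned in Step 5 of the pilot production phase amounts to counting failures by mode, sorting the counts from largest to smallest, and tracking the cumulative share. The sketch below shows one way to do that; the failure modes and counts are hypothetical stand-ins for records that would come from a FRACAS.

```python
from collections import Counter


def pareto(failure_modes):
    """Return (mode, count, cumulative percent) rows, most frequent first."""
    counts = Counter(failure_modes)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for mode, count in counts.most_common():
        cumulative += count
        rows.append((mode, count, 100.0 * cumulative / total))
    return rows


# Hypothetical failure modes reported during pilot production testing.
reported = (["robot arm jam"] * 5
            + ["vacuum leak"] * 3
            + ["software hang"] * 2)

for mode, count, cum_pct in pareto(reported):
    print(f"{mode:15s} {count:3d} {cum_pct:6.1f}%")
```

Reading down the cumulative column shows how few failure modes account for most of the failures, which is the basis for addressing the major problems first.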

Production/Operation

Step 1. Establish Goals and Requirements. Final updates to reliability requirements and the reliability program plan are made at this point. All major reliability problems should have been identified and corrected prior to full-scale production and deployment of the equipment.


Step 2. Reliability Engineering and Improvements. Functional block diagrams and the reliability model are updated to reflect any design changes that occurred during the pilot production phase. The FRACAS data base is updated to reflect failure modes uncovered during pilot production testing. The observed failures are also used to update the reliability model.

A field tracking and customer feedback program is initiated to record operation and maintenance problems in the field. This information should account for uncertainty due to variations in site, equipment vintage, and customer procedures.

Step 3. Conduct Evaluation. Evaluation of the equipment's performance at this point consists primarily of feedback from maintenance records. However, the effect of pending corrective actions should be taken into account when predicting the equipment's future performance.

Step 4. Are Goals and Requirements Met? Here again, if requirements are not being met, problems and root causes are identified in Step 5. If requirements are being met, it is important to continually monitor equipment performance and to implement a process of continuous improvement until decisions are made to phase out the current generation of equipment and begin development of the next generation.

Step 5. Identify Problems and Root Causes. Failures and problems reported during full-scale production and deployment in the field are fed through the FRACAS to verify the failure(s) and to identify root causes and corrective actions. Pareto analyses can be used to prioritize problems.

The process now returns to Step 2, where improvements and corrective actions are implemented. Steps 2 through 5 are repeated until requirements are met.

At the end of the equipment's production and operation phase, the following objectives have been met:

- The equipment is manufactured in a manner that uniformly meets the customer and supplier requirements.
- Continuous improvement goals and requirements are established and demonstrated.

Table 3-6 summarizes the activities associated with applying the reliability improvement process to the production and operation phase of the life cycle.


Table 3-6. Reliability Improvement Process Activities for the Production and Operation Phase

Reliability Improvement Process Step: Activities

1. Establish Goals and Requirements
- Final update of reliability requirements, if needed (E1)
- Final update of reliability program plan (E2)

2. Reliability Engineering and Improvements
- Update FRACAS data base (E17)
- Implement field tracking, customer feedback (D1) and corrective action program
- Update human reliability analyses (D2)
- Update software reliability studies (E13)
- Update ergonomic studies (E12)
- Update preventive maintenance program, as needed (E10)
- Continue to evaluate reliability of purchased components (E11)
- Update Life Cycle Cost, if required (AT19)

3. Conduct Evaluation
- Assess equipment reliability based on the field data (E6)
- Evaluate feedback from field tracking and maintenance records (D1)

4. Are Goals and Requirements Met?
- Compare requirements to observed values
- If requirements are not met, continue to Step 5
- If requirements are met:
  * Continually monitor equipment performance
  * Implement process of continuous improvement
  * Revise goals and requirements, as appropriate (E1)
  * Eventually phase out current generation equipment

5. Identify Problems and Root Causes
- Perform sensitivity analyses (E8)
- Perform failure analyses on field failures (E16)
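Assessing equipment reliability from field data while keeping site-to-site variation visible can be sketched as below. This is an illustrative assumption about how field tracking records might be summarized, not a format defined by this guideline: the record layout (site, operating hours, failure count), the site names, and the numbers are hypothetical.

```python
from collections import defaultdict


def mtbf_by_site(records):
    """Summarize field records into an MTBF per site.

    records is an iterable of (site, operating_hours, failure_count) tuples.
    A site with no recorded failures maps to None, since a point estimate
    is undefined there.
    """
    hours = defaultdict(float)
    failures = defaultdict(int)
    for site, operating_hours, failure_count in records:
        hours[site] += operating_hours
        failures[site] += failure_count
    return {site: (hours[site] / failures[site] if failures[site] else None)
            for site in hours}


def pooled_mtbf(records):
    """MTBF over all sites combined; None if no failures were recorded."""
    total_hours = sum(r[1] for r in records)
    total_failures = sum(r[2] for r in records)
    return total_hours / total_failures if total_failures else None


# Hypothetical field tracking records.
field_records = [
    ("fab_a", 1000.0, 2),
    ("fab_a", 500.0, 1),
    ("fab_b", 1200.0, 6),
]

print(mtbf_by_site(field_records))
print(pooled_mtbf(field_records))
```

A large gap between sites suggests that site conditions, equipment vintage, or customer procedures, rather than the design alone, are driving the observed failures, so the pooled figure should not be read in isolation.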

Phase Out

Step 1. Establish Goals and Requirements. At this point in the life cycle, there are no goals or requirements to establish. A general goal would be to set requirements for subsystems and components to be carried over to the next generation of equipment. Also, it is important to have documented and retained all the information gained during the life cycle phases of the current generation of equipment so that similar mistakes will not be repeated.

Step 2. Reliability Engineering and Improvements. There are no reliability engineering or reliability improvements to be made at this point. Phase-out alternatives should be offered to customers of current generation equipment. Possible alternatives might include:
- Training and spare parts availability for current generation equipment
- Trade-ins on new generation equipment (customer discounts)
- Inventory of current generation equipment could be phased out in stages such as:
  Stage 1 - where spare parts requirements are maintained
  Stage 2 - where spare parts are sold to customers who still want them (last chance)
  Stage 3 - where remaining spare parts are scrapped

Step 3. Conduct Evaluation. At this point, there is nothing to evaluate except the past performance of the generation of equipment being phased out. The failure rate database of the subsystems and components is carried over to the next generation of equipment for future reliability modeling.

Step 4. Are Goals and Requirements Met? Since no goals or requirements have been established, there are none to compare.

Step 5. Identify Problems and Root Causes. As previously mentioned, it is important to retain all information on the performance of the equipment being phased out so that the information can be used to improve future generations of similar or new equipment.

At the end of the phase-out phase of the life cycle, the following objectives have been met:
- The discontinuation of production and field support is planned and implemented in a manner that satisfies both the customer and supplier needs.
- Subsystems and components carried over to the next generation of equipment are evaluated for information that will cause an improvement in the next generation.
- A failure rate database has been developed for subsystems and components for the next generation of equipment.

Table 3-7 summarizes the activities involved in applying the reliability improvement process to the phase-out phase of the life cycle.

Table 3-7. Reliability Improvement Process Activities for the Phase-Out Phase

Reliability Improvement Process Step: Activities

1. Establish Goals and Requirements
- Set requirements for subsystems and components to be carried to next generation of equipment
- Document and retain all information gathered during generation of equipment being phased out

2. Reliability Engineering and Improvements
- Offer phase-out alternatives to customers of equipment being phased out
- Phase out current generation equipment in stages

3. Conduct Evaluation
- Assess reliability of the current generation (E6) and carry information to next generation of equipment

4. Are Goals and Requirements Met?
- There are no goals or requirements to meet

5. Identify Problems and Root Causes
- Retain all information on equipment being phased out so that it can be used in future generations of equipment

3.4 Specific Applications of the Reliability Improvement Process

When applying the reliability improvement process for the first time to equipment in some advanced phase (other than concept and feasibility) of the life cycle, the activities will vary from those discussed earlier. This is because the activities that would have been performed in the previous life cycle phase(s) have not been performed and must, to some extent, be made up. The discussion in the following paragraphs is based on starting the reliability improvement process in some phase of the life cycle other than the concept and feasibility phase, and then continuously applying it throughout the remainder of the phases. For example, if the reliability improvement process is being applied for the first time to equipment that is already in the prototype phase of its life cycle, then activities associated with each step of the process for that phase and all subsequent phases (pilot production, production and operation, and phase out) are considered. The activities associated with applying the reliability improvement process to phases beyond the phase in which the process is being initiated are, however, basically the same as those discussed earlier. Furthermore, this discussion is similar to the earlier discussions that involved the application of the improvement process. Therefore, not every process step in every life cycle phase is discussed in detail; only the differences are highlighted.

3.4.1 Starting with Equipment in the Design Phase

When equipment has reached the design phase, the basic concept has already been established and is fixed in the minds of the design engineers. It is more difficult to incorporate customer needs into the design in this phase than in the concept and feasibility phase. However, it is not too late, and it is clearly important, to incorporate customer needs and requirements when establishing reliability goals. If a reliability program plan has not been initiated, do so at this time.

If the functional block diagrams and the corresponding reliability model were not initiated in the concept and feasibility phase, develop them now. Equipment reliability requirements are then allocated to individual major subsystems in the model. Failure data are collected for use in the reliability model. Other activities associated with applying the reliability improvement process to the remainder of the process steps and life cycle phases are identical to those discussed earlier and are listed in Tables 3-4 through 3-7. Therefore, they are not listed again here. Table 3-8 summarizes the activities associated with applying the reliability improvement process to equipment that is in the design phase. The activities listed in Table 3-8 are similar to those listed in Table 3-3; the difference is in the activities listed under Steps 1 and 2.

Table 3-8. Design Phase Reliability Improvement Process Activities

Reliability Improvement Process Step: Activities

1. Establish Goals and Requirements
- Establish reliability goals and requirements (E1)
- Establish reliability program plan (E2)

2. Reliability Engineering and Improvements
- Apply design-for-reliability practices (E9)
- Develop functional block diagram (E3)
- Develop reliability model (E4)
- Allocate requirements to subsystems and components (E5)
- Collect failure data for subsystems and components (D1)
- Evaluate reliability of purchased components (E11)
- Perform ergonomic studies (E12)
- Conduct software reliability studies (E13)
- Implement FMEA (E16)
- Develop Life Cycle Cost (AT19)

3. Conduct Evaluation
- Predict equipment reliability (E6)
- Conduct design reviews (E7)

4. Are Goals and Requirements Met?
- Compare reliability requirements to predicted values
- If requirements are not met, continue to Step 5
- If requirements are met, move to prototype phase of life cycle

5. Identify Problems and Root Causes
- Perform sensitivity analyses (E8)
- Evaluate FMEA (E14)

3.4.2 Starting with Equipment in the Prototype Phase

For equipment already in the prototype phase of the life cycle, the design is fixed. There is little opportunity to make major design changes due to cost and time constraints. However, it is still important to set goals and to understand and establish customer requirements. Furthermore, available failure data can be used to assess the current performance of the equipment for establishing upgrades to goals and requirements. If a reliability program plan has not been developed, create one that identifies and ties together all of the reliability improvement process activities that will be performed during the prototype phase and subsequent phases of the life cycle. Develop the functional block diagrams and reliability models to better understand and predict the reliability of equipment designs being prototyped. Update these models as the design changes, but realize that the models may become more complex as the design evolves. Develop detailed breakdowns of the subsystems that are significant contributors to system unreliability. Allocate reliability requirements to the individual subsystems. The subsystem allocations are then further divided into component allocations. The allocation process is used as a guide for improving the reliability of the equipment components and subsystems. Table 3-9 summarizes the activities associated with applying the reliability improvement process to equipment that is in the prototype phase. The activities associated with applying the reliability improvement process to the remainder of the life cycle phases are identical to those discussed earlier and listed in Tables 3-5 through 3-7. Therefore, details are not listed here.

Table 3-9. Prototype Phase Reliability Improvement Process Activities

Reliability Improvement Process Step: Activities

1. Establish Goals and Requirements
- Establish reliability goals and requirements (E1)
- Establish reliability program plan (E2)

2. Reliability Engineering and Improvements
- Create functional block diagram (E3)
- Create reliability model (E4)
- Allocate reliability requirements to subsystems and components (E5)
- Establish test plan (T1)
- Establish data collection program (D1)
- Establish FRACAS (E17)
- Establish FMEA (E16)
- Perform human reliability analysis (D2)
- Develop preventive maintenance program (E10)
- Continue to evaluate the reliability of purchased components (E11)
- Perform ergonomic studies (E12)
- Conduct software reliability studies (E13)
- Develop Life Cycle Cost (AT19)

3. Conduct Evaluation
- Test prototype(s) (T2)
- Evaluate prototype reliability (E6)
- Conduct design review(s) (E7)

4. Are Goals and Requirements Met?
- Compare reliability requirements to predicted values
- If requirements are not met, continue to Step 5
- If requirements are met, move to pilot production phase of life cycle

5. Identify Problems and Root Causes
- Perform sensitivity analyses (E8)
- Evaluate FRACAS (E17)
- Evaluate FMEA (E14)
- Perform failure analyses on critical components (E16)
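The allocation of reliability requirements to subsystems (activity E5) can be illustrated with a short sketch. This guideline does not prescribe an allocation method at this point, so the example assumes a simple series reliability model with constant failure rates, in which the allowed system failure rate (1 / system MTBF) is divided among subsystems in proportion to their predicted failure rates. The function name `allocate_mtbf`, the subsystem names, and all numbers are hypothetical.

```python
def allocate_mtbf(system_mtbf_hours, predicted_rates):
    """Allocate a system MTBF requirement to subsystems in a series model.

    predicted_rates maps subsystem name -> predicted failure rate
    (failures per hour, assumed positive). Each subsystem receives a
    share of the allowed system failure rate proportional to its
    predicted rate. Returns subsystem name -> allocated MTBF in hours.
    """
    allowed_rate = 1.0 / system_mtbf_hours
    total_predicted = sum(predicted_rates.values())
    allocation = {}
    for name, rate in predicted_rates.items():
        share = rate / total_predicted
        allocation[name] = 1.0 / (share * allowed_rate)
    return allocation


# Hypothetical subsystems and predicted failure rates, for illustration only.
rates = {"handler": 0.004, "chamber": 0.003, "controller": 0.001}
for name, mtbf in allocate_mtbf(200.0, rates).items():
    print(f"{name:12s} allocated MTBF = {mtbf:8.1f} h")
```

Under these assumptions the allocated subsystem failure rates sum to the allowed system rate, so meeting every subsystem allocation implies meeting the system requirement. The same function can be applied again to divide a subsystem allocation among its components.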

3.4.3 Starting with Equipment in the Pilot Production Phase

For equipment in the pilot production phase of the life cycle, the focus should be on appraising the actual level of equipment reliability (from available data) and determining what levels are desired and obtainable. This is still an important step in the environment of customer requirements. A reliability program plan can still be created to identify and tie together all of the reliability improvement process activities that will be performed during the pilot production phase and subsequent phases of the equipment life cycle.

The majority of this effort should be directed at making needed design improvements once the equipment is evaluated. It is not too late to incorporate some design-for-reliability practices. The focus should be on reliability growth activities directed at the existing design. A method for collecting, tracking, and storing reliability data should be established. A FRACAS can be initiated and used to track reported failures during pilot production, and to identify corrective actions necessary to eliminate these failures. It is still not too late to initiate an FMEA. Ergonomic studies can be used very effectively at this point. Table 3-10 summarizes the activities associated with applying the reliability improvement process to equipment starting in the pilot production phase.

Table 3-10. Pilot Production Phase Reliability Improvement Process Activities When Initiated in Pilot Production Phase

Reliability Improvement Process Step: Activities

1. Establish Goals and Requirements
- Establish reliability goals and requirements (E1)
- Establish reliability program plan (E2)

2. Reliability Engineering and Improvements
- Create functional block diagram (E3)
- Create reliability model (E4)
- Allocate reliability goals and requirements (E5)
- Establish data collection and tracking system (D1)
- Establish testing program (T1)
- Establish FRACAS (E17)
- Establish FMEA (E16)
- Perform human reliability analyses (D2)
- Perform ergonomic studies (E12)
- Perform software reliability studies (E13)
- Establish preventive maintenance program (E10)
- Evaluate reliability of purchased components (E11)

3. Conduct Evaluation
- Evaluate equipment reliability (E6)
- Conduct tests of equipment (T2)
- Conduct design review(s) (E7)

4. Are Goals and Requirements Met?
- Compare goals and requirements to observed values
- If requirements are not met, continue to Step 5
- If requirements are met, move to production and operation phase

5. Identify Problems and Root Causes
- Perform sensitivity analyses (E8)
- Evaluate FRACAS (E17)
- Evaluate FMEA (E14)
- Perform failure analyses on critical components (E16)

3.4.4 Starting with Equipment in the Production and Operation Phase

For equipment in the production and operation phase of the life cycle, the design is fixed. There is no opportunity to make major design changes at this time. Thus, the focus of Step 1 should be on appraising the actual level of reliability of equipment in this phase, and evaluating the levels that are desired and whether these levels are achievable. Upgrades to existing equipment can be made based on failure data analyses. Although rather late in the life cycle, creating a reliability program plan to track the activities to be performed during this phase and the phase out period of the life cycle is still beneficial. Efforts should focus on making needed improvements to the existing design and on reliability growth activities since it is too late to design reliability into the system. Table 3-11 summarizes the activities associated with applying the reliability improvement process to equipment that is in the production and operation phase of the life cycle. The activities associated with applying the improvement process to the phaseout phase of the life cycle are identical to those discussed earlier and listed in Table 3-7 and, therefore, are not listed here.

Table 3-11. Production and Operation Phase Reliability Improvement Process Activities When Initiated in Production and Operation Phase

Reliability Improvement Process Step: Activities

1. Establish Goals and Requirements
- Establish reliability goals and requirements (E1)
- Establish reliability program plan (E2)

2. Reliability Engineering and Improvements
- Develop functional block diagram (E3)
- Create reliability model (E4)
- Allocate goals and requirements (E5)
- Establish FRACAS (E17)
- Establish FMEA (E14)
- Implement field tracking and customer feedback program (D1)
- Perform human reliability analyses (D2)
- Perform ergonomic studies (E12)
- Perform software reliability studies (E13)
- Establish preventive maintenance program (E10)
- Evaluate reliability of purchased components (E11)

3. Conduct Evaluation
- Assess equipment reliability using the field data (E6)
- Evaluate feedback from field tracking and maintenance records (D1)
- Use FRACAS to evaluate field failures (E17)

4. Are Goals and Requirements Met?
- Compare goals and requirements to observed values
- If requirements are not met, continue to Step 5
- If requirements are met:
  * Continually monitor equipment performance
  * Implement process of continuous improvement
  * Eventually phase out current generation equipment

5. Identify Problems and Root Causes
- Perform sensitivity analyses (E8)
- Perform failure analyses (E16)

3.4.5 Starting with Equipment in Phase Out Phase

It is much too late to make any changes to the equipment during the phase-out phase. The goal in this phase is limited to collecting the reliability data of the equipment in order to gain insight into the next generation of equipment. This information can save tremendous amounts of time and money in the concept and feasibility phase of the next generation. There are no reliability engineering or reliability improvements to be made at this point. Phaseout alternatives should be offered to customers of current generation equipment. Table 3-12 summarizes the activities involved in applying the reliability improvement process to equipment that is in the phase-out phase of the life cycle. This table is identical to Table 3-7.

Table 3-12. Phase Out Phase Reliability Improvement Process Activities When Initiated in Phase-Out Phase

Reliability Improvement Process Step: Activities

1. Establish Goals and Requirements
- Set requirements for subsystems and components to be carried to next generation of equipment
- Document and retain all information gathered during generation of equipment being phased out

2. Reliability Engineering and Improvements
- Offer phase-out alternatives to customers of equipment being phased out
- Phase out current generation equipment in stages

3. Conduct Evaluation
- Create reliability model of subsystems and components carried to next generation equipment (E4)

4. Are Goals and Requirements Met?
- There are no goals or requirements to meet

5. Identify Problems and Root Causes
- Retain all information on equipment being phased out so that it can be used in future generations of equipment

3.5 Functional Responsibilities

The executive and technical reliability champions have responsibility for ownership of the reliability improvement process. However, various groups are assigned responsibility for implementing and maintaining the reliability improvement process activities during the life cycle of a piece of equipment. The type of group that should be held accountable depends on the particular life cycle phase and the activity being performed. Both managers and engineers are given responsibility for activities. Although a particular group has been assigned overall responsibility for an activity, other groups may actually provide assistance or perform the activity. Because each company has a unique management structure, the reliability champions' responsibilities include choosing the appropriate groups to assist, participate, and own each activity. For companies that have a reliability engineering group, the following paragraphs present recommended practices and organizational guidelines that will help make the reliability improvement process activities successful.

Recommended practices for reliability engineers:
- The engineering group and designers (not the reliability engineers) are accountable for the reliability of the design and the cost of poor reliability
- All designers are trained in basic reliability methods and tools by the reliability group
- Reliability engineers are part of the design team
- Reliability engineers assist designers
- The reliability group is accountable for reliability planning, program development, and assuring adherence to program policy

Organizational guidelines for reliability engineering group:
- The group reports to the development engineering manager, not to quality assurance
- The group reports to the systems engineering manager, not to field service
- Reliability engineer(s) report to the program manager of equipment with other members of the design team, not to operations
- The group exists as a separate peer group with engineering, not as part of sales (Caution: this can lead to reliability engineers being accountable for reliability and becoming isolated from the design team)

3.6 Where to Begin

One of the most difficult problems facing a company is where to begin. In an ideal environment, a reliability program would evolve along with the formation of the company and the development of its first product. A master plan for continuous reliability improvement would have been established, and reliability activities would have been initiated as needed throughout the equipment's life cycle. In a more typical situation, a company has an informal reliability effort. This effort may be applied sporadically, based on the personal style and management priorities of the equipment development manager. If the company's equipment has poor reliability in the field, a major engineering project may be initiated to fix specific reliability problems. Otherwise, the company faces losing business to the competition. The management team frequently does not recognize the need for or require development of a core reliability program that ensures ongoing attention to reliability requirements for all equipment. Even if management recognizes the need for the reliability process, they often find themselves in a reactive mode with current equipment problems and limited resources. Often, management may not be willing to wait for the benefits of a reliability program that is developed at the same time as its next product.

Although each company's situation is unique, there are some general guidelines that can be used to determine where implementation of a reliability program would be most effective. The first step involves assessing where in the life cycle each equipment line falls, and determining its current reliability performance. The ultimate goal is to choose one equipment line on which to focus reliability improvement activities. Obviously, the earlier in the equipment life cycle reliability improvement activities are implemented, the greater the benefits.

It is likely that a supplier will be developing more than one equipment line at any given time, each of which is in a different phase of its life cycle. For example, Figure 3-1 shows three equipment lines, each of which is in a different phase of its life cycle:
- Equipment A is in full production
- Equipment B is in the design phase
- Equipment C is just beginning the concept and feasibility phase

Benefits can be gained by applying the reliability improvement process to any of these three equipment lines. However, there are optimal situations to be aware of.

Figure 3-1. Multiple Equipment and Their Life Cycle Phase Status

[Figure: three parallel life cycle timelines for Equipment A, B, and C, each spanning Concept and Feasibility, Design, Prototype, Pilot Production, Production and Operation, and Phase Out, offset along a time axis with a "Today" marker.]

Equipment C has the greatest potential for cost-effective improvements in reliability because it is in the earliest phase of its life cycle. However, this does not mean that it is too late to improve the reliability of Equipment A and B. Reliability improvements can and should be considered in every phase of the life cycle. However, when starting a reliability improvement process, it is generally advantageous to choose equipment that will show immediate successes. If sufficient resources exist, address all equipment in all life cycle phases. Because it is unlikely that this is the situation, the following priorities are recommended:

1. Equipment in the Production and Operation Phase. Although this is a reactive strategy, it is the most customer oriented, and is capable of demonstrating quick benefits. Another benefit of starting with equipment in this phase is that data on the equipment in the field is available and can be used to determine current reliability performance. If you are unable to determine your current situation, it is difficult to set realistic goals and determine whether they have been met. It is also important to assess the impact of upgrades to equipment in this phase using the reliability model and existing failure data.

2. Equipment in the Design Phase. This is a proactive strategy and has the greatest long-term benefits. In this phase, it is difficult to determine what the reliability performance of the equipment will be unless the previous generation has a database and a significant number of similar parts. If this information exists, it can be used with modeling to evaluate potential performance of designs being considered.

3. Equipment in the Prototype or Pilot Production Phase. These phases are reactive and have benefits between the prior two stages. There is some amount of data available; therefore, the anticipated reliability performance of the equipment in the field can be determined. The drawback with these phases is the expense and time involved if major design changes are necessary.

4. Equipment in the Concept and Feasibility Phase. This is a proactive and the least expensive phase. Significant reliability improvements can be made to equipment in this phase with minimal use of resources. However, as with the design phase, the lack of data makes it difficult to determine reliability performance.

5. In general, ignore equipment in or near Phase Out. Activities should be limited to customer requests. However, if the product that is being phased out has future generations that are significant to the company's strategic plan, collecting data and analyzing failures of the product will yield tremendous insight into development of the next generation.

When making a choice, choose equipment that you know will have future generations. As mentioned in Section 1.0, the cost of improving equipment reliability will decrease as it moves from generation to generation. Knowing the reliability performance of existing equipment is essential to evaluating current equipment status and for setting reliability goals for current and future equipment. It is difficult to set realistic and attainable performance goals without this knowledge. Table 3-13 illustrates the type of reliability performance information that is available for the three equipment lines shown in Figure 3-1.

Table 3-13. Current Product Line Status

Equipment   Current Life Cycle Phase    Current Reliability Performance
A           Production and Operation    Actual - MTBFp; Actual - MTTR
B           Design                      Predicted - MTBFp; Predicted - MTTR
C           Concept and Feasibility     Goal - MTBFp; Goal - MTTR

Mean time between failures (MTBFp) and mean time to repair (MTTR) are the two measures of reliability performance used in this illustration. SEMI Standard E10-90[6] provides several other measures of reliability. Table 3-13 indicates that the MTBFp and MTTR values are known for Equipment A. Actual data are not available for Equipment B and C because they are in early stages of development. However, Equipment B has predicted values based on the design and Equipment C has goals that it is targeted to meet.

Reliability and design engineers determine current reliability performance by collecting and analyzing data received from a number of sources, including:
- Field service reports
- Customer feedback
- In-house testing

In situations where data is not available, but reliability performance needs to be determined, preliminary engineering judgments, mathematical predictions, and consensus using the opinions of experts can be used as a first cut at data values.

As discussed previously, one of the cornerstones of reliability improvement is the reliability data reporting system. It is an organized means of gathering factual data about equipment performance, both good and bad. Although useful data estimates can be determined during the concept and feasibility phase as well as the design and development phases of the equipment life cycle, the most meaningful data is collected during the production and operation phase, when the equipment is operating in its intended environment. Nevertheless, information gathered in any phase of the life cycle can be used to ensure that the reliability goals are attained with minimal time and expense commitments.

Section 4.0 discusses in detail the activities associated with data collection and analysis. These activities include determining:
- What data to collect
- How to use this data
- The most effective format to use when collecting data
- How to transform the data into failure rates
- How to get numerical values for human errors
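As a rough illustration of how field data can be turned into the two measures used above, the sketch below computes MTBFp and MTTR from a productive-time total and a list of repair durations. It assumes a simplified reading in which MTBFp is productive time divided by the number of failures and MTTR is the average repair duration; SEMI E10 gives the governing definitions. The function name `mtbf_mttr` and the numbers are hypothetical.

```python
def mtbf_mttr(productive_hours, repair_hours):
    """Estimate MTBFp and MTTR from field data.

    productive_hours: total productive time accumulated by the equipment
    repair_hours: list of repair durations, one entry per failure
    Returns (mtbf_p, mttr) in hours, or (None, None) if no failures
    have been recorded yet.
    """
    failures = len(repair_hours)
    if failures == 0:
        return None, None
    mtbf_p = productive_hours / failures
    mttr = sum(repair_hours) / failures
    return mtbf_p, mttr


# Hypothetical field data: 1200 productive hours and four repairs.
mtbf_p, mttr = mtbf_mttr(1200.0, [2.0, 4.5, 1.5, 4.0])
print(f"MTBFp = {mtbf_p:.1f} h, MTTR = {mttr:.1f} h")
```

With no recorded failures the point estimate is undefined, which is why the sketch returns no value in that case rather than reporting an unbounded MTBFp.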

It is important to note that an effective reliability improvement process includes a central database that holds data collected for all equipment of the same model or type and accounts for uncertainty due to variations in site, equipment vintage, and customer procedures.

3.7 Reliability Plans

The supplier should develop several reliability plans: a general company plan covering all products, and specific product plans for individual equipment lines. The following six elements must be included in these plans:
1. Objectives
2. Constraints, limitations, and requirements that exist at the time the plan is written
3. Basic assumptions made
4. Activities to be performed to meet objectives
5. Resources required to perform the planned activities
6. A schedule showing when the activities will be started and completed

General Company Plan

An overall reliability plan tailored to a company that takes into account the company's size and available resources; the plan addresses the following issues:
- The company's reliability policy
- Identification of reliability champions
- The overall strategy
- How reliability skills will be acquired within the company
- A description of organizational activities

Specific Product Plans

Each equipment line requires a reliability plan based on the life cycle phase of the equipment line, reliability goals and requirements, schedule limitations, and resources available. The more stringent the goals, the more activities, tools, and resources required to achieve the goals. Also, the shorter the schedule, the more resources must be applied over the scheduled period. The plan will identify the specific reliability activities and tools that will be used for a specific equipment line, and who (or which department) is responsible for performing them.

3.8 Application of Resources and Communicating Value

There are typically two difficult problems facing an organization at this point:
- Applying limited and already allocated resources to what appears to be a monumental undertaking
- Communicating the value of the reliability improvement process to key decision makers and participants

In an ideal environment, a master plan for continuous reliability improvement would have been established and reliability activities would have been initiated as needed throughout the equipment's life cycle. In a more typical situation, a company has an informal reliability effort. This effort may be applied sporadically, based on the personal style and management priorities of the equipment development manager. If the company's equipment has poor reliability recorded in the field, a major engineering project may be initiated to fix specific reliability problems. Otherwise, the company faces losing business to the competition. The management team frequently does not recognize the need for or require development of a core reliability plan that ensures ongoing attention to reliability requirements for all equipment. Even if management recognizes the need for the reliability process, they often find themselves in a reactive mode with current equipment problems and limited resources. Often management may not be willing to wait for the benefits of implementing a reliability improvement process that is developed at the same time as its next product.

Ideally, once a piece of equipment has been selected for the reliability improvement process, responsible individuals or groups would perform all the activities within the process steps. If resources are limited, individuals or groups would perform selected activities. The choice of activities depends on the company, and ultimately only the company's people know what resources can be successfully deployed and the best time frame for employing these activities.

However, the following items should be considered:
• Select activities that require various groups to work together on reliability improvement. This extends ownership of the reliability mission and shows success across multiple fronts.
• Initially choose activities that will give immediate benefits. Implementation of the reliability improvement process requires a long-term sense of vision and commitment. However, the engineer needs to "sell" management and participants on the advantages of the activities. This generally requires some demonstration of improvements almost immediately.
• If portions of an activity are already in place, build on them.
• Teach specific reliability skills to individuals as they become directly involved and are ready to apply new skills to real issues.
• Discuss the vision of reliability for the equipment, and the plan for how that reliability is going to be met, early to orient everyone in the company to the reliability effort.

The implementation of the reliability process as described occurs in a somewhat piecemeal fashion. However, this approach offers an effective means of applying limited resources to real and timely issues. When this approach is used, it is particularly important to have a technical champion to manage the entire equipment reliability effort. This ensures that a coherent and well-coordinated development effort occurs.

It is best to start small: start with one piece of equipment, implementing those activities that fit best in your company. Attempting to implement the process for all equipment simultaneously generally does not work. Once the reliability process for one piece of equipment is in place and the next piece of equipment is targeted for reliability improvement, find those activities that overlap. For example, if components or subsystems in the first piece of equipment are identical or very similar to those in the next piece of equipment, combine databases and reliability models for those parts.
Communicating Value

Communicating the value of the reliability effort to key decision makers and participants is vitally important and can be accomplished in three ways:
1. Translate the reliability efforts and benefits to measures such as cost savings, resource or cost avoidance, time to market, or market share gain.
2. Demonstrate a series of immediate short-term improvements and document those improvements, noting the benefits gained.
3. Develop a champion in senior management who will support your reliability efforts when top-level support is needed. The champion has the respect of decision makers and also the authority to influence and encourage participants.

3.9 Summary

The role management plays in the reliability improvement process is vital. Management has unique responsibilities in the establishment and implementation of the process. Management also assigns individuals to the role of reliability champions. The executive champion provides reliability leadership with the full support of upper management. The technical champion establishes the reliability improvement process and is responsible for its success.

The five steps of the reliability improvement process can be applied to a piece of equipment no matter what phase it is in. This section discussed the activities associated with each step of the reliability improvement process for each phase of the life cycle. It also discussed how to select a piece of equipment for a reliability program based on the life cycle phases, the importance of data, the choice of activities when resources are limited, rules for the reliability program plan, and suggestions on how to communicate the value of the reliability effort to key decision makers and participants in the reliability program. Section 4 provides more detailed descriptions of the reliability-related activities and presents some of the tools and techniques available in planning, developing, and implementing a reliability improvement program.

3.10 References

1. MIL-HDBK-217E, Reliability Prediction of Electronic Equipment.
2. Non-Electronics Part Reliability Data, Reliability Analysis Center, Rome, NY, 1991.
3. RMS Committee, RMS, Reliability, Maintainability & Supportability Guidebook, SAE G-11, Society of Automotive Engineers, Inc., Warrendale, PA, 1990.
4. DOD 4245.7-M, Transition from Development to Production, September 1985.
5. William W. Everett, et al., Reliability by Design, A Guide to Reliability Management, Issue 1, AT&T, Indianapolis, IN, November 1990.
6. SEMI E10-90, Guideline for Definition and Measurement of Equipment Reliability, Availability, and Maintainability (RAM), SEMI, 1990.


4 ACTIVITIES AND TOOLS IN THE RELIABILITY IMPROVEMENT PROCESS

4.1 Introduction

Sections 2 and 3 of these guidelines provided an overview of the reliability improvement process and the equipment life cycle. This section provides a description of the activities and tools that are part of the reliability improvement process. The reliability activities are grouped as:
• Engineering
• Data
• Testing

Engineering activities form the foundation of the reliability improvement process. Data activities also play an important role because the engineering activities depend on data. Testing activities provide a valuable source of data and information.

There are three designators used for the activities: E (engineering), D (data), and T (testing). These designators followed by a number provide the location of the activity in this section. Some of the activities stand alone; that is, they do not require any formally recognized tools of the trade. However, many of the activities use standard methods and techniques, referred to as tools. These tools come from various academic disciplines such as probability and statistics, and reliability engineering. The designator used for the tools is AT, followed by a number.

4.2 Reliability Activities

The following lists summarize the reliability activities that are discussed in this section:

Engineering Activities
E1 Reliability Goals
E2 Reliability Program Plan
E3 Functional Block Diagrams
E4 Equipment Reliability Modeling
E5 Reliability Goal Allocation
E6 Equipment Reliability Quantification
E7 Design Reviews
E8 Sensitivity Analysis
E9 Design for Reliability Practices
E10 Preventive Maintenance Program (PM)
E11 Reliability of Purchased Components
E12 Ergonomic Studies
E13 Software Reliability Studies
E14 Failure Modes and Effects Analysis (FMEA)


E15 Equipment Characterization
E16 Component Failure Analysis
E17 Failure Reporting and Corrective Action System (FRACAS)

Data Activities
D1 Data Collection and Data Base Management
D2 Human Reliability Analysis (HRA)

Testing Activities
T1 Test Plans
T2 Reliability Tests

Reliability Tools

The following list summarizes the reliability tools that are discussed in this section.
AT1 Accelerated Testing
AT2 Burn-In Testing
AT3 Cause & Effect (Fishbone) Diagram
AT4 Competitive Benchmarking
AT5 Design of Experiments (DOE)
AT6 Environmental Stress Screening (ESS)
AT7 Fault Tree Analysis (FTA)
AT8 Life Testing
AT9 Pareto Diagram
AT10 Process Capability
AT11 Quality Function Deployment (QFD)
AT12 Reliability, Analysis and Modeling Program (RAMP) Software
AT13 Reliability Development/Growth Testing (RD/GT)
AT14 Reliability Qualification Testing (RQT)
AT15 Reliability Block Diagram Modeling (RBD)
AT16 Repairable Systems Analysis
AT17 Taguchi Methodology
AT18 User Groups
AT19 Cost of Ownership Calculations

The following pages discuss each activity. Following the activity descriptions is a description of the tools in enough detail that the reader can either use the tool or understand what it can be used for. References are available at the end of each activity or tool that requires more detailed descriptions. Much of the material used in the activity and tool descriptions comes directly from the references. The purpose of this section is not to recreate work that has already been done well, but rather to give the reader an opportunity to know what the activity or tool is about and where to go for more information.


Engineering Activity E1: Reliability Goals


Reliability goals are used to focus attention toward producing reliable equipment and to serve as standards against which reliability achievements can be measured. These goals define the design requirements, which in turn form the basis for design specifications. Various sources are used to establish goals:

Customer Voice. Listening to the voice of the customer means understanding what the customer wants and needs. Quality Function Deployment (QFD) is a tool developed to help establish goals through customer involvement. Customers identify the qualities they need and want in equipment, using their own words. These qualities are then translated by the supplier into measurable technical goals. QFD is most useful when applied during the concept and feasibility phase and the design phase. However, it is important in every phase to understand what the customers consider to be their primary needs and wants. It is also important to establish customer partnerships to assure continued customer involvement.

Competitive Benchmarking. Competitive benchmarking is a process used by a company to measure and compare its products, services, and operations against those of its toughest competitors and of companies demonstrating world-class performance.

Reverse Engineering. The systematic dismantling of a piece of equipment with a high reliability ranking is called reverse engineering. The information obtained provides clues about the actual reliability of similar equipment and the technology used to achieve that reliability.

Contractual Agreements. A contractual agreement is a formal document that contains an explicit statement of the customer's requirements for reliability and safety. No matter what phase of the life cycle the equipment is in, the reliability and safety values agreed upon by the customer and the supplier are set. An inability to maintain these values leads to a dissatisfied customer.

Warranty Requirements. To remain competitive, the reliability goals must support the warranty requirements.

The following criteria are used to establish goals:

Attainable. Establishing reliability goals involves making those goals realistic for the given technology constraints. However, these goals should still be a "stretch"; that is, they should be challenging. Reliability goals and the time allotted to accomplish these goals must be carefully correlated. Trade-offs may be necessary to match completion dates to the level of achievable reliability.

Resources Available. Resources must be available at the time they are required. It is best to stay with what can be realistically supported.

Equipment Perspective. Approach setting goals from an overall equipment perspective. Attempt to optimize reliability, cost, time to market, resources, and maintainability while staying within the overall equipment specifications and design constraints.

Measurable. It is difficult to determine if goals are being met if those goals are not defined quantitatively.


Even though safety and maintainability goals are not addressed in these guidelines, some mention of these goals is necessary because of the key interactive role they play with reliability. Designers should identify safety, maintainability, and reliability goals at the same time. Since maintainability is built into equipment, it is primarily addressed in the concept and feasibility phase and the design phase. Maintainability is achieved by carefully considering and balancing numerous factors such as the basic physical configuration and layout of the design, test provisions for quick fault location, interchangeability of replaceable parts, adequate maintenance procedures, and skill levels of technicians. As with reliability, pertinent data is collected to estimate the maintainability measures and to ensure that the maintainability goals are being achieved.

It is important to remember that setting reliability goals is not a one-time affair; it is a continuous process of gradual improvements that are made toward the goals over time.

Applicable Tools
AT4 Competitive Benchmarking
AT11 Quality Function Deployment (QFD)

References
Ireson, W., C. Coombs, Jr., editors, Handbook of Reliability Engineering and Management, New York: McGraw-Hill, 1988, pp. 2.3-2.8.


Engineering Activity E2: Reliability Program Plan


The purpose of the reliability program plan is to identify and tie together all of the activities required for the reliability improvement process. A company typically has several reliability plans. The general company plan covers all equipment, while the equipment-specific plans are developed for individual equipment. These plans include six elements:
• Objectives
• Constraints, limitations, and requirements (that exist at the time that the plan is written)
• Basic assumptions
• Activities required to meet the objectives
• Resources necessary to perform the plan
• Schedule showing when the activities will start and be completed

Every plan needs clearly stated objectives that can be easily understood. Constraints, limitations, or requirements may exist due to physical, monetary, or manpower limitations; identifying these early helps address problems before they arise. The basic assumptions stated in the reliability program plan will differ from company to company and equipment to equipment; for example, the reliability program for equipment A can be implemented within 6 months, whereas equipment B requires 10 months. The heart of the reliability plan is the identification of the activities required to meet the objectives. The plan also requires a statement of the resources required to implement the plan, along with an explanation of why those resources are required, if it is not obvious. If the resources are not available, then the plan should show how they will be obtained. Finally, a schedule is developed showing when the selected activities will start and be completed.

General Company Plan. An overall reliability plan takes the company's size and available resources into account. The plan addresses the following issues:
• The company's reliability policy
• Identification of reliability champions
• The overall strategy
• How reliability skills will be acquired within the company
• A description of organizational activities

Equipment-Specific Plan. Each piece of equipment requires a reliability plan based on the equipment's life cycle phase, reliability goals, schedule, and available resources. The loftier the goal, the more activities, tools, and resources are required to meet that goal. Also, the shorter the schedule, the more resources must be applied over the scheduled period. The plan identifies the specific reliability activities and tools to be used for particular equipment, and who (or what department) is responsible for performing them.

References
MIL-STD-785B, Reliability Program For Systems And Equipment Development And Production, Task 101, 15 September 1980.


Engineering Activity E3: Functional Block Diagrams


During this activity, the equipment is depicted by clear, abbreviated schematics that show the major subsystems, components, or parts of the equipment and the critical support systems such as power, actuation signals, control, and cooling. A functional block diagram is used to show how the equipment subsystems, components, and parts interact with one another and with the support systems. A block diagram provides a clear picture of how the equipment functions and can be used to create a reliability model. It also helps create an understanding of what makes the equipment work and what causes it to fail. If alternative concepts or designs have been created, each one should have its own functional block diagram.
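[Figure: reliability improvement process steps. Step 1. Establish Goals/Requirements; Step 2. Reliability Engineering/Improvements; Step 3. Conduct Evaluation; Step 4. Are Goals/Requirements Met? (Yes: Go/No Go Decision on Next Phase; No: Step 5. Identify Problems & Root Causes)]

[Figure: functional block diagram of a hypothetical personal computer]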

An example of a functional block diagram is given in the figure above. The functional block diagram represents a hypothetical personal computer (PC). As can be seen from the diagram, the PC has two hard disk drives (HD1, HD2) and two floppy drives (FD1, FD2). The keyboard, I/O board, RAM card, disk controller card, and video control card all derive their power from the power supply via the motherboard. The CRT (monitor) is a separate unit with its own power supply.

The need for schematics and flow diagrams is well recognized, but typically these are too complex to use directly. It is important to construct diagrams that depict clearly and simply how the equipment functions. Subsystems, components, support systems, and human actions that lead to equipment failure should be obvious when the functional block diagram is constructed properly.
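The PC example above can also be captured as a small data structure, which makes it easy to ask what stops working when a given block fails. The following is only a minimal sketch: the block names, the `DEPENDS_ON` table, and the `affected_by` helper are hypothetical, and the links from the drives to the disk controller card are an assumption that the description does not state.

```python
# Hypothetical encoding of the PC functional block diagram described above.
# Each key is a block; each value lists the blocks it needs in order to function.
# The drive-to-controller links are an assumption; the text does not state them.
DEPENDS_ON = {
    "power_supply": [],
    "crt_power_supply": [],
    "motherboard": ["power_supply"],
    "keyboard": ["motherboard"],
    "io_board": ["motherboard"],
    "ram_card": ["motherboard"],
    "disk_controller_card": ["motherboard"],
    "video_control_card": ["motherboard"],
    "HD1": ["disk_controller_card"],
    "HD2": ["disk_controller_card"],
    "FD1": ["disk_controller_card"],
    "FD2": ["disk_controller_card"],
    "crt": ["crt_power_supply"],
}


def affected_by(failed_block, depends_on=DEPENDS_ON):
    """Return every block that stops functioning when failed_block fails."""
    affected = set()
    frontier = [failed_block]
    while frontier:
        current = frontier.pop()
        for block, needs in depends_on.items():
            if current in needs and block not in affected:
                affected.add(block)
                frontier.append(block)
    return affected


if __name__ == "__main__":
    # The CRT has its own supply, so it is not affected by the main power supply.
    print(sorted(affected_by("power_supply")))
    print(sorted(affected_by("crt_power_supply")))
```

A table like this is deliberately coarse; it records only which blocks rely on which support systems, which is the level of detail the functional block diagram is meant to convey.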


Engineering Activity E4: Equipment Reliability Modeling


Equipment reliability modeling allows one to predict what the reliability of a piece of equipment is or will be; it is particularly useful when the equipment is complex. To be of the most value, the reliability model is created in the early life cycle phases; that is, concept and feasibility, and design. However, the earlier in the equipment's life the reliability model is created, the more challenging it is for the model to realistically predict the equipment's reliability. The reliability of a piece of equipment is known with absolute certainty only after it has been used in the field until it is worn out and its failure history has been faithfully recorded. Even though one cannot predict the equipment reliability with absolute certainty, a reliability model can predict the equipment's reliability with enough confidence that changes to the equipment that lead to improved reliability can be proposed.

No matter what phase of the life cycle a piece of equipment is in, reliability modeling has numerous benefits; these include:
• Improving understanding of the equipment
• Allowing an early evaluation of design alternatives
• Identifying critical subsystems, components, and parts and their interactions
• Guiding resource allocations to portions of the equipment most needing improvement

In order to design and manufacture reliable equipment, it is important to understand how the various subsystems, components, and parts fit together; how they affect one another during normal operation; and what the reliability of the subsystems, components, and parts must be to achieve the desired equipment reliability. Understanding these issues allows one to predict what the future performance of an equipment design will be. Thus, design alternatives can be modeled and the best designs chosen.
Reliability modeling can consider subsystems, components, parts, materials, and human errors (that is, anything that affects the equipment's reliability) to determine what the reliability of a piece of equipment will be. Target areas for improvement can then be found. Reliability modeling also allows one to consider the natural variation inherent in equipment. This variation is due to differences in how operators use the equipment, the environment in which the equipment is used, differences in how the equipment was manufactured, and so forth.

The reliability modeling described here is not concerned with time degradation; that is, the equipment is neither being broken in nor at the point of wear-out. This idea can be explained more clearly by discussing the typical "bathtub" curve seen in many reliability texts. The following figure shows a typical bathtub curve, also known as a failure rate curve, over the life of a part, component, subsystem, or the equipment. The early part of the curve, where the failure rate is decreasing, is often called the burn-in or break-in stage. The later part of the curve, where the failure rate is increasing, is typically called the wear-out stage. As was mentioned, the reliability model discussed in these guidelines assumes that the components, parts, subsystems, and the equipment itself are in the constant failure rate portion of this curve. This allows one to assume that all components, parts, subsystems, and the equipment have a constant failure rate; that is, the failure rate does not change over time. The model also assumes that the components, parts, and subsystems being modeled are repairable and that the repaired items are as good as new.

[Figure: bathtub (failure rate) curve; vertical axis: Failure Rate, horizontal axis: Time]
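Under the constant failure rate assumption described above, the standard exponential model applies: the probability of operating for a time t without failure is exp(-λt), and the mean time between failures is 1/λ. The sketch below illustrates this; the function names and the example failure rate are hypothetical and are not taken from these guidelines.

```python
import math


def reliability(failure_rate, hours):
    """Probability of no failure over `hours` at a constant failure rate."""
    return math.exp(-failure_rate * hours)


def mtbf(failure_rate):
    """Mean time between failures for a constant failure rate."""
    return 1.0 / failure_rate


if __name__ == "__main__":
    rate = 0.002  # hypothetical value, in failures per operating hour
    print("MTBF (hours):", mtbf(rate))
    print("Chance of a failure-free 168-hour week:", reliability(rate, 168))
```

These formulas hold only in the flat portion of the bathtub curve; they do not describe the burn-in or wear-out stages.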

If a block diagram is used to model the equipment, the equipment model will consist of series blocks (when the failure of one subsystem, component, or part causes the equipment to fail), parallel blocks (when every subsystem, component, or part must fail for the equipment to fail), or a combination of these. The following paragraphs discuss how to create a reliability model.

The first step involves clearly defining what is meant by equipment failure. For example, one might define failure as any occurrence that causes the equipment to be down for more than a given period of time (e.g., 6 minutes) or any occurrence that results in wafer damage. This step also involves identifying all of the failure mechanisms that lead to the defined equipment failure. If, for example, equipment failure is defined as a down time of 6 minutes or more, all failure mechanisms that cause the equipment to be down at least 6 minutes are included in the reliability model. If equipment failure is defined as any occurrence that results in wafer damage, all failure mechanisms that result in wafer scrap are identified. Field data is often useful in defining what is meant by equipment failure and in identifying mechanisms that lead to failure.

The next step involves creating the reliability model. Fault trees and reliability block diagrams are the tools that are used to do this. RAMP is a software package that has been created to help in the documentation and analysis of a reliability model. It uses reliability block diagrams. RAMP allows one to create the reliability model on a personal computer, provides a means of documenting failures, and performs the Boolean algebra necessary to solve the model.

A reasonable starting point in the creation of the model is to initially create a coarse model made up of the equipment's major subsystems.
If a block diagram is used as the modeling tool, the model would consist of approximately 10 to 20 major subsystems; that is, in the model, one block would represent each major subsystem. Later versions of the model add detail only to those subsystems that are identified as being important; that is, only those subsystems that cause the equipment to fail are broken down into components and parts. Adding detail to unimportant subsystems for the sake of completeness simply increases the modeling effort without adding to the usefulness of the results. Careful examination of field data helps determine the appropriate level of detail for the model. In general, the model should not be more detailed than the available information will support. If the modeling effort is for equipment not yet in the field, field data for a previous generation of equipment can yield valuable insights into improvements in the next generation.

Once the model is completed, it can be transformed into an equation for quantification, which is discussed in engineering activity E6. The equipment reliability is calculated using the failure data collected for the subsystems, components, and parts.

The following tips will make the modeling effort easier.
1. Think carefully about the subsystem divisions for the equipment being modeled. The choice of subsystems will vary from company to company and equipment to equipment; however, it is best to base the choice on functional considerations, not on parts count methods. Choose subsystems based on the functions they perform. Group components and parts under the subsystems that make functional sense.
2. Avoid parts list modeling. That is, do not represent the equipment as a collection of parts. It is important to include failure modes such as operator errors, software failures, and failures that are the result of drifting out of specification. In addition, valuable insight into the equipment is gained by thinking about failure modes and interactions between different subsystems. Parts list modeling does not encourage this kind of thinking.
3. It is best to begin by modeling an existing piece of equipment. Good reliability modeling practice comes through experience. If the first model created is for equipment that is well understood, the model can be validated in terms of the failure rate and failure mechanisms. Also, introduction of a reliability modeling program will almost always cause the data collection and data management procedures to be revised. It is generally better to sort out data problems with an existing system than with a new system.
4. No matter what phase of the life cycle the equipment is in, it is best to keep the model as simple as possible. As the model becomes more complicated, it becomes more difficult to interpret.
5. As the reliability process proceeds, continually change, expand, and improve the model. This allows the model to be used throughout the life of the equipment.

Applicable Tools
AT7 Fault Tree Analysis (FTA)
AT12 Reliability, Analysis and Modeling Program (RAMP) Software
AT15 Reliability Block Diagram Modeling (RBD)
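The first modeling step in this activity, defining equipment failure and keeping only the mechanisms that meet that definition, can be sketched against a field log. This is an illustration only: the `DownEvent` record, its field names, and the sample mechanisms are hypothetical, and combining the downtime criterion with the wafer damage criterion in a single test is an assumption made for the example (the text presents them as alternative definitions).

```python
from dataclasses import dataclass


@dataclass
class DownEvent:
    """One hypothetical field-log entry; the field names are illustrative."""
    mechanism: str
    downtime_minutes: float
    wafer_damaged: bool


def is_equipment_failure(event, min_downtime_minutes=6.0):
    """Apply a failure definition: long enough downtime, or any wafer damage."""
    return event.downtime_minutes >= min_downtime_minutes or event.wafer_damaged


def mechanisms_to_model(events, min_downtime_minutes=6.0):
    """Return the distinct mechanisms that meet the failure definition."""
    return sorted({e.mechanism for e in events
                   if is_equipment_failure(e, min_downtime_minutes)})


if __name__ == "__main__":
    log = [
        DownEvent("robot jam", 12, False),
        DownEvent("operator pause", 2, False),
        DownEvent("chuck scratch", 1, True),
    ]
    print(mechanisms_to_model(log))
```

Making the threshold a parameter keeps the failure definition explicit, so the same log can be re-screened if the definition changes.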

References
Campbell, J.R., Iman, R., Longsine, D., Thompson, B., A Tutorial on Reliability Modeling Using RAMP, Albuquerque, NM: SETEC, Sandia National Laboratories, SETEC91-030, 1991.
MIL-HDBK-217E, Reliability Prediction of Electronic Equipment, Griffiss AFB, NY: Rome Air Development Center, October 1986.


Engineering Activity E5: Reliability Goal Allocation


The reliability goal allocation activity involves allocating or apportioning the equipment reliability goals into individual subsystem, component, and part goals. The advantages of allocating reliability goals include:
• Persuading equipment design and development personnel to understand the relationships between parts, components, subsystems, and the overall equipment reliability. This leads to an understanding of the basic reliability problems inherent in a design.
• Persuading the design engineer to consider reliability equally with other equipment parameters such as performance, cost, and weight characteristics.
• Ensuring adequate design, manufacturing methods, and testing procedures.
• Giving design engineers numerical goals for each portion of the design.
• Making it possible to specify a set of goals to a sub-tier supplier who is producing a subsystem, component, or part of the equipment.

When starting the reliability allocation process, use the overall equipment goals and the equipment reliability model along with a basic reliability allocation model. There are several basic models available, such as the equal-apportionment technique, the ARINC apportionment technique, and the AGREE allocation method; these are all described in Reliability in Engineering Design, by Kapur and Lamberson. When allocating reliability goals, combine engineering judgment with knowledge of:
• How the various subsystems, components, and parts are related
• The reliability of similar subsystems, components, or parts of previous equipment
• The complexity of the subsystems, components, or parts
• The importance of the subsystems, components, or parts to the equipment reliability

In the early life cycle phases, there is a lack of detailed information; thus, the allocation process is approximate. However, a tentative reliability allocation can be done to guide the design team.
If the allocated goals for a piece of equipment cannot be met using the current technology, or if the goals can be met too easily, the equipment is modified and the allocations reassigned. This process is repeated until the allocations meet the equipment requirements.

It is important to mention here that reliability is only one of many design attributes that need to be allocated. The allocation process is repeated for other attributes such as safety, maintainability, and ease of use. A conscious decision is made in the trade-off between reliability and other attributes.

Applicable Tools
AT7 Fault Tree Analysis (FTA)
AT15 Reliability Block Diagram Modeling (RBD)
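Two of the basic allocation models named in this activity can be sketched in a few lines. The sketch assumes subsystems in series with constant failure rates: equal apportionment gives each of n subsystems the nth root of the equipment reliability goal, and ARINC apportionment splits an equipment failure rate goal in proportion to predicted subsystem failure rates. The function names, subsystem names, and numbers are hypothetical; Kapur and Lamberson give the full treatment, including the AGREE method, which is not shown here.

```python
def equal_apportionment(system_reliability_goal, n_subsystems):
    """Equal apportionment: each of n series subsystems gets R_goal ** (1/n)."""
    return system_reliability_goal ** (1.0 / n_subsystems)


def arinc_apportionment(system_failure_rate_goal, predicted_rates):
    """ARINC apportionment: split the failure rate goal in proportion to
    each subsystem's predicted (or historical) failure rate."""
    total = sum(predicted_rates.values())
    return {
        name: system_failure_rate_goal * rate / total
        for name, rate in predicted_rates.items()
    }


if __name__ == "__main__":
    # Hypothetical predicted failure rates, in failures per hour.
    predicted = {"wafer_handler": 0.004, "vacuum": 0.003, "controller": 0.001}
    print(equal_apportionment(0.90, len(predicted)))
    print(arinc_apportionment(0.004, predicted))
```

Either result is only a starting point; as described above, the allocations are then adjusted using engineering judgment about complexity and importance.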


References
Ireson, W., C. Coombs, Jr., editors, Handbook of Reliability Engineering and Management, NY, McGraw-Hill, 1988, pp. 18.34-18.42.
Juran, J., F. Gryna, editors, Juran's Quality Control Handbook, Fourth edition, NY, McGraw-Hill, 1988, pp. 13.21-13.22.
Kapur, K., L. Lamberson, Reliability in Engineering Design, NY, John Wiley and Sons, 1977, pp. 405-422.
Lloyd, D., M. Lipow, Reliability: Management, Methods, and Mathematics, Second edition, Milwaukee, WI, The American Society for Quality Control, 1984, pp. 25-27, 267-270.
O'Connor, P., Practical Reliability Engineering, Third edition, NY, John Wiley and Sons, 1991, p. 136.


Engineering Activity E6: Equipment Reliability Quantification


In this activity, the equipment reliability model developed using a reliability block diagram or a fault tree is transformed into a simple equation which is then used to quantify the equipment failure rate. The reliability block diagram modeling description given in AT15 includes general information on how to write the Boolean equation for a block diagram. Boolean equations are used to quantify the block diagram; that is, take the block diagram created for a piece of equipment and translate it into a failure rate for that equipment, based on the individual failure rates of the subsystems, components, and parts that make up that equipment. The reliability, analysis and modeling program (RAMP) software has been developed specifically to solve reliability block diagram models. The fault tree analysis description given in AT7 also includes general information on how to translate the model created for the equipment into a Boolean equation, which can then be quantified into the equipment failure rate.

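[Figure: reliability block diagram with blocks A and B in series, followed by blocks C and D in parallel, followed by block E in series]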

Equipment Failure = A + B + [ C * D ] + E

Applicable Tools AT7 Fault Tree Analysis (FTA) AT12 Reliability, Analysis and Modeling Program (RAMP) Software AT15 Reliability Block Diagram Modeling (RBD)
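The example equation Equipment Failure = A + B + [ C * D ] + E can be quantified once each block has a failure probability. In the sketch below, "+" is read as a Boolean OR (series blocks) and "*" as a Boolean AND (parallel blocks). The blocks are assumed to fail independently, each block's probability is derived from a constant failure rate over a chosen interval, and repair within the interval is ignored; these assumptions, the function names, and the numbers are illustrative and are not taken from RAMP or from these guidelines.

```python
import math


def p_fail(failure_rate, hours):
    """Probability that a block fails within `hours` at a constant failure rate."""
    return 1.0 - math.exp(-failure_rate * hours)


def equipment_failure_probability(p):
    """Quantify Equipment Failure = A + B + [C * D] + E for independent blocks.

    `p` maps each block name to its failure probability over the interval.
    """
    p_cd = p["C"] * p["D"]  # parallel pair: both C and D must fail
    survive = (1 - p["A"]) * (1 - p["B"]) * (1 - p_cd) * (1 - p["E"])
    return 1.0 - survive  # series: any one failed term fails the equipment


if __name__ == "__main__":
    hours = 100.0
    # Hypothetical failure rates, in failures per hour.
    rates = {"A": 1e-4, "B": 2e-4, "C": 1e-3, "D": 1e-3, "E": 5e-5}
    probs = {name: p_fail(rate, hours) for name, rate in rates.items()}
    print(equipment_failure_probability(probs))
```

Because C and D are redundant, their contribution is the product of two small probabilities, which is why the parallel pair adds far less to the equipment failure probability than a single series block with the same failure rate.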


Engineering Activity E7: Design Reviews


Conducting a review of a proposed design is a way of evaluating a design and providing some assurance that a new design will not create problems as the equipment proceeds through its life cycle phases. There is generally not a separate design review for reliability. In fact, reliability should be included as an integral part of the regular design review process, which consists of a sequence of reviews at appropriate points in the equipment's life. Design reviews provide an opportunity to review the design, make decisions, and establish action plans. They also ensure that:
• The design meets all performance requirements
• The design has been studied to identify possible weaknesses
• Alternative designs and components have been considered
• The design can be manufactured at a low cost
• The design allows easy, low-cost field maintenance

Designers, reliability engineers, manufacturing engineers, field service, management, and other appropriate personnel participate in the reviews. The basis of the design review is the design as it exists at the time of the review. The purposes of reliability-oriented design reviews include:
• Verifying the appropriateness of the reliability model
• Detecting potential design weaknesses and any other condition that could degrade equipment reliability
• Verifying fault detection and diagnostic capabilities
• Determining equipment recovery strategies
• Verifying the quality of component and equipment reliability data
• Developing a parts derating strategy
• Verifying reliability qualification test results
• Evaluating electrical, mechanical, and thermal aspects of the design
• Determining the effects of reliability and maintainability engineering on the design
• Determining the extent to which software affects the equipment reliability
• Reviewing the status of previous review actions
• Using the lessons learned from previous generation equipment
• Deciding what trade-offs should be made between process, manufacturing, and reliability
To accomplish the purpose of the design review, reviews must occur during all life cycle phases and numerous times during a phase. The reviews may be formal or informal. They may take place within a single group, such as design, or include a cross-functional group of individuals from areas such as design, manufacturing, and field service; they may even include management. Generally, the reviews that include management are those that involve critical decisions.

Ingredients for a successful design review include:
• An emphasis on constructive input to designers, instead of criticism. The purpose of a review is not to challenge the work of a designer, but to anticipate weak areas in a design and eliminate them as early in the life cycle as possible.
• Avoiding the creation of an environment where the designer feels threatened. The designer listens to the results of the review and, along with line management, has the final decision on the design.
• Creating a design review team from a variety of areas. These areas may include manufacturing, field service, reliability and quality engineering, procurement, materials engineering, shipping, marketing, and design engineering personnel who are not directly associated with the design under review. Customer involvement in a post-design review meeting in which the program is reviewed may yield insight into what the customer values in the equipment.
• Adequate planning for and emphasis on design review meetings. A formal agenda and advance documentation are distributed.
• Focusing on the unproven and untried features of a design.
• Sufficient structure in the design review process. Identified design weaknesses are documented and provisions are made for their elimination. Subsequent review meetings include a discussion of these weaknesses.
• A realization that the design review may uncover areas of conflict between departments.
• Management support. Management is responsible for emphasizing the importance of a carefully planned design.

References
Everett, W., et al., Reliability by Design: A Guide to Reliability Management, Issue 1, Indianapolis, IN, AT&T Bell Laboratories, November 1990, pp. 55-56.
Juran, J., F. Gryna, Juran's Quality Control Handbook, Fourth edition, NY, McGraw-Hill, 1988, pp. 13.7-13.11, 16.5-16.6.
Lloyd, D., M. Lipow, Reliability: Management, Methods, and Mathematics, Second edition, Milwaukee, WI, The American Society for Quality Control, 1991, pp. 28-30.
O'Connor, P., Practical Reliability Engineering, Third edition, NY, John Wiley and Sons, 1991, pp. 160-162.


Engineering Activity E8: Sensitivity Analysis


This activity discusses two types of analysis: uncertainty and sensitivity. Uncertainty analysis provides a means of more accurately representing a line of equipment. Sensitivity analysis allows one to simulate changes to subsystems, components, and parts and determine the impact of those changes on the equipment.

Uncertainty is a means of addressing variability. Suppose that one tracks the failure history of a piece of equipment at a particular customer's location. By the time the equipment is worn out, one has an accurate MTBF value for that unit. While this information is useful, it is unlikely that an identical unit will have the same MTBF, even if that unit was manufactured at the same factory by the same personnel and used at the same customer location. Uncertainty analysis allows one to assign a range of values to a subsystem, component, or part failure rate. The range is then propagated through the reliability model and yields a range within which the equipment MTBF will fall. This provides a more accurate representation of the equipment line.

Uncertainty analysis is also useful in pointing out those subsystems, components, and parts that require more accurate data. If, for example, the solution of the reliability model highlights a component whose MTBF ranges from 10 hours to 1,000 hours, one clearly cannot predict with any confidence what the reliability performance of that component will be from unit to unit. The failure rate range may be large because no field data are available and the design team does not have a good idea of what the failure rate will be. If this is the case, the team needs to find a way to get more accurate information on this component. Alternatively, the component failure rate may truly vary over this range, in which case the team needs to address the unit-to-unit variability. Once these sources of variability are identified, they can be reduced.
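The propagation of failure-rate ranges described above can be illustrated with a small Monte Carlo sketch. It is not the algorithm used by any particular tool. It assumes a simple series model (the equipment fails when any component fails), constant failure rates, and uniform sampling within each range. The component names and MTBF ranges are hypothetical.

```python
import random

# Hypothetical components with (low, high) MTBF ranges in hours.
MTBF_RANGES = {
    "robot_arm": (800.0, 1200.0),
    "vacuum_pump": (10.0, 1000.0),   # wide range: little field data available
    "controller": (4000.0, 6000.0),
}

def sample_equipment_mtbf(ranges, rng):
    """Draw one MTBF per component and combine them as a series system."""
    total_rate = 0.0
    for low_hours, high_hours in ranges.values():
        mtbf = rng.uniform(low_hours, high_hours)
        total_rate += 1.0 / mtbf     # constant failure rate = 1 / MTBF
    return 1.0 / total_rate

def mtbf_interval(ranges, trials=10000, seed=1):
    """Return the 5th percentile, median, and 95th percentile of equipment MTBF."""
    rng = random.Random(seed)
    samples = sorted(sample_equipment_mtbf(ranges, rng) for _ in range(trials))

    def pick(quantile):
        return samples[int(quantile * (trials - 1))]

    return pick(0.05), pick(0.50), pick(0.95)

low, median, high = mtbf_interval(MTBF_RANGES)
print(f"equipment MTBF: {low:.0f} to {high:.0f} h (median {median:.0f} h)")
```

In this sketch the wide range assigned to the hypothetical vacuum pump dominates the spread of the result, which is the kind of item the text identifies as needing more accurate data.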
The reliability model has already been created for the equipment in engineering activity E4. The solution of the model has highlighted those subsystems, components, and parts that are the largest contributors to the unreliability of the equipment. The design team can now concentrate on improving these items.

Sensitivity analysis uses the reliability model to simulate changes to the equipment. Changes could range from modifying individual component failure rates to completely redesigning a subsystem. Component failure rate changes could be due to a preventive maintenance procedure that will significantly reduce the likelihood of that failure occurring, or to changing suppliers of the component to one who has a more reliable product. If a subsystem is redesigned, a reliability model can be created for it and used to replace the previous subsystem model. The new model is then solved and the effect of the change on the overall equipment reliability is predicted. This allows the design team to make rational decisions on which changes to the equipment yield the greatest reliability improvement for the least expenditure of time and money.

A software tool called RAMP has been designed to address both uncertainty and sensitivity. The database associated with RAMP allows one to input a range of values for a failure rate. A sensitivity analysis is easy to perform using RAMP and yields results quickly.

Applicable Tools
AT12 Reliability, Analysis and Modeling Program (RAMP) Software
AT19 Cost of Ownership Calculations
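The sensitivity analysis described in this activity can be sketched in a few lines. The sketch assumes a series reliability model with constant failure rates, which is a simplification chosen for illustration and not a description of how RAMP works. The function names, component names, and MTBF values are hypothetical.

```python
def series_mtbf(component_mtbf):
    """Equipment MTBF for a series model with constant failure rates."""
    return 1.0 / sum(1.0 / hours for hours in component_mtbf.values())

def sensitivity(component_mtbf, improvement=2.0):
    """Rank components by the equipment MTBF gained when one component's
    MTBF is multiplied by `improvement` and all others are left unchanged."""
    baseline = series_mtbf(component_mtbf)
    gains = {}
    for name, hours in component_mtbf.items():
        trial = dict(component_mtbf)
        trial[name] = hours * improvement
        gains[name] = series_mtbf(trial) - baseline
    ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
    return baseline, ranked

# Hypothetical baseline MTBF values in hours.
baseline, ranked = sensitivity(
    {"robot_arm": 1000.0, "vacuum_pump": 500.0, "controller": 5000.0}
)
print(f"baseline equipment MTBF: {baseline:.1f} h")
for name, gain in ranked:
    print(f"doubling {name} MTBF adds {gain:.1f} h")
```

Ranking the candidate changes by their effect on equipment MTBF gives the design team a starting point for weighing each change against its cost.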


References
Campbell, J., R. Iman, D. Longsine, B. Thompson, A Tutorial on Reliability Modeling Using RAMP, Albuquerque, NM, SETEC, Sandia National Laboratories, SETEC91-030, 1991, pp. 43-50.


Engineering Activity E9: Design for Reliability Practices


Critical items of a design can be enhanced by applying reliability practices such as derating; design simplification; redundancy; procedural changes; process control; design for maintainability; deployment considerations; and use of preferred and proven processes, components, and materials. These practices are briefly discussed in the following paragraphs.

Derating. Derating means that components are operated at less severe stresses than their ratings specify; e.g., a capacitor rated at 300 V is used in a 200 V application. Components selected for derating are typically critical to the reliability of the equipment and are selected with ratings significantly higher than the applied voltage, current, temperature, environment, and power dissipation stresses. Derating enhances reliability by:
• Reducing the likelihood that marginal components will fail during the life of the equipment
• Reducing the effects of parameter variations
• Reducing long-term drift in parameter values
• Providing an allowance for uncertainty in stress calculations
• Providing some protection against transient stresses, such as voltage spikes

Design Simplification. Reducing the complexity of equipment is one way of improving its reliability as well as its manufacturability and maintainability. The following are considered when simplifying a design:
• Identify components that can be eliminated or combined with other components in the equipment.
• Ensure that the simplification does not impose higher stresses on other components in the equipment.
• Do not replace components known to be reliable with components that perform complex or multiple functions unless the latter's reliability is known.
G. Boothroyd and P. Dewhurst have developed a process that can be used to simplify a design. This process involves reducing the number of individual components and ensuring that the remaining components are easy to manufacture and assemble.

Redundancy. In contrast to design simplification, redundancy increases the number of components. However, the use of redundancy is one of the most effective ways to improve reliability. The following are different types of redundancy:
• Pure Parallel Redundancy. In a pure parallel system, more than one component or subsystem can perform the same function. If two components are in a pure parallel system, failure of the equipment due to these components requires both components to fail.
• Pure Parallel with Partial Redundancy. In this system, the equipment works if some of the components work. For example, the system consists of four independent components, and the equipment operates successfully when any two of the four are working.

• Standby with Changeover Redundancy. This system has one component operating and one or more identical components in standby. When one component fails, the next component takes over. The assumption here is that no repairs are carried out on failed components until all of the components have failed; that is, the first component and all of the standby components have failed.
• Standby with Several Operating Components. In this system, there are N operating components and n components in standby. For example, the system consists of 5 identical components, 3 of which must work for the system to be successful. If one of the components fails, another takes its place. This continues until there are none left to take over for a failed component; then repairs occur.
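The redundancy types listed above can be compared numerically with standard textbook formulas. The following is a minimal sketch under stated assumptions: components are identical and independent, failure rates are constant, changeover to a standby unit is perfect, and units do not fail while in standby. The function names and the numbers in the usage lines are hypothetical.

```python
from math import comb, exp, factorial

def parallel(r, n):
    """Pure parallel: the group fails only if all n components fail."""
    return 1.0 - (1.0 - r) ** n

def k_out_of_n(r, k, n):
    """Partial redundancy: at least k of n independent components must work."""
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))

def standby(failure_rate, hours, spares, operating=1):
    """Standby redundancy: `operating` units must all run, with `spares`
    identical units waiting to take over and no repair during the period.
    Failures arrive at rate operating * failure_rate, and the system
    survives as long as no more than `spares` failures have occurred."""
    x = operating * failure_rate * hours
    return exp(-x) * sum(x**i / factorial(i) for i in range(spares + 1))

# Hypothetical values: component reliability 0.9, failure rate 0.001 per hour.
print(f"two in parallel:        {parallel(0.9, 2):.4f}")
print(f"any two of four:        {k_out_of_n(0.9, 2, 4):.4f}")
print(f"one unit, one spare:    {standby(0.001, 100.0, spares=1):.4f}")
print(f"three units, two spare: {standby(0.001, 100.0, spares=2, operating=3):.4f}")
```

The last call corresponds to the example in the text of 5 identical components, 3 of which must work.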

Procedural changes. Procedural changes involve creating a new procedure or changing an existing one to prevent reliability degradation. Examples include improving procedures for handling electrostatic-sensitive parts or for aligning dimensionally critical components.

Process control. Process control involves modifying a manufacturing process that is degrading reliability. The idea, simply stated, is that if the manufacturing process is understood and controlled, the equipment will come out all right. J. Tunner discusses five basic steps which, if followed, lead to total manufacturing process control:
1. Clearly defining what is required of the equipment
2. Understanding the production process
3. Improving the process so that acceptable equipment is manufactured
4. Controlling and monitoring the process itself
5. Searching out new quality improvement opportunities
These steps are applicable to any manufacturing operation. There are numerous tools that are useful for process control: Cause & Effect (Fishbone) Diagrams, Design of Experiments, Pareto Diagrams, Process Capability, and Taguchi Methodology. It is important to note that the success of these steps depends on taking a team approach; that is, operators, engineers, scientists, supervisors, and other key persons throughout the company are involved in all steps.

Design for Maintainability. Equipment maintainability is defined as a measure of the ease and rapidity with which equipment can be restored to or maintained in an operational status. It is important that maintained equipment is designed so that maintenance tasks are easily performed and the skill level required for diagnosing, repairing, and scheduling maintenance is not too high. Desirable features include:
• Making access and handling easy
• Using standard tools and equipment
• Eliminating the need for delicate adjustments or calibrations
The repairable systems analysis tool is useful here in establishing maintenance policies and in highlighting subsystems, components, and parts that need to be more maintainable. While the designer has no control over the performance of the maintenance people, he or she can directly affect the inherent maintainability of the equipment.


Deployment considerations. Reliability degradation during deployment is typically a result of the interaction between people and the equipment or between the equipment and the environment. Some of these problems can be prevented if appropriate measures are taken. These measures include:
• Documenting deployment procedures
• Training personnel and users
• Testing during installation
• Providing technical assistance
• Identifying and correcting problems
• Establishing equipment change procedures

Improper handling of equipment during delivery and installation can degrade the inherent reliability that has been designed into the equipment. To prevent problems associated with handling the equipment, procedures are created specifically for storage and shipping, installation, and handling and operation. In addition, training installation and maintenance personnel and users in the installation, operation, and maintenance of the equipment can significantly reduce reliability problems.

Specifying a plan for testing during installation will verify that the installed equipment operates properly and according to specifications and that the equipment performance has not been degraded as a result of shipping and handling. Providing appropriate technical assistance helps customers solve problems.

It is also important to identify and correct problems that occur during shipping, installation, operation, and maintenance. Problems are reported and recorded, carefully analyzed, and then reported to the design and manufacturing staff to prevent their recurrence.

Equipment change procedures are the methods by which the equipment is changed in the field to meet or enhance the original performance specifications. These procedures are established to assure the customer that any such changes maintain compatibility with existing equipment and do not adversely affect customer requirements.

Use of preferred and proven processes, components, and materials. The reliability of a piece of equipment depends on the reliability of its processes, components, and materials. Concepts and procedures for ensuring process, component, and material reliability include:
• Selecting, specifying, qualifying, and controlling materials and processes
• Qualifying and requalifying components
• Conducting a supplier testing and reliability monitoring program
• Monitoring subcontractors and suppliers
• Screening and derating components and materials

Applicable Tools
AT3 Cause & Effect (Fishbone) Diagram
AT5 Design of Experiments (DOE)
AT9 Pareto Diagram
AT10 Process Capability
AT16 Repairable Systems Analysis
AT17 Taguchi Methodology

References
Arsenault, J., J. Roberts, editors, Reliability & Maintainability of Electronic Systems, Potomac, MD, Computer Science Press, 1980, pp. 280-293, 365-393.
Boothroyd, G., P. Dewhurst, Product Design For Assembly, Wakefield, RI: Boothroyd Dewhurst, Inc.
Burgess, J.A., "Improving Product Reliability," Quality Progress, December 1987, pp. 47-54.
Davidson, J., editor, The Reliability of Mechanical Systems, London, Mechanical Engineering Publications Limited for The Institution of Mechanical Engineers, 1988, pp. 47-57.
O'Connor, P., Practical Reliability Engineering, Third Edition, NY, John Wiley & Sons, 1991, pp. 219-220, 117-125, 328-329.
Skrabec, Q. Jr., "The Transition from 100% Inspection to Process Control," Quality Progress, April 1989, pp. 35-36.
Smith, J. R., "Reliability Analysis By Simulation," 41st Annual Quality Congress Transactions, May 4-6, 1987, pp. 654-662.
Tunner, J., "Total Manufacturing Process Control-The High Road To Product Control," Quality Progress, October 1987, pp. 43-50.
Vanderbei, K., et al., Reliability by Design, Indianapolis, IN: AT&T, 1990, pp. 105-114, 61-71.
MIL-STD-470B, Maintainability Program for System and Equipment, Irvine, CA: Global Engineering Documents, 30 May 1989.


Engineering Activity E10: Preventive Maintenance Program


If a specific component used in a piece of equipment has a reliability value below what is required, one method of circumventing this deficiency is preventive maintenance (PM). PM involves developing a maintenance schedule under which, prior to failure:
• Components that are partially worn out or aged are replaced by new components
• Components that require adjustments or become contaminated are inspected, readjusted, or cleaned, as required

Developing a PM program is advantageous because it:
• Is less expensive than having the equipment down at an undesirable time
• Helps alleviate degradation in equipment performance
• Yields insights into how the next generation of equipment can be improved

Two tools are useful in this activity: repairable systems analysis and user groups. Repairable systems analysis can be used to compare different maintenance policies, predict future numbers of repairs, and highlight areas where preventive maintenance would improve the equipment reliability. User groups are an effective means of maintaining open communication between the equipment supplier and the equipment customer.

Applicable Tools
AT16 Repairable Systems Analysis
AT18 User Groups
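As an illustration of predicting future numbers of repairs, the sketch below uses a power-law process, one common model for repairable systems. The choice of model is an assumption made here and is not prescribed by this guideline. Under it, the expected cumulative number of repairs by age t is scale * t**shape. The function name and parameter values are hypothetical.

```python
def expected_repairs(scale, shape, start_hours, end_hours):
    """Expected number of repairs between two equipment ages under a
    power-law process with cumulative mean scale * t**shape."""
    return scale * (end_hours**shape - start_hours**shape)

# Hypothetical fitted parameters. A shape above 1 means repairs become
# more frequent as the equipment ages.
SCALE, SHAPE = 1e-5, 1.5
HOURS_PER_YEAR = 8760.0

first_year = expected_repairs(SCALE, SHAPE, 0.0, HOURS_PER_YEAR)
second_year = expected_repairs(SCALE, SHAPE, HOURS_PER_YEAR, 2 * HOURS_PER_YEAR)
print(f"expected repairs, year 1: {first_year:.1f}")
print(f"expected repairs, year 2: {second_year:.1f}")
```

A rising repair count from one period to the next is one indication that the item may be a candidate for preventive replacement or adjustment.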

References
MIL-HDBK-338-1A, Electronic Reliability Design Handbook, Oct. 1984, pp. 11-87 to 11-93, 12-47 to 12-49.


Engineering Activity E11: Reliability of Purchased Components


Evaluating the reliability of purchased subsystems, components, and parts allows the customer to choose those that are the best and that meet the reliability needs of the customer's equipment. However, as the article by E. Broeker implies, evaluating reliability is not limited to purchasing various products and testing them to determine which has the best reliability. It also involves building a customer-supplier relationship based on mutual respect, trust, and benefit. Unfortunately, this takes time and requires management support from both the supplier and the customer.

Several methods can be used to evaluate the supplier's products when the customer-supplier relationship is limited to purchasing subsystems, components, and parts:
• Request reliability performance information or data from potential suppliers. Be sure to obtain the basis for any reliability claims
• Include reliability performance requirements as part of the purchase contract
• Perform tests on the products

Developing a good supplier-customer relationship requires planning. The supplier should be regarded as an extension of the customer's process. Sharing information is often necessary if suppliers are to provide products that meet all of the customer requirements. A supplier will not be able to understand a customer's problem without some knowledge of the customer's processes and procedures, and the reverse is also true.

One of the first things the customer does before selecting a supplier is to establish clear, precise requirements for the product to be provided. This provides a basis from which various suppliers can be evaluated. Questions that may be asked include:
• Does the supplier understand what is required of the product?
• Can the supplier meet the schedule?
• What experience should the supplier have in making the product?
• How much will the supplier be involved in developing the customer's equipment?
• How will conformance to requirements be measured?
• What sort of corrective action system will be required?
The answers to these questions help the customer evaluate which supplier can best meet its needs.

It is also important to select suppliers who are capable of conforming to specified requirements. Information on this ability comes from a review of past performance, tests of incoming materials, and on-site evaluations. The on-site evaluation provides the opportunity to assess the supplier's methods of assuring reliability. The supplier may need to implement a reliability program similar to the one the customer is using.

For the customer to produce high-reliability equipment, its suppliers must produce reliable products. This is non-negotiable. Customers and suppliers must agree completely on the required reliability of the product. This involves making sure that the supplier understands exactly what the customer expects, being willing to explain why the supplier's product must meet the expectations, and establishing precise methods for accepting the product.


One of the most important steps in the supplier reliability program is measurement and feedback. Measurements provide a means of determining whether the supplier is meeting the agreed-upon reliability requirements. Feedback gives the supplier the necessary information to improve the product.

Finally, for every product that does not meet the reliability requirements, suppliers are asked what corrective action they will take. They should be able to provide answers to the following questions:
• What caused the product to not meet the reliability requirements?
• What changes need to be made to make the product meet the requirements?
• How will these changes be made foolproof?
• How will the customer know that these changes have been made?

It is the customer's responsibility to ensure that the supplier works to find the root cause of each failure to meet the requirements and takes the necessary action to permanently eliminate the cause. The role of the supplier in improving reliability of the equipment is critical. For the supplier to continuously improve the product's reliability, the customer must demand it.

Applicable Tools
AT6 Environmental Stress Screening (ESS)
AT8 Life Testing
AT18 User Groups

References
Broeker, E., "Build a Better Supplier-Customer Relationship," Quality Progress, September 1989, pp. 67-68.
Juran, J., F. Gryna, editors, Juran's Quality Control Handbook, Fourth edition, NY: McGraw-Hill, 1988, pp. 15.1-15.46, 30.18-30.21.
Klock, J., "How to Manage 3,555 (or Fewer) Suppliers," Quality Progress, June 1990, pp. 43-47.
Richardson, J., "Vendor Quality Assurance in a Process Industry," Quality Progress, November 1984, pp. 60-63.


Engineering Activity E12: Ergonomic Studies


Both reliability and ergonomics are concerned with predicting, measuring, and improving equipment performance. Equipment failures are caused by human errors and equipment malfunctions. Thus, the overall equipment reliability is evaluated from the viewpoint that the equipment consists not only of the equipment and its associated procedures, but also includes the people who use them. One must identify and plan for human reliability factors and their effects on the overall equipment reliability. For example, when the interface between the human and the equipment is complex, the possibility of human error increases, with an accompanying increase in the probability of equipment failure. It is interesting to note that designing in reliability frequently includes detecting and correcting equipment malfunctions, which is a task often assigned to humans. Thus, the equipment performance can be enhanced or degraded, depending on whether or not the malfunction indicators are presented so that they are understood readily. Studying human response to audio and visual stimuli provides valuable guidance in the design of equipment malfunction indicators.

[Figure: examples of visual indicators, labeled Ground Transportation, First Aid, Baggage Claim, Coffee Shop, and Phones]

Ergonomics (or Human Factors Engineering) is a discipline concerned with designing equipment, operations, and work environments to match human capabilities and limitations. Ultimately, everything that one designs has an impact on the human in one way or another. Someone will have to fabricate the equipment, package it, distribute it, unpack it and prepare it for use, operate or use it, service and maintain it, and finally dispose of it. For this reason, designers should be constantly alert to the human factors implications of their proposed design. Keep in mind that the ultimate success of the equipment depends on how well the user performs the tasks associated with it.

The intent of human factors engineering in this document is to focus on human-equipment interface problems and their solutions, wherever or whatever they are. Philosophically, then, human factors engineering looks at a design from the standpoint of user efficiency, or total human-equipment output effectiveness. Inherent in this philosophy are the following objectives:
• To make the user's contribution to the equipment output as efficient as possible so that the basic equipment output is not compromised by human failures.
• To make the combined user-equipment involvement as safe as possible so that neither human nor equipment failures will compromise the user's health or damage the hardware. Inherent in this objective is the avoidance of injury to others and of damage to adjacent hardware.


• To minimize the stress that the equipment imposes on the user as he or she uses, operates, services, or maintains it. This includes such stresses as an undue energy demand, frustration in trying to deal with the equipment at any point in the human-equipment interaction, and worry about whether one is using the equipment properly.
• To maximize the acceptability of the equipment, not only in terms of its attractiveness, but also in terms of giving users the feeling that the equipment allows them to use it efficiently and keep it in good working order with a minimum of effort.

The methods of ergonomics are based on a logical and systematic process of: (1) establishing the proper role of the human with the equipment, (2) designing the human-equipment interfaces to fit the human's capabilities and limitations, (3) evaluating and testing to see that the design does fit these capabilities and limitations, and (4) properly training the human to operate the equipment.

If the equipment uses ergonomically sound human-equipment interfaces, the following items have been accomplished:
• The equipment conforms to populational stereotypes and user expectations
• It is easy to learn how to operate the equipment
• Easily perceived displays and simple controls allow effective and efficient communication between humans and the equipment
• The tasks allocated to humans and the equipment are based on known relative strengths and weaknesses
• Relevant information is provided to the user by the equipment, which avoids reliance on the user's memory
• Effective and efficient performance of equipment functions is facilitated

Whenever practicable, human engineering specialists should be used to help identify and solve human engineering problems. However, this is not always possible. Numerous human factors references are available, but most of them are directed to human factors or human engineering specialists. The reference provided at the end of this activity is directed specifically toward the engineer or designer and provides a number of guidelines to assist designers in doing their own human engineering. Its purpose is to provide a general reference to key human factors questions and human-equipment interface design suggestions in a form that engineers and designers can use with a minimum of searching or study.

References
Woodson, W., Human Factors Design Handbook: Information and Guidelines for the Design of Systems, Facilities, Equipment, and Products for Human Use, New York: McGraw-Hill Book Company, 1981.


Engineering Activity E13: Software Reliability Studies


Software failures impact the ability of a piece of equipment to accomplish its intended function. Therefore, the equipment's reliability model must include appropriate software components; that is, software reliability must be an integral part of equipment reliability concerns. In addition, software must be managed to reduce these concerns within project constraints.

Software reliability is defined as the probability that software will perform its intended function for a specified period of time, in a specified environment. Three key concepts are:
• Failure. A failure is defined as an inability of equipment controlled by software to successfully perform in accordance with its specified requirements. The source of the failure is an identified software fault; the source of the fault is a human error.
• Time. The measure of time includes calendar time, operational time, and computer processor unit (CPU) time. From a user perspective, calendar time and operational time are the most important. From a modeling accuracy perspective, operational time and CPU time are the most important. From a data collection perspective, calendar time is the easiest to collect while CPU time is the most difficult.
• Environment. The environment includes the input domain scenario, profile, and tests being conducted; the parts of the equipment being used during the tests; and the physical environment in which the tests are being conducted. The actual operational environment of the software is of the most interest. That is, the closer the test scenario, equipment configuration, and physical environment are to the actual operating environment, the more accurate the software operational reliability computations will be.

Software reliability management is concerned with meeting the software reliability goals by building the software to satisfy requirements consistent with project constraints such as cost, schedule, resources, and performance.
Software reliability management has two complementary elements: software design reliability and software operational reliability. Software design reliability is concerned with improving the software life cycle processes and the individual products (that is, the plans, specifications, code, and tests) that are the inputs and outputs of those processes. An emphasis is placed on early defect prevention, fault detection, and fault removal. Software operational reliability is concerned with measuring how well the software performs or is predicted to perform its intended function in its operational environment. The emphasis here is on the use of testing and failure data measurements.

A checklist of activities that will improve software design reliability includes:
CHECK 1: Baseline Current Software Processes
• Define the current software development and support processes
CHECK 2: Identify Immediate Areas of Improvement
• Management
• Engineering
• Training


CHECK 3: Train Personnel in Priority Areas
• Software requirements
• Software testing
• Software configuration management
• Software inspections

The primary indicator of process improvement at this time is the use of software inspections to identify and classify defects throughout the software life cycle. The intent is to find as many defects as possible, conduct a root cause analysis to identify how the process might be improved in order to reduce defects in the future, and measure the resources (that is, the time, personnel, and costs) required to correct the defects. Emerging research is attempting to link early defect identification with the software operational reliability failure data.

A checklist of activities that will improve software operational reliability includes:
CHECK 1: Define equipment and software reliability goals
• Probability
• Failure intensity
• Fault density
CHECK 2: Analyze failure data from equipment test/operation

Equipment identification data
Equipment Identification/Version    : [name & version#]
Subsystem Identification/Version    : [three characters]
Location of Equipment               : [site name]
Software Release #/Version          : [release #]
Software Component Version          : [version #]

Test execution data

Test Procedure/Sequence             : [id#]
Test Start Date (Calendar)          : [mo/da/yr]
Test Start Time (Operational)       : [hh:mm:ss]
Test Start Time (Execution)         : [hh:mm:ss]
Failure Time (Execution)            : [hh:mm:ss]
Failure Time (Operational)          : [hh:mm:ss]
Failure Date (Calendar)             : [mo/da/yr]
Failure Classification              : [1,2,3,4,5]
Problem Description                 : [text description]
Log File Data                       : [task logs]

Failure identification data

Failure Identification #            : [id #/ unique]
Failure Node                        : [component id#]
Failure Reference Id#               : [previous failure]
Failure Correction Time (est)       : [work days]
Failure Correction Time (act)       : [work days]
Failure Correction Resources (est)  : [person days]
Failure Correction Resources (act)  : [person days]

Management status data

Classification                      : [fatal|chg|info]
Priority                            : [1-high to 7-low]
Status                              : [open/closed & date]
Disposition                         : [acc/kill/def & date]
Scheduled Release                   : [release #]

CHECK 3: Apply failure classification scheme

Code. Severity: Description of Failure

1. Equipment Abort: A software or firmware problem that results in an equipment abort or crash.
2. Equipment Degraded, No Work-around: A software or firmware problem that severely degrades the equipment and no alternative work-around exists; restarts not acceptable.
3. Equipment Degraded, Work-around: A software or firmware problem that severely degrades the equipment and an alternative work-around exists; process can continue with more operator action; restarts not acceptable.
4. Equipment Not Degraded: An indicated software or firmware problem that does not severely degrade the equipment or any essential function; restart acceptable.
5. Minor Fault: All other minor problems/non-functional faults due to software or firmware problems.

CHECK 4: Apply operational reliability model for the decision process

Poisson process models are typical.
- When will software meet reliability goals?
- When can software release be delivered?
- What level of support will be required?
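As an illustration of how a Poisson-type model can address the first question, the following sketch uses the basic execution time model described by Musa, Iannino, and Okumoto. The choice of this particular model and all parameter values are assumptions made for illustration; the guideline itself states only that Poisson process models are typical.

```python
import math


def failure_intensity(tau, lambda0, nu0):
    """Failure intensity after tau hours of execution time.

    lambda0: initial failure intensity (failures per execution hour)
    nu0:     total failures expected over unlimited execution time
    """
    return lambda0 * math.exp(-lambda0 * tau / nu0)


def additional_execution_time(lambda_present, lambda_goal, lambda0, nu0):
    """Extra execution time needed to move from the present failure
    intensity down to the failure intensity goal."""
    return (nu0 / lambda0) * math.log(lambda_present / lambda_goal)


# Illustrative (assumed) parameters, not taken from any real data set.
lambda0 = 10.0   # failures per execution hour at the start of test
nu0 = 100.0      # total expected failures
goal = 0.5       # failure intensity goal, failures per execution hour

present = failure_intensity(10.0, lambda0, nu0)   # after 10 hours of test
extra_hours = additional_execution_time(present, goal, lambda0, nu0)
print(f"present intensity: {present:.2f} failures/hr")
print(f"additional test time to reach goal: {extra_hours:.1f} hr")
```

In practice the two model parameters would be estimated from the failure times collected under CHECK 2 rather than assumed.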


An example set of data collection, analysis, and reporting process flow steps includes:

STEP 1: Begin test sequence.
STEP 2: Collect equipment and execution data for each failure.
STEP 3: Send collected data to analysis personnel at end of test sequence.
STEP 4: Respond to queries from analysis personnel for more information.
STEP 5: Record failure and management status data.
STEP 6: Update software operational reliability data base.
STEP 7: Generate failure/fault count summary reports.
STEP 8: Update software operational reliability model.
STEP 9: Generate software operational reliability measures, graphs.
STEP 10: Provide summary of results to management on a regular basis.

The references provide more detail about software reliability.
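Steps 5 through 7 can be sketched in a few lines. The record below holds a subset of the data items listed earlier, and the summary counts failures by classification code. The field and function names are hypothetical, and the sample records are invented for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Failure classification codes 1 to 5 from the scheme in CHECK 3."""
    EQUIPMENT_ABORT = 1
    DEGRADED_NO_WORKAROUND = 2
    DEGRADED_WORKAROUND = 3
    NOT_DEGRADED = 4
    MINOR_FAULT = 5


@dataclass
class FailureRecord:
    # Hypothetical field names mirroring a subset of the listed data items.
    failure_id: str
    equipment_id: str
    software_release: str
    failure_time_execution_s: float   # execution time at failure, seconds
    severity: Severity
    description: str = ""


def count_by_severity(records):
    """Return {Severity: count}, the basis of a failure count summary."""
    counts = {level: 0 for level in Severity}
    for record in records:
        counts[record.severity] += 1
    return counts


# Invented sample data.
records = [
    FailureRecord("F-001", "ETCH-A", "2.1", 3600.0, Severity.EQUIPMENT_ABORT),
    FailureRecord("F-002", "ETCH-A", "2.1", 5400.0, Severity.MINOR_FAULT),
    FailureRecord("F-003", "ETCH-A", "2.1", 9000.0, Severity.MINOR_FAULT),
]
summary = count_by_severity(records)
for level, count in summary.items():
    print(f"{level.value}. {level.name}: {count}")
```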

References
Ireson, W., C. Coombs, Jr., editors, Handbook of Reliability Engineering, NY:McGraw-Hill, 1988.
Musa, J., A. Iannino, K. Okumoto, Software Reliability: Measurement, Prediction, Application, NY:McGraw-Hill, 1987.
SETEC, "Software Reliability for SEMI/SEMATECH Companies (Draft)," SEMATECH, SETEC-91-032, December 20, 1991.


Engineering Activity E14: Failure Modes and Effects Analysis (FMEA)


Failure modes and effects analysis (FMEA) is a technique for systematically identifying, analyzing, and documenting the possible failure modes that exist for a piece of equipment and the effects of such failures on the equipment's performance. The term "failure mode" refers to the possible ways in which a component can fail. If the criticality of each failure mode is also analyzed, the analysis is called a failure modes, effects, and criticality analysis (FMECA). The purpose of the criticality analysis is to rank each identified potential failure mode according to the severity of the failure and its probability of occurrence, based on the best available data.

[FMEA worksheet form. Header fields: Equipment; Subsystem; Reference Drawing; FMEA Fault Code #; Date; Sheet; Prepared By. Columns: Subsystem/Module & Function; Potential Failure Mode; Potential Local Effect(s) of Failure; Potential End Effect(s) of Failure; SEV; CR; Potential Cause(s) of Failure; OCC; Current Controls/Fault Detection; Recommended Action(s).]

The complexity of the equipment and the availability of data dictate the FMEA approach that will be used. There are two primary approaches for accomplishing an FMEA. One is the hardware approach, which lists individual hardware components and analyzes their possible failure modes. The other is the functional approach, which recognizes that every component is designed to perform a number of functions that can be classified as outputs. These outputs are listed and their failure modes are analyzed. For complex systems, a combination of the functional and hardware approaches may be used. The FMEA may start at the highest equipment level and proceed down to lower levels (top-down) or start at the lowest level and proceed to the highest equipment level (bottom-up).

The hardware approach is normally used when hardware components can be uniquely identified from schematics, drawings, and other engineering and design data. This approach is generally done bottom-up. The functional approach is normally used when hardware components cannot be uniquely identified or when equipment complexity requires analysis from the highest equipment level down through succeeding levels. This approach is generally done top-down.

An FMEA analysis is used to:
- Ensure that all conceivable failure modes and their effects are understood
- Assist in the identification of design weaknesses
- Select design alternatives
- Select design improvements
- Prioritize corrective actions
- Select test programs
- Assist in troubleshooting existing equipment with operating problems

Since an FMEA concentrates on identifying possible component failures and their effects on the equipment, design deficiencies can be identified and improvements can be made. Identification of potential failures leads to a recommendation for an effective test program. Failure modes can be prioritized according to their frequency so that concentrated effort can be placed on the higher priority components; that is, on those components with the most failures. A limitation of the FMEA analysis is that it considers each failure mode individually. If a single failure does not affect the equipment but two or more failures together do, the FMEA analysis is not well suited to assessing the combined effects of these failures on the equipment. As the equipment proceeds through the life cycle phases, one may conduct a progressively more detailed FMEA analysis.

An FMEA analysis consists of four steps:
1. Establishing the scope of the analysis
2. Collecting data
3. Preparing a components list
4. Preparing the FMEA worksheets

It is important to clearly state the scope of the FMEA analysis. An important part of the scope is clearly identifying the boundaries of the equipment so that no component within that equipment is left out. Also included in the scope is the identification of underlying causes of failures and the possible effects of these failures on the equipment. Information on failure detection, safeguards, frequency of the failure, and the criticality of the effects of the failure may also be included.

The type of information necessary to perform the analysis includes equipment configurations, designs, specifications, and operating procedures. To gather as much information as possible, data may also be collected by interviewing design personnel; operations, testing, and maintenance personnel; component vendors; and outside experts.

A list of all components in the equipment is prepared before examining the potential failure modes of each of those components. Functions, operating conditions (such as temperature, loads, and pressure), and environmental conditions of each component may be included in the components list.

According to C. Sundararajan, the following questions are answered for every component of the equipment:
1. How can the component fail? (There could be more than one mode of failure.)
2. What are the consequences (effects) of the failure?
3. How critical are the consequences?
4. How is the failure detected?
5. What are the safeguards against the failure?

How many of these questions are asked, and which ones, depends on the scope and purpose of the analysis. When these questions are answered, all significant failure modes of the different components are identified, their detection and safeguards are documented, and their effects on the equipment are determined.
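One possible way to carry the worksheet answers into a criticality ranking is sketched below. The 1 to 10 scales, the use of the product of severity and occurrence as a score, and the sample failure modes are all assumptions made for illustration; MIL-STD-1629A should be consulted for the formal criticality procedure.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    mode: str
    severity: int     # assumed scale: 1 (minor) to 10 (most severe)
    occurrence: int   # assumed scale: 1 (rare) to 10 (frequent)

    @property
    def criticality(self) -> int:
        # Assumed scoring rule: severity multiplied by occurrence.
        return self.severity * self.occurrence


def rank_failure_modes(modes):
    """Return the failure modes ordered from most to least critical."""
    return sorted(modes, key=lambda m: m.criticality, reverse=True)


# Invented worksheet rows.
worksheet = [
    FailureMode("door sensor", "false alarm", severity=3, occurrence=5),
    FailureMode("wafer handler arm", "drops wafer", severity=8, occurrence=3),
    FailureMode("gas valve", "sticks open", severity=9, occurrence=2),
]
ranked = rank_failure_modes(worksheet)
for m in ranked:
    print(f"{m.criticality:3d}  {m.component}: {m.mode}")
```

A ranking of this kind only orders individual failure modes; as noted above, it does not capture the combined effect of multiple simultaneous failures.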

Findings of the FMEA analysis are recorded in a tabular format in FMEA worksheets. MIL-STD-1629A describes the worksheets in detail.

References
Sundararajan, C., Guide to Reliability Engineering: Data, Analysis, Applications, Implementation, and Management, NY:Van Nostrand Reinhold, 1991, pp. 146-152.
MIL-HDBK-338-1A, Electronic Reliability Design Handbook, Irvine, CA:Global Engineering Documents, 12 October 1988, pp. 7-100 to 7-121.
MIL-STD-1629A, Procedures for Performing a Failure Mode, Effects, and Criticality Analysis, Washington, DC:Department of Defense, 24 November 1980.


Engineering Activity E15: Equipment Characterization


The characterization of a piece of equipment involves identifying the optimal and extreme operating ranges for:
- Electronic parameters, such as voltage, current, and frequency
- Environmental parameters, such as temperature, humidity, and vibration
- Mechanical adjustments, such as dial settings and clearances
- Faulty inputs, such as gas, water, and electrical
- Operational characteristics, such as wafer handler arm velocity versus number of broken wafers

During characterization, the preferred operating range of an individual component is determined, as well as the impact of this range on the other components in the equipment. Components that do not have a range that interfaces properly, or that are totally incompatible with the other components, are replaced or redesigned. This process is continued until compatible ranges are established for all components.

Applicable Tools
AT5 Design of Experiments (DOE)
AT8 Life Testing
AT10 Process Capability


Engineering Activity E16: Component Failure Analysis


The purpose of failure analysis is to determine what failed and why it failed. The root cause of the failure is determined so that the correct change is made and the failure does not recur. Root causes include operator or maintenance errors, overstressed parts, and factory assembly errors. A useful tool for helping to determine root causes is a cause and effect diagram.

Failure analysis is most appropriate when either:
- A particular component type is failing at a significantly higher rate than previously estimated or predicted
- The failure has a major impact on safety or performance or both

All details concerning the failure are recorded. These include:
- The observed failure mode
- The relevant conditions under which the failure occurred
- How long the failed component was operating before failure (Operating time should be estimated if it is possible that the component was inoperative a significant amount of time before it was noticed.)
- Extenuating circumstances, such as damage occurring during troubleshooting
- Assembly drawings of mechanical components
- A copy of the component data sheet or specifications
- Circuit schematic diagrams for electronic components
- The undisturbed failed component within its assembly (This is particularly important for mechanical or electromechanical components.)

The level or depth of the failure analysis depends on the level where the corrective action will be taken. For instance, if a subassembly may be replaced by a more reliable one from another supplier, it is only necessary to determine which subassembly failed. Otherwise, the specific component that failed is found and replaced within the equipment.

The supplier of a failed component may perform the failure analysis at no cost, since the supplier has a vested interest. Supplier application or product engineers can also be very helpful in pointing out possible improvements to their components. Failure analysis laboratories are set up to analyze components and can also be helpful. Major semiconductor suppliers and failure analysis laboratories analyze components using visual microscopes, scanning electron microscopes, dissection, and elemental analysis techniques.

Applicable Tools
AT3 Cause and Effect (Fishbone) Diagram


Engineering Activity E17: Failure Reporting, Analysis and Corrective Action


A Failure Reporting, Analysis and Corrective Action System (FRACAS) provides a closed-loop feedback path by which data on failures occurring during field tests and operation are collected, recorded, and analyzed to determine where problems are concentrated in the design. This promotes continuous improvement in equipment reliability. A FRACAS is also used to track internal test performance and provides a good historical basis for comparison to external equipment performance.
[Figure: FRACAS closed-loop diagram. Blocks: Design and Production (test, inspect, correct); Reliable Product; Customer; Failure Reporting (test failure report, quality assurance report); Database (reports); Analysis (failure investigation, cause investigation); Failure Review Board; Corrective Action (development, implementation, verification; actions). Reprinted with permission, 1991 Society of Automotive Engineers, Inc.]

A FRACAS is used to:
- Establish a closed-loop failure reporting system
- Establish procedures that are used to determine the cause of subsystem and component failures
- Document the corrective actions taken

The reason for establishing a closed-loop system is that it allows one to collect, analyze, and record failures down to a specified level; that is, to the subsystem, component, and part level. Procedures are identified for initiating failure reports, analyzing failures, and feeding corrective action back into the design, manufacturing, and test processes. The closed-loop system includes provisions that ensure that effective corrective actions are taken on a timely basis: a follow-up audit that reviews all open failure reports, failure analysis and corrective action suspense dates, and the reporting of delinquencies to management. The failure cause for each failure is clearly stated.

The objectives of a FRACAS are to:
- Assess historical reliability performance
- Develop a pattern of deficiencies
- Provide engineering data for corrective action
- Develop statistical data for:
  - component failure rates and downtime
  - component selection suitability criteria
  - component application reviews
  - future designs and design reviews
  - product improvement programs
  - spares provisioning
  - life cycle costing
- Develop contractual performance data
- Provide warranty information
- Furnish safety and regulatory compliance data
- Assess liability-claim information
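The follow-up audit described above (review open failure reports against their suspense dates and report delinquencies) could be automated along the following lines. The record layout, names, and dates are hypothetical and serve only as a minimal sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FailureReport:
    # Hypothetical minimal record; a real FRACAS form carries far more detail.
    report_id: str
    opened: date
    suspense_date: date             # date by which corrective action is due
    closed: Optional[date] = None   # None while the report is still open


def delinquent_reports(reports, today):
    """Return open reports whose suspense date has already passed."""
    return [r for r in reports if r.closed is None and r.suspense_date < today]


# Invented sample data.
today = date(1992, 5, 5)
reports = [
    FailureReport("FR-001", date(1992, 3, 1), date(1992, 4, 1)),   # open, overdue
    FailureReport("FR-002", date(1992, 3, 10), date(1992, 6, 1)),  # open, not yet due
    FailureReport("FR-003", date(1992, 2, 1), date(1992, 3, 1),
                  closed=date(1992, 2, 20)),                       # closed
]
overdue = delinquent_reports(reports, today)
for r in overdue:
    print(f"{r.report_id} is delinquent (due {r.suspense_date})")
```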

References
A Reliability Guide to Failure Reporting, Analysis, and Corrective Action Systems, Milwaukee, WI:American Society for Quality Control, 1977.
MIL-STD-785B, Reliability Program for Systems and Equipment Development and Production, Task 104, Philadelphia, PA:Naval Publications and Forms Center, 1980.


Data Activity D1: Data Collection and Data

One of the building blocks for FRACAS is the collection of data and the management of that data with a data base management system. Together, they provide an organized way to gather factual data about equipment performance, both good and bad. Based on the reliability model for the equipment, a shopping list for data is established. Each component or subsystem modeled in the fault tree or block diagram requires data in the form of a failure probability or frequency. Several types of data are needed to determine the failure probability and to assess product reliability:
- Cumulative operating time
- Number of failures
- Conditions present at the time of failure

There are three methods used for collecting reliability data. The first method involves the use of a standardized reporting form that is filled out by engineers and technicians who are involved in equipment testing, troubleshooting, and repair. These forms need to be simple to use and ask only for needed information. An example of a reliability reporting form is on the following page. So that they better understand the final use and importance of the data, the personnel involved in collecting it (final test technicians and field service engineers) are part of the team that designs the data collection form and are involved in analyzing the data.

The second method involves the use of customer data base and equipment tracking information. This requires an excellent, ongoing customer-supplier relationship. Great care must be taken to ensure compatibility between the supplier's data and the data of multiple customers. Simply agreeing to SEMI E10-90 specifications will not suffice, although basing the specifications on E10-90 makes them industry compatible. In addition, a standard way of assigning failures and assists to subsystems and components should be devised. Including key customer equipment engineers in evaluating the validity of the collected data is very useful.
The third method is to use the on-board CPU power to monitor and track equipment status, faults, and errors. Customers agree to allow the information to be downloaded to a floppy disk and removed from the site. The ability to time stamp and match this information to customer data base information provides useful data.


Project/Model

Part Name Affected

Date Problem Found

Part Number Affected

Name of Major Component Affected

Description of Problem (what, where, when, how many, etc.)

Impact/Effect/Consequences of Problem

Apparent Cause of Problem

Remarks

Reported By

Date

Referred Problem To

If there is no equipment in the field from which to collect data, there are several other sources of data available:
- Historical data
- Sub-tier supplier data
- In-house data
- Expert judgement

Historical data is data that has been collected for a previous generation of equipment or for similar equipment. The use of this data is limited to those subsystems and components that are similar to those in the current equipment. This data also requires attention to trends; that is, if the subsystem or component was undergoing improvements, or if the methods of collecting the data were changing, these changes must be accounted for. When a subsystem or component is purchased from a supplier, that supplier should be able to supply the data that has been collected for that part to date. Once a testing program exists for the equipment, in-house data is available. For those subsystems and components that have none of the previous sources of data available, expert judgement can be used to create initial reliability values. Expert judgement takes the opinion of individuals who are considered to be knowledgeable about a subsystem or component and uses this knowledge to create failure rates.

It should be noted that these sources of data do not always represent the environment and operating conditions that the equipment will see in the field. Thus, the preferred source of data is always field data. When collecting data, it is important to keep all of the data. This makes it possible to represent the subsystem and component failure rates over a range of values and more accurately represents the variety of environments and users that the subsystem and component will see. It cannot be stressed enough that the validity of the reliability model and its predictions depends on the validity of the data. A statement commonly used by software users is, "Garbage In, Garbage Out," which is just as applicable here.
As soon as possible, replace historical and expert judgement data with data collected during testing and operation in the field.

At this point it is important to discuss how the collected data is translated into failure rates that are used to improve the equipment's reliability. In a typical piece of equipment, some components are under stress or used continuously while others are used cyclically. Thus, failure rates can be defined as a function of time (per hour) or cycle (per wafer). In either case, the collected data includes the number of cycles, wafers, or hours during which the failures occurred. Failures are evaluated to assure that they were genuine and resulted in equipment shutdown or lost production time. Once the evaluation is done, translating data into failure rates is fundamentally simple.

Suppose that a data base includes 25 machines operating over a 9-month period. If component A failed 20 times and the average operational time for the 25 machines was 70 percent (that is, the utilization factor is 0.70), the failure rate for component A would be

$$\text{failure rate}_A = \frac{20}{25 \times 9\ \text{mo.} \times 30\ \text{days/mo.} \times 24\ \text{hr./day} \times 0.70} = \frac{20}{113{,}400\ \text{hr.}} \approx 1.8 \times 10^{-4}\ \text{failures/hr.}$$

The MTBF is the reciprocal of this failure rate, or 5,670 hr.

Suppose a second component, B, failed 12 times, but its failures relate to wafers processed, and the machine averages 10 wafers/hr. The failure rate of component B would be

$$\text{failure rate}_B = \frac{12}{113{,}400\ \text{hr.} \times 10\ \text{wafers/hr.}} = \frac{12}{1{,}134{,}000\ \text{wafers}} \approx 1.06 \times 10^{-5}\ \text{failures/wafer processed.}$$

Alternatively, expressed per hour, it would be

$$1.06 \times 10^{-5}\ \text{failures/wafer} \times 10\ \text{wafers/hr.} = 1.06 \times 10^{-4}\ \text{failures/hr.}$$

The key, of course, is knowing or estimating the utilization factor. This can be determined by tabulating and averaging the operational times of all 25 machines. It can also be estimated for groups of machines from general production information.

Applicable Tools
AT18 User Groups
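The arithmetic for the 25-machine example can be checked with a short script. This is a sketch only: the function names are hypothetical, and the 30-day month follows the assumption used in the example.

```python
HOURS_PER_MONTH = 30 * 24   # 30-day months, as assumed in the example


def operating_hours(machines, months, utilization):
    """Cumulative operating hours across all machines."""
    return machines * months * HOURS_PER_MONTH * utilization


def failure_rate(failures, exposure):
    """Failures per unit of exposure (hours or wafers)."""
    return failures / exposure


hours = operating_hours(machines=25, months=9, utilization=0.70)

# Component A: time-based failure rate.
rate_a = failure_rate(20, hours)        # failures per hour
mtbf_a = 1.0 / rate_a                   # hours between failures

# Component B: cycle-based failure rate at 10 wafers per hour.
wafers = hours * 10
rate_b_per_wafer = failure_rate(12, wafers)
rate_b_per_hour = rate_b_per_wafer * 10

print(f"operating hours: {hours:,.0f}")
print(f"A: {rate_a:.2e} failures/hr, MTBF {mtbf_a:,.0f} hr")
print(f"B: {rate_b_per_wafer:.2e} failures/wafer, {rate_b_per_hour:.2e} failures/hr")
```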

References
Bigelow, J., "Tailored Data Collection," Quality, August 1991, pp. 21-22.
Burgess, J.A., "Improving Product Reliability," Quality Progress, December 1987, pp. 47-54.
SEMI E10-90, Guideline For Definition And Measurement Of Equipment Reliability, Availability, and Maintainability (RAM), SEMI 1990, pp. 69-75.


Data Activity D2: Human Reliability Analysis (HRA)


Human reliability analysis (HRA) is a technique used to systematically identify, analyze, quantify, and document the possible human failure modes within a design, and the effects of such failures on the overall equipment reliability. Analyses of the behavior and needs of humans are among the more controversial of the sciences; thus, it is no surprise that there are several competing approaches to the handling and identification of people problems.

The most widely used quantitative HRA technique is the Technique for Human Error Rate Prediction (THERP), developed at Sandia National Laboratories. THERP is defined as a method to predict human error rates and to evaluate the degradation to a man/machine system likely to be caused by human errors in association with equipment functioning, operational procedures and practices, and other system and human characteristics which influence system behavior.

There are five steps in applying the THERP model:
1. Define equipment failures
2. Identify human operations and tasks related to each equipment failure
3. Estimate associated human error probabilities
4. Estimate the effects of the human errors on the equipment reliability
5. Recommend changes to the man/machine system and return to step 2

The NATO article listed below summarizes and explains the THERP model (and extols its virtues). The article from Human Factors is an annotated bibliography of Sandia Laboratories' work in this area and will be very helpful to anyone trying to estimate the effects of human frailty on a system. It also lists 44 sources of further information.
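Steps 3 and 4 can be illustrated with a deliberately simplified calculation. The sketch below assumes that the human tasks are independent and that any single error causes the equipment failure; the full THERP method also treats dependence between tasks and recovery from errors, which are not modeled here. The error probabilities are invented for illustration and are not taken from the Sandia handbook.

```python
import math


def human_failure_probability(error_probabilities):
    """Probability that at least one task in a sequence is performed
    incorrectly, assuming the tasks are independent."""
    success = math.prod(1.0 - p for p in error_probabilities)
    return 1.0 - success


def combined_reliability(equipment_reliability, error_probabilities):
    """Equipment reliability degraded by human error (series assumption)."""
    return equipment_reliability * (1.0 - human_failure_probability(error_probabilities))


# Invented per-task human error probabilities for one maintenance procedure.
task_heps = [0.003, 0.01, 0.001]
p_human = human_failure_probability(task_heps)
r_total = combined_reliability(0.99, task_heps)
print(f"probability of at least one human error: {p_human:.4f}")
print(f"reliability including human error: {r_total:.4f}")
```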

References
Ericson, D., editor, et al., Analysis of Core Damage Frequency: Internal Events Methodology, NUREG/CR-4550, Volume 1, Revision 1, SAND86-2084, Albuquerque, NM:Sandia National Laboratories, pp. 7-1 to 7-80.
Siegel, A., J. Wolf, "A Technique for Evaluating Man-Machine Systems Design," Human Factors, 3:1, 1961.
Swain, A., H. Guttmann, Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications, NUREG/CR-1278, SAND80-0200, Albuquerque, NM:Sandia National Laboratories, August 1983.
Swain, A.D., Shortcuts in Human Reliability Analysis, Holland:Nordhoff Publishing Company, NATO Advanced Study Institute on Generic Techniques in Systems Reliability Assessment, 1975.
MIL-HDBK-338-1A, Electronic Reliability Design Handbook, Vol. I of II, Irvine, CA:Global Engineering Documents, 12 October 1988, p. 7-100.


Test Activity T1: Test Plans


Testing activities are driven by the need to ensure that all goals and requirements for a piece of equipment and its subsystems, components, and parts are achieved. Testing is the primary means of generating enough data on critical components, parts, and subsystems to reduce the uncertainty of data being fed into the equipment model. A testing plan provides a predetermined strategy for testing the equipment as a whole. The testing plan includes testing across all life cycle phases, and is updated and refined as required. The plan changes as the equipment passes through each phase of the life cycle and is updated at the time of transition from one phase to the next.

A testing plan encompasses all aspects of testing necessary to meet reliability goals. Since testing is one of the basic tools in reliability improvement, it is also a means of providing continuous improvement. The testing plan includes procedures and criteria for:
- Testing equipment and subsystems
- Testing components and parts
- Reliability demonstration testing

For every test performed, there must be a clear definition of requirements so that the proper type and number of tests are conducted, valid measurements are made, and the necessary data are obtained. One good practice is to predict the expected results or level of performance based on calculations or best engineering judgment. These predictions serve as a guide for monitoring the tests and assessing the validity of the test results.

During testing, it is not unusual to experience unexpected failures. Some of these may be fluke conditions, but more often each failure is an indication of a true problem. Thus, it is a good practice to include all test failures in the failure statistics and investigations. Specific tests should be planned to coordinate with the total testing program so that the derived information has the maximum possible value for continuing application throughout later stages of the program.

References
Burgess, J.A., "Improving Product Reliability," Quality Progress, December 1987, pp. 51-52.
MIL-STD-781D, Reliability Testing for Engineering Development, Qualification, and Production, Washington, DC:Department of Defense, 17 October 1986.
Arsenault, J., F. Roberts, editors, Reliability & Maintainability of Electronic Systems, Potomac, MD:Computer Science Press, Inc., 1980, pp. 353-354.


Test Activity T2: Reliability Tests


Reliability tests consist of testing:
- Components and parts
- Equipment and subsystems
- To demonstrate reliability

Testing in these three categories is discussed in more detail in the following paragraphs.

Component testing involves testing individual components and parts to examine the relative merits of alternative designs and to determine design margins. Such tests are also useful in determining the validity of design and calculation methods. Component testing forms an important part of development. At this stage, various components and parts are tested over a wide range of conditions. This is done to ensure that the best of several alternative designs will be chosen, and that the part or component will perform satisfactorily at other than nominal conditions when integrated into the equipment. Problems associated with component testing include realistically simulating equipment environments (including parametric input and variation to the component or part) and determining the number of tests required to demonstrate reliability. Thus, component testing is better suited to improving reliability by optimum selection (that is, flushing out basic weaknesses in critical components and parts) than to determining the absolute value of reliability.

There are several tools that are useful for testing components and parts. Accelerated testing can be used to gather reliability data in a shorter period of time; it can be used with Environmental Stress Screening (ESS) and Reliability Development/Growth Testing (RD/GT). ESS can be used to stimulate failures by stressing the component or part to detect and remove early failures. RD/GT is used to identify and correct failure modes and then to verify that the failure has been eliminated. Life testing can be used to evaluate the useful life or reliability of a component or part. Burn-In testing is used to screen out defects in the part or component during its infant mortality period. (See AT2.)
Equipment testing involves testing of individual subsystems or the equipment itself. Equipment testing is basic to reliability improvement. In order to achieve the best results, the equipment and subsystems should be tested under conditions that closely simulate the expected operating conditions. Equipment tests are intended to explore the effects of component and part interactions under real-world loading and environmental conditions. The tests are conducted on an iterative basis; that is, they follow a test, fail, fix, and retest approach. This approach is intended to find the failure mode for the weakest link and design it out, find the second weakest link and design it out, and so on, until an adequate level of reliability performance is achieved. Equipment tests are also performed to see whether certain configurations are feasible or which of several are optimal with respect to performance, cost, and modes of behavior under varying conditions.

When testing on the equipment level, there is obviously no need to simulate internal environments. The equipment has a lower reliability requirement relative to its components and parts. This makes it easier to demonstrate an absolute reliability number, which is dependent on the cost and/or number of equipment and subsystems available for testing. If equipment testing is started too soon, many failures will occur in components and parts that have not been sufficiently proven out; this makes failure tracking difficult. Another disadvantage of starting equipment testing too early is that if too many component and part failures occur, the remaining components and parts will be subjected to too many start operations, which are perhaps more severe than steady-state operation. Consequently, the observed failure distributions will give a false impression compared with those expected in operation.

Equipment testing focuses on the question, "Is the component or part reliable within the subsystem or equipment?" Equipment testing does not eliminate component testing, but helps to pinpoint the faulty components or parts so that they may be replaced or modified by superior products. Equipment testing is a way of realistically evaluating reliability as well as guiding component and part improvement by systematically discovering problems and weaknesses.

There are several tools that are useful for testing subsystems and equipment. As with component tests, accelerated testing can be used to gather reliability data in a shorter period of time. It can also be used with Environmental Stress Screening (ESS) for subsystems and Reliability Development/Growth Testing (RD/GT) for both subsystems and equipment. ESS is not done at the equipment level; however, it is useful at the subsystem level. ESS can be used to stimulate failures by stressing the subsystem to detect and remove early failures. RD/GT is used to identify and correct failure modes and then to verify that the failure has been eliminated. Reliability Qualification Testing (RQT) is used to verify that critical subsystems and the equipment meet design goals and comply with contractual/program objectives. Life testing can be used to evaluate the useful life or reliability of a subsystem or the equipment. Burn-In Testing is used to screen out defects during a subsystem's or equipment's infant mortality period.
Reliability Demonstration Tests are used to demonstrate, often to the customer, that the equipment is capable of meeting its specified performance and reliability for a stated period of operation. This type of test can be very expensive and requires careful planning and execution. The equipment and its associated subsystems, components, and parts that are going to be tested, and the test conditions to be used, must be closely controlled to ensure the validity of the final results. It is often the practice to disassemble the items totally after the tests are completed to inspect each one for wear, damage, or signs of impending failure. A tool that is very useful for reliability demonstration tests is Reliability Qualification Testing (RQT). RQT is used to verify that the equipment will meet design goals and comply with contractual/program requirements.

Applicable Tools
AT1 Accelerated Testing
AT2 Burn-In Testing
AT6 Environmental Stress Screening (ESS)
AT8 Life Testing
AT13 Reliability Development/Growth Testing (RD/GT)
AT14 Reliability Qualification Testing (RQT)


References
Burgess, J.A., "Improving Product Reliability," Quality Progress, December 1987, pp. 51-52.
Lloyd, D.K., M. Lipow, RELIABILITY: Management, Methods, and Mathematics, Second Edition, Milwaukee, WI:The American Society for Quality Control, 1991, pp. 349-354.


Applicable Tool AT1: Accelerated Testing


Accelerated tests are performed when the test time necessary to provide adequate reliability assurance under normal operating conditions is inordinately long, and therefore very expensive. Gathering reliability data should not hold up the development of the equipment, and it should be as economical as practicable. Therefore, it is important to be able to accelerate reliability tests.

Reliability tests can be accelerated by increasing the sample size, provided that the item being tested does not have wearout characteristics during its anticipated life. Increasing the sample size is appropriate for small, cheap items that can be produced in quantity. Using a large sample reduces the error in the reliability estimate for the population due to part-to-part variability. However, large-sample reliability tests, which provide a high total operating time, should be supported by some long duration testing if there is reason to suspect that failure modes exist which have high times to first failure. Extrapolation of reliability data over long periods of time must be treated with caution, and therefore, whenever practicable, supporting long duration tests should be considered. A particular type of large sample test is sudden death testing, in which the sample is split into subgroups and the time to first failure in each group is collected.

Increasing the severity of the test is an obvious approach when large samples cannot be provided. However, there are two problems:
1. What is the equivalent operating time under normal stress?
2. Are the failures induced under the accelerated test conditions the same as those that might occur under normal conditions?

Another type of accelerated testing is step-stress testing. Step-stress testing is a technique whereby the item is tested initially at normal stress, but after a certain time the stress is increased, and stepwise increases are continued until the item fails.
It is important in accelerated testing to ensure that unrealistic failure modes are not introduced by the higher stresses. The physics of the materials being tested and analysis of failures should indicate whether or not such failure modes are likely to occur or be stimulated. Obviously, failure modes that can occur only at stresses well above the maximum operating stress will not be of interest. For example, increasing temperature beyond a certain level may change the strength of a material, so it is important that temperature increments are kept within limits. It is also possible that interactions may occur between different stresses, so that the combined weakening effect is greater than would be expected from a simple additive process.

References
Hall, I., W. Cramond, D. Huffman, Summary of the SETEC Accelerated Testing Workshop, SETEC91-017, Albuquerque, NM:Sandia National Laboratories, 1991.
O'Connor, P., Practical Reliability Engineering, Third Edition, NY:John Wiley & Sons, 1991, pp. 264-267.


Applicable Tool AT2: Burn-In Testing


Burn-in is a special type of test that might be better described as a pre-delivery operation of the equipment. The following figure shows the life cycle failure probability as a curve that resembles a cross-section of a common bathtub.

The left decreasing portion of the curve is the infant mortality period, where a disproportionate number of failures occur early in the equipment's lifetime. The flat part represents the constant failure rate during the useful life of the equipment. The right increasing portion is the wear-out period. It is useful to know, as closely as possible, where the infant mortality period ends and the wear-out period starts, even when burn-in tests are not performed.

Burn-in has proven to be an effective means of screening out defects during a component's infant mortality period. The typical burn-in test combines electrical stresses with temperature cycling for short periods of time to activate temperature-dependent and voltage-dependent failure mechanisms. The two types of burn-in tests are static and dynamic. In static burn-in, a bias may be applied to the device under test at very high temperatures. In dynamic burn-in, entire circuit cards may be operated to simulate actual equipment operation. Screening out the infant mortality failures results in more reliable components. Because most of the failures occur during the infant mortality phase of the component's life, this method of testing results in reliability improvement of the equipment.

Burn-in tests are usually conducted on 100% of the production units to weed out production errors related to minor variations in workmanship and process fluctuations that result from engineering changes. Burn-in tests also discover some residual design errors. In these tests, the stresses applied are usually within published performance constraints, and are applied for short periods of time. Their purpose is to prevent units with production-related errors from being shipped. Products that have undergone burn-in tests should be failure free.
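
The three regions of the bathtub curve described above can be sketched numerically. The example below uses the Weibull hazard function, which is a common modeling choice but is an assumption here (this guideline does not prescribe a distribution). A shape parameter below 1 gives a decreasing failure rate, a shape of 1 gives a constant rate, and a shape above 1 gives an increasing rate. All parameter values are illustrative.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape, and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


# Illustrative parameters only: one shape value per region of the curve.
regions = {
    "infant mortality (shape 0.5)": 0.5,
    "useful life (shape 1.0)": 1.0,
    "wear-out (shape 3.0)": 3.0,
}
for label, shape in regions.items():
    rates = [weibull_hazard(t, shape, scale=1000.0) for t in (10.0, 100.0, 1000.0)]
    print(label, ["%.5f" % r for r in rates])
```

In this simplified picture, burn-in amounts to operating each unit until the decreasing portion of the curve has largely flattened before shipment.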

References
Klinger, D., Y. Nakada, M. Menendez, AT&T Reliability Manual, NY:Van Nostrand Reinhold, 1990, pp. 52-57.
Punches, K., "Burn-In and Strife Testing," Quality Progress, May 1986, pp. 93-94.


Applicable Tool AT3: Cause & Effect (Fishbone) Diagram


The cause-and-effect diagram was invented by Dr. Kaoru Ishikawa to represent the relationship between some effect, that is, a problem, and all the possible causes influencing it. The diagram is also called an Ishikawa diagram, or a fishbone diagram because a well-detailed diagram takes on the shape of fishbones. The main problem is indicated on a horizontal line and possible causes of that problem are shown as branches. A common set of major categories for causes consists of:
Personnel
Work methods
Materials
Equipment
Environment

For each cause ask, "Why does it happen?" and list responses as branches off the major causes. The causes shown as branches can have sub-causes, indicated by sub-branches, and so on.

References
Ishikawa, K., Guide to Quality Control, White Plains, NY:Quality Resources, 1982, pp. 8-29.
O'Connor, P., Practical Reliability Engineering, Third Edition, New York:John Wiley & Sons, 1991, pp. 311-312.
The Memory Jogger, Methuen, MA:GOAL/QPC, 1988, pp. 24-29.


Applicable Tool AT4: Competitive Benchmarking


Competitive Benchmarking is an ongoing formal process used by a company to measure and compare its products, services, and operations against those of its toughest competitors and of companies demonstrating world class performance. The aim of this process is to identify the leading companies' secrets and use this information to establish goals and priorities, and to target areas for improvement.

The process itself is straightforward and simple; Industry Week outlines the benchmarking process with a list of 10 steps. However, the simplicity of the process belies its true power. One aspect of benchmarking that sets it apart is that it directs a company's focus outside its own walls, squarely at the marketplace and its competition. This leads to setting goals that are geared toward being the best in the world, not just slightly better than last year. Another benefit of benchmarking is that it can provide the blueprints for how a company can leap ahead of even the best of its competitors. Improvements are made not only in the equipment but also in secondary and supporting systems and processes. Other benefits of benchmarking include:
Identifying the keys for success for each area studied
Providing specific quantitative targets
Creating an awareness of state-of-the-art approaches
Cultivating a culture where change, adaptation, and continuous improvement are actively sought out
Spotting emerging competitors and seeing where the company should be going in the future


References
Altany, D., "Copycats," Industry Week, November 5, 1990, pp. 11-18.
Camp, R., Benchmarking: The Search For Best Practices That Lead To Superior Performance, Milwaukee, WI:ASQC Quality Press, 1989.
Pryor, L., Beating The Competition: A Practical Guide To Benchmarking, Washington, DC:Kaiser Associates, 1988.
Competitive Benchmarking: What It Is And What It Can Do For You, Stamford, CT:Xerox Corporate Quality Office, Reference No. 700P90201, May 1987.


Applicable Tool AT5: Design of Experiments (DOE)


Design of Experiments (DOE) refers to a collection of methods, largely but not exclusively statistical, for collecting and analyzing data under controlled conditions. This collection includes methods for the design and analysis of simple experiments, as well as strategies for moving from one experiment to the next based on previous results. The goal of all these methods is to maximize the information obtained from relatively little data.

Experiments are performed for a variety of purposes, some exploratory, others confirmatory. Exploratory experiments include those aimed at cause detection, as well as those designed to accomplish the Taguchi goals of parameter design and tolerance design. Confirmatory experiments include, for example, process qualification studies. Regardless of an experiment's purpose, the experimenter must face and deal with three issues common to all experimental situations:
Response variability
Known but extraneous systematic effects
Extraneous unknown effects

The general strategies used in experimental design to deal with these issues are replication, blocking, and randomization, respectively. Other aspects of experimental design include the:
Selection of factors and determination of factor levels
Selection of response
Selection of the specific combination of factor levels at which to run the experiment
Precise specification of the experimental procedure to be followed

Each of these activities is governed by the experiment's purpose.

Methods for analyzing experimental data can be either numerical or graphical. The most common family of numerical techniques falls under the heading of analysis of variance (ANOVA), and includes formal hypothesis tests, confidence intervals, and multiple comparison procedures. Graphical methods of analysis include simple histograms and dot-frequency plots, normal probability plots of effects and residuals, and Bayes plots.

An important family of experimental designs consists of the full-factorial and fractional-factorial designs. Usually implemented with 2-level factors, they can be readily extended to multi-level factors. A serious drawback of a multi-level factorial design is its expense: the number of runs equals the number of levels raised to the power of the number of factors, so it grows rapidly as levels are added. To a large extent, this is the reason for the popularity of 2-level factorial designs in initial screening experiments. The purposes of factorial and fractional-factorial designs are:
Screening and first pass optimization
Investigating the effect of many factors simultaneously
Assessing interactions or coupling of factor effects

In particular, in the presence of interactions, full-factorial and fractional-factorial designs are superior to one-at-a-time strategies. Fractional-factorial designs are useful for screening and are highly efficient for large numbers of factors. However, one assumes that only low-order interactions are present. When the experiment is run with center points, both full-factorial and fractional-factorial designs can signal curvature or non-linearity. When used with steepest-ascent methods, factorial designs provide efficient second order optimization. The final stage of optimization can be achieved using response-surface methods. These methods are usually based on a second degree polynomial model that allows estimation of curvature. Although multi-level factorial designs could be used for fitting higher order surfaces, the family of central-composite designs is built up from fractional-factorial or full-factorial designs by adding selected axial points.
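
The difference in run count between full-factorial and fractional-factorial designs can be seen by generating both. The following is a minimal sketch using coded levels -1 and +1. The factor names are hypothetical, and the half-fraction uses the conventional generator that sets the last factor equal to the product of the others, which is one common choice and not a construction prescribed by this guideline.

```python
from itertools import product


def full_factorial(factors: list[str]) -> list[dict[str, int]]:
    """All 2**k combinations of coded levels -1 / +1 for k factors."""
    return [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]


def half_fraction(factors: list[str]) -> list[dict[str, int]]:
    """2**(k-1) runs: the last factor is set to the product of the others.

    This aliases the last factor's main effect with the interaction of all
    the other factors, which is acceptable only if that interaction is
    negligible (the low-order-interaction assumption noted in the text).
    """
    *base, generated = factors
    runs = []
    for run in full_factorial(base):
        sign = 1
        for level in run.values():
            sign *= level
        runs.append({**run, generated: sign})
    return runs


# Hypothetical factor names for illustration.
names = ["temperature", "pressure", "flow", "power"]
print(len(full_factorial(names)), "runs in the full factorial")
print(len(half_fraction(names)), "runs in the half fraction")
```

In practice the run order would also be randomized before the experiment is executed, in line with the randomization strategy mentioned earlier.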

References
Box, G., W. Hunter, J. Hunter, Statistics for Experimenters, An Introduction to Design, Data Analysis, and Model Building, New York:John Wiley and Sons, 1978.
Taguchi, G., Introduction To Quality Engineering: Designing Quality into Products and Processes, White Plains, NY:UNIPUB/Kraus International Publications, 1987.


Applicable Tool AT6: Environmental Stress Screening (ESS)


Environmental Stress Screening (ESS) is a modern production tool used to increase reliability. It has been particularly useful in the electronics industry. The methodology consists of the application of environmental inputs; that is, electrical, thermal and mechanical stresses, to equipment to accelerate the occurrence of potential failures. The environmental inputs are chosen to maximize defect identification in a minimum amount of time without creating any new defects. ESS is used to stimulate failures by stressing subsystems, components, and parts to detect and remove early failures due to weak subsystems, components, and parts; workmanship defects; and other nonconformance anomalies. It is particularly useful in uncovering process-induced defects. The stressing does not need to simulate the precise operating environment. However, the subsystem, component, or part is cycled through its operational modes while simultaneously being subjected to the required environmental stresses. Many stressors have been found to be effective in ESS, including temperature cycling, random vibration, altitude, humidity, and sound. Rapid thermal cycling and random vibration are the most commonly used environmental screens and are effective in detecting most types of latent defects. The type of test done at the component level will usually be different from the type of test done at a higher or lower level. At lower levels, stronger screens can be used without damaging the product. This is desirable because a stronger screen normally results in a higher defect detection rate and lower repair costs later in the life cycle. At higher levels, ESS can be used to identify intermittent failures effectively through power-on cycling. Equipment level screens are not advisable because lower-cost screens tend to precipitate most screenable defects. Even without ESS, some defects will be found before delivery to the customer. 
However, many defects will remain and cause service difficulties, particularly early in service. With ESS, many more of the existing defects are found at the factory, which leads to an improvement in in-service reliability. If a pattern of defects is observed, changes are made in the manufacturing design, manufacturing method, or both to eliminate the root cause of the problem. This means fewer defects and even lower manufacturing costs.


References
Bailey, R., R. Gilbert, "STRIFE Testing for Reliability Improvement," PROCEEDINGS - Institute of Environmental Sciences, Vol. 1, 1981, pp. 119-123.
Bird, C., "Unit Level Environmental Screening," PROCEEDINGS - Institute of Environmental Sciences, May 1980, pp. 63-64.
Punches, K., "Burn-In and Strife Testing," Quality Progress, May 1986, pp. 93-94.
Tustin, W., "Shake and Bake the Bugs Out," Quality Progress, Sept. 1990, pp. 61-64.
MIL-STD-785B, Reliability Program For Systems And Equipment Development And Production, Task 301 Environmental Stress Screening, 3 July 1986, pp. 301-1 to 301-2.
MIL-STD-810E, Environmental Test Methods And Engineering Guidelines, 14 July 1989.
RMS Committee, RMS Reliability, Maintainability & Supportability Guidebook, SAE G-11, Warrendale, PA:Society of Automotive Engineers, Inc., 1990, pp. 203-209.


Applicable Tool AT7: Fault Tree Analysis


Fault Tree Analysis (FTA) is one of the most widely used and versatile methods of deductive analysis. Deductive analysis constitutes reasoning from the general to the specific. For example, the equipment has failed; now, what events caused the failure of that equipment? This approach is commonly called the "Sherlock Holmesian" approach. Holmes, faced with given evidence, has the task of reconstructing the events leading up to the crime. Indeed, all successful detectives are experts in deductive analysis.

[Figure: Example fault tree. Top event: Equipment Failure. Subsystems: SS1, SS2, SS3. Components: C1 through C5. Parts: P1 through P4.]

FTA is used to determine the various combinations of events, that is, component-level failures, that could result in equipment failure. Component-level failures include hardware failures, human errors, and software errors. A failure can range from noncompliance with specifications to the inability of a component to perform its intended function. Component-level failures, in fault tree (FT) terminology, are called primary events. Equipment failure refers to an undesired state of the equipment, such as when the equipment stops functioning or makes bad products. Equipment failure, in fault tree terminology, is called the top event. A fault tree is not a model of all possible equipment failures or all possible causes of equipment failure. A fault tree is tailored to its top event; that is, the fault tree only includes those failures that cause that top event to occur.

Construction of an FT begins by defining the top event, for example, failure of the equipment at less than 1000 hours. The next step involves determining the various ways that this failure can occur. This is initially done at a fairly gross level (for example, equipment failure due to failure of the wafer handler subsystem). Once the equipment is modeled at a gross level, that is, the model consists of 10 to 20 major subsystems, the next step is to determine which of the subsystems should be modeled in more detail. If a particular subsystem rarely fails and it is anticipated that this situation will not change, it would be a waste of time and effort to model it. Concentrate instead on those subsystems that cause the equipment to fail frequently or catastrophically. Those subsystems that are targeted as a reliability problem for the equipment are broken into more detail. For example, the wafer handler subsystem could be broken into the arm, associated software, and electrical components. Only those portions of the wafer handler subsystem that significantly contribute to failure of that subsystem are broken into more detail. This process is continued for all identified subsystems until all potential ways in which the equipment can fail are identified.

The remainder of the description of this tool will focus on a general description of fault tree analysis and the Boolean algebra necessary to quantify the fault tree into an equipment failure rate. The references at the end of the description provide more detailed information.

At the top of the FT, the top event is listed within a rectangle. The icon at the beginning of this tool description has labeled its top event Equipment Failure. Next, the question, "How can the equipment fail?" is asked. All those events, that is, subsystem failures, that can cause equipment failure are placed in the FT under the top event; see Subsystem 1 (SS1), Subsystem 2 (SS2), and Subsystem 3 (SS3) in the icon. Gates are used to connect the events. The gate between the top event, equipment failure, and the events SS1, SS2, and SS3 indicates that failure of SS1, SS2, or SS3 will cause the equipment to fail. Some of the symbols used in a fault tree include:
Primary Events
Basic Event: A basic failure requiring no further development.
Undeveloped Event: An event that is not further developed, either because it is insignificant or because information is unavailable.

Gates
AND Gate: Output fault occurs if all the input faults occur.
OR Gate: Output fault occurs if at least one input fault occurs.

Transfer Symbols
Transfer In: Indicates that the tree is developed further on another page.
Transfer Out: Indicates that this portion of the tree connects at the corresponding transfer in.
There are other less-used events and gates that are described in texts on FTA. As can be seen in the icon, SS1 fails if component 1 or 2 (C1 or C2) fail. C2 fails only if both parts 1 and 2 (P1 and P2) fail. SS3 fails if components 3, 4, and 5 (C3, C4, and C5) all fail. Failure of C4 requires either part 3 (P3) or part 4 (P4) to fail. Once construction of the fault tree is completed, it is translated into an equation that is used to quantify the equipment failure rate. Fault trees are based on Boolean algebra. Boolean algebra is the mathematical manipulation of events derived from logical reasoning. The references discuss Boolean algebra in detail; it will not be discussed here. The Boolean equations for the icon fault tree are:

Equipment Failure = SS1 + SS2 + SS3
SS1 = C1 + C2
C2 = P1 * P2
SS3 = C3 * C4 * C5
C4 = P3 + P4

where + means OR, and * means AND. Substituting into the equipment failure equation,

Equipment Failure = C1 + P1 * P2 + SS2 + C3 * (P3 + P4) * C5

and expanding with the associative and distributive laws,

Equipment Failure = C1 + P1 * P2 + SS2 + C3 * P3 * C5 + C3 * P4 * C5.

Each of the terms in this equation is a scenario that leads to the top event; for example, C1 is a failure of component 1, which leads to equipment failure. In the IC equipment industry, the fault tree will consist almost entirely of OR gates. This means that every primary event is a scenario leading to the top event. AND gates are used when there is redundant equipment. Redundancy is a principle often used for critical safety functions.

Now that the fault tree has been translated into an equation, the probability of the top event can be quantified as a function of the primary events. Often, the term probability is used when what is really meant is frequency. Probabilities must lie between 0 and 1, whereas a frequency can be any number greater than or equal to 0, depending on the number of events that occur and the time scale. For example, if a component fails twice per year, its frequency is 2/yr, or about 0.17/mo. Using the previous example, the probability of the Equipment Failure can be written

P(Equipment Failure) = P(C1 + P1 * P2 + SS2 + C3 * P3 * C5 + C3 * P4 * C5).

But how does one deal with the right-hand side of the equation? Considering the basic laws of probability and the small probability approximation, and assuming that the events are independent, the example equation becomes:

P(Equipment Failure) = P(C1) + P(P1)*P(P2) + P(SS2) + P(C3)*P(P3)*P(C5) + P(C3)*P(P4)*P(C5).
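
The example tree can also be checked numerically. The sketch below encodes the Boolean equations of the icon fault tree, evaluates the small probability approximation given above, and compares it with the exact top-event probability obtained by enumerating every combination of the eight events. It assumes independent events and treats SS2 as an undeveloped event with its own probability. The probability values are hypothetical.

```python
from itertools import product

PRIMARY = ["C1", "P1", "P2", "SS2", "C3", "P3", "P4", "C5"]


def top_event(f: dict[str, bool]) -> bool:
    """Boolean equations of the example tree (True means 'has failed')."""
    c2 = f["P1"] and f["P2"]            # C2 = P1 * P2
    c4 = f["P3"] or f["P4"]             # C4 = P3 + P4
    ss1 = f["C1"] or c2                 # SS1 = C1 + C2
    ss3 = f["C3"] and c4 and f["C5"]    # SS3 = C3 * C4 * C5
    return ss1 or f["SS2"] or ss3       # Equipment Failure = SS1 + SS2 + SS3


def small_probability_approx(p: dict[str, float]) -> float:
    """Sum over the five terms of the expanded equation."""
    return (p["C1"] + p["P1"] * p["P2"] + p["SS2"]
            + p["C3"] * p["P3"] * p["C5"] + p["C3"] * p["P4"] * p["C5"])


def exact_probability(p: dict[str, float]) -> float:
    """Exact value by enumerating all 2**8 states of the independent events."""
    total = 0.0
    for states in product((False, True), repeat=len(PRIMARY)):
        f = dict(zip(PRIMARY, states))
        if top_event(f):
            weight = 1.0
            for name, failed in f.items():
                weight *= p[name] if failed else 1.0 - p[name]
            total += weight
    return total


# Hypothetical probabilities, chosen only for illustration.
probs = {name: 0.01 for name in PRIMARY}
print("approximation:", small_probability_approx(probs))
print("exact:        ", exact_probability(probs))
```

The approximation slightly overstates the exact value because it ignores the overlap between scenarios; the gap widens as the individual probabilities grow.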

References
Dhillon, B.S., Quality Control, Reliability, and Engineering Design, New York:Marcel Dekker, Inc., 1985, pp. 154-163.
Roberts, N., W. Vesely, D. Haasl, F. Goldberg, Fault Tree Handbook, NUREG-0492, Washington, DC:U.S. Nuclear Regulatory Commission, January 1981.
Sundararajan, C., Guide To Reliability Engineering Data, Analysis, Applications, Implementation, and Management, New York:Van Nostrand Reinhold, 1991, pp. 153-285.


Applicable Tool AT8: Life Testing


Life testing is used to estimate and demonstrate the numerical reliability of a part, component, subsystem, or piece of equipment; that is, to evaluate its useful life or reliability. Part and component tests are typically performed to examine the relative merits of alternative designs and to determine design margins. Subsystem and equipment tests are intended to explore the effects of component and part interactions under the loading and environmental conditions of day-to-day use. Life tests can be carried out either at normal operating conditions or at accelerated stress levels.

When performing life testing, one is interested not only in when an item fails, but also in which part in a component or which component in a piece of equipment fails. Other considerations include determining: 1) the mode or modes of failure, that is, the types of failure, as exemplified by performance drift, erratic performance, and catastrophic failure; 2) the mechanism of failure, that is, the reasons for failure, such as poor design; and 3) part misapplication. Together these establish the how and why of failure. Time-to-failure testing, in which a failure is actually generated, together with the subsequent failure analysis, helps to find answers to these questions when time is the critical parameter. Many types of electronic, electromechanical, and hydraulic equipment fall into this category when they are continuously operating or experiencing a large number of cycles wherein the transient conditions of starting and stopping are not more severe than the accumulation of time. Life testing indicates how much more (or less) life the equipment has than is required for operational use. This in turn allows priorities for reliability improvement to be established.

A subset of life testing is truncated (life) testing. Often a life test is truncated before all test units have failed because of time limitations. Truncated data arises when, either by accident or design, values are not available for all test items. Truncated data is distinct from missing data. The type of analysis done on truncated data depends on the type of test plan and on the objectives of the test. There are many test plans that yield truncated data, and methods for designing these test plans are well developed. The analysis methods are also well known. Life tests can be truncated in various ways; for example, the test might be stopped when a predetermined number of units have failed or when a specified amount of test time has elapsed. The truncation of the test depends on the resources available and the goals of the test.
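
The two truncation schemes mentioned above can be illustrated with a simple point estimate. The sketch below assumes a constant failure rate (exponential life) and that failed units are not replaced; under those assumptions, which are not stated in this guideline, a common estimate of mean life is the total time on test divided by the number of failures. The function names and data are hypothetical.

```python
def total_time_on_test(failure_times: list[float], units: int,
                       stop_time: float) -> float:
    """Operating hours accumulated by all units, failed or not, at test stop."""
    if len(failure_times) > units:
        raise ValueError("more failures than units on test")
    if any(t > stop_time for t in failure_times):
        raise ValueError("failure recorded after the test was stopped")
    survivors = units - len(failure_times)
    return sum(failure_times) + survivors * stop_time


def mean_life_estimate(failure_times: list[float], units: int,
                       stop_time: float) -> float:
    """Total time on test divided by the number of observed failures."""
    if not failure_times:
        raise ValueError("no failures observed; a point estimate is undefined")
    return total_time_on_test(failure_times, units, stop_time) / len(failure_times)


failures = [120.0, 310.0, 450.0]  # hypothetical failure times in hours

# Time-truncated: 10 units on test, stopped at a preset 500 hours.
print(mean_life_estimate(failures, units=10, stop_time=500.0))

# Failure-truncated: same test stopped at the third failure (450 hours).
print(mean_life_estimate(failures, units=10, stop_time=max(failures)))
```

The surviving units contribute their full running time even though they did not fail, which is what distinguishes truncated data from missing data.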

References
Lloyd, D., M. Lipow, RELIABILITY: Management, Methods, and Mathematics, Second Edition, Milwaukee, WI:The American Society for Quality Control, 1991, pp. 307-319, 352.
Nelson, W., Applied Life Data Analysis, NY:John Wiley & Sons, 1982.


Applicable Tool AT9: Pareto Diagram


The Pareto diagram is based on the Pareto principle of the significant few and the insignificant many. It is often found that a large proportion of failures in equipment are due to a small number of causes.

[Figure: Pareto diagram. Vertical axis: No. of Failures. Horizontal axis: Part 1 through Part 7.]

The Pareto diagram is a vertical or horizontal bar chart used to quantify and identify problems and determine which problems should be worked on first. The bars are used to present a graphic picture of the problems related to equipment. The bars are arranged in descending order of importance from left to right. Analyzing failure data and using that data to create a Pareto diagram allows for determining how to solve the largest proportion of the overall reliability problem with the most economical use of resources.
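
The ordering behind a Pareto diagram can be sketched in a few lines. The example below sorts failure counts in descending order and adds a running cumulative percentage, which is the information the bars convey. The part names and counts are hypothetical.

```python
def pareto_table(failure_counts: dict[str, int]) -> list[tuple[str, int, float]]:
    """Rows of (item, count, cumulative percent), largest count first."""
    total = sum(failure_counts.values())
    if total <= 0:
        raise ValueError("at least one failure is required")
    rows = []
    running = 0
    ordered = sorted(failure_counts.items(), key=lambda kv: kv[1], reverse=True)
    for item, count in ordered:
        running += count
        rows.append((item, count, 100.0 * running / total))
    return rows


# Hypothetical failure counts collected from field data.
counts = {"Part 1": 4, "Part 2": 42, "Part 3": 9, "Part 4": 27,
          "Part 5": 2, "Part 6": 13, "Part 7": 3}
for item, count, cumulative in pareto_table(counts):
    print(f"{item:8s} {count:4d} {cumulative:6.1f}%")
```

Reading down the cumulative column shows how few items account for most of the failures, which is where improvement effort is best spent first.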

References
Harrington, H., The Improvement Process, New York:McGraw-Hill, 1987, pp. 108-110, 207.
Ishikawa, K., Guide to Quality Control, White Plains, NY:Quality Resources, 1982, pp. 42-49.
O'Connor, P., Practical Reliability Engineering, Third Edition, New York:John Wiley & Sons, 1991, pp. 270-271.
The Memory Jogger, Second Edition, Methuen, MA:GOAL/QPC, 1988, p. 17.


Applicable Tool AT10: Process Capability

If equipment, subsystems, components, or parts have a tolerance (or specification) width, and are produced by a process that generates variation in the parameter(s) of interest, it is important that the process variation be less than the tolerance width. The ratio of the tolerance to the process variation is called the process capability index, and is expressed as
$$C_p = \frac{T}{6\sigma}$$

where T is the tolerance width and $6\sigma$ represents an interval of six standard deviations, that is, plus or minus three standard deviations from the process mean. A Cp of 1 indicates that a process will generate approximately 3 out-of-specification units in 1000, given the following assumptions. The first assumption is that the process is normally distributed and stable. Any systematic divergence, due for example to set-up errors, movement of the process mean during the manufacturing cycle, or other causes, could significantly affect the output. Therefore, the use of Cp to characterize a production process is appropriate only for processes that are under statistical control; that is, there are no special causes of variation such as those just mentioned, only common causes. Common cause variation is the random variation inherent in the process when it is under statistical control. The Cp index also assumes that the tolerance center and the process mean coincide; that is, the process average is centered on the nominal value.
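
The statement that a Cp of 1 corresponds to roughly 3 out-of-specification units per 1000 can be checked directly. The sketch below computes Cp from the tolerance width and the standard deviation, then evaluates the two-sided normal tail area. It relies on the same assumptions listed above (normal, stable, centered process); the function names are illustrative.

```python
import math


def cp_index(tolerance_width: float, sigma: float) -> float:
    """Cp = T / (6 * sigma)."""
    if tolerance_width <= 0 or sigma <= 0:
        raise ValueError("tolerance_width and sigma must be positive")
    return tolerance_width / (6.0 * sigma)


def fraction_out_of_spec(cp: float) -> float:
    """Two-sided normal tail area beyond +/- 3*cp standard deviations.

    Assumes a normally distributed, stable process centered on nominal.
    """
    z = 3.0 * cp
    return math.erfc(z / math.sqrt(2.0))


cp = cp_index(tolerance_width=6.0, sigma=1.0)
print(cp, round(1000 * fraction_out_of_spec(cp), 2), "per 1000")
```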

The Cpk index uses the Cp index as a starting point for stating a process's capability; however, it accounts for the process center not being at the nominal value. Cpk is expressed as

$$C_{pk} = (1 - K)\,C_p$$

where

$$K = \frac{D - \bar{x}}{T/2}$$

if $D > \bar{x}$; otherwise replace $D - \bar{x}$ with $\bar{x} - D$. D is the design center, $\bar{x}$ is the process mean, and T is the tolerance width. Ideally Cp = Cpk.

There are several things to keep in mind when using Cp and Cpk indices:
If the process is not stable, Cp and Cpk are meaningless statistics.
Not all processes can be assumed to be normally distributed. A naive user may incorrectly assess the fraction of process output that will be out of specification.
Cp and Cpk do not yield the same information about a process.
Both Cp and Cpk are closely tied to traditional 0-1 loss and do not account for losses incurred for being off-target; each measures distance from specifications, not distance from target.
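
The Cpk relation above can be sketched as a short function. The example assumes specification limits placed symmetrically about the design center D, so that the tolerance width T spans D - T/2 to D + T/2; the numbers are hypothetical.

```python
def cpk_index(design_center: float, process_mean: float,
              tolerance_width: float, sigma: float) -> float:
    """Cpk = (1 - K) * Cp, with K = |D - mean| / (T / 2)."""
    if tolerance_width <= 0 or sigma <= 0:
        raise ValueError("tolerance_width and sigma must be positive")
    cp = tolerance_width / (6.0 * sigma)
    k = abs(design_center - process_mean) / (tolerance_width / 2.0)
    return (1.0 - k) * cp


# Hypothetical process: design center 10.0, tolerance width 1.2, sigma 0.1.
print(cpk_index(10.0, 10.00, 1.2, 0.1))  # centered process: Cpk equals Cp
print(cpk_index(10.0, 10.15, 1.2, 0.1))  # off-center process: Cpk falls below Cp
```

Under the symmetric-limit assumption this is the same quantity as the distance from the process mean to the nearer specification limit divided by three standard deviations.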

References
Gitlow, H., S. Gitlow, A. Oppenheim, R. Oppenheim, Tools and Methods for the Improvement of Quality, Boston, MA:IRWIN, 1989, pp. 451-457.
Kane, V. E., "Process Capability Indices," Journal of Quality Technology, Vol. 18, No. 1, January 1986, pp. 41-52.
O'Connor, P., Practical Reliability Engineering, Third Edition, New York:John Wiley & Sons, 1991, pp. 302-303.
Sullivan, L., "Reducing Variability: A New Approach to Quality," Quality Progress, July 1985, pp. 15-21.
The Memory Jogger, Second Edition, Methuen, MA:GOAL/QPC, 1988, pp. 64-68.


Applicable Tool AT11: Quality Function Deployment (QFD)


Quality Function Deployment (QFD) is a disciplined approach to engineering and process planning that projects customer requirements through all phases of the equipment life cycle. QFD is also known as the "house of quality" because the matrix used in its implementation resembles a house.

[Figure: QFD "house of quality" icon. Labeled regions: Correlation Matrix (roof), How?, What?, Priority, Relationship Matrix, Importance Ratings, How much?]

The focus of QFD is almost entirely on the customer; that is, the voice of the customer. The attitude promoted by QFD is one of problem avoidance rather than problem solving. QFD is best used in a team or group context. The information required to complete a QFD matrix is usually found in many different disciplines or skill sets. The information needed stretches from a few simple (but presumably accurate) statements of customer needs all the way to the most detailed manufacturing process description. Therefore, it is not a methodology that can be effectively used by a single person.

Advantages of QFD include:
• Promoting careful planning of the equipment through all life cycle phases in such a way that attention is paid to customer needs
• Eliminating spurious engineering and process requirements; that is, those that have no role in meeting customer needs
• Shortening the time it takes to move from the concept and feasibility phase to the production and operation phases by avoiding later life cycle changes that stretch out the cycle time
• Identifying problem areas early, exposing areas for improvement, and providing documentation for these activities

Difficulties with QFD include:
• Being semi-quantitative, QFD does not replace good engineering judgement and good sense
• An inability to compensate for an inaccurate or incomplete list of customer needs
• Not being designed to promote innovation in the sense of new or radical product ideas
• Requiring the use of a wide variety of expertise and a team environment

The basics of the QFD matrix are simple, although in practice it is a great deal of work to collect the information necessary to create the matrix. Generally the QFD matrix consists of seven parts:
• What?
• How?
• Relationship matrix
• Priority
• Correlation matrix
• Importance ratings
• How much?

What? is a collection of simple statements of customer wants, needs, or requirements; that is, the voice of the customer. These statements are easy for the customer to identify with and to understand. They accurately and simply list the group of characteristics or properties that make the customer happy.

How? is a list of engineering, design, and technical properties that are necessary to develop the equipment. The What? list becomes the titles for the QFD matrix rows, and the How? list becomes the titles for the columns (see the icon at the beginning of the QFD discussion).

The relationship matrix is used to relate the What? rows to the How? columns. A relevance number or symbol is assigned to the intersections of the rows and columns. This establishes the relationship between what the customer wants and how the equipment is going to meet those wants.

Usually an extra column, called priority, is placed just to the left of the relationship matrix. It is used to assign importance weights to the customer wants; that is, to determine which of the customer wants are the most important to the customer. This determines which characteristics will get the most focus. The determination is made with the customer, or at least with some very good knowledge of what the customer wants.


Engineering, design, and technical properties are not independent of one another. Therefore, it is necessary to examine how they relate to one another. This results in the roof of the house of quality, which is the correlation matrix. It is also necessary to determine whether the properties are correlated positively or negatively. An example of negatively correlated properties would be strength and flexibility.

The matrix is usually expanded further to include the importance ratings and the How much? column. The importance ratings contain numbers derived from the matrix values and the priority column. They are used to indicate the importance of each of the properties with respect to the customer wants. The How much? column contains the target values for every property listed in the How? column. It answers the question, "How much is enough?"
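One way to derive importance ratings from the relationship matrix and the priority column is a weighted column sum. The sketch below assumes that rule and a 9/3/1/0 relevance scale, which is a common QFD convention but is not specified in this guideline. The customer wants, properties, and all numbers are hypothetical.

```python
def importance_ratings(priorities, relationship):
    """Weighted column sums: rating[j] = sum over rows i of priorities[i] * relationship[i][j]."""
    n_hows = len(relationship[0])
    return [
        sum(priority * row[j] for priority, row in zip(priorities, relationship))
        for j in range(n_hows)
    ]


# Hypothetical What? rows (customer wants) with their priority weights.
whats = ["high uptime", "fast wafer transfer", "easy maintenance"]
priorities = [5, 3, 1]

# Hypothetical How? columns (technical properties).
hows = ["robot MTBF", "transfer cycle time", "module access time"]

# Relationship matrix: rows follow whats, columns follow hows (9 strong, 3 medium, 1 weak, 0 none).
relationship = [
    [9, 3, 0],
    [1, 9, 3],
    [0, 3, 9],
]

ratings = importance_ratings(priorities, relationship)  # [48, 45, 18]
```

In this made-up case the first two properties dominate because they relate strongly to the highest-priority wants, which is the kind of focus the importance ratings row is meant to reveal.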

References
Akao, Y., editor, Quality Function Deployment: Integrated Customer Requirements into Product Design, Norwalk, CT: Productive Press, 1990.
Hauser, J., D. Clausing, "The House of Quality," Harvard Business Review, May-June 1988, pp. 63-73.
Ryan, N., editor, Taguchi Methods and QFD: Hows and Whys for Management, Dearborn, MI: ASI Press, 1988, pp. 63-110.


Applicable Tool AT12: Reliability Analysis and Modeling


The Reliability Analysis and Modeling Program (RAMP) has been developed by SETEC in support of SEMATECH. The RAMP software can be used to assist the system analyst or designer in the construction of a system reliability model for equipment used in semiconductor manufacturing. A system model provides the analyst with useful information in many different forms, including the following:
• An evaluation of design alternatives prior to cutting metal
• An estimate of the system reliability (or MTBF)
• A quantification of the uncertainty in the estimate
• An identification of major contributors to system failure
• A rationale for allocating available resources to improve the performance of the system

Modeling produces its maximum economic benefit when performed during the design phase of the equipment life cycle. However, modeling can also provide economic benefits when applied to existing equipment. The development of a system model depends heavily on the user's understanding of the equipment that is being modeled. However, proper utilization of the model also requires the analyst to have a working knowledge of several concepts in the areas of statistics, probability, and reliability.

Version 1.0 of RAMP provides the capability for developing, editing, and evaluating reliability models for equipment used in semiconductor manufacturing. This capability is supported by an integrated data management system and an integrated graphics output capability. The following features were included to make the software as user friendly as possible:
• Menu driven. All options available to the user can be accessed from on-screen menus.
• Help screens. Context-sensitive help is available to the user at all times.
• Mouse support. Mouse support is provided on all screens where use of the mouse significantly improves the user interface.
• Graphics output. Graphics output is fully integrated into the software.
• Modular design. The design of the software package is modular to allow easy modification or addition of capabilities.
• Integrated data management. Management of component data is fully integrated into the software.
• File management. Management of file names and file identification is transparent to the user.
[Figure: block diagram of series-connected blocks; block labels as extracted: WHS-TC-VS, WHS-ROBTARM, WHS-ROBSERV, WHS-ROB WSEN, WHS-ROBELEC, WHS--ELEC PS, WHS-ELEC CIB]

Figure 4-1. A Block Model Developed in RAMP for the SETEC Generic Wafer Handler System

A system model for the equipment is easily developed in RAMP in the form of a block diagram. Figure 4-1 gives an example of a block model representation of a SETEC generic wafer handler as developed in RAMP by the analyst. The system is represented with 14 components in series (7 of which are shown in Figure 4-1). Component failure rate information, including a characterization of the uncertainty, is entered into the component data library in RAMP. RAMP converts the block diagram model in Figure 4-1 to a mathematical equation and uses random selection techniques to sample the component failure rates from the component data library.

The output from RAMP provides complete sensitivity and uncertainty analysis results for various performance measures that are associated with a reliability analysis of the system being modeled, including:
• System MTBF. The MTBF of the modeled system. A range of values for the MTBF and the distribution associated with that range are provided.
• Component contribution to system failure. The fractional contribution that a component makes to the failure of the system.
• Component contribution to subsystem failure. The fractional contribution that a component makes to the failure of the subsystem.
• Subsystem contribution to system failure. The fractional contribution that a subsystem makes to the failure of the system.
• Reliability improvement. The value of reliability improvement for a component is the system MTBF (in hours) that would result if the failure rate for that component were zero (that is, the component were perfectly reliable or nearly so).
• Uncertainty importance. Uncertainty importance provides a measure of the contribution of a component to the uncertainty in the probability of system failure.
Results produced by RAMP are available in various types of displays that include:
• Histograms. A histogram is a graphical presentation of sample data using classes (that is, intervals) on the x axis and relative frequency on the y axis.
• Cumulative distribution functions (CDFs). A CDF is a graph of the cumulative relative frequency (cumulative fraction) of observations less than or equal to a given value.
• Pareto diagrams. A Pareto diagram is a bar chart with the displayed values ordered from the largest to the smallest. RAMP orders displayed values based on the mean. The 5th and 95th percentiles are also displayed when they are available.
• Summary statistics. A written list of all the statistics calculated by RAMP is displayed, such as the average MTBF, standard deviation for MTBF, and selected quantiles of the uncertainty distribution for MTBF.
• Input samples. This option allows an analyst to view or print input failure rates as sampled from component failure rate distributions.
• Output results from samples. This option allows the analyst to view or print the numerical results that are calculated for each of the sampled failure rates.
• Statistical results. This option allows an analyst to view or print selected statistical results, such as the mean value for all components.

Based on the characterization of the failure rates in the component data library for the SETEC generic wafer handler system shown in Figure 4-1, the summary statistics produced by RAMP give a mean value for MTBF of 93 hrs, with about a 5 percent chance of being less than 50 hrs and a 5 percent chance of exceeding 178 hrs. A graph of the estimated cumulative distribution function for MTBF that is produced by RAMP is given in Figure 4-2.
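The general idea of sampling component failure rates and propagating them through a series model can be illustrated with a short Monte Carlo sketch. This is not RAMP's algorithm, which is not described in this guideline. The sketch assumes constant failure rates (so series failure rates add), lognormal uncertainty on each rate, and hypothetical component names and values.

```python
import math
import random
import statistics


def simulate_series_mtbf(components, n_samples=5000, seed=1):
    """components maps name -> (median failure rate per hour, sigma of the log rate)."""
    rng = random.Random(seed)
    mtbf_samples = []
    contribution_sums = {name: 0.0 for name in components}
    for _ in range(n_samples):
        # Draw one failure rate per component from its assumed lognormal distribution.
        rates = {
            name: median * math.exp(rng.gauss(0.0, sigma))
            for name, (median, sigma) in components.items()
        }
        system_rate = sum(rates.values())  # series system with constant failure rates
        mtbf_samples.append(1.0 / system_rate)
        for name, rate in rates.items():
            contribution_sums[name] += rate / system_rate
    mtbf_samples.sort()
    summary = {
        "mean": statistics.fmean(mtbf_samples),
        "p05": mtbf_samples[int(0.05 * n_samples)],
        "p95": mtbf_samples[int(0.95 * n_samples)],
    }
    # Average fractional contribution of each component to the system failure rate.
    contributions = {name: total / n_samples for name, total in contribution_sums.items()}
    return summary, contributions


# Hypothetical three-component series system (values are illustrative only).
example_components = {
    "robot_servo": (0.004, 0.5),
    "wafer_sensor": (0.003, 0.5),
    "elevator_door": (0.002, 0.5),
}
summary, contributions = simulate_series_mtbf(example_components)
```

The `summary` dictionary corresponds loosely to the mean and the 5th and 95th percentiles reported for MTBF, and sorting `contributions` from largest to smallest gives the ordering used in a Pareto diagram.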


Figure 4-2. An Estimate of the Cumulative Distribution Function for MTBF

The Pareto diagram in Figure 4-3 identifies the components that are the dominant contributors to the failure of the system, such as the robot servo, robot wafer sensor, elevator door, and sensor amplifiers. The Pareto diagram uses three horizontal bars for each component name rather than the usual one bar. This is done to display the uncertainty associated with the contribution of each component to system failure. The three bars represent the 95th percentile, the mean, and the 5th percentile of the distribution of the component's contribution to system failure.

Now assume that the engineers involved with the SETEC generic wafer handler have developed a new and improved elevator that improves its MTBF by a factor of 2. The component data library is modified to reflect the new MTBF for the elevator. In addition, the engineers would like to evaluate the impact on system reliability of a design change that would incorporate redundancy by adding another robot wafer sensor in parallel. Because the sensors are in parallel, they must both fail before they cause the system to fail, thus improving the system MTBF. The block diagram model is modified to include this desired design change. The modified block diagram is shown in Figure 4-4.
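The benefit of a redundant pair can be sketched numerically. The example below assumes two identical, independent sensors with a constant failure rate and no repair; these are simplifying assumptions for illustration and not necessarily how RAMP treats the parallel blocks. The function names and the failure rate are hypothetical.

```python
import math


def reliability(failure_rate, hours):
    """Survival probability of one component with a constant failure rate."""
    return math.exp(-failure_rate * hours)


def parallel_pair_reliability(failure_rate, hours):
    """Two identical, independent components; the pair fails only if both fail."""
    q = 1.0 - reliability(failure_rate, hours)
    return 1.0 - q * q


def parallel_pair_mttf(failure_rate):
    """Mean time to failure of the pair without repair: 1/rate + 1/(2 * rate)."""
    return 1.5 / failure_rate


sensor_rate = 0.001  # hypothetical failures per hour
single = reliability(sensor_rate, 100.0)
paired = parallel_pair_reliability(sensor_rate, 100.0)
```

Under these assumptions the pair's mean time to failure is 1.5 times that of a single sensor, and its survival probability over any interval is higher than that of a single sensor.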


Figure 4-3. A Pareto Diagram for Component Contribution to System Failure


[Figure: block diagram as in Figure 4-1 with an added block labeled WHS-ROB WSENP in parallel with the robot wafer sensor block; block labels as extracted: WHS-TC-VS, WHS-ROBTARM, WHS-ROBSERV, WHS-ROB WSEN, WHS-ROB WSENP, WHS-ROBELEC, WHS--ELEC PS, WHS-ELEC CIB]

Figure 4-4. A Revised Block Diagram for the SETEC Generic Wafer Handler System, Showing the Addition of the Redundant Wafer Sensor

After these modifications, the summary statistics produced by RAMP give a mean value for MTBF of 137 hrs, an increase of 47 percent. There is approximately a 5 percent chance of the MTBF being less than 64 hrs and a 5 percent chance of it exceeding 249 hrs. A graph of the estimated cumulative distribution function for MTBF that is produced by RAMP is given in Figure 4-5.


Figure 4-5. An Estimate of the Cumulative Distribution Function for MTBF after Modifying the Generic Wafer Handler System

The new Pareto diagram is given in Figure 4-6 and shows that the wafer sensor is no longer a problem and has dropped out of the top ten list of components contributing to system failure. In addition, the elevator door has now dropped behind the sensor amplifiers in the rankings.


Figure 4-6. A Pareto Diagram for Component Contribution to System Failure after Modifying the Generic Wafer System

This example has illustrated how RAMP provides a prediction of the system MTBF (including the uncertainty in the prediction) after making two improvements in the system. Thus, modeling has provided a tool for adopting a proactive position rather than a reactive position with respect to making changes in the system to improve its reliability. That is, before committing the company's resources, the analyst has a good idea of how the proposed changes will affect the performance of the system and knows where to expend those resources to provide an even greater improvement. This simple example provided a flavor of how RAMP works and demonstrated the usefulness of modeling. Modeling alone does not make a system reliable, but it does provide an organized means of understanding the system as well as a tool to guide the wise expenditure of resources for improved reliability.

References
Campbell, J., R. Iman, D. Longsine, B. Thompson, A Tutorial on Reliability Modeling Using RAMP, SETEC91-030, Albuquerque, NM: Sandia National Laboratories.
Campbell, J., B. Thompson, D. Longsine, P. O'Connell, R. Iman, RAMP User's Reference Manual, SETEC91-030, Albuquerque, NM: Sandia National Laboratories.


Applicable Tool AT13: Reliability Development/Growth Testing


The purpose of the Reliability Development/Growth Test (RD/GT) is to improve the reliability of equipment through a disciplined process of systematic and permanent removal of failure mechanisms and design weaknesses. RD/GT is a closed loop reliability improvement process conducted under simulated or actual usage environments based on operational requirements and mission profiles of the operational environment. The steps of RD/GT are to:
• Induce failures
• Detect the failures
• Determine the cause of the failures
• Identify corrective actions to correct the failures
• Implement effective corrective actions
• Test to verify that the failure causes have been removed

Corrective action encompasses redesign, part and material changes, and changes in the design and manufacturing processes. The reliability of the equipment is improved by systematically implementing corrective action, which results in significantly higher reliability in the field. The rate at which reliability grows depends on how effectively and rapidly failure modes can be identified, corrected, and then verified by retest.

Candidates for RD/GT include high risk and mission critical items. High risk items usually represent designs utilizing new or state-of-the-art technology. Other candidates include items that are major contributors to the overall equipment reliability, items that are high in cost, and items that experience suggests need reliability improvement.

There are several things that should be mentioned about RD/GT. The first is that RD/GT is only as effective as the ability of the implemented process to detect and correct problems as they occur. Unless problems are identified and fixes are implemented and verified, no reliability growth will occur. Reliability improvement results from fixes that eliminate failure sources discovered through the analysis of test data. Reliability improvement is a function of design and manufacturing process improvement, not just test time. Implementing an RD/GT that merely tests equipment and repairs failures will not result in reliability improvement.

The second is that RD/GT will not effectively increase the reliability of an item that has a low initial design reliability. If initial reliability is too low (due mainly to inadequate design), the item will require an unrealistic reliability improvement or growth rate in order to reach an acceptable level of reliability. This will be reflected in the requested amount of test time and program cost.

RD/GT is an engineering task designed to improve design reliability. Monitoring, tracking, and assessing the results of an RD/GT gives management insight into the efficiency of the process, and provides a tool for evaluating development status and reallocating resources when necessary to achieve the proper growth rate.
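The link between initial reliability, growth rate, and required test time can be illustrated with the Duane growth model. This guideline does not name a growth model, so the choice of Duane here is an assumption made only for illustration, and the function name and numbers are hypothetical. In the Duane model the cumulative MTBF grows as a power of cumulative test time, and the instantaneous MTBF equals the cumulative MTBF divided by (1 - growth rate).

```python
def duane_test_time(initial_mtbf, initial_test_hours, goal_mtbf, growth_rate):
    """Cumulative test hours needed for the instantaneous MTBF to reach goal_mtbf.

    Assumed Duane model:
      cumulative MTBF(t) = initial_mtbf * (t / initial_test_hours) ** growth_rate
      instantaneous MTBF(t) = cumulative MTBF(t) / (1 - growth_rate)
    """
    if not 0.0 < growth_rate < 1.0:
        raise ValueError("growth_rate must be between 0 and 1")
    required_cumulative = goal_mtbf * (1.0 - growth_rate)
    return initial_test_hours * (required_cumulative / initial_mtbf) ** (1.0 / growth_rate)


# Hypothetical comparison: same goal and growth rate, different starting points.
hours_from_50 = duane_test_time(initial_mtbf=50.0, initial_test_hours=100.0,
                                goal_mtbf=200.0, growth_rate=0.4)
hours_from_25 = duane_test_time(initial_mtbf=25.0, initial_test_hours=100.0,
                                goal_mtbf=200.0, growth_rate=0.4)
```

Under this assumed model, halving the initial MTBF multiplies the required test time several times over, which is one way to see why a low initial design reliability shows up as an unrealistic test time and program cost.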

References
Arsenault, J., J. Roberts, Reliability & Maintainability of Electronic Systems, Potomac, MD: Computer Science Press, Inc., 1980, pp. 344-353.
MIL-HDBK-338-1A, Electronic Reliability Design Handbook, Volume I of II, Irvine, CA 92714: Global Engineering Documents, 12 October 1988, pp. 8-68 to 8-90.
RMS Committee, RMS Reliability, Maintainability & Supportability Guidebook, SAE G-11, Warrendale, PA: Society of Automotive Engineers, Inc., 1990, pp. 149-158.


Applicable Tool AT14: Reliability Qualification Testing (RQT)


Reliability Qualification Testing (RQT) is used to verify that the equipment will meet design goals and comply with contractual/program objectives. The test is performed under specified environmental conditions, and pass/fail criteria are established prior to a production decision. Usually, the qualification test is performed on the overall equipment. However, some sublevel testing may be performed on a few critical items.

This test involves a customer review of the results of the supplier's qualification test to ensure that a valid statistical representation of the projected equipment performance has been achieved. It also involves periodically pulling equipment out of the normal manufacturing cycle, that is, out of production, and performing extended qualification tests on it. This provides performance benchmarks and a means of assuring that engineering changes are meeting design goals. A major aspect of this test is the final and acceptance testing that is carried out on each piece of equipment. It is important to note that when testing plans are developed, they should be upgraded as the equipment passes through the various life cycle phases.

Final Test. The process of equipment qualification and acceptance is generally accomplished in two parts. The first occurs after final test on the production floor. The equipment qualification is often referred to as final test. If properly planned, the customer's source inspection can be included at the same time. The detail involved in source inspections varies widely depending on customer requirements, but all aspects of source inspections are normally covered during final test. Once the equipment is operating properly, the reliability test starts. This involves multiple, repeated cycling of equipment functions and subsystems. This test can also help bring infant mortality failures to the surface before the equipment is delivered to a customer.

Acceptance Test. The next phase is acceptance testing. This occurs after equipment installation at the customer's site. The engineer starts by verifying that the equipment still passes tests similar to those performed during equipment qualification to check for shipping and installation damage. The focus then shifts to process capability. This is where the second half of the equipment characterization takes place. Once the equipment is fully characterized with an optimal or pre-determined process, the reliability testing begins again. This focuses on process stability, repeatability, and performance. The tests can collect concurrent equipment reliability data because this testing generally involves many operational cycles of the equipment.

References
Ireson, W., C. Coombs, Jr., Handbook of Reliability Engineering and Management, NY: McGraw-Hill, 1988, pp. 8.1-8.39.
RADC Reliability Engineer's Toolkit: An Application Oriented Guide for the Practicing Reliability Engineer, Griffiss Air Force Base, NY: Systems Reliability and Engineering Division, Rome Air Development Center, July 1988, p. 101.
RMS Committee, RMS Reliability, Maintainability & Supportability Guidebook, SAE G-11, Warrendale, PA: Society of Automotive Engineers, Inc., 1990, pp. 211-217.


Applicable Tool AT15: Reliability Block Diagram Modeling (RBD)

[Figure: reliability block diagram icon with blocks SS1, SS2, and SS3]

Reliability Block Diagram (RBD) models are one of the tools that can be used to create a reliability model of equipment. One of the easiest ways to describe the basic ideas used in the creation of RBD models is to create a simple RBD; for a more detailed description of the diagrams, see the sources listed in the references.

Construction of a reliability block diagram begins by defining what is meant by equipment failure; for example, equipment failure may be defined as any failure that causes the equipment to be down for 8 minutes or longer. Once this is done, the next step is to determine the various ways that this failure can occur. This is initially done at a gross level; that is, 10 to 20 subsystems are defined that can lead to equipment failure. A block diagram model that consists of 3 subsystems (SS1, SS2, and SS3) follows:

[Figure: block diagram with SS1, SS2, and SS3 connected in series]

In this example SS2 is not a significant contributor to the unreliability of the equipment, so it will not be broken into any more detail. SS1 and SS3, however, are contributors to equipment unreliability. SS1 fails if component 1 or 2 (C1 or C2) fails. SS3 fails if components 3, 4, and 5 (C3, C4, and C5) all fail. The block diagram model now looks like:

[Figure: block diagram with C1 and C2 in series, followed by SS2, followed by C3, C4, and C5 in parallel]

Further analysis reveals that C2 fails if parts 1 and 2 (P1 and P2) both fail. C4 fails if part 3 or part 4 (P3 or P4) fails. The block diagram model now looks like:

[Figure: block diagram with C1 in series with the parallel pair P1 and P2, followed by SS2, followed by a parallel group made up of C3, the series pair P3 and P4, and C5]

Once construction of the model is complete, it is translated into a Boolean equation, which is then used to quantify the equipment reliability. The references discuss Boolean algebra in detail; it will not be discussed here. In the equations below, + denotes OR and * denotes AND. The Boolean equation for the RBD is:

Equipment Failure = C1 + P1 * P2 + SS2 + [C3 * (P3 + P4) * C5]

Expanding and using the associative and distributive laws,

Equipment Failure = C1 + P1 * P2 + SS2 + C3 * P3 * C5 + C3 * P4 * C5.

Each of the terms in this equation represents a way that the equipment can fail. For example, if part 1 and part 2 (P1 and P2) fail, the equipment fails.

Now that the reliability block diagram has been translated into an equation, the next step is to quantify the probability that the equipment fails as a function of its subsystems, components, and parts. Often the term probability is used when what is really meant is frequency. Probabilities must lie between 0 and 1, whereas a frequency can be any number greater than or equal to 0, depending on the number of failures and the time scale used. For example, if a component fails twice per year, its frequency is 2/yr, or approximately 0.17/mo.

Using the previous example, the probability of equipment failure can be written,

P(Equipment Failure) = P(C1 + P1 * P2 + SS2 + C3 * P3 * C5 + C3 * P4 * C5).

But how does one deal with the right-hand side of the equation? Considering the basic laws of probability and the small probability approximation, and assuming that the events are independent, the example equation becomes:

P(Equipment Failure) ≈ P(C1) + P(P1)*P(P2) + P(SS2) + P(C3)*P(P3)*P(C5) + P(C3)*P(P4)*P(C5).
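The quality of the small probability approximation for this example can be checked numerically. The sketch below evaluates the approximate sum of terms and compares it with the exact probability, which for independent events can be built directly from the unexpanded Boolean equation. The failure probabilities are hypothetical values chosen only for illustration.

```python
def failure_probability_approx(p):
    """Small probability approximation: sum of the terms of the expanded equation."""
    return (p["C1"] + p["P1"] * p["P2"] + p["SS2"]
            + p["C3"] * p["P3"] * p["C5"]
            + p["C3"] * p["P4"] * p["C5"])


def failure_probability_exact(p):
    """Exact probability for independent events, from the unexpanded equation."""
    c2 = p["P1"] * p["P2"]                       # C2 fails only if P1 AND P2 fail
    c4 = 1.0 - (1.0 - p["P3"]) * (1.0 - p["P4"])  # C4 fails if P3 OR P4 fails
    ss3 = p["C3"] * c4 * p["C5"]                 # SS3 fails only if C3, C4, and C5 all fail
    # The equipment survives only if C1, C2, SS2, and SS3 all survive.
    survive = (1.0 - p["C1"]) * (1.0 - c2) * (1.0 - p["SS2"]) * (1.0 - ss3)
    return 1.0 - survive


# Hypothetical failure probabilities over some fixed mission time.
example = {"C1": 0.01, "P1": 0.05, "P2": 0.05, "SS2": 0.002,
           "C3": 0.1, "P3": 0.02, "P4": 0.03, "C5": 0.1}

approx = failure_probability_approx(example)  # about 0.0150
exact = failure_probability_exact(example)    # slightly smaller than approx
```

For small probabilities such as these, the approximation slightly overstates the exact value, so it errs on the conservative side. The gap widens as the individual probabilities grow.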


References
Campbell, J., R. Iman, D. Longsine, B. Thompson, A Tutorial on Reliability Modeling Using RAMP, Albuquerque, NM: Sandia National Laboratories, SETEC91-030, pp. 9-31.
Klinger, D., Y. Nakada, M. Menendez, AT&T Reliability Manual, New York: Van Nostrand Reinhold, 1990, pp. 78-91.
MIL-STD-756B, Reliability Modeling and Prediction, Washington, DC: Department of Defense, 18 November 1981, pp. 1001-1 to 1001-11.


Applicable Tool AT16: Repairable Systems Analysis


Repairable systems analysis is a method that can be used to estimate the number of repairs that occurred per piece of equipment versus the equipment's age. Age is used to mean any appropriate measure of equipment usage such as days, hours, or cycles. Repairable systems analysis is also used to:
• Evaluate whether a repair rate increases or decreases with equipment age, which is useful for equipment retirement and burn-in decisions
• Compare different equipment designs, production periods, maintenance policies, environments, and operating conditions
• Predict the future number of equipment repairs
• Reveal unexpected information and insight into component repairs

It is important to mention that time between repair data on a piece of equipment is analyzed differently than time between failure data. The use of failure rate parameters is generally not meaningful for single equipment repair data, especially if the reliability of the equipment is increasing or decreasing. Statistical tests, graphical procedures, or both are available for determining if the failure rate is increasing, decreasing, or staying constant. If the times between repairs on a piece of equipment are gradually getting longer, one could reasonably assume that the reliability is improving. On the other hand, if these times are decreasing, one assumes that the reliability is decreasing. When standard methods for estimating the failure rate are used, this lengthening or shortening of the time between repairs is often overlooked. The methodology associated with this tool detects trends of this type.
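One commonly used statistical test for such trends is the Laplace trend test. It is not named in this guideline and is offered here only as an illustrative sketch. It assumes a single piece of equipment observed from age 0 to a fixed end age, with the age at each repair recorded. The function name and the data are hypothetical.

```python
import math


def laplace_trend_statistic(repair_ages, observation_end):
    """Laplace statistic for repairs observed on one system from age 0 to observation_end.

    Values near 0 suggest no trend. Clearly negative values suggest repairs are
    concentrated early (times between repairs are lengthening), and clearly
    positive values suggest repairs are concentrated late.
    """
    n = len(repair_ages)
    if n == 0:
        raise ValueError("at least one repair is required")
    mean_age = sum(repair_ages) / n
    return (mean_age - observation_end / 2.0) / (observation_end * math.sqrt(1.0 / (12.0 * n)))


# Hypothetical repair ages (hours) for two machines, each observed for 400 hours.
improving = [10, 30, 70, 150, 310]        # gaps between repairs grow
deteriorating = [90, 250, 330, 370, 390]  # gaps between repairs shrink

u_improving = laplace_trend_statistic(improving, 400.0)
u_deteriorating = laplace_trend_statistic(deteriorating, 400.0)
```

When there is no trend, the statistic is approximately standard normal, so its size can be compared with normal percentiles. With only a handful of repairs, as in this example, the result should be read as an indication rather than firm evidence.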

References
Ascher, H., H. Feingold, Repairable Systems Reliability: Modeling, Inference, Misconceptions and Their Causes, New York: Marcel Dekker, Inc., 1984.
Nelson, W., "Graphical Analysis of System Repair Data," Journal of Quality Technology, Vol. 20, No. 1, Jan. 1988, pp. 24-35.


Applicable Tool AT17: Taguchi Methods


The following paragraphs introduce some of the basic elements of Dr. Genichi Taguchi's quality methodology. Even though these elements are directed at quality, they apply equally well to reliability. Taguchi's methods are in part philosophical and in part methodological. The methodological component, which consists of Taguchi's use of statistical concepts and tools, is the subject of heated controversy. However, the heart of his message has more to do with his conceptual framework for the process of quality improvement, and nearly all practitioners accept Taguchi's central philosophical ideas.

There are seven points that explain some of the basic elements of Taguchi's philosophy:
1. The quality of a manufactured product is measured by the total loss created by that product to society.
2. In a competitive economy, continuous quality improvement and cost reduction are necessary for staying in business.
3. Quality improvement requires the never-ending reduction of variation in product and process performance around desired values.
4. Society's loss due to performance variation is frequently proportional to the square of the deviation of the performance characteristic from its target value.
5. The final quality and cost of a manufactured product are determined to a large extent by the engineering designs of the product and its manufacturing process.
6. Performance variation can be reduced by exploiting the nonlinear effects between a product's and/or process's parameters and the product's desired performance characteristics.
7. Statistically planned experiments can be used to identify the settings of product (and process) parameters that reduce performance variation.

These points are discussed in more detail below.

1. The quality of a manufactured product is measured by the total loss created by that product to society.

According to Taguchi, "Quality is the loss imparted to society from the time a product is shipped." This view of quality includes customers, manufacturers, and the community in the definition of quality. According to this perspective, quality improvement saves society more resources than it costs, and it benefits everyone: customers, manufacturers, and the community. This is a new way to think of investments in quality improvement. A quality improvement project is justified as long as the resulting savings to customers are more than the cost of improvements.

2. In a competitive economy, continuous quality improvement and cost reduction are necessary for staying in business.

In a competitive economy, a company that does not earn a reasonable profit cannot survive for long. A sure way of increasing market share is to provide high quality products at a low price, which is what customers want. Thus, companies that are determined to stay in business use high quality and low cost as their competitive strategy. Such companies also realize that the quality of their product is never good enough and manufacturing costs are never low enough.

3. Quality improvement requires never-ending reduction of variation in product and process performance around desired values.

The quality of a product cannot be improved unless the quality characteristics of that product can be identified and measured and the ideal values are known. Each quality characteristic varies from unit to unit and over time. The objective of a continuous quality improvement process is to reduce this variation; that is, make the quality characteristics as close to their ideal values as possible. However, it is generally not economical or necessary to improve all quality characteristics since not all characteristics are of equal importance.

Performance characteristics are defined as those characteristics that determine the product's performance in satisfying the customer's requirements. The ideal value is called the target value. If a product is of high quality, the performance characteristics remain close to their target values under all operating conditions. The variation of a performance characteristic about its target value is referred to as performance variation. The smaller the performance variation about the target value, the better the quality.

Target specifications are typically stated in terms of nominal values and tolerances about these values. It is not acceptable to state target values in terms of interval specifications only. This leads to the idea that it is okay to be anywhere within the interval and that magically the performance characteristics deteriorate when they move out of the interval. The goal is for the performance characteristics to always be at their target values.

4. Society's loss due to performance variation is frequently proportional to the square of the deviation of the performance characteristic from its target value.

Any variation in a product's performance characteristic about its target value causes a loss to society. This loss can range from inconvenience to monetary loss and physical harm.
Variation is represented mathematically in the following manner. Let be a performance characteristic measured on a continuous scale and let the target value of be . Let () represent dollar losses suffered by society at some time during the products life span due to the deviation of from . Generally, the larger the deviation of the performance characteristic from it target value , the larger the loss to society, (). However, it is usually difficult to determine the actual mathematical form of (). Often, a quadratic approximation to () adequately represents economic losses due to the deviation of from . The simplest quadratic loss function is () = k(-)2, where k is some unknown constant that can be determined when () is known for a particular value of . There are three cases of the loss function that are typically used: when a specific target value is the best and the loss increases symmetrically as the performance characteristic deviates from the target when the smaller is better, for example, if the performance characteristic is the amount of impurity and the target value is zero; here the smaller the impurity, the better it is when the larger the better, for example, if the performance characteristic is strength; here the larger the strength the better it is The average loss to society due to performance variation is obtained by "statistically averaging" the quadratic loss () = k(-)2 associated with the possible values of . In the case of quadratic loss functions, the average loss due to performance variation is
Technology Transfer # 92031014A-GEN

SEMATECH

140 proportional to the mean squared error of about its targeted value . Therefore the fundamental measure of variability is the mean squared error and not the variance. The concept of quadratic loss emphasizes the importance of continuously reducing performance variation. 5. The final quality and cost of a manufactured product are determined to a large extent by the engineering design of the product and its manufacturing process. The number of manufacturing imperfections in a product, hence the manufacturing cost of a product, is significantly affected by the products design and the design of the process used to produce the product. Generally, a products field performance is affected by environmental variables as well as human variations in operating the product, product deterioration, and manufacturing imperfections. Note that these sources of variation are chronic problems. Manufacturing imperfections are the deviations of the actual parameters of a manufactured product from their nominal values. These imperfections are caused by inevitable uncertainties in a manufacturing process and are responsible for performance variation across different units of a product. Dealing with variations due to environmental factors and product deterioration can be done only in the products concept and design phases. The manufacturing costs and imperfections in a product are largely determined by the design of the manufacturing process. Increasing process controls can reduce manufacturing imperfections; however, process controls cost money. It is, therefore, necessary to reduce both manufacturing imperfections and process controls. Once the process is under statistical control, it can be improved. Without a stable process it is almost impossible to discover a means of reducing variation due to chronic problems. 6. 
Performance variation can be reduced by exploiting the nonlinear effects between a products and/or processs parameters and the products desired performance characteristics. Due to the importance of the product and process design, quality control must begin in the concept phase of the life cycle and continue through all phases. There are two types of quality control methods: Off-line, which are technical aids for quality and cost control in product and process design. These are used to improve product quality and manufacturability, and to reduce product development, manufacturing, and lifetime costs. On-line, which are technical aids for quality and cost control in manufacturing. As with performance characteristics, all specifications of product and process parameters should be stated in terms of ideal values and tolerances around these ideal values. The idea is not to produce products whose parameters are barely inside the tolerance intervals. Such products are likely to be of poor quality due to the interdependencies of the parameters. A product performs best when all parameters of the product are at their ideal values. Further, the knowledge of ideal values of product and process parameters encourages continuous quality improvements. Taguchi has introduced a three-step approach to assign nominal values and tolerances to product and process parameters: System design Parameter design

SEMATECH

Technology Transfer # 92031014A-GEN

141 Tolerance design

System design involves applying scientific and engineering knowledge to produce a basic functional prototype design. The prototype model defines the initial settings of the product or process parameters. System design requires an understanding of both the customer's needs and the manufacturing environment. A product cannot satisfy the customer's needs unless it is designed to do so. Designing for manufacturability requires an understanding of the manufacturing environment.

Parameter design involves identifying the settings of product or process parameters that reduce the sensitivity of engineering designs to the sources of variation. Adjusting the mean value of a performance characteristic to its target value is usually a much easier engineering problem than reducing performance variation. The essence of parameter design is the use of nonlinear effects of product or process parameters on the performance characteristics to reduce the sensitivity of engineering designs to the sources of variation. Because parameter design reduces performance variation by reducing the influence of the sources of variation rather than by controlling them, it is a very cost-effective technique for improving engineering designs. It is economically advantageous for a designer to provide designs that are tolerant to statistical variations.

Tolerance design involves determining tolerances around the nominal settings identified by parameter design. Industry commonly assigns tolerances using convention rather than science. Narrow tolerances increase manufacturing costs, while wide tolerances increase performance variation. Thus, tolerance design is a trade-off between society's loss due to performance variation and the increase in manufacturing costs.

7. Statistically planned experiments can be used to identify the settings of product (and process) parameters that reduce performance variation.

This is the portion of Taguchi's methodology that is subject to criticism. Engineers tend to like Taguchi's statistical methods because he has made a serious effort to develop methods that are easy for a non-statistical expert to use. However, Taguchi's experiments can be enormous and extremely inefficient.

Taguchi's approach to the use of statistically planned experiments for parameter design involves classifying the variables that affect the performance characteristics of a product or process into two categories: design parameters and sources of noise. Design parameters are those product or process parameters whose nominal settings can be chosen by the responsible engineer. These nominal settings define the product or process design specifications and vice versa. The sources of noise are all those variables that cause the performance characteristics to deviate from their target values. The noise factors are those sources of noise that can be systematically varied in a parameter design experiment. The key noise factors, those that represent the major sources of noise affecting a product's performance in the field and a process's performance in the manufacturing environment, should be identified and included in the experiment.
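
The quadratic loss described under principle 4 can be illustrated with a small numerical sketch. The Python below is not part of the original guideline: the function names, the measurement data, and the dollar figure used to calibrate the constant k are all hypothetical. It compares two samples that have the same variance but different means, to show why the mean squared error about the target, rather than the variance, drives the average loss.

```python
from statistics import fmean, pvariance


def loss_constant(known_loss, value_at_known_loss, target):
    """Solve loss = k * (value - target)^2 for k from one known (value, loss) pair."""
    return known_loss / (value_at_known_loss - target) ** 2


def quadratic_loss(value, target, k):
    """Loss for a single unit: k * (value - target)^2."""
    return k * (value - target) ** 2


def average_loss(values, target, k):
    """Average quadratic loss over a sample, i.e. k times the mean squared error."""
    return fmean(quadratic_loss(v, target, k) for v in values)


def mean_squared_error(values, target):
    """Mean squared error about the target: variance + (mean - target)^2."""
    return pvariance(values) + (fmean(values) - target) ** 2


# Hypothetical example: target of 10.0 units, and a unit that is off by 0.5
# is assumed to cost society 50 dollars.
target = 10.0
k = loss_constant(known_loss=50.0, value_at_known_loss=10.5, target=target)

centered = [9.9, 10.0, 10.1, 10.0]   # mean on target
shifted = [10.2, 10.3, 10.4, 10.3]   # same spread, mean off target

for name, sample in (("centered", centered), ("shifted", shifted)):
    print(
        name,
        "variance:", round(pvariance(sample), 4),
        "MSE:", round(mean_squared_error(sample, target), 4),
        "average loss:", round(average_loss(sample, target, k), 2),
    )
```

Both samples have the same variance, yet the shifted sample carries a much larger average loss because its mean is off target. This is the sense in which adjusting the mean and reducing the variation are separate engineering problems.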

References
Barker, T.B., "Quality Engineering By Design: Taguchi's Philosophy," Quality Progress, December 1986, pp. 32-42.
Gitlow, H., S. Gitlow, A. Oppenheim, R. Oppenheim, Tools and Methods for the Improvement of Quality, Boston, MA: IRWIN, 1989, pp. 491-507.
Gunter, B., "A Perspective on the Taguchi Methods," Quality Progress, June 1987, pp. 44-52.
Kackar, R.N., "Taguchi's Quality Philosophy: Analysis and Commentary," Quality Progress, December 1986, pp. 21-29.
Miller, K.L., D. Woodruff, "A Design Master's End Run Around Trial and Error," Business Week/Quality, October 15, 1991, p. 24.
Phadke, M.S., Quality Engineering Using Robust Design, Englewood Cliffs, NJ: Prentice Hall, 1989.
Port, O., J. Carey, "Quality: A Field With Roots That Go Back To The Farm," Business Week/Quality, October 15, 1991, p. 15.
Ross, P.J., Taguchi Techniques for Quality Engineering: Loss Function, Orthogonal Experiments, Parameter and Tolerance Design, New York, NY: McGraw-Hill Book Company, 1988.
Taguchi, G., Introduction To Quality Engineering: Designing Quality into Products and Process, White Plains, NY: Asian Productivity Organization, 1987.

Applicable Tool AT18: User Groups


The reason for creating User Groups is to establish clear and direct communication between the equipment supplier and the equipment user. One of the most effective means of establishing and maintaining user groups is through user group meetings. The user group meetings are structured, working-level meetings where needed equipment improvements are identified and prioritized, problems are solved, and strategic information is shared.

Key factors for successful user group meetings include having:
A sufficient number of both user attendees with "hands-on" knowledge of equipment performance and supplier attendees with design, manufacturing, and field service responsibilities
"Ownership" of the meeting by one individual, or joint ownership by one user person and one supplier person
Plenty of lead time for identification of attendees, surveys, and meeting preparation
A well-paced agenda with enticements that lead to discussion and user participation
"Effective Meetings" skills used by leaders
A comfortable meeting setting (facilities and accommodations)

References
EIP Data Gathering Group, SEMATECH, Austin, TX.
Partnering For Total Quality: A Total Quality Tool Kit, Volume Six, SEMATECH, 1990, pp. 76, 61.

Applicable Tool AT19: Life-Cycle Cost Calculations


Life cycle costs include the initial purchase price and the costs associated with equipment installation and operation over its entire life. They comprise both equipment supplier costs, which are passed on to the customer in the purchase price of the equipment, and all costs incurred by the customer over the life of the equipment.

Supplier costs plus the supplier's profit margin are referred to as acquisition costs, and include:
Research and development
Marketing and sales
Testing and manufacturing
Supplier shipping and installation
Supplier training and support
Supplier service and spare parts
Warranty costs
Continuous improvement

Costs incurred by the customer are referred to as operational costs, and include:
Customer installation and training
Operating costs
Customer service costs and spares inventory
Customer-performed maintenance
Customer space costs
Scheduled maintenance
Equipment improvements and upgrades
Down time and scrap costs
Disposal costs

Life cycle costs can be calculated manually by summing all the expected costs and then normalizing the total by production units, such as the number of wafers expected to be produced over the life of the equipment. If the equipment life is very long (more than 3 years), the present (discounted) values of the costs occurring in the later years should be used rather than the face values in those years. If RAMP software (AT12) or the SEMATECH Cost Of Ownership (COO) Model software is available, the life cycle cost calculations can be done using either of those models.
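
The manual calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RAMP or COO model: the function names, the end-of-year timing of costs, the 10% discount rate, and all dollar and wafer figures are hypothetical.

```python
def present_value(cost, year, discount_rate):
    """Discount a cost assumed to fall at the end of `year` back to today."""
    return cost / (1.0 + discount_rate) ** year


def life_cycle_cost_per_unit(acquisition_cost, yearly_operational_costs,
                             units_produced, discount_rate=0.0):
    """Sum acquisition and (discounted) operational costs, then normalize
    by the number of production units, e.g. wafers.

    yearly_operational_costs[0] is the total operational cost of year 1,
    yearly_operational_costs[1] that of year 2, and so on.
    """
    if units_produced <= 0:
        raise ValueError("units_produced must be positive")
    discounted_operations = sum(
        present_value(cost, year, discount_rate)
        for year, cost in enumerate(yearly_operational_costs, start=1)
    )
    return (acquisition_cost + discounted_operations) / units_produced


# Hypothetical equipment: 1,000,000 dollars to acquire, 200,000 dollars per
# year to operate for 5 years, 500,000 wafers produced over its life.
acquisition = 1_000_000.0
operations = [200_000.0] * 5
wafers = 500_000

undiscounted = life_cycle_cost_per_unit(acquisition, operations, wafers)
discounted = life_cycle_cost_per_unit(acquisition, operations, wafers,
                                      discount_rate=0.10)

print(f"Cost per wafer, undiscounted: {undiscounted:.2f}")
print(f"Cost per wafer, discounted:   {discounted:.2f}")
```

With these made-up inputs the undiscounted figure is 4.00 dollars per wafer, and discounting the later-year costs lowers it. A fuller treatment would also spread wafer output over the years rather than using a single lifetime total.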

References
Campbell, J., R. Iman, D. Longsine, B. Thompson, A Tutorial on Reliability Modeling Using RAMP, SETEC91-030, Albuquerque, NM: Sandia National Laboratories.
Campbell, J., B. Thompson, D. Longsine, P. O'Connell, R. Iman, RAMP User's Reference Manual, SETEC91-030, Albuquerque, NM: Sandia National Laboratories.
Cost of Ownership Model, SEMATECH Technology Transfer # 91020473B-GEN, Austin, TX: SEMATECH, January 24, 1991.

SEMATECH Technology Transfer 2706 Montopolis Drive Austin, TX 78741 http://www.sematech.org
