
Cogn Tech Work (2016) 18:193–213

DOI 10.1007/s10111-015-0354-y

ORIGINAL ARTICLE

Experts’ knowledge renewal and maintenance actions effectiveness in high-mix low-volume industries, using Bayesian approach
Anis Ben Said1,2 • Muhammad Kashif Shahzad2 • Eric Zamai2 • Stéphane Hubac1 • Michel Tollenaere2

Received: 20 July 2015 / Accepted: 5 October 2015 / Published online: 19 October 2015
© Springer-Verlag London 2015

Abstract Increasing demand diversity has resulted in high-mix low-volume production where success depends on the ability to quickly design and develop new products. This requires sustainable production capacities and efficient equipment utilization which are ensured through appropriate maintenance strategies. Presently, these are derived from experts’ knowledge, capitalized in FMECA (failure mode, effect and criticality analysis), and effective maintenance procedures. Abu-Samah et al. (Failure prognosis methodology for improved proactive maintenance using bayesian approach. In: 9th IFAC symposium on fault detection, supervision and safety for technical processes. Paris, France, 2015) found increasing unscheduled breakdowns, failure durations and number of repair actions in each failure as the key challenges while sustaining production capacities in complex production environment. Obviously, maintenance based on the historical knowledge is not always effective to cope with an evolving nature of equipment failure behaviors. Therefore, this paper presents an operational methodology based on Bayesian approach and an extended FMECA method to support experts’ knowledge renewal and maintenance actions effectiveness. In the proposed methodology, FMECA files capitalize and model experts’ existing knowledge as an operational Bayesian network (O-BN) to provide real-time feedback on poorly executed maintenance actions. The accuracy of O-BN is monitored through drifts in maintenance performance measurement (MPM) indicators that result in learning an unsupervised Bayesian network (U-BN) to discover new causal relations from historical data. The structural difference between O-BN and U-BN highlights potential new knowledge which is validated by experts prior to updating existing FMECA and associated maintenance procedures. The proposed methodology is evaluated in a well-reputed high-mix low-volume semiconductor production line to demonstrate its ability to dynamically renew experts’ knowledge and improve maintenance actions effectiveness.
Keywords Maintenance optimization · Dynamic knowledge management · Bayesian network · FMECA · High-mix low-volume production

Corresponding author: Muhammad Kashif Shahzad, muhammadkashif.shahzad@g-scop.inpg.fr
Anis Ben Said, anis.bensaid@st.com
Eric Zamai, eric.zamai@grenoble-inp.fr
Stéphane Hubac, stephane.hubac@st.com
Michel Tollenaere, michel.tollenaere@grenoble-inp.fr

1 STMicroelectronics, 850 Rue Jean Monnet, 38926 Crolles, France
2 G-SCOP, Université de Grenoble Alpes, 38000 Grenoble, France

1 Introduction

Production lines, such as those in the semiconductor industry (SI), are presently challenged by short product life cycles that lead to the continual development of new products and technologies. In addition, increasing demand diversity and volume, which often result in high-mix low-volume production, turn manufacturing into a highly complex production environment. Success in such a competitive environment requires not only sustainable capacities but also their efficient utilization, which are ensured through appropriate


maintenance strategies. The product mix fluctuations resulting from high-mix low-volume production and short product life cycles are found to have the strongest impact on overall equipment efficiency (OEE) due to unstable corrective and preventive maintenance actions. This leads to unstable production capacities. Abu-Samah et al. (2014) have found drifting equipment failure behaviors as evidence of product mix fluctuations that are directly proportional to failure occurrences and durations. This signifies that knowledge about the failure behaviors, which is capitalized in FMECA and/or maintenance procedures, is not well understood. It also signifies that existing maintenance procedures, used for repairs and derived from experts’ knowledge, have become ineffective to repair unscheduled equipment breakdowns.

There exist multiple systems (Fig. 1) in the production line to store and share experts’ knowledge, such as advanced documentation and control system (ADCS), e.g., maintenance procedures, advanced process control (APC), recipe management systems (RMS), e.g., automatic recipe design protocols on some advanced equipment. These are based on the experts’ knowledge mainly stored in FMECA files that are not dynamically updated at an appropriate frequency to cope with emerging OEE challenges for controlling equipment behavior, in a highly competitive and complex manufacturing context.

Figure 1 provides a global view of the characteristics and challenges versus equipment constraints and experts’ knowledge across the production line. At present, the equipment breakdowns and maintenance strategy optimization issues are addressed with classical approaches such as reliability-centered maintenance (RCM) and condition-based maintenance (CBM). Yssaad et al. (2014) propose to use RCM methodology based on FMECA approach to develop a cost-effective maintenance program for electric power distribution system. It is also used by Bertling et al. (2005) to provide a quantitative relationship between preventive maintenance (PM) of assets and the total maintenance cost. Similarly, APC is associated with capitalized knowledge and risk-based methodologies to implement process and equipment control strategies. Both of these domains require experts’ knowledge capitalization to handle the equipment failures, maintenance policies optimization and finding links between causes and effects to define an appropriate control plan. Ison and Spanos (1996) and Moore et al. (2006) show the use of APC techniques as fault detection and classification (FDC) to characterize and control variability on semiconductor manufacturing equipment. In this context, Mili et al. (2009) have proposed an enhanced FMECA protocol to dynamically manage a risk-based maintenance actions plan (RBAP) linked to the equipment failure causes and effects.

To measure maintenance policies efficiency and support analysis of equipment-related unstable production capacities, many studies have proposed maintenance performance measurement (MPM) indicators. Parida and Kumar (2006) present a brief review of existing MPM. Crocker (1999) considers the maintenance effectiveness along three axes: inspection effectiveness, repair effectiveness and maintenance-induced failures. It can be noticed here that approaches like RCM, CBM and APC are strictly based on experts’ qualitative judgments. However, to deal with drifting equipment behaviors in a dynamic manufacturing environment, such capitalized knowledge should be updated dynamically from historical data as highlighted by Redmill (2002), Hubac and Zamai (2013).

Fig. 1 Characteristics and challenges versus equipment constraints and expert knowledge


In high-mix low-volume production context where equipment behavior is evolving, the existing approaches suppose that corrective and preventive maintenance knowledge is properly capitalized and renewed at an appropriate frequency through experts’ intervention. The failure misdiagnosis and increasing number of failure occurrences and durations (Abu-Samah et al. 2014) confirm that we need not only to dynamically update the experts’ knowledge but also to improve the effectiveness of maintenance actions executed by technicians. The update of experts’ knowledge can be triggered based on the drifts of MPM indicators. The causes associated with equipment failures drifts may be related not only to maintenance procedures and ineffectiveness of existing diagnosis protocols, but also to other causes like process types, equipment recipes designs, products and technologies mix fluctuation in production.

In this paper, we present an operational methodology based on Bayesian approach and an extended FMECA method to support experts’ knowledge renewal and maintenance actions effectiveness. In the proposed methodology, we capitalize and model experts’ existing knowledge from FMECA files as an operational Bayesian network (O-BN) to provide real-time feedback on poorly executed maintenance actions. The accuracy of O-BN is monitored through drift in MPM indicators that results in learning an unsupervised Bayesian network (U-BN) to discover new causal relations from historical data. The structural difference between O-BN and U-BN highlights potential new knowledge which is validated by experts prior to modifying existing FMECA and associated maintenance procedures. The proposed methodology is evaluated in a well-reputed high-mix low-volume semiconductor production line to demonstrate its ability to dynamically renew experts’ knowledge and improve maintenance actions effectiveness.

The remainder of this paper is divided into three sections. The literature review, in Sect. 2, addresses the topics linked to maintenance strategies optimization, knowledge management, maintenance actions effectiveness and human factors and is concluded by a literature review synthesis. Then, Sect. 3 describes the proposed operational methodology to overcome the issues concerning dynamic renewal of experts’ knowledge and maintenance actions effectiveness. Then, the methodology is evaluated step by step using data collected from a well-reputed high-mix low-volume semiconductor production line. In Sect. 4, the paper concludes with perspectives and conclusions.

2 Literature review

The literature review has been performed across (1) analysis of existing approaches to handle maintenance strategies optimization, (2) associated knowledge management approaches and (3) how maintenance performance evaluation is taken into consideration while executing effectiveness and human factor impacts.

2.1 Maintenance strategies optimization

In a high-mix low-volume production environment like SI, maintenance is a key contributor to improve and sustain production capacities. There are mainly three well-known and adapted maintenance strategies: corrective (run to failure), preventive (time and usage based) and predictive maintenance (Mili et al. 2009). The corrective maintenance (CM) deals with unpredictable failures that destabilize the equipment performance and availability, and consequently OEE. In order to reduce unscheduled breakdown occurrences and durations, preventive maintenance is introduced to avoid known causes and failures as the source of production capacity variations. At present, the industry has relied on systematic preventive maintenance (PM) based on historical knowledge and calendar to optimize production capacities while ensuring product quality. The OEE and maintenance policies are supported at field level by the well-known total productive maintenance (TPM) philosophy that integrates and shares the ownership of maintenance tasks and responsibilities at all levels of the organizations (McKone et al. 1999; Lin et al. 2015). However, the main drawbacks of PM are over- and under-engineering that lead to increased costs and/or reduced production capacities. Consequently, condition-based maintenance (CBM) is introduced as an alternative strategy where maintenance actions are triggered according to equipment condition monitoring and/or potential failure occurrence anticipation, also known as predictive CBM. The non-predictive CBM consists of controlling a set of significant parameters to monitor equipment health with defined thresholds (Susto et al. 2012; Krishnamurthy et al. 2005). Besides this, the predictive CBM concept is based on real-time monitoring of the equipment health indexes (EHI) and drift detection for failure predictions; if an accurate cause diagnosis probability (CDP) is available, pre-failure interventions are executed. Recently, prediction of failures has been enabled by the availability of artificial intelligence techniques and sufficient recorded data, also known as predictive maintenance (PdM). A few prominent PdM approaches are classification methods (Baly and Hajj 2012), prediction approaches (Susto et al. 2011; Schirru et al. 2010) and regression methods (Hsieh et al. 2013; Susto et al. 2012). The CBM and PdM are advanced maintenance strategies, which require known and stable variability distributions (otherwise statistical models are unstable) of equipment failures and causes; however, none of these approaches consider model updating when the production equipment behavior is subject to high-mix low-volume drifting environment.


The CBM and PdM are addressed with the classical approaches such as APC for detection and RCM for knowledge capitalization with an objective to share and reuse experts’ knowledge to optimize CM and PM actions efficiency and duration, also known as effectiveness. As highlighted above, RCM is the maintenance strategy mainly based on FMECA to prioritize maintenance activities as well as to capitalize experts’ knowledge for equipment reliability. This keeps an appropriate balance between CM and PM (Bertling et al. 2005; Yssaad et al. 2014). The APC approach uses fault detection and classification (FDC) methods to deal with detection challenges using complex algorithms (e.g., He and Wang 2010). Verdier and Ferreira (2011) use K-nearest neighbor (KNN) approach to improve detection extracted from equipment sensor signals. The Bayesian network (BN)-based algorithms have emerged as a prominent approach for equipment failures diagnosis considering their capability to represent the causal dependency between variables with visual graphs and to combine multisource historical data and experts’ knowledge (Bouaziz et al. 2013). Weber and Jouffe (2006) used BN for dynamic condition monitoring and failure diagnosis to support CBM strategy in complex industries (e.g., SI and aircraft). These studies focus their research strictly on the equipment-related causes of failures, and none of them take into account non-equipment failure sources such as product, process, recipe design associated with products and technology mix and/or maintenance procedures ineffectiveness.

We conclude that existing maintenance strategy optimization approaches are based on the hypothesis that variability distributions of equipment failures and causes are stable (otherwise statistical models are not reliable), and none of these approaches consider model updating when production equipment behavior is subject to high-mix low-volume environment. So it becomes crucial to address another aspect of drifting behaviors: how to update the knowledge at a suitable frequency, for example in FMECA, maintenance procedures and statistical models?

2.2 Knowledge management

Industry has historically based its competitiveness strategy on time to market, then on time to volume and finally on time to quality. Nowadays, in a high-mix low-volume production environment, stressed by product and technology obsolescence speed, knowledge management is recognized as a key issue to sustain competitiveness and innovation. The organizations need to turn from information-based to knowledge-based organizations which require learning loops while integrating technical systems and human actors. The maintenance strategies optimization and variability control in competitive high-mix low-volume environment also require an effective knowledge management (KM) system to maintain up-to-date operating procedures. The KM subject is often addressed in the literature (e.g., Teece 2000). Mustapha et al. (2015) present an overview of knowledge management strategies and their respective performance in industrial context and Meihami and Meihami (2014) highlight the impact of KM on learning, innovation and organizational performance enhancement. The successful use of FMECA for experts’ knowledge capitalization to prevent failures is demonstrated by Luo and Lee (2015), whereas KM strategy is also used by Medina-Oliva et al. (2015) for dependability investigation of different concepts in industrial systems through semantic rules that are used to build a probabilistic relational model (PRM), to assess maintenance strategies. Similarly, it can be found that reliability modeling methods (Dai et al. 2015) use capitalized knowledge to build an analytic network of a process including different process components dependencies.

In the above examples, the proposed KM models suppose a static approach in terms of causal structure, as well as stable dependencies between functional components of the described system and/or processes, or do not take into account rapid drifts in the production environment and so the associated knowledge evolution. This may have a strong impact on maintenance actions effectiveness and human factors like the relationship between people and technologies, tools, environments and systems (Rasmussen 2000).

2.3 Maintenance actions effectiveness and human factors

Besides correct formulation of maintenance procedures from experts’ static knowledge, their proper execution is also crucial to achieve maintenance strategy optimization. Parida and Kumar (2006) present a brief review of existing maintenance performance measurement (MPM) indicators to evaluate the maintenance actions execution level. The authors have also introduced the concept of total maintenance effectiveness, integrating both internal and external effectiveness. They further propose the integration of hierarchical levels and multi-criteria MPM indicators. On the other hand, Crocker (1999) considers maintenance effectiveness along three dimensions: inspection effectiveness, repair effectiveness and maintenance-induced failures. These studies present a set of industrial maintenance performance-related indicators without taking into account the direct impact of fulfillment effectiveness of maintenance functions during action executions. Muchiri et al. (2011) demonstrated that MPM indicators should not be defined without taking into account the interaction between maintenance and other functions like production.


Moreover, these studies do not take into account the impact of human factor and evolution of product mix as cause of variability in a dynamic industrial context.

The human factor is a key contributor in the effectiveness of maintenance actions. Bruseberg (2008) clarifies that human factor integration (HFI) can create value by ensuring safe, effective and efficient system performance. The human factor topic is particularly studied in the aviation maintenance domain. Masson and Koning (2001) focus their work on the impact of human error and present an overview of human error management concept to fulfill joint aviation requirements (JAR-66) standard. This study demonstrates the huge impact of maintenance performance on reliability and safety through real examples; however, it does not address the issue associated with the control of human behavior during maintenance actions execution. An integrated methodology is proposed by Rashid et al. (2014) to proactively monitor the human factors using fuzzy theory which consists of detecting and eliminating negative causal factors. Similarly, Chang and Wang (2010), Cacciabue et al. (2003) studied the interaction of human factors with the environment and their link with actions execution effectiveness. These studies propose a cognitive model known as SHELL (software, hardware, environment, liveware) for aircraft maintenance technician (AMT) to evaluate the impact of human risk during maintenance operations. The aim is to help airlines to identify their major operational and managerial weaknesses to improve risk management for maintenance operations under limited resources. In this model, the software represents maintenance procedures for a given task, hardware represents tools and materials (lubricants, anti-corrosion agents, etc.), environment represents organizational structure and environmental context, whereas liveware represents characteristics of AMT. All of these approaches consider static structures and highly rely on experts’ knowledge. In dynamic environments, structure of the model should be updated regularly to ensure monitoring of maintenance functions performance, but the management of the evolution of adverse events in a dynamic context requires continuous update of experts’ knowledge with subsequent modifications in maintenance procedures.

2.4 Literature review synthesis

The above discussion highlights three gaps toward the effectiveness of maintenance actions execution in a dynamic manufacturing context. First improvement is in experts’ existing knowledge with an objective to design appropriate maintenance procedures and associated checklists. It is also concluded that when knowledge update is not performed at an appropriate frequency to deal with the evolving nature of equipment and process behaviors, in high-mix low-volume production context, the MPM indicators drift may be explained by an inappropriate usage and/or insufficient level of existing knowledge. To overcome these gaps, we present an extended FMECA with the BN approach. This allows continuous update of maintenance function and fulfillment criteria, linked with maintenance procedures and checklists. In the proposed extended FMECA approach, the concept of OFC (objective fulfillment criteria) is introduced that allows the evaluation of maintenance actions effectiveness. The MPM indicators drifts highlight inappropriate usage and/or insufficient level of existing knowledge; therefore, related issues concerning inappropriate usage of the existing knowledge must be removed. To address poor maintenance execution, we propose to use an operational BN to store and capitalize existing knowledge from extended FMECA. To address the second aspect, actual measurements of control parameters are converted into OFC levels and are fed in O-BN which then provides feedback concerning maintenance actions execution effects to improve technicians’ cognition about maintenance actions execution, instead of local subjective judgment. Finally, the literature also highlights the fact that in many cases, it is unknown whether capitalized knowledge is up to date or missing. Therefore, it is important to define a process to control the relevance/accuracy of the existing knowledge and update it dynamically with experts’ intervention. For this purpose, an unsupervised BN (U-BN) is proposed to enhance the ability to discover new causal structures, learned from updated historical data, and to support dynamic knowledge update. This is accomplished through structural comparison of O-BN and U-BN where new causalities highlight potential knowledge. These are further validated by an expert prior to modifying the FMECA and maintenance procedures.

3 Proposed methodology to improve maintenance actions effectiveness in high-mix low-volume context

In view of the gaps identified in the above section, we present a three-step methodology (Fig. 2) based on extended FMECA method and BN approach. The first step focuses on capitalizing, unifying, and sharing experts’ knowledge on best maintenance practices. Experts use FMECA files for a given maintenance procedure (MP) and equipment to define functions and set of associated OFC that must be respected by technicians during maintenance execution. These criteria serve as basis to prevent known or forecasted failure modes (FM) and evaluate the effectiveness of executed actions. This step also includes the description and feedback of potential effects (consequences) for poorly executed maintenance actions. The experts’ knowledge,


Fig. 2 Extended FMECA and BN-based methodology to improve maintenance actions effectiveness. [Flowchart; actors: technicians and experts. Step 1: knowledge capitalization and usage (FMECA by equipment and maintenance type, maintenance checklists, OFC generator, operational BN, maintenance actions feedback on FM and effects via CMMS, MPM control). Decision: MPM out of control? Step 2: existing knowledge (FM and effects) accuracy control (data collection from all maintenance events, operational BN prediction accuracy versus actual events). Decision: prediction accuracy out of control? Step 3: knowledge update (unsupervised BN learned from historical data in the maintenance database and other databases such as MES). Decision: new potential knowledge (causal structure)?]
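The decision flow of Fig. 2 and the structural comparison between O-BN and U-BN can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: each network is reduced to a set of directed edges, the U-BN edge set is assumed to come from a structure-learning tool that is not shown, and all function names, node names and limits are hypothetical.

```python
def out_of_control(value, lower, upper):
    """True when an indicator lies outside its user-defined limits."""
    return not (lower <= value <= upper)


def new_causal_links(o_bn_edges, u_bn_edges):
    """Directed edges found in the U-BN but absent from the O-BN.

    These are only candidates for new knowledge; in the methodology
    they must be validated by an expert before FMECA is modified.
    """
    return set(u_bn_edges) - set(o_bn_edges)


def next_action(mpm, mpm_limits, accuracy, accuracy_limits,
                o_bn_edges, u_bn_edges):
    """Return (action label, candidate edges) following Fig. 2."""
    if not out_of_control(mpm, *mpm_limits):
        return "continue", set()
    # Step 2: MPM drifted, so check O-BN prediction accuracy.
    if not out_of_control(accuracy, *accuracy_limits):
        # Knowledge still predicts well: the issue is its usage (Step 1).
        return "step1_improve_execution", set()
    # Step 3: compare structures to look for new causal relations.
    candidates = new_causal_links(o_bn_edges, u_bn_edges)
    if candidates:
        return "step3_expert_validation", candidates
    # No structural difference: new data dimensions are needed.
    return "collect_new_data_dimensions", set()


# Hypothetical networks: the U-BN suggests one extra cause.
o_bn = {("OFC_torque", "FM_leak"), ("FM_leak", "Effect_downtime")}
u_bn = o_bn | {("Recipe_type", "FM_leak")}
print(next_action(0.70, (0.90, 1.00), 0.60, (0.85, 1.00), o_bn, u_bn))
```

Representing the networks as edge sets keeps the comparison independent of any particular BN library; conditional probability tables are deliberately left out of this sketch.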

capitalized in the FMECA files, leads the modification in maintenance procedures that comprise maintenance functions and associated checklists (i.e., sequence of maintenance actions to fulfill overall maintenance procedure’s function). Mili et al. (2009) have shown the effectiveness of dynamically capitalizing experts’


knowledge and linking adverse events with their causes and effects. However, the static nature of FMECA puts limits on its ability to cope with drifting equipment and process behavior, because FMECA is exclusively based on experts’ knowledge and needs human intervention to be updated (Denson et al. 2014; Mili et al. 2008). So, in case of rapid change of process and equipment behaviors, how can experts be informed that FMECA contents need to be updated? We propose to use MPM indicators to highlight the equipment behavior drifts associated with knowledge capitalization stored in FMECA and/or procedure usage issues.

Any drift in MPM indicators from target is evidence of inappropriate usage or ineffective capitalization of experts’ knowledge. Hence, these can be used to dynamically trigger knowledge capitalization and/or usage concerns, related to the evolving nature of human factors or equipment and process behaviors. Therefore, we first address the issue associated with inappropriate usage of capitalized knowledge stored in FMECA and described in procedures by building an O-BN that links the causal dependency between PM functions (objectives), associated fulfillment criteria, failure modes and effects. The OFC is implemented by quantifying it into classes. This O-BN provides feedback to technicians related to the poor execution of maintenance actions, thanks to failure mode (FM) effects, through Bayesian inference. The O-BN is linked with computerized maintenance management system (CMMS) to exchange information filled by actors during maintenance actions execution and to give the necessary feedback to improve the effectiveness of existing procedures usage. This step advocates that experts’ knowledge is properly used and described inside the maintenance procedures; however, it does not offer to dynamically renew the knowledge and adapt maintenance procedures when MPM indicators drift and/or maintenance procedures execution remains ineffective. This may be due to the evolving nature of equipment and process behavior and associated with missing or inaccurate existing knowledge stored inside FMECA and maintenance procedures. In the second step of the proposed methodology, we control the relevance of existing experts’ knowledge by evaluating O-BN model accuracy, using historical data. The O-BN is modified only when prediction accuracy is not within user-defined limits; otherwise Step 1 is performed to improve maintenance effectiveness as explained above. Step 3 of the proposed methodology is executed to support knowledge update when model accuracy drifts beyond user-defined limits. The MPM indicator drift and O-BN inaccuracy are the triggering events to highlight the need of experts’ knowledge update by finding new cause-effect links. This is done by structurally comparing U-BN which is built using up-to-date historical maintenance data and O-BN which is developed using experts’ knowledge from FMECA. The new causal links highlight potential new knowledge which is then validated by an expert prior to subsequent modifications in FMECA and maintenance procedures. The absence of structural difference shows ineffectiveness of the existing knowledge that requires inclusion of new data dimensions to explain variability in out-of-control MPM indicators. This step is not yet treated in this paper; however, if multivariate statistical analysis supports the inclusion of new data, the O-BN has to be learned again followed by modification in FMECA.

3.1 Industrial context of the case study

The methodology is evaluated in the SI industry which is a highly competitive production environment constrained by rapid product and technology changes. This requires equipment to operate irrespective of the product mix on the production line. The SI is known for its high-mix low-volume production environment where it is necessary to deal with dynamic equipment behavior drifts. The value chain for the SI is presented in Fig. 3 which highlights the complex and competitive business environment (Shahzad et al. 2011). This starts with customer demand for new product functionalities followed by the design phase where computer-aided design (CAD) simulations are used to assess the electrical description of requested functionalities and transform it into physical layouts and mask generation for production. The front-end phase in manufacturing is used to manufacture devices (transistors, resistors, etc.) on silicon wafers. Finally, the wafers are tested, before being cut into individual dies for packaging, also known as backend manufacturing.

The front-end manufacturing comprises production operations performed in different workshops such as surface cleaning, silicon oxidation, dielectric and metal deposition, photolithography, etching and polishing (Mönch et al. 2012). This consists of more than 200 production operations, 1100 individual steps and 8 weeks of cycle time (Brown et al. 2010). The case study is carried out in the dielectric (DIEL) workshop where wafers pass 8 times for processing at different stages of the production flow to deposit a thin film electrical insulator on the wafer surface. The deposition is performed by plasma-enhanced chemical vapor deposition (PECVD) equipment that has multiple maintenance procedures and associated checklists, designed by experts to sustain OEE in DIEL workshop. The causes of PECVD equipment failure drifts may be related not only to maintenance procedures and ineffectiveness of existing diagnosis protocols but also to other causes like process types, equipment recipes designs, associated with products and technologies mix fluctuation. In a field study at a world-reputed semiconductor


Fig. 3 SI value chain from customer to product delivery
manufacturer, we noted that the order in which maintenance actions are executed for a given maintenance procedure varies between technicians. Uzsoy et al. (1994) and Abu-Samah et al. (2014) demonstrated that equipment availability and the preventive maintenance plan are disturbed by unscheduled equipment breakdowns due to drifting equipment behavior.

3.2 Step 1: Knowledge capitalization and usage in maintenance context

The capitalization, unification, and sharing of existing knowledge are crucial steps toward effective maintenance in a dynamic production environment. The FMECA is a widely accepted methodology to address functional risks with the RCM approach. It allows analysis, evaluation and ranking to highlight and prioritize actions to recover from poor MPM indicator performance. The FMEA was initially developed by the US military (MIL-P-1629 1949) for failure mode and effect analysis on system components. It was later extended with criticality or severity assessment by NASA to ensure the expected reliability of space systems (Jordan 1972). The FMEA, also known as FMECA, is applied in multiple fields, e.g., software failure mode and effects analysis (SWFMEA), design FMEA, process FMEA, system or concept FMEA, etc. The purpose of using FMECA in these fields is to identify, prioritize and eliminate potential failures from system, design or process functions before they occur (Omdahl 1988; Villacourt 1992), or at least to reduce their effect through earlier detection. In the proposed methodology, FMECA is employed with the RCM approach to dynamically collect, capitalize and organize experts’ existing knowledge for effective maintenance practices. The objective is to identify and prevent the risks linked with poor execution of maintenance actions. The FMECA document serves as the basis to design maintenance procedures and the O-BN structure when MPM control limits are not reached. The FMECA unifies maintenance actions and controls associated risks by capitalizing experts’ knowledge to design maintenance procedures, as presented in Fig. 4.

In functional scope analysis, the objectives of maintenance actions for equipment and maintenance types are defined as functions. These are further linked to FMs based on experts’ knowledge: if actions are added to a procedure to fulfill objectives, it is because experts have already seen the FM and know how to detect or prevent it by checking criteria (setup or control limits). These criteria must be respected by technicians during maintenance execution to fulfill the PM objectives. The OFC is quantified into classes and subclasses and is used to evaluate the drifts through the O-BN. The FM analysis phase defines value criteria, effects, causes, detection and prevention for each failure mode. The value criteria are defined as the negation of the associated FM, because if a value criterion is not fulfilled, the FM can potentially occur (see example in Figs. 5, 6). The risk analysis phase involves computation of the risk priority number (RPN) based on severity (SEV), occurrence (OCC) and detection (DET) as RPN = SEV × OCC × DET. If the RPN exceeds the threshold, the expert has to define operational fixes to reduce the risk by introducing new functions or new value criteria associated with existing functions. This step is repeated until the RPN falls below the defined threshold; once the value criteria are evaluated and approved, operating procedures can be updated. This improved RCM-based process aims to help maintenance actors prioritize maintenance activities for a cost-efficient solution. It allows the definition of more robust maintenance operating procedures with an O-BN that also provides feedback to improve the effectiveness of maintenance procedure usage, as explained in Sect. 3.2.3.
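The risk analysis step described above can be illustrated with a minimal sketch. It computes RPN = SEV × OCC × DET for a few rows, using the normalized scores that appear in Figs. 5 and 6, ranks them, and flags those above an acceptance threshold. The class and function names and the threshold value of 100 are assumptions made for illustration only; the paper does not state the threshold used in the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One FMECA row; scores are the normalized values shown in Figs. 5 and 6."""
    name: str
    sev: int  # severity (SEV)
    occ: int  # occurrence (OCC)
    det: int  # detection (DET)

    @property
    def rpn(self) -> int:
        # RPN = SEV x OCC x DET
        return self.sev * self.occ * self.det


def rank_by_rpn(failure_modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(failure_modes, key=lambda fm: fm.rpn, reverse=True)


def needs_operational_fix(fm, threshold):
    """True when the RPN exceeds the acceptance threshold."""
    return fm.rpn > threshold


RPN_THRESHOLD = 100  # hypothetical value, not taken from the case study

failure_modes = [
    FailureMode("LL heater not reset", sev=5, occ=2, det=3),
    FailureMode("Buffer purging not reset", sev=4, occ=2, det=3),
    FailureMode("Reflected power > 10 % of set point", sev=7, occ=3, det=5),
    FailureMode("RF reflected power stabilisation time > 2 s", sev=7, occ=4, det=6),
]

for fm in rank_by_rpn(failure_modes):
    status = "fix needed" if needs_operational_fix(fm, RPN_THRESHOLD) else "acceptable"
    print(f"{fm.name}: RPN = {fm.rpn} ({status})")
```

In the process of Fig. 4, a row flagged here would go back to the expert for operational fixes (new functions or new value criteria) and be re-scored until its RPN falls below the threshold.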


[Flowchart, Fig. 4. Start. Functional scope analysis: define functions as maintenance action objectives by equipment and maintenance type; failure mode (FM) identified based on expert knowledge. FM analysis (causes, effects, detections and preventions): define value criteria to fulfill the function objectives (prevent the FM) with set-up and control limits; define effect by FM; potential causes identified by FM; define detection by function value criteria; define prevention by detection on functional value criteria linked to causes. Risk analysis: severity (SEV), detectability (DET) and occurrence (OCC) quotation; risk evaluation and ranking (RPN), RPN = SEV × OCC × DET; OFC generator (objective fulfillment criteria linked to criteria level, used by the BN to evaluate maintenance execution efficiency). Is risk acceptable? If no: operational fixes, then "New function needed?" (yes: back to define functions; no: back to define value criteria). If yes: function and criteria approved. Operating procedure update: maintenance procedure (checklist) updating. End.]

Fig. 4 FMECA process toward unified maintenance actions and associated risk control

3.2.1 Step 1: Existing knowledge capitalization

The design of maintenance procedures uses FMECA; therefore, we selected the preventive maintenance (PM) procedure for process chamber cleaning. The key functions identified by experts in the FMECA for the selected PM and equipment type are (1) equipment and human factor security, (2) PM anticipation, (3) pumping out the buffer and load lock, (4) dismantling the foreline and (5) leak test with helium, etc. An example of the functional analysis and risk evaluation using standard FMECA is presented in Fig. 5. The values of severity, occurrence, detectability and RPN have been normalized for confidentiality reasons. The results of the extended FMECA, including OFC levels for each fulfillment criterion, are presented in Fig. 6. It can be seen that each of the fulfillment criteria is discretized into multiple levels, using a Likert scale (orange column), with experts’ intervention. The OFC column


Title: PM procedure for cleaning process chamber, actual risk (FMECA analyses)
Objective function: Pumping out the buffer and load lock (LL); reset the buffer and load lock (LL) under vacuum (equipment level)
The OFC column of the figure takes the value of the criteria level (1, 2 or 3) on each row.

Value criteria definition | Criteria levels (OFC) | Potential failure mode | Potential effects of failure | Potential cause/mechanism of failure | Current controls: prevention | Current controls: detection | SEV (S) | OCC (O) | DET (D) | Current RPN = S*O*D
Lid buffer closed by 2 persons | No = 1; Yes = 2 | Closing lid buffer not realized by 2 people | Affects people safety | Risk of crushing, the lid is heavy even if withheld by a bailer | Checklist paper lid closing: item buffer by 2 persons checked. Pictogram of crushing risk is presented in this procedure. | Work accident declaration | 10 | 1 | 1 | 10
Buffer vacuum | No = 1; Yes = 2 | Buffer ventilated | Downtime | Waste of time during the PM, impossible LCF calibration | Checklist paper: buffer pumping checked | Alarm equipment during LCF calibration | 2 | 1 | 2 | 4
LL vacuum | No = 1; Yes = 2 | LL ventilated | Downtime | Waste of time during PM, impossible LCF calibration | Checklist paper: buffer purge checked | Alarm equipment during LCF calibration | 2 | 1 | 2 | 4
Buffer purge reset | No = 1; Yes = 2 | Buffer purging not reset | Downtime | Defectivity, backstreaming chamber to buffer during LCF calibration | Checklist paper: buffer purge checked | D0 mecax25 buffer | 4 | 2 | 3 | 24
Leak rate buffer < 7 mT/min | > 7.5 mT/min = 1; = 7 mT/min = 2; < 7 mT/min = 3 | Leak rate buffer > 7 mT/min | Downtime | Defectivity as leak buffer (attached) or outgassing, to do before the LCF calibration because of the necessity of reopening buffer | Checklist paper: item buffer leak rate < 7 mT checked | D0 mecax25 buffer | 4 | 1 | 3 | 12
Leak rate LL < 7 mT/min | > 7.5 mT/min = 1; = 7 mT/min = 2; < 7 mT/min = 3 | Leak rate LL > 7 mT/min | Downtime | Defectivity as leak LL or outgassing, to do before the LCF calibration because of the necessity of reopening LL or buffer | Checklist paper: item LL leak rate < 7 mT checked | D0 mecax25 buffer | 4 | 1 | 3 | 12
LL heater reset | No = 1; Yes = 2 | LL heater not reset | Downtime | Waste of time during the PM, the output of wafers manual LCF that need to be LL < 80 °C | Checklist paper: item LL heater not reset checked | Detection when going outside the calibration wafers manual | 5 | 2 | 3 | 30

Fig. 5 Knowledge capitalization and RPN for pumping out the buffer and load lock function: FMECA

Title: PM procedure for cleaning process chamber, actual risk (FMECA analyses)
Objective function: HF RF and LF RF ramp up rate (W/s) and ramp down rate (W/s); defines the ramp-up and ramp-down rates of the HF and LF RF generators in W per second (equipment level)
The OFC column of the figure takes the value of the criteria level (1, 2 or 3) on each row.

Value criteria definition | Criteria levels (OFC) | Potential failure mode | Potential effects of failure | Potential cause/mechanism of failure | Current controls: prevention | Current controls: detection | SEV (S) | OCC (O) | DET (D) | Current RPN = S*O*D
Reflected power < 10 % of set point (or coeff reflected < 0.3) | Reflected power > 10 % of set point = 1; 5 % < reflected power < 10 % of set point = 2; reflected power < 10 % of set point = 3 | Reflected power > 10 % of set point (or coeff reflected > 0.3) | Parts failure | RF reflected power reduces generator and chamber parts lifetime | FDC control of reflected @ recipe creation | Equipment/estar alarm | 7 | 3 | 5 | 105
RF reflected stabilisation time < 2 s | RF reflected stabilisation time > 2 s = 1; 1 s < RF reflected power stabilisation time < 2 s = 2; RF reflected stabilisation time < 1 s = 3 | RF reflected power stabilisation time > 2 s | Parts failure | RF reflected power reduces generator and chamber parts lifetime | FDC control of reflected @ recipe creation | Equipment/estar alarm | 7 | 4 | 6 | 168

Fig. 6 Knowledge capitalization and RPN for RF ramp up rate function: Extended FMECA

(green) is also added to the FMECA; it measures the responses and judgments of technicians during PM execution. The effectiveness of performed actions is measured by the O-BN as presented in Sect. 3.2.3. The FMECA (Figs. 5, 6) presents existing knowledge as functional blocks, i.e., objectives associated with actions to guarantee quality that may affect cost (first two columns in Figs. 5, 6). For each function, value criteria are defined (columns 3 and 4) to check, through an associated detection, whether the function objectives are fulfilled. If not, the expert describes the associated failure and linked effects (column 6; yield, cycle time or cost), which are highlighted by MPM indicator drift. The failure is associated with a detection (column 9) and with the known causes and preventions (columns 7 and 8). Then, severity (SEV), occurrence (OCC), detection (DET) and RPN (columns 10, 11, 12 and 14) are computed as per the standard. The OFC (column 13) has been added; it takes into account the value registered in the CMMS by the operator, whether the action is performed manually through the checklist or through detection, in order to assess RPN values based on actual registered data or to support the BN conditional probability tables.
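As a minimal sketch of how registered OFC levels could support a conditional probability table, the example below counts, for each OFC level, how often a failure was recorded and converts the counts into conditional probabilities. The records are hypothetical (they are not data from the case study), the function name is illustrative, and the Laplace smoothing term `alpha` is an assumption added to avoid zero probabilities on small samples; the paper does not describe its estimation procedure at this level of detail.

```python
from collections import Counter, defaultdict


def estimate_cpt(records, states, alpha=1.0):
    """Estimate P(state | OFC level) from (ofc_level, state) pairs.

    alpha is a Laplace smoothing constant (an assumption, not from the paper).
    """
    counts = defaultdict(Counter)
    for level, state in records:
        counts[level][state] += 1

    cpt = {}
    for level, state_counts in counts.items():
        total = sum(state_counts.values()) + alpha * len(states)
        cpt[level] = {s: (state_counts[s] + alpha) / total for s in states}
    return cpt


# Hypothetical CMMS records: (OFC level entered by the technician, observed outcome)
records = [
    (1, "failure"), (1, "failure"), (1, "ok"),
    (2, "ok"), (2, "ok"), (2, "failure"),
    (3, "ok"), (3, "ok"),
]

cpt = estimate_cpt(records, states=("failure", "ok"))
for level in sorted(cpt):
    print(level, cpt[level])
```

Each row of the resulting table sums to one, so it can be read directly as the distribution of the outcome given the OFC level entered in the checklist.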


3.2.2 Step 1: Maintenance performance measurement (MPM) control

The first step of the proposed methodology allows experts’ knowledge capitalization based on the FMECA approach to adjust the maintenance procedures (checklist). However, the effectiveness of capitalized knowledge usage is neither evaluated nor controlled. Therefore, any drift in MPM indicators from the operational target can be explained by a set of FMs which are not under control. This is evidence of ineffective capitalization or inappropriate usage of capitalized knowledge in terms of FM detection, prevention, effects or causes. Hence, MPM indicators may be used to dynamically flag knowledge capitalization and ineffective usage concerns related to the evolving nature of human factors or equipment and process behaviors. Consequently, MPM indicators trigger the need to implement the O-BN as well as the U-BN, respectively.

In manufacturing facilities, MPM indicators and their control limits are set by maintenance experts according to the industrial context and requirements. In the literature, several studies focus on MPM indicators, such as Simões et al. (2011), who present a literature review on this subject. They present the evolution in MPM usage as an important maintenance function, resource, activity and practice. Muchiri et al. (2010) completed an industrial survey to find how MPM can be used in industrial maintenance management. They conclude that MPM indicators used in a given manufacturing environment are directly influenced by the fulfillment of the maintenance objectives. Similarly, Weber and Thomas (2006) developed a framework to define key performance indicators (KPI) for managing the maintenance function based on physical asset management requirements and the asset reliability process. Moreover, Muchiri et al. (2011) demonstrate that MPM indicators cannot be defined in isolation, but should be the result of a careful analysis of the interaction between maintenance and organizational functions. These indicators are classified into three categories: equipment and process related (e.g., capacity utilization, OEE, and availability), cost related (e.g., maintenance cost per unit production cost) and maintenance task related (e.g., ratio of scheduled and total maintenance tasks) (Muchiri et al. 2010). Following these recommendations about MPM indicator definition, we have extracted from the literature a set of MPM indicators, widely accepted in the industry and also used in the SI (Fig. 7), and we use them in our proposed methodology.

In the DIEL workshop case study, the experts have defined a set of the most representative MPM indicators and their operational control limits, extracted from the MPM category list presented in Fig. 7. An example of these MPM indicators is presented with their definition and the way to compute them in Table 1.

The sampling frequency to control these indicators and their limits are defined according to operational management rules. Indeed, daily, weekly, monthly (Fig. 8a) or quarterly (Fig. 8b) reports are generated to monitor the maintenance effectiveness and production line behavior. One or several indicators below the target highlight an equipment drift and potential know-how capitalization and/or procedure usage issues.

3.2.3 Step 1: Existing knowledge capitalization as O-BN and feedback on maintenance execution effectiveness

When MPM indicators, described in Sect. 3.2.2, drift and reach operational control limits, inefficient capitalization or inappropriate usage of capitalized knowledge needs to be verified. To support these tasks, we propose learning an O-BN based on the extended FMECA as presented in Sect. 3.1. The O-BN links the causal dependency between PM functions, associated fulfillment criteria, failure modes and effects. A BN is a directed acyclic graph (DAG) composed of nodes (random variables) and directed edges (links, arcs) between nodes, as shown in Fig. 9. The directed edges represent the causal dependency among nodes in the network. The effect of parent nodes on a child node is assessed with a conditional probability table (CPT) that is computed from input data using an inference algorithm based on "Bayes theorem" (Peter 2012) in Eq. 1.

The information provided by a BN consists of probabilistic values which are computed using Bayes theorem and represent risk evaluation under uncertainty (Kjærulff and Madsen 2006; Jensen and Nielsen 2007; Pourret et al. 2008). At each level of the BN structure, the formalism can be read in the following way, also called the "Bayes condition." The probability of A (failure mode) given that the event D has occurred is given by Eq. 1. This can be applied at different levels of the O-BN to compute the CPT for all the nodes. The BN approach is chosen because of its capability to represent experts’ knowledge capitalized in FMECA files (Garcia and Gilabert 2011). The affinity between the FMECA structure and causal nets like Bayesian networks is also demonstrated by Lee (2001) and Weber et al. (2001). Bayesian networks fit FMECA as they allow the same way of describing causality chains (Fig. 10). The BN is able to manage the uncertainty regarding causes and failure modes. This matters because information samples (detection, failure, effect and context) are hard to sample and synchronize in time. Moreover, information takes time to become available; therefore, during this elapsed time, the equipment and process behavior may evolve. A key characteristic of Bayesian techniques (Liu 2008) is the capability to treat both continuous and discrete variables. The O-BN is to be linked with the CMMS to exchange information, filled by actors or MES, during maintenance


Fig. 7 Industry-standard MPM indicators from literature also used in SI (Weber and Thomas 2006, SEMI E79 2000)

Table 1 MPM indicators definition for DIEL case study (Semi-E79 Standard)

MPM indicator | Definition and computing formula | Loss productivity elements
MTTR: mean time to repair | To calculate the average maintenance hours required for scheduled and unscheduled downtime | Inefficient operating maintenance protocol
Availability efficiency % | 100 % × (Equipment uptime/total time) | Unscheduled downtime: failures, process drifts; and scheduled: PM, qualification, setup, etc.
Operational efficiency % | 100 % × (Processing time/equipment uptime) | Minor stop, offline training, install, engineering, no product, no operator, etc.
Rate efficiency % | 100 % × (Theoretical production time for actual units/processing time) |
Quality efficiency % | 100 % × (Theoretical production time for effective units/theoretical production time for actual units) | Scrap, rework
OEE: overall equipment efficiency | Availability efficiency × operational efficiency × rate efficiency × quality efficiency |
PM success rate | Number of succeeded maintenance work orders (WO)/total executed WO | Inefficient or inappropriate operating maintenance protocol
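The formulas in Table 1 can be sketched as follows. The function name, argument names and the sample time values (in hours) are illustrative assumptions, not figures from the DIEL case study; only the ratios themselves come from the table.

```python
def oee_breakdown(total_time, uptime, processing_time,
                  theoretical_time_actual_units,
                  theoretical_time_effective_units):
    """Compute the Table 1 efficiencies as fractions (multiply by 100 for %)."""
    for name, value in [("total_time", total_time), ("uptime", uptime),
                        ("processing_time", processing_time),
                        ("theoretical_time_actual_units",
                         theoretical_time_actual_units)]:
        if value <= 0:
            raise ValueError(f"{name} must be positive")

    availability = uptime / total_time
    operational = processing_time / uptime
    rate = theoretical_time_actual_units / processing_time
    quality = theoretical_time_effective_units / theoretical_time_actual_units
    return {
        "availability": availability,
        "operational": operational,
        "rate": rate,
        "quality": quality,
        # OEE is the product of the four efficiencies
        "oee": availability * operational * rate * quality,
    }


def pm_success_rate(succeeded_work_orders, executed_work_orders):
    """Share of executed maintenance work orders (WO) that succeeded."""
    if executed_work_orders <= 0:
        raise ValueError("executed_work_orders must be positive")
    return succeeded_work_orders / executed_work_orders


# Hypothetical weekly figures in hours
result = oee_breakdown(total_time=168.0, uptime=140.0, processing_time=120.0,
                       theoretical_time_actual_units=108.0,
                       theoretical_time_effective_units=102.6)
for key, value in result.items():
    print(f"{key}: {100 * value:.1f} %")
```

Because the intermediate terms cancel, the OEE reduces to the theoretical production time for effective units divided by the total time, which is a convenient sanity check on the four individual ratios.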

actions execution and to give the necessary feedback to improve the effectiveness of existing procedures usage.

$$P(A \mid D) = \frac{P(A, D)}{P(D)} = \frac{P(D \mid A)\,P(A)}{P(D)} \qquad (1)$$

3.2.4 Step 1: O-BN and OFC

In this section, we present the structure of the O-BN developed based on existing experts’ knowledge. The columns from the extended FMECA are mapped, based on color, to nodes in the BN as presented in Fig. 10. The target node in this network to be predicted is "effect," i.e., the consequences of executed maintenance actions. The structure of the network is defined by experts using the FMECA file, as presented in Fig. 11. The CPT associated with the O-BN nodes can be defined by experts according to their knowledge and/or by using an inference algorithm and a historical dataset. Then, the accuracy of the resulting BN is validated through a tenfold cross-validation approach (McLachlan et al. 2004). The BN is able to predict the consequences of executed actions through fulfillment criteria monitoring for each PM function (objective). These fulfillment criteria are discretized into multiple levels (OFC) as defined in the extended


FMECA to measure the effectiveness of executed maintenance actions accurately. Indeed, this O-BN is connected to the CMMS, where technicians enter the OFC levels based on measurements or their judgment of fulfillment criteria. These values are input to the O-BN, which predicts effects on equipment and product quality using Bayesian inference. This step ensures that experts’ knowledge is properly used and described inside the operating procedures.

Fig. 8 a Monthly indicator OEE % trend. b Quarterly indicator PM success rate trend

Fig. 9 Example of BN and associated conditional probabilities table (CPT)

An example of inference from the BN model is highlighted as a proof of concept in Fig. 12. Indeed, the BN enables predicting the potential failure mode (FM) and its effects on product and equipment for a given PM objective and associated fulfillment criteria. We observe from Fig. 12 that, in the presence of "pressure chamber > 7.5 Torrs" and the absence of "flush with helium (He)," the likely failure mode is cold chamber, which impacts the deposition rate and consequently the product quality. This example demonstrates the O-BN’s ability to highlight the likely failure and the consequence of an executed action based on the criteria level (OFC) entered by the technician in the CMMS during maintenance execution. This prediction result is dynamically fed back to the technician as a textual message.

The above O-BN demonstrated that it can capture effectiveness and provide feedback to technicians on the consequences of their performed actions during maintenance intervention. However, the O-BN has a static structure which is defined based on experts’ judgment. Moreover, this static model cannot take into account the impact of evolving equipment behaviors because of high-mix low-volume


Fig. 10 BN structure definition from experts’ knowledge in FMECA

Fig. 11 BN based on experts’ judgment to monitor performed maintenance efficiency


Fig. 12 Proof of concept for knowledge-based BN model

production. Hence, in the next section, we propose a dynamic loop to learn the BN structure from historical data to discover new knowledge and update the FMECA and maintenance procedures accordingly.

3.3 Step 2: Existing knowledge accuracy control

In Step 1 of the proposed methodology, a static capitalization and usage of experts’ knowledge are proposed in order to enhance the effectiveness of maintenance actions by supporting the cognitive process of human actors during the execution of maintenance actions. In addition, we used experts’ knowledge that defines only cause-to-effect relationships which are under control. However, in a dynamic production environment like high-mix low-volume, these links change continuously. Therefore, experts’ knowledge should be updated accordingly. As discussed in the previous section, the MPM indicators generate triggers to control the accuracy of knowledge capitalized in the FMECA as well as in maintenance procedures. We propose to control the accuracy using the O-BN, which is developed using experts’ knowledge, and to update it as per potential new knowledge. This allows the evaluation of the reliability of existing experts’ knowledge through the evaluation of prediction accuracy. The input data are collected from the CMMS over previously executed maintenance actions. The prediction accuracy is computed and compared with the user-defined threshold, which leads to the search for potential new knowledge. The relevance of existing knowledge is evaluated through O-BN accuracy using historical maintenance intervention data collected from the CMMS. This dataset is used to evaluate the accuracy of the O-BN by comparing the predicted information with the actual values in the dataset. The accuracy is defined as the ratio of the number of correct predictions to the total number of predictions. The total precision of the model is 68 % at the last evaluation.

3.4 Step 3: New causal links discovery and existing knowledge update

The BN as a causal network has the ability to deal with probabilistic inferences under uncertainty. In practice, the BN structure is defined exclusively according to experts’ judgment and FMECA documents. Besides, the BN structure can also be learned from data using unsupervised or semi-supervised learning algorithms, which is helpful in discovering new causalities from historical data (Heckerman 1997). The third step in the proposed approach is about learning the BN structure using unsupervised algorithms such as EQ, Taboo and Taboo Order (Abu-Samah et al. 2014), based on minimum description length (MDL) as an objective function. This function measures the correlation and complexity of the causal network and provides "automatic significance thresholds" (Rissanen 1978; Bouckaert 1993) that serve as criteria to select the lowest score (best structure) BN. The equivalence class (EQ) algorithm is efficient for learning causal structure from historical data (Friedman and Koller 2000). It serves to reduce the search space. This algorithm considers that two BN structures are equivalent if the set of distributions represented by one structure can be represented identically by the other one (Chickering 2002; Munteanu and Bendou 2001). The Taboo search algorithm aims to optimize and refine an


existing structure of causal model. Therefore, it is helpful to readjust an initial structure defined by experts or developed using the EQ algorithm; thus, it can improve the MDL score as well. The Taboo order (Teyssier and Koller 2005) is presented as an extension of the Taboo algorithm that offers more accurate results, but is more expensive in terms of computing time than simple Taboo search.

In this paper, we used EQ search followed by the Taboo refining algorithm to learn the BN structure. The historical data are extracted from the CMMS as executed maintenance actions, equipment and product metrology data. In the proposed third step, new potential knowledge is highlighted by comparing the initial BN structure defined by experts and the BN learned from data. The structural changes between variables are considered as potential new knowledge. These are fed to the experts to separate new causalities between variables from invalid beliefs before their inclusion in the FMECA knowledge base. Then, the maintenance procedure is updated accordingly in order to improve the effectiveness of maintenance actions as well.

The dynamic characteristic of the SI leads to the evolving nature of equipment performance drifts, which destabilize the production capacities. Thus, industrial performance requires dynamic updating of maintenance procedures to adapt to high-mix low-volume production. Therefore, a new BN model has been learned from historical data collected across the DIEL production line, using unsupervised algorithms. The dataset used for this case study is composed of preceding executions of the revised maintenance procedure from the FMECA (Figs. 5, 6). It is also composed of equipment parameters like RF and chamber pressure, and product metrology after maintenance such as defectivity and deposition rate. This dataset is divided into two parts, 75 % of rows for learning and 25 % for testing. The BN is learned using EQ and then Taboo unsupervised learning in BayesiaLab 5.0. The BN model is presented in Fig. 13a. The learned network is then validated with the testing part of the dataset. Thus, for this case study as presented in Fig. 13b, the contingency fit is observed to be 77 and 72 %, which represent the log-likelihood of the joint probability distribution of the U-BN with the used dataset, respectively (a threshold of 75 % is used as the criterion to accept the model). Moreover, the color of each node in this new U-BN model corresponds to its respective class (objectives, criteria, failure modes, and effects).

The model learned from historical data can contain new knowledge. Hence, we compare the structures of the experts’ knowledge-based model (O-BN) presented in Fig. 11 and the learned model presented in Fig. 13a (BayesiaLab 5.0). Figure 14 shows an example of the structural changes found in terms of causal links. It is worth mentioning that both the O-BN and the U-BN have the same set of nodes.

The new knowledge can be extracted from new causal links in the learned network. For example, the plasma and backstream error are found to be correlated with product defectivity. For example, we can observe from Fig. 14 that the nodes in the red rectangle, gas line purge and PTV failure, are no longer linked to residue in gas line. The structural changes found are presented to the experts for their validation as new knowledge. The FMECA is modified according to the new knowledge; otherwise these changes are discarded. Eventually, this serves to modify maintenance procedures, enabling adaptations of maintenance practices influenced by high-mix low-volume production.

3.5 Feedback of maintenance actors on proposed BN-based approach

In order to evaluate the usefulness and the impact of the methodology based on O-BN and U-BN models on the cognition of technicians and experts, face-to-face meetings were held with the maintenance actors. The question was: what is your opinion about O-BN and U-BN models to support experts’ knowledge and to help technicians during maintenance execution with feedback on the consequence of their actions? The experts in these meetings were the equipment engineers from the DIEL workshop. The majority believed that the proposed BN-based methodology is helpful to discover new knowledge because some correlations cannot be identified without historical data investigation; for example, the unsupervised learning found that 15 min of lid thermalization is associated with downtime. The experts explain that the sensor signal changes during thermalization; therefore, if the sensors are set at the target rate without thermalization, the value will not be at the target at the end of CFL (parts inside the equipment) calibration. That impacts the plates positioning and consequently the downtime. There was not a single comment against the proposed BN-based methodology, because it helps them to achieve their operational targets by quickly understanding the unscheduled equipment breakdown scenarios. The proposed methodology acts as a tool that provokes experts’ cognition, based on data collected across the production line. The data analysis with evidence could trigger paralysis; however, the experts’ final authority to reject newly generated knowledge if they are not convinced overrules this consequence. Nevertheless, they recommended extending the dataset used to learn the U-BN. This also requires the development of an advanced algorithm to compare two BN structures with different dimensionalities. The maintenance actors also consider that feedback to technicians on the consequences of their actions is useful to improve their auto-training cognition process. For example, cleaning the process module of the equipment with alcohol and then with water (H2O) may lead to a vacuum pressure problem on the equipment during production or qualification; in this case, the


Fig. 13 a Unsupervised Bayesian Network (U-BN) example. b BN contingency fit diagram for U-BN learned from historical data using
unsupervised algorithm


Fig. 14 Structural difference between O-BN and U-BN models allowing knowledge discovery
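The kind of structural comparison illustrated in Fig. 14 can be sketched as a set difference over directed arcs, assuming (as stated in the text) that both networks share the same set of nodes. The function name is illustrative, and the arc lists are hypothetical: they loosely follow the node names mentioned in Sect. 3.4, but the arc directions are assumptions and not the networks of Figs. 11 and 13a.

```python
def compare_structures(o_bn_arcs, u_bn_arcs):
    """Classify arc differences between an expert BN and a learned BN.

    Arcs are (parent, child) tuples; both networks share the same nodes.
    """
    o_arcs, u_arcs = set(o_bn_arcs), set(u_bn_arcs)

    # Arcs present in both networks but with opposite direction
    reversed_arcs = {(a, b) for (a, b) in u_arcs - o_arcs if (b, a) in o_arcs}
    # Arcs only in the learned network: candidate new causal links
    added = (u_arcs - o_arcs) - reversed_arcs
    # Arcs only in the expert network: links no longer supported by the data
    removed = {(a, b) for (a, b) in o_arcs - u_arcs if (b, a) not in u_arcs}

    return {
        "added": sorted(added),
        "removed": sorted(removed),
        "reversed": sorted(reversed_arcs),
    }


# Hypothetical arcs for illustration only
o_bn = [
    ("gas line purge", "residue in gas line"),
    ("PTV failure", "residue in gas line"),
    ("residue in gas line", "product defectivity"),
]
u_bn = [
    ("plasma", "product defectivity"),
    ("backstream error", "product defectivity"),
    ("residue in gas line", "product defectivity"),
]

diff = compare_structures(o_bn, u_bn)
for kind, arcs in diff.items():
    print(kind, arcs)
```

Each arc reported as added, removed or reversed is only a candidate; in the methodology it is the experts who decide whether it becomes new knowledge in the FMECA or is discarded.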

technician knows that it is necessary to clean the process module with water and then alcohol (and not the reverse order) to avoid equipment contamination. However, the prediction results provided by the O-BN should be presented in the most effective way to be accepted by the technicians. Hence, this improvement request is taken into account in our present developments.

4 Conclusion and perspectives

Sustainable production capacities in a high-mix low-volume production environment play an important role in the competitiveness of the SI. For this reason, the proposed methodology deals with the effectiveness of maintenance actions to optimize production capacities which are destabilized by variability in the production environment. The contributions of this study can be classified into two parts toward optimized equipment availability and consequently reduced OEE variability through efficient maintenance policies. First, it supports the improved cognition of maintenance actors by providing dynamic feedback on the potential consequences of executed actions on product quality and equipment performance. In this way, failure occurrences may be reduced because of the impact of the effectiveness of maintenance actions on equipment reliability, and this also ensures good product quality. The gain in terms of failure occurrences can be observed in Fig. 15. The normalized values are plotted for 17 failures in red (4 months before) and in green (4 months after) the deployment of the proposed methodology in the production line.

Second, this study proposed a dynamic approach to update and renew maintenance experts’ knowledge used to design maintenance procedures, in order to adapt it to the evolving nature of the industrial context. Indeed, an extended FMECA technique with objective fulfillment criteria (OFC) collects and unifies existing experts’ knowledge. This dynamic loop of maintenance procedure and knowledge update is triggered by defined MPM indicators and existing knowledge accuracy control. The loop of new knowledge discovery proposed in this study is based on finding new causal links (new arc directions) in the BN structure to support experts in updating existing knowledge, using unsupervised learning from a historical dataset. However, this methodology does not offer the possibility to learn a new BN structure with new nodes, nor to compare BN structures with different data dimensionality.

Hence, in the future, it is of interest to extend this methodology to learn and compare BN structures with different sets of nodes using an extended dataset. It is also of interest to develop an agile information system that illustrates the industrial implementation of dynamic knowledge management protocols, and to develop a human machine interface and an advanced algorithm able to interpret the BN prediction results and show the feedback to technicians in a more effective way (graphical results with reasoning). Moreover, it is also of interest to answer the question of whether to develop the BN at the equipment level or at the whole workshop level. Finally, a generic methodology based on statistical analysis is also required to identify and compute the MPM limits automatically. These limits could be adjusted as a function of the variability observed in the

123
Cogn Tech Work (2016) 18:193–213 211

Fig. 15 Impact of BN feedback


to technicians on failure
occurrences

production line through standard statistical process control O-BN Operational Bayesian network
(SPC). PdM Predictive maintenance
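The knowledge discovery loop recalled in this conclusion relies on the structural difference between O-BN and U-BN. As an illustration only, the following Python sketch compares two structures defined over the same set of nodes and lists the arcs that experts would be asked to validate. The representation of a structure as a set of (parent, child) pairs, the function name compare_structures and the node names are assumptions made for this example; they are not taken from the study, and the sketch does not perform any structure learning.

```python
from typing import NamedTuple, Set, Tuple

# An arc is a directed link (parent, child) between two BN nodes.
Arc = Tuple[str, str]


class StructureDiff(NamedTuple):
    added: Set[Arc]      # arcs of U-BN linking nodes that O-BN leaves unconnected
    reversed_: Set[Arc]  # arcs of U-BN whose direction is opposite in O-BN
    removed: Set[Arc]    # arcs of O-BN with no counterpart in U-BN


def compare_structures(o_bn: Set[Arc], u_bn: Set[Arc]) -> StructureDiff:
    """Compare two BN structures defined over the same set of nodes."""
    added = {(a, b) for (a, b) in u_bn
             if (a, b) not in o_bn and (b, a) not in o_bn}
    reversed_ = {(a, b) for (a, b) in u_bn
                 if (b, a) in o_bn and (a, b) not in o_bn}
    removed = {(a, b) for (a, b) in o_bn
               if (a, b) not in u_bn and (b, a) not in u_bn}
    return StructureDiff(added, reversed_, removed)


# Hypothetical structures: the node names are invented for illustration.
o_bn = {
    ("cleaning_order", "contamination"),
    ("contamination", "failure"),
    ("pressure_drift", "failure"),
    ("pressure_drift", "contamination"),
}
u_bn = {
    ("cleaning_order", "contamination"),
    ("contamination", "failure"),
    ("failure", "pressure_drift"),
    ("cleaning_order", "failure"),
}

diff = compare_structures(o_bn, u_bn)
print("Candidate new links:", sorted(diff.added))
print("Reversed directions:", sorted(diff.reversed_))
print("Links not confirmed by data:", sorted(diff.removed))
```

In such a sketch, every arc reported in the three sets would only be a candidate: as stated above, new knowledge is validated by experts before the FMECA and the associated maintenance procedures are updated.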
Acknowledgments The authors acknowledge STMicroelectronics for providing an opportunity to carry out a field study in their maintenance department. The authors also acknowledge the European project ENIAC INTEGRATE, ANRT (National French Agency for Research and Technology), and the Rhone Alpes region for their support.

Glossary

ADCS    Advanced documentation control systems
APC     Advanced process control
AMT     Aircraft maintenance technicians
BEOL    Backend of line
BN      Bayesian network
CAD     Computer-aided design
CBM     Condition-based maintenance
CM      Corrective maintenance
CMMS    Computerized maintenance management system
CPT     Conditional probability table
CVD     Chemical vapor deposition
DIEL    Dielectric deposition workshop
EQ      Equivalence class algorithm
FDC     Fault detection and classification
FEOL    Frontend of line
FM      Failure mode
FMEA    Failure mode and effects analysis
FMECA   Failure mode effects and criticality analysis
He      Helium
HFI     Human factor integration
IC      Integrated circuit
KM      Knowledge management
KNN     K-nearest neighbor
KPI     Key performance indicator
MDL     Minimum description length
MPM     Maintenance performance measurement
MP      Maintenance procedure
OEE     Overall equipment efficiency
OFC     Objective fulfillment criteria
OOC     Out of control
O-BN    Operational Bayesian network
PdM     Predictive maintenance
PM      Preventive maintenance
PRM     Probabilistic relational model
RCM     Reliability-centered maintenance
RMS     Recipe management systems
RPN     Risk priority number
RPN*    Normalized risk priority number
SHELL   Software, hardware, environment, live-ware model
SI      Semiconductor industry
SPC     Statistical process control
TPM     Total productive maintenance
U-BN    Unsupervised Bayesian network
WO      Work order

123
212 Cogn Tech Work (2016) 18:193–213

Bruseberg A (2008) Presenting the value of human factors integra- Luo SH, Lee GG (2015) Applying failure mode and effects analysis
tion: guidance, arguments and evidence. Cogn Technol Work for successful knowledge management. Total Qual Manag Bus
10(3):181–189 Excell 26(1–2):62–75
Cacciabue PC, Mauri C, Owen D (2003) The development of a model Masson M, Koning Y (2001) How to manage human error in aviation
and simulation of an aviation maintenance technician task maintenance? The example of a Jar 66-HF education and
performance. Cogn Technol Work 5(4):229–247 training programme. Cogn Technol Work 3(4):189–204
Chang YH, Wang YC (2010) Significant human risk factors in aircraft McKone KE, Schroeder RG, Cua KO (1999) Total productive
maintenance technicians. Saf Sci 48(1):54–62 maintenance: a contextual view. J Oper Manag 17(2):123–144
Chickering DM (2002) Learning equivalence classes of Bayesian- McLachlan GJ, Do KA, Ambroise C (2004) Analyzing microarray
network structures. J Mach Learn Res 2:445–498 gene expression data. Wiley, New York
Crocker J (1999) Effectiveness of maintenance. J Qual Maint Eng Medina-Oliva G, Weber P, Iung B (2015) Industrial system knowl-
5(4):307–314 edge formalization to aid decision making in maintenance
Dai W, Maropoulos PG, Zhao Y (2015) Reliability modelling and strategies assessment. Eng Appl Artif Intell 37:343–360
verification of manufacturing processes based on process Meihami B, Meihami H (2014) Knowledge management a way to
knowledge management. Int J Comput Integr Manuf gain a competitive advantage in firms (evidence of manufactur-
28(1):98–111 ing companies). Int Lett Soc Human Sci 03:80–91
Denson B, Tang SY, Gerber K, Blaignan V (2014) An effective and Mili A, Siadat A, Hubac S, Bassetto S (2008) Dynamic management
systematic design FMEA approach. In: Proceeding of reliability of detected factory events and estimated risks using FMECA. In:
and maintainability symposium (RAMS), annual IEEE confer- Proceeding of management of innovation and technology,
ence, pp 1–6 ICMIT 2008 4th IEEE international conference on,
Friedman N, Koller D (2000). Being Bayesian about network pp 1204–1209
structure. In: Proceedings of the sixteenth conference on Mili A, Bassetto S, Siadat A, Tollenaere M (2009) Dynamic risk
uncertainty in artificial intelligence. Morgan Kaufmann, management unveil productivity improvements. J Loss Prev
pp 201–210 Process Ind 22:25–34
Garcia A, Gilabert E (2011) Mapping FMEA into Bayesian networks. Military US (1949) Procedure for performing a failure mode effect
Int J Perform Eng 7(6):525–537 and criticality analysis. United States military procedure MIL-P-
He QP, Wang J (2010) Large-scale semiconductor process fault 1629
detection using a fast pattern recognition-based method. IEEE Mönch L, Fowler JW, Mason SJ (2012) Production planning and
Trans Semicond Manuf 23(2):194–200 control for semiconductor wafer fabrication facilities: modeling,
Heckerman D (1997) Bayesian networks for data mining. Data Min analysis, and systems. Springer, New York
Knowl Disc 1(1):79–119 Moore T, Harner B, Kestner G, Baab C, Stanchfield J (2006) Intel’s
Hsieh YS, Cheng FT, Huang HC, Wang CR, Wang SC, Yang HC FDC proliferation in 300 mm HVM: progress and lessons
(2013) Vm-based Baseline predictive maintenance scheme. learned. In: Proceeding of AEC/APC Symp. XVIII, Westmin-
IEEE Trans Semicond Manuf 26:132–144 ster, CO
Hubac S, Zamai E (2013) Politiques de maintenance equipment en Muchiri P, Pintelon L, Martin H, De Meyer AM (2010) Empirical
flux de production stressant—equipment maintenance policy in analysis of maintenance performance measurement in Belgian
stressed manufacturing flow (technology or product). Edition TI industries. Int J Prod Res 48(20):5905–5924
(Technique de l’ingenieur) [AG 3535] Muchiri P, Pintelon L, Gelders L, Martin H (2011) Development of
Ison A, Spanos CJ (1996) Robust fault detection and fault classifi- maintenance function performance measurement framework and
cation of semiconductor manufacturing equipment. In: Proceed- indicators. Int J Prod Econ 131(1):295–302
ings of the 5th international symposium on semiconductor Munteanu P, Bendou M (2001) The EQ framework for learning
manufacturing, pp 1–4 equivalence classes of Bayesian networks. In: First IEEE
Jensen FV, Nielsen TD (2007) Bayesian networks and decision international conference on data mining (IEEE ICDM), San José
graphs, 2nd edn. Springer, New York Mustapha I, Jusoh A, Nor KM (2015) A review on quality
Jordan WE (1972) Failure modes, effects and criticality analyses. In: management systems maintenance framework based on process
Proceedings of the annual reliability maintainability symposium, based management, knowledge quality and knowledge self-
pp 30–37 efficacy. J Teknol 72(4):7–12
Kjærulff UB, Madsen AL (2006) Probabilistic networks for practi- Omdahl TP (1988) Reliability, availability, and maintainability
tioners—a guide to construction and analysis of Bayesian (RAM) dictionary. ASQC Quality Press, Milwaukee
networks and influence diagrams. Department of Computer Parida A, Kumar U (2006) Maintenance performance measurement
Science, Aalborg University, HUGIN Expert A/S (MPM): issues and challenges. J Qual Maint Eng 12(3):239–251
Krishnamurthy L, Adler R, Buonadonna P, Chhabra J, Flanigan M, Peter ML (2012) Bayesian statistics: an introduction. Wiley, New York
Kushalnagar N, Nachman L, Yarvis M (2005) Design and Pourret O, Naı̈m P, Marcot B (2008) Bayesian networks: a practical
deployment of industrial sensor networks: experiences from a guide to applications. Wiley, Chichester
semiconductor plant and the North Sea. In: Proceedings of the Rashid HSJ, Place CS, Braithwaite GR (2014) Eradicating root causes
3rd international conference on embedded networked sensor of aviation maintenance errors: introducing the AMMP. Cogn
systems, San Diego, California, USA, November 2005, pp 02–04 Technol Work 16(1):71–90
Lee B (2001) Using Bayes belief networks in industrial FMEA Rasmussen J (2000) Human factors in a dynamic information society:
modelling and analysis. Proc Annu Reliab Maintainab Symp where are we heading? Ergonomics 43(7):869–879
15(4):281–293 Redmill F (2002) Risk analysis—a subjective process. Eng Manag J
Lin XJ, Lin Q, Zhang GN (2015) Effectivity of total productive 12(2):91–96
maintenance (TPM) in large size organizations—a case study in Rissanen J (1978) Modeling by shortest data description. Automatica
Shandong Lingong. Appl Mech Mater 701:1249–1252 14(5):465–658. doi:10.1016/0005-1098(78)90005-5
Liu Y (2008) Predictive modeling for intelligent maintenance in Schirru A, Pampuri S, DeNicolao G (2010) Particle filtering of hidden
complex semiconductor manufacturing processes. ProQuest, gamma processes for robust predictive maintenance in semicon-
Doctorate thesis, University of Michigan ductor manufacturing. In: Proceedings of 6th IEEE CASE

123
Cogn Tech Work (2016) 18:193–213 213

SEMI E79-0200 (2000) Standard for definition and measurement of Uzsoy R, Lee CY, Martin-Vega LA (1994) A review of production
equipment productivity. Semiconductor Equipment and Material planning and scheduling models in the semiconductor industry
International (Mountain View, CA) part II: shop-floor control. IIE Trans 26(5):44–55
Shahzad MK, Hubac S, Siadat A, Tollenaere M (2011) An extended Verdier G, Ferreira A (2011) Adaptive mahalanobis distance and-
business model to ensure time-to-quality in semiconductor nearest neighbor rule for fault detection in semiconductor
manufacturing industry. In: International conference on enter- manufacturing. IEEE Trans Semicond Manuf 24(1):59–68
prise information systems, Portugal, 2011 Villacourt M (1992) Failure mode and effects analysis (FMEA): a
Simões JM, Gomes CF, Yasin MM (2011) A literature review of guide for continuous/line improvement for the semiconductor
maintenance performance measurement: a conceptual frame- equipment industry SEMATECH. Transfer 92020963B-ENG
work and directions for future research. J Qual Maint Eng Weber P, Jouffe L (2006) Complex system reliability modeling with
17(2):116–137 dynamic object oriented Bayesian networks (DOOBN). Reliab
Susto G, Beghi A, DeLuca C (2011) A predictive maintenance system Eng Syst Saf 91:149–162
for silicon epitaxial deposition. In: Proceedings of IEEE Weber A, Thomas R (2006) Key performance indicators: measuring
conference on automation science and engineering (CASE), and managing the maintenance function. Ivara Corporation,
pp 262–267 Burlington
Susto G, Pampuri S, Schirru A, Beghi A (2012) Optimal tuning of Weber P, Suhner MC, Iung B (2001) System approach-based
epitaxy pyrometers. In: Proceedings of 23rd IEEE/SEMI Bayesian network to aid maintenance of manufacturing process.
advanced semiconductor manufacturing conference, pp 294–299 In: Proceedings of 6th IFAC symposium on cost oriented
Teece DJ (2000) Strategies for managing knowledge assets: the role automation, low cost automation, Berlin, Germany, October
of firm structure and industrial context. Long Range Plan 2001, pp 8–9
33(1):35–54 Yssaad B, Khiat M, Chaker A (2014) Reliability centered mainte-
Teyssier M, Koller D (2005) Ordering-based search: a simple and nance optimization for power distribution systems. Int J Electr
effective algorithm for learning Bayesian networks. In: Proceed- Power Energy Syst 55:108–115
ings of 21st conference on uncertainty in AI (UAI), pp 584–590

123

Você também pode gostar