
Reliability and Maintainability Engineering: An Overview
E. A. Elsayed
Department of Industrial and Systems Engineering
Rutgers University
New Jersey, USA
elsayed@rci.rutgers.edu
Brief Topics
Some initial thoughts and personal experience
Reliability definition
Reliability testing
Methods for improving reliability
Maintenance strategies: PM, FR, inspection policies
Condition-based maintenance based on current sensor technologies
Reliability Importance
Reliability is one of the most important characteristics of a product: it measures the product's performance over time (e.g., transatlantic and transpacific cables).
Product recalls are common, but usually only after time elapses.
Early failures occur in the field (grounding of the Boeing 787 Dreamliner).
Products are discontinued because of fatal accidents (Ford Pinto, Concorde).
Medical devices and organs (reliability of artificial organs): how reliable?
Reliability Economics
Oil Pipeline Shutdown (Hardware Failure)
BP shuts oilfield, August 8, 2006
Damaged pipeline in Alaska affects 8% of U.S. oil production; crude surges; record gas prices seen.
The threat of a stoppage also endangers Alaska's budget: oil taxes account for more than 90 percent of its revenues.
BP officials have acknowledged that they did not test the pipes adequately with a so-called pig, a device that is run through a pipe and uses ultrasound to gauge corrosion. This points to the need for inspection and maintenance strategies.
Reliability Engineering
Air Traffic Delays (Software Failure)
Nov. 19, 2009: A computer glitch caused flight cancellations and delays across the U.S.
The problem involved the FAA computer systems in Salt Lake City and Atlanta that handle automated flight plans, forcing air traffic controllers to revert to the much more time-consuming approach of entering flight plans by hand.
Software failure (7,000 flights)
Global Impact: Oceanic Airspace Data Link Communication Reliability
[Figure: oceanic data link between aircraft and Oakland ARTCC through ARINC and SITA ground earth stations (uplink), carrying engine monitoring, position monitoring, and AOC (Aircraft Operational Communication).]
Recall of Cars due to Degradation (Environmental Conditions)
October 1, 2012
General Motors Co. recalled more than 40,000 cars sold in
warm-weather states because a plastic part might crack and
cause a fuel leak.
The recall affects vehicles sold or currently registered in
Arizona, California, Florida, Nevada or Texas. Owners in
Arkansas and Oklahoma also are included in the recall of the
2009 Cobalt and G5.
The vehicles have plastic parts connected to the fuel pump
which could crack. If the crack gets large enough, fuel could
leak out of the vehicle and cause a fire.
Reliability Definitions and Measurements
When you buy a product or service, you request high quality and high reliability.
How do you measure it? What is "high"?
How long? Reliability: 0.99 at year 5, 0.999 at year 4.
Time-dependent quality: reliability.
How do companies predict reliability and estimate warranty?
Reliability of cold standby units: new tires and old tires.
Some Initial Thoughts
Repairable and Non-Repairable
[Figure: reliability versus time, with one curve labeled "With Repairs", one labeled "No Repairs", and a line marking the maximum reliability level.]
Another measure of reliability is availability (the probability that the system provides its functions when needed).
Some Initial Thoughts
Failure Rate During Life Cycle
Will you buy additional warranty?
Burn-in and removal of early failures.
[Figure: failure rate versus time (bathtub curve) with three regions: early failures, constant failure rate, increasing failure rate.]
Reliability Definitions
Reliability is a time-dependent characteristic.
It can only be determined after an elapsed time, but it can be predicted at any time.
It is the probability that a product or service will operate properly for a specified period of time (design life) under the design operating conditions without failure.
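As a minimal sketch of this definition, reliability at time t can be estimated from a sample of observed failure times as the fraction of units still operating at t. The function name and the failure times below are hypothetical, and the estimate assumes every unit was run to failure (no censoring).

```python
def empirical_reliability(failure_times, t):
    """Estimate R(t) = P(T > t) as the fraction of units that fail after time t."""
    if not failure_times:
        raise ValueError("need at least one observed failure time")
    survivors = sum(1 for ft in failure_times if ft > t)
    return survivors / len(failure_times)


# Hypothetical failure times in hours (illustrative only).
sample_hours = [1200.0, 2500.0, 3100.0, 4700.0]
# Two of the four units are still running at 3000 hours.
r_at_3000 = empirical_reliability(sample_hours, 3000.0)
```

With real field data, units that have not yet failed would need a censoring-aware estimator rather than this simple fraction.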
Other Measures of Reliability
Availability is used for repairable systems
It is the probability that the system is operational at
any random time t.
It can also be specified as a proportion of time that
the system is available for use in a given interval
(0,T).
Mission availability: play time, military products
One-shot devices: missiles, standby generators.
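The "proportion of time available in (0, T)" reading of availability can be sketched from an outage log. This is an illustration under stated assumptions: the function name and the outage data are hypothetical, and outages are taken to be non-overlapping.

```python
def interval_availability(outages, horizon):
    """Proportion of the interval (0, horizon) during which the system is up.

    outages: non-overlapping (start, end) downtime intervals.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    downtime = 0.0
    for start, end in outages:
        # Clip each outage to the observation window before adding it.
        clipped = min(end, horizon) - max(start, 0.0)
        if clipped > 0:
            downtime += clipped
    return 1.0 - downtime / horizon


# Hypothetical outage log (hours) over a 1000-hour window: 50 hours down in total.
availability = interval_availability([(100.0, 110.0), (640.0, 680.0)], 1000.0)
```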
Other Measures of Reliability
Mean Time To Failure (MTTF): It is the average
time that elapses until a failure occurs.
It does not provide information about the distribution
of the TTF, hence we need to estimate the variance
of the TTF.
Mean Time Between Failures (MTBF): It is the average time between successive failures.
It is used for repairable systems.
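A small sketch of these two measures, using hypothetical data: MTTF is estimated as the sample mean of the times to failure, reported together with the sample variance because the mean alone says nothing about the spread; MTBF is estimated as total operating time divided by the number of failures. The function names are illustrative.

```python
import statistics


def mttf_summary(failure_times):
    """Return (sample mean, sample variance) of the observed times to failure."""
    if len(failure_times) < 2:
        raise ValueError("need at least two failure times to estimate a variance")
    return statistics.mean(failure_times), statistics.variance(failure_times)


def mtbf_estimate(total_operating_time, n_failures):
    """Average operating time between successive failures of a repairable system."""
    if n_failures <= 0:
        raise ValueError("need at least one failure")
    return total_operating_time / n_failures


# Hypothetical data (hours).
mttf, ttf_variance = mttf_summary([100.0, 200.0, 300.0])
mtbf = mtbf_estimate(1000.0, 4)
```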
Other Measures of Reliability
Mean Residual Life (MRL): It is the expected remaining life, T - t, given that the product, component, or system has survived to time t:
$$L(t) = E[T - t \mid T \ge t] = \frac{1}{R(t)} \int_t^{\infty} \tau f(\tau)\, d\tau - t$$
Failure Rate (FITs: failures in $10^9$ hours): The failure rate in a time interval $[t_1, t_2]$ is the probability that a failure per unit time occurs in the interval, given that no failure has occurred prior to the beginning of the interval.
Hazard Function: It is the limit of the failure rate as the length of the interval approaches zero.
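The mean residual life and the interval failure rate can both be estimated directly from a complete sample of failure times. The sketch below uses the usual form of the interval failure rate, [R(t1) - R(t2)] / [(t2 - t1) R(t1)]; the function names and the data are hypothetical, and no censoring is assumed.

```python
def empirical_reliability(failure_times, t):
    """Fraction of units whose failure time exceeds t."""
    return sum(1 for ft in failure_times if ft > t) / len(failure_times)


def mean_residual_life(failure_times, t):
    """Average remaining life T - t among units that survived past t."""
    remaining = [ft - t for ft in failure_times if ft > t]
    if not remaining:
        raise ValueError("no unit survived to time t")
    return sum(remaining) / len(remaining)


def interval_failure_rate(failure_times, t1, t2):
    """Failures per unit time in [t1, t2], conditional on survival to t1."""
    if t2 <= t1:
        raise ValueError("t2 must be greater than t1")
    r1 = empirical_reliability(failure_times, t1)
    r2 = empirical_reliability(failure_times, t2)
    if r1 == 0:
        raise ValueError("no unit survived to t1")
    return (r1 - r2) / ((t2 - t1) * r1)


# Hypothetical failure times (hours).
times = [100.0, 200.0, 300.0, 400.0]
mrl_150 = mean_residual_life(times, 150.0)
rate_150_250 = interval_failure_rate(times, 150.0, 250.0)
```

Shrinking the interval [t1, t2] around a point gives an empirical approximation of the hazard function at that point, which in practice needs far more data than this toy sample.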
Types of Reliability Testing
Highly accelerated life testing (HALT)
Highly accelerated stress screening (HASS)
Accelerated life testing (ALT)
Degradation testing (DT)
Accelerated Degradation Testing (ADT)
Acceptance test (AT)
Purposes and Needs
1. Investigate failure modes and correct designs
2. Improve reliability during development
3. Demonstrate reliability for accepting a design
4. Predict reliability
5. Eliminate units with manufacturing defects
(infant mortality failures)
6. Accept or reject a product
Idea of Accelerated Stress Testing
Compression and extrapolation
[Figure: life (MTTF) versus stress. Life measured at a more severe condition is extrapolated to the use condition, where life is unknown.]
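The extrapolation idea can be sketched with a simple life-stress model. The slides do not name a model, so the log-linear relationship ln(life) = a + b * stress used here is an assumption, as are the function names and the synthetic test data; real programs choose the model (Arrhenius, inverse power law, and so on) from the failure physics.

```python
import math


def fit_log_linear(stresses, lives):
    """Least-squares fit of ln(life) = a + b * stress; returns (a, b)."""
    n = len(stresses)
    if n < 2 or n != len(lives):
        raise ValueError("need at least two matching (stress, life) pairs")
    logs = [math.log(v) for v in lives]
    s_mean = sum(stresses) / n
    l_mean = sum(logs) / n
    sxx = sum((s - s_mean) ** 2 for s in stresses)
    if sxx == 0:
        raise ValueError("stress levels must not all be equal")
    sxy = sum((s - s_mean) * (l - l_mean) for s, l in zip(stresses, logs))
    b = sxy / sxx
    a = l_mean - b * s_mean
    return a, b


def extrapolate_life(a, b, use_stress):
    """Predicted life at the (lower) use stress under the fitted model."""
    return math.exp(a + b * use_stress)


# Synthetic lives generated from the assumed model (illustrative only).
test_stress = [150.0, 170.0, 190.0]
test_life = [math.exp(10.0 - 0.02 * s) for s in test_stress]
a_hat, b_hat = fit_log_linear(test_stress, test_life)
life_at_use = extrapolate_life(a_hat, b_hat, 100.0)
```

The further the use condition lies from the tested stress levels, the more the prediction depends on the assumed model rather than on the data.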
Testing and Acceptance
Testing and Life Prediction:
Conduct extensive reliability testing for both demonstration (acceptance of products) and life prediction.
Collect data continuously to maintain a current assessment of system reliability. Always assess the mean residual life to determine the optimum time to replace the system, especially in the wear-out region.
Suppliers need to provide evidence of system reliability.
Current Cycle Profile: -40 °C to 125 °C
New Cycle Profile: -65 °C to 170 °C
Open questions: stress type, stress loading, sample size, and duration of the test?
Summary of Reliability Testing
1 2 3 4 5 6
RDT X
RGT X
HALT X X
HASS X X
ALT X X
DT X X
ADT X X
Burn-in X
Acceptance X X
System Design
Consider many alternatives, such as highly reliable components without redundancy (high cost) or less reliable components with explicit redundancy (a less expensive system).
Prime components.
[Figure: a single component with reliability 0.95 gives R = 0.95; two redundant components with reliability 0.90 each give R = 0.99.]
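The trade-off between one high-reliability component and redundant lower-reliability components can be checked numerically. The sketch below assumes independent components and active (parallel) redundancy; the function names are illustrative.

```python
import math


def series_reliability(reliabilities):
    """All components must work: multiply the component reliabilities."""
    return math.prod(reliabilities)


def parallel_reliability(reliabilities):
    """At least one component must work: 1 minus the product of unreliabilities."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# One 0.95 component versus two redundant 0.90 components.
single = series_reliability([0.95])
redundant_pair = parallel_reliability([0.90, 0.90])
```

Under these assumptions the redundant pair of 0.90 components reaches about 0.99, higher than the single 0.95 component, which is why cost comparisons between the two designs matter.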
System Design (Cont'd)
System design: implicit redundancy
Pumps connected in series.
Two consecutive pumps must fail for the system to fail.
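A line of pumps that fails only when two adjacent pumps fail is commonly called a linear consecutive-k-out-of-n:F system (here k = 2). The slides give no formula, so the following is a sketch under the assumption of independent pumps with a common reliability p; the function name is hypothetical.

```python
def consecutive_k_reliability(n, k, p):
    """Reliability of n independent units in a line that fails when k
    consecutive units have failed; each unit works with probability p."""
    if n < 1 or k < 1 or not 0.0 <= p <= 1.0:
        raise ValueError("require n >= 1, k >= 1 and 0 <= p <= 1")
    q = 1.0 - p
    # state[j]: probability the system is still working and the most
    # recent j units in the line have failed (j = 0 .. k-1).
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        new_state = [0.0] * k
        new_state[0] = p * sum(state)        # unit works: failure run resets
        for j in range(1, k):
            new_state[j] = q * state[j - 1]  # unit fails: run grows, still below k
        state = new_state
    return sum(state)


# Three pumps in a line, each with reliability 0.9 (illustrative value).
r_three_pumps = consecutive_k_reliability(3, 2, 0.9)
```

With k = 1 the same recursion reduces to an ordinary series system, which is a convenient sanity check.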
Failure Rate and Maintenance
Availability
Region 1 (early failures): repair upon failure (FR)
Region 2 (constant failure rate): no PM
Region 3 (increasing failure rate): PM, condition-based if possible.
[Figure: failure rate versus time (bathtub curve) with three regions: early failures, constant failure rate, increasing failure rate.]
System State under Imperfect Maintenance
[Figure: system state X(t) versus time t, starting at 0, with labels $R_1, \dots, R_6$ and $R_1^+, \dots, R_6^+$ at successive maintenance actions, the distribution of the state after maintenance, shrinking intervals $T_1 > T_2 > T_3 > T_4$, and horizontal lines at $D_{PM}$ and $D_F$.]
$D_F$ is the failure threshold, while $D_{PM}$ is the preventive maintenance threshold.
Maintenance
The time to perform maintenance is normally recommended by the manufacturer for equipment operating under normal conditions. What is normal?
If the estimate of failure time is conservative, one risks replacing items before the end of their useful life.
If the estimate of failure time is optimistic, one risks that the component or system may fail in service.
Use Condition-Based Maintenance
Technological Advances and Maintenance
Condition-based Maintenance
Recent advances in sensor technology, chemical and physical non-destructive testing (NDT), sophisticated measurement techniques, information processing, wireless communications, and internet capabilities have significantly impacted the condition-based maintenance approach. They make possible dynamic maintenance schedules that minimize cost and downtime and increase system availability.
Condition-Based Maintenance Example
Otis Elevators / United Technologies
Remote Elevator Monitoring
It is an interconnected system of sensors, monitors, circuits, hardware, and software used to collect, record, analyze, and communicate elevator data. It continuously monitors hundreds of different functions on elevators worldwide (braking time, acceleration, deceleration, floor level, door opening, forces on strands, ...).
Condition-Based Maintenance Example
If the system detects a problem, it analyzes and diagnoses the problem's cause and location. It makes the service call and aids the Otis mechanic in identifying the exact component(s) causing the problem, which helps to facilitate a timely and accurate resolution.
Issues:
Which component or subsystem?
What is the indicator?
What is the alarm level of the indicator?
What is the action?
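The four questions above map naturally onto a small decision rule: a monitored indicator is compared with an alarm level and a failure level, and each outcome is tied to an action. The sketch below is purely illustrative; the function name, the threshold values, and the action labels are hypothetical, and the indicator is assumed to grow as the component degrades.

```python
def maintenance_action(indicator, pm_threshold, failure_threshold):
    """Map a degradation indicator reading to a maintenance action."""
    if pm_threshold >= failure_threshold:
        raise ValueError("the alarm (PM) level must lie below the failure level")
    if indicator >= failure_threshold:
        return "repair"                   # failure level reached: corrective action
    if indicator >= pm_threshold:
        return "preventive_maintenance"   # alarm level reached: act before failure
    return "monitor"                      # below the alarm level: keep observing


# Hypothetical door-opening time readings (seconds) with alarm at 3.0 and failure at 4.0.
actions = [maintenance_action(x, 3.0, 4.0) for x in (2.1, 3.4, 4.2)]
```

Choosing the alarm level is the hard part: set too low it triggers unnecessary maintenance, set too high it leaves too little time to act before failure.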
Degradation Indicators
Most mechanical components, such as gears, brakes, and bearings, exhibit degradation before failure.
Ductile materials have degradation indicators: changes in their physical properties.
Electronic components such as resistors, capacitors, and diodes exhibit changes in performance (resistance drift, for example).
Polymers and elastomers change properties with time and stress.
Degradation Indicators
Examples of these indicators include hardness, which is a measure of the degradation of elastomers. Elastomeric materials are critical to many applications, including hoses, seals, and dampers of various types, and their hardness increases over time to a critical level at which their ability to absorb energy is severely degraded. This may lead to cracks, excessive wear, and related failure modes in components.
Continuous-time Continuous-state
Degradation (Indicators)
Examples
Light intensity degradation of light emitting diode (LED)
Metal crack propagation
Wear of tires, brake system wear out, nozzle blocking.
Increase of stiffness of energy absorbing material.
Strength loss of steel beams due to corrosion
Degradation Path
$$dX(t) = \mu\,dt + \sigma\,dW(t)$$
$$\int dX(t) = \int \mu\,dt + \int \sigma\,dW(t) \;\Rightarrow\; X(t_i) = x_0 + \mu t_i + \sigma W(t_i)$$
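A degradation path of this form, a Brownian motion with drift, can be simulated by accumulating independent normal increments. This is a minimal sketch: the function name and parameter values are hypothetical, with `mu` the drift, `sigma` the diffusion coefficient, and `dt` a fixed time step.

```python
import math
import random


def simulate_degradation_path(x0, mu, sigma, dt, n_steps, rng=None):
    """Simulate X(t) = x0 + mu*t + sigma*W(t) on a grid with spacing dt."""
    if rng is None:
        rng = random.Random()
    path = [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        path.append(path[-1] + mu * dt + sigma * dw)
    return path


# Hypothetical parameters; a seeded generator makes the run repeatable.
path = simulate_degradation_path(0.0, 0.5, 0.2, 1.0, 100, random.Random(42))
```

Setting `sigma` to zero removes the noise and leaves the straight line x0 + mu*t, which is a quick way to check the drift term.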
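Brownian Motion: Parameters Estimation
Using the Maximum Likelihood Estimator, we obtain the parameters of the Brownian Motion degradation path. With $y_i$ denoting the degradation increment observed over the $i$-th of $n$ intervals of equal length $\Delta t$:
$$\hat{m} = \frac{\sum_{i=1}^{n} y_i}{n}, \qquad \hat{v} = \frac{\sum_{i=1}^{n} (y_i - \hat{m})^2}{n}$$
$$m = \mu\,\Delta t, \qquad v = \sigma^2\,\Delta t$$
$$\hat{\mu} = \frac{\sum_{i=1}^{n} y_i}{n\,\Delta t}, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{m})^2}{n\,\Delta t}$$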
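Reading the estimators as the usual maximum likelihood result for normally distributed increments over equal time steps (an assumption about what the slide shows), the drift and diffusion parameters follow from the sample mean and the n-denominator variance of the increments. The function name and the data are hypothetical.

```python
def estimate_brownian_parameters(increments, dt):
    """MLE of drift mu and diffusion variance sigma^2 from degradation
    increments y_i observed over equal intervals of length dt."""
    n = len(increments)
    if n == 0 or dt <= 0:
        raise ValueError("need at least one increment and a positive dt")
    m_hat = sum(increments) / n                            # mean increment
    v_hat = sum((y - m_hat) ** 2 for y in increments) / n  # MLE variance (divides by n)
    return m_hat / dt, v_hat / dt


# Hypothetical increments measured every 2 time units.
mu_hat, sigma2_hat = estimate_brownian_parameters([1.0, 2.0, 3.0], 2.0)
```

Because the variance estimate divides by n rather than n - 1, it is biased low for small samples, which matters when only a few degradation readings are available.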
First Passage Time Distribution
[Figure: two degradation paths (degradation 1, degradation 2) with two threshold levels (threshold 1, threshold 2).]
Relate Degradation to Failure Time Distribution (First Passage Time)
[Figure: degradation versus time (0 to 200) for four data sets, each with an exponential fit. A horizontal line marks the critical degradation level (failure threshold); the time at which a path crosses it is the failure time.]
$$R(t) = \text{Prob}(\text{the first time for the degradation measure to cross the failure threshold} > t)$$
First Passage Time Distribution
In degradation modeling we set a degradation threshold level at which the performance of the system is considered unacceptable. The threshold level is $D_f$. The probability of failure can be defined as
$$F(t) = P(T \le t) = 1 - \Phi\!\left(\frac{D_f - \mu t}{\sigma}\right)$$
The time to cross the threshold level is referred to as the first passage time. It has a distribution, and its density function is given as shown next.
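For a Brownian motion with positive drift that starts at zero, a standard closed form for the first passage time to a fixed threshold is the inverse Gaussian distribution. The sketch below evaluates that textbook CDF; it is offered as an illustration under those assumptions, not as the expression used on the slide, and the function names and parameter values are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def first_passage_cdf(t, threshold, mu, sigma):
    """P(T <= t), where T is the first time X(t) = mu*t + sigma*W(t)
    reaches `threshold` (inverse Gaussian first passage time)."""
    if threshold <= 0 or mu <= 0 or sigma <= 0:
        raise ValueError("threshold, mu and sigma must be positive")
    if t <= 0:
        return 0.0
    root = sigma * math.sqrt(t)
    direct = std_normal_cdf((mu * t - threshold) / root)
    # Reflection term; exp() can overflow when 2*mu*threshold/sigma**2 is very large.
    reflected = math.exp(2.0 * mu * threshold / sigma ** 2) * std_normal_cdf(
        -(mu * t + threshold) / root
    )
    return direct + reflected


# Hypothetical parameters: threshold 10, drift 1 per unit time, sigma 1.
f_at_mean = first_passage_cdf(10.0, 10.0, 1.0, 1.0)
```

The mean first passage time under this model is threshold / mu, so `f_at_mean` is the probability of failing by the mean life; it exceeds one half because the distribution is right-skewed.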
Failure Data in Oil and Gas Industry
OREDA database:
MTBF of a motor-driven dry-wet compressor unit = 1.8 years (it is really MTTF)
Major failures: 80% of all forced outages are caused by unforeseen liquid ingress into the compressor
Failure Data in Oil and Gas Industry
OREDA database:
Seal failures: 80% of all seal failures are caused by
contamination
aging (elastomers)
Solution
Analysis of 11,000 mechanical seal failures from 148 different reliability contract and alliance plant sites over two years shows that 13% of the seal failures are attributable to a lack of effective corrective and preventive maintenance.
Reliability in Oil and Gas Chain
Failure Minimization
Minimize failures in gas transportation
What causes flow stoppage? Pipe failure (corrosion, leak), pump failure.
Redundancy, highly reliable components
Minimize failures in gas processing
Prevent corrosion and erosion of equipment
Prevent condensation of liquid
Prevent ice and gas hydrates
[Figure: examples of ice and gas hydrates and of corrosion.]
Summary
Reliability has a major impact on the services and products provided by any enterprise, in terms of its economics and consequently its survival.
Improvements in reliability can be achieved through choice of components, design of the system (redundancy, if needed), testing and prediction, and maintenance and repair strategies. Effective tools are available.