
2013 American Control Conference (ACC)

Washington, DC, USA, June 17-19, 2013

Application of Principal Component Pursuit to Process Fault Detection and Diagnosis

Yue Cheng and Tongwen Chen

Abstract— Data-driven process monitoring has been extensively discussed in both academia and industry because of its applicability and effectiveness. One of the most widely applied techniques is principal component analysis (PCA). Recently, a new technique called principal component pursuit (PCP) has been introduced. Compared with PCA, PCP is more robust to outliers. In this paper, the application of the PCP technique to process monitoring is thoroughly discussed, from training data preprocessing to residual signal post-filtering. A new scaling preprocessing step is proposed to improve the quality of data matrices in the sense of low coherence. A residual generator and a post-filter suitable for PCP generated process models are also provided. The post-filtered residual represents the fault signal, which makes the fault detection, isolation and reconstruction procedures simple and straightforward. A numerical example is provided to describe and illustrate the PCP-based process modeling and monitoring procedures.

NOTATION

‖X‖∗     nuclear norm (sum of singular values)
‖X‖0     0 norm of a matrix (number of non-zero entries)
‖X‖1     ℓ1 norm of a matrix (sum of magnitudes of all entries)
‖X‖∞     ℓ∞ norm of a matrix (maximum magnitude of all entries)
diag(X)  the set of diagonal entries of a square matrix

I. INTRODUCTION

In process monitoring, data-driven methods are widely used because of their low cost and the availability of enormous amounts of process data in historical databases. It is generally true that high dimensional process data are governed by underlying low dimensional subspaces. Therefore, principal component analysis (PCA) has become one of the most popular and widely used data-driven methods in industry to capture such low dimensional subspaces. However, PCA is sensitive to outliers, also called gross errors, in the data. In recent years, a new matrix decomposition technique called principal component pursuit (PCP) has been introduced and extensively discussed [1]–[5]. Compared with PCA, PCP is robust to outliers. Moreover, the new technique outperforms other robust PCA methods because of its polynomial-time complexity and the mild conditions under which good performance can be guaranteed.

The PCP technique stems from compressed sensing [6], [7], which reveals a surprising message: the minimum amount of data needed to reconstruct a signal may overcome the limitation imposed by the Nyquist-Shannon criterion if the signal is sparse in a certain sense. Inspired by this idea, a matrix completion method was proposed to recover a data matrix from only a few of its entries [8]. Going one step further, researchers focused on a more challenging problem: recovering a low rank data matrix contaminated by gross errors on some of its entries. A novel solution, the PCP technique, was then provided [1], [2]. The essential idea of the PCP technique is to replace the original non-convex optimization problem, involving the matrix rank and the count of non-zero entries, by a convex optimization problem using the nuclear and ℓ1 norms. In [2], [3], deterministic conditions under which the two optimization problems are exactly equivalent were provided; statistical counterparts were provided in [1]. While the conditions are relatively mild, they depend greatly on the coherence of the uncontaminated data matrix. The concept of coherence and a related coherence index were first introduced in a compressed sensing method in [9]; the same authors then adapted this concept to matrix completion and PCP techniques [1], [8], and defined three coherence indices. Generally speaking, the smaller the coherence indices are, the lower the requirement on the completeness of the signal or the matrix is.

The PCP technique has become popular in the image processing area. The technique has been applied successfully to video surveillance [1], [10] and face recognition [11]. There have also been attempts to apply PCP to latent semantic indexing [12]. The PCP technique can be applied to many other potential problems as well. Briefly speaking, in all the areas where PCA can be used and outliers are inevitable, PCP may be a good alternative. Process monitoring is such an area. However, to the best of our knowledge there has been only one paper on this topic [13]. In [13], a comparison between the process monitoring approaches based on PCA and PCP was given, and the conclusion was drawn that the PCP technique was promising in process monitoring because the PCP-based method could overcome most of the shortcomings of PCA-based methods.

This paper provides a further discussion on how to apply the PCP technique to process monitoring, with emphasis on the data preprocessing. The rest of the paper is organized as follows. In Section II, a brief introduction to PCP is given. In Section III, a new scaling method is proposed as a preprocessing step to improve the PCP-based modeling result. An online fault detection and diagnosis procedure based on PCP generated process models is discussed in Section IV. Finally, concluding remarks and future work are given in Section V.

This work was supported by an NSERC strategic project. Y. Cheng and T. Chen are with the Department of Electrical and Computer Engineering, University of Alberta, Alberta, T6G 2V4, Canada. Email: cheng5@ualberta.ca; tchen@ualberta.ca.



II. PRINCIPAL COMPONENT PURSUIT

L0 ∈ R^{n1×n2} is a low rank data matrix and S0 ∈ R^{n1×n2} is a sparse perturbation matrix. M is the summation of L0 and S0, namely, the observed data matrix. The goal of PCP is to solve the following optimization problem:

    Minimize    rank(L) + λ‖S‖0                    (1)
    Subject to  L + S = M.

This optimization problem is very difficult to solve. However, the authors of [1], [2] pointed out that the ℓ1 norm is a good replacement for the 0 norm and the nuclear norm is a good replacement for the matrix rank. As a result, the objective function of the original optimization problem in (1) can be modified, and the new optimization problem is as follows:

    Minimize    ‖L‖∗ + λ‖S‖1                       (2)
    Subject to  L + S = M.

This new formulation is a convex optimization problem. It can be solved effectively by a modified alternating directions method, provided as Algorithm 1 in [1]. But is the solution to problem (2) a good approximation of that to problem (1)? In other words, is the solution to problem (2) a good approximation of the low rank matrix L0 and the sparse matrix S0? A positive answer is important, since otherwise it makes no sense to use the solution to problem (2) to do the recovery.
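Algorithm 1 in [1] is an augmented Lagrange multiplier scheme that alternates singular value thresholding on L with entrywise soft thresholding on S. The following Python/NumPy sketch illustrates the idea; the initialization of the penalty weight and the stopping rule are common heuristic choices, not necessarily the exact settings of [1]:

```python
import numpy as np

def shrink(X, tau):
    # Entrywise soft thresholding: the proximal operator of the l1 norm.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    # Singular value thresholding: the proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def pcp(M, lam=None, tol=1e-7, max_iter=500):
    """Split M into a low rank part L and a sparse part S (a PCP sketch)."""
    n1, n2 = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))   # the weight suggested by Theorem 1
    mu = 0.25 * n1 * n2 / np.abs(M).sum()  # heuristic initial penalty weight
    Y = np.zeros_like(M, dtype=float)      # Lagrange multiplier matrix
    S = np.zeros_like(M, dtype=float)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)      # minimize over L
        S = shrink(M - L + Y / mu, lam / mu)   # minimize over S
        R = M - L - S                          # constraint residual
        Y = Y + mu * R                         # dual ascent step
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S
```

The per-iteration cost is dominated by one SVD, which is what makes PCP a polynomial-time method.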
Theorem 1.1 in [1] answered the question. In the theorem, the concept of coherence was used. The coherence of a subspace was defined in [8]:

Definition 1: Let U be a subspace of R^n of dimension r and P_U be the orthogonal projection onto U. Then the coherence of U is defined to be

    µ(U) = (n/r) max_{1≤i≤n} ‖P_U e_i‖₂²,          (3)

where e_i is the i-th standard basis vector.

Suppose the singular value decomposition (SVD) of a rank-r matrix L0 is given by L0 = UΣV^T, where U ∈ R^{n1×r} and V ∈ R^{n2×r}. The coherences of the column and row spaces of L0, namely U and V, are:

    µ(U) = (n1/r) max_i ‖U^T e_i‖₂²;
    µ(V) = (n2/r) max_i ‖V^T e_i‖₂².               (4)

The mutual-coherence of U and V can also be defined as:

    µ1 = √(n1 n2 / r) ‖UV^T‖∞.                     (5)

Denote the maximum value of µ(U), µ(V) and µ1 by µ; then Theorem 1.1 in [1] can be expressed as follows.
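For a concrete rank-r matrix, all three indices can be read off a thin SVD. A minimal sketch following (4) and (5); the rank r is assumed known here, e.g., from a singular value cutoff:

```python
import numpy as np

def coherence_indices(L0, r):
    # Thin SVD; keep the leading r left and right singular vectors.
    U, _, Vt = np.linalg.svd(L0, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T
    n1, n2 = L0.shape
    # (4): mu(U) and mu(V) from the largest squared row norms of U and V.
    mu_U = n1 / r * np.max(np.sum(U**2, axis=1))
    mu_V = n2 / r * np.max(np.sum(V**2, axis=1))
    # (5): mutual coherence from the largest-magnitude entry of U V^T.
    mu_1 = np.sqrt(n1 * n2 / r) * np.max(np.abs(U @ V.T))
    return mu_U, mu_V, mu_1
```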
Theorem 1: Suppose L0 has coherence index µ, and that the support set, i.e., the set of non-zero positions in the matrix, of S0 is uniformly distributed among all sets of cardinality m. Then there is a numerical constant c such that, with probability at least 1 − c max(n1, n2)^{−10}, PCP with λ = 1/√(max(n1, n2)) is exact, provided that

    rank(L0) ≤ ρr min(n1, n2) µ^{−1} log^{−2}(max(n1, n2))
    and m ≤ ρs n1 n2,                              (6)

where ρr and ρs are positive numerical constants.

III. PROCESS MODEL BUILDING BASED ON PCP

Preprocessing is an important but usually overlooked step in PCA-based methods. A main issue is scaling. Variables in different units, e.g., meters and centimeters, should be scaled. Variables in the same unit, or in different types of units, are also usually scaled if their variances are very different. As a result, currently the simplest but most widely used scaling method is normalization, namely, dividing each variable by its standard deviation.

However, there is a debate that scaling will distort PCA generated results; so how can one guarantee that the distortion is always acceptable? Although some papers have tried to draw conclusions [14], [15], their statements were always based on case studies; therefore, the conclusions were also drawn from special cases.

Intuitively, the PCP technique should not be influenced by variable scaling, since neither the rank nor the sparsity of a data matrix changes after scaling. However, this is not true. The PCP technique can recover the low rank and sparse matrices when the conditions in (6) are satisfied, but ill scaling may greatly increase the coherence index and violate those conditions. Consequently, PCP may fail to recover the data matrices even though the rank and sparsity do not change. An example is shown below.

Example 1: A data set has 35 variables and 1000 samples, so L0 and S0 are 1000 × 35 matrices. The underlying rank is 5, i.e., rank(L0) = 5, and 5% of the data are outliers. The time trends of the original signals (only the first seven and last seven variables) are shown in Fig. 1. Obviously, there are some scaling problems; see the third and the last four variables. The variances of these five variables are extremely large, which leads to large values of µ(V) and µ1. The PCP result is far from the real L0 and S0 used to set up the example. Using the normalized data, µ(V) decreases from almost 7 to 2.4, and µ1 decreases from greater than 10 to less than 6; PCP can then reach a correct result. By using some other scaling parameters we can further decrease µ(V) to about 1.2 and µ1 to less than 5.

Fig. 1. Time trends of the original signals
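A data set with the structure of Example 1 can be generated as follows; the dimensions, rank and outlier ratio match the example, while the factor distributions, the choice of badly scaled columns and the outlier magnitudes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 1000, 35, 5

# Rank-5 outlier-free part: a product of random factors has rank 5.
L0 = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
# Ill scaling: blow up a few columns (here the 3rd and the last four).
L0[:, [2, 31, 32, 33, 34]] *= 100.0

# Sparse gross errors on 5% of the entries.
S0 = np.zeros((n1, n2))
mask = rng.random((n1, n2)) < 0.05
S0[mask] = rng.uniform(-50.0, 50.0, size=mask.sum())

M = L0 + S0  # the observed data matrix fed to PCP
```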

As a result, we may decrease the coherence of the data matrix and improve the PCP performance by optimizing the scaling parameters. In recent years a similar topic has been extensively discussed in compressed sensing [16], [17]: the essential idea there is to optimize the projection matrix so as to minimize the coherence between the dictionary matrix and the projection matrix. However, to the best of our knowledge, there is no work that addresses this idea in the PCP technique. The reasons may be twofold: first, the projection matrix in compressed sensing is an arbitrarily designed matrix while the data matrix in PCP is not designable; second, current applications of PCP are mainly in the field of image processing, where scaling is not a significant problem.

Intuitively, a subspace with low coherence should distribute its energy evenly over all the coordinates. However, if the variances of several variables are much greater than those of the others in the data matrix, most of the energy of the subspace V will concentrate on these directions and lead to a high coherence index µ(V), which is a typical case of ill scaling. The original data set in Example 1 is such a case. While extremely high variance on several variables can destroy the low coherence, this does not mean that all variables sharing the same variance reach the lowest coherence, which is also shown in Example 1. As a result, the relationship between the coherence index and the scaling vector is not so straightforward, and an approach to find the optimal scaling vector in the sense of lowest coherence is worthy of study.

Given the observed data matrix M = L0 + S0 ∈ R^{n1×n2}, each column denotes a certain process variable, and each row is a sample. Since we usually have sufficient historical data, the observed matrix has more rows than columns, namely n1 > n2. Assume that the outlier-free low rank matrix L0 has the SVD UΣV^T, and that the scaling vector is

    α = [α1  α2  ···  αn2],

where αi > 0, i = 1, 2, ···, n2. The scaled outlier-free matrix is

    Ls = UΣV^T diag(α1, α2, ···, αn2) = Us Σs Vs^T.

Obviously, the left null space of L0 is the same as that of Ls. As a result, µ(U) = µ(Us), since

    (r/n1) µ(U) = max_i ‖U^T e_i‖₂² = max(diag(UU^T))
                = 1 − min(diag(I − UU^T))
                = max(diag(Us Us^T)) = (r/n1) µ(Us).

However, µ(V) ≠ µ(Vs). Given an orthogonal basis V⊥ of the null space of L0 and the scaling vector α, the procedure to obtain µ(Vs) is as follows:

1) Calculate a basis of the null space of Vs: diag(α1, α2, ···, αn2)^{−1} V⊥;
2) Find an orthonormal basis Vs⊥ of its span;
3) Calculate the coherence index: µ(Vs) = (n2/r)[1 − min(diag(Vs⊥ Vs⊥^T))].
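The three steps translate directly into code. In this sketch, V_perp is an orthonormal basis of the null space of L0 (in practice estimated from the observed data, as discussed below), and alpha is the positive scaling vector:

```python
import numpy as np

def mu_Vs(V_perp, alpha):
    """Coherence mu(Vs) of the scaled row space, via steps 1)-3)."""
    n2, k = V_perp.shape
    r = n2 - k                          # rank of the underlying data matrix
    # 1) A (generally non-orthogonal) basis of the null space of Vs.
    B = V_perp / alpha[:, None]         # row i divided by alpha_i
    # 2) An orthonormal basis Vs_perp via QR decomposition.
    Q, _ = np.linalg.qr(B)
    # 3) The coherence index from the diagonal of Vs_perp Vs_perp^T.
    return n2 / r * (1.0 - np.min(np.sum(Q**2, axis=1)))
```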
Since µ(U) is unchangeable and the upper bound of µ1 is √(r µ(U) µ(Vs)) [8], a rational way to decrease the coherence is to minimize µ(Vs) by optimizing the scaling vector α. There are two difficulties in this optimization problem. First, the null space basis V⊥ is difficult to calculate, since we only have the contaminated data matrix M instead of the outlier-free one L0; we can, however, use M to estimate V⊥ approximately. Second, even if we can obtain V⊥, the optimization problem is still not easy because of the non-convexity of the objective function; here our solution is to adopt heuristic optimization algorithms. In this paper we adopt the differential evolution (DE) algorithm introduced in [18]. Although the approximation and the heuristic method cannot guarantee optimality, according to our simulation results we can usually reach a good scaling vector that dramatically decreases µ(Vs).
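With µ(Vs) computable, the search itself can be delegated to any DE implementation. Below is a sketch using SciPy's differential_evolution together with the mu_Vs function sketched above; the search bounds and the default DE settings are illustrative choices, not the exact tuning of our experiments:

```python
import numpy as np
from scipy.optimize import differential_evolution

def optimize_scaling(V_perp, lower=0.1, upper=10.0, seed=None):
    """Search a positive scaling vector alpha that minimizes mu(Vs)."""
    n2 = V_perp.shape[0]
    result = differential_evolution(
        lambda alpha: mu_Vs(V_perp, alpha),  # objective from the sketch above
        bounds=[(lower, upper)] * n2,        # one positive interval per variable
        seed=seed,
    )
    return result.x, result.fun              # optimal alpha and its mu(Vs)
```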
Example 2: We continue with Example 1. An approximate null space basis V⊥ is obtained using the observed data. Then 50 trials of the DE algorithm are conducted to observe the algorithm's stability. Based on our experience, the standard normalization is usually a good initial point for the scaling vector; so in this example we first normalize the data and then find the scaling vector based on this normalized data matrix. The optimal scaling vectors obtained in the 50 trials are shown in Fig. 2. The solutions are not quite stable, but they have similar trends. Figs. 3 and 4 provide the values of µ(Vs) and µ1 corresponding to the optimal solutions found in the 50 trials; the red dash-dot line and black dashed line show the values for the original data matrix and the standard normalized data matrix, respectively. We can also observe slight variations in the optimized values, but the scaled results are clearly better than the original and standard normalized ones. We then apply Algorithm 1 in [1] to the well scaled data matrix and the standard normalized data matrix. In all 50 trials the outlier-free matrix L0 is successfully recovered. Since the solutions are not the same in the 50 trials, the numbers of iterations to converge in Algorithm 1 also differ across the optimized scaled data matrices. Fig. 5 shows the histogram of the iteration numbers. The number of iterations for the standard normalized data matrix is 3150, which is larger than in all the trials using optimized scaled data matrices. Hence a well scaled data matrix usually means a faster convergence rate in Algorithm 1.

Fig. 2. Scaling parameters in 50 trials

Fig. 3. µ(V) of original and scaled data (original, standard normalized, and well scaled)

Fig. 4. µ1 of original and scaled data (original, standard normalized, and well scaled)

Fig. 5. Histogram of the number of iterations used to reach convergence


IV. FAULT DETECTION AND DIAGNOSIS

In Section III, the modeling procedure was discussed. A main purpose of building process models is process monitoring, namely, fault detection and diagnosis (FDD). Applications of PCA to FDD are extensively discussed, e.g., in [19]–[21]. A PCA generated process model determines two subspaces: the principal component subspace (PCS) and the residual subspace (RS). A measurement is decomposed by projecting it onto the two subspaces. The squared ℓ2 norm of the projection in the RS (the residual vector) is called the squared prediction error (SPE); another statistic, the T² statistic, describes the variation of a process in the PCS. Control limits of these two indices are provided for fault detection [21].

The PCP generated model also provides a PCS and an RS: VV^T determines the PCS and V⊥V⊥^T the RS. The projections of a measurement onto the two subspaces are determined by an optimization problem:

    Minimize    ‖x̃‖                              (7)
    Subject to  x̂ + x̃ = x,
                V⊥^T x̂ = 0,
where x is the measurement, and x̂ and x̃ are its projections in the PCS and RS, respectively. When the ℓ2 norm is used in the objective function, the optimal solution is the orthogonal projection, which is the projection used in PCA methods. However, since the fault cardinality is usually small, a fault behaves in the same way as a sparse outlier in the abnormal case when a single sample is investigated. As a result, the ℓ1 norm, which is the convex heuristic for the 0 norm, is a better choice in this situation. One question that must be answered is whether the solution obtained by minimizing the ℓ1 norm can estimate the sparse residual vector well. According to [22], a key point in guaranteeing the equivalence of the sparse solution and the minimal ℓ1 norm solution is that sparse vectors cannot lie in the complement of the RS, namely, the PCS. This is also a requirement of the PCP decomposition, and the reason why we search for the optimal scaling parameters. As a result, the RS of the well-scaled data rather than of the original data should be used for residual vector calculation.
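With the ℓ1 norm, problem (7) reduces to a linear program: the constraints say that x̂ lies in the PCS, so writing x̂ = Vz and minimizing ‖x − Vz‖₁ with auxiliary bound variables t ≥ |x − Vz| gives a standard LP. A minimal sketch using scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

def l1_residual(x, V):
    """Split a (scaled) measurement x per the l1 version of problem (7).

    Minimizes ||x - V z||_1 over z with x_hat = V z, where V holds an
    orthonormal basis of the PCS. Returns (x_hat, x_tilde).
    """
    n, r = V.shape
    c = np.concatenate([np.zeros(r), np.ones(n)])  # minimize sum of t
    A_ub = np.block([[ V, -np.eye(n)],             #  V z - t <= x
                     [-V, -np.eye(n)]])            # -V z - t <= -x
    b_ub = np.concatenate([x, -x])
    bounds = [(None, None)] * r + [(0, None)] * n  # z free, t nonnegative
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    z = res.x[:r]
    x_hat = V @ z            # reconstructed projection in the PCS
    return x_hat, x - x_hat  # x_tilde: the sparse residual in the RS
```

This is exactly the constrained linear program referred to in Section V, in contrast to the unconstrained quadratic (orthogonal) projection used by PCA-based methods.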
In the PCP-based method, we will only use the residual vector x̃ in fault detection. The main reason why we avoid using the outlier-free vector x̂ is the stationarity requirement on that vector. Only if the outlier-free signal is statistically stationary do variation tests, such as the T² test, make sense. However, the stationarity requirement usually cannot hold in practice, especially when the process works at several different operating points.
As stated in [13], the main advantage of applying the PCP-based fault detection, isolation, and reconstruction approach is its simplicity: all three purposes can be achieved simultaneously. A fault is detected and isolated on a certain variable when the corresponding residual is non-zero and the null hypothesis that the non-zero samples are outliers is rejected. x̂ is the reconstructed fault-free and outlier-free vector. In order to distinguish between outliers and faults, a univariate post-filter should be applied to each residual variable. Because of the existence of outliers, also called impulse noise in the signal processing literature, the distributions of the residual variables in the fault-free case have heavy tails. As a result, a generalized median filter is a good choice [23], [24]. The design of this post-filter is based on the assumption that an abnormality continuing for several consecutive samples in a univariate signal is a fault, while outliers appear sparsely.
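As a simple instance of this idea, even a plain moving median, a special case of the generalized median filters of [23], [24] (which combine several order statistics), already separates the two: isolated outliers shorter than half the window vanish, while a sustained fault passes through. A sketch with an illustrative window length and detection threshold:

```python
import numpy as np
from scipy.signal import medfilt

def postfilter_residual(x_tilde, kernel_size=5):
    # Moving median applied to each residual variable (column) separately;
    # spikes shorter than kernel_size // 2 + 1 samples are removed.
    return np.apply_along_axis(medfilt, 0, x_tilde, kernel_size)

def flag_faults(filtered, threshold=1e-6):
    # A variable is flagged as faulty at a sample when its filtered
    # residual is non-zero up to a small numerical threshold.
    return np.abs(filtered) > threshold
```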

Example 3: We continue with Examples 1 and 2. 200 more samples are generated in the original PCS with 5% outliers. Three faults are added (see the sketch after this list):
1) from the 161st sample, the 3rd variable cannot be updated, so its magnitude stays at the value of the 160th sample;
2) the 101st to 120th samples of the 5th variable have an offset of −4;
3) from the 31st sample, an offset of 800 is added to the 35th variable.
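For concreteness, the three faults can be injected as below (a sketch with zero-based indexing; X is a placeholder for the 200 new samples, which in the example are generated in the original PCS with 5% outliers):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 35))  # placeholder for the 200 new samples

# Fault 1: from the 161st sample, variable 3 freezes at its 160th value.
X[160:, 2] = X[159, 2]
# Fault 2: samples 101 to 120 of variable 5 get an offset of -4.
X[100:120, 4] += -4.0
# Fault 3: from the 31st sample, variable 35 gets an offset of 800.
X[30:, 34] += 800.0
```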
Randomly choosing one group of scaling vectors and the corresponding PCS and RS obtained in Example 2, we solve the optimization problem in (7) for each sample. Finally, the residual signal is obtained. Since the residual is the summation of faults and outliers, it is hard to read the faults from it; but the signal filtered by a generalized median filter provides a clear view of the faults. The time trends of the faulty signals (only the first seven and last seven variables, blue dashed curves) and the corresponding filtered residual signals (red solid curves) are shown in Fig. 6. If we were to apply the PCA-based isolation and reconstruction method proposed in [21], we would need to try 35 + 35!/(2! × 33!) = 35 + 595 = 630 different combinations, since two independent faults may occur simultaneously. If the cardinality of faults increases further, the number of combinations increases dramatically, which means a large computational burden.

Fig. 6. Time trends of the fault signals and the filtered residual signals

V. CONCLUDING REMARKS AND FUTURE WORK

A process monitoring method based on PCP has been thoroughly discussed. In order to improve modeling accuracy, an optimal scaling method is proposed as a preprocessing step. After modeling, a PCP-based fault detection and diagnosis approach is introduced. Compared with PCA-based approaches, the proposed approach determines the residual signal by solving constrained linear programming problems instead of unconstrained quadratic programming problems. Using residual signals filtered by univariate generalized median filters, fault detection, isolation and reconstruction can be fulfilled simultaneously with ease, which is the main advantage over PCA-based approaches.

However, there is still large room to improve the optimal scaling algorithm. In future work, we will adopt other heuristic or non-heuristic algorithms that may be better than the DE algorithm in terms of accuracy and speed. Coordinate search may be a good choice because of its low computational burden, but its convergence property in our problem still needs further study. Iterating between the optimal scaling and modeling steps can increase the accuracy of the coherence indices, which is degraded by using the observed matrix M instead of the outlier-free matrix L0 to estimate V⊥; but this is time consuming until computationally efficient algorithms are introduced.

Moreover, in practice, both outliers and noise exist in the disturbance, whereas in our work only outliers are considered. A robust PCP technique that also considers small entry-wise noise was recently proposed in [4]. So in the future, the performance of the robust PCP technique with both noise and outliers will also be studied.

REFERENCES

[1] E. J. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of the ACM, vol. 58, no. 3, 2011.
[2] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. Willsky, "Sparse and low-rank matrix decompositions," in 15th IFAC Symposium on System Identification, vol. 15, no. 1, Saint-Malo, France, 2009, pp. 1493–1498.
[3] ——, "Rank-sparsity incoherence for matrix decomposition," SIAM Journal on Optimization, vol. 21, no. 2, pp. 572–596, 2011.
[4] Z. Zhou, X. Li, J. Wright, E. Candes, and Y. Ma, "Stable principal component pursuit," in 2010 IEEE International Symposium on Information Theory Proceedings, Austin, Texas, USA, 2010, pp. 1518–1522.
[5] H. Xu, C. Caramanis, and S. Sanghavi, "Robust PCA via outlier pursuit," IEEE Transactions on Information Theory, in press.
[6] E. J. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[7] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[8] E. J. Candes and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717–772, 2009.
[9] E. J. Candes and J. Romberg, "Sparsity and incoherence in compressive sampling," Inverse Problems, vol. 23, no. 3, pp. 969–985, 2007.
[10] A. B. Ramirez, H. Arguello, and G. Arce, "Video anomaly recovery from compressed spectral imaging," in IEEE International Conference on Acoustics, Speech and Signal Processing, Santander, Bucaramanga, Colombia, 2011, pp. 1321–1324.
[11] J. Wright, A. Yang, A. Ganesh, Y. Ma, and S. Sastry, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
[12] K. Min, Z. Zhang, J. Wright, and Y. Ma, "Decomposing background topics from keywords by principal component pursuit," in The 19th ACM International Conference on Information and Knowledge Management, New York, NY, USA, 2010, pp. 269–277.
[13] J. D. Isom and R. E. LaBarre, "Process fault detection, isolation, and reconstruction by principal component pursuit," in 2011 American Control Conference, San Francisco, CA, USA, 2011, pp. 238–243.
[14] M. G. Borgognone, J. Bussi, and G. Hough, "Principal component analysis in sensory analysis: covariance or correlation matrix?" Food Quality and Preference, vol. 12, no. 5-7, pp. 323–326, 2001.
[15] J. Wen, X. Xiao, J. Dong, Z. Chen, and X. Dai, "Data normalization for diabetes II metabonomics analysis," in The 1st International Conference on Bioinformatics and Biomedical Engineering, Wuhan, China, 2007, pp. 682–685.
[16] M. Elad, "Optimized projections for compressed sensing," IEEE Transactions on Signal Processing, vol. 55, no. 12, pp. 5695–5702, 2007.
[17] J. M. Duarte-Carvajalino and G. Sapiro, "Learning to sense sparse signals: simultaneous sensing matrix and sparsifying dictionary optimization," IEEE Transactions on Image Processing, vol. 18, no. 7, pp. 1395–1408, 2009.
[18] R. Storn and K. Price, "Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces," Journal of Global Optimization, vol. 11, no. 4, pp. 341–359, 1997.
[19] L. H. Chiang, E. L. Russell, and R. D. Braatz, Fault Detection and Diagnosis in Industrial Systems. London, Great Britain: Springer-Verlag, 2001.
[20] S. J. Qin, "Statistical process monitoring: basics and beyond," Journal of Chemometrics, vol. 17, pp. 480–502, 2003.
[21] C. F. Alcala and S. J. Qin, "Reconstruction-based contribution for process monitoring," Automatica, vol. 45, pp. 1593–1600, 2009.
[22] D. L. Donoho, "For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution," Communications on Pure and Applied Mathematics, vol. 59, no. 6, pp. 797–829, 2006.
[23] A. C. Bovik, T. S. Huang, and D. C. Munson, "A generalization of median filtering using linear combinations of order statistics," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 31, no. 6, pp. 1342–1350, 1983.
[24] Y. H. Lee and S. A. Kassam, "Generalized median filtering and related nonlinear filtering techniques," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 3, pp. 672–683, 1985.
