Você está na página 1de 7

322

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 24, NO. 2, FEBRUARY 2013

Brief Papers
New Discrete-Time Recurrent Neural Network
Proposal for Quadratic Optimization
With General Linear Constraints
Mara Jos Prez-Ilzarbe

Abstract In this brief, the quadratic problem with general


linear constraints is reformulated using the Wolfe dual theory,
and a very simple discrete-time recurrent neural network is
proved to be able to solve it. Conditions that guarantee global
convergence of this network to the constrained minimum are
developed. The computational complexity of the method is
analyzed, and experimental work is presented that shows its high
efficiency.
Index Terms Discrete time, global convergence, hybrid
constraints, neural networks, quadratic optimization.

I. I NTRODUCTION
The problem of quadratic optimization with general linear
constraints [1] can be stated as
1
min E x = x T M x + p T x
2
s.t. h down H T x h up down < x up
and Bx = b

(1)

where M is a symmetric matrix. The use of neural networks is


a very interesting approach to solve this problem in real time.
This is because neural algorithms are specifically designed
to be easily implemented in hardware, and they are directly
amenable for parallel realization.
The first neural network published for quadratic optimization is [2], and since then many others have been proposed.
Most of them are continuous-time dynamical systems [3][9].
Among them, methods [7] and [8] seem to be the most
efficient: they can deal with (2) with a complete set of
linear constraints and have less computational complexity
than the others, as has been shown in [8]. There are much
fewer discrete proposals, probably because their stability is
generally more difficult to assure. However, the discrete-time
models are preferable to continuous-time models for computer
implementations, and are particularly advantageous in realtime applications. We can find in the literature some discretetime neural models to solve problem (2) [10][14] and for
use in related applications such as identification and control
[15], [16] or image restoration [17].
Manuscript received January 12, 2012; revised July 4, 2012; accepted
September 26, 2012. Date of publication December 11, 2012; date of current
version January 11, 2013.
The author is with the Departamento de Automtica y Computacin, Universidad Pblica de Navarra, Pamplona 31006, Spain (e-mail:
mjperez@unavarra.es).
Digital Object Identifier 10.1109/TNNLS.2012.2223484

The first discrete-time neural network for quadratic optimization was published in [10]. This is a simple model
especially appropriate for hardware implementation [11], but
it is able to solve the quadratic problem subject only to bound
constraints. In [12], a model was developed that was able to
solve general nonlinear optimization problems (in particular
quadratic) subject to equality and bound constraints. After that,
a discrete-time model based on the continuous one proposed
in [8] was published in [13]. This last can deal with a complete
set of linear constraints (bound, inequality, and equality). And
the most recent discrete-time proposal, to our knowledge,
has been published in [14]. This last can perform quadratic
optimization with only inequality constraints, including bound
constraints as a particular case, provided that matrix H T is full
row-rank. This implies that the number of such constraints
is not higher than the number of optimization variables. As
was pointed out in [4], in this situation the set of inequality
constraints can be transformed onto a set of bound ones by
making a transformation of variables, and then the discretetime networks published in [10] and [12] are also able to solve
the corresponding optimization problem. This brief focuses on
quadratic problems with a higher number of constraints than
the number of variables, in which the discrete-time networks
[10], [12], [14] are not applicable.
In [18], it is shown how, with a particular formulation
of the Wolfe dual theory, the quadratic problem subject to
general linear constraints can be transformed into another
quadratic problem only subject to lower-bound constraints,
which is obviously much easier to solve. In particular, it can
be solved by using a network similar to the one presented
in [10], and can be implemented in hardware using the fieldprogrammable gate array [11]. Here, we propose the model,
develop its equilibrium and convergence conditions, study its
computational complexity, and illustrate its efficiency with
several examples. We also make comparisons with the network
presented in [13].
II. W OLFE D UAL OF THE Q UADRATIC P ROBLEM
In [18], we can find a particularly simple Wolfe dual
formulation for the following quadratic problem:
1 T
y Gy + g T y s.t. A T y a
(2)
2
where y R n , G is a symmetric and positive definite n n
matrix, A is an n m matrix, g R n , and a R m . Observe
that E y is strictly convex and, as the constraint region is a
convex set, the constrained minimum is unique [1].
To put (2) in the form (3), it is necessary to eliminate
the equality constraints, and the procedure proposed in [8]
has been followed: matrix B is partitioned in two parts B =
(B I , B I I ) in such a way that Bx = B I x I + B I I x I I , where B I
min E y =

2162237X/$31.00 2012 IEEE

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 24, NO. 2, FEBRUARY 2013

is a nonsingular r r matrix, r being the number of equality


constraints, and x T = (x IT , x ITI ) where x IT = (x 1 ,, xr ),
y T = x ITI = (xr+1 ,, x n ). Observe that n = n  r . Then, x
and y are related by
x = Qy + q

 1 

BI b
B I1 B I I
, and (2) can be
and q =
where Q =
O
I
transformed into
1
min
E y = y T Gy + g T y
2
s.t. rdown < R T y rup , down I I < y up I I
(3)
where
G = Q T M Q; g = Q T (Mq + p)




HT Q
h down H T q
T
R =
; rdown =
B I1 B I I
down I B I1 b


h up H T q
rup =
up I B I1 b
and down I , up I , down I I , up I I contain the upper and lower
bound values corresponding to x I and x I I . Finally, it is trivial
to write (4) as (3) by putting A = (R, I, R, I), I being the
T
T , upT , r T
identity n n matrix, and a T = (rup
down , down I I ).
II
The Wolfe dual formulation of the primal problem (3) can
be expressed [18] as L(y, ) = y T Gy/2 + g T y + T (Ay a)
subject to Gy + g+ A = 0 and 0, where are the
Lagrange multipliers associated to each constraint in (3). Using
the constraints of the dual to eliminate y from the objective
function [18], we obtain
min E =

1 T
W + d T + ct
2

s.t. 0

(4)

where W = A T G 1 A, d = AT G 1 g +a, and ct = gT G 1 g/2


is constant and can be removed from (5). Observe that (5) is
again a problem quadratic in the variables , but subject only
to the lower bounds 0. Once the dual solution has been
found, the solution y of the primal problem can be obtained
[18] by solving the constraint equations y = G 1 (A + g).
It is clear that this formulation can be applied only if
matrix G is nonsingular. So, the method proposed in this
brief is applicable to problems that can be expressed in
the form (3) with G being a positive definite matrix, i.e.,
quadratic problems strictly convex in the subspace defined by
the equality constraints. This is obviously the case if M in
(2) is positive definite. In addition, G can be positive definite
when matrix M is singular and/or indefinite.
Finally, it must be pointed out that the model developed
in [13] has a broader range of applicability than (5), since
the former is valid for G positive semidefinite. However, the
case of G > 0 is interesting and has numerous applications in
many areas of science and engineering, such as signal processing, communications, and control. For these applications, the
neural method proposed here is a good choice since, as will be
seen, it has a lower complexity than the one proposed in [13],
and presents quicker convergence in all examples tested. An
example of application of predictive control in which always
M > 0 is presented in Section V.

323

III. N EURAL N ETWORK M ODEL FOR W OLFE D UAL AND


C ONVERGENCE C ONDITIONS
The neural network proposed in [10] applied to solve the
dual problem (5) can be written as
(k + 1) = f (o(k + 1)) = f ((k) + (k))
with (k) = C(W (k) + d)

(5)

where the matrix C is diagonal, ci j = ci i j , and function f


implements the lower bounds in (5), i.e., f (oi ) = oi if oi 0,
f (oi ) = 0 if oi < 0.
In [10], this kind of neural model was studied, its ability
for solving bound constrained quadratic problems with positive
definite matrix was proved, and neuron updating rules guaranteeing net convergence were developed. In [19], alternative
updating rules were proposed for the same network. So, if W
is positive definite, the results obtained in [10] and [19] can be
directly applied. Observe, however, that although G is positive
definite, W can be only positive semidefinite. Actually, Wu
= 0 when Au = 0 and, in general, Au = 0 can occur for
nonnull u vectors. In the following, the analysis made in [10]
is extended to cover this case.
A. Network Equilibrium State
A quadratic problem having W positive semidefinite with
only lower bounds constraints can, in general, not have solution, since function (5) can be unbounded in the constraints
region. However, this cannot be the case when (5) is the Wolfe
dual of a solvable primal problem. Function E in (5) has
a minimum finite value which, in fact, coincides [18] with
the minimum value of the primal function E y in (3): i.e.,
E y (y ) = E( ). Then the network equilibrium analysis made
in [10] applies, and the necessary and sufficient condition
for the equilibrium point of (6) to coincide with the unique
solution of (5) is ci > 0 for all i .
B. Global Convergence Neuron Updating Rules
The proof of global convergence involves demonstrating that
E defined in (5) is a Lyapunov function for the dynamical
system (6). As proved above, E has a lower bound. Then,
for E to be a Lyapunov function, we must prove that it
decreases along any trajectory governed by (6). The E change
per iteration E(k + 1) = E((k+1)) E((k)) can be
expressed as
1
E(k + 1) = E T (k) (k) + T (k)W  (k)
2
= [I ] + [I I ]

(6)

where E(k) = W (k) + d,  (k) = (k+ 1) (k), and


the fact that W is symmetric has been used. The nature of
constraints in (5) assures that sign(i ) = sign(i ) and, with
ci > 0, sign(
 i ) = sign( E i ), so the first term in (7) is
[I ] = i |( E)i ||i | 0. Then, to obtain a negative
E value it is sufficient to have |[II]|
2 < |[I ]|. In addition,
we know that 0 [I I ] W    /2. Finally, note that
| | |i | = ci |( E)i |, so taking ci = c for all i , we have
 i 2
  =  | |2 c  |( E)i || |. Thus, a sufficient
i
i
i
i

324

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 24, NO. 2, FEBRUARY 2013

condition to assure |[II]| < |[I ]|, i.e., to assure global


convergence of (6), is
ci = c

with

c < 2/ W 

(7)

and from (9), using any matrix norm definition, different rules
can be obtained. In particular, using the spectral norm || ||2 ,
we have the following rule.
Rule 1:
ci = c

i with 0 < c < 2/ W 2 = 2/max

where max is the maximum eigenvalue of W . It was shown


in [19] that Rule 1 assures exponential convergence of the
neural network proposed in [10], which is formally identical
to (6), for the case of positive definite matrix. Here, it has been
demonstrated that it guarantees global convergence when W
is positive semidefinite.
On the other hand, in [4] and [10] the usefulness of applying
a preconditioning technique was shown, developed in [20] for
linear systems, to improve the convergence speed of nonlinear
dynamical systems of the form of (6). It consists of working
in a transformed space  = P 1 , P being a diagonal

preconditioner with pii = 1/ wii . With it, problem (5) is


equivalent to minimizing E  = T W   /2 + d T  subject to
 0, where W  = PWP and d  = Pd. It is interesting
to understand what we are doing on the primal space when
preconditioning is made on the dual space. Using the primal
dual relationships, we see that conditioning is equivalent to
reformulating the restrictions in (3) as AT y a  , where
AT = PAT and a  = Pa. Note that, as long as P is a
diagonal matrix with positive elements, this reformulation does
not change the optimization problem. On the other hand, it is
easy to prove the following:
 (k + 1) = f  (  (k)) C  (W   (k) + d  )


(k + 1) = f ((k)) P C (W (k) + d)
2

(8)

Rule 3:



 
|wi j | .
ci = c /wii with 0 < c < 2/ W   = 2/ max
j

Rule 4:

 
 
ci = c /wii with 0 < c < 2/W  F = 2/
wi2j .
i

The use of the matrix trace as the maximum eigenvalue


upper bound was proposed in [19]. Taking into account that
wii = 1 for all i , we have Trace(W  ) = n, leading to the rule
0 < c < 2/Trace(W  ) = 2/n. Using the fact that W  is positive
semidefinite, it is easy to prove that |wi j | 1 for all i = j . So
Rule 3 is always less restrictive than the one obtained using
the matrix trace, and this last one is discarded. On the other
hand, this is identical to the global convergence updating rule
obtained in [10]: 0 < c < 2/(n|w |max ) = 2/n.
IV. R EDUCED D IMENSION F ORMULATION
The least complex models previously proposed to solve
quadratic problem (4) ([7][9] and [13]) work in a space of
dimension n + m, whereas in model (6), in the case of a
problem with a complete set of constraints, vector is of
dimension 2(n + m). This seems to be a disadvantage of the
network presented here. It will be shown in this section that
(6) can be easily reformulated to work in a space of reduced
dimension n + m.
Taking into account that W = A T G 1 A, with A =
(A+ , A+ ) and A+ = (R, I ), it is natural to split vector in
two pieces: +T = ( 1 , . . . , n+m ) and T (n+m+1 ,
. . . , 2n+2m ). Then we have (W )T = ((W + ( + ))T ,
(W + ( + ))T ), where W + = A+T G 1 A+ . With this,
(6) can be rewritten as
+ (k + 1) = f (o+ (k + 1)) = f ( + (k) (k) C + d + )
(k + 1) = f (o (k + 1)) = f ( (k) + (k) C + d )
(10)
with (k) = C + W + ( + (k) (k))

i.e., applying (6) in the  space with matrix C  is equivalent to


applying (6) directly in the space with matrix C = P 2 C  . We
remark that P 2 is the inverse of the diagonal of the Hessian.
So (10), and the updating rules that will be developed below,
makes use of second-order information. From (10), imposing
convergence condition (9) we have
 
(9)
ci = c /wii with c < 2/ W  .

where d +T = (d1 , . . . , dn+m ), d T = (dn+m+1 , . . . , d2n+2m ),


and matrix C + has ci+ = ci for i = 1:n + m. Note that,
with any of the rules 1 to 4, we have ci+n+m = ci for i =
1:n + m. Direct implementation of (6) and the reduced dimension implementation (12) are illustrated in Fig. 1(a) and (b),
respectively, where z 1 represents a unit delay.

From (11), using the spectral matrix norm we obtain the


following rule.
Rule 2:
 
ci = c /wii with 0 < c < 2/ W   = 2/

In this section, the model proposed here will be compared


with the one developed in [13] in terms of simplicity, computational complexity, and parallelization amenability. The method
in [13] to solve problem (4) is

max

W

where max is the maximum eigenvalue of


= PWP.
Note that all matrix norm definitions fulfill || ||2 || ||, so
Rules 1 and 2 are the least restrictive that can be obtained from
conditions (9) and (11), respectively. Other norm definitions,
although give more restrictive rules, can have the advantage
of lower calculation complexity. From (11), using the infinity
norm || || and the Frobenious norm || || F , we obtain the
following rules.

V. C OMPUTATIONAL C OMPLEXITY

u(k + 1) = s( N z(k) Gz(k)


g)
N z(k)
T
u(k + 1)
z(k + 1) = z(k) + h( N + G)

(11)

where z T = (y, )T , are the m Lagrange multipliers associated with the inequality constraints in (4), s implements the
bounds [down I I , up I I ] [rdown , rup ], h is a positive constant,
and




 
I O
G R
g

G=
; N=
; g =
.
O I
0
RT O

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 24, NO. 2, FEBRUARY 2013

325

(a)

Fig. 2.

Trajectories of dual variables i for Example 1.

Fig. 3.

Trajectories of dual variables i for Example 2.

(b)
Fig. 1.
(a) Full dimension and (b) reduced dimension network
implementation.

First, (12) has less connecting weights and a simpler structure than model (13), thus being more appropriate for hardware
implementation. In particular, the proposed network has one
layer, whereas (13) is a two-layer network. So, (12) has better
characteristics for parallelization. This model entails, at each
iteration, only one matrix vector multiplication in dimension
n + m: + = C + W + ( + + ). This can be computed
by n + m processing units working in parallel and making
each one n + m multiplications. In comparison, model (13)
entails, at each iteration, two serial steps: a first one for
calculating u(k +1) from z(k) and a second one for calculating
z(k + 1) from u(k + 1). Each step can be computed with units
working in parallel, each one making n + m multiplications.
So, in parallel implementations, one iteration of (13) needs
roughly twice the computing time needed by one iteration
of (12).
The arguments of the above paragraph prove that model (12)
is computationally advantageous compared with (13) even in
the case of a problem with a complete set of linear constraints.
The advantage of (12) increases as bound constraints are
relaxed. Then, the dimension of vectors + and is n b + m,
where n b < n is the number of bound constrained variables,
and to implement (12), each processing unit must compute
n b + m multiplications. On the contrary, each processing unit
in (13) must compute n + m multiplications, independently of
the value of n b .
Finally, experimental work presented in next section shows
that the proposed network has higher convergence speed
than (13).

VI. E XPERIMENTAL R ESULTS


In the following, several examples are tested to illustrate
the performance of the neural model proposed. The figures
correspond to results obtained with initial point (0) = 0 and
using Rule 2 with the maximum allowed value of c . At the end
of this section, comparisons among different neuron updating
rules are presented.

A. Example 1
The first example in [5] is (2) with







21
30
5/12 5/2 1 0
M=
; p=
; H=
12
30
1
1 0 1
T
= (35/12, 35/2, 5, 5).
downT = (0, 0); h down

Vectors and matrixes in (3) for this example are G = M,


T
, downT ), and A = (H ,I ), G
g = p, a T = (h down
being positive definite since M is. And, from here, W and d
in (5) are obtained, and neural model (6) is used. In Fig. 2,
the trajectories of the dual variables i are shown. As can be
seen, in less than 20 iterations the neural network converges
to the dual solution T = (0, 6, 0, 9, 0, 0), corresponding to
the primal one x T = y T = (5, 5).

B. Example 2
The third example in [5] is (2) with

2
0
M =
1
0

0
1
0
0

1
0
2
1

0
0
;
1
1

1
1 3
3
2 1

p=
1 ; H = 1 2
1
1 1

downT = (0,0,0,0);

0
1

4
0

T
h up
= (5, 4, 1.5).

For this example, we have G = M, g = p, a T =


T ,downT ), and A = (H ,I ). In Fig. 3, we can see
(h up
the trajectories of the variables. They converge in about
60 iterations to the dual solution T (0.4545, 0, 0, 0, 0,
1.7272, 0), corresponding to the primal one x T = (0.2727,
2.0909, 0.0000, 0.5455).

326

Fig. 4.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 24, NO. 2, FEBRUARY 2013

Trajectories of dual variables i for Example 3.


Fig. 5. (a) Samples of optimal unconstrained (*) and constrained (o) control
signal. (b) Samples of predicted (o) and desired (*) system output.

C. Example 3
The second example in [8] is (2) with

1
4 3 4 2 4
0
3 2 8 8 5

M =
4 8 10 6 2 ; B = 3
1
2 8 6 0 1
2
4 5 2 1 4

0
1

2
2

necessary. In this brief, the impulse response is used

y((k + nr )T ) =

k+nr


ti u((k + nr i )T )

(12)

i=nr

p T = (3, 0, 2, 6, 0); b T = (6, 0); downT = (0, 0, 0, 0, 0); where ti are the samples of the system impulse response. With
all that, the cost function J can be written in the form (2) with
upT = (10, 10, 10, 10, 10).
x i = u((k + i 1)T ) for i = 1 to N, M = T T T where T is
Using (3) to eliminate the equality constraints, a problem of an N N matrix whose r th row is Tr = (tnr+r1 , tnr+r2 ,,
t nr ,0, 0,, 0), and p = T T yd , where yd is a vector whose
the form of (4) is obtained with

i th component is ydi = yd ((k + nr + i 1)T ). In J , there


82 29 3
49
also appears a term that depends on the control signal samples
G = 29 28 33 ; g = 15
at instants before kT, so it does not depend on the optimization
3 33 80
114
variables and can be removed. The restrictions most usually

 
 

3 1 2
6
4
imposed are bound constraints to limit the values of the control
T
R =
; rdown =
; rup =
.
3 2 2
0
10
signal u(kT ), and inequality constraints to limit the values of
the control signal variations u(kT ) = u(kT ) u((k 1)T ).
An important point is, that although M is indefinite, G is
In this context, the following optimization problem has been
positive definite and the network proposed here can be used. In
generated.
Fig. 4, the obtained i trajectories are shown. They converge
1) Minimize J for k = 0 and prediction horizon N = 8, i.e.,
in about 30 iterations to the dual solution T = (0, 0, 0, 0,
T
obtain the optimal values of u(0),,u(7T ). Sampling
0, 0, 0, 0, 0, 50.148), corresponding to the primal one y =
period T = 0.25.
(1.242, 1.822, 0.000), and to the original problem solution
T
2)
Linear
prediction with the discrete model obtained by
x = (4.096, 0.082, 1.242, 1.822, 0.000).
discretizing with T = 0.25, the continuous transfer
function F(s) = 0.3/((0.4s + 1)(0.7s + 1)). The impulse
D. Example 4
response samples are: {tnr , tnr+1 , ..} = {0.0243, 0.0476,
0.0497, 0.0435, 0.0351, 0.0271, 0.0203, 0.0149, 0.0108,
As fourth example, an application to model predictive
0.0078, ,}.
control (MPC) has been chosen. The basic idea of MPC is to
3)
Null
initial conditions and desired output given by the
calculate the values of the control signal that minimize a cost
response
to a unit step of a first-order linear system with
function which penalizes noncompliance with control objecgain
K
=
1 and time constant = 0.5.
tives. For example, for a single-input/single-output system, and
4)
Restrictions
10 u(kT ) 10, 6 u(kT ) 6 for
if the goal is simply to match the output and a desired output,
k
=
1:7,
and
6 u(0) 6.
the cost function J to be minimized is
In Fig. 5(a), we can see the samples of both the optimal
J (u(kT ), . . . , u((k + N 1)T ))
control signal without restrictions (which gives J = 0) and the
nr+N1
1 
optimal one with restrictions. This last sequence is the solution
=
(y ((k + l)T ) yd ((k + l)T ))2
of the posed problem: x = (u (0),u (7T )) = (6, 9.198,
2
l=nr
3.198, 0.006, 6.006, 1.62, 4.563, 2.461) and gives J = 0.0388.
where u(kT ), yd (kT ), and y(kT ) denote the values of the In Fig. 5(b), we can see the corresponding samples of the
control signal, the desired output, and the predicted system predicted system output and the desired system output. The
output at instant kT, respectively, T being the sampling period. proposed network takes less than 600 iterations to reach the
The system delay is denoted by nr, and the prediction horizon optimal solution. In Fig. 6, the evolution of ||x x|| is
is N. To predict the system output, a model of the system is shown.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 24, NO. 2, FEBRUARY 2013

Fig. 6.

Evolution of ||x x || for Example 4.


TABLE I
AVERAGE N UMBER OF I TERATIONS TO C ONVERGE
Model/Rule
(6)/Rule 1
(6)/Rule 2
(6)/Rule 4
(13)

P1
22
13
14
293

P2
477
140
152
194

P3
81
35
38
43199

P4
4995
1002
1082
804387

327

published. Several neuron updating rules are developed that


guarantee network global convergence to the constrained minimum for this general case. The most efficient updating rules
are those that use Hessian information in addition to gradient.
The model presented has a very simple structure, low
computational complexity, and good characteristics for parallelization. In addition, compared with a recently published
discrete-time model [13], it exhibits higher convergence speed
in all problems tested. In particular, for the example corresponding to an MPC application, the superiority of the
proposed method is clear. The model in [13], however, is
more powerful than the one presented here in the sense that
it has a broader range of applications, since it can be used
to solve problems only convex (not strictly convex). In this
respect, future work should include the extension of the model
presented here in order to cover the nonstrictly-convex case
and to develop better updating rules for the model proposed
in [13] in order to improve its convergence rate.
R EFERENCES

E. Comparison
In Table I, we can see the average number of iterations
needed by networks (6) and (13) to converge to the optimal
solution for all examples tested (named P1 to P4). The
convergence criterion is ||x* x|| <= 0.001||x*||. Averages
correspond to 100 initial points randomly distributed inside
[0,10]n . In model (6), the highest values allowed by Rules 1, 2,
and 4 are chosen for the ci constants. Rule 3 is not used since,
for the four problems tested, this is a little more restrictive than
Rule 4. Finally, the results obtained using network (13) have
been obtained with h = 1/max , where max is the maximum
N + G)
T as proposed in [13].
eigenvalue of ( N + G)(
The best performance is obtained with model (6) and Rule 2
in all examples. Comparing results obtained by (6) with
Rules 1 and 2, we can evaluate the effect of the preconditioning
technique applied, and we can see that it accelerates network
convergence in all cases. Comparing Rules 2 and 4, we can
evaluate the effect of using different matrix norms in the ci
calculation. And we can see that, in all cases, the use of  max
leads to quicker convergence than the use of ||W  || F . Speed
differences are not very high because, for the four problems
tested, ||W  || F is not very far than  max . Finally, by comparing
the second and fourth rows we can see that convergence speed
of (13) is lower than that of the model proposed here for all
tested problems, especially for examples 3 and 4.
VII. C ONCLUSION
A particular form of the Wolfe dual, combined with the
appropriate method for eliminating the equality constraints,
was used to reformulate the problem of quadratic optimization
with general linear constraint as a much simpler one of
quadratic optimization with lower bound constraints. A simple
discrete-time neural network model has been proved to be
able solve this problem. The quadratic problems to which
the proposed method is applicable are those that are strictly
convex in the subspace defined by the equality constraints. It
can work with any type and any number of linear constraints,
which is not the case in most discrete-time models previously

[1] M. S. Bazaraa and C. Shetty, Nonlinear Programming: Theory and


Algorithms. New York: Wiley, 1979.
[2] J. J. Hopfield and D. W. Tank, Neural computation of decision in
optimization problems, Biol. Cybern., vol. 52, pp. 141152, Mar. 1985.
[3] S. Sudharsanan and M. Sundareshan, Exponential stability and systematic synthesis of a neural network for quadratic minimization, Neural
Netw., vol. 4, no. 5, pp. 599613, 1991.
[4] A. Bouzerdoum and T. R. Pattison, Neural network for quadratic
optimization with bound constraints, IEEE Trans. Neural Netw., vol. 4,
no. 2, pp. 293304, Mar. 1993.
[5] Y. Leung, K.-Z. Chen, Y.-C. Jiao, X.-B. Gao, and K. S. Leung, A
new gradient-based neural network for solving linear and quadratic
programming problems, IEEE Trans. Neural Netw., vol. 12, no. 5, pp.
10741083, Sep. 2001.
[6] Y. Xia, An extended projection neural network for constrained optimization, Neural Comput., vol. 16, no. 4, pp. 863883, Apr. 2004.
[7] Q. Liu and J. Wang, A simplified dual neural network for quadratic
programming with its KWTA application, IEEE Trans. Neural Netw.,
vol. 17, no. 6, pp. 15001510, Nov. 2006.
[8] X. Hu and J. Wang, Design of general projection neural networks for
solving monotone linear variational inequalities and linear and quadratic
optimization problems, IEEE Trans. Syst. Man Cybern. B, vol. 37, no. 5,
pp. 14141421, Oct. 2007.
[9] M. Yashtini and A. Malek, Solving complementarity and variational
inequalities problems using neural networks, Appl. Math. Comput.,
vol. 190, no. 1, pp. 216230, Jul. 2007.
[10] M. J. Prez-Ilzarbe, Convergence analysis of a discrete-time recurrent
neural network to perform quadratic real optimization with bound
constraints, IEEE Trans. Neural Netw., vol. 9, no. 6, pp. 13441351,
Nov. 1998.
[11] D. Anguita, A. Boni, and S. Ridella, A digital architecture for support
vector machines: Theory, algorithm, and FPGA implementation, IEEE
Trans. Neural Netw., vol. 114, no. 5, pp. 9931009, Sep. 2003.
[12] M. Yashtini and A. Malek, A discrete-time neural network for solving nonlinear convex problems with hybrid constraints, Appl. Math.
Comput., vol. 195, no. 2, pp. 576584, Feb. 2008.
[13] H. Tang, H. Li, and Z. Yi, A discrete-time neural network for optimization problems with hybrid constraints, IEEE Trans. Neural Netw.,
vol. 21, no. 7, pp. 11841189, Jul. 2010.
[14] Q. Liu and J. Cao, Global exponential stability of discrete-time recurrent neural network for solving quadratic programming problems subject
to linear constraints, Neurocomputing, vol. 74, no. 17, pp. 34943501,
Oct. 2011.
[15] F. Ornelas-Tellez, E. N. Sanchez, and A. G. Loukianov, Discrete-time
neural inverse optimal control for nonlinear systems via passivation,
IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 8, pp. 13271339,
Aug. 2012.
[16] C. E. Castaeda and P. Esquvel, Decentralized neural identifier and
control for nonlinear systems based on extended Kalman filter, Neural
Netw., vol. 31, pp. 8187, Jul. 2012.

328

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 24, NO. 2, FEBRUARY 2013

[17] Y. Xia, C. Sun, and W. X. Zheng, Discrete-time neural network for fast
solving large linear estimation problems and its application to image
restoration, IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 5, pp.
812820, May 2012.
[18] R. Fletcher, Practical Methods of Optimization, 2nd ed. New York:
Wiley, 1987.
[19] K. C. Tan, H. J. Tang, and Z. Yi, Global exponential stability of
discrete-time neural networks for constrained quadratic optimization,
Neurocomputing, vol. 56, pp. 399406, Jan. 2004.
[20] A. Greenbaum and G. H. Rodrigue, Optimal preconditioners of a given
sparsity pattern, BIT Numer. Math., vol. 29, no. 4, pp. 610634, Dec.
1989.

Incorporating Mean Template Into Finite Mixture


Model for Image Segmentation
Hui Zhang, Member, IEEE, Q. M. Jonathan Wu,
Senior Member, IEEE, and Thanh Minh Nguyen

Abstract The well-known finite mixture model (FMM) has


been regarded as a useful tool for image segmentation application.
However, the pixels in FMM are considered independent of each
other and the spatial relationship between neighboring pixels
is not taken into account. These limitations make the FMM
more sensitive to noise. In this brief, we propose a simple and
effective method to make the traditional FMM more robust to
noise with the help of a mean template. FMM can be considered
a linear combination of prior and conditional probability from
the expression of its mathematical formula. We calculate these
probabilities with two mean templates: a weighted arithmetic
mean template and a weighted geometric mean template. Thus,
in our model, the prior probability (or conditional probability)
of an image pixel is influenced by the probabilities of pixels
in its immediate neighborhood to incorporate the local spatial
and intensity information for eliminating the noise. Finally, our
algorithm is general enough and can be extended to any other
FMM-based models to achieve super performance. Experimental
results demonstrate the improved robustness and effectiveness of
our approach.
Index Terms Expectation maximization (EM) algorithm,
finite mixture model, image segmentation, mean template, spatial
constraints.

I. I NTRODUCTION
Image segmentation is one of the most important and
difficult problems in many applications, such as robot vision,
object recognition, and medical image processing. Although
Manuscript received May 24, 2012; revised November 2, 2012; accepted
November 4, 2012. Date of publication December 20, 2012; date of current
version January 11, 2013. This work was supported in part by the Canada
Chair Research Program and the Natural Sciences and Engineering Research
Council of Canada, and the National Natural Science Foundation of China
under Grant 61105007.
H. Zhang is with the Department of Electrical and Computer Engineering,
University of Windsor, Windsor, ON N9B 3P4, Canada, and also with the
School of Computer & Software, Nanjing University of Information Science
& Technology, Nanjing 210044, China (e-mail: nrzh@uwindsor.ca).
Q. M. J. Wu and T. M. Nguyen are with the Department of Electrical
and Computer Engineering, University of Windsor, Windsor, ON N9B 3P4,
Canada (e-mail: jwu@uwindsor.ca; nguyen1j@uwindsor.ca).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2012.2228227

different methodologies [1][5] have been proposed for image


segmentation, it remains a challenge due to overlapping
intensities, low contrast of images, and noise perturbation.
One of the most widely used clustering models for image
segmentation is the well-known finite mixture model (FMM)
[1]. The latest research of FMM for classification and image
representation is discussed in [6][8]. A good survey on the
history of the mixture expert can be found in [9]. The Gaussian
mixture model (GMM) is most commonly selected as a particular case of the FMM by assuming the conditional probability
as a Gaussian distribution [10][12]. The GMM is a flexible
and powerful statistical modeling tool for multivariate data.
The main advantage of the standard GMM is that it is easy
to implement and the small number of parameters can be
efficiently estimated by adopting the expectation maximization
(EM) algorithm. However, as a histogram-based model, the
GMM assumes that each pixel in an image is independent of its
neighbors and does not take into account spatial dependencies.
Moreover, it does not use the prior knowledge that adjacent
pixels most likely belong to the same cluster. Thus, the
performance of the GMM is sensitive to noise and image
contrast levels.
To overcome this limitation, a wide variety of approaches
have been proposed to incorporate spatial information into the
image. A common approach is the use of a Markov random
field (MRF) [13]. Such a method aims to impose spatial
smoothness constraints on the image pixel labels. However,
most reported methods are limited to using an MRF as a
general prior in an FMM-based approach. Recently, a special
case of the MRF modelthe hidden MRF (HMRF) model
was proposed [14], [15]. The state sequence of HMRF cannot
be observed directly, but can be observed indirectly through
a field of observations. In the HMRF model, the spatial
information in an image is encoded through the contextual
constraints of neighboring pixels, which are characterized
by conditional MRF distributions. Parameter estimation in
HMRF models usually relies on maximum likelihood (ML)
or Bayesian methods. Besag [16] introduced the idea of
the pseudo likelihood approximation when ML estimation is
intractable. Based on this well-known approximation, various
HMRF model estimation approaches have been proposed [17],
[18]. To deal with different HMRF models, the EM algorithm
is most commonly used for parameter learning.
The main drawback of HMRF models is that they are computationally expensive to implement, and require the additional
parameter to control the degree of image smoothness. The
chosen parameter has to be both large enough to tolerate
the noise, and small enough to preserve image sharpness
and details. Thus, the parameter is noise dependent to some
degree and selected generally based on experience. In this
brief, we propose a simple, fast, and effective algorithm to
make the traditional FMM more robust to noise, by incorporating local spatial information and pixel intensity value
information with the help of a mean template. The conditional probability (or prior probability) of an image pixel
is influenced by the probabilities of pixels in its immediate
neighborhood. Moreover, our model is general enough and

2162237X/$31.00 2012 IEEE

Você também pode gostar