IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 24, NO. 2, FEBRUARY 2013
Brief Papers
New Discrete-Time Recurrent Neural Network
Proposal for Quadratic Optimization
With General Linear Constraints
María José Pérez-Ilzarbe
I. INTRODUCTION
The problem of quadratic optimization with general linear constraints [1] can be stated as

$$\min E(x) = \frac{1}{2} x^T M x + p^T x \quad \text{s.t.} \quad h_{\mathrm{down}} \le H^T x \le h_{\mathrm{up}}, \quad \mathrm{down} \le x \le \mathrm{up}, \quad Bx = b. \tag{1}$$
The first discrete-time neural network for quadratic optimization was published in [10]. This simple model is especially appropriate for hardware implementation [11], but it can solve the quadratic problem subject only to bound constraints. In [12], a model was developed that solves general nonlinear (in particular quadratic) optimization problems subject to equality and bound constraints. After that, a discrete-time model based on the continuous one proposed in [8] was published in [13]; it can deal with a complete set of linear constraints (bound, inequality, and equality). The most recent discrete-time proposal, to our knowledge, was published in [14]. It performs quadratic optimization with only inequality constraints, including bound constraints as a particular case, provided that the matrix $H^T$ is full row-rank, which implies that the number of such constraints is not higher than the number of optimization variables. As was pointed out in [4], in this situation the set of inequality constraints can be transformed into a set of bound constraints by a change of variables, and then the discrete-time networks published in [10] and [12] can also solve the corresponding optimization problem. This brief focuses on quadratic problems with more constraints than variables, to which the discrete-time networks of [10], [12], and [14] are not applicable.
In [18], it is shown how, with a particular formulation of the Wolfe dual theory, the quadratic problem subject to general linear constraints can be transformed into another quadratic problem subject only to lower-bound constraints, which is much easier to solve. In particular, it can be solved with a network similar to the one presented in [10], which can be implemented in hardware using a field-programmable gate array [11]. Here, we propose the model, develop its equilibrium and convergence conditions, study its computational complexity, and illustrate its efficiency with several examples. We also make comparisons with the network presented in [13].
II. WOLFE DUAL OF THE QUADRATIC PROBLEM
In [18], we can find a particularly simple Wolfe dual formulation for the following quadratic problem:

$$\min E(y) = \frac{1}{2} y^T G y + g^T y \quad \text{s.t.} \quad A^T y \ge a \tag{2}$$

where $y \in \mathbb{R}^n$, $G$ is a symmetric and positive definite $n \times n$ matrix, $A$ is an $n \times m$ matrix, $g \in \mathbb{R}^n$, and $a \in \mathbb{R}^m$. Observe that $E(y)$ is strictly convex and, as the constraint region is a convex set, the constrained minimum is unique [1].
To put (1) in the form (2), it is necessary to eliminate the equality constraints, and the procedure proposed in [8] has been followed: matrix $B$ is partitioned in two parts, $B = (B_I, B_{II})$, in such a way that $Bx = B_I x_I + B_{II} x_{II}$, where $B_I$ is invertible, so that $x_I = B_I^{-1}(b - B_{II} x_{II})$ can be substituted into the cost function.
$$\min E(\theta) = \frac{1}{2} \theta^T W \theta + d^T \theta + c_t \quad \text{s.t.} \quad \theta \ge 0 \tag{4}$$
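As a sketch of how a problem of the form (4) arises from (2), assume the standard Wolfe-dual construction (the identifications below are our reconstruction, since the intermediate equations are not shown here):

$$L(y, \theta) = \frac{1}{2} y^T G y + g^T y - \theta^T (A^T y - a), \qquad \theta \ge 0$$

$$\nabla_y L = G y + g - A\theta = 0 \;\Rightarrow\; y = G^{-1}(A\theta - g).$$

Substituting this $y$ into $L$ and minimizing $-L$ over $\theta \ge 0$ gives a problem of the form (4) with

$$W = A^T G^{-1} A, \qquad d = -\left(a + A^T G^{-1} g\right), \qquad c_t = \frac{1}{2} g^T G^{-1} g$$

and the primal minimizer is recovered as $y^* = G^{-1}(A\theta^* - g)$. Note that $W$ is symmetric positive semidefinite, so (4) is a convex problem with simple nonnegativity constraints.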
with

$$c < 2/\|W\| \tag{7}$$

the iteration

$$\theta(k+1) = f(\theta(k)) = P\big(\theta(k) - C(W\theta(k) + d)\big) \tag{8}$$

converges, where $P$ denotes the projection onto $\{\theta : \theta \ge 0\}$ and $C = \mathrm{diag}(c_1, \ldots, c_m)$; and from (9), using any matrix norm definition, different rules can be obtained. In particular, using the spectral norm $\|\cdot\|_2$, we have the following rule.

Rule 1: $c_i = c$ with $0 < c < 2/\|W\|_2$.

Rule 3: $c_i = c'/w_{ii}$ with $0 < c' < 2/\|W\|_\infty = 2/\max_i \sum_j |w_{ij}|$.

Rule 4: $c_i = c'/w_{ii}$ with $0 < c' < 2/\|W\|_F = 2/\sqrt{\sum_i \sum_j w_{ij}^2}$.
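For concreteness, here is a minimal NumPy sketch of a projected iteration of this kind, under the reconstructed update (8) with Rule 1; the function name and stopping test are ours, not the paper's:

    import numpy as np

    def dual_network_iterate(W, d, max_iter=100000, tol=1e-10):
        # Solves min 0.5*theta'W theta + d'theta  s.t. theta >= 0
        # via theta <- P(theta - c*(W theta + d)), with P a clip at zero.
        c = 1.0 / np.linalg.norm(W, 2)        # Rule 1: 0 < c < 2/||W||_2
        theta = np.zeros(len(d))
        for _ in range(max_iter):
            theta_next = np.maximum(0.0, theta - c * (W @ theta + d))
            if np.linalg.norm(theta_next - theta) <= tol:
                break
            theta = theta_next
        return theta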
V. COMPUTATIONAL COMPLEXITY
where $z^T = (y, \lambda)^T$, $\lambda$ are the $m$ Lagrange multipliers associated with the inequality constraints in (4), $s$ implements the bounds $[\mathrm{down}_{II}, \mathrm{up}_{II}] \times [r_{\mathrm{down}}, r_{\mathrm{up}}]$, $h$ is a positive constant, and

$$\bar{G} = \begin{pmatrix} G & R \\ R^T & O \end{pmatrix}; \qquad N = \begin{pmatrix} I & O \\ O & I \end{pmatrix}; \qquad \bar{g} = \begin{pmatrix} g \\ 0 \end{pmatrix}.$$
Fig. 1. (a) Full-dimension and (b) reduced-dimension network implementation.
First, (12) has fewer connecting weights and a simpler structure than model (13), and is thus more appropriate for hardware implementation. In particular, the proposed network has one layer, whereas (13) is a two-layer network, so (12) has better characteristics for parallelization. The model entails, at each iteration, only one matrix-vector multiplication in dimension $n + m$: $\theta(k+1) = P\big(\theta(k) - C(W\theta(k) + d)\big)$. This can be computed by $n + m$ processing units working in parallel, each one making $n + m$ multiplications. In comparison, model (13) entails, at each iteration, two serial steps: a first one for calculating $u(k+1)$ from $z(k)$ and a second one for calculating $z(k+1)$ from $u(k+1)$. Each step can be computed with units working in parallel, each one making $n + m$ multiplications. So, in parallel implementations, one iteration of (13) needs roughly twice the computing time needed by one iteration of (12).
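As a rough sketch of the difference in per-iteration structure (g1 and g2 below are hypothetical placeholders for the two layer maps of (13), which are only described structurally here):

    import numpy as np

    # Proposed model (12): one parallel matrix-vector product per iteration.
    def step_proposed(theta, W, d, C):
        return np.maximum(0.0, theta - C @ (W @ theta + d))

    # Two-layer model like (13): two *serial* stages per iteration,
    # u(k+1) = g1(z(k)) and z(k+1) = g2(u(k+1)); in parallel hardware the
    # stages cannot overlap, giving roughly twice the latency per iteration.
    def step_two_layer(z, g1, g2):
        u_next = g1(z)
        return g2(u_next)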
The arguments of the above paragraph show that model (12) is computationally advantageous compared with (13) even for a problem with a complete set of linear constraints. The advantage of (12) increases as bound constraints are relaxed: the dimension of the vectors $\theta(k+1)$ and $\theta(k)$ is then $n_b + m$, where $n_b < n$ is the number of bound-constrained variables, and to implement (12) each processing unit must compute $n_b + m$ multiplications. In contrast, each processing unit in (13) must compute $n + m$ multiplications, independently of the value of $n_b$.
Finally, the experimental work presented in the next section shows that the proposed network has a higher convergence speed than (13).
A. Example 1
The first example in [5] is (1) with

$$M = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}; \quad p = \begin{pmatrix} -30 \\ -30 \end{pmatrix}; \quad H = \begin{pmatrix} -5/12 & 5/2 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}$$

$$\mathrm{down}^T = (0, 0); \qquad h_{\mathrm{up}}^T = (35/12, 35/2, 5, 5).$$
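Since the extraction loses minus signs, the signs in $p$ and in the first column of $H$ above are our reconstruction of this classic test problem; a quick cross-check with a generic solver (SciPy's trust-constr, not the paper's network) looks like:

    import numpy as np
    from scipy.optimize import LinearConstraint, minimize

    M = np.array([[2.0, 1.0], [1.0, 2.0]])
    p = np.array([-30.0, -30.0])
    H = np.array([[-5.0/12, 5.0/2, 1.0, 0.0],
                  [ 1.0,    1.0,  0.0, 1.0]])
    h_up = np.array([35.0/12, 35.0/2, 5.0, 5.0])

    # min 0.5 x'Mx + p'x  s.t.  H'x <= h_up,  x >= 0
    res = minimize(lambda x: 0.5 * x @ M @ x + p @ x,
                   x0=np.zeros(2),
                   method="trust-constr",
                   bounds=[(0.0, None)] * 2,
                   constraints=LinearConstraint(H.T, -np.inf, h_up))
    print(res.x)   # reference minimizer of the reconstructed problem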
B. Example 2
The third example in [5] is (1) with

$$M = \begin{pmatrix} 2 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 2 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}; \quad p = \begin{pmatrix} -1 \\ -3 \\ 1 \\ -1 \end{pmatrix}; \quad H = \begin{pmatrix} 1 & 3 & 0 \\ 2 & 1 & 0 \\ 1 & 2 & -1 \\ 1 & -1 & -4 \end{pmatrix}$$

$$\mathrm{down}^T = (0, 0, 0, 0); \qquad h_{\mathrm{up}}^T = (5, 4, -1.5).$$
C. Example 3
The second example in [8] is (1) with

$$M = \begin{pmatrix} 4 & 3 & 4 & 2 & 4 \\ 3 & 2 & 8 & 8 & 5 \\ 4 & 8 & 10 & 6 & 2 \\ 2 & 8 & 6 & 0 & 1 \\ 4 & 5 & 2 & 1 & 4 \end{pmatrix}; \qquad B = \begin{pmatrix} 1 & 0 & 3 & 1 & 2 \\ 0 & 1 & 2 & 2 & \cdot \end{pmatrix}$$
$$p^T = (3, 0, 2, 6, 0); \quad b^T = (6, 0); \quad \mathrm{down}^T = (0, 0, 0, 0, 0); \quad \mathrm{up}^T = (10, 10, 10, 10, 10).$$

Using (3) to eliminate the equality constraints, a problem of the form of (4) is obtained.

D. Example 4

$$y((k + n_r)T) = \sum_{i=n_r}^{k+n_r} t_i\, u((k + n_r - i)T) \tag{12}$$

where the $t_i$ are the samples of the system impulse response. With all that, the cost function $J$ can be written in the form (2) with $x_i = u((k + i - 1)T)$ for $i = 1$ to $N$, $M = T^T T$, where $T$ is an $N \times N$ matrix whose $r$th row is $T_r = (t_{n_r+r-1}, t_{n_r+r-2}, \ldots, t_{n_r}, 0, 0, \ldots, 0)$, and $p = -T^T y_d$, where $y_d$ is the vector of desired output samples.
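A minimal sketch of this construction, assuming (as the truncated text suggests) that $y_d$ collects the desired output samples and that $J$ is the squared tracking error $\frac{1}{2}\|Tx - y_d\|^2$; the impulse response below is an arbitrary illustrative choice:

    import numpy as np

    def build_tracking_qp(t, n_r, N, y_d):
        # Row r of T is (t_{n_r+r-1}, t_{n_r+r-2}, ..., t_{n_r}, 0, ..., 0),
        # with t indexed so that t[i] = t_i.
        T = np.zeros((N, N))
        for r in range(1, N + 1):
            for j in range(1, r + 1):
                T[r - 1, j - 1] = t[n_r + r - j]
        return T.T @ T, -T.T @ y_d        # M = T'T, p = -T'y_d

    t = np.exp(-0.3 * np.arange(64))      # hypothetical impulse response
    M, p = build_tracking_qp(t, n_r=2, N=20, y_d=np.ones(20))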
TABLE I
AVERAGE NUMBER OF ITERATIONS

                  P1      P2      P3        P4
    (6), Rule 1   22      477     81        4995
    (6), Rule 2   13      140     35        1002
    (6), Rule 4   14      152     38        1082
    (13)          293     194     43199     804387
E. Comparison
In Table I, we can see the average number of iterations needed by networks (6) and (13) to converge to the optimal solution for all the examples tested (named P1 to P4). The convergence criterion is $\|x^* - x\| \le 0.001\,\|x^*\|$. The averages correspond to 100 initial points randomly distributed inside $[0, 10]^n$. In model (6), the highest values allowed by Rules 1, 2, and 4 are chosen for the $c_i$ constants. Rule 3 is not used since, for the four problems tested, it is slightly more restrictive than Rule 4. Finally, the results for network (13) have been obtained with $h = 1/\lambda_{\max}$, where $\lambda_{\max}$ is the maximum eigenvalue of $(N + \bar{G})(N + \bar{G})^T$, as proposed in [13].
The best performance is obtained with model (6) and Rule 2 in all examples. Comparing the results obtained by (6) with Rules 1 and 2, we can evaluate the effect of the preconditioning technique applied, and we can see that it accelerates network convergence in all cases. Comparing Rules 2 and 4, we can evaluate the effect of using different matrix norms in the calculation of the $c_i$; in all cases, the use of $\lambda_{\max}$ leads to quicker convergence than the use of $\|W\|_F$. The speed differences are not very large because, for the four problems tested, $\|W\|_F$ is not very far from $\lambda_{\max}$. Finally, by comparing the second and fourth rows, we can see that the convergence speed of (13) is lower than that of the model proposed here for all tested problems, especially for Examples 3 and 4.
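For readers who want to reproduce this kind of comparison, a sketch of the evaluation protocol follows (our harness, mirroring the stated criterion and the 100 random starts in $[0, 10]^n$; the step function is whichever network update is being measured):

    import numpy as np

    def iterations_to_converge(step, x0, x_star, tol=1e-3, max_iter=10**6):
        # Count iterations until ||x* - x|| <= tol * ||x*||.
        x = x0
        for k in range(1, max_iter + 1):
            x = step(x)
            if np.linalg.norm(x_star - x) <= tol * np.linalg.norm(x_star):
                return k
        return max_iter

    def average_iterations(step, x_star, n, trials=100, seed=0):
        rng = np.random.default_rng(seed)
        return np.mean([iterations_to_converge(step, rng.uniform(0.0, 10.0, n), x_star)
                        for _ in range(trials)])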
VII. CONCLUSION
A particular form of the Wolfe dual, combined with an appropriate method for eliminating the equality constraints, was used to reformulate the problem of quadratic optimization with general linear constraints as a much simpler problem of quadratic optimization with lower-bound constraints. A simple discrete-time neural network model was proved to be able to solve this problem. The quadratic problems to which the proposed method is applicable are those that are strictly convex in the subspace defined by the equality constraints. The method can work with any type and any number of linear constraints, which is not the case for most previously published discrete-time models.
[17] Y. Xia, C. Sun, and W. X. Zheng, "Discrete-time neural network for fast solving large linear estimation problems and its application to image restoration," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 5, pp. 812-820, May 2012.
[18] R. Fletcher, Practical Methods of Optimization, 2nd ed. New York: Wiley, 1987.
[19] K. C. Tan, H. J. Tang, and Z. Yi, "Global exponential stability of discrete-time neural networks for constrained quadratic optimization," Neurocomputing, vol. 56, pp. 399-406, Jan. 2004.
[20] A. Greenbaum and G. H. Rodrigue, "Optimal preconditioners of a given sparsity pattern," BIT Numer. Math., vol. 29, no. 4, pp. 610-634, Dec. 1989.