
PII: S0005-1098(97)00181-7

Pergamon

Automatica, Vol. 34, No. 2, pp. 245-253, 1998
© 1998 Elsevier Science Ltd. All rights reserved.
Printed in Great Britain
0005-1098/98 $19.00 + 0.00
Brief Paper

Dual Adaptive Control of Nonlinear Stochastic


Systems using Neural Networks*
SIMON FABRI† and VISAKAN KADIRKAMANATHAN†
Key Words: Dual control; stochastic control; neural control; adaptive control; nonlinear control.


Abstract: A suboptimal dual adaptive system is developed for control of stochastic, nonlinear, discrete-time plants that are affine in the control input. The nonlinear functions are assumed to be unknown, and neural networks are used to approximate them. Both Gaussian radial basis function and sigmoidal multilayer perceptron neural networks are considered, and parameter adjustment is based on Kalman filtering. The result is a control law that takes into consideration the uncertainty of the parameter estimates, thereby eliminating the need to perform prior open-loop plant identification. The performance of the system is analyzed by simulation and Monte Carlo analysis. © 1998 Elsevier Science Ltd. All rights reserved.

1. INTRODUCTION

The use of neural networks for adaptive control of the affine class of nonlinear systems in discrete time has recently been investigated (Chen, 1990; Chen and Khalil, 1992, 1995; Liu et al., 1996, 1997; Narendra and Parthasarathy, 1990). The neural networks are included for modelling the system functionals, which are assumed to be unknown. Adaptive laws are used to adjust the network parameters so as to obtain good control performance.

The approaches taken so far in neural adaptive control typically adopt a heuristic certainty equivalence procedure. This implies that the network approximations are used in a control law as if they were the true system functions, completely ignoring their uncertainty. When the uncertainty is large, for example during start-up, this can lead to an inadequate transient response. To take the uncertainty of the unknowns into consideration, a stochastic adaptive approach can be taken (Åström and Wittenmark, 1989; Söderström, 1994). This leads to the so-called dual control principle introduced by Feldbaum in the 1960s (Feldbaum, 1960, 1961, 1965; Wittenmark, 1975, 1995). Dual adaptive control has been analyzed mainly for adaptive control of linear systems with unknown parameters (Åström and Helmersson, 1986; Chan and Zarrop, 1985; Filatov et al., 1995; Jacobs and Patchell, 1972; Maitelli and Yoneyama, 1994; Pronzato et al., 1996; Wenk and Bar-Shalom, 1980) or for nonlinear systems having known functionals but whose state must be estimated (Tse and Bar-Shalom, 1973, 1976). Because of the advantages associated with it, there has been a recent resurgence of research on dual control (Filatov et al., 1995; Gevers, 1995; Maitelli and Yoneyama, 1994; Pronzato et al., 1996; Wittenmark, 1995). However, none of these works addresses the problem when the system is nonlinear and the functions are unknown.

Hence, in this work we investigate the use of dual adaptive control for the affine class of nonlinear, discrete-time systems when the nonlinear functions are unknown and a stochastic additive disturbance is present at the output. Two types of neural network are considered for modelling the unknown functions. In Section 2 a brief overview of dual control is given. Section 3 develops the dual neural network controller for both the Gaussian radial basis function (RBF) and the sigmoidal multilayer perceptron (MLP) network cases. Section 4 contains simulation results, followed by a conclusion.

2. DUAL CONTROL

The advantages of dual control follow because the resulting system will possess the dual features of (i) taking the system state optimally along a desired trajectory, with due consideration given to the uncertainty of the parameter estimates, and (ii) eliciting further information so as to reduce future parameter uncertainty, thereby improving the estimation process. Effect (i) is called caution because, in providing the tracking function, the controller does not use the estimated parameters blindly as if they were true. Effect (ii) is called probing because the controller generates signals that encourage faster parameter convergence. Such a controller is said to be actively adaptive.

*Received 13 February 1997; received in revised form 5 September 1997. This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor C. J. Harris under the direction of Editor C. C. Hang. Corresponding author S. Fabri. Tel. +44 114 222 5250; Fax +44 114 273 1729; E-mail cop95sf@sheffield.ac.uk.
†Department of Automatic Control & Systems Engineering, The University of Sheffield, Mappin Street, Sheffield S1 3JD, U.K.


Dual control can offer improvement over other adaptive schemes, particularly when the control horizon is short, the initial parameter uncertainty is large or the parameters are changing rapidly. It has exhibited improved performance in practical applications such as economic system optimization (Bar-Shalom and Wall, 1980), chip refiner control in the pulp industry (Allison et al., 1995) and roll-angle control of a vertical-takeoff airplane pilot plant (Filatov et al., 1996).
Technically, a dual controller aims at finding a control input u(t) which minimizes the N-stage criterion

$$J_{\text{dual}} = E\left\{ \sum_{i=0}^{N-1} \left[ y(t+i+1) - y_r(t+i+1) \right]^2 \,\middle|\, Y^t \right\}, \qquad (1)$$

where y_r(t) is the system reference input, y(t) is the controlled output, E{·} denotes mathematical expectation taken over all random variables, including the parameters, and Y^t is the information state at time t, defined as Y^t := {y(t), ..., y(0), u(t-1), ..., u(0)}.
In principle, this control input can be found by solution of the so-called Bellman equation, via dynamic programming. However, in most practical situations this is impossible to implement, because it involves operations that are highly computation and memory intensive (Åström and Wittenmark, 1989). For this reason, most practical adaptive controllers disregard completely the dual features proposed by Feldbaum and are referred to as nondual controllers. Two such examples are the heuristic certainty equivalence (HCE) and the cautious controllers (Bar-Shalom and Tse, 1974). These controllers often result in an inadequate transient response; the former exhibits large overshoot and the latter a slow response time.
Some of the neural network control schemes proposed in the literature, being of the HCE type, avoid the serious overshoot and stability problems that might arise from neglecting caution by first performing intensive, open-loop, off-line training to identify the plant and reduce the prior uncertainty of the parameters (Narendra and Parthasarathy, 1990; Rovithakis and Christodoulou, 1994; Chen and Khalil, 1995). A control and identification phase is then started, with the neural network parameters set to these pre-trained values, which are substantially close to the actual values. In our case, this pre-training phase is avoided; parameter uncertainty is taken into consideration and influenced by a control law derived from dual adaptive principles. This is more efficient and economical in practical applications, because the off-line training scheme can be time consuming and hence expensive.

The approach taken to solve the problem of complexity associated with Bellman's equation is to derive control laws that are practically implementable but which, to a certain extent, retain the desirable properties of the ideal dual controller, i.e. caution and probing. This approach does not lead to an exact solution of Bellman's equation, and so such controllers are called suboptimal dual. They can be broadly divided into the implicit and explicit types (Filatov et al., 1996).

This investigation will use an explicit-type suboptimal dual approach based on that described by Milito et al. (1982) for suboptimal control of linear stochastic systems, but extended to the case of stochastic, discrete-time, affine nonlinear systems. Gaussian radial basis function and sigmoidal multilayer perceptron neural networks are both considered for modelling the unknown nonlinear system functions.

3. CONTROLLER DESIGN

3.1. The control objective
The objective is to control the stochastic, single-input single-output, affine nonlinear system of the general form

$$y(t) = f[x(t-1)] + g[x(t-1)]\,u(t-1) + e(t), \qquad (2)$$

where y(t) is the output, u(t) the control input, x(t-1) = [y(t-n) ... y(t-1)  u(t-1-m) ... u(t-2)]^T is the system state vector, m, n are known system parameters, f[x(t-1)] and g[x(t-1)] are the unknown nonlinear functions of the state vector, and e(t) is independent, zero-mean Gaussian noise with known variance σ². As explained by Chen and Khalil (1995), for stability of the internal dynamics we assume that the system is minimum phase and also that g[x(t-1)] is bounded away from zero.
If the nonlinear functions f[x(t-1)], g[x(t-1)] were known, the control law

$$u(t) = \frac{y_r(t+1) - f[x(t)]}{g[x(t)]} \qquad (3)$$

results in y(t+1) - y_r(t+1) = e(t+1), which minimizes J_dual because the term in the summation of cost function (1) will then be e²(t+1), which by assumption is independent of u(t) and Y^t (Åström and Wittenmark, 1989).

It is interesting to note that the bilinear plant studied in Jacobs and Potter (1978), whose dual optimization was solved numerically, is a special case of the more general nonlinear class (2) considered in this paper.

3.2. The Gaussian RBF neural network controller
We will first develop the design of the suboptimal dual controller implemented via Gaussian radial basis function neural networks.

3.2.1. Radial basis function networks. Two Gaussian radial basis function neural networks (Poggio and Girosi, 1990) are used to approximate the nonlinear functions f[x(t-1)], g[x(t-1)] within a compact set χ ⊂ ℝ^{n+m}, where the state vector x(t-1) is known to be contained. χ thus represents the network approximation region. The outputs of the neural networks are given by

$$\hat{f}[x(t-1), \hat{w}_f(t)] = \hat{w}_f^T(t)\,\Phi_f[x(t-1)],$$
$$\hat{g}[x(t-1), \hat{w}_g(t)] = \hat{w}_g^T(t)\,\Phi_g[x(t-1)],$$

where ŵ_f, ŵ_g are vectors containing the linear parameters of the neural networks and Φ_f[x(t-1)], Φ_g[x(t-1)] are the Gaussian basis function vectors, whose ith elements are given by

$$\phi_{fi} = \exp\left( \frac{-\|x - m_{fi}\|^2}{2\sigma_f^2} \right), \qquad \phi_{gi} = \exp\left( \frac{-\|x - m_{gi}\|^2}{2\sigma_g^2} \right),$$

where m_{fi}, m_{gi} are the coordinates of the centre of the ith basis function and σ_f², σ_g² are the variances for the networks approximating f[x(t-1)] and g[x(t-1)], respectively.

The basis functions are centred on the points of a regular square sampling mesh inside χ, where the mesh spacing and the variance of the basis functions are chosen a priori. This can be done using the method proposed by Sanner and Slotine (1992), which ensures any desired network approximation accuracy, provided the linear parameters are set to some optimal value. However, the optimal linear parameters required to ensure this accuracy are as yet unknown, and so they will be estimated by recursive adjustment of ŵ_f and ŵ_g.
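The mesh construction above is straightforward to realize in code. The following Python sketch builds the Gaussian basis-function vector and the network output for a scalar state, using the mesh of Simulation 1; the function names and the scalar-state simplification are ours, not the paper's.

```python
import numpy as np

def gaussian_basis(x, centres, var):
    # phi_i = exp(-||x - m_i||^2 / (2 sigma^2)), one element per centre
    d2 = np.sum((np.atleast_2d(x) - centres) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * var))

# Centres on a regular mesh inside chi = [-2, 2] with spacing 0.5,
# matching the f-network of Simulation 1 (scalar state).
centres_f = np.arange(-2.0, 2.25, 0.5).reshape(-1, 1)

def rbf_output(x, w, centres, var):
    # Network output f_hat[x, w] = w^T Phi[x]
    return w @ gaussian_basis(x, centres, var)
```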
3.2.2. Parameter estimation. Assume that inside χ the network approximation errors are negligibly small when the linear parameter vectors are equal to some optimal values w_f*, w_g*. Hence from equation (2) it follows that

$$w^*(t+1) = w^*(t),$$
$$y(t) = w^{*T}\,\Phi[x(t-1)] + e(t),$$

where w* = [w_f^{*T} ⋮ w_g^{*T}]^T and Φ[x(t-1)] = [Φ_f^T[x(t-1)] ⋮ Φ_g^T[x(t-1)] u(t-1)]^T.

The optimal parameters requiring estimation appear linearly in the output equation, so the well-established techniques based on Kalman filtering (Åström, 1970; Jazwinski, 1970) can be used if we assume that the initial optimal parameter vector w*(0) has a Gaussian distribution with mean m and covariance R₀. Note that in practice R₀ can be used to reflect the extent of prior knowledge of the parameters, larger values indicating greater uncertainty, and hence less confidence, in the initial parameter estimate (Ljung, 1979).

Using Kalman filter theory (Åström, 1970; Åström and Wittenmark, 1989) we obtain the following recursive parameter adjustment rules:

$$\hat{w}(t+1) = \hat{w}(t) + K(t)\,\varepsilon(t),$$
$$P(t+1) = \left\{ I - K(t)\,\Phi^T[x(t-1)] \right\} P(t), \qquad (4)$$
$$K(t) = \frac{P(t)\,\Phi[x(t-1)]}{\sigma^2 + \Phi^T[x(t-1)]\,P(t)\,\Phi[x(t-1)]},$$

with initial conditions ŵ(0) = m, P(0) = R₀, where ε(t) := y(t) - ŵ^T(t)Φ[x(t-1)] denotes the innovation at time t.

From the properties of Kalman filtering it follows that the conditional distribution of y(t+1) given Y^t is Gaussian with mean ŵ^T(t+1)Φ[x(t)] and variance Φ^T[x(t)]P(t+1)Φ[x(t)] + σ². Note that the Kalman filter might incur a significant computational burden when estimating the parameters of large networks; nevertheless, its use is fundamental for updating the conditional probability distribution of y. This information is essential in dual control, because the uncertainty of the estimates must be taken into consideration.
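For concreteness, a minimal Python sketch of one step of the recursive rules (4) follows, assuming the stacked regressor Φ[x(t-1)] has already been formed; the function name and variable layout are illustrative.

```python
import numpy as np

def kf_update(w, P, phi, y, sigma2):
    """One Kalman-filter parameter update, equation (4)."""
    innov = y - w @ phi                  # epsilon(t) = y(t) - w_hat^T Phi
    S = sigma2 + phi @ P @ phi           # innovation variance
    K = (P @ phi) / S                    # Kalman gain K(t)
    w_new = w + K * innov                # w_hat(t+1)
    P_new = P - np.outer(K, phi @ P)     # (I - K Phi^T) P
    return w_new, P_new, innov
```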
3.2.3. The control law. We will consider an explicit-type, suboptimal dual cost function based on the innovations dual controller developed by Milito et al. (1982) for linear systems. This cost function explicitly includes a term concerning the innovation at time t+1, i.e. ε(t+1) = y(t+1) - ŵ^T(t+1)Φ[x(t)]. The idea is to reward performance that encourages ε(t+1) to remain high, so that parameter updating in equation (4) is driven by richer information. Hence, the cost function has the form

$$J_{\text{inn}} = E\left\{ \left[ y(t+1) - y_r(t+1) \right]^2 + q\,u^2(t) + r\,\varepsilon^2(t+1) \,\middle|\, Y^t \right\}, \qquad (5)$$

where E{·|Y^t} denotes mathematical expectation given information Y^t, y_r(t) is the reference input, and q ≥ 0 and -1 ≤ r ≤ 0 are designer-chosen scalar weighting factors. The difference between this cost function and that originally developed by Milito et al. (1982) concerns the inclusion of a cost term

for u(t). Higher q induces a penalty on large control signals, reflecting that in practice the control amplitude needs to be constrained. Smaller r rewards higher variance of the innovations, encouraging a dual-like effect.

Since the reference and control inputs are deterministic and y(t+1)|Y^t is Gaussian distributed, it follows from the previously mentioned properties of the Kalman filter that

$$J_{\text{inn}} = (r+1)\left( \Phi^T[x(t)]\,P(t+1)\,\Phi[x(t)] + \sigma^2 \right) + q\,u^2(t) + \left( \hat{w}^T(t+1)\,\Phi[x(t)] - y_r(t+1) \right)^2.$$
It is now relatively easy to optimize this cost function with respect to u(t) by differentiating and equating to zero, avoiding the need to resort to the complex dynamic programming algorithm that would have resulted from solution of the ideal optimal dual control problem. This results in the optimal control law

$$u^*(t) = \frac{\left( y_r(t+1) - \hat{f}[\cdot] \right) \hat{g}[\cdot] - (1+r)\,\nu_{gf}}{\hat{g}^2[\cdot] + q + (1+r)\,\nu_{gg}}, \qquad (6)$$

where the arguments [·] of f̂ and ĝ are [x(t), ŵ_f(t+1)] and [x(t), ŵ_g(t+1)], respectively, ν_{gf} := Φ_g^T[x(t)] P_{gf}(t+1) Φ_f[x(t)] and ν_{gg} := Φ_g^T[x(t)] P_{gg}(t+1) Φ_g[x(t)], and matrix P(t+1) has been repartitioned as

$$P(t+1) = \begin{bmatrix} P_{ff}(t+1) & P_{fg}(t+1) \\ P_{gf}(t+1) & P_{gg}(t+1) \end{bmatrix},$$

where P_{ff}, P_{gg} are square (n_{rf} × n_{rf}) and (n_{rg} × n_{rg}) sub-matrices, with n_{rf}, n_{rg} denoting the number of basis functions in the f̂ and ĝ networks, respectively.
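Once the posterior estimate ŵ(t+1) and covariance P(t+1) are available, evaluating (6) is inexpensive. A sketch follows, assuming the f-network parameters occupy the first nf entries of the stacked parameter vector (an ordering we choose for illustration).

```python
import numpy as np

def dual_control(phi_f, phi_g, w, P, y_ref, q, r, nf):
    """Innovations-dual control law (6) for the RBF controller."""
    wf, wg = w[:nf], w[nf:]
    f_hat = wf @ phi_f                   # f_hat[x(t), w_f(t+1)]
    g_hat = wg @ phi_g                   # g_hat[x(t), w_g(t+1)]
    Pgf = P[nf:, :nf]                    # sub-block P_gf(t+1)
    Pgg = P[nf:, nf:]                    # sub-block P_gg(t+1)
    v_gf = phi_g @ Pgf @ phi_f           # uncertainty term nu_gf
    v_gg = phi_g @ Pgg @ phi_g           # uncertainty term nu_gg
    num = (y_ref - f_hat) * g_hat - (1.0 + r) * v_gf
    den = g_hat ** 2 + q + (1.0 + r) * v_gg
    return num / den
```

Setting r = -1 and q = 0 makes the uncertainty terms vanish, recovering the HCE law (3) with estimates in place of the true functions, while r = 0 gives the cautious extreme.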
3.2.4. Analysis of the control law. From equation (6) it is clear that the controller can take into consideration the uncertainty of the parameter estimates via the variance-related terms ν_{gf} and ν_{gg} in the control law. Parameter r acts as a weighting factor where, at one extreme, the controller completely ignores these parameter uncertainty terms when r = -1 or, at the other extreme, gives maximum attention to parameter uncertainty when r = 0. For intermediate settings, -1 < r < 0, one obtains a balance between these two extremes.

The case r = 0 is equivalent to a cautious controller, with the disadvantages normally associated with this kind of suboptimal controller, namely turn-off and slowness of response (Åström and Wittenmark, 1989). This follows because strong emphasis is given to the uncertainty of the parameter estimates and the controller is very cautious in using them. In fact, very small control signals are applied when the terms ν_{gf} and ν_{gg} are large.

The case r = -1 and q = 0, on the other hand, corresponds to a controller designed on a heuristic certainty equivalence basis. The parameter estimates ŵ(t) are used as if they were the optimal parameters w*, by replacing the actual nonlinear system functions in control law (3) with the network approximations and completely disregarding the approximation uncertainty. This often results in excessively high peak overshoot during the transient part of the response, because no consideration is given to the fact that the parameters have not yet achieved their optimal values; quite large control signals are applied, owing to the absence of the uncertainty terms in control law (6).

The case -1 < r < 0 provides a compromise between these two extremes, being neither too cautious (and hence sluggish) nor too bold (and hence crude). This is the motivation behind the design of Milito's innovations suboptimal dual controller, where the level of caution can be varied between zero (non-cautious) and a value which results in a cautious controller.
3.3. The sigmoidal MLP neural network controller
A neural network that is more widely used than the radial basis function type is the sigmoidal multilayer perceptron network. Unfortunately, this neural network does not preserve the advantage of linearity in the unknown parameters, and so its parameter adjustment rules tend to be more complex than for the RBF case. However, because the support of its basis functions is not localized, one typically requires a relatively smaller number of neurons to achieve similar function approximation accuracy. This consideration is especially important for cases of high-dimensional input, considering that RBF networks suffer from the curse of dimensionality, where the number of units increases exponentially with the state dimension.
3.3.1. Sigmoidal MLP networks. Two sigmoidal MLP networks will be used, each having one hidden layer and one summing output node, to approximate the unknown functions f[x(t-1)], g[x(t-1)]. The outputs of the two neural networks are, respectively, given by

$$\hat{f}[x(t-1), \hat{w}_f(t)] = \hat{c}_f^T(t)\,\Phi_f[x(t-1)],$$
$$\hat{g}[x(t-1), \hat{w}_g(t)] = \hat{c}_g^T(t)\,\Phi_g[x(t-1)], \qquad (7)$$

where ĉ_f, ĉ_g are vectors containing the parameters (weights) of the output-layer neuron and Φ_f[x(t-1)], Φ_g[x(t-1)] are the sigmoidal activation function vectors, representing the outputs of the nodes in the hidden layer, whose ith elements are given by

$$\phi_{fi} = \frac{1}{1 + \exp(-\hat{w}_{fi}^T(t)\,x_a(t-1))}, \qquad \phi_{gi} = \frac{1}{1 + \exp(-\hat{w}_{gi}^T(t)\,x_a(t-1))},$$

where ŵ_{fi}(t), ŵ_{gi}(t) are the parameter vectors of the ith neuron in the hidden layer and x_a(t-1) := [x(t-1)^T ⋮ 1]^T denotes the system state vector augmented by an additional constant input serving as a neuron bias. The number of units in the hidden layers of the f̂ and ĝ networks is denoted by n_{sf} and n_{sg}, respectively.

The values of the network parameters required to ensure some desired approximation accuracy are unknown and hence require estimation. For convenience these will be grouped as a single vector

$$\hat{w} := \left[ \hat{c}_f^T \; \hat{w}_{f1}^T \cdots \hat{w}_{f n_{sf}}^T \; \vdots \; \hat{c}_g^T \; \hat{w}_{g1}^T \cdots \hat{w}_{g n_{sg}}^T \right]^T.$$

In contrast to the RBF network, not all of these parameters appear linearly in equations (7), because of the parameters of the hidden-layer neurons.
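A short sketch of the forward pass of equations (7) for one such network may help fix the notation; the weight layout (one row of input weights per hidden neuron) is our convention, not the paper's.

```python
import numpy as np

def mlp_output(x, W, c):
    """y_hat = c^T Phi[x]: one hidden sigmoidal layer plus summing output."""
    xa = np.append(x, 1.0)                  # augmented state x_a = [x^T, 1]^T
    act = 1.0 / (1.0 + np.exp(-W @ xa))     # phi_i = 1/(1 + exp(-w_i^T x_a))
    return c @ act                          # summing output node
```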

3.3.2. Parameter estimation. As in the previous case, we assume that there exist some optimal values of the vector ŵ, denoted by w* := [c_f^{*T} w_{f1}^{*T} ⋯ ⋮ c_g^{*T} w_{g1}^{*T} ⋯]^T, such that the network approximation errors are arbitrarily small in the space of interest. Hence from equations (7) and (2), the plant can be modelled by

$$w^*(t+1) = w^*(t),$$
$$y(t) = h(w^*, x(t-1), u(t-1)) + e(t), \qquad (8)$$

where

$$h(w^*, x(t-1), u(t-1)) := c_f^{*T}\,\Phi_f[w_f^*, x(t-1)] + c_g^{*T}\,\Phi_g[w_g^*, x(t-1)]\,u(t-1)$$

is a nonlinear function of the unknown optimal parameters w*. Since the parameters to be estimated do not appear linearly in the system model, nonlinear estimation techniques have to be used. The extended Kalman filter (EKF) (Anderson and Moore, 1979; Jazwinski, 1970) is the most widely used nonlinear estimator, and for our case it also represents a natural progression from the (linear) Kalman filter used in the RBF network case. The EKF has been applied to system identification using MLP networks (Kimura et al., 1996; Watanabe et al., 1991), where it was shown to give better results than the back-propagation training algorithm (Rumelhart et al., 1986), and also to function estimation with Gaussian RBF networks (Kadirkamanathan and Niranjan, 1993). The EKF applied to the model of equation (8) gives

$$\hat{w}(t+1) = \hat{w}(t) + K(t)\,\varepsilon(t), \quad \hat{w}(0) = m,$$
$$P(t+1) = \left\{ I - K(t)\,\nabla h(t) \right\} P(t), \quad P(0) = R_0, \qquad (9)$$
$$K(t) = \frac{P(t)\,\nabla h^T(t)}{\sigma^2 + \nabla h(t)\,P(t)\,\nabla h^T(t)},$$

where ε(t) := y(t) - h(ŵ(t), x(t-1), u(t-1)). ∇h(t) denotes the transpose of the gradient vector of h(w*, x(t-1), u(t-1)) with respect to w*, evaluated at w* = ŵ(t), for which a closed-form expression can be found directly by differentiation, resulting in

$$\nabla h(t) = \left[ \nabla h_f(t) \; \vdots \; \nabla h_g(t)\,u(t-1) \right], \qquad (10)$$

where

$$\nabla h_f(t) = \left[ \Phi_f^T[x(t-1)] \; \cdots \; \hat{c}_{fi}\,\exp(-\hat{w}_{fi}^T x_a)\,\phi_{fi}^2\,x_a^T \; \cdots \right]^T, \quad i = 1, \ldots, n_{sf},$$
$$\nabla h_g(t) = \left[ \Phi_g^T[x(t-1)] \; \cdots \; \hat{c}_{gi}\,\exp(-\hat{w}_{gi}^T x_a)\,\phi_{gi}^2\,x_a^T \; \cdots \right]^T, \quad i = 1, \ldots, n_{sg},$$

and ĉ_{fi}, ĉ_{gi} denote the ith elements of vectors ĉ_f, ĉ_g, respectively.

To be able to proceed in a similar manner to the Gaussian RBF case, we will assume that the initial optimal parameter vector w*(0) has a Gaussian distribution with mean m and covariance R₀, and that the conditional distribution of w*(t+1) given Y^t is approximately Gaussian, with mean ŵ(t+1) and covariance matrix P(t+1) as given by the EKF equations (9) (Jacobs, 1974). It should be emphasized that the latter is only an approximation and does not follow naturally as in the linear Kalman filter. Even so, this assumption still does not lead to straightforward conclusions regarding the conditional distribution of y(t+1) given Y^t, because of the nonlinear relationship between y and w* as seen in equation (8). Hence we will linearize y(t+1) about ŵ(t+1) by a first-order Taylor series expansion, to get

$$y(t+1) \approx h(\hat{w}(t+1), x(t), u(t)) + \nabla h(t+1)\left( w^*(t+1) - \hat{w}(t+1) \right) + e(t+1), \qquad (11)$$

where ∇h(t+1) is the same as ∇h(t) but evaluated at ŵ(t+1), u(t) and x(t).

From the assumed Gaussian conditional distribution of w*(t+1), mentioned before, and equation (11), it thus follows that the conditional distribution of y(t+1) given Y^t is also approximately Gaussian, with mean h(ŵ(t+1), x(t), u(t)) and variance ∇h(t+1)P(t+1)∇h^T(t+1) + σ².

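Under these approximations, the EKF step (9) has the same algebra as (4), with the regressor replaced by the gradient (10). The sketch below, for a single network with parameter layout [c; rows of W] (our convention), shows both the analytic gradient and the update; stacking the f- and g-network parameter blocks as in the paper is a direct extension.

```python
import numpy as np

def grad_h(xa, W, c):
    """Gradient of the network output w.r.t. [c; vec(W)], cf. equation (10)."""
    act = 1.0 / (1.0 + np.exp(-W @ xa))       # hidden activations phi_i
    d_act = act * (1.0 - act)                 # = exp(-w_i^T x_a) phi_i^2
    dW = (c * d_act)[:, None] * xa[None, :]   # dh/dw_i = c_i phi_i(1-phi_i) x_a^T
    return np.concatenate([act, dW.ravel()])  # dh/dc = phi, then hidden weights

def ekf_update(w_flat, P, grad, innov, sigma2):
    """One EKF step (9): same form as (4) with grad in place of Phi."""
    S = sigma2 + grad @ P @ grad              # linearized innovation variance
    K = (P @ grad) / S                        # EKF gain
    return w_flat + K * innov, P - np.outer(K, grad @ P)
```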
3.3.3. The control law. Consider the same cost function as before, based on Milito's innovations dual controller given by equation (5), where in this case the innovation sequence is ε(t+1) = y(t+1) - h(ŵ(t+1), x(t), u(t)).

Proceeding exactly as for the RBF case and using the approximations on the conditional distribution of y(t+1) outlined in the previous section, the optimal control law is obtained as

$$u^*(t) = \frac{\left( y_r(t+1) - \hat{f}[\cdot] \right) \hat{g}[\cdot] - (1+r)\,\mu_{gf}}{\hat{g}^2[\cdot] + q + (1+r)\,\mu_{gg}}, \qquad (12)$$

where the arguments [·] of f̂ and ĝ are [x(t), ŵ_f(t+1)] and [x(t), ŵ_g(t+1)], respectively; μ_{gf} := ∇h_g(t+1) P_{gf}(t+1) ∇h_f^T(t+1) and μ_{gg} := ∇h_g(t+1) P_{gg}(t+1) ∇h_g^T(t+1) represent the uncertainty terms. P(t+1) has been partitioned as before, but in this case P_{ff}, P_{gg} are square (n_{sf}(n+m+2) × n_{sf}(n+m+2)) and (n_{sg}(n+m+2) × n_{sg}(n+m+2)) sub-matrices, respectively.
Note that the result is a control law identical in structure to that of the RBF network controller, except that the Gaussian basis function vectors appearing in the uncertainty terms are replaced by the corresponding gradient vectors of h evaluated at ŵ. This reflects the principle behind the extended Kalman filter used for estimation, namely that it linearizes a nonlinear system about the most recent parameter estimate, and consequently the statistical properties of its estimates are calculated on this basis. Hence, as far as controller performance is concerned, comments similar to those for the RBF dual adaptive controller apply also in this case.
4. SIMULATION RESULTS

The performance of the system was tested by simulation of two example plants that satisfy the general affine form and assumptions outlined in Section 3.1. They do not represent any particular practical application, but have proved convenient for analysis of the proposed system. Note that the plants were not subjected to an initial open-loop system identification phase. Closed-loop control was activated immediately, with the initial parameter estimates selected at random and not pre-trained.
4.1. Simulation 1
The plant of the first simulation is given by

$$y(t+1) = \sin(x(t)) + \cos(3x(t)) + \left( 2 + \cos(x(t)) \right) u(t) + e(t+1),$$

where the state x(t) = y(t) and the noise variance σ² = 0.001. f(x) = sin(x(t)) + cos(3x(t)) and g(x) = 2 + cos(x(t)) represent the unknown nonlinear dynamics. The reference input is obtained by sampling a unit-amplitude, 0.1 Hz square wave filtered by a network of transfer function 1/(s+1). A Gaussian RBF controller is implemented in this example. The network approximation region is chosen as χ = [-2, 2]. The f̂ network is chosen to have Gaussian basis functions of variance 1 placed on a mesh of spacing 0.5, whilst the ĝ network basis functions have a variance of 3.6 and a mesh spacing of 2. The Kalman filter initial parameter covariance was set to P(0) = 1000 I.
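The closed-loop set-up is easy to reproduce. The sketch below generates the reference and drives the Simulation 1 plant; the sampling period of 0.1 s is our assumption (the paper states only the 0.1 Hz square wave and the 1/(s+1) filter), and the controller call is left as a placeholder for the dual law (6).

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.1                     # assumed sampling period (not stated in the paper)
sigma = np.sqrt(0.001)       # noise standard deviation, sigma^2 = 0.001
y = 0.0
y_ref = 0.0

for k in range(200):
    t = k * dt
    square = 1.0 if (t % 10.0) < 5.0 else -1.0   # unit-amplitude 0.1 Hz square wave
    y_ref += dt * (square - y_ref)               # forward-Euler 1/(s+1) reference filter
    u = 0.0    # placeholder: substitute the dual control law u*(t) of equation (6)
    y = (np.sin(y) + np.cos(3.0 * y)
         + (2.0 + np.cos(y)) * u
         + rng.normal(0.0, sigma))               # plant of Simulation 1, x(t) = y(t)
```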
Trials were conducted with three different controllers, corresponding to the heuristic certainty equivalence (HCE) (r = -1), cautious (r = 0) and innovations dual (-1 < r < 0) controllers. The same noise sequence, initial conditions and reference input were used in each case. A typical output is shown in Fig. 1. (N.B. In plot a(i) the y-axis is truncated to enable clear visualization of the steady-state tracking. The actual amplitude in the initial period of the response can be determined from plot a(ii), which is purposely drawn at a different scale from the rest.) Note that, as expected, the

Fig. 1. Tracking error and accumulated cost: (a) HCE; (b) cautious; (c) dual.

HCE controller initially responds violently, showing large overshoot, because it does not take into consideration the inaccuracy of the parameter estimates. Only after the initial period, when the parameters converge, does the controller achieve good tracking. On the contrary, the cautious controller is slow to respond during the initial period, knowing that the parameter estimates are still inaccurate. Hence, although no violent response is exhibited, the controller is practically turned off during the first 2 s. The innovations dual controller reaches a compromise between these two extremes, showing no particularly unacceptable peak overshoot whilst tracking the reference input earlier than the cautious controller. Hence, even qualitatively, it is clear that the performance of the innovations dual controller is the best. To quantify the performance, a Monte Carlo analysis involving 500 trials was performed. The accumulated cost

$$V(T) = \sum_{t=1}^{T} \left( y_r(t) - y(t) \right)^2$$

was calculated over the whole simulation interval T at each trial. The results are shown in Fig. 1. The average of the accumulated cost over 500 trials was 1434, 6.7 and 5.7 for the HCE, cautious and dual cases, respectively. Hence, the dual controller shows the best performance.
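The Monte Carlo figure of merit is simply the sum of squared tracking errors per run, averaged over trials; in code (run_trial is a hypothetical closure returning one run's reference and output sequences):

```python
import numpy as np

def accumulated_cost(y_ref_seq, y_seq):
    # V(T) = sum_{t=1}^{T} (y_r(t) - y(t))^2 over one closed-loop run
    err = np.asarray(y_ref_seq) - np.asarray(y_seq)
    return float(err @ err)

# Averaging over independently seeded trials gives the Monte Carlo average:
# mean_V = np.mean([accumulated_cost(*run_trial(seed)) for seed in range(500)])
```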
To reduce the overshoot of the HCE controller it is tempting to increase the cost function weight q associated with u(t). Although this does reduce overshoot, in some cases it can cause a general deterioration of the tracking capabilities in the steady state, as shown in Fig. 2, where q was set to 1 for the HCE controller and 0.0001 for the other two. The HCE accumulated cost is reduced drastically to around 15, but it is still higher than 6, the order of magnitude of the cautious and dual controllers. The reason is that q tends to limit the amplitude of the control at all times, and not only during those periods when parameter uncertainty is large.

Fig. 2. Effect of q: (a) HCE (q = 1); (b) cautious (q = 0.0001); (c) dual (q = 0.0001).

4.2. Simulation 2
The plant of the second simulation is similar to that used in Chen and Khalil (1995), namely

$$y(t+1) = f(x(t)) + g(x(t))\,u(t) + e(t+1),$$

where x(t) = [y(t-1) y(t)]^T, g(x) = 1.2 and f(x) = 1.5 y(t) y(t-1) / (1 + y²(t) + y²(t-1)) + 0.35 sin(y(t) + y(t-1)) represent the unknown nonlinear dynamics, and the noise e(t) has variance σ² = 0.05. The reference input is the same as in Simulation 1. An MLP neural controller is tested on this plant, where the f̂ and ĝ networks are structured with 10 and 5 hidden-unit neurons, respectively. The initial parameter estimates are chosen at random and the initial covariance matrix P(0) has a diagonal structure, with the terms corresponding to f̂ and ĝ set to 50 and 10, respectively. As before, trials were conducted using the three different controllers, with q set to 0.0001 in all cases. A typical output is shown in Fig. 3. Note that the same comments apply in this case too, with the innovations dual controller performing better. Figure 3 also shows the results of the accumulated cost from the Monte Carlo analysis. The average of the accumulated cost over 100 trials was 500, 48 and 42 for the HCE, cautious and dual cases, respectively. It is clear that the innovations dual controller shows the best performance. A Gaussian RBF controller was also tried on this plant, subjected to noise of variance σ² = 0.01, with similar results, as shown in Fig. 4.

Fig. 3. Tracking error and accumulated cost: (a) HCE; (b) cautious; (c) dual.

Fig. 4. Tracking error and accumulated cost: (a) HCE; (b) cautious; (c) dual.

5. CONCLUSIONS
The main contribution of this paper is to show how dual control concepts can be applied to neural adaptive control of unknown nonlinear systems that are subjected to stochastic disturbances. This method has the advantage of improving the transient response of the system, especially during


start-up, when the uncertainty of the parameter estimates is high. The system developed is based on the innovations dual controller originally proposed by Milito et al. (1982) for linear systems. It was shown that the advantages of this design, namely superior performance over nondual controllers and a relatively simple control law, also hold for the more complex case of neural adaptive control of affine nonlinear systems. In particular, the suboptimal dual controller takes parameter uncertainty into consideration by introducing caution-like effects. Hence control and estimation are performed simultaneously from the outset, eliminating the necessity of preceding the control phase with an off-line, open-loop system identification phase, as is typically the case with indirect-adaptive HCE neural network control schemes. At the same time, the level of caution can be varied so as to reduce unacceptably slow responses.

Both Gaussian RBF and sigmoidal MLP neural networks have been used for learning the unknown nonlinearities. The network training algorithms were based on the Kalman filter for the RBF case and extended Kalman filter theory for the MLP case. Simulation results have been presented, and the advantage of utilising a suboptimal dual approach has been confirmed by Monte Carlo analysis. Issues for further research include evaluation of the conditions under which the assumptions taken in the derivation of the MLP control law are justified, and analysis of the stability properties of the closed-loop system.

Acknowledgements: S. Fabri would like to acknowledge support received from the ORS Awards Scheme, The University of Sheffield and the University of Malta.

REFERENCES
Allison, B. J., J. E. Ciarniello, J. C. Tessier and G. A. Dumont (1995). Dual adaptive control of chip refiner motor load. Automatica, 31(8), 1169-1184.
Anderson, B. D. O. and J. B. Moore (1979). Optimal Filtering. Prentice-Hall, U.S.A.
Åström, K. J. (1970). Introduction to Stochastic Control Theory. Academic Press, New York.
Åström, K. J. and A. Helmersson (1986). Dual control of an integrator with unknown gain. Comput. Math. Appl., 12A(6), 653-662.
Åström, K. J. and B. Wittenmark (1989). Adaptive Control. Addison-Wesley, Reading, MA, U.S.A.
Bar-Shalom, Y. and E. Tse (1974). Dual effect, certainty equivalence and separation in stochastic control. IEEE Trans. Automat. Control, AC-19, 494-500.
Bar-Shalom, Y. and K. D. Wall (1980). Dual adaptive control and uncertainty effects in macroeconomic systems optimization. Automatica, 16, 147-156.
Chan, S. S. and M. B. Zarrop (1985). A suboptimal dual controller for stochastic systems with unknown parameters. Int. J. Control, 41(2), 507-524.
Chen, F. C. (1990). Back-propagation neural networks for nonlinear self-tuning adaptive control. IEEE Control Sys. Mag. (Special issue on Neural Networks for Control Systems), 10(3), 44-48.
Chen, F. C. and H. K. Khalil (1992). Adaptive control of nonlinear systems using neural networks. Int. J. Control, 55, 1299-1317.
Chen, F. C. and H. K. Khalil (1995). Adaptive control of a class of nonlinear discrete-time systems. IEEE Trans. Automat. Control, 40(5), 791-801.
Feldbaum, A. A. (1960). Dual control theory I-II. Automation and Remote Control, 21, 874-880, 1033-1039.
Feldbaum, A. A. (1961). Dual control theory III-IV. Automation and Remote Control, 22, 1-12, 109-121.
Feldbaum, A. A. (1965). Optimal Control Systems. Academic Press, New York.
Filatov, N. M., U. Keuchel and H. Unbehauen (1996). Dual control for an unstable mechanical plant. IEEE Control Sys. Mag., 16(4), 31-37.
Filatov, N. M., H. Unbehauen and U. Keuchel (1995). Dual version of direct adaptive pole placement controller. In Preprints of the 5th IFAC Symp. on Adaptive Systems in Control and Signal Processing, IFAC, Hungary, pp. 449-454.
Gevers, M. (1995). Identification for control. In Preprints of the 5th IFAC Symp. on Adaptive Systems in Control and Signal Processing, IFAC, Hungary, pp. 1-12.
Jacobs, O. L. R. (1974). Introduction to Control Theory. Oxford University Press, U.K.
Jacobs, O. L. R. and J. Patchell (1972). Caution and probing in stochastic control. Int. J. Control, 16(1), 189-199.
Jacobs, O. L. R. and R. V. Potter (1978). Optimal control of a stochastic bilinear system. In M. J. Gregson, ed., Recent Theoretical Developments in Control, Chap. 22, pp. 403-419. Academic Press, U.S.A.
Jazwinski, A. H. (1970). Stochastic Processes and Filtering Theory. Academic Press, New York.
Kadirkamanathan, V. and M. Niranjan (1993). A function estimation approach to sequential learning with neural networks. Neural Comput., 5(6), 954-975.
Kimura, A., I. Arizono and H. Ohta (1996). An improvement of a back propagation algorithm by the extended Kalman filter and demand forecasting by layered neural networks. Int. J. Sys. Sci., 27(5), 473-482.
Liu, G. P., V. Kadirkamanathan and S. A. Billings (1996). Nonlinear predictive control via neural networks. In Proc. IEE Int. Conf. on Control '96, IEE, U.K., p. 746.
Liu, G. P., V. Kadirkamanathan and S. A. Billings (1997). Variable structure control for nonlinear discrete systems using neural networks. In 4th European Control Conf. ECC'97, Belgium.
Ljung, L. (1979). Asymptotic behaviour of the extended Kalman filter as a parameter estimator for linear systems. IEEE Trans. Automat. Control, AC-24, 36-50.
Maitelli, A. L. and T. Yoneyama (1994). A two-stage suboptimal controller for stochastic systems using approximate moments. Automatica, 30(12), 1949-1954.
Milito, R., C. S. Padilla, R. A. Padilla and D. Cadorin (1982). An innovations approach to dual control. IEEE Trans. Automat. Control, AC-27(1), 133-137.
Narendra, K. S. and K. Parthasarathy (1990). Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Networks, 1(1), 4-27.
Poggio, T. and F. Girosi (1990). Networks for approximation and learning. Proc. IEEE, 78(9), 1481-1497.
Pronzato, L., C. Kulcsár and E. Walter (1996). An actively adaptive control policy for linear models. IEEE Trans. Automat. Control, 41(6), 855-858.
Rovithakis, G. A. and M. A. Christodoulou (1994). Adaptive control of unknown plants using dynamical neural networks. IEEE Trans. Sys. Man Cybernetics, 24(3), 400-412.
Rumelhart, D. E., G. E. Hinton and R. J. Williams (1986). Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, Bradford Books/MIT Press, Cambridge, MA, pp. 318-361.
Sanner, R. M. and J. J. E. Slotine (1992). Gaussian networks for direct adaptive control. IEEE Trans. Neural Networks, 3(6), 837-863.
Söderström, T. (1994). Discrete-time Stochastic Systems: Estimation and Control. Prentice-Hall International, U.K.
Tse, E. and Y. Bar-Shalom (1973). Wide-sense adaptive dual control for nonlinear stochastic systems. IEEE Trans. Automat. Control, AC-18(2), 98-108.
Tse, E. and Y. Bar-Shalom (1976). Actively adaptive control for nonlinear stochastic systems. Proc. IEEE, 64(8), 1172-1181.
Watanabe, K., T. Fukuda and S. Tzafestas (1991). Learning algorithms of layered neural networks via extended Kalman filters. Int. J. Sys. Sci., 22(4), 753-768.
Wenk, C. J. and Y. Bar-Shalom (1980). A multiple model adaptive dual control algorithm for stochastic systems with unknown parameters. IEEE Trans. Automat. Control, AC-25(4), 703-710.
Wittenmark, B. (1975). Stochastic adaptive control methods: a survey. Int. J. Control, 21, 705-730.
Wittenmark, B. (1995). Adaptive dual control methods: an overview. In Preprints of the 5th IFAC Symp. on Adaptive Systems in Control and Signal Processing, IFAC, Hungary, pp. 67-72.
