Pergamon
© 1998
Brief Paper
1. INTRODUCTION

Dual control has been analyzed mainly for adaptive control of linear systems with unknown parameters (Åström and Helmersson, 1986; Chan and Zarrop, 1985; Filatov et al., 1995; Jacobs and Patchell, 1972; Maitelli and Yoneyama, 1994; Pronzato et al., 1996; Wenk and Bar-Shalom, 1980) or for nonlinear systems having known functionals but whose state must be estimated (Tse and Bar-Shalom, 1973, 1976). Because of the advantages associated with it, there has been a recent resurgence of research on dual control (Filatov et al., 1995; Gevers, 1995; Maitelli and Yoneyama, 1994; Pronzato et al., 1996; Wittenmark, 1995). However, none of these addresses the problem when the system is nonlinear and the functions are unknown.
Hence, in this work we investigate the use of dual
adaptive control for the affine class of nonlinear,
discrete-time systems when the nonlinear functions
are unknown and a stochastic additive disturbance
is present at the output. Two types of neural network are considered for modelling the unknown
functions. In Section 2 a brief overview of dual
control is given. Section 3 develops the dual neural
network controller for both cases of Gaussian
radial basis function (RBF) and sigmoidal multilayer perceptron (MLP) networks. Section 4 contains simulation results and this is followed by
a conclusion.
2. DUAL CONTROL
In dual control, the cost function to be minimized takes the form

J_dual = E{ Σ_{i=0}^{N−1} [y(i + 1) − y_r(i + 1)]² | y(0), u(0) },     (1)

where y_r denotes the reference trajectory and the minimization is over admissible control laws.
In principle, this control input can be found by solving the so-called Bellman equation via dynamic programming. However, in most practical situations this is impossible to implement because it involves operations that are highly computation- and memory-intensive (Åström and Wittenmark, 1989). For this reason, most practical adaptive controllers disregard completely the dual features proposed by Feldbaum and are referred to as non-dual controllers. Two such examples are the heuristic certainty equivalence (HCE) and the cautious controllers (Bar-Shalom and Tse, 1974). These controllers often result in an inadequate transient response; the former exhibits large overshoot and the latter a slow response time.
Some of the neural network control schemes proposed in the literature, being of the HCE type, avoid the serious overshoot and stability problems that might arise from neglecting caution by first performing intensive, open-loop, off-line training to identify the plant and reduce the prior uncertainty of the parameters (Narendra and Parthasarathy, 1990; Rovithakis and Christodoulou, 1994; Chen and Khalil, 1995). Then a control and identification phase is started, with the neural network parameters set to these pre-trained values, which are substantially close to the actual values. In our case, this pre-training phase is avoided; parameter uncertainty is taken into consideration and influenced by a control law derived from dual adaptive principles. This is more efficient and economical in practical applications because the off-line training scheme can be time consuming and hence expensive.
3. CONTROLLER DESIGN
3.1. The control objective
The objective is to control the stochastic, single-input single-output, affine nonlinear system of the general form

y(t) = f[x(t − 1)] + g[x(t − 1)] u(t − 1) + e(t),     (2)
If the functions f and g were known exactly, the control law

u(t) = (y_r(t + 1) − f[x(t)]) / g[x(t)]     (3)

results in y(t + 1) − y_r(t + 1) = e(t + 1), which minimizes J_dual because the term in the summation of cost function (1) will then be e²(t + 1), which by assumption is independent of u(t) and Y^t (Åström and Wittenmark, 1989).
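The tracking property of law (3) is easy to verify numerically. In the sketch below, f and g are arbitrary known functions chosen purely for illustration (they are not taken from the paper); applying (3) makes the one-step tracking error equal the disturbance e(t + 1) exactly.

```python
import numpy as np

def control(y_r_next, f_x, g_x):
    """Control law (3): u(t) = (y_r(t+1) - f[x(t)]) / g[x(t)]."""
    return (y_r_next - f_x) / g_x

# arbitrary known functions of a scalar state (illustrative only)
f = lambda x: np.sin(x)
g = lambda x: 2.0 + np.cos(x)       # bounded away from zero

x, y_r_next, e_next = 0.7, 1.0, 0.05
u = control(y_r_next, f(x), g(x))
y_next = f(x) + g(x) * u + e_next   # one step of plant (2)
# tracking error reduces to the disturbance: y(t+1) - y_r(t+1) = e(t+1)
```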
It is interesting to note that the bilinear plant studied by Jacobs and Potter (1978), whose dual optimization was solved numerically, is a special case of the more general nonlinear class (2) considered in this paper.
3.2. The Gaussian RBF neural network controller
We will first develop the design of the suboptimal
dual controller implemented via Gaussian radial
basis function neural networks.
3.2.1. Radial basis function networks. Two Gaussian radial basis function neural networks (Poggio and Girosi, 1990) are used to approximate the nonlinear functions f[x(t − 1)], g[x(t − 1)] within a compact set χ ⊂ ℝⁿ, where the state vector x(t − 1) is known to be contained. χ thus represents the network approximation region. The output of the neural networks is given by
f̂[x(t − 1), ŵ_f(t)] = ŵ_f^T(t) Φ_f[x(t − 1)],
ĝ[x(t − 1), ŵ_g(t)] = ŵ_g^T(t) Φ_g[x(t − 1)],

where ŵ_f, ŵ_g are vectors containing the linear parameters of the neural networks and Φ_f[x(t − 1)], Φ_g[x(t − 1)] are the Gaussian basis function vectors, whose ith elements are given by

φ_fi = exp(−‖x − m_fi‖² / (2σ_f²)),
φ_gi = exp(−‖x − m_gi‖² / (2σ_g²)).
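As an illustration, the network output ŵ^T Φ[x] with Gaussian basis functions can be computed as follows; the 1-D state, the grid of five centres m_i on [−2, 2] and the width σ = 1 are hypothetical choices, not values from the paper.

```python
import numpy as np

def rbf_basis(x, centres, sigma):
    """Gaussian basis vector: phi_i = exp(-||x - m_i||^2 / (2 sigma^2))."""
    d2 = np.sum((centres - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rbf_output(x, w, centres, sigma):
    """Network output: w^T phi(x), linear in the parameters w."""
    return w @ rbf_basis(x, centres, sigma)

# hypothetical 1-D example with 5 centres on [-2, 2]
centres = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
w_f = np.zeros(5)            # linear parameters, to be estimated online
x = np.array([0.5])
print(rbf_output(x, w_f, centres, 1.0))  # 0.0 for zero weights
```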
For estimation purposes the system is written in the state-space form

w*(t + 1) = w*(t),
y(t) = w*^T Φ[x(t − 1)] + e(t),

where w* = [w_f*^T : w_g*^T]^T and Φ[x(t − 1)] = [Φ_f^T[x(t − 1)] : Φ_g^T[x(t − 1)] u(t − 1)]^T.

The optimal parameters requiring estimation appear linearly in the output equation, so that the well-established techniques based on Kalman filtering (Åström, 1970; Jazwinski, 1970) can be used if we assume that the initial optimal parameter vector w*(0) has a Gaussian distribution with mean m and covariance R_0. Note that in practice R_0 can be used to reflect the extent of prior knowledge of the parameters; larger values indicate greater uncertainty, and hence less confidence, in the initial parameter estimate (Ljung, 1979).

Using Kalman filter theory (Åström, 1970; Åström and Wittenmark, 1989) we obtain the following recursive parameter adjustment rules:

ŵ(t + 1) = ŵ(t) + K(t){y(t) − ŵ^T(t) Φ[x(t − 1)]},
K(t) = P(t) Φ[x(t − 1)] / (σ² + Φ^T[x(t − 1)] P(t) Φ[x(t − 1)]),
P(t + 1) = {I − K(t) Φ^T[x(t − 1)]} P(t),     (4)

with P(0) = R_0. Following Milito's innovations dual controller, the suboptimal dual cost function is

J_inn = E{[y(t + 1) − y_r(t + 1)]² + q u²(t) + r ε²(t + 1) | Y^t},     (5)

where ε(t + 1) = y(t + 1) − ŵ^T(t + 1) Φ[x(t)] is the innovation at time t + 1, q ≥ 0 weights the control effort and −1 ≤ r ≤ 0 sets the level of caution.
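The adjustment rules (4) can be sketched in a few lines; the dimensions and the initial values below (prior covariance R_0 = 50 I, noise variance σ² = 0.1, the regressor vector) are hypothetical placeholders, not the paper's settings.

```python
import numpy as np

def kalman_step(w_hat, P, phi, y, sigma2):
    """One step of the parameter update (4):
    K(t)      = P(t) phi / (sigma^2 + phi^T P(t) phi)
    w_hat(t+1)= w_hat(t) + K(t) (y(t) - w_hat(t)^T phi)
    P(t+1)    = (I - K(t) phi^T) P(t)
    """
    denom = sigma2 + phi @ P @ phi
    K = P @ phi / denom
    innovation = y - w_hat @ phi
    w_next = w_hat + K * innovation
    P_next = (np.eye(len(w_hat)) - np.outer(K, phi)) @ P
    return w_next, P_next

# hypothetical initialisation: prior mean 0, covariance R0 = 50 I
n = 4
w_hat, P = np.zeros(n), 50.0 * np.eye(n)
phi = np.array([1.0, 0.5, -0.3, 0.2])      # regressor Phi[x(t-1)]
w_hat, P = kalman_step(w_hat, P, phi, y=1.0, sigma2=0.1)
```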
Given the information available at time t, y(t + 1) is conditionally Gaussian with mean ŵ^T(t + 1) Φ[x(t)] and variance Φ^T[x(t)] P(t + 1) Φ[x(t)] + σ². Minimization of cost (5) with respect to u(t) then yields the suboptimal dual control law

u(t) = {(y_r(t + 1) − f̂[x(t)]) ĝ[x(t)] − (1 + r) v_gf} / {ĝ²[x(t)] + q + (1 + r) v_gg},     (6)

where v_gf = Φ_g^T[x(t)] P_gf(t + 1) Φ_f[x(t)], v_gg = Φ_g^T[x(t)] P_gg(t + 1) Φ_g[x(t)], and P_ff(t + 1), P_fg(t + 1), P_gf(t + 1), P_gg(t + 1) denote the blocks of the covariance matrix P(t + 1) partitioned conformably with [w_f : w_g].
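The dual law (6) is a one-line computation once the estimates and covariance terms are available; the helper below is a direct transcription, where the argument names f_hat, g_hat, v_gf and v_gg mirror the text and the numbers passed in are placeholders.

```python
def dual_control(y_r, f_hat, g_hat, v_gf, v_gg, q, r):
    """Innovations dual control law (6):
    u = ((y_r - f_hat) g_hat - (1+r) v_gf) / (g_hat^2 + q + (1+r) v_gg).
    r = 0 gives the cautious law; r = -1 with q = 0 gives the HCE law."""
    num = (y_r - f_hat) * g_hat - (1.0 + r) * v_gf
    den = g_hat ** 2 + q + (1.0 + r) * v_gg
    return num / den

# with no uncertainty (v_gf = v_gg = 0, q = 0) every r gives the HCE control
u = dual_control(y_r=1.0, f_hat=0.2, g_hat=1.2, v_gf=0.0, v_gg=0.0, q=0.0, r=-0.5)
```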
When r = 0 the law reduces to the cautious controller: full emphasis is given to the uncertainty of the parameter estimates and the controller is very cautious in using them. In fact, very small control signals are applied when the terms v_gf and v_gg are large.
The case r = −1 and q = 0, on the other hand, corresponds to a controller designed on a heuristic certainty equivalence basis. The parameter estimates ŵ(t) are used as if they were the optimal parameters w*, replacing the actual nonlinear system functions in control law (3) with the network approximations and completely disregarding the approximation uncertainty. This often results in excessively high peak overshoot during the transient part of the response: no consideration is given to the fact that the parameters have not yet achieved their optimal values, so quite large control signals are applied, owing to the absence of the uncertainty terms in control law (6).
The case −1 < r < 0 provides a compromise between these two extremes, being neither too cautious (and hence sluggish) nor too bold (and hence crude). This is the motivation behind the design of Milito's innovations suboptimal dual controller, where the level of caution can be varied between zero (non-cautious) and a value which results in a cautious controller.
3.3. The sigmoidal MLP neural network controller
A neural network that is more widely used than
the radial basis function type is the sigmoidal multilayer perceptron network. Unfortunately this neural network does not preserve the advantage of
linearity in the unknown parameters and so its
parameter adjustment rules tend to be more complex than for the RBF case. However, because the
support of its basis functions is not localized, one
typically requires a relatively smaller number of
neurons to achieve similar function approximation
accuracy. This consideration is all the more important for high-dimensional inputs, since RBF networks suffer from the curse of dimensionality: the number of units increases exponentially with the state dimension.
3.3.1. Sigmoidal MLP networks. Two sigmoidal
MLP networks will be used, each having one
hidden layer and one summing output node, to
approximate the unknown functions f[x(t - 1)],
g[x(t - l)]. The outputs of the two neural networks are, respectively, given by
f̂[x(t − 1), ŵ_f(t)] = ĉ_f^T(t) Φ_f[Ŵ_f(t), x(t − 1)],
ĝ[x(t − 1), ŵ_g(t)] = ĉ_g^T(t) Φ_g[Ŵ_g(t), x(t − 1)],     (7)
where Φ_f[x(t − 1)], Φ_g[x(t − 1)] are the sigmoidal activation function vectors, representing the outputs of the nodes in the hidden layer, whose ith elements are given by
φ_fi = 1 / (1 + exp(−ŵ_fi^T(t) x_a(t − 1))),   i = 1, …, n_sf,
φ_gi = 1 / (1 + exp(−ŵ_gi^T(t) x_a(t − 1))),   i = 1, …, n_sg.
3.3.2. Parameter estimation. In this case the model takes the form

y(t) = h(w*, x(t − 1), u(t − 1)) + e(t),     (8)

where

h(w*, x(t − 1), u(t − 1)) := c_f*^T Φ_f[W_f*, x(t − 1)] + c_g*^T Φ_g[W_g*, x(t − 1)] u(t − 1)

is a nonlinear function of the unknown optimal parameters w*. Since the parameters to be estimated do not appear linearly in the system model, nonlinear estimation techniques have to be used. The extended Kalman filter (EKF) (Anderson and Moore, 1979; Jazwinski, 1970) is the most widely used nonlinear estimator and, for our case, it also represents a natural progression from the (linear) Kalman filter used in the RBF network case. The EKF has been applied in system identification using MLP networks (Kimura et al., 1996; Watanabe et al., 1991), where it was shown to give better results than the back-propagation training algorithm (Rumelhart et al., 1986), and also in function estimation with Gaussian RBF networks (Kadirkamanathan …).

The EKF recursive parameter adjustment rules are

ŵ(t + 1) = ŵ(t) + K(t){y(t) − h(ŵ(t), x(t − 1), u(t − 1))},
P(t + 1) = {I − K(t) V_h(t)} P(t),   P(0) = R_0,     (9)

K(t) = P(t) V_h^T(t) / (σ² + V_h(t) P(t) V_h^T(t)),     (10)

where the row vector

V_h(t) := ∂h/∂w |_{w = ŵ(t)} = [V_h,f(t) : V_h,g(t) u(t − 1)],     (11)

and, for the f-network,

V_h,f(t) = [Φ_f^T[x(t − 1)] : … ĉ_fi(t) exp(−ŵ_fi^T(t) x_a) φ_fi² x_a^T … ],

with V_h,g(t) defined analogously.
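A minimal sketch of one EKF step for a model of the form (8); the tiny network sizes (3 and 2 hidden units), the packing of w, and the use of a numerical Jacobian in place of the analytic V_h(t) are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def h(w, x_a, u):
    """Model output c_f^T phi_f + (c_g^T phi_g) u, with w packing
    [c_f, W_f, c_g, W_g] for two one-hidden-layer MLPs (hypothetical sizes)."""
    nf, ng, nx = 3, 2, len(x_a)
    cf = w[:nf]
    Wf = w[nf:nf + nf * nx].reshape(nf, nx)
    off = nf + nf * nx
    cg = w[off:off + ng]
    Wg = w[off + ng:off + ng + ng * nx].reshape(ng, nx)
    return cf @ sigmoid(Wf @ x_a) + (cg @ sigmoid(Wg @ x_a)) * u

def ekf_step(w, P, x_a, u, y, sigma2, eps=1e-6):
    """One EKF update (9)-(10) with a central-difference Jacobian V_h."""
    V = np.array([(h(w + eps * e, x_a, u) - h(w - eps * e, x_a, u)) / (2 * eps)
                  for e in np.eye(len(w))])
    denom = sigma2 + V @ P @ V
    K = P @ V / denom
    w_next = w + K * (y - h(w, x_a, u))
    P_next = (np.eye(len(w)) - np.outer(K, V)) @ P
    return w_next, P_next
```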
Under these approximations, the conditional distribution of y(t + 1) is Gaussian with mean h(ŵ(t + 1), x(t), u(t)) and variance V_h(t + 1) P(t + 1) V_h^T(t + 1) + σ².
3.3.3. The control law. Consider the same cost function as before, based on Milito's innovations dual controller given by equation (5), where in this case the innovations sequence is ε(t + 1) = y(t + 1) − h(ŵ(t + 1), x(t), u(t)).
Proceeding exactly as for the RBF case, and using the approximations on the conditional distribution of y(t + 1) outlined in the previous section, the corresponding suboptimal dual control law is obtained.

4. SIMULATION RESULTS

4.1. Simulation 1

Fig. 1. Tracking error and accumulated cost: (a) HCE; (b) cautious; (c) dual.
The HCE controller initially responds violently, showing large overshoot, because it is not taking into
consideration the inaccuracy of the parameter estimates. Only after the initial period, when the parameters converge, does the control assume good
tracking. On the contrary, the cautious controller is
slow to respond during the initial period, knowing
that the parameter estimates are still inaccurate.
Hence although no violent response is exhibited,
the controller is practically turned off during the
first 2 s. The innovations dual controller reaches
a compromise between these two extremes, clearly
showing no particularly unacceptable peak overshoot whilst tracking the reference input earlier
than the cautious controller. Hence, even qualitatively, it is clear that the performance of the innovations dual controller is the better one. To quantify
the performance, a Monte Carlo analysis involving 500 trials was performed. The accumulated cost V(T) = Σ_{t=1}^{T} (y_r(t) − y(t))² was calculated over the whole simulation interval T at each trial. The results are shown in Fig. 1. The average of the accumulated cost over 500 trials was 1434, 6.7 and 5.7 for the HCE, cautious and dual cases, respectively. Hence, the dual controller shows the best
performance.
To reduce the overshoot of the HCE controller it is tempting to increase the cost function weight q associated with u(t). Although this does reduce overshoot, in some cases it can cause a general deterioration of the tracking capabilities in the steady state, as shown in Fig. 2, where q was set to 1 for the HCE controller and 0.0001 for the other two. The HCE accumulated cost is reduced drastically to around 15, but it is still higher than 6, the order of magnitude of the cautious and dual controllers. The reason is that q tends to limit the amplitude of the control at all times, and not only during those periods when parameter uncertainty is large.
4.2. Simulation 2
The plant of the second simulation is similar to that used in Chen and Khalil (1995), namely

y(t + 1) = f(x(t)) + g(x(t)) u(t) + e(t),

where x(t) = [y(t − 1) y(t)]^T, g(x) = 1.2 and

f(x) = 1.5 y(t) y(t − 1) / (1 + y²(t) + y²(t − 1)) + 0.35 sin(y(t) + y(t − 1))

represent the unknown nonlinear dynamics, and the noise e(t) has variance σ² = 0.05. The reference input is the same as in Simulation 1. An MLP neural controller is tested on this plant, where the f̂ and ĝ networks are structured with 10 and 5 hidden-unit neurons, respectively. The initial parameter estimates are chosen at random and the initial covariance matrix P(0) has a diagonal structure with the terms corresponding to f̂ and ĝ set to 50 and 10, respectively. As before,
trials were conducted using the three different controllers with q set to 0.0001 in all cases. A typical output is shown in Fig. 3. Note that the same comments also apply in this case, with the innovations dual controller performing better. Figure 3 also shows the accumulated cost from the Monte Carlo analysis. The average of the accumulated cost over 100 trials was 500, 48 and 42 for the HCE, cautious and dual cases, respectively. It is clear that the innovations dual controller shows the best performance. A Gaussian RBF controller was also tried on this plant, subjected to noise of variance σ² = 0.01, with similar results, as shown in Fig. 4.
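For reference, the Simulation 2 plant can be reproduced directly from its definition; the sketch below steps it open-loop from an assumed initial state y(0) = y(−1) = 0 (the initial state is not stated in the text).

```python
import numpy as np

def plant_step(y, y_prev, u, rng, sigma2=0.05):
    """One step of the Simulation 2 plant:
    y(t+1) = f(x) + g(x) u(t) + e(t), with
    f(x) = 1.5 y y_prev / (1 + y^2 + y_prev^2) + 0.35 sin(y + y_prev), g(x) = 1.2."""
    f = 1.5 * y * y_prev / (1.0 + y ** 2 + y_prev ** 2) + 0.35 * np.sin(y + y_prev)
    g = 1.2
    return f + g * u + rng.normal(0.0, np.sqrt(sigma2))

# open-loop run with zero input from the assumed zero initial state
rng = np.random.default_rng(0)
y_prev, y = 0.0, 0.0
for _ in range(5):
    y_prev, y = y, plant_step(y, y_prev, 0.0, rng)
```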
5. CONCLUSIONS
The main contribution of the paper is to show
how dual control concepts can be applied to neural
adaptive control of unknown nonlinear systems
that are subjected to stochastic disturbances.
This method has the advantage of improving the transient response of the system, especially during the initial period, when the parameter estimates are still uncertain.
Fig. 2. Effect of q: (a) HCE (q = 1); (b) cautious (q = 0.0001); (c) dual (q = 0.0001).
Fig. 3. Tracking …: (c) dual.
Fig. 4. Tracking …: (c) dual.
REFERENCES

Allison, B. J., J. E. Ciarniello, J. C. Tessier and G. A. Dumont (1995). Dual adaptive control of chip refiner motor load. Automatica, 31(8), 1169-1184.
Anderson, B. D. O. and J. B. Moore (1979). Optimal Filtering. Prentice-Hall, U.S.A.
Åström, K. J. (1970). Introduction to Stochastic Control Theory. Academic Press, New York.
Åström, K. J. and A. Helmersson (1986). Dual control of an integrator with unknown gain. Comput. Math. Applic., 12A(6), 653-662.
Åström, K. J. and B. Wittenmark (1989). Adaptive Control. Addison-Wesley, Reading, MA, U.S.A.
Bar-Shalom, Y. and E. Tse (1974). Dual effect, certainty equivalence and separation in stochastic control. IEEE Trans. Automat. Control, AC-19, 494-500.
Bar-Shalom, Y. and K. D. Wall (1980). Dual adaptive control and uncertainty effects in macroeconomic systems optimization. Automatica, 16, 147-156.
Chan, S. S. and M. B. Zarrop (1985). A suboptimal dual controller for stochastic systems with unknown parameters. Int. J. Control, 41(2), 507-524.
Chen, F. C. (1990). Back-propagation neural networks for nonlinear self-tuning adaptive control. IEEE Control Sys. Mag. (Special issue on Neural Networks for Control Systems), 10(3), 44-48.
Chen, F. C. and H. K. Khalil (1992). Adaptive control of nonlinear systems using neural networks. Int. J. Control, 55, 1299-1317.
Chen, F. C. and H. K. Khalil (1995). Adaptive control of a class of nonlinear discrete-time systems. IEEE Trans. Automat. Control, 40(5), 791-801.
Feldbaum, A. A. (1960). Dual control theory I-II. Automation and Remote Control, 21, 874-880, 1033-1039.
Feldbaum, A. A. (1961). Dual control theory III-IV. Automation and Remote Control, 22, 1-12, 109-121.
Feldbaum, A. A. (1965). Optimal Control Systems. Academic Press, New York.
Filatov, N. M., U. Keuchel and H. Unbehauen (1996). Dual control for an unstable mechanical plant. IEEE Control Sys. Mag., 16(4), 31-37.
Filatov, N. M., H. Unbehauen and U. Keuchel (1995). Dual version of direct adaptive pole placement controller. In …, 133-137.
Rovithakis, G. A. and M. A. Christodoulou (1994). Adaptive control of unknown plants using dynamical neural networks. IEEE Trans. Sys. Man and Cybernetics, 24(3), 400-412.
Söderström, T. (1994). Discrete-time Stochastic Systems: Estimation and Control. Prentice-Hall International, U.K.
Tse, E. and Y. Bar-Shalom (1973). Wide-sense adaptive dual control for nonlinear stochastic systems. IEEE Trans. Automat. Control, AC-18(2), 98-108.