
Applications of Optimal Control Theory

using Artificial Neural Networks


J.M. Martinez (*), C. Barret, M. Houkari,
P. Meyne, M. Dominguez (*)
Laboratoire de Robotique d'Evry, Allée Jean Rostand,
91025 Evry, France
(*) Centre d'Etudes de Saclay, DMT, 91190
Gif sur Yvette, France
e-mail : jmm@soleil.saclay.cea.fr
Abstract : This paper shows the capabilities of neural networks in optimal control applications for non-linear dynamic systems. Our method is based on a classical approach to the direct search of the optimal control using gradient techniques. We show that the neural approach and the backpropagation paradigm are able to solve efficiently the equations expressing the necessary conditions for an optimizing solution. We have taken into account the known capabilities of neural networks in function approximation. And for dynamic systems, we have generalized the indirect learning of the inverse model adaptive architecture, so that it is capable of defining an optimal control with respect to a temporal criterion.

Keywords : neural control theory, adaptive identification, adaptive control, optimal control.
1. Introduction
Neural techniques have already shown their ability in the identification and control of processes [1,2,3,4,5,6]. At first these techniques were introduced for static processes, from direct or indirect learning architectures of the inverse model. Then they were extended to dynamic processes. These approaches are only based on local optimisation, i.e. the effects of the control over time are not really taken into account.
An alternative to this limitation consists in elaborating the control from the minimization of a criterion related to the temporal evolution of the process states. This approach is based on recurrent networks, where the behaviour of the process is analyzed from a sequence of neural network models. To fit dynamical behaviour, one can use Backpropagation Through Time, BTT [12]. Our approach is similar, but here we will use BTT to deal with optimal control learning. We will use a sequence of neural networks to estimate the temporal evolution of the process states in order to define the best control.
Today, classical methods in control theory are sufficiently mature and well formalized. Nevertheless, these methods are not always suitable in real applications concerning optimal control. They may be implemented by coupling them with neural methods. In [8], one emphasizes the «continuity of the research on artificial neural networks with more traditional research», in order to take advantage of the background knowledge of the controlled process.
The definition of optimal control by classical methods requires a good knowledge of the process. An analytical model, in algebraic-differential or recurrent non-linear equations, is necessary. In real processes, models are not sufficiently known. And when they are, it is not possible to use them on line because of temporal constraints. So, in real applications, one usually deals with the identification problem using adaptive linear models, which provide the typical features of the process to classical controllers. But these models are not able to estimate the process behaviour over a long time, and optimal control is not possible.
Compared with classical methods, neural techniques can be distinguished by two characteristics. The first one is the structure of neural models, i.e. non-linear models generalizing the classical linear approach. The second is related to adaptation algorithms like backpropagation, which fit real applications in adaptive and optimal control. We detail these two points below.
In most real cases, processes are non-linear. The neural approach, with its non-linear features, is better suited than the linear approach. Neural models are to be seen as a generalization of linear models: indeed, if the activation functions of the neural units are linear, we are back to linear identification models. In an adaptive scheme, the parameters of the identification models are the synaptic weights. On the other hand, it is known from [9] that a two-layer network with an arbitrarily large number of units in the hidden layer can approximate any continuous function f ∈ C(R^n, R^p) over a compact subset of R^n. And we can add the high degree of parallelism of neural computation, capable of dealing with complex applications using dedicated hardware.
The second point, relative to backpropagation, is more technical. It has been shown in [2, 6, 9] that backpropagation easily provides the jacobian of the neural model, so we can use it as if it were the jacobian of the process. The first idea is to help the operators of the process, in direct or indirect mode, to define the best actions in relation to a given goal, i.e. to answer the requests «What if?» and «What for?»: «What if?» to help process operators estimate perturbations on the process before any decision on the control, and «What for?» to propose to them some variations of the control in order to reach a
desired goal on the process state. The other idea, which we present here, is to use this appropriateness to extend control help towards the definition of a real optimal control.
Section 2 describes the classical direct method to find an optimizing solution. This method gives necessary conditions; it is a well-known method which can be found in [10]. Section 3 presents the resolution of these necessary conditions using neural techniques. This approach is based on the indirect learning of the inverse model to identify the process by a multi-layered network. Section 4 gives our views on this approach, which seems very attractive for real applications. We conclude in Section 5.
2. Optimization Method
We consider non-linear dynamic systems which can be described by (1):

X_{t+1} = F(X_t, U_t),   X ∈ R^p, U ∈ R^q   (1)

where X_t is the state vector and U_t is the control vector at discrete time t. From a given initial state X_0, the problem is to find a sequence of optimal controls U_0, U_1, ..., U_T that minimizes a given cost function defined by equation (2):

C(X_1, ..., X_{T+1}, U_0, U_1, ...) = Σ_{t=1}^{T+1} c_t(X_t, U_t)   (2)
This N-stage optimal control problem, when analytical solutions are not available, can be solved by numerical techniques, computing the gradients of the cost with respect to the control sequence (3):

ΔU_t = -α · ∂C/∂U_t,   t ∈ [0, T]   (3)
To compute the sensitivities of the cost with respect to variations in control space, the direct classical method leads to solving the associated recurrence equations from the final condition (4):

Y_{t-1} = F_{X_t} · Y_t + c_{X_t},   with Y_{T+1} = 0   (4)

The gradients C_{U_t} = ∂C/∂U_t are calculated by (5):

C_{U_t} = F_{U_t} · Y_t + c_{U_t}   (5)
Details of the calculations are given in the Appendix. This scheme needs the process model F (Equation 1) and the associated jacobians F_X (Equation 4) and F_U (Equation 5). We deal with this by using neural identification techniques to provide a neural model, and backpropagation to compute the gradients of the cost function with respect to the control inputs.
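As an illustration only, the following sketch implements the recursion of equations (1)-(5), assuming the model F, its jacobians Fx, Fu and the local cost gradients cx are available as functions, and that the local costs depend only on the state (as in the path-tracking case discussed in Section 3). All names, the column-gradient convention and the fixed step size are our own choices, not the paper's.

```python
import numpy as np

def rollout(F, x0, U):
    """Equation (1): propagate X_{t+1} = F(X_t, U_t) from X_0 along the control sequence U."""
    X = [x0]
    for u in U:
        X.append(F(X[-1], u))
    return X                                        # states X_0 ... X_{T+1}

def cost_gradients(F, Fx, Fu, cx, x0, U):
    """Equations (4)-(5): backward adjoint recursion giving dC/dU_t for t = 0..T,
    for a cost C = sum_{t=1}^{T+1} c_t(X_t) (state-dependent local costs only)."""
    X = rollout(F, x0, U)
    T = len(U) - 1
    grads = [None] * (T + 1)
    Y = cx(X[T + 1], T + 1)                         # Y_T, since Y_{T+1} = 0
    grads[T] = Fu(X[T], U[T]).T @ Y
    for t in range(T, 0, -1):
        Y = Fx(X[t], U[t]).T @ Y + cx(X[t], t)      # Y_{t-1} (transposed-jacobian convention)
        grads[t - 1] = Fu(X[t - 1], U[t - 1]).T @ Y
    return grads

def improve_controls(F, Fx, Fu, cx, x0, U, alpha=0.1, iters=100):
    """Equation (3): gradient steps Delta U_t = -alpha * dC/dU_t on the control sequence."""
    for _ in range(iters):
        grads = cost_gradients(F, Fx, Fu, cx, x0, U)
        U = [u - alpha * g for u, g in zip(U, grads)]
    return U
```

Control-dependent local costs would simply add the term c_{U_t} to each gradient, as in equation (5).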
3. Neural Method

A feedforward network can be seen as a parameterized mapping from inputs to outputs. The vector of parameters (i.e. the synaptic weights) is calculated using backpropagation to minimize a cost function based on the discrepancy between target outputs and network outputs. So we can use this adaptive scheme to deal with the identification of the process. To build a model of the process, we propose for example the classical series-parallel method, as seen in Figure 3.1. Other identification methods can be used (e.g. feedforward or recurrent networks using stochastic or batch gradients) [11].
Here, this method enables us to identify a process described by equation (1). So we suppose that the identification problem is solved by a feedforward neural network. The neural model which identifies the process is a good model for control as long as it gives a good enough mapping from the inputs (state and control) to the outputs (state). Besides, this kind of learning is capable of adapting to possible process drifts if it is kept on line.
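As an illustration of this identification step, the sketch below assumes a one-hidden-layer tanh model Φ(X_t, U_t) ≈ X_{t+1} trained by stochastic gradient descent on the one-step prediction error; the class name, layer sizes and learning rate are our own choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

class NeuralModel:
    """One-hidden-layer tanh network Phi(X_t, U_t) -> X_{t+1} (a sketch)."""
    def __init__(self, p, q, hidden):
        self.W1 = 0.1 * rng.standard_normal((hidden, p + q))
        self.b1 = np.zeros(hidden)
        self.W2 = 0.1 * rng.standard_normal((p, hidden))
        self.b2 = np.zeros(p)

    def forward(self, x, u):
        self.z = np.concatenate([x, u])                 # network input (state, control)
        self.a = np.tanh(self.W1 @ self.z + self.b1)
        return self.W2 @ self.a + self.b2               # predicted next state

    def sgd_step(self, x, u, x_next, lr=1e-2):
        """Series-parallel identification: fit the measured pair (X_t, U_t) -> X_{t+1}."""
        e = self.forward(x, u) - x_next                 # one-step prediction error
        d_pre = (self.W2.T @ e) * (1.0 - self.a ** 2)   # backpropagation through the tanh layer
        self.W2 -= lr * np.outer(e, self.a)
        self.b2 -= lr * e
        self.W1 -= lr * np.outer(d_pre, self.z)
        self.b1 -= lr * d_pre
```

Because the training pairs (X_t, U_t, X_{t+1}) are taken from measurements of the real process, such a model can be kept on line and track slow process drifts.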
Figure 3.1 : Series-Parallel Identification (process model F(X_t, U_t), state X_t, control U_t)
The delay blocks in the figure denote the unit time delay. Backpropagation also provides the derivatives of the network outputs with respect to its inputs. So we are going to use backpropagation to solve equations (4) and (5), in which we substitute the jacobian of the neural model Φ for the jacobian of the real process model F (6 and 7):

Y_{t-1} = Φ_{X_t} · Y_t + c_{X_t}   (6)
C_{U_t} = Φ_{U_t} · Y_t + c_{U_t}   (7)

These equations can be solved using backpropagation through the neural model. We now describe our method. The building blocks of the propagation and backpropagation steps are described respectively in Figures 3.2 and 3.3. The arrows in thick lines represent the result of each step.
Figure 3.2 : Propagation Step
The PROPAGATION step is the classical forward step for multi-layered networks, to which we have added the calculation of the gradients of the local cost function c_t(X_t, U_t). This step defines each state X_{t+1} according to the previous state X_t and the control value U_t. During this step the network Φ_t relates to the process state at discrete time t. So this step must be repeated for t = 1 to T+1 to get the estimated temporal behaviour of the process at discrete times t = 1, 2, ..., T+1. The partial derivatives of the local cost function with respect to state and control, i.e. c_{X_t} = ∂c_t(X_t, U_t)/∂X_t and c_{U_t} = ∂c_t(X_t, U_t)/∂U_t, are also calculated at discrete times t = 0, 1, ..., T.

Figure 3.3 : Backpropagation Step
The BACKPROPAGATION step, as seen in Figure 3.3, performs a classical backpropagation of the adjoint vectors Y_t through the internal states of the neural models Φ_t. The backpropagation step provides the terms Φ_{X_t}·Y_t and the terms Φ_{U_t}·Y_t. To define each adjoint vector Y_t and the sensitivity C_{U_t} of the global cost with respect to the control vector U_t, one must add c_{X_t} and c_{U_t}, which are the sensitivities of the local cost c_t with respect to variations in X_t and U_t respectively. Each of these terms has been calculated during the PROPAGATION step.
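As a sketch of this building block, the function below backpropagates an adjoint vector through a one-hidden-layer tanh model (the same hypothetical form as in the identification sketch above) to obtain the products Φ_X·Y and Φ_U·Y needed in equations (6) and (7); the function name and model form are our assumptions, not the paper's.

```python
import numpy as np

def neural_model_vjp(W1, b1, W2, x, u, y):
    """Backpropagate the adjoint vector y through Phi(x, u) = W2 tanh(W1 [x; u] + b1) + b2.
    Returns the jacobian-transpose products (Phi_X^T y, Phi_U^T y) used in (6) and (7)."""
    z = np.concatenate([x, u])
    a = np.tanh(W1 @ z + b1)
    g = W1.T @ ((W2.T @ y) * (1.0 - a ** 2))   # chain rule through both layers
    return g[:x.size], g[x.size:]
```

The same backward pass that usually carries an output error here carries the adjoint vector Y_t, which is exactly the substitution expressed by equations (6) and (7).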
From this interpretation, one can find a minimum of the global cost function with respect to the sequence of control vectors U_t for t = 0, 1, ..., T. We show in Figure 3.4 the general architecture used to provide the sensitivities C_{U_t}, i.e. to solve the adjoint system. To lighten this figure we have included in each block Φ_t the calculation of the respective local gradients c_{U_t} and c_{X_t}.
In the following figure, two data streams go through each elementary block (Φ_t, c_t). The first one consists in the propagation of the pairs (X_t, U_t), initializing the internal state of each unit. The second stream is relative to backpropagation, seen as an echo occurring at the temporal terminal T+1. There is no hypothesis concerning the depth of this temporal terminal. This echo propagates along the horizontal axis the values of the adjoint vectors Y_t, and along the vertical axis the values C_{U_t} giving the variations that must be applied to the control vector. In this figure, the sequential distribution of the cost function appears along the sequence of blocks Φ_t. This distribution fits the definition of the global cost function as the sum of the local cost functions.
Figure 3.4 : Adjoint System Resolution
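Combining the two previous sketches, the loop below resolves the adjoint system of Figure 3.4 with the neural model in place of F: a forward sweep through the sequence of blocks, followed by the backward echo from T+1. Here `model_step` and `model_vjp` stand for hypothetical forward and backward passes of the identified model (such as those sketched above), and the local costs are assumed to depend only on the state.

```python
def adjoint_resolution(model_step, model_vjp, cx, x0, U):
    """Forward sweep (PROPAGATION) then backward echo (BACKPROPAGATION) through the
    sequence of neural blocks, returning the sensitivities C_{U_t} of equations (6)-(7)."""
    # forward: estimate the trajectory X_1 ... X_{T+1} with the neural model
    X = [x0]
    for u in U:
        X.append(model_step(X[-1], u))

    # backward: echo of the adjoint vectors from the temporal terminal T+1
    T = len(U) - 1
    C_U = [None] * (T + 1)
    Y = cx(X[T + 1], T + 1)                          # Y_T, since Y_{T+1} = 0
    for t in range(T, -1, -1):
        phi_x_y, phi_u_y = model_vjp(X[t], U[t], Y)  # Phi_X^T.Y_t and Phi_U^T.Y_t
        C_U[t] = phi_u_y                             # equation (7), with c_{U_t} = 0 here
        if t > 0:
            Y = phi_x_y + cx(X[t], t)                # equation (6): Y_{t-1}
    return C_U
```

Each C_U[t] can then be used either to update the control sequence directly (equation 3) or, as described next, as an error signal for a neural controller.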
This method requires initial values of the control vectors U_t. To deal with this problem we use the learning capabilities of neural networks. To define the initial conditions, the idea is to build a neural controller that estimates the optimal control in relation to the initial state X_0 and each local cost function c_t. An iterative solution for the learning step of the neural controller consists in channelling the sensitivities C_{U_t} to the neural controller. These sensitivities are seen as errors on the last layer of the neural controller. From these errors, backpropagation adapts the synaptic weights of the neural controller so as to minimize them. And little by little, after several iterations, the neural controller will learn the optimal control in relation to the initial states X_0 and the desired states X_t^d included in the local costs c_t.
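A sketch of this learning loop is given below, reusing the adjoint_resolution function sketched above. The controller is assumed to be any differentiable network mapping (X_0, desired states) to the stacked control sequence, with hypothetical `predict` and `backprop_output_error` methods; the latter applies one weight update from an error vector presented at the output layer.

```python
import numpy as np

def train_controller(controller, model_step, model_vjp, cx, x0, x_desired, iters=1000):
    """Indirect learning of the optimal control: the sensitivities C_{U_t} computed by the
    adjoint architecture are fed back as output-layer errors of the neural controller."""
    for _ in range(iters):
        U = controller.predict(x0, x_desired)            # proposed control sequence U_0 .. U_T
        C_U = adjoint_resolution(model_step, model_vjp, cx, x0, U)
        error = np.concatenate(C_U)                      # dC/dU, seen as an output error
        controller.backprop_output_error(x0, x_desired, error)
    return controller
```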
Figure 3.5 shows the general principle of this method. In general, the number of neural controller inputs depends on the number of desired states and desired control vectors, i.e. on the depth of the temporal window of the cost function. Usually, the main objective is to control a path in the state space, so the cost function only depends on the desired states and the neural controller inputs are X_0, X_1^d, ..., X_{T+1}^d. Similarly, sometimes the main goal is to reach a desired state at discrete time T+1; in this case the neural controller inputs are X_0 and X_{T+1}^d. In this case, and for the particular value T = 0, we recognize the neural architecture which was proposed in [1, 6, 7].
Figure 3.5 : Optimal Control Learning Architecture
4. Discussion
This approach can easily be generalized to processes described by non-linear recurrence relations such as X_{t+1} = F(X_t, X_{t-1}, ..., U_t, U_{t-1}, ...). This representation is certainly better adapted to processes in which delay lines link the state and control vectors; a small sketch of the corresponding model input is given after this paragraph. On the other hand, if there is no access to the state vector, estimation techniques such as Kalman filters or other neural techniques can be used.
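A minimal sketch of this extension, assuming the neural model simply receives a regressor built from delayed states and controls (a NARX-like input; the function name is illustrative):

```python
import numpy as np

def build_regressor(states, controls, t, n_x, n_u):
    """Input vector for a model X_{t+1} = F(X_t, ..., X_{t-n_x}, U_t, ..., U_{t-n_u}):
    concatenate the last n_x+1 states and n_u+1 controls (assumes t >= max(n_x, n_u))."""
    past_x = [states[t - i] for i in range(n_x + 1)]
    past_u = [controls[t - i] for i in range(n_u + 1)]
    return np.concatenate(past_x + past_u)
```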
The usual problems of gradient methods must be solved: the value of the step, the criterion used to stop the iterative procedure, and the convergence to a local minimum. We must also deal with all the problems concerning the numerical stability of the resolution of the adjoint system. Nevertheless, we have applied our method to solve the optimal control problem for a second-order system described by d²x/dt² = u. For the states of the process, we dealt with the position and the variation of the position, using an Euler approximation (10 ms sampling period). The step α of the gradient (Equation 3) was varied between 5 and 200 according to the variations of the cost function. After about 1000 iterations we found the optimal solution, i.e. the bang-bang control law.
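For concreteness, one possible Euler discretization of this double-integrator example is sketched below, under the assumption of a purely quadratic terminal cost driving position and velocity to a desired state; the cost choice, goal and horizon are our illustration, not values specified in the paper.

```python
import numpy as np

DT = 0.01                                             # 10 ms sampling period (Euler approximation)

def F(x, u):
    """Double integrator d^2x/dt^2 = u, state x = [position, velocity], scalar control u."""
    return np.array([x[0] + DT * x[1], x[1] + DT * u[0]])

def Fx(x, u):
    return np.array([[1.0, DT], [0.0, 1.0]])          # dF/dX

def Fu(x, u):
    return np.array([[0.0], [DT]])                    # dF/dU

def cx(x, t, horizon=200, x_goal=np.array([1.0, 0.0])):
    """Gradient of a terminal quadratic cost ||X_{T+1} - x_goal||^2 (an assumption);
    horizon must equal T+1 for the chosen control sequence length."""
    return 2.0 * (x - x_goal) if t == horizon else np.zeros(2)
```

These callables can be plugged into the improve_controls sketch of Section 2, or into the neural pipeline once F has been identified. Recovering the bang-bang law additionally requires bounding the control amplitude, for instance by clipping U_t after each gradient step; the exact cost used in the experiment is not detailed here.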
5. Conclusions
Today it is known that supervised learning is not completely dependent on a teacher [1]. To solve control problems, this kind of learning is used to build a model of the «world» and to rely on this model to give directives to a controller in order to reach a goal. Our work tries to apply this approach to the optimal control of processes, i.e. when a trajectory in state space is desired.
Our approach is a generalization of the neural architectures which were proposed by Jordan and Barto [1, 8]. Indeed, with only one neural model to estimate the state, i.e. with T = 0, we recognize their architectures. When the goal is specified over a long time (T > 0), our method is reminiscent of Widrow's work in [2]. The difference consists in the formalization of the optimal control using classical background methods. We hope to have proved that a sequence of feedforward networks fitted to the process can provide the optimal control. We have shown that backpropagation through this sequence of neural models solves the adjoint system of necessary conditions for an optimizing solution.
6. References
[1] M. I. Jordan, D. E. Rumelhart, Forward Models: Supervised Learning with a Distal Teacher, Cognitive Science, 16, pp. 307-354.
[2] D. Nguyen, B. Widrow, The Truck Backer-Upper, International Neural Network Conference, July 9-13 1990, Paris, France.
[3] K. S. Narendra, K. Parthasarathy, Identification and Control of Dynamical Systems Using Neural Networks, IEEE Trans. on Neural Networks, Vol. 1, No. 1, March 1990.
[4] D. Psaltis, A. Sideris and A. Yamamura, Neural Controllers, IEEE Int. Conf. on Neural Networks, San Diego, 1987.
[5] M. Kawato, Computational Schemes and Neural Network Models for Formation and Control of Multijoint Arm Trajectory, in Neural Networks for Control, edited by W. Thomas Miller, R. Sutton and P. J. Werbos, Bradford Book, 1990.
[6] J. M. Martinez, Ch. Parey, M. Houkari, La rétropropagation sous l'angle de la théorie du contrôle, NEURO-NIMES 91, 4-8 Novembre 1991, Nîmes, France.
[7] A. G. Barto, Connectionist Learning for Control, in Neural Networks for Control, edited by W. Thomas Miller, R. Sutton and P. J. Werbos, Bradford Book, 1990.
[8] K. M. Hornik, M. Stinchcombe, H. White, Multilayer Feedforward Networks are Universal Approximators, UCSD Department of Economics Discussion Paper, June 1988.
[9] Y. LeCun, A Theoretical Framework for Back-Propagation, Connectionist Models Summer School, Morgan Kaufmann Publishers.
[10] R. Boudarel, J. Delmas, P. Guichet, Commande Optimale des Processus, Techniques de l'Automatisme, Dunod, Paris, 1968.
[11] S.-Z. Qin, H.-T. Su, and T. J. McAvoy, Comparison of Four Neural Net Learning Methods for Dynamic System Identification, IEEE Trans. on Neural Networks, Vol. 3, No. 1, Jan. 1992.
[12] P. J. Werbos, Backpropagation Through Time: What It Does and How to Do It, Proc. IEEE, Vol. 78, No. 10, Oct. 1990, pp. 1550-1560.
Appendix
We deal with systems and cost functions which are defined by the following equations:

X_{t+1} = F(X_t, U_t),   X ∈ R^p, U ∈ R^q

C(U_0, ..., U_T) = Σ_{t=1}^{T} c_t(X_t, U_t) + c_{T+1}(X_{T+1})

From the cost function C, considered as a function of the control vectors U_t, we have:

∂C/∂U_t = ∂c_t/∂U_t + Σ_{k=t+1}^{T+1} (∂c_k/∂X_k)(∂X_k/∂X_{t+1})(∂X_{t+1}/∂U_t)

Let us define the adjoint vector by:

Y_t = Σ_{k=t+1}^{T+1} (∂c_k/∂X_k)(∂X_k/∂X_{t+1})

The adjoint vectors Y_t are linked by the following recurrent equations:

Y_{t-1} = Σ_{k=t}^{T+1} (∂c_k/∂X_k)(∂X_k/∂X_t) = ∂c_t/∂X_t + Y_t (∂X_{t+1}/∂X_t)

Using the notations F_{X_t} = ∂F/∂X_t, F_{U_t} = ∂F/∂U_t, c_{X_t} = ∂c_t/∂X_t and c_{U_t} = ∂c_t/∂U_t, we obtain the sensitivities of the global cost C with respect to the control vectors U_t:

Y_{t-1} = c_{X_t} + Y_t · F_{X_t},   with Y_{T+1} = 0

C_{U_t} = c_{U_t} + Y_t · F_{U_t}
