16.323 Lecture 3 (Spr 2008)
Dynamic Programming

• Principle of optimality
• Dynamic programming
• Discrete LQR
Classical Examples

• Shortest path problems (Bryson figure 4.2.1): classic robot navigation and/or aircraft flight path problems.
Example 2
• Basics: J*_gh = 2, and

J*_fh = J_fg + J*_gh = 5

J*_eh = min{J_efgh, J_eh} = min{[J_ef + J*_fh], J_eh} = min{2 + 5, 8} = 7   (best path: e f g h)

• Also

J*_dh = J_de + J*_eh = 10

J*_ch = min{J_cdh, J_cfh} = min{[J_cd + J*_dh], [J_cf + J*_fh]} = min{[5 + 10], [3 + 5]} = 8   (best path: c f g h)
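The backward recursion in this example can be sketched in Python. The edge costs below are the ones used in the worked numbers (J_gh = 2, J_fg = 3, J_ef = 2, direct J_eh = 8, J_de = 3, J_cd = 5, J_cf = 3); the overall graph shape is an assumption consistent with those values.

```python
# Backward DP (principle of optimality) on the small routing example.
from functools import lru_cache

# Edge costs inferred from the worked example; graph shape assumed.
edges = {
    "g": {"h": 2},
    "f": {"g": 3},
    "e": {"f": 2, "h": 8},
    "d": {"e": 3},
    "c": {"d": 5, "f": 3},
    "h": {},
}

@lru_cache(maxsize=None)
def cost_to_go(node):
    """Optimal cost J*(node -> h): best edge cost plus downstream optimum."""
    if node == "h":
        return 0.0
    return min(w + cost_to_go(nxt) for nxt, w in edges[node].items())

# cost_to_go("e") -> 7.0 and cost_to_go("c") -> 8.0, matching the example
```

Each cost-to-go is computed once and reused, which is exactly the saving the principle of optimality buys over enumerating all forward paths.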
• More generally, minimizing over the candidate intermediate states x_i:

J*_h = min_{x_i} { [J_{x_1} + J*_{x_1 h}], [J_{x_2} + J*_{x_2 h}], . . . , [J_{x_n} + J*_{x_n h}] }
• Can apply the same process to more general control problems. Typically have to assume something about the system state (and possible control inputs), e.g., bounded, but also discretized.

• Roadmap:
  - Grid the time/state and quantize the control inputs → dynamic programming
  - Grid the time/state and evaluate the necessary control → dynamic programming
  - Discrete-time problem → discrete LQR
  - Continuous-time problem → calculus of variations → continuous-time LQR
Figure 3.1: Classic picture of discrete time/quantized space grid with the linkages
possible through the control commands. Again, it is hard to evaluate all options
moving forward through the grid, but we can work backwards and use the principle
of optimality to reduce this load.
• Consider the general problem of choosing u(t) to minimize

J = h(x(t_f)) + ∫_{t_0}^{t_f} g(x(t), u(t), t) dt

subject to

ẋ = a(x, u, t)
x(t_0) = fixed
t_f = fixed

• Other constraints on x(t) and u(t) can also be included.

• Step 1 of the solution approach is to develop a grid over space/time.

• Then look at the possible final states x_i(t_f) and evaluate the final costs. For example, can discretize the state into 5 possible cases x_1, . . . , x_5:

J*_i = h(x_i(t_f)),   ∀i
• Consider the scenario where you are at state x^i at time t_k, and apply control u^{ij}_k to move to state x^j at time t_{k+1} = t_k + Δt.

• The approximate cost of this transition is

∫_{t_k}^{t_{k+1}} g(x(t), u(t), t) dt ≈ g(x^i_k, u^{ij}_k, t_k) Δt

• Approximating the dynamics over one step by Euler integration,

x^j_{k+1} ≈ x^i_k + a(x^i_k, u^{ij}_k, t_k) Δt   ⟹   a(x^i_k, u^{ij}_k, t_k) ≈ (x^j_{k+1} - x^i_k) / Δt

• If the dynamics are affine in the control, a(x, u, t) = f(x, t) + q(x, t) u, then can solve for the control that makes the transition:

u^{ij}_k = q(x^i_k, t_k)^{-1} [ (x^j_{k+1} - x^i_k)/Δt - f(x^i_k, t_k) ]

• So for any combination of x^i_k and x^j_{k+1}, can evaluate the cost of making the transition.

• Assuming we already know the optimal path from each new terminal point (x^j_{k+1}), can establish the optimal path to take from x^i_k using

J*(x^i_k, t_k) = min_{x^j_{k+1}} { J(x^i_k, x^j_{k+1}) + J*(x^j_{k+1}) }
Other Considerations
• With bounds on the control, certain state transitions might not be allowed from one time step to the next.

• With constraints on the state, certain values of x(t) might not be allowed at certain times t.
Figure 3.2: At any time tk , have a two dimensional array of grid points.
J*(x^i_k, t_k) = min_{u^{ij}_k} { J(x^i_k, u^{ij}_k) + J*(x^j_{k+1}, t_{k+1}) }

             = min_{u^{ij}_k} { g(x^i_k, u^{ij}_k, t_k) Δt + J*(x^j_{k+1}, t_{k+1}) }
• Do this for all admissible u^{ij}_k and resulting x^j_{k+1}, and then take the minimizing value to determine u*_k.
DP Example
• Simple example: minimize

J = x(t_f)^2 + 2 ∫_{t_0}^{t_f} u(t)^2 dt

with the dynamics and cost discretized for the DP recursion:

(x(t + Δt) - x(t)) / Δt = a x(t) + u(t)   ⟹   x_{k+1} = (1 + aΔt) x_k + u_k Δt

J ≈ x_N^2 + 2 Σ_{k=0}^{N-1} u_k^2 Δt

• For the numbers below, take a = 0 (an integrator ẋ = u), Δt = 1, and t_f = 2 (so N = 2); the state is quantized to x ∈ {0, 0.5, 1, 1.5} and the control is limited to |u| ≤ 1, so that x^j_{k+1} = x^i_k + u_k.
• Terminal cost at each possible final state:

J_2 = h(x_2) = (x_2)^2

x^j_2 :   0     0.5    1     1.5
J_2   :   0     0.25   1     2.25
• Given x_1 and the possible x_2, can evaluate the control effort u(1) = x^j_2 - x^i_1 required to make that transition:

                x^j_2 = 0    0.5     1    1.5
x^i_1 = 0   :       0    0.5     1    1.5
x^i_1 = 0.5 :    -0.5      0   0.5      1
x^i_1 = 1   :      -1   -0.5     0    0.5
x^i_1 = 1.5 :    -1.5     -1  -0.5      0
• The cost of each transition is then J^{ij}_{12} = 2 u(1)^2 Δt, with XX marking moves that violate |u| ≤ 1:

                x^j_2 = 0    0.5     1    1.5
x^i_1 = 0   :       0    0.5     2     XX
x^i_1 = 0.5 :     0.5      0   0.5      2
x^i_1 = 1   :       2    0.5     0    0.5
x^i_1 = 1.5 :      XX      2   0.5      0
• The costs at time t = 1 are then given by J_1 = J^{ij}_{12} + J_2(x^j_2):

                x^j_2 = 0    0.5      1    1.5
x^i_1 = 0   :       0   0.75      3     XX
x^i_1 = 0.5 :     0.5   0.25    1.5   4.25
x^i_1 = 1   :       2   0.75      1   2.75
x^i_1 = 1.5 :      XX   2.25    1.5   2.25

June 18, 2008
• Take the min across each row to determine the best action at each possible x_1:

x^i_1       :    0    0.5     1    1.5
best x^j_2  :    0    0.5   0.5      1
J*_1(x^i_1) :    0   0.25  0.75    1.5
• Repeating one step further back, the costs at time t = 0 are J_0 = J^{ij}_{01} + J*_1(x^j_1):

                x^j_1 = 0    0.5      1    1.5
x^i_0 = 0   :       0   0.75   2.75     XX
x^i_0 = 0.5 :     0.5   0.25   1.25    3.5
x^i_0 = 1   :       2   0.75   0.75      2
x^i_0 = 1.5 :      XX   2.25   1.25    1.5
• And again, taking the min across the rows gives the best actions:

x^i_0       :    0    0.5     1    1.5
best x^j_1  :    0    0.5   0.5      1
J*_0(x^i_0) :    0   0.25  0.75   1.25
• So now we have a complete strategy for how to get from any x^i_0 to the best x_2 to minimize the cost.

• This process can be highly automated, so this clumsy presentation is typically not needed.
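A minimal sketch of that automation in Python, under the assumptions stated above (states quantized to {0, 0.5, 1, 1.5}, dynamics x_{k+1} = x_k + u_k, transition cost 2u^2 with Δt = 1, |u| ≤ 1, terminal cost x^2, N = 2); the recursion reproduces the J*_1 and J*_0 tables:

```python
# Grid DP for the worked example: backward recursion over a quantized state.
import math

states = [0.0, 0.5, 1.0, 1.5]
N = 2

def terminal_cost(x):
    return x * x

def transition_cost(x, x_next):
    u = x_next - x                       # control needed for this move
    return 2 * u * u if abs(u) <= 1 else math.inf   # XX -> infinite cost

# J[k][x] = optimal cost-to-go from state x at time k; policy stores best x_next
J = {N: {x: terminal_cost(x) for x in states}}
policy = {}
for k in range(N - 1, -1, -1):
    J[k], policy[k] = {}, {}
    for x in states:
        costs = {xn: transition_cost(x, xn) + J[k + 1][xn] for xn in states}
        best = min(costs, key=costs.get)
        J[k][x], policy[k][x] = costs[best], best
```

With the states listed in increasing order, ties (such as x_0 = 1, where x_1 = 0.5 and x_1 = 1 both cost 0.75) resolve to the smaller successor, matching the tables above.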
Discrete LQR
• Consider the discrete-time LQR problem: minimize

J = 1/2 x_N^T H x_N + 1/2 Σ_{k=0}^{N-1} [ x_k^T Q_k x_k + u_k^T R_k u_k ]

so that

g_d(x_k, u_k) = 1/2 ( x_k^T Q_k x_k + u_k^T R_k u_k )

subject to the dynamics

x_{k+1} = A_k x_k + B_k u_k

• Assume that H = H^T ≥ 0, Q = Q^T ≥ 0, and R = R^T > 0.

• Including any other constraints greatly complicates the problem.

• Clearly J*_N[x_N] = 1/2 x_N^T H x_N; now need to find J*_{N-1}[x_{N-1}].
J*_{N-1}[x_{N-1}] = min_{u_{N-1}} { g_d(x_{N-1}, u_{N-1}) + J*_N[x_N] }

                 = min_{u_{N-1}} 1/2 [ x_{N-1}^T Q_{N-1} x_{N-1} + u_{N-1}^T R_{N-1} u_{N-1} + x_N^T H x_N ]

• Note that x_N = A_{N-1} x_{N-1} + B_{N-1} u_{N-1}, so that

J*_{N-1}[x_{N-1}] = min_{u_{N-1}} 1/2 [ x_{N-1}^T Q_{N-1} x_{N-1} + u_{N-1}^T R_{N-1} u_{N-1}
                    + (A_{N-1} x_{N-1} + B_{N-1} u_{N-1})^T H (A_{N-1} x_{N-1} + B_{N-1} u_{N-1}) ]
• Taking the derivative with respect to u_{N-1} and setting it to zero gives

[ R_{N-1} + B_{N-1}^T H B_{N-1} ] u_{N-1} + B_{N-1}^T H A_{N-1} x_{N-1} = 0

⟹   u*_{N-1} = -[ R_{N-1} + B_{N-1}^T H B_{N-1} ]^{-1} B_{N-1}^T H A_{N-1} x_{N-1} ≜ -F_{N-1} x_{N-1}

• Furthermore, can show that

∂²J_{N-1}[x_{N-1}] / ∂u_{N-1}² = R_{N-1} + B_{N-1}^T H B_{N-1} > 0

so that the stationary point is a minimum.

• Substituting the optimal control back in,

J*_{N-1}[x_{N-1}] = 1/2 x_{N-1}^T [ Q_{N-1} + F_{N-1}^T R_{N-1} F_{N-1}
                    + {A_{N-1} - B_{N-1} F_{N-1}}^T H {A_{N-1} - B_{N-1} F_{N-1}} ] x_{N-1}
                  ≜ 1/2 x_{N-1}^T P_{N-1} x_{N-1}

• Note that P_N = H, which suggests a convenient form for the gain F:

F_{N-1} = [ R_{N-1} + B_{N-1}^T P_N B_{N-1} ]^{-1} B_{N-1}^T P_N A_{N-1}      (3.20)
• Now can continue using induction: assume that at time k the control will be of the form u*_k = -F_k x_k, where

F_k = [ R_k + B_k^T P_{k+1} B_k ]^{-1} B_k^T P_{k+1} A_k

and J*_k[x_k] = 1/2 x_k^T P_k x_k, where

P_k = Q_k + F_k^T R_k F_k + {A_k - B_k F_k}^T P_{k+1} {A_k - B_k F_k}

• Recall that both equations are solved backwards, from k + 1 to k.

• Now consider time k - 1, with

J*_{k-1}[x_{k-1}] = min_{u_{k-1}} { 1/2 [ x_{k-1}^T Q_{k-1} x_{k-1} + u_{k-1}^T R_{k-1} u_{k-1} ] + J*_k[x_k] }

• Taking the derivative with respect to u_{k-1} gives

∂J_{k-1}[x_{k-1}] / ∂u_{k-1} = u_{k-1}^T R_{k-1} + {A_{k-1} x_{k-1} + B_{k-1} u_{k-1}}^T P_k B_{k-1} = 0

so that the best control input is

u*_{k-1} = -[ R_{k-1} + B_{k-1}^T P_k B_{k-1} ]^{-1} B_{k-1}^T P_k A_{k-1} x_{k-1} = -F_{k-1} x_{k-1}

• Substitute this control into the expression for J_{k-1}[x_{k-1}] to show that

J*_{k-1}[x_{k-1}] = 1/2 x_{k-1}^T P_{k-1} x_{k-1}

and

P_{k-1} = Q_{k-1} + F_{k-1}^T R_{k-1} F_{k-1} + {A_{k-1} - B_{k-1} F_{k-1}}^T P_k {A_{k-1} - B_{k-1} F_{k-1}}

• Thus the same properties hold at times k - 1 and k (and at N - 1 and N in particular), so they hold for all k.
Algorithm
• The discrete LQR recursion, solved backwards from k = N to k = 0:

P_N = H

F_k = [ R_k + B_k^T P_{k+1} B_k ]^{-1} B_k^T P_{k+1} A_k

P_k = Q_k + F_k^T R_k F_k + {A_k - B_k F_k}^T P_{k+1} {A_k - B_k F_k}

• The last two steps can be combined into

P_k = Q_k + A_k^T [ P_{k+1} - P_{k+1} B_k ( R_k + B_k^T P_{k+1} B_k )^{-1} B_k^T P_{k+1} ] A_k

• The initial assumption R_k > 0 ∀k can be relaxed, but we must then ensure that R_{k+1} + B_k^T Q_{k+1} B_k > 0.²

• In the expression

J*(x^i_k, t_k) = min_{u^{ij}_k} { g(x^i_k, u^{ij}_k, t_k) Δt + J*(x^j_{k+1}, t_{k+1}) }

the discrete LQR solution performs this minimization analytically rather than by enumerating the grid.

² Anderson
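The backward recursion above can be sketched for the scalar case in Python (a matrix version would replace the divisions with matrix inverses); the parameter values here match the scalar example used later in these notes, for which the steady-state Riccati solution works out to P_ss = 2:

```python
# Scalar discrete LQR backward recursion (Joseph-form P update).
def dlqr_backward(A, B, Q, R, H, N):
    """Return cost-to-go list P[0..N] and gains F[0..N-1]."""
    P = [0.0] * (N + 1)
    F = [0.0] * N
    P[N] = H                                   # terminal condition P_N = H
    for k in range(N - 1, -1, -1):
        F[k] = B * P[k + 1] * A / (R + B * P[k + 1] * B)
        Acl = A - B * F[k]                     # closed-loop dynamics
        # P_k = Q + F^T R F + (A - B F)^T P_{k+1} (A - B F)
        P[k] = Q + F[k] * R * F[k] + Acl * P[k + 1] * Acl
    return P, F

# A=1, B=1, Q=1, R=2, H=0.25: the ARE gives Pss^2 - Pss - 2 = 0, so Pss = 2
P, F = dlqr_backward(1.0, 1.0, 1.0, 2.0, 0.25, 50)
```

Running the recursion for a long horizon shows P[0] converging to the steady-state value, illustrating the behavior discussed in the Steady State section.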
Suboptimal Control
• The optimal initial cost is J*_0[x_0] = 1/2 x_0^T P_0 x_0. One question: how would the cost of a different controller strategy compare? Consider the linear feedback

u_k = -G_k x_k

• Can substitute this controller into the cost function

J = 1/2 x_N^T H x_N + 1/2 Σ_{k=0}^{N-1} [ x_k^T Q_k x_k + u_k^T R_k u_k ]

and compute

J_G = 1/2 x_N^T H x_N + 1/2 Σ_{k=0}^{N-1} x_k^T [ Q_k + G_k^T R_k G_k ] x_k

where

x_{k+1} = A_k x_k + B_k u_k = (A_k - B_k G_k) x_k

• Note that, for any sequence of matrices S_k,

1/2 Σ_{k=0}^{N-1} ( x_{k+1}^T S_{k+1} x_{k+1} - x_k^T S_k x_k ) = 1/2 ( x_N^T S_N x_N - x_0^T S_0 x_0 )

Adding this telescoping sum to J_G and subtracting its right-hand side gives

J_G = 1/2 x_0^T S_0 x_0 + 1/2 x_N^T ( H - S_N ) x_N
      + 1/2 Σ_{k=0}^{N-1} x_k^T [ Q_k + G_k^T R_k G_k - S_k + (A_k - B_k G_k)^T S_{k+1} (A_k - B_k G_k) ] x_k

so choosing S_N = H and the backward recursion

S_k = Q_k + G_k^T R_k G_k + (A_k - B_k G_k)^T S_{k+1} (A_k - B_k G_k)

leaves simply J_G = 1/2 x_0^T S_0 x_0.
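The S recursion can be checked numerically against the optimal P recursion. A scalar sketch using the example values from these notes (A = 1, B = 1, Q = 1, R = 2, H = 0.25, N = 10), with G = 0.25 as an arbitrary fixed suboptimal gain:

```python
# Compare suboptimal cost S_0 (fixed gain G) against optimal cost P_0.
A, B, Q, R, H, N = 1.0, 1.0, 1.0, 2.0, 0.25, 10
G = 0.25                                  # arbitrary fixed suboptimal gain

P = [0.0] * (N + 1); P[N] = H             # optimal cost-to-go recursion
S = [0.0] * (N + 1); S[N] = H             # suboptimal cost recursion
for k in range(N - 1, -1, -1):
    F = B * P[k + 1] * A / (R + B * P[k + 1] * B)   # optimal gain at step k
    P[k] = Q + F * R * F + (A - B * F) * P[k + 1] * (A - B * F)
    S[k] = Q + G * R * G + (A - B * G) * S[k + 1] * (A - B * G)

# S[0] >= P[0]: the fixed gain can only do as well as or worse than optimal
```

Since J_G = (1/2) S_0 x_0^2 and J*_0 = (1/2) P_0 x_0^2, the gap S_0 - P_0 directly measures the penalty for using the fixed gain.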
Steady State
• Assume:³
  - Time-invariant problem (LTI), i.e., A, B, Q, R are constant.
  - System [A, B] stabilizable: uncontrollable modes are stable.

• For any H, then as N → ∞, the recursion for P tends to a constant solution P_ss ≥ 0 that is bounded and satisfies (set P_k = P_{k+1} = P_ss)

P_ss = Q + A^T [ P_ss - P_ss B ( R + B^T P_ss B )^{-1} B^T P_ss ] A      (3.21)

  - Discrete form of the famous algebraic Riccati equation.
  - Typically hard to solve analytically, but easy to solve numerically.
  - Can be many PSD solutions of (3.21); the recursive solution will be one of them.

• Let Q = C^T C ≥ 0, which is equivalent to having cost measurements z = Cx and state penalty z^T z = x^T C^T C x = x^T Q x. If [A, C] is detectable, then:
  - Independent of H, the recursion for P has a unique steady-state solution P_ss ≥ 0 that is the unique PSD solution of (3.21).
  - The associated steady-state gain is

F_ss = ( R + B^T P_ss B )^{-1} B^T P_ss A

and using F_ss, the closed-loop system x_{k+1} = (A - B F_ss) x_k is asymptotically stable, i.e., |λ_i(A - B F_ss)| < 1.
  - Detectability is required to ensure that all unstable modes are penalized in the state cost.

• If, in addition, [A, C] is observable⁴, then there is a unique P_ss > 0.

³ See Lewis and Syrmos, Optimal Control, Thm 2.4-2, and Kwakernaak and Sivan, Linear Optimal Control Systems, Thm 6.31.
⁴ Guaranteed if Q > 0.
• Simple scalar example: an integrator ẋ = u discretized with Δt = 1, so that A = 1, B = Δt = 1, and

J = 1/4 x(N)^2 + 1/2 Σ_{k=0}^{N-1} [ x_k^2 + u_k^2 ]

• The plot shows the discrete LQR results: it is clear that P_k settles to a constant value very quickly.
  - The rate of reaching steady state depends on Q/R: for Q/R large, it reaches steady state quickly.
  - A (very) suboptimal F gives an obviously worse cost.
Gain Insights
• Recall the gain

F_{N-1} = [ R_{N-1} + B_{N-1}^T P_N B_{N-1} ]^{-1} B_{N-1}^T P_N A_{N-1}

which for the scalar case reduces to

F_{N-1} = B_{N-1} P_N A_{N-1} / ( R_{N-1} + B_{N-1}^2 P_N )

• If R_{N-1} ≪ B_{N-1}^2 P_N (control much cheaper than the terminal state penalty), then

F_{N-1} ≈ B_{N-1} P_N A_{N-1} / ( B_{N-1}^2 P_N ) = A/B

and

x_N = (A - BF) x_{N-1} = (A - B (A/B)) x_{N-1} = 0

regardless of the value of x_{N-1}. This is a nilpotent (deadbeat) controller.

• Similarly, if P_N is very large, then F_{N-1} ≈ B_{N-1} P_N A_{N-1} / ( B_{N-1}^2 P_N ) = A/B, and x_N = 0 as well.
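A quick numeric check of this limit, with arbitrary illustration values chosen for A, B, and P_N:

```python
# Scalar gain F = B*P*A / (R + B^2*P): as R -> 0, F -> A/B and the
# closed-loop map (A - B*F) -> 0, i.e., a deadbeat final step.
A, B, P = 0.9, 2.0, 5.0          # arbitrary illustration values

def gain(R):
    """Scalar one-step LQR gain for cost-to-go P."""
    return B * P * A / (R + B * B * P)

deadbeat = gain(0.0)             # equals A/B = 0.45 exactly
residual = A - B * gain(0.1)     # small but nonzero closed-loop pole
```

As R grows, the gain shrinks and the closed-loop pole moves back toward the open-loop A, recovering the expected cheap-control versus expensive-control trade-off.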
% Discrete LQR example -- Jonathan How
clear all
close all
%A=1;B=1;Q=1;R=1;H=0.5;N=5;
A=1;B=1;Q=1;R=2;H=.25;N=10;

% optimal backward recursion; terminal condition P_N = H (stored at index N+1)
P(N+1)=H;
for j=N-1:-1:0
    i=j+1;                          % shift to 1-based MATLAB indexing
    F(i)=inv(R+B*P(i+1)*B)*B*P(i+1)*A;
    P(i)=(A-B*F(i))*P(i+1)*(A-B*F(i))+F(i)*R*F(i)+Q;
end

% suboptimal cost recursion with constant gain G = F(1)
S(N+1)=H;
for j=N-1:-1:0
    i=j+1;
    G(i)=F(1);
    S(i)=(A-B*G(i))*S(i+1)*(A-B*G(i))+G(i)*R*G(i)+Q;
end

time=[0:1:N];
figure(1);clf
plot(time,P,'ks','MarkerSize',12,'MarkerFaceColor','k')
hold on
plot(time,S,'rd','MarkerSize',12,'MarkerFaceColor','r')
plot(time(1:N),F,'bo','MarkerSize',12,'MarkerFaceColor','b')
hold off
xlabel('Time')
ylabel('P/S/F')
text(2,1,['S(0)-P(0) = ',num2str(S(1)-P(1))])
axis([-.1 N -1 max(max(P),max(S))+.5])

% suboptimal cost recursion with constant gain G = 0.25
S(N+1)=H;
for j=N-1:-1:0
    i=j+1;
    G(i)=.25;
    S(i)=(A-B*G(i))*S(i+1)*(A-B*G(i))+G(i)*R*G(i)+Q;
end

figure(2)
%plot(time,P,'ks',time,S,'rd',time(1:N),F,'bo','MarkerSize',12)
plot(time,P,'ks','MarkerSize',12,'MarkerFaceColor','k')
hold on
plot(time,S,'rd','MarkerSize',12,'MarkerFaceColor','r')
plot(time(1:N),F,'bo','MarkerSize',12,'MarkerFaceColor','b')
hold off
text(2,1,['S(0)-P(0) = ',num2str(S(1)-P(1))])
axis([-.1 N -1 max(max(P),max(S))+.5])
ylabel('P/S/F')
xlabel('Time')

% state response
x0=1;xo=x0;xs1=x0;xs2=x0;
for j=0:N-1;
    k=j+1;
    xo(k+1)=(A-B*F(k))*xo(k);
    xs1(k+1)=(A-B*F(1))*xs1(k);
    xs2(k+1)=(A-B*G(1))*xs2(k);
end
figure(3)
plot(time,xo,'bo','MarkerSize',12,'MarkerFaceColor','b')
hold on
plot(time,xs1,'ks','MarkerSize',9,'MarkerFaceColor','k')
plot(time,xs2,'rd','MarkerSize',12,'MarkerFaceColor','r')
hold off
legend('Optimal','Suboptimal with G=F(0)','Suboptimal with G=0.25','Location','North')
%axis([-.1 5 -1 3])
ylabel('x(t)')
xlabel('Time')
print -dpng -r300 integ3.png;
Appendix
• Def: An LTI system is controllable if, for every x* and every finite T > 0, there exists an input function u(t), 0 < t ≤ T, such that the system state goes from x(0) = 0 to x(T) = x*.
  - Starting at 0 is not a special case: if we can get to any state in finite time from the origin, then we can get from any initial condition to that state in finite time as well.⁵

• Thm: An LTI system is controllable iff it has no uncontrollable states.
  - The necessary and sufficient condition for controllability is that

rank M_c ≜ rank [ B  AB  A^2 B  · · ·  A^{n-1} B ] = n

• Similarly, the necessary and sufficient condition for observability is that

rank M_o ≜ rank [ C ; CA ; CA^2 ; · · · ; CA^{n-1} ] = n

⁵ This
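The rank test can be sketched in Python using exact rational arithmetic, which avoids floating-point ambiguity in the rank computation (the helper names here are hypothetical, not from any library):

```python
# Controllability test: build Mc = [B AB ... A^{n-1}B] and check rank(Mc) = n.
from fractions import Fraction

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Gauss-Jordan elimination on a copy of M; count the pivot rows."""
    M = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def controllable(A, B):
    n = len(A)
    blocks, AkB = [], B
    for _ in range(n):                 # collect B, AB, ..., A^{n-1}B
        blocks.append(AkB)
        AkB = mat_mul(A, AkB)
    Mc = [sum((blk[i] for blk in blocks), []) for i in range(n)]
    return rank(Mc) == n

# Double integrator (controllable) vs. a decoupled unreachable mode
A1, B1 = [[1, 1], [0, 1]], [[0], [1]]
A2, B2 = [[1, 0], [0, 2]], [[1], [0]]
```

The observability test is the dual: stack C, CA, ..., CA^{n-1} by rows and check that the stacked matrix has rank n.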