(Table: the final stage values W0, u0 and V0(x0) with the optimal decision u∗0(x0) for x0 = −1, 0, 1.)
✷ It should be noticed that the state space and the decision space in the previous examples are discrete. If the state space is continuous, the method applied in the examples can still be used as an approximation if we introduce a discrete grid covering the relevant part of the state space.
6.2 Continuous Dynamic Programming
Consider the problem of finding the input function u_t, t ∈ [0, T], that takes the system

\[
\dot{x}_t = f_t(x_t, u_t) \qquad x_0 = x^0 \qquad t \in [0, T]
\tag{6.26}
\]

from its initial state along a trajectory such that the cost function

\[
J = \phi_T(x_T) + \int_0^T L_t(x_t, u_t)\, dt
\tag{6.27}
\]

is minimized. As in the discrete time case we can define the truncated performance index
\[
J_t(x_t, u_t^T) = \phi_T(x_T) + \int_t^T L_s(x_s, u_s)\, ds
\]

which depends on the point of truncation, t, on the state, x_t, and on the whole input function, u_t^T, in the interval from t to the end point, T. The optimal performance index, i.e. the Bellman function, is defined by

\[
V_t(x_t) = \min_{u_t^T} J_t(x_t, u_t^T)
\]
We have the following theorem, which states a sufficient condition.
Theorem 15: Consider the free dynamic optimization problem specified by (6.26) and (6.27). The optimal performance index, i.e. the Bellman function V_t(x_t), satisfies the equation

\[
-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right]
\tag{6.28}
\]

with the boundary condition

\[
V_T(x_T) = \phi_T(x_T)
\tag{6.29}
\]

✷
Equation (6.28) is often denoted the HJB (Hamilton-Jacobi-Bellman) equation.
Proof: In discrete time we have the Bellman equation

\[
V_i(x_i) = \min_{u_i} \Big[ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) \Big]
\]

with the boundary condition

\[
V_N(x_N) = \phi_N(x_N)
\tag{6.30}
\]
Figure 6.8. The continuous time axis.
In continuous time i corresponds to t and i + 1 to t + ∆t, respectively. Then

\[
V_t(x_t) = \min_{u} \left[ \int_t^{t+\Delta t} L_s(x_s, u_s)\, ds + V_{t+\Delta t}(x_{t+\Delta t}) \right]
\]

If we apply a Taylor expansion on V_{t+∆t}(x_{t+∆t}) and on the integral we have

\[
V_t(x_t) \simeq \min_{u} \left[ L_t(x_t, u_t)\,\Delta t + V_t(x_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t)\,\Delta t + \frac{\partial V_t(x_t)}{\partial t}\,\Delta t \right]
\]

Finally, we can collect the terms which do not depend on the decision:

\[
V_t(x_t) = V_t(x_t) + \frac{\partial V_t(x_t)}{\partial t}\,\Delta t + \min_{u} \left[ L_t(x_t, u_t)\,\Delta t + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t)\,\Delta t \right]
\]
After cancelling V_t(x_t) on both sides and dividing by ∆t, the limit ∆t → 0 gives (6.28). The boundary condition, (6.29), comes directly from (6.30). ✷
Notice that if we have a maximization problem, the minimization in (6.28) is substituted with a maximization.
If the definition of the Hamiltonian function

\[
H_t = L_t(x_t, u_t) + \lambda_t^T f_t(x_t, u_t)
\]

is used, then the HJB equation can also be formulated as

\[
-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} H_t\!\left(x_t, u_t, \frac{\partial V_t(x_t)}{\partial x}\right)
\]

i.e. with the costate \lambda_t^T identified with \partial V_t(x_t)/\partial x.
Example: 6.2.1 (Motion control). The purpose of this simple example is to illustrate the application of the HJB equation. Consider the system

\[
\dot{x}_t = u_t \qquad x_0 = x^0
\]

and the performance index

\[
J = \frac{1}{2}\, p\, x_T^2 + \int_0^T \frac{1}{2}\, u_t^2\, dt
\]

The HJB equation, (6.28), gives:

\[
-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ \frac{1}{2}\, u_t^2 + \frac{\partial V_t(x_t)}{\partial x}\, u_t \right]
\]

The minimization can be carried out and gives a solution w.r.t. u_t which is

\[
u_t = -\frac{\partial V_t(x_t)}{\partial x}
\tag{6.31}
\]

So if the Bellman function is known, the control action or the decision can be determined from this. If the result above is inserted in the HJB equation we get

\[
-\frac{\partial V_t(x_t)}{\partial t} = \frac{1}{2}\left(\frac{\partial V_t(x_t)}{\partial x}\right)^2 - \left(\frac{\partial V_t(x_t)}{\partial x}\right)^2 = -\frac{1}{2}\left(\frac{\partial V_t(x_t)}{\partial x}\right)^2
\]

which is a partial differential equation with the boundary condition

\[
V_T(x_T) = \frac{1}{2}\, p\, x_T^2
\]

Inspired by the boundary condition we guess on a candidate function of the type

\[
V_t(x) = \frac{1}{2}\, s_t\, x^2
\]
where the explicit time dependence is in the time function, s_t. Since

\[
\frac{\partial V}{\partial t} = \frac{1}{2}\,\dot{s}_t\, x^2 \qquad \frac{\partial V}{\partial x} = s_t\, x
\]

the partial differential equation becomes

\[
-\frac{1}{2}\,\dot{s}_t\, x^2 = -\frac{1}{2}\, s_t^2\, x^2
\]

which must hold for any x. Hence s_t must satisfy the ordinary differential equation

\[
\dot{s}_t = s_t^2 \qquad s_T = p
\]

which is solved backwards. This is actually (a simple version of) the continuous time Riccati equation. The solution can be found analytically or by means of numerical methods.
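For this example the Riccati equation is the scalar ODE ṡ_t = s_t² with terminal condition s_T = p, which has the analytic solution s_t = p/(1 + p(T − t)). As a minimal numerical sketch (plain Python; the values p = 1 and T = 1 are arbitrary choices, not taken from the text), it can be integrated backwards with an explicit Euler scheme and compared with the analytic solution:

```python
# Backward Euler integration of the scalar Riccati equation
#   ds/dt = s^2,  s_T = p,
# stepped backwards from t = T to t = 0 and compared with the
# analytic solution s_t = p / (1 + p (T - t)).
# The horizon T, weight p and step count n are arbitrary choices.

def riccati_backwards(p=1.0, T=1.0, n=10000):
    dt = T / n
    s = p                      # terminal condition s_T = p
    for _ in range(n):         # step backwards: s(t - dt) = s(t) - dt * s(t)^2
        s = s - dt * s * s
    return s                   # approximation of s_0

def riccati_exact(t, p=1.0, T=1.0):
    return p / (1.0 + p * (T - t))

print(riccati_backwards(), riccati_exact(0.0))   # both close to 0.5
```

With s_t known, the optimal feedback from (6.31) is u_t = −s_t x_t.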
Knowing the function, s_t, we can find the control input from (6.31). ✷

It is possible to find the (continuous time) Euler-Lagrange equations from the HJB equation.
Appendix A
Quadratic forms
In this section we will give a short resume of the concepts and results related to positive definite matrices. If z ∈ R^n is a vector, then the squared Euclidean norm obeys:

\[
J = \|z\|^2 = z^T z \ge 0
\tag{A.1}
\]

If A is a nonsingular matrix, then the vector Az has the squared norm z^T A^T A z ≥ 0. Let Q = A^T A; then

\[
\|z\|_Q^2 = z^T Q z \ge 0
\tag{A.2}
\]

and ‖z‖_Q is denoted the norm of z w.r.t. Q.
Now, let S be a square matrix. We are interested in the sign of the quadratic form:

\[
J = z^T S z
\tag{A.3}
\]

where J is a scalar. Any square matrix can be decomposed into a symmetric part, S_s, and an antisymmetric part, S_a, i.e.

\[
S = S_s + S_a \qquad S_s = \frac{1}{2}\,(S + S^T) \qquad S_a = \frac{1}{2}\,(S - S^T)
\tag{A.4}
\]

Notice:

\[
S_s^T = S_s \qquad S_a^T = -S_a
\tag{A.5}
\]

Since the scalar z^T S_a z fulfills:

\[
z^T S_a z = (z^T S_a z)^T = z^T S_a^T z = -z^T S_a z
\tag{A.6}
\]

it is true that z^T S_a z = 0 for any z ∈ R^n, and consequently:

\[
J = z^T S z = z^T S_s z
\tag{A.7}
\]
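The cancellation of the antisymmetric part is easy to check numerically. The following sketch (plain Python; the matrix S and vector z are arbitrary example values) verifies (A.5)-(A.7):

```python
# Verify that z^T S z = z^T S_s z, where S_s = (S + S^T)/2 is the
# symmetric part of S and S_a = (S - S^T)/2 the antisymmetric part.
# S and z below are arbitrary example values.
def quad(M, z):
    n = len(z)
    return sum(z[i] * M[i][j] * z[j] for i in range(n) for j in range(n))

S  = [[1.0, 2.0],
      [0.0, 3.0]]                     # a non-symmetric square matrix
Ss = [[(S[i][j] + S[j][i]) / 2 for j in range(2)] for i in range(2)]
Sa = [[(S[i][j] - S[j][i]) / 2 for j in range(2)] for i in range(2)]

z = [0.7, -1.3]
print(quad(S, z), quad(Ss, z))        # equal
print(quad(Sa, z))                    # zero, as (A.6) predicts
```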
An analysis of the sign variation of J can then be carried out as an analysis of S_s, which (as a symmetric matrix) can be diagonalized.
A matrix S is said to be positive definite if (and only if) z^T S z > 0 for all z ∈ R^n with z^T z > 0. Consequently, S is positive definite if (and only if) all eigenvalues are positive. A matrix, S, is positive semidefinite if z^T S z ≥ 0 for all z ∈ R^n with z^T z > 0. This can be checked by investigating whether all eigenvalues are non-negative. A similar definition exists for negative definite matrices. If J can take both negative and positive values, then S is denoted indefinite. In that case the symmetric part of the matrix has both negative and positive eigenvalues.
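For a symmetric 2×2 matrix the eigenvalues are available in closed form, so the eigenvalue-based classification can be sketched directly (plain Python; the helper names and the example matrices are our own, not from the text):

```python
# Classify a symmetric 2x2 matrix by the signs of its eigenvalues.
# For S = [[a, b], [b, c]] the eigenvalues are the roots of
#   lam^2 - (a + c) lam + (a c - b^2) = 0.
import math

def eig2(a, b, c):
    tr, det = a + c, a * c - b * b
    r = math.sqrt(tr * tr / 4 - det)   # real for symmetric matrices
    return tr / 2 - r, tr / 2 + r      # (smallest, largest)

def classify(a, b, c):
    lo, hi = eig2(a, b, c)
    if lo > 0:  return "positive definite"
    if lo >= 0: return "positive semidefinite"
    if hi < 0:  return "negative definite"
    if hi <= 0: return "negative semidefinite"
    return "indefinite"

print(classify(1, 0, 4))    # positive definite
print(classify(1, 2, 1))    # indefinite (eigenvalues -1 and 3)
```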
We will now examine the situation in a number of examples.
Example: A.0.1 In the following we will consider some two dimensional problems. Let us start with a simple problem in which:

\[
H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\tag{A.8}
\]

In this case we have

\[
J = z_1^2 + z_2^2
\tag{A.9}
\]

and the levels (domains in which the loss function J is equal to c^2) are easily recognized as circles with center in the origin and radius equal to c. See Figure A.1, area a and surface a. ✷
Example: A.0.2 Let us continue the sequence of two dimensional problems, in which:

\[
H = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}
\tag{A.10}
\]

The explicit form of the loss function J is:

\[
J = z_1^2 + 4\, z_2^2
\tag{A.11}
\]

and the levels (with J = c^2) are ellipses with principal axes parallel to the coordinate axes and semi-axis lengths equal to c and c/2, respectively. See Figure A.1, area b and surface b. ✷

Example: A.0.3 Let us now continue with a little more advanced problem with:

\[
H = \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix}
\tag{A.12}
\]

In this case the situation is a little more difficult. If we perform an eigenvalue analysis of the symmetric part of H (which is H itself due to the fact that H is symmetric), then we will find that:
\[
H = V D V^T \qquad D = \begin{bmatrix} 0.7 & 0 \\ 0 & 4.3 \end{bmatrix}
\tag{A.13}
\]

which means the columns of V are the eigenvectors and the diagonal elements of D are the eigenvalues. Since the eigenvectors form an orthogonal basis, V^T V = I, we can choose to represent z in this coordinate system. Let ξ be the coordinates with respect to the columns of V; then

\[
z = V \xi
\tag{A.14}
\]

and

\[
J = z^T H z = \xi^T V^T H V \xi = \xi^T D \xi
\tag{A.15}
\]

We are hereby able to write the loss function as:

\[
J = 0.7\, \xi_1^2 + 4.3\, \xi_2^2
\tag{A.16}
\]

Notice the eigenvalues 0.7 and 4.3. The levels (J = c^2) are recognized as ellipses with center in the origin and principal directions parallel to the eigenvectors. The lengths of the principal semi-axes are c/√0.7 and c/√4.3, respectively. See Figure A.2, area c and surface c.
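The change of coordinates z = Vξ can be checked numerically. The sketch below (plain Python) assumes H = [[1, 1], [1, 4]], a symmetric matrix whose exact eigenvalues (5 ± √13)/2 ≈ 0.697, 4.303 round to the stated 0.7 and 4.3; these entries are an assumption reconstructed from those eigenvalues:

```python
import math

# Numeric check of z = V xi and J = xi^T D xi for a symmetric 2x2 matrix.
# NOTE: H = [[1, 1], [1, 4]] is an assumed reconstruction; its eigenvalues
# (5 +/- sqrt(13))/2 = 0.697..., 4.302... round to the stated 0.7 and 4.3.
a, b, c = 1.0, 1.0, 4.0                      # H = [[a, b], [b, c]]
tr, det = a + c, a * c - b * b
r = math.sqrt(tr * tr / 4 - det)
lam1, lam2 = tr / 2 - r, tr / 2 + r          # eigenvalues, smallest first

def unit(v):
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

v1 = unit((b, lam1 - a))                     # eigenvector for lam1
v2 = unit((b, lam2 - a))                     # eigenvector for lam2

def J(z1, z2):                               # loss in the z-coordinates
    return a * z1 * z1 + 2 * b * z1 * z2 + c * z2 * z2

xi1, xi2 = 0.8, -0.4                         # arbitrary xi-coordinates
z1 = v1[0] * xi1 + v2[0] * xi2               # z = V xi
z2 = v1[1] * xi1 + v2[1] * xi2
print(J(z1, z2), lam1 * xi1**2 + lam2 * xi2**2)   # the two agree
```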
✷

To simplify the calculus in what follows, we are going to consider special versions of quadratic forms.
Lemma 3: The quadratic form

\[
J = [Ax + Bu]^T S\, [Ax + Bu]
\]

can be expressed as

\[
J = \begin{bmatrix} x \\ u \end{bmatrix}^T
\begin{bmatrix} A^T S A & A^T S B \\ B^T S A & B^T S B \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix}
\]

Proof: The proof is simply to express the loss function J as

\[
J = z^T S z \qquad z = Ax + Bu = \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} x \\ u \end{bmatrix}
\]

and multiply out. ✷
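A scalar (1-D) instance of the lemma can be checked directly (plain Python; all values are arbitrary examples, with A, B, S, x, u scalars so the blocks of the matrix are products of scalars):

```python
# Scalar (1-D) check of Lemma 3: with all quantities scalars,
#   J = S * (A*x + B*u)^2
# equals [x u] [[A*S*A, A*S*B], [B*S*A, B*S*B]] [x; u].
# The numbers are arbitrary example values.
A, B, S = 2.0, -1.0, 3.0
x, u = 0.5, 1.5

J_direct = S * (A * x + B * u) ** 2

M = [[A * S * A, A * S * B],
     [B * S * A, B * S * B]]
v = [x, u]
J_block = sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))
print(J_direct, J_block)   # equal
```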
We have now studied properties of quadratic forms and a single lemma (3). Let us now focus on the problem of finding a minimum (or, similarly, a maximum) in a quadratic form.
Lemma 4: Assume u is a vector of decisions (or control actions) and x is a vector containing known state variables. Consider the quadratic form:

\[
J = \begin{bmatrix} x \\ u \end{bmatrix}^T
\begin{bmatrix} h_{11} & h_{12} \\ h_{12}^T & h_{22} \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix}
\tag{A.17}
\]

No minimum exists if h_{22} is not a positive semidefinite matrix. If h_{22} is positive definite, then the quadratic form has a minimum for

\[
u^* = -h_{22}^{-1} h_{12}^T x
\tag{A.18}
\]

and the minimum is

\[
J^* = x^T S x
\tag{A.19}
\]

where

\[
S = h_{11} - h_{12} h_{22}^{-1} h_{12}^T
\tag{A.20}
\]

If h_{22} is only positive semidefinite, then there exist infinitely many solutions with the same value of J. ✷
Proof: Firstly we have

\[
J = \begin{bmatrix} x \\ u \end{bmatrix}^T
\begin{bmatrix} h_{11} & h_{12} \\ h_{12}^T & h_{22} \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix}
\tag{A.21}
\]
\[
= x^T h_{11} x + 2\, x^T h_{12} u + u^T h_{22} u
\tag{A.22}
\]

and then

\[
\frac{\partial J}{\partial u} = 2\, h_{22} u + 2\, h_{12}^T x
\tag{A.23}
\]
\[
\frac{\partial^2 J}{\partial u^2} = 2\, h_{22}
\tag{A.24}
\]

If h_{22} is positive definite, then J has a minimum for:

\[
u^* = -h_{22}^{-1} h_{12}^T x
\tag{A.25}
\]

which, introduced in the expression for the performance index, gives:

\begin{align*}
J^* &= x^T h_{11} x + 2\, x^T h_{12} u^* + (u^*)^T h_{22} u^* \\
&= x^T h_{11} x - 2\, x^T h_{12} h_{22}^{-1} h_{12}^T x + x^T h_{12} h_{22}^{-1} h_{22} h_{22}^{-1} h_{12}^T x \\
&= x^T h_{11} x - x^T h_{12} h_{22}^{-1} h_{12}^T x \\
&= x^T \big( h_{11} - h_{12} h_{22}^{-1} h_{12}^T \big) x
\end{align*}

✷
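A scalar (1-D) instance of the lemma is easy to check numerically. The sketch below (plain Python; the values of h11, h12, h22 and x are arbitrary examples with h22 > 0) compares u∗ and J∗ from (A.18)-(A.19) with a brute-force grid search:

```python
# Scalar instance of Lemma 4: J = h11*x^2 + 2*h12*x*u + h22*u^2
# with h22 > 0. The minimiser is u* = -(h12/h22)*x and the minimum
# J* = (h11 - h12^2/h22)*x^2. All numbers are arbitrary examples.
h11, h12, h22 = 3.0, 1.0, 2.0
x = 1.5

u_star = -(h12 / h22) * x
J_star = (h11 - h12 * h12 / h22) * x * x

def J(u):
    return h11 * x * x + 2 * h12 * x * u + h22 * u * u

# brute-force check on a grid around u* (the grid contains u* itself)
grid = [u_star + 0.01 * k for k in range(-200, 201)]
assert min(J(u) for u in grid) >= J_star - 1e-12
print(u_star, J_star)
```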
Appendix B
Static Optimization
Optimization is short for either maximization or minimization. In this section we will focus on minimization and just state the results for the maximization case.

A maximization problem can easily be transformed into a minimization problem: if J(z) is an objective function which has to be maximized w.r.t. z, the maximum can also be found as the minimum of −J(z).
B.1 Unconstrained Optimization
The simplest class of minimization problems is the unconstrained minimization problem. Let z ∈ D ⊆ R^n and let J(z) ∈ R be a scalar valued function or a performance index which we have to minimize, i.e. we must find a vector, z⋆ ∈ D, that satisfies

\[
J(z^\star) \le J(z) \qquad \forall z \in \mathcal{D}
\tag{B.1}
\]

If z⋆ is the only solution satisfying

\[
J(z^\star) < J(z) \qquad \forall z \in \mathcal{D}, \; z \ne z^\star
\tag{B.2}
\]

we say that z⋆ is a unique global minimum of J(z) in D, and often we write

\[
z^\star = \arg \min_{z \in \mathcal{D}} J(z)
\]

In the case where there exist several z⋆ ∈ Z⋆ ⊆ D satisfying (B.1), we say that J(z) attains its global minimum in the set Z⋆.
A vector z_l^⋆ is said to be a local minimum in D if there exists a set D_l ⊆ D containing z_l^⋆ such that

\[
J(z_l^\star) \le J(z) \qquad \forall z \in \mathcal{D}_l
\tag{B.3}
\]

Optimality conditions are available when J is a differentiable function and D_l is a convex subset of R^n. In that case we can approximate J by its Taylor expansion

\[
J = J(z^o) + \frac{d}{dz} J(z^o)\, \tilde{z} + \frac{1}{2}\, \tilde{z}^T \frac{d^2}{dz^2} J(z^o)\, \tilde{z} + \varepsilon
\]

where \tilde{z} = z − z^o.
A necessary condition for z⋆ being a local optimum is that the point is stationary, i.e.

\[
\frac{d}{dz} J(z^\star) = 0
\tag{B.4}
\]

If furthermore

\[
\frac{d^2}{dz^2} J(z^\star) > 0
\]

then (B.4) is also a sufficient condition. If the point is stationary but the Hessian is only positive semidefinite, then additional information is needed to establish whether the point is a minimum. If e.g. J is a linear function, then the Hessian is zero and no minimum exists (in the unconstrained case).
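The stationarity condition (B.4) is exactly what numerical methods solve. A minimal sketch (plain Python; the objective J(z) = z⁴ − 3z² + z and the starting point are arbitrary examples) applies Newton iteration to dJ/dz = 0 and then checks the sign of the second derivative:

```python
# Newton's method locates a stationary point by iterating on dJ/dz = 0:
#   z <- z - (d2J/dz2)^(-1) * dJ/dz
# applied to the arbitrary example J(z) = z^4 - 3 z^2 + z.
def dJ(z):   return 4 * z**3 - 6 * z + 1
def d2J(z):  return 12 * z**2 - 6

z = 1.5                      # start near the right-hand local minimum
for _ in range(50):
    z = z - dJ(z) / d2J(z)

assert abs(dJ(z)) < 1e-10    # stationary: condition (B.4) holds
assert d2J(z) > 0            # positive second derivative -> local minimum
print(z)
```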
In a maximization problem, (B.4) is a sufficient condition if

\[
\frac{d^2}{dz^2} J(z^\star) < 0
\]