(Table: the final stage values W0, u0 and V0(x0) with the optimal decision u∗0(x0) for x0 = −1, 0, 1.)
✷ It should be noticed that the state space and the decision space in the previous examples are discrete. If the state space is continuous, the method applied in the examples can still be used as an approximation if we introduce a discrete grid covering the relevant part of the state space.
6.2 Continuous Dynamic Programming
Consider the problem of finding the input function u_t, t ∈ [0, T], that takes the system

\[
\dot{x}_t = f_t(x_t, u_t) \qquad x_0 = x^0 \qquad t \in [0, T]
\tag{6.26}
\]

from its initial state along a trajectory such that the cost function

\[
J = \phi_T(x_T) + \int_0^T L_t(x_t, u_t)\, dt
\tag{6.27}
\]

is minimized. As in the discrete time case we can define the truncated performance index
\[
J_t(x_t, u_t^T) = \phi_T(x_T) + \int_t^T L_s(x_s, u_s)\, ds
\]

which depends on the point of truncation, t, on the state, x_t, and on the whole input function, u_t^T, in the interval from t to the end point, T. The optimal performance index, i.e. the Bellman function, is defined by

\[
V_t(x_t) = \min_{u_t^T} J_t(x_t, u_t^T)
\]
We have the following theorem, which states a sufficient condition.
Theorem 15: Consider the free dynamic optimization problem specified by (6.26) and (6.27). The optimal performance index, i.e. the Bellman function V_t(x_t), satisfies the equation

\[
-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right]
\tag{6.28}
\]

with the boundary condition

\[
V_T(x_T) = \phi_T(x_T)
\tag{6.29}
\]

✷
Equation (6.28) is often denoted the HJB (Hamilton-Jacobi-Bellman) equation.
Proof: In discrete time we have the Bellman equation

\[
V_i(x_i) = \min_{u_i} \Big[ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) \Big]
\]

with the boundary condition

\[
V_N(x_N) = \phi_N(x_N)
\tag{6.30}
\]
Figure 6.8. The continuous time axis.
In continuous time i corresponds to t and i + 1 to t + ∆t, respectively. Then

\[
V_t(x_t) = \min_{u} \left[ \int_t^{t+\Delta t} L_s(x_s, u_s)\, ds + V_{t+\Delta t}(x_{t+\Delta t}) \right]
\]

If we apply a Taylor expansion on V_{t+∆t}(x_{t+∆t}) and on the integral we have

\[
V_t(x_t) \simeq \min_{u} \left[ L_t(x_t, u_t)\,\Delta t + V_t(x_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t)\,\Delta t + \frac{\partial V_t(x_t)}{\partial t}\,\Delta t \right]
\]

Finally, we can collect the terms which do not depend on the decision:

\[
V_t(x_t) = V_t(x_t) + \frac{\partial V_t(x_t)}{\partial t}\,\Delta t + \min_{u} \left[ L_t(x_t, u_t)\,\Delta t + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t)\,\Delta t \right]
\]
After cancelling V_t(x_t) on both sides and dividing by ∆t, the limit ∆t → 0 gives (6.28). The boundary condition, (6.29), comes directly from (6.30). ✷
Notice that if we have a maximization problem, the minimization in (6.28) is substituted with a maximization.
If the definition of the Hamiltonian function

\[
H_t = L_t(x_t, u_t) + \lambda_t^T f_t(x_t, u_t)
\]

is used, then the HJB equation can also be formulated as

\[
-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} H_t\!\left(x_t, u_t, \frac{\partial V_t(x_t)}{\partial x}\right)
\]

i.e. with the costate \lambda_t^T identified with \partial V_t(x_t)/\partial x.
Example: 6.2.1 (Motion control). The purpose of this simple example is to illustrate the application of the HJB equation. Consider the system

\[
\dot{x}_t = u_t \qquad x_0 = x^0
\]

and the performance index

\[
J = \frac{1}{2}\, p\, x_T^2 + \int_0^T \frac{1}{2}\, u_t^2\, dt
\]

The HJB equation, (6.28), gives:

\[
-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ \frac{1}{2}\, u_t^2 + \frac{\partial V_t(x_t)}{\partial x}\, u_t \right]
\]

The minimization can be carried out and gives a solution w.r.t. u_t which is

\[
u_t = -\frac{\partial V_t(x_t)}{\partial x}
\tag{6.31}
\]

So if the Bellman function is known, the control action or the decision can be determined from this. If the result above is inserted in the HJB equation we get

\[
-\frac{\partial V_t(x_t)}{\partial t} = \frac{1}{2}\left(\frac{\partial V_t(x_t)}{\partial x}\right)^2 - \left(\frac{\partial V_t(x_t)}{\partial x}\right)^2 = -\frac{1}{2}\left(\frac{\partial V_t(x_t)}{\partial x}\right)^2
\]

which is a partial differential equation with the boundary condition

\[
V_T(x_T) = \frac{1}{2}\, p\, x_T^2
\]

Inspired by the boundary condition we guess on a candidate function of the type

\[
V_t(x) = \frac{1}{2}\, s_t\, x^2
\]
where the explicit time dependence is in the time function, s_t. Since

\[
\frac{\partial V}{\partial t} = \frac{1}{2}\,\dot{s}_t\, x^2 \qquad \frac{\partial V}{\partial x} = s_t\, x
\]

the partial differential equation becomes

\[
-\frac{1}{2}\,\dot{s}_t\, x^2 = -\frac{1}{2}\, s_t^2\, x^2
\]

which must hold for any x. Hence s_t must satisfy the ordinary differential equation

\[
\dot{s}_t = s_t^2 \qquad s_T = p
\]

which is solved backwards. This is actually (a simple version of) the continuous time Riccati equation. The solution can be found analytically or by means of numerical methods.
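For this example the Riccati equation is the scalar ODE ṡ_t = s_t² with terminal condition s_T = p, which has the analytic solution s_t = p/(1 + p(T − t)). As a minimal numerical sketch (plain Python; the values p = 1 and T = 1 are arbitrary choices, not taken from the text), it can be integrated backwards with an explicit Euler scheme and compared with the analytic solution:

```python
# Backward Euler integration of the scalar Riccati equation
#   ds/dt = s^2,  s_T = p,
# stepped backwards from t = T to t = 0 and compared with the
# analytic solution s_t = p / (1 + p (T - t)).
# The horizon T, weight p and step count n are arbitrary choices.

def riccati_backwards(p=1.0, T=1.0, n=10000):
    dt = T / n
    s = p                      # terminal condition s_T = p
    for _ in range(n):         # step backwards: s(t - dt) = s(t) - dt * s(t)^2
        s = s - dt * s * s
    return s                   # approximation of s_0

def riccati_exact(t, p=1.0, T=1.0):
    return p / (1.0 + p * (T - t))

print(riccati_backwards(), riccati_exact(0.0))   # both close to 0.5
```

With s_t known, the optimal feedback from (6.31) is u_t = −s_t x_t.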
Knowing the function, s_t, we can find the control input from (6.31). ✷

It is possible to find the (continuous time) Euler-Lagrange equations from the HJB equation.
Appendix A
Quadratic forms
In this section we will give a short resume of the concepts and results related to positive definite matrices. If z ∈ R^n is a vector, then the squared Euclidean norm obeys:

\[
J = \|z\|^2 = z^T z \ge 0
\tag{A.1}
\]

If A is a nonsingular matrix, then the vector Az has the squared norm z^T A^T A z ≥ 0. Let Q = A^T A; then

\[
\|z\|_Q^2 = z^T Q z \ge 0
\tag{A.2}
\]

and ‖z‖_Q is denoted the norm of z w.r.t. Q.
Now, let S be a square matrix. We are interested in the sign of the quadratic form:

\[
J = z^T S z
\tag{A.3}
\]

where J is a scalar. Any square matrix can be decomposed into a symmetric part, S_s, and an antisymmetric part, S_a, i.e.

\[
S = S_s + S_a \qquad S_s = \frac{1}{2}\,(S + S^T) \qquad S_a = \frac{1}{2}\,(S - S^T)
\tag{A.4}
\]

Notice:

\[
S_s^T = S_s \qquad S_a^T = -S_a
\tag{A.5}
\]

Since the scalar z^T S_a z fulfills:

\[
z^T S_a z = (z^T S_a z)^T = z^T S_a^T z = -z^T S_a z
\tag{A.6}
\]

it is true that z^T S_a z = 0 for any z ∈ R^n, and consequently:

\[
J = z^T S z = z^T S_s z
\tag{A.7}
\]
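The cancellation of the antisymmetric part is easy to check numerically. The following sketch (plain Python; the matrix S and vector z are arbitrary example values) verifies (A.5)-(A.7):

```python
# Verify that z^T S z = z^T S_s z, where S_s = (S + S^T)/2 is the
# symmetric part of S and S_a = (S - S^T)/2 the antisymmetric part.
# S and z below are arbitrary example values.
def quad(M, z):
    n = len(z)
    return sum(z[i] * M[i][j] * z[j] for i in range(n) for j in range(n))

S  = [[1.0, 2.0],
      [0.0, 3.0]]                     # a non-symmetric square matrix
Ss = [[(S[i][j] + S[j][i]) / 2 for j in range(2)] for i in range(2)]
Sa = [[(S[i][j] - S[j][i]) / 2 for j in range(2)] for i in range(2)]

z = [0.7, -1.3]
print(quad(S, z), quad(Ss, z))        # equal
print(quad(Sa, z))                    # zero, as (A.6) predicts
```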
An analysis of the sign variation of J can then be carried out as an analysis of S_s, which (as a symmetric matrix) can be diagonalized.
A matrix S is said to be positive definite if (and only if) z^T S z > 0 for all z ∈ R^n with z^T z > 0. Consequently, S is positive definite if (and only if) all eigenvalues are positive. A matrix, S, is positive semidefinite if z^T S z ≥ 0 for all z ∈ R^n with z^T z > 0. This can be checked by investigating whether all eigenvalues are non-negative. A similar definition exists for negative definite matrices. If J can take both negative and positive values, then S is denoted indefinite. In that case the symmetric part of the matrix has both negative and positive eigenvalues.
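For a symmetric 2×2 matrix the eigenvalues are available in closed form, so the eigenvalue-based classification can be sketched directly (plain Python; the helper names and the example matrices are our own, not from the text):

```python
# Classify a symmetric 2x2 matrix by the signs of its eigenvalues.
# For S = [[a, b], [b, c]] the eigenvalues are the roots of
#   lam^2 - (a + c) lam + (a c - b^2) = 0.
import math

def eig2(a, b, c):
    tr, det = a + c, a * c - b * b
    r = math.sqrt(tr * tr / 4 - det)   # real for symmetric matrices
    return tr / 2 - r, tr / 2 + r      # (smallest, largest)

def classify(a, b, c):
    lo, hi = eig2(a, b, c)
    if lo > 0:  return "positive definite"
    if lo >= 0: return "positive semidefinite"
    if hi < 0:  return "negative definite"
    if hi <= 0: return "negative semidefinite"
    return "indefinite"

print(classify(1, 0, 4))    # positive definite
print(classify(1, 2, 1))    # indefinite (eigenvalues -1 and 3)
```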
We will now examine the situation in a number of examples.
Example: A.0.1 In the following we will consider some two dimensional problems. Let us start with a simple problem in which:

\[
H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\tag{A.8}
\]

In this case we have

\[
J = z_1^2 + z_2^2
\tag{A.9}
\]

and the levels (domains in which the loss function J is equal to c^2) are easily recognized as circles with center in the origin and radius equal to c. See Figure A.1, area a and surface a. ✷
Example: A.0.2 Let us continue the sequence of two dimensional problems, in which:

\[
H = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}
\tag{A.10}
\]

The explicit form of the loss function J is:

\[
J = z_1^2 + 4\, z_2^2
\tag{A.11}
\]

and the levels (with J = c^2) are ellipses with principal axes parallel to the coordinate axes and semi-axis lengths equal to c and c/2, respectively. See Figure A.1, area b and surface b. ✷

Example: A.0.3 Let us now continue with a little more advanced problem with:

\[
H = \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix}
\tag{A.12}
\]

In this case the situation is a little more difficult. If we perform an eigenvalue analysis of the symmetric part of H (which is H itself due to the fact that H is symmetric), then we will find that:
\[
H = V D V^T \qquad D = \begin{bmatrix} 0.7 & 0 \\ 0 & 4.3 \end{bmatrix}
\tag{A.13}
\]

which means the columns of V are the eigenvectors and the diagonal elements of D are the eigenvalues. Since the eigenvectors form an orthogonal basis, V^T V = I, we can choose to represent z in this coordinate system. Let ξ be the coordinates with respect to the columns of V; then

\[
z = V \xi
\tag{A.14}
\]

and

\[
J = z^T H z = \xi^T V^T H V \xi = \xi^T D \xi
\tag{A.15}
\]

We are hereby able to write the loss function as:

\[
J = 0.7\, \xi_1^2 + 4.3\, \xi_2^2
\tag{A.16}
\]

Notice the eigenvalues 0.7 and 4.3. The levels (J = c^2) are recognized as ellipses with center in the origin and principal directions parallel to the eigenvectors. The lengths of the principal semi-axes are c/√0.7 and c/√4.3, respectively. See Figure A.2, area c and surface c.
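The change of coordinates z = Vξ can be checked numerically. The sketch below (plain Python) assumes H = [[1, 1], [1, 4]], a symmetric matrix whose exact eigenvalues (5 ± √13)/2 ≈ 0.697, 4.303 round to the stated 0.7 and 4.3; these entries are an assumption reconstructed from those eigenvalues:

```python
import math

# Numeric check of z = V xi and J = xi^T D xi for a symmetric 2x2 matrix.
# NOTE: H = [[1, 1], [1, 4]] is an assumed reconstruction; its eigenvalues
# (5 +/- sqrt(13))/2 = 0.697..., 4.302... round to the stated 0.7 and 4.3.
a, b, c = 1.0, 1.0, 4.0                      # H = [[a, b], [b, c]]
tr, det = a + c, a * c - b * b
r = math.sqrt(tr * tr / 4 - det)
lam1, lam2 = tr / 2 - r, tr / 2 + r          # eigenvalues, smallest first

def unit(v):
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

v1 = unit((b, lam1 - a))                     # eigenvector for lam1
v2 = unit((b, lam2 - a))                     # eigenvector for lam2

def J(z1, z2):                               # loss in the z-coordinates
    return a * z1 * z1 + 2 * b * z1 * z2 + c * z2 * z2

xi1, xi2 = 0.8, -0.4                         # arbitrary xi-coordinates
z1 = v1[0] * xi1 + v2[0] * xi2               # z = V xi
z2 = v1[1] * xi1 + v2[1] * xi2
print(J(z1, z2), lam1 * xi1**2 + lam2 * xi2**2)   # the two agree
```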
✷

To simplify the calculus in what follows, we are going to consider special versions of quadratic forms.
Lemma 3: The quadratic form

\[
J = [Ax + Bu]^T S\, [Ax + Bu]
\]

can be expressed as

\[
J = \begin{bmatrix} x \\ u \end{bmatrix}^T
\begin{bmatrix} A^T S A & A^T S B \\ B^T S A & B^T S B \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix}
\]

Proof: The proof is simply to express the loss function J as

\[
J = z^T S z \qquad z = Ax + Bu = \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} x \\ u \end{bmatrix}
\]

and multiply out. ✷
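A scalar (1-D) instance of the lemma can be checked directly (plain Python; all values are arbitrary examples, with A, B, S, x, u scalars so the blocks of the matrix are products of scalars):

```python
# Scalar (1-D) check of Lemma 3: with all quantities scalars,
#   J = S * (A*x + B*u)^2
# equals [x u] [[A*S*A, A*S*B], [B*S*A, B*S*B]] [x; u].
# The numbers are arbitrary example values.
A, B, S = 2.0, -1.0, 3.0
x, u = 0.5, 1.5

J_direct = S * (A * x + B * u) ** 2

M = [[A * S * A, A * S * B],
     [B * S * A, B * S * B]]
v = [x, u]
J_block = sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))
print(J_direct, J_block)   # equal
```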
We have now studied properties of quadratic forms and a single lemma (3). Let us now focus on the problem of finding a minimum (or, similarly, a maximum) in a quadratic form.
Lemma 4: Assume u is a vector of decisions (or control actions) and x is a vector containing known state variables. Consider the quadratic form:

\[
J = \begin{bmatrix} x \\ u \end{bmatrix}^T
\begin{bmatrix} h_{11} & h_{12} \\ h_{12}^T & h_{22} \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix}
\tag{A.17}
\]

No minimum exists if h_{22} is not a positive semidefinite matrix. If h_{22} is positive definite, then the quadratic form has a minimum for

\[
u^* = -h_{22}^{-1} h_{12}^T x
\tag{A.18}
\]

and the minimum is

\[
J^* = x^T S x
\tag{A.19}
\]

where

\[
S = h_{11} - h_{12} h_{22}^{-1} h_{12}^T
\tag{A.20}
\]

If h_{22} is only positive semidefinite, then there exist infinitely many solutions with the same value of J. ✷
Proof: Firstly we have

\[
J = \begin{bmatrix} x \\ u \end{bmatrix}^T
\begin{bmatrix} h_{11} & h_{12} \\ h_{12}^T & h_{22} \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix}
\tag{A.21}
\]
\[
= x^T h_{11} x + 2\, x^T h_{12} u + u^T h_{22} u
\tag{A.22}
\]

and then

\[
\frac{\partial J}{\partial u} = 2\, h_{22} u + 2\, h_{12}^T x
\tag{A.23}
\]
\[
\frac{\partial^2 J}{\partial u^2} = 2\, h_{22}
\tag{A.24}
\]

If h_{22} is positive definite, then J has a minimum for:

\[
u^* = -h_{22}^{-1} h_{12}^T x
\tag{A.25}
\]

which, introduced in the expression for the performance index, gives:

\begin{align*}
J^* &= x^T h_{11} x + 2\, x^T h_{12} u^* + (u^*)^T h_{22} u^* \\
&= x^T h_{11} x - 2\, x^T h_{12} h_{22}^{-1} h_{12}^T x + x^T h_{12} h_{22}^{-1} h_{22} h_{22}^{-1} h_{12}^T x \\
&= x^T h_{11} x - x^T h_{12} h_{22}^{-1} h_{12}^T x \\
&= x^T \big( h_{11} - h_{12} h_{22}^{-1} h_{12}^T \big) x
\end{align*}

✷
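A scalar (1-D) instance of the lemma is easy to check numerically. The sketch below (plain Python; the values of h11, h12, h22 and x are arbitrary examples with h22 > 0) compares u∗ and J∗ from (A.18)-(A.19) with a brute-force grid search:

```python
# Scalar instance of Lemma 4: J = h11*x^2 + 2*h12*x*u + h22*u^2
# with h22 > 0. The minimiser is u* = -(h12/h22)*x and the minimum
# J* = (h11 - h12^2/h22)*x^2. All numbers are arbitrary examples.
h11, h12, h22 = 3.0, 1.0, 2.0
x = 1.5

u_star = -(h12 / h22) * x
J_star = (h11 - h12 * h12 / h22) * x * x

def J(u):
    return h11 * x * x + 2 * h12 * x * u + h22 * u * u

# brute-force check on a grid around u* (the grid contains u* itself)
grid = [u_star + 0.01 * k for k in range(-200, 201)]
assert min(J(u) for u in grid) >= J_star - 1e-12
print(u_star, J_star)
```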
Appendix B
Static Optimization
Optimization is short for either maximization or minimization. In this section we will focus on minimization and just state the results for the maximization case.

A maximization problem can easily be transformed into a minimization problem: if J(z) is an objective function which has to be maximized w.r.t. z, the maximum can also be found as the minimum of −J(z).
B.1 Unconstrained Optimization
The simplest class of minimization problems is the unconstrained minimization problem. Let z ∈ D ⊆ R^n and let J(z) ∈ R be a scalar valued function or a performance index which we have to minimize, i.e. we must find a vector, z⋆ ∈ D, that satisfies

\[
J(z^\star) \le J(z) \qquad \forall z \in \mathcal{D}
\tag{B.1}
\]

If z⋆ is the only solution satisfying

\[
J(z^\star) < J(z) \qquad \forall z \in \mathcal{D}, \; z \ne z^\star
\tag{B.2}
\]

we say that z⋆ is a unique global minimum of J(z) in D, and often we write

\[
z^\star = \arg \min_{z \in \mathcal{D}} J(z)
\]

In the case where there exist several z⋆ ∈ Z⋆ ⊆ D satisfying (B.1), we say that J(z) attains its global minimum in the set Z⋆.
A vector z_l^⋆ is said to be a local minimum in D if there exists a set D_l ⊆ D containing z_l^⋆ such that

\[
J(z_l^\star) \le J(z) \qquad \forall z \in \mathcal{D}_l
\tag{B.3}
\]

Optimality conditions are available when J is a differentiable function and D_l is a convex subset of R^n. In that case we can approximate J by its Taylor expansion

\[
J = J(z^o) + \frac{d}{dz} J(z^o)\, \tilde{z} + \frac{1}{2}\, \tilde{z}^T \frac{d^2}{dz^2} J(z^o)\, \tilde{z} + \varepsilon
\]

where \tilde{z} = z − z^o.
A necessary condition for z⋆ being a local optimum is that the point is stationary, i.e.

\[
\frac{d}{dz} J(z^\star) = 0
\tag{B.4}
\]

If furthermore

\[
\frac{d^2}{dz^2} J(z^\star) > 0
\]

then (B.4) is also a sufficient condition. If the point is stationary but the Hessian is only positive semidefinite, then additional information is needed to establish whether the point is a minimum. If e.g. J is a linear function, then the Hessian is zero and no minimum exists (in the unconstrained case).
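The stationarity condition (B.4) is exactly what numerical methods solve. A minimal sketch (plain Python; the objective J(z) = z⁴ − 3z² + z and the starting point are arbitrary examples) applies Newton iteration to dJ/dz = 0 and then checks the sign of the second derivative:

```python
# Newton's method locates a stationary point by iterating on dJ/dz = 0:
#   z <- z - (d2J/dz2)^(-1) * dJ/dz
# applied to the arbitrary example J(z) = z^4 - 3 z^2 + z.
def dJ(z):   return 4 * z**3 - 6 * z + 1
def d2J(z):  return 12 * z**2 - 6

z = 1.5                      # start near the right-hand local minimum
for _ in range(50):
    z = z - dJ(z) / d2J(z)

assert abs(dJ(z)) < 1e-10    # stationary: condition (B.4) holds
assert d2J(z) > 0            # positive second derivative -> local minimum
print(z)
```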
In a maximization problem, (B.4) is a sufficient condition if

\[
\frac{d^2}{dz^2} J(z^\star) < 0
\]