
Continuous Dynamic Programming


x0 |  W0(u0 = -1)   W0(u0 = 0)   W0(u0 = 1)  |  V0(x0)  |  u0(x0)
 2 |     7.56          8.88          ∞       |   7.56   |   -1

✷ It should be noticed that the state space and the decision space in the previous examples are discrete. If the state space is continuous, then the method applied in the examples can still be used as an approximation if we apply a discrete grid covering the relevant part of the state space.
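To make the grid idea concrete, here is a minimal sketch in Python of a backward dynamic programming sweep over a discretized state space. The dynamics, costs, horizon and grids are assumptions chosen purely for illustration; they are not taken from the examples above.

```python
import numpy as np

# A minimal sketch of dynamic programming on a discretized state space.
# The dynamics, costs, horizon and grids below are illustrative
# assumptions, not taken from the examples in the text.

N = 50                                   # number of stages
x_grid = np.linspace(-2.0, 2.0, 81)      # grid covering the relevant state space
u_grid = np.linspace(-1.0, 1.0, 21)      # discretized decision space

def f(x, u):                             # assumed dynamics: x_{i+1} = f(x_i, u_i)
    return x + 0.1 * u

def L(x, u):                             # assumed stage cost
    return x**2 + u**2

def phi(x):                              # assumed terminal cost
    return x**2

V = phi(x_grid)                          # boundary condition: V_N = phi_N on the grid
policy = np.zeros((N, x_grid.size))

X, U = np.meshgrid(x_grid, u_grid, indexing="ij")
for i in reversed(range(N)):
    # successor states fall between grid points, so V_{i+1} is interpolated
    W = L(X, U) + np.interp(f(X, U), x_grid, V)
    policy[i] = u_grid[np.argmin(W, axis=1)]   # optimal decision on the grid
    V = W.min(axis=1)                          # Bellman recursion: V_i(x) = min_u W
```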

6.2 Continuous Dynamic Programming

Consider the problem of finding the input function $u_t$, $t \in \mathbb{R}$, that takes the system

$$\dot{x} = f_t(x_t, u_t) \qquad x_0 = \underline{x}_0 \qquad t \in [0, T] \tag{6.26}$$

from its initial state along a trajectory such that the cost function

$$J = \phi_T(x_T) + \int_0^T L_t(x_t, u_t)\, dt \tag{6.27}$$

is minimized. As in the discrete time case we can define the truncated performance index

$$J_t(x_t, u_t^T) = \phi_T(x_T) + \int_t^T L_s(x_s, u_s)\, ds$$

which depends on the point of truncation, on the state $x_t$, and on the whole input function $u_t^T$ in the interval from $t$ to the end point $T$. The optimal performance index, i.e. the Bellman function, is defined by

$$V_t(x_t) = \min_{u_t^T} J_t(x_t, u_t^T)$$

We have the following theorem, which states a sufficient condition.

Theorem 15: Consider the free dynamic optimization problem specified by (6.26) and (6.27). The optimal performance index, i.e. the Bellman function $V_t(x_t)$, satisfies the equation

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right] \tag{6.28}$$

with the boundary condition

$$V_T(x_T) = \phi_T(x_T) \tag{6.29}$$

Equation (6.28) is often denoted the HJB (Hamilton-Jacobi-Bellman) equation.

Proof: In discrete time we have the Bellman equation

$$V_i(x_i) = \min_{u_i} \left[ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) \right]$$

with the boundary condition

$$V_N(x_N) = \phi_N(x_N) \tag{6.30}$$

Figure 6.8. The continuous time axis.

In continuous time $i$ corresponds to $t$ and $i+1$ to $t + \Delta t$, respectively. Then

$$V_t(x_t) = \min_{u} \left[ \int_t^{t+\Delta t} L_s(x_s, u_s)\, ds + V_{t+\Delta t}(x_{t+\Delta t}) \right]$$

If we apply a Taylor expansion on $V_{t+\Delta t}(x_{t+\Delta t})$ and on the integral we have

$$V_t(x_t) = \min_{u} \left[ L_t(x_t, u_t)\,\Delta t + V_t(x_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t)\,\Delta t + \frac{\partial V_t(x_t)}{\partial t}\,\Delta t \right]$$

Finally, we can collect the terms which do not depend on the decision:

$$V_t(x_t) = V_t(x_t) + \frac{\partial V_t(x_t)}{\partial t}\,\Delta t + \min_{u} \left[ L_t(x_t, u_t)\,\Delta t + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t)\,\Delta t \right]$$

In the limit $\Delta t \to 0$ we will have (6.28). The boundary condition, (6.29), comes directly from (6.30). ✷

Notice, if we have a maximization problem, then the minimization in (6.28) is substituted with a maximization.

If the definition of the Hamiltonian function

$$H_t = L_t(x_t, u_t) + \lambda_t^T f_t(x_t, u_t)$$

is used, then the HJB equation can also be formulated as

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} H_t\!\left(x_t, u_t, \frac{\partial V_t(x_t)}{\partial x}\right)$$


Example: 6.2.1 (Motion control). The purpose of this simple example is to illustrate the application of the HJB equation. Consider the system

$$\dot{x}_t = u_t \qquad x_0 = \underline{x}_0$$

and the performance index

$$J = \frac{1}{2} p\, x_T^2 + \int_0^T \frac{1}{2} u_t^2\, dt$$

The HJB equation, (6.28), gives:

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ \frac{1}{2} u_t^2 + \frac{\partial V_t(x_t)}{\partial x}\, u_t \right]$$

The minimization can be carried out and gives a solution w.r.t. $u_t$ which is

$$u_t = -\frac{\partial V_t(x_t)}{\partial x} \tag{6.31}$$

So if the Bellman function is known, the control action or the decision can be determined from this. If the result above is inserted in the HJB equation we get

$$-\frac{\partial V_t(x_t)}{\partial t} = \frac{1}{2} \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2 - \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2 = -\frac{1}{2} \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2$$

which is a partial differential equation with the boundary condition

$$V_T(x_T) = \frac{1}{2} p\, x_T^2$$

Inspired by the boundary condition we guess a candidate function of the type

$$V_t(x) = \frac{1}{2} s_t x^2$$

where the explicit time dependence is in the time function $s_t$. Since

$$\frac{\partial V}{\partial x} = s_t x \qquad \frac{\partial V}{\partial t} = \frac{1}{2} \dot{s}_t x^2$$

the HJB equation reduces to the ordinary differential equation

$$\dot{s}_t = s_t^2 \qquad s_T = p$$

which is solved backwards. This is actually (a simple version of) the continuous time Riccati equation. The solution can be found analytically or by means of numerical methods, as in the sketch below.
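As a small numerical illustration, the following sketch integrates the Riccati equation $\dot{s}_t = s_t^2$, $s_T = p$, backwards with an explicit Euler scheme and compares the result with the analytic solution $s_t = p/(1 + p(T - t))$. The values of $T$ and $p$ are assumptions chosen for the sketch.

```python
import numpy as np

# Numerical sketch: integrate the scalar Riccati equation s'_t = s_t^2
# backwards from the boundary condition s_T = p, then compare with the
# analytic solution. T and p are assumed values for the illustration.

T, p = 1.0, 2.0
n = 1000
dt = T / n

s = np.empty(n + 1)
s[n] = p                                  # boundary condition s_T = p
for k in range(n, 0, -1):
    s[k - 1] = s[k] - dt * s[k]**2        # explicit Euler step backwards in time

t = np.linspace(0.0, T, n + 1)
s_exact = p / (1.0 + p * (T - t))         # analytic solution of s' = s^2, s(T) = p
print(np.max(np.abs(s - s_exact)))        # small discretization error

# The optimal feedback (6.31) is then u_t = -dV/dx = -s_t * x_t.
```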

Knowing the function $s_t$, we can find the control input from (6.31). ✷ It is possible to find the (continuous time) Euler-Lagrange equations from the HJB equation.

Appendix A

Quadratic forms

In this section we will give a short resume of the concepts and results related to positive definite matrices. If $z \in \mathbb{R}^n$ is a vector, then the squared Euclidean norm obeys:

$$J = \|z\|^2 = z^T z \geq 0 \tag{A.1}$$

If $A$ is a nonsingular matrix, then the vector $Az$ has a squared norm $z^T A^T A z \geq 0$. Let $Q = A^T A$; then

$$\|z\|_Q^2 = z^T Q z \geq 0 \tag{A.2}$$

and $\|z\|_Q$ is denoted the norm of $z$ w.r.t. $Q$.

Now, let $S$ be a square matrix. We are interested in the sign of the quadratic form:

$$J = z^T S z \tag{A.3}$$

where $J$ is a scalar. Any square matrix can be decomposed into a symmetric part, $S_s$, and an antisymmetric part, $S_a$, i.e.

$$S = S_s + S_a \qquad S_s = \frac{1}{2}(S + S^T) \qquad S_a = \frac{1}{2}(S - S^T) \tag{A.4}$$

Notice:

$$S_s^T = S_s \qquad S_a^T = -S_a \tag{A.5}$$

Since the scalar $z^T S_a z$ fulfills:

$$z^T S_a z = (z^T S_a z)^T = z^T S_a^T z = -z^T S_a z \tag{A.6}$$

it is true that $z^T S_a z = 0$ for any $z \in \mathbb{R}^n$, or that:

$$J = z^T S z = z^T S_s z \tag{A.7}$$
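A quick numerical check of (A.4)-(A.7) can be carried out in a few lines; the matrix $S$ and the vector $z$ below are arbitrary assumed values.

```python
import numpy as np

# Numerical check of (A.4)-(A.7): the antisymmetric part of a square
# matrix contributes nothing to the quadratic form.
rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))           # an arbitrary nonsymmetric square matrix
Ss = 0.5 * (S + S.T)                      # symmetric part, cf. (A.4)
Sa = 0.5 * (S - S.T)                      # antisymmetric part, cf. (A.4)
z = rng.standard_normal(4)

print(z @ Sa @ z)                         # ~0 up to rounding, cf. (A.6)
print(z @ S @ z, z @ Ss @ z)              # equal, cf. (A.7)
```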



An analysis of the sign variation of $J$ can then be carried out as an analysis of $S_s$, which (as a symmetric matrix) can be diagonalized.

A matrix $S$ is said to be positive definite if (and only if) $z^T S z > 0$ for all $z \in \mathbb{R}^n$ with $z^T z > 0$. Consequently, $S$ is positive definite if (and only if) all eigenvalues are positive. A matrix $S$ is positive semidefinite if $z^T S z \geq 0$ for all $z \in \mathbb{R}^n$ with $z^T z > 0$. This can be checked by investigating whether all eigenvalues are nonnegative. Similar definitions exist for negative definite and negative semidefinite matrices. If $J$ can take both negative and positive values, then $S$ is denoted indefinite. In that case the symmetric part of the matrix has both negative and positive eigenvalues.
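These eigenvalue tests are easy to mechanize. The sketch below classifies a square matrix from the eigenvalues of its symmetric part; the test matrices are assumed examples.

```python
import numpy as np

# Sketch of a definiteness test: classify S from the eigenvalues of
# its symmetric part, as described above.
def classify(S, tol=1e-12):
    eig = np.linalg.eigvalsh(0.5 * (S + S.T))  # eigenvalues of the symmetric part
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig >= -tol):
        return "positive semidefinite"
    if np.all(eig < -tol):
        return "negative definite"
    if np.all(eig <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[1.0, 1.0], [1.0, 4.0]])))   # positive definite
print(classify(np.array([[1.0, 3.0], [3.0, 1.0]])))   # indefinite (eigenvalues 4, -2)
```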

We will now examine the situation through some two dimensional examples.

Example: A.0.1 Let us start with a simple problem in which:

$$H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{A.8}$$

In this case we have

$$J = z_1^2 + z_2^2 \tag{A.9}$$

and the levels (domains in which the loss function $J$ equals $c^2$) are easily recognized as circles with center at the origin and radius $c$. See Figure A.1, area a and surface a. ✷

Example: A.0.2 Let us continue the sequence of two dimensional problems, now with:

$$H = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix} \tag{A.10}$$

The explicit form of the loss function $J$ is:

$$J = z_1^2 + 4 z_2^2 \tag{A.11}$$

and the levels (with $J = c^2$) are ellipses with main directions parallel to the axes and semi-axes of length $c$ and $c/2$, respectively. See Figure A.1, area b and surface b. ✷

Example: A.0.3 Let us now continue with a little more advanced problem with:

$$H = \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix} \tag{A.12}$$

In this case the situation is a little more difficult. If we perform an eigenvalue analysis of the symmetric part of $H$ (which is $H$ itself due to the fact that $H$ is symmetric), then we will find that:

$$H = V D V^T \tag{A.13}$$

which means the columns in $V$ are the eigenvectors and the diagonal elements of $D = \mathrm{diag}(0.7,\ 4.3)$ are the eigenvalues. Since the eigenvectors form an orthonormal basis, $V^T V = I$, we can choose to represent $z$ in this coordinate system. Let $\xi$ be the coordinates in relation to the columns of $V$; then

$$z = V \xi \tag{A.14}$$

and

$$J = z^T H z = \xi^T V^T H V \xi = \xi^T D \xi \tag{A.15}$$

We are hereby able to write the loss function as:

$$J = 0.7\, \xi_1^2 + 4.3\, \xi_2^2 \tag{A.16}$$

Notice the eigenvalues 0.7 and 4.3. The levels ($J = c^2$) are recognized as ellipses with center at the origin and main directions parallel to the eigenvectors. The lengths of the principal semi-axes are $c/\sqrt{0.7}$ and $c/\sqrt{4.3}$, respectively. See Figure A.2, area c and surface c. ✷
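The eigenvalue analysis of this example can be reproduced numerically. In the sketch below the matrix $H$ is an assumption consistent with the stated eigenvalues 0.7 and 4.3; numpy then returns the decomposition $H = V D V^T$ and confirms (A.15).

```python
import numpy as np

# Reproducing the eigenvalue analysis of Example A.0.3. The matrix H is
# assumed to be [[1, 1], [1, 4]], consistent with the stated eigenvalues.
H = np.array([[1.0, 1.0], [1.0, 4.0]])
d, V = np.linalg.eigh(H)                 # H = V diag(d) V^T with orthonormal V
print(d)                                 # [0.697..., 4.302...] ~ 0.7 and 4.3
print(V.T @ V)                           # identity: the eigenvectors are orthonormal

# The quadratic form in the eigenvector coordinates xi = V^T z:
z = np.array([1.0, -2.0])
xi = V.T @ z
print(z @ H @ z, d[0] * xi[0]**2 + d[1] * xi[1]**2)  # equal, cf. (A.15)-(A.16)
```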

To simplify the calculus in the following, we are going to consider special versions of quadratic forms.

Lemma 3: The quadratic form

$$J = [Ax + Bu]^T S\, [Ax + Bu]$$

can be expressed as

$$J = \begin{bmatrix} x \\ u \end{bmatrix}^T \begin{bmatrix} A^T S A & A^T S B \\ B^T S A & B^T S B \end{bmatrix} \begin{bmatrix} x \\ u \end{bmatrix} \qquad ✷$$

Proof: The proof is simply to express the loss function $J$ as $J = z^T S z$ with

$$z = Ax + Bu = \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} x \\ u \end{bmatrix}$$

and carry out the block multiplication. ✷
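Lemma 3 is easy to verify numerically; in the sketch below the matrices $A$, $B$, $S$ and the vectors $x$, $u$ are random assumed values.

```python
import numpy as np

# Numerical check of Lemma 3 with randomly chosen matrices
# (the dimensions are arbitrary assumptions for the sketch).
rng = np.random.default_rng(1)
n, m, k = 3, 2, 4
A = rng.standard_normal((k, n))
B = rng.standard_normal((k, m))
S = rng.standard_normal((k, k))
x = rng.standard_normal(n)
u = rng.standard_normal(m)

lhs = (A @ x + B @ u) @ S @ (A @ x + B @ u)

M = np.block([[A.T @ S @ A, A.T @ S @ B],
              [B.T @ S @ A, B.T @ S @ B]])
xu = np.concatenate([x, u])
rhs = xu @ M @ xu
print(np.isclose(lhs, rhs))              # True
```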

We have now studied properties of quadratic forms and a single lemma (3). Let us now focus on the problem of finding a minimum (or similarly a maximum) of a quadratic form.

Lemma 4: Assume $u$ is a vector of decisions (or control actions) and $x$ is a vector containing known state variables. Consider the quadratic form:

$$J = \begin{bmatrix} x \\ u \end{bmatrix}^T \begin{bmatrix} h_{11} & h_{12} \\ h_{12}^T & h_{22} \end{bmatrix} \begin{bmatrix} x \\ u \end{bmatrix} \tag{A.17}$$

There exists no minimum if $h_{22}$ is not a positive semidefinite matrix. If $h_{22}$ is positive definite, then the quadratic form has a minimum for

$$u^* = -h_{22}^{-1} h_{12}^T x \tag{A.18}$$

and the minimum is

$$J^* = x^T S x \tag{A.19}$$

where

$$S = h_{11} - h_{12} h_{22}^{-1} h_{12}^T \tag{A.20}$$

If $h_{22}$ is only positive semidefinite, then there exist infinitely many solutions with the same value of $J$. ✷

Proof: Firstly we have

$$J = \begin{bmatrix} x \\ u \end{bmatrix}^T \begin{bmatrix} h_{11} & h_{12} \\ h_{12}^T & h_{22} \end{bmatrix} \begin{bmatrix} x \\ u \end{bmatrix} \tag{A.21}$$

$$= x^T h_{11} x + 2 x^T h_{12} u + u^T h_{22} u \tag{A.22}$$

and then

$$\frac{\partial}{\partial u} J = 2 h_{22} u + 2 h_{12}^T x \tag{A.23}$$

$$\frac{\partial^2}{\partial u^2} J = 2 h_{22} \tag{A.24}$$

If $h_{22}$ is positive definite, then $J$ has a minimum for:

$$u^* = -h_{22}^{-1} h_{12}^T x \tag{A.25}$$

which, introduced in the expression for the performance index, gives:

$$J^* = x^T h_{11} x + 2 x^T h_{12} u^* + (u^*)^T h_{22} u^*$$
$$= x^T h_{11} x - 2 x^T h_{12} h_{22}^{-1} h_{12}^T x + x^T h_{12} h_{22}^{-1} h_{22} h_{22}^{-1} h_{12}^T x$$
$$= x^T h_{11} x - x^T h_{12} h_{22}^{-1} h_{12}^T x$$
$$= x^T \left( h_{11} - h_{12} h_{22}^{-1} h_{12}^T \right) x \qquad ✷$$
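Lemma 4 can also be checked numerically under the stated assumption that $h_{22}$ is positive definite; all matrices below are randomly generated assumptions for the sketch.

```python
import numpy as np

# Numerical check of Lemma 4: compare the closed-form minimizer and
# minimum with direct evaluation. h22 is made positive definite by
# construction; all values are random assumptions.
rng = np.random.default_rng(2)
n, m = 3, 2
h11 = rng.standard_normal((n, n)); h11 = h11 @ h11.T
h12 = rng.standard_normal((n, m))
h22 = rng.standard_normal((m, m)); h22 = h22 @ h22.T + np.eye(m)  # positive definite
x = rng.standard_normal(n)

def J(u):
    return x @ h11 @ x + 2 * x @ h12 @ u + u @ h22 @ u

u_star = -np.linalg.solve(h22, h12.T @ x)          # minimizer, cf. (A.18)
S = h11 - h12 @ np.linalg.solve(h22, h12.T)        # Schur complement, cf. (A.20)

print(J(u_star), x @ S @ x)                        # equal, cf. (A.19)
# perturbing u away from u_star can only increase J when h22 > 0
print(all(J(u_star + 0.1 * rng.standard_normal(m)) >= J(u_star) for _ in range(5)))
```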

Appendix B

Static Optimization

Optimization is short for either maximization or minimization. In this section we will focus on minimization and just give the results for the maximization case.

A maximization problem can easily be transformed into a minimization problem. Let $J(z)$ be an objective function which has to be maximized w.r.t. $z$. The maximum can then also be found as the minimum of $-J(z)$.

B.1 Unconstrained Optimization

The simplest class of minimization problems is the unconstrained minimization problem. Let $z \in \mathcal{D} \subseteq \mathbb{R}^n$ and let $J(z) \in \mathbb{R}$ be a scalar valued function or a performance index which we have to minimize, i.e. we must find a vector $z^* \in \mathcal{D}$ that satisfies

$$J(z^*) \leq J(z) \quad \forall z \in \mathcal{D} \tag{B.1}$$

If $z^*$ is the only solution, i.e. if

$$J(z^*) < J(z) \quad \forall z \in \mathcal{D},\ z \neq z^* \tag{B.2}$$

we say that $z^*$ is a unique global minimum of $J(z)$ in $\mathcal{D}$, and often we write

$$z^* = \arg \min_{z \in \mathcal{D}} J(z)$$

In the case where there exist several $z^* \in \mathcal{Z} \subseteq \mathcal{D}$ satisfying (B.1), we say that $J(z)$ attains its global minimum in the set $\mathcal{Z}$.


A vector $z_l$ is said to be a local minimum in $\mathcal{D}$ if there exists a set $\mathcal{D}_l \subseteq \mathcal{D}$ containing $z_l$ such that

$$J(z_l) \leq J(z) \quad \forall z \in \mathcal{D}_l \tag{B.3}$$

Optimality conditions are available when $J$ is a differentiable function and $\mathcal{D}_l$ is a convex subset of $\mathbb{R}^n$. In that case we can approximate $J$ by its Taylor expansion

$$J = J(z_o) + \frac{d}{dz} J(z_o)\, \tilde{z} + \frac{1}{2} \tilde{z}^T \frac{d^2}{dz^2} J(z_o)\, \tilde{z} + \varepsilon$$

where $\tilde{z} = z - z_o$.

A necessary condition for $z^*$ being a local optimum is that the point is stationary, i.e.

$$\frac{d}{dz} J(z^*) = 0 \tag{B.4}$$

If furthermore

$$\frac{d^2}{dz^2} J(z^*) > 0$$

then (B.4) is also a sufficient condition. If the point is stationary but the Hessian is only positive semidefinite, then additional information is needed to establish whether the point is a minimum. If e.g. $J$ is a linear function, then the Hessian is zero and no minimum exists (in the unconstrained case).

In a maximization problem (B.4) is a sufficient condition if

$$\frac{d^2}{dz^2} J(z^*) < 0$$
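As a small illustration of these conditions, the sketch below finds the stationary point of an assumed quadratic performance index and verifies that the Hessian is positive definite, so the point is a minimum.

```python
import numpy as np

# Sketch of the optimality test (B.4) for an assumed quadratic index
# J(z) = 0.5 z^T Q z + b^T z, whose Hessian d^2J/dz^2 = Q is constant.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

z_star = np.linalg.solve(Q, -b)          # stationary point: dJ/dz = Q z + b = 0
print(z_star)

eig = np.linalg.eigvalsh(Q)              # Q is symmetric: inspect its eigenvalues
print(np.all(eig > 0))                   # True: z_star is a (unique) minimum
# for a maximization problem one would instead require all eigenvalues < 0
```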
