(1)

Static and Dynamic Optimization (42111)

Section for Dynamical Systems, Build. 303b, room 048
Dept. of Applied Mathematics and Computer Science, The Technical University of Denmark
Email: nkpo@dtu.dk, phone: +45 4525 3356, mobile: +45 9351 1161

2019-11-24 14:37

Lecture 12: Stochastic Dynamic Programming

(2)

Outline of lecture

Recap L11: Deterministic Dynamic Programming (D)
Dynamic Programming (C)
Stochastics (random variables)
Stochastic Dynamic Programming
Booking profiles
Stochastic Bellman equation
Stochastic optimal stepping (SDD)

Reading guidance: DO pp. 83-92.

(3)

Dynamic Programming (D)

Find a sequence of decisions $u_i$, $i = 0, 1, \ldots, N-1$, which takes the system

$$x_{i+1} = f_i(x_i, u_i) \qquad x_0 = \underline{x}_0$$

along a trajectory, such that the cost function

$$J = \phi(x_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i)$$

is minimized.

(4)

Dynamic Programming

The Bellman function (the optimal cost to go) is defined as:

$$V_i(x_i) = \min_{u_i^{N-1}} J_i(x_i, u_i^{N-1})$$

(where $u_i^{N-1}$ denotes the decision sequence $u_i, \ldots, u_{N-1}$) and is a function of the present state, $x_i$, and index, $i$. In particular

$$V_N(x_N) = \phi_N(x_N)$$

Theorem
The Bellman function $V_i$ is given by the backwards recursion

$$V_i(x_i) = \min_{u_i} \Big[ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) \Big] \qquad x_{i+1} = f_i(x_i, u_i) \quad x_0 = \underline{x}_0$$

with the boundary condition

$$V_N(x_N) = \phi_N(x_N)$$

The Bellman equation is a functional equation; it gives a sufficient condition, and $V_0(x_0) = J$.

(5)

Dynamic Programming

The optimal decision:

$$u_i = \arg\min_{u_i} \Big[\, \underbrace{L_i(x_i, u_i) + V_{i+1}\big(\underbrace{f_i(x_i, u_i)}_{x_{i+1}}\big)}_{W_i(x_i, u_i)} \,\Big]$$

If a maximization problem: min → max.

(6)

Type of solutions

[Figure: the Bellman function $V_t(x)$ as a surface over the state $x$ and time index $i$.]

Fish bone method (graphical method)
Schematic method (tables) → programming (see the sketch below)
Analytical (e.g. separation of variables)

Analytical: guess the type of functionality in $V_i(x)$, i.e. up to a number of parameters. Check whether it satisfies the Bellman equation. This results in a (number of) recursion(s) for the parameter(s).
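The schematic method is exactly what one implements in code: tables of $W_i$, $V_i$ and $u_i$ filled in backwards. Below is a minimal Python sketch for the deterministic case; the names (`solve_dp`, `f`, `L`, `phi`, `states`, `decisions`) are illustrative placeholders of mine, assuming finite state and decision spaces.

```python
# Deterministic DP: backward recursion V_i(x) = min_u [L_i(x,u) + V_{i+1}(f_i(x,u))].
# f, L, phi are placeholder callables for dynamics, stage cost and terminal cost.

def solve_dp(states, decisions, f, L, phi, N):
    V = {x: phi(x) for x in states}             # boundary: V_N(x) = phi_N(x)
    policy = []                                  # u_i(x) for i = 0, ..., N-1
    for i in reversed(range(N)):
        Vi, ui = {}, {}
        for x in states:
            best_W, best_u = float("inf"), None
            for u in decisions:
                x_next = f(i, x, u)
                if x_next not in V:              # infeasible transition
                    continue
                W = L(i, x, u) + V[x_next]       # W_i(x, u)
                if W < best_W:
                    best_W, best_u = W, u
            Vi[x], ui[x] = best_W, best_u
        V = Vi
        policy.insert(0, ui)
    return V, policy                             # V_0 and the feedback policy
```

Each pass over `states` fills one column of the scheme; the stored `policy` is the feedback $u_i(x_i)$ used in the stepping example later in the lecture.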

(7)

Continuous Dynamic Programming

Find the input function $u_t$, $t \in \mathbb{R}$ (more precisely $\{u\}_0^T$), that takes the system

$$\dot{x} = f_t(x_t, u_t) \qquad x_0 = \underline{x}_0 \qquad t \in [0, T] \qquad (1)$$

such that the cost function

$$J = \phi_T(x_T) + \int_0^T L_t(x_t, u_t)\,dt \qquad (2)$$

is minimized. Define the truncated performance index (cost to go)

$$J_t(x_t, \{u\}_t^T) = \phi_T(x_T) + \int_t^T L_s(x_s, u_s)\,ds$$

The Bellman function (optimal cost to go) is defined by

$$V_t(x_t) = \min_{\{u\}_t^T} \Big[ J_t(x_t, \{u\}_t^T) \Big]$$

We have the following theorem, which states a sufficient condition.

Theorem
The Bellman function $V_t(x_t)$ satisfies the Hamilton-Jacobi-Bellman equation

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right] \qquad (3)$$

This is a PDE with boundary condition

$$V_T(x_T) = \phi_T(x_T)$$


(8)

Continuous Dynamic Programming

Proof.

In discrete time we have the Bellman equation

$$\bar{V}_i(x_i) = \min_{u_i} \Big[ \bar{L}_i(x_i, u_i) + \bar{V}_{i+1}(x_{i+1}) \Big]$$

with the boundary condition

$$\bar{V}_N(x_N) = \phi_N(x_N)$$

Identify the time instant $t$ with index $i$ and $t + \Delta t$ with index $i + 1$. Then

$$V_t(x_t) = \min_{u_t} \left[ \int_t^{t+\Delta t} L_t(x_t, u_t)\,dt + V_{t+\Delta t}(x_{t+\Delta t}) \right]$$

Apply a Taylor expansion on $V_{t+\Delta t}(x_{t+\Delta t})$:

$$V_t(x_t) = \min_{u_t} \Big[ L_t(x_t, u_t)\,\Delta t + V_t(x_t) + \frac{\partial V_t(x_t)}{\partial x} f_t\,\Delta t + \frac{\partial V_t(x_t)}{\partial t}\,\Delta t + o(|\Delta t|) \Big]$$

(9)

Continuous Dynamic Programming

Proof.

$$V_t(x_t) = \min_{u_t} \Big[ L_t(x_t, u_t)\,\Delta t + V_t(x_t) + \frac{\partial V_t(x_t)}{\partial x} f_t\,\Delta t + \frac{\partial V_t(x_t)}{\partial t}\,\Delta t + o(|\Delta t|) \Big]$$

(just a copy)

Collect the terms which do not depend on the decision ($u_t$):

$$V_t(x_t) = V_t(x_t) + \frac{\partial V_t(x_t)}{\partial t}\,\Delta t + \min_{u_t} \Big[ L_t(x_t, u_t)\,\Delta t + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t)\,\Delta t \Big] + o(|\Delta t|)$$

In the limit $\Delta t \to 0$ (and after dividing by $\Delta t$):

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right]$$

(10)

The HJB equation:

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right]$$

(just a copy)

The Hamiltonian function is

$$H_t(x_t, u_t, \lambda_t^T) = L_t(x_t, u_t) + \lambda_t^T f_t(x_t, u_t)$$

The HJB equation can also be formulated as

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} H_t\!\left(x_t, u_t, \frac{\partial V_t(x_t)}{\partial x}\right)$$

Link to Pontryagin's maximum principle: with

$$\lambda_t^T = \frac{\partial V_t(x_t)}{\partial x}$$

we recover

$$\dot{x}_t = f_t(x_t, u_t) \qquad \text{(state equation)}$$

$$-\dot{\lambda}_t^T = \frac{\partial}{\partial x_t} H_t \qquad \text{(costate equation)}$$

$$u_t = \arg\min_{u_t} [H_t] \qquad \text{(optimality condition)}$$

(11)

Motion control

Consider the system

$$\dot{x}_t = u_t \qquad x_0 = \underline{x}_0$$

and the performance index

$$J = \frac{1}{2} p\, x_T^2 + \int_0^T \frac{1}{2} u_t^2\,dt$$

The HJB equation, (3), gives:

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ \frac{1}{2} u_t^2 + \frac{\partial V_t(x_t)}{\partial x} u_t \right] \qquad V_T(x_T) = \frac{1}{2} p\, x_T^2$$

The minimization can be carried out (stationarity: $u_t + \partial V_t/\partial x = 0$) and gives a solution w.r.t. $u_t$, which is

$$u_t = -\frac{\partial V_t(x_t)}{\partial x}$$

So if the Bellman function is known, the control action (the decision) can be determined from this.

If the result above is inserted in the HJB equation we get

$$-\frac{\partial V_t(x_t)}{\partial t} = \frac{1}{2} \left( \frac{\partial V_t(x_t)}{\partial x} \right)^2 - \left( \frac{\partial V_t(x_t)}{\partial x} \right)^2 = -\frac{1}{2} \left( \frac{\partial V_t(x_t)}{\partial x} \right)^2$$

which is a partial differential equation with the boundary condition

$$V_T(x_T) = \frac{1}{2} p\, x_T^2$$

(12)

PDE:

$$-\frac{\partial V_t(x_t)}{\partial t} = -\frac{1}{2} \left( \frac{\partial V_t(x_t)}{\partial x} \right)^2$$

(just a copy)

Inspired by the boundary condition we guess a candidate function of the type

$$V_t(x) = \frac{1}{2} s_t x^2$$

where the time dependence is in the function $s_t$. Since

$$\frac{\partial V}{\partial x} = s_t x \qquad \frac{\partial V}{\partial t} = \frac{1}{2} \dot{s}_t x^2$$

the following equation

$$-\frac{1}{2} \dot{s}_t x^2 = -\frac{1}{2} (s_t x)^2$$

must be valid for any $x$, i.e. we can find $s_t$ by solving the ODE

$$\dot{s}_t = s_t^2 \qquad s_T = p$$

backwards. This is actually (a simple version of) the continuous-time Riccati equation. The solution can be found analytically or by means of numerical methods. Knowing the function $s_t$, we can find the control input

$$u_t = -\frac{\partial V_t(x_t)}{\partial x} = -s_t x_t$$
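For this scalar Riccati equation, separation of variables gives the closed form $s_t = p / (1 + p\,(T - t))$. The sketch below (Python; the horizon, weight and grid size are illustrative choices of mine, not from the notes) instead integrates the ODE backwards with an explicit Euler scheme, cross-checks against the closed form, and forms the feedback $u_t = -s_t x_t$:

```python
import numpy as np

# Integrate the Riccati ODE  s' = s^2,  s(T) = p  backwards in time
# (explicit Euler), then form the feedback u_t = -s_t * x_t.
T, p, n = 10.0, 1.0, 1000           # illustrative horizon, weight, grid size
dt = T / n
t = np.linspace(0.0, T, n + 1)
s = np.empty(n + 1)
s[n] = p                             # terminal condition s_T = p
for k in range(n, 0, -1):
    s[k - 1] = s[k] - dt * s[k]**2   # one Euler step backwards

# Cross-check against the closed form s_t = p / (1 + p*(T - t)).
assert np.allclose(s, p / (1.0 + p * (T - t)), atol=1e-2)

def u(k, x):
    """Optimal feedback at grid point t_k and state x."""
    return -s[k] * x
```

Solving backwards from the terminal condition mirrors the backward recursion of the discrete-time case.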

(13)

Stochastic Dynamic Programming

(14)

The Bank loan

Deterministic:

$$x_{i+1} = (1 + r)\,x_i - u_i \qquad x_0 = \underline{x}_0$$

Stochastic:

$$x_{i+1} = (1 + r_i)\,x_i - u_i \qquad x_0 = \underline{x}_0$$

[Figure: rate of interest (%) versus time (months).]

(15)

[Figures: simulated bank balance versus time (years).]

(16)

Discrete Random Variable

$$X \in \{x_1, x_2, \ldots, x_m\} \subset \mathbb{R}^n \qquad p_k = P\{X = x_k\} \ge 0 \qquad \sum_{k=1}^m p_k = 1$$

[Figure: example of a probability mass function $p_k$.]

$$E\{X\} = \sum_{k=1}^m p_k x_k \qquad E\{g(X)\} = \sum_{k=1}^m p_k\, g(x_k)$$
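A tiny numeric illustration of the two sums (the die example is my own, not from the slides):

```python
# Expectation of a discrete random variable X and of a function g(X).
# Example: a fair six-sided die, g(x) = x^2.
xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6

EX = sum(p * x for p, x in zip(ps, xs))      # E{X} = 3.5
Eg = sum(p * x**2 for p, x in zip(ps, xs))   # E{X^2} = 91/6 ≈ 15.17
print(EX, Eg)
```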

(17)

Stochastic Dynamic Programming

Consider the problem of minimizing (in some sense):

$$J = \phi_N(x_N, e_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i, e_i)$$

subject to

$$x_{i+1} = f_i(x_i, u_i, e_i) \qquad x_0 = \underline{x}_0$$

and the constraints

$$(x_i, u_i, e_i) \in \mathbb{V}_i \qquad (x_N, e_N) \in \mathbb{V}_N$$

$e_i$ might be vectors reflecting model errors or direct stochastic effects.

(18)

Ranking performance indexes

When $e_i$ and other quantities are stochastic variables, what do we mean by one strategy being better than another?

In a deterministic situation we mean that

$$J_1 > J_2$$

($J_1$ and $J_2$ being the objective functions for strategies 1 and 2). In a stochastic situation we can choose the definition

$$E\{J_1\} > E\{J_2\}$$

but others do exist. This choice reflects some kind of average consideration.

(19)

Example: Booking profiles

Normally a plane is overbooked, i.e. more tickets are sold than the number of seats, $\bar{x}_N$. Let $x_i$ be the number of sold tickets at the beginning of day $i$, for days $0, 1, 2, \ldots, N$.

If $x_N < \bar{x}_N$ we have empty seats: money out the window.
If $x_N > \bar{x}_N$ we have to pay compensations: also money out the window.

So we want to find a strategy that minimizes

$$E\Big\{ \phi(x_N - \bar{x}_N) \Big\}$$

Let $w_i$ be the requests for a ticket on day $i$ (with probability $P\{w_i = k\} = p_k$) and let $v_i$ be the number of cancellations on day $i$ (with probability $P\{v_i = k\} = q_k$).

Dynamics:

$$x_{i+1} = x_i + \min(u_i, w_i) - v_i \qquad e_i = \begin{bmatrix} w_i \\ v_i \end{bmatrix}$$

Decision information: $u_i(x_i)$.
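To make the dynamics concrete, here is a toy simulation; the distributions, seed, horizon and the constant booking limit are all invented for illustration (a real policy $u_i(x_i)$ would come from the DP recursion on the next slides):

```python
import random

# Simulate the booking dynamics x_{i+1} = x_i + min(u_i, w_i) - v_i.
random.seed(1)
x, N = 0, 30                       # sold tickets, horizon in days
for i in range(N):
    w = random.randint(0, 5)       # ticket requests w_i (invented distribution)
    v = random.randint(0, 2)       # cancellations v_i (invented distribution)
    u = 3                          # booking limit u_i(x_i); constant here
    x = max(0, x + min(u, w) - v)  # clamp at 0 for this toy simulation
print("tickets sold at day N:", x)
```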

(20)

Stochastic Bellman Equation

Consider the problem of minimizing:

$$J = E\Big\{ \phi_N(x_N, e_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i, e_i) \Big\}$$

subject to

$$x_{i+1} = f_i(x_i, u_i, e_i) \qquad x_0 = \underline{x}_0$$

and the constraints

$$(x_i, u_i, e_i) \in \mathbb{V}_i \qquad (x_N, e_N) \in \mathbb{V}_N$$

Theorem
The Bellman function (optimal cost to go), $V_i(x_i)$, is given by the (backward) recursion

$$V_i(x_i) = \min_{u_i} E\Big\{ L_i(x_i, u_i, e_i) + V_{i+1}(x_{i+1}) \Big\} \qquad x_{i+1} = f_i(x_i, u_i, e_i)$$

with

$$V_N(x_N) = E\Big\{ \phi_N(x_N, e_N) \Big\}$$

where the optimization is subject to the constraints and the available information.

(21)

Discrete (SDD) case

If $e_i$ is discrete, i.e.

$$e_i \in \{e_i^1, e_i^2, \ldots, e_i^m\} \qquad p_i^k = P\{e_i = e_i^k\} \qquad k = 1, 2, \ldots, m$$

then the stochastic Bellman equation can be expressed as

$$V_i(x_i) = \min_{u_i} \underbrace{\sum_{k=1}^m p_i^k \Big[ L_i(x_i, u_i, e_i^k) + V_{i+1}\big(\underbrace{f_i(x_i, u_i, e_i^k)}_{x_{i+1}}\big) \Big]}_{W_i(x_i, u_i)}$$

with boundary condition

$$V_N(x_N) = \sum_{k=1}^m p_N^k\, \phi_N(x_N, e_N^k)$$

The entries in the scheme below are now expected values (i.e. weighted sums):

 W_i             u_i
 x_i       0    1    2    3    V_i(x_i)   u_i(x_i)
  0
  1
  2
  3
  4
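In code, the only change relative to the deterministic recursion is that $W_i$ becomes a probability-weighted sum. A minimal sketch of one backward step (Python; `p`, `f`, `L` and the other names are placeholders of mine, with `p(i, x)` yielding `(e, prob)` pairs):

```python
# Stochastic DP with discrete noise: one backward step of the recursion.
def backward_step(i, states, decisions, p, f, L, V_next):
    V, policy = {}, {}
    for x in states:
        best_W, best_u = float("inf"), None
        for u in decisions:
            # W_i(x, u) = sum_k p_i^k [ L_i(x, u, e^k) + V_{i+1}(f_i(x, u, e^k)) ]
            W, feasible = 0.0, True
            for e, prob in p(i, x):
                x_next = f(i, x, u, e)
                if x_next not in V_next:      # state constraint violated
                    feasible = False
                    break
                W += prob * (L(i, x, u, e) + V_next[x_next])
            if feasible and W < best_W:
                best_W, best_u = W, u
        V[x], policy[x] = best_W, best_u
    return V, policy
```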

(22)

Optimal stochastic stepping (SDD)

Consider the system

$$x_{i+1} = x_i + u_i + e_i \qquad x_0 = 2$$

where

$$e_i \in \{-1, 0, 1\} \qquad u_i \in \{-1, 0, 1\} \qquad x_i \in \{-2, -1, 0, 1, 2\}$$

and the noise distribution $p_i^k$ depends on the state $x_i$:

 p_i^k          e_i
 x_i      -1    0    1
 -2        0   1/2  1/2
 -1        0   1/2  1/2
  0       1/2   0   1/2
  1       1/2  1/2   0
  2       1/2  1/2   0

The cost function is

$$J = E\Big\{ x_4^2 + \sum_{i=0}^{3} x_i^2 + u_i^2 \Big\}$$

Notice: no stochastic components in the cost.

(23)

Optimal stochastic stepping (SDD)

Firstly, from

$$J = E\Big\{ x_4^2 + \sum_{i=0}^{3} x_i^2 + u_i^2 \Big\}$$

(no stochastics in the cost) we establish $V_4(x_4) = x_4^2$. We are assuming perfect state information.

 x_4   V_4
 -2     4
 -1     1
  0     0
  1     1
  2     4

(24)

Optimal stochastic stepping (SDD)

Then we establish the $W_3(x_3, u_3)$ function (the cost to go):

$$W_3(x_3, u_3) = \sum_{k=1}^m p_3^k \Big[ L_3(x_3, u_3, e_3^k) + V_4\big(f_3(x_3, u_3, e_3^k)\big) \Big]$$

With $L_3(x_3, u_3, e_3^k) = x_3^2 + u_3^2$ and $f_3(x_3, u_3, e_3^k) = x_3 + u_3 + e^k$ this is

$$W_3(x_3, u_3) = p_3^1 \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e^1) \big] + p_3^2 \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e^2) \big] + p_3^3 \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e^3) \big]$$

or more compactly:

$$W_3(x_3, u_3) = x_3^2 + u_3^2 + p_3^1 V_4(x_3 + u_3 + e^1) + p_3^2 V_4(x_3 + u_3 + e^2) + p_3^3 V_4(x_3 + u_3 + e^3)$$

(25)

Optimal stochastic stepping (SDD)

$$W_3(x_3, u_3) = \sum_{k=1}^3 p^k \Big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e^k) \Big]$$

For example, for $x_3 = 0$ and $u_3 = -1$ (the pairs $(e^k, p^k)$ come from the row $x_3 = 0$ of the noise table):

$$W_3(0, -1) = \frac{1}{2}\big[ 0^2 + (-1)^2 + V_4(0 - 1 - 1) \big] + 0 \cdot \big[ 0^2 + (-1)^2 + V_4(0 - 1 + 0) \big] + \frac{1}{2}\big[ 0^2 + (-1)^2 + V_4(0 - 1 + 1) \big]$$

$$= \frac{1}{2}(1 + 4) + 0 + \frac{1}{2}(1 + 0) = 3$$

 W_3            u_3
 x_3      -1    0    1
 -2        ∞   6.5  5.5
 -1       4.5  1.5  2.5
  0        3    1    3
  1       2.5  1.5  4.5
  2       5.5  3.5   ∞

 x_4   V_4
 -2     4
 -1     1
  0     0
  1     1
  2     4

(just for reference)

(26)

Optimal stochastic stepping (SDD)

 W_3            u_3            V_3(x_3)   u_3(x_3)
 x_3      -1    0     1
 -2        ∞   6.5   5.5         5.5         1
 -1       4.5  1.5   2.5         1.5         0
  0        3    1     3           1          0
  1       2.5  1.5   4.5         1.5         0
  2       5.5  3.5    ∞          3.5         0

 W_2            u_2            V_2(x_2)   u_2(x_2)
 x_2      -1    0     1
 -2        ∞   7.5   6.25        6.25        1
 -1       5.5  2.25  3.25        2.25        0
  0       4.25 1.5   3.25        1.5         0
  1       3.25 2.25  4.5         2.25        0
  2       6.25 6.5    ∞          6.25       -1

(27)

Optimal stochastic stepping (SDD)

 W_1            u_1            V_1(x_1)   u_1(x_1)
 x_1      -1    0     1
 -2        ∞   8.25  6.88        6.88        1
 -1       6.25 2.88  3.88        2.88        0
  0       4.88 2.25  4.88        2.25        0
  1       3.88 2.88  6.25        2.88        0
  2       6.88 8.25   ∞          6.88       -1

 W_0            u_0            V_0(x_0)   u_0(x_0)
 x_0      -1    0     1
  2       7.56 8.88   ∞          7.56       -1

Trace back $u_i(x_i)$: a feedback solution, not a time function.
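As a cross-check, a short Python script (variable names are mine) that runs the discrete stochastic recursion for this example and reproduces the tables above, e.g. $V_0(2) = 7.56$ with $u_0(2) = -1$:

```python
# Backward recursion for the stepping example; reproduces the W/V tables.
# State-dependent noise distribution P{e_i = e | x_i} from the slide.
STATES = [-2, -1, 0, 1, 2]
DECISIONS = [-1, 0, 1]
P = {-2: {0: .5, 1: .5}, -1: {0: .5, 1: .5},
      0: {-1: .5, 1: .5}, 1: {-1: .5, 0: .5}, 2: {-1: .5, 0: .5}}
N = 4
V = {x: x**2 for x in STATES}                  # V_4(x) = x^2
policy = {}
for i in reversed(range(N)):
    Vi = {}
    for x in STATES:
        best_W, best_u = float("inf"), None
        for u in DECISIONS:
            nxt = [(x + u + e, p) for e, p in P[x].items()]
            if any(xn not in V for xn, _ in nxt):
                continue                        # leaves the state space: infeasible
            W = x**2 + u**2 + sum(p * V[xn] for xn, p in nxt)
            if W < best_W:
                best_W, best_u = W, u
        Vi[x], policy[(i, x)] = best_W, best_u
    V = Vi
print(V[2], policy[(0, 2)])                     # 7.5625 (7.56 in the table), -1
```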

(28)

Deterministic setting ($x_{i+1} = x_i + u_i$, $i = 0, \ldots, 3$):

 i     0   1   2   3
 u_i  -1   0   0   0

Stochastic setting ($x_{i+1} = x_i + u_i + e_i$, $i = 0, \ldots, 3$):

 x_0  u_0    x_1  u_1    x_2  u_2    x_3  u_3
             -2    1     -2    1     -2    1
             -1    0     -1    0     -1    0
              0    0      0    0      0    0
              1    0      1    0      1    0
  2   -1      2   -1      2   -1      2    0

(Only $x_0 = 2$ occurs at $i = 0$, since the initial state is given.)

(29)

Concluding remarks

Discrete state and decision space.

Approximation: grid covering the state and decision space.

Curse of dimensionality: combinatorial explosion.
