Static and Dynamic Optimization (42111)
Building 303b, room 048, Section for Dynamical Systems
Dept. of Applied Mathematics and Computer Science The Technical University of Denmark
Email: nkpo@dtu.dk phone: +45 4525 3356 mobile: +45 9351 1161
2019-11-24 14:37
Lecture 12: Stochastic Dynamic Programming
Outline of lecture
Recap L11: Deterministic Dynamic Programming (D), Dynamic Programming (C)
Stochastics (random variables)
Stochastic Dynamic Programming
Booking profiles
Stochastic Bellman equation
Stochastic optimal stepping (SDD)

Reading guidance: DO p. 83-92.
Dynamic Programming (D)
Find a sequence of decisions u_i, i = 0, 1, ..., N-1, which takes the system

    x_{i+1} = f_i(x_i, u_i),    x_0 = x^0

along a trajectory, such that the cost function

    J = φ_N(x_N) + Σ_{i=0}^{N-1} L_i(x_i, u_i)

is minimized.
Dynamic Programming
The Bellman function (the optimal cost to go) is defined as:

    V_i(x_i) = min_{u_i^{N-1}} J_i(x_i, u_i^{N-1})

and is a function of the present state, x_i, and index, i. In particular

    V_N(x_N) = φ_N(x_N)
Theorem
The Bellman function V_i is given by the backwards recursion

    V_i(x_i) = min_{u_i} [ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) ]

    x_{i+1} = f_i(x_i, u_i),    x_0 = x^0

with the boundary condition

    V_N(x_N) = φ_N(x_N)

The Bellman equation is a functional equation; it gives a sufficient condition, and V_0(x_0) = J*.
Dynamic programming
    u_i* = arg min_{u_i} [ L_i(x_i, u_i) + V_{i+1}(f_i(x_i, u_i)) ]

where f_i(x_i, u_i) = x_{i+1} and the bracketed expression is W_i(x_i, u_i).

If a maximization problem: min → max.
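The backward recursion translates directly into a small table-based program. Below is a minimal sketch in Python; the horizon, state grid, decisions, dynamics and costs are made-up illustration values (not from the lecture), and decisions that would leave the state grid are simply skipped:

```python
# Tabular backward DP: V_i(x) = min_u [ L_i(x,u) + V_{i+1}(f_i(x,u)) ], V_N(x) = phi(x).
# All problem data below are illustrative assumptions.

def backward_dp(N, states, decisions, f, L, phi):
    """Return the Bellman tables V[i][x] and the optimal policies u_opt[i][x]."""
    V = {N: {x: phi(x) for x in states}}
    u_opt = {}
    for i in range(N - 1, -1, -1):
        V[i], u_opt[i] = {}, {}
        for x in states:
            # W_i(x,u) = L_i(x,u) + V_{i+1}(f_i(x,u)); skip u if f_i(x,u) leaves the grid
            W = {u: L(x, u) + V[i + 1][f(x, u)]
                 for u in decisions if f(x, u) in states}
            u_best = min(W, key=W.get)
            V[i][x], u_opt[i][x] = W[u_best], u_best
    return V, u_opt

V, u_opt = backward_dp(
    N=3,
    states=[0, 1, 2],
    decisions=[-1, 0, 1],
    f=lambda x, u: x + u,            # dynamics x_{i+1} = x_i + u_i
    L=lambda x, u: x**2 + u**2,      # stage cost
    phi=lambda x: x**2,              # terminal cost
)
print(V[0], u_opt[0])
```

Running the sketch prints the optimal cost to go and the optimal feedback decision for every initial state.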
Type of solutions
[Figure: the Bellman function V_t(x) plotted against the state x and the time index i.]
Fish bone method (graphical method)
Schematic method (tables) -> programming
Analytical (e.g. separation of variables)

Analytical:
Guess the functional form of V_i(x), i.e. up to a number of parameters. Check whether it satisfies the Bellman equation. This results in a (number of) recursion(s) for the parameter(s).
Continuous Dynamic Programming
Find the input function u_t, t ∈ R (more precisely {u}_0^T), that takes the system

    ẋ = f_t(x_t, u_t),    x_0 = x^0,    t ∈ [0, T]        (1)

such that the cost function

    J = φ_T(x_T) + ∫_0^T L_t(x_t, u_t) dt                  (2)

is minimized. Define the truncated performance index (cost to go)

    J_t(x_t, {u}_t^T) = φ_T(x_T) + ∫_t^T L_s(x_s, u_s) ds

The Bellman function (optimal cost to go) is defined by

    V_t(x_t) = min_{{u}_t^T} J_t(x_t, {u}_t^T)

We have the following theorem, which states a sufficient condition.
Theorem
The Bellman function V_t(x_t) satisfies the equation

    -∂V_t(x_t)/∂t = min_{u_t} [ L_t(x_t, u_t) + (∂V_t(x_t)/∂x) f_t(x_t, u_t) ]        (3)

(the Hamilton-Jacobi-Bellman equation). This is a PDE with the boundary condition

    V_T(x_T) = φ_T(x_T)
Continuous Dynamic Programming
Proof.
In discrete time we have the Bellman equation

    V̄_i(x_i) = min_{u_i} [ L̄_i(x_i, u_i) + V̄_{i+1}(x_{i+1}) ]

with the boundary condition

    V̄_N(x_N) = φ_N(x_N)

Identify the index i with the time t and the index i + 1 with t + Δt. Then

    V_t(x_t) = min_{u_t} [ ∫_t^{t+Δt} L_t(x_t, u_t) dt + V_{t+Δt}(x_{t+Δt}) ]

Apply a Taylor expansion on V_{t+Δt}(x_{t+Δt}):

    V_t(x_t) = min_{u_t} [ L_t(x_t, u_t) Δt + V_t(x_t) + (∂V_t(x_t)/∂x) f_t Δt + (∂V_t(x_t)/∂t) Δt + o(|Δt|) ]
Continuous Dynamic Programming
Proof.
    V_t(x_t) = min_{u_t} [ L_t(x_t, u_t) Δt + V_t(x_t) + (∂V_t(x_t)/∂x) f_t Δt + (∂V_t(x_t)/∂t) Δt + o(|Δt|) ]

(just a copy)

Collect the terms which do not depend on the decision (u_t):

    V_t(x_t) = V_t(x_t) + (∂V_t(x_t)/∂t) Δt + min_{u_t} [ L_t(x_t, u_t) Δt + (∂V_t(x_t)/∂x) f_t(x_t, u_t) Δt ] + o(|Δt|)

In the limit Δt → 0 (and after dividing by Δt):

    -∂V_t(x_t)/∂t = min_{u_t} [ L_t(x_t, u_t) + (∂V_t(x_t)/∂x) f_t(x_t, u_t) ]
The HJB equation:

    -∂V_t(x_t)/∂t = min_{u_t} [ L_t(x_t, u_t) + (∂V_t(x_t)/∂x) f_t(x_t, u_t) ]

(just a copy)

The Hamiltonian function

    H_t(x_t, u_t, λ_t^T) = L_t(x_t, u_t) + λ_t^T f_t(x_t, u_t)

The HJB equation can also be formulated as

    -∂V_t(x_t)/∂t = min_{u_t} H_t(x_t, u_t, ∂V_t(x_t)/∂x)

Link to Pontryagin's maximum principle: with λ_t^T = ∂V_t(x_t)/∂x,

    ẋ_t = f_t(x_t, u_t)               State equation
    -λ̇_t^T = ∂H_t/∂x_t                Costate equation
    u_t* = arg min_{u_t} H_t           Optimality condition
Motion control
Consider the system
    ẋ_t = u_t,    x_0 = x^0

and the performance index

    J = (1/2) p x_T² + ∫_0^T (1/2) u_t² dt

The HJB equation, (3), gives:

    -∂V_t(x_t)/∂t = min_{u_t} [ (1/2) u_t² + (∂V_t(x_t)/∂x) u_t ],    V_T(x_T) = (1/2) p x_T²

The minimization can be carried out and gives the solution w.r.t. u_t:

    u_t = -∂V_t(x_t)/∂x

So if the Bellman function is known, the control action (the decision) can be determined from this.
If the result above is inserted in the HJB equation we get

    -∂V_t(x_t)/∂t = (1/2) (∂V_t(x_t)/∂x)² - (∂V_t(x_t)/∂x)² = -(1/2) (∂V_t(x_t)/∂x)²

which is a partial differential equation with the boundary condition

    V_T(x_T) = (1/2) p x_T²

PDE:

    -∂V_t(x_t)/∂t = -(1/2) (∂V_t(x_t)/∂x)²

(just a copy)
Inspired by the boundary condition we guess a candidate function of the type

    V_t(x) = (1/2) s_t x²

where the time dependence is in the function s_t. Since

    ∂V/∂x = s_t x,    ∂V/∂t = (1/2) ṡ_t x²

the following equation

    -(1/2) ṡ_t x² = -(1/2) (s_t x)²

must be valid for any x, i.e. we can find s_t by solving the ODE

    ṡ_t = s_t²,    s_T = p

backwards. This is actually (a simple version of) the continuous-time Riccati equation. The solution can be found analytically or by means of numerical methods. Knowing the function s_t, we can find the control input

    u_t = -∂V_t(x_t)/∂x = -s_t x_t
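For this scalar Riccati equation a closed form exists: since d(1/s_t)/dt = -ṡ_t/s_t² = -1 and 1/s_T = 1/p, we get s_t = p/(1 + p(T - t)). The numerical route can be sketched with an explicit Euler sweep backwards from the terminal condition (T, p and the step count are arbitrary illustration values):

```python
# Solve s_dot = s^2 with s_T = p backwards in time by explicit Euler,
# and compare with the closed form s_t = p / (1 + p*(T - t)) at t = 0.
T, p, n = 1.0, 2.0, 100_000          # illustrative horizon, weight and step count
dt = T / n

s = p                                # terminal condition s_T = p
for _ in range(n):                   # step backwards: s(t - dt) ~ s(t) - dt * s(t)^2
    s -= dt * s * s

s0_exact = p / (1 + p * T)           # closed-form value at t = 0
print(s, s0_exact)                   # the two values agree to a few decimals
```

The optimal feedback is then u_t = -s_t x_t, with s_t evaluated along the way.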
Stochastic Dynamic Programming
The Bank loan
Deterministic:

    x_{i+1} = (1 + r) x_i - u_i,    x_0 = x^0

Stochastic:

    x_{i+1} = (1 + r_i) x_i - u_i,    x_0 = x^0

[Figures: a realization of the rate of interest (%) over time (months), and two realizations of the bank balance over time (years).]
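The two balance recursions can be simulated side by side. A sketch with made-up numbers (the initial balance, the payment and the interest-rate distribution are illustrative assumptions, not the data behind the plots):

```python
import random

# Bank-loan dynamics: x_{i+1} = (1 + r_i) x_i - u_i, with fixed r (deterministic)
# or randomly varying r_i (stochastic). All numbers are illustrative.

def simulate(x0, u, rates):
    x, path = x0, [x0]
    for r in rates:
        x = (1 + r) * x - u          # one period: interest accrues, then payment u
        path.append(x)
    return path

random.seed(1)
n = 120                              # months
det = simulate(5e4, 600.0, [0.005] * n)
sto = simulate(5e4, 600.0, [random.gauss(0.005, 0.002) for _ in range(n)])
print(det[-1], sto[-1])              # final balances: deterministic vs one random path
```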
Discrete Random Variable
    X ∈ {x^1, x^2, ..., x^m},    x^k ∈ R^n

    p_k = P{X = x^k} ≥ 0,    Σ_{k=1}^m p_k = 1

[Figure: an example of a probability mass function.]

    E{X} = Σ_{k=1}^m p_k x^k

    E{g(X)} = Σ_{k=1}^m p_k g(x^k)
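The two expectation formulas are one line of code each. A sketch with an arbitrary three-point distribution:

```python
# E{X} = sum_k p_k x_k and E{g(X)} = sum_k p_k g(x_k) for a discrete random variable.
# The probabilities and outcomes below are arbitrary illustration values.

def expectation(p, x, g=lambda v: v):
    assert all(pk >= 0 for pk in p) and abs(sum(p) - 1.0) < 1e-9
    return sum(pk * g(xk) for pk, xk in zip(p, x))

p = [0.2, 0.5, 0.3]
x = [1.0, 2.0, 3.0]
print(expectation(p, x))                   # E{X}
print(expectation(p, x, lambda v: v * v))  # E{X^2}
```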
Stochastic Dynamic Programming
Consider the problem of minimizing (in some sense):
    J = φ_N(x_N, e_N) + Σ_{i=0}^{N-1} L_i(x_i, u_i, e_i)

subject to

    x_{i+1} = f_i(x_i, u_i, e_i),    x_0 = x^0

and the constraints

    (x_i, u_i, e_i) ∈ V_i,    (x_N, e_N) ∈ V_N

e_i might be vectors reflecting model errors or direct stochastic effects.
Ranking performance Indexes
When e_i (and hence J) are stochastic variables, what do we mean by saying that one strategy is better than another?

In a deterministic situation we mean that

    J_1 > J_2

(J_1 (J_2) being the objective function for strategy 1 (2)). In a stochastic situation we can choose the definition

    E{J_1} > E{J_2}

but others do exist. This choice reflects some kind of average consideration.
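Under the average criterion, two strategies can be compared by Monte Carlo estimation of E{J_1} and E{J_2}. A sketch on a made-up scalar system (the dynamics, the noise, the horizon and both strategies are illustrative assumptions):

```python
import random

# Compare two feedback strategies by their estimated expected cost
# E{J} with J = x_N^2 + sum_i (x_i^2 + u_i^2), x_{i+1} = x_i + u_i + e_i.

random.seed(0)

def cost(policy, n_steps=10):
    """One realization of J under the given feedback policy."""
    x, J = 2.0, 0.0
    for _ in range(n_steps):
        u = policy(x)
        J += x * x + u * u
        x = x + u + random.choice([-0.5, 0.5])   # illustrative noise e_i
    return J + x * x                              # add terminal cost x_N^2

def mean_cost(policy, n_mc=20_000):
    return sum(cost(policy) for _ in range(n_mc)) / n_mc

J1 = mean_cost(lambda x: 0.0)        # strategy 1: do nothing
J2 = mean_cost(lambda x: -0.5 * x)   # strategy 2: proportional feedback
print(J1, J2)                        # strategy 2 wins under the average criterion
```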
Example: Booking profiles
Normally a plane is overbooked, i.e. more tickets are sold than the number of seats, x̄_N. Let x_i be the number of sold tickets at the beginning of day i, i = 0, 1, 2, ..., N.

If x_N < x̄_N we have empty seats - money out the window.
If x_N > x̄_N we have to pay compensation - also money out the window.

So we want to find a strategy that minimizes:

    E{ φ(x_N - x̄_N) }

Let w_i be the requests for a ticket on day i (with probability P{w_i = k} = p_k), and let v_i be the number of cancellations on day i (with probability P{v_i = k} = q_k).

Dynamics:

    x_{i+1} = x_i + min(u_i, w_i) - v_i,    e_i = (w_i, v_i)

Decision information: u_i(x_i).
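The booking dynamics can be simulated for a given decision rule. A sketch where the capacity, the horizon, the request/cancellation distributions and the booking-limit policy are all illustrative assumptions:

```python
import random

# Booking dynamics x_{i+1} = x_i + min(u_i, w_i) - v_i with a simple
# booking-limit rule u_i(x_i). All distributions and numbers are made up.

random.seed(7)
seats, N = 150, 30                   # seat capacity and days on sale

def simulate(margin):
    """Sell up to capacity + overbooking margin; return tickets sold at departure."""
    x = 0
    for _ in range(N):
        w = random.randint(0, 12)                # ticket requests today
        v = random.randint(0, min(2, x))         # cancellations today (at most x)
        u = max(0, seats + margin - x)           # booking limit left today
        x = x + min(u, w) - v
    return x

print(simulate(0), simulate(10))     # no overbooking vs a 10-seat margin
```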
Stochastic Bellman Equation
Consider the problem of minimizing:
    J = E{ φ_N(x_N, e_N) + Σ_{i=0}^{N-1} L_i(x_i, u_i, e_i) }

subject to

    x_{i+1} = f_i(x_i, u_i, e_i),    x_0 = x^0

and the constraints

    (x_i, u_i, e_i) ∈ V_i,    (x_N, e_N) ∈ V_N
Theorem
The Bellman function (optimal cost to go), V_i(x_i), is given by the (backward) recursion:

    V_i(x_i) = min_{u_i} E{ L_i(x_i, u_i, e_i) + V_{i+1}(x_{i+1}) }

    x_{i+1} = f_i(x_i, u_i, e_i),    V_N(x_N) = E{ φ_N(x_N, e_N) }

where the optimization is subject to the constraints and the available information.
Discrete (SDD) case
If e_i is discrete, i.e.

    e_i ∈ {e_i^1, e_i^2, ..., e_i^m},    p_i^k = P{e_i = e_i^k},    k = 1, 2, ..., m

then the stochastic Bellman equation can be expressed as

    V_i(x_i) = min_{u_i} Σ_{k=1}^m p_i^k [ L_i(x_i, u_i, e_i^k) + V_{i+1}(f_i(x_i, u_i, e_i^k)) ]

where f_i(x_i, u_i, e_i^k) = x_{i+1} and the sum over k is W_i(x_i, u_i), with the boundary condition

    V_N(x_N) = Σ_{k=1}^m p_N^k φ_N(x_N, e_N^k)

The entries in the scheme below are now expected values (i.e. weighted sums):

    W_i          u_i
    x_i       0    1    2    3      V_i(x_i)   u_i*(x_i)
     0
     1
     2
     3
     4
Optimal stochastic stepping (SDD)
Consider the system
    x_{i+1} = x_i + u_i + e_i,    x_0 = 2

where

    e_i ∈ {-1, 0, 1},    u_i ∈ {-1, 0, 1}*,    x_i ∈ {-2, -1, 0, 1, 2}

and the state-dependent noise probabilities p_i^k are

    x_i \ e_i    -1    0    1
    -2            0   1/2  1/2
    -1            0   1/2  1/2
     0           1/2   0   1/2
     1           1/2  1/2   0
     2           1/2  1/2   0

The cost function is

    J = E{ x_4² + Σ_{i=0}^3 (x_i² + u_i²) }

Notice: the cost contains no direct stochastic components.
Optimal stochastic stepping (SDD)
Firstly, from

    J = E{ x_4² + Σ_{i=0}^3 (x_i² + u_i²) }

(no stochastics in the cost itself) we establish V_4(x_4) = x_4². We are assuming perfect state information.

    x_4   V_4
    -2     4
    -1     1
     0     0
     1     1
     2     4
Optimal stochastic stepping (SDD)
Then we establish the W_3(x_3, u_3) function (the cost to go):

    W_3(x_3, u_3) = Σ_{k=1}^m p_3^k [ L_3(x_3, u_3, e_3^k) + V_4(f_3(x_3, u_3, e_3^k)) ]

Writing out the terms, with L_3(x_3, u_3, e_3^k) = x_3² + u_3² and f_3(x_3, u_3, e_3^k) = x_3 + u_3 + e_3^k:

    W_3(x_3, u_3) = p_3^1 [ x_3² + u_3² + V_4(x_3 + u_3 + e_3^1) ]
                  + p_3^2 [ x_3² + u_3² + V_4(x_3 + u_3 + e_3^2) ]
                  + p_3^3 [ x_3² + u_3² + V_4(x_3 + u_3 + e_3^3) ]

or more compactly:

    W_3(x_3, u_3) = x_3² + u_3² + p_3^1 V_4(x_3 + u_3 + e_3^1) + p_3^2 V_4(x_3 + u_3 + e_3^2) + p_3^3 V_4(x_3 + u_3 + e_3^3)
Optimal stochastic stepping (SDD)
    W_3(x_3, u_3) = Σ_{k=1}^3 p_3^k [ x_3² + u_3² + V_4(x_3 + u_3 + e_3^k) ]

For example, for x_3 = 0 and u_3 = -1 (the noise values e_3^k = -1, 0, 1 have probabilities 1/2, 0, 1/2):

    W_3(0, -1) = 1/2 [ 0² + (-1)² + V_4(0 - 1 - 1) ]
               + 0 [ 0² + (-1)² + V_4(0 - 1 + 0) ]
               + 1/2 [ 0² + (-1)² + V_4(0 - 1 + 1) ]
               = 1/2 (1 + 4) + 0 + 1/2 (1 + 0) = 3

    W_3          u_3
    x_3      -1     0     1
    -2        ∞    6.5   5.5
    -1       4.5   1.5   2.5
     0        3     1     3
     1       2.5   1.5   4.5
     2       5.5   6.5    ∞

    x_4   V_4
    -2     4
    -1     1
     0     0
     1     1
     2     4
(just for reference)
Optimal stochastic stepping (SDD)
    W_3          u_3                  V_3(x_3)   u_3*(x_3)
    x_3      -1     0     1
    -2        ∞    6.5   5.5           5.5          1
    -1       4.5   1.5   2.5           1.5          0
     0        3     1     3             1           0
     1       2.5   1.5   4.5           1.5          0
     2       5.5   6.5    ∞            5.5         -1

    W_2          u_2                  V_2(x_2)   u_2*(x_2)
    x_2      -1     0     1
    -2        ∞    7.5   6.25          6.25         1
    -1       5.5   2.25  3.25          2.25         0
     0       4.25  1.5   4.25          1.5          0
     1       3.25  2.25  5.5           2.25         0
     2       6.25  7.5    ∞            6.25        -1
Optimal stochastic stepping (SDD)
    W_1          u_1                  V_1(x_1)   u_1*(x_1)
    x_1      -1     0     1
    -2        ∞    8.25  6.88          6.88         1
    -1       6.25  2.88  3.88          2.88         0
     0       4.88  2.25  4.88          2.25         0
     1       3.88  2.88  6.25          2.88         0
     2       6.88  8.25   ∞            6.88        -1

    W_0          u_0                  V_0(x_0)   u_0*(x_0)
    x_0      -1     0     1
     2       7.56  8.88   ∞            7.56        -1
Trace back: u_i*(x_i). A feedback solution, not a time function.

Deterministic setting (x_{i+1} = x_i + u_i, i = 0, ..., 3):

    i      0   1   2   3
    u_i*  -1   0   0   0

Stochastic setting (x_{i+1} = x_i + u_i + e_i, i = 0, ..., 3):

    x_0  u_0*   x_1  u_1*   x_2  u_2*   x_3  u_3*
                -2    1     -2    1     -2    1
                -1    0     -1    0     -1    0
                 0    0      0    0      0    0
                 1    0      1    0      1    0
     2   -1      2   -1      2   -1      2   -1
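The tables above can be reproduced mechanically. A sketch of the backward recursion for this stepping example in plain Python; decisions whose reachable next states leave the grid are excluded, which corresponds to the ∞ entries:

```python
# Stochastic backward recursion for x_{i+1} = x_i + u_i + e_i with state-dependent
# noise probabilities, cost J = E{ x_4^2 + sum_i (x_i^2 + u_i^2) }.
states = [-2, -1, 0, 1, 2]
decisions = [-1, 0, 1]
p = {                                 # p[x][e] from the probability table above
    -2: {-1: 0.0, 0: 0.5, 1: 0.5},
    -1: {-1: 0.0, 0: 0.5, 1: 0.5},
     0: {-1: 0.5, 0: 0.0, 1: 0.5},
     1: {-1: 0.5, 0: 0.5, 1: 0.0},
     2: {-1: 0.5, 0: 0.5, 1: 0.0},
}

V = {4: {x: x * x for x in states}}   # boundary condition V_4(x) = x^2
u_opt = {}
for i in range(3, -1, -1):
    V[i], u_opt[i] = {}, {}
    for x in states:
        W = {}
        for u in decisions:
            nxt = [(x + u + e, pr) for e, pr in p[x].items() if pr > 0]
            if all(xn in states for xn, _ in nxt):        # feasible decision
                W[u] = x * x + u * u + sum(pr * V[i + 1][xn] for xn, pr in nxt)
        u_best = min(W, key=W.get)
        V[i][x], u_opt[i][x] = W[u_best], u_best

print(V[3])                           # cost-to-go table V_3
print(u_opt[0][2], V[0][2])           # decision and cost at the start state x_0 = 2
```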
Concluding remarks
Discrete state and decision space.
Approximation: grid covering the state and decision space.
Curse of dimensionality - combinatoric explosion.