Time Series Analysis
Henrik Madsen
hm@imm.dtu.dk
Informatics and Mathematical Modelling, Technical University of Denmark
DK-2800 Kgs. Lyngby
Outline of the lecture
Regression based methods, 2nd part:
Regression and exponential smoothing (Sec. 3.4)
Time series with seasonal variations (Sec. 3.5)
Regression without explanatory variables
During Lecture 2 we saw that, assuming known independent variables $x$, we can forecast the dependent variable $Y$.
To be able to do so we estimated $\theta$ in
$$Y_t = f(x_t, t; \theta) + \varepsilon_t$$
If we do not have access to $x$ we may use:
$$Y_t = f(t; \theta) + \varepsilon_t$$
During this lecture we shall consider models of this (last) form, and we shall consider how $\hat{\theta}$ can be updated as more information becomes available.
Only models linear in $\theta$ will be considered.
Model: Constant mean
$$Y_t = \mu + \varepsilon_t, \qquad \varepsilon_t \text{ i.i.d. with mean zero and constant variance } \sigma^2 \text{ (white noise)}$$
In vector form ($t = 1, \ldots, N$): $Y = \mathbf{1}\mu + \varepsilon$
Estimate:
$$\hat{\mu} = (\mathbf{1}^T \mathbf{1})^{-1} \mathbf{1}^T Y = \frac{1}{N} \sum_{t=1}^{N} Y_t = \bar{y}$$
Prediction (the conditional mean):
$$\hat{Y}_{N+\ell|N} = \hat{\mu} = \frac{1}{N} \sum_{t=1}^{N} Y_t$$
Variance of the prediction error:
$$V\left[Y_{N+\ell} - \hat{Y}_{N+\ell|N}\right] = \sigma^2 \left(1 + \frac{1}{N}\right)$$
Updating the estimate
Based on $Y_1, Y_2, \ldots, Y_N$ we have
$$\hat{\mu}_N = \frac{1}{N} \sum_{t=1}^{N} Y_t$$
When we get one more observation $Y_{N+1}$ the best estimate is
$$\hat{\mu}_{N+1} = \frac{1}{N+1} \sum_{t=1}^{N+1} Y_t$$
Recursive update:
$$\hat{\mu}_{N+1} = \frac{1}{N+1} \sum_{t=1}^{N+1} Y_t = \frac{1}{N+1} Y_{N+1} + \frac{N}{N+1} \hat{\mu}_N$$
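A minimal Python sketch of this recursive update (the function name and the simulated series are illustrative choices, not from the course material):

```python
import numpy as np

def update_mean(mu_N, y_next, N):
    """Recursive mean update: mu_{N+1} = Y_{N+1}/(N+1) + N/(N+1) * mu_N."""
    return (y_next + N * mu_N) / (N + 1)

# Check against the batch mean on simulated white noise around mu = 5.
rng = np.random.default_rng(0)
y = 5 + rng.normal(size=100)
mu_hat = y[0]                      # estimate based on the first observation
for n in range(1, len(y)):
    mu_hat = update_mean(mu_hat, y[n], n)
print(mu_hat, y.mean())            # identical up to floating point error
```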
Model: Local constant mean
In the constant mean model the variance of the forecast error decreases towards $\sigma^2$ as $1/N$.
Therefore, if $N$ is sufficiently large (say 100), not much is gained by increasing the number of observations.
If there are indications that the true (underlying) mean is actually changing slowly, it can even be advantageous to "forget" old observations.
One way of doing this is to base the estimate on a rolling window containing e.g. the 100 most recent observations. An alternative is exponential smoothing.
Exponential smoothing
$$\hat{\mu}_N = c \sum_{j=0}^{N-1} \lambda^j Y_{N-j} = c\left[Y_N + \lambda Y_{N-1} + \cdots + \lambda^{N-1} Y_1\right]$$
[Figure: the weights $c\lambda^j$ decay exponentially with the age $j$ of the observation]
The constant $c$ is chosen so that the weights sum to one, which implies that $c = (1 - \lambda)/(1 - \lambda^N)$. For large $N$ we have $c \approx 1 - \lambda$, and the estimate can be updated recursively:
$$\hat{\mu}_{N+1} = (1-\lambda) Y_{N+1} + \lambda \hat{\mu}_N \quad\text{or}\quad \hat{Y}_{N+\ell+1|N+1} = (1-\lambda) Y_{N+1} + \lambda \hat{Y}_{N+\ell|N}$$
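A minimal sketch of this recursion in Python (initialising with the first observation is one common choice; the function name is ours):

```python
def exp_smooth(y, alpha):
    """Exponential smoothing with alpha = 1 - lambda:
    mu_t = alpha * y_t + (1 - alpha) * mu_{t-1};
    mu_t is also the forecast of any future value."""
    mu = y[0]                      # common choice of initial estimate
    smoothed = [mu]
    for obs in y[1:]:
        mu = alpha * obs + (1 - alpha) * mu
        smoothed.append(mu)
    return smoothed
```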
Choice of smoothing constant α = 1 − λ
The smoothing constant $\alpha = 1 - \lambda$ determines how much the latest observation influences the prediction.
Given a data set $t = 1, \ldots, N$ we can try different values before implementing the method on-line, selecting the value that minimises the sum of squared one-step prediction errors:
$$S(\alpha) = \sum_{t=1}^{N} \left(Y_t - \hat{Y}_{t|t-1}(\alpha)\right)^2$$
If the data set is large we eliminate the influence of the initial estimate by dropping the first part of the errors when evaluating $S(\alpha)$.
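A sketch of this off-line tuning in Python (the burn-in length of 50 and the simulated series are arbitrary illustrative choices):

```python
import numpy as np

def S(y, alpha, burn_in=0):
    """Sum of squared one-step prediction errors for a given alpha,
    dropping the first `burn_in` errors to remove the influence of
    the initial estimate."""
    mu, errors = y[0], []
    for obs in y[1:]:
        errors.append(obs - mu)    # one-step error: Y_t - Yhat_{t|t-1}
        mu = alpha * obs + (1 - alpha) * mu
    return float(np.sum(np.square(errors[burn_in:])))

# Grid search over alpha on a simulated slowly varying series.
rng = np.random.default_rng(0)
y = 10 + 0.1 * np.cumsum(rng.normal(size=1000))
alphas = np.linspace(0.05, 0.95, 19)
best = min(alphas, key=lambda a: S(y, a, burn_in=50))
```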
Example – wind speed 76 m a.g.l. at Risø
Measurements of wind speed every 10 minutes
Task: Forecast up to approximately 3 hours ahead using exponential smoothing
[Figure: 10 min. averages of wind speed (m/s) at Risø, February 2002 to September 2003]
S(α) for horizons 10 and 70 minutes
[Figure: SSE $S(\alpha)$ versus the weight $\alpha$ on the most recent observation, for the 10 and 70 minute horizons]
10 minutes (1-step): use $\alpha = 0.95$ or higher
70 minutes (7-step): use $\alpha \approx 0.7$
S(α) for horizons 130 and 190 minutes
[Figure: SSE $S(\alpha)$ versus the weight $\alpha$ on the most recent observation, for the 130 and 190 minute horizons]
130 minutes (13-step): use $\alpha \approx 0.6$
190 minutes (19-step): use $\alpha \approx 0.5$
Example of forecasts with optimal α
[Figure: measured wind speed (m/s) together with the 10 minute and 190 minute forecasts]
Trend models
Linear regression model
Functions of time are taken as the independent variables
Linear trend
Observations for t = 1, . . . , N
Naive formulation of the model:
$$Y_t = \phi_0 + \phi_1 t + \varepsilon_t$$
If we want to forecast $Y_{N+j}$ given information up to $N$ we use $\hat{Y}_{N+j|N} = \hat{\phi}_0 + \hat{\phi}_1 (N + j)$
However, for on-line applications $N + j$ can be arbitrarily large. The problem arises because $\phi_0$ and $\phi_1$ are defined w.r.t. the origin 0.
Defining the parameters w.r.t. the origin $N$ we obtain the model:
$$Y_t = \theta_0 + \theta_1 (t - N) + \varepsilon_t$$
Using this formulation we get: $\hat{Y}_{N+j|N} = \hat{\theta}_0 + \hat{\theta}_1 j$
Linear trend in a general setting
The general trend model:
$$Y_{N+j} = f^T(j)\theta + \varepsilon_{N+j}$$
The linear trend model is obtained when:
$$f(j) = \begin{pmatrix} 1 \\ j \end{pmatrix}$$
It follows that for $N + 1 + j$:
$$Y_{N+1+j} = \begin{pmatrix} 1 \\ j+1 \end{pmatrix}^T \theta + \varepsilon_{N+1+j} = \left(\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ j \end{pmatrix}\right)^T \theta + \varepsilon_{N+1+j}$$
The $2 \times 2$ matrix $L$ defines the transition from $f(j)$ to $f(j+1)$.
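A quick numerical check of this transition, as a small Python sketch:

```python
import numpy as np

L = np.array([[1, 0],
              [1, 1]])
f = lambda j: np.array([1, j])
print(L @ f(3), f(4))   # both give [1 4]: L f(j) = f(j + 1)
```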
Trend models in general
Model: $Y_{N+j} = f^T(j)\theta + \varepsilon_{N+j}$
Requirement: $f(j+1) = L f(j)$. Initial value: $f(0)$.
In Section 3.4 some trend models which fulfill the requirement above are listed:
Constant mean: $Y_{N+j} = \theta_0 + \varepsilon_{N+j}$
Linear trend: $Y_{N+j} = \theta_0 + \theta_1 j + \varepsilon_{N+j}$
Quadratic trend: $Y_{N+j} = \theta_0 + \theta_1 j + \theta_2 \frac{j^2}{2} + \varepsilon_{N+j}$
$k$'th order polynomial trend: $Y_{N+j} = \theta_0 + \theta_1 j + \theta_2 \frac{j^2}{2} + \cdots + \theta_k \frac{j^k}{k!} + \varepsilon_{N+j}$
Harmonic model with the period $p$ (in the form linear in $\theta$): $Y_{N+j} = \theta_0 + \theta_1 \sin\frac{2\pi j}{p} + \theta_2 \cos\frac{2\pi j}{p} + \varepsilon_{N+j}$
Estimation
Model equations written for all observations $Y_1, \ldots, Y_N$:
$$Y = x_N \theta + \varepsilon$$
$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{pmatrix} = \begin{pmatrix} f^T(-N+1) \\ f^T(-N+2) \\ \vdots \\ f^T(0) \end{pmatrix} \theta + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{pmatrix}$$
OLS estimates:
$$\hat{\theta}_N = (x_N^T x_N)^{-1} x_N^T Y \quad\text{or}\quad \hat{\theta}_N = F_N^{-1} h_N$$
where
$$F_N = \sum_{j=0}^{N-1} f(-j) f^T(-j), \qquad h_N = \sum_{j=0}^{N-1} f(-j) Y_{N-j}$$
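A minimal Python sketch of this estimator for the linear trend (the function name and the simulated series are illustrative, not from the lecture):

```python
import numpy as np

f = lambda j: np.array([1.0, j])   # linear trend: f(j) = (1, j)^T

def fit_trend(y):
    """OLS estimate theta_hat = F_N^{-1} h_N, time origin at t = N."""
    N = len(y)
    F = sum(np.outer(f(-j), f(-j)) for j in range(N))
    h = sum(f(-j) * y[N - 1 - j] for j in range(N))   # y[N-1-j] is Y_{N-j}
    return np.linalg.solve(F, h)

rng = np.random.default_rng(1)
t = np.arange(1, 101)
y = 2.0 + 0.5 * (t - 100) + rng.normal(size=100)      # theta = (2.0, 0.5)
print(fit_trend(y))                                   # approx. [2.0, 0.5]
```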
ℓ-step prediction
Prediction:
$$\hat{Y}_{N+\ell|N} = f^T(\ell) \hat{\theta}_N$$
Variance of the prediction error:
$$V[Y_{N+\ell} - \hat{Y}_{N+\ell|N}] = \sigma^2 \left[1 + f^T(\ell) F_N^{-1} f(\ell)\right]$$
$100(1-\alpha)\%$ prediction interval:
$$\hat{Y}_{N+\ell|N} \pm t_{\alpha/2}(N-p) \sqrt{V[e_N(\ell)]} = \hat{Y}_{N+\ell|N} \pm t_{\alpha/2}(N-p)\, \hat{\sigma} \sqrt{1 + f^T(\ell) F_N^{-1} f(\ell)}$$
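Continuing the sketch above, the interval could be computed along these lines (assuming `theta` and `F` from `fit_trend` and an estimate `sigma2` of $\sigma^2$; the helper name is ours, and scipy supplies the t-quantile):

```python
import numpy as np
from scipy import stats

f = lambda j: np.array([1.0, j])   # linear trend basis (p = 2 parameters)

def predict_interval(theta, F, sigma2, ell, N, p=2, level=0.95):
    """ell-step prediction with a 100*level % prediction interval."""
    fl = f(ell)
    yhat = fl @ theta
    var = sigma2 * (1.0 + fl @ np.linalg.solve(F, fl))
    tq = stats.t.ppf(1 - (1 - level) / 2, df=N - p)
    return yhat, (yhat - tq * np.sqrt(var), yhat + tq * np.sqrt(var))
```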
Updating the estimates when $Y_{N+1}$ is available
Task:
Going from estimates based on $t = 1, \ldots, N$, i.e. $\hat{\theta}_N$, to estimates based on $t = 1, \ldots, N, N+1$, i.e. $\hat{\theta}_{N+1}$, without redoing everything. . .
Solution:
$$\hat{\theta}_{N+1} = F_{N+1}^{-1} h_{N+1}$$
$$F_{N+1} = F_N + f(-N) f^T(-N), \qquad h_{N+1} = L^{-1} h_N + f(0) Y_{N+1}$$
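One recursive step could look like this in Python, for the linear trend where $L^{-1}$ is known explicitly (names are illustrative):

```python
import numpy as np

f = lambda j: np.array([1.0, j])
L_inv = np.array([[ 1.0, 0.0],
                  [-1.0, 1.0]])    # inverse of L = [[1, 0], [1, 1]]

def update(F, h, y_next, N):
    """F_{N+1} = F_N + f(-N) f(-N)^T;  h_{N+1} = L^{-1} h_N + f(0) Y_{N+1}."""
    F = F + np.outer(f(-N), f(-N))
    h = L_inv @ h + f(0) * y_next
    return F, h, np.linalg.solve(F, h)   # updated F, h and theta_hat_{N+1}
```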
Local trend models
We forget old observations in an exponential manner:
$$\hat{\theta}_N = \arg\min_{\theta} S(\theta; N)$$
where for $0 < \lambda < 1$:
$$S(\theta; N) = \sum_{j=0}^{N-1} \lambda^j \left[Y_{N-j} - f^T(-j)\theta\right]^2$$
[Figure: the weights $\lambda^j$ decay exponentially with the age of the observation]
WLS formulation
The criterion:
$$S(\theta; N) = \sum_{j=0}^{N-1} \lambda^j \left[Y_{N-j} - f^T(-j)\theta\right]^2$$
can be written as:
$$\begin{pmatrix} Y_1 - f^T(-(N-1))\theta \\ Y_2 - f^T(-(N-2))\theta \\ \vdots \\ Y_N - f^T(0)\theta \end{pmatrix}^T \begin{pmatrix} \lambda^{N-1} & 0 & \cdots & 0 \\ 0 & \lambda^{N-2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \begin{pmatrix} Y_1 - f^T(-(N-1))\theta \\ Y_2 - f^T(-(N-2))\theta \\ \vdots \\ Y_N - f^T(0)\theta \end{pmatrix}$$
which is a WLS criterion with $\Sigma = \mathrm{diag}[1/\lambda^{N-1}, \ldots, 1/\lambda, 1]$.
WLS solution
$$\hat{\theta}_N = (x_N^T \Sigma^{-1} x_N)^{-1} x_N^T \Sigma^{-1} Y \quad\text{or}\quad \hat{\theta}_N = F_N^{-1} h_N$$
where
$$F_N = \sum_{j=0}^{N-1} \lambda^j f(-j) f^T(-j), \qquad h_N = \sum_{j=0}^{N-1} \lambda^j f(-j) Y_{N-j}$$
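The batch estimator sketched earlier carries over with the $\lambda$-weights added (a sketch; names are ours):

```python
import numpy as np

f = lambda j: np.array([1.0, j])

def fit_local_trend(y, lam):
    """Exponentially weighted LS: theta_hat = F_N^{-1} h_N with lambda-weights."""
    N = len(y)
    F = sum(lam**j * np.outer(f(-j), f(-j)) for j in range(N))
    h = sum(lam**j * f(-j) * y[N - 1 - j] for j in range(N))
    return np.linalg.solve(F, h)
```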
Updating the estimates when $Y_{N+1}$ is available
$$\hat{\theta}_{N+1} = F_{N+1}^{-1} h_{N+1}$$
$$F_{N+1} = F_N + \lambda^N f(-N) f^T(-N), \qquad h_{N+1} = \lambda L^{-1} h_N + f(0) Y_{N+1}$$
When no data are available we can use $h_0 = 0$ and $F_0 = 0$.
For many functions $\lambda^N f(-N) f^T(-N) \to 0$ for $N \to \infty$, and we get the stationary result $F_{N+1} = F_N = F$. Hence:
$$\hat{\theta}_{N+1} = L^T \hat{\theta}_N + F^{-1} f(0) \left[Y_{N+1} - \hat{Y}_{N+1|N}\right]$$
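A sketch of this stationary recursion for a local linear trend in Python ($\lambda = 0.9$, the truncation at 1000 terms, and the simulated series are all illustrative choices):

```python
import numpy as np

lam = 0.9
f = lambda j: np.array([1.0, j])
L = np.array([[1.0, 0.0],
              [1.0, 1.0]])

# Stationary F = sum_{j>=0} lam^j f(-j) f(-j)^T, truncated numerically.
F = sum(lam**j * np.outer(f(-j), f(-j)) for j in range(1000))
gain = np.linalg.solve(F, f(0))          # F^{-1} f(0)

rng = np.random.default_rng(2)
y = 10 + 0.02 * np.arange(500) + rng.normal(size=500)   # slowly trending data

theta = np.zeros(2)
for y_new in y:
    y_pred = f(1) @ theta                # one-step prediction Yhat_{N+1|N}
    theta = L.T @ theta + gain * (y_new - y_pred)
print(theta)                             # local level and slope at the end
```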