

7.1.7 Outline of PCA

If the PCs for the periods are compared, it can be concluded that they are very different and change over time. This is also expected, as it has already been concluded that the volatility depends on the trend. There are also different types of periods, with growing prices and falling prices. In the periods with falling prices, the two periods modelled here show somewhat similar behaviour, which can be seen from the similarities between the PCs of period 2 and period 4.

It is anticipated that the PC scores show the same behaviour as the data. In figure 7.3 a QQ plot of the PC scores from the PCs in period 1 is found for each index. The PC scores do not look normal, but since the normality assumption was already a rough one for the data, a better result was not to be expected after transforming the data. It is not appropriate to perform further statistical tests on the PCA, because the weekly log returns did not fully meet the assumptions of PCA. Such tests might therefore not present an accurate picture, and they would be of little use.

It was shown that four PCs are enough to explain at least 76 % of the total variance, in some periods even more. 76 % together with high communalities is an acceptable level, and therefore it is the first four PCs representing each regime that are modelled with GARCH or ARCH models in the next section. As this is a description of data, it is acceptable to use this information in further modelling despite the assumptions not being fully met.

[Figure: grid of normal QQ plots of the PC scores in period 1, one panel per index: KAXGI, NDDUE15, NDDUJN, NDDUNA, NDUEEGF, TPXDDVD, CSIYHYI, JPGCCOMP, NDEAGVT, NDEAMO.]

Figure 7.3: QQ plot of the PC scores in period 1.
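
To make the procedure concrete, the following is a minimal R sketch (not the code used in the thesis) of how such a decomposition and QQ inspection can be done; the object name returns, a matrix of weekly log returns for one period with one column per index, is hypothetical:

# Minimal sketch: PCA of weekly log returns for one period.
# 'returns' (one column per index) is a hypothetical object name.
pca <- prcomp(returns, center = TRUE, scale. = TRUE)

# Share of total variance explained by the first four PCs
# (the thesis finds at least 76 %).
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
print(cum_var[4])

# Normal QQ plots of the scores of the first four PCs.
par(mfrow = c(2, 2))
for (i in 1:4) {
  qqnorm(pca$x[, i], main = paste("PC", i, "scores"))
  qqline(pca$x[, i])
}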

7.2 GARCH

GARCH models are used to model the dynamic structure in the new data, the data in PCA space derived using four principal components; this will be called PCA data. Strictly, it is the PCA data that should fulfil the requirements for using GARCH, but if the log return indices do not fulfil them, neither will the PCA data, and therefore the analysis of the weekly log return indices is considered sufficient. The PCA data is mean adjusted by the PCA transformation, and it is assumed that there are no significant lags in the log return indices, cf. figure 4.3a. The weekly log returns are also assumed to be independent, and it was shown that the skewness is almost zero and the kurtosis is larger than 3, just as expected for a GARCH process. The volatility clustering has not been tested,


only observed in plots. If the clustering is not significantly present in the data, it will not be significant in the model, so tests for clustering are not necessary.

Because the models are built on PC data, they do not represent a single index, but rather general tendencies and aspects of the financial market. For example, some of the models based on the first PC data are models for the overall stock market and the overall bond market. The behaviour of the indices can then be estimated from their dependency on the behaviour of the general market, that is, by turning from PCA space back to data space.
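
As a minimal sketch of this rotation (assuming the prcomp fit pca from the sketch above; the truncation to four PCs makes the reconstruction approximate):

# Keep the first four PC score series: this is the 'PCA data'.
scores <- pca$x[, 1:4]

# Rotate back to (approximate) data space with the transposed loadings,
# then undo the scaling and centering applied before the PCA.
approx_returns <- scores %*% t(pca$rotation[, 1:4])
approx_returns <- sweep(approx_returns, 2, pca$scale, `*`)
approx_returns <- sweep(approx_returns, 2, pca$center, `+`)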

To estimate the parameters and order of each PCA data series, the R function garch from the tseries package is used. It has some built-in tests that will be explained briefly. The Jarque Bera test [32] tests the residuals for normality, with:

$$H_0:\; \varsigma_{\mathrm{Fisher}} = 0 \;\wedge\; \kappa_{\mathrm{Excess}} = 0.$$

The test statistic is:

$$JB = \frac{n}{6}\left(\varsigma^2 + \frac{\kappa_{\mathrm{Excess}}^2}{4}\right),$$

where $n$ is the number of observations. If the residuals are normally distributed, the test statistic follows a $\chi^2(2)$ distribution.
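
As a hedged illustration, the statistic can be computed by hand and compared with the built-in jarque.bera.test from the tseries package; the residual series res is a hypothetical object name:

library(tseries)

# Jarque Bera statistic computed directly from the formula above.
jb_by_hand <- function(res) {
  n    <- length(res)
  m    <- mean(res)
  s2   <- mean((res - m)^2)
  skew <- mean((res - m)^3) / s2^1.5     # Fisher skewness
  kurt <- mean((res - m)^4) / s2^2 - 3   # excess kurtosis
  n / 6 * (skew^2 + kurt^2 / 4)          # ~ chi^2(2) under H0
}

# Cross-check against the packaged test:
# jarque.bera.test(res)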

The Box-Ljung test [35] tests the squared residuals for the null hypothesis that they are independently distributed (random data), with the test statistic:

$$Q = n(n+2)\sum_{i=1}^{h}\frac{\hat{\rho}_i^2}{n-i},$$

where $n$ is the number of observations and $h$ is the number of lags being tested, with $\hat{\rho}_i$ the autocorrelation at lag $i$. If the squared residuals are random, the test statistic follows a $\chi^2(h)$ distribution.
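
In R this test is available as Box.test from the stats package; a minimal illustration on a hypothetical residual series res, with h = 1 lag as in the summaries below:

# Box-Ljung test on the squared residuals, one lag.
Box.test(res^2, lag = 1, type = "Ljung-Box")

# The same statistic computed from the formula above.
q_by_hand <- function(x, h) {
  n   <- length(x)
  rho <- acf(x, lag.max = h, plot = FALSE)$acf[-1]   # rho_hat_1 .. rho_hat_h
  n * (n + 2) * sum(rho^2 / (n - seq_len(h)))
}
# q_by_hand(res^2, 1)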

When modelling data with GARCH models, it is desirable that the residuals are Gaussian white noise. If H0 is accepted in both the Jarque Bera test and the Box-Ljung test (large p-values), it cannot be rejected that the residuals are Gaussian white noise.

To see which model order fits best, the summary output of different order combinations is studied. Determining the model order is an iterative process; therefore the modelling of the first period of the first PCA data series is studied more closely to show the considerations. First a model of order $(p, q) = (0, 1)$ is fitted. The summary from R is:

> summary(gp1[[1]])

Call:
garch(x = data_pca[, 1], order = c(0, 1))

Model:
GARCH(0,1)

Residuals:
     Min       1Q   Median       3Q      Max
-3.32511 -0.58060  0.04404  0.70820  2.34910

Coefficient(s):
    Estimate  Std. Error  t value  Pr(>|t|)
a0  3.408e+00   5.583e-01    6.104  1.04e-09 ***
a1  3.722e-14   1.570e-01    0.000         1
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Diagnostic Tests:
        Jarque Bera Test

data:  Residuals
X-squared = 2.0539, df = 2, p-value = 0.3581

        Box-Ljung test

data:  Squared.Residuals
X-squared = 0.1639, df = 1, p-value = 0.6856


The summary contains information on the residuals, the coefficients and the p-values from the Jarque Bera test and the Box-Ljung test. The a0 (α0 in 6.13) is significant; a1 is not, but its size is close to zero, so it will be ignored. The Jarque Bera test fails to reject H0, and the null hypothesis of the squared residuals being random cannot be rejected in the Box-Ljung test either. The constraints in 6.13 are obeyed.

Another model might perform better; therefore a (1,1) model is tested. The R summary is:

> summary(gp1[[1]])

Call:
garch(x = data_pca[, 1], order = c(1, 1))

Model:
GARCH(1,1)

Residuals:
    Min      1Q  Median      3Q     Max
-3.3142 -0.5787  0.0439  0.7059  2.3414

Coefficient(s):
    Estimate  Std. Error  t value  Pr(>|t|)
a0  3.229e+00   2.640e+02    0.012     0.990
a1  3.115e-14   1.620e-01    0.000     1.000
b1  5.883e-02   7.696e+01    0.001     0.999

Diagnostic Tests:
        Jarque Bera Test

data:  Residuals
X-squared = 2.0539, df = 2, p-value = 0.3581

        Box-Ljung test

data:  Squared.Residuals
X-squared = 0.1639, df = 1, p-value = 0.6856

None of the coefficients are now significant, and therefore this model is useless. The (0,2) model is tried instead:

> summary(gp1[[1]])

Call:
garch(x = data_pca[, 1], order = c(0, 2))

Model:
GARCH(0,2)

Residuals:
    Min      1Q  Median      3Q     Max
-3.2058 -0.6091  0.0479  0.6971  2.4004

Coefficient(s):
    Estimate  Std. Error  t value  Pr(>|t|)
a0  3.230e+00   9.670e-01    3.341  0.000836 ***
a1  3.618e-14   1.594e-01    0.000  1.000000
a2  5.830e-02   1.530e-01    0.381  0.703122
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Diagnostic Tests:
        Jarque Bera Test

data:  Residuals
X-squared = 1.4659, df = 2, p-value = 0.4805

        Box-Ljung test

data:  Squared.Residuals
X-squared = 0.2078, df = 1, p-value = 0.6485

Naturally, these results are close to the (0,1) model. There is still only one significant parameter, and the residuals do not behave any better; therefore the (0,1) model is preferred. Sometimes testing for a higher order suddenly gives a nice result. Such tests for higher orders have also been carried out in this project, but they are of no interest to the reader unless an appropriate model is found, so the useless models are not presented.
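
The iterative order search described above can be expressed compactly; the following is a hypothetical reconstruction, not the thesis code, assuming data_pca holds the PCA data series as in the summaries above:

library(tseries)

# Fit a few low-order candidates to the first PCA series and inspect
# coefficient significance and the diagnostic p-values in each summary.
orders <- list(c(0, 1), c(1, 1), c(0, 2))
for (ord in orders) {
  fit <- garch(data_pca[, 1], order = ord,
               control = garch.control(trace = FALSE))
  print(summary(fit))
}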


In table 7.7 all the estimated GARCH parameters are found, together with p-values and the results of the Jarque Bera and Box-Ljung tests.

Period  PC-data    α0     p-value     α1      p-value    J-B      B-L
  1        1     3.408    1.0e-09   3.7e-14      1       0.358    0.69
  1        2     1.712    1.8e-04   0.088      0.698     0.616    0.96
  1        3     1.210    2.7e-13   0.004      0.972     0.108    0.99
  1        4     0.842    5.8e-10   8.8e-15      1       0.410    0.37
  2        1     2.638    3.0e-04   0.331      0.041     0.203    0.81
  2        2     1.499    7.7e-08   0.076      0.571     0.050    0.66
  2        3     1.506    3.6e-12   2.8e-15      1       0.132    0.33
  2        4     0.390    0.008     0.646      0.044     0.261    0.71
  3        1     4.336    <2e-16    0.055      0.27     2.9e-06   0.78
  3        2     1.842    <2e-16    0.074      0.136    4.6e-07   0.90
  3        3     0.865    <2e-16    0.162      0.082     0.230    0.75
  3        4     0.679    <2e-16    0.098      0.182    <2e-16    0.78
  4        1     1.728    0.002     1.080      0.002     0.019    0.31
  4        2     1.564    7.3e-04   0.108      0.692     0.598    0.90
  4        3     0.302    2.1e-04   0.334      0.214     0.088    0.55
  4        4     0.440    6.1e-05   0.012      0.96      0.025    0.99
  5        1     3.139    2.2e-06   0.494      0.048    4.5e-10   0.46
  5        2     1.508    3.8e-06   0.041      0.772     0.698    0.97
  5        3     1.037    1.5e-08   0.034      0.759     0.561    0.96
  5        4     0.446    <2e-16    0.022      0.689    1.2e-12   0.49

Table 7.7: Estimates of model parameters of the best GARCH fit, with p-values for each parameter and the test results of the Jarque Bera test and the Box-Ljung test.

In table 7.7 it appears that the constraints of ARCH models are not met for PC-data 1 in period 4, where α1 > 1. In order to have finite variance, another model needs to be found. It turns out that the model with the lowest order that meets all constraints is an ARCH(3). The model parameters are found in table 7.8.

Period  PC-data    α0     p-value      α1      p-value      α2      p-value      α3      p-value     J-B       B-L
  4        1     5.268    0.0448   7.636e-01   0.0423   4.804e-02   0.8028   9.993e-15      1      6.623e-05  0.1228

Table 7.8: Model parameters and test statistics for PC-data 1 in period 4.
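
The constraint check that flagged this model can be written as a small sketch, assuming fit is a tseries::garch fit with coefficient names a0, a1, ... as in the summaries above:

# ARCH constraints: alpha_0 > 0, alpha_i >= 0, and sum(alpha_i) < 1
# for finite variance (cf. 6.13).
co <- coef(fit)
a0 <- co["a0"]
ai <- co[grep("^a[1-9]", names(co))]
stopifnot(a0 > 0, all(ai >= 0), sum(ai) < 1)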

As can be seen, all the best fits turn out to be ARCH models of order 1, except the one mentioned above. Actually, a lot of the models should be ARCH(0) models, because α1 is not significant. ARCH(0) processes are Gaussian white noise processes because:

$$X_t = \sigma_t\,\omega_t \qquad (7.1)$$
$$\sigma_t^2 = \alpha_0, \qquad (7.2)$$

or

$$X_t = \alpha_0^{1/2}\,\omega_t. \qquad (7.3)$$

Even for the models with insignificant α1, the size of α1 is relatively small, and therefore they are still modelled as ARCH(1) processes. Some of the models have residuals that are not normally distributed, though all squared residuals are independently distributed. The use of weekly data seems reasonable for obtaining independent residuals, but having non-normal residuals is almost a matter of course when the input data are non-normal with outliers. The test for normality easily fails if only a few residuals are outliers, and therefore this will be ignored.

An ARCH(1):

$$X_t = \sigma_t\,\omega_t \qquad (7.4)$$
$$\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2, \qquad (7.5)$$

can be represented as an AR(1) process for $X_t^2$:

$$\begin{aligned}
\sigma_t^2 &= \alpha_0 + \alpha_1 X_{t-1}^2 \\
\sigma_t^2 + X_t^2 - \sigma_t^2 &= \alpha_0 + \alpha_1 X_{t-1}^2 + X_t^2 - \sigma_t^2 \\
X_t^2 &= \alpha_0 + \alpha_1 X_{t-1}^2 + X_t^2 - \sigma_t^2 \\
X_t^2 &= \alpha_0 + \alpha_1 X_{t-1}^2 + \sigma_t^2(\omega_t^2 - 1) \\
X_t^2 &= \alpha_0 + \alpha_1 X_{t-1}^2 + v_t,
\end{aligned}$$


where $v_t = X_t^2 - \sigma_t^2$ is the surprise in volatility. $\omega_t$ is i.i.d. with $E(\omega_t^2) = 1$, and therefore $v_t$ is a white noise process with $E(v_t) = 0$. The $\ell$-step forecast in ARCH(1) is given by [27]:

$$X_{t+\ell} = \sigma_{t+\ell}\,\omega_{t+\ell} \qquad (7.6)$$
$$\sigma_{t+\ell}^2 = \alpha_1^{\ell} X_t^2 + \sum_{i=0}^{\ell-1} \alpha_0\,\alpha_1^{i}. \qquad (7.7)$$

This will be useful when generating scenarios, although in practice an R function is used to simulate the data.
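
A direct implementation of the forecast recursion (7.7) could look as follows; the function and argument names are hypothetical:

# l-step-ahead conditional variance of an ARCH(1), equation (7.7):
# sigma^2_{t+l} = a1^l * X_t^2 + sum_{i=0}^{l-1} a0 * a1^i.
arch1_forecast_var <- function(a0, a1, x_t, l) {
  a1^l * x_t^2 + sum(a0 * a1^(0:(l - 1)))
}

# For l = 1 this reduces to a0 + a1 * x_t^2.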

It is actually quite surprising that none of the models are GARCH models, and in some cases not even ARCH models, despite the fact, stated in the data analysis chapter, that a test for the data being white noise could not be rejected.

In chapter 8 the models from tables 7.7 and 7.8 will be used to generate scenarios. Scenarios based on other methods will also be generated in order to evaluate the use of non-significant models. The subjective choice of models will be discussed in chapter 10, and the model will be tested in chapter 9.

Chapter 8

Scenario generation

In the previous chapters, data has been analysed, transformed and finally modelled into four models in each of the five regimes. Now it is time to use these models to generate scenarios. The scenarios should not be predictions of future index values, but should represent a wide range of possible future index values. The approach to generating scenarios can be very subjective, but in order to get good scenarios, different aspects should first be considered, such as the intended use of the scenarios and which characteristics of the data are necessary for realistic scenarios, and these should afterwards be built into the generation algorithm.

The purpose of generating scenarios in this project is to obtain scenarios that can be used in the investment process, more specifically in the asset allocation process, where optimization and risk management are key factors. For this reason, not only are realistic scenarios a must, but extreme events are also a desirable quality. Scenarios should be constructed such that they can be tested for correctness and accuracy. There might be patterns and characteristics in data that are unique or special for exactly that class of data. In this project, volatility clustering, trend, decreasing volatility when prices rise, increasing volatility when prices fall, and correlation to other asset classes have been observed. It is also important that the scenarios take basic economic assumptions into account, such as the no-arbitrage principle. Two different approaches have been used to generate scenarios. The first, presented in this chapter, is based on data in principal component space modelled with ARCH models. The other method is a simple bootstrap.

8.1 Scenario generation using ARCH models

The regime-divided ARCH models on principal components are chosen in order to keep as much of the characteristics of the data as possible. ARCH models are used in order to keep the behaviour of the indices, e.g. trend and volatility clustering; however, some models did not have this quality after splitting the data into regimes.

PCA ensures the correlation between the indices and reduces the number of models to be made. Splitting data into regimes captures the different behaviour of up and down periods.
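
A minimal sketch of this procedure, under stated assumptions: params is a hypothetical list of fitted (α0, α1) pairs for the four PC models of one regime, and pca the corresponding PCA fit from earlier; Gaussian innovations are assumed, and the de-centering and de-scaling of the PCA transformation are omitted for brevity.

# Simulate one ARCH(1) path of length n.
simulate_arch1 <- function(a0, a1, n) {
  x <- numeric(n)
  x[1] <- sqrt(a0) * rnorm(1)
  for (t in 2:n) {
    sigma2 <- a0 + a1 * x[t - 1]^2
    x[t]   <- sqrt(sigma2) * rnorm(1)
  }
  x
}

# One scenario: simulate the four PC series for a year of weekly steps
# and rotate back to log return space via the loadings.
n_weeks  <- 52
scores   <- sapply(params, function(p) simulate_arch1(p["a0"], p["a1"], n_weeks))
scenario <- scores %*% t(pca$rotation[, 1:4])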