4. Data and methodology

4.2 Methodology

4.2.2 Regression models

OLS regressions are estimated in order to specify the investor views that enter the Black-Litterman portfolios. The predictive power of the chosen variables is explored in order to obtain the expected premiums.

Regression analysis is an important statistical tool for investigating whether a chosen variable, or a set of variables, has an impact on the future movement of another variable (Enders, 2014). The regression models are based on historical data and apply time-series modelling. The models are tested on out-of-sample prediction of the views: the period 1980 to 2000 is used as an estimation window, and the resulting models are used to predict out-of-sample from 2000 onwards.

4.2.2.1 Equity premium prediction

Multiple regression models will be used to explain next period's return on the S&P 500 index. A general prediction model for the equity excess return is given as

$r_{t+1} = \alpha_i + \beta_i x_{i,t} + \varepsilon_{i,t+1}$ (Equation 4.2.5)

where $r_{t+1}$ is the premium of the equity index, $x_{i,t}$ is one of the fourteen variables, presented by Goyal and Welch (2007), that supposedly have predictive power over the index in question, and $\varepsilon_{i,t+1}$ is the error term of the model.

The forecast analysis largely follows the methodology of Rapach et al. (2007): the fourteen variables are used to produce individual predictions, which are then weighted together in a combination forecast using an average of the individual forecasts. Goyal and Welch (2007) and Rapach et al. (2007) use a recursive estimation window. We instead apply linear OLS regressions estimated in-sample to predict out-of-sample for the S&P 500. This means that the estimated model coefficients, alpha and beta, are held constant and carried forward as the forecasting estimates.

After performing the described procedure on all the variables mentioned above, the combination forecast suggested by Rapach et al. (2007) is applied to produce a forecast of the premium on the S&P 500 index using the fourteen predictor variables. Rapach et al. (2007) present multiple suggestions for calculating a premium forecast that consistently outperforms the historical average. We use the following simple averaging method presented in their paper:

$\hat{r}_{c,t+1} = \sum_{i=1}^{N} \omega_{i,t}\,\hat{r}_{i,t+1}$ (Equation 4.2.6)

The equation is based on the forecasts arising from the N individual prediction models estimated in Equation 4.2.5 above, and it uses the forecasted returns of the S&P 500 index rather than the actual realized returns. We use the simple mean, giving $\omega_{i,t} = 1/N$ for $i = 1, \ldots, N$ at each monthly observation $t$. Further, we use the estimated expected excess returns, $\hat{r}_{i,t+1}$, for all monthly observations $t$ in our out-of-sample period.
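To make the procedure concrete, the following Python sketch illustrates the approach under our fixed in-sample window: one bivariate predictive regression per predictor variable is estimated on the in-sample data, and the out-of-sample forecasts are averaged with equal weights $\omega_{i,t} = 1/N$. Variable names and the array layout are our own illustration, not taken from Rapach et al. (2007).

```python
import numpy as np

def fit_ols(y, x):
    """Estimate alpha and beta of y_{t+1} = alpha + beta * x_t + eps by OLS."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [alpha, beta]

def combination_forecast(r, predictors, split):
    """
    r          : (T,) realized equity excess returns (e.g. S&P 500)
    predictors : (T, N) array holding the N = 14 predictor variables
    split      : index separating the in-sample window (1980-2000)
                 from the out-of-sample period (2000 onwards)
    Returns the equal-weighted combination forecast (Equation 4.2.6)
    for each out-of-sample month.
    """
    T, N = predictors.shape
    forecasts = np.empty((T - split, N))
    for i in range(N):
        # One predictive regression per variable, estimated once on the
        # in-sample window (fixed window; coefficients held constant).
        alpha, beta = fit_ols(r[1:split], predictors[:split - 1, i])
        # Linear out-of-sample forecast of r_{t+1} from x_{i,t}
        forecasts[:, i] = alpha + beta * predictors[split - 1:T - 1, i]
    return forecasts.mean(axis=1)  # simple average: w_{i,t} = 1/N
```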

4.2.2.2 Bond premium prediction

The methodology of the bond premium prediction is based on the work of economists at the New York Fed, Adrian, Crump, and Moench (ACM), in which the bond yield prediction is explained by linear OLS regression. The parameters of the model are obtained from the following three-step regression:

1. The bond premium is estimated by ordinary least squares over the in-sample period.

2. The excess returns are regressed on a constant and a lagged pricing factor, i.e. the term premium, according to the regression model

$r_t^{bond} - r_f = \alpha + \beta\, TP_{t-1} + \varepsilon_t$ (Equation 4.2.7)

where $r_t - r_f$ is the excess bond premium, and $TP$ is the term premium given by ACM.

The coefficients of the OLS regression, $[\hat{\alpha}\;\;\hat{\beta}]$, are stored, i.e. alpha and beta are kept for the forecasting step.

3. The regression model is then applied to the out-of-sample dataset to predict the excess return as a linear forecast based on the in-sample coefficients; see the sketch below.
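A minimal sketch of steps 2 and 3, assuming monthly series for the excess bond return and the ACM term premium (the function and variable names are our own illustration):

```python
import numpy as np

def bond_premium_forecast(excess_ret, term_premium, split):
    """
    excess_ret   : (T,) realized excess bond returns, r_t - r_f
    term_premium : (T,) ACM term premium series, TP_t
    split        : index ending the in-sample period (1980-2000)
    Step 2: regress in-sample excess returns on a constant and the lagged
    term premium (Equation 4.2.7) and store [alpha_hat, beta_hat].
    Step 3: apply the stored coefficients linearly out-of-sample.
    """
    X = np.column_stack([np.ones(split - 1), term_premium[:split - 1]])
    (alpha, beta), *_ = np.linalg.lstsq(X, excess_ret[1:split], rcond=None)
    # Out-of-sample linear forecast with the coefficients held fixed
    return alpha + beta * term_premium[split - 1:-1]
```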

In the actual estimation of the term structure, ACM apply a five-factor model as their baseline specification, which they compare with the four-factor model known from Cochrane and Piazzesi.

They note that the risk-free short-term rate, together with the other pricing factors, provides estimates of the zero-coupon yields without observing the curves directly. Generally, the term premium is said to reflect compensation for holding long-term bonds, but in reality the bond yield is influenced by several factors, namely the expectations and term premium components. One should note that the methodology of ACM estimates the bond yield, while we, in fact, try to estimate the bond price.

The regressions require that the term premium contains some information about the excess return on long-term bonds. This has already been found to be accurate, since the term premium is based on the five-factor model of ACM, which fitted the zero-coupon yield data provided by Gurkaynak, Sack, and Wright (2007) exceptionally well. As shown in Appendix 6, the term premium has been positive for many years, and since this study estimates the term premium over 1980-2000, the bond premium regression is expected to produce positive forecasts over most of the out-of-sample period.

The most commonly applied residual statistics for evaluating the predictions are MAE, RMSE and MSE. These measure the magnitude of the error in the prediction models. MSE, the mean squared error, computes how close the predicted values are to the true values. Much of the literature proposes using these same measures, MAE, RMSE or MSE. MAE has been cited as the primary measure for comparing out-of-sample forecasts (Chen, Twycross & Garibaldi, 2017). The measure is the average, over the out-of-sample period, of the absolute differences between predicted and actual values. MSE is defined as

$MSE = \frac{1}{N}\sum_{t=1}^{N}(y_t - \hat{y}_t)^2$

where $y$ is the true value and $\hat{y}$ is the predicted value.

RMSE (root mean squared error) is the standard deviation of the residuals. RMSE measures how spread out the prediction errors are and how well the data is fitted; it is also defined as a measure of the differences between the predicted values and the actual observed values. An RMSE of 0 indicates a perfect fit to the data, and the aim is always to achieve the lowest possible prediction error (Chen, Twycross & Garibaldi, 2017).

$RMSE = \sqrt{\frac{1}{N}\sum_{t=1}^{N}(\hat{y}_t - y_t)^2}$

Eventually, the models with the lowest MSE and RMSE are preferred, since they provide the smallest errors of the predicted values relative to the true values. The measures take into account both the estimated bias and the estimated variance (Chen, Twycross & Garibaldi, 2017; Enders, 2014).

Another model-implied error statistic is the R-squared, which measures how close the data are to the fitted regression. In other words, it is defined as the proportion of the variation in the dependent variable that is predictable from the independent variable. The measure is also known as the coefficient of determination:

$R^2 = \frac{\text{explained variation}}{\text{total variation}} = 1 - \frac{\sum_t (y_t - \hat{y}_t)^2}{\sum_t (y_t - \bar{y})^2}$

The R-squared always lies between 0% and 100%, where 0% means that the model explains none of the variability around the model mean, while 100% indicates that the model explains all of it. In general, a higher R-squared suggests a better-fitting model (Tsay, 2002).
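The four evaluation statistics translate directly into code; a minimal numpy sketch (the function names are ours):

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error: average |predicted - actual| over the sample."""
    return np.mean(np.abs(y_hat - y))

def mse(y, y_hat):
    """Mean squared error: (1/N) * sum of (y - y_hat)^2."""
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    """Root mean squared error: the standard deviation of the residuals."""
    return np.sqrt(mse(y, y_hat))

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - residual variation / total variation."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot
```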

The regression model is estimated on the in-sample dataset and then used to forecast the out-of-sample period. The coefficients of the regression models will subsequently be reported together with the test statistic, $t = \frac{\text{coef}}{\text{std. error}}$, and the $R^2$. Similar to the approach of Rapach et al. (2007), the historical average is applied as a benchmark, $\bar{y} = \frac{1}{N}\sum Y$.
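As a final illustration, a short Python sketch of the reported test statistic and the historical-average benchmark; we assume an expanding window for the benchmark mean, in line with Rapach et al. (2007), and the function names are our own:

```python
import numpy as np

def t_stat(coef, std_err):
    """t-statistic of a regression coefficient: estimate / standard error."""
    return coef / std_err

def historical_average_benchmark(y, split):
    """Expanding historical mean of the returns y, used as the benchmark
    forecast for each date in the out-of-sample period (from `split` on)."""
    return np.array([y[:t].mean() for t in range(split, len(y))])
```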