
4.7 Regression methodology for cross-sectional analysis

The following section on the regression methodology for the cross-sectional analysis is based on the econometric theory provided in Baddeley and Barrowclough (2009) and Wooldridge (2012). For the purpose of investigating the cross-sectional determinants of the CARs from our event study, Ordinary Least Squares (OLS) estimation will be applied. OLS involves estimating the line of best fit through the observations on the dependent variable and the explanatory variables27. OLS estimation has the advantage of being computationally simple whilst still being able to yield unbiased and reliable estimators.
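To make the setup concrete, below is a minimal sketch of how such a cross-sectional OLS regression could be estimated in Python with pandas and statsmodels; the file name and column names (car, firm_size, offer_size) are illustrative assumptions, not the actual data used in this thesis.

```python
# Minimal sketch of the cross-sectional OLS setup, assuming the CARs and
# candidate explanatory variables are collected in a pandas DataFrame.
# The file and column names are purely illustrative.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("cross_section.csv")                  # hypothetical file with one row per IPO

y = df["car"]                                          # cumulative abnormal return (dependent variable)
X = sm.add_constant(df[["firm_size", "offer_size"]])   # candidate regressors plus an intercept

ols_results = sm.OLS(y, X).fit()                       # ordinary least squares estimation
print(ols_results.summary())                           # coefficients, standard errors, t- and p-values
```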

In order for estimators to be unbiased and consistent, the Gauss-Markov OLS assumptions for Best Linear Unbiased Estimators (BLUE) have to be satisfied:

1. The error term follows a normal distribution with mean equal to zero; E(εᵢ) = 0 for all i

2. No autocorrelation; Cov(εᵢ, εⱼ) = 0 for i ≠ j

3. Homoscedasticity; σᵢ² = σⱼ² = σ² for all i and j

4. Model is correctly specified; no omitted or surplus explanatory variables

5. Exogeneity; Cov(εᵢ, Xᵢ) = 0

6. Linearity in the parameters

7. No perfect multicollinearity; no perfect linear association between explanatory variables

In this section, each of the seven Gauss-Markov assumptions listed above will be explained, and the methods used to ensure compliance with these assumptions will be outlined.

4.7.1 Assumption 1 - The error term follows a normal distribution

When estimating parameters through OLS estimation, we can make inferences about whether or not our hypotheses on the relationship between CAR and our explanatory variables capture what is observed in our real-world dataset. However, if we want to test the accuracy of these inferences, we have to add the assumption of normality of the error term. Under non-normality, the t-distribution is only an approximation and will thus introduce some inaccuracy into the p-values.

With that being said, OLS does not per se require normal errors in order to estimate coefficients efficiently. In large samples, the central limit theorem can be invoked to obtain approximately correct p-values. It is acknowledged among econometricians that normality plays a lesser role in showing that the OLS estimators are the best linear unbiased estimators. As Gelman and Hill (2006) put it: “The regression assumption that is generally least important is that the errors are normally distributed. In fact, for the purpose of estimating the regression line (as compared to predicting individual data points), the assumption of normality is barely important at all. Thus, in contrast to many regression textbooks, we do not recommend diagnostics of the normality of regression residuals” (Gelman and Hill, 2006, p. 46).

27 Explanatory variables, predictors, hypothesis variables, and independent variables are used interchangeably throughout this thesis

Although the normality assumption is often regarded as redundant when estimating best linear unbiased estimators, we will still test the assumption by analysing QQ-plots and performing Jarque-Bera and Shapiro-Wilk tests.
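As an illustration, these diagnostics could be run along the following lines, assuming the fitted statsmodels result object `ols_results` from the earlier sketch.

```python
# Illustrative normality diagnostics on the OLS residuals.
import scipy.stats as stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

residuals = ols_results.resid

# QQ-plot of the residuals against the theoretical normal quantiles
sm.qqplot(residuals, line="45", fit=True)
plt.show()

# Jarque-Bera test: H0 = residuals are normally distributed
jb_stat, jb_pvalue = stats.jarque_bera(residuals)

# Shapiro-Wilk test: H0 = residuals are normally distributed
sw_stat, sw_pvalue = stats.shapiro(residuals)

print(f"Jarque-Bera:  stat={jb_stat:.3f}, p-value={jb_pvalue:.3f}")
print(f"Shapiro-Wilk: stat={sw_stat:.3f}, p-value={sw_pvalue:.3f}")
```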

4.7.2 Assumption 2 - No autocorrelation

Autocorrelation is present in a dataset when the assumption that the covariance between error terms is zero is violated. Coefficient estimates remain unbiased; however, they are not efficient, since the residuals capture not only random and unimportant factors but also deterministic factors. Thus, the residuals from our sample regression will correlate with each other and, in turn, be non-random.

Autocorrelation is most commonly a problem in time-series regression; however, it can also exist in cross-sectional data. Spatial autocorrelation occurs when the data follows a natural spatial ordering whereby the residuals across cross-sectional units are correlated.

In our dataset, there are only a few reasons why the residuals should correlate over time or space. Firstly, stock prices in general, and the beta values in our market model, are affected by the state of the Nordic markets at the specific point in time. This should already be accounted for by the inclusion of a control variable for the year of the IPO. Secondly, spatial autocorrelation might occur if certain conditions and institutional structures are shared among firms on the same exchange or within the same industry. This is indeed plausible, which is why we attempt to capture these effects by including control variables for exchange and industry. As we have not been able to identify any other obvious reasons why our residuals should be correlated, we assume no autocorrelation when performing our regression analysis.

4.7.3 Assumption 3 - Homoscedasticity

The assumption of homoscedasticity requires that the variance of the error term is constant across observations. Constant error variance means that if one obtains repeated samples of data, the variance of the error for each observation would be the same as the variance of the error for all other observations. However, if heteroscedasticity is present, the error variance varies systematically across observations as our explanatory variables vary. If the assumption of homoscedasticity is violated, the OLS estimators become inefficient and the variance estimates become biased, which in turn leads to misleading conclusions from t- and F-tests.

In order to test the assumption of homoscedasticity, we will perform Breusch-Pagan’s heteroscedasticity test and White’s general heteroscedasticity test.
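A sketch of how both tests could be carried out, again assuming the fitted statsmodels result object `ols_results` from the earlier sketch:

```python
# Heteroscedasticity tests on the OLS residuals.
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

exog = ols_results.model.exog
resid = ols_results.resid

# Breusch-Pagan test: H0 = homoscedastic errors
bp_lm, bp_lm_pvalue, bp_f, bp_f_pvalue = het_breuschpagan(resid, exog)

# White's general test: H0 = homoscedastic errors (also detects nonlinear forms)
w_lm, w_lm_pvalue, w_f, w_f_pvalue = het_white(resid, exog)

print(f"Breusch-Pagan LM p-value: {bp_lm_pvalue:.3f}")
print(f"White LM p-value:         {w_lm_pvalue:.3f}")
```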

4.7.4 Assumption 4 - Model is correctly specified

For a regression model to be correctly specified, it should neither omit any relevant variables with predictive power nor include any irrelevant explanatory variables. Generally, a well-specified regression model has the attributes of parsimony, theoretical consistency, identifiability, goodness of fit, and predictive power. To achieve this, theory and previous empirical research must guide the selection of the appropriate variables. Through our extensive literature review and hypothesis development, we have identified our cross-sectional hypotheses and the appertaining variables.

Our hypothesis variables are, at first, merely candidate variables for our final regression model. Our goal is to estimate a parsimonious regression model that describes the differences in CAR across all sample units simply and effectively; that is, a model that is sparing in its terms yet useful. Therefore, we perform a stepwise regression procedure whereby we enter explanatory variables into, and remove them from, our model until we observe no justifiable reason to proceed.
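One simple way to operationalise such a procedure is backward elimination based on p-values; the sketch below is only one of several possible variants, and the significance threshold and column names are illustrative assumptions rather than the choices actually made in the thesis.

```python
# Backward-elimination sketch: starting from all candidate variables, drop the
# least significant one until every remaining p-value clears the threshold.
import pandas as pd
import statsmodels.api as sm

def backward_elimination(df, dependent, candidates, threshold=0.10):
    selected = list(candidates)
    while True:
        X = sm.add_constant(df[selected])
        results = sm.OLS(df[dependent], X).fit()
        pvalues = results.pvalues.drop("const")   # ignore the intercept
        worst = pvalues.idxmax()                  # least significant remaining variable
        if pvalues[worst] <= threshold or len(selected) == 1:
            return results
        selected.remove(worst)                    # drop it and re-estimate

# Example usage (column names are hypothetical):
# final_model = backward_elimination(df, "car", ["firm_size", "offer_size", "firm_age"])
# print(final_model.summary())
```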

When narrowing down our model specification, we automatically disregard certain cross-sectional hypotheses. The hypotheses regarding the omitted variables will therefore be discredited. For such variables, thorough theoretical and logical reasoning will be provided to substantiate the insignificance of these specific hypothesised relationships with CAR. The hypotheses regarding the variables included in the final model will be scrutinised further and thereafter concluded upon.

It should be emphasised that model building is an art rather than a science. It is based on subjective decision-making and the final model will merely be one out of many feasible and justifiable specifications.

4.7.5 Assumption 5 - Exogeneity

Violation of the assumption of exogeneity creates the problem of endogeneity. Endogeneity occurs when an explanatory variable is correlated with the error term. When an explanatory variable and the error term are correlated, OLS estimates will include some of the influence of the errors and, as a result, the coefficient estimates will be both biased and inconsistent.

Endogeneity often stems from omitted variable bias where one or more omitted variables are correlated with one or more of the included explanatory variables. We will investigate the problems of endogeneity and omitted variable bias closely when performing the stepwise regression procedure.
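As a simple illustration of the mechanism, the following small simulation (with made-up coefficients, unrelated to the thesis data) shows how omitting a regressor that is correlated with an included variable biases the OLS estimate of the included variable's coefficient.

```python
# Simulation of omitted variable bias: x2 affects y and is correlated with x1,
# so leaving x2 out of the regression biases the estimated coefficient on x1.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000

x2 = rng.normal(size=n)
x1 = 0.7 * x2 + rng.normal(size=n)               # x1 is correlated with x2
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
short = sm.OLS(y, sm.add_constant(x1)).fit()     # x2 omitted -> endogeneity

print("beta_1 with x2 included:", round(full.params[1], 2))   # close to the true value 2.0
print("beta_1 with x2 omitted: ", round(short.params[1], 2))  # biased upwards (around 3.4)
```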

4.7.6 Assumption 6 - Linearity in the parameters

Linearity means that the mean of the response, E(Yᵢ), at each set of values of the predictors, (x₁ᵢ, x₂ᵢ, …), is a linear function of the predictors. That is, the regression equations for our regression models must be linear functions of the parameters.

To assess this assumption, we will graphically examine the linear relationship between the residuals and fitted values, as well as between the residuals and the explanatory variables.
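A minimal example of such a graphical check, assuming the fitted statsmodels result object `ols_results` from the earlier sketches; a roughly random scatter around zero supports the linearity assumption.

```python
# Residuals plotted against fitted values (analogous plots can be made
# against each explanatory variable).
import matplotlib.pyplot as plt

plt.scatter(ols_results.fittedvalues, ols_results.resid, alpha=0.6)
plt.axhline(0, color="grey", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs. fitted values")
plt.show()
```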

4.7.7 Assumption 7 - No perfect multicollinearity

Perfect multicollinearity is the extreme version of multicollinearity that occurs when there exists a perfect linear correlation between explanatory variables (i.e. the correlation between two or more explanatory variables is exactly equal to ±1). Perfect multicollinearity makes OLS estimation spurious, as it is impossible to separate the individual effects of the explanatory variables. As a result, variances will be infinitely large and confidence intervals infinitely wide.

Perfect multicollinearity is seldom present when performing regression analysis; imperfect multicollinearity, however, is a common issue for econometricians. Imperfect multicollinearity is evident when the correlation coefficient is large but less than 1 in absolute terms. With imperfect multicollinearity, OLS estimation does not become spurious as is the case for perfect multicollinearity. However, the estimation will be less accurate: the standard errors of the coefficient estimates will be large, the t-statistics of individual coefficients will appear low and insignificant, R² will remain high despite the low t-statistics, and some of the coefficient signs may be opposite to the true relationship.

To detect this problem, we will check the simple pairwise correlation coefficients between the explanatory variables by computing a correlation matrix. In addition, we will compute Variance Inflation Factors (VIFs), which are compared to an upper-limit rule-of-thumb value of 10.
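As an illustration, both diagnostics could be computed as follows, assuming the explanatory variables are held in the pandas DataFrame from the earlier sketch; the column names are hypothetical.

```python
# Pairwise correlation matrix and Variance Inflation Factors (VIFs).
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X_vars = df[["firm_size", "offer_size", "firm_age"]]   # hypothetical explanatory variables

# Pairwise correlation matrix of the explanatory variables
print(X_vars.corr())

# VIFs computed on the design matrix including the constant term
design = sm.add_constant(X_vars)
vifs = pd.Series(
    [variance_inflation_factor(design.values, i) for i in range(design.shape[1])],
    index=design.columns,
)
print(vifs.drop("const"))   # compare against the rule-of-thumb upper limit of 10
```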