
5. Empirical results

5.6 Robustness checks


superior peer pool depends on the combination of selection variables and the multiple applied. In the next section, the robustness of the best-performing methods from Sections 5.2-5.4 is examined to assess the reliability of the findings in relation to the research question.


accuracy could potentially be improved if the peer group size were customized to each method and peer pool, but doing so could introduce biases when comparing the methods against each other.

Hence, the objective of this section is to examine whether the findings from the first part of the analysis remain robust across alternative peer group sizes; a secondary aim is to consider whether greater accuracy could be achieved with a different peer group size. The three selection methods are tested across peer groups ranging from one to nine comparable firms, as the previously reviewed literature works with peer groups of four to eight firms. Increasing the peer group size beyond nine comparable firms is not deemed relevant in this study, since the Danish peer pool contains few observations per year. The robustness is examined for both the Danish and the EU peer pool, as the results could differ given the small number of observations in the Danish dataset relative to the EU dataset.
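To make the procedure concrete, the sketch below shows one possible implementation of this robustness check in Python; it is an illustration only, not code from the thesis. It assumes a pandas DataFrame df with one row per target firm, numeric columns for the selection variables, and a column holding the multiple to be predicted; all function and column names are hypothetical.

```python
import numpy as np
import pandas as pd

def sard_distances(pool: pd.DataFrame, target: pd.Series, variables: list[str]) -> pd.Series:
    """Sum of absolute rank differences (SARD) between the target and every firm in the pool."""
    combined = pd.concat([pool[variables], target[variables].to_frame().T]).astype(float)
    ranks = combined.rank()                      # rank each selection variable over pool + target
    target_ranks = ranks.iloc[-1]
    return (ranks.iloc[:-1] - target_ranks).abs().sum(axis=1)   # one SARD score per potential peer

def median_ape_by_group_size(df: pd.DataFrame, variables: list[str],
                             multiple: str = "ev_sales", max_k: int = 9) -> pd.Series:
    """Median absolute percentage error of peer-based multiple predictions for k = 1..max_k peers."""
    ape_by_k = {k: [] for k in range(1, max_k + 1)}
    for idx, target in df.iterrows():
        pool = df.drop(index=idx)                # out-of-sample: the target is excluded from its own pool
        ordered = pool.loc[sard_distances(pool, target, variables).sort_values().index]
        for k in ape_by_k:
            predicted = ordered[multiple].head(k).median()      # peer-group median multiple
            ape_by_k[k].append(abs(predicted - target[multiple]) / target[multiple])
    return pd.Series({k: float(np.median(v)) for k, v in ape_by_k.items()}, name="median_APE")
```

A call such as median_ape_by_group_size(df, ["roe", "nd_ebit", "size", "ebit_margin", "growth"], "ev_sales") would trace out the kind of accuracy-versus-peer-group-size curve that a figure like Figure 5.1 plots for SARD5; the variable names are again hypothetical.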

Figure 5.1 illustrates the selection methods' prediction accuracy, measured by the median absolute percentage error, using a Danish peer pool for each of the three multiples. Overall, the differences in prediction accuracy between the selection methods are most prominent for EV/Sales compared to the two earnings multiples. However, all three multiples follow a similar pattern. For EV/Sales, a minor improvement in accuracy is seen for a peer group size of four. The improvement is greatest for the industry method but is also noticeable for INDSARD3 and INDSARD4. Overall, the spread between the selection methods for EV/Sales stays relatively stable across peer group sizes, so the methods maintain their relative performance ranking.

Hence, SARD5 keeps its ranking as the most accurate method for predicting EV/Sales multiples. This indicates that the findings on the optimal peer selection methods remain robust across peer group sizes, increasing the reliability of the results in relation to the research question.

For EV/EBITDA, the spread between the selection methods is smaller than for EV/Sales, which is consistent with the more ambiguous results observed in the first part of the analysis. However, all methods are relatively stable across peer group sizes, and in particular the industry method is significantly outperformed regardless of the number of peers applied. For EV/EBIT, the selection methods also appear robust across peer groups of one to four comparable firms when examining the level of prediction errors. In terms of the relative performance of the SARD and INDSARD combinations, it varies somewhat which method is optimal; however, these differences are small and reflect the overall ambiguous pattern obtained in the first part of the analysis. When the peer group size exceeds four firms, the INDSARD methods progressively separate from SARD and become less accurate. This pattern indicates that as the number of firms in the peer group increases, the INDSARD methods approach the prediction accuracy of the pure industry method. The results reflect the issue discussed in Section 5.2.2: a limitation of INDSARD is that peer groups become predetermined because there are too few observations per industry. As the peer group grows, the peers are effectively selected by the industry criterion alone, limiting the influence of SARD. Notably, choosing a peer group size larger than four for EV/EBITDA and EV/EBIT could improve the prediction accuracy of the SARD methods. However, the gains are minor and not necessarily statistically significant.
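To make the mechanism behind this limitation concrete, the sketch below shows one way an INDSARD-style selection could be implemented: the pool is first restricted to the target's 2-digit GICS industry, and SARD only orders firms within that subset, so once the requested peer group size reaches the number of same-industry firms, the ranking step has no influence. It reuses the hypothetical sard_distances helper from the previous sketch; the column name gics2 is likewise an assumption.

```python
def indsard_peers(df: pd.DataFrame, target_idx, variables: list[str],
                  k: int, industry_col: str = "gics2") -> pd.DataFrame:
    """Select k peers within the target's 2-digit GICS industry, ordered by SARD."""
    target = df.loc[target_idx]
    pool = df.drop(index=target_idx)
    same_industry = pool[pool[industry_col] == target[industry_col]]
    if len(same_industry) <= k:
        # Too few industry peers: the peer group is fully determined by the industry
        # criterion and the SARD ranking has no influence on the selection.
        return same_industry
    order = sard_distances(same_industry, target, variables).sort_values().index
    return same_industry.loc[order].head(k)
```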


Figure 5.2 shows the accuracy across peer group sizes when the expanded EU peer pool is applied. Similar to the results using a Danish peer pool, EV/Sales is robust, as the spread between the methods stays constant across peer group sizes. For EV/EBIT and EV/EBITDA, the industry method varies across peer group sizes, with an increase in accuracy for a peer group of five comparable firms. Notably, the industry method outperforms INDSARD4 and INDSARD5 for EV/EBIT when five comparable firms are used in the peer group. Overall, Figure 5.2 indicates that a larger peer group size could be optimal for the larger cross-country peer pool. However, the increases observed are marginal and presumably not significant. Finally, SARD3, SARD4, SARD5, and INDSARD3 remain the best-performing selection methods across peer group sizes, supporting the results observed in Section 5.3 and thereby the reliability of the findings in relation to the research question.


5.6.2 Prediction errors across time

In the following section, the sensitivity of the findings is assessed over time. As in the previous section, all three selection methods are tested, and the best-performing combinations of SARD and INDSARD are examined. The sensitivity check is only reviewed for the EU peer pool; the corresponding table for the Danish peer pool can be found in Appendix 11, Table 11.1. As the Danish dataset contains only 53 to 64 observations per year, as seen in Appendix 1, Table 1.1, a change in observations may have a greater impact on the results. Thus, fluctuations in prediction accuracy across years can be difficult to interpret, as it cannot be determined whether a fluctuation is caused by the change in the number of observations or by market conditions affecting the models in a specific year. As the EU dataset is larger, it is less influenced by the number of observations per year and is therefore used as the baseline for this sensitivity assessment.
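The per-year sensitivity check itself is a simple aggregation. A minimal sketch, assuming the firm-level errors from the earlier sketches are collected in a long-format DataFrame errors with hypothetical columns method, year, and ape:

```python
# Median absolute percentage error per selection method and year,
# comparable to what a figure like Figure 5.3 plots.
by_year = (errors
           .groupby(["method", "year"])["ape"]
           .median()
           .unstack("year"))          # rows: selection methods, columns: 2010-2019

# Observations per year, to judge how sensitive a given year's error level is
# to changes in the underlying sample (most relevant for the small Danish pool).
obs_per_year = errors.groupby("year")["ape"].count()
```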

As seen in Figure 5.3, the individual methods under EV/Sales look relatively stable in terms of prediction errors, with the exception of industry and INDSARD3. No distinctive pattern across the years can be observed for these two selection methods. The fluctuations presumably stem from industries experiencing up- and downturns in valuations, but examining this further is outside the scope of this thesis. Comparing the INDSARD methods with each other, INDSARD4 and INDSARD5 are stable over the years, as opposed to INDSARD3. This indicates that adding Historic Revenue Growth stabilizes the selection method over time. The rankings between the different methods are relatively stable across the years, with industry remaining the worst-performing method and SARD5 the most accurate.

The spreads of industry and SARD5 vis-à-vis the other methods are considerably larger than for the two earnings multiples and are consistent over time. This connects to the significant results found for the EV/Sales multiple in Section 5.3.

In contrast, for EV/EBITDA and EV/EBIT the spread between methods is smaller, consistent with the generally weaker significance levels found in Section 5.3 relative to EV/Sales. Overall, Figure 5.3 shows for the two earnings multiples that industry, INDSARD4, and INDSARD5 are the least accurate selection methods across time. Moreover, SARD3, SARD4, SARD5, and INDSARD3 appear to be the best-performing methods; however, their rankings relative to each other vary across time, which explains the insignificant results and the lack of a clear pattern in the first part of the analysis. Thus, the findings across 2010 to 2019 indicate that the results related to the research question vary if they are not averaged over the sample period.

5.6.3 Error groupings

In the following section, the sample of Danish target firms is grouped by industry and by firm size, respectively, in order to examine whether certain groups of targets drive this study's findings or whether some groups show different patterns for optimal peer group selection. The groupings are conducted directly on the prediction errors presented in Sections 5.2-5.4.
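The groupings themselves are straightforward aggregations of the firm-level errors. A sketch, reusing the hypothetical errors DataFrame and assuming it also carries the target's 2-digit GICS code in a column gics2 and a firm identifier firm_id:

```python
# Median absolute percentage error per GICS 2-digit industry and selection method,
# analogous to the layout of Table 5.10 (Panels A and B would correspond to subsets of methods).
by_industry = (errors
               .groupby(["gics2", "method"])["ape"]
               .median()
               .unstack("method"))

# Number of target firms per industry, reported in parentheses in the tables.
firms_per_industry = errors.groupby("gics2")["firm_id"].nunique()
```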

5.6.3.1 Prediction errors across industry

A sensitivity test across industries provides insight into whether the results are dictated by certain industries or remain robust. As the sample used in this thesis only consists of Danish targets, a possible bias towards certain industries can arise. The Utility and Energy sector is excluded from the robustness check, as too few Danish target firms belong to this sector to provide results that can be meaningfully interpreted.

Table 5.10 reports the prediction errors of the selection methods grouped by industry at the GICS 2-digit level for the EV/Sales multiple. Panel A presents SARD benchmarked against industry affiliation, while Panel B presents the combinations of INDSARD. The number of firms within each industry grouping is reported in parentheses and shows that the target firms in the sample are not evenly distributed among the industries, e.g. 184 firms belong to 'Industrials' as opposed to 17 firms in 'Real Estate', implying that results might be influenced by the larger industry groups. In general, the industry and SARD results appear relatively robust across industry groups, as shown in Panel A of Table 5.10, with the SARD combinations for the most part outperforming industry affiliation. However, the relative performance of the individual SARD combinations depends on the industry group. For seven out of nine industry groups, SARD5 is the most accurate method compared to the remaining SARD combinations and industry affiliation.

Furthermore, industry and SARD1 are the worst-performing selection methods across all industries, corresponding to the results obtained in the first part of the analysis. In Panel B, reporting the results of the INDSARD combinations, the results are likewise relatively robust across industries, e.g. the best-performing method is INDSARD3, as found in Section 5.3.2.


Tables 5.11 and 5.12 report the prediction errors of the three selection methods for EV/EBITDA and EV/EBIT. Panels A and B generally show that no method consistently returns the most accurate valuation estimates across industries, indicating that the results do depend on industry for all three selection methods. Overall, the results found in Sections 5.2-5.4 are heavily driven by the 'Industrials' group, as a high percentage of the sample belongs to this industry. This is consistent with the pattern observed for 'Industrials' in Tables 5.11 and 5.12 being similar to the results obtained in the previous sections, for instance with INDSARD3 yielding the highest prediction accuracy.

However, across the remaining industry groupings the most accurate methods vary, supporting the lack of significant results. The rankings comparing SARD and INDSARD are also more ambiguous for EV/EBITDA and EV/EBIT, suggesting that the most accurate method is difficult to determine due to the variation across industry groups.

5.6.3.2 Prediction errors across firm size

In order to examine the robustness of the findings across firm size, the sample of Danish targets is split based on the market value of equity, i.e. Market Capitalization. The groups are based on sub-indexes, which in this study are determined by the official Nasdaq OMX guidelines for Europe. Hence, Small Cap is defined as firms with a Market Cap below EUR 150m, while the Mid Cap index consists of firms ranging between EUR 150m and 1,000m. Finally, Large Cap contains firms with a Market Cap above EUR 1,000m (Nordnet, 2020). Based on these definitions, the sample of Danish targets consists of 40% Small Cap, 20% Mid Cap, and 40% Large Cap. Such a composition could potentially affect valuation accuracy, since the literature finds that prediction errors are highly correlated with firm size, with larger firms yielding greater valuation accuracy (Alford, 1992).
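A sketch of the size classification under the thresholds stated above, assuming market capitalization is available in EUR millions in a hypothetical column market_cap_eur_m (firms exactly on a threshold are assigned arbitrarily here):

```python
def size_index(market_cap_eur_m: float) -> str:
    """Classify a firm into Nasdaq OMX-style sub-indexes by market capitalization (EUR millions)."""
    if market_cap_eur_m < 150:
        return "Small Cap"
    if market_cap_eur_m <= 1_000:
        return "Mid Cap"
    return "Large Cap"

# Attach the grouping to the hypothetical errors DataFrame and summarize per index,
# analogous to the layout of Table 5.13.
errors["size_index"] = errors["market_cap_eur_m"].map(size_index)
by_size = errors.groupby(["size_index", "method"])["ape"].median().unstack("method")
```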

In Table 5.13, the median absolute percentage errors for Danish targets based on the EU peer pool are presented for each of the three EV-based multiples. Panel A presents the SARD method with industry as benchmark, while Panel B shows the grouping of errors for INDSARD.

When interpreting the results, there is no distinctive pattern of one index generating lower prediction errors than the other indexes, since the outcome varies not only across multiples but also with the combination of SARD or INDSARD applied. These findings indicate that Large Cap firms do not yield greater valuation accuracy than Small or Mid Cap firms for Danish targets.


The results in Section 5.3 in the first part of the analysis showed that the fundamental approach, SARD, yields significantly greater accuracy than industry affiliation with an underlying EU peer pool. Table 5.13 indicates that those findings are overall robust across firm size when considering industry against the most accurate SARD combinations. However, for Large Cap firms, only SARD3-SARD5 outperform industry; the least accurate predictions are in fact obtained using SARD1. This indicates that the results from the first part of the analysis, where all combinations of SARD outperform industry with statistical significance, are driven by the Small and Mid Cap firms.


Observing the most accurate combinations of SARD and INDSARD, some deviation across firm size is found in Table 5.13. Overall, INDSARD3 is the most accurate combination for EV/EBITDA regardless of firm size, while for EV/Sales and EV/EBIT the ranking of each combination is somewhat dependent on the individual indexes. INDSARD3 is similarly the best valuation predictor for Mid and Large Cap firms, while Small Cap firms yield greater prediction accuracy with INDSARD5 for EV/Sales and INDSARD2 for EV/EBIT. In regard to SARD, the results are robust across firm size for EV/Sales. However, for the earnings multiples, SARD5 is the most accurate valuation predictor only for Mid and Large Cap, while SARD3 is more accurate for Small Cap firms. This helps explain the insignificant results found in Section 5.3.1 for SARD5 against SARD3 using an EU peer pool for predictions.

When comparing SARD against INDSARD, none of the indexes differs from the others in terms of whether greater accuracy is generated across industries (SARD) or within industries (INDSARD). Similarly, when comparing whether the EU peer pool or the smaller Danish peer pool is more accurate, as reported in Appendix 12, Table 12.1, no single index drives the overall results, since the outcome varies from index to index. One exception occurs for EV/Sales using INDSARD, as Small Cap firms generally seem to yield greater accuracy using Danish peers, while Mid and Large Cap firms obtain better predictions using the broader EU peer pool.

To summarize, the results in this study are not sensitive to firm size in terms of overall prediction errors, as no specific firm size generally yields greater accuracy than the others. However, the findings suggest that valuation accuracy could be improved if the selection methods were tailored to firm size in terms of the optimal combination of selection variables, as some deviations are found for Small Cap firms.

5.6.4 Univariate tests

Knudsen et al. (2017) identify peers “from an incrementally increasing ladder of combinations ranging from one to five selection variables” (p. 89). Hence, ROE is applied first as it yields the most accurate multiple predictions on a stand-alone basis, Net Debt/EBIT yields the second-most accurate, and so forth. As justified in the methodology in Section 4.3.1, the research design of this thesis is built around applying the same five selection variables as Knudsen et al. (2017) in order to ensure a foundation for comparing the performance of SARD in the original paper with the results when the model is applied to a smaller market such as Denmark. However, the fourth selection variable differs somewhat from Knudsen et al. (2017), as previously described, since one-year historic revenue growth is applied in the absence of EPS forecasts. In this section, equivalent univariate tests are performed to examine whether the sequence of the five selection variables is fully appropriate and optimal for this study. The relatively high level of absolute percentage errors obtained in the first part of the analysis could presumably stem from the sequence not being optimal. Moreover, the results did not show a progressive improvement in all cases when a selection variable was added to the combinations, also suggesting that the sequence is not fully appropriate for the chosen multiples.

Table 5.15 presents the median absolute percentage errors for each of the five selection variables when used individually as the sole selection criterion for firm valuation on the EU peer pool. Each selection variable is examined for all three multiples, with the first line in the table presenting the errors when the variable is applied across industries and the second line the errors when it is applied within industries.
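One possible implementation of such univariate tests, reusing the hypothetical helpers from the earlier sketches (a single-variable selection is simply SARD, or INDSARD, with one selection variable) and assuming a peer group of four comparable firms:

```python
VARIABLES = ["roe", "nd_ebit", "size", "ebit_margin", "growth"]   # hypothetical column names

def univariate_errors(df: pd.DataFrame, multiple: str, k: int = 4,
                      within_industry: bool = False) -> pd.Series:
    """Median absolute percentage error per selection variable used on its own."""
    result = {}
    for var in VARIABLES:
        apes = []
        for idx, target in df.iterrows():
            if within_industry:
                peers = indsard_peers(df, idx, [var], k)                  # single-variable INDSARD
            else:
                pool = df.drop(index=idx)
                order = sard_distances(pool, target, [var]).sort_values().index
                peers = pool.loc[order].head(k)                           # single-variable SARD
            if peers.empty:
                continue                                                  # no same-industry peers available
            predicted = peers[multiple].median()
            apes.append(abs(predicted - target[multiple]) / target[multiple])
        result[var] = float(np.median(apes))
    return pd.Series(result, name=multiple)

# The ordering implied by the univariate test is simply the variables sorted by error:
ranking = univariate_errors(df, "ev_sales").sort_values().index.tolist()
```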

First and foremost, when examining the level for each selection variable individually across industries (SARD) compared to within industries (INDSARD), the findings suggest that all selection variables yield greater accuracy when applied within industries. For instance, using Size for firm valuation through EV/Sales, the prediction error is 0.658, while it drops to 0.568 when Size is applied within the same industries (still within the 2-digit GICS level). Testing each selection variable statistically with Wilcoxon signed-rank tests in Appendix 13, Table 13.1, shows that in general all selection variables are significantly better at the 1%-level for EV/Sales, EV/EBITDA, and EV/EBIT when applied within industries. In Section 5.3.2, findings suggest that some combinations of INDSARD do not outperform SARD for the EV/EBITDA and EV/EBIT multiples with an EU peer pool. Since the univariate results indicate that, on an individual basis, all selection variables are better at predicting valuation multiples under INDSARD, it implies that it is not the selection variable in itself that yields less accurate results when applied within industries, but rather that the combination of selection variables is not fully appropriate for the underlying value drivers when used for INDSARD.
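The statistical comparison can be sketched as a paired Wilcoxon signed-rank test on firm-level absolute percentage errors using scipy.stats.wilcoxon; the method labels below are hypothetical, and the two series are assumed to be ordered by the same target observations.

```python
from scipy.stats import wilcoxon

# Paired firm-level absolute percentage errors for one selection variable, e.g. Size
# applied across industries (SARD-style) versus within industries (INDSARD-style).
ape_across = errors.query("method == 'SARD_size'").sort_values("firm_id")["ape"].to_numpy()
ape_within = errors.query("method == 'INDSARD_size'").sort_values("firm_id")["ape"].to_numpy()

# Signed-rank test on the paired differences; a p-value below 0.01 would correspond to
# the 1%-level significance reported for the within-industry comparisons.
stat, p_value = wilcoxon(ape_across, ape_within)
```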

Subsequently, the univariate tests in Table 5.15 show that the ranking of the selection variables varies across multiples. Hence, if the sequence were based on the univariate tests, as done by Knudsen et al. (2017), the optimal order for each multiple would be as follows:

EV/Sales:
SARD: EBIT margin + Size + ROE + Growth + ND/EBIT
INDSARD: EBIT margin + Size + ROE + Growth + ND/EBIT

EV/EBITDA:
SARD: Size + ROE + ND/EBIT + EBIT margin + Growth
INDSARD: ND/EBIT + EBIT margin + Growth + Size + ROE

EV/EBIT:
SARD: ND/EBIT + EBIT margin + Growth + Size + ROE
INDSARD: ND/EBIT + Size + EBIT margin + Growth + ROE

Overall, the findings imply that the three multiples do not prioritize the selection variables identically. It is worth noticing that ROE is ranked near the bottom in most cases, as opposed to its leading position in the original application by Knudsen et al. (2017) and in this study. For EV/Sales, the EBIT margin is the most accurate valuation predictor, significant at the 1%-level as seen in Table 5.16. This corresponds to the highly improved prediction errors obtained in the first part of the analysis in Sections 5.2-5.4. EV/Sales is the only multiple where the optimal sequence of selection variables is the same for SARD and INDSARD. For EV/EBIT the pattern is rather similar between SARD and INDSARD, but Size changes position, ranking second best for INDSARD and second worst for SARD. Net Debt/EBIT is the best-performing selection variable both across and within industries for EV/EBIT and is, as seen in Table 5.16, significant at the 1%-level compared to the four other variables for both SARD and INDSARD. For EV/EBITDA, however, the sequence of selection variables in SARD does not share similarities with the sequence in INDSARD. Size and ROE are determined to be the most appropriate proxies for the underlying value drivers when applied across industries, while the same two selection variables show the least accurate predictions when