
DISCUSSION

In document May 15th, 2019 Mathias Pagh Jensen (Pages 74-80)


Predominantly, I outlined the availability heuristic, authority bias, and the affect heuristic, all of which alter decision-making. These concepts led to the research question. The results – especially those derived with the IRFs – indicate effects on stock returns consistent with the aforementioned concepts. Both IRFs – for verified and non-verified users, respectively – suggest that both the affect heuristic and the availability heuristic play a role. How so?

Consider the availability heuristic: the most recent news or sentiments color human decision-making. The rapidly decaying IRFs are suggestive of this phenomenon. A positive shock to the sentiment variable occurs, the market reacts the following day – for verified users the following day and perhaps one additional day – after which the effect dies out completely. In terms of the affect heuristic, the fact that a response in the stock return occurs at all indicates that affections – or sentiments – carry information.

The Granger causality tests conflict mildly with the findings related to the IRFs. However, this behavior can be explained by the statistics. That is, the Granger causality test considers only the one-directional relationship – whether sentiment Granger-causes the stock return – whereas the IRF takes the whole system into consideration: the lagged values of both the stock returns and the sentiment score. As a result, inference can vary between the two methodologies. With regard to the Granger causality tests, we do notice a single promising model: the verified users' sentiment scores. At the 10% significance level (90% confidence), the test is statistically significant, suggesting a Granger-causal relationship. More on this below.

We briefly touch upon authority bias in the following. First of all, we notice that the impulse lasts longer when measuring verified users' sentiments. The results indicate that sentiments – or opinions – from public figures drive decisions more so than sentiments from regular people. While the above can support the presence of authority bias, it may also relate to who is verified on Twitter.

On Twitter, media outlets – such as newspapers and TV stations – are verified. If such verified users break news stories, good or bad, this may very well relate to Apple's actual stock performance, which in turn has an effect on investors. Recall that the variables are interdependent, and thus so are the effects of a given shock.


Speaking of verified users, we do see indications that the number of tweets (the operationalization of "hype") affects the stock price too. Interestingly, a positive shock to the change in the number of tweets is followed by a negative return the following day. The pattern is not too surprising, though: bad news sells more stories than good news, so an increase in the number of tweets can be related to negative stories coming out. Nevertheless, the response to such an impulse dies out at n = 2, suggesting that the availability heuristic holds. While the VAR system provides promising results through the IRF, the Granger causality test does not indicate a Granger-causal relationship.

Concisely wrapping up the above, there are indicative signs supporting the research question, i.e. that the lagged change in the number of tweets and Twitter sentiments have an effect on the stock return of Apple. By extension, this suggests support for the behavioral concepts, including both the heuristics and the biases. Broadening the perspective, are the above findings important?

I argue yes. If social media posts in fact prove to be an important feature in relation to stock returns beyond the suggestive, it may have major repercussions. How so? Not only is it essential knowledge for enterprises, which need to monitor and handle the social media channel even more tightly, but the findings are also important on a macroeconomic level. What if Twitter sentiments can drive the entire market? It would leave markets vulnerable to newer threats such as cyber-attacks, showing that the indicative results are interesting to more people than arbitrage seekers.

To establish a further understanding of the obtained results, I discuss both the data quality and the methodological solidity in the subsection below.

6.2 Data Quality & Methodological Solidity

To put the results into perspective, one must discuss the quality of the data and the solidity of the methodology. More specifically, I discuss the validity and reliability of the data as well as the number of data points that go into the statistical models. First of all, let us define validity and reliability according to Olsen (2003). In broad terms, validity embraces the suitability of the data and methodology in relation to the question at hand: did I measure the variable I intended to measure? Reliability, on the other hand, implies time consistency. Basically, it means that the methodology provides the same results today and tomorrow – the results are reproducible, that is. Together, validity and reliability account for the solidity of the paper.

Starting with validity, I attempted to operationalize the concepts of behavioral finance – sentiments, affections, heuristics, and biases – through Twitter data. I sampled all tweets over a period of approximately four months based on keywords chosen beforehand. Of course, this decision has left out a pool of tweets; however, the number of tweets sampled should ensure a representative sample. Operationalizing sentiments through Twitter is a trade-off between quantity and quality as well as accessibility and cost. Scraping Twitter and performing a sentiment analysis provides quantity and accessibility. Another methodology – a qualitative approach – would entail interviewing a representative sample of people every day during a four-month span. While that would provide a panel-like data set that could pave the way for more accurate measurement of sentiments, such a methodology is both costly and very time-consuming. As a result, the quantitative approach is the best bet for obtaining a representative scale.
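The aggregation step implied above – collapsing many per-tweet scores into a single daily sentiment series – can be sketched as follows (column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical per-tweet data: a timestamp and a sentiment score per tweet.
tweets = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2019-02-01 09:15", "2019-02-01 14:30", "2019-02-04 10:05",
    ]),
    "score": [2, -1, 3],
})

# Aggregate to one sentiment value per calendar day (mean of tweet scores).
daily = (
    tweets.set_index("created_at")["score"]
          .resample("D")
          .mean()
          .dropna()  # drop days without tweets
)
print(daily.tolist())  # [0.5, 3.0]
```

Averaging over a large number of tweets per day is also what dampens the noise inherent in any individual tweet's score.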

The above represents the imbalance between precision and richness put forth in Chapter 3: Methodology.

Naturally, the sentiment scoring methodology can be questioned. Does it really measure sentiments? While the lexicon methods are well documented and based on thorough research of individual words, they still entail an amount of uncertainty. Despite attempts to handle both sarcasm and negation through various techniques, uncertainty of course persists. However, the number of daily tweets – and thus the aggregate sentiment score – should iron out some of the uncertainty. Finally, word interpretation and sentiment are subjective matters, so – with current techniques – sentiment analyses cannot be completely objective. This uncertainty enters the modeling, too. In the end, the methodology applied is state of the art, and thus the data and methodology suit the research question to the best of my ability. I discuss further research considerations in a later subsection.
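As a toy illustration of the lexicon-and-negation issue discussed above, the scorer below uses a tiny hypothetical AFINN-style word list (not the actual AFINN lexicon, which assigns integer valences from -5 to +5 to thousands of words) and flips the polarity of a word immediately preceded by a negator:

```python
# Tiny, hypothetical AFINN-style lexicon (illustration only).
LEXICON = {"good": 3, "great": 4, "bad": -3, "terrible": -4}
NEGATORS = {"not", "no", "never"}

def score_tweet(text: str) -> int:
    """Sum word valences, flipping the sign after a negator."""
    tokens = text.lower().split()
    total = 0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token, 0)
        if valence and i > 0 and tokens[i - 1] in NEGATORS:
            valence = -valence  # crude one-word negation window
        total += valence
    return total

print(score_tweet("not good"))      # -3
print(score_tweet("great launch"))  # 4
```

The one-word negation window also illustrates the residual uncertainty: "not a good quarter" would still score positive here, since "good" is not directly preceded by a negator.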

Reliability. Are the results reproducible? Given the application of the same sentiment lexica (AFINN and NRC), the process described in Chapter 4: Deriving Sentiments from Twitter generates identical output – both per tweet and over time. The degree of reliability is strengthened by the number of tweets included in the analysis. It is further reinforced by the coding approach, as it ensures identical processing of each and every tweet. This means that varying human biases do not enter the equation from tweet to tweet. The absence of a stricter triangulation of methodologies pulls in the other direction, however. If resources had permitted, a supportive qualitative approach would have been an enhancing factor.

Finally, we consider the number of observations in terms of time frequency. Despite collecting Twitter data over a four-month period, the models are built on merely 80 observations due to stock market trading days. The question of the right number of observations is long and diverse, and 80 may be on the lower side – especially for the VAR model based on the NRC lexicon, which naturally has a high number of coefficients. However, for the other models, according to Hyndman (2018), n > 30 should be sufficient. The limited number of observations can be a contributing factor to the poor performance of the NRC lexicon. Although 80 is not ideal, it is the number of observations that was possible to obtain given the Twitter API constraints and the time frame in which the thesis was written.
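The shrinkage from roughly four calendar months to about 80 usable observations follows from restricting to trading days. A quick sanity check (with hypothetical dates; the actual sample window may differ, and exchange holidays are not subtracted here) might look like:

```python
import pandas as pd

# Hypothetical four-month window (illustration only).
calendar_days = pd.date_range("2019-01-01", "2019-04-30")
business_days = pd.bdate_range("2019-01-01", "2019-04-30")

# Weekends alone remove roughly 2/7 of the days; exchange holidays
# remove a few more, landing near the ~80 observations reported.
print(len(calendar_days), len(business_days))  # 120 86
```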

6.3 Further Research Considerations

There is no denying that sentiment scoring techniques based on natural language processing are imperfect at this moment in time. Despite this, the simple OLS-based approach of vector autoregression has yielded promising results indicating that sentiments alter financial decision-making. Due to the importance of such mechanisms, I strongly suggest further research in the area, including testing on a larger scale, scraping years of Twitter data, and modeling more assets. Not only is a larger scale important; natural development within the field of natural language processing should also enhance the accuracy of the sentiment scoring techniques. More sophisticated techniques can help provide results that are beyond suggestive. I leave that to future research.

Another point of interest going forward should be a shorter time frequency of the data sets – not shortening the number of observations, but looking at intraday data. The analysis conducted in the current paper indicates that stock return responses caused by shocks to sentiments last for one or two days, after which they die out completely. As a natural extension, future research should zoom in on this interval, looking at hourly data within the opening hours of trading days. Do small impulses affect the stock price immediately and during the trading day? Analysis of hourly data can help answer this question. As with the more sophisticated natural language processing techniques, this is left for future research.

I touched upon the greater picture earlier. Looking at the bigger picture leads to more questions, thus emphasizing the need for further research. Such research could include simulating a structural shock to sentiments based on a theoretical cyber-attack. It could ask whether additional monitoring of social media with regard to financial markets is needed, and how individual entities can combat such shocks to Twitter sentiments. Flipping the above on its head, research in the area could include deriving specific trading strategies rooted in content analyses such as those shown in this thesis.
