
Screenshot 1 Screenshot 2:

5. Concerned with Politics

3672171 | 2009-06-15 | http://twitter.com/lacouvee Absolutely! If I want numbers, I will play the lottery....be real. Be you! Who are you??

4716356 | 2009-06-16 03:42:41 | Well said! RT @DustypupVI: You know you follow outstanding people when their tweets get your heart racing and your mind following suit!

5771371 | 2009-06-17 05:18:01 | Personally, no! Have also met people at #victoriatweetup RT @GuyKawasaki: Is social networking making u antisocial?: http://trkk.us/?awdAC

9073095 | 2009-06-21 06:27:13 | @SharonHayes oh no, wasn't implying that was what you should do ;but for me, is not worth time &efforrt usually. Hope you get some rest Sun

5. Concerned with Politics and News/Spreading Alarm or Scary/Shocking News

4410208 | 2009-06-15 21:39:43 | @IranElection09 footage of young girl SHOT http://bit.ly/UAQxL #iranelection

4600795 | 2009-06-16 01:43:33 | http://twitter.com/lacouvee RT @windowsot: I read the newspaper to see what's wrong with the world. I read your tweets to see what's right with the

4641210 | 2009-06-16 02:25:24 | http://twitter.com/lacouvee RT Change time zone to GMT+3:30 & location to Tehran. Confuse Iranian gov, protect real Tehran twitter users

4749633 | (no date given) | RT @claytonstark: ok, they're not starving, and there's only 3 of them, but FEED THESE CHILDREN by downloading #flock http://trim.li/nk/3Bv


Discussion

After presenting the manual analysis above and applying the previous steps of the Cross-Industry Standard Process for Data Mining (CRISP-DM), we reached the final stage, Deployment. According to Provost and Fawcett (2013), in this stage the results of the data mining process are “put into real use in order to realize some return on investment” (Provost & Fawcett, p. 32, 2013). Applied to our case, this means implementing the predictive model described above in a business process (Provost & Fawcett, p. 32, 2013). Specifically, it means applying the model to public health care services, which is explored further after the limitations.


As stated earlier in the paper, the biggest limitations of our research are our limited experience with scraping tweets, the restrictions of the Twitter API, and the high prices of less restricted data access. These limitations also affected the results we expected, namely the words and phrases used by those users. Applying the manual approach to a larger number of users would have produced deeper insights. The manual check introduced further limitations. Many of the accounts were bots or companies, so additional users had to be considered and reviewed manually.


According to the literature, Twitter data has proven useful in several public health applications, among them disease monitoring, prediction, emergency situations, public reaction, lifestyle and general applications (Jordan et al., p. 2, 2018). As presented in the “What is Twitter?” part, Twitter is a social media platform for sharing short-text updates (tweets). As shown in this paper and as stated by the authors, these tweets contain public information; in our case, information about social media addiction based on the language users use (Jordan et al., p. 2, 2018). The authors argue that because platforms such as Twitter provide real-time services (tweeting), they are promising for public health applications (Jordan et al., p. 2, 2018).


The purpose of our research is to analyze social media data with machine learning algorithms and to apply the findings to public health care services. More precisely, further applications identified by the researchers involve using machine learning models to:

1) monitor addiction-related health issues;
2) predict addiction based on the manual analysis;
3) track addictive behaviors (Jordan et al., p. 2, 2018).

These applications are explored below.

Twitter data to track future addiction behaviors

One application of big data such as Twitter data is that patterns can be analyzed and machine learning algorithms implemented to track future addictive behaviors. On the basis of the manual analysis in this paper, this would allow addiction-related language patterns to be estimated in real time. Big data can also be used to identify social media addiction behaviors.

Detecting and identifying users at risk of becoming social media addicts

By using big data analytics, social media platforms such as Twitter can look for patterns typical of users with social media addiction and identify the problem at an early stage. A further suggestion is that if the model detects more negative language, this person might be placed in the group of “heavy/addicted” users, subject to a professional psychological assessment. In the future, this method could therefore be used to detect language-based changes in behavior before the individual reaches harmful levels.

Future Work

For future research, we recommend applying the manual process created in this paper to more than three users. A classification model could then be built on the different categories, and further machine learning models could be applied. We suggest using this paper as a starting point for more advanced machine learning models. Building on the categories, a classification algorithm could be used to:

1) identify addicted and non-addicted users according to the language patterns found in the manual analysis;
2) apply the approach to updated Twitter data for more users and analyze their language;
3) compare the results and find addiction patterns for individual users;
4) use the models in future research on users' potential of becoming addicts, focusing on specific users.

Using visualizations to improve future research

Additionally, a percentage measure could be used instead of raw frequencies to distinguish addicts from non-addicts. For instance, an average percentage could be calculated from how many words and expressions from these categories appear in a user's tweets. If the user uses too many of these words, he/she falls into the addict category.

However, an interview with these users is needed to ensure proper assessment.
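The percentage idea above can be sketched as follows. This is a minimal illustration, not the paper's method. The vocabulary `CATEGORY_TERMS`, the helper names `average_category_share` and `label_user`, and the 20 % threshold are hypothetical placeholders. The real categories would come from the manual analysis.

```python
import re

# Hypothetical vocabulary standing in for the manually derived categories.
CATEGORY_TERMS = {"rt", "followfriday", "iranelection", "hugs", "wink"}


def average_category_share(tweets, terms=CATEGORY_TERMS):
    """Average, over a user's tweets, of the percentage of tokens found in `terms`."""
    shares = []
    for tweet in tweets:
        # \w+ keeps letters, digits and underscores, so "#FollowFriday" -> "followfriday".
        tokens = re.findall(r"\w+", tweet.lower())
        if tokens:
            hits = sum(token in terms for token in tokens)
            shares.append(100.0 * hits / len(tokens))
    return sum(shares) / len(shares) if shares else 0.0


def label_user(tweets, threshold=20.0):
    """Flag a user as a possible addict when the average share exceeds an assumed threshold."""
    return "addict" if average_category_share(tweets) > threshold else "non-addict"


user_tweets = ["RT #FollowFriday hugs", "Lunch was great today"]
print(average_category_share(user_tweets))  # 50.0
print(label_user(user_tweets))              # addict
```

The score is only a screening signal. The interview step mentioned above would still decide the final assessment.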

Visualizations could also be quite informative. For instance, the business intelligence and analytics software Tableau (Tableau Software, n.d.) might provide more insight into the data, especially when the categories are based on language patterns as described in our paper.

Using the Twitter API

For this purpose, access to historical data would have provided deeper insights for future predictions, because it would show which language patterns are likely to occur again in the future. Models could then be compared by applying machine learning to data from different time periods, depending on the months and years for which predictions are to be made.
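To compare models across time periods, the tweets first have to be split by month. The following is a minimal pandas sketch, assuming a table with hypothetical `user`, `timestamp` and `text` columns. The helper `split_by_month` is also hypothetical. A model would then be trained on one set of months and evaluated on another.

```python
import pandas as pd


def split_by_month(df, train_months, test_months, time_col="timestamp"):
    """Split rows into train/test sets by calendar month ("YYYY-MM" strings)."""
    months = df[time_col].dt.to_period("M").astype(str)
    return df[months.isin(train_months)], df[months.isin(test_months)]


# Hypothetical example data; only the timestamps matter for the split.
tweets = pd.DataFrame({
    "user": ["a", "b", "a", "b", "a"],
    "timestamp": pd.to_datetime([
        "2009-06-15 21:39:43", "2009-06-16 01:43:33",
        "2009-07-02 10:00:00", "2009-07-15 12:30:00",
        "2009-08-01 08:00:00",
    ]),
    "text": ["t1", "t2", "t3", "t4", "t5"],
})

train, test = split_by_month(tweets, ["2009-06"], ["2009-07"])
print(len(train), len(test))  # 2 2
```

Splitting by time rather than at random keeps later tweets out of the training data. This matches the goal of predicting future language patterns.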

Lastly, the manual analysis could serve as a starting point for more complex machine learning models, in which these categories act as the proxy that determines how the data is analyzed further. Prediction models, for instance logistic regression applied to more users, would be one consideration for future work.


The purpose of the research is to use machine learning algorithms and data mining to identify language patterns in textual data. We investigated whether we could classify two groups of users, “Heavy/Addicted” and “Normal” users. To do this, we used logistic regression and a bag-of-words representation built with CountVectorizer in Python. To ensure a proper and systematic process, we implemented the CRISP-DM model and followed its recommended steps.
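The combination described above can be sketched as below: a bag-of-words representation from scikit-learn's `CountVectorizer` fed into `LogisticRegression`. The toy documents and labels are invented for illustration. Treating each user's concatenated tweets as one document is an assumption, not necessarily how the paper built its input.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy corpus: one document per user (assumed: tweets concatenated).
docs = [
    "rt followfriday rt iranelection rt hugs",
    "rt rt iranelection breakingnews rt",
    "followfriday hugs wink rt rt",
    "went for a walk with the dog today",
    "reading a good book this evening",
    "lunch with colleagues was nice",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = "Heavy/Addicted", 0 = "Normal"

# CountVectorizer turns each document into word counts (bag-of-words);
# LogisticRegression then learns one weight per word.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(docs, labels)

print(model.predict(["rt rt rt followfriday hugs", "reading a book with the dog"]))  # [1 0]
```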

The first step involved business understanding of the case. A central topic in our research is addiction, which was explained in the first part of the paper. We saw that addiction leads to negative consequences on several levels: psychological, physical, societal and economic. The key takeaway is therefore that addiction is characterized by an inability to control a behavior. Applied to social media addiction, this is the inability to control the tweets users post.

When discussing big data, it was also necessary to explore the value big data brings: improving health care services by applying machine learning to social media data. This can lead to the detection of health risk behavior and make recovery easier and faster. It can also help people with addiction problems such as social media addiction at an early phase, and enable more informed approaches to preventing negative consequences initiated through social media platforms.

Data preparation played a big role in our paper because 1) it was time-consuming to go through all 1000 users and check them manually, and 2) the next steps depended on our findings of “Heavy/Addicted” and “Normal” users. We also chose to limit our research to one time period: tweets from June. Bots and companies were removed. Users who post only once did not provide enough to analyze, so we took users who posted 22 times as the “Normal” users, a decision based on the output of our Python code. The “Heavy/Addicted” users were selected by taking the most frequent posters. Selecting users by tweet frequency in this way formed our corpus of “Heavy/Addicted” and “Normal” users for further processing.
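The selection rule above can be sketched with pandas. This is a hypothetical helper, not the paper's code. The `user` column name, the `excluded` set standing in for the manually identified bots and companies, and the `n_heavy` cut-off are assumptions. Only the value 22 comes from the text.

```python
import pandas as pd


def build_corpus(tweets, n_heavy, normal_count=22, excluded=()):
    """Label the n_heavy most active users 1 and users with exactly
    normal_count tweets 0; all other users are dropped."""
    # Remove accounts flagged in the manual check (bots, companies).
    tweets = tweets[~tweets["user"].isin(excluded)]
    counts = tweets["user"].value_counts()          # tweets per user, descending
    heavy = set(counts.head(n_heavy).index)         # most frequent posters
    normal = set(counts[counts == normal_count].index) - heavy
    corpus = tweets[tweets["user"].isin(heavy | normal)].copy()
    corpus["label"] = corpus["user"].isin(heavy).astype(int)  # 1 = Heavy/Addicted
    return corpus


# Hypothetical example: two heavy posters, two 22-tweet users, a one-off user and a bot.
users = (["heavy1"] * 30 + ["heavy2"] * 25 + ["norm1"] * 22
         + ["norm2"] * 22 + ["rare"] + ["bot"] * 40)
tweets = pd.DataFrame({"user": users, "text": ["tweet text"] * len(users)})
corpus = build_corpus(tweets, n_heavy=2, excluded={"bot"})
print(len(corpus), corpus["label"].sum())  # 99 55
```

Bots are removed before counting, so a very active bot cannot take one of the heavy-user slots.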

In the Modelling stage, we applied logistic regression and labeled our data, with 0 meaning “Non-Addicted” and 1 meaning “Addicted”. A bag-of-words representation was chosen to investigate language patterns. For evaluation, we used two techniques: a train/test split and cross-validation. Further techniques were then used to improve the model.
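Both evaluation techniques are available in scikit-learn. The following is a minimal sketch on invented data, assuming the same `CountVectorizer` + `LogisticRegression` setup. The split ratio, random seed and number of folds are illustrative choices, not the paper's settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Invented toy data: 4 "Addicted" (1) and 4 "Non-Addicted" (0) documents.
docs = [
    "rt followfriday rt iranelection", "rt rt hugs wink",
    "rt breakingnews iranelection rt", "followfriday rt hugs rt",
    "walked the dog this morning", "reading a good book",
    "lunch with colleagues", "cooking dinner tonight",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Keeping the vectorizer inside the pipeline means the vocabulary is learned
# from the training portion only, so no test words leak into the model.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))

# 1) Hold-out evaluation with a stratified 75/25 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.25, random_state=42, stratify=labels)
model.fit(X_train, y_train)
holdout_accuracy = model.score(X_test, y_test)

# 2) 4-fold cross-validation (stratified by default for classifiers).
cv_scores = cross_val_score(model, docs, labels, cv=4)

print(f"hold-out accuracy: {holdout_accuracy:.2f}")
print(f"cross-validation accuracy: {cv_scores.mean():.2f} (+/- {cv_scores.std():.2f})")
```

On a corpus this small the scores vary a lot between splits. Cross-validation averages over several splits and is therefore the steadier of the two estimates.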

Next, the term frequency-inverse document frequency (tf-idf) method was applied and stop words were removed. This produced a heat map visualization and the model coefficients of the logistic regression model.

The findings showed that, in order to classify these groups of users, we needed context to understand what a word such as “12for12k” means. Other words include:

Followfriday, topprog, Iran allday, Znatrainer, Lostnmissing, Laura330, The_tech update, Tinysong, Breakingnews, Neda, Cc, Bitrebels, Autopsy, Wiretapper, Krystynchong, Jhills, Sugar, Babe, Iran, Tcot, Wink, Markismusing, Lotay, Michaelgrainger, Jason_pollock, P2, Blip, Digg, Listening, Buzzedition, Hugs, Iranelection, Repent, Hivemindmovie, Collective_soul, Grl, Jhillstephens, Forgiven, Rt.

This means that we could identify language patterns in the textual data. However, because our results did not prove reliable in our case, the two groups can only be classified when the words are put into context. For this reason we used bigrams and trigrams, but they did not yield many informative words and phrases. Using the exact tweets would have improved the results.

Besides bigrams and trigrams, another way to add context is to use a basic theory of emotion. However, for the reasons stated above, using such a theory would have made our results too biased. The results showed that even with this approach we would still not have been able to understand the context; therefore, the actual tweets are needed.

To sum up, identifying “Heavy/Addicted” and “Normal” users is possible, but not without the actual tweets, because of the lack of context.

Lastly, a key takeaway from comparing the manual analysis with the machine learning approach is that machine learning still does not analyze language as well as humans do.

References

@TwitterIR. (2019). Investor Fact Sheet Monetizable Daily Active Usage (mDAU) Year-Over-Year Growth.

#FollowFriday (TV Movie 2016) - IMDb. (n.d.). Retrieved May 8, 2020, from https://www.imdb.com/title/tt5233106/

AddictionCenter. (n.d.). Social Media Addiction - Addiction Center. Retrieved April 15, 2020, from https://www.addictioncenter.com/drugs/social-media-addiction

Aggarwal, M. (2018). Cross-Industry process for data mining. Retrieved April 16, 2020, from https://medium.com/@thecodingcookie/cross-industry-process-for-data-mining-286c407132d0

American Addiction Centers Resource. (n.d.). Computer/Internet Addiction Symptoms, Causes and Effects - PsychGuides.com. Retrieved April 15, 2020, from


American Psychiatric Association. (2017). What Is Addiction? Retrieved March 19, 2020, from https://www.psychiatry.org/patients-families/addiction/what-is-addiction

Annual Report. (2019). Twitter, Inc. Annual Report - Our logo? Made up of 3 sizes of circles, each representing local, topical, and global conversation!

Bahr, B. (2019). Community Examines Social Media Addiction - Atlanta Jewish Times. Retrieved April 15, 2020, from


Balakrishnan, V., & Shamim, A. (2013). Malaysian Facebookers: Motives and addictive behaviours unraveled - ScienceDirect. Retrieved April 15, 2020, from https://www.sciencedirect.com/science/article/pii/S0747563213000137?via%3Dihub

Banerjee, D. (n.d.). Understanding Character Encoding. Retrieved April 16, 2020, from


Bansal, A. (n.d.). Python Dictionary. Retrieved April 16, 2020, from https://www.geeksforgeeks.org/python-dictionary/

Barry, F. (2009). Social Media Strategy: 12for12k Challenge with Danny Brown | npENGAGE. Retrieved May 7, 2020, from https://npengage.com/nonprofit-fundraising/social-media-strategy-12for12k-challenge-with-danny-brown/

Beaumont, C. (2010). Twitter users send 50 million tweets per day - Telegraph. Retrieved April 30, 2020, from


Berthon, P., Pitt, L., & Campbell, C. (2019). Summary of the Negative Effects of Addiction to Digital Experiences. | Download Scientific Diagram. Retrieved April 15, 2020, from https://www.researchgate.net/figure/Summary-of-the-Negative-Effects-of-Addiction-to-Digital-Experiences_tbl1_334741512

Bhanji, J. P., & Delgado, M. R. (2014, January). The social brain and reward: Social information processing in the human striatum. Wiley Interdisciplinary Reviews: Cognitive Science, 5, 61–73. https://doi.org/10.1002/wcs.1266

Bronshtein, A. (2017). A Quick Introduction to the “Pandas” Python Library. Retrieved April 16, 2020, from https://towardsdatascience.com/a-quick-introduction-to-the-pandas-python-library-f1b678f34673

Brown, N. (2017). The Effects of Social Media Addiction | Fix.com. Retrieved May 11, 2020, from https://www.fix.com/blog/does-social-media-cause-depression/

Brownlee, J. (2018). A Gentle Introduction to k-fold Cross-Validation. Retrieved April 28, 2020, from https://machinelearningmastery.com/k-fold-cross-validation/

Brynjolfsson, E., & McAfee, A. (2017). The Business of Artificial Intelligence: What it can – and cannot – do for your organization. Harvard Business Review (The Big Idea), 3–12. Harvard Business School Publishing Corporation.

Brynjolfsson, E., & McAfee, A. (2012). Winning the race with ever-smarter machines. MIT Sloan Management Review, 53(2), 53–60.

Burnell, K., George, M. J., Vollet, J. W., Ehrenreich, S. E., & Underwood, M. K. (2019). Passive social networking site use and well-being: The mediating roles of social comparison and the fear of missing out. Cyberpsychology: Journal of Psychosocial Research on Cyberspace, 13(3). https://doi.org/10.5817/cp2019-3-5

Cash, H., Rae, C. D., Steel, A. H., & Winkler, A. (2012). Internet Addiction: A Brief Summary of Research and Practice. Current Psychiatry Reviews, 8(4), 292–298.


Cielen, D., Meysman, A., & Ali, M. (2016). Introducing Data Science: Big data, machine learning, and more, using Python tools. In D. Maharry, M. Roberts, J. Thoms, & K. Petito (Eds.), Introducing Data Science (1st ed.). Retrieved from



Developer. (n.d.-a). Counting characters — Twitter Developers. Retrieved April 28, 2020, from https://developer.twitter.com/en/docs/basics/counting-characters

Developer. (n.d.-b). Twitter premium APIs – Twitter Developers. Retrieved May 4, 2020, from https://developer.twitter.com/en/premium-apis

Dhanani, S. (2017). For ML, More Data = More Better, But Not Always - Apteo - Medium. Retrieved April 21, 2020, from https://medium.com/apteo/for-ml-more-data-more-better-but-not-always-920e0321b17b

EDGAR. (2020). Twitter Announces Fourth Quarter and Fiscal Year 2019 Results. Retrieved April 2, 2020, from https://www.sec.gov/Archives/edgar/data/1418091/000141809120000019/twtrq419ex992.htm

Editorial Team. (2017). Why Big Data is Important to Your Business - insideBIGDATA. Retrieved April 30, 2020, from https://insidebigdata.com/2017/09/09/big-data-important-business/

Edmonds, R. (2008). Anxiety, loneliness and Fear of Missing Out: The impact of social media on young people’s mental health | Centre for Mental Health. Retrieved April 15, 2020, from


Edugrad. (2019). Introduction to Regression. Retrieved April 14, 2020, from https://www.edugrad.com/tutorials/learn-regression-analysis/3

Emurasoft, Inc. (n.d.). EmEditor (Text Editor) – Text Editor for Windows supporting large files and Unicode! Retrieved April 23, 2020, from https://www.emeditor.com/

Felman, A. (2018). Addiction: Definition, symptoms, withdrawal, and treatment. Retrieved March 19, 2020, from https://www.medicalnewstoday.com/articles/323465

Forgiven (TV Movie 2007) - IMDb. (n.d.). Retrieved May 8, 2020, from https://www.imdb.com/title/tt0835456/

Forgiven | Definition of Forgiven by Merriam-Webster. (n.d.). Retrieved May 8, 2020, from https://www.merriam-webster.com/dictionary/forgiven

Gabe Zichermann. (2017). Is Trump a Twitter addict? | TechCrunch. Retrieved April 27, 2020, from https://techcrunch.com/2017/12/11/is-trump-a-twitter-addict/?guccounter=1

Gadaleta, F. (2019). The dark side of AI: social media and the optimization of addiction. Retrieved April 15, 2020, from https://datascienceathome.com/the-dark-side-of-ai-social-media-and-the-optimization-of-addiction/

Galentino, A., Bonini, N., & Savadori, L. (2017). Positive Arousal Increases Individuals’ Preferences for Risk. Frontiers in Psychology, 8, 2142.


geeksforgeeks. (n.d.-a). Python | Pandas Dataframe/Series.head() method - GeeksforGeeks. Retrieved May 12, 2020, from https://www.geeksforgeeks.org/python-pandas-dataframe-series-head-method/

geeksforgeeks. (n.d.-b). Python | Pandas DataFrame. Retrieved April 23, 2020, from https://www.geeksforgeeks.org/python-pandas-dataframe/

Glossary. (n.d.). Retrieved April 30, 2020, from https://help.twitter.com/en/glossary

Gregory, C. (2019). Internet Addiction Disorder - Signs, Symptoms, and Treatments. Retrieved April 15, 2020, from https://www.psycom.net/iadcriteria.html

Grgurević, I. (2017). How Social Media Affects Our Mental Health - Digital Reflections - Medium. Retrieved April 15, 2020, from https://medium.com/digital-reflections/how-social-media-affects-our-mental-health-5d65f3690ead

Guardian News & Media Limited. (2019). The machine always wins: what drives our addiction to social media | Technology | The Guardian. Retrieved May 11, 2020, from https://www.theguardian.com/technology/2019/aug/23/social-media-addiction-gambling

Gunelius, S. (2020). How Do I Make Short URLs on Twitter? Retrieved April 30, 2020, from https://www.lifewire.com/make-short-urls-on-twitter-3476762

Gupta, S. (2019). A Guide to Unicode, UTF-8 and Strings in Python - Towards Data Science. Retrieved April 16, 2020, from https://towardsdatascience.com/a-guide-to-unicode-utf-8-and-strings-in-python-757a232db95c

Hartney, E. (n.d.). An Overview of Internet Addiction. Retrieved April 15, 2020, from https://www.verywellmind.com/internet-addiction-4157289

Harvard IV-4 dictionary, L. value dictionary. (n.d.). General Inquirer Categories. Retrieved May 1, 2020, from http://www.wjh.harvard.edu/~inquirer/homecat.htm

help.twitter.com; hashtags. (n.d.). How to use hashtags. Retrieved April 30, 2020, from



Help.twitter.com; retweet. (n.d.). Retweet FAQs. Retrieved April 30, 2020, from https://help.twitter.com/en/using-twitter/retweet-faqs

Help.twitter.com. (n.d.). Help Center. Retrieved April 30, 2020, from https://help.twitter.com/en

Horvath, A. T., Misra, K., Epner, A. K., & Cooper, G. M. (n.d.). Definition of Addiction - Addictions. Retrieved March 19, 2020, from


Ictea. (n.d.). About Twitter - Knowledgebase - ICTEA. Retrieved April 30, 2020, from https://www.ictea.com/cs/index.php?rp=%2Fknowledgebase%2F3377%2FSobre-Twitter.html&language=english

Jakobsen, M., & Holmgren, N. (2019). Detecting users at risk of becoming social media addicts : A big data approach to identification and value realization. Copenhagen Business School.

Jordan, S., Hovet, S., Fung, I., Liang, H., Fu, K.-W., & Tse, Z. (2018). Using Twitter for Public Health Surveillance from Monitoring and Prediction to Public Response. Data, 4(1), 6.


Kaufmann, M. (2019). Big Data Management Canvas : A Reference Model for Value Creation from Data. https://doi.org/10.3390/bdcc3010019

Kazeniac, A. (2009). Social Networks: Facebook Takes Over Top Spot, Twitter Climbs. Retrieved April 30, 2020, from



Kilgore, E. (2019). Trump’s Twitter Addiction Could Become a 2020 Campaign Issue. Retrieved April 27, 2020, from https://nymag.com/intelligencer/2019/08/trumps-twitter-addiction-could-become-a-2020-campaign-issue.html

L. Johnson, N. (n.d.). What are Network Effects? Indirect and Direct Network Effects. Retrieved April 15, 2020, from https://www.applicoinc.com/blog/network-effects/

Logallo, N. (2019). Data Science Methodology 101 - Towards Data Science. Retrieved April 27, 2020, from https://towardsdatascience.com/data-science-methodology-101-ce9f0d660336


Marr, B. (2015). Why only one of the 5 Vs of big data really matters | IBM Big Data & Analytics Hub. Retrieved May 7, 2020, from https://www.ibmbigdatahub.com/blog/why-only-one-5-vs-big-data-really-matters

Matthews, K. (2019). How big data is fighting against gambling addiction. Retrieved May 1, 2020, from https://betanews.com/2019/03/08/big-data-fighting-gambling-addiction/

McCourt, A. (2018). Social Media Mining: The Effects of Big Data In the Age of Social Media - Yale Law School. Retrieved April 30, 2020, from


Media, A. (2017). Big Data and Analytics: Two Key Components for Business Growth. Retrieved April 30, 2020, from https://medium.com/@avenewmedia/big-data-and-analytics-two-key-components-for-business-growth-deee0726e338

Mollett, A., Moran, D., & Dunleavy, P. (2011). Using Twitter in university research, teaching and impact activities A guide for academics and researchers. Retrieved from


Müller, A. C., & Guido, S. (2017). Introduction to Machine Learning with Python. O’Reilly Media, Inc.

Myers, B. C. (2011). 5 years ago today Twitter launched to the public. Retrieved April 30, 2020, from https://thenextweb.com/twitter/2011/07/15/5-years-ago-today-twitter-launched-to-the-public/

NationalMemo. (2020). #EndorseThis: The Liar Tweets Tonight! Vote Him Away - National Memo. Retrieved April 27, 2020, from https://www.nationalmemo.com/endorsethis-2645813112

Nicora, R. (2019). How is big data impacting social media? - Dative_io - Medium. Retrieved May 8, 2020, from https://medium.com/dative-io/how-is-big-data-impacting-social-media-df31aa3f66f6

nikhilaggarwal3. (n.d.). Read a file line by line in Python. Retrieved April 16, 2020, from https://www.geeksforgeeks.org/read-a-file-line-by-line-in-python/

Pankaj. (n.d.). Pandas concat() Examples. Retrieved April 21, 2020, from JournalDev website:


Pant, A. (2019). Workflow of a Machine Learning project - Towards Data Science. Retrieved April 28, 2020, from https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94