Finn Årup Nielsen, DTU Compute
Technical University of Denmark, March 28, 2017
afinn
Started out in 2009 as an English sentiment word list for use in analysis of Twitter messages.
Later, the approach was evaluated with manually labeled tweets in a published paper.
Python code snippets showing how to use it were posted on the Internet, including on my blog.
In July 2015, it was turned into a GitHub repository.
Version 0.1 was released in November 2016.
Philosophies for afinn
Simple approach with few dependencies: the package should do what it should do and nothing more.
Open source.
Thoroughly test all elements of the package.
Documentation in the code for everything.
Tutorials.
Easy installation for other developers.
Should work for a broad range of Python versions.
GitHub-based development
Git-based development with GitHub.
The repository contains the Python module itself with data, test functions, setup and package files (setup.py, README.rst), and notebooks with example code.
Other developers can work from it: 36 forks by different people.
The AFINN word list
Words associated with a sentiment score between −5 (most negative) and +5 (most positive):
abandon	-2
abandoned	-2
abandons	-2
abducted	-2
abduction	-2
abductions	-2
abhor	-3
abhorred	-3
abhorrent	-3
abhors	-3
abilities	2
ability	2
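A minimal sketch of reading such a word list into a Python dict, assuming one entry per line with a tab between the word (or phrase) and its integer score. The function name load_word_list and the SAMPLE lines are hypothetical, for illustration only:

```python
import io

# Hypothetical sample in the assumed "word<TAB>score" layout.
SAMPLE = "abandon\t-2\nabandoned\t-2\nabhor\t-3\nabilities\t2\nability\t2\n"


def load_word_list(stream):
    """Read a tab-separated word list into a dict mapping word -> int score."""
    lexicon = {}
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last tab, so phrases containing spaces stay intact.
        word, score = line.rsplit("\t", 1)
        lexicon[word] = int(score)
    return lexicon


lexicon = load_word_list(io.StringIO(SAMPLE))
print(lexicon["abhor"])  # -3
```

A dict gives constant-time lookup per word, which suits scoring many texts against the same list.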
Basic Afinn object
The word list is encapsulated in a Python class (object orientation).
The word list is loaded at object instantiation time to avoid reading overhead during sentiment scoring.
A text is scored for sentiment, based on the sentiment of its individual words, with a method of the class:
class Afinn():

    def __init__(self):
        # Load the word list once, at instantiation (load_data is not shown)
        self.data = self.load_data()

    def score(self, text):
        score = 0
        for word in text.split():
            # Words not in the word list contribute 0
            score += self.data.get(word, 0)
        return score
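The class above leaves load_data out. The following self-contained sketch keeps the same shape but, as an assumption for illustration, takes the word list as a dict argument and splits the text into words with a simple regular expression. WordListScorer is a hypothetical name, not part of afinn:

```python
import re


class WordListScorer(object):
    """Hypothetical minimal scorer with the same shape as the Afinn class."""

    def __init__(self, lexicon):
        # Done once, at instantiation: normalize the keys to lower case.
        self.data = dict((word.lower(), score) for word, score in lexicon.items())

    def score(self, text):
        """Sum the word scores of text; words not in the list count as 0."""
        words = re.findall(r"\w+", text.lower())
        return sum(self.data.get(word, 0) for word in words)


scorer = WordListScorer({'bad': -3, 'funny': 4})
print(scorer.score('It is so horrendously bad'))  # -3
print(scorer.score('very funny'))  # 4
```

Because the preprocessing happens in __init__, one instance can be reused for many texts without paying that cost again.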
Basic use
Using the class: instantiate the object, then call the score method:
>>> from afinn import Afinn
>>> afinn = Afinn()  # afinn is an object name now, not the module
>>> afinn.score('It is so horrendously bad')
-3.0
>>> afinn.score('very funny')
4.0
Or score multiple texts in a list:
afinn_scores = [afinn.score(text) for text in texts]
Basic processing
The central part of the text processing uses regular expressions (Python module: re) to extract words or to match directly against the AFINN dictionary.
import re  # Import regular expression standard library module

# Setup
lexicon = {'ikke god': -2, 'imponerende': 3, 'ineffektiv': -2}
regex = re.compile('(ikke god|imponerende|ineffektiv)')

# Match and scoring
matched = regex.findall("Den er ineffektiv og ikke god")
score = sum([lexicon[word] for word in matched])
score is now −4. Multi-word phrases, such as 'ikke god' (Danish for 'not good'), can be matched too.
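Writing the alternation by hand does not scale to a full word list. A sketch of generating the pattern from the lexicon keys instead (not necessarily how afinn builds its own pattern): longest entries go first, so a phrase wins over a shorter entry that starts at the same position, and re.escape protects any regex metacharacters.

```python
import re

lexicon = {'ikke god': -2, 'imponerende': 3, 'ineffektiv': -2}

# Longest entries first, so a phrase is preferred over a shorter overlapping entry.
alternatives = sorted(lexicon, key=len, reverse=True)
# \b word boundaries stop entries from matching inside longer words.
regex = re.compile(r'\b(' + '|'.join(re.escape(w) for w in alternatives) + r')\b')

matched = regex.findall("Den er ineffektiv og ikke god")
score = sum(lexicon[word] for word in matched)
print(matched, score)
```

Note that \b only behaves as intended for entries that begin and end with a word character; emoticons such as ':D' would need different anchors.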
Code checking
The flake8 tool can check that the code conforms to the style convention (PEP 8).
$ flake8 afinn
(Nothing is reported if there are no convention issues.) Further checking can be done with pylint.
Documentation
Documentation in the “docstring” of an object method:
def scores_with_pattern(self, text):
    """Score text based on pattern matching.

    Performs the actual sentiment analysis on a text. It uses a regular
    expression match against the word list.

    The output is a list of float variables for each matched word or
    phrase in the word list.

    Parameters
    ----------
    text : str
        Text to be analyzed for sentiment.

    Returns
    -------
    scores : list of floats
        Sentiment analysis scores for text.
Documentation
and the documentation goes on with example code:
    Examples
    --------
    >>> afinn = Afinn()
    >>> afinn.scores_with_pattern('Good and bad')
    [3, -3]
    >>> afinn.scores_with_pattern('some kind of idiot')
    [0, -3]

    """
    # TODO: ":D" is not matched
    words = self.find_all(text)
    scores = [self._dict[word] for word in words]
    return scores
Documentation checking
There is a standard for documentation: PEP 257.
Tools exist to check whether the documentation is complete and whether it follows the standard: pydocstyle (previously called pep257).
I can call it with:
pydocstyle afinn
(It reports nothing if everything is OK.) It is also available as a flake8 plugin.
Afinn uses the NumPy docstring convention. However, conformance to that convention cannot be tested: as far as I know, there are currently no tools for it.
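pydocstyle checks far more than completeness (it also enforces the PEP 257 formatting rules). As a toy illustration of just the completeness part, the hypothetical helper below parses module source with the standard library ast module and lists the classes and functions without a docstring:

```python
import ast

# Hypothetical module source used as input for the check.
SOURCE = '''
class Afinn(object):
    """Sentiment analyzer based on a word list."""

    def score(self, text):
        return 0


def helper():
    """Say what the helper does."""


def undocumented():
    pass
'''


def missing_docstrings(source):
    """Return names of classes and functions in source that lack a docstring."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing


print(sorted(missing_docstrings(SOURCE)))  # ['score', 'undocumented']
```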
Testing
Unit tests are in afinn/tests/test_afinn.py. Test functions have the prefix test_.
The prefix tells py.test (http://doc.pytest.org) to collect and run them.
Example for testing the find_all method of the object:
from afinn import Afinn


def test_find_all():
    afinn = Afinn()
    words = afinn.find_all("It is so bad")
    assert words == ['bad']
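To try the test_ convention without installing afinn, here is a self-contained sketch. The find_all function and the two-word LEXICON below are hypothetical stand-ins, not the package's implementation:

```python
import re

# Hypothetical mini word list; the real afinn word list is much larger.
LEXICON = {'bad': -3, 'good': 3}
PATTERN = re.compile(
    r'\b(' + '|'.join(re.escape(word) for word in
                      sorted(LEXICON, key=len, reverse=True)) + r')\b')


def find_all(text):
    """Return word-list entries found in text, in order of appearance."""
    return PATTERN.findall(text.lower())


def test_find_all():
    # py.test runs this function because its name starts with test_.
    assert find_all("It is so bad") == ['bad']


def test_find_all_ignores_case():
    assert find_all("Good and bad") == ['good', 'bad']


def test_find_all_no_match():
    # \b keeps 'bad' from matching inside 'badminton'.
    assert find_all("I play badminton") == []
```

Saved as, for example, test_sketch.py, running py.test in that directory would collect all three test functions.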
Testing
Starting py.test in the afinn directory will automatically identify all test functions that should be executed, based on the test_ prefix:
$ py.test
============================= test session starts ==============================
platform linux -- Python 3.5.2, pytest-3.0.6, py-1.4.32, pluggy-0.4.0
rootdir: /home/faan/projects/afinn, inifile:
collected 14 items

tests/test_afinn.py ....

========================== 14 passed in 0.49 seconds ===========================
Succinct!
Testing: doctesting
From method documentation:
Examples
--------
>>> afinn = Afinn()
>>> afinn.scores_with_pattern('Good and bad')
[3, -3]
This piece of code can be tested with doctest:
python -m doctest afinn/afinn.py
or ...
Testing: doctesting
Testing the entire module:
$ py.test --doctest-modules afinn
============================= test ...
platform linux -- Python 3.5.2, pytest-3.0.6, py-1.4.32, ...
rootdir: /home/faan/projects/afinn, inifile:
collected 7 items

afinn/afinn.py ....

============================= 7 passed
Here, 7 example code snippets were found in the docstrings, extracted, tested and found to be OK.
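Both python -m doctest and py.test --doctest-modules collect the examples automatically. For experimenting, the standard library doctest classes can also be driven directly on a single function. The sketch below uses a hypothetical scores_with_pattern function with a tiny word list, not afinn's method:

```python
import doctest


def scores_with_pattern(text):
    """Return word scores from a tiny, hypothetical word list.

    Parameters
    ----------
    text : str
        Text to be analyzed for sentiment.

    Returns
    -------
    scores : list of int
        Score for each matched word, in order of appearance.

    Examples
    --------
    >>> scores_with_pattern('Good and bad')
    [3, -3]
    >>> scores_with_pattern('nothing to see')
    []
    """
    lexicon = {'good': 3, 'bad': -3}
    return [lexicon[word] for word in text.lower().split() if word in lexicon]


# Collect the examples from the docstring and run them, as doctest does for a module.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(scores_with_pattern, 'scores_with_pattern',
                        globs={'scores_with_pattern': scores_with_pattern}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```

Each >>> line counts as one attempted example; a mismatch between the printed and the documented output is reported as a failure.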
Testing with tox
I would like to have afinn working with different versions of Python:
Versions 2.6, 2.7, 3.3, 3.4 and 3.5.
tox combines testing with virtual environments, enabling tests against different versions of Python.
tox creates virtual environments in afinn/.tox/<virtualenv>, moves into them and executes whatever is specified in a tox.ini file (for afinn it is set up to execute py.test, doctesting and flake8).
tox neatly enables testing multiple versions with just a single command.
Testing with tox
$ tox
GLOB sdist-make: /home/faan/projects/afinn/setup.py
py26 inst-nodeps: /home/faan/projects/afinn/.tox/dist/afinn-0.1.zip
...
Installing collected packages: afinn
  Running setup.py install for afinn ... done
Successfully installed afinn-0.1
py26 runtests: commands[1] | py.test test_afinn.py
============================= test session starts ==============================
platform linux2 -- Python 2.6.9, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /home/faan/projects/afinn, inifile:
collected 14 items

test_afinn.py ....
...
  py26: commands succeeded
  py27: commands succeeded
  py33: commands succeeded
  py34: commands succeeded
  py35: commands succeeded
  flake8: commands succeeded
  congratulations :)
Testing with Travis
Travis: cloud-based testing at https://travis-ci.org/fnielsen/afinn
Ensures that the package also works on another system: Missing data? Missing dependencies?
Specified with a .travis.yml configuration file to run tox.
Jupyter notebooks
A couple of Jupyter notebooks are available in the GitHub repository.
Used to demonstrate how the module can be applied with a dataset.
GitHub renders the notebook for human readability.
Otherwise, it would be shown as raw JSON.
This notebook computes accuracy on a manually sentiment-scored Twitter dataset.
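The exact accuracy measure used in the notebook is not spelled out here. One plausible definition, assumed for this sketch, is sign agreement: a text counts as correct when the predicted score and the manual score are both positive, both negative or both zero. The function names and the numbers in the usage line are hypothetical, not results from the notebook:

```python
def sign(x):
    """Map a number to -1, 0 or +1."""
    return (x > 0) - (x < 0)


def sign_accuracy(predicted_scores, manual_scores):
    """Fraction of texts where predicted and manual scores have the same sign."""
    if len(predicted_scores) != len(manual_scores):
        raise ValueError("score lists must have equal length")
    if not predicted_scores:
        raise ValueError("no scores given")
    hits = sum(sign(p) == sign(m) for p, m in zip(predicted_scores, manual_scores))
    return hits / float(len(predicted_scores))


# Toy numbers for illustration only.
print(sign_accuracy([-3.0, 4.0, 0.0, 2.0], [-2, 3, 1, -1]))  # 0.5
```

The predicted scores would typically come from a list comprehension over the texts, as on the "Basic use" slide.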
Python Package Index
afinn is distributed from the central open archive, the Python Package Index: https://pypi.python.org/pypi/afinn
Enables others to download the package seamlessly
pip install afinn
Or search for it with:
pip search sentiment
Dependencies
Keep dependencies to a bare minimum: none, except the standard library (codecs, re, os), so far.
Otherwise, the dependencies should be listed in requirements.txt. Example from another package:
beautifulsoup4
db.py
docopt
fasttext
flask
Flask-Bootstrap
gensim
jsonpickle
...
Enables pip install -r requirements.txt
Issue: Versioneering
Versioneering is a problem at the moment.
Version string “0.1” is hard-coded in the setup file:
setup(
    name='afinn',
    packages=['afinn'],
    version='0.1',
    ...
The PyPI version is 0.1, but when the GitHub repository changes, this version no longer reflects the differences.
In the old days, developers would manually update the version.
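One common way to at least avoid hard-coding the string in two places is to keep a single __version__ in the package and read it from setup.py. This is a sketch of that general pattern, not what afinn does; read_version and the INIT_PY contents are hypothetical. It does not by itself make the version follow GitHub commits; that would require deriving it from Git tags.

```python
import re

# Hypothetical contents of afinn/__init__.py in this scheme.
INIT_PY = "'''Sentiment analysis.'''\n\n__version__ = '0.1'\n"

VERSION_RE = re.compile(r"^__version__\s*=\s*['\"]([^'\"]*)['\"]", re.MULTILINE)


def read_version(source):
    """Extract the __version__ string assigned in a module's source text."""
    match = VERSION_RE.search(source)
    if match is None:
        raise RuntimeError("__version__ not found")
    return match.group(1)


print(read_version(INIT_PY))  # 0.1
```

In setup.py, the text passed to read_version would be read from afinn/__init__.py, and the result passed as the version argument of setup().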
Summary
The Python environment has good methods to standardize development.
Python can neatly enforce documentation.
A good number of tools help the developer follow best practice: testing frameworks as well as code and documentation style checkers.
Python provides a good framework for publishing open source code.
Persistent and versioned distribution.
Most of the “code” is documentation.
End