

baseline, since no published results have been obtainable for entity level sentiment analysis. The baseline presented here is thus a sentence-level baseline, calculated using the Natural Language Toolkit (NLTK) for Python with a Naive Bayes classifier (trained on movie reviews rather than hotel reviews). The raw sentiment values calculated by the presented method are also available for each text in the test data set in Table C.1 in Appendix C. The precision and recall results for both the baseline and the presented method are shown in Table 6.1. As seen, the recall is somewhat low for the proof of concept system, which is addressed in the next section, while it is argued that the precision of the system is indeed acceptable, since even humans will not reach 100% agreement.

            Baseline   Presented method
Precision   71.5%      92.3%
Recall      44.1%      35.3%

Table 6.1: Precision and recall results for the proof of concept system.
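For reference, precision and recall follow directly from counts of true positives, false positives and false negatives. The counts in the sketch below are hypothetical, chosen only to reproduce ratios close to those reported for the presented method in Table 6.1:

```haskell
-- Precision and recall from raw counts. The counts used in main are
-- hypothetical; Table 6.1 only reports the final ratios.
precision :: Int -> Int -> Double
precision tp fp = fromIntegral tp / fromIntegral (tp + fp)

recall :: Int -> Int -> Double
recall tp fn = fromIntegral tp / fromIntegral (tp + fn)

main :: IO ()
main = do
  -- e.g. 12 correct extractions, 1 wrong one and 22 missed ones give
  -- roughly the ratios reported for the presented method:
  print (precision 12 1)   -- ~0.923
  print (recall 12 22)     -- ~0.353
```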

6.3 Understanding the results

To investigate the low recall, focus was turned to clarifying why the presented method only yields results for 38.2% of the test data set. The C&C toolchain was able to give a syntactic deduction proof for 94.4% of the test data set. However, after closer inspection of the proofs constructed for texts in the test data set, it was discovered that only approximately half of these proofs were correct. This is of course a major handicap for the presented method, since it is highly reliant on correct deduction proofs.
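These percentages can be combined into a rough upper bound on the obtainable coverage; the sketch below only restates the numbers given above:

```haskell
-- Back-of-the-envelope check of the robustness numbers: with 94.4% of the
-- sentences parsed and only about half of the proofs correct, at most about
-- 47% of the test sentences could ever yield a result.
parsed, correctProofs, yielding :: Double
parsed        = 0.944  -- sentences the C&C toolchain could parse
correctProofs = 0.5    -- fraction of those proofs that were correct
yielding      = 0.382  -- sentences for which the method actually gave a result

main :: IO ()
main = do
  print (parsed * correctProofs)             -- ~0.472, the best case
  print (parsed * correctProofs - yielding)  -- ~0.09 left unexplained
```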

That the C&C toolchain behaves so inadequately is indeed expected, and thus a low recall was also expected, even though it is lower than hoped for. Recall from Section 4.1 that the C&C parser could recognize 98.4% of unseen data. However, there is one major assumption if this promise is to be met: the probability distribution of the input should of course follow the same distribution as the training data that the C&C models were created from. The models were trained on the CCGbank [Hockenmaier and Steedman, 2007] and thereby follow the distribution of The Penn Treebank [Marcus et al., 1993]. That this follows a different probability distribution than the Opinosis data set may not be that surprising, since the treebank consists mostly of well-written newspaper texts. To illustrate this, consider Figure 6.1, which shows the probability distribution of sentence length (i.e. number of words) in the set of hotel and restaurant reviews of Opinosis (2059 sentences), and in a subset of the Wall Street Journal (WSJ) corpus, which is a representative and free-to-use sample of The Penn Treebank (3914 sentences). This measure gives a clear indication that the two data sets indeed follow different probability distributions.

[Figure: two overlaid curves of relative frequency (0%-5%) against sentence length in number of words (10-60), for Hotels and restaurants vs. WSJ.]

Figure 6.1: Sentence length distribution in representative subsets of Opinosis and The Penn Treebank.
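The distributions compared in Figure 6.1 are simple relative frequencies of sentence length; a self-contained sketch of the computation:

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- Relative frequency of sentence lengths (in words) over a list of sentences.
lengthDistribution :: [String] -> Map Int Double
lengthDistribution sentences = Map.map (/ total) counts
  where
    counts = Map.fromListWith (+) [ (length (words s), 1) | s <- sentences ]
    total  = fromIntegral (length sentences)

main :: IO ()
main = print (lengthDistribution
  ["the room was nice", "great view", "the staff was very friendly"])
```

Running this over the Opinosis and WSJ samples and plotting the two maps produces curves of the kind shown in the figure.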

At best, this may however only explain half of the missing results. To explain the rest, focus needs to be turned to those sentences for which the C&C toolchain constructs proofs correctly, but which do not yield any results. Consider Figure 6.2, which shows the deduction proof for test sentence #9, which most humans, like the two individuals used for labeling, would agree expresses a positive opinion about the rooms. However, the sentiment extraction algorithm fails to capture this, even though it is worth noticing that the semantic expression in the conclusion of the proof, i.e. furnish′95.0(room′0.0), indeed contains a positive sentiment value.

Rooms [NNS]             N: room′0.0, raised to NP: room′0.0
were [VBD]              (S_dcl\NP)/(S_pss\NP): λx.x
nicely [RB]             (S_X\NP)\(S_X\NP): λx.(x ∘ 95.0)
were nicely             (S_dcl\NP)/(S_pss\NP): λx.(x ∘ 95.0)        (<B×)
furnished [VBN]         S_pss\NP: λx.furnish′0.0(x)
were nicely furnished   S_dcl\NP: λx.furnish′95.0(x)                (>)
Rooms were nicely furnished   S_dcl: furnish′95.0(room′0.0)         (<)
. [.]                   S_dcl\S_dcl: λx.x
Rooms were nicely furnished.  S_dcl: furnish′95.0(room′0.0)         (<)

Figure 6.2: Sentence positive about room, but with no sentiment value on its functor.
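The failure can be reproduced in isolation. The sketch below restates just enough of the SExpr type and the extraction algorithm from the Appendix B listings, with the pronoun-aware matchE simplified to plain string equality:

```haskell
-- Minimal fragment of the SExpr type and extraction algorithm of Appendix B.
data SExpr = Fun String Float Int [SExpr] | Seq [SExpr]

extract :: String -> SExpr -> [Float]
extract s (Fun s' j _ es) | s == s'   = j : concatMap (extract s) es
                          | otherwise = concatMap (extract s) es
extract s (Seq es) = concatMap (extract s) es

-- The conclusion of the proof in Figure 6.2: furnish'95.0(room'0.0)
sentence9 :: SExpr
sentence9 = Fun "furnish" 95.0 0 [Fun "room" 0.0 0 []]

main :: IO ()
main = print (extract "room" sentence9)
  -- [0.0]: the 95.0 sits on the functor and is never collected
```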

In the next chapter, solutions to these problems are proposed and discussed, and it is argued that even though the recall results for the test data set are unacceptably low, applications for the presented method are indeed possible given some additional effort.

Chapter 7

Discussion

The presented method for entity level sentiment analysis using deep sentence structure analysis has shown acceptable correctness results, but inadequate robustness, cf. the previous chapter.

The biggest issue for the demonstrated proof of concept system is the lack of correct syntactic tagging models. It is argued that models following a probability distribution closer to that of review texts than The Penn Treebank models would have improved the robustness of the system significantly. One might think that if syntactically labeled target data are needed, then the presented logical method suffers from the same issue as machine learning approaches, i.e. domain dependence. However, it is argued that exactly because the models needed are of syntactic level, and not of sentiment level, they do not need to be domain specific, but only genre specific. This reduces the number of models needed, as a syntactic tagging model for reviews might cover several domains, and thus the domain independence of the presented method is intact.

To back this argument up, consider Figure 7.1, which shows the probability distribution of sentence lengths in two clearly different sentiment domains, namely hotels and restaurants (2059 samples) and GPS navigation equipment (583 samples). This measure gives strong indications that a robust syntactic level model for either domain would also be fairly robust for the other. The same is intuitively not true for sentiment level models.

[Figure: two overlaid curves of relative frequency (0%-5%) against sentence length in number of words (10-60), for Hotels and restaurants vs. Navigation.]

Figure 7.1: Sentence length distribution for different review topics.

An interesting experiment would have been to see how the presented method performed on such genre specific syntactic models. Building covering treebanks for each genre to train such models is an enormous task, and has clearly not been achievable in this project even for a single genre (The Penn Treebank took eight years to compile).

However, Søgaard [2012] presents methods for cross-domain semi-supervised learning, i.e. the combination of labeled (e.g. CCGbank) and unlabeled (e.g. review texts) data from different domains (e.g. syntactic genres). This allows the construction of models that utilize the knowledge present in the labeled data, but also bias it toward the distribution of the unlabeled data. The learning accuracy is of course not as significant as when learning with large amounts of labeled target data, but it can greatly improve cases such as the one presented in this thesis. The reason why this was not performed in this project is partly that it was not prioritized, and partly that the method still assumes access to the raw labeled data (e.g. CCGbank), which was not available.

The other issue identified when analyzing the low robustness was the failure to extract sentiment values, even though they were actually present in the semantic expression yielded by the conclusion of the deduction proof. This clearly shows that the simple extraction algorithm given by Definition 4.3 is too restrictive, i.e. it turned out to be insufficient to only extract sentiment values at the atomic functor level for the subject of interest. However, since the knowledge is actually present, it is argued that more advanced extraction algorithms would be able to capture these cases.
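As an illustration of what such a relaxation might look like, the sketch below also collects the sentiment value of a functor that governs the subject. The names extract' and mentions are hypothetical, and this is only one possible relaxation, not the algorithm of Definition 4.3:

```haskell
-- Sketch of a less restrictive extraction: besides values on the subject's
-- own functor, also collect the value of any functor whose arguments mention
-- the subject (hypothetical relaxation, not the thesis algorithm).
data SExpr = Fun String Float Int [SExpr] | Seq [SExpr]

mentions :: String -> SExpr -> Bool
mentions s (Fun s' _ _ es) = s == s' || any (mentions s) es
mentions s (Seq es)        = any (mentions s) es

extract' :: String -> SExpr -> [Float]
extract' s (Fun s' j _ es)
  | s == s'             = j : rest  -- value on the entity itself
  | any (mentions s) es = j : rest  -- value on a functor governing the entity
  | otherwise           = rest
  where rest = concatMap (extract' s) es
extract' s (Seq es) = concatMap (extract' s) es

main :: IO ()
main = print (extract' "room" (Fun "furnish" 95.0 0 [Fun "room" 0.0 0 []]))
  -- [95.0,0.0]: the sentiment on the functor is now captured
```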

The reason why more advanced extraction algorithms were not considered was that it would require more test data to validate that such advanced extraction strategies are well-behaved. Recall that it has not been possible to find quality test data labeled on


entity level, and it was considered too time consuming to manually construct large amounts of entity labeled data.

With these issues addressed, it is argued that the proof of concept system indeed shows at least the potential of the presented method, and that further investment in labeled data, both syntactic tagging data and labeled test data, would make the solution more robust.

7.1 Future work

Besides resolving the issue presented as the major cause of the low robustness, the presented method also leaves plenty of opportunities for expansion. This could include a more sophisticated pronoun resolution than the one presented in Section 5.5.

Likewise, even more advanced extraction strategies could also include relating entities by the use of some of the abstract topological relations available in semantic networks, e.g. hyponym/hypernym and holonym/meronym. With such relations, a strong sentiment on the entity room might affect the sentiment value of hotel, since room is a meronym of building, and hotel is a hyponym of building.
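A minimal sketch of such a propagation, with toy hand-coded relations standing in for WordNet and an assumed damping factor; all names and the 0.5 factor are illustrative, not part of the presented method:

```haskell
import qualified Data.Map as Map

-- Hypothetical toy relations; a real system would query WordNet.
meronymOf, hyponymOf :: Map.Map String String
meronymOf = Map.fromList [("room", "building")]
hyponymOf = Map.fromList [("hotel", "building")]

-- Two entities are related if one is a meronym and the other a hyponym of a
-- common ancestor, as in the room/hotel/building example above.
related :: String -> String -> Bool
related a b = case (Map.lookup a meronymOf, Map.lookup b hyponymOf) of
  (Just x, Just y) -> x == y
  _                -> False

-- Propagate a damped fraction of one entity's sentiment to a related entity.
propagate :: Float -> String -> String -> Float -> Float
propagate damping from to v
  | related from to = damping * v
  | otherwise       = 0

main :: IO ()
main = print (propagate 0.5 "room" "hotel" 95.0)  -- 47.5
```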

Chapter 8

Conclusion

This thesis has presented a formal logical method for entity level sentiment analysis, which utilizes machine learning techniques for efficient syntactic tagging. The method should be seen as an alternative to pure machine learning methods, which have been argued to be inadequate for capturing long distance dependencies between an entity and opinions of it, and to be highly dependent on the domain of the sentiment analysis.

The main aspects of the method were presented in three stages:

• The Combinatory Categorial Grammar (CCG) formalism, presented in Chapter 3, is a modern and formal logical technique for processing natural language texts. The semantics of the system was extended in order to apply it to the field of entity level sentiment analysis.

• In order to allow the presented method to work on a large vocabulary and a wide range of sentence structures, Chapter 4 described the usage of statistical models for syntactic tagging of the texts, after it had been argued that such an approach is the only reasonable one. Algorithms for building semantic expressions from the syntactic information were presented, along with a formal method for reasoning about the sentiment expressed in natural language texts by the use of semantic networks.

• Chapter 5 presented essential details about the proof of concept system, which has been fully implemented, using functional programming, in order to demonstrate and test the presented method.

Finally, the presented method was evaluated against a small set of manually annotated data. The evaluation showed that while the correctness of the presented method seems acceptably high, its robustness is currently inadequate for most real world applications, as presented in Chapter 6. However, it was argued in the previous chapter that it is indeed possible to improve the robustness significantly given further investment and development of the method.

Appendix A

A naive attempt for lexicon acquisition

This appendix describes the efforts that were initially made in order to acquire a CCG lexicon by transforming a tagged corpus, namely the Brown Corpus. The approach turned out to be very naive, and was dropped in favor of the C&C models trained on the CCGbank [Hockenmaier and Steedman, 2007], and in turn The Penn Treebank [Marcus et al., 1993].

The Brown Corpus

English is governed by convention rather than formal code, i.e. there is no regulating body like the Académie française. Instead, authoritative dictionaries, i.a. the Oxford English Dictionary, describe usage rather than defining it. Thus, in order to acquire a covering lexicon, it is necessary to build it from large amounts of English text.

The Brown Corpus was compiled by Francis and Kucera [1979] by collecting written works printed in the United States during the year 1961. The corpus consists of just over one million words taken from 500 American English sample texts, with the intention of covering a highly representative variety of writing styles and sentence structures.

Notable drawbacks of the Brown Corpus include its age, i.e. there are evidently review topics where essential and recurring words used in present day writing were rarely used, or not even coined yet, 50 years ago. For instance, the Brown Corpus does not recognize the word internet. However, it is one of the only larger free-to-use tagged corpora available, and for this reason it was chosen for the attempt. Even early analysis showed that coverage would be disappointing, since the Brown Corpus only contains 80.4% of the words of the hotels and restaurants subset of the Opinosis data set [Ganesan et al., 2010] (3793 words). This means that every 5th word would on average be a guess in the blind. However, the approach was continued to see how many sentences it would be possible to syntactically analyze with the lexicon.
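The coverage figure is a simple vocabulary ratio; a sketch of the computation, with a toy vocabulary standing in for the actual corpora:

```haskell
import Data.Char (toLower)
import qualified Data.Set as Set

-- Fraction of the tokens in a text that are covered by a vocabulary.
coverage :: Set.Set String -> [String] -> Double
coverage vocab tokens =
  fromIntegral (length covered) / fromIntegral (length tokens)
  where covered = filter (`Set.member` vocab) (map (map toLower) tokens)

main :: IO ()
main = do
  let vocab = Set.fromList ["the", "room", "was", "big"]
  print (coverage vocab (words "the room was very big"))  -- 0.8
```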

The corpus is annotated with part of speech tags, but does not include the deep structure of a treebank. There is a total of 82 different tags; some examples are shown in Table A.1. As seen in the extract, the tags include very limited information, and while some features (e.g. tense, person) can be extracted in some cases, the tagging gives no indication of the context.

Tag   Description
VB    verb, base form
VBD   verb, past tense
VBG   verb, present participle/gerund
VBN   verb, past participle
VBP   verb, non-3rd person singular, present
VBZ   verb, 3rd person singular, present

Table A.1: Extract from the Brown tagging set.

Without any contextual information, the approach for translating this information into lexical categories becomes very coarse. For instance, there is no way of determining whether a verb is intransitive, transitive, or ditransitive. The chosen method was simply to over-generate, i.e. for every verb, entries for all three types of verbs were added to the lexicon. In total, 62 of such rules were defined, which produced a lexicon containing CCG categories for 84.5% of the 56,057 unique tokens present in the Brown Corpus.
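The over-generation for verbs can be sketched as follows; the rule below is illustrative (categories as plain strings), not one of the 62 actual rules:

```haskell
-- Sketch of the over-generation rule for verbs: without context, a base-form
-- verb (VB) gets lexicon entries as intransitive, transitive and ditransitive.
verbEntries :: String -> [(String, String)]
verbEntries v =
  [ (v, "S\\NP")            -- intransitive
  , (v, "(S\\NP)/NP")       -- transitive
  , (v, "((S\\NP)/NP)/NP")  -- ditransitive
  ]

main :: IO ()
main = mapM_ print (verbEntries "give")
```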

Evaluating the lexicon

To evaluate the coverage of the acquired lexicon, a shift-reduce parser was implemented in Haskell. It is not the most efficient parsing strategy, but it was simple to implement and considered efficient enough to test the lexicon. A representative sample of the hotels and restaurants subset of the Opinosis data set was selected (156 sentences). The result was that the parser was only able to parse 10.9% of the sentences. This result was very disappointing, and it was not even considered whether these sentences were correctly parsed. Instead, it was recognized that building a CCG lexicon from only a tagged corpus is not a feasible approach. Further development of the approach was dropped, and instead the component was replaced with the C&C tools [Clark and Curran, 2007].

Appendix B

Source code

Complete source code for the proof of concept solution presented can be downloaded from the url: http://www.student.dtu.dk/~s072466/msc.zip.

The following lists the essential Haskell code files for the implementation. Main.hs defines the extraction and analysis algorithms, i.e. E and A, and also includes the main entry point for the application. The syntax and semantics for the semantic expressions are defined in Lambda.hs. The implementation of Combinatory Categorial Grammar (CCG) is defined by CCG.hs, and Annotate.hs defines both the generic annotation algorithm Ugen, as well as the special case annotation algorithms. Finally, Parser.hs defines the parser for the output from the C&C toolchain. The listings thus do not include the WordNet interface.

Main.hs

{-# LANGUAGE ImplicitParams #-}

module Main where

import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe
import Data.Char
import Control.Monad
import Parser
import CCG
import Annotate
import WordNet hiding (Word)

matchE :: String -> String -> Bool
matchE s s' = (map toLower s') == s ||
              (map toLower s') == "it" ||
              (map toLower s') == "they" ||
              (map toLower s') == "them"

extract :: String -> SExpr -> [Float]
extract s (Fun s' j _ es) | matchE s s' = j : (concat $ map (extract s) es)
                          | otherwise   = (concat $ map (extract s) es)
extract s (Seq es) = (concat $ map (extract s) es)
extract _ _ = []

analyse :: (Word -> Word) -> String -> (String, Int) -> IO ((Int, Maybe Float))
analyse annotationAlgorithm subject (sentence, index) = do
  tree <- runCc annotationAlgorithm sentence
  if (isJust tree) then do
      let sexpr = nodeExpr $ fromJust tree
      let r = extract subject sexpr
      let m = (maximum r) + (minimum r)
      if (null r) then return (index, Nothing)
      else if (m > 0) then
        return $ (index, Just $ maximum r)
      else if (m < 0) then
        return $ (index, Just $ minimum r)
      else
        return (index, Nothing)
    else
      return (index, Nothing)

main :: IO ()
main = do
  wne <- initializeWordNetWithOptions Nothing Nothing

  -- Define positive concepts
  let adj_pos_list = [("good",1),("beautiful",1),("pleasant",1),("clean",1),("quiet",1),
                      ("friendly",1),("cheap",1),("fast",1),("large",1),("nice",1)]
  let adj_neg_list = [("bad",1),("hideous",1),("unpleasant",1),("dirty",1),("noisy",1),
                      ("unfriendly",1),("expensive",1),("slow",1),("small",1),("nasty",1)]

  -- Define intensifier concepts
  let intensifiers = [("extreme",1),("much",1),("more",1)]
  let qualifier    = [("moderate",1),("little",1),("less",1)]

  -- Load the synsets that correspond to the words in the above lists.
  let s_pos = (let ?wne = wne in map (\(w,i) -> head $ search w Adj i) adj_pos_list)
  let s_neg = (let ?wne = wne in map (\(w,i) -> head $ search w Adj i) adj_neg_list)
  let intensifiers_ss = (let ?wne = wne in map (\(w,i) -> head $ search w Adj i) intensifiers)
  let qualifier_ss    = (let ?wne = wne in map (\(w,i) -> head $ search w Adj i) qualifier)

  -- Print info about the seed concepts
  putStr "\n\n"
  putStr $ "Positive concepts:\n"
  putStr $ unlines $ map show s_pos
  putStr "\n"
  putStr $ "Negative concepts:\n"
  putStr $ unlines $ map show s_neg
  putStr "\n"
  putStr $ "Intensifying concepts:\n"
  putStr $ unlines $ map show intensifiers_ss
  putStr "\n"
  putStr $ "Qualifying concepts:\n"
  putStr $ unlines $ map show qualifier_ss
  putStr "\n\n"

  -- Build semantic networks and annotation environment
  putStr "Unfolding semantic networks...\n"
  let (adjMap, adjGraph) = (let ?wne = wne in
        unfoldG (\x -> (relatedBySimilar x ++ relatedBySeeAlso x)) (s_pos ++ s_neg))
  let (scaleMap, scaleGraph) = (let ?wne = wne in
        unfoldG (relatedBySimilar) (intensifiers_ss ++ qualifier_ss))
  putStr $ "- Change network: " ++ (show $ Map.size adjMap) ++ " concepts.\n"
  putStr $ "- Scale network: " ++ (show $ Map.size scaleMap) ++ " concepts.\n"

  let adjPosRoots = map (fromJust . flip Map.lookup adjMap) s_pos
  let adjNegRoots = map (fromJust . flip Map.lookup adjMap) s_neg
  let adj_fun = pAdj adjGraph adjPosRoots adjNegRoots .
                catMaybes . map (flip Map.lookup adjMap)
  let scaleIntensifiersRoots = map (fromJust . flip Map.lookup scaleMap) intensifiers_ss
  let scaleQualifierRoots    = map (fromJust . flip Map.lookup scaleMap) qualifier_ss
  let scaleFun = pScale scaleGraph scaleIntensifiersRoots scaleQualifierRoots .
                 catMaybes . map (flip Map.lookup scaleMap)
  let env = AnnotationEnv {
        wnEnv = wne,
        adjFun = adj_fun,
        scaleFun = scaleFun }

  -- Load review data
  reviewData <- liftM lines $ readFile "../Data/rooms_swissotel_chicago_a.txt"

  -- Analyse
  result <- mapM (analyse (annotateWord env) "room") $ zip reviewData [1..]

  -- Print results
  putStr "\n\n"
  putStr "Results:\n"
  putStr $ unlines (map (\(i,r) -> (show i) ++ ": " ++ (show r)) result)
  putStr "\n\n"
  putStr "Finished.\n"
  return ()

Lambda.hs

module Lambda where

import Data.List (nub, union, (\\))

-- | Data structure for semantic expressions
data SExpr = Var String                    -- Variable
           | Abs String SExpr              -- Lambda abstraction
           | App SExpr SExpr               -- Lambda application
           | Fun String Float Int [SExpr]  -- Functor
           | Seq [SExpr]                   -- Sequence
           | ImpactChange SExpr Int        -- Impact change
           | Change SExpr Float            -- Change
           | Scale SExpr Float             -- Scale
           deriving (Eq)

-- | Returns the set of free variables in the given expression
free :: SExpr -> [String]
free (Var x) = [x]
free (App e1 e2) = (free e1) `union` (free e2)
free (Abs x e) = (free e) \\ [x]
free (Fun _ _ _ es) = nub $ concat $ map free es
free (ImpactChange e _) = (free e)
free (Seq es) = nub $ concat $ map free es
free (Change e _) = (free e)
free (Scale e _) = (free e)

-- | Safe substitution of variable x' with e' in e
subst :: SExpr -> String -> SExpr -> SExpr
subst e@(Var x) x' e' | x == x'   = e'
                      | otherwise = e
subst e@(App e1 e2) x' e' = App (subst e1 x' e') (subst e2 x' e')
subst e@(Abs x e1) x' e'
  | x == x' =
      -- x is bound in e, so do not continue
      e
  | x `elem` free e' =
      -- x is in FV(e'), need alpha-conversion of x:
      let x'' = head $ xVars \\ (free e1 `union` free e')
      in subst (Abs x'' $ subst e1 x (Var x'')) x' e'
  | otherwise =
      -- otherwise just continue
      Abs x (subst e1 x' e')
subst e@(Fun f j k es) x' e' = Fun f j k $ (map (\e1 -> subst e1 x' e')) es
subst e@(Seq es) x' e' = Seq $ (map (\e1 -> subst e1 x' e')) es
subst e@(ImpactChange e1 k') x' e' = ImpactChange (subst e1 x' e') k'
subst e@(Change e1 j) x' e' = Change (subst e1 x' e') j
subst e@(Scale e1 j) x' e' = Scale (subst e1 x' e') j

-- | Reduces a semantic expression
reduce :: SExpr -> SExpr
-- beta-reduction
reduce (App (Abs x t) t') = reduce $ subst (reduce t) x (reduce t')
reduce (App t1 t2) = if (t1 /= t1') then (reduce $ App t1' t2) else (App t1 t2)
  where t1' = reduce t1
reduce (Abs x t) = Abs x $ reduce t
reduce (Fun f j k ts) = Fun f j k $ map reduce ts
reduce (Seq ts) = Seq $ map reduce ts
-- FC1, FC2, SC and PC rules:
reduce (Change (Fun f j 0 ts) j') = Fun f (j + j') 0 $ map reduce ts
reduce (Change (Fun f j k ts) j') = Fun f j k $ map reduce $
    (take (k-1) ts) ++ [Change (ts !! (k-1)) j'] ++ (drop k ts)
reduce (Change (Seq ts) j') = Seq $ map (reduce . flip Change j') ts
reduce (Change (Abs x t) j') = Abs x $ reduce $ Change t j'
reduce (Change t j) = if (t /= t') then (reduce $ Change t' j) else (Change t' j)
  where t' = reduce t
-- FS1, FS2, SS and PS rules:
reduce (Scale (Fun f j 0 ts) j') = Fun f (if j == 0 then j' else j * j') 0 $ map reduce ts
reduce (Scale (Fun f j k ts) j') = Fun f j k $ map reduce $
    (take (k-1) ts) ++ [Scale (ts !! (k-1)) j'] ++ (drop k ts)
reduce (Scale (Seq ts) v) = Seq $ map (reduce . flip Scale v) ts
reduce (Scale (Abs x t) v) = Abs x $ reduce $ Scale t v
reduce (Scale t j) = if (t /= t') then (reduce $ Scale t' j) else (Scale t' j)
  where t' = reduce t
-- IC rule:
reduce (ImpactChange (Fun f j k ts) k') = Fun f j k' ts
reduce (ImpactChange t k') = if (t /= t') then
                               (reduce $ ImpactChange t' k')
                             else
                               (ImpactChange t k')
  where t' = reduce t
-- Otherwise
reduce x = x

-- | Creates an infinite list of variables [x, x', x'', ...]
xVars :: [String]
xVars = iterate (++ "'") "x"

zVars :: [String]
zVars = iterate (++ "'") "z"

fVars :: [String]
fVars = iterate (++ "'") "f"

-- | Creates an infinite list of variables [x, y, z, x', y', z', x'', y'', z'', ...]
xyzVars :: [String]
xyzVars = [v ++ v' | v' <- (iterate (++ "'") ""), v <- ["x", "y", "z"]]

-- | Identity semantic expression
lid :: SExpr
lid = Abs "x" $ Var "x"

-- | Determines if the expression is complex, i.e. needs parenthesis
isComplexExpr :: SExpr -> Bool
isComplexExpr (App _ _) = True
isComplexExpr (Seq _) = True
isComplexExpr (ImpactChange _ _) = True
isComplexExpr (Change _ _) = True
isComplexExpr (Scale _ _) = True
isComplexExpr _ = False

-- | Pretty printing of data structures
instance Show SExpr where
  showsPrec d (Var x) = (showString x)
  showsPrec d (Abs x t) = (showString $ "\\" ++ x ++ ".") . (shows t)
  showsPrec d (App t1 t2) = (showParen (isComplexExpr t1) (shows t1)) .
                            (showString " ") .
                            (showParen (isComplexExpr t2) (shows t2))
  showsPrec d (Fun f j _ []) = (showString $ f ++ "'" ++ (show j))
  showsPrec d (Fun f j k ts) = (showString $ f ++ "'" ++ (show j) ++ "(") .
                               (showList' ts) . (showString ")")
    where showList' :: Show a => [a] -> ShowS
          showList' [] = showString ""
          showList' [a] = shows a
          showList' (a1:a2:as) = (shows a1) . (showString ",") . (showList' (a2:as))
  showsPrec d (ImpactChange t k') = (shows t) . (showString "->") . (shows k')
  showsPrec d (Seq ts) = (shows ts)
  showsPrec d (Change t1 v) = (shows t1) . (showString "\x2218") . (shows v)
  showsPrec d (Scale t1 v) = (shows t1) . (showString "\x00d7") . (shows v)


CCG.hs

{−# LANGUAGE T y p e S y n o n y m I n s t a n c e s , F l e x i b l e I n s t a n c e s #−}

module CCG (

module U n i f i c a t i o n , module Lambda , moduleCCG ) where

import U n i f i c a t i o n import Lambda

type Token = S t r i n g −− L e x i c a l e n t r y

type Lemma = S t r i n g −−Lemma o f l e x i c a l e n t r y type Pos = S t r i n g −− P a r t o f s p e a r c h

−− | Data s t r u c t u r e f o r l e x i c a l u n i t s .

data Word = Word {

t o k e n : : Token , lemma : : Lemma , p o s : : Pos ,

c a t e g o r y : : C a t e g o r y , e x p r : : SExpr }

d e r i v i n g ( Eq ) type L e x i c o n = [ Word ]

i n f i x 9 : / −−Forward s l a s h o p e r a t o r i n f i x 9 : \ −−Backward s l a s h o p e r a t o r type A g r e e m e n t = [ F e a t u r e ]

data C a t e g o r y = S { a g r e e m e n t : : A g r e e m e n t }−− S e n t e n c e

| N { a g r e e m e n t : : A g r e e m e n t }−−Noun

| NP { a g r e e m e n t : : A g r e e m e n t }−−Noun P h r a s e

| PP { a g r e e m e n t : : A g r e e m e n t }−− P r e p o s i s i o n P h r a s e

| CONJ { a g r e e m e n t : : A g r e e m e n t }−− C o n j u g a t i o n ( t e m p e r a r y c a t e g o r y )

| P u n c t u a t i o n { a g r e e m e n t : : A g r e e m e n t }−− P u n c t a t i o n ( t e m p e r a r y c a t e g o r y )

| Comma { a g r e e m e n t : : A g r e e m e n t }−−Comma ( t e m p e r a r y c a t e g o r y )

| C a t e g o r y : / C a t e g o r y −− Forward s l a s h

| C a t e g o r y : \ C a t e g o r y −− Backward s l a s h

d e r i v i n g ( Eq )

data F e a t u r e = FDcl | F A d j | FEm | FI nv −− S e n t e n c e

| FTo | FB | FPt | FPss −− V e r b s

| FNg | FNb | F F o r

| FThr | FFrg −− F r a g m e n t s

| FQ | FWq | FQem −− Q u e s t i o n s

| FVar S t r i n g −− V a r i a b l e s

| FUnknown S t r i n g −−Unknowns

d e r i v i n g ( Eq , Show )

data PTree = PWord Word

| PFwdApp { n C a t e g o r y : : C a t e g o r y , nExpr : : SExpr , n1 : : PTree , n2 : : PTree }

| PBwdApp { n C a t e g o r y : : C a t e g o r y , nExpr : : SExpr , n1 : : PTree , n2 : : PTree }

| PFwdComp { n C a t e g o r y : : C a t e g o r y , nExpr : : SExpr , n1 : : PTree , n2 : : PTree }

| PBwdComp { n C a t e g o r y : : C a t e g o r y , nExpr : : SExpr , n1 : : PTree , n2 : : PTree }

| PBwdXComp { n C a t e g o r y : : C a t e g o r y , nExpr : : SExpr , n1 : : PTree , n2 : : PTree }

| PFwdTR { n C a t e g o r y : : C a t e g o r y , nExpr : : SExpr , n1 : : PTree }

| P L e x R a i s e { n C a t e g o r y : : C a t e g o r y , nExpr : : SExpr , n1 : : PTree } d e r i v i n g ( Eq , Show )

−− | G e t s t h e c a t e g o r y o f a n o d e i n t h e d e d u c t i o n t r e e . n o d e C a t e g o r y : : PTree −> C a t e g o r y

n o d e C a t e g o r y ( PWord w) = c a t e g o r y w n o d e C a t e g o r y x = n C a t e g o r y x

−− | G e t s t h e s e m a n t i c e x p r e s s i o n o f a n o d e i n t h e d e d u c t i o n t r e e . n o d e E x p r : : PTree > SExpr

n o d e E x p r ( PWord w) = e x p r w

n o d e E x p r x = nExpr x

instance Unifiable Category where
  S _           =? S _           = True
  N _           =? N _           = True
  NP _          =? NP _          = True
  PP _          =? PP _          = True
  CONJ _        =? CONJ _        = True
  Punctuation _ =? Punctuation _ = True
  Comma _       =? Comma _       = True
  (a :/ b)      =? (a' :/ b')    = a =? a' && b =? b'
  (a :\ b)      =? (a' :\ b')    = a =? a' && b =? b'
  _             =? _             = False

-- | Return if a category is compound.
isComplex :: Category -> Bool
isComplex (_ :/ _) = True
isComplex (_ :\ _) = True
isComplex _        = False

-- | Return the argument of the type inflicted by a compound category.
arg :: Category -> Maybe Category
arg (_ :\ x) = Just x
arg (_ :/ x) = Just x
arg x        = Nothing

-- | Return the result of the type inflicted by a compound category.
res :: Category -> Maybe Category
res (x :\ _) = Just x
res (x :/ _) = Just x
res _        = Nothing

-- | Pretty printing of Word
instance Show Word where
  showsPrec d (Word { token = t, lemma = lemma, pos = p, category = c, expr = e }) =
    (showString t) . (showString "~") . (showString lemma) . (showString "/") .
    (showString p) . (showString " ") . (shows c) . (showString ":") . (shows e)

-- | Pretty printing of Category
instance Show Category where
  showsPrec d (S a)           = showString "S"    . (showString " ") . (shows a)
  showsPrec d (N a)           = showString "N"    . (showString " ") . (shows a)
  showsPrec d (NP a)          = showString "NP"   . (showString " ") . (shows a)
  showsPrec d (PP a)          = showString "PP"   . (showString " ") . (shows a)
  showsPrec d (CONJ a)        = showString "CONJ" . (showString " ") . (shows a)
  showsPrec d (Punctuation a) = showString "."    . (showString " ") . (shows a)
  showsPrec d (Comma a)       = showString ","    . (showString " ") . (shows a)
  showsPrec d (a :/ b) =
    ((showParen (isComplex a) (shows a)) . (showString "/") .
     (showParen (isComplex b) (shows b)))
  showsPrec d (a :\ b) =
    ((showParen (isComplex a) (shows a)) . (showString "\\") .
     (showParen (isComplex b) (shows b)))
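The unification operator `=?` above compares categories while ignoring their feature agreement; only the structure of complex categories must match. A minimal standalone sketch of that behaviour, on a reduced category type (the `Cat` type here is illustrative only and not the listing's `Category`):

```haskell
-- Reduced stand-in for Category (hypothetical, for illustration):
-- atomic categories carry a feature list that =? ignores,
-- complex categories must agree structurally.
data Cat = S [String] | NP [String] | Cat :/ Cat | Cat :\ Cat

(=?) :: Cat -> Cat -> Bool
S _      =? S _        = True
NP _     =? NP _       = True
(a :/ b) =? (a' :/ b') = a =? a' && b =? b'
(a :\ b) =? (a' :\ b') = a =? a' && b =? b'
_        =? _          = False

-- (S ["dcl"] :\ NP []) =? (S [] :\ NP [])  -- features are ignored
-- (S [] :/ NP []) =? (S [] :\ NP [])       -- slash direction must match
```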

Annotate.hs

{-# LANGUAGE ImplicitParams #-}
module Annotate where

import Control.Monad.ST
import Data.STRef
import Data.List (sort)
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe
import Data.Graph.Inductive
import CCG
import WordNet hiding (Word)

data AnnotationEnv = AnnotationEnv {
    wnEnv    :: WordNetEnv,
    adjFun   :: [SearchResult] -> Float, -- rename to adjChange?
    scaleFun :: [SearchResult] -> Float  -- rename to adjScale?
  }

-- Parameters for the annotation
omega = 100
n = 5

-- | Unfold the graph using the given relation and seeds.
unfoldG :: (Ord a) => (a -> [a]) -> [a] -> (Map a Node, Gr a Int)
unfoldG r seeds = runST $ unfoldST r seeds

-- | State transformer for unfolding graphs.
unfoldST :: (Ord a) => (a -> [a]) -> [a] -> ST s (Map a Node, Gr a Int)
unfoldST r seeds =
  do mapRef   <- newSTRef Map.empty -- Map from Item to Node
     nodesRef <- newSTRef []        -- List of Node/[Edge] pairs
     idRef    <- newSTRef 0         -- Counter for indexing nodes
     -- Recursively visits n
     let visit n =
           do -- Test if n has already been visited
              test <- (return . Map.lookup n =<< readSTRef mapRef)
              case test of
                Just v  -> return v
                Nothing ->
                  do -- Get next id for this item
                     i <- readSTRef idRef
                     modifySTRef idRef (+1)
                     -- Update item/node map
                     modifySTRef mapRef (Map.insert n i)
                     -- Recursively visit related items
                     ks <- mapM visit $ r n
                     let ns = ((i, n), [(i, k, 1) | k <- ks])
                     modifySTRef nodesRef (ns:)
                     return i
     -- Visit seeds
     mapM visit seeds
     -- Read results and return map/graph-pair
     list    <- readSTRef nodesRef
     nodeMap <- readSTRef mapRef
     let nodes = [n | (n, _) <- list]
     let edges = concat [es | (_, es) <- list]
     return (nodeMap, mkGraph nodes edges)

-- | Polarity value for adjective graphs
pAdj :: Real b => Gr a b -> [Node] -> [Node] -> [Node] -> Float
pAdj gr pns nns qns = (sum $ distAdj nns qns) - (sum $ distAdj pns qns)
  where
    distAdj :: [Node] -> [Node] -> [Float]
    distAdj sns qns = take n $ sort [normAdj (length (sp sn qn gr) - 1) | sn <- sns, qn <- qns]

    normAdj :: Int -> Float
    normAdj x | x < 0 || x > 10 = omega / (fromIntegral n)
              | otherwise       = (fromIntegral x / 10) * (omega / (fromIntegral n))
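The normalisation in `normAdj` can be read in isolation: a shortest-path length `x` between 0 and 10 is scaled linearly onto the interval from 0 to `omega / n`, while anything out of range (including unreachable pairs, where `length (sp sn qn gr) - 1` is negative) is clamped to the maximum `omega / n`. A standalone sketch with the parameter values from the listing:

```haskell
-- Annotation parameters as in the listing above.
omega :: Float
omega = 100

n :: Int
n = 5

-- Same shape as normAdj: linear in x on 0..10,
-- maximal weight omega/n outside that range.
normAdj :: Int -> Float
normAdj x | x < 0 || x > 10 = omega / fromIntegral n
          | otherwise       = (fromIntegral x / 10) * (omega / fromIntegral n)

-- normAdj 0 == 0.0, normAdj 5 == 10.0, normAdj 12 == 20.0
```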

-- | Polarity value for scale graphs
pScale :: Real b => Gr a b -> [Node] -> [Node] -> [Node] -> Float
pScale gr pns nns qns = 2 ** ((sum $ distScale nns qns) - (sum $ distScale pns qns))
  where
    distScale :: [Node] -> [Node] -> [Float]
    distScale sns qns = take n $ sort [normScale (length (sp sn qn gr) - 1) | sn <- sns, qn <- qns]

    normScale :: Int -> Float
    normScale x | x < 0 || x > 10 = 1 / (fromIntegral n)
                | otherwise       = (fromIntegral x / 10) * (1 / (fromIntegral n))

concat2 :: [[a]] -> [a]
concat2 [] = []
concat2 x  = let nonEmpty = filter (not . null) x
             in (map head nonEmpty) ++ concat2 (map tail nonEmpty)

shiftTerm :: SExpr -> Int -> SExpr
shiftTerm (Abs x t)      k' = Abs x (shiftTerm t k')
shiftTerm (Fun f j k ts) k' = Fun f j k' ts
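Unlike the standard `concat`, `concat2` interleaves its input lists round-robin: it emits the first element of every non-empty list, then the second of each, and so on, skipping lists that have run out. A standalone copy for illustration:

```haskell
-- Round-robin interleaving of lists, as in the listing above.
concat2 :: [[a]] -> [a]
concat2 [] = []
concat2 x  = let nonEmpty = filter (not . null) x
             in map head nonEmpty ++ concat2 (map tail nonEmpty)

-- concat2 [[1,2,3],[4],[5,6]] == [1,4,5,2,6,3]
```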

isDet    pos = pos == "DT"
isAdj    pos = (take 2 pos) == "JJ"
isVerb   pos = (take 2 pos) == "VB"
isAdverb pos = (take 2 pos) == "RB"
isNoun   pos = (take 2 pos) == "NN" || pos == "PRP"
isP      pos = pos == "IN" || pos == "TO" || pos == "WDT"

annotateDet env w@(Word { category = c, token = t, lemma = l })
  | c =? NP [] :/ N [] = w { expr = lid }
  | otherwise          = annotateAny env w

annotateNoun env w@(Word { category = c, token = t, lemma = l })
  | c =? N [] :/ N [] =
      w { expr = (Abs "x" $ Seq [Fun l 0 0 [], Var "x"]) } -- Part of multi-lexical noun
  | otherwise =
      annotateAny env w

annotateVerb env w@(Word { category = c, token = t, lemma = l })
  | c =? (S [FDcl] :\ NP []) :/ (S [FAdj] :\ NP []) = w { expr = lid } -- Linking verb
  | otherwise = annotateAny env w

annotateAdj env w@(Word { category = c, lemma = l })
  | (S [FAdj] :\ NP []) =? c || (NP [] :/ NP []) =? c || (N [] :/ N []) =? c =
      let query = (let ?wne = (wnEnv env) in search l Adj AllSenses)
          value = (adjFun env) query
      in w { expr = (Abs "x" $ Change (Var "x") value) }
  | otherwise = annotateAny env w

annotateAdverb env w@(Word { category = c, lemma = l })
  | arg c =? res c =
      let -- Try to look up adjective pertainyms
          query = (let ?wne = (wnEnv env)
                   in (concat2 $ map (relatedByPertainym) (search l Adv AllSenses))
                      ++ (search l Adj AllSenses))
          scaleValue  = (scaleFun env) $ filter ((==) Adj . srPOS) $ query
          changeValue = (adjFun env)   $ filter ((==) Adj . srPOS) $ query
      in if scaleValue /= 1 then
           w { expr = (Abs "x" $ Scale (Var "x") $ scaleValue) }
         else
           w { expr = (Abs "x" $ Change (Var "x") $ changeValue) }
  | otherwise = annotateAny env w

annotateP env w@(Word { category = c })
  | (NP [] :\ NP []) :/ (S [] :/ NP []) =? c =
      w { expr = Abs "x" $ Abs "y" $ ImpactChange (App (Var "x") (Var "y")) 1 }
  | (NP [] :\ NP []) :/ (S [] :\ NP []) =? c =
      w { expr = Abs "x" $ Abs "y" $ ImpactChange (App (Var "x") (Var "y")) 2 }
  | (NP [] :\ NP []) :/ NP [] =? c = w { expr = shiftTerm (expr w') 2 }
  | otherwise = w'
  where w' = annotateAny env w

annotateAny _ w@(Word { token = t, category = c, lemma = l }) =
    w { expr = constructTerm xyzVars [] c }
  where
    constructTerm :: [String] -> [(SExpr, Category)] -> Category -> SExpr
    constructTerm vs ts c = case c of
      r :\ a -> constructAbstraction vs ts a r
      r :/ a -> constructAbstraction vs ts a r
      _      -> Fun l 0 0 $ reverse $ [v | (v, _) <- ts]

    constructAbstraction :: [String] -> [(SExpr, Category)] -> Category -> Category -> SExpr
    constructAbstraction (v:vs) ts a r = -- a = tau_alpha, r = tau_beta
      let t = Var v -- NP
          cond1 = any (\(_, t) -> (Just a) =? arg t) ts -- types where a might be used as argument
          cond2 = any (\(_, t) -> arg a =? (Just t)) ts -- types where a might be used as function
          term = if cond1 then
                   map (\(t', c') -> if Just a =? arg c'
                                       then (App t' t, fromJust $ res c')
                                       else (t', c')) ts
                 else if cond2 then
                   map (\(t', c') -> if arg a =? Just c'
                                       then (App t t', fromJust $ res a)
                                       else (t', c')) ts
                 else
                   (t, a) : ts
      in Abs v (constructTerm vs (term) r)

-- | Special annotation for C&C conj-rule
annotateConj :: Category -> Word -> Word
annotateConj cat w@(Word { lemma = l }) =
    w { expr = Abs "x" $ Abs "y" $ constructTerm [] cat }
  where
    newVar used = head $ dropWhile (`elem` used) $ (iterate (++ "'") "z")

    constructTerm :: [String] -> Category -> SExpr
    constructTerm used c = case c of
      a :\ _ -> let v = newVar used in Abs v (constructTerm (v:used) a)
      a :/ _ -> let v = newVar used in Abs v (constructTerm (v:used) a)
      _      -> Seq $ map (\term -> foldr (flip App) term $ map Var used) [Var "x", Var "y"]

-- | Auxiliary function for annotation for C&C lp/rp-rule
flatten :: Category -> [Category]
flatten (a :\ b) = b : flatten a
flatten (a :/ b) = b : flatten a
flatten a        = [a]
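`flatten` unrolls a left-nested complex category into its argument categories followed by the innermost result; for example a transitive-verb category (S\NP)/NP flattens to [NP, NP, S]. A standalone sketch on a reduced category type (the `Cat` type here is illustrative only):

```haskell
-- Reduced stand-in for Category (hypothetical, for illustration).
data Cat = S | NP | Cat :/ Cat | Cat :\ Cat
  deriving (Eq, Show)

-- Arguments left to right, then the final result category.
flatten :: Cat -> [Cat]
flatten (a :\ b) = b : flatten a
flatten (a :/ b) = b : flatten a
flatten a        = [a]

-- flatten ((S :\ NP) :/ NP) == [NP, NP, S]
```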

-- | Special annotation for C&C lp/rp-rule
annotateConj' :: Word -> Word
annotateConj' w@(Word { lemma = l, category = c }) =
    let f  = flatten c
        c1 = f !! 0
        c2 = f !! 1
        cr = f !! 2
    in case length f of
      1 -> error "Annotate Conj: We expect at least t -> t."
      2 -> w { expr = lid } -- Dummy conjunction, just return identity, for instance ", and" in "funny, and happy."
      _ -> w { expr = Abs "x" $ Abs "y" $ constructTerm [] c2 }
  where
    newVar used = head $ dropWhile (`elem` used) $ zVars

    constructTerm :: [String] -> Category -> SExpr
    constructTerm used c = case c of
      a :\ _ -> let v = newVar used in Abs v (constructTerm (v:used) a)
      a :/ _ -> let v = newVar used in Abs v (constructTerm (v:used) a)
      _      -> Seq $ map (\term -> foldr (flip App) term $ reverse $ map Var used) [Var "x", Var "y"]

-- | Special annotation for C&C ltc-rule
annotateLtc :: Lexicon -> Word -> Word
annotateLtc _ w@(Word { token = t, category = c, lemma = l }) =
    w { expr = constructTerm xyzVars [] c }
  where
    constructTerm :: [String] -> [(SExpr, Category)] -> Category -> SExpr
    constructTerm vs ts c = case c of
      r :\ a -> constructAbstraction vs ts a r
      r :/ a -> constructAbstraction vs ts a r
      _      -> Seq $ reverse $ [v | (v, _) <- ts]

    constructAbstraction :: [String] -> [(SExpr, Category)] -> Category -> Category -> SExpr
    constructAbstraction (v:vs) ts a r = -- a = tau_alpha, r = tau_beta
      let t = Var v -- NP
          cond = any (\(_, t) -> (Just a) =? arg t) ts -- types where a might be used as argument
          term = if cond then
                   map (\(t', c') -> if Just a =? arg c'
                                       then (App t' t, fromJust $ res c')
                                       else (t', c')) ts
                 else
                   (t, a) : ts
      in Abs v (constructTerm vs (term) r)

annotateWord :: AnnotationEnv -> Word -> Word
annotateWord env w@(Word { pos = pos })
  | isDet pos    = annotateDet env w
  | isAdj pos    = annotateAdj env w
  | isVerb pos   = annotateVerb env w
  | isAdverb pos = annotateAdverb env w
  | isNoun pos   = annotateNoun env w
  | isP pos      = annotateP env w
  | otherwise    = annotateAny env w

Parser.hs

{-# LANGUAGE ImplicitParams #-}
module Parser (parseLexicon, parseTree, runCc) where

-- Misc.
import Control.Monad
import Data.Char
import Data.Maybe

-- Parsec
import Text.ParserCombinators.Parsec hiding (token)
import Text.ParserCombinators.Parsec.Expr

-- Process handling
import System.Process
import GHC.IO.Handle

-- Data structures
import CCG
import Annotate

pLexicon :: Parser [Word]
pLexicon = pLexiconEntry `endBy` newline

pLexiconEntry :: Parser Word
pLexiconEntry = do (string "w(")
                   many1 digit
                   (string ", ")
                   many1 digit
                   (string ", '")
                   t <- pToken
                   (string "', '")
                   lemma <- pToken
                   (string "', '")
                   pos <- pToken
                   (string "', '")
                   pToken
                   (string "', '")
                   pToken
                   (string "', '")
                   c <- pCategoryExpr
                   (string "').")
                   return $ Word t lemma pos c (Var "?")

pTree :: Lexicon -> Parser PTree
pTree l = do (string "ccg(")
             many1 digit
             (string ", ")
             t <- pSubtree l
             (string ").")
             return t

pSubtree :: Lexicon -> Parser PTree
pSubtree l =
      try (pBRule l "fa" $ \c t1 t2 -> PFwdApp c (reduce $ App (nodeExpr t1) (nodeExpr t2)) t1 t2)
  <|> try (pBRule l "ba" $ \c t1 t2 -> PBwdApp c (reduce $ App (nodeExpr t2) (nodeExpr t1)) t1 t2)
  <|> try (pBRule l "fc" $ \c t1 t2 -> PFwdComp c (reduce $ Abs "x" $ App (nodeExpr t1) (App (nodeExpr t2) (Var "x"))) t1 t2)
  <|> try (pBRule l "bc" $ \c t1 t2 -> PBwdComp c (reduce $ Abs "x" $ App (nodeExpr t2) (App (nodeExpr t1) (Var "x"))) t1 t2)
  <|> try (pBRule l "bx" $ \c t1 t2 -> PBwdXComp c (reduce $ Abs "x" $ App (nodeExpr t2) (App (nodeExpr t1) (Var "x"))) t1 t2)
  <|> try (pURule l "tr" $ \c t -> PFwdTR c (reduce $ Abs "f" (App (Var "f") $ nodeExpr t)) t)
  <|> try (pConj l)
  <|> try (pConj'' l)
  <|> try (pConj''' l)
  <|> try (pLex l)
  <|> try (pWord l)
  <?> "subtree"

pURule :: Lexicon -> String -> (Category -> PTree -> PTree) -> Parser PTree
pURule l f r = do (string f)
                  (string "('")
                  c <- pCategoryExpr
                  (string "',")
                  t <- pSubtree l
                  (string ")")
                  return $ r c t

pBRule :: Lexicon -> String -> (Category -> PTree -> PTree -> PTree) -> Parser PTree
pBRule l f r = do (string f)
                  (string "('")
                  c <- pCategoryExpr
                  (string "',")
                  t1 <- pSubtree l
                  (string ",")
                  t2 <- pSubtree l
                  (string ")")
                  return $ r c t1 t2

pConj :: Lexicon -> Parser PTree
pConj l = do r <- (try (string "conj") <|> (string "lp"))
             (string "('")
             t <- pToken
             (string "','")
             c1 <- pCategoryExpr
             (string "','")
             c2 <- pCategoryExpr
             (string "',")
             t1 <- pSubtree l
             (string ",")
             t2 <- pSubtree l
             (string ")")
             let t1' = case t1 of
                   PWord w   -> Just $ PWord $ annotateConj c1 $ w { category = (c1 :\ c1) :/ c1 }
                   otherwise -> Nothing
             if (isNothing t1') then
               unexpected ("Left child of a conjunction rule '" ++ r ++ "' should be a word.")
             else
               return $ PFwdApp c2 (reduce $ App (nodeExpr (fromJust t1')) (nodeExpr t2)) (fromJust t1') t2

pConj''' :: Lexicon -> Parser PTree
pConj''' l = do r <- (try (string "lp") <|> (string "ltc"))
                (string "('")
                c <- pCategoryExpr
                (string "',")
                t1 <- pSubtree l
                (string ",")
                t2 <- pSubtree l
                (string ")")
                let t1' = case t1 of
                      PWord w -> Just $ PWord $ case r of
                        "lp"  -> annotateConj' $ w { category = c :/ c }
                        "ltc" -> annotateLtc l (w { category = c :/ (nodeCategory t2) })
                      otherwise -> Nothing
                if (isNothing t1') then
                  unexpected $ "Left child of a conjunction rule '" ++ r ++ "' should be a word."
                else
                  return $ PFwdApp c (reduce $ App (nodeExpr (fromJust t1')) (nodeExpr t2)) (fromJust t1') t2

pConj'' :: Lexicon -> Parser PTree
pConj'' l = do r <- (string "rp")
               (string "('")
               c1 <- pCategoryExpr
               (string "',")
               token <- optionMaybe (do { char '\''; t <- pToken; char '\''; char ','; return t })
               c2 <- optionMaybe (do { char '\''; c <- pCategoryExpr; char '\''; char ','; return c })
               t1 <- pSubtree l
               (string ",")
               t2 <- pSubtree l
               (string ")")
               let t2' = case t2 of
                     PWord w   -> Just $ PWord $ annotateConj' $ w { category = c1 :\ c1 }
                     otherwise -> Nothing
               if (isNothing t2') then
                 unexpected $ "Right child of a conjunction rule '" ++ r ++ "' should be a word."
               else
                 return $ PBwdApp c1 (reduce $ App (nodeExpr (fromJust t2')) (nodeExpr t1)) t1 (fromJust t2')

pLex :: Lexicon -> Parser PTree
pLex l = do (string "lex('")
            c1 <- pCategoryExpr
            (string "','")
            c2 <- pCategoryExpr
            (string "',")
            t <- pSubtree l
            (string ")")
            return $ PLexRaise c2 (nodeExpr t) t

pWord :: Lexicon -> Parser PTree
pWord l = do (string "lf(")
             (many1 digit)
             (string ",")
             wordIndex <- (many1 digit)
             (string ",'")
             c <- pCategoryExpr
             (string "')")
             return $ PWord (l !! ((read wordIndex :: Int) - 1))

pToken :: Parser String
pToken = many1 $ upper <|> lower <|> digit <|> oneOf "_-$,.!?" <|> escaped <|>
                 (char ' ' >> return '\'')

escaped = char '\\' >> choice (zipWith escapedChar codes replacements)
escapedChar code replacement = char code >> return replacement
codes        = ['\'', '"']
replacements = ['\'', '"']

pParens :: Parser a -> Parser a
pParens = between (char '(') (char ')')

pBrackets :: Parser a -> Parser a
pBrackets = between (char '[') (char ']')

pCategoryExpr :: Parser Category
pCategoryExpr = buildExpressionParser pCategoryOpTable pCategory

pCategoryOpTable :: OperatorTable Char st Category
pCategoryOpTable = [[op "/" (:/) AssocLeft, op "\\" (:\) AssocLeft]]
  where
    op s f a = Infix (string s >> return f) a

pCategory :: Parser Category
pCategory = pParens pCategoryExpr
        <|> (pCategory' "S" S)
        <|> try (pCategory' "NP" NP)
        <|> (pCategory' "N" N)
        <|> (pCategory' "PP" PP)
        <|> (pCategory' "conj" CONJ)
        <|> (pCategory' "." Punctuation)
        <|> (pCategory' "," Comma)
        <?> "category"

pCategory' :: String -> (Agreement -> Category) -> Parser Category
pCategory' s c = do string s
                    a <- pAgreement
                    return $ c a

pAgreement :: Parser Agreement
pAgreement = option [] (pBrackets $ pFeature `sepBy` (char ','))

pFeature :: Parser Feature

pFeature = try (string "dcl" >> return FDcl)
       <|> try (string "adj" >> return FAdj)
       <|> try (string "pt"  >> return FPt)
       <|> try (string "nb"  >> return FNb)
       <|> try (string "ng"  >> return FNg)
       <|> try (string "em"  >> return FEm)
       <|> try (string "inv" >> return FInv)
       <|> try (string "pss" >> return FPss)
       <|> try (string "b"   >> return FB)
       <|> try (string "to"  >> return FTo)
       <|> try (string "thr" >> return FThr)
       <|> try (string "frg" >> return FFrg)
       <|> try (string "wq"  >> return FWq)
       <|> try (string "qem" >> return FQem)
       <|> try (string "q"   >> return FQ)
       <|> try (string "for" >> return FFor)
       <|> (do { v <- many1 upper; return $ FVar v })
       <|> (do { v <- many1 lower; return $ FUnknown v })
       <?> "feature"

parseLexicon :: String -> Lexicon
parseLexicon str =
  case parse pLexicon "Parse error:" str of
    Left e  -> error $ show e
    Right r -> r

parseTree :: Lexicon -> String -> (Maybe PTree)
parseTree l str =
  case parse (pTree l) "Parse error:" str of
    Left e  -> Nothing -- error $ show e
    Right r -> Just r

getSection :: Handle -> String -> IO (Maybe String)
getSection h s =
  do -- hWaitForInput h (-1)
     eof <- hIsEOF h
     if eof then
       return Nothing
     else do
       inpStr <- hGetLine h
       if inpStr == "" then
         return $ Just s
       else do
         -- putStr ("Got line: " ++ inpStr ++ "\n")
         getSection h (s ++ inpStr ++ "\n")

runCc :: (Word -> Word) -> String -> IO (Maybe PTree)

runCc a s = do
  (Just inHandle, Just outHandle, _, processHandle) <- createProcess
      (proc "bin/soap_client_fix" ["--url", "http://localhost:9000"])
      { std_in  = CreatePipe,
        std_out = CreatePipe }
  hSetBuffering inHandle LineBuffering
  hSetBuffering outHandle LineBuffering
  putStr "\n"
  putStr "Parsing:\n"
  putStr s
  putStr "\n"
  hPutStr inHandle s
  hPutStr inHandle "\n"
  getSection outHandle "" -- Discard comments
  getSection outHandle "" -- Discard functor declarations
  tree    <- getSection outHandle "" -- Tree
  lexicon <- getSection outHandle "" -- Lexicon
  if (isJust tree) && (isJust lexicon) then
    do let l     = map a $ parseLexicon $ fromJust lexicon
       let tree' = filter (not . isSpace) $ fromJust tree
       let t     = parseTree l tree'
       return t
  else
    return Nothing

Appendix C

Labeled test data

The following table consists of a random sample chosen from the “Swissotel Hotel”
topic of the Opinosis data set [Ganesan et al., 2010], containing any morphological form of the subject of interest: hotel rooms. Each sentence in the data set (which may not constitute a complete review) has been labeled independently by two human individuals with respect to this subject. Furthermore, the table contains results for the baseline (sentence-level polarity value) and for the presented method (entity-level polarity value of the subject of interest).

Label 1 and Label 2 are the two independent human annotations; Method is the entity-level polarity value of the presented method (Unknown where no result was produced, N/A where the labels give no reference).

 #  Sentence                                                          Label 1   Label 2   Method

 1  The rooms are in pretty shabby condition , but they are clean .   Negative  Negative  Unknown

 2  The rooms are spacious and have nice views, I was NOT impressed
    with the mattress and every, little, tiny thing costs money .     Unknown   Unknown   N/A

 3  The rooms look like they were just remodled and upgraded, there
    was an HD TV and a nice iHome docking station to put my iPod so
    I could set the alarm to wake up with my music instead of the
    radio .                                                           Positive  Positive  Unknown

 4  The rooms were cleaned spic and span every day .                  Positive  Positive  Unknown

 5  When I got to the room , I thought the new rooms would have a
    plasma since the website implies the new rooms would have them ,
    but I guess those come later .                                    Negative  Negative  Unknown

 6  Very impressed with rooms and view !                              Positive  Positive  Unknown

 7  The rooms are not all that big .                                  Negative  Negative  Unknown

 8  Expensive Parking but great rooms .                               Positive  Positive  30.0

 9  Rooms were nicely furnished .                                     Positive  Positive  Unknown

10  The rooms are very clean , comfortable and spacious and
    up-to-date .                                                      Positive  Positive  52.0

11  I’ve olny ever stayed in the “standard” rooms in this property ,
    all of which are spacious and airy , and function well for both
    business or leisure travellers .                                  Positive  Positive  Unknown

12  It does suffer , however , from a trend that I have been
    noticing that as rooms at business class hotels are upgraded ,
    particularly with a patch panel for the big LCD , TV , drawer
    space becomes less and less .                                     Negative  Negative  Unknown

13  We even got upgraded to one of the corner rooms which also
    looked west toward Michigan Ave and the Wrigley building .        Positive  Positive  Unknown

14  The rooms were very clean , the service was polite and helpful ,
    and it’s near the heart of Chicago !                              Positive  Positive  52.0

15  You can see downtown and or the Navy Pier from most of the
    rooms .                                                           Positive  Positive  Unknown