
4.2 ICA classification

4.2.4 Summary

ICA seems to identify the grouping structure in the feature data better than the LSI model does. We assume this is partly because ICA, unlike LSI, is not restricted to an orthogonal basis.

IC component   Keywords

IC1            afb air wing overcast aluminum space boeing photographer lockett airshow airplane stratofortresses
IC2            view building dome park garden place fall
IC3            weight height position lbs born college
IC4            draft weight position lbs born height selected round nhl

Table 4.11 Keywords from the 4 component ICA classification using all three feature modalities.

Exploiting this property, we can use ICA for unsupervised classification. For text, the number of components seems to project a hierarchical structure that corresponds to human labeling, and thus to a human context taxonomy. Evidence of this was present, but not entirely clear, when using image features. The image features used are low level (color and texture) and thus describe context in more general terms than text does. As such, the description level of the data differed for each of the media: color 2, texture 3 and text 5. We therefore expect this kind of ordering in most multimedia data sets. Another reason that the grouping structure is less clear in the image features is that classifying the data into more components than are naturally present does not comply well with the independent, "ray"-like classification that ICA exploits.
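The classification step and the reading of keyword lists such as those in Table 4.11 can be illustrated with a minimal sketch. It assumes that each document is assigned to the component with the largest absolute activation, and that the keywords of a component are its highest-weighted terms. The function names, the data layout and all numerical values are hypothetical toy choices, not the implementation used for the results above.

```python
def classify(sources):
    """Assign each document to the component with the largest absolute activation.

    sources[k][n] is the activation of component k for document n
    (assumed layout: one row per independent component).
    """
    n_docs = len(sources[0])
    return [
        max(range(len(sources)), key=lambda k: abs(sources[k][n]))
        for n in range(n_docs)
    ]


def top_keywords(term_weights, vocabulary, count=3):
    """Return the `count` terms with the largest weight in one component."""
    ranked = sorted(zip(term_weights, vocabulary), reverse=True)
    return [term for _, term in ranked[:count]]


# Toy activations for 2 components and 3 documents (made-up numbers).
toy_sources = [[0.9, 0.1, -0.2],
               [0.2, -1.3, 0.1]]
labels = classify(toy_sources)

# Toy term weights for one component (made-up numbers).
keywords = top_keywords([0.1, 0.8, 0.05, 0.6],
                        ["view", "weight", "dome", "height"], count=2)
```

Using the absolute value in `classify` means that a strongly negative activation also counts as membership, which is one way of handling the sign ambiguity of ICA components.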

When all three modalities (text, color and texture) were combined, the overall grouping structure in the classification was strengthened. This presents evidence that all modalities add valuable information.

Regarding the ICA algorithm, we used models with a symmetric source probability function, which in principle is not the natural choice given that the feature data are strictly positive. From experience we do, however, find that "flipping" the components by changing the component sign in general works fine, as opposed to the results from separation of raw images. In the online chat room application presented in chapter 5, we did, however, experience anti-correlated components from time to time. This presents an interesting social point regarding chat room behaviour: when a given semantic (vocabulary) is used, another is definitely not.
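The sign "flipping" can be sketched as follows. The rule used here, flipping a component when its largest-magnitude activation is negative, is an assumption made for illustration; the text does not specify the criterion. The function name and the list-of-lists layout are likewise hypothetical.

```python
def flip_to_positive(sources, mixing):
    """Flip component signs so each component's dominant activation is positive.

    sources[k] is the activation row of component k; mixing[i][k] is the
    weight of component k in feature i. Flipping row k of the sources and
    column k of the mixing matrix together leaves their product unchanged.
    """
    sources = [row[:] for row in sources]   # work on copies
    mixing = [row[:] for row in mixing]
    for k, row in enumerate(sources):
        peak = max(row, key=abs)            # largest-magnitude activation
        if peak < 0:
            sources[k] = [-v for v in row]
            for mix_row in mixing:
                mix_row[k] = -mix_row[k]
    return sources, mixing


# Toy example with made-up numbers: the first component is flipped.
flipped_sources, flipped_mixing = flip_to_positive(
    [[-2.0, 0.5], [0.1, 3.0]],
    [[1.0, 2.0], [3.0, 4.0]],
)
```

Because the mixing column is flipped together with the source row, the reconstructed data are identical before and after the flip; only the interpretation of the component changes.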

CHAPTER 5

Applications of ICA in virtual environments

5.1 ICA in chat rooms

Internet chat rooms are becoming more and more popular in various settings. In principle they define their own contexts, often with a mixture of topics at the same time. This is especially true for the cafe-like chat rooms, where little or no interference is present from a supervisor or moderator. Figure 5.1 shows a small sample of such a chat room. In spite of this anarchy, valuable information can be obtained from monitoring these activities, e.g. information about people's general thoughts on the daily news and trends. Another purpose is to present chat users with the recently discussed topics in a chat room before they enter, or to give notice when a topic is being discussed so that they can participate.

Related research areas are found in topic detection and tracking [1], where news streams are generally analysed for the purpose of collecting overall reports. In the chat room text streams we do, however, have topics mixed together without clear beginnings and endings, and so separating them with ICA seems the obvious choice. Related work can be found in [9], which extends this framework in projection pursuit.

<Zeno> shooby hey but you were in school then. :)

<Sharonelle> Zeno - oh, I don't recall exactly - just statements like that - over the past few weeks.

<Miez> heyy seagate

<Recycle> denise: he deserved it for stealing os code in his early days

<Zeno> ok Sharonelle

<denise> LOL @ Recycle

<HaleyCNN> Join Book chat at 10am ET in #auditorium. Chat with Robert Ballard author of "Eternal Darkness: A Personal History of Deep-Sea Exploration," after his appearance on CNN Morning News at 9:30am ET.

<heartattackagain> Smith Jones....lol....We might have an operating system that doesn't crash every thirty minits....lololol...

<EdShore> Shooby, I don't believe you. I've been doing this sine PET, TRS-80, and PIRATES! Don't tell me you've been CHATTING! PROVE IT!

<Zeno> Recycle LOL ethical and criminal laws are different for the business world

<_Seagate_> Recycle, thats what the technology business is all about.

<tribe> I heard a local radio talk show host saying last night that he has noticed everytime this Elian issue slows down, something happens to either the family in Miami or in Cuba to put it right back in the headlines. He mentioned the cousin's hospitalization as just the latest saga

<Diogenes> If Bill Gates was in Silicon Valley never a word would you have ever heard.

<Zeno> SJ you may have been doing sine but i have been doing cosine.

<shooby> Smith Jones: Compuserve since, heck, 76?

<Zeno> i mean Smith Jones

<Recycle> rumor has it that he was even dumpster diving at school for code

Figure 5.1 The chat consists of a mixture of contributors discussing multiple concurrent topics. The figure shows a small sample of a CNN.com chat room, April 5, 2000.

In the following we use the ICA text classification previously presented. The Molgedey and Schuster ICA algorithm is especially attractive given the dynamic nature of chat data and its low model complexity for online purposes. First we look at a retrospective analysis of a whole day to illustrate the principles, and then we present the online WebChat Internet page.
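To make the low model complexity concrete, the following is a minimal sketch of a single-lag Molgedey and Schuster style decorrelation, assuming the common formulation in which the mixing matrix is read off from the eigenvectors of a quotient of a time-delayed and an instantaneous covariance matrix. The function name, the choice `tau=1` and the toy sinusoid signals are assumptions made for illustration; the preprocessing and delay selection used in the chat room application are not reproduced here.

```python
import numpy as np


def molgedey_schuster(X, tau=1):
    """Single-lag decorrelation ICA sketch.

    X is a (channels x time) array of mixed signals. Returns the estimated
    sources S and mixing matrix A, defined up to scale, sign and ordering.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=1, keepdims=True)   # remove the mean of each channel
    n = X.shape[1] - tau
    X0, Xt = X[:, :n], X[:, tau:]           # signal and its delayed copy
    C0 = X0 @ X0.T / n                      # instantaneous covariance
    Ct = X0 @ Xt.T / n                      # time-delayed covariance
    Ct = 0.5 * (Ct + Ct.T)                  # symmetrise the delayed covariance
    Q = Ct @ np.linalg.inv(C0)              # quotient matrix
    _, A = np.linalg.eig(Q)                 # eigenvectors estimate the mixing columns
    A = np.real(A)
    S = np.linalg.solve(A, X)               # unmix: S = A^{-1} X
    return S, A


# Toy usage: two sinusoids with different autocorrelation at lag 1.
t = np.arange(2000)
toy_sources = np.vstack([np.sin(0.05 * t), np.sin(0.9 * t)])
toy_mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
S_est, A_est = molgedey_schuster(toy_mixing @ toy_sources, tau=1)
```

The whole estimate requires only two covariance matrices and one eigenvalue decomposition, so it can be recomputed cheaply on a sliding window of recent data. The separation relies on the sources having distinct autocorrelations at the chosen lag.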