
Chapter 2: Organising the World’s Information

9.0 The rise of commercial enterprise: Web directories, meta search, portals

everything. It searched and ‘indexed’ FTP sites and directories, which is why it could be

considered the first search engine. Archie did not search entire documents or discover servers that were linked together; instead it focussed on the titles of the files.

Nonetheless, it represented a first effort to rein in a quickly growing, chaotic information resource, not by imposing order on it from above, but by mapping and indexing the disorder to make it more usable (ibid:21-22).

Many have compared this description to the way the Web has grown, almost in a haphazard manner, collating things randomly from disparate sources or ‘like a library that consists of a pile of books that grows as anyone throws anything they wish onto the pile’ (ibid:22-23).

In 1992 a browser called Lynx appeared that used hypertext links in documents. Erwise, a graphical browser built on 'libwww', the library of the World Wide Web, was developed by a group of master's students in Helsinki in 1992 but was never funded to advance further. One of the in-between steps in the transition from browsing files to the early beginnings of the internet was the Gopher system, still seen by some as an alternative to the World Wide Web, which

‘facilitated working through directory structures, and insulated the individual from a command-line interface’ (Halavais 2009: 22-23). Yet Gopher lacked what was called ‘hypertext’, so in 1992 Veronica was placed on Gopher servers as a crawler to search ‘menu-structured directories’ (ibid).43 Created in 1992, ViolaWWW stands for ‘Visually Interactive Object-Oriented Language and Application’ and was the first browser to add extended functionality such as embedded scriptable objects, stylesheets and tables. Eventually ViolaWWW lost out to another graphical browser, Mosaic, released on April 22, 1993. Mosaic was a kind of central nervous system, which provided users with full-colour, graphic webpages and, more importantly, a visual understanding of networked webpages that were both fun and intuitive to surf (Calore 2010). This browser enabled not only geeks but users from around the world to access the web, and it was subsequently ported to Microsoft Windows, making it popular. Traffic on the World Wide Web increased from around 500 known servers in 1993 to around 10,000 in 1994, with Mosaic the predominant means of navigating the web.


Considered to be the web’s first ‘robot’, the World Wide Web Wanderer was launched in June 1993, initially designed to measure the size of the web, and existed until late 1995. Written in Perl, the Wanderer wandered through the web of hyperlinks, indexing titles and creating an index called Wandex. With crawling, these ‘indexers’ discovered new documents and chose what to index, whilst building archives and deciding how items should be structured and organised through parsing. In 1994 two students at Stanford, Jerry Yang and David Filo, released Yahoo!, which offered a very familiar-looking directory compiled by experts, resembling a library catalogue. While this traditional format made the relatively unknown space of the web seem less alien to many, it quickly ran into deep problems, both of scale, with the impossibility of keeping up with the growth of the web, and of ontology, as its categorical system could not contain the complexity and dynamism of the information space it claimed to organise. In 1994, PhD student Brian Pinkerton’s WebCrawler was the first search engine to offer full-text search and originally had its own database, with ads in separate areas on the page.45 ‘Receiving its millionth query near the end of 1994, it clearly had found an audience on the early web, and, by the end of 1994, more than a half-dozen search engines were indexing the web’ (ibid:22-23).
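
The crawl-and-index cycle described above, following hyperlinks, recording titles, and building a searchable index, can be sketched in miniature. This is a hypothetical illustration under invented data (the link graph, page titles and function names are all made up), not the Wanderer’s actual Perl code:

```python
from collections import deque

# An invented miniature 'web': each page has a title and outgoing links.
WEB = {
    "a.edu": ("Physics Department", ["b.edu", "c.edu"]),
    "b.edu": ("Physics Preprints", ["a.edu"]),
    "c.edu": ("Campus Library", ["b.edu", "d.edu"]),
    "d.edu": ("Library Catalogue", []),
}

def crawl(start):
    """Wander the link graph breadth-first, indexing titles into a 'Wandex'."""
    index, seen, queue = {}, {start}, deque([start])
    while queue:
        url = queue.popleft()
        title, links = WEB[url]
        # Index each word of the title against the page it came from.
        for word in title.lower().split():
            index.setdefault(word, set()).add(url)
        # Follow hyperlinks to discover pages not yet seen.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

def search(index, term):
    """Return the pages whose titles contain the query term."""
    return sorted(index.get(term.lower(), set()))

index = crawl("a.edu")
print(search(index, "library"))  # pages whose titles mention 'library'
```

Indexing only titles, as Archie and the Wanderer did, keeps the index small but misses everything in the body of a document, which is precisely the gap that WebCrawler’s full-text indexing later closed.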

In 1995, AltaVista appeared as a full-text search engine built on automated information gathering and indexing; because it was faster and more comprehensive, it quickly overtook the human-compiled directories. It also established the now-standard interface paradigm of a relatively empty page with a simple search box, in which users could enter a query and receive a ranked list of search results.

During the 1990s search engines developed faster processing power and more storage space.

The key factor was rapid growth––as with the economy––which facilitated the expansion of the index, or database. These infrastructures had to be designed and implemented, and companies competed over who had the largest database––the size of the engine or directory index––and who could answer search queries most quickly, or the retrieval speed (Van Couvering 2010:97). The early search engine era consisted of two alternative models of service provision. The first was the web directory: Yahoo! and Magellan pioneered editorial ratings, and LookSmart provided groups of sites that were categorised, and in some cases rated, by an editorial team (ibid). ‘The Open Directory Project, by releasing its volunteer-edited, collaborative categorization, provided another way of mapping the space’ (Halavais 2009:24-25). These implementations offered users the ability to browse the directories and to search ‘effectively’, as there were readily available indexes that had been selected, filtered or ‘curated’, sometimes by hand (ibid).

The second model was much more complex technically, and involved automated technology to browse websites, store them in an electronic index, and automatically retrieve them based on user queries. These were more properly called engines (Van Couvering 2010:97). (Figure 18)

In 1996 there were already ‘metasearch engines’––aggregators that culled results from various sources. Taking a user’s query, they sent out their own queries to multiple search engines, algorithmically structured, or ‘ranked’, the combined results, and displayed them back to the user. Once again, the university was the site of innovation, with Daniel Dreilinger’s SavvySearch at Colorado State University aggregating results from around 20 different search engines.
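
The metasearch principle––fan a query out to several engines, then merge and re-rank the returned lists––can be sketched as follows. The engines and their result lists here are invented for illustration, and the merging rule (summed reciprocal rank) is one simple possibility, not the method any particular aggregator used:

```python
# Invented stand-ins for external engines: each maps a query to a ranked list.
ENGINES = {
    "engine_a": {"jaguar": ["cars.com", "zoo.org", "wiki.org"]},
    "engine_b": {"jaguar": ["zoo.org", "cars.com"]},
    "engine_c": {"jaguar": ["wiki.org", "zoo.org"]},
}

def metasearch(query):
    """Fan the query out to every engine, then merge the ranked lists.

    Each result is scored by summed reciprocal rank, so a page ranked
    highly by several engines floats to the top of the merged list.
    """
    scores = {}
    for results in ENGINES.values():
        for rank, url in enumerate(results.get(query, []), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    return sorted(scores, key=scores.get, reverse=True)

print(metasearch("jaguar"))  # merged ranking across all three engines
```

The aggregator holds no index of its own; its value lies entirely in the breadth of the engines it queries and in how sensibly it reconciles their disagreements.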

There was also Metaseek, a context-based search engine for images, and HotBot, which combined results from Inktomi and the Direct Hit database and allowed the user to search within the search results (Beigi et al 1998). Yet another ‘metasearch engine’, Northern Light, ‘clustered’ results from both private and public information resources and custom search engines, which was considered an innovation at the time. In 1998 AltaVista began incorporating ‘editorially rated sites’, harking back to the earlier explorative days of Magellan, only with a new interface. As with the pre-history of the search engine described in Chapter 1, Ask Jeeves was a search engine whose name referenced a personalised butler. ‘Ask Jeeves attempted to make the query process more user-friendly and intuitive, encouraging people to ask fully formed questions rather than use Boolean search queries’ (Halavais 2009:23-24).46 It is now Ask.com.

45 Nowadays it is a metasearch engine with sponsored and non-sponsored ads. https://www.webcrawler.com/

Figure 18: Elizabeth van Couvering’s Table 3, a comprehensive timeline overview of the above-mentioned search engines (2010:96).

It is important to note that most of these early search engines came out of research in academic or non-commercial settings. ‘Search engine technology primarily developed from the academic discipline of information retrieval, which itself is something of a hybrid between library science (now often called information science) and computer science’ (Van Couvering 2010:95). The late 1990s was, however, a time of great diversity, with dozens of search engines competing for market share, funded by venture capital and using advertising as a business model.

This first period of search engine history, then, is characterised by technological innovation within research centres followed by commercialisation using advertising and licensing as business models and capitalisation through venture capital and the stock market. The market was competitive, consisting of multiple companies with different technologies (ibid:100).

The ‘middle period’ begins with the boom of the dot-com era in late 1997 and continues until 2001. It is ‘characterised by the change in focus from search engines to “portals” and the involvement of traditional media and telecommunications giants in the sector’ (ibid). From 1997 to 1998 the navigational aspect was still present along with a directory service, yet the terminology of ‘portal’ came into more frequent use. One example of the portal conglomeration and business models being incorporated into the act of searching was iWon, which made every search query an entry to a lottery, rewarding its audience and paying out to its users. Aggregators such as Dogpile and MetaSearch queried other search engines for their results, and Direct Hit ranked results by popularity.

Eventually many of these directory-based portals became major players, particularly Yahoo!, which experimented with a number of search engine partnerships, beginning by combining Inktomi’s search technology with its existing directory in 1998, and subsequently acquired some of the largest general-purpose search engines, including AlltheWeb.com, AltaVista, and HotBot (Halavais 2009:24-25).

During the late 1990s a large amount of venture capital was invested as the search engine market was divided up into shares. Various ways of producing revenue were tested in these models; acquisitions occurred, and the field became somewhat smaller (ibid:27).

Van Couvering describes this period in the history of search engines as two-fold, in that there was competition yet also mergers. One aspect was the concept of the ‘walled garden’ (2010:100), where

46 Boolean search allows users to combine their search queries with terms such as AND, NOT, OR and NEAR to refine their queries and obtain more relevant results. Known as Boolean operators, they can limit, define or widen search queries, though nowadays most search engines apply these Boolean parameters by default. Operators have corresponding symbols, where AND is equal to “+” and NOT is equal to “-”, while OR is the default, meaning whatever you type in generates returns. NEAR is equal to putting your search query in quotes, in a specific order.
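
The set logic behind these operators can be sketched with a toy inverted index (the documents and their contents are invented; this is an illustration of the principle, not any engine’s actual parser): AND intersects the posting sets, OR unions them, and NOT subtracts. NEAR, a proximity operator, would additionally require word positions and is omitted here.

```python
# Invented toy corpus: document id -> set of words it contains.
DOCS = {
    1: {"apple", "pie", "recipe"},
    2: {"apple", "computer"},
    3: {"pie", "chart"},
}

def posting(term):
    """The set of documents containing the term."""
    return {doc for doc, words in DOCS.items() if term in words}

# AND: both terms must appear (set intersection).
print(posting("apple") & posting("pie"))       # documents with both terms
# OR: either term may appear (set union) -- the widening operator.
print(posting("apple") | posting("pie"))       # documents with either term
# NOT: exclude documents containing the term (set difference).
print(posting("apple") - posting("computer"))  # 'apple' but not 'computer'
```

Expressing queries as set operations over postings is the core of classical information retrieval; ranking, which the next chapter takes up with PageRank, addresses the separate question of ordering whatever these operations return.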

companies thought they could control one part of the internet, even though they later offered ‘limited, curated sets of web- or Internet-based resources while preventing access to the rest’ (Plantin et al. 2017:301-302). The other was the integration of various forms of content and mergers that reflected attempts by media conglomerates to co-opt the arena while neglecting search (Van Couvering 2010:100).47 (Figure 19) This is made apparent by the merger of the search engine with other services––content from advertisers that covered a range of topics, e.g. shopping, travel, email, music and finance (ibid:102). Venture capital still played a role, and ‘despite the diminution of the actual search engine from the core of the business to loss-leading commodity, there continued to be new technical innovations in search’ (ibid). As acquisitions and mergers proliferated in the late 1990s the field diminished, yet there was still investment from venture capital before the dot-com bubble crash of 2000-2001.

Figure 19: Elizabeth van Couvering’s Figure 6 illustrating search engine mergers and acquisitions (2010:96).

47 Yahoo! is perhaps a partial exception because it still survives today.

By the end of the 1990s the model was for advertising to finance search and, as more people came online, the content on the web was changing. The usage of search engines rivalled that of email, with people querying ‘current events, health concerns, products, government services, natural disasters, their new neighbours, prospective employees or dates, and a myriad of other topics ranging from the mundane to the utmost serious’ (Hargittai 2007:769). For a search engine, the question was how to discern those users looking for porn, which was also a dominant activity, from others searching for information or just ‘surfing’ the web.48 In 1998 an early theorist of the web, Phil Agre, contextualised the problem of search at this moment through a comparison between two media: the web and the telephone.

Assuming that every page on the web had eight hyperlinks leaving it, and that the targets of these links were picked at random from all the possible sites on the web, the structure would be entirely unnavigable. Unlike the telephone, the web is very malleable, able to take on the characteristics not just of a point-to-point network or of a broadcast network, but a multitude of shapes between the two (Agre 1998; cited by Halavais 2009:59).

It is not only the web’s malleability but also its navigability, which is facilitated by hyperlinking. This interconnection between texts would come to embody a type of politics, where authority determines the relevancy of information through the merger of search and research, as I show in the next chapter with Google’s PageRank algorithm.

48 However, more people were searching for pornography and reciprocally pornography sites were also in search of viewers and wanted to be found by their audience. ‘Often credited as the first money-making business online, adult websites became a big part of [the] online economy’ (Gilmore 2016).