
Query operators

In document Zeeker: A topic-based search engine (pages 102-105)

Many well-known search engines, such as Google and Yahoo!, make advanced search options available to their users. These options are used to submit more specialized queries. By default, most search engines perform a broad search using the given query terms. It is then left to the users to apply the advanced search options, forcing the engine to be more selective and strict in its evaluation of relevant documents and thereby narrowing the search.

10.2.1 Search operators

The most common kind of search option is the special search operator. Common search operators include operators for exact matches of the query terms, Boolean AND and Boolean OR, among others. Exact operators force the search engine to match the query terms in their exact form and in the same order as they are submitted. A Boolean AND search tells the engine that all the query terms should be matched, but not necessarily in the submitted order. Using a Boolean OR means that any document containing any of the submitted terms is a match for the query.
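The three operator semantics described above can be illustrated with a minimal sketch. It assumes documents and queries are plain whitespace-separated token lists; the function names (`tokenize`, `match_exact`, `match_and`, `match_or`) are hypothetical and do not reflect how any particular engine implements its operators.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def match_exact(terms, doc_tokens):
    """True if the terms occur contiguously and in the submitted order."""
    n = len(terms)
    return n > 0 and any(
        doc_tokens[i:i + n] == terms
        for i in range(len(doc_tokens) - n + 1)
    )


def match_and(terms, doc_tokens):
    """True if every term occurs somewhere in the document, in any order."""
    present = set(doc_tokens)
    return all(term in present for term in terms)


def match_or(terms, doc_tokens):
    """True if at least one term occurs in the document."""
    present = set(doc_tokens)
    return any(term in present for term in terms)


doc = tokenize("The jaguar is a big cat")
print(match_exact(tokenize("big jaguar"), doc))  # terms are not adjacent in this order
print(match_and(tokenize("big jaguar"), doc))    # both terms are present
print(match_or(tokenize("car cat"), doc))        # one of the terms is present
```

The exact match is the strictest of the three and the OR match the loosest, which is why the choice of operator directly controls how narrow the result set becomes.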

Many other operators can also be found in modern search engines. Some make it possible to include or exclude words from the resulting documents, others limit a search to specific sites, and there are many more. Suffice it to say that users can make their queries quite precise if the right operators are used.

Search operators are not a necessity in a modern search engine, but given the enormous amount of data on the Internet, submitting a general query to a search engine is often not enough. Hence, some operators are made available in the Zeeker Search Engine so that users have some control over their queries.

Only the most common operators are implemented, i.e. Boolean AND, Boolean OR and the exact-match operator. The default query in the search engine is quite strict: it demands that all terms appear in the document, in the submitted order, with at most one term between them.
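One possible reading of the strict default query is sketched below: every query term must appear, in the submitted order, and each term may be separated from the previous one by at most one other token. The function name `match_default` and the `max_gap` parameter are assumptions made for illustration; the actual Zeeker implementation is not shown in this section.

```python
def match_default(terms, doc_tokens, max_gap=1):
    """Ordered proximity match: consecutive query terms may have
    at most `max_gap` other tokens between them."""

    def follows(term_idx, pos):
        # All terms have been placed: the query matches.
        if term_idx == len(terms):
            return True
        # The next term may sit 1 .. max_gap + 1 positions after `pos`.
        last = min(pos + max_gap + 2, len(doc_tokens))
        for nxt in range(pos + 1, last):
            if doc_tokens[nxt] == terms[term_idx] and follows(term_idx + 1, nxt):
                return True
        return False

    if not terms:
        return False
    # Try every occurrence of the first term as a starting point.
    return any(
        token == terms[0] and follows(1, i)
        for i, token in enumerate(doc_tokens)
    )


doc = "topic based search engine".split()
print(match_default(["topic", "search"], doc))  # one token in between
print(match_default(["topic", "engine"], doc))  # two tokens in between
```

Under this reading the default sits between the exact match (no gap allowed) and Boolean AND (any order, any distance).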

Query operators would therefore help users find more relevant documents if the strict default search returns no, few or too many results. The syntax for the operators will not be described here but can be found in the User Guide chapter in appendix A.

10.2.2 Category filtering

Thus far, much has been said about how documents are clustered in the index. With that in place, a description of how these clusters are presented to the search engine's users is in order.

The primary goal of the clustering is to provide a filtering mechanism on the retrieved documents. When the engine retrieves documents from the index, it calculates which clusters these documents belong to. With this information available, users are able to see which categories the results belong to. Furthermore, users can select which clusters they want to use as filters.

See appendix A for a screen-shot of a query example.
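The filtering idea can be sketched as follows. The sketch assumes each retrieved result is a dictionary with a `clusters` list; this data layout and the function names `categories_of` and `filter_by_category` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def categories_of(results):
    """Count how many retrieved results fall into each cluster,
    e.g. to show the categories alongside the result list."""
    counts = Counter()
    for result in results:
        counts.update(result["clusters"])
    return counts


def filter_by_category(results, selected):
    """Keep only results that belong to at least one selected cluster."""
    selected = set(selected)
    return [r for r in results if selected & set(r["clusters"])]


results = [
    {"url": "a", "clusters": ["rock", "pop"]},
    {"url": "b", "clusters": ["jazz"]},
    {"url": "c", "clusters": ["rock"]},
]
print(categories_of(results))
print([r["url"] for r in filter_by_category(results, ["rock"])])
```

Computing the categories from the retrieved results only, rather than from the whole index, keeps the list shown to the user short.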

This kind of filtering could be implemented as search operators as described above, or as a list from which the users can select appropriate categories. However, presenting the users with the full list of categories could be a problem, since there might be several thousand categories¹, making it tiresome to find the appropriate filters to use. Therefore, the category filtering is implemented as a search operator. Users are also presented with links to the categories so that they can resubmit their queries with the category filtering enabled without knowing the exact syntax for the operators. The syntax for the category operator can be found in the User Guide in appendix A.

¹ There are over 3,000 categories for the musical group set.


10.2.3 Query expansion

In chapter 1 the concept of query expansion was introduced. In short, query expansion means adding query terms to the submitted query and/or reweighting the original query terms before the query is evaluated. The idea is that adding more terms to the queries will make the search engine more likely to find relevant results. Query expansion is especially effective when it comes to ambiguous query terms such as jaguar.

Despite the simple idea behind query expansion, it is in no way a trivial task. There are three main strategies that can be used to expand queries: automatic, manual and user-assisted.

Automatic expansion relies on the search engine itself to find which terms should be added to the query. Here clustering could be helpful: for each query term, the engine could calculate which clusters the term belongs to, get the most relevant terms for these clusters and then expand the query using these terms. However, this strategy will not always be feasible. Considering the term jaguar as an example, the engine might find that it belongs to two different clusters, one regarding the car and one regarding the animal. Popular terms would then be grabbed from both clusters and used to expand the query. The query would then contain terms used for both the animal and the car, making it more difficult to match with a document in the index.
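The jaguar problem can be made concrete with a small sketch of cluster-based automatic expansion. The lookup tables `term_clusters` and `cluster_top_terms`, their contents, and the function name `expand_automatically` are all invented for illustration; they only mirror the procedure described above.

```python
def expand_automatically(query_terms, term_clusters, cluster_top_terms,
                         per_cluster=2):
    """Append the top terms of every cluster each query term belongs to."""
    expanded = list(query_terms)
    for term in query_terms:
        for cluster in term_clusters.get(term, []):
            for extra in cluster_top_terms.get(cluster, [])[:per_cluster]:
                if extra not in expanded:
                    expanded.append(extra)
    return expanded


# Hypothetical cluster data: "jaguar" is ambiguous and sits in two clusters.
term_clusters = {"jaguar": ["cars", "animals"]}
cluster_top_terms = {
    "cars": ["engine", "coupe"],
    "animals": ["cat", "jungle"],
}

print(expand_automatically(["jaguar"], term_clusters, cluster_top_terms))
# → ['jaguar', 'engine', 'coupe', 'cat', 'jungle']
```

The expanded query mixes car terms and animal terms, so under a strict match few documents would satisfy all of them, which is exactly the weakness of fully automatic expansion for ambiguous terms.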

When using manual query expansion, users are presented with a list of terms to choose from in order to expand their queries. The term lists for the indexes can be very long, which makes it impractical to display them all. Instead, manual expansion could be used alongside automatic expansion: the engine would grab terms from popular clusters and present them to the user, who would then choose the appropriate terms and categories.

The final approach is user-assisted query expansion. In relevance feedback, a form of user-assisted query expansion, the user is presented with the results of the original, unchanged query and is then asked to tell the search engine which document is most relevant. The search engine then analyzes that document, grabs relevant terms from it, expands the original query with the grabbed terms and resubmits it. This way the user helps the search engine find the relevant documents and categories.
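A minimal sketch of the relevance feedback loop is given below. It assumes that "relevant terms" are simply the most frequent non-stop-word terms of the document the user picked; the stop-word list, the parameter `k` and the function name `expand_with_feedback` are assumptions for illustration, and a real engine would likely weight terms more carefully.

```python
from collections import Counter

# A tiny, illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = frozenset({"the", "a", "an", "is", "of", "in", "and", "to"})


def expand_with_feedback(query_terms, relevant_doc_text, k=2):
    """Expand the query with the k most frequent informative terms
    from the document the user marked as most relevant."""
    tokens = relevant_doc_text.lower().split()
    counts = Counter(
        token for token in tokens
        if token not in STOP_WORDS and token not in query_terms
    )
    return list(query_terms) + [term for term, _ in counts.most_common(k)]


chosen = ("the jaguar is a cat the cat hunts in the jungle "
          "and the jungle is a dense jungle")
print(expand_with_feedback(["jaguar"], chosen))
# → ['jaguar', 'jungle', 'cat']
```

Because the expansion terms come from a document the user chose, the ambiguity between the car and the animal is resolved by the user rather than guessed by the engine.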

Query expansion is without a doubt a useful tool when it comes to short queries, whereas expanding longer queries might be very difficult and could give worse results than the original queries. Relevance feedback is probably the best way to implement query expansion, since the user indicates what is relevant and what is not. However, this kind of expansion means that the user has to submit two queries to find the wanted results. Clearly, retrieving relevant results the first time around is preferable.

Although query expansion has great potential, it is not implemented in the first versions of the Zeeker Search Engine. As discussed above, great care must be taken when implementing query expansion, and with the limited time frame
