Querying the Classifications - 1.1 What is World Heritage

<Name>Biogeographical Regions</Name>

sites...

</Class>

<Name>Mixed</Name>

...

</Class>

<Name>Cultural</Name>

...

</Class>

sites...

. .

</Class>

</Classification>

associated with it. The query that searches the marked categories has the following information available from the user:

• A list of unique ids of the marked categories.

• A list of keywords.

This information, together with the classifications and all the WH site-data, is enough to execute the query.

Figure 2.11 shows what a classification might look like. The ellipses denote the “categories” or “classes” of the classification, and the circles represent the sites.

The figure illustrates a scenario where two categories, C2 and C6, have been marked by a user; the marked categories are drawn with a bold line in the figure. The bold dotted lines mark the part of the classification that is relevant for the keyword search in the marked categories: the sites S1, S2, S3, S4 and S5 are the only sites that should be included in the search.

Figure 2.11: Classification illustrating a search in marked categories (node labels: categories C2 to C8, sites S1 to S7)

Finding all the relevant sites to search is straightforward when using a semistructured data model. When the unique ids of C2 and C6 are known, all their successors, and hence the relevant sites, can be located simply by using path expressions.
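The lookup of relevant sites can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edges of the tree below are hypothetical (the exact shape of figure 2.11 is not reproduced here), the node name "root" and the function name sites_under are invented for the example, and a plain traversal stands in for what a descendant path expression would do in a semistructured query language.

```python
# Hypothetical classification: each category maps to its children.
# Nodes that never appear as a key are treated as sites (leaves).
CHILDREN = {
    "root": ["C2", "C3"],
    "C2": ["C4", "C5"],
    "C4": ["S1", "S2"],
    "C5": ["S3"],
    "C3": ["C6", "C8"],
    "C6": ["C7"],
    "C7": ["S4", "S5"],
    "C8": ["S6", "S7"],
}


def sites_under(children, marked_ids):
    """Collect every site that is a successor of one of the marked categories."""
    found = set()
    stack = list(marked_ids)
    while stack:
        node = stack.pop()
        if node in children:
            # A category: keep descending into its children.
            stack.extend(children[node])
        else:
            # A leaf: this is a site that the keyword search should cover.
            found.add(node)
    return found


relevant = sites_under(CHILDREN, ["C2", "C6"])
print(sorted(relevant))  # ['S1', 'S2', 'S3', 'S4', 'S5']
```

With C2 and C6 marked, only S1 to S5 are returned, so S6 and S7 are never searched for keywords.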

2.5.2 Finding Related Sites

In this report the term related sites or similar sites is used in the following way:

A site A is similar or related to another site B if they both belong to the same category.

For example, all Danish sites are similar to “Kronborg Castle”, because they all belong to the category “Danish sites”.

When a user has found a site that he finds interesting, he might want to find similar sites, because the site has some properties he finds interesting. It would be a nice feature to be able to find other sites that belong to the same category as that site. Suppose the user has located a site S1 that he finds interesting. He then asks for a list of similar sites.

Figure 2.12 and figure 2.13 illustrate two classifications. The interesting site S1 has been marked with a bold line in both classifications. The bold dotted lines show the parts of the classifications that are interesting when looking for similar sites. Basically, all that must be done in order to find similar sites is to “find all the children of the parent of the interesting node (site)”, which is easily done using SSD representations of the classifications. In this case the similar sites are the children of the category nodes C15 and C4, that is, the site nodes S2 and S7.

Figure 2.12: Classification showing how to find similar sites (node labels: categories C10 to C17, sites S1, S3, S8, S9, S10 and S11)

Figure 2.13: Classification showing how to find similar sites (node labels: categories C2 to C8, sites S1 to S7)

If the user is willing to accept similar sites that are not directly in the same category as the site he finds interesting, then the query for similar sites can be expanded to include the parent nodes of the category nodes C15 and C4. This is illustrated with the thin dotted lines in the two figures.
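The “children of the parent” rule, and its expansion to higher levels, can be sketched as follows. The sketch makes several assumptions that are not taken from the report: the two classifications are merged into one hypothetical child map, the edges are invented for illustration, and the function name similar_sites and its levels parameter are made up. Setting levels=1 gives the strict definition of similar sites, while larger values also accept sites under the ancestors of the site's categories.

```python
# Hypothetical edges: S1 belongs to C4 in one classification and to C15 in another.
CHILDREN = {
    "C3": ["C4", "C5"],
    "C4": ["S1", "S2"],
    "C5": ["S3"],
    "C15": ["S1", "S7"],
}


def similar_sites(children, site, levels=1):
    """Return the sites that share a category with `site`, climbing `levels` steps up."""
    # Reverse index: node -> set of parents (a node may have several parents).
    parents = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, set()).add(parent)

    # Climb upwards, remembering every category reached on the way.
    reached = set()
    frontier = {site}
    for _ in range(levels):
        frontier = {p for node in frontier for p in parents.get(node, ())}
        reached |= frontier

    # Collect all sites below the reached categories.
    found = set()
    stack = list(reached)
    while stack:
        node = stack.pop()
        if node in children:
            stack.extend(children[node])
        else:
            found.add(node)

    found.discard(site)  # the site itself is not a "similar" site
    return found


print(sorted(similar_sites(CHILDREN, "S1")))            # ['S2', 'S7']
print(sorted(similar_sites(CHILDREN, "S1", levels=2)))  # ['S2', 'S3', 'S7']
```

The reverse index is rebuilt on every call to keep the example short; a real system would more likely store parent links alongside the classification.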

2.5.3 Finding the Best Match

Naturally, visitors to the WH website should have the possibility to mark interesting categories in more than one classification and make a keyword search in all the classifications. Such a query can be executed by searching each of the classifications one at a time, but the same site will probably often appear as a match in several of the classifications. This creates a need to order the matching sites by how good a match they are.

There are several issues that should be considered when ordering the matching sites. Suppose a user has marked categories in several classifications and entered several keywords to search for. The following statements seem obvious:

• A site that matches two keywords is better than a site that only matches one keyword.

• A site that is matched in two categories is better than a site that is only matched in a single category. This is true regardless of which classification the categories belong to. A site might be a member of two categories in the same classification.

It seems like a good idea to assign “hit points” to each site while performing the queries. The pseudo code below defines how such a hit point system can work.

The search function takes a list of keywords, a list of ids for the categories that the user marked, and a list of all the classifications as arguments:

function search(keyword-list, markedCategoryID-list, classification-list) {
    for each classification in classification-list
        for each category in classification.categories
            for each markedCategoryID in markedCategoryID-list
                if category.ID = markedCategoryID then
                    for each site in category.sites
                        for each keyword in keyword-list
                            if site contains keyword then
                                assign 1 point to site
                            else do nothing
                else do nothing
}

The best hits are then the sites with the most points.
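A runnable sketch of the same hit point scheme is given below in Python. The data layout (dictionaries with the keys "categories", "id", "sites" and "text"), the sample descriptions, and the reading of “site contains keyword” as a case-insensitive substring test are all assumptions made for the example; the report does not specify them.

```python
from collections import Counter


def search(keywords, marked_category_ids, classifications):
    """Score sites: one point per keyword hit, per marked category containing the site."""
    marked = set(marked_category_ids)
    points = Counter()
    for classification in classifications:
        for category in classification["categories"]:
            if category["id"] not in marked:
                continue  # unmarked categories are not searched
            for site in category["sites"]:
                text = site["text"].lower()
                for keyword in keywords:
                    if keyword.lower() in text:
                        points[site["id"]] += 1
    # Best hits first: sites ordered by descending point total.
    return points.most_common()


# Hypothetical sample data.
castle = {"id": "S1", "text": "Renaissance castle by the sea"}
church = {"id": "S2", "text": "Medieval church by the sea"}
classifications = [
    {"categories": [{"id": "C4", "sites": [castle, church]}]},
    {"categories": [{"id": "C15", "sites": [castle]},
                    {"id": "C16", "sites": [church]}]},
]

print(search(["castle", "sea"], ["C4", "C15"], classifications))
# [('S1', 4), ('S2', 1)]
```

S1 matches both keywords in two marked categories and gets four points, while S2 matches one keyword in one marked category, which reflects the two statements about match quality above.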

2.5.4 Presenting the Query Results

When the WH system has finished executing a query, it has to present the result to the user. One possible way to present the result is in the form of a “result dataguide”.

Suppose the result of the “find similar sites” query (see figures 2.12 and 2.13) should be presented to the user. The result can be expressed as a lattice, as shown in figure 2.14. This lattice can be transformed to a dataguide in exactly the same way as the classification lattices are transformed to dataguides.

Figure 2.14: Lattice representing a search result (node labels: root “Similar sites”, categories C4 and C15, sites S1, S2 and S7)

There are several advantages to this result representation as opposed to simply presenting the result as a list of sites:

• If the user gets too many results from a query, he might want to make another query that only searches sites which were results of the original query. The lattice in figure 2.14 can be used directly for such a query, because it has the same structure as the original classifications.

• A “result dataguide” contains useful information about which categories a site belongs to – a simple list of matching sites does not. Hence visitors to the WH site would probably appreciate the dataguide structure.

• The functionality used to generate the “search dataguides” (as shown in figure 1.2) and the “result dataguides” is the same. This can make the implementation a little simpler and more elegant.

The main disadvantage of displaying the query results as dataguides is that it is more complicated to build the result lattice, and hence it requires more processing power. This issue should be considered when deciding how to present query results to the user.
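To give an impression of the extra work involved, the following Python sketch builds a result structure by pruning a classification down to the branches that lead to matching sites. It is a simplified illustration, not the report's algorithm: the child map, the node name "root" and the function name result_lattice are hypothetical, and the result is returned as nested dictionaries rather than a true lattice with shared nodes.

```python
# Hypothetical classification: nodes that are not keys are sites.
CHILDREN = {
    "root": ["C4", "C5"],
    "C4": ["S1", "S2"],
    "C5": ["S3"],
}


def result_lattice(children, node, matches):
    """Return a pruned copy of the subtree at `node`, or None if nothing below it matched."""
    if node not in children:
        # A site: keep it only if it is part of the query result.
        return {} if node in matches else None
    kept = {}
    for child in children[node]:
        subtree = result_lattice(children, child, matches)
        if subtree is not None:
            kept[child] = subtree
    # A category without any matching site below it is dropped entirely.
    return kept or None


print(result_lattice(CHILDREN, "root", {"S1", "S2"}))
# {'C4': {'S1': {}, 'S2': {}}}
```

Because the pruned structure keeps the category level, it can be fed back into the same search and display code as the original classification, at the cost of one traversal per classification.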
