
3.1.1 Unrooted LCP Queries

By default, the LCP data structure can only answer rooted LCP queries. Cole et al. [13] describe how support for unrooted LCP queries on a compressed trie T(C_i) can be added at the cost of increasing the size of the data structure by O(|C_i| log |C_i|).

Lemma 3 (Cole et al.) Provided x has been preprocessed in time O(|x|), unrooted LCP queries on T(C_i) for any suffix of x can be performed in time O(log log n) by using O(|C_i| log |C_i|) additional space.

The main idea in obtaining this lemma is to create a heavy path decomposition of T(C_i) and add the compressed subtries rooted in the root of every heavy path to the LCP data structure. This causes the additional O(|C_i| log |C_i|) space. An unrooted LCP query LCP(x, i, ℓ), where ℓ is not the root of a heavy path, can in time O(1) be reduced to a rooted LCP query on the subtrie of a descendant heavy path. We give the proof of this lemma in Section 3.6.2.

We present a new result, showing that support for slower unrooted LCP queries on a compressed trie T(C_i) can be added using linear additional space.

Lemma 4 Provided x has been preprocessed in time O(|x|), unrooted LCP queries on T(C_i) for any suffix of x can be performed in time O(log |C_i| + log log n) by using O(|C_i|) additional space.

To achieve this, we create a heavy path decomposition of T(C_i), and show that an unrooted LCP query LCP(x, i, ℓ) can be answered by following at most log |C_i| heavy paths from ℓ to the location where the search for x stops, using constant time per heavy path. On the final heavy path, we need an O(log log n) time predecessor query to determine the exact location where the search stops. The full proof of this lemma is given in Section 3.6.3.

3.2 The Longest Prefix String

A central concept of the LCP data structure is the notion of the longest prefix string (identical to the highest overlap string in Cole et al. [13]). Given a non-empty string set S ⊂ Σ^∗ and a string x ∈ Σ^∗, the longest prefix string in S for x, denoted lps_S(x), is defined as follows:

Definition 1 (Longest Prefix String) Among those strings in S having the longest maximum common prefix with x, lps_S(x) is a string which is lexicographically closest to x, breaking ties arbitrarily.

Notice that for any x there is at least one, and at most two such strings in S, assuming S is non-empty. The maximum common prefix between x and lps_S(x) is equal to maxpref(S, x), and we will use h_S(x) = |maxpref(S, x)| as a shorthand to denote the length of this prefix.

To find the longest prefix string for x in a set S it is convenient to consider the compressed, sorted trie for S. Refer to Figure 3.1 for an illustration of some longest prefix strings.

Figure 3.1: Illustrating the longest prefix string for the strings x_1 = aabc, x_2 = bbabc, x_3 = abc, x_4 = bbbf, x_5 = abd and x_6 = bbbdb in the string set S = {abb, abdea, abdef, bbbb, bbbd, bbbfea}. In the above figures of T(S), the strings of S having the longest maximum common prefix with x_i are marked in green. Among these strings, the bold ones are those lexicographically closest to x_i. That is, lps_S(x_i) is a bold string. Recapitulating, we have that lps_S(x_1) = abb, lps_S(x_2) = bbbb, lps_S(x_4) = bbbfea, lps_S(x_5) = abdea and lps_S(x_6) = bbbd. There is a tie for x_3, so lps_S(x_3) is either abb or abdea. The lengths of the shared prefixes (marked in bold) are h_S(x_1) = 1, h_S(x_2) = 2, h_S(x_3) = 2, h_S(x_4) = 4, h_S(x_5) = 3 and h_S(x_6) = 4.
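Definition 1 can be checked directly on the string set of Figure 3.1. The following is a minimal brute-force sketch, not the trie-based method used by the data structure. The helper names common_prefix_len and lps_choices are hypothetical and introduced only for illustration. The function returns h_S(x) together with every valid choice for lps_S(x), so a tie shows up as two strings.

```python
def common_prefix_len(a, b):
    """Length of the maximum common prefix of two strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def lps_choices(S, x):
    """Return (h_S(x), valid choices for lps_S(x)) by scanning all of S.

    Assumes S is non-empty. Python compares strings lexicographically and
    orders a string before any longer string of which it is a prefix.
    """
    h = max(common_prefix_len(s, x) for s in S)
    best = [s for s in S if common_prefix_len(s, x) == h]
    if x in best:
        return h, [x]
    below = [s for s in best if s < x]
    above = [s for s in best if s > x]
    choices = []
    if below:
        choices.append(max(below))  # closest candidate preceding x
    if above:
        choices.append(min(above))  # closest candidate following x
    return h, choices


S = ["abb", "abdea", "abdef", "bbbb", "bbbd", "bbbfea"]
for x in ["aabc", "bbabc", "abc", "bbbf", "abd", "bbbdb"]:
    print(x, lps_choices(S, x))
```

A list with two entries corresponds to the tie described for x_3 in the caption.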

The cases in Figure 3.1 suggest that lps_S(x) is always one of the two strings lexicographically closest to x in S. This is confirmed by the following lemma.

16 THE LCP DATA STRUCTURE

Lemma 5 The longest prefix string lps_S(x) is lexicographically closest to x among all strings in S.

Proof In case x ∈ S, lps_S(x) = x and the lemma clearly holds. In case x ∉ S, assume without loss of generality that lps_S(x) ≺ x. To obtain a contradiction, suppose that there is a string y ∈ S different from lps_S(x), which is lexicographically closer to x than lps_S(x), i.e.,

lps_S(x) ≺ y ≺ x.

Applying Corollary 1 yields that |maxpref(y, x)| ≥ |maxpref(lps_S(x), x)|. Thus y shares at least as long a prefix with x as lps_S(x) does while being lexicographically closer to x, which contradicts the assumption that lps_S(x) is the longest prefix string for x in S.
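Lemma 5 says that the best overlap with x is always attained by a lexicographic neighbour of x in S. A small sketch of that consequence, using a sorted list and binary search in place of a trie; the function names neighbours and best_overlap_is_adjacent are hypothetical and introduced only for this illustration.

```python
from bisect import bisect_left


def common_prefix_len(a, b):
    """Length of the maximum common prefix of two strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def neighbours(sorted_S, x):
    """The (at most two) strings of sorted_S lexicographically adjacent to x."""
    i = bisect_left(sorted_S, x)
    if i < len(sorted_S) and sorted_S[i] == x:
        return [x]
    return sorted_S[max(i - 1, 0):i + 1]


def best_overlap_is_adjacent(S, x):
    """True if some neighbour of x attains the maximum common prefix length."""
    sorted_S = sorted(S)
    best = max(common_prefix_len(s, x) for s in sorted_S)
    return any(common_prefix_len(s, x) == best for s in neighbours(sorted_S, x))


S = ["abb", "abdea", "abdef", "bbbb", "bbbd", "bbbfea"]
queries = ["aabc", "bbabc", "abc", "bbbf", "abd", "bbbdb"]
print(all(best_overlap_is_adjacent(S, x) for x in queries))
```

This only exercises the lemma on the strings of Figure 3.1; it is a sanity check, not a proof.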

3.3 Building the LCP Data Structure

In the construction of the LCP data structure, a suffix tree T(C) is built for the indexed string t, where C = suff(t). On T(C), a nearest common ancestor labeling scheme is constructed. The construction of T(C) can be done in time O(n log σ) because the suffix tree must be sorted lexicographically [37, 21]. The space usage is O(n), and constructing the nearest common ancestor labels takes O(n) time [1] as this is the number of vertices in T(C).

The data structure also stores the compressed tries T(C_1), T(C_2), ..., T(C_q), where each C_i is a set of substrings of t. On these tries, a weighted ancestor data structure is built. Furthermore, let y be a substring of t. We define order(y) ∈ {1, ..., n(n+1)/2} to be the position of y in the lexicographic ordering of all substrings of t. The value of order(y) for a substring y ∈ C_i can be found by performing a pre-order traversal of the sorted suffix tree. The order set of C_i is orderset(C_i) = {order(y) | y ∈ C_i}. A predecessor data structure is constructed on orderset(C_i) for all C_i. Building the compressed tries takes time O(log σ · Σ_{i=1}^{q} |C_i|) and space O(Σ_{i=1}^{q} |C_i|). It takes O(Σ_{i=1}^{q} |C_i|) time and space to construct the weighted ancestor and predecessor data structures [3, 39].

Concluding, the LCP data structure takes space O(n + Σ_{i=1}^{q} |C_i|), and can be built in time O(log σ · (n + Σ_{i=1}^{q} |C_i|)). Note that the data structure as described here only supports rooted LCP queries. If support for unrooted LCP queries is required, further preprocessing may be necessary depending on the type of support to add. Adding support for unrooted LCP queries is described in Section 3.6.

3.4 Preprocessing a Query String

Before performing a query with a string x, it must be preprocessed to obtain two parameters for x that are needed to perform LCP queries in O(log log n) time. These two parameters are:

1. The longest prefix string in C = suff(t) for x, i.e., lps_C(x).

2. The length of the maximum common prefix h_C(x) = |maxpref(C, x)|.

We preprocess x by searching T(C) for x from the root, matching the characters of x one by one. Eventually the search stops, and one of the following two cases occurs.

(a) The search stops in a non-branching vertex (i.e., an implicit vertex or a leaf). In this case the longest prefix string lps_C(x) will be either the left leaf or the right leaf in the subjacent subtree. Let c_x and c_T be the next unmatched character of x and T(C), respectively. If x is a prefix of some string in C, it is fully matched when the search stops and c_x = ε. If the search stopped in a leaf, c_T = ε. We let v be the next explicit vertex in T(C), descendant of the location where the search stopped. Then the longest prefix string in C for x can be determined as

lps_C(x) = leftleaf(v) if c_x ≤ c_T,
lps_C(x) = rightleaf(v) if c_x > c_T.

Notice that the only case in which c_x = c_T is when c_x = c_T = ε and hence x ∈ C. In case c_T = ε, the search stopped in a leaf, so leftleaf(v) = rightleaf(v) = v = lps_C(x).

(b) The search stops in a branching vertex v ∈ T(C). In this case we need a predecessor query to find the longest prefix string for x, effectively determining the sorting of x in relation to the children of v. As before, let c_x be the next unmatched character of x (possibly ε). Assuming a predecessor data structure has been built for v over the first character on the edges to its children, we can choose lps_C(x) as

lps_C(x) ∈ {rightleaf(PRED(c_x)), leftleaf(SUCC(c_x))}.

Notice that the set is non-empty, since either PRED(c_x) or SUCC(c_x) must exist.

In both cases, the length of the maximum common prefix h_C(x) = |maxpref(C, x)| is found as the number of matched characters in x. We use O(|x|) time to search for x and obtain h_C(x). In the first case, it takes constant time to find lps_C(x). The predecessor query in the second case takes time O(log log σ), since the alphabet is the universe for the predecessor query. Thus, we have established that preprocessing a string x requires O(|x| + log log σ) time.
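The search described above can be sketched on an uncompressed trie, where every vertex is explicit and cases (a) and (b) collapse into one rule: pick the closest child below c_x if there is one, otherwise the closest child above. This is an illustrative simplification under stated assumptions, not the compressed suffix tree of the text. The names build_trie, leftleaf, rightleaf and preprocess are hypothetical, the string set is assumed non-empty, and a linear scan over the children stands in for the predecessor structure.

```python
def build_trie(strings):
    """Uncompressed trie; 'end' marks vertices where a stored string ends."""
    root = {"kids": {}, "end": False}
    for s in strings:
        node = root
        for ch in s:
            node = node["kids"].setdefault(ch, {"kids": {}, "end": False})
        node["end"] = True
    return root


def leftleaf(node, label):
    """Lexicographically smallest stored string below (or at) node."""
    while not node["end"]:
        ch = min(node["kids"])
        label += ch
        node = node["kids"][ch]
    return label


def rightleaf(node, label):
    """Lexicographically largest stored string below (or at) node."""
    while node["kids"]:
        ch = max(node["kids"])
        label += ch
        node = node["kids"][ch]
    return label


def preprocess(root, x):
    """Return (lps_C(x), h_C(x)) by matching x from the root."""
    node, h = root, 0
    while h < len(x) and x[h] in node["kids"]:
        node = node["kids"][x[h]]
        h += 1
    matched = x[:h]
    if h == len(x):                      # c_x is empty: x is a prefix of a stored string
        return leftleaf(node, matched), h
    smaller = [c for c in node["kids"] if c < x[h]]
    larger = [c for c in node["kids"] if c > x[h]]
    if smaller:                          # PRED(c_x) exists
        c = max(smaller)
        return rightleaf(node["kids"][c], matched + c), h
    if node["end"]:                      # a stored string ends exactly here
        return matched, h
    c = min(larger)                      # SUCC(c_x)
    return leftleaf(node["kids"][c], matched + c), h


t = "bccbbccd"
C = [t[i:] for i in range(len(t))]
root = build_trie(C)
print(preprocess(root, "cacba"))
```

For the query cacba this agrees with the values lps_C(x) = cbbccd and h_C(x) = 1 used in the example of Section 3.5.1.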

3.4.1 Preprocessing All Suffixes of a Query String

In order to support unrooted LCP queries for x, we will need access to lps_C(x′) and h_C(x′) for an arbitrary suffix x′ of x. The above method suggests that preprocessing each of the |x| suffixes of x to determine these could take time Θ(Σ_{i=1}^{|x|} (log log σ + |suff_i(x)|)) = Θ(|x| log log σ + |x|²). However, as shown in the following, we can exploit techniques used in linear time construction of generalized suffix trees to reduce the preprocessing time.
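The quadratic term in the naive bound comes from summing the suffix lengths. As a short worked step, using only that the |x| suffixes of x have lengths 1, 2, ..., |x| in some order:

```latex
\sum_{i=1}^{|x|} \bigl(\log\log\sigma + |\mathrm{suff}_i(x)|\bigr)
  = |x|\log\log\sigma + \sum_{j=1}^{|x|} j
  = |x|\log\log\sigma + \frac{|x|\,(|x|+1)}{2}
  = \Theta\bigl(|x|\log\log\sigma + |x|^{2}\bigr).
```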

A generalized suffix tree is a trie T(suff(S)) for a set of strings S ⊂ Σ^∗. Ukkonen's Algorithm [37] can be used to construct generalized suffix trees on-line, inserting strings from S one at a time. The algorithm does so by extending an already created generalized suffix tree T^i for the string set S = {t_0, t_1, ..., t_i} with all suffixes of a new string t_{i+1}, obtaining a new suffix tree T^{i+1} that also contains all suffixes of t_{i+1}. For our purposes, the algorithm can be changed to not modify T^i, thus only determining the locations in T^i where all suffixes of t_{i+1} branched from the tree. This can be done in time O(|t_{i+1}|) [21, p. 116]. Since T^i is not changed, this effectively searches for all suffixes of t_{i+1} in T^i.

Now, if we consider T(C) as a generalized suffix tree, we can in time O(|x|) determine the location ℓ_{x′} ∈ T(C) where each suffix x′ of x branched from the tree by searching for all suffixes of x using Ukkonen's Algorithm. By storing these locations, h_C(x′) = |maxpref(C, x′)| = |ℓ_{x′}| is available in constant time. When needed, we can determine lps_C(x′) in time O(log log σ) as described in the previous section, by using ℓ_{x′} ∈ T(C) as the location where the search for x′ stopped. We have thus established the following lemma.

Lemma 6 Provided that x has been preprocessed in time O(|x|), h_C(x′) is available in constant time, and lps_C(x′) can be determined in time O(log log σ) for any suffix x′ of x.

This method also supports constant time lookup of lps_C(x′) by preprocessing all suffixes of x in time O(|x| log log σ).

3.5 Rooted LCP Queries

In this section we prove Lemma 2. The idea in answering a rooted LCP query for x on T(C_i) is to find a string z in C_i that x follows longest. We identify the leaf of T(C_i) that corresponds to z and the distance h = |maxpref(x, z)|. We can then use a weighted ancestor query WA(z, h) to determine the location where x diverges from z. This location is the answer to the rooted LCP query LCP(x, i). See Figure 3.2 for an illustration.

We will use the longest prefix string for x in C_i, lps_{C_i}(x), as the string z. The distance h is then equal to h_{C_i}(x) = |maxpref(lps_{C_i}(x), x)|. In order to determine lps_{C_i}(x) and h_{C_i}(x), we use lps_C(x) and h_C(x), which are available thanks to the preprocessing of x.

We find lps_{C_i}(x) using a number of important lemmas. First, Lemma 7 and Corollary 2 show that we can determine the distance that x follows a string y ∈ C_i in constant time.

Figure 3.2: Illustrating how a rooted LCP query for x on T(C_i) is answered by a weighted ancestor query on a string z ∈ C_i that x follows longest. In the figure, z = lps_{C_i}(x), h = h_{C_i}(x), and the answer LCP(x, i) is the location WA(z, h).

Lemma 8 shows that we can identify two candidate strings in C_i in time O(log log n), at least one of which is a valid choice for lps_{C_i}(x). In Lemma 9, we use Corollary 2 to determine which of the candidate strings x follows longest, thereby obtaining a valid choice for lps_{C_i}(x) as well as h_{C_i}(x).

Lemma 7 Given a suffix y ∈ C of t, the distance h that a string x ∈ Σ^∗ follows y can be determined in constant time as

h = |maxpref(x, y)| = min(|NCA(lps_C(x), y)|, h_C(x)),

provided that lps_C(x) and h_C(x) are available.

Proof By Lemma 5, lps_C(x) is lexicographically closest to x among all strings in C, so either

1. x ⪯ lps_C(x) ⪯ y or y ⪯ lps_C(x) ⪯ x. In this case, Corollary 1 yields

h = |maxpref(x, y)| = min(|maxpref(lps_C(x), y)|, |maxpref(x, lps_C(x))|).

2. y ⪯ x ⪯ lps_C(x) or lps_C(x) ⪯ x ⪯ y. In this case, Corollary 1 yields

|maxpref(y, lps_C(x))| = min(|maxpref(y, x)|, |maxpref(x, lps_C(x))|).

By definition, |maxpref(x, lps_C(x))| ≥ |maxpref(y, x)| for any y ∈ C, so

h = |maxpref(y, x)| = |maxpref(y, lps_C(x))| = min(|maxpref(lps_C(x), y)|, |maxpref(x, lps_C(x))|).

For both of the above cases we have that

h = min(|maxpref(lps_C(x), y)|, |maxpref(x, lps_C(x))|).

By definition, |maxpref(x, lps_C(x))| = h_C(x), and since |maxpref(lps_C(x), y)| can be determined in constant time by a nearest common ancestor query on the leaves of T(C) corresponding to lps_C(x) and y, we have that

h = min(|NCA(lps_C(x), y)|, h_C(x)).

This concludes the proof.

We extend the lemma by showing that the distance that x follows a substring of t can also be determined in constant time.

Corollary 2 Given a suffix y ∈ C of t, the distance h that a string x ∈ Σ^∗ follows pref_i(y) can be determined in constant time as

h = |maxpref(x, pref_i(y))| = min(i, |NCA(lps_C(x), y)|, h_C(x)),

provided that lps_C(x) and h_C(x) are available.


Proof The distance that x follows pref_i(y) is at most |pref_i(y)| = i, and since x follows y at least as long as x follows pref_i(y), the corollary follows from Lemma 7.
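The formula of Corollary 2 can be compared against a direct character-by-character comparison. In the sketch below, the string depth |NCA(lps_C(x), y)| is replaced by a plain common-prefix computation, and lps_C(x) and h_C(x) are found by brute force rather than by the preprocessing of Section 3.4. The names lps_and_h and follow_distance are hypothetical and introduced only for this illustration.

```python
def common_prefix_len(a, b):
    """Length of the maximum common prefix of two strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def lps_and_h(C, x):
    """Brute-force stand-in for preprocessing: returns (lps_C(x), h_C(x))."""
    h = max(common_prefix_len(s, x) for s in C)
    best = sorted(s for s in C if common_prefix_len(s, x) == h)
    below = [s for s in best if s <= x]
    # Ties are broken towards the candidate preceding x (any choice is valid).
    return (below[-1] if below else best[0]), h


def follow_distance(lps, h, y, i):
    """Corollary 2: distance that x follows pref_i(y), given lps_C(x) and h_C(x)."""
    nca_depth = common_prefix_len(lps, y)  # stands in for |NCA(lps_C(x), y)|
    return min(i, nca_depth, h)


t = "bccbbccd"
C = [t[k:] for k in range(len(t))]
lps, h = lps_and_h(C, "cacba")
print(lps, h, follow_distance(lps, h, "ccbbccd", 3))
```

Note that x itself is only needed to obtain lps_C(x) and h_C(x); the distance computation never looks at the characters of x again.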

Using the predecessor data structure for orderset(C_i), we can in time O(log log n²) = O(log log n) determine the predecessor and successor string for y ∈ C in the lexicographic ordering of C_i. We denote these strings as PRED_{C_i}(y) and SUCC_{C_i}(y), and the following lemma shows that for y = lps_C(x), at least one of these strings is a valid choice for lps_{C_i}(x).

Lemma 8 Either PRED_{C_i}(lps_C(x)) or SUCC_{C_i}(lps_C(x)) is a valid choice for lps_{C_i}(x).

Proof We let x⁻ = PRED_{C_i}(lps_C(x)) and x⁺ = SUCC_{C_i}(lps_C(x)). By definition, x⁻ and x⁺ are the two strings in C_i lexicographically closest to lps_C(x). We consider the following cases for the lexicographic ordering of x.

1. x⁻ ≺ x ≺ x⁺. In this case x⁻ and x⁺ are also the two strings in C_i lexicographically closest to x, and by Lemma 5 one of them is a valid choice for lps_{C_i}(x).

2. x ⪯ x⁻. We show that x⁻ is a valid choice for lps_{C_i}(x). To obtain a contradiction, assume that x⁻ is not a valid choice for lps_{C_i}(x). Then there is a valid choice z ∈ C_i different from x⁻ for lps_{C_i}(x). It must be the case that z ≺ x⁻, since otherwise x⁻ would be lexicographically closer to x than z, contradicting that z is a valid choice for lps_{C_i}(x). Since x⁻ is not a valid choice for lps_{C_i}(x), it must hold that either

a) x follows z longer than x⁻, or

b) z is lexicographically closer to x than x⁻, i.e., x ⪯ z ≺ x⁻.

Let z′ ∈ C denote a suffix of t having z as a prefix. In the first case, x follows z′ longer than x⁻, and thus also longer than lps_C(x) because of the lexicographic ordering. In the second case z′ is lexicographically closer to x than lps_C(x) since x ⪯ z ≺ x⁻ ⪯ lps_C(x). Both cases contradict the definition of lps_C(x). This shows that x⁻ must be a valid choice for lps_{C_i}(x).

3. x ⪰ x⁺. In this case x⁺ is a valid choice for lps_{C_i}(x). The argument is symmetrical to the previous.

The following lemma is obtained by combining the previously shown lemmas. The lemma provides lps_{C_i}(x) and h_{C_i}(x), which are needed in order to answer the rooted LCP query with a weighted ancestor query.

Lemma 9 Let C = suff(t) be the set of all suffixes of t. Given lps_C(x) and h_C(x), we can determine lps_{C_i}(x) and h_{C_i}(x) in time O(log log n), provided that a nearest common ancestor data structure has been built for the suffix tree T(C).

Proof We first obtain the strings x⁻ = PRED_{C_i}(lps_C(x)) and x⁺ = SUCC_{C_i}(lps_C(x)) in time O(log log n). It follows from Lemma 8 that at least one of x⁻ and x⁺ is a valid choice for lps_{C_i}(x). Let y⁻ and y⁺ denote suffixes in C having x⁻ and x⁺ as a prefix, respectively.

From Lemma 7, the distance that x follows y⁻ and y⁺ is upper bounded by the distance that lps_C(x) follows the strings. We can tell which of the strings x follows farthest by comparing the length of the maximum common prefix between lps_C(x) and y⁻ to that between lps_C(x) and y⁺ as determined by two nearest common ancestor queries. From the lengths of the maximum common prefixes, we select the correct choice for lps_{C_i}(x) between x⁻ and x⁺. Thus,

lps_{C_i}(x) = x⁻ if |NCA(lps_C(x), y⁻)| ≥ |NCA(lps_C(x), y⁺)|,
lps_{C_i}(x) = x⁺ if |NCA(lps_C(x), y⁻)| < |NCA(lps_C(x), y⁺)|.

Note that in case y⁻ and y⁺ have an equally long maximum common prefix with lps_C(x), we select x⁻. This is because x ≺ x⁻ ≺ lps_C(x) ≺ x⁺ is a possible lexicographical ordering for the strings. This may happen if x⁻ ∉ C is a prefix of lps_C(x), since a string is lexicographically ordered before any other string of which it is a prefix.

On the contrary, x⁻ ≺ lps_C(x) ≺ x⁺ ≺ x is not a possible lexicographical ordering because there would be a string in C having x⁺ as a prefix, contradicting that lps_C(x) is lexicographically closest to x in C. Thus, x⁻ is always at least as close lexicographically to x as x⁺.

The maximum distance, h_{C_i}(x), that x follows a string in C_i equals the distance that x follows lps_{C_i}(x). Thus, h_{C_i}(x) is the maximum distance that x follows either x⁻ or x⁺, since the maximum of these was the correct choice for lps_{C_i}(x). We determine the distances using Corollary 2 as

h_{C_i}(x) = max(|maxpref(x, x⁻)|, |maxpref(x, x⁺)|)
           = max(min(|NCA(lps_C(x), y⁻)|, h_C(x), |x⁻|), min(|NCA(lps_C(x), y⁺)|, h_C(x), |x⁺|)).

Finding x⁺ and x⁻ can be done in time O(log log n). The nearest common ancestor queries can be answered in constant time. Hence the total time spent is O(log log n).
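The selection rule of Lemma 9 can be sketched as follows. This is a brute-force illustration under stated assumptions. Binary search over a sorted list stands in for the predecessor structure on orderset(C_i), a linear scan finds a suffix y having the candidate as a prefix, and a common-prefix computation stands in for |NCA(·, ·)|. The function name lps_in_subset is hypothetical. The predecessor is taken as the largest string not exceeding lps_C(x) and the successor as the smallest string not below it, and C_i is assumed to be a non-empty set of substrings of t.

```python
from bisect import bisect_left, bisect_right


def common_prefix_len(a, b):
    """Length of the maximum common prefix of two strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def lps_in_subset(C, Ci, lps, h):
    """Return (lps_{C_i}(x), h_{C_i}(x)) given lps = lps_C(x) and h = h_C(x)."""
    Ci = sorted(Ci)
    pred_idx = bisect_right(Ci, lps) - 1   # position of x^-, if it exists
    succ_idx = bisect_left(Ci, lps)        # position of x^+, if it exists
    candidates = []
    for idx in (pred_idx, succ_idx):
        if 0 <= idx < len(Ci):
            z = Ci[idx]
            y = next(s for s in C if s.startswith(z))  # a suffix having z as prefix
            nca = common_prefix_len(lps, y)            # stands in for |NCA(lps, y)|
            candidates.append((z, nca, min(nca, h, len(z))))
    # max() keeps the first maximal entry, so ties go to x^- as in the proof.
    chosen = max(candidates, key=lambda c: c[1])
    return chosen[0], max(c[2] for c in candidates)


t = "bccbbccd"
C = [t[k:] for k in range(len(t))]
Ci = {s[:3] for s in C}                    # pref_3(suff(t))
print(lps_in_subset(C, Ci, "cbbccd", 1))   # query x = cacba
```

The printed pair matches the values lps_{C_i}(x) = cbb and h_{C_i}(x) = 1 derived by hand in Section 3.5.1.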

We now describe how to answer a rooted LCP query on T(C_i) in time O(log log n) for a suffix x′ of a string x, assuming that x has been preprocessed in time O(|x|). To answer a rooted LCP query LCP(x′, i), we first determine lps_C(x′) and h_C(x′) in time O(log log σ) = O(log log n) as described in Lemma 6 for use in the following lemmas. Then we determine the leaf lps_{C_i}(x′) in T(C_i) and h_{C_i}(x′) in time O(log log n) as described by Lemma 9. Knowing both of these parameters, the location where x′ diverges from T(C_i) can be found by a weighted ancestor query on T(C_i), determining the ancestor of lps_{C_i}(x′) having a depth (string length) equal to h_{C_i}(x′), i.e., WA(lps_{C_i}(x′), h_{C_i}(x′)) on T(C_i). Thus, a rooted LCP query can be answered in time O(log log n), concluding the proof of Lemma 2.


3.5.1 Example of a Rooted LCP Query

In this section we illustrate and describe each of the steps necessary to answer a rooted LCP query for a small example. The goal is to answer a rooted LCP query for the string x = cacba on a compressed trie T(C_i). We assume that the LCP data structure is built for the string t = bccbbccd, having the 28 unique substrings shown in Figure 3.3(a) with their lexicographic order number. The LCP data structure is built as previously described, producing the sorted suffix tree T(C) for all suffixes C = suff(t) = {bbccd, bccbbccd, bccd, cbbccd, ccbbccd, ccd, cd, d} of t. Furthermore, the suffixes in T(C) are labeled by their lexicographic order number and a nearest common ancestor data structure has been built for T(C).

First, we preprocess the query string x to find the longest prefix string for x among the suffixes of t, lps_C(x), and the length of the maximum common prefix h_C(x). As shown in Figure 3.3(b), we find that lps_C(x) = cbbccd, which has order number 19. The length of the maximum common prefix is h_C(x) = 1.

Next, we consider the compressed trie T(C_i) (see Figure 3.3(c)) storing the substrings C_i = pref_3(suff(t)) = {bbc, bcc, cbb, ccb, ccd, cd, d}. We assume that T(C_i) is stored in the LCP data structure, and hence a weighted ancestor data structure has been built for T(C_i). Furthermore, a predecessor data structure has been prepared for orderset(C_i) = {3, 7, 16, 21, 26, 27, 28}, containing the lexicographic order numbers of the strings in C_i.

We now describe how a rooted LCP query LCP(x, i) for the string x on the compressed trie T(C_i) is answered. First, we identify the longest prefix string for x in C_i, lps_{C_i}(x), and the length h_{C_i}(x) as follows. The predecessor and successor of lps_C(x) (order number 19) in the order set of C_i are the strings x⁻ = cbb (order number 16) and x⁺ = ccb (order number 21). By Lemma 8 one of these strings is a valid choice for lps_{C_i}(x). To determine which, we use Corollary 2 to find the distance that x follows x⁻ and x⁺, respectively. This step consists of performing two nearest common ancestor queries NCA(y⁻, lps_C(x)) and NCA(y⁺, lps_C(x)) on the suffix tree, where y⁻ and y⁺ are suffixes of t having x⁻ and x⁺ as a prefix, respectively. In this way, we find that x follows both x⁻ and x⁺ for a distance of 1, and the longest prefix string in C_i for x is x⁻. The answer to the rooted LCP query LCP(x, i) is the ancestor of x⁻ of depth 1. This location can be found by a weighted ancestor query WA(x⁻, 1) on T(C_i) as shown in Figure 3.3(d).
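The order numbers quoted in this example can be recomputed by brute force. The sketch below enumerates all distinct substrings of t instead of traversing the sorted suffix tree, uses binary search on a sorted list in place of the predecessor structure, and names a trie location by its path label. The helper weighted_ancestor is a hypothetical stand-in for WA on T(C_i).

```python
from bisect import bisect_left, bisect_right

t = "bccbbccd"

# order(y): 1-based rank of y among all distinct substrings of t.
substrings = sorted({t[i:j] for i in range(len(t)) for j in range(i + 1, len(t) + 1)})
order = {y: k for k, y in enumerate(substrings, start=1)}

Ci = {t[i:i + 3] for i in range(len(t))}        # pref_3(suff(t))
orderset = sorted(order[y] for y in Ci)

key = order["cbbccd"]                           # order(lps_C(x)) for x = cacba
pred = orderset[bisect_right(orderset, key) - 1]
succ = orderset[bisect_left(orderset, key)]
x_minus, x_plus = substrings[pred - 1], substrings[succ - 1]


def weighted_ancestor(z, h):
    """Location at string depth h on the path from the root to leaf z."""
    return z[:h]


print(len(substrings), orderset, key)
print(x_minus, x_plus, weighted_ancestor(x_minus, 1))
```

The enumeration takes quadratic space and is only meant to confirm the numbers in Figure 3.3, not to model the linear-space construction.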

3.6 Unrooted LCP Queries

In the following two subsections, we describe two different ways of answering an unrooted LCP query LCP(x, i, ℓ) on a trie T(C_i). The first method is the one stated by Cole et al. [13], which requires O(|C_i| log |C_i|) additional space to support unrooted queries in time O(log log n) on a trie T(C_i). This method results in Lemma 3. The second method is a new solution that requires O(|C_i|) additional space to add support for unrooted LCP queries in time O(log |C_i| + log log n) on T(C_i). This method results in Lemma 4.

(a) The 28 unique substrings of t and their lexicographic order number. The suffixes of t are marked in bold.

(b) The position of x = cacba in the suffix tree for t, i.e., T(C) with C = suff(t). The longest

(d) Finding LCP(x, i) by a weighted ancestor query on T(C_i), using that lps_{C_i}(x) = x⁻ and h_{C_i}(x) = 1 have been determined.

Figure 3.3: Illustrating how to answer a rooted LCP query LCP(x, i) on a compressed trie T(C_i) stored in the LCP data structure. The indexed text in this example is t = bccbbccd and the query string is x = cacba.


3.6.1 Prerequisites

Before describing the details of the two solutions, we first account for the prerequisites they share. We assume that the LCP data structure has been constructed as described in Section 3.3, i.e., in particular a nearest common ancestor data structure has been built for the suffix tree T(C).

Both the method by Cole et al. [13] and our new method rely on a heavy path decomposition ℋ of T(C_i) to add support for unrooted LCP queries on T(C_i). As described in Section 2.2.1, the top of each heavy path H ∈ ℋ is extended until every light edge contains exactly one single character. This implies that the root of a heavy path H, which we denote root(H), is not necessarily an explicit vertex in T(C_i). To be able to index into a heavy path H ∈ ℋ, we store an array containing, for each explicit vertex v ∈ H, the string length of the string starting in root(H) and ending in v. By building a predecessor data structure for this array, we can find the location on H at string distance i from root(H) by a single predecessor query to determine the explicit parent vertex for the location.

Storing these predecessor data structures requires O(|C_i|) additional space, since each vertex in T(C_i) is contained in at most one heavy path. Indexing into the array for H can be done in constant time, and a predecessor query on the array can be answered in time O(log log max_{x ∈ C_i} |x|) = O(log log n), since the size of the universe is bounded by the length of the longest string in C_i. Constructing the heavy path decomposition and the predecessor data structures takes time O(|C_i|). We assume this is done when the LCP data structure is built.
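The two ingredients above, splitting a tree into heavy paths and indexing into one path by string distance, can be sketched as follows. This is a minimal illustration under assumptions that go beyond the text. The heavy child is taken to be the child with the largest subtree (measured in vertices), the extension of path tops described in Section 2.2.1 is not modelled, and binary search on a sorted depth array stands in for the predecessor structure. The names heavy_paths and locate_on_path are hypothetical.

```python
from bisect import bisect_right


def heavy_paths(children, root):
    """Split a rooted tree (dict: vertex -> list of children) into heavy paths."""
    order, stack = [], [root]
    while stack:                         # iterative pre-order traversal
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    size = {}
    for v in reversed(order):            # descendants are handled before ancestors
        size[v] = 1 + sum(size[c] for c in children.get(v, []))

    paths, heads = [], [root]
    while heads:
        v = heads.pop()
        path = [v]
        while children.get(v):
            heavy = max(children[v], key=lambda c: size[c])
            heads.extend(c for c in children[v] if c != heavy)  # light children start new paths
            v = heavy
            path.append(v)
        paths.append(path)
    return paths


def locate_on_path(depths, i):
    """Index of the explicit vertex at or directly above string distance i.

    depths[k] is the string distance from root(H) to the k-th explicit vertex.
    """
    return bisect_right(depths, i) - 1


children = {"r": ["a", "b"], "a": ["c", "d"], "c": ["e"]}
print(heavy_paths(children, "r"))
print(locate_on_path([0, 2, 5, 6], 3))
```

Every vertex ends up on exactly one path, which is the property behind the O(|C_i|) space bound for the per-path arrays.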

For both methods of answering unrooted LCP queries, the following lemma is central.

Lemma 10 Given a location ℓ ∈ T(C_i) on a heavy path H ∈ ℋ and a string x ∈ Σ^∗, we let h denote the distance that x follows H starting in ℓ. Provided that x has been preprocessed in time O(|x|), we can determine h in constant time.

Proof Observe that the leaf of each heavy path H ∈ ℋ, leaf(H), corresponds to a

In document String Indexing for Patterns with Wildcards (Sider 20-0)