
Matrix-vector multiplication in a wavelet basis

In document Wavelets in Scientific Computing (Pages 167-175)

We now turn to the problem of computing the matrix-vector product $y = Hx$, where $x, y \in \mathbb{R}^N$ and $H$ is given as in (8.25). This system has the form shown in Figure 8.15.

Figure 8.15: The structure of $y = Hx$ for $\lambda = 3$. $H$ consists of the $4 \times 4$ arrangement of blocks $H_{ij}$, $i, j = 0, 1, 2, 3$, and $x$ and $y$ are partitioned conformally into the blocks $x_j$ and $y_i$.

The vector $y$ may be computed block-wise as follows:

$$ y_i = \sum_{j=0}^{\lambda} H_{ij} x_j, \qquad i = 0, 1, \ldots, \lambda \tag{8.49} $$

where $\lambda$ is the depth of the wavelet transform. The symbols $x_j$, $y_i$, and $H_{ij}$ denote the blocks of $x$, $y$, and $H$ as indicated in Figure 8.6. The computation in (8.49) is thus broken down into the tasks of computing the products, which we will denote by

$$ y^{ij} = H_{ij} x_j, \qquad i, j = 0, 1, \ldots, \lambda \tag{8.50} $$

In the following we distinguish between blocks on or below the diagonal ($i \geq j$) and blocks above the diagonal ($i < j$).

8.4.1 Blocks on or below the diagonal

Let $v^{ij}$, $i \geq j$, be the vector of length $L_{ij}$ representing the nonzero part of the first column of $H_{ij}$, i.e.

$$ h^{ij}_m = \begin{cases} v^{ij}_{\langle m + \mu - 1 \rangle_{N_i}} & \text{for } \langle m + \mu - 1 \rangle_{N_i} \in [0, L_{ij} - 1] \\ 0 & \text{otherwise} \end{cases} $$

and

$$ H^{ij}_{m,n} = h^{ij}_{\langle m - \alpha n \rangle_{N_i}} $$

where $m = 0, 1, \ldots, N_i - 1$, $\alpha = N_i/N_j$, and $\mu$ is the offset relative to the upper left element (see (8.45)). From equation (8.50) we see that the typical element of $y^{ij}$ can be computed column-wise as

$$ y^{ij}_m = \sum_{n=0}^{N_j - 1} H^{ij}_{m,n} x^j_n = \sum_{n=0}^{N_j - 1} h^{ij}_{\langle m - \alpha n \rangle_{N_i}} x^j_n = \sum_{n=0}^{N_j - 1} v^{ij}_{\langle m - \alpha n + \mu - 1 \rangle_{N_i}} x^j_n \tag{8.51} $$

For each $n$ this computation is only valid for those $m \in [0, N_i - 1]$ where $v^{ij}_{\langle m - \alpha n + \mu - 1 \rangle_{N_i}}$ is defined, namely where

$$ 0 \leq \langle m - \alpha n + \mu - 1 \rangle_{N_i} \leq L_{ij} - 1 \tag{8.52} $$

Let $k$ and $l$ be defined such that $k \leq m \leq l$ whenever (8.52) is satisfied. Then we can find $k$ from the requirement

$$ \langle k - \alpha n + \mu - 1 \rangle_{N_i} = 0 \quad \Leftrightarrow \quad k = \langle \alpha n - \mu + 1 \rangle_{N_i} $$

and the last row as $l = \langle k + L_{ij} - 1 \rangle_{N_i}$. Letting $y^{ij}_{k:l} = [y^{ij}_k, y^{ij}_{k+1}, \ldots, y^{ij}_l]$, we can write the computation (8.51) compactly as

$$ y^{ij}_{k:l} = y^{ij}_{k:l} + x^j_n v^{ij}_{0:L_{ij}-1}, \qquad n = 0, 1, \ldots, N_j - 1 \tag{8.53} $$

When $k > l$ the band is wrapped and (8.53) must be modified accordingly.

If the vector $x$ is a wavelet spectrum then many of its elements are normally close to zero, as described in Chapter 4. Therefore we will design the algorithm to disregard computations involving elements in $x_j$ where $|x^j_n| < \varepsilon$. The algorithm is given below.

Algorithm 8.2: $y^{ij} = H_{ij} x_j$, $i \geq j$

For $n = 0$ to $N_j - 1$
  if $|x^j_n| > \varepsilon$ then
    $k = \langle \alpha n - \mu + 1 \rangle_{N_i}$
    $l = \langle k + L_{ij} - 1 \rangle_{N_i}$
    if $k \leq l$ then
      $y^{ij}_{k:l} = y^{ij}_{k:l} + x^j_n v^{ij}_{0:L_{ij}-1}$
    else (wrap)
      $y^{ij}_{0:l} = y^{ij}_{0:l} + x^j_n v^{ij}_{L_{ij}-l-1:L_{ij}-1}$
      $y^{ij}_{k:N_i-1} = y^{ij}_{k:N_i-1} + x^j_n v^{ij}_{0:L_{ij}-l-2}$
    end
  end
end
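A minimal NumPy sketch of Algorithm 8.2 may clarify the band update. The function name `block_matvec_lower` and the argument conventions are our own; the band $v^{ij}$, the offset $\mu$, and the ratio $\alpha = N_i/N_j$ are passed in explicitly:

```python
import numpy as np

def block_matvec_lower(v, x, N_i, mu, alpha, eps=1e-12):
    """Sketch of Algorithm 8.2: y^{ij} = H_ij x_j for blocks with i >= j.

    v     : nonzero part of the first column of H_ij (length L_ij)
    x     : the block x_j (length N_j)
    N_i   : number of rows of the block
    mu    : offset of the band relative to the upper left element
    alpha : N_i / N_j
    eps   : threshold below which elements of x are disregarded
    """
    L = len(v)
    y = np.zeros(N_i)
    for n, xn in enumerate(x):
        if abs(xn) <= eps:               # skip near-zero wavelet coefficients
            continue
        k = (alpha * n - mu + 1) % N_i   # first row of the band in column n
        l = (k + L - 1) % N_i            # last row of the band
        if k <= l:
            y[k:l + 1] += xn * v
        else:                            # the band wraps around the bottom
            y[0:l + 1] += xn * v[L - l - 1:]
            y[k:N_i]   += xn * v[:L - l - 1]
    return y
```

The wrap branch splits $v$ so that its last $l + 1$ entries land in rows $0, \ldots, l$ and its first $L_{ij} - l - 1$ entries in rows $k, \ldots, N_i - 1$.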

8.4.2 Blocks above the diagonal

Let $v^{ij}$, $i < j$, be the vector of length $L_{ij}$ representing the nonzero part of the first row of $H_{ij}$, i.e.

$$ h^{ij}_n = \begin{cases} v^{ij}_{\langle n + \mu - 1 \rangle_{N_j}} & \text{for } \langle n + \mu - 1 \rangle_{N_j} \in [0, L_{ij} - 1] \\ 0 & \text{otherwise} \end{cases} \tag{8.54} $$

where

$$ H^{ij}_{m,n} = h^{ij}_{\langle n - \alpha m \rangle_{N_j}} \tag{8.55} $$

with $\alpha = N_j/N_i$. An example (shown for $H_{13}$) is

[An $8 \times 32$ block in which row $m$ contains the entries $v_0, v_1, \ldots, v_{10}, 0$ in the twelve cyclically consecutive columns $\langle 4m - 2 \rangle_{32}, \ldots, \langle 4m + 9 \rangle_{32}$, all other entries being zero; each row is the row above shifted four columns to the right, wrapping around at the edges.]

Here $\alpha = 4$, $\mu = 3$, $N_1 = 8$, $N_3 = 32$, and $L_{13} = 12$ (padded with one zero). The superscripts $i, j$ have been dropped in this example.
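Such a block can be generated directly from (8.54) and (8.55). The following sketch does so; the helper name `upper_block` is ours, and the parameters are those of the example:

```python
import numpy as np

def upper_block(v, N_i, N_j, mu):
    """Build the dense block H_ij (i < j) from its banded first row v,
    following (8.54) and (8.55); v has length L_ij (zero-padded)."""
    L = len(v)
    alpha = N_j // N_i
    H = np.zeros((N_i, N_j), dtype=float)
    for m in range(N_i):
        for n in range(N_j):
            # <n - alpha*m + mu - 1>_{N_j} indexes into the band v
            idx = (n - alpha * m + mu - 1) % N_j
            if idx < L:
                H[m, n] = v[idx]
    return H
```

With $N_1 = 8$, $N_3 = 32$, $\mu = 3$ and a band of length 12, each row holds the twelve band entries and equals the previous row rotated four columns to the right, as in the example.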

In order to be able to disregard small elements in $x$ as in the previous section, we would like to convert the row representation to a column-oriented format. As indicated in the example above, the block $H_{ij}$ can be characterized completely by $\alpha$ column vectors, each of length $L_{ij}/\alpha$, together with some book-keeping information. We will assume that $L_{ij}$ is a multiple of $\alpha$, possibly obtained through padding with zeros as suggested in the example above.

Now we choose these vectors from the columns $L_{ij} - \mu, L_{ij} - \mu - 1, \ldots, L_{ij} - \mu - \alpha + 1$, which are the $\alpha$ last columns where the top row is non-zero. Let these column vectors be denoted $Z^{ij}_{:,d}$ for $d = 0, 1, \ldots, \alpha - 1$.

From (8.54) and (8.55) we get

$$ Z^{ij}_{m,d} = H^{ij}_{m,\, L_{ij} - \mu - d} = h^{ij}_{\langle L_{ij} - \mu - d - \alpha m \rangle_{N_j}} = v^{ij}_{\langle L_{ij} - \mu - d - \alpha m + \mu - 1 \rangle_{N_j}} = v^{ij}_{\langle L_{ij} - \alpha m - d - 1 \rangle_{N_j}} \tag{8.56} $$

for $m = 0, 1, \ldots, L_{ij}/\alpha - 1$ and $d = 0, 1, \ldots, \alpha - 1$. In the example above we have

$$ Z^{ij}_{:,0} = \begin{pmatrix} 0 \\ v_7 \\ v_3 \end{pmatrix}, \quad Z^{ij}_{:,1} = \begin{pmatrix} v_{10} \\ v_6 \\ v_2 \end{pmatrix}, \quad Z^{ij}_{:,2} = \begin{pmatrix} v_9 \\ v_5 \\ v_1 \end{pmatrix}, \quad Z^{ij}_{:,3} = \begin{pmatrix} v_8 \\ v_4 \\ v_0 \end{pmatrix} $$
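Equation (8.56) is easy to tabulate. A small sketch (the function name `z_columns` is ours) that builds the $\alpha$ column vectors from a padded band $v$:

```python
import numpy as np

def z_columns(v, alpha):
    """Column-oriented representation of an above-diagonal block, eq. (8.56).

    v is the banded first row of H_ij, padded with zeros so that its
    length L_ij is a multiple of alpha = N_j / N_i.  Returns an
    (L_ij/alpha) x alpha array Z with Z[m, d] = v[L_ij - alpha*m - d - 1].
    """
    L = len(v)
    assert L % alpha == 0, "L_ij must be a multiple of alpha (pad with zeros)"
    Z = np.empty((L // alpha, alpha), dtype=v.dtype)
    for m in range(L // alpha):
        for d in range(alpha):
            Z[m, d] = v[L - alpha * m - d - 1]
    return Z
```

For the example above ($L_{13} = 12$, $\alpha = 4$, $v = (v_0, \ldots, v_{10}, 0)$) this reproduces the four vectors just listed. Since the index $L_{ij} - \alpha m - d - 1$ simply runs backwards through $v$, the same array is obtained by reversing $v$ and reshaping it to $(L_{ij}/\alpha) \times \alpha$.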

Equation (8.50) can then be computed column-wise using $Z^{ij}_{:,d}$ instead of $H_{ij}$. The typical element of $y^{ij}$ is

$$ y^{ij}_m = \sum_{n=0}^{N_j - 1} H^{ij}_{m,n} x^j_n = \sum_{n=0}^{N_j - 1} Z^{ij}_{s,d} x^j_n $$

Let $k$ and $l$ be defined such that $k \leq m \leq l$ whenever $0 \leq s \leq L_{ij}/\alpha - 1$, and let $n$ be fixed. Our task is now to determine $d$, together with $k$ and $l$, such that $Z^{ij}_{s,d} = H^{ij}_{m,n}$. Therefore we put $s = 0$ and look for $k$, $d$ such that

$$ Z^{ij}_{0,d} = H^{ij}_{k,n} \tag{8.57} $$

In other words: for $n$ given, we want to know which vector to use and at which row it must be aligned.

Next we insert the definitions (8.55) and (8.56) in (8.57) and use (8.54) to obtain the equation

$$ v^{ij}_{\langle L_{ij} - d - 1 \rangle_{N_j}} = h^{ij}_{\langle n - \alpha k \rangle_{N_j}} = v^{ij}_{\langle n - \alpha k + \mu - 1 \rangle_{N_j}} $$

which is fulfilled whenever

$$ \langle L_{ij} - d - 1 \rangle_{N_j} = \langle n - \alpha k + \mu - 1 \rangle_{N_j} \quad \Leftrightarrow \quad \langle L_{ij} - d - n + \alpha k - \mu \rangle_{N_j} = 0 $$

Let $L = L_{ij} - n - \mu$. Then we can write the requirement as

$$ \langle L - d + \alpha k \rangle_{N_j} = 0 \tag{8.58} $$

Since we need $k$ in the interval $[0, N_i - 1]$ we rewrite (8.58) using Lemma B.2:

$$ 0 = \langle L - d + \alpha k \rangle_{N_j} = \alpha \langle (L - d + \alpha k)/\alpha \rangle_{N_j/\alpha} = \alpha \langle k + (L - d)/\alpha \rangle_{N_i} $$

from which we get

$$ k = \langle (d - L)/\alpha \rangle_{N_i} $$

For this to be well-defined $\alpha$ must be a divisor of $(d - L)$, i.e. $\langle d - L \rangle_{\alpha} = 0$, so we must choose

$$ d = \langle L \rangle_{\alpha} $$

Expanding the expression for $L$ we obtain the desired expressions

$$ d = \langle L_{ij} - n - \mu \rangle_{\alpha} \tag{8.59} $$
$$ k = \langle (d - L_{ij} + n + \mu)/\alpha \rangle_{N_i} \tag{8.60} $$

Finally, $l$ is obtained as

$$ l = \langle k + L_{ij}/\alpha - 1 \rangle_{N_i} \tag{8.61} $$

Let $Z^{ij}_{s,d}$ be defined by (8.56). Using (8.59), (8.60) and (8.61) we can formulate the algorithm as

Algorithm 8.3: $y^{ij} = H_{ij} x_j$, $i < j$

For $n = 0$ to $N_j - 1$
  if $|x^j_n| > \varepsilon$ then
    $d = \langle L_{ij} - n - \mu \rangle_{\alpha}$
    $k = \langle (d - L_{ij} + n + \mu)/\alpha \rangle_{N_i}$
    $l = \langle k + L_{ij}/\alpha - 1 \rangle_{N_i}$
    if $k \leq l$ then
      $y^{ij}_{k:l} = y^{ij}_{k:l} + x^j_n Z^{ij}_{:,d}$
    else (wrap)
      $y^{ij}_{0:l} = y^{ij}_{0:l} + x^j_n Z^{ij}_{L_{ij}/\alpha - l - 1 : L_{ij}/\alpha - 1,\, d}$
      $y^{ij}_{k:N_i-1} = y^{ij}_{k:N_i-1} + x^j_n Z^{ij}_{0 : L_{ij}/\alpha - l - 2,\, d}$
    end
  end
end
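A NumPy sketch of Algorithm 8.3 (the function name `block_matvec_upper` is ours), which tabulates the $Z$ columns via (8.56) and aligns them using (8.59)-(8.61):

```python
import numpy as np

def block_matvec_upper(v, x, N_i, mu, alpha, eps=1e-12):
    """Sketch of Algorithm 8.3: y^{ij} = H_ij x_j for blocks with i < j.

    v : nonzero part of the first row of H_ij, zero-padded so that its
        length L_ij is a multiple of alpha = N_j / N_i
    x : the block x_j (length N_j)
    """
    L = len(v)
    P = L // alpha                        # length of each column vector Z_{:,d}
    # column-oriented representation, eq. (8.56)
    Z = np.array([[v[L - alpha * m - d - 1] for d in range(alpha)]
                  for m in range(P)])
    y = np.zeros(N_i)
    for n, xn in enumerate(x):
        if abs(xn) <= eps:                # skip near-zero coefficients
            continue
        Lt = L - n - mu                   # the quantity L in (8.58)
        d = Lt % alpha                    # (8.59): which column vector to use
        k = ((d - Lt) // alpha) % N_i    # (8.60): row at which it is aligned
        l = (k + P - 1) % N_i            # (8.61): last row of the band
        col = Z[:, d]
        if k <= l:
            y[k:l + 1] += xn * col
        else:                             # wrap around the bottom of the block
            y[0:l + 1] += xn * col[P - l - 1:]
            y[k:N_i]   += xn * col[:P - l - 1]
    return y
```

Note that the integer division `(d - Lt) // alpha` is exact because $d \equiv L \pmod{\alpha}$ by construction.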

8.4.3 Algorithm

The full algorithm for computing (8.49) is

Algorithm 8.4: Matrix-vector multiplication (CIRMUL)

$y = 0$
For $j = 0$ to $\lambda$
  For $i = 0$ to $\lambda$
    If $i \geq j$ then
      $y_i = y_i + y^{ij}$ computed with Algorithm 8.2
    else
      $y_i = y_i + y^{ij}$ computed with Algorithm 8.3

8.4.4 Computational work

We are now ready to look at the computational work required for the matrix-vector multiplication $y = Hx$. We take (8.49) as the point of departure and start by considering the typical block product $H_{ij} x_j$. The length of $x_j$ is $N_j$, so for blocks on and below the diagonal of $H$ ($i \geq j$), Algorithm 8.2 performs $L_{ij} N_j$ multiplications and the same number of additions. Hence the work is $2 L_{ij} N_j$ floating point operations. For blocks above the diagonal ($j > i$), Algorithm 8.3 performs $N_j L_{ij}/\alpha = L_{ij} N_i$ multiplications and additions, so the work here is $2 L_{ij} N_i$. The total work can therefore be written as follows:

$$ \sum_{j=0}^{\lambda} 2 L_{jj} N_j + \sum_{i=1}^{\lambda} \sum_{j=0}^{i-1} 2 L_{ij} N_j + \sum_{j=1}^{\lambda} \sum_{i=0}^{j-1} 2 L_{ij} N_i $$

where the first sum corresponds to blocks on the diagonal, the second to blocks below the diagonal, and the third to blocks above the diagonal. Since $L_{ij} = L_{ji}$ we can swap the indices of the last double sum to find the identity

$$ \sum_{j=1}^{\lambda} \sum_{i=0}^{j-1} 2 L_{ij} N_i = \sum_{i=1}^{\lambda} \sum_{j=0}^{i-1} 2 L_{ij} N_j $$

Hence we may write the total work of the matrix-vector multiplication with the wavelet transform of a circulant matrix as

$$ F_{\mathrm{CIRMUL}} = 2 \sum_{j=0}^{\lambda} L_{jj} N_j + 4 \sum_{i=1}^{\lambda} \sum_{j=0}^{i-1} L_{ij} N_j \tag{8.62} $$
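Formula (8.62) is straightforward to evaluate once the $L_{ij}$ and $N_j$ are known. A small sketch (the function name `f_cirmul` is ours; the block bandwidths and block lengths are taken as inputs rather than computed here):

```python
def f_cirmul(Lmat, Nvec):
    """Evaluate the operation count (8.62).

    Lmat : (lam+1) x (lam+1) symmetric array of the bandwidths L_ij
    Nvec : the block lengths N_0, ..., N_lam
    """
    lam = len(Nvec) - 1
    diag = sum(Lmat[j][j] * Nvec[j] for j in range(lam + 1))
    below = sum(Lmat[i][j] * Nvec[j]
                for i in range(1, lam + 1) for j in range(i))
    # the factor 4 accounts for both the below- and above-diagonal blocks,
    # using the symmetry L_ij = L_ji
    return 2 * diag + 4 * below
```

The factor 4 is exactly the index-swap identity above; the function therefore agrees with the direct three-sum form (diagonal, below, above) whenever `Lmat` is symmetric.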

We recall that

$$ N_j = \begin{cases} N/2^{\lambda} & \text{for } j = 0 \\ N/2^{\lambda - j + 1} & \text{for } 1 \leq j \leq \lambda \end{cases} $$

and that $L_{ij}$ is given by the recurrence formulas

$$ L_{j-1,j-1} = \lceil L_{jj}/2 \rceil + D - 1, \qquad L_{i,j-1} = L_{ij} + 2^{i-j}(D - 1), \qquad L_{\lambda+1,\lambda+1} = L $$

(the bandwidth of the original matrix $A$).

Tables 8.2, 8.3, and 8.4 show $F_{\mathrm{CIRMUL}}$ evaluated for various values of $L$, $D$, $\lambda$, and $N$.

$\lambda = 3$, $L = 5$

  N      D = 2    D = 4    D = 6     D = 8
  32     784      1472     1808      2064
  64     1568     3072     4576      5792
  128    3136     6144     9280      12288
  256    6272     12288    18560     24576
  512    12544    24576    37120     49152
  1024   25088    49152    74240     98304
  2048   50176    98304    148480    196608

Table 8.2: The number of floating point operations $F_{\mathrm{CIRMUL}}$ as a function of $N$ for different values of $D$.

$N = 256$, $D = 4$

  λ    L = 3    L = 4    L = 5    L = 6
  0    1536     2048     2560     3072
  1    5120     5120     6144     6144
  2    8448     8448     9216     9216
  3    11520    11520    12288    12288
  4    14592    14592    15360    15360
  5    17664    17664    18432    18432

Table 8.3: The number of floating point operations $F_{\mathrm{CIRMUL}}$ shown for different values of $\lambda$ and $L$. Note that $L = 2k - 1$ and $L = 2k$ ($k \in \mathbb{N}$) yield the same values for $\lambda > 0$. This is a direct consequence of the rounding $\lceil \cdot \rceil$ in the recurrence formulas for $L_{ij}$.

$N = 256$, $L = 5$

  λ    D = 2   D = 4    D = 6    D = 8
  0    2560    2560     2560     2560
  1    4096    6144     8192     10240
  2    5120    9216     13312    17408
  3    6272    12288    18560    24576
  4    7360    15360    23680    31808
  5    8416    18432    28736    38336

Table 8.4: The number of floating point operations $F_{\mathrm{CIRMUL}}$ shown for different values of $\lambda$ and $D$.

Table 8.2 shows that $F_{\mathrm{CIRMUL}}$ depends linearly on $N$. Moreover, Tables 8.3 and 8.4 show that the computational work grows with the bandwidth $L$, the wavelet genus $D$, and the transform depth $\lambda$. Suppose that $L$ is given. Then we see that the matrix-vector multiplication is most efficient if we take no steps of the wavelet transform ($\lambda = 0$). Consequently, any work reduction must be sought in truncation of the vector $x$. This can be justified because $x$ will often be a 1D wavelet transform of the same depth ($\lambda$) as the matrix $H$. Therefore, depending on $\lambda$ and the underlying application, we expect many elements of $x$ to be close to zero, so we may be able to discard them in order to reduce the computational work. Consider Table 8.3. If we take $N = 256$, $L = 3$, $\lambda = 4$ as an example, the question boils down to whether such a truncation of $x$ can reduce the 14592 operations to less than the 1536 operations required for $\lambda = 0$ (no transform). Assuming that the work depends linearly on the number of non-zero elements in $x$, this means that $x$ must be reduced by at least a factor of $14592/1536 = 9.5$, i.e. roughly 10, before any work reduction is obtained.
