We now turn to the problem of computing the matrix-vector product y
=
Hxwherex y 2 RN andH is given as in (8.25). This system has the form shown in Figure 8.15.
H
00H
01H
02H
03H
11H
12H
13H
22H
23H
33H
10H
20H
30H
21H
31H
32H
x
3x
2x
1x
0x
=
y
3y
2y
1y
0y
Figure 8.15:The structure ofy=Hxfor
=3.The vector
y
may be computed block-wise as followsyi
=
Xj=0
H
ij
xj
i = 0 1 :::
(8.49)where
is the depth of the wavelet transform. The symbols xj, yi, and Hij denote the different blocks of the x, y, and H as indicated in Figure 8.6. The computation in (8.49) is thus broken down into the tasks of computing the prod-ucts which we will denote byy
ij
=
Hijxji j = 0 1 :::
;1
(8.50)In the following we distinguish between blocks on or below the diagonal (
i
j
)and blocks above the diagonal (
i < j
).8.4.1 Blocks on or below the diagonal
Let vij,
i
j
, be the vector of lengthL
ij representing the nonzero part of the first column ofHij, i.e.h
ijm=
(
v
hijm+;1iN iforh
m +
;1
iNi 20 L
ij;1]
0
otherwiseand
H
mnij= h
ijhm;niN iwhere
m = 0 1 ::: N
i,= N
i=N
j, andis the offset relative to the upper left element (see (8.45)). From equation (8.50) we see that the typical element ofy
ijcan be computed column wise as
y
mij=
Nj
;1
X
n=0
H
mnijx
jn=
Nj
;1
X
n=0
h
ijhm;niN ix
jn=
Nj
;1
X
n=0
v
hijm;n+;1 iNi
x
jn (8.51)For each
n
this computation is only valid for thosem
20 N
i ;1]
wherev
hijm;n+;1 iN i
is defined, namely where
0
hm
;n +
;1
iNiL
ij;1
(8.52)Let
k
andl
be defined such thatk
m
l
whenever (8.52) is satisfied. Then we can findk
from the requirementh
k
;n +
;1
iNi= 0
,k =
hn
;+ 1
iNi and the last row asl =
hk + L
ij ;1
iNi. Lettingy
kij:l= y
kijy
kij+1::: y
lij]
then we can write the computation (8.51) compactly as
y
kij:l= y
kij:l+ x
jnv
0:ijLij;1n = 0 1 ::: N
j;1
(8.53)When
k > l
the band is wrapped and (8.53) must be modified accordingly.If the vector
x
is a wavelet spectrum then many of its elements are normally close to zero as described in Chapter 4. Therefore we will design the algorithm to disregard computations involving elements inx
j where
x
jn< "
The algorithm is given below
Algorithm 8.2:
yij =Hijxj, i
j
For
n = 0
toN
j;1
ifj
x
jnj> "
thenk =
hn
;+ 1
iNil =
hk + L
ij;1
iNi ifk < l
theny
kij:l= y
kij:l+ x
jnv
ijelse (wrap)
y
0:ijl= y
0:ijl+ x
jnv
Lijij;l:L;1y
kij:Ni;1= y
ijk:Ni;1+ x
nv
0:ijLij;l;1end end end
8.4.2 Blocks above the diagonal
Let
v
ij,i < j
, be the vector of lengthL
ij representing the nonzero part of the first row ofHij, i.e.h
ijn=
(
v
hijn+;1iN jforh
n +
;1
iNj 20 L
ij;1]
0
otherwise (8.54)where
H
mnij= h
ijhn;miN j
(8.55) with
= N
j=N
i. An example (shown forH13) is2
6
6
6
6
6
6
6
6
6
6
4
v2v3 v4v5v6v7 v8v9v100 v0v1
v0v1v2v3 v4v5v6v7 v8v9v100
v0v1v2v3 v4v5v6v7 v8v9v100
v0v1v2v3 v4v5v6v7 v8v9v100
v0v1v2v3 v4v5v6v7 v8v9v100
v0v1v2v3 v4v5v6v7 v8v9v100
v100 v0v1v2v3 v4v5v6v7 v8v9
v6v7 v8v9v100 v0v1v2v3 v4v5
3
7
7
7
7
7
7
7
7
7
7
5
Here
= 4 = 3
,N
1= 8
,N
3= 32
,L
13= 12
(padded with one zero). The superscriptsi j
has been dropped in this example.In order to be able to disregard small elements in
x
as in the previous section, we would like to convert the row representation to a column oriented format. As indicated in the example above, the blockHij can be characterized completely by column vectors each with lengthL
ij=
together with some book-keeping information. We will assume thatL
ij is a multiple of, possibly obtained through padding with zeros as suggested in the example above.Now we choose these vectors from the columns
L
ij ;L
ij ;;1 ::: L
ij;;+ 1
which are the
last columns where the top row is non-zero: the shaded area in the example above. Let these column vectors be denotedZ
:ijd ford = 0 1 :::
;1
.From (8.54) and (8.55) we get
Z
mdij= H
mLij ij;;d= h
ijhLij;;d;miN j= v
hijLij;;d;m+;1iN j= v
hijLij;m;d;1iN j(8.56) for
m = 0 1 ::: L
ij=
;1
andd = 0 1 :::
;1
. In the example above we haveZ
:ij0=
0
@
v 0
7v
31
A
Z
:ij1=
0
@
v
10v
6v
21
A
Z
:ij2=
0
@
v
9v
5v
11
A
Z
:ij3=
0
@
v
8v
4v
01
A
Equation (8.50) can then be computed columnwise using
Z
:ijd instead ofHij. The typical element ofy
ij isy
mij=
Nj
;1
X
n=0
H
mnijx
jn=
Nj
;1
X
n=0
Z
sdijx
jnLet
k
andl
be defined such thatk
m
l
whenever0
s
L
ij=
;1
andlet
n
be fixed. Our task is now to determined
, together withk
andl
, such thatZ
sdij= H
mn. Therefore we putijs = 0
and look fork
,d
such thatZ
0ijd= H
knij (8.57)In other words: For
n
given, we want to know which vector to use and at which row it must be aligned.Next we insert the definitions (8.55) and (8.56) in (8.57) and use (8.54) to obtain the equation
v
hijLij;d;1 iN
j
= h
ijhn;kiN
j
= v
hijn;k+;1 iN j
which is fulfilled whenever
h
L
ij;d
;1
iNj=
hn
;k +
;1
iNj ,h
L
ij;d
;n + k
;iNj= 0
Let
L = L
ij;n
;. Then we can write the requirement ash
L
;d + k
iNj= 0
(8.58)Since we need
k
in the interval0 N
i;1]
we rewrite (8.58) using Lemma B.2:0 =
hL
;d + k
iNj=
h(L
;d + k)=
iNj==
hk + (L
;d)=
iNi from which we getk =
h(d
;L)=
iNiFor this to be well-defined
must be a divisor in(d
;L)
, i.e.hd
;L
i= 0
so wemust choose
d =
hL
iExpanding the expression for
L
we obtain the desired expressionsd =
hL
ij;n
;i (8.59)k =
h(d
;L
ij+ n + )=
iNi (8.60) Finally,l
is obtained asl =
hk + L
ij=
;1
iNi (8.61) LetZ
sd be defined by (8.56). Using (8.59), (8.60) and (8.61) we can formulateij the algorithm asAlgorithm 8.3:
yij = Hijxj, i < j
For
n = 0
toN
j;1
ifj
x
nj> "
thend =
hL
ij;n
;ik =
h(d
;L
ij+ n + )=
iNil =
hk + L
ij=
;1
iNi ifk < l
theny
ijk:l= y
kij:l+ x
jnZ
:ijdelse (wrap)
y
ij0:l= y
0:ijl+ x
jnZ
Lijij=;l:Lij=;1dy
ijk:Ni;1= y
ijk:Ni;1+ x
jnZ
0:ijLij=;l;1dend end end
8.4.3 Algorithm
The full algorithm for computing (8.49) is
Algorithm 8.4: Matrix-vector multiplication (CIRMUL)
y
= 0
For
j = 0
toFor
i = 0
toIf
i
j
thenyi
=
yi+
yij computed with Algorithm 8.2 elseyi
=
yi+
yij computed with Algorithm 8.38.4.4 Computational work
We are now ready to look at the computational work required for the matrix-vector multiplicationy
=
Hx. We take (8.49) as the point of departure and start by considering the typical blockH
ij
x
j
The length ofxj is
N
j so for blocks on and below the diagonal ofH (i
j
) inAlgorithm 8.2 there are
L
ijN
j multiplications and the same number of additions.Hence the work is
2L
ijN
jfloating point operations. For blocks above the diagonal (
j > i
) in Algorithm 8.3 there areN
jL
ij= = L
ijN
i multiplications and additions, so the work here is2L
ijN
iThe total work can therefore be written as follows
X
j=0
2L
jjN
j+
Xi=1 i;1
X
j=0
2L
ijN
j+
Xj=1 j;1
X
i=0
2L
ijN
iwhere the first sum corresponds to blocks on the diagonal, the second to blocks below the diagonal, and the third to blocks above the diagonal. Since
L
ij= L
jiwe can swap the indices of the last double sum to find the identity
X
j=1 j;1
X
i=0
2L
ijN
i=
Xi=1 i;1
X
j=0
2L
ijN
jHence we may write the total work of the matrix-vector multiplication with the wavelet transform of a circulant matrix as
F
CIRMUL= 2
Xj=0
L
jjN
j+ 4
Xi=1 i;1
X
j=0
L
ijN
j (8.62)We recall that
N
j=
N=2
forj = 0
N=2
;j+1 for1
j
and that
L
ij is given by the recurrence formulasL
j;1j;1=
dL
jj=2
e+ D
;1 L
ij;1= L
ij+ 2
i;j(D
;1)
L
+1+1= L
(the bandwidth of the original matrixA
)Tables 8.2, 8.3, and 8.4 show
F
CIRMULevaluated for various values ofL
,D
,, andN
.= 3 L = 5
N D = 2 D = 4 D = 6 D = 8 32 784 1472 1808 2064 64 1568 3072 4576 5792 128 3136 6144 9280 12288 256 6272 12288 18560 24576 512 12544 24576 37120 49152 1024 25088 49152 74240 98304 2048 50176 98304 148480 196608
Table 8.2: The number of floating point operations
F
CIRMUL as a function ofN
fordifferent values of
D
.N = 256 D = 4
L = 3 L = 4 L = 5 L = 6 0 1536 2048 2560 3072 1 5120 5120 6144 6144 2 8448 8448 9216 9216 3 11520 11520 12288 12288 4 14592 14592 15360 15360 5 17664 17664 18432 18432
Table 8.3: The number of floating point operations
F
CIRMULshown for different values ofandL
. Note thatL
=2k
andL
=2k
+1(k
2N) yield the same values for>
0.This is a direct consequence of the rounding done in (8.62).
N = 256 L = 5
D = 2 D = 4 D = 6 D = 8 0 2560 2560 2560 2560 1 4096 6144 8192 10240 2 5120 9216 13312 17408 3 6272 12288 18560 24576 4 7360 15360 23680 31808 5 8416 18432 28736 38336
Table 8.4: The number of floating point operations
F
CIRMULshown for different values ofandD
.Table 8.2 shows that
F
CIRMUL depends linearly onN
. Moreover, Tables 8.3 and 8.4 show that the computational work grows with the bandwidthL
, the wavelet genusD
, and the transform depth. Suppose thatL
is given. Then we see that the matrix-vector multiplication is most efficient if we take no steps of the wavelet transform (= 0
). Consequently, any work reduction must be sought in trunca-tion of the vector x. This can be justified because xwill often be a 1D wavelet transform of the same depth () as the matrixH. Therefore, depending onandthe underlying application, we expect many elements in xto be close to zero so we may be able to discard them in order to reduce the computational work. Con-sider Table 8.3. If we take