
What is the WHT anyway, and why are there so many ways to compute it?

Jeremy Johnson

1, 2, 6, 24, 112, 568, 3032, 16768, …


Walsh-Hadamard Transform
• y = WHT_N x, N = 2^n

  WHT_N = WHT_2 ⊗ … ⊗ WHT_2   (n factors),    WHT_2 = [ 1  1 ]
                                                      [ 1 -1 ]

  WHT_4 = WHT_2 ⊗ WHT_2

        = [ 1  1 ] ⊗ [ 1  1 ]  =  [ 1  1  1  1 ]
          [ 1 -1 ]   [ 1 -1 ]     [ 1 -1  1 -1 ]
                                  [ 1  1 -1 -1 ]
                                  [ 1 -1 -1  1 ]
WHT Algorithms
• Factor WHT_N into a product of sparse structured matrices
• Compute y = (M_1 M_2 … M_t) x in t stages, right to left:

  y_t = M_t x
  …
  y_2 = M_2 y_3
  y   = M_1 y_2
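The staged evaluation above can be sketched in code (the helper names here are illustrative, not part of any package): the factored form is applied right to left, so only matrix-vector products are ever formed.

```python
# Hypothetical sketch: evaluate y = (M1 M2 ... Mt) x right to left,
# one matrix-vector product per stage.
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def apply_factored(factors, x):
    y = x
    for M in reversed(factors):   # y_t = Mt x, ..., y = M1 y_2
        y = matvec(M, y)
    return y

# WHT4 factored as on the next slide: (WHT2 (x) I2)(I2 (x) WHT2)
M1 = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, -1, 0], [0, 1, 0, -1]]  # WHT2 (x) I2
M2 = [[1, 1, 0, 0], [1, -1, 0, 0], [0, 0, 1, 1], [0, 0, 1, -1]]  # I2 (x) WHT2
WHT4 = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]

x = [1, 2, 3, 4]
assert apply_factored([M1, M2], x) == matvec(WHT4, x)
```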
Factoring the WHT Matrix
• AC ⊗ BD = (A ⊗ B)(C ⊗ D)
• A ⊗ B = (A ⊗ I)(I ⊗ B)
• A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C
• I_m ⊗ I_n = I_mn

  WHT_4 = [ 1  1  1  1 ]   [ 1  0  1  0 ] [ 1  1  0  0 ]
          [ 1 -1  1 -1 ] = [ 0  1  0  1 ] [ 1 -1  0  0 ]
          [ 1  1 -1 -1 ]   [ 1  0 -1  0 ] [ 0  0  1  1 ]
          [ 1 -1 -1  1 ]   [ 0  1  0 -1 ] [ 0  0  1 -1 ]

  WHT_2 ⊗ WHT_2 = (WHT_2 ⊗ I_2)(I_2 ⊗ WHT_2)
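The identities are easy to check numerically. The sketch below (helper names are illustrative) verifies the WHT_4 factorization with a plain-Python Kronecker product.

```python
# Hypothetical helpers for checking the Kronecker identities numerically.
def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in rowA for b in rowB] for rowA in A for rowB in B]

def matmul(A, B):
    """Ordinary matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

WHT2 = [[1, 1], [1, -1]]
I2 = [[1, 0], [0, 1]]

# A (x) B = (A (x) I)(I (x) B), instantiated for WHT4:
assert kron(WHT2, WHT2) == matmul(kron(WHT2, I2), kron(I2, WHT2))
```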


Recursive and Iterative Factorization

WHT_8 = (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_4)
      = (WHT_2 ⊗ I_4)(I_2 ⊗ ((WHT_2 ⊗ I_2)(I_2 ⊗ WHT_2)))
      = (WHT_2 ⊗ I_4)(I_2 ⊗ (WHT_2 ⊗ I_2))(I_2 ⊗ (I_2 ⊗ WHT_2))
      = (WHT_2 ⊗ I_4)(I_2 ⊗ (WHT_2 ⊗ I_2))((I_2 ⊗ I_2) ⊗ WHT_2)
      = (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_2 ⊗ I_2)((I_2 ⊗ I_2) ⊗ WHT_2)
      = (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_2 ⊗ I_2)(I_4 ⊗ WHT_2)


WHT_8 = (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_2 ⊗ I_2)(I_4 ⊗ WHT_2)

  [ 1  1  1  1  1  1  1  1 ]
  [ 1 -1  1 -1  1 -1  1 -1 ]
  [ 1  1 -1 -1  1  1 -1 -1 ]
  [ 1 -1 -1  1  1 -1 -1  1 ]
  [ 1  1  1  1 -1 -1 -1 -1 ]
  [ 1 -1  1 -1 -1  1 -1  1 ]
  [ 1  1 -1 -1 -1 -1  1  1 ]
  [ 1 -1 -1  1 -1  1  1 -1 ]

where the three sparse factors are, in block form,

  WHT_2 ⊗ I_4 = [ I_4  I_4 ]      I_2 ⊗ WHT_2 ⊗ I_2 = [ I_2  I_2           ]
                [ I_4 -I_4 ]                          [ I_2 -I_2           ]
                                                      [           I_2  I_2 ]
                                                      [           I_2 -I_2 ]

  I_4 ⊗ WHT_2 = diag(WHT_2, WHT_2, WHT_2, WHT_2)
WHT Algorithms
• Recursive

    WHT_N = (WHT_2 ⊗ I_{N/2})(I_2 ⊗ WHT_{N/2})

• Iterative

    WHT_N = ∏_{i=1}^{n} (I_{2^{i-1}} ⊗ WHT_2 ⊗ I_{2^{n-i}})

• General

    WHT_{2^n} = ∏_{i=1}^{t} (I_{2^{n_1+…+n_{i-1}}} ⊗ WHT_{2^{n_i}} ⊗ I_{2^{n_{i+1}+…+n_t}}),

  where n = n_1 + … + n_t
WHT Implementation
• Definition/formula
  – N = N_1 N_2 … N_t, N_i = 2^{n_i}

      WHT_{2^n} = ∏_{i=1}^{t} (I_{2^{n_1+…+n_{i-1}}} ⊗ WHT_{2^{n_i}} ⊗ I_{2^{n_{i+1}+…+n_t}})

  – x = WHT_N · x, where x^M_{b,s} = (x(b), x(b+s), …, x(b+(M-1)s))
• Implementation (nested loop)

    R = N; S = 1;
    for i = t, …, 1
      R = R / N_i
      for j = 0, …, R-1
        for k = 0, …, S-1
          x^{N_i}_{jN_iS+k, S} = WHT_{N_i} · x^{N_i}_{jN_iS+k, S}
      S = S * N_i
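Specializing the nested loop above to radix-2 factors (N_i = 2 for every i) gives the following in-place sketch; the function name is illustrative and the code is not taken from the WHT package.

```python
# Sketch of the nested loop with N_i = 2 for all i: each pass applies
# I_{2^{i-1}} (x) WHT_2 (x) I_{2^{n-i}} as strided 2-point butterflies.
def wht(x):
    N = len(x)                        # N = 2^n
    S = 1                             # stride of the current factor
    while S < N:
        for j in range(N // (2 * S)):     # outer loop over blocks (R = N/(2S))
            for k in range(S):            # inner loop over strides
                b = 2 * j * S + k
                a0, a1 = x[b], x[b + S]
                x[b], x[b + S] = a0 + a1, a0 - a1   # WHT_2 butterfly
        S *= 2
    return x

assert wht([1, 2, 3, 4, 5, 6, 7, 8]) == [36, -4, -8, 0, -16, 0, 0, 0]
```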
Partition Trees
[Figure: a partition tree labels its root with n and each internal node's
children with an ordered partition of that node's label; the example shows
a tree for 9 with children 3, 4, 2. For n = 4:

  Left recursive:  4 → (3 → (2 → (1, 1), 1), 1)
  Right recursive: 4 → (1, 3 → (1, 2 → (1, 1)))
  Balanced:        4 → (2 → (1, 1), 2 → (1, 1))
  Iterative:       4 → (1, 1, 1, 1)]
1 1 1 1 1 1 1 1
Ordered Partitions
• There is a 1-1 mapping from ordered partitions of n onto (n-1)-bit
  binary numbers.
  ⇒ There are 2^{n-1} ordered partitions of n.

  162 = 1 0 1 0 0 0 1 0
  1 | 1 1 | 1 1 1 1 | 1 1  →  1 + 2 + 4 + 2 = 9
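The bijection can be sketched directly (the helper name is illustrative): each of the n-1 bits marks, in a row of n ones read left to right, whether a bar is placed in the corresponding gap.

```python
# Sketch of the bijection between (n-1)-bit numbers and ordered partitions
# of n: bit = 1 in a gap closes the current part and starts a new one.
def partition_from_bits(bits, n):
    parts, size = [], 1
    for i in range(n - 1):
        if (bits >> (n - 2 - i)) & 1:   # read bits left to right (MSB first)
            parts.append(size)
            size = 1
        else:
            size += 1
    parts.append(size)
    return parts

# 162 = 10100010 in binary gives the partition 1 + 2 + 4 + 2 of n = 9
assert partition_from_bits(162, 9) == [1, 2, 4, 2]
```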
Enumerating Partition Trees
[Figure: the six partition trees of n = 3, indexed by the 2-bit codes of
their root partitions — 00: the leaf 3; 01 (twice): the split 2+1 with the
2 either a leaf or split into 1+1; 10 (twice): the split 1+2, likewise;
11: the split 1+1+1.]
Counting Partition Trees

  T_n = 1 + Σ_{t>1, n_1+…+n_t=n} T_{n_1} … T_{n_t},  n > 1;   T_1 = 1

  T(z) = Σ_{n≥1} T_n z^n = z + 2z^2 + 6z^3 + 24z^4 + …

  T(z) = z/(1-z) + T(z)^2/(1-T(z))
       ⇒ T(z) = (1 - √(1 - 8z + 8z^2)) / (4(1-z))

  ⇒ T_n = Θ(α^n / n^{3/2}),  α ≈ 6.8
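The recurrence can be evaluated directly; the sketch below (helper names are illustrative) reproduces the sequence from the title slide.

```python
# Sketch: count partition trees T_n from the recurrence. f[m] is the sum,
# over all ordered partitions of m (t >= 1 parts), of the product of the
# T-values of the parts; a split with t >= 2 parts is "first part k,
# followed by any ordered partition of n - k".
def count_trees(nmax):
    T = [0] * (nmax + 1)
    f = [0] * (nmax + 1)
    f[0] = 1
    T[1] = f[1] = 1
    for n in range(2, nmax + 1):
        T[n] = 1 + sum(T[k] * f[n - k] for k in range(1, n))      # leaf + splits
        f[n] = sum(T[k] * f[n - k] for k in range(1, n + 1))
    return T[1:]

assert count_trees(8) == [1, 2, 6, 24, 112, 568, 3032, 16768]
```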
WHT Package
Püschel & Johnson (ICASSP ’00)
• Allows easy implementation of any of the possible WHT algorithms
• Partition tree representation
  W(n) = small[n] | split[W(n1), …, W(nt)]
• Tools
  – Measure runtime of any algorithm
  – Measure hardware events (coupled with PCL)
  – Search for good implementations
    • Dynamic programming
    • Evolutionary algorithm
Histogram (n = 16, 10,000 samples)
• Wide range in performance despite equal number of arithmetic
  operations (n·2^n flops)
• Pentium III consumes more run time (more pipeline stages)
• UltraSPARC II spans a larger range
Operation Count
Theorem. Let W_N be a WHT algorithm of size N. Then the number of
floating point operations (flops) used by W_N is N lg(N).

Proof. By induction on the partition tree:

  flops(W_N) = Σ_{i=1}^{t} 2^{n-n_i} flops(W_{N_i})
             = Σ_{i=1}^{t} 2^{n-n_i} n_i 2^{n_i}
             = 2^n Σ_{i=1}^{t} n_i = n 2^n
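The inductive step can be exercised mechanically: the sketch below (the tree encoding is illustrative) evaluates the flops recurrence on several partition trees of size 4 and checks that each yields n·2^n.

```python
# Sketch: a tree is either an int n (a leaf of size n, costing n * 2^n
# flops) or a list of subtrees whose sizes sum to n. The recurrence
# flops(W_N) = sum_i 2^(n - n_i) * flops(W_{N_i}) is evaluated directly.
def size(tree):
    return tree if isinstance(tree, int) else sum(size(c) for c in tree)

def flops(tree):
    if isinstance(tree, int):
        return tree * 2 ** tree
    n = size(tree)
    return sum(2 ** (n - size(c)) * flops(c) for c in tree)

# iterative, balanced, and right-recursive trees of size 4 all give 4 * 2^4
for t in ([1, 1, 1, 1], [[1, 1], [1, 1]], [1, [1, [1, 1]]]):
    assert flops(t) == 4 * 2 ** 4
```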
Instruction Count Model

  IC(n) = α·A(n) + Σ_{i=1}^{3} β_i L_i(n) + Σ_{l} α_l A_l(n)

A(n)   = number of calls to WHT procedure
α      = number of instructions outside loops
A_l(n) = number of calls to base case of size l
α_l    = number of instructions in base case of size l
L_i(n) = number of iterations of outer (i=1), middle (i=2), and
         inner (i=3) loop
β_i    = number of instructions in outer (i=1), middle (i=2), and
         inner (i=3) loop body
Small[1]
.file "s_1.c"
.version "01.01"
gcc2_compiled.:
.text
.align 4
.globl apply_small1
.type apply_small1,@function
apply_small1:
movl 8(%esp),%edx    // load stride S into EDX
movl 12(%esp),%eax   // load x array's base address into EAX
fldl (%eax)          // st(0)=R7=x[0]
fldl (%eax,%edx,8)   // st(0)=R6=x[S]
fld %st(1)           // st(0)=R5=x[0]
fadd %st(1),%st      // R5=x[0]+x[S]
fxch %st(2)          // st(0)=R5=x[0], st(2)=R7=x[0]+x[S]
fsubp %st,%st(1)     // st(0)=R6=x[0]-x[S] (gas swaps the fsub/fsubr operand order)
fxch %st(1)          // st(0)=R7=x[0]+x[S], st(1)=R6=x[0]-x[S]
fstpl (%eax)         // store x[0]=x[0]+x[S]
fstpl (%eax,%edx,8)  // store x[S]=x[0]-x[S]
ret
Recurrences

  A(n) = 1 + Σ_{i=1}^{t} 2^{n-n_i} A(n_i),  n = n_1 + … + n_t

  A(n) = 0,  n a leaf

  A_l(n) = ν_l 2^{n-l},  where ν_l = number of leaves of size l
Recurrences

  L_1(n) = t + Σ_{i=1}^{t} 2^{n-n_i} L_1(n_i),  n = n_1 + … + n_t

  L_2(n) = Σ_{i=1}^{t} (2^{n-n_i} L_2(n_i) + 2^{n_1+…+n_{i-1}}),  n = n_1 + … + n_t

  L_3(n) = Σ_{i=1}^{t} (2^{n-n_i} L_3(n_i) + 2^{n-n_i}),  n = n_1 + … + n_t

  L_i(n) = 0,  n a leaf
Histogram using Instruction Model (P3)

  α = 27
  α_1 = 12, α_2 = 34, and α_3 = 106
  β_1 = 18, β_2 = 18, and β_3 = 20
Algorithm Comparison
[Plots of cost ratios vs. WHT size 2^n, n = 1…22:
 Recursive/Iterative runtime and Recursive & Balanced/Iterative
 instruction count (series r1/i1, rr1/i1, lr1/i1, bal1/i1);
 Recursive & Iterative/Best runtime (series r1/b, r3/b, i1/b, i3/b, b/b);
 Small/Iterative runtime (series I_1/rt, r_1/rt).]
Dynamic Programming

  T_n = min_{n_1+…+n_t=n} Cost(T_{n_1}, …, T_{n_t}),

where T_n is the optimal tree of size n.

This depends on the assumption that Cost only depends on the size of a
tree and not on where it is located (true for IC, but false for runtime).

For IC, the optimal tree is iterative with appropriate leaves.

For runtime, DP is a good heuristic (used with binary trees).
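Under that size-only assumption, the DP search can be sketched as follows; the cost parameters and function names are hypothetical stand-ins for a measured cost (IC model or runtime), not the WHT package's actual interface.

```python
# Hypothetical sketch of dynamic programming over partition trees.
def compositions(n):
    """Ordered partitions of n into t >= 2 parts, via (n-1)-bit numbers."""
    for bits in range(1, 2 ** (n - 1)):   # bits = 0 would be the single part n
        parts, size = [], 1
        for i in range(n - 1):
            if (bits >> i) & 1:
                parts.append(size)
                size = 1
            else:
                size += 1
        parts.append(size)
        yield parts

def search(nmax, leaf_cost, split_overhead):
    """best[n] = (cost, tree); assumes cost depends only on subtree size."""
    best = {}
    for n in range(1, nmax + 1):
        cands = []
        if n in leaf_cost:                    # small[n] candidate
            cands.append((leaf_cost[n], n))
        for parts in compositions(n):         # split[...] candidates
            c = split_overhead + sum(2 ** (n - p) * best[p][0] for p in parts)
            cands.append((c, [best[p][1] for p in parts]))
        best[n] = min(cands, key=lambda t: t[0])
    return best

# Sanity check: with leaf cost 2 flops for small[1] and no split overhead,
# every tree costs exactly n * 2^n (the operation count theorem), so the
# minimum must equal n * 2^n as well.
best = search(6, {1: 2}, 0)
assert all(best[n][0] == n * 2 ** n for n in range(1, 7))
```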
