Compression Outline
Introduction:
Lossless vs. lossy
Model and coder
Benchmarks
Information Theory: Entropy, etc.
Probability Coding: Huffman + Arithmetic Coding
Applications of Probability Coding: PPM + others
Lempel-Ziv Algorithms: LZ77, gzip, compress, ...
Other Lossless Algorithms: Burrows-Wheeler
Lossy algorithms for images: JPEG, MPEG, ...
Compressing graphs and meshes: BBK
Encoding/Decoding
Will use "message" in the generic sense to mean the data
to be compressed.
[Diagram: Input Message → Encoder → Compressed Message → Decoder → Output Message]
There are $2^0 + 2^1 + \cdots + 2^n = 2^{n+1} - 1$ distinct binary strings of length at most $n$ (including the empty string).
[Diagram: Message → Model → Probs. → Coder → Compressed Message]
Example models:
Simple: Character counts, repeated strings
Complex: Models of a human face
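The simplest models above can be stated in a few lines of code. A minimal sketch (Python, not from the slides) of a character-count model that turns a message into the probabilities handed to the coder; the example message is made up:

```python
from collections import Counter

def char_count_model(message):
    """Simple model: probability of a character = its relative frequency."""
    counts = Counter(message)
    total = len(message)
    return {ch: n / total for ch, n in counts.items()}

# Hypothetical example message
probs = char_count_model("abracadabra")
# {'a': 5/11, 'b': 2/11, 'r': 2/11, 'c': 1/11, 'd': 1/11}
```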
Quality of Compression
Runtime vs. Compression vs. Generality
Several standard corpora are used to compare algorithms,
e.g. the Calgary Corpus:
2 books, 5 papers, 1 bibliography,
1 collection of news articles, 3 programs,
1 terminal session, 2 object files,
1 geophysical data set, 1 black-and-white bitmap image
The Archive Comparison Test (ACT) maintains a
comparison of just about all publicly available algorithms.
Comparison of Algorithms
Program   Score
RK        430
BOA       407
PPMD      265
IMP       254
BZIP      273
GZIP      318
LZ77      ?
Compression Outline
Introduction: Lossy vs. Lossless, Benchmarks,
Information Theory:
Entropy
Conditional Entropy
Entropy of the English Language
Probability Coding: Huffman + Arithmetic Coding
Applications of Probability Coding: PPM + others
Lempel-Ziv Algorithms: LZ77, gzip, compress, ...
Other Lossless Algorithms: Burrows-Wheeler
Lossy algorithms for images: JPEG, MPEG, ...
Compressing graphs and meshes: BBK
Information Theory
An interface between modeling and coding
Entropy
A measure of information content
Conditional Entropy
Information content based on a context
Entropy of the English Language
How much information does each character in
typical English text contain?
Entropy
Self information of a message s:  $i(s) = \log_2 \frac{1}{p(s)} = -\log_2 p(s)$
Entropy of a message set S:  $H(S) = \sum_{s \in S} p(s) \log_2 \frac{1}{p(s)}$
Entropy Example
$p(S) = \{.25, .25, .25, .25\}$
$H(S) = 4 \cdot .25 \cdot \log_2 4 = 2$

$p(S) = \{.5, .25, .125, .125\}$
$H(S) = .5 \log_2 2 + .25 \log_2 4 + 2 \cdot .125 \log_2 8 = 1.75$

$p(S) = \{.75, .125, .0625, .0625\}$
$H(S) = .75 \log_2(4/3) + .125 \log_2 8 + 2 \cdot .0625 \log_2 16 \approx 1.2$
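A quick numerical check of these three values (a sketch; logs are base 2):

```python
import math

def entropy(probs):
    """H(S) = sum over s of p(s) * log2(1 / p(s))."""
    return sum(p * math.log2(1 / p) for p in probs)

print(entropy([.25, .25, .25, .25]))       # 2.0
print(entropy([.5, .25, .125, .125]))      # 1.75
print(entropy([.75, .125, .0625, .0625]))  # 1.186... (about 1.2)
```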
Conditional Entropy
The conditional probability p(s|c) is the probability of s
in a context c. The conditional self information is
$i(s|c) = -\log_2 p(s|c)$
The conditional information can be either more or less
than the unconditional information.
The conditional entropy is the weighted average of the
conditional self information
$H(S|C) = \sum_{c \in C} p(c) \sum_{s \in S} p(s|c) \log_2 \frac{1}{p(s|c)}$
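A small sketch of this weighted average in Python; the context probabilities p(c) and conditional probabilities p(s|c) below are made-up numbers for illustration:

```python
import math

def conditional_entropy(p_c, p_s_given_c):
    """H(S|C) = sum_c p(c) * sum_s p(s|c) * log2(1 / p(s|c))."""
    return sum(
        p_c[c] * sum(p * math.log2(1 / p) for p in p_s_given_c[c].values() if p > 0)
        for c in p_c
    )

# Hypothetical two-context example
p_c = {"c1": 0.6, "c2": 0.4}
p_s_given_c = {"c1": {"a": 0.9, "b": 0.1}, "c2": {"a": 0.5, "b": 0.5}}
print(conditional_entropy(p_c, p_s_given_c))  # about 0.68 bits
```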
[Figure: a two-state model of black (b) and white (w) pixels with transition probabilities p(w|w), p(b|w), p(w|b), p(b|b)]
Shannon's experiment
Asked humans to predict the next character given
the whole previous text. He used these as
conditional probabilities to estimate the entropy
of the English language.
The number of guesses required for the right answer:

# of guesses   1    2    3    4    5    > 5
Probability   .79  .08  .03  .02  .02  .05

From the experiment he estimated
H(English) = 0.6 to 1.3 bits per character.
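One simple way to get a number in this range (a sketch, not Shannon's exact bounds): given the predictor and its context, the character is determined by the guess number, so the entropy of the guess-number distribution above upper-bounds the per-character entropy.

```python
import math

# Probabilities of needing 1, 2, 3, 4, 5, or more than 5 guesses (table above)
guess_probs = [.79, .08, .03, .02, .02, .05]

upper = sum(p * math.log2(1 / p) for p in guess_probs)
print(upper)  # about 1.15 bits per character, consistent with the 0.6 to 1.3 range
```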
Compression Outline
Introduction: Lossy vs. Lossless, Benchmarks,
Information Theory: Entropy, etc.
Probability Coding:
Prefix codes and relationship to Entropy
Huffman codes
Arithmetic codes
Applications of Probability Coding: PPM + others
Lempel-Ziv Algorithms: LZ77, gzip, compress, ...
Other Lossless Algorithms: Burrows-Wheeler
Lossy algorithms for images: JPEG, MPEG, ...
Compressing graphs and meshes: BBK
Discrete or Blended
[Figure: a discrete code gives each message (1, 2, 3, and 4) its own codeword,
e.g. 0001 or 011; a blended code shares bits across the message sequence]
Prefix Codes
A prefix code is a variable length code in which no
codeword is a prefix of another codeword.
e.g., a = 0, b = 110, c = 111, d = 10
All prefix codes are uniquely decodable
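A sketch that checks the prefix property and decodes a bit string using the example code above:

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of another codeword."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

def decode(bits, code):
    """Greedy left-to-right decoding; unambiguous because the code is prefix-free."""
    inverse = {w: s for s, w in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

code = {"a": "0", "b": "110", "c": "111", "d": "10"}
print(is_prefix_code(code.values()))  # True
print(decode("0110111100", code))     # "abcda"
```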
[Figure: binary decoding tree for the prefix code a = 0, b = 110, c = 111, d = 10, with edges labeled 0 and 1]
Average Length
For a code C with associated probabilities p(s) the
average length is defined as
$l_a(C) = \sum_{(s,c) \in C} p(s)\, l(c)$
Relationship to Entropy
Theorem (lower bound): For any probability
distribution p(S) with associated uniquely
decodable code C,
$H(S) \le l_a(C)$
Theorem (upper bound): For any probability
distribution p(S) with associated optimal prefix
code C,
$l_a(C) \le H(S) + 1$
Kraft-McMillan inequality: for any uniquely decodable code C,
$\sum_{c \in C} 2^{-l(c)} \le 1$
Assigning each message s the length $l(s) = \lceil \log_2(1/p(s)) \rceil$ satisfies this, since
$\sum_{s \in S} 2^{-\lceil \log_2(1/p(s)) \rceil} \le \sum_{s \in S} 2^{-\log_2(1/p(s))} = \sum_{s \in S} p(s) = 1$
so a prefix code with these lengths exists.
$l_a(S) = \sum_{(s,c) \in S} p(s)\, l(c)
        = \sum_{s \in S} p(s) \lceil \log_2(1/p(s)) \rceil
        \le \sum_{s \in S} p(s) \left(1 + \log_2(1/p(s))\right)
        = 1 + \sum_{s \in S} p(s) \log_2(1/p(s))
        = 1 + H(S)$
And we are done.
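A numeric sanity check of this argument (a sketch): assign each message the length ceil(log2(1/p(s))), verify the Kraft-McMillan sum is at most 1, and check that H(S) <= l_a <= H(S) + 1. The distribution reuses the third entropy example from earlier.

```python
import math

probs = [.75, .125, .0625, .0625]
lengths = [math.ceil(math.log2(1 / p)) for p in probs]  # 1, 3, 4, 4

kraft = sum(2 ** -l for l in lengths)         # 0.75 <= 1, so such a prefix code exists
la = sum(p * l for p, l in zip(probs, lengths))          # 1.625
H = sum(p * math.log2(1 / p) for p in probs)             # 1.186...
print(kraft, H, la, H + 1)                    # H <= la <= H + 1
```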
Huffman Codes
Invented by Huffman as a class assignment in 1951.
Used in many, if not most, compression algorithms
gzip, bzip, jpeg (as option), fax compression,
Properties:
Generates optimal prefix codes
Cheap to generate codes
Cheap to encode and decode
la = H if probabilities are powers of 2
Huffman Codes
Huffman Algorithm:
Start with a forest of trees each consisting of a
single vertex corresponding to a message s and
with weight p(s)
Repeat until one tree left:
Select two trees with minimum weight roots p1
and p2
Join into single tree by adding root with weight
p1 + p2
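A minimal Huffman construction in Python using a heap (a sketch; tie-breaking, and hence the exact codewords, can differ from the example on the next slide, but the codeword lengths are optimal):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a prefix code from {message: probability} by repeatedly
    merging the two minimum-weight trees."""
    tie = count()  # tie-breaker so the heap never has to compare the dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, code1 = heapq.heappop(heap)
        p2, _, code2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in code1.items()}
        merged.update({s: "1" + w for s, w in code2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

print(huffman_code({"a": .1, "b": .2, "c": .2, "d": .5}))
# {'d': '0', 'c': '10', 'a': '110', 'b': '111'}: lengths 1, 2, 3, 3, average length 1.8
```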
Example
p(a) = .1, p(b) = .2, p(c) = .2, p(d) = .5
Step 1: merge a(.1) and b(.2) into a tree of weight (.3)
Step 2: merge (.3) and c(.2) into a tree of weight (.5)
Step 3: merge (.5) and d(.5) into the final tree of weight (1.0)
[Figure: the three merge steps and the final tree with 0/1 edge labels;
the resulting codeword lengths are 3, 3, 2, 1 for a, b, c, d]
The self information of a message with probability .999 is
$-\log_2(.999) \approx .00144$ bits.
If we were to send 1000 such messages we might
hope to use about 1000 * .00144 = 1.44 bits.
Using Huffman codes we require at least one bit per
message, so we would need 1000 bits.
Arithmetic coding codes the whole message sequence together, using at most 2 bits more than the sum of the self information:
$l \le 2 + \sum_{i=1}^{n} s_i$
Each message i is assigned the interval [f(i), f(i) + p(i)), where
$f(i) = \sum_{j=1}^{i-1} p(j)$
e.g. with p(a) = .2, p(b) = .5, p(c) = .3: f(a) = 0, f(b) = .2, f(c) = .7
[Figure: coding a message sequence with p(a) = .2, p(b) = .5, p(c) = .3; each message narrows the current interval to that message's sub-interval]
The sequence interval [l_i, l_i + s_i) is computed incrementally:
$l_1 = f_1, \quad s_1 = p_1$
$l_i = l_{i-1} + s_{i-1} f_i$   (bottom of interval)
$s_i = s_{i-1} p_i$   (size of interval)
so the final interval size is $s_n = \prod_{i=1}^{n} p_i$
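A direct transcription of these recurrences (a sketch; the probabilities and the message "bac" are assumed example values):

```python
def sequence_interval(message, probs):
    """Narrow [l, l + s) one message at a time:
    l_i = l_{i-1} + s_{i-1} * f_i,  s_i = s_{i-1} * p_i."""
    f, acc = {}, 0.0
    for m in probs:              # f(m) = total probability of the messages before m
        f[m] = acc
        acc += probs[m]
    l, s = 0.0, 1.0
    for m in message:
        l = l + s * f[m]
        s = s * probs[m]
    return l, s                  # sequence interval is [l, l + s)

l, s = sequence_interval("bac", {"a": .2, "b": .5, "c": .3})
print(l, s)  # interval [0.27, 0.3); size 0.03 = .5 * .2 * .3
```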
Warning
Three types of interval:
message interval : interval for a single message
sequence interval : composition of message
intervals
code interval : interval for a specific code used
to represent a sequence interval (discussed
later)
[Figure: a worked example of narrowing the sequence interval with p(a) = .2, p(b) = .5, p(c) = .3]
Representing Fractions
Binary fractional representation:
.75 = .11
1/3 = .0101...
11/16 = .1011
So how about just using the shortest binary fractional
representation in the sequence interval?
e.g. [0, .33) = .0    [.33, .66) = .1    [.66, 1) = .11
But what if you receive a 1?
Should we wait for another 1?
Representing an Interval
Can view binary fractional numbers as intervals by
considering all completions. e.g.
        min     max     interval
.11     .110    .111    [.75, 1.0)
.101    .1010   .1011   [.625, .75)
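A sketch of the "all completions" view: the interval for a codeword b1...bk is [0.b1...bk, 0.b1...bk + 2^-k).

```python
def code_interval(bits):
    """Interval of all binary fractions that begin with the given bits."""
    k = len(bits)
    low = int(bits, 2) / 2 ** k
    return low, low + 2 ** -k    # [min, max)

print(code_interval("11"))   # (0.75, 1.0)
print(code_interval("101"))  # (0.625, 0.75)
```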
[Figure: the code intervals of .0, .1, and .11 shown on the number line from 0 to 1]
Note that if code intervals overlap then one code is
a prefix of the other.
Lemma: If a set of code intervals do not overlap
then the corresponding codes form a prefix code.
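A sketch of how a sequence interval [l, l + s) can be converted into a code whose code interval lies inside it: use k = ceil(log2(1/s)) + 1 bits, so that 2^-k <= s/2, and round l up to a multiple of 2^-k. Code intervals produced this way for different sequences never overlap, so by the lemma the codes form a prefix code. (This is a sketch of the idea, not the exact RealArithCode from the slides.)

```python
import math

def interval_to_code(l, s):
    """Pick a binary codeword whose code interval sits inside [l, l + s)."""
    k = math.ceil(math.log2(1 / s)) + 1   # 2^-k <= s/2
    low = math.ceil(l * 2 ** k)           # smallest multiple of 2^-k that is >= l
    return format(low, "b").zfill(k)      # the k-bit codeword

print(interval_to_code(0.27, 0.03))
# '0100011': code interval [0.2734375, 0.28125), inside the sequence interval [0.27, 0.30)
```

The 7-bit result is consistent with the "2 plus the sum of self information" bound proved below.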
Bound on Length
Theorem: For n messages with self information {s_1, ..., s_n},
RealArithEncode will generate at most $2 + \sum_{i=1}^{n} s_i$ bits.
Proof: the number of bits is
$1 + \lceil \log_2(1/s) \rceil$   (where s is the size of the sequence interval)
$= 1 + \lceil \log_2 \prod_{i=1}^{n} (1/p_i) \rceil$
$= 1 + \lceil \sum_{i=1}^{n} \log_2(1/p_i) \rceil$
$= 1 + \lceil \sum_{i=1}^{n} s_i \rceil$
$\le 2 + \sum_{i=1}^{n} s_i$
Integer Arithmetic Coding
[Example: R = 256, with message counts a = 11, b = 7, c = 30]
With R = 256 the interval is kept as integers l and u in [0, R), starting from
$l_0 = 0, \quad u_0 = R - 1$
Using cumulative counts f(m) with total T, each message m_i updates the interval:
$s_i = u_{i-1} - l_{i-1} + 1$
$u_i = l_{i-1} + \lfloor s_i \, f(m_i + 1)/T \rfloor - 1$
$l_i = l_{i-1} + \lfloor s_i \, f(m_i)/T \rfloor$
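A sketch of one update step with these integer recurrences (no renormalization or bit output is shown; f and T are cumulative and total counts, and the counts are the example values above):

```python
def update(l, u, m, f, counts, T):
    """One integer arithmetic coding step:
    s  = u - l + 1
    u' = l + floor(s * f(m+1) / T) - 1, where f(m+1) = f(m) + count(m)
    l' = l + floor(s * f(m) / T)
    """
    s = u - l + 1
    new_u = l + s * (f[m] + counts[m]) // T - 1
    new_l = l + s * f[m] // T
    return new_l, new_u

counts = {"a": 11, "b": 7, "c": 30}
T = sum(counts.values())            # 48
f = {"a": 0, "b": 11, "c": 18}      # cumulative counts
l, u = 0, 255                       # l_0 = 0, u_0 = R - 1 with R = 256
l, u = update(l, u, "c", f, counts, T)
print(l, u)  # 96 255
```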