
Principles of Data Compression:

Theory and Applications

Dr. Daniel Leon-Salas

Motivation
The Information Revolution


Motivation
Consider a 3-minute song:
assuming two channels, 16-bit resolution, and a
sampling rate of 48 kHz, it will take about 33 MB of disk
space to store the song.

Consider a 5-megapixel camera:
assuming an 8-bit resolution per pixel, it will take
5 MB of disk space to store one picture.

One second of video in the CCIR 601
format (720 × 485) needs more than 30
megabytes of storage space.

Introduction
If data generation is growing at an explosive
rate, why not focus on improving transmission
and storage technologies?
Transmission and storage technologies are
improving, but not at the same rate at which
data is generated.
This is especially true for wireless
communications where the radio spectrum is
limited.

Introduction
Data compression is the art or science of
representing information in a compact form.
Data compression is performed by identifying
and exploiting structure and redundancies in
the data.
Data can be audio samples, images, or text
files; it can also be generated by sensors,
scientific instruments, social networks,
markets, etc.

Introduction
Consider Morse code, developed in the 19th
century, in which letters are encoded with dots
and dashes:
some letters (e and a) occur more often than others (q
and j).
letters that occur more frequently are encoded using
shorter sequences: e → "." and a → ".-"
letters that occur less frequently are encoded using
longer sequences: q → "--.-" and j → ".---"

In this case the statistical structure of the data
was exploited.

Introduction
There are many other types of structure in
data that can be exploited to achieve
compression.
In speech, the physical structure of our vocal
tract determines the kinds of sounds that we
can produce. Instead of sending speech
samples, we can send information about the
vocal tract to the receiver.
We can also exploit characteristics of the end
user of the data.

Introduction
In many cases, when transmitting images or
audio, the end user is a human.
Humans have limited hearing and vision
abilities.
We can exploit the limitations of human
perception to discard irrelevant information
and obtain higher compression.


Compression and Reconstruction


[Diagram: the original data is mapped by the compression step into a
compressed representation; the reconstruction (decompression) step recovers
the reconstructed data from it. The two steps together form the compression
algorithm.]

Lossless Compression
Lossless compression involves no loss of
information.
The recovered data is an exact copy of the
original.
Useful in applications that cannot tolerate any
difference:
medical images
scientific data
financial records
computer programs

Lossy Compression
In lossy compression some loss of information is
tolerated.
The original data cannot be recovered exactly, but
higher compression ratios can be achieved.
Useful in applications where some loss of
information is not critical:
speech coding
telephone communications
video coding
digital photography

Compression Performance
Compression ratio (CR):

CR = (# bits required to represent the data without compression) /
     (# bits required to represent the data with compression)

Rate: average number of bits per sample or symbol

Distortion (for lossy compression):

MSE = (1/N) Σ (x_i - x̂_i)^2

PSNR (dB) = 10 log10( x_max^2 / MSE )

where x_i are the original samples, x̂_i the reconstructed samples, and
x_max is the peak value of the signal.
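As a quick illustration of these metrics, here is a minimal sketch (not from the slides; the sample arrays and the 8-bit peak value of 255 are assumptions) that computes MSE and PSNR:

```python
import numpy as np

def mse(original, reconstructed):
    # Mean squared error between the original and reconstructed samples
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    return np.mean((original - reconstructed) ** 2)

def psnr(original, reconstructed, peak=255.0):
    # Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit samples
    return 10.0 * np.log10(peak ** 2 / mse(original, reconstructed))

x  = np.array([52, 55, 61, 66, 70, 61, 64, 73])   # original samples (made up)
xr = np.array([52, 54, 61, 67, 70, 60, 65, 72])   # reconstructed samples (made up)
print(mse(x, xr), psnr(x, xr))
```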


Example 1
Let's consider the following input sequence:
x = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

To encode this sequence using a plain binary code, we would need to use 5
bits per number and a total of 60 bits.
K. Sayood, Introduction to Data Compression, 2nd edition, Morgan Kaufmann


Example 1
If we use the model:
x̂_n = n + 8

and compute the residual e_n = x_n - x̂_n = [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]

the residual consists of only three values {-1, 0, 1}, which can be
encoded using 2 bits per number, for a total of 24 bits.
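A minimal sketch of this residual computation (the model x̂_n = n + 8 is the one above; the particular 2-bit code assignment is an assumption for illustration):

```python
x = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

# Model prediction xhat_n = n + 8, with n starting at 1
prediction = [n + 8 for n in range(1, len(x) + 1)]

# Residual e_n = x_n - xhat_n; only the values -1, 0, 1 appear
residual = [xn - pn for xn, pn in zip(x, prediction)]
print(residual)            # [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]

# A fixed 2-bit code for the three residual values (illustrative assignment)
code = {-1: '00', 0: '01', 1: '10'}
bits = ''.join(code[e] for e in residual)
print(len(bits))           # 24 bits instead of 60
```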


Example 2
Input sequence:
a_barayaran_array_ran_far_faar_faaar_away
The sequence is made of eight different characters (symbols):
a, b, f, n, r, w, y, _
Hence, we can use three bits per symbol to encode the
sequence, resulting in a total of 41 × 3 = 123 bits for the entire
sequence.
However, we can use fewer bits if we realize that some
symbols occur more frequently than others.
We can use fewer bits to encode the more frequent symbols.
K. Sayood, Introduction to Data Compression, 2nd edition, Morgan Kaufmann


Example 2
Input sequence: a_barayaran_array_ran_far_faar_faaar_away

[Table: for each of the eight input characters, its frequency in the
sequence (a: 16, r: 8, _: 7, f: 3, y: 3, n: 2, b: 1, w: 1), a 3-bit
fixed-length codeword, and a variable-length codeword in which the more
frequent characters receive the shorter codewords.]

Using variable-length codes we can encode the sequence using only 97 bits.

Statistical Redundancy
Statistical redundancy was employed in
Example 2 to build a code to encode the input
sequence.
When compressing text, statistical redundancy
can be exploited not only for characters but
also for words (the dictionary technique).
Examples of compression solutions that use
the dictionary technique include the Lempel-Ziv (LZ) algorithm LZ77, gzip, Zip, PNG, and PKZip.

Information and Entropy


Information can be defined as a message that helps to resolve
uncertainty.
In Information Theory, information is taken as a sequence of
symbols from an alphabet.
Entropy is a measure of information.

[Diagram: a source with alphabet A = {a1, a2, ..., an} emits a message,
i.e., a sequence of symbols such as a1 a2 a3 a6 a8 a5 a3 a4.]

First-order entropy of the source:

H(S) = - Σ (i = 1 to n) P(ai) log P(ai)

Entropy

H(S) = - Σ (i = 1 to n) P(ai) log P(ai)

If the base of the logarithm is 2, the units of entropy are bits. If the base is
10, the units are hartleys. If the base is e, the units are nats.
The first-order entropy assumes that the symbols occur independently of
each other.
The entropy is a measure of the average number of bits needed to
encode the output of the source.
Claude Shannon showed that the best rate that a lossless compression
algorithm can achieve is equal to the entropy of the source.
Example:
Let's consider a source with an alphabet consisting of four symbols: a1, a2, a3, a4.
P(a1) = 1/2, P(a2) = 1/4, P(a3) = 1/8, P(a4) = 1/8
H = -(1/2 log2(1/2) + 1/4 log2(1/4) + 1/8 log2(1/8) + 1/8 log2(1/8)) = 1.75
bits/symbol.
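A minimal sketch of this first-order entropy calculation (the probabilities are those of the example above; the function itself is an illustration, not part of the slides):

```python
import math

def first_order_entropy(probabilities):
    # H(S) = -sum(P(ai) * log2(P(ai))), in bits per symbol
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(first_order_entropy([1/2, 1/4, 1/8, 1/8]))   # 1.75 bits/symbol
```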

Coding
Coding is the process of assigning binary sequences to symbols of
an alphabet.
Example:
Let's consider a source with a four-symbol alphabet such that: P(a1) = 1/2,
P(a2) = 1/4, P(a3) = 1/8, P(a4) = 1/8
H = 1.75 bits/symbol.

Symbol   Probability   Code 1   Code 2   Code 3   Code 4
a1       0.5           0        0        0        0
a2       0.25          0        1        10       01
a3       0.125         1        00       110      011
a4       0.125         10       11       111      0111
Average length         1.125    1.25     1.75     1.875
                       bits     bits     bits     bits

Codes 3 and 4 are uniquely decodable codes.

Prefix Codes
Consider two codewords: C1, which is k bits long, and C2, which is n bits
long, with k < n. If the first k bits of C2 are identical to C1, then we say
that C1 is a prefix of C2; the remaining n - k bits of C2 are called the
dangling suffix.

If the dangling suffix is itself a codeword, the code is not uniquely
decodable.
A prefix code is a code in which no codeword is a prefix of
another codeword.
Prefix codes are uniquely decodable.
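A minimal sketch of checking the prefix condition for a set of codewords (the code sets used here are Codes 3 and 4 from the table above; the function name is an assumption):

```python
def is_prefix_code(codewords):
    # A prefix code: no codeword is a prefix of another codeword
    for c1 in codewords:
        for c2 in codewords:
            if c1 != c2 and c2.startswith(c1):
                return False
    return True

print(is_prefix_code(['0', '10', '110', '111']))    # Code 3: True (prefix code)
print(is_prefix_code(['0', '01', '011', '0111']))   # Code 4: False (uniquely decodable, but not prefix)
```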

Huffman Coding
Huffman coding is an algorithm for building optimum prefix
codes.
It was developed as a class assignment in the first class on
information theory taught by Robert Fano at MIT in 1950.
Huffman coding assumes that the probabilities of the source are
known.
Huffman coding is based on the following observations about
optimum prefix codes:
Symbols with higher probability have shorter codewords than
less probable symbols.
The two symbols with the lowest probabilities have codewords of the same
length (proof by contradiction).
In a Huffman code the codewords corresponding to the two
symbols with the lowest probabilities differ only in the last bit.

Huffman Coding
Example:
Let's build a Huffman code for a source with a four-symbol alphabet
such that: P(a1) = 0.5, P(a2) = 0.25, P(a3) = 0.125, P(a4) = 0.125

Step 1: list the symbols in order of probability: a1 (0.5), a2 (0.25),
a3 (0.125), a4 (0.125).
Step 2: merge the two least probable symbols, a3 and a4, into a single node
with probability 0.125 + 0.125 = 0.25, labeling its two branches 0 and 1.

Huffman Coding
Step 3: the two least probable nodes are now a2 (0.25) and the {a3, a4} node
(0.25); merge them into a node with probability 0.5, again labeling the two
branches 0 and 1.

Huffman Coding
Step 4: merge the last two nodes, a1 (0.5) and the {a2, a3, a4} node (0.5),
into the root with total probability 1.0. Reading the branch labels from the
root down to each leaf gives the codewords:

Symbol   Probability   Codeword
a1       0.5           0
a2       0.25          10
a3       0.125         110
a4       0.125         111

Average codeword length:
lavg = 0.5×1 + 0.25×2 + 0.125×3 + 0.125×3 = 1.75 bits

It can be shown that for Huffman codes:
H(S) ≤ lavg < H(S) + 1
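A minimal sketch of building a Huffman code with a priority queue (the probabilities are those of the example; the tie-breaking order, and therefore the exact bit labels, may differ from the tree described above, but the codeword lengths are the same):

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    # probabilities: dict mapping symbol -> probability
    tiebreak = count()   # unique counter so the heap never compares the dicts
    heap = [(p, next(tiebreak), {sym: ''}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # two least probable nodes
        p1, _, code1 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in code0.items()}
        merged.update({s: '1' + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

code = huffman_code({'a1': 0.5, 'a2': 0.25, 'a3': 0.125, 'a4': 0.125})
print(code)   # e.g. {'a1': '0', 'a2': '10', 'a3': '110', 'a4': '111'} up to relabeling
```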


Decoding Huffman Codes


Example:
Decode the following message using the Huffman code from the
previous example: 0110101110

Reading the encoded message bit by bit and following the code tree
(a1 = 0, a2 = 10, a3 = 110, a4 = 111), the message splits into

0 | 110 | 10 | 111 | 0

Decoded message: a1 a3 a2 a4 a1
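A minimal sketch of this decoding loop, matching codewords greedily bit by bit (the code table is the one from the example; this works because a Huffman code is a prefix code):

```python
def huffman_decode(bits, code):
    # Invert the symbol -> codeword table and match prefixes greedily
    inverse = {cw: sym for sym, cw in code.items()}
    decoded, current = [], ''
    for b in bits:
        current += b
        if current in inverse:
            decoded.append(inverse[current])
            current = ''
    return decoded

code = {'a1': '0', 'a2': '10', 'a3': '110', 'a4': '111'}
print(huffman_decode('0110101110', code))   # ['a1', 'a3', 'a2', 'a4', 'a1']
```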

Adaptive Huffman Codes


Huffman coding requires knowledge of the probabilities of the source.
If this knowledge is not available, Huffman coding becomes a two-pass
procedure:
a first pass to compute the probabilities, and
a second pass to encode the output of the source.
The adaptive Huffman coding algorithm converts this two-pass
procedure into a single-pass procedure.
In adaptive Huffman coding, the transmitter and the receiver start with
a code tree that has a single node corresponding to all the symbols not
yet transmitted (NYT).
As transmission progresses, nodes corresponding to transmitted
symbols are added to the tree.
The first time a symbol is transmitted, the code for NYT is transmitted
first, followed by a fixed (non-adaptive) code agreed upon by the transmitter
and the receiver before transmission starts.

Golomb-Rice Codes
The Golomb-Rice codes are a family of codes commonly used in data
compression applications due to their low complexity and good
compression performance.
The JPEG committee and the Consultative Committee for Space Data
Systems (CCSDS), for instance, have adopted the Golomb-Rice codes as
part of their standards.
Golomb-Rice codes have also been adopted in lossless audio
compression (a related Exp-Golomb code is used in the H.264 video coding
standard) and are already used in many commercial audio compression programs.
The Golomb-Rice codes have their origin in the pioneering work of
Golomb, who proposed a method to encode run lengths of events of a
binary source when po^m = 1/2, where po is the probability of the events and m
is an integer.


Golomb-Rice Codes
[Diagram: a binary source with alphabet A = {0, 1} emits a stream such as
100001000100001000000010001001; po is the probability of a 1, with
po^m = 1/2 for some integer m. The run lengths n between 1s (non-negative
integers) follow a geometric distribution P(n).]

Golomb-Rice Codes
The Golomb-Rice codes consider the special case when m = 2^k (k ≥ 0).

Encoding procedure for a non-negative integer n:
quotient  q = floor( n / 2^k )  →  encoded in unary (q ones followed by a 0)
remainder r = n mod 2^k         →  encoded in natural binary using k bits
The codeword is the unary part followed by the k-bit binary part (e.g., for
k = 4 an 8-bit input b7 b6 b5 b4 b3 b2 b1 b0 is sent as the unary code of its
upper four bits followed by b3 b2 b1 b0).

Example: n = 17 (00010001)
k = 0  codeword = 111111111111111110
k = 1  codeword = 1111111101
k = 2  codeword = 1111001
k = 3  codeword = 110001
k = 4  codeword = 100001
k = 5  codeword = 010001
k = 6  codeword = 0010001
k = 7  codeword = 00010001
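A minimal sketch of this encoding procedure (the function name is illustrative; it reproduces the codewords for n = 17 listed above):

```python
def rice_encode(n, k):
    # Golomb-Rice codeword for a non-negative integer n with parameter m = 2**k
    q, r = n >> k, n & ((1 << k) - 1)        # quotient and remainder
    unary = '1' * q + '0'                    # q ones terminated by a zero
    binary = format(r, '0{}b'.format(k)) if k > 0 else ''
    return unary + binary

for k in range(4):
    print(k, rice_encode(17, k))
# 0 111111111111111110
# 1 1111111101
# 2 1111001
# 3 110001
```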

Golomb-Rice Codes
Practical sources produce positive and negative numbers
(a double-sided distribution P(n) centered around zero).

Use the following mapping before Golomb-Rice coding:

M(n) = 2n        if n ≥ 0
M(n) = 2|n| - 1  if n < 0

It maps non-negative input numbers to even integers and negative
input numbers to odd integers.
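A one-line sketch of this mapping (the helper name is an assumption):

```python
def signed_to_unsigned(n):
    # M(n) = 2n for n >= 0, 2|n| - 1 for n < 0
    return 2 * n if n >= 0 else 2 * abs(n) - 1

print([signed_to_unsigned(n) for n in [0, -1, 1, -2, 2, -3]])   # [0, 1, 2, 3, 4, 5]
```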

Adaptive Golomb-Rice Codes

[Block diagram: source → G-R coder → codeword; an adaptive algorithm observes
the coding process and adjusts the G-R coder's parameter k.]

Adaptive Golomb-Rice Codes

1) Initialize k to kini;
2) Reset counter;
3) Read input n and encode it using parameter k;
4) If (unary code > 1) increment counter;
5) If (unary code = 0) decrement counter;
6) If (counter value ≥ M) k++; Goto 2;
7) If (counter value ≤ -M) k--; Goto 2;
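A minimal sketch of this adaptation loop wrapped around the rice_encode function from the earlier example (kini and M follow the steps above; the k_max bound and treating the quotient value as the "unary code" are assumptions):

```python
def adaptive_rice_encode(samples, k_init=2, M=4, k_max=15):
    # Encodes a stream of non-negative integers, adapting k as in steps 1-7
    k, counter, codewords = k_init, 0, []
    for n in samples:
        codewords.append(rice_encode(n, k))     # step 3
        q = n >> k                              # value of the unary part
        if q > 1:                               # step 4: codeword getting long
            counter += 1
        elif q == 0:                            # step 5: k may be too large
            counter -= 1
        if counter >= M and k < k_max:          # step 6
            k, counter = k + 1, 0
        elif counter <= -M and k > 0:           # step 7
            k, counter = k - 1, 0
    return codewords
```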


Entropy Coding
If the source has a narrow distribution P(n), an entropy encoder (Huffman,
Golomb-Rice, arithmetic) can be used directly:

source → entropy encoder → compressed output

Otherwise, a decorrelation step might be necessary:

source → decorrelation (predictive coding, transform coding,
subband coding) → entropy encoder → compressed output

Predictive Coding - Decorrelation


In an image, a pixel generally has a value close to one of its neighbors.

Example neighborhood (raster scan), with the current pixel X = 64:

55 57 59 63
58 61 63 69
60 64

A pixel prediction X̂ is formed from the previously seen neighbors, and only
the small prediction residual e = X - X̂ needs to be encoded.
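A minimal sketch of a simple left-neighbor (previous-pixel) predictor for one image row; this particular predictor is an illustrative choice, not the one used by any specific standard:

```python
def predict_row(row):
    # Predict each pixel from its left neighbor; the first pixel is sent as-is
    residuals = [row[0]]
    for i in range(1, len(row)):
        residuals.append(row[i] - row[i - 1])   # e = X - Xhat, with Xhat = left pixel
    return residuals

print(predict_row([55, 57, 59, 63]))   # [55, 2, 2, 4] -- small residuals
```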

Predictive Coding - Decorrelation


[Figure: an original image and its prediction residual, together with their
histograms; the residual histogram is narrow and concentrated around zero.]


Context Adaptive Lossless Image Compression (CALIC)
Pixel neighborhood: the neighboring pixels N, W, NE, NW, NN, WW, and NNE are
available to both the encoder and the decoder (assuming a raster scan).

To get an idea of the boundaries present in the neighborhood, compute the
horizontal and vertical activity measures:

dh = |W - WW| + |N - NW| + |N - NE|
dv = |W - NW| + |N - NN| + |NE - NNE|

Initial pixel prediction:

if (dv - dh > 80)       X̂ = W
else if (dh - dv > 80)  X̂ = N
else {
    X̂ = (W + N)/2 + (NE - NW)/4
    if (dv - dh > 32)       X̂ = (X̂ + W)/2
    else if (dh - dv > 32)  X̂ = (X̂ + N)/2
    else if (dv - dh > 8)   X̂ = (3X̂ + W)/4
    else if (dh - dv > 8)   X̂ = (3X̂ + N)/4
}

The initial prediction is refined based on the relationships of the pixels in
the neighborhood (contexts). For each context we keep track of how much
prediction error is generated and use it to refine the initial prediction.
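A minimal sketch of this gradient-adjusted prediction step (the context-based refinement that follows it in CALIC is not shown; the thresholds 80, 32, and 8 are the ones listed above):

```python
def gap_predict(N, W, NE, NW, NN, WW, NNE):
    # Gradient-adjusted prediction used as the initial CALIC prediction
    dh = abs(W - WW) + abs(N - NW) + abs(N - NE)    # horizontal activity
    dv = abs(W - NW) + abs(N - NN) + abs(NE - NNE)  # vertical activity
    if dv - dh > 80:        # sharp horizontal edge -> predict from W
        return W
    if dh - dv > 80:        # sharp vertical edge -> predict from N
        return N
    pred = (W + N) / 2 + (NE - NW) / 4
    if dv - dh > 32:
        pred = (pred + W) / 2
    elif dh - dv > 32:
        pred = (pred + N) / 2
    elif dv - dh > 8:
        pred = (3 * pred + W) / 4
    elif dh - dv > 8:
        pred = (3 * pred + N) / 4
    return pred
```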

Transform Coding
In transform coding the input sequence is transformed into another sequence in
which most of the information is contained in only a few elements.
For a 1D signal x, such as audio or speech, the forward transform is defined as:
θ = A x
and the inverse transform is defined as:
x = A^T θ
The transforms are orthonormal transforms: A A^T = A^T A = I.

For 2D signals such as images, a two-dimensional separable transform is used.
In a separable transform, we can take a 1D transform in one dimension and
another 1D transform in the other dimension.
In matrix notation:
Θ = A X A^T
and the inverse transform is given by:
X = A^T Θ A


Transform Coding
In the JPEG standard, the forward transform is the Discrete Cosine Transform
(DCT) and the inverse transform is the Inverse Discrete Cosine Transform (IDCT).
The N×N DCT transform matrix is defined as:

C(i,j) = sqrt(1/N) cos( (2j+1) i π / (2N) )   for i = 0,            j = 0, 1, ..., N-1
C(i,j) = sqrt(2/N) cos( (2j+1) i π / (2N) )   for i = 1, ..., N-1,  j = 0, 1, ..., N-1
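A minimal numpy sketch that builds this DCT matrix and applies the separable 2D transform Θ = C X C^T (the 8×8 size matches JPEG; function names are illustrative):

```python
import numpy as np

def dct_matrix(N=8):
    # C[i, j] = sqrt(1/N) cos((2j+1) i pi / (2N)) for i = 0, sqrt(2/N) otherwise
    C = np.zeros((N, N))
    for i in range(N):
        scale = np.sqrt(1.0 / N) if i == 0 else np.sqrt(2.0 / N)
        for j in range(N):
            C[i, j] = scale * np.cos((2 * j + 1) * i * np.pi / (2 * N))
    return C

def dct2(block):
    # Separable 2D transform: Theta = C X C^T; the inverse is C^T Theta C
    C = dct_matrix(block.shape[0])
    return C @ block @ C.T
```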

[JPEG encoder block diagram: input image → DCT → quantization (using a
quantization table); the quantized DC coefficients are coded with DPCM and
the AC coefficients with run-length coding (RLC), followed by an entropy
encoder that produces the compressed image.]

Transform Coding - DCT


Input 8×8 block:

183 177 147  79  41  34  35  43
189 153  63  39  38  37  39  44
187  99  37  38  42  41  46  46
101  42  36  39  61  63  59  44
 41  41  38  45  57  73  52  47
 44  49  49  50  54  60  58  54
 51  58  55  50  55  57  58  54
 44  50  52  54  55  59  67  63

DCT coefficients (the top-left value is the DC coefficient; the remaining
values are the AC coefficients):

502.0  119.5   83.8   48.3    6.0    0.0   -0.1   -0.3
 88.6  173.4   90.9   22.5   11.5   -1.8   -0.2   -0.8
 62.0   78.7   22.2  -44.9  -19.8   -9.4   -7.3   -1.1
 12.2    4.7  -37.1  -44.6  -30.2  -12.2    5.0   -3.0
  3.5  -22.5  -36.9  -20.3  -13.0    4.1   11.5    5.1
 12.1    9.7   -7.0   -6.6    2.6   11.3    8.5   11.5
  9.2    7.9    3.7   -6.4    6.3   10.1    3.8    1.8
  2.6    9.8    1.4   -2.0    0.3   -1.2    2.3   -5.1

Most of the energy is concentrated in the low-frequency (top-left)
coefficients.

Quantization of DCT Coefficients


DCT coefficients Θ(i,j):

502.0  119.5   83.8   48.3    6.0    0.0   -0.1   -0.3
 88.6  173.4   90.9   22.5   11.5   -1.8   -0.2   -0.8
 62.0   78.7   22.2  -44.9  -19.8   -9.4   -7.3   -1.1
 12.2    4.7  -37.1  -44.6  -30.2  -12.2    5.0   -3.0
  3.5  -22.5  -36.9  -20.3  -13.0    4.1   11.5    5.1
 12.1    9.7   -7.0   -6.6    2.6   11.3    8.5   11.5
  9.2    7.9    3.7   -6.4    6.3   10.1    3.8    1.8
  2.6    9.8    1.4   -2.0    0.3   -1.2    2.3   -5.1

Quantization table Q(i,j) (the JPEG luminance table):

16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

Each coefficient is quantized to the label l(i,j) = round( Θ(i,j) / Q(i,j) );
the quantized (reconstructed) coefficients l(i,j) × Q(i,j) are:

496  121   80   48    0    0    0    0
 84  168   84   19    0    0    0    0
 56   78   16  -48    0    0    0    0
 14    0  -44  -58  -51    0    0    0
  0  -22  -37    0    0    0    0    0
 24    0    0    0    0    0    0    0
  0    0    0    0    0    0    0    0
  0    0    0    0    0    0    0    0

After quantization the DCT coefficients are transmitted
following a zig-zag pattern. The coefficients are
encoded using a Huffman
code.
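A minimal numpy sketch of this quantization step and its inverse (Q is the table shown above; function names are illustrative):

```python
import numpy as np

Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def quantize(theta, Q):
    # Labels l = round(theta / Q); these small integers are what gets entropy coded
    return np.round(theta / Q).astype(int)

def dequantize(labels, Q):
    # Reconstructed coefficient values l * Q used by the decoder
    return labels * Q
```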

Transform Coding - DCT


[Figure: the original image and the image coded using the DCT.]

Sub-band Coding
In sub-band coding the input signal is decomposed into several sub-bands
using an analysis filter bank.
Depending on the signal, different sub-bands will contain different
amounts of information.
Sub-bands with lots of information are encoded using more bits, while
sub-bands with little information are encoded using fewer bits.
At the decoder side, the signal is reconstructed using a bank of synthesis
filters.

[Figure: the signal spectrum split into frequency bands f1, f2, f3, ..., fM.]
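A minimal two-band sketch of the analysis/synthesis idea using Haar filters (a deliberately simple filter choice for illustration; practical codecs use longer filter banks):

```python
def haar_analysis(x):
    # Split the signal into a low-pass (average) and a high-pass (difference) sub-band
    low  = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    return low, high

def haar_synthesis(low, high):
    # Perfect reconstruction from the two sub-bands
    x = []
    for l, h in zip(low, high):
        x.extend([l + h, l - h])
    return x

low, high = haar_analysis([10, 12, 14, 14, 40, 42, 8, 6])
print(low, high)                     # the low band carries most of the information
print(haar_synthesis(low, high))     # [10, 12, 14, 14, 40, 42, 8, 6]
```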

Sub-band Coding
[Block diagram: the input is passed through analysis filters 1 to M, each
followed by an entropy encoder; at the decoder, entropy decoders 1 to M feed
the corresponding synthesis filters, whose outputs are combined to form the
output signal.]

Further Reading
Khalid Sayood, Introduction to Data Compression, 4th edition, Morgan
Kaufmann, San Francisco, 2012.
G. Held and T. R. Marshall, Data Compression, 3rd edition, John Wiley
and Sons, New York, 1991.
N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice Hall,
Englewood Cliffs, 1984.
B. E. Usevitch, "A tutorial on modern lossy wavelet image compression:
foundations of JPEG 2000," IEEE Signal Processing Magazine, vol. 18, no.
5, 2001.
D. Pan, "Digital audio compression," Digital Technical Journal, vol. 5, no.
2, 1993.
M. Hans and R. W. Schafer, "Lossless compression of digital audio," IEEE
Signal Processing Magazine, vol. 18, no. 4, 2001.
G. E. Blelloch, Introduction to Data Compression, course notes,
Computer Science Department, Carnegie Mellon University.
