Compression
Compression is generally applied in multimedia communication to reduce the
volume of information to be stored or transmitted, or to reduce the
communication bandwidth required for its transmission over a network.
Compression Principles
1. Source encoders and destination decoders
2. Lossless compression and lossy compression
3. Entropy encodings
4. Source encoding
Source encoders and destination decoders
Lossless Compression
As the name implies, lossless compression schemes exploit redundancies
without incurring any loss of data. Thus, the data stream prior to encoding and
after decoding is exactly the same and no distortion in the reconstruction quality
is observed. Lossless image compression is exactly reversible. Example: compressing a text file.
Lossless compression is achieved by exploiting statistical redundancy. For
example, if we transform the image into a string of symbols prior to encoding and
then assign shorter code words to more frequently occurring symbols and longer
code words to less frequently occurring symbols, then we can achieve
compression and at the same time, the encoding process can be exactly reversed
during decoding, since there is a one-to-one mapping between the symbols and
their codes.
Lossless compression schemes: run-length encoding, entropy coding, Ziv-Lempel
coding, etc. Lossless image compression can achieve only a limited degree of
bandwidth reduction for data transmission, but it preserves the quality of the
image without any distortion.
Lossy Image Compression
Contrary to lossless image compression, lossy image compression schemes
incur loss of data and hence suffer a loss of quality in reconstruction. Like
lossless image compression, the image is first transformed into a string of
symbols, which are quantized to a discrete set of allowable levels. It is possible to
achieve significant data compression, but quantization, being a many-to-one
mapping, is irreversible, so exact reconstruction is never possible. Yet, if the loss
in reconstruction quality is acceptable to our visual perception, we may accept
this scheme in the interest of achieving a very significant degree of compression.
Lossy compression is achieved by exploiting psychovisual redundancy. While
designing the quantizers, it must be known where loss of quality can be tolerated
and where it cannot.
• Lossy compression algorithms normally do not reproduce an exact copy of
the source information after decompression.
• Example applications of lossy compression are the transfer of digitized
images and of audio and video streams.
Transformer
This block transforms the original input data into a form that is more
suitable to compression. The transformation can be local, involving pixels in the
neighbourhood or global, involving the full image or a block of pixels.
An example of a local transformation is linear predictive coding followed by
Differential Pulse Code Modulation (DPCM). Global transformation techniques
use the Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT),
Karhunen-Loeve Transform (KLT), Discrete Wavelet Transform (DWT), etc.
The transformer block maps the original spatial-domain signal into a
transform-domain representation of reduced dynamic range, where only a few
coefficients contain the bulk of the energy and efficient compression is possible.
This block is lossless.
Quantization
Def 1: Reduce the number of distinct output values to a much smaller set.
Def 2: It is the process of rounding off the signal amplitude at each sampling
time to the nearest quantization level.
• Quantization is the process that confines the amplitude of a signal into a
finite number of values.
For the special case where Δ = 1, we can simply compute the output values for
these quantizers as:
Q_midrise(x) = ⌈x⌉ − 0.5
Q_midtread(x) = ⌊x + 0.5⌋
Performance of an M-level quantizer: let B = {b0, b1, ..., bM} be the set of
decision boundaries and Y = {y1, y2, ..., yM} be the set of reconstruction or
output values. Suppose the input is uniformly distributed in the interval
[−Xmax, Xmax]. The rate of the quantizer is R = log2 M.
Quantization noise
The difference between the actual value of the analog signal amplitude and
the corresponding nominal amplitude(nearest quantization interval value)
for the particular sampling time is called Quantization noise.
• The ratio of the peak amplitude of a signal to its minimum amplitude is
known as the dynamic range (−Vmax to +Vmax).
• Number of bits per sample = N
• Quantization interval q = 2Vmax / 2^N = Vmax / 2^(N−1)
• Range of the digital signal = −2^(N−1) to 2^(N−1) − 1
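The midtread and midrise quantizers and the quantization-interval formula above can be sketched in Python (a minimal sketch; the function names are our own):

```python
import math

def q_midtread(x, delta=1.0):
    # Midtread quantizer: reconstruction levels include zero.
    return delta * math.floor(x / delta + 0.5)

def q_midrise(x, delta=1.0):
    # Midrise quantizer: zero lies on a decision boundary.
    return delta * (math.ceil(x / delta) - 0.5)

def quant_interval(v_max, n_bits):
    # q = 2*Vmax / 2^N = Vmax / 2^(N-1)
    return 2 * v_max / (2 ** n_bits)

print(q_midtread(1.3))         # 1.0
print(q_midrise(1.3))          # 1.5
print(quant_interval(5.0, 3))  # 1.25
```

Note that for the same input the two quantizers give different outputs: the midtread version can output exactly zero, while the midrise version cannot.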
(ii) Variable length coding (VLC), also known as entropy coding, assigns code
words in such a way as to minimize the average length of the binary
representation of the symbols. This is achieved by assigning shorter code words
to the more probable symbols.
(a) Image decoder – Performs exact reversal of the coder in image compression
system. This block extracts the quantized coefficients.
(b) De-quantizer – Performs the inverse of the quantization operation in image
compression. Since quantization itself is lossy, de-quantization can never exactly
recover the transformed coefficients.
(c) Inverse Transformer – Performs exact reversal of the transformation
operation carried out in the corresponding image compression system. The output
of this block can be used for display.
Differential Pulse Code Modulation (DPCM)
Encode the difference between the current and previous 8x8 block
From the behavior in the past, the future signal values can be approximately
estimated
DPCM encoder consists of three steps:
– Predict the current signal value x(n) from past values x(n-1), x(n-2), ..
– Quantize the difference signal = prediction error
– Encode, e.g. with a VLC, the prediction difference (prediction error).
In basic DPCM the output of the ADC is used directly, so the computed
difference signal (known as the residual signal) includes the quantization errors
that the ADC operations produce; the previous value held in the register is
therefore only an approximate value. A more accurate version of the new value
can be obtained by using an estimate of the current signal formed from several
past values weighted by prediction coefficients. As an example, in the diagram
below, the difference signal is computed by subtracting varying proportions of
the last three predicted values from the current digitized value output by the ADC.
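A minimal DPCM encoder/decoder sketch, assuming a simple previous-sample predictor and a uniform quantizer with step size `step` (both our own simplifications of the scheme described above):

```python
def dpcm_encode(samples, step=1):
    # Predict each sample as the previous reconstructed value,
    # then quantize and transmit the prediction error.
    pred = 0
    codes = []
    for x in samples:
        diff = x - pred
        q = round(diff / step)   # quantized prediction error
        codes.append(q)
        pred = pred + q * step   # track the decoder's reconstruction
    return codes

def dpcm_decode(codes, step=1):
    # Rebuild the signal by accumulating the decoded differences.
    pred = 0
    out = []
    for q in codes:
        pred = pred + q * step
        out.append(pred)
    return out

signal = [12, 13, 11, 11, 10]
codes = dpcm_encode(signal)
print(codes)               # [12, 1, -2, 0, -1]
print(dpcm_decode(codes))  # [12, 13, 11, 11, 10]
```

With step size 1 the scheme is lossless; a coarser step trades reconstruction accuracy for fewer bits per difference.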
Principle: the number of bits used to encode the difference signal varies
according to its amplitude.
• The use of two subbands has the advantage that different bit rates can be
used for each
• In general the frequency components in the lower subband have a higher
perceptual importance than those in the higher subband
• For example with a bit rate of 64 kbps the lower subband is ADPCM
encoded at 48kbps and the upper subband at 16kbps
• The two bitstreams are then multiplexed together to produce the transmitted
(64 kbps) signal – in such a way that the decoder in the receiver is able to
divide them back again into two separate streams for decoding
Adaptive predictive coding
• Even higher levels of compression possible at higher levels of complexity
• These can be obtained by also making the predictor coefficients adaptive
• In practice, the optimum set of predictor coefficients continuously vary since
they are a function of the characteristics of the audio signal being digitized
• To exploit this property, the input speech signal is divided into fixed time
segments and, for each segment, the currently prevailing characteristics are
determined.
• The optimum set of coefficients is then computed, and these are used to
predict the subsequent signal more accurately
• This type of compression can reduce the bandwidth requirements to 8kbps
while still obtaining an acceptable perceived quality
Linear predictive coding involves the source simply analyzing the audio
waveform to determine a selection of the perceptual features it contains.
• With this type of coding the perceptual features of an audio waveform are
analysed first
• These are then quantized and sent and the destination uses them, together
with a sound synthesizer, to regenerate a sound that is perceptually
comparable with the source audio signal
• With this compression technique, although the speech can often sound
synthetic, high levels of compression can be achieved
• In terms of speech, the three features which determine the perception of a
signal by the ear are its:
Pitch: this is closely related to the frequency of the signal. This is important
since the ear is more sensitive to signals in the range 2–5 kHz
Period: this is the duration of the signal
Loudness: This is determined by the amount of energy in the signal
• The input speech waveform is first sampled and quantized at a defined rate
• A block of digitized samples – known as a segment – is then analysed to
determine the various perceptual parameters of the speech that it contains
• The output of the encoder is a string of frames, one for each segment
• Each frame contains fields for pitch and loudness (the period is determined
by the sampling rate being used), a notification of whether the signal is
voiced (generated through the vocal cords) or unvoiced (vocal cords are
open), and a new set of computed model coefficients
3. Lossless Compression
-Run Length Coding, Statistical Coding, Huffman Coding, Dictionary
Coding, Arithmetic Coding.
Entropy encoding
• Entropy encoding is lossless and independent of the type of information
(semantic/ structure of the source information) that is being compressed. It
is concerned only with how the information is represented.
• e.g. Run-length encoding
• Statistical encoding
» Huffman encoding
» Arithmetic encoding
Run length coding- Lossless compression
Run-length coding is a compression technique in which sequences of the same
byte are replaced with a flag and the number of occurrences. Any unused special
symbol that does not appear in the stream is used as the flag.
• It is effective when the source information contains long substrings of the
same character or binary digit.
Example 1
Let a data stream contain AAAAABCDDDDDDF
The run-length coding of this will be A!5BCD!6F
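The scheme can be sketched in Python; the `min_run` threshold (do not expand runs too short to pay for the flag overhead) is our own assumption:

```python
def rle_encode(data, flag="!", min_run=4):
    # Replace runs of >= min_run identical characters with
    # char + flag + count; shorter runs are copied unchanged.
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        run = j - i
        if run >= min_run:
            out.append(data[i] + flag + str(run))
        else:
            out.append(data[i] * run)
        i = j
    return "".join(out)

print(rle_encode("AAAAABCDDDDDDF"))  # A!5BCD!6F
```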
Statistical encoding
Based on the probability of occurrence of a pattern
Statistical encoding uses a set of variable-length code words with the shorter
code words used to represent the more frequently occurring symbols.
Ensure that a shorter codeword in the set does not form the start of a longer
codeword; otherwise decoding will be ambiguous.
“Prefix property”: a shorter codeword must not form the start of a longer
codeword
Example Huffman coding algorithm
Entropy
The theoretical minimum average number of bits that are required to transmit a
particular source stream is known as the entropy of the source:

H = − Σ_{i=1}^{N} p(s_i) log2 p(s_i)

where N is the number of different symbols in the source stream and p(s_i) is the
probability of occurrence of symbol s_i.
How to calculate the average codeword length
The average number of bits per codeword is given by

L = Σ_{i=1}^{N} N_i p(s_i)

where N_i is the bit length of the ith codeword of the codebook.
Find the average codeword length for the given table:

Symbol  A     B     C     D     E
P(S)    0.25  0.30  0.12  0.15  0.18
Code    01    11    100   101   00

L = 0.25×2 + 0.30×2 + 0.12×3 + 0.15×3 + 0.18×2 = 2.27 bits
Solution
Step 1: Sort all symbols according to their probabilities (left to right), from
smallest to largest; these are the leaves of the Huffman tree.
Step 2: Repeatedly merge the two nodes with the smallest probabilities into a
new node whose probability is their sum, until a single root remains.
Step 3: Label left branches of the tree with 0 and right branches of the tree
with 1.
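The merge-and-label procedure can be sketched with a binary heap (a hedged sketch; `huffman_codes` is our own name, and the exact 0/1 labels may differ from a hand-drawn tree even though the code lengths match):

```python
import heapq
from itertools import count

def huffman_codes(probs):
    # Build the Huffman tree by repeatedly merging the two least
    # probable nodes; then read codes off the tree (0 = left, 1 = right).
    tie = count()  # tie-breaker so heapq never compares tree nodes
    heap = [(p, next(tie), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tie), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf symbol
            codes[node] = prefix or "0"
        return codes
    return walk(heap[0][2], "")

probs = {"A": 0.25, "B": 0.30, "C": 0.12, "D": 0.15, "E": 0.18}
codes = huffman_codes(probs)
avg = sum(probs[s] * len(codes[s]) for s in probs)
print(codes)
print(round(avg, 2))  # 2.27, matching the table above
```

The resulting code lengths (2, 2, 3, 3, 2 bits for A–E) reproduce the 2.27-bit average of the worked table.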
Example 2
Find the Huffman code for A,B,C,D,E,F,G,H respective probability of
occurrence
0.25, 0.25, 0.14, 0.14,0.055, 0.055,0.055,0.055.
Entropy, H: theoretical min. avg. number of bits that are required to
transmit a particular stream
H = − Σ_{i=1}^{N} P_i log2 P_i
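As a quick check, the entropy formula can be evaluated in Python for the five-symbol source used earlier (`entropy` is our own helper name):

```python
import math

def entropy(probs):
    # H = -sum p_i * log2(p_i): the theoretical minimum average
    # number of bits per symbol for a memoryless source.
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [0.25, 0.30, 0.12, 0.15, 0.18]
print(round(entropy(p), 3))  # 2.244
```

The entropy (about 2.244 bits/symbol) is slightly below the 2.27-bit average Huffman codeword length, as expected for a prefix code.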
Student’s exercise
1. Calculate the entropy of the above two problems :
2. Construct the Huffman coding tree for
Symbol A B C D E
Probability 0.25 0.30 0.12 0.15 0.18
Arithmetic Coding
This is as optimal an algorithm as Huffman coding with respect to compression
ratio. It is also a lossless compression technique. Arithmetic coding yields a
single codeword for each encoded string of characters.
LZW Coding
• When the available dictionary space becomes full, the number of entries is
allowed to increase incrementally.
Example : LZW compression for string “ABABBABCABABBA"
Given :
Code 1 2 3
String A B C
ABABBABCABABBA
Encoded output: 1 2 4 5 2 3 4 6 1
DECODING: 1 2 4 5 2 3 4 6 1
where K is the subsequent input code.
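The LZW example above can be sketched in Python, assuming the given three-symbol initial dictionary (A=1, B=2, C=3); the function names are ours:

```python
def lzw_encode(text, alphabet):
    # Dictionary starts with the single-character alphabet; each time a
    # code is emitted, (current string + next char) is added to it.
    dic = {ch: i + 1 for i, ch in enumerate(alphabet)}
    next_code = len(dic) + 1
    w, out = "", []
    for ch in text:
        if w + ch in dic:
            w += ch
        else:
            out.append(dic[w])
            dic[w + ch] = next_code
            next_code += 1
            w = ch
    if w:
        out.append(dic[w])
    return out

def lzw_decode(codes, alphabet):
    # Rebuild the same dictionary on the fly from the code stream.
    dic = {i + 1: ch for i, ch in enumerate(alphabet)}
    next_code = len(dic) + 1
    w = dic[codes[0]]
    out = [w]
    for k in codes[1:]:
        # If code k is not yet in the table, it must be w + w[0].
        entry = dic[k] if k in dic else w + w[0]
        out.append(entry)
        dic[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)

codes = lzw_encode("ABABBABCABABBA", "ABC")
print(codes)                        # [1, 2, 4, 5, 2, 3, 4, 6, 1]
print(lzw_decode(codes, "ABC"))     # ABABBABCABABBA
```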
4. Lossy Compression
-Transform coding, DFT, DCT, Haar Transform, KLT, Wavelet Transforms,
Embedded Zero Tree Coder
Differential encoding
Instead of sending large codewords for the source information, send a set of
smaller codewords which indicate the difference in amplitude between the
current value of the signal being encoded and the immediately preceding value.
Example: 12 bits are required to represent the dynamic range (current value) of
the signal, but only 3 bits are required to transmit the difference between
successive samples.
Example 2
If the sequence of DC coefficients is
12, 13, 11, 11, 10, ...
then the corresponding difference values would be
12, 1, −2, 0, −1, ...
Difference values are encoded in the form (SSS, value) where SSS field indicates
the number of bits needed to encode the value and value field the actual bits that
represent the value
Value   SSS field   Value field
12      4           1100
1       1           1
−2      2           01
0       0           (none)
−1      1           0
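The (SSS, value) table can be reproduced with a short sketch, assuming the JPEG-style convention that negative values are stored as the one's complement of their magnitude (the helper name is ours):

```python
def sss_encode(v):
    # (SSS, value) coding of a difference: SSS is the number of bits
    # in |v|; negative values use the one's complement of |v| as the
    # value field, i.e. v + 2^SSS - 1.
    if v == 0:
        return (0, "")
    sss = abs(v).bit_length()
    field = v if v > 0 else v + (1 << sss) - 1
    return (sss, format(field, "0{}b".format(sss)))

for v in [12, 1, -2, 0, -1]:
    print(v, sss_encode(v))
# 12 -> (4, '1100'), 1 -> (1, '1'), -2 -> (2, '01'),
# 0 -> (0, ''), -1 -> (1, '0')
```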
The basic idea is to decompose the image into a set of "waveforms", each with a
particular spatial frequency.
To human eyes, high spatial frequencies are imperceptible and a good
approximation of the image can be created by keeping only the lower
frequencies.
DCT Algorithm
1. Divide the picture into 16-by-16 blocks (macroblocks).
2. Each macroblock is 16 pixels by 16 lines (4 blocks).
3. Each block is 8 pixels by 8 lines.
4. Each pixel in the block represents the intensity value at a particular
position, f(i,j), where f(i,j) is the intensity of the pixel in row i and column j.
5. Apply the DCT over each 8 × 8 block to obtain the frequency coefficients.
6. Level shift so that all values are centered around zero: the 8-bit grayscale
values (with range 0 to 255) are level shifted by subtracting 128.
7. F(u,v) is the DCT coefficient in row u and column v of the DCT matrix.
8. F(0,0) is the DC coefficient.
9. The remaining coefficients (u or v nonzero) are the AC coefficients.
10. For most images, much of the signal energy lies at low frequencies; these
appear in the upper left corner of the DCT.
11. Compression is achieved since the lower-right values represent higher
frequencies, and are often small enough to be neglected with little visible
distortion.
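A direct (unoptimized) Python implementation of the 2-D DCT described in the steps above; for a constant (flat) block all the energy lands in the DC coefficient F(0,0), illustrating step 10:

```python
import math

def dct2(block):
    # 2-D DCT-II of an N x N block:
    # F(u,v) = (2/N) C(u) C(v) * sum_i sum_j f(i,j)
    #          * cos((2i+1)u*pi/2N) * cos((2j+1)v*pi/2N)
    n = len(block)
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for i in range(n):
                for j in range(n):
                    s += (block[i][j]
                          * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            out[u][v] = (2.0 / n) * c(u) * c(v) * s
    return out

# A flat 8x8 block of intensity 50: all energy goes to F(0,0),
# and every AC coefficient is (numerically) zero.
coeffs = dct2([[50] * 8] * 8)
print(round(coeffs[0][0], 1))  # 400.0  (= 8 * 50)
```

Real codecs use fast factorizations of the DCT; this quadruple loop is only meant to make the formula concrete.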
Our goal is to find a transform T such that the components of the output Y are
uncorrelated, i.e. E[Y_t Y_s] = 0 if t ≠ s. Thus, the autocorrelation matrix of Y
takes on the form of a positive diagonal matrix.
• Since any autocorrelation matrix is symmetric and non-negative definite,
there are k orthogonal eigenvectors u1, u2,... , uk and k corresponding real
and nonnegative eigenvalues λ1 ≥ λ2 ≥ ··· ≥ λk ≥ 0.
If we define the Karhunen-Loeve transform as
T = [u1, u2, ..., uk]^T
• Then, the autocorrelation matrix of Y becomes the diagonal matrix
diag(λ1, λ2, ..., λk).
Example: for the four input vectors
x1 = (4, 4, 5)^T, x2 = (3, 2, 5)^T, x3 = (5, 7, 6)^T, x4 = (6, 7, 7)^T,
the mean vector is
m_x = (1/4) Σ x_i = (4.5, 5, 5.75)^T
Subtracting the mean vector from each input vector and applying the KLT,
y_i = T (x_i − m_x), gives the output vectors.
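The example can be reproduced numerically, assuming NumPy is available (a sketch; the variable names are ours):

```python
import numpy as np

# The four sample vectors from the example above, one per row.
X = np.array([[4, 4, 5],
              [3, 2, 5],
              [5, 7, 6],
              [6, 7, 7]], dtype=float)
m = X.mean(axis=0)                # mean vector (4.5, 5, 5.75)

Xc = X - m                        # subtract the mean
R = Xc.T @ Xc / len(X)            # autocorrelation matrix of the data
vals, vecs = np.linalg.eigh(R)    # eigendecomposition (symmetric R)
order = np.argsort(vals)[::-1]    # eigenvalues in decreasing order
T = vecs[:, order].T              # KLT: rows are the eigenvectors

Y = (T @ Xc.T).T                  # transformed (decorrelated) vectors
RY = Y.T @ Y / len(Y)             # autocorrelation of Y: ~diagonal
print(np.round(m, 2))
print(np.round(RY, 6))            # off-diagonal entries ~ 0
```

The printed autocorrelation of Y is diagonal (up to floating-point error), with the eigenvalues in decreasing order on the diagonal, which is exactly the decorrelation property claimed above.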
Wavelet based coding – Haar Transform
The Haar transform is a very fast transform and the easiest wavelet transform. It
is useful in edge detection, image coding, and image analysis problems. Its
energy compaction is fair, but it is not among the best compression algorithms.
The simplest wavelet transform is the so-called Haar wavelet transform. Here we
repeatedly take averages and differences and keep the result at every step; thus
we perform a multiresolution analysis. This creates smaller and smaller images,
i.e. 1/4, 1/16 and so on of the original size.
Haar Transform steps
1. Find the average of each pair of samples.
2. Find half the difference of each pair of samples.
3. Insert the averages into the first half of the output and the differences into
the second half.
4. Repeat the process on the averages until a single average remains.
Example 1: for the pair (2, 6), the average is (2+6)/2 = 4 and the half-difference
is (2−6)/2 = −2.
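The averaging-and-differencing steps above can be sketched as follows (the function names are ours):

```python
def haar_step(signal):
    # One level of the Haar transform: pairwise averages (low-pass)
    # followed by pairwise half-differences (high-pass).
    avgs = [(a + b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    diffs = [(a - b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    return avgs + diffs

def haar(signal):
    # Full multiresolution analysis: repeat on the low-pass half.
    out = list(signal)
    n = len(out)
    while n > 1:
        out[:n] = haar_step(out[:n])
        n //= 2
    return out

print(haar_step([2, 6, 4, 8]))  # [4.0, 6.0, -2.0, -2.0]
print(haar([2, 6, 4, 8]))       # [5.0, -1.0, -2.0, -2.0]
```

The first output shows one level: averages (4, 6) followed by half-differences (−2, −2), matching the (2, 6) example above.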
2D Wavelet Transform
Wavelet transform represents a signal with good resolution in both frequency and
time, by using a set of wavelet basis functions.
The objective of the wavelet transform is to decompose the input signal into set
of basis components (wavelets) that are easier to deal with, or have some
components that can be thresholded away, for compression purposes.
Discrete wavelets are formed from a mother wavelet, but with scale and
shift in discrete steps. The DWT makes the connection between wavelets in the
continuous time domain and filter banks in the discrete time domain in a
multiresolution analysis framework.
The wavelet ψ_{a,b} is computed from the mother wavelet ψ by translation and
dilation:

ψ_{a,b}(x) = (1/√a) ψ((x − b)/a)
After the above steps, one stage of the DWT is complete. The transformed
image now contains four subbands LL, HL, LH, and HH, standing for low-low,
high-low, etc. The LL subband can be further decomposed to yield yet another
level of decomposition. This process can be continued until the desired number
of decomposition levels is reached
Wavelet Packets
In the usual dyadic wavelet decomposition, only the low-pass filtered subband is
recursively decomposed and thus can be represented by a logarithmic tree
structure. A wavelet packet decomposition allows the decomposition to be
represented by any pruned subtree of the full tree topology.
The wavelet packet decomposition is very flexible since a best wavelet
basis in the sense of some cost metric can be found within a large library of
permissible bases. The computational requirement for wavelet packet
decomposition is relatively low, as each decomposition can be computed in
O(N log N) operations using fast filter banks.
Embedded Zero-tree of Wavelet
Using an embedded code allows the encoder to terminate the encoding at any
point. Hence, the encoder is able to meet any target bit-rate exactly. Similarly, a
decoder can cease to decode at any point and can produce reconstructions
corresponding to all lower-rate encodings
• An embedded code contains all lower-rate codes embedded at the
beginning of the bit stream.
• The EZW algorithm uses a data structure called the zero-tree.
• The coefficient at the coarse scale is called the "parent", while all
corresponding coefficients at the next finer scale of the same spatial
location and similar orientation are called "children".
• The EZW algorithm consists of two central components: the zero-tree data
structure and the method of successive approximation quantization.
After the first stage of decomposition the image is split into the subbands LL1,
HL1, LH1, and HH1. After the second stage, LL1 is further split into LL2, HL2,
LH2, and HH2, so the final layout contains LL2, HL2, LH2, and HH2 together
with HL1, LH1, and HH1.
The significance map is coded using the zero-tree with a four-symbol alphabet:
positive significant (POS), negative significant (NEG), zerotree root (ZTR), and
isolated zero (IZ). Consider the coefficient matrix

26   6  13  10
-7   7   6   4
 4  -4   4  -3
 2  -2  -2   0

Fix the initial threshold value T0 = 2^⌊log2(max |c(i,j)|)⌋ = 2^⌊log2 26⌋ = 16.
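To make the dominant pass concrete, here is a hedged Python sketch for the 4 × 4 two-level example above. It scans coefficients in raster order (Shapiro's algorithm scans subband by subband) and gives the LL coefficient no children; both are simplifications of the full EZW coder:

```python
import math

def children(i, j, n):
    # Parent-child links for a 2-level decomposition of an n x n
    # matrix: each coarse-scale coefficient outside LL has four
    # children at the same orientation; finest-scale ones have none.
    if (i, j) == (0, 0) or i >= n // 2 or j >= n // 2:
        return []
    return [(2 * i, 2 * j), (2 * i, 2 * j + 1),
            (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]

def dominant_pass(coeffs, T):
    # Classify each coefficient against threshold T using the four
    # EZW symbols: POS, NEG, ZTR (zerotree root), IZ (isolated zero).
    n = len(coeffs)
    symbols = []
    skipped = set()          # descendants of zerotree roots
    for i in range(n):
        for j in range(n):
            if (i, j) in skipped:
                continue
            c = coeffs[i][j]
            if abs(c) >= T:
                symbols.append("POS" if c > 0 else "NEG")
                continue
            desc = children(i, j, n)
            # Check every descendant for significance.
            stack = list(desc)
            all_small = True
            while stack:
                a, b = stack.pop()
                if abs(coeffs[a][b]) >= T:
                    all_small = False
                stack.extend(children(a, b, n))
            if desc and all_small:
                symbols.append("ZTR")
                skipped.update(desc)  # children need no symbol
            else:
                symbols.append("IZ")
    return symbols

c = [[26, 6, 13, 10],
     [-7, 7, 6, 4],
     [4, -4, 4, -3],
     [2, -2, -2, 0]]
T0 = 2 ** int(math.log2(26))          # initial threshold = 16
print(T0, dominant_pass(c, T0))       # 16 ['POS', 'ZTR', 'ZTR', 'ZTR']
```

At threshold 16 only the coefficient 26 is significant; every other coarse-scale coefficient is a zerotree root, so the whole 16-coefficient map is described by just four symbols.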