Department of Computer Science
Lecture notes:
IMAGE COMPRESSION
Pasi Fränti
TABLE OF CONTENTS:
IMAGE COMPRESSION
"A picture takes more than thousand bytes"
1. INTRODUCTION
1.1 Image types
1.2 Lossy versus lossless compression
1.3 Performance criteria in image compression
2 FUNDAMENTALS IN DATA COMPRESSION
2.1 Modelling
2.2 Coding
3 BINARY IMAGES
3.1 Run-length coding
3.2 READ code
3.3 CCITT group 3 and group 4 standards
3.4 Block coding
3.5 JBIG
3.6 JBIG2
3.7 Summary of binary image compression algorithms
4 CONTINUOUS TONE IMAGES
4.1 Lossless and near-lossless compression
4.2 Block truncation coding
4.3 Vector quantization
4.4 JPEG
4.5 Wavelet
4.6 Fractal coding
5 VIDEO IMAGES
LITERATURE
1. Introduction
The purpose of compression is to code the image data into a compact form, minimizing both
the number of bits in the representation, and the distortion caused by the compression. The
importance of image compression is emphasized by the huge amount of data in raster images:
a typical grayscale image of 512×512 pixels, each represented by 8 bits, contains 256
kilobytes of data. With the color information, the amount of data is tripled. For video of
25 frames per second, even one second of color film requires approximately 19 megabytes of
memory; the hard disk of a typical PC (540 MB) can therefore store only about 30 seconds of
film. Thus, the necessity for compression is obvious.

There exist a number of universal data compression algorithms that can compress almost any
kind of data; the best known of these are the family of Ziv-Lempel algorithms. These
methods are lossless in the sense that they retain all the information of the compressed data.
However, they do not take advantage of the 2-dimensional nature of the image data.
Moreover, only a modest portion of the data can be removed by a lossless compression method,
and thus lossy methods are more widely used in image compression. The use of lossy
compression is always a trade-off between the bit rate and the image quality.
1.1 Image types
From the compression point of view, the images can be classified as follows:
· binary images
· grayscale images
· color images
· video images
An illustration of this classification is given in Figure 1.1. Note that the groups of grayscale,
color, and video images are closely related to each other, but there is a gap between the binary
images and the grayscale images. This reflects the corresponding separation of the compression
algorithms: methods that are designed for grayscale images can also be applied to the color and
video images, but they usually do not apply to binary images, which form a distinct class from
this point of view.

For comparison, Figure 1.1 also shows the class of textual data. The fundamental difference
between images and, for example, English text is the 2-dimensionality of the image data. Another
important property is that the gray scales are countable: the gray scales 41 and 42 are clearly
close to each other, whereas it is not evident that any two subsequent symbols, e.g. 'a' and 'b',
are. These properties distinguish image data from other data such as English text.

Note also that the class of color-palette images appears on the border line between image data
and non-image data. This reflects the lack of a countable alphabet in color-palette images,
which makes them closer to other data. In fact, color-palette images are often compressed by
universal compression algorithms, see Section 4.7.
[Figure: binary images, gray-scale images, true colour images, and video images grouped under image compression; colour-palette images lie on the border line; textual data belongs to universal compression.]
Figure 1.1: Classification of the images from the compression point of view.
1.2 Lossy versus lossless compression
The motivation for lossy compression originates from the inability of the lossless algorithms
to produce as low bit rates as desired. Figure 1.2 illustrates typical compression performances
for different types of images and types of compression. As one can see from the example, the
situation is significantly different with binary and grayscale images. In binary image
compression, very good compression results can be achieved without any loss in the image
quality. On the other hand, the results for grayscale images are much less satisfactory. This
deficiency is emphasized by the large amount of data in the original image when compared to
a binary image of equal resolution.
[Figure: compressed size as a percentage of the original file size — image CCITT-3 (binary), lossless JBIG: 4.3 %; image LENA (gray-scale), lossless JPEG: 53.4 %; image LENA (gray-scale), lossy JPEG: 6.7 %.]
Figure 1.2: Example of typical compression performance.
The fundamental question of lossy compression techniques is where to lose information. The
simplest answer is that information should be lost wherever the distortion is least; this, of
course, depends on how we define distortion. We will return to this matter in more detail in
Section 1.3.

The primary use of images is for human observation. It is therefore possible to take
advantage of the limitations of the human visual system and lose information that is less
visible to the human eye. On the other hand, the desired information in an image may not
always be visible to the human eye at all. To discover the essential information, an expert in
the field and/or image processing and analysis may be needed, cf. medical applications.
In the definition of lossless compression, it is assumed that the original image is in digital
form. However, one must always keep in mind that the actual source may be an analog view
of the real world. Therefore the loss in the image quality already takes place in the image
digitization, where the picture is converted from analog signal to digital. This can be
performed by an image scanner, digital camera, or any other suitable technique.
The characteristics of the human eye cannot therefore be utilized, unless the
application in question is definitely known.

Here we will ignore the digitization phase and assume that the images are already stored in
digital form. These matters, however, should not be ignored when designing the entire image
processing application. It is still worth mentioning that while the lossy methods seem to be
the mainstream of research, there is still a need for lossless methods, especially in medical
imaging and remote sensing (i.e. satellite imaging).
1.3 Performance criteria in image compression
The aim of image compression is to transform an image into compressed form so that the
information content is preserved as much as possible. Compression efficiency is the principal
parameter of a compression technique, but it is not sufficient by itself. It is simple to design a
compression algorithm that achieves a low bit rate, but the challenge is how to preserve the
quality of the reconstructed image at the same time. The two main criteria of measuring the
performance of an image compression algorithm thus are compression efficiency and
distortion caused by the compression algorithm. The standard way to measure them is to fix a
certain bit rate and then compare the distortion caused by different methods.
The third feature of importance is the speed of the compression and decompression process.
In online applications the waiting times of the user are often critical factors. In the extreme
case, a compression algorithm is useless if its processing time causes an intolerable delay in
the image processing application. In an image archiving system one can tolerate longer
compression times if the compression can be done as a background task. However, fast
decompression is usually desired.
Among other interesting features of the compression techniques we may mention the
robustness against transmission errors, and memory requirements of the algorithm. The
compressed image file is normally an object of a data transmission operation. The
transmission is in the simplest form between internal memory and secondary storage but it
can as well be between two remote sites via transmission lines. The data transmission systems
commonly contain fault-tolerant internal data formats, so that this property is not always
obligatory. The memory requirements are often of secondary importance; however, they may
be a crucial factor in hardware implementations.

From the practical point of view, the last but often not least feature is the complexity of the
algorithm itself, i.e. the ease of implementation. The reliability of the software often depends
heavily on the complexity of the algorithm. Let us next examine how these criteria can be
measured.
Compression efficiency
The compression efficiency is usually measured by the bit rate:

\[ \text{bit rate} = \frac{\text{size of the compressed file}}{\text{pixels in the image}} = \frac{C}{N} \quad \text{(bits per pixel)} \tag{1.1} \]

where C is the number of bits in the compressed file, and N (= X×Y) is the number of pixels in
the image. If the bit rate is very low, the compression ratio might be a more practical measure:

\[ \text{compression ratio} = \frac{\text{size of the original file}}{\text{size of the compressed file}} = \frac{N \times k}{C} \tag{1.2} \]

where k is the number of bits per pixel in the original image.
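As a concrete illustration of equations (1.1) and (1.2), the following small sketch computes both measures in Python for the 512×512×8 image of the introduction; the compressed size of 32768 bytes is purely an assumed figure.

def bit_rate(compressed_bits, num_pixels):
    # Equation (1.1): bits per pixel.
    return compressed_bits / num_pixels

def compression_ratio(num_pixels, bits_per_pixel, compressed_bits):
    # Equation (1.2): original size divided by compressed size.
    return (num_pixels * bits_per_pixel) / compressed_bits

N = 512 * 512                        # pixels in the image
C = 32768 * 8                        # compressed file size in bits (assumed)
print(bit_rate(C, N))                # 1.0 bits per pixel
print(compression_ratio(N, 8, C))    # 8.0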
Distortion
Distortion measures can be divided into two categories: subjective and objective measures.
A distortion measure is said to be subjective, if the quality is evaluated by humans. The use of
human analysts, however, is quite impractical and therefore rarely used. The weakest point of
this method is the subjectivity in the first place: it is impossible to establish a single group of
humans (preferably experts in the field) that everyone could consult to get a quality
evaluation of their pictures. Moreover, the definition of distortion highly depends on the
application, i.e. the best quality evaluation is not always made by people at all.

Objective distortion measures, in contrast, are computed from the pixel values of the original
and the reconstructed image. The most commonly used measures are the mean absolute error
(MAE) and the mean square error (MSE):

\[ \text{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left| y_i - x_i \right| \tag{1.3} \]

\[ \text{MSE} = \frac{1}{N}\sum_{i=1}^{N} \left( y_i - x_i \right)^2 \tag{1.4} \]

where x_i is a pixel value of the original image and y_i the corresponding pixel value of the
reconstructed image.
These measures are widely used in the literature. Unfortunately these measures do not always
coincide with the evaluations of a human expert. The human eye, for example, does not
observe small changes of intensity between individual pixels, but is sensitive to the changes in
the average value and contrast in larger regions. Thus, one approach would be to calculate the
mean values and variances of some small regions in the image, and then compare them
between the original and the reconstructed image. Another deficiency of these distortion
functions is that they measure only local, pixel-by-pixel differences, and do not consider
global artifacts, like blockiness, blurring, or the jaggedness of the edges.
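The objective measures are straightforward to compute. Below is a minimal sketch of equations (1.3) and (1.4) for two small toy pixel sequences (the values are invented for illustration only):

def mae(original, reconstructed):
    # Equation (1.3): mean absolute error.
    n = len(original)
    return sum(abs(y - x) for x, y in zip(original, reconstructed)) / n

def mse(original, reconstructed):
    # Equation (1.4): mean square error.
    n = len(original)
    return sum((y - x) ** 2 for x, y in zip(original, reconstructed)) / n

x = [100, 102, 104, 110]    # original pixel values (toy data)
y = [101, 102, 103, 114]    # reconstructed pixel values (toy data)
print(mae(x, y))            # 1.5
print(mse(x, y))            # 4.5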
2 Fundamentals in data compression
Data compression can be seen as consisting of two separate components: modelling and coding.
The modelling in image compression consists of the following issues:
· How, and in what order, is the image processed?
· What are the symbols (pixels, blocks) to be coded?
· What is the statistical model of these symbols?
The coding consists merely of the selection of the code table: what codes will be assigned to
the symbols to be coded. The code table should match the statistical model as well as possible
to obtain the best possible compression. The key idea of coding is to apply variable-length
codes so that more frequent symbols are coded with fewer bits than the less frequent
symbols. The only requirement of coding is that it is uniquely decodable, i.e. any two
different input files must result in different code sequences.
A desirable (but not necessary) property of a code is the so-called prefix property, i.e. no code
of any symbol can be a prefix of the code of another symbol. The consequence of this is that
the codes are instantaneously decodable: a symbol can be recognized from the code
stream right after its last bit has been received. Well-known prefix codes are the Shannon-Fano
and Huffman codes. They can be constructed empirically on the basis of the source. Another
coding scheme, known as Golomb-Rice codes, is also a prefix code, but it presumes a certain
distribution of the source.
The coding is usually considered the easy part of compression. This is because the
coding can be done optimally (corresponding to the model) by arithmetic coding! It is
optimal not only in theory but also in practice, no matter what the source is. Thus the
performance of a compression algorithm depends on the modelling, which is the key issue in
data compression. Arithmetic coding, on the other hand, is sometimes replaced by
suboptimal codes like Huffman coding (or another coding scheme) because of practical aspects,
see Section 2.2.2.
2.1 Modelling
2.1.1 Segmentation
The models of most lossless compression methods (for both binary and grayscale images)
are local in the way they process the image. The image is traversed pixel by pixel (usually in
row-major order), and each pixel is coded separately. This makes the model relatively simple
and practical (small memory requirements). On the other hand, the compression schemes are
limited to the local characteristics of the image.
At the other extreme there are global modelling methods. Fractal compression techniques are
an example of modelling methods of this kind. They decompose the image into smaller parts
which are described as linear combinations of the other parts of the image, see Figure 2.1.
Global modelling is somewhat impractical because of the computational complexity of the
methods.
Block coding is a compromise between the local and global models. Here the image is
decomposed into smaller blocks which are separately coded. The larger the block, the better
the global dependencies can be utilized. The dependencies between different blocks, however,
are often ignored. The shape of the block can be uniform, and is often fixed throughout the
image. The most common shapes are square and rectangular blocks because of their practical
usefulness. In quadtree decomposition the shape is fixed but the size of the block varies.
A quadtree thus offers a suitable technique to adapt to the shape of the image at the cost of a
few extra bits describing the structure of the tree.
In principle, any segmentation technique can be applied in block coding. The use of more
complex segmentation techniques is limited because they are often computationally
demanding, but also because of the overhead required to code the block structure; as the
shape of the segmentation is adaptively determined, it must be transmitted to the decoder also.
The more complex the segmentation, the more bits are required. The decomposition is thus a
trade-off between the bit rate and the quality of the segmentation:
Simple segmentation:
+ Only small cost in bit rate, if any.
+ Simple decomposition algorithm.
− Poor segmentation.
− Coding of the blocks is a key issue.

Complex segmentation:
− High cost in the bit rate.
− Computationally demanding decomposition.
+ Good segmentation according to image shape.
+ Blocks are easier to code.
"pattern" "shape"
Figure 2.1: Intuitive idea of global modelling.
2.1.2 Order of processing
The next questions after the block decomposition are:
· In what order are the blocks (or the pixels) of the image processed?
· In what order are the pixels inside a block processed?
The most common order of processing is row-major order (top-to-bottom, and from left to
right). If a particular compression algorithm considers only 1-dimensional dependencies (e.g.
Ziv-Lempel algorithms), an alternative processing method would be the so-called zigzag
scanning, which utilizes the two-dimensionality of the image data, see Figure 2.2.

A drawback of the top-to-bottom processing order is that the image is only partially "seen"
during the decompression. Thus, after decompressing 10 % of the image pixels, little is
known about the rest of the image. A quick overview of the image, however, would be
convenient for example in image archiving systems where the image database is often
browsed to retrieve a desired image. Progressive modelling is an alternative approach to the
ordering of the image that avoids this deficiency.
The idea in the progressive modelling is to arrange for the quality of an image to increase
gradually as data is received. The most "useful" information in the image is sent first, so that
the viewer can begin to use the image before it is completely displayed, and much sooner than
if the image were transmitted in normal raster order. There are three basically different ways
to achieve progressive transmission:
· In transform coding, the low-frequency components of the blocks are transmitted first.
· In vector quantization, coding begins with a limited palette of colors, and gradually more
information is provided so that the color details increase with time.
· In pyramid coding, a low-resolution version of the image is transmitted first, followed by
gradually increasing resolutions until full precision is reached, see Figure 2.3 for an
example.

These progressive modes of operation will be discussed in more detail in Section 4.
Figure 2.2: Zigzag scanning; (a) in pixel-wise processing; (b) in a DCT-transformed block.

Figure 2.3: Early stage of transmission (0.1 %, 0.5 %, 2.1 %, 8.3 % of the data received) of the image Camera (256×256×8);
in sequential order (above); in progressive order (below).
2.1.3 Statistical modelling
Data compression in general is based on the following abstraction:
Data = information content + redundancy (2.1)
The aim of compression is to remove the redundancy and describe the data by its information
content. (An observant reader may notice some redundancy between the String Algorithms
course and this course.) In statistical modelling the idea is to "predict" the symbols that are to
be coded by using a probability distribution for the source alphabet. The information content
of a symbol in the alphabet is determined by its entropy:
\[ H(x) = -\log_2 p(x) \tag{2.2} \]

where x is the symbol and p(x) is its probability. The higher the probability, the lower the
entropy, and thus the shorter the codeword that should be assigned to the symbol. The entropy
H(x) gives the number of bits required, on average, to code the symbol x in order to achieve the
optimal result. The overall entropy of the probability distribution is given by:

\[ H = -\sum_{x=1}^{k} p(x) \cdot \log_2 p(x) \tag{2.3} \]
where k is the number of symbols in the alphabet. The entropy gives the lower bound of
compression that can be achieved (measured in bits per symbol), corresponding to the model.
(The optimal compression can be realized by arithmetic coding.) The key issue is how to
determine these probabilities. The modelling schemes can be classified into the following three
categories:
· Static modelling
· Semi-adaptive modelling
· Adaptive (or dynamic) modelling
In the static modelling the same model (code table) is applied to all input data ever to be
coded. Consider text compression; if the ASCII data to be compressed is known to consist of
English text, the model based on the frequency distribution of English text could be applied.
For example, the probabilities of the most likely symbols in an ASCII file of English text are
p(' ')=18 %, p('e')=10 %, and p('t')=8 % on average. Unfortunately, static modelling fails if
the input data is not English text but, for example, binary data of an executable file.
The advantages of static modelling are that no side information needs to be transmitted to the
decoder, and that the compression can be done in one pass over the input data.
Semi-adaptive modelling is a two-pass method in the sense that the input data is processed twice. In the
first pass the input data is analyzed and some statistical information about it (or the code table)
is sent to the decoder. In the second pass the actual compression is done on the basis of this
information (e.g. the frequency distribution of the data), which is now known by the decoder, too.
Dynamic modelling goes one step further and adapts to the input data "online" during the
compression. It is thus a one-pass method. As the decoder does not have any prior knowledge
of the input data, an initial model (code table) must be used for compressing the first
symbols of the data. However, as the coding/decoding proceeds, the information of the
symbols already coded/decoded can be taken advantage of. The model for a particular symbol
to be coded can be constructed on the basis of the frequency distribution of all the symbols
that have already been coded. Thus both the encoder and the decoder have the same information,
and no side information needs to be sent to the decoder.
Consider the symbol sequence of (a, a, b, a, a, c, a, a, b, a). If no prior knowledge is allowed,
one could apply the static model given in Table 2.1. In semi-adaptive modelling the frequency
distribution of the input data is calculated. The probability model is then constructed on the
basis of the relative frequencies of these symbols, see Table 2.1. The entropy of the input data
is:
Static model:

\[ H = \frac{1}{10}\left(1.58 + 1.58 + \ldots + 1.58\right) = 1.58 \tag{2.4} \]

Semi-adaptive model:

\[ H = \frac{1}{10}\left(0.51 + 0.51 + 2.32 + 0.51 + 0.51 + 3.32 + 0.51 + 0.51 + 2.32 + 0.51\right) = \frac{7}{10}\cdot 0.51 + \frac{2}{10}\cdot 2.32 + \frac{1}{10}\cdot 3.32 = 1.16 \tag{2.5} \]

Table 2.1: Example of the static and semi-adaptive models.

STATIC MODEL              SEMI-ADAPTIVE MODEL
symbol  p(x)   H          symbol  count  p(x)   H
a       0.33   1.58       a       7      0.70   0.51
b       0.33   1.58       b       2      0.20   2.32
c       0.33   1.58       c       1      0.10   3.32
In dynamic modelling the input data is processed as shown in Table 2.2. In the beginning,
some initial model is needed since no prior knowledge of the input data is allowed. Here
we assume equal probabilities. The probability of the first symbol ('a') is thus 0.33 and the
corresponding entropy 1.58. After that, the model is updated by increasing the count of the
symbol 'a' by one. Note that it is assumed that each symbol has occurred once before the
processing, i.e. their initial counts equal 1. (This is to avoid the so-called zero-frequency
problem: if a symbol had no occurrences, its probability would be 0.00, yielding an entropy
of ∞.)

At the second step, the symbol 'a' is again processed. Now the modified frequency
distribution gives the probability 2/4 = 0.50 for the symbol 'a', resulting in an entropy of 1.00.
As the coding proceeds, more accurate approximations of the probabilities are obtained,
and at the final step the symbol 'a' has the probability 7/12 = 0.58, resulting in an entropy
of 0.78. The sum of the entropies of the coded symbols is 14.5 bits, yielding an overall
entropy of 1.45 bits per symbol.
The corresponding entropies of the different modelling strategies are summarized here:
Static modelling: 1.58 (bits per symbol)
Semi-adaptive modelling: 1.16
Dynamic modelling: 1.45
Table 2.2: Example of the dynamic modelling. The numbers are the frequencies of the
symbols.
step:
symbol 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.
a 1 2 3 3 4 5 5 6 7 7
b 1 1 1 2 2 2 2 2 2 3
c 1 1 1 1 1 1 2 2 2 2
p(x) 0.33 0.50 0.20 0.50 0.57 0.13 0.56 0.60 0.18 0.58
H 1.58 1.00 2.32 1.00 0.81 3.00 0.85 0.74 2.46 0.78
It should be noted that in semi-adaptive modelling the information used in the model
must also be sent to the decoder, which increases the overall bit rate. Moreover,
dynamic modelling is inefficient at the early stage of the processing, but it quickly
improves its performance as more symbols are processed and thus more information
can be used in the modelling. The result of the dynamic model gets much closer to that of the
semi-adaptive model when the source is longer. Here the example was too short for the model
to have had enough time to adapt to the input data.
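The three strategies can be verified with a short sketch that codes the example sequence (a, a, b, a, a, c, a, a, b, a) and accumulates the entropies; the adaptive model starts with all counts equal to 1, as in Table 2.2:

from math import log2

data = list("aabaacaaba")
alphabet = ["a", "b", "c"]

# Static model: equal probability 1/3 for every symbol.
static = sum(-log2(1 / 3) for _ in data) / len(data)

# Semi-adaptive model: probabilities from the actual frequencies (7, 2, 1).
freq = {s: data.count(s) for s in alphabet}
semi = sum(-log2(freq[s] / len(data)) for s in data) / len(data)

# Adaptive (dynamic) model: counts start at 1 and are updated after each symbol.
counts = {s: 1 for s in alphabet}
bits = 0.0
for s in data:
    bits += -log2(counts[s] / sum(counts.values()))
    counts[s] += 1
adaptive = bits / len(data)

print(round(static, 2), round(semi, 2), round(adaptive, 2))   # 1.58 1.16 1.45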
The properties of the different modelling strategies are summarized as follows:
· Static modelling: one pass and no side information, but it fails if the input data does not fit the fixed model.
· Semi-adaptive modelling: adapts to the input data, but requires two passes and the model must be sent to the decoder.
· Adaptive (dynamic) modelling: one pass and no side information, but inefficient at the early stage of the processing.
Context modelling:
So far we have considered only the overall frequency distribution of the source, but paid no
attention to the spatial dependencies between the individual pixels. For example, the
intensities of neighboring pixels are very likely to be strongly correlated with each other.
Once again, consider ASCII data of English text, where the first five symbols have already
been coded: "The_q...". The frequency distribution of the letters in English would suggest that
the following letter is a blank with a probability of 18 %, or the letter 'e' (10 %), 't'
(8 %), or any other with decreasing probabilities. However, by consulting a dictionary under
the letter Q, it can be found that more than 99 % of the words have the letter 'u' following the
letter 'q' (e.g. quadtree, quantize, quality). Thus, the probability distribution highly depends on
the context in which the symbol occurs. The solution is to use not only one but several different models,
one for each context. The entropy of an N-level context model is the weighted sum of the
entropies of the individual contexts:

\[ H_N = -\sum_{j=1}^{N} p(c_j) \sum_{i=1}^{k} p(x_i \mid c_j) \cdot \log_2 p(x_i \mid c_j) \tag{2.6} \]
where p(x|c) is the probability of symbol x in context c, and N is the number of different
contexts. In image compression, the context is usually the value of the previous pixel, or
the values of two or more neighboring pixels to the west, north, north-west and north-east of
the current pixel. The only limitation is that the pixels within the context template must already
have been compressed, and thus seen by the decoder, so that both the encoder and decoder have the
same information.

The number of contexts equals the number of possible value combinations of the neighboring
pixels present in the context template. A 4-pixel template is given in Figure 2.4. The
number of contexts increases exponentially with the number of pixels in the template.
With one (8-bit) pixel in the template, the number of contexts is 2^8 = 256, but with two pixels the
number is already 2^(2×8) = 65536, which is rather impractical because of the high memory
requirement. One must also keep in mind that in semi-adaptive modelling all the models
must be sent to the decoder, so the question is crucial for the compression performance as well.
NW  N  NE
W
Figure 2.4: Example of a four-pixel context template.
A solution to this problem is to quantize the pixel values within the template in order to reduce the
number of possible combinations. For example, by quantizing the values to 4 bits each, the
total number of contexts in a one-pixel template is 2^4 = 16, and in a two-pixel template
2^(2×4) = 256. (Note that the quantization is performed only in computer memory in order to
determine which context model should be used; the original pixel values in the compressed file
are untouched.)
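As a sketch of how such a context number could be formed in practice, the four neighbors of Figure 2.4 are quantized to 4 bits each and packed into a single integer that selects the context model; the template and the quantization follow the discussion above, but the packing order is an arbitrary choice made here for illustration:

def context_number(w, n, nw, ne, bits=4):
    # Quantize each 8-bit neighbor to 'bits' bits and pack them into one index.
    shift = 8 - bits
    ctx = 0
    for value in (w, n, nw, ne):
        ctx = (ctx << bits) | (value >> shift)
    return ctx

# Four 8-bit neighbor values map to one of 16^4 = 65536 contexts (cf. Table 2.3).
print(context_number(w=200, n=197, nw=190, ne=35))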
Table 2.3: The number of contexts as a function of the number of pixels in the template.

Pixels within the template    No. of contexts    No. of contexts if quantized to 4 bits
1                             256                16
2                             65536              256
3                             16×10^6            4096
4                             4×10^9             65536
Predictive modelling:
Predictive modelling consists of the following three components:
· Prediction of the current pixel value.
· Calculating the prediction error.
· Modelling the error distribution.
The value of the current pixel x is predicted on the basis of the pixels that have already been
coded (and thus seen by the decoder too). Referring to the neighboring pixels as in Figure 2.4,
a possible predictor could be:

\[ \hat{x} = \frac{x_W + x_N}{2} \tag{2.7} \]

The prediction error is the difference between the original and the predicted pixel values:

\[ e = x - \hat{x} \tag{2.8} \]

The prediction error is then coded instead of the original pixel value. The probability
distribution of the prediction errors is concentrated around zero, while errors of large
magnitude (positive or negative) rarely appear; thus the distribution resembles a Gaussian
(normal) distribution function whose only parameter is the variance of the distribution, see
Figure 2.5. Now even a static model can be applied to estimate the probabilities of the errors.
Moreover, the use of context modelling is no longer necessary. Methods of this kind
are sometimes referred to as differential pulse code modulation (DPCM).
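A minimal sketch of the prediction step: the prediction errors of a small image are formed with the predictor of equation (2.7), assuming (as one possible convention) that pixels outside the image are zero:

def prediction_errors(image):
    # image: list of rows of pixel values
    errors = []
    for i, row in enumerate(image):
        err_row = []
        for j, x in enumerate(row):
            west = row[j - 1] if j > 0 else 0
            north = image[i - 1][j] if i > 0 else 0
            predicted = (west + north) // 2       # equation (2.7)
            err_row.append(x - predicted)         # equation (2.8)
        errors.append(err_row)
    return errors

image = [[10, 12, 11],
         [11, 13, 12]]
print(prediction_errors(image))   # [[10, 7, 5], [6, 2, 0]]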
Figure 2.5: Probability distribution function of the prediction errors.
2.2 Coding
As stated earlier, coding is considered the easy part of compression. Good coding
methods have been known for decades, e.g. Huffman coding (1952). Moreover, arithmetic
coding (1979) is well known to be optimal (corresponding to the model). Let us next study
these methods.
2.2.1 Huffman coding
The Huffman algorithm creates a code tree (called the Huffman tree) on the basis of the probability
distribution. The algorithm starts by creating for each symbol a leaf node containing the
symbol and its probability, see Figure 2.6a. The two nodes with the smallest probabilities
become siblings under a parent node, which is given a probability equal to the sum of its two
children's probabilities, see Figure 2.6b. The combining operation is repeated, choosing the
two nodes with the smallest probabilities, and ignoring nodes that are already children. For
example, at the next step the new node formed by combining a and b is joined with the node
for c to make a new node with probability p=0.2. The process continues until there is only one
node without a parent, which becomes the root of the tree, see Figure 2.6c.
The two branches from every non-leaf node are next labelled 0 and 1 (the order is not
important). The actual codes of the symbols are obtained by traversing the tree from the root
to the corresponding leaf nodes representing the symbols; the codeword is the concatenation of
the branch labels along the path from the root to the leaf node, see Table 2.4.
[Figure: the leaf nodes a, b, c, d, e, f, g with probabilities 0.05, 0.05, 0.1, 0.2, 0.3, 0.2, 0.1; the two smallest nodes are combined repeatedly, producing internal nodes 0.1, 0.2, 0.3, 0.4, 0.6, until the complete tree with root probability 1.0 is obtained, and the branches are labelled 0 and 1.]
Figure 2.6: Constructing the Huffman tree: (a) leaf nodes; (b) combining nodes;
(c) the complete Huffman tree.
Table 2.4. Example of symbols, their probabilities, and the corresponding Huffman codes.
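The construction can be sketched in a few lines of Python; note that ties between equal probabilities and the labelling of the branches can be resolved in different ways, so the individual codewords may differ from one construction to another even though the average code length stays the same.

import heapq

def huffman_codes(probabilities):
    # Build the Huffman tree by repeatedly combining the two smallest nodes.
    # A node is either a symbol or a pair (left, right); the integer in the
    # middle of each heap entry is only a tie-breaker.
    heap = [(p, i, s) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p1, _, n1 = heapq.heappop(heap)
        p2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next_id, (n1, n2)))
        next_id += 1
    codes = {}
    def assign(node, prefix):
        if isinstance(node, tuple):               # internal node: branches 0 and 1
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    assign(heap[0][2], "")
    return codes

probs = {"a": 0.05, "b": 0.05, "c": 0.1, "d": 0.2, "e": 0.3, "f": 0.2, "g": 0.1}
codes = huffman_codes(probs)
print(codes)
print(sum(probs[s] * len(codes[s]) for s in probs))   # average code length, about 2.6 bits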
2.2.2 Arithmetic coding
Arithmetic coding is known to be an optimal coding method with respect to the model. Moreover,
it is extremely suitable for dynamic modelling, since there is no actual code table to be updated
as there is in Huffman coding. The deficiency of Huffman coding is emphasized in the case
of binary images. Consider binary images having the probability distribution of 0.99 for
a white pixel and 0.01 for a black pixel. The entropy of the white pixel is −log₂ 0.99 ≈ 0.015
bits. However, in Huffman coding the code length is bounded to at least 1 bit per symbol.
In fact, the only possible code for a binary alphabet is one bit for each
symbol, and thus no compression would be possible.

Let us next consider the fundamental properties of binary arithmetic. With n bits, at most 2^n
different combinations can be represented; in other words, with n bits the code interval between
zero and one can be divided into 2^n parts, each having the length 2^(−n), see Figure 2.7. Let us
assume that A is a power of ½, that is, A = 2^(−n). From the opposite point of view, an interval of
length A can be coded using −log₂ A bits (assuming A is a power of ½).
[Figure: the interval [0, 1] divided into eight equal parts of length 0.125; the parts correspond to the 3-bit codes 000, 001, ..., 111, while the halves and quarters correspond to the shorter codes 0, 1 and 00, 01, 10, 11.]

Figure 2.7: The interval [0, 1] is divided into 8 parts, each having the length 2^(−3) = 0.125.
Each part can then be coded by using −log₂ 0.125 = 3 bits.
Arithmetic coding starts by dividing the interval into subintervals according to the probability
distribution of the source. The length of each subinterval equals the probability of the
corresponding symbol. Thus, the sum of the lengths of all subintervals equals 1, filling the
range [0, 1] completely. Consider the probability model of Table 2.1. Now, the interval is
divided into the subintervals [0.0, 0.7], [0.7, 0.9], and [0.9, 1.0] corresponding to the symbols
a, b, and c, see Figure 2.8.
The first symbol of the sequence (a, a, b, ...) to be coded is 'a'. The coding proceeds by taking the
subinterval of the symbol 'a', that is [0, 0.7]. This interval is then again split into three
subintervals so that the length of each subinterval is proportional to its probability. For example,
the subinterval for the symbol 'a' is now 0.7 × [0, 0.7] = [0, 0.49]; the subinterval for the symbol 'b'
has the length 0.2 × 0.7 and starts where the first part ends, giving [0.49, 0.63]. The last subinterval,
for the symbol 'c', is [0.63, 0.7]. The next symbol to be coded is 'a', so the next interval will be [0, 0.49].
[Figure: the interval [0, 1] is first divided at 0.7 and 0.9; coding 'a' selects [0, 0.7], the second 'a' selects [0, 0.49], and 'b' then selects [0.343, 0.441].]
Figure 2.8: Example of arithmetic coding of sequence 'aab' using the model of Table 2.1.
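The interval subdivision of Figure 2.8 can be traced with a small sketch using the semi-adaptive model of Table 2.1 (a: 0.7, b: 0.2, c: 0.1):

def arithmetic_intervals(sequence, model):
    # model: list of (symbol, probability) pairs in a fixed order
    low, high = 0.0, 1.0
    for s in sequence:
        width = high - low
        start = low
        for symbol, p in model:
            if symbol == s:
                low, high = start, start + p * width
                break
            start += p * width
        print(s, round(low, 3), round(high, 3))
    return low, high

model = [("a", 0.7), ("b", 0.2), ("c", 0.1)]
arithmetic_intervals("aab", model)
# a 0.0 0.7
# a 0.0 0.49
# b 0.343 0.441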
The process is repeated for each symbol to be coded, resulting in a smaller and smaller
interval. The final interval identifies the source uniquely. The length of this interval is the
product of the probabilities of the coded symbols:

\[ A_{final} = p_1 \cdot p_2 \cdot p_3 \cdots p_n = \prod_{i=1}^{n} p_i \tag{2.9} \]
Due to the previous discussion, this interval can be coded using

\[ C(A) = -\log_2 \prod_{i=1}^{n} p_i = -\sum_{i=1}^{n} \log_2 p_i \tag{2.10} \]

bits (assuming A is a power of ½). If the same model is applied to each symbol to be coded,
the code length can be expressed with respect to the source alphabet:

\[ C(A) = -\sum_{i=1}^{m} p_i \cdot \log_2 p_i \tag{2.11} \]

where m is the number of symbols in the alphabet, and p_i is the probability of that particular
symbol in the alphabet. The important observation is that C(A) equals the entropy!
This means that the source can be coded optimally if A is a power of ½. This is the case when the
length of the source approaches infinity. In practice, arithmetic coding is optimal even for
rather short sources.
Optimality of arithmetic coding:
The length of the final interval is not exactly a power of ½, as was assumed above. The final
interval, however, can be approximated by any of its subintervals that meets the requirement
A′ = 2^(−n). The approximation can thus be bounded by

\[ \frac{A}{2} \le A' \le A \tag{2.12} \]

yielding the upper bound of the code length:

\[ C(A') = -\log_2 A' \le -\log_2 \frac{A}{2} = -\log_2 A + 1 \tag{2.13} \]

The upper bound for the coding deficiency thus is 1 bit for the entire file. Note that Huffman
coding can be simulated by arithmetic coding, where each subinterval division is restricted to
be a power of ½. There are no such restrictions in arithmetic coding (except for the final
interval), which is the reason why arithmetic coding is optimal, in contrast to Huffman
coding. The code length of Huffman coding has been shown to be bounded by

\[ C(A) \le H + p_{s1} + \log_2\!\left(\frac{2 \log_2 e}{e}\right) = H + p_{s1} + 0.086 \quad \text{(bits per symbol)} \tag{2.14} \]
where p_{s1} is the probability of the most probable symbol in the alphabet. In other words,
the relative performance of Huffman coding (with respect to the entropy H) is the better, the
smaller the probability of the most probable symbol s1 is. This is often the case for a multi-symbol
alphabet; in binary images, however, the probability distribution is often very skewed: it
is not rare that the probability of the white pixel is as high as 0.99.

In principle, the problem with the skewed distribution of a binary alphabet can be avoided by
blocking the pixels, for example by constructing a new symbol from 8 subsequent pixels in the
image. The redefined alphabet thus consists of all the 256 possible pixel combinations.
However, the probability of the most probable symbol (8 white pixels) is still too high, namely
0.99^8 = 0.92. Moreover, the number of combinations increases exponentially with the number
of pixels in the block.
Implementation aspects of arithmetic coding:
Two variables are needed to describe the interval; A is the length of the interval, and C is the
lower bound. The interval, however, very soon becomes so small that it cannot be expressed by
a 16- or 32-bit integer in computer memory. The following procedure is thus applied. When the
interval falls completely below the half point (0.5), it is known that the codeword describing
the final interval starts with the bit 0. If the interval were above the half point, the codeword
would start with the bit 1. In both cases, the starting bit can be outputted, and the processing
can then be limited to the corresponding half of the full scale, which is either [0, 0.5], or
[0.5, 1]. This is realized by zooming the corresponding half as shown in Figure 2.9.
[Figure: the interval [0.2, 0.3], lying completely in the lower half of the scale, is zoomed to [0.4, 0.6].]

Figure 2.9: Example of half-point zooming.
Underflow can also occur if the interval decreases so that its lower bound is just below the
half point but the upper bound is still above it. In this case the half-point zooming cannot be
applied. The solution is the so-called quarter-point zooming, see Figure 2.10. The condition
for quarter-point zooming is that the lower bound of the interval exceeds 0.25 and the upper
bound does not exceed 0.75. Now it is known that the following bit stream is either "01xxx" if
the final interval is below the half point, or "10xxx" if the final interval is above the half point
(here xxx refers to the rest of the code stream). In general, it can be shown that if the next bit due to
a half-point zooming is b, it is followed by as many opposite bits of b as there were
quarter-point zoomings before that half-point zooming.

Since the final interval completely covers either the range [0.25, 0.5] or the range [0.5, 0.75],
the encoding can be finished by sending the bit pair "01" if the upper bound is below 0.75, or
"10" if the lower bound exceeds 0.25.

Figure 2.10: Example of two subsequent quarter-point zoomings.
QM-coder:
The QM-coder is an implementation of arithmetic coding which has been specially tailored for
binary data. One of the primary aspects in its design has also been speed. The main
differences between the QM-coder and the arithmetic coding described in the previous section
are summarized as follows:
· The input alphabet of the QM-coder must be in binary form.
· To gain speed, all multiplications in the QM-coder have been eliminated.
· The QM-coder includes its own modelling procedures.

The fact that the QM-coder is a binary arithmetic coder does not exclude the possibility of having
a multi-alphabet source. The symbols just have to be coded one bit at a time, using a binary
decision tree. The probability of each symbol is then the product of the probabilities of the node
decisions.

In the QM-coder the multiplication operations have been replaced by fast approximations or by
shift-left operations in the following way. Denote the more probable symbol of the model by
MPS, and the less probable symbol by LPS; in other words, the MPS is always the symbol
which has the higher probability. The interval in the QM-coder is always divided so that the LPS
subinterval is above the MPS subinterval. If the interval is A and the LPS probability estimate
is Qe, the MPS probability estimate should ideally be (1−Qe). The lengths of the respective
subintervals are then A·Qe and A·(1−Qe). This ideal subdivision and symbol ordering is
shown in Figure 2.11.
[Figure: the interval of length A starting at base C; the MPS subinterval of length A·(1−Qe) lies at the bottom and the LPS subinterval of length A·Qe on top of it, the top of the interval being at C+A.]
Figure 2.11: Illustration of symbol ordering and ideal interval subdivision.
Instead of operating on the scale [0, 1], the QM-coder operates on the scale [0, 1.5]. Zooming
(or renormalization, as it is called in the QM-coder) is performed every time the length of the
interval falls below half of the scale, 0.75 (the details of the renormalization are bypassed here).
Thus the interval length is always in the range 0.75 ≤ A < 1.5. Since A is therefore always close
to 1, the following rough approximation can be made:

\[ A \cdot Qe \approx Qe \tag{2.15} \]

If we follow this scheme, coding a symbol changes the interval as follows:

After MPS:

C is unchanged
\[ A \leftarrow A \cdot (1 - Qe) = A - A \cdot Qe \approx A - Qe \tag{2.16} \]

After LPS:

\[ C \leftarrow C + A \cdot (1 - Qe) = C + A - A \cdot Qe \approx C + A - Qe \]
\[ A \leftarrow A \cdot Qe \approx Qe \tag{2.17} \]

Now all multiplications are eliminated, except those needed in the renormalization. However,
the renormalization involves only multiplications by two, which can be performed by a
bit-shift operation.
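The multiplication-free interval update can be sketched as follows. This is only an illustration of equations (2.16) and (2.17): the probability adaptation, the bit output, and the carry handling of the real QM-coder are all omitted.

def code_symbol(a, c, qe, is_mps):
    # Interval update with the approximation A*Qe ~ Qe (equations 2.16 and 2.17).
    if is_mps:
        a = a - qe                  # after MPS: C unchanged, A ~ A - Qe
    else:
        c = c + a - qe              # after LPS: C ~ C + A - Qe
        a = qe                      #            A ~ Qe
    while a < 0.75:                 # renormalization: double until A >= 0.75
        a *= 2                      # (in the real coder C is doubled as well and
        c *= 2                      #  the bits shifted out of C form the code stream)
    return a, c

a, c = 1.0, 0.0
for is_mps in (True, True, False, True):
    a, c = code_symbol(a, c, qe=0.1, is_mps=is_mps)
    print(round(a, 3), round(c, 3))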
The QM-coder also includes its own modelling procedures, which makes the separation between
modelling and coding slightly unconventional, see Figure 2.12. The modelling phase
determines the context to be used and the binary decision to be coded. The QM-coder then picks
up the corresponding probability, performs the actual coding, and updates the probability
distribution if necessary. The way the QM-coder handles the probabilities is based on a stochastic
algorithm (the details are omitted here). The method also adapts quickly to local variations in the
image. For details, see the "JPEG book" by Pennebaker and Mitchell [1993].

Figure 2.12: Differences between the optimal arithmetic coding (left) and
the integrated QM-coder (right).
2.2.3 Golomb-Rice codes
Golomb codes are a class of prefix codes which are suboptimal but very easy to implement.
Golomb codes are used to encode symbols from a countable alphabet. The symbols are
arranged in descending probability order, and non-negative integers are assigned to the
symbols, beginning with 0 for the most probable symbol, see Figure 2.13. To encode an integer
x, it is divided into two components, the most significant part x_M and the least significant
part x_L:

\[ x_M = \left\lfloor \frac{x}{m} \right\rfloor, \qquad x_L = x \bmod m \tag{2.18} \]

where m is the parameter of the Golomb coding. The values x_M and x_L are a complete
representation of x since:

\[ x = x_M \cdot m + x_L \tag{2.19} \]

The x_M is output using a unary code, and the x_L using a binary code (an adjusted
binary code is needed if m is not a power of 2), see Table 2.5 for an example.
Rice coding is the same as Golomb coding except that only a subset of the parameter values
may be used, namely the powers of 2. The Rice code with parameter k is exactly the same
as the Golomb code with parameter m = 2^k. The Rice codes are even simpler to implement,
since x_M can be computed by shifting x bitwise to the right k times, and x_L is computed by
masking out all but the k low-order bits. Sample Golomb and Rice code tables are shown in
Table 2.6.
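For the Rice case m = 2^k, the encoder reduces to bit operations. The sketch below writes x_M in unary and x_L as a k-bit binary number; the adjusted binary code needed for general Golomb parameters is omitted:

def rice_encode(x, k):
    # Rice code with parameter k (equal to the Golomb code with m = 2**k).
    m = 1 << k
    x_m = x >> k                    # most significant part, x // m
    x_l = x & (m - 1)               # least significant part, x mod m
    unary = "1" * x_m + "0"         # unary code of x_m (0 -> "0", 1 -> "10", ...)
    binary = format(x_l, "0{}b".format(k)) if k > 0 else ""
    return unary + binary

for x in range(9):
    print(x, rice_encode(x, k=2))
# matches the codewords of Table 2.5 with the two parts concatenated:
# 0 -> 000, 1 -> 001, ..., 4 -> 1000, ..., 8 -> 11000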
Figure 2.13: Probability distribution function assumed by Golomb and Rice codes.
Table 2.5. An example of the Golomb coding with the parameter m=4.
x xM xL Code of xM Code of xL
0 0 0 0 00
1 0 1 0 01
2 0 2 0 10
3 0 3 0 11
4 1 0 10 00
5 1 1 10 01
6 1 2 10 10
7 1 3 10 11
8 2 0 110 00
: : : : :
Table 2.6. Golomb and Rice codes for the parameters m=1 to 5.
3 Binary images
Binary images represent the simplest and most space-economic form of images and are of
great interest when colors or gray scales are not needed. They consist of only two colors,
black and white. The probability distribution of this input alphabet is often very skewed, e.g.
p(white)=98 % and p(black)=2 %. Moreover, the images usually have large homogeneous
areas of the same color. These properties can be taken advantage of in the compression of
binary images.
3.1 Run-length coding
The idea of run-length coding is to represent each line of the image as a sequence

C1, n1, C2, n2, C3, n3, ...   (3.1)

where Ci is the code for the color of the i'th run, and ni is the code for the length of the run.
In binary images there are only two colors, so a black run is always followed by a white run,
and vice versa. It is therefore sufficient to code only the lengths of the runs; no color
information is needed. The first run on each line is assumed to be white; if the first pixel
happens to be black, a white run of zero length is coded.

The run-length "coding" method is purely a modelling scheme, resulting in a new alphabet
that consists of the lengths of the runs. These can be coded, for example, by using the Huffman
code given in Table 3.1. Separate code tables are used for the black and the white runs.
The code table contains two types of codewords: terminating codewords (TC) and make-up
codewords (MUC). Runs between 0 and 63 are coded using a single terminating codeword.
Runs between 64 and 1728 are coded by a MUC followed by a TC. The MUC represents
a run-length value of 64×M (where M is an integer between 1 and 27) which is equal to, or
shorter than, the run to be coded. The following TC specifies the difference
between the MUC and the actual value of the run to be coded. See Figure 3.1 for an example
of run-length coding using the code table of Table 3.1.
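The modelling step itself is simple; the following sketch turns one line of pixels (0 = white, 1 = black) into the run lengths that would then be coded with the codewords of Table 3.1:

def run_lengths(row):
    # row: list of pixels, 0 = white, 1 = black
    runs = []
    color = 0                       # the first run is assumed to be white
    length = 0
    for pixel in row:
        if pixel == color:
            length += 1
        else:
            runs.append(length)     # a zero-length white run if the line starts black
            color = pixel
            length = 1
    runs.append(length)
    return runs

print(run_lengths([0, 0, 0, 1, 1, 0, 0, 0, 0, 1]))   # [3, 2, 4, 1]
print(run_lengths([1, 1, 0, 0]))                      # [0, 2, 2]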
Figure 3.1: Example of onedimensional runlength coding.
Vector run-length coding:
The run-length coding efficiently codes large uniform areas in the image, even though the
two-dimensional correlations are ignored. The idea of run-length coding can also be
applied two-dimensionally, so that the runs consist of m×n-sized blocks of pixels instead of
single pixels. In this vector run-length coding the pixel combination of each run has to be
coded in addition to the length of the run. Wang and Wu [1992] reported up to 60 %
improvement in the compression ratio when using 8×4 and 4×4 block sizes. They also used
blocks of 4×1 and 8×1 with a slightly smaller improvement in the compression ratio.

Predictive run-length coding:
The performance of run-length coding can be improved by using a prediction technique as
a preprocessing stage (see also Section 2.1). The idea is to form a so-called error image from
the original one by comparing the value of each original pixel to the value given by
a prediction function. If these two are equal, the pixel of the error image is white; otherwise it
is black. The run-length coding is then applied to the error image instead of the original one.
Benefit is gained from the increased number of white pixels; thus longer white runs will be
obtained.
The prediction is based on the values of certain (fixed) neighboring pixels. These pixels have
already been encoded and are therefore known to the decoder. The prediction is thus identical
in the encoding and decoding phases. The image is scanned in row-major order and the value
of each pixel is predicted from the particular observed combination of the neighboring pixels,
see Figure 3.2. The frequency of a correct prediction varies from 61.4 to 99.8 % depending on
the context; the completely white context predicts a white pixel with a very high probability,
and a context with two white and two black pixels usually gives only an uncertain prediction.
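The error-image formation can be sketched as follows. The actual prediction values of Figure 3.2 are not reproduced here; as a stand-in, the sketch predicts the majority color of the four neighbors (white on a tie), which is an assumed rule for illustration only:

def error_image(image):
    # image: list of rows, 0 = white, 1 = black; pixels outside the image
    # are treated as white.
    h, w = len(image), len(image[0])
    def pixel(i, j):
        return image[i][j] if 0 <= i < h and 0 <= j < w else 0
    error = []
    for i in range(h):
        row = []
        for j in range(w):
            neighbors = [pixel(i, j - 1), pixel(i - 1, j - 1),
                         pixel(i - 1, j), pixel(i - 1, j + 1)]   # W, NW, N, NE
            predicted = 1 if sum(neighbors) > 2 else 0           # majority vote, tie -> white
            row.append(0 if predicted == image[i][j] else 1)     # correct prediction -> white
        error.append(row)
    return error

print(error_image([[0, 1, 1],
                   [0, 1, 1]]))    # [[0, 1, 1], [0, 1, 0]]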
The prediction technique increases the proportion of white pixels from 94 % to 98 % for a set
of test images; thus the number of black pixels is only one third of that in the original
image. An improvement of 30 % in the compression ratio was reported by Netravali and
Mounts [1980], and up to 80 % with the inclusion of the so-called reordering technique.
Table 3.1: Huffman code table for the run-lengths.
terminating codewords
n white runs black runs n white runs black runs
0 00110101 0000110111 32 00011011 000001101010
1 000111 010 33 00010010 000001101011
2 0111 11 34 00010011 000011010010
3 1000 10 35 00010100 000011010011
4 1011 011 36 00010101 000011010100
5 1100 0011 37 00010110 000011010101
6 1110 0010 38 00010111 000011010110
7 1111 00011 39 00101000 000011010111
8 10011 000101 40 00101001 000001101100
9 10100 000100 41 00101010 000001101101
10 00111 0000100 42 00101011 000011011010
11 01000 0000101 43 00101100 000011011011
12 001000 0000111 44 00101101 000001010100
13 000011 00000100 45 00000100 000001010101
14 110100 00000111 46 00000101 000001010110
15 110101 000011000 47 00001010 000001010111
16 101010 0000010111 48 00001011 000001100100
17 101011 0000011000 49 01010010 000001100101
18 0100111 0000001000 50 01010011 000001010010
19 0001100 00001100111 51 01010100 000001010011
20 0001000 00001101000 52 01010101 000000100100
21 0010111 00001101100 53 00100100 000000110111
22 0000011 00000110111 54 00100101 000000111000
23 0000100 00000101000 55 01011000 000000100111
24 0101000 00000010111 56 01011001 000000101000
25 0101011 00000011000 57 01011010 000001011000
26 0010011 000011001010 58 01011011 000001011001
27 0100100 000011001011 59 01001010 000000101011
28 0011000 000011001100 60 01001011 000000101100
29 00000010 000011001101 61 00110010 000001011010
30 00000011 000001101000 62 00110011 000001100110
31 00011010 000001101001 63 00110100 000001100111
makeup codewords
n white runs black runs n white runs black runs
64 11011 0000001111 960 011010100 0000001110011
128 10010 000011001000 1024 011010101 0000001110100
192 010111 000011001001 1088 011010110 0000001110101
256 0110111 000001011011 1152 011010111 0000001110110
320 00110110 000000110011 1216 011011000 0000001110111
384 00110111 000000110100 1280 011011001 0000001010010
448 01100100 000000110101 1344 011011010 0000001010011
512 01100101 0000001101100 1408 011011011 0000001010100
576 01101000 0000001101101 1472 010011000 0000001010101
640 01100111 0000001001010 1536 010011001 0000001011010
704 011001100 0000001001011 1600 010011010 0000001011011
768 011001101 0000001001100 1664 011000 0000001100100
832 011010010 0000001001101 1728 010011011 0000001100101
896 011010011 0000001110010 EOL 0000000000001 000000000001
[Figure: for the 16 contexts, the probability of a correct prediction ranges from 61.41 % to 99.76 %.]

Figure 3.2: A four-pixel prediction function. The various prediction contexts (pixel
combinations) are given in the left column; the corresponding prediction value in the middle;
and the probability of a correct prediction in the rightmost column.
3.2 READ code
Instead of the lengths of the runs, one can code the location of the boundaries of the runs (the
black/white transitions) relative to the boundaries of the previous row. This is the basic idea in
the method called relative element address designate (READ). The READ code includes three
coding modes:
· Vertical mode
· Horizontal mode
· Pass mode
In vertical mode the position of each color change (white to black or black to white) in the
current line is coded with respect to a nearby change position (of the same color) on the
reference line, if one exists. "Nearby" is taken to mean within three pixels, so the vertical
mode can take on one of seven values: 3, 2, 1, 0, +1, +2, +3. If there is no nearby change
position on the reference line, onedimensional runlength coding called horizontal mode is
used. A third condition is when the reference line contains a run that has no counterpart in the
current line; then a special pass code is sent to signal to the receiver that the next complete run
of the opposite color in the reference line should be skipped. The corresponding codewords
for each coding mode are given in Table 3.2.
Figure 3.3 shows an example of coding in which the second line of pixels (the current line)
is transformed into the bit stream at the bottom. Black spots mark the changing pixels that are
to be coded. Both endpoints of the first run of black pixels are coded in vertical mode,
because that run corresponds closely with one in the reference line above. In vertical mode,
each offset is coded independently according to a predetermined scheme for the possible
values. The beginning point of the second run of black pixels has no counterpart in the
reference line, so it is coded in horizontal mode. Whereas vertical mode is used for coding
individual changepoints, horizontal mode works with pairs of changepoints. Horizontal
mode codes have three parts: a flag indicating the mode, a value representing the length of the
preceding white run, and another representing the length of the black run. The second run of
black pixels in the reference line must be "passed", for it has no counterpart in the current
line, so the pass code is emitted. Both endpoints of the next run are coded in vertical mode,
and the final run is coded in horizontal mode. Note that because horizontal mode codes pairs
of points, the final changepoint shown is coded in horizontal mode even though it is within
the 3-pixel range of the vertical mode.
Table 3.2. Code table for the READ code. Symbol wl refers to the length of the white run and
bl to the length of the black run; Hw() and Hb() refer to the Huffman codes of Table 3.1.
Mode:           Codeword:
Pass            0001
Horizontal      001 + Hw(wl) + Hb(bl)
Vertical  +3    0000011
          +2    000011
          +1    011
           0    1
          −1    010
          −2    000010
          −3    0000010
[Figure: a reference line and the current line, with the changing pixels marked. The current line is coded as: vertical mode −1 and 0 (codes 010 and 1); horizontal mode, 3 white and 4 black (001 1000 011); pass code (0001); vertical mode +2 and −2 (000011 and 000010); horizontal mode, 4 white and 7 black (001 1011 00011).]
Figure 3.3: Example of the twodimensional READ code.
3.3 CCITT group 3 and group 4 standards
The RLE and READ algorithms are included in two image compression standards, known as
CCITT (Consultative Committee for International Telegraphy and Telephone) Group 3 (G3)
and Group 4 (G4). They are nowadays widely used in fax machines.
The CCITT standard also specifies details like paper size (A4) and scanning resolution. The
two optional resolutions of the image are specified as 1728×1188 pixels per A4 page
(200×100 dpi) in the low resolution, and 1728×2376 (200×200 dpi) in the high resolution.
The G3 specification covers binary documents only, although G4 does include provision for
optional grayscale and color images.

In the G3 standard every k'th line of the image is coded by the 1-dimensional RLE method
(also referred to as Modified Huffman), and the 2-dimensional READ code (more accurately
referred to as Modified READ) is applied to the rest of the lines. In G3, the parameter k is
set to 2 for the low-resolution and to 4 for the high-resolution images. In G4, k is set to
infinity so that every line of the image is coded by the READ code. An all-white reference line is
assumed at the beginning.
3.4 Block coding
The idea in block coding, as presented by Kunt and Johnsen [1980], is to divide the image
into blocks of pixels. A totally white block (all-white block) is coded by a single 0-bit. All
other blocks (non-white blocks) thus contain at least one black pixel; they are coded with
a 1-bit as a prefix, followed by the contents of the block, bit by bit in row-major order, see
Figure 3.4.
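A sketch of this basic scheme for one binary image: each b×b block produces either a single '0' or a '1' followed by the block contents; the image dimensions are assumed to be multiples of the block size.

def block_code(image, b):
    # image: list of rows, 0 = white, 1 = black
    bits = []
    for i in range(0, len(image), b):
        for j in range(0, len(image[0]), b):
            block = [image[i + di][j + dj] for di in range(b) for dj in range(b)]
            if any(block):
                bits.append("1" + "".join(map(str, block)))   # non-white block
            else:
                bits.append("0")                              # all-white block
    return " ".join(bits)

image = [[0, 0, 0, 1],
         [0, 0, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
print(block_code(image, 2))   # 0 10100 0 11111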
The block coding can be extended so that all-black blocks are considered as well. See Table 3.2
for the codewords of the extended block coding. The number of uniform blocks (all-white and
all-black blocks) depends on the size of the block, see Figure 3.5. The larger the block size,
the more efficiently the uniform blocks can be coded (in bits per pixel), but the fewer
uniform blocks there are to be taken advantage of.

The power of block coding can be improved by coding the bit patterns of the 2×2 blocks by
Huffman coding. Because the frequency distributions of these patterns are quite similar for
typical facsimile images, a static Huffman code can be applied. The Huffman coding gives an
improvement of ca. 10 % in comparison to the basic hierarchical block coding. Another
improvement is the use of a prediction technique in the same manner as with the run-length
coding. The application of the prediction function of Figure 3.2 gives an improvement of
ca. 30 % in the compression ratio.
Figure 3.4: Basic block coding technique. Coding of four sample blocks.
[Figure content: four panels (images 1, 3, 4 and "Black hand") plotting the proportions of the different block types (0 to 100 %) as a function of the block size, from 4 to 32 pixels.]
Figure 3.5: Number of block types as a function of block size.
Table 3.2. Codewords in the block coding method. Here xxxx refers to the content
of the block, pixel by pixel.
[Figure content: a sample image to be compressed, its hierarchical decomposition into blocks, and the resulting code bits 1 0111 0011 0111 1000 0111 1111 1111 0101 1010 1100.]
Figure 3.6: Example of hierarchical block coding.
3.5 JBIG
JBIG (Joint Bi-level Image Experts Group) is the newest binary image compression standard
by CCITT and ISO2. It is based on context-based compression, where the image is compressed
pixel by pixel. The combination of the neighboring pixel values (given by the template)
defines the context, and in each context the probability distribution of the black and white
pixels is adaptively determined on the basis of the already coded pixel samples. The pixels
are then coded by arithmetic coding according to their probabilities. The arithmetic coding
component in JBIG is the QM-coder.
2 International Standards Organization
Binary images are a favorable source for context-based compression, since even a relatively
large number of pixels in the template results in a reasonably small number of contexts. The
templates included in JBIG are shown in Figure 3.7. The number of contexts in a 7-pixel
template is 2^7 = 128, and in the 10-pixel model it is 2^10 = 1024. (Note that a typical binary image
of 1728×1188 pixels consists of over 2 million pixels.) The larger the template, the more
accurate a probability model it is possible to obtain. However, with a large template the
adaptation to the image takes longer; thus the size of the template cannot be arbitrarily large.
The optimal context size for the CCITT test image set is illustrated in Figure 3.8.
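The principle of the context formation can be illustrated with the following Python sketch. The template offsets are only illustrative (the actual JBIG templates differ), and the QM-coder itself is omitted; the sketch merely shows how a context number is packed from the template pixels and how per-context statistics are gathered adaptively.

```python
# Illustrative 7-pixel template (row, column offsets) of already coded pixels.
TEMPLATE = [(-1, -2), (-1, -1), (-1, 0), (-1, 1), (-1, 2), (0, -2), (0, -1)]

def context_index(image, y, x):
    """Pack the template pixels (out-of-image pixels read as white/0)
    into a 7-bit context number."""
    ctx = 0
    for dy, dx in TEMPLATE:
        yy, xx = y + dy, x + dx
        bit = image[yy][xx] if 0 <= yy < len(image) and 0 <= xx < len(image[0]) else 0
        ctx = (ctx << 1) | bit
    return ctx

def estimate_probabilities(image):
    """Count black/white occurrences in each of the 2^7 contexts; an arithmetic
    coder would code each pixel with the current estimate of its context."""
    counts = [[1, 1] for _ in range(1 << len(TEMPLATE))]   # Laplace-smoothed counts
    for y in range(len(image)):
        for x in range(len(image[0])):
            ctx = context_index(image, y, x)
            p_black = counts[ctx][1] / sum(counts[ctx])    # estimate used for coding
            counts[ctx][image[y][x]] += 1                  # update after coding the pixel
    return counts
```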
[Figure content: the template layouts; '?' marks the pixel to be coded and the remaining marked positions are the pixels within the template.]
Figure 3.7: JBIG sequential model templates.
[Figure content: compression ratio (0 to 26) as a function of the number of pixels in the context template (1 to 16), for an optimal template and for the standard template.]
Figure 3.8: Sample compression ratios for contextbased compression.
JBIG includes two modes of processing: sequential and progressive. The sequential mode is
the traditional row-major order processing. In the progressive mode, a reduced resolution
version of the image (referred to as the starting layer) is compressed first, followed by the second
layer, and so on, see Figure 3.9. The lowest resolution is either 12.5 dpi (dots per inch) or 25
dpi. In each case the resolution is doubled for the next layer. In this progressive mode, the
pixels in the previous layer can be included in the context template when coding the next
layer. In JBIG, four such pixels are included, see Figure 3.10. Note that there are four
variations (phases 0, 1, 2, and 3) of the same basic template model depending on the position
of the current pixel.
[Figure content: the image layers from the lowest resolution to the highest resolution.]
Figure 3.9: Different layers of JBIG.
[Figure content: the four progressive-mode template layouts, phases 0, 1, 2 and 3; '?' marks the pixel to be coded.]
Figure 3.10: JBIG progressive model templates.
Resolution reduction in JBIG:
In the compression phase, the lower resolution versions are calculated on the basis of the next
layer. The obvious way to halve the resolution is to group the pixels into 2×2 blocks and take
the color of the majority of these four pixels. Unfortunately, with binary images it is not clear
what to do when two of the pixels are black (1) and the other two are white (0). Consistently
rounding up or down tends to wash out the image very quickly. Another possibility is to
round the value in a random direction each time, but this adds considerable noise to the
image, particularly at the lower resolutions.
In JBIG, the low resolution pixel is instead obtained by thresholding a weighted sum of the
surrounding high resolution pixels and of the already computed low resolution neighbors; the
participating pixels and their weights are shown in Figure 3.11. This method preserves the
overall grayness of the image. However, problems occur with lines
and edges, because these deteriorate very rapidly. To address this problem, a number of
exception patterns can be defined which, when they are encountered, reverse the polarity of
the pixel that is obtained from thresholding the weighted sum as described above. An example
of such an exception pattern is shown in Figure 3.12.
[Figure content: the participating pixels and their weights; the 3×3 neighborhood of high resolution pixels carries the weights 1 2 1 / 2 4 2 / 1 2 1, together with weights for the already coded low resolution neighbors.]
Figure 3.11: Resolution reduction in JBIG: participating pixels marked (left); pixel weights (right).
Figure 3.12: Example of an exception pattern for resolution reduction.
432×594    216×297    108×148    54×74
Figure 3.13: Example of JBIG resolution reduction for CCITT image 5.
3.6 JBIG2
The emerging standard JBIG2 enhances the compression of text images using a pattern
matching technique. The standard will have two encoding methods: pattern matching &
substitution (PM&S), and soft pattern matching (SPM). The image is segmented into pixel
blocks containing connected black pixels using any segmentation technique. The contents of
the blocks are matched to the library symbols. If an acceptable match (within a given error
margin) is found, the index of the matching symbol is encoded. If no acceptable match is
found, the original bitmap is coded by a JBIG-style compressor. The compressed file consists
of the bitmaps of the library symbols, the locations of the extracted blocks as offsets, and the contents
of the pixel blocks.
The PM&S encoding mode performs lossy substitution of the input block by the bitmap of the
matched character. This requires a very safe and intelligent matching procedure to avoid
substitution errors. The SPM encoding mode, on the other hand, is lossless and is outlined in
Fig. 3.14. Instead of performing substitution, the content of the original block is also coded in
order to allow lossless compression. The content of the pixel block is coded using a JBIG-style
compressor with the difference that the bitmap of the matching symbol is used as
additional information in the context model. The method applies the two-layer context
template shown in Fig. 3.15. Four context pixels are taken from the input block and seven
from the bitmap of the matching dictionary symbol. The dictionary is built adaptively
during the compression by conditionally adding new pixel blocks into the dictionary. The
standard defines mainly the general file structure and the decoding procedure but leaves some
freedom in the design of the encoder.
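The control flow of the PM&S mode can be sketched as follows. The match criterion (Hamming distance between equal-sized bitmaps) and the acceptance threshold are illustrative choices only; the standard leaves them to the encoder, and the actual coding of indexes, offsets and bitmaps is abstracted away here.

```python
# Sketch of the pattern matching & substitution control flow.
def hamming(a, b):
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def pms_encode(blocks, threshold=3):
    """blocks: list of (offset, bitmap). Returns the coding decisions and the
    symbol library built adaptively during encoding."""
    library, output = [], []
    for offset, bitmap in blocks:
        best, best_dist = None, threshold + 1
        for index, symbol in enumerate(library):
            if len(symbol) == len(bitmap) and len(symbol[0]) == len(bitmap[0]):
                d = hamming(symbol, bitmap)
                if d < best_dist:
                    best, best_dist = index, d
        if best is not None:                         # acceptable match found
            output.append(("index", offset, best))
        else:                                        # code the bitmap itself
            output.append(("bitmap", offset, bitmap))
            library.append(bitmap)                   # conditionally extend the library
    return output, library
```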
[Figure content: flow diagram — segment the image into pixel blocks; for each block, search for an acceptable match; if a match is found, encode the index of the matching symbol, otherwise encode the bitmap by a JBIG-style compressor; encode the position of the block as an offset; repeat until the last block.]
Figure 3.14: Block diagram of JBIG2.
[Figure content: context pixels taken from the original image (left) and from the matching pixel block (right).]
Figure 3.15: Two-level context template for coding the pixel blocks.
3.7 Summary of binary image compression algorithms
Note that all of these methods are lossless. Figure 3.16 shows the compression efficiency of
several binary image compression algorithms for CCITT test image 3, and Figure 3.17 gives a
comparison of JBIG and JBIG2 for the set of CCITT test images.
[Figure content: bar chart of compression ratios (0 to 25) for COMPRESS, GZIP, PKZIP, BLOCK, RLE, 2D-RLE, ORLE, G3, G4 and JBIG; the visible values for the best methods include 17.9, 18.0, 18.9 and 23.3.]
Figure 3.16: Compression efficiency of several binary image compression algorithms
for CCITT test image 3.
[Figure content: compressed file sizes in bytes (0 to 60000) for CCITT test images 1 to 8, compressed by JBIG and by JBIG2.]
Figure 3.17: Comparison of JBIG and JBIG2 for the set of CCITT test images.
4 Continuous tone images
4.1 Lossless and near-lossless compression
4.1.1 Bit plane coding
The idea of bit plane coding is to apply binary image compression to the bit planes of the
grayscale image. The image is first divided into k separate bit planes, each representing
a binary image. These bit planes are then coded by any compression method designed for
binary images, e.g. context-based compression with arithmetic coding, as presented in Section
3.5. The bit planes of the most significant bits are the most compressible, while the bit planes
of the least significant bits are nearly random and thus mostly incompressible.
The compressibility of the bit planes can be improved by converting the binary code (BC) of
the pixel values into Gray code (GC) before the separation into bit planes:

GC = BC ⊕ (BC >> 1)    (4.1)

where ⊕ denotes the bitwise exclusive OR operation, and >> denotes the bitwise logical right-shift
operation. Note that the ith GC bit plane is constructed by performing an exclusive OR on the
ith and (i+1)th BC bit planes. The most significant bit planes of the binary and the Gray codes
are identical. Figures 4.2 and 4.3 show the bit planes of binary and Gray codes for the test
image Camera.
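A minimal sketch of the Gray code conversion and the bit plane separation (using NumPy) is given below; each resulting plane can then be fed to a binary image compressor such as the context-based coder of Section 3.5.

```python
import numpy as np

def gray_code(image):
    """Convert 8-bit pixel values from binary code to Gray code: GC = BC xor (BC >> 1)."""
    image = np.asarray(image, dtype=np.uint8)
    return image ^ (image >> 1)

def bit_planes(image):
    """Split an 8-bit image into 8 binary images, plane 7 = most significant."""
    image = np.asarray(image, dtype=np.uint8)
    return [(image >> k) & 1 for k in range(7, -1, -1)]

# planes = bit_planes(gray_code(camera_image)) would give the Gray code planes.
```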
[Figure content: the four-bit binary code and Gray code bit patterns for the values 0 through 15.]
Figure 4.1: Illustration of four-bit binary and Gray codes.
Suppose that the bit planes are compressed most significant plane (MSP) first and least
significant plane (LSP) last. A small improvement in context-based compression is achieved if
the context template includes a few pixels from the previously coded bit plane. Typically the bits
included in the template are the bit of the same pixel that is to be coded, and possibly the one above it.
This kind of template is referred to as a 3-D template.
Figure 4.2: Binary and Gray code bit planes for test image Camera, bit planes 7 through 4.
Figure 4.3: Binary and Gray code bit planes for test image Camera, bit planes 3 through 0.
4.1.2 Lossless JPEG
Lossless JPEG (Joint Photographic Experts Group) processes the image pixel by pixel in row-major
order. The value of the current pixel is predicted on the basis of the neighboring pixels
that have already been coded (see Figure 2.4). The prediction functions available in JPEG are
given in Table 4.1. The prediction errors are coded either by Huffman coding or by arithmetic
coding. The Huffman code table of lossless JPEG is given in Table 4.2. Here one first
encodes the category of the prediction error, followed by the binary representation of the value
within the corresponding category, see Table 4.3 for an example.
The arithmetic coding component in JPEG is the QM-coder, which is a binary arithmetic coder.
The prediction errors are coded in the same manner as in the Huffman coding scheme: the
category value followed by the binary representation of the value within the category. Here the
category values are coded by a sequence of binary decisions as shown in Figure 4.4. If the prediction
error is not zero, the sign of the difference is coded after the "zero/nonzero" decision. Finally, the
value within the category is encoded bit by bit from the most significant bit to the least
significant bit. The probability modelling of the QM-coder ensures that the corresponding
binary decisions are encoded according to their corresponding probabilities. The details of the
context information involved in the scheme are omitted here.
[Figure content: binary decision tree in which each decision separates one category from the rest: category 0 = {0}, 1 = {1}, 2 = {2, 3}, 3 = {4, ..., 7}, ..., 7 = {64, ..., 127}, 8 = {128, ..., 255}.]
Figure 4.4: Binary decision tree for coding the categories.
Table 4.1: Predictors used in lossless JPEG.
Table 4.2: Huffman coding of the prediction errors.
Table 4.3: Example of lossless JPEG for the pixel sequence (10, 12, 10, 7, 8, 8, 12) when
using the prediction mode 1 (i.e. the predictor is the previous pixel value). The predictor for
the first pixel is zero.
Pixel:               10       12       10        7        8        8       12
Prediction error:   +10       +2       -2       -3       +1        0       +4
Category:             4        2        2        2        1        0        3
Bit sequence:   1011010    01110    01101    01100     0101       00   100100
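The category coding can be sketched as follows. The partial category codewords used below are read directly off the example of Table 4.3 (the full table is the one referred to as Table 4.2); the predictor is mode 1, i.e. the previous pixel value, with zero as the predictor of the first pixel.

```python
# Sketch of lossless-JPEG style coding of prediction errors: the category
# (number of magnitude bits) is entropy coded, followed by the value bits.
CATEGORY_CODE = {0: "00", 1: "010", 2: "011", 3: "100", 4: "101"}  # from Table 4.3

def category(error):
    """Category k contains the magnitudes 2^(k-1) .. 2^k - 1 (category 0 = error 0)."""
    return 0 if error == 0 else abs(error).bit_length()

def value_bits(error):
    """Magnitude bits for a positive error; their one's complement for a negative one."""
    k = category(error)
    if k == 0:
        return ""
    value = error if error > 0 else error + (1 << k) - 1
    return format(value, "0{}b".format(k))

def encode(pixels):
    bits, previous = [], 0
    for pixel in pixels:
        error = pixel - previous              # prediction mode 1: previous pixel value
        bits.append(CATEGORY_CODE[category(error)] + value_bits(error))
        previous = pixel
    return bits

print(encode([10, 12, 10, 7, 8, 8, 12]))
# -> ['1011010', '01110', '01101', '01100', '0101', '00', '100100'], as in Table 4.3
```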
4.1.3 FELICS
FELICS (Fast and Efficient Lossless Image Compression System) is a simple yet efficient
compression algorithm proposed by Howard and Vitter [1993]. The main idea is to avoid the
use of computationally demanding arithmetic coding, and instead use a simpler coding scheme
together with a clever modelling method.
FELICS uses the information of two adjacent pixels when coding the current one. These are
the one to the left of the current pixel, and the one above it. Denote the values of the
neighboring pixels by L and H so that L is the smaller one. The probabilities of the
pixel values obey the distribution given in Figure 4.5.
Figure 4.5: Probability distribution of the intensity values.
The coding scheme is as follows: a code bit indicates whether the actual pixel value P falls
into the range [L, H]. If so, an adjusted binary coding is applied. Here the hypothesis is that
the in-range values are uniformly distributed. Otherwise the above/below-range decision
requires another code bit, and the value is then coded by Rice coding with adaptive
k parameter selection.
Adjusted binary codes:
To encode an in-range pixel value P, the difference P−L must be encoded. Denote D = H−L,
thus the number of possible values in the range is D+1. If D+1 is a power of two, a binary
code with log2(D+1) bits is used. Otherwise the code is adjusted so that ⌊log2(D+1)⌋ bits
are assigned to some values, and ⌈log2(D+1)⌉ bits to others. Because the values near the
middle of the range are slightly more probable, the shorter codewords are assigned to those
values. For example, if D=4 there are five values (0, 1, 2, 3, 4) and their corresponding
adjusted binary codewords are (111, 10, 00, 01, 110).
Rice codes:
If the pixel value P exceeds the range [L, H], the difference P−(H+1) is coded using Rice
coding. According to the hypothesis of the distribution in Figure 4.5, the values have exponentially
decreasing probabilities as P increases. On the other hand, if P falls below the range
[L, H], the difference (L−1)−P is coded instead. The shapes of the distributions in the above and
below ranges are identical, thus the same Rice coding is applied. See Table 4.4 for a
summary of the FELICS code words, and Section 2.2.3 for the details of Rice coding.
For determining the Rice coding parameter k, the value D is used as a context. For each context D,
a cumulative total is maintained, for each reasonable Rice parameter value k, of the
code length that would have resulted if the parameter k had been used to encode all the values
encountered so far in the context. The parameter with the smallest cumulative code length is
used to encode the next value encountered in the context. The allowed parameter values are
k = 0, 1, 2, and 3.
Table 4.4: Code table of FELICS; B = adjusted binary coding, and R = Rice coding.
Pixel position:     Codeword:
Below range         10 + R(L−P−1)
In range            0 + B(P−L)
Above range         11 + R(P−H−1)
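The adjusted binary code can be sketched as follows. The codeword assignment below is one that reproduces the example given above for D = 4; the exact assignment used in FELICS may differ in detail.

```python
import math

def adjusted_binary(v, D):
    """Adjusted binary code for an in-range value v in 0..D (sketch)."""
    n = D + 1                       # number of possible in-range values
    if n == 1:
        return ""                   # nothing needs to be sent
    b = math.ceil(math.log2(n))     # long codeword length
    t = (1 << b) - n                # number of short, (b-1)-bit codewords
    # Order the values by closeness to the middle of the range,
    # breaking ties towards the larger value.
    order = sorted(range(n), key=lambda u: (abs(u - n // 2), -u))
    rank = order.index(v)
    if rank < t:
        return format(rank, "0{}b".format(b - 1))
    return format(rank + t, "0{}b".format(b))

print([adjusted_binary(v, 4) for v in range(5)])   # -> ['111', '10', '00', '01', '110']
```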
4.1.4 JPEG-LS
JPEG-LS is based on the LOCO-I algorithm (Weinberger et al., 1998). The method uses
the same ideas as lossless JPEG with the improvement of utilizing context modeling and
adaptive correction of the predictor. The coding component is changed to Golomb codes with
an adaptive choice of the skewness parameter. The main structure of JPEG-LS is shown in
Fig. 4.6. The modeling part can be broken into the following three components:
a. Prediction
b. Determination of the context.
c. Probability model for the prediction errors.
The fixed predictor uses the values of the neighboring pixels a (left), b (above) and c (above-left)
of the current pixel x:

x̂ = min(a, b)    if c ≥ max(a, b)
     max(a, b)    if c ≤ min(a, b)        (4.2)
     a + b − c    otherwise

The predictor tends to pick a in cases of a horizontal edge above the current location, and b
in cases where a vertical edge exists to the left of the current location. The third choice (a+b−c) is based
on the presumption that there is a smooth plane around the pixel and uses this estimate as
the prediction. The prediction residual ε = x − x̂ is then input to the context modeler, which
decides the appropriate statistical model to be used in the coding.
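The fixed predictor of (4.2) is easily written out; in the sketch below, a is the left neighbor, b the one above and c the one above-left of the current pixel.

```python
def med_predict(a, b, c):
    """Median edge detector of (4.2): a = left, b = above, c = above-left."""
    if c >= max(a, b):
        return min(a, b)      # an edge is detected; follow the smaller neighbor
    if c <= min(a, b):
        return max(a, b)      # an edge is detected; follow the larger neighbor
    return a + b - c          # smooth region: planar prediction

print(med_predict(50, 200, 200))   # -> 50 (prediction follows the left neighbor)
```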
[Figure content: block diagram — the modeler computes gradients from the image samples and selects between regular mode and run mode (flat region test); in regular mode a fixed predictor with adaptive correction and per-context statistics feeds the coder, while in run mode a run counter feeds the run coder; the output is the compressed bit stream.]
Figure 4.6: Block diagram of JPEGLS.
The context is determined from the gradients of the neighboring pixel values (q1 = d − b,
q2 = b − c, q3 = c − a), each quantized into 2T+1 regions. The total number of contexts is

C = ( (2T + 1)^3 + 1 ) / 2    (4.3)

where T is the number of nonzero regions on each side of zero. Using the default regions (T=4) this gives 365
models in total. Each context will have its own counters and statistical model as described
next.
The fixed predictor of (4.2) is fine-tuned by adaptive adjustment of the prediction value. An
optimal predictor would result in a prediction error of 0 on average, but this is not necessarily
the case. A technique called bias cancellation is used in JPEG-LS to detect and correct
systematic bias in the predictor. A correction value C’ is estimated based on the prediction
errors seen so far in the context, and then subtracted from the prediction value. The correction
value is estimated as the average prediction error:
C' = D / N    (4.4)

where D is the accumulated sum of prediction errors and N is the number of samples seen so far
in the context. The actual implementation is slightly different and the reader is recommended to look
for the details in the paper by Weinberger et al. (1998).
Before coding, the prediction errors are mapped to non-negative values:

M(ε) = 2ε         if ε ≥ 0
        −2ε − 1    if ε < 0        (4.5)

This mapping reorders the values into the interleaved sequence 0, −1, +1, −2, +2, and so on.
Golomb codes (or their special case, Rice codes) are then used to code the mapped values, see
Section 2.2.3. The only parameter of the code is k, which defines the skewness of the
distribution. In JPEG-LS, the value of k is adaptively determined as follows:
k = min{ k' | 2^k' · N ≥ A }    (4.6)
where A is the accumulated sum of magnitudes of the prediction errors (absolute values) seen
in the context so far. The appropriate Golomb code is then used as described by Weinberger
et al. (1998).
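The mapping (4.5), the parameter choice (4.6) and a plain Rice code for the mapped value can be sketched as follows; details such as the bias cancellation and the maintenance of the context counters are omitted.

```python
def map_error(e):
    """Interleave the errors 0, -1, +1, -2, +2, ... as 0, 1, 2, 3, 4, ... (eq. 4.5)."""
    return 2 * e if e >= 0 else -2 * e - 1

def choose_k(N, A):
    """Smallest k with 2^k * N >= A, where N = samples and A = sum of |errors| (eq. 4.6)."""
    k = 0
    while (N << k) < A:
        k += 1
    return k

def rice_code(m, k):
    """Unary-coded quotient, a '0' terminator, then k remainder bits."""
    quotient, remainder = m >> k, m - ((m >> k) << k)
    bits = "1" * quotient + "0"
    if k > 0:
        bits += format(remainder, "0{}b".format(k))
    return bits

# Example: a context that has seen N=10 samples with accumulated magnitude A=35.
k = choose_k(10, 35)                    # -> k = 2
print(k, rice_code(map_error(-3), k))   # -3 maps to 5, coded as "1001"
```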
JPEG-LS also has a “run mode” for coding flat regions. The run mode is activated when a flat
region is detected, i.e. a = b = c = d (or equivalently q1 = q2 = q3 = 0). The number of repeated
successful predictions (ε = 0) is then encoded by a similar Golomb coding of the run length. The
coder returns to the regular mode when coding the next unsuccessful prediction (ε ≠ 0).
This technique is also referred to as alphabet extension.
The standard also includes an optional near-lossless mode, in which every sample value in
a reconstructed image is guaranteed to differ from the original value by at most a (small) preset
amount δ. This mode is implemented simply by quantizing the prediction error as follows:

Q(ε) = sign(ε) · ⌊ (|ε| + δ) / (2δ + 1) ⌋    (4.7)
The quantized values must then be used also in the context modeling, and taken into account
in the coding and decoding steps.
4.1.5 Residual coding of lossy algorithms
An interesting approach to lossless coding is the combination of lossy and lossless methods.
First, a lossy coding method is applied to the image. Then a residual between the original and
the reconstructed image is calculated and compressed by any lossless compression algorithm.
This scheme can also be considered as a kind of progressive coding. The lossy image serves
as a rough version of the image which can be quickly retrieved. Then the complete image can
be retrieved by adding the residual and the lossy parts together.
4.1.6 Summary of the lossless grayscale compression
The compression efficiency of bit plane coding is rather close to the results of lossless
JPEG. In fact, coding the bit planes by JBIG (with a 3-D template) outperforms lossless
JPEG for images with a low number of bits per pixel in the original image, see Figure 4.7. For
example, the corresponding bit rates of lossless JPEG and of JBIG for the bit planes were
3.83 and 3.92 for a set of 8 bpp images (256 gray scales). On the other hand, for 2 bpp
images (4 gray scales) the corresponding bit rates were 0.34 (lossless JPEG) and 0.24 (JBIG
bit plane coding). The results of the bit plane based JBIG coding are better when the precision
of the image is 6 bpp or lower. Otherwise lossless JPEG gives slightly better compression
results.
Figure 4.7: Compression efficiency of lossless image compression algorithms
for test image Lena (512×512×8).
[Figure content: bit rate (0 to 4 bits) as a function of the image precision (2 to 8 bits per pixel) for JBIG bit plane coding and for lossless JPEG.]
Figure 4.8: Compression versus precision. JBIG refers to the bit plane coding, and JPEG to
the lossless JPEG. The results are for the JPEG set of test images.
4.2 Block truncation coding
The basic idea of block truncation coding (BTC) is to divide the image into 4×4 pixel blocks
and quantize the pixels of each block to two values, a and b. For each block, the mean value (x̄)
and the standard deviation (s) are calculated and encoded. Then two-level quantization is
performed for the pixels of the block so that a 0-bit is stored for the pixels with values smaller
than the mean, and the rest of the pixels are represented by a 1-bit. The image is reconstructed
at the decoding phase from x̄ and s, and from the bit plane, by assigning the value a to the
0-value pixels and b to the 1-value pixels:
a = x̄ − s · √( q / (m − q) )    (4.8)

b = x̄ + s · √( (m − q) / q )    (4.9)
where m (=16) is the total number of pixels in the block and q is the number of 1-bits in
the bit plane. The quantization level values are chosen so that the mean and variance of the
pixels in the block are preserved in the decompressed image, thus the method is also
referred to as moment preserving BTC. Another variant of BTC, called absolute moment BTC
(AMBTC), selects a and b as the mean values of the pixels within the two partitions:
a = 1/(m − q) · Σ_{xi < x̄} xi    (4.10)

b = 1/q · Σ_{xi ≥ x̄} xi    (4.11)
In the moment preserving BTC the quantization data is represented by the pair (x̄, s).
A drawback of this approach is that the quantization levels are calculated at the decoding
phase from the quantized values of (x̄, s), which contain rounding errors. Thus extra degradation
is caused by the coding phase. The other approach is to calculate the quantization levels (a, b)
already at the encoding phase and transmit them. In this way one can minimize both the
quantization error and the computation needed at the decoding phase.
The basic BTC algorithm does not consider how the quantization data (x̄, s) or (a, b) and
the bit plane should be coded, but simply represents the quantization data by 8+8 bits, and the
bit plane by 1 bit per pixel. Thus, the bit rate of BTC is (8 + 8 + m) / m bits per pixel (= 2.0 in
the case of 4×4 blocks).
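The AMBTC variant is particularly simple to sketch: for one block, the bit plane and the two quantization levels of (4.10) and (4.11) are computed as follows.

```python
import numpy as np

def ambtc_encode(block):
    """AMBTC for one block: bit plane plus the means of the two partitions."""
    block = np.asarray(block, dtype=float)
    mean = block.mean()
    bitplane = (block >= mean).astype(int)
    q, m = int(bitplane.sum()), block.size
    a = block[bitplane == 0].mean() if q < m else mean   # mean of the "low" pixels
    b = block[bitplane == 1].mean() if q > 0 else mean   # mean of the "high" pixels
    return a, b, bitplane

def ambtc_decode(a, b, bitplane):
    return np.where(bitplane == 1, b, a)

block = [[12, 14, 200, 210], [10, 13, 205, 215],
         [11, 12, 198, 220], [12, 15, 202, 212]]
a, b, plane = ambtc_encode(block)
print(round(a), round(b))          # the two quantization levels
print(ambtc_decode(a, b, plane))   # reconstructed block
```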
Original Bitplane Reconstructed
Figure 4.9: Example of the moment preserving BTC.
The major drawback of BTC is that it performs poorly in high contrast blocks, because two
quantization levels are not sufficient to describe these blocks. The problem can be attacked by
using variable block sizes. With large blocks one can decrease their total number and
therefore reduce the bit rate. On the other hand, small blocks improve the image quality.
One such approach is to apply quadtree decomposition. Here the image is segmented into
blocks of size m1×m1. If the standard deviation s of a block is less than a predefined threshold
sth (implying a low contrast block), the block is coded by a BTC algorithm. Otherwise it is
divided into four sub-blocks and the same process is repeated until the threshold criterion is
met, or the minimal block size (m2×m2) is reached. The hierarchy of the blocks is
represented by a quadtree structure.
The method can be further improved by compressing the bit plane, e.g. by vector quantization.
This combined algorithm is referred to as BTC-VQ. The pair (a, b), on the other hand, can be
compressed by forming two subsample images, one from the a-values and another from the b-values.
These can then be compressed by any image compression algorithm in the same
manner as the mean values in the mean/residual VQ.
All the previous ideas can be collected together to form a combined BTC algorithm. Let us
next examine one such possible combination. See Table 4.5 for the elements of the combined
method, referred to as HBTC-VQ. Variable block sizes are applied with the corresponding
minimum and maximum block sizes of 2×2 and 32×32. For a high quality of the compressed
image the use of 2×2 blocks is essential in the high contrast regions. The standard deviation (s) is
used as the threshold criterion and is set to 6 for all levels. (For the 4×4 level the threshold
value could be left as an adjusting parameter.) The bit plane is coded by VQ using a codebook
with 256 entries, thus the bit plane cost is 0.5 bpp for every block coded by VQ.
Two subsample images are formed, one from the a-values and another from the b-values of
the blocks. They are then coded by FELICS, see Section 4.1.3. The result of the combined
BTC algorithm is illustrated in Figure 4.10.
Table 4.5: Elements of the combined BTC algorithm.
Part: Method:
Quantization AMBTC
Coding of (a,b): FELICS
Coding of bitplane: VQ / 256 entries
Block size:              32×32 → 2×2
bpp = 2.00 bpp = 2.00 bpp = 1.62
mse = 43.76 mse = 40.51 mse = 15.62
Figure 4.10: Magnifications of Lena when compressed by various BTC variants.
4.3 Vector quantization
Vector quantization partitions the input space into K non-overlapping regions so that the input
space is completely covered. A representative (codevector) is then assigned to each cluster.
Vector quantization maps each input vector to this codevector, which is the representative
vector of the partition. Typically the space is partitioned so that each vector is mapped to its
nearest codevector, minimizing a certain distortion function. The distortion is commonly the
(squared) Euclidean distance between the two vectors:
d(X, Y) = Σ_{i=1..M} (Xi − Yi)²    (4.12)

The codevector of a partition is typically chosen as its centroid:

c = (X̄1, X̄2, ..., X̄M)    (4.13)

where X̄i is the average value of the ith component of the vectors belonging to the partition.
This selection minimizes the Euclidean distortion within the partition. The codebook of vector
quantization consists of all the codewords. The design of a VQ codebook is studied in the
next section.
Vector quantization is applied in image compression by dividing the image into fixed sized
blocks (vectors), typically 4×4, which are then replaced by the best match found from the
codebook. The index of the codevector is then sent to the decoder using ⌈log2 K⌉ bits, see
Figure 4.11. For example, in the case of 4×4 pixel blocks and a codebook of size 256, the bit
rate is log2(256) / 16 = 0.5 bpp, and the corresponding compression ratio is (16 × 8) / 8 = 16.
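The encoder itself is a plain nearest-neighbor search. In the following sketch the codebook is random and only stands in for a properly designed one (codebook design is discussed in Section 4.3.3).

```python
import numpy as np

def encode_vq(image, codebook, block=4):
    """Replace each block by the index of its nearest codevector (eq. 4.12)."""
    h, w = image.shape
    indices = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            v = image[y:y+block, x:x+block].reshape(-1).astype(float)
            d = ((codebook - v) ** 2).sum(axis=1)    # squared distances to all codevectors
            indices.append(int(d.argmin()))
    return indices

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 16))
codebook = rng.integers(0, 256, size=(256, 16)).astype(float)   # K=256, 4x4 blocks
indices = encode_vq(image, codebook)
print(len(indices), "indexes of 8 bits each ->", 8 / 16, "bpp")
```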
[Figure content: the N training vectors of the training set T are mapped by the function Q to the M code vectors of the codebook C.]
Figure 4.11: Basic structure of vector quantization.
4.3.1 Codebook organization
Even though the time complexity of codebook generation algorithms is usually rather high, it is not
so critical, since the codebook is typically generated only once as a preprocessing stage.
However, the search for the best match must consider all K codevectors for every block in the image. For
example, in the case of 4×4 blocks and K=256, 16·256 = 4096 multiplications are needed for each
block. For an image of 512×512 pixels there are 16384 blocks in total, thus the number of
multiplications required is over 67 million.
Tree-structured VQ:
In a tree-structured vector quantization the codebook is organized as an m-ary balanced tree,
where the codevectors are located in the leaf nodes, see Figure 4.12. The input vector is
compared with m predesigned test vectors at each stage or node of the tree. The nearest test
vector determines which of the m paths through the tree to select in order to reach the next stage
of testing. At each stage the number of candidate codevectors is reduced to 1/m of the previous set of
candidates. In many applications m=2 and we have a binary tree. If the codebook size is
K = m^d, then d = log_m K m-ary search stages are needed to locate the chosen codevector. An
m-ary tree has breadth m and depth d. The drawback of tree-structured VQ is that the
best match is not necessarily found, because the search is made heuristically on the basis of the
search tree.
Figure 4.12: Treestructured vector quantizer codebook.
The number of bits required to encode a vector is still ⌈log2 K⌉ regardless of the tree
structure. In the case of a binary tree, the indexes can be obtained by assigning binary labels to
the branches of the tree so that the index of a codevector is the concatenation of the labels along
the path from the root to the leaf node. These bit sequences can be transmitted (encoded) in a
progressive manner from the most significant bits of the blocks to the least significant bits, so
that at the first phase the bits of the first branches in the tree structure are sent. Thus the
decoder can display a first estimate of the image as soon as the first bits of each block are
received. This kind of progressive coding consists of as many stages as there are bits in the
code indexes. Note that the progressive coding is an option that requires no extra bits in the
coding.
Classified VQ:
In classified vector quantization, instead of having one codebook, several (possibly smaller)
codebooks are used. For each block, a classifier selects the codebook where the search is
performed, see Figure 4.13. Typically the codebooks are classified according to the shape of
the block so that codevectors having horizontal edges might be located in one codebook,
blocks having diagonal edges in another, and so on. The encoder has to send two indexes to
the decoder: the index of the chosen codevector within the book, but also the index of the
class from which the codevector was taken. Classified VQ can be seen as a special case of tree-structured
VQ where the depth of the tree is 1.
There are two motivations for classified VQ. First, it allows a faster search since
the codebooks can be smaller than the codebook of a full search VQ. Second, classified
VQ can also be seen as a type of codebook construction method where the
codevectors are grouped according to their type (shape). However, the classified VQ codebooks in
general are no better than full search codebooks. Consider a classified VQ where we have 4
classes, each having 256 codevectors. The total number of bits required to encode each block
is log2 4 + log2 256 = 2 + 8 = 10. With the same number of bits we can express
2^10 = 1024 different codevectors in a full-search VQ. By choosing the union of the four
subcodebooks of the classified VQ, the result cannot be any worse than that of the classified VQ. Thus,
the primary benefit of classified VQ is that it allows a faster search (at the cost of extra
distortion).
[Figure content: the input block X is classified; the encoder sends two indexes (i1 and i2) to the decoder, one identifying the class and one the codevector within it.]
Figure 4.13: Classified vector quantizer.
4.3.2 Mean/residual VQ
In mean/residual vector quantization the blocks are divided into two components: the mean
value of the pixels, and the residual in which the mean is subtracted from the individual pixel
values:

ri = xi − x̄    (4.14)

It is easier to design a codebook for the mean-removed blocks than for the original blocks.
The range of the residual pixel values is [−255, 255], but they are concentrated around zero.
Thus, mean/residual VQ is a kind of prediction technique. However, since the predictor is
the mean value of the same block, it must be encoded as well. A slightly different variant is
interpolative/residual VQ, where the predictor is not the mean value of the block, but the
result of a bilinear interpolation, see Figure 4.14. The predictor for each pixel is interpolated
not only on the basis of the mean value of the block, but also on the basis of the mean values
of the neighboring blocks. Details of the bilinear interpolation are omitted here.
Mean/residual VQ Interpolative/residual VQ
Figure 4.14: Predictors of two residual vector quantizers.
In addition to removing the mean value, one can normalize the pixel values by eliminating the
standard deviation as well:

ri = (xi − x̄) / s    (4.15)

where s is the standard deviation of the block. The histogram of the resulting residual values
has zero mean and unit variance. What is left is the shape of the block, i.e. the correlations
between the neighboring pixels. This method can be called mean/gain/shape vector
quantization since it separates these three components from each other. The shape is then
coded by vector quantization. Other coding methods, however, might be more suitable for the
mean and the gain (s). For example, one could form a subsample image from the mean values of
the blocks and compress them by any image compression algorithm, e.g. lossless JPEG. Figure
4.16 shows sample blocks taken from test image Eye (Figure 4.15) when normalized by
Equation (4.15).
Figure 4.15: Test image Eye (50×50×8).
Figure 4.16: 100 samples of normalized 4×4 blocks from the image of Figure 4.15.
4.3.3 Codebook design
The codebook is usually constructed on the basis of a training set of vectors. The training set
consists of sample vectors from a set of images. Denote the number of vectors in the training
set by N. The objective of the codebook construction is to design a codebook of K vectors so that
the average distortion with respect to the training set is minimized. See Figure 4.17 for an
example of 100 two-dimensional training vectors and one possible partitioning of them into 10
sets.
Figure 4.17: Example of 100 sample training set consisting of 2D vectors plotted into the
vector space (left); example of partitioning the vectors into 10 sample regions (right).
Random heuristic:
In the random heuristic, the codebook is formed simply by selecting K vectors from the training
set at random.
Pairwise nearest neighbor:
The pairwise nearest neighbor (PNN) algorithm starts by including all the N training vectors
in the codebook. In each step, two codevectors are merged into one so that the
increase in the overall distortion is minimized. The algorithm is iterated until the number
of codevectors has been reduced to K. The details of the algorithm are explained in the following.
The PNN algorithm starts by calculating the distortion between all pairs of training vectors.
The two nearest neighboring vectors are combined into a single cluster and represented by
their centroid. The input space now consists of N−1 clusters, one containing two vectors and
the rest containing a single vector. At each phase of the algorithm, the increase in average
distortion that would result from replacing two clusters by their merger and its centroid is
computed for every pair of clusters. The two clusters with the minimum increase in the distortion
are then merged, and the codevector of the merged cluster is its centroid. See Figure 4.18 for an
example of the PNN algorithm.
[Figure content: the training set (1,1), (2,5), (3,2), (4,6), (5,4), (5,8), (6,6), (8,2), (8,9), (9,1) plotted in the vector space, the intermediate codebooks after each merge, and the best merge candidates with their distortion increases at each stage.]
Figure 4.18: Pairwise nearest neighbor algorithm for a training set with N=10,
and the final codebook with K=5.
Splitting algorithm:
The splitting algorithm takes the opposite approach to codebook construction compared to the
PNN method. It starts with a codebook containing a single codevector, which is the centroid of the
complete training set. The algorithm then produces increasingly larger codebooks by splitting
a certain codevector Y into two codevectors Y − ε and Y + ε, where ε is a vector of small Euclidean
norm. One choice of ε is to make it proportional to the vector whose ith component is the
standard deviation of the ith component of the set of training vectors. The new codebook after
the splitting can never be worse than the previous one, since it is the same as the previous
codebook plus one new codevector. The algorithm is iterated until the size of the codebook
reaches K.
Generalized Lloyd algorithm:
The Generalized Lloyd Algorithm (GLA), also referred to as the LBG algorithm, is an iterative
method that takes a codebook as input (referred to as the initial codebook) and produces
a new, improved version of it (resulting in lower overall distortion). The
hypothesis is that the iterative algorithm finds the nearest locally optimal codebook with respect
to the initial one. The initial codebook for GLA can be constructed by any existing codebook
design method, e.g. by the random heuristic. Lloyd's necessary conditions of optimality
are defined as follows:
· Nearest neighbor condition: For a given set of codevectors, each training vector is
mapped to its nearest codevector with respect to the distortion function.
· Centroid condition: For a given partition, the optimal codevector is the centroid of the
vectors within the partition.
On the basis of these two optimality conditions, Lloyd's algorithm is formulated as a two-phase
iterative process:
1. Divide the training set into partitions by mapping each vector X to its nearest
codevector Y using the Euclidean distance.
2. Calculate the centroid of each region and replace the codevectors Y by the centroids of
their corresponding partitions.
Both stages of the algorithm satisfy the optimality conditions, thus the resulting codebook
after one iteration can never be worse than the original one. The iterations are continued
until no change (i.e. no decrease in the overall distortion) is achieved. The algorithm doesn't
necessarily reach the global optimum, but converges to a local minimum.
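One GLA pass is straightforward to sketch; the version below repeats the two steps until the overall distortion no longer decreases. Empty partitions are simply left unchanged here, whereas practical implementations usually re-seed them.

```python
import numpy as np

def gla(training, codebook, max_iterations=100):
    """Generalized Lloyd Algorithm: iterate partitioning and centroid steps."""
    training = np.asarray(training, dtype=float)
    codebook = np.asarray(codebook, dtype=float).copy()
    previous = np.inf
    for _ in range(max_iterations):
        # 1. Partition: map every training vector to its nearest codevector.
        d = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        distortion = d[np.arange(len(training)), nearest].mean()
        if distortion >= previous:
            break
        previous = distortion
        # 2. Centroid: replace each codevector by the centroid of its partition.
        for j in range(len(codebook)):
            members = training[nearest == j]
            if len(members) > 0:
                codebook[j] = members.mean(axis=0)
    return codebook, previous

training = np.random.default_rng(1).random((100, 2)) * 10                       # 100 2-D vectors
initial = training[np.random.default_rng(2).choice(100, 10, replace=False)]     # random heuristic
codebook, mse = gla(training, initial)
```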
Note that there is a third condition of optimality, stating that no vector should be equidistant
from two different codevectors. In practice this can be handled by defining the mapping
(tie-breaking) so that such a situation never affects the result, and therefore this third
optimality condition is omitted in the discussion made here.
4.3.4 Adaptive VQ
VQ can also be applied adaptively by designing a codebook on the basis of the image to be
coded. The codebook, however, must then be included in the compressed file. For example,
consider a codebook of 256 entries, each taking 16 bytes. The complete codebook thus
requires 256 × 4 × 4 = 4096 bytes of memory, increasing the overall bit rate by 0.125 bits per
pixel (in the case of a 512×512 image). In dynamic modelling, on the other hand, the
compression would start with an initial codebook, which would then be updated during the
compression on the basis of the already coded blocks. This wouldn't increase the bit rate, but
the computational load required by e.g. GLA might be too high if it were applied after each
block.
4.4 JPEG
The JPEG (Joint Photographic Experts Group) effort was started in 1986 by the cooperative efforts of both
the International Organization for Standardization (ISO) and the International Telegraph and
Telephone Consultative Committee (CCITT), and it was later joined by the International
Electrotechnical Commission (IEC). The purpose of JPEG was to create an image compression
standard for grayscale and color images. Even if the name JPEG refers to the standardization
group, it was also adopted for the compression method. The JPEG standard includes the
following modes of operation:
· Lossless coding
· Sequential coding
· Progressive coding
· Hierarchical coding
The lossless coding mode is completely different from the lossy coding and was presented in
Section 4.1.2. The lossy baseline JPEG (sequential coding mode) is based on the discrete cosine
transform (DCT), and is introduced next.
4.4.1 Discrete cosine transform
The 1D discrete cosine transform (DCT) is defined as
C(u) = α(u) · Σ_{x=0..N−1} f(x) · cos[ (2x+1)uπ / 2N ]    (4.16)

Similarly, the inverse DCT is defined as

f(x) = Σ_{u=0..N−1} α(u) · C(u) · cos[ (2x+1)uπ / 2N ]    (4.17)

where

α(u) = √(1/N)    for u = 0
        √(2/N)    for u = 1, 2, ..., N−1    (4.18)

The corresponding 2-D DCT and the inverse DCT are defined as

C(u,v) = α(u)α(v) · Σ_{x=0..N−1} Σ_{y=0..N−1} f(x,y) · cos[ (2x+1)uπ / 2N ] · cos[ (2y+1)vπ / 2N ]    (4.19)

and

f(x,y) = Σ_{u=0..N−1} Σ_{v=0..N−1} α(u)α(v) · C(u,v) · cos[ (2x+1)uπ / 2N ] · cos[ (2y+1)vπ / 2N ]    (4.20)
The advantage of the DCT is that it can be expressed without complex numbers. The 2-D DCT is also
separable (like the 2-D Fourier transform), i.e. it can be obtained by two successive 1-D DCTs in
the same way as the Fourier transform. See Figure 4.19 for an example of the basis functions of the
1-D DCT, Figure 4.20 for an example of the basis functions of the 2-D DCT, and Figure 4.21 for an
illustration of the 2-D DCT for 4×4 sample blocks.
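For small blocks, equations (4.19) and (4.20) can be evaluated directly; the following (deliberately slow) sketch can be used to check the example values of Figure 4.21. Practical codecs use fast separable transforms instead.

```python
import numpy as np

def alpha(u, N):
    return np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)

def dct2(block):
    """Direct O(N^4) evaluation of the 2-D DCT of equation (4.19)."""
    N = block.shape[0]
    C = np.zeros((N, N))
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += block[x, y] * np.cos((2*x+1)*u*np.pi/(2*N)) \
                                     * np.cos((2*y+1)*v*np.pi/(2*N))
            C[u, v] = alpha(u, N) * alpha(v, N) * s
    return C

flat = np.full((4, 4), 10.0)
print(np.round(dct2(flat), 1))   # only the DC term C(0,0) = 40.0 is non-zero (cf. Figure 4.21)
```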
[Figure content: plots of the 1-D basis functions for u = 0, 1, 2 and 3.]
Figure 4.19: 1D DCT basis functions for N=8.
Figure 4.20: 2D DCT basis functions for N=4. Each block consists of 4×4 elements,
corresponding to x and y varying from 0 to 3.
ORIGINAL IMAGE                      TRANSFORMED IMAGE
FLAT
10 10 10 10 40.0 0.0 0.0 0.0
10 10 10 10 0.0 0.0 0.0 0.0
10 10 10 10 0.0 0.0 0.0 0.0
10 10 10 10 0.0 0.0 0.0 0.0
RANDOM TEXTURE
11 15 18 14 58.8 0.3 1.8 1.3
14 11 13 12 3.9 2.8 3.5 2.6
15 16 19 12 2.7 1.7 1.2 3.4
18 17 12 18 3.0 0.9 5.3 1.8
IMPULSE
10 10 10 10 42.5 1.4 2.5 3.2
10 20 10 10 1.4 0.7 1.4 1.8
10 10 10 10 2.5 1.4 2.5 3.3
10 10 10 10 3.2 1.8 3.3 4.3
LINE (horizontal)
10 10 10 10 50.0 0.0 0.0 0.0
10 10 10 10 5.4 0.0 0.0 0.0
20 20 20 20 10.1 0.0 0.0 0.0
10 10 10 10 13.1 0.0 0.0 0.0
EDGE (vertical)
10 10 20 20 60.0 18.4 0.0 7.7
10 10 20 20 0.0 0.0 0.0 0.0
10 10 20 20 0.0 0.0 0.0 0.0
10 10 20 20 0.0 0.0 0.0 0.0
EDGE (horizontal)
10 10 10 10 60.0 0.0 0.0 0.0
10 10 10 10 18.4 0.0 0.0 0.0
20 20 20 20 0.0 0.0 0.0 0.0
20 20 20 20 7.7 0.0 0.0 0.0
EDGE (diagonal)
10 10 10 10 55.0 11.1 0.0 0.7
10 10 10 20 11.1 5.0 4.6 0.0
10 10 20 20 0.0 4.6 5.0 1.9
10 20 20 20 0.7 0.0 1.9 5.0
SLOPE (horizontal)
10 12 14 16 52.0 8.9 0.0 0.6
10 12 14 16 0.0 0.0 0.0 0.0
10 12 14 16 0.0 0.0 0.0 0.0
10 12 14 16 0.0 0.0 0.0 0.0
Figure 4.21: Example of the DCT for sample 4×4 blocks.
4.4.2 Baseline JPEG
The image is first segmented into 8×8 blocks of pixels, which are then coded separately. Each
block is transformed to the frequency domain by the fast discrete cosine transform (FDCT). The
transformed coefficients are quantized and then entropy coded, either by an arithmetic coder
(QM-coder with a binary decision tree) or by Huffman coding. See Figure 4.22 for the main
structure of the baseline JPEG encoder. The corresponding decoding structure is given in Figure 4.23.
Neither the DCT nor the entropy coding loses any information from the image. The DCT only
transforms the image into the frequency domain so that it is easier to compress. The only phase
resulting in distortion is the quantization phase. The pixels in the original block are
represented by 8-bit integers, but the resulting transform coefficients are 16-bit real numbers,
thus the DCT itself would result in an expansion of the file size if no quantization were performed.
The quantization in JPEG is done by dividing the transform coefficients ci (real numbers) by
the so-called quantization factor qi (an integer between 1 and 255):
ĉi = round( ci / qi )    (4.21)
The result is rounded to the nearest integer, see Figure 4.24 for an example. The higher the
quantization factor, the less accurate the representation of the value. Even the lowest
quantization factor (q=1) results in a small amount of distortion, since the original coefficients
are real numbers but the quantized values are integers. The dequantization is defined by
ri = ĉi × qi    (4.22)
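The quantization and dequantization steps (4.21) and (4.22) are sketched below; the example values correspond to the DC coefficient of the sample block in Table 4.8.

```python
import numpy as np

def quantize(coefficients, q_table):
    """Equation (4.21): divide each coefficient by its quantization factor and round."""
    return np.round(np.asarray(coefficients) / q_table).astype(int)

def dequantize(quantized, q_table):
    """Equation (4.22): multiply the quantized values back by the quantization factors."""
    return quantized * q_table

print(quantize([[235.6]], np.array([[16]])))              # -> [[15]]
print(dequantize(np.array([[15]]), np.array([[16]])))     # -> [[240]] (compare Table 4.8)
```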
[Figure content: the source image is processed in 8×8 blocks by the FDCT, quantizer and entropy encoder to produce the compressed image data; table specifications control the quantizer and the entropy encoder.]
Figure 4.22: Main structure of JPEG encoder.
[Figure content: the compressed image data is processed by the entropy decoder, dequantizer and IDCT to produce the reconstructed image data; table specifications control the entropy decoder and the dequantizer.]
Figure 4.23: Main structure of JPEG decoder.
Figure 4.24: Example of quantization by factor of 2.
In JPEG, the quantization factor is not uniform within the block. Instead, the quantization is
performed so that more bits are allocated to the low frequency components (containing the
most important information) than to the high frequency components. See Table 4.6 for an
example of possible quantization matrices. The basic quantization tables of JPEG are shown
in Table 4.7, where the first one is applied both in grayscale image compression and for
the Y component in color image compression (assuming the YUV or YIQ color space). The
second quantization table is for the chrominance components (U and V in the YUV color space).
The bit rate of JPEG can be adjusted by scaling the basic quantization tables up (to achieve
lower bit rates) or down (to achieve higher bit rates). The relative differences between the qi
factors are retained.
Table 4.6: Possible quantization tables.
Table 4.7: JPEG quantization tables.
Luminance Chrominance
16 11 10 16 24 40 51 61 17 18 24 47 99 99 99 99
12 12 14 19 26 58 60 55 18 21 26 66 99 99 99 99
14 13 16 24 40 57 69 56 24 26 56 99 99 99 99 99
14 17 22 29 51 87 80 62 47 66 99 99 99 99 99 99
18 22 37 56 68 109 103 77 99 99 99 99 99 99 99 99
24 35 55 64 81 104 113 92 99 99 99 99 99 99 99 99
49 64 78 87 103 121 120 101 99 99 99 99 99 99 99 99
72 92 95 98 112 100 103 99 99 99 99 99 99 99 99 99
The entropy coding in JPEG is either Huffman or arithmetic coding. Here the former is
briefly discussed. The first coefficient (denoted the DC-coefficient) is coded separately from
the rest of the coefficients (the AC-coefficients). The DC-coefficient is coded by predicting its
value on the basis of the DC-coefficient of the previously coded block. The difference
between the original and the predicted value is then coded using a similar code table to the one
applied in lossless JPEG (see Section 4.1.2). The AC-coefficients are then coded one by
one in the order given by the zigzag scan (see Section 2.1.2). No prediction is made, but a
simple run-length coding is applied: consecutive zero-value coefficients are coded by
their number (the length of the run), and Huffman coding is then applied to the non-zero
coefficients. The details of the entropy coding can be found in [Pennebaker & Mitchell 1993].
Table 4.8 gives an example of compressing a sample block by JPEG using the basic
quantization table. The result of compressing test image Lena by JPEG is shown in Figures
4.25 and 4.26.
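The zigzag scan and the grouping of the AC coefficients into (zero-run, value) pairs can be sketched as follows. The actual Huffman run/size symbols and the end-of-block code of the standard are omitted, and the input block in the example is synthetic.

```python
import numpy as np

def zigzag_order(n=8):
    """Zigzag scanning order: anti-diagonals, alternating direction."""
    return sorted(((x, y) for x in range(n) for y in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_length_ac(block):
    """Group the AC coefficients into (zero-run-length, value) pairs."""
    zz = [block[x][y] for x, y in zigzag_order(len(block))][1:]   # skip the DC term
    pairs, run = [], 0
    for coefficient in zz:
        if coefficient == 0:
            run += 1
        else:
            pairs.append((run, coefficient))
            run = 0
    return pairs   # trailing zeros would be signalled by an end-of-block symbol

quantized = np.zeros((8, 8), dtype=int)
quantized[0, 0] = 15                                        # DC coefficient (coded by prediction)
quantized[0, 1], quantized[1, 0], quantized[1, 1] = 3, -2, 1
print(run_length_ac(quantized))                             # -> [(0, 3), (0, -2), (1, 1)]
```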
4.4.3 Other options in JPEG
JPEG for color images:
RGB color images are compressed in JPEG by first transforming the image into YUV (or
YIQ in North America) and then compressing the three color components
separately. The chrominance components are often subsampled so that a 2×2 block of the
original pixels forms a new pixel in the subsampled image. The color component images are
then upsampled to their original resolution in the decompression phase.
Progressive mode:
Progressive JPEG is rather straightforward. Instead of coding the image block after block, the coding is
divided into several stages. At the first stage the DC-coefficients are coded from each block.
The decoder can form a rather good approximation of the image on the basis of the
DC-coefficients alone, since they contain the information of the average values of the blocks.
At the second stage, the first significant AC-coefficients (determined by the zigzag order) are
coded. At the third stage the next significant AC-coefficients are coded, and so on. In total,
there are 64 coefficients in each block, so the progressive coding can have at most 64 stages.
In practice, the progressive coding can be changed back to the sequential order, for example
already after the first stage. This is because the DC-coefficients are usually enough for the
decoder to decide whether the image is worth retrieving.
Hierarchical mode:
The hierarchical coding mode of JPEG is a variant of progressive modelling, too. A reduced
resolution version of the image is compressed first, followed by the higher resolution versions
in increasing order. In each case the resolution is doubled for the next image, similarly to what
was done in JBIG.
Table 4.8: Example of a sample block compressed by JPEG.
Original block Transformed block
139 144 149 153 155 155 155 155 235.6 1.0 12.1 5.2 2.1 1.7 2.7 1.3
144 151 153 156 159 156 156 156 22.6 17.5 6.2 3.2 2.9 0.1 0.4 1.2
150 155 160 163 158 156 156 156 10.9 9.3 1.6 1.5 0.2 0.9 0.6 0.1
159 161 162 160 160 159 159 159 7.1 1.9 0.2 1.5 0.9 0.1 0.0 0.3
159 160 161 162 162 155 155 155 0.6 0.8 1.5 1.6 0.1 0.7 0.6 1.3
161 161 161 161 160 157 157 157 1.8 0.2 1.6 0.3 0.8 1.5 1.0 1.0
162 162 161 163 162 157 157 157 1.3 0.4 0.3 1.5 0.5 1.7 1.1 0.8
162 162 161 161 163 158 158 158 2.6 1.6 3.8 1.8 1.9 1.2 0.6 0.4
Quantization matrix Quantized coefficients
16 11 10 16 24 40 51 61 15 0 1 0 0 0 0 0
12 12 14 19 26 58 60 55 2 1 0 0 0 0 0 0
14 13 16 24 40 57 69 56 1 1 0 0 0 0 0 0
14 17 22 29 51 87 80 62 1 0 0 0 0 0 0 0
18 22 37 56 68 109 103 77 0 0 0 0 0 0 0 0
24 35 55 64 81 104 113 92 0 0 0 0 0 0 0 0
49 64 78 87 103 121 120 101 0 0 0 0 0 0 0 0
72 92 95 98 112 100 103 99 0 0 0 0 0 0 0 0
Dequantized coefficients Decompressed block
240 0 10 0 0 0 0 0 144 146 149 152 154 156 156 156
24 12 0 0 0 0 0 0 148 150 152 154 156 156 156 156
14 13 0 0 0 0 0 0 155 156 157 158 158 157 156 156
0 0 0 0 0 0 0 0 160 161 161 162 161 159 157 155
0 0 0 0 0 0 0 0 163 163 164 163 162 160 158 156
0 0 0 0 0 0 0 0 163 164 164 164 162 160 158 157
0 0 0 0 0 0 0 0 160 161 162 162 162 161 159 158
0 0 0 0 0 0 0 0 158 159 161 161 162 161 159 158
Original JPEG
bpp = 8.00 mse = 0.00 bpp = 1.00 mse = 17.26
JPEG JPEG
bpp = 0.50 mse = 33.08 bpp = 0.25 mse = 79.11
Figure 4.25: Test image Lena compressed by JPEG. Mse refers to mean square error.
Original JPEG
bpp = 8.00 mse = 0.00 bpp = 1.00 mse = 17.26
JPEG JPEG
bpp = 0.50 mse = 33.08 bpp = 0.25 mse = 79.11
Figure 4.26: Magnifications of Lena compressed by JPEG.
4.5 JPEG2000
to be written later
4.5.1 Wavelet transform
The basic idea of the (discrete) wavelet transform is to decompose the image into smooth and
detail components. The decomposition is performed in the horizontal and vertical directions
separately. The smooth component represents the average color information and the detail
component the differences between neighboring pixels. The smooth component is obtained using
a low-pass filter and the detail component using a high-pass filter.
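As a concrete (and simplest possible) example, one decomposition level with the unnormalized Haar filters can be sketched as follows: the low-pass output is the pairwise average and the high-pass output the pairwise difference, and applying the filter pair first to the rows and then to the columns splits the image into four sub-bands. The wavelet families discussed below (Daubechies, Coiflets) use longer filters but follow the same pattern.

```python
import numpy as np

def haar_step_rows(a):
    """One Haar analysis step along the rows: pairwise averages and differences."""
    low = (a[:, 0::2] + a[:, 1::2]) / 2.0     # smooth component
    high = (a[:, 0::2] - a[:, 1::2]) / 2.0    # detail component
    return low, high

def haar_decompose(image):
    """One decomposition level: returns the LL, LH, HL and HH sub-bands."""
    low, high = haar_step_rows(np.asarray(image, dtype=float))
    ll, lh = haar_step_rows(low.T)
    hl, hh = haar_step_rows(high.T)
    return ll.T, lh.T, hl.T, hh.T             # further levels recurse on LL

image = np.arange(64, dtype=float).reshape(8, 8)
ll, lh, hl, hh = haar_decompose(image)
print(ll.shape)   # (4, 4): the reduced-resolution smooth image
```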
· Within each family of wavelets (such as the Daubechies family) are wavelet subclasses
distinguished by the number of coefficients and by the level of iteration. Wavelets are most often
classified within a family by the number of vanishing moments. This is an extra
set of mathematical relationships that the coefficients must satisfy, and it is directly
related to the number of coefficients. For example, within the Coiflet wavelet family there are
Coiflets with two vanishing moments and Coiflets with three vanishing moments. In Figure
4.??, several different wavelet families are illustrated.
· The filter matrix is applied in a hierarchical algorithm, sometimes called a pyramidal algorithm.
The wavelet coefficients are arranged so that the odd rows contain an ordering of wavelet
coefficients that acts as the smoothing filter, and the even rows contain an ordering of wavelet
coefficients with different signs that acts to bring out the data's detail. The matrix is first applied
to the original, full-length vector. Then the vector is smoothed and decimated by half and the
matrix is applied again. Then the smoothed, halved vector is smoothed and halved again, and
the matrix applied once more. This process continues until a trivial number of "smooth-smooth-smooth..."
data remain. That is, each matrix application brings out a higher resolution
of the data while at the same time smoothing the remaining data. The output of the DWT
consists of the remaining "smooth (etc.)" components and all of the accumulated "detail"
components.
[Figure content: filtering steps combined with subsampling and upsampling.]
Figure 4.??: Different families of wavelet functions.
Figure 4.??: Example of vertical and horizontal sub band decomposition.
Figure 4.??: Illustration of the first and second iterations of wavelet decomposition.
4.5.2 Wavelet-based compression
· Filtering
· Quantizer
· Entropy coding
· Arithmetic coding
· Bit allocation
4.6 Fractal coding
Fractals can be considered as a set of mathematical equations (or rules) that generate fractal
images: images that have similar structures repeating themselves inside the image. The image
is the inference of the rules and it has no fixed resolution like raster images. The idea of
fractal compression is to find a set of rules that represents the image to be compressed, see
Figure 4.25. The decompression is the inference of the rules. In practice, fractal compression
tries to decompose the image into smaller regions which are described as linear combinations
of the other parts of the image. These linear equations are the set of rules.
[Figure content: an image and a set of rules; fractal compression derives the rules from the image, and inference of the rules reproduces the image.]
Figure 4.25: Fractal compression.
The following describes the weighted finite automata (WFA) approach of Culik and Kari [1993].
The algorithm is based on a quadtree decomposition of the image, thus the states of the
automaton are square blocks. The sub-blocks (quadrants) of the quadtree are addressed
as shown in Figure 4.26. In WFA, each block (state) is described by the contents of its four
sub-blocks. This means that the complete image is also one state of the automaton. Let us next
consider an example of an automaton (Figure 4.27) and the image it creates (Figure 4.28).
Here we adopt a color space where 0 represents the white color (void) and 1 represents the
black color (element). The decimal values between 0 and 1 are different shades of gray.
The labels of the transitions indicate which sub-quadrant the transition is applied to. For
example, the transition from Q0 to Q1 is used for the quadrant 3 (top rightmost quadrant) with
the weight ½, and for the quadrants 1 (top leftmost) and 2 (bottom rightmost) with the weight
¼. Denote the expression of the quadrant d in Qi by fi(d). Thus, the expression of the
quadrants 1 and 2 in Q0 is given by ½·Q0 + ¼·Q1. Note that the definition is recursive, so that
these quadrants in Q0 are the same image as the one described by the state itself, but only half
of its size, plus one fourth of the image defined by the state Q1.
A 2^k × 2^k resolution representation of the image in Q0 is constructed by assigning to each pixel
the value f0(s), where s is the k-length string of the pixel's address at the kth level of the
quadtree. As an example, the pixel values at the addresses 00, 03, 30, and 33 are given
below:
f0(00) = ½ · f0(0) = ½ · ½ · f0() = ½ · ½ · ½ = 1/8    (4.14)

f0(03) = ½ · f0(3) = ½ · ( ½·f0() + ½·f1() ) = 1/8 + 1/4 = 3/8    (4.15)

f0(30) = ½ · f0(0) + ½ · f1(0) = ½·½·f0() + ½·f1() = 1/8 + 1/2 = 5/8    (4.16)

f0(33) = ½ · f0(3) + ½ · f1(3) = ½·( ½·f0() + ½·f1() ) + ½·f1() = 3/8 + 1/2 = 7/8    (4.17)
Note that fi() with the empty string evaluates to the final weight of the state, which is ½ in Q0 and
1 in Q1. The image at three different resolutions is shown in Figure 4.28.
[Figure content: (a) the quadrant addresses
  1 3
  0 2
(b) the addresses at resolution 4×4:
  11 13 31 33
  10 12 30 32
  01 03 21 23
  00 02 20 22
(c) the sub-square specified by the string 320.]
Figure 4.26: (a) The principle of addressing the quadrants; (b) Example of addressing at the
resolution of 4×4; (c) The sub-square specified by the string 320.
[Figure content: two states Q0 (final weight ½) and Q1 (final weight 1); Q0 has self-loops labelled 0,1,2,3 with weight ½, a transition to Q1 labelled 1,2 with weight ¼ and a transition to Q1 labelled 3 with weight ½; Q1 has self-loops labelled 0,1,2,3 with weight 1.]
Figure 4.27: A diagram for a WFA defining the linear grayness function of Figure 4.28.
[Figure content: the image at resolution 2×2 (pixel values 4/8 6/8 on the top row and 2/8 4/8 on the bottom row), at resolution 4×4 (values increasing linearly from 1/8 in the bottom left corner to 7/8 in the top right corner), and at resolution 128×128.]
Figure 4.28: Image generated by the automata in Figure 4.27 at different resolutions.
The second example is the automaton given in Figure 4.29. Here the states are illustrated by the
images they define. The state Q1 is expressed in the following way: the quadrant 0 is the
same as Q0, whereas the quadrant 3 is empty, so no transition with the label 3 exists. The
quadrants 1 and 2, on the other hand, are recursively described by the same state Q1. The state
Q2 is expressed as follows: the quadrant 0 is the same as Q1, and the quadrant 3 is empty. The
quadrants 1 and 2 are again recursively described by the same state Q2. Apart from the different
color (a shade of gray), the left part of the diagram (states Q3, Q4, Q5) is the same as the right part
(states Q0, Q1, Q2). For example, Q4 has the shape of Q1 but the color of Q3. The state Q5 is
described so that the quadrant 0 is Q4, the quadrant 1 is Q2, and the quadrant 2 is the same as the state
itself.
Figure 4.29: A WFA generating the diminishing triangles.
WFA algorithm:
The aim of WFA is to find an automaton that describes the original image as closely as
possible and is, at the same time, as small as possible. The distortion is measured by the TSE
(total squared error). The size of the automaton can be approximated by the number of states
and transitions. The optimization criterion to be minimized is thus:
$$ d_k(f, f_A) + G \cdot \mathrm{size}(A) \qquad (4.18) $$
Here f denotes the original image, fA the image defined by the automaton A, and dk their
distortion (TSE). The G-parameter defines how much emphasis is put on the distortion versus
the bit rate. It is left as an adjustable parameter for the user. The higher G is, the smaller the
achieved bit rate will be, at the cost of image quality, and vice versa. Typically G has values
from 0.003 to 0.2.
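As a sketch of how criterion (4.18) could be evaluated (the function and parameter names below are our own), the distortion term is the total squared error between the original image and the image generated by the automaton, and the size term counts states and transitions, weighted by G:

```python
import numpy as np

def wfa_cost(original, reconstructed, n_states, n_transitions, G=0.05):
    """Optimization criterion (4.18): TSE + G * size of the automaton.

    The size of the automaton is approximated by the number of states
    plus the number of transitions; G balances quality against bit rate.
    """
    diff = original.astype(float) - reconstructed.astype(float)
    tse = float(np.sum(diff ** 2))
    return tse + G * (n_states + n_transitions)
```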
The WFA algorithm compresses the blocks of the quadtree in two different ways:
· by a linear combination of the functions of the existing states
· by adding a new state to the automaton and recursively compressing its four subquadrants.
Whichever alternative yields a better result in minimizing (4.18) is then chosen. A small set of
states (initial basis) is predefined. The functions in the basis need not even be defined
by a WFA. The choice of the functions can, of course, depend on the type of images one
wants to compress. The initial basis in [Culik & Kari, 1993] resembles the codebook in vector
quantization, which itself can be viewed as a very restricted version of the WFA fractal
compression algorithm.
The algorithm starts at the top level of the quadtree, which is the complete image. It is then
compressed by the WFA algorithm (see Figure 4.30). The linear combination of a certain block
is chosen by the following greedy heuristic: The subquadrant k of a block i is matched against
each existing state in the automata. The best match j is chosen, and a transition from i to j is
created with the label k. The match is made between normalized blocks so that their size and
average value are scaled to be equal. The weight of the transition is the relative difference
between the mean values. The process is then repeated for the residual, and is carried on
until the reduction in the squared error between the original block and the block described by
the linear combination is small enough. It is a trade-off between the increase in the bit rate and
the decrease in the distortion.
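The greedy heuristic can be sketched roughly as follows (an illustrative simplification with our own names; in particular, the weights here are chosen by least squares, whereas the original description normalizes the blocks by their mean values). Each round adds the transition that reduces the squared error of the residual the most, and the loop stops when the gain no longer justifies the extra bits.

```python
import numpy as np

def greedy_combination(block, state_images, max_terms=4, min_gain=1.0):
    """Approximate 'block' as a linear combination of state images.

    state_images: candidate images already resized to block.shape.
    Returns a list of (state index, weight) pairs, i.e. the transitions
    and their weights for this block.
    """
    residual = block.astype(float).copy()
    transitions = []
    for _ in range(max_terms):
        best = None
        for j, s in enumerate(state_images):
            s = s.astype(float)
            energy = float(np.sum(s * s))
            if energy == 0.0:
                continue
            w = float(np.sum(residual * s)) / energy   # least-squares weight
            gain = w * w * energy                      # reduction in squared error
            if best is None or gain > best[2]:
                best = (j, w, gain)
        if best is None or best[2] < min_gain:
            break                                      # not worth the extra bits
        j, w, _ = best
        transitions.append((j, w))
        residual -= w * state_images[j].astype(float)
    return transitions
```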
(The initial basis consists of the states Q1 ... Q6. In (a) the image is expressed as a weighted
combination w1, w2, w3 of basis states; in (b) the image itself becomes a new state Q0.)
Figure 4.30: Two ways of describing the image: (a) by the linear combination of existing
states; (b) by creating a new state which is recursively processed.
WFA algorithm for wavelet transformed image:
A modification of the algorithm that yields good results is to combine the WFA with
a wavelet transformation [DeVore et al. 1992]. Instead of applying the algorithm directly to
the original image, one first makes a wavelet transformation on the image and writes the
wavelet coefficients in the Mallat form. Because the wavelet transform has not been
considered earlier, the details of this modification are omitted here.
Compressing the automata:
The final bitstream of the compressed automata consists of three parts:
· Quadtree structure of the image decomposition
· Transitions of the automata
· Weights of the transitions
A bit in the quadtree structure indicates whether a certain block is described as a linear
combination of the other blocks (a set of transitions), or by a new state in the automata. Thus
the states of the automata are implicitly included in the quadtree. The initial states need not
be stored.
The transitions are stored in an n×n matrix, where each nonzero cell M(i, j) = wij indicates that
there is a transition from Qi to Qj with the weight wij. If there is no transition between i
and j, wij is set to zero. The label (0, 1, 2, or 3) is not stored; instead there are four different
matrices, one for each subquadrant label. The matrices Mk(i, j) (k = 0, 1, 2, 3) are then
represented as binary matrices Bk(i, j) so that Bk(i, j) = 1 if and only if wij ≠ 0; otherwise
Bk(i, j) = 0. The consequence of this is that only the nonzero weights need to be stored.
The binary matrices are very sparse because for each state only a few transitions exist;
therefore they can be efficiently coded by run-length coding. Some states are more frequently
used in linear combinations than others; thus arithmetic coding using the column j as the
context was considered in [Kari & Fränti, 1994]. The nonzero weights are then quantized and
a variable-length coding (similar to the FELICS coding) is applied.
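The storage scheme can be illustrated by the following sketch (the helper names are our own, and the plain lists shown here stand in for the run-length and context-based arithmetic coding of the actual coder): for each label k, the binary mask B_k marks which transitions exist, and only the corresponding nonzero weights are kept.

```python
import numpy as np

def pack_transitions(W):
    """Split the weight matrices M_k into binary masks B_k and weight lists.

    W: dict mapping the label k (0..3) to an n x n weight matrix M_k.
    Returns (masks, weights) where masks[k][i, j] = 1 iff w_ij != 0 and
    weights[k] lists the nonzero weights in row-major order.
    """
    masks, weights = {}, {}
    for k, M in W.items():
        B = (M != 0).astype(np.uint8)
        masks[k] = B
        weights[k] = M[B == 1].tolist()
    return masks, weights

def run_lengths(bits):
    """Run-length representation of one row of a binary mask,
    starting with the length of the initial run of zeros."""
    runs, current, count = [], 0, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current, count = b, 1
    runs.append(count)
    return runs
```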
The results of WFA outperform those of JPEG, especially at very low bit rates.
Figures 4.31 and 4.32 illustrate the test image Lena when compressed by WFA at the bit rates
0.30, 0.20, and 0.10 bpp. The automata (at 0.20 bpp) consists of 477 states and 4843
transitions. The quadtree requires 1088 bits, the bit matrices 25850 bits, and the weights
25072 bits.
Figure 4.31: Test image Lena compressed by WFA. Original: bpp = 8.00, mse = 0.00;
WFA: bpp = 0.30, mse = 49.32; bpp = 0.20, mse = 70.96; bpp = 0.10, mse = 130.03.
Figure 4.32: Magnifications of Lena compressed by WFA. Original: bpp = 8.00, mse = 0.00;
WFA: bpp = 0.30, mse = 49.32; bpp = 0.20, mse = 70.96; bpp = 0.10, mse = 130.03.
5 Video images
Video images can be regarded as a three-dimensional generalization of still images, where the
third dimension is time. Each frame of a video sequence can be compressed by any image
compression algorithm. A method where the images are separately coded by JPEG is
sometimes referred to as Motion JPEG (MJPEG). A more sophisticated approach is to take
advantage of the temporal correlations, i.e. the fact that subsequent images resemble each
other very much. This is the case in the video compression standard MPEG (Moving
Picture Experts Group).
MPEG:
The MPEG standard consists of both video and audio compression. The standard also includes
many technical specifications such as image resolution, video and audio synchronization,
multiplexing of the data packets, network protocols, and so on. Here we consider only the
video compression at the algorithmic level. The MPEG algorithm relies on two basic
techniques:
· Block-based motion compensation
· DCT-based compression
MPEG itself does not specify the encoder at all, but only the structure of the decoder, and
what kind of bit stream the encoder should produce. Temporal prediction techniques with
motion compensation are used to exploit the strong temporal correlation of video signals. The
motion is estimated by predicting the current frame on the basis of a certain previous and/or
future frame. The information sent to the decoder consists of the compressed DCT
coefficients of the residual block together with the motion vector. There are three types of
pictures in MPEG:
· Intra pictures (I)
· Predicted pictures (P)
· Bidirectionally predicted pictures (B)
Figure 5.1 demonstrates the positions of the different types of pictures. Every Nth frame in the
video sequence is an I-picture, and every Mth frame a P-picture; here N = 12 and M = 4. The
rest of the frames are B-pictures.
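The picture-type pattern itself is simple to generate; the sketch below (illustrative only) reproduces the structure of Figure 5.1 for N = 12 and M = 4.

```python
def picture_types(n_frames, N=12, M=4):
    """Assign picture types as in Figure 5.1: every Nth frame is an
    I-picture, every Mth frame a P-picture, the rest are B-pictures."""
    types = []
    for t in range(n_frames):
        if t % N == 0:
            types.append('I')
        elif t % M == 0:
            types.append('P')
        else:
            types.append('B')
    return types

print(''.join(picture_types(13)))   # IBBBPBBBPBBBI
```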
Compression of the picture types:
Intra pictures are coded as still images by the DCT algorithm, similarly to JPEG. They
provide access points for random access, but only with moderate compression. Predicted
pictures are coded with reference to a past picture: the current frame is predicted on the basis
of the previous I- or P-picture. The residual (the difference between the prediction and the
original picture) is then compressed by DCT. Bidirectional pictures are coded similarly to
the P-pictures, but the prediction can be made both from a past and a future frame, which can
be I- or P-pictures. Bidirectional pictures are never used as a reference.
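The basic encode/decode relation of a predicted macroblock can be sketched as follows (a simplification with our own helper names; in MPEG the residual would further be DCT-transformed, quantized and entropy coded, and the motion vector is assumed to be given here):

```python
import numpy as np

def encode_predicted_block(current, reference, row, col, mv, size=16):
    """Residual of a macroblock against its motion-compensated prediction.

    mv = (dy, dx) points to the prediction block in the reference frame;
    the prediction block is assumed to stay inside the frame.
    """
    dy, dx = mv
    block = current[row:row + size, col:col + size].astype(int)
    pred = reference[row + dy:row + dy + size, col + dx:col + dx + size].astype(int)
    return block - pred

def decode_predicted_block(reference, row, col, mv, residual, size=16):
    """Decoder side: motion-compensated prediction plus residual."""
    dy, dx = mv
    pred = reference[row + dy:row + dy + size, col + dx:col + dx + size].astype(int)
    return pred + residual
```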
The pictures are divided into 16×16 macroblocks, each consisting of four 8×8 elementary
blocks. The B-pictures are not always coded by bidirectional prediction; instead, four different
prediction techniques can be used:
· Bidirectional prediction
· Forward prediction
· Backward prediction
· Intra coding.
The prediction method is chosen for each macroblock separately. Bidirectional prediction
is used whenever possible. However, in the case of sudden camera movements, or at a breaking
point (scene cut) of the video sequence, the best predictor can sometimes be given by the
forward predictor (if the current frame is before the breaking point) or by the backward
predictor (if the current frame is after the breaking point). The one that gives the best match
is chosen. If none of the predictors is good enough, the macroblock is coded by intra coding.
Thus, the B-pictures can consist of macroblocks coded like those in the I- and P-pictures.
The intra-coded blocks are quantized differently from the predicted blocks. This is because
intra-coded blocks contain information in all frequencies and are very likely to produce
a 'blocking effect' if quantized too coarsely. The predicted blocks, on the other hand, contain
mostly high frequencies and can be quantized with coarser quantization tables.
Figure 5.1: Interframe coding in MPEG. The frame sequence is I B B B P B B B P B B B I;
the P-pictures are obtained by forward prediction from the previous I- or P-picture, and the
B-pictures by bidirectional prediction.
Motion estimation:
The prediction block in the reference frame is not necessarily in the same coordinates as the
block in the current frame. Because of motion in the image sequence, the most suitable
predictor for the current block may exist anywhere in the reference frame. The motion
estimation specifies where the best prediction (best match) is found, whereas motion
compensation merely consists of calculating the difference between the reference and the
current block.
The motion information consists of one vector for forward-predicted and backward-predicted
macroblocks, and of two vectors for bidirectionally predicted macroblocks. The MPEG
standard does not specify how the motion vectors are to be computed; however, block
matching techniques are widely used. The idea is to find in the reference frame a macroblock
similar to the macroblock in the current frame (within a predefined search range). The
candidate blocks in the reference frame are compared to the current one, and the one that
minimizes a cost function measuring the mismatch between the blocks is chosen as the
reference block.
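A full-search block matcher can be sketched as follows (illustrative; the sum of absolute differences is used here as the mismatch measure, which is one common choice, since MPEG itself does not mandate any particular cost function or search strategy):

```python
import numpy as np

def full_search(current, reference, row, col, size=16, search=7):
    """Find the motion vector (dy, dx) minimizing the SAD within a
    +/- 'search' pixel range around the macroblock position."""
    block = current[row:row + size, col:col + size].astype(int)
    h, w = reference.shape
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = row + dy, col + dx
            if r < 0 or c < 0 or r + size > h or c + size > w:
                continue                    # candidate outside the frame
            cand = reference[r:r + size, c:c + size].astype(int)
            sad = int(np.sum(np.abs(block - cand)))
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv
```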
Exhaustive search, where all the possible motion vectors are considered, is known to give
good results. Because a full search with a large search range has a high
computational cost, alternatives such as telescopic and hierarchical searches have been
investigated. In the former, the result of motion estimation at the previous time instant is used
as a starting point for refinement at the current time, thus allowing relatively narrow searches
even for large motion vectors. In hierarchical searches, a lower-resolution representation of
the image sequence is formed by filtering and subsampling. At the reduced resolution the
computational complexity is greatly reduced, and the result of the lower-resolution search can
be used as a starting point for a reduced-range search at full resolution.
Literature
3.1 A.N. Netravali, F.W. Mounts, Ordering Techniques for Facsimile Coding: A Review.
Proceedings of the IEEE, Vol. 68 (7), 796-807, July 1980.
3.1 Y. Wao, J.M. Wu, Vector Run-Length Coding of Bilevel Images. Proceedings Data
Compression Conference, Snowbird, Utah, 289-298, 1992.
3.3 CCITT, Standardization of Group 3 Facsimile Apparatus for Document Transmission, ITU
Recommendation T.4, 1980.
3.3 CCITT, Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile
Apparatus, ITU Recommendation T.6, 1984.
3.3 Y. Yasuda, Y. Yamazaki, T. Kamae, B. Kobayashi, Advances in FAX. Proceedings of the
IEEE, Vol. 73 (4), 706-730, April 1985.
3.4 M. Kunt, O. Johnsen, Block Coding: A Tutorial Review. Proceedings of the IEEE, Vol. 68
(7), 770-786, July 1980.
3.4 P. Fränti and O. Nevalainen, "Compression of binary images by composite methods based on
the block coding", Journal of Visual Communication and Image Representation, 6 (4),
366-377, December 1995.
3.5 ISO/IEC Committee Draft 11544, Coded Representation of Picture and Audio Information -
Progressive Bi-level Image Compression, April 1992.
3.5 G.G. Langdon, J. Rissanen, Compression of Black-White Images with Arithmetic Coding.
IEEE Transactions on Communications, Vol. 29 (6), 358-367, June 1981.
3.5 B. Martins and S. Forchhammer, Bi-level image compression with tree coding. IEEE
Transactions on Image Processing, April 1998, 7 (4), 517-528.
3.5 E.I. Ageenko and P. Fränti, “Enhanced JBIG-based compression for satisfying objectives of
engineering document management system”, Optical Engineering, 37 (5), 1530-1538, May
1998.
3.5 E.I. Ageenko and P. Fränti, "Forward-adaptive method for compressing large binary images",
Software Practice & Experience, 29 (11), 1999.
3.6 P.G. Howard, “Text image compression using soft pattern matching”, The Computer Journal,
40 (2/3), 146-156, 1997.
3.6 P.G. Howard, F. Kossentini, B. Martins, S. Forchhammer and W.J. Rucklidge, The
emerging JBIG2 standard. IEEE Transactions on Circuits and Systems for Video Technology,
November 1998, 8 (7), 838-848.
4.1 R.B. Arps, T.K. Truong, Comparison of International Standards for Lossless Still Image
Compression. Proceedings of the IEEE, Vol. 82 (6), 889-899, June 1994.
4.1.1 M. Rabbani and P.W. Melnychuck, Conditioning Context for the Arithmetic Coding of Bit
Planes. IEEE Transactions on Signal Processing, Vol. 40 (1), 232-236, January 1992.
4.1.2 P.E. Tischer, R.T. Worley, A.J. Maeder and M. Goodwin, Context-based Lossless Image
Compression. The Computer Journal, Vol. 36 (1), 68-77, January 1993.
4.1.2 N. Memon and X. Wu, Recent developments in context-based predictive techniques for
lossless image compression. The Computer Journal, Vol. 40 (2/3), 127-136, 1997.
4.1.3 P.G. Howard, J.S. Vitter, Fast and Efficient Lossless Image Compression. Proceedings Data
Compression Conference, Snowbird, Utah, 351-360, 1993.
4.1.4 M. Weinberger, G. Seroussi and G. Sapiro, “The LOCO-I lossless image compression
algorithm: principles and standardization into JPEG-LS”, Research report HPL-98-193,
Hewlett-Packard Laboratories. (submitted to IEEE Transactions on Image Processing)
4.1.4 X. Wu and N.D. Memon, “Context-based, adaptive, lossless image coding”, IEEE
Transactions on Communications, Vol. 45 (4), 437-444, April 1997.
4.1.5 S. Takamura, M. Takagi, Lossless Image Compression with Lossy Image Using Adaptive
Prediction and Arithmetic Coding. Proceedings Data Compression Conference, Snowbird,
Utah, 166-174, 1994.
4.1.6 J. Ziv, A. Lempel, A Universal Algorithm for Sequential Data Compression. IEEE
Transactions on Information Theory, Vol. 23 (3), 337-343, May 1977.
4.1.6 J. Ziv, A. Lempel, Compression of Individual Sequences via Variable-Rate Coding. IEEE
Transactions on Information Theory, Vol. 24 (5), 530-536, September 1978.
4.2 E.J. Delp, O.R. Mitchell, Image Coding Using Block Truncation Coding. IEEE Transactions
on Communications, Vol. 27 (9), 1335-1342, September 1979.
4.2 P. Fränti, O. Nevalainen and T. Kaukoranta, "Compression of Digital Images by Block
Truncation Coding: A Survey", The Computer Journal, Vol. 37 (4), 308-332, 1994.
4.3 A. Gersho, R.M. Gray, Vector Quantization and Signal Compression. Kluwer Academic
Publishers, Dordrecht, 1992.
4.3 N.M. Nasrabadi, R.A. King, Image Coding Using Vector Quantization: A Review. IEEE
Transactions on Communications, Vol. 36 (8), 957-971, August 1988.
4.3 Y. Linde, A. Buzo, R.M. Gray, An Algorithm for Vector Quantizer Design. IEEE
Transactions on Communications, Vol. 28 (1), 84-95, January 1980.
4.4 W.B. Pennebaker, J.L. Mitchell, JPEG Still Image Data Compression Standard. Van
Nostrand Reinhold, 1993.
4.5 M. Vetterli and C. Herley, Wavelets and Filter Banks: Theory and Design. IEEE Transactions
on Signal Processing, Vol. 40, 2207-2232, 1992.
4.5 R.A. DeVore, B. Jawerth, B.J. Lucier, Image Compression Through Wavelet Transform
Coding. IEEE Transactions on Information Theory, Vol. 38 (2), 719-746, March 1992.
4.5 T. Ebrahimi, C. Christopoulos and D.T. Lee, Special Issue on JPEG2000. Signal Processing:
Image Communication, Vol. 17 (1), 1-144, January 2002.
4.5 D.S. Taubman, M.W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards
and Practice. Kluwer Academic Publishers, Dordrecht, 2002.
4.6 Y. Fisher, Fractal Image Compression: Theory and Application. Springer-Verlag, New York,
1995.
4.6 K. Culik, J. Kari, Image Compression Using Weighted Finite Automata. Computers &
Graphics, Vol. 17 (3), 305-313, 1993.
4.6 J. Kari, P. Fränti, Arithmetic Coding of Weighted Finite Automata. Theoretical Informatics
and Applications, Vol. 28 (3-4), 343-360, 1994.
5 D.J. LeGall, The MPEG Video Compression Algorithm. Signal Processing: Image
Communication, Vol. 4 (2), 129-139, April 1992.
Appendix A: CCITT test images
Appendix B: Grayscale test images
BRIDGE (256×256) CAMERA (256×256)
BABOON (512×512) LENA (512×512)