
A Parallel Fractal Image Compression Algorithm

for Hypercube Multiprocessors


David Jeff Jackson and Thomas Blom

Department of Electrical Engineering


The University of Alabama, Tuscaloosa, AL 35487

Abstract

Data compression has become an important issue in relation to storage and transmission. This is especially true for databases consisting of a large number of detailed computer images. Many methods have been proposed in recent years for achieving high compression ratios for compressed image storage. A very promising compression technique, in terms of compression ratios, is fractal image compression. Fractal image compression exploits the natural affine redundancy present in typical images to achieve a high compression ratio in a lossy compression format. Fractal based compression algorithms, however, have high computational demands. To obtain faster compression, a sequential fractal image compression algorithm may be translated into a parallel algorithm. This translation takes advantage of the inherently parallel nature, from a data domain viewpoint, of the fractal transform process.

1: Introduction

In addition to time complexity considerations for the fractal transform, spatial, i.e. memory usage, considerations are also critical. This is especially true for a parallel implementation of the algorithm given the typically limited memory characteristics of commercial, highly parallel systems. In this paper we present a parallel algorithm, implemented utilizing a hypercube-based nCUBE multiprocessor, with a varying number of processing elements with limited memory. The algorithm is presented and issues including workload scheduling, processing efficiency, and memory/speed trade-offs are discussed. Results concerning compression ratios and algorithm efficiency for various image types are also presented. The results confirm fractal compression to be a high performance algorithm, comparable to JPEG and wavelet compression with respect to achieved compression ratios.

2: Theoretical foundations

Utilizing fractal image compression, an image is modeled as a union of iterated function (IF) systems. An IF is a contracting mapping, in this case from a source image block to a destination image block [1]. The source block is referred to as a domain block and the destination block as a range block. To ensure contractivity, the domain block must be larger than the range block. Additionally, the union of the range blocks must cover the entire image [1,2]. The image is thus partitioned into nonoverlapping square range blocks of size BxB. To ensure contractivity, the size of the domain blocks is chosen to be 2Bx2B.

The goal of the algorithm is to determine the best possible transformation of any domain block into each range block. The transformation consists of a downscaling of the 2Bx2B domain block to size BxB, a contrast scaling, a luminance shift, and eight combinations of reflection and rotation [5]. The best possible transformations may be determined by a brute force search, a very demanding job with a complexity O(n^4) where n is the side length of the image [3]; hence the high compression times. To determine the distance between a transformed domain block and a range block, the RMS metric is used [1,2].

To improve the compression performance, a quadtree partitioning algorithm is implemented, allowing B to vary. If a user defined RMS threshold value is not met, the range block is partitioned into four squares, each of size B/2xB/2. For this implementation, B is restricted to 4, 8, or 16.

3: A parallel fractal image algorithm

The presented image compression algorithm belongs to the coarse grain class of parallel algorithms: the majority of the calculations consists of range block-to-domain block matching, and this computation demands very little interprocess communication.
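To make the matching step concrete, the following sketch shows how a single downscaled domain block can be fitted to a range block. It is an illustrative C rendering of the standard least-squares formulation in [1,2], not the authors' nCUBE code; a full search would also apply the eight reflection/rotation isometries to each domain block and keep the transformation with the smallest RMS error.

/* Illustrative sketch of range-to-domain matching with a fixed block
 * size B (one isometry shown).  Follows the common least-squares fit
 * for the contrast scaling s and luminance shift o; not the original
 * nCUBE implementation. */
#include <math.h>

#define B 8                       /* range block size; domains are 2Bx2B */

/* Downscale a 2Bx2B domain block to BxB by averaging 2x2 pixel groups. */
static void downscale(const unsigned char *dom, int stride, double out[B][B])
{
    for (int y = 0; y < B; y++)
        for (int x = 0; x < B; x++)
            out[y][x] = (dom[2*y*stride + 2*x]         + dom[2*y*stride + 2*x + 1] +
                         dom[(2*y + 1)*stride + 2*x]   + dom[(2*y + 1)*stride + 2*x + 1]) / 4.0;
}

/* Least-squares s and o minimizing sum (s*d + o - r)^2 over the block,
 * returning the RMS error of the resulting transformation. */
static double rms_fit(const double d[B][B], const unsigned char *r, int stride,
                      double *s, double *o)
{
    const int n = B * B;
    double sd = 0.0, sr = 0.0, sdd = 0.0, sdr = 0.0, err = 0.0;

    for (int y = 0; y < B; y++)
        for (int x = 0; x < B; x++) {
            double dv = d[y][x], rv = r[y*stride + x];
            sd += dv; sr += rv; sdd += dv*dv; sdr += dv*rv;
        }
    double den = n*sdd - sd*sd;
    *s = (den != 0.0) ? (n*sdr - sd*sr) / den : 0.0;   /* flat domain block */
    *o = (sr - *s * sd) / n;

    for (int y = 0; y < B; y++)
        for (int x = 0; x < B; x++) {
            double e = *s * d[y][x] + *o - r[y*stride + x];
            err += e * e;
        }
    return sqrt(err / n);
}

A quadtree driver would invoke this fit with B instantiated at 16, 8 and 4, splitting any range block whose best RMS error remains above the user threshold.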

An algorithm for performing parallel fractal image compression has been previously designed for up to 16 distributed workstations using the Parallel Virtual Machine (PVM) software [6]. That algorithm uses a full copy of the uncompressed image at each node. The first implementation used a static load distribution model where all assignments were distributed before any computations occurred. The results show a substantial load imbalance due to variations in the complexity of the image. The algorithm was improved by using dynamic load allocation, where tasks were assigned to the processes when they became idle. That approach solved the load imbalance problem and provided an algorithm well suited for a limited number of powerful processors.

This implementation of the parallel fractal image compression algorithm is made for the nCUBE parallel computer. This target architecture introduces further constraints to the algorithm, as it should execute on a higher number of nodes (>32) with limited memory. Most of the nCUBE nodes have only 1 Mb of RAM.
The size of the images in the prototype program was limited to 256x256 pixels and 256 grayscale values, a total image size of 64 Kb. However, the algorithm is designed to be expanded to compress larger images with more grayscale values. In that case, the memory constraint prohibits the algorithm from storing the complete image at each node, implying that the image has to be divided and distributed upon the different nodes. The image is divided into n-1 sections, where n is the number of nodes in the system. Each section overlaps with neighboring sections, as some pixels belong to domain blocks located at two different nodes. The n-1 nodes are slave nodes which perform the comparisons between the range blocks and the domain blocks. The slave workloads are coordinated by the remaining node, referred to as the host. Individual domain blocks are stored in a linear fashion to achieve stride-one access to all domain blocks. This memory access technique gives a memory usage overhead but decreases overall memory access time.

Figure 1 shows the size of the memory allocated to data at each node. The size of the program code is 128 Kb. So, if an image of size 1024x1024x8 is to be compressed, a minimum memory requirement is 1 Mb at each node. Less memory is required by eliminating the stride-one memory access.

[Figure 1. Memory Usage Versus Image Size: data memory allocated at each node as a function of image size in pixels.]

The program is composed of one host process, executing on a single node, and a number of slave processes, one running on each slave node. The task of the host process, shown as a flow diagram in Figure 2, is to distribute part of the image to each slave node, keep track of unprocessed areas of the image, distribute unprocessed range blocks to the slave nodes, and finally collect results from the slave processes and write them to a file.

[Figure 2. Host Process Flow Diagram: read input image and perform initialization; divide the image and transmit one part to each slave; transmit range blocks to requesting nodes; update results; end.]
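As an illustration of this coordination, the sketch below renders the host's loop in present-day MPI rather than the original nCUBE message-passing primitives; the tags, message layouts and result handling are simplified assumptions, not the paper's actual protocol.

/* Hedged sketch of the host process (MPI stand-in for nCUBE calls). */
#include <mpi.h>
#include <stddef.h>

enum { TAG_SECTION = 1, TAG_RANGE, TAG_RESULT, TAG_DONE };
#define OVERLAP 16                 /* assumed: one domain-block height, 2B */

void host(const unsigned char *image, int rows, int cols,
          int nslaves, int nranges)
{
    /* Divide the image into nslaves horizontal sections, each extended
     * by OVERLAP rows so that domain blocks straddling a cut exist at
     * both neighbouring slaves, and send one section to each slave. */
    for (int s = 0; s < nslaves; s++) {
        int r0 = s * rows / nslaves - (s > 0 ? OVERLAP : 0);
        int r1 = (s + 1) * rows / nslaves + (s < nslaves - 1 ? OVERLAP : 0);
        MPI_Send(image + (size_t)r0 * cols, (r1 - r0) * cols,
                 MPI_UNSIGNED_CHAR, s + 1, TAG_SECTION, MPI_COMM_WORLD);
    }

    /* Hand out range-block indices on demand; collect best matches. */
    int next = 0, done = 0, result[8];          /* payload layout assumed */
    while (done < nranges) {
        MPI_Status st;
        MPI_Recv(result, 8, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == TAG_RESULT)
            done++;                             /* record the transform */
        if (next < nranges) {                   /* answer with new work */
            MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_RANGE,
                     MPI_COMM_WORLD);
            next++;
        } else {
            MPI_Send(&next, 0, MPI_INT, st.MPI_SOURCE, TAG_DONE,
                     MPI_COMM_WORLD);
        }
    }
}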

The slave process, flow diagram shown in Figure 3, receives image portions, i.e. a pool of domain blocks, and preprocesses them. A range block is then taken from the slave message buffer and compared with the domain blocks belonging to that slave process. If the message buffer is empty, a new block is requested from the host process. If a match between a range and domain block is found, the result is returned to the host process. If no match is found, the block is either transmitted to the message buffer of the slave process holding the next part of the image or returned to the host process. The range block is returned to the host process if it has been processed by all slave nodes, together with data about the best matching domain block found in the search.

[Figure 3. Slave Process Flow Diagram: receive and preprocess domain blocks; request a new range block from the host; compare the range block to all domain blocks; transmit the range block to the next slave or return the result to the host.]
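A companion sketch of one slave's main loop, continuing the hypothetical MPI protocol of the host listing above; the search and bookkeeping helpers are assumptions, not routines from the paper.

/* Companion sketch of a slave's main loop; helpers are hypothetical. */
void search_local_domains(int *blk);      /* RMS matching of Section 2   */
int  best_match_found(const int *blk);    /* met the RMS threshold?      */
int  visited_all_slaves(const int *blk);  /* circulated the whole ring?  */

void slave(int rank, int nslaves, unsigned char *section, int maxbytes)
{
    MPI_Status st;
    MPI_Recv(section, maxbytes, MPI_UNSIGNED_CHAR, 0, TAG_SECTION,
             MPI_COMM_WORLD, &st);        /* my pool of domain blocks    */
    /* ... preprocess: downscale domains, build stride-one copies ...   */

    int blk[8] = {0};                     /* range block descriptor      */
    MPI_Send(blk, 0, MPI_INT, 0, TAG_RANGE, MPI_COMM_WORLD); /* 1st ask */

    for (;;) {
        MPI_Recv(blk, 8, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == TAG_DONE)
            break;

        search_local_domains(blk);        /* compare with local domains  */

        if (best_match_found(blk) || visited_all_slaves(blk)) {
            /* Return the result; the host answers with new work.       */
            MPI_Send(blk, 8, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
        } else {
            /* Pass the block to the slave holding the next section,
             * then ask the host for a fresh range block.               */
            MPI_Send(blk, 8, MPI_INT, rank % nslaves + 1, TAG_RANGE,
                     MPI_COMM_WORLD);
            MPI_Send(blk, 0, MPI_INT, 0, TAG_RANGE, MPI_COMM_WORLD);
        }
    }
}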
The implemented scheme also includes a simple classification algorithm. When a slave receives a range block from the host, the block is compared to a grayshaded block with a grayshade level equal to the average grayscale level of the range block. If the difference is less than the RMS threshold value, no comparison is performed and the average grayshade value is returned to the host. Similarly, when domain blocks are received from the host, each slave classifies them into grayshaded and non-grayshaded domains. The grayshaded domain blocks are not used in the comparisons with non-grayshaded range blocks.
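The grayshade test itself reduces to comparing a block with its own mean. A minimal sketch, assuming the paper's RMS metric as the distance measure:

/* A block is "grayshaded" when its RMS distance to a flat block at its
 * own average grayscale level falls below the user threshold. */
#include <math.h>

int is_grayshaded(const unsigned char *blk, int stride, int b,
                  double rms_thresh, double *mean)
{
    double sum = 0.0, err = 0.0;
    for (int y = 0; y < b; y++)
        for (int x = 0; x < b; x++)
            sum += blk[y * stride + x];
    *mean = sum / (b * b);

    for (int y = 0; y < b; y++)
        for (int x = 0; x < b; x++) {
            double e = blk[y * stride + x] - *mean;
            err += e * e;
        }
    return sqrt(err / (b * b)) < rms_thresh;   /* 1: code by mean alone */
}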

4: Results

Two classes of results are presented: the results of the compression algorithm and the degree of parallelism and efficiency attained in the implementation. Two images have been selected to test the algorithm: an uncomplicated image of a boy with large uniform areas, and a more complicated United States weather image taken from METEOSAT. The original images are shown in Figure 4.

Profiling tools were used to measure execution time spent in the computation procedure. The efficiency of each node is given as the percentage of the total execution time spent in the computation routine. The graph in Figure 5 displays the average efficiency as a function of the number of nodes. It shows a constant efficiency for each image.

[Figure 5. Number of Nodes vs. Efficiency: average efficiency (percent) versus number of nodes for the boy and USA images.]

The particular constant depends on the ratio of non-grayshaded blocks to the total number of domain blocks at each node, as nodes with low ratios execute an unsuccessful search faster than nodes with high ratios, i.e. ratios close to one. Thus nodes with high ratios are bottlenecks. Figure 6 illustrates that further, showing the efficiency of each slave node. Clearly, the first two and last three nodes show the lowest efficiency for the boy image. By decomposing the boy image horizontally into 15 equal size pieces, it becomes clear that the first and last nodes would be assigned the most grayshaded blocks, thus confirming the results.

[Figure 6. Efficiency vs. Node Number: efficiency (percent) of each individual slave node.]

The execution time as a function of the number of nodes is shown in Figure 7. As both axes are logarithmic, the straight lines indicate a linear speedup; the slope of the lines is -1.10 for the boy image and -1.03 for the weather image. That indicates that incrementing the number of processors X times yields a relative speedup greater than X. This super-linear speedup is due to the host-slave configuration. With eight processors, 7/8 of the total computation power is dedicated to the actual calculations. However, with 64 processors, 63/64 of the power is dedicated to calculations. A performance threshold point is expected at n = 256 processors; beyond that, performance is expected to degrade considerably. For a 256x256 image the number of range blocks of size 16x16 is only 256, thus prohibiting the use of more than 256 processors simultaneously. However, the threshold point depends on the image size.
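As a worked check of these figures: with one node reserved as host, the fraction of the machine doing actual matching is (n-1)/n, which rises from 7/8 = 0.875 at n = 8 to 63/64 ≈ 0.98 at n = 64, so each doubling of the machine also recovers part of the host overhead and the measured speedup slope can exceed one in magnitude. The n = 256 bound likewise follows from the block count: a 256x256 image holds (256/16)^2 = 256 range blocks of size 16x16, so at most 256 such blocks can be in flight at once.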

[Figure 4. Images for Algorithm Testing: the original boy and USA weather images.]

[Figure 7. Execution time vs. number of nodes (RMS threshold = 10), on log-log axes, for the boy and USA images.]

Another important factor is the performance of the algorithm in terms of compression ratios and image distortion. The degree of image distortion is measured as a signal-to-noise ratio (SNR) between the original and compressed image [4].
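The paper does not spell out its exact SNR formula; a common definition, used here as an assumption, is ten times the base-10 logarithm of the ratio between the signal power of the original and the power of the reconstruction error:

/* SNR (dB) between an original and a decompressed image; the exact
 * formula used in the paper is not stated, this is a common choice. */
#include <math.h>

double snr_db(const unsigned char *orig, const unsigned char *rec, long n)
{
    double sig = 0.0, err = 0.0;
    for (long i = 0; i < n; i++) {
        double o = orig[i], e = o - rec[i];
        sig += o * o;
        err += e * e;
    }
    return 10.0 * log10(sig / err);    /* undefined for a perfect match */
}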

The SNR, as a function of the compression ratio, is shown in Figure 8. The boy image shows a peak SNR of 36.9 dB for a compression ratio of 6.12:1 and a SNR of 29.8 dB for a compression ratio of 26.7:1. At that high compression ratio, a blocking effect in the image is visible, but it is not disturbing to the eye unless the image is magnified. For compression ratios less than 7:1, the quality of the image is degraded. A satisfactory explanation for that phenomenon has not yet been determined. A decompressed boy image showing quadtree partitioning is shown in Figure 9.

[Figure 8. Compression vs. SNR: SNR versus compression ratio for the boy and USA images.]

[Figure 9. Decompressed Boy Image Showing Quadtree Partitioning.]

The results for the USA weather image are not as good. The SNR peaks with a value of 24 dB at a compression ratio of 4.5:1. This is to be expected, as the USA image exhibits a low entropy. Specifically, the cloud formations are lacking in regularity and do not exhibit any common pattern. The SNR graph for the USA image shows the same pattern, with a peak in the SNR, as the boy image.

5: Conclusions

This paper has presented a parallel fractal image compression algorithm which can be executed on hypercube computers, where processing nodes have a limited memory. The algorithm is shown to have linear speedup when additional processing nodes are utilized. A threshold point for achieving linear speedup depends on the size of the image. For a 256x256 image, this point is believed to lie beyond 256 processors.

For images with a uniform distribution of grayshaded blocks, the execution time associated with computations exceeds 95% of the total computation and communication time. However, the high efficiency of the algorithm degrades when the complexity of the image is unevenly distributed. This degradation results from potential load imbalance problems introduced by the classification algorithm. Although more complicated classification algorithms have been developed, showing good results, if such algorithms are implemented into the parallel algorithm they could cause serious load balance problems.

Fractal image compression is a promising technique, providing compressions from 5 to 20 times and with SNR greater than 30 dB for certain images. Some images with a low degree of entropy seem difficult to compress, but that phenomenon is a general compression problem which JPEG compression schemes also possess.

The status of the algorithm is still in the development phase. Improvements are necessary to tune it to maximum performance. To reduce the computation time, an improved classification can be implemented and the domain pool can be limited. Although real-time performance is not likely to be attained with this scheme, the scheme appears very promising for compressing images to be distributed over a network or on CD-ROM.

References

[1] M. F. Barnsley and L. P. Hurd, “Fractal image compression,” Wellesley, Massachusetts: A K Peters, Ltd., 1993.
[2] Y. Fisher, “Fractal image compression,” Siggraph ’92 course notes, 1992.
[3] C. Frigaard, J. Gade, T. T. Hemmingsen and T. Sand, “Image compression based on a fractal theory,” Internal Report, Institute for Electronic Systems, Aalborg University, 1994.
[4] A. E. Jacquin, “Image coding based on a fractal theory of iterated contractive image transformations,” IEEE Trans. Image Process., vol. 1, no. 1, pp. 18-30, Jan. 1992.
[5] A. E. Jacquin, “Fractal image coding: a review,” Proceedings of the IEEE, vol. 81, no. 10, pp. 1451-1465, Oct. 1993.
[6] G. S. Tinney, “Parallel fractal image compression,” M.S. Thesis, The University of Alabama, 1994.

