{ttwong, pheng}@cse.cuhk.edu.hk
ABSTRACT
1. INTRODUCTION
which provide good cross-image correlation for image similarity analysis and compression. We can exploit this high similarity to sort and group the randomly placed, disordered photos and remove the redundant information among them.

In this paper, we explore digital photo similarity analysis techniques in the frequency domain, and then apply photo album sorting, clustering, and compression algorithms. We aim to sort these miscellaneous photos according to their semantic similarities and categorize them into different clusters. Based on the sorted photo sequence, we further compress all the photographs with an MPEG-like algorithm that uses variable IBP frames and adaptive search windows for motion compensation. The overall system design is illustrated in Fig. 1.
2.
Due to the limitations of space and time, most photos are stored and distributed in compressed format, among which JPEG is the most popular standard for digital cameras and Internet multimedia data. Previous digital photo album processing needs to decompress photos into the uncompressed domain before applying existing image processing and analysis techniques. This is not only time consuming but also computationally expensive. Compared with conventional image feature detection and texture analysis approaches in the pixel domain, performing image analysis directly in the frequency domain is more beneficial:

1) It is exempt from the huge workload of decompressing every image.

2) Processing is performed on a smaller amount of data, since most of the frequency coefficients tend to zero.

3) It directly utilizes the image features contained in the frequency domain, such as the mean and directional texture information that DCT (Discrete Cosine Transform) coefficients provide.
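As a minimal illustration of point 3), the DC coefficient of each 8×8 DCT block is proportional to the block's mean intensity, so the DC terms alone form a 1/64-scale thumbnail that can drive similarity analysis without full decompression. The sketch below is our own illustration, not the paper's code; in a real JPEG pipeline the coefficients would come straight from the entropy decoder, and we compute them from pixels here only to keep the example self-contained.

```python
import numpy as np
from scipy.fft import dctn

def dc_thumbnail(img):
    """Build a coarse thumbnail from the DC term of each 8x8 DCT block."""
    h, w = img.shape
    h8, w8 = h // 8, w // 8
    thumb = np.empty((h8, w8))
    for bi in range(h8):
        for bj in range(w8):
            block = img[bi*8:(bi+1)*8, bj*8:(bj+1)*8]
            coeffs = dctn(block, norm='ortho')
            # Under orthonormal scaling the DC term equals 8x the block mean.
            thumb[bi, bj] = coeffs[0, 0]
    return thumb

# Toy 64x64 "image" with a horizontal gradient.
img = np.tile(np.arange(64, dtype=float), (64, 1))
t = dc_thumbnail(img)
```

The resulting 8×8 array carries 1/64 of the original data yet preserves the coarse intensity layout of the picture.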
Therefore, a new research stream that performs image analysis and feature extraction directly in the frequency domain has drawn much attention, and techniques for editing and analyzing compressed images via their frequency coefficients have become hot topics in recent years.
Smith and Rowe [9] first implemented several operations
directly on JPEG data, such as scalar addition, scalar multiplication, pixel-wise addition, and pixel-wise multiplication
of two images on RLE blocks. With these algorithms, one
can execute the traditional image manipulation operations
on compressed images directly, yielding performance 50 to
100 times faster than that of manipulating decompressed
images. Based on image analysis and processing techniques
in compressed domain, some applications of image indexing using DCT frequency coefficients have been developed.
Chang [1] studied video indexing, image matching, and texture feature extraction with compressed images. Shneier and
Abdel-Mottaleb [8] suggested a method of generating feature
keys of JPEG images with the average value of DCT coefficients computed over a window. During retrieval, images
with similar keys are assumed to be similar. However, there
is no semantic meaning associated with such image similarities. A quad-tree structure was introduced into an image indexing system by Climer and Bhatia [2]. They designed a JPEG image database indexing system based on a quad-tree structure whose leaves contain the relevant DCT coefficients. The quad-tree is adopted as the signature of the original image; it stores the average DC coefficients, which correspond to the average values of the 8×8 blocks. The system generates a set of ranked images from the database with respect to the query, and allows the user to limit the number of matched images by changing the match threshold. Considering real-time indexing efficiency, Feng and Jiang [3] calculated only the first two moments, the mean and variance, of original images directly in the DCT domain, using the DC and AC coefficients. Based on these two statistical features, their JPEG compressed-image retrieval system is robust to translation, rotation, and scaling with minor disturbance.
The aforementioned work addressed different applications by manipulating image data in the DCT frequency domain. These efforts demonstrated the feasibility of using DCT coefficients to describe image information, which is also the basis for our digital photo processing in the frequency domain.
2.1 Feature Extraction
The 2D DCT of an 8×8 image block f(i, j) and its inverse are defined as:

F(u, v) = \frac{C_u C_v}{4} \sum_{i=0}^{7} \sum_{j=0}^{7} f(i, j) \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16}    (1)

Inverse:

f(i, j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C_u C_v F(u, v) \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16}    (2)

where

C_u, C_v = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases}
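Equations (1) and (2) can be transcribed literally to confirm that the pair forms a perfect round trip. The brute-force version below is our own sketch, O(n^4) and far slower than a real DCT implementation; it is only meant to mirror the formulas term by term.

```python
import numpy as np

def C(u):
    """Normalization constant C_u from Eqs. (1) and (2)."""
    return 1 / np.sqrt(2) if u == 0 else 1.0

def dct2(f):
    """Forward 8x8 DCT, a direct transcription of Eq. (1)."""
    F = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            s = 0.0
            for i in range(8):
                for j in range(8):
                    s += f[i, j] * np.cos((2*i+1)*u*np.pi/16) \
                                 * np.cos((2*j+1)*v*np.pi/16)
            F[u, v] = C(u) * C(v) / 4 * s
    return F

def idct2(F):
    """Inverse 8x8 DCT, a direct transcription of Eq. (2)."""
    f = np.zeros((8, 8))
    for i in range(8):
        for j in range(8):
            s = 0.0
            for u in range(8):
                for v in range(8):
                    s += C(u) * C(v) * F[u, v] \
                         * np.cos((2*i+1)*u*np.pi/16) \
                         * np.cos((2*j+1)*v*np.pi/16)
            f[i, j] = s / 4
    return f

block = np.arange(64, dtype=float).reshape(8, 8)
recovered = idct2(dct2(block))   # should reproduce `block` exactly
```

Note that for a constant block the entire signal collapses into the DC term, which is why the DC coefficients alone already summarize block intensity.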
2.2 Energy Histogram
The histogram is one of the primitive tools used in image information analysis. It is generally tolerant to image rotation and modest object translation, and can achieve scale invariance through normalization. Given a discrete color image defined in some color space, e.g. RGB, the color histogram is obtained by counting the number of times each discrete color occurs in the image array [10]. Statistically, the color histogram denotes the joint probability of the intensities of the three color channels. Similarly, an energy histogram of DCT coefficients is obtained by counting the number of times each energy level appears in the set of DCT blocks of a compressed image. The energy histograms of the DC and AC matrices can be written as:
H_{DC}[k] = \sum_{i=0}^{m} \sum_{j=0}^{n} \begin{cases} 1 & \text{for } DC[i, j] = k \\ 0 & \text{otherwise} \end{cases}    (3)

H_{AC_t}[k] = \sum_{i=0}^{m} \sum_{j=0}^{n} \begin{cases} 1 & \text{for } AC_t[i, j] = k \\ 0 & \text{otherwise} \end{cases}    (4)
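The counting in Eqs. (3) and (4) can be sketched as follows. The bin count, value range, and the choice of the first 32 zig-zag coefficient positions are our illustrative assumptions, not values fixed by the paper.

```python
import numpy as np

def energy_histograms(dct_blocks, num_ac=31, bins=32, vrange=(-512, 512)):
    """Energy histograms of the DC and first `num_ac` zig-zag AC coefficients.

    `dct_blocks` is an (m, n, 8, 8) array of DCT blocks.  Returns a list of
    `num_ac` + 1 normalized histograms: index 0 is H_DC, index t is H_AC_t.
    """
    # Standard zig-zag order of the first 32 positions in an 8x8 block.
    zigzag = [(0,0),(0,1),(1,0),(2,0),(1,1),(0,2),(0,3),(1,2),
              (2,1),(3,0),(4,0),(3,1),(2,2),(1,3),(0,4),(0,5),
              (1,4),(2,3),(3,2),(4,1),(5,0),(6,0),(5,1),(4,2),
              (3,3),(2,4),(1,5),(0,6),(0,7),(1,6),(2,5),(3,4)]
    hists = []
    for t in range(num_ac + 1):            # t = 0 is the DC histogram
        u, v = zigzag[t]
        coeffs = dct_blocks[:, :, u, v].ravel()
        h, _ = np.histogram(coeffs, bins=bins, range=vrange)
        hists.append(h / h.sum())          # normalize, needed for Eq. (5)
    return hists
```

Normalizing each histogram here is what later allows the intersection distance to be computed as a simple complement.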
[Figure: energy histograms of the DC, AC0, AC1, and AC25 coefficient matrices.]
2.3 Photo Distance

To evaluate how similar two images are, we need a reasonable metric to calculate a distance value from their feature vectors. Many experimental studies have examined which metric is best for different applications [4]. In our project, we adopt histogram intersection, as it is the most effective and natural way to measure histogram similarity.

The histogram intersection metric was first proposed by Swain and Ballard [10] for color image retrieval in the spatial domain; it is defined as:

H(X, Y) = \frac{\sum_{i=1}^{n} \min(X_i, Y_i)}{\sum_{i=1}^{n} Y_i}    (5)

where X and Y denote two n-bin image histograms. H(X, Y) is the normalized degree of similarity. Since we have normalized the histogram bins, i.e., \sum_{i=1}^{n} Y_i = 1 and \sum_{i=1}^{n} X_i = 1, we can obtain the corresponding distance simply by:

d(X, Y) = 1 - \sum_{i=1}^{n} \min(X_i, Y_i)    (6)

We obtain 32 distance values by performing histogram intersection on the DC and AC energy histograms respectively. The final similarity distance of two images is calculated as their weighted sum:

D(X, Y) = w_{dc} d_{dc}(X, Y) + \sum_{i=1}^{31} w_{ac_i} d_{ac_i}(X, Y)    (7)

where

w_{dc} + \sum_{i=1}^{31} w_{ac_i} = 1
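Equations (5) through (7) translate directly into code. The weight values below are hypothetical placeholders of ours that merely satisfy the unit-sum constraint; the paper only requires that the weights sum to one.

```python
import numpy as np

def intersection_distance(X, Y):
    """Eq. (6): distance derived from the normalized histogram intersection."""
    return 1.0 - np.minimum(X, Y).sum()

def photo_distance(hists_x, hists_y, w_dc=0.2, w_ac=None):
    """Eq. (7): weighted sum of the DC distance and the 31 AC distances.

    `hists_x` and `hists_y` are lists of 32 normalized histograms
    (DC first, then AC_1 .. AC_31).
    """
    if w_ac is None:
        # Spread the remaining weight evenly over the AC histograms.
        w_ac = np.full(31, (1.0 - w_dc) / 31)
    d = [intersection_distance(x, y) for x, y in zip(hists_x, hists_y)]
    return w_dc * d[0] + float(np.dot(w_ac, d[1:]))
```

Identical photos yield a distance of zero, and the distance grows toward one as the energy histograms diverge.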
3.

4.

Based on the sorted sequence of all photos, we can categorize them into different clusters according to their similarity distances. Because different photo clusters have different texture complexity and similarity distance distributions, as shown in Fig. 6, the distance variances of different clusters vary: the distance variances among clusters of sky and cloud photos are much smaller than those of clusters of people activities.

[Fig. 6: distance distribution of the sorted photo sequence.]

We calculate the mean and standard deviation of the distance values in the current cluster as:

\mu = \frac{1}{N-1} \sum_{i=1}^{N-1} D(i, i+1)    (8)

\sigma = \left( \frac{1}{N-1} \sum_{i=1}^{N-1} (D(i, i+1) - \mu)^2 \right)^{\frac{1}{2}}    (9)

where D(i, i+1) denotes the distance between adjacent photos i and i+1. We define the clustering threshold T as:

T = \mu + K\sigma    (10)

where K is a parameter we set to control the range of variance. The clustering threshold T is used to decide whether the current photo belongs to the cluster or not: the bigger K is, the more probable it is that the photo belongs to the current cluster. Beginning with the first photo, we examine the similarity distance D(i, j) between every two adjacent photos i and j, where j = i + 1, calculate the mean and standard deviation of the distance values in the current cluster using Equations (8) and (9), and compare the distance D(i, j) with the clustering threshold T.

If D(i, j) <= T, photo j is classified into this cluster and used to update the cluster's \mu and \sigma:

\mu_{new} = \frac{\mu_{old} N + D(i, j)}{N+1}    (11)

\sigma_{new} = \left( \frac{1}{N} \sum_{i=1}^{N} (D(i, i+1) - \mu_{new})^2 \right)^{\frac{1}{2}}    (12)

The clustering threshold T of this cluster is then updated with \mu_{new} and \sigma_{new}:

T_{new} = \mu_{new} + K\sigma_{new}    (13)

The clustering procedure can be summarized as:

k ← 0; j ← 0; C[k, j] ← Q[0];
for i ← 0 to N
    if D[Q[i], Q[i+1]] <= T
        C[k, j] ← Q[i+1];
        μ ← (μ(j-1) + D[Q[i], Q[i+1]]) / j;
        σ ← ( (1/j) Σ_{t=0}^{j} (D[C[k,t], C[k,t+1]] - μ)^2 )^{1/2};
        j ← j + 1;
    else
        k ← k + 1; j ← 0;
        C[k, j] ← Q[i+1];
        μ ← 0; σ ← 0;
return C;
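The clustering procedure can be re-expressed as runnable code, under our assumptions that Q is the sorted photo sequence and D is a symmetric distance function; this is a sketch of the adaptive-threshold idea, not the paper's implementation.

```python
import math

def cluster_photos(Q, D, K=2.0):
    """Adaptive-threshold clustering of a sorted photo sequence Q.

    D(a, b) returns the similarity distance between photos a and b.
    A new photo joins the current cluster when its distance to the
    previous photo is within T = mu + K * sigma (Eqs. 10 and 13).
    """
    clusters = [[Q[0]]]
    dists = []                  # adjacent distances inside current cluster
    mu = sigma = 0.0
    for i in range(len(Q) - 1):
        d = D(Q[i], Q[i + 1])
        T = mu + K * sigma      # current clustering threshold
        if not dists or d <= T:
            clusters[-1].append(Q[i + 1])
            dists.append(d)     # update mu and sigma (Eqs. 11 and 12)
            mu = sum(dists) / len(dists)
            sigma = math.sqrt(sum((x - mu) ** 2 for x in dists) / len(dists))
        else:                   # distance jump: start a new cluster
            clusters.append([Q[i + 1]])
            dists, mu, sigma = [], 0.0, 0.0
    return clusters
```

A large jump in the adjacent-photo distance, relative to the statistics of the cluster so far, is what opens a new cluster; K trades off cluster purity against fragmentation.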
5.
5.1
I-frame. Otherwise, we assign a bigger search window toward the P-frame. Equations (14) and (15) give the B-frame search window sizes:

S_I(B_i) = 16 \left( \frac{N-i}{N} + 1 \right)    (14)

S_P(B_i) = 16 \left( \frac{i}{N} + 1 \right)    (15)
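Equations (14) and (15) are straightforward to compute. In the sketch below, the 16-pixel base block size is taken from the equations; note how S_I shrinks and S_P grows linearly as the B-frame index i moves from the I-frame toward the P-frame within a group of N frames.

```python
def b_frame_windows(i, N, block=16):
    """Eqs. (14) and (15): motion-search window sizes for B-frame i
    within a group of N frames between an I-frame and a P-frame."""
    s_i = block * ((N - i) / N + 1)   # Eq. (14): window toward the I-frame
    s_p = block * (i / N + 1)         # Eq. (15): window toward the P-frame
    return s_i, s_p
```

Both windows stay between one and two block widths, so the search cost is bounded while still adapting to each B-frame's position.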
5.2
Take one picture as an example, as shown in Fig. 13: the traditional JPEG picture exhibits many blocking artifacts at a high compression ratio. The MPEG picture also loses much detail, for instance the people swimming in the sea, the texture of the trees, and the wave movement. Our method retains more image quality through intelligent motion estimation and compensation, which depend on a proper sorting sequence, reasonable clustering groups, and the cross-image similarity (redundancy) correlation.
7. CONCLUSION
Previous work in image browsing, searching, and management has concentrated on solving retrieval and clustering problems, and most of it is devoted to pixel- or spatial-domain manipulation. This means all images must be fully decoded to the pixel domain before any image processing or similarity tests can be performed.

In this paper we have investigated the use of the energy histograms of the low-frequency DCT coefficients for digital photo analysis in the frequency domain. Experimental results show that our photo album sorting and clustering algorithms can rearrange randomly placed miscellaneous photos in terms of image similarity and group them into different clusters, which correspond to different daily events and sceneries. Finally, we compress all the digital photos in the current album to obtain a compact storage format. Our photo album compression method outperforms traditional JPEG and MPEG at high compression ratios when there is high similarity and redundancy among the photos of the same cluster.
8. REFERENCES