
A Fast Normalized Cross Correlation-Based Block Matching Algorithm Using Multilevel Cauchy-Schwartz Inequality

Byung Cheol Song

This paper presents a fast block-matching algorithm based on the normalized cross-correlation, where the elimination order is determined based on the gradient magnitudes of subblocks in the current macroblock. A multilevel Cauchy-Schwartz inequality is derived to skip unnecessary block-matching calculations in the proposed algorithm. Also, additional complexity reduction is achieved by re-using the normalized cross-correlation values for the spatially neighboring macroblock because the search areas of adjacent macroblocks overlap. Simulation results show that the proposed algorithm can improve the speed-up ratio by up to about 3 times in comparison with the existing algorithm.

Keywords: Fast full search, normalized cross-correlation, multilevel Cauchy-Schwartz.

Manuscript received June 1, 2010; revised Aug. 27, 2010; accepted Sept. 10, 2010.
This work was financially supported by Inha University Research Grant (INHA-39193).
Byung Cheol Song (phone: +82 32 860 7413, email: bcsong@inha.ac.kr) is with the School of Electronic Engineering, Inha University, Incheon, Rep. of Korea.
doi:10.4218/etrij.11.0110.0315
ETRI Journal, Volume 33, Number 3, June 2011. © 2011 Byung Cheol Song.

I. Introduction

Motion estimation has been employed by many video compression schemes to improve coding efficiency by removing the temporal redundancy that exists in video sequences. The block-matching algorithm (BMA) is the most popular approach applied to all video coding standards, such as MPEG and H.264/AVC, due to its structural simplicity. A full search algorithm (FSA) can be the best BMA for a given block distortion criterion, as it finds the block with minimum block-matching distortion among all candidates. However, its heavy computational cost is a crucial limiting factor in terms of software implementation as well as hardware implementation.

For several decades, many fast BMAs have been developed. These can be divided into two categories. The first category adopts pre-defined search patterns to locate candidate motion vectors (MVs) based on distortions of potential candidates [1]. The second category is entirely composed of optimal motion estimation methods, which can find the globally optimal MV within a search area [2]-[7]. Li and Salari proposed a well-known successive elimination algorithm (SEA) providing a decision boundary based on the sum norms of blocks to eliminate some checking points without the need for computationally intensive block matching [2]. Gao and others extended the SEA to a multilevel SEA (MSEA) that provides multiple levels of tighter boundaries using the sum norms of the macroblock (MB) and subblocks with reduced sizes [3]. MSEA reduces the necessary computation by detecting and rejecting unnecessary candidates from the lowest level to the
highest level. Zhu and others proposed a fine granularity successive elimination (FGSE) scheme that extended the MSEA by adding greater detail levels [4]. Also, Liu and others presented an adaptive version of FGSE [5]. The FGSE is distinguished from the MSEA in that, if necessary, only a single subblock having the maximum complexity is chosen at each level and the subblock is partitioned into four smaller subblocks at the next level. Therefore, in the case of a $16 \times 16$ block, the total number of partition levels amounts to 86. Thus, the FGSE has more potential to prune out non-optimal candidates than MSEA before wholly performing block matching.

As a block distortion criterion, the sum of absolute differences (SAD) is commonly used in video compression. SAD is defined as

$$\mathrm{SAD}(u, v) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| I_C(i, j) - I_R(i+u, j+v) \right|, \tag{1}$$

where $(u, v)$ is a MV in the search area, and $I_C$ and $I_R$ denote the current and reference picture, respectively.

In addition to SAD and the sum of squared differences (SSD), the normalized cross correlation (NCC) is also a well-known similarity criterion. The NCC is defined as

$$\mathrm{NCC}(u, v) = \frac{\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} I_C(i, j)\, I_R(i+u, j+v)}{\sqrt{\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} I_C(i, j)^2}\; \sqrt{\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} I_R(i+u, j+v)^2}}. \tag{2}$$

The NCC is more robust than SAD and SSD under uniform illumination changes. Accordingly, it is widely used in object recognition and industrial inspection schemes. Applying the NCC as the matching criterion to motion estimation leads to more uniform residuals. Hence, the NCC can improve subjective visual quality as well as coding efficiency in video compression [8]. Recently, visual quality measures focusing on the human visual system (HVS) have been devised in place of PSNR. Among these measures, structural similarity (SSIM) has become popular. Pan and others showed that the NCC helps increase the coding efficiency [8]. However, the NCC is a more complex criterion compared to SAD. Thus, a fast algorithm to speed up NCC-based block matching is required for computationally efficient video encoding.

This paper presents a fast BMA based on the NCC. In the proposed algorithm, the successive elimination order is determined based on the gradient magnitudes of subblocks in the current MB, as in the FGSE algorithm. The multilevel Cauchy-Schwartz inequality is derived to remove any unnecessary block-matching calculation in the proposed algorithm. Also, unnecessary block matching can be additionally avoided by re-using the NCC values obtained from the motion estimation of the spatially adjacent MB. The experimental results show that the proposed algorithm outperforms the existing methods in terms of the search speed. The remainder of this paper is organized as follows. Section II introduces the previous works, and section III presents the proposed algorithm in detail. We present experimental results in section IV. Finally, our conclusions are given in section V.

II. Previous Works and SSIM

This section introduces two earlier works related to the proposed algorithm: MSEA [3] and FGSE [4]. SSIM is also described in this section.

1. Review of MSEA

In MSEA, using what is known as a block sum pyramid, multiple boundary levels tighter than the SEA are obtained by dividing each $N \times N$ block into subblocks of $N/2 \times N/2$ and further dividing these into $N/4 \times N/4$ until $1 \times 1$ is reached. In total, $L = \log_2 N$ boundary levels are established. All subblocks at one level are of the same size. At the $l$-th level, where $0 \le l \le L$, the number of subblocks is $2^{2l} = 4^l$, and the size of each subblock is $N_l \times N_l$, where $N_l = N/2^l$. For simplicity, let $C_{i,j}$ and $R_{i,j}$ denote $I_C(i, j)$ and $I_R(i+u, j+v)$ in (2), respectively. Then, based on the triangle inequality, the following can be derived:

$$\mathrm{SAD} = \sum_{i,j=0}^{N-1} \left| C_{i,j} - R_{i,j} \right| \ge \cdots \ge \sum_{i,j=0}^{N/2^{L-l}-1} \left| C^l_{i,j} - R^l_{i,j} \right| \ge \cdots \ge \left| C^0 - R^0 \right|. \tag{3}$$

Here, $X^l_{i,j} = \sum_{m=2i}^{2i+1} \sum_{n=2j}^{2j+1} X^{l+1}_{m,n}$, $X^0 = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} X_{m,n}$, and $X$ represents $C$ or $R$. It is clear from (3) that the boundary value increases along with boundary level $l$ and is bounded by the matching error. In MSEA, one candidate is evaluated sequentially from the lowest level 0 to the highest level $L$. If a candidate survives until level $L-1$, its matching error, that is, SAD, will be finally calculated at level $L$. It is important to note that there are only a small number of candidates remaining for the final SAD calculations.

2. Review of FGSE

The FGSE algorithm establishes a total of $L = (N^2 - 1)/3$
boundary levels by partitioning only one block of size $N \times N$ into four subblocks of size $N/2 \times N/2$ at each level according to a predefined rule. Here, subblocks of a smaller size are not partitioned further until all other larger subblocks are partitioned. Consequently, at the same level, subblocks of two different sizes may coexist. At the $l$-th level, where $0 \le l \le L$, the number of subblocks is $3l + 1$ (Fig. 1). If the SAD bound at level $l$ is lower than the current minimum, a subblock in the MB is partitioned into four subblocks at the next level $l+1$. The selection of the subblock to be partitioned is based on the complexity of the image. In other words, image complexity measures such as gradient magnitudes are utilized to determine the partition order, where subblocks of a higher complexity will be partitioned first so that larger boundary values can be obtained as early as possible.

Fig. 1. Partition process in FGSE algorithm (levels 0 to 85).

3. Structural Similarity (SSIM)

Recently, Wang and others proposed a measure based on image structural distortion termed SSIM [9]. The SSIM index is more consistent with human perception and is designed to measure structural information degradation, including the three comparison points of luminance, contrast, and structure. Its definition is as follows:

$$\mathrm{SSIM}(x, y) = l(x, y) \cdot c(x, y) \cdot s(x, y). \tag{4}$$

Here, $l(x, y)$, $c(x, y)$, and $s(x, y)$ are the luminance, contrast, and structure measures, respectively. They are defined as

$$l(x, y) = \frac{2 \mu_x \mu_y + K_1}{\mu_x^2 + \mu_y^2 + K_1}, \quad c(x, y) = \frac{2 \sigma_x \sigma_y + K_2}{\sigma_x^2 + \sigma_y^2 + K_2}, \quad s(x, y) = \frac{\sigma_{xy} + K_3}{\sigma_x \sigma_y + K_3}, \tag{5}$$

where $x$ and $y$ are two vectors obtained from the image in the corresponding local windows, $\mu_x$ and $\mu_y$ are the sample means of $x$ and $y$, respectively, and $\sigma_x^2$ and $\sigma_y^2$ are the variances of $x$ and $y$, respectively. In addition, $\sigma_{xy}$ denotes the covariance of $x$ and $y$. $K_1$, $K_2$, and $K_3$ are small constants to avoid a condition in which the denominator is zero. They are recommended in an earlier study [9] as

$$K_1 = (k_1 D)^2, \quad K_2 = (k_2 D)^2, \quad K_3 = K_2 / 2, \tag{6}$$

where $k_1, k_2 \ll 1$, and $D$ is the dynamic range of the pixel values. In addition, the higher the value of $\mathrm{SSIM}(x, y)$ is, the more similar the images $x$ and $y$ are.

III. Proposed Algorithm

This section introduces a fast NCC-based BMA for video encoding. The NCC can often be a better criterion than the SAD in terms of SSIM. To verify this, a full search based on SAD was compared to that based on NCC, where the search range and matching block size were fixed to $\pm 15$ pixels and $16 \times 16$ pixels in terms of integer-pel accuracy. Table 1 shows the experimental results for eight CIF sequences. Here, the PSNR and SSIM values are computed from the motion-compensated version of the second frame of each sequence in order to evaluate the performance of the motion estimation only. The results show that the NCC provides better SSIM performance than the SAD, while the former is comparable to or slightly weaker than the latter in terms of PSNR. This means that the motion-compensated images using NCC-based motion estimation are visually better than those using SAD-based motion estimation even though the former does not outperform the latter in terms of PSNR.

The proposed BMA is a type of NCC version of FGSE that uses multilevel Cauchy-Schwartz inequality [9]. As in an earlier study [10], it is possible to apply the multilevel Cauchy-Schwartz inequality based on an L2-norm pyramid to the numerator of (2) as follows:

$$\sum_{i,j=0}^{N-1} C_{i,j} R_{i,j} \le \cdots \le \sum_{i,j=0}^{N/2^{L-l}-1} C^l_{i,j} R^l_{i,j} \le \cdots \le C^0 R^0. \tag{7}$$
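The chain in (7), normalized by the L2 norms of the two blocks, can be checked numerically. The following is a minimal Python sketch under stated assumptions, not the author's implementation: it uses the uniform pyramid with levels 0 to $L = \log_2 N$ (rather than the 86 fine-granularity levels), assumes non-negative pixel values, defines each pyramid entry as the L2 norm of its four children one level below, and the helper names `l2_norm_pyramid` and `ncc_upper_bounds` are hypothetical.

```python
import math
import random

def l2_norm_pyramid(block):
    """Return levels [0..L]; level l is a 2^l x 2^l grid of subblock L2 norms.

    Level L holds the pixels themselves (assumed non-negative) and level 0
    holds the L2 norm of the whole block. The side must be a power of two.
    """
    levels = [[[float(v) for v in row] for row in block]]
    while len(levels[-1]) > 1:
        fine = levels[-1]
        half = len(fine) // 2
        # Each coarse entry is the L2 norm of the 2x2 group beneath it.
        coarse = [[math.sqrt(fine[2*i][2*j] ** 2 + fine[2*i][2*j+1] ** 2
                             + fine[2*i+1][2*j] ** 2 + fine[2*i+1][2*j+1] ** 2)
                   for j in range(half)] for i in range(half)]
        levels.append(coarse)
    return levels[::-1]

def ncc_upper_bounds(cur, ref):
    """Return one bound per level; the last entry is the exact NCC of (2)."""
    pc, pr = l2_norm_pyramid(cur), l2_norm_pyramid(ref)
    denom = pc[0][0][0] * pr[0][0][0]          # ||C||_2 * ||R||_2
    bounds = []
    for lc, lr in zip(pc, pr):
        num = sum(a * b for row_c, row_r in zip(lc, lr)
                  for a, b in zip(row_c, row_r))
        bounds.append(num / denom)
    return bounds

# Illustrative data only: two random 16x16 blocks with values in 1..255.
rng = random.Random(0)
cur = [[rng.randrange(1, 256) for _ in range(16)] for _ in range(16)]
ref = [[rng.randrange(1, 256) for _ in range(16)] for _ in range(16)]
bounds = ncc_upper_bounds(cur, ref)
print([round(b, 4) for b in bounds])
```

In this sketch the level-0 value is 1 by construction, since the numerator and the denominator are both the product of the two block norms, and the values do not increase from one level to the next until the exact NCC is reached at the finest level. A candidate can therefore be abandoned as soon as one of these bounds falls below the best NCC found so far.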
In (7), $X_{i,j} = X^L_{i,j}$, $X^l_{i,j} = \left( \sum_{m=2i}^{2i+1} \sum_{n=2j}^{2j+1} \left( X^{l+1}_{m,n} \right)^2 \right)^{1/2}$, and $L = \log_2 N$. Based on (7), it is possible to derive the following inequality:

$$\mathrm{NCC}^0 \ge \mathrm{NCC}^1 \ge \cdots \ge \mathrm{NCC}^l \ge \cdots \ge \mathrm{NCC}^L. \tag{8}$$

In (8), $\mathrm{NCC}^l$ is defined as

$$\mathrm{NCC}^l = \frac{\sum_{i,j=0}^{N/2^{L-l}-1} C^l_{i,j} R^l_{i,j}}{\sqrt{\sum_{i,j=0}^{N-1} C_{i,j}^2}\; \sqrt{\sum_{i,j=0}^{N-1} R_{i,j}^2}}. \tag{9}$$

As in the aforementioned studies [3], [10], a multilevel successive elimination algorithm is developed to determine the best MB with the maximal NCC value according to (8). Furthermore, the total number of partition levels can be extended to 86, that is, $0 \le l \le 85$, by partitioning each of four subblocks at a certain level one by one into another four subblocks at the next level in the descending order of the complexities of larger-sized subblocks, as in a more recent study [4]. It is easily proved that (7) and (8) still hold for sequential partition levels, as shown in Fig. 1. Here, the partition order is determined in the manner used in the above-mentioned study [4] by employing the gradient magnitude as an image complexity measure.

Table 1. Comparison of PSNR and SSIM.

Sequence         PSNR (dB)          SSIM
                 SAD      NCC       SAD      NCC
Foreman          34.71    33.89     0.9153   0.9335
News             33.84    33.78     0.9537   0.9587
Coastguard       31.89    31.90     0.9126   0.9145
Containership    37.27    37.26     0.9798   0.9807
Hall Monitor     38.49    38.10     0.9294   0.9538
Mobile           24.36    24.39     0.8629   0.8914
Stefan           26.85    26.82     0.9069   0.8973
Tempete          27.59    27.54     0.9104   0.9161

Note that the search areas of the current MB and its left neighboring MB mostly overlap (Fig. 2). For the search range of $\pm 16$ pixels in Fig. 2, two adjacent MBs share about 2/3 of their entire search areas. We propose a method to prevent unnecessary computations of NCC using the NCC values obtained from motion estimation of the spatially neighboring MB. Assume that $P_{i,j}$ denotes a pixel at $(i, j)$ in the spatially adjacent $16 \times 16$ MB, that is, $I_C(i, j-16)$ (Fig. 2). For instance, we consider an example when $l$ is equal to 1. For the same candidate in the overlapped search area, the $\mathrm{NCC}^1$ values of $C$ and $P$ are given by

$$\frac{\sum_{i,j=0}^{1} C^1_{i,j} R^1_{i,j}}{\|C\|_2 \, \|R\|_2} \quad \text{and} \quad \frac{\sum_{i,j=0}^{1} P^1_{i,j} R^1_{i,j}}{\|P\|_2 \, \|R\|_2}, \tag{10}$$

where $\|C\|_2$ stands for the L2-norm of the current MB $C$. Since two adjacent MBs generally have significant spatial correlation, the possibility that $P_{i,j}$ and $C_{i,j}$ are equivalent is high. For example, the two $\mathrm{NCC}^1$ values of $P$ and $C$ in (10) can be the same at level 1. Then, we can replace the $\mathrm{NCC}^1$ of $C$ with the $\mathrm{NCC}^1$ of $P$ without computation. Note that as $l$ becomes smaller, the probability that $P^l_{i,j}$ and $C^l_{i,j}$ are equivalent becomes higher. Thus, the NCC values of the block candidates in the overlapped search area are stored during motion estimation of the spatially neighboring MB, and if $P^l_{i,j}$ and $C^l_{i,j}$ are equivalent at level $l$ of motion estimation of the current MB, the $\mathrm{NCC}^l$ of $C$ is replaced with the stored $\mathrm{NCC}^l$ of $P$.

Fig. 2. Overlapping of search areas of adjacent MBs (search area of P, search area of C, and the overlapped search area).

On the other hand, it is very important to choose the initial search point and search pattern properly in the search area. In this paper, the initial search point is set to the median MV of the MVs of three spatially adjacent MBs (Fig. 3(a)). In order to maximize the elimination effect, a spiral search pattern with the initial MV as the starting point is used, as in the example shown in Fig. 3(b).

As a result, the proposed algorithm is summarized as follows.
1) Offline pre-processing: Build L2-norm pyramids for the reference frame.
2) Online processing: For each MB in the current frame, the following procedure is applied.
(a) Compute the L2-norm pyramid of the current MB.
(b) Compute the NCC corresponding to the initial MV and set the current maximum cost ($C_{max}$) to the computed NCC.
(c) Set $l$ to 0. Compute $\mathrm{NCC}^l$. If $C^0$ and $P^0$ are the same at this level and the NCC for the reference block is available, employ the stored NCC instead of the $\mathrm{NCC}^l$ computation.
(d) If $\mathrm{NCC}^l \ge C_{max}$, replace $l$ with $l+1$ and find a subblock with the largest complexity according to the given partition rule. Otherwise, reset $l$ to 0 and go to step (c) with the next MV candidate. If no additional MV candidate exists, go to step (h).
(e) Partition the subblock having the largest complexity and compute its corresponding NCC. Compute $\mathrm{NCC}^l$ by updating the NCC of the partitioned subblock only. If $C^l$ and $P^l$ are the same at this level and the NCC for the reference block is available, employ the stored NCC instead of the $\mathrm{NCC}^l$ computation. If $l$ is equal to 85, go to step (g).
(f) If $\mathrm{NCC}^l \ge C_{max}$, replace $l$ with $l+1$, find a subblock with the largest complexity according to the given partition rule, and go to step (e). Otherwise, reset $l$ to 0 and go to step (c) with the next MV candidate. If no more MV candidates exist, go to step (h).
(g) If $\mathrm{NCC}^{85} \ge C_{max}$, update $C_{max}$ to the $\mathrm{NCC}^{85}$. Reset $l$ to 0 and go to step (c) with the next MV candidate. If no additional MV candidates exist, go to step (h).
(h) Select the MV corresponding to the final $C_{max}$ as the best match to the current MB. The computed NCC values are stored for the next MB.

Fig. 3. (a) Neighboring MBs (left, upper, and upper-right) that are used to determine initial MV and (b) initial MV-centric search pattern.

IV. Experimental Results

In order to evaluate the performance of the proposed algorithm, eight CIF ($352 \times 288$) video sequences were used, that is, Foreman, Containership, Coastguard, Mobile and Calendar (Mobile), News, Stefan, Tempete, and Hall Monitor. The luminance components of the first 100 frames of these video sequences are adopted in the simulation. The MB size and the search range are fixed to $16 \times 16$ and $\pm 15$ pixels in both the horizontal and vertical directions, respectively. Simulation was performed on a dual core CPU at 2.66 GHz.

Table 2. Average number of operations per MB for Mobile sequence.

Operation    FSA        MSEA      Proposed
ADD/SUB      443,360    21,750    11,483
MLP          445,098    21,250     9,344
DIV              869     2,191       838
COMP             869     2,190     4,038
SQRT             869     1,067     1,067
Total        891,065    48,448    26,770

Table 3. Computational complexity comparison of various sequences.

Sequence         FSA        MSEA                     Proposed
                 ANOP       ANOP      Speed-up ratio  ANOP      Speed-up ratio
Foreman          891,065    52,043    17.1            23,964    37.2
News             891,065    25,161    35.4            13,405    66.5
Coastguard       891,065    47,313    18.8            27,485    32.4
Containership    891,065    47,063    18.9            28,993    30.7
Hall Monitor     891,065    99,091     9.0            33,560    26.6
Mobile           891,065    48,448    18.4            26,770    23.3
Stefan           891,065    43,603    20.4            25,543    34.9
Tempete          891,065    28,232    31.6            14,582    61.1

The proposed algorithm was compared with FSA and MSEA [3]. All the algorithms are based on the NCC distortion criterion. To do this, the MSEA was revised based on NCC rather than SAD. Here, since all the algorithms provide the same SSIM and the same PSNR, we do not provide those results in this section. Table 2 summarizes the average number
of operations per MB (ANOP) for the Mobile video sequence. Here, all operations including addition (ADD), subtraction (SUB), multiplication (MLP), division (DIV), comparison (COMP), and square root (SQRT) are considered. Table 2 shows that the proposed algorithm considerably reduces the complexity of the MSEA. The proposed algorithm substantially reduces the number of ADD and MLP operations, although the number of COMP operations increases due to additional comparisons based on (10). Table 3 compares the ANOP and the speed-up ratio over the FSA for various video sequences.

Here, the computational complexity of the proposed algorithm is significantly lower than that of the existing algorithms, regardless of the sequence type. For instance, the proposed algorithm can improve the speed-up ratio by up to about 3 times (Hall Monitor) in comparison with the MSEA.

V. Conclusion

This paper proposes a fast BMA based on NCC, where the multilevel Cauchy-Schwartz inequality is employed to skip unnecessary block-matching calculation and the elimination order is determined based on the image complexities of subblocks in the current MB. Also, additional complexity reduction is possible by re-using the NCC values for the spatially neighboring MB. The proposed NCC-based algorithm considerably reduces the computational complexity of a video encoder.

References

[1] S. Zhu and K.K. Ma, "A New Diamond Search Algorithm for Fast Block Matching Motion Estimation," IEEE Trans. Image Process., vol. 9, no. 2, 2000, pp. 287-290.
[2] W. Li and E. Salari, "Successive Elimination Algorithm for Motion Estimation," IEEE Trans. Image Process., vol. 4, no. 1, 1995, pp. 105-107.
[3] X.Q. Gao, C.J. Duanmu, and C.R. Zou, "A Multilevel Successive Elimination Algorithm for Block Matching Motion Estimation," IEEE Trans. Image Process., vol. 9, no. 3, 2000, pp. 501-504.
[4] C. Zhu, W.S. Qi, and W. Ser, "Predictive Fine Granularity Successive Elimination for Fast Optimal Block-Matching Motion Estimation," IEEE Trans. Image Process., vol. 14, no. 2, 2005, pp.
[5] S.W. Liu, S.D. Wei, and S.H. Lai, "Fast Optimal Motion Estimation Based on Gradient-Based Adaptive Multilevel Successive Elimination," IEEE Trans. Circ. Syst. for Video Technol., vol. 18, no. 2, 2008, pp. 263-267.
[6] J.N. Kim and T.S. Choi, "Adaptive Matching Scan Algorithm Based on Gradient Magnitude for Fast Full Search in Motion Estimation," IEEE Trans. Consum. Electron., vol. 45, no. 3, 1999, pp. 762-772.
[7] C.Y. Choi and J.C. Jeong, "New Sorting-Based Partial Distortion Elimination Algorithm for Fast Optimal Motion Estimation," IEEE Trans. Consum. Electron., vol. 55, no. 4, Nov. 2009, pp. 2335-2340.
[8] W.H. Pan et al., "A Hybrid Motion Estimation Approach Based on Normalized Cross Correlation for Video Compression," IEEE Proc. ICASSP, 2008, pp. 1037-1040.
[9] Z. Wang et al., "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Trans. Image Process., vol. 13, no. 4, 2004, pp. 600-612.
[10] B.C. Song and J.B. Ra, "A Fast Search Algorithm for Vector Quantization Using L2-norm Pyramid of Codewords," IEEE Trans. Image Process., vol. 11, no. 1, 2002, pp. 10-15.

Byung Cheol Song received the BS, MS, and PhD in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Rep. of Korea, in 1994, 1996, and 2001, respectively. From 2001 to 2008, he was a senior engineer at Digital Media R&D Center, Samsung Electronics Co., Ltd., Suwon, Rep. of Korea. In March 2008, he joined the School of Electronic Engineering, Inha University, Incheon, Korea, where he is currently an assistant professor. His research interests are in the general areas of video coding, video processing, super-resolution, stereo vision, multimedia system design, image coding, content-based multimedia retrieval, and data mining.
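As an illustration of the online procedure summarized in section III, the following Python sketch runs a full search with level-wise elimination on synthetic data. It is a simplified sketch under stated assumptions, not the author's implementation: it uses the uniform levels 0 to $\log_2 N$ instead of the 86 gradient-ordered levels, a raster scan instead of the spiral pattern, no re-use of NCC values from the neighboring MB, and it rebuilds the reference pyramid per candidate rather than offline. All function names are hypothetical.

```python
import math
import random

def l2_norm_pyramid(block):
    """Levels [0..L]; level l is a 2^l x 2^l grid of subblock L2 norms."""
    levels = [[[float(v) for v in row] for row in block]]
    while len(levels[-1]) > 1:
        fine = levels[-1]
        half = len(fine) // 2
        levels.append([[math.sqrt(fine[2*i][2*j] ** 2 + fine[2*i][2*j+1] ** 2
                                  + fine[2*i+1][2*j] ** 2 + fine[2*i+1][2*j+1] ** 2)
                        for j in range(half)] for i in range(half)])
    return levels[::-1]

def level_bound(pc, pr, level):
    """Upper bound on the NCC at one level; exact NCC at the last level."""
    num = sum(a * b for rc, rr in zip(pc[level], pr[level])
              for a, b in zip(rc, rr))
    return num / (pc[0][0][0] * pr[0][0][0])

def crop(frame, top, left, n):
    """Cut an n x n block; the caller keeps the window inside the frame."""
    return [row[left:left + n] for row in frame[top:top + n]]

def elimination_search(cur_block, ref_frame, top, left, search_range):
    """Return (best_mv, c_max, skipped); skipped counts early rejections."""
    n = len(cur_block)
    pc = l2_norm_pyramid(cur_block)
    last = len(pc) - 1
    best_mv, c_max, skipped = None, -1.0, 0
    for u in range(-search_range, search_range + 1):      # raster scan
        for v in range(-search_range, search_range + 1):
            pr = l2_norm_pyramid(crop(ref_frame, top + u, left + v, n))
            for level in range(last + 1):
                bound = level_bound(pc, pr, level)
                if bound < c_max:          # cannot beat the current maximum
                    if level < last:
                        skipped += 1
                    break
            else:                          # survived all levels: exact NCC
                if bound > c_max:
                    best_mv, c_max = (u, v), bound
    return best_mv, c_max, skipped

# Synthetic data: the current block is copied from offset (2, -1) of the
# reference frame, so that offset is the true motion vector.
rng = random.Random(1)
ref_frame = [[rng.randrange(1, 256) for _ in range(32)] for _ in range(32)]
top, left, n = 12, 12, 8
cur_block = crop(ref_frame, top + 2, left - 1, n)
best_mv, c_max, skipped = elimination_search(cur_block, ref_frame, top, left, 4)
print(best_mv, round(c_max, 6), skipped)
```

Because every level gives an upper bound on the exact NCC, a candidate rejected early can never be the maximizer, so the search returns the same maximum as an exhaustive evaluation. In this sketch, candidates visited after the true offset has been found are rejected at a coarse level, which is the effect that a good initial MV and a spiral scan are meant to bring forward.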