B(l) = Σ_{(i,j)∈N} g_{i,j}(l_i, l_j)    (4)
Now, the maximization problem can be computed as a minimization of the energy E. The first term A(l, o) in equation (2) is known as the regional term; it assigns the individual penalties for labeling pixel i as object or background. This term is calculated from the hard constraints of the user's interaction: if the RGBD data of a pixel in the image is close to the probability model constructed from the user's interaction, the penalties for seeing this pixel as object or background are small; otherwise, the penalties are large. The term B(l) comprises the boundary properties of the segmentation l and is interpreted as a penalty for a discontinuity between pixels i and j. This penalty is large if the RGBD observations of two pixels in the MRF neighborhood system are vastly different from each other, and small if the difference between the two neighboring pixels is not obvious. The coefficient λ specifies the relative importance of the regional term A(l, o) versus the boundary term B(l).
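As a minimal illustration of how these terms combine, the energy of equation (2) can be sketched as below. The arrays `regional` and `pairs` are hypothetical stand-ins for the per-pixel penalties A(l, o) and the neighbor penalties inside B(l), and `lam` stands for the weighting coefficient λ; this is a sketch of the energy's structure, not the paper's implementation.

```python
# Sketch of the energy E(l) = lam * A(l, o) + B(l) for a candidate labeling.
# `regional[i][label]` is the penalty of giving pixel i that label (term A);
# `pairs` lists (i, j, penalty) for neighboring pixels (term B).
# All names here are illustrative, not from the paper.

def energy(labels, regional, pairs, lam):
    # Regional term: sum of the individual penalties for the chosen labels.
    a = sum(regional[i][l] for i, l in enumerate(labels))
    # Boundary term: a neighbor pair contributes only when its labels differ.
    b = sum(p for i, j, p in pairs if labels[i] != labels[j])
    return lam * a + b

# Tiny usage example with 3 pixels and two neighbor links.
labels = [0, 0, 1]
regional = [[0.1, 2.0], [0.2, 1.5], [1.8, 0.3]]
pairs = [(0, 1, 0.9), (1, 2, 0.4)]
print(energy(labels, regional, pairs, lam=1.0))  # ≈ 1.0
```

Minimizing this energy over all labelings is what the min-cut computation performs globally.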
The likelihood term A(l, o) measures the similarity between the foreground/background probability models and the observed data in the image. Through the form f_i(o_i | l_i) we can see that, given the label l_i a value of 0 or 1, it expresses how well the observation o_i fits the corresponding foreground or background model. The main problem in this term is how to construct the foreground/background probability models. We use the method in which the user drags a rectangle covering the object in the given image, which reduces user interaction drastically. This approach is used in [2], but with no depth information added to the model: they use two GMMs to estimate the foreground/background probability models in the RGB color field. Their formulation is:
f(o_i | f) = Σ_{j=1}^{K} π_j n(c_i; μ_j, Σ_j)    (5)

f(o_i | b) = Σ_{j=1}^{K} π_j n(c_i; μ_j, Σ_j)    (6)
The subscripts f and b in the two equations denote the foreground model and the background model respectively. n(c_i; μ_j, Σ_j) is a normal distribution, also called a single Gaussian model, and K is the number of single Gaussian models. π_j is a coefficient that represents the proportion of the specific single Gaussian model in the GMMs (Gaussian Mixture Models). In our paper, we add depth information to equations (5) and (6): a single Gaussian model is added to the foreground model and a uniform distribution to the background model. The reason is that the object is always close together in the depth channel, while the background is complex enough that a uniform distribution fits it. The form is as follows:
p(o_i | f) = [Σ_{j=1}^{K} π_j n(c_i; μ_j, Σ_j)] · n(d_i; μ_d, σ_d)    (7)

p(o_i | b) = (1/T) Σ_{j=1}^{K} π_j n(c_i; μ_j, Σ_j)    (8)
The meaning is straightforward: c_i is the RGB color data observed in the image, and d_i is the depth information sensed from the Kinect camera. T is the number of pixels in the image, as mentioned before.
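Equations (7) and (8) can be sketched as follows. For brevity the sketch uses a one-dimensional color value with scalar Gaussian parameters where the paper's n(c_i; μ_j, Σ_j) is multivariate over RGB; the function and parameter names are illustrative assumptions.

```python
import math

def gauss(x, mu, sigma):
    # 1-D normal density; the paper uses a multivariate normal over RGB,
    # a single channel is used here only to keep the sketch short.
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def p_foreground(color, depth, weights, mus, sigmas, mu_d, sigma_d):
    # Equation (7): K-component color GMM times a single depth Gaussian.
    gmm = sum(w * gauss(color, m, s) for w, m, s in zip(weights, mus, sigmas))
    return gmm * gauss(depth, mu_d, sigma_d)

def p_background(color, T, weights, mus, sigmas):
    # Equation (8): color GMM times a uniform depth distribution 1/T.
    gmm = sum(w * gauss(color, m, s) for w, m, s in zip(weights, mus, sigmas))
    return gmm / T
```

A pixel whose color fits both models but whose depth sits close to the object's depth Gaussian receives a much higher foreground likelihood, which is exactly the effect the added depth channel is meant to provide.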
More specifically, we use the GMMs in the RGBD field to set the regional penalties A(l, o) as negative log-likelihoods. The ultimate formulation is:

f_i(o_i | l_i = 0) = −ln p(o_i | f)    (9)

f_i(o_i | l_i = 1) = −ln p(o_i | b)    (10)
As for the term B(l), we use the MRF neighborhood system, here the four-neighborhood system: whether a pixel's label is 0 or 1 depends only on the four pixels adjacent to it. The term has a value only when two neighboring pixels carry different labels, and the penalty is calculated as follows:

p(o_i, o_j) = exp(−‖o_i − o_j‖² / 2σ²)    (11)
o_i and o_j are neighbors in the MRF neighborhood system. The function penalizes a lot when the two neighboring pixels have similar intensities, i.e. when ‖o_i − o_j‖ < σ. On the other hand, if the pixels are very different, ‖o_i − o_j‖ > σ, the penalty is small. Intuitively, this function corresponds to the distribution of noise among neighboring pixels of an image. We can further refine the equation into the form:
g_{i,j}(l_i, l_j) = |l_i − l_j| · p(o_i, o_j)    (12)
This form means that only links crossing the contour are penalized. It defines the soft constraints used to compute the global optimum of the boundary.
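Equations (11) and (12) can be sketched directly; scalar intensities stand in for the RGBD observation vectors, and the function names are illustrative.

```python
import math

def boundary_penalty(o_i, o_j, sigma):
    # Equation (11): close to 1 when neighboring observations are similar
    # (|o_i - o_j| < sigma), near 0 when they differ by much more than sigma.
    return math.exp(-(o_i - o_j) ** 2 / (2 * sigma ** 2))

def g(l_i, l_j, o_i, o_j, sigma):
    # Equation (12): only links that cross a label discontinuity pay the
    # penalty; equal labels give |l_i - l_j| = 0 and hence a zero cost.
    return abs(l_i - l_j) * boundary_penalty(o_i, o_j, sigma)
```

The effect is that cutting between two similar pixels is expensive, so the minimum cut prefers to place the object boundary where the observation data already has a strong edge.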
III. ALGORITHM
In this section, we introduce our algorithm of two-layer iterative graph cut on RGBD images. The core of our algorithm is to use the depth information to distinguish the situations in which the probability models of foreground and background resemble each other. With the additional depth channel sensed from the Kinect camera, we can easily extract the object from the background. We also find that the graph cut algorithm is somewhat time-consuming when computing the minimum energy, so we use an image pyramid and apply the new fast max-flow algorithm at the upper layer of the pyramid, achieving comparable quality in less time. We also tried more layers of the image pyramid, but unfortunately the result is not as good as with the two-layer one.
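The two-layer pyramid keeps one pixel from every 2×2 block, so the top layer carries a quarter of the pixels. A minimal stride-2 subsampling sketch (illustrative, not the paper's implementation):

```python
def top_layer(image):
    # Keep one pixel from every 2x2 block: o_{p,q} = o_{2p,2q},
    # leaving the coarse layer with a quarter of the pixels.
    return [row[::2] for row in image[::2]]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(top_layer(img))  # [[1, 3], [9, 11]]
```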
To start our algorithm, the user first drags a rectangle covering the object in the image. We then use k-means to estimate the GMMs of the foreground/background models, with an additional single Gaussian model and a uniform distribution for the depth channel. After that, we calculate the two terms of the energy function. This process differs from the formulation given before because we use an image pyramid strategy: we extract one pixel from every four neighboring pixels in the image, so the number of pixels handled by the max-flow algorithm decreases by three quarters. The two energy terms are computed on this scaled image, and the graph for the max-flow algorithm is constructed on it as well. The optimal boundary is obtained from the min-cut computed by the max-flow algorithm; this boundary is then projected to the bottom layer of the image pyramid, and a new iteration begins. The key steps of our algorithm are listed as follows:
1. Given an image observation o_{i,j} (i ∈ {1, …, I}, j ∈ {1, …, J}, I · J = T), the number of iterations IterNum, the foreground/background observation stacks and the parameter λ.
2. Read the color and depth data from the Kinect camera and set IterNum = 0 and the number of GMM components K = 5. Extract the top-layer image o_{p,q} (p = [i/2], q = [j/2]).
3. If IterNum = 0, use the user's rectangle to initialize the stacks: pixels inside the rectangle are put into the foreground stack and the others into the background stack. If IterNum ≠ 0, pixels of the bottom-layer image that belong to the foreground are put into the foreground stack and the others into the background stack. Set IterNum = IterNum + 1.
4. Use the k-means algorithm [15] to estimate the GMMs of the foreground/background models on the bottom image layer. Compute the values of the links with the formulas: −ln[ Σ_{j=1}^{K} π_j n(c_{p,q}; μ_j, Σ_j) · n(d_{p,q}; μ_d, σ_d) ] for the link to the object terminal, −ln[ (1/T) Σ_{j=1}^{K} π_j n(c_{p,q}; μ_j, Σ_j) ] for the link to the background terminal, and exp(−‖o_{p,q} − o_{p₁,q₁}‖² / 2σ²) for the neighbor link, where (p₁, q₁) is a coordinate neighboring (p, q).
5. Use the new fast max-flow algorithm [7] to find the optimum boundary in the top-layer image and project it to the bottom layer.
6. If the result satisfies the user, end; otherwise go to step 3.
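The loop structure of the steps above can be sketched as a runnable toy on a one-dimensional "image". This is a loose structural sketch under heavy assumptions: the global min-cut of [7] is replaced by a per-pixel likelihood comparison, and `fit` stands in for the k-means GMM estimation, purely to keep the example self-contained.

```python
import math

def gauss(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def fit(values):
    # Stand-in for k-means GMM estimation: a single Gaussian per model.
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values) or 1e-6
    return mu, math.sqrt(var)

def iterate(pixels, labels, rounds=3):
    # `labels` is the initial labeling (1 inside the user's rectangle).
    for _ in range(rounds):
        fg = [p for p, l in zip(pixels, labels) if l == 1] or pixels
        bg = [p for p, l in zip(pixels, labels) if l == 0] or pixels
        (mf, sf), (mb, sb) = fit(fg), fit(bg)      # step 4: model estimation
        coarse = pixels[::2]                       # top pyramid layer
        coarse_labels = [1 if gauss(p, mf, sf) > gauss(p, mb, sb) else 0
                         for p in coarse]          # stand-in for the min-cut step
        # Step 5: project the coarse labels back to the bottom layer
        # (nearest-neighbor projection, a deliberate simplification).
        labels = [coarse_labels[min(i // 2, len(coarse_labels) - 1)]
                  for i in range(len(pixels))]
    return labels

pixels = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
print(iterate(pixels, [0, 0, 0, 1, 1, 1]))  # [0, 0, 0, 0, 1, 1]
```

The crude nearest-neighbor projection mislabels one boundary pixel here, which illustrates why the method re-estimates the models and iterates rather than accepting the first coarse cut.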
IV. EXPERIMENT RESULT
In this section, we apply our method to several images and compare it with other state-of-the-art methods. We perform our evaluation in two steps. First, we use the two-layer iterative graph cut on a standard image of a starfish.
Figure 1. Our two-layer iterative graph cut method on the standard starfish image: (a) first iteration, (b) second iteration, (c) third iteration.

Figure 2. Ordinary iterative graph cut method on the standard starfish image: (a) first iteration, (b) second iteration, (c) third iteration.
The upper part of Figure 1 is the segmentation result obtained on the top layer of the image; the blue line in the lower part presents the contour obtained at the bottom layer. From the results we can see that the segmentations produced by our two-layer iterative graph cut are almost the same as the ordinary method's. The time used in each iteration is listed in Table 1:
Table 1: Time (ms) used to get the results in Figure 1 and Figure 2

                                  iteration 1   iteration 2   iteration 3
  two-layer iterative graph cut      3235          1813          1563
  ordinary iterative graph cut       4422          2203          1859
From Table 1 we see that this part of our method reduces the time while preserving the quality of the segmentation.
Next, we add the depth information to the graph cut method; both energy terms must be changed to accommodate the depth channel. We compare the results using only color information with those of our method using both color and depth. In particular, our method outperforms the ordinary iterative graph cut method when the object in the image resembles the background in RGB color space. Our results are as follows:
Figure 3. The test RGB image containing a red book and a red can of coke, with the corresponding depth image: (a) RGB image, (b) depth image. The depth information is presented with the blue (higher 8 bits) and green (lower 8 bits) colors.

Figure 4. The segmentation results of the ordinary iterative graph cut method without the depth information: (a) first iteration, (b) second iteration, (c) third iteration. It fails to separate the red can from the book behind it.

Figure 5. The segmentation results of our method with the additional depth information sensed from the Kinect camera: (a) first iteration, (b) second iteration, (c) third iteration.
With the depth information added to the two terms of the energy function, we can easily extract the object from the background even when the foreground and the background are similar in color space.
V. CONCLUSION
In this paper, we extend the iterative graph cut method in two ways. First, we use a strategy that estimates the statistical models on the bottom layer of the image pyramid and computes the min-cut on the top layer; through this method, we can greatly reduce the running time while achieving the required quality. Second, we add depth information to the graph cut method to handle the situation in which the object and the background have similar color distributions; the energy terms are revised to take the additional depth information of the image into account. Experimental results show the efficiency of our method.
ACKNOWLEDGMENT
This work is supported by the Twelfth Five-Year Hi-Tech project of the Ministry of Science and Technology, the discipline project of Ningbo University (xkl09154), the Natural Science Foundation of Zhejiang (D1080807), and the Scientific Research Foundation of Ningbo University (G11JA017).
REFERENCES
[1] Y. Boykov and V. Kolmogorov. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In ECCV, 2004.
[2] C. Rother, V. Kolmogorov, and A. Blake. GrabCut: interactive foreground extraction using iterated graph cuts. SIGGRAPH, August 2004.
[3] D. Greig, B. Porteous, and A. Seheult. Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society Series B, 51(2):271-279, 1989.
[4] D.M. Greig, B.T. Porteous, and A.H. Seheult. Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society Series B, 51, 271-279, 1989.
[5] D. Geman and S. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell., 6, 721-741, 1984.
[6] J.E. Besag. On the statistical analysis of dirty pictures (with discussion). Journal of the Royal Statistical Society Series B, 48, 259-302, 1986.
[7] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. In 3rd Intnl. Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR). Springer-Verlag, September 2001.
[8] R.M. Haralick and L.G. Shapiro. Computer and Robot Vision. Addison-Wesley Publishing Company, 1992.
[9] A. Goldberg and R. Tarjan. A new approach to the maximum flow problem. Journal of the Association for Computing Machinery, 35(4):921-940, October 1988.
[10] S. Vicente, V. Kolmogorov, and C. Rother. Graph cut based image segmentation with connectivity priors. In CVPR, 2008.
[11] S. Vicente, V. Kolmogorov, and C. Rother. Joint optimization of segmentation and appearance models. In ICCV, 2009.
[12] O.J. Woodford, C. Rother, and V. Kolmogorov. A global perspective on MAP inference for low-level vision. Microsoft Research Technical Report, 2009.
[13] M. Ruzon and C. Tomasi. Alpha estimation in natural images. In Proc. IEEE Conf. Comp. Vision and Pattern Recog., 2000.
[14] Y.-Y. Chuang, B. Curless, D. Salesin, and R. Szeliski. A Bayesian approach to digital matting. In Proc. IEEE Conf. Computer Vision and Pattern Recog., 2001.
[15] M. Inaba, N. Katoh, and H. Imai. Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering. Proceedings of 10th ACM Symposium on Computational Geometry, pp. 332-339, 1994.