
Frontal-View Face Detection and Facial Feature Extraction using Color, Shape and Symmetry Based Cost Functions

Eli Saber† and A. Murat Tekalp
Department of Electrical Engineering and Center for Electronic Imaging Systems, University of Rochester, Rochester, NY 14627-0126
Phone: (716) 275-3774 Fax: (716) 473-0486 E-mail: {saber,tekalp}@ee.rochester.edu
† Xerox Corporation, 800 Phillips Road, Building 200-01A, Webster, New York 14580
Phone: (716) 265-5918 Fax: (716) 422-9395

Abstract

We describe an algorithm for detecting human faces and facial features, such as the location of the eyes, nose, and mouth. First, a supervised pixel-based color classifier is employed to mark all pixels that are within a prespecified distance of "skin color," which is computed from a training set of skin patches. This color-classification map is then smoothed by Gibbs random field model-based filters to define skin regions. An ellipse model is fit to each disjoint skin region. Finally, we introduce symmetry-based cost functions to search for the center of the eyes, the tip of the nose, and the center of the mouth within ellipses whose aspect ratio is similar to that of a face.

Keywords: face detection, facial feature detection, Gibbs random fields, image segmentation, shape classification

1 Introduction
Automatic detection and recognition of faces from still images and video is an active research area. A complete facial image analysis system should be able to localize faces in a given image, identify and pinpoint facial features, describe facial expressions, and recognize people. Most facial expression analysis and face recognition systems work with the assumption that the location of the face within a frame is known. This assumption is suitable for scenes with a uniform background; however, in images with complex backgrounds, faces must be localized before any recognition can be performed. This paper proposes a system that detects human faces based on color and shape information, and then locates the eyes, nose, and mouth through symmetry-based cost functions. Recent surveys summarizing the progress in face detection and recognition can be found in [1, 2, 3]. Due to space constraints, we review only a few representative examples of these systems.

In [4], Craw et al. extract the head area from mug-shot images by placing constraints on the location of the head in a multiresolution scheme, where resolutions from as coarse as 8x8 up to full scale (fine resolution) are examined. The facial features such as lips, eyebrows, and eyes are then extracted by line/edge following and energy minimization. Turk et al. [5] proposed a Karhunen-Loeve (KL) based approach for locating faces by using "eigenfaces" obtained from a set of training images. Use of the KL transform for face representation has also been proposed in [6, 7]. However, the localization performance of the KL-based approaches degrades quickly with changes in scale and orientation. Deformable template models have been used in [8, 9] to locate the eyes and mouth. In addition, active contour models were employed in [10] to capture the eyebrows, nostrils, and face. These techniques rely heavily on "near" perfect segmentations or edge detection. Furthermore, the extracted contours are highly dependent on the initialization of the snake or active contour model, and on the parameters involved in defining these models. In [25], the authors discuss a face localization and feature detection system which employs morphological filtering and blob coloring to generate hypotheses about eye locations, followed by the use of deformable templates and the Hough transform to confirm these hypotheses. In [11], Yang et al. proposed a hierarchical three-level knowledge-based system for locating human faces and facial features in relatively complex backgrounds. Chang et al. [12] proposed a color segmentation and thresholding based algorithm to pinpoint the eyes, nostrils, and mouth in color "head and shoulder" images. The skin segmentation is performed on a pixel-by-pixel basis, where a pixel is classified as skin if its chromaticity falls within a pre-specified region of the chromaticity space. The eyes, nostrils, and mouth are located within pre-determined bounding boxes within the skin mask by thresholding the low-intensity pixels, the normalized red component, and the normalized red (R) + blue (B) - 2 green (G) component, respectively. Sung et al. [13] described an approach for face detection by utilizing 19x19 window patterns and exhaustively searching a given image at all possible scales. The pattern classification is performed by employing a trained neural network. Colmenarez et al. [14] introduced an algorithm for detecting facial patterns based on symmetry measurements. This symmetry measurement is utilized to locate the center line of faces by means of correlation. The algorithm performs a search over all possible sizes and locations of the head. Chen et al. [15] proposed an algorithm for extracting human faces from color images. The algorithm determines the location of the face by first extracting the skin color regions, and then matching these regions, using a fuzzy pattern matching algorithm, to face models at multiple scales. Yow et al. [26] described a probabilistic face detection algorithm where feature points are located by spatial filters.

The contribution of this paper is to describe an algorithm for detecting human faces using color and shape information, and then localizing the eyes, nose, and mouth through the use of symmetry-based cost functions. The algorithm is divided into three steps: 1) supervised skin/non-skin color classification, 2) shape classification, and 3) eye, nose, and mouth localization. The skin/non-skin classification is performed by utilizing the chrominance channels of the YES color space [16], followed by Gibbs random field (GRF) model-based smoothing in order to yield contiguous regions. The shape classification is achieved by employing the eigenvalues and eigenvectors computed from the spatial covariance matrix to fit an ellipse to the skin region under analysis. The Hausdorff distance is employed as a means of comparison, yielding a measure of proximity between the shape of the region and the ellipse model. Finally, the eye centers are localized by utilizing cost functions designed to take advantage of the inherent symmetries associated with face and eye locations. Subsequently, the tip of the nose and the center of the mouth are located by utilizing the distance between the eye centers. A flowchart of the overall algorithm is depicted in Figure 1. The output of the proposed algorithm is a segmentation map indicating the location of the face, the eye centers, the nose tip, and the mouth center.

2 Proposed Segmentation/Feature Extraction Algorithm


This section provides a detailed explanation of the steps of the algorithm.

2.1 Color space


A common representation of color images is by red (R), green (G), and blue (B) tristimulus values. Because the RGB space is sensitive to intensity variations, many linear and nonlinear color spaces have been proposed to achieve better color constancy and, hence, more robust color image processing [17, 18]. Examples of these spaces include the YCrCb "luminance-chrominance" space and the I1-I2-I3 space that were proposed for color segmentation [19]. However, it is generally agreed that there does not exist a single space which is suitable for all color images [20]. In this paper, we use the so-called YES space, where Y represents the luminance channel and E and S denote the chrominance components [18]. The YES space is defined by a linear transformation of the SMPTE (Society of Motion Picture and Television Engineers)

RGB coordinates, given by


\[
\begin{bmatrix} Y \\ E \\ S \end{bmatrix} =
\begin{bmatrix}
0.253 & 0.684 & 0.063 \\
0.500 & -0.500 & 0.000 \\
0.250 & 0.250 & -0.500
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\qquad (1)
\]

The YES space was chosen because: (1) it reduces variations in chrominance due to changes in luminance, (2) it is computationally efficient (the E and S channels can be computed from R, G, and B by shifting bits rather than multiplication), and (3) it is free of singularities (nonlinear spaces may have singularities). However, the proposed algorithm is applicable in any color space.
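As a concrete illustration of Eq. (1), the short Python/NumPy sketch below (not part of the original paper; it assumes the RGB values are already in SMPTE coordinates and stored as floats) applies the linear transform to every pixel of an image.

```python
import numpy as np

# Linear SMPTE RGB -> YES transform of Eq. (1).
RGB_TO_YES = np.array([
    [0.253,  0.684,  0.063],   # Y: luminance
    [0.500, -0.500,  0.000],   # E: chrominance
    [0.250,  0.250, -0.500],   # S: chrominance
])

def rgb_to_yes(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 SMPTE RGB image (floats) to the YES space."""
    return rgb @ RGB_TO_YES.T
```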

2.2 Supervised Skin/Non-Skin Color Classification


The supervised pixel-based color classification module implements the algorithm presented in [16]. Let $w_{ij} = [E_{ij}\; S_{ij}]^T$ denote a vector composed of the chrominance components for a pixel at the site $(i, j)$. The class-conditional pdf of $w_{ij}$ belonging to the skin class $x_{ij}$ is modeled by a two-dimensional Gaussian,

\[
p(w_{ij} \mid x_{ij}) = (2\pi)^{-n/2} \, |K_{x_{ij}}|^{-1/2} \exp\!\left( -\tfrac{1}{2} [w_{ij} - \mu_{x_{ij}}]^T \, [K_{x_{ij}}]^{-1} \, [w_{ij} - \mu_{x_{ij}}] \right) \qquad (2)
\]

where the mean vector $\mu_{x_{ij}}$ and the covariance matrix $K_{x_{ij}}$ are estimated from an appropriate training set. This model is based on the assumption that the chrominance vector at the pixel $(i, j)$, which belongs to region $x_{ij}$, can be represented by the mean chrominance vector of class $x_{ij}$ plus a zero-mean Gaussian residual. Note that the contour of the pdf shown above defines an ellipse in the ES domain, whose center and principal axes are determined by $\mu_{x_{ij}}$ and $K_{x_{ij}}$, respectively. A binary hypothesis test with an image-adaptive threshold is then employed to decide whether each pixel in a given image belongs to the skin class or not, where the threshold serves to define the radius of the ellipse utilized for classification. The thresholds can be estimated either at run time from user-specified confidence bounds, or pre-computed by using receiver operating characteristic (ROC) analysis on a set of training images. The ROC analysis, quantifying optimum "true-positive" vs. "false-positive" performance on the training set, yields a universal classification threshold. The combined use of the universal threshold and the classification histogram serves to adapt the threshold to the image at hand, as described in detail in [16]. The main advantage of this approach is its computational efficiency. The results indicate that color can serve as a powerful initial classifier for locating skin regions.
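For concreteness, the binary hypothesis test described above amounts to thresholding the squared Mahalanobis distance of each chrominance vector from the trained skin mean. The Python sketch below is a hypothetical rendering of that test, assuming the mean vector, covariance matrix, and threshold have already been estimated from training patches as in [16]; it is not the paper's implementation.

```python
import numpy as np

def skin_mask(es_image: np.ndarray, mean: np.ndarray,
              cov: np.ndarray, threshold: float) -> np.ndarray:
    """Label each pixel as skin (True) if its chrominance vector falls
    inside the decision ellipse defined by the Gaussian model of Eq. (2).

    es_image : H x W x 2 array of (E, S) chrominance values
    mean     : length-2 trained skin mean vector
    cov      : 2 x 2 trained skin covariance matrix
    threshold: squared Mahalanobis radius of the decision ellipse
    """
    diff = es_image - mean                       # per-pixel residuals
    inv_cov = np.linalg.inv(cov)
    # Squared Mahalanobis distance per pixel: d^T K^{-1} d
    maha = np.einsum('...i,ij,...j->...', diff, inv_cov, diff)
    return maha <= threshold
```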

2.3 Region Generation by GRF Filtering


Smoothing of the pixel-based color classification map by GRF filtering provides contiguous label clusters. The GRF filtering is accomplished by maximizing the a posteriori probability of the segmentation labels $x$ given the observed chrominance data $w$, where $w_{ij} = [E_{ij}\; S_{ij}]^T$ denotes the chrominance vector of a pixel at location $(i, j)$. According to the Bayes theorem, the a posteriori probability density function (pdf) is given by

\[
p(x \mid w) = \frac{p(w \mid x)\, p(x)}{p(w)} \propto p(w \mid x)\, p(x) \qquad (3)
\]

where $p(w \mid x)$ is the conditional pdf of the observed chrominance vectors of an image given the region labels, and $p(x)$ is the a priori pdf of the region process. Note that since $p(w)$ does not depend on $x$, it does not affect the maximization process and can thereby be ignored. The region process is modeled by a Gibbs distribution (GD) given by

\[
p(x) = \frac{1}{Z} \exp\!\left\{ -\sum_{c \in C} V_c(x) \right\} \qquad (4)
\]

where $Z$, the partition function, is simply a normalizing constant, and $C$ denotes the collection of all cliques. In our model, we ignore singleton clique potentials, assuming that both classes are equally likely. The two-pixel clique potential, which is used to impose a spatial smoothness constraint on the segmentation, is defined as

\[
V_c(x) =
\begin{cases}
-\beta & \text{if } x_{ij} = x_{kl} \text{ and } (i, j), (k, l) \in c \\
+\beta & \text{if } x_{ij} \neq x_{kl} \text{ and } (i, j), (k, l) \in c
\end{cases} \qquad (5)
\]

The parameter $\beta$ is taken as a positive quantity, indicating that two neighboring pixels are more likely to belong to the same class than to different classes. The larger the value of $\beta$, the stronger the smoothness constraint. With a specified value of $\beta$, the GD of Eq. (4) can be computed for any given segmentation map $x$. For a more detailed discussion of Gibbs random fields, the reader is referred to [21, 22, 23]. Assuming conditional independence among the pixels in the image, $p(w \mid x)$ can be computed as the product of the conditional pdfs shown in Eq. (2). Hence, the a posteriori pdf (3) becomes

\[
p(x \mid w) \propto \exp\!\left\{ -\sum_{(i,j)} \tfrac{1}{2} [w_{ij} - \mu_{x_{ij}}]^T \, [K_{x_{ij}}]^{-1} \, [w_{ij} - \mu_{x_{ij}}] - \sum_{c \in C} V_c(x) \right\} \qquad (6)
\]

Eq. (6) has two components. The first constrains the region chrominance to be close to the data (consistency term), and the second imposes spatial continuity. The MAP pixel classifications can then be obtained by maximizing Eq. (6) iteratively.

The GRF algorithm employed to maximize Eq. (6) is shown in Figure 2. In summary, an iteration is composed of two steps. Starting with an initial segmentation obtained by utilizing the approach described in Section 2.2, the first step calculates the mean vector and covariance matrix for all pixels belonging to a particular class within a window, whose size is initially taken to be equal to the size of the image itself. In order to reduce the computational burden, the mean vector and covariance matrix are computed on a grid of points. The spacing between the grid points is chosen equal to half the window size, resulting in a 50% overlap in each direction. The remaining values are obtained by utilizing bilinear interpolation. In the second step, the segmentation labels are updated by utilizing the iterated conditional modes (ICM) algorithm. This procedure often converges to a local minimum of the Gibbs potential within a relatively small number of cycles. Upon convergence, the size of the window employed in the computation of the means is then reduced by a factor of two in both the horizontal and vertical directions, and the procedure is repeated until the window size is smaller than a specified threshold. As a result, the algorithm starts with global estimates and slowly adapts to the local characteristics of each region.
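To make the structure of this loop explicit, the following hypothetical Python sketch shows a deliberately simplified version of the ICM update: it keeps a single global mean and covariance per class instead of the paper's windowed, bilinearly interpolated estimates, and omits the window-shrinking outer loop.

```python
import numpy as np

def icm_smooth(es_image, labels, beta=1.0, max_cycles=10):
    """Simplified ICM smoothing of a binary skin/non-skin label map.

    es_image : H x W x 2 chrominance data (E, S)
    labels   : H x W boolean array, the initial pixel-based classification
    """
    H, W, _ = es_image.shape
    for _ in range(max_cycles):
        # Step 1: re-estimate class statistics from the current labels.
        stats = {}
        for cls in (True, False):
            pix = es_image[labels == cls].reshape(-1, 2)
            stats[cls] = (pix.mean(axis=0), np.linalg.inv(np.cov(pix.T)))
        # Step 2: ICM update -- assign each pixel the label that minimizes
        # the data term of Eq. (2) plus the clique potentials of Eq. (5).
        changed = False
        for i in range(H):
            for j in range(W):
                costs = {}
                for cls in (True, False):
                    mean, inv_cov = stats[cls]
                    d = es_image[i, j] - mean
                    data_cost = 0.5 * d @ inv_cov @ d
                    clique = 0.0
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < H and 0 <= nj < W:
                            clique += -beta if labels[ni, nj] == cls else beta
                    costs[cls] = data_cost + clique
                best = min(costs, key=costs.get)
                if best != labels[i, j]:
                    labels[i, j] = best
                    changed = True
        if not changed:       # converged to a local optimum
            break
    return labels
```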

2.4 Shape Classification Model


This step eliminates those regions that are "dissimilar" to a face template, whose shape is modeled by an ellipse. Let $m_1$ and $m_2$ denote the horizontal and vertical axes, respectively. Using the spatial coordinates of the skin-identified pixels within a contiguous region, we form the spatial covariance matrix

\[
R = \begin{bmatrix} \sigma^2_{m_1} & \sigma_{m_1 m_2} \\ \sigma_{m_1 m_2} & \sigma^2_{m_2} \end{bmatrix}
\]

where $\sigma^2_{m_1}$ and $\sigma^2_{m_2}$ represent the sums of squared distances of each pixel from the centroid along the two axes, and $\sigma_{m_1 m_2}$ represents the cross-variance. The eigenvalues of $R$, $(\lambda_1, \lambda_2)$, provide a reasonable estimate of the spread of the skin region in the directions of the corresponding eigenvectors $(v_1, v_2)$. The directions of the eigenvectors indicate the principal axes of the skin-classified region, as shown in Figure 3. The resulting eigenvectors are utilized as the major and minor axes of the ellipse shape model

\[
\frac{m_1'^2}{\lambda_1} + \frac{m_2'^2}{\lambda_2} = c \qquad (7)
\]

where $m_1'$ and $m_2'$ denote the principal-axis coordinates, and $c$ represents the "radius" of the ellipse. Its center is taken as the centroid of the skin region, as shown in Figure 3. We utilize the Hausdorff distance as a means of comparison between the skin region shape and the ellipse model for various values of $c$ in the interval $[c_{min}, c_{max}]$, where $c_{min}$ is set to 1 and $c_{max}$ is taken as twice the value of $c$ at which all the pixel coordinates of the skin region are enclosed within the ellipse border, as shown in Figure 3. Given two finite point sets $S_1 = \{e_0, e_1, \ldots, e_p\}$ and $S_2 = \{f_0, f_1, \ldots, f_q\}$, the Hausdorff distance is defined as:

\[
H(S_1, S_2) = \max\big( h(S_1, S_2),\; h(S_2, S_1) \big) \qquad (8)
\]

where

\[
h(S_1, S_2) = \max_{e \in S_1} \min_{f \in S_2} \| e - f \| \qquad (9)
\]

\[
h(S_2, S_1) = \max_{f \in S_2} \min_{e \in S_1} \| f - e \| \qquad (10)
\]

and $\|\cdot\|$ is the Euclidean norm. The Hausdorff distance is employed because: 1) it follows from Eq. (8) that if the computed Hausdorff distance is $d$, then every point in $S_1$ must be within a distance $d$ of some point in $S_2$ and vice versa, 2) no explicit pairing of points between $S_1$ and $S_2$ is required, 3) the number of points in $S_1$ does not have to be equal to that of $S_2$, as would be the case if one were to utilize a mean square error measurement, and 4) it is easy to compute. Consequently, the value of $c$ that minimizes the Hausdorff distance is chosen as the optimum value. The corresponding distance value is utilized as a measure of similarity between the shape of the skin region and the ellipse model, rejecting those regions that result in a measure greater than a pre-specified threshold.
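As an illustration of the fitting criterion, the hypothetical Python sketch below computes the symmetric Hausdorff distance of Eqs. (8)-(10) between a region's boundary points and points sampled on the candidate ellipse of Eq. (7), and scans the candidate values of c for the smallest distance. The function names and the number of sampled ellipse points are illustrative choices, not part of the paper.

```python
import numpy as np

def hausdorff(S1: np.ndarray, S2: np.ndarray) -> float:
    """Symmetric Hausdorff distance between two point sets (N x 2, M x 2)."""
    # Pairwise Euclidean distances between every point of S1 and S2.
    d = np.linalg.norm(S1[:, None, :] - S2[None, :, :], axis=-1)
    h12 = d.min(axis=1).max()   # h(S1, S2): worst-case nearest neighbor
    h21 = d.min(axis=0).max()   # h(S2, S1)
    return max(h12, h21)

def best_ellipse_radius(boundary, centroid, eigvecs, eigvals,
                        c_values, n_samples=180):
    """Scan candidate 'radii' c and return (best_c, best_distance).

    boundary: K x 2 boundary coordinates of the skin region;
    eigvecs, eigvals: from the region's spatial covariance matrix R.
    """
    t = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    best = (None, np.inf)
    for c in c_values:
        # Points on the ellipse m1'^2/l1 + m2'^2/l2 = c in image coordinates.
        ellipse = (centroid
                   + np.sqrt(c * eigvals[0]) * np.outer(np.cos(t), eigvecs[:, 0])
                   + np.sqrt(c * eigvals[1]) * np.outer(np.sin(t), eigvecs[:, 1]))
        dist = hausdorff(boundary, ellipse)
        if dist < best[1]:
            best = (c, dist)
    return best
```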

2.5 Symmetry-based Cost Functions for Eye, Nose, and Mouth Localization
Once the facial pattern has been detected and the major and minor axes identified, we introduce symmetry-based cost functions to locate the eyes, nose, and mouth within the facial segmentation mask. Those regions within the skin map that are classified as non-skin colored are defined as "holes"; e.g., the eyes are not skin colored. Examples of "holes" can be seen in the "skin-segmentation" maps shown in the results section. Our goal is to: 1) find the two "holes" within the face mask that are most likely the eyes, and 2) locate the nose and the mouth once the eyes have been identified. The proposed cost functions take advantage of the inherent symmetries associated with facial patterns and features. The centroids of the detected "holes" are thereby taken as the locations of the eyes. Consequently, the distance between the "eye" centers is employed to locate the tip of the nose and the center of the mouth. The details are as follows. The cost functions are designed to take advantage of the following facts about an upright "mug-shot" facial pattern:

1. Eyes are located on a line which is parallel to the minor axis.

2. Eyes are symmetric with respect to the major axis, represented by the direction of the eigenvector corresponding to the larger eigenvalue.

3. Eyes are equidistant from the minor axis, represented by the direction of the eigenvector corresponding to the smaller eigenvalue.

4. Eyes are, for the most part, the closest "holes" in the skin segmentation mask to the minor axis.

5. Eyes are located above the minor axis.

Let $(u_k, v_k)$ and $(u_l, v_l)$ denote the centroids of two "holes" within the skin segmentation mask, and let $(U_c, V_c)$ denote the centroid of the mask itself. The first cost function is defined as

\[
C^1_{kl} = \mathrm{Abs}(v_k - v_l) \qquad (11)
\]

A quick examination of Eq. (11) reveals that $C^1_{kl}$ reaches its minimum when the two centroids are located on a horizontal line. We assume that the image has been rotated to align the major and minor axes with the vertical and horizontal directions, respectively. The second cost function is designed to take advantage of the fact that the eyes are generally symmetric with respect to the major axis, represented by the direction of the eigenvector corresponding to the larger eigenvalue. It is defined as

\[
C^2_{kl} = \mathrm{Abs}\big[ \mathrm{Abs}(u_k - U_c) - \mathrm{Abs}(u_l - U_c) \big] \qquad (12)
\]

Likewise, $C^2_{kl}$ reaches its minimum when the centroids of the two "holes" are equidistant from the vertical axis. Our third proposed cost function is defined as

\[
C^3_{kl} = \mathrm{Abs}\big[ \mathrm{Abs}(v_k - V_c) - \mathrm{Abs}(v_l - V_c) \big] \qquad (13)
\]

which reaches its minimum if the two "holes" are equidistant from the horizontal axis. The fourth cost function is designed to locate the closest "holes" in the skin segmentation mask to the horizontal minor axis. It is defined as

\[
C^4_{kl} = \mathrm{Abs}(v_k - V_c) + \mathrm{Abs}(v_l - V_c) \qquad (14)
\]

Its minimum is reached when the centroids of the two "holes" are located exactly on the minor axis, and its value increases as these centroids are moved further away. The fifth and final cost function is defined as

\[
C^5_{kl} = (v_k - V_c) + (v_l - V_c) \qquad (15)
\]

$C^5_{kl}$ is negative when the two "holes" are located above the horizontal minor axis, and positive otherwise. The weighted combination of these cost functions is defined as

\[
C_{kl} = \sum_{i=1}^{5} w_i \, C^i_{kl} \qquad (16)
\]

Its minimum identifies the two "holes" within the facial segmentation mask that are most likely the eyes. The centroids of these holes are marked as the locations of the eyes, as shown in Figure 4, where $d$ represents the distance between the eye centers. Upon localizing the eye centers, the tip of the nose and the center of the mouth can then be located on an axis that passes, at a 90 degree angle, through the middle of the eye-center axis, as shown in Figure 4. Consequently, the distances between the tip of the nose (center of mouth) and the eye-center axis are defined as $t_1 d$ ($t_2 d$).
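As an illustration of the search, the hypothetical Python sketch below evaluates the weighted cost of Eq. (16) over all pairs of candidate "holes" and keeps the pair with minimum cost, then places the nose tip and mouth center along the perpendicular of the eye-center axis at distances t1·d and t2·d. The weight values are placeholders; the paper determines the weights and t1, t2 empirically from training faces.

```python
import itertools
import numpy as np

def locate_eyes(hole_centroids, mask_centroid, weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Return the pair of hole centroids minimizing the weighted cost of Eq. (16).

    hole_centroids: list of (u, v) centroids of non-skin "holes" in the face mask
    mask_centroid : (Uc, Vc) centroid of the face mask
    weights       : w_1..w_5 (placeholder values)
    """
    Uc, Vc = mask_centroid
    best_pair, best_cost = None, np.inf
    for (uk, vk), (ul, vl) in itertools.combinations(hole_centroids, 2):
        c1 = abs(vk - vl)                          # on the same horizontal line
        c2 = abs(abs(uk - Uc) - abs(ul - Uc))      # symmetric about the major axis
        c3 = abs(abs(vk - Vc) - abs(vl - Vc))      # equidistant from the minor axis
        c4 = abs(vk - Vc) + abs(vl - Vc)           # close to the minor axis
        c5 = (vk - Vc) + (vl - Vc)                 # negative if above the minor axis
        cost = sum(w * c for w, c in zip(weights, (c1, c2, c3, c4, c5)))
        if cost < best_cost:
            best_pair, best_cost = ((uk, vk), (ul, vl)), cost
    return best_pair

def locate_nose_mouth(eye_left, eye_right, t1, t2):
    """Place the nose tip and mouth center at distances t1*d and t2*d from the
    midpoint of the eye axis, along its perpendicular (sign of the normal may
    need flipping depending on the image coordinate convention)."""
    e1, e2 = np.asarray(eye_left, float), np.asarray(eye_right, float)
    mid, axis = (e1 + e2) / 2.0, e2 - e1
    d = np.linalg.norm(axis)
    normal = np.array([-axis[1], axis[0]]) / d     # unit vector perpendicular to eye axis
    return mid + t1 * d * normal, mid + t2 * d * normal
```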

3 Results
The skin/non-skin classifier training was performed on a database of fleshtone patches extracted from 40 face images representing various ethnic backgrounds. The outcome is a mean vector, a covariance matrix, and a universal threshold optimized through a true-positive/false-positive analysis on the training set. The details of the training procedure can be found in [16]. The facial feature detection training is performed on the same set of 40 faces, where the eyes, nose, and mouth are manually located and segmented. Consequently, the thresholds t1 and t2 as well as the weights used in (16) are determined empirically, yielding optimum performance on the training set. The method is tested on 50 RGB images from the Xerox personnel access badges database, where the images are typically of the head-and-shoulder type. The training images were outside the test set. Figures 5, 6, 7 and 8 demonstrate the performance of our facial pattern detection and eye, nose, and mouth localization algorithm on four of these images. Our definition of background for each image consists of any non-facial scenery such as clothing, walls, hair, etc. In each figure, we show the original image, the skin segmentation resulting from the adaptive pixel-based color classification followed by smoothing using GRF, the ellipse fit to the skin-classified region superimposed on the GRF-based skin classification, and the final result, where the location of the center of the eyes, the tip of the nose, and the center of the mouth are indicated by "+" superimposed on the original image. By examining Figures 5d, 6d, 7d and 8d, it can easily be seen that the facial features were located accurately, even in the presence of a beard. As a final note, the proposed algorithm is very effective in locating faces and facial features where the facial pattern appears as a front view and both eyes are visible within the scene. The color/shape classification and feature localization are achieved within a few seconds on a Sparc 20 without any attempts to optimize the software. We believe that its performance can be further enhanced by proper software optimization.

4 Conclusions
The proposed algorithm is comprised of three steps: 1) supervised skin/non-skin color classification, 2) shape classification, and 3) eye, nose, and mouth localization. It can be used as a pre-processing step for a face recognition system, where it is required to automatically identify/locate the face and facial features within the scene. The region generation step of the algorithm employs Gibbs random fields in order to smooth the pixel-based color classification. This step could also be accomplished by utilizing other region generation techniques, such as morphology or any type of connected components algorithm that would meet the same criteria as the GRF process. The proposed algorithm is limited to frontal facial views, and consequently would not yield meaningful results on profiles. The feature localization portion of the algorithm requires that: 1) the outer corners of both eyes be visible within the scene, i.e., rotations in and out of the image plane are allowed up to the point where the outer corner of either eye is no longer visible, and 2) the tilting of the head, which results in a rotation within the image plane, be limited such that the head does not become horizontal or extend past the horizontal, resulting in an upside-down image at the extreme. The above conditions do not present any serious limitations in security-type applications, where the person would have to walk past a camera or show a picture badge to gain access to a particular facility, since the camera would be mounted such that both outer eye corners are visible and the head is, more or less, in a vertical position. The detection of the tip of the nose and the center of the mouth is dependent on the location of the center of the eyes within the skin segmentation mask. Faces with eyeglasses pose a limitation to the algorithm in cases where the rims of the glasses - generally not of skin color - cause the facial region to be segmented into multiple fragments, due to the use of color in the segmentation process. This could yield, for instance, a forehead region, multiple eye regions, and a nose and mouth region. Our current implementation relies on the ability of the segmentation step to produce a single region for each face within the scene. At present, we are investigating the use of this technology in a face detection and recognition system which will control access of employees to certain facilities. The goal of the system is to: 1) eliminate the need and cost for human operators, 2) automate the process, and 3) allow twenty-four hour access. The user would be required to stand directly in front of a camera in order to gain access to the building or facility of choice. The system would perform the classification, localization, and recognition from a database of faces and, as a result, grant access to selected personnel. Note that the recognition portion of the overall system could, for instance, be implemented by correlation between the detected face and the templates found in the database, since the detection algorithm proposed in this paper provides a reasonable estimate of the scale of the image (distance between eye centers) and the rotation of the face (eigenvectors). These estimates would serve to reduce the need for multiple scale and rotation examples of the same face, thereby improving the outcome of recognition by correlation.

5 Acknowledgments
This work is supported in part by The Document Company, Xerox, an SIUCRC grant from the National Science Foundation, and a New York State Science and Technology Foundation grant to the University of Rochester. The authors wish to express their gratitude to the reviewers whose comments have enhanced the quality of this manuscript.


References
[1] A. Samal and P. A. Iyengar, "Automatic recognition and analysis of human faces and facial expressions: a survey," Pattern Recognition, vol. 25, no. 1, pp. 65-77, 1992.
[2] D. Valentin, H. Abdi, A. J. O'Toole, and G. W. Cottrell, "Connectionist models of face processing: a survey," Pattern Recognition, vol. 27, no. 9, pp. 1209-1230, 1994.
[3] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: a survey," Proc. IEEE, vol. 83, pp. 705-740, May 1995.
[4] I. Craw, H. Ellis, and J. Lishman, "Automatic extraction of face features," Pattern Recognition Letters, vol. 5, pp. 183-187, 1987.
[5] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[6] L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces," Journal of Opt. Soc. Am., vol. 4, pp. 519-524, 1987.
[7] M. Kirby and L. Sirovich, "Application of the Karhunen-Loeve procedure for the characterization of human faces," IEEE Trans. Patt. Anal. Mach. Intel., vol. 12, pp. 103-108, 1990.
[8] A. L. Yuille, P. W. Hallinan, and D. S. Cohen, "Feature extraction from faces using deformable templates," Int. Jour. Computer Vision, vol. 8, no. 2, pp. 99-111, 1992.
[9] C. L. Huang and C. W. Chen, "Human facial feature extraction for face interpretation and recognition," Pattern Recognition, vol. 25, no. 12, pp. 1435-1444, 1992.
[10] C. L. Huang, T. Y. Cheng, and C. C. Chen, "Color image segmentation using scale space filter and Markov random field," Pattern Recognition, vol. 25, no. 10, pp. 1217-1229, 1992.
[11] G. Yang and T. S. Huang, "Human face detection in a complex background," Pattern Recognition, vol. 27, no. 1, pp. 53-63, 1994.
[12] T. C. Chang, T. S. Huang, and C. Novak, "Facial feature extraction from color images," in Int. Conf. Pattern Recognition, (Israel), pp. 39-43, Oct. 1994.
[13] K. K. Sung and T. Poggio, "Example-based learning for view-based human face detection," Tech. Rep. 1521, M.I.T. Artificial Intelligence Laboratory and Center for Biological and Computational Learning, December 1994.

[14] A. J. Colmenarez and T. S. Huang, "Frontal view face detection," in SPIE, vol. 2501, pp. 90-98, 1995.
[15] Q. Chen, H. Wu, and M. Yachida, "Face detection by fuzzy pattern matching," in Int. Conf. Computer Vision, pp. 591-596, 1995.
[16] E. Saber, A. M. Tekalp, R. Eschbach, and K. Knox, "Automatic image annotation using adaptive color classification," Graphical Models and Image Processing, vol. 58, pp. 115-126, March 1996.
[17] G. Wyszecki and W. S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae, NY: Wiley, 2nd ed., 1982.
[18] "Xerox color encoding standards," Xerox Systems Institute, Sunnyvale, CA, 1989.
[19] Y. Ohta, T. Kanade, and T. Sakai, "Color information for region segmentation," Computer Graphics and Image Processing, vol. 13, pp. 222-241, 1980.
[20] J. Liu and Y. H. Yang, "Multiresolution color image segmentation," IEEE Trans. Patt. Anal. Mach. Intel., vol. 16, pp. 689-700, July 1994.
[21] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images," IEEE Trans. Patt. Anal. Mach. Intel., vol. 6, pp. 721-741, November 1984.
[22] H. Derin and H. Elliott, "Modeling and segmentation of noisy and textured images using Gibbs random fields," IEEE Trans. Patt. Anal. Mach. Intel., vol. 9, pp. 39-55, Jan. 1987.
[23] R. C. Dubes and A. K. Jain, "Random field models in image analysis," Journal of Applied Statistics, vol. 16, no. 2, pp. 131-164, 1989.
[24] E. Saber, A. M. Tekalp, and G. Bozdagi, "Fusion of color and edge information for improved segmentation and edge linking," in Int. Conf. Acoustics, Speech, and Signal Processing, (Atlanta, Georgia), May 1996.
[25] G. Chow and X. Li, "Towards a system for automatic facial feature detection," Pattern Recognition, vol. 26, no. 12, pp. 1739-1755, 1993.
[26] K. C. Yow and R. Cipolla, "Feature-based human face detection," Image and Vision Computing, vol. 15, pp. 713-735, 1997.


Figure 1: Facial pattern detection and eye, nose, and mouth localization algorithm. (Flowchart: pixel-based color classification of the digital color image; supervised GRF smoothing yielding contiguous regions; for each skin region, fit an ellipse; if the fit is within threshold, locate the eyes using the symmetry-based cost functions and then the nose and mouth using the distance between the eyes, producing the location of the face, eyes, nose, and mouth.)

Figure 2: Supervised GRF segmentation algorithm. (Flowchart: starting from the initial segmentation with the window equal to the whole image, alternately compute new estimates of the mean vectors and covariance matrices and estimate a new segmentation x until x converges or a maximum iteration count is reached, then halve the window size and repeat while the window exceeds the minimum size, yielding the smoothed color segmentation.)

Figure 3: Modeling of the skin-classified region in terms of an ellipse. (Eigenvectors v1 and v2, the region centroid, and the ellipses corresponding to c = cmin, the optimum c, and c = cmax superimposed on the boundary of the skin-classified region.)

Figure 4: Location of the tip of the nose and the center of the mouth. (The nose tip and mouth center lie at distances t1 x d and t2 x d from the eye-center axis.)
