The Research and Implementation of Face Detection and Recognition

May 12, 2010
15:23
RPS : Trim Size: 8.50in x 11.00in (IEEE)
icfcc2010-lineupvol-1:
F179
The Research and Implementation of Face Detection and Recognition Based on
Video Sequences
Qianqian Zhao
Engineering & Research Center For Information
Technology On Education In HuaZhong Normal University
Wuhan, China
e-mail: zqq08@126.com

Hualong Cai
Ertan Hydropower Development Company, Ltd.
Chengdu, China
e-mail: caihualong@ehdc.com.cn
Abstract-The detection and recognition of moving human faces
in video sequences is currently an international research hotspot.
To build a video-based face recognition system, this paper first
uses background image differencing and a Kalman filter to track
and extract the human body region, and then uses the AdaBoost
algorithm to detect the face within that region. Finally, an
improved Hidden Markov Model, the pseudo-two-dimensional Hidden
Markov Model (P2D-HMM), is used for feature extraction and
recognition on the face image. The recognition performance of
the system was also tested.
Keywords-video sequences; Kalman filter; face detection
and recognition; P2D-HMM
I. INTRODUCTION
Face detection and recognition based on video sequences has
become a hotspot in computer vision, driven by applications in
video surveillance, information security and intelligent
human-computer interaction. Major research institutions such as
MIT in the United States, Cambridge University in Britain[1] and
Toshiba Corporation in Japan[2] are all conducting in-depth
studies of face detection and recognition in video sequences.
An automatic face recognition system based on video sequences
can include a human tracking module, a face detection module, a
face feature extraction module and a face recognition module[3].
The system studied in this paper takes a video as input and
searches it for face images that meet certain criteria such as
size, pose and clarity. The characteristics of the face images
are then described with a dynamic model, the Hidden Markov Model.
Finally, a face recognition method for static images is used to
determine whether the face is a known face in the face library.
There are three specific steps. First, background image
differencing and a Kalman filter are used to track and extract
the human body region. Then the AdaBoost algorithm is used to
detect the face within that region. Finally, an improved Hidden
Markov Model, the pseudo-two-dimensional Hidden Markov Model, is
used for face image feature extraction and recognition.

978-1-4244-5824-0/$26.00 © 2010 IEEE
II. KEY TECHNOLOGIES
A. The extraction and tracking of human body contour in video sequences
Extracting the human body contour means detecting the human body
in the image sequence and obtaining the body region, i.e. the
body's external contour. To track human motion, background image
differencing is used to detect the moving objects in the video,
and the width-to-height ratio is then used to distinguish human
bodies from other moving objects. Experiments show that the
width-to-height ratio of motor vehicles and animals in the image
is generally greater than one, while that of people is less than
one. The human body is then tracked with a Kalman filter.
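The width-to-height test can be illustrated with a minimal sketch. It assumes the moving object is already available as a binary mask (1 for foreground), and the helper names `bounding_box` and `looks_like_person` are hypothetical; the paper does not say how the bounding region is measured.

```python
def bounding_box(mask):
    """Return (width, height) of the smallest box enclosing all 1-pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None  # no foreground pixel at all
    return (cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1)


def looks_like_person(mask):
    """Apply the rule from the text: a person has width / height < 1."""
    box = bounding_box(mask)
    if box is None:
        return False
    width, height = box
    return width / height < 1.0


# Toy masks: a tall blob (person-like) and a wide blob (vehicle-like).
tall = [[0, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0]]
wide = [[0, 0, 0, 0],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 0]]
```

In a real sequence the ratio would be evaluated per connected region rather than per frame, which this sketch leaves out.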
1) The method of difference in background images: Background
image differencing against a static background avoids the
point-by-point estimation required in the optical flow field.
The main process is as follows. First, a reliable and stable
background image is extracted from the video sequence by a
suitable algorithm, and the difference between the background
image and the current frame is computed. An appropriate threshold
is then applied to obtain the binary difference image. Finally,
the difference image is post-processed to obtain the moving
target. Let the k-th frame be $f_k(x, y)$, its background image
be $B_k(x, y)$, its binary difference image be $D_k(x, y)$, and
the binarization threshold be $T$. The flow chart of the method
is shown in Figure 1.
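The thresholding step can be sketched in a few lines. The exact rule is given only in Figure 1, so the absolute difference and the strict comparison used here are assumptions, and `difference_mask` is a hypothetical helper name. Images are plain lists of gray-value rows.

```python
def difference_mask(frame, background, threshold):
    """Binary difference image: D_k = 1 where |f_k - B_k| > T, else 0."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Toy 3x3 example: a bright object enters the lower-right corner.
background = [[10, 10, 10],
              [10, 10, 10],
              [10, 10, 10]]
frame = [[10, 12, 10],
         [10, 90, 95],
         [10, 10, 40]]
mask = difference_mask(frame, background, threshold=25)
```

The small change of 2 gray levels in the first row stays below the threshold and is treated as background.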
Applying the method directly to obtain difference images harms
the subsequent image analysis, because the acquired original
images often contain noise. The median filter, which is applied
in this paper before differencing, filters out high-frequency
noise without blurring the edges of foreground objects. During
binarization, an appropriate threshold divides the difference
image into two classes with values 0 and 1. Finally, because
binarization leaves many isolated points and gaps in the image,
the binarized images are post-processed with morphological
methods.
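A minimal sketch of the two clean-up steps follows. The paper does not name the morphological operators, so the use of an opening (erosion followed by dilation) with a 3x3 neighbourhood is an assumption, as is the border handling; all function names are hypothetical.

```python
from statistics import median


def median_filter_3x3(img):
    """Replace each interior pixel by the median of its 3x3 neighbourhood.
    Border pixels are copied unchanged (a simplification)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


def _morph(mask, rule):
    """Apply `rule` (all -> erosion, any -> dilation) over a 3x3 window
    that is clipped at the image border."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = 1 if rule(window) else 0
    return out


def opening(mask):
    """Erosion then dilation: removes isolated foreground points."""
    return _morph(_morph(mask, all), any)


def closing(mask):
    """Dilation then erosion: fills small gaps inside the foreground."""
    return _morph(_morph(mask, any), all)


noisy = [[0, 0, 0, 0, 1],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
cleaned = opening(noisy)
```

Here the stray pixel in the top-right corner is removed while the 3x3 body region survives.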
2) Kalman Filtering: In human motion tracking, the body position
can be measured in the current frame. The Kalman filter is used
to estimate the body's position, velocity and acceleration in
the current frame, and the estimate is used to predict the body
position in the next frame. To reduce the computational
complexity, two Kalman filters are used to estimate the body's
state of motion: one estimates the motion state of the body's
center in the $x$ direction, and the other estimates the motion
state of the body's center in the $y$ direction. The body's
motion state can be described by the vector $X$ = (displacement,
velocity, acceleration), so the measurement equation of human
motion for the k-th frame in the $x$ direction is:

$Z_k = H X_k + V_k$    (1)

In equation (1), $H = (1\ 0\ 0)$, $X_k = A X_{k-1} + W_{k-1}$ is
the system state equation, and $V_k$ is the measurement noise.
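The per-axis filter can be sketched as below. The measurement row (1 0 0) follows the text, while the constant-acceleration transition matrix, the noise variances and the initial covariance are assumptions chosen only for illustration; `AxisKalman` is a hypothetical name.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


class AxisKalman:
    """Kalman filter for one image axis; state = (position, velocity, acceleration)."""

    def __init__(self, dt=1.0, process_var=0.01, meas_var=4.0):
        # Assumed constant-acceleration state transition over one frame.
        self.A = [[1.0, dt, 0.5 * dt * dt],
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]]
        self.Q = [[process_var if i == j else 0.0 for j in range(3)] for i in range(3)]
        self.R = meas_var                      # variance of the measurement noise
        self.x = [0.0, 0.0, 0.0]               # state estimate
        self.P = [[1000.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

    def predict(self):
        """Propagate the state one frame ahead and return the predicted position."""
        A = self.A
        self.x = [sum(A[i][k] * self.x[k] for k in range(3)) for i in range(3)]
        A_t = [list(col) for col in zip(*A)]
        apat = mat_mul(mat_mul(A, self.P), A_t)
        self.P = [[apat[i][j] + self.Q[i][j] for j in range(3)] for i in range(3)]
        return self.x[0]

    def update(self, z):
        """Correct the prediction with a measured position z (H = (1 0 0))."""
        innovation = z - self.x[0]
        s = self.P[0][0] + self.R              # innovation variance
        gain = [self.P[i][0] / s for i in range(3)]
        self.x = [self.x[i] + gain[i] * innovation for i in range(3)]
        top_row = self.P[0][:]
        self.P = [[self.P[i][j] - gain[i] * top_row[j] for j in range(3)]
                  for i in range(3)]


# Track the x coordinate of a body centre moving 2 pixels per frame.
kx = AxisKalman()
for k in range(30):
    kx.predict()
    kx.update(5.0 + 2.0 * k)
next_x = kx.predict()   # forecast of the position in frame 30
```

A second `AxisKalman` instance would handle the y coordinate in the same way, which keeps each filter at three states instead of six.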
B. Face Detection
Haar features and the AdaBoost algorithm are used to detect the
face within the body region obtained by tracking, and the
detected face image is saved after its size and gray scale are
normalized. Haar features are a kind of simple feature. Using
features rather than raw pixels has two advantages for
classification. First, features encode some global information,
which reduces the dimensionality of the classification space.
Second, feature-based computation is much faster than
pixel-based computation.
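A two-rectangle Haar feature can be sketched as follows. The paper does not describe how the features are evaluated; the integral image used here is the usual technique for fast rectangle sums and is an assumption, as are the sign convention and the function names.

```python
def integral_image(img):
    """ii[y][x] = sum of img over all rows < y and columns < x
    (one extra row and column of zeros simplifies the lookups)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_vertical(ii, x, y, w, h):
    """Top half minus bottom half of the window (h is assumed to be even)."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


# Toy 4x4 patch: dark upper half, bright lower half.
patch = [[10, 10, 10, 10],
         [10, 10, 10, 10],
         [50, 50, 50, 50],
         [50, 50, 50, 50]]
ii = integral_image(patch)
feature = haar_two_rect_vertical(ii, 0, 0, 4, 4)
```

Each feature value costs a fixed number of lookups regardless of window size, which illustrates why feature-based computation can be faster than working on every pixel.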
Boosting is a method for combining classifiers, and the AdaBoost
learning algorithm builds a strong classifier by iteratively
training a number of weak classifiers. After the first round of
training produces a weak classifier, the weights of the training
samples are adjusted so that the samples misclassified by that
weak classifier receive larger weights. The iteration continues,
and the final classifier is a linear combination of the weak
classifiers formed in each round of training, as shown in
Figure 2.
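The reweighting loop can be illustrated with a minimal sketch of discrete AdaBoost. It uses threshold stumps on a single scalar feature as the weak classifiers, which is an assumption made to keep the example small; the paper's weak classifiers operate on Haar features, and all names here are hypothetical.

```python
import math


def stump_predict(x, thr, polarity):
    """Weak classifier: returns polarity when x >= thr, otherwise -polarity."""
    return polarity if x >= thr else -polarity


def train_stump(xs, ys, weights):
    """Return (weighted_error, thr, polarity) of the best stump."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, thr, polarity) != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Return a list of (alpha, thr, polarity) weak classifiers."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, thr, pol))
        # Misclassified samples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, thr, pol))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def strong_classify(ensemble, x):
    """Sign of the weighted (linear) combination of the weak classifiers."""
    score = sum(alpha * stump_predict(x, thr, pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Labels that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
one_round = adaboost(xs, ys, 1)
three_rounds = adaboost(xs, ys, 3)
```

On this toy set a single stump misclassifies three samples, while the combination of three stumps classifies every training sample correctly.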
C. Feature Extraction and Face Recognition
After the above procedure, face recognition in video sequences
reduces to recognizing the face in a single image. The core
problem is choosing a suitable face representation and matching
strategy, and the two are closely related. Here the
pseudo-two-dimensional hidden Markov model is used to describe
the face, and three algorithms, the forward-backward algorithm,
the Viterbi algorithm and the Baum-Welch algorithm, are adopted
to solve the problems of probability evaluation, decoding and
training respectively.
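The evaluation problem can be sketched with the forward pass of the forward-backward algorithm. This is a minimal example for a one-dimensional HMM with discrete observation symbols, which is a simplification: the paper's P2D-HMM uses DCT observation vectors, and the model parameters below are invented for illustration.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) using the scaled forward recursion.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability of emitting symbol o in state i
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")   # the model cannot produce this sequence
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
        log_like += math.log(scale)
    return log_like


# Two hypothetical left-to-right models that differ only in their emissions.
start = [1.0, 0.0]
trans = [[0.5, 0.5],
         [0.0, 1.0]]
model_a = (start, trans, [[0.9, 0.1], [0.2, 0.8]])
model_b = (start, trans, [[0.1, 0.9], [0.8, 0.2]])

observation = [0, 1]
score_a = forward_log_likelihood(observation, *model_a)
score_b = forward_log_likelihood(observation, *model_b)
best = "a" if score_a > score_b else "b"
```

Working in the log domain with per-step rescaling keeps long observation sequences from underflowing to zero.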
2010 2nd International Conference on Future Computer and Communication

1) Pseudo-Two-Dimensional Hidden Markov Model: An HMM uses a
Markov chain to model the state changes, which are described
only indirectly through the observation sequence. When used for
face recognition, the HMM defines five basic states for the
human face: forehead, eyes, nose, mouth and chin. A rectangular
window samples the face image from top to bottom, and the pixels
in the window are arranged into a column vector whose gray
values serve as the observation. Successive sampling windows
overlap to ensure the continuity of the observation sequence.
However, this approach is strongly affected by lighting and
requires a large storage capacity. The Embedded Hidden Markov
Model (EHMM) reflects the two-dimensional structure of facial
features[4]. An EHMM consists of a series of "super states",
each of which contains a number of states known as embedded
states. The super states capture information in the vertical
direction, and the embedded states capture information in the
horizontal direction. The EHMM is a simplified two-dimensional
HMM, also known as the pseudo-two-dimensional hidden Markov
model (P2D-HMM), and it can be used as a mathematical model of
the face. The structure of the face P2D-HMM is shown in Figure 3.
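The top-to-bottom window sampling described for the one-dimensional HMM can be sketched as below. The strip height, the overlap and the row-by-row flattening order are assumptions, and `sample_strips` is a hypothetical helper name.

```python
def sample_strips(image, strip_height, overlap):
    """Slide a full-width window down the image and return one flattened
    gray-value vector per window position."""
    if not 0 <= overlap < strip_height:
        raise ValueError("overlap must satisfy 0 <= overlap < strip_height")
    step = strip_height - overlap
    observations = []
    for top in range(0, len(image) - strip_height + 1, step):
        strip = image[top:top + strip_height]
        observations.append([pixel for row in strip for pixel in row])
    return observations


# Toy face image: 7 rows of 2 pixels, each row filled with its row index.
image = [[r, r] for r in range(7)]
sequence = sample_strips(image, strip_height=3, overlap=1)
```

With an image of height H, strip height L and overlap P, the sequence has (H - L) // (L - P) + 1 observations, here (7 - 3) // 2 + 1 = 3.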
2) Face Image Sampling and P2D-HMM model training: Filtering is
applied to remove noise before feature extraction; here a 3x3
median filter is used. To obtain the observation sequence of a
face image, the image is divided from top to bottom into image
blocks of height L, with an overlap of height P between adjacent
blocks[5][6]. Generating the observation sequence of a face
image amounts to extracting facial feature values. For the
P2D-HMM, a common feature extraction method is to define a
scanning window and traverse the face image by translating the
window horizontally and vertically. The discrete cosine
transform (DCT) is applied to the image inside each window
position, and the low-frequency part of the DCT coefficients is
used as the observation vector.
To build a face recognition system, the preliminary work is to
build a face database. The database stores a quantitative model
of each person's facial features, the HMM, and the process of
building it is called training the face model. To make the HMM
robust, it is generally trained on observation sequences sampled
from several images of the same person with different facial
expressions and poses. In the authentication system for distance
education, several face images of each remote learner are first
collected, and the size and gray scale of these images are then
normalized. During face P2D-HMM training, the sub-states and
super states are computed by the Viterbi algorithm. After
training, an HMM is stored in the face database.
3) Face recognition: Face recognition can be performed once the
face database has been established. The face image is first
preprocessed. The DCT is then applied and the principal DCT
coefficients are taken as the observation vector. The
forward-backward algorithm is used to compute the posterior
probability of each HMM in the library given the observation
vector. The Viterbi algorithm is then used to compare the
similarity between the tested face and each person in the face
database. The HMM with the greatest probability gives the class
to which the recognized face belongs.
III. THE DESIGN AND IMPLEMENTATION OF THE RECOGNITION SYSTEM
The input of the face recognition system studied in this paper
is a video sequence with a static background. First, background
image differencing and the Kalman filter are used to extract and
track the moving human body. Then the AdaBoost algorithm is used
to detect and extract the face region. Finally, the
pseudo-two-dimensional Hidden Markov Model is used to extract
features from this face region and to determine whether it is a
known face in the library. The system diagram is shown in
Figure 4.
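The DCT-based observation vector used for P2D-HMM feature extraction can be sketched as follows. The orthonormal DCT-II, the choice of the top-left `keep` x `keep` coefficients as the low-frequency part and the function names are assumptions; the paper does not state how many coefficients are kept.

```python
import math


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of a small gray-value block."""
    n_rows, n_cols = len(block), len(block[0])

    def scale(k, n):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            total = 0.0
            for y in range(n_rows):
                for x in range(n_cols):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n_rows))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n_cols)))
            out[u][v] = scale(u, n_rows) * scale(v, n_cols) * total
    return out


def observation_vector(block, keep=3):
    """Keep the keep x keep lowest-frequency coefficients as the feature."""
    coeffs = dct_2d(block)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


# A flat 4x4 block has all of its energy in the DC coefficient.
flat = [[8] * 4 for _ in range(4)]
features = observation_vector(flat, keep=2)
```

Keeping only a few low-frequency coefficients gives a much shorter observation vector than the raw gray values of the window, which is the motivation for using the DCT here. This direct implementation is quadratic in the block size and is meant for illustration only.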
The OpenCV computer vision library (Intel Open Source Computer
Vision Library) is used in the implementation to process the
collected face images. Developed by Intel, it contains a large
number of functions for common problems in computer vision, such
as motion analysis and tracking, face recognition, etc.[7]. The
face recognition system based on video sequences was implemented
with the relevant OpenCV functions in the MFC programming
environment of Visual C++ 6.0.
To test the recognition performance of the system, six members
of the project team were selected as test subjects. Five images
of each person were acquired with the same camera as training
samples. To improve the credibility of the test results, 10
video sequences were captured and ten tests were carried out.
For each test, the video sequence was shot at the entrance of
the laboratory with this camera, the laboratory members it
filmed were the test subjects, and the frame rate was 25 frames
per second.
The input of the system is a video sequence. After the
extraction and tracking of the human body contour and face
detection, the identification phase determines whether the face
exists in the database. Running the system showed that one
recognition error occurred in the 10 tests. The study showed
that recognition performance was affected by the lighting when
the video was captured: lighting that is too bright or too dark
can cause errors in detecting the face region. The speed of
human motion is another influencing factor: motion that is too
fast or too slow can cause the extraction of the body contour to
fail.
IV. CONCLUSION
Face detection and recognition in video sequences is an
important research field in computer vision. To build a
video-based face recognition system, this paper first used
background image differencing and the Kalman filter to track and
extract the human body region, and then used the AdaBoost
algorithm to detect the face within that region. Finally, an
improved Hidden Markov Model, the pseudo-two-dimensional Hidden
Markov Model, was used for face image feature extraction and
recognition, and the recognition performance was tested.
Experiments demonstrated that the method can extract the moving
human body from the video and then detect and recognize the face
effectively. It therefore provides a foundation for applications
such as intelligent human-computer interaction, video
surveillance, access control and authentication.
REFERENCES
[1] Arandjelović O, Cipolla R. A pose-wise linear illumination manifold
model for face recognition using video. Computer Vision and Image
Understanding, 2009, 113(1): 113-125.
[2] Nishiyama M, Yamaguchi O, Fukui K. Face recognition with the
multiple constrained mutual subspace method. Proceedings of the
5th International Conference on Audio- and Video-Based Biometric
Person Authentication. New York, 2005: 71-80.
[3] Yan Yan, Yujin Zhang. Face recognition research in video.
Chinese Journal of Computers, 2009, 32(5): 878-884.
[4] Jing Zhao. The research of face recognition based on HMM.
Master thesis, Dalian University, 2008.
[5] Qi YuTing, Paisley J W, Carin L. Music analysis using hidden
Markov mixture models. IEEE Transactions on Signal Processing,
2007: 5209-5224.
[6] Ting ChuanWei, Chien JenTzung. Factor analysis of acoustic features
for streamed hidden Markov modeling. IEEE Workshop on Automatic
Speech Recognition & Understanding (ASRU), The Westin Miyako
Kyoto, 2007: 30-35.
[7] Ruizhen Liu, Shiqi Yu. OpenCV Tutorial-Basics. Beijing University
of Aeronautics and Astronautics Press, 2007.
