Lin,Wei-Yang
9022090964
May 16, 2003
Introduction
Face Recognition Technology (FRT) is a research area spanning several disciplines such as image processing, pattern recognition, computer vision and neural networks. There are many applications of FRT, as shown in Table 1. These applications range from the matching of photographs to real-time matching of surveillance video. Depending on the specific application, FRT has different levels of difficulty and requires a wide range of techniques. In 1995, a review paper by Chellappa et al. [1] gave a thorough survey of FRT at that time. During the past few years, FRT has continued to evolve rapidly.
In this report, I will provide a brief review of the most recent developments in FRT. Though many FRT methods have been proposed, robust face recognition is still difficult. The recent FERET test [5] has revealed that there are at least two major challenges: illumination variation and pose variation. Either one of these problems, or both, can cause serious performance degradation in most existing systems. Unfortunately, these problems arise in many real-world applications, such as surveillance video. In the following, I will discuss some existing solutions for these problems.
The general face recognition problem can be formulated as follows: given a single image or a sequence of images, identify the person in the image using a database. Solving the problem consists of the following steps: 1) face detection, 2) face normalization, and 3) database query.
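As a concrete illustration, the three steps can be sketched as a skeleton program. The function names and the nearest-neighbour database query are my own illustrative choices, not from any of the cited systems:

```python
import numpy as np

def detect_face(image):
    """Placeholder detector: return the full image.
    A real system would localize the face region first."""
    return image

def normalize_face(face, size=(64, 64)):
    """Resample to a fixed size by nearest-neighbour indexing and
    scale intensities to [0, 1]."""
    h, w = face.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    face = face[rows][:, cols].astype(float)
    span = face.max() - face.min()
    return (face - face.min()) / (span + 1e-12)

def query_database(probe, gallery):
    """Return the identity of the nearest gallery face (Euclidean distance)."""
    dists = {name: np.linalg.norm(probe - g) for name, g in gallery.items()}
    return min(dists, key=dists.get)

def recognize(image, gallery):
    return query_database(normalize_face(detect_face(image)), gallery)
```

All of the methods reviewed below refine one or more of these three stages.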
2.1 The Illumination Problem
Images of the same face appear different under changes in lighting. If the change induced by illumination is larger than the difference between individuals, a system will not be able to recognize the input image. To handle the illumination problem, researchers have proposed various methods. It has been suggested that one can reduce the variation by discarding the most significant eigenfaces, and it is verified in [18] that discarding the first few eigenfaces works reasonably well. However, it degrades system performance for input images taken under frontal illumination.
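The eigenface truncation verified in [18] is simple to express: compute the principal components of the training images and skip the first few before projecting. This sketch is illustrative; the `drop` and `keep` counts are assumptions, not values from [18]:

```python
import numpy as np

def eigenfaces(images, drop=0, keep=10):
    """Eigenfaces from row-vectorized images; optionally discard the leading
    (typically illumination-dominated) components, as suggested in [18]."""
    mean = images.mean(axis=0)
    # SVD of the centred data: rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, Vt[drop:drop + keep]   # skip the first `drop` eigenfaces

def project(image, mean, basis):
    """Coefficients of an image in the (truncated) eigenface basis."""
    return basis @ (image - mean)
```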
In [19], different image representations and distance measures are evaluated. One important conclusion that this paper draws is that none of these methods is sufficient by itself to overcome illumination variations. More recently, a new image comparison method was proposed by Jacobs et al. [20]. However, this measure is not strictly illumination-invariant, because the measure changes for a pair of images of the same object when the illumination changes.
An illumination subspace for a person has been constructed in [21, 22] for a fixed viewpoint. Thus, under a fixed viewpoint, the recognition result can be illumination-invariant. One drawback of this method is that many images per person are needed to construct the basis images of the illumination subspace.
In [23], the authors suggest using Principal Component Analysis (PCA) to solve a parametric shape-from-shading (SFS) problem. Their idea is quite simple: reconstruct the 3D face surface from a single image using computer vision techniques, then compute the frontal-view image under frontal illumination. Very good results are demonstrated. I will explain their approach in detail later. In practice, there are many open issues in reconstructing a 3D surface from a single image.
I will discuss two important illumination-invariant FRT methods in the following sections.
2.2 The Pose Problem
System performance drops significantly when pose variations are present in the input images. Basically, existing solutions can be divided into three types: 1) multiple images per person are required in both the training stage and the recognition stage, 2) multiple images per person are used in the training stage but only one database image per person is available in the recognition stage, and 3) single-image based methods. The second type is the most popular one.
An approach based on the illumination cone has been proposed for handling both the pose and illumination problems; the illumination cone deals with the illumination variation. For variations due to rotation, it needs to completely resolve the GBR (generalized bas-relief) ambiguity when reconstructing the 3D surface.
Hybrid approaches are probably the most practical solution up to now. Three representative methods are reviewed
in this report: 1) linear class based method [25], 2) graph matching based method [26], 3)
view-based eigenface method [27]. The image synthesis method in [25] is based on the
assumption of linear 3D object classes and extension of linearity to images. In [26], a
robust face recognition scheme based on EBGM is proposed. They demonstrate
substantial improvement in face recognition under rotation. Also, their method is fully
automatic, including face localization, landmark detection and graph matching scheme.
The drawback of this method is the requirement of accurate landmark localization which
is not easy when illumination variations are present. The popular eigenface approach [28] has been modified to achieve pose invariance [27]. This method constructs eigenfaces for
each pose. More recently, a general framework called bilinear model has been proposed
[41]. The methods in this category have some common drawbacks: 1) they need many images per person to cover the possible poses, and 2) the illumination problem is treated separately from the pose problem.
Invariant features have also been used for face recognition [38] and are robust to small-angle rotation. There are many papers on invariant features in the computer vision literature, but little work on applying this technique to face recognition; recent work in [39] sheds some light in this direction. For synthesizing face images under different lighting or expressions, 3D facial models have been explored in [40]. Due to their complexity and computational cost, it is hard to apply this technology to face recognition.
In the following sections, I will discuss some recent research works in face recognition.
3.1 Shape from Shading
The basic idea of SFS is to infer the 3D surface of an object from the shading information in an image. In order to infer such information, we need to assume a reflectance model under which the given image is generated from the 3D object. Many illumination models are available; among them, the Lambertian model is the most popular and has been used extensively in the computer vision community for the SFS problem [42]. The nature of SFS makes it an ill-posed problem in general: the surface recovered from a single image need not be unique, so images synthesized from it under a different lighting angle may not match the true object. Fortunately, theoretical advances have made the SFS problem well-posed under certain conditions.
The key equation in the SFS problem is the following image irradiance equation [43]:

    I[x, y] = R(p[x, y], q[x, y])

where I[x, y] is the image, R is the reflectance map, and p[x, y], q[x, y] are the shape gradients (partial derivatives of the depth map). With the assumption of a Lambertian surface and a single, distant light source, the equation can be written as

    I = cos(theta_i)

or, in terms of the gradient-space coordinates (Ps, Qs) of the light source direction,

    I = (1 + p*Ps + q*Qs) / (sqrt(1 + p^2 + q^2) * sqrt(1 + Ps^2 + Qs^2))
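The Lambertian reflectance map is straightforward to evaluate numerically. The following sketch renders an image from a depth map z and a light source given in gradient space by (Ps, Qs); setting Ps = Qs = 0 gives frontal illumination. The clipping of negative values (attached shadows) to zero is my own illustrative addition:

```python
import numpy as np

def lambertian_image(z, Ps=0.0, Qs=0.0):
    """Render an image from a depth map z with the Lambertian reflectance map
    R(p, q) = (1 + p*Ps + q*Qs) / (sqrt(1+p^2+q^2) * sqrt(1+Ps^2+Qs^2)).
    Ps = Qs = 0 corresponds to frontal illumination."""
    p, q = np.gradient(z)                  # shape gradients of the depth map
    num = 1.0 + p * Ps + q * Qs
    den = np.sqrt(1.0 + p**2 + q**2) * np.sqrt(1.0 + Ps**2 + Qs**2)
    return np.clip(num / den, 0.0, 1.0)    # clip attached shadows to zero
```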
Since an SFS algorithm provides face shape information, the illumination and pose problems can be solved simultaneously. For example, we can solve the illumination problem by rendering a prototype image Ip from a given input image I. This can be done in two steps: 1) apply an SFS algorithm to obtain (p, q), and 2) render the prototype image Ip under lighting angle 0 (frontal illumination).
To evaluate some existing SFS algorithms, Zhao [13] applies several SFS algorithms to 1) synthetic face images generated from the Lambertian model with constant albedo, and 2) real face images. The experimental results show that these algorithms are not good enough on real face images to yield a significant improvement in face recognition. The reason is that a face is composed of materials with different reflecting properties: cheek skin, lip skin, eyes, etc.; hence, the Lambertian model with constant albedo cannot provide a good approximation. Zhao et al. [3] develop a symmetric SFS algorithm using the Lambertian model with a varying albedo as a better alternative. With the aid of a generic 3D head model, they can shorten the two-step procedure of obtaining the prototype image (1. input image to shape via SFS, 2. shape to prototype image) to one step: input image to prototype image directly.
Their algorithm is applied to more than 150 face images from the Yale and Weizmann databases. The results clearly indicate the superior quality of the prototype images rendered by their method. They also conduct three experiments to evaluate the influence on recognition performance when their algorithm is combined with existing FRT. The first experiment demonstrates the improvement in recognition performance obtained by using the new illumination-invariant measure they define; the results are shown in Tables 2 and 3. The second experiment shows that using the rendered prototype images instead of the original input images can significantly improve existing FRT such as PCA and LDA; the results are shown in Table 4, where P denotes the prototype image. Finally, in the third experiment they demonstrate that the recognition rate of subspace LDA can be improved.
Database    Image Measure    Gradient Measure    Illumination-Invariant Measure
Yale        68.3%            78.3%               83.3%
Weizmann    86.5%            97.9%               81.3%

Table 2: Recognition performance using three different measures (one image per person).
Database    Image Measure    Gradient Measure    Illumination-Invariant Measure
Yale        78.3%            88.3%               90.0%
Weizmann    72.9%            96.9%               87.9%

Table 3: Recognition performance using three different measures (two images per person).
Database    PCA      LDA      P-PCA    P-LDA
Yale        71.7%    88.3%    90.0%    95.0%
Weizmann    97.9%    100%     95.8%    98.9%

Table 4: Recognition performance with/without using the prototype image (P denotes prototype image).
3.2 Applying Illumination Cone to Face Recognition
In earlier work, it is shown that the images of an object under arbitrary combinations of light sources form a convex cone in image space. This cone, called the illumination cone, can be constructed from a small number of images of the face taken under different lighting. In the recognition results, cones-cast means that the reconstructed face surface was used to determine cast shadows. Notice that the cone subspace approximation has the same performance as the original illumination cone.
We can draw the following conclusions from their experimental results: 1) we can achieve pose/illumination-invariant recognition by using a small number of images with fixed pose and slightly different illumination, and 2) the images of a face under variable illumination can be well approximated by a low-dimensional subspace.
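The second conclusion can be checked numerically on a toy example: render a convex Lambertian surface under many random distant lights and measure how much of the image variance a low-dimensional subspace captures. This is only an illustration, not the illumination-cone construction of [21, 22]:

```python
import numpy as np

# Render a convex Lambertian surface under 200 random distant lights and check
# how much image variance a low-dimensional subspace captures.
rng = np.random.default_rng(0)
z = np.fromfunction(lambda i, j: -((i - 8.0)**2 + (j - 8.0)**2) / 40.0, (17, 17))
p, q = np.gradient(z)
normals = np.stack([-p, -q, np.ones_like(z)], axis=-1)
normals /= np.linalg.norm(normals, axis=-1, keepdims=True)   # unit normals

lights = rng.normal(size=(200, 3))
lights[:, 2] = np.abs(lights[:, 2])           # keep every light in front
lights /= np.linalg.norm(lights, axis=1, keepdims=True)

# n . s with negative values clipped: Lambertian shading with attached shadows
imgs = np.maximum(normals.reshape(-1, 3) @ lights.T, 0.0).T
s = np.linalg.svd(imgs, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)       # variance captured by leading dims
```

On this toy surface, a handful of singular vectors already account for nearly all of the image variance, which is exactly the low-dimensional behaviour the experiments report.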
Figure 5: each test face is rotated by using 49 faces as examples (not shown) and the results are marked as output. For comparison only, the true rotated test face is shown on the lower left (this face was not used in the computation) [25].
The data set consists of 50 faces, each given in two orientations (22.5° and 0°). In their experiment, one face is chosen as the test face, and the other 49 faces are used as examples. In Figure 5, each test face is shown on the upper left and the synthesized image on the lower right. The true rotated test face is shown on the lower left. On the upper right, they also show the synthesis of the test face through the 49 examples in the test orientation. This reconstruction of the test face should be understood as the projection of the test face into the subspace
spanned by the other 49 examples. The results are not perfect, but considering the small size of the example set, the reconstruction is quite good. In general, the similarity of the reconstruction to the input test face allows us to speculate that an example set of a few hundred faces may be sufficient to cover a huge variety of different faces. We can conclude that the linear object class approach may be a satisfactory approximation, even for objects as complex as faces.
Therefore, given only a single face image, we are able to generate additional synthetic face images under different view angles. For the face recognition task, these synthetic images can be used to handle pose variation. This approach does not need any depth information, so the difficult step of generating 3D models can be avoided.
In the view-based approach, the system first determines the eigenspace whose pose is most similar to the input image. Once the proper eigenspace is determined, the input image is coded using the eigenfaces of that space and then recognized.
They evaluated the view-based approach on 189 images: 21 people with 9 poses each. The 9 poses of each person were evenly spaced from -90° to +90° along the horizontal plane. Two different test methodologies were used to judge the recognition performance.
In the first series of experiments, the interpolation performance was tested by training on the subset of available views {±90°, ±45°, 0°} and testing on the intermediate views {±68°, ±23°}. The average recognition rate was 90% for the view-based method. A second series of experiments tested the extrapolation performance by training on a range of available views (e.g., -90° to +45°) and testing on views outside the training range (e.g., +68°, +90°). For testing poses separated by 23° from the training range, the average recognition rate was 83% for the view-based method.
The directions of maximum and minimum normal curvature are the principal directions. The principal curvatures and the principal directions are given by the eigenvalues and eigenvectors of the shape matrix. The product of the two principal curvatures is the Gaussian curvature, and the mean curvature is defined as the mean of the two principal curvatures.
In practice, because these curvature calculations contain second-order partial derivatives, they are extremely sensitive to noise, so a smoothing filter is required before calculating curvature. The dilemma here is how to choose an appropriate smoothing level. If the smoothing level is too low, the second derivatives amplify noise so much that the curvature measurement is useless; on the other hand, over-smoothing modifies the very surface features we are trying to measure. In their implementation, they precompute the curvature values using several different levels of smoothing. They use the curvature maps from a low smoothing level to establish the locations of features, and then use prior knowledge of the face structure to select the curvature values from the precomputed set. This appears to be done manually. An example of principal curvature maps is given in figure 6.
Segmentation is somewhat straightforward. Using the signs of the Gaussian curvature K and the mean curvature H, the face surface can be divided into four kinds of regions: K+, H+ is convex; K+, H- is concave; K-, H+ is a saddle with positive mean curvature; and K-, H- is a saddle with negative mean curvature.
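The curvature computation and the sign-based segmentation can be sketched directly from a depth map. The single-level Gaussian smoothing, the finite-difference derivatives, and the label encoding below are my own illustrative choices, standing in for the multi-level scheme described above:

```python
import numpy as np

def hk_segmentation(z, sigma=1.0):
    """Gaussian (K) and mean (H) curvature of a depth map z, plus sign labels:
    1 = convex (K>0, H>0), 2 = concave (K>0, H<0),
    3 = saddle with H>0, 4 = saddle with H<0.
    Smoothing precedes differentiation because second derivatives amplify noise."""
    r = max(1, int(3 * sigma))
    g = np.exp(-np.arange(-r, r + 1)**2 / (2.0 * sigma**2))
    g /= g.sum()
    z = np.apply_along_axis(lambda m: np.convolve(m, g, mode="same"), 0, z)
    z = np.apply_along_axis(lambda m: np.convolve(m, g, mode="same"), 1, z)

    zy, zx = np.gradient(z)            # first derivatives
    zyy, zyx = np.gradient(zy)         # second derivatives
    _, zxx = np.gradient(zx)
    K = (zxx * zyy - zyx**2) / (1 + zx**2 + zy**2)**2
    H = ((1 + zy**2) * zxx - 2 * zx * zy * zyx + (1 + zx**2) * zyy) \
        / (2 * (1 + zx**2 + zy**2)**1.5)
    labels = np.where(K > 0, np.where(H > 0, 1, 2), np.where(H > 0, 3, 4))
    return K, H, labels
```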
Figure 6: Principal curvature (a) and principal direction (c) of maximum curvature,
principal curvature (b) and principal direction (d) of minimum curvature.
Figure 7: Segmentation of three faces by the sign of Gaussian and mean curvature:
concave (black), convex (white), saddle with positive mean curvature (light gray) and
saddle with negative mean curvature (dark gray).
With such a rich set of information available, there are many ways to construct a
comparison strategy. The author uses feature extraction and template matching to perform
face recognition. In the experiment, the test set consists of 8 faces with 3 views each; for each face there are two versions without expression and one with expression. The experimental results show that 97% of the comparisons are correct.
In my opinion, the advantages of the curvature-based technique are: 1) it solves the problems of pose and illumination variation at the same time, and 2) there is a great deal of information in the curvature map that has not yet been taken advantage of, so it may be possible to find an efficient way to exploit it.
However, there are some inherent problems in this approach: 1) A laser range finder system is much more expensive than a camera, and the technique cannot be applied to existing image databases; this makes it unattractive whenever another choice is available. 2) Even if the range finder were not an issue, the computation cost is high and the curvature calculation is very sensitive to noise. If we used principal component analysis on the range data instead, the error rate would probably be similar while the computational complexity is much lower. 3) We could construct the 3D face surface from a 2D image instead of using an expensive range finder; there are many algorithms available to do this. But curvature cannot be reliably calculated from such a reconstructed surface: as mentioned earlier, the curvature calculation involves second derivatives of the surface, and only high-resolution data such as laser range data makes accurate curvature calculation possible.
3D surface reconstruction is done by stripe detection and labeling. From each point of a stripe and its label, triangulation allows for X, Y, Z estimation. This process is very fast while offering sufficient resolution for recognition purposes.
There are 120 persons in their experiment. Each one is captured in three shots, corresponding to a central view, limited left/right rotation, and up/down rotation. The automatic database uses an automatic program to obtain the 3D information of each individual; in the manual database, the 3D extraction process was initialized by clicking initial points in the deformed pattern.
With the 3D reconstruction, they looked for characteristics that would reduce the 3D data to a set of features that could be easily and quickly compared. However, they found that the nose seems to be the only feature that is robust with limited effort, so they gave up on feature extraction and considered global matching of the face surface instead.
Fifteen profiles are extracted by intersecting the face surface with parallel planes spaced 1 cm apart. A distance measure called the profile distance is defined to quantify the difference between 3D surfaces. This approach is slow: about 1 second to compare two face surfaces. In order to speed up the algorithm, they tried to use only the central and lateral profiles.
Figure 10: ROC curves of 3D face surface recognition, with 15 profiles (left) and with central/lateral profiles (right).
3.7 Elastic Bunch Graph Matching
In [26], the Gabor wavelet transform is used to extract face features so that recognition performance can be invariant to pose variation. I first introduce some of the terminology they use and then discuss how they build the face recognition system.
For each feature point on the face, the image is convolved with a family of Gabor wavelets. The set of Gabor wavelets consists of 5 different spatial frequencies and 8 orientations; therefore, one feature point has 40 corresponding Gabor wavelet coefficients. A jet is defined as the set {J_i} of Gabor wavelet coefficients for one feature point. Each complex coefficient can be written as

    J_i = a_i * exp(i * phi_i)

with magnitude a_i and phase phi_i.
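A jet can be computed by correlating an image patch with a bank of complex Gabor kernels. The kernel parameterization below (frequencies, bandwidth) is an assumption for illustration, not the exact wavelet family of [26]:

```python
import numpy as np

def gabor_kernel(size, freq, theta, sigma=None):
    """Complex Gabor wavelet: a plane wave of frequency `freq` (cycles/pixel)
    in direction `theta` under a Gaussian envelope. The bandwidth choice
    sigma = 0.56/freq is an assumed, roughly one-octave setting."""
    sigma = sigma if sigma is not None else 0.56 / freq
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    u = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.exp(2j * np.pi * freq * u)

def jet(image, row, col, freqs=(0.05, 0.1, 0.2, 0.3, 0.4), n_orient=8, size=15):
    """Jet at one feature point: 5 frequencies x 8 orientations = 40 complex
    coefficients J_i = a_i * exp(i * phi_i)."""
    half = size // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    return np.array([np.sum(patch * gabor_kernel(size, f, k * np.pi / n_orient))
                     for f in freqs for k in range(n_orient)])
```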
A labeled graph G representing a face consists of N nodes connected by E edges. The nodes are located at feature points called fiducial points; the pupils, the corners of the mouth, and the tip of the nose, for example, are all fiducial points. The nodes are labeled with jets J_n. Graphs for different head poses differ in geometry and local features. To cover a wide range of possible variations in the appearance of a face, a representative set of individual graphs is combined into a stack-like structure, called the face bunch graph (FBG) (see figure 11).
Figure 11: The Face Bunch Graph serves as a general representation of faces. Each stack of discs represents a jet; one can choose the best-matching jet from the bunch of jets attached to a single node.
A set of jets referring to one fiducial point is called a bunch. An eye bunch, for instance, may include jets from closed, open, female and male eyes, etc., to cover the possible variations. The Face Bunch Graph is given the same structure as the individual graphs.
In searching for fiducial points in a new image of a face, the procedure described below selects the best-fitting jet from the bunch dedicated to each fiducial point.
The first set of graphs is generated manually. Initially, when the FBG contains only a few faces, it is necessary to check the matching results by hand. Once the FBG is rich enough (approximately 70 graphs), the matching results are quite good.
Matching an FBG on a new image is done by maximizing the graph similarity between the image graph and the FBG of the same pose. For an image graph G^I with nodes n = 1, ..., N and edges e = 1, ..., E, and an FBG B with model graphs B_m, m = 1, ..., M, the similarity is defined as

    S(G^I, B) = (1/N) * Sum_n max_m S(J_n^I, J_n^(B_m)) - (lambda/E) * Sum_e (dx_e^I - dx_e^B)^2 / (dx_e^B)^2

where dx_e denotes the edge vector of edge e and lambda weighs the edge-distortion term against the jet similarities. Since the FBG provides several jets for each fiducial point, the best one is selected and used for comparison; these best-fitting jets serve as local experts for the new image.
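The similarity combines a jet term over nodes and an edge-distortion term; a minimal sketch, using the magnitude-based (phase-insensitive) jet similarity and assuming jets are given as complex coefficient vectors:

```python
import numpy as np

def jet_similarity(J1, J2):
    """Normalized dot product of the coefficient magnitudes
    (the phase-insensitive jet similarity)."""
    a1, a2 = np.abs(J1), np.abs(J2)
    return float(a1 @ a2 / (np.linalg.norm(a1) * np.linalg.norm(a2) + 1e-12))

def graph_similarity(image_jets, image_edges, bunch_jets, model_edges, lam=1.0):
    """S(G, B): mean best-match jet similarity over nodes minus a weighted
    edge-distortion penalty. `bunch_jets[n]` is the list of jets attached to
    node n of the FBG; the edge arguments are edge vectors."""
    node_term = np.mean([max(jet_similarity(J, Jb) for Jb in bunch)
                         for J, bunch in zip(image_jets, bunch_jets)])
    edge_term = np.mean([np.sum((xi - xb)**2) / (np.sum(xb**2) + 1e-12)
                         for xi, xb in zip(image_edges, model_edges)])
    return node_term - lam * edge_term
```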
They use the FERET database to test their system. First, the size and location of the face are determined and the face image is normalized in size. In this step several FBGs of different sizes are required; the best-fitting one is used for size estimation. In the FERET database each image has a label indicating the pose, so there is no need to estimate the pose; however, the pose could be estimated automatically in a similar way to the size.
After extracting model graphs from the gallery images, recognition is performed by comparing an image graph to all model graphs and selecting the one with the highest similarity value. A comparison against a gallery of 250 individuals takes less than one second.
The poses used here are: neutral frontal view (fa), frontal view with different
expression (fb), half-profile right (hr) or left (hl), and profile right (pr) and left (pl).
Recognition results are shown in Table 6.
The recognition rate is high for frontal against frontal images (first row). This is due to the fact that two frontal views show only little variation. The recognition rate is still high for right profile against left profile (third row). When comparing left and right half-profiles, the recognition rate drops dramatically (second row). A possible reason is the variation in rotation angle: visual inspection shows that the rotation angle may differ by up to 30°. When comparing frontal views or profiles against half-profiles, a further reduction in recognition rate is observed.
From the experimental results, it is obvious that Gabor wavelet coefficients are not invariant under rotation. Before performing recognition, one still needs to estimate the pose and find the corresponding FBG.
Table 6: Recognition results for cross-runs between different galleries. The combinations in the four bottom rows are due to the fact that not all poses were available for all persons. The table also shows how often the correct face is identified as rank one and how often it is among the top ten.
3.8 Face Recognition from 2D and 3D Images
Very few research groups focus on face recognition from both 2D and 3D face images; almost all existing systems rely on a single type of face information, either 2D images or 3D range data. It is obvious that 3D range data can compensate for the lack of depth information in a 2D image, and 3D shape is also invariant under changes in illumination. Therefore, integrating 2D and 3D information is a promising way to improve recognition performance.
In [29], corresponding 3D range data and 2D images are obtained at the same time and then used to perform face recognition. Their approach consists of the following steps:
Find feature points: In order to extract both 2D and 3D features, two types of feature points are defined. One type is in 3D and is described by the point signature; the other type is in 2D and is represented by Gabor wavelet responses. Four 3D feature points and ten 2D feature points are selected, as shown in figure 12.
Figure 12: The 2D feature points and the 3D feature points (the latter shown by x).
The Gabor wavelet function has tunable orientation and frequency; therefore, it can be configured to extract a specific band of frequency components from an image. This makes it robust against translation, distortion and scaling. Here, they use a widely used version of Gabor wavelets [26]. For each feature point, there is a set of Gabor wavelets consisting of 5 different frequencies and 8 orientations. Localizing a 2D feature point is accomplished with the following steps.
1. For each pixel k in the search range, calculate the corresponding jet Jk.
2. Compute the similarity between Jk and all the jets in the corresponding bunch.
3. In the search area, the pixel with the highest similarity value is chosen as the location of the feature point.
In this paper, the point signature is chosen to describe feature points in 3D. Its main idea is summarized here. For a given point, we place a sphere centered at this point. The intersection between the sphere and the face surface is a 3D curve C. A plane P is defined as the tangential plane at the given point. The projection of C onto P forms a new curve C'. The distance between C and C', as a function of the angle around the point, is called the point signature. The point signature is invariant under rotation and translation. The definition of the point signature is also illustrated in figure 13.
Face recognition
Each face is represented by a shape vector and a texture vector. The 3694-dimensional shape vector X_s consists of the point signatures of the four 3D feature points and their eight neighboring points; each point signature is sampled at equal intervals of 10 degrees. The 4010-dimensional texture vector X_t consists of the Gabor wavelet coefficients of the 10 2D feature points. A different weight is assigned to each feature point based on its detection accuracy.
With the shape vector and texture vector of each training face, a texture subspace and a shape subspace can be constructed. For a new face image, its shape vector and texture vector are projected onto the shape subspace and texture subspace respectively, giving the projection coefficient vectors a_s and a_t. The feature vector a_st for this face is the concatenation of the two:

    a_st = (a_s^T, a_t^T)^T
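The feature vector construction amounts to two PCA projections followed by concatenation. A minimal sketch; the subspaces here are plain PCA bases, and the per-feature-point weighting mentioned in the text is omitted:

```python
import numpy as np

def pca_basis(X, k):
    """Mean and top-k principal directions (rows) of a row-sample matrix X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def combined_feature(xs, xt, shape_space, texture_space):
    """a_st = (a_s^T, a_t^T)^T: concatenation of the projections of a shape
    vector xs and a texture vector xt onto their respective subspaces."""
    (ms, Bs), (mt, Bt) = shape_space, texture_space
    return np.concatenate([Bs @ (xs - ms), Bt @ (xt - mt)])
```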
With the feature vector, the remaining task is simply choosing a classifier with a low error rate. In this paper, two classifiers are used and compared: one is based on the maximal similarity value, and the other is based on a support vector machine.
Experiment
Their experiment was carried out using face images of 50 individuals. Each person provides six facial images with viewpoint and expression variation. Some of these images are chosen as training images and the rest are used as test images, so the training set and the testing set are disjoint.
For a new test image, after localizing the feature points, a 3694-dimensional shape vector and a 4010-dimensional texture vector are calculated. These two vectors are projected into the corresponding subspaces, and the projection coefficients are combined to form an integrated feature vector.
In order to evaluate recognition performance, two test cases are run with different features, different numbers of principal components and different classifiers.
Case 1: comparison of recognition performance with different features: point signature (PS), Gabor coefficients (GC) and PS+GC. Figure 14(a) shows the recognition rate versus subspace dimension for the different chosen features. The results confirm their assumption that combining 2D and 3D information can improve recognition performance.
Case 2: comparison of the recognition performance of different classifiers: the similarity function and the Support Vector Machine. Figures 14(b), (c) and (d) show the recognition rate versus subspace dimension for the different classifiers. The result in (b) is obtained using the point signature as the feature, (c) using the Gabor coefficients, and (d) using PS+GC. With the SVM as the classifier, a higher recognition rate is obtained in all three cases.
Figure 14: recognition rate vs. subspace dimensions: (a) different chosen feature with the
same classifier; (b) chosen feature: PS; (c) chosen feature: GC; (d) chosen feature:
PS+GC.
3.9 Pose Estimation
Generally, a face recognition problem can be divided into two major parts: normalization and recognition. In normalization, we need to estimate the size, illumination, expression and pose of the face from the given image and then transform the input image into a normalized format that can be handled by the recognition algorithm. Therefore, how to estimate pose accurately and efficiently is an important problem in face recognition, and solving it is a key step in building a robust face recognition system.
Ji [10] proposes a new approach for estimating the 3D pose of the face from a single image. He assumes that the shape of the face can be approximated by an ellipse; the pose of the face can then be expressed in terms of the yaw, pitch and roll angles of the ellipse (see figures 15 and 16). His system consists of three major parts: pupil detection, face detection and pose estimation.
Figure 16: pose of face can be expressed in terms of yaw, pitch and roll angle.
Pupil detection
Under IR illumination, the pupils appear much brighter than the other parts of the face. Therefore, pupil detection can be done easily by setting a threshold. This is a robust operation and can be done in real time.
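Bright-pupil detection by thresholding can be sketched as follows; the connected-component labelling and the choice of the two largest blobs are my own illustrative additions, not details from [10]:

```python
import numpy as np

def detect_pupils(ir_image, threshold):
    """Threshold the IR image and return the centroids of the two largest
    connected bright blobs (candidate pupils). A simple flood fill labels
    the blobs; a real system would add size and shape checks."""
    mask = ir_image > threshold
    labels = np.zeros(mask.shape, dtype=int)
    blobs = []
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue                      # already assigned to a blob
        stack, pixels = [start], []
        labels[start] = len(blobs) + 1
        while stack:                      # 4-connected flood fill
            r, c = stack.pop()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = len(blobs) + 1
                    stack.append((nr, nc))
        blobs.append(pixels)
    blobs.sort(key=len, reverse=True)
    return [tuple(np.mean(b, axis=0)) for b in blobs[:2]]
```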
Face detection
The face may have a significantly different shape under different poses. In his system, face detection consists of the following steps: 1) find the approximate location of the face based on the locations of the pupils, and 2) perform the constrained sum-of-gradients maximization procedure to search for the exact face location.
In the second step, he applies a technique proposed by Birchfield [30]. Birchfield's technique is aimed at detecting the head rather than the face; to detect the face, he puts constraints on the gradient maximization procedure. The constraints he exploits are size, location and symmetry. Specifically, he draws a line between the detected pupils, as shown in figure 17. The distance between the pupils and their location are used to constrain the size and location of the ellipse: the ellipse should minimally and symmetrically include the two detected pupils. Results of face detection are shown in figure 18.
With the pupil locations, the roll angle can be easily calculated from the following equation:

    roll = arctan((p1_y - p2_y) / (p1_x - p2_x))

where p1 and p2 are the locations of the pupils. The basic idea for estimating the yaw and pitch angles is to figure out the correspondence between the 2D ellipse and the 3D ellipse; the details are not discussed in this report since the derivation would take too much space.
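The roll-angle formula is directly computable; using atan2 instead of arctan of the ratio (my own choice) avoids division by zero when the pupils are vertically aligned:

```python
import math

def roll_angle(p1, p2):
    """Roll angle of the head in degrees from two pupil locations (x, y)."""
    return math.degrees(math.atan2(p1[1] - p2[1], p1[0] - p2[0]))
```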
In order to evaluate his pose estimation algorithm, artificial ellipses are used to verify the accuracy of the estimation results. Experimental results are shown in figure 19.
Actual ellipse images with different pitches and yaws are used to further investigate the performance of his algorithm. The computed rotation angle is shown below each image in figure 20. The estimation results are consistent with the perceived orientations.
Figure 20: the estimated orientation of actual ellipses. The number below each image is
the estimated pitch or yaw angle.
The ultimate test is to study its performance on human faces. For this purpose, image sequences of faces with different pitches and yaws are captured, ellipse detection is performed on each image, and the detected ellipse is used to estimate the pose. Experimental results are shown in figure 21.
Using his algorithm, we can find the location of the face, track it, and estimate the pose. In other words, his algorithm provides almost everything we need for building a face recognition system. Better still, we can recognize faces not only from still images but also from video.
With the estimated pose, we can simply apply the hybrid-mode face recognition approach in which multiple images are available for each person: the closest pose in the database is found using the estimated pose, and the eigenface approach is then used to perform recognition. As mentioned before, there are two major drawbacks: 1) many images are needed to cover the possible views, and 2) illumination variation is not handled. Nevertheless, this type of face recognition technique is probably the most practical method up to now.
In this paper, a variety of face recognition technologies are studied. Basically, they can be divided into two major groups. In the first group, since the eigenface is an efficient method for performing face recognition under fixed illumination and pose, most recent works focus on how to compensate for the variations in pose and illumination; for example, SFS, the illumination cone and the linear object class method are in this direction. In the second group, instead of using eigenfaces, other methods try to avoid the illumination and pose problems from the very beginning of their algorithms; for example, the curvature-based method and face bunch graph matching belong to this category.
Below I give a concise summary, followed by conclusions, in the order in which the methods appear in this report.
SFS has great potential for real-world applications, since it requires only one input image and one registered image per person. However, the SFS problem is still hard to solve: the current method has high computational complexity, and the re-rendered image has poor quality. Many issues must be resolved before it can be applied to commercial or law enforcement applications.
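As a concrete illustration of why SFS is attractive despite these issues: once per-pixel surface normals and albedo have been recovered, re-rendering the face under a new light direction reduces, under a Lambertian assumption, to one dot product per pixel. A minimal sketch (the function name and array layout are my own, not taken from any cited paper):

```python
import numpy as np

def relight(normals, albedo, light):
    """Lambertian re-rendering: given per-pixel unit surface normals
    (H x W x 3) and albedo (H x W) recovered by SFS, the image under a
    new light direction s is I(x, y) = albedo(x, y) * max(0, n(x, y) . s)."""
    s = np.asarray(light, dtype=float)
    s = s / np.linalg.norm(s)                 # normalize the light direction
    shading = np.tensordot(normals, s, axes=([2], [0]))
    shading = np.clip(shading, 0.0, None)     # attached shadows clip to zero
    return albedo * shading

# A flat frontal patch lit head-on renders at full albedo everywhere.
normals = np.zeros((4, 4, 3)); normals[..., 2] = 1.0
img = relight(normals, np.ones((4, 4)), (0.0, 0.0, 1.0))
```

The hard part, of course, is the recovery of `normals` and `albedo` from a single image in the first place, which is exactly where the complexity and quality problems noted above arise.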
The linear object method is inspired by the way humans look at the world, and it provides a new angle on the face recognition problem. In [25], the authors successfully transform an image from half profile to frontal view given 49 example images. One possible direction for future work is to cover more variation in pose or illumination by extending their method.
information. One issue with their system is that the surface reconstruction algorithm needs manual adjustment to achieve a better recognition rate; a fully automatic reconstruction algorithm is possible future work in this direction.
Face bunch graph matching uses the Gabor wavelet transform to extract facial features and then collects the variations of the feature points to form a face bunch graph. The primary advantage of this method is that the features extracted by Gabor wavelets are invariant to a certain degree, so it is an efficient way to design a robust face recognition system.
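To make the feature extraction step concrete, the sketch below computes a "jet": the vector of complex Gabor filter response magnitudes at one feature point, taken over several scales and orientations. The parameter choices (two wavelengths, four orientations, sigma tied to wavelength) are illustrative assumptions, not the values used in [26]:

```python
import numpy as np

def gabor_kernel(size, wavelength, orientation, sigma):
    """Complex Gabor kernel: a complex sinusoid (carrier) modulated by a
    Gaussian envelope. `size` must be odd so the kernel is centered."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate coordinates so the carrier runs along `orientation`
    xr = x * np.cos(orientation) + y * np.sin(orientation)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.exp(1j * 2.0 * np.pi * xr / wavelength)
    return envelope * carrier

def jet(patch, wavelengths=(4, 8), orientations=4):
    """Response magnitudes of a square image patch (centered on a feature
    point) to a small bank of Gabor filters: one value per scale/orientation."""
    responses = []
    for wl in wavelengths:
        for k in range(orientations):
            g = gabor_kernel(patch.shape[0], wl, k * np.pi / orientations, wl / 2.0)
            responses.append(abs(np.sum(patch * g)))
    return np.array(responses)
```

Because only the magnitudes are kept, the jet changes slowly under small shifts of the feature point, which is the invariance property that makes bunch graph matching robust.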
Recognizing faces from video is probably the most difficult problem: one must be able to track the location of the face, estimate its pose, and then recognize it. The method in [10] provides the infrastructure for recognizing faces from video, since both face tracking and pose estimation can be done with it. We can extend it to a full face recognition system by simply adding the view-based eigenspace algorithm.
References
[1] R. Chellappa, C.L. Wilson, and S. Sirohey, "Human and machine recognition of faces: a survey," Proceedings of the IEEE, Vol. 83, pp. 705-740, 1995.
[3] W.Y. Zhao and R. Chellappa, "3D model enhanced face recognition," Proc. 2000 International Conference on Image Processing, Vol. 3, pp. 50-53, 2000.
[4] W.Y. Zhao and R. Chellappa, "SFS based view synthesis for robust face recognition," Proc. Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 285-292, 2000.
[5] P.J. Phillips, H. Moon, S.A. Rizvi, and P.J. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 10, pp. 1090-1104, Oct. 2000.
[6] R. Chellappa, C.L. Wilson, and S. Sirohey, "Human and machine recognition of faces: a survey," Proceedings of the IEEE, Vol. 83, No. 5, pp. 705-741, May 1995.
[7] G. Gordon, "Face recognition from depth maps and surface curvature," Proc. SPIE, Geometric Methods in Computer Vision, Vol. 1570, San Diego, July 1991.
[8] C. Beumier and M. Acheroy, "Automatic 3D face authentication," Image and Vision Computing, Vol. 18, No. 4, pp. 315-321, Feb. 1999.
[9] A.M. Haider and T. Kaneko, "Automated 3D-2D projective registration of human facial images using edge features," International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI), accepted for publication.
[10] Q. Ji, "3D face pose estimation and tracking from a monocular camera," Image and Vision Computing, Vol. 20, No. 7, pp. 499-511, 2002.
[11] S. Sakamoto, R. Ishiyama, and J. Tajima, "3D model-based face recognition system with robustness against illumination changes," NEC Research and Development, Vol. 43, pp. 15-19, 2002.
[12] P.N. Belhumeur and D. Kriegman, "What is the set of images of an object under all possible illumination conditions?" International Journal of Computer Vision, Vol. 28, No. 3, pp. 245-260, 1998.
[13] W. Zhao, Robust Image Based 3D Face Recognition, PhD thesis, University of Maryland, 1999.
[14] A.S. Georghiades, P.N. Belhumeur, and D.J. Kriegman, "From few to many: illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 6, pp. 643-660, Jun. 2001.
[15] P.N. Belhumeur, D.J. Kriegman, and A.L. Yuille, "The bas-relief ambiguity," Proc. 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1060-1066, Jun. 1997.
[18] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 711-720, Jul. 1997.
[19] Y. Adini, Y. Moses, and S. Ullman, "Face recognition: the problem of compensating for changes in illumination direction," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 721-732, Jul. 1997.
[20] D.W. Jacobs, P.N. Belhumeur, and R. Basri, "Comparing images under variable illumination," Proc. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 610-617, Jun. 1998.
[21] P.N. Belhumeur and D.J. Kriegman, "What is the set of images of an object under all possible lighting conditions?" Proc. 1996 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '96), pp. 270-277, Jun. 1996.
[22] P.W. Hallinan, "A low-dimensional representation of human faces for arbitrary lighting conditions," Proc. 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '94), pp. 995-999, Jun. 1994.
[23] J.J. Atick, P.A. Griffin, and A.N. Redlich, "Statistical approach to shape from shading: reconstruction of 3D face surfaces from single 2D images," Neural Computation, Vol. 8, pp. 1321-1340, 1996.
[24] A.S. Georghiades, P.N. Belhumeur, and D.J. Kriegman, "Illumination-based image synthesis: creating novel images of human faces under differing pose and lighting," Proc. IEEE Workshop on Multi-View Modeling and Analysis of Visual Scenes (MVIEW '99), pp. 47-54, 1999.
[25] T. Vetter and T. Poggio, "Linear object classes and image synthesis from a single example image," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 733-742, Jul. 1997.
[26] L. Wiskott, J.-M. Fellous, N. Kruger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 775-779, Jul. 1997.
[27] A. Pentland, B. Moghaddam, and T. Starner, "View-based and modular eigenspaces for face recognition," Proc. 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '94), pp. 84-91, Jun. 1994.
[28] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, Vol. 3, No. 1, pp. 71-86, Winter 1991.
[29] Y. Wang, C.-S. Chua, Y.-K. Ho, and Y. Ren, "Integrated 2D and 3D images for face recognition," Proc. 11th International Conference on Image Analysis and Processing, pp. 48-53, Sep. 2001.
[30] S. Birchfield, "Elliptical head tracking using intensity gradients and color histograms," Proc. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 232-237, Jun. 1998.
[38] B.S. Manjunath, R. Chellappa, and C. von der Malsburg, "A feature based approach to face recognition," Proc. 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '92), pp. 373-378, Jun. 1992.
[39] R. Alferez and Y.-F. Wang, "Geometric and illumination invariants for object recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 6, pp. 505-536, Jun. 1999.
[40] T. Akimoto, Y. Suenaga, and R.S. Wallace, "Automatic creation of 3D facial models," IEEE Computer Graphics and Applications, Vol. 13, No. 5, pp. 16-22, Sep. 1993.
[41] W.T. Freeman and J.B. Tenenbaum, "Learning bilinear models for two-factor problems in vision," Proc. 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 554-560, Jun. 1997.
[42] S.K. Nayar, K. Ikeuchi, and T. Kanade, "Surface reflection: physical and geometrical perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 7, pp. 611-634, Jul. 1991.
[43] B.K.P. Horn and M.J. Brooks, Shape from Shading, MIT Press, Cambridge, Mass., 1989.