
AdaBoost & Its Applications

Outline
Overview
The AdaBoost Algorithm
How and Why AdaBoost Works
AdaBoost for Face Detection
AdaBoost & Its Applications
Overview
Introduction
AdaBoost = Adaptive Boosting
A learning algorithm: building a strong classifier from a lot of weaker ones.
AdaBoost Concept
Weak classifiers, each slightly better than random:
$h_1(x) \in \{-1, +1\}$
$h_2(x) \in \{-1, +1\}$
...
$h_T(x) \in \{-1, +1\}$
Strong classifier:
$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
The Weak Classifiers
Each weak classifier learns by considering one simple feature.
The T most beneficial features for classification should be selected.
How to
define features?
select beneficial features?
train weak classifiers?
manage (weight) training samples?
associate a weight with each weak classifier?
The Strong Classifier
The strong classifier is a weighted combination of the weak classifiers:
$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
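To make the weak-to-strong combination concrete, here is a minimal sketch (illustrative Python, not from the slides) of how the weighted vote $H(x) = \operatorname{sign}(\sum_t \alpha_t h_t(x))$ is evaluated:

```python
def strong_classify(x, weak_classifiers, alphas):
    """Evaluate H(x) = sign(sum_t alpha_t * h_t(x)).

    weak_classifiers: callables, each returning +1 or -1 for an input x
    alphas: one weight per weak classifier
    """
    score = sum(a * h(x) for a, h in zip(alphas, weak_classifiers))
    return 1 if score >= 0 else -1   # ties broken toward +1 by convention here
```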
AdaBoost & Its Applications
The AdaBoost Algorithm
The AdaBoost Algorithm
Given: $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in X$, $y_i \in \{-1, +1\}$
Initialization: $D_1(i) = \frac{1}{m}, \quad i = 1, \ldots, m$
($D_t(i)$: probability distribution over the $x_i$'s at time $t$)
For $t = 1, \ldots, T$:
Find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error w.r.t. $D_t$, i.e.,
$h_t = \arg\min_{h_j} \varepsilon_j$, where $\varepsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$ (minimize the weighted error)
Weight classifier: $\alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}$ (chosen to minimize the exponential loss)
Update distribution: $D_{t+1}(i) = \frac{D_t(i) \exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is for normalization (gives misclassified patterns more chance to be learned)
Output final classifier: $H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
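As a concrete companion to the pseudocode, below is a minimal, self-contained AdaBoost sketch in Python. Decision stumps are used as the weak learners purely for illustration; the algorithm itself places no restriction on the base learner.

```python
import numpy as np

def train_adaboost(X, y, T):
    """Minimal AdaBoost with decision stumps as weak learners.
    X: (m, n) feature matrix; y: labels in {-1, +1}; T: number of rounds.
    Returns a list of stumps (feature, threshold, polarity, alpha)."""
    m, n = X.shape
    D = np.full(m, 1.0 / m)                     # D_1(i) = 1/m
    ensemble = []
    for _ in range(T):
        best, best_err, best_pred = None, np.inf, None
        # Exhaustively find the stump minimizing the weighted error w.r.t. D_t
        for f in range(n):
            for thresh in np.unique(X[:, f]):
                for polarity in (1, -1):
                    pred = np.where(polarity * X[:, f] < polarity * thresh, 1, -1)
                    err = D[pred != y].sum()
                    if err < best_err:
                        best, best_err, best_pred = (f, thresh, polarity), err, pred
        eps = np.clip(best_err, 1e-12, 1 - 1e-12)   # guard the logarithm
        alpha = 0.5 * np.log((1 - eps) / eps)       # alpha_t = (1/2) ln((1-eps)/eps)
        D *= np.exp(-alpha * y * best_pred)         # up-weight misclassified samples
        D /= D.sum()                                # Z_t normalization
        ensemble.append(best + (alpha,))
    return ensemble

def adaboost_predict(ensemble, X):
    """H(x) = sign(sum_t alpha_t * h_t(x))."""
    score = np.zeros(X.shape[0])
    for f, thresh, polarity, alpha in ensemble:
        score += alpha * np.where(polarity * X[:, f] < polarity * thresh, 1, -1)
    return np.where(score >= 0, 1, -1)
```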
Boosting illustration
Weak
Classifier 1
Boosting illustration
Weights
Increased
Boosting illustration
Weak
Classifier 2
Boosting illustration
Weights
Increased
Boosting illustration
Weak
Classifier 3
Boosting illustration
Final classifier is
a combination of weak
classifiers
AdaBoost & Its Applications
How and Why AdaBoost Works
The AdaBoost Algorithm
(algorithm as above)
What goal does AdaBoost want to reach?
The choices $\alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}$ for the classifier weights and $D_{t+1}(i) \propto D_t(i) \exp[-\alpha_t y_i h_t(x_i)]$ for the distribution update are goal dependent.
Goal
Minimize the exponential loss:
$$loss(H(x)) = E_{x,y}\left[e^{-yH(x)}\right]$$
Final classifier: $H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
[Plot: the loss $e^{-yH(x)}$ decays as the margin $yH(x)$ grows.]
Minimizing the exponential loss maximizes the margin $yH(x)$.
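A quick numeric illustration (not from the slides) of how the exponential loss rewards large margins:

```python
import numpy as np

# The exponential loss e^{-yH(x)} as a function of the margin y*H(x):
# badly wrong predictions are penalized heavily, while confident
# correct ones contribute almost nothing.
margins = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # values of y * H(x)
print(np.round(np.exp(-margins), 3))              # [7.389 1.649 1.    0.607 0.135]
```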
Goal
Minimize $E_{x,y}\left[e^{-yH_t(x)}\right] = E_x\left[E_y\left[e^{-yH_t(x)} \mid x\right]\right]$.
Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$; then $H(x) = H_T(x)$.
$E_x\left[E_y\left[e^{-y[H_{t-1}(x) + \alpha_t h_t(x)]} \mid x\right]\right] = E_x\left[E_y\left[e^{-yH_{t-1}(x)}\, e^{-\alpha_t y h_t(x)} \mid x\right]\right]$
$= E_x\left[e^{-yH_{t-1}(x)}\left(e^{-\alpha_t} P(y = h_t(x)) + e^{\alpha_t} P(y \neq h_t(x))\right)\right]$
Set the derivative to zero, $\frac{\partial\, E_{x,y}\left[e^{-yH_t(x)}\right]}{\partial \alpha_t} = 0$:
$E_x\left[e^{-yH_{t-1}(x)}\left(-e^{-\alpha_t} P(y = h_t(x)) + e^{\alpha_t} P(y \neq h_t(x))\right)\right] = 0$
$\alpha_t = ?$
Solving for $\alpha_t$:
$\alpha_t = \frac{1}{2} \ln \frac{P(y = h_t(x))}{P(y \neq h_t(x))} = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}$, with $\varepsilon_t = P(\text{error})$
With $P(x_i, y_i) = D_t(i)$, the error is estimated as $\varepsilon_t \approx \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_t(x_i)]$.
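As a quick sanity check on this weighting rule (a worked example, not from the slides):

$$\alpha_t = \tfrac{1}{2}\ln\tfrac{1-\varepsilon_t}{\varepsilon_t}: \qquad \varepsilon_t = 0.1 \;\Rightarrow\; \alpha_t = \tfrac{1}{2}\ln 9 \approx 1.10, \qquad \varepsilon_t = 0.5 \;\Rightarrow\; \alpha_t = 0, \qquad \varepsilon_t = 0.9 \;\Rightarrow\; \alpha_t \approx -1.10$$

A very accurate weak classifier gets a large vote, a random guesser gets no vote, and a worse-than-random one would get a negative vote.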
In the algorithm, the expectation is taken with $x$ drawn from the current distribution: $loss(H(x)) = E_{x \sim D,\, y}\left[e^{-yH(x)}\right]$.
Next question: $D_{t+1} = ?$
Expand the per-round objective to second order in $\alpha_t$:
$E_{x,y}\left[e^{-yH_t(x)}\right] = E_{x,y}\left[e^{-yH_{t-1}(x)}\, e^{-\alpha_t y h_t(x)}\right] \approx E_{x,y}\left[e^{-yH_{t-1}(x)}\left(1 - \alpha_t y h_t(x) + \frac{\alpha_t^2 y^2 h_t^2(x)}{2}\right)\right]$
$h_t = \arg\min_h E_{x,y}\left[e^{-yH_{t-1}(x)}\left(1 - \alpha_t y h(x) + \frac{\alpha_t^2 y^2 h^2(x)}{2}\right)\right]$
Since $y^2 h^2(x) = 1$:
$h_t = \arg\min_h E_{x,y}\left[e^{-yH_{t-1}(x)}\left(1 - \alpha_t y h(x) + \frac{\alpha_t^2}{2}\right)\right] = \arg\min_h E_x\left[E_y\left[e^{-yH_{t-1}(x)}\left(1 - \alpha_t y h(x) + \frac{\alpha_t^2}{2}\right) \Big|\, x\right]\right]$
Dropping the terms that do not depend on $h$:
$h_t = \arg\max_h E_x\left[E_y\left[e^{-yH_{t-1}(x)}\, y\, h(x) \mid x\right]\right]$
$= \arg\max_h E_x\left[(+1)\, h(x)\, e^{-H_{t-1}(x)} P(y = 1 \mid x) + (-1)\, h(x)\, e^{H_{t-1}(x)} P(y = -1 \mid x)\right]$
Equivalently, treating $(x, y) \sim e^{-yH_{t-1}(x)} P(y \mid x)$ as a reweighted data distribution:
$h_t = \arg\max_h E_{x,y \sim e^{-yH_{t-1}(x)} P(y|x)}\left[y\, h(x)\right]$
This is maximized when $h(x)$ agrees with $y$ under the reweighted distribution, i.e.,
$h_t(x) = \operatorname{sign}\left(P_{x,y \sim e^{-yH_{t-1}(x)} P(y|x)}(y = 1 \mid x) - P_{x,y \sim e^{-yH_{t-1}(x)} P(y|x)}(y = -1 \mid x)\right) = \operatorname{sign}\left(E_{x,y \sim e^{-yH_{t-1}(x)} P(y|x)}\left[y \mid x\right]\right)$
At time $t$, the weak learner should therefore be trained on examples weighted as $(x, y) \sim e^{-yH_{t-1}(x)} P(y \mid x)$.
At time 1: $(x, y) \sim P(y \mid x)$ with $P(y_i \mid x_i) = 1$, so $D_1(i) = \frac{1}{Z_1} = \frac{1}{m}$.
At time $t+1$: $(x, y) \sim e^{-yH_t(x)} P(y \mid x) \propto D_t(i)\, e^{-\alpha_t y h_t(x)}$, which is exactly the algorithm's update
$D_{t+1}(i) = \frac{D_t(i) \exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is for normalization.
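In code, the update this derivation justifies is essentially one line (a sketch, assuming numpy arrays of labels and predictions in {-1, +1}):

```python
import numpy as np

def update_distribution(D, alpha, y, h_x):
    """D_{t+1}(i) = D_t(i) * exp(-alpha_t * y_i * h_t(x_i)) / Z_t."""
    D_next = D * np.exp(-alpha * y * h_x)   # up-weights points where y != h(x)
    return D_next / D_next.sum()            # divide by Z_t to renormalize
```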
AdaBoost & Its Applications
AdaBoost for Face Detection
The Task of
Face Detection
Many slides adapted from P. Viola
Basic Idea
Slide a window across the image and evaluate a face model at every location.
Challenges
The sliding-window detector must evaluate tens of thousands of location/scale combinations.
Faces are rare: 0-10 per image.
For computational efficiency, we should try to spend as little time as possible on the non-face windows.
A megapixel image has ~10^6 pixels and a comparable number of candidate face locations.
To avoid having a false positive in every image, our false positive rate has to be less than 10^-6.
The Viola/Jones Face Detector
A seminal approach to real-time object detection
Training is slow, but detection is very fast
Key ideas:
Integral images for fast feature evaluation
Boosting for feature selection
Attentional cascade for fast rejection of non-face windows
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
Image Features
Rectangle filters
Feature value = Σ(pixels in white area) − Σ(pixels in black area)
Size of Feature Space
How many possible rectangle features are there for a 24x24 detection region?
Two-rectangle features (A+B): $\sum_{w=1}^{12} \sum_{h=1}^{24} 2\,(24 - 2w + 1)(24 - h + 1)$
Three-rectangle features (C): $\sum_{w=1}^{8} \sum_{h=1}^{24} 2\,(24 - 3w + 1)(24 - h + 1)$
Four-rectangle features (D): $\sum_{w=1}^{12} \sum_{h=1}^{12} (24 - 2w + 1)(24 - 2h + 1)$
Total: ≈ 160,000
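The sums above are easy to check in a few lines of Python (an illustrative count; it evaluates exactly the three expressions on this slide and totals 162,336, i.e., roughly 160,000):

```python
def count_rectangle_features(win=24):
    """Count two-, three-, and four-rectangle features in a win x win window."""
    two = sum(2 * (win - 2 * w + 1) * (win - h + 1)
              for w in range(1, win // 2 + 1) for h in range(1, win + 1))
    three = sum(2 * (win - 3 * w + 1) * (win - h + 1)
                for w in range(1, win // 3 + 1) for h in range(1, win + 1))
    four = sum((win - 2 * w + 1) * (win - 2 * h + 1)
               for w in range(1, win // 2 + 1) for h in range(1, win // 2 + 1))
    return two + three + four

print(count_rectangle_features())   # 162336
```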
Feature Selection
Of the ~160,000 possible rectangle features for a 24x24 detection region:
What features are good for face detection?
Can we create a good classifier using just a small subset of all possible features?
How do we select such a subset?
Integral Images
The integral image computes a value at each pixel (x, y) that is the sum of the pixel values above and to the left of (x, y), inclusive:
$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$
Computing the Integral Image
The integral image can be computed in one pass through the image using the recurrences
$s(x, y) = s(x - 1, y) + i(x, y)$
$ii(x, y) = ii(x, y - 1) + s(x, y)$
where $s(x, y)$ is the cumulative row sum, with $s(-1, y) = 0$ and $ii(x, -1) = 0$.
Computing Sum within a Rectangle
For a rectangle with corners A (top-left), B (top-right), C (bottom-left), and D (bottom-right) in the integral image:
$sum = ii(D) - ii(B) - ii(C) + ii(A)$
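A minimal numpy sketch of both operations, building the integral image and evaluating a rectangle sum with four lookups:

```python
import numpy as np

def integral_image(img):
    """ii(x, y): sum of all pixels above and to the left of (x, y), inclusive.
    Two cumulative sums implement the one-pass recurrences."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum over img[top:bottom+1, left:right+1] via ii(D) - ii(B) - ii(C) + ii(A)."""
    total = ii[bottom, right]                      # ii(D)
    if top > 0:
        total -= ii[top - 1, right]                # ii(B)
    if left > 0:
        total -= ii[bottom, left - 1]              # ii(C)
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]             # ii(A)
    return total

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 2, 2) == img[1:3, 1:3].sum()   # 30
```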
Scaling
The integral image enables us to evaluate rectangle sums of any size in constant time.
Therefore, no image scaling is necessary: scale the rectangular features instead!
Boosting
Boosting is a classification scheme that works by
combining weak learners into a more accurate
ensemble classifier
A weak learner need only do better than chance
Training consists of multiple boosting rounds
During each boosting round, we select a weak learner
that does well on examples that were hard for the
previous weak learners
Hardness is captured by weights attached to training
examples
Y. Freund and R. Schapire, A short introduction to boosting, Journal of Japanese
Society for Artificial Intelligence, 14(5):771-780, September, 1999.
The AdaBoost Algorithm
(repeated from earlier: initialize $D_1(i) = 1/m$; each round, pick the weak classifier $h_t$ with the lowest weighted error $\varepsilon_t$, weight it by $\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$, re-weight the examples, and output $H(x) = \operatorname{sign}\left(\sum_t \alpha_t h_t(x)\right)$)
Weak Learners for Face Detection
What base learner is proper for face detection?
Weak Learners for Face Detection
$h_t(x) = \begin{cases} 1 & \text{if } p_t f_t(x) < p_t \theta_t \\ 0 & \text{otherwise} \end{cases}$
where $x$ is a 24x24-pixel sub-window, $f_t$ is the value of a rectangle feature, $p_t \in \{+1, -1\}$ is a parity bit, and $\theta_t$ is a threshold.
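In code, one such weak learner is just a thresholded feature (a sketch; the feature callable and parameter names are illustrative, not the original implementation):

```python
def weak_classifier(window_ii, feature, theta, p):
    """Viola-Jones-style weak learner: h(x) = 1 if p * f(x) < p * theta else 0.

    window_ii: integral image of a 24x24 sub-window
    feature:   callable computing one rectangle-feature value from window_ii
    theta:     learned threshold; p: parity in {+1, -1} flipping the inequality
    """
    return 1 if p * feature(window_ii) < p * theta else 0
```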
Boosting
The training set contains face and non-face examples, initially all with equal weight.
For each round of boosting:
Evaluate each rectangle filter on each example
Select the best threshold for each filter
Select the best filter/threshold combination
Reweight the examples
Computational complexity of learning: O(MNK), for M rounds, N examples, and K features.
Features Selected by Boosting
First two features selected by boosting:
This feature combination can yield a 100% detection rate with a 50% false positive rate.
ROC Curve for a 200-Feature Classifier
A 200-feature classifier can yield a 95% detection rate at a false positive rate of 1 in 14,084.
To be practical for real applications, the false positive rate must be closer to 1 in 1,000,000.
Attentional Cascade
[Cascade diagram: IMAGE SUB-WINDOW → Classifier 1 → Classifier 2 → Classifier 3 → … → FACE; an F (negative) response at any stage sends the sub-window to NON-FACE.]
We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows.
A positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on.
A negative outcome at any point leads to the immediate rejection of the sub-window.
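The control flow of the cascade is simple; here is a minimal sketch (assuming each stage is a callable returning True for face-like windows):

```python
def cascade_classify(window, stages):
    """Evaluate the attentional cascade on one sub-window.
    Most non-face windows exit cheaply at the first stages."""
    for stage in stages:
        if not stage(window):
            return False    # F at any stage: immediate rejection (NON-FACE)
    return True             # T at every stage: FACE
```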
Attentional Cascade
Chain classifiers that are progressively more complex and have lower false positive rates.
[ROC curve: % Detection vs. % False Pos.]
Detection Rate and False Positive Rate for Chained Classifiers
The detection rate and the false positive rate of the cascade are found by multiplying the respective rates of the individual stages: with per-stage rates $f_i, d_i$,
$F = \prod_{i=1}^{K} f_i, \qquad D = \prod_{i=1}^{K} d_i$
A detection rate of 0.9 and a false positive rate on the order of $10^{-6}$ can be achieved by a 10-stage cascade if each stage has a detection rate of 0.99 ($0.99^{10} \approx 0.9$) and a false positive rate of about 0.30 ($0.30^{10} \approx 6 \times 10^{-6}$).
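These two products are easy to verify (a quick numeric check of the figures quoted above):

```python
print(0.99 ** 10)   # ~0.904   -> cascade detection rate D
print(0.30 ** 10)   # ~5.9e-06 -> cascade false positive rate F
```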
Training the Cascade
Set target detection and false positive rates for each stage.
Keep adding features to the current stage until its target rates have been met:
Need to lower the AdaBoost threshold to maximize detection (as opposed to minimizing total classification error)
Test on a validation set
If the overall false positive rate is not low enough, then add another stage.
Use the false positives from the current stage as the negative training examples for the next stage. A sketch of this loop appears below.
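A high-level sketch of the training loop (the stage-training and evaluation interfaces are hypothetical stand-ins passed as callables, not the original implementation):

```python
def train_cascade(pos, neg, train_stage, evaluate, f_target, d_target, F_goal):
    """train_stage(pos, neg, n) -> stage classifier using n features;
    evaluate(stage, pos, neg) -> (false positive rate, detection rate).
    Each stage's AdaBoost threshold is assumed to be lowered inside
    train_stage so that the per-stage detection target can be met."""
    stages, F = [], 1.0
    while F > F_goal:
        n = 0
        while True:                      # keep adding features to this stage
            n += 1
            stage = train_stage(pos, neg, n)
            f, d = evaluate(stage, pos, neg)
            if f <= f_target and d >= d_target:
                break                    # per-stage targets met
        stages.append(stage)
        F *= f
        # False positives of the cascade so far become the negative
        # training examples for the next stage.
        neg = [x for x in neg if all(s(x) for s in stages)]
    return stages
```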
ROC Curves: Cascaded Classifier vs. Monolithic Classifier
There is little difference between the two in terms of accuracy.
There is a big difference in terms of speed: the cascaded classifier is nearly 10 times faster, since its first stage throws out most non-faces so that they are never evaluated by subsequent stages.
The Implemented System
Training data:
5,000 faces, all frontal, rescaled to 24x24 pixels
300 million non-face sub-windows from 9,500 non-face images
Faces are normalized for scale and translation.
Many variations: across individuals, illumination, and pose.
Structure of the Detector Cascade
Combining successively more complex classifiers in a cascade: 38 stages with a total of 6,060 features.
[Cascade diagram: All Sub-Windows → stages 1, 2, 3, …, 38 → Face; an F at any stage goes to Reject Sub-Window.]
Stage 1: 2 features, rejects 50% of non-faces while detecting 100% of faces
Stage 2: 10 features, rejects 80% of non-faces while detecting 100% of faces
Stage 3: 25 features
Stage 4: 50 features
(later stages selected by the algorithm)
Speed of the Final Detector
On a 700 MHz Pentium III processor, the face detector can process a 384x288-pixel image in about 0.067 seconds (~15 Hz).
This is 15 times faster than the previous detector of comparable accuracy (Rowley et al., 1998).
An average of 8 features are evaluated per window on the test set.
Image Processing
During training, all example sub-windows were variance-normalized to minimize the effect of different lighting conditions; during detection, each window is variance-normalized as well:
$\sigma^2 = \frac{1}{N} \sum x^2 - m^2$
The mean $m$ can be computed using the integral image; $\sum x^2$ can be computed using an integral image of the squared image.
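Both terms come from integral images, so per-window normalization stays cheap. A sketch, reusing integral_image and rect_sum from the earlier example:

```python
def window_variance(ii, ii_sq, top, left, bottom, right):
    """sigma^2 = (1/N) * sum(x^2) - m^2 for one sub-window, given an
    integral image ii and an integral image of the squared pixels ii_sq."""
    n = (bottom - top + 1) * (right - left + 1)
    m = rect_sum(ii, top, left, bottom, right) / n            # window mean
    mean_sq = rect_sum(ii_sq, top, left, bottom, right) / n   # mean of x^2
    return mean_sq - m * m
```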
Scanning the Detector
Scaling is achieved by scaling the detector itself, rather than the image; good detection results are obtained with a scaling factor of 1.25.
The detector is also scanned across locations: subsequent locations are obtained by shifting the window by [sΔ] pixels, where s is the current scale.
Results for Δ = 1.0 and Δ = 1.5 were reported.
Merging Multiple Detections
ROC Curves for Face Detection
Output of Face Detector on Test Images
Other Detection Tasks
Facial Feature Localization
Profile Detection
Male vs. Female
Conclusions
How AdaBoost works
Why AdaBoost works
AdaBoost for face detection:
Rectangle features
Integral images for fast feature computation
Boosting for feature selection
Attentional cascade for fast rejection of negative windows