
Image Processing

Outline

• Logistics

• Motivation

• Convolution

• Filtering
Waitlist

• We are at 103 enrolled with 158 students on the waitlist. This room holds 107.

• I’m getting numerous requests of the form “how likely is it that I’ll get registered?” Unlikely :(

• If you are considering dropping, please do so quickly
Some final class philosophies
• The diverse background of the class means some topics will be redundant for some folks and new for others (e.g., EE folks might be bored by today’s signal processing)

• I think 1-way lectures are boring (and such content can easily be found elsewhere). Discussions are way more fun! I encourage you to come to class.

• I hate PowerPoint. I’d rather write on the board, but this room is not conducive to it. I still encourage you to take notes.

• If you are going to come and check e-mail / Facebook, I’d rather you
drop now to make room for someone else who’d get more out of
lecture.
Computational perspective
Credited with early computational approach for vision

David Marr, 1970s


Fei-Fei Li & Andrej Karpathy, Lecture 1, 5 Jan 15

David Marr, 1982

Low-level Mid-level High-level


Low-level vision

Finding edges, blobs, bars, etc….


Consider a family of low-level image processing operations
Photoshop / Instagram filters: blur, sharpen, colorize, etc.

Are certain combinations redundant? Is there a mathematical way to characterize them?


Recall: what is a digital (grayscale) image?

Matrix of integer values


Images as height fields

Let’s think of images as zero-padded functions

F[i,j]
Characterizing image transformations

F[i,j] → T → G[i,j]

5 4 2 3 7 4 6 5 3 6     F[i] → T → G[i]     5 4 2 3 7 4 6 5 3 6

$G = T(F)$
$G[i] = T(F[i])$
(Abuse of notation: this does not mean the transformation is applied at each pixel separately)
How do we characterize image processing operations?
Properties of “nice” functional transformations

Additivity
$T(F_1 + F_2) = T(F_1) + T(F_2)$

Scaling
$T(\alpha F) = \alpha T(F)$

Direct consequence: Linearity
$T(\alpha F_1 + \beta F_2) = \alpha G_1 + \beta G_2$, where $G_1 = T(F_1)$ and $G_2 = T(F_2)$

Shift Invariance
$G[i - j] = T(F[i - j])$
Unit impulse
[also called delta function]

$\delta[i] = 1$ for $i = 0$ (0 otherwise)

What does this look like for an image?

Any function can be written as a linear combination of shifted and scaled impulses

$F[i] = ?$
$F[i] = F[0]\,\delta[i] + F[1]\,\delta[i-1] + \ldots$
$F[i] = \sum_u F[u]\,\delta[i-u]$

Figure 1: Staircase approximation to a continuous-time signal.

Representing signals with impulses. Any signal can be expressed as a sum of scaled and shifted unit impulses. We begin with the pulse or “staircase” approximation to a continuous signal, as illustrated in Fig. 1. Conceptually, this is trivial: for each discrete sample of the original signal, we make a pulse signal. Then we add up all these pulse signals to make up the approximate signal.
Convolution
Each of these pulse signals can in turn be represented as a standard pulse scaled by the appropriate value and shifted to the appropriate place. In mathematical notation:

$T(F[i]) = \sum_u F[u]\,T(\delta[i-u])$

$G[i] = \sum_u F[u]\,H[i-u]$, where $H[i] = T(\delta[i])$ and $G[i] = T(F[i])$

$G = F * H$

$H$: impulse response, filter, kernel

As the pulse width approaches zero, the approximation becomes better and better, and in the limit it equals the original signal. The summation then approaches an integral, and the pulse approaches the unit impulse.
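The claim that $H = T(\delta)$ fully characterizes a linear shift-invariant system can be checked numerically. The sketch below is illustrative only: `mystery_system` is a hypothetical black box invented for this example (not something from the lecture), and `conv_full` is a hand-rolled helper. It probes the system with a unit impulse, reads off $H$, and then compares the system's response to another signal against $F * H$.

```python
def conv_full(f, h):
    """Full 1-D convolution: G[i] = sum_u F[u] * H[i - u]."""
    g = [0] * (len(f) + len(h) - 1)
    for u, fu in enumerate(f):
        for w, hw in enumerate(h):
            g[u + w] += fu * hw   # F[u] * H[w] lands at output index i = u + w
    return g


def mystery_system(f):
    """Hypothetical LSI black box: G[i] = 2*F[i] + F[i-1] - F[i-2] (F is zero outside its support)."""
    padded = list(f) + [0, 0]     # leave room for the delayed terms

    def at(i):
        return padded[i] if i >= 0 else 0

    return [2 * at(i) + at(i - 1) - at(i - 2) for i in range(len(padded))]


# Probe the black box with a unit impulse to read off its kernel H = T(delta).
H = mystery_system([1])

# The response to any other signal should then equal convolution with H.
F = [5, 4, 2, 3, 7]
predicted = conv_full(F, H)
observed = mystery_system(F)
print(H, predicted == observed)
```

The same probing would not work for a system that is non-linear or not shift-invariant, which is exactly why those two properties matter.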
January 20, 2015

Example
$G[i] = F[i] * H[i] = \sum_u F[u]\,H[i-u]$

H               F
1 2 3     *     5 4 2 3 7 4 6 5 3 6

0 1 2           0 1 2 3 4 5 6 7 8 9

G[0] = ?
G[1] = ?
Example
1 2 3 * 5 4 2 3 7 4 6 5 3 6

-3 -2 -1 0 1 2 3 4 5 6 7 8 9

3 2 1

G[0] = 5x1 = 5
G[1] = 5x2+ 4x1 = 14
G[2] = 5x3 + 4x2 + 2x1 = 25
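The worked values above can be reproduced with a few lines of code. This is a minimal sketch of the definition $G[i] = \sum_u F[u]H[i-u]$ with zero padding; the helper name `conv_full` is ours, not part of the lecture.

```python
def conv_full(f, h):
    """Full 1-D convolution with zero padding: G[i] = sum_u F[u] * H[i - u]."""
    g = [0] * (len(f) + len(h) - 1)
    for i in range(len(g)):
        for u in range(len(f)):
            if 0 <= i - u < len(h):      # H is zero outside its support
                g[i] += f[u] * h[i - u]
    return g


H = [1, 2, 3]
F = [5, 4, 2, 3, 7, 4, 6, 5, 3, 6]
G = conv_full(F, H)
print(G[:3])  # [5, 14, 25], matching G[0], G[1], G[2] above
```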

Preview of 2D
h

f
Properties of convolution

$F * H = H * F$   Commutative

$(F * H) * G = F * (H * G)$   Associative

$(F * G) + (H * G) = (F + H) * G$   Distributive

Implies that we can efficiently implement complex operations

Powerful way to think about any image transformation that satisfies additivity, scaling, and shift-invariance
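A quick numerical sanity check of the three properties, as a sketch on small integer signals. The signals `F`, `H`, `K` and the helpers `conv_full` and `add` are made up for illustration; `K` plays the role of the third signal $G$ in the identities.

```python
def conv_full(f, h):
    """Full 1-D convolution: G[i] = sum_u F[u] * H[i - u]."""
    g = [0] * (len(f) + len(h) - 1)
    for u, fu in enumerate(f):
        for w, hw in enumerate(h):
            g[u + w] += fu * hw
    return g


def add(a, b):
    """Element-wise sum of two signals, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


F = [5, 4, 2, 3]
H = [1, 2, 3]
K = [1, -1]          # hypothetical third signal (the "G" of the identities)

commutative = conv_full(F, H) == conv_full(H, F)
associative = conv_full(conv_full(F, H), K) == conv_full(F, conv_full(H, K))
distributive = add(conv_full(F, K), conv_full(H, K)) == conv_full(add(F, H), K)
print(commutative, associative, distributive)
```

Associativity is what makes the efficiency remark concrete: two small filters can be combined once into `conv_full(H, K)` and then applied to every image in a single pass.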
Proof: commutativity
$H * F = \sum_u H[u]\,F[i-u] = \sum_{u'} H[i-u']\,F[u']$, where $u' = i - u$
$\qquad = \sum_u F[u]\,H[i-u] = F * H$

Conceptually wacky: allows us to interchange the filter and image


Size
Given F of length N and H of length M, what's the size of G = F * H?

>> conv(F,H,'full')     N+M-1
>> conv(F,H,'valid')    N-M+1
>> conv(F,H,'same')     N
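The three output sizes can be illustrated by cropping the full result. This is a sketch only: the function `conv` below is our own helper that mimics the three shape options, and its 'same' cropping assumes an odd-length H so that the kernel has an unambiguous center.

```python
def conv(f, h, shape="full"):
    """1-D convolution returning the 'full', 'same' or 'valid' part of the result."""
    n, m = len(f), len(h)
    full = [0] * (n + m - 1)
    for u in range(n):
        for w in range(m):
            full[u + w] += f[u] * h[w]
    if shape == "full":
        return full                      # every shift with any overlap: N + M - 1
    if shape == "same":
        start = (m - 1) // 2             # assumes odd M (centered kernel)
        return full[start:start + n]     # central part, same length as F: N
    if shape == "valid":
        return full[m - 1:n]             # shifts with complete overlap: N - M + 1
    raise ValueError("shape must be 'full', 'same' or 'valid'")


F = [5, 4, 2, 3, 7, 4, 6, 5, 3, 6]       # N = 10
H = [1, 2, 3]                            # M = 3
sizes = {s: len(conv(F, H, s)) for s in ("full", "same", "valid")}
print(sizes)
```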
Deva Ramanan, January 14, 2015

A simpler approach

1 2 3           5 4 2 3 7 4 6 5 3 6

-1 0 1          0 1 2 3 4 5 6 7 8 9

Scan the original H instead of the flipped version.
What's the math?
What’s the math?
(Cross) correlation

$F[i] \otimes H[i] = \sum_{u=-k}^{k} H[u]\,F[i+u]$
Properties

Associative and commutative properties do not hold

… but correlation is easier to think about


Convolution vs correlation (1-d)

$G[i] = F[i] * H[i] = \sum_u F[u]\,H[i-u]$   (convolution)
$\qquad = H[i] * F[i] = \sum_u H[u]\,F[i-u]$   (commutative property)

$G[i] = F[i] \otimes H[i] = \sum_u H[u]\,F[i+u]$   (cross-correlation)
$\qquad = F[i] * H[-i]$   (exercise for reader!)

$G[i,j] = F * H = \sum_u \sum_v F[u,v]\,H[i-u,\,j-v]$
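The "exercise for reader" identity $F \otimes H = F * H[-i]$ can be checked numerically. In this sketch (helper names are ours) H is stored with indices 0..M-1, so reversing the list plays the role of $H[-i]$, and the correlation is evaluated at every shift $i = -(M-1), \ldots, N-1$ so that both outputs line up element by element.

```python
def conv_full(f, h):
    """Full 1-D convolution: G[i] = sum_u F[u] * H[i - u]."""
    g = [0] * (len(f) + len(h) - 1)
    for u, fu in enumerate(f):
        for w, hw in enumerate(h):
            g[u + w] += fu * hw
    return g


def corr_full(f, h):
    """Cross-correlation G[i] = sum_u H[u] * F[i + u] for every shift i with overlap."""
    n, m = len(f), len(h)
    return [
        sum(h[u] * f[i + u] for u in range(m) if 0 <= i + u < n)
        for i in range(-(m - 1), n)      # shifts i = -(M-1), ..., N-1
    ]


F = [5, 4, 2, 3, 7, 4, 6, 5, 3, 6]
H = [1, 2, 3]
flipped_H = H[::-1]                      # H[-i]: the filter reversed

same_as_flipped_conv = corr_full(F, H) == conv_full(F, flipped_H)
differs_from_plain_conv = corr_full(F, H) != conv_full(F, H)
print(same_as_flipped_conv, differs_from_plain_conv)
```

For a symmetric filter the flip changes nothing, so correlation and convolution coincide in that case.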
Image filtering
2D correlation

$G[i,j] = F \otimes H = \sum_{u=-k}^{k}\sum_{v=-k}^{k} H[u,v]\,F[i+u,\,j+v]$

h[⋅,⋅]
1 1 1
1 1 1
1 1 1

f[.,.]
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 0 0 0 0 0 0 0
0 0 90 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

g[.,.] (left six columns shown)
0 10 20 30 30 30
0 20 40 60 60 60
0 30 60 90 90 90
0 30 50 80 80 90
0 30 50 80 80 90
0 20 30 50 50 60
10 20 30 30 30 30
10 10 10 0 0 0

Gaussian filtering
A Gaussian kernel gives less weight to pixels further from the center of the window

1 2 1
2 4 2
1 2 1

This kernel is an approximation of a Gaussian function.

Slide by Steve Seitz
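The box-filter grid can be reproduced with a direct implementation of the 2D correlation formula. This is a sketch under two assumptions: the 10 x 10 image `f` below is our reading of the slide's grid (a block of 90s with one hole, plus one isolated 90), and the 3 x 3 box of ones is normalized by 1/9, which is what the output values imply. Only positions where the kernel fits entirely inside the image are computed.

```python
def correlate2d_valid(f, h):
    """2-D correlation G[i,j] = sum_{u,v} H[u,v] * F[i+u, j+v], kernel fully inside f."""
    k = len(h) // 2                      # kernel is (2k+1) x (2k+1)
    rows, cols = len(f), len(f[0])
    out = []
    for i in range(k, rows - k):
        row = []
        for j in range(k, cols - k):
            acc = 0
            for u in range(-k, k + 1):
                for v in range(-k, k + 1):
                    acc += h[u + k][v + k] * f[i + u][j + v]
            row.append(acc)
        out.append(row)
    return out


# Assumed reconstruction of the slide's image.
f = [[0] * 10 for _ in range(10)]
for r in range(2, 7):
    for c in range(3, 8):
        f[r][c] = 90
f[5][4] = 0                              # the hole inside the bright block
f[8][2] = 90                             # the isolated bright pixel

box = [[1] * 3 for _ in range(3)]
# Divide by 9 to turn the sum into a mean (all sums here are multiples of 9).
g = [[s // 9 for s in row] for row in correlate2d_valid(f, box)]
for row in g:
    print(row)
```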
Convolution vs correlation (2-d)

Convolution:
$G[i,j] = F * H = H * F = \sum_u \sum_v H[u,v]\,F[i-u,\,j-v]$
>> conv2(H,F)     convolution

Correlation:
$G[i,j] = F \otimes H = \sum_{u=-k}^{k} \sum_{v=-k}^{k} H[u,v]\,F[i+u,\,j+v]$
>> filter2(H,F)   correlation

Can we compute correlation with convolution?


Annoying details
What is the size of the output?
• MATLAB: filter2(g, f, shape)
• shape = 'full': output size is sum of sizes of f and g (minus 1 in each dimension)
• shape = 'same': output size is same as f
• shape = 'valid': output size is difference of sizes of f and g (plus 1 in each dimension)

Border effects
[Figure: placement of kernel g at the corners of image f for the full, same, and valid cases]
Border padding
Borders!
Examples of correlation
Linear filters: examples

1 1 1
1 1 1
1 1 1
=
Original          Blur (with a mean filter)

Source: D. Lowe
Examples of correlation
Practice with linear filters

0 0 0
0 1 0 ?
0 0 0

Original

Source: D. Lowe
Examples of correlation
Practice with linear filters

0 0 0
0 1 0
0 0 0

Original Filtered
(no change)

Source: D. Lowe
Examples of correlation
Practice with linear filters

0 0 0
0 0 1   ?
0 0 0

Original

Source: D. Lowe
Examples of correlation
Practice with linear filters

0 0 0
0 0 1
0 0 0

Original          Shifted left by 1 pixel

Source: D. Lowe
What would this look like for convolution?
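One way to answer the question is to try it on a single image row. The sketch below uses made-up helper names and a made-up row of pixel values, with the kernel indexed -1..1 around its center and zeros assumed outside the row. It applies the middle row of the kernel, [0 0 1], under both definitions.

```python
def correlate_same(f, h):
    """G[i] = sum_u H[u] * F[i + u], odd-length h centered at u = 0, zero padding."""
    k, n = len(h) // 2, len(f)
    return [sum(h[u + k] * f[i + u] for u in range(-k, k + 1) if 0 <= i + u < n)
            for i in range(n)]


def convolve_same(f, h):
    """G[i] = sum_u H[u] * F[i - u], odd-length h centered at u = 0, zero padding."""
    k, n = len(h) // 2, len(f)
    return [sum(h[u + k] * f[i - u] for u in range(-k, k + 1) if 0 <= i - u < n)
            for i in range(n)]


row = [5, 4, 2, 3, 7]                    # hypothetical image row
H = [0, 0, 1]

shifted_by_correlation = correlate_same(row, H)   # each output takes its right neighbor
shifted_by_convolution = convolve_same(row, H)    # each output takes its left neighbor
print(shifted_by_correlation)  # [4, 2, 3, 7, 0]  content moves left
print(shifted_by_convolution)  # [0, 5, 4, 2, 3]  content moves right
```

Under convolution the same kernel shifts the content the other way; flipping the kernel to [1 0 0] restores the left shift.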
Examples of correlation
Practice with linear filters

1 2 1
2 4 2   /16
1 2 1

Original

What would this look like for convolution?

Source: D. Lowe
Examples of correlation
Practice with linear filters

Sharpen filter:

  ( 0 0 0       1 2 1       )     0 0 0
  ( 0 1 0   −   2 4 2  /16  )  +  0 1 0
  ( 0 0 0       1 2 1       )     0 0 0

  (Original image − blurred image) + unit impulse (identity)

[Figure: scaled impulse; Gaussian; Laplacian of Gaussian]

Unsharp filter

Source: D. Lowe
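The sharpen kernel can be assembled directly from its parts to see what it does. A sketch, using exact fractions so the arithmetic is easy to follow; the 3 x 3 test patches and the helper `respond` are made up for illustration.

```python
from fractions import Fraction

impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
gauss = [[Fraction(w, 16) for w in row] for row in [[1, 2, 1], [2, 4, 2], [1, 2, 1]]]

# (impulse - Gaussian) + impulse: the detail layer added back onto the original.
sharpen = [[(impulse[r][c] - gauss[r][c]) + impulse[r][c] for c in range(3)]
           for r in range(3)]


def respond(kernel, patch):
    """Correlation response of a 3x3 kernel centered on a 3x3 patch."""
    return sum(kernel[r][c] * patch[r][c] for r in range(3) for c in range(3))


flat = [[90, 90, 90]] * 3                # constant region
dark_side = [[0, 0, 90]] * 3             # center pixel just left of a vertical edge
bright_side = [[0, 90, 90]] * 3          # center pixel just right of the edge

kernel_sum = sum(sum(row) for row in sharpen)
print(kernel_sum)                        # 1: flat regions pass through unchanged
print(respond(sharpen, flat))            # 90
print(respond(sharpen, dark_side))       # -45/2: undershoot on the dark side
print(respond(sharpen, bright_side))     # 225/2: overshoot on the bright side
```

The undershoot and overshoot on either side of the edge are what make the result look sharper.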
Examples
Image rotation

f[m,n] ⊗ h[m,n] = g[m,n],   h[m,n] = ?

Can rotations be represented with a convolution?
Are they linear shift-invariant (LSI) operations G[i,j] = T(F[i,j])?

It is linear, but not a spatially invariant operation. There is no convolution.
Derivative filters (correlation)


$\begin{bmatrix} -1 & 1 \end{bmatrix}$      $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$
Question: what happens as we repeatedly convolve an image F with filter H?

F      F*H

Aside for the probability junkies: The PDF of the sum of two independent random variables = convolution of their PDFs. Repeated convolutions => repeated sums => CLT
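The question can be explored numerically with the simplest possible filter. In this sketch (the two-tap box [1/2, 1/2] is our choice for illustration, and `conv_full` is a hand-rolled helper) the kernel is convolved with itself repeatedly; the results are the binomial coefficients, which take on an increasingly bell-shaped profile.

```python
from fractions import Fraction


def conv_full(f, h):
    """Full 1-D convolution: G[i] = sum_u F[u] * H[i - u]."""
    g = [0] * (len(f) + len(h) - 1)
    for u, fu in enumerate(f):
        for w, hw in enumerate(h):
            g[u + w] += fu * hw
    return g


box = [Fraction(1, 2), Fraction(1, 2)]   # two-tap averaging filter

kernel = [Fraction(1)]                   # start from the unit impulse
history = []
for _ in range(4):
    kernel = conv_full(kernel, box)      # one more pass of the same filter
    history.append(kernel)

for k in history:
    print([str(w) for w in k])

# Two passes give [1 2 1]/4; its outer product is the 3x3 kernel [1 2 1; 2 4 2; 1 2 1]/16.
outer = [[a * b for b in history[1]] for a in history[1]]
```

Every kernel in `history` still sums to 1, so repeated filtering keeps smoothing without changing the overall brightness.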
Gaussian

$\frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$
Gaussian filters

σ = 1 pixel     σ = 5 pixels     σ = 10 pixels     σ = 30 pixels


Implementation
Gaussian Kernel

Matlab: >> G = fspecial('gaussian', HSIZE, SIGMA)

σ = 2 with 30 x 30 kernel          σ = 5 with 30 x 30 kernel

• Standard deviation σ: determines extent of smoothing

Source: K. Grauman
Finite-support filters
Choosing kernel width
• The Gaussian function has infinite support, but discrete filters use finite kernels

What should HSIZE be?

Source: K. Grauman
Rule-of-thumb

Set radius of filter to be 3 sigma
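A minimal sketch of how the rule of thumb turns into a kernel. The function names are hypothetical and this is not a reimplementation of any library routine: it samples the Gaussian on integer offsets out to a radius of 3σ (rounded up), then renormalizes so the truncated weights sum to 1. Under this rule the kernel width would be 2 * radius + 1.

```python
import math


def gaussian_kernel_1d(sigma, radius=None):
    """Sampled, truncated and renormalized 1-D Gaussian; default radius is ceil(3 * sigma)."""
    if radius is None:
        radius = math.ceil(3 * sigma)
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # renormalize so flat regions are preserved


def gaussian_kernel_2d(sigma, radius=None):
    """2-D kernel as the outer product of the 1-D kernel with itself (the Gaussian is separable)."""
    g = gaussian_kernel_1d(sigma, radius)
    return [[a * b for b in g] for a in g]


k1 = gaussian_kernel_1d(2.0)
k2 = gaussian_kernel_2d(1.0)
print(len(k1), len(k2), len(k2[0]))      # 13 7 7
```

Beyond 3σ a Gaussian holds well under 1% of its mass, which is why truncating there loses little.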


Useful representation: Gaussian pyramid

Filter + subsample (to exploit redundancy in output)

Figure 1: Gaussian Pyramid. Depicted are four levels of the Gaussian pyramid, levels 0 to 3 presented from left to right.

Burt & Adelson 83
http://persci.mit.edu/pub_pdfs/pyramid83.pdf
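The filter-then-subsample loop can be sketched in 1-D. Assumptions, stated loudly: this uses the small [1 2 1]/4 smoothing kernel from earlier in these slides rather than the kernel of the cited paper, replicates border samples, and works on a list instead of an image. The function names are ours.

```python
def blur_121(signal):
    """Smooth with the [1 2 1]/4 kernel, replicating the border samples."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append((left + 2 * signal[i] + right) / 4)
    return out


def gaussian_pyramid(signal, levels):
    """Level 0 is the input; each further level is blurred, then every second sample is kept."""
    pyramid = [list(signal)]
    for _ in range(levels - 1):
        smoothed = blur_121(pyramid[-1])
        pyramid.append(smoothed[::2])    # subsample by 2 after low-pass filtering
    return pyramid


pyr = gaussian_pyramid(list(range(16)), 4)
print([len(level) for level in pyr])     # [16, 8, 4, 2]
```

Blurring before dropping samples is the point: the low-pass filter removes the detail that the coarser level could not represent anyway.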
Smoothing vs edge filters
How should filters behave on a flat region with value ‘v’ ?

Smoothing filters: output ‘v’          Edge filters: output 0

$\sum_{ij} H[i,j] = 1$                 $\sum_{ij} H[i,j] = 0$
Template matching with filters
Can we use filtering to build detectors?

Template matching
Goal: find template H[i,j] in image F[i,j]
Main challenge: What is a good similarity or distance measure between two patches?
• Correlation
• Zero-mean correlation
• Sum Square Difference
• Normalized Cross Correlation

Slide by Derek Hoiem
Attempt 1: correlate with eye patch
$G[i,j] = \sum_{u=-k}^{k}\sum_{v=-k}^{k} H[u,v]\,F[i+u,\,j+v]$
$\qquad = H^T F_{ij} = \|H\|\,\|F_{ij}\|\cos\theta_{ij}$, with $H, F_{ij} \in \mathbb{R}^{(2k+1)^2}$

Useful to think about correlation and convolution in terms of the vectors $H$ and $F_{ij}$ and the angle $\theta_{ij}$ between them.

Matching with filters
Goal: find the eye patch in the image
Method 0: filter the image with eye patch
$h[m,n] = \sum_{k,l} g[k,l]\,f[m+k,\,n+l]$, where f = image and g = filter

[Figure: Input; Filtered Image]

What went wrong?

Slide by Derek Hoiem

Attempt 1.5: correlate with transformed eye patch
Let’s transform the filter such that the response on a flat region is 0
Attempt 1.5: correlate with zero-mean eye patch
$G[i,j] = \sum_{u=-k}^{k}\sum_{v=-k}^{k} (H[u,v] - \bar{H})\,F[i+u,\,j+v]$
$\qquad = \sum_{u=-k}^{k}\sum_{v=-k}^{k} H[u,v]\,F[i+u,\,j+v] - \bar{H}\sum_{u=-k}^{k}\sum_{v=-k}^{k} F[i+u,\,j+v]$

Matching with filters
Goal: find the eye patch in the image
Method 1: filter the image with zero-mean eye
$h[m,n] = \sum_{k,l} (f[k,l] - \bar{f})\,g[m+k,\,n+l]$, where $\bar{f}$ is the mean of $f$ (here $f$ denotes the template and $g$ the image)

[Figure: Input; Filtered Image (scaled); Thresholded Image, showing true detections and false detections]
Attempt 2: SSD
$SSD[i,j] = \|H - F_{ij}\|^2 = (H - F_{ij})^T (H - F_{ij})$
Can this be implemented with filtering?

Matching with filters
Goal: find the eye patch in the image
Method 2: SSD
$h[m,n] = \sum_{k,l} (g[k,l] - f[m+k,\,n+l])^2$

[Figure: Input; 1 − sqrt(SSD); Thresholded Image, showing true detections]
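One way to see that SSD can be implemented with filtering is to expand the square: $\|H - F_{ij}\|^2 = \|H\|^2 - 2H^TF_{ij} + \|F_{ij}\|^2$. The middle term is a correlation with H and the last term is a correlation of the squared image with a box of ones. A 1-D sketch, where the signal, the template and the helper names are made up for illustration:

```python
def correlate_valid(f, h):
    """G[i] = sum_u H[u] * F[i + u] at shifts where h fits entirely inside f."""
    m = len(h)
    return [sum(h[u] * f[i + u] for u in range(m)) for i in range(len(f) - m + 1)]


def ssd_direct(f, h):
    """Sum of squared differences between h and every length-M window of f."""
    m = len(h)
    return [sum((h[u] - f[i + u]) ** 2 for u in range(m))
            for i in range(len(f) - m + 1)]


def ssd_by_filtering(f, h):
    """Same quantity from two correlations: ||H||^2 - 2 (F corr H) + (F^2 corr ones)."""
    h_energy = sum(w * w for w in h)
    cross = correlate_valid(f, h)
    window_energy = correlate_valid([x * x for x in f], [1] * len(h))
    return [h_energy - 2 * c + e for c, e in zip(cross, window_energy)]


F = [5, 4, 2, 3, 7, 4, 6, 5, 3, 6]
H = [3, 7, 4]                            # template cut from F at position 3
scores = ssd_by_filtering(F, H)
print(scores.index(min(scores)), min(scores))   # best match at shift 3 with SSD 0
```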
What will SSD find here?
What’s the potential downside of SSD?

Matching with filters
Goal: find the eye patch in the image
Method 2: SSD
$h[m,n] = \sum_{k,l} (g[k,l] - f[m+k,\,n+l])^2$

[Figure: Input (where eyes have been darkened by .5 scale factor); 1 − sqrt(SSD)]

SSD will fire on shirt

Slide by Derek Hoiem
Normalized cross correlation
$NCC[i,j] = \dfrac{H^T F_{ij}}{\|H\|\,\|F_{ij}\|}$, where H, F are mean-centered
$\qquad = \dfrac{H^T F_{ij}}{\sqrt{H^T H}\,\sqrt{F_{ij}^T F_{ij}}}$
$\qquad = \cos\theta_{ij}$

Matching with filters
Goal: find the eye patch in the image
Method 3: Normalized cross-correlation

[Figure: Input; Normalized X-Correlation; Thresholded Image, showing true detections]
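A 1-D sketch of normalized cross-correlation, to show why it copes with the darkened-eyes case. The signal and template are made up for illustration, the function name is ours, and returning 0 for perfectly flat windows (where the norm is zero) is a convention chosen here, not something the slides specify.

```python
import math


def ncc(f, h):
    """Normalized cross-correlation of template h against every length-M window of f."""
    m = len(h)
    h_mean = sum(h) / m
    hc = [w - h_mean for w in h]                     # mean-centered template
    h_norm = math.sqrt(sum(w * w for w in hc))
    scores = []
    for i in range(len(f) - m + 1):
        window = f[i:i + m]
        w_mean = sum(window) / m
        wc = [x - w_mean for x in window]            # mean-centered window
        w_norm = math.sqrt(sum(x * x for x in wc))
        denom = h_norm * w_norm
        # Flat windows have zero norm; score them 0 by convention (an assumption).
        scores.append(sum(a * b for a, b in zip(hc, wc)) / denom if denom > 0 else 0.0)
    return scores


F = [5, 4, 2, 3, 7, 4, 6, 5, 3, 6]
H = [3, 7, 4]                                        # template cut from F at position 3

scores = ncc(F, H)
darkened = ncc([0.5 * x + 10 for x in F], H)         # rescaled and offset image
print(scores.index(max(scores)), round(max(scores), 6))
```

Because both vectors are mean-centered and divided by their norms, scaling or offsetting the image brightness leaves the scores unchanged, which is where SSD fails.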
Modern filter banks
Convolutional Neural Nets (CNNs) Lecun et al 98
Learn filters from training data to look for low, mid, and high-level features

The above approaches to filtering were largely hand designed. This is partly due to limitations in computing power and lack of access to large datasets in the 80s and 90s. In modern approaches to image recognition the convolution kernels/filtering operations are often learned from huge amounts of training data.

In 1998 Yann LeCun created a Convolutional Network (named “LeNet”) that could recognize hand-written digits using a sequence of filtering operations, subsampling and assorted nonlinearities, the parameters of which were learned via stochastic gradient descent on a large, labeled training set. Rather than hand selecting the filters to use, part of LeNet’s training was to pick for itself the most effective set of filters. Modern ConvNets use basically the same structure as LeNet but because of richer training sets and greater computing power we can recognize far more complex objects than handwritten digits (see, for example, GoogLeNet in 2014 and other submissions to …
A look back

• Any linear shift-invariant operation can be characterized by a convolution


• (Convolution) correlation intuitively corresponds to (flipped) matched-filters
• Derive filters by continuous operations (derivative, Gaussian, …)
• Contemporary application: convolutional neural networks
