Fei-Fei Li & Justin Johnson & Serena Yeung, Lecture 5, April 17, 2018
Administrative
Last time: Neural Networks
Linear score function: f = Wx

2-layer Neural Network: f = W2 max(0, W1 x)

x (3072) -> W1 -> h (100) -> W2 -> s (10)
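The 2-layer network above can be sketched in NumPy. This is a minimal illustration with random weights, assuming a ReLU nonlinearity between the layers as in the lecture; the sizes (3072 -> 100 -> 10) are taken from the slide.

```python
import numpy as np

# x (3072, a stretched 32x32x3 image) -> W1 -> h (100) -> W2 -> s (10 scores).
rng = np.random.default_rng(0)
x = rng.standard_normal(3072)
W1 = 0.01 * rng.standard_normal((100, 3072))
W2 = 0.01 * rng.standard_normal((10, 100))

h = np.maximum(0.0, W1 @ x)   # hidden layer with ReLU nonlinearity
s = W2 @ h                    # 10 class scores
```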
Next: Convolutional Neural Networks
A bit of history...
The Mark I Perceptron machine (Frank Rosenblatt, ~1957) was the first implementation of the perceptron algorithm. The machine recognized letters of the alphabet.

Update rule: w_i(t+1) = w_i(t) + α(d_j − y_j(t)) x_{j,i}
A bit of history...
Widrow and Hoff, ~1960: Adaline/Madaline

These figures are reproduced from Widrow 1960, Stanford Electronics Laboratories Technical Report, with permission from Stanford University Special Collections.
A bit of history...
Rumelhart et al., 1986: first time back-propagation became popular (recognizable math)

Hinton and Salakhutdinov, 2006: reinvigorated research in Deep Learning
First strong results
Acoustic Modeling using Deep Belief Networks
Abdel-rahman Mohamed, George Dahl, Geoffrey Hinton, 2010
Context-Dependent Pre-trained Deep Neural Networks
for Large Vocabulary Speech Recognition
George Dahl, Dong Yu, Li Deng, Alex Acero, 2012
A bit of history: Hubel & Wiesel

1962: "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex"

1968: ...
A bit of history...

Human brain -> Neocognitron [Fukushima 1980] -> LeNet-5 [LeCun et al., 1998] -> "AlexNet" [Krizhevsky et al., 2012]
Fast-forward to today: ConvNets are everywhere
Detection Segmentation
This image by Christin Khan is in the public domain and originally came from the U.S. NOAA. Photo and figure by Lane McIntosh; not actual example from Mnih and Hinton, 2010 paper.
Image Captioning
[Vinyals et al., 2015]
[Karpathy and Fei-Fei, 2015]

(Figure legend: No errors / Minor errors / Somewhat related. Example caption fragments: "... top of a surfboard", "... suitcase on the floor", "... beach holding a surfboard". Captions generated by Justin Johnson using Neuraltalk2.)
Convolutional Neural Networks
(First without the brain stuff)
Fully Connected Layer
32x32x3 image -> stretch to 3072 x 1

input: 3072 x 1
weights W: 10 x 3072
activation: 10 x 1
Each activation is 1 number: the result of taking a dot product between a row of W and the input (a 3072-dimensional dot product).
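As a sketch, the whole fully connected layer is one matrix-vector product (random values, illustrative only):

```python
import numpy as np

# Stretch the 32x32x3 image into a 3072-vector, then multiply by the
# 10x3072 weight matrix W to get the 10 activations.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
x = image.reshape(3072)              # stretch to 3072 x 1
W = rng.standard_normal((10, 3072))  # weights

activation = W @ x                   # 10 x 1 activation
```

Each entry of `activation` is one 3072-dimensional dot product with a row of W.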
Convolution Layer
32x32x3 image -> preserve spatial structure

32 (height) x 32 (width) x 3 (depth)
Convolution Layer
32x32x3 image
5x5x3 filter

Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products".
Convolution Layer

Filters always extend the full depth of the input volume.

32x32x3 image
5x5x3 filter
Convolution Layer
32x32x3 image
5x5x3 filter
1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias)
Convolution Layer

32x32x3 image
5x5x3 filter
-> activation map: 28x28x1
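A naive sketch of how one activation map is computed (illustrative only; `convolve_one_filter` is a hypothetical helper, and real implementations vectorize this and add a bias):

```python
import numpy as np

def convolve_one_filter(image, filt):
    """Slide the filter over every spatial position (stride 1, no padding)
    and take a dot product with the chunk of the image under it."""
    H, W, _ = image.shape
    F = filt.shape[0]                        # square F x F x depth filter
    out = np.zeros((H - F + 1, W - F + 1))
    for i in range(H - F + 1):
        for j in range(W - F + 1):
            chunk = image[i:i + F, j:j + F, :]   # e.g. a 5x5x3 chunk
            out[i, j] = np.sum(chunk * filt)     # 75-dimensional dot product
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
filt = rng.standard_normal((5, 5, 3))
amap = convolve_one_filter(image, filt)      # one 28x28 activation map
```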
Convolution Layer: consider a second, green filter

32x32x3 image
5x5x3 filter
-> activation maps: 28x28x1 each
Convolution Layer

For example, if we had 6 5x5 filters, we'd get 6 separate activation maps:

32x32x3 image -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6 activation maps

We stack these up to get a "new image" of size 28x28x6!
Preview: a ConvNet is a sequence of Convolution Layers, interspersed with activation functions

32x32x3 -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6 -> CONV, ReLU (e.g. 10 5x5x6 filters) -> 24x24x10 -> CONV, ReLU -> ...
Preview [Zeiler and Fergus 2013]
one filter => one activation map

Example 5x5 filters (32 total)
A closer look at spatial dimensions:

32x32x3 image
5x5x3 filter
-> activation map: 28x28x1
A closer look at spatial dimensions:

7x7 input (spatially)
assume 3x3 filter
=> 5x5 output
A closer look at spatial dimensions:

7x7 input (spatially)
assume 3x3 filter applied with stride 2
=> 3x3 output!
A closer look at spatial dimensions:

7x7 input (spatially)
assume 3x3 filter applied with stride 3?

doesn't fit! cannot apply 3x3 filter on 7x7 input with stride 3.
Output size: (N - F) / stride + 1

e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 :\
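The formula can be wrapped in a small helper (hypothetical name) to check the three cases on the slide:

```python
def conv_output_size(N, F, stride):
    """Spatial output size of a conv layer: (N - F) / stride + 1.
    Returns None when the filter does not tile the input evenly."""
    if (N - F) % stride != 0:
        return None          # e.g. stride 3 on N = 7, F = 3: doesn't fit
    return (N - F) // stride + 1

# N = 7, F = 3, as on the slide:
# stride 1 -> 5, stride 2 -> 3, stride 3 -> doesn't fit (None)
```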
In practice: Common to zero pad the border

e.g. input 7x7, 3x3 filter applied with stride 1, pad with 1 pixel border => what is the output?

(recall: (N - F) / stride + 1)

7x7 output!

In general, it is common to see CONV layers with stride 1, filters of size FxF, and zero-padding with (F-1)/2. (This will preserve the size spatially.)
e.g. F = 3 => zero pad with 1
F = 5 => zero pad with 2
F = 7 => zero pad with 3
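With padding included, the output size becomes (N - F + 2*pad)/stride + 1. A quick check (hypothetical helper) that pad = (F - 1)/2 with stride 1 preserves the spatial size:

```python
def padded_output_size(N, F, stride, pad):
    """Spatial output size with zero padding: (N - F + 2*pad)/stride + 1."""
    return (N - F + 2 * pad) // stride + 1

# Stride 1 with pad = (F - 1)/2 preserves the 7x7 input spatially:
for F, pad in [(3, 1), (5, 2), (7, 3)]:
    assert padded_output_size(7, F, 1, pad) == 7
```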
Remember back to...

e.g. a 32x32 input convolved repeatedly with 5x5 filters shrinks volumes spatially! (32 -> 28 -> 24 ...). Shrinking too fast is not good; it doesn't work well.

32x32x3 -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6 -> CONV, ReLU (e.g. 10 5x5x6 filters) -> 24x24x10 -> ...
Examples time:
Common settings:
(btw, 1x1 convolution layers make perfect sense)

56x56x64 input -> 1x1 CONV with 32 filters -> 56x56x32 output
(each filter has size 1x1x64, and performs a 64-dimensional dot product)
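A sketch of that computation in NumPy (random values, bias omitted): a 1x1 convolution is just a 64-dimensional dot product at every spatial position.

```python
import numpy as np

# 56x56x64 input, 32 filters of size 1x1x64 -> 56x56x32 output.
rng = np.random.default_rng(0)
x = rng.standard_normal((56, 56, 64))
filters = rng.standard_normal((32, 64))   # each row is one 1x1x64 filter

# Contract the depth axis of x against the depth axis of the filters.
out = np.tensordot(x, filters, axes=([2], [1]))   # shape (56, 56, 32)
```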
Example: CONV layer in Torch

Example: CONV layer in Caffe
The brain/neuron view of CONV Layer

32x32x3 image
5x5x3 filter

1 number: the result of taking a dot product between the filter and this part of the image (i.e. 5*5*3 = 75-dimensional dot product)
The brain/neuron view of CONV Layer

32x32x3 image, 5x5x3 filter: it's just a neuron with local connectivity.

An activation map is a 28x28 sheet of neuron outputs; "5x5 filter" -> "5x5 receptive field for each neuron". Each neuron is connected to a small region of the input, and all of them share parameters.
Reminder: Fully Connected Layer

Each neuron looks at the full input volume.

32x32x3 image -> stretch to 3072 x 1
input: 3072 x 1; weights W: 10 x 3072; activation: 10 x 1

1 number: the result of taking a dot product between a row of W and the input (a 3072-dimensional dot product)
two more layers to go: POOL/FC
Pooling layer
- makes the representations smaller and more manageable
- operates over each activation map independently:
MAX POOLING

[Figure: max pooling on a single depth slice, 2x2 filters with stride 2: a 4x4 input is downsampled to a 2x2 output by taking the max in each window. Visible input rows: 3 2 1 0 and 1 2 3 4.]
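A minimal sketch of the operation (the `max_pool` helper is hypothetical; the 4x4 example's top two rows are reconstructed from the commonly used version of this figure, so treat them as assumed values):

```python
import numpy as np

def max_pool(x, F=2, stride=2):
    """Max pooling on a single depth slice: take the max in each
    FxF window, moving with the given stride."""
    H, W = x.shape
    out = np.zeros(((H - F) // stride + 1, (W - F) // stride + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i * stride:i * stride + F,
                          j * stride:j * stride + F].max()
    return out

# 4x4 depth slice; 2x2 max pooling with stride 2 gives a 2x2 output.
a = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
```

For this input, `max_pool(a)` keeps the largest value in each 2x2 window: [[6, 8], [3, 4]].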
Common settings:
F = 2, S = 2
F = 3, S = 2
Fully Connected Layer (FC layer)
- Contains neurons that connect to the entire input volume, as in ordinary Neural Networks
[ConvNetJS demo: training on CIFAR-10]
http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
Summary