Machine Learning
Example Problems
• Identify the risk factors for prostate cancer.
• Classify a recorded phoneme based on a log-periodogram.
• Predict whether someone will have a heart attack on the basis
of demographic, diet and clinical measurements.
• Customize an email spam detection system.
• Identify the numbers in a handwritten zip code.
• Establish the relationship between salary and demographic
variables in population survey data.
[Figure: scatterplot matrix of the prostate cancer data: pairwise plots of lpsa, lcavol, lweight, age, lbph, svi, lcp and pgg45.]
Phoneme Examples
[Figure: log-periodograms for the phonemes "aa" and "ao", plotted against frequency.]
[Figure: scatterplot matrix of the heart disease data: pairwise plots of sbp, tobacco, ldl, famhist, obesity and age.]
Spam Detection
• data from 4601 emails sent to an individual (named George, at HP Labs, before 2000). Each is labeled as spam or genuine email.
• goal: build a customized spam filter.
• input features: relative frequencies of 57 of the most commonly occurring words and punctuation marks in these email messages.
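The feature construction described above can be sketched in a few lines. This is an illustrative stand-in, not the actual data pipeline: the six-entry vocabulary below is hypothetical, whereas the real data set uses 57 words and punctuation marks.

```python
# Hypothetical mini-vocabulary standing in for the 57 words and
# punctuation marks used in the real spam data set.
VOCAB = ["free", "money", "george", "hp", "!", "$"]

def features(text):
    """Relative frequency (in %) of each vocabulary entry in the text."""
    tokens = text.lower().split()
    total = max(len(tokens), 1)          # avoid division by zero
    return [100.0 * tokens.count(w) / total for w in VOCAB]

features("free money free !")  # [50.0, 25.0, 0.0, 0.0, 25.0, 0.0]
```

A classifier (e.g. logistic regression) would then be trained on these vectors together with the spam/email labels.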
Example Problems
• Identify the risk factors for prostate cancer.
• Classify a recorded phoneme based on a log-periodogram.
• Predict whether someone will have a heart attack on the basis
of demographic, diet and clinical measurements.
• Customize an email spam detection system.
• Identify the numbers in a handwritten zip code.
• Classify a tissue sample into one of several cancer classes,
based on a gene expression profile.
• Establish the relationship between salary and demographic
variables in population survey data.
[Figure: Wage plotted against age, calendar year and education level in three panels.]
Income survey data for males from the central Atlantic region of the USA in 2009.
The Supervised Learning Problem
Starting point:
• Outcome measurement Y (also called dependent variable, response, target).
• Vector of p predictor measurements X (also called inputs, regressors, covariates, features, independent variables).
• In the regression problem, Y is quantitative (e.g. price, blood pressure).
• In the classification problem, Y takes values in a finite, unordered set (survived/died, digit 0-9, cancer class of tissue sample).
• We have training data (x1, y1), ..., (xN, yN). These are observations (examples, instances) of these measurements.
Objectives
Philosophy
• It is important to understand the ideas behind the various techniques, in order to know how and when to use them.
Unsupervised learning
The Netflix prize
BellKor’s Pragmatic Chaos wins, beating The Ensemble by a narrow margin.
Statistical Learning versus Machine Learning
• Machine learning arose as a subfield of AI.
• Statistical learning arose as a subfield of Statistics.
• There is much overlap — both fields focus on supervised and unsupervised problems:
• Machine learning has a greater emphasis on large-scale applications and prediction accuracy.
• Statistical learning emphasizes models and their interpretability, and precision and uncertainty.
• But the distinction has become more and more blurred, and there is a great deal of “cross-fertilization”.
• Statistical learning is a fundamental ingredient in the training of a modern data scientist.
What is Statistical Learning?
[Figure: scatterplots of Sales against TV, Radio and Newspaper advertising budgets.]
We model the relationship as
Y = f(X) + ε,
where ε captures measurement errors and other discrepancies.
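The model Y = f(X) + ε is easy to simulate. A minimal sketch (the linear f and the noise level below are arbitrary choices for illustration, not part of the slides):

```python
import random

random.seed(0)

def f(x):
    # A hypothetical systematic relationship (unknown in practice).
    return 2.0 + 3.0 * x

xs = [random.uniform(0, 1) for _ in range(100)]
# Y = f(X) + eps, with eps capturing everything f does not explain.
ys = [f(x) + random.gauss(0, 0.1) for x in xs]
```

Statistical learning then amounts to recovering an estimate of f from the pairs (x, y) alone.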
What is f(X) good for?
[Figure: scatterplot of y against x, with the conditional mean of Y highlighted at x = 4.]
f(4) = E(Y|X = 4)
E(Y|X = 4) means the expected value (average) of Y given X = 4.
This ideal f(x) = E(Y|X = x) is called the regression function.
The regression function f(x)
• Is also defined for vector X; e.g.
f(x) = f(x1, x2, x3) = E(Y|X1 = x1, X2 = x2, X3 = x3).
• Is the ideal or optimal predictor of Y with regard to mean-squared prediction error: f(x) = E(Y|X = x) is the function that minimizes E[(Y − g(X))²|X = x] over all functions g at all points X = x.
• ε = Y − f(x) is the irreducible error — i.e. even if we knew f(x), we would still make errors in prediction, since at each X = x there is typically a distribution of possible Y values.
• For any estimate fˆ(x) of f(x), we have
E[(Y − fˆ(X))²|X = x] = [f(x) − fˆ(x)]² + Var(ε),
the sum of a reducible and an irreducible error.
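The optimality of the conditional mean can be checked empirically. The sketch below draws a sample of Y values at a fixed x (a hypothetical Gaussian with mean 2) and confirms that no other constant prediction beats the sample mean in average squared error:

```python
import random

random.seed(0)

# A sample from the conditional distribution of Y at one fixed x.
ys = [random.gauss(2.0, 1.0) for _ in range(10000)]

def mse(c):
    """Average squared error of the constant prediction c."""
    return sum((y - c) ** 2 for y in ys) / len(ys)

ybar = sum(ys) / len(ys)
# mse(c) = mse(ybar) + (c - ybar)^2, so the mean is always best.
mse(ybar) <= min(mse(1.5), mse(2.5))  # True
```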
How to estimate f
• Typically we have few if any data points with X = 4 exactly.
• So we cannot compute E(Y|X = x)!
• Relax the definition and let
fˆ(x) = Ave(Y|X ∈ N(x)),
where N(x) is some neighborhood of x.
[Figure: scatterplot of y against x, with the neighborhood average fˆ(x) illustrated.]
• Nearest neighbor averaging can be pretty good for small p — i.e. p ≤ 4 and large-ish N.
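The relaxed estimator fˆ(x) = Ave(Y|X ∈ N(x)) takes only a few lines. In this sketch N(x) is a fixed-width window, one simple choice of neighborhood (the tiny data set is hypothetical):

```python
def f_hat(x, data, width=0.5):
    """Ave(Y | X in N(x)), with N(x) = [x - width, x + width]."""
    ys = [y for xi, y in data if abs(xi - x) <= width]
    return sum(ys) / len(ys) if ys else None  # None if N(x) is empty

data = [(1.0, 2.1), (1.2, 2.0), (3.0, 5.9), (3.1, 6.2)]
f_hat(3.05, data)  # averages the two nearby points: 6.05
```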
The curse of dimensionality
[Figure: left, a 10% neighborhood of a point in two dimensions; right, the radius needed to capture a given fraction of the volume, for p = 1, 2, 3, 5 and 10.]
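The message of the right-hand panel can be reproduced directly in a simplified cubical version: for data uniform in the unit cube in p dimensions, a cubical neighborhood holding a fraction r of the volume must have edge length r^(1/p), which approaches 1 as p grows.

```python
def edge_length(fraction, p):
    """Edge of a sub-cube of the unit cube in R^p that holds
    `fraction` of its volume: fraction ** (1/p)."""
    return fraction ** (1.0 / p)

# For a 10% neighborhood, the edge quickly approaches the full cube:
[round(edge_length(0.10, p), 2) for p in (1, 2, 3, 5, 10)]
# [0.1, 0.32, 0.46, 0.63, 0.79]
```

So in ten dimensions a "10% neighborhood" spans nearly 80% of the range of each coordinate; it is no longer local, and nearest-neighbor averaging breaks down.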
Parametric and structured models
The linear model is an important example of a parametric model:
fL(X) = β0 + β1X1 + β2X2 + ... + βpXp.
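Evaluating fL at a point x is just an intercept plus a dot product; a minimal sketch (the coefficient values are arbitrary, for illustration only):

```python
def f_L(x, beta0, beta):
    """Linear model: beta0 + beta_1*x_1 + ... + beta_p*x_p."""
    return beta0 + sum(b * xj for b, xj in zip(beta, x))

f_L([1.0, 2.0], beta0=0.5, beta=[2.0, -1.0])  # 0.5 + 2.0 - 2.0 = 0.5
```

Fitting the model means choosing the p + 1 parameters, typically by least squares.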
A linear model fˆL(X) = β̂0 + β̂1X gives a reasonable fit here.
[Figure: two scatterplots of y against x, each with a fitted model overlaid.]
Simulated example. Red points are simulated values for income from the model
income = f(education, seniority) + ε.
More flexible regression model fˆS(education, seniority) fit to the simulated data. Here we use a technique called a thin-plate spline to fit a flexible surface.
Even more flexible spline regression model fˆS(education, seniority) fit to the simulated data. Here the fitted model makes no errors on the training data! Also known as overfitting.
Some trade-offs
• Prediction accuracy versus interpretability.
— Linear models are easy to interpret; thin-plate splines are not.
• Good fit versus over-fit or under-fit.
— How do we know when the fit is just right?
• Parsimony versus black-box.
— We often prefer a simpler model involving fewer variables over a black-box predictor involving them all.
[Figure: interpretability plotted against flexibility. Subset selection and the lasso sit at high interpretability and low flexibility; least squares is intermediate; bagging and boosting sit at low interpretability and high flexibility.]
Assessing Model Accuracy
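In the regression examples that follow, accuracy is measured by mean squared error. A minimal helper (the function name is mine, not from the slides):

```python
def mse(y_true, y_pred):
    """Mean squared error: average of (y - yhat)^2 over observations."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.5])  # (0 + 0.25 + 0.25) / 3
```

Training MSE uses the data the model was fit to; test MSE uses fresh data, and it is test MSE that the flexibility curves below track.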
[Figure: two simulated examples. In each, the left panel shows the data with fits of varying flexibility, and the right panel shows mean squared error against flexibility.]
Here the truth is smoother, so the smoother fit and linear model do really well.
[Figure: a third simulated example, with mean squared error against flexibility.]
Here the truth is wiggly and the noise is low, so the more flexible fits do the best.
Bias-Variance Trade-off
Suppose we have fit a model fˆ(x) to some training data Tr, and let (x0, y0) be a test observation drawn from the population. If the true model is Y = f(X) + ε (with f(x) = E(Y|X = x)), then
E[(y0 − fˆ(x0))²] = Var(fˆ(x0)) + [Bias(fˆ(x0))]² + Var(ε),
where Bias(fˆ(x0)) = E[fˆ(x0)] − f(x0), and the expectation averages over the variability of y0 as well as the variability in Tr.
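The decomposition can be verified by simulation. The sketch below uses a deliberately crude estimator (the constant fˆ(x0) = ȳ, ignoring x entirely) so that both bias and variance are visible; the true f, the noise level, and x0 are all arbitrary choices for illustration:

```python
import random

random.seed(1)

def f(x):
    return x * x          # hypothetical true regression function

sigma, x0, n = 0.5, 1.0, 10

# Repeatedly redraw a training set and record the crude estimate
# fhat(x0) = mean(y), which is biased toward E[f(X)], not f(x0).
preds = []
for _ in range(20000):
    xs = [random.uniform(0, 2) for _ in range(n)]
    ys = [f(x) + random.gauss(0, sigma) for x in xs]
    preds.append(sum(ys) / n)

mean_pred = sum(preds) / len(preds)
var_fhat = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
bias_sq = (mean_pred - f(x0)) ** 2

# Expected test MSE at x0: Var(fhat(x0)) + Bias^2 + Var(eps).
expected_mse = var_fhat + bias_sq + sigma ** 2
```

For this setup the theory gives bias² ≈ (4/3 − 1)² ≈ 0.11 and Var(fˆ(x0)) = (Var(f(X)) + σ²)/n ≈ 0.17, and the simulation lands close to both.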
Bias-variance trade-off for the three examples
[Figure: MSE, squared bias and variance plotted against flexibility for each of the three simulated examples.]
Classification Problems
[Figure: simulated binary data, with y ∈ {0, 1} plotted against x.]
Suppose the K elements in C (the set of possible classes) are numbered 1, 2, ..., K. Let
pk(x) = Pr(Y = k|X = x), k = 1, 2, ..., K.
These are the conditional class probabilities at x; e.g. see the little barplot at x = 5. Then the Bayes optimal classifier at x is
C(x) = j if pj(x) = max{p1(x), p2(x), ..., pK(x)}.
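The Bayes classifier is just an argmax over the conditional class probabilities. A sketch (the function name and the example probabilities are mine):

```python
def bayes_classifier(p):
    """Return the class j (1-indexed) maximizing p_j(x), given the
    list of conditional probabilities [p_1(x), ..., p_K(x)]."""
    return 1 + max(range(len(p)), key=lambda k: p[k])

bayes_classifier([0.2, 0.5, 0.3])  # class 2 has the largest probability
```

In practice the pk(x) are unknown and must themselves be estimated, e.g. by nearest neighbors as below.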
Classification: some details
Example: K-nearest neighbors in two dimensions
[Figure: simulated two-class data in the (X1, X2) plane.]
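K-nearest neighbors classification, as used in the following figures, can be sketched directly: find the K training points closest to x and take a majority vote among their labels. (The toy two-class training set below is hypothetical.)

```python
def knn_classify(x, train, k):
    """Majority vote among the k training points nearest to x (2-D)."""
    sq_dist = lambda a: (a[0] - x[0]) ** 2 + (a[1] - x[1]) ** 2
    nearest = sorted(train, key=lambda pt: sq_dist(pt[0]))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

train = [((0, 0), "blue"), ((0, 1), "blue"), ((1, 0), "blue"),
         ((5, 5), "orange"), ((5, 6), "orange"), ((6, 5), "orange")]
knn_classify((0.5, 0.5), train, k=3)  # "blue"
```

Small K gives a wiggly, low-bias/high-variance boundary; large K gives a smooth, high-bias/low-variance one, exactly the trade-off the next figures display.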
KNN: K = 10
[Figure: the two-class data with the K = 10 nearest-neighbor decision boundary.]
KNN: K = 1 and K = 100
[Figure: decision boundaries for K = 1 (left) and K = 100 (right).]
[Figure: training and test error rates plotted against 1/K.]
Exercise
For each of parts (a) through (d), indicate whether we would
generally expect the performance of a flexible statistical learning
method to be better or worse than an inflexible method. Justify your
answer.
Exercise
Provide a sketch of typical (squared) bias, variance, training
error, test error, and Bayes (or irreducible) error curves, on a
single plot, as we go from less flexible statistical learning
methods towards more flexible approaches. The x-axis
should represent the amount of flexibility in the method, and
the y-axis should represent the values for each curve. There
should be five curves.
End