
Lecture 9: Linear Regression


Roadmap

1 When Can Machines Learn?
2 Why Can Machines Learn?

    Lecture 8: Noise and Error
    learning can happen with target distribution P(y|x) and low Ein w.r.t. err

3 How Can Machines Learn?

    Lecture 9: Linear Regression
    - Linear Regression Problem
    - Linear Regression Algorithm
    - Generalization Issue
    - Linear Regression for Binary Classification

4 How Can Machines Learn Better?


Linear Regression Problem

Credit Limit Problem

[learning-setup diagram]
unknown target function f: X → Y (ideal credit limit formula),
    e.g. a customer with age 23 years, gender female, annual salary 33,000 USD,
    year in residence 1 year, year in job 0.5 year, current debt 20,000
    → credit limit? 5,000
training examples D: (x1, y1), ..., (xN, yN) (historical records in bank)
learning algorithm A, choosing from hypothesis set H (set of candidate formulas)
final hypothesis g ≈ f (learned formula to be used)

Y = R: regression

Linear Regression Hypothesis

customer features x, e.g.: age 23 years, annual salary 33,000 USD,
year in job 0.5 year, current debt 20,000

For x = (x0, x1, x2, ..., xd) features of the customer,
approximate the desired credit limit with a weighted sum:

    y ≈ Σ_{i=0}^{d} wi xi

linear regression hypothesis: h(x) = w^T x

h(x): like the perceptron, but without the sign
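To make the hypothesis concrete, here is a minimal NumPy sketch; the customer features and the weight vector below are made-up illustrative numbers (a real w would be learned from data):

```python
import numpy as np

# hypothetical customer features: x0 = 1 (constant coordinate), then
# age (years), annual salary (10k USD), year in job, current debt (10k USD)
x = np.array([1.0, 23.0, 3.3, 0.5, 2.0])

# made-up weight vector; a real w would be learned from data
w = np.array([0.2, 0.05, 1.5, 0.3, -0.8])

def h(w, x):
    """linear regression hypothesis: h(x) = w^T x (no sign taken)"""
    return w @ x

print(h(w, x))  # a real-valued prediction, e.g. a credit limit
```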


Illustration of Linear Regression

[figure: left, x = (x) ∈ R, a line fit to points in the plane; right, x = (x1, x2) ∈ R², a hyperplane fit to points in 3D; vertical bars mark the residuals]

linear regression: find lines/hyperplanes with small residuals

The Error Measure

popular/historical error measure: squared error err(ŷ, y) = (ŷ − y)²

in-sample:
    Ein(w) = (1/N) Σ_{n=1}^{N} (h(xn) − yn)² = (1/N) Σ_{n=1}^{N} (w^T xn − yn)²

out-of-sample:
    Eout(w) = E_{(x,y)~P} (w^T x − y)²

next: how to minimize Ein(w)?


Fun Time

Consider using the linear regression hypothesis h(x) = w^T x to predict the credit limit of customers x. Which feature below shall have a positive weight in a good hypothesis for the task?

1  birth month
2  monthly income
3  current debt
4  number of credit cards owned

Reference Answer: 2

Customers with higher monthly income should naturally be given a higher credit limit, which is captured by a positive weight on the monthly income feature.

Linear Regression Algorithm

Matrix Form of Ein(w)

Ein(w) = (1/N) Σ_{n=1}^{N} (w^T xn − yn)² = (1/N) Σ_{n=1}^{N} (xn^T w − yn)²

       = (1/N) ‖ [x1^T w − y1;  x2^T w − y2;  ...;  xN^T w − yN] ‖²

       = (1/N) ‖ [x1^T;  x2^T;  ...;  xN^T] w − [y1;  y2;  ...;  yN] ‖²

       = (1/N) ‖ X w − y ‖²
         with X: N × (d+1),  w: (d+1) × 1,  y: N × 1
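A small NumPy sketch checking the matrix form against the per-example sum on made-up data (all names and numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 5
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])  # N x (d+1), x0 = 1
y = rng.standard_normal(N)                                     # length-N targets
w = rng.standard_normal(d + 1)

# per-example form: (1/N) sum_n (w^T x_n - y_n)^2
Ein_sum = np.mean([(w @ xn - yn) ** 2 for xn, yn in zip(X, y)])

# matrix form: (1/N) ||Xw - y||^2
Ein_mat = np.linalg.norm(X @ w - y) ** 2 / N

assert np.isclose(Ein_sum, Ein_mat)
```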


min_w Ein(w) = (1/N) ‖Xw − y‖²

Ein(w): continuous, differentiable, convex

necessary condition of best w: the gradient is zero,

    ∇Ein(w) = [ ∂Ein/∂w0 (w),  ∂Ein/∂w1 (w),  ...,  ∂Ein/∂wd (w) ]^T = [0, 0, ..., 0]^T

task: find wLIN such that ∇Ein(wLIN) = 0


The Gradient ∇Ein(w)

Ein(w) = (1/N) ‖Xw − y‖² = (1/N) ( w^T X^T X w − 2 w^T X^T y + y^T y ),
    with A = X^T X,  b = X^T y,  c = y^T y

one w only:  Ein(w) = (1/N)(a w² − 2b w + c),
             ∇Ein(w) = (1/N)(2a w − 2b); simple! :-)

vector w:    Ein(w) = (1/N)(w^T A w − 2 w^T b + c),
             ∇Ein(w) = (1/N)(2 A w − 2 b); similar (derived by definition)

∇Ein(w) = (2/N) ( X^T X w − X^T y )
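A sketch of this gradient formula, verified against a finite-difference approximation on made-up data (illustrative only, not the lecture's code):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 100, 5
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
y = rng.standard_normal(N)
w = rng.standard_normal(d + 1)

def Ein(w):
    return np.linalg.norm(X @ w - y) ** 2 / N

def grad_Ein(w):
    return 2.0 / N * (X.T @ X @ w - X.T @ y)  # (2/N)(X^T X w - X^T y)

# finite-difference check of the first coordinate of the gradient
eps = 1e-6
e0 = np.zeros(d + 1)
e0[0] = 1.0
fd = (Ein(w + eps * e0) - Ein(w - eps * e0)) / (2 * eps)
assert np.isclose(fd, grad_Ein(w)[0], atol=1e-5)
```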


Optimal Linear Regression Weights

task: find wLIN such that (2/N)(X^T X wLIN − X^T y) = ∇Ein(wLIN) = 0

invertible X^T X (often the case, because N ≫ d + 1):
    easy! unique solution
    wLIN = (X^T X)^{−1} X^T y = X† y,  with pseudo-inverse X† = (X^T X)^{−1} X^T

singular X^T X:
    many optimal solutions; one of the solutions is
    wLIN = X† y, by defining X† in other ways

practical suggestion: use a well-implemented † routine instead of computing (X^T X)^{−1} X^T explicitly, for numerical stability when X^T X is almost singular
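The practical suggestion, sketched in NumPy: np.linalg.pinv (or np.linalg.lstsq) plays the role of the well-implemented † routine, while the explicit normal-equation inverse is only safe when X^T X is comfortably invertible. Data and names below are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 100, 5
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
y = rng.standard_normal(N)

# explicit normal equations: fine when X^T X is safely invertible
w_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# well-implemented pseudo-inverse routine, also sensible near singularity
w_pinv = np.linalg.pinv(X) @ y

# least-squares solver, the usual practical choice
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(w_normal, w_pinv) and np.allclose(w_pinv, w_lstsq)
```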

Linear Regression Algorithm

1  from D, construct input matrix X and output vector y by
       X = [x1^T;  x2^T;  ...;  xN^T]  (N × (d+1)),   y = [y1;  y2;  ...;  yN]  (N × 1)
2  calculate pseudo-inverse X†  ((d+1) × N)
3  return wLIN = X† y  ((d+1) × 1)

simple and efficient with a good † routine
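The three steps translate almost line for line into NumPy; a minimal sketch, with function name and data of my own choosing:

```python
import numpy as np

def linreg(X_raw, y):
    """linear regression via pseudo-inverse; returns w_LIN of length d+1"""
    N = len(y)
    X = np.hstack([np.ones((N, 1)), X_raw])   # step 1: add x0 = 1 -> N x (d+1)
    X_dagger = np.linalg.pinv(X)              # step 2: pseudo-inverse, (d+1) x N
    return X_dagger @ y                       # step 3: w_LIN = X† y, (d+1) x 1

# usage on made-up data
rng = np.random.default_rng(3)
X_raw = rng.standard_normal((100, 5))
y = X_raw @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)
w_lin = linreg(X_raw, y)
```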

Fun Time

After getting wLIN, we can calculate the predictions ŷn = wLIN^T xn. If all ŷn are collected in a vector ŷ, similar to how we form y, what is the matrix formula of ŷ?

1  X X^T y
2  X X† X X^T y
3  X X† y

Reference Answer: 3

Note that ŷ = X wLIN. Then, a simple substitution of wLIN = X† y reveals the answer.
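A quick numerical check of this identity on made-up data:

```python
import numpy as np

rng = np.random.default_rng(4)
N, d = 100, 5
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
y = rng.standard_normal(N)

X_dagger = np.linalg.pinv(X)
w_lin = X_dagger @ y

y_hat = X @ w_lin                             # predictions y_hat_n = w_LIN^T x_n
assert np.allclose(y_hat, X @ X_dagger @ y)   # matrix formula: y_hat = X X† y
```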


Linear Regression for Binary Classification

Linear Classification vs. Linear Regression

Linear Classification                      Linear Regression
Y = {−1, +1}                               Y = R
h(x) = sign(w^T x)                         h(x) = w^T x
err(ŷ, y) = [[ŷ ≠ y]]                      err(ŷ, y) = (ŷ − y)²
NP-hard to solve in general                efficient analytic solution

{−1, +1} ⊂ R: linear regression for classification?

1  run LinReg on binary classification data D (efficient)
2  return g(x) = sign(wLIN^T x)

but what is the explanation of this heuristic?
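A minimal sketch of this two-step heuristic on made-up ±1-labeled data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)
N, d = 200, 5
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
y = np.sign(X @ rng.standard_normal(d + 1))   # made-up labels in {-1, +1}

w_lin = np.linalg.pinv(X) @ y                 # step 1: run LinReg on the ±1 labels

def g(X_new):
    return np.sign(X_new @ w_lin)             # step 2: g(x) = sign(w_LIN^T x)

print(np.mean(g(X) != y))                     # in-sample 0/1 error of the heuristic
```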

Relation of Two Errors

err0/1 = [[ sign(w^T x) ≠ y ]]
errsqr = ( w^T x − y )²

[figure: err versus w^T x, for desired y = +1 and desired y = −1; in both panels the 0/1 error (a step) lies below the squared error (a parabola)]

err0/1 ≤ errsqr
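The inequality can be spot-checked numerically: for scores s = w^T x on a grid and y ∈ {−1, +1}, the 0/1 error never exceeds the squared error. A small sketch, not a proof:

```python
import numpy as np

s = np.linspace(-3, 3, 601)              # a grid of scores w^T x
for y in (-1.0, +1.0):
    err01 = (np.sign(s) != y).astype(float)
    errsqr = (s - y) ** 2
    assert np.all(err01 <= errsqr)       # err_0/1 <= err_sqr on the whole grid
```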


Linear Regression for Binary Classification

err0/1 ≤ errsqr
⇒ classification Eout(w)  ≤ (by VC)  classification Ein(w) + √(...)
                           ≤          regression Ein(w) + √(...)

(loose) upper bound: use errsqr as êrr to approximate err0/1,
trading bound tightness for efficiency

wLIN: useful baseline classifier, or as an initial vector for PLA/pocket

Summary

1 When Can Machines Learn?
2 Why Can Machines Learn?

    Lecture 8: Noise and Error

3 How Can Machines Learn?

    Lecture 9: Linear Regression
    - Linear Regression Problem: use hyperplanes to approximate real values
    - Linear Regression Algorithm: analytic solution with pseudo-inverse
    - Generalization Issue: Eout ≈ Ein + 2(d+1)/N on average
    - Linear Regression for Binary Classification: 0/1 error ≤ squared error

    next: binary classification, regression, and then?

4 How Can Machines Learn Better?
