
Lecture 9: Linear Regression


Roadmap

1 When Can Machines Learn?
2 Why Can Machines Learn?

    Lecture 8: Noise and Error
    learning can happen with target distribution P(y|x) and low Ein w.r.t. err

3 How Can Machines Learn?

    Lecture 9: Linear Regression
    - Linear Regression Problem
    - Linear Regression Algorithm
    - Generalization Issue
    - Linear Regression for Binary Classification

4 How Can Machines Learn Better?


Linear Regression Problem

Credit Limit Problem

[learning-setup diagram]
unknown target function f: X → Y (ideal credit limit formula),
    e.g. a customer with age 23 years, gender female, annual salary 33,000 USD,
    year in residence 1 year, year in job 0.5 year, current debt 20,000
    → credit limit? 5,000
training examples D: (x1, y1), ..., (xN, yN) (historical records in bank)
learning algorithm A, choosing from hypothesis set H (set of candidate formulas)
final hypothesis g ≈ f (learned formula to be used)

Y = R: regression

Linear Regression Hypothesis

customer features x, e.g.: age 23 years, annual salary 33,000 USD,
year in job 0.5 year, current debt 20,000

For x = (x0, x1, x2, ..., xd) features of the customer,
approximate the desired credit limit with a weighted sum:

    y ≈ Σ_{i=0}^{d} wi xi

linear regression hypothesis: h(x) = w^T x

h(x): like the perceptron, but without the sign
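To make the hypothesis concrete, here is a minimal NumPy sketch; the customer features and the weight vector below are made-up illustrative numbers (a real w would be learned from data):

```python
import numpy as np

# hypothetical customer features: x0 = 1 (constant coordinate), then
# age (years), annual salary (10k USD), year in job, current debt (10k USD)
x = np.array([1.0, 23.0, 3.3, 0.5, 2.0])

# made-up weight vector; a real w would be learned from data
w = np.array([0.2, 0.05, 1.5, 0.3, -0.8])

def h(w, x):
    """linear regression hypothesis: h(x) = w^T x (no sign taken)"""
    return w @ x

print(h(w, x))  # a real-valued prediction, e.g. a credit limit
```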


Illustration of Linear Regression

[figure: left, x = (x) ∈ R, a line fit to points in the plane; right, x = (x1, x2) ∈ R², a hyperplane fit to points in 3D; vertical bars mark the residuals]

linear regression: find lines/hyperplanes with small residuals

The Error Measure

popular/historical error measure: squared error err(ŷ, y) = (ŷ − y)²

in-sample:
    Ein(w) = (1/N) Σ_{n=1}^{N} (h(xn) − yn)² = (1/N) Σ_{n=1}^{N} (w^T xn − yn)²

out-of-sample:
    Eout(w) = E_{(x,y)~P} (w^T x − y)²

next: how to minimize Ein(w)?


Fun Time

Consider using the linear regression hypothesis h(x) = w^T x to predict the credit limit of customers x. Which feature below shall have a positive weight in a good hypothesis for the task?

1  birth month
2  monthly income
3  current debt
4  number of credit cards owned

Reference Answer: 2

Customers with higher monthly income should naturally be given a higher credit limit, which is captured by a positive weight on the monthly income feature.

Linear Regression Algorithm

Matrix Form of Ein(w)

Ein(w) = (1/N) Σ_{n=1}^{N} (w^T xn − yn)² = (1/N) Σ_{n=1}^{N} (xn^T w − yn)²

       = (1/N) ‖ [x1^T w − y1;  x2^T w − y2;  ...;  xN^T w − yN] ‖²

       = (1/N) ‖ [x1^T;  x2^T;  ...;  xN^T] w − [y1;  y2;  ...;  yN] ‖²

       = (1/N) ‖ X w − y ‖²
         with X: N × (d+1),  w: (d+1) × 1,  y: N × 1
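A small NumPy sketch checking the matrix form against the per-example sum on made-up data (all names and numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 5
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])  # N x (d+1), x0 = 1
y = rng.standard_normal(N)                                     # length-N targets
w = rng.standard_normal(d + 1)

# per-example form: (1/N) sum_n (w^T x_n - y_n)^2
Ein_sum = np.mean([(w @ xn - yn) ** 2 for xn, yn in zip(X, y)])

# matrix form: (1/N) ||Xw - y||^2
Ein_mat = np.linalg.norm(X @ w - y) ** 2 / N

assert np.isclose(Ein_sum, Ein_mat)
```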


min_w Ein(w) = (1/N) ‖Xw − y‖²

Ein(w): continuous, differentiable, convex

necessary condition of best w: the gradient is zero,

    ∇Ein(w) = [ ∂Ein/∂w0 (w),  ∂Ein/∂w1 (w),  ...,  ∂Ein/∂wd (w) ]^T = [0, 0, ..., 0]^T

task: find wLIN such that ∇Ein(wLIN) = 0


The Gradient ∇Ein(w)

Ein(w) = (1/N) ‖Xw − y‖² = (1/N) ( w^T X^T X w − 2 w^T X^T y + y^T y ),
    with A = X^T X,  b = X^T y,  c = y^T y

one w only:  Ein(w) = (1/N)(a w² − 2b w + c),
             ∇Ein(w) = (1/N)(2a w − 2b); simple! :-)

vector w:    Ein(w) = (1/N)(w^T A w − 2 w^T b + c),
             ∇Ein(w) = (1/N)(2 A w − 2 b); similar (derived by definition)

∇Ein(w) = (2/N) ( X^T X w − X^T y )
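A sketch of this gradient formula, verified against a finite-difference approximation on made-up data (illustrative only, not the lecture's code):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 100, 5
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
y = rng.standard_normal(N)
w = rng.standard_normal(d + 1)

def Ein(w):
    return np.linalg.norm(X @ w - y) ** 2 / N

def grad_Ein(w):
    return 2.0 / N * (X.T @ X @ w - X.T @ y)  # (2/N)(X^T X w - X^T y)

# finite-difference check of the first coordinate of the gradient
eps = 1e-6
e0 = np.zeros(d + 1)
e0[0] = 1.0
fd = (Ein(w + eps * e0) - Ein(w - eps * e0)) / (2 * eps)
assert np.isclose(fd, grad_Ein(w)[0], atol=1e-5)
```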


Optimal Linear Regression Weights

task: find wLIN such that (2/N)(X^T X wLIN − X^T y) = ∇Ein(wLIN) = 0

invertible X^T X (often the case, because N ≫ d + 1):
    easy! unique solution
    wLIN = (X^T X)^{−1} X^T y = X† y,  with pseudo-inverse X† = (X^T X)^{−1} X^T

singular X^T X:
    many optimal solutions; one of the solutions is
    wLIN = X† y, by defining X† in other ways

practical suggestion: use a well-implemented † routine instead of computing (X^T X)^{−1} X^T explicitly, for numerical stability when X^T X is almost singular
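The practical suggestion, sketched in NumPy: np.linalg.pinv (or np.linalg.lstsq) plays the role of the well-implemented † routine, while the explicit normal-equation inverse is only safe when X^T X is comfortably invertible. Data and names below are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 100, 5
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
y = rng.standard_normal(N)

# explicit normal equations: fine when X^T X is safely invertible
w_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# well-implemented pseudo-inverse routine, also sensible near singularity
w_pinv = np.linalg.pinv(X) @ y

# least-squares solver, the usual practical choice
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(w_normal, w_pinv) and np.allclose(w_pinv, w_lstsq)
```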

Linear Regression Algorithm

1  from D, construct input matrix X and output vector y by
       X = [x1^T;  x2^T;  ...;  xN^T]  (N × (d+1)),   y = [y1;  y2;  ...;  yN]  (N × 1)
2  calculate pseudo-inverse X†  ((d+1) × N)
3  return wLIN = X† y  ((d+1) × 1)

simple and efficient with a good † routine
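The three steps translate almost line for line into NumPy; a minimal sketch, with function name and data of my own choosing:

```python
import numpy as np

def linreg(X_raw, y):
    """linear regression via pseudo-inverse; returns w_LIN of length d+1"""
    N = len(y)
    X = np.hstack([np.ones((N, 1)), X_raw])   # step 1: add x0 = 1 -> N x (d+1)
    X_dagger = np.linalg.pinv(X)              # step 2: pseudo-inverse, (d+1) x N
    return X_dagger @ y                       # step 3: w_LIN = X† y, (d+1) x 1

# usage on made-up data
rng = np.random.default_rng(3)
X_raw = rng.standard_normal((100, 5))
y = X_raw @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)
w_lin = linreg(X_raw, y)
```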

Fun Time

After getting wLIN, we can calculate the predictions ŷn = wLIN^T xn. If all ŷn are collected in a vector ŷ, similar to how we form y, what is the matrix formula of ŷ?

1  X X^T y
2  X X† X X^T y
3  X X† y

Reference Answer: 3

Note that ŷ = X wLIN. Then, a simple substitution of wLIN = X† y reveals the answer.
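A quick numerical check of this identity on made-up data:

```python
import numpy as np

rng = np.random.default_rng(4)
N, d = 100, 5
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
y = rng.standard_normal(N)

X_dagger = np.linalg.pinv(X)
w_lin = X_dagger @ y

y_hat = X @ w_lin                             # predictions y_hat_n = w_LIN^T x_n
assert np.allclose(y_hat, X @ X_dagger @ y)   # matrix formula: y_hat = X X† y
```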


Linear Regression for Binary Classification

Linear Classification vs. Linear Regression

Linear Classification                      Linear Regression
Y = {−1, +1}                               Y = R
h(x) = sign(w^T x)                         h(x) = w^T x
err(ŷ, y) = [[ŷ ≠ y]]                      err(ŷ, y) = (ŷ − y)²
NP-hard to solve in general                efficient analytic solution

{−1, +1} ⊂ R: linear regression for classification?

1  run LinReg on binary classification data D (efficient)
2  return g(x) = sign(wLIN^T x)

but what is the explanation of this heuristic?
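A minimal sketch of this two-step heuristic on made-up ±1-labeled data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)
N, d = 200, 5
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
y = np.sign(X @ rng.standard_normal(d + 1))   # made-up labels in {-1, +1}

w_lin = np.linalg.pinv(X) @ y                 # step 1: run LinReg on the ±1 labels

def g(X_new):
    return np.sign(X_new @ w_lin)             # step 2: g(x) = sign(w_LIN^T x)

print(np.mean(g(X) != y))                     # in-sample 0/1 error of the heuristic
```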

Relation of Two Errors

err0/1 = [[ sign(w^T x) ≠ y ]]
errsqr = ( w^T x − y )²

[figure: err versus w^T x, for desired y = +1 and desired y = −1; in both panels the 0/1 error (a step) lies below the squared error (a parabola)]

err0/1 ≤ errsqr
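The inequality can be spot-checked numerically: for scores s = w^T x on a grid and y ∈ {−1, +1}, the 0/1 error never exceeds the squared error. A small sketch, not a proof:

```python
import numpy as np

s = np.linspace(-3, 3, 601)              # a grid of scores w^T x
for y in (-1.0, +1.0):
    err01 = (np.sign(s) != y).astype(float)
    errsqr = (s - y) ** 2
    assert np.all(err01 <= errsqr)       # err_0/1 <= err_sqr on the whole grid
```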


Linear Regression for Binary Classification

err0/1 ≤ errsqr
⇒ classification Eout(w)  ≤ (by VC)  classification Ein(w) + √(...)
                           ≤          regression Ein(w) + √(...)

(loose) upper bound: use errsqr as êrr to approximate err0/1,
trading bound tightness for efficiency

wLIN: useful baseline classifier, or as an initial vector for PLA/pocket

Summary

1 When Can Machines Learn?
2 Why Can Machines Learn?

    Lecture 8: Noise and Error

3 How Can Machines Learn?

    Lecture 9: Linear Regression
    - Linear Regression Problem: use hyperplanes to approximate real values
    - Linear Regression Algorithm: analytic solution with pseudo-inverse
    - Generalization Issue: Eout ≈ Ein + 2(d+1)/N on average
    - Linear Regression for Binary Classification: 0/1 error ≤ squared error

    next: binary classification, regression, and then?

4 How Can Machines Learn Better?
