Outline
Univariate regression
Multivariate regression
Probabilistic view of regression
Loss functions
Bias-Variance analysis
Regularization
Linear Regression
Notations
Training dataset - \{(x_i, y_i)\}_{i=1}^{N}
Number of examples - N
Input variable - x_i
Target variable - y_i
Example: predicting a film's profit or loss from its cost.

Cost of Film (Crores of Rs) x    Profit/Loss (Crores of Rs) y
98.28                            199.69
40.22                             93.69
62.07                            100.33
Simplest form:
h(x) = w_0 + w_1 x
The parameters are chosen to minimize the sum-of-squares error between the predictions h(x_i) and the targets y_i:
E(w_0, w_1) = \frac{1}{2} \sum_{i=1}^{N} \left( h(x_i) - y_i \right)^2
Gradient descent minimizes
E(w_0, w_1) = \frac{1}{2} \sum_{i=1}^{N} \left( h(x_i) - y_i \right)^2
by repeatedly stepping in the direction of the negative gradient, with learning rate \eta:
w_0 \leftarrow w_0 - \eta \sum_{i=1}^{N} \left( h(x_i) - y_i \right)
w_1 \leftarrow w_1 - \eta \sum_{i=1}^{N} \left( h(x_i) - y_i \right) x_i
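A minimal NumPy sketch of these batch updates, using the film data from the table above; the learning rate and iteration count are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Film data from the earlier slide: cost x and profit/loss y (Crores of Rs)
x = np.array([98.28, 40.22, 62.07])
y = np.array([199.69, 93.69, 100.33])

w0, w1 = 0.0, 0.0   # initial parameters
eta = 1e-5          # learning rate (assumed; tune for convergence)

for _ in range(10_000):
    residual = (w0 + w1 * x) - y        # h(x_i) - y_i for every example
    w0 -= eta * residual.sum()          # intercept update
    w1 -= eta * (residual * x).sum()    # slope update

print(w0, w1)
```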
Example: snapshots at iterations 0, 1, 2, 4, 7, and 9 of gradient descent (figures showing the regression function and the error function at each iteration).
Gradient Descent
Batch Mode
Each update includes the contribution of all data points:
w_0 \leftarrow w_0 - \eta \sum_{i=1}^{N} \left( h(x_i) - y_i \right)
w_1 \leftarrow w_1 - \eta \sum_{i=1}^{N} \left( h(x_i) - y_i \right) x_i
Multivariate Regression
Example: predicting a film's profit or loss from several input features.

Cost of Film     Celebrity status     # of theatres   Age of the    Profit/Loss
(Crores of Rs)   of the protagonist   at release      protagonist   (Crores of Rs) y
75.72            7.57                 32              52            157.39
18.74            1.87                 16              68             81.93
50.96            5.09                 27              35            131.95
With d input features the hypothesis becomes
h(x) = w_0 + w_1 x_1 + \dots + w_d x_d = w^{T} x
and the error function keeps the same form:
E(w) = \frac{1}{2} \sum_{i=1}^{N} \left( h(x_i) - y_i \right)^2

Gradient Descent
Parameter update equation (with x_{i0} = 1 for the bias term):
w_j \leftarrow w_j - \eta \sum_{i=1}^{N} \left( h(x_i) - y_i \right) x_{ij}, \qquad j = 0, 1, \dots, d
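The same update in vectorized form, sketched in NumPy on the multivariate film table; again the learning rate and iteration count are assumptions.

```python
import numpy as np

# Features: cost, celebrity status, # of theatres, age of the protagonist
X = np.array([[75.72, 7.57, 32, 52],
              [18.74, 1.87, 16, 68],
              [50.96, 5.09, 27, 35]])
y = np.array([157.39, 81.93, 131.95])

X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x_0 = 1 for the bias
w = np.zeros(X.shape[1])
eta = 1e-6                                    # learning rate (assumed)

for _ in range(50_000):
    grad = X.T @ (X @ w - y)   # component j is sum_i (h(x_i) - y_i) x_ij
    w -= eta * grad
```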
In matrix form, stack the inputs (with a leading 1 for the bias term) into the design matrix X and the targets into the vector y:
X = \begin{pmatrix} 1 & 75.72 & 7.57 & 32 & 52 \\ 1 & 18.74 & 1.87 & 16 & 68 \\ 1 & 50.96 & 5.09 & 27 & 35 \end{pmatrix}, \qquad y = \begin{pmatrix} 157.39 \\ 81.93 \\ 131.95 \end{pmatrix}
The error function can then be written as
E(w) = \frac{1}{2} \sum_{i=1}^{N} \left( w^{T} x_i - y_i \right)^2 = \frac{1}{2} (Xw - y)^{T} (Xw - y)
Normal Equations
\min_w E(w) = \min_w \frac{1}{2} (Xw - y)^{T} (Xw - y)
Find the gradient with respect to w and equate it to 0:
\nabla_w E(w) = X^{T} (Xw - y) = 0 \quad \Rightarrow \quad w = (X^{T} X)^{-1} X^{T} y
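A sketch of the analytical solution in NumPy. np.linalg.lstsq solves the least-squares problem directly, which is better conditioned than inverting X^T X explicitly; with this tiny film table (3 examples, 5 parameters) X^T X is in fact singular, and lstsq returns the minimum-norm solution.

```python
import numpy as np

# Design matrix with a leading column of ones for the bias
X = np.array([[1, 75.72, 7.57, 32, 52],
              [1, 18.74, 1.87, 16, 68],
              [1, 50.96, 5.09, 27, 35]])
y = np.array([157.39, 81.93, 131.95])

# Least-squares solution of Xw = y; equivalent to the normal equations
# w = (X^T X)^{-1} X^T y whenever X^T X is invertible.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)
```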
Analytical Solution
Advantage
No need for the learning rate \eta!
No need for iterative updates
Disadvantage
Need to perform matrix inversion
Probabilistic View of Regression
The Gaussian \mathcal{N}(0, \sigma^2) has maximum entropy among all real-valued distributions with a specified variance \sigma^2.
3\sigma rule: about 99.7% of the probability mass lies within three standard deviations of the mean.
Assume the target is a deterministic function of the input plus Gaussian noise:
t = y(x, w) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2)
Then
p(t \mid x, w, \sigma^2) = \mathcal{N}\left( t \mid y(x, w), \sigma^2 \right)
And
\mathbb{E}[t \mid x] = y(x, w)
Figure 1.28 (Bishop, PRML): the regression function y(x) and the conditional density p(t | x_0) at an input x_0.
Assuming the examples are drawn independently, the likelihood of the targets is
p(t_1, \dots, t_N \mid x_1, \dots, x_N, w, \sigma^2) = \prod_{i=1}^{N} \mathcal{N}\left( t_i \mid y(x_i, w), \sigma^2 \right)
Maximizing this likelihood with respect to w is equivalent to minimizing the sum-of-squares error.
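To make that equivalence explicit, here is the standard log-likelihood calculation (a reconstruction in PRML's notation, not copied from the slides):

```latex
\ln p(\mathbf{t} \mid \mathbf{x}, w, \sigma^2)
  = \sum_{i=1}^{N} \ln \mathcal{N}\!\left( t_i \mid y(x_i, w), \sigma^2 \right)
  = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} \left( t_i - y(x_i, w) \right)^2
    - \frac{N}{2}\ln\sigma^2 - \frac{N}{2}\ln 2\pi
```

Only the first term depends on w, so the maximum-likelihood solution coincides with the least-squares solution.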
Loss Functions
Squared loss: \left( h(x) - y \right)^2
Absolute loss: \left| h(x) - y \right|
Dead band loss: \max\left( 0, |h(x) - y| - \epsilon \right)
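The three losses as small NumPy helpers, written as functions of the residual r = h(x) - y; a sketch for comparison, with the dead-band width eps an assumed parameter.

```python
import numpy as np

def squared_loss(r):
    return r ** 2                       # penalizes large residuals quadratically

def absolute_loss(r):
    return np.abs(r)                    # linear penalty, more robust to outliers

def dead_band_loss(r, eps=1.0):
    return np.maximum(0.0, np.abs(r) - eps)   # no penalty inside the band |r| <= eps
```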
Loss Functions
Problem with squared loss: large residuals are penalized quadratically, so a few outliers can dominate the fit.
Minimizing the absolute loss can be posed as a linear program by introducing one slack variable \xi_i per example:
\min_{w} \sum_{i=1}^{N} \left| w^{T} x_i - y_i \right| \;\equiv\; \min_{w, \xi} \sum_{i=1}^{N} \xi_i, \quad \text{subject to} \quad -\xi_i \le w^{T} x_i - y_i \le \xi_i
(Figure: LP output.)
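A sketch of this LP using scipy.optimize.linprog; the function name l1_regression is illustrative, not from the slides.

```python
import numpy as np
from scipy.optimize import linprog

def l1_regression(X, y):
    """Absolute-loss regression as an LP: min sum(xi) s.t. -xi <= Xw - y <= xi."""
    n, d = X.shape
    c = np.concatenate([np.zeros(d), np.ones(n)])  # variables z = [w, xi]
    A_ub = np.block([[ X, -np.eye(n)],             #   Xw - y <= xi
                     [-X, -np.eye(n)]])            # -(Xw - y) <= xi
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)] * n  # w free, xi >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:d]
```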
Bias-Variance Analysis
(Figures: polynomial fits, e.g. Degree = 4, over multiple datasets, illustrating the tradeoff between bias and variance.)
Variance disappears as the number of training examples N \to \infty.
Regularization
Central idea: penalize over-complicated solutions.
Linear regression minimizes
\sum_{i=1}^{N} \left( h(x_i) - y_i \right)^2
The regularized version minimizes
\sum_{i=1}^{N} \left( h(x_i) - y_i \right)^2 + \lambda\, \Omega(w)
where \Omega(w) measures the complexity of the solution and \lambda \ge 0 controls the strength of the penalty.
Modified Solution
Solution for ordinary linear regression:
\min_w \frac{1}{2} (Xw - y)^{T} (Xw - y) \quad \Rightarrow \quad w = (X^{T} X)^{-1} X^{T} y
Now for the regularized version, which uses the \ell_2 norm (Ridge Regression):
\min_w \frac{1}{2} (Xw - y)^{T} (Xw - y) + \frac{\lambda}{2} \|w\|^2 \quad \Rightarrow \quad w = (X^{T} X + \lambda I)^{-1} X^{T} y
Exercise: derive the closed-form solution for ridge regression with the L2 regularizer.
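A sketch of the ridge closed form in NumPy; note that X^T X + \lambda I is invertible for any \lambda > 0, which fixes the singular system seen earlier.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    # Solve the regularized normal equations; solving the linear system is
    # preferred over forming the matrix inverse explicitly.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

(In practice the bias column is often left out of the penalty; this sketch penalizes all coefficients for simplicity.)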
How to choose \lambda?
Tradeoff between complexity and goodness of fit.
Solution 1: if we have lots of data
Generate multiple models (one per candidate \lambda)
Use lots of held-out test data to discard the bad models
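A minimal sketch of that recipe, assuming the ridge() helper above: fit one model per candidate \lambda on a training split and keep the one with the lowest error on a held-out split.

```python
import numpy as np

def choose_lambda(X, y, lambdas, val_frac=0.3, seed=0):
    """Pick lambda by validation error on a random held-out split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_val = int(val_frac * len(y))
    val, tr = idx[:n_val], idx[n_val:]
    best_lam, best_err = None, np.inf
    for lam in lambdas:
        w = ridge(X[tr], y[tr], lam)               # ridge() from the sketch above
        err = np.mean((X[val] @ w - y[val]) ** 2)  # validation mean squared error
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```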
A more general regularizer raises the coefficient magnitudes to a power q:
\sum_{i=1}^{N} \left( h(x_i) - y_i \right)^2 + \lambda \sum_{j=1}^{M} |w_j|^q
Quadratic (\ell_2) regularizer: q = 2.
Figure 3.3 (Bishop, PRML): contours of the regularization term for q = 0.5, 1, 2, and 4.
Error function with the q = 1 regularizer:
\sum_{i=1}^{N} \left( h(x_i) - y_i \right)^2 + \lambda \sum_{j=1}^{M} |w_j|
For sufficiently large \lambda, many of the coefficients w_j become 0, resulting in a sparse solution.
(Figure: error contours and the constraint regions in the (w_1, w_2) plane for the quadratic and q = 1 regularizers.)
LASSO
Quadratic programming can be used to solve the optimization problem.
Least Angle Regression solution - refer to ESL.
http://web.stanford.edu/~hastie/glmnet_matlab/ - MATLAB packages for LASSO.
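For a quick experiment, scikit-learn's coordinate-descent Lasso (a different solver from the QP and LARS approaches named above) fits the same model; alpha plays the role of \lambda and the value here is arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso

X = np.array([[75.72, 7.57, 32, 52],
              [18.74, 1.87, 16, 68],
              [50.96, 5.09, 27, 35]])
y = np.array([157.39, 81.93, 131.95])

model = Lasso(alpha=1.0).fit(X, y)
print(model.coef_)   # with larger alpha, more coefficients shrink to exactly 0
```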
Basis Functions
The linear model extends to nonlinear functions of the input through fixed basis functions \phi_j:
y(x, w) = w_0 + \sum_{j=1}^{M-1} w_j\, \phi_j(x)
Figure 3.1 (Bishop, PRML): examples of basis functions, showing polynomials on the left, Gaussians of the form (3.4) in the centre, and sigmoidal of the form (3.5) on the right.
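Two of those basis families as NumPy feature builders; the function names and the Gaussian width parameter s are illustrative.

```python
import numpy as np

def polynomial_design(x, degree):
    """Design matrix with polynomial basis: phi_j(x) = x**j, j = 0..degree."""
    return np.vander(x, degree + 1, increasing=True)

def gaussian_design(x, centers, s):
    """Gaussian basis phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), plus a bias column."""
    phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * s ** 2))
    return np.hstack([np.ones((len(x), 1)), phi])
```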
With basis functions the design matrix generalizes to
\Phi = \begin{pmatrix} \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_{M-1}(x_1) \\ \vdots & & \vdots \\ \phi_0(x_N) & \phi_1(x_N) & \cdots & \phi_{M-1}(x_N) \end{pmatrix}
and the least-squares solution keeps the same form as before:
w = (\Phi^{T} \Phi)^{-1} \Phi^{T} y
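A short usage sketch on synthetic 1-D data, combining the basis builders above with a least-squares solve; the data-generating function is an arbitrary choice for illustration.

```python
import numpy as np

# Fit a Gaussian-basis model to noisy samples of sin(2*pi*x)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(0).normal(size=50)

Phi = gaussian_design(x, centers=np.linspace(0, 1, 9), s=0.1)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # w = (Phi^T Phi)^{-1} Phi^T y
```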
Summary
Linear Regression (aka curve fitting)
Gradient Descent Approach for finding the solution
Analytical solution
Loss Functions
Probabilistic view of Linear Regression
Bias-Variance analysis
Regularization
Ridge Regression