Ho Tu Bao
Japan Advanced Institute of Science and Technology
John von Neumann Institute, VNU-HCM
Machine learning
A newer, fast-developing field
Aims to exploit datasets to learn
Early work focused on symbolic data
Relates closely to data mining
(more practical exploitation of large datasets)
Increasingly employs statistical methods
Became more practical as computing power grew
ML people: need to learn statistics!
Outline
1. Introduction
2. The Regression Function and Least Squares
3. Prediction Accuracy and Model Assessment
Introduction
Model and modeling
Model:
A simplified description or abstraction of a reality (an entity).
Introduction
History
The earliest form of regression was the method of least squares, which
was published by Legendre in 1805 and by Gauss in 1809.
The term "regression" was coined by Francis Galton in the 19th century to
describe a biological phenomenon; it was extended by Udny Yule
and Karl Pearson to a more general statistical context (1897, 1903).
In the 1950s and 1960s, economists used electromechanical desk calculators
to compute regressions. Before 1970, it sometimes took up to 24 hours
to receive the result of one regression.
Regression methods continue to be an area of active research. In recent
decades, new methods have been developed for robust regression, regression on
time series, images, graphs, and other complex data objects,
nonparametric regression, Bayesian methods for regression, etc.
Introduction
Regression and model
Given a data set $\{(\mathbf{x}_i, y_i),\ i = 1, \dots, n\}$
Introduction
Least square fit
Problem statement
Adjust the parameters of a model function to best fit a data set.
Data set
$(x_i, y_i),\ i = 1, \dots, n$
Model function
$f(x, \boldsymbol\beta)$
Sum of squared residuals
$$S = \sum_{i=1}^{n} r_i^2, \qquad r_i = y_i - f(x_i, \boldsymbol\beta)$$
E.g. $f(x, \boldsymbol\beta) = \beta_0 + \beta_1 x$, with $\beta_0$ the intercept and $\beta_1$ the slope.
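As a minimal sketch (not from the slides), the straight-line example above can be fitted with NumPy; the data values are made up for illustration.

```python
import numpy as np

# Made-up data set (x_i, y_i), i = 1..n
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Model function f(x, beta) = beta0 + beta1 * x, written as a design matrix
X = np.column_stack([np.ones_like(x), x])

# Least-squares estimate of (beta0, beta1): minimizes the sum of squared residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta
S = np.sum(residuals**2)          # sum of squared residuals at the optimum
print("beta0, beta1 =", beta)     # fitted intercept and slope
print("S =", S)
```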
Introduction
Least square fit
The minimum of $S$ is found where its gradient is zero:
$$\frac{\partial S}{\partial \beta_j} = 2\sum_{i=1}^{n} r_i \frac{\partial r_i}{\partial \beta_j} = 0, \qquad j = 1, \dots, m$$
Gradient equations
$$-2\sum_{i=1}^{n} r_i \frac{\partial f(x_i, \boldsymbol\beta)}{\partial \beta_j} = 0, \qquad j = 1, \dots, m$$
Introduction
Least square fit
For a nonlinear model function $f(x_i, \boldsymbol\beta)$ the gradient equations have no closed-form solution, so the parameters are refined by iterative approximation,
$$\boldsymbol\beta^{k+1} = \boldsymbol\beta^{k} + \Delta\boldsymbol\beta,$$
where $\Delta\boldsymbol\beta$ is the shift vector. At each iteration the model is linearized around $\boldsymbol\beta^{k}$:
$$f(x_i, \boldsymbol\beta) \approx f(x_i, \boldsymbol\beta^{k}) + \sum_{j} J_{ij}\,\Delta\beta_j, \qquad
J_{ij} = \frac{\partial f(x_i, \boldsymbol\beta^{k})}{\partial \beta_j}.$$
Substituting into the gradient equations gives the normal equations of the
Gauss-Newton algorithm:
$$\sum_{i=1}^{n}\sum_{j=1}^{m} J_{ik} J_{ij}\,\Delta\beta_j = \sum_{i=1}^{n} J_{ik}\,\Delta y_i, \qquad k = 1, \dots, m,$$
where $\Delta y_i = y_i - f(x_i, \boldsymbol\beta^{k})$; in matrix form, $(\mathbf{J}^\top\mathbf{J})\,\Delta\boldsymbol\beta = \mathbf{J}^\top \Delta\mathbf{y}$.
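A minimal Gauss-Newton sketch in NumPy, assuming an exponential decay model $f(x, \boldsymbol\beta) = \beta_0 e^{-\beta_1 x}$ chosen only for illustration (the model and data are not from the slides).

```python
import numpy as np

def f(x, beta):
    # Example nonlinear model: f(x, beta) = beta0 * exp(-beta1 * x)
    return beta[0] * np.exp(-beta[1] * x)

def jacobian(x, beta):
    # J[i, j] = d f(x_i, beta) / d beta_j
    J = np.empty((x.size, 2))
    J[:, 0] = np.exp(-beta[1] * x)
    J[:, 1] = -beta[0] * x * np.exp(-beta[1] * x)
    return J

def gauss_newton(x, y, beta0, n_iter=20):
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        r = y - f(x, beta)                 # residuals (Delta y)
        J = jacobian(x, beta)
        # Solve (J^T J) delta = J^T r for the shift vector
        delta = np.linalg.solve(J.T @ J, J.T @ r)
        beta = beta + delta
    return beta

# Synthetic data generated from the same model (for demonstration only)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 30)
y = 2.5 * np.exp(-0.8 * x) + 0.05 * rng.standard_normal(x.size)
print(gauss_newton(x, y, beta0=[1.0, 1.0]))   # should be close to (2.5, 0.8)
```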
Introduction
Simple linear regression and correlation
Okun's law (macroeconomics) is an example of simple linear regression:
GDP growth is presumed to be in a linear relationship with
changes in the unemployment rate.
Introduction
Simple linear regression and correlation
Introduction
Simple linear regression
Example: Do all houses of the same size sell for exactly the same price?
Models
The cost of building a new house is about $75 per square foot and most
lots sell for about $25,000. Hence the approximate selling price (Y)
would be:
Y = $25,000 + ($75/ft²)(X)
(where X is the size of the house in square feet)
Introduction
Background of model design
The facts
Having too many input variables in the regression model leads to
an overfitting regression function with an inflated variance.
Having too few input variables in the regression model leads to
an underfitting, high-bias regression function with poor
explanation of the data.
The importance of a variable
Depends on how seriously prediction accuracy is affected if the
variable is dropped.
The driving force behind model design
The desire for a simpler and more easily interpretable regression
model, combined with a need for greater accuracy in prediction.
Introduction
Simple linear regression
A model of the relationship between house size (independent variable)
and house price (dependent variable) would be:
[Figure: deterministic straight-line relationship between house size (x-axis) and house price (y-axis).]
In this model, the price of the house is completely determined by the size.
Introduction
Simple linear regression
Example: Do all houses of the same size sell for exactly the same price?
Probabilistic model:
$$Y = 25{,}000 + 75X + \varepsilon$$
where $\varepsilon$ is the random term (error variable). It is the difference
between the actual selling price and the estimated price based on the
size of the house. Its value will vary from house sale to house sale, even
if the square footage (X) remains the same.
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
with $\beta_0$ the intercept, $\beta_1$ the slope, $\varepsilon$ the error variable, and $X$ the independent variable.
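A small sketch (not from the slides) that simulates this probabilistic model: two houses with the same square footage get different selling prices because of the error term. The error distribution used here is an assumption made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def selling_price(size_sqft):
    # Y = 25,000 + 75 * X + epsilon, with epsilon ~ N(0, 10,000^2)
    # (the error variance is an assumed value, purely illustrative)
    eps = rng.normal(0.0, 10_000.0)
    return 25_000.0 + 75.0 * size_sqft + eps

# Same size, different prices on different sales
print(selling_price(2000))
print(selling_price(2000))
```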
Introduction
Regression and model
Techniques for modeling and analyzing the relationship between
dependent variables and independent variables.
$$Y = f(X_1, \dots, X_r) + \varepsilon$$
Different forms of $f$:
linear vs. nonlinear,
parametric vs. nonparametric.
Linear regression: $f$ is a linear combination of the parameters (but need not be linear in the independent variables).
Introduction
Model selection and model assessment
The Regression Function and Least Squares
Loss function
$L(Y, f(X))$ measures the loss from predicting $Y$ by $f(X)$.
Risk function
$$R(f) = E\{L(Y, f(X))\}$$
With squared-error loss, $L(Y, f(X)) = (Y - f(X))^2$ and $R(f) = E\{(Y - f(X))^2\}$.
Taking $f(X) = f^*(X) = E(Y \mid X)$, the previous expression is minimized:
$$f^* = \arg\min_f E\{(Y - f(X))^2\}, \qquad f^*(x) = E(Y \mid X = x).$$
$f^*$ is the regression function, and the corresponding statistical model is
$$Y = f^*(X) + \varepsilon,$$
with systematic component $f^*(X)$ and random error $\varepsilon$.
Assumption
The output variables are linearly related to the input variables
The model
$$Y = \beta_0 + \sum_{j=1}^{r} \beta_j X_j + \varepsilon, \qquad
E(Y \mid X) = \beta_0 + \sum_{j=1}^{r} \beta_j X_j$$
The tasks: estimate the coefficients $\beta_0, \beta_1, \dots, \beta_r$ from the data.
In the simple (one-predictor) case,
$$E(Y \mid X = x) = \beta_0 + \beta_1 x.$$
[Figure: regression plot. The conditional mean line is $\mu_{y|x} = \alpha + \beta x$, with $\alpha$ = intercept and $\beta$ = slope; for each $x$ the errors follow $N(\mu_{y|x}, \sigma_{y|x}^2)$: identical normal distributions of errors, all centered on the regression line.]
Regression coefficients
For the simple linear regression function
$$\mu(x) = E(Y \mid X = x) = \beta_0 + \beta_1 x,$$
the population coefficients are
$$\beta_1 = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)} = \frac{\sigma_{XY}}{\sigma_X^2}, \qquad
\beta_0 = \mu_Y - \beta_1 \mu_X.$$
Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.
Replacing the population moments by their sample versions, we get the least-squares estimates
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$
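A minimal NumPy sketch (illustrative data, not from the slides) computing these sample estimates directly from the formulas above.

```python
import numpy as np

x = np.array([1500., 1800., 2100., 2400., 2700.])   # e.g. house sizes (ft^2), made up
y = np.array([140e3, 160e3, 190e3, 205e3, 230e3])    # e.g. selling prices ($), made up

x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar
print(beta0_hat, beta1_hat)   # estimated intercept and slope
```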
The multiple linear regression model is
$$y_i = \beta_0 + \sum_{j=1}^{r} \beta_j x_{ij} + e_i, \qquad i = 1, 2, \dots, n,$$
or in matrix form
$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e},$$
where $\mathbf{e}$ is a random n-vector of unobservable errors with $E(\mathbf{e}) = \mathbf{0}$, $\mathrm{var}(\mathbf{e}) = \sigma^2 \mathbf{I}_n$.
The error sum of squares is
$$ESS(\boldsymbol{\beta}) = \sum_{i=1}^{n} e_i^2 = \mathbf{e}^\top\mathbf{e} = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^\top(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}).$$
Minimizing $ESS(\boldsymbol{\beta})$ gives the ordinary least-squares (OLS) estimator
$$\hat{\boldsymbol{\beta}}_{OLS} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y}.$$
Even though the descriptions differ as to how the input data are
generated, the ordinary least-squares estimates turn out to be
the same for the random-X case and the fixed-X case.
The components of the n-vector of OLS fitted values are the vertical
projections of the n points onto the LS regression surface (or hyperplane):
$$\hat{y}_i = \mathbf{x}_i^\top \hat{\boldsymbol{\beta}}, \qquad i = 1, \dots, n.$$
In matrix form, the fitted values are
$$\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y} = \mathbf{H}\mathbf{Y},$$
where the $(n \times n)$ matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top$ is often called the hat matrix.
Residual variance
$$\hat{\sigma}^2 = \frac{RSS}{n - r - 1}, \qquad
RSS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = (\mathbf{Y} - \hat{\mathbf{Y}})^\top(\mathbf{Y} - \hat{\mathbf{Y}}).$$
Regression sum of squares
$$SS_{reg} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2, \qquad
SS_{tot} = \sum_{i=1}^{n}(y_i - \bar{y})^2 = SS_{reg} + RSS.$$
To test whether the regression is significant, use the F-statistic
$$F = \frac{SS_{reg}/r}{RSS/(n - r - 1)}.$$
If testing a single coefficient $\beta_j = 0$, use the t-statistic
$$t_j = \frac{\hat{\beta}_j}{\widehat{\mathrm{se}}(\hat{\beta}_j)}.$$
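A minimal NumPy sketch (made-up data) of these matrix formulas: the OLS estimate, the hat matrix, the residual variance, and the overall F-statistic.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 50, 3
X0 = rng.standard_normal((n, r))                   # predictors
X = np.column_stack([np.ones(n), X0])              # add intercept column
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + 0.3 * rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                       # OLS estimate
H = X @ XtX_inv @ X.T                              # hat matrix
y_hat = H @ y                                      # fitted values

rss = np.sum((y - y_hat) ** 2)
sigma2_hat = rss / (n - r - 1)                     # residual variance
ss_reg = np.sum((y_hat - y.mean()) ** 2)
F = (ss_reg / r) / (rss / (n - r - 1))             # overall F-statistic
print(beta_hat, sigma2_hat, F)
```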
The bodyfat example: percent body fat regressed on 13 body measurements,
$$\begin{aligned}
\text{bodyfat} = \beta_0 &+ \beta_1(\text{age}) + \beta_2(\text{weight}) + \beta_3(\text{height})\\
&+ \beta_4(\text{neck}) + \beta_5(\text{chest}) + \beta_6(\text{abdomen}) + \beta_7(\text{hip})\\
&+ \beta_8(\text{thigh}) + \beta_9(\text{knee}) + \beta_{10}(\text{ankle})\\
&+ \beta_{11}(\text{biceps}) + \beta_{12}(\text{forearm}) + \beta_{13}(\text{wrist}) + e
\end{aligned}$$
Prediction Accuracy and Model Assessment
The aims
Prediction is the art of making accurate guesses about new
response values that are independent of the current data.
Good predictive ability is often recognized as the most useful way of
assessing the fit of a model to data.
Practice
Learning data $\mathcal{L} = \{(\mathbf{x}_i, y_i),\ i = 1, \dots, n\}$ is used for the regression of $Y$ on $X$.
A new $Y$ is predicted by applying the fitted model to a brand-new $\mathbf{x}$ from the test set $\mathcal{T}$.
The predicted value is compared with the actual response value. The
predictive ability of the regression model is assessed by its
prediction error (or generalization error), an overall measure of the
quality of the prediction, usually taken to be mean squared error.
Under the linear model
$$Y = \beta_0 + \sum_{j=1}^{r}\beta_j X_j + e = \mu(X) + e,$$
where $\mu(X) = \beta_0 + \sum_{j=1}^{r}\beta_j X_j$, $E(e) = 0$, and $\mathrm{var}(e) = \sigma^2$,
the prediction error of a fitted regression function $\hat{\mu}$ decomposes as
Prediction Error
$$PE(\hat{\mu}) = E\{(Y - \hat{\mu}(X))^2\} = \sigma^2 + E\{(\hat{\mu}(X) - \mu(X))^2\},$$
i.e. the irreducible error $\sigma^2$ plus the model error. Over the observed design points,
Model Error
$$ME(\hat{\mu}) = \frac{1}{n}\sum_{i=1}^{n}\{\hat{\mu}(\mathbf{x}_i) - \mu(\mathbf{x}_i)\}^2.$$
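A small simulation sketch (assumed setup, not from the slides): because the true $\mu$ is known here, both the model error and the prediction error can be computed and compared with $\sigma^2 + ME$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, sigma = 200, 5, 0.5
X = np.column_stack([np.ones(n), rng.standard_normal((n, r))])
beta = rng.standard_normal(r + 1)
mu = X @ beta                                   # true regression function at the design points
y = mu + sigma * rng.standard_normal(n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
mu_hat = X @ beta_hat

ME = np.mean((mu_hat - mu) ** 2)                # model error
# Prediction error estimated on fresh responses at the same design points
y_new = mu + sigma * rng.standard_normal(n)
PE = np.mean((y_new - mu_hat) ** 2)
print(ME, PE, sigma**2 + ME)                    # PE should be close to sigma^2 + ME
```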
[Figure: model-building and evaluation workflow. The data are split into a training set and a validation set; the model builder is fitted on the training set, its predictions are evaluated on the validation set, and the selected final model receives a final evaluation on held-out data.]
Fixed-X case: the conditional bootstrap is appropriate, but cross-validation is not appropriate for estimating prediction error.
Cross-Validation (V-fold)
Randomly partition the data into V disjoint folds of roughly equal size (e.g. V = 3: folds 1, 2, 3).
For each v = 1, ..., V, fit the model on the learning (training) part made of the other V - 1 folds, then test it on the held-out fold v. Average the test errors over the folds:
$$\widehat{PE}_{CV} = \frac{1}{n}\sum_{v=1}^{V}\sum_{i \in \mathcal{F}_v}\left(y_i - \hat{\mu}^{(-v)}(\mathbf{x}_i)\right)^2,$$
where $\hat{\mu}^{(-v)}$ is the regression function fitted with fold $v$ removed and $\mathcal{F}_v$ is the set of observations in fold $v$.
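A minimal NumPy sketch of V-fold cross-validation for a linear model; the data and the choice V = 5 are made up for illustration.

```python
import numpy as np

def cv_prediction_error(X, y, V=5, seed=0):
    """V-fold cross-validation estimate of prediction error (mean squared error)."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % V          # random partition into V folds
    sse = 0.0
    for v in range(V):
        test = folds == v
        train = ~test
        beta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
        resid = y[test] - X[test] @ beta
        sse += np.sum(resid ** 2)
    return sse / n

# Example with synthetic data
rng = np.random.default_rng(4)
X = np.column_stack([np.ones(100), rng.standard_normal((100, 3))])
y = X @ np.array([1.0, 0.5, -0.5, 2.0]) + 0.4 * rng.standard_normal(100)
print(cv_prediction_error(X, y, V=5))
```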
Unconditional Bootstrap
Draw B bootstrap samples $\{(\mathbf{x}_i^{*b}, y_i^{*b}),\ i = 1, \dots, n\}$, $b = 1, \dots, B$, by resampling the n observation pairs with replacement, and refit the regression on each sample to obtain $\hat{\mu}^{*b}$.
A simple bootstrap estimate of the prediction error applies each refitted model to the original data and averages:
$$\widehat{PE}_{boot} = \frac{1}{B}\sum_{b=1}^{B}\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{\mu}^{*b}(\mathbf{x}_i)\right)^2.$$
Bootstrap
Unconditional Bootstrap
The apparent (resubstitution) error of the model fitted to the full data is
$$\overline{err} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{\mu}(\mathbf{x}_i)\right)^2,$$
and the prediction error is estimated as
$$\widehat{PE} = \overline{err} + \hat{\omega},$$
where $\hat{\omega}$ is a bootstrap estimate of the optimism of the apparent error.
Bootstrap
Unconditional Bootstrap
The optimism is the amount by which the apparent error underestimates the prediction error; the bootstrap improves on $\overline{err}$ by estimating this bias and correcting for it, i.e. adding the estimated optimism $\hat{\omega}$ as above.
Note that
$$\Pr\{(\mathbf{x}_i, y_i) \in \text{bootstrap sample}\} = 1 - \left(1 - \frac{1}{n}\right)^{n} \to 1 - e^{-1} \approx 0.632 \quad \text{as } n \to \infty,$$
which motivates the 0.632 bootstrap estimator.
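A minimal sketch (illustrative only, not the slides' own procedure) of the simple unconditional-bootstrap estimate described above: resample (x, y) pairs, refit, and evaluate each refit on the original data.

```python
import numpy as np

def bootstrap_prediction_error(X, y, B=200, seed=0):
    """Simple unconditional-bootstrap estimate of prediction error."""
    n = len(y)
    rng = np.random.default_rng(seed)
    pe = 0.0
    for _ in range(B):
        idx = rng.integers(0, n, size=n)                 # resample pairs with replacement
        beta_b = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        pe += np.mean((y - X @ beta_b) ** 2)             # evaluate on the original data
    return pe / B

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(100), rng.standard_normal((100, 3))])
y = X @ np.array([1.0, 0.5, -0.5, 2.0]) + 0.4 * rng.standard_normal(100)
print(bootstrap_prediction_error(X, y))
```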
Bootstrap
Conditional Bootstrap
In the fixed-X (conditional) case the inputs are kept fixed and the residuals are resampled.
Estimate $\boldsymbol{\beta}$ by minimizing the error sum of squares with respect to $\boldsymbol{\beta}$, compute the residuals $\hat{e}_i = y_i - \hat{\mu}(\mathbf{x}_i)$, and generate bootstrap responses
$$y_i^{*b} = \hat{\mu}(\mathbf{x}_i) + e_i^{*b}, \qquad i = 1, 2, \dots, n,\ b = 1, \dots, B,$$
where the $e_i^{*b}$ are drawn with replacement from the residuals; each bootstrap data set $\{(\mathbf{x}_i, y_i^{*b})\}$ is then refitted.
Variable selection
Motivation
Having too many input variables in the regression model leads to
an overfitting regression function with an inflated variance.
Having too few input variables in the regression model leads to
an underfitting, high-bias regression function with poor
explanation of the data.
The importance of a variable
Depends on how seriously prediction accuracy is affected if the
variable is dropped.
The driving force behind variable selection
The desire for a simpler and more easily interpretable regression
model, combined with a need for greater accuracy in prediction.
Regularized regression
Minimize a penalized residual sum of squares
$$\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{r}\beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{r} p(\beta_j),$$
where $\lambda \ge 0$ controls the amount of shrinkage and $p(\cdot)$ is a penalty function.
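A minimal sketch (not from the slides) of one concrete instance, ridge regression (the q = 2 penalty), which has the closed form $\hat{\boldsymbol\beta}_{ridge} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y}$; leaving the intercept unpenalized via centering is a common convention assumed here.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression with an unpenalized intercept (handled by centering)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    r = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(r), Xc.T @ yc)
    beta0 = y_mean - x_mean @ beta
    return beta0, beta

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + 0.3 * rng.standard_normal(100)
print(ridge_fit(X, y, lam=1.0))
```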
Regularized regression
[Figure: two-dimensional contours of the symmetric penalty function $p_q(\boldsymbol\beta) = |\beta_1|^q + |\beta_2|^q = 1$ for q = 0.2, 0.5, 1, 2, 5. The case q = 1 (blue diamond) yields the lasso and q = 2 (red circle) yields ridge regression.]
Regularized regression
The Lasso
Minimize
$$\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{r}\beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{r} |\beta_j|,$$
i.e. the penalized criterion with the q = 1 penalty, which shrinks coefficients and can set some of them exactly to zero.
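A minimal coordinate-descent sketch of the lasso (soft-thresholding), written for illustration under the assumption of standardized predictors and a centered response; this is not the slides' own implementation.

```python
import numpy as np

def soft_threshold(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Lasso for the objective 0.5*||y - X b||^2 + lam*||b||_1
    (no intercept; center/standardize the data beforehand)."""
    n, p = X.shape
    beta = np.zeros(p)
    z = np.sum(X ** 2, axis=0)              # per-coordinate curvature
    r = y - X @ beta                         # current residual
    for _ in range(n_iter):
        for j in range(p):
            r = r + X[:, j] * beta[j]        # remove coordinate j from the fit
            rho = X[:, j] @ r
            beta[j] = soft_threshold(rho, lam) / z[j]
            r = r - X[:, j] * beta[j]        # add it back with the new value
    return beta

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_beta = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0, 1.0, 0.0])
y = X @ true_beta + 0.2 * rng.standard_normal(100)
y = y - y.mean()
print(lasso_coordinate_descent(X, y, lam=10.0))   # sparse estimate; irrelevant predictors go to zero
```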
Regularized regression
The Garotte
The nonnegative garotte (the name refers to a strangling cord) shrinks the OLS solution with nonnegative weights:
1. Compute the ordinary least-squares estimates $\hat{\beta}_j$, $j = 1, \dots, r$.
2. Find nonnegative weights $w_j$ minimizing
$$\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{r} w_j \hat{\beta}_j x_{ij}\Big)^2 \qquad \text{subject to } w_j \ge 0,\ \sum_{j=1}^{r} w_j \le s,$$
and set $\tilde{\beta}_j = w_j \hat{\beta}_j$.
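A rough sketch (illustration only) of the garotte in its penalized Lagrangian form, minimizing the residual sum of squares plus $\lambda\sum_j w_j$ with $w_j \ge 0$, solved with scipy.optimize.minimize under box bounds; the data and the value of $\lambda$ are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def garotte_fit(X, y, lam):
    """Nonnegative garotte, penalized form:
    min_w ||y - X (w * beta_ols)||^2 + lam * sum(w), with w >= 0."""
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    Z = X * beta_ols                       # column j is x_j * beta_ols_j

    def objective(w):
        resid = y - Z @ w
        return resid @ resid + lam * np.sum(w)

    p = X.shape[1]
    res = minimize(objective, x0=np.ones(p), method="L-BFGS-B",
                   bounds=[(0.0, None)] * p)
    w = res.x
    return w * beta_ols                    # shrunken coefficients w_j * beta_ols_j

rng = np.random.default_rng(8)
X = rng.standard_normal((100, 6))
y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.0, 1.0]) + 0.3 * rng.standard_normal(100)
print(garotte_fit(X, y, lam=20.0))
```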
Multivariate regression