Lecture 1.0

Simple and Multiple Regression Analysis
Topic 1
Least Squares Method.
Coefficient of Determination.
Model Assumptions.
Testing for Significance.
Prediction.
Residual Analysis.
12 February 2010, Practice of Econometrics, Kaushik Deb

Simple Linear Regression Model
• The equation that describes how y is related to x and an error term is called the regression model.
• The simple linear regression model is:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where $\beta_0$, $\beta_1$ are parameters of the model and $\varepsilon$ is the error term.

…simple linear regression model
• The simple linear regression equation is:
$$E(y) = \beta_0 + \beta_1 x$$
• Graph of the regression equation is a straight line.
• $\beta_0$ is the y intercept of the regression line.
• $\beta_1$ is the slope of the regression line.
• E(y) is the expected value of y for a given x value.

…simple linear regression model
[Figure: Positive, Linear Relationship. E(y) against x; the line has intercept $\beta_0$ and the slope, $\beta_1$, is positive.]

…simple linear regression model
[Figure: Negative, Linear Relationship. E(y) against x; the line has intercept $\beta_0$ and the slope, $\beta_1$, is negative.]

…simple linear regression model
[Figure: No Relationship. E(y) against x; the line is horizontal at $\beta_0$ and the slope $\beta_1 = 0$.]

Estimated Simple Linear Regression Equation
• The estimated simple linear regression equation is:
$$\hat{y} = b_0 + b_1 x$$
• Graph of the estimated regression equation is a straight line.
• $b_0$ is the y intercept of the estimated regression line.
• $b_1$ is the slope of the estimated regression line.
• $\hat{y}$ is the estimate of the expected value of y for a given x value.

Estimation Process
[Diagram: the estimation process, in four steps.]
• Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$. Regression Equation: $E(y) = \beta_0 + \beta_1 x$. Unknown Parameters: $\beta_0$, $\beta_1$.
• Sample Data: pairs $(x_1, y_1), \ldots, (x_n, y_n)$.
• Estimated Regression Equation: $\hat{y} = b_0 + b_1 x$. Sample Statistics: $b_0$, $b_1$.
• $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$.

Least Squares Method
$$\min \sum (y_i - \hat{y}_i)^2$$
• $y_i$ is the observed value of the dependent variable for the ith observation.
• $\hat{y}_i$ is the estimated value of the dependent variable for the ith observation.

…least squares method
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \quad \text{and} \quad b_0 = \bar{y} - b_1 \bar{x}$$
• $x_i$ is the observed value of the independent variable for the ith observation.
• $y_i$ is the observed value of the dependent variable for the ith observation.
• $\bar{x}$ is the mean of the independent variable.
• $\bar{y}$ is the mean of the dependent variable.
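
The two formulas for $b_1$ and $b_0$ can be tried out directly. The following is a minimal sketch, not the lecture's own code: the function name `fit_simple_ols` and the five data points are made up for illustration, chosen so that the points lie exactly on y = 1 + 2x.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return (b0, b1) that minimise sum((y_i - (b0 + b1 * x_i)) ** 2)."""
    x_bar, y_bar = mean(x), mean(y)
    # Numerator and denominator of the slope formula
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data lying exactly on the line y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_ols(x, y)
print(b0, b1)
```

Because the made-up points have no scatter, the fitted intercept and slope recover the line that generated them.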

…least squares method
• Let total costs in our dataset be the dependent variable and number of buses be the independent variable.
• Then,
$b_0 = 1506.21$
$b_1 = 16.88$
[Figure: scatter plot of total costs (vertical axis, 0 to 350,000) against number of buses (horizontal axis, 0 to 20,000), with the fitted line y = 1506.2 + 16.883x.]

Recall, ANOVA
$$TSS = ESS + RSS$$
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
$$\sum (y_i - \bar{y})^2 = \sum ((b_0 + b_1 x_i) - \bar{y})^2 + \sum (y_i - (b_0 + b_1 x_i))^2$$
• TSS: Total Sum of Squares.
• ESS: Explained Sum of Squares, also called the regression sum of squares (SSR).
• RSS: Residual Sum of Squares, also called the error sum of squares (SSE).

Coefficient of Determination
$$r^2 = \frac{ESS}{TSS} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$$
• How well does the model explain the variation in the dependent variable?
• In our model,
– $r^2 = 0.962$
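
The decomposition TSS = ESS + RSS and the ratio $r^2$ = ESS/TSS can be checked numerically. This is a small sketch on made-up data (not the bus dataset); the function name `sums_of_squares` is an assumption introduced for illustration.

```python
from statistics import mean

def sums_of_squares(x, y):
    """Return (TSS, ESS, RSS) for the least squares line through (x, y)."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the line
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # left in the residuals
    return tss, ess, rss

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
tss, ess, rss = sums_of_squares(x, y)
r_squared = ess / tss
print(tss, ess, rss, r_squared)
```

For these five points the fitted line is ŷ = 2.2 + 0.6x, giving TSS = 6, ESS = 3.6 and RSS = 2.4 up to floating point rounding, so the line explains 60% of the variation in y.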
Sample Correlation Coefficient
• Recall, Covariance and Correlation.
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
• Sample Correlation Coefficient:
$$r_{xy} = (\text{sign of } b_1) \sqrt{r^2}$$

Assumptions about the Residuals
• The error $\varepsilon$ is a random variable with mean of zero.
• The variance of $\varepsilon$, $\sigma^2$, is the same for all values of the independent variable.
• The values of $\varepsilon$ are independent.
• The error $\varepsilon$ is a normally distributed random variable.
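
The two routes to the sample correlation coefficient given above, $s_{xy} / (s_x s_y)$ and (sign of $b_1$) times the square root of $r^2$, should agree. A quick check on made-up data, assuming nothing beyond the formulas in this lecture:

```python
from math import copysign, sqrt
from statistics import mean, stdev

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = mean(x), mean(y)

# Route 1: sample covariance divided by the two sample standard deviations
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
r_xy = s_xy / (stdev(x) * stdev(y))

# Route 2: sign of the slope times the square root of r^2 = ESS / TSS
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy * (n - 1) / sxx
b0 = y_bar - b1 * x_bar
ess = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
tss = sum((yi - y_bar) ** 2 for yi in y)
r_from_r2 = copysign(sqrt(ess / tss), b1)

print(r_xy, r_from_r2)
```

Both values equal the square root of 0.6, roughly 0.775, for this dataset.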
Testing for Significance
• To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.
• Two tests are commonly used, t test and F test.
• Both require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.
$$s^2 = \frac{RSS}{n - 2} = \frac{\sum (y_i - \hat{y}_i)^2}{n - 2}$$

…testing for significance, t Test
• Null hypothesis: $H_0: \beta_1 = 0$.
• Alternative hypothesis: $H_a: \beta_1 \neq 0$.
• Test Statistic:
$$t = \frac{b_1}{s_{b_1}}$$
• Rule:
– Do not reject $H_0$ if
$$-t_{\alpha/2} \leq \frac{b_1}{s_{b_1}} \leq +t_{\alpha/2}$$
– Else, reject $H_0$.
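
A sketch of the t test on made-up data follows. The slides do not spell out $s_{b_1}$; the code assumes the standard textbook expression $s_{b_1} = s / \sqrt{\sum (x_i - \bar{x})^2}$. The critical value 3.182 is $t_{0.025}$ with 3 degrees of freedom, read from a t table, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean

def slope_t_statistic(x, y):
    """Return (b1, s_b1, t) for the test of H0: beta1 = 0."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)       # estimate of sigma^2
    s_b1 = sqrt(s2 / sxx)    # assumed standard error of b1 (not shown on the slides)
    return b1, s_b1, b1 / s_b1

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, s_b1, t = slope_t_statistic(x, y)

t_crit = 3.182               # t_{0.025} with n - 2 = 3 degrees of freedom (t table)
reject_h0 = abs(t) > t_crit
print(t, reject_h0)
```

Here t is about 2.12, which lies inside the band from -3.182 to +3.182, so with only five observations $H_0$ is not rejected at the 5% level.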
Confidence Interval for $\beta_1$
• $H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.
$$b_1 \pm t_{\alpha/2} s_{b_1}$$
• Hypothesized value here:
– $\beta_1 = 0$

…testing for significance, F Test
• Null hypothesis: $H_0: \beta_1 = 0$.
• Alternative hypothesis: $H_a: \beta_1 \neq 0$.
• Test Statistic:
$$F = \frac{ESS / k}{RSS / (n - k - 1)}, \quad k = 1$$
where k is the number of independent variables, so the denominator has n − 2 degrees of freedom here.
• Rule:
– Do not reject $H_0$ if
$$\frac{ESS / k}{RSS / (n - k - 1)} \leq F_{\alpha}$$
– Else, reject $H_0$.
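
Both checks, the confidence interval for $\beta_1$ and the F test, can be sketched on the same made-up data. The standard error formula for $b_1$ is the usual textbook one and is an assumption here; the critical values 3.182 ($t_{0.025}$, 3 df) and 10.13 ($F_{0.05}$ with 1 and 3 df) are table values.

```python
from math import sqrt
from statistics import mean

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, k = len(x), 1
x_bar, y_bar = mean(x), mean(y)

sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
ess = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Confidence interval for beta1
s_b1 = sqrt(rss / (n - 2) / sxx)   # assumed standard error of b1
t_crit = 3.182                     # t_{0.025}, 3 df (t table)
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)

# F test with k and n - k - 1 degrees of freedom
f_stat = (ess / k) / (rss / (n - k - 1))
f_crit = 10.13                     # F_{0.05} with (1, 3) df (F table)

print(ci, f_stat, f_stat > f_crit)
```

The interval runs from about -0.30 to 1.50 and contains zero, and F = 4.5 is below 10.13, so both procedures fail to reject $H_0$. In simple regression the two tests always agree, because F equals the square of the t statistic.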

Regression output for the total costs model:
              Coefficients   Standard Error   t Stat    p value   Lower 95%    Upper 95%
b0            1506.206       3002.668         0.502     0.619     -4602.769    7615.180
b1            16.883         0.581            29.063    0.000     15.701       18.065

              df    SS          MS          F          Significance F
Regression    1     1.67E+11    1.67E+11    844.632    0.000
Residual      33    6.52E+09    1.98E+08
Total         34    1.73E+11

Prediction
• Confidence Interval Estimate of E(y):
$$\hat{y} \pm t_{\alpha/2} s_{\hat{y}}$$
• Prediction Interval Estimate of y:
$$\hat{y} \pm t_{\alpha/2} s_{pred}$$
where $s_{pred}^2 = s^2 + s_{\hat{y}}^2$, because an individual value of y also varies around E(y).
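
The difference between the confidence interval for E(y) and the prediction interval for an individual y can be seen in a small sketch. The slides do not give $s_{\hat{y}}$; the code assumes the standard simple regression expression $s_{\hat{y}} = s \sqrt{1/n + (x_p - \bar{x})^2 / \sum (x_i - \bar{x})^2}$. The data and the point `x_p` are made up.

```python
from math import sqrt
from statistics import mean

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
s2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

x_p = 3                                  # hypothetical x value to predict at
y_hat = b0 + b1 * x_p                    # point estimate

# Assumed standard error of the estimated mean response at x_p
s_y_hat = sqrt(s2 * (1 / n + (x_p - x_bar) ** 2 / sxx))
# Standard error for an individual y: adds the error variance s^2
s_pred = sqrt(s2 + s_y_hat ** 2)

t_crit = 3.182                           # t_{0.025}, n - 2 = 3 df (t table)
conf_int = (y_hat - t_crit * s_y_hat, y_hat + t_crit * s_y_hat)
pred_int = (y_hat - t_crit * s_pred, y_hat + t_crit * s_pred)
print(conf_int, pred_int)
```

The prediction interval is always the wider of the two, since it must cover the scatter of a single observation as well as the uncertainty in the fitted line.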
Residual Analysis
• If the assumptions about the error term $\varepsilon$ appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
• The residuals provide the best information about $\varepsilon$.
• Residual for Observation i:
$$y_i - \hat{y}_i$$
• If the assumption that the variance of $\varepsilon$ is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.

…residual analysis
[Figure: residual plot. Residuals (vertical axis, -50,000 to 50,000) against the independent variable (horizontal axis, 0 to 20,000).]
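
Residuals are simple to compute once the line is fitted. The sketch below uses made-up data and only lists the residuals; plotting them against x would give the residual plot discussed above.

```python
from statistics import mean

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Residual for observation i: y_i minus the fitted value
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
for xi, ei in zip(x, residuals):
    print(xi, round(ei, 3))
```

With an intercept in the model, least squares residuals sum to zero by construction, so the interesting feature of the plot is whether their spread stays roughly constant across x.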

Multiple Linear Regression Model
• The equation that describes how the dependent variable y is related to a set of independent variables $x_1, x_2, x_3, x_4, \ldots$ and an error term is called the regression model.
• The multiple linear regression model is:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k + \varepsilon$$
where $\beta_0, \beta_1, \beta_2, \beta_3, \ldots$ are parameters of the model and $\varepsilon$ is the error term.

…multiple linear regression model
• The multiple linear regression equation is:
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k$$
• $\beta_0$ is the y intercept of the regression equation.
• E(y) is the expected value of y for a given set of x values.

Estimated Multiple Linear Regression Equation
• The estimated multiple linear regression equation is:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \ldots + b_k x_k$$
• A simple random sample is used to compute sample statistics $b_0, b_1, b_2, \ldots, b_k$ that are used as the point estimators of the parameters $\beta_0, \beta_1, \ldots, \beta_k$.
• $\hat{y}$ is the estimate of the expected value of y for a set of given x values.

Estimation Process
[Diagram: the estimation process, in four steps.]
• Regression Model: $y = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k + \varepsilon$. Regression Equation: $E(y) = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k$. Unknown Parameters: $\beta_0, \beta_1, \ldots, \beta_k$.
• Sample Data: n observations on $x_1, x_2, \ldots, x_k$ and y.
• Estimated Regression Equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \ldots + b_k x_k$. Sample Statistics: $b_0, b_1, \ldots, b_k$.
• $b_0, b_1, \ldots, b_k$ provide estimates of $\beta_0, \beta_1, \ldots, \beta_k$.

Least Squares Method
$$\min \sum (y_i - \hat{y}_i)^2$$
• The formulas for the regression coefficients $b_0, b_1, b_2, \ldots, b_k$ involve the use of matrix algebra.
• We rely on computers.
• $b_i$ represents an estimate of the change in y corresponding to a 1 unit increase in $x_i$ when all other independent variables are held constant.

Multiple Coefficient of Determination
$$TSS = ESS + RSS$$
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
$$R^2 = \frac{ESS}{TSS} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$$
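
To give a feel for the matrix algebra that the computer carries out for the least squares coefficients, here is a minimal sketch that solves the normal equations $(X'X) b = X'y$ by Gauss-Jordan elimination. This is one textbook route and not necessarily what any particular statistics package does; the function name and the data (generated exactly from y = 1 + 2$x_1$ + 3$x_2$) are made up.

```python
def fit_multiple_ols(rows, y):
    """Solve the normal equations (X'X) b = X'y; returns [b0, b1, ..., bk].

    rows: one list [x1, ..., xk] per observation. Assumes X'X is invertible
    (no independent variable is an exact linear combination of the others).
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # add intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry onto the diagonal
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 with no error term
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [9, 8, 19, 18, 29]
b = fit_multiple_ols(rows, y)
print(b)
```

Each returned coefficient has the interpretation given above: for instance, the coefficient on $x_1$ is the change in the fitted y for a 1 unit increase in $x_1$ with $x_2$ held constant.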
Adjusted Multiple Coefficient of Determination
• In general, adding more independent variables increases the $R^2$. Two effects:
– Improved explanatory power.
– More independent variables.
• To adjust for the latter,
$$R_a^2 = 1 - (1 - R^2) \frac{n - 1}{n - k - 1}$$

Assumptions about the Residuals
• The error $\varepsilon$ is a random variable with mean of zero.
• The variance of $\varepsilon$, $\sigma^2$, is the same for all values of the independent variables.
• The values of $\varepsilon$ are independent.
• The error $\varepsilon$ is a normally distributed random variable.
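
The adjusted $R^2$ formula is easy to evaluate by hand or in code. The sketch below plugs in the rounded figures from the two-variable total costs regression reported in this lecture ($R^2$ = 0.96, n = 35, k = 2); the second call uses a made-up third variable that raises $R^2$ only to 0.961.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Rounded values from the lecture's two-variable regression output
adj_two_vars = adjusted_r_squared(0.96, n=35, k=2)

# Hypothetical: a third variable that barely improves the fit
adj_three_vars = adjusted_r_squared(0.961, n=35, k=3)

print(adj_two_vars, adj_three_vars)
```

The first value is 0.9575, which rounds to the 0.96 shown in the output. In the hypothetical second case $R^2$ rises but the adjusted value falls slightly, which is the penalty for the extra variable at work.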

Testing for Significance
• To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_i$ is zero.
• The two tests, t test and F test, test different things now.
• Both require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.
$$s^2 = \frac{RSS}{n - k - 1} = \frac{\sum (y_i - \hat{y}_i)^2}{n - k - 1}$$

…testing for significance, F Test
• The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.
• The F test is referred to as the test for overall significance.

…testing for significance, F Test
• Null hypothesis: $H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0$.
• Alternative hypothesis: $H_a: \beta_i \neq 0$ for some i.
• Test Statistic:
$$F = \frac{ESS / k}{RSS / (n - k - 1)}$$
• Rule:
– Do not reject $H_0$ if
$$\frac{ESS / k}{RSS / (n - k - 1)} \leq F_{\alpha}$$
– Else, reject $H_0$.

…testing for significance, t Test
• If the F test shows an overall significance, the t test is used to determine whether each of the individual independent variables is significant.
• A separate t test is conducted for each of the independent variables in the model.
• We refer to each of these t tests as a test for individual significance.
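
As a check on the overall F statistic, it can be recomputed from the rounded sums of squares in this lecture's two-variable ANOVA output (ESS = 1.67E+11, RSS = 6.21E+09, n = 35, k = 2). The critical value of about 3.29 for $F_{0.05}$ with 2 and 32 degrees of freedom is an approximate table value and should be treated as an assumption.

```python
def overall_f(ess, rss, n, k):
    """F statistic for H0: beta_1 = ... = beta_k = 0."""
    return (ess / k) / (rss / (n - k - 1))

# Rounded figures from the lecture's ANOVA table
f_stat = overall_f(ess=1.67e11, rss=6.21e9, n=35, k=2)

f_crit = 3.29          # approximate F_{0.05} with (2, 32) df, from an F table
reject_h0 = f_stat > f_crit
print(round(f_stat, 2), reject_h0)
```

The result is close to the reported 430.42; the small gap comes from the sums of squares being rounded to three significant figures. Either way the statistic is far above the critical value, so the regression is significant overall.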

…testing for significance, t Test
• Null hypothesis: $H_0: \beta_i = 0$.
• Alternative hypothesis: $H_a: \beta_i \neq 0$.
• Test Statistic:
$$t = \frac{b_i}{s_{b_i}}$$
• Rule:
– Do not reject $H_0$ if
$$-t_{\alpha/2} \leq \frac{b_i}{s_{b_i}} \leq +t_{\alpha/2}$$
– Else, reject $H_0$.

Confidence Interval for $\beta_i$
• $H_0$ is rejected if the hypothesized value of $\beta_i$ is not included in the confidence interval for $\beta_i$.
$$b_i \pm t_{\alpha/2} s_{b_i}$$
• Hypothesized value here:
– $\beta_i = 0$

…least squares method
• Let total costs in our dataset be the dependent variable and number of buses and fuel efficiency be the independent variables.
$$TC = \beta_0 + \beta_1 Buses + \beta_2 FE + \varepsilon$$

Regression Statistics
Multiple R           0.98
R Square             0.96
Adjusted R Square    0.96
Standard Error       13935.44
Observations         35

ANOVA         df    SS          MS          F         Significance F
Regression    2     1.67E+11    8.36E+10    430.42    0.00
Residual      32    6.21E+09    1.94E+08
Total         34    1.73E+11

              Coefficients   Standard Error   t Stat   P-value   Lower 95%    Upper 95%
Intercept     22328.46       16871.26         1.32     0.20      -12037.16    56694.09
Bus           17.35          0.69             25.30    0.00      15.95        18.75
FE            -5227.59       4169.20          -1.25    0.22      -13719.98    3264.79

Prediction
• We substitute the given values of $x_1, x_2, x_3, x_4, \ldots$ into the estimated regression equation and use the corresponding value of $\hat{y}$ as the point estimate.
• Prediction Interval Estimate of y:
$$\hat{y} \pm t_{\alpha/2} s_{pred}$$
where $s_{pred}$ is the standard error of an individual prediction, which is larger than $s_{\hat{y}}$ because an individual value of y also varies around E(y).
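
The substitution step for the point estimate can be illustrated with the coefficients from the two-variable total costs output in this lecture. The input values (10,000 buses and a fuel efficiency of 4.5) are hypothetical, and the units of fuel efficiency are not stated in the slides.

```python
# Coefficients from the lecture's regression output
b0 = 22328.46      # Intercept
b_bus = 17.35      # Bus
b_fe = -5227.59    # FE

def predict_total_cost(buses, fuel_efficiency):
    """Point estimate of total costs from the estimated regression equation."""
    return b0 + b_bus * buses + b_fe * fuel_efficiency

# Hypothetical inputs, chosen only for illustration
tc_hat = predict_total_cost(buses=10_000, fuel_efficiency=4.5)
print(round(tc_hat, 3))
```

For these inputs the point estimate is about 172,304. A prediction interval around it would additionally need $t_{\alpha/2}$ with 32 degrees of freedom and the standard error of an individual prediction, which the output above does not report.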
