Simple and Multiple Regression Analysis
Topic 1

• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Prediction
• Residual Analysis

Simple Linear Regression Model
• The equation that describes how y is related to x and an error term is called the regression model.
• The simple linear regression model is:
  y = β0 + β1x + ε
  where β0 and β1 are parameters of the model, and ε is the error term.

12 February 2010 Practice of Econometrics, Kaushik Deb
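The model above can be sketched numerically. This is a minimal illustration with made-up parameter values (β0 = 2, β1 = 0.5 and a standard-normal error term are assumptions for the sketch, not figures from the lecture):

```python
import numpy as np

# Simple linear regression model y = beta0 + beta1*x + eps.
# Parameter values here are illustrative, not from the lecture's dataset.
rng = np.random.default_rng(0)
beta0, beta1 = 2.0, 0.5                    # assumed true parameters
x = np.linspace(0.0, 10.0, 100)
eps = rng.normal(0.0, 1.0, size=x.size)    # error term with mean zero
y = beta0 + beta1 * x + eps                # observed values

# Without the error term, the expected value E(y) is exactly linear in x.
ey = beta0 + beta1 * x
print(ey[0], ey[-1])                       # 2.0 at x = 0, 7.0 at x = 10
```

The generated y values scatter around the line E(y) = β0 + β1x; this is the data-generating process the rest of the deck estimates.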
…simple linear regression model
• The simple linear regression equation is:
  E(y) = β0 + β1x
• The graph of the regression equation is a straight line.
• β0 is the y intercept of the regression line.
• β1 is the slope of the regression line.
• E(y) is the expected value of y for a given x value.

[Figure: E(y) versus x for a positive, linear relationship — the line rises from intercept β0 with positive slope β1.]
…simple linear regression model
[Figure: E(y) versus x for a negative, linear relationship — the line falls from intercept β0 with negative slope β1.]
[Figure: E(y) versus x for no relationship — a horizontal line at β0 with slope β1 = 0.]
Estimated Simple Linear Regression Equation
• The estimated simple linear regression equation is:
  ŷ = b0 + b1x
• The graph of the estimated regression equation is a straight line.
• b0 is the y intercept of the regression line.
• b1 is the slope of the regression line.
• ŷ is the estimated value of y for a given x value.

Estimation Process
• Regression model: y = β0 + β1x + ε; regression equation: E(y) = β0 + β1x; unknown parameters: β0, β1.
• Sample data: (x1, y1), …, (xn, yn).
• Sample statistics b0 and b1 give the estimated regression equation ŷ = b0 + b1x; b0 and b1 provide estimates of β0 and β1.
Least Squares Method
• The least squares criterion is:
  min Σ(yi − ŷi)²
• The resulting estimators are:
  b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²   and   b0 = ȳ − b1x̄
• xi is the observed value of the independent variable for the ith observation.
• yi is the observed value of the dependent variable for the ith observation.
• ŷi is the estimated value of the dependent variable for the ith observation.
• x̄ is the mean of the independent variable.
• ȳ is the mean of the dependent variable.
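The formulas for b1 and b0 above translate directly into code. A minimal sketch on a small made-up dataset (not the lecture's bus-cost data):

```python
import numpy as np

# Least squares estimates b1 and b0 from the closed-form formulas,
# on a tiny illustrative dataset that lies exactly on y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
print(b0, b1)   # 1.0 2.0 — recovers the line exactly, since there is no noise
```

With noisy data the estimates would not recover the parameters exactly, but they still minimize Σ(yi − ŷi)².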
…least squares method
• Let total costs in our dataset be the dependent variable and number of buses be the independent variable.
• Then:
  b0 = 1506.21
  b1 = 16.88
[Figure: scatter plot of total costs (0 to 350,000) against number of buses (0 to 20,000), with the fitted line y = 1506.2 + 16.883x.]
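The fitted line from this slide can be used for prediction. In this sketch, the coefficients are the slide's (1506.2 and 16.883), but the fleet size of 10,000 buses is an illustrative input, not a figure from the deck:

```python
# Prediction with the fitted line from the slide, y = 1506.2 + 16.883x.
# The input value of 10,000 buses is hypothetical, chosen for illustration.
b0, b1 = 1506.2, 16.883
buses = 10_000
predicted_cost = b0 + b1 * buses
print(predicted_cost)   # roughly 170,336
```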
Sample Correlation Coefficient
• The sample covariance is:
  sxy = Σ(xi − x̄)(yi − ȳ) / (n − 1)
• The sample correlation coefficient is:
  rxy = sxy / (sx sy)
• It is related to the coefficient of determination by:
  rxy = (sign of b1) √r²

Model Assumptions
• The error ε is a random variable with mean of zero.
• The variance of ε, σ², is the same for all values of the independent variable.
• The values of ε are independent.
• The error ε is a normally distributed random variable.
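The identity rxy = (sign of b1)√r² can be checked numerically. A sketch on a small made-up dataset (not the lecture's data):

```python
import numpy as np

# Sample correlation coefficient rxy = sxy / (sx * sy), and a check of
# the identity rxy = sign(b1) * sqrt(r^2), on illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.5, 4.0, 3.5, 5.0])

n = x.size
sxy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)   # sample covariance
sx = x.std(ddof=1)                                        # sample std devs
sy = y.std(ddof=1)
rxy = sxy / (sx * sy)

# Fit the least squares line and compute r^2 = ESS / TSS.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
r2 = np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

print(np.isclose(rxy, np.sign(b1) * np.sqrt(r2)))   # True
```

In simple linear regression this identity holds exactly; in multiple regression r² no longer corresponds to a single pairwise correlation.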
Testing for Significance
• To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.
• Two tests are commonly used: the t test and the F test.
• Both require an estimate of σ², the variance of ε in the regression model:
  s² = RSS / (n − 2) = Σ(yi − ŷi)² / (n − 2)

…testing for significance, t Test
• Null hypothesis: H0: β1 = 0.
• Alternative hypothesis: Ha: β1 ≠ 0.
• Test statistic:
  t = b1 / sb1
• Rule:
  – Do not reject H0 if −tα/2 ≤ b1/sb1 ≤ +tα/2.
  – Else, reject H0.
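The t test above can be sketched as follows, using the standard formula sb1 = s / √Σ(xi − x̄)² and a made-up dataset (not the lecture's data):

```python
import numpy as np

# t test for H0: beta1 = 0 on a small illustrative dataset whose
# underlying slope is clearly nonzero.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = x.size
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

s2 = np.sum((y - yhat) ** 2) / (n - 2)             # s^2 = RSS / (n - 2)
sb1 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))    # standard error of b1
t = b1 / sb1
print(t)

# Compare |t| to the critical value t_{alpha/2} with n-2 = 4 degrees of
# freedom; for alpha = 0.05 this is about 2.776.
print(abs(t) > 2.776)   # True here, so H0: beta1 = 0 is rejected
```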
Residual Analysis
• If the assumptions about the error term ε appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
• The residuals provide the best information about ε.
• Residual for observation i:
  yi − ŷi
• If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.
[Figure: residuals (−50,000 to 50,000) plotted against number of buses (0 to 20,000).]
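Residuals are simple to compute from the fitted values. A sketch on a small made-up dataset (not the lecture's data), also illustrating the standard algebraic fact that least squares residuals sum to (numerically) zero when an intercept is included:

```python
import numpy as np

# Residuals y_i - yhat_i for a least squares fit on illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.1, 2.8, 4.3, 4.9])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)

# With an intercept in the model, the residuals sum to zero up to
# floating-point error; plotting them against x should show no pattern.
print(residuals.sum())
```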
Estimated Multiple Linear Regression Equation
• Regression model: y = β0 + β1x1 + … + βkxk + ε; regression equation: E(y) = β0 + β1x1 + … + βkxk.
• Sample data: observations on x1, x2, …, xk and y.
• The estimated equation is found by the least squares criterion:
  min Σ(yi − ŷi)²
• The formulas for the regression coefficients b0, b1, b2, …, bk involve the use of matrix algebra. We rely on computers.
• bi represents an estimate of the change in y corresponding to a 1-unit increase in xi when all other independent variables are held constant.

Multiple Coefficient of Determination
• TSS = ESS + RSS:
  Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
• R² = ESS / TSS = Σ(ŷi − ȳ)² / Σ(yi − ȳ)²
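The matrix-algebra step the slide delegates to computers can be sketched with a least squares solver. The dataset here is simulated with k = 2 regressors and assumed coefficients (1, 2, −0.5); none of these numbers come from the lecture:

```python
import numpy as np

# Multiple regression coefficients via matrix algebra (least squares),
# plus the TSS = ESS + RSS decomposition and R^2, on simulated data.
rng = np.random.default_rng(1)
n = 30
x1 = rng.uniform(0.0, 10.0, n)
x2 = rng.uniform(0.0, 5.0, n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0.0, 0.1, n)  # assumed model

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # solves min sum (y_i - yhat_i)^2
yhat = X @ b

tss = np.sum((y - y.mean()) ** 2)           # total sum of squares
ess = np.sum((yhat - y.mean()) ** 2)        # explained sum of squares
rss = np.sum((y - yhat) ** 2)               # residual sum of squares
r2 = ess / tss

print(np.isclose(tss, ess + rss))           # True: TSS = ESS + RSS
print(r2)                                   # close to 1 for this low-noise data
```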
Adjusted Multiple Coefficient of Determination
• In general, adding more independent variables increases R². Two effects:
  – Improved explanatory power.
  – More independent variables.
• To adjust for the latter:
  Ra² = 1 − (1 − R²)(n − 1)/(n − k − 1)

Assumptions about the Residuals
• The error ε is a random variable with mean of zero.
• The variance of ε, σ², is the same for all values of the independent variables.
• The values of ε are independent.
• The error ε is a normally distributed random variable.
• With k independent variables, the estimate of σ² is:
  s² = RSS / (n − k − 1) = Σ(yi − ŷi)² / (n − k − 1)
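The adjustment formula above can be applied to the figures reported in this deck's regression output (n = 35 observations, k = 2 regressors, R² = 0.96):

```python
# Adjusted R^2 from the formula Ra^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1),
# using n = 35, k = 2 and R^2 = 0.96 as reported in the deck's output.
n, k, r2 = 35, 2, 0.96
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2_adj, 4))   # 0.9575, which rounds to the reported 0.96
```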
…least squares method
• Let total costs in our dataset be the dependent variable and number of buses and fuel efficiency be the independent variables.

Regression Statistics
  Multiple R          0.98
  R Square            0.96
  Adjusted R Square   0.96
  Standard Error      13935.44
  Observations        35

ANOVA
          df    SS          MS    F    Significance F
  Total   34    1.73E+11

  Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%