Chapter Goals
After completing this chapter, you should be able to:
Explain the simple linear regression model
Obtain and interpret the simple linear regression equation for a set of data
Evaluate regression residuals for aptness of the fitted model
Understand the assumptions behind regression analysis
Explain measures of variation and determine whether the independent variable is significant
Statistics for Managers Using Microsoft Excel, 4e 2004 Prentice-Hall, Inc.
Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the dependent variable
Types of Relationships
[Figure: scatter plots of Y against X illustrating linear versus curvilinear relationships, strong versus weak relationships, and no relationship]
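Population regression model:
\( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \)
Linear component: \( \beta_0 + \beta_1 X_i \); random error component: \( \varepsilon_i \)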
[Figure: the line \( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \) plotted against X, with intercept \( \beta_0 \) and slope \( \beta_1 \); at a given \( X_i \), the random error \( \varepsilon_i \) is the vertical distance between the observed value of Y and the line]
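Estimated regression equation:
\( \hat{Y}_i = a + b X_i \)
Least squares estimates of the slope b and the intercept a:
\( b = \dfrac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \)
\( a = \bar{y} - b\bar{x} \)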
Sample data (7 of the 10 observations are legible here):
House Price in $1000s (Y)    Square Feet (X)
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
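The least squares formulas for b and a can be sketched in a few lines of plain Python. This is a minimal illustration, not the textbook's Excel procedure. The function name `least_squares` is hypothetical. The last seven (x, y) pairs are the legible sample rows; the first three pairs are an assumption, filled in so that the sample has the 10 observations that the Excel output reports. With those ten pairs the fit should come out close to the reported slope 0.10977 and intercept 98.248.

```python
# House price in $1000s (y) and size in square feet (x).
# ASSUMPTION: the first three pairs are not legible in this transcript;
# they are assumed values that make the sample match the reported output.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
house_price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def least_squares(x, y):
    """Return (a, b) for the fitted line y_hat = a + b * x."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: b = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = y_bar - b * x_bar
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = least_squares(square_feet, house_price)
print(f"intercept a = {a:.5f}, slope b = {b:.5f}")
```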
Excel Output
Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F         Significance F
Regression    1     18934.9348    18934.9348    11.0848   0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000
Graphical Presentation
House price model: scatter plot and regression line
[Figure: house price plotted against square feet with the fitted regression line; slope = 0.10977, intercept = 98.248]
b measures the estimated change in the average value of Y as a result of a one-unit change in X
Here, b = 0.10977 tells us that the average value of a house increases by 0.10977($1000) = $109.77 for each additional square foot of size
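As a small illustration of this interpretation, the sketch below plugs a size into the fitted line using the intercept 98.24833 and slope 0.10977 from the Excel output. The function name `predict_price` and the 2,000 square foot input are hypothetical, chosen only for the example.

```python
INTERCEPT = 98.24833  # a, in $1000s (from the Excel output)
SLOPE = 0.10977       # b, in $1000s per square foot (from the Excel output)


def predict_price(square_feet):
    """Predicted average house price in $1000s for a given size."""
    return INTERCEPT + SLOPE * square_feet


# Hypothetical inputs: houses of 2,000 and 2,001 square feet.
price_2000 = predict_price(2000)
price_2001 = predict_price(2001)

# One extra square foot changes the prediction by the slope (in $1000s).
extra_dollars = (price_2001 - price_2000) * 1000
print(f"predicted price at 2000 sq ft: ${price_2000 * 1000:,.2f}")
print(f"change per extra square foot: ${extra_dollars:.2f}")
```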
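Measures of Variation
Total variation is made up of two parts:
\( SST = SSR + SSE \)
SST = Total Sum of Squares; SSR = Regression Sum of Squares; SSE = Error Sum of Squares
\( SST = \sum (Y_i - \bar{Y})^2 \)
\( SSR = \sum (\hat{Y}_i - \bar{Y})^2 \)
\( SSE = \sum (Y_i - \hat{Y}_i)^2 \)
where:
\( \bar{Y} \) = average value of the dependent variable
\( Y_i \) = observed value of the dependent variable
\( \hat{Y}_i \) = predicted value of Y for the given \( X_i \) value
[Figure: at a given \( X_i \), the total deviation \( Y_i - \bar{Y} \) split into the explained part \( \hat{Y}_i - \bar{Y} \) and the unexplained part \( Y_i - \hat{Y}_i \)]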
note: \( 0 \le R^2 \le 1 \), where \( R^2 = SSR / SST \) is the coefficient of determination
Excel Output: R Square = 0.58082, from SSR = 18934.9348 and SST = 32600.5000 in the ANOVA table
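The R Square value can be checked directly from the sums of squares in the ANOVA table. The sketch below uses only the three SS figures from the Excel output; the variable names are illustrative.

```python
# Sums of squares from the ANOVA table in the Excel output.
SSR = 18934.9348  # regression sum of squares
SSE = 13665.5652  # error (residual) sum of squares
SST = 32600.5000  # total sum of squares

# The decomposition SST = SSR + SSE should hold up to rounding.
decomposition_gap = abs((SSR + SSE) - SST)

# Coefficient of determination: share of total variation explained.
r_squared = SSR / SST

print(f"SSR + SSE - SST = {decomposition_gap:.6f}")
print(f"R Square = {r_squared:.5f}")
```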
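Correlation coefficient:
\( r = b\sqrt{\dfrac{S_{XX}}{S_{YY}}} = \dfrac{S_{XY}}{\sqrt{S_{XX}\, S_{YY}}} \)
where:
\( S_{XY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \)
\( S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2 \)
\( S_{YY} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \)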
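In simple linear regression r has the same sign as the slope b, and its square equals R Square. A minimal sketch of that link, using the R Square and slope values from the Excel output, recovers the Multiple R figure reported there:

```python
import math

R_SQUARED = 0.58082  # R Square from the Excel output
SLOPE = 0.10977      # b from the Excel output

# r is the square root of R Square, carrying the sign of the slope.
r = math.copysign(math.sqrt(R_SQUARED), SLOPE)
print(f"r = {r:.5f}")
```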
Standard error of the estimate:
\( S_{YX} = \sqrt{\dfrac{SSE}{n - 2}} \)
where:
SSE = error sum of squares
n = sample size
Excel Output: Standard Error \( S_{YX} \) = 41.33032, with SSE = 13665.5652 and residual d.f. = 8 in the ANOVA table
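The Standard Error line of the Excel output follows from SSE and the sample size. A short check, using only figures from the output:

```python
import math

SSE = 13665.5652  # error sum of squares from the ANOVA table
n = 10            # number of observations

# Standard error of the estimate: sqrt(SSE / (n - 2)).
s_yx = math.sqrt(SSE / (n - 2))
print(f"S_YX = {s_yx:.5f}")
```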
[Figure: scatter around the regression line with small \( S_{YX} \) versus large \( S_{YX} \)]
The magnitude of \( S_{YX} \) should always be judged relative to the size of the Y values in the sample data; e.g., \( S_{YX} \) = $41.33K is moderately small relative to house prices in the $200K to $300K range
Residual Analysis
\( e_i = Y_i - \hat{Y}_i \)
The residual for observation i, \( e_i \), is the difference between its observed and predicted value
Check the assumptions of regression by examining the residuals:
Examine for linearity assumption
Examine for constant variance for all levels of X (homoscedasticity)
Evaluate normal distribution assumption
Evaluate independence assumption
[Figure: residual plots against X, shown in pairs: Not Linear versus Linear; Non-constant variance versus Constant variance; Not Independent versus Independent]
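Before plotting, the residuals themselves have to be computed. The sketch below does that for the house price data. The first three (x, y) pairs are an assumption: they are not legible in this transcript and are filled in so that the sample has 10 observations and reproduces the reported fit. The helper name `fit_line` is hypothetical. Two sanity checks follow from least squares: the residuals sum to zero, and their squares sum to SSE.

```python
# House price in $1000s (y) and size in square feet (x).
# ASSUMPTION: the first three pairs are assumed, not taken from the transcript.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
house_price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def fit_line(x, y):
    """Least squares intercept a and slope b for y_hat = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


a, b = fit_line(square_feet, house_price)

# Residual e_i = observed Y_i minus predicted Y_i.
residuals = [yi - (a + b * xi) for xi, yi in zip(square_feet, house_price)]
sse = sum(e * e for e in residuals)

for xi, e in sorted(zip(square_feet, residuals)):
    print(f"x = {xi:5d}  residual = {e:8.3f}")
print(f"sum of residuals = {sum(residuals):.6f}, SSE = {sse:.4f}")
```

Listing the residuals in order of x is a text stand-in for the residual plot: a curved run of signs or a spread that widens with x would point to the violations pictured above.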
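Standard error of the slope:
\( S_b = \dfrac{S_{YX}}{\sqrt{SSX}} = \dfrac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}} \)
where:
\( S_b \) = estimate of the standard error of the least squares slope
\( S_{YX} = \sqrt{SSE / (n - 2)} \) = standard error of the estimate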
Excel Output: \( S_b \) = 0.03297 (the Standard Error reported for the Square Feet coefficient)
Test statistic
\( t = \dfrac{b - \beta_1}{S_b} \), d.f. \( = n - 2 \)
where:
b = regression slope coefficient
\( \beta_1 \) = hypothesized slope
\( S_b \) = standard error of the slope
The slope of this model is 0.1098
Does square footage of the house affect its sales price?
               Coefficients    Standard Error    t Stat     P-value
Intercept      98.24833        58.03348          1.69296    0.12892
Square Feet    0.10977         0.03297           3.32938    0.01039
b = 0.10977 and \( S_b \) = 0.03297 (the Square Feet row)
Test statistic: t = 3.329 (t Stat for Square Feet), d.f. = 10 - 2 = 8
Critical values: \( \pm t_{\alpha/2} = \pm 2.3060 \)
[Figure: t distribution with "Reject H0" regions in both tails beyond the critical values and "Do not reject H0" between them]
Decision: Reject H0
Conclusion: There is sufficient evidence that square footage affects house price
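The decision rule for the slope test can be written out as a short check. The slope, its standard error, and the critical value 2.3060 are taken from the slides; the null value of 0 corresponds to H0: no linear relationship. Variable names are illustrative.

```python
b = 0.10977          # estimated slope (Square Feet coefficient)
s_b = 0.03297        # standard error of the slope
beta1_null = 0.0     # hypothesized slope under H0
t_critical = 2.3060  # two-tail critical value for 8 d.f., from the slide

# Test statistic: t = (b - beta_1) / S_b
t_stat = (b - beta1_null) / s_b

# Two-tail rule: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_critical
print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```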
H0: \( \beta_1 = 0 \)
H1: \( \beta_1 \neq 0 \)
From Excel output: P-value for Square Feet = 0.01039 (t Stat = 3.32938)
This is a two-tail test, so the p-value is P(t > 3.329) + P(t < -3.329) = 0.01039 (for 8 d.f.)
Decision: P-value < \( \alpha \), so reject H0
Conclusion: There is sufficient evidence that square footage affects house price
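The two-tail p-value can be approximated without a statistics library by integrating the t density numerically. The sketch below is one way to do it, not how Excel computes the figure: it applies Simpson's rule to the Student t density with 8 d.f. and should land close to the reported 0.01039. The function names `t_pdf` and `two_tail_p_value` are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tail_p_value(t_stat, df, steps=2000):
    """P(|T| > |t_stat|) using Simpson's rule on [0, |t_stat|]; steps must be even."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * t_pdf(k * h, df)
    # The density is symmetric, so double the area over [0, |t_stat|].
    central_area = 2 * (h / 3) * total
    return 1 - central_area


p_value = two_tail_p_value(3.32938, df=8)
alpha = 0.05
print(f"p-value = {p_value:.5f}, reject H0: {p_value < alpha}")
```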
Excel Output (ANOVA): MSR = 18934.9348, MSE = 1708.1957, F = 11.0848, Significance F = 0.01039
Test Statistic:
\( F = \dfrac{MSR}{MSE} = 11.08 \)
\( \alpha = .05 \); critical value \( F_{.05} = 5.32 \)
[Figure: F distribution with "Do not reject H0" to the left of the critical value and "Reject H0" to the right]
Since F = 11.08 > 5.32, reject H0
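The F statistic can be rebuilt from the sums of squares and their degrees of freedom. The sketch below uses the ANOVA figures and the critical value 5.32 from the slides; `k` (the number of independent variables) is an illustrative name.

```python
SSR = 18934.9348  # regression sum of squares
SSE = 13665.5652  # error sum of squares
n = 10            # observations
k = 1             # independent variables in the model
F_CRITICAL = 5.32 # F critical value at alpha = .05, from the slide

msr = SSR / k            # mean square regression, d.f. = k
mse = SSE / (n - k - 1)  # mean square error, d.f. = n - k - 1
f_stat = msr / mse

reject_h0 = f_stat > F_CRITICAL
print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_stat:.4f}, reject H0: {reject_h0}")
```

With a single independent variable, F equals the square of the slope's t statistic (3.32938 squared is about 11.0848), which is why Significance F matches the slope's p-value of 0.01039.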
Confidence interval estimate of the slope:
\( b \pm t_{n-2}\, S_b \), d.f. = n - 2
Excel Printout for House Prices:
               Coefficients    Standard Error    t Stat     P-value
Intercept      98.24833        58.03348          1.69296    0.12892
Square Feet    0.10977         0.03297           3.32938    0.01039
At 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)
Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.70 and $185.80 per square foot of house size
This 95% confidence interval does not include 0.
Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance
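The interval arithmetic is short enough to verify by hand or in code. This sketch uses the slope, its standard error, and the critical value 2.3060 (8 d.f., 95% confidence) quoted in the slides; variable names are illustrative.

```python
b = 0.10977          # estimated slope, in $1000s per square foot
s_b = 0.03297        # standard error of the slope
t_critical = 2.3060  # t value for 8 d.f. at 95% confidence, from the slide

# Confidence interval: b +/- t * S_b
margin = t_critical * s_b
lower, upper = b - margin, b + margin

# An interval that excludes 0 agrees with rejecting H0: beta_1 = 0.
excludes_zero = lower > 0 or upper < 0
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
print(f"excludes zero: {excludes_zero}")
```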
If any assumption is violated, use alternative methods or models
If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals
Avoid making predictions or forecasts outside the relevant range
Chapter Summary
Introduced types of regression models
Reviewed assumptions of regression and correlation
Discussed determining the simple linear regression equation
Described measures of variation
Discussed residual analysis
Addressed measuring autocorrelation
Described inference about the slope
Discussed correlation: measuring the strength of the association
Addressed estimation of mean values and prediction of individual values
Discussed possible pitfalls in regression and recommended strategies to avoid them