For students of the Executive Program in Business Analytics and Business Intelligence, organized by IIM Ranchi
Edited by: Dr. K. Maddulety, NITIE, Mumbai. Mail: koila@rediffmail.com
11-1 Using Statistics

[Figure: lines and planes. In simple regression the model is a line with slope $\beta_1$ and intercept $\beta_0$; in multiple regression it becomes a plane (and, with more variables, a surface) over the $(x_1, x_2)$ plane.]
The population regression model of Y on two explanatory variables:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

where $\beta_0$ is the Y-intercept of the regression surface and each $\beta_i$, $i = 1, 2, \ldots, k$, is the slope of the regression surface (sometimes called the response surface) with respect to $X_i$.

Model assumptions:
1. $\varepsilon \sim N(0, \sigma^2)$, independent of other errors.
2. The variables $X_i$ are uncorrelated with the error term.
[Figure: data points with a fitted line (simple regression) and a fitted plane (multiple regression).]

In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line:

$\hat{y} = b_0 + b_1 x$

In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane:

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$
The estimated regression relationship:

$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$

where $\hat{Y}$ is the predicted value of Y, the value lying on the estimated regression surface. The terms $b_0, \ldots, b_k$ are the least-squares estimates of the population regression parameters $\beta_i$.

The actual, observed value of Y is the predicted value plus an error:

$y_j = b_0 + b_1 x_{1j} + b_2 x_{2j} + \cdots + b_k x_{kj} + e_j$
Least-Squares Estimation: The 2-Variable Normal Equations
Minimizing the sum of squared errors with respect to the estimated coefficients $b_0$, $b_1$, and $b_2$ yields the following normal equations:

$\sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2$
$\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$
$\sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$
Example 11-1
  Y   X1   X2
 72   12    5
 76   11    8
 78   15    6
 70   10    5
 68   11    3
 80   16    9
 82   14   12
 65    8    4
 62    8    3
 90   18   10
---  ---  ---
743  123   65
Normal equations:

$743 = 10 b_0 + 123 b_1 + 65 b_2$
$9382 = 123 b_0 + 1615 b_1 + 869 b_2$
$5040 = 65 b_0 + 869 b_1 + 509 b_2$

Solving:

$b_0 = 47.164942$
$b_1 = 1.5990404$
$b_2 = 1.1487479$

Estimated regression equation:

$\hat{Y} = 47.164942 + 1.5990404 X_1 + 1.1487479 X_2$
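Since the normal equations form a 3×3 linear system, they can be solved directly; a minimal sketch in Python (NumPy assumed available):

```python
# Solve the Example 11-1 normal equations as the 3x3 system A b = c.
import numpy as np

A = np.array([[10.0, 123.0, 65.0],
              [123.0, 1615.0, 869.0],
              [65.0, 869.0, 509.0]])
c = np.array([743.0, 9382.0, 5040.0])

b = np.linalg.solve(A, c)
print(b)  # ≈ [47.164942, 1.5990404, 1.1487479]
```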
Decomposition of the total deviation:

Total deviation: $Y - \bar{Y}$
Regression deviation: $\hat{Y} - \bar{Y}$
Error deviation: $Y - \hat{Y}$

Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square
Regression            SSR              k                    MSR = SSR/k
Error                 SSE              n - (k+1)            MSE = SSE/(n-(k+1))
Total                 SST              n - 1                MST = SST/(n-1)

F Ratio: F = MSR/MSE

[Figure: F(2, 7) density; with $\alpha$ = 0.01 the critical point is $F_{0.01}$ = 9.55.]
The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we may conclude that the dependent variable is related to one or more of the independent variables.
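The quantities behind this F test can be reproduced from the Example 11-1 data. A minimal sketch in Python (NumPy assumed available), reusing the coefficients from the normal equations; it also reproduces the s, R-sq, and R-sq(adj) values reported for Example 11-1 below:

```python
# ANOVA quantities for Example 11-1: SST, SSR, SSE, F, R-sq, R-sq(adj), s.
import numpy as np

y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
x2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)

b0, b1, b2 = 47.164942, 1.5990404, 1.1487479   # from the normal equations
y_hat = b0 + b1 * x1 + b2 * x2

n, k = len(y), 2
sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)
ssr = sst - sse

mse = sse / (n - (k + 1))
print((ssr / k) / mse)             # F ratio ≈ 86.34
print(ssr / sst)                   # R-sq ≈ 0.961
print(1 - mse / (sst / (n - 1)))   # R-sq(adj) ≈ 0.950
print(np.sqrt(mse))                # s ≈ 1.911
```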
[Figure: errors $y - \hat{y}$ about the estimated regression surface.]

The mean square error and the standard error of estimate:

$MSE = \dfrac{SSE}{n-(k+1)} = \dfrac{\sum (y - \hat{y})^2}{n-(k+1)}, \qquad s = \sqrt{MSE}$
The multiple coefficient of determination, $R^2$, measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
2
The adjusted multiple coefficient of determination , R , is the coefficient of
determination with the SSE and SST divided by their respective degrees of freedom:
SSE
R
= 1-
(n - (k + 1))
SST
(n - 1)
Example 11-1:

s = 1.911
R-sq = 96.1%
R-sq(adj) = 95.0%
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square
Regression            SSR              k                    MSR = SSR/k
Error                 SSE              n-(k+1) = n-k-1      MSE = SSE/(n-(k+1))
Total                 SST              n-1                  MST = SST/(n-1)

F Ratio: $F = \dfrac{MSR}{MSE} = \dfrac{R^2/k}{(1-R^2)/(n-(k+1))}$

$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$

$\bar{R}^2 = 1 - \dfrac{SSE/(n-(k+1))}{SST/(n-1)} = 1 - \dfrac{MSE}{MST}$
Test statistic for the significance of a single regression slope parameter:

$t = \dfrac{b_i - 0}{s(b_i)}$, with $n - (k+1)$ degrees of freedom.

Coefficient   Estimate   Standard Error   t-Statistic
Constant      53.12      5.43               9.783 *
X1             2.03      0.22               9.227 *
X2             5.60      1.30               4.308 *
X3            10.35      6.88               1.504
X4             3.45      2.70               1.259
X5            -4.25      0.38             -11.184 *

n = 150, $t_{0.025}$ = 1.96; * denotes a t-statistic exceeding the critical point in absolute value.
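A minimal sketch recomputing the t-statistics in this table (t = estimate / standard error) and flagging those exceeding $t_{0.025}$ = 1.96 in absolute value:

```python
# t = estimate / standard error; flag |t| > 1.96 (n = 150, two-sided 5% test).
estimates = {
    "Constant": (53.12, 5.43),
    "X1": (2.03, 0.22),
    "X2": (5.60, 1.30),
    "X3": (10.35, 6.88),
    "X4": (3.45, 2.70),
    "X5": (-4.25, 0.38),
}
for name, (b, se) in estimates.items():
    t = b / se
    flag = " *" if abs(t) > 1.96 else ""
    print(f"{name}: t = {t:.3f}{flag}")
```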
Outliers and Influential Observations

[Figure: an outlier pulls the fitted regression line away from the line fitted without the outlier.]

[Figure: a point with a large value of $x_i$ can be an influential observation: the regression line when all data are included suggests a relationship, even though there is no relationship in the cluster of the remaining points.]
Fit      Stdev.Fit   Residual   St.Resid
2.6420   0.1288      -0.0420    -0.14 X
2.6438   0.1234      -0.0438    -0.14 X
4.5949   0.0676       0.9051     2.80 R
4.6311   0.0651      -0.9311    -2.87 R
5.1317   0.0648      -0.8317    -2.57 R
4.9474   0.0668       0.6526     2.02 R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.
Estimated Regression Plane for Example 11-1

[Figure: the fitted plane over Advertising (8.00 to 18.00) and Promotions (up to 12); fitted values range from 63.42 to 89.76.]
Prediction in Multiple Regression
A (1 - α)100% prediction interval for a value of Y given values of $X_i$:

$\hat{y} \pm t_{(\alpha/2,\, n-(k+1))}\sqrt{s^2(\hat{y}) + MSE}$

A (1 - α)100% confidence interval for the conditional mean of Y given values of $X_i$:

$\hat{y} \pm t_{(\alpha/2,\, n-(k+1))}\, s[\hat{E}(Y)]$
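Both intervals can be computed with statsmodels (assumed installed); a minimal sketch on the Example 11-1 data, predicting at $X_1$ = 10, $X_2$ = 5:

```python
# Confidence interval for E(Y) and prediction interval for Y, Example 11-1.
import numpy as np
import statsmodels.api as sm

y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
X = np.column_stack([
    [12, 11, 15, 10, 11, 16, 14, 8, 8, 18],
    [5, 8, 6, 5, 3, 9, 12, 4, 3, 10],
]).astype(float)

model = sm.OLS(y, sm.add_constant(X)).fit()
pred = model.get_prediction([[1.0, 10.0, 5.0]])  # leading 1 for the constant
# mean_ci_* columns: confidence interval for E(Y); obs_ci_*: prediction interval
print(pred.summary_frame(alpha=0.05))
```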
EXAMPLE 11-3
Picturing Qualitative Variables in Regression

[Figure: two parallel regression lines plotted against $X_1$: for $X_2 = 1$ (intercept $b_0 + b_2$) and for $X_2 = 0$ (intercept $b_0$).]

A regression with one quantitative variable ($X_1$) and one qualitative variable ($X_2$):

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$

A multiple regression with two quantitative variables ($X_1$ and $X_2$) and one qualitative variable ($X_3$):

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$
[Figure: three parallel regression lines plotted against $X_1$, with intercepts $b_0$, $b_0 + b_2$, and $b_0 + b_3$.]

A regression with one quantitative variable ($X_1$) and two qualitative variables ($X_2$ and $X_3$):

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$
A qualitative variable with r levels or categories is represented with (r - 1) 0/1 (dummy) variables.

Category    X2   X3
Adventure    0    0
Drama        0    1
Romance      1    0
Example: a gender dummy variable coded 0 if male. Its estimated coefficient indicates that female salaries average $3256 below male salaries, other variables held constant.
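Producing the (r - 1) dummy variables is mechanical; a minimal sketch with pandas (assumed installed) for the three-category table above:

```python
# (r - 1) dummy variables for a 3-level qualitative variable.
import pandas as pd

categories = pd.Series(["Adventure", "Drama", "Romance", "Drama"])
dummies = pd.get_dummies(categories, drop_first=True)  # r - 1 = 2 columns
print(dummies)  # Adventure is the baseline: both dummies equal 0
```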
Interactions between Quantitative and Qualitative Variables: Shifting Slopes

[Figure: for $X_2 = 0$ the line has intercept $b_0$ and slope $b_1$; for $X_2 = 1$ it has intercept $b_0 + b_2$ and slope $b_1 + b_3$.]
A regression with interaction between a quantitative variable ($X_1$) and a qualitative variable ($X_2$):

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2$
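The interaction term is simply an extra column, the product $x_1 x_2$, in the design matrix. A minimal sketch with synthetic, illustrative data (statsmodels assumed installed):

```python
# The slope on x1 shifts by b3 when the dummy x2 equals 1.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 100)
x2 = rng.integers(0, 2, 100).astype(float)            # 0/1 dummy
y = 2 + 1.5 * x1 + 3 * x2 + 0.8 * x1 * x2 + rng.normal(0, 1, 100)

X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
fit = sm.OLS(y, X).fit()
print(fit.params)  # ≈ [2, 1.5, 3, 0.8]
```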
[Figure: one-variable polynomial regression curves plotted against $X_1$:
$\hat{y} = b_0 + b_1 X$ (straight line);
$\hat{y} = b_0 + b_1 X + b_2 X^2$ (parabola, opening according to the sign of $b_2$);
$\hat{y} = b_0 + b_1 X + b_2 X^2 + b_3 X^3$.]
Polynomial Regression: Example 11-5
Polynomial Regression: Other Variables and Cross-Product Terms
Variable   Estimate   Standard Error   T-statistic
X1         2.34       0.92             2.54
X2         3.11       1.05             2.96
X1^2       4.22       1.00             4.22
X2^2       3.57       2.12             1.68
X1X2       2.77       2.30             1.20
Transformations: Exponential Model
The exponential model:

$Y = \beta_0 e^{\beta_1 X} \varepsilon$

Taking logarithms gives a model that is linear in X:

$\log Y = \log \beta_0 + \beta_1 X + \log \varepsilon$
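Because the log transform linearizes the model, it can be estimated by ordinary least squares on log(Y). A minimal sketch with synthetic, illustrative data (statsmodels assumed installed):

```python
# Fit the exponential model by OLS on log(Y) with multiplicative error.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 50)
y = 2.0 * np.exp(0.3 * x) * np.exp(rng.normal(0, 0.1, 50))

fit = sm.OLS(np.log(y), sm.add_constant(x)).fit()
log_b0, b1 = fit.params
print(np.exp(log_b0), b1)  # ≈ 2.0 and 0.3
```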
Plots of Transformed Variables

[Figure: four panels. Simple Regression of Sales on Advertising: SALES vs ADVERT with fitted line Y = 3.66825 + 6.784X, R-Squared = 0.978; LOGSALE vs LOGADV with fitted line Y = 1.70082 + 0.553136X, R-Squared = 0.947; residual plots of RESIDS against Y-HAT and against LOGADV.]
Variance Stabilizing Transformations

Square root transformation, $Y' = \sqrt{Y}$: useful when the variance of the regression errors is approximately proportional to the conditional mean of Y.

Logarithmic transformation, $Y' = \log(Y)$: useful when the variance of the regression errors is approximately proportional to the square of the conditional mean of Y.

Reciprocal transformation, $Y' = 1/Y$: useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y.
Logistic Function: the logit transformation, $\log\dfrac{p}{1-p}$.
11-11: Multicollinearity
[Figure: three diagrams of the information carried by $x_1$ and $x_2$.]

Orthogonal X variables provide information from independent sources. No multicollinearity.

Some degree of collinearity. Problems with regression depend on the degree of collinearity.

Perfectly collinear X variables provide identical information content. No regression.
Effects of Multicollinearity
- Variances of regression coefficients are inflated.
- Magnitudes of regression coefficients may be different from what is expected.
- Signs of regression coefficients may not be as expected.
- Adding or removing variables produces large changes in coefficients.
- Removing a data point may cause large changes in coefficient estimates or signs.
- In some cases, the F ratio may be significant while the t ratios are not.
[Figure: the variance inflation factor, $VIF = \dfrac{1}{1 - R_h^2}$, plotted against $R_h^2$; it climbs from 1 toward 100 as $R_h^2$ rises from 0.0 to 1.0, where $R_h^2$ is the $R^2$ of regressing $X_h$ on the other independent variables.]
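A minimal sketch computing VIFs from this definition, by regressing each explanatory variable on the others (data are illustrative):

```python
# VIF for each column of X by auxiliary regressions (no constant column in X).
import numpy as np

def vif(X):
    n, p = X.shape
    out = []
    for h in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, h, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, h], rcond=None)
        resid = X[:, h] - others @ beta
        r2 = 1 - resid.var() / X[:, h].var()
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.3, size=100)  # collinear with x1
print(vif(np.column_stack([x1, x2])))      # both VIFs well above 1
```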
Solutions to the Multicollinearity Problem
- Drop a collinear variable from the regression.
- Change the sampling plan to include elements outside the multicollinearity range.
- Transformations of variables.
- Ridge regression.
11-12 Residual Autocorrelation and the Durbin-Watson Test
An autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back.
Lagged Residuals

 i    e_i    e_(i-1)   e_(i-2)   e_(i-3)   e_(i-4)
 1    1.0    *         *         *         *
 2    0.0    1.0       *         *         *
 3   -1.0    0.0       1.0       *         *
 4    2.0   -1.0       0.0       1.0       *
 5    3.0    2.0      -1.0       0.0       1.0
 6   -2.0    3.0       2.0      -1.0       0.0
 7    1.0   -2.0       3.0       2.0      -1.0
 8    1.5    1.0      -2.0       3.0       2.0
 9    1.0    1.5       1.0      -2.0       3.0
10   -2.5    1.0       1.5       1.0      -2.0
The Durbin-Watson test (first-order autocorrelation):

$H_0: \rho_1 = 0$
$H_1: \rho_1 \neq 0$

The Durbin-Watson test statistic:

$d = \dfrac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$
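A minimal sketch computing d for the ten residuals in the lagged-residuals table above:

```python
# Durbin-Watson statistic from a residual series.
import numpy as np

e = np.array([1.0, 0.0, -1.0, 2.0, 3.0, -2.0, 1.0, 1.5, 1.0, -2.5])
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(d)  # values near 2 suggest no first-order autocorrelation
```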
Critical points of the Durbin-Watson statistic (dL and dU for k = 1, ..., 5):

  n    k=1: dL  dU    k=2: dL  dU    k=3: dL  dU    k=4: dL  dU    k=5: dL  dU
 15       1.08 1.36      0.95 1.54      0.82 1.75      0.69 1.97      0.56 2.21
 16       1.10 1.37      0.98 1.54      0.86 1.73      0.74 1.93      0.62 2.15
 17       1.13 1.38      1.02 1.54      0.90 1.71      0.78 1.90      0.67 2.10
 18       1.16 1.39      1.05 1.53      0.93 1.69      0.82 1.87      0.71 2.06
  .        ...            ...            ...            ...            ...
 65       1.57 1.63      1.54 1.66      1.50 1.70      1.47 1.73      1.44 1.77
 70       1.58 1.64      1.55 1.67      1.52 1.70      1.49 1.74      1.46 1.77
 75       1.60 1.65      1.57 1.68      1.54 1.71      1.51 1.74      1.49 1.77
 80       1.61 1.66      1.59 1.69      1.56 1.72      1.53 1.74      1.51 1.77
 85       1.62 1.67      1.60 1.70      1.57 1.72      1.55 1.75      1.52 1.77
 90       1.63 1.68      1.61 1.70      1.59 1.73      1.57 1.75      1.54 1.78
 95       1.64 1.69      1.62 1.71      1.60 1.73      1.58 1.75      1.56 1.78
100       1.65 1.69      1.63 1.72      1.61 1.74      1.59 1.76      1.57 1.78
Regions for the Durbin-Watson test:

0 to dL: Positive Autocorrelation
dL to dU: Test is Inconclusive
dU to 4-dU: No Autocorrelation
4-dU to 4-dL: Test is Inconclusive
4-dL to 4: Negative Autocorrelation

For n = 67, k = 4: dU ≈ 1.73, so 4 - dU ≈ 2.27; dL ≈ 1.47, so 4 - dL ≈ 2.53 < 2.58. Since d = 2.58 exceeds 4 - dL, H0 is rejected, and we conclude there is negative first-order autocorrelation.
Full model:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$

Reduced model:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

Partial F test:

$H_0: \beta_3 = \beta_4 = 0$
$H_1: \beta_3$ and $\beta_4$ not both 0
Partial F statistic:

$F_{(r,\, n-(k+1))} = \dfrac{(SSE_R - SSE_F)/r}{MSE_F}$

where $SSE_R$ is the sum of squared errors of the reduced model, $SSE_F$ is the sum of squared errors of the full model, $MSE_F$ is the mean square error of the full model [$MSE_F = SSE_F/(n-(k+1))$], and r is the number of variables dropped from the full model.
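A minimal sketch of the computation; the SSE values passed in are placeholders, not figures from the text:

```python
# Partial F statistic with r and n-(k+1) degrees of freedom;
# k counts the explanatory variables in the full model.
def partial_f(sse_reduced, sse_full, r, n, k):
    mse_full = sse_full / (n - (k + 1))
    return ((sse_reduced - sse_full) / r) / mse_full

print(partial_f(sse_reduced=3.1, sse_full=2.2, r=2, n=67, k=4))
```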
All possible regressions: run regressions with all possible combinations of independent variables and select the best model.
A p-value of 0.001
indicates that we
should reject the null
hypothesis H0: the
slopes for Lend and
Exch. are zero.
Forward selection: add one variable at a time to the model, on the basis of its F statistic.

Backward elimination: remove one variable at a time, on the basis of its F statistic.

Stepwise regression: adds variables to the model and subtracts variables from the model, on the basis of the F statistic.
Stepwise Regression
[Flowchart:]
1. Compute the F statistic for each variable not in the model.
2. Is there at least one variable with p-value < P_in? If no, stop.
3. If yes, enter the most significant (smallest p-value) variable into the model, then check the variables already in the model and remove any that no longer meet the removal criterion; return to step 1.

(The Minitab run below uses F-to-Enter: 4.00 and F-to-Remove: 4.00.)
Response is EXPORTS on 4 predictors, with N = 67

Step         1        2
Constant     0.9348  -3.4230
M1           0.520    0.361
T-Ratio      9.89     9.21
PRICE                 0.0370
T-Ratio               9.05
S            0.495    0.331
R-Sq        60.08    82.48
Predictor   Coef       Stdev       t-ratio   p
Constant    -4.015     2.766       -1.45     0.152
M1           0.36846   0.06385      5.77     0.000
LEND         0.00470   0.04922      0.10     0.924
PRICE        0.036511  0.009326     3.91     0.000
EXCHANGE     0.268     1.175        0.23     0.820

R-sq = 82.5%   R-sq(adj) = 81.4%
Analysis of Variance

SOURCE       DF   SS        MS       F       p
Regression    4   32.9463   8.2366   73.06   0.000
Variable    DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|   Variance Inflation
INTERCEP     1   -4.015461            2.76640057       -1.452                  0.1517       0.00000000
M1           1    0.368456            0.06384841        5.771                  0.0001       3.20719533
LEND         1    0.004702            0.04922186        0.096                  0.9242       5.35391367
PRICE        1    0.036511            0.00932601        3.915                  0.0002       6.28873181
EXCHANGE     1    0.267896            1.17544016        0.228                  0.8205       1.38570639

Durbin-Watson D: 2.583   (For Number of Obs.): 67   1st Order Autocorrelation: -0.321
The data in matrix form:

$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ 1 & x_{31} & x_{32} & \cdots & x_{3k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}$
The estimated regression model:

$\mathbf{Y} = \mathbf{X}\mathbf{b} + \mathbf{e}$

Predicted values:

$\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{H}\mathbf{Y}$

$V(\mathbf{b}) = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}$, estimated by $s^2(\mathbf{b}) = MSE\,(\mathbf{X}'\mathbf{X})^{-1}$
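A minimal sketch evaluating these matrix formulas on the Example 11-1 data (NumPy assumed available):

```python
# b = (X'X)^{-1} X'y, the hat matrix H, and the estimated covariance of b.
import numpy as np

y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
X = np.column_stack([
    np.ones(10),
    [12, 11, 15, 10, 11, 16, 14, 8, 8, 18],
    [5, 8, 6, 5, 3, 9, 12, 4, 3, 10],
])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y          # ≈ [47.1649, 1.5990, 1.1487]
H = X @ XtX_inv @ X.T          # hat matrix: y_hat = H @ y
e = y - H @ y
mse = (e @ e) / (len(y) - X.shape[1])
s2_b = mse * XtX_inv           # estimated covariance matrix of b
print(b)
```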