
Applied Statistics-2
for the Students of the Executive Program in Business Analytics and Business Intelligence
Organized by IIM Ranchi
Edited by: Dr. K. Maddulety, NITIE, Mumbai
Mail: koila@rediffmail.com

Multiple Regression (1)
Using Statistics
The k-Variable Multiple Regression Model
The F Test of a Multiple Regression Model
How Good is the Regression
Tests of the Significance of Individual Regression Parameters
Testing the Validity of the Regression Model
Using the Multiple Regression Model for Prediction

Multiple Regression (2)
Qualitative Independent Variables
Polynomial Regression
Nonlinear Models and Transformations
Multicollinearity
Residual Autocorrelation and the Durbin-Watson Test
Partial F Tests and Variable Selection Methods
The Matrix Approach to Multiple Regression Analysis
Summary and Review of Terms

11-1 Using Statistics

Any two points (A and B), or an intercept and slope (β0 and β1), define a line on a two-dimensional surface.

Any three points (A, B, and C), or an intercept and coefficients of x1 and x2 (β0, β1, and β2), define a plane in a three-dimensional surface.

11-2 The k-Variable Multiple Regression Model

The population regression model of a dependent variable, Y, on a set of k independent variables, X1, X2, ..., Xk, is given by:

Y = β0 + β1X1 + β2X2 + ... + βkXk + ε

where β0 is the Y-intercept of the regression surface and each βi, i = 1, 2, ..., k, is the slope of the regression surface (sometimes called the response surface) with respect to Xi.

[Figure: the regression surface y = β0 + β1x1 + β2x2 over the (x1, x2) plane, with errors ε measured from the surface.]

Model assumptions:
1. ε ~ N(0, σ²), independent of other errors.
2. The variables Xi are uncorrelated with the error term.

Simple and Multiple Least-Squares Regression

In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line:

ŷ = b0 + b1x

In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane:

ŷ = b0 + b1x1 + b2x2

The Estimated Regression Relationship

The estimated regression relationship:

Ŷ = b0 + b1X1 + b2X2 + ... + bkXk

where Ŷ is the predicted value of Y, the value lying on the estimated regression surface. The terms b0, ..., bk are the least-squares estimates of the population regression parameters βi.

The actual, observed value of Y is the predicted value plus an error:

yj = b0 + b1x1j + b2x2j + ... + bkxkj + ej

Least-Squares Estimation: The 2-Variable Normal Equations

Minimizing the sum of squared errors with respect to the estimated coefficients b0, b1, and b2 yields the following normal equations:

Σy = nb0 + b1Σx1 + b2Σx2
Σx1y = b0Σx1 + b1Σx1² + b2Σx1x2
Σx2y = b0Σx2 + b1Σx1x2 + b2Σx2²

Example 11-1

  Y   X1   X2   X1X2   X1²   X2²   X1Y    X2Y
 72   12    5     60   144    25    864    360
 76   11    8     88   121    64    836    608
 78   15    6     90   225    36   1170    468
 70   10    5     50   100    25    700    350
 68   11    3     33   121     9    748    204
 80   16    9    144   256    81   1280    720
 82   14   12    168   196   144   1148    984
 65    8    4     32    64    16    520    260
 62    8    3     24    64     9    496    186
 90   18   10    180   324   100   1620    900
---  ---  ---   ----  ----  ----   ----   ----
743  123   65    869  1615   509   9382   5040

Normal Equations:

743 = 10b0 + 123b1 + 65b2
9382 = 123b0 + 1615b1 + 869b2
5040 = 65b0 + 869b1 + 509b2

b0 = 47.164942
b1 = 1.5990404
b2 = 1.1487479

Estimated regression equation:

Ŷ = 47.164942 + 1.5990404 X1 + 1.1487479 X2
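As a sanity check, the same least-squares solution can be reproduced numerically. A minimal sketch in Python with NumPy (an addition, not part of the original example), using the data from the table above:

import numpy as np

# Example 11-1 data, taken from the table above.
y  = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14,  8,  8, 18], dtype=float)
x2 = np.array([ 5,  8,  6,  5,  3,  9, 12,  4,  3, 10], dtype=float)

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve the normal equations (X'X)b = X'y for b = (b0, b1, b2).
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # approx. [47.164942, 1.5990404, 1.1487479]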

Example 11-1: Using the Template

Regression results for Alka-Seltzer sales

Decomposition of the Total Deviation in a Multiple Regression Model

Total deviation: Y - Ȳ
Regression deviation: Ŷ - Ȳ
Error deviation: Y - Ŷ

Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE

11-3 The F Test of a Multiple Regression Model

A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X1, X2, ..., Xk:

H0: β1 = β2 = ... = βk = 0
H1: Not all the βi (i = 1, 2, ..., k) are 0

Source of    Sum of    Degrees of
Variation    Squares   Freedom       Mean Square               F Ratio
Regression   SSR       k             MSR = SSR/k               F = MSR/MSE
Error        SSE       n - (k+1)     MSE = SSE/(n - (k+1))
Total        SST       n - 1         MST = SST/(n - 1)

Using the Template: Analysis of Variance Table (Example 11-1)

F Distribution with 2 and 7 Degrees of Freedom

[Figure: F density with the rejection region α = 0.01 to the right of the critical point F0.01 = 9.55; the test statistic is 86.34.]

The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we may conclude that the dependent variable is related to one or more of the independent variables.
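The same conclusion can be checked numerically. A minimal sketch (an addition, not from the slides) computing F and its p-value from R-sq = 96.1%, n = 10, and k = 2, using the identity F = (R²/k) / ((1 - R²)/(n - (k+1))):

from scipy import stats

# Example 11-1: R-sq = 96.1%, n = 10 observations, k = 2 predictors.
r2, n, k = 0.961, 10, 2

# F ratio expressed in terms of R² (equivalent to MSR/MSE).
F = (r2 / k) / ((1 - r2) / (n - (k + 1)))
p = stats.f.sf(F, k, n - (k + 1))
print(F, p)  # F approx. 86.2 (86.34 before rounding R-sq); p-value near 0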

11-4 How Good is the Regression

The mean square error is an unbiased estimator of the variance of the population errors ε, denoted by σ²:

MSE = SSE / (n - (k+1)) = Σ(y - ŷ)² / (n - (k+1))

Standard error of estimate:

s = √MSE

[Figure: errors y - ŷ measured from the estimated regression plane over (x1, x2).]

The multiple coefficient of determination, R², measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

R² = SSR/SST = 1 - SSE/SST

Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination

SST = SSR + SSE

R² = SSR/SST = 1 - SSE/SST

The adjusted multiple coefficient of determination is the coefficient of determination with the SSE and SST divided by their respective degrees of freedom:

Adjusted R² = 1 - [SSE/(n - (k+1))] / [SST/(n - 1)]

Example 11-1:
s = 1.911    R-sq = 96.1%    R-sq(adj) = 95.0%
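A minimal sketch (an addition) reproducing R-sq(adj) from R-sq via the equivalent form 1 - (1 - R²)(n - 1)/(n - (k+1)), an algebraic rearrangement of the definition above:

# Example 11-1: n = 10, k = 2, R-sq = 96.1%.
r2, n, k = 0.961, 10, 2
r2_adj = 1 - (1 - r2) * (n - 1) / (n - (k + 1))
print(r2_adj)  # approx. 0.950, matching R-sq(adj) = 95.0%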

Measures of Performance in Multiple Regression and the ANOVA Table

Source of    Sum of    Degrees of
Variation    Squares   Freedom        Mean Square               F Ratio
Regression   SSR       k              MSR = SSR/k               F = MSR/MSE
Error        SSE       n - (k+1)      MSE = SSE/(n - (k+1))
                       = n - k - 1
Total        SST       n - 1          MST = SST/(n - 1)

R² = SSR/SST = 1 - SSE/SST

Adjusted R² = 1 - [SSE/(n - (k+1))] / [SST/(n - 1)] = 1 - MSE/MST

F = MSR/MSE = [R²/k] / [(1 - R²)/(n - (k+1))]

11-5 Tests of the Significance of Individual Regression Parameters

Hypothesis tests about individual regression slope parameters:

(1) H0: β1 = 0    H1: β1 ≠ 0
(2) H0: β2 = 0    H1: β2 ≠ 0
...
(k) H0: βk = 0    H1: βk ≠ 0

Test statistic for test i:

t(n-(k+1)) = (bi - 0) / s(bi)

Regression Results for Individual Parameters

Variable   Coefficient Estimate   Standard Error   t-Statistic
Constant        53.12                 5.43             9.783 *
X1               2.03                 0.22             9.227 *
X2               5.60                 1.30             4.308 *
X3              10.35                 6.88             1.504
X4               3.45                 2.70             1.259
X5              -4.25                 0.38           -11.184 *

n = 150    t0.025 = 1.96    (* marks parameters with |t| > 1.96, significant at α = 0.05)
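A minimal sketch (an addition, not from the slides) running these t tests on the tabled estimates and standard errors:

import numpy as np
from scipy import stats

names = ["Constant", "X1", "X2", "X3", "X4", "X5"]
est = np.array([53.12, 2.03, 5.60, 10.35, 3.45, -4.25])
se  = np.array([ 5.43, 0.22, 1.30,  6.88, 2.70,  0.38])
n, k = 150, 5

t = est / se                               # t = (b_i - 0) / s(b_i)
t_crit = stats.t.ppf(0.975, n - (k + 1))   # approx. 1.98; the slide uses 1.96
for name, ti in zip(names, t):
    print(f"{name:8s} t = {ti:8.3f}  significant: {abs(ti) > t_crit}")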

Example 11-1: Using the Template

Regression results for Alka-Seltzer sales

Using the Template: Example 11-2

Regression results for Exports to Singapore

11-6 Testing the Validity of the Regression Model: Residual Plots

Residuals vs M1: it appears that the residuals are randomly distributed, with no pattern and with equal variance, as M1 increases.

11-6 Testing the Validity of the Regression Model: Residual Plots

Residuals vs Price: it appears that the residuals are increasing as the Price increases. The variance of the residuals is not constant.

Normal Probability Plot for the Residuals: Example 11-2

A linear trend indicates that the residuals are normally distributed.

Investigating the Validity of the Regression: Outliers and Influential Observations

Outliers: a single outlier can pull the fitted line toward itself (compare the regression line with the outlier to the regression line without it).

Influential observations: a point with a large value of xi can dominate the fit; the regression line when all data are included may suggest a relationship even when there is no relationship in the remaining cluster of points.

Outliers and Influential Observations: Example 11-2

Unusual Observations
Obs.   M1     EXPORTS   Fit      Stdev.Fit   Residual   St.Resid
  1    5.10   2.6000    2.6420   0.1288      -0.0420    -0.14 X
  2    4.90   2.6000    2.6438   0.1234      -0.0438    -0.14 X
 25    6.20   5.5000    4.5949   0.0676       0.9051     2.80 R
 26    6.30   3.7000    4.6311   0.0651      -0.9311    -2.87 R
 50    8.30   4.3000    5.1317   0.0648      -0.8317    -2.57 R
 67    8.20   5.6000    4.9474   0.0668       0.6526     2.02 R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.

11-7 Using the Multiple Regression Model for Prediction

[Figure: estimated regression plane for Example 11-1, with Sales (63.42 to 89.76) plotted against Advertising (8.00 to 18.00) and Promotions (up to 12).]

Prediction in Multiple Regression

A (1 - α)100% prediction interval for a value of Y given values of Xi:

ŷ ± t(α/2, n-(k+1)) √(s²(ŷ) + MSE)

A (1 - α)100% prediction interval for the conditional mean of Y given values of Xi:

ŷ ± t(α/2, n-(k+1)) s[Ê(Y)]
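A minimal sketch (an addition) of the first interval as a helper function; the variance of the fitted value, s²(ŷ), must come from the regression output (e.g., MINITAB's Stdev.Fit, squared). The example call uses observation 1 of Example 11-2 (fit 2.6420, Stdev.Fit 0.1288) with s = 0.3358, n = 67, and k = 4 from the MINITAB output shown later:

import numpy as np
from scipy import stats

def prediction_interval(y_hat, s2_yhat, mse, n, k, alpha=0.05):
    # (1 - alpha)100% prediction interval for an individual value of Y:
    # y_hat +/- t(alpha/2, n-(k+1)) * sqrt(s²(ŷ) + MSE)
    t = stats.t.ppf(1 - alpha / 2, n - (k + 1))
    half_width = t * np.sqrt(s2_yhat + mse)
    return y_hat - half_width, y_hat + half_width

print(prediction_interval(2.6420, 0.1288**2, 0.3358**2, n=67, k=4))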

11-8 Qualitative (or Categorical) Independent Variables (in Regression)

An indicator (dummy, binary) variable of qualitative level A:

Xh = 1 if level A is obtained, 0 if level A is not obtained.

EXAMPLE 11-3

MOVIE   EARN   COST   PROM   BOOK
  1      28     4.2    1.0    0
  2      35     6.0    3.0    1
  3      50     5.5    6.0    1
  4      20     3.3    1.0    0
  5      75    12.5   11.0    1
  6      60     9.6    8.0    1
  7      15     2.5    0.5    0
  8      45    10.8    5.0    0
  9      50     8.4    3.0    1
 10      34     6.6    2.0    0
 11      48    10.7    1.0    1
 12      82    11.0   15.0    1
 13      24     3.5    4.0    0
 14      50     6.9   10.0    0
 15      58     7.8    9.0    1
 16      63    10.1   10.0    0
 17      30     5.0    1.0    1
 18      37     7.5    5.0    0
 19      45     6.4    8.0    1
 20      72    10.0   12.0    1
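A minimal sketch (an addition to the slides) fitting the Example 11-3 data with the BOOK dummy included; the slides do not report the fitted coefficients for this example, so the printout illustrates only the mechanics:

import numpy as np

# Example 11-3 data: EARN regressed on COST, PROM, and the BOOK dummy.
earn = np.array([28, 35, 50, 20, 75, 60, 15, 45, 50, 34,
                 48, 82, 24, 50, 58, 63, 30, 37, 45, 72], dtype=float)
cost = np.array([4.2, 6.0, 5.5, 3.3, 12.5, 9.6, 2.5, 10.8, 8.4, 6.6,
                 10.7, 11.0, 3.5, 6.9, 7.8, 10.1, 5.0, 7.5, 6.4, 10.0])
prom = np.array([1.0, 3.0, 6.0, 1.0, 11.0, 8.0, 0.5, 5.0, 3.0, 2.0,
                 1.0, 15.0, 4.0, 10.0, 9.0, 10.0, 1.0, 5.0, 8.0, 12.0])
book = np.array([0, 1, 1, 0, 1, 1, 0, 0, 1, 0,
                 1, 1, 0, 0, 1, 0, 1, 0, 1, 1], dtype=float)

X = np.column_stack([np.ones_like(earn), cost, prom, book])
b, *_ = np.linalg.lstsq(X, earn, rcond=None)
print(b)  # b[3] estimates the earnings shift for movies based on books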

Picturing Qualitative Variables in Regression

[Figure: two parallel lines; the line for X2 = 1 lies b2 above the line for X2 = 0.]

A regression with one quantitative variable (X1) and one qualitative variable (X2):

ŷ = b0 + b1x1 + b2x2

A multiple regression with two quantitative variables (X1 and X2) and one qualitative variable (X3):

ŷ = b0 + b1x1 + b2x2 + b3x3

Picturing Qualitative Variables in Regression: Three Categories and Two Dummy Variables

[Figure: three parallel lines; intercept b0 for X2 = 0 and X3 = 0; intercept b0 + b2 for X2 = 1 and X3 = 0; intercept b0 + b3 for X2 = 0 and X3 = 1.]

A regression with one quantitative variable (X1) and two qualitative variables (X2 and X3):

ŷ = b0 + b1x1 + b2x2 + b3x3

A qualitative variable with r levels or categories is represented with (r - 1) 0/1 (dummy) variables.

Category    X2   X3
Adventure    0    0
Drama        0    1
Romance      1    0

Using Qualitative Variables in Regression: Example 11-4

Salary = 8547 + 949 Education + 1258 Experience - 3256 Gender
(SE)     (32.6)  (45.1)          (78.5)           (212.4)
(t)      (262.2) (21.0)          (16.0)           (-15.3)

Gender = 1 if Female, 0 if Male.

On average, female salaries are $3256 below male salaries.

Interactions between Quantitative and Qualitative Variables: Shifting Slopes

[Figure: the line for X2 = 0 has intercept b0 and slope b1; the line for X2 = 1 has intercept b0 + b2 and slope b1 + b3.]

A regression with interaction between a quantitative variable (X1) and a qualitative variable (X2):

ŷ = b0 + b1x1 + b2x2 + b3x1x2
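A minimal sketch (an addition) of the design matrix for this shifting-slopes model; with it, the slope for x2 = 0 is b1 and the slope for x2 = 1 is b1 + b3, as in the figure:

import numpy as np

def interaction_design(x1, x2):
    # Columns: intercept, x1, dummy x2, and the interaction x1*x2.
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

# The coefficients would then be estimated by, e.g.,
# b, *_ = np.linalg.lstsq(interaction_design(x1, x2), y, rcond=None)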

11-9 Polynomial Regression

One-variable polynomial regression model:

Y = β0 + β1X + β2X² + β3X³ + ... + βmX^m + ε

where m is the degree of the polynomial, the highest power of X appearing in the equation. The degree of the polynomial is the order of the model.

[Figure: fitted curves of increasing order: a straight line ŷ = b0 + b1X, a quadratic ŷ = b0 + b1X + b2X², and a cubic ŷ = b0 + b1X + b2X² + b3X³.]
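A minimal sketch (an addition, using clearly labeled synthetic data) of fitting a third-order polynomial by least squares:

import numpy as np

# Illustrative synthetic data following a cubic trend plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 + 1.5 * x - 0.2 * x**2 + 0.01 * x**3 + rng.normal(0, 0.5, x.size)

# np.polyfit returns coefficients from the highest power down.
coeffs = np.polyfit(x, y, deg=3)
print(coeffs[::-1])  # approx. (b0, b1, b2, b3), near (2, 1.5, -0.2, 0.01)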

Polynomial Regression: Example 11-5

Polynomial Regression: Other Variables and Cross-Product Terms

Variable   Estimate   Standard Error   t-Statistic
X1          2.34       0.92             2.54
X2          3.11       1.05             2.96
X1²         4.22       1.00             4.22
X2²         3.57       2.12             1.68
X1X2        2.77       2.30             1.20

11-10 Nonlinear Models and Transformations: Multiplicative Model

The multiplicative model:

Y = β0 X1^β1 X2^β2 X3^β3 ε

The logarithmic transformation:

log Y = log β0 + β1 log X1 + β2 log X2 + β3 log X3 + log ε

Transformations: Exponential Model

The exponential model:

Y = β0 e^(β1X) ε

The logarithmic transformation:

log Y = log β0 + β1X + log ε
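A minimal sketch (an addition) of how the multiplicative model is estimated in practice: regress log Y on the logs of the X columns, then exponentiate the intercept to recover b0. The helper name is illustrative:

import numpy as np

def fit_multiplicative(y, X):
    # Least-squares fit of log(y) on log(X) columns plus an intercept.
    Z = np.column_stack([np.ones(len(y)), np.log(X)])
    coef, *_ = np.linalg.lstsq(Z, np.log(y), rcond=None)
    b0 = np.exp(coef[0])          # intercept back on the original scale
    return b0, coef[1:]           # (b0, estimated exponents b1, ..., bk)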

Plots of Transformed Variables

[Figure: four panels comparing the fits.]
Simple regression of SALES on ADVERT: Y = 6.59271 + 1.19176X, R-squared = 0.895
Regression of SALES on LOGADV: Y = 3.66825 + 6.784X, R-squared = 0.978
Regression of LOGSALE on LOGADV: Y = 1.70082 + 0.553136X, R-squared = 0.947
Residual plot: RESIDS vs Y-HAT for the regression of Sales on Log(Advertising)

Variance Stabilizing Transformations

Square root transformation: Y' = √Y
Useful when the variance of the regression errors is approximately proportional to the conditional mean of Y.

Logarithmic transformation: Y' = log(Y)
Useful when the variance of the regression errors is approximately proportional to the square of the conditional mean of Y.

Reciprocal transformation: Y' = 1/Y
Useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y.

Regression with Dependent Indicator Variables

The logistic function:

E(Y|X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))

Transformation to linearize the logistic function:

p' = log[p / (1 - p)]

[Figure: the S-shaped logistic function rising from 0 to 1.]
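A minimal sketch (an addition) of the logistic function and the logit transformation; applying the logit to logistic values recovers the linear predictor β0 + β1x:

import numpy as np

def logistic(x, b0, b1):
    return np.exp(b0 + b1 * x) / (1 + np.exp(b0 + b1 * x))

def logit(p):
    # The linearizing transformation p' = log(p / (1 - p)).
    return np.log(p / (1 - p))

x = np.linspace(-3, 3, 7)
p = logistic(x, b0=0.0, b1=1.0)
print(logit(p))  # recovers 0.0 + 1.0 * x, a straight line in x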

11-11 Multicollinearity

Orthogonal X variables provide information from independent sources: no multicollinearity.

Some degree of collinearity: problems with the regression depend on the degree of collinearity.

Perfectly collinear X variables provide identical information content: no regression is possible.

A high degree of negative collinearity also causes problems with the regression.

Effects of Multicollinearity

Variances of regression coefficients are inflated.
Magnitudes of regression coefficients may be different from what are expected.
Signs of regression coefficients may not be as expected.
Adding or removing variables produces large changes in coefficients.
Removing a data point may cause large changes in coefficient estimates or signs.
In some cases, the F ratio may be significant while the t ratios are not.

Detecting the Existence of Multicollinearity: Correlation Matrix of Independent Variables and Variance Inflation Factors

The variance inflation factor associated with Xh:

VIF(Xh) = 1 / (1 - Rh²)

where Rh² is the R² value obtained for the regression of Xh on the other independent variables.

[Figure: VIF grows without bound, past 50 and toward 100, as Rh² approaches 1.]
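A minimal sketch (an addition) computing VIF for each column of a predictor matrix by regressing it on the remaining columns:

import numpy as np

def vif(X):
    # X holds the predictor columns only (no intercept column).
    n, k = X.shape
    factors = []
    for h in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, h, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, h], rcond=None)
        resid = X[:, h] - others @ coef
        r2_h = 1 - resid @ resid / np.sum((X[:, h] - X[:, h].mean()) ** 2)
        factors.append(1 / (1 - r2_h))
    return factors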

Variance Inflation Factor (VIF)

Observation: The VIF values for the variables Lend and Price are both greater than 5. This indicates that some degree of multicollinearity exists with respect to these two variables.

Solutions to the Multicollinearity Problem

Drop a collinear variable from the regression
Change the sampling plan to include elements outside the multicollinearity range
Transformations of variables
Ridge regression

11-12 Residual Autocorrelation and the Durbin-Watson Test

An autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back. Consequences of autocorrelation include inaccurate estimates of variances and inaccurate predictions.

Lagged Residuals

  i     εi     εi-1   εi-2   εi-3   εi-4
  1     1.0     *      *      *      *
  2     0.0    1.0     *      *      *
  3    -1.0    0.0    1.0     *      *
  4     2.0   -1.0    0.0    1.0     *
  5     3.0    2.0   -1.0    0.0    1.0
  6    -2.0    3.0    2.0   -1.0    0.0
  7     1.0   -2.0    3.0    2.0   -1.0
  8     1.5    1.0   -2.0    3.0    2.0
  9     1.0    1.5    1.0   -2.0    3.0
 10    -2.5    1.0    1.5    1.0   -2.0

The Durbin-Watson test (first-order autocorrelation):

H0: ρ1 = 0
H1: ρ1 ≠ 0

The Durbin-Watson test statistic:

d = Σ(i=2 to n) (ei - ei-1)² / Σ(i=1 to n) ei²
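A minimal sketch (an addition) computing d for the ten residuals in the table above:

import numpy as np

e = np.array([1.0, 0.0, -1.0, 2.0, 3.0, -2.0, 1.0, 1.5, 1.0, -2.5])

# d = sum of squared successive differences over the sum of squares.
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(d)  # approx. 1.99; values near 2 suggest no first-order autocorrelation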

Critical Points of the Durbin-Watson Statistic: α = 0.05, n = Sample Size, k = Number of Independent Variables

         k = 1        k = 2        k = 3        k = 4        k = 5
  n     dL    dU     dL    dU     dL    dU     dL    dU     dL    dU
 15    1.08  1.36   0.95  1.54   0.82  1.75   0.69  1.97   0.56  2.21
 16    1.10  1.37   0.98  1.54   0.86  1.73   0.74  1.93   0.62  2.15
 17    1.13  1.38   1.02  1.54   0.90  1.71   0.78  1.90   0.67  2.10
 18    1.16  1.39   1.05  1.53   0.93  1.69   0.82  1.87   0.71  2.06
  .      .     .      .     .      .     .      .     .      .     .
 65    1.57  1.63   1.54  1.66   1.50  1.70   1.47  1.73   1.44  1.77
 70    1.58  1.64   1.55  1.67   1.52  1.70   1.49  1.74   1.46  1.77
 75    1.60  1.65   1.57  1.68   1.54  1.71   1.51  1.74   1.49  1.77
 80    1.61  1.66   1.59  1.69   1.56  1.72   1.53  1.74   1.51  1.77
 85    1.62  1.67   1.60  1.70   1.57  1.72   1.55  1.75   1.52  1.77
 90    1.63  1.68   1.61  1.70   1.59  1.73   1.57  1.75   1.54  1.78
 95    1.64  1.69   1.62  1.71   1.60  1.73   1.58  1.75   1.56  1.78
100    1.65  1.69   1.63  1.72   1.61  1.74   1.59  1.76   1.57  1.78

Using the Durbin-Watson Statistic

0 < d < dL: positive autocorrelation
dL < d < dU: test is inconclusive
dU < d < 4 - dU: no autocorrelation
4 - dU < d < 4 - dL: test is inconclusive
4 - dL < d < 4: negative autocorrelation

For n = 67, k = 4: dU ≈ 1.73, 4 - dU ≈ 2.27, dL ≈ 1.47, 4 - dL ≈ 2.53 < d = 2.58.
H0 is rejected, and we conclude there is negative first-order autocorrelation.

11-13 Partial F Tests and Variable Selection Methods

Full model:
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε

Reduced model:
Y = β0 + β1X1 + β2X2 + ε

Partial F test:
H0: β3 = β4 = 0
H1: β3 and β4 not both 0

Partial F statistic:

F(r, n-(k+1)) = [(SSER - SSEF) / r] / MSEF

where SSER is the sum of squared errors of the reduced model, SSEF is the sum of squared errors of the full model, MSEF is the mean square error of the full model [MSEF = SSEF/(n-(k+1))], and r is the number of variables dropped from the full model.
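A minimal sketch (an addition) of the partial F statistic as a helper, returning both F and its p-value; the function name is illustrative:

from scipy import stats

def partial_f(sse_reduced, sse_full, r, n, k):
    # F = ((SSE_R - SSE_F) / r) / MSE_F, with MSE_F = SSE_F / (n - (k + 1)).
    mse_full = sse_full / (n - (k + 1))
    F = ((sse_reduced - sse_full) / r) / mse_full
    p = stats.f.sf(F, r, n - (k + 1))
    return F, p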

Variable Selection Methods

All possible regressions: run regressions with all possible combinations of independent variables and select the best model.

A p-value of 0.001 indicates that we should reject the null hypothesis H0 that the slopes for Lend and Exch. are zero.

Variable Selection Methods

Stepwise procedures:

Forward selection: add one variable at a time to the model, on the basis of its F statistic.

Backward elimination: remove one variable at a time, on the basis of its F statistic.

Stepwise regression: add variables to the model and subtract variables from the model, on the basis of the F statistic.

Stepwise Regression

1. Compute the F statistic for each variable not in the model.
2. Is there at least one variable with p-value < Pin? If not, stop.
3. Enter the most significant (smallest p-value) variable into the model.
4. Calculate the partial F for all variables in the model.
5. Is there a variable with p-value > Pout? If so, remove that variable. Either way, return to step 1.

A sketch of the forward-selection part of this loop appears below.
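A minimal sketch (an addition, assuming a p_in threshold of 0.05) of forward selection driven by the partial F test; full stepwise regression would add the corresponding removal step with p_out:

import numpy as np
from scipy import stats

def sse(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid

def forward_select(Xfull, y, p_in=0.05):
    n, k = Xfull.shape
    selected, remaining = [], list(range(k))
    while remaining:
        base = np.column_stack([np.ones(n)] + [Xfull[:, j] for j in selected])
        sse_reduced = sse(base, y)
        # Partial F (r = 1) for each candidate not yet in the model.
        best_p, best_j = None, None
        for j in remaining:
            trial = np.column_stack([base, Xfull[:, j]])
            sse_full = sse(trial, y)
            df = n - trial.shape[1]       # n - (k'+1) for the trial model
            F = (sse_reduced - sse_full) / (sse_full / df)
            p = stats.f.sf(F, 1, df)
            if best_p is None or p < best_p:
                best_p, best_j = p, j
        if best_p > p_in:                  # no candidate is significant: stop
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected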

Stepwise Regression: Using the Computer (MINITAB)

MTB > STEPWISE 'EXPORTS' PREDICTORS 'M1' 'LEND' 'PRICE' 'EXCHANGE'

Stepwise Regression
F-to-Enter: 4.00    F-to-Remove: 4.00
Response is EXPORTS on 4 predictors, with N = 67

Step           1         2
Constant    0.9348   -3.4230
M1           0.520     0.361
T-Ratio       9.89      9.21
PRICE                  0.0370
T-Ratio                  9.05
S            0.495     0.331
R-Sq         60.08     82.48

Using the Computer: MINITAB

MTB > REGRESS 'EXPORTS' 4 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> vif;
SUBC> dw.

Regression Analysis

The regression equation is
EXPORTS = -4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor    Coef       Stdev      t-ratio    p       VIF
Constant    -4.015      2.766      -1.45      0.152
M1           0.36846    0.06385     5.77      0.000   3.2
LEND         0.00470    0.04922     0.10      0.924   5.4
PRICE        0.036511   0.009326    3.91      0.000   6.3
EXCHANGE     0.268      1.175       0.23      0.820   1.4

s = 0.3358    R-sq = 82.5%    R-sq(adj) = 81.4%

Analysis of Variance
SOURCE       DF    SS        MS       F       p
Regression    4    32.9463   8.2366   73.06   0.000

Using the Computer: SAS (continued)

Parameter Estimates

Variable    DF   Parameter    Standard      T for H0:     Prob > |T|   Variance
                 Estimate     Error         Parameter=0                Inflation
INTERCEP     1   -4.015461    2.76640057    -1.452        0.1517       0.00000000
M1           1    0.368456    0.06384841     5.771        0.0001       3.20719533
LEND         1    0.004702    0.04922186     0.096        0.9242       5.35391367
PRICE        1    0.036511    0.00932601     3.915        0.0002       6.28873181
EXCHANGE     1    0.267896    1.17544016     0.228        0.8205       1.38570639

Durbin-Watson D               2.583
(For Number of Obs.)         67
1st Order Autocorrelation    -0.321

11-15 The Matrix Approach to Regression Analysis (1)

The population regression model:

Y = Xβ + ε

where Y is the n×1 vector of observations, X is the n×(k+1) matrix whose first column is all 1s and whose remaining columns hold the values x11, ..., xnk of the k independent variables, β is the (k+1)×1 vector of parameters, and ε is the n×1 vector of errors.

The estimated regression model:

Y = Xb + e

The Matrix Approach to Regression Analysis (2)

The normal equations:

X'Xb = X'Y

Estimators:

b = (X'X)⁻¹X'Y

Predicted values:

Ŷ = Xb = X(X'X)⁻¹X'Y = HY

V(b) = σ²(X'X)⁻¹
s²(b) = MSE (X'X)⁻¹