
SMMD

Class 9
Richard P. Waterman
Wharton

May 23, 2016

May 19, 2016

1 / 22

Table of contents

What you need to know from last time
Today's material
Collinearity
  Definition
  What is s_xk(adjusted)?
The market model
The consequences of collinearity
Diagnostics for collinearity
Hypothesis testing in multiple regression
Review

Last time

What is multiple regression?
The difference between marginal and partial association.
The scatterplot matrix and its uses.
Using leverage plots to diagnose multiple regressions.


Today's material

We will work through the material in Stine 24.2 & 23.4, and BAUR Ch. 5.
Collinearity: Stine 24.2; Chapter 5 in the case book.
Hypothesis testing for regression slopes: Stine 23.4 and Chapter 5 in the case book.


Collinearity
Definition: correlation between the X-variables.
Consequence: it is difficult to establish which of the X-variables are
most important in the regression (they all look the same).
Visually the regression plane becomes very unstable (sausage in
space, legs on the table).
Key formula:

Multiple regression: SE(b_k) ≈ (RMSE / √n) × (1 / s_xk(adjusted))

Contrast this to simple regression, where

Simple regression: SE(b_1) ≈ (RMSE / √n) × (1 / s_x1)

The difference is in whether the standard deviation of x is adjusted.

What is s_xk(adjusted)?

The unique variation in x_k that is not explained by the other explanatory variables.
The standard deviation of the residuals from the regression of x_k against all of the other x's.
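This definition can be checked numerically. A minimal sketch in Python, assuming numpy and using simulated data rather than the course's JMP datasets:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 310

# Two correlated explanatory variables (simulated, purely illustrative).
x2 = rng.normal(size=n)
x1 = 0.9 * x2 + 0.3 * rng.normal(size=n)

# s_x1(adjusted): regress x1 on the other x's and take the
# standard deviation of the residuals.
X = np.column_stack([np.ones(n), x2])
coef, *_ = np.linalg.lstsq(X, x1, rcond=None)
resid = x1 - X @ coef
s_adjusted = resid.std(ddof=2)      # df = n - 2 (intercept + slope)

# The adjusted sd is much smaller than the plain sd of x1 here,
# because most of x1's variation is explained by x2.
print(s_adjusted < x1.std(ddof=1))
```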


The market model


Consider the regression of Apple return against the Equal weighted
market return.


Interpretations

The slope: when the market goes up by an additional one percent, Apple can be expected to increase by 1.09%.
The intercept: on days when the market doesn't move, Apple can be expected to fall by 0.05% (but it is not significant).
R²: 16% of the risk in Apple is explained by the market.
1 − R²: 84% of the risk in Apple is not explained by the market; that is, it is specific to Apple.


Getting ready for collinearity

The standard deviation of Equal weighted market return is 0.0072256.
The standard error of the slope is approximately:

se(b_1) ≈ (RMSE / √n) × (1 / s_x1) = (0.018157 / √310) × (1 / 0.0072256) = 0.1427215

Of course, you can see the exact standard error on the output: 0.142952, which is really close to 0.1427215, showing that the approximation is very good.

Introducing a second correlated variable

The regression slope for Equal weighted return has switched sign, its standard error has exploded, and it is no longer statistically significant.
This is what gross collinearity can do to a regression analysis.
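The exploding standard error is easy to reproduce on simulated data; a sketch assuming numpy (the variable names mimic the two market indices, but the data here is synthetic, not the course dataset):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 310

# Synthetic stand-ins for the two indices: grossly collinear.
equal_wt = rng.normal(0.0, 0.0072, size=n)
value_wt = equal_wt + rng.normal(0.0, 0.0022, size=n)
apple = 1.09 * equal_wt + rng.normal(0.0, 0.018, size=n)

def ols_se(X, y):
    """OLS coefficients and their standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return coef, np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

ones = np.ones(n)
_, se_srm = ols_se(np.column_stack([ones, equal_wt]), apple)
_, se_mrm = ols_se(np.column_stack([ones, equal_wt, value_wt]), apple)

# The SE on equal_wt's slope balloons once the collinear
# second index enters the model.
print(se_mrm[1] / se_srm[1] > 2)
```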

Why did the standard error explode?

Find s_xk(adjusted).
Regress Equal weighted return against Market return and find the standard deviation of the residuals. Of course, that's just the RMSE in this regression of x_1 against x_2.


Putting it together

All terms refer to the Equal weighted return variable.

Model   sd         std. err. slope
SRM     0.007226   0.1429
MRM     0.002175   0.4675

The standard error of the slope has increased because the standard deviation of x has become much smaller.
Recall that the standard deviation of x is in the denominator of the standard error formula.
The standard deviation fell from 0.007226 to 0.002175 and the standard error increased from 0.1429 to 0.4675.


The Variance Inflation Factor

This is a numeric summary of the extent of collinearity in a multiple regression.
Each variable gets its own VIF.
The VIF is the price you pay for collinearity: the increase in the variance of the estimated regression coefficient due to the presence of collinearity.
An approximation with a nice interpretation:

VIF(X_k) ≈ (s_xk / s_xk(adjusted))²

The exact formula:

VIF(X_k) = 1 / (1 − R²_k), where R²_k comes from the regression of X_k on all of the other x's.

When the x's are all uncorrelated the VIFs are all 1 (perfection).
As the collinearity increases so do the VIFs. VIFs above 10 are a warning signal to take some action.
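The exact formula and the approximation can be checked against each other on simulated data; a sketch assuming numpy (the data and the 0.32 noise scale are arbitrary choices, not from the course):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 310

x2 = rng.normal(size=n)
x1 = x2 + 0.32 * rng.normal(size=n)   # a collinear pair

# R^2 from regressing x1 on the other x's (here just x2).
X = np.column_stack([np.ones(n), x2])
coef, *_ = np.linalg.lstsq(X, x1, rcond=None)
resid = x1 - X @ coef
r2 = 1.0 - (resid @ resid) / np.sum((x1 - x1.mean()) ** 2)

# Exact formula: 1 / (1 - R^2).
vif_exact = 1.0 / (1.0 - r2)

# Approximation: squared ratio of plain sd to adjusted sd.
vif_approx = (x1.std(ddof=1) / resid.std(ddof=2)) ** 2

# The two differ only by a small degrees-of-freedom factor.
print(abs(vif_exact - vif_approx) / vif_exact < 0.01)
```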

VIFs in the market model

Using the approximation:

VIF(X_k) ≈ (s_xk / s_xk(adjusted))² = (0.007226 / 0.002175)² = 11.038.

From the JMP output, the exact value is 11.0678.

Note that the approximation (11.038) is close to the true value (11.0678), so the VIF's interpretation as the squared ratio of the standard deviations is justified.

(You get the VIFs in JMP by right-clicking in the Parameter Estimates table, choosing Columns, then VIF.)
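The slide's arithmetic, as a quick check (the numbers are copied from the output quoted above):

```python
s_x = 0.007226       # sd of Equal weighted return (SRM)
s_x_adj = 0.002175   # sd of its residuals given Market return (MRM)

vif_approx = (s_x / s_x_adj) ** 2
print(round(vif_approx, 3))   # 11.038, vs the exact JMP value 11.0678
```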

Diagnostics for collinearity

Thin ellipses in the scatterplot matrix (high correlation).
Counter-intuitive signs on the slopes.
Large standard errors on the slopes (there's little information on them).
Collapsed leverage plots (to be discussed in the next class).
High Variance Inflation Factors, the increase in the variance of the slope estimate due to collinearity:

VIF(X_k) = 1 / (1 − R²_k), where R²_k comes from the regression of X_k on all of the other x's.

Insignificant t-statistics even though the overall regression is significant (ANOVA F-test).



Fix-ups for collinearity

Ignore it. OK if the sole objective is prediction in the range of the data.
Combine collinear variables in a meaningful way.
Delete variables. OK if they are extremely correlated.


Hypothesis testing in multiple regression

Recall the regression assumptions. If they are seriously broken, then these tests could be misleading.
Three flavors. They all test whether slopes are equal to zero or not. They differ in the number of slopes we are looking at simultaneously.

1. Test a single regression coefficient (slope).
2. Test all the regression coefficients at once.
3. Test a subset of the regression coefficients (more than one, but not all of them: the partial F-test).


Test a single regression coefficient (slope)

Stine: p.613.
Look for the t-statistic.
The hypothesis test in English: does this variable add any explanatory
power to the model that already includes all the other X-variables?
Small p-value says YES, big p-value says NO.


Test all the regression coefficients at once

Stine: p. 611.
It is sometimes called the whole model test.
Look for the F-statistic in the ANOVA table.
The hypothesis test in English: do any of the X-variables in the model
explain any of the variability in the Y-variable?
Small p-value says YES, big p-value says NO.
Note that the test does not identify which variables are important.
If the answer to this question is NO, then it's back to the drawing board: none of your variables are any good!


Test a subset of the regression coefficients

See formula on p.152 of BAUR. An example using the custom test


dialog is on the LMS.
The test in English: do any of the X-variables in the subset under
consideration explain any of the variability in Y?
We read the p-value from the JMP output to answer this one.
You must be able to answer this question: why not do a whole bunch of
t-tests rather than one partial F-test? Answer: the partial F-test is an
honest simultaneous test (see p. 135 of BAUR).
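The partial F-test can be computed from first principles by comparing the full and reduced models. A sketch on simulated data, assuming numpy; the course's own workflow uses JMP's custom test dialog instead:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 310

# Simulated data: y depends on x1 only; x2 and x3 add nothing.
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 * x1 + rng.normal(size=n)
ones = np.ones(n)

def sse(X, y):
    """Residual sum of squares from an OLS fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid

full = np.column_stack([ones, x1, x2, x3])
reduced = np.column_stack([ones, x1])       # H0: beta2 = beta3 = 0

q = 2                                       # number of slopes tested
df = n - full.shape[1]
f_stat = ((sse(reduced, y) - sse(full, y)) / q) / (sse(full, y) / df)

# Under H0 this follows an F(q, df) distribution; compare it to the
# critical value, or read the p-value off the software output.
print(round(f_stat, 3))
```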


Writing out the hypothesis tests formally

Consider a model with Y, X1, X2 and X3.

An example of a test for a single coefficient would be:
H0: β2 = 0 vs. H1: β2 ≠ 0.

The whole model test would be:
H0: β1 = β2 = β3 = 0 vs. H1: at least one of β1, β2 or β3 ≠ 0.

An example of the partial F-test would be:
H0: β2 = β3 = 0 vs. H1: at least one of β2 or β3 ≠ 0.


Review

Introduced collinearity: definition, diagnostics and fix-ups.
Introduced the various tests in multiple regression.

