
10. Heteroskedasticity
Objectives

What is heteroskedasticity?
What are the consequences?
How is heteroskedasticity identified?
How is heteroskedasticity corrected?
ECON 7710, 2010

Main empirical model for Unit 10:

$foodexp_i = \beta_0 + \beta_1\, income_i + \varepsilon_i$

foodexp: family food expenditure
income: family income

Least squares estimates, US data (UE_Tab0301):

$\widehat{foodexp}_i = 40.77 + 0.128^{***}\, income_i$
se: (22.14) (0.031)
$R^2 = 0.3171$, $N = 40$.

Is this the best estimated equation?

1. The Nature of Heteroskedasticity

In a regression about firms, the same mistake may be measured in millions for one firm and in billions for another.

Heteroskedasticity is a problem that occurs when the error term does not have a constant variance.

CLRM: Each error term comes from the same probability distribution.
Assumption CLRM.5 is violated!

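Regression Model

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

zero mean: $E(\varepsilon_i \mid X_{1i}, X_{2i}) = 0$
homoskedasticity: $\mathrm{var}(\varepsilon_i \mid X_{1i}, X_{2i}) = \sigma^2$
no autocorrelation: $\mathrm{cov}(\varepsilon_i, \varepsilon_j \mid X_{1i}, X_{2i}, X_{1j}, X_{2j}) = 0$, $i \neq j$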

[Figure: identical distributions for observations i and j, showing the distribution for $\varepsilon_i$ and the distribution for $\varepsilon_j$.]

Homoskedasticity

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\mathrm{var}(\varepsilon_i \mid X_i) = \sigma^2$ for all $i$

[Figure: conditional distribution f(Y) of Y at X1, X2, X3, X4.]

Heteroskedasticity

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\mathrm{var}(\varepsilon_i \mid X_i) = \sigma_i^2$ for all $i$

[Figure: conditional distribution.]

Pure heteroskedasticity
Different variances of the error term.
Correctly specified PRF.

Impure heteroskedasticity
Different variances of the error term.
Specification error.

2. Detecting Heteroskedasticity
2.1 Graphical Method
Plotting foodexp against income (for one regressor)

Example 1: Food expenditure, US data (UE_Tab0301)
[Figure: scatter diagram of foodexp against income.]

Example 1: Food expenditure, US data (UE_Tab0301)
[Figure: two panels, the residual $e$ plotted against income and the squared residual $e^2$ plotted against income.]

Example 2: textbook data (Woody3)

$\hat{Y} = 102{,}192^{***} - 9{,}075^{***}\, N + 0.35^{***}\, P + 1.29^{**}\, I$
$R^2 = 0.6182$, $N = 33$.

[Figure: residual plotted against population.]

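2.2 Park Test

Model
$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + \varepsilon_i$, $i = 1, \dots, N$  (*)

Suppose it is suspected that $\mathrm{var}(\varepsilon_i)$ depends on $Z_i$ in the form of
$\mathrm{var}(\varepsilon_i) = \sigma_i^2 = \sigma^2 Z_i^{\alpha_1} e^{v_i}$
$\ln \sigma_i^2 = \ln \sigma^2 + \alpha_1 \ln Z_i + v_i$

$H_0$: $\alpha_1 = 0$ (homoskedastic errors);
$H_A$: $\alpha_1 \neq 0$ (heteroskedastic errors).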

Step 1: Estimate the equation (*) with OLS and obtain the residuals.
$e_i = Y_i - \hat{Y}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \dots - \hat\beta_K X_{Ki}$

Step 2: Regress the natural log of the squared residuals on the natural log of a possible proportionality factor.
$\ln(e_i^2) = \alpha_0 + \alpha_1 \ln Z_i + v_i$
where $v_i$ is an error term satisfying all classical assumptions.

Step 3: If the coefficient of $\ln Z$ is significantly different from zero, this suggests a heteroskedastic pattern in the residuals with respect to $Z$. Otherwise, homoskedastic errors cannot be rejected.

Example 3: Park test, US data (UE_Tab0301)
$\widehat{\ln(e^2)} = -7.46 + 2.07^{**} \ln(income)$
t: (2.28)
p-value: (0.0284)
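The three Park test steps can be sketched in a few lines of Python. This is only an illustration for a one-regressor model: the function names `ols_simple` and `park_test` are hypothetical, and the data are simulated with an error standard deviation proportional to income (they are not the UE_Tab0301 data), so the printed numbers will not match Example 3.

```python
import math
import random


def ols_simple(x, y):
    """OLS of y on a constant and x; returns (b0, b1, se_b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - 2)  # estimated error variance
    return b0, b1, math.sqrt(s2 / sxx)


def park_test(x, y, z):
    """Park test for a one-regressor model; returns (alpha1, t_stat)."""
    # Step 1: estimate the original equation and keep the residuals.
    b0, b1, _ = ols_simple(x, y)
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Step 2: regress ln(e^2) on ln(Z).
    ln_e2 = [math.log(e ** 2) for e in resid]
    ln_z = [math.log(zi) for zi in z]
    _, alpha1, se_alpha1 = ols_simple(ln_z, ln_e2)
    # Step 3: t-statistic for H0: alpha1 = 0.
    return alpha1, alpha1 / se_alpha1


# Simulated data (an assumption, not UE_Tab0301): error s.d. grows with income.
rng = random.Random(0)
income = [rng.uniform(100, 1200) for _ in range(500)]
foodexp = [40 + 0.13 * inc + rng.gauss(0, 0.05 * inc) for inc in income]

alpha1, t_stat = park_test(income, foodexp, income)
print(f"alpha1 = {alpha1:.2f}, t = {t_stat:.2f}")
```

A large t-statistic on `alpha1` would lead to rejecting homoskedastic errors, in line with Step 3.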

Advantages of the Park test:
a. The test is simple.
b. It provides information about the variance structure.

Limitations of the Park test:
a. The distribution of the dependent variable is problematic.
b. It assumes a specific functional form.
c. It does not work when the variance depends on two or more variables.
d. The correct variable with which to order the observations must be identified first.
e. It cannot handle partitioned data.

2.3 White's Test

Model
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$, $i = 1, \dots, N$  (*)

Suppose heteroskedasticity is suspected but its functional form is unknown.

$H_0$: The conditional variance of $\varepsilon_i$ is constant.
$H_A$: The conditional variance of $\varepsilon_i$ is not constant.

Step 1: Estimate the equation (*) with OLS and obtain the residuals.
$e_i = Y_i - \hat{Y}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}$

Step 2: Regress the squared residuals on all explanatory variables, the square of each explanatory variable, and all cross-product terms.
$e_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{1i}^2 + \alpha_4 X_{2i}^2 + \alpha_5 X_{1i} X_{2i} + v_i$

Step 3: Test the overall significance of the equation in Step 2 (df = number of regressors).
Statistic $= N R^2_{white} \sim \chi^2_{df}$
Critical value (cv) $= \chi^2_{df,\alpha}$
Reject the hypothesis of homoskedasticity if $N R^2_{white} > cv$.

Example 4: White test, US data (UE_Tab0301)
$\widehat{e^2} = 1924 - 7.4\, income + 0.0088^{*}\, income^2$
$R^2 = 0.3646$, $N = 40$, $N R^2 = 14.58$
$cv = \chi^2(2, 0.01) = 9.21$.
Since $14.58 > 9.21$, homoskedasticity is rejected at the 1% level.
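The White test steps can be sketched for a one-regressor model, where the auxiliary regression has two regressors (X and X squared) and so df = 2. The helper names `solve`, `ols`, and `white_test` are hypothetical, the data are simulated rather than taken from UE_Tab0301, and income is assumed to be measured in hundreds of dollars purely to keep the numbers well scaled. The critical value 9.21 is the 1% value for df = 2 quoted in Example 4.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(columns, y):
    """OLS of y on a constant plus the given columns; returns (coefs, R^2)."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return coefs, 1.0 - ssr / sst


def white_test(x, y):
    """White statistic N * R^2 for a one-regressor model (df = 2)."""
    # Step 1: OLS on the original equation, then squared residuals.
    (b0, b1), _ = ols([x], y)
    e2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    # Step 2: regress e^2 on X and X^2 (no cross products with one regressor).
    _, r2 = ols([x, [xi ** 2 for xi in x]], e2)
    # Step 3: the test statistic is N times the auxiliary R^2.
    return len(y) * r2


# Simulated data (an assumption): error s.d. is proportional to income.
rng = random.Random(1)
income = [rng.uniform(1, 12) for _ in range(500)]
foodexp = [40 + 13 * inc + rng.gauss(0, 5 * inc) for inc in income]

stat = white_test(income, foodexp)
cv = 9.21  # chi-square critical value, df = 2, 1% level (from Example 4)
print(f"N*R^2 = {stat:.2f}, reject homoskedasticity: {stat > cv}")
```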

Advantages of the White test:
a. It does not assume a specific functional form.
b. It is applicable when the variance depends on two or more variables.

Limitations of the White test:
a. It is a large-sample test.
b. It provides no information about the variance structure.
c. It loses many degrees of freedom when there are many regressors.
d. It cannot handle partitioned data.
e. It also captures specification errors.

3. Consequences of Heteroskedasticity

If heteroskedasticity is present but OLS is used for estimation, how are the OLS estimates affected?

Unaffected: OLS estimators are still linear and unbiased because, on average, overestimates are as likely as underestimates.
$E(\hat\beta_k) = \beta_k$, $k = 0, 1, \dots, K$

3.1 OLS estimators are inefficient.

Some fluctuations of the error term are attributed to the variation in the independent variables.
There are other linear and unbiased estimators that have smaller variances than the OLS estimator.

3.2 Unreliable Hypothesis Testing

$\mathrm{var}_{ols}(\hat\beta_k) \neq \mathrm{var}_{hetero}(\hat\beta_k)$ $\Rightarrow$ biased $se(\hat\beta_k)$ $\Rightarrow$ unreliable testing conclusion

4. Remedies
4.1 Heteroskedasticity-Corrected Standard Errors

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
heteroskedasticity: $\mathrm{var}(\varepsilon_i) = \sigma_i^2$

OLS estimators are unbiased.
The standard errors of OLS are biased.

A heteroskedasticity-consistent (HC) standard error of an estimated coefficient is a standard error that has been adjusted for heteroskedasticity.
a. HC standard errors are consistent for any type of heteroskedasticity.
b. Hypothesis tests are valid with HC standard errors in large samples.
c. Typically, HC se > OLS se.

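Example 5:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\mathrm{var}(\varepsilon_i \mid X_i) = \sigma_i^2$.

Incorrect variance formula:
$\mathrm{var}(\hat\beta_1) = \dfrac{\sigma^2}{\sum (X_i - \bar{X})^2}$

Correct variance formula:
$\mathrm{var}(\hat\beta_1) = \dfrac{\sum \sigma_i^2 (X_i - \bar{X})^2}{\left[\sum (X_i - \bar{X})^2\right]^2}$

HC estimator of the variance of the slope coefficient in a simple regression model:
$\text{est. var}(\hat\beta_1) = \dfrac{\sum e_i^2 (X_i - \bar{X})^2}{\left[\sum (X_i - \bar{X})^2\right]^2}$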
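To make the two slope-variance formulas concrete, the sketch below computes the conventional OLS standard error and the HC standard error of the slope side by side. The function name `slope_standard_errors` is hypothetical and the data are simulated (not UE_Tab0301); the HC line uses the simple-regression formula with squared residuals and no small-sample correction.

```python
import math
import random


def slope_standard_errors(x, y):
    """Return (b1, ols_se, hc_se) for the slope in Y = b0 + b1*X + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d ** 2 for d in dev)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional formula: s^2 / sum (X_i - X_bar)^2.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    ols_se = math.sqrt(s2 / sxx)
    # HC formula: sum e_i^2 (X_i - X_bar)^2 / [sum (X_i - X_bar)^2]^2.
    hc_var = sum(e ** 2 * d ** 2 for e, d in zip(resid, dev)) / sxx ** 2
    return b1, ols_se, math.sqrt(hc_var)


# Simulated data (an assumption): error s.d. is proportional to income.
rng = random.Random(2)
income = [rng.uniform(100, 1200) for _ in range(200)]
foodexp = [40 + 0.13 * inc + rng.gauss(0, 0.05 * inc) for inc in income]

b1, ols_se, hc_se = slope_standard_errors(income, foodexp)
print(f"b1 = {b1:.4f}, ols se = {ols_se:.4f}, hc se = {hc_se:.4f}")
```

The slope estimate is the same in both cases; only the standard error attached to it changes.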

Example 6: HC standard errors, US data (UE_Tab0301)
$\widehat{foodexp}_i = 40.77 + 0.13^{***}\, income_i$
ols se: (22.14) (0.031)
hc se: (24.32) (0.039)
$R^2 = 0.3171$, $N = 40$.

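4.2 Weighted Least Squares

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
$E(\varepsilon_i) = 0$
$\mathrm{var}(\varepsilon_i) = \sigma_i^2$
$\mathrm{cov}(\varepsilon_t, \varepsilon_s) = 0$, $t \neq s$
$\sigma_i^2 = c Z_i^2$

The variance is assumed to be proportional to the value of $Z_i^2$.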
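
Step 1: Decide which variable is proportional to the heteroskedasticity.

Step 2: Divide all terms in the original model by that variable (divide by $Z_i$).
$\dfrac{Y_i}{Z_i} = \beta_0 \dfrac{1}{Z_i} + \beta_1 \dfrac{X_{1i}}{Z_i} + \beta_2 \dfrac{X_{2i}}{Z_i} + \dfrac{\varepsilon_i}{Z_i}$
$Y_i^* = \beta_0 X_{0i}^* + \beta_1 X_{1i}^* + \beta_2 X_{2i}^* + \varepsilon_i^*$

Step 3: Run least squares on the transformed model, which has new variables. Note that the transformed model has an intercept only if $Z$ is one of the explanatory variables.
For example, if $Z_i = X_{2i}$, then
$\dfrac{Y_i}{Z_i} = \beta_0 \dfrac{1}{Z_i} + \beta_1 \dfrac{X_{1i}}{Z_i} + \beta_2 + \dfrac{\varepsilon_i}{Z_i}$
$Y_i^* = \beta_0 X_{0i}^* + \beta_1 X_{1i}^* + \beta_2 + \varepsilon_i^*$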
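The WLS steps can be sketched for the one-regressor case in which the proportionality factor is the regressor itself (Z = X). The function name `wls_by_transformation` is hypothetical and the data are simulated, not UE_Tab0301. After dividing by X, the model becomes Y/X = b0 (1/X) + b1 + e/X, so the slope on 1/X estimates the original intercept and the new intercept estimates the original slope.

```python
import random


def wls_by_transformation(x, y):
    """WLS for Y = b0 + b1*X + e, assuming var(e_i) = c * X_i^2 (Z_i = X_i)."""
    # Step 2: divide every term by Z_i = X_i.
    y_star = [yi / xi for xi, yi in zip(x, y)]  # Y_i / Z_i
    x0_star = [1.0 / xi for xi in x]            # 1 / Z_i
    # Step 3: OLS of Y* on a constant and X0*.
    n = len(x)
    m_x = sum(x0_star) / n
    m_y = sum(y_star) / n
    sxx = sum((v - m_x) ** 2 for v in x0_star)
    # Slope on 1/Z_i is the original intercept b0.
    b0 = sum((v - m_x) * (w - m_y) for v, w in zip(x0_star, y_star)) / sxx
    # Intercept of the transformed model is the original slope b1.
    b1 = m_y - b0 * m_x
    return b0, b1


# Simulated data (an assumption): error s.d. is proportional to income.
rng = random.Random(3)
income = [rng.uniform(100, 1200) for _ in range(200)]
foodexp = [40 + 0.13 * inc + rng.gauss(0, 0.05 * inc) for inc in income]

b0, b1 = wls_by_transformation(income, foodexp)
print(f"WLS estimates of the original model: b0 = {b0:.2f}, b1 = {b1:.4f}")
```

Reading the coefficients back this way is how the estimates of a transformed equation map onto the original model.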

Example 7: WLS, US data (UE_Tab0301)

$\widehat{\left(\dfrac{foodexp}{income}\right)} = 0.1577^{***} + 21.2858\, \dfrac{1}{income}$
se: (0.02342) (14.0380)
$R^2 = 0.0570$, $N = 40$.

What are the values of the estimated coefficients of the original model?
Has the problem of heteroskedasticity been solved?

Comparing different estimates: US data (UE_Tab0301)

               Intercept   income
OLS estimate   40.77       0.128***
OLS se         22.14       0.031
HC se          24.32       0.039
WLS estimate   21.28       0.158***
WLS se         14.03       0.023

The WLS estimates have improved upon those of OLS.

Other possibilities
$\mathrm{var}(\varepsilon_i) = c Z_i$
$\mathrm{var}(\varepsilon_i) = c Z_i$
$\mathrm{var}(\varepsilon_i) = c(a_1 X_{1i} + a_2 X_{2i})$

In large samples, HC standard errors are consistent measures for any type of heteroskedasticity. Confidence intervals and t-tests are valid.

4.3 Re-specifying the Regression Model

The heteroskedasticity may be impure.

4.3.1 Use another functional form
E.g., double-log: less variation

Example 8: US data (UE_Tab0301)
$\widehat{\ln foodexp} = 0.30 + 0.69^{***} \ln income$
se: (0.90) (0.14)
$R^2 = 0.4014$, $N = 40$.

The hypothesis of constant variance can be rejected.

Example 9: India data (Food_India55)

Empirical model:
$foodexp_i = \beta_0 + \beta_1\, totexp_i + \varepsilon_i$

$\widehat{foodexp}_i = 94.21^{**} + 0.44^{***}\, totexp_i$
se: (50.86) (0.078)
$R^2 = 0.3698$, $N = 55$.

The hypothesis of homoskedasticity can be rejected by the Park and White tests.

Which model is the best?

Double-log
$\widehat{\ln foodexp} = 1.15 + 0.74^{***} \ln totexp$
se: (0.78) (0.12)
$R^2 = 0.4125$, $N = 55$.

HC
$\widehat{foodexp}_i = 94.21^{**} + 0.44^{***}\, totexp_i$
ols se: (50.86) (0.078)
hc se: (43.26) (0.074)

WLS
$\widehat{\left(\dfrac{foodexp}{totexp}\right)} = 76.5439^{**}\, \dfrac{1}{totexp} + 0.4650^{***}$
se: (37.9435) (0.0632)

4.3.2 Other reformulations

E.g., take the average of variables related to the size of the observed units, or add more variables.

Example 10: Data set Concert
The concert tour of a singer in the US
$revenue = \beta_0 + \beta_1 adv + \beta_2 stad + \beta_3 cd + \beta_4 radio + \beta_5 weekend + \varepsilon$

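Estimated equations (coefficient magnitudes as printed; signs and standard errors are not recoverable here):

(1) Dependent variable: revenue
    constant 73; adv 3.15; stad 34.66; cd 8.30; radio 300; weekend 356

(2) Dependent variable: revenue/stad
    1/stad 81; adv/stad 2.10; constant 50.20; cd/stad 7.53; radio/stad 176; weekend/stad 293

(3) Dependent variable: revenue/pop
    constant 22; adv/pop 2.21; stad/pop 109; cd/pop 7.93; radio 2.53; weekend 4.28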

Remarks:

The variable $Z$ is difficult to identify, and the functional relationship between the error and $Z$ is not known. Use WLS as a last resort.
With correct WLS, we expect the standard errors of the regression coefficients to be smaller than their OLS counterparts.
A log transformation usually reduces the degree of heteroskedasticity.
The hypothesis of homoskedasticity should not be rejected in the new model.

5. A Complete Example
Sources: Section 8.2.2 (pp. 255-256), Section 10.5 (pp. 369-376)

Empirical regression model
$pcon_i = \beta_0 + \beta_1 reg_i + \beta_2 tax_i + \beta_3 uhm_i + \varepsilon_i$

$pcon_i$: petroleum consumption in the ith state
$reg_i$: motor vehicle registrations in the ith state (000)
$tax_i$: the gasoline tax rate in the ith state (cents per gallon)
$uhm_i$: urban highway miles within the ith state

Equation 1
$\widehat{pcon} = 389.57^{***} - 0.061\, reg - 36.47^{***}\, tax + 60.76^{***}\, uhm$
(se, vif): (0.04, 24.3) (13.15, 1.1) (10.26, 24.9)
Adj. $R^2 = 0.9192$, $N = 50$.

Equation 2
$\widehat{pcon} = 551.69^{***} + 0.19^{***}\, reg - 53.59^{***}\, tax$
se: (0.012) (16.86)
Adj. $R^2 = 0.8607$, $N = 50$.

Graphical investigation
[Figure: residual plotted against REG.]

Park test
$\widehat{\ln(e^2)} = 1.65 + 0.95^{***} \ln(REG)$
se: (0.3083)
$R^2 = 0.1657$, $N = 50$

White test
$\widehat{e^2} = 11{,}098{,}291 + 140\, REG - 0.0005\, REG^2 - 12.84\, REG \cdot TAX - 237{,}873\, TAX + 12{,}347\, TAX^2$
$R^2 = 0.6645$, $N = 50$, $N R^2 = 33.22$.

Checking for other specifications: double-log, quadratic

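(4)
$\widehat{pcon} = 551.69^{***} + 0.19^{***}\, reg - 53.59^{***}\, tax$
hc se: (0.022) (23.90)
$R^2 = 0.8664$, $N = 50$.

(5)
$\widehat{\left(\dfrac{pcon}{reg}\right)} = 0.1678^{***} + 218.539^{***}\, \dfrac{1}{reg} - 17.3890^{***}\, \dfrac{tax}{reg}$
se: (0.01367) (48.1033) (4.6822)
$R^2 = 0.3600$, $N = 50$

(6)
$\widehat{\left(\dfrac{pcon}{pop}\right)} = 0.1684^{***} + 0.1082^{**}\, \dfrac{reg}{pop} - 0.0103\, tax$
se: (0.07159) (0.00349)
$R^2 = 0.1989$, $N = 50$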

Selected Exercises
Ch. 10: Q. 1, 3, 4, 5, 8, 10, 12, 14

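Regression Model

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

zero mean: $E(\varepsilon_i \mid X_{1i}, X_{2i}) = 0$
homoskedasticity: $\mathrm{var}(\varepsilon_i \mid X_{1i}, X_{2i}) = \sigma^2$
no autocorrelation: $\mathrm{cov}(\varepsilon_i, \varepsilon_j \mid X_{1i}, X_{2i}, X_{1j}, X_{2j}) = 0$, $i \neq j$
heteroskedasticity: $\mathrm{var}(\varepsilon_i \mid X_{1i}, X_{2i}) = \sigma_i^2$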



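Step 1: Decide which variable is proportional to the heteroskedasticity.

Step 2: Divide all terms in the original model by that variable (divide by $Z_i$).
$\dfrac{Y_i}{Z_i} = \beta_0 \dfrac{1}{Z_i} + \beta_1 \dfrac{X_{1i}}{Z_i} + \beta_2 \dfrac{X_{2i}}{Z_i} + \dfrac{\varepsilon_i}{Z_i}$
$Y_i^* = \beta_0 X_{0i}^* + \beta_1 X_{1i}^* + \beta_2 X_{2i}^* + \varepsilon_i^*$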

Step 3: Run least squares on the transformed model, which has new variables. Note that the transformed model has an intercept only if $Z$ is one of the explanatory variables.
For example, if $Z_i = X_{2i}$, then
$\dfrac{Y_i}{Z_i} = \beta_0 \dfrac{1}{Z_i} + \beta_1 \dfrac{X_{1i}}{Z_i} + \beta_2 + \dfrac{\varepsilon_i}{Z_i}$
$Y_i^* = \beta_0 X_{0i}^* + \beta_1 X_{1i}^* + \beta_2 + \varepsilon_i^*$

