
Erin Madden

Assignment Two
Dr. LeSage
October 29, 2014
Assignment Two
Introduction:
To further analyze selling prices for a sample of 200 homes in Toledo, Ohio, this assignment
addresses the possibilities of collinearity, heteroscedasticity, and spatial or serial correlation.
Because these selling prices form a cross-sectional sample rather than a time series, spatial
correlation rather than serial correlation will be analyzed. For each of these effects, the nature
of the problem, its diagnostics, and its corrective procedures will be discussed.
Part One Section One: Collinearity
Collinearity, also known as multicollinearity, violates the assumption under CLRM that
states, "There is no exact linear relationship among the regressors" (Gujarati). Therefore, if one
or more linear relationships are discovered among the regressors, it is likely that this sample of
200 houses exhibits a collinearity problem. Collinearity can cause OLS estimators to have large
variances and covariances, wider confidence intervals, and t-statistics that are more likely to be
insignificant, even when the R^2 value is high. Additionally, the presence of collinear variables
can change the coefficient values of other variables within the model. Collinearity can confound
multiple regressors and make it difficult to decipher an individual regressor's impact on the model.

Part One Section Two: Collinearity Diagnostic Procedure


The Belsley, Kuh, and Welsch (BKW) diagnostic procedure will be used to determine whether
a collinearity problem is present in this sample of house selling prices. Additionally, this
procedure will identify specific variables exhibiting symptoms of linear dependency.
Variance-decomposition proportions greater than 0.50 in a row with a K(x) value greater than 30
indicate that the variables in question are involved in a linear dependency. In Table 1.1, the row
with a K(x) value of 11961 exhibits variance-decomposition proportions of 0.63 and 0.94 for number
of rooms and number of bedrooms, respectively, which are both greater than the 0.50 threshold and
indicate a possible collinearity problem. A collinear relationship between number of rooms and
number of bedrooms would be intuitive. Second, the rows with K(x) values of 15696 and 19323 exhibit
proportions greater than 0.50 for house age (0.73) and number of half baths (0.73), respectively,
indicating another potential collinear relationship. In running my regressions, I neglected to
include measurements for house age^2 and house age^3, but a relationship among these variables
would be expected.
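As a rough illustration of how diagnostics of this kind can be computed, the sketch below derives condition indices and variance-decomposition proportions from the singular value decomposition of a column-scaled regressor matrix. The function name bkw_diagnostics and the synthetic rooms, beds and age columns are assumptions made for illustration; they are neither the Toledo data nor the routine used for this assignment.

```python
import numpy as np

def bkw_diagnostics(X):
    """Return condition indices and variance-decomposition proportions.

    X is an (n, k) regressor matrix that already contains the constant column.
    """
    # Scale every column to unit length so the result does not depend on units.
    Xs = X / np.linalg.norm(X, axis=0)
    # Singular value decomposition: Xs = U * diag(d) * Vt
    _, d, Vt = np.linalg.svd(Xs, full_matrices=False)
    cond_index = d.max() / d                 # one condition index per singular value
    # phi[k, j] = v_kj^2 / d_j^2: share of var(b_k) tied to singular value j
    phi = (Vt.T ** 2) / d ** 2
    # Rows = singular values, columns = coefficients; each column sums to 1.
    props = (phi / phi.sum(axis=1, keepdims=True)).T
    return cond_index, props

# Synthetic example (hypothetical data): beds is almost a linear function of rooms.
rng = np.random.default_rng(0)
n = 200
rooms = rng.normal(6.0, 1.0, n)
beds = 0.5 * rooms + rng.normal(0.0, 0.01, n)
age = rng.normal(40.0, 10.0, n)
X = np.column_stack([np.ones(n), rooms, beds, age])

cond_index, props = bkw_diagnostics(X)
worst = int(np.argmax(cond_index))           # row with the largest condition index
print(np.round(cond_index, 1))
print(np.round(props[worst], 2))             # proportions for constant, rooms, beds, age
```

In this synthetic case, the row with the largest condition index should load heavily on both rooms and beds, which is the pattern the 0.50 and 30 thresholds are meant to flag.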
Table 1.1:
Belsley, Kuh, Welsch Variance-decomposition
K(x)    age   constant  tla   lotsize  # rooms  # beds  # full baths  # half baths
1       0.00  0.00      0.00  0.04     0.00     0.00    0.00          0.00
12      0.00  0.00      0.26  0.46     0.00     0.00    0.00          0.00
167     0.02  0.00      0.15  0.01     0.00     0.00    0.00          0.00
4351    0.01  0.00      0.51  0.00     0.30     0.04    0.00          0.00
11961   0.03  0.00      0.00  0.00     0.63     0.94    0.01          0.00
15696   0.73  0.01      0.00  0.04     0.06     0.02    0.47          0.22
19323   0.04  0.00      0.04  0.04     0.00     0.00    0.36          0.73
42983   0.06  0.99      0.04  0.42     0.00     0.00    0.17          0.05

Part One Section Three: Collinearity Corrective Procedures


One option for correcting collinearity is to gather a new data set, in case the existing data
set is weak and is causing the collinearity problem. Alternatively, the observer could eliminate
the variables involved in the dependent relationship, although doing so risks eliminating primary
variables.
Ridge regression is perhaps best suited for correcting collinearity. It does so by inflating the
smallest eigenvalues of the X'X matrix. However, the resulting reduction in variance comes at the
cost of increased bias. In comparing Tables 1.3 and 1.4 to the OLS estimates in Table 1.2, we can
observe that there is no difference in the significance of the t-probabilities for house age,
number of rooms, number of bedrooms or number of half baths (the possibly collinear variables).
Given that no notable difference exists between the ridge results and the OLS results, we can
conclude that we do not have a collinearity problem.
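The ridge idea can be sketched in a few lines. The code below is a minimal illustration on synthetic data, assuming the textbook estimator b(theta) = (X'X + theta*I)^(-1) X'y; the function name ridge, the data, and the value theta = 5 are hypothetical, and the routine that produced Tables 1.3 and 1.4 may scale the regressors differently.

```python
import numpy as np

def ridge(X, y, theta):
    """Textbook ridge estimator: solve (X'X + theta*I) b = X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + theta * np.eye(k), X.T @ y)

# Synthetic example (hypothetical data): x2 is nearly a copy of x1.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

theta = 5.0
b_ols = ridge(X, y, 0.0)        # theta = 0 reproduces OLS
b_ridge = ridge(X, y, theta)

# Adding theta*I raises every eigenvalue of X'X by theta, which matters most
# for the smallest eigenvalue created by the near dependency.
eig_ols = np.linalg.eigvalsh(X.T @ X)
eig_ridge = np.linalg.eigvalsh(X.T @ X + theta * np.eye(2))
print(eig_ols, eig_ridge)
print(b_ols, b_ridge)
```

If the OLS and ridge coefficients barely differ across a range of theta values, as in the comparison of Tables 1.2 through 1.4, the near dependency is not distorting the estimates much.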
Table 1.2: OLS Estimates
Ordinary Least-squares Estimates
Dependent Variable = price
R-squared          = 0.5777
Rbar-squared       = 0.5623
sigma^2            = 70586735.0749
Nobs, Nvars        = 200, 8
************************************************************
Variable         Coefficient     t-statistic   t-probability
constant        39289.352246        6.846581        0.000000
tla                15.451200        5.088751        0.000001
lotsize             1.235406        1.926809        0.055480
rooms            -678.130925       -0.718335        0.473424
beds              565.753718        0.385550        0.700257
full baths       3106.830399        1.251977        0.212101
half baths      -3833.847406       -1.517442        0.130800
age              -408.122795       -9.975477        0.000000

Table 1.3: Ridge Regression

Ridge Regression Estimates
Dependent Variable = price
R-squared          = 0.5766
Rbar-squared       = 0.5612
sigma^2            = 70760524.7356
Ridge theta        = 0.00094520666
Nobs, Nvars        = 200, 8
************************************************************
Variable         Coefficient     t-statistic   t-probability
constant        35629.988505        6.538416        0.000000
tla                15.320274        5.171788        0.000001
lotsize             1.526410        2.449933        0.015185
rooms            -626.549703       -0.702344        0.483316
beds              643.060589        0.457959        0.647499
full baths       3508.227197        1.434100        0.153170
half baths      -4017.790954       -1.593745        0.112638
age              -386.561320       -9.806791        0.000000

Table 1.4: Ridge Regression

Ridge Regression Estimates
Dependent Variable = price
R-squared          = 0.5777
Rbar-squared       = 0.5623
sigma^2            = 70586786.4803
Ridge theta        = 0.0037808266
Nobs, Nvars        = 200, 8
************************************************************
Variable         Coefficient     t-statistic   t-probability
constant        39221.580430        6.840977        0.000000
tla                15.457863        5.091168        0.000001
lotsize             1.240427        1.935394        0.054411
rooms            -677.721128       -0.717940        0.473667
beds              566.638603        0.386181        0.699790
full baths       3116.451628        1.256201        0.210569
half baths      -3838.416063       -1.519561        0.130266
age              -407.813655       -9.971487        0.000000

In Figure 1.1, shown below, we can observe that no two or more coefficients move together to
any significant degree as the ridge parameter changes. This is consistent with our previous
conclusion that collinearity is not present in this sample of selling prices.
Figure 1.1
Values of Regression Coefficients as a Function of theta
[Ridge trace plot. Vertical axis: regression coefficients (-5000 to 5000). Horizontal axis: value
of theta (0.5 to 4, x 10^-3); a vertical line shows the H-K value. Series: tla, lotsize, rooms,
beds, full baths, half baths, age.]

Part Two Section One: Heteroscedasticity


Heteroscedasticity violates the CLRM assumption, on which the Gauss-Markov theorem relies, that
the variance of each disturbance is constant. When heteroscedasticity is present, OLS estimators
are no longer efficient, and t and F tests based on the standard CLRM assumptions may not be
reliable.

Part Two Section Two: Heteroscedasticity Diagnostic Procedure


Three procedures are used here to detect possible heteroscedasticity: the White, Newey-West
and Geweke procedures. The White and Newey-West procedures re-estimate the standard errors, and
significant changes in the resulting t-statistics compared with those of the OLS model are used to
evaluate whether a heteroscedasticity problem exists. The Geweke procedure tests for outliers as
well as possible heteroscedasticity by examining changes in the t-statistics as well as in the
coefficients. If t-statistics are inflated or t-probabilities reduced in any of these diagnostics,
then a possible case of heteroscedasticity exists.
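To make the White comparison concrete, the sketch below computes conventional OLS standard errors alongside White's heteroscedasticity-consistent ones (the basic HC0 form) on synthetic data. The function name ols_with_white_se and the data-generating process are assumptions for illustration only; the Newey-West and Geweke procedures are not reproduced here.

```python
import numpy as np

def ols_with_white_se(X, y):
    """OLS coefficients with conventional and White (HC0) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b                               # OLS residuals
    s2 = e @ e / (n - k)                        # conventional error variance
    se_ols = np.sqrt(np.diag(s2 * XtX_inv))
    # White sandwich: (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1
    meat = (X * e[:, None] ** 2).T @ X
    se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return b, se_ols, se_white

# Synthetic example (hypothetical data): the disturbance spread grows as the
# standardized house size moves away from its mean.
rng = np.random.default_rng(2)
n = 2000
size = rng.normal(size=n)
X = np.column_stack([np.ones(n), size])
y = 1.0 + 2.0 * size + np.abs(size) * rng.normal(size=n)

b, se_ols, se_white = ols_with_white_se(X, y)
print(b / se_ols)     # conventional t-statistics
print(b / se_white)   # White t-statistics
```

A large gap between the two sets of t-statistics is the kind of change the diagnostic looks for; similar values suggest the constant-variance assumption is not badly violated.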
The OLS regression results in lot size being significant at the 90% level, whereas in the White
regression lot size becomes significant at the 95% level. Lotsize has a lower t-probability in the
White regression than in the OLS (0.023 versus 0.055), while tla has a higher t-probability in the
White regression than in the OLS (0.000033 versus 0.000001). The constant term and house age have
the same t-probabilities in both regressions, and the remaining variables show no change in
significance.
In comparing the Newey-West regression to the OLS regression, we once again see an increase in
the significance of lot size from the 90% level to the 95% level. Additionally, we see half baths
become significant at the 90% level in the Newey-West regression, while it was not significantly
different from zero in the OLS regression. The t-probability is once again lower for lotsize than
in the OLS, as is the t-probability for # half baths, reflecting its change in significance. The
t-probability for tla is once again higher in the Newey-West results than in the OLS results, and
there is no change in significance for the other variables.
Moving on to the Geweke robust regression, we see that, unlike in the White and Newey-West
regressions, lotsize remains significant only at the 90% level. Additionally, the t-probability for
lotsize is higher in the Geweke regression than it is in the OLS regression (0.068 versus 0.055).
It is also important to note that the number of half baths is significant at the 90% level in the
Geweke regression as well as in the Newey-West regression, unlike in the OLS.
There is a notable change in the Geweke coefficients for several variables, which points to an
outlier problem rather than a heteroscedasticity problem. The visual representation of residuals in
Figure 2.1 is consistent with the existence of an outlier problem, especially around the 200th
observation but also between the 40th and 90th observations. Figure 2.1 does not exhibit a
megaphone shape, which is consistent with our conclusion that we have an outlier problem rather
than heteroscedasticity. Figure 2.2, shown below, displays a dramatic spike in the vi estimates
near the 200th observation and smaller spikes throughout the sample, which adds to the evidence
that there is indeed an outlier problem.
Table 2.1
                OLS                            White           Newey-West      Geweke Robust
Variable        Coefficient    t-probability   t-probability   t-probability   Coefficient    t-probability
constant        39289.352246   0.000000        0.000000        0.000000        41294.884579   0.000000
tla             15.451200      0.000001        0.000033        0.000044        16.921674      0.000000
lotsize         1.235406       0.055480        0.023303        0.030193        1.124557       0.067979
# rooms         -678.130925    0.473424        0.491529        0.484962        -942.270632    0.362778
# bedrooms      565.753718     0.700257        0.694302        0.675111        566.184302     0.708540
# full baths    3106.830399    0.212101        0.194091        0.194819        2937.587720    0.240276
# half baths    -3833.847406   0.130800        0.119239        0.080833        -4523.058435   0.078907
age             -408.122795    0.000000        0.000000        0.000000        -427.801944    0.000000

Figure 2.1
[Residual plot. Vertical axis: residuals (-4 to 1, x 10^4). Horizontal axis: residuals sorted by
house size (observations 20 to 200).]

Figure 2.2
Vi plot for outliers and hetero
[Plot of Vi estimates (vertical axis, 0 to 8) against observations sorted by house size
(horizontal axis, 20 to 200).]

Part Two Section Three: Heteroscedasticity Corrective Procedure

Based on the previous diagnostic procedure, we do not appear to have a heteroscedasticity
problem in our sample. Had one been found, several remedial procedures could be carried out.
First, one could implement the Weighted Least Squares (WLS) method, which divides each
observation by the (heteroscedastic) sigma_i and estimates the transformed model by OLS. However,
this method requires that the true sigma_i^2 be known. A second approach estimates the value of
sigma_i^2 and transforms the original model so that the variance of the errors might be
homoscedastic. Additionally, a logarithmic transformation may be used. This method regresses the
logarithm of the dependent variable on the regressors and, in consequence, compresses the scale on
which the variables are measured.
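The WLS transformation described above can be sketched as follows, assuming the disturbance standard deviations are known. The function name wls, the synthetic data, and the assumed form sigma_i = 0.5 * x_i are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def wls(X, y, sigma):
    """Weighted least squares with known disturbance standard deviations.

    Each row of X and each element of y is divided by its sigma_i, and the
    transformed model is then estimated by ordinary least squares.
    """
    Xw = X / sigma[:, None]
    yw = y / sigma
    b, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return b

# Synthetic example (hypothetical data): the error spread is proportional to x.
rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1.0, 10.0, n)
sigma = 0.5 * x                               # assumed known for this sketch
X = np.column_stack([np.ones(n), x])
y = 3.0 + 1.5 * x + sigma * rng.normal(size=n)

b_wls = wls(X, y, sigma)
b_ols = wls(X, y, np.ones(n))                 # unit weights reduce WLS to OLS
print(b_ols, b_wls)
```

In practice sigma_i is rarely known, which is why the second approach in the text replaces it with an estimate before transforming the model.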

Part Three Section One: Spatial Correlation


Spatial correlation violates the assumption under CLRM that the error terms are not
correlated. If this assumption is violated, the OLS estimators are still unbiased and consistent as
well as normally distributed in large samples, but they are no longer efficient and estimated
standard errors may prove to be unreliable.
Part Three Section Two: Spatial Correlation Diagnostic Procedure
The estimates for the Bayesian spatial error model (SEM) are displayed in Table 3.1. Lambda
represents the spatial dependence parameter, which has a value of 0.046438 and a z-probability of
0.724068. This lack of significance indicates that this sample of 200 homes does not exhibit
spatial dependence in the disturbances.
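For reference, a common way of writing a heteroscedastic spatial error model is sketched below. The assignment does not print the specification or the spatial weight matrix, so the symbols W, V and v_i are assumptions chosen to match the lambda and vi quantities reported in the tables.

```latex
\begin{aligned}
y &= X\beta + u, \\
u &= \lambda W u + \varepsilon, \\
\varepsilon &\sim N(0, \sigma^2 V), \qquad V = \operatorname{diag}(v_1, \dots, v_n).
\end{aligned}
```

Under this form, lambda = 0 removes the spatial dependence in the disturbances, and v_i estimates far from 1 flag observations that behave like outliers or have non-constant variance. In Bayesian versions of the model, the reported r-value typically governs how far the v_i may depart from 1, with large values approximating the homoscedastic case.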
In Table 3.2, the estimates for the Robust Spatial Error Model are used to further diagnose
possible heteroscedasticity and outliers in addition to the spatial error diagnostics of the SEM.
The Robust SEM exhibits a lambda value of 0.023972 with an asymptotic t-statistic of 0.188235 and a
z-probability of 0.850692, which is not significant and is consistent with our previous conclusion
that spatial correlation is not present in our sample. The significance levels of the explanatory
variables remained the same as those of the OLS. The coefficients, however, are notably different
from those of the OLS for some variables, which is consistent with our conclusion that there is an
outlier problem in this sample.
Figure 3.1 provides a visual depiction of the spatial error model's vi estimates. This plot
reiterates our outlier problem in that there are erratic vi estimates for several observations.

Table 3.1 Spatial Error Model

Bayesian spatial error model
Heteroscedastic version
Dependent Variable = price
R-squared          = 0.5775
Rbar-squared       = 0.5621
mean of sige draws = 71613853.9456
r-value            = 200
************************************************************
Posterior Estimates
Variable         Coefficient Asymptot t-stat   z-probability
constant        38454.674443        6.308688        0.000000
tla                15.327073        4.961908        0.000001
lotsize             1.246072        1.906618        0.056570
rooms            -691.867766       -0.701026        0.483286
beds              638.945560        0.419780        0.674646
full baths       3105.127907        1.247771        0.212115
half baths      -3703.708169       -1.406729        0.159508
age              -397.538849       -8.185184        0.000000
lambda              0.046438        0.353028        0.724068

Table 3.2 Robust Spatial Error Model

Bayesian spatial error model
Heteroscedastic version
Dependent Variable = price
R-squared          = 0.5771
Rbar-squared       = 0.5617
mean of sige draws = 60460019.6842
r-value            = 4
************************************************************
Posterior Estimates
Variable         Coefficient Asymptot t-stat   z-probability
constant        39871.867913        6.071175        0.000000
tla                16.291588        4.753956        0.000002
lotsize             1.163703        1.721115        0.085230
rooms            -813.945209       -0.775873        0.437824
beds              551.666006        0.341985        0.732362
full baths       3015.082331        1.105854        0.268790
half baths      -4196.466392       -1.526290        0.126938
age              -412.799843       -7.894785        0.000000
lambda              0.023972        0.188235        0.850692

Figure 3.1
Vi plot for outliers and hetero
[Plot of Vi estimates (vertical axis, ticks at 1.5, 2.5 and 3.5) against unsorted observations
(horizontal axis, 20 to 200).]

Conclusion
In comparing the OLS regression with two ridge regressions, we were able to determine
that our sample does not contain a collinearity problem. The variables that were possibly
involved in collinear relationships did not change in significance after the ridge regression,
whereas a change would be expected if collinearity were present.
Furthermore, our diagnostics revealed that this sample does not contain a
heteroscedasticity problem. At first it was unclear whether we had heteroscedasticity or
outliers, but upon running the Geweke robust regression, it became apparent that our sample
contains outliers and does not have a heteroscedasticity problem. This outlier problem was
further illustrated by a residual plot and two vi estimate plots. The lack of significance in our
lambda values indicates that this sample does not contain a spatial correlation problem. Given our
conclusion that this sample does not contain problems of collinearity, heteroscedasticity or
spatial correlation, but does in fact have an outlier problem, it is most appropriate to use a
Robust OLS regression model in analyzing this data set.

Works Cited
Gujarati, Damodar. Econometrics by Example. Houndmills, Basingstoke, Hampshire: Palgrave
Macmillan, 2011.
