Coefficients of determination

Jean-Marie Dufour
McGill University
First version: March 1983
Revised: February 2002, July 2011
This version: July 2011
Compiled: November 21, 2011, 11:05
This work was supported by the William Dow Chair in Political Economy (McGill University),
Contents

1. Coefficient of determination: $R^2$
2. Relation between $R^2$ and $F$ tests
3. Coefficient of determination without centering
4. Adjusted coefficient of determination: $\bar{R}^2$
5. Notes on bibliography
6. Suggested readings
1. Coefficient of determination: $R^2$

Let $y = X\beta + \varepsilon$ be a model that satisfies the assumptions of the classical linear model, where $y$ and $\varepsilon$ are $T \times 1$ vectors, $X$ is a $T \times k$ matrix and $\beta$ is a $k \times 1$ coefficient vector. We wish to characterize the extent to which the variables included in $X$ (excluding the constant, if there is one) explain $y$.

A first method consists in computing $R^2$, the coefficient of determination, or $R = \sqrt{R^2}$, the coefficient of multiple correlation. Let
$$ \hat{y} = X\hat\beta, \qquad \hat\varepsilon = y - \hat{y}, \qquad \bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t = i'y/T, \tag{1.1} $$
$$ SST = \sum_{t=1}^{T} (y_t - \bar{y})^2, \tag{1.2} $$
$$ SSR = \sum_{t=1}^{T} (\hat{y}_t - \bar{y})^2, \tag{1.3} $$
$$ SSE = \sum_{t=1}^{T} \hat\varepsilon_t^2 = \hat\varepsilon'\hat\varepsilon, \tag{1.4} $$
where $i = (1, 1, \ldots, 1)'$ is the $T \times 1$ vector of ones.

1.1 Definition $R^2 = 1 - \hat{V}(\hat\varepsilon)/\hat{V}(y) = 1 - (SSE/SST)$, where $\hat{V}(\hat\varepsilon) = SSE/T$ and $\hat{V}(y) = SST/T$.
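As a purely illustrative check of Definition 1.1 (the simulated data, the seed, and the use of NumPy's `lstsq` are my own choices, not part of the original notes), the following sketch computes $SST$, $SSE$ and $R^2$ for an artificial regression:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 50, 3                      # T observations, k coefficients (intercept included)
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=T)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimate of beta
y_hat = X @ beta_hat                               # fitted values
eps_hat = y - y_hat                                # residuals

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum(eps_hat ** 2)
R2 = 1 - SSE / SST                                 # Definition 1.1
print(R2)
```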
1.2 Proposition $R^2 \le 1$ (since $SSE/SST \ge 0$).

1.3 Lemma $y'y = \hat{y}'\hat{y} + \hat\varepsilon'\hat\varepsilon$.

PROOF We have
$$ y = \hat{y} + \hat\varepsilon \quad \text{and} \quad \hat{y}'\hat\varepsilon = \hat\beta' X'\hat\varepsilon = 0, $$
hence
$$ y'y = (\hat{y} + \hat\varepsilon)'(\hat{y} + \hat\varepsilon) = \hat{y}'\hat{y} + \hat{y}'\hat\varepsilon + \hat\varepsilon'\hat{y} + \hat\varepsilon'\hat\varepsilon = \hat{y}'\hat{y} + \hat\varepsilon'\hat\varepsilon. \quad \text{Q.E.D.} $$

When the model contains a constant term (so that $i$ belongs to the column space of $X$), the residuals sum to zero,
$$ i'\hat\varepsilon = \sum_{t=1}^{T} \hat\varepsilon_t = 0, $$
hence the mean of the fitted values equals the mean of $y$:
$$ \frac{1}{T}\sum_{t=1}^{T} \hat{y}_t = \frac{1}{T}\, i'\hat{y} = \frac{1}{T}\, i'(y - \hat\varepsilon) = \frac{1}{T}\, i'y = \bar{y}. $$
Let
$$ A = I_T - \frac{1}{T}\, ii', \qquad Ay = y - \frac{1}{T}\, ii'y = y - i\bar{y}, $$
where $A$ is symmetric and idempotent ($A' = A$, $A'A = A$) and $A\hat\varepsilon = \hat\varepsilon$ because $i'\hat\varepsilon = 0$. Then
$$
\begin{aligned}
SST &= (y - i\bar{y})'(y - i\bar{y}) = y'A'Ay = y'Ay \\
    &= (\hat{y} + \hat\varepsilon)'A(\hat{y} + \hat\varepsilon) \\
    &= \hat{y}'A\hat{y} + \hat{y}'A\hat\varepsilon + \hat\varepsilon'A\hat{y} + \hat\varepsilon'A\hat\varepsilon \\
    &= \hat{y}'A\hat{y} + \hat\varepsilon'\hat\varepsilon \\
    &= (A\hat{y})'(A\hat{y}) + \hat\varepsilon'\hat\varepsilon = SSR + SSE,
\end{aligned}
$$
where the cross terms vanish because $\hat{y}'A\hat\varepsilon = \hat{y}'\hat\varepsilon = 0$. Consequently,
$$ 1 - \frac{\hat{V}(\hat\varepsilon)}{\hat{V}(y)} = \frac{\hat{V}(y) - \hat{V}(\hat\varepsilon)}{\hat{V}(y)} = \frac{SST - SSE}{SST} = \frac{SSR}{SST} \ge 0, $$
hence $R^2 \ge 0$ and $0 \le R^2 \le 1$.
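A minimal numerical sketch of the decomposition $SST = SSR + SSE$ (the data, seed and centering-matrix construction below are illustrative choices of mine, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 40, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=T)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat, eps_hat = X @ beta_hat, y - X @ beta_hat

i = np.ones(T)
A = np.eye(T) - np.outer(i, i) / T        # centering matrix A = I - ii'/T
SST = y @ A @ y                           # sum of (y_t - ybar)^2
SSR = (A @ y_hat) @ (A @ y_hat)           # sum of (yhat_t - ybar)^2
SSE = eps_hat @ eps_hat

print(np.isclose(SST, SSR + SSE))         # decomposition SST = SSR + SSE
print(0.0 <= 1 - SSE / SST <= 1.0)        # hence 0 <= R^2 <= 1
```

With an intercept in $X$, both printed checks should be True up to rounding error.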
We can also interpret $R^2$ as the square of the sample correlation between $y$ and $\hat{y}$. Let
$$ \hat\rho(y, \hat{y}) = \frac{\hat{C}(y, \hat{y})}{\left[\hat{V}(y)\,\hat{V}(\hat{y})\right]^{1/2}}, $$
where
$$ \hat{C}(y, \hat{y}) = \frac{1}{T}\sum_{t=1}^{T} (y_t - \bar{y})(\hat{y}_t - \bar{y}) = \frac{1}{T}(Ay)'(A\hat{y}), \qquad \hat{V}(\hat{y}) = \frac{1}{T}(A\hat{y})'(A\hat{y}). $$
Since $Ay = A\hat{y} + \hat\varepsilon$ and $\hat\varepsilon' A\hat{y} = \hat\varepsilon'\hat{y} = 0$,
$$ \hat{C}(y, \hat{y}) = \frac{1}{T}(A\hat{y} + \hat\varepsilon)'(A\hat{y}) = \frac{1}{T}(A\hat{y})'(A\hat{y}) = \hat{V}(\hat{y}), $$
hence
$$ \hat\rho(y, \hat{y})^2 = \frac{\hat{V}(\hat{y})^2}{\hat{V}(y)\,\hat{V}(\hat{y})} = \frac{\hat{V}(\hat{y})}{\hat{V}(y)} = \frac{SSR}{SST} = R^2 \ge 0. $$
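A quick numerical confirmation that $R^2$ equals the squared sample correlation between $y$ and $\hat{y}$ (again with simulated data of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 30
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(size=T)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

R2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
rho = np.corrcoef(y, y_hat)[0, 1]          # sample correlation between y and y_hat
print(np.isclose(R2, rho ** 2))            # R^2 equals the squared correlation
```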
2. Relation between $R^2$ and $F$ tests

2.1. Testing the joint significance of all the explanatory variables
$R^2$ is a descriptive statistic which measures the proportion of the variance of the dependent variable $y$ explained by the suggested explanatory variables (excluding the constant). However, $R^2$ can be related to a significance test (under the assumptions of the Gaussian classical linear model).

Consider the model
$$ y_t = \beta_1 + \beta_2 X_{t2} + \cdots + \beta_k X_{tk} + \varepsilon_t, \qquad t = 1, \ldots, T. $$
We wish to test the hypothesis that none of these variables (excluding the constant) should appear in the equation:
$$ H_0 : \beta_2 = \beta_3 = \cdots = \beta_k = 0. $$
The standard $F$ statistic for testing $H_0$ is
$$ F = \frac{(S_0 - S_1)/q}{S_1/(T - k)}, $$
where $q = k - 1$, $S_1$ is the error sum of squares from the estimation of the unconstrained model
$$ H_1 : y = X\beta + \varepsilon, $$
where $X = [i, X_2, \ldots, X_k]$, and $S_0$ is the error sum of squares from the estimation of the constrained model
$$ H_0 : y = i\beta_1 + \varepsilon, $$
where $i = (1, 1, \ldots, 1)'$. We see easily that
$$ S_1 = (y - X\hat\beta)'(y - X\hat\beta) = SSE, $$
while the least squares estimator of $\beta_1$ is
$$ \tilde\beta_1 = (i'i)^{-1} i'y = \frac{1}{T}\sum_{t=1}^{T} y_t = \bar{y} \quad (\text{under } H_0), $$
so that
$$ S_0 = (y - i\bar{y})'(y - i\bar{y}) = SST, $$
and
$$ F = \frac{(SST - SSE)/(k-1)}{SSE/(T-k)} = \frac{\left(1 - \dfrac{SSE}{SST}\right)\Big/(k-1)}{\dfrac{SSE}{SST}\Big/(T-k)} = \frac{R^2/(k-1)}{(1 - R^2)/(T-k)} \sim F(k-1,\, T-k) \quad \text{under } H_0. $$
As $R^2$ increases, $F$ increases.
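The equivalence of the two expressions for $F$ can be checked numerically. In this sketch (simulated data and parameter values are illustrative assumptions of mine), both formulas are evaluated and compared:

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 60, 4
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 0.5, 0.0, -0.4]) + rng.normal(size=T)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
SSE = np.sum((y - X @ beta_hat) ** 2)      # unrestricted model: S1
SST = np.sum((y - y.mean()) ** 2)          # model with the constant alone: S0 = SST
R2 = 1 - SSE / SST

F_from_sums = ((SST - SSE) / (k - 1)) / (SSE / (T - k))
F_from_R2 = (R2 / (k - 1)) / ((1 - R2) / (T - k))
print(np.isclose(F_from_sums, F_from_R2))  # the two expressions for F coincide
```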
2.2. Tests of linear restrictions in terms of $R^2$

More generally, consider a test of $q$ linearly independent restrictions on $\beta$, and let $S_0$ and $S_1$ denote the error sums of squares from the constrained and unconstrained models. If we set
$$ R_0^2 = 1 - \frac{S_0}{SST}, \qquad R_1^2 = 1 - \frac{S_1}{SST}, $$
then
$$ S_0 = \left(1 - R_0^2\right) SST, \qquad S_1 = \left(1 - R_1^2\right) SST, $$
so the $F$ statistic can be written
$$ F = \frac{(S_0 - S_1)/q}{S_1/(T-k)} = \frac{\left(R_1^2 - R_0^2\right)/q}{\left(1 - R_1^2\right)/(T-k)}. $$
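A numerical illustration of the $R^2$ form of the restriction test (the choice of restricting the last two coefficients to zero, and the simulated data, are assumptions of mine):

```python
import numpy as np

rng = np.random.default_rng(4)
T, k, q = 60, 4, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 0.5, 0.2, 0.0]) + rng.normal(size=T)

def sse(Z, y):
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.sum((y - Z @ b) ** 2)

SST = np.sum((y - y.mean()) ** 2)
S1 = sse(X, y)                 # unconstrained model (all k regressors)
S0 = sse(X[:, :k - q], y)      # constrained model: last q coefficients set to zero
R2_1, R2_0 = 1 - S1 / SST, 1 - S0 / SST

F_from_sums = ((S0 - S1) / q) / (S1 / (T - k))
F_from_R2 = ((R2_1 - R2_0) / q) / ((1 - R2_1) / (T - k))
print(np.isclose(F_from_sums, F_from_R2))
```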
3. Coefficient of determination without centering

Since $R^2$ can take negative values when the model does not contain a constant, $R^2$ has little meaning in this case. In such situations, we can instead use a coefficient where the values of $y_t$ are not centered around their mean.

3.1 Definition $\tilde{R}^2 = 1 - \hat\varepsilon'\hat\varepsilon / y'y$.
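A short sketch comparing the uncentered coefficient of Definition 3.1 with the usual centered one for a model without a constant (simulated data of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 40
X = rng.normal(size=(T, 2))                 # no constant in the model
y = X @ np.array([1.5, -0.7]) + rng.normal(size=T)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
eps_hat = y - X @ beta_hat

R2_uncentered = 1 - (eps_hat @ eps_hat) / (y @ y)                    # Definition 3.1
R2_centered = 1 - (eps_hat @ eps_hat) / np.sum((y - y.mean()) ** 2)  # not guaranteed to lie in [0, 1] here
print(R2_uncentered, R2_centered)
```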
4. Adjusted coefficient of determination: $\bar{R}^2$

4.1. Definition and basic properties

An unattractive property of the $R^2$ coefficient comes from the fact that $R^2$ cannot decrease when explanatory variables are added to the model, even if these have no relevance. Consequently, choosing a model so as to maximize $R^2$ can be misleading. It seems desirable to penalize models that contain too many variables.

Since
$$ R^2 = 1 - \frac{\hat{V}(\hat\varepsilon)}{\hat{V}(y)}, $$
where
$$ \hat{V}(\hat\varepsilon) = \frac{SSE}{T} = \frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_t^2, \qquad \hat{V}(y) = \frac{SST}{T} = \frac{1}{T}\sum_{t=1}^{T}(y_t - \bar{y})^2, $$
a natural modification consists in replacing $\hat{V}(\hat\varepsilon)$ and $\hat{V}(y)$ by the corresponding unbiased estimators
$$ s^2 = \frac{SSE}{T-k} = \frac{1}{T-k}\sum_{t=1}^{T}\hat\varepsilon_t^2, \qquad s_y^2 = \frac{SST}{T-1} = \frac{1}{T-1}\sum_{t=1}^{T}(y_t - \bar{y})^2. $$
This yields $\bar{R}^2$, the adjusted coefficient of determination.

4.1 Definition $\bar{R}^2 = 1 - s^2/s_y^2 = 1 - \dfrac{SSE/(T-k)}{SST/(T-1)}$.

4.2 Remark Unlike $R^2$, it is not always true that $0 \le \bar{R}^2 \le 1$: we still have $\bar{R}^2 \le 1$, but $\bar{R}^2$ can be negative (see Proposition 4.5).
4.3 Proposition
$$ \bar{R}^2 = 1 - \frac{T-1}{T-k}\left(1 - R^2\right) = R^2 - \frac{k-1}{T-k}\left(1 - R^2\right). $$

PROOF We have
$$ \bar{R}^2 = 1 - \frac{T-1}{T-k}\,\frac{SSE}{SST} = 1 - \frac{T-1}{T-k}\left(1 - R^2\right), $$
hence
$$ 1 - \bar{R}^2 = \frac{T-1}{T-k}\left(1 - R^2\right) = \frac{(T-k) + (k-1)}{T-k}\left(1 - R^2\right) = \left(1 + \frac{k-1}{T-k}\right)\left(1 - R^2\right) $$
and
$$ \bar{R}^2 = 1 - \left(1 - R^2\right) - \frac{k-1}{T-k}\left(1 - R^2\right) = R^2 - \frac{k-1}{T-k}\left(1 - R^2\right). \quad \text{Q.E.D.} $$
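The definition of $\bar{R}^2$ and the two identities of Proposition 4.3 can be verified numerically; the sketch below is illustrative only (data and seed are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
T, k = 50, 4
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 0.3, -0.2, 0.0]) + rng.normal(size=T)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
SSE = np.sum((y - X @ beta_hat) ** 2)
SST = np.sum((y - y.mean()) ** 2)
R2 = 1 - SSE / SST

s2 = SSE / (T - k)                  # unbiased estimator of the error variance
s2_y = SST / (T - 1)                # unbiased estimator of the variance of y
R2_bar_def = 1 - s2 / s2_y          # adjusted coefficient, by definition
R2_bar_id1 = 1 - (T - 1) / (T - k) * (1 - R2)     # Proposition 4.3, first form
R2_bar_id2 = R2 - (k - 1) / (T - k) * (1 - R2)    # Proposition 4.3, second form
print(np.isclose(R2_bar_def, R2_bar_id1), np.isclose(R2_bar_def, R2_bar_id2))
```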
4.4 Proposition $\bar{R}^2 \le R^2 \le 1$, with $\bar{R}^2 = R^2$ if and only if ($k = 1$ or $R^2 = 1$).

PROOF The result follows from Proposition 4.3 and the fact that $1 - R^2 \ge 0$. Q.E.D.

4.5 Proposition $\bar{R}^2 \ge 0$ if and only if $R^2 \ge \dfrac{k-1}{T-1}$.
4.6 Remark When several models are compared on the basis of $R^2$ or $\bar{R}^2$, it is important to have the same dependent variable. When the dependent variable ($y$) is the same, maximizing $\bar{R}^2$ is equivalent to minimizing the standard error of the regression
$$ s = \left[\frac{1}{T-k}\sum_{t=1}^{T}\hat\varepsilon_t^2\right]^{1/2}. $$

4.2. Effect of adding one explanatory variable

Consider the two nested models
$$ y_t = \beta_1 + \beta_2 X_{t2} + \cdots + \beta_{k-1} X_{t,k-1} + \varepsilon_t, \qquad t = 1, \ldots, T, \tag{4.1} $$
$$ y_t = \beta_1 + \beta_2 X_{t2} + \cdots + \beta_k X_{tk} + \varepsilon_t, \qquad t = 1, \ldots, T, \tag{4.2} $$
which differ only by the inclusion of the variable $X_k$. We can then show that the value of $\bar{R}^2$ associated with the restricted model (4.1) is larger than the one of model (4.2) if the $t$ statistic for testing $\beta_k = 0$ is smaller than 1 (in absolute value).
4.7 Proposition If $\bar{R}^2_{k-1}$ and $\bar{R}^2_k$ are the values of $\bar{R}^2$ for models (4.1) and (4.2), then
$$ \bar{R}^2_k - \bar{R}^2_{k-1} = \frac{\left(1 - \bar{R}^2_k\right)\left(t_k^2 - 1\right)}{T - k + 1} \tag{4.3} $$
and
$$ \bar{R}^2_k \gtreqless \bar{R}^2_{k-1} \iff |t_k| \gtreqless 1. $$
PROOF By definition,
$$ \bar{R}^2_k = 1 - \frac{s_k^2}{s_y^2} \quad \text{and} \quad \bar{R}^2_{k-1} = 1 - \frac{s_{k-1}^2}{s_y^2}, $$
where $s_k^2 = SS_k/(T-k)$ and $s_{k-1}^2 = SS_{k-1}/(T-k+1)$. $SS_k$ and $SS_{k-1}$ are the sums of squared errors for the models with $k$ and $k-1$ explanatory variables. Since $t_k^2$ is the Fisher statistic for testing $\beta_k = 0$, we have
$$ t_k^2 = \frac{SS_{k-1} - SS_k}{SS_k/(T-k)} = \frac{(T-k+1)\, s_{k-1}^2 - (T-k)\, s_k^2}{s_k^2} = \frac{(T-k+1)\left(1 - \bar{R}^2_{k-1}\right) - (T-k)\left(1 - \bar{R}^2_k\right)}{1 - \bar{R}^2_k}, $$
for $s_{k-1}^2 = s_y^2\left(1 - \bar{R}^2_{k-1}\right)$ and $s_k^2 = s_y^2\left(1 - \bar{R}^2_k\right)$. Consequently,
$$ 1 - \bar{R}^2_{k-1} = \left(1 - \bar{R}^2_k\right)\frac{t_k^2 + (T-k)}{T-k+1} $$
and
$$ \bar{R}^2_k - \bar{R}^2_{k-1} = \left(1 - \bar{R}^2_{k-1}\right) - \left(1 - \bar{R}^2_k\right) = \left(1 - \bar{R}^2_k\right)\left[\frac{t_k^2 + (T-k)}{T-k+1} - 1\right] = \left(1 - \bar{R}^2_k\right)\frac{t_k^2 - 1}{T-k+1}. \quad \text{Q.E.D.} $$
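The identity (4.3) and the $|t_k| \gtreqless 1$ criterion can be illustrated as follows; the simulated data (with a deliberately weak last regressor) and the direct computation of $t_k$ from $s^2 (X'X)^{-1}$ are assumptions of mine, not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(7)
T, k = 50, 4
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 0.5, -0.3, 0.05]) + rng.normal(size=T)   # last coefficient nearly irrelevant

def fit(Z, y):
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return b, np.sum((y - Z @ b) ** 2)

SST = np.sum((y - y.mean()) ** 2)
_, SS_km1 = fit(X[:, :k - 1], y)             # model (4.1): k-1 regressors
b, SS_k = fit(X, y)                          # model (4.2): k regressors

R2bar_km1 = 1 - (SS_km1 / (T - k + 1)) / (SST / (T - 1))
R2bar_k = 1 - (SS_k / (T - k)) / (SST / (T - 1))

s2 = SS_k / (T - k)
se_k = np.sqrt(s2 * np.linalg.inv(X.T @ X)[k - 1, k - 1])
t_k = b[k - 1] / se_k                        # t statistic for beta_k = 0

lhs = R2bar_k - R2bar_km1
rhs = (1 - R2bar_k) * (t_k ** 2 - 1) / (T - k + 1)   # identity (4.3)
print(np.isclose(lhs, rhs), (R2bar_k > R2bar_km1) == (abs(t_k) > 1))
```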
4.3. Effect of imposing linear restrictions

More generally, we may ask whether imposing a set of $q$ linear restrictions $H_0 : C\beta = r$ will raise or decrease $\bar{R}^2$, where $C : q \times k$, $r : q \times 1$ and $\mathrm{rank}(C) = q$. Let $\bar{R}^2_{H_0}$ and $\bar{R}^2$ be the values of $\bar{R}^2$ for the constrained (by $H_0$) and unconstrained models; similarly, $s_0^2$ and $s^2$ are the values of the corresponding unbiased estimators of the error variance.
4.8 Proposition Let $F$ be the Fisher statistic for testing $H_0$. Then
$$ s_0^2 - s^2 = \frac{q\, s^2}{T-k+q}\,(F - 1) $$
and
$$ s_0^2 \gtreqless s^2 \iff F \gtreqless 1. $$

PROOF If $SS_0$ and $SS$ are the sums of squared errors for the constrained and unconstrained models, we have
$$ s_0^2 = \frac{SS_0}{T-k+q} \quad \text{and} \quad s^2 = \frac{SS}{T-k}, $$
so that
$$ F = \frac{(SS_0 - SS)/q}{SS/(T-k)} = \frac{(T-k+q)\, s_0^2 - (T-k)\, s^2}{q\, s^2} = \frac{T-k+q}{q}\,\frac{s_0^2}{s^2} - \frac{T-k}{q}, $$
hence
$$ s_0^2 = s^2\,\frac{qF + (T-k)}{(T-k) + q}, \qquad s_0^2 - s^2 = s^2\,\frac{q(F-1)}{(T-k) + q}, $$
and
$$ s_0^2 \gtreqless s^2 \iff F \gtreqless 1. \quad \text{Q.E.D.} $$
4.9 Proposition $\displaystyle \bar{R}^2 - \bar{R}^2_{H_0} = \frac{q\left(1 - \bar{R}^2\right)}{T-k+q}\,(F - 1)$ and $\bar{R}^2_{H_0} \lesseqgtr \bar{R}^2 \iff F \gtreqless 1$.

PROOF By definition,
$$ \bar{R}^2_{H_0} = 1 - \frac{s_0^2}{s_y^2} \quad \text{and} \quad \bar{R}^2 = 1 - \frac{s^2}{s_y^2}. $$
Thus, using Proposition 4.8,
$$ \bar{R}^2 - \bar{R}^2_{H_0} = \frac{s_0^2 - s^2}{s_y^2} = \frac{q}{T-k+q}\,\frac{s^2}{s_y^2}\,(F-1) = \frac{q\left(1 - \bar{R}^2\right)}{T-k+q}\,(F-1), $$
hence
$$ \bar{R}^2_{H_0} \lesseqgtr \bar{R}^2 \iff F \gtreqless 1. \quad \text{Q.E.D.} $$
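A joint numerical check of Propositions 4.8 and 4.9, imposing the (illustrative) restriction that the last $q$ coefficients are zero; the data, seed and restriction are my own choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(8)
T, k, q = 60, 5, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 0.4, -0.3, 0.05, 0.0]) + rng.normal(size=T)

def sse(Z, y):
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.sum((y - Z @ b) ** 2)

SST = np.sum((y - y.mean()) ** 2)
SS = sse(X, y)                       # unconstrained model
SS0 = sse(X[:, :k - q], y)           # constrained model: last q coefficients set to zero
s2, s2_0 = SS / (T - k), SS0 / (T - k + q)
s2_y = SST / (T - 1)
R2bar, R2bar_0 = 1 - s2 / s2_y, 1 - s2_0 / s2_y

F = ((SS0 - SS) / q) / (SS / (T - k))
print(np.isclose(s2_0 - s2, q * s2 * (F - 1) / (T - k + q)))                   # Proposition 4.8
print(np.isclose(R2bar - R2bar_0, q * (1 - R2bar) * (F - 1) / (T - k + q)))    # Proposition 4.9
print((s2_0 > s2) == (F > 1), (R2bar_0 < R2bar) == (F > 1))
```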
5. Notes on bibliography

The notion of $\bar{R}^2$ was proposed by Theil (1961, p. 213). Several authors have presented detailed discussions of the different concepts of multiple correlation: for example, Theil (1971, Chap. 4), Schmidt (1976) and Maddala (1977, Sections 8.1, 8.2, 8.3, 8.9). The $\bar{R}^2$ concept is criticized by Pesaran (1974). The mean and bias of $R^2$ were studied by Cramer (1987) in the Gaussian case, and by Srivastava, Srivastava and Ullah (1995) in some non-Gaussian cases.
6. Suggested readings
5. Maddala (1977, Sections 8.1, 8.2, 8.3, 8.9): discussion of $R^2$ and $\bar{R}^2$ along with their relation with hypothesis tests.
6. Hendry and Marshall (1983)
7. Cramer (1987)
8. Ohtani and Hasegawa (1993)
References
Cramer, J. S. (1987), Mean and variance of $R^2$ in small and moderate samples, Journal of Econometrics 35, 253–266.

Hendry, D. F. and Marshall, R. C. (1983), On high and low $R^2$ contributions, Oxford Bulletin of Economics and Statistics 45, 313–316.

Maddala, G. S. (1977), Econometrics, McGraw-Hill, New York.

Ohtani, K. and Hasegawa, H. (1993), On small-sample properties of $R^2$ in a linear regression model with multivariate t errors and proxy variables, Econometric Theory 9, 504–515.

Pesaran, M. H. (1974), On the general problem of model selection, Review of Economic Studies 41, 153–171.

Schmidt, P. (1976), Econometrics, Marcel Dekker, New York.

Srivastava, A. K., Srivastava, V. K. and Ullah, A. (1995), The coefficient of determination and its adjusted version in linear regression models, Econometric Reviews 14, 229–240.

Theil, H. (1961), Economic Forecasts and Policy, 2nd Edition, North-Holland, Amsterdam.

Theil, H. (1971), Principles of Econometrics, John Wiley & Sons, New York.