
Wayne-Roy Gayle
ECON 472: Introductory Econometrics
Department of Economics, University of Virginia

1 Geometry of the Least Squares Estimator


1.1 Introduction
We start with the model given by

    y_i = x_{i1}β_1 + ··· + x_{ik}β_k + u_i,    i = 1, ..., n,

or, equivalently,

    y_i = x_i'β + u_i,    i = 1, ..., n,    (1.1)

with x_i = (x_{i1}, ..., x_{ik})' and β = (β_1, ..., β_k)'. The objective is to recover β. The statistical properties of u_i are crucial for achieving this objective in the linear regression model, but we shall ignore them for the moment. By definition, u_i represents the discrepancy between y_i and x_i'β. The obvious question is what value of β minimizes this discrepancy, given observations of y_i and x_i. Projection theory gives us the answer to this question. To investigate this, it is convenient to write equation (1.1) in the matrix form

    Y = Xβ + u,

where Y = [y_1, ..., y_n]', u = [u_1, ..., u_n]', and

    X = [x_1, ..., x_k] = [ x_{11}  ···  x_{1k} ]
                          [   ⋮             ⋮  ]
                          [ x_{n1}  ···  x_{nk} ].

1.2 Projections
Definition 1.1. (Subspace.) The subspace of the Euclidean space E^n associated with the k basis vectors x_1, ..., x_k, denoted by S(x_1, ..., x_k) or S(X), consists of every vector in E^n that can be formed as a linear combination of the columns of X, that is,

    S(X) = { z ∈ E^n | z = Σ_{i=1}^{k} b_i x_i,  b_i ∈ ℝ }.

Definition 1.2. (Orthogonal Complement.) The orthogonal complement of S(X) in E^n, denoted by S⊥(X), is the set of all vectors w in E^n that are orthogonal to everything in S(X), that is,

    S⊥(X) = { w ∈ E^n | w'z = 0 for all z ∈ S(X) }.

If the dimension of S(X) is k, then the dimension of S⊥(X) is n − k.

Proposition 1.3. If S(X) is a subspace of the Euclidean space E^n, then every vector y ∈ E^n has a unique decomposition y = y_1 + y_2, where y_1 ∈ S(X) and y_2 ∈ S⊥(X).

Definition 1.4. (Orthogonal Projection.) If S(X) is a subspace of the Euclidean space E^n, and y = y_1 + y_2 where y_1 ∈ S(X) and y_2 ∈ S⊥(X), then y_1 is called the orthogonal projection of y onto S(X), denoted by P_X(y). By the above proposition, P_X(y) is unique.

Theorem 1.5. (Projection Theorem.) Suppose S(X) is a subspace of E^n. Then for every y ∈ E^n there is a unique vector P_X(y) ∈ S(X) satisfying

    ||y − P_X(y)|| = min_{y* ∈ S(X)} ||y − y*|| = min_{b ∈ ℝ^k} [(y − Xb)'(y − Xb)]^{1/2} = [(y − Xb̂)'(y − Xb̂)]^{1/2}.

Moreover, the projection operator P_X is determined uniquely by

    (y − P_X(y))'P_X(y) = M_X(y)'P_X(y) = 0,

for M_X = I_n − P_X.

Lemma 1.6. S(X) = S(P_X) and S⊥(X) = S(M_X).

Proposition 1.7. (Properties.)
A. If S(X) is a subspace of E^n, then the projection operator P_X satisfies
   i. P_X is linear: P_X(ay + bz) = aP_X(y) + bP_X(z) for scalars a and b.
   ii. P_X is idempotent.
   iii. P_X is symmetric.
B. If an operator satisfies (i), (ii), and (iii), then it is an orthogonal projection operator on S(X).

Theorem 1.8. (Iterated Projection.) Let X = [X_1, X_2]. Then

    P_{X_1} P_X = P_X P_{X_1} = P_{X_1}.
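To make Theorem 1.8 concrete, here is a minimal numpy sketch; the simulated regressor blocks X_1 and X_2 are illustrative assumptions, not part of the notes. It builds the two projection matrices and checks that P_{X_1}P_X = P_X P_{X_1} = P_{X_1}.

    import numpy as np

    rng = np.random.default_rng(1)
    n, k1, k2 = 50, 2, 3

    # Simulated regressor blocks (illustrative data only).
    X1 = rng.normal(size=(n, k1))
    X2 = rng.normal(size=(n, k2))
    X = np.hstack([X1, X2])

    def proj(A):
        """Orthogonal projection matrix onto the column span of A."""
        return A @ np.linalg.solve(A.T @ A, A.T)

    P, P1 = proj(X), proj(X1)

    # Iterated projection: projecting onto S(X1) before or after projecting
    # onto the larger subspace S(X) is the same as projecting onto S(X1).
    print(np.allclose(P1 @ P, P1))
    print(np.allclose(P @ P1, P1))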

What does the orthogonal projection operator P_X in Theorem 1.5 look like? The criterion (y − Xb)'(y − Xb) can be minimized over b by taking the derivative with respect to b and setting it to zero, to obtain

    −2X'(y − Xb̂) = 0   ⟺   X'y = X'Xb̂.

So we have that

    b̂ = (X'X)^{-1}X'y.

Therefore the projection of y onto S(X) is given by

    P_X(y) = Xb̂ = X(X'X)^{-1}X'y,

with the projection operator given by P_X = X(X'X)^{-1}X'.
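These formulas translate directly into code. The following numpy sketch, using simulated data and an arbitrarily chosen coefficient vector (both assumptions for illustration only), computes b̂ from the normal equations, forms P_X = X(X'X)^{-1}X', and verifies the idempotency and symmetry claimed in Proposition 1.7 as well as the equality P_X y = Xb̂.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 100, 3

    # Simulated design matrix (first column is a constant) and outcome.
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    beta_true = np.array([1.0, 2.0, -0.5])          # illustrative values only
    y = X @ beta_true + rng.normal(size=n)

    # b_hat solves the normal equations X'X b = X'y.
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Projection matrix P_X = X (X'X)^{-1} X' and its complement M_X = I - P_X.
    P = X @ np.linalg.solve(X.T @ X, X.T)
    M = np.eye(n) - P

    print(np.allclose(P @ P, P))            # idempotent: P_X P_X = P_X
    print(np.allclose(P, P.T))              # symmetric:  P_X' = P_X
    print(np.allclose(P @ y, X @ b_hat))    # P_X y equals the fitted values X b_hat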

Two projection operators of special interest are the complementary projection operators P_ι and M_ι, obtained by setting X = ι = (1, ..., 1)', so that

    P_ι = ι(ι'ι)^{-1}ι',

and M_ι = I_n − P_ι. Then for any y ∈ E^n,

    P_ι(y) = ι(ι'ι)^{-1}ι'y = (1/n)J_n y = ( (1/n)Σ_{i=1}^{n} y_i, ..., (1/n)Σ_{i=1}^{n} y_i )' = ȳι,

where J_n = ιι' is the n × n matrix of ones. Hence, P_ι maps any vector into the constant vector of its average. Similarly, M_ι demeans the vector: M_ι(y) = (I_n − P_ι)y = y − ȳι.
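A quick numpy check of the averaging and demeaning operators, with an arbitrary simulated vector y (an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 6
    y = rng.normal(size=n)                            # any vector in E^n

    iota = np.ones(n)                                 # iota = (1, ..., 1)'
    P_iota = np.outer(iota, iota) / (iota @ iota)     # iota (iota'iota)^{-1} iota' = J_n / n
    M_iota = np.eye(n) - P_iota

    print(np.allclose(P_iota @ y, np.full(n, y.mean())))  # constant vector of the sample mean
    print(np.allclose(M_iota @ y, y - y.mean()))          # demeaned vector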

Lemma 1.9. (Annihilation.) P_X M_X = 0.

2 Fitted Values, Residuals, and a Measure of Fit.


Recall our model of interest, y = Xβ + u. From above we have that

    y = P_X y + M_X y = ŷ + û.

ŷ = P_X y = Xβ̂ is called the vector of fitted values, and û = M_X y is the vector of residuals. By definition of the projection, we have that

1. û = (I − P_X)u.
2. X'û = 0.
3. û'û ≤ u'u.
4. y'y = ŷ'ŷ + û'û.

Property 2 implies that when there is a constant among the regressors, we have

    Σ_{i=1}^{n} û_i = 0.

This also implies that, when there is a constant as a regressor, Σ_{i=1}^{n} y_i = Σ_{i=1}^{n} ŷ_i, since 0 = X'û = X'(y − ŷ). Property 3 comes from the fact that β̂ minimizes the length of the residual vector. Therefore this length must be smaller than the length of any other residual vector, including the residual associated with the true regression coefficient vector β.
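The following numpy sketch, again with a simulated design and illustrative coefficients (assumptions made only for this example), verifies properties 2 and 4 and the implications of including a constant among the regressors.

    import numpy as np

    rng = np.random.default_rng(3)
    n, k = 80, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # constant included
    y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)          # illustrative DGP

    P = X @ np.linalg.solve(X.T @ X, X.T)
    y_hat = P @ y                      # fitted values, P_X y
    u_hat = y - y_hat                  # residuals, M_X y

    print(np.allclose(X.T @ u_hat, 0))                       # property 2: X'u_hat = 0
    print(np.isclose(u_hat.sum(), 0))                        # residuals sum to 0 (constant included)
    print(np.isclose(y @ y, y_hat @ y_hat + u_hat @ u_hat))  # property 4: y'y = y_hat'y_hat + u_hat'u_hat
    print(np.isclose(y.mean(), y_hat.mean()))                # means of y and y_hat coincide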

2.1 Linear Transformations of Regressors


Consider the two linear regressions y = Xβ + u and y = Wγ + u, where W = XA for some nonsingular k × k matrix A. We can write

    y = Wγ + u = XAγ + u,

which shows that Aγ = β, or γ = A^{-1}β.
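A short numpy sketch of this invariance, with simulated data and a random (almost surely nonsingular) matrix A, both illustrative assumptions: the fitted values are unchanged and the estimates satisfy γ̂ = A^{-1}β̂.

    import numpy as np

    rng = np.random.default_rng(4)
    n, k = 60, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)     # illustrative DGP

    A = rng.normal(size=(k, k))        # random k x k matrix, nonsingular with probability 1
    W = X @ A                          # reparameterized regressors

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    gamma_hat = np.linalg.solve(W.T @ W, W.T @ y)

    print(np.allclose(gamma_hat, np.linalg.solve(A, beta_hat)))  # gamma_hat = A^{-1} beta_hat
    print(np.allclose(X @ beta_hat, W @ gamma_hat))              # identical fitted values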

2.2 The Frisch-Waugh-Lovell Theorem


Consider the model y = Xβ + u, where X is a T × k matrix.

Theorem 2.1. Let X = [X_1, X_2], where X_1 is a full-rank T × k_1 matrix and X_2 is a T × k_2 matrix. Let β̂_1 be the OLS estimator of β_1 obtained from

    y = X_1 β_1 + X_2 β_2 + error,

and let β̃_1 be the OLS estimate of β_1 obtained from the regression

    ỹ = X̃_1 β_1 + error,

with ỹ = (I − P_2)y, X̃_1 = (I − P_2)X_1, and P_2 = X_2(X_2'X_2)^{-1}X_2'. Then we have that β̃_1 = β̂_1, and the residuals from both regressions are identical.
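The FWL theorem is easy to verify numerically. The numpy sketch below uses simulated data with arbitrary illustrative coefficients (assumptions for this example only); it compares the coefficients on X_1 from the full regression with those from the partialled-out regression, and checks that the residuals coincide.

    import numpy as np

    rng = np.random.default_rng(5)
    T, k1, k2 = 100, 2, 2
    X1 = rng.normal(size=(T, k1))
    X2 = np.column_stack([np.ones(T), rng.normal(size=(T, k2 - 1))])
    y = X1 @ np.array([1.5, -0.7]) + X2 @ np.array([0.3, 2.0]) + rng.normal(size=T)

    def ols(Z, y):
        """OLS coefficients from the normal equations Z'Z b = Z'y."""
        return np.linalg.solve(Z.T @ Z, Z.T @ y)

    # Full regression of y on [X1, X2]; keep the coefficients on X1.
    X = np.hstack([X1, X2])
    beta_full = ols(X, y)
    beta1_full = beta_full[:k1]

    # Partialled-out regression: premultiply y and X1 by M2 = I - P2.
    P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)
    M2 = np.eye(T) - P2
    beta1_fwl = ols(M2 @ X1, M2 @ y)

    print(np.allclose(beta1_full, beta1_fwl))        # same coefficients on X1
    resid_full = y - X @ beta_full
    resid_fwl = M2 @ y - (M2 @ X1) @ beta1_fwl
    print(np.allclose(resid_full, resid_fwl))        # identical residuals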

2.3 The Coefficient of Determination (R²)


Let us assume that the regressor matrix X includes a constant. Recall the result that y'y = ŷ'ŷ + û'û, or equivalently,

    ||y||² = ||ŷ||² + ||û||².

||y||² is called the total sum of squares (TSS), ||ŷ||² is called the explained sum of squares (ESS), and ||û||² is called the residual sum of squares (RSS). The R² is defined by

    R² = ESS/TSS = 1 − RSS/TSS.

The R² measures the goodness of fit of a regression, that is, the proportion of the total variation in y that is explained by the regression ŷ = Xβ̂ = P_X y. By construction, 0 ≤ R² ≤ 1. R² = 1 when y and ŷ coincide, which is called a perfect fit. R² = 0 when y and û = M_X y coincide, which means that X has no power in explaining the variation in y. Some important properties of the R² are as follows:
1. The R² is invariant under nonsingular linear transformations of the regressors.
2. The R² is invariant to changes in the scale of y.
3. The R² is invariant to changes in the location of y.

Given that X includes a constant, the FWL theorem implies that an equivalent representation of the R² is given by

    R² = ŷ'M_ι ŷ / y'M_ι y = Σ_{i=1}^{n} (ŷ_i − ȳ)² / Σ_{i=1}^{n} (y_i − ȳ)².

Notice that this result relies critically on a constant being included in X. If a constant is not included, then the resulting regression is called a regression through the origin, and in that case the R², as calculated above, can be negative.

One unappealing feature of the R² is that it increases monotonically with the number of included regressors. Thus, one can inflate the perceived goodness of fit by adding irrelevant regressors. The reason for this is that the R² is normalized by the wrong degrees of freedom:

    R² = [Σ_{i=1}^{n} (ŷ_i − ȳ)²/n] / [Σ_{i=1}^{n} (y_i − ȳ)²/n].

This motivates the definition of an alternative measure of the goodness of fit, called the adjusted R², or R̄², which is given by

    R̄² = 1 − [y'M_X y/(n − k − 1)] / [y'M_ι y/(n − 1)].
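A numpy sketch of the R² and adjusted R² computations, with simulated data and illustrative coefficients (assumptions for this example only), using the degrees-of-freedom convention (n − k − 1, n − 1) from the formula above, where k counts the non-constant regressors:

    import numpy as np

    rng = np.random.default_rng(6)
    n, k = 100, 3                      # k regressors excluding the constant
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = X @ np.array([1.0, 0.8, -0.4, 0.2]) + rng.normal(size=n)    # illustrative DGP

    P = X @ np.linalg.solve(X.T @ X, X.T)
    y_hat, u_hat = P @ y, y - P @ y

    tss = np.sum((y - y.mean()) ** 2)          # centered TSS (constant included)
    ess = np.sum((y_hat - y.mean()) ** 2)      # centered ESS
    rss = np.sum(u_hat ** 2)                   # RSS

    r2 = 1 - rss / tss
    print(np.isclose(r2, ess / tss))           # the two representations agree

    # Adjusted R^2 with the notes' degrees-of-freedom convention.
    r2_adj = 1 - (rss / (n - k - 1)) / (tss / (n - 1))
    print(r2, r2_adj)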
