The linear regression model is the single most useful tool in the econometrician’s kit.
The multiple regression model is the study of the relationship between a dependent
variable and one or more independent variables. In general, it can be written as:
y = f(x1, x2, . . . , xK) + ε = x1β1 + x2β2 + · · · + xKβK + ε. (2.1)
2.2 Assumptions
The classical linear regression model consists of a set of assumptions about how a data set
is produced by the underlying 'data-generating process.' The assumptions, illustrated by the short simulation sketch after the list, are:
A1. Linearity
A2. Full rank
A3. Exogeneity of the independent variables
A4. Homoscedasticity and nonautocorrelation
A5. Data generation
A6. Normal distribution
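As a concrete illustration, the following is a minimal, self-contained simulation sketch of such a data-generating process (not from the text; all names and numbers are illustrative, and numpy is assumed):

import numpy as np

rng = np.random.default_rng(42)
n, K = 100, 3                        # illustrative sample size and number of regressors
beta = np.array([1.0, 0.5, -2.0])    # illustrative population parameters
X = np.column_stack([np.ones(n),     # constant term; random columns are full rank (A2)
                     rng.normal(size=(n, K - 1))])
eps = rng.normal(0.0, 1.0, size=n)   # A3, A4, A6: iid N(0, sigma^2) disturbances
y = X @ beta + eps                   # A1: y is linear in the parameters

Later sketches in this chapter reuse X, y and n as defined here.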
2.2.1 Linearity
ASSUMPTION 1: y = Xβ + ε. (2.2)
Notice that this assumption means that Equation 2.2 can hold either in terms of the
original variables or after some transformation of them. For example, consider the following
two equations:
y = Ax^β e^ε
y = Ax^β + ε.
While the first equation can be made linear by taking logs, which gives ln y = ln A + β ln x + ε, the second equation cannot be linearized. A typical example is the constant elasticity model:
ln y = β1 + β2 ln x2 + β3 x3 + · · · + βK xK + ε
The full rank or identification condition means that there are no exact linear relationships among the variables.
Exogeneity of the independent variables means that the disturbance term is assumed to have zero expected value conditional on the regressors, E[ε|X] = 0. This implies:
E[y|X] = Xβ. (2.5)
The last assumption, which is convenient but not necessary to obtain many of the
results of the linear regression model, is that the disturbances follow a normal distribution
with zero mean and constant variance. That is, it adds normality to Assumptions
3 and 4.
The first distinction needed at this point is between population parameters and sample
estimates. From the previous discussion we have β and εi as population quantities,
and we use b and ei as their sample estimates. For the population regression
we have E[yi|xi] = x′iβ; however, β is unknown, so we use its estimate b. Therefore
we have:
E[yi |xi ] = ŷi = x′i b. (2.9)
For observation i, the (population) disturbance term is given by:
εi = yi − x′i β . (2.10)
Once we estimate b, the estimate of the disturbance term εi is its sample counterpart, the residual:1
ei = yi − x′i b. (2.11)
It follows that
yi = x′i β + εi = x′i b + ei . (2.12)
A graphical summary of this discussion is presented in Figure 2.2. This figure shows
the simple example of a single regressor.
The idea is to pick the vector b0 that makes the sum of squared residuals in Equation 2.13 as small as possible. In matrix notation, the objective function is:
S(b0) = (y − Xb0)′(y − Xb0).
1 [Dougherty (2007)] follows a similar notation, but most textbooks, e.g. [Wooldridge (2009)], use
β̂ as the sample estimate of β .
Differentiating S(b0) and setting the derivative equal to zero gives the first order condition:
∂S(b0)/∂b0 = −2X′y + 2X′Xb0 = 0. (2.15)
Let b be the solution. Then, given that X has full rank, (X′X)−1 exists and the solution
is:
b = (X′X)−1X′y. (2.16)
The second order condition is:
∂²S(b0)/∂b0∂b′0 = 2X′X. (2.17)
The condition is satisfied if this is a positive definite matrix, which is the case whenever X has full
rank. The least squares solution b is then unique and minimizes the sum of squared
residuals.
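As a minimal computational sketch of Equation 2.16 (continuing the simulation above, with X and y as defined there), solving the normal equations directly avoids forming the inverse explicitly:

import numpy as np

b = np.linalg.solve(X.T @ X, X.T @ y)            # b = (X'X)^(-1) X'y via the normal equations
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # same solution from a least squares routine
assert np.allclose(b, b_lstsq)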
Example 1 Derivation of the least squares coefficient estimators for the simple
case of a single regressor and a constant.
yi = b0 + b1 xi + ei (2.18)
ŷi = b0 + b1 xi
For observation i we obtain the residual, then square it and finally sum across
all observations to obtain the sum of squared residuals:
ei = yi − ŷi (2.19)
ei² = (yi − ŷi)²
∑ⁿᵢ₌₁ ei² = ∑ⁿᵢ₌₁ (yi − ŷi)²
Again, the coefficients b0 and b1 are chosen to minimize the sum of squared
residuals. Setting the derivative of this sum with respect to b0 equal to zero yields the first order condition ∑ⁿᵢ₌₁ (yi − b0 − b1 xi) = 0 (Equation 2.22). Dividing Equation 2.22 by n and working through some algebra we obtain the
OLS estimator for the constant:
b0 = ȳ − b1 x̄. (2.23)
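As a worked check with made-up numbers, the sketch below also uses the slope estimator b1 = ∑(xi − x̄)(yi − ȳ)/∑(xi − x̄)², which follows from the other first order condition that this extract does not show:

import numpy as np

xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # illustrative data
ys = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.sum((xs - xs.mean()) * (ys - ys.mean())) / np.sum((xs - xs.mean()) ** 2)
b0 = ys.mean() - b1 * xs.mean()              # Equation 2.23
print(b0, b1)                                # about 0.14 and 1.96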
From the first order conditions in Equation 2.15 we can obtain the normal equations, X′Xb = X′y, which imply that X′e = X′(y − Xb) = 0. The vector of least squares residuals is:
e = y − Xb (2.27)
  = y − X(X′X)−1X′y
  = (I − X(X′X)−1X′)y
  = My.
Similarly, the vector of fitted values is:
ŷ = y − e (2.30)
  = y − My
  = (I − M)y
  = X(X′X)−1X′y
  = Py.
Combining these two results, y is decomposed into a fitted part and a residual part:
y = Py + My. (2.31)
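A minimal sketch, continuing the simulation above, verifies the properties behind this decomposition (P and M are idempotent, mutually orthogonal, and split y into fitted values and residuals):

import numpy as np

nobs = X.shape[0]
P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection ("hat") matrix
M = np.eye(nobs) - P                   # residual-maker matrix
assert np.allclose(P @ P, P)           # P is idempotent
assert np.allclose(M @ M, M)           # M is idempotent
assert np.allclose(P @ M, 0.0)         # P and M are orthogonal
assert np.allclose(y, P @ y + M @ y)   # Equation 2.31: y = Py + My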
The variation of the dependent variable is captured in terms of deviations from its
mean, yi − ȳ. The total variation in y is then the sum of squared deviations:
SST = ∑ⁿᵢ₌₁ (yi − ȳ)². (2.32)
To decompose this sum of squared deviations into the part the regression model explains
and the part it does not explain, we first look at a single observation
to get some intuition. For observation i we have:
yi = ŷi + ei.
Subtracting ȳ from both sides we obtain:
yi − ȳ = (ŷi − ȳ) + ei.
Figure 2.3 illustrates the intuition for the case of a single regressor.
Let M0 be the symmetric and idempotent matrix with (1 − 1/n) in all its diagonal
elements and −1/n in all its off-diagonal elements:
M0 = I − (1/n) ii′.
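A one-line check of this demeaning property, continuing the simulation above (y as defined there):

import numpy as np

i_vec = np.ones((X.shape[0], 1))
M0 = np.eye(X.shape[0]) - (i_vec @ i_vec.T) / X.shape[0]   # M0 = I - (1/n) ii'
assert np.allclose(M0 @ y, y - y.mean())                   # M0 y takes deviations from the mean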
M0 y = M0 Xb + M0 e (2.36)
y′ M0 = b′ X′ M0 + e′ M0 (2.37)
Multiplying Equation 2.37 by Equation 2.36 and using the idempotency of M0 gives:
y′M0y = b′X′M0Xb + b′X′M0e + e′M0Xb + e′M0e.
The second term on the right-hand side is zero because M0e = e and X′e = 0, the third term is zero because e′M0X = e′X = 0 (the regressors are
orthogonal to the residuals), and the last term reduces to e′e. Equation 2.38 shows the resulting decomposition of the total sum
of squares into the regression sum of squares plus the error sum of squares:
y′M0y = b′X′M0Xb + e′e. (2.38)
The fraction of the total variation in y that is explained by the model is
the coefficient of determination, R²:
R² = SSR/SST = (b′X′M0Xb)/(y′M0y) = 1 − (e′e)/(y′M0y). (2.40)
As we include more variables in the model, the R² never decreases. Hence, for small
samples, a better measure of fit is the adjusted R², or R̄²:
R̄² = 1 − [e′e/(n − K)] / [y′M0y/(n − 1)]. (2.41)
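A minimal sketch of Equations 2.40 and 2.41, continuing the simulation above (X, y and b from the OLS sketch, M0 from the demeaning sketch):

e = y - X @ b                          # least squares residuals
SST = y @ M0 @ y                       # total sum of squares
R2 = 1.0 - (e @ e) / SST               # Equation 2.40
nobs, K = X.shape
R2_adj = 1.0 - (e @ e / (nobs - K)) / (SST / (nobs - 1))   # Equation 2.41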
2.4.1 Unbiasedness
Substituting y = Xβ + ε into Equation 2.16 gives b = β + (X′X)−1X′ε. Taking expectations, the second term is zero because the errors are assumed to
have zero expected value conditional on the regressors (Assumption 3), so E[b|X] = β and the OLS estimator is unbiased.
It is relatively simple to derive the sampling variance of the OLS estimator. The key assumption in the derivation is that the matrix X is
nonstochastic; if X is stochastic, the expectations are taken conditional on the observed X.
From the derivation of the unbiasedness of OLS we have that b − β = (X′X)−1X′ε.
Using this in the variance-covariance matrix of the OLS estimator we have:
Var[b|X] = E[(b − β)(b − β)′|X] = (X′X)−1X′E[εε′|X]X(X′X)−1 = σ²(X′X)−1. (2.43)
To make this formula operational we need an estimate of the unknown σ². A natural candidate is:
σ̂² = (1/n) ∑ⁿᵢ₌₁ eᵢ², (2.44)
which makes sense because ei is the sample estimate of εi, and σ² is the expected
value of εi². However, this estimator is biased because the residuals ei are computed from the estimate b rather than from the unobserved β. To
obtain an unbiased estimator of σ² we can start with the expected value of the sum
of squared residuals. Recall that e = My = M[Xβ + ε] = Mε. Then, the sum of
squared residuals is:
e′e = ε′Mε. (2.45)
Taking expectations conditional on X and using the trace operator:
E[e′e|X] = E[ε′Mε|X]
 = E[tr(ε′Mε)|X]
 = E[tr(Mεε′)|X]
 = tr(M E[εε′|X])
 = tr(M σ²I)
 = σ² tr(M)
 = σ² tr(In − X(X′X)−1X′)
 = σ² [tr(In) − tr(X(X′X)−1X′)]
 = σ² [tr(In) − tr(IK)]
 = σ² (n − K),
so the unbiased estimator of σ² is:
s² = e′e/(n − K). (2.46)
Hence, the standard errors of the estimators b can be obtained by first estimating σ² using Equation 2.46 and then plugging s² into Equation 2.43; the standard errors are the square roots of the diagonal elements of the resulting matrix.
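A minimal sketch of this recipe, continuing the simulation above (X, y and b as before):

import numpy as np

nobs, K = X.shape
e = y - X @ b
s2 = (e @ e) / (nobs - K)              # Equation 2.46: unbiased estimator of sigma^2
V = s2 * np.linalg.inv(X.T @ X)        # s^2 (X'X)^(-1), the estimated Var[b|X] from Equation 2.43
se = np.sqrt(np.diag(V))               # standard errors of the elements of b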
Assuming normality conditional on X, and with Skk being the kth diagonal element
of (X′X)−1, we have that:
zk = (bk − βk)/√(σ²Skk) (2.48)
has a standard normal distribution. However, σ 2 is an unknown population parame-
ter. Hence, we use:
tk = (bk − βk)/√(s²Skk) (2.49)
which has a t distribution with (n − K) degrees of freedom. We use Equation 2.49 for
hypothesis testing about the elements of β.
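Continuing the simulation above (b and se from the standard-error sketch), the t statistic for the common null hypothesis βk = 0 is simply bk divided by its standard error:

t_stats = b / se                       # Equation 2.49 with beta_k = 0 under the null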
Based on Equation 2.49 we can obtain the (1 − α) confidence interval for the population
parameter βk using:
Prob[bk − tα/2,n−K sbk ≤ βk ≤ bk + tα/2,n−K sbk] = 1 − α.
This equation says that the true population parameter βk will lie between
the lower confidence limit bk − tα/2,n−K sbk and the upper confidence limit bk +
tα/2,n−K sbk 100(1 − α)% of the time, where tα/2,n−K is the critical value from
the t distribution with (n − K) degrees of freedom. This is illustrated in Figure 2.4.
[Figure 2.4: probability density of the t distribution, with the critical values marking the confidence interval.]
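A minimal sketch of the confidence interval, continuing the simulation above (b, se, nobs and K from the standard-error sketch); scipy provides the t critical value:

from scipy import stats

alpha = 0.05                                      # 95% confidence level
t_crit = stats.t.ppf(1 - alpha / 2, df=nobs - K)  # critical value t_(alpha/2, n-K)
lower = b - t_crit * se                           # lower confidence limits
upper = b + t_crit * se                           # upper confidence limits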
References
[Dougherty (2007)] Dougherty, C., 2007. Introduction to Econometrics. 3rd ed. New York: Oxford University Press.
[Greene (2008)] Greene, W.H. 2008. Econometric Analysis. 6th ed. New Jersey: Pearson Prentice
Hall.
[Wooldridge (2009)] Wooldridge, J.M., 2009. Introductory Econometrics: A Modern Approach. 4th ed. New York: South-Western Publishers.