Ordinary Least Squares Estimation

EC114 Introduction to Quantitative Economics

12. Ordinary Least Squares Estimation
Marcus Chambers
Department of Economics
University of Essex
24/26 January 2012
Outline
1. Ordinary Least Squares (OLS) Estimation
2. Goodness-of-fit
3. Computing OLS Estimates
Reference: R. L. Thomas, Using Statistics in Economics,
McGraw-Hill, 2005, sections 9.3 and 9.4.
Ordinary Least Squares (OLS) Estimation
Recall that the population regression line is given by
\[ E(Y) = \alpha + \beta X \]
and the sample regression line is given by
\[ \hat{Y} = a + bX \]
where a and b can be regarded as estimates of \(\alpha\) and \(\beta\).
Another way to think of these relationships is in terms of Y itself:
\[ Y = \alpha + \beta X + \epsilon, \]
\[ Y = a + bX + e, \]
where \(\epsilon\) is the disturbance and e is the residual.

It is clear that, if we vary the sample regression line in
some way, we will obtain a different set of residuals.
In other words, if we vary the estimation method for the
sample regression line, we will obtain a different set of
residuals.
The best known method of fitting a straight line to a scatter
diagram is Ordinary Least Squares (OLS).
The sample regression line is determined by the intercept
a and the slope b.
A good criterion in the choice of a and b is to make the
residuals small somehow.
Small residuals imply that the differences between the
actual Y and the fitted \(\hat{Y}\) are small.
The OLS method of estimation chooses a and b in order to
minimise the sum of the squares of the residuals:
\[ \sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + \dots + e_n^2. \]
We know that \(e_i = Y_i - a - bX_i\) so that the sum of squared
residuals can be written
\[ S = \sum_i e_i^2 = \sum_i (Y_i - a - bX_i)^2, \]
which is a function of a and b alone because \(Y_i\) and \(X_i\) are
the given data points.
The objective is to minimise S with respect to a and b.
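To make the objective concrete, the following sketch evaluates S for a small made-up data set and shows that moving a or b away from the least squares values increases the sum of squared residuals. The data and the function name sum_squared_residuals are assumptions for illustration only; they are not taken from Thomas.

```python
# Illustrative data (assumed for this sketch; not from Thomas, Table 9.1).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]


def sum_squared_residuals(a, b, xs, ys):
    """Return S(a, b), the sum of (Y_i - a - b*X_i)^2 over the sample."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


# For these data the least squares values are a = 2.2 and b = 0.6.
s_ols = sum_squared_residuals(2.2, 0.6, X, Y)

# Perturbing either coefficient gives a larger sum of squared residuals.
s_shift_a = sum_squared_residuals(2.0, 0.6, X, Y)
s_shift_b = sum_squared_residuals(2.2, 0.7, X, Y)

print(s_ols, s_shift_a, s_shift_b)
```

Any other pair (a, b) could be tried in the same way; none should give a smaller value of S than the least squares pair.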
To do this we need to partially differentiate S with respect to
a and b and set these derivatives equal to zero:
\[ \frac{\partial S}{\partial a} = -2 \sum (Y_i - a - bX_i) = 0, \]
\[ \frac{\partial S}{\partial b} = -2 \sum X_i (Y_i - a - bX_i) = 0. \]
As these derivatives are set equal to zero we can divide
both sides by \(-2\) and re-arrange the terms to give:
\[ \sum Y_i = na + b \sum X_i, \]
\[ \sum X_i Y_i = a \sum X_i + b \sum X_i^2, \]
noting that \(\sum a = na\).
These are known as the normal equations (but are not
related to the normal distribution).
Note that, because \(Y_i - a - bX_i = e_i\), we can also write the
first-order conditions in the form:
\[ \frac{\partial S}{\partial a} = -2 \sum e_i = 0, \]
\[ \frac{\partial S}{\partial b} = -2 \sum X_i e_i = 0. \]
We therefore have two equations in two unknowns which
we can solve for a and b.
The extension question on Problem Set 12 deals with this
solution.
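As a numerical illustration of the two equations in two unknowns, the sketch below solves the normal equations by Cramer's rule for a small made-up data set and then checks that the resulting residuals satisfy both first-order conditions. The data are an assumption for illustration, and Cramer's rule is just one convenient way to solve a 2x2 system; it is not claimed to be the route taken in the problem set.

```python
# Illustrative data (assumed for this sketch; not from Thomas, Table 9.1).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(X)

# Sample sums that appear in the normal equations.
sum_x = sum(X)
sum_y = sum(Y)
sum_xx = sum(x * x for x in X)
sum_xy = sum(x * y for x, y in zip(X, Y))

# Normal equations:
#   sum_y  = n * a     + b * sum_x
#   sum_xy = a * sum_x + b * sum_xx
# Solve this 2x2 linear system by Cramer's rule.
det = n * sum_xx - sum_x * sum_x
a = (sum_y * sum_xx - sum_x * sum_xy) / det
b = (n * sum_xy - sum_x * sum_y) / det

# First-order conditions: residuals sum to zero and are orthogonal to X.
residuals = [y - a - b * x for x, y in zip(X, Y)]
sum_e = sum(residuals)
sum_xe = sum(x * e for x, e in zip(X, residuals))

print(a, b, sum_e, sum_xe)
```

Both sum_e and sum_xe should be zero up to floating-point rounding.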
A compact representation of the solution is:
\[ b = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad a = \bar{Y} - b\bar{X}, \]
where \(x_i = X_i - \bar{X}\), \(y_i = Y_i - \bar{Y}\), and \(\bar{X}\) and \(\bar{Y}\) are the sample
means of X and Y respectively.
The above expressions for a and b are the OLS estimators
of \(\alpha\) and \(\beta\).
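The compact solution translates directly into a few lines of code. The sketch below is a minimal illustration using deviations from the sample means; the function name ols_fit and the test data are assumptions made for this example.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the sample regression line Y-hat = a + b*X.

    Uses b = sum(x_i * y_i) / sum(x_i^2) and a = Y-bar - b * X-bar,
    where x_i and y_i are deviations from the sample means.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Made-up data lying exactly on the line Y = 1 + 2X.
a, b = ols_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(a, b)
```

The division by sxx fails if every X value is the same, since a slope cannot be estimated without variation in X.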
We can compute a and b from various sample sums,
making use of the following:
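\[ \sum x_i^2 = \sum (X_i - \bar{X})^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n} = \sum X_i^2 - n\bar{X}^2, \]
\[ \sum x_i y_i = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} = \sum X_i Y_i - n\bar{X}\bar{Y}. \]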
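For readers who want the intermediate step, the first identity can be verified by expanding the square and using the fact that the sum of the X values equals n times their mean. The working below is a standard expansion written out for convenience; the cross-product identity follows in exactly the same way.

```latex
\begin{align*}
\sum_i (X_i - \bar{X})^2
  &= \sum_i X_i^2 - 2\bar{X}\sum_i X_i + n\bar{X}^2 \\
  &= \sum_i X_i^2 - 2n\bar{X}^2 + n\bar{X}^2
     \qquad \text{since } \sum_i X_i = n\bar{X} \\
  &= \sum_i X_i^2 - n\bar{X}^2
   = \sum_i X_i^2 - \frac{\left(\sum_i X_i\right)^2}{n}.
\end{align*}
```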
In view of this another common expression for b is:
\[ b = \frac{\sum X_i Y_i - \dfrac{\sum X_i \sum Y_i}{n}}{\sum X_i^2 - \dfrac{(\sum X_i)^2}{n}}. \]
The data on money stock (Y) and GDP (X) in Table 9.1 of
Thomas yield:
\[ \sum X_i = 132.004, \quad \sum X_i^2 = 1247.66, \quad \sum X_i Y_i = 220.956, \]
\[ \sum Y_i = 23.718, \quad \sum Y_i^2 = 45.154, \quad n = 30. \]
Based on these quantities we obtain:
\[ \sum x_i^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n} = 1247.66 - \frac{132.004^2}{30} = 666.86, \]
\[ \sum x_i y_i = \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} = 220.956 - \frac{132.004 \times 23.718}{30} = 116.60. \]
The slope coefficient is therefore
\[ b = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{116.60}{666.86} = 0.1748. \]
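We also find that
\[ \bar{X} = \frac{132.004}{30} = 4.40, \qquad \bar{Y} = \frac{23.718}{30} = 0.7906 \]
and so the intercept is
\[ a = \bar{Y} - b\bar{X} = 0.7906 - (0.1748 \times 4.40) = 0.0212, \]
where the final figure is obtained using the unrounded values of b and \(\bar{X}\).
The sample regression line is therefore
\[ \hat{Y} = 0.021 + 0.175X. \]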
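The arithmetic of this worked example can be reproduced from the reported sample sums alone. The sketch below uses only the sums quoted above for the money stock and GDP data; the variable names are assumptions. Because the reported sums are themselves rounded, the intermediate sums of squares may differ slightly in the second decimal place from the figures quoted above.

```python
# Sample sums reported for the money stock (Y) and GDP (X) data.
n = 30
sum_x = 132.004
sum_xx = 1247.66
sum_xy = 220.956
sum_y = 23.718

# Sums of squares and cross-products in deviation form.
sxx = sum_xx - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

# OLS slope and intercept.
b = sxy / sxx
x_bar = sum_x / n
y_bar = sum_y / n
a = y_bar - b * x_bar

print(f"sxx = {sxx:.2f}, sxy = {sxy:.2f}")
print(f"b = {b:.4f}, a = {a:.4f}")
```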

Note that the residuals (the vertical distance between the
data point and the line) are larger for the countries with
larger GDP (X).
Note, too, that the sample regression line passes through
the point \((\bar{X}, \bar{Y})\).
The reason can be seen by re-arranging the equation for a,
which gives
\[ \bar{Y} = a + b\bar{X}. \]
This is known as the point of sample means.
In our example this point is (4.40, 0.791).
Note that the intercept a > 0, although its value is small.
We had expected a relationship of the form Y = bX,
suggesting a = 0.
Although a is small we will want to know whether it is
significantly different from zero; we shall consider testing
the hypothesis that a = 0 at a later point.
The value b = 0.175 means that the demand for money per
head will increase by $175 whenever GDP per head
increases by $1000.
But a more interesting quantity is the income (GDP)
elasticity of the demand for money.
We can use the previous results to compute an estimate of
it; the required elasticity is given by the formula
\[ \eta = \frac{dY}{dX} \cdot \frac{X}{Y}. \]
However, the elasticity varies along our sample regression
line because the values of X and Y vary along the line.
It is, however, common to evaluate \(\eta\) at the sample means
of X and Y, while dY/dX can be estimated by b.
In our case the elasticity evaluated at the sample means is
\[ \eta = 0.175 \times \frac{4.40}{0.791} = 0.973. \]
Thus we obtain a GDP elasticity close to unity.
A 1% rise in GDP per head leads to a 0.97% rise in
demand for money per head.
It would be of interest to test the hypothesis that \(\eta = 1\) and
we will examine how to do this later on in the term.
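The elasticity calculation, and the way the elasticity changes along the fitted line, can be sketched in a few lines. The function names are hypothetical, and the two GDP values used in the second part (1 and 10) are arbitrary points chosen only to show the variation.

```python
def elasticity_at_means(b, x_bar, y_bar):
    """Elasticity (dY/dX) * (X/Y) evaluated at the sample means."""
    return b * x_bar / y_bar


def elasticity_on_line(a, b, x):
    """Elasticity at a point on the fitted line, where Y-hat = a + b*x."""
    return b * x / (a + b * x)


# Rounded estimates and sample means from the money demand example.
eta = elasticity_at_means(0.175, 4.40, 0.791)
print(round(eta, 3))

# The elasticity differs at different points on the same fitted line.
eta_low = elasticity_on_line(0.021, 0.175, 1.0)
eta_high = elasticity_on_line(0.021, 0.175, 10.0)
print(eta_low, eta_high)
```

With a positive intercept the elasticity on the line is below one and rises towards one as X increases.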
Goodness-of-fit
So far we have fitted the sample regression line
\[ \hat{Y} = 0.021 + 0.175X \]
to our scatter of points in the money-income example.
The values of the intercept (0.021) and slope (0.175) were
obtained by the method of ordinary least squares (OLS)
which chooses these values so as to minimise the sum of
squared residuals:
\[ \sum_i e_i^2 = \sum_i (Y_i - a - bX_i)^2. \]
But we might want to ask the question: how well does our
sample regression line fit the data?
Let's begin by taking a look at the graph:

We can observe that the sample regression line passes
fairly close to each point in the scatter, although with
greater dispersion for larger values of X (GDP per head).
We need, however, to be more precise about this; in other
words we need some sort of numerical measure of
goodness of fit.
We will use the coefficient of determination, \(R^2\), which is
equal to the square of the correlation coefficient R, where
\[ R = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}. \]
We know that \(-1 \le R \le 1\), and so it follows that \(0 \le R^2 \le 1\).
In our example of the demand for money we found that
R = 0.8787.
Hence the coefficient of determination is
\[ R^2 = (0.8787)^2 = 0.772. \]
In regression analysis it is possible to give a precise
interpretation to the value 0.772 obtained for \(R^2\).
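The correlation coefficient for the money demand example can be recovered from the sample sums reported earlier for Table 9.1 of Thomas. The sketch below is one way to do this; the variable names are assumptions, and small differences from the quoted figures can arise because the reported sums are rounded.

```python
import math

# Sample sums reported for the money stock (Y) and GDP (X) data.
n = 30
sum_x, sum_y = 132.004, 23.718
sum_xx, sum_yy, sum_xy = 1247.66, 45.154, 220.956

# Sums of squared deviations and cross-products about the sample means.
sxx = sum_xx - sum_x ** 2 / n
syy = sum_yy - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

# Correlation coefficient and its square, the coefficient of determination.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

print(f"R = {r:.4f}, R^2 = {r_squared:.3f}")
```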
Suppose we ask the question:
What proportion of the variation in the demand for
money in our 30 countries can be attributed to the
variation in GDP?
If our sample regression line is able to explain a high
proportion of the variation in the demand for money then it
must provide a good fit to the data.
Consider the next Figure, which refers to a single sample
point, namely France, which is observation i = 8.
We have \(Y_8 = 2.3912\); \(\hat{Y}_8 = 1.6776\); and the overall sample
mean is \(\bar{Y} = 0.7906\).

The diagram shows the fitted line, the sample mean line,
the residual \(e_8 = Y_8 - \hat{Y}_8\), as well as the deviations \(Y_8 - \bar{Y}\)
and \(\hat{Y}_8 - \bar{Y}\).
The variations in demand for money are measured relative
to the mean.
The following relationship holds:
total variation = variation due to X + residual variation
\[ Y_8 - \bar{Y} = (\hat{Y}_8 - \bar{Y}) + e_8 \]
\[ 1.6006 = 0.8870 + 0.7136 \]
Such a relationship holds for all points in the sample, so
that we can write
\[ (Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + e_i, \quad i = 1, \dots, n. \]
Note that these variations can be positive or negative, and
that they only apply to a single point in the sample.
However, we require an overall measure for the entire
sample, and when we talk about variation we usually have
in mind a positive measure.
A measure of variation of Y taken over the entire sample is
the total sum of squares (SST):
\[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2. \]
This is the total variation in Y that we attempt to explain by
our regression line, and is always non-negative.
We have seen this sort of quantity before: dividing by
\(n - 1\) gives the sample variance.
A sample-wide measure of the variation in Y due to X is
given by the explained sum of squares (SSE):
\[ \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2. \]
This quantity is also non-negative.
Finally, a measure of the total residual variation is the
residual sum of squares (SSR):
\[ \sum_{i=1}^{n} e_i^2, \]
which is also non-negative.
The following relationship holds:
total sum of squares = explained sum of squares + residual sum of squares
\[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2 \]
SST = SSE + SSR
The extension question on Problem Set 12 deals with this
identity.
These quantities are used to define the coefficient of
determination, \(R^2\), as follows:
\[ R^2 = \frac{\text{variation in } Y \text{ due to } X}{\text{total variation in } Y} = \frac{\text{SSE}}{\text{SST}}. \]
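The decomposition and the resulting coefficient of determination can be checked numerically. The sketch below fits a line by OLS to a small made-up data set (an assumption for illustration, not the data from Thomas) and computes the three sums of squares, following the naming used in these notes: SSE for the explained sum and SSR for the residual sum.

```python
# Illustrative data (assumed for this sketch; not from Thomas, Table 9.1).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(X)

# OLS estimates from deviations about the sample means.
x_bar = sum(X) / n
y_bar = sum(Y) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum(
    (x - x_bar) ** 2 for x in X
)
a = y_bar - b * x_bar

# Fitted values and residuals.
fitted = [a + b * x for x in X]
residuals = [y - f for y, f in zip(Y, fitted)]

# Total, explained and residual sums of squares.
sst = sum((y - y_bar) ** 2 for y in Y)
sse = sum((f - y_bar) ** 2 for f in fitted)
ssr = sum(e ** 2 for e in residuals)

r_squared = sse / sst
print(sst, sse, ssr, r_squared)
```

The identity SST = SSE + SSR holds here up to floating-point rounding; it relies on a and b being the OLS estimates and would generally fail for an arbitrary line.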
Alternative (but equivalent) expressions for \(R^2\) include
\[ R^2 = 1 - \frac{\text{SSR}}{\text{SST}}, \]
obtained by making the substitution SSE = SST − SSR.
Another expression is
\[ R^2 = \frac{b^2 \sum x_i^2}{\sum y_i^2}, \]
where \(x_i = X_i - \bar{X}\) and \(y_i = Y_i - \bar{Y}\).
The derivation of this last expression requires showing that
SSE \(= b^2 \sum x_i^2\) and noting that SST \(= \sum y_i^2\) (see the
extension question on Problem Set 12).
In our demand for money example we have already shown
that \(R^2 = 0.772\) by squaring the correlation coefficient
R = 0.8787.
However, we know that
\[ b = 0.17485, \quad \sum x_i^2 = 666.86, \quad \sum y_i^2 = 26.403, \]
and hence an alternative derivation is
\[ R^2 = \frac{b^2 \sum x_i^2}{\sum y_i^2} = \frac{0.17485^2 \times 666.86}{26.403} = 0.772. \]
This implies that just over 77% of the variation in the
demand for money can be attributed to the variation in
GDP.
Remember: \(0 \le R^2 \le 1\).
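The alternative derivation is easy to reproduce, and it also yields the implied explained and residual sums of squares along the way. The sketch below uses the three figures quoted above; the variable names are assumptions.

```python
# Figures quoted above for the money demand example.
b = 0.17485
sxx = 666.86   # sum of squared deviations of X
syy = 26.403   # sum of squared deviations of Y, equal to SST

# Explained sum of squares, SSE = b^2 * sum(x_i^2).
sse = b ** 2 * sxx

# Residual sum of squares implied by SST = SSE + SSR.
ssr = syy - sse

# Two equivalent expressions for the coefficient of determination.
r_squared = sse / syy
r_squared_alt = 1 - ssr / syy

print(f"SSE = {sse:.3f}, SSR = {ssr:.3f}, R^2 = {r_squared:.3f}")
```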
In (a) and (c) \(R^2 = 1\) because all points lie on a single
sample line; in (a) R = +1 and in (c) R = −1.
In (b) R = 0 due to the lack of association between the two
variables and hence \(R^2 = 0\).
The correlation coefficient, R, is a measure of the strength
of association between two variables, and says nothing
about the direction of causation (if any exists).
The coefficient of determination, \(R^2\), however, is based on
the regression model \(Y = \alpha + \beta X + \epsilon\) in which the
causation is assumed to go from X to Y.
However, we should be careful to refer to \(R^2\) as the
percentage of the variation in Y attributed to X rather than
explained by X, because any such relationship could be
spurious.
Computing OLS Estimates
In practice we use computer software for OLS calculations.
As an example, the Stata output for the money demand
example is of the form:
. regress m g
Source | SS df MS Number of obs = 30
-------------+------------------------------ F( 1, 28) = 94.88
Model | 20.3862321 1 20.3862321 Prob > F = 0.0000
Residual | 6.01600434 28 .214857298 R-squared = 0.7721
-------------+------------------------------ Adj R-squared = 0.7640
Total | 26.4022364 29 .910421946 Root MSE = .46353
------------------------------------------------------------------------------
m | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g | .1748489 .0179502 9.74 0.000 .1380795 .2116182
_cons | .0212579 .1157594 0.18 0.856 -.2158645 .2583803
------------------------------------------------------------------------------
Quite a lot of information is provided by default, but note
that the estimates a and b are given at the start of the final
two rows (under the heading Coef.).
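Several entries in the Stata table can be tied back to the quantities defined in these notes. The sketch below copies figures from the output above and checks them against each other. The mapping of Stata's Model and Residual rows to SSE and SSR follows the naming used in these notes, and the variable names are assumptions.

```python
# Values copied from the Stata output above.
ss_model = 20.3862321      # "Model" SS: explained sum of squares (SSE)
ss_residual = 6.01600434   # "Residual" SS: residual sum of squares (SSR)
ss_total = 26.4022364      # "Total" SS: total sum of squares (SST)
coef_g = 0.1748489         # slope b (row g)
coef_cons = 0.0212579      # intercept a (row _cons)

# SST = SSE + SSR should hold up to rounding in the printed table.
identity_gap = ss_total - (ss_model + ss_residual)

# R-squared as explained variation over total variation.
r_squared = ss_model / ss_total

# Fitted value at the sample mean of X (4.40), which should be close to
# the sample mean of Y (0.7906).
fitted_at_mean = coef_cons + coef_g * 4.40

print(identity_gap, round(r_squared, 4), round(fitted_at_mean, 4))
```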
Summary
Ordinary Least Squares (OLS) Estimation
Goodness-of-fit
Next week:
Non-Linear Models
