
Review of Probability and

Statistics
(i.e. things you learned in Ec 10 and
need to remember to do well in this
class!)

Economics 20 - Prof. Anderson 1


Random Variables
X is a random variable if it represents a random
draw from some population
a discrete random variable can take on only
selected values
a continuous random variable can take on any
value in a real interval
associated with each random variable is a
probability distribution
Random Variables – Examples
the outcome of a coin toss – a discrete
random variable with P(Heads)=.5 and
P(Tails)=.5

the height of a selected student – a
continuous random variable drawn from an
approximately normal distribution


Expected Value of X – E(X)
The expected value is really just a
probability weighted average of X
E(X) is the mean of the distribution of X,
denoted by mX
Let f(xi) be the probability that X=xi, then
mX = E(X) = Σ xi f(xi), summing over i = 1, …, n
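As a quick sketch of the formula above, the probability-weighted average is easy to compute directly (the fair-die example is illustrative, not from the slides):

```python
def expected_value(values, probs):
    """Probability-weighted average of a discrete random variable."""
    assert abs(sum(probs) - 1.0) < 1e-12, "probabilities must sum to one"
    return sum(x * p for x, p in zip(values, probs))

# Fair six-sided die: each face has probability 1/6, so E(X) = 3.5
die_mean = expected_value([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
```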
Variance of X – Var(X)
The variance of X is a measure of the
dispersion of the distribution
Var(X) is the expected value of the squared
deviations from the mean, so

sX2 = Var(X) = E[(X – mX)2]
More on Variance
The square root of Var(X) is the standard
deviation of X
Var(X) can alternatively be written in terms
of a weighted sum of squared deviations,
because
E[(X – mX)2] = Σ (xi – mX)2 f(xi)
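Continuing the sketch, the weighted-sum form of the variance can be coded in a few lines (the die example again is illustrative):

```python
def variance(values, probs):
    """Var(X): probability-weighted sum of squared deviations from the mean."""
    mu = sum(x * p for x, p in zip(values, probs))
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# Fair die: Var(X) = 35/12, so the standard deviation is sqrt(35/12)
die_var = variance([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
```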


Covariance – Cov(X,Y)
Covariance between X and Y is a measure
of the association between two random
variables, X & Y
If positive, then both move up or down
together
If negative, then if X is high, Y is low, vice
versa
sXY = Cov(X,Y) = E[(X – mX)(Y – mY)]
Correlation Between X and Y
Covariance is dependent upon the units of
X & Y [Cov(aX,bY)=abCov(X,Y)]
Correlation, Corr(X,Y), scales covariance
by the standard deviations of X & Y so that
it lies between 1 & –1
rXY = sXY / (sX sY) = Cov(X,Y) / √[Var(X) Var(Y)]


More Correlation & Covariance
If rXY = 0 (or equivalently sXY = 0) then X
and Y are linearly unrelated
If rXY = 1 then X and Y are said to be
perfectly positively correlated
If rXY = –1 then X and Y are said to be
perfectly negatively correlated
Corr(aX,bY) = Corr(X,Y) if ab>0
Corr(aX,bY) = –Corr(X,Y) if ab<0

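A small sample-based check of the scaling rules above (the data are made up for the sketch):

```python
import math

def cov(xs, ys):
    """Sample covariance (divides by n; the scaling rules hold either way)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    """Correlation scales covariance by the two standard deviations."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 9.0]
ax = [2 * x for x in xs]  # a = 2
by = [3 * y for y in ys]  # b = 3
# Cov(aX, bY) = ab Cov(X, Y), while Corr is unchanged when ab > 0
```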


Properties of Expectations
E(a)=a, Var(a)=0
E(mX)=mX, i.e. E(E(X))=E(X)
E(aX+b)=aE(X)+b
E(X+Y)=E(X)+E(Y)
E(X-Y)=E(X)-E(Y)
E(X- mX)=0 or E(X-E(X))=0
E((aX)2)=a2E(X2)
More Properties
Var(X) = E(X2) – mX2
Var(aX+b) = a2Var(X)
Var(X+Y) = Var(X) +Var(Y) +2Cov(X,Y)
Var(X-Y) = Var(X) +Var(Y) - 2Cov(X,Y)
Cov(X,Y) = E(XY)-mxmy
If (and only if) X,Y independent, then
 Var(X+Y)=Var(X)+Var(Y), E(XY)=E(X)E(Y)



The Normal Distribution
A general normal distribution, with mean m
and variance s2, is written as N(m, s2)
It has the following probability density
function (pdf):

f(x) = [1/(s√(2π))] exp[–(x – m)2/(2s2)]
The Standard Normal
Any random variable can be “standardized” by
subtracting the mean, m, and dividing by the
standard deviation, s, so E(Z) = 0, Var(Z) = 1
Thus, the standard normal, N(0,1), has pdf

φ(z) = [1/√(2π)] exp(–z2/2)
Properties of the Normal
If X ~ N(m, s2), then aX + b ~ N(am + b, a2s2)
A linear combination of independent,
identically distributed (iid) normal random
variables will also be normally distributed
If Y1, Y2, …, Yn are iid and ~ N(m, s2), then

Ȳ ~ N(m, s2/n)
Cumulative Distribution Function
For a pdf, f(x), the cumulative distribution
function (cdf), F(x), is P(X ≤ x), so
P(X > x) = 1 – F(x); for a distribution
symmetric about zero, P(X > x) = P(X < –x)
For the standard normal, with pdf φ(z), the
cdf is Φ(z) = P(Z ≤ z), so
P(|Z| > a) = 2P(Z > a) = 2[1 – Φ(a)]
P(a ≤ Z ≤ b) = Φ(b) – Φ(a)
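These tail formulas can be checked numerically; Python's standard library exposes the standard normal cdf through the error function (`math.erf`):

```python
import math

def Phi(z):
    """Standard normal cdf: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# P(|Z| > 1.96) = 2[1 - Phi(1.96)], roughly 0.05
two_tail = 2.0 * (1.0 - Phi(1.96))
# P(-1 <= Z <= 1) = Phi(1) - Phi(-1), roughly 0.683
middle = Phi(1.0) - Phi(-1.0)
```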


The Chi-Square Distribution
Suppose that Zi, i = 1, …, n, are iid ~ N(0,1),
and X = Σ Zi2; then X has a chi-square
distribution with n degrees of freedom (df),
that is, X ~ χ2n
If X ~ χ2n, then E(X) = n and Var(X) = 2n


The t distribution
If a random variable, T, has a t distribution with n
degrees of freedom, then it is denoted as T~tn
E(T)=0 (for n>1) and Var(T)=n/(n-2) (for n>2)
T is a function of Z ~ N(0,1) and X ~ χ2n as follows:

T = Z / √(X/n)
The F Distribution
If a random variable, F, has an F distribution with
(k1,k2) df, then it is denoted as F~Fk1,k2
F is a function of X1 ~ χ2k1 and X2 ~ χ2k2 as follows:

F = (X1/k1) / (X2/k2)


Random Samples and Sampling
For a random variable Y, repeated draws
from the same population can be labeled as
Y1, Y2, . . . , Yn
If every combination of n sample points
has an equal chance of being selected, this
is a random sample
A random sample is a set of independent,
identically distributed (i.i.d.) random
variables
Estimators and Estimates
Typically, we can’t observe the full
population, so we must make inferences
based on estimates from a random sample
An estimator is just a mathematical formula
for estimating a population parameter from
sample data
An estimate is the actual number the
formula produces from the sample data



Examples of Estimators
Suppose we want to estimate the population mean
Suppose we use the formula for E(Y), but
substitute 1/n for f(yi) as the probability weight
since each point has an equal chance of being
included in the sample, then
Can calculate the sample average for our sample:
Ȳ = (1/n) Σ Yi, summing over i = 1, …, n
What Makes a Good Estimator?
Unbiasedness
Efficiency
Mean Square Error (MSE)

Asymptotic properties (for large samples):


Consistency

Economics 20 - Prof. Anderson 22


Unbiasedness of Estimator
Want your estimator to be right, on average
We say an estimator, W, of a population
parameter, q, is unbiased if E(W) = q
For our example, that means we want

E(Ȳ) = mY


Proof: Sample Mean is Unbiased

E(Ȳ) = E[(1/n) Σ Yi] = (1/n) Σ E(Yi)
= (1/n) Σ mY = (1/n)(n mY) = mY
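The proof can be illustrated by simulation: across many repeated samples, the average of the sample means settles near m (the population values, sample size, and seed here are all made up for the sketch):

```python
import random

random.seed(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 2000

sample_means = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# Unbiasedness: the mean of the sample means is close to mu = 5.0
grand_mean = sum(sample_means) / reps
```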


Efficiency of Estimator
Want your estimator to be closer to the
truth, on average, than any other estimator
We say an estimator, W, is efficient if
Var(W)< Var(any other estimator)
Note, for our example,

Var(Ȳ) = Var[(1/n) Σ Yi] = (1/n2) Σ s2 = s2/n
MSE of Estimator
What if we can’t find an unbiased estimator?
Define mean square error as E[(W-q)2]
Get trade off between unbiasedness and
efficiency, since MSE = variance + bias2
For our example, that means minimizing

E[(Ȳ – mY)2] = Var(Ȳ) + [E(Ȳ) – mY]2


Consistency of Estimator
Asymptotic properties, that is, what
happens as the sample size goes to infinity?
Want distribution of W to converge to q,
i.e. plim(W)=q
For our example, that means we want

P(|Ȳ – mY| > e) → 0 as n → ∞
More on Consistency
An unbiased estimator is not necessarily
consistent – suppose we choose Y1 as the estimate
of mY; then E(Y1) = mY, but plim(Y1) ≠ mY
An unbiased estimator, W, is consistent if
Var(W) → 0 as n → ∞
Law of Large Numbers refers to the
consistency of sample average as estimator
for m, that is, to the fact that:
plim(Ȳ) = mY
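A quick simulation sketch of the Law of Large Numbers (the population mean, sample sizes, and seed are arbitrary choices for illustration):

```python
import random

random.seed(1)
mu = 3.0

def sample_average(n):
    """Average of n draws from a population with mean mu."""
    return sum(random.gauss(mu, 1.0) for _ in range(n)) / n

# The sample average collapses toward mu as n grows
error_large_n = abs(sample_average(100_000) - mu)
```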
Central Limit Theorem
Asymptotic normality implies that P(Z < z) → Φ(z)
as n → ∞
The central limit theorem states that the
standardized average of any population with mean
m and variance s2 is asymptotically ~ N(0,1), or

Z = (Ȳ – mY) / (s/√n) ~ N(0,1) asymptotically
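A simulation sketch of the CLT, using a skewed (exponential) population with m = 1 and s2 = 1; the sample size, replication count, and seed are arbitrary illustrative choices:

```python
import math
import random

random.seed(2)
n, reps = 50, 4000

inside = 0
for _ in range(reps):
    ys = [random.expovariate(1.0) for _ in range(n)]  # mean 1, variance 1
    z = (sum(ys) / n - 1.0) / (1.0 / math.sqrt(n))    # standardized average
    if abs(z) <= 1.96:
        inside += 1

# If Z is approximately N(0,1), about 95% of draws fall within +/- 1.96
coverage = inside / reps
```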
Estimate of Population Variance
We have a good estimate of mY; we would like
a good estimate of sY2
Can use the sample variance given below –
note division by n – 1, not n, since the mean is
estimated too – if we know m we can use n

S2 = [1/(n – 1)] Σ (Yi – Ȳ)2
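The formula above as code (the data are made up; note the n – 1 divisor):

```python
def sample_variance(ys):
    """Unbiased sample variance: divide by n - 1 because the mean is estimated."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)

# Squared deviations from the mean (3.0) sum to 10, so S^2 = 10/4 = 2.5
s2 = sample_variance([1.0, 2.0, 3.0, 4.0, 5.0])
```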
Estimators as Random Variables
Each of our sample statistics (e.g. the
sample mean, sample variance, etc.) is a
random variable - Why?
Each time we pull a random sample, we’ll
get different sample statistics
If we pull lots and lots of samples, we’ll get
a distribution of sample statistics



Welcome to Economics 20
What is Econometrics?



Why study Econometrics?
Rare in economics (and many other areas
without labs!) to have experimental data
Need to use nonexperimental, or
observational, data to make inferences
Important to be able to apply economic
theory to real world data


Why study Econometrics?
An empirical analysis uses data to test a
theory or to estimate a relationship
A formal economic model can be tested
Theory may be ambiguous as to the effect
of some policy change – can use
econometrics to evaluate the program


Types of Data – Cross Sectional
Cross-sectional data is a random sample
Each observation is a new individual, firm,
etc. with information at a point in time
If the data is not a random sample, we have
a sample-selection problem


Types of Data – Panel
Can pool random cross sections and treat
similar to a normal cross section. Will just
need to account for time differences.
Can follow the same random individual
observations over time – known as panel
data or longitudinal data


Types of Data – Time Series
Time series data has a separate observation
for each time period – e.g. stock prices
Since not a random sample, different
problems to consider
Trends and seasonality will be important


The Question of Causality
Simply establishing a relationship between
variables is rarely sufficient
Want the effect to be considered causal
If we’ve truly controlled for enough other
variables, then the estimated ceteris paribus
effect can often be considered to be causal
Can be difficult to establish causality



Example: Returns to Education
A model of human capital investment implies
getting more education should lead to higher
earnings
In the simplest case, this implies an equation like

earnings = b0 + b1 education + u


Example: (continued)
The estimate of b1 is the return to
education, but can it be considered causal?
While the error term, u, includes other
factors affecting earnings, want to control
for as much as possible
Some things are still unobserved, which
can be problematic



The Simple Regression Model

y = b0 + b1x + u



Some Terminology
In the simple linear regression model,
where y = b0 + b1x + u, we typically refer
to y as the
 Dependent Variable, or
 Left-Hand Side Variable, or
 Explained Variable, or
 Regressand



Some Terminology, cont.
In the simple linear regression of y on x,
we typically refer to x as the
 Independent Variable, or
 Right-Hand Side Variable, or
 Explanatory Variable, or
 Regressor, or
 Covariate, or
 Control Variables



A Simple Assumption
The average value of u, the error term, in
the population is 0. That is,

E(u) = 0

This is not a restrictive assumption, since


we can always use b0 to normalize E(u) to 0



Zero Conditional Mean
We need to make a crucial assumption
about how u and x are related
We want it to be the case that knowing
something about x does not give us any
information about u, so that they are
completely unrelated. That is, that
E(u|x) = E(u) = 0, which implies
E(y|x) = b0 + b1x



E(y|x) as a linear function of x, where for any x
the distribution of y is centered about E(y|x)
[Figure: conditional distributions of y, each centered on the line E(y|x) = b0 + b1x, shown at x1 and x2]
Ordinary Least Squares
Basic idea of regression is to estimate the
population parameters from a sample
Let {(xi,yi): i=1, …,n} denote a random
sample of size n from the population
For each observation in this sample, it will
be the case that
yi = b0 + b1xi + ui



Population regression line, sample data points
and the associated error terms
[Figure: sample points y1, …, y4 scattered about the population regression line E(y|x) = b0 + b1x, with error terms u1, …, u4]
Deriving OLS Estimates
To derive the OLS estimates we need to
realize that our main assumption of E(u|x) =
E(u) = 0 also implies that

Cov(x,u) = E(xu) = 0

Why? Remember from basic probability


that Cov(X,Y) = E(XY) – E(X)E(Y)



Deriving OLS continued
We can write our 2 restrictions just in terms
of x, y, b0 and b1 , since u = y – b0 – b1x

E(y – b0 – b1x) = 0
E[x(y – b0 – b1x)] = 0

These are called moment restrictions



Deriving OLS using M.O.M.
The method of moments approach to
estimation implies imposing the population
moment restrictions on the sample moments

What does this mean? Recall that for E(X),


the mean of a population distribution, a
sample estimator of E(X) is simply the
arithmetic mean of the sample



More Derivation of OLS
We want to choose values of the parameters that
will ensure that the sample versions of our
moment restrictions are true
The sample versions are as follows:
(1/n) Σ (yi – b̂0 – b̂1xi) = 0

(1/n) Σ xi (yi – b̂0 – b̂1xi) = 0
More Derivation of OLS
Given the definition of a sample mean, and
properties of summation, we can rewrite the first
condition as follows

ȳ = b̂0 + b̂1x̄, or b̂0 = ȳ – b̂1x̄
More Derivation of OLS
Σ xi [yi – (ȳ – b̂1x̄) – b̂1xi] = 0

Σ xi (yi – ȳ) = b̂1 Σ xi (xi – x̄)

Σ (xi – x̄)(yi – ȳ) = b̂1 Σ (xi – x̄)2


So the OLS estimated slope is
b̂1 = Σ (xi – x̄)(yi – ȳ) / Σ (xi – x̄)2

provided that Σ (xi – x̄)2 > 0
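The slope formula, together with the intercept condition b̂0 = ȳ – b̂1x̄ derived earlier, translates directly into code (the data are a made-up exact line, so OLS recovers it perfectly):

```python
def ols_simple(xs, ys):
    """OLS slope: sum of cross deviations over sum of squared x deviations."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    assert sxx > 0, "x must vary in the sample"
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Points lying exactly on y = 1 + 2x
b0_hat, b1_hat = ols_simple([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```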


Summary of OLS slope estimate
The slope estimate is the sample covariance
between x and y divided by the sample
variance of x
If x and y are positively correlated, the
slope will be positive
If x and y are negatively correlated, the
slope will be negative
Only need x to vary in our sample



More OLS
Intuitively, OLS is fitting a line through the
sample points such that the sum of squared
residuals is as small as possible, hence the
term least squares
The residual, û, is an estimate of the error
term, u, and is the difference between the
fitted line (sample regression function) and
the sample point
Sample regression line, sample data points
and the associated estimated error terms
[Figure: sample points y1, …, y4 scattered about the fitted line ŷ = b̂0 + b̂1x, with residuals û1, …, û4]
Alternate approach to derivation
Given the intuitive idea of fitting a line, we can
set up a formal minimization problem
That is, we want to choose our parameters such
that we minimize the following:
Σ ûi2 = Σ (yi – b̂0 – b̂1xi)2


Alternate approach, continued
If one uses calculus to solve the minimization
problem for the two parameters you obtain the
following first order conditions, which are the
same as we obtained before, multiplied by n
Σ (yi – b̂0 – b̂1xi) = 0

Σ xi (yi – b̂0 – b̂1xi) = 0
Algebraic Properties of OLS
The sum of the OLS residuals is zero
Thus, the sample average of the OLS
residuals is zero as well
The sample covariance between the
regressors and the OLS residuals is zero
The OLS regression line always goes
through the mean of the sample



Algebraic Properties (precise)
Σ ûi = 0, and thus (1/n) Σ ûi = 0

Σ xi ûi = 0

ȳ = b̂0 + b̂1x̄
More terminology
We can think of each observation as being made
up of an explained part, and an unexplained part,
yi = ŷi + ûi. We then define the following:
Σ (yi – ȳ)2 is the total sum of squares (SST)
Σ (ŷi – ȳ)2 is the explained sum of squares (SSE)
Σ ûi2 is the residual sum of squares (SSR)
Then SST = SSE + SSR
Proof that SST = SSE + SSR
Σ (yi – ȳ)2 = Σ [(yi – ŷi) + (ŷi – ȳ)]2
= Σ [ûi + (ŷi – ȳ)]2
= Σ ûi2 + 2 Σ ûi (ŷi – ȳ) + Σ (ŷi – ȳ)2
= SSR + 2 Σ ûi (ŷi – ȳ) + SSE
and we know that Σ ûi (ŷi – ȳ) = 0


Goodness-of-Fit
How do we think about how well our
sample regression line fits our sample data?

Can compute the fraction of the total sum


of squares (SST) that is explained by the
model, call this the R-squared of regression

R2 = SSE/SST = 1 – SSR/SST

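The decomposition and R2 can be verified numerically; this sketch reuses the simple OLS formulas from earlier slides on made-up data:

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 2.0, 4.0, 4.0, 8.0]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# Fit by OLS
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

fitted = [b0 + b1 * x for x in xs]
resid = [y - yhat for y, yhat in zip(ys, fitted)]

sst = sum((y - ybar) ** 2 for y in ys)
sse = sum((yhat - ybar) ** 2 for yhat in fitted)
ssr = sum(u ** 2 for u in resid)
r2 = sse / sst  # equivalently 1 - ssr / sst
```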


Unbiasedness of OLS
Assume the population model is linear in
parameters as y = b0 + b1x + u
Assume we can use a random sample of
size n, {(xi, yi): i=1, 2, …, n}, from the
population model. Thus we can write the
sample model yi = b0 + b1xi + ui
Assume E(u|x) = 0 and thus E(ui|xi) = 0
Assume there is variation in the xi



Unbiasedness of OLS (cont)
In order to think about unbiasedness, we need to
rewrite our estimator in terms of the population
parameter
Start with a simple rewrite of the formula as

b̂1 = Σ (xi – x̄) yi / sx2, where

sx2 = Σ (xi – x̄)2


Unbiasedness of OLS (cont)
Σ (xi – x̄) yi = Σ (xi – x̄)(b0 + b1xi + ui)
= b0 Σ (xi – x̄) + b1 Σ (xi – x̄) xi
+ Σ (xi – x̄) ui


Unbiasedness of OLS (cont)
Σ (xi – x̄) = 0 and Σ (xi – x̄) xi = Σ (xi – x̄)2,

so the numerator can be rewritten as
b1 sx2 + Σ (xi – x̄) ui, and thus

b̂1 = b1 + Σ (xi – x̄) ui / sx2
Unbiasedness of OLS (cont)
Let di = xi – x̄, so that
b̂1 = b1 + (1/sx2) Σ di ui; then

E(b̂1) = b1 + (1/sx2) Σ di E(ui) = b1


Unbiasedness Summary
The OLS estimates of b1 and b0 are
unbiased
Proof of unbiasedness depends on our 4
assumptions – if any assumption fails, then
OLS is not necessarily unbiased
Remember unbiasedness is a description of
the estimator – in a given sample we may be
“near” or “far” from the true parameter



Variance of the OLS Estimators
Now we know that the sampling
distribution of our estimate is centered
around the true parameter
Want to think about how spread out this
distribution is
Much easier to think about this variance
under an additional assumption, so
Assume Var(u|x) = s2 (Homoskedasticity)



Variance of OLS (cont)
Var(u|x) = E(u2|x)-[E(u|x)]2
E(u|x) = 0, so s2 = E(u2|x) = E(u2) = Var(u)
Thus s2 is also the unconditional variance,
called the error variance
s, the square root of the error variance is
called the standard deviation of the error
Can say: E(y|x)=b0 + b1x and Var(y|x) = s2



Homoskedastic Case
[Figure: conditional distributions of y with constant variance at every x along the line E(y|x) = b0 + b1x]
Heteroskedastic Case
[Figure: conditional distributions of y whose variance changes with x around the line E(y|x) = b0 + b1x]
Variance of OLS (cont)
Var(b̂1) = Var[b1 + (1/sx2) Σ di ui]
= (1/sx2)2 Var(Σ di ui)
= (1/sx2)2 Σ di2 Var(ui)
= (1/sx2)2 Σ di2 s2
= s2 (1/sx2)2 Σ di2
= s2 (1/sx2)2 sx2 = s2/sx2

Thus Var(b̂1) = s2/sx2
Variance of OLS Summary
The larger the error variance, s2, the larger
the variance of the slope estimate
The larger the variability in the xi, the
smaller the variance of the slope estimate
As a result, a larger sample size should
decrease the variance of the slope estimate
Problem that the error variance is unknown



Estimating the Error Variance
We don’t know what the error variance, s2,
is, because we don’t observe the errors, ui

What we observe are the residuals, ûi

We can use the residuals to form an


estimate of the error variance



Error Variance Estimate (cont)

ûi = yi – b̂0 – b̂1xi
= (b0 + b1xi + ui) – b̂0 – b̂1xi
= ui – (b̂0 – b0) – (b̂1 – b1) xi

Then, an unbiased estimator of s2 is

ŝ2 = [1/(n – 2)] Σ ûi2 = SSR/(n – 2)


Error Variance Estimate (cont)
ŝ = √(ŝ2) = standard error of the regression
Recall that sd(b̂1) = s/sx
If we substitute ŝ for s then we have
the standard error of b̂1:

se(b̂1) = ŝ / [Σ (xi – x̄)2]1/2


Multiple Regression Analysis
y = b0 + b1x1 + b2x2 + . . . bkxk + u

1. Estimation



Parallels with Simple Regression
b0 is still the intercept
b1 to bk all called slope parameters
u is still the error term (or disturbance)
Still need to make a zero conditional mean
assumption, so now assume that
E(u|x1,x2, …,xk) = 0
Still minimizing the sum of squared
residuals, so have k+1 first order conditions
Interpreting Multiple Regression

ŷ = b̂0 + b̂1x1 + b̂2x2 + … + b̂kxk, so

Δŷ = b̂1Δx1 + b̂2Δx2 + … + b̂kΔxk,

so holding x2, …, xk fixed implies that
Δŷ = b̂1Δx1, that is, each b has
a ceteris paribus interpretation


A “Partialling Out” Interpretation
Consider the case where k = 2, i.e.
ŷ = b̂0 + b̂1x1 + b̂2x2; then

b̂1 = Σ r̂i1 yi / Σ r̂i12, where the r̂i1 are
the residuals from the estimated
regression x̂1 = γ̂0 + γ̂2x2
“Partialling Out” continued
Previous equation implies that regressing y
on x1 and x2 gives same effect of x1 as
regressing y on residuals from a regression
of x1 on x2
This means only the part of xi1 that is
uncorrelated with xi2 is being related to yi so
we’re estimating the effect of x1 on y after
x2 has been “partialled out”
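A numerical sketch of partialling out, on made-up data generated from an exact plane y = 1 + 2x1 + 3x2 (no error term), so the multiple-regression coefficient on x1 is exactly 2:

```python
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # exact, u = 0

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n

# Step 1: regress x1 on x2 by simple OLS and keep the residuals r_i1
d = sum((b - m2) * (a - m1) for a, b in zip(x1, x2)) / \
    sum((b - m2) ** 2 for b in x2)
c = m1 - d * m2
r = [a - (c + d * b) for a, b in zip(x1, x2)]

# Step 2: b1_hat = sum(r_i1 * y_i) / sum(r_i1 ** 2)
b1_partial = sum(ri * yi for ri, yi in zip(r, y)) / sum(ri ** 2 for ri in r)
```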
Simple vs Multiple Reg Estimate

Compare the simple regression ỹ = b̃0 + b̃1x1
with the multiple regression ŷ = b̂0 + b̂1x1 + b̂2x2
Generally, b̃1 ≠ b̂1 unless:
b̂2 = 0 (i.e. no partial effect of x2), OR
x1 and x2 are uncorrelated in the sample


Goodness-of-Fit
We can think of each observation as being made
up of an explained part, and an unexplained part,
yi = ŷi + ûi. We then define the following:
Σ (yi – ȳ)2 is the total sum of squares (SST)
Σ (ŷi – ȳ)2 is the explained sum of squares (SSE)
Σ ûi2 is the residual sum of squares (SSR)
Then SST = SSE + SSR
Goodness-of-Fit (continued)
How do we think about how well our
sample regression line fits our sample data?

Can compute the fraction of the total sum


of squares (SST) that is explained by the
model, call this the R-squared of regression

R2 = SSE/SST = 1 – SSR/SST



Goodness-of-Fit (continued)
We can also think of R2 as being equal to
the squared correlation coefficient between
the actual yi and the fitted values ŷi:

R2 = [Σ (yi – ȳ)(ŷi – ȳ)]2 / [Σ (yi – ȳ)2 Σ (ŷi – ȳ)2]


More about R-squared
R2 can never decrease when another
independent variable is added to a
regression, and usually will increase

Because R2 will usually increase with the


number of independent variables, it is not a
good way to compare models



Assumptions for Unbiasedness
Population model is linear in parameters:
y = b0 + b1x1 + b2x2 +…+ bkxk + u
We can use a random sample of size n,
{(xi1, xi2,…, xik, yi): i=1, 2, …, n}, from the
population model, so that the sample model
is yi = b0 + b1xi1 + b2xi2 +…+ bkxik + ui
E(u|x1, x2,… xk) = 0, implying that all of the
explanatory variables are exogenous
None of the x’s is constant, and there are no
exact linear relationships among them
Too Many or Too Few Variables
What happens if we include variables in
our specification that don’t belong?
There is no effect on our parameter
estimate, and OLS remains unbiased

What if we exclude a variable from our


specification that does belong?
OLS will usually be biased



Omitted Variable Bias
Suppose the true model is given as
y = b0 + b1x1 + b2x2 + u, but we
estimate ỹ = b̃0 + b̃1x1; then

b̃1 = Σ (xi1 – x̄1) yi / Σ (xi1 – x̄1)2


Omitted Variable Bias (cont)
Recall the true model, so that
yi = b0 + b1xi1 + b2xi2 + ui, so the
numerator becomes
Σ (xi1 – x̄1)(b0 + b1xi1 + b2xi2 + ui)
= b1 Σ (xi1 – x̄1)2 + b2 Σ (xi1 – x̄1) xi2
+ Σ (xi1 – x̄1) ui


Omitted Variable Bias (cont)

b̃1 = b1 + b2 [Σ (xi1 – x̄1) xi2 / Σ (xi1 – x̄1)2]
+ Σ (xi1 – x̄1) ui / Σ (xi1 – x̄1)2

Since E(ui) = 0, taking expectations we have

E(b̃1) = b1 + b2 [Σ (xi1 – x̄1) xi2 / Σ (xi1 – x̄1)2]


Omitted Variable Bias (cont)

Consider the regression of x2 on x1:
x̃2 = δ̃0 + δ̃1x1, then
δ̃1 = Σ (xi1 – x̄1) xi2 / Σ (xi1 – x̄1)2

so E(b̃1) = b1 + b2 δ̃1


Summary of Direction of Bias

Corr(x1, x2) > 0 Corr(x1, x2) < 0

b2 > 0 Positive bias Negative bias

b2 < 0 Negative bias Positive bias



Omitted Variable Bias Summary
Two cases where bias is equal to zero
 b2 = 0, that is x2 doesn’t really belong in model
 x1 and x2 are uncorrelated in the sample

If correlation between x2 , x1 and x2 , y is


the same direction, bias will be positive
If correlation between x2 , x1 and x2 , y is
the opposite direction, bias will be negative

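The bias formula can be checked on made-up noiseless data with true model y = 1 + 2x1 + 3x2 (so b1 = 2, b2 = 3), where the short regression of y on x1 alone reproduces b1 + b2·δ̃1 exactly:

```python
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # u = 0 for every i

n = len(x1)
m1, my = sum(x1) / n, sum(y) / n
sxx = sum((a - m1) ** 2 for a in x1)

# Short regression of y on x1 alone
b1_tilde = sum((a - m1) * (yi - my) for a, yi in zip(x1, y)) / sxx

# delta1 from regressing the omitted x2 on x1
m2 = sum(x2) / n
delta1 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / sxx

# Omitted variable bias: b1_tilde = b1 + b2 * delta1 = 2 + 3 * delta1
```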


The More General Case
Technically, can only sign the bias for the
more general case if all of the included x’s
are uncorrelated

Typically, then, we work through the bias


assuming the x’s are uncorrelated, as a
useful guide even if this assumption is not
strictly true



Variance of the OLS Estimators
Now we know that the sampling
distribution of our estimate is centered
around the true parameter
Want to think about how spread out this
distribution is
Much easier to think about this variance
under an additional assumption, so
Assume Var(u|x1, x2,…, xk) = s2
(Homoskedasticity)
Variance of OLS (cont)
Let x stand for (x1, x2,…xk)
Assuming that Var(u|x) = s2 also implies
that Var(y| x) = s2

The 4 assumptions for unbiasedness, plus


this homoskedasticity assumption are
known as the Gauss-Markov assumptions



Variance of OLS (cont)
Given the Gauss-Markov assumptions,

Var(b̂j) = s2 / [SSTj (1 – Rj2)], where

SSTj = Σ (xij – x̄j)2 and Rj2 is the R2
from regressing xj on all other x’s
Components of OLS Variances
The error variance: a larger s2 implies a
larger variance for the OLS estimators
The total sample variation: a larger SSTj
implies a smaller variance for the estimators
Linear relationships among the independent
variables: a larger Rj2 implies a larger
variance for the estimators



Misspecified Models

Consider again the misspecified model

ỹ = b̃0 + b̃1x1, so that Var(b̃1) = s2/SST1

Thus, Var(b̃1) < Var(b̂1) unless x1 and
x2 are uncorrelated; then they’re the same


Misspecified Models (cont)
While the variance of the estimator is
smaller for the misspecified model, unless
b2 = 0 the misspecified model is biased

As the sample size grows, the variance of


each estimator shrinks to zero, making the
variance difference less important



Estimating the Error Variance
We don’t know what the error variance, s2,
is, because we don’t observe the errors, ui

What we observe are the residuals, ûi

We can use the residuals to form an


estimate of the error variance



Error Variance Estimate (cont)
ŝ2 = Σ ûi2 / (n – k – 1) = SSR/df

thus, se(b̂j) = ŝ / [SSTj (1 – Rj2)]1/2

df = n – (k + 1), or df = n – k – 1
df (i.e. degrees of freedom) is the (number
of observations) – (number of estimated
parameters)
The Gauss-Markov Theorem
Given our 5 Gauss-Markov Assumptions it
can be shown that OLS is “BLUE”
Best
Linear
Unbiased
Estimator
Thus, if the assumptions hold, use OLS



Multiple Regression Analysis
y = b0 + b1x1 + b2x2 + . . . bkxk + u

2. Inference



Assumptions of the Classical
Linear Model (CLM)
So far, we know that given the Gauss-
Markov assumptions, OLS is BLUE,
In order to do classical hypothesis testing,
we need to add another assumption (beyond
the Gauss-Markov assumptions)
Assume that u is independent of x1, x2,…, xk
and u is normally distributed with zero
mean and variance s2: u ~ Normal(0,s2)



CLM Assumptions (cont)
Under CLM, OLS is not only BLUE, but is
the minimum variance unbiased estimator
We can summarize the population
assumptions of CLM as follows
y|x ~ Normal(b0 + b1x1 +…+ bkxk, s2)
While for now we just assume normality,
clear that sometimes not the case
Large samples will let us drop normality



The homoskedastic normal distribution with
a single explanatory variable
[Figure: identical normal conditional distributions of y centered on the line E(y|x) = b0 + b1x, shown at x1 and x2]
Normal Sampling Distributions
Under the CLM assumptions, conditional on
the sample values of the independent variables
b̂j ~ Normal(bj, Var(b̂j)), so that

(b̂j – bj) / sd(b̂j) ~ Normal(0,1)

b̂j is distributed normally because it
is a linear combination of the errors
The t Test
Under the CLM assumptions
(b̂j – bj) / se(b̂j) ~ t(n–k–1)

Note this is a t distribution (vs normal)
because we have to estimate s2 by ŝ2
Note the degrees of freedom: n – k – 1
The t Test (cont)
Knowing the sampling distribution for the
standardized estimator allows us to carry
out hypothesis tests
Start with a null hypothesis
For example, H0: bj=0
If accept null, then accept that xj has no
effect on y, controlling for other x’s



The t Test (cont)
To perform our test we first need to form
“the” t statistic for b̂j:

t(b̂j) = b̂j / se(b̂j)

We will then use our t statistic along with
a rejection rule to determine whether to
accept the null hypothesis, H0
t Test: One-Sided Alternatives
Besides our null, H0, we need an alternative
hypothesis, H1, and a significance level
H1 may be one-sided, or two-sided
H1: bj > 0 and H1: bj < 0 are one-sided
H1: bj ≠ 0 is a two-sided alternative
If we want to have only a 5% probability of
rejecting H0 if it is really true, then we say
our significance level is 5%



One-Sided Alternatives (cont)
Having picked a significance level, a, we
look up the (1 – a)th percentile in a t
distribution with n – k – 1 df and call this c,
the critical value
We can reject the null hypothesis if the t
statistic is greater than the critical value
If the t statistic is less than the critical value
then we fail to reject the null



One-Sided Alternatives (cont)
yi = b0 + b1xi1 + … + bkxik + ui

H0: bj = 0    H1: bj > 0

[Figure: t distribution with fail-to-reject region (1 – a) below the critical value c and rejection region (a) above c]
One-sided vs Two-sided
Because the t distribution is symmetric,
testing H1: bj < 0 is straightforward. The
critical value is just the negative of before
We can reject the null if the t statistic is < –c; if the t statistic is > –c, then we fail to reject the null
For a two-sided test, we set the critical value based on a/2 and reject H0 in favor of H1: bj ≠ 0 if the absolute value of the t statistic is > c
Two-Sided Alternatives
yi = b0 + b1Xi1 + … + bkXik + ui

H0: bj = 0    H1: bj ≠ 0

[Figure: t distribution with rejection regions of area a/2 in each tail beyond ±c and fail-to-reject region (1 – a) between –c and c]
Summary for H0: bj = 0
Unless otherwise stated, the alternative is
assumed to be two-sided
If we reject the null, we typically say “xj is
statistically significant at the a % level”
If we fail to reject the null, we typically say
“xj is statistically insignificant at the a %
level”



Testing other hypotheses
A more general form of the t statistic
recognizes that we may want to test
something like H0: bj = aj
In this case, the appropriate t statistic is

t = (bˆj – aj)/se(bˆj), where aj = 0 for the standard test


Confidence Intervals
Another way to use classical statistical testing is
to construct a confidence interval using the same
critical value as was used for a two-sided test
A (1 – a)% confidence interval is defined as

bˆj ± c·se(bˆj), where c is the (1 – a/2) percentile in a t(n–k–1) distribution
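A small numerical sketch of this interval (the estimate, standard error, and critical value are all hypothetical; 2.042 is approximately the 97.5th percentile of a t with 30 df):

```python
# Sketch of a 95% confidence interval: bj_hat ± c * se(bj_hat).
# All inputs below are assumed for illustration.
beta_hat, se_beta = 0.52, 0.21
c = 2.042        # ~97.5th percentile of t with 30 df (assumed)

lower = beta_hat - c * se_beta
upper = beta_hat + c * se_beta
print(round(lower, 4), round(upper, 4))
```

If the interval excludes zero, a two-sided 5% test would reject H0: bj = 0.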
Computing p-values for t tests
An alternative to the classical approach is
to ask, “what is the smallest significance
level at which the null would be rejected?”
So, compute the t statistic, and then look up
what percentile it is in the appropriate t
distribution – this is the p-value
p-value is the probability we would observe
the t statistic we did, if the null were true
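The lookup can be sketched with the large-df normal approximation, since the t distribution approaches the normal; the t statistic below is hypothetical and math.erf supplies the standard normal CDF:

```python
# Two-sided p-value for a t statistic via the normal approximation
# (reasonable for large df). The t statistic is hypothetical.
import math

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

t_stat = 2.476
p_value = 2 * (1 - norm_cdf(abs(t_stat)))
print(p_value)
```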



Stata and p-values, t tests, etc.
Most computer packages will compute the
p-value for you, assuming a two-sided test
If you really want a one-sided alternative,
just divide the two-sided p-value by 2
Stata provides the t statistic, p-value, and
95% confidence interval for H0: bj = 0 for
you, in columns labeled “t”, “P > |t|” and
“[95% Conf. Interval]”, respectively



Testing a Linear Combination
Suppose instead of testing whether b1 is equal to a
constant, you want to test if it is equal to another
parameter, that is H0 : b1 = b2
Use same basic procedure for forming a t statistic

t = (bˆ1 – bˆ2) / se(bˆ1 – bˆ2)
Testing Linear Combo (cont)
Since

se(bˆ1 – bˆ2) = √Var(bˆ1 – bˆ2), and

Var(bˆ1 – bˆ2) = Var(bˆ1) + Var(bˆ2) – 2Cov(bˆ1, bˆ2), then

se(bˆ1 – bˆ2) = {[se(bˆ1)]2 + [se(bˆ2)]2 – 2s12}1/2,

where s12 is an estimate of Cov(bˆ1, bˆ2)


Testing a Linear Combo (cont)
So, to use formula, need s12, which
standard output does not have
Many packages will have an option to get
it, or will just perform the test for you
In Stata, after reg y x1 x2 … xk you would
type test x1 = x2 to get a p-value for the test
More generally, you can always restate the
problem to get the test you want



Example:
Suppose you are interested in the effect of
campaign expenditures on outcomes
Model is voteA = b0 + b1log(expendA) +
b2log(expendB) + b3prtystrA + u
H0: b1 = - b2, or H0: q1 = b1 + b2 = 0
b1 = q1 – b2, so substitute in and rearrange:
 voteA = b0 + q1log(expendA) +
b2[log(expendB) – log(expendA)] + b3prtystrA + u
Example (cont):
This is the same model as originally, but
now you get a standard error for b1 – b2 = q1
directly from the basic regression
Any linear combination of parameters
could be tested in a similar manner
Other examples of hypotheses about a
single linear combination of parameters:
 b1 = 1 + b2 ; b1 = 5b2 ; b1 = -1/2b2 ; etc



Multiple Linear Restrictions
Everything we’ve done so far has involved
testing a single linear restriction, (e.g. b1 = 0
or b1 = b2 )
However, we may want to jointly test
multiple hypotheses about our parameters
A typical example is testing “exclusion
restrictions” – we want to know if a group
of parameters are all equal to zero



Testing Exclusion Restrictions
Now the null hypothesis might be
something like H0: bk-q+1 = 0, ... , bk = 0
The alternative is just H1: H0 is not true
Can’t just check each t statistic separately,
because we want to know if the q
parameters are jointly significant at a given
level – it is possible for none to be
individually significant at that level



Exclusion Restrictions (cont)
To do the test we need to estimate the “restricted
model” without xk-q+1,, …, xk included, as well as
the “unrestricted model” with all x’s included
Intuitively, we want to know if the change in SSR
is big enough to warrant inclusion of xk-q+1,, …, xk

F = [(SSRr – SSRur)/q] / [SSRur/(n – k – 1)],
where r is restricted and ur is unrestricted
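A numerical sketch of this formula (the SSRs, number of restrictions, and sample dimensions are all hypothetical):

```python
# F statistic for q exclusion restrictions, from restricted and
# unrestricted SSRs. All values below are assumed for illustration.
ssr_r, ssr_ur = 150.0, 120.0   # hypothetical restricted / unrestricted SSRs
q, n, k = 3, 100, 5            # restrictions, observations, regressors

f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
print(round(f_stat, 3))
```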
The F statistic
The F statistic is always positive, since the
SSR from the restricted model can’t be less
than the SSR from the unrestricted
Essentially the F statistic is measuring the
relative increase in SSR when moving from
the unrestricted to restricted model
q = number of restrictions, or dfr – dfur
n – k – 1 = dfur



The F statistic (cont)
To decide if the increase in SSR when we
move to a restricted model is “big enough”
to reject the exclusions, we need to know
about the sampling distribution of our F stat
Not surprisingly, F ~ Fq,n-k-1, where q is
referred to as the numerator degrees of
freedom and n – k – 1 as the denominator
degrees of freedom
The F statistic (cont)
[Figure: F distribution f(F), with fail-to-reject region (1 – a) below the critical value c and rejection region (a) above; reject H0 at the a significance level if F > c]
The R2 form of the F statistic
Because the SSR’s may be large and unwieldy, an
alternative form of the formula is useful
We use the fact that SSR = SST(1 – R2) for any
regression, so we can substitute in for SSRr and SSRur

F = [(R2ur – R2r)/q] / [(1 – R2ur)/(n – k – 1)],
where again r is restricted and ur is unrestricted
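The same hypothetical test in its R-squared form (the two R2 values and dimensions are assumed):

```python
# Equivalent R-squared form of the F statistic; r2_ur and r2_r are
# the unrestricted and restricted R^2 (hypothetical values).
r2_ur, r2_r = 0.40, 0.30
q, n, k = 3, 100, 5

f_stat = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))
print(round(f_stat, 3))
```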
Overall Significance
A special case of exclusion restrictions is to test
H0: b1 = b2 =…= bk = 0
Since the R2 from a model with only an intercept
will be zero, the F statistic is simply

F = [R2/k] / [(1 – R2)/(n – k – 1)]


General Linear Restrictions
The basic form of the F statistic will work
for any set of linear restrictions
First estimate the unrestricted model and
then estimate the restricted model
In each case, make note of the SSR
Imposing the restrictions can be tricky –
will likely have to redefine variables again



Example:
Use same voting model as before
Model is voteA = b0 + b1log(expendA) +
b2log(expendB) + b3prtystrA + u
now null is H0: b1 = 1, b3 = 0
Substituting in the restrictions: voteA = b0
+ log(expendA) + b2log(expendB) + u, so
Use voteA - log(expendA) = b0 +
b2log(expendB) + u as restricted model



F Statistic Summary
Just as with t statistics, p-values can be
calculated by looking up the percentile in
the appropriate F distribution
Stata will do this by entering: display
fprob(q, n – k – 1, F), where the appropriate
values of F, q,and n – k – 1 are used
If only one exclusion is being tested, then F
= t2, and the p-values will be the same



Multiple Regression Analysis

y = b0 + b1x1 + b2x2 + . . . bkxk + u

3. Asymptotic Properties



Consistency
Under the Gauss-Markov assumptions OLS
is BLUE, but in other cases it won’t always
be possible to find unbiased estimators
In those cases, we may settle for estimators
that are consistent, meaning as n → ∞, the
distribution of the estimator collapses to the
parameter value



Sampling Distributions as n → ∞

[Figure: sampling distributions of bˆ1 for sample sizes n1 < n2 < n3; as n grows, the distribution collapses around b1]
Consistency of OLS
Under the Gauss-Markov assumptions, the
OLS estimator is consistent (and unbiased)
Consistency can be proved for the simple
regression case in a manner similar to the
proof of unbiasedness
Will need to take probability limit (plim) to
establish consistency



Proving Consistency

bˆ1 = Σ(xi1 – x̄1)yi / Σ(xi1 – x̄1)2
  = b1 + [n^-1 Σ(xi1 – x̄1)ui] / [n^-1 Σ(xi1 – x̄1)2]

plim bˆ1 = b1 + Cov(x1, u)/Var(x1) = b1,
because Cov(x1, u) = 0


A Weaker Assumption
For unbiasedness, we assumed a zero
conditional mean – E(u|x1, x2,…,xk) = 0
For consistency, we can have the weaker
assumption of zero mean and zero
correlation – E(u) = 0 and Cov(xj,u) = 0, for
j = 1, 2, …, k
Without this assumption, OLS will be
biased and inconsistent!



Deriving the Inconsistency
Just as we could derive the omitted variable bias
earlier, now we want to think about the
inconsistency, or asymptotic bias, in this case

True model: y = b0 + b1x1 + b2x2 + v
You think: y = b0 + b1x1 + u, so that
u = b2x2 + v, and plim b~1 = b1 + b2d,
where d = Cov(x1, x2)/Var(x1)
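A minimal numerical sketch of the plim formula (all population moments below are hypothetical):

```python
# Omitted-variable inconsistency sketch: plim(b1_tilde) = b1 + b2 * delta,
# where delta = Cov(x1, x2) / Var(x1). All values are assumed.
b1, b2 = 0.8, 0.5
cov_x1_x2, var_x1 = 0.6, 2.0

delta = cov_x1_x2 / var_x1
plim_b1_tilde = b1 + b2 * delta
print(plim_b1_tilde)
```

With b2 > 0 and a positive correlation between x1 and x2, the asymptotic bias here is upward.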
Asymptotic Bias (cont)
So, thinking about the direction of the
asymptotic bias is just like thinking about
the direction of bias for an omitted variable
Main difference is that asymptotic bias uses
the population variance and covariance,
while bias uses the sample counterparts
Remember, inconsistency is a large sample problem – it doesn't go away as we add more data



Large Sample Inference
Recall that under the CLM assumptions,
the sampling distributions are normal, so we
could derive t and F distributions for testing
This exact normality was due to assuming
the population error distribution was normal
This assumption of normal errors implied
that the distribution of y, given the x’s, was
normal as well



Large Sample Inference (cont)
Easy to come up with examples for which
this exact normality assumption will fail
Any clearly skewed variable, like wages,
arrests, savings, etc. can’t be normal, since a
normal distribution is symmetric
Normality assumption not needed to
conclude OLS is BLUE, only for inference



Central Limit Theorem
Based on the central limit theorem, we can show
that OLS estimators are asymptotically normal
Asymptotic normality implies that P(Z < z) → F(z) as n → ∞, or P(Z < z) ≈ F(z)
The central limit theorem states that the
standardized average of any population with mean
m and variance s2 is asymptotically ~N(0,1), or
Z = (Ȳ – mY)/(s/√n) ~a Normal(0,1)
Asymptotic Normality
Under the Gauss-Markov assumptions,

(i) √n (bˆj – bj) ~a Normal(0, s2/aj2), where aj2 = plim(n^-1 Σ rˆij2)

(ii) sˆ2 is a consistent estimator of s2

(iii) (bˆj – bj)/se(bˆj) ~a Normal(0,1)


Asymptotic Normality (cont)
Because the t distribution approaches the normal distribution for large df, we can also say that

(bˆj – bj)/se(bˆj) ~a t(n–k–1)

Note that while we no longer need to assume normality with a large sample, we do still need homoskedasticity
Asymptotic Standard Errors
If u is not normally distributed, we sometimes
will refer to the standard error as an asymptotic
standard error, since

se(bˆj) = √{ sˆ2 / [SSTj(1 – Rj2)] }, which has the form se(bˆj) = cj/√n

So, we can expect standard errors to shrink at a rate proportional to the inverse of √n
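A tiny illustration of that rate, with a hypothetical constant cj: quadrupling the sample size halves the standard error.

```python
# se(bj_hat) ~ c_j / sqrt(n): standard errors shrink like 1/sqrt(n).
# c_j is a hypothetical constant.
import math

c_j = 1.2

def se(n):
    return c_j / math.sqrt(n)

print(se(100), se(400))
```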
Lagrange Multiplier statistic
With large samples, by relying on
asymptotic normality for inference, we can
use more than t and F stats
The Lagrange multiplier or LM statistic is
an alternative for testing multiple exclusion
restrictions
Because the LM statistic uses an auxiliary
regression it’s sometimes called an nR2 stat



LM Statistic (cont)
Suppose we have a standard model, y = b0 + b1x1
+ b2x2 + . . . bkxk + u and our null hypothesis is
H0: bk-q+1 = 0, ... , bk = 0
First, we just run the restricted model

y = b~0 + b~1x1 + … + b~k–q xk–q + u~

Now take the residuals, u~, and regress
u~ on x1, x2, …, xk (i.e. all the variables)

LM = nRu2, where Ru2 is from this regression
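A numerical sketch of the final step (n, the auxiliary R2, and the chi-squared critical value are all assumed; 5.991 is approximately the 5% critical value of a chi-squared with q = 2 df):

```python
# LM = n * R^2 from the auxiliary regression of the restricted-model
# residuals on all the x's. All values below are hypothetical.
n = 200
r2_u = 0.025          # R^2 from the auxiliary regression (assumed)
c = 5.991             # ~5% critical value of chi-squared, q = 2 df (assumed)

lm_stat = n * r2_u
reject = lm_stat > c
print(lm_stat, reject)
```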
LM Statistic (cont)
LM ~a c2q, so we can choose a critical value, c, from a c2q distribution, or just calculate a p-value for the c2q

With a large sample, the result from an F test and from an LM test should be similar
Unlike the F test and t test for one exclusion, the LM test and F test will not be identical


Asymptotic Efficiency
Estimators besides OLS will be consistent
However, under the Gauss-Markov
assumptions, the OLS estimators will have
the smallest asymptotic variances
We say that OLS is asymptotically efficient
Important to remember our assumptions
though, if not homoskedastic, not true



Multiple Regression Analysis

y = b0 + b1x1 + b2x2 + . . . bkxk + u

4. Further Issues



Redefining Variables
Changing the scale of the y variable will
lead to a corresponding change in the scale
of the coefficients and standard errors, so no
change in the significance or interpretation
Changing the scale of one x variable will
lead to a change in the scale of that
coefficient and standard error, so no change
in the significance or interpretation
Beta Coefficients
Occasional you’ll see reference to a
“standardized coefficient” or “beta
coefficient” which has a specific meaning
Idea is to replace y and each x variable with
a standardized version – i.e. subtract mean
and divide by standard deviation
Coefficient reflects standard deviation of y
for a one standard deviation change in x
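Equivalently, a beta coefficient can be obtained from the original slope by rescaling with the two standard deviations; a sketch with hypothetical values:

```python
# Beta (standardized) coefficient sketch: beta_j = b_j * sd(x_j) / sd(y).
# The slope and standard deviations below are assumed.
b_j = 0.5
sd_x, sd_y = 2.0, 4.0

beta_j = b_j * sd_x / sd_y
print(beta_j)
```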



Functional Form
OLS can be used for relationships that are
not strictly linear in x and y by using
nonlinear functions of x and y – will still be
linear in the parameters
Can take the natural log of x, y or both
Can use quadratic forms of x
Can use interactions of x variables



Interpretation of Log Models
If the model is ln(y) = b0 + b1ln(x) + u
b1 is the elasticity of y with respect to x
If the model is ln(y) = b0 + b1x + u
b1 is approximately the percentage change
in y given a 1 unit change in x
If the model is y = b0 + b1ln(x) + u
b1 is approximately the change in y for a
100 percent change in x
Why use log models?
Log models are invariant to the scale of the variables, since coefficients measure percent changes
They give a direct estimate of elasticity
For models with y > 0, the conditional
distribution is often heteroskedastic or
skewed, while ln(y) is much less so
The distribution of ln(y) is more narrow,
limiting the effect of outliers



Some Rules of Thumb
What types of variables are often used in
log form?
Dollar amounts that must be positive
Very large variables, such as population
What types of variables are often used in
level form?
Variables measured in years
Variables that are a proportion or percent
Quadratic Models
For a model of the form y = b0 + b1x + b2x2 + u
we can’t interpret b1 alone as measuring the
change in y with respect to x, we need to take into
account b2 as well, since


Δŷ = (bˆ1 + 2bˆ2x)Δx, so Δŷ/Δx = bˆ1 + 2bˆ2x


More on Quadratic Models
Suppose that the coefficient on x is positive and
the coefficient on x2 is negative
Then y is increasing in x at first, but will
eventually turn around and be decreasing in x

For bˆ1 > 0 and bˆ2 < 0 the turning point will be at x* = |bˆ1/(2bˆ2)|
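A quick numerical sketch of the turning point (the coefficients below are hypothetical, with bˆ1 > 0 and bˆ2 < 0, giving an inverted-U shape):

```python
# Turning point of y = b0 + b1*x + b2*x^2 at x* = |b1 / (2*b2)|.
# Hypothetical coefficients: b1 > 0, b2 < 0.
b1, b2 = 0.6, -0.03

x_star = abs(b1 / (2 * b2))
print(x_star)
```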
More on Quadratic Models
Suppose that the coefficient on x is negative and
the coefficient on x2 is positive
Then y is decreasing in x at first, but will
eventually turn around and be increasing in x
For bˆ1 < 0 and bˆ2 > 0 the turning point will be at x* = |bˆ1/(2bˆ2)|, which is the same as when bˆ1 > 0 and bˆ2 < 0
Interaction Terms
For a model of the form y = b0 + b1x1 + b2x2 +
b3x1x2 + u we can’t interpret b1 alone as measuring
the change in y with respect to x1, we need to take
into account b3 as well, since
Δy/Δx1 = b1 + b3x2, so to summarize
the effect of x1 on y we typically
evaluate the above at the sample mean of x2
Adjusted R-Squared
Recall that the R2 will always increase as more
variables are added to the model
The adjusted R2 takes into account the number of
variables in a model, and may decrease

R̄2 = 1 – [SSR/(n – k – 1)] / [SST/(n – 1)]
   = 1 – sˆ2 / [SST/(n – 1)]
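A numerical sketch of the adjustment (the sums of squares and dimensions are hypothetical):

```python
# Adjusted R-squared sketch: R2_adj = 1 - [SSR/(n-k-1)] / [SST/(n-1)].
# All inputs below are assumed for illustration.
ssr, sst = 60.0, 100.0
n, k = 51, 2

r2 = 1 - ssr / sst
r2_adj = 1 - (ssr / (n - k - 1)) / (sst / (n - 1))
print(r2, r2_adj)
```

Note the adjusted value is smaller than R2, since the penalty for k regressors kicks in.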
Adjusted R-Squared (cont)
It’s easy to see that the adjusted R2 is just (1
– R2)(n – 1) / (n – k – 1), but most packages
will give you both R2 and adj-R2
You can compare the fit of 2 models (with
the same y) by comparing the adj-R2
You cannot use the adj-R2 to compare
models with different y’s (e.g. y vs. ln(y))



Goodness of Fit
Important not to fixate too much on adj-R2
and lose sight of theory and common sense
If economic theory clearly predicts a
variable belongs, generally leave it in
Don’t want to include a variable that
prohibits a sensible interpretation of the
variable of interest – remember ceteris
paribus interpretation of multiple regression



Standard Errors for Predictions
Suppose we want to use our estimates to obtain a specific prediction
First, suppose that we want an estimate of
E(y|x1 = c1, …, xk = ck) = q0 = b0 + b1c1 + … + bkck
This is easy to obtain by substituting the x's in our estimated model with the c's, but what about a standard error?
Really just a test of a linear combination
Predictions (cont)
Can rewrite as b0 = q0 – b1c1 – … – bkck
Substitute in to obtain y = q0 + b1 (x1 - c1) +
… + bk (xk - ck) + u
So, if you regress yi on (xij - cij) the
intercept will give the predicted value and
its standard error
Note that the standard error will be smallest
when the c’s equal the means of the x’s



Predictions (cont)
This standard error for the expected value is not
the same as a standard error for an outcome on y
We need to also take into account the variance in
the unobserved error. Let the prediction error be
 
eˆ0 = y0 – ŷ0 = (b0 + b1x10 + … + bkxk0 + u0) – ŷ0

E(eˆ0) = 0 and Var(eˆ0) = Var(ŷ0) + Var(u0) = Var(ŷ0) + s2,

so se(eˆ0) = {[se(ŷ0)]2 + sˆ2}1/2


Prediction interval
 
eˆ0/se(eˆ0) ~ t(n–k–1), so given that eˆ0 = y0 – ŷ0,
we have a 95% prediction interval for y0:

ŷ0 ± t.025 · se(eˆ0)

Usually the estimate of s2 is much larger than the variance of the prediction, thus
This prediction interval will be a lot wider than the
simple confidence interval for the prediction
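A numerical sketch of the interval (the fitted value, its standard error, the error variance, and the t critical value are all hypothetical):

```python
# Prediction interval sketch: se(e0) = sqrt(se(yhat0)^2 + sigma2_hat),
# then yhat0 +/- t_crit * se(e0). All inputs are assumed.
import math

y_hat0 = 10.0         # predicted value at the chosen c's
se_yhat0 = 0.5        # standard error of the expected value
sigma2_hat = 4.0      # estimated error variance
t_crit = 2.0          # approximate t(.025) critical value (assumed)

se_e0 = math.sqrt(se_yhat0 ** 2 + sigma2_hat)
lower = y_hat0 - t_crit * se_e0
upper = y_hat0 + t_crit * se_e0
print(round(lower, 3), round(upper, 3))
```

Here sigma2_hat dominates se_yhat0 squared, which is why the prediction interval is much wider than the confidence interval for E(y|x).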


Residual Analysis
Information can be obtained from looking
at the residuals (i.e. predicted vs. observed)
Example: Regress price of cars on
characteristics – big negative residuals
indicate a good deal
Example: Regress average earnings for
students from a school on student
characteristics – big positive residuals
indicate greatest value-added
Predicting y in a log model
Simple exponentiation of the predicted ln(y) will
underestimate the expected value of y
Instead need to scale this up by an estimate of the
expected value of exp(u)

E exp(u )   exp( 2) if u ~ N 0,   2

In this case can predict y as follows
 
yˆ  exp ˆ 2 2 expln̂ y 
Economics 20 - Prof. Anderson 20
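A sketch of the normal-errors correction (the fitted log value and estimated error variance are hypothetical):

```python
# Log-model prediction sketch: y_hat = exp(sigma2_hat/2) * exp(log_y_hat).
# All values below are assumed.
import math

log_y_hat = 2.0       # fitted ln(y)
sigma2_hat = 0.36     # estimated error variance

naive = math.exp(log_y_hat)              # understates E(y|x)
y_hat = math.exp(sigma2_hat / 2) * naive
print(y_hat > naive)
```

The correction factor exp(sˆ2/2) always exceeds one, so the adjusted prediction is larger than simple exponentiation.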
Predicting y in a log model
If u is not normal, E(exp(u)) must be
estimated using an auxiliary regression
Create the exponentiation of the predicted
ln(y), and regress y on it with no intercept
The coefficient on this variable is the
estimate of E(exp(u)) that can be used to
scale up the exponentiation of the predicted
ln(y) to obtain the predicted y



Comparing log and level models
A by-product of the previous procedure is a
method to compare a model in logs with
one in levels.
Take the fitted values from the auxiliary
regression, and find the sample correlation
between this and y
Compare the R2 from the levels regression
with this correlation squared





Multiple Regression Analysis

y = b0 + b1x1 + b2x2 + . . . bkxk + u

6. Heteroskedasticity



What is Heteroskedasticity
Recall the assumption of homoskedasticity
implied that conditional on the explanatory
variables, the variance of the unobserved error, u,
was constant
If this is not true, that is if the variance of u is
different for different values of the x’s, then the
errors are heteroskedastic
Example: when estimating returns to education, ability is unobservable, and we might think the variance in ability differs by educational attainment



Example of Heteroskedasticity
[Figure: conditional distributions f(y|x) centered on E(y|x) = b0 + b1x, with the variance of y increasing across x1, x2, x3]
Why Worry About
Heteroskedasticity?
OLS is still unbiased and consistent, even if
we do not assume homoskedasticity
The standard errors of the estimates are
biased if we have heteroskedasticity
If the standard errors are biased, we cannot use the usual t, F, or LM statistics for drawing inferences



Variance with Heteroskedasticity

For the simple case,

bˆ1 = b1 + Σ(xi – x̄)ui / Σ(xi – x̄)2, so

Var(bˆ1) = Σ(xi – x̄)2 si2 / (SSTx)2, where SSTx = Σ(xi – x̄)2

A valid estimator for this when si2 ≠ s2 is

Σ(xi – x̄)2 uˆi2 / (SSTx)2, where the uˆi are the OLS residuals


Variance with Heteroskedasticity

For the general multiple regression model, a valid estimator of Var(bˆj) with heteroskedasticity is

Varˆ(bˆj) = Σ rˆij2 uˆi2 / (SSTj)2,

where rˆij is the ith residual from regressing xj on all other independent variables, and SSTj is the sum of squared residuals from this regression


Robust Standard Errors
Now that we have a consistent estimate of
the variance, the square root can be used as
a standard error for inference
Typically call these robust standard errors
Sometimes the estimated variance is
corrected for degrees of freedom by
multiplying by n/(n – k – 1)
As n → ∞ it’s all the same, though



Robust Standard Errors (cont)
Important to remember that these robust
standard errors only have asymptotic
justification – with small sample sizes t
statistics formed with robust standard errors
will not have a distribution close to the t,
and inferences will not be correct
In Stata, robust standard errors are easily
obtained using the robust option of reg
A Robust LM Statistic
Run OLS on the restricted model and save the
residuals ŭ
Regress each of the excluded variables on all of
the included variables (q different regressions) and
save each set of residuals ř1, ř2, …, řq
Regress a variable defined to be = 1 on ř1 ŭ, ř2 ŭ,
…, řq ŭ, with no intercept
The LM statistic is n – SSR1, where SSR1 is the
sum of squared residuals from this final regression



Testing for Heteroskedasticity
Essentially we want to test H0: Var(u|x1, x2, …, xk) = s2, which is equivalent to H0: E(u2|x1, x2, …, xk) = E(u2) = s2
If we assume the relationship between u2 and the xj is linear, we can test this as a linear restriction
So, for u2 = d0 + d1x1 + … + dkxk + v, this means testing H0: d1 = d2 = … = dk = 0



The Breusch-Pagan Test
Don’t observe the error, but can estimate it with
the residuals from the OLS regression
After regressing the residuals squared on all of the
x’s, can use the R2 to form an F or LM test
The F statistic is just the reported F statistic for overall significance of the regression, F = [R²/k]/[(1 – R²)/(n – k – 1)], which is distributed Fk, n – k – 1
The LM statistic is LM = nR², which is distributed χ²k
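The LM version of the Breusch-Pagan test can be sketched with a single regressor. The data-generating process is illustrative, with error variance deliberately rising in x so the test has something to find:

```python
import random

# Breusch-Pagan sketch: regress squared OLS residuals on x, form LM = n*R^2.
random.seed(2)
n = 400
x = [random.uniform(1, 5) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, xi) for xi in x]   # variance rises with x

def ols_simple(x, y):
    """Simple OLS of y on a constant and x; returns (b0, b1, residuals)."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
         sum((a - xbar) ** 2 for a in x)
    b0 = ybar - b1 * xbar
    return b0, b1, [b - b0 - b1 * a for a, b in zip(x, y)]

_, _, u = ols_simple(x, y)
u2 = [e ** 2 for e in u]

_, _, v = ols_simple(x, u2)          # residuals from regressing u^2 on x
ssr = sum(e ** 2 for e in v)
u2bar = sum(u2) / n
sst = sum((e - u2bar) ** 2 for e in u2)
r_squared = 1 - ssr / sst
lm = n * r_squared                   # compare to chi-squared with k = 1 df
print(round(lm, 2))
```

With this much heteroskedasticity built in, the statistic should land well above the 5% critical value of 3.84 for one degree of freedom.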



The White Test
The Breusch-Pagan test will detect any
linear forms of heteroskedasticity
The White test allows for nonlinearities by
using squares and crossproducts of all the x’s
Still just using an F or LM test of whether all the xj, xj², and xjxh are jointly significant
This can get to be unwieldy pretty quickly



Alternate form of the White test
Consider that the fitted values from OLS, ŷ, are a function of all the x’s
Thus, ŷ² will be a function of the squares and cross products, so ŷ and ŷ² can proxy for all of the xj, xj², and xjxh
Regress the residuals squared on ŷ and ŷ² and use the R² to form an F or LM statistic
Note we are only testing 2 restrictions now
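This special form of the White test needs one auxiliary regression with two regressors (ŷ and ŷ²) plus a constant. The tiny linear-system solver and data-generating process below are illustrative scaffolding, not part of the test itself:

```python
import random

# Special-case White test: regress squared OLS residuals on yhat and yhat^2,
# then form LM = n * R^2 with 2 df. DGP and solver are illustrative.
random.seed(3)
n = 400
x = [random.uniform(1, 5) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, xi) for xi in x]

xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
     sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * a for a in x]
u2 = [(b - f) ** 2 for b, f in zip(y, yhat)]

# Regress u2 on a constant, yhat, and yhat^2 via the normal equations.
cols = [[1.0] * n, yhat, [f * f for f in yhat]]
xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
xtz = [sum(c[t] * u2[t] for t in range(n)) for c in cols]

def solve3(a, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    m = [row[:] + [v] for row, v in zip(a, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, 3):
            f = m[r][i] / m[i][i]
            m[r] = [mr - f * mi for mr, mi in zip(m[r], m[i])]
    sol = [0.0] * 3
    for i in range(2, -1, -1):
        sol[i] = (m[i][3] - sum(m[i][j] * sol[j]
                                for j in range(i + 1, 3))) / m[i][i]
    return sol

g = solve3(xtx, xtz)
fit = [g[0] + g[1] * f + g[2] * f * f for f in yhat]
u2bar = sum(u2) / n
r_squared = 1 - sum((a - b) ** 2 for a, b in zip(u2, fit)) / \
                sum((a - u2bar) ** 2 for a in u2)
lm = n * r_squared                   # compare to chi-squared with 2 df
print(round(lm, 2))
```

Here the 5% critical value is 5.99 (two restrictions), and the built-in heteroskedasticity should push the statistic well past it.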



Weighted Least Squares
While it’s always possible to estimate
robust standard errors for OLS estimates, if
we know something about the specific form
of the heteroskedasticity, we can obtain
more efficient estimates than OLS
The basic idea is going to be to transform
the model into one that has homoskedastic
errors – called weighted least squares
Case of form being known up to
a multiplicative constant
Suppose the heteroskedasticity can be
modeled as Var(u|x) = σ²h(x), where the
trick is to figure out what h(x) ≡ hi looks like
E(ui/√hi|x) = 0, because hi is only a function
of x, and Var(ui/√hi|x) = σ², because we
know Var(u|x) = σ²hi
So, if we divided our whole equation by √hi
we would have a model where the error is
homoskedastic
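The division by √hi can be carried out explicitly. The sketch below uses the illustrative choice h(x) = x, so Var(u|x) = σ²x, and runs OLS on the transformed variables with no intercept (the constant gets transformed too):

```python
import random

# Transform-then-OLS sketch with the illustrative choice h(x) = x.
random.seed(4)
n = 300
x = [random.uniform(1, 5) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, xi ** 0.5) for xi in x]  # Var(u|x) = x

w = [xi ** 0.5 for xi in x]                   # sqrt(h_i)
ys = [yi / wi for yi, wi in zip(y, w)]        # y_i / sqrt(h_i)
c  = [1 / wi for wi in w]                     # transformed intercept column
xs = [xi / wi for xi, wi in zip(x, w)]        # x_i / sqrt(h_i)

# OLS of ys on c and xs with NO intercept: solve the 2x2 normal equations.
scc = sum(a * a for a in c)
scx = sum(a * b for a, b in zip(c, xs))
sxx = sum(a * a for a in xs)
scy = sum(a * b for a, b in zip(c, ys))
sxy = sum(a * b for a, b in zip(xs, ys))
det = scc * sxx - scx * scx
b0 = (scy * sxx - scx * sxy) / det
b1 = (scc * sxy - scx * scy) / det
print(round(b0, 3), round(b1, 3))
```

The estimates should recover the true intercept (1) and slope (2) up to sampling error, now with homoskedastic transformed errors.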
Generalized Least Squares
Estimating the transformed equation by
OLS is an example of generalized least
squares (GLS)
GLS will be BLUE in this case
GLS is a weighted least squares (WLS)
procedure where each squared residual is
weighted by the inverse of Var(ui|xi)



Weighted Least Squares
While it is intuitive to see why performing
OLS on a transformed equation is
appropriate, it can be tedious to do the
transformation
Weighted least squares is a way of getting
the same thing, without the transformation
Idea is to minimize the weighted sum of
squares (weighted by 1/hi)
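For simple regression, minimizing the weighted sum of squares has a closed form in terms of weighted means, with no transformed variables anywhere. Again h(x) = x is only an illustrative choice:

```python
import random

# Direct WLS sketch: minimize sum of w_i*(y_i - b0 - b1*x_i)^2 with
# w_i = 1/h_i, using the closed-form weighted-means solution.
random.seed(4)
n = 300
x = [random.uniform(1, 5) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, xi ** 0.5) for xi in x]

w = [1 / xi for xi in x]                      # weights 1/h_i with h(x) = x
sw = sum(w)
xw = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
yw = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
b1 = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y)) / \
     sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
b0 = yw - b1 * xw
print(round(b0, 3), round(b1, 3))
```

The first-order conditions of this weighted minimization are exactly the normal equations of OLS on the √hi-transformed equation, which is why the two routes give identical estimates.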



More on WLS
WLS is great if we know what Var(ui|xi)
looks like
In most cases, we won’t know the form of
the heteroskedasticity
One example where we do is when the data
are aggregated but the model is specified at
the individual level
Then we want to weight each aggregate
observation by the inverse of the number of
individuals



Feasible GLS
More typical is the case where you don’t
know the form of the heteroskedasticity
In this case, you need to estimate h(xi)
Typically, we start with the assumption of a
fairly flexible model, such as
Var(u|x) = σ²exp(d0 + d1x1 + …+ dkxk)
Since we don’t know the dj, we must estimate them



Feasible GLS (continued)
Our assumption implies that u² = σ²exp(d0
+ d1x1 + …+ dkxk)v
where E(v|x) = 1; then if E(v) = 1,
ln(u²) = a0 + d1x1 + …+ dkxk + e
where E(e) = 0 and e is independent of x
Now, we know that û is an estimate of u, so
we can estimate this by OLS



Feasible GLS (continued)
Now, an estimate of h is obtained as ĥ =
exp(ĝ), and the inverse of this is our weight
So, what did we do?
Run the original OLS model, save the
residuals, û, square them and take the log
Regress ln(û2) on all of the independent
variables and get the fitted values, ĝ
Do WLS using 1/exp(ĝ) as the weight
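The three steps just listed can be run end to end in a short sketch. The data-generating process, Var(u|x) = exp(1 + 0.5x), is illustrative:

```python
import math
import random

# Feasible GLS sketch for a single regressor:
# 1) OLS, save residuals; 2) regress ln(u^2) on x; 3) WLS with 1/exp(g-hat).
random.seed(5)
n = 500
x = [random.uniform(0, 3) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, math.exp(0.5 * (1 + 0.5 * xi)))
     for xi in x]                      # Var(u|x) = exp(1 + 0.5x)

def ols_simple(x, y):
    """Simple OLS of y on a constant and x; returns (b0, b1, residuals)."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
         sum((a - xbar) ** 2 for a in x)
    b0 = ybar - b1 * xbar
    return b0, b1, [b - b0 - b1 * a for a, b in zip(x, y)]

# Step 1: original OLS; save residuals, square them, take the log
_, _, u = ols_simple(x, y)
logu2 = [math.log(e ** 2) for e in u]

# Step 2: regress ln(u^2) on x and form the fitted values g-hat
a0, a1, _ = ols_simple(x, logu2)
g = [a0 + a1 * xi for xi in x]

# Step 3: WLS with weight 1/exp(g-hat), via the weighted-means formulas
w = [1 / math.exp(gi) for gi in g]
sw = sum(w)
xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
b1 = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y)) / \
     sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
b0 = yw - b1 * xw
print(round(b0, 3), round(b1, 3))
```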



WLS Wrapup
When doing F tests with WLS, form the weights
from the unrestricted model and use those weights to
do WLS on the restricted model as well as the
unrestricted model
Remember we are using WLS just for efficiency –
OLS is still unbiased & consistent
The WLS and OLS estimates will still differ due to
sampling error, but if they are very different then it’s likely that
some other Gauss-Markov assumption is false

