Statistics
(i.e. things you learned in Ec 10 and
need to remember to do well in this
class!)
Var(X) = E[(X − μ_X)²] = σ_X²
Economics 20 - Prof. Anderson 5
More on Variance
The square root of Var(X) is the standard
deviation of X
Var(X) can alternatively be written in terms
of a weighted sum of squared deviations,
because
E[(X − μ_X)²] = Σᵢ (xᵢ − μ_X)² f(xᵢ)
The correlation between X and Y is

Corr(X, Y) = ρ_XY = σ_XY / (σ_X σ_Y) = Cov(X, Y) / √(Var(X) Var(Y))

The Normal Distribution
A normal random variable X ~ N(μ, σ²) has pdf

f(x) = [1 / (σ√(2π))] exp(−(x − μ)² / (2σ²))
The Standard Normal
Any random variable can be "standardized" by
subtracting its mean, μ, and dividing by its
standard deviation, σ, so that E(Z) = 0 and Var(Z) = 1
Thus, the standard normal, N(0, 1), has pdf

φ(z) = (1/√(2π)) exp(−z²/2)
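As a quick check of the density and the standardization above, here is a minimal Python sketch (not part of the original slides; it uses the standard library's `statistics.NormalDist`):

```python
import math
from statistics import NormalDist

def std_normal_pdf(z):
    """phi(z) = (1 / sqrt(2*pi)) * exp(-z**2 / 2)"""
    return math.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)

def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma, so E(Z) = 0 and Var(Z) = 1."""
    return (x - mu) / sigma

# The hand-written density matches the stdlib implementation
for z in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert abs(std_normal_pdf(z) - NormalDist(0, 1).pdf(z)) < 1e-12
```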
Properties of the Normal
If X ~ N(μ, σ²), then aX + b ~ N(aμ + b, a²σ²)
A linear combination of independent,
identically distributed (iid) normal random
variables will also be normally distributed
If Y₁, Y₂, …, Yₙ are iid and ~ N(μ, σ²), then

Ȳ ~ N(μ, σ²/n)
Cumulative Distribution Function
For a discrete pdf, f(x), where f(x) is P(X = x), the
cumulative distribution function (cdf), F(x),
is P(X ≤ x); P(X > x) = 1 − F(x), and for a
symmetric distribution P(X > x) = P(X < −x)
For the standard normal, φ(z), the cdf is
Φ(z) = P(Z < z), so
P(|Z| > a) = 2P(Z > a) = 2[1 − Φ(a)]
P(a ≤ Z ≤ b) = Φ(b) − Φ(a)
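The tail and interval identities above can be verified numerically with the standard library's normal cdf (an illustrative sketch, with a = 1.96 and b = 2.5 chosen arbitrarily):

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # standard normal; Z.cdf is Phi

a, b = 1.96, 2.5
two_tail = 2 * (1 - Z.cdf(a))     # P(|Z| > a) = 2[1 - Phi(a)]
interval = Z.cdf(b) - Z.cdf(a)    # P(a <= Z <= b) = Phi(b) - Phi(a)

# By symmetry of the standard normal, P(Z > a) = P(Z < -a)
assert abs((1 - Z.cdf(a)) - Z.cdf(-a)) < 1e-12
```

For a = 1.96, `two_tail` is very close to 0.05, which is why 1.96 is the familiar 5% two-sided critical value.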
The t Distribution
If Z ~ N(0, 1) and X ~ χ²ₙ are independent, then

T = Z / √(X/n)

has a t distribution with n degrees of freedom
The F Distribution
If a random variable, F, has an F distribution with
(k₁, k₂) df, then it is denoted as F ~ F(k₁, k₂)
F is a function of independent X₁ ~ χ²(k₁) and X₂ ~ χ²(k₂) as follows:

F = (X₁/k₁) / (X₂/k₂)

Estimators: the Sample Average
The sample average is an estimator of the population mean μ_Y:

Ȳ = (1/n) Σᵢ₌₁ⁿ Yᵢ
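The ratio-of-chi-squares construction can be simulated directly. A minimal sketch (the df values and sample size are illustrative, and χ² draws are built as sums of squared standard normals):

```python
import random

random.seed(0)

def chi2(k):
    """A chi-squared(k) draw as the sum of k squared standard normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(k))

k1, k2 = 5, 10
draws = [(chi2(k1) / k1) / (chi2(k2) / k2) for _ in range(20000)]
mean_F = sum(draws) / len(draws)
# E(F) = k2 / (k2 - 2) = 1.25 when k2 > 2
```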
What Makes a Good Estimator?
Unbiasedness
Efficiency
Mean Square Error (MSE)

Unbiasedness of the Sample Average
E(Ȳ) = μ_Y, since

E(Ȳ) = E[(1/n) Σᵢ Yᵢ] = (1/n) Σᵢ E(Yᵢ) = (1/n) Σᵢ μ_Y = (1/n) n μ_Y = μ_Y

Similarly,

Var(Ȳ) = Var[(1/n) Σᵢ Yᵢ] = (1/n²) Σᵢ Var(Yᵢ) = (1/n²) n σ² = σ²/n
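Both results, E(Ȳ) = μ_Y and Var(Ȳ) = σ²/n, can be seen in a simulation sketch (the population parameters and sample counts below are arbitrary illustrative choices):

```python
import random
from statistics import mean, pvariance

random.seed(1)
mu, sigma, n = 5.0, 2.0, 25

# 4000 independent samples of size n; compute the average of each
ybars = [mean(random.gauss(mu, sigma) for _ in range(n)) for _ in range(4000)]

avg_of_ybars = mean(ybars)        # close to mu (unbiasedness)
var_of_ybars = pvariance(ybars)   # close to sigma**2 / n = 0.16
```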
MSE of Estimator
What if we can't find an unbiased estimator?
Define the mean square error as E[(W − θ)²]
We get a trade-off between unbiasedness and
efficiency, since MSE = variance + bias²
For our example, that means minimizing

E[(Ȳ − μ_Y)²] = Var(Ȳ) + [E(Ȳ) − μ_Y]²

Consistency
The sample average is a consistent estimator of μ_Y, meaning that for any ε > 0,
P(|Ȳ − μ_Y| > ε) → 0 as n → ∞
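The decomposition MSE = variance + bias² can be checked numerically with a deliberately biased estimator (the shrinkage factor 0.9 and the parameters are illustrative, not from the slides):

```python
import random
from statistics import mean, pvariance

random.seed(2)
mu, n = 5.0, 20

# A deliberately biased shrinkage estimator: W = 0.9 * Ybar
ws = [0.9 * mean(random.gauss(mu, 2.0) for _ in range(n)) for _ in range(5000)]

mse = mean((w - mu) ** 2 for w in ws)
bias = mean(ws) - mu                       # about 0.9*mu - mu = -0.5
decomposition = pvariance(ws) + bias ** 2  # variance + bias**2
```

The two quantities agree (the identity is exact for the simulated draws, not just approximate).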
More on Consistency
An unbiased estimator is not necessarily
consistent – suppose we choose Y₁ as our estimate
of μ_Y; then E(Y₁) = μ_Y, but plim(Y₁) ≠ μ_Y
An unbiased estimator, W, is consistent if
Var(W) → 0 as n → ∞
The Law of Large Numbers refers to the
consistency of the sample average as an estimator
for μ, that is, to the fact that:
plim(Ȳ) = μ_Y
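A small sketch of the Law of Large Numbers in action, using a non-normal population (the exponential distribution and sample sizes are illustrative choices):

```python
import random
from statistics import mean

random.seed(3)
mu = 2.0

# A non-normal population: exponential with mean mu
draws = [random.expovariate(1 / mu) for _ in range(100000)]

ybar_small = mean(draws[:10])    # noisy with only 10 observations
ybar_large = mean(draws)         # very close to mu with 100000
```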
Central Limit Theorem
Asymptotic normality implies that P(Z < z) → Φ(z)
as n → ∞, or P(Z < z) ≈ Φ(z)
The central limit theorem states that the
standardized average of any population with mean
μ and variance σ² is asymptotically ~ N(0, 1), or

Z = (Ȳ − μ_Y) / (σ/√n) ~ᵃ N(0, 1)
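The CLT can be illustrated by standardizing averages drawn from a clearly non-normal population; a sketch (the uniform(0, 1) population and n = 50 are illustrative):

```python
import math
import random
from statistics import mean

random.seed(4)
n = 50
mu, sigma = 0.5, math.sqrt(1 / 12)  # uniform(0, 1): mean 1/2, variance 1/12

# Standardized sample averages from a decidedly non-normal population
zs = [(mean(random.random() for _ in range(n)) - mu) / (sigma / math.sqrt(n))
      for _ in range(5000)]

# If Z is approximately N(0, 1), about 95% of draws fall inside +/- 1.96
coverage = sum(abs(z) < 1.96 for z in zs) / len(zs)
```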
Estimate of Population Variance
We have a good estimate of μ_Y; we would like
a good estimate of σ²_Y
We can use the sample variance given below –
note the division by n − 1, not n, since the mean is
estimated too – if we know μ we can divide by n

S² = [1/(n − 1)] Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²
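The n − 1 versus n distinction is exactly the difference between `statistics.variance` and `statistics.pvariance` in Python. A simulation sketch of the bias (parameters are illustrative):

```python
import random
from statistics import mean, variance, pvariance

random.seed(5)
sigma2, n = 4.0, 5

s2_vals, biased_vals = [], []
for _ in range(20000):
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    s2_vals.append(variance(sample))       # divides by n - 1
    biased_vals.append(pvariance(sample))  # divides by n

avg_s2 = mean(s2_vals)          # close to sigma2 = 4 (unbiased)
avg_biased = mean(biased_vals)  # close to (n - 1)/n * sigma2 = 3.2
```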
Estimators as Random Variables
Each of our sample statistics (e.g. the
sample mean, sample variance, etc.) is a
random variable - Why?
Each time we pull a random sample, we’ll
get different sample statistics
If we pull lots and lots of samples, we’ll get
a distribution of sample statistics
The Simple Regression Model
Example: Earnings = β₀ + β₁·education + u
In general: y = β₀ + β₁x + u, with E(u) = 0

[Figure: the population regression line E(y|x) = β₀ + β₁x, shown at two values x₁ and x₂]
Ordinary Least Squares
The basic idea of regression is to estimate the
population parameters from a sample
Let {(xᵢ, yᵢ): i = 1, …, n} denote a random
sample of size n from the population
For each observation in this sample, it will
be the case that
yᵢ = β₀ + β₁xᵢ + uᵢ

[Figure: a scatter of sample points (xᵢ, yᵢ) around the population regression line, with each error uᵢ the vertical distance from the point to the line]
Deriving OLS Estimates
To derive the OLS estimates we need to
realize that our main assumption of E(u|x) =
E(u) = 0 also implies that
Cov(x, u) = E(xu) = 0
We can write these two restrictions just in terms of x, y, β₀ and β₁:
E(y − β₀ − β₁x) = 0
E[x(y − β₀ − β₁x)] = 0
The sample counterparts (first-order conditions) are:

n⁻¹ Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0

n⁻¹ Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0
More Derivation of OLS
Given the definition of a sample mean, and
properties of summation, we can rewrite the first
condition as follows
ȳ = β̂₀ + β̂₁x̄,
or
β̂₀ = ȳ − β̂₁x̄
More Derivation of OLS
Substituting β̂₀ into the second condition:

Σᵢ₌₁ⁿ xᵢ[yᵢ − (ȳ − β̂₁x̄) − β̂₁xᵢ] = 0

Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ) = β̂₁ Σᵢ₌₁ⁿ xᵢ(xᵢ − x̄)

Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) = β̂₁ Σᵢ₌₁ⁿ (xᵢ − x̄)²

so the OLS slope estimate is

β̂₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²

provided that Σᵢ₌₁ⁿ (xᵢ − x̄)² > 0

[Figure: the fitted regression line, with the residual û₁ shown as the vertical distance from y₁ to the line]
Alternate approach to derivation
Given the intuitive idea of fitting a line, we can
set up a formal minimization problem
That is, we want to choose our parameters such
that we minimize the following:
Σᵢ₌₁ⁿ ûᵢ² = Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)²

Taking the derivatives with respect to β̂₀ and β̂₁ and setting them to zero yields the same first-order conditions as before:

Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0

Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0
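The slope and intercept formulas translate directly into code. A minimal Python sketch on simulated data (the true parameter values and sample size are illustrative):

```python
import random
from statistics import mean

random.seed(6)
n, b0, b1 = 200, 1.0, 0.5
x = [random.uniform(0, 10) for _ in range(n)]
y = [b0 + b1 * xi + random.gauss(0, 1) for xi in x]

xbar, ybar = mean(x), mean(y)
# OLS slope and intercept from the first-order conditions
b1_hat = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
b0_hat = ybar - b1_hat * xbar

resid = [yi - b0_hat - b1_hat * xi for xi, yi in zip(x, y)]

# The first-order conditions hold exactly in the sample:
assert abs(sum(resid)) < 1e-8                                # residuals sum to 0
assert abs(sum(xi * ui for xi, ui in zip(x, resid))) < 1e-6  # uncorrelated with x
```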
Algebraic Properties of OLS
The sum of the OLS residuals is zero
Thus, the sample average of the OLS
residuals is zero as well
The sample covariance between the
regressors and the OLS residuals is zero
The OLS regression line always goes
through the mean of the sample
Σᵢ₌₁ⁿ ûᵢ = 0

Σᵢ₌₁ⁿ xᵢûᵢ = 0

ȳ = β̂₀ + β̂₁x̄
More terminology
We can think of each observation as being made
up of an explained part, and an unexplained part,
yᵢ = ŷᵢ + ûᵢ. We then define the following:
Σ(yᵢ − ȳ)² is the total sum of squares (SST)
Σ(ŷᵢ − ȳ)² is the explained sum of squares (SSE)
Σûᵢ² is the residual sum of squares (SSR)
Then SST = SSE + SSR, because

Σ(yᵢ − ȳ)² = Σ[(yᵢ − ŷᵢ) + (ŷᵢ − ȳ)]²
= Σ[ûᵢ + (ŷᵢ − ȳ)]²
= Σûᵢ² + 2Σûᵢ(ŷᵢ − ȳ) + Σ(ŷᵢ − ȳ)²
= SSR + 2Σûᵢ(ŷᵢ − ȳ) + SSE

and Σûᵢ(ŷᵢ − ȳ) = 0

Goodness-of-Fit
R² = SSE/SST = 1 − SSR/SST
Unbiasedness of OLS
Rewrite the slope estimator as

β̂₁ = Σ(xᵢ − x̄)yᵢ / s_x², where s_x² = Σ(xᵢ − x̄)²

Substituting the model yᵢ = β₀ + β₁xᵢ + uᵢ into the numerator:

Σ(xᵢ − x̄)yᵢ = Σ(xᵢ − x̄)(β₀ + β₁xᵢ + uᵢ)
= β₀Σ(xᵢ − x̄) + β₁Σ(xᵢ − x̄)xᵢ + Σ(xᵢ − x̄)uᵢ
= β₁Σ(xᵢ − x̄)² + Σ(xᵢ − x̄)uᵢ

since Σ(xᵢ − x̄) = 0 and Σ(xᵢ − x̄)xᵢ = Σ(xᵢ − x̄)². Thus

β̂₁ = β₁ + Σ(xᵢ − x̄)uᵢ / s_x²
Unbiasedness of OLS (cont)
E(β̂₁) = β₁ + (1/s_x²) Σ dᵢ E(uᵢ) = β₁, where dᵢ = xᵢ − x̄

Homoskedastic Case
[Figure: conditional densities f(y|x) with equal spread around E(y|x) = β₀ + β₁x at x₁ and x₂]
Heteroskedastic Case
[Figure: conditional densities f(y|x) whose spread around E(y|x) = β₀ + β₁x increases with x]
Variance of OLS (cont)
Var(β̂₁) = Var[β₁ + (1/s_x²) Σ dᵢuᵢ]
= (1/s_x²)² Var(Σ dᵢuᵢ)
= (1/s_x²)² Σ dᵢ² Var(uᵢ)
= (1/s_x²)² Σ dᵢ² σ²
= σ² (1/s_x²)² s_x²  (since Σ dᵢ² = s_x²)
= σ²/s_x²
Variance of OLS Summary
The larger the error variance, σ², the larger
the variance of the slope estimate
The larger the variability in the xᵢ, the
smaller the variance of the slope estimate
As a result, a larger sample size should
decrease the variance of the slope estimate
A problem is that the error variance, σ², is unknown

Estimating the Error Variance
σ̂² = [1/(n − 2)] Σ ûᵢ² = SSR/(n − 2)

se(β̂₁) = σ̂ / √(Σ(xᵢ − x̄)²)
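The error-variance estimate and the slope's standard error are a few lines of code on top of the OLS formulas. A sketch on simulated data (true parameters chosen for illustration):

```python
import math
import random
from statistics import mean

random.seed(7)
n, b0, b1, sigma = 500, 2.0, -1.0, 2.0
x = [random.gauss(0, 3) for _ in range(n)]
y = [b0 + b1 * xi + random.gauss(0, sigma) for xi in x]

xbar, ybar = mean(x), mean(y)
sst_x = sum((xi - xbar) ** 2 for xi in x)
b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
b0_hat = ybar - b1_hat * xbar

ssr = sum((yi - b0_hat - b1_hat * xi) ** 2 for xi, yi in zip(x, y))
sigma2_hat = ssr / (n - 2)             # sigma-hat squared = SSR / (n - 2)
se_b1 = math.sqrt(sigma2_hat / sst_x)  # standard error of the slope
```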
1. Estimation
"Partialling Out"
β̂₁ = Σ r̂ᵢ₁ yᵢ / Σ r̂ᵢ₁², where the r̂ᵢ₁ are
the residuals from the estimated
regression x₁ = γ̂₀ + γ̂₂x₂
“Partialling Out” continued
The previous equation implies that regressing y
on x₁ and x₂ gives the same effect of x₁ as
regressing y on the residuals from a regression
of x₁ on x₂
This means only the part of xᵢ₁ that is
uncorrelated with xᵢ₂ is being related to yᵢ, so
we're estimating the effect of x₁ on y after
x₂ has been "partialled out"
Simple vs Multiple Reg Estimate
Compare the simple regression ỹ = β̃₀ + β̃₁x₁
with the multiple regression ŷ = β̂₀ + β̂₁x₁ + β̂₂x₂
Generally, β̃₁ ≠ β̂₁ unless:
β̂₂ = 0 (i.e. no partial effect of x₂), OR
x₁ and x₂ are uncorrelated in the sample

Goodness-of-Fit
R² = SSE/SST = 1 − SSR/SST
R² can also be written as the squared correlation between the actual yᵢ and the fitted ŷᵢ:

R² = [Σ(yᵢ − ȳ)(ŷᵢ − ȳ)]² / [Σ(yᵢ − ȳ)² Σ(ŷᵢ − ȳ)²]

Omitted Variable Bias
Suppose the true model includes x₂, but we omit it and run the simple regression of y on x₁, so that

β̃₁ = Σ(xᵢ₁ − x̄₁)yᵢ / Σ(xᵢ₁ − x̄₁)²

Substituting yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + uᵢ gives

β̃₁ = β₁ + β₂ Σ(xᵢ₁ − x̄₁)xᵢ₂ / Σ(xᵢ₁ − x̄₁)² + Σ(xᵢ₁ − x̄₁)uᵢ / Σ(xᵢ₁ − x̄₁)²

Taking expectations, conditional on the x's, the last term is zero, so

E(β̃₁) = β₁ + β₂ Σ(xᵢ₁ − x̄₁)xᵢ₂ / Σ(xᵢ₁ − x̄₁)² = β₁ + β₂δ̃₁

where δ̃₁ is the slope from regressing x₂ on x₁

Variance of the OLS Estimators
Var(β̂ⱼ) = σ² / [SSTⱼ(1 − Rⱼ²)], where
SSTⱼ = Σ(xᵢⱼ − x̄ⱼ)² and Rⱼ² is the R² from regressing xⱼ on all other x's
df = n − (k + 1), or df = n − k − 1
df (i.e. degrees of freedom) is the (number
of observations) − (number of estimated
parameters)
The Gauss-Markov Theorem
Given our 5 Gauss-Markov Assumptions it
can be shown that OLS is “BLUE”
Best
Linear
Unbiased
Estimator
Thus, if the assumptions hold, use OLS
2. Inference
[Figure: normal conditional distributions of y, all with the same variance, centered on the population regression line E(y|x) = β₀ + β₁x at x₁ and x₂]
Normal Sampling Distributions
Under the CLM assumptions, conditional on
the sample values of the independent variables
β̂ⱼ ~ Normal[βⱼ, Var(β̂ⱼ)], so that

(β̂ⱼ − βⱼ) / sd(β̂ⱼ) ~ Normal(0, 1)

β̂ⱼ is distributed normally because it
is a linear combination of the errors
The t Test
Under the CLM assumptions
(β̂ⱼ − βⱼ) / se(β̂ⱼ) ~ t_{n−k−1}

[Figure: one-sided t test — area 1 − α in the "fail to reject" region and area α in the rejection region to the right of the critical value c]
One-sided vs Two-sided
Because the t distribution is symmetric,
testing H₁: βⱼ < 0 is straightforward; the
critical value is just the negative of the one from before
We can reject the null if the t statistic < −c,
and if the t statistic > −c we fail to
reject the null
For a two-sided test, we set the critical
value based on α/2 and reject H₀: βⱼ = 0 in favor of H₁: βⱼ ≠ 0 if
the absolute value of the t statistic > c
Two-Sided Alternatives
yᵢ = β₀ + β₁xᵢ₁ + … + βₖxᵢₖ + uᵢ
H₀: βⱼ = 0    H₁: βⱼ ≠ 0

[Figure: two-sided test — rejection regions of area α/2 in each tail beyond −c and c, with area 1 − α in between]
Summary for H0: bj = 0
Unless otherwise stated, the alternative is
assumed to be two-sided
If we reject the null, we typically say "xⱼ is
statistically significant at the α% level"
If we fail to reject the null, we typically say
"xⱼ is statistically insignificant at the α%
level"

Testing Other Hypotheses
A more general form of the t statistic is

t = (β̂ⱼ − aⱼ) / se(β̂ⱼ), where
aⱼ = 0 for the standard test

Confidence Intervals
A (1 − α) confidence interval is

β̂ⱼ ± c · se(β̂ⱼ), where c is the 1 − α/2 percentile
in a t_{n−k−1} distribution
Computing p-values for t tests
An alternative to the classical approach is
to ask, “what is the smallest significance
level at which the null would be rejected?”
So, compute the t statistic, and then look up
what percentile it is in the appropriate t
distribution – this is the p-value
p-value is the probability we would observe
the t statistic we did, if the null were true
Testing a Linear Combination
Suppose we want to test H₀: β₁ = β₂; we can use the t statistic

t = (β̂₁ − β̂₂) / se(β̂₁ − β̂₂)
Testing Linear Combo (cont)
Since
se(β̂₁ − β̂₂) = √Var(β̂₁ − β̂₂), then
Var(β̂₁ − β̂₂) = Var(β̂₁) + Var(β̂₂) − 2Cov(β̂₁, β̂₂)

Testing Multiple Restrictions: the F statistic

F = [(SSRᵣ − SSRᵤᵣ)/q] / [SSRᵤᵣ/(n − k − 1)], where
r is restricted and ur is unrestricted
The F statistic
The F statistic is always positive, since the
SSR from the restricted model can’t be less
than the SSR from the unrestricted
Essentially the F statistic is measuring the
relative increase in SSR when moving from
the unrestricted to restricted model
q = number of restrictions, or df_r − df_ur
n − k − 1 = df_ur
[Figure: F distribution — area 1 − α to the left of the critical value c and rejection region of area α to the right]
The R2 form of the F statistic
Because the SSR’s may be large and unwieldy, an
alternative form of the formula is useful
We use the fact that SSR = SST(1 − R²) for any
regression, so we can substitute in for SSRᵣ and SSRᵤᵣ:

F = [(R²ᵤᵣ − R²ᵣ)/q] / [(1 − R²ᵤᵣ)/(n − k − 1)]

where again r is restricted and ur is unrestricted
For the special case of testing overall significance (H₀: all slope coefficients are zero), the restricted R² is zero, so

F = (R²/k) / [(1 − R²)/(n − k − 1)]
3. Asymptotic Properties
[Figure: sampling distributions of β̂₁ for sample sizes n₁ < n₂, tightening around β₁ as n grows]
Consistency of OLS
Under the Gauss-Markov assumptions, the
OLS estimator is consistent (and unbiased)
Consistency can be proved for the simple
regression case in a manner similar to the
proof of unbiasedness
Will need to take probability limit (plim) to
establish consistency
β̂₁ = β₁ + [n⁻¹ Σ(xᵢ₁ − x̄₁)uᵢ] / [n⁻¹ Σ(xᵢ₁ − x̄₁)²]

so plim β̂₁ = β₁ + Cov(x₁, u)/Var(x₁) = β₁, since Cov(x₁, u) = 0

Deriving the Inconsistency
True model: y = β₀ + β₁x₁ + β₂x₂ + v
You think: y = β₀ + β₁x₁ + u, so that
u = β₂x₂ + v and plim β̃₁ = β₁ + β₂δ,
where δ = Cov(x₁, x₂) / Var(x₁)
Asymptotic Bias (cont)
So, thinking about the direction of the
asymptotic bias is just like thinking about
the direction of bias for an omitted variable
Main difference is that asymptotic bias uses
the population variance and covariance,
while bias uses the sample counterparts
Remember, inconsistency is a large sample
problem – it doesn't go away as we add data

Asymptotic Normality
(iii) (β̂ⱼ − βⱼ)/se(β̂ⱼ) ~ᵃ Normal(0, 1); equivalently,
(β̂ⱼ − βⱼ)/se(β̂ⱼ) ~ᵃ t_{n−k−1}

Asymptotic Standard Errors
Since se(β̂ⱼ) = √(σ̂² / [SSTⱼ(1 − Rⱼ²)]),
and SSTⱼ grows in proportion to n, we have se(β̂ⱼ) ≈ cⱼ/√n for some constant cⱼ
So, we can expect standard errors to shrink at a
rate proportional to the inverse of √n
Lagrange Multiplier statistic
With large samples, by relying on
asymptotic normality for inference, we can
use more than t and F stats
The Lagrange multiplier or LM statistic is
an alternative for testing multiple exclusion
restrictions
Because the LM statistic uses an auxiliary
regression it’s sometimes called an nR2 stat
4. Further Issues
Models with Quadratics
For a model ŷ = β̂₀ + β̂₁x + β̂₂x², the effect of x is not constant:
Δŷ ≈ (β̂₁ + 2β̂₂x)Δx, so
∂ŷ/∂x = β̂₁ + 2β̂₂x

Adjusted R-Squared
R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)]
= 1 − σ̂² / [SST/(n − 1)]
Adjusted R-Squared (cont)
It's easy to see that the adjusted R² is just
1 − (1 − R²)(n − 1)/(n − k − 1), but most packages
will give you both R² and adj-R²
You can compare the fit of 2 models (with
the same y) by comparing the adj-R2
You cannot use the adj-R2 to compare
models with different y’s (e.g. y vs. ln(y))
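A small sketch of the penalty at work (the R² values, n, and k below are made-up illustrative numbers): a regressor that barely raises R² can still lower the adjusted R².

```python
def adjusted_r2(r2, n, k):
    """adj-R2 = 1 - (1 - R2)(n - 1)/(n - k - 1)"""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding a regressor always raises R2, but adj-R2 penalizes the
# lost degree of freedom, so it can fall
base = adjusted_r2(0.500, n=100, k=3)
bigger = adjusted_r2(0.501, n=100, k=4)  # tiny R2 gain, one more parameter
```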
To predict y when the model is in terms of ln(y), note that
E[exp(u)] = exp(σ²/2) if u ~ N(0, σ²)
In this case we can predict y as follows:
ŷ = exp(σ̂²/2) exp(ln(y)-hat), where ln(y)-hat is the fitted value from the regression for ln(y)
Predicting y in a log model
If u is not normal, E(exp(u)) must be
estimated using an auxiliary regression
Create the exponentiation of the predicted
ln(y), and regress y on it with no intercept
The coefficient on this variable is the
estimate of E(exp(u)) that can be used to
scale up the exponentiation of the predicted
ln(y) to obtain the predicted y
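The three steps above can be sketched in Python on simulated data with deliberately non-normal (uniform) errors; the parameter values are illustrative, and `alpha_hat` plays the role of the estimate of E[exp(u)]:

```python
import math
import random
from statistics import mean

random.seed(8)
n, b0, b1 = 1000, 0.5, 0.2
x = [random.uniform(0, 5) for _ in range(n)]
u = [random.uniform(-1, 1) for _ in range(n)]  # non-normal errors
y = [math.exp(b0 + b1 * xi + ui) for xi, ui in zip(x, u)]
lny = [math.log(yi) for yi in y]

# Step 1: OLS of ln(y) on x
xbar, lbar = mean(x), mean(lny)
b1_hat = (sum((xi - xbar) * (li - lbar) for xi, li in zip(x, lny))
          / sum((xi - xbar) ** 2 for xi in x))
b0_hat = lbar - b1_hat * xbar

# Step 2: regress y on exp(ln(y)-hat) with no intercept;
# the through-the-origin coefficient estimates E[exp(u)]
m = [math.exp(b0_hat + b1_hat * xi) for xi in x]
alpha_hat = sum(mi * yi for mi, yi in zip(m, y)) / sum(mi ** 2 for mi in m)

# Step 3: scale the naive prediction up
y_hat = [alpha_hat * mi for mi in m]
```

For u ~ uniform(−1, 1), E[exp(u)] = (e − e⁻¹)/2 ≈ 1.18, and `alpha_hat` should land near that value.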
6. Heteroskedasticity
[Figure: conditional densities f(y|x) whose spread around E(y|x) = β₀ + β₁x increases with x]
Why Worry About
Heteroskedasticity?
OLS is still unbiased and consistent, even if
we do not assume homoskedasticity
The standard errors of the estimates are
biased if we have heteroskedasticity
If the standard errors are biased, we cannot
use the usual t statistics, F statistics, or LM
statistics for drawing inferences
Variance with Heteroskedasticity
For the simple case, β̂₁ = β₁ + Σ(xᵢ − x̄)uᵢ / Σ(xᵢ − x̄)², so

Var(β̂₁) = Σ(xᵢ − x̄)² σᵢ² / SST_x², where SST_x = Σ(xᵢ − x̄)²

A valid estimator of this variance when σᵢ² ≠ σ² is

Var-hat(β̂₁) = Σ(xᵢ − x̄)² ûᵢ² / SST_x², where the ûᵢ are the OLS residuals

For the general multiple regression model,

Var-hat(β̂ⱼ) = Σ r̂ᵢⱼ² ûᵢ² / SSRⱼ²

where r̂ᵢⱼ is the iᵗʰ residual from regressing xⱼ on all other independent variables, and SSRⱼ is the sum of squared residuals from that regression
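The robust estimator for the simple case is easy to compute next to the usual one. A sketch on simulated data where the error spread grows with x (the data-generating values are illustrative):

```python
import math
import random
from statistics import mean

random.seed(9)
n, b0, b1 = 400, 1.0, 2.0
x = [random.uniform(1, 5) for _ in range(n)]
# Heteroskedastic errors: sd(u|x) = 0.5 * x grows with x
y = [b0 + b1 * xi + random.gauss(0, 0.5 * xi) for xi in x]

xbar, ybar = mean(x), mean(y)
sst_x = sum((xi - xbar) ** 2 for xi in x)
b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
b0_hat = ybar - b1_hat * xbar
resid = [yi - b0_hat - b1_hat * xi for xi, yi in zip(x, y)]

# Usual (homoskedasticity-only) standard error
sigma2_hat = sum(ui ** 2 for ui in resid) / (n - 2)
se_usual = math.sqrt(sigma2_hat / sst_x)

# Robust: Var-hat(b1) = sum((xi - xbar)**2 * ui**2) / SST_x**2
var_robust = (sum((xi - xbar) ** 2 * ui ** 2 for xi, ui in zip(x, resid))
              / sst_x ** 2)
se_robust = math.sqrt(var_robust)
```

With heteroskedasticity present, `se_usual` and `se_robust` generally differ, and only the robust version is valid for inference.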