ε_i ~ iid N(0, σ²)
Topic I
Page 29
Statistics 512: Applied Regression Analysis
Professor Min Zhang
Purdue University
Spring 2014
Features of Simple Linear Regression Model
Individual observations: Y_i = β_0 + β_1 X_i + ε_i

Since the ε_i are random, the Y_i are also random, and

E(Y_i) = β_0 + β_1 X_i + E(ε_i) = β_0 + β_1 X_i
Var(Y_i) = 0 + Var(ε_i) = σ².

Since ε_i is Normally distributed, the Y_i are independent, with

Y_i ~ N(β_0 + β_1 X_i, σ²)   (See A.4, page 1302)
Fitted Regression Equation and Residuals
We must estimate the parameters β_0, β_1, σ² from the data. The
estimates are denoted b_0, b_1, s². These give us the fitted or
estimated regression line Ŷ_i = b_0 + b_1 X_i, where

- b_0 is the estimated intercept.
- b_1 is the estimated slope.
- Ŷ_i is the estimated mean for Y when the predictor is X_i (i.e.,
  the point on the fitted line).
- e_i is the residual for the ith case (the vertical distance from the
  data point to the fitted regression line). Note that
  e_i = Y_i − Ŷ_i = Y_i − (b_0 + b_1 X_i).
Using SAS to Plot the Residuals (Diamond Example)
When we called proc reg earlier, we assigned the
residuals to the name resid and placed them in a
new data set called diag. We now plot them vs. X.

proc gplot data=diag;
  plot resid*weight / haxis=axis1 vaxis=axis2 vref=0;
  where price ne .;
run;
Notice there does not appear to be any obvious pattern in the residuals. We'll talk
a lot more about diagnostics later, but for now, you should know that looking at
residual plots is an important way to check assumptions.
Least Squares
We want to find the best estimators b_0 and b_1.
We will minimize the sum of the squared residuals

Σ_{i=1}^n e_i² = Σ_{i=1}^n (Y_i − (b_0 + b_1 X_i))².

Use calculus: take the derivative with respect to b_0 and
with respect to b_1, set the two resultant
equations equal to zero, and solve for b_0 and b_1 (see
KNNL, pages 17-18).
Least Squares Solution
These are the best estimates for β_1 and β_0:

b_1 = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)² = SS_XY / SS_X

b_0 = Ȳ − b_1 X̄

These are also the maximum likelihood estimators (MLE); see
KNNL, pages 26-32.

These estimates are the best because they are unbiased (their expected
values are equal to the true values) with minimum variance.
Maximum Likelihood
Y_i ~ N(β_0 + β_1 X_i, σ²)

f_i = (1 / √(2πσ²)) exp( −(1 / (2σ²)) (Y_i − β_0 − β_1 X_i)² )

L = f_1 · f_2 · · · f_n   (likelihood function)

Find the values of β_0 and β_1 that maximize L. These are
the SAME as the least squares estimators b_0 and b_1!
Estimation of σ²

We also need to estimate σ² with s². We use the sum of squared
residuals, SSE, divided by the degrees of freedom n − 2:

s² = Σ(Y_i − Ŷ_i)² / (n − 2) = Σe_i² / (n − 2) = SSE / df_E = MSE

s = √(s²) = Root MSE,

where SSE = Σe_i² is the sum of squared residuals or errors,
and MSE stands for mean squared error.
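Continuing the earlier toy example (Python, made-up data; b_0 = 0.0 and b_1 = 1.9 are the least squares fit for it), s² is just SSE over n − 2:

```python
# Estimate sigma^2 by s^2 = SSE / (n - 2), from the residuals of a fitted line.
# Hypothetical data; b0 = 0.0, b1 = 1.9 are its least squares estimates.
X = [1, 2, 3, 4]
Y = [2, 4, 5, 8]
b0, b1 = 0.0, 1.9
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(X, Y)]
sse = sum(e ** 2 for e in residuals)  # sum of squared residuals
s2 = sse / (len(X) - 2)               # MSE, with df_E = n - 2
s = s2 ** 0.5                         # Root MSE
print(round(s2, 4))
```

Here the residuals are (0.1, 0.2, −0.7, 0.4), so SSE = 0.70 and s² = 0.35.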
There will be other estimated variances for other
quantities, and these will also be denoted s², e.g.
s²{b_1}. Without any {}, s² refers to the value above,
that is, the estimated variance of the residuals.
Identifying these things in the SAS output

Analysis of Variance
                          Sum of      Mean
Source             DF     Squares     Square        F Value
Model              1      2098596     2098596       2069.99
Error              46     46636       1013.81886
Corrected Total    47     2145232

Root MSE          31.84052    R-Square   0.9783
Dependent Mean   500.08333    Adj R-Sq   0.9778
Coeff Var          6.36704

Parameter Estimates
                   Parameter     Standard
Variable     DF    Estimate      Error        t Value    Pr > |t|
Intercept     1    -259.62591    17.31886     -14.99     <.0001
weight        1    3721.02485    81.78588      45.50     <.0001
Review of Statistical Inference for Normal Samples
This should be review!
In Statistics 503/511 you learned how to construct
confidence intervals and do hypothesis tests for the
mean of a normal distribution, based on a random
sample. Suppose we have an iid (random) sample
W_1, ..., W_n from a normal distribution. (Usually, I
would use the symbol X or Y, but I want to keep the
context general and not use the symbols we use for
regression.)
We have

W_i ~ iid N(μ, σ²), where μ and σ² are unknown

W̄ = ΣW_i / n = sample mean

SS_W = Σ(W_i − W̄)² = sum of squares for W

s²{W} = Σ(W_i − W̄)² / (n − 1) = SS_W / (n − 1) = sample variance

s{W} = √(s²{W}) = sample standard deviation

s{W̄} = s{W} / √n = standard error of the mean
and from these definitions, we obtain

W̄ ~ N(μ, σ²/n),

T = (W̄ − μ) / s{W̄}

has a t-distribution with n − 1 df (in short, T ~ t_{n−1}).

This leads to inference:

- confidence intervals for μ
- significance tests for μ.
Confidence Intervals

We are 100(1 − α)% confident that the following
interval contains μ:

W̄ ± t_c s{W̄} = ( W̄ − t_c s{W̄}, W̄ + t_c s{W̄} )

where t_c = t_{n−1}(1 − α/2), the upper (1 − α/2) percentile
of the t distribution with n − 1 degrees of freedom, and
1 − α is the confidence level (e.g. 0.95 = 95%, so
α = 0.05).
Significance Tests

To test whether μ has a specific value, we use a t-test
(one sample, non-directional).

H_0: μ = μ_0 vs H_a: μ ≠ μ_0

t = (W̄ − μ_0) / s{W̄} has a t_{n−1} distribution under H_0.

Reject H_0 if |t| ≥ t_c, where t_c = t_{n−1}(1 − α/2).

p-value = Prob_{H_0}(|T| > |t|), where T ~ t_{n−1}.
The p-value is twice the area in the upper tail of the t_{n−1}
distribution above the observed |t|. It is the probability of
observing a test statistic at least as extreme as what was
actually observed, when the null hypothesis is really true.
We reject H_0 if p ≤ α. (Note that this is basically the
same, more general actually, as having |t| ≥ t_c.)
Important Notational Comment
The text says "conclude H_A" if t is in the rejection region
(|t| ≥ t_c), otherwise "conclude H_0". This is shorthand for:

- "conclude H_A" means there is sufficient evidence in
  the data to conclude that H_0 is false, and so we
  assume that H_A is true.
- "conclude H_0" means there is insufficient evidence in
  the data to conclude that either H_0 or H_A is true or
  false, so we default to assuming that H_0 is true.
Notice that a failure to reject H_0 does
not mean that H_a is true.

NOTE: In this course, α = 0.05 unless otherwise specified.
Section 2.1: Inference about β_1

b_1 ~ N(β_1, σ²{b_1}), where σ²{b_1} = σ² / SS_X

t = (b_1 − β_1) / s{b_1}, where s{b_1} = √(s² / SS_X)

t ~ t_{n−2}
According to our discussion above for W, you
therefore know how to obtain CIs and t-tests for β_1. (I'll
go through it now but not in the future.) There is one
important difference: the degrees of freedom (df) here
are n − 2, not n − 1, because we are also estimating β_0.
Confidence Interval for β_1

b_1 ± t_c s{b_1},

where t_c = t_{n−2}(1 − α/2), the upper 100(1 − α/2) percentile of
the t distribution with n − 2 degrees of freedom, and
1 − α is the confidence level.

Significance Tests for β_1

H_0: β_1 = 0 vs H_a: β_1 ≠ 0

t = (b_1 − 0) / s{b_1}

Reject H_0 if |t| ≥ t_c, where t_c = t_{n−2}(1 − α/2)

p-value = Prob(|T| > |t|), where T ~ t_{n−2}
Inference for β_0

b_0 ~ N(β_0, σ²{b_0}), where σ²{b_0} = σ² ( 1/n + X̄² / SS_X ).

t = (b_0 − β_0) / s{b_0}

For s{b_0}, replace σ² by s² and take

s{b_0} = s √( 1/n + X̄² / SS_X )

t ~ t_{n−2}
Confidence Interval for β_0

b_0 ± t_c s{b_0},

where t_c = t_{n−2}(1 − α/2) and 1 − α is the confidence level.

Significance Tests for β_0

H_0: β_0 = 0 vs H_A: β_0 ≠ 0

t = (b_0 − 0) / s{b_0}

Reject H_0 if |t| ≥ t_c, where t_c = t_{n−2}(1 − α/2)

p-value = Prob(|T| > |t|), where T ~ t_{n−2}
Notes
- The normality of b_0 and b_1 follows from the fact that each is a linear combination of
  the Y_i, themselves each independent and normally distributed.
  For b_1, see KNNL, page 42. For b_0, try this as an exercise.
- Often the CI and significance test for β_0 are not of interest.
- If the ε_i are not normal but are approximately normal, then the CIs and significance
  tests are generally reasonable approximations.
- These procedures can easily be modified to produce one-sided confidence intervals
  and significance tests.
- Because σ²{b_1} = σ² / Σ(X_i − X̄)², we can make this quantity small by making
  Σ(X_i − X̄)² large, i.e. by spreading out the X_i's.
Here is how to get the parameter estimates in SAS (using
diamond.sas). The option clb asks SAS to give you
confidence limits for the parameter estimates b_0 and b_1.

proc reg data=diamonds;
  model price=weight/clb;

Parameter Estimates
                   Parameter     Standard
Variable     DF    Estimate      Error        95% Confidence Limits
Intercept     1    -259.62591    17.31886     -294.48696  -224.76486
weight        1    3721.02485    81.78588     3556.39841  3885.65129
Points to Remember

- What is the default value of α that we use in this class?
- What is the default confidence level that we use in this class?
- Suppose you could choose the X's. How would you choose
  them if you wanted a precise estimate of the slope? the intercept?
  both?
Summary of Inference
Y_i = β_0 + β_1 X_i + ε_i

ε_i ~ N(0, σ²) are independent, random errors

Parameter Estimators

For β_1:  b_1 = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)²
For β_0:  b_0 = Ȳ − b_1 X̄
For σ²:   s² = Σ(Y_i − b_0 − b_1 X_i)² / (n − 2)
95% Confidence Intervals for β_0 and β_1

b_1 ± t_c s{b_1}
b_0 ± t_c s{b_0}

where t_c = t_{n−2}(1 − α/2), the 100(1 − α/2) upper percentile of
the t distribution with n − 2 degrees of freedom
Significance Tests for β_0 and β_1

H_0: β_0 = 0, H_a: β_0 ≠ 0
t = b_0 / s{b_0} ~ t_{(n−2)} under H_0

H_0: β_1 = 0, H_a: β_1 ≠ 0
t = b_1 / s{b_1} ~ t_{(n−2)} under H_0

Reject H_0 if the p-value is small (< 0.05).
KNNL Section 2.3: Power

The power of a significance test is the probability that the null
hypothesis will be rejected when, in fact, it is false. This probability
depends on the particular value of the parameter in the alternative
space. When we do power calculations, we are trying to answer
questions like the following:

Suppose that the parameter β_1 truly has the value 1.5, and
we are going to collect a sample of a particular size n and
with a particular SS_X. What is the probability that, based on
our (not yet collected) data, we will reject H_0?
Power for β_1

H_0: β_1 = 0, H_a: β_1 ≠ 0

t = b_1 / s{b_1}

t_c = t_{n−2}(1 − α/2); for α = 0.05, we reject H_0 when |t| ≥ t_c

so we need to find P(|t| ≥ t_c) for arbitrary values of β_1 ≠ 0

when β_1 = 0, the calculation gives α (H_0 is true)
t ~ t_{n−2}(δ), a noncentral t distribution: a t-distribution
not centered at 0

δ = β_1 / σ{b_1} is the noncentrality parameter: it
represents, on a standardized scale, how far from
true H_0 is (kind of like an effect size)

We need to assume values for σ²{b_1} = σ² / Σ(X_i − X̄)² and n

KNNL uses tables, see pages 50-51; we will use SAS
Example of Power for β_1

Response Variable: Work Hours
Explanatory Variable: Lot Size

See page 19 for details of this study, pages 50-51 for details regarding power.

We assume σ² = 2500, n = 25, and SS_X = 19800, so we have

σ²{b_1} = σ² / Σ(X_i − X̄)² = 0.1263.

Consider β_1 = 1.5.

We now can calculate δ = β_1 / σ{b_1}.

With t ~ t_{n−2}(δ), we want to find P(|t| ≥ t_c).

We use a function that calculates the cumulative distribution function (cdf) of
the noncentral t distribution.
See program knnl050.sas for the power calculations.
data a1;
n=25; sig2=2500; ssx=19800; alpha=.05;
sig2b1=sig2/ssx; df=n-2;
beta1=1.5;
delta=abs(beta1)/sqrt(sig2b1);
tstar=tinv(1-alpha/2,df);
power=1-probt(tstar,df,delta)+probt(-tstar,df,delta);
output;
proc print data=a1;run;
Obs n sig2 ssx alpha sig2b1 df beta1
1 25 2500 19800 0.05 0.12626 23 1.5
delta tstar power
4.22137 2.06866 0.98121
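Outside SAS you need a noncentral t cdf for the exact answer. As a rough cross-check in Python's standard library only, a normal approximation to the noncentral t gets close to SAS's exact power of 0.98121 (this is an approximation, not the SAS computation):

```python
from statistics import NormalDist

# Approximate power for H0: beta1 = 0 via a normal approximation to the
# noncentral t: power ~ P(Z > tc - delta) + P(Z < -tc - delta).
# Inputs mirror the SAS data step above; tstar is SAS's tinv(0.975, 23).
sig2, ssx, beta1 = 2500.0, 19800.0, 1.5
delta = abs(beta1) / (sig2 / ssx) ** 0.5  # noncentrality parameter
tstar = 2.06866
Z = NormalDist()
power = (1 - Z.cdf(tstar - delta)) + Z.cdf(-tstar - delta)
print(round(delta, 5), round(power, 3))
```

The noncentrality parameter matches SAS's delta = 4.22137, and the approximate power agrees with 0.98121 to about two decimal places.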
data a2;
  n=25; sig2=2500; ssx=19800; alpha=.05;
  sig2b1=sig2/ssx; df=n-2;
  do beta1=-2.0 to 2.0 by .05;
    delta=abs(beta1)/sqrt(sig2b1);
    tstar=tinv(1-alpha/2,df);
    power=1-probt(tstar,df,delta)+probt(-tstar,df,delta);
    output;
  end;
proc print data=a2;
run;
title1 'Power for the slope in simple linear regression';
symbol1 v=none i=join;
proc gplot data=a2; plot power*beta1; run;
[Figure: power for the slope in simple linear regression, plotted against β_1]
Section 2.4: Estimation of E(Y_h)

E(Y_h) = μ_h = β_0 + β_1 X_h, the mean value of Y for
the subpopulation with X = X_h.

We will estimate E(Y_h) with Ŷ_h = μ̂_h = b_0 + b_1 X_h.

KNNL uses Ŷ_h to denote this estimate; we will use
the symbols Ŷ_h and μ̂_h interchangeably.

See equation (2.28) on page 52.
Theory for Estimation of E(Y_h)

Ŷ_h is normal with mean μ_h and variance

σ²{Ŷ_h} = σ² ( 1/n + (X_h − X̄)² / Σ(X_i − X̄)² ).

The normality is a consequence of the fact that
b_0 + b_1 X_h is a linear combination of the Y_i's.

The variance has two components: one for the
intercept and one for the slope. The variance
associated with the slope depends on the distance
X_h − X̄. The estimation is more accurate near X̄.

See KNNL pages 52-55 for details.
Application of the Theory

We estimate σ²{Ŷ_h} with

s²{Ŷ_h} = s² ( 1/n + (X_h − X̄)² / Σ(X_i − X̄)² )

It follows that t = (Ŷ_h − E(Y_h)) / s{Ŷ_h} ~ t_{n−2}; proceed as usual.

95% Confidence Interval for E(Y_h)

Ŷ_h ± t_c s{Ŷ_h}, where t_c = t_{n−2}(0.975).

NOTE: Significance tests can be performed for E(Y_h),
but they are rarely used in practice.
Example
See program knnl054.sas for the estimation of subpopulation
means. The option clm to the model statement asks for
confidence limits for the mean E(Y_h).

data a1;
  infile 'H:\Stat512\Datasets\Ch01ta01.dat';
  input size hours;
data a2;
  size=65; output;
  size=100; output;
data a3;
  set a1 a2;
proc print data=a3; run;
proc reg data=a3;
  model hours=size/clm;
  id size;
run;
                 Dep Var    Predicted   Std Error
Obs    size      hours      Value       Mean Predict   95% CL Mean
25     70        323.0000   312.2800    9.7647         292.0803  332.4797
26     65        .          294.4290    9.9176         273.9129  314.9451
27     100       .          419.3861    14.2723        389.8615  448.9106
Section 2.5: Prediction of Y_h(new)

We wish to construct an interval into which we predict the next
observation (for a given X_h) will fall.

The only difference (operationally) between this and E(Y_h) is
that the variance is different.

In prediction, we have two variance components: (1) variance
associated with the estimation of the mean response Ŷ_h, and (2)
variability in a single observation taken from the distribution with
that mean.

Y_h(new) = β_0 + β_1 X_h + ε is the value of a new observation
with X = X_h.
We estimate Y_h(new) starting with the predicted value
Ŷ_h. This is the center of the confidence interval, just as it
was for E(Y_h). However, the width of the CI is different
because they have different variances.

E[(Y_h(new) − Ŷ_h)²] = Var(Ŷ_h) + Var(ε)

s²{pred} = s²{Ŷ_h} + s²

s²{pred} = s² ( 1 + 1/n + (X_h − X̄)² / Σ(X_i − X̄)² )

(Y_h(new) − Ŷ_h) / s{pred} ~ t_{n−2}
s²{pred} = s²{Ŷ_h} + s²

s²{pred} = s² ( 1 + 1/n + (X_h − X̄)² / Σ(X_i − X̄)² )

(Y_h(new) − Ŷ_h) / s{pred} ~ t_{n−2}

s{pred} denotes the estimated standard deviation of a
new observation with X = X_h. It takes into account
variability in estimating the mean Ŷ_h as well as variability
in a single observation from a distribution with that
mean.
Notes
The procedure can be modified for the mean of m observations at
X = X_h (see 2.39a on page 60). The standard error is affected by how
far X_h is from X̄ (see Figure 2.3). As was the case for the mean
response, prediction is more accurate near X̄.

See program knnl059.sas for the prediction interval example.
The cli option to the model statement asks SAS to give
confidence limits for an individual observation. (c.f. clb and clm)
data a1;
  infile 'H:\Stat512\Datasets\Ch01ta01.dat';
  input size hours;
data a2;
  size=65; output;
  size=100; output;
data a3;
  set a1 a2;
proc reg data=a3;
  model hours=size/cli;
run;
                 Dep Var    Predicted   Std Error
Obs    size      hours      Value       Mean Predict   95% CL Predict        Residual
25     70        323.0000   312.2800    9.7647         209.2811  415.2789    10.7200
26     65        .          294.4290    9.9176         191.3676  397.4904    .
27     100       .          419.3861    14.2723        314.1604  524.6117    .
Notes
The standard error (Std Error Mean Predict) given in this output is the standard error
of Ŷ_h, not s{pred}. (That's why the word "mean" is in
there.) The CL Predict label tells you that the
confidence interval is for the prediction of a new
observation.

The prediction interval for Y_h(new) is wider than the
confidence interval for E(Y_h) because it has a larger
variance.
Estimation of E(Y_h) Compared to Prediction of Y_h

Ŷ_h = b_0 + b_1 X_h

s²{Ŷ_h} = s² ( 1/n + (X_h − X̄)² / Σ(X_i − X̄)² )

s²{pred} = s² ( 1 + 1/n + (X_h − X̄)² / Σ(X_i − X̄)² )
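Plugging the Toluca numbers from the SAS output into these two formulas reproduces both standard errors and the prediction limits at X_h = 65. A Python sketch (X̄ = 70, the Toluca mean lot size, and SS_X = 19800 are assumed here; they are not printed in the output above):

```python
# Standard error of the estimated mean vs. s{pred} for a new observation
# at X_h = 65, using the Toluca numbers from the SAS output.
# Assumes xbar = 70 (Toluca mean lot size) and SS_X = 19800.
mse = 2383.71562                # s^2 from the ANOVA table
n, xbar, ss_x = 25, 70.0, 19800.0
b0, b1 = 62.36586, 3.57020
tc = 2.06866                    # t_{23}(0.975)

xh = 65.0
yhat = b0 + b1 * xh
leverage = 1 / n + (xh - xbar) ** 2 / ss_x
se_mean = (mse * leverage) ** 0.5        # s{Yhat_h}
se_pred = (mse * (1 + leverage)) ** 0.5  # s{pred}, always larger
pi = (yhat - tc * se_pred, yhat + tc * se_pred)
print(round(se_mean, 4), round(pi[0], 2), round(pi[1], 2))
```

This matches the SAS output: standard error 9.9176 for the mean, and the prediction limits 191.3676 and 397.4904.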
See the program knnl061x.sas for the clm (mean)
and cli (individual) plots.

data a1;
  infile 'H:\System\Desktop\CH01TA01.DAT';
  input size hours;

Confidence intervals:

symbol1 v=circle i=rlclm95;
proc gplot data=a1;
  plot hours*size;
[Figure: scatterplot of hours vs. size with the fitted line and 95% confidence limits for the mean]
Prediction intervals:

symbol1 v=circle i=rlcli95;
proc gplot data=a1;
  plot hours*size;
run;
Working-Hotelling Confidence Bands (Section 2.6)

This is a confidence limit for the whole line at once, in contrast to
the confidence interval for just one E(Y_h) at a time.

- The regression line b_0 + b_1 X_h describes E(Y_h) for a given X_h.
- We have a 95% CI for E(Y_h) pertaining to a specific X_h.
- We want a confidence band for all X_h at once.

The confidence limit is given by Ŷ_h ± W s{Ŷ_h}, where
W² = 2 F_{2,n−2}(1 − α). Since we are doing all values of X_h at
once, the band will be wider at each X_h than the CIs for individual X_h.

The boundary values define a hyperbola.

The theory for this comes from the joint confidence region for
(β_0, β_1), which is an ellipse (see Stat 524).

We are used to constructing CIs with t's, not W's. Can we fake it?

We can find a new, smaller α for t_c that would give the same
result, a kind of "effective alpha" that takes into account that
you are estimating the entire line.

We find W² for our desired α, and then find the effective α_t to
use with t_c so that W(α) = t_c(α_t).
Confidence Band for Regression Line

See program knnl061.sas for the regression line confidence band.

data a1;
  n=25; alpha=.10; dfn=2; dfd=n-2;
  w2=2*finv(1-alpha,dfn,dfd);
  w=sqrt(w2); alphat=2*(1-probt(w,dfd));
  tstar=tinv(1-alphat/2, dfd); output;
proc print data=a1; run;
Note 1-probt(w, dfd) gives the area under the
t-distribution to the right of w. We have to double that to
get the total area in both tails.

Obs   n    alpha   dfn   dfd   w2        w         alphat     tstar
1     25   0.1     2     23    5.09858   2.25800   0.033740   2.25800
data a2;
  infile 'H:\System\Desktop\CH01TA01.DAT';
  input size hours;
symbol1 v=circle i=rlclm97;
proc gplot data=a2;
  plot hours*size;
Section 2.7: Analysis of Variance (ANOVA) Table

Organizes results arithmetically.

Total sum of squares in Y is SS_Y = Σ(Y_i − Ȳ)².

Partition this into two sources:
- Model (explained by regression)
- Error (unexplained / residual)

Y_i − Ȳ = (Y_i − Ŷ_i) + (Ŷ_i − Ȳ)

Σ(Y_i − Ȳ)² = Σ(Y_i − Ŷ_i)² + Σ(Ŷ_i − Ȳ)²

(cross terms cancel: see page 65)
Total Sum of Squares

Consider ignoring X_h to predict E(Y_h). Then the best predictor
would be the sample mean Ȳ.

SST is the sum of squared deviations from this predictor:
SST = SS_Y = Σ(Y_i − Ȳ)².

The total degrees of freedom is df_T = n − 1.

MST = SST / df_T

MST is the usual estimate of the variance of Y if there are no
explanatory variables, also known as s²{Y}.

SAS uses the term "Corrected Total" for this source.
"Uncorrected" is ΣY_i². The term "corrected" means that we
subtract off the mean Ȳ before squaring.
Model Sum of Squares

SSM = Σ(Ŷ_i − Ȳ)²

The model degrees of freedom is df_M = 1, since one
parameter (slope) is estimated.

MSM = SSM / df_M

KNNL uses the word "regression" for what SAS calls "model",
so SSR (KNNL) is the same as SS Model (SAS). I prefer to
use the terms SSM and df_M because R stands for regression,
residual, and reduced (later), which I find confusing.
Error Sum of Squares

SSE = Σ(Y_i − Ŷ_i)²

The error degrees of freedom is df_E = n − 2, since
estimates have been made for both slope and intercept.

MSE = SSE / df_E

MSE = s² is an estimate of the variance of Y,
taking into account (or conditioning on) the
explanatory variable(s).
ANOVA Table for SLR

Source               df       SS               MS
Model (Regression)   1        Σ(Ŷ_i − Ȳ)²      SSM/df_M
Error                n − 2    Σ(Y_i − Ŷ_i)²    SSE/df_E
Total                n − 1    Σ(Y_i − Ȳ)²      SST/df_T

Note about degrees of freedom

Occasionally, you will run across a reference to "degrees of
freedom" without specifying whether this is model, error, or total.
Sometimes it will be clear from context, and although that is sloppy
usage, you can generally assume that if it is not specified, it means
error degrees of freedom.
Expected Mean Squares

MSM and MSE are random variables.

E(MSM) = σ² + β_1² SS_X
E(MSE) = σ²

When H_0: β_1 = 0 is true, then E(MSM) = E(MSE).

This makes sense, since in that case, Ŷ_i = Ȳ.
F-test

F = MSM/MSE ~ F_{df_M, df_E} = F_{1, n−2}

See KNNL, pages 69-70.

When H_0: β_1 = 0 is false, MSM tends to be larger than
MSE, so we would want to reject H_0 when F is large.

Generally our decision rule is to reject the null hypothesis if
F ≥ F_c = F_{df_M, df_E}(1 − α) = F_{1, n−2}(0.95)

In practice, we use p-values (and reject H_0 if the p-value is less
than α).

Recall that t = b_1/s{b_1} tests H_0: β_1 = 0. It can be shown
that t²_{df} = F_{1, df}. The two approaches give the same p-value;
they are really the same test.

Aside: When H_0: β_1 = 0 is false, F has a noncentral F
distribution; this can be used to calculate power.

ANOVA Table

Source   df       SS     MS     F          p
Model    1        SSM    MSM    MSM/MSE    p
Error    n − 2    SSE    MSE
Total    n − 1
See the program knnl067.sas for the program used to
generate the other output used in this lecture.

data a1;
  infile 'H:\System\Desktop\CH01TA01.DAT';
  input size hours;
proc reg data=a1;
  model hours=size;
run;
Analysis of Variance
                          Sum of      Mean
Source             DF     Squares     Square        F Value   Pr > F
Model              1      252378      252378        105.88    <.0001
Error              23     54825       2383.71562
Corrected Total    24     307203

Parameter Estimates
                   Parameter    Standard
Variable     DF    Estimate     Error       t Value   Pr > |t|
Intercept     1    62.36586     26.17743    2.38      0.0259
size          1    3.57020      0.34697     10.29     <.0001

Note that t² = 10.29² = 105.88 = F.
Section 2.8: General Linear Test

A different view of the same problem (testing β_1 = 0). It may
seem redundant now, but the concept is extremely useful in MLR.

We want to compare two models:

Y_i = β_0 + β_1 X_i + ε_i   (full model)
Y_i = β_0 + ε_i             (reduced model)

Compare using the error sum of squares.
Let SSE(F) be the SSE for the full model, and let
SSE(R) be the SSE for the reduced model.

F = [ (SSE(R) − SSE(F)) / (df_{E(R)} − df_{E(F)}) ] / [ SSE(F) / df_{E(F)} ]

Compare to the critical value
F_c = F_{df_{E(R)} − df_{E(F)}, df_{E(F)}}(1 − α) to test H_0: β_1 = 0
vs. H_a: β_1 ≠ 0.
Test in Simple Linear Regression

SSE(R) = Σ(Y_i − Ȳ)² = SST

SSE(F) = SST − SSM   (the usual SSE)

df_{E(R)} = n − 1, df_{E(F)} = n − 2, so df_{E(R)} − df_{E(F)} = 1

F = [ (SST − SSE) / 1 ] / [ SSE / (n − 2) ] = MSM/MSE

(Same test as before.)

This approach (full vs. reduced) is more general, and
we will see it again in MLR.
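With the Toluca ANOVA figures from the earlier output (SST = 307203, SSE = 54825, n = 25), the full-vs-reduced F reproduces the printed F value, up to the rounding of the printed sums of squares (Python):

```python
# General linear test F for H0: beta1 = 0 in SLR:
# F = [(SSE(R) - SSE(F)) / 1] / [SSE(F) / (n - 2)].
# SS values are the (rounded) Toluca figures from the SAS ANOVA table.
sst = 307203.0   # SSE for the reduced model Y_i = beta0 + eps_i
sse = 54825.0    # SSE for the full model
n = 25
F = ((sst - sse) / 1) / (sse / (n - 2))
print(round(F, 2))
```

This gives F ≈ 105.88, matching MSM/MSE in the output.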
Pearson Correlation

ρ is the usual correlation coefficient (estimated by r).

It is a number between −1 and +1 that measures the strength of
the linear relationship between two variables:

r = Σ(X_i − X̄)(Y_i − Ȳ) / √( Σ(X_i − X̄)² Σ(Y_i − Ȳ)² )

Notice that

r = b_1 √( Σ(X_i − X̄)² / Σ(Y_i − Ȳ)² ) = b_1 (s_X / s_Y)
A test of H_0: β_1 = 0 is similar to a test of H_0: ρ = 0.

R² and r²

R² is the ratio of explained to total variation:

R² = SSM/SST

r² is the square of the correlation between X and Y:

r² = b_1² ( Σ(X_i − X̄)² / Σ(Y_i − Ȳ)² ) = SSM/SST
In SLR, r² and R² are the same thing.

However, in MLR they are different (there will be a
different r for each X variable, but only one R²).

R² is often multiplied by 100 and thereby expressed as a percent.

In MLR, we often use the adjusted R², which has been
adjusted to account for the number of variables in the
model (more in Chapter 6).
                   Sum of      Mean
Source      DF     Squares     Square    F Value   Pr > F
Model       1      252378      252378    105.88    <.0001
Error       23     54825       2383
C Total     24     307203

R-Square   0.8215   = SSM/SST = 1 − SSE/SST
                    = 252378/307203
Adj R-sq   0.8138   = 1 − MSE/MST
                    = 1 − 2383/(307203/24)
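The two quantities in this output can be recomputed directly (Python, using the printed Toluca sums of squares and the unrounded MSE = 2383.71562 from the earlier output):

```python
# R-Square and Adj R-Sq exactly as defined in this output:
# R^2 = 1 - SSE/SST, adjusted R^2 = 1 - MSE/MST.
sst, sse, mse = 307203.0, 54825.0, 2383.71562
n = 25
mst = sst / (n - 1)       # MST = SST / df_T
r2 = 1 - sse / sst
adj_r2 = 1 - mse / mst
print(round(r2, 4), round(adj_r2, 4))
```

This reproduces R-Square = 0.8215 and Adj R-sq = 0.8138.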