Chapter 7 The Simple Linear Regression Model
A common model for the relationship between two quantitative variables is the linear
regression model. Don’t be fooled by the “linear” part: as we’ll see, linear regression models can
often be used to model relationships that aren’t linear.
Although we looked at the linear regression model last semester, we only looked at one part of it
– the part that models the mean response Y as a linear function of X. We’ll extend the model to
model the scatter of the individual data points around the line. The way we extend it makes the
linear regression model exactly like the ANOVA model, except that the explanatory variable is
quantitative instead of categorical.
We assume that at each X, the distribution of Y values is normal with mean $\beta_0 + \beta_1 X$ and
standard deviation $\sigma$:

$$\mu(Y \mid X) = \beta_0 + \beta_1 X$$
$$\sigma(Y \mid X) = \sigma$$
Least squares estimates of $\beta_0$ and $\beta_1$ are denoted by $\hat{\beta}_0$ and $\hat{\beta}_1$. The predicted or fitted value
of Y for a particular X is:

$$\hat{\mu}(Y \mid X) = \hat{\beta}_0 + \hat{\beta}_1 X$$
By modeling the distribution of data points around the line, we can make inferences from the
sample data about the regression parameters.
Case Study 7.2: Meat Processing and pH
ANOVA

Model 1       Sum of Squares   df   Mean Square   F         Sig.
Regression    3.00647          1    3.00647       444.306   .000
Residual       .05413          8     .00677
Total         3.06060          9

Predictors: (Constant), Log(hours)
Dependent Variable: pH

Coefficients

              Unstandardized Coefficients   Standardized Coefficients
Model 1       B        Std. Error           Beta                        t         Sig.
(Constant)    6.9836   .0485                                            143.897   .000
Log(hours)    -.7257   .0344                -.991                       -21.079   .000

Dependent Variable: pH
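$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$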
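The least squares formulas can be sketched directly in code. The example below is a minimal illustration, assuming a small made-up data set (not the steer data, which is not listed in these notes); the function name `least_squares` is hypothetical.

```python
from statistics import mean

def least_squares(x, y):
    """Return (b0, b1), the least squares intercept and slope."""
    x_bar, y_bar = mean(x), mean(y)
    # Numerator and denominator of the slope formula.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical toy data lying exactly on the line y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = least_squares(x, y)
print(b0, b1)
```

Because the toy points lie exactly on a line, the estimates recover its intercept and slope; with real data the points scatter around the fitted line.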
The ANOVA table gives the sum of squared residuals and the mean square residual, which is
$\hat{\sigma}^2 = 0.00677$, so $\hat{\sigma} = 0.0823$.
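As a quick check on how the ANOVA entries fit together, the sketch below recomputes the mean square residual, the estimate of σ, and the F statistic from the sums of squares and degrees of freedom printed in the table. Small differences from the SPSS output are expected because the table values are rounded.

```python
import math

# Values copied from the ANOVA table above.
ss_regression, df_regression = 3.00647, 1
ss_residual, df_residual = 0.05413, 8   # df = n - 2 with n = 10

ms_residual = ss_residual / df_residual            # estimate of sigma^2
sigma_hat = math.sqrt(ms_residual)                 # estimate of sigma
f_stat = (ss_regression / df_regression) / ms_residual

print(round(ms_residual, 5))
print(round(sigma_hat, 4))
print(round(f_stat, 1))
```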
The standard errors of $\hat{\beta}_0$ and $\hat{\beta}_1$ represent the estimated standard deviations of the sampling
distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$. The sampling distributions describe how the least squares estimates
would vary from sample to sample. We treat the $X_i$’s as fixed: they remain the
same from sample to sample, while the $Y_i$’s are random.
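$$\mathrm{SE}(\hat{\beta}_1) = \hat{\sigma}\sqrt{\frac{1}{(n-1)s_X^2}}, \qquad \mathrm{SE}(\hat{\beta}_0) = \hat{\sigma}\sqrt{\frac{1}{n} + \frac{\bar{X}^2}{(n-1)s_X^2}}$$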
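These two formulas can be checked against the standard errors in the Coefficients table. The sketch below assumes the summary values used in the steer example later in these notes: n = 10, a mean log(Hours) of 1.1901, and (n − 1)s_X² = 5.709.

```python
import math

n = 10
sigma_hat = 0.0823   # square root of the mean square residual
x_bar = 1.1901       # mean of log(Hours), from the steer example
sxx = 5.709          # (n - 1) * s_X^2, from the steer example

se_b1 = sigma_hat * math.sqrt(1 / sxx)
se_b0 = sigma_hat * math.sqrt(1 / n + x_bar ** 2 / sxx)

print(round(se_b1, 4))   # compare with Std. Error for Log(hours)
print(round(se_b0, 4))   # compare with Std. Error for (Constant)
```

Both results agree with the SPSS output to the precision allowed by the rounded inputs.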
A 95% confidence interval for $\beta_1$ is $-.7257 \pm t_8(.975)(.0344) = -.7257 \pm 2.306(.0344) = -.7257 \pm .0793$, or $-.805$ to $-.646$. So we are 95% confident that the decrease in mean pH is
between .646 and .805 for every 2.72-fold increase in time since slaughter (a one-unit increase in the natural log of hours).
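The interval arithmetic is easy to reproduce. The sketch below uses the t multiplier 2.306 quoted in the text rather than computing it, and, as an added illustration that is not part of the original notes, rescales the interval to a doubling of time: since X is the natural log of hours, doubling the hours adds log(2) to X.

```python
import math

b1, se_b1 = -0.7257, 0.0344
t_mult = 2.306               # t_8(.975), as quoted in the text

lower = b1 - t_mult * se_b1
upper = b1 + t_mult * se_b1
print(round(lower, 3), round(upper, 3))

# Change in mean pH when hours double (X increases by log 2):
# lower bound, point estimate, upper bound.
doubling = [value * math.log(2) for value in (lower, b1, upper)]
print([round(value, 3) for value in doubling])
```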
The confidence interval can also be obtained from SPSS by clicking the Statistics button in the
Analyze…Regression…Linear window.
Coefficients

              Unstandardized Coefficients   Standardized Coefficients                      95% Confidence Interval for B
Model 1       B        Std. Error           Beta                        t         Sig.    Lower Bound   Upper Bound
(Constant)    6.984    .049                                             143.897   .000    6.872         7.096
Log(hours)    -.726    .034                 -.991                       -21.079   .000    -.805         -.646

Dependent Variable: pH
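The standard error of the estimated mean response at $X_0$, $\hat{\mu}(Y \mid X_0)$, is

$$\mathrm{SE}[\hat{\mu}(Y \mid X_0)] = \hat{\sigma}\sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{(n-1)s_X^2}}$$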
Note that the standard error is bigger for values of $X_0$ further from $\bar{X}$ and is smallest at $\bar{X}$.
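To see how the standard error of the estimated mean grows away from the mean of X, the sketch below evaluates the formula at a few values of X_0. It assumes the steer summary values used later in these notes (n = 10, mean log(Hours) 1.1901, (n − 1)s_X² = 5.709); the function name and the particular times chosen are illustrative only.

```python
import math

N = 10
SIGMA_HAT = 0.0823
X_BAR = 1.1901     # mean of log(Hours)
SXX = 5.709        # (n - 1) * s_X^2

def se_mean_response(x0):
    """Standard error of the estimated mean response at x0."""
    return SIGMA_HAT * math.sqrt(1 / N + (x0 - X_BAR) ** 2 / SXX)

# Illustrative times (hours); X_0 is the natural log of each.
for hours in (1, 3, 6):
    print(hours, round(se_mean_response(math.log(hours)), 4))

# The minimum occurs at X_0 equal to the mean of X,
# where the second term under the square root vanishes.
print(round(se_mean_response(X_BAR), 4))
```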
Steer data: What is the estimated mean pH for carcasses 3 hours old? Give a 95% confidence
interval for the mean pH after 3 hours.
First, remember that the X variable in the regression model is log(Hours), so $X_0 = \log(3) = 1.0986$ (natural logarithm). Therefore, $\hat{\mu}(Y \mid X_0 = 1.0986) = 6.9836 - .7257(1.0986) = 6.186$.
To calculate the standard error, we need to compute $\bar{X}$, the mean of log(Hours) for the 10
data points, and $s_X^2$, the sample variance of log(Hours). From SPSS,
Descriptive Statistics
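Therefore, with $\bar{X} = 1.1901$ and $(n-1)s_X^2 = 5.709$,

$$\mathrm{SE}[\hat{\mu}(Y \mid X_0 = 1.0986)] = 0.0823\sqrt{\frac{1}{10} + \frac{(1.0986 - 1.1901)^2}{5.709}} = 0.0262$$

and a 95% confidence interval for the mean pH among all steers after 3 hours is
$6.186 \pm t_8(.975)(.0262) = 6.186 \pm 2.306(.0262) = 6.186 \pm .0604$, or 6.13 to 6.25.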
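Steer data: for simultaneous 95% confidence intervals, $F_{2,n-2}(1-\alpha) = F_{2,8}(.95) = 4.46$, so the Working-Hotelling multiplier is $\sqrt{2(4.46)} = 2.987$. The
confidence interval for the mean pH after 3 hours is therefore (see above): $6.186 \pm 2.987(.0262) = 6.186 \pm .0783$, or 6.11 to 6.26.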
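The difference between an individual and a simultaneous interval comes down to the multiplier applied to the same standard error. The sketch below assumes the usual Working-Hotelling form of the multiplier, the square root of 2F, which this excerpt does not spell out, and takes the F and t quantiles as quoted in the text.

```python
import math

f_quantile = 4.46     # F_{2,8}(.95), as quoted in the text
t_mult = 2.306        # t_8(.975), as quoted in the text
wh_mult = math.sqrt(2 * f_quantile)   # assumed Working-Hotelling multiplier

fit, se_fit = 6.186, 0.0262           # estimated mean pH at 3 hours and its SE

individual = (fit - t_mult * se_fit, fit + t_mult * se_fit)
simultaneous = (fit - wh_mult * se_fit, fit + wh_mult * se_fit)

print(round(wh_mult, 3))
print([round(v, 3) for v in individual])
print([round(v, 3) for v in simultaneous])
```

The simultaneous interval is wider because it is meant to hold for the mean response at every X at once, not only at 3 hours.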
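The predicted value of an individual response at $X_0$ and its standard error are:

$$\mathrm{Pred}(Y \mid X_0) = \hat{\mu}(Y \mid X_0) = \hat{\beta}_0 + \hat{\beta}_1 X_0$$

$$\mathrm{SE}[\mathrm{Pred}(Y \mid X_0)] = \sqrt{\hat{\sigma}^2 + \mathrm{SE}[\hat{\mu}(Y \mid X_0)]^2} = \hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{(n-1)s_X^2}}$$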
The standard error of prediction has two parts: the uncertainty due to estimating the mean
response at $X_0$ and the uncertainty due to the fact that individual observations vary around that
mean with standard deviation $\sigma$. Note that while the standard error of the mean response at $X_0$
goes to 0 as n increases, the standard error of prediction never goes to 0; it cannot fall below $\hat{\sigma}$. An individual
$100(1-\alpha)\%$ prediction interval for the response of an individual at $X_0$ is

$$\mathrm{Pred}(Y \mid X_0) \pm t_{n-2}(1 - \alpha/2)\,\mathrm{SE}[\mathrm{Pred}(Y \mid X_0)]$$
For the steer data, a 95% prediction interval for the pH of a particular steer 3 hours after
slaughter is:

$$6.186 \pm 2.306(.0823)\sqrt{1 + \frac{1}{10} + \frac{(1.0986 - 1.1901)^2}{5.709}} = 6.186 \pm 2.306(.08637) = 6.186 \pm .1992,$$

or 5.99 to 6.39.
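The prediction interval calculation can be reproduced step by step. The sketch below uses the fitted coefficients and summary values given in these notes and the quoted t multiplier, and confirms that the two forms of the standard error of prediction agree.

```python
import math

n, sigma_hat = 10, 0.0823
x_bar, sxx = 1.1901, 5.709       # mean of log(Hours) and (n - 1) * s_X^2
t_mult = 2.306                   # t_8(.975), as quoted in the text

x0 = math.log(3)                 # 3 hours after slaughter
fit = 6.9836 - 0.7257 * x0       # predicted pH

se_mean = sigma_hat * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
# Two equivalent forms of the standard error of prediction.
se_pred = math.sqrt(sigma_hat ** 2 + se_mean ** 2)
se_pred_alt = sigma_hat * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

pred_interval = (fit - t_mult * se_pred, fit + t_mult * se_pred)
print(round(se_pred, 5))
print([round(v, 2) for v in pred_interval])
```

Most of the width comes from the individual variation term: the standard error of prediction is only slightly larger than the estimate of σ itself.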
Simultaneous prediction intervals can be computed for several different X values using the
Bonferroni method, but there is no analog to the Scheffé-based Working-Hotelling procedure for
simultaneous prediction intervals at all possible values of X.
SPSS commands
Analyze…Regression…Linear
Under the Statistics button, you can choose to get confidence intervals for $\beta_0$ and $\beta_1$.
To obtain predicted values, confidence intervals and prediction intervals for a value of X not in
the data set, add a case to the data with the desired X value, but leave the value of Y blank (it
should display a period which indicates a missing value).
SPSS can plot the individual confidence intervals for mean response and the prediction
intervals for an individual response. Create a scatterplot and double-click the plot to get into
Chart Editor. Select one of the data points and click on the “Add fit line” icon. Under the “Fit
line” tab you can select “Mean” or “Individual” confidence intervals. The first gives individual
(not simultaneous) confidence intervals for the mean response at each X and the second gives
prediction intervals.
[Figure, titled “0.95 bands”, vertical axis y: 95% individual confidence intervals for the mean, 95% Working-Hotelling simultaneous
confidence bands for the mean, and 95% individual prediction intervals for a single response
(this graph is from S-Plus; SPSS will only do the first and last of the three).]