Practice Problems 2
Fall 2014
1. In mobile ad hoc computer networks, messages must be forwarded from computer to computer
until they reach their destinations. The data overhead is the number of bytes of information that
must be transmitted, and it measures the success of a protocol. A regression study was
performed to predict the data overhead (kB) by using the independent variables average speed
(m/s), pause time (s), and link change rate (LCR, 100/s). The studied model was
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_3 + \beta_5 x_2 x_3 + \beta_6 x_1^2 + \varepsilon$
The regression equation is
Overhead = 406 - 1.93 Speed - 0.387 Pause + 1.02 LCR - 0.0618 Speed*LCR
+ 0.290 Pause*LCR + 0.0401 Speed**2
Predictor   Coef       SE Coef   T
Constant    406.491    9.032     45.01
Speed       -1.932     1.137     -1.70
Pause       -0.3866    0.2202    -1.76
LCR         1.0153     0.9208    1.10
Speed*LCR   -0.06182   0.02102   -2.94
Pause*LCR   0.28985    0.04259   6.81
Speed**2    0.04013    0.02056   1.95
a. By looking at the output and considering the p-values, state whether each predictor should be
kept in the model at $\alpha = 0.05$, and why. Fill in the table.
b. What would the expected data overhead be if the average speed is 15 m/s, the pause time is 10
s, and the LCR is 20?
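A prediction of this kind amounts to plugging the three inputs into the fitted equation, including the interaction and squared terms. The sketch below does that arithmetic with the coefficients from the Coef column; the names `COEF` and `predict_overhead` are illustrative and not part of the original output.

```python
# Coefficients copied from the "Coef" column of the regression output.
COEF = {
    "const": 406.491,
    "speed": -1.932,
    "pause": -0.3866,
    "lcr": 1.0153,
    "speed_lcr": -0.06182,
    "pause_lcr": 0.28985,
    "speed_sq": 0.04013,
}


def predict_overhead(speed, pause, lcr):
    """Evaluate the fitted equation for data overhead (kB)."""
    return (
        COEF["const"]
        + COEF["speed"] * speed
        + COEF["pause"] * pause
        + COEF["lcr"] * lcr
        + COEF["speed_lcr"] * speed * lcr
        + COEF["pause_lcr"] * pause * lcr
        + COEF["speed_sq"] * speed ** 2
    )


predicted = predict_overhead(speed=15, pause=10, lcr=20)
print(f"Predicted overhead: {predicted:.1f} kB")
```

With these inputs the fitted equation gives roughly 442.4 kB.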
2. The weekly sales (in $1000 per week) for fast-food outlets in each of four cities were
collected. The objective is to model sales (y) as a function of traffic flow (in thousands of cars),
adjusting for city-to-city variations that might be due to size or other market conditions. The
linear regression model is therefore:
$y = \beta_0 + \beta_1 x + \beta_2 C_1 + \beta_3 C_2 + \beta_4 C_3 + \varepsilon$, where $x$ is traffic flow,
$C_1$ = 1 if city 1, 0 otherwise;
$C_2$ = 1 if city 2, 0 otherwise;
$C_3$ = 1 if city 3, 0 otherwise;
and City 4 is the base level.
Coef       SE Coef    T       P
1.0834     0.3210     3.37    0.003
-1.2158    0.2054     -5.92   0.000
-0.5308    0.2848     -1.86   0.078
-1.0765    0.2265     -4.75   0.000
0.103673   0.004094   25.32   0.000

S = 0.362307   R-Sq = 97.9%   R-Sq(adj) = 97.5%
Analysis of Variance
Source           DF   SS        MS       F        P
Regression       4    116.656   29.164   222.17   0.000
Residual Error   19   2.494     0.131
Total            23   119.150
F
313.75
P
0.000
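The summary statistics reported with this output (S, R-Sq, R-Sq(adj), and F = 222.17) can be cross-checked from the ANOVA sums of squares and degrees of freedom alone. The following is a minimal sketch using the standard definitions; the variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
ss_reg, df_reg = 116.656, 4
ss_err, df_err = 2.494, 19
ss_tot, df_tot = ss_reg + ss_err, df_reg + df_err

ms_reg = ss_reg / df_reg              # mean square for regression
ms_err = ss_err / df_err              # mean square error
f_stat = ms_reg / ms_err              # overall F statistic
r_sq = ss_reg / ss_tot                # coefficient of determination
r_sq_adj = 1 - ms_err / (ss_tot / df_tot)
s = math.sqrt(ms_err)                 # residual standard error

print(f"F = {f_stat:.2f}, R-Sq = {100 * r_sq:.1f}%, "
      f"R-Sq(adj) = {100 * r_sq_adj:.1f}%, S = {s:.4f}")
```

Small differences in the last digit of F are expected, because the table prints rounded sums of squares.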
3. Screening Techniques.
a. In a linear regression study, five predictors are being evaluated using stepwise regression.
Using the quantities given below, perform the first two steps of the stepwise regression,
i.e., write down the selected variable(s) in Step 1 and Step 2. Assume $\alpha_{entry} = \alpha_{remove} = 0.10$.
Step 1
Predictors   T       P
X1           11.21   0.000
X2           22.90   0.000
X3           2.75    0.015
X4           22.41   0.000
X5           10.71   0.000

Step 2
Predictors   T              P
X1, X2       3.91, 9.92     0.002, 0.000
X1, X3       10.41, 2.37    0.000, 0.033
X1, X4       3.84, 9.67     0.002, 0.000
X1, X5       3.06, 2.74     0.009, 0.016
X2, X3       24.43, -3.40   0.000, 0.004
X2, X4       0.72, -0.41    0.486, 0.690
X2, X5       7.19, 1.34     0.000, 0.200
X3, X4       -3.31, 23.86   0.005, 0.000
X3, X5       2.02, 9.48     0.063, 0.000
X4, X5       6.97, 1.39     0.000, 0.250
Step 1:
Step 2:
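One common way to carry out the two steps is to enter, at each step, the candidate with the largest |t| (equivalently the smallest p-value) provided its p-value is below the entry level, and then to re-check the variables already in the model against the removal level. The sketch below applies that rule to the tabulated values, assuming the paired numbers are t statistics and p-values listed in the same order as the predictors; the function names are illustrative.

```python
ALPHA_ENTRY = 0.10
ALPHA_REMOVE = 0.10

# One-predictor fits: predictor -> (t statistic, p-value).
STEP1 = {
    "X1": (11.21, 0.000), "X2": (22.90, 0.000), "X3": (2.75, 0.015),
    "X4": (22.41, 0.000), "X5": (10.71, 0.000),
}

# Two-predictor fits: (first, second) -> ((t, p) of first, (t, p) of second).
STEP2 = {
    ("X1", "X2"): ((3.91, 0.002), (9.92, 0.000)),
    ("X1", "X3"): ((10.41, 0.000), (2.37, 0.033)),
    ("X1", "X4"): ((3.84, 0.002), (9.67, 0.000)),
    ("X1", "X5"): ((3.06, 0.009), (2.74, 0.016)),
    ("X2", "X3"): ((24.43, 0.000), (-3.40, 0.004)),
    ("X2", "X4"): ((0.72, 0.486), (-0.41, 0.690)),
    ("X2", "X5"): ((7.19, 0.000), (1.34, 0.200)),
    ("X3", "X4"): ((-3.31, 0.005), (23.86, 0.000)),
    ("X3", "X5"): ((2.02, 0.063), (9.48, 0.000)),
    ("X4", "X5"): ((6.97, 0.000), (1.39, 0.250)),
}


def select_first(candidates, alpha_entry):
    """Pick the predictor with the largest |t| if it passes the entry level."""
    name, (_, p_value) = max(candidates.items(), key=lambda kv: abs(kv[1][0]))
    return name if p_value < alpha_entry else None


def select_second(first, pairs, alpha_entry, alpha_remove):
    """Add the best partner for `first`, then re-test `first` for removal."""
    best = None  # (new predictor, |t| of new predictor, p-value of `first`)
    for names, stats in pairs.items():
        if first not in names:
            continue
        fit = dict(zip(names, stats))
        new = names[0] if names[1] == first else names[1]
        t_new, p_new = fit[new]
        if p_new < alpha_entry and (best is None or abs(t_new) > best[1]):
            best = (new, abs(t_new), fit[first][1])
    if best is None:
        return [first]
    new, _, p_first = best
    kept = [first] if p_first < alpha_remove else []
    return kept + [new]


step1_choice = select_first(STEP1, ALPHA_ENTRY)
step2_choice = select_second(step1_choice, STEP2, ALPHA_ENTRY, ALPHA_REMOVE)
print("Step 1:", step1_choice)
print("Step 2:", step2_choice)
```

Under this rule X2 enters first, and X1 is the only strong partner for it: X4, although highly significant on its own, contributes almost nothing once X2 is in the model.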
b. Consider the best subset output given below. What is the best subset of variables? State your
reasons.
Response is y

Vars   R-Sq   Mallows Cp   R-Sq(adj)   S
1      51.2   144.9        45.1        2496.8
1      47.6   156.0        41.1        2587.1
2      93.9   14.9         92.1        945.09
2      79.6   59.1         73.8        1726.6
3      97.2   6.6          95.9        686.37
3      95.8   11.1         93.7        848.68
4      98.8   3.7          97.9        492.34
4      97.6   7.3          95.8        694.63
5      99.0   5.2          97.7        510.32
5      98.8   5.7          97.3        549.94
6      99.0   7.0          97.1        574.97
x x x x x x
1 2 3 4 5 6
X
X
X
X
X
X
X
X X
X X
X
X X
X X
X X
X X
X X X
X X
X X X X X
X X X X X X
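A best-subsets table is usually read by looking for the model with a small Mallows Cp, a high adjusted R-Sq, and a small S, while keeping the number of variables low. The sketch below ranks the eleven candidate models on those three criteria; it is one reasonable reading of the output, not the only possible selection rule, and the tuple layout is an illustrative choice.

```python
# (number of variables, R-Sq, Mallows Cp, R-Sq(adj), S) for each candidate model.
MODELS = [
    (1, 51.2, 144.9, 45.1, 2496.8),
    (1, 47.6, 156.0, 41.1, 2587.1),
    (2, 93.9, 14.9, 92.1, 945.09),
    (2, 79.6, 59.1, 73.8, 1726.6),
    (3, 97.2, 6.6, 95.9, 686.37),
    (3, 95.8, 11.1, 93.7, 848.68),
    (4, 98.8, 3.7, 97.9, 492.34),
    (4, 97.6, 7.3, 95.8, 694.63),
    (5, 99.0, 5.2, 97.7, 510.32),
    (5, 98.8, 5.7, 97.3, 549.94),
    (6, 99.0, 7.0, 97.1, 574.97),
]

best_by_cp = min(MODELS, key=lambda m: m[2])       # smallest Mallows Cp
best_by_adj = max(MODELS, key=lambda m: m[3])      # largest adjusted R-Sq
best_by_s = min(MODELS, key=lambda m: m[4])        # smallest residual std. error

for label, model in [("Cp", best_by_cp), ("R-Sq(adj)", best_by_adj), ("S", best_by_s)]:
    print(f"Best by {label}: {model[0]} variables, R-Sq = {model[1]}%")
```

All three criteria point to the same four-variable model; adding a fifth or sixth variable raises R-Sq only from 98.8% to 99.0% while adjusted R-Sq falls and S rises.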
y       SRES1      TRES1      HI1        COOK1
0.013   -1.47200   -1.50366   0.053974   0.04121
0.068   -3.17206   -3.33242   0.689127   3.48608
0.056   -0.12900   -0.12679   0.269057   1.02204
0.014   -2.87807   -2.90441   0.058732   0.11762
Observation #1:
Observation #2:
Observation #3:
Observation #4:
5. Data on the annual sales revenue (in billions of dollars) of the Eastman Kodak Company over
a 25-year period were considered. The following time series plot depicts the actual annual
revenues (y) over the 25 years.
[Figure: Time Series Plot of Y (vertical axis: Y, horizontal axis: Index)]
A quadratic trend model, $y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \varepsilon_t$, was studied and the following results were
obtained.
The regression equation is
Y = 3.15 + 1.46 t - 0.0411 t^2
Predictor   Coef        SE Coef    T       P
Constant    3.152       1.137      2.77    0.011
t           1.4617      0.2194     6.66    0.000
t^2         -0.041122   0.008832   -4.66   0.000

S = 2.04889   R-Sq = 80.6%   R-Sq(adj) = 78.9%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       2    384.04   192.02   45.74   0.000
Residual Error   22   92.36    4.20
Total            24   476.39
a. Would you consider this model and the predictors useful (significant)? Support your answer
by looking at the p-values and the tests. Assume $\alpha = 0.10$.
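The individual t-tests and the overall F-test can be read straight off the output: each T is Coef divided by SE Coef, and a term is judged significant when its p-value is below the stated level of 0.10. The following sketch reproduces those ratios and decisions from the printed values; the dictionary layout and names are illustrative.

```python
ALPHA = 0.10

# term -> (Coef, SE Coef, reported p-value) from the quadratic trend output.
TERMS = {
    "Constant": (3.152, 1.137, 0.011),
    "t": (1.4617, 0.2194, 0.000),
    "t^2": (-0.041122, 0.008832, 0.000),
}

# t statistic for H0: coefficient = 0 is the estimate over its standard error.
t_stats = {name: coef / se for name, (coef, se, _) in TERMS.items()}
significant = {name: p < ALPHA for name, (_, _, p) in TERMS.items()}

# Overall F statistic for H0: beta_1 = beta_2 = 0, from the ANOVA table.
f_stat = (384.04 / 2) / (92.36 / 22)
f_p_value = 0.000  # reported p-value for the F-test
model_useful = f_p_value < ALPHA

for name in TERMS:
    print(f"{name}: T = {t_stats[name]:.2f}, significant = {significant[name]}")
print(f"F = {f_stat:.2f}, model useful = {model_useful}")
```

Every reported p-value is below 0.10, so both the linear and quadratic terms, and the model as a whole, are significant at that level.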
b. Would you change your answer to part (a) after seeing the residual plots given below and
considering the assumptions on the errors? In other words, check whether the results support the
basic error-term assumptions, i.e., $\varepsilon_i \sim$ i.i.d. Normal$(0, \sigma^2)$. State explicitly the hypotheses that you
are testing and the residual plot that you are referring to, and use $\alpha$.
[Figure: Residual Plots for Y. Panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]
a. First, apply a transformation to the equation so that a simple linear regression solution can be
found.
b. Then, consider the solution to the transformed model: $y^* = -2.3 + 0.6x$.
What are $\beta_1$, $\beta_2$, and the predicted value of $y$ (i.e., $\hat{y}$) when $x = 5$?
7. Answer the following short-answer questions.
a. If the variance inflation factor (VIF) is not less than 1, we can say that there exists
multicollinearity between independent variables. True or False?
b. An unusual data point can be both an outlier and a high leverage point. True or False?
c. What can we detect when t-tests for all (or nearly all) parameters are non-significant
whereas the F-test for overall model adequacy ($H_0: \beta_1 = \cdots = \beta_k = 0$) is significant?
d. Suppose a fitted linear regression model is $\hat{y} = 15 + 2x_1 - 3x_2 + x_2^2 + 4x_1 x_2$. What is the
amount of change in the expected value of $y$ for every one-unit increase in $x_1$, holding $x_2$ fixed
at 2?
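With an interaction term in the model, the effect of a one-unit increase in x1 depends on the value of x2: it is the x1 coefficient plus the interaction coefficient times x2. The sketch below checks this numerically by differencing the fitted equation. It assumes the x2 term enters as -3 x2; that term cancels in the difference, so its sign does not affect the result.

```python
def y_hat(x1, x2):
    """Fitted model from part (d); the sign of the 3*x2 term is an assumption."""
    return 15 + 2 * x1 - 3 * x2 + x2 ** 2 + 4 * x1 * x2


X2_FIXED = 2

# Change in the fitted value when x1 goes up by one unit with x2 held at 2.
change = y_hat(1, X2_FIXED) - y_hat(0, X2_FIXED)

# Same quantity from the coefficients: 2 (x1 term) + 4 * x2 (interaction term).
slope_formula = 2 + 4 * X2_FIXED

print(change, slope_formula)
```

Both calculations give 10, and the same change results from any starting value of x1 because the model is linear in x1.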