
ISyE4031 Regression and Forecasting

Practice Problems 2
Fall 2014
1. In mobile ad hoc computer networks, messages must be forwarded from computer to computer
until they reach their destinations. The data overhead is the number of bytes of information that
must be transmitted, and it measures the success of a protocol. A regression study was
performed to predict the data overhead (kB) by using the independent variables average speed
(m/s), pause time (s), and link change rate (LCR, 100/s). The studied model was
y = β0 + β1x1 + β2x2 + β3x3 + β4x1x3 + β5x2x3 + β6x1^2 + ε   (x1 = speed, x2 = pause time, x3 = LCR)
The regression equation is
Overhead = 406 - 1.93 Speed - 0.387 Pause + 1.02 LCR - 0.0618 Speed*LCR
+ 0.290 Pause*LCR + 0.0401 Speed**2
Predictor   Coef      SE Coef   T      P      Significant? (Y/N)   Keep?   Why?
Constant    406.491   9.032     45.01  0.000
Speed       -1.932    1.137     -1.70  0.107
Pause       -0.3866   0.2202    -1.76  0.096
LCR         1.0153    0.9208     1.10  0.285
Speed*LCR   -0.06182  0.02102   -2.94  0.009
Pause*LCR   0.28985   0.04259    6.81  0.000
Speed**2    0.04013   0.02056    1.95  0.067

a. By looking at the output and considering the p values, state whether each predictor should be
kept in the model at α = 0.05, and why. Fill in the table.
b. What would the expected data overhead be if the average speed is 15 m/s, the pause time is 10
s, and the LCR is 20?
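One way to carry out the part (b) calculation is simply to plug the reported coefficients into the fitted equation; the short Python sketch below does exactly that (using the coefficients as printed in the output above, so the result is only as precise as the rounding).

# Sketch: evaluate the fitted overhead equation at a given design point.
# Coefficients are copied from the regression output above.
def predicted_overhead(speed, pause, lcr):
    return (406.491
            - 1.932 * speed
            - 0.3866 * pause
            + 1.0153 * lcr
            - 0.06182 * speed * lcr
            + 0.28985 * pause * lcr
            + 0.04013 * speed ** 2)

# Part (b): speed = 15 m/s, pause time = 10 s, LCR = 20
print(predicted_overhead(15, 10, 20))  # approximately 442 kB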
2. The weekly sales (in $1000 per week) for fast-food outlets in each of four cities were
collected. The objective is to model sales (y) as a function of traffic flow (in 1,000s of cars),
adjusting for city-to-city variations that might be due to size or other market conditions. The
linear regression model is therefore:
y = β0 + β1x + β2C1 + β3C2 + β4C3 + ε, where x is traffic flow,

C1 = 1 if city 1, 0 otherwise;   C2 = 1 if city 2, 0 otherwise;   C3 = 1 if city 3, 0 otherwise,

and City 4 is the base level.
The following output was obtained.


SALES = 1.08 - 1.22 CITY_1 - 0.531 CITY_2 - 1.08 CITY_3 + 0.104 TRAFFIC
Predictor   Coef      SE Coef   T      P
Constant    1.0834    0.3210     3.37  0.003
CITY_1      -1.2158   0.2054    -5.92  0.000
CITY_2      -0.5308   0.2848    -1.86  0.078
CITY_3      -1.0765   0.2265    -4.75  0.000
TRAFFIC     0.103673  0.004094  25.32  0.000

S = 0.362307   R-Sq = 97.9%   R-Sq(adj) = 97.5%

Analysis of Variance
Source          DF  SS       MS      F       P
Regression       4  116.656  29.164  222.17  0.000
Residual Error  19    2.494   0.131
Total           23  119.150

Answer the following questions by using α = 0.05.


a. Are the mean weekly sales identical for all four cities? Why? That is, support your answer by
hypothesis testing and expected values. Use α = 0.05.
b. Does City 2 have more expected sales than City 4? State the hypothesis that you consider
explicitly.
c. Which city has the most expected sales? Explain by referring to the coefficients, hypothesis
testing, and expected values.
d. What are the expected sales in City 3 when the traffic flow is recorded as 60,000 cars?
e. Suppose that you modeled and solved the problem as a simple linear regression model:
y = β0 + β1x + ε, where y: Sales and x: Traffic flow. In order to decide on the significance of
the cities, compare this reduced model and the complete model by using a partial (nested) F test.
State the hypothesis explicitly.
SALES = 0.018 + 0.108 TRAFFIC
Analysis of Variance
Source          DF  SS      MS      F       P
Regression       1  111.34  111.34  313.75  0.000
Residual Error  22    7.81    0.35
Total           23  119.15
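For part (e), the partial (nested) F statistic can be assembled directly from the two ANOVA tables; the Python sketch below uses the printed SSE and degrees-of-freedom values, so it is a numerical illustration rather than output from the original analysis.

# Sketch: partial (nested) F test comparing the reduced and complete models.
# SSE and df values are read off the two ANOVA tables above.
sse_reduced, df_reduced = 7.81, 22      # SALES ~ TRAFFIC only
sse_complete, df_complete = 2.494, 19   # SALES ~ TRAFFIC + city dummies

f_stat = ((sse_reduced - sse_complete) / (df_reduced - df_complete)) / (
    sse_complete / df_complete)
print(f_stat)  # roughly 13.5

# The critical value F(0.05; 3, 19) is about 3.13, so a statistic this large
# would reject H0: the three city coefficients are all zero.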

3. Screening Techniques.
a. In a linear regression study, five predictors are being evaluated by using stepwise regression.
By considering the quantities given below, perform the first two steps of the stepwise regression,
i.e., write down the selected variable(s) in Step 1 and Step 2 (a sketch of the entry rule is given
after the candidate tables). Assume α_entry = α_remove = 0.10.
Step 1 candidates (one predictor):
Predictors   t-stat for each Xj   p-value for each Xj
X1           11.21                0.000
X2           22.90                0.000
X3           2.75                 0.015
X4           22.41                0.000
X5           10.71                0.000

Step 2 candidates (two predictors):
Predictors   t-stat for each Xj   p-value for each Xj
X1, X2       3.91, 9.92           0.002, 0.000
X1, X3       10.41, 2.37          0.000, 0.033
X1, X4       3.84, 9.67           0.002, 0.000
X1, X5       3.06, 2.74           0.009, 0.016
X2, X3       24.43, -3.40         0.000, 0.004
X2, X4       0.72, -0.41          0.486, 0.690
X2, X5       7.19, 1.34           0.000, 0.200
X3, X4       -3.31, 23.86         0.005, 0.000
X3, X5       2.02, 9.48           0.063, 0.000
X4, X5       6.97, 1.39           0.000, 0.250

Step 1:
Step 2:
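As referenced in part (a), here is a minimal Python sketch of the entry decision applied at each step. It is a generic illustration, not Minitab's exact algorithm: a candidate enters only if its p-value is below α_entry, and among eligible candidates the one with the largest |t| is chosen.

# Sketch of the entry decision used at each stepwise step.
alpha_enter = 0.10

def choose_entry(candidates):
    """candidates: list of (predictor, t_stat, p_value) for the term whose
    entry is being considered at the current step."""
    eligible = [c for c in candidates if c[2] < alpha_enter]
    if not eligible:
        return None  # nothing qualifies, so the procedure stops
    return max(eligible, key=lambda c: abs(c[1]))

# Step 1: each predictor tried on its own (values from the table above)
step1 = [("X1", 11.21, 0.000), ("X2", 22.90, 0.000), ("X3", 2.75, 0.015),
         ("X4", 22.41, 0.000), ("X5", 10.71, 0.000)]
print(choose_entry(step1))  # the eligible predictor with the largest |t|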

b. Consider the best subset output given below. What is the best subset of variables? State your
reasons.
Response is y

Vars  R-Sq  R-Sq(adj)  Mallows Cp  S        x1 x2 x3 x4 x5 x6
1     51.2  45.1       144.9       2496.8
1     47.6  41.1       156.0       2587.1
2     93.9  92.1        14.9        945.09
2     79.6  73.8        59.1       1726.6
3     97.2  95.9         6.6        686.37
3     95.8  93.7        11.1        848.68
4     98.8  97.9         3.7        492.34
4     97.6  95.8         7.3        694.63
5     99.0  97.7         5.2        510.32
5     98.8  97.3         5.7        549.94
6     99.0  97.1         7.0        574.97   X  X  X  X  X  X
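As a reminder when reading the table: for a subset model with p estimated coefficients (including the intercept), Mallows Cp = SSEp / MSE_full - (n - 2p), and a good subset has Cp close to p together with a high R-Sq(adj) and a small S. By construction the full six-variable model has Cp = p exactly, which matches the Cp = 7.0 shown in the last row.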

4. Residual analysis and diagnostics.


a. A linear regression model, y = β0 + β1x1 + β2x2 + ε, was fitted to 24 heat treatment data
points, and several diagnostic statistics were produced. Consider the following four of those 24
observations and state whether they are unusual or not. If an observation is unusual, is it an
outlier, a high-leverage point, and/or influential? State your reasons explicitly by referring to the
diagnostic statistics (rule-of-thumb cutoffs are sketched after the answer blanks below).
Observation  y      SRES1     TRES1     HI1       COOK1
1            0.013  -1.47200  -1.50366  0.053974  0.04121
2            0.068  -3.17206  -3.33242  0.689127  3.48608
3            0.056  -0.12900  -0.12679  0.269057  1.02204
4            0.014  -2.87807  -2.90441  0.058732  0.11762

Observation #1:
Observation #2:
Observation #3:
Observation #4:
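As mentioned in part (a), one common set of rule-of-thumb cutoffs follows directly from the problem dimensions (n = 24 observations, k = 2 predictors); the Python sketch below applies the usual textbook conventions, which are a reasonable default rather than the only possible choices.

# Sketch: rule-of-thumb cutoffs for the diagnostic statistics in Problem 4.
n, k = 24, 2                        # observations and predictors

leverage_cutoff = 2 * (k + 1) / n   # 2(k+1)/n = 0.25 here
residual_cutoff = 2.0               # |SRES| or |TRES| beyond about 2 is suspicious
cooks_cutoff = 1.0                  # Cook's D near or above 1 suggests influence

def flag(sres, tres, hi, cook):
    return {
        "outlier": abs(sres) > residual_cutoff or abs(tres) > residual_cutoff,
        "high_leverage": hi > leverage_cutoff,
        "influential": cook > cooks_cutoff,
    }

# Example: observation 2 from the table above
print(flag(-3.17206, -3.33242, 0.689127, 3.48608))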
5. The data on annual sales revenues (in billions of dollars) of the Eastman Kodak Company over
a 25-year period were considered. The following time series plot depicts the actual annual
revenues (y) over the 25 years.
[Time Series Plot of Y: annual revenues (roughly 5 to 20 billion dollars) plotted against the year index, 1-25]
A quadratic trend model, yt = β0 + β1t + β2t^2 + εt, was studied and the following results were
obtained.
The regression equation is
Y = 3.15 + 1.46 t - 0.0411 t^2
Predictor   Coef       SE Coef    T      P
Constant    3.152      1.137       2.77  0.011
t           1.4617     0.2194      6.66  0.000
t^2         -0.041122  0.008832   -4.66  0.000

S = 2.04889   R-Sq = 80.6%   R-Sq(adj) = 78.9%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression       2  384.04  192.02  45.74  0.000
Residual Error  22   92.36    4.20
Total           24  476.39

a. Would you consider this model and the predictors useful (significant)? Support your answer
by looking at the p values and the corresponding tests. Assume α = 0.10.
b. Would you change your answer in part (a) after seeing the residual plots given below, by
considering the assumptions on the errors? In other words, check whether the results justify the
basic error term assumptions, i.e., εi ~ i.i.d. Normal(0, σ²). State the hypotheses that you
are testing and the residual plot that you are referring to explicitly, and use α = 0.10.
[Residual Plots for Y: Normal Probability Plot, Residuals Versus Fits, Histogram of the Residuals, Residuals Versus Observation Order]
A-D = 0.397, p-value = 0.343


Durbin-Watson statistic = 0.532552
- Each εi has a normal distribution:
- Each εi has an identical distribution:
- Each εi is independent:
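For the independence check, it may help to see how the Durbin-Watson statistic is actually computed; the Python sketch below uses purely illustrative residuals, since the problem reports only the summary value DW = 0.533 (values far below 2 generally point to positive autocorrelation), alongside the Anderson-Darling normality result above.

# Sketch: Durbin-Watson statistic for a residual series (illustrative data only).
import numpy as np

def durbin_watson(e):
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical residuals just to show the calculation; they are NOT the
# residuals from the Kodak model.
example_residuals = np.array([1.2, 0.9, 0.7, -0.1, -0.8, -1.0, -0.5, 0.4])
print(durbin_watson(example_residuals))  # values near 2 suggest independence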
6. In a chemical experiment, the true relationship between yield (y) and reaction time (x) is
assumed to be: y = β1 e^(β2 x) ε.

a. First, apply a transformation to the equation so that a simple linear regression solution can be
found.
b. Then, consider the solution to the transformed model: ŷ* = -2.3 + 0.6x.
What are the estimates of β1 and β2, and the predicted value of y (i.e., ŷ) when x = 5?
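A sketch of the back-transformation used here: taking natural logarithms of the assumed model gives ln(y) = ln(β1) + β2 x + ln(ε), so fitting y* = ln(y) against x yields an intercept that estimates ln(β1) and a slope that estimates β2; the estimate of β1 is then recovered by exponentiating the fitted intercept, and the prediction on the original scale is ŷ = e^(ŷ*) at the chosen x.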
7. Answer the following short-answer questions.
a. If the variance inflation factor (VIF) is not less than 1, we can say that there exists
multicollinearity between independent variables. True or False?
b. An unusual data point can be both an outlier and a high leverage point. True or False?
c. What can we detect when the t-tests for all (or nearly all) parameters are non-significant,
whereas the F-test for overall model adequacy (H0: β1 = β2 = ... = βk = 0) is significant?
d. Suppose a fitted linear regression model is ŷ = 15 + 2x1 - 3x2 + x2^2 + 4x1x2. What is the
amount of change in the expected value of y for every one-unit increase in x1, holding x2 fixed
at 2?
