
QMB3250 Fall 2011 Ripol EXAM 3 December 14, 2011

FORMULAS
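Partial F test: full model with k predictors; reduced model with k - g predictors.
$$TS = \frac{\left[\mathrm{SSE}_{\text{reduced}} - \mathrm{SSE}_{\text{full}}\right]/g}{\mathrm{SSE}_{\text{full}}/(n-k-1)}$$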
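A minimal Python sketch of the partial F statistic, TS = [(SSE_reduced - SSE_full)/g] / [SSE_full/(n - k - 1)]. It assumes SSE_reduced and SSE_full are the error sums of squares of the reduced and full models and n is the number of observations. The function name `partial_f` is illustrative only. The usage line plugs in the Error SS values from the Model #1 output (k = 6) and the Model #4 output (Age and Garage dropped, so g = 2) on the HOUSE VALUES page, with n = 30 homes.

```python
def partial_f(sse_reduced: float, sse_full: float, g: int, n: int, k: int) -> float:
    """Partial F statistic for testing whether g of the k full-model predictors can be dropped."""
    # Extra error caused by dropping the g predictors, per dropped predictor
    numerator = (sse_reduced - sse_full) / g
    # Mean squared error of the full model, with n - k - 1 degrees of freedom
    denominator = sse_full / (n - k - 1)
    return numerator / denominator


# Model #1 (full, 6 predictors) vs. Model #4 (reduced, Age and Garage removed)
ts = partial_f(sse_reduced=122682.95, sse_full=122272.266, g=2, n=30, k=6)
print(round(ts, 4))  # 0.0386
```

The statistic is compared with an F distribution with g and n - k - 1 degrees of freedom. A value this far below 1 gives no evidence that the dropped predictors help.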

Exponential smoothing with weight W: $E_t = W\,Y_t + (1 - W)\,E_{t-1}$
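The smoothing recursion can be checked in a few lines of Python. This sketch assumes the series starts with E_1 = Y_1, because the formula itself does not fix a starting value. The helper name `exp_smooth` is hypothetical. The sample data are the McDonald's revenues for 1998 to 2008 from Questions 29 to 32.

```python
def exp_smooth(series: list[float], w: float) -> list[float]:
    """Return E_t = w*Y_t + (1 - w)*E_{t-1}, assuming E_1 = Y_1."""
    smoothed = [series[0]]  # assumed starting value E_1 = Y_1
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


# McDonald's gross revenues, 1998-2008 (billions of dollars)
revenues = [12.4, 13.3, 14.2, 14.8, 15.2, 16.8, 18.6, 19.8, 20.9, 22.8, 23.5]
e = exp_smooth(revenues, w=0.25)
print(round(e[1], 3), round(e[2], 3))  # 1999 and 2000: 12.625 13.019
```

A larger W puts more weight on the current observation, so the smoothed series follows the data more closely and is smoothed less.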

You may use the area below as scratch paper.


HOUSE VALUES
Model #  # Predictors  RMSE     R-squared  R-squared (adj)  Variables
1        6             72.912   0.858      0.821            Property Size, House Size, Age, Rooms, Baths, Garage
2        5             71.380   0.858      0.829            Property Size, House Size, Rooms, Baths, Garage
3        5             71.443   0.858      0.828            Property Size, House Size, Age, Rooms, Baths
4        4             70.052   0.858      0.835            Property Size, House Size, Rooms, Baths
5        4             72.047   0.850      0.826            Property Size, House Size, Baths, Garage
6        3             70.901   0.849      0.831            Property Size, House Size, Baths
7        3             72.303   0.842      0.824            Property Size, House Size, Rooms
8        2             72.106   0.837      0.825            Property Size, House Size
9        2             85.674   0.770      0.753            House Size, Baths
10       1             121.926  0.517      0.500            Property Size
11       1             95.151   0.706      0.696            House Size
12       1             172.013  0.040      0.005            Age
13       1             172.277  0.037      0.002            Rooms
14       1             114.714  0.573      0.558            Baths
15       1             158.651  0.183      0.154            Garage

Model #1
Variable        Estimate    Std. Err.   Tstat      P-value
Intercept       165.422     111.738     1.4804     0.1523
Property Size   526.24      165.686     3.1761     0.0042
House Size      0.12967     0.024720    5.245      <0.0001
Age             -0.039404   0.851532    -0.046     0.9635
Rooms           -15.6225    13.1718     -1.186     0.2477
Baths           42.005      28.3570     1.481      0.1521
Garage          6.5245      31.5862     0.2066     0.8382

Source   DF   SS           MS           F-stat      P-value
Model    6    740278.7     123379.78    23.20833    <0.0001
Error    23   122272.266   5316.1855
Total    29   862551

Summary of fit:
Root MSE: 72.91218
R-squared: 0.8582
R-squared (adjusted): 0.8213

Model #4
Variable        Estimate    Std. Err.   Tstat      P-value
Intercept       165.092     85.829      1.923498   0.0659
Property Size   532.15      155.12      3.4304     0.0021
House Size      0.130369    0.02337     5.5777     <0.0001
Rooms           -15.9963    12.516      -1.2781    0.213
Baths           43.4471     26.45       1.6425     0.113

Source   DF   SS           MS           F-stat      P-value
Model    4    739868       184967       37.692078   <0.0001
Error    25   122682.95    4907.318
Total    29   862551

Summary of fit:
Root MSE: 70.05225
R-squared: 0.8578
R-squared (adjusted): 0.835
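The Root MSE and adjusted R² lines in each summary can be recomputed from the ANOVA table. Root MSE = √(SSE/(n - k - 1)), and adjusted R² = 1 - [SSE/(n - k - 1)] / [SST/(n - 1)], which is equivalent to 1 - (1 - R²)(n - 1)/(n - k - 1). The following is a minimal Python check with n = 30 homes. The helper name `fit_summary` is hypothetical.

```python
import math


def fit_summary(sse: float, sst: float, n: int, k: int) -> tuple[float, float]:
    """Return (Root MSE, adjusted R-squared) for a model with k predictors."""
    mse = sse / (n - k - 1)          # error mean square
    root_mse = math.sqrt(mse)
    adj_r2 = 1 - mse / (sst / (n - 1))
    return root_mse, adj_r2


SST = 862551  # Total SS, the same for every model fit to the 30 homes
for label, sse, k in [("Model #1", 122272.266, 6), ("Model #4", 122682.95, 4)]:
    root_mse, adj_r2 = fit_summary(sse, SST, n=30, k=k)
    print(label, round(root_mse, 3), round(adj_r2, 4))
```

Adjusted R² penalizes each extra predictor, so it can fall even when R² rises. This makes it the fairer way to compare models of different sizes.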
QMB3250 Fall 2011 Ripol EXAM 3 TEST FORM CODE: A December 14, 2011

Instructions:
This exam contains 37 Multiple Choice questions.
32 of these questions are worth 3 points, for a total of 96 points.
The last 4 points on the exam will be awarded for correctly bubbling in your name, UFID number and Test
Form Code on the scantron sheet and showing your GatorOne picture ID.
An additional 5 questions at 1 point each provide 5 extra credit points on the test.

Sign the Honor Pledge below and the scantron sheet. The proctors will compare the signature on the ID to
these signatures.

Honor pledge: "On my honor, I have neither given nor received unauthorized aid on this examination."

SIGN your name in this box in INK: ______________________________
Write your UFID number: ______________________________

Questions 1–5: Extra Credit Questions, 1 point each

A waitress wants to know how different factors affect the amount she is tipped by customers, as a percentage of
their total bill for the meal. Below are the scenarios she is considering; select the best statistical procedure for
each one. Note: each alternative should be used only once.

1. Who tips more on average, males or females?

2. How do average tips compare for each of the seven days of the week?

3. Is there a relationship between the age of the customer and the tip amount?

4. Are customers more likely to tip over 15% for lunch or dinner?

5. Predict the tip for a large group based on how many people in the party, what percentage of them are male,
whether alcohol was consumed, and whether they request one check or separate checks.

a) ANOVA
b) Contingency Table
c) Multiple Regression
d) Simple Linear Regression
e) Two Independent Sample t-test
Questions 6–14 The economic structure of Major League Baseball allows some teams to make substantially more
money than others, which in turn allows some teams to spend much more on player salaries. These teams might
therefore be expected to have better players and win more games on the field as a result. Suppose that after
collecting data on team payroll (in millions of dollars) and season win total for 2010, we find the regression equation
Wins = 71.87 + 0.101 Payroll - 0.060 League, where League is an indicator variable that equals 0 if the team
plays in the National League and 1 if the team plays in the American League.

6. Common sense suggests that teams with a higher payroll should have a strong tendency to win more games, but
that league affiliation should not matter. Then common sense suggests that the ANOVA F test for this data would
probably have:
a) a small test statistic value and a small p-value.
b) a small test statistic value and a large p-value.
c) a large test statistic value and a small p-value.
d) a large test statistic value and a large p-value.

7. Based on the common sense described in the previous question, in which of the following t tests would we
probably reject the null hypothesis?
a) the t test for Payroll b) the t test for League
c) both t tests d) neither t test

8. The t test for which variable would necessarily have the same p-value as the ANOVA F test?
a) constant b) payroll
c) league d) wins e) none of them

9. Calculate the predicted number of wins for a National League team with a payroll of $98 million.
a) 65.99 b) 77.75 c) 77.85
d) 81.71 e) 81.77

10. One American League team in the data set had a payroll of $108 million and won 88 games. How far was the
observed number of wins from the prediction the model makes?
a) –1.26 b) 5.28 c) 9.65
d) 11.70 e) 22.61

11. Suppose we plotted the data and drew the regression lines for National League and American League teams.
What would be the intercept of the line for American League teams?
a) –0.060 b) 0.060 c) 0.941
d) 0.101 e) 71.81

12. Suppose we plotted the data and drew the regression lines for National League and American League teams.
What would be the slope of the line for American League teams?
a) –0.060 b) 0.060 c) 0.941
d) 0.101 e) 71.81

13. If Teams A and B both play in the same league, and Team A’s payroll is $1 million higher than Team B’s, then
we would expect Team A to win, on average,
a) 0.101 games more than Team B. b) 71.87 games more than Team B.
c) 0.060 games more than Team B. d) 0.060 games fewer than Team B.

14. If Teams A and B have the same payroll, but Team A plays in the National League while Team B plays in the
American League, then we would expect Team A to win, on average,
a) 0.101 games more than Team B. b) 71.87 games more than Team B.
c) 0.060 games more than Team B. d) 0.060 games fewer than Team B.
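League is a 0/1 indicator, so it only shifts the intercept. National League teams follow Wins = 71.87 + 0.101 Payroll. American League teams follow Wins = (71.87 - 0.060) + 0.101 Payroll = 71.81 + 0.101 Payroll, with the same slope. Below is a small Python sketch of the fitted equation, with payroll in millions of dollars. The function name `predicted_wins` is illustrative.

```python
def predicted_wins(payroll_millions: float, american_league: bool) -> float:
    """Fitted model Wins = 71.87 + 0.101*Payroll - 0.060*League (League = 1 for AL, 0 for NL)."""
    league = 1 if american_league else 0
    return 71.87 + 0.101 * payroll_millions - 0.060 * league


nl_98 = predicted_wins(98, american_league=False)   # NL team, $98 million payroll
al_108 = predicted_wins(108, american_league=True)  # AL team, $108 million payroll
residual = 88 - al_108                               # observed minus predicted wins
print(round(nl_98, 2), round(al_108, 2), round(residual, 2))  # 81.77 82.72 5.28
```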
Questions 15 – 21 Data for 51 U.S. “states” (50 states, plus the District of Columbia) was used to examine the
relationship between violent crime rate (violent crimes per 100,000 persons per year) and the predictor variables of
urbanization (percentage of the population living in urban areas) and poverty rate. A predictor variable indicating
whether or not a state is classified as a Southern state (1 = Southern, 0 = not) was also included. Computer output for
the analysis of this data is shown below (with some information intentionally left blank).

The regression equation is
Crime = -321.9 + 4.69 Urban + 39.3 Poverty - 649.3 South + 12.1 Urban*South - 5.84 Poverty*South

Predictor        Coef      SE Coef   T       P
Constant         -321.90   148.20    -2.17   0.035
Urban            4.689     1.654     2.83    0.007
Poverty          39.34     13.52     2.91    0.006
South(S=1)       -649.30   266.96    -2.43   0.019
Urban*South      12.05     2.871     4.20    0.000
Poverty*South    -5.838    16.671    -0.35   0.728

S = 140.01   R-Sq = 70.0%   R-Sq(adj) = 66.7%

Analysis of Variance

Source           DF   SS        MS       F       P
Regression       5    2060459   412091   ______  0.000
Residual Error   45   882169    19604
Total            50   2942628

15. Calculate the ANOVA F test statistic value.


a) 2.34 b) 4.20 c) 4.58 d) 21.02 e) 47.00

16. When finding the p-value for the ANOVA F test, what degrees of freedom should be used?
a) df = 5 b) df = 45 c) df = 50
d) df1 = 5, df2 = 45 e) df1 = 5, df2 = 50

17. Based on the p-value for the ANOVA F test shown in the output, how many of the predictors are useful for
predicting crime rate?
a) none of them b) all of them c) exactly one of them d) at least one of them

18. Which of the following predictors should probably be removed from the model to improve it?
a) Urban b) Poverty c) South d) Urban*South e) Poverty*South

19. Which of the following represents the fitted relationship between crime, urbanization, and poverty
for Southern states?
a) Crime = –321.9 + 4.69Urban + 39.3Poverty
b) Crime = –315.6 + 4.69Urban + 39.3Poverty
c) Crime = –315.6 + 16.8Urban + 33.5Poverty
d) Crime = –971.2 + 16.8Urban + 33.5Poverty
e) Crime = –971.2 + 4.69Urban + 39.3Poverty

20. Predict the violent crime rate for a non-Southern state with an urbanization of 65.6 and a poverty rate of 8.0.
a) 300.4 b) 336.5 c) 349.1
d) 416.9 e) 432.2

21. Predict the violent crime rate for a Southern state with an urbanization of 55.4 and a poverty rate of 13.7.
a) 418.2 b) 510.1 c) 535.8
d) 582.4 e) 633.5
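The ANOVA F statistic left blank in the output is MS(Regression)/MS(Residual Error), referred to an F distribution with 5 and 45 degrees of freedom. The sketch below computes that statistic and evaluates the fitted interaction model. Setting South = 1 adds the South coefficients to the intercept and to the Urban and Poverty slopes. The sketch uses the full-precision values from the Coef column, so its predictions can differ slightly from those computed with the rounded regression equation. The names `COEF` and `predicted_crime` are illustrative.

```python
# Coefficients copied from the Coef column of the output
COEF = {
    "const": -321.90,
    "urban": 4.689,
    "poverty": 39.34,
    "south": -649.30,
    "urban_south": 12.05,
    "poverty_south": -5.838,
}


def predicted_crime(urban: float, poverty: float, south: int) -> float:
    """Violent crime rate per 100,000; south = 1 for Southern states, 0 otherwise."""
    c = COEF
    return (c["const"] + c["urban"] * urban + c["poverty"] * poverty
            + c["south"] * south
            + c["urban_south"] * urban * south
            + c["poverty_south"] * poverty * south)


# ANOVA F = MS(Regression) / MS(Residual Error), df1 = 5, df2 = 45
f_stat = 412091 / 19604
print(round(f_stat, 2))  # 21.02

# Southern-state line: (intercept, Urban slope, Poverty slope) after adding the South terms
south_line = (COEF["const"] + COEF["south"],
              COEF["urban"] + COEF["urban_south"],
              COEF["poverty"] + COEF["poverty_south"])
print([round(v, 2) for v in south_line])

print(round(predicted_crime(65.6, 8.0, south=0), 1))  # 300.4
print(round(predicted_crime(55.4, 13.7, south=1), 1))
```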
Questions 22 - 28 A sample of 30 single-family homes located in a suburb of New York City was obtained, and six
variables were used to try to predict a home’s appraised value. The six predictors were: land area of the property
(acres), interior size of the house (square feet), age (years), number of bedrooms, number of bathrooms, and number
of cars that can be parked in the garage. Output from various statistical analyses of this data appears on the second
page of the exam, labeled HOUSE VALUES. There are summaries of 15 different models, and complete output for
models #1 and #4.

22. Which of the variables appears to be the single best predictor of appraised value?
a) property size b) house size c) age
d) baths e) rooms

23. Which of the variables provides the least information on appraised value?
a) house size b) age c) rooms
d) baths e) garage

24. Of the models presented, the best one overall has _____________________ predictors.
a) 6 b) 5 c) 4 d) either 4, 5 or 6 e) either 2 or 3

25. What is the F test statistic to determine if age and garage, together, are good predictors of appraised value?
a) 23.20
b) 37.69
c) 0.1606
d) 14.48
e) 0.0386

26. When comparing model #4 (4 predictors: Property Size, House Size, Rooms, Baths) to model #8
(2 predictors: Property Size, House Size), which of the following observations is correct?
a) model #4 is better because it has a higher adjusted R².
b) model #4 is better because it has a higher R².
c) model #8 is better because it has a higher Root MSE.
d) model #8 is better because it has fewer predictors.

27. Larger homes tend to have more bedrooms and bathrooms than smaller homes. Which of the following is a
statistical statement that says basically the same thing?
a) Extrapolation can give bad predictions if we use all three variables together.
b) There may be multicollinearity between two or more of those variables.
c) Very large homes with many bedrooms and bathrooms could be considered influential points.
d) A cause and effect relationship could not be concluded from this study.
e) It is necessary to look at adjusted R² instead of R² when comparing these models.

28. Larger homes with many bedrooms and bathrooms also tend to be more luxurious, costing more than would be
expected based on those variables alone. This suggests we could try:
a) adding to the model some quadratic terms
b) adding to the model some interaction terms
c) adding to the model some dummy variables
d) removing from the model some of the variables already in it
e) removing from the model all but one of those variables already in it
Questions 29 – 32 Below you will find data representing the gross revenues (in billions of current dollars) of
McDonald’s Corporation from 1998 to 2008, and a graph with the linear forecasting trend (starting at t=1).

Year   Revenues
1998   12.4
1999   13.3
2000   14.2
2001   14.8
2002   15.2
2003   16.8
2004   18.6
2005   19.8
2006   20.9
2007   22.8
2008   23.5

[Graph: Revenues vs. t (t = 1 to 11) with linear trend line y = 1.1545x + 10.555]

29. What is the Central Moving Average of Length 3 for the year 2001?
a) 14.73
b) 13.68
c) 14.86
d) 14.38
e) 15.25

30. What is the Exponentially Smoothed value using W=.25 for the year 2000?
a) 12.625
b) 13.019
c) 13.525
d) 13.648
e) 14.650

31. What would be the effect on the smoothed values of using W=.5 instead of W=.25?
a) The moving average series would be smoothed out more.
b) The moving average series would be smoothed out less.
c) The exponentially smoothed series would be smoothed out more.
d) The exponentially smoothed series would be smoothed out less.
e) The moving average series would be smoothed out more, but the exponentially smoothed series less.

32. Forecast the revenues for the year 2011.


a) 23.255
b) 22.100
c) 26.718
d) 25.564
e) 24.409
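The trend line in the McDonald's graph is an ordinary least-squares fit of revenue on t, with t = 1 for 1998. The Python sketch below refits that line from the revenue table and computes a length-3 centered moving average as the mean of a year and its two neighbors. It then extends the trend to 2011, which corresponds to t = 14. The helper name `centered_ma3` is illustrative.

```python
revenues = [12.4, 13.3, 14.2, 14.8, 15.2, 16.8, 18.6, 19.8, 20.9, 22.8, 23.5]  # 1998-2008
t = list(range(1, len(revenues) + 1))  # t = 1 for 1998

# Least-squares linear trend: y = b0 + b1*t
t_bar = sum(t) / len(t)
y_bar = sum(revenues) / len(revenues)
b1 = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, revenues))
      / sum((ti - t_bar) ** 2 for ti in t))
b0 = y_bar - b1 * t_bar


def centered_ma3(series: list[float], i: int) -> float:
    """Centered moving average of length 3 at 0-based index i (needs a neighbor on each side)."""
    return (series[i - 1] + series[i] + series[i + 1]) / 3


print(round(b1, 4), round(b0, 3))           # 1.1545 10.555
print(round(centered_ma3(revenues, 3), 2))  # 2001 is index 3: 14.73
print(round(b0 + b1 * 14, 3))               # 2011 is t = 14: 26.718
```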
Questions 33 – 37 The owner of a small health club knows that elliptical machines are very much in demand,
but they are also very expensive. In order to determine how many machines to buy, he first decides to lease varying
numbers of machines over a period of 12 Wednesdays. He advertises the number of machines, and then records the
number of people attending the gym on each of those days. The analysis and graph of the data appear below.

33. Looking at the graph of the data we can see that:


a) A straight line would do a decent job of summarizing the data, but a curve would be better.
b) A curve would do a decent job of summarizing the data, but a straight line would be better.
c) A straight line would do a bad job of summarizing the data, but a curve would be good.
d) A curve would do a bad job of summarizing the data, but a straight line would be good.
e) Both a curve and a straight line would be equally good.

34. To conduct a quadratic regression we need to create a new column on the spreadsheet with:
a) attendance squared
b) number of machines squared
c) interaction between attendance and number of machines
d) a dummy variable indicating whether that day of the week was Wednesday
e) all of the above

35. The p-values for the variables in the output below indicate that:
a) both x and x^2 are significant b) neither x nor x^2 is significant
c) the intercept is significant d) all three variables are significant
e) the error term is significant

36. What attendance does the model predict for a Wednesday when there are 5 machines?
a) 934 b) 697
c) 730 d) 718 e) 258

37. The owner decided to always test on Wednesdays to eliminate extra variability. What other things should he take
into consideration?
a) The number of machines available should be randomized from week to week.
b) Holidays can affect attendance, so those weeks should be avoided.
c) Advertisement should start well before the beginning of the experiment.
d) All of the above are important considerations.
e) None of the above are necessary considerations.

Parameter  Estimate    Std. Err.   Alternative  DF  T-Stat     P-Value
Intercept  72.05       35.237724   ≠0           9   2.0446837  0.0712
X          199.7625    23.053482   ≠0           9   8.665177   <0.0001
X^2        -13.651786  3.2239215   ≠0           9   -4.234528  0.0022

Source  DF  SS         MS         F-stat     P-value
Model   2   393933.12  196966.56  253.80302  <0.0001
Error   9   6984.5464  776.0607
Total   11  400917.66

Summary of fit:
Root MSE: 27.857866
R-squared: 0.9826
R-squared (adjusted): 0.9787
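The following is a sketch of the fitted quadratic from the health club output, Attendance = 72.05 + 199.7625 X - 13.651786 X², where X is the number of machines. The function name `predicted_attendance` is illustrative. Squaring X inside the function does the job of the extra X^2 column that the regression needed.

```python
def predicted_attendance(machines: float) -> float:
    """Attendance from the fitted quadratic in the number of machines X."""
    return 72.05 + 199.7625 * machines - 13.651786 * machines ** 2


print(round(predicted_attendance(5)))  # 730
```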
