The document provides examples and questions related to linear regression. It includes:
1) An example of computing a residual value based on a linear regression equation predicting exam scores.
2) Questions about interpreting correlation coefficients, evaluating appropriate regression models based on residual patterns, and predicting values from regression equations.
3) Two multi-part examples analyzing regression models: one predicting completion of a Ph.D. program from undergraduate GPA and course load, and another modeling a chemical reaction using time and amount of product.
1) In a statistics course, a linear regression equation was computed to predict the final exam score from the score on the first test of the term. The equation was
$\hat{y} = 25 + 0.7x$, where $\hat{y}$ is the predicted final exam score and $x$ is the score on the first test. George scored 80 on the first test and 85 on the final exam. What is the value of his residual?
Residual $= y - \hat{y}$
$\hat{y} = 25 + 0.7(80) = 81$, so residual $= 85 - 81 = 4$.
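A residual is always observed minus predicted. The short sketch below, using only the equation given in the problem (the function names are illustrative), applies that definition to George's scores.

```python
def predicted_final(first_test: float) -> float:
    """LSRL from the problem: y-hat = 25 + 0.7x."""
    return 25 + 0.7 * first_test


def residual(observed: float, first_test: float) -> float:
    """Residual = observed y minus predicted y-hat."""
    return observed - predicted_final(first_test)


george = residual(observed=85, first_test=80)
print(f"predicted = {predicted_final(80):.1f}, residual = {george:.1f}")
```

A positive residual means George did better on the final than the line predicted.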
2) A student was interested in the relationship between the weight of a car and its gas consumption measured in mpg. He selected 16 different automobiles and recorded their weights along with their advertised mpg. The regression plot is shown to the right.
What effect would adding the point (4,300 lb, 15.63 mpg) have on the value of $r^2$?
The new point would lie very close to the regression line. It would also be the lowest point on the graph and far from the bulk of the data, making it an influential point with high leverage. Because it follows the existing linear trend, it would strengthen that trend and cause $r^2$ to increase.
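The 16 data points are not reproduced here, so the sketch below uses a small set of HYPOTHETICAL weights (in thousands of pounds) and mpg values purely for illustration. It adds a heavy car at 4.3 thousand lb placed exactly on the current fitted line (not at the problem's 15.63 mpg) and compares $r^2$ before and after. A point with zero residual leaves the least squares line unchanged, so SSE stays the same while SST grows, which is why $r^2 = 1 - \text{SSE}/\text{SST}$ rises.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys):
    """r^2 = 1 - SSE/SST for the least squares line through (xs, ys)."""
    a, b = fit_lsrl(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


# HYPOTHETICAL weights (thousands of lb) and mpg values, NOT the student's 16 cars.
weights = [2.0, 2.2, 2.5, 2.7, 3.0, 3.1, 3.4, 3.6]
mpg = [30.5, 28.0, 27.5, 24.0, 23.5, 21.0, 20.5, 17.0]

a, b = fit_lsrl(weights, mpg)
r2_before = r_squared(weights, mpg)

# Add a heavy car far to the right that sits exactly on the current line.
new_x = 4.3
new_y = a + b * new_x
a_after, b_after = fit_lsrl(weights + [new_x], mpg + [new_y])
r2_after = r_squared(weights + [new_x], mpg + [new_y])

print(f"r^2 before: {r2_before:.4f}, after: {r2_after:.4f}")
print(f"slope before: {b:.4f}, after: {b_after:.4f}")
```

The slope barely moves while $r^2$ increases; a far-out point that does not follow the trend would instead pull the line and could lower $r^2$.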
3) The correlation between height and weight among men between the ages of 18 and 70 in the United States is approximately 0.42. Which of the following conclusions does not follow from the data? a) Taller men tend to be heavier. b) Changing the units of weight and height would still yield the same correlation value. c) Heavier men tend to be taller. d) If a man in this group changes his diet and gains 10 pounds, he is likely to get taller. e) There is a moderate association between a man's height and weight.
4) There is a linear relationship between the duration x (in seconds) of a geyser eruption and the interval of time y (in minutes) until the next eruption. The LSRL for data collected by a geologist is
$\hat{y} = 41.9 + 0.18x$. What is the estimated increase in the interval of time until the next eruption that corresponds to a 60-second increase in the duration? Since the slope is 0.18 minutes per second, the increase is $0.18 \times 60 = 10.8$ minutes.
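Because the slope is measured in minutes of interval per second of duration, the predicted change for any change in duration is just slope times that change; the intercept cancels out. A minimal sketch using the slope from the problem (the function name is illustrative):

```python
SLOPE_MIN_PER_SEC = 0.18  # slope of y-hat = 41.9 + 0.18x


def interval_change(duration_change_sec: float) -> float:
    """Predicted change in the interval (minutes) for a change in duration (seconds)."""
    return SLOPE_MIN_PER_SEC * duration_change_sec


print(f"{interval_change(60):.1f} minutes")
```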
5) Which of the following statements are true? a) Values of r near 0 indicate a strong linear relationship. b) Changing the measurement units of x and y may affect the correlation between x and y. c) A strong correlation means that there is a definite cause-and-effect relationship between x and y. d) Correlation changes when the x and y variables are reversed. e) The correlation can be strongly affected by a few outlying observations.
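Statements b), d), and e) make claims that can be checked numerically. The sketch below computes the Pearson correlation by hand on a small HYPOTHETICAL height/weight data set (made up for illustration) and shows that converting units (any positive rescaling) and swapping x and y leave r unchanged, while a single outlier can change it drastically.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL heights (inches) and weights (pounds), for illustration only.
heights_in = [64, 66, 68, 70, 72, 74]
weights_lb = [150, 145, 170, 165, 185, 180]

r = correlation(heights_in, weights_lb)

# Change units (inches -> cm, pounds -> approximately kg): r is unchanged.
r_metric = correlation([h * 2.54 for h in heights_in], [w * 0.4536 for w in weights_lb])

# Swap the roles of x and y: r is unchanged.
r_swapped = correlation(weights_lb, heights_in)

# Add one outlying observation (short but very heavy): r changes a lot.
r_outlier = correlation(heights_in + [60], weights_lb + [300])

print(f"r = {r:.4f}, metric = {r_metric:.4f}, swapped = {r_swapped:.4f}, with outlier = {r_outlier:.4f}")
```

In this made-up example the single outlier flips r from strongly positive to negative.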
6) Data are obtained from a group of high school seniors comparing age and the number of hours spent on the telephone. The resulting regression equation is: Predicted number of hours = 0.123(AGE) + 2.57, where $r = 0.866$.
What percentage of the variation in the number of hours spent on the telephone can be explained by the least squares regression line model? $r^2$ is the proportion of the variation in the response that is explained by the linear relationship with the explanatory variable. Since $r^2 = 0.866^2 \approx 0.75$, about 75% of the variation in the number of hours spent on the telephone can be explained by the linear relationship between age and number of hours.
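The arithmetic is simply squaring the reported correlation; a two-line check:

```python
# Correlation reported for age vs. hours on the telephone.
r = 0.866

# The coefficient of determination is the square of r.
r_sq = r ** 2

print(f"r^2 = {r_sq:.4f}, i.e. about {r_sq:.0%} of the variation in hours")
```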
7) Consider the following three scatterplots to the right. Put them in order by the value of their correlation coefficient from smallest to greatest.
$r_2 < r_1 < r_3$
8) A linear model was constructed for a set of bivariate data using least squares regression techniques. Given the residual plot shown, what conclusion should be drawn?
Because there is an obvious pattern in the residual plot, a linear model is not an appropriate fit.
9) The following output was generated from a random sample of 40 companies on the Forbes 500 list, where sales (in hundreds of thousands of dollars) and profits (in hundreds of thousands of dollars) were investigated using linear regression. Here is the output computed in Minitab:
On average, for every $100,000 increase in sales, the predicted profit increases by $9,249.80.
10) A fisheries research report gives the following regression equation for the relationship between the length (L, in cm) and weight (W, in grams) of the gracile lizardfish, a small marine fish that lives in the Indian Ocean:
$\ln \hat{W} = -5.36 + 3.216 \ln L$. What is the predicted weight of a lizardfish that is 12 cm long, based on this model? $\ln \hat{W} = -5.36 + 3.216 \ln 12 \approx 2.6315$, so $\hat{W} = e^{2.6315} \approx 13.89$ grams.
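The model predicts ln(W), so the prediction has to be back-transformed with the exponential function. A minimal sketch using the coefficients from the report (the function name is illustrative):

```python
import math


def predicted_weight(length_cm: float) -> float:
    """Back-transform the log-log model ln(W-hat) = -5.36 + 3.216 ln(L)."""
    ln_w = -5.36 + 3.216 * math.log(length_cm)  # math.log is the natural log
    return math.exp(ln_w)


print(f"predicted weight at 12 cm: {predicted_weight(12):.2f} g")
```

Equivalently, $\hat{W} = e^{-5.36} L^{3.216}$, which makes the power-law form of this log-log model explicit.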
11) All but one of these statements are false. Which one could be true? a) The correlation between a football player's weight and the position he plays is 0.54. b) The correlation between a car's length and its fuel efficiency is 0.71 miles per gallon. c) There is a high correlation (1.09) between the height of a corn stalk and its age in weeks. d) The correlation between the amount of fertilizer used and the quantity of beans harvested is 0.42. e) There is a correlation of 0.63 between gender and political party. Statements a and e are wrong because correlation can only be computed between quantitative variables, and position, gender, and political party are categorical. Statement c is wrong because a correlation cannot be greater than 1. Statement b is wrong because correlation has no units. That leaves d as the only statement that could be true.
Free Responses.
The statistics department at a large university is trying to determine whether it is possible to predict if an applicant will successfully complete the Ph.D. program or will leave before completing it. The department is considering whether GPA (grade point average) in undergraduate statistics and mathematics courses (a measure of performance) and mean number of credit hours per semester (a measure of workload) would be helpful measures.
Successfully Completed Ph.D. Program
a) What is the LSRL?
$\widehat{\text{Doctors}} = 23.514 - 2.756(\text{GPA})$
b) Interpret the values of r, r-squared, and the slope in context of the problem.
$r = -0.872$: There is a strong negative linear association between doctors and GPA.
$r^2 = 76\%$: Approximately 76% of the variation in doctors can be explained by the linear relationship between doctors and GPA.
Slope $= -2.756$: On average, for every 1-point increase in GPA, the predicted number of students completing their doctorate decreases by about 2.756.
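Regression output reports R-Sq but not r; r is the square root of R-Sq carrying the sign of the slope, which is why r must be negative here. The sketch below uses the coefficients and R-Sq from the Minitab output for this problem; the GPA values 2.5 and 3.5 are arbitrary illustrative inputs, chosen only to show that a 1-point increase in GPA changes the prediction by exactly the slope.

```python
import math

# Coefficients and R-Sq from the Minitab output for DOCTORS vs. GPA.
INTERCEPT = 23.514
SLOPE = -2.7555
R_SQ = 0.760

# r is the square root of R-Sq, carrying the sign of the slope.
r = math.copysign(math.sqrt(R_SQ), SLOPE)


def predicted_doctors(gpa: float) -> float:
    """LSRL prediction from the Minitab coefficients."""
    return INTERCEPT + SLOPE * gpa


# Change in the prediction for a 1-point increase in GPA equals the slope.
change_per_point = predicted_doctors(3.5) - predicted_doctors(2.5)

print(f"r = {r:.3f}, change per GPA point = {change_per_point:.4f}")
```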
c) Is a linear model appropriate for this data? Justify your answer.
Yes, a linear model is appropriate, since there is no apparent pattern in the residual plot and approximately 76% of the variation in doctors can be explained by the linear relationship between doctors and GPA.
The dependent variable is DOCTORS.
Predictor    Coef      StDev     T       P
Constant     23.514    1.684     13.95   0.000
GPA          -2.7555   0.4668    -5.90   0.000
S = 0.5658   R-Sq = 76.0%
Assume that the following data were collected for a chemical reaction in which reactants A and B react to form products C and D. As the products form, we measure their mass at the times listed below.
Time (min)   Amount of product (g)
2            3
6            5
7            7
8            10
10           13
12           17
14           21
16           26
18           34
20           50
a) What is the equation for the model of best fit? Illustrate your process carefully. Give a rough sketch of the residual plot.
b) Is a linear model the most appropriate? If not, what would be a better model?
No, a linear model is not the most appropriate, because there is a definite pattern in the residual plot. Appropriate model (exponential):
$\widehat{\log(\text{amount of product})} = 0.3853 + 0.0662(\text{time})$, $r = 0.9901$, $r^2 = 0.9803$
The exponential model proved to be the most appropriate model because there was no apparent pattern in its residual plot, and approximately 98.03% of the variation in log(amount of product) can be explained by the linear relationship between log(amount of product) and time.
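The exponential model is an ordinary LSRL fitted after taking the base-10 log of each amount. The sketch below assumes the table reads as consecutive (time, amount) pairs, which reproduces the reported coefficients. It refits the line, then uses the rounded coefficients 0.3853 and 0.0662 from the text to predict forward (amount at a given time, as in part c) and backward (time for a given amount, as in part d). The function names are illustrative.

```python
import math

# Data read from the table as (time in minutes, amount of product in grams).
TIMES = [2, 6, 7, 8, 10, 12, 14, 16, 18, 20]
AMOUNTS = [3, 5, 7, 10, 13, 17, 21, 26, 34, 50]


def fit_lsrl(xs, ys):
    """Least squares line: returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Exponential model: regress log10(amount) on time.
a, b = fit_lsrl(TIMES, [math.log10(y) for y in AMOUNTS])

# Rounded coefficients as reported in the text.
A_REPORTED, B_REPORTED = 0.3853, 0.0662


def predict_amount(t: float) -> float:
    """Back-transform: amount = 10 ** (a + b * t)."""
    return 10 ** (A_REPORTED + B_REPORTED * t)


def time_for_amount(amount: float) -> float:
    """Solve log10(amount) = a + b * t for t."""
    return (math.log10(amount) - A_REPORTED) / B_REPORTED


print(f"fitted: log10(y-hat) = {a:.4f} + {b:.4f} t")
print(f"amount at 5 min ~ {predict_amount(5):.4f} g")
print(f"time for 25.1 g ~ {time_for_amount(25.1):.4f} min")
```

Using the rounded coefficients keeps the answers consistent with the hand calculations; the unrounded fit gives slightly different trailing digits.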
c) What does your model predict would have been the amount present at 5 minutes?
$\widehat{\log y} = 0.3853 + 0.0662(5) = 0.7163$, so $\hat{y} = 10^{0.7163} \approx 5.2036$. At 5 minutes there would be approximately 5.2036 grams of product present.
[Figure: residual plots (residuals versus time) for the linear, exponential, logarithmic, and power models]
Linear Model:
$\widehat{\text{amount of product}} = -8.8467 + 2.429(\text{time})$, $r = 0.945$, $r^2 = 0.894$
Pattern in the residual plot: NOT appropriate
Exponential Model:
$\widehat{\log(\text{amount of product})} = 0.3853 + 0.0662(\text{time})$, $r = 0.9901$, $r^2 = 0.9803$
No pattern in the residual plot: Appropriate
Logarithmic Model:
$\widehat{\text{amount of product}} = -20.6026 + 39.949\log(\text{time})$, $r = 0.8046$, $r^2 = 0.6473$
Pattern in the residual plot: NOT appropriate
Power Model:
$\widehat{\log(\text{amount of product})} = -0.0764 + 1.233\log(\text{time})$, $r = 0.9541$, $r^2 = 0.9104$
Pattern in the residual plot: NOT appropriate
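All four candidates are straight-line fits on different combinations of transformed axes, so one helper can fit them all. The sketch below assumes the table reads as consecutive (time, amount) pairs and uses base-10 logs; it prints each model's coefficients and $r^2$, plus the signs of the linear model's residuals, which come out positive at both ends and negative in the middle (the curved pattern behind the "NOT appropriate" verdict). Note that $r^2$ for a model with a log response is measured on the log scale, so comparing $r^2$ across models is only a rough guide; the residual plot remains the main check.

```python
import math

# Data read from the table as (time, amount) pairs.
TIMES = [2, 6, 7, 8, 10, 12, 14, 16, 18, 20]
AMOUNTS = [3, 5, 7, 10, 13, 17, 21, 26, 34, 50]


def fit(xs, ys):
    """Least squares line through (xs, ys): returns (intercept, slope, r^2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy ** 2 / (sxx * syy)


log_t = [math.log10(t) for t in TIMES]
log_y = [math.log10(y) for y in AMOUNTS]

# Each candidate model is a straight-line fit on (possibly) transformed axes.
candidates = {
    "linear": (TIMES, AMOUNTS),       # y vs. t
    "exponential": (TIMES, log_y),    # log y vs. t
    "logarithmic": (log_t, AMOUNTS),  # y vs. log t
    "power": (log_t, log_y),          # log y vs. log t
}
results = {name: fit(xs, ys) for name, (xs, ys) in candidates.items()}

for name, (a, b, r2) in results.items():
    print(f"{name:12s} intercept={a:9.4f} slope={b:8.4f} r^2={r2:.4f}")

# Residuals of the linear fit: positive at both ends, negative in the middle.
a_lin, b_lin, _ = results["linear"]
lin_resid = [y - (a_lin + b_lin * t) for t, y in zip(TIMES, AMOUNTS)]
print("linear residual signs:", "".join("+" if e > 0 else "-" for e in lin_resid))

best = max(results, key=lambda name: results[name][2])
print("highest r^2:", best)
```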
d) At what time would 25.1 grams remain, according to your model?
$\log(25.1) = 0.3853 + 0.0662x$, so $1.3997 - 0.3853 = 1.0144 = 0.0662x$ and $x \approx 15.3229$. With 25.1 grams remaining the