When trying to decide whether two variables in a situation are related, statisticians often look at the correlation between the two variables. But just what does it tell us if we find that two variables have a strong correlation?

Note: The correlation is the connectedness between the variables' values. Statistical techniques let us assign a value from -1 to 1 to a data set of two variables. The closer to -1 or 1, the stronger the correlation.

Following are several plots showing statistics about scores on the SAT and ACT exams for the year 2000. Colleges use the exam scores as measures of a student's ability or as predictors of a student's success in college. Each data point represents a state. For the test scores, a point's value represents the average score for that state. The percentages represent the percentage of seniors in the state who took the test.

Problems with a Point: January 11, 2002 © EDC 2002
The correlation and the cause

An important kind of correlation is linear correlation, which measures how closely data cluster about a line. For these problems, consider the linear correlation between the variables in the plots above.

1. For each plot, decide whether the data appear to have a significant correlation, or little or no correlation.

2. Consider the variables used in the plots that appear to have at least some correlation. For each, consider whether it seems likely that one of the variables would cause the other variable to change. For example, does it seem likely that a low percentage of students taking the ACT would cause a high percentage of students to take the SAT?

When you try to draw conclusions from statistics, you have to think carefully about the situations involved. Although two variables may have a strong correlation, you can't necessarily conclude that one causes the other. There may be underlying or lurking variables, which affect both of the variables being studied but aren't shown in the data.

3.
Here is height data for eight children in various grades. Each child can read at a level appropriate to his or her grade. For example, Hannah reads at a third-grade level.

Name      Height (cm)   Grade
Charlie   110           K
Jean      115           1
Nancy     120           2
Hannah    128           3
Steve     132           4
Neeraj    139           5
Wayne     142           6
Maria     150           7

(a) Plot these data on the following grid.

(b) Does there seem to be a correlation between height and reading level?

(c) Of course, a person's height won't directly affect his or her reading level. What lurking variable might be at work here? That is, what variable might cause height to go up and, at the same time, also cause a person's reading level to go up? Explain how this is a lurking variable.

A lurking variable isn't a step in a cause-and-effect chain; rather, it's a variable that causes two otherwise unrelated things to happen. For example, many people have more trouble with allergies when the weather gets warm, so there seems to be a correlation between outside temperature and the amount of trouble with allergies. There is another variable that explains this correlation, but it isn't a lurking variable. The variable is how much pollen is in the air. As the temperature rises, more flowers bloom. The flowers release pollen, which causes allergies to get worse:

Higher temperature -> More pollen -> More allergy problems

A lurking variable must cause change in two variables with no connection to each other.

4. For each of the plots that you identified as having at least some correlation (problem 1), try to think of at least one possible lurking variable that should be considered.

5. Discussion: With your class, talk about the possible lurking variables you thought of. Once you've heard the possibilities, decide which of these variable pairs you think seems likely to have a direct cause-and-effect relationship. For which does it seem likely that a lurking variable is causing the correlation?
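The definition of a lurking variable given before problem 4 can be explored with a small simulation. The sketch below is only an illustration under made-up assumptions: the variable z, the multipliers 2.0 and 3.0, the noise level, and the sample size are all hypothetical and have nothing to do with the SAT and ACT data. It builds two variables, x and y, that have no connection to each other except that both depend on z, and then compares their correlation over all points with their correlation when z is held nearly fixed.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson's correlation coefficient, computed from its definition."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so that every run gives the same numbers

# Hypothetical lurking variable z. Both x and y are driven by z plus their
# own independent noise; neither x nor y appears in the other's formula.
z = [rng.uniform(0, 10) for _ in range(20000)]
x = [2.0 * zi + rng.gauss(0, 1) for zi in z]
y = [3.0 * zi + rng.gauss(0, 1) for zi in z]

# Correlation of x and y over all points: strong, because z moves both.
r_all = pearson(x, y)

# Hold z nearly fixed by keeping only points with z in a narrow band.
band = [(xi, yi) for xi, yi, zi in zip(x, y, z) if 4.9 <= zi <= 5.1]
r_band = pearson([p[0] for p in band], [p[1] for p in band])

print(f"all points:      r = {r_all:.3f}")
print(f"z held near 5.0: r = {r_band:.3f} ({len(band)} points)")
```

If x really caused y, the correlation would be expected to survive when z is held fixed. In this sketch it nearly disappears, which is the signature of a lurking variable rather than a direct cause-and-effect relationship.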
Answers

1. The SAT Verbal vs. SAT Math data have a strong correlation (Pearson's correlation coefficient is about 0.963534). Almost as strong is the Percent Taking ACT vs. Percent Taking SAT data (-0.958773). The SAT Combined vs. Percent Taking SAT data have a slightly weaker correlation (-0.886577). The SAT Combined vs. ACT Composite data have little correlation (0.236171), and the ACT Composite vs. Percent Taking ACT data have virtually no correlation (0.0989814).

2. It does not seem likely that either variable in SAT Verbal vs. SAT Math would cause the other to change. Some students may find it reasonable that changing Percent Taking ACT may cause Percent Taking SAT to change, or vice versa (but see the discussion of lurking variables below). It certainly seems reasonable that Percent Taking SAT might cause SAT Combined to change.

Note: The fact that the Percent Taking SAT and SAT Combined data have a negative correlation may be surprising, however. Some people may reason that many students taking the tests would happen only if the state has high educational standards, so the students should do better, on average, than in other states. (In this case, the educational standard would actually be a lurking variable.) This would give a positive correlation. There is a theory that accounts for the negative correlation: in states with few students taking the test, the ones who take it are more likely to go out of region for college or to take both tests; in general, they are the higher-ability students. When more students take the test, they bring the average for those high-ability students down. This sounds like a causal relationship, but there may be a lurking variable here, too. See the answers to problem 4. Interestingly, though, this pattern does not seem to hold for the ACT test.

3. (a) Here is the plot:

(b) There does seem to be a correlation between height and reading level.
(c) Age (or the children's grades) is a lurking variable. (As children get older, they usually grow taller. Also, they generally become better readers.)

4. Answers will vary. Some students may not be able to think of any examples. Possible answers include:

SAT Verbal vs. SAT Math: Level of standards across all curricula; percent (and ability) of students taking the test.

Percent Taking ACT vs. Percent Taking SAT: Location (Southern and Midwestern states tend to have more focus on the ACT, while Northeastern states focus more on the SAT); cost of the tests (if one costs significantly more, students in states with a higher average economic status may be more likely to prefer that one over the other).

SAT Combined vs. Percent Taking SAT: Level of expectation for college (in a state where fewer students are expected to continue to college, only those likely to go on will take the SAT, and those are also more likely to do well).

Note: The difference between this lurking variable and the explanation given with the answer to problem 2 may seem subtle. Consider taking random collections of students and giving them the SAT test. You would not expect the size of the collections to have an effect on the average scores. In this case, though, the collections are not random: higher-ability students seem to be the ones taking the test when there is a smaller percentage. However, they are not higher ability because they are part of a smaller percentage of test takers.

5. Answers may vary; however, it's plausible that none of these really are cause-and-effect relationships.
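Answer 3(b) can also be checked numerically. The following sketch computes Pearson's correlation coefficient, the measure quoted in answer 1, for the height and grade table in problem 3. Coding kindergarten (K) as grade 0 is an assumption made here so that the grades can be treated as numbers, and the function name pearson_r is just an illustrative choice.

```python
from math import sqrt

# Data from the table in problem 3.
# Assumption: kindergarten (K) is coded as grade 0.
heights_cm = [110, 115, 120, 128, 132, 139, 142, 150]
grade_levels = [0, 1, 2, 3, 4, 5, 6, 7]


def pearson_r(xs, ys):
    """Pearson's r: the sum of products of deviations from the means,
    divided by the square root of the product of the sums of squared
    deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(grade_levels, heights_cm)
print(f"r = {r:.3f}")
```

The result is close to 1, which matches the visual impression of a strong linear correlation. As answer 3(c) points out, that number says nothing about cause: age is what drives both height and reading level.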