The correlation and the cause


When trying to decide whether two variables in a situation
are related, statisticians often look at the correlation between
the two variables. But just what does it tell us if we find that
two variables have a strong correlation?

Margin note: The correlation is the connectedness between the
variables' values. Statistical techniques let us assign a value
from −1 to 1 to a data set of two variables. The closer to −1 or 1,
the stronger the correlation.

Following are several plots showing statistics about scores on
the SAT and ACT exams for the year 2000. Colleges use the
exam scores as measures of a student's ability or as predictors
of a student's success in college. Each data point represents a
state. For the test scores, a point's value represents the average
score for that state. The percentages represent the percentage
of seniors for the state who took the test.
Problems with a Point: January 11, 2002 c EDC 2002
An important kind of correlation is linear correlation, which
measures how closely data clusters about a line. For these
problems, consider the linear correlation between the variables
in the plots on the previous page.
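
To make "a value from −1 to 1" concrete, here is a minimal sketch of how Pearson's linear correlation coefficient can be computed by hand. The function name `pearson_r` and the sample points are made up for illustration; they are not the SAT/ACT data from the plots.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two variables vary together, and how each varies alone.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Made-up points, not the data from the plots.
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])     # points on a rising line
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])    # points on a falling line
scattered = pearson_r([1, 2, 3, 4], [3, 1, 4, 2])  # no linear trend
print(rising, falling, scattered)
```

Points that lie exactly on a rising line give a value of 1, points on a falling line give −1, and points with no linear trend give a value near 0.
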
1. For each plot, decide whether the data appears to have a
significant correlation, or little or no correlation.
2. Consider the variables used in the plots that appear to
have at least some correlation. For each, consider whether
it seems likely that one of the variables would cause the
other variable to change. For example, does it seem likely
that a low percentage of students taking the ACT would
cause a high percentage of students to take the SAT?
When you try to draw conclusions from statistics, you have to
think carefully about the situations involved. Although two
variables may have a strong correlation, you can't necessarily
conclude that one causes the other. There may be underlying or
lurking variables, which affect both the variables being studied
but aren't shown in the data.
3. Here is height data for eight children in various grades.
Each child can read at a level appropriate to his or her
grade. For example, Hannah reads at a third grade level.
Name      Height (cm)   Grade
Charlie   110           K
Jean      115           1
Nancy     120           2
Hannah    128           3
Steve     132           4
Neeraj    139           5
Wayne     142           6
Maria     150           7
(a) Plot these data on the following grid.
(b) Does there seem to be a correlation between height and
reading level?
(c) Of course, a person's height won't directly affect his or
her reading level. What lurking variable might be at
work here? That is, what variable might cause height
to go up, and at the same time, also cause a person's
reading level to go up? Explain how this is a lurking
variable.
A lurking variable isn't a step in a cause-and-effect chain; rather,
it's a variable that causes two otherwise unrelated things to
happen. For example, many people have more trouble with allergies
when the weather gets warm, so there seems to be a correlation
between outside temperature and the amount of trouble with
allergies.
There is another variable that explains this correlation, but it
isn't a lurking variable. The variable is how much pollen is in
the air. As the temperature rises, more flowers bloom. The
flowers release pollen, which causes allergies to get worse:
Higher temperature → More pollen → More allergy problems
A lurking variable must cause change in two variables with no
connection to each other.
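
One way to see this numerically is with invented data. The sketch below assumes a made-up rule in which age alone sets a child's typical height and reading level, then adds offsets that are unrelated to each other. All names and numbers here are hypothetical and chosen only to illustrate the idea.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient for paired data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up individual differences: (height offset in cm, reading offset
# in grade levels). They are balanced so that being taller than typical
# says nothing about reading better than typical.
offsets = [(3, 0.5), (3, -0.5), (-3, 0.5), (-3, -0.5)]

heights, reading = [], []
height_offsets, reading_offsets = [], []
for age in range(5, 13):
    typical_height = 80 + 6 * age   # assumed growth rule, not real data
    typical_reading = age - 5       # assumed reading-level rule
    for dh, dr in offsets:
        heights.append(typical_height + dh)
        reading.append(typical_reading + dr)
        # What is left of each value once age is accounted for.
        height_offsets.append(dh)
        reading_offsets.append(dr)

r_all = pearson_r(heights, reading)
r_same_age = pearson_r(height_offsets, reading_offsets)
print(round(r_all, 4), round(r_same_age, 4))
```

Across all ages the correlation between height and reading level is strong, but once age is accounted for, no correlation is left. In this invented data set, age does all the work.
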
4. For each of the plots that you identified as having at least
some correlation (problem 1), try to think of at least one
possible lurking variable that should be considered.
5. Discussion: With your class, talk about the possible lurking
variables you thought of. Once you've heard the possibilities,
decide which of these variable pairs you think seems likely to
have a direct cause-and-effect relationship.
For which does it seem likely a lurking variable is causing
the correlation?
Answers
1. The SAT Verbal vs. SAT Math data have a strong correlation
(Pearson's correlation coefficient is about 0.963534).
Almost as strong is the Percent Taking ACT vs. Percent
Taking SAT data (−0.958773). The SAT Combined
vs. Percent Taking SAT has a slightly weaker correlation
(−0.886577).
The SAT Combined vs. ACT Composite data has little
correlation (0.236171), and the ACT Composite vs. Percent
Taking ACT has virtually no correlation (0.0989814).
2. It does not seem likely that either variable in the SAT
Verbal vs. SAT Math would cause the other to change. Some
students may find it reasonable that changing Percent Taking
ACT may cause Percent Taking SAT to change, or vice
versa (but see the discussion of lurking variables below). It
certainly seems reasonable that Percent Taking SAT might
cause SAT Combined to change.

Margin note: The fact that the Percent Taking SAT and SAT
Combined data have a negative correlation may be surprising,
however. Some people may reason that many students taking the
tests would happen only if the state has high educational
standards, so the students should do better, on average, than in
other states. (In this case, the educational standard would
actually be a lurking variable.) This would give a positive
correlation.
There is a theory that accounts for the negative correlation:
In states with few students taking the test, the ones who take it
are more likely to go out of region for college or take both
tests (in general, the higher ability students). When more
students take the test, they bring the average for those high
ability students down. This sounds like a causal relationship,
but there may be a lurking variable here, too. See the answers to
problem 4.
Interestingly, though, this pattern does not seem to hold for
the ACT test.

3. (a) Here is the plot:
(b) There does seem to be a correlation between height
and reading level.
(c) Age (or the children's grades) is a lurking variable. (As
children get older, they usually grow taller. Also, they
generally become better readers.)
4. Answers will vary. Some students may not be able to think
of any examples. Possible answers include:
SAT Verbal vs. SAT Math: Level of standards across
all curricula; percent (and ability) of students taking the
test.
Percent Taking ACT vs. Percent Taking SAT: Location
(Southern and Midwestern states tend to have more
focus on ACT, while Northeastern states focus more on
SAT); cost for tests (if one costs significantly more, students
in states with a higher average economic status may
be more likely to prefer that one over the other).
SAT Combined vs. Percent Taking SAT: Level of
expectation for college (in a state where fewer students
are expected to continue to college, only those likely to go
on will take the SAT, and those are also more likely to do
well).

Margin note: The difference between this lurking variable and
the explanation in the margin next to problem 2 may seem subtle.
Consider taking random collections of students and giving them
the SAT test. You would not expect the size of the collections to
have an effect on the average scores. In this case, though, the
collections are not random: higher ability students seem to be
the ones taking the test when there is a smaller percentage.
However, they are not higher ability because they are part of a
smaller percentage of test takers.

5. Answers may vary; however, it's plausible that none of
these really are cause-and-effect relationships.
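
For the height data in problem 3, the strength of the correlation can be checked with a short calculation. This is a sketch, not part of the original answer key. It assumes kindergarten (K) is coded as grade 0 so that the grades can be treated as numbers, and the helper name `pearson_r` is made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient for paired data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Data from the table in problem 3, in the order Charlie .. Maria.
heights_cm = [110, 115, 120, 128, 132, 139, 142, 150]
# Assumption: K is coded as 0; grades 1-7 are used as given.
grades = [0, 1, 2, 3, 4, 5, 6, 7]

r = pearson_r(grades, heights_cm)
print(round(r, 3))
```

The result is very close to 1, which agrees with the answer to 3(b). The calculation says nothing about why the two variables move together; that is the point of 3(c).
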