
Working with relationships between two variables

[Figure: scatterplot "Size of Teaching Tip & Stats Test Score"; x-axis: size of teaching tip ($0 to $80), y-axis: stats test score (0 to 100)]

Correlation & Regression


Univariate & Bivariate Statistics
U: frequency distribution, mean, mode, range, standard deviation
B: correlation between two variables

Correlation
linear pattern of relationship between one variable (x) and another variable (y): an association between two variables
relative position of one variable correlates with relative distribution of another variable
graphical representation of the relationship between two variables

Warning:
No proof of causality
Cannot assume x causes y

Scatterplot!
No Correlation
Random or circular
assortment of dots

Positive Correlation
ellipse leaning to right
GPA and SAT
Smoking and Lung Damage

Negative Correlation
ellipse leaning to left
Depression & Self-esteem
Studying & test errors

Pearson's Correlation Coefficient


r indicates
strength of relationship (strong, weak, or none)
direction of relationship
positive (direct): variables move in same direction
negative (inverse): variables move in opposite directions

r ranges in value from -1.0 to +1.0

-1.0
Strong Negative

0.0
No Rel.

+1.0
Strong Positive
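
To make the definition of r concrete, here is a minimal sketch of Pearson's r computed from deviation scores. The function name `pearson_r` and the study-hours data are hypothetical and only for illustration; they are not from the worksheet.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations: its sign gives the direction
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations scale the result into the range -1.0 to +1.0
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: more studying, fewer test errors (a negative relationship)
hours_studied = [1, 2, 3, 4, 5]
test_errors = [9, 8, 6, 5, 2]

r = pearson_r(hours_studied, test_errors)
print(round(r, 3))
```

The sign of the result gives the direction of the relationship, and its distance from zero gives the strength.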

Go to website!
playing with scatterplots

Practice with Scatterplots

r = .__ __

r = .__ __

r = .__ __

r = .__ __

Correlation Guestimation

Correlations

                                    Miles walked   Weight         Depression     Anxiety
                                    per day
Miles walked   Pearson Correlation  1              -.797**        -.800**        -.774**
per day        Sig. (2-tailed)                     .002           .002           .003
               N                    12             12             12             12
Weight         Pearson Correlation  -.797**        1              .648*          .780**
               Sig. (2-tailed)      .002                          .023           .003
               N                    12             12             12             12
Depression     Pearson Correlation  -.800**        .648*          1              .753**
               Sig. (2-tailed)      .002           .023                          .005
               N                    12             12             12             12
Anxiety        Pearson Correlation  -.774**        .780**         .753**         1
               Sig. (2-tailed)      .003           .003           .005
               N                    12             12             12             12

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Samples vs. Populations


Sample statistics estimate Population parameters
M tries to estimate μ
r tries to estimate ρ (rho, the Greek symbol; not p)

r: correlation for a sample
based on the limited observations we have

ρ: actual correlation in population
the true correlation

Beware Sampling Error!!


even if ρ = 0 (there's no actual correlation), you might get r = .08
or r = -.26 just by chance.
We look at r, but we want to know about ρ
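
Sampling error can be seen in a small simulation. The sketch below draws x and y independently (so the population correlation is zero by construction) and counts how often a sample of six still produces a sizeable r. The sample size, the number of repetitions, the seed and the helper names are all assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_r(n):
    """Draw n (x, y) pairs with NO true relationship and return the sample r."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
    return pearson_r(xs, ys)

rs = [sample_r(6) for _ in range(1000)]
big = sum(1 for r in rs if abs(r) >= 0.26)
print(f"{big} of {len(rs)} samples had |r| >= .26 even though the true correlation is 0")
```

With samples this small, a nonzero r on its own says little about the population value.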

Hypothesis testing with Correlations


Two possibilities
Ho: ρ = 0 (no actual correlation; The Null Hypothesis)
Ha: ρ ≠ 0 (there is some correlation; The Alternative Hyp.)

Case #1 (see correlation worksheet)


Correlation between distance and points r = -.904
Sample small (n=6), but r is very large
We guess ρ < 0 (we guess there is some correlation in the pop.)

Case #2
Correlation between aiming and points, r = .628
Sample small (n=6), and r is only moderate in size
We guess ρ = 0 (we guess there is NO correlation in pop.)

Bottom-line
We can only guess about ρ
We can be wrong in two ways
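
One common way to turn the "guess" in the two cases into a decision rule is to convert r to a t statistic with n - 2 degrees of freedom and compare it with a critical value. This conversion is not shown on the slides (SPSS reports the p value directly), so treat the sketch below as an illustration. The critical value 2.776 is the standard two-tailed t-table entry for alpha = .05 and df = 4, and the function name `t_from_r` is hypothetical.

```python
import math

T_CRIT_DF4 = 2.776  # two-tailed critical t, alpha = .05, df = 4 (from a standard t table)

def t_from_r(r, n):
    """Convert a sample correlation to a t statistic with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

cases = [("distance and points", -0.904), ("aiming and points", 0.628)]
for label, r in cases:
    t = t_from_r(r, 6)
    decision = "reject Ho" if abs(t) > T_CRIT_DF4 else "retain Ho"
    print(f"{label}: r = {r}, t(4) = {t:.2f}, {decision}")
```

Under these assumptions the first case rejects Ho and the second retains it, which matches the guesses made in Case #1 and Case #2.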

Reading Correlation Matrix


Correlations (a)

                                              Total ball     Distance       Time spun      Aiming         Manual         College grade  Confidence
                                              toss points    from target    before         accuracy       dexterity      point avg      for task
                                                                            throwing
Total ball toss points   Pearson Correlation  1              -.904*         -.582          .628           .821*          -.037          -.502
                         Sig. (2-tailed)      .              .013           .226           .181           .045           .945           .310
                         N                    6              6              6              6              6              6              6
Distance from target     Pearson Correlation  -.904*         1              .279           -.653          -.883*         .228           .522
                         Sig. (2-tailed)      .013           .              .592           .159           .020           .664           .288
                         N                    6              6              6              6              6              6              6
Time spun before         Pearson Correlation  -.582          .279           1              -.390          -.248          -.087          .267
throwing                 Sig. (2-tailed)      .226           .592           .              .445           .635           .869           .609
                         N                    6              6              6              6              6              6              6
Aiming accuracy          Pearson Correlation  .628           -.653          -.390          1              .758           -.546          -.250
                         Sig. (2-tailed)      .181           .159           .445           .              .081           .262           .633
                         N                    6              6              6              6              6              6              6
Manual dexterity         Pearson Correlation  .821*          -.883*         -.248          .758           1              -.553          -.101
                         Sig. (2-tailed)      .045           .020           .635           .081           .              .255           .848
                         N                    6              6              6              6              6              6              6
College grade point avg  Pearson Correlation  -.037          .228           -.087          -.546          -.553          1              -.524
                         Sig. (2-tailed)      .945           .664           .869           .262           .255           .              .286
                         N                    6              6              6              6              6              6              6
Confidence for task      Pearson Correlation  -.502          .522           .267           -.250          -.101          -.524          1
                         Sig. (2-tailed)      .310           .288           .609           .633           .848           .286           .
                         N                    6              6              6              6              6              6              6

*. Correlation is significant at the 0.05 level (2-tailed).
a. Day sample collected = Tuesday

Reading one cell of the matrix (Total ball toss points and Distance from target):

Pearson Correlation: r = -.904
Sig. (2-tailed): p = .013
Probability of getting a correlation this size by sheer chance. Reject Ho if p ≤ .05.
N: 6 (sample size)

Reported as: r(4) = -.904, p < .05
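
The decision rule "reject Ho if p ≤ .05" can be applied to a whole row of the matrix at once. The sketch below uses the r and p values for Total ball toss points as printed in the correlation matrix. The dictionary layout and variable names are hypothetical.

```python
ALPHA = 0.05

# (r, p) for Total ball toss points against each other variable, N = 6
first_row = {
    "Distance from target": (-0.904, 0.013),
    "Time spun before throwing": (-0.582, 0.226),
    "Aiming accuracy": (0.628, 0.181),
    "Manual dexterity": (0.821, 0.045),
    "College grade point avg": (-0.037, 0.945),
    "Confidence for task": (-0.502, 0.310),
}

# Keep only the variables whose p value meets the cutoff
significant = [name for name, (r, p) in first_row.items() if p <= ALPHA]

for name in significant:
    r, p = first_row[name]
    print(f"{name}: r(4) = {r:.3f}, p = {p:.3f}")
```

Only the two starred correlations in that row pass the cutoff. The moderate r of .628 for aiming accuracy does not.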

Predictive Potential
Coefficient of Determination
r²
Amount of variance accounted for in y by x
Percentage increase in accuracy you gain by using the regression line to make predictions
Without correlation, you can only guess the mean of y
[Used with regression]
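
The coefficient of determination is just r squared, so it can be checked by hand or with a short sketch like the one below. The r values are ones that appear in this deck. The function name is hypothetical.

```python
def variance_accounted_for(r):
    """Coefficient of determination: proportion of variance in y accounted for by x."""
    return r ** 2

for r in (-0.904, 0.777, 0.628):
    r_sq = variance_accounted_for(r)
    print(f"r = {r:+.3f}  ->  r squared = {r_sq:.3f}  ({r_sq:.0%} of variance)")
```

Squaring removes the sign, so a strong negative r predicts just as well as a strong positive r of the same size.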

[Figure: percentage scale from 0% to 100%]

Limitations of Correlation
linearity:
can't describe non-linear relationships
e.g., relation between anxiety & performance

truncation of range:
underestimate strength of relationship if you can't see full range of x values

no proof of causation
third variable problem:
could be a 3rd variable causing change in both variables
directionality: can't be sure which way causality flows
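
The linearity limitation is easy to demonstrate. In the sketch below y is completely determined by x, but the pattern is U-shaped, and Pearson's r comes out as zero. The data and the helper name are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# A perfect but non-linear (U-shaped) relationship: y = x squared
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_curved = pearson_r(x, y)
print(r_curved)
```

An r near zero therefore means "no linear relationship", not "no relationship", which is why the scatterplot should be checked first.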

Regression
Regression: Correlation + Prediction
predicting y based on x
e.g., predicting throwing points (y) based on distance from target (x)

Regression equation

formula that specifies a line


ŷ = bx + a
plug in an x value (distance from target) and predict y (points)
note
y = actual value of a score
ŷ = predicted value

Go to website!
Regression Playground

Regression Graphic: Regression Line

See correlation & regression worksheet

[Figure: scatterplot with fitted regression line; x-axis: Distance from target (8 to 26), y-axis: Total ball toss points (20 to 120); Rsq = 0.6031]

if x = 18, then ŷ = 47
if x = 24, then ŷ = 20

Regression Equation
ŷ = bx + a

See correlation
& regression
worksheet

ŷ = predicted value of y
b = slope of the line
x = value of x that you plug in
a = y-intercept (where line crosses y-axis)

In this case:
ŷ = -4.263(x) + 125.401

So if the distance is 20 feet:
ŷ = -4.263(20) + 125.401
ŷ = -85.26 + 125.401
ŷ = 40.141
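
The same plug-in calculation can be written as a small function. The slope and intercept are the values from this slide. The function and constant names are hypothetical.

```python
SLOPE_B = -4.263       # b: change in predicted points per extra foot of distance
INTERCEPT_A = 125.401  # a: predicted points when distance is 0

def predict_points(distance_ft):
    """Predicted ball toss points for a given distance from the target."""
    return SLOPE_B * distance_ft + INTERCEPT_A

prediction = predict_points(20)
print(round(prediction, 3))
```

Calling `predict_points` with other distances gives other points on the same regression line.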

SPSS Regression Set-up


Criterion,
y-axis variable,
what you're trying
to predict

Predictor,
x-axis variable,
what you're basing
the prediction on

Note: Never refer to the IV or DV when doing regression

Getting Regression Info from SPSS


Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .777(a)   .603       .581                18.476

a. Predictors: (Constant), Distance from target

See correlation & regression worksheet

ŷ = b(x) + a
ŷ = -4.263(20) + 125.401

Coefficients (a)

                              Unstandardized Coefficients   Standardized Coefficients
Model                         B           Std. Error        Beta                        t         Sig.
1   (Constant)                125.401     14.265                                        8.791     .000
    Distance from target      -4.263      .815              -.777                       -5.230    .000

a. Dependent Variable: Total ball toss points

Predictive Ability
Mantra!!
As variability decreases, prediction accuracy ___
if we can account for variance, we can make better predictions

As r increases:
r² increases
variance accounted for increases
the prediction accuracy increases
prediction error decreases (distance between y and ŷ)
Sy decreases
the standard error of the residual/predictor
measures overall amount of prediction error

We like big r's!!!
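
The mantra can be checked numerically. The sketch below fits a least-squares line to two hypothetical data sets that share the same x values, one tightly clustered around a line and one noisy, then compares r squared and the standard error of the estimate. All data and function names are assumptions for illustration. The standard error is computed here as the square root of the residual sum of squares divided by n - 2.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope b and intercept a for predicting y from x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x
    a = mean_y - b * mean_x
    return b, a

def prediction_summary(xs, ys):
    """Return (r squared, standard error of the estimate) for the fitted line."""
    b, a = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # Residual = actual y minus predicted y
    ss_resid = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_resid / ss_total
    std_error = math.sqrt(ss_resid / (len(xs) - 2))
    return r_squared, std_error

distance = [8, 12, 16, 20, 24]       # hypothetical distances
tight_points = [90, 75, 58, 41, 26]  # hypothetical: close to a straight line
noisy_points = [70, 95, 40, 60, 25]  # hypothetical: same trend, more scatter

r2_tight, se_tight = prediction_summary(distance, tight_points)
r2_noisy, se_noisy = prediction_summary(distance, noisy_points)
print(f"tight: r squared = {r2_tight:.3f}, std. error = {se_tight:.2f}")
print(f"noisy: r squared = {r2_noisy:.3f}, std. error = {se_noisy:.2f}")
```

The tighter data set has the larger r squared and the smaller standard error, so its predictions land closer to the actual scores.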

Drawing a Regression Line by Hand


Three steps
1. Plug zero in for x to get a y value, and then plot this value
Note: It will be the y-intercept
2. Pick a large value for x (just so it falls on the right end of the graph), plug it in for x, then plot the resulting point
3. Connect the two points with a straight line!
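
The three steps can be sketched as a function that returns the two points to plot. The slope and intercept are the ball toss values from the regression equation slide. Using x = 26 as the "large value" is an assumption based on the right end of the distance axis, and the function name is hypothetical.

```python
def line_endpoints(b, a, x_large):
    """Two points that define the regression line: the y-intercept and a far-right point."""
    left_point = (0, a)                        # step 1: plug in x = 0
    right_point = (x_large, b * x_large + a)   # step 2: plug in a large x
    return left_point, right_point             # step 3: connect these two points

left, right = line_endpoints(-4.263, 125.401, 26)
print(left, (right[0], round(right[1], 3)))
```

Any two distinct x values would give the same line. Zero and a large value are just convenient because they sit far apart on the graph.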