
# Working with relationships between two variables

[Figure: scatterplot with Stats Test Score on the y-axis (0 to 100) and a dollar amount on the x-axis (\$0 to \$80)]

## Correlation & Regression

Univariate & Bivariate Statistics
U: frequency distribution, mean, mode, range, standard deviation
B: correlation between two variables

Correlation
linear pattern of relationship between one variable (x) and another variable (y)
an association between two variables
relative position of one variable correlates with relative distribution of another variable
graphical representation of the relationship between two variables

Warning:
No proof of causality
Cannot assume x causes y

Scatterplot!
No Correlation
Random or circular
assortment of dots

Positive Correlation
ellipse leaning to right
GPA and SAT
Smoking and Lung Damage

Negative Correlation
ellipse leaning to left
Depression & Self-esteem
Studying & test errors

## Pearson's Correlation Coefficient

r indicates
strength of relationship (strong, weak, or none)
direction of relationship
positive (direct) variables move in same direction
negative (inverse) variables move in opposite directions

## r ranges in value from -1.0 to +1.0

-1.0
Strong Negative

0.0
No Rel.

+1.0
Strong Positive
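
To make the strength and direction of r concrete, here is a minimal sketch that computes Pearson's r by hand. The data values and variable names are hypothetical, made up for illustration; only the general idea (studying and test errors move in opposite directions) comes from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data, made up for illustration only.
hours_studied = [1, 2, 3, 4, 5]
test_errors = [9, 8, 6, 5, 2]    # errors fall as studying rises
pages_read = [2, 4, 6, 8, 10]    # rises in lockstep with hours

r_negative = pearson_r(hours_studied, test_errors)
r_positive = pearson_r(hours_studied, pages_read)
print(round(r_negative, 3), round(r_positive, 3))
```

The first value comes out strongly negative (variables move in opposite directions) and the second is a perfect positive correlation, because pages_read is an exact linear function of hours_studied.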

Go to website!
playing with scatterplots

## Practice with Scatterplots

r = .__ __

r = .__ __

r = .__ __

r = .__ __

Correlation Guestimation

Correlations

| | | Miles walked per day | Weight | Depression | Anxiety |
|---|---|---|---|---|---|
| Miles walked per day | Pearson Correlation | 1 | -.797** | -.800** | -.774** |
| | Sig. (2-tailed) | | .002 | .002 | .003 |
| | N | 12 | 12 | 12 | 12 |
| Weight | Pearson Correlation | -.797** | 1 | .648* | .780** |
| | Sig. (2-tailed) | .002 | | .023 | .003 |
| | N | 12 | 12 | 12 | 12 |
| Depression | Pearson Correlation | -.800** | .648* | 1 | .753** |
| | Sig. (2-tailed) | .002 | .023 | | .005 |
| | N | 12 | 12 | 12 | 12 |
| Anxiety | Pearson Correlation | -.774** | .780** | .753** | 1 |
| | Sig. (2-tailed) | .003 | .003 | .005 | |
| | N | 12 | 12 | 12 | 12 |

**. Correlation is significant at the 0.01 level (2-tailed).

*. Correlation is significant at the 0.05 level (2-tailed).

## Samples vs. Populations

Sample statistics estimate Population parameters
M tries to estimate μ
r tries to estimate ρ (rho, the Greek letter, not p)

r: correlation for a sample
based on the limited observations we have

ρ: actual correlation in the population
the true correlation

## Beware Sampling Error!!

even if ρ = 0 (there's no actual correlation), you might get r = .08
or r = -.26 just by chance.
We look at r, but we want to know about ρ
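
Sampling error can be seen directly with a small simulation. The sketch below is illustrative only: it draws x and y independently (so the population correlation is 0 by construction) and records the sample r from many small samples. The sample size of 12, the number of repetitions, and the random seed are arbitrary choices, not values from the slides.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

random.seed(1)   # fixed seed so the run is repeatable
n = 12           # small sample size (arbitrary choice)
sample_rs = []
for _ in range(1000):
    # x and y are drawn independently, so the population correlation is 0.
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]
    sample_rs.append(pearson_r(xs, ys))

mean_r = sum(sample_rs) / len(sample_rs)
print("smallest r:", round(min(sample_rs), 2))
print("largest r:", round(max(sample_rs), 2))
print("mean r:", round(mean_r, 2))
```

The sample correlations average out near zero, but individual samples stray well away from it in both directions, which is why a single r needs a significance test before we conclude anything about ρ.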

## Hypothesis testing with Correlations

Two possibilities
Ho: ρ = 0 (no actual correlation; the Null Hypothesis)
Ha: ρ ≠ 0 (there is some correlation; the Alternative Hypothesis)

Case #1 (see correlation worksheet)
Correlation between distance and points, r = -.904
Sample small (n=6), but r is very large
We guess ρ < 0 (we guess there is some correlation in the pop.)

Case #2
Correlation between aiming and points, r = .628
Sample small (n=6), and r is only moderate in size
We guess ρ = 0 (we guess there is NO correlation in pop.)

Bottom-line
We can be wrong in two ways

Correlations (a)

| | | Total ball toss points | Distance from target | Time spun before throwing | Aiming accuracy | Manual dexterity | point avg | Confidence |
|---|---|---|---|---|---|---|---|---|
| Total ball toss points | Pearson Correlation | 1 | -.904* | -.582 | .628 | .821* | -.037 | -.502 |
| | Sig. (2-tailed) | . | .013 | .226 | .181 | .045 | .945 | .310 |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Distance from target | Pearson Correlation | -.904* | 1 | .279 | -.653 | -.883* | .228 | .522 |
| | Sig. (2-tailed) | .013 | . | .592 | .159 | .020 | .664 | .288 |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Time spun before throwing | Pearson Correlation | -.582 | .279 | 1 | -.390 | -.248 | -.087 | .267 |
| | Sig. (2-tailed) | .226 | .592 | . | .445 | .635 | .869 | .609 |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Aiming accuracy | Pearson Correlation | .628 | -.653 | -.390 | 1 | .758 | -.546 | -.250 |
| | Sig. (2-tailed) | .181 | .159 | .445 | . | .081 | .262 | .633 |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Manual dexterity | Pearson Correlation | .821* | -.883* | -.248 | .758 | 1 | -.553 | -.101 |
| | Sig. (2-tailed) | .045 | .020 | .635 | .081 | . | .255 | .848 |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| point avg | Pearson Correlation | -.037 | .228 | -.087 | -.546 | -.553 | 1 | -.524 |
| | Sig. (2-tailed) | .945 | .664 | .869 | .262 | .255 | . | .286 |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Confidence | Pearson Correlation | -.502 | .522 | .267 | -.250 | -.101 | -.524 | 1 |
| | Sig. (2-tailed) | .310 | .288 | .609 | .633 | .848 | .286 | . |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |

*. Correlation is significant at the 0.05 level (2-tailed).

a. Day sample collected = Tuesday

Reading the Correlations table: Total ball toss points and Distance from target
r = -.904
p = .013: probability of getting a correlation this size by sheer chance. Reject Ho if p < .05.
N = 6: sample size
Reported as r(4) = -.904, p < .05
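
The decision for the two cases can be checked by hand with the usual t test for a correlation, t = r·sqrt((n − 2) / (1 − r²)) with df = n − 2. The sketch below is a hedged illustration rather than the SPSS procedure itself: the r and n values come from the slides, while the critical value 2.776 (two-tailed, α = .05, df = 4) is taken from a standard t table and the function name is hypothetical.

```python
import math

def t_for_r(r, n):
    """t statistic for testing Ho: rho = 0, with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2)), df

# Two-tailed critical value of t for df = 4 at alpha = .05,
# taken from a standard t table (not from the slides).
T_CRIT_DF4 = 2.776

cases = [("distance and points", -0.904), ("aiming and points", 0.628)]
for label, r in cases:
    t, df = t_for_r(r, n=6)
    decision = "reject Ho" if abs(t) > T_CRIT_DF4 else "fail to reject Ho"
    print(f"{label}: r({df}) = {r}, t = {t:.2f}, {decision}")
```

With only six observations, r = -.904 clears the critical value while r = .628 does not, which agrees with the Sig. (2-tailed) values of .013 and .181 in the table.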

Predictive Potential
Coefficient of Determination
r²
Amount of variance accounted for in y by x
Percentage increase in accuracy you gain by using the regression
line to make predictions
Without correlation, you can only guess the mean of y
[Used with regression]

[Figure: percentage scale from 0% to 100%]
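
One way to see what "variance accounted for" means is to compare the squared error from always guessing the mean of y with the squared error left after using the regression line. The sketch below uses hypothetical data (made up for illustration) and a plain least-squares fit; the proportional reduction in squared error works out to r².

```python
import math

# Hypothetical data, made up for illustration only.
xs = [8, 10, 12, 14, 16, 18]
ys = [95, 80, 85, 60, 55, 40]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)
# Squared error when the only available guess is the mean of y.
ss_y = sum((y - mean_y) ** 2 for y in ys)

r = sp / math.sqrt(ss_x * ss_y)
b = sp / ss_x              # least-squares slope
a = mean_y - b * mean_x    # least-squares intercept

# Squared error left over when predicting from the regression line.
ss_error = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

r_squared = r ** 2
proportion_reduced = 1 - ss_error / ss_y
print(round(r_squared, 3), round(proportion_reduced, 3))  # the two agree
```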

Limitations of Correlation
linearity:
can't describe non-linear relationships
e.g., relation between anxiety & performance

truncation of range:
underestimate strength of relationship if you can't see full range
of x values

no proof of causation
third variable problem:
could be 3rd variable causing change in both variables
directionality: can't be sure which way causality flows

Regression
Regression: Correlation + Prediction
predicting y based on x
e.g., predicting
throwing points (y)
based on distance from target (x)

Regression equation

formula that specifies a line

ŷ = bx + a
plug in an x value (distance from target) and predict y (points)
note
y = actual value of a score
ŷ = predicted value

Go to website!
Regression Playground

See correlation & regression worksheet

[Figure: scatterplot with fitted regression line. x-axis: Distance from target (8 to 26); y-axis ticks from 20 to 120; Rsq = 0.6031. Callouts: if x = 18 then ŷ = 47; if x = 24 then ŷ = 20]

Regression Equation
ŷ = bx + a

ŷ = predicted value of y
b = slope of the line
x = value of x that you plug in
a = y-intercept (where line crosses y-axis)

In this case:
ŷ = -4.263(x) + 125.401

So if the distance is 20 feet:

ŷ = -4.263(20) + 125.401
ŷ = -85.26 + 125.401

ŷ = 40.141
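
The worked example above can be wrapped in a small function so that any distance can be plugged in. This is a minimal sketch: the slope and intercept are the values given in the slides, and the function name predict_points is hypothetical.

```python
def predict_points(distance_feet, b=-4.263, a=125.401):
    """Predicted ball toss points from the regression equation bx + a."""
    return b * distance_feet + a

predicted = predict_points(20)
print(round(predicted, 3))  # 40.141, matching the worked example
```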

## SPSS Regression Set-up

Criterion, y-axis variable, what you're trying to predict

Predictor, x-axis variable, what you're basing the prediction on

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .777 (a) | .603 | .581 | 18.476 |

a. Predictors: (Constant), Distance from target

ŷ = b(x) + a

ŷ = -4.263(20) + 125.401

Coefficients (a)

| Model | | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 125.401 | 14.265 | | 8.791 | .000 |
| | Distance from target | -4.263 | .815 | -.777 | -5.230 | .000 |

a. Dependent Variable: Total ball toss points

Predictive Ability
Mantra!!
As variability decreases, prediction accuracy ___
if we can account for variance, we can make better predictions

As r increases:
r² increases
variance accounted for increases
the prediction accuracy increases
prediction error decreases (distance between y and ŷ)
Sŷ decreases
the standard error of the residual/predictor
measures overall amount of prediction error
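
The mantra can be illustrated by fitting a line to the same y scores paired with x in two different ways. The sketch below uses hypothetical data and assumes the common definition of the standard error of estimate, sqrt(SS error / (n − 2)); the function name fit_summary is made up for this example.

```python
import math

def fit_summary(xs, ys):
    """Return (r, standard error of estimate) for a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    r = sp / math.sqrt(ss_x * ss_y)
    b = sp / ss_x              # least-squares slope
    a = mean_y - b * mean_x    # least-squares intercept
    ss_error = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return r, math.sqrt(ss_error / (n - 2))

# Hypothetical data, made up for illustration only.
ys = [95, 80, 85, 60, 55, 40]
x_strong = [8, 10, 12, 14, 16, 18]   # closely tracks y
x_weak = [12, 8, 18, 10, 16, 14]     # same x values, paired differently

r_strong, se_strong = fit_summary(x_strong, ys)
r_weak, se_weak = fit_summary(x_weak, ys)
print(round(r_strong, 3), round(se_strong, 2))
print(round(r_weak, 3), round(se_weak, 2))
```

Because the y scores are identical in both fits, the only thing that changes is how much of their variability x accounts for: the pairing with the larger |r| leaves the smaller standard error.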

## Drawing a Regression Line by Hand

Three steps
1. Plug zero in for x to get a y value, and then plot this value
2. Pick a large value for x (just so it falls on the right end of the graph), plug it in for x, then plot the resulting point
3. Connect the two points with a straight line!
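
The three steps can be sketched in a few lines using the regression equation from the ball toss example. The choice of x = 26 for the right-hand point is an assumption (it is simply a large distance); the function name predict is hypothetical.

```python
def predict(x, b=-4.263, a=125.401):
    """Predicted y from the regression equation bx + a."""
    return b * x + a

left_point = (0, predict(0))       # step 1: x = 0 gives the y-intercept
right_point = (26, predict(26))    # step 2: a large x near the right edge
# Step 3: draw a straight line through these two points.
print(left_point, right_point)
```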