
Relationships

Leo Sutrisno

Two types of relationships


1. Correlation
➢ Question: As one variable changes value, does the other variable change in a related (or correlated) manner?
➢ Concern: The question is not whether one variable affects another, but whether the two variables vary together.
▪ Nominal: the variables are correlated if cases that fall into one category of the first variable tend to group in one category of the second variable.
▪ Ordinal / interval / ratio: the variables are correlated if there is a definite pattern in the occurrence of high and low values.
➢ Example: Do people with a certain hair color tend to have a certain eye color?
2. Predictive relationships
a. Proportionate reduction in error (PRE)
➢ Question: To what extent is one variable useful in predicting another?
➢ Concern: Measuring the proportion of error in predicting one variable that is reduced or eliminated with knowledge of a second variable.
➢ Example: How much will our knowledge of people's hair color help in making predictions about their eye color?
Nominal level measures of relationships
1. Percentage differences (%d)
➢ Statistic: percentage difference
➢ Symbol: %d
➢ Application: %d is used to provide a general picture of the degree of association between two nominal variables.

Watch TV        Gender
                Female       Male         %d
Often            9 (19%)     10 (19%)      0
Occasionally    33 (70%)     40 (75%)      5
Never            5 (11%)      3 (6%)       5
Total           47 (100%)    53 (100%)

%d = ?
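
The percentage differences in the table above can be reproduced with a short Python sketch. The function names are illustrative and not part of the slides; the sketch assumes, as the table appears to, that column percentages are rounded to whole numbers before they are subtracted.

```python
# Counts from the Watch TV by Gender table above.
counts = {
    "Often":        {"Female": 9,  "Male": 10},
    "Occasionally": {"Female": 33, "Male": 40},
    "Never":        {"Female": 5,  "Male": 3},
}

def column_percentages(table, column):
    """Share of each row category within one column, as a whole percent."""
    total = sum(row[column] for row in table.values())
    return {name: round(100 * row[column] / total) for name, row in table.items()}

def percentage_differences(table, col_a, col_b):
    """Absolute difference between the two columns' percentages, row by row."""
    pct_a = column_percentages(table, col_a)
    pct_b = column_percentages(table, col_b)
    return {name: abs(pct_a[name] - pct_b[name]) for name in table}

differences = percentage_differences(counts, "Female", "Male")
print(differences)
```

Small differences in every row, as here, suggest little association between the two variables.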

Watch TVRI   Town of residence                      %d           %d           %d
             Pontianak   Sintang     Singkawang     Ptk.-Stg.    Ptk.-Skw.    Stg.-Skw.
Yes          22 (61%)    16 (50%)    19 (59%)       11           2            9
No           14 (39%)    16 (50%)    13 (41%)       11           2            9
Total        36 (100%)   32 (100%)   32 (100%)      40           42

%d = ?   %d = ?   %d = ?
2. Lambda
➢ Statistic: Guttman's lambda
➢ Symbol: λ
➢ Application:
▪ Asymmetrical version
λyx = [Σfc – Fr] / [N – Fr]
y: dependent variable
x: independent variable
fc: the largest frequency within a given X (column) category
Fr: the largest row total
Fc: the largest column total

Joint           Income                        Row total
                Low      Medium    High
Yes             9        34        63         106
No              28       36        43         107
Column total    37       70        106        213
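
As a minimal sketch, the asymmetrical formula λyx = [Σfc – Fr] / [N – Fr] can be applied to the Joint by Income table in Python. The function name `lambda_yx` is an illustrative choice, and the sketch assumes the row variable (Joint) is the dependent variable y.

```python
# Rows: Joint = Yes, No; columns: Income = Low, Medium, High.
income_table = [
    [9, 34, 63],
    [28, 36, 43],
]

def lambda_yx(table):
    """Asymmetrical lambda with the row variable as the dependent variable."""
    n = sum(sum(row) for row in table)
    largest_row_total = max(sum(row) for row in table)      # Fr
    sum_column_maxima = sum(max(col) for col in zip(*table))  # sum of fc
    return (sum_column_maxima - largest_row_total) / (n - largest_row_total)

lam = lambda_yx(income_table)
print(round(lam, 3))
```

Here Σfc = 28 + 36 + 63 = 127, Fr = 107 and N = 213, so the result is 20 / 106.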

▪ Symmetrical version
λ = [(Σfc + Σfr) – (Fc + Fr)] / [2N – (Fc + Fr)]
fr: the largest frequency within a given Y (row) category

Joint           Area of residence                   Row total
                A       B       C       D
Yes             1       28      52      25          106
No              50      25      2       30          107
Column total    51      53      54      55          213
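
The symmetrical formula λ = [(Σfc + Σfr) – (Fc + Fr)] / [2N – (Fc + Fr)] can be sketched the same way for the Joint by Area of residence table. The function name is illustrative; Σfr is taken here to be the sum of the largest frequency in each row, which the slides leave implicit.

```python
# Rows: Joint = Yes, No; columns: Area of residence = A, B, C, D.
area_table = [
    [1, 28, 52, 25],
    [50, 25, 2, 30],
]

def symmetric_lambda(table):
    """Symmetrical lambda: neither variable is treated as dependent."""
    n = sum(sum(row) for row in table)
    columns = list(zip(*table))
    sum_fc = sum(max(col) for col in columns)   # largest cell in each column
    sum_fr = sum(max(row) for row in table)     # largest cell in each row
    big_fc = max(sum(col) for col in columns)   # largest column total
    big_fr = max(sum(row) for row in table)     # largest row total
    return ((sum_fc + sum_fr) - (big_fc + big_fr)) / (2 * n - (big_fc + big_fr))

lam_sym = symmetric_lambda(area_table)
print(round(lam_sym, 3))
```

For this table Σfc = 160, Σfr = 102, Fc = 55 and Fr = 107, giving 100 / 264.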

3. Phi
➢ Statistic: Phi
➢ Symbol: Ф
➢ Application: Phi measures the relationship between two dichotomous variables

Ф = |bc – ad| / √[(a+b)(a+c)(c+d)(b+d)]

Watch TV    Gender
            Male        Female      Total
Yes         (a) 29      (b) 28      (a+b) 57
No          (c) 24      (d) 19      (c+d) 43
Total       (a+c) 53    (b+d) 47
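
A small Python sketch of the phi computation for the Watch TV by Gender table follows. The function name `phi` is illustrative; the cell labels a, b, c, d follow the table above.

```python
import math

def phi(a, b, c, d):
    """Phi for a 2x2 table with cells a, b (top row) and c, d (bottom row)."""
    numerator = abs(b * c - a * d)
    denominator = math.sqrt((a + b) * (a + c) * (c + d) * (b + d))
    return numerator / denominator

# Cells from the table above: a = 29, b = 28, c = 24, d = 19.
phi_tv = phi(29, 28, 24, 19)
print(round(phi_tv, 3))
```

The numerator is |28·24 – 29·19| = 121, which is small relative to the denominator, so phi is close to zero for these data.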
Ordinal level measures of relationships

➢ Fully ordered data distribution

Based on: numeric data
1. Rank order correlation
Statistic: rank order correlation
Symbol: rs
Application: rs measures the strength and direction of the rank order correlation between two ordinal variables

rs = 1 – [6(Σd²)] / [N³ – N]

X     Y    Rank X   Rank Y   d     d²
11    8    7        6        1.0   1.00
10    9    6        7        1.0   1.00
8     4    5        3        2.0   4.00
6     4    3.5      3        0.5   0.25
6     2    3.5      1        2.5   6.25
4     6    2        5        3.0   9.00
3     4    1        3        2.0   4.00

Σd² = 25.50
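
The ranking and the rs formula can be sketched in Python as follows. The helper `average_ranks` is an illustrative name; it gives tied values the mean of the ranks they occupy, which matches the 3.5 and 3 entries in the table. Note that with ties the formula is only an approximation to the rank correlation.

```python
x = [11, 10, 8, 6, 6, 4, 3]
y = [8, 9, 4, 4, 2, 6, 4]

def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first_position = ordered.index(v) + 1   # 1-based position of first tie
        count = ordered.count(v)
        rank_of[v] = first_position + (count - 1) / 2
    return [rank_of[v] for v in values]

rank_x = average_ranks(x)
rank_y = average_ranks(y)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))

n = len(x)
r_s = 1 - (6 * sum_d2) / (n ** 3 - n)
print(sum_d2, round(r_s, 3))
```

With Σd² = 25.5 and N = 7, the formula gives 1 – 153 / 336.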
2. Measures of concordance
Statistic: Kendall’s tau-a
Symbol: τa
Application: τa measures the strength and direction of
the correlation between two fully ordered variables

τa = [A – D] / [N(N-1)/2]
A: number of agreements in order
D: number of disagreements in order

Name   X     Y    Rx   Ry
G      100   95   7    6
F      93    99   6    7
E      87    76   5    5
D      73    68   4    3
C      68    62   3    2
B      65    72   2    4
A      40    51   1    1

[Rank plot: each case plotted by Rx (horizontal, 1 to 7) against Ry (vertical, 1 to 7)]

D = 3
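
Counting agreements and disagreements by hand is error-prone, so a short Python sketch may help. The X and Y scores below are read from the table above as seven cases G through A (X = 100, 93, 87, 73, 68, 65, 40 and Y = 95, 99, 76, 68, 62, 72, 51); treat that reading as an assumption, since the table layout is hard to recover. Variable names are illustrative.

```python
from itertools import combinations

# Cases G, F, E, D, C, B, A as (X, Y) pairs, assumed from the table above.
cases = [(100, 95), (93, 99), (87, 76), (73, 68), (68, 62), (65, 72), (40, 51)]

agreements = 0      # A: pairs ordered the same way on X and Y
disagreements = 0   # D: pairs ordered in opposite ways on X and Y
for (x1, y1), (x2, y2) in combinations(cases, 2):
    direction = (x1 - x2) * (y1 - y2)
    if direction > 0:
        agreements += 1
    elif direction < 0:
        disagreements += 1

n = len(cases)
tau_a = (agreements - disagreements) / (n * (n - 1) / 2)
print(agreements, disagreements, round(tau_a, 3))
```

The three disagreements are the pairs B and C, B and D, and F and G; the remaining 18 of the 21 pairs agree in order.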
➢ Partially ordered data distribution
Based on: nonnumeric data in subjectively derived categories, such as level of agreement measured from "strongly agree" through "strongly disagree".

Measures of concordance
▪ Number of rows and number of columns are equal
1. Kendall’s tau-b
Statistic: Kendall’s tau-b
Symbol: τb
Application: τb measures the strength and direction of the correlation between two partially ordered variables. It is an alternative to tau-a when there are ties on the X or Y variable; PRE

τb = [A – D] / √[(A+D+Ty)(A+D+Tx)]
A: number of agreements in order
D: number of disagreements in order
Ty: ties on the Y variable
Tx: ties on the X variable
The number of rows equals the number of columns.

Production level   Salary
                   Low     Medium    High
High               5       11        6
Medium             2       20        4
Low                15      18        3

A = 15(20+11+4+6) + 2(11+6) + 18(4+6) + 20(6)
D = 5(20+4+18+3) + 2(18+3) + 11(4+3) + 20(3)
Ty = 15(18+3) + 2(20+4) + 5(11+6) + 18(3) + 20(4) + 11(6)
Tx = 15(2+5) + 2(5) + 18(20+11) + 20(11) + 3(4+6) + 4(6)
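
The pair counts can be checked with a short Python sketch. It follows the convention of the expressions above: Ty counts pairs in the same production row and Tx counts pairs in the same salary column. The Medium-row term of Ty is computed as 20(4), since 4 is the count to the right of 20 in that row. The function name `pair_counts` is illustrative.

```python
import math

# Rows: production level Low, Medium, High; columns: salary Low, Medium, High.
production_table = [
    [15, 18, 3],
    [2, 20, 4],
    [5, 11, 6],
]

def pair_counts(table):
    """Return (A, D, Ty, Tx) for an ordered table, counting each pair once."""
    cells = [(i, j, n) for i, row in enumerate(table) for j, n in enumerate(row)]
    a = d = t_y = t_x = 0
    for i1, j1, n1 in cells:
        for i2, j2, n2 in cells:
            if i2 > i1 and j2 > j1:
                a += n1 * n2        # higher on both variables: agreement
            elif i2 > i1 and j2 < j1:
                d += n1 * n2        # higher on one, lower on the other
            elif i2 == i1 and j2 > j1:
                t_y += n1 * n2      # same row: tied on production level
            elif i2 > i1 and j2 == j1:
                t_x += n1 * n2      # same column: tied on salary
    return a, d, t_y, t_x

a, d, t_y, t_x = pair_counts(production_table)
tau_b = (a - d) / math.sqrt((a + d + t_y) * (a + d + t_x))
print(a, d, t_y, t_x, round(tau_b, 3))
```

As a consistency check, A + D + Ty + Tx plus the pairs that fall in the same cell should equal N(N – 1)/2 for N = 84.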

▪ Number of rows and number of columns are unequal

2. Kendall’s tau-c
Statistic: Kendall’s tau-c
Symbol: τc
Application: τc measures the strength and direction of the correlation between two partially ordered variables; PRE
τc = [2(A – D)] / [N² {(m–1) / m}]
A: number of agreements in order
D: number of disagreements in order
m: number of rows or number of columns, whichever is smaller
The number of rows is not equal to the number of columns.
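
The slides give no unequal table for tau-c, so the sketch below only illustrates the arithmetic of the formula. It borrows A = 949, D = 404 and N = 84 from the Production level by Salary table, which is square (m = 3) and would normally call for tau-b; the function name `tau_c` is illustrative.

```python
def tau_c(agreements, disagreements, n, n_rows, n_cols):
    """Kendall's tau-c from pair counts, sample size, and table dimensions."""
    m = min(n_rows, n_cols)
    return 2 * (agreements - disagreements) / (n ** 2 * ((m - 1) / m))

# Counts borrowed from the 3x3 production-by-salary table, for arithmetic only.
tau_c_value = tau_c(949, 404, 84, 3, 3)
print(round(tau_c_value, 3))
```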

3. Gamma
Statistic: Goodman and Kruskal’s gamma
Symbol: γ
Application: gamma measures the strength and
direction of the correlation between two partially
ordered variables; PRE
γ = [A – D] / [A + D]

4. Somers’ dyx
Statistic: Somers’ dyx
Symbol: dyx
Application: Somers’ dyx measures the effect of X on Y, that is, the power of X to predict Y, for two partially ordered variables
dyx = [A – D] / [A + D + Ty]
dxy = [A – D] / [A + D + Tx]
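
Gamma and Somers' d use the same pair counts and differ only in the denominator, which a short Python sketch makes visible. The counts are those that follow from the Production level by Salary table (A = 949, D = 404, Ty = 648, Tx = 947, with the Medium-row term of Ty taken as 20(4)); variable names are illustrative.

```python
# Pair counts from the production-by-salary table.
a, d, t_y, t_x = 949, 404, 648, 947

gamma = (a - d) / (a + d)           # ignores all tied pairs
d_yx = (a - d) / (a + d + t_y)      # adds the Ty ties to the denominator
d_xy = (a - d) / (a + d + t_x)      # adds the Tx ties to the denominator

print(round(gamma, 3), round(d_yx, 3), round(d_xy, 3))
```

Because ties only enlarge the denominator, both Somers' d values are smaller in magnitude than gamma for the same table.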
➢ Dichotomous ordinal data distribution
Based on: two categories that have a sense of order, such as examination results recorded as high or low grades.

1. Phi
Statistic: Phi
Symbol: Ф
Application: Phi measures the relationship between two
dichotomous variables

Ф = |bc – ad| / √[(a+b)(a+c)(c+d)(b+d)]
2. Yule’s Q
Statistic: Yule’s Q
Symbol: Q
Application: Yule’s Q measures the relationship
between two dichotomous variables

Q = [bc – ad] / [bc + ad]

Envir. support   Education
                 Low       High
High             (a) 14    (b) 56
Low              (c) 38    (d) 10
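
Yule's Q for the environmental support by education table can be sketched in a few lines of Python. The function name `yules_q` is illustrative; the cell labels follow the table above.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table with cells a, b (top row) and c, d (bottom row)."""
    return (b * c - a * d) / (b * c + a * d)

# Cells from the table above: a = 14, b = 56, c = 38, d = 10.
q = yules_q(14, 56, 38, 10)
print(round(q, 3))
```

Here bc = 2128 and ad = 140, so Q = 1988 / 2268, a strong positive association between education and environmental support.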
Interval / ratio level measures of relationships

1. Pearson’s correlation coefficient


Statistic: Pearson’s correlation coefficient
Symbol: r
Application: Pearson’s r is used to measure the strength and the direction of the linear correlation between two interval level variables.

X      Y      XY      X²      Y²
12     12     144     144     144
14     13     182     196     169
11     12     132     121     144
12     12     144     144     144
12     8      96      144     64
16     16     256     256     256
13     6      78      169     36
12     10     120     144     100
16     12     192     256     144
10     12     120     100     144
12     16     192     144     256
16     14     224     256     196
8      9      72      64      81
14     16     224     196     256
12     12     144     144     144
10     6      60      100     36
16     12     192     256     144
12     10     120     144     100
14     8      112     196     64
12     12     144     144     144
Σ 254  228    2948    3318    2766

r = [N(ΣXY) – (ΣX)(ΣY)] / √[{NΣX² – (ΣX)²}{NΣY² – (ΣY)²}]
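
The computational formula for r can be checked against the 20 cases in the table with a short Python sketch. Variable names are illustrative; the second row's XY product is computed as 14 × 13 = 182, which is the value consistent with the column total of 2948.

```python
import math

x = [12, 14, 11, 12, 12, 16, 13, 12, 16, 10, 12, 16, 8, 14, 12, 10, 16, 12, 14, 12]
y = [12, 13, 12, 12, 8, 16, 6, 10, 12, 12, 16, 14, 9, 16, 12, 6, 12, 10, 8, 12]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2, round(r, 3))
```

The numerator works out to 58960 – 57912 = 1048, and the two bracketed terms in the denominator to 1844 and 3336.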

2. Coefficient of determination
Statistic: Coefficient of determination
Symbol: r²
Application: The coefficient of determination indicates, for interval level variables, the proportion of error in predicting Y that is eliminated by knowledge of X, and vice versa.

r² = (r)²

3. Regression
Assumptions:
▪ A linear relationship exists
▪ The relationship is known
▪ The value of the dependent variable can be predicted given knowledge of the value of the independent variable.

The slope
➢ Statistic: the regression coefficient; the slope
➢ Symbol: byx
➢ Application: byx measures the amount and direction of change in Y for each unit increase in X
byx = [N(ΣXY) – (ΣX)(ΣY)] / [NΣX² – (ΣX)²]

The Y-Intercept
➢ Statistic: the Y-intercept; the constant
➢ Symbol: ayx
➢ Application: The Y-intercept indicates the point at which the regression line crosses the Y axis. It is the predicted value for Y when X = 0.
➢ ayx = Ȳ – byx(X̄), where Ȳ and X̄ are the means of Y and X

The regression equation

➢ Statistic: the regression equation
➢ Symbol: Ŷ
➢ Application: The regression equation gives the predicted value of Y (Ŷ) for a given value of X, using the Y-intercept ayx and the slope byx.
➢ Ŷ = ayx + byx(X)
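
The slope, intercept, and prediction steps can be sketched in Python from the column sums of the Pearson table (N = 20, ΣX = 254, ΣY = 228, ΣXY = 2948, ΣX² = 3318). The sketch assumes the intercept is computed from the means of X and Y and that the regression line is Ŷ = ayx + byx(X); the function name `predict` and the input X = 12 are illustrative.

```python
# Column sums from the Pearson table.
n, sum_x, sum_y, sum_xy, sum_x2 = 20, 254, 228, 2948, 3318

# Slope: change in predicted Y for each unit increase in X.
b_yx = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept: mean of Y minus slope times mean of X.
mean_x, mean_y = sum_x / n, sum_y / n
a_yx = mean_y - b_yx * mean_x

def predict(x_value):
    """Predicted Y for a given X on the fitted regression line."""
    return a_yx + b_yx * x_value

print(round(b_yx, 3), round(a_yx, 3), round(predict(12), 3))
```

A quick sanity check on any fitted line of this kind is that it passes through the point of means, so `predict(mean_x)` should return `mean_y`.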

Standard error of the estimate

➢ Statistic: standard error of the estimate
➢ Symbol: Syx
➢ Application: Syx indicates the standard amount of error made in predicting Y from X using the regression equation. It is the standard deviation of the Y values around the regression line, that is, of the differences between the Y values and the Ŷ values.
Syx = Sy √(1 – r²)
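
A minimal Python sketch of Syx = Sy √(1 – r²), using the column sums of the Pearson table, is given below. The slides do not say whether Sy divides by N or by N – 1; the sketch assumes the population form (dividing by N), so treat that choice as an assumption. Variable names are illustrative.

```python
import math

# Column sums from the Pearson table.
n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 20, 254, 228, 2948, 3318, 2766

ss_x = n * sum_x2 - sum_x ** 2   # N*sum(X^2) - (sum X)^2
ss_y = n * sum_y2 - sum_y ** 2   # N*sum(Y^2) - (sum Y)^2

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(ss_x * ss_y)
r_squared = r ** 2               # coefficient of determination

# Assumed population standard deviation of Y: sqrt(ss_y / N^2).
s_y = math.sqrt(ss_y) / n
s_yx = s_y * math.sqrt(1 - r_squared)

print(round(r_squared, 3), round(s_y, 3), round(s_yx, 3))
```

Because r² is the proportion of error eliminated, Syx is always at most Sy and shrinks toward zero as r approaches ±1.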

4. The correlation ratio

➢ Statistic: the correlation ratio, eta
➢ Symbol: ηyx
➢ Application: Eta is used to measure the nonlinear correlation between two interval level variables, and provides a PRE measure for an interval level dependent variable.
ηyx = √[SSb / SSt]

SSb = {∑ [(∑Y)² / nk]} – {(∑fY)² / N}

SSt = [∑fY²] – [(∑fY)² / N]

nk: the number of cases in category k of X; (∑Y)² / nk is computed within each category and then summed over categories

5. Eta-squared
➢ Statistic: Eta-squared
➢ Symbol: η²yx
➢ Application: Eta-squared is a PRE measure. It indicates the proportion of variance in Y that is explained by X, or the proportionate reduction of error in predicting Y given knowledge of X.

η²yx = SSb / SSt
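
The slides give no data set for eta, so the sketch below reuses the 20 cases from the Pearson table and treats each distinct X value as a category; that reuse is an assumption made only for illustration. SSb is the between-category sum of squares and SSt is taken as the total sum of squares of Y about its mean. Variable names are illustrative.

```python
import math
from collections import defaultdict

x = [12, 14, 11, 12, 12, 16, 13, 12, 16, 10, 12, 16, 8, 14, 12, 10, 16, 12, 14, 12]
y = [12, 13, 12, 12, 8, 16, 6, 10, 12, 12, 16, 14, 9, 16, 12, 6, 12, 10, 8, 12]

# Group the Y values by category of X.
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)

n = len(y)
correction = sum(y) ** 2 / n   # (sum of all Y)^2 / N

# SSb: sum over categories of (category sum)^2 / n_k, minus the correction term.
ss_between = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
# SSt: sum of squared Y values minus the correction term.
ss_total = sum(yi ** 2 for yi in y) - correction

eta_squared = ss_between / ss_total
eta = math.sqrt(eta_squared)
print(round(ss_between, 3), round(ss_total, 3), round(eta_squared, 3), round(eta, 3))
```

Eta-squared computed this way is never smaller than r² for the same data, because the category means can follow any pattern, not only a straight line.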
