Escolar Documentos
Profissional Documentos
Cultura Documentos
Association measures
3 criteria important
Frequency
Significance
Effect size
Collocations
dangerous + driving/substances/situations...
px , y
p x , y log
p x p y
how much knowing one var. reduces uncertainty about the other.
px , y
PMI =log
p x p y
observed
PMI =log
expected
PMI
p x , y
PMI2=log
p x p y
p x , y
PMI3=log
p x p y
Calculation
px , y
PMI =log
p x p y
f w 1 , w 2
N
PMI w 1 , w2 =log
f w 1 f w 2
N
N
w 1=dangerous
w 2=substance
N=67063111
50
67063111
PMI dangerous , substance =log
=log 447=2.65
28252657
2
67063111
Task
BNC-based
Human-validated
Collocate + substance
OCD
CORPUS
GOOD
34 total 82 total
18 (53%)
MI
MS
MI
MS
cutoff 10 cutoff 10
First 5
2 (40%)
5 (100%) 4 (80%)
5 (100%)
First 10
6 (60%)
9 (90%)
6 (60%)
First 15
9 (60%)
8 (80%)
Collocate + charm
OCD
CORPUS
GOOD
17 total 29 total
8 (47%)
MI
MS
MI
MS
cutoff 10 cutoff 10
First 5
2 (40%)
2 (40%)
First 10
4 (40%)
4 (40%)
First 15
8 (53%)
8 (53%)
Collocate + network
OCD
CORPUS
GOOD
27 (90%)
MI
MS
MI
MS
cutoff 10 cutoff 10
First 5
3 (60%)
First 10
7 (70%)
1 (10%)
First 15
1 (7%)
9 (60%)
3(20%)
Conclusions
Bigger corpus