KDD 2012
(slide)[http://shrdocs.com/presentations/9266/index.html]
Takuya Makino
Saturday, August 3, 13
Active Sampling for Entity Matching with Guarantees
Entity matching
Record pair: same entity == True/False
Labels obtained through active learning
Entity Matching
Imbalanced data [Arasu, 11]: after blocking, non-matches : matches = 100:1
Measured by precision and recall
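To make the point about imbalance concrete, here is a minimal sketch of why precision and recall are used instead of accuracy. The counts are hypothetical and only mirror the 100:1 ratio mentioned above; `precision_recall` is an illustrative helper, not something from the slides.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion counts; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical candidate set after blocking: 100 non-matches per match.
n_match, n_nonmatch = 100, 10_000

# A trivial classifier that predicts "non-match" for every pair.
tp, fp, fn, tn = 0, 0, n_match, n_nonmatch
accuracy = (tp + tn) / (n_match + n_nonmatch)
precision, recall = precision_recall(tp, fp, fn)

# Accuracy looks excellent even though no match is ever found.
print(f"accuracy={accuracy:.3f} precision={precision:.1f} recall={recall:.1f}")
```

The trivial classifier is about 99% accurate yet has zero recall, which is why the rest of the talk states the objective in terms of precision and recall.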
Guarantees on precision and recall
Sub-linear label complexity
Overview
CONVEXHULL Algorithm
Maximize recall under a precision constraint:
maximize RECALL(h),
subject to PRECISION(h) >= r,
=>
maximize -fn(h),
subject to tp(h) - α fp(h) >= 0,
where α = r/(1-r)
h is obtained from a black-box learner
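The rewriting of the precision constraint can be checked numerically. The sketch below assumes the coefficient r/(1-r) multiplies fp(h) and calls it `alpha` (a name chosen here for illustration); under that assumption tp/(tp+fp) >= r and tp - alpha*fp >= 0 agree whenever at least one pair is predicted positive.

```python
from fractions import Fraction


def precision_ok(tp, fp, r):
    """Original constraint: precision = tp / (tp + fp) >= r."""
    return Fraction(tp, tp + fp) >= r


def linear_ok(tp, fp, r):
    """Linearized constraint: tp - alpha * fp >= 0 with alpha = r / (1 - r)."""
    alpha = r / (1 - r)
    return tp - alpha * fp >= 0


r = Fraction(9, 10)  # hypothetical precision threshold
mismatches = 0
for tp in range(30):
    for fp in range(30):
        if tp + fp == 0:
            continue  # precision is undefined when nothing is predicted positive
        if precision_ok(tp, fp, r) != linear_ok(tp, fp, r):
            mismatches += 1

print("mismatches:", mismatches)
```

Exact rational arithmetic is used so that boundary cases such as tp = 9, fp = 1 are not disturbed by floating-point rounding.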
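Goal: among the hypotheses h in H, find the one with the highest recall subject to Y(h) >= 0
P = {(X(h), Y(h)) : h ∈ H}, with X(h) = -fn(h) and Y(h) = tp(h) - α fp(h), α = r/(1-r)
Search H along the convex hull of P: for a multiplier λ, maximizing X(h) + λ Y(h) gives the hull point whose tangent has slope -1/λ
A binary search finds an h with Y(h) >= 0 in O(log n) steps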
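[Figure: hypotheses h plotted as points (X(h), Y(h)), with X(h) on the horizontal axis and Y(h) on the vertical axis. Points with Y(h) >= 0 satisfy the precision constraint; the target is the h with the highest recall among them.]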
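Binary search with the black box B:
h = B(mid),
if Y(h) >= 0, max = mid; otherwise min = mid
[Figure: the (X(h), Y(h)) plot with the search bounds min, mid and max marked; the region Y(h) >= 0 satisfies the precision constraint.]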
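The search step can be sketched in a few lines. Everything below is an illustrative assumption rather than the authors' code: the black box is replaced by a toy `B(lam)` that picks, from a fixed list of (X, Y) points, the one maximizing X + lam * Y; the bounds `lo` and `hi` play the role of min and max on the slides; and a fixed number of halvings stands in for the O(log n) stopping rule.

```python
def make_black_box(points):
    """Toy stand-in for B: return the (X, Y) point maximizing X + lam * Y."""
    def B(lam):
        return max(points, key=lambda p: p[0] + lam * p[1])
    return B


def search(B, lo=0.0, hi=100.0, steps=50):
    """Binary search for the smallest multiplier whose hypothesis is feasible."""
    best = None
    for _ in range(steps):
        mid = (lo + hi) / 2
        h = B(mid)
        if h[1] >= 0:      # Y(h) >= 0: constraint holds, try a smaller multiplier
            best, hi = h, mid
        else:              # constraint violated: a larger multiplier is needed
            lo = mid
    return best


# Hypothetical hypotheses as (X(h), Y(h)) pairs; larger X means higher recall.
points = [(-1, -6), (-3, -1), (-5, 2), (-9, 5)]
result = search(make_black_box(points))
print(result)
```

In this toy set the search settles on (-5, 2), the feasible point with the largest X. A search of this kind can only return points on the convex hull of the set, which is why the slides describe the method in terms of the hull.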
Weights η and (1 - η) on the two error types
Weighted 0-1 loss, minimized by the black box B
http://www.machinedlearnings.com/2012/01/cost-sensitive-binary-classification.html
Reference: http://www.slideshare.net/pfi/20120105-pfi
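One way to see why a weighted 0-1 loss appears: with X(h) = -fn(h), Y(h) = tp(h) - alpha*fp(h) and a fixed number of true matches (so tp = P - fn), the objective X + lam*Y equals a constant minus (1 + lam + lam*alpha) times eta*fn + (1 - eta)*fp, where eta = (1 + lam) / (1 + lam + lam*alpha). The symbols `lam`, `alpha` and `eta`, and this closed form for the weight, are derived here under those assumptions and are not quoted from the slides. The sketch checks the equivalence on random hypothetical confusion counts.

```python
from fractions import Fraction
import random


def lagrangian(h, lam, alpha):
    """X(h) + lam * Y(h) with X = -fn and Y = tp - alpha * fp."""
    tp, fp, fn = h
    return -fn + lam * (tp - alpha * fp)


def weighted_loss(h, eta):
    """Weighted 0-1 loss: eta per false negative, (1 - eta) per false positive."""
    tp, fp, fn = h
    return eta * fn + (1 - eta) * fp


def eta_from(lam, alpha):
    """Weight for which minimizing the loss maximizes the Lagrangian."""
    return (1 + lam) / (1 + lam + lam * alpha)


random.seed(0)
P = 50  # hypothetical number of true matches, shared by all hypotheses
hyps = []
for _ in range(200):
    tp = random.randint(0, P)
    hyps.append((tp, random.randint(0, 500), P - tp))  # (tp, fp, fn)

lam, alpha = Fraction(3, 2), Fraction(9)  # alpha = r / (1 - r) for r = 0.9
eta = eta_from(lam, alpha)

best_value = max(lagrangian(h, lam, alpha) for h in hyps)
min_loss = min(weighted_loss(h, eta) for h in hyps)
maximizers = {h for h in hyps if lagrangian(h, lam, alpha) == best_value}
minimizers = {h for h in hyps if weighted_loss(h, eta) == min_loss}
print(eta, maximizers == minimizers)
```

Because the two objectives differ only by a positive scale and a constant, any learner that minimizes the weighted loss can serve as the black box for a given multiplier.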
REJECTION SAMPLING
Example weights: η and 1 - η
Weighted 0-1 loss: (η fn(h) + (1 - η) fp(h))/n
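A common way to turn a weighted 0-1 loss into an ordinary one is cost-proportionate rejection sampling: keep each example with probability proportional to its weight, then train an unweighted learner on what is kept. The sketch below illustrates that general technique under the assumption that true matches carry weight `eta` and non-matches carry weight `1 - eta`; it assumes labels are already available and does not claim to reproduce how the slides combine sampling with active label requests.

```python
import random


def rejection_sample(examples, eta, rng):
    """Keep each (x, y) with probability weight / max_weight.

    Assumed weights: eta for matches (y is True), 1 - eta for non-matches.
    """
    z = max(eta, 1 - eta)  # normalizer so acceptance probabilities are <= 1
    kept = []
    for x, y in examples:
        weight = eta if y else 1 - eta
        if rng.random() < weight / z:
            kept.append((x, y))
    return kept


rng = random.Random(0)
# Hypothetical labeled pool: 10,000 matches and 10,000 non-matches.
examples = [(i, True) for i in range(10_000)] + [(i, False) for i in range(10_000)]
kept = rejection_sample(examples, eta=0.2, rng=rng)

n_pos = sum(1 for _, y in kept if y)
n_neg = sum(1 for _, y in kept if not y)
print(n_pos, n_neg)
```

With eta = 0.2 every non-match is kept and roughly a quarter of the matches survive, so an unweighted error count on the kept sample reflects the 0.2 : 0.8 weighting in expectation.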
Label Complexity
Label complexity of B × max(η/(1 - η), (1 - η)/η), where η is the weight in the weighted 0-1 loss
O(log n)
Conclusion
Entity matching with guarantees on precision and recall
Uses an active learning algorithm as a black box
Analysis of label & computational complexity
Outperforms the state of the art on recall