
[Figure: workflow]
  Step 1A: Selecting keywords
  Step 1B: Selecting media
  Step 1C: Classification
  Step 2A: Collecting full content of news stories
  Step 2B: Collecting user-generated comments
[Figure: article sets]
  U: All irrelevant articles (n = 5,278,696)
  M: Articles with keywords in full text (n = 3,027)
  R: Articles with keywords in titles (n = 2,271)
  U*: a subset of U; n = |R|
  U**: another subset of U; n = |R|
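The figure defines U* and U** as subsets of U whose size equals |R|. A minimal sketch of drawing such size-matched subsets follows. It assumes simple random sampling without replacement, which the slides do not state; the function name `draw_matched_subset` and the toy identifiers are hypothetical.

```python
import random


def draw_matched_subset(irrelevant_ids, n_relevant, seed=0):
    """Draw a random subset of U with the same size as R.

    Assumption: simple random sampling without replacement.
    The slides only say that U* is "a subset of U; n = |R|".
    """
    if n_relevant > len(irrelevant_ids):
        raise ValueError("R is larger than U; cannot draw a matched subset")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return rng.sample(irrelevant_ids, n_relevant)


# Toy stand-ins for the real sets (the slides report |R| = 2,271
# and |U| = 5,278,696).
relevant = [f"r{i}" for i in range(5)]
irrelevant = [f"u{i}" for i in range(100)]

u_star = draw_matched_subset(irrelevant, len(relevant), seed=1)
u_star_star = draw_matched_subset(irrelevant, len(relevant), seed=2)
```

Using a different seed for each draw is one way to obtain "another subset of U" of the same size.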
[Date labels: 2014-04, 2015-09]
[Figure: article sets]
  U: All articles
  U*: a subset of U; n = |R|
  M: Articles with keywords in full text
  R: Articles with keywords in titles
[Figure: test set construction, three panels]
  Panels 1 and 2:
    U: No keyword (N = 5.3M)
    R: Keyword in titles
    U*: A sample of U
    Test set
  Panel 3:
    U: No keyword (N = 5.3M)
    R: Keyword in titles
    R*: a random sample of R
    U*: a random sample of U
    Test set
[Figure: four estimates with intervals]
  .991 [.982, 1.000]
  .909 [.889, .930]
  .901 [.877, .925]
  .840 [.806, .875]
[Figure: number of times classified as relevant]
  Always classified as relevant: 57.0%
  Always classified as irrelevant: 26.8%
  Mixed: 16.2%
  Annotation: Optimal cutoff point?
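The figure groups articles by how many times they were classified as relevant and asks where to place the cutoff. The sketch below shows one way to compute that breakdown and apply a cutoff. It assumes each article receives one vote per classification run; the number of runs, the cutoff value, and the function names are hypothetical and do not come from the slides.

```python
from collections import Counter


def summarize_votes(votes, n_runs):
    """Return the share of articles that are always relevant,
    always irrelevant, or mixed across n_runs classifications.

    votes[i] is the number of runs that labeled article i relevant.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    counts = Counter()
    for v in votes:
        if v == n_runs:
            counts["always_relevant"] += 1
        elif v == 0:
            counts["always_irrelevant"] += 1
        else:
            counts["mixed"] += 1
    keys = ("always_relevant", "always_irrelevant", "mixed")
    return {k: counts[k] / len(votes) for k in keys}


def label_by_cutoff(votes, cutoff):
    """Label an article relevant when its vote count reaches the cutoff."""
    return [v >= cutoff for v in votes]


# Toy data: 8 articles, 10 hypothetical classification runs each.
votes = [10, 10, 0, 7, 3, 10, 0, 10]
shares = summarize_votes(votes, n_runs=10)
labels = label_by_cutoff(votes, cutoff=5)
```

Only the mixed group is affected by the choice of cutoff; articles with unanimous votes receive the same label at any cutoff between 1 and the number of runs.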


[Figure: workflow with evaluation]
  Step 1A: Selecting keywords
  Step 1B: Selecting media
  Step 1C: Classification
  Evaluation of performance by human coders
  Step 2A: Collecting full content of news stories
  Step 2B: Collecting user-generated comments
  Step 3: Pre-processing and
