approaches that look at individual words. We look at word counts, word clouds, and some
basic clustering methods that explore connections between words.
Kenneth Benoit and Alexander Herzog. 2015. Text Analysis: Estimating Policy Preferences
From Written and Spoken Words. In Analytics, Policy and Governance, eds. Jennifer
Bachner, Kathryn Wagner Hill, and Benjamin Ginsberg.
Daniel Riffe, Stephen Lacy and Frederick Fico. 2014. Analyzing Media Messages: Chapters 6
(Reliability) and 7 (Validity).
Lori Young and Stuart Soroka. 2012. Affective News: The Automated Coding of Sentiment
in Political Texts, Political Communication 29: 205-231.
Cornelius Puschmann and Tatjana Scheffler. 2016. Topic Modelling for Media and
Communication Research: A Short Primer. HIIG Discussion Paper Series, no. 2016-05.
Content Analysis in the Social Sciences
Stuart Soroka, University of Michigan
desktop, which in the end should look like this:
Session 1
1.1 Foundations
Bernard Berelson. 1952. Content Analysis in Communication Research. Glencoe: Free Press.
a text.
Krippendorff takes content to emerge in the process
of a researcher analyzing a text relative to a particular
context.
2010)
I am not convinced that the manifest-latent distinction is useful, but the different approaches to content matter fundamentally to the kind of analyses we choose, manual or automated.
No Categories
Session 1. Introduction, & Building a Corpus
(Just looking at words)
Source: Grimmer and Stewart. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.
1.3 Typologies of content analysis
Source: Schwartz and Unger. 2015. Data-Driven Content Analysis of Social Media: A Systematic Overview of Automated Methods.
1.4 Principles of Content Analysis
Source: Grimmer and Stewart. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.
1.5 Finding a Corpus
Source: Benoit and Herzog. 2015. Text Analysis: Estimating Policy Preferences From Written and Spoken Words.
1.6 Spreadsheets & Software
Source: Riffe, Lacy and Fico. 2014. Analyzing Media Messages, 3rd ed.
1.6 Spreadsheets & Software
To make sure we lump together related terms:
paul ryan -> paulryan
mike pence -> mikepence
miss universe -> missuniverse
miss piggy -> missuniverse
russians -> russia
To make sure we lump together singular and plural forms:
immigrant -> immigration
immigrants -> immigration
emails -> email
[Figure: word cloud]
Source: electiondynamics.org.
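The term replacements on this slide (for example "paul ryan -> paulryan" and "emails -> email") can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used for the slide: the function name `lump_terms`, the two rule tables, and the simple regex tokenizer are assumptions made for the example.

```python
import re

# Replacement rules copied from the slide. Multi-word phrases are merged
# first, so that "paul ryan" becomes one token before the text is split.
PHRASES = {
    "paul ryan": "paulryan",
    "mike pence": "mikepence",
    "miss universe": "missuniverse",
    "miss piggy": "missuniverse",
}
WORDS = {
    "russians": "russia",
    "immigrant": "immigration",
    "immigrants": "immigration",
    "emails": "email",
}

def lump_terms(text):
    """Lowercase the text, merge related phrases, then map word variants."""
    text = text.lower()
    for phrase, token in PHRASES.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", token, text)
    # Assumed tokenizer: runs of letters only (punctuation and digits dropped).
    tokens = re.findall(r"[a-z]+", text)
    return [WORDS.get(tok, tok) for tok in tokens]

example = "Paul Ryan talked about immigrants and the emails"
print(lump_terms(example))
```

Applying the phrase rules before tokenizing is the key design choice here: once the text is split into single words, "paul" and "ryan" can no longer be recognized as one name.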
1.7 Preparing your Corpus
Source: Benoit and Herzog. 2015. Text Analysis: Estimating Policy Preferences From Written and Spoken Words.
Session 2
Chicago: University of Chicago Press.
2.1 Taking words seriously
Which would code these variables more reliably, humans or computers?
Session 2. Word-Based Approaches
Source: Maxwell McCombs and Donald Shaw, The Agenda-Setting Function of Mass Media, Chapter 2.7 in Krippendorff and Bock
2.2 Human vs computers
Which would code the readability/complexity of text more reliably, humans or computers?
Source: Klaus Krippendorff, Inferring the Readability of Text, Chapter 3.9 in Krippendorff and Bock
2.3 Pre-whitening for word-based analyses
[Figure: matrix of documents and terms]
2.4 Word mentions
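Word counts of this kind are usually stored as a matrix with one dimension for documents and one for terms. The following is a minimal sketch, assuming documents are rows and terms are columns and using a simple regex tokenizer; the function names and the toy documents are hypothetical.

```python
from collections import Counter
import re

def tokenize(text):
    """Assumed tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def document_term_matrix(docs):
    """Return (terms, matrix), where matrix[d][t] counts term t in document d."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[term] for term in terms] for c in counts]
    return terms, matrix

# Toy documents, invented for illustration.
docs = ["email email scandal", "debate email"]
terms, matrix = document_term_matrix(docs)
print(terms)
print(matrix)

# Total mentions of each term across the corpus (column sums).
mentions = {term: sum(row[i] for row in matrix) for i, term in enumerate(terms)}
print(mentions)
```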
Source: Mary Angela Bock, Impressionistic Content Analysis: Word Counting in Popular Media, Chapter 1.7 in Krippendorff and Bock
2.5 Word Clouds
Source: Schwartz and Unger. 2015. Data-Driven Content Analysis of Social Media: A Systematic Overview of Automated Methods.
2.5 Word Clouds
Source: Benoit and Herzog. 2015. Text Analysis: Estimating Policy Preferences From Written and Spoken Words.
2.5 Word Clouds
Word cloud of differentiating words from open-ended responses to a survey question about what you have heard about Clinton or Trump.
[Figure: word cloud, with terms grouped under clinton and trump]
Source: electiondynamics.org.
2.5 Word Clouds
$\max_i (p_{i,j} - p_j)$,
and its angular position is determined by the
document where that maximum occurs.
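The sizing rule above can be sketched as follows. The definitions of the two quantities are not on the slide, so this example assumes that the doubly subscripted p is the rate at which a word occurs in one document and that the singly subscripted p is that word's average rate across documents. The function name and the toy token lists are hypothetical.

```python
from collections import Counter

def comparison_sizes(docs_tokens):
    """Map each word to (maximum deviation, document where it occurs).

    docs_tokens: dict of document name -> non-empty list of tokens.
    """
    # Rate of each word within each document.
    rates = {}
    for name, tokens in docs_tokens.items():
        counts = Counter(tokens)
        rates[name] = {w: c / len(tokens) for w, c in counts.items()}

    vocab = set().union(*rates.values())
    result = {}
    for word in vocab:
        per_doc = {name: rates[name].get(word, 0.0) for name in rates}
        average = sum(per_doc.values()) / len(per_doc)
        # Document with the highest rate, hence the largest deviation.
        best = max(per_doc, key=per_doc.get)
        result[word] = (per_doc[best] - average, best)
    return result

# Toy data, invented for illustration.
toy = {
    "clinton": ["email", "email", "scandal", "debate"],
    "trump": ["wall", "debate", "wall", "tape"],
}
for word, (size, doc) in sorted(comparison_sizes(toy).items()):
    print(word, round(size, 3), doc)
```

Words used at the same rate everywhere get a deviation of zero, which is why a cloud built this way shows only words that differentiate the documents.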
[Figure: word clouds by time period, from Aug 21-27 through Nov 6-7; recurring terms include email, foundation, liar, scandal, health, fbi, debate, campaign, wikileaks and investigation]
Source: electiondynamics.org.
2.6 Co-occurrences
you.
2.6 Co-occurrences
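Co-occurrence analysis starts from counts of how often two words appear together. A minimal sketch, assuming co-occurrence is measured at the document level (the slides do not specify the unit or window); the function name and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrences(docs_tokens):
    """Count, for each pair of words, the documents containing both."""
    pairs = Counter()
    for tokens in docs_tokens:
        # Each distinct word counts once per document; pairs are stored
        # in alphabetical order so (a, b) and (b, a) are the same key.
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs

# Toy tokenized documents, invented for illustration.
toy_docs = [
    ["email", "fbi", "scandal"],
    ["email", "fbi"],
    ["debate", "email"],
]
for pair, n in cooccurrences(toy_docs).most_common():
    print(pair, n)
```

The resulting pair counts are the usual input to clustering or network plots of related words.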
Session 3. Dictionary-Based Approaches
Source: James Pennebaker and Cindy Chung, Computerized Text Analysis of Al Qaeda Transcripts, Chapter 7.7 in Krippendorff et al.
3.1 Dictionary-Based Coding & Sentiment Analysis
Source: Megan Bayagich, Laura Cohen, Lauren Farfel, Andrew Krowitz, Emily Kuchman, Sarah Lindenberg, Natalie Sochacki, and Hannah Suh, Exploring the Tone of the 2016 Campaign, CPS Blog
3.1 Dictionary-Based Coding & Sentiment Analysis
Analysis
Some standard sources for automated
dictionaries include:
General Inquirer
Linguistic Inquiry and Word Count (LIWC)
Diction
Some Lexicoder-built dictionaries (adaptable to
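Whatever the source, dictionary-based coding comes down to counting how many words in a text fall into each dictionary category. A minimal sketch, assuming a tiny made-up dictionary (not taken from General Inquirer, LIWC, Diction or Lexicoder) and net tone defined as positive minus negative words over total words, which is one common choice rather than the only one.

```python
import re

# Hypothetical toy dictionary for illustration only; real dictionaries
# contain thousands of entries and should be read before use.
TOY_DICTIONARY = {
    "positive": {"good", "great", "honest"},
    "negative": {"bad", "corrupt", "dishonest", "liar"},
}

def score(text, dictionary=TOY_DICTIONARY):
    """Return (category counts, net tone) for one text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {
        category: sum(tok in words for tok in tokens)
        for category, words in dictionary.items()
    }
    n = len(tokens)
    # Net tone: (positive - negative) / total words; 0.0 for empty texts.
    net = counts.get("positive", 0) - counts.get("negative", 0)
    tone = net / n if n else 0.0
    return counts, tone

counts, tone = score("A great and honest leader, not a liar")
print(counts, tone)
```

Note that simple counting treats "not a liar" as a negative mention, which is one reason to check how dictionary words are actually used in the corpus.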
Analysis
How do you make a decision about which
dictionaries to use?
Always read the dictionary, and if you can't,
don't use the dictionary.
Always check the use of words you are unsure
of, ideally using the corpus you want to analyze.
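One way to check how a word is used in your own corpus is a keyword-in-context listing, which shows each occurrence with a few words on either side. A minimal sketch; the function name `kwic`, the default window size and the example sentence are assumptions made for illustration.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Invented example: "great" is positive in the first use, not in the second.
sample = "The plan was great. It was a great deal of trouble."
for left, word, right in kwic(sample, "great", window=2):
    print(f"{left:>12} [{word}] {right}")
```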
Neuendorf's comparison of reliability, accuracy and precision
3.3 The classic approach to reliability
Intercoder reliability
Option 1: % agreement
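Percent agreement is the share of units that two coders assign to the same category. A minimal sketch, assuming both coders coded the same units in the same order; the function name and the toy codes are hypothetical.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage of units on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must code the same units")
    if not coder_a:
        raise ValueError("No units to compare")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)

# Toy codes for four units, invented for illustration.
coder_1 = ["pos", "neg", "neg", "pos"]
coder_2 = ["pos", "neg", "pos", "pos"]
print(percent_agreement(coder_1, coder_2))
```

This measure does not correct for agreement that would occur by chance, which is a common reason to report chance-corrected statistics alongside it.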
Stuart Soroka. 2014. Reliability and Validity in Automated Content Analysis, in Communication and
Language Analysis in the Corporate World, Roderick P. Hart, ed., Hershey PA: IGI Global.
3.4 New concerns about reliability
validity
Drawn from Riffe, Lacy and Fico's Types of content analysis validity
Comparing automation and human coding
Lori Young and Stuart Soroka. 2012. "Affective News: The Automated Coding of
Sentiment in Political Texts", Political Communication 29: 205-231.
3.6 New concerns about measurement validity
Pairwise correlations, automated dictionaries
Lori Young and Stuart Soroka. 2012. "Affective News: The Automated Coding of
Sentiment in Political Texts", Political Communication 29: 205-231.
3.6 New concerns about measurement validity
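Pairwise correlations between automated dictionaries can be computed once each dictionary has scored the same set of texts. A minimal sketch using Pearson correlation; the dictionary names and scores below are invented placeholders, not results from Young and Soroka (2012), and each score series is assumed to vary (a constant series would give a zero denominator).

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def pairwise_correlations(scores):
    """Correlate every pair of dictionaries; scores maps name -> list."""
    return {
        (a, b): pearson(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }

# Hypothetical tone scores from three dictionaries for the same four texts.
toy_scores = {
    "dict_a": [1.0, 2.0, 3.0, 4.0],
    "dict_b": [2.0, 4.0, 6.0, 8.0],
    "dict_c": [4.0, 3.0, 2.0, 1.0],
}
for pair, r in pairwise_correlations(toy_scores).items():
    print(pair, round(r, 3))
```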
Comparing results across dictionaries
Lori Young and Stuart Soroka. 2012. "Affective News: The Automated Coding of
Sentiment in Political Texts", Political Communication 29: 205-231.
3.6 New concerns about measurement validity
Automated dictionaries and manual coding
Lori Young and Stuart Soroka. 2012. "Affective News: The Automated Coding of
Sentiment in Political Texts", Political Communication 29: 205-231.
3.6 New concerns about measurement validity
validity
Media tone and vote shares in the 2006 Canadian election
Lori Young and Stuart Soroka. 2012. "Affective News: The Automated Coding of
Sentiment in Political Texts", Political Communication 29: 205-231.
Session 4
Supervised
Session 4. (Un)Supervised Learning
Source: Grimmer and Stewart. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.
4.2 Examples, Background