Blogs and The Social Weather


Alexander Halavais
University at Buffalo / SUNY
School of Informatics
http://alex.halavais.net
Internet Research 3.0: Net / Work / Theory
Maastricht, October, 2002
The Web allows us all, at least those with access, to be publishers. Until recently,
that idealized opportunity was encumbered by a number of practical difficulties. Chief
among these was that while creating a hypertext document and placing it on a server was
not the most difficult task, it remained just beyond the reach of most users. The creation
of systems that allowed those without training to easily create and maintain a personal
journal on-line changed that. Web logs, or simply blogs, existed for several years
before such systems were available, but the simplified publishing systems allowed for a
much broader group to engage in blogging.
There are two ways to think of blogs. First, they may be considered individually,
as a way for people to communicate their ideas to a larger audience. While the last few
centuries have seen the introduction of a range of technologies, perhaps none lowered the
entry barriers to broadcasting individual thoughts and creative efforts to the degree to
which the Web and blogging systems have. The second way is to think about blogs from
a collective perspective, a perspective that is only recently becoming more popular.
In several recent essays in a blog dedicated to the topic, John Hiler has examined
the impact of blogs on the larger media environment, and more specifically their
relationship with more traditional forms of journalism. He suggests that blog journalism
has much in common with the collective mind of Star Trek's Borg (2002a). Soon after,
he rejects this metaphor for another, ecological view of the "blogosphere" (2002b). Both
the term and the concept of a wider sphere of blogs have come very much to the fore this
year. No longer is the focus on the several individual blogs that have attained a wide
readership, but on the effect of blogs at a macro-scale.
The problem with considering blogs at a larger scale is that many of the
connections between blogs are informal; that is, we must assume that readers are being
influenced by a network of writers and this is contributing to their own work. While there
do exist other indications of relationships, there exist very few reference points from
which to easily apprehend the nature of recent discussions on blogs. After a brief
overview of possible ways of indexing and summarizing blogs, one approach to such
summarization is described herein.
Large-Scale Discussions and Cyberdemocracy
Each new communications medium has been accompanied by claims that it will
enhance or inhibit freedom. More recently, it has been suggested that differences in the
type of technology are significant; in Ithiel de Sola Pool's words, "Freedom is fostered
when the means of communication are dispersed, decentralized, and easily available, as
are printing presses or microcomputers. Central control is more likely when the means of
communication are concentrated, monopolized, and scarce" (1983, p. 5). From early in
the evolution of computer networks, they appeared to engender the sort of massively
distributed and decentralized communication conducive to democratic deliberation.
This promise has gone largely unfulfilled. In particular, we have not seen the sort
of "anticipatory democracy" Alvin Toffler predicted would increasingly involve the
citizen in policy creation (1970, 1978). This is not due to a lack of active discussion
supported by computer networks. While these on-line discussions are not
unproblematic (and work has been done that demonstrates the degree to which such
discussion can vary from what we might consider the rational ideal at the root of
participatory democracy), many such discussions can and do lead to informed consensus
over policy issues. The question is how to link this consensus to those who actively shape
policy (Dahlgren, 2000).
Many see the Internet as a technology encouraging discussion and integration at
the community level, and hope that "virtual communities" can serve to enhance rather
than detract from traditional, geographically-bound local communities (e.g., Agre &
Schuler, 1997; Doheny-Farina, 1998). James Fishkin (1995, p. 24) notes that democracy
as it is practiced today in America often appears more like the Spartan tradition of
measuring the level of applause to elect a representative (during an event called the
"Shout") than the deliberative model found in Athens. The quantity of response is held to
be more important than the content. Electronic media have until this point enhanced rather
than reversed this trend. Legislators and policy-makers are sometimes aware of the
volume of the telephone calls, emails, and faxes they receive, but all detail is lost in the
"shout." From the perspective of policy-makers, public deliberation is generally limited
to what is visible in the national daily newspapers. In those cases where public opinion is
in conflict with the general opinions represented in the mainstream news (as in the
nomination of Zoë Baird, for example), the only indication is often this sort of direct
logging of telephone calls, faxes, and mail (Page, 1996, ch. 4).
The challenge is to find ways of representing discussion and deliberation, so that the
public discourse finds its way to appropriate representatives. Some have placed the onus
of this communication on the citizen, and recommended ways in which activists can
make use of new communication technologies to make their voice heard (e.g., Kush,
2000; Browning & Weitzer, 1996). Thus far, computer networking has, on the whole,
reinforced existing political processes, rather than causing radical change (Davis, 1999).
The history of using computer networks as catalysts for policy discussions
reaches into the 1970s (Hiltz & Turoff, 1993). Such attempts examine the formal
structure of the software and rules for discussion in the hope that an idealized public
sphere of discussion can be obtained. The Online Deliberative Discourse Project at
Harvard's Berkman Center for Internet and Society [1] and attempts by the Summit
Education Initiative [2], for example, are exploring ways in which computer networking can
be used to establish temporary public fora to discuss policy issues. Such projects are
promising, but they ignore the discussion that already exists informally across the
Internet on email lists and discussion boards, blogs, and homepages.
[1] http://cyber.law.harvard.edu/projects/deliberation/
[2] http://www.seisummit.org/od-overview.htm
It is not difficult to find examples of political discussion on-line, in venues not
necessarily designed for such discussion. Group blogs, like www.slashdot.org,
www.kuro5hin.net, www.plastic.com, and www.fark.com regularly become the locus of
political debate. These opinions, falling across the political spectrum, may not be as
erudite or as measured as those found in the editorials of the major daily newspapers, but
they do represent a significant variety of views, views that may not be accurately
reflected in uni-dimensional opinion polls.
While the other forms of online discussion remain important, blogs continue to
grow in popularity and importance. Among the newest authors of these blogs are those
who write in support of the "War on Terrorism." These "war blogs" represent an
important indicator of how regular Americans have reacted to global events. While there
are general indications of the tenor of many of these blogs (the New York Times
characterized the content as addressing a wide range of news and political topics,
usually from right of center; Gallagher, 2002), the information from these informal
sources is collected equally informally, and their possible use as an indicator of public
opinion is largely ignored.
Significant value is placed on methods of gathering public opinion that result in
quantitative measures (Kweit & Kweit, 1987; Wadsworth, 1997). While we cannot
accurately infer the proportion of Americans that hold a particular position from the
relatively small percentage of citizens who write their opinions on-line, we can provide
the qualitative content that remains outside the capability of most surveys or opinion
polls. Moreover, the approach taken in this research holds the promise of providing
unbiased quantification of such opinions, without losing the deliberative content of the
discussion, or forcing a set of pre-existing categories on the discussion.
Content analysis of the mass media has advanced relatively slowly since its
inception in the middle part of the last century. The ultimate aim was to systematize the
process in such a way that the global production of news and opinion could be used to
predict possible conflicts and map resources; content analysis would become akin to
predicting the weather. Unfortunately, while the technology for analysis has gradually
improved over the decades, it has not been until recently that content was easily available
in digestible form. Perhaps as a result of the explosion of content available in electronic
format, tools available for analysis of large textual databases have increased in number
exponentially over the last decade (see Nissan, 1994; Popping, 2000; Pennings & Keman,
2002). While computerized content analysis of news resources has been used to support
policy analysis (e.g., Bengston & Fan, 2001), such analyses have not been extended to
the Internet at large.
Summarizing Public Discussion on Blogs
The Google search engine provides three approaches to organizing the content of
the Web that may serve as effective models for organizing and summarizing the content
of blogs. First, Google ranks pages in its index according to an algorithm, PageRank, that
determines the "reputation" of each page according to the number and reputation of pages
linking to it (see Page et al., 1998). This approach leverages the collective ability of
webmasters to filter their own hyperlinks, and assumes that the majority of those links are
an indication of quality. The resulting ranking system has proven effective, if judged only
by the popularity of the search engine.
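The link-based ranking just described can be sketched roughly as follows. This is a minimal illustration of the idea in Page et al. (1998), not Google's production system: the function name, the damping value of 0.85, the fixed number of iterations, and the way pages without outgoing links are handled are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate a reputation score for each page.

    links maps a page name to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its reputation on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

In a toy graph where two pages link to a third, the third page ends up with the highest score, and a page it links to inherits part of that reputation.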
Second, a new feature on the Google site trolls the news sites on the web and
generates a summary. Beyond this basic function it also automatically groups stories with
similar content from different sources. Other systems, like NewsSeer, perform a filtering
function by tracking a user's preferred stories and proffering similar ones on subsequent
visits. While PageRank filters sites by popularity, Google News and similar systems are
biased in favor of those sites most rapidly or recently updated.
Finally, Google's "Zeitgeist" report is generated on a weekly basis. Assuming
that the terms searched for on the popular search engine are in some way indicative of the
collective thoughts and aspirations of its millions of users, Google assembles weekly lists
of the largest gaining and losing search terms. It provides both overall lists of these
changes, as well as more topically specialized lists (see fig. 1). Alexa provides a similar
metric for websites, suggesting in its "Movers and Shakers" listings which sites have
had the greatest increases in traffic. Companies offering "corporate intelligence" or "web
intelligence" offer similar trend-watching to their clients. This study uses such an
approach to track and summarize the topics of conversation on a sample of blogs,
emphasizing differences and anomalies, either in topic or vocabulary.
The first of these approaches, the use of automated hyperlink analysis to
determine pages with the highest reputation, has already been applied to the blogging
world. The MIT Media Lab's Blogdex indexes nearly 10,000 blogs and analyzes the links
among them (Kahney, 2001). In fact, it includes an element of the "zeitgeist" approach as
well, by ranking sites according to changes in linkages rather than simply linkages.
The second approach finds a parallel in "meta blogs" and announcement sites.
The original role of blogs was to provide a kind of filtering of the web: a cross between a
site providing bookmarks, new site announcements, and reviews. Many successful
collective blogs remain more or less true to this vision, filtering either web sites or
mainstream news. Some of the most popular group blogs, like Slashdot and Metafilter,
perform this function. Unlike Google's site, however, these filters tend not to detect
general trends. Others do a better job of summarizing blog content, and as processes for
syndicating content become more widespread, sites that summarize content will likely
gain in popularity. Google News benefits from a set of news sources that tend to be fairly
uniform in terms of topics and approaches, something that certainly cannot be said of the
blogosphere. As with Google News, there are also sites that focus on the most recently
updated blogs, including Weblogs.com and Daypop.com.
The final approach, inspired by Google's Zeitgeist, is the one followed here.
Rather than recording changes in the frequency of search keywords, changes in the
frequency of words in the blog text were recorded from week to week, in an attempt to
gauge what could be considered the "hot topics." This measurement of topicality among
blogs can then be considered in relation to a similar measure of news sites to determine
the ways in which they differ.
Finally, once the broad outlines of blog discourse are discovered, questions related to
those topics can be examined and illustrated with specific key phrases from the original
work.
Tracking Word Differences on Personal Blogs

As there is no single source that indexes the blogosphere, assembling a
representative sample is difficult. It is difficult even to estimate the total number of blogs
in use. For the purposes of the work here, a "personal blog" was defined as any web page
that includes dated entries (with at least one for each 10-11 day period collected) and that
appears to have one predominant author. The sample was also limited to English-language
blogs. The approach taken here is a compromise, but one that seemed
reasonable, given the constraints. A total of 125 blogs were assembled, via either
"random blog" links or random sampling, from five directories: weblogs.com, bloghop.com,
blogger.com, portal.eatonweb.com, and livejournal.com. It is worth noting that while
there is significant overlap among these listings, there are also significant differences.
Blogs hosted by LiveJournal tend to be personal in nature, and rarely discuss news
events, while those listed on weblogs.com must alert the website when they have
changed, leading to a set of blogs that are more frequently updated. Nonetheless, they
represent a part of the collective opinion.
Initially, it was planned to compare these with changes in items covered in the
mainstream news. On examination of the results, however, it was immediately clear
that, with a few idiosyncratic exceptions, the gaining items in the blogs were tied
to major news stories.
Text from each of these blogs was gathered six times: once every ten days
beginning on 1 July, using a web crawler written specifically for this task, and manually
harvesting text from those blogs with a unique format. Text was taken only from the first
page of the appropriate archive for each blog (that is, jumps to "more" pages were
ignored), so the harvested text may not match each collection period exactly. Some
modifications might help to alleviate this, perhaps by parsing the listed
dates on the page, but for the purposes of the initial study it is assumed that these effects
are minimal. The archived text was divided into three 10- or 11-day segments for each
month.
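The division of each month into three segments can be illustrated with a short sketch. The exact boundaries are not stated in the text; the cut-offs used below (days 1-10, 11-20, and 21 to month end) are an assumption, chosen because they give 10- or 11-day segments for July and August. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def period_of(entry_date):
    """Return (year, month, part), where part is 1, 2 or 3.

    Assumed boundaries: days 1-10, 11-20, and 21 to the end of the month.
    """
    if entry_date.day <= 10:
        part = 1
    elif entry_date.day <= 20:
        part = 2
    else:
        part = 3
    return (entry_date.year, entry_date.month, part)


def group_by_period(entries):
    """Group (date, text) pairs by the segment their date falls in."""
    grouped = defaultdict(list)
    for entry_date, text in entries:
        grouped[period_of(entry_date)].append(text)
    return dict(grouped)
```

Assigning entries by their own dates in this way is one form the date parsing mentioned above could take.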
There exist a number of approaches for categorizing and summarizing text,
ranging from the very simple act of counting words to increasingly more finely grained
versions of the same. At this point, the simplest procedure was used, and changes in word
frequency were recorded and reported as a percentage change [3]. Future implementations
may seek to lemmatize the corpus, group similar words, or look for word pairs. At
present, however, differences simply in word frequency are explored.
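A percentage change in word frequency might be computed along the following lines. This is a sketch only: the text does not say how words were tokenized or whether counts were normalized, so the simple lowercase tokenizer and the use of relative frequency (a word's share of all words in a crawl) are assumptions, as are the function names.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each word to its share of all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def percent_change(previous_text, current_text):
    """Percentage change in relative frequency for words found in both texts.

    Words that appear in only one text have no defined percentage change
    and are left out, in line with the separate tally described in the text.
    """
    before = relative_frequencies(previous_text)
    after = relative_frequencies(current_text)
    return {
        word: 100.0 * (after[word] - before[word]) / before[word]
        for word in before.keys() & after.keys()
    }
```

Normalizing by the size of each crawl keeps a larger harvest from looking like a rise in every word; raw counts would be an equally simple alternative.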
All of the text from the blogs was collected into a single text file, and the
frequency of words in that file was determined. This list of frequencies was compared
with that of the previous crawl. The words are listed in order of largest difference, with
those that are unique to one list or the other placed in a separate list. The most changed
items (either up or down) were assembled and lists of these words in context were
generated. These were then grouped in a necessarily interpretive process into the top
gaining and losing topics.
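The comparison steps in this paragraph can be sketched as follows. The sketch assumes word counts for two crawls are already available; ranking by absolute difference in raw counts, the tie-breaking rule, and the three-word context window are assumptions, and the function names are hypothetical rather than those of the software actually used.

```python
from collections import Counter


def compare_crawls(previous, current):
    """Compare two word-count tables from consecutive crawls.

    Returns the shared words ordered by largest difference, plus the
    words unique to the previous crawl and those unique to the current one.
    """
    shared = previous.keys() & current.keys()
    # Largest absolute difference first; ties broken alphabetically.
    ranked = sorted(shared, key=lambda w: (-abs(current[w] - previous[w]), w))
    only_previous = sorted(previous.keys() - current.keys())
    only_current = sorted(current.keys() - previous.keys())
    return ranked, only_previous, only_current


def words_in_context(text, keyword, width=3):
    """List each occurrence of keyword with `width` words on either side."""
    tokens = text.split()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower().strip(".,;:!?\"'()") == keyword:
            hits.append(" ".join(tokens[max(0, i - width): i + width + 1]))
    return hits
```

The context lists are what a human reader would then sort into topics; that interpretive step is not automated here.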
Some of the work, including harvesting the text from the blogs and comparing
word frequencies from week to week, is already automated at this stage. The ultimate
aim, not yet realized, is to implement a software system that will complete similar
surveys of blog changes with very little human intervention or interpretation needed.
[3] A separate tally was generated of words that appear in only one of the lists.
Overall Blog-Agenda
The final product of the collection was five lists of words that represented the
greatest changes from one collection period to the next. From these, the obvious artifacts
were removed (e.g., the days of the month or month of the year). What remained, with some
consistency, were the major news stories of the periods studied, along with a few items that
seemed to have an impact more specifically on the blog community.
A list of the top five topics during each period is provided in table 1. In practice,
assembling news-related items was relatively straightforward, and these dominated the
top of the lists. An informal comparison to the top stories reported in the news during the
same period makes clear that these also represented the most commonly shared content of
personal blogs. Unfortunately, given that the measures were not very finely grained,
determining the lag between release and comment, or the degree to which particular
stories were more or less popular, was difficult.
There were some unusual emphases. It seemed, for example, that more unusual or
controversial topics were, naturally, more likely to be discussed. There also seemed to be
an enduring interest in the events in the Israeli-Palestinian conflict, as well as possible US
involvement in Iraq. There was a further emphasis on environmental issues, and on issues
broadly related to the Republican party.
Some items seemed to be unique to the blog world and of interest particularly to
the blog community. For example, the "Friday Five" [4], a set of personal questions asked
each Friday and answered on personal blogs, seemed to be popular among a relatively
large number of blogs in the sample. The answers tended to show up in the
results. For example, when the Friday Five from August 9 asked questions about the
authors' cars, this led to a spike in not only the word "car" but the names of a number of
automobile manufacturers. Similar trends were detectable but remained subsidiary to a
focus on news items.
Finally, some words were impossible to clearly categorize, yet seemed to change
significantly from week to week. The first period, for example, saw a substantial rise in
the words "anyway" and "simple." These were used in a variety of contexts. Two
possibilities present themselves as explanations. First, it may be that with a larger sample
these anomalies will recede. Given the impact a blog dedicated to space exploration had
on the averages (by repeating the word "scramjet" many times, for example), it is certain
that a larger sample would yield more consistent results. Second, it is possible that those
who are exposed to a particular word are more likely to continue to use it in their own
work. Normally, this possibility would seem remote, but the increase in the use of the
words "sign" and "signs" during the period a new film titled Signs was released is only
partially attributable to mention of the name of the movie and its discussion. It seems that
the term is used more frequently even when not in connection with the film. Again, more
finely grained samples (on a daily basis, for example) would provide evidence of whether
this is the case.
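One way to see whether a rising word reflects a single prolific blog or a wider pattern is to count how many blogs use it, alongside the total number of occurrences. The sketch below is offered as an illustration of that check and is not part of the procedure reported here; the function name, the tokenizer, and the sample blog names are assumptions.

```python
import re
from collections import Counter


def blog_spread(blog_texts, word):
    """Return (total occurrences, number of blogs using the word).

    blog_texts maps a blog name to the text harvested from it.
    """
    total = 0
    blogs = 0
    for text in blog_texts.values():
        count = Counter(re.findall(r"[a-z']+", text.lower()))[word]
        total += count
        if count:
            blogs += 1
    return total, blogs
```

A word with many occurrences in only one blog, as with the space exploration example, would stand out immediately against a word used a few times each across many blogs.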
Crystal Gazing
This represents a very preliminary effort at analyzing changes in Weblog activity.
Even as such, it introduces some potentially interesting insights. The Blogdex project has
demonstrated that mainstream news items are the most popular target of links among
bloggers, and this survey, not surprisingly, supports this by indicating that popular news
stories are the most common topic of discussion across a sample of pages. However,
there does seem to be an emphasis on issues that are particularly contentious. As Hiler
(2002b) has suggested, it may be that blogs largely represent an extension of commentary
found in other news media. A comparison of topics found in blogs and on the opinion
pages of newspapers might yield interesting similarities.
[4] http://www.fridayfive.org
A significant limitation of the approach used here is that it focuses on centrality
measures of textual changes. I cannot think of another corpus that would be as eclectic as
the text collected from these blogs; many are not even internally cohesive, so it is
preposterous to expect the whole to make sense. The identified trends rarely rose beyond
a few dozen blogs within the sample. Therefore, any indication that this represents a
plurality of interest or opinion is probably an overstatement. Nonetheless, it is interesting
to find common threads among such a diverse set of sources.
For an automated blog "weather report" to be created, the context of these
changing terms must be analyzed and categorized. The largely interpretive categorization
undertaken here is neither reliable nor scalable. A number of options are available for
determining clusters of words within the text, and for summarizing the nature of these
clusters. Future work will concentrate on integrating these approaches.
Blogs, as a popular and accessible form of public discourse, represent a
potentially valuable source of public opinion and deliberation. Unless the ideas presented
in blogs are in some way collected and summarized, they risk being lost to the policy-maker
and to the larger public. This study represents a first step in the direction of
making that discourse transparent and available.
Bibliography
Agre, Philip and Schuler, Douglas (eds.). (1997). Reinventing technology, rediscovering
community. Norwood, NJ: Ablex.
Bengston, David N. & Fan, David P. (2001). Trends in attitudes toward the recreation fee
demonstration program on the national forests: A computer content analysis approach.
Journal of Park and Recreation Administration, 19(4), 1-21.
Browning, Graeme & Daniel J. Weitzer (1996). Electronic democracy: Using the Internet
to influence American politics. Medford, NJ: ITI.
Dahlgren, Peter (2000). The Internet and the democratization of civic culture. Political
Communication, 17, 335-340.
Davis, Richard (1999). The web of politics. Oxford: Oxford University Press.
Doheny-Farina, Stephen (1998). Wired neighborhood. New Haven, CT: Yale University
Press.
Fishkin, James S. (1995). The voice of the people. New Haven, CT: Yale University
Press.
Gallagher, David F. (2002, 10 June). A rift among bloggers. New York Times, p. C4.
Hiler, John (2002a). Borg journalism. Microcontent News. April 1. Retrieved from the
web, August, 2002, http://www.microcontentnews.com/articles/borgjournalism.htm
Hiler, John (2002b). Blogosphere, the emerging media ecosystem. Microcontent News.
May 28. Retrieved from the web, August, 2002,
http://www.microcontentnews.com/articles/blogosphere.htm
Hiltz, Starr Roxanne & Turoff, Murray (1993). The network nation, revised ed.
Cambridge, Mass.: MIT Press.
Kahney, Leander (2001, 30 July). Tracking bloggers with blogdex. Wired News.
Retrieved from the web, August, 2002,
http://www.wired.com/news/culture/0,1284,45546,00.html
Kush, Christopher (2000). Cybercitizen: How to use your computer to fight for all the
issues you care about. New York: St. Martin's.
Kweit, Mary Grisez & Kweit, Robert W. (1987). The politics of policy analysis: The role
of citizen participation in analytic decision making. In Jack DeSario & Stuart Langton
(eds.), Citizen Participation in Public Decision Making, Contributions in Political
Science, no. 158, pp. 19-38. New York: Greenwood Press.
Nissan, Ephraim (ed.). (1994). From information to knowledge: Conceptual and content
analysis by computer. London: Intellect.
Page, Benjamin I. (1996). Who deliberates? Mass media in modern democracy. Chicago:
University of Chicago Press.
Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking:
Bringing order to the web. Stanford Digital Library Technologies Project. Retrieved
August, 2002 from http://stanford.edu/~backrub/pageranksub.ps
Pennings, Paul & Keman, Hans (2002). Towards a new methodology of estimating party
policy positions. Quality & Quantity, 36(1), 55-79.
Pool, Ithiel de Sola (1983). Technologies of freedom. Cambridge, Mass.: Harvard
University Press.
Popping, Roel. (2000). Computer-assisted text analysis. Thousand Oaks, CA: Sage.
Toffler, Alvin (1970). Future shock. New York: Random House.
Toffler, Alvin (1978). Introduction of future-conscious politics. In Clement Bezold (ed.),
Anticipatory Democracy: People in the Politics of the Future. New York: Random
House.
Wadsworth, Deborah (1997). The public's view of public schools. Educational
Leadership, 54(5), 44-48.
Table 1. Leading topics from each 10-day period.

From first to second part of July
1. Encouraging Arafat to step down
2. Nuclear materials stolen in Congo
3. German intelligence believes Bin Laden is alive
4. American Taliban suspect can be held without attorney
5. Issues relating to clean water
From second to third part of July
1. Jackson and the Israeli occupation
2. The role of the PLO in historical perspective
3. Questions related to the nature of the soul
4. Study of Saudi Internet filtering
5. King Hussein's comments on potential attack of Iraq
From third part of July to first part of August
1. Signs (movie)
2. Class action suit against fast food
3. Republican preparations for the election
4. US Navy called to task for damage to environment
5. Rumsfeld's visit to the Middle East
From first to second part of August
1. Links between Bin Laden and Saudi Arabia
2. Destruction with reference to the Middle East
3. NEA guidelines on teaching about September 11
4. Human rights in Egypt
5. New TV programs
From second to third part of August
1. Swedish hijack suspect
2. European criticism of Bush's Iraq policy
3. Israeli occupation
4. Earth Summit
5. Jean Chrétien's comments on Middle East situation
