
Predicting Movie Prices through Social Network Analysis
Doshi et al.

Introduction
A system for predicting the box-office success or failure of movies
Gross revenue earned during the first 4 weeks
Web metrics (buzz and ratings) - SNA
Actual revenue from Box Office Mojo
Quotes from HSX (Hollywood Stock Exchange)
General sentiment about a movie

User Ratings
IMDb and Rotten Tomatoes
IMDb aggregates votes of a large number of users
Users may try to bias the vote artificially

Rotten Tomatoes collects experts' ratings
May have critic bias

Ratings may have snowball effects

Box Office Mojo provides data on daily gross earnings at the box office
Initial high income that peters out: overhyped movie
Sustained income: possible success
Increase in income over time: black swan driven by word-of-mouth
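
As a rough illustration of the three income patterns above, the sketch below labels a daily-gross series by comparing the average of its second half with the average of its first half. The function name, the half-and-half split, and the `tolerance` threshold are assumptions made for this example; the slides do not say how the patterns were actually detected.

```python
def classify_trajectory(daily_gross, tolerance=0.10):
    """Label a daily-gross series as 'overhyped', 'sustained' or 'black swan'.

    `tolerance` is a hypothetical threshold (relative change between the
    early and late averages), not a value taken from the source.
    """
    if len(daily_gross) < 2:
        raise ValueError("need at least two days of data")
    half = len(daily_gross) // 2
    early = sum(daily_gross[:half]) / half
    late = sum(daily_gross[half:]) / (len(daily_gross) - half)
    if early <= 0:
        raise ValueError("early average must be positive")
    change = (late - early) / early
    if change < -tolerance:
        return "overhyped"      # initial high income that peters out
    if change > tolerance:
        return "black swan"     # income increases over time
    return "sustained"          # income holds roughly steady


print(classify_trajectory([100, 80, 40, 20]))
print(classify_trajectory([50, 52, 49, 51]))
print(classify_trajectory([10, 20, 40, 80]))
```

A real analysis would probably need to account for weekend peaks and changes in theatre count before comparing periods.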

Social Network Analysis (SNA)


Relative importance on the web
Betweenness Centrality measure
Number of paths through the network that pass through the concept (i.e. movie title)
Degree of separation search
Uses a search engine
Counts links that point to the top ranked sites
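
To make the betweenness measure concrete, here is a minimal sketch that computes betweenness centrality for a small undirected graph using Brandes' algorithm, in which the standard definition counts shortest paths passing through each node. The toy graph and its node names are invented for illustration; the slides do not describe the tool or the graph construction the authors used.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    `graph` maps each node to an iterable of its neighbours; every
    neighbour must also appear as a key.
    """
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        stack = []
        predecessors = {node: [] for node in graph}
        num_paths = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        num_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        # Breadth-first search counting shortest paths from `source`.
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    num_paths[w] += num_paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies in order of decreasing distance.
        dependency = {node: 0.0 for node in graph}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                dependency[v] += num_paths[v] / num_paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair of endpoints was counted twice.
    return {node: value / 2 for node, value in centrality.items()}


# Hypothetical web graph: three sites that are linked only via the movie title.
toy_graph = {
    "fan_blog": ["movie_title"],
    "review_site": ["movie_title"],
    "news_site": ["movie_title"],
    "movie_title": ["fan_blog", "review_site", "news_site"],
}
scores = betweenness_centrality(toy_graph)
print(scores["movie_title"])
```

In this toy graph every shortest path between two sites runs through the movie title, so the title gets the highest score and the sites get zero.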

Sentiment Analysis
General sentiment collected from IMDb and other forums
Overall sentiment calculated using sentiment-bearing words
Scores indicate strength of sentiment
Words around the anchor word (e.g. title, stars) used
Sentiment-expressing words adapted dynamically to account for movie genre
Weighted by Betweenness Centrality of the author
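
The scoring steps above could be sketched roughly as follows: score the sentiment-bearing words that fall within a few tokens of an anchor word, then average the post scores weighted by each author's centrality. The lexicon, the window size of 3 tokens, and all function names are assumptions for illustration, and the genre-specific adaptation of the word list is not covered here.

```python
import re

# Hypothetical lexicon (word -> sentiment strength); not taken from the source.
LEXICON = {"great": 2.0, "good": 1.0, "boring": -1.0, "awful": -2.0}


def post_sentiment(text, anchors, lexicon=LEXICON, window=3):
    """Sum lexicon scores of words within `window` tokens of any anchor word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    anchor_positions = [i for i, tok in enumerate(tokens) if tok in anchors]
    score = 0.0
    for i, tok in enumerate(tokens):
        near_anchor = any(abs(i - a) <= window for a in anchor_positions)
        if tok in lexicon and near_anchor:
            score += lexicon[tok]
    return score


def weighted_sentiment(posts, anchors, author_centrality):
    """Average post sentiment, weighting each post by its author's centrality.

    `posts` is a list of (author, text) pairs; authors missing from
    `author_centrality` get weight 0.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for author, text in posts:
        weight = author_centrality.get(author, 0.0)
        weighted_sum += weight * post_sentiment(text, anchors)
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0


posts = [
    ("alice", "Inception was great"),
    ("bob", "I found Inception boring and awful"),
]
centrality = {"alice": 3.0, "bob": 1.0}
print(weighted_sentiment(posts, {"inception"}, centrality))
```

Here the more central author's positive post outweighs the negative one, so the overall score comes out positive.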

Modeling
HSX price direction predicted by voting among the 3 most accurate independent variables (web and blog betweenness, sentiment and gross revenue)
Linear regression is affected by the time lag between the independent variables and the predicted variable (HSX price)
Experimented with various combinations of moving window size and lag period
The number of theatres showing the movie was found to be the best predictor
Multiple linear regression and non-linear regression used with variables like web buzz (perceived prominence of a movie) together with sentiment (whether the buzz is positive or negative)
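
One way to picture the search over moving window size and lag period is the sketch below: smooth a predictor with a moving average, shift it by a lag, fit a simple least-squares line against the price, and keep the combination with the highest R². This is a single-variable illustration under assumed names and synthetic data, not the authors' procedure, which is not detailed in the slides.

```python
def moving_average(series, window):
    """Trailing moving average; the result is window - 1 items shorter."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def best_window_and_lag(predictor, price, windows, lags):
    """Grid-search window and lag; return (r2, window, lag) of the best fit."""
    best = None
    for window in windows:
        smoothed = moving_average(predictor, window)
        aligned_price = price[window - 1:]  # price on the day each average ends
        for lag in lags:
            xs = smoothed[:len(smoothed) - lag]
            ys = aligned_price[lag:]        # price `lag` days after the predictor
            # Skip fits that are too short or have no variation.
            if len(xs) < 3 or len(set(xs)) < 2 or len(set(ys)) < 2:
                continue
            slope, intercept = fit_line(xs, ys)
            r2 = r_squared(xs, ys, slope, intercept)
            if best is None or r2 > best[0]:
                best = (r2, window, lag)
    return best


# Synthetic data: the price follows the predictor with a 2-day delay.
predictor = [1, 4, 2, 7, 3, 8, 5, 9, 6, 10]
price = [0, 0, 3, 9, 5, 15, 7, 17, 11, 19]
print(best_window_and_lag(predictor, price, windows=[1, 2, 3], lags=[0, 1, 2, 3]))
```

Selecting the window and lag on the same data used to judge the fit can overstate accuracy, so a held-out period would be a sensible addition in practice.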

Results
Classification into 3 classes: revenue less than production cost (I), revenue close to production cost (II), and revenue much more than production cost (III)
Achieved 53% accuracy
Classification of Group II was 100% accurate
False positive rates were low
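
For readers who want to see how such a three-class evaluation might be computed, here is a small sketch. The `band` value that defines "close to production cost" is a hypothetical choice, since the slides do not give the cut-offs, and the sample labels are made up rather than taken from the study.

```python
def revenue_class(revenue, cost, band=0.25):
    """Return "I", "II" or "III" from the revenue-to-cost ratio.

    `band` is a hypothetical width for "close to production cost";
    the source does not state the actual thresholds.
    """
    ratio = revenue / cost
    if ratio < 1 - band:
        return "I"    # revenue less than production cost
    if ratio <= 1 + band:
        return "II"   # revenue close to production cost
    return "III"      # revenue much more than production cost


def accuracy(predicted, actual):
    """Fraction of movies whose predicted class matches the actual class."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def per_class_accuracy(predicted, actual, label):
    """Accuracy restricted to movies whose actual class is `label`."""
    pairs = [(p, a) for p, a in zip(predicted, actual) if a == label]
    if not pairs:
        return None
    return sum(p == a for p, a in pairs) / len(pairs)


# Made-up labels, for illustration only.
actual = ["I", "II", "I", "III"]
predicted = ["I", "II", "III", "I"]
print(accuracy(predicted, actual))
print(per_class_accuracy(predicted, actual, "II"))
```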
