Abstract: Hashtags are useful for categorizing and discovering content and conversations in online social networks.
However, assigning hashtags requires additional user effort,
hampering their widespread adoption. Therefore, in this paper,
we introduce a novel approach for hashtag recommendation,
targeting English language tweets on Twitter. First, we make use
of a skip-gram model to learn distributed word representations
(word2vec). Next, we make use of the distributed word representations learned to train a deep feed forward neural network.
We test our deep neural network by recommending hashtags
for tweets with user-assigned hashtags, using Mean Squared
Error (MSE) as the objective function. We also test our deep
neural network by recommending hashtags for tweets without
user-assigned hashtags. Our experimental results show that the
proposed approach recommends hashtags that are specific to the
semantics of the tweets and that preserve the linguistic regularity
of the tweets. In addition, our experimental results show that the
proposed approach is capable of generating hashtags that have
not been seen before.
Keywords: deep neural networks, distributed word representations, hashtag recommendation, Rectified Linear Units, Twitter, word2vec
I. INTRODUCTION

II. RELATED WORK
Q = C (D + D log2 V),   (1)
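Equation (1) is the per-training-example complexity of the skip-gram model: C is the number of context positions, D the dimensionality of the word representations, and V the vocabulary size, with the log2 V term arising from a hierarchical-softmax output. A minimal sketch; the parameter values below are illustrative assumptions, not values taken from the paper:

```python
import math

def skipgram_complexity(C, D, V):
    # Per-example training complexity of the skip-gram model (Eq. 1):
    # C context positions, D-dimensional representations, and a
    # hierarchical-softmax output over a vocabulary of size V.
    return C * (D + D * math.log2(V))

# Illustrative values (assumptions): a window of 10 context words,
# 300-D vectors, and a vocabulary of 3 million words and phrases.
q = skipgram_complexity(10, 300, 3_000_000)
```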
Deep feed-forward neural networks (FFNNs) for language modeling [2] and NLP tasks [3] have already been shown to perform well. In what follows, we discuss how a deep FFNN can be used for Twitter hashtag recommendation.
In general, we can describe a deep FFNN as implementing
a non-linear function on each of its layers. If the non-linear
functions implemented by the n layers are h1 , h2 , h3 , ..., and
hn , then, for a given input feature vector x, the output of the
FFNN can be written as follows:
f(x) = h_n(h_{n-1}(... h_2(h_1(x)) ...)).   (2)

Σ_{i=1} (m_i + 0.5).   (3)

f(x) = L(rl(rl(rl(x^T W^(1) + b^(1)) W^(2) + b^(2)) W^(3) + b^(3))),   (4)
where x represents the input feature vector, and where W^(n) and b^(n) represent the weight matrix and bias vector of the nth layer, respectively.
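As a concrete illustration of Eq. (4), the sketch below runs the forward pass in plain Python. The tiny layer widths are assumptions made for readability (the paper's network maps 300-D tweet vectors to 300-D outputs), and the linear output layer L is given its own weight matrix and bias, which Eq. (4) leaves implicit:

```python
import random

random.seed(0)

def rl(v):
    # Element-wise rectified linear unit: rl(z) = max(0, z).
    return [max(0.0, x) for x in v]

def affine(v, W, b):
    # Computes v^T W + b for an m-by-n weight matrix W and an n-D bias b.
    return [sum(vi * W[i][j] for i, vi in enumerate(v)) + b[j]
            for j in range(len(b))]

# Tiny illustrative layer widths (assumptions); the paper's FFNN uses
# three ReLU layers between a 300-D input and a 300-D output.
sizes = [4, 5, 5, 5, 4]
Ws = [[[random.gauss(0.0, 0.1) for _ in range(n)] for _ in range(m)]
      for m, n in zip(sizes, sizes[1:])]
bs = [[0.0] * n for n in sizes[1:]]

def f(x):
    # Eq. (4): three ReLU layers followed by a linear output layer L.
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = rl(affine(h, W, b))
    return affine(h, Ws[-1], bs[-1])

y = f([1.0, 0.0, -1.0, 0.5])
```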
Fig. 1 provides an overview of the overall neural network
architecture used for hashtag recommendation, depicting both
the skip-gram model and our deep FFNN.
during that phase, tweets without hashtags are used for evaluating the model by recommending hashtags for these tweets.
Consequently, we remove tweets without hashtags from the
datasets used for neural network training, validation, and
testing.
We collected tweets over a period of four days. After removal of the non-English tweets and subsequent pre-processing, we obtained a dataset with 226,981 tweets. We divided these tweets with a ratio of 70:15:15 into a training, validation, and testing dataset, respectively.
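A minimal sketch of such a 70:15:15 split, assuming the pre-processed tweets are held in a Python list (all names here are illustrative):

```python
def split_dataset(tweets, ratios=(0.70, 0.15, 0.15)):
    # Split the tweet list into training, validation, and testing
    # subsets according to the given ratios (70:15:15 by default).
    n_train = int(len(tweets) * ratios[0])
    n_val = int(len(tweets) * ratios[1])
    return (tweets[:n_train],
            tweets[n_train:n_train + n_val],
            tweets[n_train + n_val:])

train, val, test = split_dataset(list(range(100)))
```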
V. EXPERIMENTS
In this section, we discuss our experimental setup, including tweet collection and pre-processing, feature vector generation, and training. Moreover, we discuss our experimental
results. Note that Fig. 2 provides a general schematic overview
of the process of hashtag recommendation, visualizing a number of steps described below.
A. Tweet Collection and Pre-processing
To collect tweets, we made use of the public tweet streams accessible through the Twitter streaming Application Programming Interface (API). The tweets belonging to these public streams are sampled from the complete volume of tweets going through Twitter, at a rate of approximately 1%. Note that Twitter does not disclose the exact volume of tweets it is processing.
The public tweet streams contain tweets that are written
in a variety of languages. However, we only consider English
language tweets for our research purposes. To that end, we
remove all non-English language tweets by making use of the
lang field of the tweets. This language marker is based on
the language locale of the profile of a user. Nonetheless, the
language of the tweets may still be different from English, even
if the language marker identified it as English. Hence, we also
remove all non-ASCII characters from the tweets. Again, some
of the remaining tweets may still contain non-English words
(e.g., due to transliteration). Therefore, we also remove these
non-English words by making use of the Google News-based
dictionary discussed further in this section. This dictionary
contains standard and non-standard (i.e., slang) words in the
English language only.
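The filtering steps described above can be sketched as follows; the tweet objects and the tiny dictionary set are illustrative stand-ins (real tweets come from the streaming API, and the dictionary from the Google News vocabulary):

```python
dictionary = {"so", "bored", "someone", "me"}  # illustrative subset

def is_english(tweet):
    # Keep only tweets whose lang field marks them as English.
    return tweet.get("lang") == "en"

def strip_non_ascii(text):
    # Remove all non-ASCII characters from the tweet text.
    return text.encode("ascii", errors="ignore").decode("ascii")

def keep_dictionary_words(text):
    # Drop remaining words that the English dictionary does not contain.
    return " ".join(w for w in text.split() if w.lower() in dictionary)

tweets = [{"lang": "en", "text": "Someone dm/text me bc Im so bored"},
          {"lang": "es", "text": "hola"}]
cleaned = [keep_dictionary_words(strip_non_ascii(t["text"]))
           for t in tweets if is_english(t)]
```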
Furthermore, we perform the following pre-processing
steps on the tweets collected:
Fig. 1: Overall neural network architecture used for hashtag recommendation. The left part of the figure depicts the skip-gram model used to generate the distributed word representations. w(n - i) represents the ith word before the current word w(n), and w(n + i) represents the ith word after the current word w(n). r(w_t(n)) denotes the distributed representation of the tth word in the tweet. The right part shows the deep FFNN used to recommend hashtags for a tweet, only depicting one ReLU unit for reasons of clarity. In practice, we use three ReLU layers, with each layer containing multiple ReLU units in our deep FFNN.
[Fig. 2: Words 1 through n of a tweet are mapped to 300-D feature vectors by a skip-gram model pre-trained on the Google News dataset (dictionary); the word vectors are averaged into a single 300-D tweet feature vector and fed to the deep FFNN, whose 300-D output vector is mapped back through the same skip-gram dictionary to recommended hashtags 1 through k.]
D. Evaluation
During evaluation, we generate hashtag recommendations
for a particular tweet by feeding its 300-D tweet feature vector
to our trained FFNN. The 300-D output vector produced by
our FFNN is then matched to the three million words and
phrases in the 300-D distributed word representation feature
space, using the closest words or phrases as the hashtags
recommended. We calculate closeness on the basis of the
cosine similarity between the output vector generated by our
FFNN and the feature vectors of the candidate hashtags in the
distributed word representation feature space. Note that we
recommend ten hashtags for each tweet.
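The matching step can be sketched as follows. The 3-D vectors and the tiny candidate vocabulary are illustrative stand-ins for the 300-D Google News feature space of three million words and phrases:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def recommend(output_vec, vocab_vectors, k=10):
    # Rank every candidate word/phrase by cosine similarity to the
    # FFNN output vector and return the k closest as hashtags.
    ranked = sorted(vocab_vectors,
                    key=lambda w: cosine(output_vec, vocab_vectors[w]),
                    reverse=True)
    return ranked[:k]

vocab = {"hot": [1.0, 0.1, 0.0],
         "Muy_caliente": [0.9, 0.2, 0.1],
         "Gnight": [0.0, 1.0, 0.3]}
top = recommend([1.0, 0.0, 0.0], vocab, k=2)
```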
Table I shows a number of example hashtags recommended
by our deep FFNN. The first six entries in Table I contain the
recommendations for tweets without user-assigned hashtags,
whereas the last four entries in Table I contain the recommendations for tweets that already contained user-assigned
hashtags.
For the first tweet, the recommended hashtags aptly summarize the feeling. For the second tweet, the first recommendation is a rejoicing after a revelation, which seems to fit
the situation. For the third tweet, the neural network moves
into the semantic space of health issues. The fourth example
demonstrates the preservation of linguistic regularities by the
FFNN, recommending relevant hashtags that have been written
in capital letters for a tweet that has been written in capital
letters. For the fifth tweet, the neural network recommends
Tweet 1: Someone dm/text me bc Im so bored
Recommended: madd, Oh_noes, rainnwilson, sooooooo, fricken

Tweet 2: The good life is one inspired by love and guided by knowledge.
Recommended: Ahh_yes, FIVE_THINGS_About, YANKEES_TALK, Kinder_gentler, Ya_gotta_love

Tweet 3: …
Recommended: Shape_Shifting, Treat_Acne, Detect_Cancer, Warps, Calorie_Burn

Tweet 4: …
Recommended: DEBUTS_NEW, NOW_AVAILABLE_FOR, TO_PUBLISH, DESIGNED_TO, IS_READY_TO

Tweet 5: …
Recommended: FAN_S_ATTIC, Puh_leez, Mopping_robot, % #F######## 3v.jsn, Interest_EURO_JAP

Tweet 6: …
Recommended: Dave_Leaderer, Jonathan_Toews_Troy_Brouwer, Pascal_Dupuis_backhander, ERIC_MAUK_Congratulations, Hornqvist_redirected

Tweet 7: #theheat #notagame #screwyouheat #hot #socal #104degrees http://t.co/ktl6sQQjEF
Recommended: CHEFS_Chefs, hot, Muy_caliente, petewentz_@, courtney_cox

Tweet 8: Everyone is getting their Giggle on at #Gigglepalooza
Recommended: Giggle, Skean, Whiffle, Buggie, Bourner

Tweet 9: Lol my teacher wants me to have a two minute presentation in Spanish about Pablo Picasso #toocracker
Recommended: fin, damnit, Gnight, ya, youngin

Tweet 10: @kokokose I know, I dont even know why Im bothering trying to be early, I mean Im probably going to be late anyway #C B F A
Recommended: hurry, flour_lentils, al_Adha, Hurry_hurry, Popeyes_chicken

TABLE I: Recommended hashtags for tweets. Phrases are shown with underscores.
hashtags like Puh leez (slang for please6 ) and Mopping robot,
which seems to cover both aspects of an unwanted cleaning
process.
The sixth example demonstrates the dynamic nature of
our recommendations, suggesting hockey players along with
hockey terms. This can likely be explained by two factors.
First, our dataset contained tweets from the beginning of
September 2013, covering the start of the NHL final games7 .
Second, period is also hockey jargon8 .
As previously mentioned, the last four entries in Table I
show hashtag recommendations for tweets that already contained hashtags. For tweet number 7 and tweet number 8, the
neural network recommends a hashtag that was already present
in the tweet. In the last example in Table I, the recommended hashtags are highly relevant to the tweet, in contrast to the hashtags assigned by the user.
6 http://dictionary.reference.com/browse/puh+leez!
7 http://www.nhl.com/ice/schedulebymonth.htm?month=201309
8 http://en.wikipedia.org/wiki/Ice
Top-k recommendations    Hit-rate (%)
top-1                    52.50
top-2                    63.33
top-3                    75.83
top-4                    80.83
top-5                    83.33
top-6                    85.00
top-7                    85.00
top-8                    85.83
top-9                    85.83
top-10                   86.67
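A top-k hit-rate of the kind tabulated above can be computed as sketched below; counting a tweet as a hit when any of its first k recommendations matches one of its user-assigned hashtags is an assumption about the exact criterion, and the data is illustrative:

```python
def hit_rate(recommendations, gold, k):
    # Fraction (in %) of tweets for which at least one of the first k
    # recommended hashtags appears among the user-assigned hashtags.
    hits = sum(1 for recs, g in zip(recommendations, gold)
               if any(r in g for r in recs[:k]))
    return 100.0 * hits / len(gold)

# Illustrative recommendations and user-assigned (gold) hashtag sets.
recs = [["hot", "socal"], ["Giggle", "Whiffle"], ["fin", "ya"]]
gold = [{"hot"}, {"Gigglepalooza"}, {"ya"}]
rate_at_2 = hit_rate(recs, gold, 2)
```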
ACKNOWLEDGMENTS
REFERENCES

[1] M. Baroni, G. Dinu, and G. Kruszewski. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proc. 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), June 2014.
[2] Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. J. Mach. Learn. Res., 3:1137-1155, Mar. 2003.
[3] R. Collobert and J. Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proc. 25th International Conference on Machine Learning, ICML '08, pages 160-167, New York, NY, USA, 2008. ACM.
[4] X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier neural networks. In AISTATS, pages 315-323, 2011.
[5] F. Godin, V. Slavkovikj, W. De Neve, B. Schrauwen, and R. Van de Walle. Using topic models for Twitter hashtag recommendation. In World Wide Web 2013 Companion. ACM, 2013.
[6] I. J. Goodfellow, D. Warde-Farley, P. Lamblin, V. Dumoulin, M. Mirza, R. Pascanu, J. Bergstra, F. Bastien, and Y. Bengio. Pylearn2: a machine learning research library. arXiv preprint arXiv:1308.4214, 2013.
[7] S. M. Kywe, T.-A. Hoang, E.-P. Lim, and F. Zhu. On recommending hashtags in Twitter networks. In Proc. 4th International Conference on Social Informatics, SocInfo '12, pages 337-350, Berlin, Heidelberg, 2012. Springer-Verlag.
[8] T. Li, W. Yu, and Y. Zhang. Twitter hash tag prediction algorithm. In World Congress in Computer Science, Computer Engineering, and Applied Computing, 2011.
[9] A. Mazzia and J. Juett. Suggesting hashtags on Twitter. Technical report, Computer Science and Engineering, University of Michigan, 2009.
[10] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.
[11] T. Mikolov, J. Kopecky, L. Burget, O. Glembek, and J. Cernocky. Neural network based language models for highly inflective languages. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 4725-4728, April 2009.
[12] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. NIPS, 2013.
[13] T. Mikolov, W. T. Yih, and G. Zweig. Linguistic regularities in continuous space word representations. In HLT-NAACL, pages 746-751. The Association for Computational Linguistics, 2013.
[14] J. She and L. Chen. TOMOHA: TOpic Model-based HAshtag Recommendation on Twitter. In Proc. Companion Publication of the 23rd International Conference on World Wide Web, WWW Companion '14, pages 371-372, Republic and Canton of Geneva, Switzerland, 2014. International World Wide Web Conferences Steering Committee.
[15] J. Turian, L. Ratinov, and Y. Bengio. Word representations: A simple and general method for semi-supervised learning. In Proc. 48th Annual Meeting of the Association for Computational Linguistics, pages 384-394, 2010.
[16] F. Xiao, T. Noro, and T. Tokuda. News-topic oriented hashtag recommendation in Twitter based on characteristic co-occurrence word detection. In Proc. 12th International Conference on Web Engineering, ICWE '12, pages 16-30, Berlin, Heidelberg, 2012. Springer-Verlag.
[17] E. Zangerle, W. Gassler, and G. Specht. Recommending #-tags in Twitter. In Proc. Workshop on Semantic Adaptive Social Web (SASWeb 2011), CEUR Workshop Proceedings, volume 730, pages 67-78, 2011.