Abstract: Hashtags are useful for categorizing and discovering content and conversations in online social networks.
However, assigning hashtags requires additional user effort,
hampering their widespread adoption. Therefore, in this paper,
we introduce a novel approach for hashtag recommendation,
targeting English language tweets on Twitter. First, we make use
of a skip-gram model to learn distributed word representations
(word2vec). Next, we make use of the distributed word representations learned to train a deep feed forward neural network.
We test our deep neural network by recommending hashtags
for tweets with user-assigned hashtags, using Mean Squared
Error (MSE) as the objective function. We also test our deep
neural network by recommending hashtags for tweets without
user-assigned hashtags. Our experimental results show that the
proposed approach recommends hashtags that are specific to the
semantics of the tweets and that preserve the linguistic regularity
of the tweets. In addition, our experimental results show that the
proposed approach is capable of generating hashtags that have
not been seen before.
Keywords: deep neural networks, distributed word representations, hashtag recommendation, Rectified Linear Units, Twitter, word2vec
I. INTRODUCTION

II. RELATED WORK
Q = C (D + D log2 V),   (1)
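Equation (1) is the per-training-example complexity of the skip-gram model: C is the number of context positions, D the dimensionality of the word representations, and V the vocabulary size, with the log2 V term arising from a hierarchical-softmax output. A minimal sketch; the parameter values below are illustrative assumptions, not values taken from the paper:

```python
import math

def skipgram_complexity(C, D, V):
    # Per-example training complexity of the skip-gram model (Eq. 1):
    # C context positions, D-dimensional representations, and a
    # hierarchical-softmax output over a vocabulary of size V.
    return C * (D + D * math.log2(V))

# Illustrative values (assumptions): a window of 10 context words,
# 300-D vectors, and a vocabulary of 3 million words and phrases.
q = skipgram_complexity(10, 300, 3_000_000)
```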
Deep feed-forward neural networks (FFNNs) for language modeling [2] and NLP tasks [3] have already been shown to perform well. In what follows, we discuss how a deep FFNN can be used for Twitter hashtag recommendation.
In general, we can describe a deep FFNN as implementing
a non-linear function on each of its layers. If the non-linear
functions implemented by the n layers are h1 , h2 , h3 , ..., and
hn , then, for a given input feature vector x, the output of the
FFNN can be written as follows:
f(x) = h_n(h_{n-1}(... h_2(h_1(x)) ...)).   (2)

Σ_{i=1} (m_i + 0.5).   (3)

f(x) = L(rl(rl(rl(x^T W^(1) + b^(1)) W^(2) + b^(2)) W^(3) + b^(3))),   (4)
where x represents the input feature vector, and where W^(n) and b^(n) represent the weight matrix and bias vector of the nth layer, respectively.
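As a concrete illustration of Eq. (4), the sketch below runs the forward pass in plain Python. The tiny layer widths are assumptions made for readability (the paper's network maps 300-D tweet vectors to 300-D outputs), and the linear output layer L is given its own weight matrix and bias, which Eq. (4) leaves implicit:

```python
import random

random.seed(0)

def rl(v):
    # Element-wise rectified linear unit: rl(z) = max(0, z).
    return [max(0.0, x) for x in v]

def affine(v, W, b):
    # Computes v^T W + b for an m-by-n weight matrix W and an n-D bias b.
    return [sum(vi * W[i][j] for i, vi in enumerate(v)) + b[j]
            for j in range(len(b))]

# Tiny illustrative layer widths (assumptions); the paper's FFNN uses
# three ReLU layers between a 300-D input and a 300-D output.
sizes = [4, 5, 5, 5, 4]
Ws = [[[random.gauss(0.0, 0.1) for _ in range(n)] for _ in range(m)]
      for m, n in zip(sizes, sizes[1:])]
bs = [[0.0] * n for n in sizes[1:]]

def f(x):
    # Eq. (4): three ReLU layers followed by a linear output layer L.
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = rl(affine(h, W, b))
    return affine(h, Ws[-1], bs[-1])

y = f([1.0, 0.0, -1.0, 0.5])
```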
Fig. 1 provides an overview of the overall neural network
architecture used for hashtag recommendation, depicting both
the skip-gram model and our deep FFNN.
during that phase, tweets without hashtags are used for evaluating the model by recommending hashtags for these tweets.
Consequently, we remove tweets without hashtags from the
datasets used for neural network training, validation, and
testing.
We collected tweets over a period of four days. After removal of the non-English tweets and subsequent pre-processing, we obtained a dataset with 226,981 tweets. We divided these tweets with a ratio of 70:15:15 into a training, validation, and testing dataset, respectively.
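A minimal sketch of such a 70:15:15 split, assuming the pre-processed tweets are held in a Python list (all names here are illustrative):

```python
def split_dataset(tweets, ratios=(0.70, 0.15, 0.15)):
    # Split the tweet list into training, validation, and testing
    # subsets according to the given ratios (70:15:15 by default).
    n_train = int(len(tweets) * ratios[0])
    n_val = int(len(tweets) * ratios[1])
    return (tweets[:n_train],
            tweets[n_train:n_train + n_val],
            tweets[n_train + n_val:])

train, val, test = split_dataset(list(range(100)))
```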
V. EXPERIMENTS
In this section, we discuss our experimental setup, including tweet collection and pre-processing, feature vector generation, and training. Moreover, we discuss our experimental
results. Note that Fig. 2 provides a general schematic overview
of the process of hashtag recommendation, visualizing a number of steps described below.
A. Tweet Collection and Pre-processing
To collect tweets, we made use of the public tweet streams accessible through the Twitter streaming Application Programming Interface (API). The tweets belonging to these public streams are sampled from the complete volume of tweets going through Twitter, at a rate of approximately 1%. Note that Twitter does not disclose the exact volume of tweets it is processing.
The public tweet streams contain tweets that are written
in a variety of languages. However, we only consider English
language tweets for our research purposes. To that end, we
remove all non-English language tweets by making use of the
lang field of the tweets. This language marker is based on
the language locale of the profile of a user. Nonetheless, the
language of the tweets may still be different from English, even
if the language marker identified it as English. Hence, we also
remove all non-ASCII characters from the tweets. Again, some
of the remaining tweets may still contain non-English words
(e.g., due to transliteration). Therefore, we also remove these
non-English words by making use of the Google News-based
dictionary discussed further in this section. This dictionary
contains standard and non-standard (i.e., slang) words in the
English language only.
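The filtering steps described above can be sketched as follows; the tweet objects and the tiny dictionary set are illustrative stand-ins (real tweets come from the streaming API, and the dictionary from the Google News vocabulary):

```python
dictionary = {"so", "bored", "someone", "me"}  # illustrative subset

def is_english(tweet):
    # Keep only tweets whose lang field marks them as English.
    return tweet.get("lang") == "en"

def strip_non_ascii(text):
    # Remove all non-ASCII characters from the tweet text.
    return text.encode("ascii", errors="ignore").decode("ascii")

def keep_dictionary_words(text):
    # Drop remaining words that the English dictionary does not contain.
    return " ".join(w for w in text.split() if w.lower() in dictionary)

tweets = [{"lang": "en", "text": "Someone dm/text me bc Im so bored"},
          {"lang": "es", "text": "hola"}]
cleaned = [keep_dictionary_words(strip_non_ascii(t["text"]))
           for t in tweets if is_english(t)]
```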
Furthermore, we perform the following pre-processing
steps on the tweets collected:
Fig. 1: Overall neural network architecture used for hashtag recommendation. The left part of the figure depicts the skip-gram model used to generate the distributed word representations. w(n - i) represents the ith word before the current word w(n), and w(n + i) represents the ith word after the current word w(n). r(w_t(n)) denotes the distributed representation of the tth word in the tweet. The right part shows the deep FFNN used to recommend hashtags for a tweet, only depicting one ReLU unit for reasons of clarity. In practice, we use three ReLU layers, with each layer containing multiple ReLU units in our deep FFNN.
[Fig. 2: Words 1 through n of a tweet are mapped to 300-D feature vectors by a skip-gram model pre-trained on the Google News dataset (dictionary); the word vectors are averaged into a single 300-D tweet feature vector and fed to the deep FFNN, whose 300-D output vector is mapped back through the same skip-gram dictionary to recommended hashtags 1 through k.]
D. Evaluation
During evaluation, we generate hashtag recommendations
for a particular tweet by feeding its 300-D tweet feature vector
to our trained FFNN. The 300-D output vector produced by
our FFNN is then matched to the three million words and
phrases in the 300-D distributed word representation feature
space, using the closest words or phrases as the hashtags
recommended. We calculate closeness on the basis of the
cosine similarity between the output vector generated by our
FFNN and the feature vectors of the candidate hashtags in the
distributed word representation feature space. Note that we
recommend ten hashtags for each tweet.
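The matching step can be sketched as follows. The 3-D vectors and the tiny candidate vocabulary are illustrative stand-ins for the 300-D Google News feature space of three million words and phrases:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def recommend(output_vec, vocab_vectors, k=10):
    # Rank every candidate word/phrase by cosine similarity to the
    # FFNN output vector and return the k closest as hashtags.
    ranked = sorted(vocab_vectors,
                    key=lambda w: cosine(output_vec, vocab_vectors[w]),
                    reverse=True)
    return ranked[:k]

vocab = {"hot": [1.0, 0.1, 0.0],
         "Muy_caliente": [0.9, 0.2, 0.1],
         "Gnight": [0.0, 1.0, 0.3]}
top = recommend([1.0, 0.0, 0.0], vocab, k=2)
```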
Table I shows a number of example hashtags recommended
by our deep FFNN. The first six entries in Table I contain the
recommendations for tweets without user-assigned hashtags,
whereas the last four entries in Table I contain the recommendations for tweets that already contained user-assigned
hashtags.
For the first tweet, the recommended hashtags aptly summarize the feeling. For the second tweet, the first recommendation is a rejoicing after a revelation, which seems to fit
the situation. For the third tweet, the neural network moves
into the semantic space of health issues. The fourth example
demonstrates the preservation of linguistic regularities by the
FFNN, recommending relevant hashtags that have been written
in capital letters for a tweet that has been written in capital
letters. For the fifth tweet, the neural network recommends
Tweet 1: Someone dm/text me bc Im so bored
Recommended: madd, Oh_noes, rainnwilson, sooooooo, fricken

Tweet 2: The good life is one inspired by love and guided by knowledge.
Recommended: Ahh_yes, FIVE_THINGS_About, YANKEES_TALK, Kinder_gentler, Ya_gotta_love

Tweet 3: …
Recommended: Shape_Shifting, Treat_Acne, Detect_Cancer, Warps, Calorie_Burn

Tweet 4: …
Recommended: DEBUTS_NEW, NOW_AVAILABLE_FOR, TO_PUBLISH, DESIGNED_TO, IS_READY_TO

Tweet 5: …
Recommended: FAN_S_ATTIC, Puh_leez, Mopping_robot, % #F######## 3v.jsn, Interest_EURO_JAP

Tweet 6: …
Recommended: Dave_Leaderer, Jonathan_Toews_Troy_Brouwer, Pascal_Dupuis_backhander, ERIC_MAUK_Congratulations, Hornqvist_redirected

Tweet 7: #theheat #notagame #screwyouheat #hot #socal #104degrees http://t.co/ktl6sQQjEF
Recommended: CHEFS_Chefs, hot, Muy_caliente, petewentz_@, courtney_cox

Tweet 8: Everyone is getting their Giggle on at #Gigglepalooza
Recommended: Giggle, Skean, Whiffle, Buggie, Bourner

Tweet 9: Lol my teacher wants me to have a two minute presentation in Spanish about Pablo Picasso #toocracker
Recommended: fin, damnit, Gnight, ya, youngin

Tweet 10: @kokokose I know, I dont even know why Im bothering trying to be early, I mean Im probably going to be late anyway #C B F A
Recommended: hurry, flour_lentils, al_Adha, Hurry_hurry, Popeyes_chicken

TABLE I: Recommended hashtags for tweets. Phrases are shown with underscores.
hashtags like Puh leez (slang for please6 ) and Mopping robot,
which seems to cover both aspects of an unwanted cleaning
process.
The sixth example demonstrates the dynamic nature of
our recommendations, suggesting hockey players along with
hockey terms. This can likely be explained by two factors.
First, our dataset contained tweets from the beginning of
September 2013, covering the start of the NHL final games7 .
Second, period is also hockey jargon8 .
As previously mentioned, the last four entries in Table I
show hashtag recommendations for tweets that already contained hashtags. For tweet number 7 and tweet number 8, the
neural network recommends a hashtag that was already present
in the tweet. In the last example in Table I, the recommended hashtags are highly relevant to the tweet, in contrast to the hashtags assigned by the user.
6 http://dictionary.reference.com/browse/puh+leez!
7 http://www.nhl.com/ice/schedulebymonth.htm?month=201309
8 http://en.wikipedia.org/wiki/Ice
Top-k recommendations    Hit-rate (%)
top-1                    52.50
top-2                    63.33
top-3                    75.83
top-4                    80.83
top-5                    83.33
top-6                    85.00
top-7                    85.00
top-8                    85.83
top-9                    85.83
top-10                   86.67
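A top-k hit-rate of the kind tabulated above can be computed as sketched below; counting a tweet as a hit when any of its first k recommendations matches one of its user-assigned hashtags is an assumption about the exact criterion, and the data is illustrative:

```python
def hit_rate(recommendations, gold, k):
    # Fraction (in %) of tweets for which at least one of the first k
    # recommended hashtags appears among the user-assigned hashtags.
    hits = sum(1 for recs, g in zip(recommendations, gold)
               if any(r in g for r in recs[:k]))
    return 100.0 * hits / len(gold)

# Illustrative recommendations and user-assigned (gold) hashtag sets.
recs = [["hot", "socal"], ["Giggle", "Whiffle"], ["fin", "ya"]]
gold = [{"hot"}, {"Gigglepalooza"}, {"ya"}]
rate_at_2 = hit_rate(recs, gold, 2)
```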
ACKNOWLEDGMENTS
REFERENCES

[1] M. Baroni, G. Dinu, and G. Kruszewski. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proc. 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), June 2014.
[2] Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. J. Mach. Learn. Res., 3:1137-1155, Mar. 2003.
[3] R. Collobert and J. Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proc. 25th International Conference on Machine Learning, ICML '08, pages 160-167, New York, NY, USA, 2008. ACM.
[4] X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier neural networks. In AISTATS, pages 315-323, 2011.
[5] F. Godin, V. Slavkovikj, W. De Neve, B. Schrauwen, and R. Van de Walle. Using topic models for Twitter hashtag recommendation. In World Wide Web 2013 Companion. ACM, 2013.
[6] I. J. Goodfellow, D. Warde-Farley, P. Lamblin, V. Dumoulin, M. Mirza, R. Pascanu, J. Bergstra, F. Bastien, and Y. Bengio. Pylearn2: a machine learning research library. arXiv preprint arXiv:1308.4214, 2013.
[7] S. M. Kywe, T.-A. Hoang, E.-P. Lim, and F. Zhu. On recommending hashtags in Twitter networks. In Proc. 4th International Conference on Social Informatics, SocInfo '12, pages 337-350, Berlin, Heidelberg, 2012. Springer-Verlag.
[8] T. Li, W. Yu, and Y. Zhang. Twitter hash tag prediction algorithm. In World Congress in Computer Science, Computer Engineering, and Applied Computing, 2011.
[9] A. Mazzia and J. Juett. Suggesting hashtags on Twitter. Technical report, Computer Science and Engineering, University of Michigan, 2009.
[10] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.
[11] T. Mikolov, J. Kopecky, L. Burget, O. Glembek, and J. Cernocky. Neural network based language models for highly inflective languages. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 4725-4728, April 2009.
[12] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. NIPS, 2013.
[13] T. Mikolov, W. T. Yih, and G. Zweig. Linguistic regularities in continuous space word representations. In HLT-NAACL, pages 746-751. The Association for Computational Linguistics, 2013.
[14] J. She and L. Chen. TOMOHA: TOpic Model-based HAshtag Recommendation on Twitter. In Proc. Companion Publication of the 23rd International Conference on World Wide Web, WWW Companion '14, pages 371-372, Republic and Canton of Geneva, Switzerland, 2014. International World Wide Web Conferences Steering Committee.
[15] J. Turian, L. Ratinov, and Y. Bengio. Word representations: A simple and general method for semi-supervised learning. In Proc. 48th Annual Meeting of the Association for Computational Linguistics, pages 384-394, 2010.
[16] F. Xiao, T. Noro, and T. Tokuda. News-topic oriented hashtag recommendation in Twitter based on characteristic co-occurrence word detection. In Proc. 12th International Conference on Web Engineering, ICWE '12, pages 16-30, Berlin, Heidelberg, 2012. Springer-Verlag.
[17] E. Zangerle, W. Gassler, and G. Specht. Recommending #-tags in Twitter. In Proc. Workshop on Semantic Adaptive Social Web (SASWeb 2011), CEUR Workshop Proceedings, volume 730, pages 67-78, 2011.