
Recommender Systems Using

Semantic Web Technologies and Folksonomies



Abstract: Many applications have been created using various
algorithms developed for Recommender Systems. These
applications rely heavily on data related to user ratings for items.
However, this data exists in various systems and in multiple
formats, which makes it difficult for different Recommender
Systems based applications to utilize data and provide
recommendations. Semantic Web Technologies aid
Recommender Systems by supplementing already available data
with new pieces of information, which enables applications to
process data about the user-item relationship better. Users also
use tags to explain their liking for a particular item. These tags
or folksonomies also provide information that helps us
understand the user-item relationship better. This article looks at
different implementations of Recommender Systems using
Semantic Web technologies and folksonomies and highlights the
benefits and challenges of the current implementations.

Keywords: Recommender Systems; Semantic Web; Linked
Open Data; Folksonomies
I. INTRODUCTION
To provide effective recommendations, Recommender
Systems (RS) depend on the rating or ranking data for items.
Data used by RSs can include any of the following:
• Ratings for an item on certain characteristics
• The ratings given by users for items
• A mix of multiple rating mechanisms
The user-item rating data is usually collected at an
individual application level. For example, once a user has seen
a movie purchased from Netflix.com or Amazon.com, they can
rate it on that site. Thus, the user-movie rating would be
stored either on Netflix.com or on Amazon.com. This becomes a
restriction for RSs when portals try to rank and recommend
items that have no associated ratings on their own site, although
ratings for the same items are available on other sites.
The problem of item rating data existing in silos can be
alleviated if applications publish their data on the Linked
Open Data (LOD) cloud [1]. RS-based applications can then use
Semantic Web technologies to utilize the data available on the
LOD cloud to provide effective recommendations. A richer
information base can be constructed out of LOD for RSs to
improve the quality of recommendations.
It has further been observed [2] that recommendations can be
improved by using tags, that is, the informal information that
users attach to items.
This paper examines the current work done in the
implementation of RSs using Semantic Web technologies and
folksonomies. It details the current implementations of
websites using Recommender Systems in different
applications and the challenges that they pose.
II. RELATED TECHNOLOGIES
The different technologies used are detailed below:
A. Recommender Systems
Recommender Systems (RS) have been explained in [3]
under the following headings:
1) Collaborative Filtering RS: Collaborative filtering helps
predict the interests of a user by collecting preferences or taste
information from many users. The underlying assumption of
the collaborative filtering approach is that if person A and
person B like a particular item, then A is more likely to like
another item that B likes than an item liked by a person
chosen at random.
2) Content-Based Filtering RS: Content-based filtering is
based on the characteristics of items. The system suggests
other items to the user based on the characteristics of the items
liked by the user.
3) Knowledge-Based Filtering RS: Knowledge-based filtering
is appropriate where the number of ratings is low or specific
requirements need to be met for an item to satisfy a customer
need.
4) Hybrid Filtering RS: Hybrid systems combine the above
approaches to give an appropriate recommendation based on the
data available.
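The collaborative filtering assumption described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production algorithm: the `ratings` data is hypothetical, similarity is computed as the cosine over co-rated items only, and the prediction is a similarity-weighted average of other users' ratings.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data, for illustration only.
ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 4, "m2": 5, "m4": 5},
    "carol": {"m1": 1, "m3": 5, "m4": 2},
}


def cosine(u, v):
    """Cosine similarity restricted to the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        similarity = cosine(ratings[user], their_ratings)
        numerator += similarity * their_ratings[item]
        denominator += similarity
    return numerator / denominator if denominator else None


# alice agrees with bob far more than with carol, so her predicted
# rating for "m4" is pulled towards bob's rating of 5.
prediction = predict("alice", "m4")
```

Real systems usually normalise for each user's rating scale and restrict the sum to the nearest neighbours; both refinements are omitted here for brevity.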
B. Semantic Web Technologies (Web 3.0)
The Semantic Web [4], also known as Web 3.0, is the future of
the Web as envisioned by Tim Berners-Lee, the inventor of the
World Wide Web. Although artificial intelligence has been
studied extensively, the benefits that an ordinary user derives
from it are still very limited. It is envisioned that users of
the Web will benefit from the Semantic Web in the future as
concepts of artificial intelligence become easier to implement
in web-based applications.
Most Web content is designed to be read by human beings, so the
content or data available on a website cannot readily be
processed by machines. This is because a common global standard
for data and website implementation does not exist across
websites. The Semantic Web lays down standards to be followed
so that structure can be brought into web content. This helps
developers create Semantic Web agents that can access these
web pages automatically and have the inference power to conduct
automated reasoning [1].
Several technologies together make up the Semantic Web. They
are described below.
1) Resource Description Framework (RDF)
The challenge for the Semantic Web is to provide a language
that expresses both data and rules for reasoning about the
data. Meaning is expressed in RDF as triples [4]. Each triple
contains a subject, a predicate, and an object. When different
sources use different terms for the same concept, ontologies, a
third basic component of the Semantic Web, formally define the
relationships among terms. There are multiple RDF serialization
formats, such as RDF/XML, Turtle and N3.
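To make the subject-predicate-object structure concrete, the following sketch models triples as plain Python tuples and writes them out in N-Triples, a simple line-based RDF syntax that is a subset of Turtle. The `example.org` URIs, the `Triple` class and the `to_ntriples` helper are hypothetical names chosen for illustration; a real application would more likely use an RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # a URI, or a literal already wrapped in double quotes


EX = "http://example.org/"  # hypothetical namespace

triples = [
    Triple(EX + "movie/1", EX + "vocab/director", EX + "person/7"),
    Triple(EX + "movie/1", EX + "vocab/title", '"An Example Film"'),
]


def to_ntriples(triple):
    """Serialize one triple as a single N-Triples line."""
    is_literal = triple.obj.startswith('"')
    obj = triple.obj if is_literal else f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


for t in triples:
    print(to_ntriples(t))
```

The first triple links two resources, while the second attaches a literal value to a resource; both follow the same three-part pattern.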
2) Linked Open Data
To realise the full potential of the web, it is essential that
all web data be available as a single global data space.
This is the concept of Linked Open Data (LOD), where
organisations, government agencies and individuals publish
their data on the web such that it is interconnected and at the
same time accessible by Semantic Web-enabled applications.
Linked data is mainly about publishing structured data in RDF
using Uniform Resource Identifiers (URIs) [5]. It refers to a
set of best practices to be followed for publishing and
connecting structured data on the Web [4].
Semantic Web applications rely on people and organizations
publishing their data on to the Linked Open Data cloud in a
structured format. Tim Berners-Lee outlined the set of
principles known as the Linked Data principles to be followed
when publishing data on the web. The linked data principles
[6] are as follows:
• Use URIs as names for things.
• Use Hypertext Transfer Protocol (HTTP) URIs so that
people can look up those names.
• When someone looks up a URI, provide useful information,
using standards such as RDF and SPARQL (SPARQL Protocol
and RDF Query Language).
• Include links to other URIs so that they can discover more
things.
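As an illustration of the third principle, the sketch below prepares (but does not send) a SPARQL query as an HTTP GET request. It assumes the public DBpedia endpoint at `https://dbpedia.org/sparql` and the `dbo:director` property; the query itself and the helper name `build_sparql_request` are illustrative choices, not taken from any of the systems surveyed here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: the public DBpedia SPARQL service.
ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?film WHERE {
  ?film dbo:director <http://dbpedia.org/resource/Stanley_Kubrick> .
} LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build a GET request that passes the query in the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SPARQL results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would execute the query; that step is left out so the example runs without network access.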
Every Linked Open Data (LOD) dataset can be understood
as a Semantic Web application that helps the end user in some
way [7]. In 2007, Chris Bizer and Richard Cyganiak submitted
the Linked Open Data (LOD) project application to the W3C,
marking the start of linked data development. As of
September 1, 2011, 295 datasets had been published and
interlinked by the project, consisting of over 31 billion RDF
triples interlinked by approximately 504 million RDF links [8].
3) Semantic Web Services
Semantic Web Services (SWS) provide features that allow
new services to be added, discovered, and composed
dynamically. The processes that might be able to use the web
services are updated automatically to reflect the new forms of
cooperation. SWS combine the flexibility, reusability, and
universal access that typically characterise a web service
along with the expressivity of semantic markup and reasoning,
in order to make the invocation, composition, mediation and
automatic execution of complex services feasible [4].
4) Semantic Web Applications
Applications are built to use ontologies and data published
in Linked Open Data as RDF to display and infer different
conclusions based on the inference model that has been built
into the application.
5) Ontology Development
Traditionally, text-mining techniques have been used to
facilitate the building of ontologies for the Semantic Web from
text. However, traditional systems employ shallow natural
language processing techniques and focus only on concept and
taxonomic relation extraction. Ontology development is an
active research area for Semantic Web technologies [9].
C. Folksonomies
Folksonomies are a feature provided by Web 2.0 to end
users. A folksonomy is a classification technique in which
end users attach keywords, also known as tags, to each item or
page freely and subjectively. The end user can choose any
word as a tag and can attach one or more tags to an item. These
tags also represent a form of user-item rating that can be used
to provide better recommendations [10].
III. IMPLEMENTATIONS OF RECOMMENDER SYSTEMS USING
SEMANTIC WEB TECHNOLOGIES
Recommender Systems have been implemented using
Semantic Web technologies. Some of these are:
• MORE - Movie Recommendations using DBpedia [11]
• Taste It! Try It! - Mobile restaurant review and
recommendation application [12]
• Semantic-Enhanced Case-Based Reasoning for Intelligent
Recommendation [13]
• Ontology-Based Personalized and Context-Aware
Recommendations of News Items [14]
• Ontology-based TV programs recommender system [15]
• Semantic Web-enabled tourism recommender system [15]
In this paper, we shall discuss two of them in detail and the
remaining ones in brief.
A. MOvie REcommendation (MORE)
In [11], R. Mirizzi et al. present MORE, a Facebook
application that recommends movies by using the details of the
user from their Facebook account along with data from the
Linked Data cloud: DBpedia and LinkedMDB, the semantically
enabled version of the Internet Movie Database (IMDb).
Two movies are considered similar if:
• They are directly related. This can be found using
properties such as dbpedia-owl:subsequentWork or
dbpedia-owl:previousWork.
• They are the subjects of two RDF triples that have the
same predicate and the same object. For example, two
movies have the same director. This can be found using
queries that use properties such as dbpedia-owl:director.
• They are the objects of two RDF triples that have the
same subject and the same predicate. For example, the
cast members of a movie share the movie as subject and
starring as predicate. The property used in this scenario
is dbpedia-owl:starring.
• They belong to the same category or sub-category. This is
handled in DBpedia using the properties dcterms:subject
and skos:broader.
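The first two relations above can be sketched over an in-memory list of triples. This is only an illustration of the idea, not the MORE implementation: the abbreviated `m:`, `p:` and `c:` identifiers are hypothetical, and MORE itself retrieves such relations from DBpedia and LinkedMDB.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (subject, predicate, object) triples with abbreviated names.
triples = [
    ("m:A", "dbpedia-owl:director", "p:X"),
    ("m:B", "dbpedia-owl:director", "p:X"),
    ("m:A", "dbpedia-owl:subsequentWork", "m:C"),
    ("m:B", "dcterms:subject", "c:SciFi"),
    ("m:C", "dcterms:subject", "c:SciFi"),
]


def shared_property_pairs(triples):
    """Map each pair of subjects to the (predicate, object) pairs they share."""
    subjects_by_po = defaultdict(set)
    for s, p, o in triples:
        subjects_by_po[(p, o)].add(s)
    shared = defaultdict(set)
    for po, subjects in subjects_by_po.items():
        for a, b in combinations(sorted(subjects), 2):
            shared[(a, b)].add(po)
    return dict(shared)


def directly_related(triples, movies):
    """Pairs linked by a triple whose subject and object are both movies."""
    return {(s, o) for s, _, o in triples if s in movies and o in movies}


pairs = shared_property_pairs(triples)
direct = directly_related(triples, {"m:A", "m:B", "m:C"})
```

Here `m:A` and `m:B` are related through a shared director, `m:B` and `m:C` through a shared category, and `m:A` and `m:C` through a direct link.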

A semantic adaptation of the Vector Space Model (VSM), a model
originally used in text retrieval, is applied to RDF graphs.
The whole RDF graph is represented as a three-dimensional
tensor where each slice refers to an ontology property. For
every property, a movie is seen as a vector whose components
are term frequency-inverse document frequency (TF-IDF)
weights. The degree of similarity between two movies is
quantified by the cosine of the angle between their vectors.
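For a single property slice, the TF-IDF weighting and cosine comparison can be sketched as follows. The data is hypothetical, term frequency is taken to be 1 whenever a movie has a given value, and the inverse document frequency is the plain `log(n / df)` form; the exact weighting used in [11] may differ.

```python
from math import log, sqrt

# Hypothetical slice for one property (e.g. starring): movie -> resources.
starring = {
    "m:A": {"p:1", "p:2"},
    "m:B": {"p:2", "p:3"},
    "m:C": {"p:4"},
}


def tfidf_vectors(movie_to_values):
    """Build sparse TF-IDF vectors, one per movie, for one property."""
    n = len(movie_to_values)
    document_frequency = {}
    for values in movie_to_values.values():
        for value in values:
            document_frequency[value] = document_frequency.get(value, 0) + 1
    # TF is 1 for every value a movie has, so the weight is just the IDF.
    return {
        movie: {v: log(n / document_frequency[v]) for v in values}
        for movie, values in movie_to_values.items()
    }


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = tfidf_vectors(starring)
sim_ab = cosine(vectors["m:A"], vectors["m:B"])  # share one actor
sim_ac = cosine(vectors["m:A"], vectors["m:C"])  # share none
```

An overall score would combine the per-property similarities, for example as a weighted sum; how the weights are chosen is outside the scope of this sketch.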
In the system, once the application is loaded, the user can
search for a particular movie by typing a few characters in the
corresponding text field. The system populates an autocomplete
list of movies ranked by a PageRank algorithm adapted to the
DBpedia subgraph related to movies. The user can select any of
the movies as a favourite. The system then recommends forty
movies that are similar to the selected movie. The system
combines content-based filtering, using DBpedia and LinkedMDB,
with collaborative filtering, using the similarities between
users.
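The ranking of autocomplete suggestions can be illustrated with a textbook power-iteration PageRank. This is the standard algorithm on a toy graph, not the DBpedia-specific adaptation used by MORE; the link graph, the damping factor of 0.85 and the `autocomplete` helper are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict: node -> list of linked nodes."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


def autocomplete(prefix, titles, rank, limit=5):
    """Titles starting with `prefix`, highest-ranked first."""
    matches = [t for t in titles if t.lower().startswith(prefix.lower())]
    return sorted(matches, key=lambda t: rank.get(t, 0.0), reverse=True)[:limit]


# Hypothetical link graph between movie resources.
links = {
    "Alien": ["Aliens"],
    "Aliens": ["Alien"],
    "Alien 3": ["Alien", "Aliens"],
    "Amadeus": ["Alien"],
}
rank = pagerank(links)
suggestions = autocomplete("al", list(links), rank)
```

Movies that many other resources link to rise to the top of the suggestion list, which is the intuition behind using PageRank for autocomplete ordering.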
B. Taste It! Try It!
In [12], S. Łazaruk et al. present a semantically enabled
Social Web recommender application called Taste It! Try It!
This application is based on the idea that LOD contains
data that is diverse in nature. The goal of the application is to
make annotating easy so that the end user can create reviews
easily and the information can be used further in
recommender systems.
This application is targeted at the following two groups:
• Data producers: Users providing reviews of restaurants.
• Data consumers: People interested in the reviews.
When a person visits a restaurant and wants to review it, they
can open the Taste It! Try It! application on their mobile
phone. The application captures the position and place using
the phone's GPS and lets the user attach a semantically
annotated review to this location. The application asks the
user to rate the location on various parameters. Once the user
has entered all the values, they can add some free text if they
wish and then save the review. The review is saved to the
servers and a semantic recommendation is created in the
background. The application also gives the user a special title
if the quality of the annotation is good, which is then visible
to other Facebook users.
The ratings given by the user can then be used by semantically
enabled recommender systems when searching for restaurants
that fulfil certain criteria. Thus this application:
• Provides semantically enabled reviews
• Keeps the end user entertained
• Offers personalized, semantic-aware recommendations
C. Semantic-Enhanced Case-Based Reasoning for Intelligent
Recommendation
H. Wang et al. have proposed a case-based recommender
system using Semantic Web technologies in [13]. Their approach
integrates both content and rating information. The similarity
between the current case and a retrieved case is measured
using a semantic similarity algorithm. The domain ontology
provides a formal representation, which includes semantic
descriptions of users and products. By considering semantic
information about both the products' content descriptions and
the users' preferences, the proposed approach overcomes
limitations of traditional recommender systems.
D. Recommendations for News Items
In [14], I. Cantador et al. have proposed a news item
recommender system that uses Semantic Web technologies to
suggest which news items should be shown to the end user on
the screen. Their model personalizes the order in which news
articles are shown to the user based on the interests of the user.
E. Tourism Recommender System
In [15], an application called TripFromTV+ is discussed. This
is an interactive application that creates lowest-price,
tailor-made tourist packages by helping the viewer decide what
to do and what to visit during a trip. The application infers
the viewer's preferences from the kinds of TV programs that
they have enjoyed. It also uses the user's activities on social
networking sites, whose diffusion mechanisms are exploited to
make existing tourism offers known among the viewer's contacts.
The paper shows how interactive TV applications can
incorporate content from the Internet by creating seamlessly
integrated presentations that give the viewer the advantages
of network capabilities in the TV environment through
domestic and mobile consumer devices.

IV. IMPLEMENTATIONS OF RECOMMENDER SYSTEMS USING
FOLKSONOMIES
In [10], K. Tso-Sutter et al. have proposed that
recommendation quality can be improved by using the metadata
in the tags associated with items as additional knowledge.
With the increasing popularity of collaborative tagging
systems, tags can provide useful information to enhance RS
algorithms. Attributes are termed "global descriptions" of
items, and tags "local descriptions" given by the users.
They have proposed a generic method that allows tags to be
incorporated into standard collaborative filtering (CF)
algorithms by reducing the three-dimensional correlations
between users, items and tags to three two-dimensional
correlations and then applying a fusion method to re-associate
these correlations.
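The reduction from three dimensions to two can be illustrated by projecting (user, item, tag) assignments onto the three pairwise relations. The data below is hypothetical, and the final step is a deliberately simple weighted blend of two user similarities; it stands in for, and is not, the fusion method of [10].

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (user, item, tag) assignments.
assignments = [
    ("u1", "i1", "scifi"),
    ("u1", "i2", "space"),
    ("u2", "i1", "scifi"),
    ("u2", "i3", "drama"),
    ("u3", "i3", "drama"),
]


def project(assignments, row, col):
    """Collapse the 3-D data into a binary 2-D relation, e.g. user x item."""
    relation = defaultdict(set)
    for record in assignments:
        relation[record[row]].add(record[col])
    return dict(relation)


user_item = project(assignments, 0, 1)
user_tag = project(assignments, 0, 2)
item_tag = project(assignments, 1, 2)


def set_cosine(a, b):
    """Cosine similarity between two binary vectors stored as sets."""
    return len(a & b) / sqrt(len(a) * len(b)) if a and b else 0.0


def fused_user_similarity(u, v, weight=0.5):
    """Blend item-based and tag-based user similarity (assumed fusion rule)."""
    by_items = set_cosine(user_item.get(u, set()), user_item.get(v, set()))
    by_tags = set_cosine(user_tag.get(u, set()), user_tag.get(v, set()))
    return weight * by_items + (1.0 - weight) * by_tags


similarity = fused_user_similarity("u1", "u2")
```

The `item_tag` projection would play the same role for item-based CF, where two items are compared by the tags they have received.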
V. CHALLENGES
Although semantic web-related technologies look very
promising for RS, their acceptance and implementation pose
challenges. They include:
• Data versus application availability: Semantic Web-based
applications suffer from a vicious circle of data versus
application availability. Organizations are not investing
much in publishing their data to the LOD cloud because few
applications use this data and provide business benefit.
Application developers, in turn, are not creating new and
improved applications because not enough data has been
published on the LOD cloud for new applications to use.
Such a vicious circle arises whenever a new path-breaking
technology starts to be accepted and implemented in
mainstream applications.
• Management of URIs [5]: Linked data is mainly about
publishing structured data in RDF using URIs rather than
focusing on the ontological level or inferencing. This
simplification lowers the entry barrier for data providers,
just as the Web, based on URLs, simplified the established
academic approaches of hypertext systems. However, all the
RDF data on websites needs to be independently accessible
using URIs.
• Creation and selection of vocabularies: An important
aspect of ontology creation and selection is deciding which
ontologies to use or extend. It is strongly advised to
reuse existing vocabularies, and extend them if required,
rather than create new ones for each type of application
being developed.
• Handling provenance and trust [1]: The RS depends on
data drawn from multiple sites. The question of how to
represent the provenance and trustworthiness of data drawn
from many sources in an integrated view is a significant
research challenge. Tim Berners-Lee proposed that the
browser interface should be enhanced with an "Oh, yeah?"
button [16] to support the user in assessing the
reliability of information encountered on the web. Whenever
a user encounters a piece of information that they would
like to verify, pressing such a button would produce an
explanation of the trustworthiness of the displayed
information. This goal is yet to be realized.
• Addressing the quality of service [1]: A range of
content-based, context-based, and rating-based techniques
can be used to heuristically assess the relevance and
quality of data. The resulting assessments can be viewed by
other users of the dataset to understand its quality.
• Performance and scalability issues [1]: Linked data can
be accessed by Semantic Web-enabled applications using
techniques such as advanced crawling and caching. However,
as the number of datasets increases over time, applications
may need widespread link traversal and crawling, which can
degrade their performance. It is necessary to make sure
that an increase in the data in the LOD cloud does not
impact the performance of Semantic Web-enabled
applications. Any performance issues will have an adverse
effect on the growing popularity of Semantic Web
technologies.
• Link maintenance [1]: The content of linked data is
continuously changing, while the RDF links between data
sources are updated only sporadically. This leads to dead
links pointing to URIs that are no longer maintained as new
data is published. Web architecture is tolerant of dead
links, but too many can lead to unnecessary HTTP requests.
This is an area of research that is receiving a lot of
attention.
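The link maintenance challenge can be illustrated with a small check that flags RDF links whose target no longer resolves. The `is_resolvable` callback is a hypothetical hook: in practice it might issue an HTTP HEAD request or consult a cache, but here it is a simple set lookup so the sketch runs offline. The `example.org` and `example.net` URIs are likewise placeholders.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical RDF links from a local dataset to external datasets.
links = [
    ("http://example.org/movie/1", OWL_SAME_AS, "http://example.net/film/1"),
    ("http://example.org/movie/2", OWL_SAME_AS, "http://example.net/film/2"),
]

# Hypothetical snapshot of external URIs known to resolve.
known_uris = {"http://example.net/film/1"}


def find_dead_links(triples, is_resolvable):
    """Return the triples whose object URI fails the caller-supplied check."""
    return [
        triple
        for triple in triples
        if triple[2].startswith("http") and not is_resolvable(triple[2])
    ]


dead = find_dead_links(links, lambda uri: uri in known_uris)
```

Running such a check periodically, and caching its results, is one way to limit the unnecessary HTTP requests mentioned above.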
VI. CONCLUSIONS AND FURTHER WORK
The Semantic Web provides a foundation and framework that
assists human beings in inferring knowledge using artificial
intelligence. This study reveals that RS application creators
are slowly coming to realize the benefits of the data present on
the LOD and better RSs based on Semantic Web technologies
are being envisioned and designed. The real power of the
Semantic Web will be realized once developers start creating
Semantic Web-enabled software agents that collect content
from diverse sources, process the information, and exchange
results with other programs. The quality of the data available
on the LOD also needs to be enhanced further. We believe
that this would happen as more and more applications are
designed to use the data on the LOD to provide better
recommendations. Better data on the LOD, along with proper
extraction of user-item relationships from folksonomies,
would help in the creation of superior analysis and
recommendation tools, which will in turn help end users make
wiser choices.
REFERENCES
[1] Bizer Christian, Heath Tom, Berners-Lee Tim, "Linked Data - The
Story So Far," International Journal on Semantic Web and Information
Systems, vol. 5, no. 3, pp. 1-22, 2009.
[2] Karen H. L. Tso-Sutter, Leandro Balby Marinho and Lars
Schmidt-Thieme, "Tag-aware Recommender Systems by Fusion of
Collaborative Filtering Algorithms," in SAC '08, 2008.
[3] G. Adomavicius and A. Tuzhilin, "Toward the next generation of
recommender systems: a survey of the state-of-the-art and possible
extensions," IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734-749,
2005.
[4] Berners-Lee Tim, Hendler James, and Lassila Ora, "The Semantic
Web," Scientific American, pp. 35-43, May 2001.
[5] Hausenblas Michael, "Exploiting Linked Data For Building Web
Applications," IEEE Internet Computing, vol. 13, no. 4, pp. 68-73,
July-Aug. 2009.
[6] Heath, T., Bizer, C., "Linked Data: Evolving the Web into a Global
Data Space," Morgan and Claypool, 2011.
http://linkeddatabook.com/editions/1.0/ (accessed August 15, 2012)
[7] Halb Wolfgang, Raimond Yves, Hausenblas Michael, "Building
Linked Data For Both Humans and Machines," Linked Data on the
Web Workshop at the 17th International World Wide Web Conference
2008 (WWW2008), Beijing, China, 2008.
[8] Hongbo Lai, Yushun Fan, Le Xin and Hui Liang, "The Framework of
Web 3.0-Based Enterprise Knowledge Management System," 7th
International Conference on Knowledge Management in Organizations:
Service and Cloud Computing, Advances in Intelligent Systems and
Computing, vol. 172, pp. 345-351, 2013.
[9] Xing Jiang, Ah-Hwee Tan, "CRCTOL: A Semantic-based domain
ontology learning system," Journal of the American Society for
Information Science & Technology, vol. 61, no. 1, pp. 150-168, Jan.
2010.
[10] Karen H. L. Tso-Sutter, Leandro Balby Marinho and Lars
Schmidt-Thieme, "Tag-aware Recommender Systems by Fusion of
Collaborative Filtering Algorithms," in SAC '08, 2008.
[11] R. Mirizzi, T. Di Noia, A. Ragone, V. C. Ostuni, and E. Di Sciascio,
"Movie recommendation with DBpedia," in 3rd Italian Information
Retrieval Workshop (IIR 2012). CEUR-WS, 2012.
[12] Szymon Łazaruk, Jakub Dzikowski, Monika Kaczmarek and Witold
Abramowicz, "Semantic Web Recommendation Application," in
Federated Conference on Computer Science and Information Systems,
pp. 1083-1090, 2012.
[13] Huimin Wang, Guihua Nie, Donglin Chen, "Semantic-Enhanced Case-
Based Reasoning for Intelligent Recommendation," Computer Science
and Information Engineering, WRI World Congress, 2009.
[14] Iván Cantador, Alejandro Bellogín, Pablo Castells, "Ontology-based
Personalised and Context-aware Recommendations of News Items,"
Web Intelligence and Intelligent Agent Technology, 2008.
[15] Yolanda Blanco-Fernández, Martín López-Nores, José J. Pazos-Arias,
Jorge García-Duque, Manuela I. Martín-Vicente, "TripFromTV+:
Targeting Personalized Tourism to Interactive Digital TV Viewers by
Social Networking and Semantic Reasoning," IEEE Transactions on
Consumer Electronics, vol. 57, no. 2, May 2011.
[16] Berners-Lee, T., "Cleaning up the User Interface," Section: The "Oh,
yeah?"-Button. http://www.w3.org/DesignIssues/UI.html, February 6,
1997.
