Recommender Systems rely heavily on data related to user ratings for items. This data exists in various systems and in multiple formats. This article looks at different implementations of RS using Semantic Web technologies and folksonomies.
Abstract

Many applications have been created using the various algorithms developed for Recommender Systems. These applications rely heavily on data related to user ratings for items. However, this data exists in various systems and in multiple formats, which makes it difficult for applications based on Recommender Systems to utilize the data and provide recommendations. Semantic Web technologies aid Recommender Systems by supplementing already available data with new pieces of information, which enables applications to process data about the user-item relationship better. Users also use tags to explain their liking for a particular item. These tags, or folksonomies, provide further information that helps us understand the user-item relationship. This article looks at different implementations of Recommender Systems using Semantic Web technologies and folksonomies and highlights the benefits and challenges of the current implementations.
Keywords: Recommender Systems; Semantic Web; Linked Open Data; Folksonomies

I. INTRODUCTION

To provide effective recommendations, Recommender Systems (RS) depend on rating or ranking data for items. Data used by RSs can include any of the following:

- Ratings for an item on certain characteristics
- The ratings given by users for items
- A mix of multiple rating mechanisms

The user-item rating data is usually collected at the level of an individual application. For example, once a user has seen a movie purchased from NetFlix.com or Amazon.com, they can rate it on that site. The user-movie rating is thus stored either on NetFlix.com or on Amazon.com. This becomes a restriction for RSs when portals try to rank and recommend items that have no associated ratings on their own site, although ratings for the same items are available on other sites.

The problem of item rating data existing in silos can be alleviated if all applications publish their data on the Linked Open Data (LOD) cloud [1]. RS-based applications can then use Semantic Web technologies to utilize the data available on the LOD cloud to provide effective recommendations. A richer information base can be constructed out of LOD for RSs to improve the quality of recommendations. It has been further observed [2] that recommendations can be improved by making use of the informal information that users enter for items in the form of tags.

This paper examines the current work done on the implementation of RSs using Semantic Web technologies and folksonomies. It details current implementations of Recommender Systems in different applications and the challenges that they pose.

II. RELATED TECHNOLOGIES

The different technologies used are detailed below.

A. Recommender Systems

Recommender Systems (RS) have been explained in [3] under the following headings:

1) Collaborative Filtering RS: Collaborative filtering helps predict the interests of a user by collecting preferences or taste information from many users. The underlying assumption of the collaborative filtering approach is that if person A and person B both like a particular item, then A is more likely to like another item that B likes than an item liked by a randomly chosen person.

2) Content-Based Filtering RS: Content-based filtering is based on the characteristics of items. The system suggests other items to the user based on the characteristics of the items that the user has liked.

3) Knowledge-Based Filtering RS: Knowledge-based filtering is appropriate where the number of ratings is low or where specific requirements need to be met for an item to satisfy a customer need.

4) Hybrid Filtering RS: Hybrid systems combine two or more of the above approaches to give an appropriate recommendation based on the data available.

B. Semantic Web Technologies: Web 3.0

The Semantic Web [4], also known as Web 3.0, is the future of the Web as envisioned by Tim Berners-Lee, the inventor of the World Wide Web. Although artificial intelligence has been studied extensively, the benefits that an ordinary user derives from it are very limited. It is envisioned that users of the Internet will benefit from the Semantic Web in the future as concepts of artificial intelligence become easier to implement in web-based applications.

Web content is written to be read by human beings, and the content or data available on a website cannot readily be processed by machines. This is because a common global standard for data and website implementation does not exist across websites. The Semantic Web lays down the standards to be followed so that structure can be brought into web content.
This helps developers create Semantic Web agents that can access these web pages automatically and have the inference power to conduct automated reasoning [1]. Several terms and technologies together make up the Semantic Web. They are described below.

1) Resource Description Framework (RDF)

The challenge for the Semantic Web is to provide a language that expresses both the data and the rules for reasoning about the data. Meaning is expressed in RDF as triples [4]. Each triple contains a subject, a predicate, and an object. Ontologies, another basic component of the Semantic Web, formally define the relationships among terms, for example when two terms have the same meaning. There are multiple RDF formats, such as RDF/XML, Turtle and N3.

2) Linked Open Data

To realise the full potential of the web, all web data needs to be available as a single global system. This is the concept of Linked Open Data (LOD), where different organisations, government agencies or individuals upload their data to the web such that it is interconnected and at the same time accessible by Semantic Web-enabled applications. Linked data is mainly about publishing structured data in RDF using Uniform Resource Identifiers (URIs) [5]. It refers to a set of best practices to be followed for publishing and connecting structured data over the Internet [4]. Semantic Web applications rely on people and organizations publishing their data to the Linked Open Data cloud in a structured format.

Tim Berners-Lee outlined a set of principles, known as the Linked Data principles, to be followed when publishing data on the web. The Linked Data principles [6] are as follows:

- Use URIs as names for things.
- Use Hypertext Transfer Protocol (HTTP) URIs so that people can look up those names.
- When someone looks up a URI, provide useful information, using standards such as RDF and SPARQL (SPARQL Protocol and RDF Query Language).
- Include links to other URIs so that people can discover more things.

Every Linked Open Data (LOD) dataset can be understood as a Semantic Web application that helps the end user in some way [7]. In 2007, Chris Bizer and Richard Cyganiak submitted the application of Linked Open Data (LOD) to W3C, representing the start of linked data development. As of September 1, 2011, 295 datasets had been published and interlinked by the project, consisting of over 31 billion RDF triples, which are interlinked by approximately 504 million RDF links [8].

3) Semantic Web Services

Semantic Web Services (SWS) provide features that allow new services to be added, discovered, and composed dynamically. The processes that might be able to use the web services are updated automatically to reflect the new forms of cooperation. SWS combine the flexibility, reusability, and universal access that typically characterise a web service with the expressivity of semantic markup and reasoning, in order to make the invocation, composition, mediation and automatic execution of complex services feasible [4].

4) Semantic Web Applications

Semantic Web applications use ontologies and data published as RDF in the Linked Open Data cloud to display information and to infer conclusions based on the inference model built into the application.

5) Ontology Development

Traditionally, text-mining techniques have been used to facilitate the building of ontologies for the Semantic Web from text. However, traditional systems employ shallow natural language processing techniques and focus only on concept and taxonomic relation extraction. Ontology development is a major and active area of work within Semantic Web technologies [9].

C. Folksonomies

Folksonomies are a feature provided by Web 2.0 to end users. A folksonomy is a classification technique in which end users freely and subjectively attach keywords, also known as tags, to each item or page.
The end user can choose any word as a tag and can attach one or more tags to an item. These tags also represent a form of user-item rating that can be used to provide better recommendations [10].

III. IMPLEMENTATIONS OF RECOMMENDER SYSTEMS USING SEMANTIC WEB TECHNOLOGIES

Recommender Systems have been implemented using Semantic Web technologies. Some of these implementations are:

- MORE: Movie Recommendations using DBpedia [11]
- Taste It! Try It!: Mobile restaurant review and recommendation application [12]
- Semantic-Enhanced Case-Based Reasoning for Intelligent Recommendations [13]
- Ontology-Based Personalized and Context-Aware Recommendations of News Items [14]
- Ontology-based TV programs recommender system [15]
- Semantic Web-enabled tourism recommender system [15]

In this paper, we discuss two of them in detail and the remaining ones in brief.

A. MOvie REcommendation (MORE)

In [11], R. Mirizzi et al. have presented MORE. MORE is a Facebook application that recommends movies by using the details of the user from their Facebook account along with data from the Linked Data cloud: DBpedia and LinkedMDB, the semantic-enabled version of the Internet Movie Database (IMDb). Two movies are considered similar if:

- They are directly related. This can be found using properties such as dbpedia-owl:subsequentWork or dbpedia-owl:previousWork.
- Two subjects have the same predicate and the same object in an RDF triple of subject, predicate and object. For example, two movies have the same director. This can be found using queries that use properties such as dbpedia-owl:director.
- Two objects have the same subject and predicate. For example, the members of the star cast of a movie share the same movie as subject and the same predicate, starring. The property used in this scenario is dbpedia-owl:starring.
- They belong to the same category or sub-category. This is handled in DBpedia using the properties dcterms:subject and skos:broader.
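The similarity conditions listed above can be illustrated on a handful of RDF-style triples. The following Python sketch is not the MORE implementation: the `ex:` resources and the triples are invented for illustration, the helper name `related_movies` is hypothetical, and only two of the conditions are covered (direct links, and a shared predicate-object pair such as a common director or category). Traversing skos:broader to reach sub-categories is left out.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples; all "ex:" resources are invented.
TRIPLES = [
    ("ex:MovieA", "dbpedia-owl:subsequentWork", "ex:MovieB"),
    ("ex:MovieA", "dbpedia-owl:director", "ex:Director1"),
    ("ex:MovieC", "dbpedia-owl:director", "ex:Director1"),
    ("ex:MovieA", "dcterms:subject", "ex:CategoryX"),
    ("ex:MovieD", "dcterms:subject", "ex:CategoryX"),
    ("ex:MovieD", "dbpedia-owl:starring", "ex:Actor1"),
]

# Properties that link one movie directly to another.
DIRECT = {"dbpedia-owl:subsequentWork", "dbpedia-owl:previousWork"}


def related_movies(movie, triples=TRIPLES):
    """Map each related movie to the set of properties that relate it to `movie`."""
    reasons = defaultdict(set)

    # Direct relation, in either direction of the triple.
    for s, p, o in triples:
        if p in DIRECT:
            if s == movie:
                reasons[o].add(p)
            elif o == movie:
                reasons[s].add(p)

    # Shared predicate and object (same director, same category, ...).
    own_pairs = {(p, o) for s, p, o in triples if s == movie and p not in DIRECT}
    for s, p, o in triples:
        if s != movie and (p, o) in own_pairs:
            reasons[s].add(p)

    return dict(reasons)


if __name__ == "__main__":
    for other, props in sorted(related_movies("ex:MovieA").items()):
        print(other, sorted(props))
```

In a real deployment the same patterns would more likely be expressed as queries against a SPARQL endpoint rather than as loops over an in-memory list; the list is used here only to keep the sketch self-contained.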
A semantic adaptation of the Vector Space Model (VSM), a model originally used for text retrieval, is applied to RDF graphs. The whole RDF graph is represented as a three-dimensional tensor in which each slice refers to an ontology property. For every property, a movie is seen as a vector whose components are term frequency-inverse document frequency (TF-IDF) weights. The degree of similarity between two movies is the correlation between their two vectors and is quantified by the cosine of the angle between them.

Once the application is loaded, the user can search for a particular movie by typing a few characters in the corresponding text field. The system populates an autocomplete list of movies, ranked using the PageRank algorithm adapted to the DBpedia subgraph related to movies. The user can select any of the movies as a favourite. The system then recommends forty movies that are similar to the selected movie. The system uses content-based filtering based on DBpedia and LinkedMDB, and collaborative filtering based on the similarities between users.

B. Taste It! Try It!

In [12], S. Łazaruk et al. have presented a semantically enabled Social Web recommender application called Taste It! Try It! This application is based on the idea that LOD contains data that is diverse in nature. The goal of the application is to make annotation easy so that the end user can create reviews easily and the information can be used further in recommender systems. The application is targeted at the following two groups:

- Data producers: Users providing reviews of restaurants.
- Data consumers: People interested in the reviews.

When a person goes to a restaurant and then wants to review it, they can access the Taste It! Try It! application on their mobile phone. The application captures the position and place using the GPS on the phone and enables the user to attach a semantically enabled review to this location. The application asks the user to rate the location on various parameters.
Once the user has entered all the values, they can add some free text if they want to and then save the review. The review is saved to the servers and a semantic recommendation is created in the background. The application also gives the user a special title if the quality of the annotation is good, which is then visible to other Facebook users. The ratings given by the user can then be used by semantic-enabled recommender systems while searching for restaurants that fulfil certain criteria. Thus this application:

- Provides semantically enabled reviews
- Keeps the end user entertained
- Offers personalized, semantic-aware recommendations

C. Semantic-Enhanced Case-Based Reasoning for Intelligent Recommendation

H. Wang et al. have proposed a case-based recommender system using semantic technologies in [13]. Their approach integrates both content and rating information. The similarity between the current case and a retrieved case is measured using a semantic similarity algorithm. The domain ontology provides a formal representation, which includes semantic descriptions of users and products. By considering the semantic information of both the products' content descriptions and the users' preferences, the proposed approach overcomes limitations of traditional recommender systems.

D. Recommendations for News Items

In [14], I. Cantador et al. have proposed a news item recommender system that uses Semantic Web technologies to suggest which news items should be shown to the end user on the screen. Their model personalizes the order in which news articles are shown, based on the interests of the user.

E. Tourism Recommender System

In [15], an application called TripFromTV+ is discussed. This is an interactive application that creates lowest-price, tailor-made tourist packages by helping the viewer decide what to do and what to visit during a trip. The application infers the viewers' preferences from the kinds of TV programs that they have enjoyed. It also uses the users' activities on social networking sites, whose diffusion mechanisms are exploited to make the existing tourism offers known among the viewers' contacts. The paper shows how interactive TV applications can incorporate content from the Internet by creating seamlessly integrated presentations that give the viewer the advantages of network capabilities in the TV environment, through domestic and mobile consumer devices.
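Of the implementations in this section, MORE (Section III.A) describes its similarity measure in enough detail to sketch: for each property a movie is a TF-IDF vector, and two movies are compared by the cosine of the angle between their vectors. The Python sketch below is a minimal illustration under stated assumptions and is not the authors' code. The movie data is invented, term frequency is taken to be 1 for every value a movie has, and the per-property cosines are combined with an unweighted mean, which is an assumption made here for simplicity.

```python
import math

# Hypothetical data: movie -> property -> set of values for that property.
MOVIES = {
    "MovieA": {"director": {"D1"}, "starring": {"A1", "A2"}},
    "MovieB": {"director": {"D1"}, "starring": {"A2", "A3"}},
    "MovieC": {"director": {"D2"}, "starring": {"A4"}},
}


def tfidf_vector(movie, prop, movies=MOVIES):
    """Sparse TF-IDF vector of `movie` for one property (tf assumed to be 1)."""
    n = len(movies)
    vector = {}
    for value in movies[movie].get(prop, ()):
        # Document frequency: number of movies that have this value.
        df = sum(1 for props in movies.values() if value in props.get(prop, ()))
        vector[value] = math.log(n / df)
    return vector


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is zero."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(a, b, props=("director", "starring"), movies=MOVIES):
    """Unweighted mean of the per-property cosine similarities (an assumption)."""
    scores = [
        cosine(tfidf_vector(a, p, movies), tfidf_vector(b, p, movies))
        for p in props
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(similarity("MovieA", "MovieB"))
    print(similarity("MovieA", "MovieC"))
```

Keeping one vector per property, rather than one vector per movie, mirrors the slice-per-property view of the RDF graph and makes it easy to weight properties differently later, for example to let a shared director count more than a shared category.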
IV. IMPLEMENTATIONS OF RECOMMENDER SYSTEMS USING FOLKSONOMIES

In [10], K. H. L. Tso-Sutter et al. have proposed that recommendation quality can be improved by using the metadata in the tags associated with items, since the tags provide additional knowledge. With the increasing popularity of collaborative tagging systems, tags can provide useful information to enhance RS algorithms. The authors regard attributes as global descriptions of items, and tags as local descriptions of items given by the users. They have proposed a generic method that allows tags to be incorporated into standard collaborative filtering (CF) algorithms by reducing the three-dimensional correlations to three two-dimensional correlations and then applying a fusion method to re-associate these correlations.

V. CHALLENGES

Although Semantic Web-related technologies look very promising for RS, their acceptance and implementation pose challenges. They include:

- Data versus application availability: Semantic Web-based applications suffer from a vicious circle of data versus application availability. Organizations are not investing much in publishing their data to the LOD cloud because few applications use this data and provide business benefit. On the other hand, application developers are not creating new and improved applications because not enough data is published on the LOD cloud that can be used by the new applications. This vicious circle of application versus data exists whenever a new path-breaking technology starts to be accepted and implemented in mainstream applications.

- Management of URIs [5]: Linked data is mainly about publishing structured data in RDF using URIs rather than focusing on the ontological level or inferencing. This simplification lowers the entry barrier for data providers, just as the Web based on URLs simplified the established academic approaches of hypertext systems. However, all the RDF data on websites needs to be independently accessible using URIs.

- Creation and selection of vocabularies: An important aspect of ontology creation and selection is deciding which ontologies to use or extend. It is strongly advised to reuse existing vocabularies, and extend them if required, rather than create new ones for each type of application being developed.

- Handling provenance and trust [1]: The RS depends on data drawn from multiple sites. How to represent the provenance and trustworthiness of data drawn from many sources into an integrated view is a significant research challenge. Tim Berners-Lee proposed that the browser interface should be enhanced with an "Oh, yeah?" button [16] to support the user in assessing the reliability of information encountered on the web. Whenever a user encounters a piece of information that they would like to verify, pressing such a button would produce an explanation of the trustworthiness of the displayed information. This goal is yet to be realized.

- Addressing the quality of service [1]: Different content-based, context-based, and rating-based techniques can be used to heuristically assess the relevance and quality of the data. The resulting assessment can be viewed by other users of the dataset to understand its quality.

- Performance and scalability issues [1]: Linked data can be accessed by different Semantic Web-enabled applications using techniques such as advanced crawling and caching. However, the increase in the number of datasets over time can degrade the performance of these applications, since it may necessitate widespread link traversal and crawling. It is necessary to make sure that an increase in the data on the LOD cloud does not impact the performance of Semantic Web-enabled applications. Any performance issues will have an adverse effect on the popularity being gained by Semantic Web technologies.

- Link maintenance [1]: The content of linked data is continuously changing.
The RDF links between data sources are updated only sporadically. This leads to dead links pointing to URIs that are no longer maintained as new data is published. Web architecture is tolerant of dead links, but too many of them can lead to unnecessary HTTP requests. This is an area of research that is receiving a lot of attention.

VI. CONCLUSIONS AND FURTHER WORK

The Semantic Web provides a foundation and framework that assists human beings in inferring knowledge using artificial intelligence. This study reveals that creators of RS applications are slowly coming to realize the benefits of the data present on the LOD cloud, and that better RSs based on Semantic Web technologies are being envisioned and designed. The real power of the Semantic Web will be realized once developers start creating Semantic Web-enabled software agents that collect content from diverse sources, process the information, and exchange results with other programs. The quality of the data available on the LOD cloud also needs to be enhanced further. We believe that this will happen as more and more applications are designed to use the data on the LOD cloud to provide better recommendations. Better data on the LOD cloud, along with proper extraction of user-item relationships from folksonomies, would help in the creation of superior analysis and recommendation tools, which will in turn help end users make wiser choices.

REFERENCES

[1] C. Bizer, T. Heath, and T. Berners-Lee, "Linked Data: The Story So Far," International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1-22, 2009.
[2] K. H. L. Tso-Sutter, L. B. Marinho, and L. Schmidt-Thieme, "Tag-aware Recommender Systems by Fusion of Collaborative Filtering Algorithms," in SAC '08, 2008.
[3] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions," IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734-749, 2005.
[4] T. Berners-Lee, J. Hendler, and O. Lassila, "The Semantic Web," Scientific American, pp. 35-43, May 2001.
[5] M. Hausenblas, "Exploiting Linked Data for Building Web Applications," IEEE Internet Computing, vol. 13, no. 4, pp. 68-73, July-Aug. 2009.
[6] T. Heath and C. Bizer, Linked Data: Evolving the Web into a Global Data Space. Morgan and Claypool, 2011. http://linkeddatabook.com/editions/1.0/ (accessed August 15, 2012).
[7] W. Halb, Y. Raimond, and M. Hausenblas, "Building Linked Data For Both Humans and Machines," Linked Data on the Web Workshop at the 17th International World Wide Web Conference (WWW2008), Beijing, China, 2008.
[8] H. Lai, Y. Fan, L. Xin, and H. Liang, "The Framework of Web 3.0-Based Enterprise Knowledge Management System," 7th International Conference on Knowledge Management in Organizations: Service and Cloud Computing, Advances in Intelligent Systems and Computing, vol. 172, pp. 345-351, 2013.
[9] X. Jiang and A.-H. Tan, "CRCTOL: A Semantic-based domain ontology learning system," Journal of the American Society for Information Science & Technology, vol. 61, no. 1, pp. 150-168, Jan. 2010.
[10] K. H. L. Tso-Sutter, L. B. Marinho, and L. Schmidt-Thieme, "Tag-aware Recommender Systems by Fusion of Collaborative Filtering Algorithms," in SAC '08, 2008.
[11] R. Mirizzi, T. Di Noia, A. Ragone, V. C. Ostuni, and E. Di Sciascio, "Movie recommendation with DBpedia," in 3rd Italian Information Retrieval Workshop (IIR 2012), CEUR-WS, 2012.
[12] S. Łazaruk, J. Dzikowski, M. Kaczmarek, and W. Abramowicz, "Semantic Web Recommendation Application," in Federated Conference on Computer Science and Information Systems, pp. 1083-1090, 2012.
[13] H. Wang, G. Nie, and D. Chen, "Semantic-Enhanced Case-Based Reasoning for Intelligent Recommendation," Computer Science and Information Engineering, WRI World Congress, 2009.
[14] I. Cantador, A. Bellogín, and P. Castells, "Ontology-based Personalised and Context-aware Recommendations of News Items," Web Intelligence and Intelligent Agent Technology, 2008.
[15] Y. Blanco-Fernández, M. López-Nores, J. J. Pazos-Arias, J. García-Duque, and M. I. Martín-Vicente, "TripFromTV+: Targeting Personalized Tourism to Interactive Digital TV Viewers by Social Networking and Semantic Reasoning," IEEE Transactions on Consumer Electronics, vol. 57, no. 2, May 2011.
[16] T. Berners-Lee, "Cleaning up the User Interface," Section "The 'Oh, yeah?'-Button." http://www.w3.org/DesignIssues/UI.html, February 6, 1997.