Inferring User Search

ABSTRACT
For a broad-topic and ambiguous query, different users may have

different search goals when they submit it to a search engine. The inference and
analysis of user search goals can be very useful in improving search engine
relevance and user experience. In this paper, we propose a novel approach to
infer user search goals by analyzing search engine query logs. First, we propose
a framework to discover different user search goals for a query by clustering the
proposed feedback sessions. Feedback sessions are constructed from user click-
through logs and can efficiently reflect the information needs of users. Second,
we propose a novel approach to generate pseudo-documents to better represent
the feedback sessions for clustering. Finally, we propose a new criterion
Classified Average Precision (CAP) to evaluate the performance of inferring
user search goals. Experimental results are presented using user click-through
logs from a commercial search engine to validate the effectiveness of our
proposed methods.

INTRODUCTION

In web search applications, queries are submitted to search engines to
represent the information needs of users. However, sometimes queries may not
exactly represent users specific information needs since many ambiguous
queries may cover a broad topic and different users may want to get information
on different aspects when they submit the same query. Therefore, it is necessary
and potential to capture different user search goals in information
retrieval. We define user search goals as the information on different aspects of
a query that user groups want to obtain. Information need is a users particular
desire to obtain information to satisfy his/her need. User search goals can be
considered as the clusters of information needs
for a query. The inference and analysis of user search goals can have a lot of
advantages in improving search engine relevance and user experience

According to user search goals by grouping the search results with the
same search goal; thus, users with different search goals can easily find what
they want. Second, user search goals represented by some keywords can be
utilized in query recommendation .Thus, the suggested queries can help users to
form their queries more precisely. Third, the distributions of user search goals
can also be useful in applications such as re ranking web search results that
contain different user search goals.
With the diverse and explosive growth of Web information, how to
organize and utilize the information effectively and efficiently has become more
and more critical. This is especially important for Web 2.0 related applications
since user-generated information is more freestyle and less structured, which
increases the difficulties in mining useful information from these data sources.
In order to satisfy the information needs of Web users and improve the user

experience in many Web applications, Recommender Systems, have been well
studied in academia and widely deployed in industry. Typically, recommender
systems are based on Collaborative Filtering which is a technique that
automatically predicts the interest of an active user by collecting rating
information from other similar users or items. The underlying assumption of
collaborative filtering is that the active user will prefer those items which other
similar users prefer. Based on this simple but effective intuition, collaborative
filtering has been widely employed in some large, well-known commercial
systems, including product recommendation at
Amazon,1 movie recommendation at Netflix,2 etc. Typical collaborative
filtering algorithms require a user-item rating matrix which contains user-
specific rating preferences to infer users characteristics. However, in most of
the cases, rating data are always unavailable since information on the Web is
less structured and more diverse.
Fortunately, on the Web, no matter what types of data sources are used
for recommendations, in most cases, these data sources can be modeled in the
form of various types of graphs. If we can design a general graph
recommendation algorithm, we can solve many recommendation problems on
the Web. However, when designing such a framework for recommendations on
the Web, we still face several challenges that need to be addressed.
The first challenge is that it is not easy to recommend latent semantically
relevant results to users. Take Query Suggestion as an example, there are
several outstanding issues that can potentially degrade the quality of the
recommendations, which merit investigation. The first one is the ambiguity
which commonly exists in the natural language. Queries containing ambiguous
terms may confuse the algorithms which do not satisfy the information needs of

users. Another consideration, as reported, is that users tend to submit
short queries consisting of only one or two terms under most circumstances, and
short queries are more likely to be ambiguous.
Through the analysis of a commercial search engines query logs
recorded over three months in 2006, we observe that 19.4 percent of Web
queries are single term queries, and further 30.5 percent of Web queries contain
only two terms. Third, in most cases, the reason why users perform a search is
because they have little or even no knowledge about the topic they are searching
for. In order to find satisfactory answers, users have to rephrase their queries
constantly.
The second challenge is how to take into account the personalization
feature. Personalization is desirable for many scenarios where different users
have different information needs. As an example, Amazon.com has been the
early adopter of personalization technology to recommend products to shoppers
on its site, based upon their previous purchases. Amazon makes an extensive
use of collaborative filtering in its personalization technology. The adoption of
personalization will not only filter out irrelevant information to a person, but
also provide more specific information that is increasingly relevant to a persons
interests.
The last challenge is that it is time consuming and inefficient to design
different recommendation algorithms for different recommendation tasks.
Actually, most of these recommendation problems have some common features,
where a general framework is needed to unify the recommendation tasks on the
Web. Moreover, most of existing methods are complicated and require to tune a
large number of parameters. In this paper, aiming at solving the problems
analyzed above, we propose a general framework for the recommendations on
the Web. This framework is built upon the heat diffusion on both undirected
graphs and directed graphs, and has several advantages.

1. It is a general method, which can be utilized to many recommendation
tasks on the Web.
2. It can provide latent semantically relevant results to the original
information need.
3. This model provides a natural treatment for personalized
recommendations.
4. The designed recommendation algorithm is scalable to very large data
sets.
The empirical analysis on several large scale data sets (AOL clickthrough
data and Flicker image tags data) shows that our proposed framework is
effective and efficient for generating high-quality recommendations.
TECHNIQUE USED
Collaborative Filtering
Collaborative filtering (CF) is a technique used by some recommender
systems. Collaborative filtering has two senses, a narrow one and a more
general one. In general, collaborative filtering is the process of filtering for
information or patterns using techniques involving collaboration among
multiple agents, viewpoints, data sources, etc. Applications of collaborative
filtering typically involve very large data sets.
Collaborative filtering methods have been applied to many different kinds
of data including: sensing and monitoring data, such as in mineral exploration,
environmental sensing over large areas or multiple sensors; financial data, such
as financial service institutions that integrate many financial sources; or in
electronic commerce and web applications where the focus is on user data, etc.
The remainder of this discussion focuses on collaborative filtering for
user data, although some of the methods and approaches may apply to the other
major applications as well.

In the newer, narrower sense, collaborative filtering is a method of
making automatic predictions (filtering) about the interests of a user by
collecting preferences or taste information from many users (collaborating).
Learning to Order for CF
A flaw in rating-based CF data
users tend to rate on different scales
this makes ratings hard to aggregate and transfer
A solution:
disbelieve (ignore) a users absolute ratings
believe (use in training) relative values
e.g., if user rates item j
1
at 5 and item j
2
as 8 then believe
j
1
is preferred to

j
2 .

i.e., treat CF as a problem of learning to rank items.
Sentiment analysis (also known as opinion mining) refers to the use
of natural language processing, text analysis andcomputational linguistics to
identify and extract subjective information in source materials.

Inferring User Search

Enviado por

Dados do documento

Direitos autorais

Formatos disponíveis

Compartilhar este documento

Compartilhar ou incorporar documento

Opções de compartilhamento

Você considera este documento útil?

Este conteúdo é inapropriado?

Direitos autorais:

Formatos disponíveis

Inferring User Search

Enviado por

Direitos autorais:

Formatos disponíveis

ABSTRACT

For a broad-topic and ambiguous query, different users may have

Você também pode gostar