A Survey On Various Architectures, Models and Methodologies For Information Retrieval

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 1, January-February (2013), pp. 182-194. IAEME: www.iaeme.com/ijcet.asp. Journal Impact Factor (2012): 3.9580 (Calculated by GISI), www.jifactor.com
A SURVEY ON VARIOUS ARCHITECTURES, MODELS AND METHODOLOGIES FOR INFORMATION RETRIEVAL

Prakasha S sprakashjpg@yahoo.co.in RNSIT, Bengaluru 560098 Shashidhar HR shashi_dhara@yahoo.com RNSIT, Bengaluru 560098 Dr. G T Raju gtraju1990@yahoo.com RNSIT, Bengaluru 560098
ABSTRACT
The typical Information Retrieval (IR) model of the search process consists of three essentials: the query, the documents and the search results. A user looking to fulfill an information need has to formulate a query, usually consisting of a small set of keywords summarizing that need. The goal of an IR system is to retrieve documents containing information which might be useful or relevant to the user. Throughout the search process there is a loss of focus: keyword queries entered by users often do not suitably summarize their complex information needs, and IR systems do not sufficiently interpret the contents of documents, leading to result lists containing irrelevant and redundant information. The short keyword query used as input to the retrieval system can be supplemented with topic categories from structured Web resources. The topic categories can be used as query context to retrieve documents that are not only relevant to the query but also belong to a relevant topic category. Category information is especially useful for the task of entity ranking, where the user is searching for a certain type of entity such as companies or persons. Category information can help to improve the search results by promoting in the ranking pages that belong to relevant topic categories, or to categories similar to the relevant categories. Users may raise various queries to describe the same information need. For example, to search for the National Board of Accreditation, the queries "National Board of Accreditation (NBA)" or "NB Accreditation" may be formulated. Individual queries used directly cannot capture context concisely and accurately. Queries may also be ambiguous: NBA can be expanded as either National Basketball Association or National Board of Accreditation. Hence it becomes extremely important to use context-based queries that draw on the user's history and the user's present requirements in that context.
In this paper, an extensive survey has been made on different Architectures, Models and Methodologies that have been used in IR by various researchers along with the comparison of results against various performance metrics, also highlighting the need for context based query.
Keywords: Query Model, Ranking Model, feedback-model, Retrieval model, query context

1. INTRODUCTION
Given the constantly increasing information overflow of the digital age, the importance of IR has become critical. Web search is one of the most challenging problems of the Internet today, striving to provide users with the search results most relevant to their information needs. IR deals with the representation, storage, organization of, and access to information items such as documents, Web pages, online catalogues, structured and semi-structured records, and multimedia objects [Baeza-Yates and Ribeiro-Neto, 2011]. Web search engines are by far the most popular and heavily used IR applications. Once an information need arises, the next step in the search process is to translate it into a query that can be easily processed by the search engine. The primary goal of an IR system is to retrieve all the documents which are relevant to a user query while retrieving as few non-relevant documents as possible. To achieve this goal, IR systems must somehow 'interpret' the contents of the documents in a collection and rank them according to their degree of relevance to the user query. The 'interpretation' of a document involves extracting syntactic and semantic information from the document and using this information to match the user's information need. The notion of relevance is at the centre of IR. While the search process is straightforward for simple navigational information needs, more complex information needs call for focused retrieval methods. The notion of 'focused retrieval' can be defined as providing more direct access to relevant information by locating the relevant information inside the retrieved documents [Trotman et al., 2007]. The first element of the search process is the query. In an ideal situation this short keyword query is a suitable summarization of the information need, and the user only has to inspect the first few search results to fulfill that need.
To overcome the shallowness of the query, i.e., users entering only a few keywords that poorly summarize the information need, we add context to the query to focus the search results on the relevant context. We define context as all available information about the user's information need besides the query itself. Different forms of context can be considered to implicitly or explicitly gather more information on the user's search request. Potential forms of query context are document relevance and category information. The second element of search we examine is the documents. Documents on the Web are rich in structure. They can contain HTML structure, link structure, different types of classification schemes, etc. Most of the structural elements, however, are not used consistently throughout the Web. A key question is how to deal with all this (semi-)structured information, that is, how IR systems can 'interpret' these documents to reduce the shallowness in the document representation. A problem in Web search is the large amount of redundant and duplicate information on the Web. Web pages can have many duplicates or near-duplicates. Web pages containing redundant information can be hard for a search engine to recognize, but users easily recognize redundant information, and it usually does not help them in their search. Most structured Web resources have organized their information in such a way that they contain no redundant information, or significantly less of it [Anna Maria Kaptein 2011]. Structured resources provide two interesting opportunities: 'Documents categorized into a category structure' and 'Absence of redundant information'. Category information is of vital importance to a special type of search, namely entity ranking. Entity ranking is the task of finding documents representing entities of an appropriate entity type that are relevant to a query. Entities can be almost anything, from broad categories such as persons, locations and organizations to more specific types such as churches, science-fiction writers or CDs. Searchers looking for entities are arguably better served by being presented with a ranked list of entities directly, rather than a list of Web pages with relevant but also potentially redundant information about these entities. Category information can be used to favor pages belonging to appropriate entity types [Anna Maria Kaptein 2011]. Search intent and context are important criteria in catering to the user's query. Suppose a user raises the query "apple". It is hard to determine the user's search intent, that is, whether the user is interested in the history of Apple Inc. or in the fruit. Without looking at the context of the search, existing methods often suggest many queries for various possible intents, and thus achieve low accuracy in query suggestion. The query context, which consists of the search intent expressed by the user's recent queries, can help to better understand the user's search intent and to make more meaningful suggestions.

2. DIFFERENT MODELS USED IN IR
To retrieve relevant documents effectively, IR strategies typically transform the documents into a suitable representation. Each retrieval strategy incorporates a specific model for its document representation. Keke Cai et al. use a context-based retrieval model whose retrieval process relies on the KL-divergence retrieval model for the initial retrieval [9]. Similarly, Tangjian Deng et al. present a brain-memory-inspired, context-based information re-finding framework, which enables users to re-find previously accessed results through relevant contexts [16]. Yunping Huang et al. propose a new query model refinement approach, a random walk smoothing method which exploits the expanded terms and term relationships based on the feedback documents [13]. Xiaohui Yan et al. address the problem of context-aware query recommendation. Unlike existing approaches, which leverage query sequence patterns in query sessions, they use the clickthrough of the given query as the major clue to user search intent in order to provide context-aware recommendations [22]. Chang Liu and Nicholas J. Belkin propose a personalized IR model based on implicit acquisition of task type and document preferences as search context: user behaviors are observed and analyzed, and implicit relevance feedback is then used to rerank results or reformulate user queries so that users can search effectively and efficiently [4]. Huanhuan Cao et al. propose modeling search context by CRF [31]. Ji-Rong Wen et al. propose four models for contextual retrieval [20]. Protima Banerjee et al. use the Aspect Model, which forms the foundation of the Probabilistic Latent Semantic Analysis (PLSA) method. They also put forward a technique that estimates a relevance model from the query alone, without the need for training data. Yan Qi et al. propose query-driven, feedback-based conflict resolution. They have developed data structures and algorithms to enable feedback-based conflict resolution during query processing on imperfectly aligned data [25]. The models listed above are used for query expansion with the help of various feedback techniques; expanding the query adds context to it. The same models are also used for query ranking. A comparison of these models is presented in Table 1.
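As a rough illustration of how a KL-divergence retrieval model and feedback-based query expansion fit together, the sketch below scores documents by the rank-equivalent form of the negative KL divergence between a query model and a smoothed document model, and expands the query by interpolating the initial query model with a feedback model. It is a minimal sketch under stated assumptions, not the formulation of any surveyed paper: the Dirichlet smoothing, the value of `mu`, the toy documents and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

def doc_model(doc_tokens, coll_counts, coll_len, mu=10.0):
    """Dirichlet-smoothed unigram model p(w | d); mu is an assumed value."""
    counts = Counter(doc_tokens)
    dlen = len(doc_tokens)
    def p(w):
        p_coll = coll_counts.get(w, 0) / coll_len
        return (counts.get(w, 0) + mu * p_coll) / (dlen + mu)
    return p

def interpolate(initial, feedback, lam):
    """Expanded query model: lam * initial + (1 - lam) * feedback."""
    words = set(initial) | set(feedback)
    return {w: lam * initial.get(w, 0.0) + (1 - lam) * feedback.get(w, 0.0)
            for w in words}

def kl_score(query_model, p_doc):
    """Rank-equivalent to -KL(query || doc): sum of p(w|q) * log p(w|d).
    Words unseen in the whole collection are skipped (a simplification)."""
    score = 0.0
    for w, pq in query_model.items():
        pd = p_doc(w)
        if pq > 0 and pd > 0:
            score += pq * math.log(pd)
    return score

# Toy collection (hypothetical data).
docs = {
    "d1": "nba national board accreditation college".split(),
    "d2": "nba basketball league game".split(),
}
coll_counts = Counter(t for toks in docs.values() for t in toks)
coll_len = sum(coll_counts.values())

initial = {"nba": 1.0}
feedback = {"accreditation": 0.5, "board": 0.5}  # assumed feedback model
expanded = interpolate(initial, feedback, lam=0.5)

def rank(query_model):
    scores = {name: kl_score(query_model, doc_model(toks, coll_counts, coll_len))
              for name, toks in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

With the initial query alone the shorter basketball document ranks first; after interpolation with the feedback model the accreditation document moves to the top, which is the sense in which expansion adds context to the query.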
Model: KL_Divergence Retrieval Model [9]. Author: Keke Cai. Parameters: top-ranked document list and ranked list average. Inference: MRR improved by 19.7%, 25.5% and 24.1%, respectively. Inputs: top-ranked documents.
Model: Query model refinement approach [13]. Author: Yunping Huang. Parameters: score of each vertex; a parameter that controls the weight of the initial query model. Inference: setting the parameter to 0.1 or 0.2 usually yields the best retrieval performance. Inputs: query; feedback documents.
Model: Probabilistic model [22]. Author: Xiaohui Yan. Inference: 51.1% of the query occurrences and 51.7% of the URL clicks remained.
Model: Modeling Search Context by CRF [20]. Author: Huanhuan Cao. Parameters: document dmax. Inference: improvement in precision and recall (no % specified). Inputs: queries.
Model: Aspect Model [25]. Author: Protima Banerjee. Parameters: smoothing parameter. Inputs: documents with probability p(d).
Model: Query-driven feedback-based conflict resolution [15]. Author: Yan Qi et al. Parameters: k-simple paths. Inference: the stabbed version was 60% faster. Inputs: user query.
Model: Query Model and Ranking Model [33]. Author: Liang Jeff Chen. Parameters: vk-aggregate document parameter sc. Inference: mean precision 10.2 for 30 queries. Inputs: set of keywords (Qk).
Approach column entries, in the order listed: Markov Random Field (MRF); MRF-based sentence retrieval; Bayesian network; random walk smoothing method; high-order method; Intuitive Model; Query and Context Model; Eliminate Noisy Elements Model; PLSA method; concept matching; Quest; the FICSR preprocessing module; constraint analysis and system feedback; user feedback.
Table 1. Comparison of various models used by different authors for IR

3. THE VARIOUS ARCHITECTURES OF IR
Various architectures for query context have been defined because existing systems do not rank query patterns according to context. Some of these architectures are described below. Giorgio Orsi et al. have proposed the SAFE architecture, which receives a sequence of keywords as input and produces, as output, a ranking over a set of query patterns, possibly with a suggested assignment for their parameters [19]. They also propose the Context Model, an instantiation of the context vocabulary that defines the context model for the given application. In particular, the Context Model specifies the (possibly hierarchical) context dimensions for the specific application, along with their possible values. A K Sharma et al. propose the Query Semantic Search System (QUESEM, /Qu-sem/) to improve search quality. QUESEM maintains a database of definitions (referred to as the Definition Repository) as the core of the system to accomplish its task [26]. Haizhou Fu et al. propose the CoSi system architecture, which consists of three core components: an indexer, a context-sensitive cost model and a query interpreter [23]. Christian Sengstock and Michael Gertz propose the architecture of the CONQUER system, which is composed of a model generation component, a model index, and a suggestion service [37]. Reiner Kraft et al. propose the overall Y!Q system design and architecture. The Y!Q back-end comprises three major system components for processing contextual search queries: Content Analysis (CA), Query Planning and Rewriting Framework (QPW), and Contextual Ranking (CR) [29]. Liang Jeff Chen et al. propose a Query Model and a Ranking Model. In the Query Model, a document, denoted by d, is modeled as a tuple of fields, each consisting of a bag of words [33]. The architectures mentioned above aim to improve the retrieval process by enhancing the context of the query. A comparison of these architectures is presented in Table 2.
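The document representation described for the Query Model (a document as a tuple of fields, each a bag of words) can be sketched as a small data structure. This is only an illustrative sketch: the `Document` class, the `keyword_score` function and the per-field weights are hypothetical names and assumptions, not part of the model as published.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Document:
    """A document d modeled as named fields, each holding a bag of words."""
    doc_id: str
    fields: Dict[str, Counter]

def make_document(doc_id, raw_fields):
    """Build the bag-of-words fields from raw text (whitespace tokenization)."""
    return Document(
        doc_id,
        {name: Counter(text.lower().split()) for name, text in raw_fields.items()},
    )

def keyword_score(doc, keywords, field_weights):
    """Weighted count of keyword occurrences across fields.
    The per-field weights are an assumption made for illustration."""
    score = 0.0
    for name, bag in doc.fields.items():
        weight = field_weights.get(name, 1.0)
        score += weight * sum(bag[k.lower()] for k in keywords)
    return score

doc = make_document("d1", {
    "title": "Context based query",
    "body": "query context improves retrieval of query results",
})
weights = {"title": 2.0, "body": 1.0}
score = keyword_score(doc, ["query"], weights)  # 2.0 * 1 + 1.0 * 2 = 4.0
```

Keeping the fields separate, rather than merging them into one bag, is what lets a ranking function treat a match in the title differently from a match in the body.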
Architecture: SAFE architecture [19]. Authors: Giorgio Orsi. Inputs: keyword search. Models / Methods: the Context Model. Inference: 65% of queries were found at the top of the ranked list; in 25% of cases, users found the query in the second position.
Architecture: CoSi system architecture [23]. Authors: Haizhou Fu. Inputs: keyword queries. Models / Methods: indexer; context-sensitive cost model; query interpreter. Inference: CoSi will learn what the user is asking for and rank the intended interpretation higher, so that end users can find it more easily.
Architecture: Architecture of the CONQUER system [37]. Authors: Christian Sengstock. Inputs: patterns and their synopses. Models / Methods: Model Generator; Model Index; Suggestion Service. Inference: space complexity of O(1) per node in the FP-tree and O(1) runtime-complexity overhead for each node update operation.
Architecture: Y!Q system design and architecture [29]. Authors: Reiner Kraft. Models / Methods: CA component; QPW; CR. Inference: Y!Q is superior to Yahoo! WS for 32.3% of the context and query pairs, while Yahoo! WS is better for only 8.3% of them (with 59.4% tied).
Table 2: Comparison of various Architectures proposed by different authors for IR
4. METHODOLOGIES PROPOSED BY DIFFERENT AUTHORS
A K Sharma et al. propose two algorithms, Local Site Search for Query, and Definition Generation & Annotation. As the response pages are retrieved from dictionary-based sites, it is assumed that they will contain the direct thesaurus entries and synonyms of the query terms [26]. Lidong Bing et al. propose a scoring algorithm and a Latent Topic Analysis and Training Algorithm [32]. Wenwei Xue et al. propose algorithms for context attribute matching and context schema matching [27]. Reiner Kraft et al. proposed two algorithms for ranking and filtering documents, namely rank averaging and MC4 [29]. Liang Jeff Chen et al. propose a Data-Mining-based Selection algorithm and a graph decomposition algorithm [33]. Huanhuan Cao et al. propose an algorithm for clustering queries. In their method, a cluster C is a set of queries [36]. Ziming Zhuang and Silviu Cucerzan propose a re-ranking algorithm, Q-Rank. Q-Rank is based on a straightforward yet very effective rationale: the most frequently seen query extensions of a target query (terms extracted from queries that contain the target query as an affix) and adjacent queries (queries that immediately precede or follow a query in a user search session) provide important hints about users' search intents [35]. Zhen Liao et al. propose Query Stream Clustering with Iterative Scanning (QSC-IS), Query Stream Clustering with Master-Slave Model (QSC-MS) and a query suggestion algorithm [1]. Mariam Daoud et al. proposed a session-based personalized search algorithm that sets out the overall process of session-based personalized search [30]. Minmin Chen et al. proposed an adaptive self-training algorithm [31]. Self-training is a very commonly used algorithm to wrap complex models for semi-supervised learning [30]. The various algorithms used in IR range from query clustering and query ranking to query suggestion and query expansion.
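Rank averaging, one of the two ranking algorithms attributed to Kraft et al. above, can be sketched as follows. The sketch takes k ranked lists (for example the top results of k sub-queries), assigns each document the average of its positions across the lists, and sorts by that average. The handling of documents missing from a list (position `len(list) + 1`) and the tie-breaking by document id are assumptions made for illustration, not details taken from the original work; MC4 is not shown.

```python
def rank_average(ranked_lists):
    """Merge k ranked lists by the average position of each document.

    A document missing from a list is treated as if it appeared just past
    the end of that list (an assumption made for this sketch).
    """
    all_docs = {d for lst in ranked_lists for d in lst}
    average = {}
    for d in all_docs:
        total = 0
        for lst in ranked_lists:
            position = lst.index(d) + 1 if d in lst else len(lst) + 1
            total += position
        average[d] = total / len(ranked_lists)
    # Lower average position is better; ties are broken by document id.
    return sorted(all_docs, key=lambda d: (average[d], d))

lists = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
merged = rank_average(lists)  # ['b', 'a', 'c', 'd']
```

Here "b" has average position 4/3, "a" has 2, "c" has 3 and "d" has 11/3, which gives the merged order shown in the comment.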
Query clustering groups similar queries that lead users to view the same or similar documents. In query ranking algorithms, the queries are ranked according to the frequency with which users raise them. The algorithms based on query expansion use some kind of feedback or probabilistic technique to expand the query. A comparison of these methodologies is presented in Table 3.

5. APPLICATIONS OF IR
The applications of IR are mainly classified into general applications and domain-specific applications. General applications include digital libraries, search engines, etc. Domain-specific applications include expert search finding, genomic IR, geographic IR, etc.

5.1 General applications of IR
Digital libraries: A digital library is a library in which collections are stored in digital formats (as opposed to print, microform, or other media) and are accessible by computers.
Author: A K Sharma et al. [26]. Technique / Methodology: Definition_Generator_Annotator(D); Local_Site_searching. Parameters considered: keywords. Outcome / Results: from 0.6 lakhs to 1.6 lakhs relevant results achieved out of 2.5 lakhs results.
Author: Lidong Bing et al. [32]. Technique / Methodology: scoring algorithm; latent topic analysis and training algorithm. Parameters considered: query; query ranking. Outcome / Results: the differences between the performance of their method and CTA are significant at significance level 0.05.
Author: Wenwei Xue et al. [27]. Technique / Methodology: context attribute matching; context schema matching. Parameters considered: a pair of context attributes; the schema matcher integrates a local schema into the current set of global schemas. Outcome / Results: CAMSUBSYN achieved as high as 100% precision and 64% recall on their dataset.
Author: Reiner Kraft et al. [29]. Technique / Methodology: rank averaging algorithm; MC4 algorithm. Parameters considered: assigning a score to every position in a rank list; the input is k ranked lists, which are the top few results of k sub-queries. Outcome / Results: 95% confidence interval of [2.873, 2.972], compared to an average of 2.54 ([2.45, 2.66]) based on ComScore (which includes MSN, Google, and Yahoo).
Author: Liang Jeff Chen et al. [33]. Technique / Methodology: data-mining-based selection algorithm; graph decomposition algorithm. Parameters considered: two keyword combinations P1, P2. Outcome / Results: the average number of MeSH terms in a citation after the inheritance is 44; better ranking in 21 out of 30 queries.
Author: Huanhuan Cao et al. [36]. Technique / Methodology: algorithm for clustering queries. Parameters considered: diameter parameter Dmax. Outcome / Results: the average overall precision of CRF-B, CRF-B-C and CRF-B-C-T is improved across different K by 50%, 52% and 57%, respectively.
Author: Ziming Zhuang et al. [35]. Technique / Methodology: re-ranking algorithm. Parameters considered: interpolation parameter; adjacent queries. Outcome / Results: when varying the interpolation parameter, Q-Rank improved the rankings for 75.8% of the re-ranked queries on average.
Author: Zhen Liao et al. [1]. Technique / Methodology: query stream clustering with iterative scanning; query stream clustering with master-slave model; query suggestion. Parameters considered: the M1-th query; x mod M; preceding queries. Outcome / Results: total response time is still small, about 0.3 millisecond.
Author: Mariam Daoud [30]. Technique / Methodology: session personalized search algorithm. Parameters considered: query. Outcome / Results: the setting (r = 0,3) produces the best improvement in personalized search, since it produces the higher precision improvement at P@5 (11,63%).
Author: Minmin Chen [31]. Technique / Methodology: adaptive self-training with conditional random fields. Parameters considered: unlabeled queries. Outcome / Results: 51.38% precision with only 10% of the training data labeled.
Table 3: Comparison of different methodologies for IR

The digital content may be stored locally, or accessed remotely via computer networks. A digital library is a type of IR system.
Search engines:
- Desktop search: The name for the field of search tools which search the contents of a user's own computer files, rather than searching the Internet. These tools are designed to find information on the user's PC, including web browser histories, e-mail archives, text documents, sound files, images and video.
- Enterprise search: Enterprise search is the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience.
- Federated search: Federated search is an IR technology that allows the simultaneous search of multiple searchable resources. A user makes a single query request, which is distributed to the search engines participating in the federation. The federated search then aggregates the results received from the search engines for presentation to the user.
- Mobile search: Mobile search is an evolving branch of IR services centered on the convergence of mobile platforms with mobile phones and other mobile devices. Web search engine capability in a mobile form allows users to find mobile content on websites which are available to mobile devices on mobile networks.
- Social search: Social search, or a social search engine, is a type of web search that takes into account the Social Graph of the person initiating the search query. When applied to web search, this Social-Graph approach to relevance is in contrast to established algorithmic or machine-based approaches, where relevance is determined by analyzing the text of each document or the link structure of the documents.
Web search: A web search engine is designed to search for information on the World Wide Web. The search results are generally presented in a list of results often referred to as Search Engine Results Pages (SERPs). The information may consist of web pages, images and other types of files. Some search engines also mine data available in databases or open directories.

5.2 Domain Specific applications of IR
In domain-specific IR, the information and its classification are based on a particular domain. The domain may be the legal system, a geographic system, etc.
Expert search finding: Expert search is a task of growing importance in enterprise settings. An expert search system predicts and ranks the expertise of a set of candidate persons with respect to the user's query.
Genomic IR: The in-silico revolution has changed how biologists characterise DNA and protein sequences. As a first step to exploring the structure and function of an unknown sequence, biologists search large genomic databases for similar sequences. This process of genomic IR has allowed significant advances in biology and led to advancements in critical areas such as cancer research.
Geographic IR: Geographic IR (GIR) is the augmentation of IR with geographic metadata. GIR involves extracting and resolving the meaning of locations in unstructured text. This is known as geo-parsing. After identifying location references in text, a GIR system must index this information for search and retrieval.
Legal IR: Legal IR is the science of IR applied to legal text, including legislation, case law, and scholarly works. Accurate legal IR is important for providing access to the law to laymen and legal professionals.
Vertical search: A vertical search engine, as distinct from a general web search engine, focuses on a specific segment of online content. The vertical content area may be based on topicality, media type, or genre of content. Common verticals include shopping, the automotive industry, legal information, medical information, and travel.

5.3 Other Applications of IR
IR has also been applied in other fields, such as adversarial IR, automatic summarization, question answering, etc.
Adversarial IR: Adversarial IR is a topic in IR related to strategies for working with a data source where some portion of it has been manipulated maliciously. Tasks can include gathering, indexing, filtering, retrieving and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.
Automatic summarization: Automatic summarization is the creation of a shortened version of a text by a computer program. The phenomenon of information overload has meant that access to coherent and correctly developed summaries is vital. As access to data has increased, so has interest in automatic summarization. Summarization technology is used, for example, in the Google search engine.
Multi-document summarization: Multi-document summarization is an automatic procedure aimed at extracting information from multiple texts written about the same topic.
- Compound term processing: Compound term processing is the name used for a category of techniques in IR applications that perform matching on the basis of compound terms. Compound terms are built by combining two (or more) simple terms; for example, "triple" is a single-word term but "triple heart bypass" is a compound term.
Cross-lingual retrieval: Cross-Language IR (CLIR) is a subfield of IR dealing with retrieving information written in a language different from the language of the user's query.
- Document classification: The task of document classification is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is used mainly in information science and computer science.
Spam filtering: A statistical technique of e-mail filtering that makes use of a naive Bayes classifier to identify spam e-mail.
Question answering: Question Answering (QA) is a computer science discipline within the fields of IR and Natural Language Processing (NLP) which is concerned with building systems that automatically answer questions posed by humans in a natural language. A QA implementation, usually a computer program, may construct its answers by querying a structured database of knowledge or information, usually a knowledge base.
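The naive Bayes classifier mentioned under spam filtering above can be sketched in a few lines. This is a minimal multinomial naive Bayes with add-one (Laplace) smoothing, given only as an illustration; the class name, the whitespace tokenization, the smoothing choice and the toy training messages are all assumptions, and a practical filter would need far more data and preprocessing.

```python
import math
from collections import Counter

class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one smoothing (illustrative sketch).

    Both labels need at least one training message before classify() is used.
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def _log_posterior(self, words, label, vocab_size):
        total_docs = sum(self.doc_counts.values())
        logp = math.log(self.doc_counts[label] / total_docs)  # class prior
        total_words = sum(self.word_counts[label].values())
        for w in words:
            # Add-one smoothing keeps unseen words from zeroing the product.
            count = self.word_counts[label][w]
            logp += math.log((count + 1) / (total_words + vocab_size))
        return logp

    def classify(self, text):
        words = text.lower().split()
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        scores = {label: self._log_posterior(words, label, len(vocab))
                  for label in ("spam", "ham")}
        return max(scores, key=scores.get)

spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win money now", "spam")
spam_filter.train("cheap money offer", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("project report attached", "ham")
```

Calling `spam_filter.classify("win cheap money")` labels the message as spam, while `spam_filter.classify("project meeting monday")` labels it as ham, because each message shares words only with the training messages of one class.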
6. OPEN ISSUES/CHALLENGES
Although the models discussed meet their stated objectives efficiently, they still lack an efficient retrieval process when context is to be considered. When a user submits a query for the first time, the search engine is unable to find a context for the query. However, if some events on web pages can be captured, this problem can be resolved. Some of the open challenges in this area are:
- Reducing the volume of documents for effective retrieval, i.e., improving the quality of the documents considered for retrieval by filtering out irrelevant and redundant documents
- Ranking of structured and unstructured documents for better accuracy in retrieval
- Context awareness in both modeling and scaling up of query suggestion
- Visualization and presentation of search results with in-depth summarized analysis.
To address the above challenges, we propose a novel retrieval technique in which the query is based on context along with concept. It enhances retrieval by exploiting unstructured documents, which can increase the focused retrieval of documents, especially from the Web, by capturing the user's recent browsing sessions. The snippets used in modern Web search are query-based and have proven to be better than static document summaries. For instance, word clouds can be examined with respect to the following:
- Depth on the query side: Adding depth on the user side is a bottleneck for delivering more accurate retrieval results. Users provide only 2 to 3 keywords on average to search the complete Web.
- Depth in the document representation: Documents on the Web are rich in structure. Most of the structural elements, however, are not used consistently throughout the Web. A key question is how to deal with semi-structured information.
- Depth on the result side: While a query can have thousands of relevant results, only the first 10 or 20 results will get any attention in a Web search interface. Often these first n results will still contain redundant information.
Our main objective is to exploit query context and document structure to address the following challenges:
- Ambiguity in the query from the user
- Appropriate feedback from the user search logs
- Effective use and exploitation of structured and unstructured documents for better query formulation and search results.
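To make the idea of resolving query ambiguity from recent session context concrete, the sketch below picks an expansion for an ambiguous query (such as the NBA example from the abstract) by counting term overlap with the user's recent queries. It is a deliberately simple illustration rather than the technique proposed here: the function name, the overlap heuristic, the candidate list and the sample session queries are all assumptions.

```python
def pick_expansion(query, recent_queries, candidates):
    """Choose the candidate expansion sharing the most terms with the
    user's recent queries; fall back to the raw query if nothing overlaps."""
    context_terms = {t for q in recent_queries for t in q.lower().split()}

    def overlap(expansion):
        return len(context_terms & set(expansion.lower().split()))

    best = max(candidates, key=overlap)
    return best if overlap(best) > 0 else query

candidates = [
    "National Basketball Association",
    "National Board of Accreditation",
]
academic_session = ["engineering college accreditation", "accreditation board criteria"]
sports_session = ["basketball scores"]

academic_choice = pick_expansion("NBA", academic_session, candidates)
sports_choice = pick_expansion("NBA", sports_session, candidates)
no_context_choice = pick_expansion("NBA", [], candidates)
```

With the academic session the query resolves to "National Board of Accreditation", with the sports session to "National Basketball Association", and with no session history the query is left unchanged, which mirrors the first-time-query problem noted above.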
7. CONCLUSION
In this paper, we have discussed and analyzed, in terms of their performance, various models, algorithms and architectures that have been used by researchers in IR. The models discussed are used for query ranking and for query expansion, with the help of various feedback techniques that add context to the query. The architectures discussed are either completely new architectures or variations on existing architecture models that improve the retrieval process by enhancing the context of the query. The algorithms used in IR range from query clustering and query ranking to query suggestion and query expansion. Query clustering groups similar queries that lead users to view a similar set of documents. In query ranking algorithms, the queries are ranked according to the frequency with which users submit them. The algorithms based on query expansion use some kind of feedback or probabilistic technique to expand the query. Although the models discussed meet their stated objectives efficiently, they still lack an efficient retrieval process when context is to be considered. Hence, exploiting structured and unstructured documents to increase the focused retrieval of documents from the Web remains a challenging problem.

REFERENCES
[1] Zhen Liao, Nankai University, Daxin Jiang, Microsoft Research Asia, Enhong Chen, University of Science and Technology of China, Jian Pei, Simon Fraser University, Huanhuan Cao, University of Science and Technology of China, and Hang Li, Microsoft Research Asia, "Mining Concept Sequences from Large-Scale Search Logs for Context-Aware Query Suggestion", ACM Transactions, October 2011.
[2] Mario Cataldi, Università di Torino, Claudio Schifanella, Università di Torino, K. Selçuk Candan, Arizona State University, Maria Luisa Sapino, Università di Torino, and Luigi Di Caro, Università di Torino, "CoSeNa: a Context-based Search and Navigation System", October 2009, ACM.
[3] Michal Kajaba and Pavol Navrat, "Personalized Web Search Using Context Enhanced Query", International Conference on Computer Systems and Technologies - CompSysTech'09.
[4] Chang Liu and Nicholas J. Belkin, "Implicit Acquisition of Context for Personalization of Information Retrieval Systems", CaRR 2011, February 13, 2011, Stanford, CA, USA.
[5] Ziv Bar-Yossef, Google Inc., MATAM, Bldg 30, Israel, and Naama Kraus, Computer Science Department, Technion, Israel, "Context-Sensitive Query Auto-Completion", CIKM'10, October 26-30, 2010, Toronto, Ontario, Canada. Copyright 2010 ACM.
[6] Rianne Kaptein, University of Amsterdam, "Effective Focused Retrieval by Exploiting Query Context and Document Structure", ACM, October 6, 2011.
[7] Zheng Ye (1,2), Xiangji Huang (2) and Hongfei Lin (1); (1) Department of Computer Science and Engineering, Dalian University of Technology, Dalian, China; (2) School of Information Technology, York University, Toronto, Ontario, M3J 1P3, Canada, "A Bayesian Network Approach to Context Sensitive Query Expansion", SAC'11, March 21-25, 2011, TaiChung, Taiwan. Copyright 2011 ACM.
[8] Minmin Chen (1), Jian-Tao Sun (2), Xiaochuan Ni (2) and Yixin Chen (1); (1) Department of Computer Science and Engineering, Washington University in Saint Louis, Saint Louis, MO, USA; (2) Microsoft Research Asia, Beijing, P.R. China, "Improving Context-Aware Query Classification via Adaptive Self-training", October 24-28, 2011, Glasgow, Scotland, UK. Copyright 2011 ACM.
[9] Keke Cai, Chun Chen, Jiajun Bu, Peng Huang and Zhiming Kang, College of Computer Science, University Hangzhou, China, "Exploration of Query Context for Information Retrieval", May 8-12, 2007, Banff, Alberta, Canada. ACM.
[10] Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman and Eytan Ruppin, Zapper Technologies, Inc., "Placing Search in Context: The Concept Revisited", ACM Transactions on Information Systems, Vol. 20, No. 1, January 2002.
[11] Raymond Y.K. Lau, Centre for Information Technology Innovation, Queensland University of Technology, and Peter D. Bruza and Dawei Song, Distributed Systems Technology Centre, The University of Queensland, Australia, "Belief Revision for Adaptive Information Retrieval", July 25-29, 2004, Sheffield, South Yorkshire, UK. Copyright 2004 ACM.
[12] Jiang Bian, College of Computing, Georgia Institute of Technology, Tie-Yan Liu and Tao Qin, Microsoft Research Asia, and Hongyuan Zha, College of Computing, Georgia Institute of Technology, "Ranking with Query-Dependent Loss for Web Search", February 4-6, 2010, New York City, New York, USA. Copyright 2010 ACM.
[13] Yunping Huang and Le Sun, Institute of Software, Chinese Academy of Sciences, Beijing, China, and Jian-Yun Nie, Department of Computer Science and Operations Research, University of Montreal, Canada, "Query Model Refinement Using Word Graphs", October 26-30, 2010, Toronto, Ontario, Canada. Copyright 2010 ACM.
[14] Jing Bai (1), Jian-Yun Nie (1), Hugues Bouchard (2) and Guihong Cao (1); (1) Department IRO, University of Montreal, Canada; (2) Yahoo! Inc., Montreal, Quebec, Canada, "Using Query Contexts in Information Retrieval", July 23-27, 2007, Amsterdam, The Netherlands. Copyright 2007 ACM.
[15] Yan Qi, Arizona State University, Tempe, USA, K. Selçuk Candan, Arizona State University, Tempe, AZ 85287, USA, and Maria Luisa Sapino, Università di Torino, Italy, "FICSR: Feedback-based InConSistency Resolution and Query Processing on Misaligned Data Sources", June 12-14, 2007, Beijing, China. Copyright 2007 ACM.
[16] Tangjian Deng, Liang Zhao and Ling Feng, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China, and Wenwei Xue, Nokia Research Center, Beijing, China, "Information Re-finding by Context: A Brain Memory Inspired Approach", October 24-28, 2011, Glasgow, Scotland, UK. Copyright 2011 ACM.
[17] Xing Wei, Fuchun Peng, Huihsin Tseng, Yumao Lu and Benoit Dumoulin, Yahoo! Labs, California, USA, "Context Sensitive Synonym Discovery for Web Search Queries", November 2-6, 2009, Hong Kong, China. Copyright 2009 ACM.
[18] Ivan T. Bowman and Kenneth Salem, School of Computer Science, University of Waterloo, "Optimization of Query Streams Using Semantic Prefetching", June 13-18, 2004, Paris, France. Copyright 2004 ACM.
[19] Giorgio Orsi, Politecnico di Milano, Italy, Letizia Tanca, Politecnico di Milano, Italy, and Eugenio Zimeo, Università del Sannio, Italy, "Keyword-based, Context-aware Selection of Natural Language Query Patterns", March 22-24, 2011, Uppsala, Sweden. Copyright 2011 ACM.
[20] Huanhuan Cao (1), Daxin Jiang (2), Jian Pei (3), Enhong Chen (1) and Hang Li (2); (1) University of Science and Technology of China; (2) Microsoft Research Asia; (3) Simon Fraser University, "Towards Context-Aware Search by Learning A Very Large Variable Length Hidden Markov Model from Search Logs", April 20-24, 2009, Madrid, Spain. ACM.
[21] Carla Teixeira Lopes, Departamento de Engenharia Informática, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, Portugal, and Cristina Ribeiro, Departamento de Engenharia Informática, Faculdade de Engenharia, Universidade do, "Context Effect on Query Formulation and Subjective Relevance in Health Searches", August 18-21, 2010, New Brunswick, New Jersey, USA. Copyright 2010 ACM.
[22] Xiaohui Yan, Jiafeng Guo and Xueqi Cheng, Institute of Computing Technology, CAS, Beijing, China, "Context-Aware Query Recommendation by Learning High-Order Relation in Query Logs", October 24-28, 2011, Glasgow, Scotland, UK. Copyright 2011 ACM.
[23] Haizhou Fu, Sidan Gao and Kemafor Anyanwu, North Carolina State University, Raleigh, NC, "CoSi: Context-Sensitive Keyword Query Interpretation on RDF Databases", March 28 - April 1, 2011, Hyderabad, India. ACM.
[24] Ying-Hsang Liu and Nicholas J. Belkin, Rutgers University, USA, "Query Reformulation, Search Performance, and Term Suggestion Devices in Question-Answering Tasks", Information Interaction in Context, 2008, London, UK. Copyright 2008 ACM.
[25] Protima Banerjee and Hyoil Han, College of Information Science and Technology, Drexel University, Philadelphia, USA, "Incorporation of Corpus-Specific Semantic Information into Question Answering Context", October 30, 2008, Napa Valley, California, USA. Copyright 2008 ACM.
[26] A. K. Sharma and Neelam Duhan, Computer Engg. Department, YMCA Univ. of Sc. & Technology, Faridabad, India, and Bharti Sharma, Computer Engg. Department, MVN Instt. of Engg & Technology, Palwal, India, "A Semantic Search System using Query Definitions", December 28-30, 2010, Allahabad, UP, India. Copyright 2010 ACM.
[27] Wenwei Xue, Hungkeng Pung and Paulito P. Palmes, School of Computing, National University of Singapore, Singapore 117543, and Tao Gu, Institute for Infocomm Research, Terrace, Singapore, "Schema Matching for Context-Aware Computing", September 21-24, 2008, Seoul, Korea. Copyright 2008 ACM.
[28] Huanhuan Cao (1), Derek Hao Hu (2), Dou Shen (3), Daxin Jiang (4), Jian-Tao Sun (4), Enhong Chen and Qiang Yang (2); (1) University of Science and Technology of China; (2) Hong Kong University of Science and Technology; (3) Microsoft Corporation; (4) Microsoft Research Asia, "Context-Aware Query Classification", July 19-23, 2009, Boston, Massachusetts, USA. Copyright 2009 ACM.
[29] Reiner Kraft, Chi Chao Chang, Farzin Maghoul and Ravi Kumar, Yahoo!, Inc., Sunnyvale, USA, "Searching with Context".
[30] Mariam Daoud, Lynda Tamine-Lechani and Mohand Boughanem, Institut de Recherche en Informatique de Toulouse, France, "Learning user interests for a session-based personalized search", Information Interaction in Context, 2008, London, UK. Copyright 2008 ACM.
[31] Ji-Rong Wen, Microsoft Research Asia, Beijing, China, Ni Lao, Tsinghua University, Beijing, China, and Wei-Ying Ma, Microsoft Research Asia, Beijing, China, "Probabilistic Model for Contextual Retrieval", July 25-29, 2004, Sheffield, South Yorkshire, UK. Copyright 2004 ACM.
[32] Lidong Bing and Wai Lam, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong, and Tak-Lam Wong, Department of Mathematics and Information Technology, The Hong Kong Institute of Education, "Using Query Log and Social Tagging to Refine Queries Based on Latent Topics", October 24-28, 2011, Glasgow, Scotland, UK. Copyright 2011 ACM.
[33] Liang Jeff Chen and Yannis Papakonstantinou, UC San Diego, La Jolla, CA, US, "Context-sensitive Ranking for Document Retrieval", June 12-16, 2011, Athens, Greece. Copyright 2011 ACM.
[34] Reiner Kraft, Farzin Maghoul and Chi Chao Chang, Yahoo!, Inc., 701 First Avenue, Sunnyvale, CA 94089, "Y!Q: Contextual Search at the Point of Inspiration", October 31 - November 5, 2005, Bremen, Germany. Copyright 2005 ACM.
[35] Ziming Zhuang, The Pennsylvania State University, University Park, USA, and Silviu Cucerzan, Microsoft Research, Redmond, USA, "Re-Ranking Search Results Using Query Logs", November 5-11, 2006, Arlington, Virginia, USA. ACM.
[36] Huanhuan Cao (1), Daxin Jiang (2), Jian Pei (3), Qi He (4), Zhen Liao (5), Enhong Chen (1) and Hang Li (2); (1) University of Science and Technology of China; (2) Microsoft Research Asia; (3) Simon Fraser University; (4) Nanyang Technological University; (5) Nankai University, "Context-Aware Query Suggestion by Mining Click-Through and Session Data", August 24-27, 2008, Las Vegas, Nevada, USA. Copyright 2008 ACM.
[37] Christian Sengstock and Michael Gertz, Institute of Computer Science, University of Heidelberg, Germany, "CONQUER: A System for Efficient Context-aware Query Suggestions", March 28 - April 1, 2011, Hyderabad, India. ACM.

Table 2: Comparison of various Architectures proposed by different authors for IR

Table 3: Comparison of different methodologies for IR
