
An Introduction to Geographic Information Retrieval Systems

Ajay Kumar Garg Engineering College,
Ghaziabad

Uttar Pradesh Technical University

June, 2014
by
Nishant Shekhar (1002710065)
Nishank Garg (1002710064)
Nikhil Babu (1002710062)
Sumit Jha (1002710106)



1. INTRODUCTION
2. OBJECTIVE
3. BASIC CONCEPTS OF GIRS
4. INVERTED INDEX
5. TRADITIONAL SEARCH VS GEO SEARCH
6. COMPARISON WITH EXISTING SEARCH ENGINES
7. CONCLUSION
*Geographic Information Retrieval Systems (GIRS) form a fast-developing
area concerned with providing access to geo-referenced information
sources.

*It is a useful premise to assume that every document
in a collection and every query issued to an
information retrieval (IR) system is geography-
dependent. If we can globally determine what area an
article or a document is about (i.e., its geographical
scope), we can reasonably assume that people, places
and organizations named in the article are located in
that area.

*The project presents our work on automatically identifying the
geographical scope of Web documents, which provides the means to
develop retrieval tools that take the geographical context into
consideration.
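As a rough illustration of scope detection, the minimal sketch below assigns a document to the geographic scope whose place names it mentions most often. The gazetteer, the place names in it, and the function guess_scope are illustrative assumptions, not the project's actual implementation.

    import re
    from collections import Counter

    # Hypothetical mini-gazetteer mapping place names to a broader scope (illustrative only).
    GAZETTEER = {
        "ghaziabad": "Uttar Pradesh",
        "noida": "Uttar Pradesh",
        "lucknow": "Uttar Pradesh",
        "mumbai": "Maharashtra",
        "pune": "Maharashtra",
    }

    def guess_scope(text):
        """Return the scope whose place names occur most often in the text."""
        tokens = re.findall(r"[a-z]+", text.lower())
        votes = Counter(GAZETTEER[t] for t in tokens if t in GAZETTEER)
        return votes.most_common(1)[0][0] if votes else None

    print(guess_scope("Colleges in Ghaziabad and Noida attract students from Lucknow."))
    # -> Uttar Pradesh

Real systems use far richer gazetteers and disambiguation rules, but the underlying voting idea is the same.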

Other objectives are:

*Detection of geographic references in the documents.
*Modeling of the geographic scope of documents.
*Relevance ranking according to geographic context.
*Efficient indexing techniques that cope with both the textual and the
spatial dimensions.
*User interfaces that make it easy to work with both dimensions.

Web crawling and indexes: Web crawling is the process by which
we gather pages from the Web, in order to index them and support a
search engine. The objective of crawling is to quickly and efficiently
gather as many useful web pages as possible, together with the link
structure that interconnects them. The crawler program itself is
sometimes referred to as a spider.



The basic operation of any hypertext crawler is as follows (a minimal sketch in code appears after these steps).
1. The crawler begins with one or more URLs that constitute a seed set.
2. It picks a URL from this seed set, and then fetches the web page at that
URL.
3. The fetched page is then parsed, to extract both the text and the links
from the page (each of which points to another URL).
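The three steps above can be put together in a small breadth-first crawler. The sketch below uses only the Python standard library and a crude regular expression for link extraction; it is a simplification of what a production crawler (politeness delays, robots.txt, duplicate detection) would do, and is not the project's crawler.

    import re
    from collections import deque
    from urllib.parse import urljoin
    from urllib.request import urlopen

    def crawl(seed_urls, max_pages=10):
        """Breadth-first crawl: fetch a page, extract its links, queue unseen URLs."""
        frontier, seen, pages = deque(seed_urls), set(seed_urls), {}
        while frontier and len(pages) < max_pages:
            url = frontier.popleft()
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
            except OSError:
                continue                      # skip unreachable pages
            pages[url] = html                 # keep the fetched page for indexing
            for href in re.findall(r'href="([^"#]+)"', html):
                link = urljoin(url, href)     # resolve relative links against the page URL
                if link.startswith("http") and link not in seen:
                    seen.add(link)
                    frontier.append(link)
        return pages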

Document Parsing: Document parsing breaks apart the components
(words) of a document or other form of media for insertion into the
forward and inverted indices. The words found are called tokens, and so, in
the context of search engine indexing and natural language processing,
parsing is more commonly referred to as tokenization.
It is also sometimes called text segmentation, content analysis, text
analysis, text mining, speech segmentation, or lexical analysis. The terms
'indexing', 'parsing', and 'tokenization' are often used interchangeably in
industry jargon.
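A minimal tokenizer in this spirit might look as follows; the function name and the regular expression are illustrative choices, not the project's parser.

    import re

    def tokenize(text):
        """Split a document into lower-case word tokens (text segmentation)."""
        return re.findall(r"[a-z0-9]+", text.lower())

    print(tokenize("Ghaziabad is a city in Uttar Pradesh, India."))
    # -> ['ghaziabad', 'is', 'a', 'city', 'in', 'uttar', 'pradesh', 'india']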

Document Retrieval:

Document retrieval is defined as the matching of some stated user query
against a set of free-text records. These records could be any type of mainly
unstructured text, such as newspaper articles, real estate records or
paragraphs in a manual. User queries can range from multi-sentence full
descriptions of an information need to a few words.

Document retrieval is sometimes also referred to as, or considered a branch
of, text retrieval. Text retrieval is a branch of information retrieval where
the information is stored primarily in the form of text. Text databases became
decentralized with the spread of personal computers and optical storage media.
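As a toy example of matching a stated query against unstructured records, the sketch below ranks free-text records by how many query terms they contain; the data and the function match are hypothetical.

    def match(query, records):
        """Rank free-text records by how many query terms they contain."""
        q_terms = set(query.lower().split())
        scored = []
        for rec_id, text in records.items():
            overlap = q_terms & set(text.lower().split())
            if overlap:
                scored.append((len(overlap), rec_id))
        return [rec_id for _, rec_id in sorted(scored, reverse=True)]

    records = {
        1: "flats for sale in ghaziabad near the metro",
        2: "manual for the search engine indexer",
        3: "real estate listings for noida and ghaziabad",
    }
    print(match("real estate ghaziabad", records))
    # -> [3, 1]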

*An inverted index is an index data structure that stores a mapping from
content, such as words or numbers, to its locations in a database file,
document, or set of documents. The purpose of an inverted index is to allow
fast full-text searches, at the cost of increased processing when a document
is added to the database.

There are two main variants of inverted indexes (both are sketched after this list):

*A record level inverted index (or inverted file index or just
inverted file) contains a list of references to documents for each
word.
*A word level inverted index (or full inverted index or inverted
list) additionally contains the positions of each word within a
document.
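The difference between the two variants can be seen in a few lines of code. The sketch below builds both from a small in-memory collection; whitespace splitting stands in for real tokenization, and all names and data are illustrative.

    from collections import defaultdict

    def build_indexes(docs):
        """Build a record-level and a word-level inverted index from {doc_id: text}."""
        record_level = defaultdict(set)    # word -> {doc_id, ...}
        word_level = defaultdict(list)     # word -> [(doc_id, position), ...]
        for doc_id, text in docs.items():
            for pos, word in enumerate(text.lower().split()):
                record_level[word].add(doc_id)
                word_level[word].append((doc_id, pos))
        return record_level, word_level

    docs = {1: "gis maps of ghaziabad", 2: "ghaziabad city maps"}
    rec, word = build_indexes(docs)
    print(rec["maps"])     # -> {1, 2}
    print(word["maps"])    # -> [(1, 1), (2, 2)]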


Traditional Search:
*User enters key words.
*Boolean operations on the inverted index.
*Ranking according to subject relevance.

Geographic Search:
*User enters key words and geographic details.
*Boolean operations on the spatial database, followed by the inverted index.
*Ranking according to subject relevance and geographic attributes.

*This project aims at determining the geographical scope of Web documents: fetching them
from the Web, indexing them both by their geographical location and by their textual
information, intersecting the two result sets to determine which documents match the
query, and finally ranking and displaying the results.
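The intersection step can be illustrated with simple postings sets. The sketch below assumes a keyword inverted index and a separate mapping from geographic scopes to documents; all names and data are hypothetical.

    def geo_search(term_postings, scope_postings, terms, scope):
        """Intersect keyword postings with the set of documents in the geographic scope."""
        text_hits = set.intersection(*(term_postings.get(t, set()) for t in terms))
        geo_hits = scope_postings.get(scope, set())
        return text_hits & geo_hits

    term_postings = {"college": {1, 2, 4}, "engineering": {1, 3, 4}}
    scope_postings = {"Ghaziabad": {1, 2}, "Lucknow": {3, 4}}
    print(geo_search(term_postings, scope_postings, ["college", "engineering"], "Ghaziabad"))
    # -> {1}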

*To support geographic information retrieval, it is fundamental that the geographic
classification of Web pages is very accurate and that each page is classified to a very
narrow region (for example, a city or a street). To improve the usability of the project,
the following functionalities have to be added:

*Support for geographic queries with multiple geographic scopes.
*Support for complex semantic relations between the query object and the geographic
scope.
*Use of the user's disambiguation history to improve geographic disambiguation.
*Generation of document summaries that allow the user to see the most important
information of each result without consulting the full document.
