
SIDHARTHA SARANGI

Roll: 064021
Regno:0601292094
WEB SEARCH ENGINE
• A web search engine is a tool designed to search for information on the World Wide Web.
Search results may consist of web pages, images, and other types of files.
Search engines work algorithmically or through a mixture of algorithmic and human input.

• According to Netcraft, there are around 240,000,000 web domains globally.
Current search engines:
Generations of search engines
• First generation search engine:
◦ Search results depended on what was on the web page. Factors included keyword density, title, and where in the document keywords appeared.
◦ The first generation also added relevancy for META tags, keywords in the domain name, and a few bonus points for having keywords in the URL.

• Second generation search engine:

◦ Employs click tracking, link popularity, and link quality. It then adds context: two-word keyword pairs are extracted from a page to better categorize it.
◦ Google's PageRank system and the length of visits are evidence of second generation search engines.

• Third generation search engine:

◦ Adds word stemming to keep a search in context. Automatic extraction of keyword pairs helps categorize a page. It also extracts data about the user's individual searching habits, and it adds web maps, a useful filtering tool for getting rid of duplicate sites.
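
Two of the signals named above, keyword density and two-word keyword pairs, can be sketched in a few lines. This is only an illustration: the tokenizer, the function names, and the sample page are assumptions, and no real engine's scoring is implied.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_density(text, keyword):
    """Fraction of the page's tokens that equal `keyword`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return tokens.count(keyword.lower()) / len(tokens)

def keyword_pairs(text):
    """Count adjacent two-word pairs, a rough stand-in for keyword-pair extraction."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))

# Hypothetical page text used only for the demonstration.
page = "Search engine basics: a search engine crawls pages and a search engine indexes pages."
print(keyword_density(page, "search"))      # 3 of 14 tokens
print(keyword_pairs(page).most_common(1))   # the most frequent pair
```

Here the pair ("search", "engine") occurs three times, which is the kind of evidence a categorizer could use to label the page.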
How a search engine works:
• A search engine operates in the following order:
◦ Web crawling
◦ Indexing
◦ Searching
Web crawling
• Web search engines work by storing information about many web pages, which are retrieved from the WWW by a web crawler (spider), an automated web browser that follows every link it sees.

◦ Googlebot is Google's web crawling robot. It functions like a web browser: it sends a request to a web server for a web page, downloads the entire page, and then hands it off to Google's indexer.

◦ Search engine spiders do not read pages the way a human does. Instead, they tend to see only particular elements and are blind to many extras (Flash, JavaScript, images) that are intended for humans.
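
The crawl loop described above (request a page, download it, follow every link) can be sketched as a breadth-first traversal. To keep the example runnable without network access, the "web" is a hypothetical three-page site held in a dictionary; `SITE`, `crawl`, and `LinkParser` are illustrative names, not part of Googlebot.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch):
    """Breadth-first crawl; `fetch(url)` returns the page HTML or None."""
    seen, queue, pages = {start_url}, deque([start_url]), {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:          # missing page: nothing to store or follow
            continue
        pages[url] = html         # this is what would be handed to the indexer
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)   # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

# Hypothetical site kept in memory instead of a real web server.
SITE = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.test/a.html": '<a href="/b.html">B</a>',
    "http://example.test/b.html": '<a href="/">home</a> <a href="/missing.html">x</a>',
}

pages = crawl("http://example.test/", SITE.get)
print(sorted(pages))
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other.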
Spider simulator: Bput.org
As we can see, images, Flash, and JavaScript/VBScript do not have any impact on the web spider.

◦ The only things that matter are text, inbound/outbound links, meta keywords, etc.
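
A rough idea of what a spider simulator reports can be sketched with Python's standard `html.parser`. The class name `SpiderView` and the sample HTML are assumptions made for this illustration; the tool shown on the slide may work differently.

```python
from html.parser import HTMLParser

class SpiderView(HTMLParser):
    """Keeps visible text, link targets, and meta keywords; skips script/style bodies."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text, self.links, self.keywords = [], [], []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]
        # Tags such as <img>, <object>, <embed> carry no text and are ignored.

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())

# Hypothetical page used only for the demonstration.
HTML = """<html><head><title>Demo</title>
<meta name="keywords" content="search, engine">
<script>var hidden = "not seen";</script></head>
<body><img src="logo.png"><p>Hello spider</p>
<a href="/next.html">Next page</a></body></html>"""

view = SpiderView()
view.feed(HTML)
print(view.text)      # text a spider would read
print(view.links)     # outbound links
print(view.keywords)  # meta keywords
```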
Indexing
◦ The web crawler gives the indexer the full text of the pages it finds. These pages are stored in Google's index database, organized by search term: each index entry stores a list of documents in which the term appears and the location within the text where it occurs. This data structure allows rapid access to documents that contain user query terms.
◦ To improve search performance, Google ignores stop words (such as the, is, on, or, of, how, why, as well as certain single digits and single letters). The indexer also ignores some punctuation and multiple spaces, and converts all letters to lowercase.
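
The index described above (term, then documents, then positions) is commonly called an inverted index. The following is a minimal sketch of it together with the normalization steps, assuming a tiny stop-word list and two made-up documents; Google's actual lists and storage format are not known from this text.

```python
import re
from collections import defaultdict

# Small illustrative stop-word list (an assumption, not Google's real list).
STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why", "a"}

def normalize(text):
    """Lowercase, drop punctuation and extra spaces, keep (position, term) for non-stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(pos, w) for pos, w in enumerate(words) if w not in STOP_WORDS]

def build_index(docs):
    """Build a mapping term -> {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in normalize(text):
            index[term].setdefault(doc_id, []).append(pos)
    return index

# Hypothetical documents.
DOCS = {
    "d1": "The crawler is  fast!",
    "d2": "How the Indexer stores the crawler output.",
}

index = build_index(DOCS)
print(index["crawler"])   # documents and positions for one term
```

Looking up a term is a single dictionary access, which is why this structure gives rapid access to the documents containing a query term.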
Searching:
The query processor has several parts, including the user interface (search box), the “engine” that evaluates queries and matches them to relevant documents, and the results formatter.
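
The three parts can be sketched as three small functions. Everything here is an assumption for illustration: the toy index, the titles, and the rule that a document must contain every query term. Ranking is left out entirely.

```python
# Toy index: term -> set of document ids (hypothetical data).
INDEX = {
    "web": {"d1", "d2", "d3"},
    "crawler": {"d1", "d3"},
    "index": {"d2", "d3"},
}
TITLES = {"d1": "Crawling basics", "d2": "Index structures", "d3": "Crawler meets index"}

def parse_query(raw):
    """Stand-in for the search box: split the raw input into lowercase terms."""
    return raw.lower().split()

def evaluate(terms, index):
    """The 'engine': return ids of documents that contain every query term."""
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

def format_results(doc_ids, titles):
    """The results formatter: one numbered line per hit, sorted by document id."""
    return [f"{n}. {titles[d]} ({d})" for n, d in enumerate(sorted(doc_ids), start=1)]

hits = evaluate(parse_query("Web crawler"), INDEX)
print("\n".join(format_results(hits, TITLES)))
```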
Search engine: it's time to look beyond Google

◦ 3D search engine
◦ Theme search engine
◦ Meta search engine
3D search engine:
◦ A search engine that can mine catalogs of three-dimensional objects and lets users create images as queries for searches.
◦ Query formulation
Users can select objects from a catalog of images based on product groupings, or they can draw a 2D or 3D representation of the object they want to find.
◦ Search process
The engine uses algorithms to convert the selected or drawn image-based query into a mathematical model. The search system then compares the mathematical description of the drawn or selected object to those of 3D objects stored in a database, looking for similarities in the described features.
◦ Example: Princeton 3D Model Search Engine
http://shape.cs.princeton.edu/search.html
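
The comparison step can be illustrated with feature vectors and a similarity measure. This sketch assumes each object is already reduced to a short list of numbers and uses cosine similarity; the descriptor values, the catalog, and the choice of measure are all hypothetical and are not taken from the Princeton engine.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical shape descriptors: (width, height, depth, roundness), each in 0..1.
CATALOG = {
    "chair": [0.5, 1.0, 0.5, 0.1],
    "ball": [1.0, 1.0, 1.0, 1.0],
    "table": [1.0, 0.6, 1.0, 0.1],
}

def search(query_vector, catalog, top_k=2):
    """Return the names of the top_k catalog objects most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in catalog.items()]
    return [name for score, name in sorted(scored, reverse=True)[:top_k]]

# A drawn query that is roughly as wide as it is tall and quite round.
print(search([0.9, 0.9, 1.0, 0.9], CATALOG))
```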
Theme search engine:
◦ It is also called "in context" searching or on-topic searching.

◦ What you say your page is about, what the search engine calculates your page to be about, and what the rest of the Internet thinks your page is about must all match, according to the engine's mathematical formulas.

◦ The second and third generation search engines are examples of theme search engines.
Meta search engine:
• A meta-search engine is a search tool that sends user requests to several other search engines and/or databases and aggregates the results into a single list or displays them according to their source.

• The Web is too large for any one search engine to index it all, so more comprehensive search results can be obtained by combining the results from several search engines. This may also save the user from having to use multiple search engines separately, and it helps in deep web searching.

◦ Metasearch engines create what is known as a virtual database. They take a user's request, pass it to several other heterogeneous search engines, and then compile the results.
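
The "pass it on, then compile" step might look like the sketch below. The two engines are hypothetical stubs that return fixed ranked lists, and the merge rule (summing 1/rank across engines) is one simple choice among many, not a method stated in the text.

```python
def engine_a(query):
    """Hypothetical engine A: returns a ranked list of URLs."""
    return ["http://a.test/1", "http://shared.test/x", "http://a.test/2"]

def engine_b(query):
    """Hypothetical engine B: returns a ranked list of URLs."""
    return ["http://shared.test/x", "http://b.test/1"]

def metasearch(query, engines):
    """Send the query to every engine and merge the lists, summing reciprocal ranks."""
    scores, sources = {}, {}
    for name, engine in engines.items():
        for rank, url in enumerate(engine(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
            sources.setdefault(url, []).append(name)
    # Highest combined score first; ties broken alphabetically by URL.
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    return [(url, sources[url]) for url in ranked]

results = metasearch("search engines", {"A": engine_a, "B": engine_b})
for url, found_by in results:
    print(url, found_by)
```

Keeping the `sources` list alongside each URL supports both display modes mentioned above: a single merged list, or results grouped by the engine that returned them.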
Search engine optimization
◦ Search engine optimization (SEO) is the process of improving the volume or quality of traffic to a web site from search engines.
◦ Current optimization strategies
◦ 1. Cloaking: Hide it from the spider's eye.
◦ 2. Keyword weight: Use proper keywords.
◦ 4. Stop words: Be careful with stop words.
◦ 5. Redundancy: Don't repeat the same pages.
◦ 6. Lengthy pages: Focus on one topic.
