Submitted By:
AANCHAL GARG
CSE
Web Crawlers
Why do we need web crawlers
Prerequisites of a Crawling System
General Crawling Strategies
Crawling Policies:
◦ Selection policy
◦ Re-visit policy
◦ Politeness policy
◦ Parallelization policy
Web crawler architectures
Distributed web crawling
Focused Crawlers
Examples of Web crawlers
Also known as spiders, robots, bots, aggregators, agents and
intelligent agents.
An Internet-aware program that can retrieve
information from specific locations on the Internet.
A program that collects documents by recursively
fetching links from a set of starting pages.
Its behaviour mainly depends on the crawling policies used.
It starts with a list of URLs to visit, called the seeds.
As the crawler visits these URLs, it identifies all the
hyperlinks in the page and adds them to the list of
URLs to visit, called the crawl frontier.
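The seed-and-frontier loop described above can be sketched as a short Python program. This is a minimal illustration, not a production crawler: the in-memory `LINKS` dictionary is a hypothetical stand-in for fetching pages over HTTP and parsing their hyperlinks, and the example URLs are invented.

```python
from collections import deque

# Hypothetical in-memory link graph standing in for real pages;
# a real crawler would fetch each URL over HTTP and extract its links.
LINKS = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": [],
}

def crawl(seeds):
    """Visit pages breadth-first starting from the seed URLs.

    The deque is the crawl frontier: hyperlinks discovered on each
    visited page are appended to it for later visiting.
    """
    frontier = deque(seeds)   # URLs still to visit (the crawl frontier)
    seen = set(seeds)         # avoid queueing the same URL twice
    visited = []              # URLs fetched so far, in visit order
    while frontier:
        url = frontier.popleft()
        visited.append(url)
        for link in LINKS.get(url, []):   # "identify all the hyperlinks"
            if link not in seen:
                seen.add(link)
                frontier.append(link)     # add to the crawl frontier
    return visited

order = crawl(["http://a.example/"])
```

Using a queue (FIFO) makes the traversal breadth-first; swapping in a stack or a priority queue is one place where the selection policy discussed later comes into play.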
• The Internet has a wide expanse of information.
• Finding relevant information requires an efficient mechanism.
• Web crawlers provide that capability to the search engine.
Flexibility: the system should be suitable for a variety of
scenarios.