The internet is an iceberg.

And, as you might guess, most of us only reckon with the tip. While the pages and media found via simple searches may seem unendingly huge at times, what is submerged and largely unseen, often referred to as the invisible web or deep web, is in fact far, far bigger.
The Surface Web
What we access every day through popular search engines like Google, Yahoo or Bing is referred to as the Surface Web. These familiar search engines crawl through tens of trillions of pages of available content (Google alone is said to have indexed more than 30 trillion web pages) and bring that content to us on demand. As big as this trove of information is, however, it represents only the tip of the iceberg.
Eric Schmidt, then CEO of Google, was asked to estimate the size of the World Wide Web. He estimated that of roughly 5 million terabytes of data, Google had indexed roughly 200 terabytes, or only 0.004% of the total internet.
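The percentage follows directly from the two figures in Schmidt's estimate (both are rough estimates, not measurements):

```latex
\frac{200\ \text{TB}}{5{,}000{,}000\ \text{TB}}
  = \frac{2 \times 10^{2}}{5 \times 10^{6}}
  = 4 \times 10^{-5}
  = 0.004\%
```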
The Invisible Web
Beneath the Surface Web is what is referred to as the Deep or Invisible Web. It is made up of:
Private websites, such as those behind VPNs (virtual private networks), and sites that require passwords and logins
Limited access content sites (which limit access in a technical way, such as using CAPTCHA, the Robots Exclusion Standard or no-cache HTTP headers that prevent search engines from browsing or caching them)
Unlinked content, pages that no other page links to, which prevents web crawlers from finding them
Textual content, often encoded in image or video files or in specific file formats not handled by search engines
Dynamic content created for a single purpose and not part of a larger collection of items
Scripted content, pages only accessible using JavaScript, as well as content downloaded using Flash and Ajax solutions
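To see why limited-access and unlinked content stay hidden, it helps to look at how a crawler discovers pages: it starts from known URLs, follows hyperlinks, and (if it is well-behaved) skips paths that a site's robots.txt file disallows under the Robots Exclusion Standard. The toy crawler below is a minimal sketch of that idea, assuming a made-up five-page site held in memory; the page paths, the `ToyBot` user agent and the `example.com` host are all hypothetical. It uses Python's standard `urllib.robotparser` module to evaluate the rules. Real search engine crawlers are far more elaborate.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "/": ["/about", "/private/report"],
    "/about": ["/"],
    "/private/report": ["/private/data"],
    "/private/data": [],
    "/orphan": [],  # exists on the server, but no page links to it
}

# Hypothetical robots.txt asking all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawl(site, robots_lines, start="/", agent="ToyBot", host="https://example.com"):
    """Breadth-first crawl that follows links and honors robots.txt rules.

    Returns the pages that would be indexed, in the order they were fetched.
    """
    rules = RobotFileParser()
    rules.parse(robots_lines)

    indexed = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if not rules.can_fetch(agent, host + page):
            continue  # disallowed: not fetched, so its links are never seen
        indexed.append(page)
        for link in site.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


if __name__ == "__main__":
    print(crawl(SITE, ROBOTS_TXT.splitlines()))  # prints ['/', '/about']
```

The page that nothing links to (`/orphan`) is never discovered, and the page under the disallowed path (`/private/report`) is never fetched, so neither of them, nor anything reachable only through them (`/private/data`), ends up in the index.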
There are many high-value collections to be found within the invisible web. Some of the material found there that most people would recognize and, potentially, find useful includes:
Academic studies and papers
Blog platforms
Pages created but not yet published
Scientific research
Academic and corporate databases
Government publications
Electronic books
Bulletin boards
Mailing lists
Online card catalogs
Directories
Many subscription journals
Archived videos
Images
But knowing all these materials are out there, buried deep within the web, doesn't really help the average user. What tools can we turn to in order to make sense of the invisible web? There really is no easy answer. Sure, the means to search and sort through massive amounts of invisible web information are out there, but many of these tools have an intense learning curve. This can mean sophisticated software that requires no small amount of computer savvy; it can mean energy-sucking search tools that require souped-up computers to handle the task of combing through millions of pages of data; or it can require the searching party to be unusually persistent, something most of us, with our expectations of instantaneous Google search success, won't be accustomed to.
All that being said, we can become acquainted with the invisible web by degrees. The many tools considered below will help you access a sizable slice of the invisible web's offerings. You will find we've identified a number of subject-specific databases and engines: tools with an established filter, making their searches much narrower.
Open Access Journal Databases
Open access journal databases (OAJD) are compilations of free scholarly journals maintained in a manner that facilitates access by researchers and others who are seeking specific information or knowledge. Because these databases are composed of unlinked content, they are located in the invisible web.
The vast majority of these journals are of the highest quality, with peer review and extensive vetting of the content before publication. However, there has been a trend of journals that accept scholarship without adequate quality controls, and with arrangements designed to make money for the publishers rather than to further scholarship. It is important to be careful and review the standards of the database and journals chosen. This helpful guide explains what to look for.
Below is a sample list of well-regarded and reputable databases.
AGRIS (International Information System for Agricultural Science and Technology) is a global, public domain database maintained in multiple languages by the Food and Agriculture Organization of the United Nations. They provide free access to agricultural research and information.
BioMed Central is the UK-based publisher of 258 peer-reviewed open access journals. Their published works span science, technology and medicine and include many well-regarded titles.
Copernicus Publications has been an open-access scientific publisher in Germany since 2001. They are strong supporters of the researchers who create these articles, providing top-level peer review and promotion for their work.
DeGruyter Open (formerly Versita Open) is one of Germany's leading publishers of open access content. Today DeGruyter Open (DGO) publishes about 400 owned and third-party scholarly journals and books across all major disciplines.
Directory of Open Access Journals is focused on providing access only to those journals that employ the highest quality standards to guarantee content. They are presently a repository of 9,740 journals with more than 1.5 million articles from 133 countries.
EDP Sciences (Édition Diffusion Presse Sciences) is a France-based scientific publisher with an international mission. They publish more than 50 scientific journals, with some 60,000 published pages annually.
Elsevier of Amsterdam is a world leader in advancing knowledge in the science, technology and health fields. They publish nearly 2,200 journals, including The Lancet and Cell, and over 25,000 book titles, including Gray's Anatomy and Nelson's Pediatrics.
Hindawi Publishing Corporation, based in Egypt, publishes 434 peer-reviewed, open access journals covering all areas of Science, Technology and Medicine, as well as a variety of Social Sciences.
Journal Seek (Genamics) touts itself as the largest completely categorized database of freely available journal information on the internet, with more than 100,000 titles currently. Categories range from Arts and Literature, through both hard and soft sciences, to Sports and Recreation.
The Multidisciplinary Digital Publishing Institute (MDPI), based in Switzerland, is a publisher of more than 110 peer-reviewed, open access journals covering arts, sciences, technology and medicine.
Open Access Journals Search Engine (OAJSE), based in India, is a search engine for open access journals from throughout the world, except for India. It has an extremely simple interface. Note: the site was last updated June 21, 2013.
Open J-Gate is an India-based e-journal database of millions of journal articles in the open access domain. With a worldwide reach, Open J-Gate is updated every day with new academic, research and industry articles.
Open Science Directory contains about 13,000 scientific journals, with another 7,000 special programs titles.
Springer Open offers a roster of more than 160 peer-reviewed, open access journals, as well as their more recent addition of free access books, covering all scientific disciplines.
Wiley Open Access, a subsidiary of New Jersey-based global publisher John Wiley & Sons, Inc., publishes peer-reviewed open access journals specific to biological, chemical and health sciences.
Invisible Web Search Engines
Your typical search engine's primary job is to locate the surface sites and downloads that make up much of the web as we know it. These searches are able to find an array of HTML documents, video and audio files and, essentially, any content that is heavily linked to or shared online. And often, these engines, Google chief among them, will find and organize this diversity of content every time you search.
The search engines that deliver results from the invisible web are distinctly different. Narrower in scope, these deep web engines tend to access only a single type of data. This is because each type of data has the potential to offer up an outrageous number of results, and an inexact deep web search would quickly turn into a needle in a haystack. That's why deep web searches tend to be more thoughtful in their initial query requirements.
Below is a list of popular invisible web search engines:
Clusty is a meta search engine that not only combines data from a variety of different source documents, but also creates clustered responses, automatically sorting by category.
CompletePlanet searches more than 70,000 databases and specialty search engines found only in the invisible web. It is a search engine as well-suited to casual searchers as it is to researchers.
Digital Librarian: A Librarian's Choice of the Best of the Web is maintained by a real librarian. With an eclectic mix of some 45 broad categories, Digital Librarian offers data from categories as diverse as Activism/Non Profits and Railroads and Waterways.
InfoMine is another librarian-developed internet resource collection, this time from The Regents of the University of California.
Internet Archive has an eclectic array of categories, starting with the Wayback Machine, which allows the searcher to locate archived documents, and including an archive of Grateful Dead audience and soundboard recordings. They offer 6 million texts, 1.5 million videos, 1.9 million audio recordings and 126K live music concerts.
The Internet Public Library (ipl and ipl2) is a non-profit, student-run website at Drexel University. Students volunteer to act as librarians and respond to questions from visitors. Categories of data include those directed to Children and Teens.
SurfWax is a metasearch engine that offers practical tools for Dynamic Search Navigation. It offers the option of grabbing results from multiple search engines at the same time, or even designing SearchSets, which are individualized groups of sources that can be used over and over in searches.
UC Santa Barbara Library offers access to a diverse group of research databases useful to students, researchers and the casual searcher. It should be noted that many of these resources are password protected. Those that do not display a lock icon are publicly accessible.
USA.gov offers access to a huge volume of information, including all types of forms, databases, and information sites representing most government agencies.
Voice of the Shuttle (VoS) offers access to a diverse assortment of sites, including literature, literary theory, philosophy, history and cultural studies, and includes the daily update of all things cool.
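Metasearch engines such as Clusty and SurfWax pull results from several sources at once, which raises a practical question: how do you merge several ranked lists into one without showing the same page twice? The sketch below uses reciprocal rank fusion, one common and simple technique for this; it is an illustrative assumption, not a description of how Clusty, SurfWax or any other engine listed here actually works. The source names, the URLs and the constant `k = 60` (a value often used with this method) are hypothetical choices.

```python
from collections import defaultdict


def fuse(results_by_source, k=60):
    """Merge ranked result lists with reciprocal rank fusion (RRF).

    Each source contributes 1 / (k + rank) for every URL it returns, so a
    URL found by several sources collapses into one entry with a higher
    combined score.
    """
    scores = defaultdict(float)
    for ranked in results_by_source.values():
        # dict.fromkeys drops repeats within one source but keeps their order.
        for rank, url in enumerate(dict.fromkeys(ranked), start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    # Hypothetical results from two sources queried at the same time.
    results = {
        "source_1": ["example.org/a", "example.org/b", "example.org/c"],
        "source_2": ["example.org/b", "example.org/d", "example.org/a"],
    }
    print(fuse(results))
    # prints ['example.org/b', 'example.org/a', 'example.org/d', 'example.org/c']
```

Because `example.org/b` is ranked highly by both sources, it overtakes `example.org/a`, which one source ranks first but the other ranks last. A real engine would also normalize URLs (trailing slashes, `http` versus `https`, and so on) before treating them as duplicates; this sketch only merges exact matches.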
Subject-Specific Databases
The following lists pool together some mainstream and not-so-mainstream databases dedicated to particular fields and areas of interest. While only a handful of these tools are able to surface deep web materials, all of the search engines and collections we have highlighted are powerful, extensive bodies of work. Many of the resources these tools surface would likely be overlooked if the same query were made on one of the mainstream engines most users fall back on, like Bing, Yahoo and even Google.
Art & Design
ArtNet deals with pricing and sourcing work in the art market. They also keep track of the latest news and artists in the industry.
The Metropolitan Museum of Art site hosts an impressively interactive body of information on their collections, exhibitions, events and research.
Musée du Louvre, the renowned museum, maintains a site filled with navigable sections covering its collections.
The National Gallery of Art, the premier museum of art in our nation's capital, also maintains a site detailing the highlights, exhibitions and education efforts the institution oversees.
Public Art Online is a resource detailing sources, creators, prices, projects, legal issues, success stories, resources, education and all other aspects of the creation of public art.
Smithsonian Art Inventories Catalog is a subset of the Smithsonian Institution Research Information System (SIRIS). It is a browsable database of over 400,000 art inventory items held in public and private collections.
Web Gallery of Art is a searchable database of European art, containing nearly 34,000 reproductions. Additional database information includes artist biographies, period music and commentaries.
Business
Better Business Bureau (BBB) Information System Search allows consumers to locate the details of ratings, consumer experience, governmental action and more for both BBB-accredited and non-accredited businesses.
BPubs.com is the business publications search engine. They offer more than 200 free subscriptions to business and trade publications.
BusinessUSA is an excellent and complete database of everything a new or experienced business owner or employer should know.
EDGAR: U.S. Securities and Exchange Commission contains a database of Securities and Exchange Commission filings. It posts copies of corporate filings from US businesses, press releases and public statements.
Global Edge delivers a comprehensive research tool for academics, students and businesspeople to seek out answers to international business questions.
Hoover's, a subsidiary of Dun & Bradstreet, is one of the best-known databases of American and international business. It is a complete source of company and industry information, especially useful for investors.
The National Bureau of Economic Research is perhaps the leading private, nonpartisan research organization dedicated to unbiased analysis of economic policy. This database maintains archives of research data, meetings, activities, working papers and publications.
U.S. Department of Commerce, Bureau of Economic Analysis is the source of many of the economic statistics we hear in the news, including national income and product accounts (NIPAs), gross domestic product, consumer spending, balance of payments and much more.
Legal & Social Services
U.S. Department of Justice Resources is a comprehensive database for the Department of Justice, including archives, initiatives, news, publications and resources.
Federal Bureau of Investigation (FBI) Stats & Services organizes crime statistics, criminal history checks, a sex offender registry, and resources for businesses, communities, crime victims, law enforcement, job seekers, researchers and students.
Homeland Security Digital Library (HSDL) maintains databases, policy and strategy statements, special collections and research tools.
National Criminal Justice Reference Service (NCJRS) is a federally funded resource offering extensive databases on justice and substance abuse issues, as well as victim assistance information for victims of crime, among other topics.
Social Work Policy Institute supports research in social work with databases, publications, archives, foundation news and events.
UNESCO Human Rights Institute Database maintains a searchable body of data and reports on human rights cases, victims and participants.
Science & Technology
Environmental Protection Agency organizes the agency's laws and regulations, science and technology, and the many issues affecting the agency and its policies.
National Science Digital Library (NSDL) is a source for science, technology, engineering and mathematics educational data. It is funded by the National Science Foundation.
Networked Computer Science Technical Reports Library (NCSTRL) was developed as a collaborative effort between NASA Langley, Virginia Tech, Old Dominion University and the University of Virginia. It serves as an archive for submitted scientific abstracts and other research products.
Science.gov is a compendium of more than 60 US government scientific databases and more than 200 websites. Governed by the interagency Science.gov Alliance, this site provides access to a range of government scientific research data.
Science Research is a free, publicly available deep web search engine that purports to use a sophisticated technology that permits queries to more than 300 science and technology sites simultaneously, with the results collated, ranked and stripped of duplications.
WebCASPAR provides access to science and engineering data from a variety of US educational institutions. It incorporates a table builder, allowing a combined result from various National Science Foundation and National Center for Education Statistics data sources.
World Wide Science is a global scientific gateway, comprising US and international scientific databases. Because it is multilingual, it allows real-time search and translation of reporting from an extensive group of databases.
Healthcare
Cases Database is a searchable database of more than 32,000 peer-reviewed medical case reports from 270 journals covering a variety of medical conditions.
Centers for Disease Control and Prevention (CDC) WONDER's online databases permit access to the substantial public health data resources held by the CDC.
HCUPnet is an online query system for those seeking access to statistical data from the Agency for Healthcare Research and Quality.
Healthy People provides rolling 10-year national objectives and programs for improving the health of Americans. They currently operate under the Healthy People 2020 decennial agenda.
National Center for Biotechnology Information (NCBI) is an offshoot of the National Institutes of Health (NIH). This site provides access to some 65 databases from the various project categories currently being researched.
OMIM offers access to the combined research of many decades into genetics and genetic disorders. With daily updates, it represents perhaps the most complete single database of this sort of data.
PubMed is a database of more than 23 million citations from the US National Library of Medicine and National Institutes of Health.
TOXNET is the access portal to the US Toxicology Data Network, an offshoot of the National Library of Medicine.
U.S. National Library of Medicine is a database of medical research, available grants and available resources. The site is maintained by the National Institutes of Health.
World Health Organization (WHO) is a comprehensive site covering the many initiatives the WHO is engaged in around the world.
