Foreword
This whitepaper is intended to aid you in evaluating search and information access proposals for your organization by detailing a very important, often overlooked, cost component: scaling your search solution. Too many customers are surprised to find that almost immediately after deploying a search engine, they need to scale their platform, and that the cost of scaling can be exorbitant. This paper therefore:
- Identifies the reasons why search needs escalate so frequently and dramatically,
- Explains why scaling is often expensive,
- Provides practical advice for anticipating and controlling costs, and
- Furnishes performance benchmarks for more effectively making cost comparisons between solutions.
We hope this information will aid you in developing a complete TCO forecast for your search platform, one that effectively incorporates the costs associated with scaling functionality and/or performance in addition to the more easily identified direct, indirect and upgrade costs.
The Authors
Table of Contents
1 Why Search Demands & Costs Escalate
  1.1 Users Demand Wider Access, More Features
  1.2 IT Discovers New Uses, Additional Value
2 Anticipating and Controlling Costs
3 Forecasting Demand: Five Rules
  3.1 Double Your Estimated Volume; Anticipate Double-Digit Growth
  3.2 Plan for Additional Data Sources, including the Web
  3.3 Anticipate Demand for a Web-Style Experience, and Real Web Integration
  3.4 Plan for Increased Compliance Demands
  3.5 Position Yourself for the Unexpected
4 Understanding Search Types & Limitations
  4.1 Legacy Enterprise Search Products
  4.2 Search Add-Ons from Mainstream Application Providers
  4.3 Web Search Engines Ported to the Enterprise
5 Establishing Apples-to-Apples Cost Comparisons
6 About Exalead CloudView™
  6.1 Dual Web/Enterprise DNA
  6.2 High Performance with Minimal Resources
  6.3 Infinite Scaling
  6.4 True Unified Data Access
  6.5 Rapid Time to Market, Agile Development
7 CloudView Benchmarks
  7.1 Enterprise Search
  7.2 Business Applications - Database Offloading
  7.3 Web Applications - Online Directories
  7.4 Web Applications - Online Classifieds
Figures
Fig. 1: Data Volume Managed by Companies Worldwide (IDC)
Fig. 2: CloudView Scales with Minimal Resources
Fig. 3: CloudView Scales Infinitely in Five Directions
Fig. 4: Scaling with a Distributed Architecture + Commodity Hardware
Fig. 5: Transform Unstructured & Structured Data into a Single Structured Resource
Exalead Whitepaper: The Hidden Costs of Scaling Search, v 1.0 2009 Exalead
Increasing the Value of Structured Data
Because of performance limitations (databases are optimized for storing, not accessing, data) and heavy licensing and infrastructure costs, database resources are frequently under-utilized in the enterprise. However, when IT managers discover that index-based querying is as rich as relational database querying yet ten times faster and cheaper, they begin to use search engines to provide alternative access to essential database content.

Unified Data Access for Agile Applications
Unified data access is essential for meeting escalating compliance requirements, and for satisfying Web-savvy users' appetite for fast, easy information access. However, by decoupling data from traditional application layers, enterprises are also learning that search engines can enable a new breed of light business applications. Known as Search-Based Business Applications (SBAs), these applications can be created on-the-fly to satisfy evolving business needs using information drawn from any source, from legacy databases to email, blogs, and the Web, while leaving existing systems and structures untouched. This approach preserves existing IS investments and is clearly less complex and costly than traditional data and application integration strategies.

Maximize Benefit; Avoid Sticker Shock
Given these benefits to both end users and IT managers, it is no wonder that functional and performance demands on search escalate so frequently. It is in attempting to meet these escalating demands by scaling hardware, infrastructure and functionality that organizations frequently encounter search "sticker shock." They boost RAM, add servers, increase bandwidth, add or upgrade licenses, and set about the difficult (sometimes impossible) task of trying to make simple search tools perform complex analytic functions. But search is too often a complex, resource-intensive process, with infrastructure requirements increasing exponentially as functional requirements grow, and scaling is often tied to proprietary hardware or to unreasonable user or document counts, so it is easy to see why costs can quickly mount. Even some solutions that begin at only a few thousand dollars can skyrocket to millions of dollars within just a few short years (sometimes even within one year) when functional or performance needs escalate.

Without built-in scalability, even low cost solutions can skyrocket to millions of dollars in just a few years
Search-Based Business Applications (SBAs) are fast and easy to construct and can incorporate data from any source
This rapid, often unmeasured growth is the reason enterprises frequently underestimate the initial volume of content that needs to be indexed. Even when volumes are accurately forecast, organic growth in corporate datastores means that the content load for enterprise search typically doubles every 6 months. Therefore, to be safe, develop your best volume estimate, then double it, and plan for double-digit growth post-implementation.
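The sizing rule above can be turned into a small planning calculation. The sketch below is illustrative only: the function name, the 10 million document estimate and the 25% annual growth rate are hypothetical inputs, not figures from this paper. Substitute the growth rate you actually observe in your own datastores.

```python
def forecast_volume(best_estimate_docs, annual_growth_rate, years):
    """Planned index size for year 0 through `years`.

    Applies the rule of thumb: start from double the best estimate,
    then compound a (hypothetical) annual growth rate.
    """
    planned = best_estimate_docs * 2  # double the initial estimate
    return [round(planned * (1 + annual_growth_rate) ** year)
            for year in range(years + 1)]


# Hypothetical inputs: 10 million documents estimated, 25% growth per year.
plan = forecast_volume(10_000_000, 0.25, 3)
for year, docs in enumerate(plan):
    print(f"year {year}: plan for {docs:,} documents")
```

Asking each bidder to price every year of such a plan, rather than only the initial volume, makes scaling costs visible before a contract is signed.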
3.3 Anticipate Demand for a Web-Style Experience, and Real Web Integration
These same Web-savvy users are also demanding that enterprise search, and the business applications built upon search platforms, be as easy and intuitive to use as Web applications, if not seamlessly integrated with those same tools. Make sure prospective bidders can meet users' demands for:
- Zero-training usage (for search and search-based applications)
- The ability to leverage Web and personal information for business tasks (e.g., using their LinkedIn network for sales and recruiting or integrating Facebook data in CRM applications)
- Web 2.0/3.0 interactive capabilities, such as workflow integration and collaborative tools like resource tagging, bookmarking and sharing
- Fresh, up-to-date data
"As people spend more time online actively participating in Web 2.0 technologies such as rich user interfaces based on Ajax and Flash, social networking and tagging, blogs and wikis, Web mashups, and on-demand services in general, information workers will start expecting Enterprise 2.0 applications in the workplace that focus on providing easy-to-use and many-to-many personalized online experiences for creating, publishing, locating, and sharing information with colleagues, customers, and partners."
Susan Feldman, IDC, Worldwide Search and Discovery Software 2008-2012 Forecast Update and
Originally designed for limited data collections and a small, trained user base, traditional enterprise systems are often difficult to scale
products typically offer limited text analytics (i.e., limited ability to process unstructured data), are expensive to connect to external data sources, and are expensive to scale due to restrictive licensing policies and resource-intensive engineering. Many of these vendors have attempted to address these shortcomings by acquiring native enterprise search companies, with limited success in product integration and support.
ingest, process and integrate structured content. They likewise have limited built-in support for the special security constraints of an enterprise environment. Therefore, extending the functionality of such systems to better meet enterprise needs can be very expensive, when it is doable at all.
Lastly, even when scaling is limited to content well-suited to these engines, it can still be surprisingly expensive. These products are often sold with licenses tied to unrealistically low document counts, or scaling necessitates the purchase of expensive proprietary hardware. Consider, for example, the cost of scaling a search solution from one popular Web vendor to hundreds of millions of documents when, for only 30 million documents, the solution requires a $500k bi-annual license of proprietary hardware.
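To make the example concrete, the sketch below extrapolates that license to larger collections. It rests on an assumption this paper does not state: that capacity is sold in fixed blocks of 30 million documents at $500k per block per license term. Actual vendor pricing may be tiered or discounted, so treat the result as a rough order of magnitude; the function name and the 300 million document target are hypothetical.

```python
def extrapolated_license_cost(target_docs,
                              docs_per_license=30_000_000,
                              cost_per_license=500_000):
    """Cost per license term, ASSUMING capacity is bought in whole
    30-million-document blocks (an illustrative assumption only)."""
    # Integer ceiling division: a partly used block still needs a license.
    blocks = (target_docs + docs_per_license - 1) // docs_per_license
    return blocks * cost_per_license


# Hypothetical target: 300 million documents.
cost = extrapolated_license_cost(300_000_000)
print(f"${cost:,} per license term")
```

Under this assumption, a tenfold increase in documents means a tenfold increase in license cost, before any hardware, bandwidth or staffing is counted.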
Though they technically scale well, licensing policies often make scaling Web search engines expensive
Note: Keep in mind that by selecting a resource-efficient solution you not only reduce costs, you also help your organization meet Green IT objectives.
CloudView easily and cost-effectively scales in five directions:
- The Total Number of System Users: Proven capacity to serve 100 million unique monthly visitors
- System Features and Functionality: Extensive built-in functionality with full administrator control over features activated; open APIs for endlessly extending functionality
- Volume of Data Indexed: Index and index build services can easily be distributed across commodity servers; built-in index partitioning and replication services further extend performance and availability
- Number of Simultaneous Queries Processed: Average throughput of 20 Queries per Second (QPS) per server; easily scales by distributing query processing across multiple commodity servers
- Index Refresh Rate: Supports any data refresh strategy: 1) real-time, 2) interval, and 3) just in time (on query reception). Dictionaries, thesauri, etc., are automatically updated as the index is updated.
Fig. 4: Scaling with a Distributed Architecture + Commodity Hardware
CloudView is designed to maximize performance and availability through process distribution, load balancing, index partitioning and index replication.
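When query processing is distributed across commodity servers, a first-pass server count follows from the peak query load and the per-server throughput. The sketch below is a simplified estimate, not a CloudView sizing tool: the function name and the example loads are hypothetical, the default of 20 QPS per server is the average quoted above, and the model ignores index size, query complexity and caching. The benchmarks later in this paper report other per-server figures, so pass a measured value where you have one.

```python
def search_servers_needed(peak_qps, qps_per_server=20, spare_servers=0):
    """Rough count of search servers for a given peak query load.

    Assumes throughput scales linearly with server count (a simplification).
    `spare_servers` adds replicas for high availability.
    """
    # Integer ceiling division: a partly loaded server is still a server.
    base = (peak_qps + qps_per_server - 1) // qps_per_server
    return base + spare_servers


# Hypothetical loads.
small = search_servers_needed(45)                      # 45 QPS at the 20 QPS average
with_spare = search_servers_needed(45, spare_servers=1)
measured = search_servers_needed(80, qps_per_server=40)
print(small, with_spare, measured)
```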
"Exalead's ability to scale is comparable to Google's... Most enterprise search and content processing systems cannot handle billions of documents. Exalead does. Exalead's search and content processing solutions give the company a technical advantage over vendors whose systems choke when thousands of users simultaneously want access to information."
Stephen Arnold, ArnoldIT
Fig. 5: Transform Unstructured & Structured Data into a Single Structured Resource
7 CloudView Benchmarks
To help you form a baseline of functional and performance requirements for comparing solutions, we provide below benchmarks for actual Exalead CloudView™ installations. These benchmarks include statistics such as:
- Number of documents indexed
- Refresh rate for the index
- Queries processed per second
- Servers required
- Time to market
- Data source connectors used
We invite you to use these specifications when evaluating vendor offerings. Furthermore, we encourage you to demand that prospective vendors contractually agree to meet your requirements with the resources they have proposed. Exalead can, and does.
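One way to apply these statistics is to record your requirements and each vendor's proposal in the same structure and list where a proposal falls short. The sketch below is a hypothetical illustration: the class, field names and sample numbers are assumptions made for the example, and connectors are omitted because they are not numeric.

```python
from dataclasses import dataclass


@dataclass
class SearchSpec:
    documents: int            # number of documents indexed
    refresh_seconds: int      # index refresh interval
    qps: int                  # queries processed per second
    servers: int              # servers required
    time_to_market_days: int  # time from kickoff to deployment


def unmet_requirements(required, offered):
    """Return the names of the requirements a proposal fails to meet."""
    gaps = []
    if offered.documents < required.documents:
        gaps.append("documents")
    if offered.refresh_seconds > required.refresh_seconds:
        gaps.append("refresh_seconds")
    if offered.qps < required.qps:
        gaps.append("qps")
    if offered.servers > required.servers:
        gaps.append("servers")
    if offered.time_to_market_days > required.time_to_market_days:
        gaps.append("time_to_market_days")
    return gaps


# Hypothetical requirement and vendor proposal.
required = SearchSpec(30_000_000, 900, 80, 3, 60)
offered = SearchSpec(30_000_000, 3600, 100, 5, 45)
gaps = unmet_requirements(required, offered)
print(gaps)
```

Here the proposal meets the volume, throughput and schedule targets but refreshes too slowly and needs more servers than budgeted, which is exactly the kind of gap to resolve contractually before purchase.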
"The indexing capacity and performance of CloudView impressed us, and we quickly realized that this solution would enable us to create the kind of research services we wanted for our clients while letting us retain control over our costs, software, services, servers and maintenance. What's more, the Exalead solution integrated transparently into our infrastructure, and offered essential security guarantees."
Jean-Luc Brizard, ISD, Coface Services
The Sanger Institute, a world-renowned research center dedicated to the study and analysis of genomes, uses CloudView for its knowledgebase of resources including genome data and genome-related scientific articles. Features include dynamic categorization and clustering, entity extraction (people, places, organizations), faceted results navigation, reverse search, proximity search, approximate search, spell checker.
Documents: 1.2 billion (XML files, database records, scientific documents); growing by 120 million documents every 2 months; projected to eventually reach 20 billion documents
Processing: 5 Queries Per Second (QPS)
Servers: 1 for indexing + 1 for searching
Time to Market: 6 weeks; search component ready in 10 days
Staffing: 1 part-time technician: 2 days per month
Connectors: Native ODBC Connector; XML API
Competitors: Lucene; CloudView replaced AltaVista
"Our in-house staff and our external researcher community are now instantaneously in touch with all the information they need... We have to provide the context behind the search that allows our users to navigate to the specific area of interest in a few clicks. It is a unique solution over our size of index."
Tony Cox, Head of Software, The Sanger Institute
GEFCO selected CloudView for its redesigned logistics portal, reducing the load on its Oracle databases and allowing staff, partners and customers to locate, track and optimize vehicle transport in real-time across 80 countries and 500 international routes. Features include dynamic categorization and clustering, entity extraction (people, places, organizations), faceted results navigation, geolocalization, reverse search, proximity search, approximate search, spell checker.
Documents: 1 million (representing 600,000 daily transactions)
Processing: 2000 documents indexed per second
Refresh Rate: Quasi real-time (30 seconds)
Servers: 1 for index build + 1 for search + 1 for high availability
Time to Market: Prototype 10 days; deployment in 60 days
Connectors: Native ODBC Connector
Notes: Improved functionality, performance and data freshness while offloading central databases and reducing IS infrastructure. Enforces strong firewalling of confidential client data.
"Exalead CloudView has dramatically improved system efficiency across the board. Before we installed CloudView it could take a day to get the results of such CPU-intensive queries, by which time the information was out of date. Now we get these answers almost instantly."
Guillaume Rabier, Manager of Studies and Projects, GEFCO
This hybrid online yellow and white page directory from France's leading directory service company uses CloudView to dynamically enrich database content with Web content (Web/database mash-up). Features include geolocalization, faceted results navigation, dynamic categorization and clustering, entity extraction (people, places, organizations), reverse search, proximity search, approximate search, spell checker.
Documents: 30 million (database records and Web pages)
Processing: 40 QPS per server
Refresh Rate: 15 minutes
Servers: 1 for build + 2 for search
Time to Market: 60 days
Connectors: Built-In HTTP and ODBC Connectors; XML API
Competitors: FAST
Notes: Features powerful natural language interpretation capabilities.
"Deploying an online directory is highly complex and usually requires 12 to 24 months. Exalead allowed us to launch our site in 2 months while bringing unmatched differentiating innovation."
Bruno Massiet Dubiest, CEO, 118 218
VIAMICHELIN
Travel publishing and services leader Michelin selected CloudView for its high-traffic travel portal, ViaMichelin. Features include rich mapping, dynamic categorization and clustering, entity extraction (people, places, organizations), faceted results navigation, geolocalization, reverse search, proximity search, approximate search, spell checker.
Documents: 15 million points of interest (hotels, restaurants, attractions, etc.)
Processing: 800 QPS; 150 milliseconds per query
Servers: 8
Time to Market: 4 weeks
Connectors: Built-In HTTP and ODBC Connectors; XML API
This classified ad site uses CloudView to aggregate listings from more than 500 public websites. Features include dynamic categorization and clustering, entity extraction (people, places, organizations), faceted results navigation, reverse search, proximity search, approximate search, spell checker.
Documents: 1 million announcements from 500 databases in 15 languages
Processing: 40 QPS; 6 million unique monthly visitors, with traffic growing rapidly (18% in most recent quarter)
Servers: 1 index build + 1 search + 1 high availability
Staffing: 100% of the work done by Yakaz team; Exalead provided only training
Connectors: Built-In HTTP Crawler + Extractors
Notes: The system is very non-intrusive; indexing has no impact on source databases.
RIGHTMOVE
Rightmove, the UK's top real estate classifieds portal, selected CloudView to enhance the end user experience, improve system performance, and reduce IT costs. Features include dynamic categorization and clustering, faceted results navigation, and geolocalization.
Documents: 2 million (real estate ads)
Processing: 400 QPS; 1.2 million records indexed in 1 hour; 29 million monthly visitors
Refresh Rate: Less than 2 minutes
Servers: 3 datacenters for high availability: each has 1 build + 2 search servers
Deployment: 3 months
Connectors: Built-In ODBC Connector
Notes: Cost of search successfully reduced from .06 pence to .01 pence per 1000 queries (with more powerful and intuitive search and navigation features). 99.99% reliability achieved. 30 Oracle CPUs replaced by 9 Exalead CPUs.
"Rightmove has already found that Exalead CloudView has allowed the speedy development of advanced search functionality whilst reducing search costs by 83%."
Peter Brooks-Johnson, Product Director, Rightmove
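The 83% figure can be checked directly against the Rightmove benchmark data, and the same arithmetic is useful when comparing vendor quotes. The sketch below uses only the numbers reported for this installation; the function name is an illustrative choice.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


# Figures reported in the Rightmove benchmark.
cost_cut = percent_reduction(0.06, 0.01)  # pence per 1000 queries
cpu_cut = percent_reduction(30, 9)        # Oracle CPUs replaced by Exalead CPUs

print(f"search cost reduced by {cost_cut:.0f}%")
print(f"CPU count reduced by {cpu_cut:.0f}%")
```

The cost per 1000 queries falls by about 83%, matching the quoted figure, and the CPU count falls by 70%.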
About Exalead
Founded in 2000 by search engine pioneers, Exalead is a global software provider in the enterprise and Web search markets. More than 190 companies worldwide and 100 million unique users a month rely on Exalead's information access platform to search, discover, and manage their information assets for faster, smarter decision-making, real-time unified data access, and improved productivity.
Exalead's team includes industry-leading experts in information search, unstructured data analysis, and natural language processing. This team has concentrated its R&D efforts on meeting its clients' need to collect, transform, index, and search arbitrarily complex data from heterogeneous sources. As a result, the Exalead CloudView™ product has emerged as a uniquely successful platform for automatically structuring very high volumes of unstructured data, such as email messages, Office documents, presentations, Web pages, blogs, forums, and RSS feeds.
CloudView is currently being deployed for:
- Enterprise Search
- Extended Business Applications (EBI, Smart CRM, Intelligent Compliance, etc.)
- eBusiness (search and content enhancement for high traffic websites)
- Improved Data Management (database offloading, data migration and information lifecycle management)
- Embedded Search for OEMs/ISVs
For more information, please visit http://www.exalead.com/software. The company's public WWW search engine is accessible at http://www.exalead.com/search.
Exalead France
10 place de la Madeleine 75008 Paris Tel: +33 (0) 1 55 35 26 26 Fax: +33 (0) 1 55 35 26 27
Exalead USA
576 Folsom Street, 2nd Floor San Francisco, CA 94105 Tel: +1 (415) 230 3800 Fax: +1 (415) 568 3375
Exalead Italy
Corto Giuseppe Garibaldi, 86 20121 - Milano Tel: +39 02 62 71 10 10 Fax: +39 02 62 71 10 11
Exalead UK
International House Stanley Bvd, Hamilton Glasgow G72 OBN Tel: +44 (0) 1698 404630 Fax: +44 (0) 1698 404639
Exalead Germany
Niederlassung Deutschland Robert-Bosch-Strasse 7 64293 Darmstadt Tel: +49 6151 35 99 690-0 Fax: +49 6151 35 99 690-35