
EXALEAD WHITEPAPER

Foreword

This whitepaper is intended to aid you in evaluating search and information access proposals for your organization by detailing a very important, often overlooked, cost component: scaling your search solution. Too many customers are surprised to find that almost immediately after deploying a search engine, they need to scale their platform, and that the cost of scaling can be exorbitant. This paper therefore:
- Identifies the reasons why search needs escalate so frequently and dramatically,
- Explains why scaling is often expensive,
- Provides practical advice for anticipating and controlling costs, and
- Furnishes performance benchmarks for more effectively making cost comparisons between solutions.

We hope this information will aid you in developing a complete TCO forecast for your search platform, one that effectively incorporates the costs associated with scaling functionality and/or performance in addition to more easily identifying direct, indirect and upgrade costs.

The Authors

We Welcome Your Feedback


Whatever your role (IT analyst, system administrator, application end user, business manager, security expert, or simply a curious reader), your feedback is important to us. We invite you to contact us at the address below with your comments, suggestions or questions.

Frédéric Catherine, Marketing Supervisor, Exalead
frederic.catherine@exalead.com
+33 1 55 35 26 81
www.exalead.com

Table of Contents
1 Why Search Demands & Costs Escalate
  1.1 Users Demand Wider Access, More Features
  1.2 IT Discovers New Uses, Additional Value
2 Anticipating and Controlling Costs
3 Forecasting Demand: Five Rules
  3.1 Double Your Estimated Volume; Anticipate Double-Digit Growth
  3.2 Plan for Additional Data Sources, including the Web
  3.3 Anticipate Demand for a Web-Style Experience, and Real Web Integration
  3.4 Plan for Increased Compliance Demands
  3.5 Position Yourself for the Unexpected
4 Understanding Search Types & Limitations
  4.1 Legacy Enterprise Search Products
  4.2 Search Add-Ons from Mainstream Application Providers
  4.3 Web Search Engines Ported to the Enterprise
5 Establishing Apples-to-Apples Cost Comparisons
6 About Exalead CloudView™
  6.1 Dual Web/Enterprise DNA
  6.2 High Performance with Minimal Resources
  6.3 Infinite Scaling
  6.4 True Unified Data Access
  6.5 Rapid Time to Market, Agile Development
7 CloudView Benchmarks
  7.1 Enterprise Search
  7.2 Business Applications - Database Offloading
  7.3 Web Applications - Online Directories
  7.4 Web Applications - Online Classifieds

Figures
Fig. 1: Data Volume Managed by Companies Worldwide (IDC)
Fig. 2: CloudView Scales with Minimal Resources
Fig. 3: CloudView Scales Infinitely in Five Directions
Fig. 4: Scaling with a Distributed Architecture + Commodity Hardware
Fig. 5: Transform Unstructured & Structured Data into a Single Structured Resource

1 Why Search Demands & Costs Escalate


At root, search demands and costs escalate because search works. Users are hungry for better, easier information access. In fact, IDC estimates that information workers spend 48% of their time searching for and analyzing information, with one-third of that time resulting in failed searches (and re-created work), costing organizations $28,000 per worker per year [1].
Information workers spend 48% of their time searching for information, with 1/3 of that time resulting in failed searches and recreated work

1.1 Users Demand Wider Access, More Features


Once enterprise search is deployed and users get a taste of unified, universal data access, demands to scale the system in functionality and performance appear almost immediately. Often, this is because organizations begin with an overly basic search solution: simple keyword searching of a finite set of resources, often HTML-centric, delivered via an appliance, hosted service or open source solution, and provided to a restricted user base. Even when more advanced systems are deployed, and a wider initial user base is served, users still quickly demand access to a wider range of data sources, and insist on more sophisticated features and functionality, such as automatic clustering and categorization, multilingual indexing, natural language querying and Web-style collaboration tools. And, of course, whatever the scope or functionality, users expect the sub-second responsiveness they've become accustomed to on the Internet.

1.2 IT Discovers New Uses, Additional Value


In addition to this user-driven escalation, IT departments often discover that their search engine can provide value beyond simply locating information. They learn that search engines can be used to derive new value from existing information assets while adding much-needed IT agility. Specifically, these engines can be used to:
- Create new, exploitable assets from unstructured content like email, Office documents, chat and Web pages
- Increase the value of existing structured content (i.e., database systems)
- Provide a unified data platform for constructing agile business applications

Transforming Unstructured Content into an Exploitable Asset

Search engines automatically classify and categorize unstructured data. Once this data is structured and indexed, it can be incorporated into business information systems and processes. Enterprises find this can provide a significant competitive advantage given that unstructured data makes up on average 80% of corporate information assets, and that it contains highly valuable emotive and qualitative data.
1. IDC Predictions 2009: An Economic Pressure Cooker Will Accelerate the IT Industry Transformation, IDC, 12/2008

Exalead Whitepaper: The Hidden Costs of Scaling Search, v 1.0 2009 Exalead

Increasing the Value of Structured Data

Because of performance limitations (databases are optimized for storing, not accessing, data) and heavy licensing and infrastructure costs, database resources are frequently under-utilized in the enterprise. However, when IT managers discover that index-based querying is as rich as relational database querying yet ten times faster and cheaper, they begin to use search engines to provide alternative access to essential database content.

Unified Data Access for Agile Applications

Unified data access is essential for meeting escalating compliance requirements, and for satisfying Web-savvy users' appetite for fast, easy information access. However, by decoupling data from traditional application layers, enterprises are also learning that search engines can enable a new breed of light business applications. Known as Search-Based Business Applications (SBAs), these applications can be created on-the-fly to satisfy evolving business needs using information drawn from any source, from legacy databases to email, blogs, and the Web, while leaving existing systems and structures untouched. This approach preserves existing IS investments and is clearly less complex and costly than traditional data and application integration strategies.

Maximize Benefit; Avoid Sticker Shock

Given these benefits to both end users and IT managers, it is no wonder that functional and performance demands on search escalate so frequently. And it is in the attempt to meet these escalating demands by scaling hardware, infrastructure and functionality that organizations frequently encounter search "sticker shock." They boost RAM, add servers, increase bandwidth, add or upgrade licenses, and set about the difficult (sometimes impossible) task of trying to make simple search tools perform complex analytic functions. But given that search is too often a complex, resource-intensive process, with infrastructure requirements increasing exponentially with increases in functional requirements, and that scaling is often tied to proprietary hardware or to unreasonable user or document counts, it is easy to see why costs can quickly mount. Even some solutions that begin at only a few thousand dollars can skyrocket to millions of dollars within just a few short years (sometimes even within one year) when functional or performance needs escalate.

Search-Based Business Applications (SBAs) are fast and easy to construct and can incorporate data from any source

Without built-in scalability, even low cost solutions can skyrocket to millions of dollars in just a few years


2 Anticipating and Controlling Costs


To avoid this unpleasant turn of events, one must better anticipate and control costs by:
- Forecasting search needs as accurately as possible and understanding the drivers for increased demands while still expecting the unexpected
- Understanding the differences between the basic types of search solutions and the scaling issues unique to each
- Establishing baseline functional and performance criteria to more easily make direct comparisons across solutions

3 Forecasting Demand: Five Rules


Keeping the user and management-driven demand factors from Section 1 in mind, we recommend following these five practical rules to better forecast demand.

3.1 Double Your Estimated Volume; Anticipate Double-Digit Growth


Enterprise datastores now routinely reach 100 terabytes (100,000 GB) or more, with large organizations accumulating as much as 2TB of new data daily. Much of this is unstructured User-Generated Content (UGC) enabled by personal productivity tools (like the Office suite) and communication tools (like email and chat). Structured content has likewise increased with the rise of content management systems designed to manage UGC, and with the widespread adoption of enterprise business applications (ERP, SCM, CRM, BI, CI, etc.).
Fig. 1: Data Volume Managed by Companies Worldwide (IDC)

This rapid, often unmeasured growth is the reason enterprises frequently underestimate the initial volume of content that needs to be indexed. Even for accurately forecast volumes, one must keep in mind that the organic growth in corporate datastores means that the content load for enterprise search typically doubles every 6 months. Therefore, to be safe, develop your best volume estimate, then double it, and plan for double-digit growth post-implementation.
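The doubling rule can be turned into a small planning calculation. The sketch below is illustrative only: the function name, the 10 TB starting estimate and the 25% annual growth rate are assumptions chosen for the example, not figures from this paper, and you should substitute your own measured growth rate.

```python
def forecast_index_volume(best_estimate_tb, annual_growth_rate, years):
    """Return planned index volume (TB) at go-live and at the end of each year.

    All inputs are hypothetical planning values supplied by the reader.
    """
    if best_estimate_tb < 0 or annual_growth_rate < 0 or years < 0:
        raise ValueError("inputs must be non-negative")
    # Rule 3.1: double the best estimate to cover unmeasured content.
    volume = 2.0 * best_estimate_tb
    plan = [volume]
    for _ in range(years):
        # Apply compound growth for each year after implementation.
        volume *= 1.0 + annual_growth_rate
        plan.append(volume)
    return plan


# Example: a 10 TB best estimate with an assumed 25% yearly growth rate.
plan = forecast_index_volume(best_estimate_tb=10.0, annual_growth_rate=0.25, years=3)
for year, tb in enumerate(plan):
    print(f"year {year}: {tb:.1f} TB")
```

Sizing proposals against the last entry of the plan, rather than the first, is one way to make vendors price the scaled demand and not only the initial deployment.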

3.2 Plan for Additional Data Sources, including the Web


New information silos are popping up daily due to increasing IS complexity and the rise of the Cloud model, that is to say the real-time delivery of software, information, computing power and other business services via the Web. Expect your users to demand that more and more of these internal and external data sources be included in the search index, or in applications built upon this index, including extensive data from the Web itself. The Web already constitutes an essential daily resource for knowledge-hungry workers, and workers are increasingly demanding that Web data (blogs, competitor sites, feedback forums, industry sites, etc.) be integrated in business applications.

3.3 Anticipate Demand for a Web-Style Experience, and Real Web Integration
These same Web-savvy users are also demanding that enterprise search, and the business applications built upon search platforms, be as easy and intuitive to use as Web applications, if not seamlessly integrated with those same tools. Make sure prospective bidders can meet users' demands for:
- Zero-training usage (for search and search-based applications)
- The ability to leverage Web and personal information for business tasks (e.g., using their LinkedIn network for sales and recruiting or integrating Facebook data in CRM applications)
- Web 2.0/3.0 interactive capabilities, such as workflow integration and collaborative tools like resource tagging, bookmarking and sharing
- Fresh, up-to-date data

"As people spend more time online actively participating in Web 2.0 technologies such as rich user interfaces based on Ajax and Flash, social networking and tagging, blogs and wikis, Web mashups, and on-demand services in general, information workers will start expecting Enterprise 2.0 applications in the workplace that focus on providing easy-to-use and many-to-many personalized online experiences for creating, publishing, locating, and sharing information with colleagues, customers, and partners."
Susan Feldman, IDC, Worldwide Search and Discovery Software 2008-2012 Forecast Update and 2007 Vendor Shares

3.4 Plan for Increased Compliance Demands


While IT has been working to meet increased legal and regulatory compliance demands for several years, regulatory pressures are revving up again in response to mismanagement issues underlying the recent economic crisis. Expect a trickle-down impact on your own compliance strategy, with heightened internal demand for better risk management as well.

3.5 Position Yourself for the Unexpected


As the evolution of the Internet and Cloud computing attest, the information landscape is changing so fast that many demands simply cannot be anticipated. To make sure you have an enterprise search tool that provides you with maximum agility in responding to the unexpected, look for:
- An SOA architecture, with core services that can be easily replicated and distributed
- Open, standards-based APIs for flexibility in managing and interacting with the platform
- Support for Web formats and protocols (SOAP, REST, OWL, XML, RDF, RSS, etc.) as well as major programming environments (Java, C#, .NET)
- A single, unified base of unstructured and structured data
- Linear scaling using commodity hardware

With these platform attributes, you can quickly modify existing applications, rapidly construct new applications and easily scale on demand.

4 Understanding Search Types & Limitations


Another aid to accurately forecasting costs is understanding the three basic types of enterprise search engines and their unique performance capabilities and limitations. These types are:
- Legacy enterprise search products
- Search add-ons from mainstream business application providers
- Web-based search engines ported to an enterprise environment

4.1 Legacy Enterprise Search Products


Designed from inception for enterprise search, these engines were constructed for cross-repository data access and use statistical and linguistics-based text analytics to automate content processing. This enables them to produce the kind of faceted results navigation required for task-based searching in the enterprise. Most also provide good support for existing enterprise security infrastructures. While this native enterprise focus enables these engines to accommodate a wide range of functional requirements, they are often complex to use and lag in Web-style features. They can also be expensive to scale as they were designed from the outset for a relatively small user base and a limited (often internal) set of data sources.

Originally designed for limited data collections and a small, trained user base, traditional enterprise systems are often difficult to scale

4.2 Search Add-Ons from Mainstream Application Providers


Another class of engine is that developed by leading business application providers (IBM, SAP, SAS, Microsoft, Oracle, etc.) who sought first to improve the search function within their own database-centered products, then to extend that search functionality to external repositories. As they were originally designed for database querying, these products typically offer limited text analytics (i.e., limited ability to process unstructured data), are expensive to connect to external data sources, and expensive to scale due to restrictive licensing policies and resource-intense engineering. Many of these vendors have attempted to address these shortcomings by acquiring native enterprise search companies, with limited success in product integration and support.

Search add-ons from non-search vendors are typically poor in text analytics, limited in source connectors, and expensive to scale

4.3 Web Search Engines Ported to the Enterprise


These search engines scale well, up to tens of billions of documents and hundreds of queries per second. However, they are feature-poor, designed for light keyword searching of mainly HTML content. They typically return a "laundry list" of search results rather than the faceted navigation required for task-based enterprise search (popularity-driven Web relevancy is meaningless in an enterprise context).

They have limited text analytic capabilities and a limited capacity to ingest, process and integrate structured content. They likewise have limited built-in support for the special security constraints of an enterprise environment. Therefore, extending the functionality of such systems to better meet enterprise needs can be very expensive, when it is doable at all.

Products from Web search vendors are weak in structured data handling, faceted navigation, and security

Lastly, even when scaling is limited to content well-suited to these engines, it can still be surprisingly expensive. These products are often sold with licenses tied to unrealistically low document counts, or scaling necessitates the purchase of expensive proprietary hardware. Consider, for example, the cost of scaling a search solution from one popular Web vendor to hundreds of millions of documents when for only 30 million documents, the solution requires a $500k bi-annual license of proprietary hardware.
Though they technically scale well, licensing policies often make scaling Web search engines expensive

5 Establishing Apples-to-Apples Cost Comparisons


Finally, you can better anticipate and control costs by conducting a more accurate, more complete comparison of vendor cost proposals. To do so, first detail your now-revised demand forecast, specifying:
- The Number of Users and Simultaneous Queries to be Processed
- The Number and Type of Sources and Documents to be Indexed
- The Range of Search and Indexing Features Required
- The Data Refresh Rate

Next, ask prospective vendors to provide 5-year costs to cover both the initial demand and the scaled demand. To realistically forecast TCO, these costs should include the following:

Direct Costs:
- Software Licensing Fees
- Hardware & Operating System (servers, server clusters, back-up systems): Initial Purchase and Upgrade Costs
- 3 Year 24*7 Support
- 5 Year Maintenance & Support

Indirect Costs:
- Staffing Costs for Software Implementation, and Software and Hardware Administration
- Hardware Floor Space
- Hardware Power and Cooling
- Hardware Bandwidth
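The cost categories above can be rolled up into a single comparable figure per proposal. The following is a minimal sketch under stated assumptions: the `CostProposal` fields, the grouping of facilities costs into one yearly figure, and every number in the example are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class CostProposal:
    """Hypothetical vendor proposal; every field is an illustrative assumption."""
    software_license_per_year: float
    hardware_initial: float
    hardware_upgrade: float       # one-time cost to reach the scaled demand
    support_per_year: float       # support and maintenance contracts
    staffing_per_year: float      # implementation and administration staff
    facilities_per_year: float    # floor space, power, cooling, bandwidth


def five_year_tco(p: CostProposal, years: int = 5) -> dict:
    """Sum direct and indirect costs over the forecast period."""
    direct = (
        p.software_license_per_year * years
        + p.hardware_initial
        + p.hardware_upgrade
        + p.support_per_year * years
    )
    indirect = (p.staffing_per_year + p.facilities_per_year) * years
    return {"direct": direct, "indirect": indirect, "total": direct + indirect}


# Example with made-up figures for a single proposal.
vendor_a = CostProposal(
    software_license_per_year=50_000,
    hardware_initial=40_000,
    hardware_upgrade=60_000,
    support_per_year=10_000,
    staffing_per_year=30_000,
    facilities_per_year=5_000,
)
print(five_year_tco(vendor_a))
```

Running the same function over each vendor's figures, for both the initial and the scaled demand, gives totals that can be compared directly.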

Note: Keep in mind that you can not only reduce costs by selecting a resource-efficient solution, you can also help your organization meet Green IT objectives.


6 About Exalead CloudView™


As a final tool for anticipating and controlling costs, we provide several performance benchmarks for CloudView that you can use in comparing vendor solutions. But first, it is helpful to understand why CloudView provides an important comparative model for cost-efficient search scaling.

6.1 Dual Web/Enterprise DNA


First, and most importantly, CloudView was designed from inception for both the Web, driving an 8 billion (soon to be 16 billion) page public search engine and serving 100 million unique researchers a month, and the enterprise market, with advanced semantic processing of unstructured data, superior structured data handling, and full compliance with existing security systems.

6.2 High Performance with Minimal Resources


Furthermore, CloudView was designed to achieve this balance of Web scalability and enterprise functionality using minimal resources. The end result is a platform that uses on average 1/5th the hardware resources of competitors, providing real-time indexing of 100 million documents and processing 20 queries per second on a single commodity server, all while providing advanced semantic features like dynamic categorization and clustering.
Fig. 2: CloudView Scales with Minimal Resources

6.3 Infinite Scaling

Fig. 3: CloudView Scales Infinitely in Five Directions


CloudView easily and cost-effectively scales in five directions:
- The Total Number of System Users: Proven capacity to serve 100 million unique monthly visitors
- System Features and Functionality: Extensive built-in functionality with full administrator control over features activated; open APIs for endlessly extending functionality
- Volume of Data Indexed: Index and index build services can easily be distributed across commodity servers; built-in index partitioning and replication services further extend performance and availability
- Number of Simultaneous Queries Processed: Average throughput of 20 Queries per Second (QPS) per server; easily scales by distributing query processing across multiple commodity servers
- Index Refresh Rate: Supports any data refresh strategy: 1) real-time, 2) interval, and 3) just in time (on query reception). Dictionaries, thesauri, etc., are automatically updated as the index is updated.
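As a rough illustration of scaling query throughput by adding servers, the sketch below estimates how many search servers a target load would need. It assumes throughput grows linearly with server count and uses the 20 QPS per-server average quoted above as a default; the function name and the optional spare-replica parameter are hypothetical, and real per-server throughput will vary with index size and the features enabled.

```python
import math


def search_servers_needed(peak_qps, qps_per_server=20.0, spare_replicas=0):
    """Estimate search servers for a peak query load, assuming linear scaling.

    spare_replicas is a hypothetical allowance for high availability.
    """
    if peak_qps < 0 or qps_per_server <= 0 or spare_replicas < 0:
        raise ValueError("peak_qps and spare_replicas must be >= 0, qps_per_server > 0")
    # Round up: a fraction of a server still requires a whole machine.
    return math.ceil(peak_qps / qps_per_server) + spare_replicas


# Example: 95 QPS at peak, with and without one spare server.
print(search_servers_needed(95))
print(search_servers_needed(95, spare_replicas=1))
```

The same calculation can be repeated with any vendor's quoted per-server throughput to compare hardware requirements for an identical load.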

Fig. 4: Scaling with a Distributed Architecture + Commodity Hardware

CloudView is designed to maximize performance and availability through process distribution, load balancing, index partitioning and index replication.

"Exalead's ability to scale is comparable to Google's... Most enterprise search and content processing systems cannot handle billions of documents; Exalead does. Exalead's search and content processing solutions give the company a technical advantage over vendors whose systems choke when thousands of users simultaneously want access to information."
Stephen Arnold, ArnoldIT


6.4 True Unified Data Access


Because CloudView was developed simultaneously for Web and enterprise search, the platform's natural language processing modules (text processing and annotation, automatic document classification, named entity extraction, etc.) are especially adept at analyzing, categorizing and classifying very high volumes of unstructured data: content like Word documents, Web pages, blog entries, email messages, PowerPoint presentations, PDFs, etc. This automatic structuring not only makes previously unstructured data directly accessible as a new information channel, it also enables CloudView to synthesize it with existing structured data, such as that from corporate databases and business applications. This meaningful correlation forms the foundation for value-added uses such as database offloading, data migration, and content mash-ups.

Fig. 5: Transform Unstructured & Structured Data into a Single Structured Resource

6.5 Rapid Time to Market, Agile Development


Rapid Time to Market

CloudView is both a fully packaged, off-the-shelf product designed for plug and play use, and a white box solution that can be quickly adapted to specific needs using standards-based APIs. As a result, CloudView typically deploys in just days for enterprise search, and on average within only 4-6 weeks for advanced business applications and data mash-ups, with little to no need for professional services support.

Agile Development

Beyond initial deployment, CloudView provides an agile base for rapidly constructing new business applications, and can be quickly scaled to meet evolving demands. Application agility is assured by CloudView's fully unified data access platform, SOA architecture and open API framework, while the ability to scale quickly is made possible by built-in distribution and replication facilities that simply require the addition of commodity hardware.

7 CloudView Benchmarks
To help you form a baseline of functional and performance requirements for comparing solutions, we provide below benchmarks for actual Exalead CloudView™ installations. These benchmarks include statistics such as:
- Number of documents indexed
- Refresh rate for the index
- Queries processed per second
- Servers required
- Time to market
- Data source connectors used

We invite you to use these specifications when evaluating vendor offerings. Furthermore, we encourage you to demand that prospective vendors contractually agree to meet your requirements with the resources they have proposed. Exalead can, and does.
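One way to compare such statistics across vendors is to normalize them per server. The sketch below is an assumption-laden illustration: the `Benchmark` structure and the two proposals are invented for the example and do not describe any real installation or product.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    """Hypothetical benchmark record; all values are supplied by the evaluator."""
    name: str
    documents: int
    total_qps: float
    servers: int


def per_server(b: Benchmark) -> dict:
    """Normalize document volume and query throughput by server count."""
    if b.servers <= 0:
        raise ValueError("servers must be positive")
    return {
        "documents_per_server": b.documents / b.servers,
        "qps_per_server": b.total_qps / b.servers,
    }


# Two made-up proposals for the same workload on different hardware counts.
proposals = [
    Benchmark("Proposal A", documents=30_000_000, total_qps=120, servers=3),
    Benchmark("Proposal B", documents=30_000_000, total_qps=120, servers=8),
]
for b in proposals:
    print(b.name, per_server(b))
```

Per-server figures make it easier to see when two proposals meet the same requirement with very different amounts of hardware.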

7.1 Enterprise Search


COFACE EXTRANET

Coface, a world leader in trade-credit information and protection with offices in 60 countries, selected CloudView for this extranet which provides customers with key data on 100 million companies.

Performance Benchmarks
Documents Indexed: 100 million (Oracle db records)
Processing: 2000 documents indexed per second; 1.7 million company profiles added per hour
Refresh Rate: Less than 1 minute
Servers Required: 2 for indexing + 2 for searching
Time to Market: 60 days
Connectors: Standard PAPI and ODBC Connectors
Competitors: Sinequa, Fast
Note: Response rate is five times faster than legacy system

"The indexing capacity and performance of CloudView impressed us, and we quickly realized that this solution would enable us to create the kind of research services we wanted for our clients while letting us retain control over our costs, software, services, servers and maintenance. What's more, the Exalead solution integrated transparently into our infrastructure, and offered essential security guarantees."
Jean-Luc Brizard, ISD, Coface Services


SANGER INSTITUTE INTRANET

The Sanger Institute, a world-renowned research center dedicated to the study and analysis of genomes, uses CloudView for its knowledgebase of resources including genome data and genome-related scientific articles. Features include dynamic categorization and clustering, entity extraction (people, places, organizations), faceted results navigation, reverse search, proximity search, approximate search, spell checker.

Documents: 1.2 billion (XML files, database records, scientific documents); growing by 120 million documents every 2 months; projected to eventually reach 20 billion documents
Processing: 5 Queries Per Second (QPS)
Servers: 1 for indexing + 1 for searching
Time to Market: 6 weeks; search component ready in 10 days
Staffing: 1 part-time technician: 2 days per month
Connectors: Native ODBC Connector; XML API
Competitors: Lucene; CloudView replaced AltaVista

"Our in-house staff and our external researcher community are now instantaneously in touch with all the information they need... We have to provide the context behind the search that allows our users to navigate to the specific area of interest in a few clicks. It is a unique solution over our size of index."
Tony Cox, Head of Software, The Sanger Institute


7.2 Business Applications - Database Offloading


GEFCO EXTRANET/DATABASE OFFLOADING

GEFCO selected CloudView for its redesigned logistics portal, reducing the load on its Oracle databases and allowing staff, partners and customers to locate, track and optimize vehicle transport in real-time across 80 countries and 500 international routes. Features include dynamic categorization and clustering, entity extraction (people, places, organizations), faceted results navigation, geolocalization, reverse search, proximity search, approximate search, spell checker.

Documents: 1 million (representing 600,000 daily transactions)
Processing: 2000 documents indexed per second
Refresh Rate: Quasi real-time (30 seconds)
Servers: 1 for index build + 1 for search + 1 for high availability
Time to Market: Prototype 10 days; deployment in 60 days
Connectors: Native ODBC Connector
Notes: Improved functionality, performance and data freshness while offloading central databases and reducing IS infrastructure. Enforces strong firewalling of confidential client data.

"Exalead CloudView has dramatically improved system efficiency across the board. Before we installed CloudView it could take a day to get the results of such CPU-intensive queries, by which time the information was out of date. Now we get these answers almost instantly."
Guillaume Rabier, Manager of Studies and Projects, GEFCO


7.3 Web Applications - Online Directories


118 218.fr

This hybrid online yellow and white page directory from France's leading directory service company uses CloudView to dynamically enrich database content with Web content (Web/database mash-up). Features include geolocalization, faceted results navigation, dynamic categorization and clustering, entity extraction (people, places, organizations), reverse search, proximity search, approximate search, spell checker.

Documents: 30 million (database records and Web pages)
Processing: 40 QPS per server
Refresh Rate: 15 minutes
Servers: 1 for build + 2 for search
Time to Market: 60 days
Connectors: Built-In HTTP and ODBC Connectors; XML API
Competitors: FAST
Notes: Features powerful natural language interpretation capabilities.

"Deploying an online directory is highly complex and usually requires 12 to 24 months. Exalead allowed us to launch our site in 2 months while bringing unmatched differentiating innovation."
Bruno Massiet Dubiest, CEO, 118 218


VIAMICHELIN

Travel publishing and services leader Michelin selected CloudView for its high-traffic travel portal, ViaMichelin. Features include rich mapping, dynamic categorization and clustering, entity extraction (people, places, organizations), faceted results navigation, geolocalization, reverse search, proximity search, approximate search, and a spell checker.

Documents: 15 million points of interest (hotels, restaurants, attractions, etc.)
Processing: 800 QPS; 150 milliseconds per query
Servers: 8
Time to Market: 4 weeks
Connectors: Built-In HTTP and ODBC Connectors; XML API
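As a rough sanity check on the ViaMichelin figures, the short Python sketch below derives two numbers that make deployments easier to compare: the average number of queries in flight (using Little's Law, concurrency = arrival rate x time in system) and the average query rate per server. It rests on two assumptions not stated in the source: that 150 milliseconds is an average response time, and that load is spread evenly over all 8 servers (the source does not say how many of them serve queries rather than build the index). The function and variable names are illustrative only.

```python
def in_flight_queries(qps: float, latency_ms: float) -> float:
    """Little's Law: average concurrency = arrival rate x time in system."""
    return qps * latency_ms / 1000


# Figures quoted in the ViaMichelin fact sheet.
QPS = 800          # queries per second
LATENCY_MS = 150   # milliseconds per query (assumed to be an average)
SERVERS = 8        # assumed to share the query load evenly

concurrent = in_flight_queries(QPS, LATENCY_MS)
per_server_qps = QPS / SERVERS

print(f"Average queries in flight: {concurrent:.0f}")
print(f"Average load per server: {per_server_qps:.0f} QPS")
```

If some of the 8 servers are dedicated to index builds or standby duty, the per-server query load would be correspondingly higher than this estimate.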


7.4 Web Applications - Online Classifieds


YAKAZ

This classified ad site uses CloudView to aggregate listings from more than 500 public websites. Features include dynamic categorization and clustering, entity extraction (people, places, organizations), faceted results navigation, reverse search, proximity search, approximate search, and a spell checker.

Documents: 1 million announcements from 500 databases in 15 languages
Processing: 40 QPS; 6 million unique monthly visitors, with traffic growing rapidly (18% in most recent quarter)
Servers: 1 index build + 1 search + 1 high availability
Staffing: 100% of the work done by Yakaz team; Exalead provided only training
Connectors: Built-In HTTP Crawler + Extractors
Notes: The system is very non-intrusive; indexing has no impact on source databases.


RIGHTMOVE

Rightmove, the UK's top real estate classifieds portal, selected CloudView to enhance the end user experience, improve system performance, and reduce IT costs. Features include dynamic categorization and clustering, faceted results navigation, and geolocalization.

Documents: 2 million (real estate ads)
Processing: 400 QPS; 1.2 million records indexed in 1 hour; 29 million monthly visitors
Refresh Rate: Less than 2 minutes
Servers: 3 datacenters for high availability; each has 1 build + 2 search servers
Deployment: 3 months
Connectors: Built-In ODBC Connector
Notes: Cost of search successfully reduced from 0.06 pence to 0.01 pence per 1,000 queries (with more powerful and intuitive search and navigation features). 99.99% reliability achieved. 30 Oracle CPUs replaced by 9 Exalead CPUs.

"Rightmove has already found that Exalead CloudView has allowed the speedy development of advanced search functionality whilst reducing search costs by 83%."
Peter Brooks-Johnson, Product Director, Rightmove
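The Rightmove figures can be cross-checked with simple arithmetic. The Python sketch below (the helper name is illustrative, not from the source) computes the relative reduction in search cost and CPU count, plus the average indexing rate implied by 1.2 million records in 1 hour. The cost figures of 0.06 and 0.01 pence per 1,000 queries are consistent with the 83% reduction quoted by Rightmove.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Figures quoted in the Rightmove case study.
cost_reduction = percent_reduction(0.06, 0.01)  # pence per 1,000 queries
cpu_reduction = percent_reduction(30, 9)        # Oracle CPUs vs. Exalead CPUs
indexing_rate = 1_200_000 / 3600                # records per second, averaged over 1 hour

print(f"Search cost reduction: {cost_reduction:.0f}%")
print(f"CPU count reduction: {cpu_reduction:.0f}%")
print(f"Average indexing rate: {indexing_rate:.0f} records/second")
```

The same helper can be applied to any before/after pair in a vendor proposal, which keeps cost comparisons on a like-for-like percentage basis.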


About Exalead
Founded in 2000 by search engine pioneers, Exalead is a global software provider in the enterprise and Web search markets. More than 190 companies worldwide and 100 million unique users a month rely on Exalead's information access platform to search, discover, and manage their information assets for faster, smarter decision-making, real-time unified data access, and improved productivity.

Exalead's team includes industry-leading experts in information search, non-structured data analysis, and natural language processing. This team has concentrated its R&D efforts on meeting its clients' need to collect, transform, index, and search arbitrarily complex data from heterogeneous sources. As a result, the Exalead CloudView™ product has emerged as a uniquely successful platform for automatically structuring very high volumes of non-structured data, such as email messages, Office documents, presentations, Web pages, blogs, forums, and RSS feeds.

CloudView is currently being deployed for:
Enterprise Search
Extended Business Applications (EBI, Smart CRM, Intelligent Compliance, etc.)
eBusiness (search and content enhancement for high-traffic websites)
Improved Data Management (database offloading, data migration and information lifecycle management)
Embedded Search for OEMs/ISVs

For more information, please visit http://www.exalead.com/software. The company's public Web search engine is accessible at http://www.exalead.com/search.

Exalead France
10 place de la Madeleine
75008 Paris
Tel: +33 (0) 1 55 35 26 26
Fax: +33 (0) 1 55 35 26 27

Exalead USA
576 Folsom Street, 2nd Floor
San Francisco, CA 94105
Tel: +1 (415) 230 3800
Fax: +1 (415) 568 3375

Exalead Italy
Corso Giuseppe Garibaldi, 86
20121 - Milano
Tel: +39 02 62 71 10 10
Fax: +39 02 62 71 10 11

Exalead UK
International House
Stanley Bvd, Hamilton
Glasgow G72 0BN
Tel: +44 (0) 1698 404630
Fax: +44 (0) 1698 404639

Exalead Germany
Niederlassung Deutschland
Robert-Bosch-Strasse 7
64293 Darmstadt
Tel: +49 6151 35 99 690-0
Fax: +49 6151 35 99 690-35
