SALE OF ASSETS OF SEARCHME, INC.

Lighthouse Capital Partners V, L.P. ("Lighthouse"), the senior secured lender to SearchMe, Inc.
("SearchMe") (www.SearchMe.com), is soliciting interest for the acquisition of all or substantially all of
SearchMe's assets, including its Intellectual Property ("IP"), in whole or in part (collectively, the
"SearchMe Assets"). Please be advised that the SearchMe Assets are being offered for sale pursuant to
Section 9-610 of the Uniform Commercial Code. Purchasers of the SearchMe Assets will receive all of
SearchMe's rights in the purchased portion of Lighthouse's collateral, which consists of substantially all of
SearchMe's assets, as provided in the Uniform Commercial Code.

The sale is being conducted by Lighthouse with the cooperation of SearchMe. SearchMe has advised
Lighthouse that it will use its best efforts to make its employees available to assist purchasers with due
diligence and assist with a prompt and efficient transition at a mutually convenient time.

IMPORTANT LEGAL NOTICE

The information in this memorandum does not constitute the whole or any part of an offer or a contract.

The information contained in this memorandum relating to the SearchMe Assets has been supplied by
third parties and obtained from a variety of sources. It has not been independently investigated or verified
by Lighthouse or its respective agents.

Potential purchasers should not rely on any information contained in this memorandum or provided by
Lighthouse (or its respective staff, agents, and attorneys) in connection herewith, whether transmitted
orally or in writing (the "information"), as a statement, opinion, or representation of fact. Please note that
all information provided herein relating to the operations of SearchMe's business and its market positions
relates to periods on or prior to March 31, 2009. Interested parties should satisfy themselves through
independent investigations as they see fit.

Lighthouse and its respective staff, agents, and attorneys (i) disclaim any and all implied warranties
concerning the truth, accuracy, and completeness of any information provided in connection herewith and
(ii) do not accept liability for the information, including that contained in this memorandum, whether that
liability arises by reason of Lighthouse's negligence or otherwise.

Any sale of the SearchMe Assets will be made on an "as-is," "where-is," and "with all faults" basis,
without any warranties, representations, or guarantees, either express or implied, of any kind, nature, or
type whatsoever from, or on behalf of, Lighthouse. Without limiting the generality of the foregoing,
Lighthouse, and its respective staff, agents, and attorneys, hereby expressly disclaim any and all implied
warranties concerning the condition of the SearchMe Assets and any portions thereof, including, but not
limited to, environmental conditions, compliance with any government regulations or requirements, the
implied warranties of habitability, merchantability, or fitness for a particular purpose.

This memorandum contains confidential information and is not to be supplied to any person without
Lighthouse's prior consent.

Total investment in SearchMe, Inc.:

$45 Million
of equity capital proceeds
from institutional venture capital funds
and other private investors from 2004-2008
THE FOLLOWING IS PRESENTED FOR INFORMATIONAL PURPOSES ONLY ON A "BEST EFFORTS"
BASIS. NO WARRANTY IS PROVIDED WITH REGARD TO THE ACCURACY OF THE INFORMATION
HEREIN OR THE VALUE OF SEARCHME'S ASSETS. PAST PERFORMANCE MAY NOT BE INDICATIVE
OF FUTURE RESULTS.

SUMMARY POINTS

o SearchMe Invented, Built, and Ran a Proprietary, Large-Scale Search Technology from the
ground up. It is one of the only such systems built in the last decade, in addition to those of
Inktomi, Yahoo, Google, and MSN/Bing.

o SearchMe is one of the Early Pioneers of Integrated Media Blending: mixing video, music,
and web pages into a single search. SearchMe's technology is aware of the different
characteristics of rich media, and can disambiguate between media types with similar names.

o SearchMe's Vertical Architecture has Query Understanding within 100 milliseconds across
hundreds of categories. This leveraged a complex and deep taxonomy of web pages trained over
one thousand categories.

o Significant Investment in Intellectual Property and Assets – over $45 million of equity capital
has funded the development of SearchMe's business and Intellectual Property.

o Designed and Developed by Top-Tier Intellectual Talent in Search and UI Design –
SearchMe's intellectual property and assets were developed by some of the best minds in search
and user interface design.

o Funded and Vetted by well-known institutional venture capital funds, including Sequoia
Capital. SearchMe's most recent formal round of equity capital in April 2008 valued the
company's assets at approximately $250 million post-money.

o Well-Known Brand Name, Trademark, and Domain Name related to Visual Search.

o IP Documented across 23 Patent Applications, including three pending published patent
applications, described in greater detail in Exhibit A.

o Strong Historical Growth in "People Per Month," per Quantcast.

SearchMe, Inc. Confidential Page 2


SUMMARY OF HISTORICAL INFORMATION1

SearchMe is a large-scale general-purpose search engine comprised of back-end server software and
multiple front-end software clients for the web and five native mobile platforms. SearchMe specializes in
(1) the visual presentation of web pages, various types of rich media, and other forms of Internet search
results, and (2) the delivery and retrieval of highly relevant results for a given search query within a
vertical domain. SearchMe's visual approach to search is designed to create a more natural and human
experience for the user, while at the same time crafting an environment that may be a better fit for the
delivery of rich streams of Internet video and other types of media. This approach to Internet search has
the potential to create a rich and differentiated platform for Internet advertising campaigns. In addition to
its advantages in rich media search, SearchMe's novel User Interface may have applications beyond
traditional Internet search, including the potential to be repurposed as an elegant user interface for the set-
top box market, or as an embedded UI for electronic devices. SearchMe's technology assets have
leveraged deep industry knowledge from some of the best minds in search and UI design to create a
proprietary technology platform composed of (A) a well-known and elegant front-end User Interface
design and related assets, and (B) a back-end search and advertising architecture and related assets.
SearchMe has been prominently featured in numerous major U.S. and global publications, including Time
Magazine, The Washington Post, TechCrunch, and Bloomberg (Venture Show) among others. Time
Magazine heralded SearchMe as one of the "50 Best Web Sites of 2008."2 At its peak, the company
generated approximately 5 million "people per month."

The company's assets are well-positioned to capitalize on several important industry trends – the robust
growth of Internet advertising and Internet search; the increased demand for online videos and other types
of rich multimedia within search; and the eventual convergence of television with broadband Internet.

SearchMe's Intellectual Property is described in 23 patent applications, including three pending
published patent applications attached in Exhibit A.

SearchMe is a privately held company. SearchMe (founded in 2004 as Kavam, Inc.) is headquartered in
Mountain View, California. To date, SearchMe has secured over $45 million in equity financing, with the
company achieving a post-money valuation of nearly $250MM in April 2008. SearchMe's institutional
investors include Sequoia Capital, DAG and Deep Fork Capital, among others.

THE MARKET:

SearchMe has historically competed primarily in the market for Internet Advertising and specifically in
the market for Internet search engine advertising.

According to PwC and the Interactive Advertising Bureau, Internet advertising revenues ("revenues") in the
United States totaled $23.4 billion for the full year of 2008, with Q3 accounting for approximately $5.8
billion and Q4 totaling approximately $6.1 billion. Internet advertising revenues for the full year of 2008
increased 10.6 percent over 2007.

According to PwC and the Interactive Advertising Bureau, in 2008, search engines represented the largest
online advertising revenue format, accounting for 45 percent of 2008 full year Internet advertising
revenues, up from the 41 percent reported in 2007. Search revenues totaled $10.5 billion for the full year
2008, up 20 percent from the $8.8 billion reported in 2007. As demonstrated in the following chart, search
engines represent the largest category of Internet advertising revenue from 2004 to 2008.

1
ALL INFORMATION PROVIDED HEREIN RELATING TO THE OPERATIONS OF SEARCHME'S BUSINESS AND THE MARKET POSITIONS
RELATES TO PERIODS ON OR PRIOR TO MARCH 31, 2009 AND IS PROVIDED ON A "BEST-EFFORTS" BASIS. INTERESTED PARTIES SHOULD
SATISFY THEMSELVES THROUGH INDEPENDENT INVESTIGATIONS AS THEY OR THEIR LEGAL AND FINANCIAL ADVISORS SEE FIT THAT
THE INFORMATION IS ACCURATE. LIGHTHOUSE MAKES NO WARRANTY TO THE ACCURACY OF ANY INFORMATION CONTAINED HEREIN
OR THE VALUE OF THE SEARCHME ASSETS.
2
http://www.time.com/time/specials/2007/article/0,28804,1809858_1809955_1811466,00.html

It is widely assumed that each percentage point of share in the search market is worth $1 billion in
market cap valuation. 3

VALUE, QUALITY, AND OPPORTUNITY:

SearchMe has historically experienced strong growth and has been among the leaders in next generation
search technology and progressive, visual interface designs for the presentation of rich media from a
multitude of sources. However, recent working capital constraints have created the opportunity for all or a
portion of SearchMe's assets to be sold.

Value: Over $45 million of equity capital investment in SearchMe's business &
intellectual property. Due to market circumstances, the technology assets and
intellectual property are available for purchase at a material discount to the
substantial total amount of equity invested in SearchMe since 2004.

Quality: SearchMe was founded by an experienced team of search experts, and
funded by top-tier institutional venture capital funds, including Sequoia
Capital. SearchMe's management team has included some of the most sought
after experts in user interface design and search technology.

Opportunity: SearchMe's Assets have the capacity to be deployed in a number of ways
that could create value for an acquirer of these assets. This opportunity
includes the creation of a Stand Alone Visual Search Engine with Enhanced
Vertical Search Capabilities.

3
http://dondodge.typepad.com/the_next_big_thing/2007/05/why_1_of_search.html
http://www.altsearchengines.com/2009/06/10/1-market-share-in-search-is-worth-up-to-3-billion-dollars/

IMPORTANT TECHNOLOGY:

(A) Valuable & Elegant User Interface that can be repurposed across markets: The
Company's proprietary user interface design provides superior functionality relative to
competitive solutions.

Thumbnail Server – renders visual previews of dynamic websites. Leverages DOM to
gather metadata including references to specific media, phone numbers, addresses (beta),
text, links, and Flash media. This data is saved in a document and can be passed to the back end
for relevance ranking or the front end for a more dynamic user experience.

Ribbon Control – an elegant Flex control used to display large amounts of visually
compelling media in a horizontal list format. Though this control resembles Apple's Cover
Flow feature, it also supports unique "ribbon" views and a standard "film" view.

Query Auto Complete – given the beginning of a query, this feature will predict what the
user is typing and return a list of suggestions. This feature comes with the logic for the auto
complete and the data to power it.
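
As a rough illustration of prefix-based query completion, the sketch below returns the most frequent logged queries that start with what the user has typed so far. The function name `suggest`, the sample queries, and their counts are hypothetical; this memorandum does not describe how SearchMe's own feature was implemented.

```python
from bisect import bisect_left


def suggest(prefix, query_counts, limit=5):
    """Return up to `limit` queries starting with `prefix`, most frequent first.

    `query_counts` maps a lowercase query string to how often it was seen
    (hypothetical data standing in for "the data to power it").
    """
    prefix = prefix.lower()
    queries = sorted(query_counts)          # sorted order enables binary search
    start = bisect_left(queries, prefix)    # index of the first query >= prefix
    matches = []
    for query in queries[start:]:
        if not query.startswith(prefix):
            break                           # sorted order: no later query can match
        matches.append(query)
    # Rank by descending frequency, breaking ties alphabetically.
    matches.sort(key=lambda q: (-query_counts[q], q))
    return matches[:limit]


sample_counts = {
    "saturn": 40,
    "saturn cars": 25,
    "saturn planet": 60,
    "search engines": 10,
}
print(suggest("sat", sample_counts))
```

A production system would more likely use a precomputed trie or similar index rather than sorting on every call; the sort is kept inline here only to keep the sketch short.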

(B) Innovative and Scalable Search & Advertising Architecture: The Company's proprietary
back-end search and advertising architecture is purpose-built to provide highly-relevant
search results paired with rich multimedia advertising capabilities. SearchMe's back-end
architecture includes the following systems, feeds, data, and technology components:

Data:
o Labeled relevance judgment data (TORGO Data)
o Labeled classification training/evaluation data (CHOCO Data)

Offline Processing Systems:
o CHOCO Classification System
o TORGO Judgment Collection System
o SFP – Special Features Processing
o Aggregation
o Specialized SPAM Prediction Engine
o Page Quality Prediction Engine
o Spider System

Offline Media Feeds & Freshness Systems:
o Paid Inclusion System
o El Rapido System
o Hulu Document Processor
o Other Media Feed Processing System (YouTube RSS, generic media)

Online System Components:
o Cluster Server (receives queries and sends to IS/TS/WS)
o Index Server (receives query from CS and performs scoring)
o Term Server (parses query and adds extra query-level features)
o Widget Server (returns specialized results)

Key Components, Subsystems, and Code Snippets:
o Search Relevance (list of used features, and code to generate them, along with importance in our
trained function)
o Paparazzi Technology
o Eddie Technology

Core Offline Systems for Online Production:
o Distributed Document Storage System
o IB – Index Builder
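
The roles of the online components listed above (Cluster Server, Term Server, Index Server, Widget Server) can be pictured with a toy query flow. This is only a sketch under stated assumptions: every function name, the term-overlap score, the "weather" widget, and the choice to hand the Term Server's output to the Index Server are hypothetical and are not taken from SearchMe's code.

```python
def term_server(query):
    """Parse the query and attach extra query-level features (made-up ones)."""
    terms = query.lower().split()
    return {"terms": terms, "num_terms": len(terms)}


def index_server(parsed, documents):
    """Score documents against the parsed query with a toy term-overlap count."""
    scored = []
    for url, text in documents.items():
        words = set(text.lower().split())
        score = sum(1 for term in parsed["terms"] if term in words)
        if score:
            scored.append((score, url))
    # Highest score first; ties broken by URL for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


def widget_server(parsed):
    """Return specialized results; here, a hypothetical weather widget."""
    if "weather" in parsed["terms"]:
        return [{"widget": "weather"}]
    return []


def cluster_server(query, documents):
    """Receive the query, fan it out to TS, IS, and WS, and merge the answers."""
    parsed = term_server(query)
    return {
        "web": index_server(parsed, documents),
        "special": widget_server(parsed),
    }


docs = {
    "a.example": "Saturn planet rings",
    "b.example": "Saturn cars dealer",
    "c.example": "weather forecast",
}
print(cluster_server("saturn planet", docs))
```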

MANAGEMENT TEAM AT SEARCHME 4

Randy Adams, Founder and CEO: Over 25 years of experience; founded, built, and sold seven
venture-backed start-ups in the technology sector; raised more than $200 million in venture capital;
arranged for the initial funding of Yahoo, Inc. and served on the Yahoo, Inc. board of directors; created
the first internet commerce company, the Internet Shopping Network, sold to the Home Shopping Network;
served as Division President of the Home Shopping Network and Director of Engineering for Adobe
Systems, where he envisioned and created Adobe Acrobat and the PDF file format.

John Galatea, Vice President of Sales & Marketing: Over 25 years of experience in Sales, Business
Development, and Marketing in Silicon Valley. John was an early member of the Inktomi team that
revolutionized OEM search. He also co-founded the Paid Inclusion platform for monetizing algorithmic
search that was acquired by Yahoo in 2003 and grew revenues to over $200M annually for Yahoo. In
addition, John was asked to evangelize, develop, and lead sales organizations for Yahoo in both Sponsored
Search and Display across Digital Agencies, Direct Clients, and SEMs.

Eric Glover, Principal Scientist & Classification Architect: Eric Glover has a PhD from University of
Michigan and has over ten years of commercial and academic web search experience. Previously at Ask
Jeeves and before that NEC Laboratories America, Eric has a proven track record of creating effective,
commercial, and web-scale categorization systems. Eric has numerous highly cited publications and over
ten filed patents.

Timothy Huertas, Client Applications Architect: Timothy received his undergraduate degree from the
University of Central Missouri and has over 7 years of professional experience in industries ranging from
finance to photo sharing. While at SearchMe, Timothy was responsible for SearchMe's web and web service
interface. Prior to joining SearchMe, Timothy was a member of Snapfish's (a Hewlett Packard company)
Emerging Technologies Group, where his contributions include the site's image editor, text editor,
and slide show, which are still used by millions of people in multiple countries.

Consulting Resources Available for Hire

SearchMe has identified certain principal technologists that may be available to an acquirer of the
SearchMe Assets to help render the company's technology, systems, data, and architecture.

4
THE BIOGRAPHICAL INFORMATION CONCERNING THE CURRENT MANAGEMENT OF SEARCHME IS INCLUDED FOR INFORMATION
PURPOSES ONLY. ALTHOUGH THIS SALE IS BEING CONDUCTED WITH SEARCHME'S COOPERATION, THIS SALE IS STRICTLY AN ASSET
SALE OFFERED BY LIGHTHOUSE AS SEARCHME'S SENIOR LENDER PURSUANT TO ARTICLE 9 OF THE UNIFORM COMMERCIAL CODE.
LIGHTHOUSE HAS NO ARRANGEMENT PURSUANT TO WHICH BUYER OF THE SEARCHME ASSETS COULD BE ASSURED THE FUTURE
SERVICES OF ANY SEARCHME OFFICERS OR EMPLOYEES.

Introduction to SearchMe's IP

At its peak SearchMe had over 5,000,000 unique people per month, making it the number one visual
search engine at the time. The significant positive press and reviews demonstrate the unique capabilities
of SearchMe's technology. This technology was born out of tens of millions of dollars of research and
the work of several PhDs from top institutions, as well as former executives and high-ranking employees
from major commercial search engines.

Most people know SearchMe for the visual and media-rich UI, which leveraged our own high-resolution
thumbnail generation technology. In addition, the back-end system demonstrated relevance which was
competitive with the top five search engines. Beyond thumbnails, embedded media, and powerful
relevance algorithms, SearchMe is also differentiated by its advanced categorization technology. Our
system has labels for over 1000 categories, drawn from our taxonomy of more than 1000 topics, for every
web page in our index. This powerful system allows SearchMe to offer vertical suggestions as a form of
real-time disambiguation, such as knowing the different meanings of 'diamondback', 'bonds', 'saturn', etc.
The system would know, for each query in real time, the percent affinity for (and intersection of) each of
the nearly 300 exposed categories.
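
One simple way to picture per-query category affinity and intersection is as shares of the documents that match the query. The sketch below is an assumption made for illustration only: the memorandum does not say how SearchMe computed these figures, and the function names, input format, and sample labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def category_affinity(matching_docs):
    """Percent of matching documents that carry each category label.

    `matching_docs` is a list of category-label sets, one per document that
    matched the query (a hypothetical input format).
    """
    total = len(matching_docs)
    if total == 0:
        return {}
    counts = Counter(label for labels in matching_docs for label in labels)
    return {label: 100.0 * n / total for label, n in counts.items()}


def category_intersection(matching_docs):
    """Percent of matching documents that carry both labels of a category pair."""
    total = len(matching_docs)
    if total == 0:
        return {}
    pairs = Counter()
    for labels in matching_docs:
        pairs.update(combinations(sorted(labels), 2))
    return {pair: 100.0 * n / total for pair, n in pairs.items()}


# Made-up labels for documents matching the ambiguous query 'saturn'.
docs_for_saturn = [
    {"astronomy"},
    {"astronomy", "science"},
    {"autos"},
    {"autos", "shopping"},
]
print(category_affinity(docs_for_saturn))
print(category_intersection(docs_for_saturn))
```

With figures like these, a front end could offer "astronomy" and "autos" as competing vertical suggestions for the same query.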

In addition to the system, SearchMe amassed a wealth of valuable data. To train a Machine Learned
Relevance (MLR) function and nearly 1000 automated category classifiers, we have about 1
million human-labeled judgments. These judgments include tens of thousands of categorical labels for
classifier training and hundreds of thousands of relevance judgments for query/URL pairs. This data is
extremely valuable for any company interested in the search space.

SearchMe's IP is divided among many sub-groups: Data (such as human judgments and category
classifiers), Online System, Offline Processing (which includes categorization), and specific technologies
and patent filings. Several of the inventors are available to assist in explaining the benefits and challenges
of applying this technology in your organization.

Lighthouse is seeking a buyer for the SearchMe Assets, in whole or in part. Interested parties may bid
on all or any part of SearchMe's brand name, core technology, front-end user interface, or back-end
search and advertising architecture, enabling the purchaser to leverage SearchMe's brand name, core
technology, front-end user interface, and/or back-end architecture to establish an Internet search engine
with a visual approach, to enhance the user interface of an existing search engine, or to leverage the
potential relevancy improvements.

The Bidding Process

Interested and qualified parties will be expected to sign a nondisclosure agreement (Exhibit C hereto) to
have access to due diligence documentation and key members of the management and development teams
(the "Due Diligence Access"). Each interested party, as a consequence of the Due Diligence Access
granted to it, shall be deemed to acknowledge and represent (i) that it is bound by the bidding procedures
described herein; (ii) that it has an opportunity to inspect and examine the SearchMe Assets and to review
all pertinent documents and information with respect thereto; (iii) that it is not relying upon any written or
oral statements, representations, or warranties of Lighthouse or SearchMe, or their respective staff,
agents, or attorneys; and (iv) that all such documents and reports have been provided solely for the
convenience of the interested party, and Lighthouse and SearchMe (and their respective staff, agents, or
attorneys) do not make any representations as to the accuracy or completeness of the same.

Indications of Interest (outlining value range and specific assets to be purchased) must be received by
Lighthouse no later than October 9, 2009 at 5pm Pacific Time ("Indication Deadline"), and may be
subject to the completion of due diligence. Based on Lighthouse's evaluation of the Indications of
Interest, Lighthouse will invite those interested parties that Lighthouse deems in its sole discretion to be
viable candidates to deliver binding Letters of Intent consistent with the terms of a standard foreclosure
sale agreement prepared by Lighthouse ("Sale Agreement") and provided by Lighthouse to all parties
invited to deliver Letters of Intent. The Sale Agreement will require the bidder to close and fund the purchase
price within seven days of Lighthouse's delivery of its signature to the Sale Agreement. Letters of Intent
must be received no later than October 31, 2009 at 5pm Pacific Time ("Offer Deadline").

Indications of Interest must include the name of the purchasing entity, purchase price range, assets to be
purchased, and any contingencies to closing. Letters of Intent must be accompanied by the bidder's duly
executed final version of the Sale Agreement with a comparison showing all variations and changes from
the form of proposed Sale Agreement provided by Lighthouse; delivery of the bidder's duly executed Sale
Agreement shall constitute a binding, unconditional offer to purchase the identified property. This will
be an "as is", "where is" sale with no representations or warranties provided by Lighthouse or
SearchMe. Exclusivity will not be provided, and it is the winning bidder's sole responsibility to set the
closing agenda.

Lighthouse reserves the right to close the bidding process immediately with or without notice to interested
parties. Interested parties are encouraged to complete due diligence and submit offers as soon as
practicable.

Any person or other entity making a bid must be prepared to provide independent confirmation that they
possess the financial resources to complete the purchase where applicable. Lighthouse reserves the right
to, in its sole discretion, accept or reject any bid, or withdraw any or all assets from sale.

All sales, transfer, and recording taxes, or similar taxes, if any, relating to the sale of the SearchMe Assets
shall be the sole responsibility of the successful bidder and shall be paid to Lighthouse at the closing of
each transaction.

For additional information, please see below and/or contact:

John Galatea
Vice President, Sales & Marketing
SearchMe, Inc.
jgalatea@yahoo.com
408-921-4614

Tom Conneely
Vice President, Operations
Lighthouse Capital Partners
tom@lcpartners.com
415-464-5950

Randy Adams
CEO
SearchMe, Inc.
randy@olstealth.com
650-862-0870

SearchMe Asset Details

SearchMe's assets are organized into three key areas:

1. Brand name, trademarks, domain name, and related marketing collateral.
2. Front-End User Interface and related assets.
3. Back-End Architecture and related assets.

SearchMe's Patent applications are described in an attachment to this document listed as Exhibit A, and
are available for sale in conjunction with or separate from any of SearchMe's other assets.

I. SEARCHME'S BRAND NAME, TRADEMARKS AND DOMAIN NAME

(A) Brand Name

SearchMe's brand name is well known as one of the first recognizable brands in visual search engines.

(B) Trademarks

The following excerpt comes from the USPTO Trademark Electronic Search System:

Word Mark: SearchMe

Goods and Services:

IC 009. US 021 023 026 036 038. G & S: Computer software for searching, retrieving,
mining, classifying, and collecting information on computer networks within individual
workstations and personal computers via the internet

IC 038. US 100 101 104. G & S: Transmission of data, images, video and sound clips via
electronic global computer networks

IC 042. US 100 101. G & S: Computer services, namely, providing computer search engine
software, which makes use of an index of documents, over a network for obtaining
customized on-line user-queried information; providing search engines for obtaining data on
global computer networks, namely, providing search engines that provide indexed
information, such indexed information including web sites, on-line links, and other
information extracted and retrieved from global computer networks, providing search engines
that provide information in the form of text, electronic documents, databases, graphic, and
audio visual information extracted and retrieved from global computer networks, and
providing search engines that retrieve documents, or portions thereof, available on the
Internet and classify such documents, or portions thereof, using classifiers in order to provide
an indexed set of documents that can be queried by a user

(C) Domain Name

SearchMe's primary domain name, http://www.SearchMe.com is available for sale.

II. FRONT-END USER INTERFACE AND RELATED ASSETS

SearchMe's User Interface system, which is easily separable from its back-end architecture, can
formulate queries and receive an XML feed that can contain a mix of regular web results and special
results which specify a specific rendering engine to allow for in-line media (e.g., YouTube, Hulu, Imeem,
MTV).
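
To make the idea of a mixed result feed concrete, the sketch below parses a small XML document and picks a renderer per result. The element names, the `type` and `renderer` attributes, and the renderer functions are all hypothetical; the actual SearchMe feed schema is not described in this memorandum.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: one regular web result and one media result that names
# the rendering engine it wants.
SAMPLE_FEED = """
<results>
  <result type="web">
    <url>http://example.com/page</url>
  </result>
  <result type="media" renderer="youtube">
    <url>http://example.com/video</url>
  </result>
</results>
"""


def render_web(url):
    """Default rendering: a thumbnail of the page."""
    return f"[thumbnail] {url}"


def render_youtube(url):
    """Specialized rendering: an in-line player."""
    return f"[inline YouTube player] {url}"


# Map of renderer names to functions; other engines would be added here.
RENDERERS = {"youtube": render_youtube}


def render_feed(xml_text):
    """Render each result, using the named rendering engine when one is given."""
    rendered = []
    for result in ET.fromstring(xml_text.strip()).iter("result"):
        url = result.findtext("url")
        renderer = RENDERERS.get(result.get("renderer"), render_web)
        rendered.append(renderer(url))
    return rendered


print(render_feed(SAMPLE_FEED))
```

Because the front end only depends on the feed format, a design like this is one way a UI could stay separable from the back end that produces the results.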

SearchMe's front-end user interface has been implemented on several different platforms, including AS3,
JavaScript, iPhone, Android, S60, Windows Mobile, and BlackBerry Storm.

The key technology components of SearchMe's front-end user interface are the (A) Thumbnail Server
and (B) Ribbon Control.

(A) Thumbnail Server

SearchMe's thumbnail server leverages Firefox's plug-in framework technology, a combination of XUL
(markup similar to HTML) and JavaScript. This permits manipulation and modification by junior-level
staff. The thumbnailer is unique in the following 2 ways:

1. It captures a larger than average 1024x1024 screenshot of a web page.
2. Since the thumbnailer is built on top of Firefox 3.x, it has access to the full DOM of a web page.
This can be used to harvest metadata about the image. The thumbnailer currently collects the
following metadata:
a. The coordinates of every text object on the page.
b. The coordinates of every hyperlink on the page and its corresponding href.
c. The coordinates of every flash object on the page and its embed inner HTML.
d. The coordinates of every US phone number on the page.

e. The coordinates of some addresses on the screen (beta).

Here is how SearchMe intended to leverage the metadata:

1. A user does a search and is presented with snapshots of every web page found.

2. The user hits the info button and can opt to make several pieces of the image interactive.

3. The user can opt to highlight the media on the page.

4. The user can opt to play the media inline.

5. The user can highlight every hyperlink on the page. The hyperlinks become active and the user
can use them to click through.

6. The user can opt to highlight where their search term appears on the image.

7. The user can opt to highlight every address on the page. The plan was to make the address
clickable and bring up a map in the slide.

8. The user can opt to highlight every phone number on the page. The plan was to make the phone
number clickable and tie it to an online phone service (Google Voice, Skype).

Mobile Devices

SearchMe's IP includes user interfaces for 4 major mobile phone platforms. The UI can easily be ported
to support any search engine giving buyers an instant mobile presence. The use of SearchMe's thumbnail
technology makes for a fast way to browse the web on even the slowest connections. See images below:

iPhone

Android

Symbian

Windows Mobile

Toolbars

SearchMe's IP also includes a Firefox toolbar. This toolbar allows users to search directly from their
browser. This toolbar can also be used to gather metrics about the machine it is installed on and its user's
browsing behavior. The toolbar can be ported to fit any search engine.

Mini Search Widget

The mini search widget allows users to embed the power of any visual search technology on their blog,
web site, or social network with just a few lines of code. Like SearchMe's site, this widget supports
multiple types of media.

Thumbnail Server Technical Details:

Given a URL and image size (width x height) the server will return an image for the given URL.

Given a URL the thumbnail server will return the following metadata:

o The bounding box (x, y, width, height) of every piece of text, hyperlink, and embed tag on the
page.
o The href of each hyperlink on the page.
o The embed code for every embedded object on the page.
o The majority of the US phone numbers (and their respective bounding boxes) on the page.
o Beta: The system is set up to sniff out addresses as well. The coverage is limited and exact
metrics are unknown.
o The fully parsed, post-onload inner HTML of the web page and all its child documents
(frames).
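
The bounding-box metadata listed above lends itself to overlays such as highlighting where a search term appears on a thumbnail. The sketch below shows one way that could work: filter the text items for the term, then scale their boxes from screenshot pixels down to thumbnail pixels. The `Box` class, the `(text, Box)` input shape, and the function name are hypothetical and are not the server's actual response format.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A bounding box (x, y, width, height) in full-size screenshot pixels."""
    x: float
    y: float
    width: float
    height: float

    def scaled(self, factor):
        """Return the same box expressed at a different image scale."""
        return Box(self.x * factor, self.y * factor,
                   self.width * factor, self.height * factor)


def highlight_boxes(text_items, term, screenshot_width, thumbnail_width):
    """Return thumbnail-space boxes for every text item containing `term`.

    `text_items` is a list of (text, Box) pairs, a hypothetical shape for the
    text metadata the server returns.
    """
    factor = thumbnail_width / screenshot_width
    term = term.lower()
    return [box.scaled(factor) for text, box in text_items if term in text.lower()]


items = [
    ("Saturn is the sixth planet", Box(100, 200, 400, 40)),
    ("Contact us", Box(100, 900, 120, 20)),
]
print(highlight_boxes(items, "saturn", screenshot_width=1024, thumbnail_width=256))
```

The same scaling step would apply to hyperlink, embed, phone number, and address boxes, since they share the (x, y, width, height) form.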

Additional SearchMe UI Visuals

(B) Ribbon Control

The ribbon control is a proprietary method of presenting information in a fluid, "cover flow" style design.
Its presence gives any website a "wow" effect by visually stringing together linked images in a "ribbon"
or liquid image stream. The background of the ribbon can be customized around any color scheme so as
to better fit the unique needs of any branding requirement. In particular, SearchMe's Ribbon Control has
significant value for web applications that need to display an infinite number of items in a finite space;
SearchMe's Ribbon Control technology enables large volumes of "inventory" to be displayed in a central
location. Ribbon Control is the perfect substitute for the commonplace horizontal or vertical list of web
pages, videos, documents, or other items.

Details:

The ribbon control is more than meets the eye. It is not tightly coupled with the SearchMe IP; it is
designed to be reused and extended. Below is a list of its key features:

1. The control is designed using the item renderer data provider paradigm. Any UI Component can be
presented in the slides. The slides need only implement an interface that consumes data and paints the
slide's content.

2. The control is designed to support a high number of records and as such uses memory wisely. That is,
if the data provider has 100,000 records and there are only 20 slides on the screen, there will only be
20 slides in memory regardless of the selected index. Simply stated, the slides are recycled.

3. The distortion logic for each slide is a strategy. This means that conceivably the control can support
an infinite number of layouts. At this time it supports 3 layouts (ribbon, film, page flow). It is important
to note that the control was ported from pure AS3 to Flex 3. The animation does not use any classes in
the mx package, so porting it back is certainly possible.

4. The ribbon control can be presented in any color. It has CSS properties that take an array of colors
and their positions used to form a gradient.
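
The slide recycling described in feature 2 can be sketched as a fixed pool of slide objects that is rebound to a moving window of records. This is a minimal Python illustration of the general technique, not the control's AS3/Flex code; the class names, the `bind` method, and the centering rule are assumptions.

```python
class Slide:
    """A reusable on-screen slot; `bind` repaints it with a new record."""

    def __init__(self):
        self.record = None

    def bind(self, record):
        self.record = record


class RecyclingRibbon:
    """Shows a window of `visible_count` records using a fixed pool of slides."""

    def __init__(self, data_provider, visible_count):
        self.data_provider = data_provider
        # Allocate the pool once; it never grows with the size of the data.
        pool_size = min(visible_count, len(data_provider))
        self.slides = [Slide() for _ in range(pool_size)]
        self.select(0)

    def select(self, index):
        """Center the visible window on `index` and rebind the pooled slides."""
        count = len(self.slides)
        last_start = len(self.data_provider) - count
        # Clamp so the window never runs past either end of the data.
        start = max(0, min(index - count // 2, last_start))
        for offset, slide in enumerate(self.slides):
            slide.bind(self.data_provider[start + offset])
        return start


ribbon = RecyclingRibbon(range(100_000), visible_count=20)
ribbon.select(50_000)
print(len(ribbon.slides), ribbon.slides[0].record, ribbon.slides[-1].record)
```

However far the selection moves, the same 20 `Slide` objects are reused, which is what keeps memory use independent of the number of records.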

How it works:

The inner workings of the control are an advanced topic beyond the scope of this summary; that said, the
complexities are well encapsulated. To use the control, the developer need only create an item renderer
that implements an interface with the build, focus, and blur methods. The item renderer must also
dispatch an event to let the control know it is ready to be drawn. The control takes an array collection
of data, monitors it for changes, and responds accordingly. In other words, if you add or remove records
from the array collection, the control will add or remove the corresponding slides. This feature is
useful when implementing paging.
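
The contract described above can be sketched roughly as follows. This is an illustrative Python sketch
under stated assumptions, not the actual Flex API: only the method names build, focus, and blur come from
the description; ItemRenderer, TextRenderer, ObservableList, RibbonControl, and the callback used in place
of the "ready" event are hypothetical. For brevity the sketch rebuilds every renderer on each change
rather than recycling slides.

```python
from abc import ABC, abstractmethod


class ItemRenderer(ABC):
    """Interface a slide renderer implements (build, focus, blur)."""

    @abstractmethod
    def build(self, data): ...

    @abstractmethod
    def focus(self): ...

    @abstractmethod
    def blur(self): ...


class TextRenderer(ItemRenderer):
    def __init__(self, notify_ready):
        self.notify_ready = notify_ready  # stand-in for dispatching a "ready" event
        self.text = ""
        self.focused = False

    def build(self, data):
        self.text = str(data)
        self.notify_ready(self)  # tell the control this slide can be drawn

    def focus(self):
        self.focused = True

    def blur(self):
        self.focused = False


class ObservableList:
    """Stand-in for an array collection that reports changes."""

    def __init__(self, items=()):
        self._items = list(items)
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def append(self, item):
        self._items.append(item)
        self._changed()

    def remove(self, item):
        self._items.remove(item)
        self._changed()

    def _changed(self):
        for listener in self._listeners:
            listener()

    def __iter__(self):
        return iter(self._items)


class RibbonControl:
    def __init__(self, collection, renderer_factory):
        self.collection = collection
        self.renderer_factory = renderer_factory
        self.drawable = []  # renderers that have reported "ready"
        collection.subscribe(self.refresh)
        self.refresh()

    def refresh(self):
        self.drawable = []
        for record in self.collection:
            self.renderer_factory(self.drawable.append).build(record)


records = ObservableList(["page 1", "page 2"])
control = RibbonControl(records, TextRenderer)
records.append("page 3")  # the control adds a slide in response
```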

III. BACK-END ARCHITECTURE AND RELATED ASSETS

SearchMe Architecture Overview

SearchMe's Back-End Architecture achieves its "magic" through various aspects of online and offline
components with specialized systems and data. The back-end is substantial in its complexity and robust in
its capabilities. It supports a parallel installation for redundancy, and is designed to fit roughly 10M web
documents per Index Server. In addition, the running system has significant support for editorial
functionality such as Paparazzi and Eddie as well as integration of "special services" from the Widget
Server.

SearchMe's Back-End Architecture principally comprises the following 18 primary components or logical
systems: (A) Paparazzi, (B) Eddie, (C) Search Relevance, (D) CHOCO Classification System, (E) SFP –
Special Features Processing, (F) Aggregation, (G) Page Quality, (H) Specialized SPAM Prediction Engine,
(I) Distributed Document Storage System, (J) Spider System, (K) TORGO Judgment Collection System,
(L) TORGO Data Set, (M) CHOCO Data Set, (N) Paid Inclusion System, (O) IB – Index Builder, (P) El Rapido
System, (Q) Hulu Document Processor, and (R) Other Media Feed Processing System. These components are
described in more detail in the following narrative:

(A) Paparazzi

The Paparazzi System is a combination of several components that enable a consistent ranking for
specified classes of results. Paparazzi combines human rules, classification, and domain-specific lists
to enhance relevance. For example, an editor might decide that for all Actors, the first result should be
the 'actor's homepage', followed by the 'IMDB page', followed by a Hulu clip (e.g., the actor's recent
appearance on Letterman), etc. Paparazzi simplifies the process of generating a consistent experience
over automatically classified data.
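
The Actor example above might be expressed along these lines. This is a hypothetical Python sketch, not
the Paparazzi implementation: the rule table, the result "type" labels, and the function name are all
assumptions made for illustration.

```python
# Hypothetical editor-defined ordering per classified entity type.
RULES = {
    "actor": ["homepage", "imdb", "hulu_clip"],
}


def apply_class_rules(entity_class, results, rules=RULES):
    """Reorder results so typed results follow the editor's order.

    Results whose type is not named in the rule keep their original
    relative order after the ruled ones (sorted() is stable).
    """
    order = rules.get(entity_class)
    if not order:
        return list(results)
    rank = {result_type: i for i, result_type in enumerate(order)}
    return sorted(results, key=lambda r: rank.get(r["type"], len(order)))


ranked = apply_class_rules("actor", [
    {"url": "http://news.example/interview", "type": "news"},
    {"url": "http://imdb.example/name/1", "type": "imdb"},
    {"url": "http://actor.example/", "type": "homepage"},
    {"url": "http://hulu.example/clip/9", "type": "hulu_clip"},
])
```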

(B) Eddie

Eddie is the nickname for the SearchMe system for editorially managing queries. Some queries have results
that should be blocked (inappropriate for the given query), and for other queries the desired content
does not rank as it should. The Eddie system includes a web UI to manage queries or groups of queries to
enable editorial ranking. The Cluster Server honors the Eddie rankings, as do the Index Builder and Index
Servers. The system is designed so that changes can be made and incorporated rapidly, without requiring
a full rebuild of the production/live indexes.
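
As a rough illustration of editorial overrides applied at query time, consider the following Python
sketch. It is not Eddie's actual design: the override table layout ("block" and "pin" lists keyed by
normalized query) and the function name are assumptions.

```python
def apply_editorial(query, ranked_urls, overrides):
    """Apply per-query editorial rules to an algorithmic ranking.

    Blocked URLs are dropped; pinned URLs are moved to the top in the
    editor's order; everything else keeps its algorithmic order.
    """
    rule = overrides.get(query.strip().lower())
    if rule is None:
        return list(ranked_urls)
    blocked = set(rule.get("block", ()))
    pinned = [u for u in rule.get("pin", ()) if u not in blocked]
    rest = [u for u in ranked_urls if u not in blocked and u not in pinned]
    return pinned + rest


# Hypothetical override table; editing it requires no index rebuild in this sketch.
OVERRIDES = {
    "example query": {
        "block": ["http://bad.example/"],
        "pin": ["http://official.example/"],
    },
}

results = apply_editorial(
    "Example Query",
    ["http://a.example/", "http://bad.example/", "http://official.example/"],
    OVERRIDES,
)
```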

(C) Search Relevance

The online relevance function was built with hundreds of specially crafted features designed by experts in
the field over two years. This unique and extensive set of run-time features allows for DCG (Discounted
Cumulative Gain, a standard relevance score) within a small bound of Google or Bing. To make effective
use of this extensive feature set, we trained an MLR (Machine Learned Relevance) function on about
1 million labeled training examples [TORGO Data Set (L)].
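
For readers unfamiliar with DCG, one common formulation is sketched below in Python, using the 5-point
judgment scale described under the TORGO Data Set (L). The numeric gain assigned to each label, the log2
discount, the cutoff k, and the treatment of unjudged results as zero gain are assumptions; this summary
does not state which DCG variant SearchMe used.

```python
import math

# Assumed gains for the 5-point scale; the actual values are not given here.
GAIN = {"Expected": 4, "Comprehensive": 3, "Acceptable": 2, "Marginal": 1, "Bad": 0}


def dcg(labels, k=10):
    """Discounted cumulative gain over the top-k judged results.

    labels: judgment labels in ranked order (position 1 first).
    Labels outside GAIN (e.g. a page error) contribute zero gain.
    """
    return sum(
        GAIN.get(label, 0) / math.log2(rank + 1)
        for rank, label in enumerate(labels[:k], start=1)
    )


# Comparing two hypothetical result lists for the same query.
engine_a = dcg(["Expected", "Bad", "Acceptable"])
engine_b = dcg(["Acceptable", "Expected", "Bad"])
```

A higher DCG means relevant results appear nearer the top, so the same judged query set can be scored
against result lists from several engines.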

The labeled training data is a key part of the value of SearchMe and is valuable to anyone in this space
(even without the associated infrastructure of the SearchMe platform). In addition to the judgments (a
human evaluation of relevance for a given query or URL), we have substantial other human data about
every query and URL (in our TORGO judgment collection system). This includes indications of
misspellings, official homepages, adult intent, spam, and many others. This data could be used (and was
used) for training more than just the MLR system.

(D) CHOCO Classification System

One of the strongest differentiators of the SearchMe system is its classification system (CHOCO), which
allows a non-engineer to teach the computer a new web-scale category in just a few hours. The nearly
1,000 algorithmic categories could be applied to every page in our index in just a few days of offline
processing. This system is loosely tied to part of the SearchMe system [the document storage system or
docstore (I)]. As we built up the nearly 1,000 algorithmic categories, we generated substantial training
data and testing data to evaluate each category [CHOCO Data Set (M)]. The labeled pages for each category
are also very valuable by themselves for anyone designing any type of categorization system.

(E) SFP – Special Features Processing

One of the keys to any search engine is the ability to build new features quickly. We have a system we
call "New SFP" (New Special Features Processing), which takes a YAML config file and uses it to generate
a wide variety of page-related features. This system was designed so that non-programmers (e.g., spam or
domain experts) could encode complex features, such as "Unique Spam Words in Title", in just a few lines
of Perl or XPath. New SFP utilizes structured data files (lists) and runs offline against all documents
stored in the docstore (I). This system is separable, but was designed to work with the proprietary
Docstore system.
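
A config-driven feature such as "Unique Spam Words in Title" might look roughly like the following Python
sketch. It is illustrative only: New SFP used YAML config files with Perl or XPath snippets, whereas here
a plain dictionary stands in for the parsed config, and the operation names, list contents, and function
names are all hypothetical.

```python
import re

# Hypothetical structured list of spam words.
LISTS = {"spam_words": {"cheap", "free", "pills"}}

# Stand-in for a parsed config file; one entry per generated feature.
CONFIG = {
    "features": [
        {"name": "unique_spam_words_in_title", "field": "title",
         "op": "count_unique_in_list", "list": "spam_words"},
        {"name": "title_word_count", "field": "title", "op": "word_count"},
    ]
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


# Each operation maps (tokens, feature spec) to a numeric feature value.
OPS = {
    "word_count": lambda tokens, spec: len(tokens),
    "count_unique_in_list": lambda tokens, spec: len(set(tokens) & LISTS[spec["list"]]),
}


def compute_features(doc, config=CONFIG):
    """Evaluate every configured feature against one document."""
    features = {}
    for spec in config["features"]:
        tokens = tokenize(doc.get(spec["field"], ""))
        features[spec["name"]] = OPS[spec["op"]](tokens, spec)
    return features


features = compute_features({"title": "Cheap pills, CHEAP prices - free shipping"})
```

Adding a feature in this sketch means adding a config entry rather than writing new pipeline code, which
is the property the paragraph above describes.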

(F) Aggregation

The SearchMe aggregation system allowed for scanning the entire docstore (> 3B documents) and
generating summary data, such as "nfl.com is 95% Football" or "the average word count of pages under
imdb.com/movies is X". The set of features aggregated is managed by easily editable config files.
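
A per-domain summary such as "nfl.com is 95% Football" could be computed along these lines. This Python
sketch is an assumption-laden illustration, not the SearchMe aggregation code: the input format (URL,
category pairs), the "www." stripping, and the function name are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def category_share_by_domain(docs):
    """Return {domain: {category: fraction of that domain's documents}}.

    docs: iterable of (url, category) pairs, e.g. streamed from a document store.
    """
    counts = defaultdict(Counter)
    for url, category in docs:
        domain = urlsplit(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        counts[domain][category] += 1
    return {
        domain: {cat: n / sum(ctr.values()) for cat, n in ctr.items()}
        for domain, ctr in counts.items()
    }


sample = [(f"http://www.nfl.com/story/{i}", "Football") for i in range(19)]
sample.append(("http://www.nfl.com/shop", "Shopping"))
shares = category_share_by_domain(sample)
```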

(G) Page Quality

Using a complex trained classifier (built with a system similar to the one used to train the main MLR), we
can assign a numerical score from -1 to +1, where +1 indicates a high-quality document and -1 a
low-quality document. This system runs on a .tsv file but requires the production system to generate the
specific features used.

(H) Specialized Spam Prediction Engine

Automated SPAM Classification - One of the biggest challenges for web-scale search engines is SPAM,
or automatically generated content designed to manipulate search ranking. Using technology similar to
our MLR, we trained an automated SPAM classifier that assigns a score to every document considered for
indexing. The features used include some features used for relevance, as well as many features generated
using our Special Features Processing framework, including hand-culled lists of spam 'concepts'.

(I (a)) Distributed Document Storage System (docstore)

In order to make our search engine operate, it is critical to store the "raw data" and provide a central
repository for all of the offline processing. The Docstore is designed for very fast streaming of inbound
and outbound requests. Results can be streamed to a distributed set of clients: you could run ten or a
thousand Classification clients, or a hundred spiders, etc. Each could read (in consecutive streams) or
write in bulk to the docstore.

(I (b)) Document Pipeline Processing Subsystem

Each document required a sequence of steps to prepare it for indexing, beginning with spidering, then
classification, special features processing, etc. To make this processing more efficient, a customized
"pipeline processing architecture" was created. This sub-system was used to process El Rapido
(near-real-time) content, as well as media content, for indexing.

(J) Spider System

All large-scale search engines need to obtain content through some type of 'spider'. SearchMe's spider,
called Charlotte, is designed to be multi-threaded and distributed. The spider streams URLs to crawl from
the distributed docstore and streams back the resultant page data (and responses). In addition to the
spider, there is an associated connection throttler that enforces rates per domain (each domain can have
a different associated rate).
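
Per-domain rate enforcement of the kind described can be sketched as follows. This is a minimal Python
illustration, not Charlotte's throttler: the class name, the minimum-interval model, and the injectable
clock are assumptions, and a distributed crawler would also need shared state across processes.

```python
import time


class DomainThrottler:
    """Allows at most one fetch per domain per configured interval (seconds)."""

    def __init__(self, default_interval=1.0, intervals=None, clock=time.monotonic):
        self.default_interval = default_interval
        self.intervals = dict(intervals or {})  # per-domain overrides
        self.clock = clock
        self.next_allowed = {}

    def wait_time(self, domain):
        """Seconds until the domain may be fetched again (0.0 if allowed now)."""
        return max(0.0, self.next_allowed.get(domain, float("-inf")) - self.clock())

    def try_acquire(self, domain):
        """Return True and reserve the slot if a fetch is allowed right now."""
        now = self.clock()
        if now < self.next_allowed.get(domain, float("-inf")):
            return False
        interval = self.intervals.get(domain, self.default_interval)
        self.next_allowed[domain] = now + interval
        return True


# Usage with a fake clock so the behavior is deterministic.
fake_now = [100.0]
throttler = DomainThrottler(1.0, {"slow.example": 10.0}, clock=lambda: fake_now[0])
first = throttler.try_acquire("slow.example")   # allowed
second = throttler.try_acquire("slow.example")  # refused: inside the 10 s interval
```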

(K) TORGO Judgment Collection System

The TORGO system is a semi-flexible system (parts are tied to the production docstore/spider) for
collecting human evaluations. The system permitted creating a new project and automatically fetching
results from our engine or from third-party engines with customized scrapers. Human judges could select
queries to judge, with cached copies of pages presented (the caching system was an installation of the
docstore and spider). TORGO is configured via simple SQL entries; the menus and options are fully
configurable. Judgments can cover queries (e.g., adult, navigational, etc.), URLs (spam, porn, etc.),
and query/URL pairs (e.g., highly relevant, official homepage, bad, etc.).

(L) TORGO Data Set – See Exhibit B, "Data Supplement"

We have about 1M "judgments". Each judgment includes many pieces of data, ranging from whether a
particular URL is highly relevant for a given query to the information described above in the TORGO
system description. We also have actual result lists from competitor search engines, to allow for DCG
(relevance score) comparison. The judgment scale was 5-point: "Expected", "Comprehensive",
"Acceptable", "Marginal", and "Bad", as well as other no-judgment options (page error, etc.).

(M) CHOCO Data Set – See Exhibit B, "Data Supplement"

The CHOCO system is one of the strongest differentiators for the SearchMe system, being able to
classify each and every web page in our index against about 1,000 algorithmic categories in a fraction of
a second per page. The CHOCO system was built up over time, and each of the roughly 1,000 algorithmic
categories has multiple (ranging from about 40 to 120) positive labeled training examples as well as about
50-100 labeled 'testing' URLs. This data, along with the associated Ontology/Taxonomy, is useful for any
company in the area of classification.

(N) Paid Inclusion System

Like some other big search engines, we needed a way to rapidly insert arbitrary URLs, and provide
tagging of them to allow tracking (potentially for payment). The PI system includes powerful feed
processing that is able to take .tsv (standard formatted) files and automatically insert these into a docstore.

(O) IB - Index Builder

Each production Index Server (IS) requires a "built index", which the IB produces. The IB streams
documents from the docstore and produces an inverted index, along with many other fields required for the
resultant XML (media fields) and aggregation data. The IB can build an 8.5M-document index in under a day.
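
For readers new to the term, an inverted index maps each term to the documents containing it. The Python
sketch below shows the idea on a tiny in-memory stream; it is not the IB, and the tokenizer, function
names, and AND-only query are assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(doc_stream):
    """Build {term: [doc_id, ...]} from a stream of (doc_id, text) pairs."""
    index = defaultdict(list)
    for doc_id, text in doc_stream:
        for term in set(tokenize(text)):  # one posting per term per document
            index[term].append(doc_id)
    return dict(index)


def search_all(index, query):
    """Return sorted ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [set(index.get(term, ())) for term in terms]
    return sorted(set.intersection(*postings))


index = build_inverted_index([
    (1, "Actor official home page"),
    (2, "Actor interview on Letterman"),
    (3, "Football scores"),
])
hits = search_all(index, "actor letterman")
```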

(P) El Rapido System

Our engine primarily focused on using URLs discovered as part of the webmap process, an expensive,
time-consuming stage. To speed inclusion of new content (e.g., news, special-interest feeds, etc.), we
built the Paid Inclusion System (N), which can take a .tsv feed and insert it into the docstore (a
separate production process runs the IB). The El Rapido system takes as input a list of RSS feeds, fetches
the results, and produces a .tsv file for the PI system to insert into the docstore.
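
The RSS-to-.tsv step might be sketched as below. This Python example is illustrative only: the column
layout (url, title, published), the de-duplication by URL, and the function name are assumptions, since
the PI feed format is not specified here, and fetching is omitted so the sketch works on already
downloaded RSS text.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rss_to_tsv(rss_documents):
    """Convert RSS 2.0 documents (XML strings) into one tab-separated feed."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["url", "title", "published"])  # assumed column layout
    seen = set()
    for xml_text in rss_documents:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            url = (item.findtext("link") or "").strip()
            if not url or url in seen:
                continue  # skip items without a link and repeated URLs
            seen.add(url)
            # Collapse whitespace so titles cannot contain tabs or newlines.
            title = " ".join((item.findtext("title") or "").split())
            published = (item.findtext("pubDate") or "").strip()
            writer.writerow([url, title, published])
    return out.getvalue()


FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>Hello   world</title><link>http://example.com/a</link>
<pubDate>Tue, 31 Mar 2009 00:00:00 GMT</pubDate></item>
<item><title>Duplicate</title><link>http://example.com/a</link></item>
</channel></rss>"""

tsv = rss_to_tsv([FEED])
```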

(Q) Hulu Document Processor

We built a custom system that connects to Hulu (implementing their API) and fetches the 'changes'.
The results are then used to generate a .tsv file that contains only the 'active content' and the
appropriate MLR-related fields.

(R) Other Media Feed Processing System

We also include pages from YouTube, Imeem, and Flickr. We have a generic version of the El Rapido feed
generation that comes in two halves: one takes RSS inputs and produces an 'intermediate file', and the
other takes the 'intermediate file' and produces a feed. This way, any custom tool can be used to connect
with proprietary APIs.
