Advanced Search with Lucene

Drupal + Lucene without the caffeine.


Introductions

Presenters

Chris Pliakas – Engineer

Erich Beyrent – VP of Engineering

http://www.commonplaces.com
Presentation Summary

The problem with Search



What is Lucene?

The Search Lucene API module

Advanced usage

Implementing the API

The state of SLAPI and where it is going
Problem - Common Search Requests


Advanced query Syntax

High-performance, scalable

Ability to add custom facets

Multisite search, content not shared

Managed through Drupal admin interface
Analysis of the Core Search

Pros:

Good API ... for the most part

Pure PHP solution

Works out of the box

Cons:

Elementary query syntax

Not scalable

No good method to alter query

Adding facets is unwieldy
What is Lucene?

Lucene is ...

An open source text search library written in Java

High-performance and full featured

Supported by the Apache Software Foundation
Capabilities of Lucene

• Ranked search results
• Boolean AND, AND NOT, OR
• Fielded data search
• Powerful query types
  • Wildcard, fuzzy, range, boost
• Field and term grouping
• Index on filesystem, no SQL
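
To make the capabilities above concrete, here is a small sketch that collects one query string per feature, written in standard Lucene query parser syntax. The field names (title, body, created) and the search terms are hypothetical; the fields that actually exist depend on how a given index was built.

```php
<?php
// Example query strings in Lucene query parser syntax.
// Field names (title, body, created) are hypothetical placeholders.
$examples = array(
  // Boolean operators: AND, AND NOT, OR.
  'boolean'  => 'drupal AND search AND NOT joomla',
  // Fielded search: match a term in one named field only.
  'fielded'  => 'title:lucene',
  // Wildcard: "*" matches any run of characters after the prefix.
  'wildcard' => 'luc*',
  // Fuzzy: "~" matches terms with a similar spelling.
  'fuzzy'    => 'lucene~',
  // Range: inclusive range over a field's terms.
  'range'    => 'created:[20090101 TO 20091231]',
  // Boost: "^" raises the weight of a term when ranking results.
  'boost'    => 'drupal^4 search',
  // Grouping: parentheses group terms, also inside a single field.
  'grouping' => '(drupal OR lucene) AND title:(search api)',
);

foreach ($examples as $feature => $query) {
  print $feature . ': ' . $query . "\n";
}
```

These are plain strings, so they can be typed directly into a search box that is backed by a Lucene query parser.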
Search Lucene API

Search Lucene API


http://drupal.org/project/luceneapi
Goals of Search Lucene API

• Integrate Lucene into Drupal


• API for Lucene backend, define hooks
• Implement and extend core Search API
• Easy to install, no external services
• Native PHP solution

Drupal ninjas use hooks, and we don't want to upset ninjas.
Where's the PHP?

Expertly Decaffeinated by the Zend Framework


What is the Zend Framework?

Well documented, tested, E_STRICT compliant

ZF's Zend_Search_Lucene component

Object oriented PHP port of Lucene

Lucene index binary compatible with Java

Stripped down version of required components
Installation

• Download Search Lucene API from drupal.org
  • http://drupal.org/project/luceneapi

• Download ZF components from SourceForge.net

• Enable the Search Lucene API modules
  • Search, Search Lucene API, Search Lucene Content

Run cron!!

Your site search now rocks.


Configuring Search Lucene API


Hijacking the core search box

Error handling settings

Search Lucene Content settings

Configuring facets

No kittens were harmed in the making of the D6 version of Search Lucene API
Performance Testing

Comparison With Other Engines


Search Lucene API vs. Search vs. Apache Solr

Memory consumption

Page load time

Index maintenance operations
Improving Lucene Performance

Performance Settings


Search results caching

Result set limit

Index optimization
Maintaining Lucene With Drush


Who needs cron?

Performing common maintenance tasks

Retrieving index information

Updating “gotcha”

The future of Drush integration
Search Lucene API

Implementing the API


PHP 5 Language Constructs

Before we start developing ...


Objects passed by reference

Exceptional error handling with Exceptions

Autoload implementation

Abstraction layer for common ZF objects
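
The first three points above can be sketched in plain PHP 5. This is only an illustration: every name prefixed with "example" or "Example" and the lib directory are hypothetical and are not part of Search Lucene API. The underscore-to-directory mapping follows the Zend Framework class naming convention.

```php
<?php
// Hypothetical stand-in for an object such as an index document.
class ExampleDocument {
  public $fields = array();
}

// Objects are passed by handle, so changes made here are visible
// to the caller without an "&" in the signature.
function example_document_alter($doc) {
  $doc->fields['category'] = 'news';
}

// Arrays are copied when passed, so "&" is required to change the
// caller's copy.
function example_result_alter(&$result) {
  $result['extra'] = 'altered';
}

$doc = new ExampleDocument();
example_document_alter($doc);

$result = array('title' => 'Hello');
example_result_alter($result);

// Errors surface as exceptions instead of return codes.
function example_open_index($path) {
  if (!is_dir($path)) {
    throw new Exception("Index directory not found: $path");
  }
  return $path;
}

try {
  example_open_index('/no/such/index/directory');
  $message = 'opened';
}
catch (Exception $e) {
  $message = $e->getMessage();
}

// Autoloading: map a class name to a file path the first time the
// class is used, e.g. Zend_Search_Lucene to Zend/Search/Lucene.php.
function example_class_to_path($class) {
  return str_replace('_', '/', $class) . '.php';
}

function example_autoload($class) {
  // The "lib" directory is a hypothetical location for the ZF files.
  $file = dirname(__FILE__) . '/lib/' . example_class_to_path($class);
  if (is_file($file)) {
    require_once $file;
  }
}
spl_autoload_register('example_autoload');
```

The difference between the two alter functions may explain why some hook signatures in this deck take their object argument without "&" while hook_luceneapi_result_alter() declares &$result.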
Faceted Search

“A faceted classification system allows the assignment of multiple classifications to an object, enabling the classifications to be ordered in multiple ways, rather than in a single, pre-determined, taxonomic order.” ~Wikipedia

“Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject. So you know you are getting the best possible information” ~Michael Scott
Creating Facets

Creating a Search Lucene API Facet Module


Why the Facet API makes sense

hook_luceneapi_facet($op, $module, $type)

Handling facets via “facet handler” callback

How to $_GET facet values

Defining multiple facets in one hook.

Advanced facets on Twolia
How the Facet API Works

Very similar to the core Search



Converting $_POST to $_GET

Facet hook invoked in luceneapi_form_alter()

Callbacks invoked in luceneapi_search('search')

Facet queries appended as required subqueries
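
The flow above can be approximated at the string level. The sketch below is not the module's code: both helper functions are hypothetical, and the module presumably works with query objects rather than strings. It only shows the two ideas: submitted facet values are moved into the URL query string, and each selected facet is then attached to the keyword query as a required ("+") clause. In Zend_Search_Lucene terms, a required clause corresponds to adding a subquery to a boolean query with the sign set to true.

```php
<?php
// Hypothetical helper: turn submitted facet values into a URL query
// string, so the selection can later be read back from $_GET.
function example_facets_to_query_string(array $posted) {
  $selected = array();
  foreach ($posted as $name => $value) {
    // Skip facets the user left empty.
    if ($value !== '' && $value !== NULL) {
      $selected[$name] = $value;
    }
  }
  return http_build_query($selected);
}

// Hypothetical helper: attach each facet as a required clause.
// "+" marks a clause that every matching document must satisfy.
function example_append_facets($keys, array $facets) {
  $clauses = array('+(' . $keys . ')');
  foreach ($facets as $field => $value) {
    // Escape quotes and backslashes inside the phrase.
    $clauses[] = '+' . $field . ':"' . addcslashes($value, '"\\') . '"';
  }
  return implode(' ', $clauses);
}

$query_string = example_facets_to_query_string(array(
  'type' => 'story',
  'author' => '',
  'tid' => '12',
));

$full_query = example_append_facets('drupal lucene', array('type' => 'story'));

print $query_string . "\n";
print $full_query . "\n";
```

Because every facet clause is required, selecting a facet can only narrow the result set of the keyword search, never widen it.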
Extending Search Lucene Content

Index Hooks

• hook_luceneapi_document_alter($doc, $module, $type)
• hook_luceneapi_document_delete($item, $module, $type)

“Useful for adding extra fields for faceted searches and filtering which data can be deleted from the index”
Extending Search Lucene Content

Search Hooks

• hook_luceneapi_query_alter($query, $module, $type)
• hook_luceneapi_result_alter(&$result, $module, $type)
• hook_luceneapi_positive_keys($keys, $module, $type)

“Useful for modifying the final search query and the information displayed in the results”
Creating a Search Lucene API Module

• Core search hooks:
  • hook_search(), hook_update_index()

• Search Lucene API hooks:
  • hook_luceneapi_index($op)

“Search Lucene API is an extension of the core Search API”
Future Development

Search Lucene API


Going Forward
Drawbacks


Memory intensive

Lack of an SMP solution

Lucene index on NFS volumes

Distributed indexes?
Search Lucene API 2.0

Addressing scalability

Process control extension

Forking the search processes

Index opened only once on startup

Drupal module becomes the application
Search Lucene API 2.0

New Features

User, help, multisite search

Result sorting

User defined weights and boost factors

Better index statistics

Improved caching mechanism
Recap

In Summary ...


Replace core search with Search Lucene API

Install, configure, and tune SLAPI modules

Maintain indexes via Drush

Use and extend Search Lucene API
Search Lucene API

Questions
Thank you!

Search Lucene API


http://drupal.org/project/luceneapi

Presenters

Chris Pliakas – Engineer
Erich Beyrent – VP of Engineering

http://www.commonplaces.com
