
Salon 3.0 System
FULL TEXT SEARCH - SOLUTIONS EVALUATION

Submitted by

www.Patni.com
Version: 1.0
Date: 07 November 2008

TABLE OF CONTENTS
1 INTRODUCTION
1.1 Salon 3.0 Requirements
2 POSSIBLE SOLUTIONS
2.1 File System Based Search - Using Windows Indexing Service
2.1.1 Steps for configuring this service
2.1.2 Security Points for Windows Indexing Service
2.1.3 Pros and Cons
2.2 Database Based Search Using SQL Server 2005 Full Text Search
2.2.1 Pros and Cons
3 POC DETAILS
3.1 POC Implementation Details
3.1.1 Using Windows Indexing Service
3.1.2 Using SQL Server 2005 FullText Search
4 PATNI RECOMMENDATION
5 APPENDIX
5.1 Reference

DOCUMENT CONTROL:
Security Classification: Patni Confidential
Issue Date: 07 November 2008
Author(s):
Name            Title
Sameer M        Technical Designer
Archana Kamat   Technical Architect
Reviewer(s):

Document History:

Date          Revision   Change
17 Sep 2008   0.01D      Initial draft.
07 Nov 2008   1.0        Patni recommended solution (file system based storage and search)
                         finalized for the Salon 3.0 system.

1 INTRODUCTION
This document describes the full text search requirement of the Salon 3.0 system and evaluates
possible solutions for implementing it.

1.1 Salon 3.0 Requirements


In general, full text search here means searching the contents of data stored in the database and
of files stored on the file system. The full text search requirement stated in the Future State
document for the Salon 3.0 system is:
1. Full text search "Search For": on entry of a search key, a full text search is executed on
one or more attributes.

2 POSSIBLE SOLUTIONS
The full text search requirement can be met in the following ways:

2.1 File System Based Search - Using Windows Indexing Service


Windows Indexing Service is a base service of Microsoft Windows 2000 and later. It builds an
indexed catalog by extracting content from the files in one or more selected directories. This
catalog enables fast searching of file contents on the file system.

2.1.1 Steps for configuring this service:

1) Open the Computer Management tool from Administrative Tools.
2) In the tree view, under the Services and Applications node, click Indexing Service.
3) A list of existing catalogs is displayed in the right panel.
4) Right-click 'Indexing Service' and select 'New' > 'Catalog' from the menu that appears.
This opens a dialog box for the new catalog.
5) Enter a name for the catalog, such as Search, and specify the location where the catalog
will be stored.
6) Click 'OK' to continue.
7) Under the new catalog, select the Directories folder, right-click it and select New >
Directory. In the window that appears, enter the path of the directory to be included in
the search.
8) Repeat step 7 to include more directories.
9) Stop and then restart the Indexing Service. The service then scans and indexes the
directories defined for the catalog.

2.1.2 Security Points for Windows Indexing Service:
- The Indexing Service runs under the Local System account. It cannot be configured to run
under any other account.
- On the local computer, the Indexing Service uses the System account to operate. If the System
account does not have access to documents or directories, the Indexing Service cannot
index those documents.
- Any authenticated local or remote user can issue Indexing Service queries.

2.1.3 Pros and Cons

Pros
- Supports querying on the contents of files as well as their properties.
- Supports searching within Office documents, HTML files, plain text files and Multipurpose
Internet Mail Extension (MIME) messages.
- Searching within PDF files is also supported by installing the Adobe IFilter.
- The Indexing Service is part of the operating system, so no additional software
installation is needed.

Cons
- No search support for XML documents.
- Uses disk space and resources on the web server for file storage and catalogs.
- In a clustered environment, storing data (files) on the file system is not recommended
because it causes server affinity.
- In case of a disk crash, the service configuration and the catalog are lost.
- If not secured properly, files saved on the file system can be tampered with.
2.2 Database Based Search Using SQL Server 2005 Full Text Search
Steps for configuring this service:
1) Open Microsoft SQL Server Management Studio and connect to the SQL Server 2005
database instance where the full-text catalog is to be created.
2) Create a table for storing files, for example:
CREATE TABLE [dbo].[Documents](
[documentid] [int] IDENTITY(1,1) NOT NULL,
[FileName] [nvarchar](50) NULL,
[FileSize] [int] NULL,
[ContentType] [nvarchar](50) NULL,
[full_Text_bin] [varbinary](max) NULL,
[Extention] [nchar](10) NULL,
CONSTRAINT [pk_documents] PRIMARY KEY CLUSTERED
(
[documentid] ASC
))

3) Ensure that full-text search is enabled on the selected database. Open the database
Properties window and select the Files page. This page has a "Use full-text indexing"
check box that enables or disables full-text search on the database. If it is cleared,
select it.
4) In the selected database, go to the Storage > Full Text Catalogs folder, right-click it and
select New Full-Text Catalog. This opens a new window.
5) In this window, enter a catalog name such as Search, its location and other details, and
click OK. This creates the catalog for the database.
6) Right-click the catalog and select Properties. A new window is displayed.
7) In this window, select the Tables/Views page.
8) Assign the Documents table from the displayed list to the catalog. In the Selected
object properties section, under Available Columns, select the check box for the
full_Text_bin column. In the Data Type Column for full_Text_bin, select the
Extention column from the drop-down list. Click OK to save the changes.
9) This completes the setup of the full-text catalog on SQL Server 2005.
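Steps 4 to 8 can also be expressed as T-SQL, which is easier to review and repeat across environments. The sketch below is a C# helper that assembles that script; FullTextSetupScript is a hypothetical name, and the catalog, table, column and key index names are taken from the example above. Step 3 (enabling full-text indexing on the database) is left to the check box and is not scripted here.

```csharp
using System;
using System.Text;

// Hypothetical helper: builds the T-SQL equivalent of setup steps 4-8
// (create the full-text catalog, then the full-text index on the file column).
public static class FullTextSetupScript
{
    public static string Build(string catalog, string table, string contentColumn,
                               string typeColumn, string keyIndex)
    {
        var sb = new StringBuilder();
        // Steps 4-5: create the full-text catalog.
        sb.AppendLine("CREATE FULLTEXT CATALOG [" + catalog + "];");
        // Steps 7-8: index the varbinary(max) column. TYPE COLUMN names the column
        // holding the file extension, so SQL Server can choose the matching filter.
        // KEY INDEX must be a unique, non-nullable, single-column index.
        sb.AppendLine("CREATE FULLTEXT INDEX ON " + table +
                      " ([" + contentColumn + "] TYPE COLUMN [" + typeColumn + "])" +
                      " KEY INDEX [" + keyIndex + "] ON [" + catalog + "];");
        return sb.ToString();
    }
}
```

Calling FullTextSetupScript.Build("Search", "[dbo].[Documents]", "full_Text_bin", "Extention", "pk_documents") returns two statements that can be run in Management Studio; the TYPE COLUMN clause corresponds to choosing the type column in the Data Type Column drop-down in step 8.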

2.2.1 Pros and Cons

Pros
- The full-text search service is part of SQL Server 2005. No additional service needs to be
set up.
- Supports search across different file types such as Office documents, HTML files, plain text
files, Multipurpose Internet Mail Extension (MIME) messages, PDF files and XML files.
- The catalog can be backed up along with the database. If data is lost, it can be
recovered from the backup.
- Data is more secure than when stored on the file system.

Cons
- From a programming point of view, storing files in and retrieving them from the database is
more tedious than on the file system.
- SQL Server 2005 imposes a restriction on the size of files stored in the database. The
maximum file size allowed is 2 MB.
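To illustrate the extra plumbing behind the first con, the sketch below prepares the column values for one row of the Documents table from a file on disk. DocumentRow and InsertSql are hypothetical names; the column names and sizes come from the CREATE TABLE example in section 2.2, and ContentType is left out because this document does not say how it is populated.

```csharp
using System;
using System.IO;

// Hypothetical helper: column values for one row of dbo.Documents
// (FileName nvarchar(50), FileSize int, Extention nchar(10), full_Text_bin varbinary(max)).
public sealed class DocumentRow
{
    public const string InsertSql =
        "INSERT INTO dbo.Documents (FileName, FileSize, Extention, full_Text_bin) " +
        "VALUES (@FileName, @FileSize, @Extention, @Content)";

    public string FileName { get; private set; }
    public int FileSize { get; private set; }
    public string Extention { get; private set; }
    public byte[] Content { get; private set; }

    public static DocumentRow FromFile(string path)
    {
        byte[] content = File.ReadAllBytes(path);
        string name = Path.GetFileName(path);
        // Includes the leading dot, e.g. ".doc"; the full-text TYPE COLUMN reads this value.
        string ext = Path.GetExtension(path);

        // Reject values that would not fit the column sizes of the example table.
        if (name.Length > 50)
            throw new ArgumentException("File name exceeds nvarchar(50): " + name);
        if (ext.Length > 10)
            throw new ArgumentException("Extension exceeds nchar(10): " + ext);

        return new DocumentRow
        {
            FileName = name,
            FileSize = content.Length,
            Extention = ext,
            Content = content
        };
    }
}
```

The values would then be passed as parameters of a SqlCommand built on InsertSql; declaring @Content with SqlDbType.VarBinary and size -1 maps it to varbinary(max).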

3 POC DETAILS
A POC has been done to test both full text search solutions. The details of the POC are as
follows.
The following files were used for the full text search:
- POC for Salon.xls (20 KB)
- Analysis Report Screen.htm (3 KB)
- License.txt (3 KB)
- proposal-expectations.doc (86 KB)
- SalonSystem3.0_Request_Module_UCS.doc (170 KB)
- abcpdf.pdf (336 KB)
- NET Memory Profiler.pdf (1.23 MB)
A web application was created with a text box for entering the search criteria and two submit
buttons: one for File System Full Text Search and the other for Database Full Text Search.
The user enters the search string in the text box and clicks one of the submit buttons.
- On clicking the Search In File System button, a search is performed on the catalog
created on the IIS machine. The search results and the turn-around time taken for the
search operation are recorded and displayed on the screen.
- On clicking the Search In Database button, a search for the entered string is performed on
the catalog created on the database. The search results and the turn-around time taken for
the search operation are recorded and displayed on the screen.
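The POC records the turn-around time of each search. One way to capture it in C# is to wrap either search call in a System.Diagnostics.Stopwatch, as sketched below; SearchTimer is a hypothetical helper, and this document does not say how the POC itself measured the time.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs a search delegate and reports how long it took.
public static class SearchTimer
{
    public static T Measure<T>(Func<T> search, out TimeSpan elapsed)
    {
        Stopwatch watch = Stopwatch.StartNew();
        T result = search();
        watch.Stop();
        elapsed = watch.Elapsed;   // wall-clock time of the search call only
        return result;
    }
}
```

For example, with a search routine named RunSearch (hypothetical), the page could call TimeSpan took; DataSet ds = SearchTimer.Measure(() => RunSearch(text), out took); and display took.TotalMilliseconds next to the results. Timing only the delegate keeps page rendering out of the measurement.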

3.1 POC Implementation Details


3.1.1 Using Windows Indexing Service

A new catalog named Salon is created on the application server machine.
The ADO.NET OLE DB provider (System.Data.OleDb) is used for the full text search on the file
system. The provider connection string is added to the appSettings section of Web.config, and a
SELECT query is used for the search.
Web.config setting:
<add key="IndexService" value="Provider=MSIDXS;Data Source=Salon;"/>

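Code snippet for searching the file system using the ADO.NET OLE DB provider:

    using System.Configuration;
    using System.Data;
    using System.Data.OleDb;

    public static class FileSystemSearch
    {
        // Queries the Indexing Service catalog named in the "IndexService" appSetting.
        public static DataSet SearchFileSystem(string strsearchstring)
        {
            DataSet ds = new DataSet();
            string strconnection = ConfigurationManager.AppSettings["IndexService"];
            string query;
            // Wildcard (prefix) terms such as "salon*" need CONTAINS; other input uses FREETEXT.
            // IndexOf returns 0 for a leading '*', so the test is >= 0, not > 0.
            // The search string is concatenated as in the POC; sanitize it before production use.
            if (strsearchstring.IndexOf('*') >= 0)
                query = "SELECT FileName FROM scope() WHERE CONTAINS(Contents, '\"" +
                        strsearchstring + "\"')";
            else
                query = "SELECT FileName FROM scope() WHERE FREETEXT(Contents, '\"" +
                        strsearchstring + "\"')";
            using (OleDbConnection connection = new OleDbConnection(strconnection))
            {
                // Fill opens and closes the connection itself.
                OleDbDataAdapter objAdp = new OleDbDataAdapter(query, connection);
                objAdp.Fill(ds, "FileName");
            }
            return ds;
        }
    }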
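The POC query is built by concatenating the raw search string, so a quote character in the input can break or alter the query. A minimal hardening sketch follows, assuming it is acceptable to drop quote characters from search input rather than escape them (the escaping rules of the MSIDXS query dialect are deliberately not relied on). IndexServerQuery is a hypothetical name; the CONTAINS/FREETEXT choice mirrors the POC.

```csharp
using System;
using System.Text;

// Hypothetical helper: builds the Indexing Service query with quote characters
// removed, so user input cannot close the quoted search literal early.
public static class IndexServerQuery
{
    public static string Build(string userInput)
    {
        var term = new StringBuilder();
        foreach (char c in userInput.Trim())
        {
            // Assumption: stripping quotes is preferable to escaping them here.
            if (c != '\'' && c != '"')
                term.Append(c);
        }
        string cleaned = term.ToString();
        // Wildcard (prefix) terms need CONTAINS; everything else uses FREETEXT.
        string predicate = cleaned.IndexOf('*') >= 0 ? "CONTAINS" : "FREETEXT";
        return "SELECT FileName FROM scope() WHERE " + predicate +
               "(Contents, '\"" + cleaned + "\"')";
    }
}
```

For example, an input of it's "x" becomes the phrase "its x" inside a FREETEXT query, while salon* is passed unchanged to CONTAINS.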
Security settings for POC
The application is hosted on IIS with Basic Authentication enabled and anonymous access
disabled.
The corresponding settings in Web.config are shown below:
<authentication mode="Windows"/>
<authorization >
<deny users="?" />
<allow users="*"/>
</authorization>

Full text search on the file system was tested with local and remote users who were
successfully authenticated and did not have administrator access to the file system of the
application server.

3.1.2 Using SQL Server 2005 FullText Search

Searching is implemented like any other record search on database tables. The ADO.NET
SqlClient provider (System.Data.SqlClient) is used for database access.
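Code snippet for searching the database using the ADO.NET SqlClient provider:

    using System.Configuration;
    using System.Data;
    using System.Data.SqlClient;

    public static class DatabaseSearch
    {
        // Runs a full-text query against the Documents table, using the
        // "Databaseconnection" connection string from Web.config.
        public static DataSet SearchDatabase(string strsearchstring)
        {
            DataSet ds = new DataSet();
            string strconnection =
                ConfigurationManager.ConnectionStrings["Databaseconnection"].ConnectionString;
            string query;
            // Wildcard (prefix) terms need CONTAINS; other input uses FREETEXT.
            if (strsearchstring.IndexOf('*') >= 0)
                query = "SELECT FileName FROM Documents WHERE " +
                        "CONTAINS(full_Text_bin, N'\"" + strsearchstring + "\"')";
            else
                query = "SELECT FileName FROM Documents WHERE " +
                        "FREETEXT(full_Text_bin, N'\"" + strsearchstring + "\"')";
            using (SqlConnection dbConn = new SqlConnection(strconnection))
            {
                dbConn.Open();
                SqlDataAdapter objAdp = new SqlDataAdapter(query, dbConn);
                objAdp.Fill(ds, "Documents");
            }   // the connection is closed here even if Fill throws
            return ds;
        }
    }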
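SQL Server's CONTAINS and FREETEXT predicates accept a variable as the search condition, so the database search can pass user input as a parameter instead of concatenating it into the statement. The sketch below builds the statement text and the parameter value; DocumentSearchQuery and the @search parameter name are hypothetical, and the double-quoted phrase form mirrors the POC.

```csharp
using System;

// Hypothetical helper: parameterized version of the POC's database search.
public static class DocumentSearchQuery
{
    // The user's text never appears in the SQL; it travels in @search.
    public static string BuildSql(string userInput)
    {
        return userInput.IndexOf('*') >= 0
            ? "SELECT FileName FROM Documents WHERE CONTAINS(full_Text_bin, @search)"
            : "SELECT FileName FROM Documents WHERE FREETEXT(full_Text_bin, @search)";
    }

    // Wraps the term in double quotes as the POC does; embedded double quotes
    // are removed so they cannot end the phrase early.
    public static string BuildSearchTerm(string userInput)
    {
        return "\"" + userInput.Trim().Replace("\"", "") + "\"";
    }
}
```

The statement and term would then go into a SqlCommand: add the parameter with cmd.Parameters.Add("@search", SqlDbType.NVarChar, 4000), set its Value to BuildSearchTerm(text), and fill the DataSet through a SqlDataAdapter built on that command. Single quotes in the input are harmless here because they never reach the SQL text.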

4 PATNI RECOMMENDATION
Having analyzed the possible solutions and the POC results, we consider both solutions suitable
for the Salon 3.0 system requirements. That is, either
- file system based document storage and full text search, or
- database based document storage and full text search
can be used.
Patni recommends file system based document storage and full text search, for the following
reasons:
1. All the pros of file system based search detailed in the previous section favor the Salon 3.0
system requirements.
2. None of the cons of file system based search detailed in the previous section has a major
impact on the Salon 3.0 system requirements.
For example, the cons related to security and data backup can be addressed with appropriate
operational and administrative measures.
Regarding the lack of support for XML files, we assume XML is not a file type that Salon 3.0
needs to support. File types supported by file system based full text search are MS Office
files, MIME messages, HTML, text files and PDFs.
3. As per the proposed infrastructure requirements for the Salon 3.0 system, the application
server will be a dedicated one, and as per P&G infrastructure standards no clustered
environment is available for a dedicated application server. Hence the server affinity issue
is not anticipated with file system based storage.
Even if a clustered environment is adopted in future, a solution can be devised by using a
central server for file storage.
4. For file system based search to support PDF files, the Adobe IFilter (a free download) has
to be installed on the application server. Since the Salon 3.0 application server is a
dedicated one, this installation should not be a problem.
5. Database based file storage and search may call for more tables, stored procedures and
special settings on the database. As per the proposed infrastructure requirements for the
Salon 3.0 system, the database server will be in a shared environment, and as per P&G
infrastructure standards there are restrictions on DB size, number of tables, stored
procedures, etc. for databases hosted on a shared server.
6. SQL Server 2005 has a 2 MB file size restriction, but there is no predefined minimum or
maximum size for the files/documents to be stored in the Salon 3.0 system.

5 APPENDIX
5.1 Reference
http://msdn.microsoft.com/en-us/library/ms142571(SQL.90).aspx
http://msdn.microsoft.com/en-us/library/aa163263.aspx
POC source code: in the Salon 3.0 VSS repository
