Administrator's Guide
Trademarks
Endeca, the Endeca logo, Guided Navigation, Endeca The Next Generation of Search and Information Access, Find/Analyze/Understand, MDEX Engine, Endeca Latitude, Endeca Profind, Endeca Navigation Engine, and other Endeca product names referenced herein are registered trademarks or trademarks of Endeca Technologies, Inc. in the United States and other jurisdictions. All other product names, company names, marks, logos, and symbols are trademarks of their respective owners.
Contents
Preface
About this guide
Who should use this guide
Contacting Endeca Standard Customer Support
SECTION I
ENDECA TOOLS
Chapter 1
Endeca Tools Overview
Endeca tools and tool components
Endeca Web Studio
Endeca Developer Studio
Endeca Application Controller
Chapter 2
Chapter 3
Chapter 4
Authentication of users in Web Studio with LDAP enabled
Troubleshooting user authentication in Web Studio with LDAP enabled
User profiles for LDAP users and groups
Roles and permissions for LDAP users and groups
Administrators in Web Studio with LDAP
Workflow notifications for LDAP users and groups
Enabling LDAP authentication in Web Studio
Disabling LDAP authentication for Web Studio
Configuration of the Webstudio login profile for LDAP
Specifying the location of the configuration file
Templates used in the Webstudio profile
Configuration parameters for the Webstudio profile
LDAP path parameters
Specifying multiple values for parameters in the Webstudio profile
Chapter 5
Chapter 6
Chapter 7
Viewing HTML reports
Viewing reports produced by other Report Generators
Archiving and deleting log files and reports
Chapter 8
SECTION II
Chapter 9
Chapter 10
Chapter 11
Component reference
Forge
Dgidx
Dgraph
Agidx
Agraph
Crawler
LogServer
ReportGenerator
Provisioning your implementation with eaccmd
Provisioning the Application Controller to work on multiple machines
Multiple machine example
Forcing the removal of an application
Incremental provisioning
Incremental provisioning guidelines
About the def_file setting
About the --force flag
Adding, removing, or updating a component
Adding, removing, or updating a host
Adding, removing, or updating a script
Provisioning your deployment with Endeca Deployment Template
Downloading the Endeca Deployment Template
Using the Endeca Deployment Template
Chapter 12
Eaccmd usage
Eaccmd command reference
Provisioning commands
Provisioning example
Incremental provisioning commands
Incremental provisioning example
Synchronization commands
About the Synchronization service
Synchronization examples
Component and script control commands
Component control example
Utility commands
General notes on Application Controller utilities
The List Directory Contents (ls) command
The Shell utility
The Copy utility
The Archive utility
Chapter 11
removeAllFlags(IDType removeAllFlagsInput)
listFlags(IDType listFlagsInput)
Utility interface
Utility methods
startBackup(RunBackupType startBackupInput)
startFileCopy(RunFileCopyType startFileCopyInput)
startRollback(RunRollbackType startRollbackInput)
startShell(RunShellType startShellInput)
stop(FullyQualifiedUtilityTokenType)
getStatus(String applicationID, String token)
listDirectoryContents(ListDirectoryContentsInputType listDirectoryContentsInput)
Provisioning interface
Provisioning methods
defineApplication(ApplicationType application)
getApplication(IDType getApplicationInput)
getCanonicalApplication(IDType getCanonicalApplicationInput)
listApplicationIDs(listApplicationIDsInput)
removeApplication(RemoveApplicationType removeApplicationInput)
addComponent(AddComponentType addComponentInput)
removeComponent(RemoveComponentType removeComponentInput)
updateComponent(UpdateComponentType updateComponentInput)
addHost(AddHostType addHostInput)
updateScript(UpdateScriptType updateScriptInput)
removeHost(RemoveHostType removeHostInput)
updateHost(UpdateHostType updateHostInput)
addScript(AddScriptType addScriptInput)
removeScript(RemoveScriptType removeScriptInput)
ScriptControl interface
ScriptControl methods
startScript(FullyQualifiedScriptIDType startScriptInput)
stopScript(FullyQualifiedScriptIDType stopScriptInput)
getScriptStatus(FullyQualifiedScriptIDType getScriptStatusInput)
Chapter 12
DgidxComponentType class
DgidxComponentType properties
DgraphComponentType class
DgraphComponentType properties
DgraphHostPortType class
DgraphHostPortType properties
DgraphReferenceType class
DgraphReferenceType properties
DirectoryListType class
DirectoryListType property
DirectoryType class
DirectoryType properties
EACFault class
EAC Fault property
FilePathListType
FilePathListType property
FilePathType
FilePathType properties
FlagIDListType class
FlagIDListType property
ForgeComponentType class
ForgeComponentType properties
FullyQualifiedComponentIDType class
FullyQualifiedComponentIDType properties
FullyQualifiedFlagIDType class
FullyQualifiedFlagIDType properties
FullyQualifiedHostIDType class
FullyQualifiedHostIDType properties
FullyQualifiedScriptIDType class
FullyQualifiedScriptIDType properties
FullyQualifiedUtilityTokenType class
FullyQualifiedUtilityTokenType properties
HostListType class
HostListType property
HostType class
HostType properties
ListApplicationIDsInput class
ListDirectoryContentsInputType class
ListDirectoryContentsInputType properties
LogServerComponentType class
LogServerComponentType properties
PropertyListType class
PropertyListType property
PropertyType class
PropertyType properties
ProvisioningFault class
ProvisioningFault properties
RemoveApplicationType class
RemoveApplicationType properties
RemoveComponentType class
RemoveComponentType properties
RemoveHostType class
RemoveHostType properties
RemoveScriptType class
RemoveScriptType properties
ReportGeneratorComponentType class
ReportGeneratorComponentType properties
RunBackupType class
RunBackupType properties
RunFileCopyType class
RunFileCopyType properties
RunRollbackType class
RunRollbackType properties
RunShellType class
RunShellType properties
RunUtilityType class
RunUtilityType properties
ScriptListType class
ScriptListType properties
ScriptType class
ScriptType properties
SSLConfigurationType class
SSLConfigurationType properties
StateType class
StateType fields
StatusType class
StatusType properties
TimeRangeType class
TimeRangeType fields
TimeSeriesType class
TimeSeriesType fields
UpdateComponentType class
UpdateComponentType properties
UpdateHostType class
UpdateHostType properties
UpdateScriptType class
UpdateScriptType properties
SECTION III
Chapter 13
Using emgr_update to transfer from a Web Studio staging environment to a Web Studio production environment
Transferring all instance configuration files
Transferring only instance configuration files modified by Web Studio
Using emgr_update to transfer from one Web Studio environment to another
Using emgr_update to remove instance configuration files from Web Studio
Using emgr_update to send the dimensions file produced by Forge to the Web Studio
Removing an application from Endeca IAP
SECTION IV
Chapter 14
Chapter 15
Converting a MDEX Engine request log file
Creating a log file by hand using substitute search terms
Debugging Eneperf
Chapter 16
Chapter 17
Chapter 18
Chapter 19
SECTION V
APPENDICES
Appendix A
Endeca Flag Reference
Agidx options
Agraph options
Dgidx options
Dgraph options
Forge options
Appendix B
Appendix C
Index
Preface
The Endeca Information Access Platform is the foundation for building applications based on Endeca MDEX Engine technology. With the Endeca Information Access Platform, you can build Guided Navigation functionality into your Web applications. The Endeca Guided Navigation solution puts the results of all navigation, search, and analytic queries in an organized context that shows users how to refine and explore further. This helps solve the problems associated with information overload by guiding users as they quickly and precisely navigate through large data sets.
The Endeca Support Center provides registered users with important information regarding Endeca software, implementation questions, product and solution help, training and professional services consultation as well as overall news and updates from Endeca.
Endeca Confidential
SECTION I
Endeca Tools
Chapter 1
Endeca tools and tool components
Endeca Web Studio
Endeca Developer Studio
Endeca Application Controller
- Endeca Web Studio contains configuration and administrative functionality for system administrators as well as business logic functionality for business users. Web Studio provides the primary means to administer your Endeca implementation in a Tools environment.
- Endeca Developer Studio facilitates the entire process of data pipeline development and some aspects of configuration of an Endeca application.
- Endeca Application Controller (EAC) controls access to and use of an Endeca implementation.
- Provision the hosts available to an Endeca implementation.
- Provision the applications available to an Endeca implementation.
- Provision the scripts, such as the report generator script or a baseline update script, available to an Endeca implementation.
- Configure SSL settings and report generation, and set up a preview application for dynamic business rule testing.
- Perform system operations such as running baseline updates or starting and stopping the MDEX Engine or Log Server.
- Monitor the status of system components such as Forge, Dgidx, MDEX Engine, Log Server, and Report Generator.
Web Studio and Developer Studio require the Endeca Application Controller (EAC) to control and communicate with other components and hosts in an Endeca implementation.
- A pipeline diagram, which serves as a visual script for the entire data transformation process.
- A dimension hierarchy, which provides the dimension names and IDs that are needed to map your source data properties to Endeca dimensions.
- An index configuration, which defines how your Endeca records, Endeca properties, dimensions, and dimension values are indexed by Dgidx.
From a file perspective, an instance configuration is represented by a number of XML files that Developer Studio generates. For more information about the instance configuration, see the Endeca Getting Started Guide.
Chapter 2
Changing the Developer Studio configuration
Operational tasks in Developer Studio
Other Developer Studio tasks
Input paths and state in pipeline files
- Changing the Developer Studio connection to use a different Endeca Web Studio.
- Specifying command options for Endeca components.
Developer Studio sends the instance configuration to the target Web Studio.
- On the EAC Administration Console page of Web Studio
- In the command line interface to EAC (with eaccmd)
- In the WSDL API to EAC
The Forge, Dgidx, and Dgraph options that you can specify are listed in Endeca Flag Reference on page 343.
To specify command options on the EAC Administration Console page of Web Studio:
1 Start Web Studio and log in to your application.
2 Select the EAC Administration Console page.
3 For the component you want to modify, enter your options exactly as you would on the command line.
4 Click Save Changes.
5 If you are only sending command options to the Endeca MDEX Engine, stop and restart the MDEX Engine so that the new options take effect.
Options for Forge and Dgidx do not take effect until you run a baseline update.
- Sending a new instance configuration.
- Retrieving your Endeca application's most recent instance configuration.
Description
- Opens Web Studio, allowing access to the EAC Administration Console.
- Retrieves Web Studio's currently loaded instance configuration.
- Sends the instance configuration for the current project to Web Studio.
The new instance configuration is sent to Web Studio. The new configuration does not take effect until you run a baseline update.
From the File menu, choose Save to save the new project with the instance configuration retrieved from Web Studio.
Relative paths in the pipeline are resolved in relation to the directory path you entered in Web Studio, in the Incoming Directory field for hosts, components, and scripts on the EAC Administration page. Absolute paths are not changed and are used exactly as specified.
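The resolution rule can be illustrated with a short Python sketch. This is not Endeca code: the function name resolve_pipeline_path and the sample directories are hypothetical, and the sketch only models the rule stated above (relative paths are joined to the Incoming Directory value, absolute paths pass through untouched).

```python
from pathlib import PurePosixPath, PureWindowsPath


def resolve_pipeline_path(incoming_dir: str, pipeline_path: str,
                          windows: bool = False) -> str:
    """Model of the rule above (hypothetical helper, not an Endeca API).

    incoming_dir  -- value of the Incoming Directory field in Web Studio
    pipeline_path -- path as written in the pipeline file
    """
    path_cls = PureWindowsPath if windows else PurePosixPath
    if path_cls(pipeline_path).is_absolute():
        # Absolute paths are used exactly as specified.
        return pipeline_path
    # Relative paths are resolved against the Incoming Directory.
    return str(path_cls(incoming_dir) / pipeline_path)
```

For example, with an Incoming Directory of /apps/incoming, the relative path data/wine.txt resolves to /apps/incoming/data/wine.txt, while /mnt/feeds/wine.txt is returned unchanged.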
Chapter 3
Accessing the EAC Administration Console of Web Studio
Provisioning an application using Web Studio
Breaking resource locks in Web Studio
Performing system operations
Monitoring the system status
Changing Endeca HTTP service ports
Encoding of workflow e-mails in Web Studio
- Hosts shows a view of your application organized by the hosts you provision. This view indicates the host name, host alias, port, and configuration options. You can modify the host's configuration options, start or stop a component on a host, and see the status of a component on a host.
- Components shows a view of your application organized by the Endeca components provisioned for an application. You can create components on this tab but not hosts.
- Scripts shows the EAC scripts available to an application and allows you to add, remove, run, and monitor EAC scripts. You can stop and start system operations run by EAC scripts, such as baseline updates.
Note: If you used a different HTTP Connector port when you configured Web Studio, substitute that port's number for 8888.
2 At the Web Studio login page, do the following:
   a Type the name and password for the user with the admin role. The default username and password are both admin.
   b If you have an application provisioned, select the application to access. An admin user can also log in to Web Studio without any applications provisioned in the system.
   c Click log in.
3
To hide the drop-down list of applications on the login page of Web Studio:
1 Stop the Endeca HTTP service.
2 Open the webstudio.properties file located in %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX).
3 Locate the com.endeca.webstudio.hide.login.application.dropdown property, for example:
   # Hides the dropdown for selecting an application
   # on the login page.
   com.endeca.webstudio.hide.login.application.dropdown=false
4 Change the value of the property to true.
5 Save and close the webstudio.properties file.
6 Start the Endeca HTTP service.
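If you prefer to script the edit rather than make it by hand, the following Python sketch shows one way to rewrite a single key in the text of a properties file. It assumes that hiding the list means setting the property to true (the example above shows the value false), and the helper name set_property is hypothetical. The sketch works on text only; stopping and starting the Endeca HTTP service is still done separately.

```python
PROPERTY = "com.endeca.webstudio.hide.login.application.dropdown"


def set_property(text: str, key: str, value: str) -> str:
    """Return properties-file text with key set to value.

    Comment lines (starting with '#') are left untouched. If the key
    is not present, a new key=value line is appended.
    """
    lines = text.splitlines()
    found = False
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue  # skip comments and lines without an assignment
        name, _, _ = stripped.partition("=")
        if name.strip() == key:
            lines[index] = f"{key}={value}"
            found = True
    if not found:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

A typical use reads webstudio.properties into a string, calls set_property(text, PROPERTY, "true"), and writes the result back to the same file.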
To provision an application:
1 Using a Web browser, log in to Endeca Web Studio, as described in Accessing the EAC Administration Console of Web Studio on page 36.
2 Add one or more applications. The procedure to add or remove an application is described in the Endeca Web Studio Help.
3 Add one or more hosts. The procedure to add or remove a host is described in the Endeca Web Studio Help.
4 Add Endeca components to the hosts. You should set one Forge, and at least one Indexer (Dgidx) and one MDEX Engine (Dgraph). The procedure to add components is described in the Endeca Web Studio Help.
5 Add EAC scripts. The procedure to add EAC scripts is described in the Endeca Web Studio Help.
While one user has a resource locked, no other user can select the resource without getting an error such as "This component is currently in use by another application or user." Resource locking protects a project from multiple users making conflicting changes at the same time.
Not all pages (resources) in the navigation pane of Web Studio can be locked. Web Studio locks the following pages when a user selects them: Thesaurus page, Rule Manager page, Phrases page, Stop Words page, and Dimension Order page. In addition, if an application uses rule groups on the Rule Manager page or redirect groups on the Redirect List page, then Web Studio treats each group as a separate resource and locks the group when a user selects it. The Preview App Settings page and View Reports page are not locked if a user selects them.
Web Studio releases a resource lock in the following ways:
- When a user logs out by clicking the Logout link.
- When a user closes his or her Web browser. Web Studio logs the user out approximately one minute after the browser closes.
  Note: If multiple browser windows are open with the same user login, the lock is released only after the last window is closed.
- When Web Studio ends a user's session by timing out. Web Studio ends a session after 20 minutes of inactivity.
- When an administrator breaks a resource lock on the Resource Locks page.
- When a user clicks a rule group on the Rule Manager page or clicks a keyword redirect group on the Redirect List page. Each rule group or redirect group is locked individually, and the lock is broken individually when a user selects a different group.
Note: Breaking a resource lock causes the user who held the lock to lose any unsaved changes.
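The locking behavior described above can be pictured with a small Python model. This is an illustrative sketch only, not Web Studio's implementation: the class ResourceLocks and its method names are hypothetical, and it simplifies session inactivity to the time since the lock was last touched.

```python
import time


class ResourceLocks:
    """Toy model of per-resource locks with an inactivity timeout."""

    def __init__(self, timeout_seconds=20 * 60, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._locks = {}  # resource name -> (user, time of last activity)

    def acquire(self, resource, user):
        """Lock resource for user; return False if another user holds it."""
        now = self._clock()
        holder = self._locks.get(resource)
        if holder is not None:
            owner, last_seen = holder
            if owner != user and now - last_seen < self._timeout:
                return False  # still in use by another user
        # Free, already owned by this user, or expired: take the lock.
        self._locks[resource] = (user, now)
        return True

    def release_user(self, user):
        """Release every lock held by user (logout or browser close)."""
        self._locks = {name: held for name, held in self._locks.items()
                       if held[0] != user}

    def break_lock(self, resource):
        """Administrator action: drop the lock regardless of owner."""
        self._locks.pop(resource, None)
```

In this model, a second user's acquire call on the Thesaurus page fails until the first user logs out, the 20-minute timeout passes, or an administrator calls break_lock.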
rolls the Log Server that is running on port 8002 on the host named web002.
- Instance configuration - the Endeca project files for all the applications managed by the same instance of Web Studio.
- Web Studio store - a directory that contains a database of users, rule groups, and associated permission information.
Together, the instance configuration and the Web Studio store are the backup. The two are a snapshot of your projects and all their associated user and permission information.
To back up a project:
1 Stop the Endeca HTTP service.
2 Copy the webstudiostore directory, including all its subdirectories, from the %ENDECA_CONF%\state\ directory (on Windows) or $ENDECA_CONF/state/ (on UNIX) to another location.
   Note: Recall that the default location of %ENDECA_CONF% on Windows is C:\Endeca\MDEXEngine\workspace.
3 Copy the emanager directory, including all its subdirectories, from the %ENDECA_CONF%\state\ directory (on Windows) or $ENDECA_CONF/state/ (on UNIX) to another location.
4 In 5.1.3 or later, back up your Web Studio customization files:
   a Navigate to the %ENDECA_CONF%\conf directory (on Windows) or $ENDECA_CONF/conf (on UNIX).
   b Copy ws-extensions.xml, ws-mainMenu.xml, and ws-roles.xml to another location.
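The copy steps above could be scripted along the following lines. This Python sketch is an assumption-laden illustration, not an Endeca utility: the function name back_up_web_studio and the layout of the backup directory are hypothetical, and the sketch does not stop the Endeca HTTP service, which you must do first.

```python
import shutil
from pathlib import Path

STATE_DIRS = ("webstudiostore", "emanager")
CUSTOMIZATION_FILES = ("ws-extensions.xml", "ws-mainMenu.xml", "ws-roles.xml")


def back_up_web_studio(endeca_conf, backup_root):
    """Copy the Web Studio state directories and customization files.

    endeca_conf -- the directory that ENDECA_CONF points to
    backup_root -- an empty or new directory that receives the copies
    Returns the list of paths that were created under backup_root.
    """
    source = Path(endeca_conf)
    target = Path(backup_root)
    copied = []
    for name in STATE_DIRS:
        # copytree copies the directory with all of its subdirectories.
        destination = target / "state" / name
        shutil.copytree(source / "state" / name, destination)
        copied.append(destination)
    for name in CUSTOMIZATION_FILES:
        candidate = source / "conf" / name
        if candidate.is_file():  # files exist only in 5.1.3 or later
            (target / "conf").mkdir(parents=True, exist_ok=True)
            copied.append(Path(shutil.copy2(candidate, target / "conf" / name)))
    return copied
```

Keeping the state and conf subdirectories in the backup mirrors the source layout, so a restore is a matter of copying each item back to the same relative location.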
To restore a backup:
Note: You can only restore a backup to an Endeca installation that is exactly the same version as the one on which you made the backup (for example, from 5.1.3 to 5.1.3, but not from a different 5.1.x version to 5.1.3). For information about transferring project and user information when upgrading Endeca, see the Endeca Migration Guide.
1 Stop the Endeca HTTP service.
2 Delete the webstudiostore and emanager directories from %ENDECA_CONF%\state\ (on Windows) or $ENDECA_CONF/state/ (on UNIX).
3 Copy the webstudiostore directory that you backed up earlier to %ENDECA_CONF%\state\ (on Windows) or $ENDECA_CONF/state/ (on UNIX).
4 Copy the emanager directory that you backed up earlier to %ENDECA_CONF%\state\ (on Windows) or $ENDECA_CONF/state/ (on UNIX).
5 In 5.1.3 or later, copy the ws-extensions.xml, ws-mainMenu.xml, and ws-roles.xml files that you backed up earlier to %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX).
6 Start the Endeca HTTP service.
If you are restoring the backup to a new Endeca installation, users will be unable to log in to Web Studio until the corresponding applications have been provisioned in the EAC.
being used by Web Studio. The project XML files that make up the instance configuration are zipped into one file. This feature is intended primarily for debugging and support purposes. See the Endeca Web Studio Help for how to download an instance configuration. For information on transferring your instance configuration from a staging environment to a production environment, and on using the emgr_update utility, see Transferring Endeca Implementations Between Environments on page 275.
- The Web Studio log (webstudio.log) logs activity such as user logins, dynamic business rule changes, and automatic phrase creation and modification.
- Business rule logging records when a rule was modified, who modified the rule (according to Web Studio user name), and the name of the rule. Business rule logging does not record specific changes to the rule's configuration, such as changes to its trigger values, target values, or rule properties.
Note: For backward compatibility, Web Studio log levels can still be configured using system properties.
- Port 8090 for the Endeca HTTP service shutdown port.
- Port 8888 for the Endeca HTTP service port.
You can change either or both of these ports, as long as you choose a new port that is not being used.
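Before you settle on a new port number, you may want to confirm that nothing on the machine is already listening on it. The Python sketch below is one simple way to check; the function name port_is_free is hypothetical, and the check reflects only the local machine at the moment it runs, so another process could still claim the port later.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Binding failed: the port is in use or not permitted.
            return False
    return True
```

For example, port_is_free(8889) returning True suggests that 8889 is a candidate replacement for the default port 8888.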
Open the server.xml file in a text editor. Find the Server element in the file:
NOTE: ENDECA HAS MODIFIED TOMCAT'S DEFAULT SERVER PORT OF 8005. ENDECA'S USES A DEFAULT SERVER PORT OF 8090
3 4 5
Change the number in the port attribute to the new port you want to use. Save and close the server.xml file. Restart the Endeca HTTP service. On UNIX: a Stop the Endeca HTTP service using:
$ENDECA_ROOT/tools/server/bin/shutdown.sh
On Windows:
a. From the Windows Control Panel, select Administrative Tools, and then select Services.
b. In the right pane of the Services window, right-click Endeca HTTP service and choose Restart.
c. Close the Services window.
<!-- NOTE: ENDECA HAS MODIFIED THE DEFAULT TOMCAT NON-SSL HTTP PORT OF 8080. ENDECA' USES A DEFAULT NON-SSL HTTP PORT OF 8888 -->
<!-- Define a non-SSL HTTP/1.1 Connector on port 8888 -->
<Connector className="org.apache.catalina.connector.http.HttpConnector"
    port="8888" minProcessors="5" maxProcessors="75"
    enableLookups="true" redirectPort="8443"
    acceptCount="10" debug="0" connectionTimeout="60000"/>
3. Change the number in the port attribute to the new port you want the Endeca Application Controller and Web Studio to use.
4. Save and close the server.xml file.
5. Restart the Endeca HTTP service, as documented in step 5 of the previous procedure.
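If the port needs to be changed on several machines, the edit to the Connector element could be scripted. The sketch below is illustrative only: the function name is hypothetical, it operates on the text of server.xml rather than on the file itself, and it relies on the comment-preserving tree builder available in Python 3.8 and later.

```python
import xml.etree.ElementTree as ET


def set_connector_port(server_xml, old_port, new_port):
    """Return server.xml text with matching Connector ports replaced."""
    # insert_comments=True keeps the explanatory comments in the tree.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(server_xml, parser=parser)
    changed = 0
    for connector in root.iter("Connector"):
        if connector.get("port") == str(old_port):
            connector.set("port", str(new_port))
            changed += 1
    if changed == 0:
        raise ValueError("no Connector uses port %s" % old_port)
    return ET.tostring(root, encoding="unicode")
```

Only the port attribute is touched; other attributes such as redirectPort are left as they are. Whitespace and attribute quoting may be normalized by the serializer, so review the output before replacing the original file.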
well as the text of the note. For more information about workflow for business rules in Web Studio, see the Endeca Business Users Guide. To support non-ASCII characters in workflow e-mails, you can configure Web Studio to use UTF-8 encoding. Note that some e-mail clients, including Microsoft Outlook 2003, do not support UTF-8 encoding in mailto URLs, which causes extended characters not to display properly. You should only enable UTF-8 encoding if you are certain that it is supported on all e-mail clients in your organization. The default setting in Web Studio encodes workflow e-mail notifications using the escape function in JavaScript. On most systems this results in ISO-8859-1 encoding (which is supported by Outlook), but the actual encoding may depend on system settings on the client machine.
5. Save and close the file.
6. Start the Endeca HTTP service.
Note that although UTF-8 support varies depending on the default e-mail client on each user's machine, this setting applies to all workflow e-mail messages created by Web Studio.
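The practical difference between the two encodings discussed above shows up in how a non-ASCII character is percent-encoded in a mailto URL. The following sketch uses Python's standard library to approximate both behaviors for characters in the ISO-8859-1 range; it does not reproduce Web Studio's own code, and the function names are hypothetical.

```python
from urllib.parse import quote


def encode_latin1(text):
    """Percent-encode text as ISO-8859-1 bytes (one byte per character)."""
    return quote(text, safe="", encoding="latin-1")


def encode_utf8(text):
    """Percent-encode text as UTF-8 bytes (one or more bytes per character)."""
    return quote(text, safe="", encoding="utf-8")
```

With these helpers, `encode_latin1("café")` yields `caf%E9`, while `encode_utf8("café")` yields `caf%C3%A9`. A client that expects one form and receives the other displays the extended character incorrectly.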
Chapter 4
Users, roles, and permissions in Web Studio
Web Studio user roles
Assigning rule group permissions to Web Studio users
LDAP integration with Web Studio
- user name
- password
- roles and permissions
- user identity information such as first name, last name, and e-mail address
Roles dictate which Web Studio features are available to users. User identity information provides a way to associate name and contact information with user names in Web Studio. If you have Web Studio configured to use LDAP for user authentication, an administrator can create a user profile where the password and identity information is stored and managed in an LDAP directory. LDAP integration also allows you to assign roles and permissions across an entire LDAP group rather than configuring each user individually. For more information about configuring Web Studio with LDAP, see LDAP integration with Web Studio. Each business user profile is associated with a specific application and a business user profile cannot span applications. In cases where you might want the same user in multiple applications, an administrator can create a number of identical business user profiles for any number of applications. Administrators, on the other hand, span applications across Web Studio. For the process to add users and modify user names, passwords, and roles, see the Endeca Web Studio Help.
password is admin. After signing in as the admin user, you can modify the password but not the user name. The admin user can create additional administrators in Web Studio. Only an administrator can create other administrators. An administrator can also delete other administrators, including the predefined admin user, as long as there is always at least one administrator in the system. If you have LDAP authentication enabled, see also Administrators in Web Studio with LDAP on page 60. An administrator is not associated with an application in the same way that business users are. Each business user is associated with a particular application. Administrators span applications, so an administrator can add or remove applications without being affected by that addition or removal.
Role name | Role description
admin | This is a cumulative role that provides access to pages enabled by all the predefined user roles in Web Studio. This role cannot be assigned to users, but is automatically assigned to any administrators that you create. It is possible to disable admin users from modifying provisioning information. For more information, see Disabling the admin role from modifying provisioning information on page 55.
eacconsole | Provides access to the EAC Admin Console page. Users with this role cannot modify provisioning information on the EAC Admin Console. However, users can start and stop Endeca components and EAC scripts.
dimorder | Provides access to the Dimension Order page.
phrases | Provides access to the Phrases page.
Role name | Role description
redirects | Provides access to the Keyword Redirects page.
reporting | Provides access to the Reporting page.
rules | Provides access to the Rule Manager page.
settings | Provides access to all pages under the Application Settings section. This includes the following pages: Instance Configuration, Resource Locks, User Management, Rule Group Permissions, Preview App Settings.
stopwords | Provides access to the Stop Words page.
thesaurus | Provides access to the Thesaurus page.
Each role is defined in a role element within roles. You can specify as many additional roles as you need by adding more role elements. The following attributes must be defined for each role:

Attribute name | Attribute value
id | A unique string identifying this role. Do not define a custom role with the same id as one of the predefined user roles: admin, crawler, dimorder, eacconsole, phrases, redirects, reporting, rules, settings, stopwords, thesaurus. Roles are listed in alphabetical order by id in the User Management page in Web Studio. Note: Modifying this value after the role is created deletes the original role and creates a new role.
defaultName | The display name for this role that appears on the User Management page in Web Studio.
defaultDescription | A brief description of this role that appears on the User Management page in Web Studio.
Example
This example of a ws-roles.xml file defines two custom roles.
<?xml version="1.0" encoding="UTF-8"?>
<roles xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="roles.xsd">
  <role id="roleA"
      defaultName="roleA"
      defaultDescription="Provides access to an extension page" />
  <role id="roleB"
      defaultName="roleB"
      defaultDescription="Provides access to another extension page" />
</roles>
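Because an id that collides with a predefined role, or a missing attribute, is easy to overlook, a quick check of ws-roles.xml before restarting the service can be useful. The following is a minimal sketch, not an Endeca tool: the function name is hypothetical, and it checks only the rules stated above rather than validating against roles.xsd.

```python
import xml.etree.ElementTree as ET

# Predefined role ids, as listed in the attribute description above.
PREDEFINED_ROLES = {
    "admin", "crawler", "dimorder", "eacconsole", "phrases", "redirects",
    "reporting", "rules", "settings", "stopwords", "thesaurus",
}
REQUIRED_ATTRIBUTES = ("id", "defaultName", "defaultDescription")


def check_roles(roles_xml):
    """Return a list of problems found in the text of a ws-roles.xml file."""
    problems = []
    seen = set()
    for role in ET.fromstring(roles_xml).iter("role"):
        role_id = role.get("id")
        missing = [a for a in REQUIRED_ATTRIBUTES if not role.get(a)]
        if missing:
            problems.append("role %r is missing %s" % (role_id, ", ".join(missing)))
        if role_id in PREDEFINED_ROLES:
            problems.append("role id %r clashes with a predefined role" % role_id)
        elif role_id in seen:
            problems.append("role id %r is defined more than once" % role_id)
        seen.add(role_id)
    return problems
```

An empty list means none of these particular problems were found.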
Localized names are defined in a names element within role that contains one or more name elements. Localized descriptions are defined in a descriptions element within role that contains one or more description elements. The name and description elements require a locale attribute whose value is a valid ISO language code.
Example
This example of a ws-roles.xml file defines a custom role with separate names and descriptions for English and French.
<?xml version="1.0" encoding="UTF-8"?>
<roles xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="roles.xsd">
  <role id="localized"
      defaultName="localized"
      defaultDescription="A role with localized names" >
    <names>
      <name locale="en">localized</name>
      <name locale="fr">localisé</name>
    </names>
    <descriptions>
      <description locale="en">A localized role</description>
      <description locale="fr">Un rôle localisé</description>
    </descriptions>
  </role>
</roles>
Web Studio checks for a name and description that matches the locale defined in the current installation of Web Studio. If no matching localized name or description is found, the defaultName and defaultDescription values are used.
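The fallback behavior described above can be pictured as a simple lookup. The sketch below is an illustration of that rule, not Web Studio's implementation; the function name is hypothetical, and it handles only the name (the description would follow the same pattern with the descriptions element and defaultDescription).

```python
import xml.etree.ElementTree as ET


def display_name(role, locale):
    """Return the localized name of a role element, or its defaultName."""
    for name in role.findall("names/name"):
        if name.get("locale") == locale:
            return name.text
    return role.get("defaultName")   # no matching localized name


ROLE = ET.fromstring(
    '<role id="localized" defaultName="localized" '
    'defaultDescription="A role with localized names">'
    '<names><name locale="en">localized</name>'
    '<name locale="fr">localisé</name></names></role>'
)
```

Here `display_name(ROLE, "fr")` returns the French name, while a locale with no entry, such as "de", falls back to the defaultName value.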
3. Navigate to %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX).
4. Open ws-roles.xml in a text editor and add or modify roles as necessary. See Custom user roles in Web Studio on page 52.
5. Save and close the file.
6. Start the Endeca HTTP service.
IMPORTANT: Deleting a role causes all the user assignments to that role to be deleted across all applications. Modifying the id attribute of a role deletes the original role (and its corresponding user assignments) and creates a new role with the new id. Modifications to any other attributes are saved when you update Web Studio and user assignments are preserved. To recover a deleted role along with its user assignments, restore the backups made in Step 1. See Backing up and restoring an Endeca project on page 41.
5. Save and close the file.
6. Start the Endeca HTTP service.
- Assign by group on the Rule Group Permissions page
- Assign by user name on the User Management page
There are four permission levels available for rule group access. A user may have one of the following permissions for each rule group:
- Approve: The user has permission to view, edit, and approve rules in the group.
- Edit: The user has permission to view and edit rules in the group but no permission to approve rules.
- View Only: The user has permission to view rules in the group but no permission to edit or approve rules.
- None: The user has no permission to view, edit, or approve rules in the group. Users with this permission will not see the rule group displayed in Web Studio.
Administrators are automatically assigned Approve permissions in all rule groups. See the Endeca Web Studio Help for the procedures to assign rule group permissions to Web Studio users.
For users who are configured in Web Studio to authenticate via LDAP, the password and identity information such as name and e-mail address are maintained in the LDAP directory. Web Studio does not write any data to the LDAP directory. Any roles and permissions assigned to an LDAP user profile in Web Studio are stored in the Web Studio database. LDAP user and group profiles can be used in combination with the traditional Web Studio user profiles that an administrator configures manually. Users can authenticate via either method on the same instance of Web Studio and in the same application. Optionally, you can enable SSL for communication between Web Studio and your LDAP server. For more information on using LDAP with SSL, see the Endeca Security Guide. Web Studio supports integration with LDAP servers that comply with LDAP version 3.
inheritance of LDAP group roles and permissions, see Roles and permissions for LDAP users and groups on page 59.
A user is attempting to log in with a user name and password defined in the LDAP directory, but there is a manually configured Web Studio user in the same application with the same user name or a Web Studio administrator with the same user name. A user with a manually configured profile always takes precedence over a user authenticating via LDAP. For more details about the behavior of users with the same name, see User profiles for LDAP users and groups on page 59.
- A manually configured profile exists for the user in Web Studio and the password provided does not match the password stored in the user profile.
- No manually configured profile exists for the user in Web Studio but the user exists in the LDAP directory, and the password provided does not match the password stored in the LDAP directory.
- No profile exists for the user in Web Studio either as a Web Studio user or an LDAP user, and none of the LDAP groups of which the user is a member have a user profile configured in Web Studio.
- One or more user profiles exist for the user or for groups of which the user is a member, but none of the profiles specify any roles. A user who does not have any associated roles cannot log in to Web Studio.
- No manually configured profile exists for the user in Web Studio, and no such user exists in the LDAP directory or the query to the LDAP server returns more than one result. Web Studio does not handle the case of more than one user object with the same user name specified in the LDAP directory.
- If no users are able to authenticate via LDAP, there may be a problem with the configuration in %ENDECA_CONF%\conf\Login.conf (on Windows) or $ENDECA_CONF/conf/Login.conf (on UNIX).
Check the error messages in the Web Studio log for more information about the causes of authentication failures.
group in the application, a user who is a member of multiple LDAP groups defined in Web Studio is assigned the broadest permission associated with any of the LDAP groups of which that user is a member. If you create an LDAP user profile in Web Studio for an individual who is also a member of one or more LDAP groups defined in Web Studio, that user is assigned any roles you specify on the User Management page in addition to any roles that the user inherits from membership in LDAP groups. If you specify rule group permissions for an LDAP user who is also a member of an LDAP group, then for each rule group, the user is assigned either the permission specified on the User Management page or the broadest permission associated with any of the user's LDAP groups, whichever is broader. You can override this behavior by specifying Override LDAP Group Permissions when creating the profile in Web Studio. If you select this option, the user is assigned only the roles and permissions you specify in the user profile, and does not inherit any roles or permissions from LDAP groups.
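The "broadest permission wins" rule for a single rule group can be sketched as follows. This is an illustration of the rule as described above, not Web Studio's code: the enumeration, the function name, and the ordering of the four permission levels from None up to Approve are assumptions made for the example.

```python
from enum import IntEnum


class Permission(IntEnum):
    """Rule group permission levels, ordered from narrowest to broadest."""
    NONE = 0
    VIEW_ONLY = 1
    EDIT = 2
    APPROVE = 3


def effective_permission(user_permission, group_permissions, override=False):
    """Resolve one rule group's permission for an LDAP user.

    user_permission is the level set on the user's own profile (or None);
    group_permissions are the levels of the user's LDAP groups; override
    models the Override LDAP Group Permissions option.
    """
    if override:
        return user_permission if user_permission is not None else Permission.NONE
    candidates = list(group_permissions)
    if user_permission is not None:
        candidates.append(user_permission)
    return max(candidates, default=Permission.NONE)
```

For instance, a user with View Only on the profile who belongs to a group with Edit resolves to Edit, unless the override option is selected, in which case the result is View Only.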
- Rename the predefined admin user.
- If you have created another administrator as a manually configured Web Studio user, you can delete the predefined admin user.
Note that administrators can still delete other administrators, but there must be at least one manually configured Web Studio administrator. This is to ensure that changes to the LDAP directory or disabling of LDAP authentication for Web Studio cannot disable all administrator logins.
This information is only captured in the log; the user in Web Studio will not see any message about whether e-mail addresses could be found. Because Web Studio launches another application to send the e-mail and the user can edit the list of recipients before sending the message, the Web Studio log cannot record whether an e-mail was sent, or the actual recipients of the message.
Open the webstudio.properties file, and locate the com.endeca.webstudio.useLdap property, for example:
# LDAP Authentication
com.endeca.webstudio.useLdap=false
5. Save and close the file.
6. Open the Login.conf file. This file contains a sample configuration for LDAP authentication.
   Note: By default, Web Studio uses the authentication profile in this location. You can specify an alternate configuration file. For more information, see Specifying the location of the configuration file on page 63.
7. Uncomment and modify the Webstudio profile according to your LDAP configuration. For details about profile parameters, see Configuration of the Webstudio login profile for LDAP on page 62.
8. Save and close the file.
9. Start the Endeca HTTP service.
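The edit to webstudio.properties in the procedure above could be scripted for repeatable deployments. The sketch below is a generic properties-file helper with a hypothetical name; it assumes that setting com.endeca.webstudio.useLdap to true is what enables LDAP authentication, which should be confirmed against the full procedure.

```python
import re


def set_property(properties_text, key, value):
    """Return properties text with key set to value, appending if absent."""
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*=).*$", re.MULTILINE
    )
    # A function replacement avoids backslash processing of the new value.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, properties_text)
    if count == 0:
        new_text = properties_text.rstrip("\n") + "\n" + key + "=" + value + "\n"
    return new_text
```

Comment lines and unrelated properties are left untouched, so the `# LDAP Authentication` comment in the sample above survives the edit.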
profile named Webstudio in %ENDECA_CONF%\conf\Login.conf (on Windows) or $ENDECA_CONF/conf/Login.conf (on UNIX). A sample profile is included in this location by default, but you should modify its parameters as needed for your LDAP configuration. You can also specify an alternate location for the configuration file. If you want to configure JAAS authentication for other applications running in the Endeca HTTP service, for example, for the Standard Application or your own Endeca implementation, create additional profiles with unique names in this same Login.conf file. For more information on configuring JAAS authentication for your Endeca application using LDAP or a local password file, see the Endeca Security Guide. Note: A Login.conf file exists in %ENDECA_CONF%\etc (on Windows) and $ENDECA_CONF/etc (on UNIX). This file contains a sample profile for file-based authentication and is not used by Web Studio.
Note: In the Registry Editor Explorer pane, expand the folders until you reach Java. Then click on the Java folder and look for the Options setting in the right pane.
2. Edit the Options setting and look for the following parameter:
-Djava.security.auth.login.config=%ENDECA_CONF%/conf/Login.conf
If you are running the Endeca HTTP service on Windows from the command line:
1. Navigate to the %ENDECA_ROOT%\tools\server\bin directory.
2. Open the setenv.bat file.
3. Locate the line that begins with set JAVA_OPTS, for example:
Change the path of the -Djava.security.auth.login.config parameter to point to the location of your configuration file.
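Changing the path in a JAVA_OPTS line amounts to replacing the value of a single option. The following sketch shows one way to do that on the text of the line; the function name and the example values are hypothetical, and the pattern assumes the path contains no spaces and is not wrapped in quotation marks.

```python
import re

_LOGIN_CONFIG = re.compile(r"(-Djava\.security\.auth\.login\.config=)\S+")


def set_login_config(java_opts_line, new_path):
    """Return the options line with the login configuration path replaced."""
    new_line, count = _LOGIN_CONFIG.subn(
        lambda m: m.group(1) + new_path, java_opts_line
    )
    if count == 0:
        raise ValueError("login.config option not found")
    return new_line
```

Other options on the same line are preserved as they are.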
Escape | Description
%{#username} | The name of the LDAP user as defined in the user profile in Web Studio, or the user name entered by a user at the Web Studio login page.
%{#groupname} | The name of the LDAP group as defined in the user profile in Web Studio.
Escape | Description
%{#dn} | The distinguished name of the user or group object in the LDAP directory.
%{#dn:n} | The value of the path field at index n in the distinguished name of the user or group object in LDAP. For example, if the value in the %{#dn} field is cn=joe,ou=People,dc=foo,dc=com, then the value People will be substituted for %{#dn:1}, while joe will be substituted for %{#dn:0}. Note that unlike the value of %{#dn}, which is the raw value returned from the LDAP server, the values returned by this template are not LDAP escaped.
%{#fieldname} | The value in the fieldname field of the user object (or group object when used in the groupTemplate parameter) under consideration.
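To make the substitution rules concrete, the sketch below expands the %{#dn}, %{#dn:n}, and %{#fieldname} escapes against a sample directory entry. It is an illustration only, not the Web Studio implementation: the function name is hypothetical, and the distinguished name is split naively on commas, so escaped commas inside a value are not handled.

```python
import re

_ESCAPE = re.compile(r"%\{#([^}]+)\}")


def expand_template(template, dn, fields):
    """Expand %{#...} escapes using a DN and a dict of attribute values."""
    # Value of each path field, e.g. ["joe", "People", "foo", "com"].
    path_values = [part.split("=", 1)[1] for part in dn.split(",")]

    def substitute(match):
        key = match.group(1)
        if key == "dn":
            return dn                           # raw distinguished name
        if key.startswith("dn:"):
            return path_values[int(key[3:])]    # path field at index n
        return fields[key]                      # attribute of the object

    return _ESCAPE.sub(substitute, template)
```

With the distinguished name from the table, `expand_template("%{#dn:1}", "cn=joe,ou=People,dc=foo,dc=com", {})` returns `People`.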
For most parameter values, single quotation marks (') do not need to be escaped and the values you specify for the parameters can include UTF-8 characters. For additional restrictions on the userPath, groupPath, and findGroupPath parameters, see LDAP path parameters on page 68. The following parameters can be specified in the profile:

Parameter | Description
serverInfo | A URL specifying the name and port of the LDAP server to be used for authentication. You can specify multiple LDAP servers.
Parameter | Description
userPath | The query that is passed to the LDAP server to find an individual user. When appended to the URL in the serverInfo parameter, this should form a valid LDAP URL as described in RFC 2255.
firstNameAttribute | Optional. The name of the attribute on the user object that contains the user's first name.
lastNameAttribute | Optional. The name of the attribute on the user object that contains the user's last name. Note: Web Studio requires at least one of the name fields to be specified. If the LDAP directory does not have separate fields for first and last name, you can map the full name of the user to either the firstNameAttribute or the lastNameAttribute parameter.
emailAttribute | Optional. The name of the attribute on the user object that contains the user's e-mail address. This information is used for workflow notifications.
groupPath | The query that is passed to the LDAP server to find all the groups of which a user is a member. The query uses the information about the user that is returned by the userPath query. When appended to the URL in the serverInfo parameter, this should form a valid LDAP URL as described in RFC 2255. You can specify multiple values for groupPath.
groupTemplate | A template that specifies how to produce individual group names from the set of groups returned by the groupPath query. The value of this template should match the name of the LDAP group as defined in the Web Studio user profile. You can specify multiple values for groupTemplate.
findGroupPath | The query that is passed to the LDAP server to find a specific group. When appended to the URL in the serverInfo parameter, this should form a valid LDAP URL as described in RFC 2255.
Parameter | Description
groupEmailAttribute | The name of the attribute on the group object that contains an e-mail address associated with the group in LDAP. This information is used for workflow notifications in the case where an LDAP group is specified as an approver for a rule group.
serviceUsername | The user name of an administrator login to the LDAP server specified in the serverInfo parameter. For example: "Manager@example.com" or "cn=Manager,dc=example,dc=com". If no value is specified for this option, Web Studio will attempt to authenticate anonymously.
servicePassword | The password to use in conjunction with the serviceUsername value.
serviceAuthentication | Specifies the method of authentication that should be used in connecting to the LDAP server as the administrator account. The permitted values are none, simple, or EXTERNAL.
authentication | Specifies the method of authentication that should be used in rebinding to the LDAP server as a user account. The permitted values are none, simple, or EXTERNAL.
ldapBindAuthentication | Optional. By default this is set to true, and Web Studio authenticates users by rebinding as the user to the LDAP system, thereby employing the LDAP system's own authentication mechanism.
loginName | Optional. A template login name that will be used to rebind to the LDAP server if ldapBindAuthentication is true. Default value is %{#dn}.
Parameter | Description
passwordAttribute | Optional. The name of the attribute on the user object that contains the user's password. Used only if ldapBindAuthentication is set to false. The field specified must contain the user's password in clear text. By default this is set to userPassword.
checkPasswords | Optional. Determines whether Web Studio checks passwords during logins. Default value is true. If set to false, Web Studio uses only the user name to authenticate from the LDAP directory.
useSSL | Optional. Default value is false. If set to true, Web Studio will make mutually authenticated SSL connections to the LDAP server. If you set the parameter, ensure that you have configured the LDAP server to use SSL and that the value of serverInfo has the protocol specified as ldaps:// with an SSL port.
keyStoreLocation | Used only if useSSL=true. The location of the Java keystore, which stores keys and certificates. The keystore is where Java gets the certificates to be presented for authentication. The location of the keystore is OS-dependent, but it is often stored in a file named .keystore in the user's home directory. Note: Even if this location is on a Windows system, the path uses forward slashes (/), not backslashes (\).
keyStorePassphrase | Used only if useSSL=true. The passphrase used to open the keystore file.
it to your LDAP server before specifying it in the configuration for Web Studio. LDAP encoding affects reserved characters such as the comma (,), equals sign (=), and question mark (?). These characters must be escaped by prepending a backslash (\) when they are not used for their reserved purpose, for example if they appear within a common name or organizational unit. URL encoding affects characters that are invalid for URLs, such as non-ASCII characters and any unsafe characters as defined in RFC 1738. This includes reserved LDAP characters when they are not used for their reserved purpose. These characters must be replaced with the % sign followed by the appropriate hex code. For example, if you have the following string as part of your userPath:
ou=Endeca Technologies, Inc.
Any non-ASCII characters or any other characters that are not valid in an LDAP URL must also be properly encoded in the string that you specify in the Webstudio profile.
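The two encoding passes described above can be applied in sequence: first LDAP-escape the reserved characters inside the value, then URL-encode the result. The following is a minimal sketch for a single attribute value. The function name is hypothetical, and only the three reserved characters named in this section are escaped; your directory may treat additional characters as reserved.

```python
import re
from urllib.parse import quote


def encode_path_value(value):
    """LDAP-escape, then URL-encode, one attribute value for an LDAP URL."""
    # Step 1: prepend a backslash to the comma, equals sign, and question mark.
    ldap_escaped = re.sub(r"([,=?])", r"\\\1", value)
    # Step 2: percent-encode everything that is not safe in a URL,
    # including the backslash and the escaped characters themselves.
    return quote(ldap_escaped, safe="")
```

Applied to the value `Endeca Technologies, Inc.`, this produces `Endeca%20Technologies%5C%2C%20Inc.`, where %5C is the backslash and %2C is the comma.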
For example:
serverInfo.0="ldap://web01.endeca.com:1234"
serverInfo.1="ldap://web02.endeca.com:1230"
serverInfo.2="ldap://web03.endeca.com:1334"
If you specify multiple LDAP servers, the servers are assumed to be equivalent. The choice of which LDAP server to contact is made randomly. If an LDAP server cannot be reached, the LoginModule plug-in proceeds through the
remaining servers in order of configuration, wrapping if necessary. For example, if five servers are configured and Server 3 is the first to be contacted, the remaining order of contact is Server 4, Server 5, Server 1, and finally Server 2. You can also specify multiple values for the groupPath attribute by using the same format, for example:
groupPath.0="/ou=groups,dc=endeca,dc=com??sub?(member=%{#dn})"
groupPath.1="/dc=endeca,dc=com?memberOf?sub?(AccountName=%{#username})"
If you specify more than one groupPath, Web Studio sends all the queries to the LDAP server to discover the groups of which a user is a member. You can specify corresponding values for groupTemplate for each groupPath. In this case, the value for groupTemplate.0 is applied to the results of the groupPath.0 query, groupTemplate.1 is applied to the results of groupPath.1, and so on.
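The order in which multiple LDAP servers are contacted, as described earlier in this section (a random first choice, then the remaining servers in configuration order with wraparound), can be sketched as a list rotation. The function name and the optional first argument are hypothetical and exist only to make the example reproducible.

```python
import random


def contact_order(servers, first=None):
    """Return the servers in the order they would be tried.

    first is the zero-based index of the server contacted first;
    when omitted, it is chosen at random.
    """
    if first is None:
        first = random.randrange(len(servers))
    return servers[first:] + servers[:first]
```

With five servers and Server 3 chosen first (index 2), the result is Server 3, Server 4, Server 5, Server 1, Server 2, which matches the example in the text.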
Chapter 5
- Add a new menu item.
- Remove an item from the menu.
- Specify the order in which the menu items display.
- Specify whether an item is in the top-level menu or in a submenu.
- Specify whether a menu item displays on the launch page.
Attribute name | Attribute value
id | The id of a predefined node in Web Studio or a unique string identifying a custom node. For more information on predefined nodes, see Predefined menu nodes in Web Studio on page 74.
defaultTitle | The display name for this node that appears in the navigation menu. This attribute is required for all custom nodes.
A menunode element requires one or more child menuitem elements (see Navigation menu leaf items on page 74).
Example
This example of a ws-mainMenu.xml file defines a custom menu node with extensions as its child items.
<?xml version="1.0" encoding="UTF-8"?>
<mainmenu xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="mainMenu.xsd">
  <menunode id="myextensions" defaultTitle="My Extensions">
    <menuitem id="extensionA"/>
    <menuitem id="extensionB"/>
  </menunode>
</mainmenu>
Example
This example of a ws-mainMenu.xml file defines a custom menu node with titles in both English and French.
<?xml version="1.0" encoding="UTF-8"?>
<mainmenu xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="mainMenu.xsd">
  <menunode id="myextensions" defaultTitle="My Extensions">
    <titles>
      <title locale="en">Access Extensions</title>
      <title locale="fr">Accéder aux extensions</title>
    </titles>
    <menuitem id="extensionA"/>
    <menuitem id="extensionB"/>
  </menunode>
</mainmenu>
Web Studio checks for a title that matches the locale defined in the current installation of Web Studio. If no matching localized title is found, the defaultTitle value is used.
Node description
Search Configuration
View Reports
Application Settings
EAC Administration
Attribute name | Attribute value | Required?
id | The id of a predefined page in Web Studio or the id of an extension as defined in ws-extensions.xml. For more information about extensions, see Web Studio extensions on page 77. | yes
onLaunchPage | If set to true, the menu item displays on the launch page in the order in which it is listed in ws-mainMenu.xml. Default value is false. | no
The predefined pages and their corresponding ids are as follows:

Web Studio page | Menu item id
Rule Manager | rules
Keyword Redirects | redirects
Thesaurus | thesaurus
Phrases | phrases
Stop Words | stopwords
Dimension Order | dimorder
Current Report (Daily) | reporting.currentDaily
Current Report (Weekly) | reporting.currentWeekly
Daily Reports | reporting.daily
Weekly Reports | reporting.weekly
EAC Monitor | eacMonitor
User Management | settings.users
Rule Group Permissions | settings.permissions
Resource Locks | settings.locks
Report Generation | settings.reporting
Preview App Settings | settings.previewApp
Instance Configuration | settings.instanceConfig
User Settings (for non-admin users) | userSettings
EAC Admin Console | eacconsole.console
EAC Settings | eacconsole.settings
Example
This example of a ws-mainMenu.xml file defines a menu that shows top-level leaf items, items nested within a predefined node, and items nested within a custom node. Items that have onLaunchPage="true" display in the launch page regardless of whether they are in the top-level menu or in a submenu.
<?xml version="1.0" encoding="UTF-8"?>
<mainmenu xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="mainMenu.xsd">
  <menuitem id="rules" onLaunchPage="true"/>
  <menuitem id="redirects" onLaunchPage="true"/>
  <menunode id="searchConfig">
    <menuitem id="thesaurus" onLaunchPage="true"/>
    <menuitem id="phrases"/>
    <menuitem id="stopwords"/>
  </menunode>
  <menunode id="myextensions" defaultTitle="My Extensions">
    <menuitem id="extensionA" onLaunchPage="true"/>
    <menuitem id="extensionB"/>
  </menunode>
</mainmenu>
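To preview which items a given ws-mainMenu.xml would place on the launch page, and in what order, the file can be scanned for menuitem elements with onLaunchPage set to true. The sketch below does this with Python's standard XML parser; the function name is hypothetical, and it reflects only the rules stated in this section.

```python
import xml.etree.ElementTree as ET


def launch_page_items(main_menu_xml):
    """Return the ids of launch page items in the order they are listed."""
    root = ET.fromstring(main_menu_xml)
    # iter() walks the document in order, so top-level items and items
    # nested inside menunode elements are treated the same way.
    return [
        item.get("id")
        for item in root.iter("menuitem")
        if item.get("onLaunchPage") == "true"
    ]
```

For the example above, the result would be rules, redirects, thesaurus, and extensionA, in that order.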
Each extension is defined in an extension element within extensions. You can specify as many additional extensions as you need by adding more extension elements.
Attribute name | Attribute value | Required?
id | A unique string identifying this extension. Do not define an extension with the same id as one of the predefined Web Studio pages. For a list of predefined Web Studio pages and their ids, see the table in Navigation menu leaf items on page 74. | yes
defaultName | The display name for this extension that appears in the navigation menu and launch page in Web Studio. | yes
defaultDescription | A brief description of this extension that appears on the launch page in Web Studio. | yes
url | The fully specified URL to this extension. The extension must be a Web application reachable through HTTP or HTTPS, but it does not have to run on the same server as Web Studio. | yes
launchImageUrl | The fully specified URL to a custom image for this extension's entry on the launch page. | no
role | The id of the role that is allowed to access this extension. This can be one of the predefined Web Studio user roles, or any custom role. For more information on user roles, see Web Studio user roles on page 51. Each extension can have a maximum of one role, although a single role can allow access to many extensions. If no role is specified, the extension is available to all Web Studio users. | no
height | The height in pixels of the frame in which the extension is displayed. The default value is 500 pixels. | no
sharedSecret | A shared key that Web Studio uses to calculate the authentication token. For more information on the authentication token, see Token-based authentication for Web Studio extensions on page 82. | no
Example
This example of a ws-extensions.xml file defines a simple extension that enables a link to the Endeca Web site for all admin users.
<?xml version="1.0" encoding="UTF-8"?>
<extensions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="extensions.xsd">
  <extension id="endecaHome"
      defaultName="Endeca home page"
      defaultDescription="Visit the Endeca home page"
      url="http://www.endeca.com"
      role="admin" />
</extensions>
Example
This example of a ws-extensions.xml file defines an extension with separate names and descriptions for English and French.
<?xml version="1.0" encoding="UTF-8"?>
<extensions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="extensions.xsd">
  <extension id="endecaHome"
      defaultName="Endeca home page"
      defaultDescription="Visit the Endeca home page"
      url="http://www.endeca.com"
      role="admin">
    <names>
      <name locale="en">The Endeca Web site</name>
      <name locale="fr">La page d'accueil d'Endeca</name>
    </names>
    <descriptions>
      <description locale="en">Link to the Endeca Web site</description>
      <description locale="fr">Lien vers la page Web d'Endeca</description>
    </descriptions>
  </extension>
</extensions>
Web Studio checks for a name and description that matches the locale defined in the current installation of Web Studio. If no matching localized name or description is found, the defaultName and defaultDescription values are used.
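The fallback rule described above can be sketched in a few lines of Java. This is only an illustration of the lookup order, not Web Studio's own code; the class and method names are hypothetical, and the map stands in for the name or description elements of one extension.

```java
import java.util.Map;

// Hypothetical helper illustrating the documented fallback order.
final class ExtensionLabels {
    // Returns the label for the given locale, or the default value
    // (defaultName or defaultDescription) when no localized entry matches.
    static String resolve(Map<String, String> localized, String locale, String defaultValue) {
        String match = localized.get(locale);
        return match != null ? match : defaultValue;
    }
}
```

With the English and French names from the example, a Web Studio locale of fr would yield the French name, while a locale with no entry (for instance de) would yield the defaultName value.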
Token description
${AUTH}: An MD5 hash value used to authenticate users coming from Web Studio. For more information on the authentication token, see Token-based authentication for Web Studio extensions on page 82.
${EAC_APP}: The name of the application that the Web Studio user is logged in to.
${EAC_HOST}: The host running the EAC Central Server to which Web Studio is currently connected.
${EAC_PORT}: The port on the EAC host through which Web Studio and the EAC Central Server communicate.
${EXTENSION_ID}: The id of the extension as defined in ws-extensions.xml.
${LOCALE}: The locale of Web Studio; this is the value of the com.endeca.webstudio.locale property in %ENDECA_CONF%\conf\webstudio.properties.
${TS}: The time, in milliseconds since 00:00:00 UTC January 1, 1970, when the user navigates to the extension.
${USERNAME}: The username of the Web Studio user accessing the extension.
${WEBSTUDIO_SESSIONID}: The id of the user's current Web Studio session. The extension can use this in combination with the ${USERNAME} token to maintain the state of the extension throughout a single Web Studio session, for instance by storing the information in a cookie.
You use these tokens by specifying them in the url attribute of the extension definition in %ENDECA_CONF%\conf\ws-extensions.xml. The name of the URL parameter does not have to match the id of the token as listed in the preceding table. For example, the following extension definition creates a URL that passes the EAC host, port, and application to the extension:
<?xml version="1.0" encoding="UTF-8"?>
<extensions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="extensions.xsd">
  <extension id="testExtension"
      defaultName="Test Extension"
      defaultDescription="Demonstrates extensions with tokens."
      url="http://www.example.com:8989/TestExtension/index.jsp?eac-host=${EAC_HOST}&amp;eac-port=${EAC_PORT}&amp;eac-app=${EAC_APP}">
  </extension>
</extensions>
Note the use of the &amp; entity in the url attribute in place of the ampersand in the URL. In general, you should ensure that the ws-extensions.xml file validates against the provided schema before updating Web Studio with the new configuration.
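To check what a url attribute will look like once the tokens are filled in, a small stand-alone expander can be useful during development. The sketch below is a hypothetical test aid, not Web Studio's substitution code: the class name is invented, and it assumes values are inserted verbatim (whether Web Studio URL-encodes token values is not stated here). It works on the URL after XML parsing, so the ampersands appear unescaped.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test aid: expands ${TOKEN} placeholders in a URL template.
final class TokenExpander {
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([A-Z_]+)\\}");

    static String expand(String template, Map<String, String> values) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Unknown tokens are left untouched so that typos stay visible
            String replacement = value != null ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because the URL parameter names are independent of the token ids, the template decides the parameter names (eac-host, eac-port, eac-app) and the map only supplies the token values.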
<?xml version="1.0" encoding="UTF-8"?>
<extensions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="extensions.xsd">
  <extension id="authExtension"
      defaultName="Authenticated Extension"
      defaultDescription="Demonstrates token-based authentication."
      url="http://localhost:8080/AuthExtension/index.jsp?timestamp=${TS}&amp;auth=${AUTH}"
      role="admin"
      sharedSecret="secret!@#$%^*(987654321" />
</extensions>
In this case, the value of the authentication token is the hash of a string that looks similar to this:
/AuthExtension/index.jsp?timestamp=1189702462936&auth=secret!@#$%^*(987654321
The extension can verify that a user is coming from Web Studio by calculating the hash of the same string and comparing the result to the value of the AUTH token. This ensures that the user visiting the extension has logged in to Web Studio and has the role (if any) that is required to access the extension. Because the AUTH token is based in part on the URL, it is recommended that you include the time stamp of the request to introduce some variation in the value of the token. The time stamp can also be used to filter out stale requests and limit the possibility of an eavesdropper reusing the same URL to gain access to the extension. The following Java code shows how the extension defined in the preceding example can authenticate users from Web Studio:
// JSP scriptlet fragment: "request" is the implicit HttpServletRequest.
// The page must import java.security.MessageDigest and java.util.Date.

// These values depend on what you defined in ws-extensions.xml
String extensionSecret = "secret!@#$%^*(987654321";
final String authTokenParameterName = "auth";
final String timeStampParameterName = "timestamp";

// Set the tolerance, in milliseconds, before a request is considered too old
int allowedTimeStampSlackInMS = 5 * 60 * 1000;

// Calculate the hash of the substring of the URL and the shared secret
String url = request.getRequestURI() + "?" + request.getQueryString();
String findAuthToken = "&" + authTokenParameterName + "=";
url = url.substring(0, url.indexOf(findAuthToken) + findAuthToken.length());
String authCode = request.getParameter(authTokenParameterName);
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] md5Hash = md.digest((url + extensionSecret).getBytes("UTF-8"));
StringBuffer hashCode = new StringBuffer();
for (int i : md5Hash) {
    // Offset each signed byte into the range 0-255, then write it as two hex digits
    String str = Integer.toHexString(i + 128);
    if (str.length() < 2) {
        str = "0" + str;
    }
    hashCode.append(str);
}

// Compare the hash to the value of the AUTH token
if (!hashCode.toString().equals(authCode)) {
    // Authentication fails because AUTH token did not match
}

// Compare the time stamp of the request to the current time stamp
long currentTime = new Date().getTime();
long ts = Long.parseLong(request.getParameter(timeStampParameterName));
if (Math.abs(ts - currentTime) > allowedTimeStampSlackInMS) {
    // Authentication fails because request is too old
}
The example extension places the AUTH token at the end of the URL, making it more convenient to build the substring of the URL for the hash.
However, the AUTH token can be in any position in the URL. For instance, the URL can be defined in ws-extensions.xml as follows:
url="http://localhost:8080/AuthExtension/index.jsp?auth=${AUTH}&amp;timestamp=${TS}"
The value of the authentication token would be the hash of a string similar to this:
/AuthExtension/index.jsp?auth=&timestamp=1189702462936secret!@#$%^*(987654321
In this case the code in the extension to remove the value of the authentication token from the URL would be more complex.
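One way to handle an AUTH token in an arbitrary position is to strip only the token's value with a regular expression and then append the shared secret. The following is a sketch under the assumption, taken from the two example strings above, that the hashed string is the request path and query with the AUTH value removed, followed by the secret. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds the string to hash when the AUTH parameter
// may appear anywhere in the query string.
final class AuthHashInput {
    static String build(String uriAndQuery, String authParam, String secret) {
        // Match "?auth=value" or "&auth=value" and keep everything except the value
        Pattern p = Pattern.compile("([?&]" + Pattern.quote(authParam) + "=)[^&]*");
        Matcher m = p.matcher(uriAndQuery);
        String stripped = m.replaceFirst("$1");
        // The secret is concatenated directly, so its special characters need no escaping
        return stripped + secret;
    }
}
```

The result can then be hashed and compared to the AUTH value in the same way as in the earlier Java example.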
The host name is the name or IP address of the Web Studio server. Replace 8888 with the Web Studio port if it is not running on the default port. For more information about the styles defined in the public style sheet, see the comments within the public.css file. The file can be viewed at the following URL on the Web Studio server:
http://hostname:port/stylesheets/public.css
The public.css file cannot be edited. If you want to specify additional styles or modify the default styles, create a separate style sheet and apply it to your application.
If the extension does not have a link in the navigation menu or launch page:
Stop and restart the Endeca HTTP service. Changes to the XML configuration files for extensions, roles, and the navigation menu do not go into effect until the service is restarted.
Ensure that you have the required Web Studio user role to access the extension.
Ensure that a menu item for the extension is specified in ws-mainMenu.xml and that the id attribute matches the id of the extension as defined in ws-extensions.xml. Defining an extension in ws-extensions.xml does not automatically add a link to the navigation menu in Web Studio. For more information about customizing the Web Studio menu, see Updating the Web Studio menu and launch page on page 76.
If you want an extension to have an entry on the launch page, specify onLaunchPage="true" in the menuitem element for the extension in ws-mainMenu.xml.
If you have no applications defined in Web Studio, the only links that display in the navigation menu are for the EAC Admin Console and EAC Settings. To enable display of the full Web Studio menu, you must first provision an application.
If the link displays in the menu but the extension does not display when you click the link:
Ensure that the URL for the extension specified in ws-extensions.xml is a valid HTTP or HTTPS URL. A Web Studio extension must be a Web application running in a Web server.
If the Web Studio window does not display at all after updating ws-extensions.xml:
There may be a problem with your XML configuration files that prevents Web Studio from starting up. The error messages in the Endeca HTTP service logs (located at %ENDECA_CONF%\logs\catalina.date.log on Windows, or $ENDECA_CONF/logs/catalina.date.log on UNIX) can help you identify whether one of the following is the case:
One or more of the XML configuration files is missing. The following files must be present in %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX):
ws-extensions.xml and its associated schema, extensions.xsd
ws-mainMenu.xml and its associated schema, mainMenu.xsd
ws-roles.xml and its associated schema, roles.xsd
The files are created in this location when you install Endeca. By default, the ws-extensions.xml and ws-roles.xml files define no extensions or additional roles. The ws-mainMenu.xml file controls the display of the navigation menu and launch page. If you have deleted one of these files, you can restore the default file by copying it from %ENDECA_ROOT%\workspace_template\conf (on Windows) or $ENDECA_ROOT/workspace_template/conf (on UNIX).
One or more of the configuration files contains badly formed or invalid XML. Ensure that the configuration files contain well-formed XML. In particular, check that any ampersand that is used within an attribute value is specified as the &amp; entity. Use an XML tool to validate any configuration files that you have edited against the associated schema in %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX).
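If no dedicated XML tool is at hand, the standard Java XML APIs can perform both checks. The sketch below is one possible approach rather than an Endeca-supplied utility; the class and method names are hypothetical, and the file arguments would point at your edited configuration file and its schema.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical checker for Web Studio configuration files.
final class ConfigXmlCheck {
    // True if the text parses as well-formed XML (a raw ampersand in an attribute fails).
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // True if the file validates against the given XSD, for example
    // ws-extensions.xml against extensions.xsd.
    static boolean isValid(File xmlFile, File xsdFile) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(xsdFile);
            schema.newValidator().validate(new StreamSource(xmlFile));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

Running the well-formedness check first gives a quick answer for the most common mistake, an unescaped ampersand in a url attribute, before the slower schema validation.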
Chapter 6
Preview application overview Preview application requirements Instrumenting your application Configuring the preview application
Windows). Do not confuse this with the regular JSP reference implementation in $ENDECA_REFERENCE_DIR/endeca_jspref (%ENDECA_REFERENCE_DIR%\endeca_jspref on Windows).
Administrators Guide Chapter 6: Setting Up the Preview Application for Web Studio
Domain
The preview application and Web Studio must reside in the same domain (for example, endeca.com).
Javascript domain
If Web Studio and your custom application do not reside on the same host, you must declare the Javascript domain in two locations inside the custom application's code:
Navigation results page (the page that shows the set of records that correspond to a user's query).
Record page (the page that displays information about a single record).
Web Studio communicates with and controls the preview application via Javascript. As a result, both Web Studio and the preview application must have the same Javascript domain property. The domain property provides security for scripts that run in different browser windows but need to communicate with one another. When you enter the Javascript domain, you can also include the port number of the application server. This will ensure that you are referring to the exact host machine and port number. For example, if the custom application is on an application server running on port 8080, you can enter the Javascript domain as:
10.0.0.61:8080
or
web004:8080
The first format uses the host machine's IP address, while the second uses the machine name.
IMPORTANT: In addition, Web Studio's Configuration page provides a field where you must enter this information. This is analogous to declaring the domain in your Javascript headers.
No frames
The preview application must not use frames, because they are likely to collide with the frames of Web Studio itself.
URL-based state
The preview application must use URLs to handle navigation and search requests, as opposed to a hidden cookie or session state. The URLs should allow the substitution of search terms and navigation components. See Using pre-existing applications on page 95 for more information.
Cookie name
Web Studio uses cookies to maintain a user's session. The name of the session cookie used by Web Studio is ESESSIONID. In rare cases it is possible for the cookie name to collide with a cookie of the same name on the same application server. This conflict can occur if you are running your application on an application server on the same host as Web Studio and using ESESSIONID for two purposes. In this situation, a user may have their session unexpectedly terminated. To resolve this issue, you can either run the application on another host (that is, a host other than the one Web Studio is on), or customize your application server to use a different cookie name (other than ESESSIONID) through custom directives on the specific application server.
Note: If your application does not meet the above requirements, Endeca recommends that you use the default JSP reference implementation available in Web Studio.
Navigation results page (the page that shows the set of records that correspond to a user's query).
Record page (the page that displays information about a single record).
Endeca provides an Application Instrumentation Library with convenient methods to do this. The Application Instrumentation Library is a simple library, consisting of two functions, one for the navigation results page and one for the record page. A version is provided for each supported language: Java, .NET, and COM.
Note: The COM API is deprecated in version 5.1, and will be removed in a future version of the Endeca Information Access Platform. Therefore, if you are beginning a new project, it is recommended that you use the Java API or the .NET API.
where nav is the Navigation object for the page. The code above produces an HTML form that looks similar to this example:
<form name="eti-navigation">
  <input type="hidden" name="nav" value="0">
  <input type="hidden" name="srchTerms" value="">
  <input type="hidden" name="srchKey" value="Wine Types">
</form>
COM/ASP
dim eti
set eti = Server.CreateObject("Endeca.ETInstrumentor")
eti.htmlInstrumentNavigation(nav)
ASP .NET
ETInstrumentor eti = new ETInstrumentor();
eti.htmlInstrumentNavigation(nav);
where rec is the ID of the Endeca record displayed on the page, NameProp is the name of the property that represents the record's name, and UniqueProp is the name of the property that uniquely identifies the record. The code above produces an HTML form that looks similar to this example:
<form name="eti-record">
  <input type="hidden" name="displayName" value="Mustilli, Non-Vintage">
  <input type="hidden" name="recordSpecKey" value="WineID">
  <input type="hidden" name="recordSpecValue" value="1">
</form>
COM/ASP
dim eti
set eti = Server.CreateObject("Endeca.ETInstrumentor")
eti.htmlInstrumentRecord(rec, "NameProp", "UniqueProp")
ASP .NET
ETInstrumentor eti = new ETInstrumentor();
eti.htmlInstrumentRecord(rec, "NameProp", "UniqueProp");
The URLs must contain parameters that map to navigation, search key, and search term parameters. The navigation, search key, search term parameters, record ID, preview time, and rule filter parameters must use the same encoding as the standard Endeca N, Ntk, Ntt, Nmpt, and Nmrf parameters, respectively.
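As an illustration of URL-based state, the sketch below assembles a request URL carrying the navigation (N), search key (Ntk), and search term (Ntt) parameters. It is a minimal example only: the class name and the base URL are hypothetical, and it assumes that ordinary URL encoding of the values is sufficient for your application.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a preview URL from navigation and search state.
final class PreviewUrl {
    static String build(String baseUrl, String nav, String searchKey, String searchTerms) {
        return baseUrl
                + "?N=" + enc(nav)
                + "&Ntk=" + enc(searchKey)
                + "&Ntt=" + enc(searchTerms);
    }

    // URL-encodes a single parameter value as UTF-8
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because all state is in the query string, Web Studio can substitute different search terms or navigation values into the same URL pattern without relying on a cookie or server-side session.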
Enter the URLs for the preview application of the reference implementation (these URLs originally were filled in as default settings), or Enter the URL settings for your own application.
For information on the default URL settings used for the JSP reference implementation, see the Endeca Web Studio Help.
Chapter 7
About logging and reporting Implementing logging and reporting in Web Studio Viewing reports in Web Studio Additional report generation tasks
How much search and navigation traffic is my site getting?
How are visitors searching and browsing the site?
What conversion rates are occurring as a result of searching, navigating, and reacting to merchandising or content spotlighting?
What are the most popular search terms and navigation requests?
How effective are their searching and browsing techniques?
For information on the logging and reporting system architecture, API, and customization, see the Endeca Developers Guide.
In addition to running on the Web Studio server, an EAC Agent must be installed on any host where you will be running the Log Server and Report Generator components. Report generation depends on the information collected in the log files. To enable logging, your development team should implement logging API calls to your application modules. For information on the logging API, see the Endeca Developers Guide.
Monitoring the Log Server: To check that the Log Server is running, issue the following URL:
http://LogServerNameorIP:LogServerPortNumber/stats
If the Log Server is running, this URL returns a confirmation message containing the file name, number of log entries, and number of errors. If it is not running, you will see your browser's default error message.
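The same check can be scripted instead of issued from a browser. The following sketch is a hypothetical monitoring helper, not part of the Endeca distribution; it assumes that a running Log Server answers the /stats request with an HTTP 200 status and that any connection failure means the server is down.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical monitoring helper for the Log Server /stats URL.
final class LogServerCheck {
    static String statsUrl(String host, int port) {
        return "http://" + host + ":" + port + "/stats";
    }

    // Returns true if the /stats request succeeds; false on any error or timeout.
    static boolean isRunning(String host, int port) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(statsUrl(host, port)).openConnection();
            conn.setConnectTimeout(3000);
            conn.setReadTimeout(3000);
            int code = conn.getResponseCode();
            conn.disconnect();
            return code == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A scheduled job could call isRunning with the Log Server host and port and raise an alert when it returns false.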
Rolling the log file: To roll the Log Server's log file, issue the following URL:
http://LogServerNameorIP:LogServerPortNumber/roll
Report details
Reports help you make informed decisions about how your application is being used. For example, you can obtain information about searches that do not return desired results. By analyzing these searches, you can determine what aspects of your Endeca IAP implementation may require changes. Report generation has the following characteristics:
You can enable both daily and weekly reports.
If you are using Web Studio, daily reports start at 12 am and finish at 11:59:59 pm. Weekly reports start at 12 am on the day that you specify and finish at 11:59:59 pm on the day that ends a week. For example, if you select Monday as the day to start your weekly report, your report runs from 12 am on Monday until 11:59:59 pm on the following Sunday.
The Endeca software saves generated reports to the EAC directory /workspace/reports/<application_name> on UNIX and \workspace\reports\<application_name> on Windows. You cannot specify an alternate reports directory.
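The weekly window described above can be worked out with java.time. This is a sketch for reasoning about report boundaries, with hypothetical class and method names; it is not how Web Studio itself computes the window.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper: computes the weekly report window containing a given day.
final class WeeklyWindow {
    // 12 am on the most recent configured start day (or the given day itself)
    static LocalDateTime start(LocalDate day, DayOfWeek weekStart) {
        return day.with(TemporalAdjusters.previousOrSame(weekStart)).atStartOfDay();
    }

    // 11:59:59 pm on the day that ends the week, six days after the start
    static LocalDateTime end(LocalDate day, DayOfWeek weekStart) {
        return start(day, weekStart).plusDays(6).withHour(23).withMinute(59).withSecond(59);
    }
}
```

For a week starting on Monday, any day from Monday through Sunday maps to the same window, which runs from that Monday at 12 am to the following Sunday at 11:59:59 pm.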
It is also possible to customize the contents of your reports. For details, see the Endeca Developers Guide.
You can configure and run each component and step individually. This method is discussed on page 101. You can automate the end-to-end logging and reporting process by using the report generation script. This method is discussed on page 109.
In either case, the output of the Log Server must not match up with the input of the Report Generator in terms of provisioning. In other words, the output and input must reside in separate directories. Note: You can also run logging and reporting with other Application Controller clients, such as the eaccmd tool or a custom Web services interface. For information about provisioning the LogServer and ReportGenerator components in an Application Controller provisioning file, see page 178 and page 179, respectively. For eaccmd usage, see Component and script control commands on page 201. For API details, see Endeca Application Controller API Class Reference on page 241.
Using the setting descriptions below, type in the settings and click Create.
Description
The name of the host this component is located on. Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. The path to the Log Server log file.
Default/recommended value
n/a
Working Directory
If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>\<componentName> on Windows. If the Log File is not specified, the default is component working directory plus component name plus .log. Typically the Dgraph component port plus two (such as 8002). Note: The Log Server output directory cannot be the same as the Report Generator input directory, due to file usage contention issues. n/a
Log File
Port
Required. Port on which to run the LogServer. Required. Path and prefix name for the LogServer output. For example, output_prefix = c:\temp\wine generates files that start with wine in c:\temp. Required. Controls the archiving of log files. Possible values are true and false. Specifies the amount of time in seconds that the eaccmd waits while starting the Log Server. If it cannot determine that the Log Server is running in this timeframe, it times out.
Output Prefix
Gzip
Startup Timeout
Description
An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.
Default/recommended value
n/a
Description
The name of the host this component is located on. Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. The path to the Report Generator log file.
Default/recommended value
n/a
Working Directory
If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>\<componentName> on Windows. If the log-file is not specified, the default is component working directory plus component name plus .log. Note: The Log Server output directory cannot be the same as the Report Generator input directory, due to file usage contention issues.
Log File
Required. Path to the file or directory containing the logs to report on. If it is a directory, then all log files in that directory are read. If it is a file, then just that file is read.
Description
Required. Name of the generated report file and the path to where it is stored. For example: C:\Endeca\reports\weekly\myreport.xml on Windows, or /endeca/reports/weekly/myreport.xml on UNIX.
Default/recommended value
Note: If you are running the report generation script provided with the installation, the value of Output File will change with every run of the script, based on the specific output file for the day or week of the run. %ENDECA_CONF%\etc\tools_report_stylesheet.xsl on Windows, $ENDECA_CONF/etc/tools_report_stylesheet.xsl on UNIX
Stylesheet File
Required. Filename and path of the XSL stylesheet used to format the generated report.
Settings File
These set the report window to the given date and time. The date format should be either yyyy_mm_dd or yyyy_mm_dd.hh_mm_ss. For example, 2007_01_25.19_30_57 expresses Jan 25, 2007 at 7:30:57 in the evening. Turns on the generation of report charts. Should indicate a JDK 1.5.x or later.
n/a
Charts
Disabled by default.
Java Binary
Description
Command-line options for the java_binary setting. This command is primarily used to adjust the Report Generator memory, which defaults to 1GB and to adjust the language code for reports, which defaults to English. To set the memory, use the following (ignore the linebreak): java_options = -Xmx[MemoryInMb]m -Xms[MemoryInMb]m
Default/recommended value
n/a
Arguments
Command-line flags to pass to the Report Generator, expressed as a set of arg sub-elements.
n/a
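The date format given in the settings above for the report window (yyyy_mm_dd or yyyy_mm_dd.hh_mm_ss) maps directly onto a java.time pattern. The sketch below is a hypothetical helper for producing or checking such values in your own scripts; treating a date-only value as midnight is an assumption made for the example, not documented behavior.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for the report window date format.
final class ReportWindowDate {
    private static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("uuuu_MM_dd.HH_mm_ss");
    private static final DateTimeFormatter DATE_ONLY =
            DateTimeFormatter.ofPattern("uuuu_MM_dd");

    // Parses either form; a date-only value is read as 00:00:00 (assumption)
    static LocalDateTime parse(String value) {
        if (value.indexOf('.') >= 0) {
            return LocalDateTime.parse(value, DATE_TIME);
        }
        return LocalDate.parse(value, DATE_ONLY).atStartOfDay();
    }

    // Formats a timestamp in the long form, for example 2007_01_25.19_30_57
    static String format(LocalDateTime value) {
        return value.format(DATE_TIME);
    }
}
```

Note that the hour field uses a 24-hour clock, so 7:30:57 in the evening is written as 19_30_57.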
IMPORTANT: Keep in mind that even though the Report Generator component is in a Running state, it is not necessarily generating reports at that moment. Web Studio runs the Report Generator automatically, just after midnight, once a day or once a week, as specified.
3 To monitor the status of the Report Generator, click the Running link located next to its name. The screen displays the component's status, its start time, and the length of time it has been running.
When you are finished provisioning, click Create. The next time you run the Report Generator, it creates reports in French.
Note: If you want to generate daily and weekly reports, check both.
Click OK to save your configuration. If you select the report frequency in Web Studio, then Web Studio will automatically provision a host with the alias of webstudio for you, if one does not exist already. This host contains a directory provisioned with an alias of webstudio-report-dir. This is set to the following directory for report storage: On Windows:
%ENDECA_CONF%\reports\<app name>
On UNIX:
$ENDECA_CONF/reports/<app name>
After the directory mentioned in step 3 has been created, manually add daily and/or weekly sub-directories. These sub-directories are where Web Studio will look for reports to display. Note: The report generation script (described on page 109) creates these sub-directories for you.
If you did not already provision either or both of these scripts before you set the report frequency in Web Studio, Web Studio provisions them for you automatically, using the correct aliases. If you change the alias name of these scripts or remove them, Web Studio does not automatically run anything.
If you are not using an EAC script to control the report generation process, you may want to automate the process using the Scheduled Tasks control panel on Windows or crontab task scheduler on UNIX. See your operating system documentation for details about automated scheduling.
Daily report files must be copied to the following subdirectory on the Web Studio server: On Windows:
%ENDECA_CONF%\workspace\reports\<app name>\daily
On UNIX:
$ENDECA_CONF/workspace/reports/<app name>/daily
Weekly report files must be copied to the following subdirectory on the Web Studio server: On Windows:
%ENDECA_CONF%\workspace\reports\<app name>\weekly
On UNIX:
$ENDECA_CONF/workspace/reports/<app name>/weekly
It looks at the list of files in the Log Server and determines which of these files are relevant to the report that you requested, such as daily or weekly. If a frequency of daily is specified on the command line, logs from the previous day are requested. If a frequency of weekly is
specified, logs from the seven days previous to the time the script is run are requested. Optionally, it can also tell the Log Server to roll its current log and start a new one. This is useful if you want to control the size of a log file or keep it within the requested date range.
It moves the relevant files from the Log Server host to the Report Generator host and instructs the Report Generator to generate a report. For weekly reports, it passes the Report Generator the exact dates of the seven days ending yesterday. It moves the report files from the Report Generator host to the Web Studio host. It instructs the Log Server to delete all files over 30 days old.
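The date ranges that the script requests can be expressed compactly. The sketch below only mirrors the description above (the previous day for a daily report, the seven days ending yesterday for a weekly report); the class and method names are hypothetical and this is not the code in the supplied script.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: dates covered by a report, relative to the run date.
final class ReportDates {
    // A daily report covers the day before the script runs
    static LocalDate dailyDate(LocalDate runDate) {
        return runDate.minusDays(1);
    }

    // A weekly report covers the seven days ending yesterday, oldest first
    static List<LocalDate> weeklyDates(LocalDate runDate) {
        List<LocalDate> dates = new ArrayList<>();
        for (int daysBack = 7; daysBack >= 1; daysBack--) {
            dates.add(runDate.minusDays(daysBack));
        }
        return dates;
    }
}
```

For a weekly run on a Monday, the list runs from the previous Monday through the Sunday just ended.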
If you want to run both daily and weekly reports, add a separate script for each time range. A version of the report generation script is included with the Endeca software, and is stored in %ENDECA_ROOT%\bin\generate-report.bat for Windows ($ENDECA_ROOT/bin/generate-report.sh for UNIX). You can copy and modify this script as needed. For information, see Editing the report generation script on page 111.
Notes:
The report generation script, generate-report.bat or generate-report.sh, overrides the options specified for the Log Server and Report Generator components in Web Studio. For instance, it ignores previously-set options for Start Date and Stop Date.
The supplied script for report generation does not support multi-platform scenarios (although multi-machine scenarios are supported). If you want to perform multi-platform report generation, you need to update the script or provide your own.
The Log Server's Output Prefix should not be set to write to the same folder as the Report Generator's Input File or Directory. If these two components are set to write to the same directory, you will receive an error.
The script source tree is installed as part of the Endeca reference implementation, and can be found in %ENDECA_REFERENCE%\eac_scripts on Windows, or $ENDECA_REFERENCE/eac_scripts on UNIX. The executable files for the script are stored in %ENDECA_ROOT%\bin (Windows) or $ENDECA_ROOT/bin (UNIX); they depend on the eacscript.jar file in %ENDECA_ROOT%\lib\java (Windows) or $ENDECA_ROOT/lib/java (UNIX).
You can generate your own version of the eacscript.jar file by modifying the source files in the reference implementation.
c For Agent Port, specify the port the agent is using, such as 8888.
d For Custom Directories, specify webstudio-report-dir as C:\Endeca\MDEXEngine\workspace\reports\<app name> on Windows, or /usr/local/endeca/workspace/reports/<app name> on UNIX (assuming you installed to /usr/local). This is the directory where Web Studio will look for report files, so it must match.
2 Configure and start the Log Server. For instructions, see page 101.
3 Configure the Report Generator but do not start it. For instructions, see page 103.
4 Add a report generation script by selecting the Scripts tab and completing the information as follows. (If you want to generate daily and weekly reports, add two separate scripts, one for each.)
a For New Script Alias, specify the name of the script. For a daily report, specify DailyReports; for a weekly report, specify WeeklyReports.
For Command, for a daily report script, specify: On Windows:
%ENDECA_ROOT%\bin\generate-report.bat daily
On UNIX:
$ENDECA_ROOT/bin/generate-report.sh daily
On UNIX:
$ENDECA_ROOT/bin/generate-report.sh weekly
Note: If you check off the boxes in the Application Settings > Report Generation section, this step will be done for you.
Note: If you specified a different Endeca HTTP service port, use that instead.
3 At the Web Studio login page, click Log In.
4 In the Enter Network Password dialog, enter the user name and password and click OK.
5 In the navigation menu, click View Reports.
The default report is the current daily report, which is the same as the Current (daily) link on the navigation menu (in the left pane of the View Reports page). The navigation menu also lets you view:
Current weekly report, via the Current (weekly) link List of archived daily reports, via the Daily link List of archived weekly reports, via the Weekly link
For example, click the Weekly link to display a list of weekly reports that looks like this:
From the list of weekly reports, click a specific report to display its contents. Note: For more information on using the View Reports page, see the Endeca Web Studio Help.
Configuring report contents and format Customizing the report generation file Generating HTML reports
Viewing reports produced by other Report Generators Archiving and deleting log files and reports
Archive your log files on a weekly basis. When the Report Generator processes reporting information, it processes all log files contained in the logs directory you specified and any of its subdirectories. This processing has performance implications as the size of your log data grows. To minimize log processing time, Endeca recommends that you archive your log files on a weekly basis to a directory that is not under the logs directory.
Delete outdated log files. The EAC script for report generation retains 30 days of log files, in case a report does not generate properly. More specifically, after a report has been successfully generated, any log files that are more than 30 days older than the start of the report's time period are deleted. By extension, if a report is not successfully generated, no log files are deleted, ensuring that no data is lost. If you are not using the report generation script, you need to purge log files manually.
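If you purge log files yourself, the retention rule above can serve as a model. The sketch below is a hypothetical decision helper, not the EAC script's implementation; it assumes you can determine each log file's date (for example from its name or modification time) and treats a file as deletable only when it is strictly more than 30 days older than the report start.

```java
import java.time.LocalDate;

// Hypothetical helper modeling the documented 30-day retention rule.
final class LogRetention {
    static final int RETENTION_DAYS = 30;

    // A log file may be deleted only after a successful report, and only if it is
    // more than RETENTION_DAYS older than the start of the report's time period.
    static boolean canDelete(LocalDate logDate, LocalDate reportStart, boolean reportSucceeded) {
        return reportSucceeded && logDate.isBefore(reportStart.minusDays(RETENTION_DAYS));
    }
}
```

Tying deletion to a successful report run keeps the safety property described above: a failed report never causes log data to be removed.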
Delete outdated reports. Reports are never deleted by the Endeca Application Controller or Web Studio. Therefore, it is the administrator's responsibility to check the contents of the reports directory on the Web Studio server periodically and manually delete any obsolete or unwanted reports.
Chapter 8
About the Endeca Standard Application
Accessing the Standard Application
Configuring the Standard Application
Installing the Standard Application on Tomcat
Installing the Standard Application on WebLogic
It runs under the Endeca Application Controller (when installed as part of the Endeca Application Controller and Web Studio feature).
It can be installed to run on WebLogic 5.1 (or later) and Tomcat 3.0 (or later) application servers.
It can be used against only one instance of an MDEX Engine. That is, you cannot use your Web browser to point the Standard Application at a different MDEX Engine. If you want to use a different MDEX Engine, you must install a second instance of the Standard Application.
You cannot modify the source code, unlike for the JSP, ASP, and ASP.NET reference applications. However, you can change some configuration parameters, as described below on page 123.
The Endeca Access Control System (that is, user authentication) and SSL can be configured for the Standard Application when it is running under a Tomcat or WebLogic application server.
All configuration happens at deployment time, using standard J2EE environment entries in the appropriate configuration file (the warname.runtime.xml file for WebLogic, or server.xml and endeca_standard.xml for Tomcat).
Display features
The user interface of the Standard Application is similar to those of the reference implementations, but with some features removed for the sake of simplicity. For example:
The hostname and port number of the MDEX Engine for the Standard Application are pre-configured by the deployer, so that the user does not have to supply them.
The UI has a Download as CSV button to allow the user to download all matching records in a CSV (comma separated values) format. (This feature is useful if the user wants to export them to Microsoft Excel, for example.)
The display key for each record is Name. That is, your data must have a property called Name that is used as the title of each record. If this property does not exist, the record name will be Record 32874 or a similar display.
If the record has a property named URL.External, it is used to create a link URL for the record name. For example, if the property value is the URL of a document, that document will be retrieved when you click on the record name.
If you select the Endeca Application Controller & Web Studio feature, the WAR is installed in the following directory:
On Windows:
%ENDECA_ROOT%\tools\server
On UNIX:
$ENDECA_ROOT/tools/server
This scenario assumes that you will be running the Standard Application under the Endeca Application Controller. If you accept the default configuration settings, the Standard Application is already installed and running on your local machine as soon as the Endeca Application Controller is running. To change the Standard Application configuration, see page 123.
If you select the Endeca Standard Application feature, the WAR is installed in the following directory:
Windows: %ENDECA_ROOT%\applications
UNIX: $ENDECA_ROOT/applications
This scenario assumes that you intend to install the Standard Application to run under a Tomcat or BEA WebLogic application server. For details, see Installing the Standard Application on Tomcat on page 126 or Installing the Standard Application on WebLogic on page 133.
If you select both features, you get two copies of the Standard Application WAR, in the directories mentioned above.
For example:
http://localhost:8888/endeca_standard/
If you used a different HTTP Connector port when you configured the Endeca Application Controller, substitute that port number for 8888.
server.xml, which is located in the $ENDECA_CONF/conf directory on UNIX (%ENDECA_CONF%\conf on Windows)
endeca_standard.xml, which is located in the $ENDECA_CONF/conf/Standalone/localhost directory on UNIX (%ENDECA_CONF%\conf\Standalone\localhost on Windows)
Note: You cannot configure SSL and user authentication support for the Standard Application when it is running under the Endeca Application Controller. This configuration is available only when the Standard Application is running on a Tomcat or WebLogic application server.
The endeca_standard.xml file contains several Context elements. A Context element represents an individual Web application that is running within its parent Host element. The Host element is specified in the server.xml file. The default Context definition for the Standard Application looks like this example:
<!-- Context configuration file for the Endeca Standard Web Application -->
<Context path="/endeca_standard"
    docBase="C:\Endeca\MDEXEngine\5.1.0\tools\server/standard-webapp-5.1.0.75.war"
    debug="0" privileged="false">
  <Environment type="java.lang.String" name="ene-host" value="DOC-004"/>
  <Environment type="java.lang.Integer" name="ene-port" value="8000"/>
</Context>
The meanings and defaults of the Context attributes, specified in endeca_standard.xml, are listed in the following table.

Context attribute: path
Default setting: /endeca_standard
Description: The context path of the application. This is the name that the user enters in the browser's URL address field (as documented in the previous section) to access the main page of the application. Note that this value must be unique among all the Context elements for this Host definition.

Context attribute: docBase
Description: The path of the WAR that contains the application. Do not change this attribute unless you are moving the WAR to another location on the host.

Context attribute: debug
Default setting: 0
Description: Sets the debug verbosity level for logging messages. Higher numbers generate more detailed output.

Context attribute: privileged
Default setting: false
Description: Specifies whether this context is allowed to use container servlets. Do not change this attribute.

Context attribute: Environment name="ene-host"
Default setting: The name of the machine on which the Endeca software is installed.
Description: The name of the machine hosting the MDEX Engine.

Context attribute: Environment name="ene-port"
Default setting: 8000
Description: The port on which the MDEX Engine is listening. Change this value only if you are running an MDEX Engine on a port other than the default 8000 port.

Context attribute: Environment name="title"
Default setting: This attribute is not set by default, which means that the default value Endeca is used as the page title.
Description: The title of the application, which is displayed at the top of each page. The type is a java.lang.String value. If the value is empty or the attribute is missing, Endeca will be used as the title. Note that the default configuration does not use this attribute, but you can add it to the Context if you want to use your own title.
$ENDECA_ROOT/tools/server/bin/shutdown.sh
On Windows:
a From the Windows Control Panel, select Administrative Tools, and then select Services.
b In the right pane of the Services window, right-click Endeca HTTP service and choose Restart.
c Close the Services window.
6 Access the Standard Application to check that your changes were successfully implemented.
<!-- Context configuration for the Endeca Standard Web Application -->
<Context path="/endeca_standard"
    docBase="C:\Tomcat 5.5\webapps\standard-webapp-5.1.0.75.war">
  <Environment name="ene-host" type="java.lang.String" value="web007"/>
  <Environment name="ene-port" type="java.lang.Integer" value="8000"/>
  <Environment name="title" type="java.lang.String" value="Endeca App"/>
</Context>
4 Save and close the server.xml file.
5 Restart the Tomcat server.
Assuming the above example configuration (both the Tomcat server and the MDEX Engine are running on host web007 and an HTTP Connector on port 8080 is being used), you would access the Standard Application with this URL in your browser:
http://web007:8080/endeca_standard
If you have a running MDEX Engine, you should see the Standard Application main page. Refer to the Tomcat documentation for full details on Tomcat configuration and deployment of WARs.
Steps 2 to 4 are described in the following sections.
Note: The following sections assume that you have configured the Java JSSE framework on the server, including setting up an SSL HTTP/1.1 connector. Consult the Tomcat documentation for details on the SSL setup procedure.
The following example shows the Standard Application Context element with the SSL environment entry:
<!-- Context configuration for the Endeca Standard Web Application -->
<Context path="/endeca_standard"
    docBase="C:\Endeca\MDEXEngine\5.1.0\tools\server/standard-webapp-5.1.0.75.war">
  <Environment name="ene-host" type="java.lang.String" value="web007"/>
  <Environment name="ene-port" type="java.lang.Integer" value="8000"/>
  <Environment name="title" type="java.lang.String" value="Endeca App"/>
  <Environment type="java.lang.Boolean" name="ene-ssl-is-enabled" value="true"/>
</Context>
-Djavax.net.ssl.keyStore specifies the keystore file.
-Djavax.net.ssl.keyStorePassword specifies the password of the keystore.
-Djavax.net.ssl.trustStore specifies the truststore file to use.
-Djavax.net.ssl.trustStorePassword specifies the password of the truststore file.
One way to provide these values to Tomcat is to use the Tomcat CATALINA_OPTS environment variable, which provides Java runtime options when the server is started. You can set the CATALINA_OPTS environment variable in an existing Tomcat startup file (.bat on Windows or .sh on UNIX) or create a wrapper file that sets the variable and then calls the Tomcat startup file. For example, this Windows batch file can be placed in the Tomcat bin directory and used to start the server:
@echo off
setlocal
set CLIENT_CERT=C:\Endeca\NavigationEngine\workspace\etc\eneCert.jks
set CATALINA_OPTS=-Djavax.net.ssl.keyStore=%CLIENT_CERT% -Djavax.net.ssl.keyStorePassword=endeca -Djavax.net.ssl.trustStore=%CLIENT_CERT% -Djavax.net.ssl.trustStorePassword=endeca
cd c:\tomcat\bin
call c:\tomcat\bin\startup.bat
endlocal
The values for the set CATALINA_OPTS command must all be on a single line, even if they appear wrapped when displayed.
authenticate a user's identity against and obtain authorization information from an LDAP directory or a local username/password file. The authorization information is used to build a user entitlement filter (that is, a security filter), which controls the records that are retrieved from an MDEX Engine query.
The general procedure for configuring user authentication for the Standard Application is as follows:
1 Configure the Java JAAS framework on the Tomcat application server.
2 Modify the Tomcat server.xml file to configure the Standard Application to use user authentication.
3 Set up the Access Control System login configuration file.
4 Configure the Endeca instance configuration to set access permissions on the Endeca records.
5 Access the Standard Application and log in.
The following example shows the Standard Application Context element with user authentication enabled:
<!-- Context configuration for the Endeca Standard Web Application -->
<Context path="/endeca_standard" docBase="standard-webapp-5.0.0.war">
  <Environment name="ene-host" type="java.lang.String" value="web007"/>
  <Environment name="ene-port" type="java.lang.Integer" value="8000"/>
  <Environment name="title" type="java.lang.String" value="Endeca App"/>
  <Environment type="java.lang.Boolean" name="enable-security-filters" value="true"/>
</Context>
LDAPLoginModule for authentication against an LDAP server.
FileLoginModule for authentication against a local password file.
When you create the login configuration file, it must have a configuration entry for this login module:
com.endeca.webapp.profind.auth.PassthroughLoginModule required;
The PassthroughLoginModule refers to an internal class in the Standard Application WAR. The following is an example login configuration file that configures the Access Control System to use a local file for authentication.
Endeca {
  com.endeca.webapp.profind.auth.PassthroughLoginModule required;
};
StandardWebApp {
  com.endeca.navigation.FileLoginModule required
    passwordFile="C:/Endeca/NavigationEngine/workspace/etc/passwd"
    checkPasswords="true";
};
In the example, the Standard Application will use the FileLoginModule for authentication against the local password file specified by the passwordFile parameter. The format of the password file is described in Chapter 4 of the Endeca Security Guide for Java. If you want the Standard Application to use an LDAP server for user authentication, use the LDAPLoginModule (instead of the FileLoginModule) in the StandardWebApp configuration entry. See the Endeca Security Guide for Java for an example of an LDAPLoginModule configuration.
After you set up the login configuration file, you must specify its location to the Tomcat application server via the java.security.auth.login.config property. One method of setting this property is to edit the JAVA_HOME/jre/lib/security/java.security file and add the name of the login configuration file, as in this Windows example:
# Default login configuration file
login.config.url.1=file:C:/EndecaProjects/SSL/Login.conf
Please consult your application server documentation for full details on how to set this property.
The user name and password are then authenticated against the LDAP server or the password file, depending on the login configuration. After authentication, the MDEX Engine will construct a user entitlement filter (based on the user's group information) and return only the records that the user is authorized to see.
SECTION II
Administering Application Controller Environments
Chapter 9
It provides the infrastructure to support Endeca projects from design through deployment and runtime.
It replaces the Control Interpreter (deprecated in the 5.0 release), while leaving the Endeca tools (Developer Studio and Web Studio) largely intact.
It uses open standards, such as the Web Services Description Language (WSDL), which makes the Application Controller platform- and language-independent. As a result, the Application Controller supports a wide variety of applications in production.
It allows you to handle complex operating environments that support features such as partial updates, delta updates, phased MDEX Engine updates, and more.
One instance serves as the EAC Central Server. This instance includes a WSDL interface, through which you communicate with the Application Controller. Communication is implemented with the standard Web Services protocol, SOAP. You can communicate with the Application Controller using any of the following methods:
Using Web Studio. Endeca Web Studio communicates through the WSDL interface to the EAC Central Server. Using Web Studio you can provision, run, and monitor your application. For details, see the Endeca Web Studio Help.
Using the command line utility, eaccmd. eaccmd lets you script the Application Controller within a language such as Perl, shell, or batch. (For details, see Using the Eaccmd Tool on page 191.)
Using direct programmatic control through the Endeca WSDL and languages, such as Java, that support Web Services. (For details, see Endeca Application Controller API Interface Reference on page 217.)
Using any of these methods, you can instruct the Application Controller to perform different operations in your Endeca implementations, such as starting or stopping a component (for example, Forge or Dgraph) or a utility (for example, Copy or Shell). The EAC Central Server also contains a repository that stores provisioning information, that is, data about the hosts, components, applications, and scripts that the Application Controller is managing.
All other instances of the EAC serve as Agents. The Agents instruct their host machines to do the actual work of an Endeca implementation, such as processing data with a Forge component, or coordinating the workings of multiple MDEX Engines with an Aggregated MDEX Engine component. Each Agent also contains a small repository for its own use. The EAC Central Server communicates with its Agents through an internal Web Service interface. You do not communicate directly with the Agents; all command, control, and monitoring functions are sent through the EAC Central Server.
In this diagram, the following happens:
1 The developer, business user, and system administrator provide instance configuration and resource configuration information to the EAC Central Server, using any of the three methods:
Developer Studio and Web Studio
The eaccmd command line utility
Direct programmatic control through the Endeca Web services interface, using a language, such as Java, that supports Web services
2 The EAC Central Server uses that information to communicate with EAC Agents that run on each machine hosting an implementation.
3 The Agents in turn run the necessary processes on each machine.
Chapter 10
Installing the Application Controller
Specifying the EAC Central Server in Web Studio
Starting and stopping the Application Controller directly
Using the eac.properties file
Modifying Application Controller logging levels
Install the Agent. The Agent controls the workings of a single machine in an Application Controller deployment. There are typically several Agents in a deployment.
Install the EAC Central Server. The Central Server acts as a hub in an Application Controller deployment, relaying commands to each of the Agents in the deployment. As such, there is only a single Central Server per deployment. Alternatively, you can use an SSL-enabled Central Server. Upon configuration, this version encrypts the HTTP channel between the Central Server and the client Web services.
Install both.
During installation, when you select whether you want to run the Agent and/or the Central Server on a machine, an XML pointer to the appropriate WAR file is copied to its workspace directory. The presence or absence of these files in the workspace directory determines what that machine is running. If you want to run the SSL-enabled version of the Central Server, you must copy the XML pointer to it to your workspace directory manually, as described in the following section.
Enable the SSL version of the appropriate Application Controller WAR file (eac-ssl.war replaces eac.war for the Central Server, and eac-agent-ssl.war replaces eac-agent.war for the Agent).
Modify the server.xml file for the Tomcat that is hosting the Application Controller.
For details on enabling SSL security in the Application Controller, see the Endeca Security Guide for Java.
Note: If you followed the instructions to set the environment variables in the Endeca Installation Guide, you can use this shortened version, startup.sh, instead.
You stop the Application Controller (along with any other components using the same port) with the following command:
$ENDECA_ROOT/tools/server/bin/shutdown.sh
it to the inittab file, see the chapter titled Installing the Endeca Information Access Platform on UNIX in the Endeca Installation Guide.
If this setting is defined as an absolute path, the Copy utility uses it.
If it is defined as a relative path, the Copy utility considers it to be relative to %ENDECA_CONF%/state/
If it is not defined, the Copy utility uses the directory %ENDECA_CONF%/state/file_transfer/
defaults to one.
1 Open logging.properties.
2 Find the section EAC Log Level.
3 In the line com.endeca.eac.level, change WARNING to INFO.
Chapter 11
Provisioning overview
About the provisioning file and schema
Provisioning your implementation with eaccmd
Forcing the removal of an application
Incremental provisioning
Provisioning your deployment with Endeca Deployment Template
Provisioning overview
Provisioning an Endeca implementation with the Application Controller consists of the following steps:
Creating a provisioning file, in which you define the hosts and components that comprise your implementation, as well as the scripts that it uses.
Referencing that file when creating an implementation with the eaccmd tool or your custom Web service interface.
Note: This chapter provides examples using the sample wine reference implementation and the eaccmd tool. For information about provisioning programmatically using the WSDL, see Endeca Application Controller API Interface Reference on page 217. For information about provisioning EAC within Web Studio, see the Web Studio online help.
Application (the root element)
Hosts (and, optionally, directories on hosts)
Components
Scripts
Note: You can name this file anything you like. In the remainder of this chapter, we frequently refer to the provisioning file as app.xml. The provisioning schema, eaccmdProvisioning.xsd, is located in the MDEXEngine\<version>\conf\schema directory.
Administrators Guide Chapter 11: Provisioning an Implementation with the Endeca Application Controller
Forward slash (/)
Backslash (\)
Colon (:)
Asterisk (*)
Question mark (?)
Right and left angle brackets (< >)
Double quotation mark (")
Vertical pipe (|)
You can also specify an applicationID in the eaccmd tool, which is described in Chapter 12. If eaccmd specifies a different applicationID for the same application, it overrides the one provided in the provisioning file.
Defining hosts
In the <hosts> element you list each <host> by a host ID, a host name, a port number, and (optionally) properties and directories. The <host> syntax is as follows:
<host host-id="host1" host-name="localhost" port="8888">
  <properties>
    <property name="department" value="engineering" />
    <property name="department" value="prof services" />
    <property name="enforceDiskQuota" />
  </properties>
</host>
In this example the port is the HTTP port through which the EAC Central Server communicates with its Agents. The optional use of host-id to alias host definitions is explained in the following section. The optional addition of properties is described on page 154. The optional addition of directories is described on page 153.
Later, when defining components, you could simply refer to that host-id when specifying the host for a given component.
<dgidx name="dgidx-0" host-id="host1">
It allows you to switch staging and production machines easily, by changing the name and port associated with a host-id alias.
It makes it possible to reference a single physical host through different host-id aliases.
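As a sketch of the aliasing described above, the following fragment defines two host-id aliases that point at the same physical machine. The alias names (staging_indexer, staging_engine) and the host name stage01 are hypothetical; only the host-id, host-name, and port attributes come from the host syntax shown earlier.

```xml
<!-- Hypothetical aliases: both resolve to the same physical host. -->
<hosts>
  <host host-id="staging_indexer" host-name="stage01" port="8888" />
  <host host-id="staging_engine" host-name="stage01" port="8888" />
</hosts>
```

To move the work behind one alias to another machine, you would change only the host-name (and, if necessary, the port) on that alias. Components that refer to the alias by host-id would not need to be edited.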
Notes:
The order of elements in a component does not matter.
Unless otherwise noted, relative paths are supported.
Required elements are labelled as such. If you attempt to provision a component without a required element, you will receive an error.
Subsequently, when defining a Forge component, rather than having to enter the host machine and working directory like this:
<forge component-id="forge1" host-id="host1">
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data\data\</working-dir>
  ...
</forge>
baseline update script that will execute the start component commands in the proper sequence. You can reuse this script as often as you like. Scripts live on the EAC Central Server; the EAC runs them from there. You can use scripts with the eaccmd tool, when accessing the Endeca WSDL programmatically, or within Web Studio. Details on starting, stopping, and obtaining status for scripts for each of these environments can be found in the following places:
Component and script control commands, located on page 201 of Using the Eaccmd Tool
The ScriptControl interface, located on page 238 in the Endeca Application Controller API Interface Reference
In the Web Studio Help
Note: Scripts are not supported on clusters that are not uniformly one platform.
A baseline update script that runs a very simple (Forge/Index/Dgraph) baseline update.
An MDEX Engine update script that pushes configuration changes to the MDEX Engine.
A report generation script that can run daily or weekly reports. This script is discussed in detail on page 109.
For reference, the script source tree will be installed as part of the Endeca reference implementation. Compiled scripts reside in $ENDECA_ROOT/bin, with any dependent jar files in $ENDECA_ROOT/lib/java.
EAC_HOST is the hostname for the EAC Central Server host.
EAC_PORT is the port number for the EAC Central Server host.
EAC_APP is the application in which this script is provisioned.
Provisioning scripts
Scripts, like hosts and components, need to be provisioned before they can be used in the Application Controller. Scripts can be provisioned with the following elements:

Sub-element: script-id
Description: Required. The name of this script.

Sub-element: cmd
Description: Required. The command to launch the script.

Sub-element: log-file
Description: Name of the script log file. If log-file is not specified, the default value is used.

Sub-element: working-dir
Description: Working directory for the process that is launched. If it is specified, it must be an absolute path. If working-dir is not specified, the default value of $ENDECA_CONF/working/(app_id)/ is used.
Example
This example provisions two scripts.
<scripts>
  <script script-id="script1">
    <cmd>runthis.sh</cmd>
  </script>
  <script script-id="script2">
    <cmd>run.sh --this</cmd>
  </script>
</scripts>
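The optional log-file and working-dir elements listed for scripts can be added to a script definition in the same way. The following is a minimal sketch only: the script name, command, and paths are hypothetical, and the placement of log-file and working-dir as child elements after cmd is an assumption based on the element list for scripts.

```xml
<scripts>
  <!-- Hypothetical script: all names and paths are illustrative only. -->
  <script script-id="nightly_baseline">
    <cmd>run_baseline.sh</cmd>
    <!-- Optional: overrides the default script log file. -->
    <log-file>/var/endeca/logs/nightly_baseline.log</log-file>
    <!-- Optional: must be an absolute path when specified. -->
    <working-dir>/var/endeca/scripts</working-dir>
  </script>
</scripts>
```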
Component reference
This section provides details and examples for the following components:
Forge on page 158
Dgidx on page 161
Dgraph on page 165
Agidx on page 168
Agraph on page 170
Crawler on page 174
LogServer on page 178
ReportGenerator on page 179
Note: In the components that follow, if input-dir, output-dir, or state-dir are not specified, they default to directories named input, output, and state respectively, underneath the component's working-dir.
Forge
A Forge element launches the Forge (Data Foundry) software, which transforms source data into tagged Endeca records.
Attributes
Every Application Controller component contains the following attributes:

Attribute: component-id
Description: Required. The name of this instance of the component.

Attribute: host-id
Description: Required. The alias of the host upon which the component is running.

Attribute: properties
Description: An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.
Sub-elements
The Forge element contains the following sub-elements:

Sub-element: args
Description: Command-line flags to pass to Forge, expressed as a set of arg sub-elements. If an argument takes a value, the argument and value must be on separate lines in the provisioning file. For example:
<args>
  <arg>--threads</arg>
  <arg>3</arg>
</args>

Sub-element: input-dir
Description: The path to the Forge input.

Sub-element: log-file
Description: Name of the Forge log file. If the log-file is not specified, the default is the component's working directory plus the component name plus .log.

Sub-element: output-prefix-name
Description: The implementation-specific prefix name, without any associated path information.

Sub-element: output-dir
Description: Directory where the output from the Forge process will be stored.

Sub-element: pipeline-file
Description: Required. Name of the Pipeline.epx file to pass to Forge.

Sub-element: num-partitions
Description: The number of partitions.

Sub-element: working-dir
Description: Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.

Sub-element: state-dir
Description: The directory where the state file is located.

Sub-element: temp-dir
Description: The temporary directory that Forge uses.

Sub-element: web-service-port
Description: The port on which the Forge metrics Web service listens.

Sub-element: ssl-configuration
Description: Both the parallel Forge and Forge metrics Web service can secure their communications with SSL. The ssl-configuration element contains three sub-elements of its own:

cert-file
The cert-file specifies the path of the eneCert.pem certificate file that Forge processes present to any client. This is also the certificate that the Application Controller Agent should present to Forge when trying to talk to it. The file name can be a path relative to the component's working directory.

ca-file
The ca-file specifies the path of the eneCA.pem Certificate Authority file that Forge processes use to authenticate communications with other Endeca components. The file name can be a path relative to the component's working directory.

cipher
The cipher is an optional cipher string (such as RC4-SHA) that specifies the minimum cryptographic algorithm that parallel Forge processes use during the SSL negotiation. If you omit this setting, the SSL software tries an internal list of ciphers, beginning with AES256-SHA.
Note: The Forge metrics Web service does not use the cipher sub-element.
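The ssl-configuration description above can be illustrated with a short sketch. The component-id, host-id, and certificate paths below are hypothetical, and the nesting of cert-file, ca-file, and cipher as child elements of ssl-configuration is inferred from the description rather than copied from a shipped provisioning file.

```xml
<!-- Sketch only: the component-id, host-id, and paths are hypothetical. -->
<forge component-id="secure_forge" host-id="host1">
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <pipeline-file>.\data\forge_input\pipeline.epx</pipeline-file>
  <ssl-configuration>
    <!-- Certificate that Forge presents to clients; relative to working-dir. -->
    <cert-file>.\etc\eneCert.pem</cert-file>
    <!-- Certificate Authority file used to authenticate other Endeca components. -->
    <ca-file>.\etc\eneCA.pem</ca-file>
    <!-- Optional; not used by the Forge metrics Web service. -->
    <cipher>RC4-SHA</cipher>
  </ssl-configuration>
</forge>
```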
Example
The following example provisions a Forge component for use with the sample wine data:
<forge component-id="wine_forge" host-id="wine_indexer">
  <args>
    <arg>-vw</arg>
  </args>
  <num-partitions>1</num-partitions>
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <pipeline-file>.\data\forge_input\pipeline.epx</pipeline-file>
  <input-dir>.\data\forge_input</input-dir>
  <output-dir>.\data\partition0\forge_output</output-dir>
  <state-dir>.\data\partition0\state</state-dir>
  <log-file>.\logs\wine_forge.log</log-file>
  <output-prefix-name>wine</output-prefix-name>
</forge>
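As noted at the start of the component reference, input-dir, output-dir, and state-dir default to directories under the component's working-dir when they are omitted. The following is a minimal sketch that relies on those defaults. The component-id is hypothetical, and the example assumes that the pipeline file has been placed in the default input directory.

```xml
<forge component-id="wine_forge_defaults" host-id="wine_indexer">
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <!-- Required. Relative paths are interpreted relative to working-dir. -->
  <pipeline-file>.\input\pipeline.epx</pipeline-file>
  <!-- input-dir, output-dir, and state-dir are omitted, so they default to
       the input, output, and state directories under working-dir. -->
</forge>
```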
Dgidx
A Dgidx component sends the finished data prepared by Forge to the Dgidx program, which generates the proprietary indices for each Dgraph.
Attributes
Every Application Controller component contains the following attributes:

Attribute: component-id
Description: Required. The name of this instance of the component.

Attribute: host-id
Description: Required. The alias of the host upon which the component is running.

Attribute: properties
Description: An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.
Sub-elements
The Dgidx element contains the following sub-elements:

Sub-element: args
Description: Command-line flags to pass to Dgidx, expressed as a set of arg sub-elements. If an argument takes a value, the argument and value must be on separate lines in the provisioning file. For example:
<args>
  <arg>--threads</arg>
  <arg>3</arg>
</args>

Sub-element: app-config-prefix
Description: Path and file prefix that define the input for Dgidx. For example, in /endeca/project/files/myProject, files beginning with myProject in the directory /endeca/project/files are the ones to be considered.

Sub-element: output-prefix
Description: Required. Path and prefix name for the Dgidx output. For example, output_prefix = c:\temp\wine generates files that start with wine in c:\temp.

Sub-element: log-file
Description: The path to and name of the Dgidx log files. If the log-file is not specified, the default is the component's working directory plus the component name plus .log. Dgidx can generate three distinct log files: the basic component log file, and two files that log the subtasks described in run-aspell, below.

Sub-element: working-dir

Sub-element: run-aspell
Description: Specifies Aspell as the spelling correction mode for the implementation. This causes the Dgidx component to run dgwordlist and to copy the Aspell files to its output directory, where the Dgraph component can access them. The default is true. See log-file above for details on the logging of these subtasks. For Aspell details, see the Using Spelling Correction and Did You Mean section in the Endeca Developer's Guide.

Sub-element: temp-dir
Example
The following example provisions a Dgidx component to work with the sample wine data:
<dgidx component-id="wine_dgidx" host-id="wine_indexer">
  <args>
    <arg>-v</arg>
  </args>
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <input-prefix>.\data\partition0\forge_output\wine</input-prefix>
  <app-config-prefix>.\data\partition0\forge_output\wine</app-config-prefix>
  <output-prefix>.\data\partition0\dgidx_output\wine</output-prefix>
  <log-file>.\logs\wine_dgidx.log</log-file>
  <run-aspell>true</run-aspell>
</dgidx>
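The rule that a flag and its value go in separate arg sub-elements can be illustrated with a short Python sketch. The helper name build_args is hypothetical and is not part of any Endeca tool; it only shows how such an args fragment might be generated when provisioning files are produced by a script.

```python
import xml.etree.ElementTree as ET


def build_args(tokens):
    """Return an <args> element with one <arg> child per token.

    build_args is a hypothetical helper, not an Endeca API. A flag and
    its value are passed as separate tokens, so each one ends up in its
    own <arg> element.
    """
    args = ET.Element("args")
    for token in tokens:
        ET.SubElement(args, "arg").text = str(token)
    return args


# The --threads flag and its value 3 become two separate <arg> elements.
args_xml = ET.tostring(build_args(["--threads", 3]), encoding="unicode")
print(args_xml)
```

Passing the flag and its value as one token (for example "--threads 3") would put both in a single arg element, which is the layout the args description warns against.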
Dgraph
A Dgraph element launches the Dgraph (MDEX Engine) software, which processes queries against the indexed Endeca records.
Attributes
Every Application Controller component contains the following attributes:

Attribute       Description
component-id    Required. The name of this instance of the component.
host-id         Required. The alias of the host upon which the component is running.
properties      An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.
Sub-elements
The Dgraph element contains the following sub-elements:

Sub-element          Description
args                 Command-line flags to pass to Dgraph, expressed as a set of arg sub-elements. If an argument takes a value, the argument and value must be on separate lines in the provisioning file. For example:
                     <args>
                       <arg>--threads</arg>
                       <arg>3</arg>
                     </args>
port                 Required. The port at which the Dgraph should listen. The default is 8000.
log-file             The path to and name of the Dgraph log file. If the log-file is not specified, the default is component working directory plus component name plus .log.
input-prefix         Required. Path and prefix name for the Dgidx output that the Dgraph uses as an input.
app-config-prefix    Path and file prefix that define the input for the Dgraph. For example, in /endeca/project/files/myProject, files beginning with myProject in the directory /endeca/project/files are the ones to be considered.
working-dir          Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.
startup-timeout      Specifies the amount of time in seconds that the Application Controller waits while starting the Dgraph. If it cannot determine that the Dgraph is running in this timeframe, it times out. The default is 60.
req-log-file         Path to and name of the request log.
spell-dir            If specified, the directory in which the Dgraph looks for Aspell files. If it is not specified, the Dgraph looks for Aspell files in the Dgraph's input directory (that is, input-prefix without the prefix). For example, if input-prefix is /dir/prefix and all the Dgraph input files are /dir/prefix.*, the Dgraph looks for the Aspell files in /dir/.
update-dir           Specifies the directory from which the Dgraph reads partial update files. For more information, see the "Implementing Partial Updates" section in the Endeca Information Transformation Layer Guide.
update-log-file      Specifies the file for update-related log messages.
temp-dir             A temporary directory used by this component.
ssl-configuration    Contains three sub-elements of its own:
  cert-file          The cert-file specifies the path of the eneCert.pem certificate file that the Dgraph processes present to any client. This is also the certificate that the Application Controller Agent should present to the Dgraph when trying to talk to the Dgraph. The file name can be a path relative to the component's working directory.
  ca-file            The ca-file specifies the path of the eneCA.pem Certificate Authority file that the Dgraph processes use to authenticate communications with other Endeca components. The file name can be a path relative to the component's working directory.
  cipher             The cipher is an optional cipher string (such as RC4-SHA) that specifies the minimum cryptographic algorithm that the Dgraph processes use during the SSL negotiation. If you omit this setting, the SSL software tries an internal list of ciphers, beginning with AES256-SHA. See the Endeca Security Guide for more information.
Example
The following example provisions an SSL-enabled Dgraph component for use with the sample wine data:
<dgraph component-id="wine_dgraph" host-id="wine_indexer">
  <args>
    <arg>--spl</arg>
    <arg>--dym</arg>
  </args>
  <port>8000</port>
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <input-prefix>.\data\partition0\dgraph_input\wine</input-prefix>
  <app-config-prefix>.\data\partition0\dgraph_input\wine</app-config-prefix>
  <log-file>.\logs\wine_dgraph.log</log-file>
  <req-log-file>.\logs\wine_dgraph_req_log.out</req-log-file>
  <startup-timeout>120</startup-timeout>
  <ssl-configuration>
    <cert-file>C:\Endeca\MDEXEngine\workspace\etc\eneCert.pem</cert-file>
    <ca-file>C:\Endeca\MDEXEngine\workspace\etc\eneCA.pem</ca-file>
    <cipher>AES128-SHA</cipher>
  </ssl-configuration>
</dgraph>
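The working-dir description states that relative paths in a component are interpreted relative to the working directory. The following Python sketch illustrates that interpretation for a Windows-style fragment like the example above. It is only an illustration: the function name resolve_paths is hypothetical, and the Application Controller performs its own resolution internally.

```python
import ntpath
import xml.etree.ElementTree as ET

# A trimmed copy of the Dgraph example, for illustration only.
DGRAPH_XML = r"""
<dgraph component-id="wine_dgraph" host-id="wine_indexer">
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <input-prefix>.\data\partition0\dgraph_input\wine</input-prefix>
  <log-file>.\logs\wine_dgraph.log</log-file>
</dgraph>
"""


def resolve_paths(component, names):
    """Resolve the named sub-elements against the component's working-dir.

    Hypothetical helper. It assumes working-dir is present and uses
    Windows path rules (ntpath); absolute values are returned unchanged.
    """
    working_dir = component.findtext("working-dir")
    if working_dir is None:
        raise ValueError("this sketch requires an explicit working-dir")
    resolved = {}
    for name in names:
        value = component.findtext(name)
        if value is not None:
            joined = ntpath.join(working_dir.strip(), value.strip())
            resolved[name] = ntpath.normpath(joined)
    return resolved


paths = resolve_paths(ET.fromstring(DGRAPH_XML), ["input-prefix", "log-file"])
for name, path in paths.items():
    print(name, "->", path)
```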
Agidx
An Agidx component runs Agidx on a machine, creating a set of Agidx indices that support the Agraph program in a distributed environment. The Agidx component is used only in distributed environments and is run sequentially on multiple machines. On the first machine, the Agidx component takes the Dgidx output from that machine as its input. On the next machine, the output from the first Agidx run is copied over, using the Copy service. It, along with the Dgidx output from that machine, is used as Agidx input.
Attributes
Every Application Controller component contains the following attributes:

Attribute       Description
component-id    Required. The name of this instance of the component.
host-id         Required. The alias of the host upon which the component is running.
properties      An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.
Sub-elements
The Agidx element contains the following sub-elements:

Sub-element                     Description
args                            Command-line flags to pass to Agidx, expressed as a set of arg sub-elements. If an argument takes a value, the argument and value must be on separate lines in the provisioning file. For example:
                                <args>
                                  <arg>--threads</arg>
                                  <arg>3</arg>
                                </args>
output-prefix                   Required. Path and prefix name for the Agidx output. For example, output_prefix = c:\temp\wine generates files that start with wine in c:\temp.
log-file                        The path to and name of the Agidx log file. If the log-file is not specified, the default is component working directory plus component name plus .log.
input-prefixes                  Required. The path to the output of various Dgidxes, which Agidx uses as input. These are listed as a set of input-prefix sub-elements.
working-dir                     Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.
previous-agidx-output-prefix    The file prefix of the Agidx data from the previous run, which has been copied to this machine by a Copy operation. This parameter should not be used when running the Agidx component on the first data subset.
Example
The following example provisions an Agidx component to work with the sample wine data:
<agidx component-id="mkt_agidx" host-id="host2">
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <args>
    <arg>-v</arg>
  </args>
  <input-prefixes>
    <input-prefix>C:\Endeca\MDEXEngine\reference\sample_wine_data\data\partition0\dgidx_output1\wine</input-prefix>
    <input-prefix>C:\Endeca\MDEXEngine\reference\sample_wine_data\data\partition0\dgidx_output2\wine</input-prefix>
  </input-prefixes>
  <output-prefix>C:\Endeca\MDEXEngine\reference\sample_wine_data\data\partition0\agidx\wine</output-prefix>
  <log-file>C:\Endeca\MDEXEngine\workspace\logs\agidx.out</log-file>
</agidx>
Agraph
An Agraph component runs the Agraph program, which defines and coordinates the activities of multiple, distributed Dgraphs.
Attributes
Every Application Controller component contains the following attributes:

Attribute       Description
component-id    Required. The name of this instance of the component.
host-id         Required. The alias of the host upon which the component is running.
properties      An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.
Sub-elements
The Agraph component contains the following sub-elements:

Sub-element          Description
args                 Command-line flags to pass to Agraph, expressed as a set of arg sub-elements. If an argument takes a value, the argument and value must be on separate lines in the provisioning file. For example:
                     <args>
                       <arg>--threads</arg>
                       <arg>3</arg>
                     </args>
port                 Required. The port at which the Agraph should listen.
log-file             The path to and name of the Agraph log file. If the log-file is not specified, the default is component working directory plus component name plus .log.
children             Required. A list of the child Dgraphs and related devices for this Agraph. children is a single element that can contain a mixture of dgraph-ref and host-port elements.
working-dir
startup-timeout      Specifies the amount of time in seconds that the Application Controller will wait while starting the Agraph. If it cannot determine that the Agraph is running in this timeframe, it times out. The default is 60.
req-log-file         Path to and name of the request log.
ssl-configuration    Contains three sub-elements of its own:
  cert-file          The cert-file specifies the path of the eneCert.pem certificate file that the Agraph processes present to any client. This is also the certificate that the Application Controller Agent should present to the Agraph when trying to talk to the Agraph. The file name can be a path relative to the component's working directory.
  ca-file            The ca-file specifies the path of the eneCA.pem Certificate Authority file that the Agraph processes use to authenticate communications with other Endeca components. The file name can be a path relative to the component's working directory.
  cipher             The cipher is an optional cipher string (such as RC4-SHA) that specifies the minimum cryptographic algorithm that the Agraph processes use during the SSL negotiation. If you omit this setting, the SSL software tries an internal list of ciphers, beginning with AES256-SHA. See the Endeca Security Guide for more information.
Example
The following example provisions a non-SSL Agraph component to work with the sample wine data:
<agraph component-id="mkt_agraph-3" host-id="host2">
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <args/>
  <port>10020</port>
  <app-config-prefix>C:\Endeca\MDEXEngine\reference\sample_wine_data\data\forge_input\wine</app-config-prefix>
  <log-file>C:\Endeca\MDEXEngine\workspace\logs\agraph3.out</log-file>
  <req-log-file>C:\Endeca\MDEXEngine\workspace\logs\agraph_requests3.out</req-log-file>
  <children>
    <dgraph-ref component-id="dgraph-0"/>
    <!-- <dgraph-ref component-id="dgraph-1"/> -->
    <host-port host-name="localhost" port="9900"/>
    <!-- <host-port host-name="localhost" port="9901"/> -->
  </children>
  <input-prefix>C:\Endeca\MDEXEngine\reference\sample_wine_data\data\partition0\agraph-3\wine</input-prefix>
  <startup-timeout>120</startup-timeout>
</agraph>
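Because children can mix dgraph-ref and host-port elements, a script that inspects a provisioning file has to handle both forms. The following Python sketch lists the children of a fragment like the one above. The function name list_children and the tuple layout it returns are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The children element from the example, without the commented-out entries.
CHILDREN_XML = """
<children>
  <dgraph-ref component-id="dgraph-0"/>
  <host-port host-name="localhost" port="9900"/>
</children>
"""


def list_children(children):
    """Return (kind, target) pairs for each child of a children element.

    Hypothetical helper: dgraph-ref entries are reported by component-id,
    host-port entries as "host:port".
    """
    result = []
    for child in children:
        if child.tag == "dgraph-ref":
            result.append(("dgraph-ref", child.get("component-id")))
        elif child.tag == "host-port":
            target = "{}:{}".format(child.get("host-name"), child.get("port"))
            result.append(("host-port", target))
        else:
            raise ValueError("unexpected child element: " + child.tag)
    return result


agraph_children = list_children(ET.fromstring(CHILDREN_XML))
print(agraph_children)
```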
Crawler
A Crawler component runs the Endeca Advanced Crawler, which creates Endeca records based on crawled source documents. For more information about the Advanced Crawler, see the Endeca Information Transformation Layer Guide.
Attributes
Every Application Controller component contains the following attributes:

Attribute       Description
component-id    Required. The name of this instance of the component.
host-id         Required. The alias of the host upon which the component is running.
properties      An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.
Sub-elements
The Crawler component contains the following sub-elements:

Sub-element              Description
working-dir              Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.
log-file                 The path to and name of the Crawler log file. If the log-file is not specified, the default is component working directory plus component name plus .log.
args                     Command-line flags to pass to the Crawler, expressed as a set of arg sub-elements. If an argument takes a value, the argument and value must be on separate lines in the provisioning file. For example:
                         <args>
                           <arg>--threads</arg>
                           <arg>3</arg>
                         </args>
default-settings-file    Required. Path to the default settings file for this Crawler component. The file is typically named something like <prefix>.crawler_defaults.properties.
global-config-file       Required. Path to the global configuration file for this Crawler component. The file is typically named something like <prefix>.crawler_global_config.xml.
profile-config-file      Required. Path to the profile configuration file to use for this crawler run. The file is typically named something like crawler_profile_1_config.xml.
url-list-file            Required. Path to the file that contains the list of URLs to crawl. The file is typically named something like crawl_profile_1_url_list.xml.
output-prefix            Required. Path and prefix name for the data the Crawler component stores. For example, output_prefix = c:\temp\wine generates files that start with wine in c:\temp. Also, any downloaded files the crawler stores are in a subdirectory of output_prefix called \crawler_downloaded_files.
port                     Port on which to run the Crawler component. The default is 8099.
java-options             Java Virtual Machine settings. If you are modifying Java source files, you may need to modify these settings, which are passed to the Java process.
classpath-elements       Class path add-ons. If you are modifying Java source files, the modifications may require additions to the class path.
Example
The following example provisions a Crawler component based on the sample wine data.
<crawler component-id="mkt_crawler" host-id="host2">
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <port>9099</port>
  <default-settings-file>C:\Endeca\MDEXEngine\reference\sample_wine_data\wine.crawler_defaults.properties</default-settings-file>
  <global-config-file>C:\Endeca\MDEXEngine\reference\sample_wine_data\wine.crawler_global_config.xml</global-config-file>
  <profile-config-file>C:\Endeca\MDEXEngine\reference\sample_wine_data\crawl_profile_1_config.xml</profile-config-file>
  <url-list-file>C:\Endeca\MDEXEngine\reference\sample_wine_data\crawl_profile_1_url_list.xml</url-list-file>
  <output-prefix>wine</output-prefix>
</crawler>
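Several Crawler sub-elements are marked Required in the table above. As a minimal sketch, a script could check a crawler fragment for them before the file is handed to the Application Controller. The function name missing_required is hypothetical, and the check only mirrors the Required markers in this section; it does not replace the Application Controller's own validation.

```python
import xml.etree.ElementTree as ET

# Sub-elements marked "Required" in the Crawler sub-element table.
REQUIRED_CRAWLER_ELEMENTS = (
    "default-settings-file",
    "global-config-file",
    "profile-config-file",
    "url-list-file",
    "output-prefix",
)


def missing_required(crawler):
    """Return the required sub-elements that are absent or empty.

    Hypothetical helper for illustration only.
    """
    return [
        name
        for name in REQUIRED_CRAWLER_ELEMENTS
        if not (crawler.findtext(name) or "").strip()
    ]


# A deliberately incomplete fragment: url-list-file is left out.
INCOMPLETE_XML = r"""
<crawler component-id="mkt_crawler" host-id="host2">
  <default-settings-file>C:\data\wine.crawler_defaults.properties</default-settings-file>
  <global-config-file>C:\data\wine.crawler_global_config.xml</global-config-file>
  <profile-config-file>C:\data\crawl_profile_1_config.xml</profile-config-file>
  <output-prefix>wine</output-prefix>
</crawler>
"""

missing = missing_required(ET.fromstring(INCOMPLETE_XML))
print(missing)
```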
LogServer
The LogServer component controls the use of the Endeca Log Server.
Attributes
Every Application Controller component contains the following attributes:

Attribute       Description
component-id    Required. The name of this instance of the component.
host-id         Required. The alias of the host upon which the component is running.
properties      An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.
Sub-elements
The LogServer component contains the following sub-elements:

Sub-element        Description
port               Required. Port on which to run the LogServer.
output-prefix      Required. Path and prefix name for the LogServer output. For example, output_prefix = c:\temp\wine generates files that start with wine in c:\temp.
gzip               Required. Controls the archiving of log files. Possible values are true and false.
working-dir        Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.
startup-timeout    Specifies the amount of time in seconds that eaccmd waits while starting the LogServer. If it cannot determine that the LogServer is running in this timeframe, it times out. The default is 60.
log-file           The path to the LogServer log file. If the log-file is not specified, the default is component working directory plus component name plus .log.
Example
The following example provisions a LogServer component based on the sample wine data.
<logserver component-id="wine_logserver" host-id="wine_indexer">
  <port>8002</port>
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <output-prefix>.\logs\logserver_output\wine</output-prefix>
  <gzip>false</gzip>
  <startup-timeout>120</startup-timeout>
  <log-file>.\logs\wine_logserver.log</log-file>
</logserver>
ReportGenerator
The ReportGenerator component runs the Report Generator, which processes Log Server files into HTML-based reports that you can view in your Web browser and XML reports that you can view in Web Studio.
Attributes
Every Application Controller component contains the following attributes:

Attribute       Description
component-id    Required. The name of this instance of the component.
host-id         Required. The alias of the host upon which the component is running.
properties      An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.
Sub-elements
The ReportGenerator component contains the following sub-elements:

Sub-element          Description
working-dir          Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.
input-dir-or-file    Required. Path to the file or directory containing the logs to report on. If it is a directory, then all log files in that directory are read. If it is a file, then just that file is read.
output-file          Required. Name of the generated report file and the path to where it is stored. For example: C:\Endeca\reports\myreport.html on Windows, /endeca/reports/myreport.html on UNIX.
stylesheet-file      Required. Filename and path of the XSL stylesheet used to format the generated report. For example: %ENDECA_CONF%\etc\report_stylesheet.xsl on Windows, $ENDECA_CONF/etc/report_stylesheet.xsl on UNIX.
settings-file        Path to the report_settings.xml file. For example: %ENDECA_CONF%\etc\report_settings.xml on Windows, $ENDECA_CONF/etc/report_settings.xml on UNIX.
timerange            Sets the time span of interest (or report window). Allowed keywords:
time-series          Turns on the generation of time-series data and specifies the frequency, Hourly or Daily.
charts               Turns on the generation of report charts. Disabled by default.
log-file             The path to the ReportGenerator log file. If the log-file is not specified, the default is component working directory plus component name plus .log.
java_binary          Should indicate a JDK 1.5.x or later. Defaults to the JDK that Endeca installs.
java_options         Command-line options for the java_binary setting. This command is primarily used to adjust the ReportGenerator memory, which defaults to 1GB. To set the memory, use the following: java_options = -Xmx[MemoryInMb]m -Xms[MemoryInMb]m
args
Example
The following example provisions a ReportGenerator component based on the sample wine data.
<reportgenerator component-id="wine_gen_html_report" host-id="wine_indexer">
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <input-dir-or-file>.\logs\logserver_output</input-dir-or-file>
  <output-file>.\reports\daily\daily_report.html</output-file>
  <stylesheet-file>.\etc\report_stylesheet.xsl</stylesheet-file>
  <settings-file>.\etc\report_settings.xml</settings-file>
  <timerange>day-so-far</timerange>
  <charts>true</charts>
  <log-file>.\logs\wine_gen_html_report.log</log-file>
</reportgenerator>
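The java_options description gives the memory setting as -Xmx[MemoryInMb]m -Xms[MemoryInMb]m. The following Python sketch simply fills in that pattern for a given size in megabytes. The function name report_java_options is hypothetical, and the sketch assumes the same value is used for both flags, as the pattern suggests.

```python
def report_java_options(memory_mb):
    """Return a java_options value following -Xmx[MemoryInMb]m -Xms[MemoryInMb]m.

    Hypothetical helper; memory_mb is the heap size in megabytes and is
    used for both the maximum (-Xmx) and initial (-Xms) heap size.
    """
    if not isinstance(memory_mb, int) or memory_mb <= 0:
        raise ValueError("memory_mb must be a positive integer")
    return "-Xmx{0}m -Xms{0}m".format(memory_mb)


# Example: 2048 MB instead of the 1GB default mentioned above.
java_options = report_java_options(2048)
print(java_options)
```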
In this scenario, there are three machines: devhost, which serves as the EAC Central Server, and dev555 and dev777, which serve as Agent machines running Forge and Dgraph respectively. The Application Controller is installed identically on each machine. Eaccmd is run on devhost (aliased host_1), using HTTP port 8888. Eaccmd issues commands to the EAC Central Server, which in turn passes them on to Agent machines dev555 (aliased data_proc) and dev777 (aliased dgraph_1) via HTTP. The EAC Central Server machine, devhost, handles all direct communication with the user, while the Agent machines execute application tasks.
Note: You can run eaccmd on any machine, as long as it is pointed at the EAC Central Server.
[Figure: eaccmd has an HTTP connection to the EAC Central Server (devhost, alias host_1), which has HTTP connections to Agent 1 (dev555, alias data_proc, running Forge and Dgidx) and Agent 2 (dev777, alias dgraph_1, running the Dgraph).]
Application Execution
The following steps walk you through multi-machine provisioning and execution using the Application Controller.
1 First, write a provisioning document for the EAC Central Server in which you define all of the components and their corresponding host machines. Save this document as app.xml. (For complete syntax, see page 152.)
2 Run eaccmd on the host_1 machine, using the app.xml provisioning document as follows:
eaccmd devhost:8888 define-app --app myApp --def app.xml
To start the component Forge on machine data_proc, issue this eaccmd command on host_1:
eaccmd devhost:8888 start --app myApp --comp forge
To start the component Dgidx on machine data_proc, issue this eaccmd command on host_1:
eaccmd devhost:8888 start --app myApp --comp dgidx
To start the component Dgraph on machine dgraph_1, issue this eaccmd command on host_1:
eaccmd devhost:8888 start --app myApp --comp dgraph
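If you drive these steps from a script, the command lines above can be assembled programmatically. The following Python sketch only builds the argument lists for the four commands shown in this walkthrough and does not run them. The function names are hypothetical, and the values (devhost:8888, myApp, app.xml, and the component names) are the ones used in the scenario.

```python
def define_app_argv(server, app, def_file):
    """Argument list for: eaccmd <server> define-app --app <app> --def <file>."""
    return ["eaccmd", server, "define-app", "--app", app, "--def", def_file]


def start_component_argv(server, app, comp):
    """Argument list for: eaccmd <server> start --app <app> --comp <comp>."""
    return ["eaccmd", server, "start", "--app", app, "--comp", comp]


SERVER = "devhost:8888"  # EAC Central Server host and port from the scenario
APP = "myApp"

# Define the application first, then start Forge, Dgidx, and the Dgraph in order.
commands = [define_app_argv(SERVER, APP, "app.xml")]
commands += [start_component_argv(SERVER, APP, c) for c in ("forge", "dgidx", "dgraph")]

for argv in commands:
    print(" ".join(argv))
```

Each list could be passed to a process launcher on a machine where eaccmd is installed; that step is left out here so the sketch runs without an Endeca installation.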
In a WSDL tool, this behavior is controlled by the forceRemove property on the RemoveApplicationType class. For details, see page 261.
Incremental provisioning
With incremental provisioning, it is possible to add, remove, or modify one or more hosts, components, or scripts without having to bring down the entire implementation.
You can perform incremental provisioning in eaccmd or your custom Web service tool. We use eaccmd in the examples below.
Note: For the WSDL API, see Endeca Application Controller API Interface Reference on page 217.
Scripts can be changed at any time, as long as they are not running.
Properties on either hosts or components can be changed at any time. Properties are described on page 154.
Anything other than a property on a component cannot be changed, nor can a component be removed, if the component is either running or unreachable.
Anything other than a property or a directory on a host cannot be changed, nor can a host be removed, if any components or utilities on it are running, or if the host is unreachable.
You can attempt to override the constraints mentioned above by using the --force flag, which is described on page 187.
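The constraints above can be summarized as a few yes/no checks. The following Python sketch is one reading of those rules for the case where --force is not used. The function and parameter names are hypothetical and do not correspond to any Application Controller API.

```python
def can_change_script(running):
    """Scripts can be changed at any time, as long as they are not running."""
    return not running


def can_change_component(property_only, running, reachable):
    """Properties can always change; other changes (or removal) need a
    component that is neither running nor unreachable."""
    if property_only:
        return True
    return (not running) and reachable


def can_change_host(property_or_directory_only, anything_running, reachable):
    """Properties and directories can always change; other changes (or
    removal) need a reachable host with no running components or utilities."""
    if property_or_directory_only:
        return True
    return (not anything_running) and reachable


# A running component accepts a property change but not a change to
# anything else.
print(can_change_component(property_only=True, running=True, reachable=True))
print(can_change_component(property_only=False, running=True, reachable=True))
```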
information individually, through the --cmd (command), --wd (working directory), and --log-file settings.
would first stop the component forge, if it is running, before updating it.
would first stop any running components or services on host dev777 before removing that host.
For example:
For example:
remove-component --force --app myApp --comp forge
To change the attributes of a previously-defined component in eaccmd, use the following syntax:
update-component [--force] --app app_id [--comp comp_id] --def def_file
For example:
update-component --force --app myApp --def newDgraphProps.xml
For example:
add-host --app myApp --host mktg022 --def myApp.xml
For example:
remove-host --force --app myApp --host dev777
To change the attributes of a previously-defined host in eaccmd, use the following syntax:
update-host [--force] --app app_id [--host host_id] --def def_file
For example:
For example:
add-script --app myApp --script newbaseline.pl --cmd perl
For example:
remove-script --app myApp --script testbaseline.pl
For example:
update-script --app myApp --script newbaseline.pl --def myApp.xml
This template includes functionality required for a Dgraph deployment powered by the EAC and the Java EAC Development Toolkit, including support for baseline and partial index updates and Web Studio integration.
Chapter 12
About eaccmd
When you manage your Endeca implementation with the Endeca Application Controller, you control and monitor its working through the EAC Central Server. You can communicate with the EAC Central Server in two ways:
With the eaccmd command-line tool, as described in this chapter.
Through direct programmatic control with a language that understands Web services. The Application Controller's WSDL API is described in Endeca Application Controller API Interface Reference on page 217.
Running eaccmd
The eaccmd tool is installed by default in C:\Endeca\MDEXEngine\<version>\bin on Windows. On UNIX, it is $ENDECA_ROOT/bin. You run eaccmd within a scripting environment such as Bash or Perl. You can run eaccmd on any machine as long as it is pointing at the EAC Central Server. The eaccmd syntax, which is platform-independent, is described starting on page 194.
Eaccmd feedback
Eaccmd gives no feedback in cases of success (that is, if a component is running or completed or a service is completed). If an operation fails, a FAILED message is printed to the screen. To run eaccmd asynchronously, use the --async flag on the command line after the command, as follows:
eaccmd host:port <cmd> [--async]
Eaccmd usage
The eaccmd usage is as follows:
eaccmd host:eac_port <cmd> [--async] [-verbose]
where settings in square brackets ([ ]) are optional and <cmd> is one of:
[Provisioning commands:]
define-app [--app app_id] [--def def_file]
describe-app --app app_id [--canonical]
remove-app [--force] --app app_id
list-apps

[Incremental Provisioning commands:]
add-component --app app_id [--comp comp_id] --def def_file
add-host --app app_id [--host host_id] --def def_file
add-script --app app_id --script script_id (--def def_file | [--wd working_dir] [--log-file log_file] --cmd command [args...])
remove-component [--force] --app app_id --comp comp_id
remove-host [--force] --app app_id --host host_id
remove-script --app app_id --script script_id
update-component [--force] --app app_id [--comp comp_id] --def def_file
update-host [--force] --app app_id [--host host_id] --def def_file
update-script [--force] --app app_id --script script_id (--def def_file | [--wd working_dir] [--log-file log_file] --cmd command [args...])

[Synchronization commands:]
set-flag --app app_id --flag flag
remove-flag --app app_id --flag flag
remove-all-flags --app app_id
list-flags --app app_id

[Component and Script Control commands:]
start --app app_id [--comp comp_id | --script script_id]
stop --app app_id [--comp comp_id | --script script_id]
status --app app_id [--comp comp_id | --script script_id]

[Utility commands:]
ls --app app_id --host host_id --pattern file_pattern
start-util --type shell --app app_id [--token token] --host host_id [--wd working_dir] --cmd command [args...]
start-util --type copy --app app_id [--token token] [--recursive] --from host_id --to host_id --src src_path --dest dest_path
start-util --type backup --app app_id [--token token] --host host_id --dir ls [--method <copy|move>] [--backups num_backups]
start-util --type rollback --app app_id [--token token] --host host_id --dir ls
stop-util --app app_id --token token
status-util --app app_id --token token
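In the usage above, add-script takes either --def def_file or a --cmd command (optionally with --wd and --log-file), but not both. The following Python sketch builds an add-script argument list while enforcing that choice. The function name add_script_argv and its parameters are hypothetical; only the flags themselves come from the usage listing.

```python
def add_script_argv(app, script, def_file=None, cmd=None,
                    working_dir=None, log_file=None):
    """Build the add-script arguments from the usage listing.

    Hypothetical helper. Exactly one of def_file or cmd must be given;
    cmd is a list holding the command followed by its arguments.
    """
    if (def_file is None) == (cmd is None):
        raise ValueError("give exactly one of def_file or cmd")
    argv = ["add-script", "--app", app, "--script", script]
    if def_file is not None:
        if working_dir is not None or log_file is not None:
            raise ValueError("--wd and --log-file apply only with --cmd")
        return argv + ["--def", def_file]
    if working_dir is not None:
        argv += ["--wd", working_dir]
    if log_file is not None:
        argv += ["--log-file", log_file]
    return argv + ["--cmd"] + list(cmd)


# Mirrors the add-script example given earlier in this chapter.
script_argv = add_script_argv("myApp", "newbaseline.pl", cmd=["perl"])
print(" ".join(script_argv))
```

The same shape applies to update-script, which additionally accepts the optional --force flag.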
Provisioning commands
The provisioning commands make it possible for you to define and manage your applications from the command line.

Command: define-app [--app app_id] [--def def_file]
Description: Defines an application. Def_file takes an XML provisioning file. A sample, sample_wine_definition.xml, is located in the %ENDECA_REFERENCE_DIR%\sample_wine_data\etc directory on Windows, or the $ENDECA_REFERENCE_DIR/sample_wine_data/etc directory on UNIX. The provisioning file typically contains an application ID. If eaccmd specifies a different app_id for the same application, the eaccmd version overrides the one provided in the provisioning file.

Command: describe-app --app app_id [--canonical]
Description: Describes an application. Returns an XML file in the format used by the def_file setting of define-app. If --canonical is specified, all paths are canonicalized, as described on page 157.

Command: remove-app [--force] --app app_id
Description: Removes the named application. The optional --force flag indicates whether or not this remove operation should force any running components or services to stop before attempting the remove. Remove fails if any components or services are still running (that is, not forced to stop).

Command: list-apps
Description: Lists all defined applications.
Provisioning example
The following example defines an application called my_wine. (In this and all examples that follow we assume that the host and port are set in the eaccmd.properties file and so do not need to be included on the command line.)
eaccmd define-app --app my_wine --def sample_wine_definition.xml
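When eaccmd is driven from a script, the command line shown above can be assembled programmatically. The following Python sketch only builds the argument list; the helper name build_eaccmd_args is hypothetical and is not part of the Endeca distribution. It assumes that the resulting list would later be handed to a process launcher such as subprocess.run on a machine where eaccmd is on the PATH.

```python
def build_eaccmd_args(command, options, host_port=None):
    """Assemble an eaccmd argument list (hypothetical helper, not an Endeca API).

    host_port may be omitted when the host and port are set in
    eaccmd.properties, as assumed in the examples in this section.
    """
    args = ["eaccmd"]
    if host_port is not None:
        args.append(host_port)
    args.append(command)
    for flag, value in options:
        args.append("--" + flag)
        if value is not None:  # flags such as --force take no value
            args.append(value)
    return args


define_app = build_eaccmd_args(
    "define-app",
    [("app", "my_wine"), ("def", "sample_wine_definition.xml")],
)
print(" ".join(define_app))
```

Keeping the arguments as a list rather than a single string avoids quoting problems when a value contains spaces.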
Incremental provisioning commands

Command: add-component --app app_id [--comp comp_id] --def def_file
Description: Adds a single component to an application. Def_file is a provisioning document. You can use a larger provisioning file for this purpose, or you can use one that specifies exactly one component or host. If you choose to use a larger provisioning file, you must specify which component listed within it you are adding, using the --comp flag.

Command: add-host --app app_id [--host host_id] --def def_file
Description: Adds a single host to an application. Def_file is a provisioning document. You can use a larger provisioning file for this purpose, or you can use one that specifies exactly one component or host. If you choose to use a larger provisioning file, you must specify which host listed within it you are adding, using the --host flag.
Command: add-script --app app_id --script script_id (--def def_file | [--wd working_dir] [--log-file log_file] --cmd command [args...])
Description: Adds a script to an application. Scripts can be added at any time. You can use --def to specify a definition file to start the script, or use the following settings:
--log-file is the file for appended stdout/stderr output. If it is not specified, it defaults to $ENDECA_CONF/logs/script/(app_id).(script_id).log
--wd is the working directory. If it is not specified, it defaults to $ENDECA_CONF/working/(app_id)/
--cmd is the command that is used to start the script. If --cmd is omitted, the first unrecognized argument is taken as the start of your command.
Note: The --log-file and --wd settings, if used, should come before --cmd.

Command: remove-component [--force] --app app_id --comp comp_id
Description: Removes a single component from an application. The optional --force flag indicates whether or not this remove operation should force any running components or services to stop before attempting the remove. Remove fails if any components or services are still running (that is, not forced to stop).

Command: remove-host [--force] --app app_id --host host_id
Description: Removes a single host from an application. The optional --force flag indicates whether or not this remove operation should force any running components or services to stop before attempting the remove. Remove fails if any components or services are still running (that is, not forced to stop).

Command: remove-script --app app_id --script script_id
Description: Removes a script from an application. The optional --force flag indicates whether or not this remove operation should force a running script to stop before attempting the remove.
Command: update-component [--force] --app app_id [--comp comp_id] --def def_file
Description: Updates a component. Component properties can be updated at any time. Other changes cannot be made if the component is running or unreachable. The optional --force flag indicates that the Application Controller will attempt to force the conditions under which the specified updates can be made (by stopping a running component or utility invocation, for example). Regardless of whether or not the forced stop is successful, however, the update persists in the application provisioning, even if this leaves a dangling process somewhere.

Command: update-host [--force] --app app_id [--host host_id] --def def_file
Description: Updates a host. Host properties can be updated at any time. Other changes cannot be made if any components or services are running on the host, or if the host is unreachable. The optional --force flag indicates that the Application Controller will attempt to force the conditions under which the specified updates can be made (by stopping a running component or utility invocation, for example). Regardless of whether or not the forced stop is successful, however, the update persists in the application provisioning, even if this leaves a dangling process somewhere.
Command: update-script [--force] --app app_id --script script_id (--def def_file | [--wd working_dir] [--log-file log_file] --cmd command [args...])
Description: Updates a script. The optional --force flag indicates whether or not this update operation should force a running script to stop before attempting the update. You can use --def to specify a definition file to update the script, or use the following settings:
--wd is the working directory. If it is not specified, it defaults to $ENDECA_CONF/working/(app_id)/
--log-file is the file for appended stdout/stderr output. If it is not specified, it defaults to $ENDECA_CONF/logs/script/(app_id).(script_id).log
--cmd is the command that is used to start the script. If --cmd is omitted, the first unrecognized argument is taken as the start of your command.
Note: The --log-file and --wd settings, if used, should come before --cmd.
Note: For more information about incremental provisioning, see page 185.
Synchronization commands
Synchronization commands are used by the Synchronization service (described below) to manage application-level flags that let users know when processes are in use.

Command: set-flag --app app_id --flag flag
Description: Sets a flag that indicates that a group of processes is in use. You specify the flag with the application name and a flag name, which may be arbitrary but should be well known.

Command: remove-flag --app app_id --flag flag
Description: Removes the named flag and releases the reserved processes.

Command: remove-all-flags --app app_id
Description: Removes all flags in an application and releases all reserved processes.

Command: list-flags --app app_id
Description: Lists all flags in an application.
Synchronization examples
The following example adds a flag called mkt1010 to the my_wine application:
eaccmd set-flag --app my_wine --flag mkt1010
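Component and script control commands

Command: start --app app_id [--comp comp_id | --script script_id]
Description: Starts a component or a script.

Command: stop --app app_id [--comp comp_id | --script script_id]
Description: Stops a component or a script.

Command: status --app app_id [--comp comp_id | --script script_id]
Description: Gets the status of a component (one of Starting, Running, NotRunning, or Failed) or a script (one of Running, NotRunning, or Failed).
Note: For information on changing the verbosity of the status message with the --verbose flag, see page 192.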
Utility commands
The utility commands allow you to run and monitor Application Controller utilities through the eaccmd tool.
Utility naming
Be sure to name your utilities carefully. If you create a new utility that has the same name as a running utility, an error is issued. However, if there is an existing utility with the same name that is not running, the new utility overwrites it.
Command: ls --app app_id --host host_id --pattern file_pattern
Description: Returns a list of files matching the pattern given in file_pattern.
Wildcard behavior
The List Directory Contents command expands the wildcards in a pattern. If the expansion results in a file, it returns a file. If the expansion results in a directory, it returns the directory non-recursively. Wildcard expansion can result in any combination of files and directories. For example, assume that the following directories and files exist:
/home/endeca/reference/...
/home/endeca/install.log
/home/e.txt
A file_pattern that matches all three of these paths would list all of these files and directories.
Command: start-util --type shell --app app_id [--token token] --host host_id [--wd working_dir] --cmd command [args...]
Description: Starts a Shell utility with the specified command string. The token is a string. If you do not specify a token, one is generated and returned when you start the utility. The token is used to stop the utility or to get its status. --wd, which is optional, sets the working directory for the process that gets launched. If specified, it must be an absolute path. If --wd is not specified, the setting defaults to %ENDECA_CONF%\working\<appName>\shell on Windows or $ENDECA_CONF/working/<appName>/shell on UNIX. The --cmd arguments are passed in a single string. If --cmd is omitted, the first unrecognized argument is taken as the start of your command.
Command: stop-util --app app_id --token token
Description: Stops a Shell utility. The token is a string, either user-created or generated and returned when you start the utility, that eaccmd prints to screen. The token can be used to stop the utility or to get its status.

Command: status-util --app app_id --token token
Description: Gets the status of a Shell utility. The token is a string, either user-created or generated and returned when you start the utility, that eaccmd prints to screen. The token can be used to stop the utility or to get its status.
Directories are copied first to a temporary directory on the destination machine before being copied one file at a time to the target location. You can configure the location of this temporary directory in the eac.properties file, using the optional setting com.endeca.eac.filetransfer.fileTransferTempDir as follows:
If this setting is defined as an absolute path, the Copy utility uses it.
If it is defined as a relative path, the Copy utility considers it to be relative to %ENDECA_CONF%/state/.
If it is not defined, the Copy utility uses the directory %ENDECA_CONF%/state/file_transfer/.
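The three lookup rules above can be modeled in a few lines. This Python sketch is an illustration of the documented behavior, not the Copy utility's actual code; the function name resolve_transfer_temp_dir is hypothetical, and the sketch assumes UNIX-style paths and takes the value of ENDECA_CONF as a plain argument.

```python
import posixpath


def resolve_transfer_temp_dir(endeca_conf, setting=None):
    """Model of the temporary-directory rules described above (illustrative only).

    endeca_conf: value of the ENDECA_CONF directory (UNIX-style path assumed).
    setting: value of com.endeca.eac.filetransfer.fileTransferTempDir, or None.
    """
    state_dir = posixpath.join(endeca_conf, "state")
    if not setting:
        # Setting not defined: fall back to the default directory.
        return posixpath.join(state_dir, "file_transfer")
    if posixpath.isabs(setting):
        # Absolute path: used as is.
        return setting
    # Relative path: interpreted relative to the state directory.
    return posixpath.join(state_dir, setting)


print(resolve_transfer_temp_dir("/opt/endeca/conf"))
print(resolve_transfer_temp_dir("/opt/endeca/conf", "tmp/copy"))
```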
If the Copy utility tries to copy a file to a location where another file already exists, the utility overwrites the preexisting file.
Enabling SSL
The Copy utility supports both SSL and non-SSL communication, with SSL being off by default. For details on enabling SSL, see the Endeca Security Guide for Java.
Destination directories
In most cases, the destination directory where the copied files are placed has to exist already. However, there are a few exceptions where the destination directory does not have to exist prior to the copy:
Copying just one file to the location of an existing file.
Copying just one file to a new file name in an existing directory.
Copying just one directory to a new directory name in an existing parent directory.

The copy fails in the following cases:
The Copy utility tries to write to a directory it doesn't have permissions to.
There is not enough disk space.
There is no file at the source location.
The wildcard expression matches no files.
There are mismatches between directories and files. For example:
  The Copy utility tries to copy a file to a path where a directory with that name already exists.
  The Copy utility tries to create a directory in the destination and a file with that name already exists.
You cannot use .. to create paths that do not exist. For example, the path /temp/../../a.txt refers to a path that is above the root directory. This is an invalid path that causes the utility to fail.
The copy would result in multiple files being written to the same location. For example, given the following directory structure on the source:
  /trunk/src/a.txt
  /testbranch/src/a.txt
a copy from /t*/src/* to /temp would result in the Copy utility trying to write both a.txt files to the same location in the temp directory.

There is no recovery for copies. Therefore, if the transfer of a large file fails, the entire file must be transferred again. Likewise, if a multi-file transfer fails before completion, you must either re-run the entire transfer or request only those parts that did not transfer.
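The name collision described above (two matched files that would land at the same destination name) can be checked before a copy is requested. The following Python sketch is a hypothetical pre-check, not part of the Copy utility. It assumes that each matched file is written directly into the destination under its own file name, as in the a.txt example, and it uses Python's fnmatch module, whose * also matches directory separators and which accepts bracket wildcards, so its matching is only an approximation of the Copy utility's.

```python
import fnmatch
import posixpath
from collections import defaultdict


def find_collisions(source_paths, src_pattern):
    """Group matched source files by the name they would get at the destination.

    Returns only the names that more than one source file maps to.
    """
    by_name = defaultdict(list)
    for src in source_paths:
        if fnmatch.fnmatchcase(src, src_pattern):
            by_name[posixpath.basename(src)].append(src)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


sources = ["/trunk/src/a.txt", "/testbranch/src/a.txt", "/trunk/src/b.txt"]
print(find_collisions(sources, "/t*/src/*"))
```

An empty result means no two matched files share a file name under the stated assumption.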
On Windows, "C:\*.txt".
On UNIX, "/home/endeca/test/*.txt".
Command: start-util --type copy --app app_id [--token token] [--recursive] --from host_id --to host_id --src file_pattern --dest dest_path
Description: As part of the Copy utility, starts a copy. You identify the hostname, port, and path for both the source and destination directories. If the copy is local, you do not need to specify the host_id.
Note: Keep in mind that you are not necessarily copying to the machine you are running eaccmd on. The hosts you are copying to and from are those you specified in your provisioning file.
--token is a string used to stop the utility or get its status. If you do not specify a token, one is generated and returned when you start the utility.
--recursive, if specified, indicates that the Copy utility recursively copies any directories that match the wildcard. If --recursive is not specified, the Copy utility does not copy directories, even if they match the wildcard. Instead, it creates intermediate directories required to place the copied files at the destination path.
--src is a string representing the file, wildcard, or directory to be copied. A --src must start with an absolute path, such as C:\ or /. A --src can contain . or .. as directory names, and expands * and ? wildcards.
Note: The destination directory must already exist, unless both of the following are true:
The parent of the destination already exists.
You are copying only one thing.
Command: stop-util --app app_id --token token
Description: Stops a Copy utility. The token is a string, either user-created or generated and returned when you start the utility, that eaccmd prints to screen. The token can be used to stop the utility or to get its status.

Command: status-util --app app_id --token token
Description: Gets the status of a Copy utility. The token is a string, either user-created or generated and returned when you start the utility, that eaccmd prints to screen. The token can be used to stop the utility or to get its status.
Backup operations
Backup operations create an archive directory from an existing directory. The archive directory has the same name as the original directory, but with a timestamp appended to the end. The timestamp reflects the time when the backup operation was performed. For example, if the original directory is called logs and was backed up on October 11, 2006 at 8:00 AM, the backup operation creates a directory called logs.2006_10_11.08_00_00.
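The naming convention in this example can be reproduced with a short Python sketch. The function name archive_name is hypothetical, and the zero-padded year_month_day.hour_minute_second layout is inferred from the single example above rather than from a formal specification.

```python
from datetime import datetime


def archive_name(directory, when):
    """Append a timestamp in the style of the example above to a directory name."""
    return directory + when.strftime(".%Y_%m_%d.%H_%M_%S")


# October 11, 2006 at 8:00 AM, as in the example above.
print(archive_name("logs", datetime(2006, 10, 11, 8, 0, 0)))
# prints "logs.2006_10_11.08_00_00"
```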
Command: start-util --type backup --app app_id [--token token] --host host_id --dir ls [--method <copy|move>] [--backups num_backups]
Description: Starts the backup operation. The token is a string. If you do not specify a token, one is generated and returned when you start the utility. The token is used to stop the utility or to get its status. The host and dir settings specify the path to the directory that will be archived. The method is either copy or move (the default). The optional backups setting specifies the maximum number of archives to store. This number does not include the original directory itself, so if backups is set to 3, you would have the original directory plus up to three archive directories, for a total of as many as four directories. The default num_backups is 5.

Command: stop-util --app app_id --token token
Description: Stops a backup operation. The token is a string, either user-created or system-generated when you start the utility. The token can be used to stop the utility or to get its status.

Command: status-util --app app_id --token token
Description: Gets the status of a backup operation. The token is a string, either user-created or system-generated when you start the utility. The token can be used to stop the utility or to get its status.
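The effect of the backups limit can be sketched as follows. This Python example is a model only: the guide states the maximum number of archives that are kept but not which ones are discarded, so the sketch assumes that the oldest archives are removed first. The function name archives_to_prune is hypothetical, and the sketch relies on zero-padded timestamps so that sorting the names also sorts them by age.

```python
def archives_to_prune(archive_names, num_backups=5):
    """Return the archives that exceed the limit, oldest first.

    Assumption: the oldest archives are the ones discarded. All names are
    expected to share one base directory name and use zero-padded timestamps.
    """
    ordered = sorted(archive_names)  # zero-padded timestamps sort chronologically
    excess = len(ordered) - num_backups
    return ordered[:excess] if excess > 0 else []


archives = [
    "logs.2006_10_11.08_00_00",
    "logs.2006_10_09.08_00_00",
    "logs.2006_10_10.08_00_00",
    "logs.2006_10_08.08_00_00",
]
print(archives_to_prune(archives, num_backups=3))
```

The original directory (logs in this example) is not part of the input, which matches the statement that the limit does not count the original directory.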
Rollback operations
Rollback operations roll back the directory to the most recent backed up version. For example, say you have a directory called logs, one called logs.2006_10_11.08_00_00, and other, older versions. When you roll back, the following things happen:
Note: There can only be a single .unwanted directory at a time. If you roll back twice, the .unwanted directory from the first rollback is deleted.
Command: start-util --type rollback --app app_id [--token token] --host host_id --dir ls
Description: Starts the rollback operation. The token is a string. If you do not specify a token, one is generated and returned when you start the utility. The token is used to stop the utility or to get its status. The host and dir settings specify the path to the directory that will be rolled back.

Command: stop-util --app app_id --token token
Description: Stops a rollback operation. The token is a string, either user-created or generated and returned when you start the utility, that eaccmd prints to screen. The token can be used to stop the utility or to get its status.

Command: status-util --app app_id --token token
Description: Gets the status of a rollback operation. The token is a string, either user-created or generated and returned when you start the utility, that eaccmd prints to screen. The token can be used to stop the utility or to get its status.
Chapter 11
Notes:
The Application Controller schema is defined in eac.wsdl, which is located in the $ENDECA_ROOT/lib/services directory on UNIX (C:\Endeca\MDEXEngine\<version>\lib\services on Windows). You generate client stubs (or proxies) using the eac.wsdl file located in the file system provided by the Endeca installation. You cannot generate client stubs using the SOAP Web services addresses associated with each service within the WSDL file.
IDType, TokenType, BackupMethodType, TimeRangeType, and TimeSeriesType can be treated as Strings.
PortNumber can be treated as an Integer.
TimeOut can be treated as a Long.
Administrators Guide Chapter 11: Endeca Application Controller API Interface Reference
ComponentControl interface
The ComponentControl interface provides component management capabilities.
FullyQualifiedComponentIDType parameters:
applicationID identifies the application to use.
componentID identifies the component to use.
Throws:
EACFault is the error message returned by the Application Controller
stopComponent(FullyQualifiedComponentIDType stopComponentInput)
Stops the named component.
FullyQualifiedComponentIDType parameters:
applicationID identifies the application to use.
componentID identifies the component to use.
Throws:
EACFault is the error message returned by the Application Controller
getComponentStatus(FullyQualifiedComponentIDType getComponentStatusInput)
Returns the status of a component.
FullyQualifiedComponentIDType parameters:
applicationID identifies the application to use.
componentID identifies the component to use.
Throws:
EACFault is the error message returned by the Application Controller
Returns:
A BatchStatusType object (for batch components; see page 247) or a StatusType object (for server components; see page 269).
Synchronization interface
The Synchronization interface manages application-level flags that let users know when processes are in use. For example, your code could create a flag named update-running to ensure that a new baseline update does not start while another update is already in progress.
FullyQualifiedFlagIDType parameters:
applicationID identifies the application to use.
flagID is a unique string identifier for this flag.
Throws:
EACFault is the error message returned by the Application Controller
Returns:
A Boolean: false if the flag was already set, or true if it was not set (meaning the method succeeded).
removeFlag(FullyQualifiedFlagIDType removeFlagInput)
Removes the named flag.
FullyQualifiedFlagIDType parameters:
applicationID identifies the application to use.
flagID is a unique string identifier for this flag.
Throws:
EACFault is the error message returned by the Application Controller
removeAllFlags(IDType removeAllFlagsInput)
Removes all flags in an application.
IDType parameter:
applicationID identifies the application to use.
Throws:
EACFault is the error message returned by the Application Controller
listFlags(IDType listFlagsInput)
Lists the collection of flags in an application.
IDType parameter:
applicationID identifies the application to use.
Throws:
EACFault is the error message returned by the Application Controller
Returns:
flagIDList, a string collection of flagIDs.
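The flag semantics described in this section can be illustrated with a small in-memory model. The Python class below is a stand-in written for this illustration only; it is not the Application Controller API, and the class and method names are hypothetical. It assumes that removing a flag that is not set is silently ignored, which the reference above does not specify.

```python
class FlagRegistry:
    """In-memory model of application-level flags (illustrative, not the EAC API)."""

    def __init__(self):
        self._flags = {}  # application ID -> set of flag IDs

    def set_flag(self, application_id, flag_id):
        """Return True if the flag was newly set, False if it was already set."""
        flags = self._flags.setdefault(application_id, set())
        if flag_id in flags:
            return False
        flags.add(flag_id)
        return True

    def remove_flag(self, application_id, flag_id):
        # Assumption: removing an unset flag is a no-op.
        self._flags.get(application_id, set()).discard(flag_id)

    def remove_all_flags(self, application_id):
        self._flags.pop(application_id, None)

    def list_flags(self, application_id):
        return sorted(self._flags.get(application_id, set()))


registry = FlagRegistry()
if registry.set_flag("my_wine", "update-running"):
    # The flag was free, so this caller may start its baseline update.
    print("update started; flags:", registry.list_flags("my_wine"))
    registry.remove_flag("my_wine", "update-running")
```

A second caller that tries to set update-running while it is held receives False and can skip or postpone its own update.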
Utility interface
The Utility interface allows you to manage the Application Controller utilities (Shell, Copy, and Archive) programmatically.
Note: Be sure to name your utilities carefully. If you create a new utility that has the same name as a running utility, an error is issued. However, if there is an existing utility with the same name that is not running, the new utility overwrites it.
RunBackupType parameters:
applicationID identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
hostID is a unique identifier for the host. The hostID and dirName parameters specify the path to the directory that will be archived.
backupMethod is either copy or move.
numBackups specifies the maximum number of archives to store. This number does not include the original directory itself, so if numBackups is set to 3, you would have the original directory plus up to three archive directories, for a total of as many as four directories. The default numBackups is 5.
Throws:
EACFault is the error message returned by the Application Controller
Returns:
The string token assigned to this invocation.
startFileCopy(RunFileCopyType startFileCopyInput)
Launches the Copy utility, which copies files either on a single machine or between machines.
RunFileCopyType parameters:
applicationID identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
fromHostID is a unique identifier for the host from which you are copying.
toHostID is a unique identifier for the host to which you are copying.
sourcePath is a string representing the file, wildcard, or directory to be copied. A sourcePath must start with an absolute path, such as C:\ or /. A sourcePath can contain . or .. as directory names, and expands * and ? wildcards.
  You cannot use the wildcard expressions .*, .?, or ..* as directory or file names.
  Bracket wildcards, such as file[123].txt, are not supported.
  Wildcards cannot be applied to drive names.
destinationPath is the full path to the destination file or directory. destinationPath must be an absolute path, and no wildcards are allowed.
Note: The destination directory must exist, unless the parent of the destination already exists and you are copying only one thing.
recursive, when true, indicates that the Copy utility recursively copies any directories that match the wildcard. If recursive is false, the Copy utility does not copy directories, even if they match the wildcard. Instead, it creates intermediate directories required to place the copied files at the destination path.
Throws:
EACFault is the error message returned by the Application Controller
Returns:
The string token assigned to this invocation.
startRollback(RunRollbackType startRollbackInput)
Rollback operations roll back the directory to the most recent backed up version. For example, say you have a directory called logs, one called logs.2007_1_11.08_00_00, and other, older versions. When you roll back, the following things happen:
Note: There can only be a single .unwanted directory at a time. If you roll back twice, the .unwanted directory from the first rollback is deleted.
IMPORTANT: Do not start a backup or rollback operation while another such operation is in progress on the same directory. Unexpected behavior may occur if you do so.
RunRollbackType parameters:
applicationID identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
hostID is a unique identifier for the host. The hostID and dirName parameters specify the path to the directory that will be rolled back.
Throws:
EACFault is the error message returned by the Application Controller
Returns:
The string token assigned to this invocation.
startShell(RunShellType startShellInput)
The startShell() method launches the Shell utility, which allows you to run arbitrary commands in a host system shell.
RunShellType parameters:
applicationID identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
hostID is a unique identifier for the host.
cmd is the command line to execute.
workingDir is the full path to the working directory.
Throws:
EACFault is the error message returned by the Application Controller
Returns:
The string token assigned to this invocation.
stop(FullyQualifiedUtilityTokenType)
Takes a token returned by any of the start methods, and stops that invocation by terminating the process that is running it.
FullyQualifiedUtilityTokenType parameters:
applicationID identifies the application to use.
token identifies the token used to stop the utility.
Throws:
EACFault is the error message returned by the Application Controller
Parameters:
applicationID identifies the application to use.
token identifies the token used to get the utility's status.
Throws:
EACFault is the error message returned by the Application Controller
Returns:
A BatchStatusType object (see page 247).
listDirectoryContents(ListDirectoryContentsInputType listDirectoryContentsInput)
Performs a list operation similar to UNIX ls on a remote host, with the following restrictions on the input file pattern:
A filePattern must start with an absolute path, such as C:\ or /.
A filePattern can contain . or .. as directory names, and expands * and ? wildcards.
A filePattern cannot contain the wildcard expressions .*, .?, or ..* as directory or file names.
Bracketed wildcards, such as file[123].txt, are not supported.
Wildcards cannot be applied to drive names.
You cannot use .. to create paths that do not exist. For example, the path /temp/../../a.txt refers to a path that is above the root directory. This is an invalid path that causes the operation to fail.
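Most of these restrictions can be checked on the client side before the call is made. The following Python sketch is a hypothetical pre-check, not the validation that the Application Controller itself performs. It handles UNIX-style patterns only, so the drive-name rule is not covered, and it treats any square bracket in the pattern as a bracket wildcard.

```python
def check_file_pattern(pattern):
    """Return a list of problems found in a UNIX-style filePattern (empty if none)."""
    problems = []
    if not pattern.startswith("/"):
        problems.append("not an absolute path")
    if "[" in pattern or "]" in pattern:
        problems.append("bracket wildcards are not supported")
    depth = 0  # number of directory levels below the root
    for part in pattern.split("/"):
        if part in (".*", ".?", "..*"):
            problems.append("forbidden wildcard expression: " + part)
        elif part == "..":
            depth -= 1
            if depth < 0:
                problems.append("path climbs above the root directory")
                break
        elif part not in ("", "."):
            depth += 1
    return problems


for candidate in ("/home/e*", "/temp/../../a.txt", "/home/file[123].txt"):
    print(candidate, check_file_pattern(candidate))
```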
ListDirectoryContentsInputType parameters:
applicationID (required) identifies the application to use.
hostID (required) is a unique identifier for the host.
filePattern (required) is the name of the directory, file, or wildcard combination of directory and file whose contents are to be listed.
Throws:
EACFault is the error message returned by the Application Controller
when the method fails. Failure conditions correspond to bad input cases.
Returns:
A FilePathListType object representing the contents of the requested directory.
Provisioning interface
The Provisioning interface allows you to define and manage your Endeca applications programmatically.
ApplicationType parameters:
applicationID identifies the application to use. hosts is a collection of HostType objects (see page 258), representing
Throws:
EACFault is the error message returned by the Application Controller
provisioning warnings thrown when there are fatal errors during provisioning.
Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.
getApplication(IDType getApplicationInput)
Gets an application, which is composed of hosts, components, and scripts and identified by an application ID.
IDType parameter:
applicationID identifies the application to use.
Throws:
EACFault is the error message returned by the Application Controller
Returns:
An ApplicationType object, as described on page 246.
getCanonicalApplication(IDType getCanonicalApplicationInput)
The getCanonicalApplication() method returns the provisioning just as getApplication() does, but with all paths canonicalized. This process ensures that all paths are absolute, and that the working directory and log path settings are provided. It also prevents .. from being used in a path name.
IDType parameter:
applicationID identifies the application to use.
Throws:
EACFault is the error message returned by the Application Controller
Returns:
An ApplicationType object, as described on page 246.
listApplicationIDs(listApplicationIDsInput)
Lists the applications that are defined.
Returns:
An ApplicationIDListType object, as described on page 245.
Throws:
EACFault is the error message returned by the Application Controller
removeApplication(RemoveApplicationType removeApplicationInput)
Removes the named application.
RemoveApplicationType parameter:
applicationID identifies the application to use.
Throws:
EACFault is the error message returned by the Application Controller
provisioning warnings thrown when there are fatal errors during provisioning.
Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.
addComponent(AddComponentType addComponentInput)
Adds a single component to an application.
AddComponentType parameters:
applicationID identifies the application to use.
component is one of the following:
  Forge (see ForgeComponentType class on page 255)
  Dgidx (see DgidxComponentType class on page 250)
  Dgraph (see DgraphComponentType class on page 251)
  Agidx (see AgidxComponentType class on page 243)
  Agraph (see AgraphComponentType class on page 244)
  Crawler (see CrawlerComponentType class on page 248)
  LogServer (see LogServerComponentType class on page 259)
  ReportGenerator (see ReportGeneratorComponentType class on page 262)
Throws:
EACFault is the error message returned by the Application Controller
provisioning warnings thrown when there are fatal errors during provisioning.
Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.
removeComponent(RemoveComponentType removeComponentInput)
Removes a single component from an application.
RemoveComponentType parameters:
applicationID identifies the application to use.
componentID identifies the component to use.
forceRemove is a Boolean that indicates whether the Application Controller should force the component to stop before attempting the remove. If the component is running, and forceRemove is not set to true, then the remove call will fail.
Throws:
EACFault is the error message returned by the Application Controller
provisioning warnings thrown when there are fatal errors during provisioning.
Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.
updateComponent(UpdateComponentType updateComponentInput)
Updates a running component.
UpdateComponentType parameters:
applicationID identifies the application to use.
component is one of the following:
  Forge (see ForgeComponentType class on page 255)
  Dgidx (see DgidxComponentType class on page 250)
  Dgraph (see DgraphComponentType class on page 251)
  Agidx (see AgidxComponentType class on page 243)
  Agraph (see AgraphComponentType class on page 244)
  Crawler (see CrawlerComponentType class on page 248)
  LogServer (see LogServerComponentType class on page 259)
force the conditions under which the update can take place, by stopping running components.
Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is the list of provisioning errors and warnings thrown when there are fatal errors during provisioning.
Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.
addHost(AddHostType addHostInput)
Adds a host to an application.
AddHostType parameters:
applicationID identifies the application to use.
host is a HostType object (see page 258) specifying the host to add.
directories allows you to specify directories using a full path and a name. These directories are associated with hosts and created when the host is provisioned.
Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is the list of provisioning errors and warnings thrown when there are fatal errors during provisioning.
Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.
updateScript(UpdateScriptType updateScriptInput)
Updates a running script.
UpdateScriptType parameters:
applicationID identifies the application to use.
script is a ScriptType object specifying the script to be updated.
forceUpdate is a Boolean that indicates whether the Application Controller should force a running script to stop before attempting the update.
Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is the list of provisioning errors and warnings thrown when there are fatal errors during provisioning.
Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.
removeHost(RemoveHostType removeHostInput)
Removes a single host from an application.
RemoveHostType parameters:
applicationID identifies the application to use.
hostID is a unique string identifier for this host.
forceRemove is a Boolean that indicates whether or not a remove operation should force any running components or services to stop before attempting the remove. If a component or service is running, and forceRemove is not set to true, then the remove call will fail.
Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is the list of provisioning errors and warnings thrown when there are fatal errors during provisioning.
Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.
updateHost(UpdateHostType updateHostInput)
Updates a running host.
UpdateHostType parameters:
applicationID identifies the application to use.
host is a HostType object (see page 258) specifying the host to update.
directories allows you to specify directories using a full path and a name. These directories are associated with hosts and created when the host is provisioned.
force the conditions under which the update can take place, by stopping running components or services.
Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is the list of provisioning errors and warnings thrown when there are fatal errors during provisioning.
Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.
addScript(AddScriptType addScriptInput)
Adds a script to an application.
AddScriptType parameters:
applicationID identifies the application to use.
script is a ScriptType object (see page 267) specifying the script to add.
Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is the list of provisioning errors and warnings thrown when there are fatal errors during provisioning.
Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.
removeScript(RemoveScriptType removeScriptInput)
Removes a script from an application.
RemoveScriptType parameters:
applicationID identifies the application to use.
scriptID is a unique string identifier for this script.
force the conditions under which the remove can take place.
Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is the list of provisioning errors and warnings thrown when there are fatal errors during provisioning.
Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.
ScriptControl interface
The ScriptControl interface provides programmatic script management capabilities.
FullyQualifiedScriptIDType parameters:
applicationID identifies the application to use.
scriptID identifies the script to use.
Throws:
EACFault is the error message returned by the Application Controller when the method fails.
stopScript(FullyQualifiedScriptIDType stopScriptInput)
Stops the named script.
FullyQualifiedScriptIDType parameters:
applicationID identifies the application to use.
scriptID identifies the script to use.
Throws:
EACFault is the error message returned by the Application Controller when the method fails.
getScriptStatus(FullyQualifiedScriptIDType getScriptStatusInput)
Returns the status of a script.
FullyQualifiedScriptIDType parameters:
applicationID identifies the application to use.
scriptID identifies the script to use.
Throws:
EACFault is the error message returned by the Application Controller when the method fails.
Returns:
A ScriptStatus object (a sub-class of the StatusType class described on page 269). This status may be Running, NotRunning, or Failed. (A Failed status results from a failure error code or from internal EAC errors.)
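A common use of getScriptStatus is to poll until a script is no longer Running. The helper below is a minimal sketch of that loop. The get_status callable is a hypothetical stand-in for a getScriptStatus() call made through your generated client stubs, and the status strings are assumed to match the three values listed above.

```python
# Hypothetical polling helper; get_status stands in for a call to
# getScriptStatus() made through your generated client stubs.
import time


def wait_for_script(get_status, poll_seconds=1.0, max_polls=60):
    """Poll until the script leaves the Running state or max_polls is reached."""
    for _ in range(max_polls):
        status = get_status()
        if status != "Running":
            return status  # "NotRunning" or "Failed"
        time.sleep(poll_seconds)
    return "Running"  # still running after the last poll
```

Bounding the number of polls keeps a hung script from blocking the caller indefinitely; the limits shown are illustrative values, not documented defaults.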
Chapter 12
Typically, a Java WSDL tool translates these classes into get and set methods. For example, the ApplicationIDType class would generate getApplicationID() and setApplicationID(String[] applicationID) methods. The Microsoft .NET WSDL tool translates these classes into .NET properties.
Be sure to check the client stub classes that are generated by your WSDL tool for the exact syntax of the Application Controller API class members.
AddComponentType class
A class that describes a component to be added to a named application during incremental provisioning.
AddComponentType properties
applicationID (required) identifies the application to use.
component (required) is one of the following:
Forge (see ForgeComponentType class on page 255)
Dgidx (see DgidxComponentType class on page 250)
Dgraph (see DgraphComponentType class on page 251)
Agidx (see AgidxComponentType class on page 243)
Agraph (see AgraphComponentType class on page 244)
Crawler (see CrawlerComponentType class on page 248)
LogServer (see LogServerComponentType class on page 259)
ReportGenerator (see ReportGeneratorComponentType class on page 262)
AddHostType class
A class that describes a host to be added to a named application during incremental provisioning.
AddHostType properties
applicationID (required) identifies the application to use.
host (required) is a description of the host to add.
directories allows you to specify directories using a full path and a name.
Administrators Guide Chapter 12: Endeca Application Controller API Class Reference
AddScriptType class
A class that describes a script to be added to a named application during incremental provisioning.
AddScriptType properties
applicationID (required) identifies the application to use.
script (required) is a description of the script to add.
AgidxComponentType class
A class that describes an Agidx component within an application. An Agidx component runs Agidx on a machine, creating a set of Agidx indices that support the Agraph program in a distributed environment. The Agidx component is used only in distributed environments and is run sequentially on multiple machines. On the first machine, the Agidx component takes the Dgidx output from that machine as its input. On the next machine, the output from the first Agidx run is copied over, using the Copy service. It, along with the Dgidx output from that machine, is used as Agidx input.
AgidxComponentType properties
componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args is a list of command-line flags to pass to Agidx.
previousAgidxOutputPrefix is the file prefix of the Agidx data from the previous run, which has been copied to this machine by a Copy operation. This parameter should not be used when running the Agidx component on the first data subset.
inputPrefixList (required) is the list of paths to the output of the various Dgidx runs, which Agidx uses as input.
outputPrefix (required) is the path and prefix name for the Agidx output.
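The sequential Agidx runs described for this class can be sketched as a small planning function. Only the property names inputPrefixList, previousAgidxOutputPrefix, and outputPrefix come from the text; the function name, the example paths, and the assumption that each machine uses the same Agidx output path are hypothetical.

```python
# Hypothetical planning helper; only the property names
# (inputPrefixList, previousAgidxOutputPrefix, outputPrefix) come from the text.
def plan_agidx_runs(dgidx_outputs, output_prefix):
    """Return one property dict per machine, in the order the runs must happen."""
    runs = []
    previous = None
    for dgidx_prefix in dgidx_outputs:
        run = {"inputPrefixList": [dgidx_prefix], "outputPrefix": output_prefix}
        if previous is not None:
            # Output of the previous run, assumed already copied to this machine.
            run["previousAgidxOutputPrefix"] = previous
        runs.append(run)
        previous = output_prefix
    return runs
```

The first run omits previousAgidxOutputPrefix, as the property description requires for the first data subset; every later run carries it.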
AgraphChildListType class
A class used by the AgraphComponentType class to establish the list of child Dgraphs and related devices used by a resulting Agraph. Each Agraph component can contain a mixture of DgraphReferenceType and DgraphHostPortType objects. A DgraphReferenceType object refers to a child Dgraph, while a DgraphHostPortType object is typically used to refer to an unprovisioned device such as a load balancer. If you know you are referring only to actual Dgraphs, and not to load balancers or other unprovisioned devices, you do not need to use DgraphHostPortType objects.
AgraphChildListType properties
child (required) is a collection of child Dgraphs and related devices comprising this AgraphChildListType object.
AgraphComponentType class
A class that describes an Agraph component within an application. An Agraph component runs the Agraph program, which defines and coordinates the activities of multiple, distributed Dgraphs.
AgraphComponentType properties
componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args is a list of command-line flags to pass to the Agraph.
port (required) is the port at which the Agraph should listen.
appConfigPrefix is the path and file prefix that define the input for the Agraph.
reqLogFile is the path to and name of the request log.
children is a list of the child Dgraphs and related devices for this Agraph.
inputPrefix (required) is the path and prefix name for the Agidx output that the Agraph uses as an input.
startupTimeout specifies the amount of time in seconds that the
ApplicationIDListType class
A class that describes a returned value of a list application call to the Provisioning service. ApplicationIDListType encapsulates the list of applications running on this EAC Central Server.
ApplicationIDListType properties
ApplicationType class
A class that describes an application to be deployed by the Application Controller. An application is composed of a set of components residing on a set of hosts. You can construct an ApplicationType object as a full specification of the application, including all hosts and components. Alternatively, you can start with an empty ApplicationType object and incrementally fill in the hosts, components, and scripts. In the latter case, order matters, because a host must be added before you add a component that lives on that host.
ApplicationType properties
applicationID identifies the application to use.
hosts is a list of hosts.
components is a list of components.
scripts is a list of scripts.
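The ordering constraint for incremental provisioning (a host must exist before a component that lives on it) can be illustrated with a toy model. The Application class below is a hypothetical stand-in, not the ApplicationType class generated by a WSDL tool, and the host and component names are made up.

```python
# Hypothetical stand-in for incremental provisioning; not the EAC client API.
class Application:
    def __init__(self, application_id):
        self.application_id = application_id
        self.hosts = {}
        self.components = {}

    def add_host(self, host_id, hostname, port):
        self.hosts[host_id] = (hostname, port)

    def add_component(self, component_id, host_id):
        # A component can only be added once its host is already defined.
        if host_id not in self.hosts:
            raise ValueError(
                f"host {host_id!r} must be added before {component_id!r}"
            )
        self.components[component_id] = host_id
```

Calling add_component before add_host raises an error in this sketch, which is the kind of failure to expect if the incremental calls are issued out of order.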
BackupMethodType class
In relation to the Archive utility, this class serves as an identifier for the type of backup you want the utility to perform, Copy or Move.
BackupMethodType fields
The enumeration of possible values is as follows:
Copy
Move
BatchStatusType class
Based on the StatusType class (see page 269), a BatchStatusType object describes the status of a batch component. Batch components include Forge, Dgidx, Agidx, ReportGenerator, and Crawler.
BatchStatusType properties
Starting
ComponentListType class
A class that describes a list of components, such as ForgeComponentType and DgraphComponentType.
ComponentListType properties
ComponentType class
A class that describes the base type for all components within an application.
ComponentType properties
Note: Each component contains these properties, as well as some others.
componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component.
logFile is a string identifying the log file for this component.
properties is a string identifying any properties associated with this component.
CrawlerComponentType class
A class that describes a Crawler component within an application. A Crawler component runs the Endeca Advanced Crawler, which creates Endeca records based on crawled source documents. For more information about the Advanced Crawler, see the Endeca Information Transformation Layer Guide.
CrawlerComponentType properties
componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args are command-line arguments to pass to this component.
javaOptions are the Java Virtual Machine settings. If you have extended the Crawler using Java code, you may need to modify these settings, which are passed to the Java process.
classpath lists class path add-ons. If you have extended the Crawler using Java code, the modifications may require additions to the class path.
defaultSettingsFile (required) is the path to the default settings file for this Crawler component.
globalConfigFile (required) is the path to the global configuration file
to crawl.
outputPrefix (required) is the path and prefix name for the data the Crawler component stores.
port is the port at which the Crawler should listen for status request messages.
DgidxComponentType class
A class that describes a Dgidx component within an application. A Dgidx component sends the finished data prepared by Forge to the Dgidx program, which generates the proprietary indices for each Dgraph.
DgidxComponentType properties
componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args is a list of command-line flags to pass to Dgidx.
appConfigPrefix is the path and file prefix that define the input for Dgidx.
inputPrefix (required) is the path and prefix name for the Forge output
output.
runAspell prepares the Aspell files for the Dgraph. The default is true. It causes the Dgidx component to run dgwordlist and to copy the Aspell files to its output directory, where the Dgraph component can access them.
tempDir is the path to the temporary directory that Dgidx uses.
DgraphComponentType class
A class that describes a Dgraph component within an application. A Dgraph element launches the Dgraph (MDEX Engine) software, which processes queries against the indexed Endeca records.
DgraphComponentType properties
componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args is a list of command-line flags to pass to the Dgraph.
port (required) is the port at which the Dgraph listens. The default is 8000.
appConfigPrefix is the path and file prefix that define the input for the Dgraph.
inputPrefix (required) is the path and prefix name for the Dgidx output
updateDir is the directory from which the Dgraph reads partial update files. For more information, see the Implementing Partial Updates section in the Endeca Information Transformation Layer Guide.
updateLogFile specifies the file for update-related log messages.
tempDir is the path to the temporary directory that the Dgraph uses.
DgraphHostPortType class
A class used by the AgraphChildListType class to represent a (non-Dgraph) related device used by a parent Agraph. Each Agraph component can contain a mixture of DgraphReferenceType and DgraphHostPortType objects. A DgraphReferenceType object refers to a child Dgraph that is provisioned with the Application Controller, while a DgraphHostPortType object is typically used to refer to an unprovisioned device such as a load balancer. If you know you are referring only to actual Dgraphs, and not to load balancers or other unprovisioned devices, you do not need to use DgraphHostPortType objects.
DgraphHostPortType properties
hostname (required) is the name of the host.
port (required) is the communications port.
DgraphReferenceType class
A class used by the AgraphComponentType class to represent a child Dgraph. Each Agraph component can refer to a mixture of DgraphReferenceType and DgraphHostPortType objects.
DgraphReferenceType properties
DirectoryListType class
A class that represents a collection of DirectoryType objects.
DirectoryListType property
DirectoryType class
A class used by the HostType class to define a directory while provisioning a host.
DirectoryType properties
dirID (required) is a unique identifier for this directory.
dir (required) is a full path for this directory.
EACFault class
The class that creates the EACFault. EACFault is the error message returned by the Application Controller when the method fails.
FilePathListType
An array of FilePathTypes that describes a returned value of a listDirectoryContents call. FilePathListType operates on the application level.
FilePathListType property
FilePathType
A class that describes a file on a remote host.
FilePathType properties
path (required) is the full path to the file.
directory (required) indicates whether the path is a directory.
FlagIDListType class
A class that describes a returned value of a list flags call. FlagIDListType operates on the application level.
FlagIDListType property
ForgeComponentType class
A class that describes a Forge component within an application. A Forge element launches the Forge (Data Foundry) software, which transforms source data into tagged Endeca records.
ForgeComponentType properties
componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args is a list of command-line flags to pass to Forge.
stateDir is the directory where the state file is located.
inputDir is the path to the Forge input.
outputDir is the directory where the output from the Forge process will be stored.
outputPrefixName is the prefix, without any associated path information, that Forge uses to save its output files. These files are located in the directory specified by outputDir.
numPartitions is the number of partitions.
pipelineFile (required) is the name of the Pipeline.epx file to pass to Forge.
tempDir is the temporary directory that Forge uses.
which provides progress and performance metrics for Forge. For details, see page 327.
FullyQualifiedComponentIDType class
A class that serves as an input to the start, stop, get status, and remove component commands.
FullyQualifiedComponentIDType properties
applicationID (required) identifies the application to use.
componentID (required) identifies the component to use.
FullyQualifiedFlagIDType class
In relation to the Synchronization service, this class serves as an input to an acquire or release flag method.
FullyQualifiedFlagIDType properties
applicationID (required) identifies the application to use.
flagID (required) is a unique string identifier for this flag.
FullyQualifiedHostIDType class
A class that identifies a host so that it can be used as an input to another command, such as remove host.
FullyQualifiedHostIDType properties
applicationID (required) identifies the application to use.
hostID (required) is a unique string identifier for this host.
FullyQualifiedScriptIDType class
A class that identifies a script so that it can be used as an input to another command, such as startScript().
FullyQualifiedScriptIDType properties
applicationID (required) identifies the application to use.
scriptID (required) is a unique string identifier for this script.
FullyQualifiedUtilityTokenType class
In relation to the Utility service, this object represents the token.
FullyQualifiedUtilityTokenType properties
applicationID (required) identifies the application to use.
token (required) identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
HostListType class
A class that represents a collection of HostType objects.
HostListType property
HostType class
A class that describes a host within an application. Along with components, a collection of HostType objects defines an application.
HostType properties
hostname (required) is the name of the host.
port (required) is the connection port.
hostID is a unique string identifier for this host.
directories allows you to specify directories using a full path and a name.
ListApplicationIDsInput class
An empty object you pass into the Web services interface to get back a list of applications.
ListDirectoryContentsInputType class
An object that serves as an input to the listDirectoryContents() method.
ListDirectoryContentsInputType properties
applicationID (required) identifies the application to use to look up the host.
hostID (required) is a unique identifier for the host within that application.
filePattern (required) is the pattern whose wildcards listDirectoryContents() expands. If the expansion results in a file, it returns a file. If the expansion results in a directory, it returns the directory non-recursively. Wildcard expansion can result in any combination of files and directories.
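The expansion rules for filePattern can be approximated on a local filesystem with Python's glob module. This is only an illustration: the real call runs against a remote host through the Application Controller, the function name below is a hypothetical stand-in, and the sketch assumes that "returns the directory non-recursively" means listing the immediate entries of a matched directory.

```python
# Local-filesystem illustration using Python's glob module; the real call
# runs on a remote host, and this function is a hypothetical stand-in.
import glob
import os


def list_directory_contents(file_pattern):
    """Expand wildcards; list matched directories one level deep."""
    results = []
    for match in sorted(glob.glob(file_pattern)):
        if os.path.isdir(match):
            # A directory match yields its immediate entries only.
            for name in sorted(os.listdir(match)):
                results.append(os.path.join(match, name))
        else:
            results.append(match)
    return results
```

A single pattern can therefore return a mixture of plain files and the first-level contents of directories, without descending into subdirectories.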
LogServerComponentType class
A class that describes a LogServerComponent within an application. The LogServer component controls the use of the Endeca Log Server.
LogServerComponentType properties
componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
port (required) is the port on which to run the LogServer.
outputPrefix (required) is the path and prefix name for the LogServer output.
gzip (required) controls the archiving of log files. Possible values are true and false.
that the Application Controller will wait while starting the LogServer.
PropertyListType class
A class that represents a collection of PropertyType objects.
PropertyListType property
PropertyType class
The PropertyType class allows you to add arbitrary properties (that is, name/value pairs) to host and all component elements.
PropertyType properties
name (required) is a non-null string.
value is a string.
ProvisioningFault class
An extension of EACFault, the ProvisioningFault class is thrown when there are fatal errors during provisioning.
ProvisioningFault properties
errors is a list of provisioning errors.
warnings is a list of provisioning warnings.
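Because ProvisioningFault extends EACFault, client code typically needs to handle the more specific fault first. The classes below are a hypothetical Python mirror of that relationship, written only to show the handling order; the classes generated by your WSDL tool will differ in name and shape.

```python
# Hypothetical Python mirror of the fault classes; generated stubs will differ.
class EACFault(Exception):
    """Error returned by the Application Controller when a method fails."""


class ProvisioningFault(EACFault):
    """Fatal provisioning failure carrying error and warning lists."""

    def __init__(self, errors, warnings=()):
        super().__init__("; ".join(errors))
        self.errors = list(errors)
        self.warnings = list(warnings)


def describe(fault):
    """Handle the more specific fault first, then fall back to EACFault."""
    try:
        raise fault
    except ProvisioningFault as pf:
        return (
            f"provisioning failed: {len(pf.errors)} error(s), "
            f"{len(pf.warnings)} warning(s)"
        )
    except EACFault as ef:
        return f"EAC call failed: {ef}"
```

If the EACFault handler came first, it would also catch every ProvisioningFault and the errors and warnings lists would never be inspected.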
RemoveApplicationType class
Related to the Provisioning service, this class serves as input to the incremental remove command.
RemoveApplicationType properties
applicationID (required) identifies the application to use.
forceRemove indicates whether or not a remove operation should force
RemoveComponentType class
Related to the Provisioning service, this class serves as input to the incremental remove command.
RemoveComponentType properties
FullyQualifiedComponentIDType (required) identifies the component to use.
forceRemove indicates whether or not a remove operation should force
RemoveHostType class
Related to the Provisioning service, this class serves as input to the incremental remove command.
RemoveHostType properties
FullyQualifiedHostIDType (required) is a unique string identifier for this host.
forceRemove is a Boolean that indicates whether or not a remove operation should force any running components or services to stop before attempting the remove.
RemoveScriptType class
Related to the Provisioning service, this class serves as input to the incremental remove command.
RemoveScriptType properties
applicationID (required) identifies the application.
scriptID (required) identifies the script to remove.
ReportGeneratorComponentType class
A class that describes a ReportGenerator component within an application. The ReportGenerator component runs the Report Generator, which processes Log Server files into HTML-based reports that you can view in your Web browser and XML reports that you can view in Web Studio.
ReportGeneratorComponentType properties
componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args is a list of command-line flags to pass to the ReportGenerator.
javaBinary, if used, should indicate a JDK 1.5.x or later. Defaults to the
parameter. This parameter is primarily used to adjust the ReportGenerator memory, which defaults to 1GB. To set the memory, use the following:
java_options = -Xmx[MemoryInMb]m -Xms[MemoryInMb]m
inputDirOrFile (required) is the path to the file or directory containing the logs to report on. If it is a directory, then all log files in that directory are read. If it is a file, then just that file is read.
outputFile (required) is the name of the generated report file and the path to where it is stored.
stylesheetFile (required) is the filename and path of the XSL
keywords:
These keywords assume that days end at midnight, and weeks end on the midnight between Saturday and Sunday.
startDate sets the report window to the given date and time. The date
stopDate sets the report window to the given date and time. The date
RunBackupType class
A child of the RunUtilityType class, this class provides all the information you need to perform a backup operation to the Archive utility.
RunBackupType properties
applicationID (required) is the unique identifier for this application.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
hostID (required) is a unique identifier for the host. The hostID and dirName parameters specify the path to the directory that will be archived.
dirName (required) is the full path of the directory. The hostID and dirName parameters specify the path to the directory that will be archived.
backupMethod is either Copy or Move.
numBackups specifies the maximum number of archives to store. This number does not include the original directory itself, so if numBackups is set to 3, you would have the original directory plus up to three archive directories, for a total of as many as four directories. The default numBackups is 5.
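The numBackups arithmetic can be checked with a small bookkeeping sketch. The function below is hypothetical and only models the counting rule; it assumes, as an illustration, that the oldest archive is the one dropped when the limit is reached, which the text does not state.

```python
# Hypothetical bookkeeping model of archive retention; names are illustrative
# only, and dropping the oldest archive first is an assumption.
def archives_after_backup(existing_archives, new_archive, num_backups=5):
    """Return the archives kept after adding new_archive, newest first."""
    archives = [new_archive] + list(existing_archives)
    # The original directory is not counted against num_backups.
    return archives[:num_backups]
```

With num_backups set to 3, the sketch keeps three archives, so the original directory plus the archives makes at most four directories, matching the example in the property description.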
RunFileCopyType class
A child of the RunUtilityType class, this class provides all the information you need to run the Copy utility.
RunFileCopyType properties
applicationID (required) identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
fromHostID (required) is the unique identifier for the host you are copying the data from.
toHostID (required) is the unique identifier for the host you are copying the data to.
sourcePath (required) is the full path to the source file or directory.
If sourcePath contains no wildcards, then destinationPath must be the destination file or directory itself, rather than the parent directory.
directory.
recursive, when specified, downloads the directories recursively.
RunRollbackType class
A child of the RunUtilityType class, this class provides all the information you need to perform a rollback operation to the Archive utility.
RunRollbackType properties
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
hostID (required) is a unique identifier for the host. The hostID and dirName parameters specify the path to the directory that will be archived.
dirName (required) is the full path for the directory. The hostID and dirName parameters specify the path to the directory that will be archived.
RunShellType class
A child of the RunUtilityType class, this class provides all the information you need to run the Shell utility.
RunShellType properties
applicationID (required) identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
hostID (required) is a unique identifier for the host.
cmd (required) is the command(s).
workingDir is the full path for the working directory.
RunUtilityType class
Parent class of the other Utility classes.
RunUtilityType properties
applicationID (required) identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
ScriptListType class
A class that describes a list of scripts.
ScriptListType properties
ScriptType class
A class that describes the base type for all scripts within an application.
ScriptType properties
scriptID (required) is a unique string identifier for the script.
cmd (required) is the command that is used to start the script.
logFile is the file for appended stdout/stderr output. It defaults to $ENDECA_CONF/logs/script/(app_id).(script_id).log.
workingDir is the working directory. It defaults to $ENDECA_CONF/working/(app_id)/.
SSLConfigurationType class
A class used by the DgraphComponentType class and AgraphComponentType class to enable SSL on the resulting components.
SSLConfigurationType properties
that is used by the Dgraph or Agraph processes to present to any client. The file name can be a path relative to the component's working directory.
Authority file that the Dgraph or Agraph processes use to authenticate communications with other Endeca components. The file name can be a path relative to the component's working directory.
cipher is an optional cipher string (such as RC4-SHA) that specifies the minimum cryptographic algorithm that the Dgraph or Agraph processes use during the SSL negotiation. If you omit this setting, the SSL software tries an internal list of ciphers, beginning with AES256-SHA. See the Endeca Security Guide for more information.
StateType class
A class used by the StatusType class to describe the state of a component.
StateType fields
An enumeration of the following fields:
Starting
Running
NotRunning
Failed
StatusType class
Describes the status of a server component in the Application Controller. Server components include the Dgraph, Agraph, and LogServer. All other components (Forge, Dgidx, Agidx, ReportGenerator, and Crawler) are batch components. Their status is described by the BatchStatusType class on page 247.
StatusType properties
Starting
startTime (required) is the time the component started; for example, 5/25/07 3:58 PM.
failureMessage is the failure message, which tells you that a failure has occurred in the execution of the component. failureMessage is empty unless state is Failed. (This is different from EACFault, which tells you that a problem has occurred while processing the Web Service request to get the status.)
TimeRangeType class
A class used by the ReportGeneratorComponentType class to set the time span of interest (or report window).
TimeRangeType fields
The enumeration of possible values is as follows:
TimeSeriesType class
A class used by the ReportGeneratorComponentType class to turn on the generation of time-series data and specify the frequency, hourly or daily.
TimeSeriesType fields
The enumeration of possible values is as follows:
Hourly
Daily
UpdateComponentType class
A class that describes a component to be updated during incremental provisioning.
UpdateComponentType properties
applicationID (required) identifies the application.
component (required) identifies the component to update.
UpdateHostType class
A class that describes a host to be updated during incremental provisioning.
UpdateHostType properties
applicationID (required) identifies the application.
host (required) identifies the host to update.
forceUpdate indicates whether the Application Controller should force any components or services running on the host to stop before attempting the update.
UpdateScriptType class
A class that describes a script to be updated during incremental provisioning.
UpdateScriptType properties
applicationID (required) identifies the application.
scriptID (required) identifies the script to update.
SECTION III
Transferring Implementations Between Environments
Chapter 13
About transferring your front-end Web application
Transferring implementations using the tools
Transferring implementations using the emgr_update utility
Removing an application from Endeca IAP
6 From the File menu, choose Save to save the project with the latest instance configuration.
7 Optionally, remove inactive dynamic business rules from the instance configuration.
8 Copy the instance configuration files from the saved project to the location in your production environment where the Endeca Application Controller expects them to be.
Use the Application Controller to run a baseline update on the production system.
emgr_update syntax
emgr_update is a utility that helps you update the instance configuration of a production system based on the changes made with the Endeca tools in a staging environment. You run emgr_update from a command line: open a command prompt or UNIX shell to run the program. The syntax for running emgr_update is:
emgr_update <parameters>
The following table describes the command line parameters you can use with emgr_update. You can specify only one --action operation for each invocation of the utility.

emgr_update parameter
Description

--host name:port
Specifies the host name and the port of a machine running Web Studio. If you are retrieving settings (using the get operation), this is the host name of the environment you are transferring from; if you are updating settings (using the set operation), this is the host name of the environment you are transferring to.

--action <op>
Specifies one of the actions, where <op> is one of the operations listed below.

--action get_all_settings
Retrieves all the instance configuration settings for a project you performed in the Web Studio in the staging environment, for their use in the production environment.
Required parameters: --dir, --prefix
Optional parameters: --filter

--action get_ws_settings
Retrieves only those instance configuration settings that can be modified in Web Studio (not all settings). These configuration settings include the following Web Studio features: dynamic business rules, keyword redirects, thesaurus entries, automatic phrases, stop words, and dimension ordering.

--action get_mdex_settings
Retrieves the instance configuration settings that were modified in Web Studio, and that do not require a baseline update to update the MDEX Engine. These configuration settings include the following Web Studio features: dynamic business rules, keyword redirects, thesaurus entries, automatic phrases.
Required parameters: --dir, --prefix
Optional parameters: --filter

--action set_post_forge_dims
Updates the Web Studio configuration with the post-Forge dimensions.

--action get_post_forge_dims
Retrieves the copy of Web Studio settings for the post-Forge dimensions. Typically, this operation can be used for debugging purposes.

--action update_mgr_settings
Updates a Web Studio production environment with instance configuration settings that were extracted from the Web Studio configuration in the staging environment.

--action remove_all_settings
Removes all the instance configuration files from Web Studio for the application that you specify with the --app_name parameter. Removing the instance configuration does not remove the associated provisioning information for an application.

--app_name <string>
Specifies the name of the application provisioned to the EAC Central Server.

Optional action parameters

--dir <string>
Specifies the pathname of the directory where the instance configuration files are written to or read from. Required for all --action operations except for set_post_forge_dims and remove_all_settings.

--prefix <string>
Specifies the prefix used for the instance configuration files. This option is required for all --action operations except for set_post_forge_dims and get_post_forge_dims.

--filter filter
Filters out dynamic business rules that have a state of inactive. (A rule has the property endeca.internal.workflow.state set to INACTIVE.) This option can be used in conjunction with get_all_settings or get_ws_settings when retrieving an instance configuration. Removing inactive rules is not required but it is recommended. With the default rule filter in place, the MDEX Engine does not fire any rule whose state is inactive. In other words, you can transfer an instance configuration, including both active and inactive rules, and the MDEX Engine fires only active rules in reply to user queries.

--post_forge_file <string>
Specifies the pathname to the file that contains post-Forge dimensions. This option is required for the set_post_forge_dims operation.

Optional global parameters

--stop_on_warnings
Stops the utility without asking you if the target directory is not empty before a get operation, or if it finds extra or missing files before an update operation.

--ignore_warnings
Continues running the utility if the target directory is not empty before a get operation, or continues if there are extra or missing files before an update operation.

--help
Displays the usage parameters for the utility.

--version
Displays the version number for the utility.
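The required-parameter rules in the table can be checked before the utility is launched. The following Python sketch builds an emgr_update argument list and rejects an invocation that omits a parameter the table marks as required. The helper names (required_options, emgr_update_args), the host name, the port, and the paths are hypothetical; only the flag names and the "required for all operations except" rules are taken from the table. Treating --app_name as always present is an assumption of this sketch.

```python
# Operations listed in the emgr_update parameter table.
ACTIONS = (
    "get_all_settings", "get_ws_settings", "get_mdex_settings",
    "set_post_forge_dims", "get_post_forge_dims",
    "update_mgr_settings", "remove_all_settings",
)


def required_options(action):
    """Return the option names the table marks as required for an action."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    required = []
    # --dir: required for all operations except these two.
    if action not in ("set_post_forge_dims", "remove_all_settings"):
        required.append("dir")
    # --prefix: required for all operations except these two.
    if action not in ("set_post_forge_dims", "get_post_forge_dims"):
        required.append("prefix")
    # --post_forge_file: required for set_post_forge_dims.
    if action == "set_post_forge_dims":
        required.append("post_forge_file")
    return required


def emgr_update_args(action, host, app_name, **options):
    """Build an argument list for one emgr_update invocation (one action)."""
    missing = [name for name in required_options(action) if name not in options]
    if missing:
        raise ValueError(f"{action} requires: {', '.join(missing)}")
    args = ["emgr_update", "--host", host, "--action", action,
            "--app_name", app_name]
    for name, value in options.items():
        args += [f"--{name}", str(value)]
    return args


# Hypothetical staging host and paths, for illustration only.
print(emgr_update_args("get_all_settings", "staging.example.com:8888",
                       "myapp", dir="/data/forge_input", prefix="myapp"))
```

The resulting list can be passed to subprocess.run on a machine where emgr_update is installed.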
By using the appropriate --action operations, you can use the emgr_update program to do the following tasks:
Transfer the instance configuration files for a particular application of your choice from the staging environment to the production environment. After the transfer, you run a baseline update using your own EAC scripts. You have the option of transferring all instance configuration files, or transferring just the instance configuration files that Web Studio modified. If your implementation uses the Advanced Crawler, you run it before running a baseline update.
Transfer the instance configuration for a particular application from one Web Studio environment to another.
Remove instance configuration information for a specified application from the Web Studio configuration.
Send the Forge dimensions to the Web Studio.
Using emgr_update to transfer from a Web Studio staging environment to a Web Studio production environment
This section describes how to transfer instance configuration files from a staging environment that uses Web Studio to a production environment that also uses Web Studio. Two scenarios are described:
Transferring all instance configuration files for an Endeca project.
Transferring only the instance configuration files that can be modified by Web Studio.
If the destination directory is not empty, you will be prompted to continue. Answer y.
When the utility finishes, all project configuration files (including the project and pipeline files) are copied to the production directory specified by the --dir parameter.
4 Use the Endeca Application Controller to run a baseline update on the production system.
The utility uses prefix.esp as the name of the output Developer Studio project file (where prefix is whatever you specified with the --prefix parameter). If there is an existing project file in the production directory with another name, it is recommended that you change it to prefix.esp.
The Endeca Advanced Crawler
Dynamic business rules
Thesaurus entries
Automatic phrases
Stop words
Dimension ordering
A subsequent baseline update uses the updated files for these features.
c For the --app_name parameter, specify the application name whose instance configuration you want to transfer.
d Use the --filter parameter to remove inactive dynamic business rules.
If the destination directory is not empty, you will be prompted to continue. Answer y. When the utility finishes, the project files that Web Studio modified are copied to the production directory specified by the --dir parameter.
Use the Application Controller to run a baseline update on the production system.
To transfer and deploy all instance configuration files to the production system:
1 In the staging environment, use Developer Studio and/or Web Studio to make changes to the project.
2 Run emgr_update with an --action of get_all_settings.
a For the --host parameter, specify the machine name and port for the staging environment in Web Studio.
b For the --dir parameter, specify the Forge input directory in the production environment.
c For the --app_name parameter, specify the application name whose instance configuration you want to transfer.
If the destination directory is not empty, you will be prompted to continue. Answer y. When the utility finishes, all project configuration files are copied to the production directory specified by the --dir parameter.
Run emgr_update with an --action of update_mgr_settings.
a For the --host parameter, specify the machine name and port for the production environment in Web Studio.
b For the --dir parameter, specify the directory that contains the project configuration files that will be used to update the production environment in Web Studio (typically, this will be the same directory that was used in step 2).
c For the --app_name parameter, specify the application name whose instance configuration you want to transfer.
5 If your implementation uses the Advanced Crawler, use the Application Controller to run the baseline update.
6 Use the Application Controller to run a baseline update on the production system.
The application's instance configuration files are removed from the Web Studio.
Using emgr_update to send the dimensions file produced by Forge to the Web Studio
Read this section only if you are not using an Application Controller default script for running the baseline update, and are using your own scripts for this purpose. If you are using your own scripts for running the baseline update, then after you run Forge, you need to send the dimensions file produced by Forge to the Web Studio instance configuration for your application.
To send the dimensions file produced by Forge to the Web Studio, run emgr_update as follows:
On the machine that has access to the output files of Forge (this is typically the machine on which you ran Forge), run emgr_update with an action of set_post_forge_dims:
a For the --host parameter, specify the machine name and port for the environment in Web Studio.
b For the --app_name parameter, specify the application name whose instance configuration you want to update with this information.
c For the --post_forge_file parameter, specify the full pathname to the output file where Forge stores its dimensions.
SECTION IV
Tuning Endeca Implementations
Chapter 14
About the MDEX Engine request log
Request log file format
Extracting information from request logs
Request log rolling
URL parameter mapping
Each new line in a request log file starts with a time stamp such as 1180560941. Each entry has the following eleven columns:
[Time of Request] [Client IP Address] [Query ID Tag] [Response Size] [Response Duration] [Processing Time] [HTTP Return Code] [Number of Results Returned] [Queue Status] [Thread ID] [Request URL]
The following table describes the log entries in more detail.

Column
Description

Time of Request
Time stamp indicating the time the request was completed, in seconds, since the epoch (January 1, 1970, 00:00:00 UTC).

Client IP Address
IP address of the requesting client.

Query ID Tag
Query identifier tag, composed of underscore-separated timestamp, IP address, port, and sequential values. This value can be used to correlate an entry in a child Dgraph's log with a query in a parent Agraph's log. This field will always contain a - character in request logs from non-child Dgraphs.

Response Size
Number of bytes written to the client. May be less than or equal to the intended result size, for example, due to a premature session end.

Response Duration
The request lifetime, in milliseconds. Equal to the total amount of time between when the Dgraph reads the request from the network and when it finishes sending the result. May include queuing time, such as time spent waiting for earlier requests to be completed.

Processing Time
Processing time, in milliseconds. Equal to the total computation time required for the Dgraph to handle the request, excluding network and wait time. This value gives an accurate measure of how expensive the request was to compute, given current system state. (That is, if the machine in question was busy with other threads or processes, the time may be longer than on an otherwise unused machine.) For any given query, Processing Time will always be smaller than Response Duration.

HTTP Return Code
HTTP return code, such as 200 (OK) or 404 (Not Found).

Number of Results Returned
Number of results returned (or - if the HTTP request was not a query).

Queue Status
Number of threads busy when the request was received. If the queue status, Q, is positive, it means that there were Q requests in the queue. If Q < 0, it means that there were 0 requests in the queue, and -Q threads idle.

Thread ID
The thread ID of the thread that was assigned the request (or - in single-threaded mode).

Request URL
The URL passed to the MDEX Engine, unquoted, exactly as it was received.
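As an illustration of the eleven-column layout, the following Python sketch splits one request log line into named fields and interprets the Queue Status value. The names RequestLogEntry, parse_request_log_line, and queue_state are hypothetical, the sample line is invented for this example, and the numeric types chosen for each column are assumptions rather than a statement of the exact log format.

```python
from typing import NamedTuple, Optional, Tuple


class RequestLogEntry(NamedTuple):
    time_of_request: int          # seconds since the epoch
    client_ip: str
    query_id_tag: str             # "-" for non-child Dgraphs
    response_size: int            # bytes written to the client
    response_duration_ms: float
    processing_time_ms: float
    http_return_code: int
    num_results: Optional[int]    # None when the log shows "-"
    queue_status: int
    thread_id: str                # "-" in single-threaded mode
    request_url: str


def parse_request_log_line(line: str) -> RequestLogEntry:
    """Split a log line into the eleven documented columns."""
    # Split at most 10 times so the URL stays intact as the last field.
    fields = line.strip().split(None, 10)
    if len(fields) != 11:
        raise ValueError("expected eleven whitespace-separated columns")
    (when, ip, tag, size, duration, processing,
     code, results, queue, thread, url) = fields
    return RequestLogEntry(
        int(when), ip, tag, int(size), float(duration), float(processing),
        int(code), None if results == "-" else int(results),
        int(queue), thread, url)


def queue_state(q: int) -> Tuple[int, int]:
    """Return (requests queued, threads idle) for a Queue Status value.

    Q > 0 means Q requests were queued; Q < 0 means none were queued and
    -Q threads were idle. Q == 0 is treated here as (0, 0), an assumption.
    """
    return (max(q, 0), max(-q, 0))


# Invented sample line, for illustration only.
sample = "1180560941 10.0.0.5 - 2048 12.50 10.25 200 37 -3 1 /graph?node=0"
entry = parse_request_log_line(sample)
print(entry.response_duration_ms, queue_state(entry.queue_status))
```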
Run the Cheetah script available from Endeca Support.
Write your own Perl code.

The Cheetah script reads one or more MDEX Engine logs and reports on the nature and performance of the queries recorded in those logs. This report provides information on what actually happened in the past, instead of reporting on potential performance or capacity planning for the future. This script can be run manually in order to debug performance problems, and should also be run on a regular basis to continually monitor performance and call out trends in Dgraph traffic load, latency, throughput, and application behavior. To download the Cheetah script, log in to the Endeca Support Center at https://support.endeca.com/ and see the Tools and Utilities page.

If you write Perl to extract, manipulate, and analyze the information in a request log, you may find the following setting useful in Perl scripts:
perl -nae
where:
n causes Perl to loop over the input, processing each line in turn.
a turns on autosplit.
e indicates that it should execute the next argument, which should be Perl code.
This script prints the log entries for queries that took more than five seconds. It splits each line on whitespace into an array called F. Given the eleven columns listed above, the fifth element in the array ([4]) corresponds to the Response Duration and represents the amount of time the query took, in milliseconds.
perl -nae 'print if $F[4] > 5000' logfile
If you are tracking system trends by time, you may find it useful to correlate the epochal time that the log displays with human-readable time. This script is used to convert the time stamps into a more readable form.
perl -nae 'print scalar localtime $F[0]," $_"'
Note: In this script, localtime uses the time zone of the machine where you are doing the analysis, so if you are looking at a log from a different time zone, you may want to change the time zone. On UNIX systems, you can set the TZ environment variable to effect this change; for example, TZ=US/Pacific.
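The same two tasks, finding slow queries and converting epoch time stamps, can also be sketched in Python. This sketch assumes the eleven-column layout described earlier in this chapter, so Response Duration is read from the fifth field (index 4). The function names and the sample lines are hypothetical.

```python
from datetime import datetime, timezone


def readable_time(epoch_seconds, tz=timezone.utc):
    """Convert a log time stamp (seconds since the epoch) to readable text."""
    moment = datetime.fromtimestamp(int(epoch_seconds), tz=tz)
    return moment.strftime("%Y-%m-%d %H:%M:%S %Z")


def slow_entries(lines, threshold_ms=5000.0, tz=timezone.utc):
    """Return entries slower than threshold_ms, prefixed with readable time."""
    slow = []
    for line in lines:
        fields = line.split()
        if len(fields) < 11:
            continue  # not a complete request entry; skip it
        # Index 4 is Response Duration under the assumed column order.
        if float(fields[4]) > threshold_ms:
            slow.append(f"{readable_time(fields[0], tz)} {line.rstrip()}")
    return slow


# Invented sample lines, for illustration only.
log = [
    "1180560941 10.0.0.5 - 2048 6200.00 6100.00 200 37 2 1 /graph?node=0",
    "1180560942 10.0.0.5 - 1024 12.50 10.25 200 5 -3 2 /graph?node=8",
]
for entry in slow_entries(log):
    print(entry)
```

To report times in another zone, a different tzinfo object (for example, one from the standard zoneinfo module, where time zone data is available) can be passed as tz.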
PID is the Dgraph process ID.
N is the number of logs that this Dgraph has already rotated. N=0 the first time the Dgraph does log rotation, and then goes up by 1 each time.
You can use the information in the remainder of this section to translate the MDEX Engine request log file, which tells you exactly which URLs the MDEX Engine has processed. By extension, these are the URLs that the Presentation API has sent to the MDEX Engine. If the API has sent an incorrect URL to the MDEX Engine, it is a good indication that the API received an incorrect URL from the Web application in the first place.
Use the table below to map the parameters as follows: The left column is the mapping between the Presentation API and the ENE parameter. The far right column is the mapping between the Web application and the Presentation API parameter.
Note: For a complete description of the ENE URL query parameters, see the Endeca Developer's Guide.
Example mappings
Here are some sample mappings.
Web Application to API
/controller.jsp?N=0
/controller.jsp?N=0&Ntk=DESC&Ntt=merlot
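When comparing a Web application URL against the mapping table, it can help to list its parameters one by one. The following Python sketch does this with the standard library; the function name query_parameters is hypothetical, and it keeps only the first value of any repeated parameter.

```python
from urllib.parse import parse_qs, urlsplit


def query_parameters(url):
    """Return the query parameters of a URL as a name-to-value dict."""
    parsed = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per name; keep the first value of each.
    return {name: values[0] for name, values in parsed.items()}


print(query_parameters("/controller.jsp?N=0&Ntk=DESC&Ntt=merlot"))
```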
Mapping parameters
The ENE parameters in bold are the primary parameters, while those in non-bold are secondary parameters.

MDEX Engine parameter | Description | Maps to...
graph? | Navigation query | N
node | Navigation query parameter, navigation descriptors | N
offset | Navigation query parameter, record offset | No
offset | Navigation query parameter, aggregated record offset | Nao
group | Navigation query parameter, exposed refinements | Ne
allbins | Navigation query parameter, records per aggregated record | Np
 | Navigation query parameter, sort |
 | Navigation query parameter, sort order |
 | Navigation query parameter, rollup |
 | Navigation query parameter, record search key, terms, and options |
relrank | Navigation query parameter, search interface, relevance ranking terms, relevance ranking strategy and match mode |
dym | Navigation query parameter, Did You Mean | Nty
autophrase | Navigation query parameter, compute phrasings | Ntpc
 | Navigation query parameter, rewrite query | Ntpr
merchpreviewtime | Navigation query parameter, merchandising preview time | Nmpt
merchrulefilter | Navigation query parameter, merchandising rule filter | Nmrf
pred | Navigation query parameter, range filters | Nf
filter | Navigation query parameter, record filters | Nr
structured | Navigation query parameter, Endeca Query Language | Nrs
stat | Navigation query parameter, analytics | Nl
refinement | Navigation query parameter, dynamic refinement ranking | Nrc
search? | Dimension search query | D
terms | Dimension search query parameter, search terms | D
options | Dimension search query parameter, options | Dx
node | Dimension search query parameter, dimension search scope | Dn
model | Dimension search query parameter, search dimension | Di
num | Dimension search query parameter, number of results | Dp
 | Dimension search query parameter, offset | Do
rank | Dimension search query parameter, rank | Dk
pred | Dimension search query parameter, range filters | Df
filter | Dimension search query parameter, record filters | Dr
structured | Dimension search query parameter, Endeca Query Language | Drs
abin? | Aggregated record query | A
id | Aggregated record query parameter, record ID | A
node | Aggregated record query parameter, descriptors | An
groupby | Aggregated record query parameter, rollup | Au
pred | Aggregated record query parameter, range filters | Af
filter | Aggregated record query parameter, record filters | Ar
structured | Aggregated record query parameter, Endeca Query Language | Ars
bin? | Record query |
 | Record query parameter, record ID | R
Chapter 15
About Eneperf
Using Eneperf
Obtaining logs for use with Eneperf
Debugging Eneperf
About Eneperf
Eneperf is a performance debugging tool that can measure throughput to help you identify system bottlenecks. Eneperf makes HTTP queries against the MDEX Engine (Dgraph) based on your MDEX Engine request logs and gathers the resulting statistics, without processing the results in any way. Because Eneperf is lightweight, it has a very slight impact on performance. In most cases, it can be run on the same machine as the Dgraph or Agraph being tested. In addition, it can be run on a remote machine. Eneperf drives a substantial load at the MDEX Engine and reveals how many operations per second the MDEX Engine responds with. You specify the log file and tell Eneperf how many times to run through it, as well as the number of client connections to simulate. Eneperf understands Endeca MDEX Engine URLs, which use the pipe symbol (|). Because the pipe symbol is not a legal character in the URL/URI standards, other programs, such as wget, may transform it inappropriately.
Using Eneperf
Eneperf is installed in the Endeca Navigation Platform bin directory. It has the following usage:
usage: eneperf [-v] [--header <header file path>] [--gzip] [--nreq <n>]
[--nodnscache] [--progress] [--pidcheck <pid>] [--quitonerror]
[--rcvbuf <size bytes>] [--record <recording file prefix>]
[--record_hdr] [--record_ord] [--record_roll <max KB per recording file>]
[--reqstats] [--runtime <max runtime (minutes)>] [--sleeponerror <secs>]
[--stats <num reqs>] [--throttle <max req/sec>]
[--warn <max req time warning threshold (msecs)>]
<host> <port> <log> <num connections> <num iterations>
Eneperf has both required and optional settings.
Required settings
The required settings (shown in order) are as follows:
Setting
Description

<host>
Target host for requests.

<port>
Port the target host is listening to for requests.

<log>
Log file of the query portion of the MDEX Engine URLs (that is, the portion that resides in the last column of the MDEX Engine request log), which is used for HTTP request generation. URLs from the <log> file are replayed in order.

<num connections>
Maximum number of outstanding requests to allow before waiting for replies. In other words, the number of simultaneous HTTP connection streams to keep open at all times. This number emulates multiple clients for the target server. For example, using <num connections> of 16 emulates 16 concurrent clients querying the target server at all times.

<num iterations>
Number of times to replay the query log. All outstanding requests are drained before a new iteration is started.
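The required settings are positional and follow any optional settings, as in the usage line. The following Python sketch assembles an Eneperf argument list in that order. The function name eneperf_args and the sample host, port, and log file name are hypothetical.

```python
def eneperf_args(host, port, log, num_connections, num_iterations, *options):
    """Build an eneperf argument list: options first, then required settings."""
    if num_connections < 1 or num_iterations < 1:
        # Assumption: zero or negative counts are treated as invalid here.
        raise ValueError("connections and iterations must be at least 1")
    return ["eneperf", *options, host, str(port), log,
            str(num_connections), str(num_iterations)]


# Emulate 16 concurrent clients replaying a hypothetical log once.
print(eneperf_args("localhost", 8000, "queries.log", 16, 1, "--progress"))
```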
The following sections contain additional information about the required settings.
Higher concurrent load can be achieved by using a single large request log file (which might simply be repeated concatenations of a smaller log file) than by using multiple iterations of a small log file. The log file should preferably be at least 100 lines, even if it consists of the same query repeated over and over. Because Eneperf drains all connections between each iteration, running a one-line log file through Eneperf 100 times will give you skewed throughput statistics.
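One way to follow this advice is to concatenate a short log with itself until it reaches at least 100 lines, and then run Eneperf with a single iteration. The Python sketch below shows the idea; the function name pad_replay_log is hypothetical, and the 100-line minimum is the guideline given above.

```python
import math


def pad_replay_log(urls, min_lines=100):
    """Repeat a list of query URLs until it has at least min_lines entries."""
    if not urls:
        raise ValueError("the log must contain at least one query")
    # Use whole copies of the log so the original query order is preserved.
    copies = math.ceil(min_lines / len(urls))
    return list(urls) * copies


padded = pad_replay_log(["/graph?node=0"])
print(len(padded))  # a one-line log repeated up to the minimum length
```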
For example, if num connections is set to 4, it sends four requests to the MDEX Engine. When one returns, another is sent out to replace it.

The number of connections needed to saturate the MDEX Engine varies depending on your MDEX Engine configuration and the server machine characteristics, and generally correlates to the number of threads in use. For example, if you have four threads, you might start with six or eight client connections. A good rule of thumb is to use two times the number of threads. However, an MDEX Engine with four threads might be saturated by just three connections if the queries are complex and all CPUs are being used 100%. There is no hard and fast rule, so feel free to experiment.

Although num connections does not have to be large, you want to make sure there are always enough simultaneous clients so that requests are waiting to be served. This ensures that the MDEX Engine stays busy during the communication lag between the MDEX Engine and Eneperf. If you are using a small log with a large num connections, keep in mind that each time the log is restarted, all connections are drained. In effect, using a log file with just one entry limits num connections to one.

To generate an MDEX Engine request log showing the canonical time for each query, run Eneperf with a single client (that is, num connections equal to one), so that it sends only one request at a time. Each query will be executed alone; no other query computations will be contending for the machine's resources. The request log can then be examined for slow queries without the concern that they happened to be slow because other queries were executing simultaneously.
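The rule of thumb above amounts to a one-line calculation. The Python sketch below only encodes that starting point; the names are hypothetical, and the right value for a given system still has to be found by experiment.

```python
# A single connection yields canonical per-query timings, as described above.
CANONICAL_TIMING_CONNECTIONS = 1


def starting_connections(num_threads):
    """Return a starting value for num connections: twice the thread count."""
    if num_threads < 1:
        raise ValueError("the MDEX Engine needs at least one thread")
    return 2 * num_threads


print(starting_connections(4))  # starting point for a four-thread engine
```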
Optional settings
Eneperf contains the following optional settings:

Setting
Description

-v
Verbose mode. Print query URLs as they are requested.

--header <header file path>
Specify path of file containing HTTP header text, one header field per line.

--gzip
Send 'Accept-encoding: gzip' in the HTTP request.

--nreq <n>
Stop after n requests.

--nodnscache
Disable caching of DNS hostname lookups. By default, Eneperf caches these lookups to improve performance.

--pidcheck <pid>
On a connection error, tests the target Dgraph or Agraph. If the process is not alive, Eneperf terminates.

--progress
Display the percentage of the log file processed.

--quitonerror
Causes Eneperf to terminate if it encounters a fatal HTTP error. By default, errors are ignored.

--rcvbuf <size bytes>
Override the default TCP receive buffer size, set via the SO_RCVBUF socket option.

--record <recording file prefix>
Record a log of all HTTP responses. Recorded data is placed in output files with the prefix <recording file prefix>. Data files are given the suffixes .dat1, .dat2, and so on. An index file with the suffix .idx is also produced.

--record_hdr
In --record mode, record HTTP header information along with page content.

--record_ord
In --record mode, ensure that log entries are recorded in the same order that they are listed in the <log> file, even if they are processed out of order.

--record_roll <max KB per recording file>
Set the maximum number of KB per recording file. Default is 1024 KB.

--reqstats
Maintain and report per-request timing statistics. Note: This option only produces accurate results when <num connections> is set to 1.

--runtime <max runtime (minutes)>
Places a limit on the run time for Eneperf. Eneperf exits after <max runtime> minutes.

--seek <n>
Tells Eneperf to skip a specified number of requests in the specified log file and start with log entry n. For example, in a log containing 100 requests, if Eneperf is invoked with --seek 50, it will issue 50 requests from 50 to 100.

--seekrepeat
Used in conjunction with --seek to indicate that Eneperf should start each iteration with the log entry specified by --seek. --seekrepeat only comes into play when the number of iterations specified is greater than one. If so, when Eneperf reaches the end of the log file, --seekrepeat indicates that it should start the next iteration from the log entry specified as a value to --seek (50 in the example above). The default behavior (without --seekrepeat) is to seek only on the first iteration and restart from the beginning of the file on subsequent iterations.

--sleeponerror <secs>
Causes Eneperf to sleep for number of seconds before sending any new requests after it encounters a connection error.

--stats <num reqs>
Print statistics after every <num reqs> requests are processed (sent and received).

--throttle <max req/sec>
Places an approximate limit on the number of requests per second that Eneperf will generate. For more information, see Setting the number of queries sent to the Dgraph on page 308.

--warn <max req time warning threshold (msecs)>
Causes Eneperf to print a warning message for any requests that take longer than the specified threshold time limit to return (useful for finding the slow requests in a log file).
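The interaction between --seek and --seekrepeat across iterations can be modeled with a short simulation. This Python sketch is not Eneperf code: the function name replay_order is hypothetical, and it assumes that --seek n skips the first n entries of the log, which matches the example of 50 requests issued from a 100-entry log.

```python
def replay_order(log, iterations, seek=0, seekrepeat=False):
    """Return the log entries in the order they would be replayed."""
    order = []
    for iteration in range(iterations):
        # Without seekrepeat, only the first iteration honors the seek;
        # later iterations restart from the beginning of the log.
        start = seek if (iteration == 0 or seekrepeat) else 0
        order.extend(log[start:])
    return order


log = list(range(100))  # stand-in for a log with 100 requests
print(len(replay_order(log, 3, seek=50)))                   # 50 + 100 + 100
print(len(replay_order(log, 3, seek=50, seekrepeat=True)))  # 50 + 50 + 50
```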
There are numerous ways that you can obtain such logs; this section provides you with a few examples.
It deletes DGRAPH STARTUP lines, because these lines contain no commands. It removes admin requests, such as admin?op=stats or admin?op=exit, that can cause problems in an Eneperf run. It strips out everything before the first slash (/) character in each remaining line.
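The three cleanup steps above can be sketched in Python as follows. The function name clean_request_log and the sample log lines are hypothetical (the exact layout of a Dgraph request log line is not shown in this section), and the sketch assumes that the first slash in a line marks the start of the request.

```python
def clean_request_log(lines):
    """Reduce raw request-log lines to the request portion only."""
    cleaned = []
    for line in lines:
        line = line.rstrip("\n")
        # Step 1: DGRAPH STARTUP lines contain no commands.
        if "DGRAPH STARTUP" in line:
            continue
        # Step 3: keep only the text from the first slash onward.
        slash = line.find("/")
        if slash == -1:
            continue  # no request on this line
        request = line[slash:]
        # Step 2: drop admin requests such as admin?op=stats or admin?op=exit.
        if request.startswith("/admin?"):
            continue
        cleaned.append(request)
    return cleaned


# Hypothetical sample lines, for illustration only.
sample = [
    "1136848925 DGRAPH STARTUP",
    "1136848930 10.0.0.5 200 0.01 /graph?node=0",
    "1136848931 10.0.0.5 200 0.02 /admin?op=stats",
]
print(clean_request_log(sample))
```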
Debugging Eneperf
Because it is very lightweight, Eneperf itself is not prone to errors. In general, if you make an error while typing a command line argument, Eneperf returns its help message. However, if you accidentally mistype the MDEX Engine port, Eneperf generates numerous failed connection error messages. It is also possible for error messages to be displayed during normal operation. For example, if the log file contains a request to retrieve a record that is not present in the MDEX Engine's data set, Eneperf (as expected) presents a 404 (file not found) message.
Note: Queries that cause HTTP errors are not counted towards ops/sec performance results displayed by Eneperf.
Chapter 16
About the MDEX Engine Statistics page
Viewing the MDEX Engine Statistics page
Sections of the MDEX Engine Statistics page
Checking the aliveness of a Dgraph or Agraph
For example, if your Dgraph is running on your local machine and listening on port 8000, specify this:
http://localhost:8000/admin?op=stats
The source data for the statistics is stored in XML. By default, the MDEX Engine Statistics page is rendered into HTML through an Endeca XSLT stylesheet, stats.xslt, that is installed in the ENDECA_ROOT/conf/dtd/xform directory. If your browser supports XSLT transformations (for example, Internet Explorer 6 and later), you can view the statistics as transformed by stats.xslt or you can modify the shipped stats.xslt stylesheet to provide a different transformation of the data. If your browser does not support XSLT transformations, or if you want to see the raw XML, rename or remove ENDECA_ROOT/conf/dtd/xform/stats.xslt.
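If you want to collect the statistics from a script rather than a browser, a minimal Python sketch might look like the following. The helper names admin_url and fetch_stats are hypothetical; only the URL pattern http://host:port/admin?op=stats comes from this guide. The response is returned as raw bytes and is not parsed here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def admin_url(host, port, op):
    """Build an MDEX Engine admin URL such as .../admin?op=stats."""
    return f"http://{host}:{port}/admin?{urlencode({'op': op})}"


def fetch_stats(host="localhost", port=8000, timeout=10):
    """Request the statistics page and return the response body as bytes."""
    with urlopen(admin_url(host, port, "stats"), timeout=timeout) as response:
        return response.read()


print(admin_url("localhost", 8000, "stats"))
# http://localhost:8000/admin?op=stats
```

Calling fetch_stats() requires a running Dgraph on the given host and port.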
Description
Performance Statistics
Various statistics (average, standard deviation, minimum, maximum, and total) on:
Queue length (in multithreaded mode only)
Number of threads busy (in multithreaded mode only)
Total processing time
Number of records (results)
Response size (in bytes)
Properties
The usage of memory per property.
Dimensions
The usage of memory per dimension.
Description
Connection details, as well as a list of all arguments the Dgraph was started with.
Description
Displays how much time the Dgraph has spent computing sorts and range filters. (This is typically done as background work.)
Description
Total and per-key statistics for the main cache. Total and per-key statistics for the page cache.
Description
The URL and total time in milliseconds for the ten queries with the largest total computation time (that is, queue time plus Dgraph processing time plus write time) made in the session. Details on the performance of specific features, including navigation, record filter, range filter, merchandising, record search, and snippeting. The number of result pages served, as well as format performance and byte size by average, standard deviation, minimum, maximum, and total. Information about the number of navigation pages, as well as performance, query size, and result size by average, standard deviation, minimum, maximum, and total. The total number of sorts performed, and the percentage of those sorts for each sort type. Information pertaining to the analytics features in Endeca Analytics. A finer-grained analysis of the performance of individual features.
Hotspots
Results
Navigation
Record Sorting
Analytics
Search
Note: If you modified the shipped stats.xslt stylesheet, the information might display differently.
overhead. A quicker way to check the aliveness of a Dgraph or an Agraph is by accessing the following URL:
http://DgraphServerNameOrIP:DgraphPort/admin?op=ping
or
http://AgraphServerNameOrIP:AgraphPort/admin?op=ping
The Dgraph or Agraph quickly returns a lightweight HTML response page with the following content:
dgraph host:port responding at date/time
or
agraph host:port responding at date/time
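A monitoring script can recognize the ping response by matching the line format shown above. The following Python sketch is an assumption-laden illustration: the name parse_ping and the sample date string are hypothetical, and the sketch does not strip any HTML markup that surrounds the message in the actual response page.

```python
import re

# Matches "dgraph host:port responding at date/time" (or "agraph ...").
PING_PATTERN = re.compile(r"\b(dgraph|agraph)\s+(\S+):(\d+)\s+responding at\s+(.+)")


def parse_ping(body):
    """Return the fields of a ping response, or None if the text does not match."""
    match = PING_PATTERN.search(body)
    if match is None:
        return None
    process, host, port, when = match.groups()
    return {"process": process, "host": host, "port": int(port), "time": when.strip()}


# Hypothetical response text; the real date/time format may differ.
print(parse_ping("dgraph localhost:8000 responding at Mon Jan 1 12:00:00 2007"))
```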
For example, if your Dgraph is running on your local machine and listening on port 8000, specify this:
http://localhost:8000/admin?op=audit
The source data for the auditing reports is stored in XML. By default, the MDEX Engine Auditing page is rendered into HTML through an Endeca XSLT stylesheet, audit.xslt, that is installed in the ENDECA_ROOT/conf/dtd/xform directory.
For example, an audit persistence file on the sample wine implementation might look like this:
audit-wine-0.xml
This convention ensures that each Dgraph creates a unique file. It makes it possible to maintain the audit persistence files for numerous Dgraphs in an application in the same directory without contention.
By default, the audit persistence file is written to a directory called persist that is located in the application's working directory. To direct it elsewhere, use the Dgraph flag --persistdir when you first create the Dgraph. Do not move or rename this directory after it has been created.
You should not delete the audit persistence file or attempt to edit it manually. Upon startup, the Dgraph checks for the presence of this file, and if it cannot find it or read it, it issues a warning message and creates a new one.
Note: If you see such a warning message when you first create a Dgraph, you can safely disregard it.
The Query Load statistic tracks the hour with the most queries in each calendar week, starting when you first run the Dgraph and persisting through process restarts.
All other auditing statistics constantly monitor the peak value over the course of a calendar week, and report the exact time when a value greater than the current peak value appears, starting when you first run the Dgraph and persisting through process restarts. Because these metrics are calculated over the course of a week, a change such as a deleted record is not reflected until the following week, when the peak value count is reset.
Description
Query Load
The peak number of queries per hour that the Dgraph has seen in the past week. In addition to the peak value, this metric also returns the peak interval, expressed as the peak hour (aggregated by hour).
Number of Records
The exact time with the peak number of records in the past week.
Number of Columns
The peak value for the total number of identified properties and dimensions across all records in the Dgraph. This value is calculated over the past week.
Number of Assignments
The total number of populated dimensions or properties for all records. This value is calculated over the past week.
Data Size
The total size, in bytes, of all user data.
Note: This may vary, depending on platform and on whether the machine is 32 or 64 bit.
Description
Basic connection and machine details. A list of all arguments the Dgraph was started with.
Note: This tab is identical to the one of the same name on the MDEX Engine Server Statistics page.
Chapter 17
Overview of the Forge logging system
About log levels
Logging topics
The command line interface
Description
FATAL
Indicates a problem so severe that you have to shut down.
ERROR
Non-fatal error messages.
WARN
Alerts you to any peculiarities the system notes. You may want to address these.
INFO
Provides status messages, even if everything is working correctly.
DEBUG
Provides all information of interest to a user.
Logging topics
All log messages are flagged with one or more topics. There are different types for different components, all logically related to some aspect of the component.
By selecting a level, you are requesting all feedback at that level of severity and greater. For example, by specifying the WARN level, you receive WARN, ERROR, and FATAL messages. The --logLevel option sets either the default log level, the topic log level, or both:
The default log level provides global logging for the component. This example
forge --logLevel WARN
logs all WARN level or higher messages. Note: Forge defaults to log all INFO or higher level messages if a default level is not specified.
The topic log level provides logging at the specified level for just the specified topic. This example
forge --logLevel baseline=DEBUG
overrides the default log level and logs all DEBUG messages and higher in the baseline topic.
If two different log levels are specified, either globally or to the same topic, the finer-grained level is used. In the case of this example
forge --logLevel INFO --logLevel WARN
all INFO level messages and higher are printed out. It is possible to specify both default and topic level logging in the same command to filter the feedback that you receive. For example, the command
forge --logLevel WARN --logLevel config=INFO --logLevel update=DEBUG
works as follows:
It logs all WARN or higher messages, regardless of topic.
It logs any message flagged with the config topic if it is INFO level or higher.
It logs any message flagged with the update topic if it is DEBUG level or higher.
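The filtering rules above can be modeled in a few lines of Python. This is a sketch of the documented behavior, not Forge's implementation: the function names are hypothetical, and it assumes that a message is logged when it satisfies either the default level or the level of any topic it is flagged with.

```python
LEVELS = ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"]  # finest-grained first


def finer(current, new):
    """Of two levels, return the finer-grained one (the documented tie rule)."""
    if current is None:
        return new
    return min(current, new, key=LEVELS.index)


def parse_log_levels(values, fallback="INFO"):
    """Split --logLevel values into a default level and per-topic levels."""
    default, topics = None, {}
    for value in values:
        if "=" in value:
            topic, level = value.split("=", 1)
            topics[topic] = finer(topics.get(topic), level)
        else:
            default = finer(default, value)
    # Forge logs INFO and higher when no default level is given.
    return (default or fallback), topics


def should_log(level, message_topics, default, topics):
    """Decide whether a message at `level` with the given topics is logged."""
    rank = LEVELS.index(level)
    if rank >= LEVELS.index(default):
        return True
    return any(t in topics and rank >= LEVELS.index(topics[t]) for t in message_topics)


default, topics = parse_log_levels(["WARN", "config=INFO", "update=DEBUG"])
print(should_log("INFO", ["config"], default, topics))   # True
print(should_log("DEBUG", ["config"], default, topics))  # False
print(should_log("DEBUG", ["update"], default, topics))  # True
```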
Status in 5.1
Supported.
Old meaning
Defaults to v (verbose) or the EDF_LOG_LEVEL environment variable. Verbose (all messages). Info (info, stat, warnings, and errors). Stat (stat, warnings, and errors). Warnings and errors. Errors. Quiet mode (errors). Silent mode (fatal errors). n/a n/a Printed out the timestamp when using --legacyLogFormat.
-vv -vi
Deprecated. Supported.
DEBUG INFO
INFO WARN ERROR ERROR FATAL DEBUG FATAL Has no effect. The timestamp is always printed now.
For information about the deprecation state of logging systems used in previous versions of the Endeca software, see the Endeca Migration Guide version 5.1.
Chapter 18
About the Forge metrics Web service
Enabling Forge metrics
Using Forge metrics
The MetricsService interface
Metric, which serves as a parent category for child metrics, without containing any data of its own. Attribute metric, such as the start time of the Forge being queried. For each attribute metric you request, you receive ID, Name, and Attribute Value (a string).
For each measurement metric you request, you receive ID, Name, Measurement Units (a string), and Measurement Value (a number).
Notes:
The Forge metrics Web service does not tell you what step Forge is on or its estimated time to completion.
The service is not long-lived; it exits when Forge does. For this reason, you cannot use this service to find out how long the Forge run took.
The Forge metrics Web service does not work in conjunction with parallel Forge.
In Web Studio, on the EAC Administration page.
In a provisioning file used with the eaccmd tool (for details on provisioning a Forge component, see page 158).
Programmatically, via the webServicePort on the ForgeComponentType object. For details, see page 255.
Outside of the Application Controller environment, you can also set or change the Web service port (and thus turn on Forge metrics) at the Forge command line. The command line argument for setting the metrics port is --wsport <port-number>.
getMetric method
The MetricsService interface consists of a single method, getMetric.
getMetric(MetricInputType getMetricInput)
Lists the collection of metrics in an application.
Parameters:
getMetricInput is a MetricInputType object consisting of a path to the node you want to query and a Boolean setting that allows you to exclude that node's children from the query.
Throws:
MetricFault is the error message returned when the method fails.
Returns:
getMetricOutput, a string collection of metrics.
MetricsService classes
The MetricsService interface contains the following classes:
MetricType
A class that describes a metric.
Properties
id is a unique string identifier for the metric.
displayName is the name for the metric, as it appears in the output file.
children is a collection of metric objects.
MetricListType
A class that describes a list of metrics.
Properties
metric is a collection of metrics comprising this MetricListType object.
MetricInputType
A class that describes the input to the getMetric method.
Properties
path is the path to the node you want to query. Null indicates top level,
MetricResultType
A class that describes the output returned by the getMetric method.
Properties
metric is an object of type MetricType.
AttributeType
An extension of MetricType, the AttributeType class describes an attribute metric.
Properties
value is a string describing the attribute.
MeasurementType
An extension of MetricType, the MeasurementType class describes a measurement metric.
Properties
value is a double representing the value of the measurement metric.
units is a string describing the unit of measure used by the metric.
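To make the relationship between these classes concrete, the following Python sketch mirrors MetricType, AttributeType, and MeasurementType as plain data classes and walks a metric tree. It is a local model for illustration, not the generated Web service stubs: the flatten helper, the "/" path separator, and the sample metric IDs are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetricType:
    id: str
    displayName: str
    children: List["MetricType"] = field(default_factory=list)


@dataclass
class AttributeType(MetricType):
    value: str = ""  # a string describing the attribute


@dataclass
class MeasurementType(MetricType):
    value: float = 0.0  # the measurement value
    units: str = ""     # the unit of measure


def flatten(metric, prefix=""):
    """Return (path, value) rows for every attribute or measurement in the tree."""
    path = f"{prefix}/{metric.id}"
    rows = []
    if isinstance(metric, MeasurementType):
        rows.append((path, f"{metric.value} {metric.units}"))
    elif isinstance(metric, AttributeType):
        rows.append((path, metric.value))
    for child in metric.children:
        rows.extend(flatten(child, path))
    return rows


# Hypothetical metric tree; real IDs come from the Forge being queried.
root = MetricType("forge", "Forge", [
    AttributeType("start", "Start time", value="t0"),
    MeasurementType("records", "Records processed", value=10.0, units="records"),
])
print(flatten(root))
```

A plain MetricType node contributes no row of its own, which matches its role as a parent category without data.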
Chapter 19
Cross-platform tools
Solaris and Linux tools
Solaris-specific tools
Linux-specific tools
Windows tools
Note: The tools listed here are not supported by Endeca and are subject to change. In addition, these suggestions are not meant to overrule your choice of other tools.
Cross-platform tools
The following tools are available in both UNIX and Windows versions. Tool
Ethereal
Description
Ethereal is an open source network protocol analyzer for both UNIX and Windows. It allows you to examine data from a live network or a capture file on disk. For information and downloads, see http://www.ethereal.com.
Tcpdump/Windump
Tcpdump (and its Windows version, Windump) are network traffic analysis tools. These tools can be used to watch and diagnose network traffic according to various complex rules. You can download Tcpdump from http://www.tcpdump.org. You can download Windump from http://www.winpcap.org/windump. Note: Tcpdump comes with most Linux distributions by default.
Description
Netperf is a network benchmarking tool that can be used to measure the throughput of many different types of TCP and UDP connections. Netperf provides tests for both unidirectional throughput and end-to-end latency.
Note: Be sure to compile netperf with histogram support.
To simulate the network traffic to an MDEX Engine with average result pages of 50,000 bytes, run netperf like this:
netperf -l 600 -v 2 -H remotehost -p 8899 -t TCP_CRR -- -r 200,50000
where:
-l is the length of the test in seconds
-v specifies verbose output level
-H indicates the host where netserver is running
-p indicates the port that was given to the netserver process
-t indicates the test to run. TCP_CRR is the TCP test that opens a new TCP connection for each request/response
Tool
Sar
Description
Sar reports system activity on single processor systems. It reports the status of counters in the operating system that are incremented as the system performs various activities. These include counters for CPU utilization, buffer usage, disk I/O activity, TTY device activity, switching and system-call activity, file access, queue activity, inter-process communications, swapping and paging. On Solaris, sar is part of the system activity reporter package. On Linux, it is part of the downloadable sysstat package.
iostat
The iostat utility iteratively reports terminal, disk, and tape I/O activity, as well as CPU utilization. On Solaris, iostat is built in to the operating system. On Linux, it is part of the downloadable sysstat package (see Linux-specific tools on page 339).
Solaris-specific tools
The following utilities are built into Solaris. Tool
prstat
Description
On Solaris, the prstat command displays information about active processes on the system. By default, prstat displays information about all processes sorted by CPU usage.
On multiprocessor machines, cpusar reports per-CPU statistics, and mpsar reports systemwide statistics.
Kstat reports many kernel parameters and statistics.
The lockstat utility gathers and displays kernel locking and profiling statistics. Lockstat allows you to specify which events to watch, how much data to gather for each event, and how to display the data.
Linux-specific tools
The following tools are available for Linux. Tool
sysstat
Description
The sysstat utilities package is a download for Linux that contains performance monitoring tools such as iostat, sar, and mpstat. Iostat and sar are described in Solaris and Linux tools on page 337; mpstat is described below. For information and downloads, see http://perso.wanadoo.fr/sebastien.godard.
Mpstat
Mpstat is the Linux multiprocessor load display utility. It displays system processor activity information on your screen for each of the processors serialized on your system.
Windows tools
The following tools are available for Windows. Tool
Task Manager
Description
The Windows Task Manager provides information about programs and processes running on your computer. It also displays the most commonly used performance measures for processes. You can access the Task Manager by right-clicking an empty area on the task bar on your Windows machine.
Performance Monitor
The Performance Monitor provides details about the resources used by specific components of the operating system and by programs that have been designed to collect performance data. You can access the Performance Monitor from the Control Panel by selecting Administrative Tools > Performance.
Tool
Other performance tools
Description
Sysinternals (http://www.sysinternals.com) offers useful freeware tools, including the following:
SECTION V
Appendices
Appendix A
Agidx options
Agraph options
Dgidx options
Dgraph options
Forge options
IMPORTANT: All options are case-sensitive.
Note: Keep in mind the following terminology equivalences, which may clarify the meaning of some options:
Bin equals record
Model equals dimension
Category equals dimension value
Attribute equals property
Feature link equals precedence rule
Edge equals refinement dimension value
Agidx options
Agidx is a program that runs in a distributed environment. It creates a set of Agidx indices and aggregates the Agraph index with the current data subset.
Usage
Agidx has the following usage:
agidx [-v] [--options] <input db_prefix list> <output db_prefix>
Description
Prefix for output generated previously by Agidx that should now be used as input. This option helps you incrementally build the Agidx index, allowing you to run Agidx against individual data subsets that have been generated by Dgidx. Verbose mode. Specify file path to which stdout/stderr should be remapped (default is to use default stdout/stderr for the process). Print version information and exit. Print this help message and exit.
--version --help
Agraph options
A distributed configuration requires an additional program called Agraph. The Agraph program is responsible for receiving requests from clients, forwarding the requests to the distributed MDEX Engines, and coordinating the results. From the perspective of the Endeca API, the Agraph program behaves identically to a Dgraph program. Usage The Agraph has the following usage:
agraph [-v] [--options] <db_prefix>
Description
Verbose mode. Print this help message and exit. Enable backwards compatibility, so that the Agraph can communicate with previous versions of the Presentation API. Only the previous two full versions are supported (i.e., 5.0.x and 4.8.x). Therefore, the value for <api-version> must be one of the following:
500 = for all 5.0.x versions of the API. 480 = for all 4.8.x versions of the API.
--child <host>:<port> Specify the location of a child Dgraph or Agraph process. Specify a configuration file to read on startup. The configuration file should contain arguments of the same format used on the command line (that is, it ignores whitespace, including newlines).
--config <filename>
Option
--fork
Description
(UNIX only) Causes the Agraph to fork off a new process to handle each request. (UNIX only) Set the maximum number of live child processes in --fork mode. Default value is 4. Change the path for the request log file (./agraph.reqlog is the default value). Set default maximum wait time (in seconds) for client connection shutdown. The default value is 1 second. Specify the maximum number of seconds the Agraph waits for the client to download data across the network. The default network timeout value is 30 seconds. Disable caching of hostname to IP number lookups for child Dgraphs. By default, the Agraph caches these name lookups to improve performance. Disable inclusion of implicit refinement dimension values in computed refinement sets. Implicit refinements are dimension values that are assigned to all records in the current result set, and whose selection therefore does not narrow the results. Do not process merchandising (that is, dynamic business rule) results from children. These are processed by default. Do not return results if any child fails to respond. Specify file path to which stdout/stderr should be remapped. (The default is to use default stdout/stderr for the process.)
--log <path>
--net-closetimeout
--net-timeout
--nodnscache
--noimplicit
--nomerch
Option
--pidfile <pidfilename>
Description
Specify the file to write the process ID (pid) to. If unspecified, the default name of the pid file depends on how the Agraph starts. Running the Agraph in a Control System environment (deprecated) or from the command line creates a default file named agraph.pid. Running the Agraph in an Endeca Application Controller environment creates a default file named agraph-S0-R0.pid.
--port <num>
Specify the port that the Agraph listens to for user queries on the associated host. Default is 8888. Specify initial record list radius (tuning parameter; the default is 100). Create dynamic record properties indicating the relevance rank assigned to record search results. Print version information and exit.
--radius <num>
--stat-brel
--version
Dgidx options
The Dgidx program indexes the tagged Endeca records that were prepared by Forge, and creates the proprietary indices for the Endeca MDEX Engine. Usage The usage of Dgidx is as follows:
dgidx [-cCqvS] [--options] <data export file> <output db_prefix>
Description
Quiet mode. Verbose mode. Enable compound dimension search for the application. Use of this option increases indexing time. However, if this option is not enabled at index time, compound dimension results (multiple-dimension-value results) are not returned by the MDEX Engine. Compute and report coverage statistics for dimensions and properties. Compute dimension value equivalence classes as a space-saving optimization. This adds time to the indexing phase, but reduces the size of the index. The default is to search leaf assignments only. Print the help message and exit. Deprecated. Do not delete unused dimension values from the system. Note: From version 4.8, this behavior is now the default.
--cov
--equivopt
--help --keepcats
Option
--lang <lang-id>
Description
Assume all documents are in the specified language. The default for <lang-id> is en. For details, see the Internationalized Data section in the Endeca Developers Guide. Ignore character accents when indexing text. Use ISO Latin 1 character mappings for international characters when performing search indexing. Note that the accents are folded down before indexing, so only a single form is indexed. If --wildcard indexing is enabled, specifies the minimum text substring length to index. Generally, this value should not be modified (default is 1). Disable strict attribute checking. Allows records to retain property values for properties with no property (or <PROP_REF> element) defined in the navigation configuration file, and in the Properties view of Developer Studio. Do not do XML validation while reading the XML export file. This option only makes a difference if the export file is in XML format. Limit the number of records that Dgidx reads. Compute the inverted index using the slower but more memory-efficient offline method. Specify the number of tmp files that should be used during offline indexing of the inverted index (the default is 8). Specify file path to which stdout/stderr should be remapped (the default is to use default stdout/stderr for the process).
--latin1
--ngram_min <value>
--nostrictattrs
--noxmlvalidate
--offline_tmpn <num>
Option
--pos_backcompat
Description
Deprecated. Instructs Dgidx to create positional indexes using the 4.8.x configuration model. This flag is provided as a migration convenience to replicate 4.8.x indexing behavior. To replicate 4.8.x behavior for creating positional indexes, run the flag and also select Enable positional indexing for a dimension or property. If you run the flag and do not select Enable positional indexing for a dimension or property, then Dgidx does not create a positional index. (In versions 5.0.0 and higher, Dgidx creates a positional index by default for each dimension and property.)
--sort <spec>
Specify a default sort specification for the data set. The format of <spec> is:
key|dir
where key is the name of a property or dimension on which to sort and dir is either asc for ascending or desc for descending (if not specified, the order is ascending). key can also be a geocode property, as in this example:
Location(43,73)|desc
You can specify multiple sort keys in the format:
key_1[|dir_1]||key_2[|dir_2]||...||key_n[|dir_n]
If you specify multiple sort keys, the records are sorted by the first sort key, with ties being resolved by the second sort key, whose ties are resolved by the third sort key, and so on.
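The --sort specification format can be checked before you pass it to Dgidx. The following Python sketch parses a spec into (key, direction) pairs according to the format described above. The function name parse_sort_spec and the property names Price and Name are hypothetical, and the sketch assumes that key names do not themselves contain the | character.

```python
def parse_sort_spec(spec):
    """Parse 'key_1[|dir_1]||key_2[|dir_2]||...' into (key, direction) pairs."""
    keys = []
    for part in spec.split("||"):          # '||' separates sort keys
        key, _, direction = part.partition("|")  # '|' separates key and direction
        direction = direction or "asc"     # ascending when not specified
        if not key or direction not in ("asc", "desc"):
            raise ValueError(f"invalid sort key: {part!r}")
        keys.append((key, direction))
    return keys


print(parse_sort_spec("Location(43,73)|desc"))
# [('Location(43,73)', 'desc')]
print(parse_sort_spec("Price|desc||Name"))
# [('Price', 'desc'), ('Name', 'asc')]
```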
Option
--spellmode <mode>
Description
Specify the spelling correction mode for the application. Supported modes are:
--tmpdir <dir>
Dgraph options
You start the MDEX Engine by running a program called Dgraph, and pointing it at a set of indices prepared by the Data Foundry. The Dgraph has a number of options that allow you to adjust the MDEX Engine (for example, you can tweak spelling, caching, and so forth). Usage The usage of Dgraph is as follows:
dgraph [-?Adv] [--options] <db_prefix>
Description
Print the help message and exit. Disallow server shutdown/restart through admin URLs. Start in debug mode. Verbose mode. Print information about each request to stdout. Compute counts for root dimension values and any intermediate dimension value selections. This matches the default behavior in earlier versions of the MDEX Engine. By default, the Dgraph only computes refinement counts for proper refinements (in other words, for actual refinement dimension values). It does not compute counts for root dimension values or for any intermediate dimension value selections.
-d -v
--ancestor_counts
Option
--back_compat <api-version>
Description
Enable backwards compatibility, so that the Dgraph can communicate with previous versions of the Presentation API. Only the previous two full versions are supported (i.e., 5.0.x and 4.8.x). Therefore, the value for <api-version> must be one of the following:
500 = for all 5.0.x versions of the API. 480 = for all 4.8.x versions of the API.
--backlog-timeout <time in seconds>
Specify the wait limit for a query that has been read and queued for processing. After n seconds spent waiting in the process queue, the Dgraph responds with a timeout message. The default is 60 seconds.
--cmem <MB>
Specify the maximum memory usage in MB for the MDEX Engine main cache. When --cmem is not specified, a default size of 256 MB is used.
--config <path>
Specify a configuration file to read on startup. The configuration file should contain arguments of the same format used on the command line (that is, it ignores whitespace, including newlines).
--csrch_nnav
Deprecated. Instead, use the SEARCH_INERT_DVALS attribute of DIMSEARCH_CONFIG in your project's Dimsearch_config.xml file. Allow non-navigable dimension values (such as dimension roots) in dimension value search results. Normally, these dimension values are dynamically filtered out of dimension value search results.
--deadends
Allow dead-end refinement options.
Option
--disable_fast_aspell
Description
Disable fast mode for the aspell spelling module. If you disable fast mode, it decreases the performance of the spelling correction, but may allow additional queries to be corrected. When the fast mode is enabled, it can significantly speed up applications that use spelling correction features with the aspell module. The fast mode is used by default.
--dtag <data-tag>
Specify the data tag to send with all result XML objects. The default is to use <db_prefix> as the data tag.
--dym
Enable "did you mean" explicit query spelling suggestions for full-text search queries.
--dym_hthresh <thresh>
Specify the threshold number of hits at or above which "did you mean" suggestions will not be generated. The default is 20.
--dym_nsug <count>
Specify the maximum number of "did you mean" query suggestions to return for any query. The default is 1.
--dym_sthresh <thresh>
Specify the threshold spelling correction score for words used by the "did you mean" engine. The default is 175.
--edebug
Deprecated. Enable refinement verbose/debugging messages.
--esamp <num>
Deprecated. Specify the maximum number of records to sample during refinement computation. The default is 64 records. Larger values can improve edge-ranking quality, but may reduce performance.
--esampmin <num>
Specify the minimum number of records to sample during refinement computation. The default is 0. Larger values can improve dynamic refinement ranking quality but may reduce performance.
Option
--esampr <num>
Description
Deprecated. For dynamically-ranked dimensions, specify the maximum number of dimension values per top N refinements above which the refinement algorithm will reduce to exclusively bottom-up mode. The default is 4.
--ethresh <num>
Deprecated. Specify the threshold number of records below which strict bottom-up refinement LCA computation is used. The default is 128 records. This is a performance tuning parameter. The best value depends on the structure and scale of the input data set.
--ftrnk_thrsh <thresh>
Deprecated. Set a threshold number of hits above which exact/substring based rankings will not be computed. The default is no threshold.
--help
Print the help message and exit.
--implicit_exact
Disable the default approximate computation of implicit refinements. This option is not a recommended setting. If this option is not enabled, dimension values without full coverage of the current result record set may sometimes be returned as implicit refinements, although the probability of such false implicit refinements is minuscule.
--implicit_sample
Specify the maximum number of records to sample per query. The default is 1024. In approximate computation mode (default), this parameter allows a trade-off between performance and the likelihood of incorrect implicit refinements being returned. In implicit_exact mode, this option is simply a performance tuning parameter that can be used to trade off record sampling work for index access work.
--lang <lang-id>
Assume all queries are in the specified language. The default is en. For details, see the Internationalized Data section in the Endeca Developer's Guide.

--latin1
Ignore character accents when handling search requests, and use ISO Latin 1 character mappings when processing search requests.

--log <path>
Specify the path for the Dgraph request log file. If unspecified, the default name of the request log file depends on how the Dgraph starts. Running the Dgraph in a Control System environment (deprecated) or from the command line creates a default log file named dgraph.reqlog. Running the Dgraph in an Endeca Application Controller environment creates a default log file named dgraph-S0-R0.log.

--log_stats <path>
Specify the path and filename for the Endeca Query Language statistics log. By default, this log is turned off; specifying this flag will activate logging of statistics for Endeca Query Language requests.

--log_stats_thresh <value>
Sets the threshold above which statistics information for an Endeca Query Language request will be logged. The value is specified in milliseconds (1000 milliseconds = 1 second). The value can also be specified in seconds by adding a trailing s to the number, such as 1s for 1 second. The default is 60000 milliseconds (1 minute). Note that this flag is dependent on the --log_stats flag being used.

--memusage
Show memory usage of the data structures of the Dgraph.

--merch_debug
Deprecated. Display verbose debugging messages during merchandising rule processing.
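The value format described for --log_stats_thresh (plain milliseconds, or seconds with a trailing s) can be illustrated with a short sketch. This is not Dgraph code: the helper name parse_stats_thresh_ms is hypothetical, and it models only the format and default stated in the description above.

```python
DEFAULT_THRESH_MS = 60000  # default stated above: 60000 ms (1 minute)


def parse_stats_thresh_ms(value=None):
    """Convert a --log_stats_thresh style value to milliseconds.

    Hypothetical helper: a bare number is taken as milliseconds and a
    trailing "s" as seconds, as in the option description.
    """
    if value is None:
        return DEFAULT_THRESH_MS
    text = value.strip()
    if text.endswith("s"):
        return int(text[:-1]) * 1000
    return int(text)


print(parse_stats_thresh_ms("1s"))    # one second expressed in milliseconds
print(parse_stats_thresh_ms("2500"))  # already in milliseconds
print(parse_stats_thresh_ms())        # falls back to the documented default
```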
--net-close-timeout
Specify the maximum wait time (in seconds) for client connection shutdown. The default value is 1 second.

--net-timeout
Specify the maximum number of seconds the Dgraph waits for the client to download data across the network. The default network timeout value is 30 seconds.

--noctrct
Do not return information about implicit dimensions with node results, when displaying refinements in navigation results. This flag lets you optimize performance for applications where it is not necessary to present the implicit dimensions to the users in navigation results. If you specify this flag, the MDEX Engine still computes the implicit dimensions with node results, but they are not included in the navigation results that are displayed to the users.

Disable filtering for dynamic business rules.

Specify file path to which stdout/stderr should be remapped (the default is to use default stdout/stderr for the process). Running the Dgraph in an Endeca Application Controller environment creates a default file named dgraph-S0-R0.out.

--pcmem
Specify the maximum memory usage in MB for the page cache. The default is 32 MB.

--perf_msg
Deprecated. Display verbose performance debugging messages during core Dgraph navigation computations.
--persistdir
Directs the Dgraph audit persistence file (written by default to a directory called persist that is located in the application's working directory) to a directory of your choice. For details about the audit persistence file, see page 317.
IMPORTANT: Use the --persistdir flag only when you first create the Dgraph. Do not move or rename this directory after it has been created.

--pidfile <pidfile-path>
Specify the file to write the process ID (pid) to. The default is ./dgraph.pid.

--port <num>
Specify the port to use in server (non-interactive) mode. The default is 5555.

--relrnk_help
Deprecated. Causes the Dgraph to display a help message describing the set of available dynamic search relevance ranking modules, and the syntax for specifying relevance ranking strategies. For details on the relevance ranking modules, see the Using Relevance Ranking section in the Endeca Developer's Guide.

--relrnk_msg
Deprecated. Display verbose information about relevance ranking during search query processing.

Deprecated. Set the relevance ranking strategy for dimension value search.

Deprecated. Display verbose information about record filter performance.

--search_max <num>
Specify the maximum number of terms for text search. The default is 10.
--snip_cutoff <num words>
Limit the number of words in a property that the MDEX Engine evaluates to identify the snippet. If a match is not found within <num> words, the MDEX Engine does not return a snippet, even if a match occurs later in the property value. If <num words> is unspecified, the default is 500. (The 500-word default applies even if the flag is not specified at all in the Dgraph options.)

Globally disable snippeting.

Specify location of spelling data files. Parameter should be a full path to a directory containing the needed aspell support files for spelling correction features (see --dym, --spl, and --spld options). Note that this path must be an absolute path (relative paths are not supported). In addition, this is a path to a directory containing at least the generic pspell/aspell support files. This does not need to be the same as the location of the .spelldat file for the indexed data set. The Dgraph typically requires write permissions in this directory, unless a correct or writable .pwli file is already available in this directory.

--spell_bdgt <num>
Set maximum number of variants considered for spelling and did you mean correction (the default is 32).

--spell_glom
Allow cross-property suggestions, and count cross-property matches when evaluating the frequencies of suggestions. Normally, suggestions must match results in a single property value.

--spell_msg
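The effect of --snip_cutoff can be pictured with a small model. The sketch below is only an illustration of the rule stated above (a match beyond the cutoff yields no snippet); the function name and the whitespace-based word splitting are assumptions, not the MDEX Engine's actual tokenization.

```python
def snippet_match_index(property_value, term, cutoff=500):
    """Return the index of the first word equal to `term` within the first
    `cutoff` words, or None if the match occurs later or not at all.

    Hypothetical model: words are split on whitespace and compared
    case-insensitively.
    """
    words = property_value.split()
    for index, word in enumerate(words[:cutoff]):
        if word.lower() == term.lower():
            return index
    return None


text = "alpha beta gamma delta"
print(snippet_match_index(text, "gamma", cutoff=3))  # match inside the cutoff
print(snippet_match_index(text, "delta", cutoff=3))  # match beyond the cutoff
```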
--spell_nobrk
Disable word-break analysis in the suggestion engine. Normally, in addition to considering spelling corrections, the suggestion engine considers alternate word separation points for the query to generate suggestions for did you mean and auto-correct.

--spl
Enable auto-suggest spelling corrections for fulltext search.

--spl_hthresh <thresh>
Specify the minimum number of hits at or above which auto-correct suggestions will not be generated for full text searches. The default is 1, meaning that if there are one or more hits for a user's full text search, then auto-correct does not provide spelling suggestions. Stated differently, if you use the default of 1 and there are zero (0) hits for a user's search, then spelling auto-correct does engage and provides suggestions for alternate keyword spellings.

--spl_nsug <count>
Specify the maximum number of auto-correct suggestions to return for any full text search query. The default is 1.

--spl_sthresh <thresh>
Specify the threshold spelling correction score for words used as auto-correct suggestions. The default is 125.

--srchfltr
Deprecated. Filter dimension values out of term search results when an ancestor dimension value is already available in the search results.

--sslcertfile <certfile-path>
Specify the path of the eneCert.pem certificate file that will be used by the Dgraph to present to any client for SSL communications. If not given, SSL is not enabled for Dgraph communications.
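The interaction between --spl_hthresh and --spl_nsug can be summarized in a few lines. This is a sketch of the rule as worded above, not the Dgraph's implementation; both function names are hypothetical.

```python
def autocorrect_engages(hit_count, hthresh=1):
    """Auto-correct runs only when the hit count is below the threshold."""
    return hit_count < hthresh


def limit_suggestions(candidates, hit_count, hthresh=1, nsug=1):
    """Return at most `nsug` suggestions, or none if auto-correct is skipped."""
    if not autocorrect_engages(hit_count, hthresh):
        return []
    return list(candidates)[:nsug]


print(limit_suggestions(["merlot", "merlin"], hit_count=0))  # zero hits: suggest
print(limit_suggestions(["merlot", "merlin"], hit_count=1))  # one hit: no suggestion
```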
--sslcafile <CA certfile-path>
Specify the path of the eneCA.pem Certificate Authority file that the Dgraph will use to authenticate SSL communications with other Endeca components. If not given, SSL mutual authentication is not performed.

--sslcipher <cipher-list>
Set one or more cipher names (such as RC4-SHA) that specify the minimum cryptographic algorithm that the Dgraph will use during the SSL negotiation. If multiple ciphers are specified, the names must be separated by colons.

--stat-all
Enable all available dynamic dimension value attributes. Note that this option has performance implications and is not intended for production use.

--stat-bins-cutoff <nbins>
Set the cutoff for record counts. Once there are this many records associated with a refinement dimension value, the record count algorithm will stop and return this number or a number higher than it.

--stat-bins-thresh <thresh>
Set the threshold for stat-bins (that is, the maximum number of records above which record counts will not be computed). By default, stat-bins runs with no threshold.

--stat-brel
Create dynamic record attributes indicating the relevance rank assigned to fulltext search result records.

--stat-bwgt
Create dynamic record attributes indicating the weight for all records returned.

--stat-rel
Create dynamic dimension value attributes indicating the relevance ranking score (for dimension value search results).
--stat-srnk
Deprecated. Create dynamic dimension value attributes indicating the static rank for all dimension values.

Direct all output to syslog.

Set a limit on the number of words in a user's search query that are subject to thesaurus replacement. The default value of <limit> is 3. This means that up to 3 words in a user's search query can be replaced with thesaurus entries. If there happen to be more terms in the query that match thesaurus entries, say terms 4 and 5, then terms 4 and 5 are not replaced by thesaurus expansion. This option is intended as a performance guard against very expensive thesaurus queries. Lower values improve thesaurus engine performance. For more information, see the Using Stemming and Thesaurus section in the Endeca Developer's Guide.

--thesaurus_msg
Deprecated. Enables verbose output for the thesaurus engine.

--thesaurus_multiword_nostem
Specify that words in a multiple-word thesaurus form should be treated like phrases and should not be stemmed, which will increase performance for some query loads. Single-word terms will be subject to stemming regardless of whether this flag is specified. This flag prevents the Dgraph from expanding multi-word thesaurus forms by stemming. Thesaurus entries continue to match any stemmed form in the query, but multi-word expansions only include explicitly listed forms. To get the multi-word stemmed thesaurus expansions, the various forms must be listed explicitly in the thesaurus.
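The thesaurus word limit described above (default 3) can be modeled as follows. This is a simplified sketch: the function name, the dictionary-based thesaurus, and the choice to expand matching terms in query order are all assumptions made for illustration, not the thesaurus engine's actual behavior.

```python
def expand_query(terms, thesaurus, limit=3):
    """Return, for each query term, the list of forms it may match.

    Only the first `limit` terms that have thesaurus entries are expanded;
    later matching terms are left as they are (hypothetical model).
    """
    expanded = []
    replaced = 0
    for term in terms:
        if term in thesaurus and replaced < limit:
            expanded.append([term] + thesaurus[term])
            replaced += 1
        else:
            expanded.append([term])
    return expanded


# Hypothetical thesaurus with entries for four of the five query terms.
thesaurus = {"red": ["crimson"], "wine": ["vino"], "glass": ["goblet"], "set": ["kit"]}
for forms in expand_query(["red", "wine", "glass", "set", "box"], thesaurus):
    print(forms)
```

With the default limit, the fourth matching term ("set" here) is passed through unexpanded, which is the guard against expensive queries that the option description mentions.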
--threads <num>
Specify the number of query threads. If the specified value is 0, the Dgraph runs in non-threaded mode. If the specified value is greater than 0, the Dgraph runs in threaded mode executing the specified number of query threads. The default is 0 (non-threaded). In threaded mode, additional threads are also started to execute internal maintenance tasks.

--tmpdir <dir>
Specify the path to a temporary directory to be used to hold temporary files (the default is the base directory of db_prefix).

--unctrct
Instruct the Dgraph not to compute implicit dimensions, and to compute and present only explicitly specified dimensions, when displaying refinements in navigation results. Specifying this flag does not reduce the size of the resulting record set that is being displayed. Be aware that if you use this flag, in order to receive meaningful navigation refinements, you need to make top-level precedence rules work for ALL outbound queries. (Because the Dgraph does not compute implicit dimensions, it also no longer uses precedence rules for all queries, which it otherwise does by default.) You can make top-level precedence rules work for all your queries by appending the ID of the root of the primary dimension to the navigation state on each outbound query (for example, use N=xxx instead of N=0 in your query). If you do not do this, some of your queries may return meaningless refinement options. Specifying this flag lets you improve run-time performance of the MDEX Engine. For more information on ways of improving the run-time performance of the MDEX Engine, see the "Displaying refinement dimension values" section in the Endeca Performance Tuning Guide.
--updatedir <dir>
Specify the directory into which completed partial update files will be placed. Partial update files are also read from this directory. For more information, see the Implementing Partial Updates section in the Endeca Information Transformation Layer Guide.

--updatelog
Specify the file for update related log messages. If unspecified, the default name of the update file depends on how the Dgraph starts. Running the Dgraph in a Control System environment (deprecated) or from the command line creates a default file named dgraph.updatelog. Running the Dgraph in an Endeca Application Manager environment creates a default file named dgraph-S0-R0-update.log.

--updateverbose
Show verbose messages while processing updates.

--validate_data
Validate that all indexed data loads and then exit.

--version
Print version information and exit.

--wb_maxbrks
In word-break analysis, specify the maximum number of breaks to insert or remove per query. The default is 1.

--wb_minbrklen
In word-break analysis, specify the minimum length of a new word-break term. The default is 2.

--wb_noibrk
In word-break analysis, disable word-break insertion analysis.

--wb_norbrk
In word-break analysis, disable word-break removal analysis.
--wildcard_approx <mode>
Enable approximate wildcard search query matching, which is faster than default exact wildcard matching, but may return some false positive matches. (Use larger values of --ngram_max at indexing time to decrease the likelihood of false positives with this option.) Supported values for the <mode> parameter are:

--wildcard_msg

--whymatch

--whymatchConcise
--wordinterp
Enable computation of word interpretation dynamic supplement (or see-also) objects, which report on alternate forms of user query terms considered by the text search engine while processing fulltext (record) search requests. For more information, see the Using Word Interpretation section in the Endeca Developer's Guide.
Forge options
The Forge program transforms your raw data into tagged Endeca records. Forge references the information in the pipeline you create with Developer Studio to perform its transformations.
Usage
The usage of Forge is as follows:
forge [-bcdinov] [--options] <Pipeline-XML-File>
<Pipeline-XML-File> can be a relative path or use the file://[hostname]/ protocol.
Options
Forge takes the following options:

-b <cache-num>
Specify the maximum number of records that the record caches should buffer. This may be set individually in the Maximum Records field of the Record Cache editor in Developer Studio.
-c <name=value>
Forge has a set of XML entity definitions whose values can be overridden at the command line, such as current_date, current_time, and end_of_line. You can specify a replacement string for the default entity values using the -c option, or in an .ini file specified with -i (described below). The format is:
<configValName=configVal>
For example:
end_of_line=\n
which would be specified on the command line with:
-c end_of_line=\n
or included as a line in an .ini file specified with -i. This allows you to assign pipeline values to Forge at the command line. In the above example, you would specify &end_of_line; in your pipeline file instead of hard-coding \n, then invoke Forge with the -c option shown above. Forge would substitute \n whenever it encountered &end_of_line;. For a complete list of entities and their default values, see the ENTITY definitions in Endeca_Root/conf/dtd/common.dtd.

-d <dtd-path>
Specify the directory containing DTDs (overrides the DOCTYPE directive in XML).

-i <ini-filename>
Specify an .ini file that contains XML entity string replacements. Each line must be in this form:
<configValName=configVal>
See the description of the -c option for details.

-n <parse-num>
Specify the number of records to pull through the pipeline. This option is ignored by the record cache component.
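The entity override mechanism described for -c and -i can be sketched as a simple text substitution. The code below is an illustration only: the function names and the ROW_DELIM element are hypothetical, and letting -c values take precedence over .ini values is an assumption, since the text above does not state an order.

```python
import re


def load_overrides(c_args, ini_lines=()):
    """Collect name=value overrides from .ini lines, then from -c arguments.

    Assumption: later entries win, so -c values override .ini values.
    """
    overrides = {}
    for line in list(ini_lines) + list(c_args):
        name, sep, value = line.partition("=")
        if sep:
            overrides[name.strip()] = value
    return overrides


def substitute_entities(text, overrides):
    """Replace &name; references that have an override; leave others intact."""
    def replace(match):
        return overrides.get(match.group(1), match.group(0))
    return re.sub(r"&([A-Za-z_][\w.-]*);", replace, text)


# Equivalent of passing: -c end_of_line=\n
overrides = load_overrides(["end_of_line=\\n"])
print(substitute_entities("<ROW_DELIM>&end_of_line;</ROW_DELIM>", overrides))
```

Entities without an override (such as current_date in this sketch) are left as written, which mirrors the idea that only the values you supply replace the defaults.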
-o <filename>
Specify an output file for messages.

-v[f|e|w|i|d]
Set the global log level. See --logLevel for corresponding information. If the -v option is omitted, the global log level defaults to d (DEBUG) or the value set in the EDF_LOG_LEVEL environment variable. If the -v option is used without a level, it defaults to d (DEBUG).
f = FATAL messages only.
e = ERROR and FATAL messages.
w = WARNING, ERROR, and FATAL messages.

--clientNum <num>

--combineWarnCount <num>
--compression <num> | off
Instruct Forge to compress the output to a level of <num>, which is 0 to 9 (where 0 = minimum, 9 = maximum). Specify off to turn off compression.

--connectRetries <num>
Specify the number of retries (-1 to 100) when connecting to the server. The default is 12; -1 means retry forever. Requires the --client option.

--dbRecCache <value>
Deprecated. Specify the global disk backed record cache setting (<value> is either NONE or IN_MEMORY_INDEX).

--encryptKey [user:]<password>
Encrypt a key pair so that only Forge can read it. For details on this option, see the Implementing the Endeca Crawler section in the Endeca Data Foundry Guide.

--help [option]
Print full help if used with no options. Prints specific help with these options (option names and arguments are case sensitive):

--ignoreState
--indexConfigDir <path>
Instruct Forge to copy index configuration files from the specified directory to its output directory.

--inputDir <path>
Instruct Forge to load input data from this directory. <path> must be an absolute path and will be used as a base path for the pipeline. Any relative paths in the pipeline will be relative to this base path.
Note: If the pipeline uses absolute paths, Forge ignores this flag.

--input-encoding <encoding>
Deprecated. Specify the encoding of non-XML input files.

--javaArgument <java_arg>
Prepend the given Java option to the Java command line used to start a Java virtual machine (JVM).

--javaClasspath <classpath>
Override the value of the Class path field on the General tab of the Record adapter, if one is specified. If the Record adapter has a Format setting with JDBC selected, then Class path indicates the JDBC driver. If the Record adapter has a Format setting with Java Adapter selected, then Class path indicates the absolute path to the custom record adapter's .jar file.

--javaHome <java_home>
Specifies the location of the Java runtime engine (JRE). This option overrides the value of the Java home field on the General tab of a Record adapter, if one is specified. The --javaHome setting requires Java 2 Platform Standard Edition 5.0 (aka JDK 1.5.0) or later.

--logDir <path>
Instructs Forge to write logs to this directory, overriding any directories specified in the pipeline.
--logLevel (<topicName>=) <logLevel>
Set the global log level and/or topic-specific log level. If this option is omitted, the value defaults to INFO or to that set in the EDF_LOG_LEVEL environment variable. For corresponding information, see the -v option. For more information about Forge logging, see page 321. Possible log levels are:
FATAL = FATAL messages only.
ERROR = ERROR and FATAL messages.
WARNING = WARNING, ERROR, and FATAL messages.

--numClients <num>
--numPartitions <num>
Specify the number of Dgidx instances available to Forge. This number corresponds to the number of Dgraphs, which in turn corresponds to the number of file sets Forge creates. This option overrides the value of the NUM_IDX attribute in the ROLLOVER element of your project's Pipeline.epx file, if one is specified.

--outputDir <path>
Instruct Forge to save output data to this directory, overriding any directories specified in the pipeline.

--outputPrefix <prefix>
Override the value specified in the Output prefix field of the Indexer Adapter or Update Adapter editors in your Developer Studio pipeline.

--perllib <dir>
Add <dir> to Perl's library path. May be repeated.

File in which to store process ID (PID).

Print records as they are produced by each pipeline component. If number is specified, start printing after that number of records have been processed.

--retryInterval <num>
Specify the number of seconds (0 to 60) to sleep between connection attempts. The default is 5. Requires the --client option.

--server <portNum>
Run as a server and listen on the specified port. Requires the --numClients option.
--spiderThrottle <wait>:<expression_type>:<expression>
During a crawl, throttle the rate at which URLs are fetched by the spider, where:
<wait> is the fetch interval in seconds.
<expression_type> specifies the type of regular or host expression to use:

--sslcertfile <certfile-path>

--sslcipher <cipher>
--tmpDir <path>
Instruct Forge to write temporary files in the specified directory, overriding any directories specified by environment variables. The <path> value is interpreted as being based in Forge's working directory, not in the directory containing Pipeline.epx.

--time <comp>
Timing statistics (comp = time each component).

--timeout <num>
Specify the number of seconds (from -1 to 300) that the server waits for clients to connect. The default is 60; -1 means wait forever. Requires the --server option.

Print out the current version information.

Start the Forge metrics Web service, which is off by default. It listens on the port specified.
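Several Forge options above state numeric ranges and dependencies on other options. The following sketch shows one way a wrapper script might check those ranges and assemble an argument list before invoking Forge. It is not part of Forge: the helper names, the port number 8181, and the pipeline file name are hypothetical, and only options documented above are used.

```python
import shlex

# Ranges quoted from the option descriptions above.
RANGES = {
    "--connectRetries": (-1, 100),
    "--retryInterval": (0, 60),
    "--timeout": (-1, 300),
}


def check_range(option, value):
    """Return [option, value] as strings, or raise if value is out of range."""
    low, high = RANGES[option]
    if not low <= value <= high:
        raise ValueError(f"{option} must be between {low} and {high}, got {value}")
    return [option, str(value)]


def forge_command(pipeline, entities=None, extra=()):
    """Build an argument list: forge [options] <Pipeline-XML-File>."""
    args = ["forge"]
    for name, value in (entities or {}).items():
        args += ["-c", f"{name}={value}"]
    args += list(extra)
    args.append(pipeline)
    return args


# --timeout requires --server, and --server requires --numClients.
cmd = forge_command(
    "Pipeline.epx",
    entities={"end_of_line": "\\n"},
    extra=["--server", "8181", "--numClients", "2"] + check_range("--timeout", 120),
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps values such as \n from being reinterpreted by a shell; shlex.join is used here only to display the result.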
Appendix B
Default Ports
Port                                          Default
Endeca MDEX Engine, user query port           8000
Endeca Logging and Reporting Server port      8002
  Note: The Log Server port number can be no larger than 32767.
Endeca Control System JCD port                8088
  Note: The JCD is deprecated in this release.
Endeca HTTP service port                      8888
Endeca HTTP service shutdown port             8090
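When you change ports, it can help to check the new assignments against the defaults and the Log Server limit noted above. The sketch below is only an illustration: the dictionary keys and function names are hypothetical, and the lower bound of 1 for a valid port is an assumption.

```python
# Default ports as listed in this appendix.
DEFAULT_PORTS = {
    "MDEX Engine user query": 8000,
    "Logging and Reporting Server": 8002,
    "Control System JCD (deprecated)": 8088,
    "HTTP service": 8888,
    "HTTP service shutdown": 8090,
}

LOG_SERVER_MAX_PORT = 32767  # limit stated in the note above


def log_server_port_ok(port):
    """Check a Log Server port against the documented upper limit."""
    return 1 <= port <= LOG_SERVER_MAX_PORT


def find_conflicts(ports):
    """Return a dict of port number -> services for ports used more than once."""
    by_port = {}
    for service, port in ports.items():
        by_port.setdefault(port, []).append(service)
    return {port: names for port, names in by_port.items() if len(names) > 1}


custom = dict(DEFAULT_PORTS, **{"HTTP service": 8000})
print(find_conflicts(DEFAULT_PORTS))  # the defaults do not collide
print(find_conflicts(custom))         # 8000 is now claimed twice
print(log_server_port_ok(40000))      # above the Log Server limit
```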
Appendix C
The script copies the project configuration files from Web Studio to the host running Forge, using emgr_update.pl. It runs Forge, and then copies the post-Forge dimensions back to Web Studio, again using emgr_update.pl. The script copies the Forge output files onto the host running the Indexer, and then runs the Indexer. It brings down the Dgraph if it is running, copies the Indexer output files, and uses emgr_update.pl to download associated application rules, redirects, and thesaurus entries. The script starts the Dgraph.
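The order of operations described above can be laid out as a dry-run sketch. This is not the shipped baseline update script (which is Java); the function name and the step wording are hypothetical, and the code only lists the steps rather than invoking emgr_update.pl, Forge, the Indexer, or the Dgraph.

```python
def baseline_update(dgraph_running, run=print):
    """Walk through the baseline update steps in the order described above.

    `run` receives a short description of each step; a real script would
    call the corresponding tool instead of printing.
    """
    steps = [
        "copy project configuration from Web Studio to the Forge host (emgr_update.pl)",
        "run Forge",
        "copy post-Forge dimensions back to Web Studio (emgr_update.pl)",
        "copy Forge output files to the Indexer host",
        "run the Indexer",
    ]
    if dgraph_running:
        # The Dgraph is brought down only if it is currently running.
        steps.append("stop the Dgraph")
    steps += [
        "copy Indexer output files",
        "download application rules, redirects, and thesaurus entries (emgr_update.pl)",
        "start the Dgraph",
    ]
    for number, step in enumerate(steps, start=1):
        run(f"{number}. {step}")
    return steps


baseline_update(dgraph_running=True)
```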
A version of the baseline update script is included with the Endeca software, and is stored in %ENDECA_ROOT%\bin\baseline-update.bat for Windows ($ENDECA_ROOT/bin/baseline-update.sh for UNIX). You can copy and modify the underlying Java code as needed, as described in the following section.
Note: Before editing the baseline update script, Endeca recommends using the Endeca Deployment Template. The Endeca Deployment Template is a collection of operational components that provides a starting point for development and application deployment, and is a free download from the support site. For more information about the Endeca Deployment Template, see Using the Endeca Deployment Template on page 190.
The script source tree is installed as part of the Endeca reference implementation, and can be found in %ENDECA_REFERENCE%\eac_scripts on Windows, or $ENDECA_REFERENCE/eac_scripts on UNIX (where ENDECA_REFERENCE stands for the location of the reference implementations). The executable files for the script are stored in %ENDECA_ROOT%\bin (Windows) or $ENDECA_ROOT/bin (UNIX); they depend on the eacscript.jar file in %ENDECA_ROOT%\lib\java (Windows) or $ENDECA_ROOT/lib/java (UNIX).
You can generate your own version of the eacscript.jar file by modifying the source files in the reference implementation.
The application has an EAC Agent running on the Web Studio host.
The application contains a provisioned host named webstudio. That host must be specified with a fully qualified host name.
The application has exactly one Forge component provisioned. Any additional ones are ignored.
The application has exactly one Indexer component provisioned. Any additional ones are ignored.
The application has exactly one Dgraph component provisioned. Any additional ones are ignored.
The baseline update script itself must be provisioned. For details on provisioning EAC scripts, see Defining scripts in your provisioning file on page 154.
Component and script control commands are located on page 201 of Using the Eaccmd Tool.
The ScriptControl interface is located on page 238 in the Endeca Application Controller API Interface Reference.
Information about using scripts in Web Studio is located in the Web Studio Help.
Index
AddComponentType class 242, 270, 271 AddHostType class 242, 243 adding cookies to the preview application 93 Agidx component, Endeca Application Controller 168, 243 Agidx flags 344 AgidxComponentType class 243 Agraph checking aliveness of 315 showing status in Web Studio 43 Agraph component, Endeca Application Controller 170, 244 Agraph flags 345 AgraphChildListType class 244 AgraphComponentType class 244 Application Controller. See Endeca Application Controller Application element, defining 151 application provisioning, in Web Studio 38 ApplicationIDListType class 245 ApplicationType class 246 archiving log files 116 audience for this guide xxi
auditing, viewing for MDEX Engine 316 authentication configuration parameter for LDAPLoginModule 67 AuthHttpENEConnection enableSSL method 129 automatically scheduling report generation 108
B
BackupMethodType class 246 baseline update script 377 BatchStatusType class 247
C
canonical paths 157 certificates location of Java keystore 68 changing the preview application in Web Studio 90 checkPasswords configuration parameter for LDAPLoginModule 68 classes AddComponentType 242 AddHostType 242 AgidxComponentType 243 AgraphChildListType 244
AgraphComponentType 244 ApplicationIDListType class 245 ApplicationType 246 BackupMethodType 246 BatchStatusType 247 ComponentListType 247 ComponentType 248 CrawlerComponentType 248 DgidxComponentType 250 DgraphComponentType 251 DgraphHostPortType 252 DgraphReferenceType 252 EACFaultMessage 253 FlagIDListType 254 ForgeComponentType 255 FullyQualifiedComponentIDType 256 FullyQualifiedFlagIDType 256 FullyQualifiedUtilityTokenType 257 HostListType 257 HostType 258 ListApplicationIDsInput 258 LogServerComponentType 259 RemoveComponentType 261 RemoveHostType 261 ReportGeneratorComponentType 262 RunBackupType 264 RunFileCopyType 265 RunRollbackType 265 RunShellType 266 RunUtilityType 266 SSLConfigurationType 268 StatusType 269 TimeRangeType 269 TimeSeriesType 270 command options for Endeca programs, specifying 31 ComponentControl interface 219 ComponentListType class 247 ComponentType class 248 connection setting for Eneperf 304
cookie name for Web Studio 92 cookies adding to the preview application 93 Cpusar performance analysis tool 338 Crawler component, Endeca Application Controller 174, 248 CrawlerComponentType class 248 custom application for Web Studio See preview application
D
defining scripts 154 deleting log files 116 deleting outdated reports 116 Developer Studio See Endeca Developer Studio Dgidx showing status in Web Studio 43 specifying command options from Web Studio 31 Dgidx component, Endeca Application Controller 161, 250 Dgidx, flags 348 DgidxComponentType class 250 Dgraph checking aliveness of 315 See MDEX Engine Dgraph component, Endeca Application Controller 165, 251 Dgraph request log See MDEX Engine request log Dgraph Stats page See MDEX Engine Statistics page Dgraph, flags 352 DgraphComponentType class 251 DgraphHostPortType class 252 DgraphReferenceType class 252 dgraph.reqlog MDEX Engine request log file 292 directories, provisioning on a host 153 dynamic business rules 283
E
EAC Agent, introduced 139 EAC Central Server, introduced 138 EAC scripts baseline updates 377 defined 154 editing 378 report generation 109 EAC. See Endeca Application Controller eaccmd about 192 Archive utility 212 component commands 201 Copy utility 205 incremental provisioning commands 196-199 provisioning 183 provisioning commands 195-196 Shell utility 204 synchronization commands 200-201 usage 194 utility commands 202-214 EACFaultMessage class 253 emgr_update utility deploying Web Studio changes 283 overview 277 Endeca Access Control System, configuring 131 Endeca Application Controller adding a component or host 187 Agents 139 architecture diagram 139 ComponentControl interface 219 EAC Central Server 138 HTTPS security in 144 introduced 27, 138 Java WSDL tool interpretation 241
.NET WSDL tool interpretation 241 Provisioning interface 229 provisioning overview 150 removing a component or host 188 starting and stopping 145 starting and stopping on Windows 145 starting from inittab 145 Synchronization interface 220 Utility interface 223 WSDL overview 218 Endeca Application Controller provisioning file Agidx component 168, 243 Agraph component 170, 244 aliasing hosts 152 Crawler component 174, 248 defining components 153 defining hosts 152 Dgidx component 161, 250 Dgraph component 165, 251 Forge component 158, 255 LogServer component 178, 259 ReportGenerator component 179, 262 Endeca Deployment Template 189 Endeca Developer Studio 27 about additional tasks 33 changing to another Web Studio 30 retrieving the project configuration from Web Studio 33 specifying command options for Endeca programs 31 Endeca HTTP service changing port 45 Endeca Presentation API HttpENEConnection 296 Endeca Standard Application accessing main page 122 attribute for title 125 attribute for URL address 124
enabling SSL 128 enabling user authentication 130 file-based authentication 131 LDAP authentication 131 location of WAR 121 login page 132 overview 120 Tomcat installation 126 URL.External property, use of 121 WebLogic server installation 133 Endeca tools Developer Studio 27 overview 26 Web Studio 26 Endeca Web Studio 26-27 audience for 26 changing port 45 configuring the preview application 94-95 cookie name 92 customizing the navigation menu 72 downloading instance configuration 42 navigation results page 93 record page 94 user permissions 50 endeca_standard.xml file used for Standard Application 124 ENE URL parameter mapping 296 Eneperf debugging 310 generating statistics 308 introduced 302 logs for use with 309 optional settings 305 required settings 303 running locally 304 running remotely 304 setting the number of queries 308 usage 302 Ethereal performance analysis tool 336
F
FileLoginModule for Standard Application 131 FlagIDListType class 254 Forge showing status in Web Studio 43 specifying command options from Web Studio 31 Forge component, Endeca Application Controller 158, 255 Forge hierarchical logging introduced 321 Forge, flags 366 ForgeComponentType class 255 FullyQualifiedComponentIDType class 256 FullyQualifiedFlagIDType class 256 FullyQualifiedUtilityTokenType class 257
G
generate-report.bat 109 generate-reports.bat running the script 111 source tree, editing 111, 378 generating reports with the report generation script 109 groupPath configuration parameter for LDAPLoginModule 66 groupTemplate configuration parameter for LDAPLoginModule 66
H
host setting for Eneperf 303 HostListType class 257 HostType class 258
I
implementing logging and reporting in Web Studio 100 inittab, starting the Endeca Application
Controller from 145 instance configuration downloading from Web Studio 42 retrieving from Web Studio with Developer Studio 33 instrumenting the preview application 92 invalid characters in provisioning 151 Iostat performance analysis tool 338 iteration setting for Eneperf 304
J
JAAS framework for Access Control System 130 Java keystore configuring location 68 Java keystore file, creating 128 Javascript domain for preview application 91
K
keyStoreLocation configuration parameter for LDAPLoginModule 68 keyStorePassphrase configuration parameter for LDAPLoginModule 68
L
LDAP authentication rebinding 67 LDAP server configuration for multiple servers 69 configuring SSL 68 ldapBindAuthentication configuration parameter for LDAPLoginModule 67 LDAPLoginModule authentication configuration parameter 67 checkPasswords configuration parameter 68 groupPath configuration
parameter 66 groupTemplate configuration parameter 66 keyStoreLocation configuration parameter 68 keyStorePassphrase configuration parameter 68 ldapBindAuthentication configuration parameter 67 passwordAttribute configuration parameter 68 serverInfo configuration parameter 69 serviceAuthentication configuration parameter 67 servicePassword configuration parameter 67 serviceUsername configuration parameter 67 userPath configuration parameter 66 useSSL configuration parameter 68 LDAPLoginModule for Standard Application 131 Linux performance tools 339 Linux sysstat package 339 ListApplicationIDsInput class 258 Lockstat performance analysis tool 338 log file (eneperf) creating 309 settings 304 Log Server about 99 archiving log files 116 monitoring 99 monitoring in Web Studio 43 provisioning 101 rolling logs 99 settings 102 starting 103 stopping and starting from Web Studio 40
using the Log Server command line 99 logging and reporting introduced 98 login page for Standard Application 132 logs to be used with Eneperf 309 LogServer component, Endeca Application Controller 178, 259 LogServerComponentType class 259
M
MDEX Engine
    auditing statistics, viewing 316
    checking aliveness of 315
    making SSL connection 129
    showing status in Web Studio 43
    specifying command options from Developer Studio 31
    specifying command options from Web Studio 31
    statistics, resetting 312
    statistics, viewing 312
    stopping and starting from Web Studio 40
MDEX Engine Auditing page viewing 316
MDEX Engine parameter mapping 296
MDEX Engine request log
    converting for use with Eneperf 309
    extracting information 294
    file format 292
    introduced 292
MDEX Engine Statistics page
    about 312
    presentation transformed with XSLT 312, 317
    sections of 313
    viewing 312
    viewing raw XML 312
Mpsar performance analysis tool 338
Mpstat performance analysis tool 339
N
navigation results page, instrumenting 93
Netperf performance analysis tool 337
P
passwordAttribute configuration parameter for LDAPLoginModule 68
performance, tuning 289
Perl guidelines, for MDEX Engine request log 294
ping, to check Dgraph or Agraph 315
port for Endeca HTTP service and Web Studio, changing 45
port setting for Eneperf 303
prerequisites for the baseline update script 378
prerequisites to logging and reporting 98
preview application
    described 90
    enabling and disabling its display in Web Studio 95
    instrumenting 92
    Javascript domain 91
    requirements 90-92
Process Explorer performance analysis tool 340
properties, adding to hosts and components 154
provisioning
    adding properties to hosts and components 154
    an Endeca Application Controller implementation 150
    directories on a host 153
    incremental 185
    invalid characters in 151
    multi-machine 183
    report generation script, in Web Studio 112
    scripts 154
    the provisioning file 150
    the Provisioning interface 229
    using Endeca Deployment Template 189
    using XML entities 153
Prstat performance analysis tool 338
R
rebinding for LDAP authentication 67
record page, instrumenting 94
reference implementations, scripts 155
RemoveComponentType class 261
RemoveHostType class 261, 262
report generation script 109
    adding in Web Studio 112
    defined 109
    running 111
Report Generator
    about 99
    about reports 100
    automatic scheduling of 108
    creating HTML reports 115
    enabling the display of reports 109
    monitoring status in Web Studio 43
    provisioning 103
    settings 104
    specifying report frequency 107
    starting 106
ReportGenerator component, Endeca Application Controller 179, 262
ReportGeneratorComponentType class 262
reports
    customizing content and appearance 115
    deleting 117
    generated from control scripts 116
reports, viewing in Web Studio 113
request log. See MDEX Engine request log
retrieving the instance configuration from Web Studio, with Developer Studio 276
retrieving the Web Studio project configuration from Developer Studio 33
root Application element, defining 151
RunBackupType class 264
RunFileCopyType class 265
RunRollbackType class 265
RunShellType class 266
RunUtilityType class 266
S
Sar performance analysis tool 338
scripts
    baseline update 377
    developing 155
    editing the report generation script 111
    environment variables 156
    high-level workflow for report generation 111
    preparing to use in Web Studio 111
    provisioning 156
    report generation script 109
    using canonical paths in 157
serverInfo configuration parameter for LDAPLoginModule 69
server.xml file used for Endeca HTTP service 45
serviceAuthentication configuration parameter for LDAPLoginModule 67
servicePassword configuration parameter for LDAPLoginModule 67
serviceUsername configuration parameter for LDAPLoginModule 67
Solaris performance tools 338
specifying command options in Developer Studio 31
specifying report frequency 107
SSL
    configuring LDAP server 68
    enabling connection to MDEX Engine 129
    enabling for Standard Application 128
SSLConfigurationType class 268
starting the Endeca Application Controller 145
statistics setting to Eneperf 308
statistics, viewing for MDEX Engine 312
stats.xslt file 312, 317
StatusType class 269
stopping the Endeca Application Controller 145
Synchronization interface 220
T
Task Manager performance analysis tool 339
Tcpdump performance analysis tool 336
TCPView performance analysis tool 340
terminology equivalences 343
thesaurus entries, deploying 283
third-party performance tools
    cross-platform 336
    Solaris and Linux 337
    Windows 339
throttle setting to Eneperf 308
TimeRangeType class 268, 269
TimeSeriesType class 270
title of Standard Application, changing 125
Tomcat application server
    importing keystore file 129
    installation of Standard Application 126
    JAAS framework 130
    JAAS login configuration file 131
Top performance analysis tool 337
transferring the instance configuration
U
updates running baseline 40
URL address for Standard Application, changing 124
URL mappings for the preview application 95
URL.External property for Standard Application 121
user authentication for Standard Application 130
user entitlement filter LDAPLoginModule configuration parameter 66
user permissions in Web Studio 50
user roles in Web Studio
    predefined 51
    user defined 52
userPath configuration parameter for LDAPLoginModule 66
useSSL configuration parameter for LDAPLoginModule 68
Utility interface 223
W
WAR for Standard Application, location of 121
Web Studio
    retrieving the instance configuration 276
    See Endeca Web Studio
    viewing reports in 113
Web Studio extensions
    and URL tokens 81
    configuring 77
    defined 77
    enabling 80
    theming 85
    token-based authentication 82
    troubleshooting 86
WebLogic application server, installing Standard Application on 133
Windows
    Task Manager 339
    third-party performance tools 339
Windump performance analysis tool 336
WSDL
    Endeca Application Controller 218
    special ID types 218
X
XML entities, using in your provisioning file 153
XSLT, transforming MDEX Engine statistics 312, 317