11 Security Information
11.1 Authorizations
11.2 Roles
11.3 Authentication
11.4 Integration in Application Authorizations
11.5 Users
12 Best Practices
This document describes the concept and the steps that an application programmer has to follow in order to
load binary files into SAP HANA with the file loader functionality. Once this has been done, it is possible to run
services like search on this content.
Note
Once you have read this document, we recommend reading the SAP HANA Search Developer Guide. This
guide provides you with information about how to access the loaded file contents using full-text search.
The file loader is a set of HTTP services that you can use to develop your own applications to search in file
contents. The file loader package also contains a basic example application with monitoring and statistical
information about the current file loader schedule.
Note
The file loader supports the loading of file contents for search. To enable properties and metadata of files
for search as well, you can extend the node table with additional columns and follow the steps described in
the SAP HANA Search Developer Guide.
Related Information
The file loader is used to load the text representation of files that are stored on HTTP or HTTPS servers into
SAP HANA.
The following diagram shows the architecture of the file loader component in SAP HANA.
Technically, the file loader is an SAP HANA XS application that is shipped as a delivery unit.
The binary file content from the HTTP(S) server is converted and stored as a textual representation in the
node table in the SAP HANA database. The loading process is carried out asynchronously. Files are processed
in parallel to minimize processing time. The task list table is used to track the file loading process.
The file loader exposes an HTTP service API in REST format that can be accessed from any HTTP client.
To use the file loader in your own application, you need to perform the following steps.
Prerequisites
Procedure
Related Information
This tutorial teaches you how to set up the file loader component to be used in an example application.
Context
Before you can use the file loader example application, you have to install and configure the component and
set up user management with minimal authorizations. Based on this configuration, you can then try out the
example UI provided with the file loader component.
Note
To perform the setup steps, you need a user with system administrator permissions and access to the SAP
HANA XS Administration Tool.
Procedure
The file loader component comes with SAP HANA as an SAP HANA XS delivery unit.
Use SAP HANA Application Lifecycle Management (sap/hana/xs/lm) or SAP HANA studio to import the
file loader delivery unit with the name HCO_INA_FILELOAD.tgz. After the import, the component is
available and activated in the sap.bc.ina.fileloader.job package.
The script enables job scheduling in SAP HANA XS and sets up users and tables for the example scenario.
Note
Our example uses users with minimal authorizations.
In SAP HANA studio, execute the example setup SQLScript as a system administrator to set up users,
authorizations, and example database tables. Replace the password placeholders with your own
passwords. For more information, see the Related Information section.
See the related links for more information about these user types.
The setup SQLScript performs the following configuration tasks: It activates the XS job scheduler, assigns
the database connection user to the file loader job and the SQL connection, and activates the file loader
job.
The script creates example node tables and task list tables for the access user, and assigns appropriate
object authorizations for the database connection user.
3. Configure the file loader job. If you have executed the SQL setup script (step 2), this configuration will have
been applied automatically.
Note
To perform this step, you need a user with the sap.bc.ina.fileloader.roles::Administrator
role.
Start the SAP HANA XS Administration Tool and activate the file loader job in the Application Objects tree
under sap > bc > ina > fileloader > job > TaskCommandJob (direct link:
/sap/hana/xs/admin/?package=sap.bc.ina.fileloader.job#/package/sap.bc.ina.fileloader.job/job/taskCommandJob).
To activate the job, mark it as <Active> and enter the database access user (FLDBCONN) in the <User>
field. Save the changes.
4. Configure the SQL connection for the file loader job. If you have executed the SQL setup script (step 2),
this configuration will have been applied automatically.
Note
To perform this step, you need a user with the sap.bc.ina.fileloader.roles roles.
You are still in the SAP HANA XS Administration Tool. In the Application Objects tree, navigate to the file
loader SQL connection located under sap > bc > ina > fileloader > job > fileloader.xssqlcc (direct link:
http(s)://<SAP HANA host>:<SAP HANA port>/sap/hana/xs/admin/?package=sap.bc.ina.fileloader.job&object_name=fileloader&object_type=xssqlcc)
and enter the DB connection user (FLDBCONN) in the <User> field. Save the changes. Repeat this for the SAP
HANA DB connection of the package sap.bc.ina.fileloader.lib.sqlconnection.
5. Start the file loader example application.
Start your browser and open the file loader example application by entering the following address:
http(s)://<SAP HANA host>:<SAP HANA port>/sap/bc/ina/fileloader/app/#/exampleUiPage.
The following example SELECT statement shows how to search in the file content of the example node
table. The SELECT statement returns the URL, snippets, and highlighted content.
Run the example clean-up SQLScript to remove all tutorial data and users generated in the previous steps
and to return to your previous system state.
Related Information
The file loader component contains a small browser-based demo UI to show what you can develop and how
you can use the file loader's capabilities.
Context
The file loader example application is an implementation of the file loader functionality using JavaScript with a
web front end. You can enter a number of URLs for documents that you want to upload into SAP HANA with
the file loader. You can then search for content in the uploaded documents.
Procedure
Open your browser and start the file loader example application by entering the following address:
http(s)://<HostName>:<Port>/sap/bc/ina/fileloader/app/#/exampleUiPage.
Log on with the access user FLACCESS. The initial screen displays a text field containing URLs. Use copy
and paste to replace these URLs with the URLs for your documents.
Clicking the + icon on the initial screen of the user interface shows an options panel where you can change
a number of basic options, such as the number of packages, schedules and frequency, and the names of
the task list table and node table.
Table 1:
Name of Option | Description
Number of Workers | Limits the number of job schedules used to upload the documents for each package.
Schedule Timeout | Limits the amount of time that processing can run overall.
The default for the schema is the current user, but you can modify this to upload data for another user.
The default for the schema is the current user, but you can modify this to upload data for another user.
3. Start processing.
When you choose Start loading, the results screen appears. The top half contains the Job monitor, which
displays the current status of the files you chose to process.
Note
Some URLs will fail, as they require HTTP destinations to use HTTPS and proxy servers.
Here you find the setup and clean-up scripts used in the tutorial.
The setup SQLScript creates the users and the example database tables for the file loader example application
tutorial. It also performs all configurations which are defined in the process steps.
/*
Example SQL script to set up file loader users and file loader tables
This script can be used to perform a smoke test in SAP HANA, to verify the
installation and configuration of the file loader.
Execute this SQL script with a user who has authorizations to create users,
tables, assign activated roles, edit SAP HANA configuration, and so on.
The following users are created:
The clean-up SQL script removes the users from the file loader example application tutorial and deletes
objects that depend on them, such as tables.
The file loading process involves three steps: scheduling the job, getting the file, and converting the file
content.
The file loading process is ready to start as soon as the node table and the task list table are available and
populated with data.
After the files have been processed, the application can use the extracted and converted content of the node
table. The task list table can be used by the application for cleanup processes if errors occur.
The file loader provides HTTP services to retrieve the current status of the data processing to determine the
process status for long-running processes.
If you want to load files from HTTP servers that require authentication, SSL, or HTTP proxies, you have to
create an XS HTTP destination.
Context
Procedure
1. Start your browser and open the SAP HANA File Loader Administrator UI with the following URL:
http://<SAP HANA host>:<SAP HANA port>/sap/bc/ina/fileloader/app. You need file loader
administrator (FLADMIN) permissions.
2. Go to the Destinations section.
3. Create a new XS HTTP destination. Choose Create and enter the required information (refer to the XS
HTTP destination documentation of the XS programming guide).
4. To edit an XS HTTP destination, select the destination from the list and choose Edit. You will be directed
to the XS Administration tool to edit the configuration, including passwords.
Results
The file loader HTTP destinations are stored in the file loader package
sap.bc.ina.fileloader.lib.destination. The destinations remain in place after an upgrade of the file
loader application.
The file loader returns several states and helps you to monitor the process.
The state of the file loading process is stored in the task list table column /1ES/_STATUS.
NEW
FILE_LOADING_IN_PROCESS
FILE_LOADING_FAILED
FILE_LOADING_SUCCESSFUL
TEXT_CONVERSION_IN_PROGRESS
NODE_TABLE_UPDATE_FAILED
TEXT_CONVERSION_FAILED
SUCCESS
TIMED_OUT_WHILE_INDEXING
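An application that reads the task list table itself may want to tally rows by status. The following is a minimal sketch, not part of the file loader API: the status names are copied from the list above, while the helper names `tally_statuses` and `failed_count` are hypothetical, and treating every status whose name ends in `_FAILED` as a failure is an assumption made for illustration.

```python
from collections import Counter
from enum import Enum


class TaskStatus(str, Enum):
    """Values of the /1ES/_STATUS column, copied from the list above."""

    NEW = "NEW"
    FILE_LOADING_IN_PROCESS = "FILE_LOADING_IN_PROCESS"
    FILE_LOADING_FAILED = "FILE_LOADING_FAILED"
    FILE_LOADING_SUCCESSFUL = "FILE_LOADING_SUCCESSFUL"
    TEXT_CONVERSION_IN_PROGRESS = "TEXT_CONVERSION_IN_PROGRESS"
    NODE_TABLE_UPDATE_FAILED = "NODE_TABLE_UPDATE_FAILED"
    TEXT_CONVERSION_FAILED = "TEXT_CONVERSION_FAILED"
    SUCCESS = "SUCCESS"
    TIMED_OUT_WHILE_INDEXING = "TIMED_OUT_WHILE_INDEXING"


def tally_statuses(values):
    """Count rows per status; an unknown value raises ValueError."""
    return Counter(TaskStatus(value) for value in values)


def failed_count(counts):
    """Assumption: a status counts as a failure if its name ends in _FAILED."""
    return sum(n for status, n in counts.items() if status.name.endswith("_FAILED"))
```

The values passed to `tally_statuses` would typically come from a SELECT on the status column of your own task list table.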
The application uses a database table that stores the text content of the processed files.
The file loader supports any structure for the table, but needs one column for the text content.
Note
The content column must have the data type BINTEXT.
The node table can have a language column (NVARCHAR(2)). This column can be used to store the language of
the file.
Example
Related Information
The task list table is used to track the file loading process of an entry in the node table.
The table shows the progress of the file loading so that you can take action if data loading problems occur.
Every entry in the node table has a corresponding entry in the task list table. The node table controls the data
loading process and is used by the file loader jobs. The primary keys of the task list table are identical to those
of the node table. All other columns are determined by the file loader.
Example
The primary keys must be identical to the primary keys of the node table.
In this example, the column ID is the primary key. All other columns /1ES/_* must have the structure
described.
Primary Key: Use identical primary keys for the node table and the task list table.
URL: Provide an absolute HTTP URL that targets a file in the /1ES/_URL column.
Initial Status: To indicate that the entry has to be processed by the file loader, set the /1ES/_STATUS
column to the value "NEW" and set the time stamp column /1ES/_TS_STATUS_NEW to the current time
stamp.
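The three rules above can be sketched as a parameterized INSERT statement. This is an illustrative sketch only: the column names /1ES/_URL, /1ES/_STATUS, and /1ES/_TS_STATUS_NEW come from the text, the key column ID follows the example, and the schema and table names passed in are hypothetical. Your task list table may require further /1ES/_* columns that are not covered here, and the node table needs a row with the same primary key.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_task_insert(schema: str, table: str, key_column: str = "ID") -> str:
    """Return a parameterized INSERT that registers one file as NEW.

    Bind parameters, in order: primary key value, absolute HTTP URL.
    """
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    columns = [key_column, "/1ES/_URL", "/1ES/_STATUS", "/1ES/_TS_STATUS_NEW"]
    column_list = ", ".join(quote_ident(column) for column in columns)
    return (
        f"INSERT INTO {target} ({column_list}) "
        "VALUES (?, ?, 'NEW', CURRENT_TIMESTAMP)"
    )
```

The identifiers are double-quoted because the /1ES/_* column names contain slashes. The statement would be executed through whatever SQL client your application already uses.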
Related Information
The file loader API is exposed as an HTTP service that offers various commands.
The various operations that can be executed with this service are called commands. A command is described
as a JSON object.
Command JSON: The HTTP service API uses a JSON format to describe commands that should be executed.
The command must have a command property to provide a command name as a string (for example
cmdScheduleJob). The second property is the optional parameter property that describes the data that is
required by the command.
http(s)://<host>:<port>/sap/bc/ina/fileloader/service.xsjs
HTTP GET: Services that provide information and do not change the state can be invoked using the HTTP GET
method. The file loader service supports the command parameter.
http(s)://<host>:<port>/sap/bc/ina/fileloader/service.xsjs?command=<command JSON>
HTTP POST: Services that change the state must be invoked using the HTTP POST method. Information is
passed in the HTTP POST body.
http(s)://<host>:<port>/sap/bc/ina/fileloader/service.xsjs
The HTTP body contains the command in JSON format: <command JSON>
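As a rough illustration of the two invocation styles, the sketch below builds the GET URL and the POST body for a command without sending anything. The service path is taken from the text; the helper names and the host and port in the usage note are hypothetical, and percent-encoding the JSON in the command query parameter is an assumption about how an HTTP client would transmit it.

```python
import json
from urllib.parse import quote

SERVICE_PATH = "/sap/bc/ina/fileloader/service.xsjs"


def command_json(name, parameter=None):
    """Serialize a command object; the parameter property is optional."""
    command = {"command": name}
    if parameter is not None:
        command["parameter"] = parameter
    return json.dumps(command, separators=(",", ":"))


def get_url(base_url, name, parameter=None):
    """URL for a read-only command, with the JSON percent-encoded."""
    encoded = quote(command_json(name, parameter), safe="")
    return f"{base_url}{SERVICE_PATH}?command={encoded}"


def post_request(base_url, name, parameter=None):
    """Return (url, body bytes) for a state-changing command."""
    return f"{base_url}{SERVICE_PATH}", command_json(name, parameter).encode("utf-8")
```

For example, `get_url("https://hana.example.com:4300", "cmdGetUser")` yields a URL that can be opened with any HTTP client, and `post_request(...)` returns the body to send with an HTTP POST.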
If you use lowercase letters and/or special characters for task list, node table or attribute data, you do not
need to use quotation marks ("). The file loader API behaves differently from the SAP HANA SQL interface in
this regard, to make it easier to use.
Related Information
When service.xsjs is invoked using an HTTP GET request, an HTML response is provided. The CSRF token
can also be fetched.
The file loader API supports various commands using HTTP GET or POST.
10.3 cmdGetQueueStatistics
This command returns statistical information about the current package of task list items.
If no package is specified, all data is used for the statistics. These statistics are used by the application to
decide on further processing or for monitoring.
Command
{
"command": "cmdGetQueueStatistics",
"parameter": {
"fileLoaderRequest": {
"table": {
"queueTable": {
"schema": "< Schema name of the task list table >",
"name": "< Name of the task list table >",
"packageId": "< optional unique ID of a package >"
}
}
}
}
}
Return
The detailed status information is given for all individual statuses; the summaries are calculated as follows:
Table 3:
totalCount | All files of the task list table | failCount + successCount + inProcessCount + unprocessedCount
{
"statusCode": <code>,
"message": {
"text": "<message text>",
"detailMessages":[
{"message": "<message text>","code": <code>}
]
},
"queue":
{"statusDetail":
{ "newCount": "<count>" },
{ "successCount": "<count>" },
{ "timedOutCount": "<count>" },
{ "fileloadingInProcessCount": "<count>" },
{ "fileloadingFailedCount": "<count>" },
{ "fileloadingSuccessfulCount": "<count>" },
{ "nodeTableUpdateFailedCount": "<count>" },
{ "textConversionInProcessCount": "<count>" },
{ "textConversionFailedCount": "<count>" }
},
"statusSummary": {
{ "unprocessedCount": "<count>" },
{ "inProcessCount": "<count>" },
{ "successCount": "<count>" },
{ "failCount": "<count>" },
{ "totalCount": <count> },
}
}
}
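An application can sanity-check the summary against the formula in Table 3 before deciding on further processing. The sketch below assumes that the summary counts are available as a flat mapping from count name to value, which the response template above does not state unambiguously; the helper names and the sample values are hypothetical.

```python
SUMMARY_PARTS = ("failCount", "successCount", "inProcessCount", "unprocessedCount")


def summary_is_consistent(summary):
    """True if totalCount equals the sum of the four partial counts."""
    return int(summary["totalCount"]) == sum(int(summary[key]) for key in SUMMARY_PARTS)


def open_work(summary):
    """Files that are not finished yet: unprocessed plus in process."""
    return int(summary["unprocessedCount"]) + int(summary["inProcessCount"])


# Hypothetical sample; counts may arrive as strings, so int() is applied.
sample_summary = {
    "unprocessedCount": "40",
    "inProcessCount": "8",
    "successCount": "150",
    "failCount": "2",
    "totalCount": 200,
}
```

With these sample values, `open_work(sample_summary)` reports how many files are still pending, which a monitor could poll until it reaches zero.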
10.4 cmdSetQueueTimedOut
The job schedules are created with an overall timeout. The job schedule automatically stops the processing
when the timeout is reached. However, the application can use this command to stop the processing earlier by
setting unfinished or unprocessed files as timed out. Once you have executed this command, the running job
schedules will not find any unprocessed files and will stop the processing. This service processes a subset of
all node table entries. To update all node table entries, call the service until the number of updated records is
0.
Command
{
"command": "cmdSetQueueTimedOut",
"parameter": {
"fileLoaderRequest": {
"table": {
"queueTable": {
"schema": "< Schema name of the task list table >",
"name": "< Name of the task list table >",
"packageId": "< Optional unique ID of a package >"
}
}
}
}
}
Return
{
"statusCode": "<code>",
"message": {
"text": "<message text>",
"detailMessages":[
{"message": "<message text>","code": <code>}
]
},
"table": {
"numberOfUpdatedRecords": <Number of updated records>
}
}
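Because each call processes only a subset of the entries, a client has to repeat the command until the service reports 0 updated records. The loop below is a sketch under stated assumptions: `send_command` is a hypothetical callable that posts the command and returns the parsed JSON response, and `max_calls` is a safety limit added for illustration that is not part of the file loader API.

```python
def time_out_queue(send_command, schema, name, package_id=None, max_calls=1000):
    """Call cmdSetQueueTimedOut until no more records are updated.

    Returns the total number of records that were set to timed out.
    """
    queue_table = {"schema": schema, "name": name}
    if package_id is not None:
        queue_table["packageId"] = package_id
    command = {
        "command": "cmdSetQueueTimedOut",
        "parameter": {"fileLoaderRequest": {"table": {"queueTable": queue_table}}},
    }
    total = 0
    for _ in range(max_calls):
        response = send_command(command)
        updated = int(response["table"]["numberOfUpdatedRecords"])
        if updated == 0:
            return total
        total += updated
    raise RuntimeError("cmdSetQueueTimedOut still updates records after max_calls calls")
```

Passing the transport in as a callable keeps the loop independent of the HTTP client and makes it easy to exercise with a stub.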
10.5 cmdGetSchedules
This command returns the number of schedules and the active schedules of the given task list. It is used for
monitoring purposes.
Command
{
"command": "cmdGetSchedules",
"parameter": {
"fileLoaderRequest": {
"table": {
"Table": {
"schema": "< Schema name of the task list table >",
"name": "< Name of the task list table >"
}
}
}
}
}
Return
{
"statusCode": "<code>",
"message": {
"text": "<message text>",
"detailMessages":[
{"message": "<message text>","code": <code>}
]
},
"schedule": {
"jobName": <id of the job>,
"numberOfSchedules": <number of schedules>,
"scheduleDetails": [
{
"fileloaderScheduleId": "<schedule id fileloader>",
"xsEngineScheduleId": "< schedule id XS >"
}
]
}
}
10.6 cmdKillTaskListWorkers
This command stops the schedules for all packages and sets the status "timed out" for all unprocessed files.
Note
This command is reserved for situations where it is necessary to stop the process immediately, either in
emergencies or when errors have occurred.
Command
{
"command": "cmdKillTaskListWorkers",
"parameter": {
"fileLoaderRequest": {
"table": {
"queueTable": {
"schema": "< Schema name of the task list table >",
"name": "< Name of the task list table >"
}
}
}
}
}
Return
{
"statusCode": "<code>",
"message": {
"text": "<message text>",
"detailMessages":[
{"message": "<message text>","code": <code>}
]
},
"schedule": {
"jobName": <id of the job>,
"numberOfSchedules": <number of schedules>,
"scheduleDetails": [
{
"fileloaderScheduleId": "<schedule id fileloader>",
"xsEngineScheduleId": "< schedule id XS >"
}
]
}
}
10.7 cmdGetUser
This command returns the name of the user who is currently logged on.
Command
{"command": "cmdGetUser"}
Return
If the returned number of updated records is 0, then all columns are updated.
{
"user": {
"name": "< user name >"
}
}
10.8 cmdGetSystemInformation
Command
{
"command": "cmdGetSystemInformation",
// optional parameter to restrict the output to the desired sections only
"parameter": [
"User", "Time", "XS"
]
}
The command returns the following information about the current system.
{ "XS": { },
"Fileloader": { },
"Time": { },
"User": { },
"System": { },
"Services": { },
"Memory": { },
"CPU": { },
"Disk": { },
"Statistics": { }
}
10.9 cmdGetCommands
Command
{
"command": "cmdGetCommands",
"parameter": { "packageName": "sap.bc.ina.fileloader.cmd"}
}
Return
[
{
"name": "<command name>",
"description": "<description>",
"accessMethods": [ "HTTP POST","HTTP GET","JavaScript" ],
"privilege": "sap.bc.ina.fileloader::<application privilege>"
}
]
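A client can use the returned list to find out which commands it may call with a given access method. The following sketch only filters an already parsed response; the helper name and the sample entries are hypothetical, and the access methods shown for each sample command are illustrative values rather than documented ones.

```python
def commands_for(commands, access_method):
    """Sorted names of the commands that list the given access method."""
    return sorted(
        entry["name"]
        for entry in commands
        if access_method in entry.get("accessMethods", [])
    )


# Hypothetical excerpt of a parsed cmdGetCommands response.
sample_commands = [
    {"name": "cmdGetQueueStatistics", "accessMethods": ["HTTP POST", "HTTP GET", "JavaScript"]},
    {"name": "cmdSetQueueTimedOut", "accessMethods": ["HTTP POST", "JavaScript"]},
    {"name": "cmdGetUser", "accessMethods": ["HTTP POST", "HTTP GET"]},
]
```

Calling `commands_for(sample_commands, "HTTP GET")` returns only the commands in the sample that list HTTP GET.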
11 Security Information
This section contains security-related information, such as the authorizations, users, and roles that are
used by the file loader component.
The file loader HTTP services can be used remotely by applications. The services ensure authentication and
authorization, and prevent cross-site request forgery (CSRF).
Note
Ensure that you assign minimal authorizations to users.
11.1 Authorizations
Table 4:
Type of Authorization | Description
Administration | The administration authorization allows the user to access the administration user interface and the maintenance of HTTP destinations.
Monitoring | The monitoring authorization allows users to access the HTTP service with read-only access.
Access | The access authorization allows users to influence (start and stop) the file loading process using the HTTP service.
JobAccess | This authorization defines access to the file loader job script. It prevents non-authorized users from executing the job script.
11.2 Roles
The file loader roles define the minimum authorizations and access types that are required.
Table 5:
Type and Name of Role | Description
Access sap.bc.ina.fileloader.roles::Access | Use the access role when a file loader client component needs to schedule jobs or to stop jobs. This role also includes all of the access rights of the monitoring role. The role includes the file loader access application authorization and object authorization to select/update/insert/delete on the job schedule table and select on framework tables.
DBConnection | The database connection role comprises all object authorizations to access the SAP HANA database, and the privileges to schedule the file loader job and to maintain HTTP destinations. This role does not include any application privileges. Apply this role to one technical user in your SAP HANA system.
Administrator User | The File Loader administrator needs the File Loader Administrator role. This role includes all of the access user authorizations and also provides access to the administration functions like the HTTP destination maintenance.
11.3 Authentication
The file loader HTTP services support the authentication methods of SAP HANA XS.
11.4 Integration in Application Authorizations
Client applications have to fulfill certain prerequisites to be used with the file loader component.
The file loader works with the node table and the task list table that are defined by the application.
In addition to the authorizations that are defined in the file loader roles, the file loader client application has to
define the object authorization for the task list table and the node table. The file loader requires SELECT and
UPDATE authorization for the two tables. The object authorizations have to be applied to the technical user
with the DBConnection role.
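The required object authorizations can be expressed as GRANT statements for the technical user. The sketch below only generates the statement text. The user name FLDBCONN comes from the tutorial, the schema and table names in the usage note are hypothetical, and the statements would have to be executed by a user who is allowed to grant privileges on those tables.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def grant_statements(user, tables):
    """Build one GRANT SELECT, UPDATE statement per (schema, table) pair."""
    return [
        f"GRANT SELECT, UPDATE ON {quote_ident(schema)}.{quote_ident(table)} "
        f"TO {quote_ident(user)}"
        for schema, table in tables
    ]
```

For example, `grant_statements("FLDBCONN", [("FLACCESS", "MY_NODES"), ("FLACCESS", "MY_TASKS")])` produces one statement for the node table and one for the task list table.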
Example
This example shows how to define two file loader users with application-specific roles that include the file
loader roles.
11.5 Users
There are a number of different user types that require specific authorizations for the file loader.
Table 6:
User | Description
Access User FLACCESS | The access user is used to call the HTTP services to schedule the file loading process. This user requires the file loader access role. This user uses the DBConnection user implicitly to connect to SAP HANA and therefore does not need any object authorizations. This user also owns the example node and task list table in the user's schema.
Administrator FLADMIN | The file loader administrator can access the file loader administrator UI to perform configuration tasks and to monitor the loading process.
DBConnection User FLDBCONN | The database connection user is a technical user in your SAP HANA system. This user has object authorizations to access the SAP HANA database, can start schedules and is used in the file loader job. If you want to use new DB objects in the file loader application, you need to give this user additional object authorizations. This user does not have any file loader application authorizations and cannot be used to log on to file loader user interfaces or services.
12 Best Practices
The requirements for optimal usage of the file loader component are listed below.
1. The overall execution time of the file loading process should be as short as possible.
2. The load on the SAP HANA system should be minimal while the file loading process is running.
3. All files should be processed successfully.
Adding parallel schedules to the file loader job reduces the overall execution time of the file loading
process.
There should not be more parallel schedules than files to process. A high number of schedules can also
lead to locking overhead while updating the task list table.
The overall SAP HANA system load is also increased. The system administrator has to balance the file
loader load with the remaining load of the SAP HANA system.
As well as the load on the SAP HANA system, the load on the HTTP servers where the remote files are
located is increased. If the remote HTTP servers cannot handle the load, the response times might
increase, or the servers might not provide any response.
If new schedules are added to the file loading process, consider resizing the remote HTTP servers.
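The trade-offs above suggest capping the number of schedules before starting a load. The helper below is a minimal sketch of that idea: the function name is hypothetical, and `system_cap` stands for a limit chosen by the system administrator, which is an assumption rather than a file loader setting.

```python
def schedule_count(files_to_process, requested, system_cap):
    """Pick a schedule count that never exceeds the number of files.

    requested and system_cap are expected to be positive integers.
    """
    if files_to_process <= 0:
        return 0
    return max(1, min(files_to_process, requested, system_cap))
```

With 3 files, 8 requested schedules, and a cap of 16, the helper returns 3, since extra schedules would find no files to process.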