
SAP Predictive Analysis Installation

SAP Predictive Analysis is the latest addition to the SAP BusinessObjects suite and introduces entirely new functionality to the existing BusinessObjects toolbox. Predictive Analysis integrates SAP's Visual Intelligence data visualization tool with new predictive functionality powered by both open source R and SAP-written algorithms. Predictive Analysis includes algorithms for time series forecasting (for predicting sales, demand, price, and other time-dependent metrics), clustering (for identifying distinct groups of individuals based on numeric descriptive data), decision trees (for creating a tree-like set of decision support rules to categorize observations), and linear regression (for fitting linear relationships between a dependent variable and one or more predictors). These predictive algorithms can be used to extract insights and predictions, improving the value and actionability of the existing Business Intelligence infrastructure.

Predictive Analysis combines these predictive algorithms with a familiar and easy-to-use tool that integrates with existing BusinessObjects tools to make data preparation, model building, and implementation faster and easier. SAP Predictive Analysis is installed locally on the user's machine and accesses data for processing locally (from a CSV file, an Excel file, or an ODBC connection to a database) or on SAP HANA. Predictive Analysis can be used alone to analyze data on the client machine or can be paired with HANA, allowing PA to leverage the in-memory processing power of SAP HANA.

Today's blog post is a brief how-to for getting SAP Predictive Analysis installed and working for both local data and on HANA. Since Predictive Analysis has the power to run local R algorithms, HANA-R algorithms, and HANA Predictive Analysis Library (PAL) algorithms, installation is somewhat complex. Installation steps include:
1.) Installing PA
2.) Installing R on local machine
3.) PA HANA PAL installation/activation
4.) R and Rserve Linux installation and HANA Rserve setup

The following items are required to perform the full installation:
- Predictive Analysis installation files for version 1.0.8 (or latest available)
- An internet connection to install R and required packages
- SAP HANA
- A Linux R/Rserve host (SAP currently only supports SUSE Linux)

Step 1: Installing Predictive Analysis


The first step is the easiest: installing PA from the setup.exe executable is straightforward and guided by the installation wizard. Once PA is installed, I recommend importing a small CSV file (one option is the MTA daily fare data) and running some visualizations just to make sure PA was installed properly. Uploading this dataset and creating a line chart with DATE on the x-axis and the CASH and ETC variables as measures on the y-axis shows total daily fares by payment type.

At this point, most of the predictive algorithms will be grayed out and not available because R is not yet installed.

Step 2: Installing Local R


Now that PA is installed, R must be installed to enable most of the predictive algorithms in PA. The PA team did make available a widget to install R within the PA interface (File > Install and Configure R). However, this may not work for everyone, so I will cover the manual installation and setup (this is also useful if you tried the automated installation and still are not able to use the R algorithms).

Before installing R, ensure that Java is installed and available on the computer by typing java -version at the command line. Predictive Analysis requires the reported Java version to be 1.6.0 or higher.

First, go to the CRAN (Comprehensive R Archive Network) site and download R for Windows (version 2.15 or later). Select base to install the base R software. Finally, select Download R 2.15.3 (or the current latest version) for Windows.

In addition, R requires a list of packages for the built-in algorithms, so download these from CRAN as well by selecting Packages on the left menu.

The list of packages required by PA is below. Search for each package in the CRAN packages library and download the zip file to any location on your computer.
rJava AMORE RJDBC pmml RODBC arules DBI caret monmlp XML

Once all the R install files have been downloaded, install R by executing the R executable file (e.g., R-2.15.3-win.exe) and clicking on Next where appropriate. On the component selection screen, select Core Files and either the 32-bit or 64-bit Files as appropriate. Note the installation location for this instance of R (should be something like: C:\Program Files\R\R-2.15.3).

After installing R, open up the R GUI to install the packages that have been downloaded.

Select each of the downloaded package zip files on the list above and install them one by one. If the package is installed successfully, the message "package <packagename> successfully unpacked and MD5 sums checked" will appear in the R Console window. To test that the packages have installed successfully, type library(packagename) in the R Console window. If no error is displayed, the package is installed successfully.

Finally, set environment variables to add the libraries folder, set the home library, and update the path. These may change depending on the version and installation location for R, as noted above.
Sys.setenv(R_LIBS = "C:/Program Files/R/R-2.15.3/library")
Sys.setenv(R_HOME = "C:/Program Files/R/R-2.15.3/")
# Append the rJava JRI folder and the R bin folder to the existing path
Sys.setenv(Path = paste(Sys.getenv("Path"), "C:/Program Files/R/R-2.15.3/library/rJava/jri", "C:/Program Files/R/R-2.15.3/bin", sep = ";"))

Now that R and the required packages are successfully installed, open Predictive Analysis and select File > Install and Configure R. If Predictive Analysis was open during the R installation and setup, or if changes were made to the Install and Configure R settings, the user must completely exit and re-open Predictive Analysis prior to running any predictive objects.

Select the Configuration tab. Make sure that the Enable Open Source R Algorithms box is checked and ensure that the R installation folder noted during the R console installation process matches the actual location of the R install.

Re-open the MTA fare plaza data document and proceed to the Predict view. The R algorithms should no longer be grayed out.

Let's run a quick forecast, just to make sure the R integration is working. In the prediction dataflow area, select a Filter object from the Data Preparation pane. Then connect an R-Double Exponential Smoothing object. The Filter object should filter for only Plazaid=3, while the R-Double Exponential Smoothing object should predict the dependent column ETC over the Custom period starting in 2010 and having 365 periods per year. Once these settings are entered, all 3 objects should show successful validation, with a green check on the object.

Click the Run Analysis button to generate the predicted values for ETC vehicle traffic at plaza 3, the Bronx-Whitestone Bridge.

Success! Review the results by clicking Yes and then selecting the Charts view in the Results pane. The chart shows the actual (blue bars) vs. predicted (green line) traffic at Plaza 3.

Step 3: HANA PAL Installation/Activation


Install HANA SP05 or verify that it is already installed.

Refer to the SAP HANA Installation Guide with Unified Installer for SP05, section 3.8: Installing Application Function Libraries (AFLs) on a SAP HANA System, to install the PAL on HANA.

Enable scripting in HANA by starting the script server per SAP Note 1650957:

Starting the script server
You start the script server while the SAP HANA database is already running. To start the script server perform the following steps:
1. Open the 'Configuration' tab page in the SAP HANA database studio.
2. Expand the 'daemon.ini' configuration file.
3. Expand the 'scriptserver' section.
4. Change the parameter 'instances' from 0 to 1. This change is possible on the system level and on the host level.
Note: The system will start the script server immediately.
Note: You have to start a script server instance for each index server instance.

See SAP Note 1650957 for additional information. For further information on the PAL, see the SAP HANA Predictive Analysis Library (PAL) Reference.

To test that the PAL installation was successful, run a quick analysis using a PAL algorithm. Import the gasoline data into HANA using the Data Import > Data from Local File tool.
Original Data Source: http://people.sc.fsu.edu/~jburkardt/datasets/regression/x17.txt

The file is comma delimited with row 1 as the header. Select the Sample field as the key.

Now that the data is available in HANA, create a new document in PA and select the HANA Online data source:

Enter the HANA server connection information and click Connect HANA instance. After a successful connection, navigate to the table that was just imported into HANA and click Acquire.

In the Predict pane, add a HANA Multiple Linear Regression object and select Component 1, Component 2, Component 3 and Condition as the Independent Columns and Octane Rating as the Dependent Column.

The predictive workflow should indicate validation (green check mark on each item). Click the Run Analysis button to generate the predicted values for Octane Rating. When the success message appears, review the results showing that the regression has generated a model with R-Squared of 0.906.

A visualization can be generated by going to the Prediction Pane Visualize tool and plotting the Actual and Predicted Octane Rating vs. Sample.

All HANA algorithms should now be available in the Prediction pane for the HANA Online document.

Step 4: Installing Linux R Manually


Note: If the Linux host is running SUSE with an active support agreement, R and Rserve can be downloaded and installed via the update repository. In this situation, there is no need to compile the R code.

The following instructions are for installing Linux R manually, without access to the update repository. On the Linux server, install the unixODBC and libxml2-devel packages. Before compiling R, install Java JDK version 1.6.0_18+ and then set the following environment variables:
Java home path: Example: JAVA_HOME=/apps/java/jdk
Java compiler: Example: JAVAC=/apps/java/jdk/bin/javac
Java headers gen.: Example: JAVAH=/apps/java/jdk/bin/javah
Java archive tool: Example: JAR=/apps/java/jdk/bin/jar
Java library paths:
Example: JAVA_LD_LIBRARY_PATH=$(JAVA_HOME)/jre/lib/amd64/server:$(JAVA_HOME)/jre/lib/amd64:$(JAVA_HOME)/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Example: JAVA_LIBS=$(JAVA_HOME)/jre/lib/amd64/server:$(JAVA_HOME)/jre/lib/amd64:$(JAVA_HOME)/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib

The examples use the $(JAVA_HOME) notation as given. If you set these variables in a shell, write ${JAVA_HOME} instead, because the shell treats $(JAVA_HOME) as a command substitution.

Note: If you want to run R under a dedicated user, create a user before you proceed. For example, create the user named ruser.
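A quick way to confirm the JDK requirement before compiling R is to check the banner printed by java -version. The following is only a sketch: java_version_ok is a hypothetical helper name (not part of any SAP or Java tooling), and it assumes the banner contains the usual version "x.y.z" string.

```shell
# Sketch: return success when a `java -version` banner reports 1.6.0_18 or newer.
# java_version_ok is a hypothetical helper name used only for illustration.
java_version_ok() (
  banner=$1
  # Pull the quoted version string, e.g. 1.6.0_45 or 11.0.2
  ver=$(printf '%s\n' "$banner" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  [ -n "$ver" ] || exit 1

  major=${ver%%.*}
  major=${major%%[!0-9]*}
  [ -n "$major" ] || exit 1

  if [ "$major" -ne 1 ]; then
    # Java 9 and later put the major version first (9, 11, 17, ...)
    [ "$major" -ge 9 ]
    exit
  fi

  # Legacy scheme 1.<minor>.0_<update>
  rest=${ver#1.}
  minor=${rest%%.*}
  minor=${minor%%[!0-9]*}
  [ -n "$minor" ] || exit 1
  [ "$minor" -gt 6 ] && exit 0
  [ "$minor" -lt 6 ] && exit 1

  # Exactly 1.6: the update number must be at least 18
  case $ver in
    *_*) update=${ver##*_} ;;
    *)   update=0 ;;
  esac
  update=${update%%[!0-9]*}
  [ "${update:-0}" -ge 18 ]
)

# Typical call (java -version writes its banner to stderr):
#   java_version_ok "$(java -version 2>&1)" && echo "JDK is new enough"
```

Taking the banner as an argument, rather than running java inside the function, keeps the check easy to try against sample strings.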

The following steps are taken from SAP HANA R Integration Guide sections 1.1.1 to 1.1.3.

Installing R
Check or install these Linux packages:
- xorg-x11-devel
- gcc-fortran
- readline-devel (install if you want to use R standalone, for usability)

To install R for SAP HANA, you must compile the R package from its source code. SAP has tested the SAP HANA integration with R version 2.15. Execute the following steps as user root:

1.) To compile R, download the R (version 2.15) source package from the R Project for Statistical Computing website.

2.) Extract it to a user-defined directory. Go into this directory, and execute:
./configure --enable-R-shlib
If you have trouble during configuration, you could try the following options:
--with-readline=no --with-x=no
If you want to have terminal/history support in R, then you can install the Linux package readline-devel.

3.) Then compile:
make clean
make
make install

After successful compilation, the 'R' command will be installed in /usr/local/bin. If you decide to install it into another directory, make sure that it is properly set in your PATH variable.
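After make install, it can help to confirm that the build actually produced the shared library that --enable-R-shlib is meant to create. The sketch below assumes the usual Linux layout, in which the R home directory holds bin/R and lib/libR.so; r_build_ok is a hypothetical helper name.

```shell
# Sketch: check that an R home directory looks like a shared-library build.
# r_build_ok is a hypothetical helper; the layout (bin/R, lib/libR.so) is the
# usual one on Linux but is stated here as an assumption.
r_build_ok() (
  r_home=$1
  [ -n "$r_home" ] || exit 1
  # The R launcher script must exist and be executable
  [ -x "$r_home/bin/R" ] || exit 1
  # libR.so is only built when configure was run with --enable-R-shlib
  [ -f "$r_home/lib/libR.so" ]
)

# Typical call on the R host (R RHOME prints the R home directory):
#   r_build_ok "$(R RHOME)" && echo "R was built with the shared library"
```

If the check fails for lib/libR.so, re-running ./configure with --enable-R-shlib and rebuilding is the first thing to try.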

Installing Rserve
Install Rserve on the same host. SAP has tested the integration with Rserve version 0.6-8. Execute the following steps as root user. The user under which you want to use R is called ruser.

1.) Install the Rserve package.
a. Download the Rserve package from Rserve: Binary R server or from The R Project for Statistical Computing.
b. Log in as root user and install the package using the following R terminal commands:
R
install.packages("/PATH/TO/YOUR/Rserve.tar.gz", repos = NULL)
library("Rserve") # test if installation worked, it should return no output
q()

2.) Configure Rserve. As user root, create the file /etc/Rserv.conf with the following content and grant read access to the ruser user:
maxinbuf 10000000
maxsendbuf 0
remote enable

The value 10000000 is merely an example. We recommend that you set the value of maxinbuf to (physical memory size, in bytes) / 2048. For example, if you installed R on a host with 256 GB of physical memory, you should set maxinbuf to 134217728.

3.) Start Rserve. Log in as ruser and enter the following:
R CMD Rserve --RS-port <PORT> --no-save --RS-encoding "utf8"

The port used to start Rserve has to be chosen according to the cer_rserve_port value in the indexserver.ini file (see section Prepare the SAP HANA database for R below). <PORT> is the port number, e.g. 30120. The --no-save option makes sure that the invoked R runtimes do not store the R environment onto the file system after the R execution has been stopped. This is important to keep the file system from filling up over time due to multiple R runs.

There is currently no support for automatically starting the Rserve server after rebooting the Linux host. To accomplish this, you can for instance use crontab with a shell script like the following, which starts a new Rserve process if none is running:
pgrep -u ruser -f "Rserve --RS-port <PORT> --no-save" || R CMD Rserve --RS-port <PORT> --no-save
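The maxinbuf recommendation above (physical memory in bytes divided by 2048) can be computed rather than worked out by hand. This is a minimal sketch: the function names are hypothetical, and maxinbuf_for_host assumes a Linux host where /proc/meminfo reports MemTotal in kB.

```shell
# Sketch of the maxinbuf rule of thumb: physical memory in bytes / 2048.
# All function names here are hypothetical helpers for illustration.
maxinbuf_for_bytes() {
  echo $(( $1 / 2048 ))
}

# Read MemTotal (reported in kB) from /proc/meminfo and apply the same rule.
maxinbuf_for_host() (
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  maxinbuf_for_bytes $(( ${mem_kb:-0} * 1024 ))
)

# Print the three Rserv.conf settings with the given maxinbuf value.
render_rserv_conf() {
  printf 'maxinbuf %s\nmaxsendbuf 0\nremote enable\n' "$1"
}

# Typical call as root on the Rserve host:
#   render_rserv_conf "$(maxinbuf_for_host)" > /etc/Rserv.conf
```

For the 256 GB host in the example, maxinbuf_for_bytes 274877906944 gives 134217728, matching the value quoted above.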

Configuring SAP HANA Parameters


Depending on your system landscape and your R requirements, you may need to modify some of the SAP HANA database configurations. All of the R-related configuration parameters are found in the indexserver.ini file, under the calcEngine section. To modify indexserver.ini parameters, use the SAP HANA studio:
1.) Right click on your system node at the navigator tab.
2.) Select Administration.
3.) Select on the right hand side the Configuration tab.
4.) Select the indexserver.ini
5.) Select the calcengine

In indexserver.ini under the calcEngine section you can add the following configuration parameters:

cer_timeout
Connection timeout in seconds
Default: 300
This parameter is particularly important, since it defines the maximum run time allowed for a single R function execution. If you expect your R processing to run longer than 5 minutes, you should modify this parameter; otherwise the R processing will be stopped before completion.

cer_rserve_addresses
List of host (given as IPv4 address) and port pairs where Rserve instances are running
Has to be set as follows: "host1:port1,host2:port2,..."
Use multiple hosts to accomplish High Availability

cer_rserve_maxsendsize
Maximum size of a result transferred from R to SAP HANA (in Kbytes)
Default: 0 (no limit)
If the result size exceeds the limit, the transfer is aborted with an error
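Because cer_rserve_addresses expects IPv4 addresses in the "host1:port1,host2:port2,..." form, a small helper can assemble and sanity-check the value before it is pasted into the HANA studio. This is only a sketch: build_rserve_addresses is a hypothetical name, the sample addresses are made up, and the check covers the shape of each pair only (it does not verify that each octet is 255 or less, or that Rserve is reachable).

```shell
# Sketch: join host:port arguments into the cer_rserve_addresses format.
# build_rserve_addresses is a hypothetical helper; it rejects any argument
# that is not shaped like <dotted-quad IPv4>:<numeric port>.
build_rserve_addresses() (
  out=
  for pair in "$@"; do
    host=${pair%:*}
    port=${pair##*:}
    # Host must look like an IPv4 address, since host names are not accepted
    printf '%s\n' "$host" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || exit 1
    # Port must be purely numeric
    printf '%s\n' "$port" | grep -Eq '^[0-9]+$' || exit 1
    out=${out:+$out,}$pair
  done
  [ -n "$out" ] || exit 1
  printf '%s\n' "$out"
)

# Example with two made-up Rserve hosts:
#   build_rserve_addresses 10.0.0.1:30120 10.0.0.2:30120
```

The ports passed in should be the same ones used with --RS-port when starting Rserve on each host.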

Once R and Rserve are installed on the Linux host, download and install the required R packages (see list from Step 2). Before installing rJava, run the following from the Linux shell:
R CMD javareconf -e

Install the following Packages in R:


install.packages("rJava", lib="/usr/local/lib64/R/library")
install.packages("RJDBC", lib="/usr/local/lib64/R/library")
install.packages("RODBC", lib="/usr/local/lib64/R/library")
install.packages("DBI", lib="/usr/local/lib64/R/library")
install.packages("monmlp", lib="/usr/local/lib64/R/library")
install.packages("AMORE", lib="/usr/local/lib64/R/library")
install.packages("pmml", lib="/usr/local/lib64/R/library")
install.packages("arules", lib="/usr/local/lib64/R/library")
install.packages("caret", lib="/usr/local/lib64/R/library")
install.packages("XML", lib="/usr/local/lib64/R/library")

Just like in Step 2, check that all packages have been installed successfully by typing library(packagename) in the R Console window. If no error is displayed, the package is installed successfully.
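As a quick complement to the library(packagename) check, the library folder itself can be scanned from the Linux shell for missing packages. This is a rough sketch: missing_r_packages is a hypothetical helper, and it relies on the assumption that each installed R package is a folder containing a DESCRIPTION file. It does not prove that a package loads, so library() remains the real test.

```shell
# Package names are taken from the list in Step 2.
REQUIRED_R_PACKAGES="rJava AMORE RJDBC pmml RODBC arules DBI caret monmlp XML"

# Sketch: print every required package that has no <libdir>/<package>/DESCRIPTION.
# missing_r_packages is a hypothetical helper name.
missing_r_packages() (
  lib_dir=$1
  for pkg in $REQUIRED_R_PACKAGES; do
    [ -f "$lib_dir/$pkg/DESCRIPTION" ] || printf '%s\n' "$pkg"
  done
)

# Typical call, using the library path from the install.packages commands:
#   missing_r_packages /usr/local/lib64/R/library
```

An empty result means every package folder is present; any names printed are candidates for re-running install.packages.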

To test the HANA/R integration with PA, add a HANA R Multiple Linear Regression object to the previous document using the gasoline.csv data with the same independent and dependent variables as the HANA MLR object.

Upon successful completion, the Prediction Results view shows the R algorithm output and automatically generates a bar/line chart to plot the predicted vs. actual values.

Hillary Bliss, Business Intelligence Consultant
Decision First Technologies
Hillary.bliss@decisionfirst.com
Twitter: @HillaryBlissDFT

Hillary Bliss is a Business Intelligence Consultant specializing in data warehouse design, ETL development, statistical analysis, and predictive modeling. Hillary works with clients and vendors to integrate business analysis and predictive modeling solutions into data warehouses based on their data and business needs. With Decision First Technologies, Hillary uses Data Services, Web Intelligence, Predictive Analysis, and HANA. Hillary has a Master's in Statistics and an MBA from Georgia Tech.

References:
- CRAN (Comprehensive R Archive Network)
- SAP Predictive Analysis User Guide (1.0.8)
- SAP BusinessObjects Predictive Analysis 1.0 Installation and Configuration (previous version of PA)
- How to Install and Configure Open Source R on Microsoft Windows 7 for SAP PA 1.0
- SAP HANA Installation Guide with Unified Installer for SP05
- SAP HANA Predictive Analysis Library (PAL) Reference
- SAP Note 1650957 (Enabling the HANA Scripting Server)
- SAP HANA R Integration Guide
