SAP Predictive Analysis is the latest addition to the SAP BusinessObjects suite and introduces entirely new functionality to the existing BusinessObjects toolbox. Predictive Analysis integrates SAP's Visual Intelligence data visualization tool with new predictive functionality powered by both open source R and SAP-written algorithms. Predictive Analysis includes algorithms for time series forecasting (for predicting sales, demand, price, and other time-dependent metrics), clustering (for identifying distinct groups of individuals based on numeric descriptive data), decision trees (for creating a tree-like set of decision support rules to categorize observations), and linear regression (for fitting linear relationships between a dependent variable and one or more predictors). These predictive algorithms can be used to extract insights and predictions, improving the value and actionability of the existing Business Intelligence infrastructure.
Predictive Analysis combines these predictive algorithms with a familiar, easy-to-use tool that integrates with existing BusinessObjects tools to make data preparation, model building, and implementation faster and easier. SAP Predictive Analysis is installed locally on the user's machine and accesses data for processing locally (from a CSV file, an Excel file, or an ODBC connection to a database) or on SAP HANA. Predictive Analysis can be used alone to analyze data on the client machine or can be paired with HANA, allowing PA to leverage the in-memory processing power of SAP HANA.
Today's blog post is a brief how-to for getting SAP Predictive Analysis installed and working for both local data and on HANA. Since Predictive Analysis has the power to run local R algorithms, HANA-R algorithms, and HANA Predictive Analysis Library (PAL) algorithms, installation is somewhat complex. Installation steps include:
1.) Installing PA
2.) Installing R on local machine
3.) PA HANA PAL installation/activation
4.) R and Rserve Linux installation and HANA Rserve setup
The following items are required to perform the full installation:
- Predictive Analysis installation files for version 1.0.8 (or latest available)
- An internet connection to install R and required packages
- SAP HANA
- A Linux R/Rserve host (SAP currently only supports SUSE Linux)
At this point, most of the predictive algorithms will be grayed out and not available because R is not yet installed.
Before installing R, ensure that Java is installed and available on the computer by typing java -version at the command line. The output should report Java version 1.6.0 or higher, which Predictive Analysis requires.
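The version check can also be scripted. The following is a minimal sketch, assuming a POSIX-style shell is available (for example on the Linux Rserve host, or Git Bash on Windows); the function names java_version_ok and detect_java_version are hypothetical helpers introduced here, not part of Predictive Analysis or Java.

```shell
# Succeed (exit status 0) if a Java version string is 1.6 or newer.
# Handles the legacy "1.x.y_zz" scheme as well as newer "9" / "11.0.2" strings.
java_version_ok() {
    local version major minor
    version="$1"
    major="${version%%.*}"        # text before the first dot
    major="${major%%[!0-9]*}"     # keep leading digits only
    minor="${version#*.}"         # text after the first dot
    minor="${minor%%[!0-9]*}"     # keep leading digits only
    [ -n "$major" ] || return 1
    if [ "$major" -gt 1 ]; then
        return 0
    fi
    [ "$major" -eq 1 ] && [ -n "$minor" ] && [ "$minor" -ge 6 ]
}

# Extract the quoted version string from "java -version" (written to stderr).
detect_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

A typical call would be: java_version_ok "$(detect_java_version)" && echo "Java is recent enough".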
First, go to the CRAN (Comprehensive R Archive Network) site and download R for Windows (version 2.15 or later). Select base to install the base R software. Finally, select Download R 2.15.3 (or the current latest version) for Windows.
In addition, R requires a list of packages for the built-in algorithms, so download these from CRAN as well by selecting Packages on the left menu.
The list of packages required by PA is below. Search for each package in the CRAN packages library and download the zip file to any location on your computer.
rJava AMORE RJDBC pmml RODBC arules DBI caret monmlp XML
Once all the R install files have been downloaded, install R by running the R installer (e.g., R-2.15.3-win.exe) and clicking Next where appropriate. On the component selection screen, select Core Files and either the 32-bit or 64-bit Files as appropriate. Note the installation location for this instance of R (it should be something like C:\Program Files\R\R-2.15.3).
After installing R, open up the R GUI to install the packages that have been downloaded.
Select each of the downloaded package zip files on the list above and install them one by one. If a package installs successfully, the message "package <packagename> successfully unpacked and MD5 sums checked" will appear in the R Console window. To test that the packages have installed successfully, type library(packagename) in the R Console window. If no error is displayed, the package is installed successfully.
Finally, set environment variables to add the libraries folder, set the home library, and update the path. These may change depending on the version and installation location for R, as noted above.
Sys.setenv(R_LIBS = "C:/Program Files/R/R-2.15.3/library")
Sys.setenv(R_HOME = "C:/Program Files/R/R-2.15.3/")
# Append the rJava JRI folder and the R bin folder to the existing path.
Sys.setenv(Path = paste(Sys.getenv("Path"), "C:/Program Files/R/R-2.15.3/library/rJava/jri", "C:/Program Files/R/R-2.15.3/bin", sep = ";"))
Now that R and the required packages are successfully installed, open Predictive Analysis and select File > Install and Configure R. If Predictive Analysis was open during the R installation and setup, or if changes were made to the Install and Configure R settings, the user must completely exit and re-open Predictive Analysis prior to running any predictive objects.
Select the Configuration tab. Make sure that the Enable Open Source R Algorithms box is checked and ensure that the R installation folder noted during the R console installation process matches the actual location of the R install.
Re-open the MTA fare plaza data document and proceed to the Predict view. The R algorithms should no longer be grayed out.
Let's run a quick forecast, just to make sure the R integration is working. In the prediction dataflow area, select a Filter object from the Data Preparation pane. Then connect an R-Double Exponential Smoothing object. The Filter object should filter for only Plazaid=3, while the R-Double Exponential Smoothing object should predict the dependent column ETC over the Custom period starting in 2010 with 365 periods per year. Once these settings are entered, all 3 objects should show successful validation, with a green check on each object.
Success! Review the results by clicking Yes and then selecting the Charts view in the Results pane. The chart below shows the actual (blue bars) vs. predicted (green line) traffic at Plaza 3.
Refer to the SAP HANA Installation Guide with Unified Installer for SP05 section 3.8: Installing Application Function Libraries (AFLs) on a SAP HANA System to install the PAL on HANA. Enable scripting in HANA by starting the script server per SAP Note 1650957:
Starting the script server
You start the script server while the SAP HANA database is already running. To start the script server, perform the following steps:
1. Open the 'Configuration' tab page in the SAP HANA database studio.
2. Expand the 'daemon.ini' configuration file.
3. Expand the 'scriptserver' section.
4. Change the parameter 'instances' from 0 to 1. This change is possible on the system level and on the host level.
Note: The system will start the script server immediately.
Note: You have to start a script server instance for each index server instance.
See SAP Note 1650957 for additional information. For further information on the PAL, see the SAP HANA Predictive Analysis Library (PAL) Reference.
To test that the PAL installation was successful, run a quick analysis using a PAL algorithm. Import the gasoline data into HANA using the Data Import > Data from Local File tool.
Original data source: http://people.sc.fsu.edu/~jburkardt/datasets/regression/x17.txt
Now that the data is available in HANA, create a new document in PA and select the HANA Online data source:
Enter the HANA server connection information and click Connect HANA instance. After a successful connection, navigate to the table that was just imported into HANA and click Acquire.
In the Predict pane, add a HANA Multiple Linear Regression object and select Component 1, Component 2, Component 3 and Condition as the Independent Columns and Octane Rating as the Dependent Column.
The predictive workflow should indicate validation (green check mark on each item). Click the Run Analysis button to generate the predicted values for Octane Rating. When the success message appears, review the results showing that the regression has generated a model with R-Squared of 0.906.
A visualization can be generated by going to the Prediction Pane Visualize tool and plotting the Actual and Predicted Octane Rating vs. Sample.
All HANA algorithms should now be available in the Prediction pane for the HANA Online document.
Java headers gen.: Example: JAVAH=/apps/java/jdk/bin/javah
Java archive tool: Example: JAR=/apps/java/jdk/bin/jar
Java library paths:
Example: JAVA_LD_LIBRARY_PATH=$(JAVA_HOME)/jre/lib/amd64/server:$(JAVA_HOME)/jre/lib/amd64:$(JAVA_HOME)/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Example: JAVA_LIBS=$(JAVA_HOME)/jre/lib/amd64/server:$(JAVA_HOME)/jre/lib/amd64:$(JAVA_HOME)/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Note: If you want to run R under a dedicated user, create that user before you proceed. For example, create a user named ruser.
The following steps are taken from the SAP HANA R Integration Guide, sections 1.1.1 to 1.1.3.
Installing R
Check or install these Linux packages:
xorg-x11-devel
gcc-fortran
readline-devel (install if you want to use R standalone, for usability)
To install R for SAP HANA, you must compile the R package from its source code. SAP has tested the SAP HANA integration with R version 2.15. Execute the following steps as user root:
1.) To compile R, download the R (version 2.15) source package from the R Project for Statistical Computing website.
2.) Extract it to a user-defined directory. Go into this directory and execute:
./configure --enable-R-shlib
If you have trouble during configuration, you could try the following options:
--with-readline=no --with-x=no
If you want to have terminal/history support in R, you can install the Linux package readline-devel.
3.) Then compile:
make clean
make
make install
After successful compilation, the 'R' command will be installed in /usr/local/bin. If you decide to install it into another directory, make sure that it is properly set in your PATH variable.
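The compile steps above can be wrapped in a small script that retries configuration with the fallback options when the first attempt fails. This is a minimal sketch, not part of the SAP guide: the helper names run and build_r_from_source, the DRY_RUN switch, and the example source path are assumptions made for illustration. It expects an already extracted source tree and, for a real installation, root privileges.

```shell
# Print each command; execute it only when DRY_RUN is not set to 1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Configure, compile, and install R from an extracted source tree.
# $1: path to the extracted R source directory.
# The body runs in a subshell so the caller's working directory is unchanged.
build_r_from_source() (
    src_dir="$1"
    run cd "$src_dir" || exit 1
    # Try the shared-library build first; if configuration fails, retry
    # with the options suggested for troubleshooting.
    if ! run ./configure --enable-R-shlib; then
        run ./configure --enable-R-shlib --with-readline=no --with-x=no || exit 1
    fi
    run make clean || exit 1
    run make || exit 1
    run make install
)
```

Running DRY_RUN=1 build_r_from_source /tmp/R-2.15.3 (a hypothetical path) only prints the commands, which is a cheap way to review them before running the real build as root.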
Installing Rserve
Install Rserve on the same host. SAP has tested the integration with Rserve version 0.6-8. Execute the following steps as the root user. The user under which you want to use R is called ruser.
1.) Install the Rserve package.
a. Download the Rserve package from Rserve: Binary R server or from The R Project for Statistical Computing.
b. Log in as the root user and install the package using the following R terminal commands:
R
install.packages("/PATH/TO/YOUR/Rserve.tar.gz", repos = NULL)
library("Rserve") # test if installation worked, it should return no output
q()
2.) Configure Rserve. As user root, create the file /etc/Rserv.conf with the following content and grant the ruser user read access to the file:
maxinbuf 10000000
maxsendbuf 0
remote enable
The value 10000000 is merely an example. We recommend that you set the value of maxinbuf to (physical memory size, in bytes) / 2048. For example, if you installed R on a host with 256 GB of physical memory, you should set maxinbuf to 134217728.
3.) Start Rserve. Log in as ruser and enter the following:
R CMD Rserve --RS-port <PORT> --no-save --RS-encoding "utf8"
The port used to start Rserve has to be chosen according to the cer_rserve_port value in the indexserver.ini file (see section Prepare the SAP HANA database for R below). <PORT> is the port number, e.g. 30120. The --no-save option makes sure that the invoked R runtimes do not store the R environment on the file system after the R execution has been stopped. This is important to keep the file system from filling up over time due to multiple R runs.
There is currently no support for automatically starting the Rserve server after rebooting the Linux host. To accomplish this, you can for instance use crontab with a shell script like the following, which starts a new Rserve process if none is running:
pgrep -u ruser -f "Rserve --RS-port <PORT> --no-save" || R CMD Rserve --RS-port <PORT> --no-save
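The maxinbuf recommendation and the Rserv.conf file from step 2 can be sketched in shell as follows. The function names are hypothetical helpers, reading memory from /proc/meminfo assumes a Linux host, and making the file world-readable with chmod 644 is one assumed way of granting ruser read access.

```shell
# Recommended maxinbuf: physical memory size in bytes divided by 2048.
recommended_maxinbuf() {
    echo $(( $1 / 2048 ))
}

# Physical memory in bytes, read from /proc/meminfo (which reports kB).
physical_memory_bytes() {
    mem_kb="$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
    echo $(( mem_kb * 1024 ))
}

# Write the three Rserve settings to a config file.
# $1: target path (e.g. /etc/Rserv.conf), $2: maxinbuf value.
write_rserv_conf() {
    conf_path="$1"
    maxinbuf="$2"
    cat > "$conf_path" <<EOF
maxinbuf $maxinbuf
maxsendbuf 0
remote enable
EOF
    # Assumption: world-readable is acceptable, so ruser can read the file.
    chmod 644 "$conf_path"
}
```

As root, write_rserv_conf /etc/Rserv.conf "$(recommended_maxinbuf "$(physical_memory_bytes)")" would then create the file with a value sized to the host. For the 256 GB example, recommended_maxinbuf 274877906944 yields 134217728.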
In indexserver.ini, under the calcEngine section, you can add the following configuration parameters:
cer_timeout
Connection timeout in seconds. Default: 300. This parameter is particularly important, since it defines the maximum run time allowed for a single R function execution. If you expect your R processing to run longer than 5 minutes, you should increase this parameter; otherwise the R processing will be stopped before completion.
cer_rserve_addresses
List of host (given as IPv4 address) and port pairs where Rserve instances are running. Has to be set as follows: "host1:port1,host2:port2,...". Use multiple hosts to accomplish high availability.
cer_rserve_maxsendsize
Maximum size of a result transferred from R to SAP HANA (in Kbytes). Default: 0 (no limit). If the result size exceeds the limit, the transfer is aborted with an error.
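Because cer_rserve_addresses expects IPv4 addresses rather than host names, it can help to check the string before entering it in indexserver.ini. The following is a minimal sketch with a hypothetical helper name; it checks only the "a.b.c.d:port" shape of each comma-separated entry, not whether the octets and ports are in a valid range or whether the host is reachable.

```shell
# Succeed if the argument looks like "ip:port[,ip:port...]" with IPv4 addresses.
# Format check only: octet and port ranges are not validated.
valid_rserve_addresses() {
    printf '%s' "$1" | grep -Eq \
        '^([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5}(,([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5})*$'
}
```

For example, valid_rserve_addresses "10.0.0.5:30120,10.0.0.6:30120" succeeds, while a host name such as "rhost:30120" is rejected (the addresses shown are made-up examples).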
Once R and Rserve are installed on the Linux host, download and install the required R packages (see the list in Step 2). Before installing rJava, run the following from the Linux shell:
R CMD javareconf -e
Just like in Step 2, check that all packages have been installed successfully by typing library(packagename) in the R Console window. If no error is displayed, the package is installed successfully.
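On the Linux host, the same check can be run for all packages at once from the shell instead of typing library(packagename) ten times. This is a minimal sketch: the function name and the RSCRIPT override variable are assumptions for illustration, and it relies on Rscript (installed alongside R) returning a non-zero exit status when library() fails.

```shell
# Packages required by PA, as listed in Step 2.
REQUIRED_R_PACKAGES="rJava AMORE RJDBC pmml RODBC arules DBI caret monmlp XML"

# Try to load each required package in a fresh R process.
# Prints the packages that fail to load and returns non-zero if any do.
# RSCRIPT may be set to an explicit path; it defaults to Rscript on the PATH.
check_r_packages() {
    missing=""
    for pkg in $REQUIRED_R_PACKAGES; do
        if ! "${RSCRIPT:-Rscript}" -e "suppressMessages(library($pkg))" >/dev/null 2>&1; then
            missing="$missing $pkg"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing or broken packages:$missing"
        return 1
    fi
    echo "All required packages load."
}
```

Run check_r_packages as the user that will start Rserve (ruser in this guide), so that the check sees the same library paths as the Rserve process.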
To test the HANA/R integration with PA, add a HANA R Multiple Linear Regression object to the previous document using the gasoline.csv data with the same independent and dependent variables as the HANA MLR object.
Upon successful completion, the Prediction Results view shows the R algorithm output and automatically generates a bar/line chart to plot the predicted vs. actual values.
Hillary Bliss, Business Intelligence Consultant
Decision First Technologies
Hillary.bliss@decisionfirst.com
twitter @HillaryBlissDFT
Hillary Bliss is a Business Intelligence Consultant specializing in data warehouse design, ETL development, statistical analysis, and predictive modeling. Hillary works with clients and vendors to integrate business analysis and predictive modeling solutions into data warehouses based on their data and business needs. With Decision First Technologies, Hillary uses Data Services, Web Intelligence, Predictive Analysis, and HANA. Hillary has a Master's in Statistics and an MBA from Georgia Tech.
References:
CRAN (Comprehensive R Archive Network)
SAP Predictive Analysis User Guide (1.0.8)
SAP BusinessObjects Predictive Analysis 1.0 Installation and Configuration (previous version of PA)
How to Install and Configure Open Source R on Microsoft Windows 7 for SAP PA 1.0
SAP HANA Installation Guide with Unified Installer for SP05
SAP HANA Predictive Analysis Library (PAL) Reference
SAP Note 1650957 (Enabling the HANA Scripting Server)
SAP HANA R Integration Guide