TABLE OF CONTENTS
INTRODUCTION
EXERCISE INSTRUCTIONS
Acquire Data
Visualize Room
Create Geographical Hierarchy
Data Visualizations
Descriptive Statistics
FURTHER READING
openSAP EXERCISE WEEK 1 UNIT 6
INTRODUCTION
These exercises introduce some of the methods we can use to undertake Initial Data Analysis with the
expert tool in SAP BusinessObjects Predictive Analytics.
There are 5 columns of data and 150 rows. The columns represent the variables defined above, and the
rows represent the values for each of these variables for each individual store.
The exercises show a variety of different visualizations you can use to gain a deeper understanding of the
data and undertake an Initial Data Analysis.
EXERCISE INSTRUCTIONS
Acquire Data
For this exercise, the data set we will be using is the openSAP_STORES_US.csv text file. Therefore,
select Text as the data source and then press Next.
Navigate to the folder where you have downloaded the data sets that accompany this training and select
openSAP_STORES_US.csv:
Press Open. The selected data will be read by SAP Predictive Analytics:
Press Create.
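Outside the tool, the same acquisition step can be sketched in a few lines of Python. This is only an illustration: the header names, the column order and the comma delimiter are assumptions about openSAP_STORES_US.csv, and the sample rows are invented so that the sketch runs without the real file.

```python
import csv
import io

# Measure names used in this exercise; their presence as CSV headers is an assumption.
NUMERIC_COLUMNS = ("TURNOVER", "SIZE", "STAFF", "MARGIN")


def load_stores(handle, delimiter=","):
    """Read store rows from an open text file and convert the measures to float."""
    rows = []
    for record in csv.DictReader(handle, delimiter=delimiter):
        row = {"STORE": record["STORE"].strip()}
        for name in NUMERIC_COLUMNS:
            row[name] = float(record[name])
        rows.append(row)
    return rows


# Invented sample data standing in for the real file.
sample = io.StringIO(
    "TURNOVER,SIZE,STAFF,MARGIN,STORE\n"
    "5.1,3.5,1.4,0.2,Fresno\n"
    "7.0,3.2,4.7,1.4,Oakland\n"
)
stores = load_stores(sample)
print(len(stores), "rows,", len(stores[0]), "columns")
```

With the downloaded file, you would pass an open file handle instead of the in-memory sample, for example open("openSAP_STORES_US.csv", newline="").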
Visualize Room
The data set will be created and you will enter the Visualize Room:
On the left side you will see the Measures and Dimensions.
Data is grouped into measures (for quantitative data) and dimensions (for categorical data).
Measures and dimensions can be dragged directly to the Chart Canvas or to shelves in the Chart Builder.
Dimensions can be thought of as the rows in a spreadsheet. They are the things you want to track, such as
customers, pages, country of origin, product category and other items whose attributes are often
non-numerical. Commonly used dimensions are people, products, place and time. Working with dimensions is
often described as "slice and dice". Slicing refers to filtering data. Dicing refers to grouping data. A common
example involves sales as the measure, with customer and product as dimensions. In each sale a customer
buys a product. The data can be sliced by removing all customers except for a group under study, and then
diced by grouping by product.
Measures are like the columns in a spreadsheet. They are the quantities you want to measure, such as visits,
page views, hits, bounce rate and other items that can be quantified numerically. A measure is a property on
which calculations (e.g., sum, count, average, minimum, maximum) can be made.
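The slice-and-dice idea can be sketched in Python as follows. The sales records, customer names and the helper functions slice_by and dice_by are all invented for illustration; they are not part of SAP Predictive Analytics.

```python
from collections import defaultdict

# Invented sales records: customer and product are dimensions, amount is the measure.
sales = [
    {"customer": "Ann", "product": "Shoes", "amount": 40.0},
    {"customer": "Bob", "product": "Shoes", "amount": 55.0},
    {"customer": "Ann", "product": "Hats", "amount": 15.0},
    {"customer": "Cid", "product": "Hats", "amount": 20.0},
    {"customer": "Ann", "product": "Shoes", "amount": 35.0},
]


def slice_by(rows, dimension, keep):
    """Slicing: keep only the rows whose dimension value is in the group under study."""
    return [row for row in rows if row[dimension] in keep]


def dice_by(rows, dimension, measure):
    """Dicing: group the rows by a dimension and sum the measure per group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


group_under_study = {"Ann", "Cid"}
sliced = slice_by(sales, "customer", group_under_study)
print(dice_by(sliced, "product", "amount"))
```

Here the slice removes Bob's sale, and the dice then totals the remaining sales amounts per product.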
This exercise will use a number of different data visualizations to give you a deeper understanding of the
data.
SAP Predictive Analytics has already created measures for some numeric variables (automatic enrichment).
Under dimensions, it lists the numeric variables with a 123 icon and the store name as a potential
geographical variable (world icon). We can use this geographical information to create a geographical hierarchy.
Create Geographical Hierarchy
Click the Options button next to the STORE variable and select Create a geographic hierarchy, then By Names.
Press Confirm.
When the store names are compared against the internal look-up table, 139 geographical areas are solved and
11 are unsolved. The unsolved areas occur because there are multiple cities with the same name, and manual
confirmation is required to resolve the conflict.
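The way ambiguous names end up unsolved can be illustrated with a small Python sketch. The look-up table below is hypothetical and far smaller than any real one, and the matching rule (exactly one candidate means solved) is an assumption made for illustration, not a description of the tool's internal logic.

```python
# Hypothetical look-up table: city name -> candidate (state, country) matches.
LOOKUP = {
    "Fresno": [("California", "USA")],
    "Oakland": [("California", "USA")],
    "Springfield": [
        ("Illinois", "USA"),
        ("Missouri", "USA"),
        ("Massachusetts", "USA"),
    ],
}


def resolve(names, lookup):
    """Split names into solved (one candidate) and unsolved (none or several)."""
    solved, unsolved = {}, {}
    for name in names:
        candidates = lookup.get(name, [])
        if len(candidates) == 1:
            solved[name] = candidates[0]
        else:
            unsolved[name] = candidates  # needs manual confirmation
    return solved, unsolved


solved, unsolved = resolve(["Fresno", "Oakland", "Springfield"], LOOKUP)
print(len(solved), "solved,", len(unsolved), "unsolved")
```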
Press Done.
Data Visualizations
There are a number of different visualizations you can now produce to start to gain a deeper understanding
of the data:
To achieve the filtered chart above, ensure you have TURNOVER as the measure, select the 123 radio
button and choose the filter:
This indicates that there are some errors in the allocation of the geographical areas that were automatically
assigned, as the cities should all be located in the USA. This error is an important finding in the analysis and
the data should be corrected. This is achieved as follows:
Press Done.
This analysis indicates that there is a wide distribution of margin and size across all of the stores in the US.
For more specific information you could, for example, filter on the top 20 or bottom 20 stores.
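A top-N or bottom-N filter of this kind can be sketched in Python with the standard heapq module. The store names and TURNOVER values below are invented, and N is set to 2 only to keep the sample small; with the full data set you would use 20.

```python
import heapq


def top_n(rows, measure, n):
    """Return the n rows with the largest value of the given measure."""
    return heapq.nlargest(n, rows, key=lambda row: row[measure])


def bottom_n(rows, measure, n):
    """Return the n rows with the smallest value of the given measure."""
    return heapq.nsmallest(n, rows, key=lambda row: row[measure])


# Invented values for illustration only.
stores = [
    {"STORE": "A", "TURNOVER": 5.1},
    {"STORE": "B", "TURNOVER": 7.0},
    {"STORE": "C", "TURNOVER": 4.6},
    {"STORE": "D", "TURNOVER": 6.3},
]

print([row["STORE"] for row in top_n(stores, "TURNOVER", 2)])
print([row["STORE"] for row in bottom_n(stores, "TURNOVER", 2)])
```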
The Scatter Matrix Chart will give you an initial understanding of possible outliers and groups within the data.
Select the Scatter Matrix Chart:
The scatter matrix chart shows that there are possibly two or more groups of stores.
A bubble chart:
The bubble chart shows 4 variables: TURNOVER, SIZE, STAFF and STORE. It is filtered to show the
stores for California only.
There are clearly some very interesting stores. For example, the store in the top-right bubble, Santa Clarita,
has large STAFF, TURNOVER and SIZE. However, just underneath there is another similar-sized bubble
representing Moreno Valley that has similar staff and turnover, but a smaller store size. It would be interesting
for the organization to understand why this store can achieve similar turnover with the same number of staff
but in a smaller retail area. There are also some other interesting stores to investigate, such as Fresno and
Oakland.
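One way to make this comparison concrete is to rank stores by turnover per unit of store size. The sketch below does this in Python. The ratio is our own illustrative choice rather than a statistic produced by the tool, and the figures for the two stores are invented, since the actual values are only visible in the chart.

```python
def turnover_per_size(store):
    """Turnover achieved per unit of store size (illustrative ratio)."""
    return store["TURNOVER"] / store["SIZE"]


# Invented values: similar turnover and staff, different store size.
stores = [
    {"STORE": "Santa Clarita", "TURNOVER": 7.7, "SIZE": 3.8, "STAFF": 6.7},
    {"STORE": "Moreno Valley", "TURNOVER": 7.6, "SIZE": 3.0, "STAFF": 6.6},
]

ranked = sorted(stores, key=turnover_per_size, reverse=True)
for store in ranked:
    print(store["STORE"], round(turnover_per_size(store), 2))
```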
Heat Maps and Tree Maps can provide useful comparative data insight:
The data can be viewed so you can pinpoint stores and look at the actual data:
The Parallel Coordinates Chart can also be used to see outliers and potential groups in the data (remove
filter):
The Parallel Coordinates Chart confirms that there are possibly two or more groups of stores. Looking at the
last vertical axis, for MARGIN, you will see that the stores fall into possibly two regions. This is also
true for the STAFF axis. Interestingly, high MARGIN stores seem to group with high STAFF, low STAFF and
relatively low TURNOVER. This insight suggests that a segmentation model might give us very
interesting results, but more about these algorithms later in the course.
Radar charts can be used to compare different dimensions and point to unusual values. This chart has been
simplified by filtering on the top 15 stores by selecting the option in the 123 radio button:
Descriptive Statistics
The data is shown in the data component on the left-hand side. Click the green arrow radio button:
This will take you to the Results tab with the Data Grid:
This will give the Statistical Summary Chart with the distribution, count, min, max, range, standard
deviation, variance, average, sum and count of all values for the measures.
Note that the count for each variable is 150. This means there are no missing values in any of these
variables.
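The same summary figures, including the missing-value check, can be sketched in Python with the standard statistics module. Two assumptions apply: the sketch uses the sample (n - 1) variance and standard deviation, which may differ from the formula the tool uses, and the input values are invented, with None standing in for a missing value.

```python
import statistics


def summarize(values):
    """Descriptive statistics for one measure; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        "average": statistics.mean(present),
        "sum": sum(present),
        "variance": statistics.variance(present),  # sample variance (assumption)
        "std_dev": statistics.stdev(present),      # sample standard deviation
    }


# Invented values for illustration only.
staff = [1.4, 4.7, None, 6.0, 1.3]
summary = summarize(staff)
print(summary["count"], "values,", summary["missing"], "missing")
```

If the count equals the number of rows, as it does for all 150 rows in this exercise, the missing figure is zero.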
The distributions of TURNOVER and SIZE are fairly normal, with averages of 5.84 and 3.05 respectively.
However, the distributions of STAFF and MARGIN are more bimodal, with two distinct peaks. The number
of staff in a store ranges from 1 to 6.9, which represents 10 to 69 staff members.
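A simple text histogram is enough to see two peaks like these, and the factor of 10 converts the stored STAFF values to head counts. The Python sketch below shows both. The STAFF values are invented, although they span the same 1 to 6.9 range, and the bin boundaries are an arbitrary choice.

```python
def histogram(values, low, high, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        index = min(int((value - low) / width), bins - 1)  # clamp the top edge
        counts[index] += 1
    return counts


# Invented STAFF values, in the scaled units used by the data set.
staff = [1.0, 1.3, 1.4, 1.5, 1.6, 1.7, 4.0, 4.5, 4.7, 5.1, 5.6, 6.0, 6.9]

counts = histogram(staff, low=1.0, high=7.0, bins=6)
for i, count in enumerate(counts):
    print(f"{1.0 + i:.0f}-{2.0 + i:.0f} | " + "#" * count)

# Convert the scaled values to head counts (1 -> 10 staff, 6.9 -> 69 staff).
head_counts = [round(value * 10) for value in staff]
print(min(head_counts), "to", max(head_counts), "staff members")
```

The empty bins between the two clusters of bars are what a bimodal distribution looks like in this form.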
This completes the introductory exercise to Week 1 Unit 6 Initial Data Analysis & Exploratory Data Analysis.
FURTHER READING
There are many more visualizations in the Visualize Room that you can experiment with.
You can also compose stories from important visualizations in the Compose Room, and then
share them in the Share Room.
Detailed instructions and information regarding other visualization options can be found in the user guide
pa31_expert_user_en.pdf.
Coding Samples
Any software coding or code lines/strings (Code) provided in this documentation are only examples and are
not intended for use in a production system environment. The Code is only intended to better explain and
visualize the syntax and phrasing rules for certain SAP coding. SAP does not warrant the correctness or
completeness of the Code provided herein and SAP shall not be liable for errors or damages caused by use of
the Code, except where such damages were caused by SAP with intent or with gross negligence.
www.sap.com