
openSAP Getting Started with Data Science

Exercise Week 1 Unit 6

Initial Data Analysis & Exploratory Data Analysis



TABLE OF CONTENTS
INTRODUCTION
EXERCISE INSTRUCTIONS
Acquire Data
Visualize Room
Create Geographical Hierarchy
Data Visualizations
Descriptive Statistics
FURTHER READING

openSAP EXERCISE WEEK 1 UNIT 6

INTRODUCTION
These exercises introduce some of the methods we can use to undertake Initial Data Analysis with the
Expert Analytics tool in SAP BusinessObjects Predictive Analytics.

The data to be used is openSAP_STORES_US.csv.

This data set is a short list of US based retail stores.

The data set contains the following variables:


STORE (US city where the store is located)
TURNOVER (annual sales for each store over the previous 12-month period, in $000000)
SIZE (retail floor space of each store, in 000s of sq. ft.)
STAFF (number of staff members, in 10s)
MARGIN (total gross margin per store, in $00000)

There are 5 columns of data and 150 rows. The columns represent the variables defined above, and the
rows represent the values for each of these variables for each individual store.
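
Outside the tool, the same shape check (rows and columns) can be sketched in a few lines of Python. This is only an illustration: the inline sample values are hypothetical and are not taken from openSAP_STORES_US.csv, the comma delimiter is an assumption, and load_stores is a helper name introduced here.

```python
import csv
import io

# Hypothetical sample in the same layout as openSAP_STORES_US.csv.
# The values are illustrative only; a comma delimiter is assumed.
SAMPLE_CSV = """STORE,TURNOVER,SIZE,STAFF,MARGIN
Fresno,5.1,3.5,1.4,0.2
Oakland,6.4,3.2,4.5,1.5
Santa Clarita,6.3,3.3,6.0,2.5
"""

NUMERIC_COLUMNS = ("TURNOVER", "SIZE", "STAFF", "MARGIN")


def load_stores(fileobj):
    """Read store rows and convert the numeric columns to float."""
    rows = []
    for record in csv.DictReader(fileobj):
        for column in NUMERIC_COLUMNS:
            record[column] = float(record[column])
        rows.append(record)
    return rows


stores = load_stores(io.StringIO(SAMPLE_CSV))
column_names = list(stores[0].keys())
print(len(stores), "rows,", len(column_names), "columns")
```

With the real file, you would pass an opened file handle instead of the in-memory sample and expect 150 rows and 5 columns.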

The exercises show a variety of different visualizations you can use to gain a deeper understanding of the
data and undertake an Initial Data Analysis.


EXERCISE INSTRUCTIONS
Acquire Data

Open SAP BusinessObjects Predictive Analytics and select Expert Analytics:

Open Expert Analytics:


Select Acquire Data:

For this exercise, the data set we will be using is the text file openSAP_STORES_US.csv. Therefore,
select Text as the data source and then press Next.


Navigate to the folder where you have downloaded the data sets that accompany this training and select
openSAP_STORES_US.csv:

Press Open. The selected data will be read by SAP Predictive Analytics:

Press Create.


Visualize Room
The data set will be created and you will enter the Visualize Room:

On the left side you will see the Measures and Dimensions.

Data is grouped into measures (for quantitative data) and dimensions (for categorical data).
Measures and dimensions can be dragged directly to the Chart Canvas or to shelves in the Chart Builder.

Dimensions can be thought of as the rows in a spreadsheet. They are the things you want to track, such as
customers, pages, country of origin, product category, and other items whose attributes are often
non-numerical. Commonly used dimensions are people, products, place, and time. Working with dimensions
is often described as "slice and dice": slicing refers to filtering data, and dicing refers to grouping data. A
common example involves sales as the measure, with customer and product as dimensions. In each sale a
customer buys a product. The data can be sliced by removing all customers except for a group under study,
and then diced by grouping by product.
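
The sales example above can be sketched in Python as a filter followed by a grouped sum. The records, customer names, and function names below are hypothetical and exist only to illustrate the idea; they are not part of the exercise data or of the tool.

```python
from collections import defaultdict

# Hypothetical sales records: (customer, product, amount).
SALES = [
    ("Alice", "Shoes", 120.0),
    ("Alice", "Hats", 30.0),
    ("Bob", "Shoes", 80.0),
    ("Carol", "Hats", 45.0),
    ("Bob", "Hats", 25.0),
]


def slice_by_customer(sales, customers):
    """Slice: keep only the sales made by the customers under study."""
    return [sale for sale in sales if sale[0] in customers]


def dice_by_product(sales):
    """Dice: group the sales by product and sum the measure (amount)."""
    totals = defaultdict(float)
    for _customer, product, amount in sales:
        totals[product] += amount
    return dict(totals)


study_group = {"Alice", "Bob"}
totals = dice_by_product(slice_by_customer(SALES, study_group))
print(totals)
```

Here the amount plays the role of the measure, while customer and product act as the dimensions used for slicing and dicing.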

Measures are like the columns in a spreadsheet. They are the quantities you want to measure, such as visits,
page views, hits, bounce rate, and other items that can be quantified numerically. A measure is a property on
which calculations (e.g., sum, count, average, minimum, maximum) can be made.

This exercise will use a number of different data visualizations to give you a deeper understanding of the
data.

SAP Predictive Analytics has already created measures for some numeric variables (automatic enrichment).
Under Dimensions it has listed the variables as numeric (123), and the store name as a potential
geographical variable (world icon). We can use this last piece of information to create a geographical hierarchy.


Create Geographical Hierarchy

Click on the Options button after the STORE variable, then select Create a geographic hierarchy > By Names.

Press Confirm.


When the store names are compared against the internal look-up table, there are 139 solved and 11 unsolved
geographical areas. The unsolved areas occur because there are multiple cities with the same name, and
manual confirmation is required to resolve the conflict.
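
To see why duplicate city names leave an area unsolved, consider the following sketch of a name look-up. It is not how SAP Predictive Analytics is implemented; the look-up table, its entries, and the reconcile function are hypothetical. A name counts as solved only when it maps to exactly one candidate location.

```python
# Hypothetical look-up table: city name -> candidate locations.
LOOKUP = {
    "Fresno": ["Fresno, California, USA"],
    "Oakland": ["Oakland, California, USA"],
    "Springfield": [
        "Springfield, Illinois, USA",
        "Springfield, Missouri, USA",
    ],
}


def reconcile(city_names, lookup):
    """Split names into solved (one candidate) and unsolved (zero or several)."""
    solved = {}
    unsolved = []
    for name in city_names:
        candidates = lookup.get(name, [])
        if len(candidates) == 1:
            solved[name] = candidates[0]
        else:
            unsolved.append(name)
    return solved, unsolved


solved, unsolved = reconcile(["Fresno", "Oakland", "Springfield"], LOOKUP)
print(len(solved), "solved;", len(unsolved), "unsolved:", unsolved)
```

An ambiguous name such as the hypothetical Springfield entry needs a manual choice between its candidates, which is what the reconciliation dialog asks you to do.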

You will need to manually resolve the 11 unsolved areas as follows:

Press Done.


The geographical hierarchy will now appear under the Dimensions section:

Data Visualizations
There are a number of different visualizations you can now produce to start to gain a deeper understanding
of the data:

Simple bar chart:


Simple bar chart with multiple measures:

Bar chart filtered for the top 10 stores by turnover:

To achieve the filtered chart above, ensure you have TURNOVER as the measure, select the 123 radio
button and choose the filter:
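
The top-N filter used for this chart corresponds to ranking the stores by the measure and keeping the first N. A minimal Python sketch follows, using hypothetical store names and turnover values rather than the real data; top_n_by_turnover is a helper name introduced here.

```python
import heapq

# Hypothetical stores; TURNOVER values are illustrative only.
STORES = [
    {"STORE": "Store A", "TURNOVER": 5.1},
    {"STORE": "Store B", "TURNOVER": 7.7},
    {"STORE": "Store C", "TURNOVER": 6.4},
    {"STORE": "Store D", "TURNOVER": 4.9},
    {"STORE": "Store E", "TURNOVER": 7.2},
]


def top_n_by_turnover(stores, n):
    """Return the n stores with the highest TURNOVER, largest first."""
    return heapq.nlargest(n, stores, key=lambda store: store["TURNOVER"])


top_two = top_n_by_turnover(STORES, 2)
print([store["STORE"] for store in top_two])
```

For the chart in this exercise, n would be 10. A bottom-N filter can be obtained the same way with heapq.nsmallest.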

Geographical maps can be used:


The map indicates that there are some errors in the automatically assigned geographical areas, as the cities
should all be located in the USA. This error is an important finding in the analysis, and the data should be
corrected as follows:

Select the options on the Geography_STORE dimension, then select edit Reconciliation.

Correct the errors:


Press Done.

This analysis indicates that there is a wide distribution of margin and size across all of the stores in the US.
To gain more specific information you could, for example, filter on the top 20 or bottom 20 stores.

The Scatter Matrix Chart will give you an initial understanding of possible outliers and groups within the data.
Select the Scatter Matrix Chart:


This will create the following visualization:

The scatter matrix chart shows that there are possibly two or more groups of stores.

A bubble chart:

The bubble chart shows four variables: TURNOVER, SIZE, STAFF and STORE. It is filtered to show the
stores in California only.

There are clearly some very interesting stores. For example, the store in the top right bubble, Santa Clarita,
has large STAFF, TURNOVER and SIZE. However, just underneath there is another similar-sized bubble
representing Moreno Valley that has similar staff and turnover, but a smaller store size. It would be
interesting for the organization to understand why this store can achieve similar turnover with the same
number of staff but in a smaller retail area. There are also some other interesting stores to investigate, such
as Fresno and Oakland.
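
One way to put a number on this kind of comparison is to compute turnover per unit of floor space. The sketch below uses two hypothetical stores with made-up values in the data set's units; they are not the real figures for Santa Clarita or Moreno Valley, and turnover_per_size is a helper name introduced here.

```python
# Hypothetical values in the data set's units; not the real figures.
STORES = [
    {"STORE": "Store A", "TURNOVER": 7.7, "SIZE": 3.8, "STAFF": 6.7},
    {"STORE": "Store B", "TURNOVER": 7.6, "SIZE": 3.0, "STAFF": 6.6},
]


def turnover_per_size(store):
    """Turnover generated per unit of retail floor space."""
    return store["TURNOVER"] / store["SIZE"]


# Rank stores from most to least turnover per unit of floor space.
ranked = sorted(STORES, key=turnover_per_size, reverse=True)
for store in ranked:
    print(store["STORE"], round(turnover_per_size(store), 2))
```

A store with similar turnover and staff but less floor space ranks higher on this ratio, which flags it for further investigation.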

Heat Maps and Tree Maps can provide useful comparative data insight:


The data can be viewed so you can pinpoint stores and look at the actual data:

The Parallel Coordinates Chart can also be used to see outliers and potential groups in the data (remove
filter):

The Parallel Coordinates Chart confirms that there are possibly two or more groups of stores. Looking at the
last vertical axis, for MARGIN, you will see that the stores are grouped into possibly two regions. This is also
true for the STAFF axis. Interestingly, high MARGIN stores seem to group with high STAFF, low STAFF and
relatively low TURNOVER. This insight suggests that a segmentation model might give us very
interesting results, but more about these algorithms later in the course.

Radar charts can be used to compare different dimensions and point to unusual values. This chart has been
simplified by filtering on the top 15 stores by selecting the option in the 123 radio button:

Descriptive Statistics

Go to the Predict Room:

The data is shown in the data component on the left-hand side. Click the green arrow radio button:

Press the OK button and the data will be analysed:


This will take you to the Results tab with the Data Grid:

Select the Statistical Summary Chart radio button:


This will display the Statistical Summary Chart, which shows the distribution, count, minimum, maximum,
range, standard deviation, variance, average, and sum of all values for each of the measures.
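
For reference, the same summary figures can be computed with Python's standard statistics module. The values below are hypothetical, and the sketch assumes sample (n - 1) variance and standard deviation; whether the tool reports the sample or the population form is not stated in this exercise.

```python
import statistics

# Hypothetical TURNOVER values; illustrative only.
TURNOVER = [5.1, 6.4, 6.3, 4.9, 7.0]


def summarize(values):
    """Compute the figures listed in the Statistical Summary Chart."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "sum": sum(values),
        "average": statistics.mean(values),
        # Assumption: sample variance and standard deviation (n - 1).
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }


summary = summarize(TURNOVER)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

To use the population form instead, swap in statistics.pvariance and statistics.pstdev.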

Note that the count for each variable is 150. This means there are no missing values in any of these
variables.
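
The reasoning here is that a count equal to the number of rows leaves no room for missing values. A small sketch of that check follows, using hypothetical rows in which one TURNOVER value is deliberately left blank; the function names are introduced for illustration only.

```python
# Hypothetical rows as they might be read from a CSV file (all strings).
ROWS = [
    {"STORE": "Store A", "TURNOVER": "5.1", "STAFF": "1.4"},
    {"STORE": "Store B", "TURNOVER": "", "STAFF": "4.5"},
    {"STORE": "Store C", "TURNOVER": "6.3", "STAFF": "6.0"},
]


def count_present(rows, column):
    """Count the rows that have a non-empty value in the given column."""
    return sum(1 for row in rows if (row.get(column) or "").strip() != "")


def missing_counts(rows, columns):
    """Missing values per column = total rows minus non-empty values."""
    return {column: len(rows) - count_present(rows, column) for column in columns}


missing = missing_counts(ROWS, ["TURNOVER", "STAFF"])
print(missing)
```

In the exercise data every count equals 150, so the same calculation would give zero missing values for each variable.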

The distributions for TURNOVER and SIZE are approximately normal, with averages of 5.84 and 3.05
respectively. However, the distributions for STAFF and MARGIN are bimodal, with two distinct peaks. The
STAFF value ranges from 1 to 6.9, which represents 10 to 69 staff members per store.
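
Because STAFF is recorded in tens, converting a value back to a headcount is a multiplication by 10. The helper name below is hypothetical; the two inputs are the minimum and maximum quoted above.

```python
def staff_headcount(staff_in_tens):
    """Convert a STAFF value (recorded in 10s) to a number of staff members."""
    return round(staff_in_tens * 10)


low = staff_headcount(1)
high = staff_headcount(6.9)
print(low, "to", high, "staff members")
```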


The Parallel Coordinates Chart is available (this was described above):

The Scatter Matrix Chart is available (this was described above):

This completes the introductory exercise to Week 1 Unit 6 Initial Data Analysis & Exploratory Data Analysis.


FURTHER READING
There are many more visualizations in the Visualize Room that you can experiment with.

You can also compose stories by selecting important presentations in the Compose Room, and then you can
share them in the Share Room.

Detailed instructions and information regarding other visualization options can be found in the user guide
pa31_expert_user_en.pdf.

www.sap.com

© 2016 SAP SE or an SAP affiliate company. All rights reserved.


