DMM105 Applying Machine Learning to Real-Time Streaming Analytics

Public

Speakers
Las Vegas, Sept 19 - 23

Bangalore, October 5 - 7

Barcelona, Nov 8 - 10

Rob Waywell, SAP HANA Product Management

Luke Kwong, SAP

Ruediger Karl, SAP HANA Product Management

2016 SAP SE or an SAP affiliate company. All rights reserved.


Disclaimer
The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of
SAP. Except for your obligation to protect confidential information, this presentation is not subject to your license agreement or
any other service or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this
presentation or any related document, or to develop or release any functionality mentioned therein.
This presentation, or any related document, and SAP's strategy and possible future developments, products, and/or platform
directions and functionality are all subject to change and may be changed by SAP at any time for any reason without notice.
The information in this presentation is not a commitment, promise or legal obligation to deliver any material, code or functionality.
This presentation is provided without a warranty of any kind, either express or implied, including but not limited to, the implied
warranties of merchantability, fitness for a particular purpose, or non-infringement. This presentation is for informational
purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or omissions in this
presentation, except if such damages were caused by SAP's intentional or gross negligence.
All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially
from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only
as of their dates, and they should not be relied upon in making purchasing decisions.


Agenda
Introduction to SAP HANA Smart Data Streaming
Streaming Analytics & Complex Event Processing
Machine Learning in HANA Smart Data Streaming
Adaptive Hoeffding Tree
DenStream Clustering

Demo
Example Use Cases
Integration with SAP Predictive Analytics
Resources


SAP HANA smart data streaming extends the capabilities of HANA to analyze and act on data in motion

Continuous Intelligence with Streaming Analytics: insight from live event streams, instantaneous response

Process live event streams:
Process events in real-time, as fast as they arrive
Filter, enrich, transform, normalize
Capture high value data in SAP HANA

Analyze data in motion:
Analyze events in the context of other events and historical data
Watch for patterns, trends, correlations
Continuous computation of KPIs
Apply predictive analytics to anticipate what's coming

Act in Real-time:
Alerts while there is time to act
Immediate response
Stream key data to live dashboards
Apply rules to determine what action to take

SAP rated a Leader in 2016 Forrester Wave: Big Data Streaming Analytics

"Streaming analytics is essential for bringing real-time context to apps." - Forrester

Forrester defines streaming analytics as:
"Software that can filter, aggregate, enrich, and analyze a high throughput of data from multiple, disparate live data sources and in any data format to identify simple and complex patterns to provide applications with context to detect opportune situations, automate immediate actions, and dynamically adapt."

Benefits of SAP HANA smart data streaming

Part of the SAP HANA Platform:
Runs as a separate service in the HANA system
Managed via the HANA Cockpit
Access to HANA tables in streaming data models

Rapid Development:
SQL-based event processing language, familiar and concise
Both visual and text editors in HANA Studio
Rich set of testing tools in HANA Studio

Range of Connectivity Options:
Standard REST and WebSocket interfaces
Range of standard adapters
Extensible adapter toolkit
APIs for C++, Java and .NET

Secure:
All connectivity/actions require HANA authentication
Granular access control to the stream/action level

Fast, Low Latency:
Millisecond latency from arrival to response

Predictive Analytics:
Built-in predictive models based on machine learning

Scalable:
Process events in real-time, as fast as they arrive
Hundreds of thousands to millions of events per second
Cluster architecture supports low-cost multi-node scale out

Fault Tolerant:
Multi-node clusters with auto-restart
Cluster managers operate as a peer network
Event windows can be configured as recoverable

Streaming Analytics & Complex Event Processing

Streaming data sources are everywhere

Sensors
Click streams

Social media
Transactions
Market prices


Event stream processing uses continuous queries

Database Queries:
Step 1: Store the data
Step 2: Query the data

Continuous Queries:
Step 1: Define the continuous queries and the dataflow
Step 2: Wait for data to arrive. As it arrives, it flows through the continuous queries to produce immediate results
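The contrast between the two query styles can be sketched in a few lines of Python. This is only an illustration of the idea, not smart data streaming code: the event shape, the function names, and the threshold filter are all assumptions made for the example.

```python
from typing import Iterable, Iterator

Event = dict  # hypothetical event shape: {"sensor": str, "value": float}


def database_query(stored: list[Event], threshold: float) -> list[Event]:
    """Store-then-query: the data already sits in a table and the query runs once."""
    return [e for e in stored if e["value"] > threshold]


def continuous_query(stream: Iterable[Event], threshold: float) -> Iterator[Event]:
    """Query defined up front: each event flows through it as it arrives."""
    for event in stream:
        if event["value"] > threshold:
            # The result is emitted immediately, before later events exist.
            yield event


events = [
    {"sensor": "a", "value": 1.0},
    {"sensor": "b", "value": 7.5},
    {"sensor": "a", "value": 9.1},
]

batch_result = database_query(events, threshold=5.0)
streamed_result = list(continuous_query(iter(events), threshold=5.0))
```

Both styles select the same events here; the difference is when the answer becomes available. The generator yields each match as soon as its event arrives, while the list-based query needs the data stored first.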

Complex Event Processing extracts insight from raw events

Virtually no useful information in a single isolated event
Sensor readings: 10s of thousands per second
Event window: e.g. 30 min
History: e.g. compare variance of trends across multiple sensors against historical norms
Alert
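A rough sketch of the window-versus-history comparison follows, in Python. It simplifies the slide in two ways that should be read as assumptions: the window is count-based rather than a 30-minute time window, and the "historical norm" is a single precomputed variance. The class name and the `ratio` parameter are hypothetical.

```python
from collections import deque
from statistics import pvariance


class WindowVarianceAlert:
    """Alert when variance over the current window exceeds a multiple of the historical norm."""

    def __init__(self, window_size, historical_variance, ratio=2.0):
        self.window = deque(maxlen=window_size)  # oldest readings fall out automatically
        self.historical_variance = historical_variance
        self.ratio = ratio

    def on_event(self, value):
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return False  # not enough context yet; a single event says little
        return pvariance(self.window) > self.ratio * self.historical_variance


detector = WindowVarianceAlert(window_size=4, historical_variance=1.0)
steady = [detector.on_event(v) for v in [10.0, 10.5, 9.5, 10.0]]
noisy = [detector.on_event(v) for v in [20.0, 0.0, 20.0, 0.0]]
```

No single reading triggers the alert on its own; it fires only once the readings in the window, taken together, deviate from the historical norm.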

Smart data streaming runs as a service of the HANA Platform

Diagram labels: Input Streams, adapters, Streaming Server, HANA DB, SAP HANA Platform, Alerts, Dashboards, Applications, Hadoop

Why Streaming Analytics?

Situation Detection:
Watch for trends or patterns
Spot significant changes
Monitor correlations
Compare current values to historical norms
Apply predictive models to anticipate what's coming

Alerts:
Examples:
Alert a supervisor when a machine needs adjustment before quality is affected
Alert IT staff as soon as a security threat is detected

Live Dashboards:
Continuously compute and stream summary data to live dashboards

Immediate Response:
Examples:
Automatically adjust prices based on market conditions
Dispatch a technician for urgent preventive maintenance
Tailor an offer to a user based on current activity
Shut down a system to prevent damage

Integrating event streams with SAP HANA

Receive events from streaming sources at high speeds:
Process hundreds of thousands or millions of events per second, in real-time
REST and WebSocket interfaces
Kafka, JMS and MQ adapters
Extensible adapter toolkit plus other standard adapters

Filter, enrich and normalize incoming data:
Rather than capturing all the raw data in HANA, optimize HANA resources by capturing valuable information in an optimized data model

Data reduction options include:
Sample high frequency data to reduce the number of data points
Only record changes

Data tiering:
Store high value data in HANA in-memory tables
Lower value data in HANA Dynamic Tiering or in Hadoop

High speed HANA database loading with support for parallel writing to multiple partitions
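The two data reduction options can be sketched as simple stream filters. This is a Python illustration under stated assumptions, not smart data streaming code: the `(sensor_id, value)` reading shape and both function names are made up for the example.

```python
from typing import Iterable, Iterator, Tuple

Reading = Tuple[str, float]  # hypothetical (sensor_id, value) pair


def sample_every_nth(stream: Iterable[Reading], n: int) -> Iterator[Reading]:
    """Keep one reading out of every n to thin high-frequency data."""
    for i, reading in enumerate(stream):
        if i % n == 0:
            yield reading


def only_changes(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Emit a reading only when a sensor's value differs from its previous value."""
    last_seen = {}
    for sensor_id, value in stream:
        if last_seen.get(sensor_id) != value:
            last_seen[sensor_id] = value
            yield sensor_id, value


readings = [("s1", 20.0), ("s1", 20.0), ("s2", 5.0), ("s1", 21.0), ("s2", 5.0), ("s1", 21.0)]
sampled = list(sample_every_nth(readings, n=3))
changed = list(only_changes(readings))
```

Either filter would sit in front of the database load, so only the reduced stream is written to HANA.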

Machine Learning


What is Machine Learning?

"Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions." (Wikipedia)

"Machine learning is the science of getting computers to act without being explicitly programmed." (Stanford University Machine Learning course description)

Wide Range of Machine Learning Algorithms

Decision tree learning
Association rule learning
Artificial neural networks
Inductive logic programming
Support vector machines
Clustering
Bayesian networks
Reinforcement learning
Representation learning
Similarity and metric learning
Sparse dictionary learning
Genetic algorithms

Predictive Analytics for Streams

New for HANA Smart Data Streaming SPS11: Machine Learning Algorithms

Adaptive Hoeffding Tree:
An incremental decision tree algorithm which uses limited samples to choose the best tree node splitting attribute
Learns a tree-like graph from historical data to model the decision rules and map an observation to its target value as a prediction
Detects concept drift and updates the tree model automatically
Low overfitting: no need for pruning, and examples are used only once
Low variance: stable decisions with statistical support
Low resource utilization: uses limited hardware resources (CPU, memory & I/O)

DenStream (Clustering):
An incremental clustering algorithm which uses the concept of micro-clusters to summarize clusters of arbitrary shapes and a pruning technique to detect outliers
It is insensitive to noise, and its novel pruning technique leads to better memory management of streaming data that is evolving
Widely used to mine unknown patterns from the data
Traditional batch clustering algorithms take all the available data as input, which is not suitable for stream data that usually allows only a single pass over the data
Best suited to address problems where old data is less important
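The "limited samples" and "statistical support" points rest on the Hoeffding bound. With a metric of range R, confidence 1 - delta, and n samples seen at a node, the observed mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean. The Python sketch below shows the textbook split test built on that bound. It is not taken from the SAP implementation, and the function names and example gains are assumptions.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the observed mean is within epsilon of the true mean
    with probability 1 - delta after n independent observations."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# The bound tightens as more samples arrive at a node.
eps_small = hoeffding_bound(value_range=1.0, delta=1e-7, n=100)
eps_large = hoeffding_bound(value_range=1.0, delta=1e-7, n=10_000)

# The same gain difference is not enough at n=100 but is enough at n=10,000.
early = should_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=100)
later = should_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=10_000)
```

Because the decision waits only until the gap exceeds epsilon, each example can be used once and then discarded, which is what keeps memory use low.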

Relevance of Machine Learning to Stream Analytics

Why does it matter?
Event stream processing is about reacting to event-driven data in real time as the events happen
Historical data may be of less significance than current data for near-term (immediate future) prediction
Expected behavior may shift or drift over time

Utilizing Machine Learning algorithms within Smart Data Streaming projects provides the ability to:
Incorporate current data into prediction algorithms immediately rather than periodically polling an external data source
Instantly and progressively adapt to changing conditions and behaviors
Leverage machine learning algorithms that are designed for continuous analytics with low latency

Machine Learning algorithms that learn on the fly are always kept up-to-date while utilizing fewer hardware resources and generating predictions with a single pass of data
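As a toy illustration of "learning on the fly", the Python sketch below keeps a running estimate that is updated once per event and gradually forgets old data. It is a deliberately simple stand-in (an exponentially weighted mean), not one of the algorithms shipped with smart data streaming. The class name and the `alpha` parameter are assumptions.

```python
class DecayingMean:
    """Single-pass estimate that weights recent events more than old ones."""

    def __init__(self, alpha):
        self.alpha = alpha  # 0 < alpha <= 1; larger values forget old data faster
        self.mean = None

    def update(self, x):
        if self.mean is None:
            self.mean = x
        else:
            # Move the estimate a fraction alpha toward the new observation.
            self.mean += self.alpha * (x - self.mean)
        return self.mean


model = DecayingMean(alpha=0.5)
for x in [10.0] * 5:
    model.update(x)
before_shift = model.mean

for x in [20.0] * 5:  # behavior drifts to a new level
    model.update(x)
after_shift = model.mean
```

No event is stored or revisited, and the estimate follows the drift within a few events. That is the property the slide contrasts with periodically retraining against an external data source.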

Demo
Implementing machine learning algorithms in HANA Smart Data Streaming


Demo Notes
Demo will show:

HANA Studio
Minimal streaming project for Hoeffding Training
Streaming project for Hoeffding Scoring
Streaming project for DenStream Clustering
Model Definition in HANA
Discuss interpreting the output


Example Use Cases


Driver Drowsiness Detection: Hoeffding Tree


A Hoeffding model is pre-trained to detect steering behavior that is indicative of driver drowsiness.
The scoring operation is then used to decide if a driver is drowsy and to what degree. Alerts are generated based
on the scored output.


Fraud Detection: DenStream Clustering

Purchasing behavior using either debit or credit will likely show a unique pattern for individual users
Common dimensions to define the behavior may be

Geographic location
Purchase amount
Time of day
Others?

The DenStream Clustering algorithm can be used to identify clusters of normal purchasing
behavior for individual users.
Outliers to the normal clustered purchasing behavior would be indicators of potential theft or fraud

Combining multiple models for different dimensions can increase the strength of the prediction
Example: Combine geographic location and purchase amount
If both dimensions show an outlier to the normal clustered behavior then you have a stronger
indicator of possible fraud than either dimension in isolation

Further combining with clustered behavior of larger groups can help to eliminate false alarms

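The idea of combining per-dimension outlier checks can be sketched as follows. This Python example is not DenStream: it assumes each model has already produced a set of cluster centers for a user and treats a point as an outlier when it is farther than a fixed radius from every center. All names, centers, and radii are hypothetical, and distance in raw latitude/longitude degrees is a crude stand-in for geographic distance.

```python
import math


def is_outlier(point, centers, radius):
    """True when the point lies outside every known cluster of normal behavior."""
    return all(math.dist(point, c) > radius for c in centers)


def fraud_signal(location_outlier, amount_outlier):
    """Two independent outlier flags give a stronger indicator than either alone."""
    if location_outlier and amount_outlier:
        return "strong"
    if location_outlier or amount_outlier:
        return "weak"
    return "none"


# Hypothetical per-user cluster summaries
location_clusters = [(51.5, -0.1)]    # (lat, lon) around the home address
amount_clusters = [(25.0,), (60.0,)]  # typical purchase amounts

loc_flag = is_outlier((37.98, 23.73), location_clusters, radius=0.5)
amt_flag = is_outlier((900.0,), amount_clusters, radius=20.0)
signal = fraud_signal(loc_flag, amt_flag)

normal_amount = is_outlier((30.0,), amount_clusters, radius=20.0)
```

Keeping the models separate per dimension makes it easy to add further checks, such as time of day, without retraining the others.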

Fraud Detection: DenStream Clustering

Example:
A credit card holder in England may make most of their purchases at restaurants within 20 km of their home address
A restaurant purchase in Greece would show up as an outlier at the individual level
Cluster analysis at a macro level may show that many people from England make purchases in Greece for a 1-2 week period out of the year, probably while on vacation
Combining the individual & macro clusters would reduce the probability that this purchase is a fraudulent one

But hold on:
You can add time of purchase too: if the same card is used in 2 establishments a few thousand miles apart within a few minutes of each other, there is likely a problem.

Or:
If you further identified that a normal cluster of behavior is that customers from England who make purchases in a foreign country have also purchased a plane ticket in the past X months, then you have an even stronger indicator that this is or isn't a normal practice
Which may be best handled by combining with a Hoeffding Tree to make a decision based on the combination of factors
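The time-of-purchase check (same card, two distant establishments, minutes apart) can be written as a simple speed test. The Python sketch below uses the haversine great-circle distance. The 900 km/h speed limit, the tuple layout, and the function names are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implausible_travel(p1, p2, max_speed_kmh=900.0):
    """Each purchase is (lat, lon, unix_seconds). Flags pairs that would require
    travelling faster than max_speed_kmh (assumed: roughly airliner speed)."""
    distance = haversine_km(p1[0], p1[1], p2[0], p2[1])
    hours = abs(p2[2] - p1[2]) / 3600.0
    if hours == 0:
        return distance > 0
    return distance / hours > max_speed_kmh


london = (51.5074, -0.1278, 0)
athens_5_min_later = (37.9838, 23.7275, 300)
athens_next_day = (37.9838, 23.7275, 86_400)

suspicious = implausible_travel(london, athens_5_min_later)
plausible = implausible_travel(london, athens_next_day)
```

A rule like this complements the clustering models. The Greece purchase may look normal at the macro level and still be flagged if it follows a London purchase by only a few minutes.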

Integration with SAP Predictive Analytics

Automated Analytics
A few data mining functions for answering various business questions

Classification / Scoring
Who will churn, commit fraud, or buy next week or next month?

Regression
How many products will a customer buy next month or next quarter?

Segmentation / Clustering
What are the groups of customers with similar behavior or profile?

Forecasting
How much will the monthly revenue or number of churners be next year?

Recommendations
What is the best offer or action for a customer or internet user?

SAP Predictive Analytics features

Prepare data:
Define source data creation or manipulations
Create persistent metadata definitions
Enable the automatic creation of thousands of derived attributes

Perform automatic predictive modeling:
Regression and classification
Clustering
Forecasting
Association rules
Social network analysis

Produce automatic analysis of models:
Estimate predictive power and confidence
Identify the most contributive attributes
Produce statistical reports
Create graphical analysis of results

Save, export, and apply results:
Create and save persistent models
Export scoring equations as structured query language (SQL) or code like CCL
Apply predictive models directly to data sets and in your database (if applicable)

Automated Analytics
Provide business analysts with a fully automated process

Automated Data preparation:
Create 1000s of derived attributes
Define metadata once
Builds analytic dataset automatically

Automated Predictive modeling / Data mining
Automated model deployment & management factory
Social & recommendation

Expert Analytics
Provide data scientists with sophisticated algorithms to take the next step in understanding their business and modeling outcomes

Perform statistical analysis on your data to understand trends & detect outliers in your business
Build models & apply to scenarios to forecast potential future outcomes
Breadth of connectivity to access almost any data
Optimized for SAP HANA to support huge data volumes & in-memory processing

Demo
Integrating SAP Predictive Analytics with HANA Smart Data Streaming


Resources

Smart Data Streaming Developer Center
http://scn.sap.com/community/developer-center/streaming

Table of Contents - Smart Data Streaming
http://scn.sap.com/community/developer-center/streaming/blog/2016/03/11/table-of-contents

Machine Learning Algorithms: Hoeffding Tree
http://scn.sap.com/docs/DOC-71415

Machine Learning Algorithms: DenStream Clustering
http://scn.sap.com/docs/DOC-71407

HANA Streaming (and ESP) Internal Community
https://jam4.sapjam.com/groups/about_page/77LOiFUPLkqETB9UFFNsiq

CIO Guide to Using the SAP HANA Platform for Big Data
https://jam4.sapjam.com/wiki/show/0reRAeek9m48mfnFYv7Dm7

Further information

Related SAP TechEd sessions:
DMM167 - Implementing Streaming Analytics for Real-Time Alerting and Response

SAP Public Web:
Smart Data Streaming Developer Center http://scn.sap.com/community/developer-center/streaming
scn.sap.com
www.sap.com

Watch SAP TechEd Online:
www.sapteched.com/online

SAP TechEd Online

Continue your SAP TechEd education after the event!

Access replays of:
Keynotes
Demo Jam
SAP TechEd live interviews
Select lecture sessions
Hands-on sessions

Feedback
Please complete your session evaluation for DMM105.

Contact information:
Rob Waywell
HANA Product Management
robert.waywell@sap.com

2016 SAP SE or an SAP affiliate company. All rights reserved.


No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate
company) in Germany and other countries. Please see http://www.sap.com/corporate-en/about/legal/copyright/index.html for additional trademark information and notices.
Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.
National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its
affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and
services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as
constituting an additional warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop
or release any functionality mentioned therein. This document, or any related presentation, and SAP SE's or its affiliated companies' strategy and possible future
developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time
for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place
undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
