
Data Science, Big Data, and Analytics of IBM

2013 IBM Corporation

INDEX

Part 1
- About IBM
- IBM Research & Use Case
- Smarter Planet

Part 2
- Data
- Data Science
- Big Data
- Analytics


Part 1: About IBM, IBM Research & Use Case, Smarter Planet


IBM
IBM
: 1967 4 25, IBM1401 IBM 100% : 1,135

Premier Partner/ISV Advanced Partner Member Partner (Distributor)

65() 73 1,179 9 37

()

( )

2011    12,061    1,304.4
2010    12,250      987.9
2009    12,068      633.5


, IT ,

2011.03 45 - 2011.01 IT ' (ACO) AA 2009.11 IBM 2008.12 IT ' 2008.09 2007.04 IBM 40 2007.03 1 2004 3000 2003 IBM 2002 IBM (SI)



(, , DB) ,


, , , ,

(GBS)

IT (GTS)

(SWG)
Database Web Application Groupware

(STG)
Unix Svr NT POS I Series

(GPS)
CRM

(R&D)
Trend

(IGF)




Global Business Services

Global Technology Services

Growth Initiative
Large Deal

Sector
FSS

Consulting Service
- S&T (Strategy & Transformation)
- I&G (Innovation & Growth)
- BAO (Business Analytics & Optimization)
- EA (Enterprise Applications)

AMS & Delivery


AMS

SO (Strategic Outsourcing) Sales

GTS SD (GTS Service Delivery)

SO (Strategic Outsourcing) Client Service

ITS (Information Technology Services)

ST&MA&CM (Strategy & Marketing, CM)

MTS (Maintenance and Technical Support)

Offering Group

Commercial

Delivery Excellence

Electronics

ITS COVERAGE
ITS

ITS Delivery
ITS Presale & Delivery

ITS SALES
Opportunity Owner

Operation

Ops & Support

AIS (Application Innovation Service)


IBM ,
IBM , 6

IBM 6

IBM GBS
IT

IBM GTS
IT

IBM GPS

IT

IBM STG

IBM SWG
,

IBM Financing

IBM 10 , 12 IBM 16,000



IBM Research: 3,000 researchers in 12 labs

[Figure: world map of the 12 labs. Labs, in extraction order: Almaden, Watson, Zurich, Haifa, Tokyo, Austin, China, Ireland, Brazil, Africa, India, Australia. Years shown, in extraction order: 1986, 1961, 1955, 1972, 1982, 1995, 1995, 1995, 2010, 2010, 2012, 1998. The map also carries a "New!" marker.]

4 labs participated in the Watson project

Analytics enable better Decisions for Water System Management (Washington D.C. Water and Sewer Authority)
Failure Association
- How do environmental conditions impact failure?
- Does one brand of hydrant fail more frequently than another?
- How does the aging process impact asset condition?

Failure Prediction
- Which hydrant is most likely to fail in the next 6 months?
- What type of failure is most likely to happen given the current condition?
- How likely is a given pipe segment to fail?

Asset Failure & Risk

Replacement
- What is the state of water delivery and sewage disposal?
- What is the best way to allocate capital for infrastructure network upgrades?

PM Optimization

Preventive Maintenance
- Can I reduce PM cost?
- Which failures are driving my water mains repair costs?
- Which pipes should I replace to prevent challenges next winter?

Application of these techniques in an engagement with Washington D.C. Water and Sewer Authority resulted in:
- 25% increase in maintenance crew utilization
- 30-50% cost savings on selected inspection and preventive maintenance
- significant revenue increase through loss prevention and differential pricing


Preventive Maintenance for Water System (Washington D.C. Water and Sewer Authority)
Optimize the preventive maintenance (PM) time for each hydrant by considering the following factors:
- Inspection cost for PM before failure
- Repair cost given failure
- Penalty cost during downtime
- Failure risk

Formulation (terms as labeled on the slide):
Min   inspection cost + repair cost + penalty cost for downtime (repair), as a function of the periodic inspection interval
s.t.  periodic inspection interval <= max allowable periodic inspection interval (364 days)
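The trade-off between inspection, repair, and downtime penalty costs can be illustrated with a small search over candidate intervals. This is only a sketch under stated assumptions, not the model used in the engagement: the Weibull failure model, the cost-per-day objective, and every name and parameter value below are hypothetical. Only the 364-day cap comes from the slide.

```python
import math


def weibull_cdf(t, shape, scale):
    """Probability that a hydrant has failed by day t (assumed Weibull model)."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def expected_cost_per_day(interval, inspection_cost, repair_cost,
                          penalty_per_day, shape, scale):
    """Expected cost per day of one inspection cycle lasting `interval` days."""
    p_fail = weibull_cdf(interval, shape, scale)
    # A failure on day x stays undetected until the inspection on day `interval`,
    # so expected downtime is the integral of the CDF over the cycle
    # (approximated here with a daily midpoint sum).
    expected_downtime = sum(weibull_cdf(d + 0.5, shape, scale)
                            for d in range(interval))
    cycle_cost = (inspection_cost
                  + repair_cost * p_fail
                  + penalty_per_day * expected_downtime)
    return cycle_cost / interval


def best_inspection_interval(inspection_cost, repair_cost, penalty_per_day,
                             shape, scale, max_interval=364):
    """Grid search over 1..max_interval days (364 is the cap from the slide)."""
    return min(
        range(1, max_interval + 1),
        key=lambda days: expected_cost_per_day(
            days, inspection_cost, repair_cost, penalty_per_day, shape, scale),
    )


# Hypothetical hydrants: a larger Weibull scale means a more reliable asset.
fragile = best_inspection_interval(50.0, 400.0, 20.0, shape=2.0, scale=600.0)
robust = best_inspection_interval(50.0, 400.0, 20.0, shape=2.0, scale=1200.0)
print(fragile, robust)
```

Under these assumptions the more reliable hydrant gets the longer interval, and a very reliable one is held at the 364-day cap.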

PM time (days)   # of hydrants
(100, 150]       1436
(150, 200]       2153
(200, 250]       2584
> 250            1005

Maintenance planning

Outage/Damage Prediction and Response Optimization (Utility Company)

Prediction, Optimization, Real-time analytics
Optimized Maintenance Plan

1 Customized Weather Forecast
2 Damage Model, Outage Prediction
3 Response Plan
4 Data Assimilation
5 Revised Outage
6 Revised Response
7 Plan Execution
8 Report


Predicting Multi-Category Daily Damage Counts (Utility Company)


Objective: Predict the daily multi-category damage counts based on the weather conditions at the region level
Date range: 01/2010~02/2013
Number of records: 52,206 for 34 regions
Response categories: 13 (C1~C13)
Data characteristics:
- target: daily damage counts in multiple categories
- predictors:
  1. Cumulative rainfall in the preceding two weeks (14-day window);
  2. Weather conditions aggregated over each 24-hour window for Day 0, Day -1, and Day -2: temperature (min, max), rain rate (max), daily rain (max), monthly rain (max), humidity (max), average wind speed (max), wind gust speed (max), wind gust frequency, pressure (min, max)

Methods: Random Forests Model, Multivariate Poisson Regression Model

[Figure: timeline relating damages (C1, C2, C3) in the 24-hour Day 0 window, starting at 12 AM, to the weather conditions in the 24-hour windows of Day 0, Day -1, and Day -2 and to the cumulative rainfall over the 14-day window]
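The predictor layout described on this slide (a 14-day cumulative rainfall plus weather aggregates for Day 0, Day -1, and Day -2) can be sketched as a feature-building step. This is a minimal illustration, not the project's pipeline: the field names, the `build_features` helper, and the choice to sum the daily-rain maximum for the cumulative term are all assumptions. Model fitting (random forests, multivariate Poisson regression) is left out.

```python
def build_features(days, window=14, lags=(0, 1, 2)):
    """Build one feature row per day that has a full history behind it.

    `days` is a chronological list of dicts, one per day for a single region,
    holding that day's weather aggregates (hypothetical field names).
    Returns a list of (day_index, feature_dict) pairs.
    """
    rows = []
    for i in range(window, len(days)):
        features = {
            # Rain accumulated over the `window` days preceding Day 0.
            f"cum_rain_{window}d": sum(d["daily_rain_max"]
                                       for d in days[i - window:i]),
        }
        for lag in lags:
            # lag 0 is Day 0, lag 1 is Day -1, lag 2 is Day -2.
            for name, value in days[i - lag].items():
                features[f"{name}_lag{lag}"] = value
        rows.append((i, features))
    return rows


# Toy data: 17 days of two made-up weather aggregates.
toy_days = [{"daily_rain_max": float(k), "temp_min": 10.0 + k} for k in range(17)]
toy_rows = build_features(toy_days)
print(len(toy_rows), toy_rows[0][1]["cum_rain_14d"])
```

Each row would then be paired with that day's damage counts (C1 to C13) as the regression target.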


Maintenance Scheduling (Semiconductor Manufacturing Plant)


The scheduling problem for a wafer fab is a complex extension of the Resource-Constrained Project Scheduling Problem that handles planned and unplanned orders. The objectives are to:
- minimize the sum of the expected WIP in the time periods utilized by maintenance operations,
- minimize the number of technicians used,
- avoid performing maintenance early,
- satisfy business rules.

The scheduling problem needs to integrate the production schedule with the maintenance schedule so as to avoid maintenance during periods of high demand for a machine.
The system is currently deployed and generating schedules daily at IBM's East Fishkill 300mm semiconductor manufacturing plant.
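A toy version of the WIP objective can make the trade-off concrete. The sketch below is not the deployed scheduler: it enumerates every assignment of jobs to time periods, keeps only the expected-WIP objective, treats technician availability as a hard limit, and models "avoid performing maintenance early" as an earliest allowed period. All names and numbers are hypothetical.

```python
from itertools import product


def schedule_maintenance(jobs, expected_wip, technicians_available):
    """Brute-force search for the assignment with the lowest total expected WIP.

    jobs: list of dicts with 'machine', 'earliest', 'latest', 'technicians'.
    expected_wip: dict mapping machine name to expected WIP per time period.
    Returns (periods, total_wip), or (None, None) if nothing is feasible.
    """
    windows = [range(job["earliest"], job["latest"] + 1) for job in jobs]
    best_plan, best_cost = None, None
    for plan in product(*windows):
        # Count technicians needed in each period under this plan.
        load = {}
        for job, period in zip(jobs, plan):
            load[period] = load.get(period, 0) + job["technicians"]
        if any(needed > technicians_available for needed in load.values()):
            continue  # violates the technician limit
        cost = sum(expected_wip[job["machine"]][period]
                   for job, period in zip(jobs, plan))
        if best_cost is None or cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan, best_cost


# Two jobs that both prefer period 1, but only 3 technicians are on shift.
toy_wip = {"etch": [5, 1, 6], "litho": [3, 0, 6]}
toy_jobs = [
    {"machine": "etch", "earliest": 0, "latest": 2, "technicians": 2},
    {"machine": "litho", "earliest": 0, "latest": 2, "technicians": 2},
]
toy_plan, toy_cost = schedule_maintenance(toy_jobs, toy_wip, technicians_available=3)
print(toy_plan, toy_cost)
```

Exhaustive enumeration only works for a handful of jobs; a production-scale problem of the kind described here would need a dedicated optimization solver.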

Maintenance Scheduling


Anomaly Detection (Semiconductor Manufacturing) - Integrated Outlier Management in Tracer


Objective: Exclude spurious values from score calculation.
Method: an integrated methodology consisting of an information-theoretic method and a statistical method, implemented in two steps.
- Step I: calculation done in the context of the data from one chamber group, one recipe, one SVID, both time periods.
- Step II: calculation done in the context of the data from one chamber group, one recipe, one SVID, single time period (reference/current).

Step I: Information-Theoretic Outlier Detection (Entropy Based)

Step II: Comparison of the chamber of interest with the control band from the other chambers (Mean ± m*Std)

Outlier detection for the chamber of interest (CUSUM-based method)

[Figure: SVID value over time for chamber i with UCL and LCL; outliers marked against UCL_CUSUM]
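The two statistical ingredients named on this slide, a control band built from the other chambers and a CUSUM check on the chamber of interest, can be sketched as follows. This is an illustration under assumptions, not the Tracer implementation: the entropy-based Step I is not covered, the band is taken as mean ± m standard deviations of the pooled readings, and the CUSUM is the textbook two-sided tabular form with hypothetical `slack` and `limit` parameters.

```python
from statistics import mean, stdev


def control_band(other_chambers, m=3.0):
    """Return (LCL, UCL) as mean -/+ m*std of readings pooled from other chambers."""
    pooled = [value for series in other_chambers for value in series]
    center, spread = mean(pooled), stdev(pooled)
    return center - m * spread, center + m * spread


def outside_band(series, band):
    """Indices of readings from the chamber of interest that leave the band."""
    lcl, ucl = band
    return [i for i, value in enumerate(series) if value < lcl or value > ucl]


def cusum_alarms(series, target, slack, limit):
    """Two-sided tabular CUSUM; returns indices where either sum exceeds `limit`."""
    upper = lower = 0.0
    alarms = []
    for i, value in enumerate(series):
        upper = max(0.0, upper + (value - target - slack))
        lower = max(0.0, lower + (target - value - slack))
        if upper > limit or lower > limit:
            alarms.append(i)
            upper = lower = 0.0  # restart accumulation after an alarm
    return alarms


# Toy readings for one SVID (values are made up).
others = [[10.0, 10.2, 9.8, 10.1], [9.9, 10.0, 10.3, 9.7]]
band = control_band(others, m=3.0)
spikes = outside_band([10.1, 9.9, 12.5, 10.0], band)
drift = cusum_alarms([10.0, 10.0, 10.6, 10.7, 10.8], target=10.0, slack=0.2, limit=1.0)
print(band, spikes, drift)
```

The band catches isolated spikes, while the CUSUM accumulates small sustained shifts that never cross the band on their own.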

Process Monitoring (Semiconductor Manufacturing) - Hotelling's T-squared Control Chart


Objective: Design Hotelling's T-squared control charts for manufacturing tools.
Method: a complete procedure consisting of Phase-I design (initial study) and Phase-II design (process monitoring).
- Phase I: remove the outliers from the trace data collected from processes under normal conditions and calculate the in-control mean and covariance matrix;
- Phase II: build the control chart using the in-control mean and covariance matrix from Phase I to monitor the current processes.

[Figure: Hotelling's T-squared control chart; T-squared value (0 to 20) versus wafer label (0 to 250), showing the Phase-I design and Phase-II design regions, the UCL for each phase, and out-of-control points]
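The two phases can be sketched for the two-variable case, where the 2x2 covariance inverse can be written out by hand. This is a minimal illustration, not the procedure used on the tools: the function names are hypothetical, and the upper control limit `ucl` is simply passed in by the caller rather than derived from the Beta (Phase I) or F (Phase II) distribution.

```python
def in_control_parameters(samples):
    """Mean vector and covariance terms (sxx, sxy, syy) of 2-variable samples."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def t_squared(point, center, cov):
    """T^2 = (x - mean)^T S^-1 (x - mean), with the 2x2 inverse expanded."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def phase_one(samples, ucl):
    """Phase I: drop points above `ucl` and re-estimate until none are dropped."""
    kept = list(samples)
    while True:
        center, cov = in_control_parameters(kept)
        inliers = [p for p in kept if t_squared(p, center, cov) <= ucl]
        if len(inliers) == len(kept):
            return kept, center, cov
        kept = inliers


def phase_two(points, center, cov, ucl):
    """Phase II: indices of new points whose T^2 exceeds the control limit."""
    return [i for i, p in enumerate(points) if t_squared(p, center, cov) > ucl]


# Toy data: a 3x3 grid of in-control points plus one gross outlier.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
kept, center, cov = phase_one(grid + [(10.0, 10.0)], ucl=6.0)
flagged = phase_two([(1.0, 1.0), (4.0, 1.0)], center, cov, ucl=6.0)
print(len(kept), center, flagged)
```

With more than two trace variables the same steps apply, but the covariance inverse would come from a linear algebra library.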

Process Monitoring and Quality Control (Semiconductor Manufacturing)


Motivation for virtual metrology applications
Tools publish large amounts of real-time data:
- Temperature & pressure
- Throttle valve positions
- Gas flows
- Electric bias, impedance, etc.
Can we use the data for process control?

Virtual metrology (VM) generally refers to a model-based prediction of some process outcome when there is no physical measurement of that outcome.
Predictive modeling: the underlying models are learned from histories of the actual physical outcomes and process trace data.
Benefits:
- Detect faulty wafers early
- Improve process control: from lot-to-lot to wafer-to-wafer level
- Reduce physical measurements for process monitoring and control
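The idea of learning a model from measured wafers and then predicting the outcome for unmeasured ones can be shown with a one-variable least-squares fit. This is a deliberately small sketch: the single trace feature, the linear model, the spec limits, and all names are assumptions, and real VM models would use many trace variables.

```python
def fit_line(trace_values, outcomes):
    """Ordinary least squares for outcome = intercept + slope * trace_value."""
    n = len(trace_values)
    mean_x = sum(trace_values) / n
    mean_y = sum(outcomes) / n
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(trace_values, outcomes))
             / sum((x - mean_x) ** 2 for x in trace_values))
    return mean_y - slope * mean_x, slope


def predict(model, trace_value):
    """Virtual measurement for a wafer that was not physically measured."""
    intercept, slope = model
    return intercept + slope * trace_value


def flag_faulty(model, trace_values, lower_spec, upper_spec):
    """Indices of wafers whose predicted outcome falls outside the spec limits."""
    return [i for i, x in enumerate(trace_values)
            if not lower_spec <= predict(model, x) <= upper_spec]


# History of physically measured wafers (made-up numbers).
vm_model = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
suspects = flag_faulty(vm_model, [2.5, 6.0, 3.0], lower_spec=4.0, upper_spec=10.0)
print(vm_model, suspects)
```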

Process Monitoring and Quality Control (Semiconductor Manufacturing) - Performance of VM-enhanced process control
Simulation results for a given set of parameters:
- VM-EWMA: reduces process variance by around 70%
- VM-LM: reduces process variance by around 30%
Given a target process variance, e.g. 0.03, we can reduce the measurement frequency:
- VM-LM: from 1 out of 6 wafers to 1 out of 19 wafers
- VM-EWMA: from 1 out of 6 wafers to 1 out of 94 wafers

[Figure: variance of process outcomes (0 to 0.08) versus wafer index (0 to 150) for LM, VM-EWMA, and VM-LM]
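How virtual measurements can help an EWMA run-to-run controller may be easier to see in a toy simulation. The sketch below is not the simulation behind the numbers on this slide: the linear process with steady drift, the controller form, the noise levels, and every parameter value are assumptions chosen only for illustration.

```python
import random


def simulate(n_wafers, measure_every, use_vm, weight=0.3, gain=1.0, target=0.0,
             drift_per_wafer=0.01, noise_sd=0.05, vm_error_sd=0.05, seed=0):
    """Return the mean squared deviation from target under EWMA control.

    Every `measure_every`-th wafer is physically measured. When `use_vm` is
    True, the remaining wafers feed a noisy virtual measurement to the
    controller; otherwise the controller waits for the next real measurement.
    """
    rng = random.Random(seed)
    offset_estimate = 0.0
    squared_deviation = 0.0
    for t in range(n_wafers):
        # Choose the recipe that would hit the target if the estimate were right.
        recipe = (target - offset_estimate) / gain
        outcome = gain * recipe + drift_per_wafer * t + rng.gauss(0.0, noise_sd)
        squared_deviation += (outcome - target) ** 2
        if t % measure_every == 0:
            observed = outcome                       # physical metrology
        elif use_vm:
            observed = outcome + rng.gauss(0.0, vm_error_sd)  # virtual metrology
        else:
            continue                                 # no feedback for this wafer
        # EWMA update of the estimated process offset.
        offset_estimate = (weight * (observed - gain * recipe)
                           + (1.0 - weight) * offset_estimate)
    return squared_deviation / n_wafers


without_vm = simulate(300, measure_every=6, use_vm=False)
with_vm = simulate(300, measure_every=6, use_vm=True)
print(without_vm, with_vm)
```

In this toy setting the controller that also receives virtual measurements tracks the drift more closely, which is the qualitative effect the slide reports.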

Power plant monitoring based on ANACONDA

Business goal: Early anomaly detection to avoid emergency stops of the system
Technical task: Detect anomalously behaving modules by comparing with previous normally-working state
# of sensors ~ 100

Technical hurdle: Naive thresholding for individual sensors is hard since the system frequently changes its operational mode
Result: Detected about 60% of the serious faults that cannot be detected with conventional methods

ANACONDA captures the interdependency pattern between variables and detects a deviation from the normal pattern.

[Figure: example of detected faults, plotting intake pressure against air flow rate in the normal and faulty states]

IBM Anomaly Analyzer for Correlational Data (ANACONDA) leverages a unique dependency-based anomaly detection technology
ANACONDA monitors the dependency among variables

Setting a fixed threshold on individual variables leads to many false alerts for dynamic systems

ANACONDA computes the anomaly score for individual variables


Unusual change in dependency

- Learns dependency patterns from past data under a normal condition
- Alert is raised if the present dependency is significantly different from the normal pattern
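The idea of scoring each variable by how much its dependencies have changed can be sketched with plain pairwise correlations. This is a simplified stand-in, not ANACONDA's algorithm: it swaps the learned dependency structure for Pearson correlation, and the scoring rule (average absolute change against the other sensors) and all names are assumptions.

```python
from statistics import mean


def correlation(a, b):
    """Pearson correlation of two equally long reading sequences."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def anomaly_scores(reference, current):
    """Score each sensor by the average change in its correlations.

    reference / current: dicts mapping sensor name to a list of readings taken
    under normal conditions and in the window being checked, respectively.
    """
    names = list(reference)
    scores = {}
    for a in names:
        changes = [abs(correlation(reference[a], reference[b])
                       - correlation(current[a], current[b]))
                   for b in names if b != a]
        scores[a] = mean(changes)
    return scores


# Made-up data: sensor B decouples from A and C while staying in its usual range.
normal = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "C": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
}
present = dict(normal, B=[10.0, 2.0, 8.0, 4.0, 12.0, 6.0])
scores = anomaly_scores(normal, present)
print(max(scores, key=scores.get), scores)
```

Every reading of B stays within its normal range, so a fixed per-sensor threshold would not fire, yet B receives the highest score because its relationship to the other sensors has changed.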


Dependency discovery is a key technology

- ANACONDA leverages a sparse structure learning technique for dependency discovery
- Automatically discovers important dependencies among sensors
- Dependency is identified by building sensor-wise predictive models

[Figure: dependency graphs over Sensor1 to Sensor6; a predictive model is built for each sensor in turn (Sensor1, Sensor2, ...), repeated until convergence]
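One common way to build sparse sensor-wise predictive models is to regress each sensor on all the others with an L1 penalty and keep the sensors that receive non-zero weights. The sketch below uses that approach (lasso fitted by coordinate descent) as an assumed stand-in; the slides do not say which sparse structure learning method ANACONDA uses, and the function names and the `alpha` value are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def centered(values):
    """Subtract the mean so that no intercept term is needed."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def lasso(columns, target, alpha, sweeps=100):
    """Coordinate descent for (1/2n)*||target - X w||^2 + alpha*||w||_1."""
    n = len(target)
    weights = [0.0] * len(columns)
    residual = list(target)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            scale = sum(v * v for v in col) / n
            if scale == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(v * (r + weights[j] * v) for v, r in zip(col, residual)) / n
            new_weight = soft_threshold(rho, alpha) / scale
            residual = [r - (new_weight - weights[j]) * v
                        for r, v in zip(residual, col)]
            weights[j] = new_weight
    return weights


def discover_dependencies(readings, alpha=0.1):
    """For each sensor, keep the other sensors that get a non-zero lasso weight."""
    data = {name: centered(values) for name, values in readings.items()}
    graph = {}
    for name, target in data.items():
        others = [other for other in data if other != name]
        weights = lasso([data[other] for other in others], target, alpha)
        graph[name] = {other: w for other, w in zip(others, weights) if w != 0.0}
    return graph


# Made-up readings: Sensor2 follows Sensor1, Sensor3 is unrelated to both.
toy_readings = {
    "Sensor1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "Sensor2": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0],
    "Sensor3": [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0],
}
graph = discover_dependencies(toy_readings)
print(graph)
```

The L1 penalty drives the weights of weakly related sensors to exactly zero, which is what keeps the discovered dependency graph sparse.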



, , Q&A
IBM

Healthcare advisor Engagement advisor


OUTBOUND

FAQ


, ,


INBOUND

VoC FAQ

Smarter Planet
http://www.ibm.com/smarterplanet/kr/ko/overview/ideas/index.html


Part 2: Data, Data Science, Big Data, Analytics


Data = Digitization of all things


Data Type Form/Meaning
(, , , ) Amount (, , ), , DNA,

Transformed
, , , ,

Number Number + Text

Number Number

WEB LOG, , ,

, , ,

Text
Sound Signal Image Video

SNS, , , WEB, , , , , , , , , , CCTV, UCC,

, , , ,

Number

Text

, , , , Feature , , Feature , ,

Number Number


Data Science = Handling of Digital Information


Data Scientist of Korea = Group of Specialists

IT System System IT

DB (R) Architect IT Outsourcing


Data Scientist of Big Data


Big Data


Predictive analytics at the heart of the enterprise


Channels
Moments of Truth: I buy, I renew, I claim, I mend, I cancel

LOB 1, LOB 2, LOB 3
Optimized Business Processes: Sales Effectiveness, Fraud Management, Marketing, Customer Support, Claims Processing, Underwriting

Corporate Goals: Attract, Grow, Retain

Analytical Foresight: Claims Profile, Customer LTV, Retention Risk, Fraud Risk, Best Offers, Optimal Campaigns, Risk Assessment
Customer Feedback, Customer Experience, Customer Interactions

Platform: Data Collection, Data Mining & Statistics, Decision Optimization, Visualization, Base Services

Attitudinal Data, Interaction Data, Behavioral Data, Demographic Data

Big Data

The IBM Big Data Platform


IBM InfoSphere BigInsights
Visualization & Discovery
BigSheets Dashboard & Visualization

Applications & Development


Apps Workflow
Text Analytics Pig & Jaql MapReduce Hive

Administration
Admin Console

Integration
JDBC

Monitoring Netezza

Advanced Analytic Engines


Adaptive Algorithms

Text Processing Engine & Extractor Library

DB2

Streams

Workload Optimization
Integrated Installer ZooKeeper Lucene Enhanced Security Oozie Pig Splittable Text Compression Jaql Hive Adaptive MapReduce Flexible Scheduler Index HCatalog DataStage

Guardium

Runtime / Scheduler Data Store

MapReduce

Symphony

Symphony AE

Management
Security

Platform Computing
Cognos

HBase

Audit & History

Flume
Lineage

File System

HDFS

GPFS FPO

Sqoop

Open Source

IBM

Optional
