January 2013
Big Data is a Hot Topic Because Technology Makes it Possible to Analyze ALL Available Data
Cost-effectively manage and analyze all available data in its native form: unstructured, structured, streaming
Website
Social Media
Billing ERP
CRM
RFID
Network Switches
2012 IBM Corporation
Data Warehousing
Stream Computing
Business-Centric Big Data Enables You to Start With a Critical Business Pain and Expand the Foundation for Future Requirements
Big data isn't just a technology; it's a business strategy for capitalizing on information resources
Getting started is crucial
Success at each entry point is accelerated by products within the Big Data platform
Build the foundation for future requirements by expanding further into the Big Data platform
IT: Structures the data to answer that question
Monthly sales reports, profitability analysis, customer surveys
The IBM Big Data platform includes much more than IBM InfoSphere BigInsights
Hadoop
Open-source software framework from Apache, inspired by Google MapReduce and GFS (Google File System)
Core components: HDFS and MapReduce
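To make the MapReduce model concrete, here is a minimal single-process sketch of the classic word count. It is not Hadoop's API: `map_phase`, `shuffle` and `reduce_phase` are hypothetical names for the map, shuffle/sort and reduce steps that Hadoop would distribute across a cluster, with the input splits stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the list of values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    splits = ["big data big insights", "data at rest data in motion"]  # stand-ins for HDFS input splits
    print(word_count(splits))
```

In a real cluster each split would be mapped in parallel and each key group reduced on whichever node owns it; the sketch only shows the data flow between the phases.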
InfoSphere BigInsights
Platform for volume, variety, velocity
Enhanced Hadoop foundation
Analytics
Text analytics & tooling
Application accelerators
BigInsights editions, built on Apache Hadoop (chart axes: enterprise class vs. breadth of capabilities):
Basic Edition (free download): integrated install, Online InfoCenter, BigData Univ.
Enterprise Edition (licensed): application accelerators, pre-built applications, text analytics, spreadsheet-style tool, RDBMS and warehouse connectivity, administrative tools and security, Eclipse development tools, performance enhancements, ...
Spreadsheet-style Analysis
Web-based analysis and visualization
Spreadsheet-like interface
Define and manage long-running data collection jobs
Analyze the text content of the pages that have been retrieved
Jaql
Jaql layers: Jaql I/O, Jaql Core Operators, Jaql Modules
Jaql I/O sources: local and distributed file systems (DFS), NoSQL databases, content repositories, relational sources (warehouses, operational databases)
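The value of an I/O layer like this is that the operators above it see one record shape regardless of where the data lives. The sketch below illustrates that idea in Python; it is not Jaql's adapter API. `Source`, `JsonLinesFile`, `RelationalTable` and `read_all` are hypothetical names, a JSON-lines file stands in for a file-system source and the built-in `sqlite3` module stands in for a relational source.

```python
import json
import sqlite3
from typing import Any, Dict, Iterator, List, Protocol

Record = Dict[str, Any]  # a JSON-like record, the common currency of the pipeline


class Source(Protocol):
    """Hypothetical adapter interface: every source yields JSON-like records."""

    def read(self) -> Iterator[Record]: ...


class JsonLinesFile:
    """File-system source: one JSON object per non-blank line."""

    def __init__(self, path: str) -> None:
        self.path = path

    def read(self) -> Iterator[Record]:
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)


class RelationalTable:
    """Relational source: each row becomes a dict keyed by column name."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table

    def read(self) -> Iterator[Record]:
        # The table name is interpolated, so it must come from trusted configuration.
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        columns = [d[0] for d in cur.description]
        for row in cur:
            yield dict(zip(columns, row))


def read_all(source: Source) -> List[Record]:
    """Downstream operators consume the same record shape from any source."""
    return list(source.read())
```

Adding another source (a NoSQL store or a content repository) would mean writing one more class with a `read` method; nothing downstream changes.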
Data warehouse and BigInsights, connected by three processing steps: Filter, Transform, Aggregate
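A minimal sketch of the filter, transform and aggregate steps, written as plain Python generators rather than Jaql (Jaql chains operators with its own pipe syntax, which is not reproduced here). The record fields `region`, `amount` and `status` are hypothetical; the idea is that raw records are reduced to compact summary rows before they reach the warehouse.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def filter_step(records: Iterable[Record]) -> Iterator[Record]:
    """Filter: keep only completed transactions (hypothetical 'status' field)."""
    return (r for r in records if r.get("status") == "completed")


def transform_step(records: Iterable[Record]) -> Iterator[Record]:
    """Transform: keep only the needed fields, normalize region names, parse amounts."""
    return (
        {"region": r["region"].strip().upper(), "amount": float(r["amount"])}
        for r in records
    )


def aggregate_step(records: Iterable[Record]) -> List[Record]:
    """Aggregate: total amount per region, sorted by region name."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return [{"region": k, "total": v} for k, v in sorted(totals.items())]


def pipeline(records: Iterable[Record]) -> List[Record]:
    """Chain the steps; records stream through filter and transform lazily."""
    return aggregate_step(transform_step(filter_step(records)))
```

Because filtering happens first, cancelled rows never reach the transform or aggregate steps, which is the same reason such pipelines push filters as early as possible.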
Value statement:
Speed: 10-100x better performance
Simplicity: administration costs reduced by 75%-90%
Scalability
Smart system: in-database analytics, out-of-the-box integration with SPSS
Analyst: I need to evaluate the possible relationship between client salary and overdrafts.
IT: OK. We have to evaluate a lot of statistics and set the correct database indexes and partitioning. It will take us 5 days.
Analyst: Great. I can see some nice correlations here. Now I need to look at it from a different perspective.
IT: Noooo!!! It's not possible to work here!
IT: Ohhh, welcome, dear friend. Understood. So, it's another 5 days of our work.
Analyst: I need to evaluate the possible relationship between client salary and overdrafts. I will use Netezza.
Analyst: Great. I can see some nice correlations here. Now I need to look at it from a different perspective. With Netezza I can run the query immediately, and the response comes back just as quickly.
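The analyst's question (is salary related to overdraft?) comes down to a Pearson correlation, and the point of in-database analytics is that the heavy summation runs where the data lives. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for the warehouse; it does not use Netezza's own analytic functions, and the table `clients(salary, overdraft)` is hypothetical. Only six aggregate numbers leave the database, then

r = (n·Σxy − Σx·Σy) / √((n·Σx² − (Σx)²)(n·Σy² − (Σy)²))

is evaluated on the client.

```python
import math
import sqlite3


def salary_overdraft_correlation(conn: sqlite3.Connection) -> float:
    """Pearson correlation of salary vs. overdraft, with the sums computed inside the database.

    Assumes a hypothetical table clients(salary REAL, overdraft REAL).
    """
    n, sx, sy, sxy, sxx, syy = conn.execute(
        """
        SELECT COUNT(*), SUM(salary), SUM(overdraft),
               SUM(salary * overdraft), SUM(salary * salary), SUM(overdraft * overdraft)
        FROM clients
        WHERE salary IS NOT NULL AND overdraft IS NOT NULL
        """
    ).fetchone()
    if n < 2:
        raise ValueError("need at least two clients with both values")
    numerator = n * sxy - sx * sy
    variance_term = (n * sxx - sx * sx) * (n * syy - sy * sy)
    if variance_term <= 0:
        raise ValueError("salary or overdraft is constant; correlation is undefined")
    return numerator / math.sqrt(variance_term)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clients (salary REAL, overdraft REAL)")
    # Made-up sample rows for illustration only.
    conn.executemany("INSERT INTO clients VALUES (?, ?)", [(1800, 0), (2500, 150), (4000, 900)])
    print(round(salary_overdraft_correlation(conn), 3))
```

Looking at the data "from a different perspective" then means changing the `WHERE` clause or the columns, not rebuilding indexes.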
Dedicated device
Optimized for purpose
Complete solution
Fast installation
Very easy operation
Standard interfaces
Low cost
In October 2012
Proof-of-Concept Project
New enterprise data warehouse platform selection
Comparison of the existing platform with other platforms
Selection criteria: performance, operational savings
Original Platform Workflow Reporting Invoicing and Payments reporting Payment discipline of current month invoices 33 minutes 2 hours
Netezza 1 minute
17 seconds
10 hours
50 minutes
23 seconds
38 seconds
Data warehouse integration with InfoSphere BigInsights: Hive tables, HBase tables, CSV files
Stream Computing
Because of the very high volume and rate of data, processing happens in isolation or in limited windows (by time or by number of records)
Spatial data, images, text, voice
Different connection methods
Different data rates
Different processing requirements
Very high volume and rate, so scalability is required
Immediate analysis and response
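The two kinds of limited window mentioned above (by number of records and by time) can be sketched with Python generators. This is a single-process illustration of the windowing idea, not InfoSphere Streams operators; the function names are hypothetical, and the time window assumes events arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Optional, Tuple


def count_window_averages(values: Iterable[float], size: int) -> Iterator[float]:
    """Sliding window over the last `size` records: emit the mean each time a record arrives, once the window is full."""
    if size <= 0:
        raise ValueError("window size must be positive")
    window: Deque[float] = deque(maxlen=size)
    total = 0.0
    for v in values:
        if len(window) == size:
            total -= window[0]  # this oldest value is evicted by the append below
        window.append(v)
        total += v
        if len(window) == size:
            yield total / size


def time_window_counts(
    events: Iterable[Tuple[float, str]], width: float
) -> Iterator[Tuple[float, int]]:
    """Tumbling time window: count events per `width`-second bucket.

    Assumes in-order timestamps; buckets with no events are not emitted.
    """
    current_start: Optional[float] = None
    count = 0
    for ts, _payload in events:
        start = (ts // width) * width
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, count
            current_start, count = start, 0
        count += 1
    if current_start is not None:
        yield current_start, count
```

Either window keeps only a bounded amount of state, which is what lets each record be processed immediately instead of waiting for the whole data set.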
Streams and BigInsights - Integrated Analytics on Data in Motion & Data at Rest
Visualization of realtime and historical insights
InfoSphere Streams
1. Data ingest
2. Bootstrap/enrich
InfoSphere Streams role: data ingest, preparation, online analysis, model validation (diagram arrows: data flow and control flow)
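One way to read the data-at-rest / data-in-motion loop: a model is learned from historical data (the bootstrap step), and the stream then scores each new record against it as it arrives. The sketch below assumes the simplest possible model, a mean and standard deviation with a z-score threshold; this is a stand-in chosen for illustration, not the analytics the platform ships, and `Baseline`, `bootstrap` and `score_stream` are hypothetical names.

```python
import statistics
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Baseline:
    """Model learned from data at rest (for example, historical records analyzed in Hadoop)."""
    mean: float
    stdev: float


def bootstrap(history: List[float]) -> Baseline:
    """Learn the baseline from historical values; statistics.stdev needs at least two points."""
    return Baseline(statistics.mean(history), statistics.stdev(history))


def score_stream(
    values: Iterable[float], model: Baseline, threshold: float = 3.0
) -> Iterator[Tuple[float, bool]]:
    """Score data in motion: flag values more than `threshold` standard deviations from the mean.

    With a zero standard deviation nothing is flagged.
    """
    for v in values:
        z = (v - model.mean) / model.stdev if model.stdev else 0.0
        yield v, abs(z) > threshold
```

Model validation would close the loop: flagged records are stored with the history, the baseline is periodically recomputed at rest, and the refreshed model is pushed back to the stream.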
In detail: moving from the entry project to a 2nd and 3rd project brings shared components and integration
Analytic Applications
BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
Accelerators
Points of leverage
Shared text analytics for Streams and BigInsights
HDFS connectors (data integration/ETL, Streams)
Accelerators built across multiple engines
Hadoop System
Stream Computing
Data Warehouse