William El Kaim
May 2018 – V 4.0
This Presentation is part of the
Enterprise Architecture Digital Codex
Visualization
Source: Clustrix
Source: Capgemini
Diagram stages: Data Sources, Data Ingestion, Data Lake, Lakeshore & Analytics, Analytics App and Services
• Data Sources: Open Data; Operational Systems (ODS, IoT); Existing Sources of Data (Databases, DW, DataMart)
• Ingestion Technologies: Apex, Flink, Flume, Kafka, Amazon Kinesis, NiFi, Samza, Spark, Sqoop, Scribe, Storm, NFS Gateway, etc.
• Batch: Map Reduce
• Streaming: Event Stream & Micro Batch
• Encoding Format: JSON, RCFile, Parquet, ORCFile
• Distributed File System: GlusterFS, HDFS, Amazon S3, MapRFS, ElasticSearch
• NoSQL Databases: Cassandra, Ceph, DynamoDB, HBase, Hive, Impala, Ring, OpenStack Swift, etc.
• Distributions: Cloudera, Hortonworks, MapR, SyncFusion, Amazon EMR, Azure HDInsight, Altiscale, Pachyderm, Qubole, etc.
• Data Science: Dataiku, Datameer, Tamr, R, SAS, Python, RapidMiner, etc.
• Machine Learning: BigML, Mahout, Predicsys, Azure ML, TensorFlow, H2O, etc.
• Data Warehouse: Cassandra, Druid, DynamoDB, MongoDB, Redshift, Google BigQuery, etc.
• BI Tools & Platforms: Qlik, Tableau, Tibco, Jethro, Looker, IBM, SAP, BIME, etc.
• App. Services: Cascading, Crunch, Hfactory, Hunk, Spring for Hadoop, D3.js, Leaflet
Copyright © William El Kaim 2018
Hadoop Distributions and Providers
Forrester Big Data Hadoop Distributions, Q1 2016
Forrester Big Data Hadoop Cloud, Q1 2016
1. Data Ingestion and Storage
2. Big Data Analytics and Use
ETL Pre-processor
• Shift ETL pre-processing from the staging data warehouse to Hadoop
• Move high-cost data warehouse workloads to lower-cost Hadoop clusters
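The ETL pre-processor pattern above can be sketched in a few lines. This is only an illustration under stated assumptions: plain Python stands in for a job that would run on the Hadoop cluster, and the field names `order_id` and `amount` and the function `preprocess` are hypothetical, not taken from the presentation.

```python
def preprocess(raw_rows):
    """Clean raw staging rows so only load-ready rows reach the warehouse.

    Hypothetical rules: drop incomplete rows, drop rows with a
    non-numeric amount, and keep the first row per order_id.
    """
    seen_ids = set()
    cleaned = []
    for row in raw_rows:
        order_id = (row.get("order_id") or "").strip()
        amount_text = (row.get("amount") or "").strip()
        if not order_id or not amount_text:
            continue  # incomplete row
        try:
            amount = float(amount_text)
        except ValueError:
            continue  # amount is not a number
        if order_id in seen_ids:
            continue  # duplicate of a row already kept
        seen_ids.add(order_id)
        cleaned.append({"order_id": order_id, "amount": amount})
    return cleaned


raw = [
    {"order_id": " A1 ", "amount": "10.5"},
    {"order_id": "A1", "amount": "10.5"},   # duplicate
    {"order_id": "A2", "amount": "n/a"},    # bad amount
    {"order_id": "", "amount": "3"},        # missing id
    {"order_id": "A3", "amount": "7"},
]
load_ready = preprocess(raw)
```

The warehouse then receives only the cleaned rows, which is where the cost saving in this pattern comes from.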
Massive Storage
• Offload large volumes of historical data into cold storage on Hadoop
• Keep the data warehouse for hot data to support BI and analytics
• When data in cold storage is needed again, move it back into the warehouse
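A minimal sketch of the hot/cold split described above, assuming records carry an `event_date` field and that "hot" means the most recent 365 days. The window, the field name, and the functions `split_hot_cold` and `recall` are illustrative assumptions; in practice the cold tier would live on HDFS rather than in a Python list.

```python
from datetime import date, timedelta

HOT_WINDOW = timedelta(days=365)  # assumed retention window for hot data


def split_hot_cold(records, today):
    """Partition records into hot (warehouse) and cold (Hadoop) tiers by age."""
    cutoff = today - HOT_WINDOW
    hot = [r for r in records if r["event_date"] >= cutoff]
    cold = [r for r in records if r["event_date"] < cutoff]
    return hot, cold


def recall(cold, start, end):
    """Select cold records in a date range to move back into the warehouse."""
    return [r for r in cold if start <= r["event_date"] <= end]


records = [
    {"id": 1, "event_date": date(2018, 4, 1)},
    {"id": 2, "event_date": date(2015, 6, 1)},
    {"id": 3, "event_date": date(2016, 1, 15)},
]
hot, cold = split_hot_cold(records, today=date(2018, 5, 1))
recalled = recall(cold, date(2015, 1, 1), date(2015, 12, 31))
```

Splitting on a single date cutoff keeps the rule easy to audit. A real deployment might tier by access frequency instead.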
Data Discovery
• Keep the data warehouse for operational BI and analytics
• Let data scientists explore raw data (no imposed format or structure) to make new discoveries
• Operationalize those discoveries back into the warehouse
1. Data Source Layer: sensors and devices, DB data, external data
2. Data Storage Layer
3. Data Processing / Analysis Layer
Business Intelligence & Analytics: dashboards, reports, visualization, …
Source: The Modeling Agency and sv-europe
How to Implement Hadoop?
McKinsey Seven Steps Approach