Introduction
Apache Hadoop 1.0 vs 2.0
HDFS
MapReduce
Master-Slave Architecture
Limitations of Hadoop 1.0
YARN
References
Open-source software framework designed for the
storage and processing of large-scale data on
clusters of commodity hardware
Architecture
HDFS
Hadoop Distributed File System (HDFS): responsible for storing data on the cluster
Default replication factor is 3: each data block is stored on three different nodes
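As an illustration of block replication, the Python sketch below splits a file into fixed-size blocks and assigns each block to three distinct DataNodes. The 128 MB block size (the Hadoop 2 default; Hadoop 1 used 64 MB), the DataNode names, and the round-robin placement are assumptions made for illustration; real HDFS uses a rack-aware placement policy, and `place_blocks` is a hypothetical helper, not an HDFS API.

```python
from dataclasses import dataclass

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (Hadoop 2 default)
REPLICATION = 3                 # default HDFS replication factor


@dataclass
class Block:
    block_id: int
    length: int           # bytes stored in this block
    replicas: list[str]   # DataNodes holding a copy


def place_blocks(file_size: int, datanodes: list[str],
                 block_size: int = BLOCK_SIZE,
                 replication: int = REPLICATION) -> list[Block]:
    """Split a file into blocks and give each block `replication` distinct DataNodes.

    Simplified round-robin placement (hypothetical); real HDFS placement is rack-aware.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    blocks = []
    offset = 0
    block_id = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        # Pick `replication` consecutive nodes, wrapping around the node list,
        # so no node receives two copies of the same block.
        replicas = [datanodes[(block_id + i) % len(datanodes)]
                    for i in range(replication)]
        blocks.append(Block(block_id, length, replicas))
        offset += length
        block_id += 1
    return blocks


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
    for b in place_blocks(300 * 1024 * 1024, nodes):
        print(b.block_id, b.length // (1024 * 1024), "MB", b.replicas)
```

For a 300 MB file this yields blocks of 128, 128, and 44 MB, each listed with three distinct DataNodes, so losing any single node still leaves two copies of every block.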
MapReduce
Distributing computation across nodes
A method for distributing computation across
multiple nodes
Takes a set of data and breaks it down into tuples (key/value pairs)
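The classic word-count example illustrates this: the map step turns each input line into (word, 1) tuples, a shuffle step groups the tuples by key, and the reduce step combines each group into one result. The single-process Python sketch below only simulates that data flow; it does not use Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: break one input line into (word, 1) key/value tuples."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all values emitted for the same key, sorted by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the grouped values for one key into a single result."""
    return (key, sum(values))


def word_count(lines: list[str]) -> dict[str, int]:
    # In Hadoop, map and reduce tasks could run on different nodes;
    # here everything runs sequentially in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog", "The end"]))
    # {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because each map call looks at one line and each reduce call looks at one key, both steps can be spread across nodes independently; only the shuffle needs to move data between them.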
Master-Slave Architecture
NameNode
Stores metadata for the files, such as the directory
structure
Creates additional block replicas when necessary,
for example after a DataNode failure
DataNode
Stores the actual data blocks in HDFS
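The division of labor between the two roles can be sketched as a toy model: the NameNode keeps only metadata (which blocks make up each file and which DataNodes hold each block), and when a DataNode fails it schedules new copies of the under-replicated blocks on surviving nodes. Everything here is a hypothetical illustration, not HDFS code; in particular, failure detection (which HDFS bases on DataNode heartbeats) and the actual block copy are left out.

```python
REPLICATION = 3  # default HDFS replication factor


class NameNode:
    """Toy NameNode: keeps only metadata, never the block contents."""

    def __init__(self, datanodes: list[str]) -> None:
        self.live_nodes: set[str] = set(datanodes)
        self.files: dict[str, list[str]] = {}           # path -> block IDs
        self.block_locations: dict[str, set[str]] = {}  # block ID -> DataNodes

    def add_file(self, path: str, block_ids: list[str],
                 placement: list[set[str]]) -> None:
        """Record a file's blocks and where each block's replicas live."""
        self.files[path] = block_ids
        for block_id, nodes in zip(block_ids, placement):
            self.block_locations[block_id] = set(nodes)

    def handle_datanode_failure(self, dead: str) -> list[tuple[str, str]]:
        """Drop a dead DataNode and schedule new replicas for affected blocks.

        Returns (block ID, target DataNode) pairs; a real cluster would then
        copy each block from a surviving replica to its target.
        """
        self.live_nodes.discard(dead)
        scheduled = []
        for block_id, nodes in sorted(self.block_locations.items()):
            nodes.discard(dead)
            # Only live nodes that do not already hold this block qualify.
            candidates = sorted(self.live_nodes - nodes)
            while len(nodes) < REPLICATION and candidates:
                target = candidates.pop(0)
                nodes.add(target)
                scheduled.append((block_id, target))
        return scheduled


if __name__ == "__main__":
    nn = NameNode(["dn1", "dn2", "dn3", "dn4"])  # hypothetical node names
    nn.add_file("/logs/day1.txt", ["blk_1", "blk_2"],
                [{"dn1", "dn2", "dn3"}, {"dn2", "dn3", "dn4"}])
    print(nn.handle_datanode_failure("dn1"))  # [('blk_1', 'dn4')]
```

Only `blk_1` had a replica on the failed node, so it is the only block that gets a new copy; `blk_2` still has three replicas and is left alone.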
JobTracker
Splits a data-processing job into smaller tasks and sends
them to the TaskTracker process on each node
TaskTracker
Reports back to the JobTracker on task progress,
sends data, and requests new tasks
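In Hadoop 1.0 this exchange happens through periodic heartbeats: each TaskTracker reports its status and free task slots, and the JobTracker replies with tasks to run. The Python sketch below models that loop under simplifying assumptions: one map task per input split, no data locality, no reduce tasks, and no failure handling. The class names mirror the roles above, but the code, the `Heartbeat` type, and the task-ID format are hypothetical, not Hadoop's API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    tracker: str                  # TaskTracker sending the heartbeat
    free_slots: int               # task slots it can fill right now
    finished: list[str] = field(default_factory=list)  # tasks completed since last beat


class JobTracker:
    """Toy JobTracker: splits a job into tasks and hands them out on heartbeats."""

    def __init__(self, job_id: str, input_splits: list[str]) -> None:
        # One map task per input split (split contents are ignored in this toy).
        self.pending = deque(f"{job_id}_m_{i}" for i in range(len(input_splits)))
        self.running: dict[str, str] = {}  # task ID -> TaskTracker
        self.completed: set[str] = set()
        self.total = len(input_splits)

    def heartbeat(self, hb: Heartbeat) -> list[str]:
        """Record progress reported by a TaskTracker and assign new tasks."""
        for task in hb.finished:
            self.running.pop(task, None)
            self.completed.add(task)
        assigned = []
        while len(assigned) < hb.free_slots and self.pending:
            task = self.pending.popleft()
            self.running[task] = hb.tracker
            assigned.append(task)
        return assigned

    def progress(self) -> float:
        """Fraction of the job's tasks reported as finished."""
        return len(self.completed) / self.total if self.total else 1.0


if __name__ == "__main__":
    jt = JobTracker("job_001", ["split0", "split1", "split2"])
    print(jt.heartbeat(Heartbeat("tt1", free_slots=2)))  # ['job_001_m_0', 'job_001_m_1']
    print(jt.heartbeat(Heartbeat("tt2", free_slots=2)))  # ['job_001_m_2']
    print(jt.heartbeat(Heartbeat("tt1", free_slots=2, finished=["job_001_m_0"])))  # []
```

Even in this toy, a single `JobTracker` object holds the scheduling queue and the progress state for every task, which is the concentration of responsibilities described under the scalability limitation of Hadoop 1.0.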
Scalability: the JobTracker runs on a single machine and performs
several tasks:
Resource management, job scheduling, and monitoring
http://hortonworks.com/apache/yarn/#section_2
http://saphanatutorial.com/how-yarn-overcomes-mapreduce-limitations-in-hadoop-2-0/
http://www.slideshare.net/emcacademics/milind-hadoop-trainingbrazil
https://en.wikipedia.org/wiki/Apache_Hadoop