
Presented By:

KALAI SELVI (2015272013)
PIYUSH JANGIR (2015272053)

Introduction
Apache Hadoop 1.0 vs 2.0
HDFS
MapReduce
Master-Slave Architecture
Limitations in Hadoop 1.0
YARN
References

Open-source software framework designed for storage and processing of large-scale data on clusters of commodity hardware.

Created by Doug Cutting and Mike Cafarella.

Cutting named the project after his son's toy elephant.

The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part called MapReduce.

Architecture

HDFS

Responsible for storing data on the cluster.

Data files are split into blocks and distributed across the nodes in the cluster.

Each block is replicated multiple times.

The default replication factor is 3.
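As a rough illustration, here is a small sketch of how a file is split into blocks and how 3-fold replication multiplies the raw storage footprint. The 64 MB block size is an assumption (a common Hadoop 1.0 default; it is configurable per cluster):

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=64, replication=3):
    """Illustrative only: number of HDFS blocks a file occupies and the
    total raw storage consumed once every block is replicated."""
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    raw_storage_mb = file_size_mb * replication
    return num_blocks, raw_storage_mb

# A 200 MB file: 4 blocks (3 full + 1 partial), 600 MB of raw storage.
blocks, raw = hdfs_footprint(200)
```

The last block of a file is usually partial; HDFS only consumes the actual bytes written, so raw storage is the file size times the replication factor, not the block count times the block size.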

MapReduce: Distributing Computation Across Nodes

A method for distributing computation across multiple nodes.

Each node processes the data that is stored on that node.

Consists of two main phases:
Map
Reduce

The Reduce task is always performed after the Map task.

Map: takes a set of data and breaks it down into key-value tuples.

Reduce: takes the output of the Map phase as its input and combines those data tuples into a smaller set of tuples.
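The two phases above can be sketched in a minimal single-process word count (plain Python, not the Hadoop API): Map emits (word, 1) tuples, and Reduce groups the tuples by key and combines each group into a single tuple:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: break each input record down into (key, value) tuples."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(tuples):
    """Reduce: combine the tuples for each key into a smaller set."""
    grouped = defaultdict(int)   # the shuffle step: group values by key
    for key, value in tuples:
        grouped[key] += value
    return dict(grouped)

counts = reduce_phase(map_phase(["hadoop stores data",
                                 "hadoop processes data"]))
# counts == {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In real Hadoop the Map and Reduce functions run on different nodes and the framework performs the shuffle over the network; this sketch only shows the data flow between the two phases.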

Master-Slave Architecture

NameNode
Stores metadata for the files, such as the directory structure.
Handles creation of additional replica blocks when necessary after a DataNode failure.

DataNode
Stores the actual data in HDFS.

JobTracker
Splits a job into smaller tasks and sends them to the TaskTracker process on each node.

TaskTracker
Reports back to the JobTracker on task progress, sends data, or requests new tasks.

Scalability: the JobTracker runs on a single machine while doing several tasks at once: resource management, job scheduling, and monitoring.

Availability: in Hadoop 1.0, the JobTracker is a single point of failure. If the JobTracker fails, all running jobs must restart.

Resource utilization: Hadoop 1.0 assigns a predefined number of map slots and reduce slots to each TaskTracker. Resource utilization issues occur because the map slots might be full while the reduce slots are empty (and vice versa).
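The slot problem can be illustrated numerically (the slot counts below are hypothetical, not measured values): with a fixed split of map and reduce slots, a map-heavy workload leaves every reduce slot idle even while map tasks wait in the queue.

```python
def cluster_utilization(map_slots, reduce_slots, running_maps, running_reduces):
    """Fraction of the fixed slots actually doing work (Hadoop 1.0 model).
    Idle reduce slots cannot be reused for waiting map tasks."""
    used = min(running_maps, map_slots) + min(running_reduces, reduce_slots)
    return used / (map_slots + reduce_slots)

# 8 map slots all busy, 4 reduce slots idle: only 2/3 of the cluster
# is utilized, even though 12 more map tasks are waiting.
util = cluster_utilization(map_slots=8, reduce_slots=4,
                           running_maps=20, running_reduces=0)
```

YARN in Hadoop 2.0 removes this restriction by replacing typed slots with generic resource containers that either task type can occupy.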

http://hortonworks.com/apache/yarn/#section_2
http://saphanatutorial.com/how-yarn-overcomes-mapreduce-limitations-in-hadoop-2-0/
http://www.slideshare.net/emcacademics/milind-hadoop-trainingbrazil
https://en.wikipedia.org/wiki/Apache_Hadoop
