
Presented By:

KALAI SELVI (2015272013)
PIYUSH JANGIR (2015272053)

Introduction
Apache Hadoop 1.0 vs 2.0
HDFS
MapReduce
Master-Slave Architecture
Limitations in Hadoop 1.0
YARN
References

Open-source software framework designed for storage and processing of large-scale data on clusters of commodity hardware.

Created by Doug Cutting and Mike Cafarella.

Cutting named the project after his son's toy elephant.

The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part called MapReduce.

Architecture

HDFS

Responsible for storing data on the cluster.

Data files are split into blocks and distributed across the nodes in the cluster.

Each block is replicated multiple times.

The default replication factor is 3.
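As a rough illustration, here is a small sketch of how a file is split into blocks and how 3-fold replication multiplies the raw storage footprint. The 64 MB block size is an assumption (a common Hadoop 1.0 default; it is configurable per cluster):

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=64, replication=3):
    """Illustrative only: number of HDFS blocks a file occupies and the
    total raw storage consumed once every block is replicated."""
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    raw_storage_mb = file_size_mb * replication
    return num_blocks, raw_storage_mb

# A 200 MB file: 4 blocks (3 full + 1 partial), 600 MB of raw storage.
blocks, raw = hdfs_footprint(200)
```

The last block of a file is usually partial; HDFS only consumes the actual bytes written, so raw storage is the file size times the replication factor, not the block count times the block size.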

MapReduce: Distributing Computation Across Nodes

A method for distributing computation across multiple nodes.

Each node processes the data that is stored on that node.

Consists of two main phases:
Map
Reduce

The Reduce task is always performed after the Map task.

Map: takes a set of data and breaks it down into key-value tuples.

Reduce: takes the output of the Map phase as its input and combines those data tuples into a smaller set of tuples.
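The two phases above can be sketched in a minimal single-process word count (plain Python, not the Hadoop API): Map emits (word, 1) tuples, and Reduce groups the tuples by key and combines each group into a single tuple:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: break each input record down into (key, value) tuples."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(tuples):
    """Reduce: combine the tuples for each key into a smaller set."""
    grouped = defaultdict(int)   # the shuffle step: group values by key
    for key, value in tuples:
        grouped[key] += value
    return dict(grouped)

counts = reduce_phase(map_phase(["hadoop stores data",
                                 "hadoop processes data"]))
# counts == {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In real Hadoop the Map and Reduce functions run on different nodes and the framework performs the shuffle over the network; this sketch only shows the data flow between the two phases.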

Master-Slave Architecture

NameNode
Stores metadata for the files, such as the directory structure.
Handles creation of additional replica blocks when necessary after a DataNode failure.

DataNode
Stores the actual data in HDFS.

JobTracker
Splits a job into smaller tasks and sends them to the TaskTracker process on each node.

TaskTracker
Reports back to the JobTracker on task progress, sends data, or requests new tasks.

Scalability: the JobTracker runs on a single machine while doing several tasks at once: resource management, job scheduling, and monitoring.

Availability: in Hadoop 1.0, the JobTracker is a single point of failure. If the JobTracker fails, all running jobs must restart.

Resource utilization: Hadoop 1.0 assigns a predefined number of map slots and reduce slots to each TaskTracker. Resource utilization issues occur because the map slots might be full while the reduce slots are empty (and vice versa).
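The slot problem can be illustrated numerically (the slot counts below are hypothetical, not measured values): with a fixed split of map and reduce slots, a map-heavy workload leaves every reduce slot idle even while map tasks wait in the queue.

```python
def cluster_utilization(map_slots, reduce_slots, running_maps, running_reduces):
    """Fraction of the fixed slots actually doing work (Hadoop 1.0 model).
    Idle reduce slots cannot be reused for waiting map tasks."""
    used = min(running_maps, map_slots) + min(running_reduces, reduce_slots)
    return used / (map_slots + reduce_slots)

# 8 map slots all busy, 4 reduce slots idle: only 2/3 of the cluster
# is utilized, even though 12 more map tasks are waiting.
util = cluster_utilization(map_slots=8, reduce_slots=4,
                           running_maps=20, running_reduces=0)
```

YARN in Hadoop 2.0 removes this restriction by replacing typed slots with generic resource containers that either task type can occupy.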

http://hortonworks.com/apache/yarn/#section_2
http://saphanatutorial.com/how-yarn-overcomes-mapreduce-limitations-in-hadoop-2-0/
http://www.slideshare.net/emcacademics/milind-hadoop-trainingbrazil
https://en.wikipedia.org/wiki/Apache_Hadoop
