
Presented By: Deshmukh Sachin B., ME (Computer), 9970406068

Guided By:

Hadoop is an open-source implementation of MapReduce and uses HDFS (the Hadoop Distributed File System) for storage.

Hadoop Architecture

[Fig 1: Architecture of Hadoop. A client submits a job to the cluster. The master node runs the Name Node and the Job Tracker; each slave node runs a Task Tracker and a Data Node and executes Map/Reduce tasks. Data is replicated on multiple nodes.]

Hadoop is a software framework for the distributed processing of large data sets on computer clusters. Map and reduce share a general interface: each receives a sequence of records and produces records in response. A record consists of a key and a value.
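The record interface described above can be illustrated with a minimal in-memory sketch. The function names (`run_map`, `run_reduce`) are illustrative only, not the actual Hadoop API:

```python
# Minimal sketch of the map/reduce record interface: each phase
# consumes (key, value) records and emits (key, value) records.
# Illustrative names only -- this is not the real Hadoop API.

def run_map(map_fn, records):
    """Apply the user map function to every input record."""
    out = []
    for key, value in records:
        out.extend(map_fn(key, value))  # a map call may emit zero or more records
    return out

def run_reduce(reduce_fn, mapped):
    """Group mapped records by key, then reduce each group."""
    groups = {}
    for key, value in mapped:
        groups.setdefault(key, []).append(value)
    return [reduce_fn(key, values) for key, values in sorted(groups.items())]
```

With a map function that emits (word, 1) and a reduce function that sums, this already performs a word count on a single machine.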

Map Reduce Job

[Fig 2: MapReduce process. Unordered, binned input data is fed to parallel Map tasks; the resulting records are binned by key and passed to parallel Reduce tasks, which produce the final result.]

Map operation:
Map keys its output so that the system places in the same bin the records that should come together in the reduce phase.

Reduce operation:
Reduce processes each bin of records that share a key and combines them into the output.

Fig 3: Map Operation

Fig 4: Reduce Operation
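The binning of map output is typically done with a hash partitioner. A minimal sketch (an illustrative helper mirroring the idea behind Hadoop's default HashPartitioner, not its actual code):

```python
import zlib

def partition(key, num_reducers):
    # A deterministic hash of the key, modulo the number of reducers.
    # The same key always hashes to the same bin, so all records for
    # that key meet at the same reducer in the reduce phase.
    return zlib.crc32(key.encode("utf-8")) % num_reducers
```

Using crc32 rather than Python's built-in hash() keeps the bin assignment stable across processes, which matters when map tasks run on different nodes.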

[Fig 5: Detailed flow of MapReduce. Files in HDFS pass through a splitter and record reader to the mappers (M1, M2, M3); map output goes through a combiner, a partitioner, and a sorter before reaching the reducers (R1, R2, R3).]
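The combiner shown in Fig 5 pre-aggregates map output locally on the map node before it is partitioned, cutting down the data that must be shuffled to reducers. A sketch under the word-count assumption (the helper name `combine` is hypothetical):

```python
def combine(mapped_records):
    """Local pre-aggregation of (word, count) pairs on the map node.
    Valid for word count because addition is associative and
    commutative, so partial sums can be summed again at the reducer."""
    counts = {}
    for word, n in mapped_records:
        counts[word] = counts.get(word, 0) + n
    return sorted(counts.items())
```

After combining, a mapper that saw "see" twice ships a single ("see", 2) pair instead of two ("see", 1) pairs.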

Problem statement for word count: given a huge file, determine the count of each word in the file. Approach: MapReduce takes advantage of the large number of nodes in the cluster, running map and reduce tasks in parallel at each node.

map(key, value):
    // key: document name; value: text of document
    for each word w in value:
        emit(w, 1)

reduce(key, values):
    // key: a word; values: an iterator over counts
    result = 0
    for each count v in values:
        result += v
    emit(key, result)

Equivalently, with URLs as document keys:

map(key=url, val=contents):
    for each word w in contents:
        emit(w, 1)

reduce(key=word, values=uniq_counts):
    sum all counts in values
    emit(word, sum)
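The pseudocode above can be run end to end as a small in-memory simulation in plain Python (this is a single-machine illustration, not actual Hadoop execution):

```python
from collections import Counter

def word_count(documents):
    """In-memory simulation of the word-count MapReduce job."""
    # Map phase: emit a (word, 1) pair for every word in every document.
    pairs = [(w, 1) for text in documents for w in text.split()]
    # Shuffle + reduce phase: sum the counts for each word.
    counts = Counter()
    for word, one in pairs:
        counts[word] += one
    return dict(counts)
```

On the sample input used below in this document, "see bob run" and "see spot throw", it yields see 2 and a count of 1 for each remaining word.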

Example input: see bob run see spot throw

Map output: (see, 1) (bob, 1) (run, 1) (see, 1) (spot, 1) (throw, 1)

Reduce output: (bob, 1) (run, 1) (see, 2) (spot, 1) (throw, 1)

EXAMPLE WORD COUNT USING MAP REDUCE


Input: Deer Bear River / Car Car River / Deer Car Bear

Splitting: each line becomes one input split.

Mapping: (Deer, 1) (Bear, 1) (River, 1); (Car, 1) (Car, 1) (River, 1); (Deer, 1) (Car, 1) (Bear, 1)

Shuffling: Bear [1, 1]; Car [1, 1, 1]; Deer [1, 1]; River [1, 1]

Reducing: Bear 2; Car 3; Deer 2; River 2

Final Result: Bear 2, Car 3, Deer 2, River 2

Fig 6: Map Reduce Word Count Process

ANOTHER WORD COUNT

Fig 7: Word Count Example

In Pipelined MapReduce, the mapper sends data directly to the reducer.

Comparison Of Hadoop & Pipelined MapReduce Data Flow


Push Pull
Local HDFS

Pull
Map Store Red uce

HDFS

Fig Haboop Data Flow for Batch

[Fig: Pipelined MapReduce data flow. The mapper pushes each record directly to the reducer as it is produced, and the reducer pushes its result to HDFS.]
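The push-based flow above can be sketched as a toy in-process simulation, using a Python queue in place of the real TCP socket between tasks (all names here are illustrative, not the actual Pipelined MapReduce implementation):

```python
import queue
import threading

def pipelined_word_count(words):
    """Toy simulation: the mapper pushes each (word, 1) pair to the
    reducer as soon as it is produced, instead of first writing a
    complete map output file to local disk."""
    pipe = queue.Queue()   # stands in for the mapper -> reducer TCP socket
    DONE = object()        # sentinel marking the end of the map output
    counts = {}

    def mapper():
        for w in words:
            pipe.put((w, 1))   # push immediately; no local spill to disk
        pipe.put(DONE)

    def reducer():
        while True:
            item = pipe.get()
            if item is DONE:
                break
            w, n = item        # accumulate in an in-memory buffer
            counts[w] = counts.get(w, 0) + n

    t_map = threading.Thread(target=mapper)
    t_red = threading.Thread(target=reducer)
    t_map.start(); t_red.start()
    t_map.join(); t_red.join()
    return counts
```

Because the reducer consumes records while the mapper is still running, it can report a snapshot of approximate counts before the map phase completes, which is the key benefit of pipelining.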

Algorithm For Word Count Using Pipelined MapReduce


TCP Socket See Bob Run
Client Submits Job

See 1
M1

Bob 1
M2

Run 1
M3

See 1
M4

See

Job Tracker

See 1
R1

Bob 1
R2

Run 1
R3

Run 1
R4

Reduce Task Accept The Pipeline Data & store it in In Memory Buffer In Memory Buffer

See 1 Bob 1 Run 1 See 1

MERGE

Applies User defined Reduce Function

See 2 Bob 1 Run 1

HDFS

Open TCP Socket

Pipelining allows data to be sent and received between tasks and between jobs without waiting on disk I/O. It reduces completion time and enables the user to take snapshots of approximate output.

In this seminar, we studied Pipelined MapReduce in the Hadoop environment. It extends the MapReduce programming model, is superior to the batch model, and reduces the completion time of tasks. Pipelined MapReduce can process large datasets effectively. In future work, we will study the applicability of the MapReduce technique in cloud computing environments.

REFERENCES

J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Proc. of Operating Systems Design and Implementation (OSDI), San Francisco, CA, pp. 137-150 (2004).

T. Hey, S. Tansley, K. Tolle. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, Redmond, Washington, 2009.

C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, C. Kozyrakis. Evaluating MapReduce for Multi-core and Multiprocessor Systems. Proc. of 13th Symposium on High-Performance Computer Architecture (HPCA), Phoenix, AZ (2007).

Hadoop, http://hadoop.apache.org/core/
