Institution: Eurecom
Exercise Session:
I Warm up: WordCount and first design patterns [45 minutes]
I Exercises on various design patterns: [60 minutes]
F Pairs
F Stripes
F Order Inversion [Homework]
I Solved Exercise: PageRank in MapReduce [45 minutes]
Hadoop MapReduce
Terminology
I MapReduce:
F Job: an execution of a Mapper and Reducer across a data set
F Task: an execution of a Mapper or a Reducer on a slice of data
F Task Attempt: instance of an attempt to execute a task
I Example:
F Running Word Count across 20 files is one job
F 20 files to be mapped = 20 map tasks + some number of reduce tasks
F At least 20 attempts will be performed... more if a machine crashes
HDFS in details
HDFS Blocks
NameNode
I Keeps metadata in RAM
I Each block information occupies roughly 150 bytes of memory
I Without NameNode, the filesystem cannot be used
F Persistence of metadata: synchronous and atomic writes to NFS
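The 150-bytes-per-block figure makes it easy to estimate how much RAM the NameNode needs. A back-of-the-envelope sketch in plain Python (the constant comes from the slide; the function name is illustrative):

```python
# Rough NameNode memory estimate for block metadata.
# Assumption (from the slide above): ~150 bytes of metadata per block.
BYTES_PER_BLOCK = 150

def namenode_memory_gb(num_blocks):
    """Approximate RAM needed to hold block metadata, in GiB."""
    return num_blocks * BYTES_PER_BLOCK / 2**30

# 100 million blocks needs on the order of 14 GiB of RAM
print(round(namenode_memory_gb(100_000_000), 1))
```

This is why small files are problematic in HDFS: metadata cost scales with the number of blocks, not with the amount of data.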
Secondary NameNode
I Merges the namespace image with the edit log
I A useful trick to recover from a failure of the NameNode is to use the
NFS copy of metadata and switch the secondary to primary
DataNode
I They store data and talk to clients
I They report periodically to the NameNode the list of blocks they hold
External clients
I For each block, the NameNode returns a set of DataNodes holding
a copy thereof
I DataNodes are sorted according to their proximity to the client
MapReduce clients
I TaskTracker and DataNodes are colocated
I For each block, the NameNode usually returns the local DataNode
(exceptions exist, e.g., due to stragglers)
Pietro Michiardi (Eurecom) Hadoop in Practice 9 / 76
Details on replication
I Clients ask NameNode for a list of suitable DataNodes
I This list forms a pipeline: the first DataNode stores a copy of the
block, then forwards it to the second, and so on
Replica Placement
I Tradeoff between reliability and bandwidth
I Default placement:
F First replica on the same node as the client, second replica off-rack,
third replica on the same rack as the second but on a different node
F Since Hadoop 0.21, replica placement can be customized
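The default policy above can be sketched as a small selection function. This is a plain-Python illustration, not Hadoop's actual block placement code; nodes are modeled as (rack, host) pairs and the helper names are made up:

```python
# Illustrative sketch of the default replica placement policy.
# Nodes are (rack, host) pairs; the writer runs on `client`.
def place_replicas(client, cluster):
    """Pick three targets: 1st on the client's node, 2nd off-rack,
    3rd on the 2nd replica's rack but on a different host."""
    first = client
    # second replica: any node on a different rack (reliability)
    second = next(n for n in cluster if n[0] != first[0])
    # third replica: same rack as the second, different host (bandwidth)
    third = next(n for n in cluster if n[0] == second[0] and n[1] != second[1])
    return [first, second, third]

cluster = [("r1", "h1"), ("r1", "h2"), ("r2", "h3"), ("r2", "h4")]
print(place_replicas(("r1", "h1"), cluster))
```

The two-racks-out-of-three choice is exactly the reliability/bandwidth trade-off mentioned above: losing one rack cannot destroy all replicas, yet two of the three copies share a rack-local link.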
Job Submission
JobClient class
I The runJob() method creates a new instance of a JobClient
I Then it calls submitJob() on this instance
Job Initialization
Task Assignment
Heartbeat-Based mechanism
I TaskTrackers periodically send heartbeats to the JobTracker
I A heartbeat tells the JobTracker that the TaskTracker is alive
I The heartbeat also carries information on the TaskTracker's
availability to execute new tasks
I If the TaskTracker is available, the JobTracker piggybacks a task
on the heartbeat reply
Selecting a task
I JobTracker first needs to select a job (i.e. scheduling)
I TaskTrackers have a fixed number of slots for map and reduce
tasks
I JobTracker gives priority to map tasks (WHY?)
Data locality
I JobTracker is topology aware
F Useful for map tasks
F Unused for reduce tasks
Task Execution
Disk spills
I Written in round-robin to a local dir
I Output data is partitioned according to the reducer it will be
sent to
I Within each partition, data is sorted (in-memory)
I Optionally, if there is a combiner, it is executed just after the sort
phase
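The spill sequence (partition, in-memory sort, optional combine) can be simulated locally. A minimal sketch in plain Python, not the actual Hadoop spill code; `spill` and its arguments are illustrative names:

```python
# Local sketch of a map-side spill: partition by reducer, sort within
# each partition, then optionally run a combiner on each key group.
from collections import defaultdict

def spill(pairs, num_reducers, combiner=None):
    partitions = defaultdict(list)
    for key, value in pairs:
        # hash partitioning decides which reducer gets the pair
        partitions[hash(key) % num_reducers].append((key, value))
    spilled = {}
    for p, kvs in partitions.items():
        kvs.sort(key=lambda kv: kv[0])          # in-memory sort per partition
        if combiner:
            merged = defaultdict(list)
            for k, v in kvs:
                merged[k].append(v)
            kvs = [(k, combiner(vs)) for k, vs in sorted(merged.items())]
        spilled[p] = kvs
    return spilled

out = spill([("b", 1), ("a", 1), ("b", 1)], num_reducers=1, combiner=sum)
print(out)
```

With one reducer and `sum` as the combiner, the two ("b", 1) pairs collapse to ("b", 2) before anything is written to disk, which is precisely the bandwidth saving the combiner provides.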
Input consolidation
I A background thread merges all partial inputs into larger, sorted
files
I Note that if compression was used (for map outputs to save
bandwidth), decompression will take place in memory
Hadoop I/O
What's next
I Overview of what Hadoop offers
I For in-depth knowledge, see [7]
Serialization
MapReduce Types
Types:
I Key (K) types implement WritableComparable
I Value (V) types implement Writable
What is a Writable
Reading Data
TextInputFormat
I Treats each newline-terminated line of a file as a value
KeyValueTextInputFormat
I Maps newline-terminated text lines of the form key SEPARATOR value
SequenceFileInputFormat
I Binary file of key-value pairs with some additional metadata
SequenceFileAsTextInputFormat
I Same as before, but maps (k.toString(), v.toString())
Record Readers
Examples:
I LineRecordReader
F Used by TextInputFormat, reads a line from a text file
I KeyValueRecordReader
F Used by KeyValueTextInputFormat
On the top of the Crumpetty Tree (0, On the top of the Crumpetty Tree)
The Quangle Wangle sat, (33, The Quangle Wangle sat,)
But his face you could not see, (57, But his face you could not see,)
On account of his Beaver Hat. (89, On account of his Beaver Hat.)
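The keys in the example above are byte offsets from the start of the file. A plain-Python sketch (not the actual LineRecordReader) that reproduces them:

```python
# Reproducing the (byte offset, line) pairs a LineRecordReader would
# emit for the verse above; keys count bytes from the start of the file.
text = ("On the top of the Crumpetty Tree\n"
        "The Quangle Wangle sat,\n"
        "But his face you could not see,\n"
        "On account of his Beaver Hat.\n")

def line_records(data):
    offset, records = 0, []
    for line in data.splitlines(keepends=True):
        records.append((offset, line.rstrip("\n")))  # key: offset, value: line
        offset += len(line.encode("utf-8"))          # advance by bytes consumed
    return records

records = line_records(text)
for off, line in records:
    print(off, line)   # offsets 0, 33, 57, 89 match the example
```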
Analogous to InputFormat
Algorithm Design
Preliminaries
Algorithm Design
Learn by examples
I Design patterns
I Synchronization is perhaps the most tricky aspect
Algorithm Design
Aspects that are not under the control of the designer
I Where a mapper or reducer will run
I When a mapper or reducer begins or finishes
I Which input key-value pairs are processed by a specific mapper
I Which intermediate key-value pairs are processed by a specific
reducer
Algorithm Design
Optimizations
I Scalability (linear)
I Resource requirements (storage and bandwidth)
Motivations
In the context of data-intensive distributed processing, the
most important aspect of synchronization is the exchange of
intermediate results
I This involves copying intermediate results from the processes that
produced them to those that consume them
I In general, this involves data transfers over the network
I In Hadoop, also disk I/O is involved, as intermediate results are
written to disk
Input:
I Key-value pairs: (docid, doc) stored on the distributed filesystem
I docid: unique identifier of a document
I doc: the text of the document itself
Mapper:
I Takes an input key-value pair and tokenizes the document
I Emits intermediate key-value pairs: the word is the key and the
integer one is the value
The framework:
I Guarantees all values associated with the same key (the word) are
brought to the same reducer
The reducer:
I Receives all values associated with a given key
I Sums the values and writes output key-value pairs: the key is the
word and the value is the number of occurrences
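The whole WordCount flow above can be simulated in a few lines. This is a local illustration of the logic, not Hadoop code (real jobs use the Java API); `shuffle` stands in for the framework's grouping guarantee:

```python
# Minimal local simulation of the WordCount map/shuffle/reduce flow.
from collections import defaultdict

def mapper(docid, doc):
    for word in doc.lower().split():
        yield word, 1                      # emit (word, 1) for each token

def shuffle(pairs):
    groups = defaultdict(list)             # the framework: group values by key
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reducer(word, counts):
    return word, sum(counts)               # total occurrences per word

docs = {"d1": "the quick fox", "d2": "the lazy dog"}
intermediate = [kv for d, t in docs.items() for kv in mapper(d, t)]
result = dict(reducer(k, vs) for k, vs in shuffle(intermediate).items())
print(result)
```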
Technique 1: Combiners
In-Mapper Combiners
In-Mapper Combiners
In-Mapper Combiners
Precautions
I In-mapper combining breaks the functional programming paradigm
due to state preservation
I Preserving state across multiple instances implies that algorithm
behavior might depend on execution order
F Ordering-dependent bugs are difficult to find
Scalability bottleneck
I The in-mapper combining technique strictly depends on having
sufficient memory to store intermediate results
F And you don't want the OS to deal with swapping
I Multiple threads compete for the same resources
I A possible solution: block and flush
F Implemented with a simple counter
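The block-and-flush idea can be sketched as follows: buffer partial counts in the mapper, and spill whenever a simple counter (here, the number of buffered entries) crosses a threshold. This is an illustrative plain-Python sketch with made-up names, not the Hadoop API:

```python
# Sketch of in-mapper combining with a "block and flush" memory cap.
class InMapperCombiner:
    def __init__(self, max_entries=2):
        self.buffer = {}                   # state preserved across map() calls
        self.max_entries = max_entries     # flush threshold (the counter)
        self.emitted = []

    def map(self, word):
        self.buffer[word] = self.buffer.get(word, 0) + 1
        if len(self.buffer) >= self.max_entries:
            self.flush()                   # bound memory: spill partial counts

    def flush(self):
        self.emitted.extend(sorted(self.buffer.items()))
        self.buffer.clear()

m = InMapperCombiner(max_entries=2)
for w in ["a", "a", "b", "a"]:
    m.map(w)
m.flush()                                  # on close(): emit what remains
print(m.emitted)
```

Note that ("a", 2) and ("a", 1) are both emitted: flushing early trades some aggregation for bounded memory, and the reducer still sums them correctly.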
Problem statement
Motivation
I This problem is a basic building block for more complex operations
I Estimating the distribution of discrete joint events from a large
number of observations
I Similar problem in other domains:
F Customers who buy this tend to also buy that
The mapper:
I Processes each input document
I Emits key-value pairs with:
F Each co-occurring word pair as the key
F The integer one (the count) as the value
I This is done with two nested loops:
F The outer loop iterates over all words
F The inner loop iterates over all neighbors
The reducer:
I Receives pairs relative to co-occurring words
I Computes an absolute count of the joint event
I Emits the pair and the count as the final key-value output
F Basically reducers emit the cells of the matrix
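A local sketch of the pairs approach (plain Python, illustrative names; here the co-occurrence window is the whole token list):

```python
# "Pairs" approach: emit ((w, u), 1) for every co-occurring word pair.
from collections import Counter

def pairs_mapper(tokens):
    for i, w in enumerate(tokens):                 # outer loop: each word
        for j, u in enumerate(tokens):             # inner loop: its neighbors
            if i != j:
                yield (w, u), 1

def pairs_reducer(all_pairs):
    counts = Counter()                             # absolute count per cell
    for pair, one in all_pairs:
        counts[pair] += one
    return counts

counts = pairs_reducer(pairs_mapper(["a", "b", "a"]))
print(counts[("a", "b")], counts[("a", "a")])
```

Each key is one cell of the co-occurrence matrix, so the intermediate key space is large but each value is a single integer.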
Pairs and Stripes
The mapper:
I Same two nested loops structure as before
I Co-occurrence information is first stored in an associative array
I Emits key-value pairs with words as keys and the corresponding
associative arrays as values
The reducer:
I Receives all associative arrays related to the same word
I Performs an element-wise sum of all associative arrays with the
same key
I Emits key-value output in the form of word, associative array
F Basically, reducers emit rows of the co-occurrence matrix
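The stripes variant, sketched the same way (plain Python, illustrative names; `Counter` plays the role of the associative array):

```python
# "Stripes" approach: one associative array (stripe) per word,
# element-wise summed in the reducer.
from collections import Counter, defaultdict

def stripes_mapper(tokens):
    for i, w in enumerate(tokens):
        stripe = Counter(u for j, u in enumerate(tokens) if j != i)
        yield w, stripe                    # key: word, value: neighbor counts

def stripes_reducer(stripes_by_word):
    rows = {}
    for w, stripes in stripes_by_word.items():
        row = Counter()
        for s in stripes:
            row.update(s)                  # element-wise sum of the arrays
        rows[w] = row                      # one row of the matrix
    return rows

grouped = defaultdict(list)                # the framework's grouping step
for w, s in stripes_mapper(["a", "b", "a"]):
    grouped[w].append(s)
rows = stripes_reducer(grouped)
print(dict(rows["a"]))
```

Compared with pairs, there are far fewer intermediate keys, but each value is a whole associative array, which must fit in memory.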
Formally, we compute:

$$ f(w_j \mid w_i) = \frac{N(w_i, w_j)}{\sum_{w'} N(w_i, w')} $$
The mapper:
I additionally emits a special key of the form (wi , ∗)
I The value associated with the special key is one; it represents the
contribution of the word pair to the marginal
I Using combiners, these partial marginal counts will be aggregated
before being sent to the reducers
The reducer:
I We must make sure that the special key-value pairs are processed
before any other key-value pairs where the left word is wi
I We also need to modify the partitioner as before, so that it takes
into account only the first word
Memory requirements:
I Minimal, because only the marginal (an integer) needs to be stored
I No buffering of individual co-occurring words
I No scalability bottleneck
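Order inversion can be simulated locally. The sketch below (plain Python, illustrative names) relies on the conventional choice of "*" for the special key, which sorts before any word, so the marginal is seen before the individual pair counts:

```python
# Sketch of order inversion for relative frequencies: the special key
# (w, "*") carries the marginal and sorts before every (w, u) pair.
from collections import defaultdict

pairs = [(("a", "*"), 2), (("a", "b"), 1), (("a", "c"), 1)]

def relative_frequencies(pair_counts):
    freqs, marginal = {}, defaultdict(int)
    for (w, u), count in sorted(pair_counts):    # "*" sorts first
        if u == "*":
            marginal[w] += count                 # marginal arrives first...
        else:
            freqs[(w, u)] = count / marginal[w]  # ...so division needs only
    return freqs                                 # one integer in memory

freqs = relative_frequencies(pairs)
print(freqs)
```

In real Hadoop the sort order and the first-word partitioner guarantee this arrival order; here `sorted()` stands in for both.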
Introduction
What is PageRank
I It's a measure of the relevance of a Web page, based on the
structure of the hyperlink graph
I Based on the concept of random Web surfer
Formally we have:

$$ P(n) = \alpha \frac{1}{|G|} + (1 - \alpha) \sum_{m \in L(n)} \frac{P(m)}{C(m)} $$

where $\alpha$ is the random-jump probability, $|G|$ the number of
nodes in the graph, $L(n)$ the set of pages linking to $n$, and
$C(m)$ the out-degree of $m$
PageRank in Details
PageRank in MapReduce
PageRank in MapReduce
Implementation details
I Loss of PageRank mass for sink nodes
I Auxiliary state information
I One iteration of the algorithm
F Two MapReduce jobs: one to distribute the PageRank mass, the
other for dangling nodes and random jumps
I Checking for convergence
F Requires a driver program
F When updates of PageRank are stable, the algorithm stops
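A toy simulation of one iteration, mirroring the two phases above: distribute mass along out-links, then fold in dangling-node mass and random jumps. This is a single-process sketch of the logic, not a MapReduce job; the function and variable names are illustrative:

```python
# One PageRank pass: distribute mass, then handle dangling mass and
# random jumps (alpha is the standard random-jump probability).
def pagerank_iteration(ranks, links, alpha=0.15):
    n = len(ranks)
    contrib = {node: 0.0 for node in ranks}
    dangling = 0.0
    for node, rank in ranks.items():             # "job 1": distribute mass
        out = links.get(node, [])
        if not out:
            dangling += rank                     # sink node: mass set aside
        for dst in out:
            contrib[dst] += rank / len(out)
    # "job 2": redistribute dangling mass and apply the random jump
    return {node: alpha / n + (1 - alpha) * (contrib[node] + dangling / n)
            for node in ranks}

ranks = {"a": 1/3, "b": 1/3, "c": 1/3}
links = {"a": ["b", "c"], "b": ["c"]}            # c is a dangling node
ranks = pagerank_iteration(ranks, links)
print(round(sum(ranks.values()), 6))             # total mass is preserved
```

A driver program would repeat this call and stop once successive rank vectors differ by less than a chosen tolerance.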
References I
References II