[Figure: MapReduce data flow across nodes N1, N2 and N3: MAP, then DATA SHUFFLE and SORT, then REDUCE]
MapReduce (4)
● Map: <key1, value1> → List(<key2, value2>)
● Reduce: <key2, List(<value2>)> → List(<value2>)
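The two signatures above can be illustrated with the classic word-count job. This is a minimal single-process sketch in Python, not Hadoop's Java API: `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical names, and the shuffle/sort step that Hadoop performs across nodes is simulated with an in-memory dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_fn(key1, value1):
    """Map: <key1, value1> -> List(<key2, value2>).

    key1 is a line offset (unused here), value1 is a line of text;
    each word is emitted with a count of 1.
    """
    return [(word, 1) for word in value1.split()]


def reduce_fn(key2, values2):
    """Reduce: <key2, List(<value2>)> -> List(<value2>).

    Sums the partial counts collected for one word.
    """
    return [sum(values2)]


def run_mapreduce(records, mapper, reducer):
    """Run mapper over every record, group by key2, then reduce each group in key order."""
    # Map phase: flatten the per-record output lists into one stream of pairs.
    intermediate = chain.from_iterable(mapper(k1, v1) for k1, v1 in records)

    # Shuffle: collect all value2s that share the same key2.
    groups = defaultdict(list)
    for key2, value2 in intermediate:
        groups[key2].append(value2)

    # Sort + reduce: within a reducer, Hadoop presents keys in sorted order.
    return {key2: reducer(key2, groups[key2]) for key2 in sorted(groups)}


lines = [(0, "the quick fox"), (14, "the lazy dog")]
print(run_mapreduce(lines, map_fn, reduce_fn))
# {'dog': [1], 'fox': [1], 'lazy': [1], 'quick': [1], 'the': [2]}
```

In a real cluster the map calls run in parallel on the nodes holding the input, and each reducer receives only the keys assigned to its partition.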
http://wiki.apache.org/hadoop/PoweredBy
Hadoop (3)
[Figure: a single JobTracker coordinating five TaskTrackers]
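● Completely automated: the JobTracker assigns tasks to TaskTrackers and reschedules failed ones
● Tasks are scheduled based on data locality: where possible, a task runs on a node that stores its input block
● Speculative execution: slow-running tasks are launched again on another node, and the first copy to finish is used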
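Data-locality scheduling can be pictured as follows: when a TaskTracker asks for work, the scheduler first looks for a pending task whose input block is stored on that TaskTracker's host, and only falls back to a non-local task if none exists. This is a simplified Python sketch assuming a flat list of pending map tasks; `Task`, `assign_task` and the host names are hypothetical, and Hadoop's real scheduler also considers rack locality, which is omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    block_hosts: set[str]  # hosts that store a replica of this task's input block


def assign_task(tracker_host: str, pending: list[Task]) -> tuple[Task | None, bool]:
    """Return (task, is_data_local) for a TaskTracker running on tracker_host.

    Prefers a task whose input is stored on tracker_host; otherwise takes the
    oldest pending task. The chosen task is removed from `pending`.
    """
    for i, task in enumerate(pending):
        if tracker_host in task.block_hosts:
            return pending.pop(i), True  # data-local: input is read from local disk
    if pending:
        return pending.pop(0), False  # non-local fallback: input is read over the network
    return None, False  # nothing left to schedule


pending = [Task("t1", {"n2", "n3"}), Task("t2", {"n1", "n3"})]
for _ in range(3):
    task, local = assign_task("n1", pending)
    print(task.task_id if task else None, local)
# t2 True
# t1 False
# None False
```

Preferring local tasks means the computation moves to the data instead of copying large input blocks across the network.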
Hadoop (4)
● Code
  ● Is open source
  ● Java
  ● Build scripts
  ● Bash scripts
  ● Configuration files
Hadoop (5)
● Is part of a larger ecosystem
● HDFS – distributed file system
● HBase – distributed, column-oriented database
● Mahout – machine learning algorithm library
● Nutch – web crawler
● And lots of other stuff
Hadoop example
● Ad clicking log
● User information (Age, Location) database
● How could you use that to your advantage?
[Figure: HDFS NameNode and Secondary NameNode]
● key=location:Romania;age:16;sex:M
● ads:copiutze.ro.clickProbability = 0.0018
● ads:copiutze.ro.bestPlacement = calendarPage
● …
● stats:clickProbability=0.0015
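One way to answer the question on the "Hadoop example" slide is to join the ad-click log with the user database and compute, per user segment and ad, the fraction of impressions that were clicked; the results have the shape of the row shown above (a segment row key with `ads:<ad>.clickProbability` columns). This is a hedged, in-memory Python sketch: the log format (one `(user_id, ad, clicked)` record per impression), the user table, and all function names are hypothetical, and the join is done map-side on the assumption that the user table is small enough to load into every mapper.

```python
from collections import defaultdict

# Hypothetical user database: user_id -> (location, age, sex)
USERS = {
    "u1": ("Romania", 16, "M"),
    "u2": ("Romania", 16, "M"),
    "u3": ("France", 30, "F"),
}

# Hypothetical ad log: one record per impression, with a clicked flag
AD_LOG = [
    ("u1", "copiutze.ro", True),
    ("u1", "copiutze.ro", False),
    ("u2", "copiutze.ro", False),
    ("u2", "copiutze.ro", False),
    ("u3", "copiutze.ro", True),
]


def segment_key(location, age, sex):
    """Build a row key modeled on the example: location:...;age:...;sex:..."""
    return f"location:{location};age:{age};sex:{sex}"


def map_impression(record, users):
    """Map: join one log record with its user, emit ((segment, ad), (impressions, clicks))."""
    user_id, ad, clicked = record
    location, age, sex = users[user_id]  # map-side join against the in-memory user table
    return [((segment_key(location, age, sex), ad), (1, int(clicked)))]


def reduce_segment(values):
    """Reduce: total clicks divided by total impressions for one (segment, ad) pair."""
    impressions = sum(i for i, _ in values)
    clicks = sum(c for _, c in values)
    return clicks / impressions


def click_probabilities(log, users):
    groups = defaultdict(list)  # shuffle: group mapper output by (segment, ad)
    for record in log:
        for key, value in map_impression(record, users):
            groups[key].append(value)
    return {key: reduce_segment(values) for key, values in groups.items()}


for (row_key, ad), p in click_probabilities(AD_LOG, USERS).items():
    print(f"{row_key}  ads:{ad}.clickProbability = {p}")
# location:Romania;age:16;sex:M  ads:copiutze.ro.clickProbability = 0.25
# location:France;age:30;sex:F  ads:copiutze.ro.clickProbability = 1.0
```

Each output pair could then be stored as a cell in an HBase row keyed by the segment, following the layout of the example row; the write to HBase itself is not shown.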
HBase (3)
[Figure: HBase Master]