B. RAMAMURTHY
cse4/587 12/25/2017
Reference
Examples
123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:47 -0400] "GET /asctortf/ HTTP/1.0" 200 8130 "http://search.netscape.com/Computers/Data_Formats/Document/Text/RTF" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/5star2000.gif HTTP/1.0" 200 4005 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:50 -0400] "GET /pics/5star.gif HTTP/1.0" 200 1031 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /pics/a2hlogo.jpg HTTP/1.0" 200 4282 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /cgi-bin/newcount?jafsof3&width=4&font=digital&noshow HTTP/1.0" 200 36 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
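The sample lines above appear to follow the Apache/NCSA Combined Log Format: host, identity, user, timestamp, request line, status, bytes, referrer, user agent. A minimal Python sketch, assuming that format, shows how each record can be split into fields before it is stored or processed at scale. The regular expression and the `LogEntry`/`parse_line` names are illustrative assumptions, not part of the slides.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed Combined Log Format:
# host ident user [time] "request" status bytes "referer" "agent"
# (referer and agent are optional, which also covers the Common Log Format)
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


@dataclass
class LogEntry:
    host: str
    time: str
    method: str
    path: str
    status: int
    size: int
    referer: Optional[str]
    agent: Optional[str]


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one access-log line; return None if it does not match."""
    m = COMBINED_RE.match(line.strip())
    if m is None:
        return None
    # The request field looks like 'GET /pics/wpaper.gif HTTP/1.0'
    parts = m.group("request").split()
    method, path = (parts[0], parts[1]) if len(parts) >= 2 else ("", "")
    # A '-' in the size field means no body was sent
    size = 0 if m.group("size") == "-" else int(m.group("size"))
    return LogEntry(
        host=m.group("host"),
        time=m.group("time"),
        method=method,
        path=path,
        status=int(m.group("status")),
        size=size,
        referer=m.group("referer"),
        agent=m.group("agent"),
    )


if __name__ == "__main__":
    sample = ('123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] '
              '"GET /pics/wpaper.gif HTTP/1.0" 200 6248 '
              '"http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"')
    print(parse_line(sample))
```

Lines that do not match return `None` instead of raising, which is usually what you want when scanning millions of log lines where a few may be truncated.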
Traditional Storage Solutions
Off-system/online storage (secondary memory)
File system abstraction / Databases
Offline/tertiary memory
DFS
RAID: Redundant Array of Inexpensive Disks
NAS: Network-Attached Storage
SAN: Storage Area Networks
Solution Space
Google File System
Hadoop
Basic Features: HDFS
Highly fault-tolerant
High throughput
Suitable for applications with large data sets
Streaming access to file system data
Can be built out of commodity hardware
HDFS provides a Java API for applications to use.
Hadoop also provides a Streaming API for writing MapReduce jobs in other languages.
(See MR in Python here)
An HTTP browser can be used to browse the files of an HDFS instance.
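With Hadoop Streaming, any executable can serve as a mapper or reducer: it reads lines on standard input and writes tab-separated key/value lines on standard output, and the framework sorts mapper output by key before it reaches the reducer. A hedged Python word-count sketch of that contract follows; the single-script layout and the `map`/`reduce` command-line switch are hypothetical conveniences, not part of Hadoop.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every word, as a streaming mapper writes to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def _split(record: str) -> Tuple[str, int]:
    # Streaming separates key and value with a tab by default.
    key, _, value = record.rstrip("\n").partition("\t")
    return key, int(value)


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Sum the counts per key. Input must already be sorted by key,
    which the framework's shuffle/sort guarantees."""
    pairs = (_split(r) for r in sorted_records)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(v for _, v in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python wordcount.py map   or   python wordcount.py reduce
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Without a cluster, the same contract can be exercised locally with a pipeline such as `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `sort` stands in for the shuffle phase.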
Architecture
Namenode and Datanodes
Master/slave architecture
An HDFS cluster consists of a single Namenode, a master server that manages the file system namespace and regulates access to files by clients.
There are a number of DataNodes, usually one per node in the cluster.
The DataNodes manage storage attached to the nodes that they run on.
HDFS exposes a file system namespace and allows user data to be stored in files.
A file is split into one or more blocks, and these blocks are stored on DataNodes.
DataNodes serve read and write requests from clients, and perform block creation, deletion, and replication upon instruction from the Namenode.
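To make this division of labor concrete, here is a toy in-memory model (a sketch, not HDFS code): the Namenode keeps only metadata, mapping paths to block ids and block ids to DataNode names, while DataNodes hold the bytes. `DataNode`, `NameNode`, `client_write` and `client_read` are hypothetical names; targets are chosen round-robin rather than rack-aware, and the tiny block size is for demonstration only (real HDFS takes it from the `dfs.blocksize` setting, 128 MB by default in Hadoop 2.x and later).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataNode:
    """Stores the bytes of the blocks assigned to it."""
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)  # block id -> data


@dataclass
class NameNode:
    """Holds metadata only: the namespace and each block's locations."""
    datanodes: List[DataNode]
    block_size: int
    replication: int = 3
    namespace: Dict[str, List[int]] = field(default_factory=dict)  # path -> block ids
    locations: Dict[int, List[str]] = field(default_factory=dict)  # block id -> DataNode names
    next_block_id: int = 0

    def add_block(self, path: str) -> Tuple[int, List[DataNode]]:
        """Allocate a block for `path` and pick target DataNodes
        (round-robin for simplicity; HDFS itself is rack-aware)."""
        block_id = self.next_block_id
        self.next_block_id += 1
        n = len(self.datanodes)
        targets = [self.datanodes[(block_id + i) % n]
                   for i in range(min(self.replication, n))]
        self.namespace.setdefault(path, []).append(block_id)
        self.locations[block_id] = [dn.name for dn in targets]
        return block_id, targets


def client_write(nn: NameNode, path: str, data: bytes) -> None:
    """Split `data` into blocks: metadata op to the NameNode, block op to DataNodes."""
    nn.namespace[path] = []  # (re)create the file entry
    for offset in range(0, len(data), nn.block_size):
        block_id, targets = nn.add_block(path)       # metadata op
        chunk = data[offset:offset + nn.block_size]
        for dn in targets:                           # block op, once per replica
            dn.blocks[block_id] = chunk


def client_read(nn: NameNode, path: str) -> bytes:
    """Ask the NameNode for block locations, then fetch bytes from DataNodes."""
    by_name = {dn.name: dn for dn in nn.datanodes}
    parts = []
    for block_id in nn.namespace[path]:
        replica = by_name[nn.locations[block_id][0]]  # first listed replica
        parts.append(replica.blocks[block_id])
    return b"".join(parts)


if __name__ == "__main__":
    nodes = [DataNode(f"dn{i}") for i in range(4)]
    demo = NameNode(nodes, block_size=4)  # 4-byte blocks, demo only
    client_write(demo, "/home/foo/data", b"hello hdfs!")
    print(demo.namespace, demo.locations)
    print(client_read(demo, "/home/foo/data"))
```

Note that file bytes never pass through the `NameNode` object; it only answers "which blocks, and where", mirroring the separation between metadata ops and block ops.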
HDFS Architecture
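Figure: HDFS architecture. Clients send metadata ops to the Namenode, which holds the metadata (name, replicas, ..., e.g. /home/foo/data, 6, ...). Clients perform block ops such as reads directly on the Datanodes, which store the blocks and replicate them to other Datanodes.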
File system Namespace
Data Replication
Replica Placement
With a replication factor of 3 (the default), replicas are placed: one on a node in the local rack, one on a different node in the local rack, and one on a node in a different rack.
One third of the replicas are on one node, two thirds are on one rack, and the remaining third are distributed evenly across the remaining racks.
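The placement rule on this slide can be sketched as follows, assuming the writer runs on a DataNode and that the cluster topology is given as a hypothetical `racks` mapping from rack name to node names. The sketch follows the policy exactly as described here; the current Apache HDFS documentation describes a variant that puts the second and third replicas on a single remote rack.

```python
import random
from typing import Dict, List, Optional


def place_replicas(writer: str, racks: Dict[str, List[str]],
                   rng: Optional[random.Random] = None) -> List[str]:
    """Choose 3 target nodes per the slide's policy:
    1) the writer's own node (local rack),
    2) a different node on the same rack,
    3) a node on a different rack.
    Assumes the writer is a DataNode, its rack has at least two nodes,
    and at least one other rack exists (otherwise rng.choice raises IndexError)."""
    rng = rng or random.Random()
    local_rack = next(r for r, nodes in racks.items() if writer in nodes)
    # Second replica: another node in the local rack.
    second = rng.choice([n for n in racks[local_rack] if n != writer])
    # Third replica: any node outside the local rack.
    remote = [n for r, nodes in racks.items() if r != local_rack for n in nodes]
    third = rng.choice(remote)
    return [writer, second, third]


if __name__ == "__main__":
    topology = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5"], "rack3": ["dn6"]}
    print(place_replicas("dn1", topology, random.Random(0)))
```

Keeping two of the three replicas on the writer's rack means only one copy crosses the inter-rack link during a write, which is the traffic saving this policy is designed for, while the off-rack copy still survives the loss of an entire rack.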
Replica Selection
Safemode Startup
Filesystem Metadata
Namenode
Datanode
Protocol
The Communication Protocol
Robustness
Possible Failures
DataNode failure and heartbeat
Cluster Rebalancing
Data Organization
Data Blocks
Staging
Replication Pipelining
API (Accessibility)
FS Shell, Admin and Browser Interface
MapReduce Engine
Figure: MapReduce data flow. Large-scale data splits go to Map tasks, which emit <key, 1> (<key, value>) pairs; a parse-hash step routes each pair to one of the Reducers (say, Count), and each reducer writes one output partition: P-0000 (count1), P-0001 (count2), P-0002 (count3).
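The data flow in this figure (map records to (key, 1) pairs, hash each key to a reducer, count per reducer, one output partition per reducer) can be simulated in a few lines of Python. This is a sketch under stated assumptions: `zlib.crc32` is a deterministic stand-in for Hadoop's `HashPartitioner`, which takes the key's `hashCode()` modulo the number of reducers, and the `P-0000`-style names mirror the slide rather than the file names Hadoop actually writes.

```python
import zlib
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(split: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: parse each record of one input split into (key, 1) pairs."""
    return [(word, 1) for line in split for word in line.split()]


def partition(key: str, num_reducers: int) -> int:
    """Hash step: choose the reducer for a key. crc32 stands in for
    Hadoop's hashCode()-based HashPartitioner and is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits: List[List[str]], num_reducers: int = 3) -> Dict[str, Dict[str, int]]:
    # Shuffle: route every (key, 1) pair from every mapper to one reducer bucket.
    buckets: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for split in splits:
        for key, value in map_phase(split):
            buckets[partition(key, num_reducers)].append((key, value))
    # Reduce (Count): each reducer sums values per key into its own partition.
    outputs: Dict[str, Dict[str, int]] = {}
    for r in range(num_reducers):
        counts: Counter = Counter()
        for key, value in buckets[r]:
            counts[key] += value
        outputs[f"P-{r:04d}"] = dict(sorted(counts.items()))
    return outputs


if __name__ == "__main__":
    splits = [["the quick brown fox"], ["the lazy dog"], ["the end"]]
    for name, part in run_job(splits).items():
        print(name, part)
```

Because every occurrence of a key hashes to the same reducer, each key's total appears in exactly one partition, so the partitions can be written in parallel without any further merging.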
MapReduce Engine
Summary