
A Review Paper on Big Data and Networking

Abstract: The term Big Data describes innovative techniques and technologies to capture, store, distribute, manage, and analyze petabyte- or larger-sized datasets that arrive at high velocity and with varied structures. Big data can be structured, unstructured, or semi-structured, which renders conventional data management methods inadequate. Data is generated from many different sources and can arrive in the system at varying rates. To process these large amounts of data in an inexpensive and efficient way, parallelism is used. Big Data is data whose scale, diversity, and complexity require new architectures, techniques, algorithms, and analytics to manage it and to extract value and hidden knowledge from it. Hadoop is the core platform for structuring Big Data and solves the problem of making it useful for analytics purposes. Hadoop is an open-source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance.

1. Introduction

Big Data and its relationship with network systems:

1. Why, What, and How of Big Data: it is all because of advances in networking.

2. Recent developments in networking and their role in Big Data (virtualization, SDN, NFV).

3. Networking needs Big Data.

Pic 1. Big Data and Network Structure (Big Data, large storage, fast computing, cloud, virtualization, networking)

Understanding MapReduce

A software framework that processes massive amounts of unstructured data by distributing it over a large number of inexpensive processors:

1. Map: takes a set of data and divides it for computation.

2. Reduce: takes the output from Map and outputs the results.

Pic 2. Simplified Data Processing on Large Clusters (input → map → shuffle → reduce → output)
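To make the two steps concrete, the following minimal word-count sketch imitates the map, shuffle, and reduce stages of Pic 2 in plain Python; the function names and the in-memory shuffle are illustrative assumptions, not Hadoop's actual API.

from collections import defaultdict

def map_phase(document):
    # Map: divide the input into (key, value) pairs for computation.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key so each reduce call sees one key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: take the mapped output and emit the final result.
    return (key, sum(values))

documents = ["big data needs networking", "networking needs big data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
results = [reduce_phase(k, v) for k, v in shuffle(pairs).items()]
print(sorted(results))  # [('big', 2), ('data', 2), ('needs', 2), ('networking', 2)]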
What Is Hadoop

An open-source implementation of MapReduce. It was named by Doug Cutting at Yahoo after his son's yellow plush toy elephant.

The Hadoop File System (HDFS) requires data to be broken into blocks. Each block is stored on two or more data nodes on different racks.

Pic 3. Hadoop architecture

Name Node: manages the file system namespace and keeps track of blocks on the various Data Nodes.

Job Tracker: assigns MapReduce jobs to Task Tracker nodes that are close to the data (same rack), as sketched below.

Task Tracker: keeps the work as close to the data as possible.
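The same-rack preference can be sketched as follows, assuming a hypothetical rack map and worker list; real Hadoop schedulers weigh many more factors.

# Hypothetical illustration of data-locality scheduling: prefer a worker
# on the same rack as the data block, falling back to any free worker.
RACK_OF = {"worker1": "rackA", "worker2": "rackA", "worker3": "rackB"}

def assign_task(block_rack, free_workers):
    # Prefer workers whose rack matches the rack holding the data block.
    local = [w for w in free_workers if RACK_OF[w] == block_rack]
    return (local or free_workers)[0]

print(assign_task("rackB", ["worker1", "worker3"]))  # worker3 (same rack)
print(assign_task("rackB", ["worker1", "worker2"]))  # worker1 (fallback)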

Pic 4. Block of HDFS
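As a toy sketch of the block mechanism behind Pic 4, the snippet below splits a file into fixed-size blocks and places each block on data nodes in two different racks; the tiny block size and rack names are assumptions for illustration (HDFS defaults to much larger blocks, e.g. 128 MB).

BLOCK_SIZE = 4  # bytes, tiny for illustration; HDFS uses much larger blocks
RACKS = {"rackA": ["dn1", "dn2"], "rackB": ["dn3", "dn4"]}

def split_into_blocks(data, size=BLOCK_SIZE):
    # HDFS breaks a file into fixed-size blocks.
    return [data[i:i + size] for i in range(0, len(data), size)]

def place_replicas(block_id):
    # Store each block on two or more data nodes on different racks.
    return [RACKS["rackA"][block_id % 2], RACKS["rackB"][block_id % 2]]

for i, block in enumerate(split_into_blocks(b"bigdatafile!")):
    print(i, block, "->", place_replicas(i))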
Hadoop can be described as an ecosystem: a framework for processing, storing, and analyzing massive amounts of distributed unstructured data. As a distributed file storage subsystem, the Hadoop Distributed File System (HDFS) was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel. The picture below illustrates an overview of Hadoop's deployment in a big data analytics environment.

Pic 5. Hadoop Deployment Structure

Big Data for Networking

Today vs. tomorrow:

Today's data center              | Tomorrow's data center
Tens of tenants                  | 1K of clients
Hundreds of switches and routers | 10K of switches
Thousands of servers             | 1M of VMs
Tens of administrators           | Hundreds of administrators

The need to monitor traffic patterns and rearrange the virtual networks connecting millions of VMs in real time means that managing clouds is itself a real-time big data problem (see the sketch below).

Internet of Things → Big Data generation and analytics.
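To hint at why this is a streaming problem, here is a small sketch that aggregates per-VM traffic counters in a single pass; the sample records and the threshold are invented for the example.

from collections import Counter

# Hypothetical stream of (vm_id, bytes_sent) samples from the virtual network.
samples = [("vm1", 120), ("vm2", 30), ("vm1", 500), ("vm3", 80), ("vm1", 900)]

totals = Counter()
for vm, nbytes in samples:
    totals[vm] += nbytes
    if totals[vm] > 1000:  # assumed threshold for rearranging the virtual network
        print(f"hot VM detected: {vm} ({totals[vm]} bytes)")

print(totals.most_common(2))  # heaviest senders so far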

2. Networking in Big Data

Networking Requirements for Big Data

1. Code/Data Collocation: the data for map jobs should be at the processors that are going to map it.

2. Elastic bandwidth: to match the variability of data volume.

3. Fault/Error Handling: if a processor fails, its task needs to be assigned to another processor (CPU); see the sketch after this list.

4. Security: access control (authorized users only), privacy (encryption), and threat detection, all in real time and in a highly scalable manner.

5. Synchronization: the map jobs should be comparable so that they finish together; similarly, the reduce jobs should be comparable.
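A minimal sketch of requirement 3, assuming a hypothetical task table and heartbeat flags; a real scheduler would balance load rather than pick the first spare.

# Illustrative fault handling: if a worker stops heartbeating, its task
# is reassigned to a live worker so the job can still finish.
tasks = {"map-0": "cpu1", "map-1": "cpu2", "map-2": "cpu3"}
alive = {"cpu1": True, "cpu2": False, "cpu3": True}  # cpu2 has failed

def reassign_failed(tasks, alive):
    spares = [w for w, ok in alive.items() if ok]
    for task, worker in tasks.items():
        if not alive[worker]:
            tasks[task] = spares[0]  # naive choice for illustration only
            print(f"{task}: {worker} failed, reassigned to {spares[0]}")
    return tasks

print(reassign_failed(tasks, alive))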

Recent Developments in Networking

1. High speed: 100 Gbps Ethernet → 400 Gbps → 1000 Gbps.

2. Cheap storage access: it is easy to move big data.

3. Virtualization.

4. Software Defined Networking.

5. Network Function Virtualization.

Virtualization

Recent networking technologies and standards allow:

1. Virtualizing Computation

2. Virtualizing Storage

3. Virtualizing Rack Storage Connectivity

4. Virtualizing Data Center Storage

5. Virtualizing Metro and Global Storage

Virtualizing Computation

Initially, data centers consisted of multiple IP subnets:

Each subnet = one Ethernet network.

Ethernet addresses are globally unique and do not change.

IP addresses are locators and change every time you move.

If a VM moves inside a subnet → no change to IP address → fast.

If a VM moves from one subnet to another → its IP address changes → all connections break → slow → limited VM mobility (illustrated in the sketch below).
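The subnet rules above can be illustrated with Python's standard ipaddress module; the addresses and subnets below are made up for the example.

import ipaddress

# Invented example subnets; an IP address is a locator tied to one subnet.
old_subnet = ipaddress.ip_network("10.0.1.0/24")
new_subnet = ipaddress.ip_network("10.0.2.0/24")
vm_ip = ipaddress.ip_address("10.0.1.7")

def move_vm(ip, destination_subnet):
    # Inside the same subnet the VM keeps its IP, so connections survive.
    if ip in destination_subnet:
        return ip, "fast: no address change"
    # Crossing subnets forces a new locator, breaking open connections.
    return next(destination_subnet.hosts()), "slow: new IP, connections break"

print(move_vm(vm_ip, old_subnet))  # same subnet: IP unchanged
print(move_vm(vm_ip, new_subnet))  # different subnet: new locator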

IEEE 802.1ad-2005 Ethernet Provider Bridging (PB) and IEEE 802.1ah-2008 Provider Backbone Bridging (PBB) allow Ethernets to span long distances → global VM mobility.

Pic 6. Layer Networking Architecture
