
BIGDATA HADOOP

PRESENTED BY
SHARTHAK ACHARJEE
Introduction
 Today we live in the Data Age.
 Due to the Internet of Things (IoT), the rate of data
ingestion keeps increasing.
 So the world is getting “hungrier and hungrier
for data”.
Terminologies
 Cloud Computing
 BigData
 Hadoop
 Distributed Computing
 Parallel Computing
 Utility Computing
 Data Scientist
Let's start the journey!
What is BigData?
 BigData is any amount of structured and/or
unstructured data that is beyond the storage and
processing capabilities of a single physical machine
and of traditional database techniques.
 In short, such data is so large and complex that
none of the traditional data management tools are
able to store or process it efficiently.
 Examples of sources are Facebook, the NYSE, the Boeing 787, etc.
Why does big data deserve our attention?

 Every day we create 2.5 quintillion bytes of data;
90% of this data is unstructured.
 90% of the data in the world today has been
created in the last two years alone.
 Cisco estimates that by the end of 2016 global
Internet traffic will reach 6.8 zettabytes a year.
 BigData would create 7.8 million jobs by 2019.
Characteristics
 VOLUME
 VELOCITY
 VARIETY
 VERACITY
 VALUE
Sectors using BigData
What is Hadoop?
Why Hadoop?
 Need to process multi-petabyte datasets.
 Data may not have a strict schema.
 It is expensive to build reliability into each application.
 Nodes fail every day.
 Failure is expected, rather than exceptional.
 The number of nodes in a cluster is not constant.
 Need a common infrastructure.
 Efficient, reliable, and open source under the Apache License.
Who uses Hadoop?
Features of Hadoop
 Scalable - New nodes can be added without changing
data formats.
 Cost-effective - It processes huge datasets in parallel on
large clusters of commodity computers.
 Efficient and flexible - It is schema-less and can absorb
any type of data, from any number of sources.
 Fault-tolerant and reliable - It handles node failures
easily because of replication.
 Easy to use - It uses simple Map and Reduce functions to
process the data.
 It is developed in Java, but it can also support Python and
other languages.
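The "simple Map and Reduce functions" mentioned above can be illustrated with a small word count. This is only a sketch in plain Python that runs on one machine and does not use Hadoop itself. The function names (map_words, shuffle, reduce_counts, word_count) are made up for this slide and are not part of any Hadoop API. On a real cluster the framework would do the grouping step and spread the work across nodes.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key. Assumption: this stands in for the grouping
    that the framework performs between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: sum all the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(word, counts)
                for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data needs Hadoop", "Hadoop processes big data"]
    print(word_count(sample))
```

Each map call looks at one line only and each reduce call looks at one word only, which is why this style of program can be split across many commodity machines.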
Hadoop Architecture
Thank you!

Any queries?
