
Cloud Computing

Cloud Introduction

1
Cloud Computing

What does Cloud Computing do?

Provides online data storage

Enables configuration and accessing of online applications

Provides access to a variety of software

Provides computing platform and computing infrastructure

2
Cloud Computing

Application Example

Using Gmail on my smartphone to check e-mails

Receive an e-mail with an MS PowerPoint attachment file
However, MS PowerPoint and Windows OS are not installed
on my smartphone!
Google Drive services (Google Docs, Sheets, and Slides)
can be used to open the file

3
Cloud Computing

What is a Cloud?

A cloud can provide services over a public or private
network or the Internet, where the service hosting system is
at a remote location

A cloud can support various applications


E-mail, Web Conferencing, Games, Database
Management, CRM (Customer Relationship Management),
etc.

4
Cloud Computing

Cloud Models

5
Cloud Computing

Cloud Models

Public Cloud
Enables public access to systems and services
Open architecture (e.g., e-mail)
May be less secure due to its openness

Private Cloud
Enables service access within an organization
Generally more secure due to its private nature

6
Cloud Computing

Cloud Models

Community Cloud
Cloud accessible by a group of organizations

Hybrid Cloud
Hybrid Cloud = Public Cloud + Private Cloud
Private cloud supports critical activities
Public cloud supports non-critical activities

7
Cloud Computing

Cloud Service Models


Each lower service model supports the
management, computing power, and security
of the service model above it

SaaS: Software as a Service


PaaS: Platform as a Service
IaaS: Infrastructure as a Service
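The layering above can be pictured as an ordered stack in which each model relies on the models beneath it. This is a purely illustrative sketch: `STACK` and `supporting_layers` are hypothetical names, not part of any cloud provider's API.

```python
# Service models ordered from the bottom of the stack (IaaS) to the top (SaaS).
STACK = ["IaaS", "PaaS", "SaaS"]


def supporting_layers(model: str) -> list[str]:
    """Return the service models below `model`, which support it."""
    if model not in STACK:
        raise ValueError(f"unknown service model: {model}")
    return STACK[: STACK.index(model)]


for m in STACK:
    print(f"{m} is supported by: {supporting_layers(m) or 'no lower layer'}")
```

For example, `supporting_layers("SaaS")` returns `["IaaS", "PaaS"]`, while IaaS has no lower layer in this model.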

8
Cloud Computing

Software as a Service (SaaS)


Provides a variety of software applications as a service to
end users

Platform as a Service (PaaS)


Provides a program executable platform for applications,
development tools, etc.

Infrastructure as a Service (IaaS)


Provides the fundamental computing and security
resources for the entire cloud
Backup storage, computing power, VM (Virtual Machines),
etc.
9
Cloud Computing

Cloud Service Models


There are many other service models

XaaS = Anything as a Service

NaaS: Network as a Service
DaaS: Database as a Service
BaaS: Business as a Service
etc.

10
Cloud Computing

Cloud Benefits

11
Cloud Computing
Characteristics

12
Cloud Computing

REFERENCES

13
References
K. Kumar and Y. H. Lu, "Cloud Computing for Mobile Users: Can Offloading
Computation Save Energy?," Computer, vol. 43, no. 4, pp. 51-56, Apr. 2010.
Wikipedia, http://www.wikipedia.org
Apple, iCloud, https://www.icloud.com
Google, Google Cloud, https://cloud.google.com/products [Accessed June 1, 2015]
Virtualization, Cisco's IaaS cloud,
http://www.virtualization.co.kr/data/file/01_2/1889266503_6f489654_1.jpg
[Accessed June 1, 2015]
Tutorialspoint, Cloud computing,
http://www.tutorialspoint.com/cloud_computing/cloud_computing_tutorial.pdf
[Accessed June 1, 2015]

14
References
Image sources
AWS Simple Icons Storage Amazon S3 Bucket with Objects, By Amazon Web
Services LLC [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via
Wikimedia Commons
iCloud Logo, By EEIM (Own work) [Public domain], via Wikimedia Commons
MobileMe Logo, By Apple Inc. [Public domain], via Wikimedia Commons

15
Cloud Computing

Cloud Service Models

16
Cloud Computing

Cloud Service Models


Each lower service model supports the
management, computing power, and security
of the service model above it

SaaS: Software as a Service


PaaS: Platform as a Service
IaaS: Infrastructure as a Service

17
IaaS

IaaS (Infrastructure as a Service)

Infrastructure support over the Internet


Cloud's Computing & Storage Resources
Computing Power
Storage Services
Software Packages & Bundles
VLAN (Virtual Local Area Network)
VM (Virtual Machine) Features

18
IaaS

VM (Virtual Machine) Administration


IaaS enables control of computing resources through
administrative access to VMs
Server Virtualization features
Access to computing resources is enabled by
administrative access to VMs
VM administrative command examples
Save data on cloud server
Start web server
Install new application
19
IaaS

IaaS Procedures

20
IaaS

IaaS Benefits
Flexible and Efficient Renting of Computer & Server
Hardware
Rentable Resources
VM, Storage, Bandwidth,
IP Addresses, Monitoring Services, Firewalls,
etc.
Rent Payment Basis
Resource type
Usage time
Service packages
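Renting by resource type, usage time, and service package can be illustrated with a small billing calculation. This is a sketch under stated assumptions: the resource names, hourly rates, and discount below are hypothetical values chosen for illustration, not any provider's actual pricing.

```python
from dataclasses import dataclass

# Hypothetical hourly rates per resource type (illustrative, not real pricing).
HOURLY_RATE = {"vm": 0.10, "storage_gb": 0.0001, "ip_address": 0.005}


@dataclass
class Usage:
    resource: str   # resource type: a key of HOURLY_RATE
    quantity: int   # e.g., number of VMs or GB of storage
    hours: float    # usage time


def rental_bill(usages: list[Usage], package_discount: float = 0.0) -> float:
    """Charge by resource type and usage time, then apply a service-package discount."""
    subtotal = sum(HOURLY_RATE[u.resource] * u.quantity * u.hours for u in usages)
    return round(subtotal * (1.0 - package_discount), 2)


# Two VMs and 500 GB of storage for a 30-day month (720 hours), 10% package discount.
bill = rental_bill([Usage("vm", 2, 720), Usage("storage_gb", 500, 720)], package_discount=0.10)
print(bill)
```

With these made-up rates the charge is 144 (VMs) + 36 (storage) = 180, and the 10% package discount brings it to 162.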
21
IaaS

IaaS Benefits
Portability & Interoperability with
Legacy Applications
Enables portability, since infrastructure resources
are accessed through Internet connections
Provides a way to maintain interoperability with
legacy applications and workloads
across IaaS clouds

22
PaaS

PaaS
(Platform as a Service)
Provides development &
deployment tools for
application development

Provides runtime
environment for apps.

23
Cloud Services

PaaS Types

Application Delivery-Only Environment
Stand-Alone Development Environment
Open Platform as a Service
Add-On Development Facilities

24
PaaS

PaaS Types
Application Delivery-Only Environment
Provides on-demand scaling & application security
Stand-Alone Development Environment
Provides an independent platform for a specific function
Open Platform as a Service
Provides open source software to run applications for
PaaS providers
Add-On Development Facilities
Enables customization of existing SaaS platforms
25
PaaS

PaaS Benefits

26
PaaS

Benefits

Lower Administrative Overhead


User does not need to be involved in any
administration of the platform

Lower Total Cost of Ownership


User does not need to purchase any hardware,
memory, or servers

27
PaaS

Benefits

Scalable Solutions
Automatic scaling of resources based on the
application's resource demand

More Current System Software

The cloud provider is responsible for software
upgrades & patch installations
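Demand-based automatic scaling is often implemented as a target-tracking rule: pick the number of instances so that average utilization stays near a target. The sketch below assumes that rule; `desired_instances`, the 60% target, and the min/max bounds are hypothetical, and real PaaS platforms apply their own scaling policies.

```python
def desired_instances(current: int, utilization_pct: int, target_pct: int = 60,
                      min_n: int = 1, max_n: int = 20) -> int:
    """Return how many instances keep average utilization near target_pct.

    current: number of running instances
    utilization_pct: current average utilization per instance, in percent
    """
    if current < 1 or target_pct < 1:
        raise ValueError("need at least one instance and a positive target")
    # Total load (instances x utilization) spread over instances at the target level,
    # rounded up with integer ceiling division so capacity is never short.
    needed = -(-current * utilization_pct // target_pct)
    # Clamp to the allowed range so the platform never scales to zero or without bound.
    return max(min_n, min(max_n, needed))


print(desired_instances(4, 90))   # heavy load: scale out
print(desired_instances(4, 30))   # light load: scale in
```

Four instances at 90% utilization scale out to 6, and four instances at 30% scale in to 2. Integer percentages are used so the ceiling division is exact.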

28
SaaS

SaaS (Software as a Service)


Provides software applications as a service to the
user

Software is deployed on a cloud server and
is accessible through the Internet

29
SaaS

Characteristics
On Demand Availability
Cloud software is available anywhere the
cloud is reachable via the Internet
Easy Maintenance
Users do not need to upgrade or maintain the software
All maintenance is handled by the cloud provider
Flexible Scale Up or Scale Down
Centralized Management & Data

30
SaaS
Characteristics
Enables a Shared Data Model
Multiple users can share a single
data model and database
Cost Effectiveness
Pay based on usage
Lower risk of buying the wrong software
Multitenant Programming Solutions
All programmers use the same
software version
No version mismatch problems
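A shared data model is commonly implemented by keeping every tenant's rows in the same tables and tagging each row with a tenant identifier. The sketch below uses Python's built-in `sqlite3` as a stand-in for the provider's database; the `contacts` table, the `tenant_id` column, and the helper functions are hypothetical.

```python
import sqlite3

# One database and one schema shared by all tenants (customers) of the SaaS app.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (tenant_id TEXT NOT NULL, name TEXT NOT NULL)")


def add_contact(tenant_id: str, name: str) -> None:
    conn.execute("INSERT INTO contacts (tenant_id, name) VALUES (?, ?)", (tenant_id, name))


def list_contacts(tenant_id: str) -> list[str]:
    # Every query is filtered by tenant_id so each tenant sees only its own rows.
    rows = conn.execute(
        "SELECT name FROM contacts WHERE tenant_id = ? ORDER BY name", (tenant_id,)
    )
    return [name for (name,) in rows]


add_contact("acme", "Alice")
add_contact("acme", "Bob")
add_contact("globex", "Carol")
print(list_contacts("acme"))
```

Both tenants share one table and one schema, so a single upgrade of the application and database serves everyone; isolation depends on every query including the tenant filter.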
31
Software-as-a-service

Open SaaS
Applications

32
Cloud Computing

Cloud Services

36
Cloud Services

Google Cloud
Google App Engine
Released as a preview in April 2008
PaaS (Platform as a Service) for web applications
Provides automatic scaling based on resource
demands and server load

Google Cloud Storage


Launched in May 2010
Online file storage service

37
Cloud Services

Google Cloud
Google BigQuery
Released in April 2012
Data analysis tool that uses SQL-like queries to
process big datasets in seconds

Google Compute Engine


Released in June 2012
IaaS (Infrastructure as a Service) support
to enable on-demand launching of VMs (Virtual
Machines)

38
Cloud Services

Google Cloud
Google Cloud Endpoints
Released in November 2013
Tool to create services inside App Engine
Easily connects from Android, iOS, and JavaScript
clients

Google Cloud DNS (Domain Name System)


DNS service supported by the Google Cloud

39
Cloud Services

Google Cloud

Google Cloud Datastore


NoSQL (non-relational) data storage

Google Cloud SQL (Structured Query Language)


Released in February 2014
as GA (General Availability)
Fully managed MySQL database

40
Cloud Services

Amazon S3 (Simple Storage Service)


Online file storage web service offered by Amazon Web
Services
Public web service released in the United States in March
2006 and in Europe in November 2007
Provides storage through
web services interfaces
(REST, SOAP, and BitTorrent)

41
Cloud Services

Amazon Cloud Drive


Amazon Cloud Drive was released in
March 2011
Web storage application from Amazon
Storage Space Characteristics
Can be accessed from up to eight specific devices (e.g.,
mobile devices & different computers) and by using
different browsers on the same computer

42
Cloud Services

Amazon Cloud Drive

Cloud Player (Originally bundled)

Users can play music in their Cloud Drive from any
computer or Android device

Music browsing based on song titles, albums, artists,
genres (website only), and playlists

43
Cloud Services

Amazon Cloud Drive Options

Unlimited Photos
Unlimited storage for photos & raw data files
5 gigabytes of video storage

Unlimited Everything
Unlimited storage for photos, videos, documents, and
various file types

44
Cloud Services

iCloud

Developed by Apple, Inc.


Public release in October 2011
Cloud Storage & Cloud Computing

Operating system
OS X (10.7 Lion or later)
Microsoft Windows 7 or later
iOS 5 or later

45
Cloud Services

iCloud replaces MobileMe


MobileMe was a subscription-based collection of Apple's online
services and software
MobileMe was replaced by iCloud
MobileMe ceased services in June 2012
MobileMe users were allowed to transfer to iCloud
until July 2012

46
Cloud Services

iCloud Features
Email, Contacts, and Calendars
Find My Friends
Backup & Restore
Back up feature for device settings & data
iOS 5 or later required

Find My iPhone
Enables a user to track the location of an iOS device or
Mac
Formerly a feature of MobileMe

47
Cloud Services

iCloud Features

Can manage lost or stolen Apple devices


Back to My Mac
Enables remote login to other computers that have
Back to My Mac installed (using the same Apple ID)

iWork for iCloud


Apple's iWork suite (Pages, Numbers, and Keynote)
made available through a web interface

48
Cloud Services

iCloud Features
Photo Stream
Can store the most recent 1,000 photos
Free storage for up to 30 days

iCloud Photo Library


Stores all photos at original resolution
Stores photo metadata

Storage (Introduced in 2011)


5 GB of free storage per account

49
Cloud Services

iCloud Features

iCloud Drive
Can save photos, videos, documents, and apps

iCloud Keychain
Secure database for website and Wi-Fi
passwords
Secure Credit card & Debit card management for
quick access and auto-fill

50
Cloud Services

iCloud Features

iTunes Match
Scans the user's iTunes music library and matches
its tracks
Serves tracks copied from CDs or other sources

51
Big Data

Big Data Examples

55
Big Data

New FLU Virus Starts in the U.S.!


H1N1 flu virus (which combined virus elements of the
bird and swine (pig) flu) started to spread in the U.S. in
2009
The U.S. CDC (Centers for Disease Control and Prevention) was
only collecting diagnostic reports from doctors once a
week
Using the CDC information to track how the flu was
spreading would result in an approximately
2-week lag, which is far too slow compared to the speed at which
the virus was spreading

56
Big Data

New FLU Virus Starts in the U.S.!

What vaccine was needed?


How much vaccine was needed?
Where was the vaccine needed?

Vaccine preparation and delivery plans could
not be set up fast enough to safely prevent the
virus from spreading out of control

57
Big Data

New FLU Virus Starts in the U.S.!


Fortunately, Google had published a paper about
how it could predict the spread of the winter
flu in the U.S. accurately down to specific
regions and states

This paper was published in the journal Nature
a few weeks before the H1N1 virus made the
headline news

58
Big Data

New FLU Virus Starts in the U.S.!


Millions of the most common search terms and
millions of different mathematical models were tested
on Google's database
Google receives more than 3 billion search queries
a day

The analysis system was set to look for correlations
between the frequency of certain search queries and
the spread of the flu over time and space

59
Big Data

New FLU Virus Starts in the U.S.!

Google's method of analysis did not use data
provided by hospitals or doctors
Google applied Big Data analysis to the most common
search terms people use
Google's system proved to be a faster and more useful
indicator than government statistics

60
Big Data

Wal-Mart

Wal-Mart's Data Warehouse

Stores 4 petabytes (4 × 10^15 bytes) of data
Records every single purchase
Approximately 267 million
transactions a day are recorded from 6,000
stores worldwide

61
Big Data

Wal-Mart

Wal-Mart's Data Analysis

Focused on evaluating the effectiveness of
pricing strategies and advertising campaigns
Seeking ways to improve
inventory management and supply chains

62
Big Data

Recommendation System using Big Data


Based on data analysis of simple elements
What purchases users made in the past
Which items they have in their virtual
shopping cart
Which items customers rated and liked
What influence the ratings had on other
customers' purchase decisions

63
Big Data

Amazon.com
Amazon.com's Recommendation System
Item-to-Item Collaborative Filtering Algorithm
Personalization of the Online Store
Customized to each customer
Each customer's store is based on the customer's
personal interests
Example: For a new mother, the store will display
baby supplies and toys
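Item-to-item collaborative filtering (Linden, Smith, and York, 2003) compares items rather than customers: two items are similar when the same customers tend to buy both. The sketch below computes cosine similarity between items' buyer sets and recommends the unowned items most similar to something the customer already bought. The purchase data and function names are hypothetical, and Amazon's production system precomputes a similar-items table offline at far larger scale.

```python
from math import sqrt

# Hypothetical purchase history: customer -> set of items bought.
purchases = {
    "ann": {"diapers", "baby wipes", "rattle"},
    "ben": {"diapers", "baby wipes"},
    "cara": {"diapers", "rattle"},
    "dan": {"novel", "bookmark"},
}


def buyers_by_item(data: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the history: item -> set of customers who bought it."""
    buyers: dict[str, set[str]] = {}
    for customer, items in data.items():
        for item in items:
            buyers.setdefault(item, set()).add(customer)
    return buyers


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two items' binary purchase vectors (sets of buyers)."""
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(customer: str, data: dict[str, set[str]], top_n: int = 2) -> list[str]:
    buyers = buyers_by_item(data)
    owned = data[customer]
    scores = {}
    for item, item_buyers in buyers.items():
        if item in owned:
            continue
        # Score each candidate by its similarity to the most similar owned item.
        scores[item] = max((cosine(item_buyers, buyers[o]) for o in owned), default=0.0)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


print(recommend("ben", purchases))
```

Here "ben" bought diapers and baby wipes, so the rattle (bought by other diaper buyers) is recommended, while the unrelated novel and bookmark score zero and are filtered out.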

64
Big Data

Citibank
Bank operations in 100 countries
Big Data analysis on the database of basic financial
transactions can provide global insight into
investments, market changes, trade patterns, and
economic conditions
Many companies (e.g., Zara, H&M, etc.) work with
Citibank to locate new stores and factories

65
Big Data

Product Development & Sales


For example, a Smartphone takes significant time
and money to manufacture
In addition, the duration of popularity for a new
Smartphone is limited
To maximize sales, a company needs to manufacture
just the right amount of products and sell them in the
right locations

66
Big Data

Product Development & Sales


Too many will result in leftovers and a
big waste for the company!
Too few will result in a lost opportunity for company profit
and growth!
Big Data analysis can help find how many smartphones are needed
and where the products could be popular, based on
common search terms that people use
This can also be used to estimate how many products could be sold in a certain
location
But why is this difficult?

67
Big Data

REFERENCES

68
References
V. Mayer-Schönberger and K. Cukier, Big data: A revolution that will transform how
we live, work, and think. Houghton Mifflin Harcourt, 2013.
T. White, Hadoop: The Definitive Guide. O'Reilly Media, 2012.
J. Venner, Pro Hadoop. Apress, 2009.
S. LaValle, E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz, "Big Data,
Analytics and the Path From Insights to Value," MIT Sloan Management Review,
vol. 52, no. 2, Winter 2011.
R. E. Bryant, R. H. Katz, and E. D. Lazowska, "Big-data Computing: Creating
revolutionary breakthroughs in commerce, science and society," Computing
Community Consortium, pp. 1-15, Dec. 2008.
G. Linden, B. Smith, and J. York, "Amazon.com Recommendations: Item-to-Item
Collaborative Filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan./Feb.
2003.

69
References
J. R. Galbraith, "Organizational Design Challenges Resulting From Big Data,"
Journal of Organization Design, vol. 3, no. 1, pp. 2-13, Apr. 2014.
S. Sagiroglu and D. Sinanc, "Big data: A review," Proc. IEEE International
Conference on Collaboration Technologies and Systems, pp. 42-47, May 2013.
M. Chen, S. Mao, and Y. Liu, "Big Data: A Survey," Mobile Networks and
Applications, vol. 19, no. 2, pp. 171-209, Jan. 2014.
X. Wu, X. Zhu, G. Q. Wu, and W. Ding, "Data Mining with Big Data," IEEE
Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 97-107, Jan.
2014.
Z. Zheng, J. Zhu, and M. R. Lyu, "Service-Generated Big Data and
Big Data-as-a-Service: An Overview," Proc. IEEE International Congress on
Big Data, pp. 403-410, Jun./Jul. 2013.

70
References
I. Palit and C. K. Reddy, "Scalable and Parallel Boosting with MapReduce," IEEE
Transactions on Knowledge and Data Engineering, vol. 24, no. 10, pp. 1904-1916,
2012.
M.-Y. Choi, E.-A. Cho, D.-H. Park, C.-J. Moon, and D.-K. Baik, "A Database
Synchronization Algorithm for Mobile Devices," IEEE Transactions on Consumer
Electronics, vol. 56, no. 2, pp. 392-398, May 2010.
IBM, "What is big data?," http://www.ibm.com/software/data/bigdata/what-is-big-data.html
[Accessed June 1, 2015]
Hadoop Apache, http://hadoop.apache.org
Wikipedia, http://www.wikipedia.org
Image sources
Walmart Logo, By Walmart [Public domain], via Wikimedia Commons
Amazon Logo, By Balajimuthazhagan (Own work) [CC BY-SA 3.0
(http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

71
Big Data

Big Data's 4 Vs

72
Big Data

Big Data's 4 V Challenges

Volume: Data Size

Variety: Data Formats

Velocity: Data Streaming Speeds

Veracity: Data Trustworthiness

73
Big Data

Volume: Data Size

40 zettabytes (1 ZB = 10^21 bytes) of data are predicted to be created
by 2020
2.5 quintillion (10^18) bytes of data are created every
day
6 billion (10^9) people have mobile phones
100 terabytes (10^12 bytes) of data (at least) are stored by
most U.S. companies
966 petabytes (10^15 bytes) was the approximate storage size
of the American manufacturing industry in 2009

74
Big Data

Variety: Data Formats

150 exabytes (10^18 bytes) was the estimated size of data for
health care throughout the world in 2011
More than 4 billion (10^9) hours each month are spent
watching YouTube
30 billion pieces of content are exchanged every month on
Facebook
200 million monthly active users exchange 400 million
tweets every day

75
Big Data

Velocity: Data Streaming Speeds

1 terabyte (10^12 bytes) of trade information is exchanged
during every trading session at the New York Stock
Exchange

Approximately 100 sensors are installed in modern
cars to monitor fuel level, tire pressure, etc.

18.9 billion network connections are predicted to
exist by 2016

76
Big Data

Veracity: Data Trustworthiness

1 out of 3 business leaders have experienced trust
issues with their data when trying to make a
business decision

$3.1 trillion (10^12) a year is estimated to be wasted
in the U.S. economy due to poor data quality

77
Big Data

New technology is needed to overcome these
4 V Big Data Challenges
Volume: Data Size

Variety: Data Formats

Velocity: Data Streaming Speeds

Veracity: Data Trustworthiness

78
Big Data

HADOOP

83
Hadoop

Data Storage, Access, and Analysis

Hard drive storage capacity has increased
tremendously
But data read and write speeds to and from
hard drives have not improved nearly as much
Simultaneous parallel reading and writing of data across
multiple hard disks requires advanced technology

84
Hadoop

Data Storage, Access, and Analysis


Challenge 1: Hardware Failure
When using many computers for data storage and
analysis, the probability that at least one computer will fail is
very high

Challenge 2: Cost
To avoid losing data or computed analysis results,
backup computers and storage are needed,
which improves reliability but is very expensive
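Why a failure somewhere becomes very likely at scale follows from simple probability. Assuming each of n machines fails independently with probability p over the same period, the chance that at least one fails is 1 - (1 - p)^n. The 0.1% per-machine figure used below is an assumed value for illustration, not a measured failure rate.

```python
def prob_at_least_one_failure(n: int, p: float) -> float:
    """Probability that at least one of n machines fails, assuming each fails
    independently with probability p over the same period."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return 1.0 - (1.0 - p) ** n


# Assumed per-machine failure probability of 0.1% for the period considered.
for n in (1, 100, 1000, 5000):
    print(n, round(prob_at_least_one_failure(n, 0.001), 3))
```

With p = 0.001, the probability is about 0.095 for 100 machines, 0.632 for 1,000, and 0.993 for 5,000, which is why systems built on many computers must treat failure as routine.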

85
Hadoop

Data Storage, Access, and Analysis


Challenge 3: Combining Analyzed Data

Combining the analyzed data is very difficult

If one part of the analyzed data is not ready, then the
overall combining process has to be delayed

If one part has errors in its analysis, then the overall
combined result may be unreliable and useless

86
Hadoop

Hadoop

Hadoop is a Reliable Shared Storage and Analysis System

Hadoop = HDFS + MapReduce +


HDFS provides Data Storage
HDFS: Hadoop Distributed FileSystem

MapReduce provides Data Analysis


MapReduce = Map Function + Reduce Function

87
Hadoop

HDFS: Hadoop Distributed FileSystem

DFS (Distributed FileSystem) is designed for storage
management across a network of computers

HDFS is optimized to store huge files with streaming
data access patterns

HDFS is designed to run on clusters of commodity
(general-purpose) computers

88
Hadoop

HDFS: Hadoop Distributed FileSystem


HDFS was designed to perform best
for a WORM (Write Once, Read Many times) pattern,
which is a very efficient data processing pattern

HDFS was designed to treat the time to read the
whole dataset as more important than the latency
of reading the first record

89
Hadoop

HDFS

HDFS clusters use 2 types of nodes

Namenode (master node)

Datanode (worker node)

90
Hadoop

HDFS: Namenode

Manages the filesystem namespace

Maintains the filesystem tree and the metadata for all the
files and directories in the tree

Persists this information on the local disk in 2 files
Namespace Image
Edit Log

91
Hadoop

HDFS: Datanodes

Workhorse of the filesystem

Store and retrieve blocks when requested by the
client or the namenode

Report back to the namenode periodically with lists
of the blocks they are storing
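The division of labor between the namenode (metadata only) and the datanodes (block storage and block reports) can be sketched as a toy in-memory model. This is not HDFS code: the class names, the 4-byte block size, the replication factor of 2, and round-robin placement are simplifying assumptions (real HDFS uses much larger blocks, a default replication of 3, and a rack-aware placement policy).

```python
from itertools import cycle

BLOCK_SIZE = 4    # bytes per block: tiny toy value (real HDFS blocks are far larger)
REPLICATION = 2   # copies of each block: toy value (the HDFS default is 3)


class DataNode:
    """Worker node: stores and returns blocks, and reports which blocks it holds."""

    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[str, bytes] = {}  # block id -> block contents

    def store(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data

    def block_report(self) -> list[str]:
        # In HDFS this list is sent to the namenode periodically.
        return sorted(self.blocks)


class NameNode:
    """Master node: keeps only metadata (file -> blocks -> datanode locations)."""

    def __init__(self, datanodes: list[DataNode]):
        self.datanodes = {dn.name: dn for dn in datanodes}
        self._next_start = cycle(range(len(datanodes)))  # round-robin placement
        self.namespace: dict[str, list[tuple[str, list[str]]]] = {}

    def write(self, path: str, data: bytes) -> None:
        # For brevity the namenode also pushes the data; in HDFS the client
        # streams blocks to the datanodes that the namenode chooses.
        names = list(self.datanodes)
        blocks = []
        for i in range(0, len(data), BLOCK_SIZE):
            block_id = f"{path}#blk{i // BLOCK_SIZE}"
            start = next(self._next_start)
            targets = [names[(start + r) % len(names)] for r in range(REPLICATION)]
            for name in targets:
                self.datanodes[name].store(block_id, data[i:i + BLOCK_SIZE])
            blocks.append((block_id, targets))
        self.namespace[path] = blocks

    def read(self, path: str) -> bytes:
        # Look up each block's locations and fetch it from the first replica.
        return b"".join(self.datanodes[locs[0]].blocks[block_id]
                        for block_id, locs in self.namespace[path])


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode(nodes)
namenode.write("/logs/app.txt", b"hello hdfs!")
print(namenode.namespace["/logs/app.txt"])
print({dn.name: dn.block_report() for dn in nodes})
```

The 11-byte file becomes 3 blocks, each stored on 2 datanodes; the namenode keeps only the block-to-location map, and each datanode's block report lists what it holds.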

92
Hadoop

MapReduce

MapReduce is a programming model that abstracts the analysis
problem away from the disk reads and writes of the stored data

MapReduce transforms the analysis problem into a
computation over sets of keys and values
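The key-value model can be illustrated with the classic word-count job. This is a single-process Python simulation of the map, shuffle (group by key), and reduce phases, not Hadoop's Java API; in Hadoop the same phases run in parallel across the cluster, and the map input key is typically a byte offset rather than the line number used here.

```python
from collections import defaultdict
from typing import Iterator


def map_fn(line_no: int, line: str) -> Iterator[tuple[str, int]]:
    """Map: (key = line number, value = line text) -> (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: (word, [1, 1, ...]) -> (word, total count)."""
    return word, sum(counts)


def run_job(lines: list[str]) -> dict[str, int]:
    groups: dict[str, list[int]] = defaultdict(list)
    # Map phase: emit intermediate key-value pairs for every input record.
    for line_no, line in enumerate(lines):
        for key, value in map_fn(line_no, line):
            # Shuffle: collect all values that share the same key.
            groups[key].append(value)
    # Reduce phase: one call per distinct key, in sorted key order.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


print(run_job(["the cat sat", "The dog sat"]))
```

For the two input lines, the job returns `{'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}`.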

93
Hadoop

MapReduce System Architecture

MapReduce was designed for jobs that take
several minutes or hours on a set of dedicated, trusted
computers connected by a high-bandwidth
network within a single data center

94
Hadoop

MapReduce Characteristics

MapReduce uses a somewhat brute-force data analysis
approach

The entire dataset (or a big part of the dataset) is
processed for every query
Batch Query Processor model

95
Hadoop

MapReduce Characteristics

MapReduce makes it possible to run an ad hoc query
against the whole dataset and get results in a reasonable time

Many distributed systems combine data from multiple
sources (which is very difficult), but MapReduce does
this in a very effective and efficient way

96
Hadoop

Technical Terms used in MapReduce

Seek Time is the delay in moving the disk head to
where the data is stored

Transfer Rate is the speed at which data is read from
or written to the disk (disk bandwidth)

Transfer Rate has improved significantly more (i.e.,
now has much faster transfer speeds) than
Seek Time (i.e., still relatively slow)

97
Hadoop

MapReduce
MapReduce gains performance through
an optimal balance
of Seek and Transfer operations
Reduce Seek operations
Use Transfer operations effectively
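A rough calculation shows why reducing seeks matters. The figures below (10 ms average seek, 100 MB/s transfer rate, 100-byte records, a 10 GB dataset) are assumed round numbers for a hard disk, not measurements. The sketch compares reading just 1% of the records with one seek each against streaming through the entire dataset.

```python
SEEK_TIME_S = 0.010        # assumed average disk seek time: 10 ms
TRANSFER_RATE_BPS = 100e6  # assumed sequential transfer rate: 100 MB/s
RECORD_SIZE = 100          # assumed record size in bytes


def random_access_time(num_records: int) -> float:
    """Seconds to read records one at a time, with one seek per record."""
    return num_records * (SEEK_TIME_S + RECORD_SIZE / TRANSFER_RATE_BPS)


def streaming_time(total_bytes: int) -> float:
    """Seconds to read a whole dataset sequentially after a single seek."""
    return SEEK_TIME_S + total_bytes / TRANSFER_RATE_BPS


dataset_bytes = 10 * 10**9                          # a 10 GB dataset
one_percent = dataset_bytes // RECORD_SIZE // 100   # 1% of its records

print(f"seek per record, 1% of records: {random_access_time(one_percent):,.0f} s")
print(f"stream the full dataset:        {streaming_time(dataset_bytes):,.0f} s")
```

Under these assumptions, seeking to 1% of the records (one million of them) takes about 10,000 s, close to 3 hours, while streaming all 10 GB takes about 100 s. Seek-bound access loses badly once a large fraction of the data must be touched.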

In the next lecture, we will compare MapReduce with a
traditional RDBMS (Relational Database Management
System)

98
Big Data

MapReduce vs. RDBMS
103
Hadoop

MapReduce vs. RDBMS


RDBMS (Relational Database Management System)
Characteristics

An RDBMS is good for updating a small proportion of a
big database

An RDBMS uses a traditional B-Tree, whose performance
depends heavily on the time required to perform seek
operations

104
Hadoop

MapReduce vs. RDBMS

MapReduce Characteristics

MapReduce is good for updating all (or a majority) of
a big database

MapReduce uses Sort and Merge to rebuild the
database, which depends more on transfer
operations

105
Hadoop

MapReduce vs. RDBMS

RDBMS is good for applications that require the
datasets of the database to be updated very frequently
(e.g., point queries or small dataset updates)
MapReduce is better for WORM (Write Once, Read
Many times) based data applications
MapReduce is a complementary system to RDBMS

106
Hadoop

MapReduce vs. RDBMS


                   RDBMS                     MapReduce
Data Size          Gigabytes (10^9)          Petabytes (10^15)
Access             Interactive & Batch       Batch
Updates            Read & Write Many Times   WORM (Write Once, Read Many Times)
Data Structure     Static Schema             Dynamic Schema
Integrity          High                      Low
Scalability        Nonlinear                 Linear

107
Hadoop

MapReduce vs. RDBMS: Data Types


Structured Data: Data that has a formally defined structure (e.g.,
XML documents or database tables)

Semi-Structured Data: Data that has a looser format where the
data structure is used as a guide and may be ignored

Unstructured Data: Data that does not have any formal
structure (e.g., plain text or image data)

108
Hadoop

MapReduce vs. RDBMS: Data Types


MapReduce is very effective on unstructured and semi-structured data
Why?
MapReduce interprets the data at processing time
The input keys and input values are not intrinsic
properties of the data; they are chosen
by the person analyzing the data
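Interpreting data at processing time can be illustrated with raw, loosely formatted weather records: the map function, written by the analyst, decides that the year is the key and the temperature is the value, and simply skips lines it cannot parse. The `station,YYYYMMDD,temperature` line format and the max-temperature task are hypothetical, loosely modeled on the weather-data example often used to teach MapReduce; this is a local simulation, not Hadoop code.

```python
from collections import defaultdict
from typing import Iterator


def map_fn(line: str) -> Iterator[tuple[str, float]]:
    """Interpret a raw 'station,YYYYMMDD,temperature' line only when it is processed.

    The analyst decides here that the key is the year and the value is the
    temperature; nothing about that choice is stored with the data.
    """
    parts = line.split(",")
    if len(parts) < 3:
        return                      # malformed record: skip it
    try:
        temperature = float(parts[2])
    except ValueError:
        return                      # unreadable temperature: skip it
    yield parts[1][:4], temperature


def max_temperature_by_year(lines: list[str]) -> dict[str, float]:
    groups: dict[str, list[float]] = defaultdict(list)
    for line in lines:                       # map + shuffle
        for year, temperature in map_fn(line):
            groups[year].append(temperature)
    # Reduce: keep the maximum temperature for each year.
    return {year: max(temps) for year, temps in groups.items()}


raw = ["st1,19500101,12.5", "st2,19500705,31.0", "garbage",
       "st1,19510101,n/a", "st3,19510612,22.0"]
print(max_temperature_by_year(raw))
```

The malformed line and the unreadable temperature are ignored, and the result is `{'1950': 31.0, '1951': 22.0}`. A different analysis could reuse the same raw lines with a different key, such as the station.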

109
Hadoop

MapReduce vs. RDBMS: Scalability


MapReduce has a programming model that is linearly
scalable

MapReduce Functions: 2 types


Map function
Reduce function

Both of these functions define a
key-value pair mapping relation
(e.g., Key-Value pair 1 → Key-Value pair 2)

110
Hadoop
Hadoop Release Series (Release 2.6.0 became available Nov. 2014)

Feature                          1.x                        0.22   2.x
Secure authentication            Yes                        No     Yes
Old configuration names          Yes
New configuration names          No                         Yes    Yes
Old MapReduce API                Yes                        Yes    Yes
New MapReduce API                Yes (with some missing     Yes    Yes
                                 libraries)
MapReduce 1 runtime (Classic)    Yes                        Yes    No
MapReduce 2 runtime (YARN)       No                         No     Yes
HDFS Federation                  No                         No     Yes
HDFS High-Availability           No                         No     Yes

111
Hadoop

Hadoop Release Series

2.x includes several major new features


MapReduce 2 is the new MapReduce runtime
implemented on a new system called YARN
YARN (Yet Another Resource Negotiator)
General resource management system for
running distributed applications

112
Hadoop

Hadoop Release Series


HDFS Federation partitions the HDFS namespace
across multiple namenodes
Enables improved support for clusters with very
large numbers of files

HDFS High-Availability feature uses a standby
namenode as a backup, and therefore, the namenode
is no longer a potential SPOF (Single Point of Failure)

113
Big Data

REFERENCES

114
References
V. Mayer-Schönberger and K. Cukier, Big data: A revolution that will transform how
we live, work, and think. Houghton Mifflin Harcourt, 2013.
T. White, Hadoop: The Definitive Guide. O'Reilly Media, 2012.
J. Venner, Pro Hadoop. Apress, 2009.
S. LaValle, E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz, "Big Data,
Analytics and the Path From Insights to Value," MIT Sloan Management Review,
vol. 52, no. 2, Winter 2011.
R. E. Bryant, R. H. Katz, and E. D. Lazowska, "Big-data Computing: Creating
revolutionary breakthroughs in commerce, science and society," Computing
Community Consortium, pp. 1-15, Dec. 2008.
G. Linden, B. Smith, and J. York, "Amazon.com Recommendations: Item-to-Item
Collaborative Filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan./Feb.
2003.

115
References
J. R. Galbraith, "Organizational Design Challenges Resulting From Big Data,"
Journal of Organization Design, vol. 3, no. 1, pp. 2-13, Apr. 2014.
S. Sagiroglu and D. Sinanc, "Big data: A review," Proc. IEEE International
Conference on Collaboration Technologies and Systems, pp. 42-47, May 2013.
M. Chen, S. Mao, and Y. Liu, "Big Data: A Survey," Mobile Networks and
Applications, vol. 19, no. 2, pp. 171-209, Jan. 2014.
X. Wu, X. Zhu, G. Q. Wu, and W. Ding, "Data Mining with Big Data," IEEE
Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 97-107, Jan.
2014.
Z. Zheng, J. Zhu, and M. R. Lyu, "Service-Generated Big Data and Big Data-as-a-Service:
An Overview," Proc. IEEE International Congress on Big Data, pp. 403-410,
Jun./Jul. 2013.

116
References
I. Palit and C. K. Reddy, "Scalable and Parallel Boosting with MapReduce," IEEE
Transactions on Knowledge and Data Engineering, vol. 24, no. 10, pp. 1904-1916,
2012.
M.-Y. Choi, E.-A. Cho, D.-H. Park, C.-J. Moon, and D.-K. Baik, "A Database
Synchronization Algorithm for Mobile Devices," IEEE Transactions on Consumer
Electronics, vol. 56, no. 2, pp. 392-398, May 2010.
IBM, "What is big data?," http://www.ibm.com/software/data/bigdata/what-is-big-data.html
[Accessed June 1, 2015]
Hadoop Apache, http://hadoop.apache.org
Wikipedia, http://www.wikipedia.org
Image sources
Walmart Logo, By Walmart [Public domain], via Wikimedia Commons
Amazon Logo, By Balajimuthazhagan (Own work) [CC BY-SA 3.0
(http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

117
Big Data

MapReduce

118
MapReduce

Hadoop

Hadoop is a Reliable Shared Storage and Analysis System

Hadoop = HDFS + MapReduce +

HDFS provides Data Storage


HDFS: Hadoop Distributed FileSystem

MapReduce provides Data Analysis


MapReduce = Map Function + Reduce Function

119
MapReduce

Scaling Out

Scaling out is done by the DFS (Distributed FileSystem),
where the data is divided and stored across distributed
computers & servers

Hadoop uses HDFS to move the MapReduce computation
to several distributed computing machines, each of which
processes the part of the divided data assigned to it

120
MapReduce

Jobs
A MapReduce job is a unit of work that needs to be
executed

Job components: Input data, MapReduce program,
Configuration information, etc.

A job is executed by dividing it into tasks of two types
Map Task
Reduce Task

121
MapReduce

Node types for Job execution

Job execution is controlled by 2 types of nodes
(in the classic MapReduce 1 runtime)
Jobtracker
Tasktracker

Jobtracker coordinates all jobs

Jobtracker schedules all tasks and assigns the tasks
to tasktrackers

122
MapReduce

Tasktracker executes its assigned tasks

Tasktracker sends progress reports to the Jobtracker
Jobtracker keeps a record of the overall progress of each job
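The division of labor between the two node types can be sketched as a toy Python model. The class names, slot counts, and task IDs are hypothetical. The model ignores heartbeats, task failures, and data locality, and it performs only a single scheduling round. It shows only the pattern: the jobtracker assigns tasks, tasktrackers report progress, and the jobtracker aggregates that progress per job.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TaskTracker:
    name: str
    slots: int  # how many tasks this tracker can run at the same time

@dataclass
class JobTracker:
    trackers: List[TaskTracker]
    assignment: Dict[str, str] = field(default_factory=dict)  # task id -> tracker name
    progress: Dict[str, float] = field(default_factory=dict)  # task id -> fraction done

    def schedule(self, task_ids: List[str]) -> None:
        """One scheduling round: fill each tracker's slots in turn."""
        free_slots = [t.name for t in self.trackers for _ in range(t.slots)]
        if len(task_ids) > len(free_slots):
            raise ValueError("not enough free task slots")
        for task_id, tracker_name in zip(task_ids, free_slots):
            self.assignment[task_id] = tracker_name
            self.progress[task_id] = 0.0

    def report(self, task_id: str, fraction_done: float) -> None:
        """Called by a tasktracker to report a task's progress (0.0 to 1.0)."""
        self.progress[task_id] = fraction_done

    def job_progress(self) -> float:
        """Overall job progress as the mean progress of its tasks."""
        return sum(self.progress.values()) / len(self.progress)

jt = JobTracker([TaskTracker("tt1", slots=2), TaskTracker("tt2", slots=1)])
jt.schedule(["map-1", "map-2", "reduce-1"])
jt.report("map-1", 1.0)
jt.report("map-2", 0.5)
print(jt.assignment, jt.job_progress())
```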
123
MapReduce

Data flow

Hadoop divides the input into input splits (or splits)
suitable for the MapReduce job

Each split has a fixed size

Split size is commonly matched to the size of an HDFS
block (64 MB by default in Hadoop 1.x) for maximum processing efficiency
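Fixed-size splitting can be sketched as follows, assuming a 64 MB split size. The function name is hypothetical. The sketch ignores details such as records that cross split boundaries, which Hadoop's input formats must handle. Every split except possibly the last has the same length.

```python
from typing import List, Tuple

MB = 1024 * 1024

def input_splits(file_size: int, split_size: int = 64 * MB) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs covering the file in fixed-size splits;
    only the last split may be shorter than split_size."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [(offset, min(split_size, file_size - offset))
            for offset in range(0, file_size, split_size)]

splits = input_splits(200 * MB)
print([length // MB for _, length in splits])
```

A 200 MB file therefore yields four splits of 64, 64, 64, and 8 MB, and one map task is created for each.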

124
MapReduce

Data flow

A Map Task is created for each split

The Map Task executes the map function for each record
within the split

Hadoop tries to execute the Map Task on a
node where the input data resides

125
MapReduce

Data flow

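Data-Local Map Task
Data locality optimization
The map task does not need to use the cluster network
Data-local processing shows why the
Optimal Split Size = HDFS Block Size (64 MB): a split larger than one block
would span two blocks, which are unlikely to be stored on the same node, so
part of the split would have to be transferred across the network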
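The block-size argument can be checked with a short Python sketch; the helper name is hypothetical. It computes which 64 MB blocks a split touches. A block-aligned 64 MB split always touches exactly one block. A 128 MB split, or a split that is not block-aligned, touches two blocks, and those blocks may live on different nodes.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB

def blocks_spanned(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Indices of the HDFS blocks covered by the byte range [offset, offset + length)."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)

for split_size in (64 * MB, 128 * MB):
    spans = [list(blocks_spanned(off, split_size)) for off in range(0, 256 * MB, split_size)]
    print(split_size // MB, "MB splits ->", spans)
```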
126
MapReduce

Data flow

Rack-Local Map Task
All nodes hosting the HDFS block replicas for
a map task's input split could be busy running other map tasks
In that case, the Job Scheduler looks for a free map slot on
a node in the same rack as one of the blocks
[Figure: Rack-local map task; labels: Node, Rack, Data Center, Map Task, HDFS Block]
127
MapReduce

Data flow

Off-Rack Map Task
Needed when the Job Scheduler
cannot run the map task data-locally or rack-locally
Uses inter-rack network transfer
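The three placement levels amount to a simple preference order, sketched below. The helper, node names, and rack map are hypothetical. The sketch does not model how Hadoop actually learns the cluster topology or tracks free slots.

```python
from typing import Dict, List, Optional, Tuple

def choose_node(replica_nodes: List[str], free_nodes: List[str],
                rack_of: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Pick a node with a free map slot for one map task, preferring
    data-local, then rack-local, then off-rack placement."""
    for node in free_nodes:
        if node in replica_nodes:
            return node, "data-local"       # no network transfer needed
    replica_racks = {rack_of[n] for n in replica_nodes}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"       # transfer within one rack
    if free_nodes:
        return free_nodes[0], "off-rack"    # inter-rack network transfer
    return None, "no free slot"

rack_of = {"n1": "rack1", "n2": "rack1", "n3": "rack2"}
replicas = ["n1"]  # nodes holding a replica of the split's block
print(choose_node(replicas, ["n3", "n1"], rack_of))
print(choose_node(replicas, ["n3", "n2"], rack_of))
print(choose_node(replicas, ["n3"], rack_of))
```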
128
MapReduce

Map
Map task writes its output to the local disk (not to HDFS)
Map task output is not the final output; it is only the
intermediate output

Reduce
Map task output is processed by Reduce Tasks to produce
the final output
Reduce Task output is stored in HDFS
Once the job is complete, the Map Task output can be
discarded

129
MapReduce

Single Reduce Task

Node includes Split, Map, Sort, and Output unit


Light blue arrows show data transfers in a node
Black arrows show data transfers between nodes
130
MapReduce

Single Reduce Task

The number of reduce tasks is specified
independently; it is not determined by
the size of the input
131
MapReduce

Combiner Function
User-specified function that runs on the Map output
Its output forms the input to the Reduce function
Designed to minimize the data transferred
between Map Tasks and Reduce Tasks
Mitigates the limited network bandwidth of the
cluster and helps to reduce the time needed to complete
MapReduce jobs
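The sketch below shows the saving for word counting in Python; all names are hypothetical. Each map task's output is pre-aggregated locally before the shuffle, so fewer records cross the network while the final reduce result is unchanged. This only works because summing is commutative and associative. Hadoop may run a combiner zero, one, or many times, so a combiner must not change the final output.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, int]

def map_words(text: str) -> List[Pair]:
    return [(word, 1) for word in text.split()]

def combine(pairs: List[Pair]) -> List[Pair]:
    """Combiner: pre-aggregate one map task's output before the shuffle."""
    totals: Dict[str, int] = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())

def reduce_counts(pairs: List[Pair]) -> Dict[str, int]:
    """Reducer: sum all counts per word."""
    totals: Dict[str, int] = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

map_task_1 = map_words("a b a a c")
map_task_2 = map_words("a c c")
without_combiner = map_task_1 + map_task_2                # records sent to reducers
with_combiner = combine(map_task_1) + combine(map_task_2)  # fewer records sent
print(len(without_combiner), len(with_combiner))
print(reduce_counts(without_combiner) == reduce_counts(with_combiner))
```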

132
MapReduce

Multiple Reducer

Map tasks partition their output, each creating one
partition for each reduce task

Each partition may contain many keys and their
associated values

All records for a given key are kept in a single partition
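Hadoop's default partitioner hashes the key and takes the result modulo the number of reduce tasks. The Python sketch below follows that idea, with one substitution: it uses CRC32 instead of Java's hashCode, because Python's built-in string hash is randomized per process. The function names are hypothetical. Because the partition depends only on the key, every record for a key lands in the same partition.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]

def partition(key: str, num_reducers: int) -> int:
    """Deterministic hash partitioning: the same key always maps to the
    same reduce task."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def partition_map_output(pairs: List[Pair], num_reducers: int) -> Dict[int, List[Pair]]:
    """Split one map task's output into (at most) one partition per reduce task."""
    partitions: Dict[int, List[Pair]] = defaultdict(list)
    for key, value in pairs:
        partitions[partition(key, num_reducers)].append((key, value))
    return dict(partitions)

pairs = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
partitions = partition_map_output(pairs, num_reducers=2)
print(partitions)
```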

133
MapReduce

Multiple Reducers
Shuffle

The shuffle process handles the data flow
between the Map tasks and the Reduce tasks
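In Hadoop, each map task sorts its output by key within each partition. Each reducer then fetches its partition from every map task and merges the sorted pieces. The reduce side of that step can be sketched in Python as below; the function name and sample data are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterator, List, Tuple

Pair = Tuple[str, int]

def shuffle_for_reducer(map_outputs: List[List[Pair]]) -> Iterator[Tuple[str, List[int]]]:
    """Merge the already-sorted partitions fetched from each map task for
    one reducer, then group the values by key for the reduce function."""
    merged = heapq.merge(*map_outputs, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, [value for _, value in group]

from_map_1 = [("apple", 1), ("cherry", 2)]   # sorted by key
from_map_2 = [("apple", 3), ("banana", 1)]   # sorted by key
print(list(shuffle_for_reducer([from_map_1, from_map_2])))
```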
134
MapReduce

Zero Reducer

A job with zero reducers uses
no shuffle process
Applied when all of the
processing can be carried
out in parallel Map tasks
135
Big Data

HDFS

140
HDFS

Hadoop

Hadoop is a Reliable Shared Storage and Analysis System

Hadoop = HDFS + MapReduce +

HDFS provides Data Storage


HDFS: Hadoop Distributed FileSystem

MapReduce provides Data Analysis


MapReduce = Map Function + Reduce Function

141
HDFS

HDFS: Hadoop Distributed FileSystem

A DFS (Distributed FileSystem) is designed to manage
storage across a network of computers

HDFS is optimized to store very large (up to terabyte-size) files
with streaming data access patterns

142
HDFS

HDFS: Hadoop Distributed FileSystem

HDFS is designed to perform best with
a WORM (Write Once, Read Many times) access pattern

HDFS is designed to run on clusters of commodity
computers & servers from multiple vendors

143
HDFS

HDFS Characteristics

HDFS is optimized for large-scale, high-throughput
data processing

HDFS does not perform well for applications
that require low-latency data access (e.g., in the tens of
milliseconds range)

144
HDFS

Blocks
Files in HDFS are divided into block-size chunks
(64 MB default block size in Hadoop 1.x; 128 MB in Hadoop 2.x)

A block is the basic unit of storage in HDFS; a file smaller
than one block does not occupy a full block of underlying storage

Blocks simplify the storage and replication process
Provide fault tolerance & processing speed
enhancement for larger files
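The fault-tolerance benefit comes from storing each block on several datanodes; HDFS's default replication factor is 3. The Python sketch below uses a simple rotating placement rule, which is an assumption for illustration and not HDFS's rack-aware placement policy. The node names are hypothetical. It shows that every block remains readable after any single datanode fails.

```python
from itertools import cycle, islice
from typing import Dict, List

def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = 3) -> Dict[int, List[str]]:
    """Place each block's replicas on `replication` distinct datanodes,
    rotating the starting node so that storage load is spread out."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of datanodes")
    placement: Dict[int, List[str]] = {}
    for block in range(num_blocks):
        start = block % len(datanodes)
        placement[block] = list(islice(cycle(datanodes), start, start + replication))
    return placement

def readable_after_failure(placement: Dict[int, List[str]], failed: str) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(node != failed for node in nodes) for nodes in placement.values())

placement = place_replicas(4, ["dn1", "dn2", "dn3", "dn4"])
print(placement)
print(readable_after_failure(placement, "dn2"))
```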

145
HDFS

HDFS

HDFS clusters use 2 types of nodes

Namenode (master node)

Datanode (worker node)

146
HDFS

Namenode
Manages the filesystem namespace
Namenode keeps track of the datanodes that hold the
blocks of each distributed file
Maintains the filesystem tree and the metadata for all
the files and directories in the tree
Persists the namespace on its local disk in 2 files
Namespace Image
Edit Log

147
HDFS

Namenode

Namenode holds the filesystem metadata in its memory

The namenode's memory size limits the
number of files in a filesystem
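The memory limit can be turned into a rough estimate. A widely quoted rule of thumb, given in T. White's Hadoop: The Definitive Guide (listed in the references), is that each file, directory, and block takes about 150 bytes of namenode memory. Under that assumption, the hypothetical helper below estimates namenode memory. One million single-block files already need about 300 MB.

```python
BYTES_PER_OBJECT = 150  # rule-of-thumb estimate per file, directory, or block

def namenode_memory_bytes(files: int, blocks_per_file: int, directories: int = 0) -> int:
    """Rough namenode memory estimate: each file, directory, and block
    is counted as one in-memory metadata object."""
    objects = files + directories + files * blocks_per_file
    return objects * BYTES_PER_OBJECT

print(namenode_memory_bytes(1_000_000, 1) / 1e6, "MB")
```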

But then, what is Metadata?

148
HDFS

Metadata

Similar to the traditional concept of a library card catalog

Categorizes and describes the contents and context of
the data files

Maximizes the usefulness of the original data files by
making them easy to find and use

149
HDFS

Metadata Types
Structural Metadata
Focuses on the data structure's design and
specification

Descriptive Metadata
Focuses on the individual instances of application
data or the data content

150
HDFS

Datanodes

Workhorses of the filesystem

Store and retrieve blocks when requested by the client
or the namenode

Periodically report back to the namenode with lists of
the blocks they are storing
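The interplay between namenode metadata and datanode block reports can be sketched as a toy Python model. The class, paths, and block IDs are hypothetical. The namespace maps paths to block IDs. Block locations are kept only in memory and are rebuilt from the lists that datanodes report.

```python
from collections import defaultdict
from typing import Dict, List, Set

class Namenode:
    """Toy namenode: the namespace maps file paths to block IDs, while block
    locations are rebuilt from datanode block reports (kept only in memory)."""

    def __init__(self) -> None:
        self.namespace: Dict[str, List[str]] = {}
        self.block_locations: Dict[str, Set[str]] = defaultdict(set)

    def block_report(self, datanode: str, blocks: List[str]) -> None:
        """Replace whatever this datanode previously reported."""
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block in blocks:
            self.block_locations[block].add(datanode)

    def locate(self, path: str) -> List[Set[str]]:
        """For each block of the file, the datanodes a client could read it from."""
        return [set(self.block_locations.get(b, set())) for b in self.namespace[path]]

nn = Namenode()
nn.namespace["/logs/a.txt"] = ["blk_1", "blk_2"]
nn.block_report("dn1", ["blk_1", "blk_2"])
nn.block_report("dn2", ["blk_1"])
print(nn.locate("/logs/a.txt"))
```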

151
HDFS

Client Access

A client accesses the filesystem (on behalf of the user)
by communicating with the namenode and datanodes

The client presents a filesystem interface (similar to POSIX
(Portable Operating System Interface)), so the user code
does not need to know about the namenode and
datanodes to function properly

152
HDFS

Namenode Failure
Namenode keeps track of the datanodes that hold the blocks
of each distributed file, so without the namenode, the
filesystem cannot be used

If the computer running the namenode were destroyed,
there would be no way to reconstruct the files from the blocks on the
datanodes, so all files on the filesystem would be lost

153
HDFS

Namenode Failure Resilience

Namenode failure prevention schemes

1. Namenode File Backup

2. Secondary Namenode

154
HDFS

1. Namenode File Backup


Back up the namenode files that form the persistent
state of the filesystem's metadata
Configure the namenode to write its persistent state
to multiple filesystems
These writes are synchronous and atomic
Common backup configuration: write to the local disk
and to a remote filesystem (e.g., an NFS mount)
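In Hadoop 2.x, the list of metadata directories is configured with the dfs.namenode.name.dir property, a comma-separated list. The Python sketch below illustrates only the write-to-every-directory idea and is not how the namenode is implemented. The file name and directories are assumptions. Each copy is written to a temporary file, forced to disk, and then renamed into place, so no directory ever holds a half-written copy.

```python
import os
import tempfile
from typing import List

def write_persistent_state(data: bytes, directories: List[str], name: str = "fsimage") -> None:
    """Write the same metadata file to every configured directory, one after
    another; within each directory the replacement is atomic (temp file,
    fsync, then rename over the old copy)."""
    for directory in directories:
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # make sure the bytes reach the disk
            os.replace(tmp_path, os.path.join(directory, name))
        except BaseException:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise

with tempfile.TemporaryDirectory() as local_dir, tempfile.TemporaryDirectory() as remote_dir:
    # remote_dir stands in for a remote (e.g., NFS-mounted) directory
    write_persistent_state(b"namespace image bytes", [local_dir, remote_dir])
    print(sorted(os.listdir(local_dir)), sorted(os.listdir(remote_dir)))
```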

155
HDFS

2. Secondary Namenode
Despite its name, the secondary namenode does not act as a
namenode
The secondary namenode periodically merges the
namespace image with the edit log to prevent the edit log
from becoming too large
The secondary namenode usually runs on a separate
computer because the merge process
requires significant processing capability and memory
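The checkpoint merge can be sketched in Python with a hypothetical edit-log format. Real namespace images and edit logs are binary files with many more operation types. The sketch replays each logged operation onto a copy of the namespace image and returns the new image together with an empty edit log, which is what keeps the log from growing without bound.

```python
from typing import Dict, List, Tuple

Edit = Tuple[str, str, str]  # (operation, path, value) -- hypothetical format

def apply_edit(image: Dict[str, str], edit: Edit) -> None:
    op, path, value = edit
    if op == "create":
        image[path] = value                 # value: the file's metadata
    elif op == "delete":
        image.pop(path, None)
    elif op == "rename":
        image[value] = image.pop(path)      # value: the new path
    else:
        raise ValueError(f"unknown operation {op!r}")

def checkpoint(image: Dict[str, str], edit_log: List[Edit]) -> Tuple[Dict[str, str], List[Edit]]:
    """Merge the edit log into a copy of the namespace image and return the
    new image together with a fresh, empty edit log."""
    new_image = dict(image)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []

image = {"/a": "owner=alice"}
log = [("create", "/b", "owner=bob"), ("rename", "/a", "/c"), ("delete", "/b", "")]
print(checkpoint(image, log))
```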

156
HDFS

Hadoop 2.x Release Series: HDFS Reliability Enhancements

HDFS Federation

HDFS HA (High-Availability)

157
HDFS

HDFS Federation
Allows a cluster to scale by adding namenodes

Each namenode manages a
namespace volume and a block pool
Namespace volume is made up of the metadata for
the namespace
Block pool contains all the blocks for the files in the
namespace

158
HDFS

HDFS Federation
Namespace volumes are independent of each other
Namenodes do not communicate with each other
The failure of one namenode is also independent of the other
namenodes
A namenode failure does not affect the
availability of the namespaces managed by other namenodes
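In Hadoop 2.x, clients usually reach federated namespaces through client-side mount tables (ViewFileSystem). The Python sketch below models only that idea; the paths and namenode names are hypothetical. A path is routed to the namenode whose mount point is the longest matching prefix. A failed namenode makes only its own namespace unavailable.

```python
from typing import Dict, Optional, Set

def resolve(path: str, mount_table: Dict[str, str]) -> Optional[str]:
    """Return the namenode serving `path`, using the longest mount-point
    prefix that matches, or None if no mount point covers the path."""
    best: Optional[str] = None
    for prefix in mount_table:
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return mount_table[best] if best is not None else None

def is_available(path: str, mount_table: Dict[str, str], failed: Set[str]) -> bool:
    """A path stays available as long as its own namenode is up."""
    namenode = resolve(path, mount_table)
    return namenode is not None and namenode not in failed

mount_table = {"/user": "nn1", "/share": "nn2", "/share/archive": "nn3"}
print(resolve("/share/archive/2014/log", mount_table))
print(is_available("/user/alice/data", mount_table, failed={"nn2"}))
print(is_available("/share/data", mount_table, failed={"nn2"}))
```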

159
HDFS

HDFS High-Availability
A pair of namenodes (Primary & Standby) is set up in an
Active-Standby configuration

The standby namenode keeps the latest edit log entries
and an up-to-date block mapping in memory

When the primary namenode fails, the standby
namenode takes over serving client requests

160
HDFS

HDFS High-Availability

Although the standby namenode can take over
operation quickly (e.g., in a few tens of seconds), to
avoid unnecessary namenode switching, the standby
namenode is activated only after a
sufficient observation period
(e.g., approximately a minute or a few minutes)
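The observation-period rule can be sketched as a timeout-based failure detector in Python. The class, node names, and 60-second period are assumptions for illustration. Hadoop's actual automatic failover relies on a ZooKeeper-based failover controller and fencing, which this sketch does not model. Short silences from the active namenode are tolerated, and roles are swapped only once the silence exceeds the observation period.

```python
from dataclasses import dataclass

@dataclass
class FailoverController:
    """Promote the standby namenode only after the active one has been
    silent for longer than the observation period (in seconds)."""
    observation_period: float = 60.0
    active: str = "nn-primary"
    standby: str = "nn-standby"
    last_heartbeat: float = 0.0

    def heartbeat(self, now: float) -> None:
        """Record a health report from the active namenode."""
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Return the namenode that should serve clients at time `now`."""
        if now - self.last_heartbeat > self.observation_period:
            # Presume the active namenode has failed and swap roles.
            self.active, self.standby = self.standby, self.active
            self.last_heartbeat = now  # start a fresh period for the new active
        return self.active

fc = FailoverController(observation_period=60.0)
fc.heartbeat(now=0.0)
print(fc.check(now=30.0))   # a short silence is tolerated
print(fc.check(now=61.0))   # silence exceeded the observation period
```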

161
CDN (Content Delivery Network)

CDN Introduction

166
CDN

Table of Contents
CDN Motivation & Structure

CDN Procedures

Hierarchical Content Delivery Model

CDN Market & Major Service Providers

CDN Research & Development

167
CDN

CDN Motivation
CDN is a network constructed from a group of
strategically placed and geographically distributed
caching servers

CDN is one of the most efficient solutions for CPs (Content
Providers) serving a large number of user devices, as it
reduces both content download time and network traffic

168
CDN

CDN Motivation
Network traffic generated by mobile users (e.g., on smart
devices) is rapidly increasing

Mobile network performance is highly dependent on the
download of multimedia data and applications

Several mobile network operators have suffered from service
outages or performance deterioration due to the significant
increase in the use of mobile devices

169
CDN
CDN Structure
Using a CDN, both content download time and network
traffic are reduced
[Figure: Content Provider, Caching Server (stores popular contents in advance), and User; content request and delivery routes with and without a CDN]

170
CDN

CDN in Mobile Networks

Mobile communication networks have a stronger need
for reduced traffic load and content delivery time
than broadband backbone networks, whose capacity is
abundant enough that traffic load reduction may
not be as critical an issue

171
CDN

CDN Structure

CDN usually consists of the CP (Content Provider) and
caching servers

CP possesses all contents to serve

Caching servers are distributed across the network and
store copies of selected contents that the CP stores

172
CDN

CDN Structure
When a user requests a content from its nearest
caching server, the server can deliver the
content if the requested content is in its cache

Otherwise, the caching server redirects the
user's request to the remotely located CP

173
CDN

CDN Procedures
When a user requests a content from its nearest caching server, the
server can deliver the content if the requested content is in its
cache

174
CDN

CDN Procedures
If the requested content is not in the local server's cache, the
content request is redirected to the remotely located CP
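The hit/miss procedure can be sketched as a small Python model, with two labeled assumptions. First, it models a miss as the caching server fetching the content from the CP and keeping a copy. Second, it evicts the least recently used content when the cache is full; the slides do not specify a replacement policy, so LRU is an assumption. The content IDs are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Tuple

class CachingServer:
    """Serve from the local cache on a hit; on a miss, fetch from the CP,
    keep a copy, and evict the least recently used content when full."""

    def __init__(self, capacity: int, fetch_from_cp: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch_from_cp = fetch_from_cp
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()

    def request(self, content_id: str) -> Tuple[bytes, str]:
        if content_id in self.cache:
            self.cache.move_to_end(content_id)   # mark as recently used
            return self.cache[content_id], "hit"
        data = self.fetch_from_cp(content_id)    # go to the remote CP
        self.cache[content_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)       # evict least recently used
        return data, "miss"

cp_store = {"video1": b"<video>", "page1": b"<html>", "img1": b"<png>"}

def fetch_from_cp(content_id: str) -> bytes:
    return cp_store[content_id]

edge = CachingServer(capacity=2, fetch_from_cp=fetch_from_cp)
statuses = [edge.request(c)[1] for c in ["video1", "video1", "page1", "img1", "video1"]]
print(statuses)
```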

175
CDN

Content Aging Procedure


Content aging is focused on delivering the most popular
contents to users in the most effective way
Depends on
Location of caching servers
Number of caching servers
Limited memory size of caching servers
Content Aging
Delete expired contents from the caching server
Download updated contents from the CP

176
CDN

Content Aging Procedure

Each content has a content update period, its
TTL (Time to Live), e.g.,
Few seconds for on-line trading
Few seconds for auction information
24 hours or more for movies
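TTL-based content aging can be sketched in Python as follows. The content IDs are hypothetical. The TTL values are assumptions chosen to match the examples above: 5 seconds for a trading quote and 24 hours for a movie. Time is passed in explicitly so the behavior is easy to follow. An expired copy is deleted and an updated copy is downloaded from the CP on the next request.

```python
from typing import Callable, Dict, Tuple

class TTLCache:
    """Content aging sketch: each content expires after its TTL (seconds);
    an expired copy is deleted and an updated copy is downloaded from the CP."""

    def __init__(self, fetch_from_cp: Callable[[str], bytes], ttl_of: Dict[str, float]) -> None:
        self.fetch_from_cp = fetch_from_cp
        self.ttl_of = ttl_of
        self.entries: Dict[str, Tuple[bytes, float]] = {}  # id -> (data, expiry time)

    def get(self, content_id: str, now: float) -> Tuple[bytes, str]:
        entry = self.entries.get(content_id)
        if entry is not None and now < entry[1]:
            return entry[0], "cached"
        self.entries.pop(content_id, None)        # delete the expired copy, if any
        data = self.fetch_from_cp(content_id)     # download the updated content
        self.entries[content_id] = (data, now + self.ttl_of[content_id])
        return data, "downloaded"

cache = TTLCache(lambda cid: cid.encode(), {"stock-quote": 5.0, "movie": 24 * 3600.0})
statuses = [cache.get("stock-quote", now=t)[1] for t in (0, 3, 6)]
statuses += [cache.get("movie", now=t)[1] for t in (0, 6)]
print(statuses)
```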

177
CDN

REFERENCES

178
References
"Content Delivery Functional Architecture in NGN," Telecommunication
Standardization Sector of ITU, White Paper, Sep. 2010.
"Content delivery networks: Market dynamics and growth perspectives," Informa
Telecoms & Media, White Paper, Oct. 2012.
Cisco, "Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update,"
http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white_paper_c11-520862.pdf [Accessed June 1, 2015]
Akamai, http://www.akamai.com/index.html/
Limelight, http://www.limelight.com/
Level 3, http://www.level3.com/
CDNetworks, http://www.us.cdnetworks.com/

179
CDN (Content Delivery Network)

CDN Hierarchical Content Delivery
180
Hierarchical Content Delivery

Hierarchical Content Delivery

It is not possible for a caching server to store all
contents that the CP (Content Provider) serves
Retrieving contents from the remotely located CP can
cause a long content download time. In addition, each
retrieval generates a large amount of traffic as the
content packets are routed through the network

181
Hierarchical Content Delivery

Hierarchical Content Delivery


For the given cache size of each server, it is important
to maximize the hit rate of the local caching server
so that the requested contents do not have to be
retrieved from the CP
To accomplish this objective in the Internet in a
scalable way, hierarchical cooperative content delivery
techniques are used to deliver contents to
local caching servers

182
Hierarchical Content Delivery

Hierarchical Content Delivery


CD & LCF (Content Distribution & Location Control
Functions) controls the overall content delivery process
and holds all content IDs of the CDN

CCF (Cluster Control Function) controls multiple CDPFs
(Content Delivery Processing Functions) and stores the
content IDs of its cluster

CDPF stores and delivers the contents to the users

183
Hierarchical Content Delivery

Hierarchical Content Delivery Network Example

184
Hierarchical Content Delivery

Content Delivery Procedures

Case 1
Requested content is in the local cluster
Content request message is delivered to the CCF
CCF sends a session request message to the
CDPF to deliver the content to the user
CDPF delivers the content to the user

185
Hierarchical Content Delivery
Content Delivery Procedures
Case 1 Procedures

186
Hierarchical Content Delivery

Content Delivery Procedures


Case 2
Requested content is not in the local cluster, but
another cluster (i.e., the target cluster) has the
content
Procedures
Content request message is redirected from
the local cluster to the CD & LCF
(continued)

187
Hierarchical Content Delivery

Content Delivery Procedures


Case 2
Procedures (continued)
CD & LCF checks which other cluster (the target cluster)
has the requested content

Requested content can be delivered from the
target cluster to the user directly, or through
the local cluster (in which case the local cluster can also
store the requested content)
188
Hierarchical Content Delivery

Content Delivery Procedures


Case 2 Procedures

189
Hierarchical Content Delivery

Content Delivery Procedures

Case 3
Requested content is not in the CDN
Content request message is sent from the
CD & LCF to the CP
CP delivers the content to the user through
the local cluster
The requested content can be stored in
the local cluster
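The three cases can be summarized as one lookup order, sketched in Python below. The cluster names and content IDs are hypothetical. Scanning every cluster's content set stands in for the CD & LCF's global index of content IDs. Setting store_locally mirrors the option of keeping a copy in the local cluster in Cases 2 and 3.

```python
from typing import Dict, Set, Tuple

def deliver(content_id: str, local: str, clusters: Dict[str, Set[str]],
            store_locally: bool = True) -> Tuple[str, str]:
    """Return (source, case) for a request arriving at cluster `local`.

    `clusters` maps each cluster (its CCF) to the content IDs it holds;
    scanning all clusters stands in for the CD & LCF's global index.
    When store_locally is True, the local cluster keeps a copy (mutates clusters).
    """
    if content_id in clusters[local]:
        return local, "case 1"                      # CCF -> CDPF -> user
    for name, contents in clusters.items():         # CD & LCF lookup
        if name != local and content_id in contents:
            if store_locally:
                clusters[local].add(content_id)
            return name, "case 2"                   # served by the target cluster
    if store_locally:
        clusters[local].add(content_id)
    return "CP", "case 3"                           # fetched from the content provider

clusters = {"cluster-A": {"news"}, "cluster-B": {"movie"}}
for cid in ["news", "movie", "game", "game"]:
    print(cid, deliver(cid, "cluster-A", clusters))
```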

190
Hierarchical Content Delivery
Content Delivery Procedures
Case 3 Procedures

191
CDN (Content Delivery Network)

CDN Market

194
CDN Market

Measuring the CDN Market Value


There are many ways to evaluate the value of the CDN market
Evaluation is related to the diverse range of CDN industry
participants
Examples of industry participants
CSP (Communications Service Provider)
Industry manufacturers
CDN service providers
Content providers

195
CDN Market

Measuring the CDN Market Value

For communication service providers, the CDN's value
includes improving retail service delivery and supporting
their efforts to win and retain customers

For industry manufacturers, the market value is related to
the demand from telcos, content providers, and other
businesses

196
CDN Market

CDN Market Size


The 2014 CDN market size was $3.71 billion
CDN Market Components
Content delivery technologies, hardware, analytics,
monitoring, encoding, transparent caching, DRM
(Digital Rights Management), CMS (Content
Management System), OVP (Online Video Platform),
etc.
CDN Market Estimations
Expected to grow to $12.16 billion by 2019
Predicted 26.3% CAGR (Compound Annual Growth Rate) from
2014 to 2019
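The estimates can be cross-checked with the standard CAGR formula, CAGR = (end value / start value)^(1 / years) - 1. The sketch below treats 2014 to 2019 as five compounding periods. Under that assumption, growth from $3.71 billion to $12.16 billion implies roughly 26.8% per year. Compounding 26.3% from $3.71 billion gives about $11.92 billion. The small gap may reflect rounding or a different base year in the underlying market report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1

def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` once per year for `years` years."""
    return start_value * (1 + rate) ** years

print(f"{cagr(3.71, 12.16, 5):.1%}")       # rate implied by the two market sizes
print(f"{project(3.71, 0.263, 5):.2f}")    # 2019 size implied by a 26.3% CAGR
```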

197
CDN Market

CDN Service Providers


Akamai has about 110,000 servers around the world.
Akamai's services include cloud computing, HD video
delivery, etc.

Amazon CloudFront delivers static and streaming
contents. Amazon CloudFront works seamlessly with
other Amazon Web Services cloud solutions, such as
S3 (Simple Storage Service)
EC2 (Elastic Compute Cloud)

198
CDN Market

CDN Service Providers

CDNetworks has POPs (Points of Presence) on 6
continents, including 20 POPs in China. It is the world's
3rd largest, and Asia's #1, full-service CDN provider

Level 3 supports a comprehensive encoding suite
for video data, and intelligent traffic manager
services (i.e., load balancing)

199
CDN Market

CDN Service Providers

Limelight has 6,000 servers at 75 POPs (Points of
Presence), and more than 30 regional content delivery
centers in the U.S., Europe, and Asia

ChinaCache is a CDN market leader in China, with
127 POPs and 11,000 servers in China. Its CDN
services include hotlink protection, custom CNAME
for SSL, and Purge All.

200
CDN Market

Telcos with a CDN resale agreement

CDN Provider | Operator (Market Region)
Akamai | Verizon (US), NTT Communications (Japan), du (UAE), Telekom Malaysia (Malaysia)
CDNetworks | Andorra Telecom (Andorra), MegaFon (Russia), Telecom Italia Sparkle (Italy), SingTel (Singapore)
ChinaCache | China Mobile (China), HGC (International)

201
CDN Market

Telcos with a CDN resale agreement


CDN Provider | Operator (Market Region)
EdgeCast | AT&T (US), AAPT (Australia), Deutsche Telekom ICSS (Germany), Dogan Telecom (Turkey), Pacnet (Asia Pacific), Telus (Canada)
Jet-Stream | Telenet (Belgium), Ziggo (Netherlands)
Level 3 | Internexa (South America), MWeb (South Africa), STC (Saudi Arabia)
Limelight Networks | Bell Canada (Canada), Bestel (Mexico), Bharti Airtel (India), XO Communications (US)

202
CDN (Content Delivery Network)

CDN R&D

205
CDN

CDN Research & Development


Content Aspects
Content Type based Differentiated Support
Data, Multimedia, Mobile Apps, etc.
Content Aging Control
Content Selection & Deletion
Content Replication Detection
Dynamic Page Publishing
Digital Rights Management
Live Event Management

206
CDN

CDN Research & Development


System Aspects
Surrogate Server Location (Dynamic)
Storage Memory Size (Dynamic)
Content Delivery Method
Mobile Device Characteristics, Location
Network Latency
Security & Information Assurance
Anomaly Detection
User Authentication
Content Authentication
207
CDN

Mobile CDN Research & Development


Mobile wireless networks face additional challenges in
supporting CDN services, e.g.,
GPS & Navigation Information
Mobile TV
ITS (Intelligent Transportation System)
LBS (Location Based Service)

Efficient content provisioning is required to provide
scalable control over wide coverage areas while
providing high levels of QoS with limited resources

208
CDN

Mobile CDN Challenges


Mobile node constraints (limited storage, processing
power, input capability) due to the portable size of mobile
devices

Frequent network disconnections due to user mobility

Location-oriented services that must account for user mobility

Real-time monitoring to obtain the current status of mobile
users

209
CDN

CDN vs. Mobile CDN


Features | CDN | Mobile CDN [Future]
Content Type | Static, Dynamic, Streaming | Static, Dynamic, Streaming
Users Location | Fixed | Mobile, Fixed
Surrogate Location | Fixed | Fixed, [Mobile]
Surrogate Topology | ISP (Internet Service Provider) Local, Center of Service Area | BSs (Base Stations), RAN (Radio Access Network) Systems, [Mobile Devices]
Maintenance Complexity | Low~Medium | Medium~High [Dynamic]
Services | Multimedia & Data Services, etc. | Mobile Apps, LBS, [Mobile] Cloud, etc.

210