in partial fulfillment for the award of the degree of
MASTER OF TECHNOLOGY
in
INFORMATION TECHNOLOGY
MAHENDRA COLLEGE OF ENGINEERING SALEM
ANNA UNIVERSITY: CHENNAI 600 025
MAY 2014
RECENT ADVANCEMENTS IN CLOUD COMPUTING: A CASE STUDY APPROACH
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
Certified that this technical seminar report RECENT ADVANCEMENTS IN CLOUD COMPUTING: A CASE STUDY APPROACH is the bonafide work of Gunasekaran, D (621513510002) who carried out the work under my supervision.
Submitted for the technical seminar held on __________
SIGNATURE
Mrs. A.LOGANAYAKI, M.E.,
SUPERVISOR
Assistant Professor,
Department of Information Technology,
Mahendra College of Engineering,
Minnampalli, Salem

SIGNATURE
HEAD OF THE DEPARTMENT
Department of Information Technology,
Mahendra College of Engineering,
Minnampalli, Salem
ABSTRACT
Cloud computing is the latest buzzword in the IT industry. This Internet-based technology, with its flexibility, capacity, and processing power, has realised the service-oriented idea and created a new ecosystem in the computing world. Cloud capabilities have moved the IT industry one giant step forward, and today large, well-known enterprises have resorted to cloud computing and have transferred their processing and storage to it. This report discusses the basics of cloud computing and then reviews two important and emerging aspects of it: performance evaluation and network virtualisation. Given the popularity and progress of the cloud in different organisations, cloud performance evaluation is of special importance, and such evaluation can help users make the right decisions.
Network virtualisation is key to the current and future success of cloud computing. The report reviews the key reasons for virtualisation and several networking technologies that have recently been developed, or are being developed in various standards bodies, including software defined networking (SDN), which is the key to network programmability. OpenADN, which supports application delivery in a multi-cloud environment, is also briefly reviewed.
The design and implementation of an academic cloud at IIT Delhi, named Baadal and built as an in-house effort, is reviewed as a case study to highlight recent achievements in the sphere of cloud computing.
ACKNOWLEDGEMENT
I take immense pleasure in expressing my humble note of gratitude to our honourable Chairman Shri.M.G.BHARATHKUMAR, M.A., B.Ed., and our young and dynamic Managing Directors Er.Ba.MAHENDHIRAN, B.E., and Er.Ba.MAHA AJAY PRASATH, B.E.,M.S(U.S.A)., who have provided excellent facilities to complete the technical seminar successfully.
I also express my gratitude and thanks to our honourable Principal Dr.R.ASOKAN, M.Tech., Ph.D., F.I.E., F.T.A., for providing all facilities for carrying out the technical seminar work. I take immense pleasure in expressing my heartfelt gratitude to our Dean, Dr. S.KRISHNAKUMAR, M.Tech., Ph.D., for his guidance and sustained encouragement for the successful completion of this report. I wish to express my sense of gratitude and sincere thanks to our Head of the Department, Dr. N.SATISH, M.E., Ph.D., Department of Information Technology, for his valuable guidance and the resources provided for the completion of the technical seminar report. I express my profound sense of thanks, with deepest respect and gratitude, to my guide Mrs. A.LOGANAYAKI, M.E., Assistant Professor, Department of Information Technology, for her valuable and precious guidance in the completion of this report.
TABLE OF CONTENTS
Chapter No.  Description
             Abstract
1            Introduction
2            Cloud Computing
3            Evolution and Potential
4            Virtualisation
5            Performance Evaluation
6            Network Virtualisation
7            Design and Implementation of Academic Cloud: Baadal at IIT Delhi
8            Conclusion
             References
CHAPTER 1
INTRODUCTION
As more aspects of our work and life move online, and the Web expands beyond a communication medium to become a platform for business and society, a new paradigm of large-scale distributed computing has emerged in our lives. Cloud computing has very quickly become one of the hottest topics, if not the hottest one, for practicing engineers and academics in domains related to engineering, science, and art for building large-scale networks and Internet applications. Nowadays, everyone is talking about cloud computing. In academia, numerous research papers, tutorials, workshops, and panels on this emerging topic have been presented at major conferences and published in top-level computer science journals and magazines. Several universities have also added courses dedicated to cloud computing principles, and a plethora of blogs, forums, and discussion groups on the subject are available on the Web. In industry, companies are devoting great resources to cloud computing, either by building their own infrastructures or by developing innovative cloud services.
Cloud computing is a new multidisciplinary research field, considered to be the evolution and convergence of several independent computing trends such as Internet delivery, pay-as-you-go utility computing, elasticity, virtualization, grid computing, distributed computing, storage, content outsourcing, security, and Web 2.0. However, the multidisciplinary nature of cloud computing has raised questions in the research community about how novel this new paradigm really is, since it includes almost everything that existing technologies already do. This report attempts to demystify cloud computing, highlight its innovative aspects, and identify its major technical and non-technical challenges.
CHAPTER 2
CLOUD COMPUTING
2.1 Definition
Even though we cannot precisely define the cloud, because it is an evolving paradigm, the US National Institute of Standards and Technology's definition covers the most important aspects of the cloud vision. NIST [1] defines cloud computing as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." This cloud model is composed of five essential characteristics, three service models, and four deployment models.
2.2 Essential Characteristics
On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed, automatically, without requiring human interaction with each service provider.

Broad network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).

Resource pooling. The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or data centre). Examples of resources include storage, processing, memory, and network bandwidth.
Rapid elasticity. Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.
Measured service. Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
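As a simple illustration of the metering idea behind measured service, the following sketch aggregates usage records into a pay-per-use charge. The resource names and rates are assumptions made up for this example and are not tied to any particular provider.

```python
# Minimal sketch of pay-per-use metering; rates and resource names are assumed values.
from collections import defaultdict

RATES = {"cpu_hours": 0.05, "storage_gb_month": 0.02, "bandwidth_gb": 0.01}  # assumed prices

def bill(usage_records):
    """usage_records: iterable of (resource, quantity) tuples reported by the meter."""
    totals = defaultdict(float)
    for resource, quantity in usage_records:
        totals[resource] += quantity
    return {r: q * RATES[r] for r, q in totals.items()}

if __name__ == "__main__":
    records = [("cpu_hours", 120), ("storage_gb_month", 50), ("bandwidth_gb", 300)]
    print(bill(records))  # {'cpu_hours': 6.0, 'storage_gb_month': 1.0, 'bandwidth_gb': 3.0}
```

The same aggregation gives both parties the transparency described above: the provider can optimise resource use, while the consumer can verify the reported charges.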
2.3 Service Models
Software as a Service (SaaS). The capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

Platform as a Service (PaaS). The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.
Infrastructure as a Service (IaaS). The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).
2.4 Deployment Models

Private cloud. The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.
Community cloud. The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.
Public cloud. The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.
Hybrid cloud. The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
CHAPTER 3
EVOLUTION AND POTENTIAL
3.1 Evolution
Figure 3.1 below shows the evolution of cloud computing as a paradigm. Since the launch of Amazon Web Services in 2002, the proliferation of firms entering this business has been tremendous. Figure 3.2 depicts an alternate view of the evolution process, which has ushered in the ubiquity era, and Figure 3.3 summarises in a nutshell how the computing era has evolved over time.
Figure 3.1- Cloud computing timeline. Cloud computing has evolved from previous computing paradigms going back to the days of mainframes:

1961: John McCarthy envisions that computation might someday be organised as a public utility.
1967: IBM (key contributor Jim Rymarczyk) launches CP-67 software, one of IBM's first attempts at virtualising mainframe operating systems.
1999: The term grid computing originates from Ian Foster and Carl Kesselman's work, The Grid: Blueprint for a New Computing Infrastructure; Salesforce.com introduces the concept of delivering enterprise applications via a website.
2002: Amazon Web Services provides a suite of cloud-based services, including storage and computation.
2006: Amazon launches Elastic Compute Cloud (EC2) as a commercial web service that small companies and individuals can rent to run their own computer applications.
2008: Private cloud models make their appearance.
2009: Cloud providers offer browser-based enterprise applications.
2010: The Cloud 2.0 model emerges.
Figure 3.2- A different perspective of cloud computing evolution
Figure 3.3 March to the Ubiquity Era
3.2 Cloud Computing Growth and Potential

The user base of cloud services and the revenue generated by way of business opportunities have also seen a meteoric rise. Today, telcos have around a 5% share of nearly $20Bn p.a. cloud services revenue, with a 25% compound annual growth rate (CAGR) forecast to 2015. Most market forecasts expect the total cloud services market to reach $45-50Bn revenue by 2015. Applying these views to an extrapolated mid-point forecast of the cloud market in 2015 implies that telcos will take just under $9Bn of cloud revenue by 2015, increasing today's roughly $1Bn share nine-fold. Figure 3.4 shows the growth forecast and the current market players in cloud computing services.
Figure 3.4- Cloud services current players and market growth
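The forecast figures quoted above can be sanity-checked with a little compound-growth arithmetic. The sketch below assumes a $20Bn base, a 25% CAGR, and a four-year horizon to 2015; these are the numbers from the text, and the horizon is an assumption for illustration.

```python
# Compound annual growth rate (CAGR) check for the market figures quoted above.
base_market_bn = 20.0   # ~$20Bn p.a. cloud services revenue today
cagr = 0.25             # 25% compound annual growth rate
years = 4               # forecast horizon to 2015 (assumed)

market_2015 = base_market_bn * (1 + cagr) ** years
print(f"Projected market: ${market_2015:.1f}Bn")   # ~$48.8Bn, within the $45-50Bn range

telco_today_bn = 0.05 * base_market_bn             # 5% share of today's market = $1Bn
telco_2015_bn = 9.0                                # forecast telco revenue from the text
print(f"Telco growth: {telco_2015_bn / telco_today_bn:.0f}x")  # roughly nine-fold
```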
3.3 Reference Architecture - The Conceptual Reference Model
Figure 3.5 below presents an overview of the NIST cloud computing reference architecture, which identifies the major actors and their activities and functions in cloud computing. The diagram depicts a generic high-level architecture and is intended to facilitate the understanding of the requirements, uses, characteristics, and standards of cloud computing.
Figure 3.5 NIST Reference Architecture
A brief role/definition of the key players of cloud computing is shown in Figure 3.6.
Actor: Definition

Cloud Consumer: A person or organization that maintains a business relationship with, and uses service from, Cloud Providers.
Cloud Provider: A person, organization, or entity responsible for making a service available to interested parties.
Cloud Auditor: A party that can conduct independent assessment of cloud services, information system operations, performance and security of the cloud implementation.
Cloud Broker: An entity that manages the use, performance and delivery of cloud services, and negotiates relationships between Cloud Providers and Cloud Consumers.
Cloud Carrier: An intermediary that provides connectivity and transport of cloud services from Cloud Providers to Cloud Consumers.

Figure 3.6- Actors of cloud computing
It also makes economic sense for enterprises and start-ups to migrate to the cloud because of the benefits that accrue over time and the very short lead time required to start using cloud services. Figure 3.7 shows that the tip of the iceberg, the acquisition cost, is only about 10% of the total, compared with the hidden costs of operation and maintenance.
Figure 3.7-Total Cost of IT infrastructure
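To make the iceberg analogy concrete, a quick calculation under the assumption in Figure 3.7 that acquisition is about 10% of the total cost of ownership; the purchase price used here is an invented example figure.

```python
# If acquisition is only ~10% of total cost, the hidden operating cost dominates.
acquisition_cost = 100_000   # assumed purchase price of the hardware ($)
acquisition_share = 0.10     # visible "tip of the iceberg"

total_cost_of_ownership = acquisition_cost / acquisition_share
hidden_cost = total_cost_of_ownership - acquisition_cost
print(total_cost_of_ownership, hidden_cost)   # 1000000.0 900000.0
```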
3.4 Services on the Cloud

The services offered to consumers encompass almost everything, including scientific computing, as depicted in Figure 3.8.
Figure 3.8- Services offered on the cloud
The grid in Figure 3.9 shows the types of services offered by the various layers of cloud computing.
Figure 3.9- Grid showing the services offered by cloud computing
CHAPTER 4
VIRTUALISATION
Virtualisation is one of the two bedrocks of cloud computing, the other being multi-tenancy. All services offered on the cloud depend on these two fundamental pillars, as shown in Figure 4.1.
Figure 4.1- Pillars of Cloud Computing
The Internet has resulted in virtualisation of all aspects of our life. Today, our workplaces are virtual, we shop virtually, we get virtual education, our entertainment is virtual, and, of course, much of our computing is virtual. The key enabler for all of these virtualisations is the Internet and various computer networking technologies. It turns out that computer networking itself has to be virtualised, and several new standards and technologies have been developed for network virtualisation; this chapter and the next survey them. There are many reasons why we need to virtualise resources. The five most common reasons are:

Sharing: When a resource is too big for a single user, it is best to divide it into multiple virtual pieces, as is the case with today's multi-core processors. Each processor can run multiple virtual machines (VMs), and each machine can be used by a different user. The same applies to high-speed links and large-capacity disks.

Isolation: Multiple users sharing a resource may not trust each other, so it is important to provide isolation among users. Users using one virtual component should not be able to monitor or interfere with the activities of other users. This may apply even if different users belong to the same organization, since different departments (e.g., finance and engineering) may have data that is confidential to the department.

Aggregation: If the resource is too small, it is possible to construct a large virtual resource that behaves like a large physical resource. This is the case with storage, where a large number of inexpensive, unreliable disks can be combined into large, reliable storage.

Dynamics: Resource requirements often change quickly, for example due to user mobility, and a way to reallocate resources rapidly is required. This is easier with virtual resources than with physical resources.

Ease of management: Last, but probably the most important reason for virtualisation, is ease of management. Virtual devices are easier to manage because they are software-based and expose a uniform interface through standard abstractions.

Virtualisation is not a new concept to computer scientists. Memory was the first computer component to be virtualised: memory was an expensive part of the original computers, and virtual memory concepts were developed in the 1970s. The study and comparison of page replacement algorithms was a popular research topic then, and today's computers have sophisticated, multi-level caching for memory. Storage virtualisation was a natural next step, with virtual disks and virtual compact disc (CD) drives leading to cloud storage today. Virtualisation of desktops resulted in thin clients, which brought significant reductions in capital as well as operational expenditure, eventually leading to virtualisation of servers and cloud computing. However, there has been significant renewed interest in network virtualisation, fuelled primarily by cloud computing; several new standards have been developed and are being developed, and software defined networking (SDN) also helps in network virtualisation. The efficiency and effectiveness of cloud computing as a service intrinsically depend on performance and continued innovation; thus, a review of the available literature on performance evaluation of cloud computing was carried out.
CHAPTER 5
PERFORMANCE EVALUATION
Cloud computing resources must be compatible, high-performance, and powerful. High performance is one of the cloud's advantages and must be satisfactory for each service.

Higher performance of services, and of anything related to the cloud, influences both users and service providers. Hence, performance evaluation matters to cloud providers and users alike. There are many methods for performance prediction and evaluation; the following methods are used in the evaluation process:
Evaluation based on criteria and characteristics
Evaluation based on simulation
Another way to categorise cloud performance evaluation is by the three layers of cloud services (infrastructure, platform, and software), each of which can be evaluated separately.
5.1. Factors Affecting Performance
Nowadays, the term performance means more than the classic concept and includes broader notions such as reliability, energy efficiency, scalability, and so on. Given the extent of cloud computing environments and the large number of enterprises and ordinary users working in them, many factors can affect the performance of cloud computing and its resources. Some of the important factors considered here are as follows:
Security: the impact of security on cloud performance may seem slightly strange, but the impact of security on network infrastructure has been proven. For example, DDoS attacks have a wide impact on network performance; if one happens, it greatly reduces network performance and affects response time as well. Therefore, if this or any similar risk threatens the cloud environment, it is a big concern for users and providers.
Recovery: when data in the cloud encounters errors and failures, or data is lost for any reason, the time required for data retrieval and the volume of data that is recoverable affect cloud performance. For example, if data recovery takes a long time, it affects cloud performance and customer satisfaction, because most organizations that are cloud users regard quick access to their data and services as very important.
Service level agreements: when a user wants to use cloud services, an agreement is signed between the user and the provider which describes the user's requests, the abilities of the provider, fees, fines, etc. Looked at from the user's point of view, the better, more optimal, and more timely the fulfilment of the agreed requests, the higher the performance; this view also holds true for providers.
Network bandwidth: this factor can affect performance and can also be a criterion for evaluation. For example, if the bandwidth is too low to provide service to customers, performance will be low too.
Storage capacity: physical memory can also affect the performance criteria. This factor is more significant when evaluating the performance of cloud infrastructure.
Buffer capacity: if servers cannot serve a request immediately, it is buffered in temporary memory, so buffer capacity affects performance. If the buffer capacity is low, many requests will be rejected and performance will therefore be low (a small illustrative sketch follows this list).
Disk capacity: this can also have a negative or positive impact on performance in the cloud.
Fault tolerance: this factor has a special effect on the performance of the cloud environment. For example, if part of a data centre fails but the centre is still able to provide at least a minimum level of service, perceived performance is maintained or even improved.
Availability: when cloud services are easy to access and always available, performance increases.
Number of users: if a data centre has a large number of users, greater than its rated capacity, the performance of its services is reduced.

Location: the location of data centres and their distance from a user's location is also an important factor that can affect performance from the user's point of view.
Other factors that can affect performance are as follows:
Usability
Scalability
Workload
Repetition or Redundancy
Processor Power
Latency
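As a rough illustration of the buffer-capacity factor mentioned in the list above, the sketch below counts how many requests a server with a finite buffer would reject. The arrival rates, service rates, and capacities are assumed numbers chosen purely for illustration, not measurements from any real system.

```python
# Toy model: requests that cannot be served immediately go into a finite buffer;
# when the buffer is full, further requests are rejected, lowering perceived performance.
from collections import deque

def simulate(arrivals_per_tick, service_per_tick, buffer_capacity, ticks):
    buffer, served, rejected = deque(), 0, 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            if len(buffer) < buffer_capacity:
                buffer.append(1)          # request accepted into the buffer
            else:
                rejected += 1             # buffer full: request rejected
        for _ in range(min(service_per_tick, len(buffer))):
            buffer.popleft()
            served += 1                   # request served this tick
    return served, rejected

# Assumed numbers: 10 requests arrive per tick, capacity to serve 8 per tick.
print(simulate(arrivals_per_tick=10, service_per_tick=8, buffer_capacity=20, ticks=100))
```

With arrivals persistently above the service rate, the buffer fills after a few ticks and a steady stream of rejections follows, which is exactly the performance degradation the factor describes.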
5.2. Simulation Category
Simulation-based evaluation can be organised into three categories, based on the major components of a cloud environment. Specific metrics are used in each category, and the categories have been selected because data centres, users, and geographic region are all important in cloud computing environments.
Simulation and evaluation based on data centres. Evaluation is done by varying the virtual machines, memory, and bandwidth. The results show that response time for some users improves along with processing time, but an additional data centre mainly adds extra cost; for low volumes of requests, an additional data centre does not cause significant changes in processing time. The average and maximum service times per request behave similarly: average service time decreases as the number of data centres increases, but the reduction becomes smaller after a few additional data centres while the cost becomes too high. Changing the number of processors in the data centres has the greatest impact on processing time, and the greatest impact on cost as well.
Simulation and evaluation based on users. Here the effect of changing the number of users and the volume of work is evaluated. It can be concluded that if a data centre is loaded beyond its rated capacity, it is not only unprofitable but also lowers the efficiency of that centre. The results show that increasing the number of requests per unit time has little impact on response time and on processing time in the data centres. Unlike the other measures, however, it does affect the volume of data transferred and thus the cost of data transfer: more information is transferred as the number of requests increases, so costs also increase.
Simulation and evaluation based on geographical region. The impact of the geographical location of users and data centres is studied to determine how the criteria are affected when data centres and users are in the same region or are far from each other in different regions.

The results show that these changes affect cost and the other measures, and that it is better for users and data centres to be in the same region or to have the least possible distribution. The processing rate of a data centre is effectively reduced when the user is far away from it, because response time increases, so users may make fewer requests to that data centre.
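The simulation studies summarised above can be mimicked at toy scale. The following sketch is a deliberately simplified, illustrative model of how average response time might fall as data centres are added while cost rises; every constant in it is an assumption invented for this example, not a result from the reviewed studies.

```python
# Very simplified model: response time = network latency (depends on user-DC distance)
# + processing time (depends on load per data centre). All constants are assumptions.

def evaluate(num_users, num_datacentres, cores_per_dc=24,
             base_latency_ms=50.0, service_ms_per_request=5.0, cost_per_dc=1000.0):
    load_per_dc = num_users / num_datacentres
    processing_ms = service_ms_per_request * load_per_dc / cores_per_dc
    # Assume more, better-placed data centres reduce average user-to-DC latency.
    latency_ms = base_latency_ms / num_datacentres ** 0.5
    return latency_ms + processing_ms, cost_per_dc * num_datacentres

for dcs in (1, 2, 4, 8):
    rt, cost = evaluate(num_users=10_000, num_datacentres=dcs)
    print(f"{dcs} DC(s): response ~{rt:6.1f} ms, cost ~{cost:7.0f}")
```

Running it shows the absolute improvement in response time shrinking with each added data centre while cost grows linearly, which mirrors the diminishing-returns observation reported above.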
CHAPTER 6
NETWORK VIRTUALISATION
6.1 Introduction
A computer network starts with a network interface card (NIC) in the host, which is connected to a layer 2 (L2) network segment (Ethernet, WiFi, etc.). Several L2 network segments may be interconnected via switches (a.k.a. bridges) to form an L2 network, which is one subnet in a layer 3 (L3) network (IPv4 or IPv6). Multiple L3 networks are connected via routers (a.k.a. gateways) to form the Internet. A single data centre may have several L2/L3 networks, and several data centres may be interconnected via L2/L3 switches. Each of these network components - NIC, L2 network, L2 switch, L3 network, L3 router, data centre, and the Internet - needs to be virtualised. There are multiple, often competing, standards for virtualization of several of these components, and several new ones are being developed.
When a VM moves from one subnet to another, its IP address must change, which complicates routing. It is well known that IP addresses are both locators and system identifiers, so when a system moves, its L3 identifier changes. In spite of all the developments of mobile IP, it is significantly simpler to move systems within one subnet (within one L2 domain) than between subnets. This is because the IEEE 802 addresses used in L2 networks (both Ethernet and WiFi) are system identifiers (not locators) and do not change when a system moves. Therefore, when a network connection spans multiple L2 networks via L3 routers, it is often desirable to create a virtual L2 network that spans the entire network. In a loose sense, several IP networks together appear as one Ethernet network.
6.2 Virtualisation Of NICs
Each computer system needs at least one L2 NIC (Ethernet card) for communication. Therefore, each physical system has at least one physical NIC. However, if we run multiple VMs on the system, each VM needs its own virtual NIC. As shown in Figure 6.1, one way to solve this problem is for the hypervisor software that provides processor virtualisation to also implement as many virtual NICs (vNICs) as there are VMs. These vNICs are interconnected via a virtual switch (vSwitch), which is connected to the physical NIC (pNIC). Multiple pNICs are connected to a physical switch (pSwitch). We use the notation of a p-prefix for physical and a v-prefix for virtual objects; in the figures, virtual objects are shown by dotted lines, while physical objects are shown by solid lines.
Figure 6.1- Three approaches to NIC virtualization
Virtualization of the NIC may seem straightforward. However, there is significant industry competition, and different segments of the networking industry have come up with competing standards. Figure 6.1 shows three different approaches.

The first approach, providing a software vNIC via the hypervisor, is the one proposed by VM software vendors. This virtual Ethernet bridge (VEB) approach has the virtue of being transparent and straightforward. Its opponents point out that there is significant software overhead, that vNICs may not be easily manageable by external network management software, and that vNICs may not provide all the features today's pNICs provide. (A small libvirt-based sketch of this approach appears at the end of this section.)

In the second approach, pNIC vendors (or pNIC chip vendors) have their own solution, which provides virtual NIC ports using single-root I/O virtualization (SR-IOV) on the peripheral component interconnect (PCI) bus.

In the third approach, the switch vendors (or pSwitch chip vendors) have yet another set of solutions that provide virtual channels for inter-VM communication using a virtual Ethernet port aggregator (VEPA), which simply passes frames to an external switch that implements the inter-VM communication policies and reflects some traffic back to other VMs in the same machine. IEEE 802.1Qbg specifies both VEB and VEPA.
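As a concrete example of the first (hypervisor/VEB) approach, the sketch below uses the libvirt Python bindings to hot-plug a para-virtual (virtio) vNIC, attached to a host bridge, into a running VM. The connection URI, bridge name, and VM name are assumptions made for illustration.

```python
# Attach a virtio vNIC to a running VM via the hypervisor (VEB approach).
# Assumes a local KVM/QEMU host, a Linux bridge named 'br0', and a VM named 'vm1'.
import libvirt

VNIC_XML = """
<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>
"""

conn = libvirt.open("qemu:///system")   # connect to the local hypervisor
dom = conn.lookupByName("vm1")          # the VM that needs an extra vNIC
dom.attachDevice(VNIC_XML)              # hypervisor creates the vNIC and plugs it into the vSwitch
print(dom.XMLDesc(0).count("<interface"), "interfaces now defined")
conn.close()
```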
6.3 Virtualisation of Switches
A typical Ethernet switch has 32128 ports. The number of physical machines that need to be connected on an L2 network is typically much larger than this. Therefore, several layers of switches need to be used to form an L2 network. IEEE Bridge Port Extension standard 802.1BR, shown in Fig. 2, allows forming a virtual bridge with a large number of ports using port extenders that are simple relays and may be physical or virtual (like a vSwitch).
Figure 6.2- IEEE 802.1BR bridge port extension.
6.4 Virtualisation in LAN Clouds
One additional problem in the cloud environment is that multiple VMs in a single physical machine may belong to different clients and thus need to be in different virtual LANs (VLANs). As discussed earlier, each of these VLANs may span several data centres interconnected via L3 networks, as shown in Fig. 6.3.
Figure 6.3- Different virtual machines may be in different VLANs.
Again, there are a number of competing proposals to solve this problem. VMware and several partner companies have proposed Virtual eXtensible LANs (VXLANs). Network Virtualization using Generic Routing Encapsulation (NVGRE) and the Stateless Transport Tunnelling (STT) protocol are two other proposals being considered in the Network Virtualization over L3 (NVO3) working group of the Internet Engineering Task Force (IETF).
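To make the encapsulation idea concrete, the sketch below builds the 8-byte VXLAN header defined in RFC 7348: a flags byte with the I bit set, reserved bits, and a 24-bit VXLAN Network Identifier (VNI). In a real deployment this header, followed by the original Ethernet frame, is carried in a UDP datagram to port 4789 between VXLAN tunnel end points (VTEPs); the sketch only shows the header construction.

```python
# Build the 8-byte VXLAN header from RFC 7348: flags (I bit = 0x08), 24 reserved bits,
# a 24-bit VNI identifying the virtual L2 segment, and 8 reserved bits.
import struct

VXLAN_UDP_PORT = 4789

def vxlan_header(vni: int) -> bytes:
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    flags_and_reserved = 0x08 << 24        # I flag set, reserved bits zero
    return struct.pack("!II", flags_and_reserved, vni << 8)

hdr = vxlan_header(vni=5000)
print(len(hdr), hdr.hex())   # 8 bytes: 0800000000138800
```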
6.5 Network Function Virtualisation
Standard multi-core processors are now so fast that it is possible to design networking devices using software modules that run on standard processors. By combining many different functional modules, any networking device - L2 switch, L3 router, application delivery controller, and so on - can be composed cost effectively and with acceptable performance. The Network Function Virtualization (NFV) group of the European Telecommunications Standards Institute (ETSI) is working on developing standards to enable this.
6.6 Software Defined Networking
Software defined networking is the latest revolution in networking innovation. All components of the networking industry, including network equipment vendors, Internet service providers, cloud service providers, and users, are working on or looking forward to various aspects of SDN. SDN consists of four innovations:

Separation of the control and data planes
Centralization of the control plane
Programmability of the control plane
Standardization of application programming interfaces (APIs)
Each of these innovations is explained briefly below.
6.7 Separation of the Control Plane and Data Plane
Networking protocols are often arranged in three planes: data, control, and management. The data plane consists of all the messages that are generated by users. To transport these messages, the network needs to do some housekeeping work, such as finding the shortest path using L3 routing protocols such as Open Shortest Path First (OSPF) or L2 forwarding protocols such as Spanning Tree. The messages used for this purpose are called control messages and are essential for network operation. In addition, the network manager may want to keep track of traffic statistics and the state of various networking equipment; this is done via network management. Management, although important, differs from control in that it is optional and is often not done for small networks such as home networks.
One of the key innovations of SDN is that the control should be separated from the data plane. The data plane consists of forwarding the packets using the forwarding tables prepared by the control plane. The control logic is separated and implemented in a controller that prepares the forwarding table. The switches implement data plane (forwarding) logic that is greatly simplified. This reduces the complexity and cost of the switches significantly.
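The following sketch illustrates this separation in miniature: a centralized controller computes shortest-path forwarding tables for the whole topology and pushes them to switches whose only job is a table lookup per packet. The topology and names are invented for illustration; this is a conceptual sketch, not the OpenFlow protocol itself.

```python
# Minimal illustration of SDN's control/data-plane split: the controller (control plane)
# computes routes centrally; switches (data plane) only look up a forwarding table.
from collections import deque

class Switch:
    def __init__(self, name):
        self.name, self.table = name, {}       # destination -> next hop
    def forward(self, dst):
        return self.table.get(dst, "drop")     # pure table lookup, no control logic

class Controller:
    def __init__(self, links):
        self.adj = {}
        for a, b in links:                      # build an undirected adjacency map
            self.adj.setdefault(a, set()).add(b)
            self.adj.setdefault(b, set()).add(a)
    def install_routes(self, switches):
        for sw in switches:                     # BFS from each switch to every destination
            dist, prev, queue = {sw.name: 0}, {}, deque([sw.name])
            while queue:
                u = queue.popleft()
                for v in self.adj[u]:
                    if v not in dist:
                        dist[v], prev[v] = dist[u] + 1, u
                        queue.append(v)
            for dst in dist:
                if dst == sw.name:
                    continue
                hop = dst
                while prev[hop] != sw.name:     # walk back to find the first hop
                    hop = prev[hop]
                sw.table[dst] = hop             # "push" the rule to the switch

s1, s2, s3 = Switch("s1"), Switch("s2"), Switch("s3")
Controller(links=[("s1", "s2"), ("s2", "s3")]).install_routes([s1, s2, s3])
print(s1.forward("s3"))   # 's2' -- s1 reaches s3 via s2
```

Because all path computation lives in the controller, changing the routing policy means changing one program rather than reconfiguring every switch, which is precisely the simplification described above.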
6.8 Centralisation of Control Plane
The U.S. Department of Defense funded Advanced Research Projects Agency Network (ARPAnet) research in the 1960s to counter the threat that the entire nationwide communication system could be disrupted if the telecommunication centres, which were highly centralized and owned by a single company at that time, were to be attacked. ARPAnet researchers therefore came up with a totally distributed architecture in which communication continues and packets find a path (if one exists) even if many of the routers become non-operational. Both the data and control planes were totally distributed. For example, each router participates in preparing the routing tables: routers exchange reachability information with their neighbours and their neighbours' neighbours, and so on. This distributed control paradigm was one of the pillars of Internet design and went unquestioned until a few years ago.
Centralization, which was considered a bad thing until a few years ago, is now considered good, and for good reason. Most organizations and teams are run using centralized control. If an employee falls sick, he or she simply calls the boss, and the boss makes arrangements for the work to continue in his or her absence. Now consider what would happen in an organization that is totally distributed. The sick employee, say John, would have to call all his co-workers and tell them that he is sick; they would then tell other employees that John is sick. It would take quite a while before everyone knew about John's sickness, and then everyone would decide what, if anything, to do to alleviate the problem until John recovers. This is quite inefficient, but it is how current Internet control protocols work. Centralization of control makes sensing the state, and adjusting the control dynamically based on state changes, much faster than with distributed protocols.
Of course, centralization has scaling issues but so do distributed methods. For both cases, we need to divide the network into subsets or areas that are small enough to have a common control strategy. A clear advantage of centralised control is that the state changes or policy changes propagate much faster than in a totally distributed system. Also, standby controllers can be used to take over in case of failures of the main controller. Note that the data plane is still fully distributed.
6.9 Programmable Control Plane
Now that the control plane is centralized in a central controller, it is easy for the network manager to implement control changes by simply changing the control program. In effect, with a suitable API, one can implement a variety of policies and change them dynamically as the system state or needs change.

This programmable control plane is the most important aspect of SDN. A programmable control plane in effect allows the network to be divided into several virtual networks that have very different policies and yet reside on a shared hardware infrastructure. Dynamically changing the policy would be very difficult and slow with a totally distributed control plane.
6.10 Standardisation of API
SDN consists of a centralised control plane with a southbound API for communication with the hardware infrastructure and a northbound API for communication with the network applications. The control plane can be further subdivided into a hypervisor layer and a control system layer. A number of controllers are already available. Floodlight is one example. OpenDaylight is a multi-company effort to develop an open source controller. A networking hypervisor called FlowVisor that acts as a transparent proxy between forwarding hardware and multiple controllers is also available.
The main southbound API is OpenFlow, which is being standardized by the Open Networking Foundation. A number of proprietary southbound APIs also exist, such as OnePK from Cisco; these latter ones are especially suitable for legacy equipment from the respective vendors. Some argue that a number of previously existing control and management protocols, such as the Extensible Messaging and Presence Protocol (XMPP), Interface to the Routing System (I2RS), Software Driven Networking Protocol (SDNP), Active Virtual Network Management Protocol (AVNP), Simple Network Management Protocol (SNMP), Network Configuration (NetConf), Forwarding and Control Element Separation (ForCES), Path Computation Element (PCE), and Content Delivery Network Interconnection (CDNI), are also potential southbound APIs. However, given that each of these was developed for another specific application, they have limited applicability as a general-purpose southbound control API.
Northbound APIs have not been standardised yet. Each controller may have a different programming interface, and until this API is standardised, the development of network applications for SDN will be limited. There is also a need for an east-west API that allows different controllers, from neighbouring domains or in the same domain, to communicate with each other.
The networking industry has shown enormous interest in SDN. SDN is expected to make networks programmable and easily partitionable and virtualisable. These features are required for cloud computing, where the network infrastructure is shared by a number of competing entities. Also, given the simplified data plane, the forwarding elements are expected to become very cheap standard hardware. Thus, SDN is expected to reduce both capital and operational expenditure for service providers, cloud service providers, and enterprise data centres that use large numbers of switches and routers. SDN is like a tsunami that is taking over other parts of the computing industry as well. More and more devices are following the software defined path, with most of the logic implemented in software over standard processors. Thus, today we have software defined base stations, software defined optical switches, software defined routers, and so on. Regardless of what happens to current approaches to SDN, it is certain that the networks of tomorrow will be more programmable than those of today. Programmability will become a common feature of all networking hardware so that a large number of devices can be programmed (a.k.a. orchestrated) simultaneously. The exact APIs that become common will be decided by transition strategies, since billions of legacy networking devices will need to be included in any orchestration. It must be pointed out that NFV and SDN are highly complementary technologies; they are not dependent on each other.
6.11 Open Application Delivery Using SDN
While current SDN-based efforts are mostly restricted to L3 and below (network traffic), SDN may be extended to manage application traffic above L3 as well. Application traffic management involves enforcing application deployment and delivery policies on application traffic flows that may be identified by the type of application, the application deployment context (application partitioning and replication, intermediary service access for security, performance, etc.), user and server contexts (load, mobility, failures, etc.), and application QoS requirements. This is required because delivering modern Internet-scale applications has become increasingly complex, even inside a single private data centre.
Key features of OpenADN
OpenADN takes network virtualization to the extreme of making the global Internet look like a single virtual data centre to each application service provider (ASP).
Proxies can be located anywhere on the global Internet. Of course, they should be located in proximity to users and servers for optimal performance.
Backward compatibility means that legacy traffic can pass through OpenADN boxes, and OpenADN traffic can pass through legacy boxes.
No changes to the core Internet are necessary, since only some edge devices need to be OpenADN/SDN/OpenFlow-aware. The remaining devices and routers can remain legacy.
Incremental deployment can start with just a few OpenADN-aware OpenFlow switches.
There are economic incentives for first adopters: ISPs that deploy a few of these switches, and ASPs that use OpenADN, benefit immediately from the technology.
ISPs keep complete control over their network resources, while ASPs keep complete control over their application data, which may be confidential and encrypted.
CHAPTER 7
DESIGN AND IMPLEMENTATION OF ACADEMIC CLOUD: BAADAL AT IIT DELHI
7.1 Introduction
Cloud computing is becoming increasingly popular for its better usability, lower cost, higher utilization, and better management. Apart from publicly available cloud infrastructure such as Amazon EC2, Microsoft Azure, or Google App Engine, many enterprises are setting up "private clouds". Private clouds are internal to the organization and hence provide more security and privacy, as well as better control over usage, cost, and pricing models. Private clouds are becoming increasingly popular not just with large organizations but also with medium-sized organizations that run a few tens to a few hundreds of IT services.
An academic institution (university) can benefit significantly from private cloud infrastructure to service its IT, research, and teaching requirements. The paper discusses the experience of setting up a private cloud infrastructure at the Indian Institute of Technology (IIT) Delhi, which has around 8000 students, 450 faculty members, more than 1000 workstations, and around a hundred server-grade machines to manage the IT infrastructure. With many different departments and research groups requiring compute infrastructure for their teaching and research work, and for other IT services, IIT Delhi has many different "labs" and "server rooms" scattered across the campus. The aim is to consolidate this compute infrastructure by setting up a private cloud and providing VMs to the campus community to run their workloads. This can significantly reduce hardware, power, and management costs, and also relieve individual research groups of management headaches. The cloud infrastructure consists of around 30 servers, each with 24 cores, and 10 TB of shared SAN-based storage, all connected over 10 Gbps fibre. Virtual machines run on this hardware using KVM, and the hosts are managed using a custom management layer developed in Python with libvirt.
7.2 Salient Design Features of the Academic Cloud
While implementing the private cloud infrastructure, the team came across several issues that have previously not been addressed by commercial cloud offerings. Some of the main challenges faced by the team are discussed below:
Workflow: In an academic environment the concern is about simplicity and usability of the workflow for researchers (e.g., Ph.D. students, research staff, faculty members) and administrators (system administrators, policy makers and enforcers, approvers for resource usage).
For authentication, the cloud service is integrated with a campus-wide LDAP server to leverage existing authentication mechanisms, and the service is also integrated with the campus-wide mail and Kerberos servers. A researcher creates a request, which must be approved by the concerned faculty member before it is approved by the cloud administrator. Both the faculty member and the cloud administrator can change the request parameters (e.g., number of cores, memory size, disk size), after which a one-click installation of the virtual machine follows. As soon as the virtual machine is installed, the faculty member and the student are informed, along with a VNC console password that they can use to access the virtual machine. A minimal sketch of this approval workflow appears below.
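The sketch is purely illustrative: the class and field names are invented here, and the real Baadal workflow is implemented in web2py with LDAP, mail, and Kerberos integration rather than as a standalone script.

```python
# Illustrative two-step approval workflow for a VM request (invented names; the real
# Baadal tool integrates LDAP authentication, campus mail, and a web2py front end).
from dataclasses import dataclass, field

@dataclass
class VMRequest:
    researcher: str
    cores: int
    memory_gb: int
    disk_gb: int
    state: str = "PENDING_FACULTY"
    history: list = field(default_factory=list)

    def faculty_approve(self, cores=None, memory_gb=None, disk_gb=None):
        # The faculty member may adjust the requested parameters before approving.
        self.cores, self.memory_gb, self.disk_gb = (
            cores or self.cores, memory_gb or self.memory_gb, disk_gb or self.disk_gb)
        self.state = "PENDING_ADMIN"
        self.history.append("approved by faculty")

    def admin_approve(self):
        self.state = "APPROVED"
        self.history.append("approved by cloud administrator")
        return self.provision()

    def provision(self):
        # One-click installation; the user would then be e-mailed a VNC console password.
        vnc_password = "<generated-secret>"
        self.state = "RUNNING"
        return {"owner": self.researcher, "cores": self.cores,
                "memory_gb": self.memory_gb, "vnc_password": vnc_password}

req = VMRequest(researcher="phd_student", cores=8, memory_gb=16, disk_gb=100)
req.faculty_approve(cores=4)     # faculty scales the request down
print(req.admin_approve())       # admin approves; VM is provisioned
```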
Cost and Freedom: In an academic setting, the concern is also about both cost and the freedom to tweak the software. For this reason, free and open-source infrastructure was chosen. Enterprise solutions like those provided by VMware are both expensive and restrictive. The virtualization stack, comprising KVM, libvirt, and web2py, is open source and freely available.
Workload Performance: The researchers typically need a large number of VMs executing complex simulations and communicating with each other through message-passing interfaces like MPI. Both compute and I/O performance are critical for such workloads. Hardware and software are chosen to provide the maximum performance possible. For example, the best possible bandwidths between the physical hosts, storage arrays, and external network switches are ensured with the available hardware. Similarly, the best possible emulated devices in the virtual machine monitor are used, and whenever possible para-virtual devices are used for maximum performance.
Maximizing Resource Usage: Currently, dedicated high-performance server-class hardware is used to host the cloud infrastructure, and custom scheduling and admission-control policies are employed to maximise resource usage. In future, the plan is to use the idle capacity of labs and server rooms to implement a larger cloud infrastructure at minimal cost. Some details are discussed below. A typical lab contains tens to a few hundred commodity desktop machines, each having one or more CPUs and a few hundred GB of storage, connected over 100 Mbps or 1 Gbps Ethernet. Often these clusters of computers are also connected to a shared Network-Attached Storage (NAS) device. For example, there are around 150 commodity computers in the Computer Science department, and typical utilization of these desktop computers is very low (1-10%). The intention is to use this "community" infrastructure for running the cloud service. The VMs will run in the background, causing no interference to the applications and experience of the workstation user. This can significantly improve the resource utilization of lab machines.
7.3 Challenges
Reliability: In lab environments, it is common for desktops to randomly switch off or become disconnected. These failures can be due to several reasons, including manual reboots, pulled network cables, power outages, or physical hardware failures. Work is in progress on techniques that keep redundant VM images so that the system can recover from such failures.

Network and Storage Topology: Most cloud offerings use shared storage (SAN/NAS). Such shared storage can be a single point of failure, and highly reliable storage arrays tend to be expensive. The use of disk-attached storage in each computer to provide a high-performance shared storage pool with built-in redundancy is under investigation. Similarly, redundancy in the network topology is required to tolerate network failures.
Scheduling: Scheduling of VMs on server-class hardware has been well studied and is implemented in current cloud offerings. Scheduling algorithms are being developed for commodity hardware, where network bandwidths are lower, storage is distributed, and redundancy is implemented. For example, the scheduling algorithm maintains redundant copies of a VM in separate physical environments; a small placement sketch follows.
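The sketch below illustrates only the redundancy idea, with invented host and VM names: replicas of the same VM are always placed on distinct physical hosts, so a single desktop failure does not lose the VM. The real Baadal scheduler considers additional factors such as network bandwidth and load.

```python
# Place each VM replica on a different physical host, preferring the least-loaded host.
# Host and VM names are invented; this is an illustrative sketch, not Baadal's algorithm.
def place_replicas(vm, replicas, host_load):
    placement = []
    for _ in range(replicas):
        candidates = {h: load for h, load in host_load.items() if h not in placement}
        if not candidates:
            raise RuntimeError("not enough distinct hosts for the requested redundancy")
        host = min(candidates, key=candidates.get)   # least-loaded remaining host
        placement.append(host)
        host_load[host] += 1                         # account for the new replica
    return placement

hosts = {"lab-pc-01": 0, "lab-pc-02": 2, "lab-pc-03": 1}
print(place_replicas("vm-sim", replicas=2, host_load=hosts))  # ['lab-pc-01', 'lab-pc-03']
```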
Encouraging Responsible Behaviour: Public clouds charge their users for CPU, disk, and network usage on per CPU-hour, GB-month, and Gbps-month metrics. Instead of a strict pricing model, Baadal relies on good community behaviour, supported by different categories of users.
7.4 Ubuntu Enterprise Cloud
Ubuntu Enterprise Cloud is integrated with the open source Eucalyptus private cloud platform, making it possible to create a private cloud with much less configuration than installing Linux first and then Eucalyptus. The Ubuntu/Eucalyptus internal cloud offering is designed to be compatible with Amazon's EC2 public cloud service, which offers additional ease of use. On the other hand, one needs to be familiar with both Ubuntu and Eucalyptus, and the team was frequently required to search beyond the Ubuntu documentation because of Ubuntu Enterprise Cloud's dependence on Eucalyptus. For example, it was observed that Ubuntu had weak documentation for customising images, which is an important step in deploying the cloud. Further, even though the architecture is quite stable and worth using, it does not serve the requirement of a custom-tailored interface suited to an academic or research environment like IIT Delhi.
7.5 VMware vCloud
VMware vCloud offers on-demand cloud infrastructure such that end users can consume virtual resources with maximum agility. It offers consolidated data centres and an option to deploy workloads on shared infrastructure with built-in security and role-based access control. Migration of workloads between different clouds, and integration with existing management systems using customer extensions, APIs, and open cross-cloud standards, are among the most convincing arguments for using it for a private cloud. Despite these features, and despite being one of the most stable cloud platforms, VMware vCloud may not be an ideal solution for an academic institution owing to its high licensing costs, though it might prove ideal for an enterprise with a sufficiently large budget.
7.6 Baadal: The Workflow Management Tool for Academic Requirements
Currently, Baadal is based on KVM as the hypervisor and on the libvirt API, which serves as a toolkit to interact with the virtualization capabilities. The choice of libvirt is guided by the fact that libvirt can work with a variety of hypervisors, including KVM, Xen, and VMware; thus the underlying hypervisor technology can be changed at a later stage with minimal effort.
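The sketch below shows the kind of libvirt calls a Baadal-style management layer might make. The connection URI and the domain name are assumptions for illustration, and the real tool wraps such calls behind its web-based and command-line front ends.

```python
# Hedged example of driving KVM through the libvirt Python bindings, as a Baadal-style
# management layer might. The URI and VM name below are assumptions, not Baadal's values.
import libvirt

conn = libvirt.open("qemu:///system")        # could equally be a Xen or ESX URI
for dom in conn.listAllDomains():
    state = "running" if dom.isActive() else "shut off"
    print(f"{dom.name():20s} {state}")       # inventory of VMs on this host

dom = conn.lookupByName("baadal-vm-42")      # hypothetical VM created by the workflow
if not dom.isActive():
    dom.create()                              # boot the defined-but-stopped VM
conn.close()
```

Because these calls go through libvirt rather than a hypervisor-specific API, swapping the hypervisor later, as noted above, does not require rewriting the management layer.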
The management software is exposed through two interfaces: a web-based interface and a command-line interface (CLI). The web-based interface is built using web2py, an MVC-based Python framework, and Python is used for the command-line interface as well. The choice of Python as the primary language for the entire project is supported by the excellent support and documentation from the libvirt community.
CHAPTER 8
CONCLUSION
Cloud computing is a result of advances in virtualization of computing, storage, and networking. Network virtualisation is still in its infancy. Numerous standards related to network virtualization have recently been developed in the IEEE and the Internet Engineering Task Force (IETF), and several are still being developed. One of the key recent developments in this direction is software defined networking. The key innovations of SDN are separation of the control and data planes, centralisation of control, programmability, and standard southbound, northbound, and east-west APIs. This will allow a large number of devices to be orchestrated (programmed) easily. OpenFlow is the standard southbound API being defined by the Open Networking Foundation. Work is ongoing on OpenADN, a network application based on SDN that enables application partitioning and delivery in a multi-cloud environment.
The recent developments in cloud computing were reviewed from available literature with specific reference to performance evaluation of cloud environment and innovation in the field of network virtualisation.
The design and implementation of an academic cloud, Baadal, built at IIT Delhi through in-house effort, was examined as a case study to highlight the developments within the country in the sphere of cloud computing.
REFERENCES
1. National Institute of Standards and Technology, U.S. Department of Commerce, Special Publication 800-145, "The NIST Definition of Cloud Computing".
2. Raj Jain and Subharthi Paul, "Networking Virtualization and Software Defined Networking for Cloud Computing: A Survey", IEEE Communications Magazine, November 2013.
3. Niloofar Khanghahi and Reza Ravanmehr, "Cloud Computing Performance Evaluation: Issues and Challenges", International Journal on Cloud Computing: Services and Architecture (IJCCSA), Vol. 3, No. 5, October 2013.
4. George Pallis, "Cloud Computing: The New Frontier of Internet Computing", IEEE Computer Society, September/October 2010.
5. Christian Vecchiola, Suraj Pandey, and Rajkumar Buyya, "High-Performance Cloud Computing: A View of Scientific Applications", Cloud Computing and Distributed Systems (CLOUDS) Laboratory, Department of Computer Science and Software Engineering.
6. Abhishek Gupta, Jatin Kumar, Daniel Mathew, Sorav Bansal, Subhashis Banerjee, and Huzur Saran, "Design and Implementation of the Workflow of an Academic Cloud", IIT Delhi.