[a] Assoc. Prof., VIT University, Vellore, India
[b] UG Student, VIT University, Vellore, India
…Service-Oriented Architecture (SOA), which enables the client to break these problems into services that can be composed to provide a solution. Cloud Computing provides most of its resources as services, and makes use of the established models and best practices gained in the area of SOA to enable global and simple access to cloud services in a standardized manner.

B. FAULT TOLERANCE IN CLOUD COMPUTING

In a distributed system (e.g., a cloud computing environment), the crash of a Logical Process causes the whole process/computation to stop. The likelihood that one of the logical processes will crash during the simulation increases with the number of processes participating in the simulation. Simply restarting the failed process may leave the simulation in an inconsistent state.

Until now, the main response in such a situation has been to restart the whole system. However, simply restarting the system is unsatisfactory for simulations that run for a long time (hours or even days). As a result, some form of fault tolerance is required to keep the erroneous computation to a minimum [1].

A Logical Process may crash because of a bug in the application code, the simulator code, or the operating system code. Even when all of the code is correct, code being run with a distributed simulator may have been written for a sequential simulator. In such cases, it is difficult to find and fix the source of the crash. The user of an application may not be the developer of the code, so the user may be unable (or unwilling) to debug the application even when the bug lies entirely in the application code. The situation would be hopeless if, each time the system was restarted, the same bug led to the same crash [1]. Fortunately, experiments with different software systems have demonstrated that most of the bugs encountered in real scenarios are transient [1].

When the process is restarted, more memory may be available, thereby preventing the crash. Crashes are especially likely to be transient in optimistic simulation, where a different message ordering or a different process scheduling produces a different execution, probably bypassing the bug that caused the crash in the first place. Hence, restarting the failed process is a practical choice, provided steps are taken to ensure that the resulting system state is consistent. A fault tolerance scheme should also be able to tolerate hardware failures. Hardware failures may take the form of a processor malfunction, a power failure, or someone tripping over the connecting wires.

We assume that processes fail simply by crashing and that they do not send any incorrect messages or do any other harm. A process loses all of its volatile memory in a failure. To reduce the amount of wasted computation, a process periodically writes its checkpoints to stable storage; after a failure, it is restarted from its last stable checkpoint. We model a failure as a straggler event with a timestamp equal to the timestamp of the latest checkpoint saved on stable storage. In this model, computation lost because of a failure can be handled in the same way as computation rolled back because of a straggler. In an optimistic scheme, a process may fail without logging any of its received messages since its last checkpoint. This implies that, to reduce the cost of accessing stable storage, messages can be logged only when checkpoints are being written to stable storage. This makes optimistic schemes well suited for distributed simulation, where message activity is high.

II. EXISTING APPROACHES

A. HADOOP

One of the first mainstream attempts at handling big data was made by Apache's Hadoop. It was created by Doug Cutting and Mike Cafarella in 2005 [1]. Doug Cutting named it after his son's yellow toy elephant, hence the logo as well.

Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a Cloud Computing environment. It is part of the Apache project sponsored by the Apache Software Foundation [1].

B. HADOOP'S CORE IDEA

Using a single system or server to handle terabytes and petabytes of data was impractical: it would take a great deal of time to store, analyse, and retrieve the data. So, what Hadoop, and Google before it, came up with was to split the big data into small chunks and store them in various individual storage locations. This reduces the time needed to work on the data. This is similar to the clustering method described earlier in Section 1.2.

Figure 1: Basic Working of Hadoop

Figure 1 shows that all of the data being operated upon is stored in the Hadoop Distributed File System, or HDFS [Apache HDFS]. The data is first allocated space in the cluster, taken out for an operation to be performed on it, and then returned to the cluster along with a suitable output. The operations on the data are carried out using MapReduce. For all operations on the data, it is easier to access data from the cluster of storage devices where the data is stored; the cluster consists of servers that are used to house the data [1]. For further studies, the technical contributions of authors in the field of ant colony and networks are available in [2].
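As a rough illustration of the MapReduce flow described above, the following sketch (plain Python, not the actual Hadoop API; the word-count task, input chunks, and function names are illustrative assumptions) splits input into chunks, maps each chunk to key-value pairs, shuffles the pairs by key, and reduces each group to a final value:

```python
from collections import defaultdict

# Illustrative stand-ins for Hadoop's map and reduce phases (not the real API).
def map_phase(chunk):
    # Emit a (word, 1) pair for every word in this chunk of the input.
    return [(word, 1) for word in chunk.split()]

def shuffle(mapped_pairs):
    # Group all emitted values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts for each word to get its total frequency.
    return {key: sum(values) for key, values in groups.items()}

# The input is split into chunks, mimicking blocks stored across HDFS nodes.
chunks = ["big data big clusters", "data stored in clusters"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 2, 'data': 2, 'clusters': 2, 'stored': 1, 'in': 1}
```

In real Hadoop the map and reduce tasks run on the nodes holding the data blocks, so each chunk here would be processed where it is stored rather than in a single process.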
…of the data use. If this is done, it would allow latency to be reduced while still allowing for smooth, synchronous access.

…VMs. A broker implements the policies for selecting a VM to run a Cloudlet and a Datacenter to run the submitted VMs.
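A minimal sketch of such a broker policy (plain Python rather than the actual CloudSim API; the round-robin policy, class names, and IDs are illustrative assumptions) could look like this:

```python
# Hypothetical stand-ins for a simulator's entities (not the CloudSim classes).
class Vm:
    def __init__(self, vm_id):
        self.vm_id = vm_id

class Cloudlet:
    def __init__(self, cloudlet_id):
        self.cloudlet_id = cloudlet_id

class Broker:
    """Assigns each submitted Cloudlet to a VM using a simple round-robin policy."""
    def __init__(self, vms):
        self.vms = vms
        self.next_vm = 0

    def select_vm(self, cloudlet):
        # Pick the next VM in cyclic order, so load spreads evenly.
        vm = self.vms[self.next_vm % len(self.vms)]
        self.next_vm += 1
        return vm

broker = Broker([Vm(0), Vm(1)])
assignments = {c.cloudlet_id: broker.select_vm(c).vm_id
               for c in [Cloudlet(10), Cloudlet(11), Cloudlet(12)]}
print(assignments)  # {10: 0, 11: 1, 12: 0}
```

A real broker would also weigh VM capacity, current load, and datacenter placement; round-robin is used here only to make the policy concrete.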
Figure: Response times from the process, graphically

COST
Total Virtual Machine Cost   $0.50
Total Data Transfer Cost     $0.28
Grand Total                  $0.78
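The grand total above is the sum of the two cost components; as a quick check:

```python
# Cost components taken from the table above (USD).
vm_cost = 0.50
data_transfer_cost = 0.28

grand_total = vm_cost + data_transfer_cost
print(f"Grand Total: ${grand_total:.2f}")  # Grand Total: $0.78
```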
REFERENCES

[1] Rimal, B. P., Choi, E., & Lumb, I. (2009). A Taxonomy and Survey of Cloud Computing Systems. NCM, 9, 44-51.
[2] Zhao, W., Melliar-Smith, P. M., & Moser, L. E. (2010, July). Fault tolerance middleware for cloud computing. In Cloud Computing (CLOUD), 2010 IEEE 3rd International Conference on (pp. 67-74). IEEE.
[3] Bala, A., & Chana, I. (2012). Fault tolerance - challenges, techniques and implementation in cloud computing. IJCSI International Journal of Computer Science Issues, 9(1), 1694-0814.
[4] Jhawar, R., Piuri, V., & Santambrogio, M. (2013). Fault tolerance management in cloud computing: A system-level perspective. IEEE Systems Journal, 7(2), 288-297.
[5] Gong, C., Liu, J., Zhang, Q., Chen, H., & Gong, Z. (2010, September). The characteristics of cloud computing. In Parallel Processing Workshops (ICPPW), 2010 39th International Conference on (pp. 275-279). IEEE.
[6] Zhang, Q., Cheng, L., & Boutaba, R. (2010). Cloud computing: state-of-the-art and research challenges. Journal of Internet Services and Applications, 1(1), 7-18.
[7] Jhawar, R., Piuri, V., & Santambrogio, M. (2012, March). A comprehensive conceptual system-level approach to fault tolerance in cloud computing. In Systems Conference (SysCon), 2012 IEEE International (pp. 1-5). IEEE.
[8] Das, P. (2013). Virtualization and Fault Tolerance in Cloud Computing. MTech thesis.
[9] Malik, S., & Huet, F. (2011, July). Adaptive fault tolerance in real time cloud computing. In Services (SERVICES), 2011 IEEE World Congress on (pp. 280-287). IEEE.
[10] Zikopoulos, P., & Eaton, C. (2011). Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill Osborne Media.