
MAIN PROJECT REPORT

HIERARCHICAL DATA BACKUP


SUBMITTED IN PARTIAL FULFILMENT OF THE DEGREE OF

BACHELOR OF TECHNOLOGY

by
AJITH.K Y2058
SRIJITH.P.K Y2107
GROUP NUMBER 22

Under the guidance of

Sri. Vinod. Pathari

Department of Computer Engineering


National Institute of Technology, Calicut
Acknowledgement

We thank Sri. Vinod Pathari, Lecturer in the Department of Computer Science and
Engineering, for his guidance and advice towards the completion of this project. We also
thank our friends for their help in making this software more useful.

AJITH.K
SRIJITH.P.K

Abstract

The project entitled ‘Hierarchical data backup’ aims to design and develop a system for
the reliable backup of critical data. Current backup strategies are based on direct
attached storage, in which data is backed up to a hard disk attached directly to a
dedicated centralized server. Our project aims to develop a more reliable, secure and
dependable backup scheme based on distributed and hierarchical storage of data. The
project has three parts. First, the data to be backed up is stored on a set of nodes in a
LAN rather than on a single node or server, providing a distributed backup mechanism.
Second, this backup is extended with an external storage device to provide more reliable
storage. Third, a protocol is developed for the hierarchical backup of data over the
intranet, building on the external storage device.

Contents

1. Problem specification

2. Literature survey

3. Motivation

4. Design
4.1 Backup procedure
4.2 Deletion procedure
4.3 Retrieval procedure

5. Implementation

6. Testing and verification
6.1 Results

7. Conclusion

8. References

1. Problem specification

The aim is to develop a distributed and hierarchical data backup scheme with which
critical data can be stored reliably and securely. First, we develop a LAN based protocol
for storing critical data securely over a set of nodes on a LAN. Next, we introduce an
external storage device and a protocol for backing up more critical data within the LAN,
securely and reliably, using this device. Finally, we develop an intranet sharing protocol
to back up the most critical data in another LAN.

2. Literature Survey

Dependence on computers for data storage is increasing rapidly. Data stored in a
computer may be lost because of user error, data corruption, hardware failure or
disaster [2]. To protect against such losses, industries and educational institutions
have adopted many backup methods. The most common are disk to tape backups, such as
direct attached backups, centralized LAN backups and LAN free backups, and disk to disk
backups [3]. In the direct attached backup strategy, a tape backup unit is attached
directly to each server and that server's data is backed up to it. In the centralized
backup strategy, a designated server known as the backup server manages the backup of
data from all servers to a tape unit attached directly to the backup server. LAN free
backups in most cases make use of storage area networks (SAN) [4]. A storage area network
is a dedicated high speed network containing a large number of storage elements, with a
number of servers managing the backup of data to these storage elements. Network attached
storage (NAS) servers are dedicated file servers that store and retrieve files for other
general purpose servers and computers. Whereas general-purpose production servers are
loaded with applications that consume storage, NAS servers are stripped of unnecessary
hardware (there is no monitor, keyboard or mouse) and software applications, and use only
those components of the operating system required for file serving, thus maximizing the
disk space available for storage [5].

3. Motivation

The value of data is felt most when it is lost irrecoverably. Heavy dependence on
computers as data storage mechanisms can be justified only if proper backup facilities
are available to cope with critical failures. The usual backup strategies depend heavily
on backing up data to a single disk or tape attached directly to a server. This approach
has significant drawbacks. It scales poorly, since increased demand for storage capacity
must be met by adding servers. The file processing functions (such as data storage and
retrieval) compete directly with applications for system resources. The proposed work is
an attempt to derive a distributed and hierarchical backup mechanism which is both cost
effective and reliable [1]. The distributed backup protocol securely backs up critical
data over a set of nodes that are willing to share space for storing backup data, thereby
increasing reliability. The hierarchical data backup protocol stores the critical data on
an external storage device (a NAS) and then over the intranet, adding further reliability.

4. Design

We first deal with the design of the first phase of the project, i.e. a distributed file
backup system which allows us to back up critical files over the LAN. The system is
designed to be scalable and adaptable to the succeeding phases.

The architectural model for the system is a variation of the client server model. We
maintain a server to monitor all transactions. It maintains the necessary atomicity of
transactions and provides data consistency and isolation. We assume a lightly loaded
situation in which data backup does not result in heavy traffic on the network. Even
though it looks like a client server paradigm, the data to be backed up is stored on
arbitrary nodes and not on the server. Hence the failure of the server does not affect the
backed up data. Each node on the network knows on which node it has backed up its data.

[Figure: two clients connected to a central server]

We maintain a server to manage data backup over the nodes in a LAN. Any authorized user
in the LAN who is ready to share a fixed amount of storage for backing up other users'
data can back up the same amount of data on remote nodes in the LAN in a distributed
manner, i.e. different files may be backed up on different nodes. If a user wants to save
a file on a remote node, he sends a request to the server. The server decides on which
node the file is to be backed up and backs up the file to that node. The server maintains
all the information regarding the data backup, i.e. the node and the filename under which
each backup request from a user has been saved. The deletion of files backed up on remote
nodes and the retrieval of backed up files are also handled by the server.
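
As an illustration of the bookkeeping described above, the following C sketch keeps one record per backed up file and looks up where a given user's file was stored. The struct layout, the field sizes and the names `backup_record` and `find_record` are assumptions made for this example; the report only states which pieces of information the server keeps, not how they are represented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry of the server's backup table (hypothetical layout). */
struct backup_record {
    char source_node[64];  /* node that requested the backup */
    char source_file[256]; /* filename as known to the requesting user */
    char target_node[64];  /* node chosen by the server to hold the copy */
    char target_file[256]; /* filename under which the copy was saved */
};

/* Return the record for (source_node, source_file), or NULL if that
 * file has not been backed up. A linear scan is enough for a sketch. */
const struct backup_record *
find_record(const struct backup_record *table, size_t count,
            const char *source_node, const char *source_file)
{
    assert(source_node != NULL && source_file != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].source_node, source_node) == 0 &&
            strcmp(table[i].source_file, source_file) == 0) {
            return &table[i];
        }
    }
    return NULL;
}
```

Keying the lookup on both the source node and the source filename lets two users back up files with the same name without the records colliding.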

The design for the distributed data backup is based on an interaction model. Computation
occurs within the processes; the processes interact by passing messages, resulting in
communication (i.e. information flow) and coordination (synchronization and ordering of
activities) between processes. The interaction model reflects the fact that communication
takes place with delays of considerable duration, and that the accuracy with which
independent processes can be coordinated is limited by these delays. The model we have
designed is a synchronous system in which we keep a timeout mechanism and buffering of
data to deal with process omission failures caused by system crashes and with
communication omission failures.
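
A timeout of the kind mentioned above could be sketched in C as follows. This is only an illustration under assumptions: the report does not say which system call the project used, so the choice of `poll`, the function names `wait_readable` and `read_with_timeout`, and the return-code convention are all hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait until fd can be read or timeout_ms milliseconds elapse.
 * Returns 1 if a read will not block, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(timeout_ms >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1; /* poll itself failed */
    if (rc == 0)
        return 0;  /* peer stayed silent: treat as an omission failure */
    /* POLLHUP means read() will report end of stream without blocking. */
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}

/* Read up to cap bytes, giving up after timeout_ms.
 * Returns the byte count (0 = peer closed), -1 on error, -2 on timeout. */
ssize_t read_with_timeout(int fd, void *buf, size_t cap, int timeout_ms)
{
    int ready = wait_readable(fd, timeout_ms);

    if (ready == 0)
        return -2;
    if (ready < 0)
        return -1;
    return read(fd, buf, cap);
}
```

Returning a distinct code for a timeout lets the caller tell a silent peer apart from a closed connection, which matters when deciding whether to retry or to report the node as down.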

The figures below give sequence diagrams for our system design. They cover three
scenarios: backing up a file, deleting a backed up file, and retrieving a backed up file.

4.1 Backup procedure

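[Sequence diagram: ClientA, Server, ClientB]

1. ClientA to Server: backup request
2. Server: search quota; if OK, search for a lightly loaded node
3. Server to ClientA: backup ack
4. File send (ClientA to Server, then Server to ClientB)
5. ClientB to Server: file saved
6. Server: update tables
7. Server to ClientA: file saved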

4.2 Deletion procedure

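[Sequence diagram: ClientA, Server, ClientB]

1. ClientA to Server: deletion request
2. Server: search dest_node and file
3. Server to ClientB: delete file
4. ClientB to Server: file deleted
5. Server: update tables
6. Server to ClientA: file deleted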

4.3 Retrieval procedure

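[Sequence diagram: ClientA, Server, ClientB]

1. ClientA to Server: retrieve request
2. Server: search dest_node and file
3. Server to ClientB: retrieve file
4. ClientB to Server: file retrieved
5. Server: update tables
6. Server to ClientA: file retrieved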

The following design shows the architectural model for the LAN based protocol that uses
an external storage device. In this case we plan to store more critical data on an
external storage device, a NAS, in addition to storing it on a remote node. This
increases the reliability and availability of the backed up data. The design shown below
uses a server to control the backing up of data to the NAS.

[Figure: clients connected to a server, which controls backup to the NAS]

5. Implementation

The project was envisioned to be implemented in two phases. The first phase implemented
the data backup mechanism in a LAN, in a Linux environment. In this phase we implemented
the backing up of data on different nodes within a LAN. A central server decides how
backups are distributed over the clients belonging to the group, choosing the most lightly
loaded node. To use the facilities provided by the software, a node in the LAN first has
to join the group by sending a join message to the server. Only nodes with sufficient
storage space are allowed to join the group. Once a node has joined the group it can use
the facilities offered by the software. Through the software a client can save files on a
remote node, retrieve saved files from a remote node, delete backed up files, view
information about backed up files, join the group and unsubscribe from the group. The
client is provided with a graphical user interface, developed in Qt, to perform these
functions.

The client side requires two processes to run continuously once the client has joined the
group. The processes were implemented in C/UNIX and use UNIX sockets to communicate over
the network. The first process always listens on a predefined port for messages from the
server. Whenever it receives the broadcast message “ARE_YOU_UP” from the server, it sends
back the message “AYU_Answer”. The server uses these answers to determine which nodes in
the group are currently up. This process is implemented as a daemon.

The second process always listens on a predefined TCP port for user initiated messages
relayed by the server. It may receive three kinds of messages. The first asks it to save
a remote file on this node: the process generates a unique filename in the folder
‘remotefile’, saves the remote file under that name and sends the filename back to the
server. The second asks it to return a file to the server: the process opens the file and
sends its contents back to the server through the open socket. The third asks it to
delete a file saved on this node: the process tries to delete the file and sends a status
message back to the server.

The client side regularly checks whether the client is exceeding the quota required to
remain in the group. If it is, the client sends an unjoin message to the server, which
removes the client from the group, and the user is notified through a popup. To save
files again, the user has to delete some unwanted files and rejoin the group. The client
side uses ‘c_searchfile’ to obtain information about the remote nodes on which its backed
up files are stored. The client side can perform the following operations.
1. JOIN
When the user chooses to join a group, the client sends a join message to the server.
Once the client has joined the group it can back up files on remote nodes.
Message format
Client to server: JOIN:END

2. SAVE A FILE
The user specifies the name of the file to be saved. The client performs some initial
error checks, such as whether the file can be opened, whether the same file has already
been saved, and whether the maximum allowable storage space would be exceeded. A message
is then sent to the server to save the file. After this the file is opened and sent to
the server for backing up. The server sends back the target node and the target filename
under which the file is backed up.
Message format
Client to server: SAVE:FILENAME:FILESIZE:END

3. DELETE A FILE
When the user specifies the filename to be deleted, the client performs some error
checks, such as whether the file was saved at all, and then sends a deletion message to
the server specifying the filename to be deleted.
Message format
Client to server: DELETE:FILENAME:END

4. RETRIEVE A FILE
When the user specifies the filename to be retrieved, the client performs some initial
error checks and then sends a retrieval message to the server. The retrieved file is
saved in the folder ‘retrieved’.
Message format
Client to server: RETRIEVE:FILENAME:END

5. VIEW SAVED FILES
The client shows the user the files saved on remote nodes. It obtains this information
from ‘c_searchfile’. The fields of the file are source filename, target node, target
filename, file size and the server time at which the file was saved.

6. UNJOIN
This option lets the client unsubscribe from the group. When a client no longer wants to
be in the group, it sends an unjoin message to the server. When a client unsubscribes
from the group, all files backed up by this node on remote nodes are deleted, and the
files other nodes have backed up on this node are restored elsewhere.
Message format
Client to server: UNJOIN:END

The communication between client and server uses the reliable TCP/IP protocol.
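
The colon-separated client messages listed above could be built and parsed along the following lines. This is a minimal sketch, not the project's code: the function names `build_save_msg` and `parse_msg` are hypothetical, and rejecting filenames that contain ':' is an assumption introduced here because the report does not say how such names are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "SAVE:<filename>:<filesize>:END" into out.
 * Returns 0 on success, -1 if the buffer is too small, the size is
 * negative, or the filename contains ':' (which would break framing). */
int build_save_msg(char *out, size_t cap, const char *filename, long filesize)
{
    int n;

    assert(out != NULL && filename != NULL);
    if (strchr(filename, ':') != NULL || filesize < 0)
        return -1;
    n = snprintf(out, cap, "SAVE:%s:%ld:END", filename, filesize);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Split msg in place on ':' and require a trailing END marker.
 * On success stores pointers to the fields before END in fields[]
 * and returns their count; returns -1 for a malformed message. */
int parse_msg(char *msg, char *fields[], int max_fields)
{
    int count = 0;
    char *p = msg;

    assert(msg != NULL && fields != NULL);
    while (p != NULL) {
        char *sep = strchr(p, ':');
        if (sep != NULL)
            *sep = '\0';
        /* END only terminates the message when nothing follows it. */
        if (sep == NULL && strcmp(p, "END") == 0)
            return count > 0 ? count : -1;
        if (count == max_fields)
            return -1;
        fields[count++] = p;
        p = (sep != NULL) ? sep + 1 : NULL;
    }
    return -1; /* input ended without the END marker */
}
```

The same parser handles JOIN:END and UNJOIN:END, which carry a command name and no further fields.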

The server side mainly requires two processes to run uninterrupted. The first broadcasts
an “ARE_YOU_UP” message over the LAN at regular intervals through a broadcast socket
configured to use UDP, and waits for responses from the clients. On receiving a response,
the process infers the IP address of the client from the message header. In this way the
process finds out which client nodes in the group are currently up and updates its list
of currently up nodes accordingly. The second process always listens on a predefined
TCP/IP port for connections from clients. The server acts as a concurrent server,
servicing clients while listening for new client connections. The server may receive five
kinds of messages from a client.

1. On receiving a save message from a client, the server first finds a suitable
candidate among the nodes that are currently up: the one on which the least amount of
file data has been backed up by the server. It then sets up a connection with the target
node and sends it the file to be saved. Next it waits for a message from the target node,
updates the necessary files on the server side and sends a status message back to the
client. This is either an error message or a success message; in the latter case it
carries the target node name, the filename under which the file is backed up and the
server time at which this was done. The server stores all information about each backup,
that is source node, source filename, target node, target filename, size and time of
backup, in the file ‘searchfile’. It also updates the ‘nodestorage’ file, which stores the
amount of remote data stored on each node. The server finds candidate target nodes by
searching the ‘nodestorage’ file, whose entries are always kept sorted in ascending order
of stored size. Hence the most lightly loaded node is always at the top.
Message format
Server to target node: SAVE:SOURCE NODE:SOURCEFILE:FILESIZE:END
Target node to server: SAVED:TARGET FILENAME:END
Server to client: SAVED:TARGETNODE:TARGETFILENAME:TIME:END

2. On receiving a retrieve message from a client, the server finds the target node and
target filename under which the file is saved by searching the ‘searchfile’. It sets up a
connection with the target node, retrieves the contents of the file by sending a retrieve
message to it, and sends them back to the client.
Message format
Server to target node: RETRIEVE:TARGETFILENAME:FILESIZE:END

3. On receiving a delete message from a client, the server finds the target node and
target filename, sends a delete message to the target node and sends a status message
back to the client. The status is an error message if the target node on which the file
has been backed up is not currently up. If the deletion is successful, the server removes
the corresponding entry from the ‘searchfile’ and updates the ‘nodestorage’ file.
Message format
Client to server: DELETE:FILENAME:END
Server to target node: DELETE:TARGETFILENAME:END

4. On receiving a join message from a client, the server adds the client to the group and
sends a success message back to the client. The server creates an entry in the
‘nodestorage’ file with the storage amount initialized to zero.
Message format
Server to client: SUCCESS

5. On receiving an unjoin message from a client, the server tries to unsubscribe the
client from the group. It tries to delete all the files stored by the client on the other
nodes in the group. Then it tries to move the files that other members have stored on
this client to other members of the group, and sends the restore information back to each
source node. The server also handles the situation in which not all members of the group
are up. This enables the client to unsubscribe from the group whenever he wants.
Message format
Server to client: SUCCESS
Server to source node: RESTORE:SOURCEFILE:TARGETNODE:TARGETFILE:TIME:END
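
The target selection described for the save message (item 1 above) can be illustrated with a short C sketch. It keeps the node table sorted by stored size and returns the first eligible node. The in-memory struct, the names `node_entry` and `pick_target`, the rule that a node is never chosen to hold its own backup, and the per-node `share_bytes` limit are assumptions for this example; the report does not describe the on-disk format of ‘nodestorage’.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct node_entry {
    char name[64];     /* node identifier, e.g. its IP address */
    long stored_bytes; /* remote data currently stored on this node */
    int  is_up;        /* 1 if the node answered the last ARE_YOU_UP */
};

/* qsort comparator: ascending by stored_bytes. */
static int by_stored_bytes(const void *a, const void *b)
{
    const struct node_entry *x = a;
    const struct node_entry *y = b;

    return (x->stored_bytes > y->stored_bytes) -
           (x->stored_bytes < y->stored_bytes);
}

/* Sort the table, then return the first node that is up, is not the
 * requesting node, and has room for file_size bytes within its share.
 * Returns NULL when no node qualifies. */
struct node_entry *pick_target(struct node_entry *nodes, size_t count,
                               const char *source, long file_size,
                               long share_bytes)
{
    assert(nodes != NULL && source != NULL && file_size >= 0);
    qsort(nodes, count, sizeof nodes[0], by_stored_bytes);
    for (size_t i = 0; i < count; i++) {
        if (!nodes[i].is_up)
            continue; /* cannot receive a file right now */
        if (strcmp(nodes[i].name, source) == 0)
            continue; /* a local copy would not be a backup */
        if (nodes[i].stored_bytes + file_size > share_bytes)
            continue; /* would exceed the space this node shares */
        return &nodes[i];
    }
    return NULL;
}
```

Because the table is sorted in ascending order, the first node that passes the checks is also the most lightly loaded one, which matches the behaviour described for ‘nodestorage’.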

The server side is also provided with an easy-to-use graphical user interface through
which the administrator of the server can view all the backup information stored on the
server. This graphical user interface is also developed in Qt.
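
The “ARE_YOU_UP” exchange between the server's broadcasting process and the client daemon, described earlier in this section, could look roughly like the C sketch below. The port number `AYU_PORT` and the function names are hypothetical, and the use of the limited broadcast address 255.255.255.255 is an assumption; the report only says that a UDP broadcast socket and a predefined port are used.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define AYU_PORT  5000          /* hypothetical port, not from the report */
#define AYU_PROBE "ARE_YOU_UP"
#define AYU_REPLY "AYU_Answer"

/* Server side: create a UDP socket that may send to a broadcast address. */
int make_broadcast_socket(void)
{
    int on = 1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Server side: send one probe to the LAN. Returns 0 on success. */
int send_probe(int fd)
{
    struct sockaddr_in dst;
    ssize_t n;

    assert(fd >= 0);
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(AYU_PORT);
    dst.sin_addr.s_addr = htonl(INADDR_BROADCAST);
    n = sendto(fd, AYU_PROBE, strlen(AYU_PROBE), 0,
               (struct sockaddr *)&dst, sizeof dst);
    return n == (ssize_t)strlen(AYU_PROBE) ? 0 : -1;
}

/* Client side: the reply to send for a received datagram, or NULL
 * if the datagram is not a probe and should be ignored. */
const char *ayu_reply(const char *msg, size_t len)
{
    if (len == strlen(AYU_PROBE) && memcmp(msg, AYU_PROBE, len) == 0)
        return AYU_REPLY;
    return NULL;
}
```

On the server, the sender address filled in by `recvfrom` for each reply gives the IP address of a node that is currently up.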

In the second phase of the project, an external storage medium, an Iomega NAS, was
purchased and configured for use in the software lab. We configured the NAS as a network
hard drive with the static IP address 192.168.3.90 and the machine name csed-lab. The NAS
can be administered at http://192.168.3.90/. It was configured so that user logins can be
created and quotas assigned. Individual users can access their logins through a Samba
client (smb://csed-lab/). The NAS is accessible only to users in the same workgroup as
the NAS. The NAS can also be mounted remotely on a machine using the command
'mount -t smbfs -o username=(user),password=(pass) //csed-lab/ /path/to/mountpoint'.
All sharing, user creation, maintenance, quota setting etc. can be configured at
http://192.168.3.90/ by the administrator. The second phase of the project is done by
mounting the NAS on a local folder of the server, so that the server can freely write to
and read from the NAS. A program was developed to write, retrieve and delete files on the
NAS. This gives the user the additional option of saving a copy of his critical data on
the NAS as well. The only access to the NAS is through the server.
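
Since the NAS is mounted on a local folder of the server, writing and retrieving a file reduce to ordinary file copies into and out of that folder, and deleting reduces to removing a file there. The C sketch below illustrates this under assumptions: the helper names `nas_path` and `copy_file` are hypothetical, and the rule of rejecting names that contain '/' is added here as a precaution rather than taken from the report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<mount_dir>/<name>" into out. Names that are empty, are "."
 * or "..", or contain '/' are rejected so that a request cannot point
 * outside the mounted folder. Returns 0 on success, -1 otherwise. */
int nas_path(char *out, size_t cap, const char *mount_dir, const char *name)
{
    int n;

    assert(out != NULL && mount_dir != NULL && name != NULL);
    if (name[0] == '\0' || strchr(name, '/') != NULL ||
        strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return -1;
    n = snprintf(out, cap, "%s/%s", mount_dir, name);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Copy src to dst with buffered stdio. Returns 0 on success, -1 on error. */
int copy_file(const char *src, const char *dst)
{
    char buf[4096];
    size_t n;
    int ok = 1;
    FILE *in = fopen(src, "rb");
    FILE *out;

    if (in == NULL)
        return -1;
    out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0; /* short write, e.g. quota or disk full */
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0) /* flush errors surface here */
        ok = 0;
    return ok ? 0 : -1;
}
```

With these helpers, a write is `copy_file(local, path_on_nas)`, a retrieval is `copy_file(path_on_nas, local)`, and a deletion is the standard `remove(path_on_nas)`.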

6. Testing and verification

The testing and verification of the project was done on an incremental basis. The
programs for performing broadcast in the LAN were developed and tested first, on the
Linux machines of the software lab. Initially we could not obtain the expected results
because of the firewall configuration of these machines. After disabling the firewall on
some machines we obtained the expected results. Next, the server and client programs were
developed and deployed on the Linux machines of the software lab. Each program was tested
and verified separately. All the programs were then integrated and tested together. The
following cases were analyzed during testing.

1. JOIN
The following cases are tested and verified when a user tries to join a group.
a. The user has already joined the group.
b. The user does not meet the storage requirements to join the group.
c. The user meets the storage requirements to join the group.

2. SAVE FILE
When the user specifies a filename to save, the following cases are first tested on the
client side.
a. Whether the file can be opened.
b. Whether the file has already been saved.
c. Whether the size of the file plus the total size of the files already saved exceeds
the maximum allowable storage limit.
The following cases, which may arise during the backing up of files, are then tested and
verified.
a. No other nodes in the group are currently up.
b. The size of the file exceeds the free allocated space on the currently up nodes.
c. The file could not be saved on the remote node.
d. The file is saved successfully on the remote node.

3. DELETE FILE
When the user specifies a filename to be deleted, an initial check is made that such a
file has been saved. The following cases are tested and verified.
a. The node on which the file has been saved is not up.
b. The file could not be deleted from the remote node.
c. The file could be successfully deleted from the remote node.

4. RETRIEVE FILE
When the user specifies the filename to be retrieved, an initial check is made that the
file has been saved. The following cases are then tested and verified.
a. The node on which the file has been saved is not up.
b. The file could not be retrieved from the remote node.
c. The file could be successfully retrieved from the remote node.

5. UNJOIN
When the user chooses to unsubscribe from the group, an initial check is made that he has
already joined the group. The following cases are tested and verified.
a. A remote node on which he has backed up a file is not up.
b. All remote nodes on which he has saved files are up.
c. No other nodes are currently up, so that files initially stored by other nodes on
this node cannot be restored immediately.
d. The free allowable space on the currently up nodes is not enough to perform the
restoration.
e. A source node which has saved a file on this node is not up.
f. All nodes are currently up.

The graphical user interface developed in Qt was also fully tested and verified in the
software lab. The project as a whole was then run through the graphical user interface,
tested and verified.
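
The client-side storage limit check listed under SAVE FILE above can be expressed as a small pure function, which makes the boundary cases easy to test. The enum and the function name `check_quota` are hypothetical, and it is an assumption of this sketch that the limit equals the fixed amount of storage the user shares with the group.

```c
#include <assert.h>

enum save_check { SAVE_OK, SAVE_BAD_SIZE, SAVE_OVER_QUOTA };

/* Decide whether a file of file_size bytes may be saved when the user
 * has already backed up used_bytes out of an allowance of quota_bytes. */
enum save_check check_quota(long file_size, long used_bytes, long quota_bytes)
{
    assert(used_bytes >= 0 && quota_bytes >= 0);
    if (file_size < 0)
        return SAVE_BAD_SIZE;
    /* Compare against the remaining space to avoid overflow in a sum. */
    if (used_bytes > quota_bytes || file_size > quota_bytes - used_bytes)
        return SAVE_OVER_QUOTA;
    return SAVE_OK;
}
```

A file that exactly fills the remaining allowance is accepted; one byte more is rejected.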

6.1 Results

The project has been successfully tested and verified, and the software was successfully
deployed in the software lab. The tests showed no anomalous behavior; the software behaved
as expected. The following results were observed when the software was tested against the
various use cases.
1. JOIN
a. If the user has already joined the group, the user is notified about it.
b. If the user does not have enough space, the user is notified about it.
c. Otherwise the user is notified that he has successfully joined.

2. SAVE FILE
a. Proper error messages are shown for all initial error checks.
b. Proper error messages are shown for each of these cases: no other nodes in the group
are currently up, the file size exceeds the storage available on the currently up nodes,
and the file could not be saved on the remote node.
c. If the file was successfully saved, the user is notified about it along with the
remote node and the filename under which it is stored.

3. DELETE FILE
a. Proper error messages are shown for initial error checks.
b. If the remote node is not up, the user is notified about it.
c. If the file was successfully deleted, the user is notified about it.

4. RETRIEVE FILE
a. Proper error messages are shown for initial error checks.
b. If the remote node is not up, the user is notified about it.
c. If the file could be retrieved, the user is notified about it along with the name
under which the retrieved file is stored on the client side.

5. UNJOIN
a. The user is notified about the successful unsubscription from the group.

7. Conclusion

Software for distributed backup of data in a LAN was developed. The software can back up
files on remote nodes, delete files from remote nodes, retrieve files from remote nodes,
join the group and unsubscribe from the group. The software is also provided with an
easy-to-use graphical user interface. A program was developed to write, delete and
retrieve files on the external storage medium (NAS), which provides additional reliability
for data storage.

8. References

[1] Jerome H. Saltzer, Needed: A systematic structuring paradigm for distributed data,
ACM SIGOPS 5th European Workshop, 2000.
[2] Disaster Recovery: Best Practices. White paper. Cisco network solutions, August 2003,
http://www.cisco.com/warp/public/63/disrec.html.
[3] Disaster Recovery Planning. White paper. Cisco network solutions, December 2003,
http://whitepapers.techrepublic.com.com/whitepaper.aspx
[4] Using Network Attached Storage for Reliable Backup and Recovery. White paper.
Microsoft Corporation and Dell,
http://www.dell.com/downloads/global/products/pvaul/en/nasReliableBackups.pdf
[5] Business case for network attached storage. White paper. June 2005,
http://whitepapers.techrepublic.com.com/abstract.aspx