
A Seminar Report on

UR TITLE NAME Submitted in partial fulfillment of the requirement for the award of

Bachelor of Technology In COMPUTER SCIENCE & ENGINEERING From



DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING LAQSHYA INSTITUTE OF TECHNOLOGY & SCIENCE (Approved by AICTE, New Delhi & Affiliated to JNTU, Hyderabad) TANIKELLA (V), KHAMMAM (M), KHAMMAM (Dt). A.P. India -507305 Ph: 08742 211306 http://www.laqshya.edu.in/



This is to certify that the dissertation entitled GRID COMPUTING is a bona fide work done by AMBADIPUDI RAJESH, 08RC1A0503, in partial fulfillment of the requirements for the degree of Bachelor of Technology in Computer Science & Engineering from JNTU, Hyderabad, during the year 2011-2012.

Mr.SK.SHARFUDDIN M.Tech Assistant Professor Supervisor

Mrs. M. Sri Devi M.Tech, (Ph.D.) Associate Professor H.O.D., C.S.E.


The satisfaction that accompanies the successful completion of any task would be incomplete without the mention of the people who made it possible and whose encouragement and guidance have been a source of inspiration throughout the course of the project.

It is my privilege and pleasure to express my profound sense of gratitude and indebtedness to Mr. SK. SHARFUDDIN, seminar Supervisor and Assistant Professor, Department of Computer Science and Engineering, LAQSHYA Institute of Technology and Science, for his guidance, cogent discussion, constructive criticism and encouragement throughout this dissertation work.

I express my sincere gratitude to Associate Professor Mrs. M. SRIDEVI, Head of the Department of Computer Science and Engineering, LAQSHYA Institute of Technology and Science, for her precious suggestions, motivation and co-operation towards the successful completion of this seminar work.

In addition I would like to thank all my family members, friends, and colleagues for giving moral strength and support to complete this dissertation.

Abstract: Applications of Grid Computing

A Grid can be simply defined as a combination of different components which function collectively as part of one large electrical or electronic circuit. GridFTP is a well-known and robust protocol for fast data transfer on the Grid, and it is exceptionally fast for large volumes of data. Such transfers can run into the lots of small files (LOSF) problem, which arises when a large dataset is partitioned into many small files. The small files can be transferred efficiently using pipelining. Pipelining approaches the LOSF problem by trying to minimize the amount of time between transfers. It allows the client to have many outstanding transfer commands at a time: instead of being forced to wait for the acknowledgment that a transfer has completed, the client is free to send transfer commands at any time. The server processes these requests in the order they are sent. GridFTP uses two channels, which must be established before a transfer: a control channel and a data channel.

Key words: Grid, GridFTP, LOSF (lots of small files), Pipelining, Robust, Server, Control channel


1 INTRODUCTION
2 LITERATURE SURVEY
3 EXISTING SYSTEM
Disadvantages in existing system
4 PROPOSED SYSTEM
Advantages in proposed system
5 METHODS
6 CONCLUSION
7 FUTURE WORK
8 BIBLIOGRAPHY
9 WEB SITES





A Grid can be simply defined as a combination of different components which function collectively as part of one large electrical or electronic circuit. It can also be defined as a paradigm or infrastructure that enables the sharing, selection and aggregation of geographically distributed resources such as computers (PCs, workstations, clusters, supercomputers), software, catalogued data and databases.

The term Grid Computing can similarly be applied to a large number of computers which connect together to collectively solve a problem of very high complexity and magnitude. The fundamental idea behind any computer-based grid is to utilize idle processor cycles. Simply stated, a processor that would otherwise sit idle teams up with similar idle processors to tackle complex problems. Grid Computing virtualizes distributed computing and data resources such as processing, network bandwidth and storage capacity to create a single system image, granting users and applications seamless access to vast IT capabilities.


An important type of communication in grid and distributed computing environments is bulk data transfer. GridFTP, a well-known and robust protocol for fast data transfer, has emerged as a de facto standard for secure, reliable, high-performance data transfer across resources on the Grid. The GridFTP implementation provided by the Globus Toolkit can scale to network speeds and has been shown to deliver 27 Gb/s on 30 Gb/s links. The Globus Toolkit is an open source software toolkit used for building Grid systems and applications.

The protocol is optimized to transfer the large volumes of data commonly found in Grid applications. Datasets of sizes from hundreds of megabytes to terabytes and beyond can be transferred at close to network speeds by using GridFTP. Given the high-speed networks commonly found in modern Grid environments, datasets of less than 100 MB are too small for underlying protocols like TCP to utilize the maximum capacity of the network. Therefore, GridFTP and most bulk data transfer protocols experience the highest levels of throughput when transferring large volumes of data. Unfortunately, conventional implementations of GridFTP have a limitation as to how the data must be partitioned to reach these high-throughput levels. Not only must the amount of data to transfer be large enough to allow TCP to reach full throttle, but the data must also be in large files, ideally in one single file. If the dataset is large but partitioned into many small files (on gigabit networks we consider any file smaller than 100 MB a small file), the performance of GridFTP servers suffers drastically. This problem is known as the lots of small files (LOSF) problem. In this paper we study the LOSF problem and present a solution known as pipelining. We have implemented pipelining in the Globus Toolkit,


The GridFTP protocol is a backward-compatible extension of the legacy RFC 959 FTP protocol. It maintains the same command/response semantics introduced by RFC 959, as well as the two-channel protocol semantics. One channel is for control messaging (the control channel), such as requesting what files to transfer, and the other is for streaming the data payload (the data channel). These protocol details have interesting effects on the LOSF problem.

Channel Establishment

GridFTP servers listen on a well-known and published port for client control channel connections. Once a client successfully forms a control channel with a server (this often involves authentication and authorization), it can begin sending commands to the server. In order to transfer a file, the client must first establish a data channel. This involves sending the server a series of commands on the control channel describing attributes of the desired data channel, such as what protocol to use, binary or ASCII data, passive or active connection, and various protocol-specific attributes. Once these commands are successfully sent, a client can request a file transfer. At this point a separate data channel connection is formed using all of the agreed-upon attributes, and the requested file is sent across it.

In standard FTP the data channel can be used to transfer only one file. Future transfers must again go through the process of setting up a new data channel. GridFTP modified this part of the protocol to allow many files to be transferred across a single data channel. With GridFTP all of the messaging to establish a data channel is done once; the data channel connection is formed just once, and the client can request several file transfers using that same data channel. This enhancement is known as data channel caching.

File Transfers

File transfer requests are made with the RETR (the server sends a file) or STOR (the server receives a file) command. A client sends one of these commands to the server across the control channel. Data then begins to flow between the client and server over the data channel. Once all of the data has been transferred, a 226 Transfer Complete acknowledgment message is sent from the server to the client on the control channel. Only when this acknowledgment is received can the client request another transfer. This interaction is illustrated in Figure 1. As the figure shows, there is an entire round-trip time (RTT) on the control channel between transfers during which the data channel must be idle. Before issuing the next transfer command the client must first receive the transfer completion acknowledgment, which takes one trip across the network. After receiving the acknowledgment, the client sends the transfer command immediately. However, the server does not immediately receive it.

Figure 1: GridFTP file transfers with no pipelining

The message must cross the network before the server will begin sending data. This process involves another trip across the network. Assuming we have GridFTP data channel caching enabled, we do not have to worry about the latencies involved in establishing the data channel. If we do not have it enabled, the delay is significantly longer. During this time the data channel is idle. The latency between transfers adds to the overall transfer time and thus detracts from the overall throughput. The problem is exacerbated when communicating over high-latency networks where the RTT is very high.

While the idle data channel time is a problem, it causes a far greater one. TCP is a window-based protocol. For it to achieve maximum efficiency, the window size of allowed unacknowledged bytes must grow to the bandwidth-delay product. Various algorithms in the TCP protocol decide to increase or decrease the window size based on observed events. If a connection is idle for longer than one RTT, the window size gets reduced to zero; and once it is used again, it must go through TCP slow start [14]. When transferring a series of files, the data channel is idle for a control channel RTT in between transfers. If the control channel RTT and the data channel RTT are similar, it is likely that data channel TCP connections will have entirely closed windows by the time the next transfer begins. When the amount of data sent in each file is small, the ratio of idle data channel time to transfer time becomes higher and reduces the throughput. Additionally, small files may not be transferred for long enough to complete the slow-start algorithm and bring TCP to full throttle. Thus, even when data is being transferred, it is not moving at full speed.
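To make the cost of the idle round trip concrete, the following is a minimal sketch and not part of the original study. It assumes an idealized model in which every file moves at full link bandwidth and consecutive transfers are separated by exactly one control channel RTT; slow start is ignored, so real throughput would be lower still. The function names and the example figures (a 1 Gb/s link, a 100 ms RTT, 1,000 files of 1 MB each) are illustrative assumptions.

```python
def bandwidth_delay_product_bytes(bandwidth_bps, rtt_s):
    """Bytes that must be in flight to keep a link of this speed full."""
    return bandwidth_bps * rtt_s / 8.0


def serialized_transfer_time_s(num_files, file_size_bytes, bandwidth_bps, rtt_s):
    """Total time when each transfer starts one RTT after the previous one ends.

    Assumption: every file is sent at full bandwidth (no slow start).
    """
    per_file_s = file_size_bytes * 8.0 / bandwidth_bps
    idle_s = max(num_files - 1, 0) * rtt_s  # one idle RTT between transfers
    return num_files * per_file_s + idle_s


def effective_throughput_bps(num_files, file_size_bytes, bandwidth_bps, rtt_s):
    """Bits delivered per second once the idle gaps are counted."""
    total_bits = num_files * file_size_bytes * 8.0
    total_s = serialized_transfer_time_s(
        num_files, file_size_bytes, bandwidth_bps, rtt_s
    )
    return total_bits / total_s


if __name__ == "__main__":
    link_bps = 1e9       # assumed 1 Gb/s link
    rtt_s = 0.1          # assumed 100 ms round-trip time
    files = 1000         # assumed number of files
    size = 1_000_000     # assumed 1 MB per file

    print("BDP (bytes):", bandwidth_delay_product_bytes(link_bps, rtt_s))
    print("Total time (s):", serialized_transfer_time_s(files, size, link_bps, rtt_s))
    print("Throughput (Mb/s):",
          effective_throughput_bps(files, size, link_bps, rtt_s) / 1e6)
```

Under these assumed figures the bandwidth-delay product is 12.5 MB, far larger than any single 1 MB file, and the 8 seconds of actual data movement are stretched to about 108 seconds, roughly 74 Mb/s on a 1 Gb/s link.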


Pipelining approaches the LOSF problem by trying to minimize the amount of time between transfers. Pipelining allows the client to have many outstanding, unacknowledged transfer commands at once. Instead of being forced to wait for the 226 Transfer Complete message, the client is free to send transfer commands at any time. The server processes these requests in the order they are sent. Acknowledgments are returned to the client in the same order. The process is shown in Figure 2.

This process hides the latency of transfer requests by overlapping them with data transfers. The first transfer request is sent, and data begins to flow across the data channel. While the file transfer is in progress, the client sends the next n file transfer requests. The server queues the requests. When the server completes the file transfer, it sends the acknowledgment to the client and checks the queue for the next transfer request. If the queue is not empty, the next file transfer begins immediately. There is some inevitable processing latency between transfers, but it is very small compared to the entire RTT of network latency that has been eliminated.
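The effect of overlapping requests with data transfers can be illustrated with a small timing simulation. This is a sketch under simplifying assumptions, not the Globus Toolkit implementation: each one-way trip on the control channel costs half an RTT, data moves at a constant bandwidth, propagation delay on the data channel is ignored, and the function name `simulate_transfers` and its parameters are hypothetical.

```python
def simulate_transfers(file_sizes, bandwidth_bytes_per_s, rtt_s,
                       pipelined, processing_s=0.0):
    """Return (total_time_s, acks) for a sequence of file transfers.

    acks is a list of (file_index, time_ack_reaches_client) in arrival order.
    Without pipelining the client sends each request only after the previous
    acknowledgment arrives; with pipelining all requests are sent at time 0
    and the server works through them in order.
    """
    one_way_s = rtt_s / 2.0
    server_free_at = 0.0   # earliest time the server can start a transfer
    client_clock = 0.0     # time the latest acknowledgment reached the client
    acks = []

    for index, size in enumerate(file_sizes):
        send_time = 0.0 if pipelined else client_clock
        arrival = send_time + one_way_s            # request reaches the server
        start = max(arrival, server_free_at)       # wait for queued work
        finish = start + size / bandwidth_bytes_per_s
        server_free_at = finish + processing_s     # small per-file overhead
        client_clock = finish + one_way_s          # acknowledgment reaches client
        acks.append((index, client_clock))

    return client_clock, acks


if __name__ == "__main__":
    sizes = [1_000_000] * 10          # assumed: ten 1 MB files
    bandwidth = 125_000_000.0         # assumed: 1 Gb/s expressed in bytes/s
    rtt = 0.1                         # assumed: 100 ms control channel RTT

    serial_s, _ = simulate_transfers(sizes, bandwidth, rtt, pipelined=False)
    piped_s, _ = simulate_transfers(sizes, bandwidth, rtt, pipelined=True)
    print(f"without pipelining: {serial_s:.3f} s")
    print(f"with pipelining:    {piped_s:.3f} s")
```

In this model the serialized case pays one full RTT per file, while the pipelined case pays a single RTT for the whole batch plus any per-file processing overhead, which matches the qualitative behaviour described above.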


Figure 2: GridFTP file transfers with pipelining

According to the proposed pipelining protocol, the client is allowed to send an unlimited number of outstanding commands. In practice, the number of outstanding commands will be limited by the GridFTP server implementation and TCP flow control. The client is free to send as many commands as it wishes on the TCP control channel. However, the GridFTP server will read only a limited number of these commands out of the TCP buffer and into its process space. All other outstanding commands will remain in the operating system's TCP buffers. As the server-side buffers fill, the TCP window will close. Ultimately, the sending-side TCP buffers will fill up, and the client's attempts to send further commands will stall. In most cases there is little performance benefit for a client to have more than three outstanding commands; however, allowing an unlimited number makes client implementation simpler. The client then waits for the same number of acknowledgments from the server.
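How many outstanding commands are enough can be explored with a simple model. The sketch below is an assumption-laden illustration rather than a description of any real GridFTP client: every file takes the same transfer time, the control channel costs half an RTT each way, and the client may have at most `max_outstanding` unacknowledged commands (a hypothetical parameter name).

```python
def total_time_with_window(num_files, transfer_s, rtt_s, max_outstanding):
    """Total time to move num_files equal files with a bounded command window.

    The client sends command i as soon as fewer than max_outstanding commands
    are unacknowledged, i.e. when the acknowledgment for command
    i - max_outstanding has arrived.
    """
    if max_outstanding < 1:
        raise ValueError("max_outstanding must be at least 1")
    if num_files == 0:
        return 0.0

    one_way_s = rtt_s / 2.0
    ack_times = []          # time each acknowledgment reaches the client
    server_free_at = 0.0

    for i in range(num_files):
        if i < max_outstanding:
            send_time = 0.0
        else:
            send_time = ack_times[i - max_outstanding]
        start = max(send_time + one_way_s, server_free_at)
        finish = start + transfer_s
        server_free_at = finish
        ack_times.append(finish + one_way_s)

    return ack_times[-1]


if __name__ == "__main__":
    # Assumed figures: 100 files, 50 ms per file, 100 ms RTT.
    for window in (1, 2, 3, 100):
        seconds = total_time_with_window(100, 0.05, 0.1, window)
        print(f"max_outstanding={window:3d}: {seconds:.2f} s")
```

In this idealized model the data channel stays busy once (max_outstanding - 1) times the per-file transfer time is at least one RTT, so with the assumed figures a window of three already matches an unlimited window. Files that are much smaller relative to the RTT would need a larger window in this model.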


GridFTP Pipelining

GridFTP is a high-performance, secure, reliable data-transfer protocol optimized for high-bandwidth wide-area networks. GridFTP is an exceptionally fast transfer protocol for large volumes of data. Implementations of it are widely deployed and used on well-connected Grid environments such as those of the TeraGrid because of its ability to scale to network speeds. However, when the data is partitioned into many small files instead of a few large files, it suffers from lower transfer rates. The latency between the serialized transfer requests of each file directly detracts from the amount of time data pathways are active, thus lowering achieved throughput. Further, when a data pathway is inactive, the TCP window closes, and TCP must go through the slow-start algorithm. The performance penalty can be severe. This situation is known as the lots of small files problem. In this paper we introduce a solution to this problem. This solution, called pipelining, allows many transfer requests to be sent to the server before any one completes. Thus, pipelining hides the latency of each transfer request by sending the requests while a data transfer is in progress. We present an implementation and performance study of the pipelining solution.


LITERATURE SURVEY

Utility computing is the conceptual core of our analysis, but much of the current debate on this idea is discourse on the concept of Cloud Computing, a more marketable vision perhaps than utility computing. Cloud Computing is a new and confused term. Gartner define cloud computing succinctly as "a style of computing where massively scalable IT-related capabilities are provided as a service using Internet technologies to multiple external customers". Yet our interest is not in the particulars of cloud computing itself but in the opportunities presented for researchers and practitioners by this new technology. We argue that fundamental to both cloud computing and utility computing is a decoupling of the physicality of IT infrastructure from the architecture of such infrastructure's use. While in the past we thought about the bare-metal system (a humming grey box in an air-conditioned machine room with physical attributes and a host of peripherals), today such ideas are conceptual and virtualized, hidden from view. It is this decoupling which will form the basis of our discussion of the technology of the Grid.

There certainly is a strong element of hype in much of the Utility, Grid and Cloud computing discourse, and perhaps such hype is necessary. As Swanson and Ramiller (1997) remind us, the organising visions of information and communications technology are formed as much in extravagant claims and blustering sales talk as they are in careful analysis, determination of requirements or proven functionality. We can at times observe a distinct tension between the technologist's aspiration to develop and define an advanced form of computer infrastructure, and a social construction of such technology through discourses of marketing and public relations.
We find a plethora of terms associated with Utility computing within commercial settings, including Autonomic Computing; Grid Computing; On-Demand Computing; Real-time Enterprise; Service-Oriented Computing; Adaptive Computing (or Adaptive Enterprise) (Goyal and Lawande 2006; Plaszczak and Wellner 2007) and peer-to-peer computing (Foster and Iamnitchi 2003). We have adopted the term utility computing as our categorization of this mixed and confused definitional landscape. Many authors who write about Utility Computing start with an attempt to provide a definition, often accompanied by a comment as to the general confusion surrounding the term (e.g. Gentzsch 2002). It is unrealistic to expect an accepted definition of a technology which is still emerging, but by tracing the evolution of definitions in currency we can see how the understanding of new technology is influenced by various technical, commercial and socio-political forces. Put another way, the computer is not a static thing, but rather a collection of meanings that are contested by different groups (Bijker 1995), and like any other technology, it embodies to varying degrees its developers' and users' social, political, psychological, and professional commitments, skills, prejudices, possibilities and constraints.

Computing Utility: The Shifting Nature of Computing

Since Von Neumann defined our modern computing architecture we have seen computers as consisting of a processing unit (capable of undertaking calculation) and a memory (capable of storing instructions and data for the processing unit to use). Running on this machine is operating system software which manages (and abstracts) the way application software makes use of this physical machine. The development of computing networks, client-server computing and ultimately the internet essentially introduced a form of communication into this system, allowing storage and computing to be shared with other locations or sites, but ultimately the concept of a "personal computer" or "server computer" remains.

This basic computer architecture no longer represents computing effectively. Firstly, the physical computer is becoming virtualized, represented as software rather than as a physical machine. Secondly, it is being distributed through Grid computing infrastructure such that it is owned by virtual rather than physical organizations. Finally, these two technologies are brought together in a commoditization of computing infrastructure as cloud computing, where all physicality of the network and computer is hidden from view. It is for this reason that in 2001 Shirky, at a P2P Webservices conference, stated that Thomas Watson's famous quote, "I think there is a world market for maybe five computers", was wrong: he overstated the number by four. For Shirky the computer was now a single device, collectively shared. All PCs, mobile phones and connected devices share this Cloud of services on demand, and where processing occurs is not relevant. We now review the key technologies involved in Utility Computing.

1: Internet Bandwidth and Standards. At the core of the Utility Computing model is the network. The internet and its associated standards have enabled interoperability among systems and provide the foundation for Grid Standards.
2: Virtualisation. Central to the Cloud Computing idea is the concept of virtualising the machine. While we desire services, these are provided by personal-machines (albeit simulated in software).
3: Grid Computing Middleware and Standards. Just as the Internet infrastructure (standards, hardware and software) provides the foundation of the Web, so Grid Standards and Software extend this infrastructure to provide utility computing utilising large clusters of distributed computers.

Internet Bandwidth and Standards

The internet emerged because of attempts to connect mainframe computers together to undertake analysis beyond the capability of one machine, for example within the SAGE air-defence system or ARPANET for scientific analysis (Berman and Hey 2004). Similarly, the Web emerged from a desire to share information globally between various different computers (Berners-Lee 1989). Achieving such distribution of resources is, however, founded upon a communications infrastructure (of wires and radio waves) capable of transferring information at the requisite speed (bandwidth) and without delays (latency). Until the early 2000s, the bandwidth required for large applications and processing services to interact was missing. During the dotcom boom, however, a huge amount of fibre-optic cable and network routing equipment was installed across the globe by organisations such as the failed WorldCom, which reduced costs dramatically and increased availability.

Having an effective network infrastructure in place is not enough. A set of standards (protocols) is also required to define mechanisms for resource sharing (Baker, Apon et al. 2005). Internet standards (HTTP/HTML/TCP-IP) made the Web possible by defining how information is shared globally through the internet. These standards ensure that a packet of information is reliably directed between machines. It is this standardised, high-speed, high-bandwidth Internet infrastructure upon which Utility Computing is built.


Virtualization for cloud computing rests on the basic idea of providing a software simulation of an underlying hardware machine. These simulated machines (so-called Virtual Machines) present themselves to the software running upon them as identical to a real machine of the same specification. As such, the virtual machine must be installed with an operating system (e.g. Windows or Linux) and can then run applications within it. This is not a new technology: it was first demonstrated in 1967 by IBM's CP/CMS systems as a means of sharing a mainframe among many users who are each presented with their own virtual machine (Ceruzzi 2002). However, its relevance to modern computing rests in its ability to abstract the computer away from the physical box and onto the internet. Today the challenge is to virtualize computing resources over the Internet. This is the essence of Grid computing, and it is being accomplished by applying a layer of open Grid protocols to every local operating system, for example Linux, Windows, AIX, Solaris, z/OS (Wladawsky-Berger 2004). Once such Grid-enabled virtualization is achieved it is possible to decouple the hardware from the now virtualized machine, for example by running multiple virtual machines on one server or moving a virtual machine between servers using the internet. Crucially, to the user it appears they are interacting with a machine with similar attributes to a desktop machine or server, albeit somewhere within the internet cloud.

Grid Computing

The term Grid is increasingly used in discussions about the future of ICT infrastructure, or more generally in discussion of how computing will be done in the future. Unlike Cloud computing, which emerges from and belongs to an IT industry and marketing domain, the term Grid Computing emerged from the super-computing (High Performance Computing) community (Armbrust, Fox et al. 2009). Our discussion of Utility computing begins with this concept of Grids as a foundation.
As with the other concepts, however, hyperbole around Grids abounds, with arguments proposed that they are the next generation of the internet, the next big thing; or that they "will overturn strategic and operating assumptions, alter industrial economics, upset markets (...) pose daunting challenges for every user and vendor" (Carr 2005) and even "provide the electronic foundation for a global society in business, government, research, science and entertainment" (Berman, Fox et al. 2003). Equally, Grids have been accused of faddishness: that there is nothing new in comparison to older ideas, or that the term is used simply to attract funding or to sell a product with little reference to computational Grids as they were originally conceived (Sottrup and Peterson 2005).

From a technologist's perspective an overall description might be that Grid technology aims to provide utility computing as a transparent, seamless and dynamic delivery of computing and data resources when needed, in a similar way to the electricity power Grid (Chetty and Buyya 2002; Smarr 2004). Indeed the word grid is directly taken from the idea of an electricity grid, a utility delivering power as and when needed. To provide that power on demand, a Grid is built (held together) by a set of standards (protocols) specifying the control of such distributed resources. These standards are embedded in the Grid middleware, the software which powers the Grid. In a similar way to how Internet protocols such as FTP and HTTP enable information to be passed through the internet and displayed on users' PCs, so Grid protocols enable the integration of resources such as sensors, data storage, computing processors etc. (Wladawsky-Berger 2004).

The idea of the Grid is usually traced back to the mid 1990s and the I-Way project to link together a number of US supercomputers as a metacomputer (Abbas 2004). This was led by Ian Foster of the University of Chicago and Argonne National Laboratory. Foster and Carl Kesselman then started the Globus project to develop the tools and middleware for this metacomputer[3]. This toolkit rapidly took off in the world of supercomputing, and Foster remains a prominent proponent of the Grid. According to Foster and Kesselman's (1998) "bible" of the grid, a computational Grid is "a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities".
In this, Foster highlights "high-end" in order to focus attention on Grids as a supercomputing resource supporting large-scale science: "Grid technologies seek to make this possible, by providing the protocols, services and software development kits needed to enable flexible, controlled resource sharing on a large scale" (Foster 2000)[4]. Three years after their first book, however, the same authors shift their focus, speaking of Grids as "coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations" (Foster, Kesselman et al. 2001). The inclusion of "multi-institutional" within this 2001 definition highlights the scope of the concept as envisaged by these key Grid proponents, with Berman (2003) further adding that Grids enable resource sharing on a global scale.

Such definitions, and the concrete research projects that underlie them, make the commercial usage of the Grid seem hollow and opportunistic. These authors seem critical of the contemporaneous re-badging by IT companies of existing computer clusters and databases as "Grid enabled" (Goyal and Lawande 2006; Plaszczak and Wellner 2007). This critique seems to run through the development of Grids within supercomputing research and science, where many lament the use of the term by IT companies marketing clusters of computers in one location.

In 2002 Foster provided a three-point checklist to assess a Grid (Foster 2002). A Grid 1) coordinates resources that are NOT subject to centralized control; 2) uses standard, open, general-purpose protocols and interfaces; 3) delivers non-trivial qualities of service. Foster's highlighting of NOT, and the inclusion of open protocols, appear as a further challenge to the commercialization of centralized, closed grids. While this checklist was readily accepted by the academic community and is widely cited, unsurprisingly it was not well received by the commercial Grid community (Plaszczak and Wellner 2007). The demand for decentralization was seen as uncompromising and excluded "practically all known grid systems in operation in industry" (Plaszczak and Wellner 2007, p57). It is perhaps in response to this definition that the notion of Enterprise Grids (Goyal and Lawande 2006) emerged as a form of Grid operating within an organisation, though possibly employing resources across multiple corporate locations using differing technology. It might ultimately be part of the reason why "Cloud computing" has eclipsed Grid computing as a concept. The commercial usage of Grid terms such as Enterprise Grid Computing highlights the use of Grids away from the perceived risk of globally distributed Grids and is the foundation of modern Cloud Computing providers (e.g. Amazon S3).
The focus is not to achieve increased computing power through connecting distributed clusters of machines, but to provide a solution to the "silos of applications and IT systems infrastructure" within an organisation's IT function (Goyal and Lawande 2006, p4) through a focus on utility computing and reduced complexity. Indeed, in contrast to most academic Grids, such Enterprise Grids demand homogeneity of resources and centralization within Grids as essential components. It is these Grids which form the backdrop for Cloud Computing and ultimately utility computing, in which cloud providers essentially maintain a homogeneous server farm providing virtualized cloud services. In such cases the Grid is far from distributed, rather existing as a centralized pool of resources to provide dedicated support for virtualized architecture (Plaszczak and Wellner 2007, p174), often within data centers.

Before considering the nature of Grids we discuss their underlying architecture. Foster (Foster, Kesselman et al. 2001) provides an hour-glass Grid architecture (Figure 1). It begins with the fabric, which provides the interfaces to the local resources of the machines on the Grid (be they physical or virtual machines). This layer provides the local, resource-specific facilities, which could be computer processors, storage elements, tape robots, sensors, databases or networks. Above this is a resource and connectivity layer which defines the communication and authentication protocols required for transactions to be undertaken on the Grid. The next layer provides a resource management function including directories and brokering systems, as well as monitoring and diagnostic resources. In the final layer reside the tools and applications which use the Grid. It is here that virtualization software resides to provide services.

Figure 1: The Layered Grid Architecture.


One of the key challenges of Grids is the management of the resources they provide for their users. Central to achieving this is the concept of a Virtual Organisation (VO). A Virtual Organisation is "a set of individuals and/or institutions defined by the sharing rules for a set of resources" (Foster and Kesselman 1998) or "a set of Grid entities, such as individuals, applications, services or resources, that are related to each other by some level of trust" (Plaszczak and Wellner 2007). By necessity these resources must be controlled, "with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs" (Foster and Kesselman 1998), and for this purpose VOs are technically defined along with the rules of their resource sharing. A Grid VO implies the assumptions of the absence of central location, central control and omniscience, and the existence of a trust relationship (Abbas 2004). It is this ability to control access to resources which is also vital within Cloud Computing, allowing walled gardens for security and accounting of resource usage for billing.

Various classes and categories of Grids exist. According to Abbas, Grids can be categorised according to their increasing scale: desktop grids, cluster grids, enterprise grids, and global grids (Abbas 2004). Desktop Grids are based on existing dispersed desktop PCs and can create a new computing resource by employing unused processing and storage capacity while the existing user continues to use the machine. Cluster Grids describe a form of parallel or distributed computer system that consists of a collection of interconnected yet standardised computer nodes working together to act, as far as the user is concerned, as a single unified computing resource. Many existing supercomputers are clusters which use Smart Software Systems (SSS) to virtualise independent operating-system instances to provide an HPC service (Abbas 2004).
All the above are arguably grids, and can just about live up to Foster's three tests. However, for the information systems field, for Pegasus, and for those who wish to explore Cloud Computing, it is the final category of global Grids that is the most significant. Global Grids employ the public internet infrastructure to communicate between Grid nodes, and rely on heterogeneous computing and networking resources. Some global grids have gained a large amount of publicity by providing social benefits which capture the public imagination. Perhaps the first large-scale project of this kind was SETI@home, which searches radio-telescope data for signs of extra-terrestrial intelligence. WorldCommunityGrid.org, which undertakes research for healthcare, and Folding@home, which is concerned with protein folding experiments, are other examples. Folding@home can indeed claim to be the world's most powerful distributed computing network according to the Guinness Book of Records, with 700,000 Sony PlayStation 3 machines and over 1,000 trillion calculations per second[9]. Each of these projects works by dividing a problem into steps and distributing software over the internet to the computers of volunteers. Since a large number of desktop computers in the home and workplace remain idle most of the time, such donations have little impact on the user. Indeed the average computer is idle for over 90% of the time, and even when it is used only a very small amount of the CPU's capability is employed (Smith 2005).

Another way to categorise Grids is by the types of solutions that they best address (Jacob 2003). A computational grid is focused on undertaking large numbers of computations rapidly, and hence the focus is on using high-performance processors. A data grid's focus is on the effective storage and distribution of large amounts of data, usually across multiple organisations or locations. The focus of such systems is on data integrity, security and ease of access. It should be stressed that there are no hard boundaries between these two types of grid: one need often presupposes the other, and real users face both issues.

As an example of a grid project with more of a data orientation, consider the Biomedical Informatics Research Network (BIRN), a grid infrastructure project that serves biomedical research needs (http://www.nbirn.net/index.shtm). They express their offerings in terms of five complementary elements: a cyber infrastructure, software tools (applications) for biomedical data gathering, resources of shared data, data integration support, an ontology and support for multi-site integration of research activity. As they say, "By intertwining concurrent revolutions occurring in biomedicine and information technology, BIRN is enabling researchers to participate in large scale, cross-institutional research studies where they are able to acquire, share, analyze, mine and interpret both imaging and clinical data acquired at multiple sites using advanced processing and visualization tools."

Other examples of Grid Computing exist within science, particularly particle physics. The particle physics community faces the challenge of analyzing the unprecedented amounts of data - some 15 Petabytes per year - that will be produced by the LHC (Large Hadron Collider) experiments at CERN[10]. To process this data CERN required around 100,000 computer-equivalents[11] forming its associated grids by 2007, spread across the globe and incorporating a number of grid infrastructures (Faulkner, Lowe et al. 2006). To use the Grid, physicists submit their computing jobs to it, and the Grid spreads them across the globe. Similarly, data from the LHC is initially processed at CERN but is quickly spread to 12 computer centres across the world (so-called Tier-1 Grid sites). From there data is spread to local data centres at universities within these countries (Tier-2 sites).



Distributed or grid computing in general is a special type of parallel computing that relies on complete computers (with onboard CPUs, storage, power supplies, network interfaces, etc.) connected to a network (private, public or the Internet) by a conventional network interface, such as Ethernet. This is in contrast to the traditional notion of a supercomputer, which has many processors connected by a local high-speed computer bus.

The primary advantage of distributed computing is that each node can be purchased as commodity hardware; combined, these nodes can produce a computing resource similar to a multiprocessor supercomputer, but at a lower cost. This is due to the economies of scale of producing commodity hardware, compared to the lower efficiency of designing and constructing a small number of custom supercomputers. The primary performance disadvantage is that the various processors and local storage areas do not have high-speed connections. This arrangement is thus well suited to applications in which multiple parallel computations can take place independently, without the need to communicate intermediate results between processors. The high-end scalability of geographically dispersed grids is generally favorable, due to the low need for connectivity between nodes relative to the capacity of the public Internet.

There are also some differences in programming and deployment. It can be costly and difficult to write programs that can run in the environment of a supercomputer, which may have a custom operating system, or require the program to address concurrency issues. If a problem can be adequately parallelized, a thin layer of grid infrastructure can allow conventional, standalone programs, each given a different part of the same problem, to run on multiple machines.
This makes it possible to write and debug on a single conventional machine, and eliminates complications due to multiple instances of the same program running in the same shared memory and storage space at the same time.

Design considerations and variations

One feature of distributed grids is that they can be formed from computing resources belonging to multiple individuals or organizations (known as multiple administrative domains). This can facilitate commercial transactions, as in utility computing, or make it easier to assemble volunteer computing networks.

One disadvantage of this feature is that the computers which are actually performing the calculations might not be entirely trustworthy. The designers of the system must thus introduce measures to prevent malfunctions or malicious participants from producing false, misleading, or erroneous results, and from using the system as an attack vector. This often involves assigning work randomly to different nodes (presumably with different owners) and checking that at least two different nodes report the same answer for a given work unit. Discrepancies would identify malfunctioning and malicious nodes.

Due to the lack of central control over the hardware, there is no way to guarantee that nodes will not drop out of the network at random times. Some nodes (like laptops or dialup Internet customers) may also be available for computation but not for network communication for unpredictable periods. These variations can be accommodated by assigning large work units (thus reducing the need for continuous network connectivity) and reassigning work units when a given node fails to report its results in the expected time.

The impacts of trust and availability on performance and development difficulty can influence the choice of whether to deploy onto a dedicated computer cluster, to idle machines internal to the developing organization, or to an open external network of volunteers or contractors. In many cases, the participating nodes must trust the central system not to abuse the access that is being granted, by interfering with the operation of other programs, mangling stored information, transmitting private data, or creating new security holes. Other systems employ measures to reduce the amount of trust client nodes must place in the central system, such as placing applications in virtual machines.
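The redundancy and reassignment measures described above can be sketched in a few lines of Python. This is only an illustrative sketch under simple assumptions, not the scheme of any particular grid middleware: the function names (`assign_replicas`, `validate`, `overdue`), the quorum of two matching answers, and the data layout are all hypothetical.

```python
import random
from collections import Counter


def assign_replicas(work_units, nodes, replicas=2, rng=None):
    """Assign each work unit to `replicas` distinct, randomly chosen nodes."""
    rng = rng or random.Random()
    return {unit: rng.sample(nodes, replicas) for unit in work_units}


def validate(results, quorum=2):
    """Check reported answers for agreement.

    results maps unit -> {node: answer}. A unit is accepted once at least
    `quorum` nodes report the same answer; nodes that disagree with an
    accepted answer are flagged as suspect. Units without a quorum stay
    pending and would be handed to further nodes.
    """
    accepted, suspects, pending = {}, set(), []
    for unit, by_node in results.items():
        counts = Counter(by_node.values())
        if not counts:
            pending.append(unit)
            continue
        answer, votes = counts.most_common(1)[0]
        if votes >= quorum:
            accepted[unit] = answer
            suspects.update(n for n, a in by_node.items() if a != answer)
        else:
            pending.append(unit)
    return accepted, suspects, pending


def overdue(issued_at, results, now, timeout):
    """Return (unit, node) pairs whose result has not arrived within `timeout`.

    issued_at maps (unit, node) -> time the unit was handed out.
    """
    return [(unit, node)
            for (unit, node), started in issued_at.items()
            if now - started > timeout and node not in results.get(unit, {})]
```

A scheduler built on this idea would call `validate` as results arrive, reissue the pending units, and use `overdue` to reassign work from nodes that have silently dropped out. Requiring agreement between independently owned nodes costs at least twice the computation, which is the price paid for not trusting any single participant.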
Public systems or those crossing administrative domains (including different departments in the same organization) often result in the need to run on heterogeneous systems, using different operating systems and hardware architectures. With many languages, there is a trade-off between investment in software development and the number of platforms that can be supported (and thus the size of the resulting network). Cross-platform languages can reduce the need to make this trade-off, though potentially at the expense of high performance on any given node (due to run-time interpretation or lack of optimization for the particular platform).

There are diverse scientific and commercial projects to harness a particular associated grid or for the purpose of setting up new grids. BOINC is a common one for various academic projects seeking public volunteers; more are listed at the end of the article. In fact, the middleware can be seen as a layer between the hardware and the software. On top of the middleware, a number of technical areas have to be considered, and these may or may not be middleware independent. Example areas include SLA management, Trust and Security, Virtual organization management, License Management, Portals and Data Management. These technical areas may be taken care of in a commercial solution, though the cutting edge of each area is often found within specific research projects examining the field.

Disadvantages of conventional supercomputers: The main disadvantages are power usage, heat and cost. In overclocked computers, heat can also damage components, which raises the cost further through replacement parts. 64-bit processors (which can provide better processing capabilities) can have the downside of compatibility issues with some software.


Proposed system:

Grid: A grid is a combination of different components which collectively act as part of one large electrical or electronic circuit.

Figure 1: Architecture of a Grid

Grid computing: The term grid computing means that a large number of computers are connected together to collectively solve a problem of very high complexity and magnitude. Grid computing is all about sharing, aggregating, hosting and offering services across the world for the benefit of mankind. Grid computing is a form of networking. Unlike conventional networks that focus on communication among devices, grid computing harnesses the unused processing cycles of all computers in a network to solve problems too intensive for any stand-alone machine. A well-known grid computing project is the SETI (Search for Extraterrestrial Intelligence) @Home project, in which PC users worldwide donate unused processor cycles to help the search for signs of extraterrestrial life by analyzing signals coming from outer space. The project relies on individual users volunteering to let it harness the unused processing power of their computers. This method saves the project both money and resources. Grid computing does require special software that is unique to the computing project for which the grid is being used.

Figure 2: Architecture of Grid Computing

5 Current Issues in Grid Computing:

Grid Computing is still very much in its development stage and there are a number of issues that must be addressed or resolved before it can be considered as a stable technology. Some of these issues are discussed below.


5.1 The Grid versus Many Grids:

A distinction must be made between the idea of a single, worldwide, ubiquitous grid and the idea of many separate grids located in businesses and on university campuses. The original intention of Grid Computing was that it would follow the same architecture as the electricity grid. This means that whenever and wherever you needed compute power you would simply "plug in" to The Grid and the processing would be done. There would be no need to know where the computing was being done, only that it was being done. In the same way that I don't need to know whether the electricity lighting this room is coming from a hydro-electric power plant in Fiordland or a wind turbine in Wellington, I wouldn't care if my complicated simulation were being run on a spare machine next door or on an idle server somewhere on the other side of the world. In fact, The Grid could be viewed as a Grid of Grids, in much the same way as the Internet is a network of networks.

Although work is still being done toward creating a single Grid, it is already the case that there are many disparate grids worldwide that are all completely isolated from each other. Having many separate Grids makes issues like authentication and Virtual Organisations much simpler, which is one of the reasons that The Grid has not emerged. It also eliminates the need for some sort of global billing system, which is discussed further in Section 5.3.

Some progress toward creating a single worldwide grid has been made, however. The PlanetLab project is a distributed testbed for testing new networking protocols, planetary-scale sharing, and many other ideas that can benefit from a huge distributed system. It involves hundreds of computers at different locations around the world, mostly within academic institutions, on which researchers at those institutions can run experiments. It is not an initiative aimed at creating a global Computational Grid, but it does provide some of the things that a Grid must provide, such as authentication and authorisation. It currently has 361 nodes (as at 20 February 2004)[30] connected to it, so it is far short of being a worldwide Grid, but it is certainly an important step toward one, both in the new research initiatives that it has allowed and in demonstrating that worldwide distributed computing projects are feasible. PlanetLab has been expected to have over 1000 nodes distributed over the world by the end of 2004. Its only node in New Zealand is under the care of the Network Research Group in the Department of Computer Science and Software Engineering at the University of Canterbury in Christchurch. So far the only Australian node of PlanetLab is located at the University of Technology in Sydney.

5.2 No-one wants to share:

One of the biggest problems facing Grid Computing is not a technological one but a social one. Even when the technology exists for Grid Computing to work easily and flawlessly, people are still required to donate their spare CPU cycles or Grid Computing will not work at all. Although one of the major points of Grid Computing is that only spare cycles will be used, it still goes against human nature to allow others to access one's computer and run programs on it. The fear of viruses is no doubt a valid one, since systems that were viewed as secure in the past have been shown not to be so; much work must therefore go into developing a security infrastructure that can be completely trusted. The SETI@home project, and others like it, in which volunteers around the world allow their computers to be used for scientific research, show that some people at least are willing to share for no direct benefit to themselves, but it is unlikely that everyone would allow this. Within single businesses or university departments it could well be official policy that every computer must be part of the organisation's Grid, but this would probably not work for The Grid without some sort of global billing system.

5.3 Grid Economics:

Before all the separate grids can be connected into one 'supergrid', some sort of billing system must be established that is accepted and trusted by everyone. A worldwide Grid is unlikely to be able to make use of almost all spare CPU time without some incentive for people to make their computers available. However, in order for a worldwide billing system to work, there will need to be some way of accurately keeping track of the CPU time used and the CPU time provided by each user, and a way of transferring payment between users. The development of such a system in a way that is scalable and trusted by everyone is necessary before a global Grid can become a reality.

The development of such a system could lead to some sort of global bidding system for compute power, which would fluctuate like the stock market. The value of CPU time would vary over time according to supply and demand. Daytime hours in North America during the working week would probably have the highest demand and so would cost more, but could make use of servers in Europe and Asia that are not handling their peak load. The analogy of the Computer Grid with the electricity grid can be expanded further: just as it is possible to feed power back into the electricity grid, it will be possible to feed computing power back into the Computer Grid.

In order for a stock-market-like Grid billing system to succeed, several obstacles must be overcome. Local resources must be able to be used first, otherwise a company could incur costs from using The Grid that it wouldn't have otherwise. This includes stopping non-local users from using the local resources when they are needed to run local Grid applications. It must also be ensured that businesses or universities do not incur charges that exceed the gain they would have made. If running an application on The Grid saves several seconds but costs $100, then it is probably not worth it. The ISP charges as well as the Grid charges must be taken into account when calculating how much it will cost to run on The Grid, which further complicates the issue. These problems mean that although The Grid may well come into being at some time, it is likely that in the next few years at least the development of Grid Computing will focus mainly on the simpler task of creating separate Grids at separate organisations.

5.4 Performance Forecasting:

One of the problems with scheduling resources on a Grid is that it is hard to know how long a resource will be available for or how good its performance will be if it is used. Researchers have implemented a tool known as EveryWare which contains, amongst other things, a performance forecasting mechanism [21]. With accurate forecasting, scheduling becomes simpler because it is known in advance whether a given resource will react quickly to requests or process data quickly. Without accurate performance forecasting, a scheduler could schedule a remote set of CPUs to try to speed up processing but actually make it slower, because those CPUs do not perform as well as expected.
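To make the role of forecasting in scheduling concrete, the following minimal Python sketch forecasts each resource's next throughput from its recent measurements and picks the resource with the best forecast. It uses simple exponential smoothing, which is an assumption chosen for brevity and is not the forecasting mechanism of EveryWare [21]; the function names, the smoothing factor `alpha` and the sample measurements are hypothetical.

```python
def forecast(history, alpha=0.5):
    """Exponentially smoothed forecast of the next measurement.

    Recent measurements count more than old ones; alpha near 1 reacts
    quickly to change, alpha near 0 averages over a long history.
    """
    if not history:
        raise ValueError("need at least one measurement")
    estimate = history[0]
    for value in history[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def pick_resource(throughput_history, alpha=0.5):
    """Return the resource with the highest forecast throughput, plus all forecasts."""
    forecasts = {name: forecast(measurements, alpha)
                 for name, measurements in throughput_history.items()}
    best = max(forecasts, key=forecasts.get)
    return best, forecasts


# Jobs per minute observed on two resources, oldest measurement first.
history = {
    "local_cluster": [10, 10, 10],
    "remote_cpus": [40, 8, 4, 4],   # started fast, then degraded
}
best, forecasts = pick_resource(history)
```

Here the remote CPUs have the better long-run average, but because their recent measurements have dropped, the forecast favours the local cluster. A scheduler that looked only at the first measurement would make exactly the mistake described above.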
There is still work to be done in this area, however, as performance forecasting needs to be incorporated into scheduler algorithms and the accuracy of performance forecasting can no doubt be improved.

5.5 The No-Defined Problems Problem:

A vital step in solving problems is identifying what they actually are. With any new technology it is hard to know which key problems must be solved for that technology to work: there are no forums for putting problems forward to be solved and no systematic attempts by various researchers to solve them. To encourage the formulation of specific problems and solutions, researchers have proposed several problems that they see as holding back the progress of Grid Computing, and have challenged others not only to solve those problems but to supply more.


Although Grid Computing has reached a state in which a common vocabulary of Grid Computing terms has formed and the various components of a Grid Computing system have been discussed, there is still inconsistency in what the different terms mean and when they are used. Once the basic terms of Grid Computing and the components of Grid systems are agreed upon, research into Grid Computing will be in much better shape.

5.6 Security:

As mentioned, one of the reasons that people may not want to make their computer available on a Grid is that they do not trust other users to run code on their machines. Within small-scale Grids this is not too much of a problem, as Virtual Organisations at least partially eliminate the fear of malicious attacks. This is because in a Virtual Organisation you can authorise only those from within a certain trusted organisation to access your computer. However, there could still be problems with the authorisation systems, and it is possible that someone from within the organisation could act maliciously. With larger-scale Grids it will be impossible to know and trust everyone who can access a single computer, so the Grid infrastructure will have to provide guarantees of security in some way. The Java Sandbox Security Model [14] already provides an environment in which untrusted users are restricted from making certain system calls which are not considered safe, and from accessing memory addresses outside a certain range. Any Grid system will have to provide a similar mechanism, so that users will be happy to let others access their computers.

5.7 Supercomputing Power For Everyone?

In the past, supercomputing power has been available only to very few people - certain people in research institutions and some businesses. If The Grid is ever created, though, supercomputing power will be available to anyone who wishes to access it, although probably at a fairly large cost. This means that, amongst other things, anyone can run huge password searches or try to crack public/private keys. With the creation of The Grid, these issues will have to be addressed either by somehow restricting users from doing such searches or by using even larger keys and passwords. As [5] shows, what is considered to be an unbreakable key one year can be inadequate a few years later, and the advent of The Grid will reinforce this trend. There are no doubt many other social issues that will arise when everyone can have access to supercomputing power, and they will have to be addressed as well.

5.8 The Need Not To Centralise:

Any Grid system must have some knowledge of what resources are available in order to provide Resource Access and Resource Discovery. The obvious way to do this would be to have a central repository listing all resources currently available and who is allowed to access them. The problem with this centralised solution is that it is not at all scalable and means that the entire Grid system is subject to a single point of failure. For these reasons, another way of providing Resource Discovery is required.

If there were a central repository containing details of all Grid resources for a large Grid, the speed at which it would need to operate would be immense. The dynamic nature of Grid resources would mean that the list of available resources would need to be constantly updated. Because the availability of resources is dynamic, they can be taken away from users at any time, which means that users may have to be constantly requesting access to further resources. In a Grid of worldwide scale, a single server could not handle this. As well as being fast enough, the central server would also have to be so reliable that it could never break down. If it did stop working then the whole Grid would also have to stop, and even if only some of the communication channels between it and certain sections of the Grid broke, each such section would have no other server which it could access. Some distributed form of Resource Discovery is required for large Grids to operate reliably.

To solve this problem, the authors of [21] say that they have created a distributed, dynamic system of 'State Exchange Services' called Gossips, which manage resource access and discovery and create and destroy themselves automatically. However, as stated there, not every Grid can use that system, so more work is required in this area. Other current Grid systems do not address this problem at all (see, e.g. [1] and [20]) but rely on centralised managers, so they could not be scaled past a certain point.
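The contrast between a central repository and decentralised discovery can be illustrated with a small simulation. The sketch below is a generic push-pull gossip protocol and is not the Gossip system of [21], whose internals are not described here; the `Node` class, the versioned view and the round structure are all assumptions made for illustration.

```python
import random


class Node:
    """A grid node that keeps its own partial view of who offers what."""

    def __init__(self, name, resources):
        self.name = name
        # view maps resource owner -> (version, resource description)
        self.view = {name: (1, resources)}

    def update_local(self, resources):
        """Publish a change in this node's own resources under a newer version."""
        version, _ = self.view[self.name]
        self.view[self.name] = (version + 1, resources)

    def merge(self, other_view):
        """Adopt every entry that is new or has a higher version than ours."""
        for owner, (version, resources) in other_view.items():
            if owner not in self.view or self.view[owner][0] < version:
                self.view[owner] = (version, resources)


def gossip_round(nodes, rng):
    """Each node exchanges views with one randomly chosen peer."""
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        peer.merge(node.view)
        node.merge(peer.view)


def rounds_until_converged(nodes, rng, max_rounds=100):
    """Gossip until every node knows about every other node, or give up."""
    for round_number in range(1, max_rounds + 1):
        gossip_round(nodes, rng)
        if all(len(n.view) == len(nodes) for n in nodes):
            return round_number
    return None
```

No node holds an authoritative list, so there is no single point of failure: if some nodes disappear, the rest keep exchanging what they know. The cost is that views are only eventually consistent, so a node may briefly act on stale information about a resource that has since been withdrawn.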

5.9 Grid Programming Environments:

Current Computer Aided Software Engineering (CASE) tools and programming languages have not been designed to facilitate the creation of Grid applications. What is considered to be high level in standard software development situations, such as Java or the Message Passing Interface (MPI), is referred to as low level in Grid publications. This is because Grid Computing takes the abstractions provided by what are currently regarded as high-level layers (Virtual Machines, etc.) and extends them. For example, Grid programmers should be able to treat a network as one huge computer and not have to worry about the individual virtual machines that make it up. This extra layer of abstraction should lead to new development environments and possibly to things like new programming keywords such as 'remote', 'local' and 'secure'. The current trend toward component-based development will continue, with Grid applications being made up of different components at different sites. This could mean that huge data sets are stored in one place, analysis is done on the Grid, and visualisation is done somewhere else. The component-based structure leads to the need for standard ways of storing and exchanging data, which current tools like XML provide.

6 Grid Computing at the University of Canterbury:

Grid Computing is not currently employed at the University of Canterbury (UC), but there are several research teams who would like to work on projects that could make extensive use of Grid Computing. This section outlines some of those projects and then the ways in which they could be enabled.

6.1 Research Teams and Projects:

These are projects of research teams from Physics and Astronomy (Prof. Philip Butler and Associate Prof. Lou Reinisch), Forestry (Dr. Hamish Cochrane), Biological Sciences (Associate Prof. Jack Heinemann) as well as from the HIT Laboratory (Dr Mark Billinghurst). Their projects are considered to be so heavily computational that they are not suitable for desktop processing. In particular, the following projects have been planned:

Medical Imaging: The Department of Physics and Astronomy is hoping to purchase a PET/CT scanner in the near future which would be used for Medical Imaging. Currently, running the PET/CT software on a high-end desktop computer means that only about 10% of the time is spent scanning and the other 90% is spent waiting for results. It is hoped that this ratio of scanning to processing time could be increased greatly using a Grid, with processing times reduced at least tenfold.

Bioinformatic Analysis and Genetic Data: Researchers in the New Zealand Institute of Gene Ecology (NZIGE), which includes staff from the Department of Forestry and the School of Biological Sciences, as well as others, would also be ready to make use of a computational grid. The research that would use the grid would mostly involve (in very simple terms) searching for certain patterns in large data sets. This is a very slow process on standard workstations and any increase in speed would be considered useful, with a speedup of between 2 and 24 times being regarded as good, and anything further better still.

As well as these, it is envisaged that other projects would use the grid if it were available. Some other potential uses are:

Proteomics research.

Processing data about imported foods on behalf of MAF. This looks for certain features of the foods but is currently a very slow process.

Processing astronomical data from the several telescopes that the Department of Physics and Astronomy has access to.

6.2 Potential Grid Tools For UC

There are several tools that could be used to facilitate Grid Computing at UC. All of the projects mentioned above focus on data processing rather than data access or any other Grid function, so this section will focus only on the data processing side of Grid Computing. Note that although most of these tools are not Computational Grids as defined earlier in this article, they can still provide useful amounts of computing power (and fall into the realm of what is commonly called Grid Computing).

6.2.1 XGrid

XGrid is a distributed computing system that is currently installed on all Apple Macintoshes at UC. It claims to automatically detect the presence of other Apple Macs and to be capable of distributing processing to them without any explicit programming. The degree to which this works would have to be investigated further. Although most of the computers on campus are not Macs, enough of them are for a fairly significant amount of processing power to be available if the XGrid system is effective.

6.2.2 Globus

As mentioned earlier, the Globus Toolkit is often referred to as the de facto standard for creating Computational Grids. It is therefore logical that if a Grid is to be created at UC, the Globus Toolkit be used. Unlike XGrid, however, the Globus Toolkit is not simply plugged in and used; it is a toolkit with which Grids are built. For this reason, if the Globus Toolkit were to be used to create a Grid at UC, specialist programmers would have to be employed to put it all together. The advantage of the Globus Toolkit is that it is widely used and well understood and, compared to other tools, it is at least known to work and to work well.

6.3 The Akaroa Project

Akaroa2 is an automated controller of stochastic discrete-event simulation developed at the University of Canterbury by the Simulation Research Group (the group led by Prof. K. Pawlikowski from Computer Science and Software Engineering, and Associate Prof. D. McNickle from Management). When Akaroa2 was designed at the University of Canterbury in 1992, it was one of the first software packages enabling grid processing. In 1993, it received an international commendation (in the Science category) in the Computerworld Smithsonian Award for Achievements in Information Technology, USA. Akaroa2 speeds up simulation experiments by performing multiple replications of the experiment in parallel (MRIP) on multiple computers of a LAN, with a simulation being stopped when the overall results have reached the desired level of statistical precision. It runs the different replications on different machines acting as simulation engines. Akaroa2 has been designed to work on local area networks consisting of UNIX/Linux machines. Thus, the degree to which it can be distributed is limited by the number of workstations in a given LAN. Currently, students of Computer Science and Software Engineering at the University of Canterbury can use Akaroa2 to distribute simulations over about 250 workstations. Launching Akaroa2 on a Grid system would certainly be very desirable, since it could give access to many more hosts. The next section investigates how this could be done.

6.3.1 PlanetLab

As mentioned, PlanetLab is not a Grid Computing system but a global testbed for distributed computing systems [30]. The Department of Computer Science and Software Engineering at UC has maintained a node on PlanetLab, so any Grid projects conducted there could use the PlanetLab testbed.
This could be a very good way of extending the Akaroa2 project: multiple simulations could be run in different parts of the world instead of on different machines in the same lab, although issues such as the effect of the increased propagation delay and unreliable access to machines would need to be investigated. PlanetLab would also provide access to several hundred more machines, which could further increase the speed of simulation studies and allow more complicated simulations to be carried out.

6.3.2 MPICH-G2 and Globus:

MPICH-G2 is a grid-enabled implementation of the MPI standard. MPI is a library specification for message passing which can be used for constructing portable parallel programs. Its goals are to provide portability and performance across many platforms and, because it is aimed at being portable, it could be a good tool to use to modify Akaroa2. MPICH-G2 implements the MPI standard and extends it using tools from the Globus Toolkit, allowing the creation of Grid applications that run on multiple machines of potentially different architectures. If Akaroa2 were extended using MPICH-G2, it could be run in multiple environments at once (i.e. not just UNIX or Linux). This would greatly increase the potential processing power available to simulation applications. MPICH-G2 has C and C++ bindings which make it ideal for use with Akaroa2.

Grid Computing means sharing computing resources in order to create supercomputing capabilities out of desktop computers by using their idle CPU time. It also involves sharing other computing resources such as data sets and disk storage. It has been around for several years and has reached the stage where tools are available with which experts can create Computational Grids and use them to solve problems in many fields. There are four vital issues which must be resolved in a distributed computing system before it can be called a Grid. These are Authentication, Authorisation, Resource Access and Resource Discovery. They lead to the idea of Virtual Organisations of collaborators who share resources over a Grid. There are currently several tools available to help developers create Grids. The most widely used of these is the Globus Toolkit, but there are others. There are also several commercial companies which claim to provide Grid systems to clients.

Despite all the progress that has been made with Grid Computing, a number of challenges still exist. They must be faced now or in the future if Grid Computing is to succeed as a technology.
These include the issue of many separate Grids versus a single world-wide Grid, social issues resulting from sharing computing resources (the idea of Grid Economics), security issues (allowing untrusted others to run code on your machine), problems with allocating resources (forecasting the performance of resources and discovering resources without using a single central repository), and many others. Grid Computing is well suited to some of the research that is being done, or is intended to be done, at the University of Canterbury. Projects in Physics and Astronomy, Biological Sciences.

To show the effectiveness of pipelining, we ran a series of experiments. All of our experiments were performed on TeraGrid machines. For local-area tests we ran entirely on the University of Chicago TeraGrid. Our wide-area tests ran between the San Diego Supercomputer Center TeraGrid site and the University of Chicago TeraGrid site. The nodes at these sites are Dual Itanium 1.5 GHz machines with 4 GB of RAM and 1 Gb/s network interface cards. We used the Globus GridFTP server with the modifications described above and a custom client written using the jglobus libraries described above. To avoid anomalies and bottlenecks in the filesystem, we used the standard UNIX devices /dev/zero and /dev/null as our source and destination files, respectively. The devices appear as files to the GridFTP server; however, they do no disk or block I/O.

Figures 3 through 6 show the results of an experiment that transfers 1 GB of data partitioned into an increasing number of files. As the number of files increases, the size of each file decreases, but the total number of bytes transferred remains constant at 1 GB. The top x-axis shows the number of files, and the bottom x-axis shows the size of each file. The y-axis shows the achieved throughput in Mb/s. The LAN results in Figures 3 and 4 show how the legacy transfer request techniques quickly suffer when the data is partitioned into multiple files. There is a significant dropoff before just 10 files of 100 MB each, and almost all of the throughput is lost at 1,000 files of 1 MB each. However, the pipelining solution is unaffected by file partitioning until the point where the file sizes are less than 100 KB. The wide-area tests in Figures 5 and 6 show how significantly latency affects the legacy transfers. Since the round-trip times are greater on wide-area networks, the delay between transfers is also greater, and thus the overall transfer time is longer. However, the pipelining case is again unaffected.


Figure 3: Comparison of the performance of pipelined GridFTP transfers with standard (nonpipelined) GridFTP transfers in a LAN with no security

Figure 4: Comparison of the performance of pipelined GridFTP transfers with standard (nonpipelined) GridFTP transfers in a LAN with security

Security affects the results in a way we did not expect. Since we are caching data channel connections in both the cached and the pipelining cases, we did not expect the throughput levels to drop any sooner with security than without security. However, as shown in Figures 4 and 6, this is not the case. As the number of files increases, the throughput drops off sooner when sending with GSI authentication. After extensive investigation we have determined that this result is due not to any data channel handling but rather to message processing latencies on the control channel.


Figure 5: Comparison of the performance of pipelined GridFTP transfers with standard (nonpipelined) GridFTP transfers in a WAN with no security

Figure 6: Comparison of the performance of pipelined GridFTP transfers with standard (nonpipelined) GridFTP transfers in a WAN with security


Between transfers the server sends a reply to the client. In our implementation the data channel must be idle while the reply is formatted and passed to the TCP stack for sending. With nonsecure transfers this time is extremely short. With GSI, however, the reply must be encrypted, and therefore it takes much longer to format. As more transfers are requested, more of these replies must be sent. Thus, this idle time becomes great enough to affect the transfer rate.


- Allows many outstanding transfer requests
  - Send next request before previous completes
  - Latency is overlapped with the data transfer
- Backward compatible
  - Wire protocol doesn't change
  - Client side sends commands sooner
- Significant performance improvement for LOSF (lots of small files)
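The effect of overlapping request latency with data transfer can be illustrated with a deliberately simplified timing model. The sketch below is not the measurement code behind Figures 3 through 6. It assumes that a legacy transfer pays one full round trip per file before data flows, that a pipelined transfer pays it only once, and it ignores TCP ramp-up, security overhead and server processing. The function names and the 50 ms round-trip time are hypothetical.

```python
def legacy_time(num_files, total_bytes, bandwidth_bps, rtt_s):
    """Assumed model: every file waits one round trip before its data moves."""
    transfer = total_bytes * 8 / bandwidth_bps
    return transfer + num_files * rtt_s


def pipelined_time(num_files, total_bytes, bandwidth_bps, rtt_s):
    """Assumed model: requests are sent ahead, so only the first round trip is exposed."""
    transfer = total_bytes * 8 / bandwidth_bps
    return transfer + rtt_s


def throughput_mbps(total_bytes, seconds):
    """Achieved throughput in megabits per second."""
    return total_bytes * 8 / seconds / 1e6


GB = 10**9
for n in (1, 10, 100, 1000):
    # 1 GB split into n files over a 1 Gb/s link with a hypothetical 50 ms RTT
    legacy = throughput_mbps(GB, legacy_time(n, GB, 1e9, 0.05))
    piped = throughput_mbps(GB, pipelined_time(n, GB, 1e9, 0.05))
    print(f"{n:5d} files: legacy {legacy:7.1f} Mb/s, pipelined {piped:7.1f} Mb/s")
```

Even this crude model reproduces the qualitative shape reported above: legacy throughput falls as the file count grows, while the pipelined figure stays flat.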

Advantages of Grid Computing:

Grid computing has been around for over 12 years now and its advantages are many. Grid computing can be defined in many ways, but for this discussion let's simply call it a way to execute compute jobs (e.g. Perl scripts, database queries, etc.) across a distributed set of resources instead of one central resource. In the past most computing was done in silos or on large SMP-like boxes. Even today you'll still see companies perform calculations on large SMP boxes (e.g. E10Ks, HP Superdomes). But this model can be quite expensive and doesn't scale well.

Along comes grid computing (one of the top five strategic technologies for 2008), and now we have the ability to distribute jobs to many smaller server components using load-sharing software that distributes the load evenly based on resource availability and policies. Instead of having one heavily burdened server, the load can be spread evenly across many smaller computers. The distributed nature of grid computing is transparent to the user. When users submit a job, they don't have to think about which machine the job is going to be executed on. The "grid software" will perform the necessary calculations and decide where to send the job based on policies. Many research institutions are using some sort of grid computing to address complex computational challenges. This post talks about how you can volunteer your workstation to be part of a grid that attempts to solve some of the world's biggest challenges.
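As a rough illustration of how "grid software" might choose a machine for a job based on policies, the following sketch filters nodes by a memory requirement and a load ceiling and then picks the least loaded one. It is a minimal example under assumed policies, not the behaviour of any particular grid product; the node fields and the 0.5 load threshold are hypothetical.

```python
def pick_node(nodes, min_mem_gb, max_load=0.5):
    """Return the least-loaded node that satisfies the policy, or None."""
    eligible = [
        n for n in nodes
        if n["free_mem_gb"] >= min_mem_gb and n["load"] <= max_load
    ]
    if not eligible:
        return None  # nothing suitable right now; a real scheduler would queue the job
    return min(eligible, key=lambda n: n["load"])


# Hypothetical snapshot of what clients reported to the master
nodes = [
    {"name": "a", "load": 0.90, "free_mem_gb": 16},
    {"name": "b", "load": 0.10, "free_mem_gb": 8},
    {"name": "c", "load": 0.30, "free_mem_gb": 32},
]

chosen = pick_node(nodes, min_mem_gb=4)
print(chosen["name"] if chosen else "no node available")
```

The user only states what the job needs; which machine runs it is decided by the policy function.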

Some Advantages of Grid Computing:

1. No need to buy large six-figure SMP servers for applications that can be split up and farmed out to smaller commodity-type servers. Results can then be concatenated and analyzed upon job completion.

2. Much more efficient use of idle resources. Jobs can be farmed out to idle servers or even idle desktops. Many of these resources sit idle, especially outside business hours. Policies can be put in place that allow jobs to go only to servers that are lightly loaded or have the appropriate memory and CPU characteristics for the particular application.

3. Grid environments are much more modular and don't have single points of failure. If one of the servers or desktops within the grid fails, there are plenty of other resources able to pick up the load. Jobs can automatically restart if a failure occurs.

4. Policies can be managed by the grid software. The software is really the brains behind the grid. A client resides on each server and sends information back to the master, telling it what availability or resources it has to complete incoming jobs.

5. This model scales very well. Need more compute resources? Just plug them in by installing the grid client on additional desktops or servers. They can be removed just as easily on the fly.

6. Upgrading can be done on the fly without scheduling downtime. Since there are so many resources, some can be taken offline while leaving enough for work to continue. This way upgrades can be cascaded so as not to affect ongoing projects.

7. Jobs can be executed in parallel, speeding up performance. Grid environments are extremely well suited to running jobs that can be split into smaller chunks and run concurrently on many nodes. Using tools such as MPI allows message passing to occur among compute resources.
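The split, farm-out and concatenate pattern mentioned in the list above can be sketched on a single machine. In this minimal example a local thread pool stands in for grid nodes, which is an assumption made purely for illustration; the `work` function is a hypothetical placeholder for the real per-chunk job.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n_chunks):
    """Split a list into at most n_chunks contiguous chunks."""
    if not items:
        return []
    size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def work(chunk):
    """Hypothetical per-chunk job: square every value."""
    return [x * x for x in chunk]


def run_on_grid(items, n_workers=4):
    """Farm chunks out to workers, then concatenate the partial results."""
    chunks = split(items, n_workers)
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(work, chunks))  # map keeps the chunk order
    return [y for part in partials for y in part]


print(run_on_grid(list(range(10))))
```

Because the chunks are independent, adding workers only changes how the list is split, not the result.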

Methods of Grid Computing:

1. Drozdowski's on-line scheduling method:

Our scheduling method is based on the On-Line method presented by Drozdowski in [1], denoted "OL" hereafter. OL proceeds incrementally, computing the size $\alpha_{i,j}$ of the chunk to be sent to a worker $N_i$ for each new round $j$, in order to try to maintain a constant duration $\tau$ for the different rounds and thus avoid contention at the master. That is, it allocates comparatively bigger (resp. smaller) chunks to workers with higher (resp. lower) performance. Hence, this method can take the heterogeneous nature of computing and communication resources into account without explicit knowledge of execution parameters (as equality (1) shows); as Drozdowski states, "the application itself is a good benchmark" [1] (actually the best one). Lemma 6.1 in [1] shows that, in a static context with affine cost models for communication, the way $\alpha_{i,j}$ is computed using equation (1) ensures the convergence of $\sigma_{i,j}$ to $\tau$ when $j$ increases indefinitely. Being an estimation of the asymptotic period used for task distribution, $\tau$ is also an upper bound on the discrepancy between workers. Being able to control this bound makes it possible to minimize the makespan during the clean-up phase.

round from the master to worker $N_i$ (resp. from $N_i$ to the master).


It should be noted that, unlike previous work [1, 9], this paper introduces computation start-up times in order to be more realistic when considering grids. As suggested in section 2, the values of the execution parameters of any worker Ni ensure that sending chunks of any size to a worker Ni and receiving the corresponding results cost less than processing these chunks. The problem with OL is that computation never overlaps communication in any worker node, as the emission of the chunk of the next round is at best triggered by the return of the result of the previous one, with no possible anticipation.
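Equation (1) is not reproduced in this report, so the following is only a sketch of the general idea behind OL: resize each worker's chunk so that the measured duration of a round approaches a common target. It assumes a simple proportional rule (new size = previous size times target duration divided by previous elapsed time) and an affine worker cost (a fixed start-up time plus a per-unit time). Both are assumptions for illustration and may differ from the formula in [1]; all names and numbers are hypothetical.

```python
def next_chunk_size(prev_size, prev_elapsed, target_period):
    """Assumed proportional rule: scale the chunk by target / measured duration."""
    return prev_size * target_period / prev_elapsed


def simulate_worker(rounds, target_period, startup_s, secs_per_unit, first_size=1.0):
    """Return (chunk size, elapsed time) per round for one simulated worker."""
    size = first_size
    history = []
    for _ in range(rounds):
        # Assumed affine cost: start-up time plus time proportional to chunk size
        elapsed = startup_s + secs_per_unit * size
        history.append((size, elapsed))
        size = next_chunk_size(size, elapsed, target_period)
    return history


fast = simulate_worker(rounds=30, target_period=10.0, startup_s=1.0, secs_per_unit=0.5)
slow = simulate_worker(rounds=30, target_period=10.0, startup_s=1.0, secs_per_unit=2.0)
print("fast worker: size %.2f, elapsed %.2f" % fast[-1])
print("slow worker: size %.2f, elapsed %.2f" % slow[-1])
```

Under these assumptions the faster worker ends up with the larger chunk, and both workers settle on the same round duration without the master ever being told their speeds.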

2. The OLMR method:

2.1 Overview of the method

Our method is based on OL, but avoids idle time with respect to computing. When the total load is large compared to the available bandwidth between master and workers, the workload should be delivered in multiple rounds [10, 11, 12]. Therefore we will have each worker receive its share of the load through multiple rounds, hence the name On-Line Multi-Round method [9], denoted "OLMR" hereafter. OLMR divides the chunk sent to $N_i$ for each round $j$ into two subchunks "I" and "II" of respective sizes $\theta_{i,j}$ and $\alpha_{i,j} - \theta_{i,j}$. Dividing the chunks in two parts is enough to apply the principle, and the division allows the computation to overlap the communications, as can be seen in Figure 1. In order to compute $\alpha_{i,j}$, we use a value of $\sigma_{i,j-1}$ derived from the measurement of the elapsed time (including both communications and computation) for subchunk I of the previous round: $\sigma^{I}_{i,j-1}$. We will show that, thanks to this anticipation (compared to OL) in the computation of $\alpha_{i,j}$, we can avoid the inter-round starvation.
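Why splitting a chunk in two lets computation overlap communication can be seen with a small timing model. This sketch is not the timing analysis of OLMR itself. It assumes linear communication and computation costs, ignores start-up times and the return of results, and only models one worker receiving subchunk II while it computes subchunk I. All parameter names and values are hypothetical.

```python
def round_time_single_chunk(size, comm_per_unit, comp_per_unit):
    """No overlap: the whole chunk is received, then the whole chunk is computed."""
    return comm_per_unit * size + comp_per_unit * size


def round_time_two_subchunks(size, first_part, comm_per_unit, comp_per_unit):
    """Overlap: subchunk II is received while subchunk I is being computed."""
    second_part = size - first_part
    recv_first = comm_per_unit * first_part
    comp_first = comp_per_unit * first_part
    recv_second = comm_per_unit * second_part
    comp_second = comp_per_unit * second_part
    # Computing II can start only when I is computed and II has fully arrived
    return recv_first + max(comp_first, recv_second) + comp_second


whole = round_time_single_chunk(10.0, comm_per_unit=1.0, comp_per_unit=2.0)
halves = round_time_two_subchunks(10.0, 5.0, comm_per_unit=1.0, comp_per_unit=2.0)
print(whole, halves)
```

In this model the saving is exactly the communication time of subchunk II whenever it is no longer than the computation time of subchunk I, so the worker is never left waiting for data.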


So far we have given an overview of the Grid Computing topics that will be discussed further throughout this book, including the evolution of Grid Computing, its applications, and the infrastructure requirements for any grid environment. In addition, we have discussed when Grid Computing should be used and the factors developers and providers must consider in the implementation phases. With this introduction we can now explore in more depth the various aspects of a Grid Computing system, its evolution across industries, and the current architectural efforts under way throughout the world. The following chapters in this book introduce the reader to this new, evolutionary era of Grid Computing in a concise, hard-hitting, and easy-to-understand manner. In the past, implementations of Grid Computing have achieved robustness, throughput, and standardisation. In the future, the focus should be on making grids secure, scalable, and extensible. Finally, a grid in need is a grid indeed.

Future work on Grid Computing:

Grid Computing can be defined as the seamless provision of access to possibly remote, possibly heterogeneous, possibly untrusting, possibly dynamic computing resources. Analysed piece by piece, this definition means that Grid Computing provides seamless access to:

1. Possibly Remote Computing Resources

Local resources, which are on the same LAN, and remote resources, which are geographically distant, can be accessed in exactly the same way on the Grid.

2. Possibly Heterogeneous Computing Resources

Computers on the Grid can run different operating systems on different types of machines. Accessing them via the Grid should be possible without making any special allowances for this.

3. Possibly Untrusting Computing Resources

The owner of a computing resource on the Grid might not know or trust other users but should still be confident that they cannot access any non-shared data and cannot make malicious system calls on the owner's computer. The Grid should handle this security checking without any specific instruction from the user or from the sharer.

4. Possibly Dynamic Computing Resources

One of the major selling points of Grid Computing is that it makes use of otherwise wasted CPU cycles. The problem with this is that the availability of computers to the Grid changes rapidly as computers become busy and then idle as their owners' usage varies. The Grid system should ensure that this dynamism is hidden from users so that they do not have to program explicitly to take account of it.

Seamless provision means that Grid users can access such seemingly inaccessible resources easily without having to worry about all these complications.

Altogether, this definition leads to four main things that any Grid system must provide seamlessly in order to be considered a Grid:

1. Authentication
2. Authorization
3. Resource Access
4. Resource Discovery

4.1.1 Authentication

Authentication means that each user has an identity which can be trusted as genuine. This is necessary because some resources may be authorized only to certain users, or certain classes of users. Authentication of a user should happen only once when they start using a Grid: they should not have to sign on separately to each of the many machines that their computation may use.

4.1.2 Authorization

Authorisation means that each resource, be it the spare computing power on a computer of an organisation or a set of astronomical data, will have a set of users and groups that can access it. The Grid needs to first authenticate that the users are who they say they are and then ensure that they are allowed to access the resources that they are requesting. Having groups authorised to access certain resources leads to the idea of Virtual Organisations.
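The two-step check described above, first authenticate and then authorise against per-resource users and groups, can be sketched as follows. This is a toy illustration using in-memory tables, not how the Globus security infrastructure works; every user, credential, group and resource name is hypothetical.

```python
# Hypothetical identity store: user -> credential presented at sign-on
CREDENTIALS = {"alice": "cert-alice", "bob": "cert-bob"}

# Hypothetical groups, in the spirit of Virtual Organisations
GROUPS = {"physics-vo": {"alice"}, "biology-vo": {"bob"}}

# Each resource lists the users and groups allowed to access it
ACL = {
    "telescope-data": {"users": set(), "groups": {"physics-vo"}},
    "lab-cluster": {"users": {"bob"}, "groups": {"physics-vo"}},
}


def authenticate(user, credential):
    """Step 1: is the user who they claim to be?"""
    return user in CREDENTIALS and CREDENTIALS[user] == credential


def authorize(user, resource):
    """Step 2: is this user, directly or through a group, allowed on the resource?"""
    entry = ACL.get(resource)
    if entry is None:
        return False
    if user in entry["users"]:
        return True
    return any(user in GROUPS.get(g, set()) for g in entry["groups"])


def request_access(user, credential, resource):
    """Grant access only if both steps succeed."""
    return authenticate(user, credential) and authorize(user, resource)


print(request_access("alice", "cert-alice", "telescope-data"))
print(request_access("bob", "cert-bob", "telescope-data"))
```

Granting access to a group rather than to individuals is what lets membership change without touching each resource's list.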

4.1.3 Resource Access

Resource Access means that remote resources can be accessible to Grid users. These resources could mean anything from CPU time to disk storage, to visualisation tools and data sets. As discussed, not everyone should be able to access all resources but the Grid must provide a way to access those that are allowed. This means that some sort of virtual machine is required so that machines with different operating systems, etc. can be accessed in a uniform way.

4.1.4 Resource Discovery

Being allowed to access thousands of different CPUs is useless without being able to find out where they are. Resource Discovery means that users can find remote resources that they can use. This process should be automated by the Grid so that a user's task can automatically be run remotely without them having to go through the process of finding CPUs that they can use. The automation of resource discovery is complicated hugely by the dynamic nature of Grid resources: what is available at one instant may no longer be available a while later. Added to this complication is the desire to avoid a single central point where all data is stored, because its failure would bring the whole system down, and because a single point of control is not a scalable solution: if the Grid becomes really large, this central point would be badly overloaded.

3.2 Virtual Organisations

The idea of a Virtual Organisation (VO) is that on, say, a university campus-wide Grid, members of the Physics and Biology departments working on a project together could form a Virtual Organisation for that project, in which they could all access the project's data and each other's computing resources. Those who are not members of the research group would not be members of the VO and so would not be able to access the resources. Members of the Computer Science department, who would not be part of that VO, may be working on a different project and could have separate projects running with separate access rights for a different set of resources. Note that different projects within the same departments could also have separate Virtual Organisations, so as to keep some of their data separate while allowing projects from both VOs to use the compute resources.

4 Current Grids and Grid Products

There are a number of tools available to help create Computational Grids, both free, open-source ones and commercial products. There is also a standards body which seeks to put forward `recommendations' about how best to do Grid Computing. This section gives an overview of these, and details about several of the many Grids in existence today.

4.1 Tools and Standards

4.1.1 Globus

The Globus Toolkit, designed by the Globus Alliance, contains a set of free software tools (services, APIs and protocols) to facilitate the construction of Grids. It is the most widely used toolkit for building Grids and is frequently referred to as the de facto standard. It includes tools for, among other things, security, resource management and communication. The Globus Alliance also researches various issues related to Grid Computing, especially issues relating to the infrastructure of Grids. Almost every Grid which has its details published was constructed using the Globus Toolkit.

4.1.2 The Global Grid Forum

The Global Grid Forum (GGF) plays a role in the development of Grids similar to that of the W3C in the development of the World Wide Web [26]. It is a conglomerate of interested parties including universities, research institutes and industry. It is not an official body, so it does not put forward standards but just `best practices' for Grid developers. It is important because it provides a forum for new ideas to be discussed by all interested parties. There are strong links between the GGF and the Globus Alliance: ideas put forward by the GGF are often implemented by Globus.

4.1.3 Condor and Condor-G:

Condor is a software tool for distributing computationally intensive jobs over Grids. It works by using spare CPU cycles on other computers. It provides a way of doing resource discovery using `ClassAds', which match job requests to unused resources. Condor-G has been created from the Condor product. It is an enhanced version of Condor which can be used to make Grids. It combines Globus tools, which provide "security, resource discovery, and resource access in multidomain environments", with Condor's "management of computation and harnessing of resources within a single administrative domain." There has also been work on making separate Condor pools "self-organising, fault-tolerant, scalable, and locality-aware", which has proved to be a successful way of automatically managing larger groups of Condor pools.
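The idea of matching job requests to unused resources can be sketched with plain dictionaries. The example below is not Condor's ClassAd language or its matchmaking algorithm. It is a first-fit illustration in which both jobs and machines advertise a few attributes; the attribute names and the matching rule are hypothetical.

```python
def matches(job, machine):
    """A machine suits a job if it is idle, runs the right OS and has enough memory."""
    return (
        machine["idle"]
        and machine["os"] == job["os"]
        and machine["memory_mb"] >= job["min_memory_mb"]
    )


def matchmake(jobs, machines):
    """First-fit matching: each machine is given to at most one job."""
    free = list(machines)
    assignment = {}
    for job in jobs:
        for machine in free:
            if matches(job, machine):
                assignment[job["id"]] = machine["name"]
                free.remove(machine)
                break  # stop scanning once this job is placed
    return assignment


# Hypothetical advertisements
machines = [
    {"name": "m1", "os": "linux", "memory_mb": 2048, "idle": True},
    {"name": "m2", "os": "linux", "memory_mb": 8192, "idle": False},
    {"name": "m3", "os": "linux", "memory_mb": 8192, "idle": True},
]
jobs = [
    {"id": "j1", "os": "linux", "min_memory_mb": 4096},
    {"id": "j2", "os": "linux", "min_memory_mb": 1024},
    {"id": "j3", "os": "linux", "min_memory_mb": 1024},
]

print(matchmake(jobs, machines))
```

Jobs that find no suitable idle machine are simply left unassigned here; a real system would keep them queued until a matching resource becomes free.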

4.2 Some current Grids in development and deployment

There are many Grids currently in use and in production; in this section we examine several of them in detail. They are not claimed to be a representative sample of all current Grids, but only to give insight into a few of them. The huge European Data Grid project and the United States National Fusion Collaboratory are discussed.

4.2.1 European Data Grid:

The European Data Grid is a European Union funded project which aims to create a huge Grid system for computation and data sharing. It is aimed at projects in high energy physics (led by CERN), biology and medical image processing, and astronomy. It is being developed using and extending the Globus Toolkit. In building the Grid, new tools and systems have been developed in many areas useful for the extension of Grid Computing. For example, a method of enabling secure access to databases in Grid environments has been developed [18]. New techniques for searching for patterns in genomic data using the European Data Grid have also been developed.

4.2.2 The National Fusion Collaboratory:

The National Fusion Collaboratory project exists to help research magnetic fusion. Magnetic fusion experiments operate on pulses of plasma which are produced approximately every 15 minutes. The data generated from each measurement must be analysed within the 15 minutes so that changes can be made to the set-up in time for the next pulse. This time limit means that it would be very useful for the researchers to be able to analyse the data quickly so that more time can be spent reconfiguring the experimental set-up. For this reason, the National Fusion Collaboratory constructed a Computational Grid. This project was also built using the Globus Toolkit, and the main research focus is on `advanced reservations of multiple resources': resources such as computational cycles can be reserved in advance if it is known that they will be required sometime in the future.
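Advance reservation can be pictured as booking capacity for a future time window and refusing requests that would exceed what is available. The sketch below is a generic interval-booking example, not the reservation mechanism of the National Fusion Collaboratory or of Globus. The class name, the capacity of 8 CPUs and the use of minutes as the time unit are hypothetical.

```python
class ReservationBook:
    """Tracks CPU reservations over half-open time windows [start, end)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.bookings = []  # list of (start, end, cpus)

    def _used_at(self, t):
        """CPUs already reserved at time t."""
        return sum(c for s, e, c in self.bookings if s <= t < e)

    def reserve(self, start, end, cpus):
        """Book cpus for [start, end) if capacity is never exceeded; return success."""
        if start >= end or cpus > self.capacity:
            return False
        # Usage only rises at booking starts, so checking those points is enough
        points = [start] + [s for s, _, _ in self.bookings if start < s < end]
        if any(self._used_at(t) + cpus > self.capacity for t in points):
            return False
        self.bookings.append((start, end, cpus))
        return True


book = ReservationBook(capacity=8)
print(book.reserve(0, 15, 6))    # analysis window for the first pulse
print(book.reserve(10, 20, 4))   # overlaps the first window
print(book.reserve(15, 30, 8))   # window for the next pulse
```

Reserving ahead of each pulse is what would let an analysis start on time instead of waiting in a queue behind other jobs.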

4.3 Commercial Grid Products:

There are several Grid products currently listed on various websites. They claim to easily enable Grid Computing within organisations, but it is hard to tell how much they actually do because they do not publish refereed papers: most of the information available about them is probably marketing hype rather than verifiable fact. When the NorduGrid was being constructed in Scandinavia, its developers chose to build their own Grid system because nothing existing was suitable [11]. This suggests that, at that stage at least, commercial products were not of a high enough standard for real use.



[1] W. Allcock, J. Bresnahan, R. Kettimuthu, M. Link, C. Dumitrescu, I. Raicu, and I. Foster, The Globus striped GridFTP framework and server, in SC'05, ACM Press, 2005.

[2] Y. Gu and R. L. Grossman, UDT: UDP-based data transfer for high-speed wide area networks, Comput. Networks 51, 7 (May 2007), 1777-1799. DOI: http://dx.doi.org/10.1016/j.comnet.2006.11.009

[3] P. Rizk, C. Kiddle, and R. Simmonds, A GridFTP overlay network service, in Proceedings of the 7th IEEE/ACM International Conference on Grid Computing, Barcelona, Spain, 2007.


[1] http://www.nlr.net/
[2] http://www.uklight.ac.uk/
[3] http://www.csm.ornl.gov/ultranet/topology.html
[4] http://www.lambdastation.org/
[5] http://www.atlasgrid.bnl.gov/terapaths