
Towards a Distributed Java Virtual Machine:
Transparent Migration of Objects using Specialised Methods

Will Deacon (wjd105@doc.ic.ac.uk)
Supervisor: Tony Field
January 2009

Contents
1 Introduction
  1.1 Java
  1.2 The Jikes Research Virtual Machine
  1.3 Method Specialisation
  1.4 Object Migration
  1.5 Applications of Object Migration in Java
      1.5.1 Data Persistence
      1.5.2 DJVMs
      1.5.3 In-memory Databases
      1.5.4 Hot-swappable (fault tolerant) JVMs
  1.6 Objectives


2 Background and Related Work
  2.1 DJVMs
      2.1.1 cJVM
      2.1.2 dJVM
      2.1.3 Hyperion
      2.1.4 JESSICA2
      2.1.5 Kaffemik
  2.2 Distribution APIs
      2.2.1 JavaParty
      2.2.2 JavaSpaces
      2.2.3 ProActive
  2.3 Object Persistence
      2.3.1 Orthogonally Persistent Java (OPJ)
      2.3.2 Java Data Objects (JDO)
      2.3.3 Enterprise JavaBeans (Java Persistence API)
  2.4 In-memory Database Systems
      2.4.1 Space4J
  2.5 Conclusion

3 Specification
  3.1 Aims
  3.2 Milestones
      3.2.1 Method Specialisation Framework
      3.2.2 Data Persistence Transforms
      3.2.3 Checkpointing
      3.2.4 Thread Persistence


      3.2.5 Fault Tolerance
      3.2.6 A Distributed Java Virtual Machine

4 Evaluation
  4.1 Testing
      4.1.1 Jikes
      4.1.2 Method Specialisation Framework
  4.2 Benchmarking
      4.2.1 DaCapo
      4.2.2 SpecJBB
      4.2.3 SpecJVM
      4.2.4 Java Grande Forum
      4.2.5 Conclusion


Abstract

This project aims to explore the idea of method specialisation [1] as a means for fast and efficient object migration within the Java virtual machine. The idea is that, by applying suitable bytecode transformations at class-loading time, virtual method tables can be hijacked depending on the state and location of the object in question. We plan to implement transforms providing object persistence and transparent migration as a precursor to a fully distributed JVM.

Chapter 1

Introduction
Object-oriented programming languages have gained immense popularity due to their convenient level of abstraction [2]. Objects provide a way to encapsulate related data alongside the knowledge to manipulate them. This paradigm naturally lends itself to modularity: a program can be considered as a collection of objects interacting with one another to perform a given task. Taking the object as the finest level of granularity, programs can be partitioned and manipulated in order to introduce features such as distribution, persistence and redundancy. Central to realising these goals is the concept of object migration.

Managing object movement manually requires program modification and the hard-coding of transfer operations, reducing the portability of the code and forcing the programmer to worry about consistency issues. Runtime checks to determine the state of objects are expensive and diminish the possible performance gains originally afforded by the migration. Using the concept of specialised methods (Section 1.3), we aim to explore the potential for transparent migration of objects (Section 1.4) in Java (Section 1.1).

1.1 Java

In 1995, the Java programming language was introduced by Sun Microsystems [3]. Hailed as an architecturally neutral, portable and high-performance [4] programming language, it was quickly adopted and even incorporated into web browsers due to its applet functionality and platform independence. Today, Java is a popular general-purpose language, used in a wide variety of applications [2].

Many systems, including mobile devices, contain implementations of the Java Virtual Machine (JVM) on which Java bytecode is interpreted. The use of a virtual machine facilitates portability and allows for a higher level of abstraction than natively compiled languages, such as C++. For example, the Java virtual machine provides a number of different garbage collectors to relieve the programmer from the burden of explicit memory management [5]. Pointers, and therefore pointer arithmetic, are not made available to the programmer in Java, helping to hide the underlying hardware on which the virtual machine is executing.

A consequence of this level of abstraction is a potential loss of performance. Garbage collection cycles can momentarily pause the program at a time-critical

point, programmers cannot sensibly allocate objects with cache layout in mind, and interpreting bytecode is slower than executing native machine code 1 . Recent improvements to the JVM have closed the performance gap with respect to native languages [6], but the potential for performance degradation remains non-trivial [7]. These performance concerns, combined with the need for the virtual machine to present a Single System Image (SSI), mean that high-performance cluster applications (often written in C or Fortran) have not migrated to the language at the same pace as applications in other domains.

1.2 The Jikes Research Virtual Machine

Jikes RVM [8] is an open source implementation of the Java virtual machine. It grew out of the Jalapeño project and was eventually open-sourced by IBM. Unusually, it is itself written in Java and makes use of a boot image in order to load the runtime [9] without requiring a second JVM. The high-level abstractions afforded by the Java programming language have made Jikes a popular virtual machine for the implementation of research projects. The design of the virtual machine is also well documented and easily modified when compared to more commercial implementations, such as Sun's JVM.

A number of compilers have been implemented in Jikes and can be selected at build time. Each compiles bytecode into machine code in a slightly different way.

Baseline The baseline compiler is designed to generate code where correctness is favoured over efficiency. This is useful for debugging other aspects of the RVM, but is too slow for use in a production environment. The baseline compiler is used to load other compilers dynamically and so is always present in the boot image.

JNI The JNI compiler cannot be selected like the other two compilers, as it is designed especially for compiling Java methods marked with the native keyword. Native methods require the virtual machine to execute code written for the native machine on which the RVM is running. To facilitate this, the JNI compiler generates code to transfer from the virtual machine's internal calling convention to the ABI of the host.

Optimising Finally, the optimising compiler can be used to realise a substantial performance increase over the baseline compiler. This doesn't come for free, however, and the optimising compiler is substantially more complicated than the other compilers. The bytecode is transformed through a further three representations before machine code is eventually emitted. Common compiler optimisations, such as method inlining and elimination of tail calls, are applied in order to generate highly efficient native code.
Of particular interest to this project is the threading system used by Jikes. Rather than fork a new native thread for each Java thread 2 , green threads are multiplexed onto a number of virtual processors. Each virtual processor is
1 Although the Sun HotSpot compiler gets around this.
2 Older versions of Jikes did do this, but the code was later re-written.

implemented as a native POSIX thread. The number of virtual processors remains constant during program execution and must be specified when invoking the RVM. This approach limits the maximum number of threads which Jikes can spawn, preventing it from monopolising the system's resources. However, it is not without problems. A number of thread queues are used for thread scheduling: a global thread queue can serve all the virtual processors, and each virtual processor also has a private local thread queue. In Jikes, all threads run at the same priority, so a simple load-balancing system moves threads between the queues at regular intervals. This can affect performance significantly when the threads use storage local to the underlying pthread (i.e. when executing native code).

Although written in Java, Jikes achieves impressive performance when using the optimising compiler in conjunction with the Adaptive Optimisation System (AOS) [10] to recompile code at runtime.
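The multiplexing model can be loosely illustrated with standard Java concurrency utilities: many lightweight tasks are scheduled onto a fixed pool of carrier threads, analogous to Jikes' virtual processors. This is an analogy only, not Jikes code, and all names are illustrative.

```java
import java.util.concurrent.*;

public class VirtualProcessorDemo {
    // Run n lightweight tasks on a fixed pool of 'virtualProcessors' carrier
    // threads (the pool size is fixed up front, as in Jikes) and return how
    // many tasks completed.
    static int runTasks(int virtualProcessors, int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(virtualProcessors);
        ConcurrentLinkedQueue<Integer> done = new ConcurrentLinkedQueue<>();
        Future<?>[] tasks = new Future<?>[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            tasks[i] = pool.submit(() -> done.add(id)); // a "green" task
        }
        for (Future<?> t : tasks) t.get();  // join all tasks
        pool.shutdown();
        return done.size();
    }

    public static void main(String[] args) throws Exception {
        // 100 tasks share just two carrier threads.
        System.out.println(runTasks(2, 100)); // 100
    }
}
```

Unlike Jikes, a `ThreadPoolExecutor` queues whole tasks rather than time-slicing them, but the resource-capping effect of a fixed carrier count is the same.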

1.3 Method Specialisation

Method specialisation is a technique for dynamically modifying methods depending on the state of the receiver object. This allows the programmer to hijack the dynamic dispatch mechanism of Object-Oriented programming languages in order to invoke specialised method variants. Dynamic dispatch is the process of invoking a virtual method. This is complicated by the fact that a key feature of the Object-Oriented methodology is the ability to override and overload methods. Selecting the correct method at runtime is achieved by the use of virtual method tables (VMTs) which contain (amongst other things) pointers to the correct method implementations for each object class.

Figure 1.1: The implementation of virtual method tables in Jikes RVM (image taken from [1]).

We can consider the diagram in Figure 1.1 to represent the special case where there is only a single VMT. As shown, an object in the heap contains a pointer to a Type Information Block (TIB) which describes the layout of the object. The VMT for the object is appended to the TIB as a list of pointers to int[]s containing the instructions for the relevant methods. We consider only the Jikes RVM (Section 1.2) implementation, as this is the JVM on which [1] was built and on which we shall be working. If we extend the dispatch system to include support for multiple tables, we can select one of a number of candidate methods at runtime. This is the idea behind specialised method variants.

The specialisation framework developed in [1] achieves full virtualisation by beanifying 3 all of the objects in a program. Consequently, all object accesses are directed through a VMT and are candidates for modification. With this achieved, the framework modifies the TIB layout such that an array of TIBs is now present for each object.

Figure 1.2: The modified TIB structure implemented in [1] to support specialised method variants.

The new structure can be seen in Figure 1.2. Each specialised method instance is referred to by a separate TIB which is itself held in an array of TIBs. The unspecialised (primary) TIB is placed at index 0 of the TIB array and, for type-checking purposes, all other TIBs hold a pointer directly to it. TIB flipping is achieved by manipulating the TIB pointer held inside a heap object. This is exposed via the following API methods:

    /* Returns the TIB at index tibSpecNum for the Object o */
    Object[] getTIB(Object o, int tibSpecNum);

    /* Sets the TIB of Object o to be the one held at index tibSpecNum */
    void hijackTIB(Object o, int tibSpecNum);

    /* Resets the TIB for Object o to the primary (default) TIB */
    void restoreTIB(Object o);

The benefits of method specialisation are twofold. Firstly, the performance loss associated with full virtualisation of objects can be mitigated, and potentially eliminated, by allowing the Adaptive Optimisation System within Jikes to perform method inlining. It is worth pointing out that virtual methods existing in the original source code incur no additional cost from full virtualisation, since invocations of such methods already dispatch via the virtual method table. The presence of specialised methods now presents us with a problem: we must perform a runtime check before executing an inlined method to ensure that the inlined code matches that of the corresponding method in the current TIB. To solve this problem, the framework provides a new guard, ig_tib_test, which invokes the correct version of the method in the face of inlining. With method inlining compensating for the costs of virtualisation, the framework can then be used to gain performance. Consider the following pseudo-code fragment:

    if (state of object != normal state) {
        /* Perform special-case code to put object in normal state */
    }
    /* Proceed normally */

Every time this code is executed, the state of the object is explicitly checked. If the special-case code is written as a specialised method, the TIB of objects that are not in the normal state can be modified to point to the specialised code. This specialised code can then place the object in the normal state, revert the TIB and execute the standard code (i.e. the "Proceed normally" block). This technique completely removes the runtime check and, with it, the corresponding performance implications (see Section 2.1.3).

3 The process of beanifying an object requires that get and set methods are generated for all fields in the object class.
As well as offering potential performance gains, another benefit of method specialisation is ease of use. If a program contains many instances of the code above, it would be easy to miss checks of the object state, or even to check it and perform an incorrect action. Method specialisation allows the programmer to define specialisations as single transforms, vastly improving the locality of the code and reducing the effort required to modify the behaviour in the future. This is similar to the crosscutting features available in aspect-oriented programming. A prototype of the framework has been used successfully to implement an implicit read-barrier which will be used to construct a self-scavenging garbage collector in the future.
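The state-check elimination described above can be sketched in plain Java. The real framework flips VM-internal TIB pointers; since those structures are not reachable from ordinary Java code, this sketch simulates the "table" with a swappable delegate field. All names (Variant, Counter, NORMAL, SPECIAL) are illustrative assumptions, not the framework's API.

```java
// The "method table": one entry, selectable at run-time.
interface Variant {
    int compute(Counter c);
}

class Counter {
    int value;
    boolean initialised;            // the state the original code would check

    // Primary (normal-state) variant: no state check at all.
    static final Variant NORMAL = c -> c.value * 2;

    // Specialised variant: runs the special-case code, then "restores the
    // TIB" by swapping the delegate back to the primary variant.
    static final Variant SPECIAL = c -> {
        c.value = 1;
        c.initialised = true;
        c.variant = Counter.NORMAL; // analogous to restoreTIB(o)
        return Counter.NORMAL.compute(c);
    };

    Variant variant = SPECIAL;      // new objects start in the special state

    // Dispatch via the current "table"; no explicit state check anywhere.
    int compute() {
        return variant.compute(this);
    }
}

public class TibFlipDemo {
    public static void main(String[] args) {
        Counter c = new Counter();
        System.out.println(c.compute()); // first call runs the specialised variant
        System.out.println(c.compute()); // later calls skip straight to NORMAL
    }
}
```

The first invocation pays for the special-case path; every subsequent invocation dispatches directly to the normal code, which is the effect the TIB-flipping framework achieves at the VM level.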

1.4 Object Migration

Object migration is the process of managing object state transitions across multiple address spaces. This process has been extensively studied in the field of object-oriented databases [11, 12, 13, 14], where either a single, distributed database moves objects between nodes, or multiple databases operate on some shared data and require synchronisation. Migration in distributed computation, however, usually concerns processes communicating via messages, explicitly dividing up the workload rather than sharing it between nodes [15]. This is due, in part, to the complications of ensuring both efficient and error-free object distribution, but also because distributed systems originated from a non object-oriented background. Some of the potential benefits of object migration are:

Data Robustness Objects can be replicated across address spaces. This may be as simple as writing them to stable storage or as complicated as distributing them across multiple nodes in a cluster. An object-oriented database can use object migration to write to redundant storage.

Load Balancing Distributing objects amongst a number of compute nodes provides a way to exploit data-level parallelism. The distribution of objects can be adjusted at runtime to exploit the hardware available in an optimal fashion.

Sharing State Many long-running scientific simulations involve numerical analysis on an underlying model. Using object migration, it is possible to share a single model between a number of experiments and to reason concurrently about them.

System Reliability In critical applications, redundant standby nodes can be used to recover from node failure. The objects are migrated from the failing node 4 and the execution is allowed to continue on one of the standby nodes.

Despite these benefits, object migration is subject to a number of pitfalls:

Consistency Ensuring object replicas contain consistent information when read by different nodes is not an easy task. Heavy use of locking and invalidation of remote objects both lead to unwanted performance degradation. The use of local object caches also requires careful management.

Non-Trivial Performance Issues If an object is constantly changing state between a number of nodes then performance will suffer. This is akin to cache-line thrashing in a multi-processor system. In a statically scheduled environment, the relative speeds of nodes in a heterogeneous system may introduce wasted CPU cycles while a faster node waits for a slower one to complete.

Program Modification To facilitate object migration, the programmer must manually code all object transfers and ensure the consistency of replicated objects in the system. This ties a given program to the hardware on which it is designed to execute and reduces the portability of the code.
4 If the failure is so severe that object migration is not possible, a method of checkpointing during execution could be used.

Nevertheless, there are a number of domains where object migration has been successfully implemented. We discuss these domains in the following section and evaluate a number of implementations in Chapter 2.

1.5 Applications of Object Migration in Java

1.5.1 Data Persistence
Persistent data is characterised by having a lifetime which extends beyond that of the program in which it was created. A simple example of persistent data is that of a non-volatile database: if a program creates a new record in a database, the record is committed to stable storage and becomes available for subsequent accesses by any future executions. In-memory databases are discussed in Section 1.5.3.

Many applications require a portable form of persistence without the overheads of a full database application. Object migration can be used to fulfil these demands by writing objects to disk in a serialised form at specified checkpoints. In this situation, objects take on two states: they are either stored on disk or loaded in the application's memory. The programmer must ensure that during execution the persisted data (if any) is loaded into memory and used as required. When execution terminates, any changes must be written back to disk. There are a number of Java APIs available to allow objects to persist between program executions (see Section 2.3).
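As a concrete illustration of this explicit (non-transparent) style of persistence, the standard java.io serialisation machinery can write an object graph to disk at a checkpoint and restore it in a later execution. The class, method and file names here are illustrative only.

```java
import java.io.*;

public class CheckpointDemo {
    // A hypothetical piece of application state; Serializable so the whole
    // object graph can migrate to disk.
    static class Record implements Serializable {
        private static final long serialVersionUID = 1L;
        final String key;
        int value;
        Record(String key, int value) { this.key = key; this.value = value; }
    }

    // Checkpoint: write the object to stable storage.
    static void checkpoint(Record r, File f) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(r);
        }
    }

    // Restore: load the persisted state in a later execution, or null if none.
    static Record restore(File f) throws IOException, ClassNotFoundException {
        if (!f.exists()) return null;
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (Record) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("ckpt", ".ser");
        checkpoint(new Record("hits", 42), f);   // object migrates to disk
        Record r = restore(f);                   // ...and back into memory
        System.out.println(r.key + "=" + r.value); // hits=42
        f.delete();
    }
}
```

The burden the section describes is visible here: the programmer must decide when to checkpoint and restore, which is exactly what a transparent persistence transform would remove.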

1.5.2 DJVMs

In an effort to hide the movement of objects from the programmer, there have been a number of attempts at creating a Distributed Java Virtual Machine (Section 2.1). Making the underlying JVM cluster-aware allows execution to migrate transparently between nodes in order to achieve parallelism. Although a number of approaches have been taken when implementing these systems, object migration is always present in some form or another. Whether migrating shared data between multiple nodes or executing a method on a remote object, issues such as consistency, addressing and object state all come into play, with every object dereference having to be checked at runtime.

1.5.3 In-memory Databases

Database performance can be improved significantly by holding records in memory rather than on slower stable storage. The volatile nature of memory, however, is not ideal for safely storing database records, and measures must be taken to satisfy the ACID 5 properties of database systems. To achieve durability, object migration may be used to perform asynchronous updates to a stable store, such as a local disk array or another remote database. The database could also be distributed across multiple nodes, each holding replica objects to minimise the possibility of data loss and using object migration to synchronise with each other.
5 The D in ACID stands for Durability.
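One illustrative way to obtain durability without blocking the in-memory read and write paths is a write-behind scheme: updates are applied to memory immediately and migrated to an append-only log by a background thread. This is a sketch under assumed names, not a real database engine.

```java
import java.io.*;
import java.util.Map;
import java.util.concurrent.*;

public class WriteBehindStore {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();
    private final File log;

    WriteBehindStore(File log) { this.log = log; }

    // Reads are served entirely from memory.
    String get(String key) { return memory.get(key); }

    void put(String key, String value) {
        memory.put(key, value);                   // visible to readers at once
        flusher.submit(() -> append(key, value)); // durability is asynchronous
    }

    // Append-only redo log on stable storage; synchronised so entries
    // are never interleaved.
    private synchronized void append(String key, String value) {
        try (Writer w = new FileWriter(log, true)) {
            w.write(key + "=" + value + "\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drain any pending log writes before shutting down.
    void close() throws InterruptedException {
        flusher.shutdown();
        flusher.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("wb", ".log");
        WriteBehindStore s = new WriteBehindStore(f);
        s.put("user:1", "alice");
        System.out.println(s.get("user:1"));      // served from memory
        s.close();
        f.delete();
    }
}
```

A real system would additionally need recovery-by-replay of the log and some answer to writes lost in the window before the flush, which is where the replication described above comes in.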

1.5.4 Hot-swappable (fault tolerant) JVMs

As discussed in Section 1.1, Java is not yet widely used for cluster computation. Instead, multiple JVMs are often created and each node runs a separate Java process on a self-contained sub-problem of the overall goal. If a node fails in this environment, the workload will not be automatically re-distributed and the other nodes will continue blindly. To cater for this scenario, a number of hot spares can be added to the cluster. The purpose of one of these nodes is to resume the work of a failing node. At the simplest level, object migration can be used to migrate the working set of the failing node to the spare, which then begins to operate on the partially processed data. However, this requires the program to have explicit code to handle the recovery case. A related problem is that of thread migration 6 , where the spare node must recreate the state of the threads in the failing system before resuming execution where it left off.
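The explicit recovery code that such a scheme demands might look like the following sketch: a worker checkpoints its progress to stable storage, and a spare reloads the snapshot to resume from where the failed node left off. All class, method and file names are hypothetical.

```java
import java.io.*;

public class HotSpareDemo {
    // The working set of one node's sub-problem.
    static class WorkState implements Serializable {
        private static final long serialVersionUID = 1L;
        int nextIndex;     // progress through the sub-problem
        long partialSum;   // partially processed data
    }

    // Process items up to 'limit', checkpointing progress after every step so
    // a spare can take over at any point.
    static WorkState run(WorkState state, int limit, File snapshot) throws IOException {
        while (state.nextIndex < limit) {
            state.partialSum += state.nextIndex;
            state.nextIndex++;                     // advance before checkpointing
            try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream(snapshot))) {
                out.writeObject(state);            // migrate progress to stable storage
            }
        }
        return state;
    }

    // The spare node picks up the most recent snapshot.
    static WorkState resume(File snapshot) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(snapshot))) {
            return (WorkState) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File snap = File.createTempFile("snap", ".ser");
        run(new WorkState(), 5, snap);             // the "failing" node's work
        WorkState recovered = resume(snap);        // spare resumes: sum of 0..4
        System.out.println(recovered.partialSum);  // 10
        snap.delete();
    }
}
```

Note how much machinery is application-specific here; the section's point is precisely that transparent object migration would move this burden from the program into the runtime.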

1.6 Objectives

The objective of this project is to explore the potential for transparent object migration using the method specialisation framework described in Section 1.3. In particular, we plan to investigate applications of the framework for transparent and explicit persistence of Java programs as well as fault tolerance. The runtime checks associated with object migration (namely identifying the state of an object and taking appropriate action to ensure it can be used) can be eliminated in the same way in which the implicit read-barrier was implemented in [1]. A further objective is to identify and characterise the issues associated with a distributed JVM. The outcome of this research, combined with the above work, can then be used to implement a prototype system in which objects migrate between nodes when required by the threads operating on them.

6 In Java, threads are themselves represented as transient objects, so this is a special case of object migration.

Chapter 2

Background and Related Work


2.1 DJVMs

Existing implementations of a distributed Java runtime suffer from a number of pitfalls:

Lack of (current) development Active research into distributed JVMs appears to have diminished in recent years. Previous work has been left unmaintained and has become subject to bit rot.

Availability Where the design of a distributed JVM has been described, there is rarely a publicly available implementation of the ideas. In the few cases where code is available, it is commonly a research prototype offering only a limited subset of the full functionality required by a real Java program.

Performance analysis Often, a paper will describe the algorithms to implement a distributed JVM efficiently, but then fail to disclose any real benchmarks. Where benchmarks are evaluated, they are typically embarrassingly parallel and offer no real insight into the overheads of communication and data movement.

No de-facto standard As a result of the above points, no single distributed JVM has entered regular use by a body of programmers. The lack of even a de-facto standard leaves the situation as a collection of incomplete, unmaintained and disparate implementations, offering no support or future-proofing to programmers wishing to use them.

With this in mind, we evaluate some proposed solutions to the problem of distributed Java and look specifically at the use of object migration within them.

2.1.1 cJVM

cJVM [16] was developed by IBM around 1999. They claim to be the first to implement a cluster-aware JVM, where the cluster is completely hidden from the application. The cluster in mind is a collection of homogeneous machines

connected by a fast communications medium, and the application domain is that of Java Server Applications (JSAs). The goal of cJVM is to distribute a standard multi-threaded Java application transparently across a cluster by assigning the application threads to different nodes. In order to provide this illusion, a distributed heap is implemented by cJVM. Each node runs a cJVM process capable of creating new objects. When a new object is created, the node responsible for its creation is said to hold the master copy of the object. External references (proxies) held by other nodes can be used to access the object indirectly. To support the illusion of a single address space, objects passed as arguments to remote procedures are assigned a unique global identifier.

Rather than move data around the cluster, cJVM migrates execution between nodes. That is, when a thread attempts to dereference a proxy object, execution is momentarily migrated to a thread running on the node where the master copy of the object is held. On this node the method is invoked before execution returns to the original thread on the remote node. This process is referred to as method shipping. Method shipping requires the concept of distributed stacks, the stack of a Java thread being split up over multiple system threads. Each Java thread is also assigned a logical identifier in order to provide a handle on related system threads and any monitors held by the thread.

To compensate for different data access patterns, proxy objects can dynamically change behaviour at run-time. The proxy implementations provided by cJVM are described as:

Simple proxy The default implementation: the method is executed on the master node.

Read-only proxy When data is read-only, the method can be executed locally. Copies of the data are therefore held by this proxy.

Proxy with locally invoked stateless methods Where object fields are not referenced, a method can be invoked locally.
Dynamic switching among proxy implementations is provided by extending the virtual method table of a class into an array of method tables, each of which refers to a different proxy implementation. This is similar to the idea of method specialisation described in [1]. Load balancing in cJVM is achieved by intercepting the new opcode. If the parameter is a runnable object, the opcode is rewritten as a private remote new opcode which can then consult a load-balancing system in order to select the optimal node on which to run.

The performance of cJVM is unclear. Benchmarks taken from their website 1 offer little insight into the performance of real-world applications (Figure 2.1). They claim 80% efficiency on a 4-node cluster, but this cannot be validated because the source code (or even a binary distribution) has not been made publicly available and the project has been inactive since 2000.

In conclusion, cJVM is a sound proposition for a distributed JVM implementation. The use of proxy objects allows for a fully distributed heap and
1 http://www.haifa.il.ibm.com/projects/systems/cjvm/benchmark.html


automatic thread migration hides the cluster architecture from the programmer. The lack of an available implementation and a full suite of benchmarks limits the use of cJVM as a comparison with other distributed JVMs.

Figure 2.1: cJVM benchmarks for three selected kernels.
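The behaviour-switching proxies that cJVM describes can be illustrated with a small Java sketch. The RPC to the master node is faked with a direct call, and all names (Account, MasterAccount, AccountProxy) are assumptions rather than cJVM's actual classes.

```java
interface Account {
    int balance();
}

// Stands in for the master copy of the object on its home node.
class MasterAccount implements Account {
    private final int balance;
    MasterAccount(int balance) { this.balance = balance; }
    public int balance() { return balance; }
}

// A remote reference that can switch behaviour at run-time, mimicking
// cJVM's simple vs read-only proxy implementations.
class AccountProxy implements Account {
    private final MasterAccount master; // in reality, a remote reference
    private Integer cachedBalance;      // null => simple proxy; set => read-only

    AccountProxy(MasterAccount master) { this.master = master; }

    public int balance() {
        if (cachedBalance != null) {
            return cachedBalance;       // read-only proxy: execute locally
        }
        return master.balance();        // simple proxy: "ship" to the master node
    }

    // Switch to read-only behaviour once the data is known not to change.
    void makeReadOnly() {
        cachedBalance = master.balance();
    }
}

public class ProxyDemo {
    public static void main(String[] args) {
        AccountProxy p = new AccountProxy(new MasterAccount(100));
        System.out.println(p.balance()); // remote invocation
        p.makeReadOnly();
        System.out.println(p.balance()); // served from the local copy
    }
}
```

cJVM performs this switch by swapping method tables rather than testing a field, which is exactly the parallel with method specialisation drawn above.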

2.1.2 dJVM

dJVM [17] is an open-source research project built on top of the Jikes Research Virtual Machine (RVM) described in Section 1.2. Originally, the dJVM project was to explore the themes of performance enhancement, fault tolerance and memory management, but since 2004 the last two areas have been dropped. The prototype code available is unmaintained, obsolete (supporting Java versions 1.3 and 1.4) and only implemented in the baseline compiler of Jikes (as opposed to the optimising compiler). The implementation is not robust enough to run any real benchmarks, so the performance of the code is largely unknown.

The approach taken is similar to that of cJVM. A master/slave architecture is used to coordinate interactions within the system. Upon booting, the master node activates the communication layer in order to connect each slave node to every other node in the cluster, forming a fully connected node graph prior to execution. Communication within the dJVM occurs via the use of messages and consists of a substrate, registry and thread pool. The substrate provides a means of replacing the underlying communications protocol (e.g. TCP/IP) with another, if desired. It also provides a pool of message buffers which can be re-used to support garbage collection. The registry is used for messages which require a response; the outgoing message registers itself with the registry in order to receive a notification when a response is sent back from the recipient. Finally, the thread pool contains a number of message-processing threads which assemble and decode messages from incoming packets. Messages can be either synchronous or asynchronous and are of the type send, decode or process.

dJVM makes use of a centralised class loader running on the master node. This creates a bottleneck, but is justified by being simple to implement and by exploiting the observation that class loading generally becomes less frequent as a program executes.
A class is described in Jikes as a set of constant objects, so these are simply copied to the slave node when classes are loaded. More interestingly, method compilation during class instantiation is performed locally on


the slave node. Compiled methods are private to the node; sharing these objects is a potential optimisation that could be explored in future work. Globally used classes are instantiated at the master node.

Within the dJVM, objects have either an Object IDentifier (OID), in the case that they are held locally, or a Local logical IDentifier (LID) to indicate that the object is remote. A LID can be mapped on to a Universal IDentifier (UID) which in turn can be used to determine the remote node. An OID is simply the address of the locally held object. Invoking a method on a remotely held object is therefore performed by:

1. Locating the remote node.

2. Deciding where to execute the method. static methods, for example, can be executed locally, as static method code is replicated at each node.

3. In the case of remote invocation, transferring the method parameters to the remote node prior to execution.

Inter-node parameter passing is achieved using a proxy/stub model. Methods are compiled along with a proxy and a stub method. The proxy method creates a message containing the parameters, which it then sends to the remote node. The stub method unpacks the parameters from this message and invokes the relevant method on the (now local) object.

dJVM makes use of both data and thread movement in an attempt to improve performance. However, it is the location of data in the system that decides where execution takes place so, like cJVM, execution migrates to where the data is held. No benchmarks are available, as the system is not yet robust enough to execute any substantial pieces of Java code. It is worth noting that dJVM is still being worked on internally at ANU even though the original funding has expired.
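The proxy/stub parameter-passing model can be sketched as follows. The network hop is replaced by an in-memory byte array, and the names are illustrative rather than dJVM's generated code.

```java
import java.io.*;

public class ProxyStubDemo {
    // The "real" method, living on the node that holds the object.
    static int add(int a, int b) {
        return a + b;
    }

    // Proxy side: marshal the parameters into a message.
    static byte[] proxyAdd(int a, int b) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(a);
            out.writeInt(b);
        }
        return buf.toByteArray(); // in dJVM this would be sent to the remote node
    }

    // Stub side: unpack the message and invoke the method on the local object.
    static int stubAdd(byte[] message) throws IOException {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(message))) {
            return add(in.readInt(), in.readInt());
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stubAdd(proxyAdd(2, 3))); // 5
    }
}
```

In dJVM the proxy and stub are generated alongside each compiled method, so this marshalling is invisible to the application programmer.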

2.1.3 Hyperion

Hyperion [18] takes a novel approach to the distribution of Java threads. Rather than directly implementing a distributed virtual machine, Java bytecode is generated with a standard Java compiler and then immediately translated into C using the java2c tool which forms part of the Hyperion system. The generated C code is linked with the Hyperion runtime library and compiled with a standard C compiler (Figure 2.2). The resulting executable contains an encoding of the original Java program alongside an internal API for distributed thread and data management. The runtime system is made up of the following components:

Thread subsystem: This is built on top of the Marcel [19] thread library from PM2 2. The Marcel library offers the ability for RPCs and thread migration between nodes; the latter is used for load balancing in Hyperion.

Communication subsystem: Communication in Hyperion is again provided by PM2. RPC invocations between nodes are used to invoke methods and create new threads where appropriate.
2 http://runtime.bordeaux.inria.fr/Runtime/software.html


Figure 2.2: The process of compiling a Java program to run using the Hyperion framework. (Image taken from [18])

Memory subsystem: The illusion of a Distributed Shared Memory (DSM) is created using the distributed shared memory layer in PM2. The Java Memory Model (JMM) does not require sequential consistency for non-volatile variables, allowing nodes to request data asynchronously from other nodes during execution. The JMM also specifies thread-local storage, or caches, to improve the performance of multi-threaded applications. Hyperion implements these caches, as well as per-node caches which are shared by all the threads on that node. The DSM also provides a number of optimisations; for example, objects accessed inside a loop may be cached during the execution of the loop. Hyperion uses a number of memory primitives (loadIntoCache, updateMainMemory, invalidateCache, get, put) which are implemented at this layer.

Object migration in Hyperion occurs when multiple nodes access the same object. Like both cJVM and dJVM, objects are considered to have a master copy located at their home node (the node on which they were created). When another node accesses a remote object, a local copy is created in the requesting node's cache. Writes to this object by the (non-home) node then occur only in the cache. To synchronise with the home node, updateMainMemory is called to write the object back to the home node and ensure that the object is consistent again.

Unfortunately, Hyperion is now a rather dated system. Only a small number of native methods from the Java API 1.1 have been implemented, making Hyperion unsuitable for widespread use. The load balancing system simply uses a round-robin scheme rather than anything more advanced, for instance migrating threads away from overloaded nodes. The performance of Hyperion is once again unclear. The single benchmark evaluated in [18] is not a widely recognised benchmarking program and the particular implementation is not disclosed.
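The interplay between loadIntoCache, cache-local writes and updateMainMemory can be sketched as follows. This is a minimal single-process illustration using invented HomeNode and NodeCache classes, not Hyperion's actual implementation; only the primitive names come from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// The home node holds the master copy of each object (here, an int array
// keyed by an object identifier).
class HomeNode {
    private final Map<Integer, int[]> heap = new HashMap<>();

    void put(int oid, int[] data) { heap.put(oid, data); }
    int[] get(int oid) { return heap.get(oid); }
}

// A per-node software cache, as described for Hyperion's memory subsystem.
class NodeCache {
    private final HomeNode home;
    private final Map<Integer, int[]> cache = new HashMap<>();

    NodeCache(HomeNode home) { this.home = home; }

    // loadIntoCache: fetch a copy of the master object into the local cache.
    int[] loadIntoCache(int oid) {
        return cache.computeIfAbsent(oid, id -> home.get(id).clone());
    }

    // Writes by a non-home node hit only the cached copy...
    void write(int oid, int index, int value) {
        loadIntoCache(oid)[index] = value;
    }

    // ...until updateMainMemory writes the object back to its home node.
    void updateMainMemory(int oid) {
        home.put(oid, cache.get(oid).clone());
    }

    // invalidateCache: drop the local copy so the next access refetches it.
    void invalidateCache(int oid) { cache.remove(oid); }
}

public class CacheDemo {
    public static void main(String[] args) {
        HomeNode home = new HomeNode();
        home.put(1, new int[] {0, 0});

        NodeCache remote = new NodeCache(home);
        remote.write(1, 0, 42);
        System.out.println(home.get(1)[0]); // still 0: the write is cached
        remote.updateMainMemory(1);
        System.out.println(home.get(1)[0]); // 42: written back to the home node
    }
}
```

The sketch makes the consistency hazard concrete: between write and updateMainMemory, the home node and the cache disagree, which is exactly the window the JMM's relaxed consistency for non-volatile variables permits.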
It is therefore difficult to assess whether the Hyperion system is optimally distributing threads and controlling data movement or whether the problem is simply embarrassingly parallel. The results from the benchmark show that the Hyperion system is about 5x slower than native C code when executing a sequential implementation. This overhead is particularly large and is attributed to runtime checks performed by Hyperion. These runtime checks can involve checking array bounds or, more interestingly, checking object locality. The DSM underlying Hyperion is object-based rather than page-based; each object reference therefore requires an in-line locality check to decide whether to fetch the possibly remote object. Results show that, when executing on a single node (i.e. making all objects local), performance increased by almost 50% due to the absence of these checks. However, the benchmark scales well when executing a parallel implementation, achieving between 78% and 90% efficiency across four nodes.

2.1.4 JESSICA2

JESSICA2 [20] is a DJVM implementation from the University of Hong Kong. It is a successor to the JESSICA (Java-Enabled Single-System Image Computing Architecture) DJVM developed by the same team. JESSICA2 differs from the original project in that it uses a JIT to translate Java bytecode into native code prior to execution, as opposed to running the bytecode in an interpreter. The JIT is a custom implementation referred to as JITEE and has built-in support for thread migration between nodes at bytecode boundaries.

Figure 2.3: The JESSICA2 architecture. A master node migrates threads between slave nodes. Each slave node executes bytecode using a cluster-aware JIT existing in a shared object space. (Image taken from [20])

The single system image of JESSICA2 is realised using a Global Object Space (GOS). The GOS is essentially a purpose-built DSM designed with the JMM in mind. This approach offers the advantage that object migration can simply involve moving empty object shells, because dereferencing these will cause the GOS to fetch the relevant data automatically. Figure 2.3 shows the overall system architecture of the virtual machine.

Both objects and threads can be migrated in JESSICA2. The master/slave architecture is used to distribute multi-threaded Java programs, with the master migrating threads to and between the slave nodes according to a load monitor. Thread migration is achieved using a technique called stack capturing. Stack capturing allows threads to migrate even though they are subject to execution by a JIT. This differs from Hyperion (Section 2.1.3) in that JESSICA2 must be able to match bytecode to native machine code, whereas Hyperion runs purely machine code and uses a generic thread migration layer. The migration process may therefore only occur at certain coherent points, i.e. those points where the native code lines up with the end of a bytecode instruction. These points are identified by the JIT and migration code is added into the native code. When execution reaches a candidate point for thread migration, the added code checks whether or not it should migrate to another node. If so, the machine registers are spilled onto the Java stack, type information about the variables on the stack is encoded and the Java stack frames to migrate are chosen. Utility methods are then called to perform the data transfer and migrate the thread to the remote node.

As with many DJVMs, objects have a home node associated with them. However, JESSICA2 allows the home node of an object to change at runtime depending upon execution heuristics. This allows objects to move towards the nodes that are using them the most and ultimately reduces the amount of network traffic.

The performance of JESSICA2 is largely achieved by using a JIT on each node rather than a bytecode interpreter to execute the program. cJVM (Section 2.1.1) makes use of an interpreter and suffers as a result. Despite using a JIT, the performance of JESSICA2 is still generally worse than the unmodified Kaffe JVM on which it is built. The Java Grande Benchmark suite (Section 4.2.4) was used to benchmark the two JVMs. The only two benchmarks in which JESSICA2 outperformed Kaffe by a reasonable margin were sync and SOR, although this was achieved in part by replacing the locking mechanism of Kaffe with native locking code for JESSICA2. A major part of the performance trouble is attributed to having to check the state of each object on every access. It is claimed in [20] that this checking code alone contributes as much as 50% of the native code executed. JESSICA2 is available as an open source prototype, implemented using the Kaffe JVM 3.

2.1.5 Kaffemik

3 http://www.kaffe.org

Another DJVM implementation built using the Kaffe JVM is the suitably named Kaffemik [21]. Like Hyperion (Section 2.1.3), Kaffemik uses a DSM layer to hide the distributed heap on which the DJVM must operate. The DSM is accessed using a single address space which subsumes the virtual memory spaces of all the nodes. The architecture can be seen in Figure 2.4. Memory in Kaffemik is managed as a collection of regions which can be allocated or destroyed via an API. Creating a new region will return either a new region (in which to allocate objects) or a reference to a remote region if one already exists with the given identifier. Threads are assigned to nodes in a round-robin fashion, assuming a homogeneous system. To make up for the performance lost through the DSM, a high-speed interconnect is used to connect the nodes. The interconnect is used to implement inter-thread communication and minimise latency. Nodes can be interrupted using this network, allowing for thread synchronisation between them.

Figure 2.4: The architecture of Kaffemik. A separate JVM process runs on each node, communicating with other nodes using a high-speed interconnect accessed through a single address space (SAS). (Image taken from [21])

The performance of Kaffemik is evaluated using a ray tracer benchmark. Since this can be computed using an embarrassingly parallel algorithm, it does not really say much about the implementation of Kaffemik. Two versions of the benchmark are described: an optimised version and a non-optimised version. The non-optimised version exhibits a large drop in performance when moving from a single-node system to one with two nodes. Adding a third node mitigates this slightly, but it still runs at 0.104 times the original speed. The optimised version, however, shows good performance gains as the number of nodes in the system rises, achieving a 2.81x speedup with three nodes. The difference between the two benchmarks makes clear that the performance of Kaffemik is highly application dependent, even in the face of easily parallelisable problems.

2.2 Distribution APIs

An alternative to the transparent approach of the DJVM concept is the use of an API to control explicitly the migration of objects. This can be presented to the programmer at various levels of abstraction.

2.2.1 JavaParty

JavaParty [22] is an API designed for easy distribution of multi-threaded programs across cluster hardware. A remote keyword is added to the Java language for describing classes. Instances of classes decorated with the remote keyword can be accessed by any node in the cluster. Remote objects therefore migrate around the system as and when required. Migration can occur automatically or manually, allowing the programmer to assign priorities to particular nodes and tune migration in order to improve performance. Similarly, remote objects can be declared as resident to prevent them from automatically migrating around the system.

A basic program using the JavaParty API is shown below.

public remote class HelloWorld {
    public void hello() {
        System.out.println("Hello World from machine " +
                           DistributedRuntime.getMachineID() + "!");
    }

    public static void main(String[] args) {
        for (int n = 0; n < 4; ++n) {
            // Create a remote object on some node
            HelloWorld world = new HelloWorld();
            world.hello();
        }
    }
}

The HelloWorld class is declared remote. When the main method instantiates a number of HelloWorld objects, JavaParty will automatically distribute these onto idle nodes. When the hello() method is invoked on these remote objects, it prints out the machine ID for the JVM on which the DistributedRuntime has placed it. An example trace from this program may be:

Hello World from machine 0!
Hello World from machine 1!
Hello World from machine 2!
Hello World from machine 3!

JavaParty provides a simple API for adapting multi-threaded programs to a distributed environment, allowing the programmer to interfere as much or as little as they wish. It does, however, assume that nodes do not fail and that the network is robust. Exceptions are not thrown, let alone handled, if such failures do occur.

2.2.2 JavaSpaces

JavaSpaces [23] is an API offering a different take on the problem of program distribution. Internally, JavaSpaces uses the RMI and Object Serialisation features of Java to abstract data movement in the form of a space. Put simply, a space is a persistent object repository accessible via a network. It acts as a store for serialised objects that can be accessed by a number of nodes using the following primitive operations:

write: Write a new object into a space, making it accessible to all users of the space.

take: Retrieve an object from a space. This operation removes the object from the space in which it resides.

read: Take a copy of an object from a space.

notify: Notify an object (i.e. a user of a space) when an object matching a given query is found in a space.

The idea is to co-ordinate processes by capturing the flow of objects between them in a number of different spaces. Objects migrate through spaces in order to arrive at one or more destination nodes.
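The four primitives can be illustrated with a toy, in-memory space. The real JavaSpaces API (net.jini.space.JavaSpace) matches serialised Entry templates and supports transactions, leases and remote notification; the ToySpace class below is an invented stand-in showing only the object-flow semantics of write, take and read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// A toy, single-process "space" illustrating the write/take/read primitives.
// Matching is done with a predicate here; the real API matches against
// template Entry objects instead.
class ToySpace<T> {
    private final List<T> objects = new ArrayList<>();

    // write: make an object visible to all users of the space.
    synchronized void write(T obj) { objects.add(obj); }

    // take: retrieve a matching object, removing it from the space.
    synchronized T take(Predicate<T> match) {
        for (T obj : objects) {
            if (match.test(obj)) {
                objects.remove(obj);
                return obj;
            }
        }
        return null; // a real space would block or time out instead
    }

    // read: retrieve a copy of a matching object, leaving it in the space.
    synchronized T read(Predicate<T> match) {
        for (T obj : objects) {
            if (match.test(obj)) return obj;
        }
        return null;
    }
}

public class SpaceDemo {
    public static void main(String[] args) {
        ToySpace<String> space = new ToySpace<>();
        space.write("task-1");
        space.write("task-2");

        System.out.println(space.read(s -> s.endsWith("2"))); // task-2 (still in space)
        System.out.println(space.take(s -> s.endsWith("2"))); // task-2 (removed)
        System.out.println(space.take(s -> s.endsWith("2"))); // null
    }
}
```

Producer and consumer processes never reference each other directly; they only agree on the space and on what a matching object looks like, which is what decouples the co-ordinated processes.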

Figure 2.5: An example architecture for a JavaSpaces program. (Image taken from [24])

Figure 2.5 shows a possible JavaSpaces application. It consists of three spaces, each exposed to a number of clients as a service. Objects migrate between clients through the spaces. Unlike JavaParty, existing code has to be re-architected to benefit from JavaSpaces technology. This restricts language portability and ties an application to the specific API used.

2.2.3 ProActive

ProActive [25] is a suite of middleware designed to make parallel programming easy. It is implemented on top of standard Java APIs. The core idea of ProActive is that of active objects: objects which have an individual thread of control and can migrate around the system. An active object is made up of one or more passive objects and has a single entry point (the root). Method invocations on an active object are stored in a queue, akin to a re-order buffer, which is served arbitrarily by the active object. It is not clear how consistency is enforced in this scheme; for example, the reordering of getter/setter methods can result in the generation of incorrect results. Active objects can be collected up into node objects which are then hosted, alongside other node objects, by a JVM in the system. Nodes are addressed in a similar vein to RMI calls, as can be seen in Figure 2.6. Execution of a method on an active object results in an asynchronous remote invocation, returning a future object if the method has a non-void return type. The future object behaves as a normal object until it is dereferenced. At this point, the program will block until the remote invocation has terminated and the future object is transparently replaced with the correct return value. Manual migration of active objects is possible in ProActive using the API


public class Foo {
    private int n;

    public Foo(int n) { this.n = n; }
    public void increment() { ++n; }
    public int value() { return n; }
}

/*
 * Create a new Foo object on the JVM hosting the node object Node1 on the
 * machine mybox.doc.ic.ac.uk. The constructor is invoked with the parameter 0.
 */
Foo f = (Foo)ProActive.newActive("Foo", 0, "//mybox.doc.ic.ac.uk/Node1");

/*
 * Asynchronously call the increment method on f. This will return as soon
 * as the remote object has received the call.
 */
f.increment();

/*
 * This call will return a future for n. That is, it will return immediately
 * but n may not hold the return from the call.
 */
int n = f.value();

/*
 * This call will block due to the wait-by-necessity semantics.
 * Once n receives a value, the call will continue to execute.
 */
System.out.println(n);

Figure 2.6: A simple example of the use of an active object. The code will print out 1 when executed.


calls migrateTo(URL) and migrateTo(Object), migrating an object to a specified node and to the location of another object respectively. These API calls are static and apply to the object from which they are invoked. A more interesting feature of ProActive is the grouping of active objects. A group is essentially a container of handles on active objects of comparable type 4. These objects can then be treated as a single instance of their base class. Invoking a non-void method on the group then returns another group; this time a group containing the futures representing the results of the computations (Figure 2.7). This is useful for programs making use of map-reduce algorithms.
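The wait-by-necessity behaviour of ProActive futures can be approximated with standard java.util.concurrent futures. In this sketch the remote node is simulated by an asynchronous task; ProActive itself returns transparent future objects of the method's own type rather than explicit CompletableFutures.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// A sketch of wait-by-necessity using standard java.util.concurrent futures.
public class FutureDemo {
    // Simulates an asynchronous remote invocation with a non-void return type.
    static CompletableFuture<Integer> remoteValue() {
        // The invocation returns immediately; the result arrives later.
        return CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { }
            return 1;
        });
    }

    public static void main(String[] args)
            throws ExecutionException, InterruptedException {
        CompletableFuture<Integer> n = remoteValue(); // does not block
        // ... other work could proceed here while the remote call runs ...
        System.out.println(n.get()); // dereference: blocks until the value arrives
    }
}
```

The difference is purely syntactic: ProActive hides the get() behind ordinary field and method access, so the block happens at the first genuine use of the value.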

Figure 2.7: An example of asynchronous method invocation on a group of active objects, resulting in a group of futur (sic) objects. (Image taken from [25])

Unfortunately, the apparent promise of ProActive serves simply to raise one's hopes and in reality offers little more than a wrapper around the Java RMI package. Designed to support so-called Grid applications, the core of the ProActive paradigm is an inelegant, uninteresting collection of XML descriptors and meaningless acronyms, naïvely pulled together by the illusion that an omniscient network of heterogeneous machines, sinisterly referred to only as The Grid, will accept this software with open arms. A highly contrived use-case is described in [25] but, lacking any benchmarks or quantitative evaluation, it offers little insight into the usefulness of this research.

2.3 Object Persistence

In a similar vein to thread distribution, object persistence can be achieved transparently with a modified virtual machine, or explicitly using an API.

2.3.1 Orthogonally Persistent Java (OPJ)

OPJ [26] is an implementation of transparent object persistence designed by the same team that worked on dJVM (Section 2.1.2). The primary objectives of OPJ are:

Complete transparency: To prevent the programmer from having to declare persistent objects explicitly, OPJ uses static variables as the persistent roots of a program.
4 i.e. types possibly derived from the same base class.

20

Concurrent issue of transactions: ACID transactions are chained together, as described in the chain-and-spawn transaction system of [27]. The termination of a transaction atomically allows the next transaction to proceed.

Portability: To avoid restricting the portability of Java, OPJ implements a separate PersistentClassLoader for instantiating persistent objects. The backing store is also represented via a storage interface to hide its implementation details from the runtime.

In order to use persistent objects correctly, read and write barriers are inserted into the Java bytecode to ensure that objects are faulted in from, and written to, the persistent store when necessary. When objects exist only in the persistent store, they are considered to be in the unfaulted state. A major design decision when implementing a persistence mechanism is the method of representing an unfaulted object. The type system of Java imposes further constraints on the structure of unfaulted objects. In [26], two methods of representing unfaulted objects are discussed. These use the notions of shells and façades.

A shell is simply a default, or empty, object. The contents of a shell object are populated when it is dereferenced. This involves adding a read barrier to all occurrences of the getfield bytecode that checks the status of the object shell and replaces it with its persisted contents if required. A disadvantage of this approach is the high memory usage incurred by the empty shell objects.

The façade approach uses less memory whilst also removing some of the read barriers associated with the shell mechanism. An object façade appears to the programmer as a normal copy of the object it represents. The first access to a façade causes it to replace itself transparently with the real object. Like the method specialisation framework of [1], this requires all classes to be fully virtualised in order to trap accesses to them.
In order to resolve all references to an object when faulting it in from the persistent store, a façade maintains backreferences to all referring objects. This means that the read barrier cost is paid only once, i.e. on the first object access. OPJ does not support persistence of transient data and therefore does not support persistence of threads.
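The shell mechanism can be sketched as a hand-written read barrier: each accessor first checks whether the object has been faulted in and populates it from the store if not. The PersistentStore and Account classes below are invented for illustration; in OPJ the barrier is inserted at the bytecode level before getfield rather than written by the programmer.

```java
import java.util.HashMap;
import java.util.Map;

// A stand-in for the persistent store; in OPJ this sits behind a storage
// interface rather than a static map.
class PersistentStore {
    private static final Map<Integer, Integer> disk = new HashMap<>();
    static { disk.put(7, 1000); } // pretend this object was persisted earlier
    static int loadBalance(int oid) { return disk.get(oid); }
}

class Account {
    private final int oid;
    private boolean faulted;  // false => this object is still an empty shell
    private int balance;

    Account(int oid) { this.oid = oid; }

    // The read barrier: conceptually inserted before every getfield; here it
    // is written out by hand at the start of the accessor.
    private void readBarrier() {
        if (!faulted) {
            balance = PersistentStore.loadBalance(oid);
            faulted = true;
        }
    }

    int getBalance() {
        readBarrier();
        return balance;
    }
}

public class ShellDemo {
    public static void main(String[] args) {
        Account shell = new Account(7);          // allocated as an empty shell
        System.out.println(shell.getBalance());  // first access faults it in: 1000
    }
}
```

The sketch also makes the shell approach's cost visible: every shell carries the full field layout of the real object (here, balance) even before it is populated, which is the memory overhead the façade approach avoids.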

2.3.2 Java Data Objects (JDO)

JDO [28] is another example of object persistence. Unlike OPJ, persistence is explicitly managed by the programmer using an XML schema to describe data in a manner similar to a relational database. JDO alone is merely a specification 5, with a number of different implementations available. It is designed to take the role of a database in applications that use it whilst remaining independent of the underlying store. Since JDO is designed for use in domains where the opportunity for persistence is explicit, data transparency is only realised due to the environment. The programmer is still in full control of the persistent store and can query it using the JDOQL query language. A forerunner to JDO is the Java Persistence API, available as part of Enterprise JavaBeans.
5 The original specification is online at http://www.jcp.org/en/jsr/detail?id=12


2.3.3 Enterprise JavaBeans (Java Persistence API)

The Enterprise JavaBeans architecture [29] defines the Java Persistence API [30] for persisting entities to disk. Like JDO, the system is heavily database oriented, with support for explicit data relationships and a query API. The API operates on Plain Old Java Objects (POJOs), separating the data from the control code. Annotations such as @Entity, @Id, @Column and @Table are used to specify object behaviour and purpose. The glue holding this system together is the EntityManager class. This allows the programmer to persist classes annotated with the @Entity annotation and treat the persistent store as though it were a normal database.

2.4 In-memory Database Systems

There is a fine line, and certainly some overlap, between object persistence and in-memory database management. The Java Persistence API (Section 2.3.3), for example, is heavily database oriented. We evaluate a simple database system for loading persisted Java collections into memory and allowing them to be queried and retrieved at high speed.

2.4.1 Space4J

Space4J [31] is an implementation of object persistence in Java designed to replace the use of a database in smaller applications. The API centres around the concept of a persistent, serialisable Space. A space can contain Java collections (commonly java.util.Map) which can be iterated over using methods on the enclosing space. Operations such as insertion into and deletion from maps can be performed using the Command class. Explicit persistence is achieved by invoking the snapshot method on a space, causing the space to write its contents to disk. Implicitly, the data on disk is updated whenever a command is executed that modifies the contents of a space in some way. If the application crashes during this operation, a log file is used to update the database during the next execution. A simple example of Space4J is shown below (adapted from the phonebook example of [31]; exception code has been omitted for clarity).

public class Directory {
    private Space4J space4j;
    private Space space;
    private static final String MAP_NAME = "directoryEntries";
    private static final String SEQ_NAME = "directorySeq";

    public Directory(Space4J space4j) {
        this.space4j = space4j;
        space4j.start();
        space = space4j.getSpace();
    }

    private int getNextId() {
        return space4j.exec(new IncrementSeqCmd(SEQ_NAME));
    }

    // The Entry class must implement Serializable
    public void addEntry(Entry entry) {
        space4j.exec(new PutCmd(MAP_NAME, getNextId(), entry));
    }

    // Assume name is a primary key
    public Entry findEntry(String name) {
        Entry notFound = null;
        Iterator<Object> iter = space.getIterator(MAP_NAME);
        while (iter.hasNext()) {
            Entry cur = (Entry)iter.next();
            if (cur.name.equals(name))
                return cur;
        }
        return notFound;
    }
}

Whilst fairly clunky, the interaction with the object store is very similar to the use of a standard Java collection, such as a Map. Space4J also allows a distributed mode of operation. In this set-up, a master node can be contacted by slave nodes which then take a replica of any spaces when they are referenced.

2.5 Conclusion

Although widely adopted, the implementation of the various object migration mechanisms remains non-standard. The balance between ease-of-use and efficiency also varies between systems. Fully transparent approaches, such as those described in Section 2.1, tend to lead to incomplete implementations and questionable performance characteristics. This is due to the added complexities involved in creating such a system and the inability to recover the performance lost during object migration.

Realising both efficient and transparent migration of objects in the general case can be achieved using specialised methods. Like the façade objects described in Section 2.3.1, specialised methods can be used to remove explicit object tests at runtime, such as the checks described in Section 2.1.3 and Section 2.1.4. The framework introduced in Section 1.3 buys back the performance loss of full virtualisation using guarded inlining. The generality of this framework combined with the potential performance benefits makes it ideal for implementing object migration in systems such as the ones evaluated here.

One of the more complicated applications of object migration is that of a distributed Java virtual machine. To date there is not a single implementation in common use. The following conclusions can be drawn from the research carried out in Section 2.1.


Home nodes should be flexible: A common strategy for object distribution is the concept of migration out of an object-specific home node. The idea of adaptive home node re-assignment used in JESSICA2 (Section 2.1.4) shows that static home nodes can be a major bottleneck. Migrating an object to the node using it most is logical and can reduce the amount of migration taking place in the long run.

Thread migration is not necessary in a homogeneous environment: More specifically, migrating both threads and objects in a homogeneous system is unnecessary. In a heterogeneous system, dynamic load balancing warrants migration of threads and data. However, a thread can only execute efficiently if it is operating on local data, and so migrating either threads or data reduces the amount of network traffic whilst ensuring that execution remains localised. The method shipping technique used in Section 2.1.1 then becomes overly complicated, since we can just move the data onto the local node.

Thread-local caches are crucial to performance: The concept of thread-local memory in the JMM is commonly taken to refer to hardware caches implemented on a processor die. This is one realisation of such memory, but in a distributed system we can implement additional caches in software on a per-node basis, as in the Hyperion DJVM (Section 2.1.3). Unless there is substantial cache contention between nodes, this will drastically reduce the amount of network traffic at the cost of complexity (since consistency now becomes an issue).

A DSM layer is convenient, but inhibits JVM optimisations: Using a generic DSM layer to hide object migration is an attractive prospect from the implementer's point of view. Unfortunately, this hides the location of objects from the JVMs, which then naïvely dereference remote objects with no regard for the performance implications. This requires the use of expensive, high performance networks such as the SCI used by Kaffemik (Section 2.1.5).
To avoid the costs of avoidable remote object dereferences, the JVMs in the cluster must be aware of each object's whereabouts. This can be achieved by implementing the DSM in the JVM, or by exposing it via an API that allows a fine degree of control over object movement. The emerging architecture of data movement between thread-local caches, where the communication between caches is minimised, strongly mirrors that of a standard multi-processor system. In this situation, a cache-coherency protocol such as MESI [32] can be used to implement data movement, or in this case object migration, efficiently. By statically scheduling the threads of a multi-threaded application and making use of adaptive home nodes, it should be possible to implement a modified version of the MESI protocol to provide a basic DSM abstraction in a distributed Java virtual machine.

Clearly, implementing a full DJVM is a large and complex task. We can instead consider a DJVM as being built up of multiple layers, each potentially offering reuse in other domains. This is illustrated in Figure 2.8.

Figure 2.8: A high-level view of the possible layers making up a DJVM. In this architecture threads are statically assigned to slave nodes by the master node. The object migration policy is encoded into objects by the master node. Details such as private slave classloaders and thread communication are omitted.

Dissecting these layers allows us to exploit their reuse and map them onto simpler applications of object migration. For example, the persistent class loader implemented in OPJ (Section 2.3.1) removes the Message Passing and Remote Thread Instantiation layers. The Initial Object Distribution layer then involves reading persisted objects from disk. A simple two-state migration policy could then be used to ensure objects are faulted in correctly.

To achieve optimal performance in a distributed system, it is necessary to tune data layout and communication between nodes to the particular application in hand. This requires programmer intervention. Of the distribution APIs evaluated, a minimum requirement to distribute a parallel program is the addition of keywords, for example the remote keyword of JavaParty (Section 2.2.1). At the other end of the scale, an API can force you to re-architect your entire program around a new paradigm. JavaSpaces (Section 2.2.2) is a good example of this, whilst ProActive (Section 2.2.3) is only applicable to applications of a certain type anyway.

The various implementations of object persistence do not provide persistence of threads, or any other transient data. As discussed above, thread migration is not a critical factor in distributed execution, but it does have a place in persistence. In particular, checkpointing a running JVM so that it can resume from that point in the event of failure requires the encapsulation of threads. Similarly, a failing node in a DJVM could migrate its threads to another node in order to continue the computation elsewhere. These two techniques require strong mobility of transient data; that is, transient data is copied to another location. Many so-called migration schemes make use of weak mobility, which retains handles on transient data but does not attempt to encapsulate it.
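The MESI protocol mentioned earlier can be sketched as a small per-object state machine. This is a simplified illustration of the local state transitions only; a real implementation also generates the bus or network transactions (and write-backs) that drive the remote transitions, and a DJVM adaptation would track object ownership rather than cache lines.

```java
// A simplified sketch of MESI state transitions for a single cached object.
// States: Modified, Exclusive, Shared, Invalid.
enum MesiState { MODIFIED, EXCLUSIVE, SHARED, INVALID }

class CacheLine {
    MesiState state = MesiState.INVALID;

    // Local read: an invalid line must be fetched; it becomes Shared if
    // other nodes hold a copy, Exclusive otherwise.
    void localRead(boolean othersHoldCopy) {
        if (state == MesiState.INVALID)
            state = othersHoldCopy ? MesiState.SHARED : MesiState.EXCLUSIVE;
    }

    // Local write: gain ownership (other copies would be invalidated).
    void localWrite() { state = MesiState.MODIFIED; }

    // Another node reads: a modified or exclusive line becomes shared
    // (a modified line would first be written back).
    void remoteRead() {
        if (state == MesiState.MODIFIED || state == MesiState.EXCLUSIVE)
            state = MesiState.SHARED;
    }

    // Another node writes: our copy becomes stale.
    void remoteWrite() { state = MesiState.INVALID; }
}

public class MesiDemo {
    public static void main(String[] args) {
        CacheLine line = new CacheLine();
        line.localRead(false);
        System.out.println(line.state); // EXCLUSIVE
        line.localWrite();
        System.out.println(line.state); // MODIFIED
        line.remoteRead();
        System.out.println(line.state); // SHARED
        line.remoteWrite();
        System.out.println(line.state); // INVALID
    }
}
```

In the DJVM setting suggested above, "remote read" and "remote write" would correspond to another node faulting an object in or taking ownership of it, with the adaptive home node playing the role of the coherence directory.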
Overall, it can be seen that programmers require both transparency of object migration and high performance. These two goals are at odds with one another and in reality a compromise is sought. A sensible solution is to present


a transparent implementation of object migration, but one which can be easily customised such that its runtime behaviour matches the needs of a particular application.


Chapter 3

Specification
3.1 Aims

This project aims to explore the potential for transparent object migration using specialised method variants to ensure that objects are automatically placed into a coherent state prior to being accessed. This will provide a powerful layer on which to build a variety of software for object distribution and storage. Transparency of operation should be matched with practicality and performance; migration behaviour should be flexible and have a single point of exposure to allow custom protocols to be implemented where necessary. The capturing and migration of transient objects, in particular threads, will also be studied in order to realise full state migration of a running program.

3.2 Milestones

3.2.1 Method Specialisation Framework

The prototype method specialisation framework from [1] is currently built on top of Jikes 2.9.2, a stable release from October 2007. Since then, Jikes has undergone a number of changes:

- A TIB is now represented by a class rather than an array of Object.
- Support for Java 5 language features.
- A number of performance improvements, including support for MMX registers and better code generation.
- As with all open source projects, a large number of bugs have been identified and fixed. Notably, support for x86_64 machines has been greatly improved.

An objective of this project is to develop a robust implementation of the framework that is compatible with the latest version of Jikes. This will enable us to take full advantage of the various improvements made to the RVM, especially those concerning performance.


3.2.2

Data Persistence Transforms

Building an implementation of data persistence is a step towards more complex migration systems. The method specialisation framework provides an environment in which to manipulate classes in the same way that OPJ (Section 2.3.1) does for façade objects. Implementing a similar approach using bytecode transformations during classloading is a basis for more sophisticated applications and mirrors the PersistentClassLoader of OPJ.

Bytecode transformations in the method specialisation framework are defined using the ASM bytecode manipulation framework 1. ASM makes use of the visitor pattern in order to facilitate easy traversal of the bytecode entities. A ClassReader parses the bytecode and, through its accept method, drives a ClassVisitor, which defines a family of visit methods (visitField, visitMethod and so on) so that different actions can be taken depending on the type of the node involved. To create new bytecode, a ClassWriter must be used. ClassWriter implements ClassVisitor, creating each node as it is visited. Ultimately, an array of bytes representing the visited (and possibly modified) class is returned from the ClassWriter.

To implement the persistence transforms, it is necessary to extend the abstract class Transform. This requires a single method, public InputStream apply(InputStream is, String cname); to be defined. By implementing this method, we define a new ClassWriter inside apply, containing the method definitions described by ClassVisitor. By defining specialised versions of methods on all static objects via visitMethod, we can cause static objects to load themselves into memory before resetting their TIB to the standard version using toggleTIB.
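The visitor flow described above can be sketched in plain Java without ASM itself. The class names mirror ASM's, but the bodies below are toy stand-ins, and the SpecialisingVisitor and its $specialised naming scheme are illustrative assumptions rather than the framework's actual transform:

```java
import java.util.List;

// Toy stand-ins for ASM's ClassVisitor/ClassReader/ClassWriter.
interface ClassVisitor {
    void visitMethod(String name);   // called once per method "node"
    byte[] toByteArray();            // only meaningful on writers
}

class ClassReader {
    private final List<String> methods;
    ClassReader(List<String> methods) { this.methods = methods; }
    // Drive the visitor over every node, as ClassReader.accept does in ASM.
    void accept(ClassVisitor cv) {
        for (String m : methods) cv.visitMethod(m);
    }
}

class ClassWriter implements ClassVisitor {
    private final StringBuilder out = new StringBuilder();
    public void visitMethod(String name) { out.append(name).append(';'); }
    public byte[] toByteArray() { return out.toString().getBytes(); }
}

// A transform interposes on the visitor chain, emitting a specialised
// variant alongside each original method (hypothetical naming scheme).
class SpecialisingVisitor implements ClassVisitor {
    private final ClassVisitor next;
    SpecialisingVisitor(ClassVisitor next) { this.next = next; }
    public void visitMethod(String name) {
        next.visitMethod(name);                  // pass the original through
        next.visitMethod(name + "$specialised"); // add the specialised copy
    }
    public byte[] toByteArray() { return next.toByteArray(); }
}

public class TransformSketch {
    public static void main(String[] args) {
        ClassReader cr = new ClassReader(List.of("get", "set"));
        ClassWriter cw = new ClassWriter();
        cr.accept(new SpecialisingVisitor(cw));
        System.out.println(new String(cw.toByteArray()));
        // get;get$specialised;set;set$specialised;
    }
}
```

In the real framework the final step would wrap the ClassWriter's byte array back into the InputStream returned by apply.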

3.2.3

Checkpointing

With suitable transforms for data persistence in place, a natural extension is that of checkpointing during execution. This is a precursor to JVM recovery and should be investigated at both the explicit and implicit level. Explicit checkpointing can be achieved using keywords or annotations in the Java source code to communicate suitable points during execution to persist the state of the application. Implicit checkpointing could occur at thread yield points or in response to internal JVM events, for instance shutdown. Adding checkpointing to a persistent JVM provides a method for migrating objects on-demand which could then be used later on to distribute objects to remote processes. Explicit checkpointing may even be used to free up heap space (by eliminating static data structures) prior to a memory-intensive method invocation. Explicit checkpointing will require adding support for a new annotation to Jikes. When a method is decorated with this annotation, it indicates that Jikes should persist static objects to disk prior to invocation of the method. This can be achieved by using the shutdown code that will be required for the previous milestone. Such code will perform a sweep of static objects during JVM termination and persist them to disk.
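As a sketch of what explicit checkpointing might look like at the source level, the following defines a hypothetical @Checkpoint annotation and persists (a stub of) the static state before invoking an annotated method. Inside Jikes the check would happen during compilation rather than through reflection; the annotation name, the persistStatics stub and the reflective dispatch are all assumptions:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotation marking methods before which static state
// should be persisted to disk.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Checkpoint {}

public class CheckpointSketch {
    static int counter = 0;        // stand-in for persistable static state
    static boolean persisted = false;

    @Checkpoint
    static void bigComputation() { counter++; }

    // Persist stub: a real implementation would sweep static objects to
    // disk using the shutdown code from the persistence milestone.
    static void persistStatics() { persisted = true; }

    static void invokeWithCheckpoint(Method m) throws Exception {
        if (m.isAnnotationPresent(Checkpoint.class)) {
            persistStatics();      // checkpoint before invocation
        }
        m.invoke(null);
    }

    public static void main(String[] args) throws Exception {
        invokeWithCheckpoint(CheckpointSketch.class.getDeclaredMethod("bigComputation"));
        System.out.println(persisted + " " + counter);  // true 1
    }
}
```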
1 http://asm.objectweb.org/


3.2.4

Thread Persistence

Once the persistence of non-transient objects is complete, thread persistence should be considered. Persistence of threads is not the same as thread migration. Weak thread migration can be achieved using remote procedure calls, but thread persistence requires strong migration of a thread in order to resume it at a later time (possibly in a different process). Strong thread migration is difficult to achieve. As mentioned in Section 1.2, Jikes makes use of green threads which are multiplexed onto a number of virtual processors, each of which is represented by a pthread. Jikes supports moving threads between virtual processors, using this technique to achieve a simple form of load balancing. Moving a Java thread between pthreads is accomplished using the machine-specific threadSwitch method. This method saves the state of the native machine registers as a field in the Java thread object. Restoring this thread onto another pthread performs the opposite operation; the native machine registers are restored using the saved context. Persisting a set of registers to save the state of a thread is not enough: the stack of the pthread must also be saved in a portable manner. To allow access to the execution state of a thread, Jikes offers an API for on-stack replacement (OSR). Although designed for code optimisation, this API has successfully been used to implement thread migration in [33]. The downside is that it only works with the baseline compiler in Jikes rather than the optimising one. An alternative approach is to walk the stack of the thread manually. Threads within Jikes have a byte[] stack field which acts as a pointer to their native machine stack. If the type information of this stack can be retrieved, then all the pointers on the stack can be identified. Pointers that point back into the stack can then be represented as an offset from the stack base.
Pointers that point into the heap will need to be represented as some form of serialisable handle to the referenced object (which will also be serialised). This milestone should be investigated but, as it is not crucial to the DJVM implementation which this project pursues, it may be abandoned given insufficient progress.
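The idea of replacing heap pointers with serialisable handles can be illustrated with a small handle table: every distinct object reachable from a saved stack is assigned an integer ID, so the persisted stack refers to heap objects by ID rather than by raw address. The class and method names here are ours, not Jikes':

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;

// Maps object identities to small integer handles, in both directions.
class HandleTable {
    private final IdentityHashMap<Object, Integer> ids = new IdentityHashMap<>();
    private final List<Object> objects = new ArrayList<>();

    // Return the existing handle for obj, or mint a new one.
    int handleFor(Object obj) {
        Integer id = ids.get(obj);
        if (id == null) {
            id = objects.size();
            ids.put(obj, id);
            objects.add(obj);
        }
        return id;
    }

    Object resolve(int handle) { return objects.get(handle); }
}

public class HandleSketch {
    public static void main(String[] args) {
        HandleTable table = new HandleTable();
        String shared = new String("shared");
        int h1 = table.handleFor(shared);
        int h2 = table.handleFor(shared);        // same object, same handle
        int h3 = table.handleFor(new Object());
        System.out.println(h1 + " " + h2 + " " + h3);  // 0 0 1
        System.out.println(table.resolve(h1) == shared);  // true
    }
}
```

On restoration, each referenced object would be deserialised once and the handles rewritten back into stack slots.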

3.2.5

Fault Tolerance

Persistence of both objects and threads can be used to encapsulate the entire state of a program 2 and restore it elsewhere. A hot-swappable JVM can then be implemented to accept the persisted state of a program over a network and resume it automatically from where it left off. Failure to implement strong thread persistence still allows a degree of fault tolerance to be implemented. When the JVM begins to exhibit failure, the state of the program could be checkpointed, but without overwriting the last explicit checkpoint. When restoring the state of the failed JVM on a remote node, the programmer will be given the choice of whether to restore the last explicit checkpoint or the automatic checkpoint taken during the JVM failure. The context of the crash must be explained, and especially the point in the program
2 This may not behave as one might expect when external influences such as I/O are taken into account. The application programmer will have to ensure that the state of such entities is re-established if required.


where the automatic checkpoint was made in order for the programmer to make an informed decision. The code to perform data transfer to a remote node will be written as a separate daemon process in order to minimise the time spent executing critical code in the failing JVM process.

3.2.6

A Distributed Java Virtual Machine

Designing and implementing a distributed JVM is a large and difficult undertaking. It should be stressed that the following work is beyond the scope of this project and will be achieved only if sufficient time remains after completion of the previous milestones. Any outstanding goals at the end of the project should be considered as future work.

Load balancing of two independent nodes
The simplest form of DJVM is that of two nodes operating independently on separate portions of a data set. Object migration occurs between them when load balancing is deemed necessary. This is only applicable to data-parallel problems. For example, a parallel raytracer may divide a scene by assigning half of the pixels to one node (A) and the remainder to another (B). In terms of computation, simply dividing the scene in two may not distribute the workload fairly. Dynamic load balancing at runtime can then be used to even out the workload, making use of object migration between the nodes. Node A may have completed the rendering of its portion and can then start accessing B's data portion, causing objects to migrate from B to A. Some communication will be needed to ensure that the workload does not overlap.

Remote Procedure Calls on Migrated Objects
Weak migration of objects with transient fields presents us with a problem. Invoking a method on a weakly migrated object which requires access to its transient state will return an erroneous result. To get around this, it must be possible to perform remote invocations on objects where transient data is required to compute the output. One such example is invoking the run method of a thread. This can be achieved easily using RMI wrappers around the affected methods.

Distributed Classloading
During classloading, a master node must load (and transform) the classes locally before instructing any remote nodes to perform loads using a remote classloader. This can be signalled using RMI.
Java provides a rich classloader framework and it is easy to extend the basic classloader in order to create a more specific version. In this case we simply need to add support for remote loading. The master node will also be used to host static data, which will be accessed in a similar manner to transient data, i.e. via RMI calls to wrapper functions.

Implementation of a Full Coherency Protocol
Finally, an implementation of a coherency protocol will allow nodes to share data transparently. Each node will require a local object cache which will provide a number of static methods to control its behaviour. The transformed classes output by the distributed classloader will call these methods when they are dereferenced to ensure that the coherent version of the object is returned to the application.
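The dynamic load balancing sketched above for the raytracer example can be illustrated in ordinary multi-threaded Java, with a shared work queue standing in for object migration between nodes. This is an analogy only: a DJVM would migrate objects between address spaces rather than share memory, and all names below are ours:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Two workers (standing in for nodes A and B) drain a shared queue of
// pixel indices. A faster worker naturally takes on more pixels, which
// is the effect object migration between nodes is intended to achieve.
public class LoadBalanceSketch {
    // Render `pixels` pixels with two workers; returns how many pixels
    // were rendered exactly once.
    static int render(int pixels) throws InterruptedException {
        ConcurrentLinkedQueue<Integer> work = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < pixels; i++) work.add(i);

        AtomicIntegerArray rendered = new AtomicIntegerArray(pixels);
        Runnable worker = () -> {
            Integer pixel;
            while ((pixel = work.poll()) != null) {
                rendered.incrementAndGet(pixel);  // "render" the pixel
            }
        };

        Thread a = new Thread(worker), b = new Thread(worker);
        a.start(); b.start();
        a.join(); b.join();

        int exactlyOnce = 0;
        for (int i = 0; i < pixels; i++) {
            if (rendered.get(i) == 1) exactlyOnce++;
        }
        return exactlyOnce;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(render(1000));  // 1000
    }
}
```

The queue's atomicity plays the role of the "communication needed to ensure that the workload does not overlap".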

Chapter 4

Evaluation
4.1 Testing

To avoid introducing bugs and regressions between updates, it is necessary to run (and pass) a series of tests on the modified code.

4.1.1

Jikes

Jikes includes a rich test framework allowing different distributions of the RVM to be tested in varying levels of detail. The main distributions of Jikes and their features are described below.

Prototype
Designed to aid in rapid development, the prototype build target is quick to build but results in a slow virtual machine. The optimising compiler is absent from the prototype build, as is the adaptive optimisation system. This target can be used quickly to test some new code but should not be relied upon to behave the same way as the more advanced builds of the RVM.

Prototype-Opt
Like the prototype build, prototype-opt is designed to be quick to build. However, the optimising compiler and adaptive system are both present. Whilst the resulting RVM is still fairly slow, this target is useful for testing the optimisation capabilities of Jikes, particularly to see if code changes elsewhere have caused them to generate bad code. It is essential that the optimising compiler operates correctly with a new project, so this target is of particular importance.

Development
The development build offers a better performing RVM at the cost of an extended build time. This is the build that should be tested prior to commits to the repository. Assertions remain enabled in the development build and so debugging is still possible to some extent.

Production
The fastest version of Jikes is the production build. Essentially, this is the same as the development build, but with assertions turned off. The production target should be used for all benchmarking of the RVM.

A series of test runs are included in the Jikes source tree, allowing the user to stress-test particular components of the RVM or simply test the software

for correctness. The tests are defined in the build/test-runs directory; of particular interest are the following.

Core
The core tests are designed to test the correctness of Jikes. None of the core tests should result in failure. The test run operates on all four build targets of the RVM. If a core test fails, it should be fixed before development continues.

Performance
For benchmarking purposes, a performance test is included with the Jikes release. These tests run only on the production build of Jikes, executing the SpecJVM, SpecJBB and DaCapo benchmarks (see Section 4.2). These are essential when evaluating the system and help establish a base for other implementations to compare against.

Pre-commit
As the name suggests, the pre-commit test run is designed to be executed (and passed) prior to a commit to the Jikes repository. Two build configurations are tested by the run: the prototype and development targets. A series of basic tests is executed, followed by tests for the optimiser. The development build is also benchmarked using the DaCapo benchmark suite. This ensures developers do not commit regressions to the repository and also allows other developers to track down any major performance issues between builds.

These tests should ensure that Jikes remains functional at all times, as well as providing some performance metrics for evaluation at a later stage.

4.1.2

Method Specialisation Framework

The method specialisation framework includes no tests of its own. The Jikes tests and benchmarks can be used to ensure no regressions are introduced by the framework and to measure the performance losses and gains experienced when using the framework. When new ideas are implemented using the framework, new tests should be introduced to ensure that the new features behave as expected. The Jikes testing framework is highly modular and adding tests is a fairly straightforward task.

As it stands, the full virtualisation (beanifier) transform in the method specialisation framework causes a small number of tests in Jikes to fail. More specifically, the reflection test InvokeReflect and the JNI test ClassQuery fail to execute correctly. InvokeReflect fails due to an assertion in the invocation of an interface method that the method must be public and abstract. This is likely due to an oversight in the beanifier transform or even an error in the assertion code caused by the revised TIB structure. The ClassQuery test fails when it cannot find a native library. It appears that the java.library.path property is being ignored, but this should be investigated. These errors may be corrected during the port of the framework to the latest version of Jikes. If they still exist at that time, they will be investigated further.

4.2

Benchmarking

As mentioned in Section 4.1.1, Jikes runs a selection of benchmarks. We briefly discuss these benchmarks here, as well as another suite specifically designed to

benchmark parallel programs.

4.2.1

DaCapo

The DaCapo benchmarking suite [34] is a selection of open-source, Java-specific benchmarks, aimed at benchmarking real-world applications on an implementation of the Java virtual machine. Each benchmark can take a problem size of small, default or large depending upon the environment in question. Currently the majority of benchmarks contained in the DaCapo suite are sequential, but a growing number of multi-threaded benchmarks are being added to the project.

4.2.2

SpecJBB

SpecJBB (The Java Business Benchmark) [35] is a benchmark designed for server-side Java implementations. Specifically, it emulates a three-tier client/server system, as shown in Figure 4.1.

Figure 4.1: The architecture for the SpecJBB benchmark. Varying workloads can be applied by the client threads onto the business logic. This is a typical scenario for Java server applications. (Image taken from [35].) Results obtained from executing the SpecJBB benchmark can be found online at [35]. Interestingly, multiple JVM instances are often used in order to boost performance.


4.2.3

SpecJVM

SpecJVM [36] is a benchmark designed purely for measuring the performance characteristics of an implementation of the Java runtime. Like SpecJBB, SpecJVM is based on real-world applications and acts as a system benchmark as well as a Java benchmark due to its lack of I/O-dependent tests. As with all Spec benchmarks, results are made available online at [36].

4.2.4

Java Grande Forum

Unlike the benchmarks mentioned previously, the Java Grande Forum (JGF) benchmarks [37] do not come bundled with Jikes. These benchmarks are designed for applications labelled as Grande, simply meaning those applications with high memory and/or computational requirements. Typically these applications are used for computation in scientific and engineering disciplines, making the benchmarks particularly relevant in these areas. The JGF benchmarks come in a variety of versions, each designed to test a particular architecture. Unlike many other benchmarks, there is a multi-threaded suite of JGF benchmarks designed specifically for benchmarking parallel machines [38]. This suite tests low-level operations such as thread spawning as well as high-level parallel applications like ray tracing and Monte Carlo simulations. The JGF benchmarks quickly allow users to draw comparisons between different JVM implementations based on real statistics. For example, when comparing DSM implementations it may be useful to look at the time taken for threads to synchronise at a memory barrier.

4.2.5

Conclusion

To achieve both reliable and comparable results, the benchmarks included in the Jikes RVM will be used to evaluate the performance of Jikes when using the updated method specialisation framework. Benchmarking object migration is a more difficult task. The multi-threaded benchmarks described in Section 4.2.4 can be used to benchmark a parallel system, for example a DJVM. To assess the cost of object migration, the benchmarks should be run on a single node, on a number of nodes using explicit object migration, and on a number of nodes using the object migration transforms. Transparent implementations of object persistence and fault tolerance are harder still to benchmark. The benefits of such systems arise from ease of use and reduced programming complexity, metrics which are not reflected in benchmark suites. The performance of object persistence will likely be dominated by I/O transfer rates to disk, and the performance of a fault-tolerant JVM will likely be dominated by the network transfer time to migrate the state of the application. This should, however, be confirmed in order to check that the price of the migration transforms remains negligible. As the project progresses, each milestone should be evaluated as described in the following sections.


Method Specialisation Framework
The difficulty with this milestone arises from the inability to perform incremental testing. The updated framework can only be tested once the entire codebase has been ported. This will undoubtedly expose a large number of initial problems. Once these have been remedied, the standard Jikes tests should be executed to test for regressions. The read-barrier implemented in [1] should also be confirmed to work with the new framework. Inevitably, problems will be raised with the framework when it is being used in later milestones and time will have to be spent returning to this.

Data Persistence Transforms
The performance of data persistence will be dominated by the time taken to write serialised objects to disk. It is therefore more important to ensure correctness of the transforms than to attempt to reduce execution time. A series of tests should be defined and executed to investigate the behaviour of the transforms in different situations.

Standard - A normal Java program with a small number of static variables. Both persisting and restoring the data should be tested and confirmed to work.

No state - A Java program containing no static variables should still execute as normal, leaving behind no persisted state.

Ordering - The order in which programs are persisted and restored should not affect the outcome of execution. For example, executing program A followed by program B and then restoring them in any order should behave as expected.

Portability - Persisting a program on a different machine from the one on which it is restored should have no effect on the execution of the program.

If required, performance of the system could be evaluated with disk I/O factored out as a means of reasoning about the overheads of the specialised methods.

Checkpointing
Since this milestone is a logical progression from the previous one, the same tests will be executed both with and without checkpoint annotations present in the code.
The Jikes internal tests should produce the same results as in Section 4.2.5. Additional tests may be defined to ensure that any new annotations do not cause problems in the JVM.

Thread Persistence
Persistence of threads can be tested largely in the same way as persistence of data (Section 4.2.5). Due to the difficult nature of thread persistence, it is likely that an implementation will either work fully or not work at all.
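The "Standard" test case might look like the following, with stock Java serialisation standing in for the framework's persistence transforms; the field names and the buffer-based persist/restore pair are illustrative only:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Persist a program's static state, clear it to simulate a fresh JVM,
// restore it, and check the round trip.
public class PersistenceRoundTrip {
    static int counter = 42;           // example static state
    static String label = "rendered";

    static byte[] persist() throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeInt(counter);
            out.writeObject(label);
        }
        return buf.toByteArray();
    }

    static void restore(byte[] saved) throws Exception {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(saved))) {
            counter = in.readInt();
            label = (String) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] saved = persist();
        counter = 0;
        label = null;                  // simulate a fresh JVM
        restore(saved);
        System.out.println(counter + " " + label);  // 42 rendered
    }
}
```

The "Ordering" and "Portability" cases would reuse the same round trip with two programs and two machines respectively.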


Fault Tolerance
This is the first milestone where performance issues come into play. A failing JVM has a limited amount of time in which to save its state before terminating completely. Persistence of data should therefore happen as quickly as possible. Failure to persist a JVM should result in no data being persisted, rather than a partial or erroneous saving of state. The time taken to persist a failing JVM can be measured by sending a SIGSEGV signal to a JVM and intercepting its signal handlers. The handlers can then start a timer which is stopped on completion of the data persistence operation. Testing the correctness of such a system depends on the strength of object persistence achieved. If strong thread persistence is supported, the existing tests in Jikes can simply be interrupted and restored during execution in order to test the fault tolerance of the system. In the presence of weak persistence, new tests will need to be written to check that the restored state is coherent.

A Distributed Java Virtual Machine
A full distributed JVM will require a test framework. This will be based on the framework within Jikes, adding additional tests to ensure aspects such as object migration and distributed classloading function correctly. The transparent nature of the software also allows us to reuse the existing Jikes tests. Benchmarking the system can be achieved using the JGF multi-threaded benchmarks (Section 4.2.4) as well as the sequential benchmarks to observe any slowdown. The level of object migration should also be investigated as this provides an indicator of the transfer overheads in a given application. The efficiency of the coherency protocol can be evaluated using these results.
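The timing measurement could be prototyped in ordinary Java with a shutdown hook standing in for the intercepted signal handler. Jikes handles SIGSEGV in native code, so this is only an analogy, and persistState is a stub:

```java
// Time an emergency checkpoint on JVM termination. A real fault-tolerant
// build would trigger this from the SIGSEGV handler; a shutdown hook is
// the closest portable analogue in standard Java.
public class FailurePersistTimer {
    static volatile long persistNanos = -1;

    // Stand-in for sweeping and serialising static objects to disk.
    static void persistState() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException ignored) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            long start = System.nanoTime();
            persistState();                       // emergency checkpoint
            persistNanos = System.nanoTime() - start;
            System.out.println("persisted in " + persistNanos / 1_000_000 + " ms");
        }));
        // Normal exit triggers the hook here; a crash handler would do
        // the same work before the process dies.
    }
}
```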


Bibliography
[1] A. M. Cheadle, A. J. Field, J. Nystrom-Persson, A method specialisation and virtualised execution environment for Java, in: VEE '08: Proceedings of the Fourth ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, ACM, New York, NY, USA, 2008, pp. 51-60.
[2] TIOBE programming community index. URL http://www.tiobe.com/index.php/tiobe_index/
[3] Java technology: The early years. URL http://java.sun.com/features/1998/05/birthday.html
[4] The Java language environment. URL http://java.sun.com/docs/white/langenv/Intro.doc2.html
[5] Tuning garbage collection with the 5.0 Java[tm] Virtual Machine. URL http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html#1.1.%20Types%20of%20Collectors|outline
[6] J. Bull, L. Smith, L. Pottage, R. Freeman, Benchmarking Java against C and Fortran for scientific applications, Tech. rep., Edinburgh Parallel Computing Centre, Scotland, UK.
[7] A. Georges, D. Buytaert, L. Eeckhout, Statistically rigorous Java performance evaluation, Tech. rep., Department of Electronics and Information Systems, Ghent University, Belgium.
[8] Jikes RVM homepage. URL http://www.jikesrvm.org
[9] I. Rogers, J. Zhao, I. Watson, Boot image layout for Jikes RVM, in: Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems (ICOOOLPS), 2008.
[10] M. Arnold, S. Fink, D. Grove, M. Hind, P. Sweeney, Architecture and policy for adaptive optimization in virtual machines. URL http://domino.research.ibm.com/comm/research_people.nsf/pages/dgrove.RC23429.html
[11] E. Radeke, M. H. Scholl, Functionality for object migration among distributed, heterogeneous, autonomous DBS, in: Proceedings of the International Workshop on Research Issues in Data Engineering, 1995, p. 58.


[12] A. O. Mendelzon, T. Milo, E. Waller, Object migration, in: PODS '94: Proceedings of the Thirteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, ACM, New York, NY, USA, 1994, pp. 232-242.
[13] E. Radeke, M. H. Scholl, Framework for object migration in federated database systems, in: Proceedings of the Third International Conference on Parallel and Distributed Information Systems, 1994, pp. 187-194.
[14] M. E. El-Sharkawi, Y. Kambayashi, Object migration mechanisms to support updates in object-oriented databases, in: Databases, Parallel Architectures and Their Applications (PARBASE-90), International Conference on, 1990, pp. 378-387.
[15] W. Lux, Adaptable object migration: concept and implementation, SIGOPS Oper. Syst. Rev. 29 (2) (1995) 54-69.
[16] cJVM: a Single System Image of a JVM on a Cluster, in: IEEE International Conference on Parallel Processing (ICPP-99), 1999.
[17] J. Zigman, R. Sankaranarayana, Designing a distributed JVM on a cluster, in: Proceedings of the 17th European Simulation Multiconference, Nottingham, United Kingdom, 2003. URL http://djvm.anu.edu.au/publications/djvm_design.pdf
[18] G. Antoniu, L. Bougé, P. Hatcher, M. MacBeth, K. McGuigan, R. Namyst, The Hyperion system: Compiling multithreaded Java bytecode for distributed execution, Parallel Computing 27 (10) (2001).
[19] S. Thibault, A flexible thread scheduler for hierarchical multiprocessor machines, in: Second International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters (COSET-2), Cambridge, USA, 2005. URL http://hal.inria.fr/inria-00000138
[20] W. Zhu, C.-L. Wang, F. C. M. Lau, JESSICA2: A distributed Java virtual machine with transparent thread migration support, in: IEEE Fourth International Conference on Cluster Computing, 2002.
[21] J. Andersson, S. Weber, E. Cecchet, C. Jensen, V. Cahill, Kaffemik - a distributed JVM on a single address space architecture.
[22] JavaParty Trac.
URL http://svn.ipd.uni-karlsruhe.de/trac/javaparty/wiki/JavaParty?redirectedfrom=WikiStart
[23] JavaSpaces Service Specification. URL http://java.sun.com/products/jini/2.0/doc/specs/html/js-spec.html
[24] Getting Started With JavaSpaces Technology. URL http://java.sun.com/developer/technicalArticles/tools/JavaSpaces/


[25] L. Baduel, F. Baude, D. Caromel, A. Contes, F. Huet, M. Morel, R. Quilici, Programming, Composing, Deploying for the Grid, in: Grid Computing: Software Environments and Tools, Springer Verlag, 2006.
[26] A. Marquez, S. Blackburn, G. Mercer, J. N. Zigman, Implementing Orthogonally Persistent Java, in: POS-9: Revised Papers from the 9th International Workshop on Persistent Object Systems, Springer-Verlag, London, UK, 2001, pp. 247-261.
[27] S. M. Blackburn, J. N. Zigman, Concurrency - the fly in the ointment, Morgan Kaufmann, 1999, pp. 250-258.
[28] Sun Java Data Objects homepage. URL http://java.sun.com/jdo/index.jsp
[29] Enterprise JavaBeans specification. URL http://java.sun.com/products/ejb/docs.html
[30] The Java Persistence API. URL http://java.sun.com/javaee/technologies/persistence.jsp
[31] Space4J homepage. URL http://www.space4j.org/
[32] M. S. Papamarcos, J. H. Patel, A low-overhead coherence solution for multiprocessors with private cache memories, in: ISCA '84: Proceedings of the 11th Annual International Symposium on Computer Architecture, ACM, New York, NY, USA, 1984, pp. 348-354.
[33] R. Quitadamo, G. Cabri, L. Leonardi, Mobile JikesRVM: a Framework to Support Transparent Java Thread Migration.
[34] S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, B. Wiedermann, The DaCapo benchmarks: Java benchmarking development and analysis, in: OOPSLA '06: Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, ACM Press, New York, NY, USA, 2006, pp. 169-190.
[35] SpecJBB 2005 homepage. URL http://www.spec.org/jbb2005/
[36] SpecJVM 2008 homepage. URL http://www.spec.org/jvm2008/
[37] J. M. Bull, L. A. Smith, M. D. Westhead, D. S. Henty, R. A.
Davey, Benchmarking Java Grande Applications, in: Proceedings of the Second International Conference on the Practical Applications of Java, 2000, pp. 63-73.
[38] L. A. Smith, J. M. Bull, A Multithreaded Java Grande Benchmark Suite, in: Proceedings of the Third Workshop on Java for High Performance Computing, 2001.
