
Java Performance Tuning

by Fabian Skive

Overview
- Profiling methodology
- Profiling tools
- Case study

Introduction
There is a general perception that Java programs are slow. In early versions of Java, you had to struggle hard and compromise a lot to make a Java application run quickly. The VM technology and Java development tools have progressed to the point where a Java application is not particularly handicapped.

Why is it slow ?
The virtual machine layer that abstracts Java away from the underlying hardware adds overhead. This overhead can cause a Java application to run slower than an equivalent application written in a lower-level language. Java's advantages (platform independence, memory management, powerful exception checking, built-in multithreading, dynamic resource loading and security checks) all add costs.

The tuning game


Performance tuning is similar to playing a strategy game. Your target is to get a better score than the last score after each attempt. You are playing with, not against, the computer, the programmer, the design and the compiler. Techniques include switching compilers, turning on optimizations, using a different VM, and finding two or three bottlenecks in the code that have simple fixes.

System limitations
Three resources limit all applications:
- CPU speed and availability
- System memory
- Disk (and network) input/output

The first step in tuning is to determine which of these is causing your application to run slowly. When you fix a bottleneck, it is normal for the next bottleneck to move to another of these limitations.

A tuning strategy
1. Identify the main bottlenecks (look for about the top five bottlenecks).
2. Choose the quickest and easiest one to fix, and address it.
3. Repeat from Step 1.

Advantage: once a bottleneck has been eliminated, the characteristics of the application change, and the topmost bottleneck may no longer need to be addressed.

Identify bottleneck
1. Measure the performance by using profilers and benchmark suites.
2. Identify the location of any bottlenecks.
3. Think of a hypothesis for the cause of the bottleneck.
4. Consider any factors that may refute your hypothesis.
5. Create a test to isolate the factor identified by the hypothesis.
6. Test the hypothesis.
7. Alter the application to reduce the bottleneck.
8. Test that the alteration improves performance, and measure the improvement.
9. Repeat from Step 1.

Perceived Performance
Users have a particular view of performance that allows you to cut some corners.
Ex: a browser that gives a running countdown of the amount left to be downloaded from a server is seen as faster than one that just sits there until all the data is downloaded.

Rules :
- If an application is unresponsive for more than 2 seconds, it is seen as slow.
- Users are not aware of response-time improvements of less than 20%.

How to appear quicker ?


- Threading: ensure that your application remains responsive to the user, even while it is executing some other function.
- Streaming: display a partial result of the activity while continuing to compile more results in the background (very useful in distributed systems).
- Caching: caching techniques help you speed up data access. For example, the read-ahead algorithm used in disk hardware is fast when you are reading forward through a file.
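As a rough illustration of the threading and streaming ideas, the sketch below runs the work in a background thread and hands each partial result to the consumer as soon as it is ready. The class name, the `DONE` sentinel and the squared-number workload are hypothetical, and the code targets a current JDK rather than the 1.2 to 1.4 VMs discussed in these slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Streams partial results from a background thread (illustrative sketch). */
class StreamingResults {
    static final int DONE = -1; // hypothetical sentinel marking the end of the stream

    /** Starts a worker thread that publishes each partial result as soon as it exists. */
    static BlockingQueue<Integer> startWork(int items) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        Thread worker = new Thread(() -> {
            for (int i = 0; i < items; i++) {
                queue.add(i * i); // hypothetical partial result
            }
            queue.add(DONE);
        });
        worker.start();
        return queue;
    }

    /** Consumes results as they arrive; a UI would display each one immediately. */
    static List<Integer> consume(BlockingQueue<Integer> queue) {
        List<Integer> shown = new ArrayList<>();
        try {
            for (int r = queue.take(); r != DONE; r = queue.take()) {
                System.out.println("Partial result: " + r);
                shown.add(r);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop early, keep the interrupt flag
        }
        return shown;
    }

    public static void main(String[] args) {
        consume(startWork(5));
    }
}
```

The calling thread never waits for the whole job to finish, only for the next partial result, which is what makes the application appear quicker.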

Starting to tune
- User agreements: you should agree with your users on what the performance of the application is expected to be: response times, systemwide throughput, maximum number of users, data, ...
- Setting benchmarks: these are precise specifications stating what part of the code needs to run in what amount of time. How much faster, in which parts, and for how much effort?
- Without clear performance objectives, tuning will never be completed.

Taking Measurements
- Each run of your benchmarks needs to be under conditions that are as identical as possible.
- The benchmark should be run multiple times, and the full list of results retained, not just the average and deviation.
- Run an initial benchmark to specify how far you need to go and to highlight how much you have achieved when you finish tuning.
- Make your benchmark long enough (over 5 seconds).
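A minimal sketch of such a benchmark loop is shown below, assuming wall-clock timing with System.currentTimeMillis() and a current JDK. The class name, the `measure` helper and the string-building workload are hypothetical placeholders; a real benchmark would run the code under test and last well over 5 seconds.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal wall-clock benchmark harness (illustrative sketch). */
class BenchmarkHarness {

    /** Runs the task `runs` times and returns every elapsed time in milliseconds. */
    static List<Long> measure(Runnable task, int runs) {
        List<Long> timings = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.currentTimeMillis();
            task.run();
            timings.add(System.currentTimeMillis() - start);
        }
        return timings;
    }

    public static void main(String[] args) {
        // Hypothetical workload; replace with the code under test.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 200_000; i++) {
                sb.append(i);
            }
        };
        // Keep the full list of results, not just a summary.
        List<Long> timings = measure(workload, 5);
        System.out.println("Elapsed ms per run: " + timings);
    }
}
```

Keeping every timing makes outliers visible, for example a first run slowed by class loading or a run interrupted by garbage collection.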
   

What to measure ?
- Main: the wall-clock time (System.currentTimeMillis())
- CPU time: time allocated on the CPU for a particular procedure
- Memory size
- Disk throughput
- Network traffic, throughput, and latency
    

Java doesn't provide mechanisms for measuring these values directly.

Profiling Tools
- Measurements and timings
- Garbage collection
- Method calls
- Object-creation profiling
- Monitoring gross memory usage
    

If you only have a hammer, you tend to see every problem as a nail.
Abraham Maslow

Measurements and Timings


- Any profiler slows down the application it is profiling.
- Using currentTimeMillis() is the only reliable way.
- The OS interferes with the results by allocating different priorities to the process.
- On certain operating systems, foreground processes are given maximum priority.
- Some cache effects can lead to wrong results.
    

Garbage Collection
Some of the commercial profilers provide statistics showing what the garbage collector is doing.
Or use the -verbosegc option with the VM. With VM 1.4:

    java -Xloggc:<file>

The printout includes explicit synchronous calls to the garbage collector and asynchronous executions of the garbage collector when available free memory gets low.

Garbage Collection
The important items that all -verbosegc outputs include are:
- the size of the heap after garbage collection
- the time taken to run the garbage collection
- the number of bytes reclaimed by the garbage collection

Interesting values:
- Cost of GC to your application (percentage)
- Cost of the GC in the application's processing time

GC Viewer
Supported verbose:gc formats are:
- Sun JDK 1.3.1/1.4 with the option -verbose:gc
- Sun JDK 1.4 with the option -Xloggc:<file> (preferred)
- IBM JDK 1.3.0/1.2.2 with the option -verbose:gc

GCViewer shows a number of lines:
- Full GC Lines: black vertical line at every Full GC
- Inc GC Lines: cyan vertical line at every Incremental GC
- GC Times Line: green line that shows the length of all GCs
- Total Heap: red line that shows heap size
- Used Heap: blue line that shows used heap size

GC Viewer
GCViewer also provides some metrics:
- Acc Pauses: sum of all pauses due to GC
- Avg Pause: average length of a GC pause
- Min Pause: shortest GC pause
- Max Pause: longest GC pause
- Total Time: time data was collected for (only Sun 1.4 and IBM 1.3.0/1.2.2)
- Footprint: maximal amount of memory allocated
- Throughput: time percentage the application was NOT busy with GC
- Freed Memory: total amount of memory that has been freed
- Freed Mem/Min: amount of memory that has been freed per minute
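To make the pause metrics concrete, here is a small sketch of how accumulated pauses, throughput and the GC cost percentage could be derived from a list of pause durations. It is not GCViewer's actual code; the class name, method names and sample numbers are all hypothetical.

```java
/** Derives pause metrics from a list of GC pause durations (illustrative sketch). */
class GcPauseStats {

    /** Sum of all pauses, in seconds ("Acc Pauses"). */
    static double accumulated(double[] pauses) {
        double sum = 0.0;
        for (double p : pauses) {
            sum += p;
        }
        return sum;
    }

    /** Percentage of the total time NOT spent in GC ("Throughput"). */
    static double throughputPercent(double[] pauses, double totalSeconds) {
        return 100.0 * (totalSeconds - accumulated(pauses)) / totalSeconds;
    }

    public static void main(String[] args) {
        double[] pauses = {0.5, 0.25, 1.25}; // hypothetical pause times in seconds
        double total = 40.0;                 // hypothetical run length in seconds
        double throughput = throughputPercent(pauses, total);
        System.out.println("Acc pauses: " + accumulated(pauses) + " s");
        System.out.println("Throughput: " + throughput + " %");
        System.out.println("GC cost:    " + (100.0 - throughput) + " %");
    }
}
```

With these sample values the pauses add up to 2 seconds out of 40, so the application spends 5% of its time in GC and has a throughput of 95%.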

GC Viewer

Method Calls
- Method profilers show where the bottlenecks in your code are, helping you decide where to target your efforts.
- Most method profilers work by sampling the call stack at regular intervals and recording the methods on the stack.
- The JDK comes with a minimal profiler, obtained by using the -Xrunhprof option (depends on the JDK). This option produces a profile data file (java.hprof.txt).
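The sampling idea can be sketched in a few lines with Thread.getStackTrace(), which is available on Java 5 and later (so not on the VMs benchmarked in these slides). This is a toy illustration of the principle, not how hprof or any commercial profiler is implemented; the class name, method names and workload are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy sampling profiler: counts which method is on top of a thread's stack (illustrative sketch). */
class StackSampler {

    /** Records one sample by incrementing the counter of the method in the top stack frame. */
    static void record(Map<String, Integer> counts, StackTraceElement[] stack) {
        if (stack.length == 0) {
            return; // thread not started yet or already finished
        }
        String method = stack[0].getClassName() + "." + stack[0].getMethodName();
        counts.merge(method, 1, Integer::sum);
    }

    /** Samples the target thread every intervalMs milliseconds until it terminates. */
    static Map<String, Integer> sample(Thread target, long intervalMs) throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        while (target.isAlive()) {
            record(counts, target.getStackTrace());
            Thread.sleep(intervalMs);
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical workload running in its own thread.
        Thread worker = new Thread(() -> {
            double x = 0;
            for (int i = 0; i < 50_000_000; i++) {
                x += Math.sqrt(i);
            }
            System.out.println("Work done: " + x);
        });
        worker.start();
        System.out.println("Samples per method: " + sample(worker, 10));
    }
}
```

Methods that appear in many samples are where the thread spends most of its time, which is the information a method profiler reports.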

Rolf's Profile Viewer


For each method:
- a count of the number of times the method is invoked
- a short form of the class and method name itself
- the time spent in that method (in seconds)
- a bar graph of the time

All the methods that call the current method are listed in the caller pane.
All the methods that the current method itself invokes are listed in the callee pane.

Rolf's Profile Viewer

Object creation
- Determine object numbers.
- Identify where particular objects are created in the code.
- The JDK provides very rudimentary object-creation statistics.
- Use a commercial tool in place of the SDK.

Monitoring Gross Memory Usage


The JDK provides two methods for monitoring the amount of memory used by the runtime system: freeMemory() and totalMemory() in the java.lang.Runtime class.

- totalMemory() returns a long, which is the number of bytes currently allocated to the runtime system for this particular VM process.
- freeMemory() returns a long, which is the number of bytes available to the VM to create objects from the section of memory it controls.
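A short sketch of how the two calls combine: the memory in use is the difference between totalMemory() and freeMemory(). The class name, the `usedMemory` helper and the test allocation are hypothetical, and the readings are only approximate because the garbage collector may run between two calls.

```java
/** Reports gross heap usage via java.lang.Runtime (illustrative sketch). */
class MemoryMonitor {

    /** Bytes currently in use: allocated to the VM minus still free. */
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedMemory();
        byte[][] blocks = new byte[100][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[10_000]; // hypothetical allocation to observe
        }
        long after = usedMemory();
        System.out.println("Used before: " + before + " bytes");
        System.out.println("Used after:  " + after + " bytes");
        System.out.println("Blocks held: " + blocks.length);
    }
}
```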

Tools
- (commercial) Optimizeit from Borland
- (commercial) JProbe from Quest Software
- (commercial) JProfiler from ej-technologies
- (commercial) WebSphere Studio from IBM
- (free) HPjmeter from Hewlett-Packard
- (free) HPjtune

Case study :
Tuning IO performance

Tuning IO performance
The example consists of reading lines from a large file. We compare different methods on 2 files:
- small file with long lines
- long file with short lines

We test our methods with four JVM configurations:
- JVM 1.2.2
- JVM 1.3.1
- JVM 1.4.1
- JVM 1.4.1 -server

Method 1 : Unbuffered input stream


Use the deprecated method readLine() from DataInputStream.

    String line;
    DataInputStream in = new DataInputStream(new FileInputStream(file));
    while ((line = in.readLine()) != null) {
        doSomething(line);
    }
    in.close();

Method 2 : Buffered input stream


Use a BufferedInputStream to wrap the FileInputStream.

    String line;
    DataInputStream in = new DataInputStream(
            new BufferedInputStream(new FileInputStream(file)));
    while ((line = in.readLine()) != null) {
        doSomething(line);
    }
    in.close();

Method 3 : 8K buffered input stream


Set the size of the buffer to 8192 bytes.

    String line;
    DataInputStream in = new DataInputStream(
            new BufferedInputStream(new FileInputStream(file), 8192));
    while ((line = in.readLine()) != null) {
        doSomething(line);
    }
    in.close();

Method 4 : Buffered reader


Use Readers instead of InputStreams, as the Javadoc recommends for reading text (full portability, etc.).

    String line;
    BufferedReader in = new BufferedReader(new FileReader(file));
    while ((line = in.readLine()) != null) {
        doSomething(line);
    }
    in.close();

Method 5 : Custom-built reader


Let's get down to some real tuning. You know from general tuning practices that creating objects is overhead. Up until now, we have used the readLine() method, which returns a String. Suppose we avoid the String creation. Better still, why not work directly on the underlying char array?

Method 5 : Custom-built reader


We need to implement the readLine() functionality with our own buffer, while passing the buffer to the method that does the string processing. Our implementation uses its own char array buffer. It reads in characters to fill the buffer, then runs through the buffer looking for ends of lines.

Method 5 : Custom-built reader


Each time the end of a line is found, the buffer, together with the start and end index of the line in that buffer, is passed to the doSomething() method. This implementation avoids both String-creation overhead and the subsequent String-processing overhead.
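The slides do not show the code for this reader, so the following is only a sketch of the approach described above. It assumes lines end with '\n' (a '\r' before it would stay in the line), grows the buffer when a line is longer than the buffer, and uses a hypothetical LineHandler interface in place of doSomething(). The `collect` helper exists only for the demonstration and does create Strings.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Line reader that works on its own char buffer instead of creating Strings (illustrative sketch). */
class CustomLineReader {

    /** Hypothetical callback standing in for doSomething(); the line is buf[start..end). */
    interface LineHandler {
        void doSomething(char[] buf, int start, int end);
    }

    static void readLines(Reader in, int bufSize, LineHandler handler) throws IOException {
        char[] buf = new char[bufSize];
        int len = 0; // number of valid chars currently in buf
        int n;
        while ((n = in.read(buf, len, buf.length - len)) != -1) {
            int scanFrom = len; // chars before this index were scanned in an earlier pass
            len += n;
            int start = 0;
            for (int i = scanFrom; i < len; i++) {
                if (buf[i] == '\n') {
                    handler.doSomething(buf, start, i);
                    start = i + 1;
                }
            }
            // Move the unfinished line to the front of the buffer.
            len -= start;
            System.arraycopy(buf, start, buf, 0, len);
            if (len == buf.length) {
                // The current line fills the whole buffer: grow it.
                char[] bigger = new char[buf.length * 2];
                System.arraycopy(buf, 0, bigger, 0, len);
                buf = bigger;
            }
        }
        if (len > 0) {
            handler.doSomething(buf, 0, len); // last line without a trailing newline
        }
    }

    /** Demo helper only: builds Strings so the result can be printed and checked. */
    static List<String> collect(String text, int bufSize) {
        List<String> lines = new ArrayList<>();
        try {
            readLines(new StringReader(text), bufSize,
                    (buf, start, end) -> lines.add(new String(buf, start, end - start)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(collect("first line\nsecond\nlast", 8));
    }
}
```

In real use the handler would process the characters between start and end in place, which is where the saving over readLine() comes from.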

Method 6 : Custom reader and converter


Better still, perform the byte-to-char conversion ourselves. Change the FileReader to a FileInputStream and add a byte array buffer of the same size as the char array buffer. Create a convert() method that converts the byte buffer to the char buffer.
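The slides do not say which encoding convert() handles. The sketch below assumes ISO-8859-1 (or plain ASCII), where each byte maps to exactly one char; other encodings such as UTF-8 would need real decoding. The class name and the exact signature are hypothetical.

```java
/** Byte-to-char conversion for single-byte text (illustrative sketch). */
class ByteToCharConverter {

    /**
     * Copies the first len bytes into the char buffer and returns the number of chars written.
     * Assumes ISO-8859-1, where every byte value is the code point of one char.
     */
    static int convert(byte[] bytes, int len, char[] chars) {
        for (int i = 0; i < len; i++) {
            chars[i] = (char) (bytes[i] & 0xFF); // mask so bytes above 127 are not sign-extended
        }
        return len;
    }

    public static void main(String[] args) {
        byte[] bytes = {72, 105, (byte) 0xE9}; // 'H', 'i', and e-acute in ISO-8859-1
        char[] chars = new char[bytes.length];
        int n = convert(bytes, bytes.length, chars);
        System.out.println(new String(chars, 0, n));
    }
}
```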

Results with small file


Method                        JDK 1.2.2   JDK 1.3.1   JDK 1.4.1   JDK 1.4.1 -server
Unbuffered input stream        2293.75%    2077.08%    2247.92%            2233.33%
Buffered input stream           933.33%      97.92%     100.00%             239.58%
8K buffered input stream        931.25%      95.83%      95.83%             133.33%
Buffered reader                1143.75%     116.67%     131.25%             193.75%
Custom-built reader             981.25%      87.50%      85.42%             189.58%
Custom reader and converter     441.67%      39.58%      58.33%             114.58%

The file contains 10000 lines of 100 characters (977 KB).

Results with long file


Method                        JDK 1.2.2   JDK 1.3.1   JDK 1.4.1   JDK 1.4.1 -server
Unbuffered input stream        2381.25%    2039.58%    2189.58%            2106.25%
Buffered input stream           943.75%      97.92%     100.00%             164.58%
8K buffered input stream        931.25%      95.83%      97.92%             116.67%
Buffered reader                1139.58%     106.25%     133.33%             170.83%
Custom-built reader             975.00%     108.33%      85.42%             110.42%
Custom reader and converter     427.08%      43.75%      56.25%             143.75%

The file contains 35000 lines of 50 characters (1.7 MB).

Links
- www.javaperformancetuning.com
- www-2.cs.cmu.edu/~jch/java/optimization.html
- www.cs.utexas.edu/users/toktb/J-Breeze/javaperform.tips.html
- www.javagrande.com
- http://java.sun.com/j2se/1.4.1/docs/guide/jvmpi/jvmpi.html
- www.run.montefiore.ulg.ac.be/~skivee/java-perf/
