
HP-UX setup and tuning information for BEA WebLogic Server on HP Integrity servers

HP-UX 11i v3 and 11i v2; BEA WLS 10.0 and 9.2; HP JDK 5

Executive summary
HP-UX 11i v3 and 11i v2
Kernel
Kernel parameter changes
Network tuning
Network tuning changes
Hyperthreading
Processor sets
Additional settings recommended by BEA
WebLogic Server
Discussion
HP Java Runtime Environment (JRE) Standard Edition (SE) 5.0.08
Discussion
Appendix: Database – Oracle 10.2.0.2
init.ora
Log writer
Block size and cache size parameters
Statistics parameters
Summary
For more information

Executive summary
This paper provides information about setting up the HP-UX operating system on HP Integrity servers to
host BEA WebLogic Server 10.0 and 9.2. Included in this paper are the kernel settings for HP-UX 11i
v3 and 11i v2, and information regarding processor set, hyperthreading, networking and the HP
Java™ Virtual Machine (JVM). This paper is based on details we have discovered while running the
SPECjAppServer2004 benchmark. Details of the benchmark settings may be reviewed at
www.spec.org.
A quick way to verify if your HP-UX system is properly configured for running Java applications is to
go to www.hp.com/go/java and download and run the latest version of HPjconfig. HPjconfig will tell
you what kernel parameters to adjust, and what patches to download for your system.
The information presented here has evolved as we have run successive J2EE benchmarks, the latest of
which is SPECjAppServer2004. As of August 2007, some of the information in this paper is new, some
covers updated versions and parameters, and some covers obsolete versions and parameters; we
include all of it so that you can benefit from what we have learned.
Elements we believe to be generally important for better performance are marked by (*). However, all
benchmarks and tuning parameters depend on the application. Each application has characteristics
of its own and thus tuning will be specific to that application, the version of the Java Virtual Machine
(JVM), the version of BEA WebLogic Server (WLS), the operating system, the system’s hardware, and
the degree of vertical or horizontal scaling.
You can review the kernel and setup parameters for specific benchmarks by reviewing the write-ups
provided at www.spec.org.

Target audience: This paper is intended primarily for IT professionals and systems administrators
responsible for installing and tuning BEA WebLogic Server on HP Integrity servers running HP-UX.

HP-UX 11i v3 and 11i v2
As noted in the executive summary, a quick way to verify that your HP-UX system is properly
configured for running Java applications is to go to www.hp.com/go/java and download and run the
latest version of HPjconfig. HPjconfig will tell you what kernel parameters to adjust, and what
patches to download for your system. This is a very important starting point. After you have done
this, review the parameters below.
For system configuration settings recommended by BEA, see the HP-UX supported configurations
section of their website (see footnote 1).
HP has run these benchmarks on systems ranging from HP Integrity Superdomes to HP Integrity
rx2660 servers and everything in between, as well as on older hardware platforms. This is a broad
range of systems, yet the parameters below are relatively consistent amongst the various models.
The sections below discuss the tunable kernel parameters, kernel tuning using adb (absolute
debugger) for network performance, networking parameters, hyperthreading, and processor sets.

Kernel
Kernel parameter changes
Below is a list of kernel parameters we have used for the benchmark runs. Since the list was
developed incrementally, we suggest you use it as a learning tool and build your own template based
on what makes sense for your system. A discussion of each parameter follows the list.
The parameters that we used for runs on HP-UX 11i v2 are marked as such. When using HP-UX 11i
v3, you may omit specifying those parameters (some of these parameters are obsolete).
More information on the kernel tunable parameters can be found at hp.com (see footnote 2).
The parameters that are especially helpful for performance or required to avoid error messages are
marked with (*).
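One way to apply a subset of the values listed below is kctune, the HP-UX kernel tuning command this paper uses later for lcpu_attr. The following is a minimal sketch rather than the procedure used for the benchmark runs. The names apply_tunables and DRY_RUN are hypothetical, the script only prints the commands by default, and the ordering (limits before the values they bound) is an assumption. Whether a change takes effect immediately or at the next boot depends on the tunable and the release.

```shell
#!/bin/sh
# Sketch: emit (or run) kctune commands for the tunables marked (*) in this paper.
# apply_tunables and DRY_RUN are illustrative names, not part of HP-UX.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Assumption: upper limits (nfile, maxfiles_lim) are listed before maxfiles
# so that a limit is raised before the value it bounds.
TUNABLES="maxdsiz=4294963200 nfile=150000 maxfiles_lim=32768 maxfiles=32768"

apply_tunables() {
    for setting in $TUNABLES; do
        if [ "$DRY_RUN" -eq 1 ]; then
            echo "kctune $setting"
        else
            kctune "$setting" || return 1
        fi
    done
}

apply_tunables
```

Reviewing the printed commands before setting DRY_RUN=0 keeps the step reversible; running HPjconfig first remains the recommended starting point.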

Parameter                       Value

STRMSGSZ                        65535
cmc_plat_poll (11iv2)           15
create_fastlinks                1
dbc_max_pct (11iv2)             8
dbc_min_pct (11iv2)             8
default_disk_ir                 1
fs_async                        1
hfs_max_ra_blocks (11iv2)       20
hfs_max_revra_blocks (11iv2)    20
hfs_ra_per_disk (11iv2)         256
hfs_revra_per_disk (11iv2)      256
max_async_ports                 768
max_thread_proc                 4096
maxdsiz (*)                     4294963200
maxfiles (*)                    32768
maxfiles_lim (*)                32768
maxssiz                         20000000
maxtsiz                         1073741824
maxuprc                         2040
maxvgs (11iv2)                  80
msgmap (11iv2)                  5122
msgmax (11iv2)                  32768
msgmnb                          65536
msgseg (11iv2)                  20480
msgssz (11iv2)                  128
msgtql                          5120
nfile (*)                       150000
ninode                          8192
nkthread                        32768
nproc                           5000
npty                            200
nstrpty                         200
nswapdev                        25
o_sync_is_o_dsync (11iv2)       1
scsi_max_qdepth (11iv2)         8
semmni                          4096
semmns                          8192
semmnu                          4092
semume                          512
shmmax                          15000000000
shmmni                          520
shmseg                          512
swapmem_on (11iv2)              1
swchunk                         22000
tcphashsz                       32768
vps_ceiling                     64
vx_ninode (11iv2)               6555

1 HP-UX documentation for WLS 10.0 is found at http://edocs.bea.com/platform/suppconfigs/configs/hpux/
2 HP-UX kernel parameter (section 5) manuals can be found at http://docs.hp.com/en/B2355-60105/


Discussion

Parameter Description

STRMSGSZ Limits the number of bytes of message data that can be inserted by
putmsg() or write() in the data portion of any streams message on the
system. If the tunable is set to zero, there is no limit on how many bytes
can be placed in the data segment of the message.

cmc_plat_poll (11iv2) Corrected Machine Check Platform Polling. Queries the processor every
(specified) minutes for logging purposes. This was used with some old
SPECjAppServer2004 submissions, but was probably just copied from a
TPC-H submission where it was first used. This parameter has no effect
on Java benchmark performance.

create_fastlinks When create_fastlinks is nonzero, it causes the system to create
Hierarchical File System (HFS) symbolic links in a manner that reduces
the number of disk-block accesses by one for each symbolic link in a
path name lookup. This involves a slight change in the HFS disk format,
which makes any disk formatted for fast symbolic links unusable on
Series 700 systems prior to HP-UX Release 9.0 and Series 800 systems
prior to HP-UX Release 10.0 (this configurable parameter was present
on Series 700 Release 9.0 systems, but not on Series 800 HP-UX 9.0
systems).

dbc_max_pct (11iv2) Defines the maximum percentage of memory to be used for caching file
I/O data and metadata. We use 5% for a single instance and 8% for
multiple instances.

dbc_min_pct (11iv2) Defines the minimum percentage of memory used for caching file I/O
data and metadata. We use 5% for a single instance and 8% for
multiple instances.

default_disk_ir Enables (1) or disables (0) the Immediate Reporting behavior of the SCSI
subsystem, also known as Write Cache Enable (WCE). With Immediate
Reporting enabled, disk drives that have data caches return from a
write() system call, including raw writes, when the data is cached, rather
than returning after the data is written to the media. This sometimes
improves write performance especially for sequential transfers.

fs_async Specifies whether or not asynchronous writing of file system data
structures to disk is allowed: 0 uses synchronous disk writes only, and 1
allows asynchronous disk writes. Asynchronous writes to disk can
improve file system performance significantly. However, asynchronous
writes can leave file system data structures in an inconsistent state in the
event of a system crash.

hfs_max_ra_blocks (11iv2) Defines the maximum number of read-ahead blocks that the kernel
may have outstanding for a single HFS file system.

hfs_max_revra_blocks (11iv2) Defines the maximum number of reverse read-ahead blocks that
the kernel may have outstanding for a single HFS file system.

hfs_ra_per_disk (11iv2) Defines the amount of HFS file system read-ahead per disk drive, in KB.

hfs_revra_per_disk (11iv2) Defines the maximum number of HFS file system blocks to be read in
one read-ahead operation when sequentially reading backwards. The value of
this tunable should be increased if there is a large amount of reverse
sequential file I/O on file systems with a small file system block size.

max_async_ports Defines the maximum number of asynchronous disk ports that can be
open at any time.

max_thread_proc Defines the maximum number of concurrent threads allowed per
process. For a 4-core box, 2048 is probably sufficient.

maxdsiz (*) Defines the maximum size (in bytes) of the data segment for any user
process. This tunable defines the maximum size of the static data
storage segment for 32-bit and 64-bit processes. The data storage
segment contains fixed data storage such as globals, arrays, static
variables, local variables in main(), strings, and space allocated using
sbrk() and malloc(). In addition, any files memory mapped as private
and shared library per-invocation data also reside in the data segment.

If this parameter is not set, applications can return out of memory or out
of swap space errors.

maxfiles (*) Specifies the initial default number of file descriptors a process is
allowed to have for open files at any given time. It is possible for a
process to increase its soft limit and therefore open more than maxfiles
files.

If this parameter is not set, applications that need many sockets may
have socket or open file errors.

maxfiles_lim (*) Specifies the system hard limit for the number of file descriptors that a
process is allowed to have for open files at any given time. It is possible
for a nonsuperuser process to increase its soft limit up to this hard limit.

This parameter needs to be set to provide the hard limit for maxfiles.

maxssiz Defines the maximum size (in bytes) of the stack for any user process.
Setting it to the maximum value, 401,604,608, or to another very large number may
restrict the size of the Java heap (the -Xms and -Xmx Java parameters) to less
than 3500M.

maxtsiz Controls the size (in bytes) of the text segment, which is the read-only
executable object code for the process that can be shared by multiple
processes executing the same program. For example, all copies of vi on
the system use the same text segment.

maxuprc A dynamic tunable that limits the maximum number of processes per
user. Only root can have more than the number of processes limited by
maxuprc.

maxvgs (11iv2) Defines the maximum number of Logical Volume Manager (LVM) Volume
Groups which may be created or activated on the system.

msgmap (11iv2) Specifies the size of (number of entries in) the message space resource
map that tracks the free space in shared inter-process communication
(IPC) message space. Each resource map entry is an offset-space pair
which points to the offset and size (bytes) of each contiguous series of
unused message space "segments".


msgmax (11iv2) Specifies the maximum allowable size, in bytes, of any single message
in a System V message queue. msgmax must be no larger than msgmnb
(the size of a queue) nor can it be larger than the pre-allocated system-
wide message storage space (msgssz*msgseg).

msgmnb Specifies the maximum allowable total combined size, in bytes, of all
messages queued in a single given System V IPC message queue at any
one time. Each of the msgmni message queues in the system
has the same limitation, and msgmnb sets the maximum number of
message bytes that a queue can store. However, other tunables, like
msgmax, msgtql, msgssz, and msgseg, influence how that space is
utilized and even whether all of that space can be utilized. msgmnb is only an
upper bound for any given queue.

msgseg (11iv2) Specifies the total number of "segments" of system-wide shared memory
message storage space which is shared among all IPC message queues.
The total available space for messages in the system is defined by the
product of msgseg*msgssz, the number of segments multiplied by the
segment size. Segments are only used to store messages larger than 64
bytes long. Messages smaller than or equal to 64 bytes long are stored
in a different area and do not consume a segment.

msgssz (11iv2) Specifies the size, in bytes, of a "segment" of memory space reserved
for storing IPC messages. Space for messages is acquired in segment-
sized increments as required to contain the message. Separate
messages do not share segments. Messages of size less than or equal to
64 bytes are allocated in a different area and do not require a segment.
The total available space for messages greater than 64 bytes in size on
the system is defined by the product of msgseg*msgssz, the number of
segments multiplied by the segment size. It is best to select a segment
size equal to the size of most messages sent by applications. This
ensures optimal memory utilization of the segments.

msgtql Specifies the maximum total system-wide individual messages across all
message queues. Every message has a header to specify message type
and location and the total number of headers is limited by msgtql.

nfile (*) Defines the maximum number of slots in the system open file table. This
number limits the cumulative number of open files by all processes in the
system. In addition to named files (regular files, directories, links, device
files, etc.), other objects that consume slots in the system open file table
include pipes, FIFOs, sockets, and streams. Socket slots are critical for
WebLogic. The large number specified for this benchmark is due to very
high activity for the SPECjAppServer2004 benchmark and is probably
not needed for most applications.

This parameter needs to be set to support maxfiles.


ninode Defines the number of slots in the HFS inode table. This number limits the
number of open inodes that can be in memory for HFS file systems at
any given time. The inode table is used as a cache memory. For
performance reasons the most recent ninode (number of) open inodes
are kept in main memory. The table is hashed. Each unique open file
has an open inode associated with it. Therefore, the larger the number
of unique open files, the larger ninode should be.

nkthread Controls the absolute number of threads allowed on a system at any
given time. Increasing nkthread will allow more threads, and lowering it
will restrict the number of threads. 8192 is probably sufficient.

nproc Controls the absolute number of processes allowed on a system at any
given time. Increasing nproc will allow more processes, and lowering it
will restrict the number of processes.

npty Defines the number of pseudo terminal (pty) drivers that a system can
support. Using a parameter value significantly larger than the number of
ptys is not recommended. An excessively large value wastes kernel
memory space.

nstrpty Defines the number of STREAMS-based pseudo terminal (pts) drivers that
a system can support. nstrpty should be set to a value that is equal to, or
greater than, the number of pty devices on the system that will be using
STREAMS-based I/O pipes. Using a parameter value significantly larger
than the number of ptys is not recommended. nstrpty is used when
creating data structures in the kernel to support STREAMS-based ptys,
and an excessively large value wastes kernel memory space.

nswapdev Swap devices are managed in a table for easier indexing in the kernel.
nswapdev sets the kernel variable responsible for the upper limit on this
table, and thus the upper limit to devices which can be used for swap.

o_sync_is_o_dsync (11iv2) Specifies whether an open or fcntl with the O_SYNC flag set can be
converted to the same call with the O_DSYNC flag instead. This controls
whether the function can return before updating the file access.

scsi_max_qdepth (11iv2) The number of commands that can be outstanding varies by device, and
is not known to HP-UX. To avoid overflowing this queue, HP-UX will not
send more than a certain number of outstanding commands to any SCSI
device. This tunable sets the default value for that limit.

semmni Specifies the maximum number of System V IPC system-wide semaphore
sets (and identifiers) which can exist at any given time. A single
identifier (ID) is returned for each semget() system call to create a new
set of one or more (up to semmsl) semaphores.

semmns Specifies the maximum total individual System V IPC system-wide
semaphores which can be assigned by applications. Semaphores are
assigned in "sets" associated with an ID. Thus semaphores can be
distributed in any manner across the range of IDs with one or more per
ID. There is no reason to specify semmns less than semmni (the
maximum number of identifiers) as each ID requires at least one
semaphore.

semmnu Specifies the maximum number of System V IPC system-wide processes
that can have "undo" operations pending at any given time.

semume Specifies the maximum number of System V IPC semaphores upon which
a single process can have outstanding (non-zero) "undo" operations.

shmmax Defines the maximum size (in bytes) for a System V shared memory
segment.

shmmni Sets the number of unique shared memory segments creatable system
wide, since each segment is assigned an identifier by the kernel.

shmseg Defines the maximum number of System V shared memory segments per
process.

swapmem_on (11iv2) This tunable was created to allow system swap space to be less than
core memory. To accomplish this, a portion of physical memory is set
aside as “pseudo-swap” space. While actual swap space is still
available, processes still reserve all the swap they will need at fork or
execute time from the physical device or file system swap. Once this
swap is completely used, new processes do not reserve swap, and each
page which would have been swapped to the physical device or file
system is instead locked in memory and counted as part of the pseudo-
swap space.

swchunk Swap space in the kernel is managed using 'chunks' of physical device
space. These chunks contain one or more (usually more) pages of
memory, but provide another layer of indexing (similar to inodes in file
systems) to keep the global swap table relatively small, as opposed to a
large table indexed by swap page. swchunk controls the size in
physical disk blocks (which are defined as 1 KB) for each chunk. The
total bytes of swap space manageable by the system is swchunk * 1 KB
* 16384 (the system maximum number of swap chunks in the swap
table). The way to think of swchunk is not as the size of the I/O
transactions in the swap system (in disk blocks), but as the number of
blocks that will be placed on one swap device (or file system) before
moving to the next device (assuming all priorities are equal). This
spreads the swap space over any devices and is called swap
interleaving. Swap interleaving spreads out the swap over many devices
and reduces the possibility of one single device becoming a bottleneck
for the entire system when swap usage is heavy.

tcphashsz Sets the size of the networking hash tables. A system that is going to
have a large number of connections on it all of the time may see some
benefit of increasing this value. This tunable needs to be a power of
two. If it is not specified as a power of two, then it is rounded down to
the nearest power of two.

vps_ceiling Defines the maximum system-selectable page size, in kilobytes.

vx_ninode (11iv2) The Symantec Veritas File System (VxFS) caches the inodes in an inode
table. The kernel tunable vx_ninode determines the number of inodes in
the inode table to help VxFS in caching. The vx_ninode static tunable is
initialized when a system is booted. Thus changes to vx_ninode
take effect only after the next system reboot.
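Several of the descriptions above state relationships between tunables (for example, msgmax may not exceed msgmnb or msgseg*msgssz, and tcphashsz must be a power of two). The sketch below cross-checks the values listed in this paper against those relationships and derives two sizes from them. It is an illustration only: the helper names expect and is_power_of_two are hypothetical, and the swap figure is kept in KB and reduced to whole GB to avoid large intermediate numbers.

```shell
#!/bin/sh
# Sketch: cross-check a few tunables against the relationships described
# in the parameter table. Values are the ones listed in this paper.

msgmax=32768; msgmnb=65536; msgseg=20480; msgssz=128
semmni=4096; semmns=8192
maxfiles=32768; maxfiles_lim=32768
swchunk=22000; tcphashsz=32768

# Total space for messages larger than 64 bytes: msgseg * msgssz.
msg_space=$((msgseg * msgssz))

# Maximum manageable swap: swchunk (in 1 KB blocks) * 16384 chunks.
swap_kb=$((swchunk * 16384))
swap_gb=$((swap_kb / 1048576))   # integer division, whole GB only

# Succeeds when $1 is a power of two (repeated halving ends at 1).
is_power_of_two() {
    n=$1
    [ "$n" -ge 1 ] || return 1
    while [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
    done
    [ "$n" -eq 1 ]
}

failures=0
expect() {   # expect "label" <left> <test operator> <right>
    if [ "$2" "$3" "$4" ]; then
        echo "ok:   $1"
    else
        echo "FAIL: $1"
        failures=$((failures + 1))
    fi
}

expect "msgmax <= msgmnb"         "$msgmax"   -le "$msgmnb"
expect "msgmax <= msgseg*msgssz"  "$msgmax"   -le "$msg_space"
expect "semmns >= semmni"         "$semmns"   -ge "$semmni"
expect "maxfiles <= maxfiles_lim" "$maxfiles" -le "$maxfiles_lim"

if is_power_of_two "$tcphashsz"; then
    echo "ok:   tcphashsz is a power of two"
else
    echo "FAIL: tcphashsz would be rounded down"
    failures=$((failures + 1))
fi

echo "message space: $msg_space bytes"
echo "swap limit:    about $swap_gb GB"
```

Substituting your own values at the top turns the script into a quick pre-flight check before changing a kernel configuration.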

Network tuning
All the tunables for ndd (see footnote 3) can be displayed by running "ndd -h".
We have set these parameters in the past, but more recently we have used the default ndd parameters.
You may want to use these parameters if your application needs a significant number of ports and
connections.

Network tuning changes

ndd -set /dev/tcp tcp_conn_request_max 15000
ndd -set /dev/tcp tcp_smallest_anon_port 32768
ndd -set /dev/tcp tcp_naglim_def 1
ndd -set /dev/sockets socket_enable_tops 2

Discussion

Parameter Description
tcp_conn_request_max The maximum number of pending connection requests for any listening end
point. This tunable is also known as the maximum depth of the "listen
queue". The actual maximum for any given TCP endpoint in the LISTEN state
will be the MINIMUM of the tcp_conn_request_max and the value the
application passed-in to the listen() socket call. For this parameter to take
effect, it must be set BEFORE an application makes its call to listen(). So, if
you use ndd to set this value after the application has started, it will have no
effect unless you can get the application to recreate its LISTEN endpoint(s).
tcp_smallest_anon_port Smallest anonymous port number to use. We do not recommend the user
modify this value. It is recorded here as it was set for this benchmark.
tcp_naglim_def Initial value for the Nagle limit. We do not recommend the user modify this
value. It is recorded here as it was set for this benchmark.

socket_enable_tops We do not recommend setting this value. It was used to resolve an issue
with HP-UX 11i v2 networking. In previous HP-UX releases, networking
scheduled incoming packets to different CPUs in the system, trying to
deliver each packet to the processor running the application that
needed the data. However, in 11i v2, networking "fixed" a rare issue by
turning off most of this functionality. As a result,
all inbound packets are processed on the same processor as the one
fielding interrupts for the network card. Thus, it is very easy to saturate a
particular CPU with network traffic, especially with a high-speed interface
such as Gigabit Ethernet (GigE). To return to the previous behavior, you need to
use the adb tool to modify the kernel and set the value "enable_tops" to 2.
The default value is 1. We used it because we were saturating the network
at the very high end of the performance curve.

3 For additional information on ndd, go to: http://docs.hp.com/en/B2355-60130/ndd.1M.html.
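The tcp_conn_request_max description says the listen queue depth for an endpoint is the minimum of the tunable and the backlog the application passes to listen(). A minimal sketch of that rule follows; effective_backlog is a hypothetical helper, and the backlog values 50 and 20000 and the limit 4096 are made-up inputs for illustration.

```shell
#!/bin/sh
# Sketch: effective listen-queue depth = min(tcp_conn_request_max, backlog).
# effective_backlog is an illustrative helper, not an HP-UX command.

effective_backlog() {   # effective_backlog <tcp_conn_request_max> <listen backlog>
    if [ "$1" -lt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# With the benchmark setting of 15000, a small application backlog still wins.
effective_backlog 15000 50

# A lower system limit (hypothetical) caps a large application request.
effective_backlog 4096 20000
```

Raising the ndd value alone therefore does nothing for an application that requests a small backlog, and the new value only applies to endpoints created after the change.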

Hyperthreading
We do not recommend the use of hyperthreading with Java.
In 11i v3, hyperthreading is available. In some cases hyperthreading has increased performance; in
others it has not. To turn on hyperthreading, the system needs to be rebooted. During the reboot, go
to the EFI prompt and use the following commands (see footnote 4).
cpuconfig threads on
reset
Then, when the system comes up, log in as root and execute the following command (see footnote 5).
kctune lcpu_attr=1
This tunable dynamically enables (1) or disables (0) the logical processor (LCPU) attribute in the
default processor set. On systems supporting hyperthreading technology, each hyperthread is
represented as an LCPU. Hyperthreading does improve the performance of the
SPECjAppServer2004 benchmark for WebLogic application servers when using more than one
instance. Special attention to the number of Garbage Collection (GC) threads is needed when
running Java applications with hyperthreading enabled. When using any GC policy other than the
single-threaded SerialGC policy, it is highly recommended that you set the number of GC threads to
be equal to the number of physical (not logical) cores on the server. For example, if your server has
four physical cores, then set the number of GC threads by adding the following flag to the application
command line:

-XX:ParallelGCThreads=4

In addition to explicitly setting the number of GC threads as described above, it is highly
recommended that you bind the multiple GC threads to physical cores by specifying this JVM flag:

-XX:+BindGCTaskThreadsToCPUs
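The rule above (GC threads equal to physical cores, not logical CPUs) can be sketched as a small helper that builds both flags. This is an illustration under stated assumptions: gc_flags is a hypothetical function, the logical CPU count is supplied by hand, and two hardware threads per core is assumed when hyperthreading is on.

```shell
#!/bin/sh
# Sketch: derive the GC flags from the logical CPU count.
# Assumption: each physical core exposes <threads_per_core> logical CPUs
# (2 with hyperthreading on, 1 with it off).

gc_flags() {   # gc_flags <logical_cpus> <threads_per_core>
    logical=$1
    per_core=$2
    physical=$((logical / per_core))
    [ "$physical" -ge 1 ] || physical=1   # never emit zero GC threads
    echo "-XX:ParallelGCThreads=$physical -XX:+BindGCTaskThreadsToCPUs"
}

# Four physical cores seen as eight logical CPUs with hyperthreading enabled.
gc_flags 8 2
```

When several JVM instances share a server, the count passed in should presumably reflect the cores available to each instance rather than the whole machine.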

4 For more information on EFI and cpuconfig, go to the system documentation. For example, for the rx2660, go to
http://docs.hp.com/en/AB419-9002B/apds03.html. For more information on cpuconfig threads and hyperthreading in a hard partition (nPars),
go to http://docs.hp.com/en/5991-1247B/ch07s08.html.
5 For additional information on lcpu_attr, go to: http://docs.hp.com/en/B2355-60130/lcpu_attr.5.html.

Processor sets
When setting up the runtime system, we have found it beneficial to lock the JVMs in processor sets,
especially when the number of cores increases beyond 4.
For a four-core two-socket box with hyperthreading turned on, we would set up two WLS/JVM
instances for best performance, and would then tie the instances to specific processors.
First, create a processor set (the second processor set, because there is already a pset_id=0;
see footnote 6).
psrset -c 4 6
Processor set 1 is created with CPUs 4 and 6 including their companion logical CPUs 5 and 7,
respectively. So default processor set 0 has CPUs 0-3 and processor set 1 has CPUs 4-7.
Second, bring up the JVM/WLS instances on each processor set. To put WebLogic in processor set 0
simply use the normal start up command. To get the second instance started in processor set 1, here is
the partial command.
psrset -e 1 <command bringing up WebLogic>

Additional settings recommended by BEA


The following command helps maximize usage of the network bandwidth by setting the maximum
transmission unit on the network card to 1500 bytes. This is the maximum packet size unless you are
using jumbo frames.

/sbin/ifconfig lo mtu 1500

To review BEA recommendations for HP-UX, go to the following URLs:

http://e-docs.bea.com/platform/suppconfigs/configs/hpux/hpux_11iv2_90.html
http://e-docs.bea.com/platform/suppconfigs/configs/hpux/hpux_11iv3_100.html

WebLogic Server
The following parameters were set for WebLogic Server 9.2 and 10.0 on startup.
-Dweblogic.SocketReaders=2
-Dweblogic.management.discover=false
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
-Dcom.sun.xml.namespace.QName.useCompatibleSerialVersionUID=1.0
-Dsun.net.inetaddr.ttl=0
-Dnetworkaddress.cache.ttl=0

6 For additional information on psrset, go to: http://docs.hp.com/en/B2355-60130/psrset.1M.html.

Discussion

Parameter Description
weblogic.SocketReaders=2 A normal runtime parameter. SocketReaders=2 uses
/dev/poll for faster performance. Usually, the parameter is
set to 1, but with very high throughput, it is advisable to set
it to 2 or 3.

weblogic.management.discover=false Minimizes management overhead for this benchmark.
Leave this parameter set to true for normal production.

javax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
The XML Registry is a facility for configuring and
administering the XML resources of WebLogic Server. XML
resources include the default parser and transformer
factories and external entity resolution. In particular, you use
the XML registry to specify an alternative server-wide XML
parser, used by default when parsing XML documents,
instead of the parser that is installed by default. You do this
by specifying the names of the classes that implement the
javax.xml.parsers.DocumentBuilderFactory and
javax.xml.parsers.SAXParserFactory interfaces; these
implementing classes are used to parse XML in DOM and
SAX mode, respectively.

com.sun.xml.namespace.QName.useCompatibleSerialVersionUID=1.0
Needed as a workaround for Java Bug ID: 6267224.
Starting with WebLogic 9.2 MP1, this parameter no longer
needs to be specified past HP JVM 5.0.4.

sun.net.inetaddr.ttl=0 The first time you invoke a Web service from a client
application that uses the WebLogic client JAR files, the client
caches the IP address of the computer on which the Web
service is running, and by default this cache is never
refreshed with a new DNS lookup. Since we are load
balancing, we need to look up the IP address every time.
This property may not be supported in future releases. A
value of 0 means to look up the IP address each time.

networkaddress.cache.ttl=0 (default: -1) Specified in java.security to indicate the caching
policy for successful name lookups from the name service.
The value is specified as an integer to indicate the number of
seconds to cache the successful lookup. A value of -1
indicates "cache forever". A value of 0 means to look up the
round robin IP address each time, which is used in the
benchmark. This parameter and the one before it make the
DNS server act like a Cisco Local Director.
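As an illustration of how such properties might be collected for a start script, the sketch below builds one option string from a subset of the settings discussed above. It assumes the start script passes a JAVA_OPTIONS environment variable to the java command line, which should be confirmed against your own WebLogic start scripts; add_opt is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: collect startup properties into one variable.
# Assumption: the WebLogic start script appends JAVA_OPTIONS to the java command.

JAVA_OPTIONS=""
add_opt() {   # append one option, separated by a single space
    JAVA_OPTIONS="${JAVA_OPTIONS:+$JAVA_OPTIONS }$1"
}

add_opt "-Dweblogic.SocketReaders=2"
# Benchmark setting; the discussion above advises true for normal production.
add_opt "-Dweblogic.management.discover=false"
add_opt "-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl"
add_opt "-Dsun.net.inetaddr.ttl=0"

export JAVA_OPTIONS
echo "$JAVA_OPTIONS"
```

Keeping each property on its own add_opt line makes it easy to comment out the benchmark-only settings when moving to production.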

HP Java Runtime Environment (JRE) Standard Edition (SE) 5.0.08
At the time of this writing, August 2007, 5.0.08 was the latest JVM. For the JVM to use large 512M
pages and save entries in the system's Translation Lookaside Buffer (TLB), use the following
commands:
cd /opt/java1.5/bin/IA64N
chatr +pi 512M +pd 512M java java_q4p

We believe there is no performance difference between 256M pages and 512M pages; however,
using 512M pages will increase the memory footprint of your application. The default is 4M.
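To see why large pages save TLB entries, it helps to count the pages needed to map a heap at each page size. The sketch below does this for a 3500 MB heap, the -Xmx value used in this paper. It is simple arithmetic that ignores everything in the process other than the Java heap, and pages_needed is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: number of pages needed to map a heap of a given size,
# rounded up to whole pages.

pages_needed() {   # pages_needed <heap_mb> <page_mb>
    echo $(( ($1 + $2 - 1) / $2 ))
}

heap_mb=3500   # the -Xmx value used in this paper
for page_mb in 4 256 512; do
    echo "${page_mb}M pages: $(pages_needed "$heap_mb" "$page_mb")"
done
```

For this heap the counts are 875, 14 and 7 pages. Seven 512M pages span 3584 MB, somewhat more than the heap itself, which is consistent with the footprint increase noted above.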
The following parameters were set for HP JVM on HP-UX 11i v3 and 11i v2. For more detailed
information, see the HP Java website (see footnote 7).

-XX:+AggressiveHeap -server -Xoptgc -Xmx3500m -Xms3500m -Xmn2450m
-XX:PermSize=85m -Xverbosegc:file=stdout -XX:+ForceMmapReserved
-XX:-UseHighResolutionTimer -XX:SchedulerPriorityRange=SCHED_RTPRIO
-XX:+UseSpinning -XX:-UseFastAccessorMethods
-XX:-StackTraceInThrowable -XX:CICompilerCount=1
-XX:OldPLABSize=8192 -XX:TLABSize=32k

(For the -XX:PermSize value used on 11i v2, see footnote 8.)

The parameters that were especially helpful for performance for this particular workload, or that were
required to avoid error messages, are marked with (*). Please keep in mind that SPECjAppServer2004 is an
application server benchmark and, unlike normal 3-tier workloads, which are usually database
bound, it is designed to stress the application server tier. Tuning for the SPECjAppServer2004
benchmark may not be suitable for your application. Use the options above as an example only. Do not
blindly copy and paste them into production run scripts; always profile and tune accordingly.
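
One way to avoid copying the option string verbatim is to keep the options in one place and derive each command line from it. The sketch below is a hypothetical helper, not part of any HP or BEA tooling. It uses the option values listed above, takes the 11i v2 PermSize value from footnote 8, and lets a caller leave out options that do not suit a given system.

```python
# Benchmark options from the list above; -XX:PermSize is added per release.
BASE_OPTIONS = [
    "-XX:+AggressiveHeap", "-server", "-Xoptgc",
    "-Xmx3500m", "-Xms3500m", "-Xmn2450m",
    "-Xverbosegc:file=stdout", "-XX:+ForceMmapReserved",
    "-XX:-UseHighResolutionTimer",
    "-XX:SchedulerPriorityRange=SCHED_RTPRIO",
    "-XX:+UseSpinning", "-XX:-UseFastAccessorMethods",
    "-XX:-StackTraceInThrowable", "-XX:CICompilerCount=1",
    "-XX:OldPLABSize=8192", "-XX:TLABSize=32k",
]

# PermSize differed by HP-UX release in the benchmark runs (footnote 8).
PERM_SIZE = {"11i v3": "85m", "11i v2": "96m"}


def jvm_options(release, exclude=()):
    """Return the benchmark options for a release, minus any excluded ones.

    exclude holds substrings; an option containing any of them is dropped.
    """
    options = BASE_OPTIONS + [f"-XX:PermSize={PERM_SIZE[release]}"]
    return [opt for opt in options if not any(name in opt for name in exclude)]


# Example: 11i v2, without real-time scheduling on a shared system.
print(" ".join(jvm_options("11i v2", exclude=["SchedulerPriorityRange"])))
```

The list order keeps -XX:+AggressiveHeap ahead of -Xms and -Xmx, so the explicit heap sizes still take effect after it.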

7 http://www.hp.com/products1/unix/java/infolibrary/prog_guide/hotspot.html
8 We used -XX:PermSize=96m for 11i v2.

Discussion

Parameter Description

XX:+AggressiveHeap This option instructs the JVM to push memory use to
the limit. It sets the overall heap to around 3850MB,
the memory management policy defers collection as
long as possible, and (beginning with J2SE 1.3.1.05)
some garbage collection activity is done in parallel.
We have found it to be useful for certain applications
that create a lot of short-lived objects. In addition to
maximizing the size of the Java heap,
AggressiveHeap also modifies internal JVM settings
that affect performance. When AggressiveHeap is
followed by -Xms and -Xmx on the command line, the
Java heap size is overridden by the specified
values, but the other internal
JVM parameters remain in effect.
Server Default (doesn’t need to be specified). The Java
HotSpot Server VM is designed for maximum program
execution speed for applications running in a server
environment.
Xoptgc The optimistic garbage collection flag. Improves
garbage collection performance of applications with
mostly short-lived objects. A server-side application that
creates many short-lived objects for each transaction is
likely to benefit greatly from Xoptgc. However, use this flag
with caution: it is not recommended for applications
that quickly build up objects that are not short-lived
at run time.
Xms3500m *
Xmx3500m * Java heap size settings that minimize garbage
collection during the benchmark. Because these entries follow
AggressiveHeap on the command line, they override the heap size it set.
In general, using a larger heap will improve
performance. Using the small default heap size is
generally detrimental to performance for significant
WebLogic applications.
Xmn2450m * Sets the Java new generation heap size. The "new
generation" is the area of the Java heap where all
newly created objects are placed; objects that survive
multiple garbage collections of the new generation are
eventually moved to another area of the Java heap
known as the old generation. (This option replaces the
option -XX:NewSize=N.)
For this benchmark, a large Xmn is very helpful. For
most applications with a 3500MB heap,
Xmn=1000m is a large enough size.
XX:PermSize=85m Specifies the initial size, in bytes, of the Permanent
Space memory allocation pool. This value must be a
multiple of 1024 and greater than 1MB. Append the
letter k or K to indicate kilobytes, or m or M to indicate
megabytes.
Xverbosegc:file=stdout Prints out detailed information about the spaces within
the Java Heap before and after garbage collection. In
this benchmark, the output goes to standard out or the
WebLogic log file.
This parameter does not improve performance, and its
overhead is minimal. Its output can be fed to HPjmeter
to analyze garbage collection data. This free tool can be
downloaded from http://www.hp.com/go/java.
XX:+ForceMmapReserved Tells the JVM to reserve the swap space for all large
memory regions used by the JVM. This includes the
Java heap, which is an mmap'ed space. Starting with
HP-UX 11.11, memory regions are reserved lazily by default.
Most large server-side applications will use all of the
space, so improved performance can be obtained by
reserving the space at program initialization. If this
option is not used, 4K pages will be allocated for the
mmap'ed regions. This can put pressure on the chip's
translation lookaside buffer when it needs to translate
a virtual page address to its corresponding physical
address.
XX:-UseHighResolutionTimer Instructs the VM to use longer intervals between checks
for the current time of day to reduce the number of
gettimeofday system calls. Default = 1ms.
XX:SchedulerPriorityRange=SCHED_RTPRIO * Use real-time thread scheduler priority. This prevents
long-running threads from degrading in priority. This
option can be used both to select the scheduling policy
and to map the Java thread priorities, 1 (low) through 10
(high), to the underlying HP-UX thread priorities. See
the Java website (footnote 9) for details.
Because it gives this application real-time priority, this
parameter can drag down other applications running
on the same system. You can specify
XX:SchedulerPriorityRange=SCHED_NOAGE to prevent
threads from degrading in priority, but for the benchmark,
the best performance was achieved using
SCHED_RTPRIO.
XX:+UseSpinning Use spinlocks. When a thread t2 attempts to grab a
lock already held by another thread, t1, it optimistically
"spins" in hope that t1 releases the lock and t2 will not
have to do a costly OS sleep. This parameter might not
be performance critical for SPECjAppServer2004. You
may get the same performance if you don't include this
option on the command-line.

9 http://www.hp.com/products1/unix/java/infolibrary/prog_guide/hotspot.html?jumpid=reg_R1002_USEN#-XX:SchedulerPriorityRange=SCHED

XX:-UseFastAccessorMethods We do not recommend using this option. At the time of
this benchmark, there was a bug in the JVM, which
has been fixed in subsequent releases of the JVM.
While the bug was being resolved, we used this flag to
disable the fast path for resolved accessor methods,
which increased performance.
XX:-StackTraceInThrowable Do not collect a backtrace in Throwable for exceptions.
The VM does extra work internally just in case the user
asks for a stack trace when an exception is thrown; this
flag suppresses that work, which saves some CPU
cycles.
XX:CICompilerCount=1 Limits the number of Java HotSpot compiler threads to one.
XX:OldPLABSize=8192 Specifies the optimal Old Generation Promotion Local
Allocation Buffer size for the benchmark workload. The
default value should work for most workloads.
XX:TLABSize=32k Sets the Eden Thread Local Allocation Buffer size to 32K.
This value is specific to SPECjAppServer2004; the default value
should work well for most applications.
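
The Xmx and Xmn rows above determine how the heap is divided between the new and old generations. The sketch below works through that arithmetic for the benchmark setting (Xmn2450m) and for the Xmn=1000m value suggested for most applications. The size_mb and generations helpers are hypothetical, and the sketch assumes sizes are given in megabytes with an "m" suffix, as they are in this paper.

```python
import re


def size_mb(options, flag):
    """Return the size in MB given to a flag such as -Xmx in an option string."""
    match = re.search(rf"{re.escape(flag)}(\d+)m\b", options)
    if match is None:
        raise ValueError(f"{flag} not found")
    return int(match.group(1))


def generations(options):
    """Split the heap into new and old generation sizes (MB) from -Xmx and -Xmn."""
    heap = size_mb(options, "-Xmx")
    new = size_mb(options, "-Xmn")
    if new >= heap:
        raise ValueError("-Xmn must be smaller than -Xmx")
    return {"heap": heap, "new": new, "old": heap - new, "new_share": new / heap}


benchmark = generations("-Xmx3500m -Xms3500m -Xmn2450m")
typical = generations("-Xmx3500m -Xms3500m -Xmn1000m")

for name, split in [("benchmark", benchmark), ("typical", typical)]:
    print(f"{name}: new {split['new']} MB, old {split['old']} MB, "
          f"new generation share {split['new_share']:.0%}")
```

The benchmark setting gives 70% of the heap to the new generation and leaves 1050 MB for the old generation. With Xmn=1000m the new generation takes about 29%, leaving 2500 MB for longer-lived objects.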

Appendix: Database – Oracle 10.2.0.2


This appendix is simply to identify the parameters we used during our benchmark runs. The intent is
not to explain the parameters, as we expect the database administrator to be assisting in this area.
The parameters that we found especially helpful for performance or required to avoid error messages
are marked with (*). It is our intent to show how the Oracle® database was set up for the
SPECjAppServer2004 benchmark. Some of the parameters identified below may be undocumented
and not appropriate for production use. You may want to contact your Oracle administrator to get the
proper settings for your database.
For the benchmarks we applied the Oracle patch 5339853. The patch is very specific, and has some
prerequisites. You should obtain the patch after reviewing the need with Oracle support. The bug
description in the patch documentation reads “LGWR RUNNING OUT OF KERNEL I/O RESOURCES
WARNINGS”. The patch eliminates many misleading and repeated warnings of running out of kernel
resources from appearing in Oracle text *.trc logfiles.

init.ora
Oracle tuning in init.ora:
aq_tm_processes=0
compatible=10.1.0.0.0
control_files = /oradisk/cntrlspeccb
cursor_space_for_time = TRUE
db_4k_cache_size = 15G *
db_8k_cache_size = 10G *
db_block_checking=false
db_block_checksum=false
db_block_size = 2048 *
db_cache_advice=off
db_cache_size = 21G *

db_file_multiblock_read_count = 128
db_files = 256
db_keep_cache_size = 3G
db_name=specdb
dml_locks = 8000
hpux_sched_noage=178
java_pool_size=250M
lock_sga=true
log_buffer=204800000
log_checkpoint_interval=0
log_checkpoint_timeout=0
log_checkpoints_to_alert=true
open_cursors = 4000
parallel_max_servers = 100
pga_aggregate_target=0
processes = 4000
query_rewrite_enabled=false
replication_dependency_tracking=false
session_cached_cursors=3000
sessions = 4000
shared_pool_size = 4096M
sort_area_size=52428800
statistics_level=basic *
timed_statistics=false *
transactions = 4000
transactions_per_rollback_segment = 1
undo_management = AUTO
undo_retention = 500
undo_tablespace = undo_ts
_array_update_vector_read_enabled=true
_collect_undo_stats=FALSE
_cursor_cache_frame_bind_memory=true
_db_block_hash_latches=262144
_db_cache_pre_warm=FALSE
_db_writer_flush_imu=FALSE
_enable_NUMA_optimization=FALSE
_imu_pools=500
_in_memory_undo=true
_smm_advice_enabled=FALSE
_two_pass=FALSE
_undo_autotune=FALSE

Log writer
Put the log writer into the real-time class with /usr/bin/rtprio 127 -<PID>, and put the log writer
in a single-CPU processor set using the psrset(1M) command; for example:
psrset -c <CPU ID>; psrset -a 1 <PID>
At this time, the log writer is single-threaded and can only use one CPU at a time.

Block size and cache size parameters
All four parameters (db_block_size, db_cache_size, db_4k_cache_size and db_8k_cache_size) are
specified to indicate the amount of memory used for Oracle buffers. Choose the db_block_size that
best fits your data. In general, Oracle recommends using a large db_block_size, but this benchmark
doesn't have that much data re-use and doesn't need a large block holding too many rows that might
slow down the benchmark. Oracle has row-based locking, but if there are six rows in one data block
buffer, only one thread can update that buffer at a time.
For this benchmark, 8k data blocks are used for the undo tables. There are a few tables that
have 4k block size, so there are buffers specifically for that size (i.e., db_4k_cache_size). The
database server used for this benchmark was a 32 processor / 64 core Integrity Superdome with
512GB memory, and so the sizes used for these parameters are huge, but smaller amounts can be
used successfully.
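
To get a feel for these sizes, the sketch below converts the cache settings from the init.ora listing into total memory and into the number of block buffers each cache could hold. It is rough arithmetic only: it reads "G" as GiB, which is an assumption, and it ignores any per-buffer overhead inside Oracle. The parse and buffer_counts helpers are hypothetical.

```python
GIB = 1024 ** 3

# Values copied from the init.ora listing; "G" is read as GiB (an assumption).
INIT_ORA = """
db_block_size = 2048
db_cache_size = 21G
db_4k_cache_size = 15G
db_8k_cache_size = 10G
"""


def parse(text):
    """Parse 'name = value' lines into a dict of byte counts."""
    params = {}
    for line in text.strip().splitlines():
        name, value = (part.strip() for part in line.split("="))
        params[name] = int(value[:-1]) * GIB if value.endswith("G") else int(value)
    return params


def buffer_counts(params):
    """Block buffers each cache could hold, keyed by block size in bytes."""
    return {
        params["db_block_size"]: params["db_cache_size"] // params["db_block_size"],
        4096: params["db_4k_cache_size"] // 4096,
        8192: params["db_8k_cache_size"] // 8192,
    }


params = parse(INIT_ORA)
counts = buffer_counts(params)
total_gib = (params["db_cache_size"] + params["db_4k_cache_size"]
             + params["db_8k_cache_size"]) // GIB

print(counts)
print(f"total buffer cache: {total_gib} GiB")
```

The three caches add up to 46 GiB, roughly 9% of the 512GB in the benchmark system, so the same proportions can be scaled down for a smaller server.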

Statistics parameters
For top performance, set timed_statistics=false and statistics_level=basic to get about
5% CPU back from the database server.

The default is timed_statistics=true and statistics_level=typical.

When the database is running, you can use sqlplus as the DBA user to change these parameters:

alter system set timed_statistics=false (or true)
alter system set statistics_level=basic (or typical or all)

Summary
We have identified parameters and setup options used over several different runs of the
SPECjAppServer2004 benchmark to help tune HP Integrity Servers running HP-UX and BEA WebLogic
Server. However, in a production environment, performance may be impacted by a variety of factors.
In addition, your application will probably have different needs and therefore the tuning parameters
should be adjusted accordingly.

Important:
HP recommends proof-of-concept testing in a non-production environment
using the actual target application as a matter of best practice for all
application deployments. Testing the actual target application in a
test/staging environment identical to, but isolated from, the production
environment is the most effective way to estimate systems behavior.

For more information
BEA’s documentation: http://edocs.bea.com
HP’s documentation: http://docs.hp.com
HPjconfig tool: http://www.hp.com/go/java
HP Integrity servers: http://www.hp.com/go/integrity

To help us improve our documents, please provide feedback at www.hp.com/solutions/feedback

© 2007, 2008 Hewlett-Packard Development Company, L.P. The information
contained herein is subject to change without notice. The only warranties for HP
products and services are set forth in the express warranty statements
accompanying such products and services. Nothing herein should be construed as
constituting an additional warranty. HP shall not be liable for technical or editorial
errors or omissions contained herein.
Java is a US trademark of Sun Microsystems, Inc. Oracle is a registered trademark
of Oracle Corporation and/or its affiliates. TPC-H is a trademark of the Transaction
Processing Performance Council.
4AA1-5006ENW, Revision 2, April 2008
