
IBM Power Systems Technical University

featuring IBM AIX and Linux


September 8 – 12, 2008 – Chicago, IL

AIX 6.1 Performance Differences

Session ID: pAI09

Speaker: Steve Nasypany

© 2008 IBM Corporation


IBM Training

Introduction
 VMM Page Replacement
– New defaults reducing the requirement for basic performance tuning
 VMM File IO Pacing Enabled By Default
 Performance Tunables
– Tunables are categorized into restricted and non-restricted tunables
 AIO
– Dynamic AIO tuning
– AIO Fast Path for CIO
 JFS2
– Read only access to files opened with CIO
 NFS
– Changes to TCP scaling window, R/W size and number of biod daemons
 Enhanced JFS “no-log” option
 MPSS support


Review – AIX Page replacement algorithm


 When page replacement begins to run, it selects a page type to steal based on how much memory is caching files:
– If the amount of file pages is above maxclient/maxperm, file pages are chosen
– If the amount of file pages is between minperm and maxclient, the type is chosen based on re-paging history
– If the amount of file pages is below minperm, working storage and file pages are chosen without checking the re-paging history
 Re-paging history indicates whether individual pages have been written to disk and read back recently
– Re-paging history adds a degree of uncertainty to the selection process
– If re-paging history decides to pick working storage pages, system paging may begin
• This was intended as a "safety valve": if we are too aggressive in stealing file pages, stop
• But sometimes it is triggered by bad luck
– If re-paging history decides to pick file pages and many file pages are "dirty", heavy writes to disk can occur
• This would probably happen eventually due to "sync"

[Diagram: contents of system memory, 0–100%. Above maxclient/maxperm (80%): pick file pages. Between minperm (20%) and maxperm: pick file pages or working-storage pages based on recent re-paging history. Below minperm: pick any pages.]


AIX v5 vs v6 VMM Page Replacement tuning


 AIX 5.2/5.3
– minperm% = 20
– maxperm% = 80
– maxclient% = 80
– strict_maxperm = 0
– strict_maxclient = 1
– lru_file_repage = 1
– page_steal_method = 0

 AIX 6.1
– minperm% = 3
– maxperm% = 90
– maxclient% = 90
– strict_maxperm = 0
– strict_maxclient = 1
– lru_file_repage = 0
– page_steal_method = 1

 On AIX 6.1, no paging to the paging space will occur unless the
system memory is overcommitted (AVM > 97%)
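
The new defaults can be verified (or, on an AIX 5.3 system, approximated) with vmo. A hedged sketch, not a tuning recommendation — tunable availability varies by technology level, and changing page_steal_method on 5.3 may require a bosboot and reboot, so it is omitted here:

```shell
# Show the current page-replacement tunables and their values
vmo -o minperm% -o maxperm% -o maxclient% \
    -o lru_file_repage -o page_steal_method

# Approximate the AIX 6.1 defaults on a 5.3 system (-p persists
# across reboots); lru_file_repage=0 tells lrud to prefer file
# pages without consulting re-paging history
vmo -p -o minperm%=3 -o maxperm%=90 -o maxclient%=90 -o lru_file_repage=0
```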


Legacy page_steal_method=0

 Partition memory is broken up into page pools
– A page pool is a set of physical pages organized into a list
 One lrud per memory pool
 Inside each memory pool is a mix of working storage and file pages
 When the free list is depleted, lrud scans its page pool one scan bucket (default 128k pages) at a time
 The scan can be targeted for working storage pages, file pages, or either
 If scanning for file pages and the number of file pages is small (e.g. max_client=10), the ratio of scanned pages to freed pages will be high (e.g. 10:1)
 This reduces performance in two ways:
– CPU time in lrud
– Fragmentation of memory, which can result in I/O coalescing being less effective

[Diagram: system memory divided into page pools 0 and 1; each pool keeps a single mixed list of pages, and the page scan walks that list looking for either working-storage or file pages.]


List-based LRU page_steal_method=1

 Partition memory is broken up into page pools
– A page pool is a set of physical pages
– There are two lists per page pool: one for working storage pages and another for file pages
 One lrud per memory pool
 When the free list is depleted, lrud scans the appropriate list for the type of pages it desires, one scan bucket (128k pages) at a time
 If scanning for file pages and the number of file pages is small (e.g. max_client=10), the ratio of scanned pages to freed pages should be low (e.g. 2:1 – 1:1)
 This improves performance in two ways:
– CPU time in lrud is reduced due to less scanning
– I/O coalescing is better preserved for reading and writing of files larger than memory

[Diagram: a page pool with separate working-storage and file-page lists; the page scan walks only the list matching the type of page it needs.]


VMM File IO Pacing Enabled By Default

 IO Pacing Enabled By Default


– Prevents system responsiveness issues due to large quantities of
writes
– Limits the maximum number of pages of I/O outstanding to a file
• Without I/O pacing, a program can fill up large amounts of memory with written pages. Those "queued" I/Os can result in long waits for other programs using the storage
• A better solution than the file system write-behind techniques
– New defaults
• Not very aggressive; intended to limit one or a few programs from impacting system responsiveness. Values are high enough not to impact sequential write performance
• maxpout = 8193
• minpout = 4096


Performance Tunables
 Tunables now in two categories
 Restricted Tunables
– Should not be changed unless recommended by AIX development or
development support
– Are not shown by tuning commands unless the -F flag is used
– Dynamic change will show a warning message
– Permanent change must be confirmed
– Permanent changes will cause an error log entry at boot time
 Non-Restricted Tunables
– Can have restricted tunables as dependencies


Changing restricted tunables


Changing a restricted tunable dynamically

> ioo -o aio_sample_rate=6
Warning: a restricted tunable has been modified

A dynamic change of a restricted tunable will inform the user.

Changing a restricted tunable permanently

ioo -po aio_sample_rate=6


Modification to restricted tunable aio_sample_rate, confirmation yes/no

A permanent change of a restricted tunable requires a confirmation from the user.

Note: The system will log changes to restricted tunables in the system error log at
boot time.


List restricted tunables


> ioo -aF
aio_active = 0
aio_maxreqs = 65536
...
posix_aio_minservers = 3
posix_aio_server_inactivity = 300
##Restricted tunables
aio_fastpath = 1
aio_fsfastpath = 1
aio_kprocprio = 39
aio_multitidsusp = 1
aio_sample_rate = 5
aio_samples_per_cycle = 6
j2_maxUsableMaxTransfer = 512
j2_nBufferPerPagerDevice = 512
j2_nonFatalCrashesSystem = 0
j2_syncModifiedMapped = 1
j2_syncdLogSyncInterval = 1


TUNE_RESTRICTED Error Log Entry


LABEL: TUNE_RESTRICTED
IDENTIFIER: D221BD55

Date/Time: Thu May 24 15:05:48 2007


Sequence Number: 637
Machine Id: 000AB14D4C00
Node Id: quake
Class: O
Type: INFO
WPAR: Global
Resource Name: perftune

Description
RESTRICTED TUNABLES MODIFIED AT REBOOT

Probable Causes
SYSTEM TUNING

User Causes
TUNABLE PARAMETER OF TYPE RESTRICTED HAS BEEN MODIFIED

Recommended Actions
REVIEW TUNABLE LISTS IN DETAILED DATA

Detail Data
LIST OF TUNABLE COMMANDS CONTROLLING MODIFIED RESTRICTED TUNABLES AT REBOOT,
SEE FILE /etc/tunables/lastboot.log


Why, you ask?

 The number of tunables in AIX had grown to a ridiculously large


number
– 5.3 TL06: vmo 61, ioo 27, schedo 42, no 135, plus a few others
– 6.1 vmo 29, ioo 21, schedo 15, no 133, plus a few others
 The potential combinations are too numerous to test and document
effectively
 Many of the tunables had been created to deal with very specific
customers or situations which don’t apply often
 This wasn't done in a vacuum: a survey of support and recent
situations was used to identify the commonly used tunables
(which remain unrestricted)
 If a restricted tunable must be changed, a PMR should be
opened to identify the issue


General trend toward file system I/O with concurrent I/O

 Concurrent I/O (CIO) has been a feature of AIX since AIX 5.2
– http://www.ibm.com/systems/p/os/aix/whitepapers/db_perf_aix.pdf
 Concurrent I/O gives applications that do their own buffering of disk I/O
and locking a means of bypassing operating system caching and i-node
file locking
– This improves CPU efficiency of I/O to very near that of raw logical
volumes
– And improves scalability by eliminating operating system i-node
locking in the read/write paths
 Concurrent I/O is not for all applications
– Some applications require operating system i-node locking to
function correctly
– Other applications do not do sophisticated storage buffering and
benefit from caching in the operating system or the read-ahead/write-
behind mechanisms that the AIX virtual memory management
subsystem provides to improve sequential file performance


CIO and Applications

 DB2 Version 9.5 implements CIO as the DEFAULT mechanism for table spaces
on AIX
– NO FILE SYSTEM CACHING/FILE SYSTEM CACHING clauses on CREATE
TABLESPACE or ALTER TABLESPACE
– View caching DB2 GET SNAPSHOT FOR TABLES ON db
– DB2 has supported CIO since V8.1
– http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0408lee/

 Oracle 10g/11g have support, but it is not a default


– Requires filesystemio_options set to SETALL or DIRECTIO
– CIO is the recommended deployment solution for JFS2, however some 3rd
party tools have issues
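
For Oracle, the setting above can be applied from SQL*Plus. A hedged sketch, assuming an spfile-based 10g/11g instance; the instance must be restarted for the change to take effect:

```shell
# Assumes an spfile-based Oracle 10g/11g instance with datafiles
# on JFS2. SETALL enables both asynchronous I/O and direct
# (concurrent) I/O; restart the instance afterwards.
sqlplus / as sysdba <<'EOF'
ALTER SYSTEM SET filesystemio_options = SETALL SCOPE = SPFILE;
EOF
```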


CIO and Applications

 If you use legacy VMM tuning (e.g. AIX 5.2/5.3 defaults) and you switch
an application from non-CIO to CIO operation, you will likely need to
retune
– The amount and distribution of memory may change quite radically
– Usually, switching file usage to CIO reduces the memory required,
as the operating system no longer will be buffering file pages for
those files
– Upgrading from DB2 9.1 (non-CIO) to DB2 9.5 may require some
tuning preparation

 With AIX 6.1 default tuning, it should not be necessary to change tuning
when converting from non-CIO to CIO operation


AIX 6.1 AIO Support

 Interface Changes
– All the AIO entries in the ODM and AIO smit panels have
been removed
– The aioo command will no longer be shipped
– All the AIO tunables have current, default, minimum and
maximum values that can be viewed with ioo
 AIO kernel extension loaded at system boot
– Applications no longer fail to run because you forgot to load
the kernel extension (you may applaud here)
– No AIO servers are active until requests are present
– Extremely low impact on memory requirements with this
implementation


Improvements to AIO CIO


 AIO Fast Path for CIO enabled by default
– With the fast path, the AIO server threads no longer participate in the I/O path
– By removing the AIO servers from the path, we get three things
• The removal of AIO servers as a potential resource bottleneck
• The reduction in path length for AIO read/write services, as less dispatching is required
• Potentially better coalescing of sequential I/O requests initiated through AIO or LISTIO services
 Fast Path has been enabled for LVs and PVs for a long time
– No change in behavior for environments such as Oracle 10G/ASM on raw hdisks

[Diagram: without the fast path, filesystem AIO requests flow Application → AIO Server → File System → LVM → Device Driver; with the CIO fast path, Application → File System → LVM → Device Driver.]


General improvements to AIO


 The number of AIO servers varies between minservers and maxservers (times
#CPUs), based on workload
– AIO servers stay active as long as they service requests
– The number of AIO servers is dynamically increased/reduced based on the
demand of the workload
– aio_server_inactivity defines after how many seconds of idle time an AIO
server will exit
– Do not confuse "no active servers" with "kernel extension not loaded". The
kernel extension is always loaded
 Changes to AIO tunables are dynamic through ioo
– Changes do not require a system reboot
– minservers is changed to a per-CPU tunable
– maxservers is changed to 30
– maxreqs is changed to 65536
 Benefit
– No longer necessary to tune minservers/maxservers/maxreqs as in the
past
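
The dynamic behavior above can be exercised directly with ioo. A sketch with illustrative values, not recommendations:

```shell
# List the AIO tunables with their current/default/min/max values
ioo -L aio_minservers -L aio_maxservers -L aio_maxreqs

# Raise the per-CPU maxservers ceiling dynamically -- no reboot needed
ioo -o aio_maxservers=60

# Count the aioserver kprocs currently alive; with no AIO load this
# can legitimately be zero even though the extension is loaded
pstat -a | grep -c aioserver
```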


AIO Tunables

> ioo -a
aio_active = 0
aio_maxreqs = 65536
aio_maxservers = 30
aio_minservers = 3
aio_server_inactivity = 300
posix_aio_active = 0
posix_aio_maxreqs = 65536
posix_aio_maxservers = 30
posix_aio_minservers = 3
posix_aio_server_inactivity = 300


AIO Restricted Tunables


> ioo -aF
...
##Restricted tunables
aio_fastpath = 1
aio_fsfastpath = 1
aio_kprocprio = 39
aio_multitidsusp = 1
aio_sample_rate = 5
aio_samples_per_cycle = 6
posix_aio_fastpath = 1
posix_aio_fsfastpath = 1
posix_aio_kprocprio = 39
posix_aio_sample_rate = 5
posix_aio_samples_per_cycle = 6


CIO Read Mode Flag


 Allows an application to open a file for CIO such that subsequent opens
without CIO avoid demotion
– In the past, a 2nd opening of a file without CIO would cause
"demotion", which removes many of the benefits of CIO
– The 2nd read-only opening without CIO will still result in that opening
having uncached reads to the file. Thus, such programs should
ensure that their I/O sizes are large enough to achieve I/O efficiency
 Example: a backup application can access database files in read-only
mode while the database has the file opened in concurrent I/O mode
 open() flag is O_CIOR
 procfiles does not reflect O_CIO/O_CIOR currently
– kdb 'u <slotnumber>', then for each file listed, 'file
<filepointer>' gives some info


NFS Performance Improvements

 RFC 1323 enabled by default


– Allows for TCP window scaling beyond 64K, so more one-way packets
can be in flight between acks for large sequential transfers. We had
the nfs_rfc1323 tunable before; it just wasn't enabled by default.
 Increase default number of biod daemons
– 32 biod daemons per NFS V3 mount point
– Very slight increase in memory (<2MB) required over previous default
of 4
– Enables more I/Os to be outstanding at the same time; doesn't speed
sequential operations much, but helps random access (e.g. OLTP)
 Default read/write size increased to 64k for TCP connections
– Was 32k previously
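
On an AIX 5.3 client, the 6.1 defaults described above can be requested explicitly at mount time. A sketch with placeholder server and mount-point names:

```shell
# 64k read/write sizes, TCP, and 32 biods for this NFS V3 mount
# (nfsserver:/export and /mnt/data are placeholders)
mount -o vers=3,proto=tcp,rsize=65536,wsize=65536,biods=32 \
      nfsserver:/export /mnt/data

# Window scaling must be on at both the TCP and NFS layers
no -o rfc1323=1
nfso -o nfs_rfc1323=1
```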


NFS biod changes


 Having more biods allows better read-ahead and write-behind
 However, measured on a single-process basis, there aren't huge
performance differences over the AIX 5.3 defaults
 Results should improve in tests with multiple processes/threads
operating over NFS
 NFS client tests: p5 520 on 1 Gb Ethernet with 64kB I/Os
(next slide)


NFS biod changes


[Chart: NFS single-process throughput over a 256MB file, 32 biods vs 4 biods, across cached, uncached, sequential, and random read and write workloads on both client and server; y-axis 0–120,000 (labeled MB/second).]

NFS biod change with Kerberos krbp5

 The increase in biods has a much more positive impact when using
Kerberos DES security
 Overlapping more compute with network traffic through more biods
greatly improves throughput
 Same model as the previous chart, krbp5 (full packet encryption)
mount option

[Chart: NFS biod changes with Kerberos — throughput for 32 biods vs 4 biods across the same read/write workloads; y-axis 0–70,000 (labeled MB/sec).]


Enhanced JFS “nolog” option

 JFS2 standard metadata logging for filesystem integrity can be disabled
via a mount option
– Similar to the "legacy" JFS "nointegrity" option
 Meant to enable faster migration of data to new storage
– File system operation with heavy file create/delete activity can
create log bottlenecks
– Potentially useful for temporary file systems where the
filesystem can be easily recreated or fsck'ed
 mount -o log=NULL during the data migration phase, then unmount
and mount with standard logging
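
The migration flow above might look like this. A sketch with placeholder paths — remember that a crash while mounted log=NULL leaves the filesystem needing fsck or recreation:

```shell
# Mount without metadata logging for the bulk copy
mount -o log=NULL /migrate

# Heavy create/copy activity runs without a log bottleneck
# (/olddata and /migrate are placeholders)
cp -hRp /olddata/. /migrate/

# Switch back to standard logging once migration is done
umount /migrate
mount /migrate    # /etc/filesystems stanza restores the default log
```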


Enhanced JFS “nolog” option - example

 4-way POWER5 p550, PHP test "Wikibench"
 Test makes heavy use of file meta-data
 With a single disk setup, bottleneck on disk writes to the
Enhanced JFS2 log
 With "nolog", the log bottleneck is avoided

[Charts: PHP Wikibench throughput, default log vs nolog; and %disk busy over time, with the default-log disk near 100% busy while nolog stays much lower.]


Multiple Page Size Segment (MPSS) Support


 POWER6 provides hardware support for mixing 4kB pages and 64kB
pages in the same hardware segment
 This allows the AIX operating system, transparently to an application,
to promote small pages to medium pages
– This typically improves performance by reducing stress on hardware
translation mechanisms
– It is controlled with the vmo vmm_default_pspa parameter (-1 turns
it off)
 This behavior is enabled as a default on AIX 6.1 on POWER6 hardware
– Since it is not supported on POWER5, systems running identical
application conditions on POWER5 and POWER6 may differ in
exact memory page usage
– In general, no increase in memory consumption should be noticed;
however, the usage of 64kB pages may increase on POWER6
– System paging activity may result in 64kB pages being broken into
4kB pages
– 64kB pages that are broken by paging won't usually be reconstituted
into 64kB pages later
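
The vmo control mentioned above can be checked and disabled as follows. A sketch — values of vmm_default_pspa other than -1 tune the promotion aggressiveness, so consult support before changing it:

```shell
# Show the current MPSS promotion setting
vmo -o vmm_default_pspa

# Turn transparent 4kB -> 64kB page promotion off (-p to persist)
vmo -o vmm_default_pspa=-1

# Inspect per-process 4kB (s) and 64kB (m) page usage
svmon -P <pid>
```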

MPSS – Using svmon to see MPSS segments


svmon -P 553068
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
553068 java 44652 8388 37623 73342 N Y N
PageSize Inuse Pin Pgsp Virtual
s 4 KB 1132 244 4055 4798
m 64 KB 2720 509 2098 4284

Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual


51b10 3 work working storage m 1879 0 1946 3068
0 0 work kernel segment m 520 507 47 561
3c02d d work text or shared-lib code seg m 297 0 85 612
d3a7 e work shared memory segment sm 582 0 3744 4096
61adc - work s 549 244 311 702
65add f work working storage m 20 0 17 36
51ad0 2 work process private m 3 2 2 5
75ad9 1 work code m 1 0 1 2


MPSS – Using svmon to detail MPSS segments


svmon -D d3a7
Segid: d3a7
Type: working
PSize: sm (4 KB - 64 KB)
Address Range: 0..4095
Size of page space allocation: 3744 pages ( 14.6 MB)
Virtual: 4096 frames (16.0 MB)
Inuse: 582 frames ( 2.3 MB)
Page Psize Frame Pin ExtSegid ExtPage
0 m 442176 Y - -
1 m 442177 Y - -
2 m 442178 Y - -
382 s 362140 N - -
435 s 430534 N - -


Implementation Considerations

AIX 5.2/3 to AIX 6.1 migration example (DB2 performance tuning)

 AIX 5.2/5.3
– VMM page replacement tuning
• reduce minperm, maxperm, maxclient
• turn off strict_maxclient
• increase minfree, maxfree
– AIO tuning
• Enable AIO
• Tune minservers, maxservers and reboot
– DB2 tuning
• Enable CIO

 AIX 6.1
– VMM page replacement tuning
• NO TUNING REQUIRED
– AIO tuning
• NO TUNING REQUIRED
– DB2 tuning
• Enable CIO

Implementation Considerations (Cont’d)

Best Practices
 Do not apply legacy tuning, since some tunables may now be restricted
 If you do an upgrade install, your old tunings will be preserved
– You may wish to undo them, but we won't make you
– This level of tuning has been applied to numerous AIX 5.3
customers through field support
– We are confident this was a good thing
– However, we try never to change defaults in the service stream, so
AIX 5.3 remains as it was
 Change restricted tunables only if recommended by AIX support


Implementation Considerations (Cont’d)

Problem Determination
 Common problems seen in the field or lab
– Legacy VMM tuning results in error log entries
(TUNE_RESTRICTED)
– Tuning scripts fail due to the required confirmation for permanent
changes of restricted tunables
– Install/tuning scripts fail due to the missing aio0 device
 Diagnostics
– Check AIX errpt for TUNE_RESTRICTED
– Check /etc/tunables/lastboot.log
– PERFPMR


Trademarks
The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.

Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark nor does it mean that the product is not
actively marketed or is not significant within its relevant market.
Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States.

For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml:

*, AS/400®, e business(logo)®, DBE, ESCO, eServer, FICON, IBM®, IBM (logo)®, iSeries®, MVS, OS/390®, pSeries®, RS/6000®, S/30, VM/ESA®, VSE/ESA,
WebSphere®, xSeries®, z/OS®, zSeries®, z/VM®, System i, System i5, System p, System p5, System x, System z, System z9®, BladeCenter®

The following are trademarks or registered trademarks of other companies.

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
* All other products may be trademarks or registered trademarks of their respective companies.

Notes:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will
experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed.
Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual
environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without
notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance,
compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.

