Front cover
Instructor Guide
ERC 3.1
Trademarks
IBM is a registered trademark of International Business Machines Corporation.
The following are trademarks of International Business Machines Corporation in the United
States, or other countries, or both:
Active Memory, AIX 5L, AIX 6, AIX, BladeCenter, DB2, EnergyScale, Express, HACMP,
IBM Systems Director Active Energy Manager, i5/OS, Micro-Partitioning, Power Architecture,
POWER Hypervisor, Power Systems, POWER, PowerVM, POWER4, POWER5, POWER5+,
POWER6+, POWER6, POWER7 Systems, POWER7, pSeries, Redbooks, System i, System p,
System p5, System z, Systems Director VMControl, Tivoli, Workload Partitions Manager,
z/VM, 400
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or
both.
Windows and Windows NT are trademarks of Microsoft Corporation in the United States,
other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of
Oracle and/or its affiliates.
Other product and service names might be trademarks of IBM or other companies.
Contents
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Agenda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
Unit 3. Dedicated shared capacity and multiple shared processor pools . . . . . . . . 3-1
Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2
Topic 1: Dedicated shared processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4
Dedicated processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
Shared dedicated processors: Donating mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8
POWER virtualization enhancement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-10
Dedicated processors: Enabling donating mode . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
Viewing the sharing mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15
Working with sharing/donor mode from CLI (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . 3-17
Working with sharing/donor mode from CLI (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . 3-19
Viewing donating mode in AIX tools (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-21
Viewing donating mode in AIX tools (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-23
Viewing donating mode: HMC utilization data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-26
Dedicated processors donating scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-28
Dedicated idle cycles donation: New metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-31
Processor folding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-34
Processor folding: Maximizing the idle capacity (1 of 2) . . . . . . . . . . . . . . . . . . . . 3-37
Processor folding: Maximizing the idle capacity (2 of 2) . . . . . . . . . . . . . . . . . . . . 3-39
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-41
Topic 1: Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-43
Topic 2: Multiple shared processor pools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-45
What are multiple shared processor pools? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-47
Multiple shared processor pools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-49
Multiple shared processor pool: Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-51
CPU consumption for uncapped partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-53
CPU usage in a user-defined shared processor pool . . . . . . . . . . . . . . . . . . . . . . 3-55
Virtual shared processor pools: Resolution level . . . . . . . . . . . . . . . . . . . . . . . . . 3-57
Hardware and software requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-59
Configuring multiple shared pools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-61
Managed system properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-63
Change attributes of shared processor pools (1 of 3) . . . . . . . . . . . . . . . . . . . . . . 3-65
Change attributes of shared processor pools (2 of 3) . . . . . . . . . . . . . . . . . . . . . . 3-67
Change attributes of shared processor pools (3 of 3) . . . . . . . . . . . . . . . . . . . . . . 3-69
Changing the LPAR shared pool assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-71
Viewing shared pools in AIX tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-73
Viewing shared pools from HMC CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-75
Monitoring shared pools: AIX tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-77
Monitoring shared pools: HMC utilization data . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-79
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-81
Troubleshooting (4 of 5) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-101
Troubleshooting (5 of 5) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-103
Dual VIO Server: Network topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-105
Dual VIO Server: Virtual SCSI with dual HMC consideration . . . . . . . . . . . . . . . .8-108
Partition mobility with IVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-112
Partition mobility with IVM: Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-114
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-116
Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-118
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-120
Purpose
Students in this course will learn how to implement advanced
PowerVM features, such as Active Memory Sharing, Active Memory
Expansion, shared dedicated processors, multiple shared processor
pools, N_Port Virtualization, and Remote Live Partition Mobility.
Additionally, students will learn skills to implement, measure, analyze,
and tune PowerVM virtualization features for optimal performance on
IBM System p servers. This course focuses on the features that relate
to the performance of POWER6 and POWER7 processors, AIX 6.1,
and the special monitoring, configuring, and tuning needs of logical
partitions (LPARs). This course does not cover application monitoring
and tuning.
Students will also learn AIX 6.1 performance analysis and tuning tools
that help an administrator take advantage of the Micro-Partitioning and
other virtualization features of the System p servers.
Hands-on lab exercises reinforce each lecture and give the students
practical experience.
Audience
Anyone responsible for the system administration duties of implementing and managing
virtualization features on a System p server.
The audience for this training includes the following:
AIX technical support individuals
System administrators
Systems engineers
System architects
Prerequisites
The LPAR prerequisite skills can be met by attending one of the following classes, or
students can have equivalent LPAR skills:
AN11 Power Systems for AIX I: LPAR Planning and Configuration
AN30 PowerVM Virtualization II: Dual VIO Servers and IVE
Objectives
After completing this course, the students should be able to:
Describe the effect of the POWER6 virtualization features on
performance and monitoring, such as:
- Simultaneous multithreading (SMT), Micro-Partitioning, multiple
shared processor pools (MSPP), shared dedicated capacity,
Active Memory Sharing (AMS), Active Memory Expansion
(AME), and other virtualization features
Interpret the outputs of AIX 6.1 performance monitoring and tuning
tools used to view the impact of SMT, Micro-Partitioning, additional
shared processor pool activations, and device virtualization; these
tools include the following:
- vmstat, iostat, sar, topas, trace, curt, mpstat, lparstat, smtctl
List various sources of information and support related to AIX 6.1
performance tools, system sizing, system tuning, and AIX 6.1
enhancements and new features
Perform a Live Partition Mobility operation between two different
POWER6/POWER7 servers.
Describe the new features available with the Virtual I/O Server
Version 2.1 and Version 2.2, such as:
- N_port ID Virtualization, heterogeneous multithreading, virtual
tape devices, Active Memory Sharing
Describe and implement the Active Memory Sharing feature
Describe the Active Memory Expansion feature
Curriculum relationship
This course assumes that students have taken the prerequisite AIX
and virtualization training. This course is the third of the available
PowerVM virtualization courses.
Agenda
Day 1
(00:30) Welcome
(01:00) Unit 1: PowerVM features review
(00:45) Exercise 1: Introduction to the lab environment
(02:00) Unit 2: Processor virtualization tuning
(02:00) Exercise 2: Processor virtualization tuning
Day 2
(01:30) Unit 3: Dedicated shared capacity and multiple shared
processor pools
(01:30) Exercise 3: Configuring multiple shared processor pools
(01:30) Unit 4: Active Memory Sharing
(02:00) Exercise 4: Configuring Active Memory Sharing
Day 3
(02:00) Unit 5: Active Memory Expansion: Overview
(00:35) Exercise 5: Active Memory Expansion
(01:00) Unit 6: N_Port ID Virtualization
(01:30) Exercise 6: Virtual Fibre Channel adapter configuration
(optional)
(02:00) Unit 7: I/O device virtualization performance and tuning
Day 4
(02:00) Unit 7: I/O device virtualization performance and tuning
(continued)
(01:00) Exercise 7: I/O device virtualization performance and tuning
(01:30) Unit 8: Partition mobility
Day 5
(01:00) Exercise 8: Implementing Live Partition Mobility
(01:30) Unit 9: PowerVM advanced systems maintenance
(01:00) Exercise 9: PowerVM system maintenance
(01:00) Unit 10: Virtualization management tools
(00:30) Wrap up/Evaluations
Estimated time
01:00
References
Redbooks and Redpapers related to PowerVM, which you can
download at http://www.redbooks.ibm.com:
SG24-7940 PowerVM Virtualization on IBM System p: Introduction
and Configuration. (Fourth Edition Redbook)
SG24-7590 PowerVM Virtualization on IBM System p: Managing and
Monitoring, SG24-7590-00
REDP-4194 IBM System p Advanced POWER Virtualization
(PowerVM) Best Practices, REDP-4194-00
REDP-4638-00 IBM Power 750 and 755 Technical Overview and
Introduction
Unit objectives
After completing this unit, you should be able to:
Notes:
The objectives list what you should be able to do at the end of this unit.
Notes:
Dedicated shared processors: This function provides the ability for partitions that normally
run as dedicated processor partitions to contribute unused processor capacity to the
shared processor pool. This support allows unneeded capacity to be donated to
uncapped micro-partitions instead of being wasted as idle cycles in the dedicated
partition (a command-line sketch for enabling this mode follows at the end of these notes).
Integrated Virtual Ethernet (IVE): Also called Host Ethernet Adapter (HEA), this is a
feature that provides network connectivity to the partitions. It is available on certain
types of POWER6 and POWER7-based systems and allows partition communication. It
uses a physical Integrated Virtual Ethernet adapter and it must not be considered as a
virtual feature.
Simultaneous Multithreading (SMT): On-chip hardware threads to improve resource
usage.
Virtual LAN: Provides network virtualization capabilities. It is purely firmware-based,
using the POWER Hypervisor, and does not require the purchase of the PowerVM
edition. Multiple virtual switches can be defined with POWER6 and POWER7
processor-based systems.
Virtual I/O: Allows the sharing of I/O adapters and devices between partitions.
Integrated Virtualization Manager (IVM) provides the virtualization capabilities for the
management of a server and LPARs without using a Hardware Management Console
(HMC).
Capacity on Demand (CoD): Allows system resources, such as processors and
memory, to be activated as needed. Utility CoD is new with POWER6 technology and
automates the usage of CoD processors.
The IBM PowerVM Lx86 feature of PowerVM editions is designed to allow you to run
most x86 Linux applications. This allows consolidation of AIX and Linux on POWER
and x86 Linux applications on the same server.
The PowerVM Enterprise edition includes the Partition Mobility virtualization feature that
allows migrating a virtualized logical partition from a source system to a target. This
feature supports Live Partition Migration without partition shutdown.
The PowerVM Enterprise edition also includes the Active Memory Sharing feature, which
allows shared memory partitions to share a common shared memory pool.
The Active Memory Expansion feature requires POWER7 processor-based systems. The
purpose of AME is to reduce the memory footprint used by an LPAR by compressing
its memory. The operating system running in the LPAR compresses data in memory to
effectively expand the size of memory by allowing more data to be packed into it.
Active Memory Expansion is configurable on a per-LPAR basis.
Most of the hardware virtualization features listed above are managed using the HMC
V7 Web user interface. IVM allows management of some IBM Power Systems, and
blades without using an HMC. IVM does not support all of the HMC functions and has
limited capabilities.
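As an illustration of the dedicated donating setup described above, the sharing mode of a
dedicated-processor partition is controlled through its profile and can be changed from the
HMC command line. This is a hedged sketch: the managed system name (sys1), partition
name (lpar1), and profile name (normal) are placeholders, and the exact sharing_mode values
available depend on the HMC and firmware level.
# lssyscfg -r prof -m sys1 -F lpar_name,name,sharing_mode
# chsyscfg -r prof -m sys1 -i "name=normal,lpar_name=lpar1,sharing_mode=share_idle_procs"
Profile changes typically take effect the next time the partition is activated with that profile.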
The Workload Partition (WPAR) feature, not listed on the previous slide, provides a
software solution for creating virtualized operating environments to use when managing
multiple workloads. WPAR is a purely software partitioning solution that is provided by
the operating system. It has no dependencies on hardware features. It is strictly an AIX
6.1 feature.
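For readers who have not seen a WPAR, here is a minimal, hedged sketch of creating and
using one with the standard AIX 6.1 WPAR commands; the WPAR name (testwpar) is just an
example.
# mkwpar -n testwpar       (create a system WPAR named testwpar)
# startwpar testwpar       (start the WPAR)
# lswpar                   (list WPARs and their states)
# clogin testwpar          (log in to the WPAR from the global environment)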
minimization of the number of I/O adapters required. File-backed devices are available with
VIOS version 1.5.
Multiple operating system support
The IBM Power Systems support IBM AIX Version 6.1, IBM AIX Version 5.3, i5/OS, and
Linux distributions from SUSE and Red Hat.
Integrated Virtualization Manager
The Integrated Virtualization Manager is a hardware management solution that inherits the
most basic of Hardware Management Console (HMC) features and removes the
requirement of an external HMC. It is limited to managing a single IBM Power System
server. Integrated Virtualization Manager (IVM) now runs on the Virtual I/O Server Version
1.5.
Additional information Active Memory Expansion is a separately priced feature from the
PowerVM Editions. Active Memory Expansion is detailed in a separate unit.
Transition statement Let's discuss the different PowerVM editions and the supported
features.
Lx86
Notes:
Notes:
The POWER6 processor is designed to provide improved performance. It uses the latest
65nm advanced technology with 10 levels of metal (low-k dielectric on the first eight levels).
The distributed switch architecture (from the previous POWER5 generation) is enhanced for
SMP scaling and core parallelism. The memory path and caches have been modified
to increase memory bandwidth and reduce memory latencies.
The L1 data cache is 64KB instead of 32KB, and cache access bandwidth is increased (double
the number of paths compared to POWER5). L2 cache capacity is increased to 2x4MB.
A high-speed elastic bus interface is implemented, which allows all interface buses to the
POWER6 to operate at a range of higher frequencies. The bus speed scales with the
speed of the processor.
The following features have been implemented on the POWER6 core:
Advanced memory subsystem
Decimal floating point execution unit
VMX (AltiVec) execution unit
Instructor notes:
Purpose Explain that POWER6 system technology is a full redesign based on a new:
Processor
System architecture
PHYP microcode and new HMC
Virtualization component set
AIX version
Explain that the POWER processor is based on a strong roadmap, and the POWER6
design benefits from past experience. POWER4 and AIX 5 brought LPAR and DLPAR
for server consolidation. POWER5 and AIX 5.3 brought virtualization for resource
optimization. POWER6 and AIX 6 bring enhancements to all previous features, with an
emphasis on RAS.
All of the generations of POWER systems bring advancements in computing performance.
POWER6/AIX 6 is a UNIX mainframe class of systems.
Details
Additional information IBM POWER systems scale significantly better than any other
systems. Part of the secret is the implementation of the storage hierarchy, including
management of the L2 and connections to it from other parts of the system.
Transition statement The next slide is just an introduction to the POWER6 processor and
its characteristics.
Notes:
The POWER7 processor uses 45 nm technology. The basic processor has 8 cores, with
4-core and 6-core options. POWER7 provides significantly better performance per chip
because there are many more cores per chip than on POWER6.
The L3 cache for an 8-core chip is 32 MB, just as it was for POWER6, but POWER7
implements the L3 cache on-chip. So we still have 32 MB of cache, but now it is on the chip
and uses new embedded DRAM (eDRAM) technology. With this eDRAM technology, you get
significant bandwidth and latency improvements, up to six times faster access to data in this
cache versus an external cache.
Each core has its own level 2 cache. The level 2 cache was 4 MB on POWER6 and is now
256 KB, but it is about three times faster to access. In fact, the L2 cache now behaves more
like an L1 cache did before.
The POWER7 processor has the same L3 cache size as POWER6, but the cache is much
faster to access and has much higher bandwidth. Access to the L2 cache is also much faster
than it was on POWER6.
The POWER7 processor has dual DDR3 memory controllers with a new buffer chip to connect
to the memory, so that data can be moved faster and more efficiently. The memory bandwidth
doubled from POWER6 to POWER7.
The POWER7 processor has 12 execution units compared to 9 on POWER6, and instead of
running two threads per core, POWER7 implements four threads per core.
AIX 6.1 and AIX 7 support POWER7. AIX 6.1 supports 64 cores and allows the use of four-way
multithreading for a maximum of 256 threads. AIX 7 will support 256 cores with up to 1024
threads.
Notes:
Shared processors
Shared processors are physical processors that are allocated to partitions on a timeslice
basis. Any physical processor in the shared processor pool can be used to meet the
execution needs of any partition using the shared processor pool.
A POWER system can contain a mix of shared and dedicated partitions. A partition must
be either all shared or all dedicated, and you cannot use dynamic LPAR commands to
change between the two. You need to bring down the partition and switch it from using
dedicated to shared, or vice versa.
Processing units
When a partition is configured, you assign it an amount of processing units. A partition
must have a minimum of one tenth of a processor, and after that requirement has been
met, you can configure processing units at the granularity of one hundredth of a processor.
Virtual processors
The virtual processor setting defines the way that a partition's entitlement can be spread
concurrently over physical processors. That is, you can think of the processing power
available to the operating system on the partition as being spread equally across these
virtual processors. The number of virtual processors is what the operating system thinks it
has for physical processors. The Hypervisor dispatches virtual processors onto physical
processors.
The example in the visual above shows four physical processors in the shared pool, and
each partition thinks it has three processors. The number of virtual processors can be
independently configured for each shared partition. The number of virtual processors can
be changed dynamically.
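From inside an AIX partition, the entitlement and virtual processor configuration can be
checked with lparstat; this is a hedged pointer rather than a full procedure, and the exact field
names can vary slightly by AIX level.
# lparstat -i     (look for the Entitled Capacity, Online Virtual CPUs, Maximum Virtual CPUs, and Mode fields)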
Instructor notes:
Purpose Review shared processors.
Details
Additional information
Transition statement Let's look at how simultaneous multithreading and virtual
processors work together.
Simultaneous multithreading and Micro-Partitioning
Simultaneous multithreading can be used with micro-partitions.
With simultaneous multithreading, each virtual processor runs two threads (POWER6) or four
threads (POWER7). Each thread is called a logical processor.
LPAR1 example (POWER7): 1.6 processing units, two virtual processors, simultaneous
multithreading enabled, eight logical processors.
Diagram: LPAR1 on POWER7, showing the logical processors mapped onto the virtual
processors.
Notes:
PURR
The Processor Utilization Resource Register (PURR) value, covered in the processor
virtualization unit of this course, is used to accumulate information only when the virtual
processor is dispatched on a physical processor. So PURR is utilized even if simultaneous
multithreading is disabled, because it provides accurate processor utilization statistics in a
shared processor environment.
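Because these statistics are PURR-based, per-logical-processor utilization is most easily seen
with mpstat, and partition-level physical consumption with lparstat; a hedged sketch (the
interval and count values are examples):
# mpstat -s 2 5    (PURR-based SMT utilization per virtual and logical processor)
# lparstat 2 5     (partition-level statistics, including physc and %entc)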
Instructor notes:
Purpose Provide an overview of how logical processors and virtual processors work
together.
Details
Additional information Additional information is given later in this course.
Transition statement
Notes:
processors and online virtual processors that are visible to the user or applications does
not change. The middleware or the applications running on the system are not affected,
because the active and inactive virtual processors are internal to the system.
Dedicated processor
Notes:
Up to 64 shared processor pools (shared processor pool0 through shared processor pooln),
each containing its own set of micro-partitions (LPAR1 through LPAR8 in the diagram), run on
the shared physical processors; dedicated physical processors remain outside the pools.
Notes:
Instructor notes:
Purpose Give an overview of MSPPs.
Details This feature is detailed later in another unit.
Additional information
Transition statement Let's remind the students about virtual Ethernet adapters.
Notes:
Introduction
Virtual Ethernet enables inter-partition communication without the need for physical
network adapters assigned to each partition. It can be used in both shared and dedicated
POWER processor partitions provided the partition is running AIX 5.3, AIX 6.1 or Linux with
the 2.6 kernel or a kernel that supports virtualization. This technology enables IP-based
communication between logical partitions on the same system using a Virtual Local Area
Network (VLAN)-capable software switch (POWER Hypervisor) in POWER systems.
Because the number of partitions possible on many systems is greater than the number of
I/O slots, virtual Ethernet is a convenient and cost-saving option to enable partitions within
a single system to communicate with one another through a virtual Ethernet LAN.
The virtual Ethernet interfaces can be configured with both IPv4 and IPv6 protocols.
Notes:
Client/server relationship
Virtual I/O devices provide for sharing of physical resources, such as adapters and
devices, among partitions. Multiple partitions can share physical I/O resources and each
partition can simultaneously use virtual and physical I/O devices. When sharing adapters,
the client/server model is used to designate partitions as users or suppliers of adapters. A
server must make its physical adapter available and a client must configure the virtual
adapter.
If a server partition providing I/O for a client partition fails, the client partition might continue
to function, or it might fail, depending on the significance of the hardware it is using. For
example, if the server is providing the paging volume for another partition, a failure of the
server partition would be significant to the client.
Virtual Ethernet
Virtual Ethernet provides a network connection between partitions on the same managed
server. The Hypervisor provides the inter-partition virtual switch, which supports connecting
up to 4,096 VLANs. All partitions using a particular virtual LAN ID can
communicate with each other. The Virtual I/O Server software is not required to use virtual
Ethernet.
Ethernet adapter failover functionality has been supported. The Virtual I/O Server software
is required to configure shared Ethernet adapters.
Notes:
Shared Ethernet adapter (SEA) technology (part of the Virtual I/O Server feature on
POWER hardware) enables the logical partitions to communicate with other systems
outside the managed system.
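To make the bridging concrete, here is a hedged sketch of how an SEA is typically created on
the Virtual I/O Server from the padmin command line; the device names (ent0 as the physical
adapter, ent2 as the virtual trunk adapter) and the default PVID of 1 are examples that will
differ on your system.
$ mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1
$ lsmap -all -net        (verify the mapping between the SEA and the virtual adapter)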
Bridge
Because the shared Ethernet adapter processes packets at Layer 2, the original MAC address
and VLAN tags of the packet are visible to other systems on the physical network.
MTU issues
The virtual Ethernet adapters can transmit packets with a size up to 65408 bytes.
Therefore, the maximum MTU for the corresponding interface can be up to 65394 (65390
with VLAN tagging). Since the shared Ethernet adapter can only forward packets of a size
up to the MTU of the physical Ethernet adapters, a lower MTU or PMTU discovery should
be used when the network is being extended using the shared Ethernet.
Most packets, including broadcast (for example, ARP) or multicast (for example, IPv6
Neighbor Discovery Protocol [NDP]) packets, that pass through the shared Ethernet setup are
not modified. These packets retain their original MAC header and VLAN tag information. When
the MTUs of the physical and virtual sides do not match, the shared Ethernet adapter can
receive packets that cannot be forwarded because of MTU limitations. This situation is
handled by processing the packets at the IP layer, either by doing IP fragmentation or by
reflecting ICMP errors (packet too big) to the source, based on the IP flags in the packet. In
the case of IPv6, ICMP errors are sent back to the source, because IPv6 allows fragmentation
only at the source host. These ICMP errors help the source host discover the PMTU and
therefore handle future packets appropriately.
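Where a larger MTU is wanted end to end, the AIX interface MTU can be changed with chdev;
a hedged example (the interface name en0 and the jumbo-frame value of 9000 are illustrative,
and every hop in the path must support the chosen size):
# chdev -l en0 -a mtu=9000
# netstat -in            (confirm the MTU now reported for the interface)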
Instructor notes:
Purpose Overview of the shared Ethernet adapter.
Details The SEA network bandwidth apportioning feature is available with the Virtual I/O
Server version 1.5.2 and later.
Additional information
Transition statement The Integrated Virtual Ethernet is an alternative to the shared
Ethernet adapter.
Diagram: Integrated Virtual Ethernet, with the LPAR operating system connecting through a
logical switch and port group on the Host Ethernet Adapter (HEA) to a physical port and an
external switch.
Notes:
IBM Power Systems have an Integrated Virtual Ethernet (IVE). IBM Power 570 and Power
Systems 770 can have one IVE per system drawer. All operating systems supported on
IBM Power Systems support the use of IVE ports. The IVE allows multiple partitions to
share a single integrated Ethernet adapter to connect to an external network without a
Virtual I/O Server and without routing through another partition.
Hardware configuration options, such as speed, are set at the physical port level using the
HMC. The administrator chooses which logical ports to allocate to partitions and which
physical port to use for the logical ports.
There is one HEA in each IVE adapter. In the operating system, the HEA is represented
logically as an lhea device. If a partition uses two logical ports from the same HEA, they
must use different physical ports. In this case, there will be one lhea parent device and two
ent# devices.
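From the AIX partition, the lhea parent device and its logical port devices can be listed and
monitored with the usual device commands; a hedged sketch (the device names are
examples):
# lsdev -C | grep -i hea    (shows the lhea parent device and related adapters)
# entstat -d ent0           (detailed statistics for a logical HEA port)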
POWER Hypervisor
Notes:
N_Port ID Virtualization
Simplify management of the Fibre Channel SAN environment with port virtualization.
Fibre Channel industry-standard method for using virtualization to map multiple N_Port IDs
to a physical Fibre Channel port.
Allows LPARs to have dedicated N_Port IDs (just as with a dedicated physical HBA).
Diagram: a VIO client with virtual Fibre Channel adapters and LUNs (EMC 5000, IBM 2105),
the VIOS acting as a pass-through module, PCIe 8 Gbit Fibre Channel adapters, and the
NPIV-enabled SAN.
Notes:
Virtual tape
Simplify backup and restore operations with virtual tape.
Only SAS tape drives are supported.
SAN Fibre Channel tape drives are supported through N_Port ID Virtualization (NPIV).
Diagram: a VIO client accessing a SAS tape drive and a SAN-attached tape library (drive and
robotics).
Notes:
PowerVM has two virtualization methods for using tape devices on IBM Power Systems,
simplifying backup and restore operations. Both methods are supported with PowerVM
Express, Standard, or Enterprise Edition.
NPIV enables PowerVM LPARs to access SAN tape libraries using shared physical
HBA resources for AIX V5.3, AIX V6.1, and SUSE Linux Enterprise Server 11 partitions
on POWER6 processor-based servers.
Virtual tape support allows serial sharing of selected SAS tape devices for AIX V5.3,
AIX V6.1, IBM i 6.1, and SUSE Linux Enterprise Server 11 partitions.
Requirements
POWER6 and POWER7 systems
Logical partition must only have virtual adapters
Notes:
Live Partition Mobility allows for the movement of a running partition from one POWER6 or
POWER7 processor-based server to another without application downtime. This provides
better system utilization, improved application availability and energy savings. With live
partition mobility, planned application downtime due to regular server maintenance can be
a thing of the past.
As of this writing, all the resources of the moving partition must be virtualized (no dedicated
adapters). Also, you can perform a live partition mobility between two systems that are
managed by different HMCs.
Diagram: the POWER Hypervisor manages the physical memory shared among the partitions.
Notes:
The IBM PowerVM Active Memory Sharing (AMS) technology takes PowerVM virtualization
to a new level of consolidation and virtualization by optimizing memory utilization. AMS
intelligently shares memory by dynamically moving it from one partition to another on demand.
This can optimize memory utilization and allows for flexible global memory usage.
Because memory utilization can be linked to processor utilization, this function
complements shared processors very well. Systems with low CPU requirements are very
likely to have low memory residency requirements.
The Virtual I/O Server is required as the paging partition, owning the paging devices that are
used when the hypervisor pages out partition memory to satisfy demands from other
partitions.
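A quick way to confirm from inside AIX whether a partition is running with dedicated or shared
memory is lparstat -i, which on AMS-capable AIX levels reports a memory mode field; treat
this as a hedged pointer, since the exact field names depend on the AIX level.
# lparstat -i | grep -i memory    (look for the Memory Mode field, reported as Dedicated or Shared)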
Notes:
Active Memory Expansion is configurable on a per-LPAR basis. When memory expansion
is enabled for an LPAR, the operating system running in the LPAR compresses in-memory
data to effectively expand the size of memory by allowing more data to be packed into it.
Logically, the operating system maintains two pools of memory: a compressed pool and an
uncompressed pool. The sizes of the pools are controlled by the operating system. With AIX,
the sizes of the memory pools vary based on the load and the target memory expansion
factor.
Only pages in the uncompressed memory pool are directly accessible and usable. Pages
in the compressed pool must first be decompressed into the uncompressed pool in order to
be used. The OS is responsible for moving pages between the compressed and
uncompressed pools based on the workload.
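On AIX levels that support Active Memory Expansion, the amepat planning tool can model
candidate expansion factors before the feature is enabled, and lparstat can report compression
activity afterwards; a hedged sketch (the monitoring duration and intervals are examples, and
the available flags depend on the AIX level):
# amepat 5          (monitor the current workload for 5 minutes and report suggested expansion factors)
# lparstat -c 2 5   (include Active Memory Expansion statistics in the lparstat report)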
Instructor notes:
Purpose
Details
Additional information
Transition statement Let's see PowerVM Lx86.
PowerVM Lx86
Run x86 Linux applications on Power Systems servers along with your AIX
and Linux on POWER applications
Simplifies migration of Linux on x86 applications
Runs most existing 32-bit x86 Linux applications with no application changes
Enables customers to realize the energy and administrative benefits associated with
consolidation
Is included with the purchase of PowerVM editions
Diagram: x86 Linux applications are installed and run, with no porting, no recompile, and no
changes, alongside AIX and Linux on POWER applications on a Power Systems platform
running PowerVM Lx86.
Notes:
This feature enables the dynamic execution of x86 Linux instructions by mapping them to
instructions on a POWER processor-based system and caching the mapped instructions to
optimize performance. PowerVM Lx86 software is designed with features that enable users
to easily install and run a wide range of x86 Linux applications on Power Systems
platforms, with a Linux on POWER operating system.
This allows the consolidation of AIX and Linux on POWER and x86 Linux applications on
the same server.
PowerVM Lx86 supports the installation and running of most 32-bit x86 Linux applications
on any IBM System p or BladeCenter model with POWER7, POWER6, POWER5+ or
POWER5 processors, or IBM POWER Architecture technology-based IBM BladeCenter
blade servers. It creates an x86 Linux application environment running on POWER
processor-based systems by dynamically translating x86 instructions to POWER
Architecture instructions and caching them to enhance performance, as well as mapping
x86 Linux system calls to Linux on POWER system calls. No native porting or application
upgrade is required for running most x86 Linux applications.
Instructor notes:
Purpose Lx86 is available with all the PowerVM Editions.
Details PowerVM Lx86 can help developers and ISVs reduce the effort required to
support Linux by reducing or eliminating the requirement to port, tune, recompile, release
new media or documentation, or maintain a unique product offering for POWER
technology.
Additional information
Transition statement Let's introduce virtualization performance management.
Notes:
Measuring performance
Tools are used to measure performance in key areas such as:
CPU utilization
Memory utilization and paging
Disk I/O
Network I/O
Know your performance baseline over time so that performance issues can be recognized
and tuning activities can be evaluated.
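A simple baseline in each of these areas can be captured with the standard AIX tools; a
hedged sketch of typical invocations (the intervals and counts are examples):
# vmstat 5 12        (CPU and memory summary every 5 seconds)
# iostat -D 5 12     (extended disk I/O statistics)
# sar -P ALL 5 12    (per-logical-processor CPU utilization)
# netstat -v         (network adapter statistics)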
Instructor notes:
Purpose Introduce why the course title refers to performance management and not
tuning.
Details Performance management encompasses more than tuning. It includes
understanding what is normal on your system, making adjustments as necessary, and
evaluating tuning adjustments that work.
Additional information
Transition statement Let's review a methodology for analyzing the performance of a
system.
Performance methodology
Performance can be improved by using a methodical
approach.
1. Understand the factors that can affect performance.
2. Measure the current performance of the server.
3. Identify any performance bottlenecks.
4. Change the component causing the bottleneck.
5. Measure the new performance of the server to check for
improvement.
Notes:
Methodical approach
Using a methodical approach, you can improve server performance. For example:
Understanding the factors that can affect server performance for the specific server
functional requirements and for the characteristics of the particular system.
Measuring the current performance of the server.
Identifying performance bottlenecks.
Changing the component that is causing the bottleneck.
Measuring the new performance of the server to check for improvement.
Instructor notes:
Purpose Discuss the performance analysis methodology.
Details This is a typical scientific approach to performance management. Measure,
change one thing, measure again.
Additional information
Transition statement Let's look at a general flowchart used to analyze performance.
Flowchart: During normal operations, monitor system performance and check it against
requirements. If there is a performance problem, or performance does not meet the stated
goals, determine whether the system is CPU bound, memory bound, I/O bound, or network
bound, and take the corresponding actions; if none of these apply, run additional tests.
Notes:
This is a flowchart that some performance analysts use. Keep in mind that this is an
iterative process.
Instructor notes:
Purpose Use this flowchart to provide an overview of performance analysis.
Details Point out that continuous monitoring should be done, as well as responding to
customer complaints.
Additional information
Transition statement Let's look at some AIX tools that can be used for performance
monitoring.
Notes:
Enhanced commands for AIX 5.3 and AIX 6, which support the
virtualization features
The lparstat command reports logical partition CPU and memory-related information
and statistics.
The mpstat command collects and displays performance statistics for all logical
processors in the system. The mpstat command shows SMT utilization (-s), interrupt
metrics (-i), detailed software and dispatcher metrics (-d), and other information.
The smtctl command controls the enabling and disabling of the processor
simultaneous multithreading mode.
The vmstat, iostat, sar, and topas commands collect statistics for virtual
processors.
The entstat command shows Ethernet device statistics including shared Ethernet
adapter statistics.
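As a quick reference, typical invocations of these commands look like the following; this is a
hedged sketch (intervals are examples, and some flags depend on the AIX level):
# lparstat -h 2 5      (adds hypervisor call statistics to the partition report)
# mpstat -s 2 5        (SMT utilization per virtual and logical processor)
# smtctl               (displays the current simultaneous multithreading capability and mode)
# entstat -d ent0      (detailed Ethernet statistics, including SEA statistics on a Virtual I/O Server)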
Instructor notes:
Purpose List the performance analysis tools.
Details Provide an overview of the available performance tools based on the metrics
they are used for.
Point out that the lparstat, vmstat, and topas tools have been enhanced to support the
active memory sharing feature.
Additional information Other performance tools not written by IBM also exist. Some
are for purchase and others are publicly available on the Internet.
Transition statement The next slide shows AIX tuning tools.
AIX tuning tools include: bindprocessor, setpri, bindintcpu, procmon, smtctl, fdpr, chdev,
chps, mkps, rmss, migratepv, chlv, reorgvg, and ifconfig.
Notes:
smtctl is used to enable and disable simultaneous multithreading and to view the
status.
References
Information Center documents:
http://publib16.boulder.ibm.com/pseries/index.htm
Support for virtualization software:
http://www14.software.ibm.com/webapp/set2/sas/f/virtualization/home.html
IBM PowerVM Web portal:
http://www-03.ibm.com/systems/power/software/virtualization/index.html
Provides links to white papers, education resources, services, and so forth
Redbooks:
http://www.redbooks.ibm.com/
In particular:
SG24-7590 IBM PowerVM Virtualization Managing and Monitoring
Redp-4194 IBM System p PowerVM Best Practices
Redp-4470 PowerVM Virtualization Active Memory sharing
SG24-7559 IBM AIX Version 6.1 Differences Guide
SG24-6478 AIX 5L Practical Performance Tools and Tuning Guide
Notes:
This list is a starting point to obtain documentation for your system. There is documentation
for your specific system model, for the HMC, for the operating systems, and for configuring
partitions. The Information Center is the access point to the IBM documentation.
There are new Redbooks released all the time, particularly as a product matures. Check
the www.redbooks.ibm.com Web site from time to time.
Checkpoint
1. The PowerVM Enterprise Edition is required for which of the
following?
a. Shared Ethernet adapter
b. Partition mobility
c. Virtual SCSI Adapter
d. Integrated Virtual Ethernet
e. Active Memory Sharing
Notes:
Checkpoint solution
1. The PowerVM Enterprise Edition is required for which of the following?
a. Shared Ethernet adapter
b. Partition mobility
c. Virtual SCSI Adapter
d. Integrated Virtual Ethernet
e. Active Memory Sharing
The answers are partition mobility and Active Memory Sharing.
Additional information
Transition statement
Exercise
Unit exercise
Notes:
Unit summary
Having completed this unit, you should be able to:
Notes:
Estimated time
02:00
References
SG247940 - PowerVM Virtualization on IBM System p Introduction
and Configuration (Fourth Edition)
SG247590 - PowerVM Virtualization Managing and Monitoring
Redbook
White Paper: POWER5 System microarchitecture from IBM Research:
http://www.research.ibm.com/journal/rd/494/sinharoy.pdf
White Paper: POWER6 System microarchitecture from IBM Research:
http://researchweb.watson.ibm.com/journal/rd/516/le.pdf
White Paper: SPURR information and description: EnergyScale for
IBM POWER6 microprocessor-based systems:
http://researchweb.watson.ibm.com/journal/rd/516/mccreary.pdf
REDP-4638-00: IBM Power 750 and 755 Technical Overview and
Introduction Redpaper
Unit objectives
After completing this unit, you should be able to:
Describe the simultaneous multithreading concept and its effect on
performance monitoring and tuning
Describe the function of the PURR/SPURR statistics
Describe the impact of simultaneous multithreading on tools such as
vmstat, iostat, sar, and topas
Discuss guidelines for systems running simultaneous multithreading
with various workloads
Use tools to view statistics related to the monitoring and tuning of
partitions that have simultaneous multithreading enabled
Describe how the POWER Hypervisor allocates processing power from
the shared processing pool
Discuss recommendations associated with the number of virtual
processors
Describe performance considerations associated with implementing
Micro-Partitioning
Use tools to monitor the statistics on a partition running a workload with
Micro-Partitioning configured
Notes:
Instructor notes:
Purpose Review the objectives for this unit.
Details Explain what we'll cover and what the students should be able to do at the end
of the unit.
Additional information
Transition statement Here's the performance flowchart that we will be following
throughout the course, with the CPU box highlighted.
Physical layer: one physical CPU with four hardware threads (hardware thread0 through
thread3).
Notes:
Simultaneous multithreading (SMT) is the ability of a single physical processor to
concurrently dispatch instructions from more than one hardware thread. There are two
hardware threads per physical processor on POWER6 and four on POWER7, so additional
instructions can run at the same time. Because the processor can fetch instructions from
any of the threads in a given cycle, the processor is no longer limited by the
instruction-level parallelism of the individual threads.
Simultaneous multithreading also allows instructions from one thread to utilize all the
execution units if the other thread encounters a long latency event. For instance, when one
of the threads has a cache miss, the second thread can continue to execute.
The operating system supports each hardware thread as a separate logical processor. So,
the operating system configures a dedicated partition that is created with one physical
processor as a logical two-way or four-way when simultaneous multithreading is enabled.
This is independent of the partition type, so a shared partition with one virtual processor is
configured as a logical two-way. Starting in AIX 5L V5.3, simultaneous multithreading is
enabled by default.
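To see this from inside a partition, the SMT mode and the resulting logical processors can be
checked with smtctl and bindprocessor; a hedged sketch (the numbers reported depend on
the hardware and configuration):
# smtctl              (reports whether SMT is capable and enabled, and the threads per processor)
# bindprocessor -q    (lists the available logical processors)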
Instructor notes:
Purpose Describe how simultaneous multithreading works on IBM POWER systems.
Details When in simultaneous multithreading mode, instructions from either thread can
use the eight instruction pipelines in a given clock cycle. By duplicating portions of logic in
the instruction pipeline and increasing the capacity of the register rename pool, the IBM
POWER processor can execute two or four instruction streams, or threads, concurrently.
Additional information Normally, AIX maintains sibling threads at the same priority,
but boosts or lowers thread priorities in a few key places to optimize performance. AIX
lowers thread priorities when the thread is doing non-productive work, spinning in the idle
loop or on a kernel lock. When a thread is holding a critical kernel lock, AIX boosts the
thread priority. These priority adjustments do not persist into user mode. AIX does not
consider a software thread's dispatching priority when choosing its hardware thread
priority. Several scheduling enhancements were also made to exploit simultaneous
multithreading. For example, work is distributed across all primary threads before it is
dispatched to secondary threads. The reason for this enhancement is that a thread
performs best when its sibling thread is idle. AIX also considers thread affinity in idle
stealing and periodic run queue load balancing.
Transition statement When is simultaneous multithreading beneficial?
Notes:
In general, the following rules can be summarized for application performance on
simultaneous multithreading environments.
Applications found in commercial environments showed a higher simultaneous
multithreading gain than scientific applications.
Experiments on different workloads have shown varying degrees of simultaneous
multithreading gain ranging from -11% to 43%.
On average, most of the workloads showed a positive gain when running in
simultaneous multithreading mode.
Applications that showed a negative simultaneous multithreading gain can be attributed
to L2 cache thrashing and increased local latency under simultaneous multithreading.
Workloads that have a very high Cycles Per Instruction (CPI) count tend to utilize
processor and memory resources poorly and usually see the greatest simultaneous
multithreading benefit. These large CPIs are usually caused by high cache miss rates from
a very large working set. Large commercial workloads typically have this characteristic,
although it depends somewhat on whether the two hardware threads share instructions or
data or are completely distinct. Workloads that share instructions or data, which includes
those that run a lot in the operating system or within a single application, tend to have
better simultaneous multithreading benefit. Workloads with low CPI and low cache miss
rates tend to see a benefit, but a smaller one.
For high performance computing, try enabling simultaneous multithreading and monitor
performance. If the workload is data-intensive with tight loops, you might see more
contention for cache and memory which can reduce performance.
Uempty Snoozing
The process of putting an active thread into a dormant state is known as snoozing. In
dedicated processor partitions, if there are not enough tasks available to run on both
hardware threads of a processor, the operating systems idle process will be selected to run
on the idle hardware thread. It is better for the operating system to snooze the idle process
thread and switch to single-threaded mode. Doing so enables all of the processor
resources to be available to the thread doing meaningful work.
To snooze a thread, the operating system will invoke the h_cede Hypervisor call. The
thread then goes to the dormant state. A snoozed thread is woken when a decrementer,
external interrupt, or an h_prod hypervisor call is received. When other tasks become
ready to run, the processor transitions from single-threaded mode to simultaneous
multithreading mode. It does not make sense to snooze a thread as soon as the idle
condition is detected. There could be another thread in the ready-to-run state in the run
queue by the time the snooze occurs, resulting in wasted cycles due to the thread start-up
latency. It is good for performance if the operating system waits for a short period of time for work
to come in before snoozing a thread. This short idle spinning time is known as
simultaneous multithreading snooze delay. Both AIX and Linux provide snooze delay
tunables.
To view the current snooze delay value on AIX 6.1:
# schedo -o smt_snooze_delay
smt_snooze_delay = 0
The value represents the number of microseconds spent in the idle loop without useful
work before snoozing (calling h_cede). A value of -1 indicates to disable snoozing; a value
of 0 (the default) indicates to snooze immediately. The value can go as high as 100000000
(100 secs).
Certain workloads might see better performance with a larger snooze delay. To change the
delay, use schedo. For example, here's the command to change the delay to five
microseconds:
# schedo -o smt_snooze_delay=5
Setting smt_snooze_delay to 5
With POWER7, a new parameter was added: smt_tertiary_snooze_delay. It acts similarly
to smt_snooze_delay, except that it applies to the third and fourth SMT threads, while
smt_snooze_delay applies to the first and second threads.
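As a minimal sketch (assuming an AIX level and POWER7 hardware that expose the tunable; the displayed value is illustrative), the tertiary delay can be viewed and changed with schedo in the same way:
# schedo -o smt_tertiary_snooze_delay
smt_tertiary_snooze_delay = 0
# schedo -o smt_tertiary_snooze_delay=100
Setting smt_tertiary_snooze_delay to 100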
Instructor notes:
Purpose Explain when simultaneous multithreading is beneficial or not.
Details Review types of workloads and whether simultaneous multithreading is
beneficial or not. The bottom line is that it is difficult to predict ahead of time whether it
would be beneficial or not for any given workload. The best option is to monitor
performance with it off, then monitor performance with it on, and see if there's a difference.
Additional information In AIX 5.3, the schedo command had a rounding error that set
the number of microseconds to one less than you specify. It used to cause the following:
# schedo -o smt_snooze_delay=10
Setting smt_snooze_delay to 10
# schedo -o smt_snooze_delay
smt_snooze_delay = 9
However, this has been fixed by APAR IY85228.
Transition statement Let's review POWER7 Intelligent Threads.
Notes:
POWER7 features Intelligent threads that can vary based on the workload demand. The
system will automatically determine whether a workload benefits from dedicating as much
capability as possible to a single thread of work, or benefits from having capability spread
across 2 or 4 threads of work.
With more threads, POWER7 can deliver more total capacity as more tasks are
accomplished in parallel. With fewer threads, those workloads that need very fast individual
tasks (like databases or transaction workloads) can get the performance they need for
maximum benefit.
POWER7's Intelligent Threads capability lets the system dynamically switch from single thread
(ST) to dual thread (SMT2) to quad thread (SMT4) modes per core.
Instructor notes:
Purpose Describe POWER7 intelligent threads.
Details
Additional information
Transition statement How can we enable or disable SMT?
Turning on or off simultaneous
multithreading (1 of 2)
Use the smtctl command or SMIT to enable, disable, or see status:
smtctl [ -m off | on [ -w boot | now]]
SMIT fastpath: smitty smt
To turn simultaneous multithreading off dynamically (for now):
# smtctl -m off -w now
smtctl: SMT is now disabled.
# bindprocessor -q
The available processors are: 0
Notes:
Notes:
Starting with AIX 6.1 TL4, the smtctl command has been enhanced to support the POWER7
SMT2 and SMT4 modes.
The -t option of the smtctl command sets the number of simultaneous threads per
processor. The value can be set to one to disable simultaneous multithreading, to two for
systems that support 2-way simultaneous multithreading (POWER6), or to four for systems
that support 4-way simultaneous multithreading (POWER7). This option cannot be used
with the -m flag.
To disable simultaneous multithreading, run either:
# smtctl -t 1
or # smtctl -m off
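As a hedged sketch for POWER7 partitions (assuming AIX 6.1 TL4 or later, and that the -w flag is accepted with -t as it is with -m), the SMT mode can be changed immediately without altering the boot setting:
# smtctl -t 4 -w now
# smtctl -t 2 -w now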
Notes:
Traditionally, AIX processor utilization uses a sample-based approach to approximate the
percentage of processor time spent executing user programs, system code, waiting for disk
I/O, and idle time.
AIX produces 100 interrupts per second to take samples. At each interrupt, a local timer
tick (10 ms) is charged to the current running thread that is preempted by the timer
interrupt. One of the following utilization categories is chosen based on the state of the
interrupted thread:
user: Interrupted code outside AIX kernel
sys: Interrupted code inside AIX kernel and currently running thread is not waitproc
iowait: Currently running thread is waitproc and there is an I/O pending
idle: Currently running thread is waitproc and there is no I/O pending
If the thread was executing code in the kernel through a system call, the entire tick is
charged to the process system time. If the thread was executing application code, the
entire tick is charged to the process user time. Otherwise, if the current running thread was
the operating system's idle process, the tick is charged to a separate variable. The problem
with this method is that the process receiving the tick most likely did not run for the entire
time period and happened to be executing when the timer expired.
Data structures
The processor utilization information is recorded in the sysinfo (system-wide) and cpuinfo
(per-processor) kernel data structures. These structures are documented in
/usr/include/sys/sysinfo.h. In order to preserve binary compatibility, this stays
unchanged with AIX 5L V5.3 or V6.1.
Performance tools such as vmstat, iostat, or sar convert tick counts from the sysinfo
structure into utilization percentages for a machine or partition. Other tools, like
sar -P ALL and the topas hot CPU section, convert tick counts from the cpuinfo structure
into utilization percentages for a processor or thread.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
PURR
The Processor Utilization Resource Register (PURR) is a register, provided by the
POWER5, POWER6 and POWER7 processors, which is used to provide an actual count of
physical processing time units that a logical processor has used. All performance tools and
APIs utilize this PURR value to report CPU utilization metrics for Micro-Partitioning and
simultaneous multithreading systems. This register is a special purpose register that can
be read or written by the POWER Hypervisor but is read-only by the operating system
(supervisor mode). There are two registers, one for each hardware thread.
The PURR is used to approximate the time that a virtual or logical processor is actually
running on a physical processor. The register advances automatically so that the operating
system can always get the current, up-to-date value. The Hypervisor saves and restores
the register across virtual processor context switches.
Because there are many resources in the hardware, any one of which can be a bottleneck
that limits simultaneous multithreading gain, the use of the PURR is an approximation of
the time spent running. The execution time for a virtual processor can be calculated by
adding sibling thread PURRs.
Notes:
POWER7 implements four PURR registers, one for each hardware thread.
(Figure: hardware threads Thread0 and Thread1 each have their own SPURR register, spurr0 and spurr1; the physical CPU provides a shared timebase register.)
Notes:
The POWER6 and POWER7 systems' internal power management results in performance
variability. This means that, as the power management implementation operates, it can change
the effective speed of the processor.
POWER6 and POWER7 processors contain an additional special-purpose register for
each hardware thread known as the Scaled Processor Utilization of Resources Register
(SPURR). The SPURR is used to compensate for the effects of performance variability in
the operating systems. The hypervisor virtualizes the SPURR for each hardware thread so
that each OS obtains accurate readings that reflect only the portion of the SPURR count
that is associated with its partition. Implementing virtualization for the SPURR is the same
as that for the PURR. Building on the functions provided by the hypervisor, the operating
systems use SPURR to do the same type of accurate accounting that is available on
POWER5 processor-based machines. With the introduction of the EnergyScale
architecture for POWER6 processor-based machines, not all timebase ticks have the same
computational value; some represent more usable processor cycles than others. The
SPURR provides a scaled count of the number of timebase ticks assigned to a hardware
thread, in which the scaling reflects the speed of the processor (taking into account
frequency changes and throttling) relative to its nominal speed.
The AIX tools reverted to using the PURR on POWER6 at AIX 5.3 TL7 (SP9), 5.3 TL8
(SP7), 5.3 TL9 (SP4), and 5.3 TL10 (SP1). The SPURR is used on POWER6 from AIX 6.1.0.0
through the present.
The SPURR is supported on POWER6 and POWER7 processors. It is similar to the
Performance Utilization Resources Register (PURR), except that it scales as a function of
degree of processor throttling. If your hardware supports the SPURR, the processor use
statistics shown by the sar command are proportional to the frequency or the instruction
dispatch rate of the processor.
SPURR is used when Power Saver is activated.
System-wide tools have been modified for variable processor frequency. The EnergyScale
architecture might therefore affect some of the performance tools and metrics built with the
user-visible performance counters. Many of these counters count processor cycles, and
because the number of cycles per unit time varies, the values reported by unmodified
performance monitors are subject to some interpretation.
The lparstat command has been updated in AIX Version 6.1 to display new statistics if the
processor is not running at nominal speed. The %nsp metric shows the current average
processor speed as a percentage of nominal speed. This field is also displayed by the new
version of the mpstat command.
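As a quick, hedged check (the interval and count shown are illustrative), run lparstat with an interval and look for the %nsp column, which is only shown when the processor is not running at nominal speed:
# lparstat 2 5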
Instructor notes:
Purpose Describe the new PURR register for a simultaneous multithreading
environment.
Details Describe how the traditional sample-based utilization metrics are misleading if
used with simultaneous multithreading enabled. Show that there is a PURR for each logical
processor.
Additional information The decrementer (DEC) is a counter that is updated at the
same rate as the timebase register. It provides a means of signaling an interrupt after a
specified amount of time has elapsed unless the decrementer is altered by software in the
interim, or the frequency of the timebase update changes.
Transition statement How are CPU utilization metrics calculated in a simultaneous
multithreading environment?
CPU utilization
In a simultaneous multithreading environment and/or a
Micro-Partition, CPU utilization statistics:
Still collect 100 samples per second (for binary compatibility)
Collect additional state-based PURR-based metrics (in PURR
increments)
Utilization metrics:
Same categories are used: user, sys, iowait, and idle
Physical resource utilization metrics for a logical processor:
(delta PURR/delta TB) Represents the fraction of the physical processor
consumed by a logical processor
(delta PURR/delta TB)*100 over an interval represents the percentage of
dispatch cycles given to a logical processor
(Figure: delta PURR0 and delta PURR1, one per logical processor, measured against the delta timebase over one interval.)
Notes:
Utilization metrics
AIX uses the PURR for process accounting. Instead of charging the entire 10 ms clock tick
to the interrupted process as before, processes are charged based on the PURR delta for
the hardware thread since the last interval, which is an approximation of the computing
resource that the thread actually received. This results in a more accurate accounting of
processor time in the simultaneous multithreading environment.
At each interrupt:
The elapsed PURR is calculated for the current sample period.
This value is added to the appropriate utilization category, instead of the fixed-size
increment (10 ms) that was previously added.
The interval information is stored in the same four categories; user, sys, iowait, and idle.
There are metrics in AIX associated with simultaneous multithreading utilization. There are
two different ways to measure it: the thread's processor time and the elapsed time. For the
first, the thread's PURR values are used and are now virtualized. To measure the elapsed
time, the timebase register (TB) is still used.
The physical resource utilization metrics for a logical processor are:
(delta PURR/delta TB) represents the fraction of the physical processor consumed by a
logical processor.
(delta PURR/delta TB)*100 over an interval represents the percentage of dispatch
cycles given to a logical processor.
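As an illustrative worked example (the numbers are hypothetical): if, over a 10 ms interval (delta TB = 10 ms), logical processor 0 accumulates 6 ms of PURR increments and logical processor 1 accumulates 4 ms, then delta PURR0/delta TB = 0.6 and delta PURR1/delta TB = 0.4. Logical processor 0 therefore received 60% of the dispatch cycles, logical processor 1 received 40%, and together they account for the whole physical processor.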
(Figure: the utilization categories, such as %sys, are computed from delta PURR0 and delta PURR1 relative to the delta timebase.)
Notes:
Notes:
The new registers used to track processor utilization also provide some new statistics.
Some statistics can only be viewed when the partition is using shared processors.
Instructor notes:
Purpose List the enhancements to the performance library API to support simultaneous
multithreading.
Details This visual shows some commands that have been updated for the new virtual
environment.
Additional information
Transition statement Let's look at some of the performance monitoring commands
that had to be changed to support the simultaneous multithreading environment.
Notes:
When AIX is running in simultaneous multithreading mode or in a Micro-Partition,
commands that display CPU information, such as vmstat, iostat, topas, and sar, display
the PURR-based statistics rather than the traditional sample-based statistics.
In simultaneous multithreading mode, additional columns of information are displayed:
pc or physc - Physical Processor Consumed
pec or %entc - Percentage of Entitlement Consumed (Micro-Partitions only)
trace -r PURR Collects the PURR register values. Only valid for a
trace run on a 64-bit kernel.
trcrpt -O PURR=[on|off] Tells trcrpt to show the PURR along with any
timestamps. The PURR is displayed following any
timestamps. If the PURR is not valid for the processor
traced, the elapsed time is shown instead of the PURR.
If the PURR is valid, or the CPUID is unknown, but
wasn't traced for a hook, the PURR field contains
asterisks (*).
netpmon -r PURR Uses the PURR time instead of timebase in percent
and CPU time calculation. Elapsed time calculations
are unaffected.
pprof -r PURR Uses the PURR time instead of timebase in percent
and CPU time calculation. Elapsed time calculations
are unaffected.
gprof A new environment variable, GPROF, controls gprof's
new mode that supports simultaneous multithreading.
curt -r PURR Uses the PURR register to calculate CPU times.
splat -p Specifies the use of the PURR register to calculate
CPU times. The output will show the message PURR
was used to calculate CPU times.
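A minimal sketch of combining these options for a PURR-based trace analysis (the file names and the sleep interval are illustrative; adjust for your environment):
# trace -a -r PURR          (start an asynchronous trace that collects PURR values)
# sleep 10 ; trcstop        (let it run for about 10 seconds, then stop tracing)
# curt -r PURR -i /var/adm/ras/trcfile -o curt.out
# trcrpt -O PURR=on /var/adm/ras/trcfile > trace.report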
Example sar -P ALL output fragment (columns: cpu, %usr, %sys, %wio, %idle, physc):
10:00:22   0    22   67    0   11   0.51
           1     1   99    0    0   0.49
           -    11   84    0    5   1.00
Average    0    22   67    0   11   0.51
           1     0  100    0    0   0.49
           -    11   84    0    5   1.00
Notes:
# smtctl
This system is SMT capable.
SMT is currently enabled.
SMT boot mode is set to enabled.
SMT threads are bound to the same physical processor.
proc0 has 2 SMT threads.
Bind processor 0 is bound with proc0
Bind processor 1 is bound with proc0
Notes:
The mpstat command collects and displays performance statistics for all logical CPUs in
the system.
If simultaneous multithreading is enabled, the mpstat -s command displays logical
processors usage as shown in the visual above. In the example, logical processor cpu0 is
65.23% busy and logical processor cpu1 is 32.18% busy. cpu0 and cpu1 are hardware threads
for proc0.
Because this example output is from a dedicated partition, the logical processors could
simply be running the AIX wait thread and not be doing any real work. The two logical
processor utilization metrics always add up to a whole processor. We'll see later in this unit
how idle cycles are ceded back to the shared processing pool if the partition is using
shared processors. With shared processor LPARs, the mpstat output would show the time
that the logical processors were busy doing real work.
# smtctl
This system is SMT capable.
SMT is currently enabled.
SMT boot mode is set to enabled.
SMT threads are bound to the same physical processor.
proc0 has 4 SMT threads.
Bind processor 0 is bound with proc0
Bind processor 1 is bound with proc0
Bind processor 2 is bound with proc0
Bind processor 3 is bound with proc0
Notes:
On POWER7 processor-based systems, if simultaneous multithreading is enabled, the
mpstat -s command displays the usage of four logical processors, as shown in the visual
above. In the example, logical processor cpu0 is 35.23% busy, logical processor cpu1 is
22.18%, logical processor cpu2 is 25.23% busy and logical processor cpu3 is 17.18%
busy. cpu0, cpu1, cpu2 and cpu3 are hardware threads for proc0.
Notes:
The topas output shows statistics by logical processor. The metrics have been applied so
that processor utilization is calculated using the PURR-based register and formula when
running in simultaneous multithreading (or with shared processors).
The visual shows output from a system with two dedicated processors and simultaneous
multithreading enabled, which is why we see four logical processors.
Notes:
This slide shows the traditional POWER6 statistics behavior. When one process is running
with SMT enabled, the processor appears 100% busy. Looking at the mpstat -s output,
you can see that logical processor cpu0 is 100% busy and logical processor cpu1 is idle.
The topas output shows the logical partition as 50% busy because there are two processors
in the logical partition.
Notes:
One of the new behaviors with POWER7 processor-based systems is to provide a better
representation of the core capacity in the utilization metrics. The slide shows topas
command output and mpstat -s command output on a two-core system in
SMT4 mode with only one process (the spload program running a single thread) running.
Compared to POWER6 processor-based systems, you can see in the mpstat -s
command output that when a single thread is running on a POWER7 processor, the
statistics do not show the logical processor as 100% busy, but as about 63% busy instead.
Looking at the topas output, you can see that the logical partition is 32% busy.
When SMT4 is enabled on POWER7 processor-based systems, the statistics reflect that
the capacity of the core is really more than what one thread can consume, so the idle
threads appear to be taxed more. The statistics of the different logical processors are
weighted if there are idle threads in the core. If the SMT mode is explicitly changed
(through smtctl), the weighting changes, since the potential capacity of the core changes.
While a single thread can get the equivalent of single-thread mode performance (provided the
other three threads are idle), the weights reflect that there is still headroom in the core.
Notes:
Here is an example of the dynamic SMT scheduling on POWER7. The slide shows four
processes running on a two-core logical partition. SMT4 is enabled and, because only four
processes are running, the system automatically switched to SMT2 by left-shifting the work
onto the first two logical processors of each virtual processor.
Instructor notes:
Purpose Introduce the section discussing SPLPAR considerations.
Details In the following section, we will discuss Micro-Partitioning or Shared
Processor LPAR (SPLPAR) considerations. First, we will briefly discuss the benefits of the
dedicated processor LPAR. We will then contrast this with the benefits of SPLPARs.
Additional information
Transition statement
Dedicated processors
Whole processors allocated to a partition (dedicated LPAR)
Performance benefits:
Processor and memory affinity utilized for best performance
Performance considerations:
Unused capacity lost
When partition is stopped, dedicated processors might (or might not) go to shared pool.
Notes:
Dedicated processors were used exclusively on the POWER4 processor-based
LPAR-capable systems, and it is one of the configuration options on the System p5 and
eServer p5 platforms.
Dedicated processors are whole physical processors exclusively allocated to a particular
partition. When the partition is shut down, the processors can return to the shared
processing pool. When the dedicated processor partition starts again, it is allocated
dedicated processors, although the actual physical processors might be different than the
last time it was activated.
A checkbox in the partition profile indicates whether idle processors are returned to the
shared pool. The box is labelled Allow idle processors to be shared and if checked,
when the partition shuts down, its processors become part of the shared pool.
Processor affinity
There is performance overhead involved when jobs switch processors because of latency
due to the context switches and cache misses. With dedicated partitions (using dedicated
processors), because the partition uses the same physical processors, there is less
potential for this latency and for cache misses than there is with a shared processor
partition utilizing processors out of a shared processing pool. This is a function not of using
shared processors versus dedicated processors, but a function of the increased processor
utilization.
Processor affinity refers to how the overhead of processor context switches is reduced as
much as possible by scheduling work on the same processor if that processor is available.
Memory affinity
When processors are allocated to a partition, an attempt is made to allocate physical
memory that is local to the processors. (Local memory is physical memory that is on the
same node, for example, MCM or DCM, as the processors).
Shared processors (1 of 2)
Processor capacity assigned in processing units from the shared
processor pool
Partition's guaranteed amount is its entitled capacity (EC)
Performance benefits:
Excess processing capacity can be used by other partitions
Configuration flexibility
Performance considerations:
Context switches and cache misses
Notes:
Shared processors are physical processors which are allocated to partitions on a timeslice
basis. Any physical processor in the shared processor pool can be used to meet the
execution needs of any partition using the shared processor pool.
There can be a mix of shared and dedicated partitions on the same managed system. A
partition uses shared or dedicated processors, and you cannot use dynamic LPAR
commands to change between the two. You need to bring down the partition and switch it
from using dedicated to shared, or vice versa, by using a different partition profile or
altering the existing one.
Processing units
When a partition is configured, you assign it an amount of processing units. A partition
must have a minimum of one tenth of a processor and after that requirement has been met,
you can configure processing units at the granularity of one hundredth of a processor.
Micro-Partitions
The term Micro-Partition is used for partitions that take advantage of shared processing. A
system can be configured with many Micro-Partitions each running independently. The I/O
needs for many small Micro-Partitions can be supported by another partition called a
Virtual I/O server. This concept is covered later in this course.
Instructor notes:
Purpose This visual introduces shared processors.
Details Describe the concept of a shared processor. The discussion of how the shared
processor pool works is coming up in the next few visuals.
The term Micro-Partition can be used for any partition using shared processors because
you can use sub-processor allocations. Micro-Partitioning also means using virtual
processors, which is covered in a few pages.
Mention the acronym SPLPAR, because it's used in some documentation.
Define the term entitled capacity.
Point out the Benefits and Disadvantage of using shared processors sections in the student
notes. In the next few pages of this unit, we'll be looking at the concepts brought up in the
Disadvantage... section, so just use this as a hint of things to come.
Remind students that there is just one shared processing pool and that the sum of all the
entitled capacity for running shared processor partitions cannot exceed the physical
processor capacity of the pool.
Additional information
Transition statement Let's look at how shared processing units map to processing
time.
Shared processors (2 of 2)
Each partition is configured with a percentage of execution dispatch
time for each 10 ms timeslice (dispatch window).
Examples:
A partition with 0.2 processing units is entitled to 20% capacity during each
timeslice.
A partition with 1.8 processing units is entitled to 18 ms of processing time for
each 10 ms timeslice (using multiple processors).
The Hypervisor dispatches excess idle time back to the pool.
Processor affinity algorithm takes into account hot cache.
Notes:
timeslice, which add up to the equivalent of 1.8 of a processor. Another way to say this is
that 1.8 processing units is 18 ms of processing time that happens on multiple processors
during a 10 ms clock time period.
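As an illustrative worked example of the 1.8 processing unit case above: within one 10 ms dispatch window the partition receives 18 ms of processing time, so the work must be spread over at least two virtual processors, for instance roughly 10 ms on one virtual processor and 8 ms on another during the same window.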
Processor affinity
Hot cache refers to cache which still has data relevant to a current running process. If a
process is interrupted and another runs on that physical processor, and then the original
thread is ready to run again, its data might still be in the cache. If the time threshold has not
been reached, the original process will attempt to run on the same physical processor. This
is called processor affinity.
(Figure: seven partitions dispatched onto four physical processors of the shared processing pool during 10 ms timeslices; Partition 1 is circled.)
Notes:
This visual shows seven partitions running on four shared processors. Partition 1 is circled
in the example to show that within a single 10 ms timeslice it ran on two processors
simultaneously, was then interrupted, and returned on a third processor. Any excess processing
time is returned to the shared processing pool.
Virtual processors (1 of 3)
Virtual processors are used to tell the operating system how many
whole physical processors it thinks it has.
Operating system in LPAR2 does not see 1.75 processing units; it sees the
configured virtual processors.
In this example, each partition sees four processors.
(Figure: LPAR1 with an entitled capacity of 2.4 processing units and LPAR2 with 1.75 processing units; each is configured with four virtual processors, which are dispatched onto the physical processors of the shared processor pool.)
Notes:
The virtual processor setting defines the way that a partition's entitlement can be spread
concurrently over physical processors. That is, you can think of the processing power
available to the operating system on the partition as being spread equally across these
virtual processors. The number of virtual processors is what the operating system thinks it
has for physical processors. The Hypervisor dispatches virtual processors onto physical
processors.
The example in the visual above shows six physical processors in the shared pool, and
each partition thinks it has four processors.
The number of virtual processors can be configured independently for each shared
partition. The number of virtual processors can be changed dynamically.
Micro-Partitioning redefined
Previously, this course defined a Micro-Partition as a partition that takes advantage of
shared processors and sub-processor increments.
Micro-Partitioning can also be defined as the mapping of virtual processors to physical
processors. A partition is configured with a number of virtual processors, which are then
dispatched on the physical processors in the shared pool.
In Micro-Partitioning there is no fixed relationship between virtual processors and physical
processors. The POWER Hypervisor can use any physical processor in the shared
processor pool when it dispatches a virtual processor.
Instructor notes:
Purpose Describe what a virtual processor is.
Details Stay at the concept level on this visual. The next visual discusses the minimum
and maximum configuration options for virtual processors.
It might be a good idea at this point to turn back to the previous visual titled Figure 3-7
Shared Processor Pool. This visual illustrates an example Partition 1 running threads
simultaneously on two physical processors. If Partition 1s virtual processor number was
increased to four, then you might see Partition 1 running on all four processors at the same
time.
Additional information
Transition statement The next visual discusses minimum and maximum settings for
virtual processors.
Virtual processors (2 of 3)
By default, for each 1.00 of a processor or part thereof, a
virtual processor will be allocated.
Example: 3.6 processing units would have four virtual processors.
Up to 10 virtual processors can be assigned per processing
unit.
Example: 3.6 processing units can have up to 36 virtual processors.
Number of virtual processors does not change the entitled
capacity.
Both entitled capacity and number of virtual processors can be
changed dynamically for tuning.
Maximum virtual processors per partition is 64
Example:
Partition with 4.2 entitled capacity
Minimum virtual processors = ______
Maximum virtual processors = ______
Notes:
The virtual processor setting does not change the total number of guaranteed processing
units (entitled capacity). For example, a partition with 1.5 capped processing units will still
have only 15 ms of processing time, whether that is on two physical processors or four.
With four virtual processors, the partition might consume its entitled capacity in a shorter
period than on two virtual processors.
If the partition with 1.5 processing units was uncapped, then with four virtual processors it
could consume as much as 40 ms per timeslice if there was sufficient spare capacity in the
shared processor pool. If the partition had two virtual processors, then it would only be able
to consume at most 20 ms per timeslice, even if there is more unused capacity. You want to
be sure to have enough virtual processors configured to take full advantage of all the
physical processors that can be used.
Virtual processors (3 of 3)
Example:
Partition has 1.5 processing units (EC).
For each 10 ms timeslice, it is entitled to 15 ms of processing time.
Entitled capacity is distributed between all of the virtual processors.
(Figure: LPAR1 and LPAR2 each have an entitled capacity of 1.5 processing units; LPAR1's 15 ms is split across up to two physical processors through two virtual processors, while LPAR2's 15 ms is split across up to four physical processors through four virtual processors.)
Notes:
The visual above shows how two partitions, each with 15 ms of processing time, divide up
the work between the virtual processors. In both cases, the work to be done in the partition
is split evenly between the virtual processors. So LPAR1 might have 7.5 ms on each virtual
processor, and LPAR2 might have 3.75 ms on each of the four virtual processors. This is
simplified because, in reality, a partition might not use its entire allotment of processing
time due to blocking for I/O, and so on.
Depending on the entitled capacity, a partition can be configured with up to 64 virtual
processors. By increasing the number of virtual processors, you decrease the amount of
processing time assigned to each virtual processor. The Hypervisor dispatches virtual
processors onto physical processors. If there are more virtual processors in a partition than
you have physical processors in the shared processing pool, then multiple virtual
processors will run on one physical processor. This causes context switches that incur a
performance cost.
(Figure: virtual processors of three LPARs, marked *, **, and ***, dispatched onto two physical CPUs across two 10 ms dispatch wheel passes, from 0 to 20 ms.)
Notes:
Virtual processors are dispatched in a time-sliced manner onto physical processors under
the control of the POWER Hypervisor, much like an operating system timeslices software
threads. Virtual processors have dispatch latency, because they are scheduled. When a
virtual processor is made runnable, it is placed on a run queue by the POWER Hypervisor,
where it sits until it is dispatched. The time between these two events is referred to as
dispatch latency.
Notice these scheduling points from the graphic in the visual above:
LPAR1's work is evenly divided over the physical CPUs. It is entitled to 80% of a
timeslice, and the workload is 40% on each physical processor.
The same virtual processor can be re-dispatched in the same dispatch wheel pass. In
the visual above, VP1 of LPAR1 is dispatched twice on CPU0 in the first dispatch wheel
pass.
LPAR2 has just one virtual processor, and it is dispatched on the same CPU (processor
affinity).
LPAR3 has three virtual processors and runs on two physical processors. Notice that
the virtual processors context switch three times within six ms on physical CPU 1.
Dispatch wheel
The POWER Hypervisor uses the metaphor of a dispatch wheel with a fixed timeslice of 10
milliseconds (1/100 of a second) to guarantee that each virtual processor receives its share
of the entitlement in a timely fashion. When a partition is completely busy, the partition
entitlement is evenly distributed among its virtual processors.
Instructor notes:
Purpose Illustrate how partitions share processors. Define what is meant by dispatch
latency and what causes it.
Details Show how the three partitions in the visual share two physical processors. To
do this, walk through all three partitions and how their virtual processors are dispatched
onto the physical processors. Do not describe the concept of uncapped yet, but students
might ask about using the idle capacity. If the students have black and white printouts of
this visual, the asterisks are there to show which boxes belong to which partitions.
Describe the concept of a dispatch wheel, and how the POWER Hypervisor uses this
wheel to ensure each partition can utilize its EC within the bounds of each 10 ms timeslice.
The point to emphasize on this visual is that as the number of virtual processors configured
for a partition increases, this will cause additional context switching. The Hypervisor
attempts to put virtual processors back on the physical CPU where it just ran, but
sometimes this is not possible, causing a cache miss. So, it is important to have just the
right number of virtual processors, and not too many. We'll be discussing this more in this
unit. This is the concept portion of this topic; we'll get to the performance management
portion in several pages.
Additional information One thing we don't show on any of these virtual processor
dispatch diagrams is that the Hypervisor uses Hypervisor decrementer interrupts to gain
control of a processor in order to dispatch a virtual processor of its hidden partition to
perform Hypervisor work. The Hypervisor has a layer of code (a small operating system)
that runs in a hidden partition and does not have any entitled capacity assigned to it. The
operating system in a partition is optimized to let the Hypervisor know when it has cycles
that the Hypervisor can use.
Transition statement Let's look in more detail at the concept of shared processor
affinity.
Home node:
A virtual processor is assigned a home node.
MCM/DCM where most of the memory comes from
Virtual processor migrates back to home node whenever it has no
affinity left.
Notes:
Instructor notes:
Purpose Describe how shared processor affinity works, and what a home node is.
Details This visual introduces more detail about shared processor affinity and how it
works.
Additional information
Transition statement The next page describes scheduling affinity domains.
Notes:
The different scheduling affinity domains represent the different levels of affinity. As
previously stated, the POWER Hypervisor always tries first to dispatch the virtual processor
onto the same physical processor that it last ran on and, depending on resource utilization,
will broaden its search to the other processor on the POWER5 or POWER6 chip, then to
another chip on the same MCM, and then to a chip on another MCM.
mpstat -d
Starting in IBM AIX 5L V5.3, the mpstat command using the -d flag displays detailed
affinity and migration statistics for AIX threads and dispatching statistics for logical
processors.
The mpstat -d command shows statistics since system boot. Use an interval value to
obtain periodic statistics. Large numbers of involuntary switches in a small interval could
mean that there are too many virtual processors.
S1rd  The process redispatch occurs within the same physical processor, among
      different logical processors. This involves sharing of the L1, L2, and L3 cache.
S2rd  The process redispatch occurs within the same processor chip, but among
      different physical processors. This involves sharing of the L2 and L3 cache.
S3rd  The process redispatch occurs within the same MCM module, but among
      different processor chips.
S4rd  The process redispatch occurs within the same central processing complex
      (CPC) plane, but among different MCM modules. This involves access to the
      main memory or L3-to-L3 transfer.
S5rd  The process redispatch occurs outside of the CPC plane.
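A hedged example of collecting these statistics periodically (the interval and count are illustrative):
# mpstat -d 2 3
Compare the voluntary and involuntary logical context switch columns, along with the S1rd through S5rd redispatch percentages, across intervals; consistently high involuntary switches in a short interval can indicate that the partition has more virtual processors than it needs.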
Instructor notes:
Purpose Define schedule affinity domains and monitor key fields in mpstat output.
Details Point out the involuntary and voluntary logical context switch columns.
Additional information
Transition statement Let's look at capped SPLPARs.
(Figure: capped LPAR capacity over time; utilized capacity rises to the entitled capacity and the unused remainder is ceded.)
Capped LPAR cannot use more than entitled capacity.
Notes:
The visual illustrates a capped partition that uses all of its entitled processor capacity twice
over the time shown, but it cannot use more.
Instructor notes:
Purpose Illustrate how a capped partition cannot use more than its entitled capacity.
Details The utilized capacity labeled on the visual (in orange) shows the partition using
processing resources. Twice, the partition reaches its total entitled capacity and it is not
allowed to use more.
Additional information
Transition statement Now, let's look at a similar graph with an uncapped partition.
(Figure: uncapped LPAR utilized capacity over time, rising above the entitled capacity when idle pool capacity is available.)
Uncapped LPAR takes advantage of idle capacity.
Notes:
In the visual the uncapped partition reaches its entitled capacity and is allowed to utilize
capacity from the shared pool. Notice that the partition can use more than its maximum
processor capacity. The maximum setting limits dynamic LPAR operations only when
changing the entitled capacity; it has no relevance to uncapped partitions being allowed to
utilize idle processing resources.
Instructor notes:
Purpose Illustrate how an uncapped partition can use idle processing resources from
the shared pool.
Details The utilized capacity labeled on the visual (in orange) shows the partition using
processing resources.
Additional information
Transition statement Now, let's look at the effect of having more or fewer virtual
processors.
(Figure: potential capacity of a partition configured with one to six virtual processors; uncapped and capped cases are compared against the maximum, entitled, and minimum processor capacity, with a line showing the entitled capacity per virtual CPU declining as virtual processors are added.)
Notes:
The visual compares a partition with one to six virtual processors. The visual compares the
partitions processor utilization when it is uncapped (first bar in each set) to when it is
capped (second bar in each set).
When capped, the partition can only utilize its entitled processor capacity.
When uncapped, the partition can not only use additional processor capacity, but its ability
to maximize the usage of the idle cycles is directly related to the number of virtual
processors it has. This is because a virtual processor can utilize a maximum of 10 ms in
each dispatch window. This visual illustrates that if you have too few virtual processors, you
would be limiting how much of the idle processing capacity can be used by the partition. It
also illustrates that having more virtual processors with a capped partition than it is capable
of using not only does not improve performance, but also increases the context switches
unnecessarily. Notice the line in the visual above labeled Entitled Capacity Per Virtual CPU.
This line shows that each virtual processor is doing less and less work.
Performance recap
Having dedicated processors provides improved performance over shared capped
processor performance because of reduced processor cache misses and reduced latency.
However, a partition using dedicated processors cannot take advantage of using excess
shared pool capacity as it could with an uncapped partition using the shared processing
pool. Performance could be better with the uncapped processors if there is excess capacity
in the shared pool that can be used.
Configuring the virtual processor number on shared processor partitions is one way to
increase (or reduce!) the performance for a partition.
The virtual processor setting for a partition can be changed dynamically. You can monitor
performance and change the virtual processor setting dynamically to see whether the
performance improves.
Notes:
For example, if an uncapped partition is configured with 1.5 processing units and there are
eight processors in the shared processor pool, you could configure up to 15 virtual
processors because 15 is the maximum for 1.5 processing units. However, the
recommendation is to configure the uncapped partition with eight virtual processors and
check the performance. You can then increase the number of virtual processors until you
see performance degrade.
Instructor notes:
Purpose Discuss the configuration guidelines for virtual processors.
Details Trial and error is the main point on this visual, because partition workloads are
all different. Some applications benefit from more virtual processors, and for others, it just
introduces more overhead.
This visual mentions a new concept: virtual processor (VP) folding. Do not go into detail
about this feature here because the topic is covered in detail starting in two visuals. Simply
say here that VP folding is a feature with AIX 5L V5.3 ML3 or above and we'll discuss this
in a few pages.
Additional information
Transition statement Let's look at the specific metrics to watch when tuning virtual
processors.
Notes:
Here is a summary of how to use some of the new performance management tools that can
be used to monitor and make management decisions about processing resources.
If the user and system CPU usage is consistently high for an uncapped partition, but there
are available physical processors in the shared pool, then increase the number of virtual
processors. Monitor performance of the partition to see whether it improves.
You can also use vmstat to determine the percent of user and system time (us and sy
columns). The vmstat command also shows percent of CPU idle time (id column) and
CPU idle time when there are outstanding I/O requests (wa column). For shared partitions,
vmstat also shows the amount of physical processors consumed (pc column) and the
entitled capacity consumed (ec column).
Example vmstat output (showing only the columns related to CPU):
# vmstat 2 2
System configuration: lcpu=4 mem=1024MB ent=0.80
cpu
-----------------------
us sy id wa pc ec
97 0 0 3 0.80 100.0
94 0 0 6 0.80 100.0
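In this illustrative output, the partition is CPU-bound in user mode (us=97) and is consuming its entire 0.80 entitled capacity (pc=0.80, ec=100.0). If the partition is uncapped and there is spare capacity in the shared pool (for example, a nonzero app value in lparstat), this is the situation in which adding virtual processors might help.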
Notes:
Notes:
Enable/Disable VP folding
The schedo command is used to dynamically enable, disable, or tune the VP folding
feature. The VP folding feature is configurable by changing the vpm_fold_policy parameter.
To enable or disable the AIX processor folding feature depending on the partition type:
schedo -o vpm_fold_policy=0 => disabled for both shared and dedicated
processors
schedo -o vpm_fold_policy=1 => enabled for shared processors, disabled for
dedicated processors
schedo -o vpm_fold_policy=2 => disabled for shared processors, enabled for
dedicated processors
schedo -o vpm_fold_policy=3 => enabled for both shared and dedicated
processors
The default value is 1.
Typically, this feature should remain enabled. The disable function is available for
comparison reasons and in case any tools or packages encounter issues due to this
feature.
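For instance (a sketch; the displayed value is illustrative), the current setting can be checked with schedo, and the -p flag can be added so that a change also persists across reboots:
# schedo -o vpm_fold_policy
vpm_fold_policy = 1
# schedo -p -o vpm_fold_policy=3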
Instructor notes:
Purpose Explain the benefits of VP folding and how to disable or enable it.
Details Describe the benefits of the VP folding feature.
Describe how to use schedo to disable or enable this feature. The next visual shows how to
tune it.
We have not yet discussed why you might want to tune this feature, and that is coming up
soon.
Additional information
Transition statement Let's look at how this feature can be set to different values.
Notes:
Configuring vpm_xvcpus
Every second, the kernel scheduler evaluates the number of virtual processors in a
partition based on their utilization. If the number of virtual processors needed to
accommodate the physical utilization of the partition is less than the current number of
enabled virtual processors, one virtual processor is disabled. If the number of virtual
processors needed is greater than the current number of enabled virtual processors, one or
more (disabled) virtual processors are enabled. Threads attached to a disabled virtual
processor are still allowed to run on it.
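A hedged example of keeping two extra virtual processors enabled beyond what the folding calculation would otherwise leave active:
# schedo -o vpm_xvcpus=2
Setting vpm_xvcpus to 2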
Instructor notes:
Purpose Describe how the VP folding option to schedo is used to calculate the number
of activated virtual processors.
Details This visual shows an example when vpm_xvcpus has the default value of 0.
We'll get to other values on the next visual.
Additional information
Transition statement Next, we'll see how to tune this parameter and why.
Notes:
Tuning VP folding
The visual above shows how the vpm_xvcpus value set using schedo is used to determine
the number of VPs to fold. You can set this value to an integer to tune how the VP folding
feature will react to a decrease in workload.
work onto a smaller number of virtual processors is lost. In such environments, you might
want to configure a primary shared processor partition so that it has enough resources to
take over the entire shared processor pool, assuming its partition entitlement is large
enough, or it is uncapped. This enables more physical resources to be allocated to the
partition more quickly, with the additional benefit of being able to allocate essentially
dedicated processor resources to the partition. In this scenario, the assumption is that the
other shared processor partitions are mostly idle and are configured to utilize a fewer
number of virtual processors by default.
Notes:
PURR
The PURR value, covered in the simultaneous multithreading unit of this course, is used to
accumulate information only when the virtual processor is dispatched on a physical
processor. So PURR is utilized even if simultaneous multithreading is disabled, because it
provides accurate processor utilization statistics in a shared processor environment.
[Figure: PURR statistic. Each virtual processor's logical CPUs accumulate virtual PURR values (and a virtual timebase) only while the virtual processor is dispatched on a physical processor. The PURR still measures the fraction of time the partition runs on a physical processor, that is, the relative amount of processing units consumed.]
Notes:
PURR statistic
The PURR statistic was described in the simultaneous multithreading unit in this course.
Notice that it still measures thread CPU elapsed time as was described in the
Simultaneous Multithreading unit.
Notes:
These tools were shown in the simultaneous multithreading unit in this course. Some of the
columns are only shown with simultaneous multithreading or Micro-Partitioning.
/usr/bin/lparstat
The -i option provides detailed LPAR information, and -H and -h provide Hypervisor-specific
information. You can run lparstat over time with interval and count arguments; otherwise, it
shows statistics accumulated since the last operating system boot. For example, lparstat
-h 2 5 runs the command five times, with two-second intervals.
The lparstat output, in the visual above, shows a system configured as shared, capped,
and with simultaneous multithreading enabled. It has four logical processors (lcpu), 1 GB
of memory (mem), and its entitled capacity (ent) is 0.80. The psize field shows there are
two physical processors in the shared pool.
The following additional columns are displayed if the partition type is shared:
Field   Description
physc   Number of physical processors consumed.
%entc   Percentage of the entitled capacity consumed.
lbusy   Percentage of logical processor utilization that occurred while executing at the user and system level.
app     Available physical processors in the shared pool.
vcsw    Number of virtual context switches, which are the virtual processor hardware preemptions.
phint   Number of phantom interrupts received (interrupts targeted to another shared partition in this pool).
In the example output in the visual above, we see that the partition was mostly idle, at
99.8%. It used only 0.3% of its 0.80 entitled capacity (%entc). This consumed so little of a
physical processor (physc) that, with the limit of two decimal places, the value shows as
zero.
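For reference, the invocations discussed above are simply the following (reported values are specific to your partition):
# lparstat -i
# lparstat -h 2 5
The first form lists the static partition configuration (type, mode, entitled capacity, and so on); the second takes five samples at two-second intervals and adds the Hypervisor-related columns.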
/usr/bin/mpstat
If using shared processors, and simultaneous multithreading is enabled, the mpstat -s
command displays physical as well as logical processor usage as shown in the example
below.
In the output shown below, the physical processor Proc0 is busy at 0.35%, which is made
up of logical processor cpu0 (0.26%) and logical processor cpu1 (0.09%). cpu0 and cpu1
are hardware threads for Proc0.
# mpstat -s 1 1
System configuration: lcpu=4 ent=0.8
Proc0 Proc2
0.35% 0.02%
cpu0 cpu1 cpu2 cpu3
0.26% 0.09% 0.01% 0.01%
Instructor notes:
Purpose The listed commands in AIX 5L V5.3 and later have enhancements to support
shared processors.
Details Point out the changes in lparstat and mpstat commands when a partition uses
shared processors.
Describe that the PURR statistics are also used for shared processor partitions even with
simultaneous multithreading disabled.
Additional information
Transition statement The next page lists more commands modified in AIX 5L V5.3 to
support shared processor partitions.
Notes:
/usr/bin/iostat
iostat was modified to add two additional metrics: Physical processors consumed (physc
column) and Percentage of entitlement consumed (%entc column). These are shown only
in shared processor partitions or if simultaneous multithreading is enabled.
Physical processor consumed shows a measure of the fraction of time a logical processor
gets physical processor cycles.
Percentage of entitlement consumed gives the relative entitlement consumption for each
logical processor.
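As a minimal example (the interval and count are arbitrary), running iostat on a shared processor partition shows the physc and %entc columns in the CPU portion of the report:
# iostat 2 3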
Notes:
sar -P ALL
When running in a shared partition, sar displays the percentage of entitlement consumed,
%entc, which is (PPC/ENT) * 100. PPC is the physical processor capacity consumed and
ENT is the entitled capacity. This gives the relative entitlement utilization for each logical
processor and allows the system average utilization to be calculated from the logical
processor utilization.
Whenever the percentage of entitled capacity consumed is under 100%, a line beginning
with U is added to represent the unused capacity.
The physical processors consumed column, physc (delta PURR/delta TB), shows the
relative simultaneous multithreading split between processors; that is, it shows the fraction
of time a logical processor was getting physical processor cycles.
On average, for the time that it actually consumed (0.01), cpu0 spent 20% in user time and
50% in system time.
Although the partition has 1.00 entitled capacity (ent), it only used an average of 0.8%
(%entc) of it.
This small amount of CPU time amounts to about 0.01 of a physical CPU (physc). The
physc column shows the amount of time that a virtual processor actually ran on a physical
CPU. When the virtual processors aren't doing anything, the partition cedes its excess
cycles back to the Hypervisor.
In summary, on shared partitions, it is important that you don't just look at the %usr and
%sys columns and state whether the processors are busy or not. The output in the visual
above shows that this partition is hardly doing any work at all. In fact, this output was taken
on a system running only the operating system. But it looks like it is 79% busy. On a shared
partition, you must look at the other columns to figure out whether it was indeed busy. For
example, if %entc was nearing 100% (or more than 100% in the case of uncapped
partitions), then the partition is busy.
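For reference, the kind of report discussed above comes from an invocation such as the following (sample values differ from system to system):
# sar -P ALL 2 3
This takes three samples at two-second intervals and reports per-logical-processor statistics, a U line for unused capacity when %entc is under 100%, and the system-wide average.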
Instructor notes:
Purpose Describe the differences in the sar -P ALL output when using a shared
processor partition.
Details In the visual, you can use the example of the first line for cpu0 by saying that for
0.01 of a physical processor's time, it was 23% in user, 58% in system, and 19% idle. The
important point is that although on average this system looks like it was 79% busy, that is
only for the actual time that the virtual processor(s) spent on a physical CPU. Any excess
processing cycles are ceded back to the Hypervisor.
The output in the visual shows a partition that is only using 0.01 of a physical processor
and on average 0.8% of its entitled capacity.
The example on the visual shows a simple example with an entitled capacity of 1.00. The
next visual shows a more complex scenario.
Additional information
Transition statement Let's see how sar -P looks on a busy system.
# mpstat -s 1 1
System configuration: lcpu=4 ent=0.8
Proc0 Proc2
46.05% 53.66%
cpu0 cpu1 cpu2 cpu3
29.78% 16.27% 3.90% 49.76%
Notes:
mpstat -s output
Notice the following in the mpstat -s output in the visual above:
We can confirm with the mpstat -s command that two logical processors are busy and
two are idle.
The two logical processors that are busy are on different virtual processors. If you watch
over time, you might see the load bounce around to different logical processors.
By adding up the percent busy for both virtual processors, you reach the value of about
80% of a physical processor. This makes sense with the entitled capacity of 0.80 on an
extremely busy system.
Notes:
The topas output has been modified for Micro-Partitioning. The new metrics have been
applied so that processor utilization is calculated using the new PURR-based values and
formulas when running in simultaneous multithreading or Micro-Partitioning mode. The
additional information is:
Physc: The fractional number of processors consumed (shown for both dedicated and
shared partitions)
%Entc: The percentage of entitled capacity consumed (shown only for shared partitions)
# topas -L
Notes:
topas -L output
The visual above shows the output when you press L while in topas, or when you invoke
topas with the -L option. This screen shows more partition-related statistics.
In this output, you can see the percentage of time the logical processors are busy (%lbusy),
the available processor pool (app), the number of virtual context switches (vcsw), the
number of phantom interrupts (phint), the percentage of time processing Hypervisor calls
(%hypv), and the number of Hypervisor calls (hcalls).
There is also a breakdown of statistics for each logical processor.
Notes:
The visual above shows the output when you press C while in topas, or when you invoke
topas with the -C option.
As of AIX 5L V5.3 Maintenance Level 3, the topas command can also report some
performance metrics from remote partitions. This cross-partition panel displays metrics
similar to the lparstat command for all of the AIX partitions it can identify as belonging to the
same hardware platform.
The example above shows a system with multiple partitions running AIX 5L V5.3 ML3. In
this particular system, the partitions do not have any network interfaces configured and rely
on retrieving this data from the HMC, through the service processor's network connection
to the HMC.
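For reference, the cross-partition panel can be reached either directly from the command line or interactively:
# topas -C
Alternatively, press C from within a running topas session to switch to the same panel.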
Instructor notes:
Purpose Describe how to use topas to get information from other partitions.
Details Describe how to invoke topas and view partition-related information from other
partitions.
Additional information There does not appear to be any security built in to the topas
-C function. That is, there's no way to block one partition from retrieving performance data
from another except by not upgrading to maintenance level 3.
Transition statement The lbusy percentage can be confusing. Let's look at this more
closely.
# lparstat 2 3
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ---- ----- ----- ----- ----- ------ --- ---- -----
95.1 0.2 0.0 4.6 0.80 99.9 49.6 1.20 411 0
96.5 0.2 0.0 3.3 0.80 99.9 49.6 1.20 417 1
94.2 0.2 0.0 5.6 0.80 100.0 50.5 1.20 479 0
# mpstat -s 1 1
Notes:
The visual shows the output of lparstat and mpstat on a busy system. You can see that
although the system is using all of its entitled capacity (100% of its 0.80 processing units),
the logical processors are only 50% busy. Because this system has four logical processors,
and the load on the system apparently has only two threads, only two logical processors
are being utilized. lbusy therefore reports that 50% of the logical processors are busy. If we
were to start a load on the system using four threads, then we should see lbusy at 100%.
Instructor notes:
Purpose Describe what the lbusy percentage value represents.
Details Use the example output on the visual to show how lbusy might be less than
100% even when the system is using 100% of its entitled capacity.
Additional information
Transition statement We have two final topics to wrap up the Micro-Partitioning unit.
The first is a description of the types of applications best suited for Micro-Partitioning.
Notes:
This page reviews some types of applications that would do well or not so well with
Micro-Partitioning. Your results might vary.
Decision Support System (DSS) and High Performance Computing (HPC) workloads are
examples of applications that might be CPU-intensive. Online Transaction Processing
applications are an example of low average CPU utilization because they usually have a
human interface with high idle times.
Polling behavior
If an application is constantly polling for available resources or particular conditions, it
consumes CPU resources that could otherwise be available to other partitions, without
actually doing any work. If this interferes too much with other partitions, and if dedicated
processors are an option, you might want to try using dedicated processors for this
partition.
Instructor notes:
Purpose Give general guidelines of the types of applications that would do well or not
so well with Micro-Partitioning.
Details Discuss different types of applications and whether they would be good
candidates for using shared processors.
This discussion should start with the reminder that using shared versus dedicated
processors is a trade-off. Using dedicated results in the best performance if you have
plenty of processors. But, if you need to use shared processors because you need to make
more effective use of the processors that you have, then the discussion should focus on
what type of applications would fare better than others.
Additional information
Transition statement As a final word on Micro-Partitioning, let's look at some
strategies for shared partition capacity planning.
Notes:
This visual lists three strategies that you can use when planning your system.
Do you want all partitions to be guaranteed their CPU resources? If you do not know the
behavior of the applications on your system, having dedicated CPUs is the safest strategy
to use (performance-wise) if you have enough resources.
Do you have a wide range of applications on the system? The harvested capacity strategy
has some partitions that might have unused capacity, which will be harvested by others.
The third strategy is the most cost-effective but also the most performance-risky option.
With this strategy, you must monitor the partitions closely. It's safe if only some of the
partitions peak, but if most of the partitions peak simultaneously, then the resources are
overcommitted.
Instructor notes:
Purpose Describe the three strategies for capacity planning on a system where
partitions will utilize shared processors.
Details Discuss the three strategies and explain how each has its own trade-off.
Additional information
Transition statement Next, we'll do some checkpoint questions.
Checkpoint (1 of 4)
1. Match the following processor terms to the statements that describe
them: Dedicated, shared, capped, uncapped, virtual, logical
a. ___________ These processors cannot be used in micro-partitions.
b. ___________ Partitions marked as this might use excess processing
capacity in the shared pool.
c. ___________ There are two or four of these for each virtual processor if
simultaneous multithreading is enabled.
d. ___________ This type of processor must be configured in whole
processor units.
e. ___________ These processors are configured in processing units as
small as one-hundredth of a processor.
f. ___________ Partitions marked as this might use up to their entitled
capacity but not more.
Notes:
Instructor notes:
Purpose
Details
Checkpoint solutions (1 of 4)
1. Match the following processor terms to the statements that describe
them: Dedicated, shared, capped, uncapped, virtual, logical
a. Dedicated These processors cannot be used in micro-partitions.
b. Uncapped Partitions marked as this might use excess processing
capacity in the shared pool.
c. Logical There are two or four of these for each virtual processor if
simultaneous multithreading is enabled.
d. Dedicated This type of processor must be configured in whole
processor units.
e. Shared These processors are configured in processing units as
small as one-hundredth of a processor.
f. Capped Partitions marked as this might use up to their entitled
capacity but not more.
The answers in the correct order are dedicated, uncapped, logical,
dedicated, shared, and capped.
Additional information
Transition statement
Checkpoint (2 of 4)
3. True or False: By default, dedicated processors are returned to the
shared processor pool if the dedicated partition becomes inactive.
Notes:
Instructor notes:
Purpose
Details
Checkpoint solutions (2 of 4)
3. True or False: By default, dedicated processors are returned to the
shared processor pool if the dedicated partition becomes inactive.
The answer is true.
4. If a partition has 2.5 processing units, what is the minimum number of
virtual processors it must have?
a. One
b. Three
c. No minimum
The answer is three.
5. If a partition has 2.5 processing units, what is the maximum number of
virtual processors it can have?
a. 25 (Maximum can be no more than 10 times processing units.)
b. 30
c. Total number of physical processors x 10
d. No maximum
The answer is 25 (maximum can be no more than 10 times processing
units).
Additional information
Transition statement
Checkpoint (3 of 4)
6. What is the maximum amount of processing units that can be
allocated to a partition?
Notes:
Instructor notes:
Purpose
Details
Checkpoint solutions (3 of 4)
6. What is the maximum amount of processing units that can be allocated
to a partition?
The answer is all available processing units.
7. If an uncapped partition has an entitled capacity of 0.5 and two virtual
processors, what is the maximum amount of processing units it can
use?
The answer is 2.0 processing units because it is uncapped and has
two virtual processors (maximum of 1.0 units per virtual processor).
8. If there are multiple uncapped partitions running, how are excess
shared processor pool resources divided between the partitions?
The answer is the uncapped weight configuration value is used to
allocate excess resources.
Additional information
Transition statement
Checkpoint (4 of 4)
10. What is the maximum number of virtual processors that can
be configured for an individual partition?
Notes:
Instructor notes:
Purpose
Details
Checkpoint solutions (4 of 4)
10. What is the maximum number of virtual processors that can
be configured for an individual partition?
The answer is up to ten times the amount of processing
units, with a maximum value of 64.
Additional information
Transition statement
Exercise
Unit exercise
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Unit summary
Having completed this unit, you should be able to:
Describe the simultaneous multithreading concept and its effect on
performance monitoring and tuning
Describe the function of the PURR/SPURR statistics
Describe the impact of simultaneous multithreading on tools such as
vmstat, iostat, sar, and topas
Discuss guidelines for systems running simultaneous multithreading
with various workloads
Use tools to view statistics related to the monitoring and tuning of
partitions that have simultaneous multithreading enabled
Describe how the POWER Hypervisor allocates processing power from
the shared processing pool
Discuss recommendations associated with the number of virtual
processors
Describe performance considerations associated with implementing
Micro-Partitioning
Use tools to monitor the statistics on a partition running a workload with
Micro-Partitioning configured
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Estimated time
01:30
References
SC24-7940-03 PowerVM Virtualization on IBM System p Introduction
and Configuration (Fourth Edition)
Unit 3. Dedicated shared capacity and multiple shared processor pools
Unit objectives
After completing this unit, you should be able to:
Discuss the details associated with the following IBM Power
Systems features:
Dedicated shared processors running in donating mode
Multiple shared processor pools (MSPPs)
Discuss how these features can improve processor resource
utilization
Notes:
Notes:
Dedicated processors
Allocated as whole processors to a specific partition.
Same physical processors are used for that partition while it is running
Idle cycles are effectively wasted.
When partition is stopped, dedicated processors will go to
shared pool if the inactive option is checked for a partition.
[Figure: partition properties (or profile) setting for an LPAR with a dedicated processor; the physical processors it uses are not in the shared pool.]
Notes:
Dedicated processors are whole physical processors allocated to a particular partition.
When the partition is shut down, the processors might return to the shared processing pool.
When the dedicated processor partition starts again, it will be allocated dedicated
processors, although the actual physical processors might be different from the last time it
was activated.
As of HMC Version 7, check the Allow when partition is inactive checkbox in the Processor
Sharing section of the partitions profile or properties to configure dedicated processors to
go to the shared processor pool when the partition is shut down. In previous versions of the
HMC, this box is labelled Allow idle processors to be shared.
Notes:
This feature is available only on POWER6 and POWER7 processor-based servers for
partitions configured with dedicated processors. This function allows idle dedicated
processors to donate their cycles to the shared processor pool.
This feature is licensed as part of the PowerVM Standard and Enterprise Editions. It is
supported only on POWER6 and POWER7 Systems and available for AIX 5.3, AIX 6.1,
and Linux enterprise distributions.
Notes:
There are cases where dedicated processors are more efficient than shared processors;
for example, with applications where the requirement is not on throughput but on execution
time of very short code. Also, consider the following:
Memory affinity of dedicated partitions compared to shared type partitions.
Guaranteed performance characteristics: the dedicated partition ceding idle cycles is
not at the mercy of other partitions in the shared processor pool. At higher CPU
utilization, there is no donation from the dedicated processor partition to the shared
processor pool.
The operating system might have optimizations for dedicated processor partitions (for
example, some VMM tuning parameters related to the memory affinity that apply only to
dedicated logical partitions).
While the partition is running, idle cycles are donated to the pool.
Notes:
You can apply the partition profile settings to the logical partition by activating the logical
partition using this partition profile. You can also directly change how the logical partition
shares dedicated processors by changing the logical partition properties. Direct changes to
the logical partition properties take effect immediately.
Instructor notes:
Purpose Describe how to configure shared dedicated processors.
Details
Additional information
Transition statement We can also check the partition's donating mode by using the
HMC command-line interface.
lpar2,share_idle_procs_active
HMC command-line terminology:
Sharing mode is sharing idle processors when partition is shut down.
Donor mode is sharing idle processors when partition is active.
SHARING_MODE value   Description
keep_idle_procs      Disable sharing mode and donor mode
share_idle_procs     Enable sharing mode
Notes:
You can view the sharing mode for a partition in three places on the HMC: from the HMC
command line, from the Utilization Data GUI application, and from the partition's properties.
The first two use the terminology shown in the table in the visual to distinguish which mode,
or combination of modes, the partition is using.
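As a sketch of the command-line view (the managed system name sys1 is a placeholder, and the curr_sharing_mode attribute name should be verified against your HMC level), the sharing mode of the dedicated partitions can be listed with something like the following, producing output of the form shown in the visual:
# lshwres -r proc -m sys1 --level lpar -F lpar_name,curr_sharing_mode
lpar2,share_idle_procs_active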
Instructor notes:
Purpose This visual shows how to understand the terminology used for the different
shared dedicated processor modes and to use the HMC command to view the current
setting.
Details
Additional information
Transition statement The donating mode can also be seen in AIX monitoring tools.
Working with sharing/donor mode from CLI (1 of 2)
The sharing/donor mode attribute can be set from the command line
when creating the profile.
# mksyscfg -r prof -m system12 -i
"name=prof2,lpar_name=lpar1,min_mem=256,desired_mem=1024,
max_mem=1024,proc_mode=ded,min_procs=1,desired_procs=1,
max_procs=1,sharing_mode=keep_idle_procs"
Notes:
When creating a logical partition profile, the sharing/donor mode can be specified. The
example above shows how to create the profile (prof2) for lpar1 with
sharing_mode=keep_idle_procs.
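An existing profile can be updated in a similar way with chsyscfg; this is a sketch only, reusing the system and profile names from the example above and assuming the share_idle_procs_always value supported on POWER6 and later:
# chsyscfg -r prof -m system12 -i "name=prof2,lpar_name=lpar1,sharing_mode=share_idle_procs_always"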
Instructor notes:
Purpose To show how to specify the sharing/donor mode when creating or editing a
partition profile.
Details
Additional information
Transition statement The sharing/donor mode of a partition can be changed using the
chhwres HMC command.
Working with sharing/donor mode from CLI (2 of 2)
Sharing/donor mode attribute of a partition can be changed
from the command line.
# chhwres -m costieres -r proc -o s -a
"sharing_mode=share_idle_procs_active" -p lpar1
Notes:
The sharing/donor mode attribute of the partition can be changed even when the partition is
running. The logical partition profile attribute overrides the sharing/donor mode attribute of
the partition (inactive partition) when the partition is activated.
Instructor notes:
Purpose The sharing/donor mode of a partition can be changed using the HMC CLI
command.
Details The change in the partition properties takes effect immediately; there is no need
to shut down and restart the logical partition.
Additional information
Transition statement AIX monitoring tools provide information about the partition's
donating mode.
Notes:
The visual shows the lparstat -i command output reporting the donating mode in a
dedicated partition. For dedicated partitions not donating idle cycles, the mode would be
Capped.
Instructor notes:
Purpose The lparstat command shows the partition is set to donating mode.
Details
Additional information
Transition statement Let's look at some more tools that show the donating mode.
Proc0
cpu0 cpu1
0.20% 0.17%
Notes:
mpstat command
If simultaneous multithreading is enabled, and a dedicated processor partition is configured
to donate cycles, the mpstat -s command displays logical processors usage as shown in
the visual above. In the example, logical processor cpu0 is 0.20% busy and logical
processor cpu1 is 0.17%. cpu0 and cpu1 are hardware threads for the physical processor
proc0.
When a dedicated processor partition is not donating cycles, the logical processors
percentages would add up to 100% because they are entirely dedicated to that partition,
and the mode is listed as Capped.
mode shows Donating. If a dedicated partition is not donating cycles, the mode is listed as
Capped.
The physc field shows the amount of physical processor consumed. You might see a
change in the physc distribution once the partition starts donating cycles if simultaneous
multithreading is enabled. The example sar -P ALL 2 1 output shown in the visual above
is from a shared dedicated partition that is not actively donating cycles. It has one
dedicated processor with simultaneous multithreading enabled. When this same partition
was donating all of its excess cycles, the command's output was:
AIX lpar1 3 5 00C958AF4C00 09/07/07
Notes:
From the window that displays the utilization events, select a Utilization Sample event
type. From this periodic utilization sample, you can select the information to display by
using the View menu. The view option related to the sharing mode is:
Partitions: Displays information about the processor and memory utilization on each
logical partition in the managed system.
The Partition Processor Utilization sample window shows lpar1 with a sharing mode of
"when active", which means that lpar1 can donate idle CPU cycles to the shared pool when
it is active.
[Figure: four utilization charts (1-4) for the donating scenario described below. The stacked series include 1 Core Dedicated, Wasted Dedicated, 0.5 Uncapped 1, and 0.5 Uncapped 2.]
Notes:
Scenario description
1. Consider a two-core Power System with a dedicated partition having one physical cpu
assigned. A variable workload is running (between 0% and 100%).
2. The excess capacity of the dedicated processor is wasted.
3. Consider adding two evenly weighted shared uncapped partitions with a capacity
entitlement of 0.5 with a desired number of virtual processor = 1. The two uncapped
partitions are CPU-bound. Each uncapped partition shares the remaining physical
processor even though each can consume an entire physical processor (desired
#VP=1).
4. With the shared dedicated processors PowerVM feature, a dedicated partition donates
its excess cycles to the uncapped partitions. Each uncapped partition consumes an
entire processor if available (when the dedicated partition consumption is at 0%) and
shares a processor when the dedicated partition is fully utilized (when the dedicated
partition consumption is at 100%). The total processor capacity in the system is better
utilized while the dedicated processor partition maintains the performance
characteristics and predictability of the dedicated environment when
resource-constrained.
Instructor notes:
Purpose Explain how this feature works. Most of the information is in the student notes.
Details When the dedicated processor's utilization is equal to or greater than 80
percent, AIX stops the donation to preserve the level of performance of the dedicated
partition. Giving the processor to another partition at this point could risk invalidating the
cache, and can have an adverse effect on the performance of the dedicated partition.
Additional information
Transition statement Let's see how the different metrics in the lparstat and mpstat
commands relate to the donation of idle cycles.
Notes:
Some new metrics are added when a dedicated type partition is in donating mode.
physc column
While the %user, %kernel, %wait, and %idle columns stay relative to partition capacity, a
new physc column shows the actual physical processor consumption. The physc statistics
were displayed only when the partition type was shared on POWER5. It is now available for
dedicated LPARs on POWER6 and POWER7.
%idon and %bdon columns
Two new columns, %idon and %bdon, are related to the donated cycles.
%idon:
Shows the percentage of physical processor used while explicitly donating idle cycles.
This metric is applicable only for donating dedicated partitions.
%bdon:
Shows the percentage of physical processor used while busy cycles are being donated.
This metric is applicable only for donating dedicated partitions.
This donation occurs when the dedicated partition calls hcede and when the donation
is allowed. This time is seen as %idon.
It is also possible (but rather rare) to give away some cycles when busy. This can happen
when the partition is blocked because all of its logical processors are in the hypervisor
waiting for a page fault resolution. In this case, the POWER Hypervisor (PHYP) can force a
donation. This time appears as %bdon.
There are also stolen cycles columns in the performance monitoring commands lparstat
and mpstat. These cycles are stolen by the POWER Hypervisor from a dedicated partition
to run maintenance tasks (hypervisor overhead). This can happen whether or not donation
is enabled.
Processor folding
Processor folding can be enabled to improve the amount of
cycles that are donated.
New schedo parameter vpm_fold_policy
Enables or disables the AIX processor folding feature depending on the partition
type
There are 3 bits in vpm_fold_policy to control processor folding.
> Bit 0 (0x1): When set to 1, this bit indicates processor folding is enabled if the
partition is using shared processors.
> Bit 1 (0x2): When set to 1, this bit indicates processor folding is enabled if the
partition is using dedicated processors.
> Bit 2 (0x4): When set to 1, this bit disables the automatic setting of
processor folding when the partition is in static power-saving mode.
These bit values can be combined to form the desired value.
vpm_fold_policy=1 is the default value
Notes:
The processor folding feature has been available on shared processors since AIX 5.3 ML3
and has helped provide optimal virtual CPU scheduling. This feature is now available for
shared dedicated processors.
The tuning of this scheduling algorithm can be done using the AIX schedo command. The
value of the parameter vpm_xvcpus specifies the number of virtual processors to enable in
addition to the virtual processors required to satisfy the workload.
A new schedo parameter, vpm_fold_policy, affects the virtual processor management
feature of processor folding in a logical partition. The processor folding feature can be
enabled or disabled based on whether a partition has shared or dedicated processors.
When the partition is in static power-saving mode, processor folding is automatically
enabled for both shared and dedicated processor partitions.
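As an illustration of combining the bit values (the chosen value is only an example), setting bits 0 and 1 (1 + 2 = 3) enables folding for both shared and dedicated processor partitions:
# schedo -p -o vpm_fold_policy=3
The -p flag makes the setting apply to the running system and persist across reboots; omit it to change only the current value.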
Instructor notes:
Purpose Describe the new schedo parameter vpm_fold_policy.
Details The virtual processor folding feature on shared processor partitions is well
known and has been implemented in AIX since the end of 2005. Dedicated processor
partitions can also activate this feature to optimize the idle cycles given to the shared
processor pool.
Additional information
Transition statement Let's see an example of how we could maximize the idle cycles
donated to the shared processor pool.
Processor folding: Maximizing the idle capacity (1 of 2)
Example: Looking at the dedicated partition CPU activity output of
topas -L
vpm_fold_policy = 0
Interval: 2 Logical Partition: Thu Feb 7 13:24:44 2008
Donating SMT OFF Online Memory: 40960.0
Partition CPU Utilization Online Logical CPUs: 4
%user %sys %wait %idle %hypv hcalls %istl %bstl %idon %bdon vcsw
0 2 0 52 0.0 6025 0.0 0.0 26.3 0.0 2181
=========================================================
LCPU minpf majpf intr csw icsw runq lpa scalls usr sys _wt idl All four
Cpu0 2805 0 119 51 311 0 99 222 1 49 0 50 physical CPUs
Cpu1 2806 0 117 59 422 0 99 173 0 44 0 56 busy
Cpu2 2803 0 107 39 737 0 97 124 0 55 0 44
Cpu3 2803 0 107 39 737 0 97 124 0 43 0 57
Notes:
Here is an example showing the impact on the amount of idle processing cycles available
in the shared processor pool when the processor folding feature is enabled or disabled in a
dedicated partition running in donating mode.
Consider two partitions. The first one is a dedicated partition running in donating mode with
4 physical processors assigned to it. The schedo parameter vpm_fold_policy is set to 0,
meaning that we disabled the folding policy.
The second partition is a shared-type partition running in uncapped mode, with an
entitlement of 4.0 and four virtual processors configured.
We have started a multi-threaded application (which means that many threads are
running) on the dedicated partition. Looking at the topas -L output, we can see that all four
processors are loaded at about 50%.
Looking at the lparstat command output on the shared partition, we see the amount of idle
processing cycles available in the shared processor pool. This value can be seen in the
app column and is approximately 5.10.
Instructor notes:
Purpose Discuss the scenario in which a dedicated partition donates idle cycles to the
shared processor pool, while the processor folding feature is disabled.
Details Point out the activity is dispatched on the four processors.
Additional information
Transition statement Let's see what happens when we activate processor folding.
Processor folding: Maximizing the idle capacity (2 of 2)
Example: Looking at the dedicated partition CPU activity output of
topas -L
vpm_fold_policy = 2
Interval: 2 Logical Partition Thu Feb 7 13:24:44 2008
Donating SMT OFF Online Memory:40960.0
Partition CPU Utilization Online Logical CPUs: 4
%user %sys %wait %idle %hypv hcalls %istl %bstl %idon %bdon vcsw
0 2 0 52 0.0 6025 0.0 0.0 26.3 0.0 2181
=========================================================
LCPU minpf majpf intr csw icsw runq lpa scalls usr sys _wt idl Only three
Cpu0 1290 0 119 51 311 0 99 222 1 58 0 41 physical CPUs
Cpu1 0 0 117 0 0 0 0 0 0 0 0 100 busy
Cpu2 2803 0 107 39 737 0 97 124 0 56 0 44
Cpu3 2803 0 107 39 737 0 97 124 0 43 0 57
Notes:
On the dedicated partition, the folding mechanism is enabled by setting the processor
folding policy to 2 (schedo -o vpm_fold_policy=2).
Notice in the topas -L command output that only three of the four processors are busy.
One processor has no thread dispatched on it. The folding mechanism estimated that only
three processors were enough to run the application and folded one processor.
Looking at the lparstat command output on the shared partition, you can see the amount
of idle cycles in the pool increased (5.60 average). The previous amount of idle cycles
available was about 5.10.
Instructor notes:
Purpose
Details
Additional information
Transition statement It's time for a checkpoint.
Checkpoint
1. True or False: Dedicated processors can be shared only if
they are idle.
Notes:
Instructor notes:
Purpose
Details
Checkpoint solutions
1. True or False: Dedicated processors can be shared only if they are
idle.
The answer is true.
2. True or False: Only uncapped partitions can use idle cycles donated
by the dedicated processors.
The answer is true.
Additional information
Transition statement
Topic 1: Summary
Having completed this topic, you should be able to:
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement Let's see Topic 2.
Notes:
Instructor notes:
Purpose Give an overview of multiple shared processor pools (MSPPs).
Details
Additional information
Transition statement Let's discuss the multiple shared processor pools feature.
Notes:
Multiple shared processor pools (MSPPs)
Multiple shared processor pools are supported by POWER6 and POWER7 processor-
based systems. This allows the system administrator to create a set of micro-partitions and
control the processor capacity consumed from the physical shared processor pool. Each
shared-processor pool has an associated entitled pool capacity, which is consumed by the
set of micro-partitions in that shared processor pool.
This feature allows for automatic, non-disruptive balancing of processing power between
partitions assigned to the shared pools. The result is increased throughput and potentially a
reduction of processor-based software licensing costs. This feature is licensed through
PowerVM Standard or Enterprise Edition along with a POWER6 or POWER7
processor-based server.
Default shared processor pool
All IBM Power Systems support the multiple shared processor pools capability and have a
minimum of one (the default) shared processor pool and up to a maximum of 64 shared
processor pools.
Instructor notes:
Purpose Give an overview of MSPPs.
Details Introduce the default shared processor pool feature and its purpose.
Additional information
Transition statement Let's discuss the details of the multiple shared processor pools
feature.
Notes:
Shared processor pools
A shared processor pool is primarily for the purpose of controlling the processor capacity
that micro-partitions can consume from the physical shared processor pool.
The set of micro-partitions form a unit through which processor capacity from the physical
shared-processor pool can be managed.
Each shared processor pool has a maximum capacity associated with it. This defines the
upper boundary of the processor capacity that can be utilized by the set of micro-partitions
in the shared processor pool.
Physical shared processor pool
The physical shared processor pool is a set of physical processors used to run a set of
micro-partitions. There is a maximum of one physical shared processor pool on IBM Power
Systems. All active physical processors are part of the physical-processor pool unless they
are assigned to a dedicated-processor partition where:
The LPAR is active and is not capable of capacity donation, or
The LPAR is inactive (powered-off) and the systems administrator has chosen not to
make the processors available for shared processor work.
Instructor notes:
Purpose Describe the terminology and concepts of multiple shared processor pools.
Details Discuss the physical shared processor pool and the maximum pool capacity.
Additional information
Transition statement Let's see an example of how the multiple shared processor pools
can affect licensing.
Only license the relevant software based on shared pool Max Cap.
DB2 cores to license:
One from dedicated partition n2 plus five from pool # 1 = 6
WebSphere cores to license:
Six from pool #2 = 6
Notes:
In this example, the system has 12 CPUs and is configured with eight LPARs; three
dedicated-processor and five shared-processor LPARs. Two of the shared-processor
LPARs are assigned to pool #1 and the other three are assigned to pool #2.
The shared processor pool limits must be whole numbers. All are competing for the same
physical CPUs. The pools have equal priority, and there is not a way to change this. The
mechanism is simply limiting how many excess CPU cycles or how much excess capacity
a group of LPARs can use.
If we look at the DB2 cores to license, shared processor pool 1 has a max capacity value
of 5, thus limiting the maximum processor cycles that can be consumed to five processors.
This reduces the DB2 licensing needed for partitions n5 and n6 to five licenses instead of
nine (without using an additional shared processor pool, we would have needed nine DB2
licenses to match the number of shared processors in the pool).
Instructor notes:
Purpose
Details
Additional information This mechanism does not affect the processor affinity logic.
Transition statement The next slide compares a traditional CPU consumption of a
shared partition in the default pool with a shared partition in a user-defined shared pool.
[Figure: two pool utilization charts. The first shows Partition1 and Partition2 in the default SPP; the second shows Partition1 moved to SPP1, whose Max Cap limits its consumption.]
Notes:
The first image depicts the utilization of the default physical shared processor pool. This
shows the pool utilization for two partitions with very demanding applications. The partitions
are uncapped and totally consuming all available CPU resources.
In the second image, Partition1 was moved to a separate Shared Processor Pool (SPP1)
with a defined maximum capacity value. This is possible with the new virtual shared
processor pools. This value will limit the amount of available CPU resources usable by the
partitions assigned to the pool. This is a way to set a cap for uncapped LPARs.
Instructor notes:
Purpose Discuss how the max capacity value limits the overall CPU consumption inside
a user-defined shared pool.
Details
Additional information
Transition statement The following slides detail the CPU usage in a user-defined
shared pool.
CPU usage in a user-defined shared processor pool
[Figure: within the pool's maximum capacity, the entitled pool capacity consists of the assigned partitions' capacity entitlements plus an optional reserved capacity; the rest of the maximum capacity is the remaining headroom.]
Notes:
Instructor notes:
Purpose Introduce the different concepts: The maximum pool capacity and reserved
pool capacity.
Details Inside a shared pool, only uncapped partitions can use the reserved pool
capacity. Each pool is a separate entity where cycles are ceded and then redistributed to
the logical partitions inside the pool. This is level 0 of capacity resolution.
Additional information
Transition statement The two levels of capacity resolution are discussed in the next
slide.
[Figure: two levels of capacity resolution. Capped and uncapped micro-partitions are resolved within SPP 0 and SPP 1, and unused cycles are then resolved between all pools across the physical processors.]
Notes:
Instructor notes:
Purpose Introduce the two levels of capacity resolution.
Details If the logical partitions inside a shared pool do not consume all their cycles, or
even if cycles from the reserved pool capacity are not used, then these cycles can be given
to other partitions in other shared pools. The redistribution of these cycles is done by the
POWER Hypervisor, taking into account the weights of all the partitions in all the shared
pools. There is no weight associated with a specific shared pool.
Additional information
Transition statement Let's see the requirements to configure multiple shared
processor pools.
Notes:
If your managed system's firmware is not at least at the 01EM320_31_31 level (as shown
in the figure), it cannot have more than one shared processor pool. You can download the
required firmware from http://www14.software.ibm.com/webapp/set2/firmware/gjsn.
Instructor notes:
Purpose Point out the minimum firmware level required for configuring MSPPs.
Details
Additional information
Transition statement The next series of slides show how to configure MSPPs.
Up to 64 pools
Default pool is pool 0
Notes:
The managed system's Properties is the best place to verify that the system supports
multiple shared pools.
The maximum number of shared processor pools has a value of 64 if your system is
capable, but has a value of 1 if your system is not.
The maximum capacity of the pool is the maximum number of processing units available to
this LPAR's shared processor pool. Reserved entitled capacity of the pool is the number of
processing units that this LPAR's shared processor pool is entitled to receive.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
If the managed system supports multiple virtual shared pools, the Maximum number of
shared processor pools is 64 (instead of 1). Also, the Partition Processor Usage includes a
Shared Processor Pool (ID) column.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Change attributes of shared processor pools (1 of 3)
Select the managed system.
Notes:
From the HMC GUI, you must select Shared Processor Pool Management to configure the
shared pool attribute values. From here, you could also assign an LPAR to a specific
shared processor pool.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Change attributes of shared processor pools (2 of 3)
Notes:
The 64 shared processor pools are already defined on your managed system, but only the
pool id 0 (the default pool) is activated.
To change the Reserve Processing Units or the Maximum Processing Units, click the link
associated with the pool name.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Change attributes of shared processor pools (3 of 3)
Specify the Maximum processing units value and optionally the Reserved processing units value.
Notes:
The Maximum processing units value must be a whole number. Setting the Reserved processing units value is optional when activating a user-defined shared processor pool.
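The same attributes can also be changed from the HMC command line with the chhwres command and the procpool resource type. The following is only a sketch: the managed system name (sys1), the pool name (SharedPool01), the way the pool is identified, and the attribute names are assumptions to verify against the chhwres man page on your HMC level.
chhwres -r procpool -m sys1 -o s --poolname SharedPool01 -a "max_pool_proc_units=2,reserved_pool_proc_units=0.5"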
Instructor notes:
Purpose
Details
Additional information
Transition statement You can dynamically assign an LPAR to a new shared pool.
Notes:
The LPAR's pool assignment can be changed dynamically. Under Manage Shared Processor Pools, select the Partitions tab and click the link associated with the LPAR. You then see an Assign Partition to a Pool dialog box. Select the pool from the Pool Name pull-down list.
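For scripted environments, the same reassignment can be attempted from the HMC CLI with a dynamic chhwres operation. This is a hedged sketch only: the attribute name (shared_proc_pool_name), the partition name (lpar7), and the pool name are assumptions that should be verified against the chhwres man page on your HMC level.
chhwres -r proc -m sys1 -o s -p lpar7 -a "shared_proc_pool_name=SharedPool01"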
Instructor notes:
Purpose
Details
Additional information
Transition statement The next few visuals show how to view and monitor the multiple
shared processor pools.
Notes:
The lparstat -i command shows Pool ID, Maximum Capacity of Pool, and Entitled
Capacity of Pool.
The following are the new shared processor pool fields in the lparstat command output:
Shared Pool ID: Identifier of the shared pool of physical processors that this LPAR is a member of.
Maximum Capacity of Pool: This is the maximum number of processing units available to this LPAR's shared processor pool.
Entitled Capacity of Pool: This is the number of processing units that this LPAR's shared processor pool is entitled to receive.
Active CPUs in Pool: This is the maximum number of CPUs available to this LPAR's shared processor pool.
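On an AIX partition, a quick way to pull out just these fields is to filter the lparstat -i output. The values below are placeholders for illustration only:
# lparstat -i | grep -i pool
Shared Pool ID                             : 1
Maximum Capacity of Pool                   : 200
Entitled Capacity of Pool                  : 150
Active CPUs in Pool                        : 2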
Instructor notes:
Purpose To identify the lparstat command output changes.
Details
Additional information
Transition statement Let's view the shared pool configuration from the HMC CLI.
Notes:
Using the HMC CLI, you can list the different shared processor pools with the logical partitions assigned to them. The example shows the default shared pool with LPAR IDs 6, 3, 2, 1, and 4 assigned to it, and then the shared processor pool with ID 1 with LPAR ID 7 assigned to it.
The second lshwres command output shows that the maximum number of configurable shared processor pools is 64.
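As a hedged illustration of the HMC CLI commands behind this example (the managed system name sys1 and the pool name are placeholders, and the --filter and --level options should be verified against the lshwres man page on your HMC level):
lshwres -r procpool -m sys1
lshwres -r procpool -m sys1 --filter "pool_names=SharedPool01"
lshwres -r proc -m sys1 --level sys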
Instructor notes:
Purpose To identify the lparstat command output changes.
Details
Additional information
Transition statement topas can be used to monitor multiple shared processor pools on
managed systems.
Notes:
Instructor notes:
Purpose To examine the topas command changes.
Details
Additional information
Transition statement HMC utilization data also reports the shared processor pool utilization.
Shared processor pools and utilization %
Notes:
From the window that displays the utilization events, select a Utilization Sample event
type. From this periodic utilization sample, you can select the information to display by
using the View menu. The two possible View options related to the processor pools are:
Physical Processor Pool, which displays information about the total processor utilization
within all shared processor pools on the managed system.
Shared Processor Pool, which displays information about the processor utilization
within each configured shared processor pool on the managed system.
The Shared Processor Pool Utilization window example shows two shared processor
pools. The SharedPool01 has 1 processing unit assigned (the max cap value is set to 1).
The processor utilization is about 100%, which means that the LPARs running in that
shared pool consume all of the processor capacity assigned.
The Physical Processor Pool Utilization window shows a Processing Unit value of three. If
you look at the Configurable processing units value in the System Utilization window, you
will see a value of four (we have four physical processors in the managed system). The
physical processor pool contains only three processors because one physical processor
has been assigned to a dedicated partition.
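The same utilization samples can also be retrieved from the HMC command line with lslparutil. This is only a sketch; the resource type names (pool and procpool) and any filtering options should be verified against the lslparutil man page for your HMC level:
lslparutil -m sys1 -r pool
lslparutil -m sys1 -r procpool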
Instructor notes:
Purpose
Details
Additional information
Transition statement It's time for a checkpoint.
Checkpoint
1. True or False: Each shared processor pool has a maximum
capacity associated with it.
Notes:
Instructor notes:
Purpose
Details
Checkpoint solutions
1. True or False: Each shared processor pool has a maximum
capacity associated with it.
The answer is true.
2. True or False: The default shared processor pool does not
have a number.
The answer is false (default shared pool ID = 0).
3. What is the default value of the reserved pool capacity for a
shared processor pool?
The answer is the default value is 0.
Additional information
Transition statement
Topic 2: Summary
Having completed this topic, you should be able to:
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Exercise
Unit exercise
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Unit summary
Having completed this unit, you should be able to:
Discuss the details associated with the following IBM Power
Systems features:
Dedicated shared processors running in donating mode
Multiple shared processor pools (MSPPs)
Discuss how these features can improve processor resource
utilization
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Estimated time
01:30
References
POW03026USEN.pdf: PowerVM Active Memory Sharing: An Overview
REDP-4470: PowerVM Virtualization Active Memory Sharing
Unit objectives
After completing this unit, you should be able to:
Describe the Active Memory Sharing concepts and
components
Describe the POWER Hypervisor paging activity
Create a shared memory pool
Create and manage the AMS paging space devices
Create and activate a shared memory partition
Describe the Virtual I/O Server virtual devices involved in AMS
Monitor the shared memory partition using the AIX
performance tools vmstat, lparstat, and svmon
Monitor the shared memory pool usage using data utilization
from the HMC
Notes:
These are the unit objectives.
Notes:
As the number of CPUs available on a system increases and CPU virtualization techniques
such as Power Micro-Partitioning allow better utilization of processors across partitions, the
role of memory management in server virtualization is becoming more important.
IBM Power Virtualization Manager (IBM PowerVM) Active Memory Sharing (AMS)
technology takes PowerVM virtualization to a new level of consolidation and virtualization
by optimizing memory utilization. AMS intelligently shares memory by dynamically moving
it from one partition to another on demand, thereby optimizing memory utilization and
allowing flexibility of memory usage.
Because memory utilization can be linked to processor utilization, this function
complements shared processors very well. Systems with low CPU requirements are very
likely to have low memory residency requirements as well.
Instructor notes:
Purpose Introduce the PowerVM Active Memory sharing concept.
Details The memory allocation is done at the memory page level (4 KB pages) instead of at the LMB size used by dynamic LPAR operations to add or remove memory from partitions. These 4 KB page movements are done automatically by AMS.
Additional information
Transition statement
(Figure: POWER Hypervisor and physical memory)
Notes:
Notes:
The virtualization of the real memory through PowerVM Active memory sharing allows a
customer to activate several partitions at the same time, in a system with less physical
memory than the sum of all the partitions logical address spaces. The main benefit here is
the overall memory utilization, as opposed to increased performance.
The Active Memory Sharing feature is most beneficial for environments where logical
partitions have low average memory residency requirements, such as university
environments.
This function is not suitable for environments running high-performance applications
because these tend to have very high memory residency requirements.
AMS might not be a perfect solution for workloads that have high quality of service criteria
and predictable performance. However, AMS is an appropriate solution for time-variant
(around the world) applications, workloads that have variant load levels, file and print
servers, and other workloads that are tolerant to memory access latencies.
Instructor notes:
Purpose Give some reasons to use AMS and some benefits it provides.
Details Workloads running in a partition have a working set in memory that shrinks or ages when the partition's activity level goes down. In either case, the freed or aged pages do not get used by other partitions. Also, customer workloads have high peaks only infrequently, so
memory utilization levels are frequently low. AMS optimizes memory utilization by sharing
physical memory across multiple partitions, so memory pages that are not actively used by
one partition are given to another partition. Therefore, through consolidation of workloads
that do not peak concurrently, overall memory utilization levels can be increased in a
system.
AMS also enhances a customer's ability to optimize the CPU and I/O resources along with
memory, as it allows them to consolidate a larger number of partitions on a system. Until
now, physical memory has limited the number of partitions a customer can configure on
IBM Power systems.
Additional information
Transition statement
Notes:
These are the system, operating system, and logical partition requirements when
implementing AMS. AMS is available with PowerVM Enterprise edition, and the per-core
pricing remains unchanged.
Instructor notes:
Purpose List of requirements for AMS.
Details
Additional information
Transition statement
Notes:
At the time of writing, only one shared memory pool can be created per managed system,
and up to 128 shared memory partitions can be created. Shared memory partitions must be
created as the shared processor type, and all of the I/O adapters must be virtualized
through a virtual I/O server.
Also, pages larger than 4 KB and the Barrier Synchronization Register (BSR) are not supported with AMS.
Instructor notes:
Purpose Point out the actual restrictions you might encounter when implementing AMS.
Details The objective of AMS is to increase the number of partitions that can be
supported on a single CEC by sharing the limited resources such as I/O devices, memory,
and CPUs.
Also, supporting physical adapters requires changes in the physical adapter's device driver
to function properly in a shared memory environment. In AMS, the hypervisor guarantees a certain amount of memory to be available for I/O memory mapping operations, to make sure DMA operations proceed without delays. The partition's I/O entitled memory represents the maximum amount of memory device drivers can I/O map. The partition's I/O entitled memory is communicated to the OS at boot time. The OS manages this memory and distributes it across devices. Device drivers must be aware of this value because I/O mapping operations might fail if all the I/O entitled memory has already been used.
To support physical adapters, IBM would have to work with the adapter providers to get the
changes required to support AMS in the device drivers.
Additional information
Transition statement Lets overview the AMS components.
(Figure: AMS components: a paging VIOS with VASI, vSCSI server, and FC adapters; a dedicated memory partition; and shared memory partitions 1 through 4, each running the CMM.)
Notes:
This visual shows the different components of PowerVM Active Memory Sharing.
Shared memory pool: The shared memory pool is a configured collection of physical memory units managed by the PowerVM AMS manager. This shared pool holds the memory-resident pages (the physical memory pages that a partition is actively using) of all the active shared memory partitions in a system. The system administrator determines the amount of physical memory allocated to the shared pool, in multiples of logical memory blocks (LMBs).
Shared memory partition: A partition that is associated with a shared memory pool.
Active Memory Sharing Manager (AMSM): The Active Memory Sharing Manager
(AMSM) is a hypervisor component that manages the shared memory pool and the
memory of the partitions associated with the shared memory pool. The AMSM allocates the
physical memory blocks that comprise the shared memory pool.
Collaborative Memory Manager (CMM): CMM is an operating system (kernel) feature
that gives hints on memory page usage (active, inactive, critical, and so on). The PowerVM
hypervisor uses this to select good victim pages to manage the physical memory of a
shared memory partition.
CMM allows an OS to page out the aged page contents, even when the working set is within its configured memory limit, and loan pages to the hypervisor to use when expanding another shared memory partition's physical memory usage.
VIO Server Paging partition: This partition is needed not only for AMS paging but also for the shared memory partitions' I/O hosting.
Paging space devices: This is an area of non-volatile storage used to hold portions of a shared memory partition's logical memory that are not resident in the shared memory pool. The paging space is allocated in a paging space device assigned to the shared memory pool. This paging space device can be a logical drive (an hdisk in AIX) or a logical volume.
VASI: VASI stands for virtual asynchronous services interface. The VASI receives page in
and page out requests from the Active Memory Sharing Manager.
Notes:
The virtualization control point has a front end that provides the interface to the system
administrator and a back end that communicates with the rest of the firmware and software
components.
The front end is provided by the HMC or IVM and provides administrator functions to define
a shared memory pool, create shared memory partitions, specify shared memory pool
parameters for shared memory partitions, and manage the shared memory partitions.
The virtualization control point communicates with other major elements to manage the
shared memory pool function. It has a paging space partition interface to manage paging
spaces and an Active Shared Memory Manager (ASMM) interface to manage the different
shared memory partitions.
Paging devices
Notes:
The Active Memory Manager is a component of the POWER Hypervisor firmware that runs
on the managed system and manages the physical memory of the AMS pool. This
management is based on partition configuration parameters, such as entitled memory,
memory capacity weight, and the partition's workload.
The primary purpose of the Active Memory Sharing Manager is to select which partition
pages are kept resident in physical memory at any point and to move partition pages in and
out of the system to a paging space device with the help of a specialized VIOS partition.
When a page fault occurs (this page fault is transparent to the OS), the AMSM allocates
free pages to logical partitions.
Notes:
The Active Memory Sharing Manager also keeps a list of free physical pages that are
assigned to the shared memory partitions as needed. When a page fault occurs, the AMSM
assigns a free page to handle that page fault. This is done until the AMSM free list reaches
a low water mark. At that point, the Active Memory Sharing Manager takes memory from
other partitions, using the page loaning mechanism if other partitions cooperate by loaning
pages to the Hypervisor, or through page stealing, which takes place when the partitions
are not cooperating. Page stealing is based upon the partition's:
Shared memory weight
Page usage status
Page usage statistics
The AMSM uses the hypervisor paging to try to keep all partition working sets (pages
needed by current workloads) resident in the shared memory pool in an over-committed
system.
Hypervisor page fault occurs when the partition wants to access its data that has been
paged out to disk.
Notes:
AMS is not transparent to the partitions' operating systems. AIX has been modified to
support Active Memory Sharing.
Device drivers support
An AMS-enabled OS distributes the partition's entitled memory among its various device drivers. Device drivers in turn handle failure of I/O map requests when the partition's entitlement is reached, and delayed request execution when a physical frame is not
immediately available.
Collaborative Memory Manager (CMM)
The shared memory partition provides the classification of page usage (page hints) to the
hypervisor for page stealing. This page classification is done by the Collaborative Memory Manager. The operating system can also assist the Active Memory Sharing Manager by providing page usage hints that identify page utilization. These could be unused pages that are good candidates for page stealing, active pages with contents that need to be preserved if they are stolen, or critical pages. The hypervisor never steals I/O-mapped (DMA) memory pages.
Notes:
The paging virtual I/O server is a component of Active Memory Sharing responsible for
paging in and out memory pages to service memory requests of the partitions.
When the Hypervisor wants to free memory pages in the shared memory pool, the content
of the memory pages to be freed must be stored on a paging device in order to be restored
later when the data is accessed again.
The Paging Virtual I/O server copies the content of a physical frame to the specific paging
device of the logical partition. The memory page that has been freed in the shared memory
pool can then be safely allocated to the demanding logical partition by the Active Shared
Memory Manager. If the logical partition wants to access the memory page that has been
paged out to the VIO paging device, an Active Memory Sharing page fault is raised. This
mechanism is completely transparent to the operating system.
Instructor notes:
Purpose
Details
Additional information
Transition statement Let's see what the I/O memory entitlement is.
AIX divides I/O memory entitlement into pools for each device.
I/O memory entitlement requirements vary by virtual adapter type.
Virtual device            Default I/O entitlement
Virtual SCSI              17 MB
Virtual Ethernet          60 MB
Virtual Fibre Channel     137 MB
Virtual Serial            0
Notes:
A portion of the shared memory pool has to be available for the I/O devices during I/O
operations.
I/O entitled memory is the maximum amount of physical memory (from the shared memory
pool) that is guaranteed to be available for I/O devices at any given time. If the minimum
amount of memory required by the device operation does not reside in the shared memory
pool, the device operation will fail.
I/O memory entitlement is required: When a partition is about to perform an I/O
operation (for example, disk read/writes or TCP/IP communications), it must ensure that a
portion of physical memory remains unmoved for the duration of these operations.
I/O entitlement default values
The HMC and IVM assign a default I/O entitlement memory value for each shared memory
partition. These default values are based on the number and type of I/O devices configured
for a typical partition and would work for all of the supported operating systems in most
cases. However, these values should be evaluated based on the workload, device
configuration, and adapter operations. The default I/O entitlement values for different
virtual adapter types are listed in the figure.
Instructor notes:
Purpose What is the I/O memory entitlement.
Details I/O entitled memory is not managed dynamically by the Hypervisor like logical
or physical memory is. The amount of I/O entitled memory allocated to the partition is
assigned by the HMC (or IVM), enforced by the Hypervisor, and obeyed by the partition
OS. Once assigned, this does not change without manual intervention. The amount of I/O
entitled memory automatically assigned to the partition by HMC/IVM is calculated using
default values for each type of virtual adapter configured in the partition. Depending on the
partition OS and workload, this default capacity might not provide acceptable performance.
For example, if a partition workload requires 64 MB of I/O capacity to complete its I/O operations in an acceptable amount of time and only 48 MB is assigned to the partition, the
partition will need to queue I/O requests. So, I/O throughput will probably not be sufficient
for the partition to complete its processing in the amount of time desired. The user can
manually override the defaults and assign more (or less) I/O entitled memory capacity to
the partition to achieve the desired partition I/O performance.
Additional information
Transition statement
(Figure: page usage classifications, including active pages and critical pages)
Notes:
Logical memory
In a shared memory logical partition, the memory that is assigned as a result of the
configured minimum, desired, and maximum values is known as the logical memory.
In an AMS environment, the partition's real physical memory in a shared memory partition becomes the partition's logical memory. The real physical memory is part of the AMS shared memory pool, which is virtualized by the hypervisor.
The partition's logical addresses can be mapped to any physical memory that is part of the shared pool. As a result, physical memory assigned to one partition at one time can be assigned to another partition at another time.
The logical memory is the quantity of memory that the operating system manages and can access. Logical memory pages that are in use can be backed by either physical memory or the pool's paging device.
The figure shows an example of logical to physical mapping made by the hypervisor at a
given time. The shared memory partition owns the logical memory and provides the
classification of page usage (page hints) to the hypervisor (this classification is done by the
collaborative memory manager component).
While I/O mapped pages are always assigned physical memory, all other pages can be
placed either in physical memory or on the paging device. Free and loaned pages have no
content from the shared memory partition's point of view, and are not copied to the paging
device.
(Figure: the Active Memory Sharing Manager (AMSM) in the hypervisor manages the AMS shared memory pool backed by physical memory; a Virtual I/O Server provides access to the paging devices for LPAR2 and LPAR3.)
Notes:
(Figure: a 16 GB partition viewed from 0 to 16 GB: working storage (9 GB, unavailable for loaning), file cache (4 GB, available to be loaned), and loaned pages (3 GB, which can be reclaimed).)
Notes:
Memory loaning
Active Memory Sharing uses a memory loaning concept. Memory loaning introduces a new
class of page frames, named loaned page frames. This memory loaning method is used to
respond to the hypervisor loan requests with memory frames that are least expensive (from
the OS point of view) to donate. Those loaned pages can then be used immediately as the load increases in another LPAR's operating system.
AIX collaborates with the hypervisor to help with hypervisor paging. In response to the
hypervisor requests, AIX checks once a second to determine if the hypervisor needs
memory. In the case where the hypervisor needs memory, AIX will free up logical memory
pages (which become loaned pages) and give them to the hypervisor. The policy to free up
logical memory is tunable through the vmo ams_loan_policy tunable in AIX.
The ams_loaning_policy value indicates the page loaning policy used by the Collaborative
Memory Manager (CMM).
Default (ams_loan_policy=1): Only loan file cache pages; do not page out to the OS paging space.
Aggressive (ams_loan_policy=2): Loan file cache and working storage pages; will page out to the OS paging space until paging space is low.
Off (ams_loan_policy=0): Disables any type of loaning.
Page loaning policy of 1: Default loaning
With the default loaning configuration, AIX first reduces the number of logical pages
assigned to the file cache and loans them to the hypervisor. When the Hypervisor needs to
reduce the number of physical memory pages assigned to the logical partition, it first
selects loaned pages and then selects free and used memory pages. The effect of loaning
is to reduce the number of hypervisor page faults because AIX reduces the number of
active logical pages and classifies them as loaned.
Page loaning policy of 2: Aggressive loaning
If page loaning is set to aggressive, AIX either reduces the file cache or frees additional
working storage pages by copying them into the AIX paging space. The number of loaned
pages is greater than the default loaning policy. The hypervisor uses the loaned pages and
might not need to perform any activity on its paging space. When AIX selects a working
storage page, it is first copied to the local paging space. This setup reduces the effort of the
hypervisor by moving paging activity to AIX. If the loaned pages are used pages, AIX has to
save the content to its paging space before loaning them to the hypervisor. This behavior
will especially occur if you have selected this aggressive loaning policy.
Page loaning policy of 0: Disabled
When page loaning is disabled, AIX stops adding pages in the loaning state, even if requested by the hypervisor. When the hypervisor needs to reduce the logical partition's memory footprint and free pages have already been selected, either file cache pages or working storage can be moved to the hypervisor's paging space.
Deciding on loan policy
Deciding on which loan policy to use depends on the configuration, consolidation factors,
and workloads. If aggressive loan policy is selected for a shared memory partition, then the
paging devices for that partition should be tuned. If AMS loan policy is disabled, only the
hypervisor paging devices should be tuned. OS paging is used in an AMS environment
only for loaning pages to the hypervisor. Therefore, if AMS loan policy is not enabled, the
OS paging device does not have to be optimized.
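For example, on an AIX shared memory partition the loaning policy can be inspected and changed with the vmo command. This is a sketch: the -L form displays the current value and range, the -p -o form changes it persistently, and the tunable is assumed to be dynamically changeable on your AIX level.
# vmo -L ams_loan_policy
# vmo -p -o ams_loan_policy=2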
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Because the applications running in your shared memory LPARs will exhibit varying
behaviors, you need to understand the different ways you can configure the memory
subscription ratio. The memory subscription ratio is determined by the level of physical
memory available and the logical memory needed. Autonomic memory configuration
decisions will be made based on this ratio. This attribute does not need to be set. The
subscription ratio is the result of your partition configuration.
Non overcommit
This subscription ratio means that the amount of physical memory available in the shared
pool is enough to cover the total configured logical memory of the shared memory
partitions. Because all the configured logical memory is backed by physical memory, this
mode does not provide any memory saving benefits.
Logical overcommit
This is the ratio of the logical memory in use to physical memory available in the shared
memory pool. The total logical configured memory can be higher than the physical
memory; however, the current memory working set can be backed by the physical shared memory pool. (Note: the working set almost never exceeds the physical memory.)
Applications that time multiplex are good candidates for this memory overcommit ratio. For
example, in AM/PM scenarios, peaks and valleys of multiple workloads overlap, leading to
logical overcommit levels without consuming more than the physical memory available in
the pool. Test and development environments also are good candidates.
Physical overcommit
This is a subscription ratio in which the sum of all shared memory partitions' logical memory
not only exceeds the physical memory in the shared pool, but the total actual memory
referenced (the working set) by all shared memory partitions exceeds the physical memory.
The working set memory of the shared memory partitions has to be backed by both the
physical memory in the shared pool and by the paging space devices.
Good candidates for this are applications that use a lot of AIX file cache or are less sensitive to I/O latency, such as file servers, print servers, and network applications.
Notes:
Here are the tasks to perform to set up a shared memory pool. Also included are tasks
required to create and activate shared memory LPARs.
Select the server.
Notes:
The figure shows how to manage the shared memory pool from the HMC. On the HMC,
select the managed system on which the shared memory pool should be created. Then
select Configuration > Virtual Resources > Memory Pool Management.
To get access to the memory pool management wizard, a virtual I/O server must be defined
and running on your managed system.
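The shared memory pool can also be created from the HMC command line rather than the GUI wizard. The following chhwres sketch is illustrative only; the managed system name, the paging VIOS name, the memory sizes (in MB), and the attribute names are assumptions to verify against the chhwres man page on your HMC level.
chhwres -r mempool -m sys1 -o a -a "pool_mem=8192,max_pool_mem=16384,paging_vios_names=vios1"
lshwres -r mempool -m sys1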
Notes:
This is the wizard to use to create a shared memory pool. If a pool is already configured,
the window will display the pool's configuration details.
Notes:
Notes:
The shared memory pool configuration requires the definition of a set of paging devices
that are used to store excess memory pages on temporary storage devices. Access to the
paging devices associated with a shared memory partition is provided by a Paging Virtual
I/O Server on the same system. At the time of pool creation, the Virtual I/O Server that will
provide paging service to the pool must be identified.
The panel shown in the figure is provided for selecting paging VIOS partitions. You can also
provide a second paging VIOS to provide a redundant path and higher availability to the
paging space devices.
Notes:
A paging device is required for each shared memory partition. The size of the paging
device must be bigger than or equal to the maximum logical memory defined in the partition
profile. The paging devices are owned by a virtual I/O server. A paging device can be a
logical volume or a whole physical disk. The disks can be local or provided by an external
storage subsystem through a SAN.
If you are using whole physical disks, there are no actions required other than making sure
the disks are configured and available on the virtual I/O server. If you are using logical
volumes, then you need to create them before proceeding with the pool management
wizard.
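For example, logical volumes can be created ahead of time from the VIOS restricted shell; a minimal sketch, assuming a volume group named rootvg with enough free space and one logical volume per shared memory partition (the names and sizes are placeholders):
$ mklv -lv amspg_lpar1 rootvg 8G
$ mklv -lv amspg_lpar2 rootvg 8G
$ lsvg -lv rootvg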
Click to select the devices.
Notes:
After selecting the VIOS paging partition, you can select the devices to be used as paging
devices for your logical partitions.
Filter the device type, then Refresh.
Notes:
To list devices in the VIOS device list at the bottom of the window, you must first select a
Device Type and then select Refresh. In our example, we selected Logical in the device
type selection, because we had previously created logical volumes to use as paging
devices for our logical partitions.
Once refreshed, the device list appears. You must select one paging device per shared
memory logical partition.
Paging space devices
Shared memory pool size
Summary before finish
Notes:
After the paging devices have been selected, click Next to get a summary of all the
selections that have been made. Review the selections and then select Finish to commit
and finish the creation of the shared memory pool.
The hypervisor uses some of the pool memory for its own administration purposes.
That is 256 MB, plus a small amount per shared memory partition.
Notes:
A working window shows the memory pool creation, but you do not receive any specific
message stating the shared memory pool has been successfully created.
The POWER Hypervisor uses memory from the shared memory pool for its own
administration purpose. For each 16GB of memory assigned in the shared memory pool,
the Hypervisor uses 256MB. Looking at the shared memory pool properties of your
managed system, you will notice the Available Pool Memory value is lower than the
maximum pool size. The Available Pool Memory value is the amount of physical system
memory available after subtracting the amount of memory that the server firmware uses to
manage the shared memory pool and the total amount of I/O entitled memory of the shared
memory partitions.
Notes:
The paging device selection is made when a shared memory partition is activated. The
assignment is based on the availability and the size of the maximum logical memory
configuration of the logical partition. When listing the attributes of the vrmpage devices, you
can see which paging device is used by each shared memory partition.
In the following lsdev command output, we can see that partition ID 7 is using the paging
device named paginglpar1.
$ lsdev -dev vrmpage0 -attr
attribute         value                    description                   user_settable
LogicalUnitAddr   0x8100000000000000       Logical Unit Address          False
aix_tdev          paginglpar1              Target Device Name            False
partition_id      7                        Client Partition ID           False
redundant_usage   no                       Redundant Usage               True
storage_pool                               Storage Pool                  False
vasi_drc_name     U8203.E4A.65D8032-V1-C3  VASI DRC Name                 True
vrm_state         active                   Virtual Real Memory State     True
vtd_handle        0x200001207f9b0297       Virtual Target Device Handle  False
(Figure 4-29. Virtual I/O Server virtual devices for AMS (1 of 2): the pager and VBSD drivers in the paging VIOS, and the client LPAR.)
Notes:
This figure shows the different drivers involved in the VIOS paging space partition.
VASI
- Accepts commands from the hypervisor and forwards them to appropriate kernel
extensions in the VIOS. The kernel extensions are responsible for executing the
commands on behalf of firmware.
Pager
- A kernel extension that is responsible for satisfying paging requests from firmware.
The paging requests are sent to the pager through the VASI kernel extension. It is a
layer of code between the VASI kernel extension and the VBSD driver.
VBSD
- VBSD is an acronym for virtual block storage device. It is a kernel extension
providing an interface for managing and accessing storage volumes. This driver
manages I/O requests made by other kernel extensions, such as a pager, within the
VIOS partition.
Figure 4-30. Virtual I/O Server virtual devices for AMS (2 of 2) AN313.1
Notes:
After the shared memory pool is created, four new VASI and VBSD devices should be
visible on the Virtual I/O Server. In the figure above, you will see five VASI and VBSD
devices. This is because this virtual I/O server is also defined as a Mover Service Partition
for Live Partition Mobility for which one VASI and one VBSD device are required as well.
The virtualization control point dynamically adds VASI devices to a Virtual I/O Server in
order to enable the virtual I/O server to be a paging Virtual I/O Server.
The lsmap command can be used with the -ams option to list the characteristics of the
vrmpage device associated with your shared memory partition. The lsmap command
output in the visual shows the devices used by a particular communication stream and the
associated physical backing device that is the physical paging device.
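As a hedged example, the paging mappings can be listed on the paging VIOS as follows (the device name is a placeholder; the -ams option is the one referenced above):
$ lsmap -all -ams
$ lsdev -dev vrmpage0 -attr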
Notes:
Minimum, desired, and maximum values
When you activate a partition profile that uses shared memory, the managed system does
not commit a memory amount to the logical partition. The memory for the logical partition is
set to the 'Total assigned logical memory'. This is different from the dedicated memory
allocation. If the managed system does not have the assigned memory amount available,
but has at least the minimum memory amount available, the managed system activates the
logical partition with the memory that is available.
Memory weight
The Memory weight setting is one of the factors used by the hypervisor to determine which
shared memory partition should receive more memory from the shared memory pool. This
field displays the relative value that is used in determining the allocation of physical system
memory from the shared memory pool to a logical partition that uses shared memory. A
higher value, relative to the values set for other shared memory partitions, increases the
probability of the hypervisor allocating more physical system memory from the shared
memory pool to the shared memory partition.
Notes:
Each shared memory partition requires a dedicated paging device in order to be activated.
Paging device selection is made when a shared memory partition is activated, based on
the availability and the size of the maximum logical memory configuration of the logical
partition. If no suitable paging device is available, activation fails with an error message
providing the required size of the paging device.
There isn't a fixed relationship between a paging device and a shared memory partition when a system is managed using the HMC. The smallest suitable paging device is automatically selected when the shared memory partition is activated for the first time. Once a paging device has been selected for a partition, this device is used again as long as it is available at the time the partition is activated. However, if the paging device is unavailable, for example because it has been deconfigured or is in use by another partition, then a new suitable paging device is selected.
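The defined paging devices and their current assignments can also be listed from the HMC CLI. This is a sketch only, and the --rsubtype pgdev option should be verified against the lshwres man page for your HMC level:
lshwres -r mempool -m sys1 --rsubtype pgdev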
Notes:
Assigned memory
This field displays the amount of memory that is currently assigned to the logical partition.
The amount of logical memory of a shared memory partition can be changed within the
minimum and maximum boundaries defined in the partition profile. Increasing the logical
memory in a shared memory partition does not mean that the amount of physical memory
that is assigned through the hypervisor is changed. The amount of physical memory that a
shared memory partition actually gets depends on the availability of free memory in the
shared memory pool. The memory weight of a shared memory partition can also be
changed dynamically.
You should not need to change this value unless monitoring tools are reporting excessive I/O mapping failure operations.
When you dynamically change the I/O entitled memory, you also change the I/O entitled
memory mode from the auto mode to the manual mode. In manual mode, if you add or
remove a virtual adapter to or from the shared memory partition, the HMC does not
automatically adjust the I/O entitled memory. Therefore, you might need to dynamically
adjust the I/O entitled memory when you dynamically add or remove adapters to or from
the shared memory partition. When you want to change the I/O entitled memory mode from
the manual mode to the auto mode, reactivation of the shared memory partition is required.
You can specify the size in a combination of gigabytes (GB) plus megabytes (MB).
The HMC or IVM calculates the I/O entitled memory based on the I/O configuration. The
I/O entitled memory is the maximum amount of physical memory guaranteed to be
available for I/O mapping.
Instructor notes:
Purpose Point out the I/O memory entitlement is automatically managed except when
you set an I/O entitled memory value. Setting a new value also changes the I/O entitled
memory mode from automatic to manual. In manual mode, it is your responsibility to
manage I/O entitled memory.
Details
Additional information
Transition statement
Notes:
In an Active Memory Sharing environment, the performance of the operating system paging device takes on a new, important role. There are two levels of paging: the partition's operating system paging and hypervisor paging.
If the page fault rate for one logical partition becomes too high, it is possible to increase the
memory weight assigned to the logical partition to improve the possibility of its receiving
physical memory.
It is recommended to keep the partition paging devices separate from the hypervisor paging devices, if possible, because the I/O operations from shared memory partitions might compete with I/O operations resulting from hypervisor paging. Page-in and page-out requests are communicated to the paging Virtual I/O Server using VASI virtual devices. Each VASI adapter can support multiple shared memory partitions.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
The AIX operating system has enhanced its monitoring tools to show AMS-specific resource consumption metrics. In a dedicated memory partition, the memory statistics tool svmon can be used to measure the working set size. The command svmon -G shows the inuse memory value. Even though this inuse value represents the working set, it does
not represent the actual memory that is being currently referenced. With AMS, the actual
working set of a partition (actual amount of pages that are being currently referenced) can
be monitored.
Existing tools such as topas and vmstat have been enhanced to report physical memory
in use, hypervisor paging rate, hypervisor paging rate latency, and the amount of memory
loaned by AIX to the hypervisor.
Instructor notes:
Purpose
Details
Additional information
Transition statement Let's see a shared memory LPAR's lparstat -i output.
lparstat -i
# lparstat -i
Node Name                                  : lpar1
Partition Name                             : lpar1
Partition Number                           : 5
Type                                       : Shared-SMT
Mode                                       : Uncapped
Entitled Capacity                          : 0.10
Partition Group-ID                         : 32773
Shared Pool ID                             : 0
Online Virtual CPUs                        : 1
Maximum Virtual CPUs                       : 10
Minimum Virtual CPUs                       : 1
Online Memory                              : 1536 MB
Maximum Memory                             : 2048 MB
Minimum Memory                             : 512 MB
Variable Capacity Weight                   : 128
Minimum Capacity                           : 0.10
Maximum Capacity                           : 2.00
Capacity Increment                         : 0.01
Maximum Physical CPUs in system            : 4
Active Physical CPUs in system             : 4
Active CPUs in Pool                        : 4
Shared Physical CPUs in system             : 4
Maximum Capacity of Pool                   : 400
Entitled Capacity of Pool                  : 260
Unallocated Capacity                       : 0.00
Physical CPU Percentage                    : 10.00%
Unallocated Weight                         : 0
Notes:
The AIX lparstat command has been enhanced to display statistics about shared memory.
Using the -i flag, the shared memory configuration is shown. The details include the LPAR
configured memory mode, the I/O memory entitlement (the amount of physical memory
used for I/O operations), the memory capacity weight value specified at the partition
creation time, and the shared memory pool size.
Instructor notes:
Purpose
Details
Additional information
Transition statement The next slide introduces the pmem and loan values that are
seen in the vmstat output.
(Figure: a shared memory partition's logical memory is partly backed by physical memory (the vmstat pmem value), partly loaned to the hypervisor (the vmstat loan value), and the remainder is actually stored on disk on the VIOS paging device; the AIX OS paging device is separate.)
Notes:
This slide introduces the virtual memory in a shared memory partition.
The logical memory shown in the visual consists of memory page frames backed by physical memory and also pages loaned to the POWER Hypervisor. These values can be examined using the vmstat command.
When the sum of the logical memory backed by physical memory frames (the pmem value) and the pages loaned to the hypervisor (the loan value) is less than the amount of logical memory defined in the logical partition, the difference represents what has been stolen by the hypervisor and resides on the paging device.
Instructor notes:
Purpose
Details
Additional information
Transition statement Let's see an example of vmstat command output.
vmstat command (1 of 2)
# vmstat -h 2
mmode: Memory mode of LPAR (dedicated or shared)
mpsz: Amount of memory in shared memory pool (in GB)
hpi: Hypervisor page-ins / page faults
hpit: Time waiting for hypervisor page-ins (in milliseconds)
pmem: Physical memory currently backing the logical memory assigned to the LPAR
loan: Logical memory loaned
Notes:
With Active Memory Sharing, the vmstat command has been enhanced to report the
amount of physical memory that is backing the logical memory of the logical partition, as
well as the hypervisor paging information.
The mem field shows the amount of available logical memory. Unlike in a dedicated
memory partition where the logical memory is always backed by physical memory, this is
not the case in a shared memory partition. The command output shown in the figure has 1.5 GB of logical memory. This does not mean that this amount is actually backed by physical memory. To see how much physical memory the partition currently has assigned, you have to look at the pmem column. In this case, it shows that
the partition only has 1.13GB of physical memory assigned at the time the output was
produced.
The hypv-page group in the vmstat command output shows physical memory statistics
and hypervisor paging activity. Looking at the loan column in this example, we see that the
partition has loaned 0.37GB of memory to the hypervisor. This loaned memory is included
in the free (amount of free memory) column under the memory section. If the workload
increases its load (memory usage), the loan and the free columns both come down and the pmem column goes up.
If the hpi and hpit values are non-zero, or the pi and po values are non-zero, then paging to the paging device is occurring. This is an indication that the working set exceeds the physical memory given to the partition, or that physical pages are being loaned. Either of these situations will impact workload performance.
The following fields have been added for Active Memory Sharing:
mmode: Shows shared if the partition is running in shared memory mode.
mpsz: Shows the size of the shared memory pool.
hpi: Shows the number of hypervisor page-ins for the partition. A hypervisor page-in
occurs if a page is being referenced which is not available in real memory because it
has been paged out by the hypervisor previously. If no interval is specified when issuing
the vmstat command, the value shown is counted from boot time.
hpit: Shows the time spent in hypervisor paging in milliseconds for the partition. If no
interval is specified when issuing the vmstat command, the value shown is counted
from boot time.
pmem: Shows the amount of physical memory (in gigabytes) backing the logical
memory.
loan: Shows the amount of the logical memory in gigabytes that is loaned to the
hypervisor. The amount of loaned memory can be influenced through the vmo
ams_loan_policy tunable.
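For ongoing monitoring, give vmstat an interval and a count, for example one sample every 5 seconds for 12 samples, and watch the hpi, hpit, pmem, and loan columns:
# vmstat -h 5 12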
vmstat command (2 of 2)
# vmstat -vh
393216 memory pages
365809 lruable pages
122435 free pages
1 memory pools
112636 pinned pages
80.0 maxpin percentage
3.0 minperm percentage
80.0 maxperm percentage
6.3 numperm percentage
23296 file pages
0.0 compressed percentage
0 compressed pages
6.3 numclient percentage
80.0 maxclient percentage
23296 client pages
0 remote pageouts scheduled
0 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
1700 filesystem I/Os blocked with no fsbuf
0 client filesystem I/Os blocked with no fsbuf
0 external pager filesystem I/Os blocked with no fsbuf
Notes:
The vmstat command, when the -v flag is combined with the -h flag, displays additional
memory metrics related to AMS. The output then includes the number of AMS memory
faults and the time spent, in milliseconds, on hypervisor paging. It also shows the number of
4 KB pages that AIX has loaned to the hypervisor and the percentage of partition logical
memory that has been loaned.
lparstat -me
# lparstat -me
physb %entc vcsw hpi hpit pmem iomin iomu iomf iohwm iomaf
-------- --------- -------- ------ ------ ---------- --------- -------- --------- --------- --------
0.90 12.1 441 0 0 1.34 31.7 12.0 45.3 12.7 0
Notes:
The lparstat command has been enhanced to display statistics about shared memory.
Most of the metrics show the I/O entitled memory statistics.
From the lparstat command output, we can see on the left the different memory pool
names for each virtual adapter. A device might have multiple I/O memory entitlement pools.
The virtual Ethernet adapter can have several pools for different receive buffers, transmit
buffers, and other miscellaneous memory. Virtual Fibre Channel adapters and virtual SCSI
adapters tend to have a single main large pool and a few other smaller pools. The lparstat
-me command reports consolidated I/O entitlement usage information across all pools for
an adapter. The exception is the virtual Ethernet adapter.
The iomaf value tells how many times the OS attempted to get a page frame for an I/O and
failed. If this value is non-zero, increase your I/O entitled memory. If I/O entitled memory in
use (iomu) is high relative to your configured I/O entitled memory, or iomf is consistently
low, increase your I/O entitled memory to improve performance.
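For continuous monitoring, the same flags can be combined with an interval and count, assuming these arguments behave as with other lparstat modes (an illustrative invocation, not shown in the figure):
# lparstat -me 5 3
A hypervisor page-in count (hpi) that keeps growing between samples indicates active hypervisor paging against this partition.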
Instructor notes:
Purpose
Details
Additional information
Transition statement The next visuals are about topas.
topas -L
Mmode: Memory mode
IOME: I/O memory entitlement of the partition in megabytes
Notes:
When using the topas -L command, the logical partition view with the Active Memory
Sharing statistics is displayed. The IOME field shows the I/O memory entitlement
configured for the partition, while the iomu column shows the I/O memory entitlement in
use. Detailed information about I/O memory entitlement can be obtained by pressing e in
the topas -L menu.
Instructor notes:
Purpose
Details
Additional information
Transition statement
topas -C (1 of 2)
InU - Logical partition working set
CM - SMT enabled and capped in shared-memory mode
cM - SMT disabled and capped in shared-memory mode
UM - SMT enabled and uncapped in shared-memory mode
uM - SMT disabled and uncapped in shared-memory mode
Notes:
In the figure, there are four shared memory partitions with host names amsaixa, amsaixb,
amsaixc, and amsaixd. Each line shows the partition's corresponding physical and logical
memory usage. The InU column displays the amount of logical memory (in GB) in use from
the AIX perspective, which is the partition's working set. The pmem column shows the
physical memory (in GB) allocated to the shared memory partitions from the AMS pool at a
given time.
pmem: Physical memory (in GB) allocated to shared memory partitions from the
shared memory pool at a given time.
InUse: Logical memory in use; from the AIX perspective, this is the LPAR's working set.
Instructor notes:
Purpose
Details
Additional information
Transition statement Let's use topas to see the memory pool and the partitions
associated with that pool.
topas -C (2 of 2)
Press the m key to display the memory pool panel from the CEC panel.
Move the cursor onto a memory pool, then press the f key to display the
partitions associated with that pool.
Topas CEC Monitor Interval: 10 Mon May 25 15:14:10 2009
Partitions Memory (GB) Memory Pool(GB) I/O Memory(GB)
Mshr: 4 Mon: 6.0 InUse: 4.9 MPSz: 4.0 MPUse: 4.0 Entl: 308.0 Use: 47.9
Mded: 0 Avl: 1.1 Pools: 1
Host mem memu pmem meml iome iomu hpi hpit vcsw physb %entc
------------------------------------------------------------------------------------------------------------
lpar2 1.50 1.22 0.98 0.52 77.0 12.0 0 0 0 0.01 5.50
lpar4 1.50 1.17 1.05 0.45 77.0 12.0 0 0 277 0.01 5.11
lpar1 1.50 1.22 1.01 0.49 77.0 12.0 0 0 0 0.00 0.00
lpar3 1.50 1.25 0.97 0.53 77.0 12.0 0 0 733 0.01 10.11
memu: Logical partition working set
Notes:
To display the memory pool panel from the CEC panel, press the m key. This panel
displays the statistics of all of the memory pools in the system. The example shows a
shared memory pool size of 4 GB, while the aggregate logical memory of all the
partitions in the pool is 6 GB. The following values are displayed for each pool:
mpid: The ID of the memory pool
mpsz: The size of the total physical memory of the memory pool in gigabytes
mpus: The total memory of the memory pool in use (this is the sum of the physical
memory allocated to all of the LPARs in the pool)
mem: The size of the aggregate logical memory of all the partitions in the pool in
gigabytes
memu: The aggregate logical memory that is used for all the partitions in the pool in
gigabytes
iome: The aggregate of I/O memory entitlement that is configured for all the LPARs in
the pool in gigabytes
iomu: The aggregate of the I/O memory entitlement that is used for all the LPARs in the
pool in gigabytes
hpi: The aggregate number of hypervisor page faults that have occurred for all of the
LPARs in the pool
hpit: The aggregate of time spent in waiting for hypervisor page-ins by all of the LPARs
in the pool in milliseconds
To display the partitions associated with a pool in the lower section of the panel, select a
particular memory pool and press the f key. The example shows four LPARs. Each of them
has 1.5GB of logical memory configured. The following values of the partitions in the pools
are displayed:
mem: The size of logical memory of the partition in gigabytes
memu: The logical memory that is used for the partition in gigabytes
meml: The logical memory loaned to hypervisor by the LPAR
pmem: The physical memory that is allocated to the partition from the memory pool in
gigabytes
iome: The amount of I/O memory entitlement that is configured for the LPAR in
gigabytes
iomu: The amount of I/O memory entitlement that is used for the LPAR in gigabytes
hpi: The number of hypervisor page faults
hpit: The time spent in waiting for hypervisor page-ins in milliseconds
vcsw: The average number of virtual context switches per second
physb: The amount of physical processor capacity that is busy
%entc: The percentage of processor entitlement consumed
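A useful cross-check on the sample output above (an observation about this particular example, not a guaranteed identity): for each LPAR, the physical memory plus the loaned memory accounts for the configured logical memory. For lpar2, 0.98 GB (pmem) + 0.52 GB (meml) = 1.50 GB (mem). If the hypervisor had also paged part of the logical memory out to a paging space device, pmem + meml would fall below mem.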
Notes:
You can use the HMC utilization data to retrieve information about the shared memory
pool utilization. First, specify the utilization events that you want to see. Utilization events
are records that contain information about the memory and processor utilization on your
managed system at a particular time. You can select the record type you want to see; in our
example, this is the shared memory pool.
Memory overcommitment (percent): The difference between the aggregated logical
memory over all partitions and the size of the shared memory pool, expressed as a
percentage
Partition Logical Memory (GB): The amount of logical memory (in gigabytes)
assigned to all of the partitions in the shared memory pool
Partition I/O entitled memory (GB): The total amount of I/O entitled memory (in
gigabytes) currently mapped by the shared memory pool
Partition mapped I/O entitled memory (GB): The amount of I/O entitled memory (in
gigabytes) currently mapped by all of the partitions in the shared memory pool
System firmware pool memory (GB): Amount of memory, in gigabytes, in the shared
memory pool that is being used by system firmware
Page fault rate (faults/second): The number of page faults per second.
Page-in delay (microseconds): The total page-in delay, in microseconds, spent
waiting for page faults since the shared memory pool was created
Page-in delay (percent): The page-in delay, expressed as a percentage of total time.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
The information displayed in the visual shows the processor and memory utilization of each
logical partition in the managed system at the indicated date and time.
When you select the memory tab, the window displays the memory utilization status of a
selected partition at a specific date and time.
The following information is displayed:
Partition (ID): The partition name and ID number
Memory mode: The memory mode (dedicated or shared)
Logical memory (GB): The amount of logical memory configured for the partition, in
gigabytes (GB)
Physical memory (GB): The amount of physical memory configured for this partition, in
gigabytes (GB)
Instructor notes:
Purpose
Details
Additional information
Transition statement It's time for a checkpoint.
Checkpoint (1 of 3)
1. True or False: PowerVM Active Memory Sharing feature allows
shared memory partitions to share memory from a single pool of
shared physical memory.
3. True or False: The total logical memory of all shared memory LPARs
in a system is allowed to exceed the real physical memory allocated to
a shared memory pool in the system.
Notes:
Instructor notes:
Purpose
Details
Checkpoint solution (1 of 3)
1. True or False: PowerVM Active Memory Sharing feature allows shared memory
partitions to share memory from a single pool of shared physical memory.
The answer is true.
3. True or False: The total logical memory of all shared memory LPARs is allowed to
exceed the real physical memory allocated to a shared memory pool in the
system.
The answer is true.
Additional information
Transition statement
Checkpoint (2 of 3)
5. What requirements must be met by the LPAR in order to be
defined as a shared memory LPAR?
Notes:
Instructor notes:
Purpose
Details
Checkpoint solution (2 of 3)
5. What requirements must be met by the LPAR in order to be defined
as a shared memory LPAR?
The answer is that the LPAR must use shared processors and use only
virtual I/O.
Additional information
Transition statement
Checkpoint (3 of 3)
9. True or False: The Collaborative Memory Manager is an
operating system feature that gives hints on memory page
usage to the hypervisor.
12. How can you tune the Collaborative Memory Manager's loan
policy?
Notes:
Instructor notes:
Purpose
Details
Checkpoint solution (3 of 3)
9. True or False: The Collaborative Memory Manager is an operating
system feature that gives hints on memory page usage to the
hypervisor.
The answer is true.
10. Which commands can be used to get Active Memory Sharing
statistics?
The answer is vmstat, lparstat, topas, and svmon.
11. True or False: When AIX starts to loan logical memory pages, by
default it first selects pages used to cache file data.
The answer is true.
12. How can you tune the Collaborative Memory Manager's loan policy?
The answer is the policy is tunable through the AIX VMM vmo
command. The parameter ams_loan_policy has a default value of 1.
This enables the loaning of the file cache. When set to 2, loaning of
any type of data is enabled.
Additional information
Transition statement
Exercise
Unit exercise
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Unit summary
Having completed this unit, you should be able to:
Describe the Active Memory Sharing concepts and
components
Describe the POWER Hypervisor paging activity
Create a shared memory pool
Create and manage the AMS paging space devices
Create and activate a shared memory partition
Describe the Virtual I/O Server virtual devices involved in AMS
Monitor the shared memory partition using the AIX
performance tools vmstat, lparstat, and svmon
Monitor the shared memory pool usage using data utilization
from the HMC
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Estimated time
02:00
References
http://www-03.ibm.com/systems/power/hardware/whitepapers/am_exp.html
Active Memory Expansion: Overview and Usage Guide
Unit objectives
After completing this unit, you should be able to:
Describe the Active Memory Expansion (AME) feature
List the benefits of using AME
Define the purpose of the memory expansion factor
List workload characteristics used to evaluate suitability for AME
Describe how to use the AME planning tool
Explain the output produced by the AME planning tool
Describe how to select a suitable memory expansion factor
List the hardware and software requirements for AME
Describe how to activate the AME feature on a managed system
Configure a partition to use AME
List the tools used to monitor AME performance
Determine the memory compression level achieved in a partition
Determine the CPU resources used for memory compression and
decompression
Notes:
Topic 1: Overview
After completing this topic, you should be able to:
Notes:
Notes:
Active Memory Expansion (AME) is an optional feature of POWER7 processor-based
systems that enables more effective server consolidation.
Partitions configured to use AME will observe an extended logical memory amount that is
greater than the allocated physical memory.
This technique can be used in two basic ways:
Retain the existing memory allocation of the LPAR, but use AME to present an
extended memory amount to the partition, and allow more throughput.
Reduce the memory allocation of the partition, and use AME such that the extended
memory amount presented to the application is the same as the original physical
memory allocation. This allows the LPAR to handle the same workload, but with a
reduced physical memory allocation.
Notes:
The basic concept of AME is to use available CPU resources in the partition to compress
data to squeeze more of it into the actual memory amount allocated to the partition. The
CPU resources used for compression and decompression are from the resources allocated
to the partition, since the work is being carried out by the operating system itself.
The actual amount of increase in the effective memory capacity of the partition depends on
the compressibility of the data being used.
AME scenarios (1 of 2)
AME can be used to increase the effective memory capacity of a
partition, while retaining the existing physical memory allocation.
(Diagram: Scenario 1 - Expand effective memory in a constrained LPAR. The partition
keeps its 96 GB physical allocation while the effective memory presented grows from
96 GB to 120 GB, assuming 25% expansion.)
Notes:
Scenario 1
The diagram in the visual shows an example of using AME to expand the effective memory
amount observed by a partition. The partition retains its existing physical memory
allocation, and uses AME to present an extended memory amount to the applications and
users on the system.
In the example in the visual, a partition configured with 96GB of physical memory will
present an extended memory amount of 120GB to applications and users, while retaining a
physical footprint of 96GB.
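The arithmetic behind this scenario is simple (a worked restatement of the figure, not additional data): an expansion of 25% corresponds to a memory expansion factor of 1.25, and 96 GB x 1.25 = 120 GB of expanded memory is presented to applications while the physical allocation stays at 96 GB.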
AME scenarios (2 of 2)
AME can be used to reduce the physical memory allocation of a
partition while retaining the same effective memory capacity. This allows
more LPARs to be created.
(Diagram: Scenario 2 - Retain effective memory in the existing LPAR, but reduce its
physical allocation to make room for a second LPAR. The original partition goes from a
96 GB allocation with 96 GB effective to a 64 GB allocation with 96 GB effective, and the
freed memory is used for a new LPAR with a 32 GB allocation and 48 GB effective, each
assuming 50% expansion.)
Notes:
Scenario 2
Another way in which AME can be used is to cope with the existing workload being handled
by a partition while allowing the physical memory allocation to be reduced. This scenario is
shown in the diagram on the visual above.
The original partition is reconfigured with a reduced memory allocation, but uses AME to
present an extended memory amount to applications that is identical to the original
non-AME configuration. The physical memory that has been freed by this reconfiguration
has been repurposed to enable the creation and activation of an additional LPAR, which is
also using AME to present an extended memory amount to the applications and users it is
running.
Logical memory
Logical memory is the virtualized physical memory presented
to a partition by the hypervisor.
Notes:
In order to understand the implementation details of AME, it is necessary to define a
number of terms that are used in the explanation.
The first of these terms is logical memory. The HMC allocates logical memory to a partition,
which is then managed by the virtual memory manager component of the operating
system. For a partition configured to use dedicated memory, there is a one to one mapping
between logical memory and physical memory. For a partition configured to use Active
Memory Sharing (AMS), another memory utilization improvement technology available on
Power Systems, the logical memory might not all be mapped to physical memory at the
same time. The actual amount of physical memory used by a shared memory partition will
depend on the workload of the partition, and the demands being placed on the shared
memory pool by the other partitions configured to use AMS.
Notes:
When AME is enabled, the logical memory allocated to a partition is divided into two pools
for management purposes. There is a pool of uncompressed memory, and a pool of
compressed memory.
The compressed memory pool can be thought of as a special type of RAM disk paging
device that is handled internally by the virtual memory manager.
Page faults
When a referenced virtual page is not resident in the
uncompressed pool, a page fault is generated.
The VMM page fault handler uses the software page frame
table to determine the current location of the referenced page.
Notes:
When the CPU references a virtual page for which it has no translation information (not
resident in the uncompressed pool), it generates a page fault. This will result in the VMM
page fault handler being invoked.
If the virtual page being referenced is located on a regular paging space device, then the
page fault handler will obtain a free frame from the uncompressed pool and invoke an I/O
operation to copy the virtual page from paging space back into memory. The faulting thread
is put to sleep until the I/O operation completes.
If the virtual page being referenced is currently contained within the compressed page pool,
then a similar operation is performed. The VMM page fault handler will obtain a free frame
from the uncompressed pool and use it to store the decompressed page. This operation
completes much more quickly than a traditional page fault because there is no disk-based
I/O involved.
View presented to
firmware, HMC, OS
Notes:
Memory expansion factor
Applications are presented with information about the expanded logical memory capacity of
the partition. This allows them to scale data structures, algorithms, and the number of
threads they will use appropriately for the total amount of data that can be handled by the
partition.
The firmware and operating system are presented with a view that represents the actual
logical memory amount allocated to the partition.
The memory expansion factor value is a multiplier of the actual logical memory amount that
determines the target expanded logical memory amount.
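Expressed as a simple formula (a restatement of the description above): expanded memory size = actual logical memory size x memory expansion factor. Conversely, to keep a target expanded size while reducing the allocation, the required factor is the target size divided by the actual size; for example, presenting 96 GB of expanded memory from a 64 GB allocation requires a factor of 96 / 64 = 1.5.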
Pool size (1 of 2)
The size of the compressed memory pool (and also the
uncompressed pool) will vary dynamically over time depending
on LPAR workload.
Notes:
Dynamic pool boundary
The boundary between the compressed and uncompressed memory pools will change
dynamically as required, based on LPAR workload.
When AME is enabled, the compressed pool is initially empty. As the memory workload of
the partition increases, pages will be allocated from the uncompressed pool. If the partition
workload fits within the actual logical memory allocation, no compression will be required.
As the number of free frames in the uncompressed pool is reduced, the VMM will start to
compress eligible least recently used pages and move them to the compressed pool. This
will free up frames in the uncompressed pool at a rate larger than the rate at which frames
are being added to the compressed pool.
Since the amount of logical memory available to the partition is constant at any given
moment, if the compressed pool is expanded this means the uncompressed pool will
shrink.
Pool size (2 of 2)
(Diagram: the LPAR's actual logical memory is split between the uncompressed and
compressed pools. The boundary grows and shrinks dynamically with the workload, and
the uncompressed pool cannot shrink below a minimum size controlled by the
ame_min_ucpool_size tunable.)
Notes:
Dynamic pool boundary
The boundary between the compressed and uncompressed memory pools will change
dynamically as required, based on LPAR workload.
As shown in the diagram on the visual above, in addition to the uncompressed pool, some
portion of the logical memory is used to store pinned pages, which cannot be
compressed. The compressed pool starts off at zero size and grows as required until one
of the three conditions listed on the visual is met.
If the compressed pool cannot grow any further and there is a lack of free frames in the
uncompressed pool, then regular paging activity to a paging space device will occur. Only
pages from the uncompressed pool are examined by the VMM and considered as
candidates to be paged out.
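The minimum uncompressed pool size shown in the diagram corresponds to the ame_min_ucpool_size vmo tunable. A minimal sketch of inspecting it follows (this is normally a restricted tunable, so consult the documentation for your AIX level before changing it):
# vmo -L ame_min_ucpool_size
In most cases the default setting can be left alone, and the operating system sizes the minimum uncompressed pool automatically.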
Notes:
CPU cost
There is an associated cost in CPU resources that will be consumed within a partition when
AME is configured. As the amount of memory expansion increases, there will be an
increase in the amount of CPU resources consumed. This relationship is not linear, as
shown on the diagram in the visual above.
In many cases, a significant amount of memory expansion can be obtained without
consuming a significant amount of additional CPU resources. There is however a point in
the curve at which significant additional CPU resources might be consumed to provide only
a small amount of additional memory expansion.
The actual location of the sweet spot of the curve will depend on the existing CPU load on
the partition, the memory workload on the partition, and the compressibility of the data
being used.
Limitations
Only working storage pages can be compressed.
Notes:
AME does have some limitations on functionality, some of which are listed on the visual
above. Other limitations are discussed later in this unit.
The impact of these limitations on the potential performance of a partition using AME can
be determined in advance by the use of the AME planning tool.
AME economics
AME is a chargeable feature.
Cost varies depending on system, just as cost of physical memory
varies by system.
Scenario 1: System has maximum possible physical memory
for current configuration.
AME is a simple choice, even with moderate expansion factors.
Scenario 2: System is not at maximum possible physical
memory, and CPU utilization is relatively low.
Need to choose between adding more physical memory or adding
AME.
Calculate cost of AME gained memory based on expansion factor,
and compare with cost of same amount of physical memory for
system.
Scenario 3: System is full of small DIMMs, and need to avoid
cost of replacing with larger DIMMs.
This is the same as scenario 2.
Notes:
Cost evaluation
The cost of the AME feature varies from system to system, just as the cost of physical
memory varies from system to system.
When evaluating whether AME is a sensible purchase, compare the cost of the
AME feature with the cost of buying an amount of physical memory equivalent to the logical
memory gained by each partition using AME. The calculation should also take into account
other factors, such as whether the system is currently at its maximum physical memory
capacity. Another consideration is whether all DIMM slots are currently occupied, and
whether adding more memory capacity would require adding another CPU card or replacing
existing DIMMs with larger capacity DIMMs.
Remember that the cost per gigabyte of memory for a given system varies depending on
the size of the DIMMs used in the memory feature.
Phase 2: Trial
Obtain 60 day trial activation of AME.
Configure LPARs as desired with memory expansion factor, and
monitor performance.
Notes:
A customer would be very unwise to deploy any new feature directly into a production
environment. Typically, planning and testing tasks would be performed before production
deployment.
The second topic in this unit covers the planning phase, including use of the AME planning
tool to determine the amount of benefit that can be obtained for a particular workload by
using AME.
The third topic of this unit covers the deployment details, such as obtaining the AME
activation key, and configuring a partition with a memory expansion factor value.
The fourth topic of this unit covers the facilities available to monitor the performance of a
partition configured with AME.
Notes:
Other technologies
In addition to the Active Memory Expansion memory utilization improvement feature
available on POWER7 processor-based systems, IBM Power Systems support the Active
Memory Sharing (AMS) feature on systems using POWER6 or newer processors.
AME is a technology that can improve the memory utilization of a single partition, and can
be used by AIX 6 partitions running on POWER7 processor-based systems. It does not
require a Virtual I/O Server (VIOS) partition to be configured.
AMS is a feature which can be utilized by multiple operating systems on hardware with
POWER6 or newer processors. This feature creates a shared memory pool controlled by
the hypervisor, which allocates physical memory to the partitions most in need. This feature
requires a VIOS partition to be configured.
These two memory technologies can be used together on POWER7 processor-based
systems.
(Chart: with expanded memory, maximum partition throughput increases from 99 tps to
166 tps, a gain of 65%.)
Note: This is an illustrative scenario based on using a sample workload. This data
represents measured results in a controlled lab environment. Your results might vary.
Notes:
The example described in the visual shows the measured improvement in application
throughput on a single partition that was configured to retain the same physical memory
allocation and use AME to expand the amount of memory presented to the application.
(Diagram: before AME, the system runs LPAR 1 (DB + App), LPAR 2 (AppServer), and
LPAR 3 (AppServer), with LPAR 4 idle; after enabling AME, the freed physical memory
allows LPAR 4 to run as an additional AppServer.)
Note: This is an illustrative scenario based on using a sample workload. This data
represents measured results in a controlled lab environment. Your results might vary.
Notes:
The example described in this visual shows the measured improvement in overall system
utilization and throughput on a managed system configured with multiple partitions. AME
was used on the existing partitions to free up sufficient physical memory to allow a fourth
application partition to be configured. This allowed additional CPU cores on the system to
be utilized effectively.
Disclaimer
The example described on this visual and the example on the previous visual show
memory expansion improvements that are at the high end of what is possible. Not every
application will be able to achieve similar improvements in throughput by using AME.
Checkpoint
1. True or False: Every POWER7 system comes with Active
Memory Expansion as standard.
2. True or False: Active Memory Expansion allows a partition to
effectively use more memory than the logical memory
amount allocated by the hypervisor.
Notes:
Checkpoint solutions
1. True or False: Every POWER7 system comes with Active Memory
Expansion as standard.
The answer is false.
2. True or False: Active Memory Expansion allows a partition to
effectively use more memory than the logical memory amount
allocated by the hypervisor.
The answer is true.
5. True or False: The AME feature costs the same on every POWER7
system.
The answer is false.
Additional information
Transition statement
Topic 1: Summary
Having completed this topic, you should be able to:
Describe the Active Memory Expansion (AME) feature
Notes:
Notes:
Workload characteristics (1 of 2)
Not all workloads will benefit from using AME.
Some will benefit to a greater extent than others.
Notes:
Not all workloads will benefit greatly from using AME. This figure lists the workload
characteristics that have an impact on the potential benefit of using AME.
AIX provides the amepat planning and advisory tool to assist in planning and
implementing AME.
Workload characteristics (2 of 2)
Compressibility of in-memory data
Most data will compress reasonably well, resulting in high levels of
memory expansion.
The exception is data that is already compressed.
Memory access patterns
Workloads that tend to frequently access small areas of memory will
see most benefit.
Memory segment type
AME does not compress file pages cached in memory, therefore file
serving applications will have little benefit.
Pinned memory usage
AME does not compress pinned virtual memory pages.
Workloads that pin most of their memory will have a large memory
footprint and will not benefit from AME.
Notes:
This visual describes how the specified workload characteristics are evaluated to
determine suitability for use with AME.
Notes:
The data compression and decompression carried out when AME is enabled will consume
CPU resources allocated to the LPAR. The actual amount of memory expansion that can
be provided by AME will be limited by the amount of available CPU resource in addition to
the compressibility of the data being used.
The AME planning tool can estimate the additional amount of CPU resource that will be
consumed for a set of modeled AME configurations, based on data collected for an existing
workload.
Notes:
The amepat planning and advisory tool was added in AIX 6.1 TL4 SP2. It runs on any
system supported by AIX 6.1, and can be used to generate a report that provides advice on
the usage of AME. It can examine the memory access patterns of a running workload and
estimate the amount of CPU resource that would be required for a number of modeled
AME configurations.
Notes:
Operational considerations
The amepat command can be run from the command line, or using SMIT with the
command smit amepat.
When invoked by the root user, AME modeling will be performed using the available data. If
invoked by a non-root user, modeling will be disabled.
The tool should be run for a duration of time when the existing workload is at peak
utilization. This will allow the tool to measure CPU and memory usage on the system, along
with information on data compressibility.
The command can be operated in two basic modes. In recording mode, it will gather
statistics and store the data in a recording file. In report generation mode, it will generate a
report, either from real time data, or from data supplied in a recording file.
Recorded data can be used to generate multiple reports, which will allow you to model
multiple different AME scenarios.
Command usage
The command syntax is as follows:
amepat [ { { [ -c max_ame_cpuusage] | [ -C max_ame_cpuusage ] } |
[ -e startexpfactor [ :stopexpfactor [ :incexpfactor ] ] ] }]
[ { [ -t tgt_expmem_size ] | [ -a ] } ] [ -n num_entries ]
[ -m min_mem_gain ] [ -u minucomp_poolsize ] [ -v ] [ -N ]
[ { [ -P recfile ] | [ Duration ] | [ Interval <Samples> ] } ]
Flag Description
-c max_ame_cpuusage Unit is percentage
-C max_ame_cpuusage Unit is physical processors
-e startexpfactor Use to specify expansion factors to model
-t tgt_expmem_size Use specified target expanded memory size in MB
-a Auto tune target expanded memory size based on workload
-n num_entries Number of modeled statistics entries to display
-m min_mem_gain Minimum modeled memory gain in MB from using AME
-u minucomp_poolsize Model using specified minimum uncompressed pool size
Duration Duration in minutes (memory samples collected automatically)
Interval Samples Interval in minutes, number of memory samples to obtain
-N Disable workload modeling, only monitor resource usage
Notes:
Syntax
The syntax of the amepat command is detailed on this visual.
The amepat tool allows for workload planning for AME, and also monitoring when AME is
enabled.
For workload planning, the command can be invoked in recording mode where it will collect
data to a recording file, or it can be invoked in reporting mode, where it will generate a
report either from a recording file, or by collecting data in real time.
The -N flag disables the workload planning capability, and means that amepat will only
monitor current AME statistics (if AME is currently enabled). The -N flag is implied if the
command is invoked by a non-root user.
Refer to the online documentation or the man page for the amepat command for a
complete description of the available options.
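As an illustrative monitoring-only invocation (assumed from the syntax above, and useful only once AME is already enabled):
# amepat -N 5
This monitors CPU, memory, and AME resource usage for five minutes without performing any workload modeling, which is also the behavior a non-root user gets by default.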
Example 2:
amepat -R recfile 5 4
Monitor the system for 20 minutes, taking a data compressibility
sample every five minutes, storing the raw data in recfile.
Example 3:
amepat -P recfile -e 2.0:4.0:0.5
Generate a report using the data in recfile, with modeled memory
expansion factor between 2.0 and 4.0 in increments of 0.5.
Notes:
This visual contains multiple examples of invoking the amepat command, along with a
description of each example.
Notes:
Report structure
The report generated by the amepat command contains multiple sections. The AME
modeled statistics and AME recommendation sections will only be displayed if the
command is run by the root user. The AME statistics section will only be displayed if the
command was run on a system that has AME enabled.
System Configuration:
---------------------
Partition Name : cassini201
Processor Implementation Mode : POWER7
Number Of Logical CPUs : 16
Processor Entitled Capacity : 2.00
Processor Max. Capacity : 4.00
True Memory : 4.00 GB
SMT Threads : 4
Shared Processor Mode : Enabled-Uncapped
Active Memory Sharing : Disabled
Active Memory Expansion : Disabled
Notes:
Report information
The first part of the report displays the command that was invoked to generate the report,
along with information on when the command was invoked, the monitoring time and the
number of samples of memory data that were taken.
The actual monitored time will likely be longer than the specified command duration
(indicated on the command line as duration in minutes, or interval in minutes and number
of samples to take) based on the memory usage and access patterns of the workload.
System Configuration section
This section of the report details the CPU and memory configuration of the partition.
System Resource Statistics section
This section of the report details the average, minimum and maximum CPU and memory
resource utilization of the partition during the monitoring period.
The modeled Active Memory Expansion CPU usage reported by amepat is just an
estimate. The actual CPU usage used for Active Memory Expansion may be lower
or higher depending on the workload.
Notes:
Data compressibility
In this first example, the data being used by the application workload consisted of binary
data structures with mixed contents of double, long, and char[] data types. This normally
compresses reasonably well, and this is reflected in the observed average compression
ratio of 5.62 for the sample period.
Modeled statistics
The modeled statistics section of the report shows the possible true memory size that could
be allocated to the LPAR, and the memory expansion factor that would be used to retain an
expanded memory size of 4GB (the current memory size of the LPAR when the amepat
command was run). Each line also shows the estimated amount of CPU resource that
would be used by AME.
Recommendation section
The report recommends an initial configuration to use when enabling AME for the first time
with this workload. You should then monitor the actual performance obtained.
System Configuration:
---------------------
Partition Name : cassini201
Processor Implementation Mode : POWER7
Number Of Logical CPUs : 16
Processor Entitled Capacity : 2.00
Processor Max. Capacity : 4.00
True Memory : 4.00 GB
SMT Threads : 4
Shared Processor Mode : Enabled-Uncapped
Active Memory Sharing : Disabled
Active Memory Expansion : Disabled
Notes:
The second example report shown on this visual was generated on the same lab system
used for the first example. The workload running on the system was the exact same
application as used previously. The recorded CPU and memory utilization values during
the sample period are similar to those observed in the previous example.
The key difference this time is that the data being used by the application was binary data
similar to the contents of a compressed file.
The modeled Active Memory Expansion CPU usage reported by amepat is just an
estimate. The actual CPU usage used for Active Memory Expansion may be lower
or higher depending on the workload.
Notes:
Data compressibility
Note that in this second example, the average compression ratio detected during the
sampling period was only 1.06. As such, the workload would not benefit from using AME,
since the compressed data being added to the compressed pool would consume about the
same amount of space as the uncompressed copy of the data. If AME were enabled in this
situation, CPU resources would be utilized to compress and decompress the data, but
there would be almost no gain in effective memory capacity.
Because of the nature of the data being used in this sampling period, the modeled statistics
section only shows one situation, and that reflects the current configuration of the partition.
The recommendation section suggests a memory expansion factor of 1.0, essentially
meaning that no data compression will take place. In this situation, it would likely be better
to leave AME completely disabled (rather than enabling it with an expansion value of 1.0).
Checkpoint
1. True or False: Any user can use the amepat command to
generate AME modeling information.
Notes:
Checkpoint solutions
1. True or False: Any user can use the amepat command to generate
AME modeling information.
The answer is false.
2. True or False: The amepat command can be used to generate a
report using recorded data.
The answer is true.
3. True or False: The amepat command should be run when the target
workload is idle.
The answer is false.
4. True or False: The amepat command can run on any system running
AIX 6.1 TL4 SP2 or above.
The answer is true.
5. True or False: The amepat command should only be run when AME
is disabled.
The answer is false.
Additional information
Transition statement
Topic 2: Summary
Having completed this topic, you should be able to:
List workload characteristics used to evaluate suitability for
AME
Notes:
Notes:
Notes:
Minimum requirements
Active Memory Expansion can be enabled on AIX partitions running AIX 6.1 TL4 SP2 or
above when running on POWER7 processor-based systems that have a valid Active
Memory Expansion activation. The activation can be permanent or a trial activation that is
still within the 60 day validity period. The system must be HMC managed, and the HMC
must be running V7 R7.1.0 or above of the HMC software.
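A quick way to confirm these prerequisites from the command line (illustrative commands; output formats vary by release):
On AIX: oslevel -s       (should report 6100-04-02 or later)
On the HMC: lshmc -V     (should report V7 R7.1.0 or later)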
Notes:
Trial activation
Since Active Memory Expansion is a new feature, and not all workloads might benefit from
its use, IBM has decided to allow a one-time 60 day free trial activation of AME to be
generated for each eligible POWER7 processor-based system.
The trial activation request is made online from the Capacity on Demand website at the
following URL:
http://www-03.ibm.com/systems/power/hardware/cod/activations.html
This website provides links for other useful information related to Capacity on Demand
features.
Notes:
System information
In order to submit a trial request for AME, you will need to collect information about the
POWER7 processor-based managed system that the activation code will be used on.
This visual shows the menu path used to obtain this needed information.
Notes:
Request submission
Once the system information has been gathered, you can proceed with the trial AME
activation request. The visual above shows the web page reached by following the trial
AME activation link on the Capacity on Demand website shown in a previous visual. This
page contains multiple mandatory fields for the required system information, along with
customer contact details.
Notes:
Once a trial request has been submitted, the appropriate Virtualization Engine Technology
(VET) code will be generated, and sent to the email address specified in the contact
details. The activation code can also be retrieved from the Capacity on Demand: Activation
Code website at http://www-912.ibm.com/pod/pod.
The screen capture on the left side of the visual shows the page displayed at the Activation
Code website. You can enter a system type and serial number to display the available
activation code information. A typical page is shown in the screen capture on the right side
of the visual. The VET code contains the information that will enable the AME feature on a
POWER7 processor-based system.
Notes:
Once the required VET code has been obtained, it should be applied to the managed
system using the HMC interface as shown in this visual.
Notes:
System capabilities
Once the VET code for AME has been entered, you should check the Capabilities tab of
the managed system properties, as shown on the visual above. The value of the Active
Memory Expansion Capable property should be displayed as True.
Enabling the AME capability on a managed system is dynamic. There is no need to
shut down and then restart the managed system for the code to be recognized.
Notes:
Upon completion of the 60 day trial, you might wish to permanently enable the AME
capability on a managed system. This is done by placing an MES upgrade order against
the system serial number. Delivery of the activation code for upgrade orders will be made
using the Activation Code website as shown earlier.
A permanent activation of AME can also be made as part of an initial system order. In this
case, the system will be delivered from the factory with the capability already enabled.
There will be no need to retrieve and then enter an activation code.
Notes:
Once a managed system has had the AME capability enabled, you can configure AME on
individual LPARs. The AME configuration of each LPAR is independent, and is made by
specifying a memory expansion factor value in the partition profile. This can be performed
when creating the partition (along with its default profile), by editing an existing profile, or by
creating a new profile.
Notes:
Memory values
The minimum, desired, and maximum memory values specified in the partition profile are
actual logical memory values that will be presented to the firmware and operating system
running in the partition. If the partition is configured to use dedicated memory, then when it
is activated, it will be allocated the desired amount of physical memory assuming there is
sufficient available physical memory. If there is insufficient available physical memory to
allocate the desired value, the partition will still be activated assuming it can be provided
with an amount of physical memory that is greater than or equal to the minimum memory
value.
The extended memory value presented to the applications and users will be calculated by
applying the memory expansion factor value as a multiplier to the actual memory value
currently presented to the operating system.
Notes:
HMC command line
In addition to the HMC GUI, the HMC command line can also be used to modify and list the
AME status of partition profiles, and to list the current AME status of partitions.
This visual contains multiple examples of using the HMC command line.
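A minimal sketch of the kind of commands meant here (attribute names assumed to match the mem_expansion profile attribute referenced later in this topic; verify against your HMC release):
lssyscfg -r prof -m <managed system> --filter "lpar_names=<partition>" -F name,mem_expansion
This lists the memory expansion factor stored in each profile of the partition; lshwres -r mem with --level lpar can similarly be used to view the current memory configuration of running partitions, although the exact attribute names vary by HMC level.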
Notes:
Operational considerations
This visual lists additional operational considerations when using AME.
In particular, note that while AME is enabled dynamically on a managed system, a partition
must be reactivated using a suitable profile to enable (or disable) AME.
An AIX 6 partition running on POWER7 processor-based hardware will use 64KB pages for
many portions of the kernel address space when AME is not configured. When AME is
configured, even if the memory expansion factor is set to 1.0, the operating system will not
use 64KB pages by default.
Notes:
When performing memory DLPAR operations on a partition configured with AME, it is
important to understand the relationship between the values shown on the DLPAR dialog,
and the expanded memory value presented to applications and users in the partition.
Adding or removing memory is performed at the operating system level and, as such, deals
with the actual logical memory blocks allocated to the operating system. Once the
specified number of logical memory blocks has been added to (or removed from) the
partition, a new expanded memory value is calculated using the current memory
expansion factor value and the current logical memory amount.
The memory expansion factor value can also be changed dynamically.
Unconfiguring AME
Using the GUI, simply clear the checkbox.
Expansion factor value automatically reset to 0.0
If trial AME activation has expired, must use CLI to remove AME
configuration from LPAR profile due to bug in HMC GUI in V7R7.1.0.
Notes:
AME removal
In order to remove AME from a partition configuration, the partition must be reactivated
using a profile that has a memory expansion factor value of 0.0.
When modifying a profile using the HMC GUI, simply clearing the AME checkbox will set
the expansion factor value to 0.0. When using the HMC CLI, set the mem_expansion value
in the profile to 0.0, as shown in the example on the visual above.
When a trial activation of AME expires, it will no longer be possible to activate partitions
using profiles that specify a memory expansion factor value greater than 0.0. A bug in HMC
V7 R7.1.0 prevents the GUI from clearing the AME checkbox when a trial activation of AME
has expired. The workaround in this case is to use the command line to set the expansion
factor to 0.0.
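As a hedged example of that command-line workaround (system, partition, and profile names
are placeholders):
chsyscfg -r prof -m managed_sys -i "name=normal,lpar_name=lpar1,mem_expansion=0.0"
The partition must then be reactivated with this profile for AME to be removed.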
Checkpoint
1. True or False: A partition can use AME on any POWER7
system.
Notes:
Checkpoint solutions
1. True or False: A partition can use AME on any POWER7 system.
The answer is false. (AME can only be configured on a system with
the correct activation code.)
Additional information
Transition statement
Topic 3: Summary
Having completed this topic, you should be able to:
List the hardware and software requirements for AME
Notes:
Notes:
Notes:
Existing AIX tools function as expected when run in a partition configured with AME.
Monitoring of AME partitions is very similar to monitoring partitions without AME.
In particular, CPU resource utilization should be monitored, since a lack of CPU resource can
affect the workload. This is no different from a partition without AME; however, the CPU
resource consumption will be higher when AME is compressing and decompressing pages.
A new performance metric that will need to be monitored is the expanded memory deficit.
Notes:
A memory deficit is the name used to describe a situation where a partition configured for
AME is unable to compress sufficient data to meet the expanded memory target value.
Typically this is because the actual data compression ratio achieved is less than required.
The visuals that follow contain diagrams that help to explain the concept of a memory
deficit. The LPAR depicted in the diagrams is configured as described on the visual.
Figure: zero memory deficit example — the LPAR's expanded logical memory (30GB, the view
presented to firmware, the HMC, and the OS) consists of 2GB of uncompressed data plus 28GB
of compressed data; the actual logical memory is 20GB, split into a 2GB uncompressed memory
pool and an 18GB compressed memory pool, with a compression ratio of 1.56.
Notes:
Zero deficit
When there is a zero memory deficit, the partition will be able to compress sufficient data to
reach the expanded memory target. An example of this situation is shown in the diagram
on the visual.
Figure: memory deficit example — the expanded memory view presented to firmware, the HMC,
and the OS falls short of the target by a 2.8GB deficit.
Notes:
In a memory deficit situation, the partition will not be able to compress sufficient data to
reach the expanded memory target. An example of this situation is shown in the diagram
on the visual.
Notes:
Correcting a deficit
Correcting a deficit will typically involve lowering the memory expansion factor to a less
aggressive value, and adding additional true memory to the partition. This allows the
reconfigured partition to still retain the target expanded memory value.
Notes:
The amepat command can be used to perform basic monitoring of AME statistics. When
invoked with no arguments, it provides a snapshot of AME performance information. No
modeling information is provided when the command is invoked in this way.
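A minimal invocation sketch; the section names correspond to the sample report shown on the
next visual:
# amepat
The snapshot report includes System Configuration and System Resource Statistics sections
and, on an LPAR with AME enabled, an AME Statistics section.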
System Configuration:
---------------------
Partition Name : cassini201
Processor Implementation Mode : POWER7
Number Of Logical CPUs : 8
Processor Entitled Capacity : 2.00
Processor Max. Capacity : 2.00
True Memory : 2.50 GB
SMT Threads : 4
Shared Processor Mode : Enabled-Uncapped
Active Memory Sharing : Disabled
Active Memory Expansion : Enabled
Target Expanded Memory Size : 4.00 GB
Target Memory Expansion factor : 1.60
Notes:
AME information
The amepat command will display AME related information when invoked with no
arguments. The visual above contains an example of the output format.
The System Resource Statistics section contains summarized CPU and memory resource
utilization information. The CPU resource information is from when the LPAR was last
booted. Other commands should be used for fine-grained interval monitoring.
The AME Statistics section is displayed when the command is run on an LPAR that has
AME enabled.
# lparstat -i
. . . . .
Memory Mode : Dedicated-Expanded
Total I/O Memory Entitlement : -
Variable Memory Capacity Weight : -
Memory Pool ID : -
Physical Memory in the Pool : -
Hypervisor Page Size : -
Unallocated Variable Memory Capacity Weight: -
Unallocated I/O Memory entitlement : -
Memory Group ID of LPAR : -
Desired Virtual CPUs : 2
Desired Memory : 2560 MB
Desired Variable Capacity Weight : 128
Desired Capacity : 2.00
Target Memory Expansion Factor : 1.60
Target Memory Expansion Size : 4096 MB
Notes:
AME information
The lparstat command will display AME related information when invoked with the -i flag. If
AME is currently not configured, the fields will contain a dash character. The visual contains
an example of the output format.
# lparstat -c 2 4
%user %sys %wait %idle physc %entc lbusy vcsw phint %xcpu dxm
----- ----- ------ ------ ----- ----- ------ ----- ----- ------ ------
5.7 0.2 3.6 90.5 0.19 9.3 2.9 917 0 6.2 0
5.6 0.1 3.4 90.9 0.18 9.0 2.2 938 0 5.3 0
5.8 0.1 3.4 90.7 0.19 9.3 2.3 935 0 6.3 0
5.8 0.1 2.6 91.5 0.19 9.3 2.7 931 0 6.3 0
Notes:
AME information
The lparstat command will display AME information when invoked with the -c flag. The
information is only shown if AME is currently configured. The visual contains an example of
the output format.
Notes:
AME information
The vmstat command will display AME information when invoked with the -c flag. The
information is only shown if AME is currently configured. The visual contains an example of
the output format.
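A hedged sketch of such an invocation (the interval and count arguments work as for any
vmstat run; the exact names of the additional AME columns vary by AIX level and should be
checked on your system):
# vmstat -c 2 5
With AME configured, the output adds compression-related columns (for example, the
compressed pool size and the expanded memory deficit) alongside the usual memory and CPU
statistics.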
Notes:
AME information
The topas command will display AME information on the main panel if AME is currently
configured. The visual contains an example of the output format.
Notes:
The amepat command can also be used on a partition with AME configured to perform fine
tuning of the configuration. Invoke the command to gather data while the running workload
is at peak utilization. The generated report will be more accurate and useful, as actual AME
CPU resource consumption and achieved data compression rate information is available.
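A hedged sketch of such a recording run (the argument meanings should be confirmed against
the amepat documentation; here 5 is intended as a monitoring interval in minutes and 2 as the
number of samples):
# amepat 5 2
Running this while the workload is at its peak lets the report reflect the actual AME CPU
consumption and the achieved compression ratio.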
Checkpoint
1. True or False: Monitoring a partition that has AME configured
is completely different to monitoring a partition without AME.
Notes:
Checkpoint solutions
1. True or False: Monitoring a partition that has AME configured
is completely different to monitoring a partition without AME.
The answer is false.
2. True or False: A memory deficit is resolved by lowering the
expansion factor value, and removing true memory.
The answer is false.
3. True or False: The vmstat command will always report AME
statistics when AME is enabled.
The answer is false.
4. True or False: The topas command will always show AME
statistics on the initial page when AME is enabled.
The answer is true.
Additional information
Transition statement
Topic 4: Summary
Having completed this topic, you should be able to:
List the tools used to monitor AME performance
Notes:
Exercise
Unit exercise
Notes:
Unit summary
Having completed this unit, you should be able to:
Describe the Active Memory Expansion (AME) feature
List the benefits of using AME
Define the purpose of the memory expansion factor
List workload characteristics used to evaluate suitability for AME
Describe how to use the AME planning tool
Explain the output produced by the AME planning tool
Describe how to select a suitable memory expansion factor
List the hardware and software requirements for AME
Describe how to activate the AME feature on a managed system
Configure a partition to use AME
List the tools used to monitor AME performance
Determine the memory compression level achieved in a partition
Determine the CPU resources used for memory compression and
decompression
Notes:
Estimated time
01:00
References
SG24-7590-01 IBM PowerVM Virtualization Managing and Monitoring
Redpaper: Implementing the Qlogic Intelligent Pass-thru Module for
IBM BladeCenter
Unit objectives
After completing this unit, you should be able to:
Describe the NPIV PowerVM feature
Describe how to configure virtual Fibre Channel adapters on
the Virtual I/O Server and client partitions
Discuss how to use the HMC GUI and commands to work with
the World Wide Port Name (WWPN) pairs
Identify commands used to examine the NPIV configuration
Notes:
NPIV: Overview
Virtualization of a physical Fibre Channel port
Notes:
With N_Port ID Virtualization (NPIV), you can configure the managed system so that
multiple logical partitions can access independent physical storage through the same
physical fiber channel adapter. To access physical storage in a typical storage area
network (SAN) that uses fiber channel, the physical storage is mapped to logical units
(LUNs) and the LUNs are mapped to the ports of physical fiber channel adapters. Each
physical port on each physical fiber channel adapter is identified using one worldwide port
name (WWPN). NPIV is a standard technology for fiber channel networks and enables you
to connect multiple logical partitions to one physical port of a physical fiber channel
adapter. Each logical partition is identified by a unique WWPN, and this allows you to
connect each logical partition to independent physical storage on a SAN.
Using their unique WWPNs and the virtual Fibre Channel connections to the physical Fibre
Channel adapter, the operating systems running in the client logical partitions discover,
instantiate, and manage their physical storage located on the SAN. With NPIV, multiple
Fibre Channel initiators occupy and use a single physical port, which eases hardware
requirements in storage area network design.
NPIV also allows one F_Port (switch port) to be associated with multiple N_Port (node port)
IDs. A physical Fibre Channel HBA (host bus adapter) can be shared across multiple
guest operating systems in a virtual environment. The combination of the ability of an
N_Port device, such as a host bus adapter (HBA), to have multiple N_Port IDs and the
ability of fabric switches to accept NPIV-capable devices is the basic concept of transparent
switching.
Using the SAN tools of the SAN switch vendor, you zone your NPIV-enabled switch to
include the WWPNs that are created by the HMC for any virtual Fibre Channel client adapter
together with the WWPNs from your storage device in a zone. This is the same as required in
an environment using physical Fibre Channel adapters. The SAN uses zones to provide access
to the targets based on WWPNs.
Instructor notes:
Purpose Describe NPIV.
Details
Additional information
Transition statement Let's take a look at an environment without NPIV.
Figure: virtual SCSI environment without NPIV — the Virtual I/O Server owns a non-NPIV
physical FC adapter and maps physical devices to virtual target devices; client partitions
access the storage through VSCSI client virtual adapters connected across the hypervisor
to the SAN.
Notes:
Before NPIV, the only way to share a Fibre Channel adapter was by using the Virtual SCSI
protocol.
Virtual SCSI is based on a client-server relationship. The VIO Server owns the physical
resources as well as the virtual SCSI server adapter, and acts as a server, or SCSI target
device. The client logical partitions have a SCSI initiator, referred to as the virtual SCSI
client adapter, and access the virtual SCSI targets as standard SCSI LUNs. You configure
the virtual adapters by using the HMC or IVM. The configuration and provisioning of virtual
disk resources is performed by using the VIO Server. Physical disks owned by the VIO
Server can be either exported and assigned to a client logical partition as a whole or can be
partitioned into parts, such as logical volumes or files. The logical volumes and files can
then be assigned to different logical partitions. Therefore, using virtual SCSI, you can share
adapters as well as disk devices. To make a physical volume, logical volume, or files
available to a client logical partition requires that it be assigned to a virtual SCSI server
adapter on the Virtual I/O Server. The client logical partition accesses its assigned disks
through a virtual-SCSI client adapter. The virtual-SCSI client adapter recognizes standard
SCSI devices and LUNs through this virtual adapter.
Instructor notes:
Purpose
Details VSCSI requires the configuring of virtual target devices in the VIOS. NPIV does
not require this.
Additional information The following SCSI peripheral device types are supported:
Disk backed by logical volume
Disk backed by physical volume
Disk backed by file
Optical CD-ROM, DVD-RAM, and DVD-ROM
Optical DVD-RAM backed by file
Tape devices
Transition statement
Notes:
With NPIV, the VIOS's role is fundamentally different. The VIOS facilitates adapter sharing
only; there is no device-level abstraction or emulation. Rather than acting as a storage
virtualizer, a VIOS serving NPIV acts as a pass-through, providing a Fibre Channel
pass-through connection from the client to the SAN.
NPIV is a standard technology for fiber channel networks that enables you to connect
multiple logical partitions to one physical port of a physical fiber channel adapter. Each
logical partition is identified by a unique WWPN, which means that you can connect each
logical partition to independent physical storage on a SAN. To enable NPIV on the
managed system, you must create a Virtual I/O Server logical partition (version 2.1, or
later) that provides virtual resources to client logical partitions. You assign the physical fiber
channel adapters (with support for NPIV) to the Virtual I/O Server logical partition. Then,
you connect virtual fiber channel adapters on the client logical partitions to virtual fiber
channel adapters on the Virtual I/O Server logical partition. A virtual fiber channel adapter
is a virtual adapter that provides client logical partitions with a fiber channel connection to a
storage area network through the Virtual I/O Server logical partition. The Virtual I/O Server
cannot access and does not emulate the physical storage to which the client logical
partitions have access. The Virtual I/O Server logical partition provides the connection
between the virtual fiber channel adapters on the Virtual I/O Server logical partition and the
physical fiber channel adapters on the managed system.
NPIV benefits
Optimizes FC HBA resource usage
Simplifies SAN-based resource assignments to client partitions
LUN assigned to the WWPNs of the client virtual adapter
Compatible with storage solutions
SAN managers, Copy Services, backup / restore
Supported platforms
POWER6 servers and blades
HMC-managed and IVM-managed servers
Enables access to other SAN devices like tape libraries
Compatible with Live Partition Mobility
Physical FC HBA port supports 64 virtual ports
VIOS can support NPIV and vSCSI simultaneously
Notes:
Key benefits include the following:
- Automatically adjusts to SAN fabric speed: 8Gbps, 4Gbps, or 2Gbps.
- Optimizes resource usage, since the physical Fibre Channel adapter is shared.
- Each physical NPIV-capable FC HBA (host bus adapter) supports 64 virtual ports.
- NPIV simplifies the assignment of SAN-based resources to client partitions and SAN
zoning:
  - The LUN is assigned to the WWPNs of the client virtual adapter. The LPAR host
  is defined at the disk subsystem.
  - You do not have to identify LUN numbers on the VIOS before mapping to clients.
- Supported on POWER6 servers, blades, HMC-managed and IVM-managed servers.
- Enables access to other SAN devices, such as tape libraries.
- VIOS can support NPIV and vSCSI simultaneously.
- Compatible with LPM (Live Partition Mobility).
Notes:
VIOS can support NPIV and vSCSI simultaneously. Some LUNs can be assigned to WWPNs
presented through the physical NPIV-capable FC adapter; these are the WWPNs assigned to the
clients' virtual Fibre Channel adapters. Simultaneously, the VIOS can also provide access to
LUNs that are mapped to Virtual Target Devices and exported as vSCSI devices. There
can be MPIO or vendor-supplied multi-pathing software used to manage the paths to the
LUNs. You cannot mix vSCSI and NPIV paths to the same LUN.
The client can have one or more Virtual I/O Servers (VIOS) providing the pass-through
function for NPIV. The client can also have one or more VIOS hosting vSCSI storage. The
administrator could configure the client to boot from internal disk, vSCSI disk, or NPIV disk.
The physical HBA in the VIOS can support both NPIV and vSCSI traffic.
Notes:
Only the first SAN switch which is attached to the Fibre Channel adapter in the Virtual I/O
Server needs to be NPIV capable. Other switches in your SAN environment do not need to
be NPIV capable.
An NPIV implementation requires two participating ports:
An N_Port that communicates with a Fibre Channel fabric for requesting port
addresses and subsequently registering with the fabric.
An F_Port (SAN switch port) that assigns the addresses and provides fabric services.
WWPNs are generated based on the range of names available for use with the prefix in the
vital product data on the managed system. This 6-digit prefix comes with the purchase of
the managed system and includes 32,000 pairs of WWPNs. When you remove the
connection between a logical partition and a physical port (for example, by deleting an
adapter), the hypervisor deletes the WWPNs that are assigned to the virtual Fibre Channel
adapter on the logical partition.
The hypervisor does not reuse the WWPNs that are assigned to the virtual Fibre Channel
client adapter on the client logical partition. If you create a new virtual Fibre Channel
adapter, you get a new pair of WWPNs. The pair is critical to proper operation, and both
WWPNs must be zoned (the second WWPN is used for Live Partition Mobility).
POWER6 hardware, minimum firmware Ex340_041
Entry level systems: EL340_041
Midrange systems: EM340_041
Software:
HMC V7.3.4, or later
Virtual I/O Server Version 2.1 with Fix Pack 20.1, or later
AIX 5.3 TL9, or later
AIX 6.1 TL2, or later
SDD 1.7.2.0 + PTF 1.7.2.2
IBM Multipath Software
- NPIV clients require the following versions:
SDD 1.7.2.2
SDDPCM 2.2.0.6 or 2.4.0.1
http://www-01.ibm.com/support/docview.wss?rs=540&context=ST52G7&uid=ssg
1S1003469
- For VIOS 2.1, follow the SDD/SDDPCM support matrix for AIX 6.1 versions
http://www-01.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350
EMC PowerPath
- AIX 6.1 clients require PowerPath 5.3.0.0
- VIOS 2.1 would need PowerPath 5.3.0.0
Hitachi Dynamic Link Manager
- AIX clients require HDLM 5.9.4
- VIOS 2.1 would need HDLM 5.9.4
Instructor notes:
Purpose Identify the requirements.
Details
Additional information If you reach the maximum number of WWPNs, you will need
to contact IBM and request a new activation code.
Transition statement The following discusses the task that must be performed when
configuring NPIV.
Activate the VIO Server or run cfgdev. Check for a new vfchost# adapter
definition.
Notes:
The Virtual I/O Server cannot access and does not emulate the physical storage to which
the client logical partitions have access. The Virtual I/O Server provides the client logical
partitions with a connection to the physical fiber channel adapters on the managed system.
There is always a one-to-one relationship between virtual fiber channel adapters on the
client logical partitions and the virtual fiber channel adapters on the Virtual I/O Server
logical partition. That is, each virtual fiber channel adapter on a client logical partition must
connect to only one virtual fiber channel adapter on the Virtual I/O Server logical partition,
and each virtual Fibre Channel adapter on the Virtual I/O Server logical partition must connect to
only one virtual Fibre Channel adapter on a client logical partition.
Configuring a virtual Fibre Channel adapter using the HMC
You can configure a virtual fiber channel adapter dynamically for a running logical partition
using the Hardware Management Console (HMC). A Linux logical partition supports the
dynamic addition of virtual Fibre Channel adapters only if the DynamicRM tool package is
installed on the Linux logical partition. To download the DynamicRM tool package, see the
Service and Productivity Tools for Linux on POWER systems Web site.
When you dynamically add a virtual Fibre Channel adapter to a client logical partition, the
virtual Fibre Channel adapter (and the associated WWPNs) is lost when you restart the
logical partition. If you add the virtual fiber channel adapter to a partition profile after you
dynamically added it to the logical partition, the profile-based virtual fiber channel adapter
is assigned a different pair of worldwide port names (WWPNs) when the LPAR is started
with this profile. For this reason, the preferred way to add virtual Fibre Channel adapters is
by adding them to the partition profile.
Activate the VIO Server, or run cfgdev if virtual adapter was added using DLPAR.
Map the Virtual FC Adapter to an NPIV Physical Adapter
- vfcmap -vadapter vfchost2 -fcp fcs0
Check Virtual FC Mapping
- lsmap -all -npiv
Activate the LPAR, boot to SMS and install the OS, or run cfgmgr if the virtual FC adapter
was added using DLPAR
Change the reserve policy attribute of the disk to no_reserve
Notes:
Server and client virtual Fibre Channel adapters are mapped one-to-one with the vfcmap
command in the VIOS.
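A hedged sketch of the command sequence described above (device names such as vfchost2,
fcs0, and hdisk1 are examples from this unit or placeholders):
$ cfgdev (VIOS: discover the new vfchost adapter)
$ vfcmap -vadapter vfchost2 -fcp fcs0 (VIOS: map the virtual FC server adapter to the NPIV-capable port)
$ lsmap -all -npiv (VIOS: verify the mapping and the client login status)
# cfgmgr (AIX client: discover the new virtual FC adapter and disks)
# chdev -l hdisk1 -a reserve_policy=no_reserve (AIX client: set the reserve policy to no_reserve)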
Notes:
The WWPNs can also be displayed using the lscfg command at the client. For example:
# lscfg -vl fcs2
Using the SAN tools of the SAN switch vendor, you zone your NPIV-enabled switch to
include WWPNs that are created by the HMC for any virtual Fibre Channel client adapters.
You would put the WWPNs of the virtual adapters and the WWPNs from your storage
device in a zone; just as with a physical fiber channel adapter environment.
Some SAN switches require an optional license to activate NPIV capabilities.
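As a hedged sketch of the relevant portion of the lscfg output mentioned above (the adapter's
WWPN appears in the Network Address field of the vital product data; the value shown is the
example WWPN used elsewhere in this unit):
# lscfg -vl fcs2
. . . . .
Network Address.............C0507600667C0018
. . . . .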
Notes:
In this example, the client is not booting from the SAN LUN. In order for the NPIV client
WWPN to show up on the switch, the client must first do an NPIV login, which requires the
client to first perform device discovery.
Notes:
The lscfg and lsmap commands can be helpful when examining the details of the
configuration. In the lsmap command you can find the name of the NPIV clients, the status
of the connections (LOGGED_IN implies the SAN switch has identified and connected to
the client's n_port), and location codes for the associated devices.
Notes:
If you delete the Virtual Fibre Channel Adapter and recreate it from the HMC GUI (using
DLPAR or in Partition Profile), then you get a new pair of WWPNs. The LUN is not
assigned to your LPAR and a SAN reconfiguration is required.
A new host must be created.
SAN switch and zoning could be affected.
To avoid SAN reconfiguration, change the WWPNs of the newly created Virtual FC Client
adapter and define it with the recorded values from the original adapter. (This is what the
HMC does during LPM process with target Virtual client FC.)
You are able to change the WWPNs of the virtual adapter to match the original WWPNs by
using the HMC command line (be careful with the HMC CLI syntax, backslashes, and double
quotes). Below is an example:
chsyscfg -r prof -m sys154 -i name=mobility, lpar_name=sys154c4,
\"virtual_fc_adapters=\"\"14/client/1/sys154v1/23/c0507600667c0018,c0507600667c0019/1\"\"\
Heterogeneous multipathing
Supported between virtual NPIV and physical Fibre Channel adapters
Delivers flexibility for Live Partition Mobility environments
Figure: heterogeneous multipathing — an AIX client reaches the storage controller through two
paths: one through the NPIV pass-through module in VIOS#1 (virtual Fibre Channel) and one
through its own physical Fibre Channel HBA, each path going through a SAN switch.
Notes:
This configuration provides efficient path redundancy to SAN resources for several LPARs using
a single NPIV adapter. In the example above, the virtual Fibre Channel adapter is used as a backup
path. This configuration also provides Dynamic Heterogeneous Multi-Path I/O. During a
Partition Mobility operation the LPAR could temporarily use the virtual path. The
administrator would have to remove the physical path using DLPAR, migrate, and then add
(reconfigure) the physical adapter at the target system.
# lspath
Enabled hdisk0 fscsi0
Enabled hdisk0 fscsi0
Enabled hdisk0 fscsi2
Enabled hdisk0 fscsi2
Notes:
# lspath -l hdisk0 -s available -F"connection:parent:path_status:status"
50050763060b81c5,4050400000000000:fscsi0:Available:Enabled
50050763061881c5,4050400000000000:fscsi0:Available:Enabled
50050763060b81c5,4050400000000000:fscsi2:Available:Enabled
50050763061881c5,4050400000000000:fscsi2:Available:Enabled
Figure 6-16. Shared NPIV adapter for efficient path redundancy (AN313.1) — a VIOS shares a
physical NPIV Fibre Channel HBA among six AIX and Linux logical partitions; virtual FC server
adapters (A1 through A8) in the VIOS connect through the POWER Hypervisor to virtual client FC
adapters in the partitions.
Notes:
Redundancy configurations help protect your network from physical adapter failures as well
as Virtual I/O Server failures. Similar to virtual SCSI redundancy, virtual Fibre Channel
redundancy can be achieved using Multi-path I/O (MPIO) and mirroring at the client
partition. The difference between traditional redundancy with SCSI adapters and the NPIV
technology using virtual Fibre Channel adapters is that the redundancy occurs on the client
because only the client recognizes the disk.
The physical Fibre Channel port is connected to a virtual Fibre Channel adapter on the VIO
Server. The virtual Fibre Channel adapter on the VIO Server is connected to ports on the
physical Fibre Channel adapter. A single adapter could have multiple ports.
This example uses host bus adapter (HBA) failover to provide a basic level of redundancy
for the client logical partitions number 5 and 6. Their primary paths are through the
assigned physical fiber channel adapters. The backup paths are the virtual Fibre Channel
adapters.
It is recommended that you configure virtual Fibre Channel adapters from multiple logical
partitions to the same HBA, or you configure virtual Fibre Channel adapters from the same
logical partition to different HBAs.
Figure: virtual SCSI I/O stack — an LVM, multipathing, and disk driver stack runs in each client
and in the VIOS, connected through the POWER Hypervisor (PHYP) to the SAN.
Notes:
This is a simple diagram to illustrate how the VSCSI configuration requires the VIOSs to
provide key components and devices.
Figure: NPIV I/O stack — the LVM, multipathing, and disk driver stack runs in the client;
traffic passes through the POWER Hypervisor (PHYP) and the VIOS pass-through to the SAN.
Notes:
With NPIV, the VIOS does not have virtual target devices configured. A virtual Fibre Channel
server adapter is created, but it serves only as a connection to the pass-through module.
Figure: NPIV and Live Partition Mobility — VIO clients on the source and destination systems,
each with WWPNs on their virtual Fibre Channel adapters, connect through NPIV-capable VIOS
partitions to an NPIV-enabled SAN.
Notes:
Target storage subsystem must be zoned and visible from source and destination systems
for LPM to work.
Active/passive storage controllers must BOTH be in the SAN zone for LPM to work.
The infrastructure must meet the following requirements for migrations with virtual Fibre
Channel adapters:
The destination Virtual I/O Server must contain an NPIV-capable physical Fibre
Channel adapter that is connected to the NPIV-enabled port on the switch that has
connectivity to a port on a SAN device that has access to the same targets as the client
is using on the source CEC.
On the source Virtual I/O Server partition, do not set the adapter as required when you
create a virtual Fibre Channel adapter. The virtual Fibre Channel adapter must be solely
accessible by the client adapter of the mobile partition.
On the destination Virtual I/O Server partition, do not create any virtual Fibre Channel
adapters for the mobile partition. These are created automatically by the migration
function.
The mobile partition's virtual Fibre Channel WWPNs must be zoned on the switch with
the storage subsystem. You must include both WWPNs from each virtual Fibre Channel
adapter in the zone. The WWPN on the physical adapter on the source and destination
Virtual I/O Server does not have to be included in the zone.
The following components must be configured in the environment to support Live
Partition Mobility:
- An NPIV-capable SAN switch
- An NPIV-capable physical Fibre Channel adapter on the source and destination
Virtual I/O Servers
- Each virtual Fibre Channel adapter on the Virtual I/O Server mapped to an
NPIV-capable physical Fibre Channel adapter
- Each virtual Fibre Channel adapter on the mobile partition mapped to a virtual Fibre
Channel adapter in the Virtual I/O Server
- At least one LUN mapped to the mobile partition's virtual Fibre Channel adapter
- Mobile partitions may have virtual SCSI and virtual Fibre Channel LUNs. Migration
of LUNs between virtual SCSI and virtual Fibre Channel is not supported at the time
of publication.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
This visual shows a list of commands that are useful when managing an NPIV
environment.
Instructor notes:
Purpose
Details
Additional information Other commands in AIX:
lspath
lspath -l hdisk0 -s available
-F"connection:parent:path_status:status"
Transition statement
Checkpoint
1. As with SCSI, a server adapter must be created at the VIOS.
However, how does its function differ from VSCSI?
Notes:
Instructor notes:
Purpose
Details
Checkpoint solutions
1. As with SCSI, a server adapter must be created at the VIOS.
However, how does its function differ from VSCSI?
The answer is that with NPIV, the VIOS provides a pass-through service.
Additional information
Transition statement
Exercise
Unit exercise
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Unit summary
Having completed this unit, you should be able to:
Describe the NPIV PowerVM feature
Describe how to configure virtual Fibre Channel adapters on
the virtual I/O server and client partitions
Discuss how to use the HMC GUI and commands to work with
the World Wide Port Name (WWPN) pairs
Identify commands used to examine the NPIV configuration
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Estimated time
04:00
Unit 7. I/O device virtualization performance and tuning
Unit objectives
After completing this unit, you should be able to:
Discover physical to virtual SCSI device configuration
Determine which client partitions and devices are affecting the Virtual I/O Server performance
Describe the partition resource sizing guidelines for Virtual I/O Servers used for virtual SCSI
Use performance analysis tools to monitor virtual SCSI device performance
Describe how the following tuning options affect virtual Ethernet performance:
MTU sizes, CPU entitlement, TCP checksum offloading, simultaneous multithreading
Monitor virtual Ethernet utilization statistics
Describe Virtual I/O Server sizing guidelines for hosting shared Ethernet adapter services
Physical adapters, memory, and processing resources
Configure shared Ethernet adapter threading
Configure TCP segmentation offload on the shared Ethernet adapter
Configure SEA bandwidth apportioning and monitor with the seastat utility
Monitor shared Ethernet adapter network traffic with Virtual I/O Server utilities
Describe the Integrated Virtual Ethernet (IVE) adapter function
List performance and network availability considerations when configuring IVE devices
Tune the MCS value and queue pairs for optimal performance or scalability
View queue pair configuration from AIX
Monitor IVE port usage
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Performance considerations
Virtualization alters the way we look at system performance. We still follow the same rules
with respect to identifying existing or potential bottlenecks, but the remedy can be different
and more difficult to obtain. It is important to decide what the performance goals are.
Understand how devices should be configured for the best results and when virtual devices
should be used instead of natively attached physical devices. Clients can use a mix of
directly attached physical devices and virtual devices depending on their requirements and
the availability of devices. Client partitions can use one or more VIOS partitions to
provide their virtual services for load balancing or redundancy.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Each partition has two virtual serial adapters to support virtual console access. Do not
remove these; there is no need to create additional virtual serial adapters. Every
POWER6 Virtual I/O Server partition, as of server firmware level 01EL320, will have a
Virtual Asynchronous Services Interface (VASI) adapter. Four additional VASI adapters are
added if the VIOS is designated as a paging VIOS for a shared memory pool. The virtual
Ethernet adapter is supported on POWER5 and POWER6 processor-based server
partitions running AIX V5.3 or higher or Linux. The Integrated Virtual Ethernet (IVE)
adapter is available on most POWER6 processor-based systems and is also called the
Host Ethernet Adapter (HEA). It is an integrated physical Ethernet adapter which can be
shared between partitions. IVE logical ports are supported in partitions running AIX V5.2
and higher and Linux.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Virtual SCSI devices are backed by physical devices on the Virtual I/O Server that provide
disk storage or media devices to the client. Even though SCSI is the protocol used for the
virtualization, the actual backing storage devices do not need to be SCSI devices. The
shared Ethernet adapter is a network bridge device that connects virtual Ethernet traffic on
a managed system to an external network. Virtual Fibre Channel adapters use N_Port
Identifier Virtualization (NPIV) technology.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
The items in the visual are considerations for VIOS performance or other virtual device
performance. Most of these you will prove during the hands-on lab exercises throughout
this course.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Monitoring resources on the Virtual I/O Server and AIX client
The Virtual I/O Server has a command line interface (CLI) with its own set of commands.
Use the help command at the CLI to see the available commands. AIX tools are available
by using the oem_setup_env command to access the root shell. The VIOS CLI has the
topas command which will help monitor all of the key resource areas. In addition, for CPU
and I/O usage statistics, you can use the viostat command which is like the iostat AIX
command. Use entstat and seastat for shared Ethernet adapter devices. The optimizenet
command is like the no AIX command.
Monitor system resources on the AIX client partitions as you normally would. If you find an
area with a bottleneck, be sure to determine whether the device is native or virtual. If
virtual, track it back to the physical device on the Virtual I/O Server.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
There are two key points when monitoring virtual devices. First, if you run out of processing
or memory resources on the Virtual I/O Server then this affects all of the clients which are
using those resources. Careful monitoring and tuning of the Virtual I/O Server partition is
necessary. Second, if you discover a performance issue on a device, be sure to determine
the exact physical device used as the backing device. Then tune the physical device as
you normally would in a non-virtualized environment.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Checkpoint
1. Once a CPU constraint is found as a bottleneck, what are
some steps that can be taken to solve the problem?
Notes:
Instructor notes:
Purpose
Details
Checkpoint solutions
1. Once a CPU constraint is found as a bottleneck, what are
some steps that can be taken to solve the problem?
The answers are: check process activity to determine errant
processes, add CPU resources, change the configuration (for
example, capped to uncapped, or dedicated processors to
donating mode), and move workload.
Additional information
Transition statement
Topic 1: Summary
Having completed this topic, you should be able to:
Describe the performance considerations when using virtual
I/O
Use a methodical approach when tuning virtual I/O
performance
Describe tools that can be used to analyze and tune virtual
configurations
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Physical storage
Notes:
This visual shows an example system with virtual devices. The virtual target devices
(VTDs) on the Virtual I/O Server associated with vhost0 are vtscsi0, vtscsi1, and vtopt0.
Each one represents the association of a single backing device to the virtual SCSI server
adapter.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
In the client partition, the virtual storage can be manipulated using the Logical Volume
Manager (LVM) just like a physical volume. The virtual SCSI client adapter can use these
devices like any other physically connected hdisk device for boot, swap, mirror, or any
other supported AIX feature. Performance considerations from dedicated storage are still
applicable when using virtual storage, such as spreading hot logical volumes across
multiple disks on multiple adapters so that parallel access is possible.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Virtual SCSI uses more processing power
Using Virtual SCSI (VSCSI) requires extra processing power compared to native disks.
This is due to the processing of extra Hypervisor calls and the paths involved for
exchanging I/O requests between the initiator and target adapters. The use of VSCSI will
roughly double the amount of processor time to perform each I/O when compared to using
directly attached storage. This processor load is split between the Virtual I/O Server and
the virtual SCSI client. Double the processor time sounds bad; however, the extra
processing time to process one 4KB I/O request is less than 50,000 CPU cycles. On a
1.65GHz processor core, this represents only 0.03 milliseconds. This is less than 1% of the
average seek time of any high performance 15,000 rpm SCSI disk.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Figure: I/O latency (ms) versus block size (4KB to 128KB) — the left chart's y-axis runs 0 to
0.03 ms; the right chart, showing average response times for a typical natively attached disk,
runs 0 to 1 ms.
Notes:
I/O latency when using VSCSI
I/O latency is the time it takes between the initiation of a disk I/O and completion as
observed by the thread. Latency is an important attribute of disk I/O. Applications which are
multi-threaded or use asynchronous I/O might be less sensitive to I/O latency, but under
most circumstances, lower latency is better for performance. Latency also varies with
different I/O block sizes. Consider a program which performs 1000 random disk I/Os one at
a time. If the time to complete an average I/O is six milliseconds, the program will take at
least six seconds to run; however, if the average I/O response time is reduced to three
milliseconds, the program's run time could be reduced by three seconds. The chart on the
right side of the visual shows average response times for an I/O using a typical disk used
natively in a partition. This chart is provided to illustrate the fact that 0.06 milliseconds is a
small fraction of an overall average I/O response time.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Figure: I/O bandwidth versus block size (4KB to 128KB), comparing virtual SCSI through the
Virtual I/O Server with native disks (y-axis 0 to 40+).
Notes:
I/O bandwidth is the maximum amount of data which can be read or written to storage in a
unit of time. Bandwidth can be measured from a single thread, or from a set of threads
executing concurrently. Though many commercial applications are more sensitive to
latency than bandwidth, bandwidth is crucial for many typical operations such as backup
and restore. The chart in the visual shows a comparison of measured bandwidth using
VSCSI and native disks for reads with varying block sizes of operations. In these tests, a
single thread operates sequentially on a constant file which is 256MB in size. The
difference between virtual I/O and native I/O in these tests is attributable to the increased
latency using virtual I/O. Because of the larger number of operations, the bandwidth
measured with small block sizes is much lower than with large block sizes.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Figure: CPU cycles per byte versus block size (4K to 128K) for native I/O, physical volume (PV)
backed virtual SCSI, and logical volume (LV) backed virtual SCSI (y-axis 0 to 12).
Notes:
Cycles per byte comparison
The graph in the visual shows a comparison of the CPU cycles per byte for native I/O and
VSCSI I/O using both logical volume backed storage and physical volume backed storage.
In the visual above, PV-backed is physical disk backed storage and LV-backed is logical
volume backed storage. The VSCSI measurements are of the Virtual I/O Server only; the
client is not included in the comparison. The processor efficiency of I/O improves with
larger I/O block sizes. Effectively, there is a fixed latency to start and complete an I/O
transaction, with some additional cycle time based on the size of the I/O transaction.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Figure 7-18. Sizing the Virtual I/O Server for virtual SCSI AN313.1
Notes:
Virtual I/O Server processor and memory sizing
Use of shared processors for VSCSI servers will slightly increase I/O response time but
might be worth the benefits of flexible processor entitlement sizing and the ability to mark
the partition as uncapped. Additional entitlement should be added when using shared
processors compared to dedicated processors on the VIOS. Tests have shown that with
low I/O loads and a small number of partitions, using shared processors on the Virtual I/O
Server partition has little effect on performance. For a more efficient virtual SCSI implementation
with larger loads, it might be advantageous to configure the Virtual I/O Server partition as a
dedicated processor partition. The memory requirements for the VSCSI server are modest
because there is no data caching in the VSCSI server. With large I/O configurations and
very high data rates, 1GB of memory for the VSCSI server is typically more than enough.
For configurations with low I/O rates with a small number of attached disks, 512MB of
memory is usually sufficient. If using IVE logical ports, configure an additional 103MB per
port.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
CPU cycles chart
The chart in the visual shows the typical number of CPU cycles per operation for both
physical volume and logical volume backed operations on a 1.65GHz POWER5 processor
core. These numbers are measured at the physical processor with SMT enabled. For other
CPU frequencies, adjust the cycles in the table by multiplying the cycles per operation by
the ratio of the frequencies. For example, to adjust for a 4.2GHz CPU, 1.65GHz/4.2GHz =
0.39. Multiply the CPU cycles in the table by 0.39 to get the required cycles per operation.
For example, 45,000 cycles would become 17,550 cycles.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Sizing example based on knowledge of the I/O traffic
The formula to reach the 0.85 processing units is shown on the visual. The 47,000 number
is taken from the chart on the previous visual for 8KB blocks. The 120,000 number is taken
from the same chart for 128KB blocks. To customize the formula for a different processor
speed, use the ratio of the processor speeds to convert the number of cycles. For a 4.2GHz
CPU, you would convert the 47,000 CPU cycles into 18,330 and the 120,000 CPU cycles
into 46,800, so the result would be as follows: (128,310,000 + 183,300,000 + 234,000,000) /
1,650,000,000 = 0.33 processors. Alternatively, do the calculation using the 1.65GHz
information, then multiply the resulting processing units by the ratio between the 1.65GHz
processor speed and the speed of your processor. For example, for 4.2GHz, multiply the
0.85 processors by 0.39 to get 0.33 processors.
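As a quick sanity check of that arithmetic, assuming the workload implied by the numerator
terms (7,000 and 10,000 8KB operations per second at 47,000 cycles each, plus 5,000 128KB
operations per second at 120,000 cycles each), a one-line calculation reproduces the 0.85
figure for a 1.65GHz core:
# echo | awk '{printf "%.2f\n", (7000*47000 + 10000*47000 + 5000*120000)/1650000000}'
0.85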
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Sizing based on planning for maximum bandwidth configuration
If the server is sized for maximum bandwidth (which assumes sequential I/O), the
calculation will result in a much higher processor requirement than what might actually be
needed. Since disks are much more efficient doing large sequential I/Os than small random
I/Os, we can drive a much higher number of I/Os per second. Assume that a Virtual I/O
Server has 32 disks capable of 50MB per second when doing 128KB I/Os. That implies
each disk could average 390 disk I/Os per second (50,000,000 / 128,000=390.625). Thus,
the entitlement necessary to support 32 disks, each doing 390 I/Os per second with an
operation cost of 120,000 cycles requires approximately 0.91 processors
((32*390*120,000)/1,650,000,000). More simply, a Virtual I/O Server running on a single
processor core should be capable of driving approximately 32 fast disks to maximum
throughput.
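The same style of one-line check, using the 32 disks at 390 I/Os per second and 120,000 cycles
per operation from this example, reproduces the result:
# echo | awk '{printf "%.2f\n", (32*390*120000)/1650000000}'
0.91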
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Increasing the queue depth on a client virtual device reduces the number of open devices that
the virtual adapter can support and the number of I/O requests that those devices can have
active on the Virtual I/O Server. The VSCSI queue depth generally should not be any larger than
the queue depth on the physical LUN; a larger value wastes resources without providing
additional performance. If the virtual target device is a logical volume, the queue depths of all
the disks included in that logical volume must be considered. If the logical volume is being
mirrored, the virtual SCSI client queue depth should not be larger than the smallest queue depth
of any physical device used in a mirror; when mirroring, throughput is effectively throttled to the
device with the smallest queue depth. If a volume group on the client spans virtual disks, keep
the same queue depth on all the virtual disks in that volume group, especially when using
mirroring.
When increasing the VSCSI client queue depth can be a useful optimization:
The storage is Fibre Channel attached.
The SCSI queue depth is already a limiting factor at the default setting of three.
Notes:
Examples for tuning the virtual SCSI queue depth
Example configuration 1: A physical volume on the VIOS has a queue depth of 16 and the
default virtual SCSI queue depth of three. The entire physical volume is used as the backing
storage. Alter the virtual SCSI client queue depth to 16.
Example configuration 2: A physical volume on the VIOS has a queue depth of 16. It has
eight logical volumes being used as backing storage for client LPARs, and the virtual SCSI
queue depths are three. In this case, the clients can generate up to 24 pending I/Os (8 x 3)
against a physical queue depth of 16. The physical volume queue depth could be tuned to 24
for better performance. For more information, read the section on virtual SCSI queue depth in
chapter 4 of IBM System p Advanced POWER Virtualization Best Practices, an IBM Redpaper
document (REDP-4194).
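As an illustration only (the hdisk names are hypothetical and will differ on your systems), the queue depths in these examples could be inspected and changed with lsattr and chdev; the -P and -perm flags defer the change to the next restart, which is needed if the disk is in use:
On the AIX client (example configuration 1):
# lsattr -El hdisk0 -a queue_depth
# chdev -l hdisk0 -a queue_depth=16 -P
On the VIOS (example configuration 2, raising the physical volume queue depth to 24):
$ chdev -dev hdisk0 -attr queue_depth=24 -perm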
Notes:
Monitor system resources on the client partitions as you normally would. If you find an area
with a bottleneck, be sure to determine whether the device is native or virtual. If virtual,
track it back to the physical device on the Virtual I/O Server. Do not forget to check general
resource consumption on the Virtual I/O Server (CPU and memory in particular). Once you
find out the exact configuration and what the core problem is, then you can determine what
the tuning steps should be. For example, you do not want to perform disk tuning activities
when the problem is really a CPU starvation issue on the Virtual I/O Server.
Notes:
Disk bound system
A system might be disk bound if at least one disk is busy and cannot fulfill other requests,
and processes are blocked and are waiting for the I/O operation to complete. The limitation
can be either physical or logical. The physical limitation involves hardware like bandwidth
of disks, adapters and the system bus. The logical limitation involves the organization of
the logical volumes on disks and Logical Volume Manager (LVM) tuning and settings, such
as striping or mirroring. The example in the visual shows a Virtual I/O Server with one very
busy disk. Usually, we would also look at the Wait% and the wait queue for an indication that the
processors were idle while threads were still waiting for I/O. However, since this is the VIOS, it is
the client's threads, not the VIOS's, that wait on these I/Os; if we were to look at the client, we
might see threads waiting or wait time. Note, however, that you will not always see waits when
there is a disk performance bottleneck.
Example of the avg-cpu line from the iostat output in the visual:
tty:   tin     tout    avg-cpu:  % user  % sys  % idle  % iowait  physc  % entc
       0.0     180.0             0.8     31.2   34.2    33.8      0.1    36.6
Notes:
Comparing the examples
In the examples in the visual, the client's hdisk0 is the same as the VIOS's hdisk2.
% tm_act: Reports the percentage of time that the physical disk was active (the total time of
disk requests). When utilization exceeds roughly 60 to 70 percent, it usually indicates that
processes are starting to wait for I/O.
Kbps: Reports the amount of data transferred to or from the drive in kilobytes per second.
tps: Reports the number of transfers per second issued to the physical disk.
Kb_read: Reports the total data (in kilobytes) read from the physical volume during the
measurement interval.
Kb_wrtn: Reports the amount of data (in kilobytes) written to the physical volume during the
measurement interval.
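To collect output like that shown in the visual, iostat can be run on the client and viostat on the VIOS for a given interval and count; the hdisk name below is a placeholder:
On the AIX client:
# iostat -d hdisk0 5 3
On the VIOS:
$ viostat 5 3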
Notes:
Find busiest logical volumes using lvmstat command output
The example output file in the visual only shows the Most Active Logical Volumes section
of the file. It shows that there are two busy logical volumes. Figure out on which physical
volumes these are located (lspv). If they're on the same physical volume, move one to a
less busy disk. If one logical volume is causing the disk to be too busy, you could use a
faster disk, use LVM or SAN storage features to spread the load over multiple disks, or
work with the client LPAR to figure out how that LV is being used and whether some
functions can be moved to another disk. An example line from the lvmstat output is shown
in the visual.
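A minimal sketch of gathering this data, assuming a volume group named rootvg and a disk named hdisk0 (lvmstat is run as root, for example from oem_setup_env on the VIOS):
# lvmstat -v rootvg -e       (enable statistics collection for the volume group)
# lvmstat -v rootvg 5 3      (report the most active logical volumes every 5 seconds, 3 times)
# lspv -l hdisk0             (list which logical volumes reside on a physical volume)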
Notes:
Disk I/O analysis
When a system has been identified as having disk I/O performance problems, the next step is
to find out where the problem comes from. This visual shows the steps to follow on a disk I/O
bound system. The items to verify are:
For a native device: Check the physical adapter and the physical disk.
For a virtual device: Check the CPU and memory usage on the Virtual I/O Server, in addition
to the physical adapter and disk.
Other tools to use to track down the precise disk and area on disk include lspv, lslv,
iostat/viostat, fileplace, and filemon. You will use these commands in the hands-on
exercise to determine where performance problems originate.
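As a hedged illustration of two of these tools (the output file and data file names are hypothetical):
# filemon -o fmon.out -O lv,pv    (trace logical and physical volume activity)
# sleep 60; trcstop               (stop the trace after the measurement period; the report goes to fmon.out)
# fileplace -pv /data/bigfile     (show how a file is placed on its logical and physical volumes)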
Total CPU cycles for the I/O client plus the Virtual I/O Server will be
higher than native I/O.
If a Virtual I/O Server has constrained resources, it will affect all clients
using those resources.
Memory requirements for the Virtual I/O Server are typically modest due
to the fact there is no data caching on the Virtual I/O Server.
Notes:
General performance considerations with virtual SCSI
If not constrained by processor performance, virtual disk I/O throughput is comparable
to native I/O.
Since VSCSI is a client/server model, the combined CPU cycles required on the I/O
client and the Virtual I/O Server will always be higher than native I/O. A reasonable
expectation is that it will take twice as many cycles to do VSCSI as native I/O (more or
less evenly distributed between the client and server).
If multiple partitions are competing for resources from a VSCSI server, care must be
taken to ensure enough server resources (processor, memory, and disk) are allocated
to do the job.
There is no data caching in memory on the server partition. Thus, all I/Os which it
services are essentially synchronous disk I/Os. Because there is no caching in memory
on the server partition, its memory requirements should be modest.
Checkpoint (1 of 2)
1. True or False: Memory requirements on the VIOS to support
VSCSI I/O operations are minimal because no data caching
is performed on the VIOS.
Notes:
Checkpoint solutions (1 of 2)
1. True or False: Memory requirements on the VIOS to support
VSCSI I/O operations are minimal because no data caching
is performed on the VIOS.
The answer is true.
Additional information
Transition statement
Checkpoint (2 of 2)
3. Which one of the following recommendations about sizing the
Virtual I/O Server for virtual SCSI is false:
a. For the best performance, dedicated processors can be used.
b. When using shared processors, use the uncapped mode.
c. When using shared processors, set the priority (weight value) of the
Virtual I/O Server partition equal to its client partitions.
Notes:
Checkpoint solutions (2 of 2)
3. Which one of the following recommendations about sizing the Virtual
I/O Server for virtual SCSI is false:
a. For the best performance, dedicated processors can be used.
b. When using shared processors, use the uncapped mode.
c. When using shared processors, set the priority (weight value) of the Virtual I/O
Server partition equal to its client partitions.
The answer is when using shared processors, set the priority (weight
value) of the Virtual I/O Server partition equal to its client partitions.
Additional information
Transition statement
Topic 2: Summary
Having completed this topic, you should be able to:
Discover physical to virtual SCSI device configuration
Determine which client partitions and devices are affecting the
Virtual I/O Server performance
Describe the partition resource sizing guidelines for Virtual I/O
Servers used for virtual SCSI
Use performance analysis tools to monitor virtual SCSI device
performance
Notes:
Notes:
Notes:
Virtual Ethernet enables inter-partition communication without the need for physical
network adapters assigned to each partition. This technology enables IP-based
communication between logical partitions on the same system using a VLAN capable
software switch (POWER Hypervisor) in POWER5 and POWER6 systems. The virtual
Ethernet interfaces can be configured with both IPv4 and IPv6 protocols. To use virtual
Ethernet to connect to a physical Ethernet network through a physical Ethernet adapter, you
must implement a shared Ethernet adapter. This is discussed in a later unit.
Notes:
The POWER Hypervisor provides a virtual Ethernet switch function based on the IEEE
802.1Q VLAN standard that allows partition communication within the same server. Using
this switch function, partitions can communicate with each other by using virtual Ethernet
adapters and assigning VIDs that enable them to share a common logical network. The
POWER Hypervisor Ethernet switch function is included as standard in all POWER5 and
POWER6 systems. It does not require the purchase of any of the PowerVM (or Advanced
POWER Virtualization) features. The virtual Ethernet adapters are created and the VID
assignments are performed using the HMC. The system allows virtual Ethernet adapters to
be configured with a PVID, which is used to tag untagged packets.
Notes:
Comparing throughput of virtual Ethernet to physical Ethernet
The virtual Ethernet adapter has a higher raw throughput than physical Ethernet at all MTU
sizes. With an MTU size of 9000 bytes, the throughput difference is very large (four to five
times) because the physical Ethernet adapter is running at wire speed (989 Mbit/s user
payload), while the virtual Ethernet adapter can run much faster as it is limited only by CPU
and memory-to-memory transfer speeds.
If a partition is CPU constrained with a high virtual Ethernet workload, you can see linear
improvements in the throughput as more processor resources are added.
[Figure: Throughput per second (Mb) at MTU sizes 1500, 9000, and 65394, for simplex (S) and duplex (D) workloads]
Notes:
Throughput at different MTU sizes
The data in the visual is from a test on a POWER5 system using AIX V5. The test data is
documented in IBM System p Advanced POWER Virtualization Best Practices, an IBM
Redpaper document (REDP-4194). * The actual tests from which the data in the visual was
obtained used an MTU of 65394. As of AIX 6, the maximum MTU for virtual Ethernet
adapters is 65390. When setting the MTU size to larger values, be sure to have the
network parameters tcp_pmtu_discover and udp_pmtu_discover enabled. They are
enabled by default. The tcp_sendspace and tcp_recvspace buffer settings can also have a
performance impact. See the IBM System p Advanced POWER Virtualization Best
Practices, an IBM Redpaper document (REDP-4194) for more information on buffers.
[Figure: Throughput per second (Gb) at MTU sizes 1500, 9000, and 65390, for simplex (S) and duplex (D) workloads]
Notes:
Example 2
The data in the visual was obtained with tests performed in the IBM UNIX Service
Enablement training lab on a POWER6 (4.2 GHz) processor-based server using a partition
running AIX 6. Workloads vary greatly and these numbers cannot be promised to
customers. However, notice that the effect of MTU size is the same in example 1 and
example 2.
[Figure: Performance gain (%) with SMT at MTU sizes 1500, 9000, and 65394, for simplex (S) and duplex (D) workloads; data from AIX V5]
Notes:
Examining the impact of simultaneous multithreading (SMT) on virtual Ethernet
performance
The virtual Ethernet performance observed by a partition typically benefits when
simultaneous multithreading is enabled, because the virtual Ethernet is not limited by
media speed and it can take advantage of the extra available processor cycles. However,
in the case of light workloads, performance can be better if simultaneous multithreading is
disabled. How the system behaves under a light workload depends on the partition type. In a
dedicated processor partition, the second simultaneous multithreading thread is disabled;
however, the system checks periodically to determine whether it should be reactivated. In a
shared processor partition, the second simultaneous
multithreading thread runs an idle loop at a low priority. The consumption of CPU cycles by
periodic disabling, checking, and enabling of the second thread or running of an idle loop
tends to affect the latency of the transactions on the virtual Ethernet, thus reducing
throughput.
Notes:
The TCP checksum is calculated on sending systems and the value placed in a field in the
packet. When the receiving system receives the packet, it recalculates the checksum and
then verifies that it is the same as the one the sender put in the field in the packet. This is to
make sure the packet did not get corrupted as it traveled over the physical network,
through routers, and so on. On virtual networks, the Hypervisor copies the packets from the
memory of the sender partition to the memory of the receiving partition, so there is no
potential for the packet to get corrupted along the way. The checksum offload setting must
be enabled in order to disable the checksum calculation. For virtual Ethernet adapters,
checksum offload is enabled by default for performance and is configurable. This causes
the sending system TCP stack to not generate a checksum when sending a packet, as it
assumes the adapter will do it instead. The virtual Ethernet adapter marks the packet as
having come from a virtual Ethernet adapter, so that the receiving side does not expect a
real checksum value.
To change ent0:
# chdev -l ent0 -a chksum_offload=[yes|no]
Or:
# smit chgenet
Notes:
If the network traffic is going to be within a virtual Ethernet, then this feature should be
enabled on both the sending partitions and the receiving partitions for the best
performance. If there is a mismatch, that is, the sender and the receiver are not configured
the same way, the Hypervisor keeps track of how all of the adapters are configured and will
calculate the checksum if necessary (either on the sending side, or on the receiving side
where it will also perform the verification function). The ability to configure TCP checksum
offload for virtual Ethernet adapters is supported by AIX 5L V5.3 with maintenance level 3
and above. For physical adapters, it is also possible to enable or disable checksum
calculation at the TCP level with the ifconfig command. This method does not work for
virtual Ethernet adapters, and should not be used.
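To confirm the current setting on a given adapter (ent0 here is only an example name), lsattr can be used on the AIX partition:
# lsattr -El ent0 -a chksum_offload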
[Flowchart: Is it a virtual Ethernet adapter? Yes: on both source and destination, check CPU resources, SMT, buffers, and MTU. No: it might not be a network I/O problem; re-evaluate.]
Notes:
Virtual Ethernet adapter
If there are performance problems with a virtual Ethernet adapter, make sure there are
enough CPU resources. There can be an impact on virtual Ethernet throughput even at
60-70% CPU saturation. Verify that SMT is enabled, checksum offload is enabled, and you
are using the highest MTU allowed for the configuration. Virtual Ethernet adapters which
use a shared Ethernet adapter to connect to a physical network will need to use the MTU of
the physical network or use path MTU (PMTU) discovery to set the best MTU for the
network. Also, check for socket buffer failures with the netstat -c command.
Notes:
Virtual Ethernet monitoring
To monitor virtual Ethernet traffic, you can use topas to view the interface statistics, entstat
to view the adapter statistics, and netstat to view overall network statistics. The entstat
values can be reset to zero with the -r flag.
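For example, on an AIX partition (ent0 and en0 are placeholders for your own adapter and interface names):
# entstat -d ent0     (detailed adapter statistics)
# entstat -r ent0     (reset the statistics to zero)
# netstat -v          (statistics for all network adapters)
# netstat -i          (per-interface packet and error counts)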
Checkpoint
1. True or False: Virtual Ethernet adapters are created and the
PVID assignments are performed using the Hardware
Management Console (HMC).
Notes:
Checkpoint solutions
1. True or False: Virtual Ethernet adapters are created and the PVID
assignments are performed using the Hardware Management Console
(HMC).
The answer is true.
Additional information
Transition statement
Topic 3: Summary
Having completed this topic, you should be able to:
Describe how the following tuning options affect virtual
Ethernet performance:
MTU sizes, CPU entitlement, TCP checksum offloading, simultaneous
multithreading
Monitor virtual Ethernet utilization statistics
Notes:
Notes:
Notes:
In the visual, we see a representation of a Virtual I/O Server partition with one physical
adapter and two virtual adapters to connect to two separate VLANs on the managed
server. The shared Ethernet adapter (SEA) acts as an OSI Layer 2 bridge between the
virtual adapters and the physical adapters. The bridge performs the function of a MAC relay
and is independent of any higher layer protocol. With the default SEA configuration, if one
client partition sends data, it can take advantage of the full bandwidth of the adapter,
assuming the other client partitions do not send or receive data over the network adapter at
the same time. The Virtual I/O Server offers broadcast and multicast support, as well as
support for Address Resolution Protocol (ARP) and Neighbor Discovery Protocol (NDP).
Notes:
Configuring the TCP/IP parameters with a shared Ethernet adapter
The visual shows the devices that can exist when implementing a single shared Ethernet
adapter. In this example, there are two interfaces on which the TCP/IP options can be
configured, and neither choice provides a performance advantage over the other. The first
option is to configure the TCP/IP parameters, such as the IP address, on the interface
associated with the SEA (en3 in this example). The second option is to configure a second,
optional, virtual Ethernet adapter and configure the TCP/IP parameters on its interface (en2 in
this example). You cannot configure the interface for the actual physical adapter associated
with a shared Ethernet adapter (en0 in this example). You also cannot configure the
interface associated with the shared Ethernet adapter's virtual Ethernet adapter (en1 in this
example). You do not have to configure a TCP/IP address on the VIOS at all in order for the
SEA to bridge network traffic. Configuring the TCP/IP address as described here is simply
to allow the VIOS LPAR itself to communicate with other hosts on the network.
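As a sketch only (the host name, addresses, and interface are hypothetical), the mktcpip command configures TCP/IP on the chosen interface from the VIOS CLI:
$ mktcpip -hostname vios1 -inetaddr 10.10.10.5 -interface en3 \
-netmask 255.255.255.0 -gateway 10.10.10.1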
Redundancy options
Configure SEA with a network interface backup adapter.
Use SEA as network interface backup adapter for clients.
Clients can use their own dedicated physical adapter with the SEA link as a backup, or
clients can use two SEAs.
Configure dual VIOS partitions with SEA failover.
Notes:
Changing the MTU on the physical adapter
Use the chdev command to change the MTU size on the physical adapter before the shared
Ethernet adapter is created. This example changes the MTU size to 9000 (jumbo frames) for
the ent0 device, which is the physical adapter that will be associated with the shared
Ethernet adapter:
chdev -dev ent0 -attr jumbo_frames=yes
If your network load is made up of only small packets, using the 9000 MTU will not decrease
the processing load.
More information on configuration choices
Besides the IBM Redbooks documents, the IBM UNIX Software Service Enablement course
AHQV335, PowerVM Virtual I/O Server II: Advanced Configuration, covers how to
implement the redundancy and dual Virtual I/O Server configurations.
Notes:
The most important aspects of sizing the Virtual I/O Server for SEA services are the
processing resources and the type and number of Ethernet adapters to use. If network
traffic is very high and performance is important, the best performance will be if the client
partition has its own physical Ethernet adapter connected to the external network. Tests
show that the shared Ethernet adapters stream data at media speed as long as the VIOS
has enough processing resources. The shared Ethernet adapter uses more processing
power than a physical adapter because of the bridging functionality. If you know the desired
throughput rate of your client partitions, then you can determine how many and what speed
adapters you need to install in your Virtual I/O Server partition. The Integrated Virtual
Ethernet Adapter (IVE) is also known as the Host Ethernet Adapter (HEA) and is the
integrated physical Ethernet adapter on most POWER6 processor-based servers.
Notes:
The memory requirements for a Virtual I/O Server are typically minimal. Plan for 40 MB per
logical processor. If a partition has many virtual processors and SMT is enabled, this could be
a significant amount. For example, a VIOS with four virtual processors and SMT enabled has
eight logical processors, which calls for roughly 320 MB.
Notes:
Types of workloads
Simplex is single-direction TCP communication; duplex is two-direction TCP communication. A
duplex example would be an ftp running from machine A to B and another ftp running from
machine B to A concurrently. Some media cannot send and receive concurrently, so they will
not perform any better (and usually perform worse) when running duplex workloads. Duplex
workloads will not scale up at a full two times the rate of a simplex workload because the TCP
ACK packets coming back from the receiver now have to compete with data packets flowing in
the same direction.
Notes:
Cycles per byte charts
Search the IBM Hardware Information Center for "Planning for shared Ethernet adapters" to
find the cycles-per-byte (CPB) data based on the type of workload and MTU size. In the
example workload in the visual, the resulting number of 4.2 GHz processors needed is 0.56.
If the original 1.65 GHz cycles per byte number were used, 1.42 processors would be needed
to drive that workload; scaling by the ratio of processor speeds gives 1.42 x (1.65 / 4.2),
which is approximately 0.56.
Threading/non-threading (1 of 2)
Threading ensures that CPU resources are shared fairly when a Virtual
I/O Server provides a mix of SEA and VSCSI services.
Like most device drivers, virtual Ethernet and shared Ethernet drivers
typically drive high interrupt rates and are CPU intensive.
Without threading, on a CPU constrained system, virtual network traffic will have a
higher priority than virtual SCSI interrupts resulting in worse virtual SCSI
performance.
With threading enabled, there is more consistent quality of service but at a lower
overall LAN throughput.
Disable threading when a Virtual I/O Server is not used for VSCSI.
Can use separate Virtual I/O Servers for shared Ethernet adapter and virtual
SCSI services.
Notes:
How threading works
The threaded model helps to ensure that VSCSI and shared Ethernet adapter operations
share the Virtual I/O Server CPU resources fairly. However, threading adds more
instruction path length, thus using more CPU cycles. If the Virtual I/O Server will only be
running SEA services (no VSCSI) then the SEA device should be configured with threading
disabled in order to run in the most efficient mode. Threading causes incoming packets to
be queued to a buffer in memory. A special kernel thread is dispatched to process the
buffer, which uses more processing power and allows processing to be shared more evenly
with virtual SCSI. With non-threading, the virtual Ethernet and shared Ethernet adapter
driver forwards packets at the interrupt level, which is more efficient and is why throughput
goes up. Note that we are not discussing simultaneous multithreading here, but a configuration
option for the SEA device driver.
Threading/non-threading (2 of 2)
Disabling threading improves the cycles per byte of transmission.
Example for 1500 MTU streaming simplex workload with 4.2 GHz CPU:
Enabled is 4.4 cycles per byte and disabled is 3.65.
Notes:
Enabling and disabling threading
You can enable or disable threading using the -attr thread option of the mkvdev
command. To enable threading, use the -attr thread=1 option. To disable threading,
use the -attr thread=0 option. For example, the following command disables threading
for a new shared Ethernet adapter:
mkvdev -sea ent1 -vadapter ent5 -default ent5 -defaultid 1 -attr thread=0
Even though threading is enabled or disabled on a per-SEA-device basis, if a VIOS has
multiple SEA devices, they should all be configured the same way.
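If the SEA already exists, the same attribute can be changed with chdev; ent6 below is a hypothetical SEA device name:
$ chdev -dev ent6 -attr thread=0
$ lsdev -dev ent6 -attr      (verify the thread attribute value)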
Notes:
largesend: Overview
The largesend attribute (also known as segmentation offload) enables TCP largesend
capability from logical partitions to the physical adapter. For SEA devices, the attribute is
largesend. The attribute that is set on the physical adapter is called large_send. TCP will
send a big chunk of data to the adapter when TCP knows that adapter supports largesend.
The adapter will break this big TCP packet into multiple smaller TCP packets that will fit the
outgoing MTU of the adapter, saving system CPU load and increasing network throughput.
The TCP stack on a partition will determine if the Virtual I/O Server supports largesend. If
it does, then the partition will send big TCP packets directly to the Virtual I/O Server
partition. The largesend capability for SEA devices is supported with the Virtual I/O Server
version 1.3 and above. It is not enabled by default.
Notes:
Configuring largesend
Be sure to enable the large_send attribute on the physical adapter before associating it
with an SEA. Typically, large_send is enabled by default on physical adapters. The
largesend attribute can be enabled on the SEA when it is created with the mkvdev
command or later by using the chdev command.
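A minimal sketch, assuming the SEA is ent6 and its physical adapter is ent0 (both device names are hypothetical):
$ lsdev -dev ent0 -attr      (confirm large_send is enabled on the physical adapter)
$ chdev -dev ent6 -attr largesend=1
$ lsdev -dev ent6 -attr      (verify largesend=1 on the SEA)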
To disable:
# ifconfig en0 -largesend
Notes:
Configuring largesend on the clients
On AIX, largesend can be enabled on an LPAR's virtual adapter using the ifconfig
command. It is not enabled by default. You cannot use chdev to configure largesend on
virtual adapters. Since ifconfig changes do not persist across operating system boots, you
must add the ifconfig command to an AIX startup script. If largesend is not enabled on a
client interface, then LARGESEND does not appear in the ifconfig output.
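To enable it (en0 is a placeholder for the client's virtual Ethernet interface):
# ifconfig en0 largesend
# ifconfig en0 | grep -i largesend    (LARGESEND appears in the output when it is enabled)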
Notes:
The bandwidth apportioning feature for the SEA, also known as Virtual I/O Server quality of
service (QoS), allows the VIOS to give a higher priority to some types of outgoing packets.
In accordance with the IEEE 802.1Q specification, VIOS administrators can instruct the SEA
to inspect bridged VLAN-tagged traffic for the VLAN priority field in the VLAN header. The
3-bit VLAN priority field allows each individual packet to be prioritized with a value from 0 to
7 to distinguish more important traffic from less important traffic. More important traffic is
sent sooner and uses more of the physical Ethernet adapter (configured in the SEA)
bandwidth than less important traffic.
Notes:
The qos_mode attribute
The VIOS administrator can set the SEA qos_mode attribute to either strict or loose
mode. The default is disabled mode.
Disabled mode: VLAN traffic is not inspected for the priority field.
Strict mode: More important traffic is bridged ahead of less important traffic.
Loose mode: A cap is placed on each priority level so that after a number of bytes is
sent for each priority level, the next level is serviced. The caps from the lowest to highest
priority queue are 2 MB, 4 MB, 8 MB, 16 MB, 32 MB, 64 MB, 128 MB, and 256 MB.
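For example, to switch a hypothetical SEA named ent6 to loose mode from the VIOS CLI:
$ chdev -dev ent6 -attr qos_mode=loose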
The VIOS trunk virtual Ethernet adapter must be configured with additional VLANs.
Each client creates a VLAN device and sets its priority.
Can be set for additional VLANs only (not PVIDs)
$ smit addvlan
Add A VLAN
[Entry Fields]
VLAN Base Adapter ent3
* VLAN Tag ID [100] +#
VLAN Priority [1] +#
Notes:
The priorities are set on the VLAN devices themselves, so only VLAN devices can have
priorities set. The VLAN ID that is used as the PVID cannot have a priority set and will be
set to the default (0) priority. To use this feature, when the VIOS trunk virtual Ethernet
adapter is configured on an HMC, the adapter must be configured with additional VLAN IDs
because only the traffic on these VLAN IDs is delivered to the VIOS with a VLAN tag.
Untagged traffic is always treated as though it belongs to the default priority class (for
example, as if it had a priority value of 0). To enable the SEA to prioritize traffic, client
partitions must insert a VLAN priority value in their VLAN header. For AIX clients, a VLAN
pseudo-device must be created over the Virtual I/O Ethernet Adapter, and the VLAN
priority attribute must be set (the default value is 0).
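A sketch of the equivalent command-line step on an AIX client, assuming ent3 is the virtual Ethernet adapter and VLAN 100 with priority 1 are the desired values; the attribute names follow the SMIT panel shown above, so verify them on your system before relying on this:
# mkdev -c adapter -s vlan -t eth -a base_adapter=ent3 -a vlan_tag_id=100 -a vlan_priority=1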
Notes:
TCP checksum offload
By default, the chksum_offload attribute is enabled for both physical and virtual Ethernet
adapters. This setting disables the TCP checksum calculation. When configuring a shared
Ethernet adapter, for best performance with typical configurations, do not change the
default value for the associated physical and virtual adapters.
[Flowchart: Is it a shared Ethernet adapter? Yes: Is the VIO Server CPU bound? If yes, check VIO Server CPU activity and tune CPU; if no, check adapter statistics for saturation and tune the adapter.]
Notes:
Virtual Ethernet adapter
If a virtual Ethernet adapter is bound, check the adapter statistics with the entstat
command. Check the adapter memory usage with the netstat command to validate if there
is enough buffer allocated to this adapter. Check to see if the partition is memory or CPU
bound. Check the clients' virtual adapter utilization to see which client is using a lot of the
bandwidth. If one client has a lot of network traffic that is affecting the network activity of
other client partitions, then perhaps it should get a dedicated Ethernet adapter.
Notes:
An understanding of the topology of the network devices involved is important when
troubleshooting a shared Ethernet adapter bottleneck. The commands in the visual above
will list the devices involved in a shared Ethernet adapter configuration. Both of the
commands shown are Virtual I/O Server CLI commands. The lstcpip -adapters
command can list several shared Ethernet adapters, and the lsdev command, as shown in
the visual, will help you determine which physical and virtual adapters are associated with
each shared Ethernet adapter.
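For example, from the VIOS CLI (ent6 is a hypothetical SEA name):
$ lsmap -all -net            (lists each SEA with its physical backing device and virtual adapters)
$ lsdev -dev ent6 -attr      (shows the SEA attributes, including the real and virtual adapters)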
Notes:
Monitoring shared Ethernet adapter
lsnetsvc: Gives the status of a network service
lstcpip: Displays the TCP/IP settings
optimizenet: Changes the characteristics of network tunables
snmp_info: Requests values of Management Information Base variables managed by a
Simple Network Management Protocol agent
traceroute: Prints the route that IP packets take to a network host.
The output of the entstat command will show how many errors and collisions have been
detected on the shared Ethernet adapter. The topas command monitors only configured
interfaces and therefore is not a tool to use for monitoring the shared Ethernet adapter
device if another interface is configured.
Notes:
Using the entstat command
The entstat command will show statistics for the shared Ethernet adapter device, and the
virtual adapter and the physical adapter to which it is associated. If you notice many errors
or collisions in the entstat output, check for CPU starvation on the Virtual I/O Server and
check the physical adapter for saturation. Besides overall packet numbers and the number
of packets dropped, monitor the Thread queue overflow packets line item. Once this
reaches 8192, packets will be dropped. If this value gets very large and packets are being
dropped, configure additional CPU resources. In addition to entstat, you can use topas
from the VIOS CLI to monitor SEA activity. As of VIOS V1.5, topas now lists the SEA
interface when it is the interface that is configured.
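A quick way to watch for this from the VIOS CLI, assuming the SEA is ent6:
$ entstat -all ent6 | grep -i overflow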
Notes:
The seastat command to display MAC addresses
The seastat command is new as of Virtual I/O Server V1.5. To use seastat to see statistics
about network traffic, advanced accounting must be enabled on the SEA device. When
advanced accounting is enabled, the SEA keeps track of the hardware (MAC) addresses of
all of the packets it receives from the LPAR clients, and increments packet and byte counts
for each client independently. Command options: -n suppresses name resolution and -c
zeros out the statistics. You can use the HMC to quickly view which MAC addresses belong
to each LPAR:
lshwres -m MSname -r virtualio --rsubtype eth --level lpar \
-F lpar_name,mac_addr
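A short usage sketch, assuming ent6 is the SEA device:
$ chdev -dev ent6 -attr accounting=enabled    (turn on advanced accounting for the SEA)
$ seastat -d ent6                             (per-client MAC, packet, and byte statistics)
$ seastat -d ent6 -n                          (same report without name resolution)
$ seastat -d ent6 -c                          (clear the statistics)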
Checkpoint
1. True or False: When using shared Ethernet adapters, set the MTU
size to 65390 on the physical adapter for the best performance.
3. If you see many collisions or dropped packets for the SEA device,
what are the first two things to investigate?
Notes:
Checkpoint solutions
1. True or False: When using shared Ethernet adapters, set the MTU size to 65390
on the physical adapter for the best performance.
The answer is false.
2. True or False: Processor utilization for large packet workloads on jumbo frames is
approximately half that required for MTU 1500.
The answer is true.
3. If you see many collisions or dropped packets for the SEA device, what are the
first two things to investigate?
The answers are VIOS CPU utilization and physical adapter saturation.
4. True or False: You can configure a maximum amount of network bandwidth for
individual clients of a shared Ethernet adapter.
The answer is false. You can only set priorities.
5. True or False: For mixed shared Ethernet adapter and VSCSI services, leave
threading enabled on the shared Ethernet adapter device.
The answer is true.
Additional information
Transition statement
Topic 4: Summary
Having completed this topic, you should be able to:
Describe Virtual I/O Server sizing guidelines for hosting shared
Ethernet adapter services
Physical adapters, memory, and processing resources
Configure shared Ethernet adapter threading
Configure TCP segmentation offload on the shared Ethernet
adapter
Configure SEA bandwidth apportioning and monitor with the
seastat utility
Monitor Shared Ethernet
Notes:
Notes:
IVE architecture (1 of 2)
Every POWER6 processor-based server I/O subsystem contains the
P5IOC2 chip
Dedicated controller that acts as the primary bridge for all PCI buses and all
internal I/O devices, including IVE adapter
Most (not all) POWER6 servers have an IVE adapter
IVE design provides high throughput and a great improvement of
latency for short packets
GX+ bus attachment for performance
Notes:
IVE: Overview
The major component of the IVE is the Host Ethernet Adapter (HEA) which contains all of
the ports and switches. Other IVE components include the Vital Product Description (VPD)
chip with the media access control (MAC) addresses for the ports and, depending on the
model, one or two system ports. Since an HMC is required for p570 management, the IVE
system ports are not used. The IVE design provides greatly improved latency for small
packets. The methods used to achieve low latency include attachment to the GX+ bus,
immediate data in descriptors to reduce memory accesses, and direct user-space
per-connection queueing (bypassing the operating system). Additional acceleration
functions were designed into the IVE in order to reduce host code path length. This IVE
adapter provides three times the throughput of current 10 Gbps solutions (when using the
10Gb IVE card). IVE relies exclusively on the system memory and system processing
cores to implement acceleration features.
IVE architecture (2 of 2)
Logical ports are associated with a specific physical port.
Port group:
Set of 16 logical ports:
Logical ports can be split evenly between the two physical ports in a port group or unevenly.
One or two port groups per Host Ethernet Adapter (HEA), depending on model
One or two physical ports per port group, depending on model
Each physical port has its own Layer 2 switch.
[Figure: An LPAR's operating system connects to an IVE logical port; within the HEA port group, a logical switch links the logical ports to the physical port, which connects to the external switch]
Notes:
IVE terminology and description
AIX sees logical ports which are the representation of the shared physical ports. The logical
port will be an ent# device in AIX (and eth# in Linux). The OS in the LPAR box in the visual
above stands for operating system. Logical ports are grouped in sets of 16 called a port
group. Each port group will have either one or two physical ports, depending on the IVE
model. There are one or two port groups for each IVE depending on the model. The
administrator chooses which logical ports to allocate to partitions and which physical port to
use for the logical ports. Each LPAR can have one logical port per physical port. There is
one HEA in each IVE adapter. In the operating system, the HEA is represented logically as
an lhea device. If a partition uses two logical ports from the same HEA, they must use
different physical ports. In this case, there will be one lhea parent device and two ent#
devices.
[Figure: A port group with two 1 Gb physical ports, each with its own logical switch, connecting up to 16 logical ports (the maximum per port group) from the LPARs to the external switch]
Notes:
IVE gigabit adapter options
The dual-port gigabit IVE adapter has one port group and two physical 1Gb ports.
Because one port group supports a maximum of 16 ports, this adapter supports up to
16 logical ports, and therefore up to 16 partitions. This is the IVE that comes standard
on the p570 servers.
The quad-port gigabit IVE adapter has two identical port groups and two physical 1Gb
ports per port group. Because of the two port groups, it supports up to 32 ports, and
therefore up to 32 partitions.
For communications between partitions on the same server using the same physical port
(and thus the same logical switch), no access to an external switch is needed. For the best
performance between two LPARs on the same server, use IVE ports that share the same
HEA logical switch. This configuration will use more CPU because of increased throughput.
[Figure: The dual-port 10 Gb IVE adapter; each port group has its own logical switch and supports a maximum of 16 logical ports for the LPARs]
Notes:
IVE adapter 10 gigabit option
The dual-port 10 gigabit IVE adapter has two port groups and one physical (optical) 10Gb
port per port group. Because of the two port groups, it supports up to 32 ports, and
therefore up to 32 partitions.
[Figure: Linux and AIX partitions reaching the external network either through a Virtual I/O Server shared Ethernet adapter and its physical Ethernet adapter, or directly through IVE logical ports]
Notes:
Comparison of shared Ethernet adapter and IVE
The visual above compares using a shared Ethernet adapter for a client with an IVE
configuration. With an IVE, packets destined for the external network are not bridged through
a Virtual I/O Server partition. An IVE logical port can be used as the physical adapter in a
shared Ethernet adapter configuration. In this case, the physical IVE port is used exclusively
by the VIOS and cannot be configured with other logical ports. The shared Ethernet adapter
configuration requires that clients create a virtual Ethernet adapter, which is not supported
by AIX V5.2.
Notes:
There is a performance aspect and a scalability aspect to the Multi-Core Scaling (MCS)
configuration option. The MCS value controls the level of parallelism used by each
partition's operating system for network traffic, and it controls the total number of logical
ports available for use in a particular port group. As you can see in the table in the visual,
the larger the MCS value, the fewer ports that can be configured per port group. The
default value on a p570 is four, therefore there will only be four available ports per port
group. Decreasing the MCS value will increase the number of available logical ports per
port group and is appropriate when, on average, the partitions utilizing the IVE have lower
network bandwidth requirements. Increasing the MCS value will provide for fewer ports per
port group and is appropriate when, on average, LPARs have higher network bandwidth
requirements.
Queue pairs (1 of 2)
The MCS value sets the number of queue pairs in AIX.
A queue pair is a pair of transmit and receive queues.
Having multiple queue pairs breaks network traffic into multiple streams that can
be dispatched to multiple cores to take advantage of parallel processing.
[Figure: Left, a logical port (1 of 16) whose lhea/ent# device has one queue pair (one stream); right, a logical port (1 of 4) whose lhea/ent# device has four queue pairs (four streams), each stream dispatched on the next available POWER6 core]
Notes:
Breaking up the network traffic into multiple streams enables the traffic to be processed in
parallel by interrupt handlers running on different processors. This is beneficial when there
are enough processors so that each stream can be dispatched in parallel. The visual above
shows two configuration scenarios. The one on the left shows an example in which the
MCS value is 1 and this sets the number of queue pairs (QP) in each of the partitions which
use the same port group to 1. The example on the right shows a partition with four queue
pairs which can utilize four processor cores in parallel to process network traffic. This is
only beneficial for performance if the partition is configured with at least four processors
(dedicated, virtual, or logical). Having more QPs will use more CPU resources.
Queue pairs (2 of 2)
The number of queue pairs (QPs) in each LPAR is equivalent to the
MCS value for the port group.
Notes:
The number of queue pairs shown in the visual is per partition. If MCS is 4 and you have
four partitions using that port group, each of those partitions will have four queue pairs.
Each stream can be dispatched on the next available processor. One processor core can
handle two queue pairs with simultaneous multi-threading enabled. Therefore, in the
example in the visual, with SMT enabled and with one virtual processor (two logical
processors), two queue pairs would be ideal.
Notes:
The AIX entstat command output will list the number of queue pairs (QPNs) for the IVE
logical port.
Notes:
The multicore attribute
The multicore attribute is enabled by default. By disabling the multicore attribute, the
partition will have one queue pair. If the MCS value on the port group is a higher number
than you want for an LPAR, one tuning strategy is to disable the multicore attribute.
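A hedged example on an AIX partition, assuming ent0 is the IVE logical port device; check the attribute and its allowed value strings with lsattr before changing it, and note that -P defers the change until the next restart:
# lsattr -El ent0 -a multicore     (current setting)
# lsattr -Rl ent0 -a multicore     (allowed values)
# chdev -l ent0 -a multicore=no -P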
Tuning:
MCS value might be dictated by number of LPARs to support.
Can dynamically disable/enable multicore for the IVE port device.
Do not arbitrarily change number of virtual processors to fit QPs.
Have enough CPU resources:
Intra-physical port network communications will be faster and use more CPU.
Simultaneous multithreading should provide performance benefit.
Try one thing, monitor, and see if performance improves.
Configuration example:
Four processor system with four LPARs, set MCS = 4 for performance.
If one LPAR has four processors, and the others have just one, disable multicore in the three
one-processor partitions.
If all LPARs use approximately one processor (VP or dedicated), disable multicore in
all.
Notes:
Using HEA ports and system resources
The higher the MCS value, the more CPU cycles per Gbps throughput due to less effective
interrupt coalescing. IVE ports use less CPU than using an SEA, because with an SEA,
there is CPU processing on the client and on the Virtual I/O Server. The 10Gb IVE card can
be driven at full bandwidth even using a maximum transmission unit (MTU) of 1500. For
the 10Gb adapter only, there is also some benefit to using an MCS greater than one to allow
spreading across the direct memory access (DMA) engine. In the past, the processing
power of the adapter card could not keep up with the volume of packets required at that
MTU to drive the full bandwidth.
Performance considerations
Purchase appropriately-sized IVE model for network needs.
Use IVE over other types of Ethernet adapters when possible.
IVE performance is faster on GX bus than a PCIe/X Ethernet card.
Distribute LPAR usage across IVE physical ports.
All LPARs sharing a physical port share the bandwidth of that port.
For fastest IVE communications between LPARs, use logical ports on the same internal IVE switch (that is, on the same physical port).
Typically, SMT will increase performance.
Use the flow control option in port groups, particularly for the 10 Gb IVE.
The overall recommended value (for performance) is to set MCS to the number of logical processors for the partitions which will use the port group.
Notes:
IVE performance considerations
The first statement on the visual refers to the fact that there are three different IVE models
that can be ordered for most POWER6 systems. Using IVE logical ports for communication
that are on the same physical port can be several times faster than using a different
physical port. The best improvements with SMT can be seen with configurations with
communications between two logical ports on the same physical port. Check the box for
the flow control on the physical port to have the HMC attempt to negotiate flow control in
both the transmit and receive directions. The HMC enables flow control in the directions for
which the HMC can negotiate flow control. It is recommended that this is enabled,
especially for the 10Gb IVE model.
Notes:
Using the entstat command
The entstat AIX command is a tool to use for monitoring Ethernet traffic, including IVE
traffic. In the visual above, the ent0 device is the Logical Host Ethernet Port (lp-hea) device
as shown in the output of the lsdev -Cc adapter AIX command. This is the device you
would use if the HEA port is used for communications on an AIX client or on a VIOS where
it is not used as part of an SEA configuration. On the VIOS, if the IVE port is the physical
adapter in an SEA configuration, use the SEA device with the entstat command.
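For example, on the Virtual I/O Server the statistics are gathered with the padmin-level entstat command; a hedged sketch with ent5 standing in for the SEA device name:
$ entstat -all ent5 | more
On an AIX client, the equivalent is entstat -d against the lp-hea device.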
Notes:
More entstat command output
The visual shows that entstat command output also displays configuration properties.
Notice the promiscuous setting, logical port number, and the number of QPs.
Checkpoint (1 of 2)
1. True or False: The IVE allows partitions to connect to an external
network without the need for a Virtual I/O Server partition.
2. True or False: Partitions using IVE logical ports must be connected to
an external switch to communicate with each other.
3. True or False: The standard IVE adapter card on most POWER6 systems
will connect 16 LPARs, but you can optionally order an IVE adapter card
which connects up to 32 LPARs.
4. True or False: An IVE logical port can be used as the physical adapter
in an SEA configuration.
5. You can see the number of QPs by looking at output from what
command?
a. lsattr (AIX)
b. entstat (AIX)
c. ifconfig (AIX)
d. lshwres (HMC)
Notes:
Checkpoint solutions (1 of 2)
1. True or False: The IVE allows partitions to connect to an external network without the need for a
Virtual I/O Server partition.
The answer is true.
2. True or False: Partitions using IVE logical ports must be connected to an external switch to
communicate with each other.
The answer is false. Partitions configured with logical ports on the same physical port do not need to
connect through an external switch to communicate with each other.
3. True or False: The standard IVE adapter card on most POWER6 systems will connect 16 LPARs,
but you can optionally order an IVE adapter card which connects up to 32 LPARs.
The answer is true.
4. True or False: An IVE logical port can be used as the physical adapter in an SEA configuration.
The answer is true.
5. You can see the number of QPs by looking at output from what command?
a. lsattr (AIX)
b. entstat (AIX)
c. ifconfig (AIX)
d. lshwres (HMC)
The answer is entstat (AIX).
Additional information
Transition statement
Checkpoint (2 of 2)
6. True or False: It is best to have the number of QPs equivalent to the
number of virtual, dedicated, or logical processors in a partition
(whichever is the highest number).
7. True or False: The best performance will be between logical ports
which share the same internal switch.
8. True or False: The MCS value sets the maximum number of available
logical ports per physical port.
9. True or False: The MCS value sets the number of queue pairs (QPs)
in each partition which is configured for that port group.
11. What is the effect of disabling the multicore attribute for an LHEA
Ethernet device in an AIX LPAR?
Notes:
Checkpoint solutions (2 of 2)
6. True or False: It is best to have the number of QPs equivalent to the number of virtual, dedicated,
or logical processors in a partition (whichever is the highest number).
The answer is true.
7. True or False: The best performance will be between logical ports which share the same internal
switch.
The answer is true.
8. True or False: The MCS value sets the maximum number of available logical ports per physical
port.
The answer is false. The MCS value sets the maximum number of logical ports per port group.
9. True or False: The MCS value sets the number of queue pairs (QPs) in each partition which is
configured for that port group.
The answer is true.
11. What is the effect of disabling the multicore attribute for an LHEA Ethernet device in an AIX LPAR?
The answer is when you disable the multicore attribute, the device has just one QP.
Additional information
Transition statement
Topic 5: Summary
Having completed this topic, you should be able to:
Describe the Integrated Virtual Ethernet (IVE) adapter function
List performance and network availability considerations when
configuring IVE devices
Tune the MCS value and queue pairs for optimal performance
or scalability
View queue pair configuration from AIX
Monitor IVE port usage
Notes:
Exercise
Unit exercise
Notes:
Unit summary
Having completed this unit, you should be able to:
Discover physical to virtual SCSI device configuration
Determine which client partitions and devices are affecting the Virtual I/O Server performance
Describe the partition resource sizing guidelines for Virtual I/O Servers used for virtual SCSI
Use performance analysis tools to monitor virtual SCSI device performance
Describe how the following tuning options affect virtual Ethernet performance:
MTU sizes, CPU entitlement, TCP checksum offloading, simultaneous multithreading
Monitor virtual Ethernet utilization statistics
Describe Virtual I/O Server sizing guidelines for hosting shared Ethernet adapter services
Physical adapters, memory, and processing resources
Configure shared Ethernet adapter threading
Configure TCP segmentation offload on the shared Ethernet adapter
Configure SEA bandwidth apportioning and monitor with the seastat utility
Monitor shared Ethernet adapter network traffic with Virtual I/O Server utilities
Describe the Integrated Virtual Ethernet (IVE) adapter function
List performance and network availability considerations when configuring IVE devices
Tune the MCS value and queue pairs for optimal performance or scalability
View queue pair configuration from AIX
Monitor IVE port usage
Notes:
Estimated time
01:30
References
SG24-7460-01 IBM System p Live Partition Mobility Redbook
IBM POWER6 partition mobility: Moving virtual servers seamlessly
between physical systems
http://researchweb.watson.ibm.com/journal/rd/516/armstrong.pdf
Unit objectives
After completing this unit, you should be able to:
Notes:
Partition mobility provides the ability to move a logical partition from one system to another.
Live (or active) partition mobility allows you to move a running logical partition, including its
operating system and applications, from one system to another. The applications do not
need to be shut down. Inactive partition mobility allows you to move a powered off (or
deactivated) logical partition from one system to another.
Live Partition Mobility
Live Partition Mobility allows you to migrate running AIX and Linux partitions and their
hosted applications from one physical server to another without disrupting the
infrastructure services. The migration operation, which takes just a few seconds, maintains
complete system transactional integrity. The migration transfers the entire system
environment, including processor state, memory, attached virtual devices, and connected
users.
As the number of hosted partitions increases, finding a maintenance window acceptable to
all becomes increasingly difficult. Live partition mobility allows you to move your partitions
around so that you can perform previously disruptive operations on the machine when it
best suits you, rather than when it causes the least inconvenience to the users.
Live partition mobility helps you meet the increasingly stringent service-level agreements
(SLAs) because it allows you to proactively move running partitions and applications from
one server to another. The ability to move running partitions from one server to another
offers you the ability to balance workloads and resources. If a key application's resource
requirements peak unexpectedly to a point where there is contention for server resources,
you might move it to a larger server or move other, less critical, partitions to different
servers, and use the freed-up resources to absorb the peak.
Live partition mobility can also be used as a mechanism for server consolidation, as it
provides an easy path to move applications from individual, stand-alone servers to
consolidation servers. If you have partitions with workloads that have widely fluctuating
resource requirements over time (for example, with a peak workload at the end of the
month or the end of the quarter), you can use live partition mobility to consolidate partitions
to a single server during the off-peak period, allowing you to power-off unused servers.
Then move the partitions to their own, adequately configured servers, just prior to the peak.
Live partition mobility contributes to the continuous availability goal. It can:
Reduce planned down time by dynamically moving applications from one server to
another
Respond to changing workloads and business requirements by letting you move
workloads from heavily loaded servers to servers that have spare capacity
Reduce energy consumption by allowing you to easily consolidate workloads and
power off unused servers
Inactive partition mobility
Inactive migration moves the definition of a powered off logical partition from one system to
another along with its network and disk configuration. No additional change in network or
disk setup is required and the partition can be activated as soon as migration is completed.
The inactive migration procedure performs the reconfiguration of the systems involved,
including the following:
A new partition is created on the destination system with the same configuration
present on the source system.
Network access and disk data is preserved and made available to the new partition.
On the source system, the partition configuration is removed and all involved resources
are freed.
If a system is down due to scheduled maintenance or not in service for other reasons, an
inactive migration can be performed. It is executed in a controlled way and with minimal
administrator interaction so that it can be safely and reliably performed in a very short time
frame.
Overview of process (1 of 2)
Move running partitions from one system to another
POWER6 systems
Partition with only virtual devices
Virtual disks must be LUNs on external SAN storage
LUNs must be accessible to a VIO Server on each system.
Shared and not reserved
Invoked by HMC command or GUI
migrlpar
Can be performed using the Integrated Virtualization Manager (IVM)
Notes:
Partition mobility provides systems management flexibility and improves system
availability. For example:
You can avoid planned outages for hardware or firmware maintenance by moving
logical partitions to another server and then performing the maintenance. Partition
mobility can help lead to zero downtime maintenance because you can use it to work
around scheduled maintenance activities.
You can avoid downtime for a server upgrade by moving logical partitions to another
server and then performing the upgrade. This allows your end users to continue their
work without disruption.
If a server indicates a potential failure, you can move its logical partitions to another
server before the failure occurs. Partition mobility can help avoid unplanned downtime.
You can consolidate workloads running on several small, under-used servers onto a
single large server.
You can move workloads from server to server to optimize resource use and workload
performance within your computing environment. With active partition mobility, you can
manage workloads with minimal downtime.
Overview of process (2 of 2)
(Visual: source Server A and destination Server B, each with a service processor and a Virtual I/O Server with a VASI device, connected over an Ethernet network; the mobile partition's vscsi0 disk is backed by a SAN LUN.)
Notes:
Active partition mobility lets you move a running logical partition, including its operating
system and applications, from one server to another without disrupting the operation of that
logical partition.
1. The user ensures that all requirements are satisfied and all preparation tasks are
completed.
2. The user initiates active partition mobility using the Partition Migration wizard (or
migrlpar command) on the HMC.
3. The HMC verifies the partition mobility environment.
4. The HMC prepares the source and destination environments for active partition
mobility.
5. The HMC initiates the transfer of the partition state from the source environment to the
destination environment. This includes all the logical partition profiles associated with
the mobile partition.
- The source mover service partition (MSP) extracts the partition state information
from the source server and sends it to the destination mover service partition over
the network.
- The destination mover service partition receives the partition state information and
installs it on the destination server.
6. The HMC initiates the suspension of the mobile partition on the source server. The
source mover service partition continues to transfer the partition state information to the
destination mover service partition.
7. The Hypervisor resumes the mobile partition on destination server.
8. The HMC initiates completion of the migration. This means that all resources that were
consumed by the mobile partition on the source server are reclaimed by the source
server, including:
- The source Virtual I/O Server unlocks, unconfigures, or undefines virtual resources
on the source server.
- The HMC removes the hosting virtual adapter slots from the source Virtual I/O
Server logical partition profiles as required.
9. The user performs post requisite tasks, such as:
- Adding the mobile partition to a partition workload group
- Adding dedicated I/O adapters
Inactive partition mobility
Inactive partition mobility lets you move a powered off logical partition from one server to
another.
1. The user ensures that all requirements are satisfied and all preparation tasks are
completed.
2. The user shuts down the mobile partition.
3. The user initiates inactive partition mobility using the Partition Migration wizard on the
HMC.
4. The HMC verifies the partition mobility environment.
5. The HMC prepares the source and destination environments for inactive partition
mobility.
6. The HMC initiates the transfer of the partition state from the source environment to the
destination environment. This includes all the logical partition profiles associated with
the mobile partition.
7. The HMC initiates completion of the migration. This means that all resources that were
consumed by the mobile partition on the source server are reclaimed by the source
server, including:
- The Virtual I/O Servers unlock, unconfigure, or undefine virtual resources on the
source and destination servers.
- The HMC removes the hosting virtual adapter slots from the source Virtual I/O
Server logical partition profiles.
8. The user activates the mobile partition on the destination server.
9. The user performs post requisite tasks, such as:
- Establishing virtual terminal connections
- Adding the mobile partition to a partition workload group
Components
Partition: runs on POWER6; its virtual disk (hdisk0) is mapped through the VIO Server to a LUN, and its virtual Ethernet is mapped through an SEA in the VIO Server.
Hypervisor: provides support for migration.
VIO Server (Mover Service Partition): provides the virtual I/O and the VASI interface to the Hypervisor.
HMC: configuration of required capabilities, validation of the configuration, and orchestration of the sequence of events.
VASI: Virtual Asynchronous Services Interface
(Visual: the VIO Server bridges the private VLAN to the open network through an SEA and maps the vSCSI disk to a LUN on the storage area network, with the HMC on the Ethernet network.)
Notes:
The candidate partition must be one that has only virtual devices. If there are any physical
devices in its allocation, they must be removed before the validation or migration is
initiated.
The Hypervisor must support the partition mobility functionality. POWER6 Hypervisors
have this capability. PowerVM Enterprise edition must be ordered for both source and
destination managed systems.
The Virtual I/O Server on the source system provides the access to the client's resources,
but also has a Virtual Asynchronous Services Interface (VASI) and is identified as a mover
service partition (MSP). The VASI device allows the mover service partition to
communicate with the Hypervisor. MSP must be configured on both the source and
destination Virtual I/O Servers designated as the mover service partitions for the mobile
partition to participate in active mobility. The MSP is a Virtual I/O Server logical partition
that has at least one VASI adapter configured to allow the MSP to communicate with the
Hypervisor.
The HMC is used to configure, validate, and orchestrate the migration operation. You can
use the HMC to configure the Virtual I/O Server as an MSP. There is no need to create this
VASI adapter on the Virtual I/O Server. This device is automatically configured.
HMC includes a wizard or program that validates your configuration and identifies errors
that cause the migration to fail. During the migration, the HMC controls all phases of the
process.
Instructor notes:
Purpose Identify key elements of the migration process.
Details
Additional information
Transition statement The following is an introduction to the requirements.
Basic requirements (1 of 4)
Two POWER6 or above systems
Managed by the same HMC or different HMCs
PowerVM Enterprise feature activated
Compatible CODE levels (with partition mobility enabled)
HMC, system firmware, OS, VIO Server
The same logical memory block (LMB) size on each system
Source and target systems must have:
VIOS providing the mobile LPAR's network and disk access
LUN access (external hdisk with no_reserve)
Both systems must bridge to the client's networks
Operating, with VIO Server running
MSP
Target system:
No partition with the same name
Cannot be running on internal battery power
Must have sufficient resources (CPU and memory) available
Notes:
Preparation
When you have created the Virtual I/O Servers, and configured mover service partition
devices, you must prepare the source and destination systems for migration by doing the
following:
1. Synchronize the time of day clocks on the mover service partitions using an external
time reference, such as the network time protocol (NTP). This is an optional step that
increases the accuracy of time measurement during migration. It is not required by the
migration mechanisms and even if this step is omitted, the migration process correctly
adjusts the partition time. Time never goes backwards on the mobile partition during a
migration.
2. Prepare the partition for migration.
- Use dynamic reconfiguration on the HMC to remove all dedicated I/O, such as PCI
slots, GX slots, and HEA, from the mobile partition.
- Remove the partition from a partition workload group (if assigned).
Destination system must not have a partition with the same name as the one that is to
be migrated
Source and destination Virtual I/O Server requirements
There must be at least one Virtual I/O Server logical partition installed and activated on
both the source and destination servers. The source and destination Virtual I/O Server
partitions must be at release level 1.5.
The mobile partition's network and disk access must be virtualized using one or more
Virtual I/O Servers.
The Virtual I/O Servers on both systems (source and target) must have a shared
Ethernet adapter configured to bridge to the same Ethernet network used by the mobile
partition.
The Virtual I/O Servers on both systems must be capable of providing virtual access to
all disk resources the mobile partition is using.
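A minimal sketch of verifying and setting the reserve policy on the backing LUN from each Virtual I/O Server (hdisk2 is illustrative, and the exact attribute values depend on the multipathing driver in use):
$ lsdev -dev hdisk2 -attr reserve_policy
$ chdev -dev hdisk2 -attr reserve_policy=no_reserve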
On the destination Virtual I/O Server partition, do not create any virtual SCSI adapters for
the mobile partition. These are created automatically by the migration function.
Mobile partition's operating system requirements
AIX version 5.3 Technology Level 7 or later
Red Hat Enterprise Linux version V5.1 or later
SUSE Linux Enterprise Services 10 (SLES 10) Service Pack 1 or later
Previous versions of AIX and Linux can participate in inactive partition mobility, if the
operating systems support virtual devices and IBM System p6 models.
Battery power
Ensure that the destination system is not running on battery power. If the destination
system is running on battery power, then you need to return the system to its regular power
source before moving a logical partition to it. However, the source system can be running
on battery power.
Instructor notes:
Purpose Identify key requirements.
Details Reference some of the details in the student notes.
Additional information
Transition statement A couple of the VIOS requirements can be viewed or created at
the HMC.
Basic requirements (2 of 4)
Virtual Asynchronous Services Interface
(Visual: the VIOS partition properties panels used to enable the mover service partition option and to view the VASI adapter.)
Notes:
Access the VIOS LPAR's properties to configure it as a mover service partition by selecting
the option in the General tab. Also, verify the VASI interface is created by viewing the
Virtual Adapters tab. With the code available before November 2007 (VIOS 1.4), the VASI
interface had to be manually created. With VIOS 1.5, the VASI interface is automatically
created.
A VASI device is a virtual device unique to active partition mobility that allows the mover
service partition to communicate with the hypervisor.
Instructor notes:
Purpose Identify how to set a VIOS LPAR as an MSP and verify that the VASI interface
is created.
Details
Additional information
Transition statement
Basic requirements (3 of 4)
Partition migration support must be enabled by entering the PowerVM Enterprise activation key.
System firmware and HMC code must be at the required levels.
(Visual: the managed system (CEC) properties panel.)
Notes:
You can verify the source and destination systems support partition mobility by looking at
the properties of the managed systems. If your system is not capable, the Migration tab
would not be visible.
The base level code requirements
HMC version and release must be at V7R320 or later.
POWER6 firmware level eFW3.2.
Must have PowerVM Enterprise Edition feature.
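One way to confirm the HMC code level from its command line is shown below (a sketch; the output format varies by release), and, assuming the -r sys resource type is available on your HMC release, the managed system's migration capability can be queried with lslparmigr:
$ lshmc -V
$ lslparmigr -r sys -m srcSystem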
Instructor notes:
Purpose Manual validation of the basic requirements for the VIO and managed system.
Details
Additional information
Transition statement The following lists the LPARs requirements.
Basic requirements (4 of 4)
Partition to be migrated must satisfy the following
requirements:
Not using physical I/O
Only virtual devices (user-defined virtual devices must have a virtual
slot number higher than 10)
VSCSI not backed by LVs or files in VIOS
Not set for redundant error path reporting
No additional virtual serial ports
Not part of a workload group
Not using barrier synchronization register
Not using huge pages
Notes:
These are additional considerations for the mobile client.
BSR is a memory register that is located on certain POWER-based processors. A
parallel-processing application running on AIX can use a BSR to perform barrier
synchronization, which is a method for synchronizing the threads in the parallel-processing
application. For a logical partition to participate in active partition mobility, it cannot use
BSR arrays. If the mobile partition uses BSR, it can participate in inactive partition mobility.
Huge pages can improve performance in specific environments that require a high degree
of parallelism, such as in DB2 partitioned database environments. You can specify the
minimum, desired, and maximum number of huge pages to assign to a partition when you
create the partition or partition profile. For a logical partition to participate in active partition
mobility, it cannot use huge pages. If the mobile partition uses huge pages, it can
participate in inactive partition mobility.
Instructor notes:
Purpose Identify the client/mobile partition's basic requirements.
Details
Additional information
Transition statement You can use the HMC GUI or commands to assist with checking
the environment.
Validation (1 of 8)
Validation check options
lslparmigr and migrlpar HMC commands
$ lslparmigr -r virtualio -m srcSystem -t destSystem \
--filter "lpar_names=myLPAR"
HMC GUI
Notes:
Before performing the migration, you should perform a validation. Explicitly requesting this
is optional but is recommended to manage errors before invoking the migration.
The HMC provides an easy way to check the systems and HMC for most of the
requirements. The process could be invoked from the HMC GUI or from using the
lslparmigr and migrlpar commands. When performing the Migrate option from the HMC
GUI, the default action is to automatically run the validation process before performing the
migration process.
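A validation-only run from the HMC command line looks like this sketch (system and partition names are placeholders); the -o v operation is the validate option described later in this unit:
$ migrlpar -o v -m srcSystem -t destSystem -p myLPAR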
Instructor notes:
Purpose Identify how to perform a validation.
Details
Additional information
Transition statement Let's see a validation wizard example.
Validation (2 of 8)
Validation wizard example
Click Migrate
when validation is
successful
Notes:
Click Validate to have the HMC examine the environment. Errors are displayed with
recommended resolutions.
The source and destination systems can be managed by the same or different HMCs.
Starting with HMC Version 7 Release 3.4, the Remote Live Partition Mobility feature is
available. This feature allows a user to migrate a client partition to a destination server that
is managed by a different HMC. The function relies on Secure Shell (SSH) to communicate
with the remote HMC for information such as the list of managed systems. SSH key
authentication to the remote HMC must be configured. To perform this, you should log into
the destination system's HMC and retrieve the authentication keys from the HMC currently
managing the mobile partition. As hscroot (or an account with hmcsuperadmin privileges),
use the mkauthkeys command.
For example, to configure ssh rsa key authentication to a remote system (in our case
10.31.204.31):
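A hedged sketch of the command (the option names follow the HMC mkauthkeys usage; you are prompted for the remote user's password):
$ mkauthkeys -u hscroot --ip 10.31.204.31
Where available, the --test flag can then be used to confirm that key authentication to the remote HMC succeeds.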
If they are managed by different HMCs, you must provide the IP address (or resolvable
hostname) for the HMC of the destination system in the Remote HMC field of the
Validation Wizard. Clicking MSP Pairing displays a list of MSPs on the source and
destination systems. The administrator should select the MSPs to be used during the
migration process.
After the successful validation and MSP selection, you can click Migrate to initiate the
partition mobility process.
If errors or warnings occur, the Partition Validation Errors/Warnings window opens. Perform
the following steps:
1. Check the messages and identify the prerequisites for the migration:
- For error messages: You cannot perform the migration if errors exist. Eliminate any
errors.
- For warning messages: If only warnings occur (no errors), you can migrate the
partition after the validation.
2. Close the Partition Validation Errors/Warnings window. A validation window opens
again. If you had warning messages only (no error messages), you can click Migrate.
Validation (3 of 8)
Validation process checks the following items:
RMC connections between VIO Servers
RMC connection to the partition to be migrated
LMB sizes on both systems
The partition to be migrated (partition readiness):
No physical adapters defined as required in the LPAR.
The LPAR uses only external LUNs.
VSCSI cannot be backed by LVs in VIOS.
The LPAR supports active migration (OS support).
The LPAR type is AIX Linux.
The LPAR is not a mover service partition.
The LPAR is not using barrier synchronization registers.
The LPAR is not using huge pages.
The LPAR state is active or running.
The LPAR is not in a partition workload group.
The LPAR MAC address is unique (across both servers).
The LPAR has a name that is not in use on the target system.
Not exceeding the supported number of active migrations
Notes:
This is not a complete list of checks.
For example, it first checks the source and destination systems, POWER Hypervisor,
Virtual I/O Servers, and mover service partitions for active partition migration capability and
compatibility.
Validation (4 of 8)
RMC connections
(Visual: the RMC connections checked during validation, between the HMC, the mobile partition, and the source and destination Virtual I/O Servers and MSPs.)
Notes:
This is showing which RMC connections are checked and needed during the migration.
The validation process checks that the RMC connections to the mobile partition, the source
and destination Virtual I/O Servers, and the connection between the source and destination
mover service partitions are established.
Validation (5 of 8)
New RMC capabilities
Enabled by new code in the VIO Server and AIX partitions.
Notes:
If the results for your partition are <Active 1>, the RMC connection is established. If the
results for your partition are <Active 0> or your partition does not appear in the command
results/output, you have an RMC problem with the client.
The Dcaps value of 0x79f is identifying a version 2.1 VIOS that is partition mobility capable.
The Dcaps value of 0x5f is identifying an AIX 6.1 client partition that is partition mobility
capable.
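The output being described comes from the HMC restricted shell; a sketch of the query (the exact output format varies by HMC release):
$ lspartition -dlpar
Look for your partition's entry and check its Active and DCaps values.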
Validation (6 of 8)
(Visual: partition readiness check; the HMC queries the mobile partition on source Server A while the destination Server B environment is verified.)
Notes:
Partition readiness
Checks that none of the client virtual SCSI disks on the mobile partition are backed by
logical volumes and that no disks map to internal disks
Checks the mobile partition, its OS, and its applications for active migration capability.
Checks that the logical memory block size is the same on the source and destination
systems
Ensures that the type of the mobile partition is aixlinux and that it is not an alternate
error logging partition or a mover service partition
Ensures that the mobile partition is not configured with barrier synchronization registers
Ensures that the mobile partition is not configured with huge pages
Checks that the partition state is active and running
Checks that the mobile partition is not in a partition workload group
Checks the uniqueness of the mobile partition's virtual MAC addresses
Checks that the mobile partition's name is not already in use on the destination server
Checks the number of current active migrations against the number of supported active
migrations
Checks that there are no physical adapters in the mobile partition and that there are no
required virtual serial slots higher than slot 2
Application migration awareness
A migration aware application is one that is designed to recognize and dynamically adapt to
changes in the underlying system hardware after being moved from one system to another.
Most applications will not require any changes to work correctly and efficiently with Live
Partition Mobility. Some applications might have dependencies on characteristics that
change between the source and destination servers and other applications might adjust
their behavior to facilitate the migration.
Applications that should probably be made migration aware include applications that use
processor and memory affinity characteristics to tune their behavior because affinity
characteristics might change as a result of migration. The externally visible behavior
remains the same, but performance variations, for better or worse, might be observed
because of different server characteristics.
Making applications migration-aware
An application registers its capability with AIX and might block migration during the check
phase.
Mobility awareness can be built in to an application using the standard AIX dynamic
reconfiguration notification infrastructure. This infrastructure offers two different
mechanisms for alerting applications to configuration changes: using the SIGRECONFIG
signal and the dynamic reconfiguration APIs, or registering scripts with the AIX dynamic
reconfiguration infrastructure. Using the SIGRECONFIG signal and dynamic reconfiguration
APIs requires additional code in your applications. The DLPAR scripts allow you to add
awareness to those applications for which you do not have the source code.
Instructor notes:
Purpose Describe the Partition Readiness check of the Migration Validation process.
Details
Additional information
Transition statement The next step in the validation is checking the resources.
Validation (7 of 8)
(Visual: the HMC checks that destination Server B has sufficient resources to host the inbound mobile partition.)
Notes:
After verifying system and partition configurations, the HMC then determines whether
sufficient resources are available on the destination server to host the inbound mobile
partition. The following steps are performed:
1. The HMC checks that the necessary resources (processors, memory, and virtual slots)
are available to create a shell partition on the destination system with the exact
configuration of the mobile partition.
2. The HMC generates a source-to-destination hosting virtual adapter migration map,
ensuring no loss of multipath I/O capability for virtual SCSI and virtual Ethernet. The
HMC fails the migration request if the device migration map is incomplete.
Instructor notes:
Purpose Describe the System Resource validation check.
Details
Additional information
Transition statement The following is the last check performed before the migration.
Validation (8 of 8)
(Visual: operating system and application readiness check; the HMC asks the operating system in the mobile partition to confirm its readiness for migration.)
Notes:
The HMC instructs the operating system in the mobile partition to check its own capacity
and readiness for migration. AIX passes the check-migrate request to those applications
and kernel extensions that have registered to be notified of dynamic reconfiguration events.
The operating system either accepts or rejects the migration. In the latter case, the HMC
fails the migration request.
An API allows the kernel and applications to be notified of the migration operation. A
SIGRECONFIG Signal is sent to DR-Aware applications. These applications cooperate
and are notified of the different phases (Check-, Prepare-, and Post- Migrate). When
notified about the pending migration, the following actions could occur:
Application could be enabled to do some reconfiguration
- Reduce memory footprint
- Loosen heartbeats and other time-outs
- Throttle workloads and so on
- Quiesce or restart
Indicate that partition is not ready for migration (The HMC would cancel the migration.)
Instructor notes:
Purpose
Details This step gives the LPAR a chance to reject the migration request.
Additional information
Transition statement The following page shows the command syntax and HMC GUI
for initiating the migration process.
HMC command
migrlpar -o m -m srcSystem -t destSystem \
-p myLPAR -d 5 -v
(Suggest running the Validate procedure if this LPAR has not been
migrated before.)
Notes:
From the mobile LPAR's context menu, click Operations > Mobility > Migrate. This
launches the Partition Migration wizard which guides you through the migration process.
Through this wizard, you can also learn about the partition migration process because it
provides a lot of process details in each step.
Suggest running the Validate procedure if this LPAR has not been migrated before.
migrlpar -o m | r | s | v
-m <managed system>
[-t <managed system>]
-p <partition name> | --id <partitionID>
[-n <profile name>]
[-f <input data file> | -i <input data>]
[-w <wait time>]
[--force]
[-d <detail level>]
[-v]
[--help]
-o The operation to perform
m - validate and migrate
r - recover
s - stop
v - validate
-m <managed system>: The source managed system's name.
-t <managed system>: The destination managed system's name.
-p <partition name>: The partition on which to perform the operation.
--id <partitionID>: The ID of the partition on which to perform the operation.
-n <profile name>: The name of the partition profile to be created on the destination.
-f <input data file>: The name of the file containing input data for this command. The
format is:
- attr_name1=value,attr_name2=value,...
- or
- attr_name1=value1,value2,...
-i <input data>: The input data for this command, typically the virtual adapter mapping
from the source to destination or the destination shared-processor pool. This follows the
same format as the input data file of the -f option.
-w <wait time>: The time, in minutes, to wait for any operating system command to
complete.
--force: Force the recovery. This option should be used with caution.
-d <detail level>: The level of detail requested from operating system commands;
values range from 0 (none) to 5 (highest).
-v: Verbose mode.
--help: Prints a help message.
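For example, if a migration is interrupted and leaves the partition in a Migrating state, the recover operation listed above can be issued against the source system (names are placeholders):
$ migrlpar -o r -m srcSystem -p myLPAR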
Notes:
Next, the wizard gives you a chance to avoid over-writing existing partition profiles.
As part of the migration process, the HMC creates a new migration profile containing the
partition's current state. Unless you specify a profile name when you start the migration,
this profile replaces the existing profile that was used to activate the LPAR. Also, if you
specify an existing profile name, the HMC replaces that profile with the new migration
profile. If you do not want to replace any of the existing profiles that are associated with the
mobile LPAR, you must specify a new, unique profile name.
The next step gives you a chance to identify the target system. You can only identify a
system currently managed by the HMC.
mkauthkeys
Notes:
The Remote Live Partition Mobility refers to the migration of a logical partition between two
IBM Power Systems servers each managed by a separate Hardware Management
Console. This feature is available starting with HMC Version 7 Release 3.4. Remote
migrations require coordinated movement of a partition's state and resources over a secure
network channel to a remote HMC. The following list indicates the high-level prerequisites
for remote migration. If any of the following elements are missing, a migration cannot occur:
A ready source system that is migration-capable
A ready destination system that is migration-capable
Compatibility between the source and destination systems
Destination system managed by a remote HMC
Network communication between local and remote HMC
A partition that is ready to be moved from the source system to the destination system.
For an inactive migration, the partition must be turned off, but must be capable of booting
on the destination system. For active migrations, an MSP on the source and destination
systems.
One or more SANs that provide connectivity to all of the mobile partition's disks to the
Virtual I/O Server partitions on both the source and destination servers. The mobile
partition accesses all migratable disks through virtual devices (virtual Fibre Channel, virtual
SCSI, or both). The LUNs used for virtual SCSI must be zoned and masked to the
Virtual I/O Servers on both systems.
The mobile partition's virtual disks must be mapped to LUNs; they cannot be part of a
storage pool or logical volume on the Virtual I/O Server. One or more physical IP
networks (LAN) that provide the necessary network connectivity for the mobile partition
through the Virtual I/O Server partitions on both the source and destination servers. The
mobile partition accesses all migratable network interfaces through virtual Ethernet
devices.
An RMC connection to manage inter-system communication
SSH key authentication to the remote HMC
Remote migration operations require that each HMC has RMC connections to its individual
system's Virtual I/O Servers and a connection to its system's service processors. The HMC
does not have to be connected to the remote system's RMC connections to its Virtual I/O
Servers nor does it have to connect to the remote system's service processor.
The local HMC, which manages the source server in a remote migration, serves as the
controlling HMC. The remote HMC, which manages the destination server, receives
requests from the local HMC and sends responses over a secure network channel.
Use the mkauthkeys command in the CLI to retrieve authentication keys from the current
HMC managing the mobile partition. You must be logged in as a user with hmcsuperadmin
privileges, such as the hscroot user, and authenticate to the remote HMC by using a
remote user ID with hmcsuperadmin privileges.
Instructor notes:
Purpose
Details
Additional information
Transition statement Next you might see errors or warnings.
Notes:
Errors must be resolved since they will prevent the mobility process from continuing.
Warning should be read, but will not prevent the process from succeeding.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
The wizard identifies the MSPs on the target and lists the VLANs that are bridged. You get
an error if the target system does not have a VIOS configured to bridge the VLANs that the
mobile LPAR is configured to use on the source system. To resolve, you have to manually
configure an SEA to bridge the required VLANs at the target system.
Instructor notes:
Purpose Show how the wizard identifies the MSP and required VLAN bridging.
Details
Additional information
Transition statement
Notes:
After you identify the target MSP, the wizard identifies the virtual SCSI server adapters that
appear to have access to the client's disks. This allows the administrator to choose either
a dual VIOS configuration or a single VIOS configuration at the destination system.
This determines where the vSCSI server adapters will be created to support the client
adapters.
Instructor notes:
Purpose Show how the wizard identifies the virtual SCSI server adapter at the target
system's MSP.
Details Identifies the virtual adapter that has access to the client's disks.
Additional information
Transition statement
Notes:
The last step is to show the summary. If you do not like what is shown in the summary, you
can click Back to go back and change any of the values.
When you are satisfied with the values, click Finish to initiate the migration process.
Instructor notes:
Purpose Completing the Migration wizard.
Details
Additional information
Transition statement The following slides graphically depict what occurs when you
start the migration process.
(Visual: a new LPAR is created on destination Server B while the mobile partition continues to run on source Server A.)
Notes:
After validation checks pass, a new partition is created on the destination system to
accommodate the migrated environment.
Instructor notes:
Purpose Describe the start of the migration.
Details
Additional information
Transition statement
(Visual: the HMC creates the required virtual SCSI server adapters in the destination Virtual I/O Server and maps the SAN LUN to them.)
Notes:
The HMC verifies that the target MSP has access to the mobile LPAR's external storage. It
then creates the required virtual SCSI adapters in the MSP on the destination system and
completes the LUN to virtual adapter mapping.
Instructor notes:
Purpose Mention the HMC's role in configuring the virtual SCSI devices at the
destination system.
Details
Additional information
Transition statement Next, the MSPs transfer the mobile LPAR's state from the source
to the target system.
(Visual: the same configuration using NPIV; each server's Virtual I/O Server maps a virtual Fibre Channel adapter (fcs0) to the SAN LUN instead of using virtual SCSI.)
Notes:
The addition of NPIV and virtual Fibre Channel adapters reduces the number of
components and steps necessary to configure shared storage in a Virtual I/O Server
configuration:
With virtual Fibre Channel support, you do not map individual disks in the Virtual I/O
Server to the mobile partition. LUNs from the storage subsystem are zoned in a switch
with the mobile partition's virtual Fibre Channel adapter using its worldwide port names
(WWPNs), which greatly simplifies Virtual I/O Server storage management.
LUNs assigned to the virtual Fibre Channel adapter appear in the mobile partition as
standard disks from the storage subsystem. LUNs do not appear on the Virtual I/O
Server unless the physical adapter's WWPN is zoned.
Standard multi-pathing software for the storage subsystem is installed on the mobile
partition. Multi-pathing software is not installed into the Virtual I/O Server partition to
manage virtual Fibre Channel disks. The absence of the software provides system
administrators with familiar configuration commands and problem determination
processes in the client partition.
Partitions can take advantage of standard multipath features, such as load balancing
across multiple virtual Fibre Channel adapters presented from dual Virtual I/O Servers.
Required components
In addition to the basic requirements, the following components must be configured in the
environment:
An NPIV-capable SAN switch
An NPIV-capable physical Fibre Channel adapter on the source and destination Virtual
I/O Servers
HMC Version 7 Release 3.4, or later
Virtual I/O Server Version 2.1 with Fix Pack 20.1, or later
AIX 5.3 TL9, or later
AIX 6.1 TL2 SP2, or later
Each virtual Fibre Channel adapter on the Virtual I/O Server mapped to an
NPIV-capable physical Fibre Channel adapter
Each virtual Fibre Channel adapter on the mobile partition mapped to a virtual Fibre
Channel adapter in the Virtual I/O Server
At least one LUN mapped to the mobile partition's virtual Fibre Channel adapter
(Visual: the HMC initiates the transfer of the partition state from source Server A to destination Server B over the Ethernet network.)
Notes:
The HMC initiates the transfer of the partition state from the source environment to the
destination environment. This includes all the logical partition profiles associated with the
mobile partition. The source mover service partition extracts the partition state information
from the source server (through Hypervisor) and sends it to the destination mover service
partition over the network. The destination mover service partition receives the partition
state information and installs it on the destination server.
Notes:
Additional state information that is transferred from the source to destination system.
(Visual: logical memory copy; the source MSP streams the partition state to the destination MSP over the Ethernet network while the partition remains active.)
Notes:
During the transfer of the state information, the partition and its applications are active.
Sometime after more than half of the state has been transferred, the HMC initiates the
suspension of the mobile partition on the source server. The source mover service partition
continues to transfer the partition state information to the destination mover service
partition. The hypervisor resumes the mobile partition on destination server. The partition is
inactive for around two seconds between suspension (on source system) and reactivation
(on destination system).
(Visual: virtual SCSI removal; the hosting virtual SCSI resources are removed from the source Virtual I/O Server.)
Notes:
The source Virtual I/O Servers unlock, unconfigure, or undefine virtual resources on the
source servers. After the transfer of the state is completed, the HMC removes the hosting
virtual adapter slots from the source Virtual I/O Server logical partition profiles as required.
However, you will find the slots that were defined as connect any (at the source Virtual I/O
Server) will not get removed.
(Visual: LPAR removal; the source partition definition is removed from Server A.)
Notes:
Finally, the source LPAR is removed.
Notes:
When using the HMC GUI, this output box is seen immediately after starting the migration
progress.
On source machine
Notes:
On the source system, in the Server > Contents of pane, you can observe the status of the
LPAR. Migrating-Running implies the migration process is running.
In the managed system Properties, the Migration tab shows the number of migrations in
progress.
On target machine
In progress
Finished
Notes:
On the target system, in the Server > Contents of pane, you can observe the status of the
LPAR. Migrating-Running implies the migration process is running. A status of Running
implies the migration has completed.
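The same state can be checked from the HMC command line; a sketch using a placeholder system name:
$ lssyscfg -r lpar -m targetSystem -F name,state
A state of Migrating-Running during the move and Running afterwards matches what the GUI shows.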
errpt -a
---------------------------------------------------------------------------
LABEL: CLIENT_PMIG_DONE
IDENTIFIER: A5E6DB96
Description
Client Partition Migration Completed
---------------------------------------------------------------------------
LABEL: CLIENT_PMIG_STARTED
IDENTIFIER: 08917DC6
Notes:
The migrated LPAR has entries in its AIX error log. There should be two entries; one for the
start and another for the completion.
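To pull just those entries, the error labels shown above can be used as a filter (a sketch; -J selects entries by error label):
# errpt -a -J CLIENT_PMIG_STARTED,CLIENT_PMIG_DONE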
errlog -a
---------------------------------------------------------------------------
LABEL: MVR_MIG_COMPLETED
IDENTIFIER: 3EB09F5A
Description
Migration completed successfully
Probable Causes
UNDETERMINED
Failure Causes
UNDETERMINED
Recommended Actions
NONE
Detail Data
STREAM ID
96C4 D88B 13BE F250
SERVICES (Source or Target)
Source MSP
Notes:
At the source MSP, there is an entry for the completion and also an entry for when the
migrated LPAR is suspended.
Description
Client partition suspend issued
Probable Causes
UNDETERMINED
Failure Causes
UNDETERMINED
Recommended Actions
NONE
Detail Data
STREAM ID
96C4 D88B 13BE F250
SOFT SUSPEND
1
TRIGGER PERCENTAGE
99
SUSPEND COUNT
300
MAX PERCENTAGE
99
REQUESTED SUSPEND TRIGGER
100
Notes:
This shows when the migrated LPAR was suspended at the source system.
---------------------------------------------------------------------------
LABEL: MVR_MIG_COMPLETED
IDENTIFIER: 3EB09F5A
Description
Migration completed successfully
Probable Causes
UNDETERMINED
Failure Causes
UNDETERMINED
Recommended Actions
NONE
Detail Data
STREAM ID
96C4 D88B 13BE F250
SERVICES (Source or Target)
Target MSP
Notes:
The destination MSP's error log shows when the migration process completed.
Troubleshooting (1 of 5)
Error log
Migration GUI or command line messages
Also stored in /var/hsc/log/cimserver.log
alog -t cfg -o > /tmp/cfglog
Provides details not included in HMC message
Run at source and destination VIOS
Contains error details
Methods and scripts called
Abbreviated descriptions
RMC return code
The OS command return code
PID
Locate the lines containing ERROR
Notes:
The migration process is fairly verbose as it provides error log entries, GUI popup
messages, and entries in a config log file. The cfg type alog (or config log) is a log that can
be invaluable when troubleshooting migration problems. Sometimes this log provides
details not found in the HMC error messages or in the AIX error log.
Troubleshooting (2 of 5)
Example: Partition mobility validation fails with the following errors:
HSCLA29A The RMC command issued to partition VA_NET1 failed. The partition command
is: migmgr -f find_devices -t vscsi -d 1 The RMC return code is: 0 The OS command
return code is: 85 The OS standard out is: Running method
'/usr/lib/methods/mig_vscsi' 85 The OS standard err is:
Error[0]: HSCLA24E The migrating partition's virtual SCSI adapter 10 cannot be
hosted by the existing virtual I/O server (VIOS) partitions on the destination
managed system. To migrate the partition, set up the necessary VIOS hosts on the
destination managed system, then try the operation again.
Notes:
When a validation error occurs at the HMC, you are usually provided error details similar to
the information in the figure above. This will include an associated error code, the
LPAR name, the command used by the validation process, the RMC return code, the OS
return code, and sometimes additional related error codes. After reading the error message,
you might be wondering why the existing VIOS cannot host the migrating LPAR's VSCSI
adapter. This information is not provided by the HMC error message, but might be provided
in the cfg log entries.
Troubleshooting (3 of 5)
Look for the return code in the alog -t cfg -o output:
# grep "rc= 85" config.log
C0 663752 mig_vscsi.c 352 leaving mig_vscsi fn= find_devices, rc= 85
C0 663754 mig_vscsi.c 352 leaving mig_vscsi fn= find_devices, rc= 85
C0 663756 mig_vscsi.c 352 leaving mig_vscsi fn= find_devices, rc= 85
C0 663758 mig_vscsi.c 352 leaving mig_vscsi fn= find_devices, rc= 85
Process IDs
Return Codes
Notes:
As we examine the config log, we can search for information provided within the HMC error
message. One item to search for is the OS return code. In our example, this value is 85.
The C at the beginning of the entries indicate the beginning of a command execution. The
0 that follows indicate this is associated with an error. You should now search the log for
entries associated with one of the PIDs listed. In the following figure we will show an
example using 663758.
The config log is primarily for use by config commands, device methods, and dynamic
reconfig (DR) commands. There are three general forms of error log entries:
1. The start log. A command or device method that uses the error log will first log a "start"
entry. This will identify the command or method, command line parameters, its PID, and
parent's PID. The general format is as follows:
TS PID PPID [TIMESTAMP] [FILE] [LINE] CMD
Where:
- T identifies a type of executable. The letter C indicates the start of a command and
the letter M indicates the start of a method.
- S is the letter S identifying the start of a command or method.
- PID is the process ID of the command or method. It is useful for finding other log
entries in the config log that are added by this command or method.
- PPID is the parent process ID. This is useful for correlating the start log entry of a
method with the start log entry of the command that invoked the method.
- TIMESTAMP is an optional timestamp of the format HH:MM:SS.
- FILE is an optional source file name.
- LINE is an optional line number within the source file of the line that generated the
log entry.
- CMD is the command and arguments specified when the command or method was
invoked.
2. The second form is for logging informational, error, and debug log entries. The general
format is:
T# PID [TIMESTAMP] [FILE] [LINE] STRING
Where:
- T indicates a type of executable. The letter C indicates the start of a command and
the letter M indicates the start of a method. The letter B is a special case that is used
to log special boot time information.
- # is a verbosity number in the range of 0 to 9. 0 denotes an error condition. 1
denotes an informational message. Other numbers denote different levels of debug
information. The typical value will be 4.
- PID is the process ID. It can be matched to an earlier start log to see what command
generated this log entry.
- TIMESTAMP is an optional timestamp of the format HH:MM:SS.
- FILE is an optional source file name. It is always included for an error entry (# is 0).
- LINE is an optional line number within the source file of the line that generated the
log entry. It is always included for an error entry (# is 0).
- STRING is the data actually logged.
3. The third form is for special informational messages logged during boot. These always
have the format:
B# STRING
Where:
- B is always the letter B.
- # is usually the number 1 but could be 0 through 9 as described above.
- STRING is the actual data. If you see a timestamp, it is because it was included in
the data.
Instructor notes:
Purpose
Details
Additional information If an error occurs, you might also see a log entry that starts with
C0 or M0. Except for DR operations, where additional levels of debug information will be
logged, these are the only entries we log by default. This might change in the near future.
There is an environment variable that can be created to change the level of detail that will
be logged. Examples:
export CFGLOG=timestamp
will result in timestamps being included in new log entries.
export CFGLOG=verbosity:4
will result in all entries of verbosity up to and including 4 to be logged.
export CFGLOG=detail
will result in all entries including source file name and line numbers. I believe it will also
change the verbosity to 4.
Any combination of the above can be used as long as the values are separated by a
comma. For example:
export CFGLOG=verbosity:4,timestamp
Transition statement
Troubleshooting (4 of 5)
Take one of the PIDs shown and grep it:
# grep 663758 config.log
CS 663758 352486 /usr/sbin/migmgr -f find_devices -t vscsi -d 1
C4 663758 Running method '/usr/lib/methods/mig_vscsi'
CS 663758 352486 /usr/sbin/migmgr -f find_devices -t vscsi -d 1
C0 663758 vsmig_set.c 55 vsmig_dest_adapter is about to call
LIBXML_TEST_VERSION
C0 663758 vsmig_set.c 59 vsmig_dest_adapter called LIBXML_TEST_VERSION
C0 663758 vsmig_util.c 1482 original pipe_ctrl from RMC =0x0
C0 663758 vscsi_vtd.c 416 No attribute for node
[/virtDev/blockStorage/AIX/devID], cnt=0
C0 663758 vsmig_util.c 790 ERROR: virtual device name already exists,
root103
C0 663758 mig_vscsi.c 352 leaving mig_vscsi fn= find_devices, rc= 85
Notes:
In our example we searched for the PID 663758. An important detail is revealed in the
second from last entry. This migration validation failed because the virtual device name
(root103) used at the source system VIOS already exists on the destination systems VIOS.
Instructor notes:
Purpose Show config log details.
Details
Additional information
Transition statement The following is a simple way to look for important error details in
the config log.
Troubleshooting (5 of 5)
Another way to view the errors:
# grep ERROR config.log
C0 663752 vsmig_util.c 790 ERROR: virtual device name
already exists, root103
C0 663754 vsmig_util.c 790 ERROR: virtual device name
already exists, hb11vg103
C0 663756 vsmig_util.c 790 ERROR: virtual device name
already exists, hb11vg103
C0 663758 vsmig_util.c 790 ERROR: virtual device name
already exists, root103
Notes:
Most of the important error details will contain ERROR within the text entry.
Instructor notes:
Purpose
Details
Additional information
Transition statement Let's take a look at dual VIOS considerations.
Notes:
Live Partition Mobility does not make any changes to the network setup on the source and
destination systems. It only checks that all virtual networks used by the mobile partition
have a corresponding shared Ethernet adapter on the destination system. Shared Ethernet
failover might or might not be configured on either the source or the destination systems.
If you are planning to use shared Ethernet adapter failover, remember not to assign the
Virtual I/O Server's IP address on the shared Ethernet adapter. Create another virtual
Ethernet adapter and assign the IP address to it. Partition migration requires network
connectivity through the RMC protocol to the Virtual I/O Server. The backup shared
Ethernet adapter is always offline, as is its IP address, if any.
The Virtual I/O Servers selected as mover service partitions carry the load of the memory moves
and network data transfers. So, if multiple mover service partitions are available on either
the source or destination systems, we suggest distributing the load among them. This can
be done explicitly by selecting the mover service partitions using either the GUI or the
command-line interface. Each mover service partition can manage up to four concurrent
active migrations, and explicitly using multiple Virtual I/O Servers avoids queuing of
requests.
Network management can cause high CPU usage and usual performance considerations
apply: use uncapped Virtual I/O Servers and add virtual processors if the load increases.
Alternatively, create dedicated Virtual I/O Servers on the source and destination systems
that provide the mover service function separating the service network traffic from the
migration network traffic. You can combine or separate virtualization functions and mover
service functions to suit your needs.
Additional information
Transition statement
Figure 8-47. Dual VIO Server: Virtual SCSI with dual HMC consideration AN313.1
Notes:
Dual Virtual I/O Server and client mirroring
The migration process automatically detects which Virtual I/O Server has access to which
storage and configures the virtual devices to keep the same disk access topology. When
migration is complete, the logical partition has the same disk configuration it had on
the previous system, still using two Virtual I/O Servers.
If the destination system has only one Virtual I/O Server, the migration is still possible and
the same virtual SCSI setup is preserved at the client side. The destination Virtual I/O
Server must have access to all disk spaces and the process creates two virtual SCSI
adapters on the same Virtual I/O Server.
Dual Virtual I/O Server and multipath I/O
When multiple Virtual I/O Servers are involved, multiple virtual SCSI combinations are
possible, because access to the same SAN disk can be provided on the destination system
by multiple Virtual I/O Servers. Live Partition Mobility automatically manages the virtual
SCSI configuration if an administrator does not provide specific mappings.
With multipath I/O, the logical partition accesses the same disk data using two different
paths, each provided by a separate Virtual I/O Server. One path is active and the other is
standby. The migration is possible only if the destination system is configured with two
Virtual I/O Servers that can provide the same multipath setup.
The partition that is moving must keep the same number of virtual SCSI adapters after
migration and each virtual disk must remain connected to the same adapter or adapter set.
An adapter's slot number might change after migration, but the same device name is kept
by the operating system for both adapters and disks.
To migrate the partition with only one Virtual I/O Server configured on the destination
system, you must first remove one path from the source configuration before starting the
migration. The removal can be performed without interfering with the running applications.
The configuration becomes a simple single Virtual I/O Server migration.
A logical partition that is using only one Virtual I/O Server for virtual disks can be migrated
to a system where multiple Virtual I/O Servers are available. Because the migration never
changes a partition's configuration, only one Virtual I/O Server is used on the destination
system.
Multiple concurrent migrations
While a migration is in progress, you can start another one. When the number of migrations
to be executed grows, the setup time using the GUI might become long and you should
consider using the command-line interface. The migrlpar command can be used in scripts
to start multiple migrations in parallel.
A migration might fail validation checks and therefore not be started if the moving partition's
adapter and disk configuration cannot be preserved on the destination system. We suggest
that you always perform a validation before performing a migration. The validation checks
the configuration of the involved Virtual I/O Servers and shows you the configuration that
will be applied.
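As a minimal sketch of what this looks like from the HMC command line (the managed system and
partition names are purely illustrative), a validation followed by a migration could be run as:
$ migrlpar -o v -m source_sys -t target_sys -p mobile_lpar     (validate only)
$ migrlpar -o m -m source_sys -t target_sys -p mobile_lpar     (perform the migration)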
Dual HMC considerations
To avoid concurrent operations on the same system, a locking mechanism is activated
when migrating a partition. The HMC that initiates a migration takes a lock on both
managed systems and the lock is released when migration is completed. The other HMC
can show the status of migration but cannot issue any additional configuration changes on
the two systems. The lock can be manually broken, but this option should be considered
carefully.
Dual Virtual I/O Server and virtual Fibre Channel multi-pathing
With multipath I/O, the logical partition accesses the same storage data using two different
paths, each provided by a separate Virtual I/O Server. The migration is possible only if the
destination system is configured with two Virtual I/O Servers that can provide the same
multipath setup. They both must have access to the shared disk data.
When migration is complete on the destination system, the two Virtual I/O Servers are
configured to provide the two paths to the data. If the destination system is configured with
only one Virtual I/O Server, the migration cannot be performed. The migration process
would create two paths using the same Virtual I/O Server, but this setup of having one
virtual Fibre Channel host device mapping the same LUNs on different virtual Fibre
Channel adapters is not recommended.
To migrate the partition, you must first remove one path from the source configuration
before starting the migration. The removal can be performed without interfering with the
running applications. The configuration becomes a simple single Virtual I/O Server
migration.
Notes:
Under Virtual I/O Server version 1.5 (or above) and on a POWER6 system, the IVM can be
used to perform the migration process. From the main Partition Management panel, select
the LPAR and Migrate from the task menu. The Status menu option allows you to view
information about the current migration processes.
To see the Mobility options, your system must have the PowerVM Enterprise Edition
feature.
Notes:
This visual shows what could be used to monitor the migration process. The bottom
window is what is seen after clicking the Status option in the Mobility task list. To refresh
the Percent Complete column, you must click Refresh.
Checkpoint
1. True or False: The VASI interface controls every phase of
the partition mobility process.
Notes:
Checkpoint solutions
1. True or False: The VASI interface controls every phase of the partition
mobility process.
The answer is false.
4. What log usually provides details not found in the HMC migration error
message?
The answer is the config log; alog -t cfg.
Additional information
Transition statement
Exercise
Unit exercise
Notes:
Unit summary
Having completed this unit, you should be able to:
Notes:
Estimated time
01:30
References
SG24-7940 PowerVM Virtualization on IBM System p (Redbook)
pSeries and AIX Information Center
Unit 9. PowerVM advanced systems maintenance
Unit objectives
After completing this unit, you should be able to:
Manage PowerVM system firmware
Update the Virtual I/O Server software
Single and dual VIO configurations
updateios command
Back up the Virtual I/O Server
backupios command (file, tape, CD, and DVD)
Restore the Virtual I/O Server
Backup tape, DVD, and tar file
Add disk space to a vSCSI client partition
Back up client partitions operating system to virtual DVD
Change partition availability priority
Activate the power saver mode
Manage hot-pluggable devices in the Virtual I/O Server
diagmenu command: VIOS diagnostic menu
Add hot-swap SCSI disk
Notes:
The objectives list what you should be able to do at the end of this unit.
Notes:
PowerVM Editions features are activated with a code, similar to the way that Capacity on
Demand is activated on IBM Systems and IBM eServer hardware. If your system is
purchased without the feature, you can later purchase it by requesting the appropriate
feature code. The following are some of the PowerVM feature codes:
Table 1:
Machine Type/Model   Power system   Express edition   Standard edition   Enterprise edition
                                    feature code      feature code       feature code
9119 FHA             595            NA                #7943              #8002
9125 F2A             575            NA                #7949              #8024
9117 MMA             570            NA                #7942              #7995
9406 MMA             570            NA                #7942              #7995
8204 E8A             550            #7983             #7982              #7986
9409 M50             550            NA                #7982              #7986
8203 E4A             520            #7983             #8506              #8507
9408 M25             520            NA                #8506              #8507
9407 M15             520            NA                #8506              NA
7998 61X             JS22           NA                #5409              #5649
7998 60X             JS12           NA                #5409              #5606
To activate PowerVM standard edition or PowerVM Enterprise edition, you must enter an
activation code from the Hardware Management Console (HMC) or using the ASMI menu
interface.
To activate these PowerVM features, on the HMC, you must have an HMC Super
Administrator user role. Then to enter a code, perform the following tasks:
1. Retrieve the Virtualization Technology activation code from the following:
www.ibm.com/systems/p/advantages/cod.
2. Look in the Activation tools list on the right side and click Activation codes by
machine serial number.
3. Enter the system type and serial number of your server.
4. Record the activation code that is displayed on the Web site. The activation code type is
VET (Virtualization Technology Code).
5. The easiest way to enter your activation code on your managed system is by using the
HMC. To enter your code, complete the following steps:
6. In the system management navigation area of the HMC, expand Servers.
7. In the working area select your managed system.
8. Select Capacity on Demand (CoD) > Advanced POWER Virtualization > Enter
Activation Code in the Tasks Pad.
9. Type your activation code in the Code field. If you have copied the code, click the
middle mouse button.
10. Click OK.
You can now begin using the virtualization technologies, which include Micro-Partitioning,
Virtual SCSI, shared Ethernet adapter, multiple shared processor pools, Integrated
Virtualization Manager, and so on, as well as Partition Mobility if you ordered a PowerVM
Enterprise Edition code.
Instructor notes:
Purpose Describe the procedure to enable PowerVM.
Details You request the PowerVM firmware activation code in the same manner the
CoD activation code is requested. The code can be entered by way of the HMC or the
ASMI of the managed system. The HMC is the recommended option.
Additional information
Transition statement Now that we know how to enable PowerVM, let's discuss
managing the firmware of the PowerVM-enabled system.
Firmware management
Fix or update strategy
Managed system firmware (Licensed Internal Code) updates
Server firmware
Power subsystem firmware
I/O adapter and device firmware
Types of firmware maintenance
Concurrent (must use an HMC)
Disruptive
Reboot of managed system necessary
Acquiring the fix or update
http://www14.software.ibm.com/webapp/set2/firmware/gjsn
HMC
Performing the update
HMC
AIX or Linux operating system
Stand-alone diagnostic CD
Notes:
Fixes provide changes to your software, Licensed Internal Code, or machine code that fix
known problems, add new function, and keep your server or HMC operating efficiently. For
example, you might install fixes for your operating system in the form of a program
temporary fix (PTF). Or, you might install a server firmware (Licensed Internal Code) fix
with code changes that are needed to support either new hardware or new functions of the
existing hardware.
A good fix strategy is an important part of maintaining and managing your server. If you
have a dynamic environment that changes frequently, then you should install fixes on a
regular basis. If you have a stable environment, you do not have to install fixes as
frequently. However, you should consider installing fixes whenever you make any major
software or hardware changes in your environment.
You can get fixes using a variety of methods, depending on your service environment. For
example, if you use an HMC to manage your server, you can use the HMC interface to
download, install, and manage your HMC and firmware (Licensed Internal Code) fixes. If
you do not use an HMC to manage your server, you can use the functions specific to your
operating system to get and apply your fixes. You can also use the managed system's ASMI
interface to apply the update. In addition, you can download or order many fixes through
Internet Web sites.
The server firmware is the part of the Licensed Internal Code that enables hardware, such
as the service processor. When you install a server firmware fix, it is installed on the
temporary side of the service processor.
The power subsystem firmware is the part of the Licensed Internal Code that enables the
power subsystem hardware in the model IBM System p 575 and IBM System p 595
servers. You must use an HMC to update or upgrade power subsystem firmware.
You must install HMC fixes before you install server firmware or power subsystem firmware
fixes so that the HMC can handle any fixes or new function that you apply to the server.
After you install HMC fixes, either install the power subsystem firmware and server
firmware fixes together, or install the power subsystem firmware first (if you have a model
IBM System p 575 or IBM System p 595 server), and then the server firmware second.
Transition statement Before updating the system firmware, you must make sure the
HMC is at a supported and compatible level.
Notes:
You must install any necessary HMC fixes before you install server firmware or power
subsystem firmware fixes so that the HMC can handle any fixes or new function that you
apply to the server. The following table lists currently supported firmware (FW) Release
Levels for Entry-level IBM Systems with POWER6 processors, as well as the compatibility
of HMC FW levels with system FW levels (as of September 2008). You can find this matrix
at http://www14.software.ibm.com/webapp/set2/sas/f/power5cm3/eltablep6.html
Notes:
The following table lists currently supported firmware (FW) Release Levels for Mid-Range
IBM Systems with POWER6 processors, as well as the compatibility of HMC FW levels
with system FW levels (as of September 2008). You can find this matrix at
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm3/emtablep6.html
Notes:
The following table lists currently supported firmware (FW) release levels for POWER6
systems, as well as the compatibility of HMC FW levels with system FW levels for High-end
IBM Systems with POWER6 processors (IBM System p 595 and IBM System p 575).
To access this table online, go to:
https://www14.software.ibm.com/webapp/set2/sas/f/power5cm/ehtablep6.html
The different POWER code matrix that list supported code combinations for IBM Power
Systems can be accessed at
https://www14.software.ibm.com/webapp/set2/sas/f/power5cm/supportedcode.html
https://www14.software.ibm.com/webapp/set2/sas/f/power5cm/supportedcodep7.html
Notes:
The following table lists currently supported firmware (FW) release levels for POWER7
systems, as well as the compatibility of HMC FW levels with system FW levels for Entry,
Mid-range, and High-end IBM Systems with POWER7 processors at the time of writing.
To access these tables online, go to
https://www14.software.ibm.com/webapp/set2/sas/f/power5cm3/eltablep7.html
https://www14.software.ibm.com/webapp/set2/sas/f/power5cm3/emtablep7.html
https://www14.software.ibm.com/webapp/set2/sas/f/power5cm3/ehtablep7.html
Notes:
Firmware, also known as microcode, is Licensed Internal Code that fixes problems and
enables new system features as they are introduced. New features introduced are
supported by new firmware release levels. In between new hardware introductions, there
are fixes or updates to the supported features. These fixes are often bundled into service
packs. A service pack is referred to as an update level. A new release is referred to as an
upgrade level. Both levels are represented by the file name in the form of
PPMMXXX_YYY_ZZZ. PP and MM are package and machine type identifiers. PP can be
01 for a managed system or 02 for a power subsystem. The MM identifier is EM, EL,
or EH for POWER6 systems, depending on the model, and EP or ES for Power Systems
firmware. The firmware version file applicable to POWER6 systems is in the form of
01ELXXX_YYY_ZZZ for low-end servers, 01EMXXX_YYY_ZZZ for mid-range servers, and
01EHXXX_YYY_ZZZ for high-end servers.
The file naming convention for system firmware is 01ELXXX_YYY_ZZZ, where XXX is the
stream release level, YYY is the service pack level, and ZZZ is the last disruptive service
pack level.
Using the previous example, the system firmware 01EL320_076 would be described as
release level 320, service pack 076.
Each stream release level supports new machine types, new features, or both.
Firmware updates can be disruptive or concurrent. A disruptive upgrade is defined as one
that requires the target system to be shut down and powered off prior to activating the new
firmware level. A new release level upgrade will always be disruptive. All other upgrades
are defined as concurrent, meaning that they can be applied while the system is running.
Concurrent updates require an HMC but are not guaranteed to be non-disruptive.
In general, a firmware upgrade is disruptive if:
The release levels (XXX) are different. Example: Currently installed release is EM310,
new release is EM320.
The service pack level (YYY) and the last disruptive service pack level (ZZZ) are equal.
Example: EM320_120_120 is disruptive, no matter what level of EM310 is currently
installed on the system.
The service pack level (YYY) currently installed on the system is lower than the last
disruptive service pack level (ZZZ) of the new service pack to be installed. Example:
Currently installed service pack is EM310_120_120 and the new service pack is
EM310_152_130.
An installation is concurrent if:
The service pack level (YYY) is higher than the service pack level currently installed on
your system. Example: Currently installed service pack is EM310_126_120, new
service pack is EM310_143_120.
Instructor notes:
Purpose Decoding the fix/update file-naming convention.
Details Use the detail in the student notes to describe the naming convention.
Additional information
Transition statement Before updating, you must identify the HMC and the managed
system code levels.
Apply updates
Notes:
The HMC Version 7 navigation pane contains the primary navigation links for managing
your system resources and the Hardware Management Console. These include the
Updates link. Updates provides a way for you to access information on both HMC and
system firmware code levels at the same time without running a task. The Updates work
pane displays the Hardware Management Console code level, system code levels, and the
ability to install corrective service by clicking Update HMC.
The HMC version can also be checked as hscroot from the shell prompt as follows:
version= Version: 7
Release: 3.3.0
Service Pack: 1
HMC Build level 20080602.1
MH01113: Support for new T0 Synergy brand. (06-02-2008)
","base_version=V7R3.3.0"
You can check available updates at:
http://www14.software.ibm.com/webapp/set2/firmware/gjsn
This site allows you to download updates for the system firmware as well as other system
components, such as devices, adapters, disks, and so on. You can also download ISO
images to use when updating the system firmware from the HMC or diagnostic CD.
From this Web page, you can choose to download an RPM or ISO image of your desired update.
Selecting Desc provides detailed information about the description and purpose, the requirements,
and how to install the update.
(Slide callouts: To examine current LIC levels; Firmware update from CD only.)
Notes:
For each selected Managed system, you can launch five tasks. The first task is to perform
updates to the current LIC release. This is sometimes referred to as applying fixes to the
firmware. There are many options as to where you apply the update from, whether from an
IBM Web site, technical support system, CD, and so forth. These fixes can be concurrent
or disruptive.
If you need to upgrade to a whole new firmware release, this is done with the second
option, Upgrade Licensed Internal Code to a new release. If you are upgrading to a new
release, you can obtain images from an online source; however, they must be applied from
a CD-ROM. For example, you might obtain the CD from IBM, or you might download an
ISO image from an online source and create your own CD. This CD is used to upgrade the
release level of the managed system firmware.
The third task (Flash Side Selection) enables you to select which flash side will be active
after the next activation, t-side (temporary side) or p-side (permanent side). The service
processor maintains two copies of the server firmware. One copy is held in the t-side repository
(temporary) and the other copy is held in the p-side repository (permanent). This Flash Side
Selection option is for IBM service use only.
The Check system readiness task checks for any errors on the internal code for the target
managed system.
Instructor notes:
Purpose Discuss the update options available in the Updates HMC application.
Details This course does not cover the firmware update process details, because this is
a basic skill we hope the student already has, and this is discussed in the AU73 course.
Additional information
Transition statement Let's see how to perform an update without an HMC.
Notes:
If you do not have an HMC attached to your managed system, your first step in the update
process is to determine the firmware level of the system. This can be identified through the
Advanced System Management Interface (ASMI).
A system with no HMC is also known as an unmanaged system. The ASMI is used to
power on the system and perform other useful functions. Using a serial cable and a
terminal emulator program like HyperTerminal on Windows, the text-based ASMI and the
active console can be accessed. When the serial connection to the system is established,
press Enter to log in and to be presented with the following ASMI login screen:
Welcome
Machine type-model: 8204-E8A
Serial number: 652AFE2
Date: 2008-7-21
Time: 20:12:48
Service Processor: Primary
User ID: admin
Password: *****
User ID to change: admin
Current password for user ID admin: *****
New password for user: ******
New password again: ******
Operation completed successfully.
1. Power/Restart Control
2. System Service Aids
3. System Information
4. System Configuration
5. Network Services
6. Performance Setup
7. On Demand Utilities
8. Concurrent Maintenance
9. Login Profile
99. Log out
Notes:
The firmware level is the Version value in the upper left of the screen. The Power/Restart
Control option allows you to power on the system and observe the initialization process.
As the system powers up, you should make the system boot from the CD/DVD-ROM by
accessing the SMS mode to change the bootlist.
Notes:
Access the ASMI from a Web browser. Direct your Web browser to https://<your service
processor IP address>.
Another way to get the system firmware level is by running the lsmcode command from
within one of the system's AIX LPARs.
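A minimal sketch of this, using the -c flag to print the summary without the menu interface (the
levels shown are the example values used later in this unit):
# lsmcode -c
The current permanent system firmware image is EL320_076.
The current temporary system firmware image is EL320_076.
The system is currently booted from the temporary image.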
Diagnostic Routines
  This selection will test the machine hardware. Wrap plugs and other
  advanced functions will not be used.
Advanced Diagnostics Routines
  This selection will test the machine hardware. Wrap plugs and other
  advanced functions will be used.
Task Selection (Diagnostics, Advanced Diagnostics, Service Aids, etc.)
  This selection will list the tasks supported by these procedures. Once a
  task is selected, a resource menu may be presented showing all resources
  supported by the task.
Resource Selection
  This selection will list the resources in the system that are supported
  by these procedures. Once a resource is selected, a task menu will be
  presented showing all tasks that can be run on the resource(s).
TASKS SELECTION LIST 801004
  From the list below, select a task by moving the cursor to the task and
  pressing 'Enter'. To list the resources for the task highlighted, press 'List'.
  [MORE...24]
  Format Media
  Gather System Information
  Hot Plug Task
  Identify and Attention Indicators
  Local Area Network Analyzer
  Log Repair Action
  Microcode Tasks
  RAID Array Manager
  SSA Service Aids
    This selection provides tools for diagnosing and resolving problems on
    SSA attached devices.
  Update and Manage system Flash
  [BOTTOM]
  F1=Help F10=Exit F3=Previous Menu
Notes:
The diagnostic program can be used to update the system firmware by accessing Task
Selection -> Update and Manage System Flash.
Steps required:
View system firmware level
ASMI Welcome panel
lsmcode command
The current permanent system firmware image is EL320_076.
The current temporary system firmware image is EL320_076.
The system is currently booted from the temporary image.
Download fix/update
http://www14.software.ibm.com/webapp/set2/firmware/gjsn
Notes:
Installing server firmware fixes through the operating system is a disruptive process. The
permanent level is also known as the backup level. The temporary level is also known as
the installed level. The system was booted from the temporary side, so at this time the
temporary level is also the activated level.
From a computer or server with an Internet connection, go to the Microcode Downloads
website at http://www14.software.ibm.com/webapp/set2/firmware/gjsn. Select your
machine type and model from the drop-down list under Download microcode by machine
type and model. Click Go. An information window is opened. Click Continue. The available
firmware levels are displayed. Record the available firmware.
Select the check box associated with each of the fixes you want to download, and then select
Continue at the bottom of the page. Perform the following steps in your LPAR that has
Service Authority (the service partition).
To unpack the RPM file, enter one of the following commands at the AIX or Linux command
prompt:
If you want to unpack from a CD, enter rpm -Uvh --ignoreos
/mnt/filename.rpm
If you want to unpack from the server's hard drive, enter rpm -Uvh --ignoreos
/tmp/fwupdate/filename.rpm where filename is the name of the RPM file that
contains the server firmware; for example, 01EL3xx_yyy_zzz.rpm.
When you unpack the RPM file, the server firmware fix file is saved in the /tmp/fwupdate
directory on the server's hard drive in the following format: 01EL3xx_yyy_zzz.
You need the server firmware fix file name in the next step. To view the name, enter the
following at an AIX or Linux command prompt: ls /tmp/fwupdate
Note: To perform this task, you must have root user authority.
The name of the server firmware fix file is displayed. For example, you might see output
similar to the following: 01EL3xx_yyy_zzz
To install the server firmware fix
From an AIX command prompt enter the following:
cd /tmp/fwupdate
/usr/lpp/diagnostics/bin/update_flash -f fwlevel
Where fwlevel is the specific server firmware fix file name, such as 01EL3xx_yyy_zzz.
During the server firmware installation process, reference codes CA2799FD and
CA2799FF are alternately displayed on the control panel. After the installation is complete,
the system is automatically powered off and powered on.
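Pulled together, the flow on the service partition might look like the following minimal sketch
(01EL3xx_yyy_zzz is the placeholder name used above, not a real level):
# rpm -Uvh --ignoreos /tmp/fwupdate/01EL3xx_yyy_zzz.rpm       (unpack the fix from the server's hard drive)
# ls /tmp/fwupdate                                            (confirm the fix file name)
01EL3xx_yyy_zzz
# cd /tmp/fwupdate
# /usr/lpp/diagnostics/bin/update_flash -f 01EL3xx_yyy_zzz    (flash the firmware; the system powers off and on)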
Instructor notes:
Purpose Describe updating the system firmware from an AIX partition.
Details Use the student note content to describe the details. You can use this procedure
on a system that is managed by an HMC. The LPAR must be the Service Partition. If the
system was managed by an HMC, you would be able to assign the Service Partition
through the General tab of the Manage System Properties. Otherwise, with the
unmanaged system, the default partition is set up to be the service partition by IBM
manufacturing.
Additional information
Transition statement
Fix Pack    Fix Pack 9.1          Fix Pack 10.1         Fix Pack 11.1         Migration DVD   Fixpak 21
AIX BASE    5.3.0 TL06 or later   5.3.0 TL07 or later   5.3.0 TL08 or later   AIX 6.1         AIX 6.1 TL2 SP3
Notes:
Existing VIOS installations can refresh to the latest VIOS level by applying the Fix Pack. If
your VIOS was installed with the previous VIOS install media, or is running with a Fix Pack prior
to Fix Pack 21, you should update it by applying Fix Pack 21.
If you are updating from VIOS level 1.5, you must first perform an upgrade to version 2.1. This
upgrade preserves the virtual devices configuration. Then you can update to the 2.1.1.0
level by installing Fix Pack 21.
Service pack
Applies to only one (the latest) VIOS level
Critical fixes for issues found between fix pack releases
Can only be applied to the fix pack release for which it is specified
Interim fix
Applies to only one (the latest) VIOS level
Provides a fix for a specific issue
Notes:
The service strategy for VIOS has changed. In addition to Fix Packs, Service Packs for
VIOS will be released, depending on the number of needed changes. VIOS Service Packs
consist of critical changes found between Fix Pack releases.
Notes:
Before you start, ensure that the following statements are true:
An HMC is attached to the managed system.
A DVD optical device is assigned to the Virtual I/O Server logical partition.
The Virtual I/O Server migration installation media is required.
After the migration is complete, the Virtual I/O Server logical partition is restarted with the
configuration that was preserved prior to the migration installation. It is recommended that you verify
that the migration was successful by checking the results of the installp command and running the
ioslevel command; ioslevel should now report 2.1.0.0. You can
restart previously running daemons, such as FTP and Telnet, and previously
running agents, such as ITUAM.
Figure 9-19. VIO Server software updates: Single VIO Server AN313.1
Notes:
When performing updates in a single Virtual I/O Server environment, you must plan for
virtual I/O client downtime. This is because you must bring down all of the associated
virtual I/O clients before you can start the update of the Virtual I/O Server.
To avoid complications during and after an update, you should verify the system is fully
operational as well as trouble free, and check the environment's configuration before
updating the Virtual I/O Server software. The following list is an example of useful
commands that can be used to document the configuration of the virtual I/O client and
Virtual I/O Server:
lsvg rootvg: On the Virtual I/O Server and virtual I/O client, check for stale PPs and
stale PVs.
lsvg -pv rootvg: On the Virtual I/O Server, check for missing disks.
netstat -cdlistats: On the Virtual I/O Server, check that the Link status is Up on
all used interfaces.
errpt: On the virtual I/O client, check for CPU, memory, disk, or Ethernet errors, and
resolve them before you continue.
lsvg -p rootvg: On the virtual I/O client, check for missing disks.
netstat -v: On the virtual I/O client, check that the Link status is Up on all used
interfaces.
If a current backup is not available, perform a backup of the Virtual I/O Server and the
virtual I/O client. Backup of the Virtual I/O Server is done with backupios (discussed later in
this unit), and the backup of the virtual I/O client can be done by using mksysb, savevg, Tivoli
Storage Manager, or similar backup products.
To update or upgrade the Virtual I/O Server, use the following steps (in this case, from a
locally attached DVD/CD drive):
1. Bring down the virtual I/O clients connected to the Virtual I/O Server.
2. Apply the update with the updateios command (detailed later in this unit).
3. Reboot Virtual I/O Server.
4. Check the new level with ioslevel.
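Taken together, and using the updateios and ioslevel commands exactly as they are described later
in this unit, the steps above reduce to something like the following minimal sketch (the update
directory is illustrative; the update could equally come from the optical drive):
$ updateios -dev /home/padmin/update -install -accept     (apply the fix pack, accepting after the preview)
$ shutdown -restart                                        (reboot the Virtual I/O Server)
$ ioslevel                                                 (confirm the new level after the reboot)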
Instructor notes:
Purpose Introduce the steps required to update the VIO Server.
Details In a single VIO Server configuration, you must plan for the impact the update
process will have on the clients. Depending on the devices served, the clients might
experience a great amount or very little disruption. For example, if the client is being served
its boot disk from this VIO Server, the client must be shut down prior to the update
procedures.
Additional information
Transition statement Lets look at how things differ in a dual VIO Server configuration.
VIO Server software updates:
Dual VIO Servers (1 of 4)
Dual VIO Server configuration is recommended.
Minimizes client disruptions
Check and document the virtual Ethernet and virtual SCSI disk
configurations.
Figure 9-20. VIO Server software updates: Dual VIO Servers (1 of 4) AN313.1
Notes:
When applying an update to the Virtual I/O Server in a dual Virtual I/O Server environment,
you can do so without planned or unplanned downtime. However, if the Virtual I/O
Server is updated from 1.3 to 1.4 or higher, and you want to migrate from Network Interface
Backup to shared Ethernet adapter failover on the clients, you will have planned downtime at
the virtual I/O client when changing the virtual network setup. Migrating from Network
Interface Backup to shared Ethernet adapter failover is optional.
It is always good practice to check the virtual Ethernet and virtual SCSI disk device
configurations on the Virtual I/O Server and virtual I/O client before starting the update.
Also, consider checking the physical adapter connections and the virtual device mappings.
As seen in this example, all of the virtual adapters on the virtual I/O client are up and
running:
#netstat -v
.. (Lines omitted for clarity)
.Virtual I/O Ethernet Adapter (l-lan) Specific Statistics:
---------------------------------------------------------
RQ Length: 4481
No Copy Buffers: 0
Filter MCast Mode: False
Filters: 255
Enabled: 1 Queued: 0 Overflow: 0
LAN State: Operational
Hypervisor Send Failures: 0
Receiver Failures: 0
Send Errors: 0
Hypervisor Receive Failures: 0
ILLAN Attributes: 0000000000003002 [0000000000002000]
.. (Lines omitted for clarity)
The following is run on the VIOS number 1:
$ netstat -cdlistats
.. (Lines omitted for clarity)
.Virtual I/O Ethernet Adapter (l-lan) Specific Statistics:
---------------------------------------------------------
RQ Length: 4481
No Copy Buffers: 0
Trunk Adapter: True
Priority: 1 Active: True
Filter MCast Mode: False
Filters: 255
Enabled: 1 Queued: 0 Overflow: 0
LAN State: Operational
.. (Lines omitted for clarity)
The following might be seen on VIOS number 2:
$ netstat -cdlistats
.. (Lines omitted for clarity)
.Virtual I/O Ethernet Adapter (l-lan) Specific Statistics:
---------------------------------------------------------
RQ Length: 4481
No Copy Buffers: 0
Trunk Adapter: True
Priority: 2 Active: False
Filter MCast Mode: False
Filters: 255
Enabled: 1 Queued: 0 Overflow: 0
Instructor notes:
Purpose Introduce the steps required to update the VIOS in a dual VIO Server
configuration.
Details Details are in the notes.
Additional information
Transition statement Let's look at items needing attention if you are using an MPIO
configuration.
VIO Server software updates:
Dual VIO Servers (2 of 4)
(Figure: dual VIOS MPIO configuration; VIOS 1 and VIOS 2 each connect through FC to a SAN
switch and provide vSCSI paths to a client partition using MPIO.)
Figure 9-21. VIO Server software updates: Dual VIO Servers (2 of 4) AN313.1
Notes:
How to check the disk status depends on how the disks are shared from the Virtual I/O
Server. You want to verify that everything is alright before you start the update.
If you have an MPIO setup similar to the example shown, you should run the following
commands before and after updating the first Virtual I/O Server. This allows you to check
the disk path status.
lspath: On the virtual I/O client, check all the paths to the disks; they should all be in
the Enabled state.
lsattr -El hdisk0: On the virtual I/O client, look at the MPIO heartbeat values for
hdisk0. Verify the hcheck_mode attribute is set to nonactive and hcheck_interval
attribute is set to 60.
If you are using an IBM storage solution, then verify the reserve_policy attribute is set to
no_reserve.
Other storage vendors might require other values for reserve_policy. You should check this
attribute value at the Virtual I/O Server.
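A minimal sketch of these path and attribute checks (device names, and the exact attribute
descriptions in the output, are illustrative):
On the virtual I/O client:
# lspath                                          (every path should report Enabled)
Enabled hdisk0 vscsi0
Enabled hdisk0 vscsi1
# lsattr -El hdisk0 -a hcheck_mode -a hcheck_interval
hcheck_mode     nonactive Health Check Mode     True
hcheck_interval 60        Health Check Interval True
On the Virtual I/O Server (for IBM storage, expect no_reserve):
$ lsdev -dev hdisk2 -attr reserve_policy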
Instructor notes:
Purpose Identify things that need to be checked and documented in an MPIO
configuration.
Details Before performing the update, you should check your MPIO configuration.
Check to see that the client is not using the path that includes the VIO Server being
updated.
Additional information
Transition statement What should you check if you are using LVM mirroring?
VIO Server software updates:
Dual VIO Servers (3 of 4)
(Figure: dual VIOS configuration; VIOS 1 and VIOS 2 each provide a vSCSI disk to a client
partition that uses LVM mirroring.)
Figure 9-22. VIO Server software updates: Dual VIO Servers (3 of 4) AN313.1
Notes:
If your LVM disk environment is similar to the figure, you should check the LVM status of
the disk shared by the Virtual I/O Server. Verify that everything is okay before performing
the update.
lsvg rootvg: On the virtual I/O client, check for stale PPs; the quorum must be off.
lsvg -p rootvg: On the virtual I/O client, check for missing hdisks.
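A minimal sketch of these two checks, run on the virtual I/O client (what to look for is noted in
parentheses):
# lsvg rootvg        (STALE PPs should be 0, and quorum should be turned off)
# lsvg -p rootvg     (every hdisk should show a PV STATE of active, none missing)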
Instructor notes:
Purpose Things to check in an LVM mirrored disk environment.
Details This is a simple but very popular configuration.
Additional information
Transition statement Let's look at what commands should be used when updating in a
dual VIO Server configuration.
VIO Server software updates:
Dual VIO Servers (4 of 4)
1. Run updateios.
2. Reboot the standby Virtual I/O Server when update is done.
$ shutdown -restart
3. Check the new level with ioslevel.
4. Verify updated environment.
5. Start the update on the other VIOS.
6. If using SEA, change the standby/primary status.
$ chdev -dev ent4 -attr ha_mode=standby
ent4 changed
$ netstat -cdlistats
Run updateios.
7. Reboot the Virtual I/O Server (shutdown -restart).
8. Check the new level with ioslevel.
9. Reset the Virtual I/O Server SEA role back to primary using chdev.
$ chdev -dev ent4 -attr ha_mode=auto
ent4 changed
Figure 9-23. VIO Server software updates: Dual VIO Servers (4 of 4) AN313.1
Notes:
If using shared Ethernet adapter failover, use the netstat command to see that the
interface is not active on the VIO Server you are about to update. In the following command
output, we can see the adapter is the standby adapter (Priority=2) and it is not active
(Active=false):
$ netstat -cdlistats
. (Lines omitted for clarity)
Trunk Adapter: True
Priority: 2 Active: False
Filter MCast Mode: False
Filters: 255
Enabled: 1 Queued: 0 Overflow: 0
LAN State: Operational
. (Lines omitted for clarity)
$
Verify that the standby Virtual I/O Server and the virtual I/O client are connected to the
Virtual I/O Server environment. If you have an MPIO environment, run lspath on the
virtual I/O client and verify that all paths are enabled. If you have an LVM environment,
you will have to run varyonvg, and the volume group should begin to sync. If not, run
syncvg -v on the volume groups that use virtual disks from the Virtual I/O Server
environment so that all the volume groups are in sync.
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 511 488 102..94..88..102..102
hdisk1 missing 511 488 102..94..88..102..102
# varyonvg rootvg
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 511 488 102..94..88..102..102
hdisk1 active 511 488 102..94..88..102..102
# lsvg rootvg
VOLUME GROUP: rootvg VG IDENTIFIER: 00c478de00004c00000
00006b8b6c15e
VG STATE: active PP SIZE: 64 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 1022 (65408 megabytes)
MAX LVs: 256 FREE PPs: 976 (62464 megabytes)
LVs: 9 USED PPs: 46 (2944 megabytes)
OPEN LVs: 8 QUORUM: 1
TOTAL PVs: 2 VG DESCRIPTORS: 3
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 2 AUTO ON: yes
MAX PPs per VG: 32512
MAX PPs per PV: 1016 MAX PVs: 32
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
Verify the Ethernet connection to the Virtual I/O Server being used for the shared
Ethernet adapter failover scenario by using netstat -cdlistats on the Virtual I/O Server, or use
netstat -v on the virtual I/O client for Network Interface Backup and check the link
status of the network interface backup adapters.
If shared Ethernet adapter failover is used, shift the standby/primary Virtual I/O
Server role using chdev, and check with netstat -cdlistats that the state has changed.
$ chdev -dev ent4 -attr ha_mode=standby
ent4 changed
$ netstat -cdlistats
. (Lines omitted for clarity)
Trunk Adapter: True
Priority: 1 Active: False
Filter MCast Mode: False
. (Lines omitted for clarity)
Apply the update to the Virtual I/O Server, which is now the standby Virtual I/O Server,
using updateios.
Reboot the Virtual I/O Server: shutdown -restart.
Check the new level with ioslevel.
Verify the environment of the standby Virtual I/O Server and the associated virtual I/O
clients.
If you have an MPIO environment, run lspath on the virtual I/O client and verify that all
paths are enabled. If you have an LVM environment, you will have to run varyonvg, and
the volume group should begin to sync. If not, run syncvg -v on the volume groups
that use virtual disks from the Virtual I/O Server environment so that all the volume groups
are in sync.
Verify the Ethernet connection to the Virtual I/O Server using netstat -cdlistats, and
check the link status with netstat -v.
Reset the Virtual I/O Server role back to primary using chdev.
$ chdev -dev ent4 -attr ha_mode=auto
ent4 changed
The update is now done.
Instructor notes:
Purpose Identify the commands used to update the VIOS.
Details
Additional information
Transition statement
Notes:
The update package is available as a set of downloadable files. As new updates become
available, the package name will change. New packages are cumulative. To obtain all
available fixes, download the latest package.
The Web site also allows you to download the package by way of a Java applet. This
enables you to download the entire package in one session. This applet can download files
to your system only if you grant the access. You are prompted for this. If you deny access,
the applet does not download files.
There is a link to downloading the ISO image. Also, you can order the CD-ROM through the
Delivery Service Center. The order site requires you to sign on with an IBM ID. You receive
the CD-ROM in several days.
Instructor notes:
Purpose Identify how to acquire the updated code.
Details
Additional information
Transition statement When you have acquired the update, you can use the updateios
command.
updateios command
updateios syntax:
$ updateios -cleanup
Notes:
The updateios command is used to install fixes, or updates the Virtual I/O Server to the
latest maintenance level. Before installing a fix or maintenance level, the updateios
command first runs a preview installation and displays the results. Upon completion of the
preview, the user is then prompted to either continue or exit. If the preview fails for any
reason, then the updates should not be installed.
The -install flag is used to install new file sets onto the Virtual I/O Server. This flag should
not be used to install fixes or maintenance levels.
The -cleanup flag cleans up after an interrupted installation and attempts to remove all
incomplete pieces of the previous installation. Cleanup should be performed whenever any
software product or update is in a state of either applying or committing and can be run
manually as needed.
The -commit flag commits all uncommitted updates to the Virtual I/O Server.
The -reject flag rejects all uncommitted updates to the Virtual I/O Server.
If the -remove flag is specified, the listed file sets are removed from the system. The file
sets to be removed must be listed on the command line or in the RemoveListFile file.
The log file, install.log in the user's home directory, is overwritten with a list of all file
sets that were installed.
Flags
-accept agrees to required software license agreements for software to be installed.
-cleanup cleans up after an interrupted installation or update.
-commit commits all specified updates.
-dev Media Specifies the device or directory containing the images to install.
-f forces all uncommitted updates to be committed prior to applying the new updates.
When combined with the -dev flag, it commits all updates prior to applying any new
ones. When combined with the -reject flag, it rejects all uncommitted updates without
prompting for confirmation.
-file specifies the file containing a list of entries to uninstall.
-install installs new file sets onto the Virtual I/O Server.
-reject rejects all specified uncommitted updates.
-remove performs an uninstall of the specified software.
To update the Virtual I/O Server to the latest level, where the updates are located on the
mounted file system /home/padmin/update, type updateios -dev
/home/padmin/update.
To update the Virtual I/O Server to the latest level, when previous levels are not committed,
type updateios -f -dev /home/padmin/update.
To reject installed updates, type updateios -reject.
To clean up partially installed updates, type updateios -cleanup.
To commit the installed updates, type updateios -commit.
6. Type $ ioslevel.
Notes:
Log in to the Virtual I/O Server as the user padmin.
Create directory on the Virtual I/O Server:
$ mkdir directory_name
Transfer update files using ftp to the directory created.
Apply the update by running the updateios command:
$ updateios -dev directory_name -install -accept
Accept to continue installation after preview update is run.
Verify that the update was successful by checking results of the updateios command and
running the ioslevel command. The result of ioslevel should be equal to the level of the
package downloaded:
$ ioslevel
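Condensed, the sequence above might look like the following sketch from the padmin shell (the
directory name update is only an example):
$ mkdir update
(transfer the fix pack files into /home/padmin/update with ftp)
$ updateios -dev /home/padmin/update -install -accept
$ ioslevel                                (should now match the level of the downloaded package)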
6. Type $ ioslevel.
Notes:
If the remote file system is to be mounted read-only, you must first rename the fix pack file
tableofcontents.txt to .toc. Failure to do this will prevent you from being able to install this fix
pack.
Log in to the Virtual I/O Server as user padmin.
Mount the remote directory onto the Virtual I/O Server:
$ mount remote_machine_name:directory /mnt
Apply the update by running the updateios command:
$ updateios -dev /mnt -install -accept
Verify that the update was successful by checking the results of the updateios command and
by running the ioslevel command. The result of ioslevel should be equal to the level of the
package downloaded:
$ ioslevel
5. Type $ ioslevel.
Notes:
This fix pack can be burned onto a CD using the ISO image. After the CD has been
created, the following steps need to be performed to apply the update:
Log in to the Virtual I/O Server as user padmin.
Place the update CD into the drive.
Apply the update by running the updateios command: $ updateios -dev /dev/cdX
-install -accept (where X is the device number, between 0 and N).
Verify that the update was successful by checking the results of the updateios command and
running the ioslevel command. The result of ioslevel should be equal to the level of the
package downloaded.
Refer to the Virtual I/O Server online publications for additional information on the
updateios, ioslevel, and mount commands. Information on these commands can be
obtained from the IBM Information Center.
Notes:
A complete disaster recovery (DR) strategy for the Virtual I/O Server should include
backing up the four areas listed so that the virtual devices and their physical backing
devices can be recovered. Reprovisioning these four areas, if necessary, is followed by the
server backup strategy, which rebuilds the AIX or Linux logical partitions. If you just want
to back up the Virtual I/O Server, then the external device configuration is the area of
most interest.
This information is beyond the scope of this document, but we mention it here to make the
reader aware that a complete DR solution for a physical or virtual server environment has a
dependency on this information. The method used to collect and record the information depends
not only on the vendor and model of the infrastructure systems at the primary site, but also
on what is present at the DR site.
Memory, CPU, virtual devices, and physical devices defined on the HMC
The definition of the Virtual I/O Server logical partition on the HMC includes such things as
how much CPU, memory, and which physical adapters are to be used. In addition to this,
the virtual device configuration (for example, virtual Ethernet adapters and which virtual
LAN ID they belong to) needs to be captured. The backup and restore of this data is
beyond the scope of this document, but more information can be found in the IBM
Information Center under the Backing up partition profile data topic.
Instructor notes:
Purpose
Details
Additional information
Transition statement
To remote file (mksysb image): restore from an AIX NIM server and a standard mksysb system installation
Notes:
You can back up the Virtual I/O Server and user-defined virtual devices using the
backupios command. You can also use IBM Tivoli Storage Manager to schedule backups
and to store backups on another server. Different media can be used for performing a backup:
DVD, tape, or a local or remote file system. The restoration method depends on the method
that was used for the backup.
User-defined virtual devices include metadata, such as virtual device mappings, that define
the relationship between the physical environment and the virtual environment. This data
can be saved to a location that is automatically backed up when you use the backupios
command.
2. Activate each volume group and storage pool that you want to back
up:
$ activatevg <volume_group>
Notes:
In situations where you plan to restore the Virtual I/O Server to a new or different system
(for example, in the event of a system failure or disaster), you need to back up both the
Virtual I/O Server and user-defined virtual devices.
The user-defined virtual devices include metadata, such as the virtual device mappings that
define the relationship between the physical environment and the virtual environment, as well
as any user-defined disk structures. These user-defined disk structures can change over
time if you add more clients or make changes to the storage pool configuration.
In addition to backing up the Virtual I/O Server, you need to back up user-defined virtual
devices in preparation of a system failure or disaster. This can be accomplished using the
savevgstruct command.
lsmap command
The lsmap output does not gather information such as SEA adapter control channels (for
SEA failover), IP addresses to ping, and whether threading is enabled for the SEA devices.
These settings and any other changes that have been made (for example MTU settings)
must be documented separately. It is also vitally important to use the slot numbers as a
reference for the virtual SCSI and virtual Ethernet devices, not the vhost numbers or ent
numbers. The vhost and ent devices are assigned by the Virtual I/O Server as they are
found at boot time. If more devices are added after subsequent boots, they are numbered
sequentially.
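As a practical aid, commands such as the following could be used to capture this additional detail before a backup. This is only a sketch, and the adapter name ent6 is a placeholder for your own SEA device:
$ lsmap -all -net
$ lsdev -dev ent6 -attr
The lsmap -all -net output records the slot numbers and the SEA-to-physical mappings, while the attribute listing records settings, such as the control channel and threading, that are not part of the lsmap output.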
Instructor notes:
Purpose
Details
Additional information
Transition statement
savevgstruct
Structure information goes to /tmp/vgdata
Automatically called by backupios (output is automatically backed up)
$ savevgstruct <volume group or storage pool>
restorevgstruct
Restore the volume group or storage pools structure.
$ restorevgstruct -ls
Usage: restorevgstruct {-ls | -vg VolumeGroupLabel [DiskName ...]}
Restores the user volume group.
-ls Displays a list of saved volume groups.
-vg Specifies the name of the volume group.
DiskName Specifies the names of disk devices to be
used instead of the disk devices listed in
the saved volume group.
Notes:
This is good for disaster recovery or for situations where you are not restoring to the same
system or the same disks. The user-defined volume group disk structures can be backed up
using the savevgstruct command. This command writes a backup of the structure of a
named volume group (and therefore storage pool) to the /tmp/vgdata directory and also
to the /home/ios/vgbackups directory. For example, to back up the structure of the
volgrp01 volume group, we would run the command as follows:
$ savevgstruct volgrp01
Creating information file for volume group volgrp01.
You must run the savevgstruct command for each volume group and storage pool present
on the system, and these must be active. Use the lsvg command to list all of the volume
groups on the system and the activatevg command if necessary.
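As an illustration only (the volume group names below are hypothetical), the sequence could look like the following:
$ lsvg
rootvg
storage01
volgrp01
$ activatevg volgrp01
$ savevgstruct storage01
$ savevgstruct volgrp01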
The savevgstruct command is automatically called before the backup commences for all
active non-rootvg volume groups/storage pools on a Virtual I/O Server when the
backupios command is run. The data (a backup and restore format file) is written
to the /home/ios/vgbackups directory on the Virtual I/O Server. This information can be used
after a Virtual I/O Server restoration to rebuild the non-rootvg structure using the
restorevgstruct command.
In previous versions of the Virtual I/O Server (for example, Version 1.3), the data was not
written correctly to the /home/ios/vgbackups directory but was still written to /tmp.
However, the /tmp file system is not backed up automatically when performing a
backupios. The /tmp directory is excluded by default from an mksysb backup, so you had
to enter the oem_setup_env shell and copy the .data files into the
/home/ios/vgbackups directory. The following command could be used after all
savevgstruct commands had completed:
find /tmp/vgdata -name "*.data" -exec cp {} /home/ios/vgbackups/ \;
This is no longer needed if you are using Virtual I/O Server Version 1.4, 1.5, or Version 2.1.
Tape backup
$ backupios -tape /dev/rmt0
DVD backup
$ backupios -cd /dev/cd0 -udf
mksysb backup
$ backupios -file /mnt/VIOS_BACKUP.mksysb -mksysb
DVD-RAM backup
$ backupios -cd /dev/cd0 -udf -accept
Notes:
backupios command
The backupios command backs up the Virtual I/O Server operating system and, by default,
the user-defined virtual devices. The user-defined virtual devices include the different
user-defined volume groups and storage pools. By default, the backupios command
invokes the savevgstruct command to back up the structure of any online user-defined
volume group or storage pool to the /home/ios/vgbackups directory. If you have set up a
virtual media repository on your Virtual I/O Server, then savevgstruct also backs up
the content of the media repository (the different virtual media). The savevgstruct command
invokes the AIX savevg command.
Backing up on tape
The result of running the backupios command with the -tape flag is shown.
Tape backup
$ backupios -tape /dev/rmt0
Creating information file (/image.data) for rootvg ..
Creating tape boot image .....
Creating list of files to back up.
Backing up 23622 files .
23622 of 23622 files (100%)
0512-038 mksysb: Backup Completed Successfully.
bosboot: Boot image is 26916 512 byte blocks.
bosboot: Boot image is 26916 512 byte blocks.
The result of this command is a bootable tape that allows an easy restore of the Virtual I/O
Server.
Backing up on DVD
There are two types of DVD media that can be used for backing up: DVD-RAM and DVD-R.
DVD-RAM media can support both -cdformat and -udf format, while DVD-R media only
supports the -cdformat. The DVD device cannot be virtualized and assigned to a client
partition when performing backups. Remove the device from the client and the virtual SCSI
mapping from the server before proceeding with the backup.
$ backupios -cd cd0 -udf
Creating information file for volume group data pool
Creating list of files to back up. Backing up six files
6 of 6 files (100%)
0512-038 savevg: Backup Completed Successfully.
Backup in progress. This command can take a considerable amount of time
to complete, please be patient
Initializing mkcd log: /var/adm/ras/mkcd.log ... Verifying command
parameters
Creating image.data file ...
Creating temporary file system: /mkcd/mksysb_image Creating mksysb image
Creating list of files to back up.
Backing up 27129 files
27129 of 27129 files (100%)
0512-038 mksysb: Backup Completed Successfully. Populating the CD or DVD
file system
Copying backup to the CD or DVD file system
Building chrp boot image
Removing temporary file system: /mkcd/mksysb_image
Once again you use the backupios command, but the big difference here is that all of the
previous commands resulted in some form of bootable media that can be used to directly
recover the Virtual I/O Server. This command results in either a TAR file, which contains all
of the information needed for a restore, or a mksysb image, but both methods depend on
an installation server for restoration. The restoration server can be an HMC using the
Network Installation Manager on Linux facility and the installios command. Alternatively,
we would use an AIX Network Installation Management (NIM) server and a standard
mksysb system install. Both of these methods are covered later in the restore section.
If you are using the NIM server for the install, it must be running a level of AIX that can
support the Virtual I/O Server install. For this reason the NIM server should be running the
very latest technology level and service packs at all times.
You can use the backupios command to write to a local file on the Virtual I/O Server, but
the more common scenario would be to perform a backup to a remote NFS file system,
ideally on the NIM server that will act as the restore server. In the following example,
the NIM server has a host name of SERVER5 and the Virtual I/O Server is LPAR01.
The first step is to set up the NFS file system export on the NIM server. Here, we are going
to export a file system called /export/ios_backup, and in this case, the
/etc/exports looks similar to the following:
#more /etc/exports
/export/ios_backup
-sec=sys:krb5p:krb5i:krb5:dh,rw=lpar01.ilsvpn.atlanta.ibm.com,root=lpar0
1.ilsvpn.atlanta.ibm.com
#
The NFS server must have the root access (NFS attribute) set on the file system exported
to the Virtual I/O Server logical partition for the backup to succeed.
Make sure the name resolution is functioning between the NIM server and the Virtual I/O
Server for both IP and host name. To edit the name resolution on the Virtual I/O Server, use
the hostmap command to manipulate the /etc/hosts file or the cfgnamesrv command to
change the DNS parameters.
Examples
hostmap -addr 192.100.201.7 -host alpha bravo charlie
The IP address 192.100.201.7 is specified as the address of the host that has a primary
host name of alpha with synonyms of bravo and charlie.
To add a domain entry with a domain name of abc.aus.century.com, type:
cfgnamesrv -add -dname abc.aus.century.com
To add a name server entry with IP address 192.9.201.1, type:
cfgnamesrv -add -ipaddr 192.9.201.1
The backup of the Virtual I/O Server can be fairly large, so make sure that the system
limits allow the creation of large enough files. With the NFS export and name resolution
set up, the file system needs to be mounted on the Virtual I/O Server.
$ mount server5:/export/ios_backup /mnt
$ mount
node mounted over vfs date options
-------- --------------- --------------- ------ ------------
/dev/hd4 / jfs2 Jun 27 10:48 rw,log=/dev/hd8
/dev/hd2 /usr jfs2 Jun 27 10:48 rw,log=/dev/hd8
/dev/hd9var /var jfs2 Jun 27 10:48 rw,log=/dev/hd8
/dev/hd3 /tmp jfs2 Jun 27 10:48 rw,log=/dev/hd8
/dev/hd1 /home jfs2 Jun 27 10:48 rw,log=/dev/hd8
/proc /proc procfs Jun 27 10:48 rw
/dev/hd10opt /opt jfs2 Jun 27 10:48 rw,log=/dev/hd8
server5.itsc.austin.ibm.com /export/ios_backup /mnt nfs3 Jun 27 10:57
$ backupios -file /mnt
Creating information file for volume group storage01.
Creating information file for volume group volgrp01.
Backup in progress. This command can take a considerable amount of time
to complete, please be patient...
$
The command above creates a full backup TAR file package, including all of the resources
(mksysb, bosinst.data, network boot image, and SPOT) that the installios command needs
to install a Virtual I/O Server from an HMC. We cover the restoration methods later in this
unit, but it is also possible to create just the mksysb backup of the Virtual I/O Server,
as follows. At the current time, the NIM server supports only the mksysb restoration method.
The mksysb backup of the Virtual I/O Server can be extracted from the TAR file created in a
full backup, so either method is appropriate if the restoration method is to use a NIM server.
$ backupios -file /mnt/VIOS_BACKUP_27Jun2008_1205.mksysb -mksysb
/mnt/VIOS_BACKUP_27Jun2008_1205.mksysb doesn't exist.
Creating /mnt/VIOS_BACKUP_27Jun2008_1205.mksysb
Creating information file for volume group storage01.
Creating information file for volume group volgrp01.
Backup in progress. This command can take a considerable amount of time
to complete, please be patient...
Creating information file (/image.data) for rootvg.
Creating list of files to back up...
Backing up 45016 files...........................
45016 of 45016 files (100%)
0512-038 savevg: Backup Completed Successfully.
Both of these methods create a backup of the virtual I/O operating system, which we can
use to recover the Virtual I/O Server using either an HMC or a NIM server.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
functionality could include restoring configurations on external devices to which the VIO
Server communicates.
Notes:
Restoring the Virtual I/O Server from CD or DVD backup
The backup procedure creates bootable media, which we can use to restore as
stand-alone backups. Insert the first disk from the set of backups into the optical drive and
boot the machine into SMS mode. Select to install from the optical drive and work through
the usual installation procedure. If the CD or DVD backup spanned multiple disks, then
during the install you are prompted to insert the next disk in the set with a similar message
to the following:
Please remove volume 1, insert volume 2, and press the Enter key.
For more details, see the system backup and restore information in the IBM Information Center.
Restoring the Virtual I/O Server from tape backup
The procedure for the tape is very similar to the CD or DVD, as this is bootable media: just
place the backup media into the tape drive and follow the boot into SMS mode. Select to
install from the tape drive and follow the same procedure as shown in the visual.
Uempty "/etc/syslog.conf"
Shutting down syslog services
done
Starting syslog services
done
nimol_config MESSAGE: Executed /usr/sbin/nimol_bootreplyd -l -d -f
/etc/nimoltab -s server1.itsc.austin.ibm.com.
nimol_config MESSAGE: Successfully configured NIMOL.
nimol_config MESSAGE: target directory: /info/default5
nimol_config MESSAGE: Executed /usr/sbin/iptables -I INPUT 1 -s server5 -j
ACCEPT.
nimol_config MESSAGE: source directory: /mnt/nimol
nimol_config MESSAGE: Checking /mnt/nimol/nim_resources.tar for existing
resources.
nimol_config MESSAGE: Executed /usr/sbin/iptables -D INPUT -s server5 -j
ACCEPT.
nimol_config MESSAGE: Added "/info/default5 *(rw,insecure,no_root_squash)"
to the file "/etc/exports"
nimol_config MESSAGE: Successfully created "default5".
nimol_install MESSAGE: The hostname "lpar11.ilsvpn.atlanta.ibm.com" will be
used.
"/etc/nimol.conf"
nimol_install MESSAGE: Added
menu interface.
if_en: ns_alloc(en0) failed with errno = 19
if_en: ns_alloc(en0) failed with errno = 19
Method error (/usr/lib/methods/chgif):
0514-068 Cause not known.
0821-510 ifconfig: error calling entry point for /usr/lib/drivers/if_en: The
specified device
does not exist.
0821-103 : The command /usr/sbin/ifconfig en0 inet 10.31.182.163 arp netmask
255.255.255.0 mtu 1500 up failed.
0821-007 cfgif: ifconfig command failed.
The status of "en0" Interface in the current running system is uncertain.
0821-103 : The command /usr/lib/methods/cfgif -len0 failed.
0821-510 ifconfig: error calling entry point for /usr/lib/drivers/if_en: The
specified device does not exist.
0821-103 : The command /usr/sbin/ifconfig en0 inet 10.31.182.163 arp netmask
255.255.255.0 mtu 1500 up failed.
0821-229 chgif: ifconfig command failed.
The status of "en0" Interface in the current running system is uncertain.
mktcpip: Problem with command: chdev , return code = 1
if_en: ns_alloc(en0) failed with errno = 19
installp Flags
COMMIT software updates? [yes] +
SAVE replaced files? [no] +
AUTOMATICALLY install requisite software? [yes] +
EXTEND filesystems if space needed? [yes] +
OVERWRITE same or newer versions? [no] +
VERIFY install and check file sizes? [no] +
ACCEPT new license agreements? [no] +
(AIX V5 and higher machines and resources)
Preview new LICENSE agreements? [no] +
Notes:
With the SPOT and the mksysb image defined to NIM, we can now install the Virtual I/O
Server from the backup. If the machine or LPAR we are to install is not defined in NIM,
create a NIM machine object to identify it. Then use smitty nim_bosinst fastpath to enable
the base operating system installation process.
Note that the Remain NIM client after install field here is set to no. If this is not set to no,
then the last step for the NIM install is to configure an IP address onto the physical adapter
used to install the Virtual I/O Server from the NIM server. If this is the adapter used by the
shared Ethernet adapter, it will cause some error messages similar to those shown below.
If this is the case, reboot the Virtual I/O Server, log on to the Virtual I/O Server through the
terminal, and remove the IP address information and SEA adapter and recreate them:
inet0 changed
if_en: ns_alloc(en0) failed with errno = 19
if_en: ns_alloc(en0) failed with errno = 19
Method error (/usr/lib/methods/chgif):
0514-068 Cause not known.
0821-510 ifconfig: error calling entry point for /usr/lib/drivers/if_en:
The
specified device does not exist.
0821-103: The command /usr/sbin/ifconfig en0 inet 10.31.182.163 arp
netmask
255.255.255.0 mtu 1500 up failed.
0821-007 cfgif: ifconfig command failed.
The status of "en0" Interface in the current running system is uncertain.
0821-103 : The command /usr/lib/methods/cfgif -len0 failed.
0821-510 ifconfig: error calling entry point for /usr/lib/drivers/if_en:
The specified device does not exist.
0821-103 : The command /usr/sbin/ifconfig en0 inet 10.31.182.163 arp
netmask
255.255.255.0 mtu 1500 up failed.
0821-229 chgif: ifconfig command failed.
The status of "en0" Interface in the current running system is uncertain.
mktcpip: Problem with command: chdev
, return code = 1
if_en: ns_alloc(en0) failed with errno = 19
if_en: ns_alloc(en0) failed with errno = 19
if_en: ns_alloc(en0) failed with errno = 19
if_en: ns_alloc(en0) failed with errno = 19
Now that we have set up the NIM server to push out the backup image, the Virtual I/O
Server LPAR needs to have the remote IPL setup completed; the procedure for this can be
found in the AIX Installation in a Partitioned Environment guide found in the Infocenter at:
http://publib16.boulder.ibm.com/pseries/index.htm
The install of the Virtual I/O Server should complete, but in this case there is a big
difference between restoring to the existing server or restoring to a new disaster recovery
server. One of the NIM install options is to recover devices. With this option, any virtual
devices that were created on a server will be recreated exactly as they were, providing the
restoration occurs to the same server. This means that virtual target SCSI devices and
shared Ethernet adapters should all be recovered without any need to recreate them.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Using cron
Available in Virtual I/O Server 1.3 (and above) to schedule
tasks
crontab -e
/usr/ios/cli/ioscli
Example:
30 21 * * * /usr/ios/cli/ioscli mount
eservernim:/export/nim/mksysb /mnt
31 21 * * * /usr/ios/cli/ioscli backupios -file /mnt/vios_backup -mksysb
Notes:
The cron function was introduced to the padmin shell in Virtual I/O Server Version 1.3.
However, many commands fail when executed from within a cron job. Failing commands
include mount and backupios, even though they work fine when executed from the CLI.
/home/padmin/.profile is an important part of the CLI, because it changes the path
and aliases many commands. The commands fail because cron does not read the user's
.profile.
You can successfully execute the commands from within a cron job by calling them through
the full path to the ioscli command, /usr/ios/cli/ioscli.
With Virtual I/O Server Version 1.3 and later, the crontab command is available to allow
you to submit, edit, list, or remove cron jobs. A cron job is a command run by the cron
daemon at regularly scheduled intervals, such as system tasks, nightly security checks,
analysis reports, and backups.
With the Virtual I/O Server, a cron job can be submitted by specifying the crontab
command with the -e flag. The crontab command invokes an editing session that allows
you to modify the padmin user's crontab file and create entries for each cron job in this file.
When you finish creating entries and exit the file, the crontab command copies it into the
/var/spool/cron/crontabs directory and places it in the padmin file.
When scheduling jobs, use the padmin user's crontab file. The creation or editing of other
users' crontab files is not supported.
The following syntax is available to be used by the crontab command:
crontab [ -e padmin | -l padmin | -r padmin | -v padmin ]
-e padmin: Edits a copy of the padmin's crontab file. When editing is complete, the file
is copied into the crontab directory as the padmin's crontab file.
-l padmin: Lists the padmin's crontab file.
-r padmin: Removes the padmin's crontab file from the crontab directory.
-v padmin: Lists the status of the padmin's cron jobs.
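As a usage sketch (the NFS server name, directory, and schedule below are examples only), a nightly backup could be scheduled by adding entries similar to the following with crontab -e and then verified with crontab -l padmin:
$ crontab -e
0 22 * * * /usr/ios/cli/ioscli mount nimserver:/export/vios_backup /mnt
5 22 * * * /usr/ios/cli/ioscli backupios -file /mnt/vios_backup.mksysb -mksysb
$ crontab -l padmin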
Notes:
Since the Virtual I/O Server Version 1.5, it is possible to create a single container to store
and manage file-backed virtual media files. This container is named the virtual media
repository. You can have only one virtual media repository per Virtual I/O Server.
You can create virtual media files and store them in this repository. The stored media can
be loaded into file-backed virtual optical devices for export to client logical partitions.
The slide describes how to create blank virtual optical media that can be used to
back up a client LPAR's mksysb image. From the client logical partition, you can use the
mkdvd or mkcd command to create a system backup image (mksysb) on a DVD-RAM
backed by a file.
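A hedged sketch of the Virtual I/O Server commands involved in preparing such media follows; the storage pool, sizes, and device names (rootvg, vhost0, vtopt0, client1_mksysb) are examples only:
$ mkrep -sp rootvg -size 10G
$ mkvopt -name client1_mksysb -size 6G
$ mkvdev -fbo -vadapter vhost0
$ loadopt -disk client1_mksysb -vtd vtopt0
The mkrep command creates the virtual media repository, mkvopt creates a blank file-backed virtual optical disk, mkvdev -fbo creates the file-backed virtual optical device on the chosen virtual SCSI server adapter, and loadopt loads the blank media into that device.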
Restoring your bootable image from a file-backed virtual optical device
First, you need to load the virtual DVD containing the bootable image into the virtual optical
device (the same action you would perform with a physical DVD drive) using the
loadopt command.
Boot your logical partition in SMS mode and start restoring the image.
Notes:
Increasing the size of the backing device at the Virtual I/O Server (before Version 1.3)
If you have Virtual I/O Server Version 1.2 (pre-VIOS 1.3), you can increase the size of the
associated logical volume in a non-root volume group, but you must first vary off the volume
group so that you can unconfigure and reconfigure the associated virtual target device. When
varying on the non-root volume group on the client partition, you must run the chvg command
to activate the newly added space.
The chvg command can be used to set the characteristics of a volume group. The -g parameter
examines all of the disks in the volume group to see whether they have grown in size. If any
disks have grown, chvg tries to add additional physical partitions to the physical volumes.
Increasing the size of the backing device at the Virtual I/O Server (VIOS 1.3 and later)
With Virtual I/O Server Version 1.3 (and later), changing the size of a backing device
logical volume or file is non-disruptive. After you change its size, you must run chvg
-g <volume group name> at the client partition.
Instructor notes:
Purpose Show how to manage virtual SCSI disk.
Details Increasing non rootvg volume groups in the client partition is dynamic. A chvg
command has to be performed on the client lpar to recognize the new virtual SCSI disk
size. VIOS 1.5 is required for exporting files as virtual SCSI disk devices.
Additional information
Transition statement
Server configuration
Availability priority update
Notes:
Firmware power saver mode capability: This capability indicates the firmware loaded on
the server is capable of performing Power Saver mode functions, but does not
necessarily imply the same for the underlying hardware.
Hardware power saver mode capability: This capability indicates whether the server
hardware supports the power saver mode function. For example, POWER6
processor-based systems with a nominal operating frequency of < 4.0 GHz do not
support power saver functions at the hardware level, even if the installed firmware does.
Diagnostic Routines
  This selection will test the machine hardware. Wrap plugs and other advanced functions will not be used.
Advanced Diagnostics Routines
  This selection will test the machine hardware. Wrap plugs and other advanced functions will be used.
Task Selection (Diagnostics, Advanced Diagnostics, Service Aids, etc.)
  This selection will list the tasks supported by these procedures. Once a task is selected, a resource menu may be presented showing all resources supported by the task.
Resource Selection
  This selection will list the resources in the system that are supported by these procedures. Once a resource is selected, a task menu will be presented showing all tasks that can be run on the resource(s).

TASKS SELECTION LIST 801004
From the list below, select a task by moving the cursor to the task and pressing 'Enter'.
To list the resources for the task highlighted, press 'List'.
[MORE...20]
Display Resource Attributes
Display Service Hints
Display Software Product Data
Display or Change Bootlist
Format Media
Gather System Information
Hot Plug Task
Identify and Attention Indicators
Local Area Network Analyzer
Log Repair Action
Microcode Tasks
RAID Array Manager
[MORE...4]
F1=Help F10=Exit F3=Previous Menu
Notes:
The Hot Plug Tasks selection is under the Task Selection option of the menu. Under this
menu selection, the choice of PCI hot plug tasks, RAID hot plug devices, and the SCSI and
SCSI RAID hot plug manager are presented.
The PCI Hot Plug Manager menu is used for adding, identifying, or replacing PCI adapters
in the system that are currently assigned to the VIOS. The RAID hot plug devices option is
used for adding RAID enclosures that are connected to a SCSI RAID adapter. The SCSI
and SCSI RAID manager menu is used for disk drive addition or replacement and SCSI
RAID configuration.
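On the Virtual I/O Server, these menus are reached from the restricted shell; a minimal sketch of the navigation is:
$ diagmenu
Then select Task Selection > Hot Plug Task, and choose the PCI Hot Plug Manager or the SCSI and SCSI RAID Hot Plug Manager as appropriate.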
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement Let's review the key concepts of this unit.
Checkpoint
1. Which command can be used to display the code version of
the HMC?
Notes:
Instructor notes:
Purpose
Details
Checkpoint solutions
1. Which command can be used to display the code version of the HMC?
The answer is lshmc -V.
2. Your currently installed firmware level is EL320_59_031, and the new service pack
is EL320_061_031. Is this disruptive?
The answer is no.
3. In the service partition, what command is used to apply the updated system
firmware?
The answer is update_flash.
4. List the commands that can be used to back up and restore the volume group data
structures.
The answers are savevgstruct and restorevgstruct.
5. If you resize a backing device at the VIO Server version 1.3 or later, what
command must be executed at the client partition to use the additional space?
The answer is chvg -g.
Additional information
Transition statement
Exercise
Unit exercise
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Unit summary
Having completed this unit, you should be able to:
Manage PowerVM system firmware
Update the Virtual I/O Server software
Single and dual VIO configurations
updateios command
Back up the Virtual I/O Server
backupios command (file, tape, CD, and DVD)
Restore the Virtual I/O Server
Backup tape, DVD, and tar file
Add disk space to a vSCSI client partition
Back up client partitions operating system to virtual DVD
Change partition availability priority
Activate the power saver mode
Manage hot-pluggable devices in the Virtual I/O Server
diagmenu command: VIOS diagnostic menu
Add hot-swap SCSI disk
Notes:
Instructor notes:
Purpose
Details
Additional information
Transition statement
Estimated time
01:00
References
SG24-7590-01 IBM PowerVM Virtualization Managing and Monitoring
redbook
Unit 10. Virtualization management tools
Unit objectives
After completing this unit, you should be able to:
Describe standard AIX/virtualization monitoring tools
Identify freeware monitoring tools
Describe virtualization management and monitoring tools, such
as
IBM Systems Director
IBM Tivoli Monitoring
Notes:
Notes:
Utilization data management application
HMCs can collect utilization data for any managed system. To collect this information, you
need to select the managed system you want to monitor, and then change the utilization
frequency for retrieving data. This can be every 30 seconds, every 60 seconds, every 5 minutes,
every 30 minutes, or every hour. By default, the utilization retrieval rate is set to 0 and the
recording is set to disabled.
Utilization data is collected into records called utilization events, which include information
about the states of the HMC, partitions, and managed systems and the utilization of processor
and memory resources. Events can be viewed at periodic intervals (hourly, daily, monthly,
and snapshot) by selecting Operations > Utilization Data > View.
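The same settings can also be managed from the HMC command line with the chlparutil and lslparutil commands; this is only a sketch, and the managed system name sys044 is a placeholder:
chlparutil -r config -s 300 -m sys044
lslparutil -r lpar -m sys044 -n 5
The first command sets the sampling rate to 300 seconds for the managed system, and the second lists the five most recent partition-level utilization events.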
Periodic intervals and maximum number of events
Hourly: Hourly periodic utilization events, system-level state and configuration change
utilization events, partition-level state and configuration change utilization events, and
HMC start, shutdown, and time-change utilization events
Instructor notes:
Purpose
Details
Additional information
Transition statement
LPARs utilization percentage
Notes:
Logical partition utilization table
This table displays the utilization data captured for each logical partition on the managed
system at the indicated date and time. Each line contains the information for a single logical
partition on the managed system.
The information contained in each column is as follows:
Partition (ID): Displays the name of each logical partition and the ID number.
Processor mode: Displays the processor mode of each logical partition. Valid values
are Dedicated or Shared.
Processing units: Displays the number of processing units committed to each logical
partition.
Current processors: Displays the number of dedicated processors committed to each
logical partition.
Utilized processing units: Displays the number of processing units that were utilized
by the logical partition since the previous sample. The utilization expressed as a
percentage of the number of processing units assigned to the logical partition is also
displayed. The utilization can be greater than the number of processing units assigned
to the logical partition (or 100%) if the sharing mode of the logical partition is uncapped.
This information is not available for logical partitions that use dedicated processors.
Physical processor pool utilization
Processing units: Displays the number of processing units in all shared processor
pools that were configurable for partition usage. This number includes processing units
that were assigned to all partitions in shared processor pools.
Processor utilization: Displays the number of processing units that were utilized by all
partitions in shared processor pools since the previous sample. The utilization
expressed as a percentage of the number of configurable processing units in all shared
processor pools is also displayed. The utilization can be greater than the number of
configurable processing units in all shared processor pools (or 100%) if shared
processor partitions are using processing cycles that belong to dedicated processor
partitions.
SharedPool01 utilization percentage
Shared memory pool utilization
Notes:
Shared processor pool utilization
This table displays the utilization data captured for each configured shared processor pool
on the managed system at the indicated date and time. Each row in the table contains the
information for a single configured shared processor pool on the managed system. The
information contained in each column is as follows:
Shared processor pool (ID): Displays the name and ID of the shared processor pool.
Processing units: Displays the number of processing units assigned to each
configured shared processor pool on the managed system.
Processor utilization: Displays the percent of entitled processing time that the logical
partitions using each configured shared processor pool have used since the last time
that the managed system was powered on or restarted. This is the utilized pool cycles
time divided by the total pool cycles time, and this is expressed as a percent value. This
figure can be greater than 100% if the sharing mode of the logical partition is uncapped
and the logical partition is using processing time that belongs to dedicated logical
partitions or logical partitions that use other shared processor pools.
Shared memory pool utilization table
The information displayed here allows you to see the utilization of the shared memory pool
at the indicated date and time.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
POWER6 provides a Scaled Processor Utilization Resource Register (SPURR).
Measurement of processor time is dynamically scaled, based on the current degree of
throttling or frequency slewing. AIX 5.3 TL7 and AIX 6.1 support accurate process
accounting based on the SPURR in the face of processor throttling or TPMD-induced
processor frequency slewing.
The POWER5/POWER6 family of processors implements a performance-specific register
called the Processor Utilization Resource Register (PURR). The PURR tracks the real
processor resource usage on a per-thread or per-partition level. The AIX 5L performance
tools have been updated in AIX 5L V5.3 to reflect the new statistics.
The PURR is simply a 64-bit counter, with the same units as the timebase and decrementer
registers, that provides per-thread processor utilization statistics. Each POWER5/POWER6
processor (core) has two hardware threads associated with it. With SMT enabled, each hardware
thread is seen as a logical processor.
The timebase register is simply a hardware register that is incremented at each tic. The
decrementer register provides periodic interrupts. A simple way to look at it would be to
consider that at each processor clock cycle, one of the PURRs is incremented. It will be for
the thread dispatching instructions or the thread that last dispatched an instruction.
The sum of the two PURRs equals the value in the timebase register. This approach is an
approximation, as SMT allows both threads to run in parallel. It simply provides a
reasonable indication of which thread is making use of the POWER5 resources.
The AIX tools that provide system wide information, such as the iostat, vmstat, sar, and
time commands, use the PURR-based statistics whenever SMT is enabled for the %user,
%system, %iowait, and %idle figures.
When executing on a shared-processor partition, these commands add two extra columns
of information with:
Physical processor consumed by the partition, shown as pc or %physc.
Percentage of entitled capacity consumed by the partition, shown as ec or %entc.
# iostat -t 2 4
System configuration: lcpu=2 ent=0.50
tty: tin tout avg-cpu: % user % sys % idle % iowait physc % entc
0.0 19.3 8.4 77.6 14.0 0.1 0.5 99.9
0.0 83.2 9.9 75.8 14.2 0.1 0.5 99.5
0.0 41.1 9.5 76.4 13.9 0.1 0.5 99.6
0.0 41.0 9.4 76.4 14.1 0.0 0.5 99.7
# sar -P ALL 2 2
AIX vio_client2 3 5 00CC489E4C00 08/17/05
System configuration: lcpu=2 ent=0.50
20:13:48 cpu %usr %sys %wio %idle physc %entc
20:13:50 0 19 71 0 9 0.31 61.1
1 2 75 0 23 0.19 38.7
- 13 73 0 15 0.50 99.8
20:13:52 0 21 69 0 9 0.31 61.1
1 2 75 0 23 0.20 39.0
- 14 71 0 15 0.50 100.2
Average 0 20 70 0 9 0.31 61.1
1 2 75 0 23 0.19 38.9
- 13 72 0 15 0.50 100.0
A cross-partition view of system resources is available with the topas -C command. At this
time, this command will only see partitions running AIX 5L V5.3 TL3 or later; the Virtual I/O
Server, at version 1.3 or later, is also supported.
The topas command also has a new -D switch or D command to show disk statistics that
take virtual SCSI disks into account.
The mpstat command collects and displays performance statistics for all logical CPUs in a
partition. When the mpstat command is invoked, it displays two sections of statistics. The
first section displays the system configuration, which is shown when the command
starts and whenever there is a change in the system configuration. The second section
displays the utilization statistics, which are shown at each interval; the values of these
metrics are deltas from the previous interval.
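For example, to display utilization statistics for all logical CPUs every two seconds for three intervals, and then the SMT utilization view, invocations such as the following could be used (the output values depend on the system):
# mpstat 2 3
# mpstat -s 2 3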
Notes:
topasrec
topasrec is a tool, available since October 2008, that generates binary recordings of the
local system metrics and of the CEC metrics. When you run the topasrec command for a
CEC recording, it collects a set of metrics from the AIX partitions
running on the same CEC. The topasrec command collects dedicated and shared partition
data, and a set of aggregated values to provide an overview of the partition set on the
same CEC.
Start recording
To start a continuous local binary recording using the default output file:
# topasrec -L
The performance data is logged to:
/usr/lpp/perfagent/<hostname>_<date>_<time>.topas
To start a CEC binary recording using the default output file:
# topasrec -C
The performance data is logged to:
/usr/lpp/perfagent/<hostname>_<date>_<time>.topas
Stop recording
It is recommended that you stop topas recordings using the smitty Stop_Recording fast path. The
listtrec command is run under the covers when stopping a topas recording.
Summary report of a CEC recording in binary format
# topasout -R summary /usr/lpp/perfagent/sys044_vios1_cec_091006_1324.topas
#Report: CEC Summary --- hostname: sys044_vios1 version:1.2
Start:10/06/09 13:24:12 Stop:10/06/09 13:38:12 Int: 5 Min Range: 14 Min
Partition Mon: 6 UnM: 0 Shr: 6 Ded: 0 Cap: 6 UnC: 0
--CEC-------------- -Processors----------------- --Memory
(GB)------------
Time ShrB DedB Don Stl Mon UnM Shr Ded PSz APP Mon UnM Avl UnA InU
13:29 0.0 0.0 - - 1.3 0.0 1.3 0 8.0 8.0 6.6 0.0 0.0 0.0 0.0
13:34 0.0 0.0 - - 1.5 0.0 1.5 0 8.0 8.0 7.0 0.0 0.0 0.0 0.0
13:38 0.0 0.0 - - 1.5 0.0 1.5 0 8.0 8.0 7.0 0.0 0.0 0.0 0.0
Detailed report
# topasout -R detailed /usr/lpp/perfagent/sys044_vios1_cec_091006_1324.topas
#Report: CEC Detailed --- hostname: sys044_vios1 version:1.2
Start:10/06/09 13:24:12 Stop:10/06/09 13:28:12 Int: 5 Min Range: 4 Min
Time: 13:28:12
-----------------------------------------------------------------
Partition Info Memory (GB) Processors Avail Pool : 8.0
Monitored : 6 Monitored : 6.6 Monitored : 1.3 Shr Physcl Busy: 0.03
UnMonitored: 0 UnMonitored: 0.0 UnMonitored: 0.0 Ded Physcl Busy: 0.00
Shared : 6 Available : 0.0 Available : 0.0 Donated Phys. CPUs: 0.00
UnCapped : 0 UnAllocated: 0.0 Unallocated: 0.0 Stolen Phys. CPUs :
0.00
Capped : 6 Consumed : 0.0 Shared : 1.3 Hypervisor
Dedicated : 0 Dedicated : 0.0 Virt Cntxt Swtch:3205358
Donating : 0 Donated : 0 Phantom Intrpt : 4573
Pool Size : 8.0
Host OS M Mem InU Lp Us Sy Wa Id PhysB Vcsw Ent %EntC PhI
-------------------------------------shared--------------------------------
sys044_lpar1 A61 C 1.0 0.7 2 0 0 0 99 0.00 194 0.3 0.87 0
sys044_lpar2 A61 C 1.0 0.7 2 0 0 0 99 0.01 387899 0.3 1.69 549
sys044_vios1 A61 C 1.0 0.6 2 0 4 0 94 0.01 2816378 0.1 7.98 4023
sys044_vios4 A61 C 1.0 0.6 2 0 1 0 79 0.00 155 0.1 2.15 0
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
New smit panels have been introduced in AIX to operate on topas, to start and stop
recordings.
Persistent recording
Persistent recordings are those recordings that are started from SMIT with the option to
specify the cut and retention. You can specify the number of days of recording to be stored
per recording file (cut) and the number of days of recording to be retained (retention) before
it can be deleted. Not more than one instance of persistent recording of the same type
(CEC or local recording) can be run in a system. When a persistent recording is started, the
recording command will be invoked with user-specified options. The same set of command
line options used by this persistent recording will be added to inittab entries. This will
ensure that the recording is started automatically on reboot or restart of the system.
By default, a local persistent recording is already running on each AIX operating system.
The default persistent recording is based on a daily recording in /etc/perf/daily
directory. You can start a persistent local recording either in binary or nmon type.
If you started a local recording, the possible reporting format will be the following:
Comma_separated
Spreadsheet
Detailed
Summary
Disk_summary
Network_summary
nmon
Adapter
Virtual_Adapter
Detailed
Notes:
Here is a topasout report (summary and detailed) of a topas CEC (cross partitions)
recording in binary format.
The first output report is an extract of a summary report from a CEC recording file, and the
second output report is an extract of a detailed report from a CEC recording file.
Notes:
topas_nmon
The classic nmon freeware tool has been assimilated into topas and is now fully
supported by IBM. Like topas, nmon is a cursor-based tool for system performance
monitoring and also has recording capabilities. nmon is now part of AIX by default and
can be started using the standard nmon command or topas_nmon.
Unlike topas, to start recording local data in nmon format, use the smitty
Start_Recording_Topas fast path menu. The output report file generated can be used with
nmon analyzer to create graphic views of recorded data.
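If you prefer to start an nmon-format recording directly from the command line rather than through SMIT, an invocation such as the following is commonly used (the interval and snapshot count are examples):
# nmon -f -s 60 -c 1440
This records to a file, taking a snapshot every 60 seconds for 1440 snapshots (24 hours); the resulting file can then be fed to the nmon analyzer.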
nmon consolidator
You need to use the nmon consolidator if you want to get a report for the entire machine.
Notes:
nmon analyzer is an Excel spreadsheet that takes an output file from nmon or
topas_nmon and produces some nice graphs to aid in analysis and report writing. It also
performs some additional analyses for ESS, EMC, and FAStT subsystems. It requires
Excel 2002 or later.
Using NMON_analyser
FTP the input file to your PC, ideally using the ASCII or TEXT options to make sure that
lines are terminated with the CRLF characters required by Windows applications.
Open the NMON_analyser spreadsheet and specify the options you want on the Analyser
and Settings sheets (see below). Save the spreadsheet if you want to make these options
your personal defaults.
Click Analyse nmon data and find and select the .nmon files to be processed. You can
select several files in the same directory. If you wish to process several files in different
directories you might wish to consider using the FILELIST option described below.
You might see the message SORT command failed for filename if the file has >65K lines
and the filename (or directory name) contains blanks or special characters. Either rename
the file or directory or just pre-sort the file before using the Analyzer.
Using NMON_consolidator
NMON_consolidator reads in up to 255 nmon or topasout files to produce a consolidated
set of data in the form of an Excel spreadsheet (requires Excel 2002 or later).
A separate sheet is generated for each major performance statistic; graphs are
automatically generated showing summary data for each server and for the installation as a
whole. The tool allows nodes to be grouped together and will automatically calculate group
totals and group averages for each statistic. Because the graphs are pre-defined, the user
is free to edit the titles, colors, and fonts to suit their own requirements and can simply
delete unwanted charts or entire sheets to reduce the amount of output.
Administrators who tend partitioned servers will find this tool provides the ability to get an
overview of an entire machine at-a-glance and provides the opportunity for modelling
different partitioning scenarios (for example, moving dedicated partitions into the shared
pool).
nmon_consolidator can be obtained from
http://www.ibm.com/developerworks/wikis/display/WikiPtype/nmonconsolidator
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Virtual I/O Server 2.1 monitoring commands
The vmstat, fcstat, and svmon commands are standard AIX commands that report
performance statistics. The fcstat command displays the statistics gathered by the
specified Fibre Channel device driver. The svmon command captures and analyzes a
snapshot of virtual memory.
wkldout: The wkldout command processes the data produced by running the
Workload Manger Agent. The Workload Manager Agent writes data files to the
/home/ios/perf/wlm directory.
topas: The topas command reports selected statistics about the activity on the local
system. To get a cross-partition view, the -cecdisp option can be used.
viostat: The viostat command reports CPU statistics, asynchronous input/output
(AIO), and input/output statistics for the entire system, adapters, tty devices, disks, and
CD-ROMs.
seastat: The seastat command generates a report of shared Ethernet adapter statistics
on a per-client basis. To gather network statistics at a per-client level, advanced
accounting can be enabled on the shared Ethernet adapter to provide more information
about its network traffic. To enable per-client statistics, the VIOS administrator can set the
shared Ethernet adapter accounting attribute to enabled. The default value is disabled.
When advanced accounting is enabled, the shared Ethernet adapter keeps track of the
hardware (MAC) addresses of all of the packets it receives from the LPAR clients, and
increments packet and byte counts for each client independently. After advanced
accounting is enabled on the shared Ethernet adapter, the VIOS administrator can
generate a report to view per-client statistics by running the seastat command.
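A minimal sketch of this sequence, assuming a hypothetical shared Ethernet adapter device ent5:
   chdev -dev ent5 -attr accounting=enabled
   seastat -d ent5
The first command turns on advanced accounting for the SEA; the second prints the per-client statistics report.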
Notes:
This menu can also be retrieved using the topas -C command (instead of the topas
-cecdisp command) from any logical partition in the managed system.
In this example, we have one Virtual I/O Server and two virtual I/O client partitions.
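For instance (a sketch; both forms provide the same cross-partition view of the managed system):
   topas -C          (from an AIX partition)
   topas -cecdisp    (from the Virtual I/O Server)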
Notes:
From the previous topas menu, you can zoom in on a specific Virtual I/O Server and its
clients' configuration and throughput. This can be accomplished by selecting a Virtual I/O
Server using the arrow keys and then pressing d to get the detailed monitoring.
Total throughput = KB-In + KB-Out
Notes:
Network statistics can be seen in topas by pressing E. If you are running topas on a Virtual
I/O Server, the shared Ethernet adapter configuration and statistics will be shown.
===============================================================================
Vtargets/Disks Busy% KBPS TPS KB-R ART MRT KB-W AWT MWT AQW AQD
hdisk2 95.7 204.3K 2.4K 204.3K 0.4 1.8 0.0 0.0 0.0 0.6 0.9
hdisk1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
hdisk0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Notes:
The figure shows a topas view of the adapters on the Virtual I/O Server. You can see
activity on the physical Fibre Channel adapter (fcs0) and on the virtual SCSI adapter
(vhost0). At the bottom of the output, hdisk2 is the physical disk device with I/O activity.
===============================================================================
Vtargets/Disks Busy% KBPS TPS KB-R ART MRT KB-W AWT MWT AQW AQD
hdisk0 100.0 150.0K 2.3K 150.0K 1.2 72.2 0.0 0.0 113.8 0.0 0.0
===============================================================================
Path Busy% KBPS TPS KB-R KB-W
Path1 0.0 0.0 0.0 0.0 0.0
Path0 100.0 214.0K 2.5K 214.0K 0.0
Notes:
In this figure, the upper topas panel shows two virtual SCSI client adapters and a virtual
SCSI disk (hdisk0). We can see vscsi0 has activity and hdisk0 is 100% busy. Looking at the
bottom of the output, we see the two paths providing access to hdisk0 (virtual SCSI disk).
Only path0 has activity.
To collect input/output statistics for the disk, enable the collection by running the following
command:
chdev -l sys0 -a iostat=true
Notes:
The Workload Manager Agent provides recording capability for a limited set of local system
performance metrics. These include common CPU, memory, network, disk, and partition
metrics typically displayed by the topas command.
The Workload Manager must be started using the wkldmgr command before the
wkldagent command is run. Daily recordings are stored in the /home/ios/perf/wlm
directory with filenames xmwlm.YYMMDD, where YY is the year, MM is the month, and DD
is the day.
The wkldout command can be used to process Workload Manager-related recordings. All
recordings cover 24-hour periods and are retained for only two days.
wkldout [-report reportType] [-interval MM] [-beg HHMM] [-end HHMM]
        [-fmt [-mode modeType]] [-graph] -filename <xmwlm_recording_file>
-report: Detailed | summary | disk | lan
-interval MM: Split the recording reports into equal size time periods. Allowed values
(in minutes) are 5, 10, 15, 30, and 60.
-beg HHMM: Begin time in hours (HH) and minutes (MM). Range is between 0000 and
2400.
-end HHMM: End time in hours (HH) and minutes (MM). Range is between 0000 and
2400 and is greater than the begin time.
-fmt: Spreadsheet import format.
-mode: min | max | mean | stdev | set
-graph: Generate the .csv file under /home/ios/perf/wlm in the format
xmwlm.YYMMDD.csv, which can be input to the nmon Analyzer to produce graphs to aid
in analysis and report writing. The nmon Analyzer requires Excel 2002 or later.
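A minimal usage sketch (the daily recording file name is an assumption; look in /home/ios/perf/wlm for the actual xmwlm.YYMMDD files, and verify the flags against the Virtual I/O Server command reference):
   wkldmgr -start
   wkldagent -start
   wkldout -report summary -interval 60 -filename /home/ios/perf/wlm/xmwlm.110315
This starts the Workload Manager and its recording agent, and then produces an hourly summary report from the selected daily recording.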
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Graphical LPAR monitor for System p5 servers (LPARMon) is a graphical logical partition
(LPAR) monitoring tool that can be used to monitor the state of one or more LPARs on a
System p5 server. LPAR state information that can be monitored with this tool includes
LPAR entitlement/tracking, simultaneous multithreading (SMT) state, and processor and
memory use. In addition to monitoring the state of individual LPARs, the tool also includes
gauges that display overall shared processor pool use and total memory use. The LPARs
to be monitored can be running any mixture of AIX 5.3, AIX 5.2, or Linux operating
systems. There is also a history feature that can be used to display an LPAR's processor
use over seconds, minutes, hours, or days. This feature is helpful in determining an LPAR's
processor resource requirements.
How does it work?
The LPARMon tool consists of two components. First, there are small agents that run in
AIX or Linux LPARs. These agents gather various LPAR information through several
operating system commands and API calls. The agents then pass this information using a
connected socket to the second component, which is the monitor's graphical user
interface. This graphical user interface is a Java application and it is used as a collection
point for the server and LPAR status information, which it then displays in a graphical
format for the user.
Once LPARMon is installed and configured, starting up LPARMon is as simple as executing
the lparmon shell script or lparmon.bat file. The operation of LPARMon is very
straightforward, and the various dialogs were covered in the Overview section. There are just a few
things you need to keep in mind when using LPARMon.
1. Only one LPARMon can be connected to a partition at any given time. If a partition is
specified in the LPARMon config file and an instance of LPARMon is running that
points to that config file, other instances of LPARMon that have also specified
that same partition in their config file will hang on startup, waiting for the partition to be
released by the active LPARMon instance.
2. If a partition is specified in the config file and LPARMon cannot connect to that partition,
the partition will be ignored and not be available when the LPARMon dialog comes up.
The most likely causes for this problem are:
a. The LPARMon agent is not running on the specified partition.
b. The machine where LPARMon is running cannot connect to the agent using the
specified or defaulted port. Make sure you can ping the machine or try another port.
c. Make sure you have used the correct version of the agent that corresponds with the
operating system that is running on the partition.
d. If the LPARMon agent is running and you still cannot connect to it, try killing all
instances of the LPARMon agent on the partition and restart the agent.
Once the communication problems with the partition have been resolved, it is necessary to
restart LPARMon in order to see the partition in the monitor.
3. The history information for processor usage is only held while LPARMon is running.
Restarting LPARMon will reset the history data.
Ganglia
Many large AIX high performance computing (HPC) clusters use this excellent tool to
monitor performance across large clusters of machines.
The data is displayed graphically on a Web site, and includes configuration and
performance statistics. This is also increasingly being used in commercial data centers to
monitor large groups of machines.
Ganglia can also be used to monitor a group of logical partitions (LPARs) on a single
machine - these just look like a cluster to Ganglia.
Ganglia is not limited to AIX, which makes it even more useful in heterogeneous
computer rooms.
For more information go to the Ganglia home Web site at http://ganglia.sourceforge.net/
For the Ganglia for AIX and Linux on POWER binaries go to http://www.perzl.org/ganglia/
Briefly, a daemon runs on each node, machine, or LPAR and the data is collected by a
further daemon and placed in an rrdtool database. Ganglia then uses PHP scripts on a
Web server to generate the graphs as directed by the user. There is also an ongoing
project to add POWER5 micro-partition statistics.
The Ganglia tool can also be found here:
http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
Ganglia
Many large AIX high performance computing (HPC) clusters use this excellent tool to
monitor performance across large clusters of machines.
The data is displayed graphically on a Web site, and includes configuration and
performance statistics. This is also increasingly being used in commercial data centers to
monitor large groups of machines.
Ganglia can also be used to monitor a group of logical partitions (LPARs) on a single
machine - these just look like a cluster to Ganglia. Ganglia is not limited to AIX,
which makes it even more useful in heterogeneous computer rooms.
For more information, go to the Ganglia home Web site at http://ganglia.sourceforge.net/.
For the Ganglia for AIX and Linux on POWER binaries go to http://www.perzl.org/ganglia/.
A wiki is also available here:
http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia
Ganglia components
A daemon runs on each node, machine, or LPAR, and the data is collected and placed in an
rrdtool database. Ganglia then uses PHP scripts on a Web server to generate the graphs
as directed by the user.
The components of Ganglia are as follows:
The data collector (G)
- The daemon is a single file called gmond (Ganglia MONitor Daemon). It is a
monitoring daemon that collects the different metrics.
- Its configuration file is /etc/gmond.conf.
- This goes on each node.
The data consolidator (G)
- This is a single file called gmetad (Ganglia METAdata Daemon). It polls all the
gmond clients and stores the collected metrics in round-robin databases (RRDs).
- Its configuration file is /etc/gmetad.conf.
- You need one of these for each cluster. On massive clusters you can have more
than one and a hierarchy.
- This daemon collects the gmond data over the network and saves it in an rrdtool
database.
The database
- Ganglia uses the well-known and respected open source tool called rrdtool.
The Web GUI tools (G)
- These are a collection of PHP scripts started by the Web server to extract the
Ganglia data and generate the graphs for the Web site.
The Web server with PHP
- This could be any Web server that supports PHP, SSL, and XML.
- Everyone uses Apache2; you are on your own if you use anything else.
Additional advanced tools (G)
- gmetric is used to add extra statistics: in fact, anything you like, such as numbers or
strings, with units, and so on.
- gstat is used to get at the Ganglia data to do anything else you like.
Notice: The parts that are labeled with a (G) are part of Ganglia. The other parts you have
to get and install as prerequisites, namely Apache2, PHP, and rrdtool. These might also
have prerequisites.
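As a brief illustration of how these pieces connect (a sketch only; the cluster name, host names, and port are assumptions, with 8649 being the usual gmond default), /etc/gmetad.conf on the collector host could contain a line such as:
   data_source "POWER7 LPARs" lpar1.example.com:8649 lpar2.example.com:8649
Each listed host runs gmond; gmetad polls them, stores the metrics in rrdtool databases, and the PHP scripts on the Web server render the graphs.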
Notes:
lpar2rrd is a tool capable of producing historical CPU utilization graphs of logical partitions
and shared CPU usage.
It also collects complete physical (hardware) and logical configuration of all managed
systems and logical partitions. This includes all changes in their state and configuration.
This tool is not intended to be a real-time monitoring tool.
This tool is intended only for HMC-based micro-partitioned systems with a shared CPU
pool and creates charts based on utilization data collected on HMC (lslparutil hmc
command). It is agent-less; no agent needs to be installed on any logical partition. It uses
ssh keys-based access to the HMC to get all of the data, so it does not cause any load on
monitored logical partitions.
It supports all types of logical partitions and operating systems: AIX, VIOS, Linux on Power,
and i5/OS on IBM Power systems. It automatically creates a menu for viewing charts,
configuration, and logs. It creates a physical and logical configuration inventory of all
managed systems and their logical partitions (once a day).
The lpar2rrd tool shows the last 100 changes in the configuration and the last 100
changes in the state of all managed systems and their logical partitions. It shows the total
memory usage for each managed system.
This tool is simple to install, configure, and use (initial install and configuration together with
supporting tools like Apache/Perl/SSH should not take more than half an hour). Default
graphs can provide up to a year of historical data if available at the HMC.
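Because access is based on ssh keys, a quick way to confirm that the HMC connection works before configuring the tool is to run a harmless HMC command over ssh (the user and host name are assumptions for illustration):
   ssh hscroot@hmc01 "lshmc -V"
Once key-based login succeeds without a password prompt, lpar2rrd can gather the lslparutil utilization data over the same connection.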
More can be found here:
http://www.ibm.com/developerworks/wikis/display/virtualization/lpar2rrd+tool.
Instructor notes:
Purpose
Details
Additional information
Transition statement Next is the Performance toolbox.
Performance toolbox
POWER systems and virtualized environments can be
monitored by Performance toolbox (PTX).
Notes:
This X Windows performance monitoring tool supports POWER systems and virtualization
statistics. From VIOS version 1.3, the daemon that Performance toolbox (PTX)
communicates with to extract data from a remote machine is available.
You run a daemon on each AIX LPAR.
The PTX graphical user interface runs on a machine running X Windows, typically an
AIX workstation (although VNC works, and you could use another workstation running X
Windows remotely).
With PTX, you build up a monitor of what you want to capture dynamically on the screen
(CPU, disk, network, and so on, out of hundreds of statistics). You can also do the
following:
Automate the capture and saving of data to files
Replay the "monitor" - much like watching a video an zoom forward and back in time
Filter and modify the captured data to support other tools or performance databases
These two graphs show Entitlement (ent), Physical CPU use (physc), Shared, SMT, Cap
Status, and the Global CPU utilization in 2D and 3D modes. 3D allows multiple
machines/LPARs to be monitored at the same time.
The Performance toolbox for AIX consists of two components called the Manager and the
Agent. The Agent is also referred to as the Performance Aide and represents the
component that is installed on every network node in order to enable monitoring by the
manager.
The Agent component is available separately from the Performance toolbox for AIX
product.
The Local Performance Analysis and Control Commands fileset (perfagent.tools) is now a
prerequisite of the Performance Aide for AIX fileset (perfagent.server). The Local
Performance Analysis and Control Commands ship with the base operating system and
must be installed before proceeding with the Performance Aide for AIX installation.
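To check whether these filesets are already installed on a partition (a simple sketch; output varies by AIX level), run:
   lslpp -l perfagent.tools perfagent.server
If perfagent.tools is not installed, install it from the AIX base media before installing the Performance Aide fileset.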
(Figure: IBM Systems Director discovers the physical and shared I/O of the Virtual I/O Servers and LPARs on the managed system.)
Notes:
IBM Systems Director is an integrated suite of systems management tools, designed for
monitoring and managing heterogeneous IT environments. The design concept is to
provide one mechanism by which your entire IT architecture is managed. From one access
point, administrators can monitor system environment, resources, inventory, events, task
management, core corrective actions, distributed commands, and hardware control for a
wide range of servers and storage. IBM Systems Director has an extendable and modular
foundation that enables it to be enhanced with additional plug-ins such as VMControl.
IBM Director is a management tool for virtual and physical environments in the data center.
IT administrators can view and track the hardware configuration of remote systems in
detail, and monitor the usage and performance of critical components such as processors,
disks, and memory.
2. The management console provides a Web-based user interface for all management
functions.
3. The information is passed to and from supported systems using the Systems Director
Agent or without an agent. This allows clients to trade off management functions for a
smaller footprint.
Instructor notes:
Purpose
Details
Additional information IBM Director is provided at no additional charge for use on IBM
systems. You can purchase additional IBM Director server licenses for installation on
non-IBM servers. In order to extend IBM Director capabilities, several extensions can be
optionally purchased and integrated.
Transition statement
Notes:
User interface
Here is a preview of the Systems Director console.
IBM Systems Director utilizes an extensible Web-based user interface for the console.
From the console, the administrator can perform a number of tasks utilizing the various
foundation components provided. For certain legacy operations, the user interface provides
a launch-in-context capability.
It is designed to give IT administrators the flexibility to manage in a way that is intuitive to
them. If they only manage Power servers, they can go right to a Power Systems
management home page by clicking a button. If they want to get a quick status or update
servers, they can go right to that function. It is designed to minimize the number of clicks
required to perform a function and show status of each function along the way.
It includes context-sensitive help, tutorials, and even a link to guide users to understand
how to navigate the console.
Instructor notes:
Purpose
Details
Additional information
Transition statement
IBM Systems Director:
Power Systems Management
Notes:
The IBM Systems Director Power Systems Management option provides specific tasks
that can help you manage Power systems and platform managers such as the Hardware
Management Console (HMC) and the Integrated Virtualization Manager (IVM).
IBM Power systems can all be completely managed by IBM Systems Director with
capabilities such as discovery, inventory, status, monitoring, power management, and so
on.
IBM Systems Director can manage the following Power systems environments that might
include POWER5 and POWER6 processor-based servers running AIX, IBM i, or Linux:
Power systems managed by the Hardware Management Console
Power systems managed by the Integrated Virtualization Manager
Power systems server with a single image (a nonpartitioned configuration)
A Power Architecture BladeCenter server under the control of a BladeCenter
management module
IBM Systems Director gives you an overall understanding of the Hardware Management
Consoles and Integrated Virtualization Managers in your environment, as well as the hosts
they manage and their associated virtual servers (logical partitions). You can access and
manage the logical partitions as you would any other managed system. In addition, IBM
Systems Director provides a launch-in-context feature to access additional tasks that are
available from the Hardware Management Console and the Integrated Virtualization
Manager. From IBM Systems Director, you can also access i5/OS management tasks, in
addition to the AIX management tasks.
Notes:
VMControl
IBM Systems Director VMControl enables clients to improve service delivery through faster
deployment of Power servers. Systems Director VMControl is a plug-in of IBM Systems
Director and is available in two editions: the Express Edition and the Standard Edition.
The Express Edition
The Express Edition provides lifecycle management of virtual machines, that is, the ability
to create, modify, and delete them as well as move them to other locations. This function is a
no-charge download that supports a broad set of operating environments and Hypervisors
across IBM hardware platforms.
The Standard Edition
The Standard Edition includes the Express Edition functions and adds virtual-to-virtual
image management. This includes the ability to create, capture, import, and deploy virtual
images. This virtual image management solution configures new Power server AIX
systems or System z Linux systems, clones existing systems, and facilitates planning and
deploying virtual images. These virtual server images can be maintained in a library and
encapsulate the operating system, middleware, and applications for deployment on
another server.
Industry standard virtual images can also be imported because the Systems Director
VMControl design is based on the Open Virtualization Format (OVF) standard. For Power
servers, it leverages the Network Installation Manager (NIM) function of AIX, so that clients
do not have to migrate these to use the Systems Director interface for maintaining AIX
images. The automation capabilities can reduce the time to deploy new services, especially
as compared to installing each operating environment, middleware, and applications
individually.
VMControl Standard Edition also helps reduce image sprawl, in other words, the
proliferation of multiple similar images maintained by many administrators. Instead,
administrators can choose from existing images that are used consistently throughout the
IT operation.
There is a 60-day free trial for your clients to download and try the image management
capabilities of Systems Director VMControl Standard Edition. At the end of the 60 days,
they can purchase a license key to continue managing virtual images or continue to use the
Express Edition features at no charge.
Instructor notes:
Purpose
Details
Additional information
Transition statement
Notes:
From this IBM Systems Director menu, you can manage the virtual appliances in your data
center.
Virtual appliances
Generally speaking, a virtual appliance is a virtual machine software image designed to run
on a virtualization platform. From this menu, you can rapidly deploy virtual appliances to
create virtual servers that are instantly configured with the operating system and software
applications that you desire. You can deploy virtual appliances to the following platforms:
IBM Power systems servers (POWER5 and POWER6) that are managed by Hardware
Management Console or Integrated Virtualization Manager
Linux on System z systems running on the z/VM Hypervisor
IBM Systems Director VMControl allows you to complete the following tasks:
Discover existing image repositories in your environment and import external,
standards-based images into your repositories as virtual appliances.
Capture a running virtual server that is configured just the way you want, complete with
guest operating system, running applications, and virtual server definition. When you
capture the virtual server, a virtual appliance is created in one of your image
repositories with the same definitions and can be deployed multiple times in your
environment.
Import virtual appliance packages that exist in the Open Virtualization Format (OVF)
from the Internet or other external sources. After the virtual appliance packages are
imported, you can deploy them within your data center.
Deploy virtual appliances quickly to create new virtual servers that meet the demands of
your ever-changing business needs.
Notes:
IBM Tivoli Monitoring manages and monitors system and network applications on a variety
of operating systems, tracks the availability and performance of your enterprise system,
and provides reports to track trends and troubleshoot problems.
IBM Tivoli Monitoring (ITM) Version 6.2 gives the Power systems administrator the ability to
be alerted or notified when something goes wrong. ITM uses agent technology and has the
capability to determine the health and availability of the entire Power system, right down to
the network interface card.
ITM provides the administrator the ability to monitor both the physical and logical resources
of the Power system, including the disk and network that sit behind the Virtual I/O Server
(VIOS). It can do this because every VIOS shipped from IBM includes an embedded ITM
agent that enables ITM to monitor the disks and network.
Included with ITM is a data warehouse tool that allows the customer to store as much
historical data as they desire. This database information allows the customer to go back and
compare current system performance and utilization to past performance and utilization.
In addition, ITM includes a vast number of reporting templates that can be easily
customized for individual customer requirements.
For additional technical resources, see the following Web sites:
IBM Tivoli Monitoring information center:
http://publib.boulder.ibm.com/infocenter/tivihelp/v15r1/index.jsp?topic=/com.ibm.itm.do
c/welcome.htm
IBM Tivoli Monitoring Web site:
http://www-306.ibm.com/software/sysmgmt/products/support/IBMTivoliMonitoring.html
IBM Tivoli Monitoring can be ordered as a separate product or can be part of the AIX
Management Edition and AIX Enterprise Edition offerings. AIX Enterprise Edition is only
available with AIX 6. Consider AIX Management Edition if running AIX 5.3.
Instructor notes:
Purpose
Details
Additional information
Transition statement The next series of slides show different views in Tivoli to monitor
the virtualized environment.
(Figure: IBM Tivoli Monitoring architecture. The TEP client connects to the management server (TEMS and warehouse); agents report topology, availability, health, and performance for AIX, the VIOS, and the HMC/IVM.)
Notes:
The basic installation of IBM Tivoli Monitoring requires the following components:
One or more Tivoli enterprise monitoring servers (TEMS), which act as a collection and
control point for alerts received from the agents, and collect their performance and
availability data. The monitoring server also manages the connection status of the
agents.
A Tivoli enterprise portal server (TEPS), which provides the core presentation layer for
retrieval, manipulation, analysis, and pre-formatting of data. The portal server retrieves
data from the hub monitoring server in response to user actions at the portal client, and
sends the data back to the portal client for presentation. The portal server also provides
presentation information to the portal client so that it can render the user interface views
suitably.
One or more Tivoli enterprise portal clients (TEP client) with a Java-based user
interface for viewing and monitoring your enterprise. Tivoli enterprise portal offers two
modes of operation: desktop and browser.
Tivoli enterprise monitoring agents, installed on the systems or subsystems you want to
monitor. These agents collect data from monitored or managed systems and distribute
it to a monitoring server. Four different System p agents are available for monitoring and
gathering information.
- Virtual I/O Server Premium Agent: The VIOS premium agent monitors the health of
the VIOS, provides mapping of storage and network resources to the client LPAR,
and provides storage and network utilization statistics. The VIOS premium agent is
preinstalled on a VIOS system. No further installation is required, but the agent must
be configured and bound to a TEMS and TEPS in order to be viewed from a TEP
(see the configuration sketch after this list).
- CEC Agent: The CEC agent provides overall CPU and memory utilization of the
frame for monitored partitions and provides CPU and memory utilization by LPAR.
The CEC agent must be installed on an AIX partition that resides on the CEC to be
monitored.
- AIX Premium Agent: The AIX premium agent provides statistics for each LPAR
(entitled CPU, physical and logical CPUs), memory utilization, disk and network
utilization, and process data. This agent also provides usage statistics for WPARs.
An AIX agent can be installed on any AIX partition. All AIX agents on a CEC
are typically bound to the same TEMS and TEPS so they can be viewed from the
same TEP.
- HMC Agent: The HMC agent provides health and availability of the HMC. An HMC
agent can be installed on any AIX partition, but requires an ssh connection to the
HMC to monitor it, so it is generally convenient to install it on the same partition as
the CEC agent, which already requires an ssh connection. Multiple instances of the
HMC agent can be invoked on the same partition to monitor multiple HMCs.
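A minimal configuration sketch for the preinstalled VIOS premium agent, run as padmin (the attribute names and values are assumptions for illustration; list the exact attributes with cfgsvc -ls ITM_premium and confirm the syntax in the Virtual I/O Server command reference):
   lssvc
   cfgsvc ITM_premium -attr Restart_On_Reboot=TRUE hostname=tems01 managing_system=hmc01
   startsvc ITM_premium
The first command lists the agents available on the VIOS, the second binds the ITM premium agent to its monitoring server, and the third starts the agent.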
(Figure: Total CPU and memory allocated to LPARs.)
Notes:
This figure shows a view of the overall CPU and memory resources allocated to the
different logical partitions. This information is provided by the CEC agent.
(Figure: CPU, memory, disk, and network information per LPAR.)
Notes:
Monitoring can be done on individual or groups of logical partitions. The figure shows an
example of CPU, memory, disk, and network resources for an individual logical partition.
This information is provided by the AIX premium agent running on the logical partition.
(Figure: Shows how network interfaces are mapped to LPARs.)
Notes:
Tivoli Monitoring is virtualization aware; it can show the relationships between virtual
resources and physical resources. The figure shows an example of virtual network
adapters and interfaces mapped to the different logical partitions. This virtual resource
information is provided by the VIOS premium agent installed on the virtual I/O server.
(Figure: Shows virtual storage mapping and mapping detail for the VIO Server.)
Notes:
The figure shows an example of virtual disk mapping and utilization information of the
virtualized environment. This is also provided by the VIOS premium agent running on the
virtual I/O server.
Notes:
The VIOS premium agent also provides status information of the devices (virtual and
physical) on a Virtual I/O Server.
Checkpoint
1. Which command can be used to check and monitor a shared
Ethernet adapter failover on a Virtual I/O Server?
Notes:
Checkpoint solutions
1. Which command can be used to check and monitor a shared Ethernet
adapter failover on a Virtual I/O Server?
The answer is run topas and then press E.
Additional information
Transition statement
Unit summary
Having completed this unit, you should be able to:
Describe standard AIX/virtualization monitoring tools
Identify freeware monitoring tools
Describe virtualization management and monitoring tools, such
as
IBM Systems Director
IBM Tivoli Monitoring
Notes:
Checkpoint solution
1. The PowerVM Enterprise Edition is required for which of the following?
a. Shared Ethernet adapter
b. Partition mobility
c. Virtual SCSI Adapter
d. Integrated Virtual Ethernet
e. Active Memory Sharing
The answers are partition mobility and Active Memory Sharing.
Unit 2
Checkpoint solutions (1 of 4)
1. Match the following processor terms to the statements that describe
them: Dedicated, shared, capped, uncapped, virtual, logical
a. Dedicated These processors cannot be used in micro-partitions.
b. Uncapped Partitions marked as this might use excess processing
capacity in the shared pool.
c. Logical There are two or four of these for each virtual processor if
simultaneous multithreading is enabled.
d. Dedicated This type of processor must be configured in whole
processor units.
e. Shared These processors are configured in processing units as
small as one-hundredth of a processor.
f. Capped Partitions marked as this might use up to their entitled
capacity but not more.
The answers in the correct order are dedicated, uncapped, logical,
dedicated, shared, and capped.
Checkpoint solutions (2 of 4)
3. True or False: By default, dedicated processors are returned to the
shared processor pool if the dedicated partition becomes inactive.
The answer is true.
4. If a partition has 2.5 processing units, what is the minimum number of
virtual processors it must have?
a. One
b. Three
c. No minimum
The answer is three.
5. If a partition has 2.5 processing units, what is the maximum number of
virtual processors it can have?
a. 25 (Maximum can be no more than 10 times processing units.)
b. 30
c. Total number of physical processors x 10
d. No maximum
The answer is 25 (maximum can be no more than 10 times processing
units).
Checkpoint solutions (3 of 4)
6. What is the maximum amount of processing units that can be allocated
to a partition?
The answer is all available processing units.
7. If an uncapped partition has an entitled capacity of 0.5 and two virtual
processors, what is the maximum amount of processing units it can
use?
The answer is 2.0 processing units because it is uncapped and has
two virtual processors (maximum of 1.0 units per virtual processor).
8. If there are multiple uncapped partitions running, how are excess
shared processor pool resources divided between the partitions?
The answer is the uncapped weight configuration value is used to
allocate excess resources.
Checkpoint solutions (4 of 4)
10. What is the maximum number of virtual processors that can
be configured for an individual partition?
The answer is up to ten times the amount of processing
units, with a maximum value of 64.
Unit 3
Topic 1
Checkpoint solutions
1. True or False: Dedicated processors can be shared only if they are
idle.
The answer is true.
2. True or False: Only uncapped partitions can use idle cycles donated
by the dedicated processors.
The answer is true.
Topic 2
Checkpoint solutions
1. True or False: Each shared processor pool has a maximum
capacity associated with it.
The answer is true.
2. True or False: The default shared processor pool does not
have a number.
The answer is false (default shared pool ID = 0).
3. What is the default value of the reserved pool capacity for a
shared processor pool?
The answer is the default value is 0.
Unit 4
Checkpoint solution (1 of 3)
1. True or False: PowerVM Active Memory Sharing feature allows shared memory
partitions to share memory from a single pool of shared physical memory.
The answer is true.
3. True or False: The total logical memory of all shared memory LPARs is allowed to
exceed the real physical memory allocated to a shared memory pool in the
system.
The answer is true.
Checkpoint solution (2 of 3)
5. What requirements must be met by the LPAR in order to be defined
as shared memory LPAR?
The answer is the LPAR must use shared processors and use only
virtual I/Os.
Checkpoint solution (3 of 3)
9. True or False: The Collaborative Memory Manager is an operating
system feature that gives hints on memory page usage to the
hypervisor.
The answer is true.
10. Which commands can be used to get Active Memory Sharing
statistics?
The answer is vmstat, lparstat, topas, and svmon.
11. True or False: When AIX starts to loan logical memory pages, by
default it first selects pages used to cache file data.
The answer is true.
12. How can you tune the Collaborative Memory Manager's loan policy?
The answer is the policy is tunable through the AIX VMM vmo
command. The parameter ams_loan_policy has a default value of 1.
This enables the loaning of the file cache. When set to 2, loaning of
any type of data is enabled.
Unit 5
Topic 1
Checkpoint solutions
1. True or False: Every POWER7 system comes with Active Memory
Expansion as standard.
The answer is false.
2. True or False: Active Memory Expansion allows a partition to
effectively use more memory than the logical memory amount
allocated by the hypervisor.
The answer is true.
5. True or False: The AME feature costs the same on every POWER7
system.
The answer is false.
Topic 2
Checkpoint solutions
1. True or False: Any user can use the amepat command to generate
AME modeling information.
The answer is false.
2. True or False: The amepat command can be used to generate a
report using recorded data.
The answer is true.
3. True or False: The amepat command should be run when the target
workload is idle.
The answer is false.
4. True or False: The amepat command can run on any system running
AIX 6.1 TL4 SP2 or above.
The answer is true.
5. True or False: The amepat command should only be run when AME
is disabled.
The answer is false.
Topic 3
Checkpoint solutions
1. True or False: A partition can use AME on any POWER7 system.
The answer is false. (AME can only be configured on a system with
the correct activation code.)
Topic 4
Checkpoint solutions
1. True or False: Monitoring a partition that has AME configured
is completely different to monitoring a partition without AME.
The answer is false.
2. True or False: A memory deficit is resolved by lowering the
expansion factor value, and removing true memory.
The answer is false.
3. True or False: The vmstat command will always report AME
statistics when AME is enabled.
The answer is false.
4. True or False: The topas command will always show AME
statistics on the initial page when AME is enabled.
The answer is true.
Unit 6
Checkpoint solutions
1. As with SCSI, a server adapter must be created at the VIOS.
However, how does its function differ from VSCSI?
The answer is with NPIV, the VIOS provides a pass-through
service.
Unit 7
Topic 1
Checkpoint solutions
1. Once a CPU constraint is found as a bottleneck, what are
some steps that can be taken to solve the problem?
The answers are check process activity to determine errant
processes, add CPU resources, change configuration (for
example, capped to uncapped and dedicated to donating
cycles), and move workload.
Topic 2
Checkpoint solutions (1 of 2)
1. True or False: Memory requirements on the VIOS to support
VSCSI I/O operations are minimal because no data caching
is performed on the VIOS.
The answer is true.
Checkpoint solutions (2 of 2)
3. Which one of the following recommendations about sizing the Virtual
I/O Server for virtual SCSI is false:
a. For the best performance, dedicated processors can be used.
b. When using shared processors, use the uncapped mode.
c. When using shared processors, set the priority (weight value) of the Virtual I/O
Server partition equal to its client partitions.
The answer is when using shared processors, set the priority (weight
value) of the Virtual I/O Server partition equal to its client partitions.
Topic 3
Checkpoint solutions
1. True or False: Virtual Ethernet adapters are created and the PVID
assignments are performed using the Hardware Management Console
(HMC).
The answer is true.
Topic 4
Checkpoint solutions
1. True or False: When using shared Ethernet adapters, set the MTU size to 65390
on the physical adapter for the best performance.
The answer is false.
2. True or False: Processor utilization for large packet workloads on jumbo frames is
approximately half that required for MTU 1500.
The answer is true.
3. If you see many collisions or dropped packets for the SEA device, what are the
first two things to investigate?
The answers are VIOS CPU utilization and physical adapter saturation.
4. True or False: You can configure a maximum amount of network bandwidth for
individual clients of a shared Ethernet adapter.
The answer is false. You can only set priorities.
5. True or False: For mixed shared Ethernet adapter and VSCSI services, leave
threading enabled on the shared Ethernet adapter device.
The answer is true.
Topic 5
Checkpoint solutions (1 of 2)
1. True or False: The IVE allows partitions to connect to an external network without the need for a
Virtual I/O Server partition.
The answer is true.
2. True or False: Partitions using IVE logical ports must be connected to an external switch to
communicate with each other.
The answer is false. Partitions configured with logical ports on the same physical port do not need to
connect through an external switch to communicate with each other.
3. True or False: The standard IVE adapter card on most POWER6 systems will connect 16 LPARs,
but you can optionally order an IVE adapter card which connects up to 32 LPARs.
The answer is true.
4. True or False: An IVE logical port can be used as the physical adapter in an SEA configuration.
The answer is true.
5. You can see the number of QPs by looking at output from what command?
a. lsattr (AIX)
b. entstat (AIX)
c. ifconfig (AIX)
d. lshwres (HMC)
The answer is entstat (AIX).
Checkpoint solutions (2 of 2)
6. True or False: It is best to have the number of QPs equivalent to the number of virtual, dedicated,
or logical processors in a partition (whichever is the highest number).
The answer is true.
7. True or False: The best performance will be between logical ports which share the same internal
switch.
The answer is true.
8. True or False: The MCS value sets the maximum number of available logical ports per physical
port.
The answer is false. MCS value sets the maximum number per port group.
9. True or False: The MCS value sets the number of queue pairs (QPs) in each partition which is
configured for that port group.
The answer is true.
11. What is the effect of disabling the multicore attribute for an LHEA Ethernet device in an AIX LPAR?
The answer is when you disable the multicore attribute, the device has just one QP.
Unit 8
Checkpoint solutions
1. True or False: The VASI interface controls every phase of the partition
mobility process.
The answer is false.
4. What log usually provides details not found in the HMC migration error
message?
The answer is the config log; alog -t cfg.
Unit 9
Checkpoint solutions
1. Which command can be used to display the code version of the HMC?
The answer is lshmc -V.
2. Your currently installed firmware level is EL320_59_031, and the new service pack
is EL320_061_031. Is this disruptive?
The answer is no.
3. In the service partition, what command is used to apply the updated system
firmware?
The answer is update_flash.
4. List the commands that can be used to back up and restore the volume group data
structures.
The answers are savevgstruct and restorevgstruct.
5. If you resize a backing device at the VIO Server version 1.3 or later, what
commands must be executed at the client partition to use the additional space?
The answer is chvg -g.
Unit 10
Checkpoint solutions
1. Which command can be used to check and monitor a shared Ethernet
adapter failover on a Virtual I/O Server?
The answer is run topas and then press E.