
Stefan Gocke

Consultant
Gocke IT Solutions

PowerHA and PowerVM together


building a virtualized cluster

Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.
Stefan Gocke

Gocke IT Solutions

stefan.gocke@t-online.de

In IT Industry since 1986 - Freelance since 2003


based south of Munich
Projects in POWER Systems
(LPAR, Virtualization, HACMP, TSA, etc.)
Projects in Storage
(DS8000, DS4000, SVC, V7K, SAN)
Projects with TSM and Tape
(AIX, Windows, Linux, Solaris, Netware, z/OS TSM SRV)
Trainings for ArrowECS, GlobalKnowledge, Avnet, FastLane, ....
(AIX, PowerHA, GPFS, LPAR, POWER7/8, TSM, SVC, Storwize, DS8000, SAN)

IBM CATE Power Systems for AIX V3, and many more.
Agenda
Differences between PowerHA V6.1 and V7.1
Identify and work around SPOFs
in a virtualized environment
(Live) Partition Mobility
and PowerHA SystemMirror (HACMP)
VIO Server functions and
PowerHA SystemMirror (HACMP)
POWER7 AME and PowerHA SystemMirror (HACMP)
SR-IOV / IVE - with PowerHA System Mirror (HACMP)
It's your session

If you don't ask, you can't get an answer

So please ask questions at all times

Or email later to : Stefan.Gocke@t-online.de


PowerHA SystemMirror and Virtual Devices

PowerHA SystemMirror will work with virtual devices.

PowerHA SystemMirror is supported with Live Partition Mobility

All IBM Education Services classes on PowerHA SystemMirror
run on fully virtualized LPARs.

Some restrictions apply when using virtual ethernet and/or
virtual disk access.

Using a virtual I/O server will add new SPOFs to the cluster which
need to be taken into account.
What's different with PowerHA V7.1(.x)
PowerHA SystemMirror Version 7.1.x

Change in the PowerHA cluster through the new use of
Cluster Aware AIX (CAA) instead of Topology Services:
no topsvcs any more
New SAN heartbeating, no disk heartbeat any more
Use of the AIX CAA repository
Different SAN zoning (may be) needed
Network needs multicast enabled
cthags instead of grpsvcs
rootvg system event monitoring through AHAFS

And don't use PowerHA V7.1.0 !!
(the CAA redesign afterwards makes migrations from it unusable)
PowerHA SystemMirror Version 7.1.x

The HACMP part is still the same!


HACMPODM is still there
Different GUI
Some more options in resource groups (not discussed here)

My personal testing viewpoints:


The failover is better / faster
The cluster is more stable
The repository in CAA is different ...
Cluster repository
There is only one repository to make sure no split brain situations can occur.

Starting with PowerHA SystemMirror V7.1.1 SP2 repository resiliency has been
implemented, so a failure of the repository will not fail the cluster and the failed disk
can be recovered.

Starting with PowerHA SystemMirror V7.1.3 you can replace the repository disk
clmgr replace repository hdisk<x>

You can prepare for the loss of a repository
clmgr add repository hdisk<x>
clmgr replace repository (when the first one failed)

You can add a second repository when building the cluster
clmgr add cluster <xxx> repository hdiskp,hdiskq
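A quick check after adding or replacing the repository (a sketch only; verify the exact clmgr syntax on your PowerHA level):

clmgr query repository (lists the configured repository disk(s))
lspv | grep caavg_private (the active repository disk carries the CAA private volume group)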



Multicast IP, SAN and Repository disk requirements

Cluster communication requires a multicast IP address to be used.


You can specify this address when you create the cluster, or you can let one be generated
automatically.
The ranges 224.0.0.0 through 224.0.0.255 and 239.0.0.0 through 239.255.255.255 are
reserved for administrative and maintenance purposes.
Also, make sure the multicast traffic generated by each of the cluster nodes is properly
forwarded by the network infrastructure toward any other cluster node.
Should you use SAN-based heartbeating, you must have zoning set up to ensure
connectivity between host FC adapters. You also need to activate the Target Mode Enabled
parameter on the involved FC adapters.
(see later charts for a better explanation)
Hardware redundancy at the storage subsystem level is mandatory for the Cluster
Repository disk. LVM mirroring of the repository disk is not supported. This disk should be at
least 1 GB in size.
Starting with PowerHA 7.1.3, unicast addresses can be used again.
(This is still untested as eGA will be mid December 2013)
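Multicast delivery between the future cluster nodes can be tested with the AIX mping tool before the cluster is created (a sketch; the multicast address is only an example, check the mping options on your AIX level):

node 2 (receiver): mping -r -v -c 5 -a 228.168.101.43
node 1 (sender): mping -s -v -c 5 -a 228.168.101.43

If the receiver does not report the sender's packets, the network infrastructure is not forwarding the multicast traffic.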

Differences PowerHA V6.1 and V7.1.x
PowerHA V6.1 uses RSCT Topology Services
PowerHA V7.1 uses CAA
Don't start/use V7.1.0, use 7.1.2 / 7.1.3

Repository disk considerations

Multicast instead of unicast heartbeating
SAN heartbeating using multicast

Both use RSCT Group Services

Be aware of a memory leak in RSCT (IV69760/IV66606)

IBM.ConfigRM daemon on the CAA group leader (lssrc -ls IBM.ConfigRM)


Information on IV69760
Be aware of a memory leak in RSCT (IV69760/IV66606)

rsct.core.rmc from 3.1.5.0 to 3.2.0.4, in IBM.ConfigRM (only the group leader)
in CAA, which can bring the CAA node down.

Time to failure after a new boot is estimated to be between 6 and 8 months.

A reboot of the node will remedy the situation.

"lssrc -ls IBM.ConfigRM" will show the active group leader.

Affects PowerHA V7 and VIO SSP. Fix is pending.


PowerHA SystemMirror V6.1 vs. V7.1.x architecture

PowerHA 6.1 stack: PowerHA 6.1 on top of RSCT (Resource Monitoring and Control, Resource Manager, Group Services, Topology Services), on top of AIX.

PowerHA 7.1.x stack: PowerHA 7.1.x on top of RSCT (Resource Monitoring and Control, Resource Manager, Group Services), on top of CAA and AIX.
The non-virtualized Cluster

Eliminating all single points of failure includes the following items:

Redundant access to the disks of the rootvg.
Redundant disks for the rootvg.
Redundant disks for the datavg.
Redundant adapters for the datavg.
Redundant network cards.

(Diagram: a client connected to Node 1 and Node 2, each node with mirrored rootvg disks hdisk0/hdisk1 and shared datavg disks hdisk2-hdisk5 on external SAN storage.)
The virtualized cluster node

In a virtualized environment, a VIO server is used for the disk access.

The following SPOFs are now added to the system:
- rootvg disks
- datavg disks
- VIO server partition

How can these SPOFs be eliminated from a storage (disk) point of view?

(Diagram: client LPARs A and B each with a vscsi0 adapter, mapped through the Hypervisor to vhost0/vhost1 and vtscsi0/vtscsi1 on a single VIOS 1 with scsi0.)
Eliminating SPOF for VIO storage (vSCSI)
Add a second VIO server

(Diagram: each AIX client LPAR uses LVM mirroring across two virtual SCSI adapters, vscsi0 served by VIOS 1 and vscsi1 served by VIOS 2; each VIOS presents its own copy of the disks via vhost/vtscsi devices.)
Redundancy Using Mirroring (as a reference only)

Define a second VIO server partition and allocate a second disk for the rootvg.

The backing device can be a whole disk (local or on the SAN) as well as a single LV.

Mirror the rootvg in the same way as always:

- extend the volume group


- mirror all LVs or use the mirrorvg command
- turn off quorum checking
- set the bootlist
- execute the bosboot command.

In case of a failure of one VIO server:

- The LPAR will see stale partitions


- Volume group needs to be resynchronized using syncvg or varyonvg

The procedure could be used for datavgs as well.

This procedure provides no automatic recovery, so it is not really usable.
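A minimal command sketch of the procedure above (hdisk0 comes from VIOS 1, hdisk1 from VIOS 2; names are only examples):

extendvg rootvg hdisk1
mirrorvg rootvg hdisk1
chvg -Qn rootvg (turn off quorum checking)
bosboot -ad /dev/hdisk1
bootlist -m normal hdisk0 hdisk1

After the failed VIO server is back, resynchronize the stale partitions with varyonvg rootvg or syncvg -v rootvg.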


Redundancy using MPIO

(Diagram: AIX client LPARs 1 and 2 use the MPIO default PCM over two virtual SCSI adapters, vscsi0 served by VIOS 1 and vscsi1 served by VIOS 2; both VIO servers see the same PV LUNs through their FC adapters (fcs0/fcs1) and map them via vhost/vtscsi devices.)
The Concept

The same LUN is mapped to both VIO servers in the SAN.

From both VIO servers, the LUN is mapped again to the same LPAR.

The LPAR will correctly identify the disk as an MPIO-capable device and create one hdisk device with two paths.

The device will work only in failover mode, other modes are not supported.

(Diagram: the same dual-VIOS MPIO setup with PV LUNs as on the previous chart.)

Disk Reservation

A VIO server will reserve a disk device on open. Opening means making the virtual SCSI target
available.
If the device driver is not able to ignore the reservation, the device cannot be mapped to a
second partition at the same time.
All devices accessed through a VIO server must support a no_reserve attribute.
This is not an issue if it is a real single-VIOS attachment (which means no redundancy and no
HACMP).
Consequences for PowerHA SystemMirror

The reservation held by a VIO server cannot be broken by PowerHA SystemMirror.

Reservation requests from the client partition cannot be passed to the real storage (so
far).

Only devices that will not be reserved on open are supported.

In this case PowerHA SystemMirror requires the use of
enhanced concurrent mode volume groups.

No problem for PowerHA V7.x, as only enhanced concurrent mode VGs are supported.

The traditional VG type is not supported in a virtualized environment due to the lack of
access control.

PowerHA SystemMirror V7.1.x will convert a VG automatically to an ecmvg.
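A sketch of checking and setting the reservation policy on the backing device in the VIOS (hdisk5 is only a placeholder; run as padmin):

lsdev -dev hdisk5 -attr reserve_policy
chdev -dev hdisk5 -attr reserve_policy=no_reserve

The same attribute can be verified from the client LPAR with lsattr -El hdiskX, as the next example shows.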


Example: Normal Operation
root@GITS-LPAR1:/home/root# lspv
hdisk0 00cd0e0ad3ff82a7 rootvg active
hdisk1 00cd0e0afb614ca2 rootvg active
hdisk2 00cd0e0aed8c400b dirvg active
hdisk3 00cd0e0aed8c40e2 dirvg active

root@GITS-LPAR1:/home/root# lsdev -Ccdisk


hdisk0 Available Virtual SCSI Disk Drive
hdisk1 Available Virtual SCSI Disk Drive
hdisk2 Available Virtual SCSI Disk Drive
hdisk3 Available Virtual SCSI Disk Drive

root@GITS-LPAR1:/home/root# lsattr -El hdisk2


PCM PCM/friend/vscsi Path Control Module False
algorithm fail_over Algorithm True
hcheck_cmd test_unit_rdy Health Check Command True
hcheck_interval 0 Health Check Interval True
hcheck_mode nonactive Health Check Mode True
max_transfer 0x40000 Maximum TRANSFER Size True
pvid 00cd0e0aed8c400b0000000000000000 Physical volume identifier False
queue_depth 3 Queue DEPTH True
reserve_policy no_reserve Reserve Policy True

root@GITS-LPAR1:/home/root# lspath
Enabled hdisk0 vscsi0
Enabled hdisk1 vscsi1
Enabled hdisk2 vscsi0
Enabled hdisk3 vscsi1
Enabled hdisk2 vscsi1
Enabled hdisk3 vscsi0
Failure Situation

root@GITS-LPAR1:/home/root# lspv
hdisk0 00cd0e0ad3ff82a7 rootvg active
hdisk1 00cd0e0afb614ca2 rootvg active
hdisk2 00cd0e0aed8c400b dirvg active
hdisk3 00cd0e0aed8c40e2 dirvg active

root@GITS-LPAR1:/home/root# lspath
Failed hdisk0 vscsi0
Enabled hdisk1 vscsi1
Failed hdisk2 vscsi0
Enabled hdisk3 vscsi1
Enabled hdisk2 vscsi1
Failed hdisk3 vscsi0

An errpt entry for the failure will be generated


Failure Recovery

Two error scenarios:

Device is Failed: needs activation, this can be automated
Device is Missing: needs cfgmgr

First step: Restart the failed VIOServer.


- Verify device access by the VIOServer and VSCSI mapping.
- lsdev type disk; lspv; lsmap -all

VSCSI does not (yet) provide the same level of comfort as software like SDDPCM. This
means:
- No daemon process for continuous path verification.
- No automatic recovery.

If the client LPAR has been rebooted while one VIOServer was down: cfgmgr

Use chpath to bring failed paths back:

chpath -l hdiskX -p vscsiN -s enable
Failure Recovery MPIO automatic recovery

Automatic path recovery is available (supported) since VIOS 1.3
in the following way:
Update the following attributes of each individual disk (hdisk) device:
assign a health check interval (given in seconds, check with the disk vendor)
* chdev -l hdisk3 -a hcheck_interval=60 -P
assign different priorities to each path:
* chpath -l hdisk3 -a priority=1 -p vscsi1
* chpath -l hdisk3 -a priority=2 -p vscsi2

If a path fails, it will be set to error. If a path returns, it will be set automatically back to the
enabled state. No need to call cfgmgr.

Be careful when performing maintenance of a VIO server partition !

lsattr -El <hdiskX> will only show the ODM value, it does not tell you whether a reboot has
been done to activate a change made with -P.
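To check what is actually configured for the paths, a short sketch (hdisk3 and vscsi1 are placeholders):

lspath -l hdisk3 (path status per parent adapter)
lspath -AHE -l hdisk3 -p vscsi1 (path attributes, e.g. priority)
lsattr -El hdisk3 -a hcheck_interval (ODM value of the health check interval)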
Scripts to help (from Brian Smith's blog)

If you set the parameters with -P, only the ODM is updated.

How do you see if a reboot has been made?

Check the settings with kdb, or use the scripts linked below:

https://www.ibm.com/developerworks/community/blogs/brian/entry/compare_hba_settings?lang=en
https://www.ibm.com/developerworks/community/blogs/brian?lang=en

Do at your own risk, as they use kdb.

(Screenshots: example output where a reboot had been done vs. where NO reboot has been made.)
MPIO Summary

Uses the simple fail_over mode of the VSCSI PCM only!

- Other algorithms may not be possible anyway.
- Can't guarantee the order of writes otherwise!
- At the moment this is the only mode available.
- Disadvantage: FC adapters in the second VIO server are underutilized.
- Can use chpath to turn a path off or on, but the primary path is
determined during cfgmgr automatically.
- If priority is configured for the devices, the lower value determines
the highest priority.

AIX CDs contain all you need in the client partitions:

- No special drivers required.


- Necessary filesets are installed automatically.
Tuning - Queue Depth

VIO Client Virtual Disk (Q1)
Queue depth: 1-256, default: 3
To avoid queuing on the virtual SCSI client adapter, the maximum number of LUNs per virtual client adapter should be:
Max LUNs = INT (510 / (Q1+3))

VIOC Virtual SCSI Client Adapter (Q2)
Command elements (CE): 512
2 CE for adapter use (512 - 2 = 510 remaining)
3 CE for each device for recovery
Q2 queue depth is fixed

VIOS Disk Adapter (Q3)
Default Q3 queue depth for FC is 200 (num_cmd_elems)

VIOS Physical Disk (Q4)
Queue depth varies by device (1-256)
Single queue per disk or LUN even if using LV-VSCSI disks

Note: There may also be queues for multi-path driver paths (e.g. SDD and PowerPath)

(Diagram: the four queues Q1-Q4 stacked from the VIO client virtual disk through the virtual SCSI client adapter, the PowerVM Hypervisor, the VIOS virtual adapter and target device, down to the VIOS disk adapter and the physical disk.)
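A worked example of the formula above: with the default virtual disk queue depth of 3, Max LUNs = INT(510 / (3+3)) = 85 LUNs per virtual client adapter; with a queue depth of 32, Max LUNs = INT(510 / (32+3)) = 14. To change the client disk queue depth (hdisk2 is a placeholder; -P writes the change to the ODM only, so it becomes active after the device is reconfigured or the LPAR is rebooted):

chdev -l hdisk2 -a queue_depth=20 -P
lsattr -El hdisk2 -a queue_depth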
MPIO using NPIV virtual fiberchannel devices

MPIO using NPIV requires the use of the MPIO device drivers supplied by the manufacturer of
the disk subsystem in the LPAR (again).

SDDPCM for IBM disk subsystems
(DS8000 / DS5000 / ...)
PowerPath for EMC disk subsystems
HDLM for Hitachi disk subsystems
...

Daemon controlled

Load balancing enabled per default

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10691
N_Port ID Virtualization (Virtual FC)
NPIV simplifies disk management

N_Port ID Virtualization
Multiple Virtual World Wide Port Names per FC port (PCIe 8 Gb adapter)
LPARs have direct visibility on SAN (Zoning/Masking)
I/O Virtualization configuration effort is reduced

(Diagram: with the Virtual SCSI model the AIX client LPAR sees only generic SCSI disks behind the VIOS; with N_Port ID Virtualization the AIX client LPAR sees the real DS8000 / HDS devices through the VIOS FC adapters and the SAN.)


N_Port ID Virtualization

Virtualizes FC adapters
Virtual WWPNs are attributes of the client virtual FC adapters, not of the physical adapters
64 WWPNs per FC port (128 per dual port HBA)

Customer Value
Use existing storage management
Allows common SAN managers, copy services, backup/restore, zoning, tape libraries, etc.
Load balancing across VIOS FC ports (8 Gb FC)
Allows mobility without manual management intervention

(Diagram: two VIOCs running multipath software over virtual FC client adapters; the Hypervisor connects them to virtual FC server adapters in the VIOSes, which own the physical 8 Gb FC ports attached to an NPIV-enabled SAN and tape.)
NPIV - Things to consider

A WWPN pair is generated EACH time you create a VFC.

It is NEVER re-created or re-used, just like a real HBA.
If you create a new VFC, you get a NEW pair of WWPNs.
Save the partition profile with the VFCs in it.
Make a copy, don't delete a profile with a VFC in it.
Make sure the partition profile is backed up for local and disaster recovery! Otherwise you'll
have to create new VFCs and map to them during a recovery.
Target Storage SUBSYSTEM must be zoned and visible from source and destination systems
for LPM to work.
Active/Passive storage controllers must BOTH be in the SAN zone for LPM to work
Do NOT include the VIOS physical 8G adapter WWPNs in the zone
You should NOT see any NPIV LUNs in the VIOS
Load multi-path code in the client LPAR, NOT in the VIOS
No passthru tunables in VIOS
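Two VIOS commands (run as padmin) that help when setting up and checking NPIV, a sketch:

lsnports (shows which physical FC ports are NPIV capable and how many WWPNs are still available)
lsmap -all -npiv (shows the vfchost-to-client and physical port mappings)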
PowerHA SystemMirror - NPIV setup

(Diagram: client LPARs 1 and 2 run the storage vendor's multi-path software over four virtual FC adapters fcs0-fcs3 each; the Hypervisor maps them to vfchost0-vfchost3 in VIOS 1 and VIOS 2, which share the PV LUNs through their physical FC ports.)
NPIV heterogeneous connectivity

Mix virtual fibre channel and physical fibre channel adapters for heterogeneous SAN connectivity
To fulfill service level agreements and provide redundancy
Cost effective redundancy
Tape support / LAN-free backup and restore support

(Diagram: an AIX client LPAR reaches disk through NPIV and/or vSCSI via the VIOS passthru module, and reaches tape drives and robotics through its own fibre HBA; a generic backup client uses the LAN to the backup server. Storage controller and tape library sit behind redundant SAN switches. Note: the tape attachment is NOT vSCSI.)


Considerations for SAN heartbeating
with PowerHA SystemMirror V7.1.x
Configure SAN heartbeating in a virtualized environment
Host zoning is not enough!
Target mode on the FC adapter(s) is needed

Enable the tme attribute on the (VIOS) FC adapters:

As padmin: chdev -dev fcsX -attr tme=yes -perm
(shutdown -restart needed)
On the HMC, Add a new virtual ethernet adapter to the profile of each PowerHA
virtual client node with a VLAN ID of 3358.
Reactivate the partition using the new profile.
When it boots, a new "entX" should show up, and there should be an sfwcommX
child of that entX.
Verify the configuration changes by running the following command:
[ lsdev -C | grep sfwcom ]
sfwcomm0 Available 01-00-02-FF Fiber Channel Storage Framework Comm
sfwcomm1 Available 01-01-02-FF Fiber Channel Storage Framework Comm

lscluster -i should list the interface as UP.


Notes on the VLAN 3358

VLAN 3358 only needs to be created on virtual client LPARs,


not VIOS. (?)

VLAN 3358 is the only value that CAA will use;
the VLAN tag of sfw0 should not be changed.

The entX adapter associated with VLAN 3358 does not require an
enX interface on top of it and does not require an IP address.
Eliminating SPOF for VIO LAN
(Virtual Ethernet Adapter)
Virtual Ethernet - the virtualized node

In a virtualized environment, a VIO server is needed
for access to the outside world.

The SEA acts as a layer-2 based ethernet bridge.

Now, the physical network devices in the VIO server
are new SPOFs.

How can these SPOFs be eliminated from a networking
point of view?

(Diagram: single VIOS, single NIC - the client's virtual NIC is bridged by the SEA in the VIOS to one physical NIC and one external switch.)
The Second VIO Server

(Diagram: VIOS 1 and VIOS 2 each bridge one physical and one virtual adapter through an SEA (ent2) onto VLAN 1 or VLAN 2; clients 1 and 2 run NIB link aggregation over two virtual adapters, one per VLAN, so each client has an active path through one VIOS and a passive path through the other, with the external ethernet switches carrying the untagged traffic.)
The Second VIO Server (using manual load balancing)

(Diagram: the same dual-VIOS NIB setup, but the active client interfaces are split across the two VIO servers.)

Note: If you split the active client interfaces across VIOSs,
those LPARs will talk to each other through the external
switches, not through the Hypervisor.
Etherchannel

The VIO server partitions can use an Etherchannel in link aggregation mode.

The VIO client partition can use an Etherchannel only in network interface backup (NIB) mode.
Link aggregation for client partitions is not supported/implemented.

The same PVID on both VIO servers is only allowed when using a second vSwitch (on
POWER6 and POWER7 firmware).

In case one VIOS fails, the Etherchannel in the client partition will automatically
switch to the second adapter.

The virtual server adapter must have the same PVIDs as the client adapters.
If this is true, all VLAN tags will be stripped off by the VIO server, which is a
requirement for PowerHA SystemMirror.
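In the client partition NIB is usually configured with smitty etherchannel; roughly the command-line equivalent (adapter names and the ping address are only placeholders):

mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=ent0 -a backup_adapter=ent1 -a netaddr=192.168.1.1

ent0 is the primary virtual adapter, ent1 the backup virtual adapter, and netaddr is the address pinged to detect a dead path.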
Dual VIO Server with Network Interface Backup (NIB)

Two virtual switches with the same VLAN ID

(Diagram: the PowerHA LPAR runs NIB over two virtual adapters (MAC Y on vSwitch0, MAC Z on vSwitch1, both VLAN 4); each vSwitch is bridged by the SEA of one VIOS, and each VIOS aggregates its four physical ports with IEEE 802.3ad link aggregation to its own external switch.)

Disadvantage:
More manual work on the LPARs
Should use an address to ping for the failover to work
(performance effect?)

Advantage:
Faster failover/failback than SEA failover (see next)

Still needs netmon.cf!


The second, the best-practice way! SEA failover

Starting with VIOS level 1.2 and HMC version 5.1,
the failover of an SEA device is supported in the system's firmware.

For this approach, two VIO servers with the same PVID for the SEA device have to be
configured.

A different trunk priority is assigned to each VIO server.

(Starting with firmware 720, an ARP storm cannot be created accidentally; the devices used
for the SEA will not become available should this configuration error accidentally have
been created.)

An additional heartbeat VLAN is used between the VIO servers for monitoring,
or on POWER7+/POWER8 a VLAN is used for this automatically.
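A sketch of creating the SEA with failover on each VIO server (device names are placeholders: ent0 = physical or link aggregation adapter, ent2 = virtual trunk adapter with PVID 1, ent3 = control channel adapter with PVID 99):

mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1 -attr ha_mode=auto ctl_chan=ent3
lsdev -dev entX -attr ha_mode (verify on the newly created SEA; the trunk priorities themselves are set in the virtual adapter definitions on the HMC)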
Dual VIOs and SEA failover

(Diagram: VIOS 1 hosts the primary SEA (ent3 over the physical ent0 and the virtual trunk adapter ent2, PVID 1), VIOS 2 hosts the backup SEA; a control channel on PVID 99 connects the two VIOSes; clients 1 and 2 each have a single virtual adapter on VLAN 1 and need no special configuration.)
Failure Situation

(Diagram: when VIOS 1, its physical adapter, or its external switch fails, the SEA on VIOS 2 switches from backup to primary; the clients keep using their single virtual adapter unchanged.)
Virtual Ethernet Adapters Configuration (PowerHA V6.1)

(Diagram: one PowerHA network net_ether_0 with service IPs 9.19.51.20/9.19.51.21, persistent IPs 9.19.51.10/9.19.51.11 and base addresses 192.168.100.1/192.168.100.2 on en0 of PowerHA Node 1 (Frame 1) and Node 2 (Frame 2); topsvcs heartbeating also runs over serial_net_0. Each frame has two VIO servers whose SEAs (over link aggregation, with a control channel) bridge the client's virtual adapter.)
Single Interface PowerHA Recommendation (PowerHA V6.1)
and PowerHA V7.1.3 with unicast addresses

(Diagram: one network net_ether_0 with service IPs 9.19.51.20/9.19.51.21 and base addresses 9.19.51.10/9.19.51.11 on en0 of PowerHA Node 1 (Frame 1) and Node 2 (Frame 2); topsvcs heartbeating also runs over serial_net_0.)

In single adapter configurations using IPAT via aliasing, the base address
and the service IP address can be on the same routable subnet.

This change translates into no longer needing a persistent IP
address on each node.

This change came into effect with APAR IZ26020 (5.4) and IZ31675 (5.3).
The change basically removes the verification check.
PowerHA and Single Interface Networks

Generally, single interface networks are not allowed as this limits the error detection
capabilities of PowerHA SystemMirror (V6.1 and also V7.1).

If you decide to use a single interface network, consider the following:

add external IP addresses to netmon.cf for additional analysis.
losing the network connection will result in a failover of all resource groups using this interface
(selective failover).
verify that the failover of the Etherchannel works properly. Adjust heartbeat parameters if necessary,
otherwise you may experience a DGSP.

IZ01331: NEW NETMON FUNCTIONALITY TO SUPPORT HACMP ON VIO

!REQD !ALL 100.12.7.9


!REQD !ALL 110.12.7.9
!REQD !ALL host5.ibm.com
!REQD en1 9.12.11.10

This is also/still true for PowerHA Version 7.1.x


netmon.cf changes APAR IZ01331

Changes to netmon.cf for virtualized LPARs


The "traditional" format of the netmon.cf file has not changed -- one
hostname or IP address per line.
Any adapter matching one or more "!REQD" lines (as the owner) will
ignore any traditional lines.
Order from one line to the other is unimportant;

you can mix "!REQD" lines with traditional ones in any way. However, if using a full 32 traditional lines, do
not put them all at the very beginning of the file otherwise each adapter will read in all the traditional lines
(since those lines apply to any adapter by default), stop at 32 and quit reading the file there. The same
problem is not true in reverse, as "!REQD" lines which do not apply to a given adapter will be skipped over
and not count toward the 32 maximum.

Comments (lines beginning with "#") are allowed on or between lines and
will be ignored.
If more than 32 "!REQD" lines are specified for the same owner, any extra
will simply be ignored (just as with traditional lines).
Is netmon.cf needed for PowerHA Version 7.1.x ??
PowerHA SystemMirror Version 7.1.0 said it is not needed,
but the same problems still arise.

So PowerHA SystemMirror Version 7.1.1 had to
reintroduce the netmon.cf functionality.

netmon.cf was handled by Topology Services, which have moved to or are
replaced by CAA, so PowerHA 7.1.1+ reintroduced netmon.cf, handled now
by Group Services (cthags).
PowerHA Version 7.1.x netmon.cf

For all virtualized network adapters


(virtual ethernet, IVE, and SR-IOV)
and single adapter networks !!

use the netmon.cf


by adding IP-Addresses/hostnames

What about PowerHA SystemMirror Version 7.1.3 ?

The problem is still the same.

Have multiple virtual networks.
(service/backup/admin networks with separate SEAs/physical adapters)
The virtualized cluster
Virtualized PowerHA SystemMirror (HACMP) clusters

(Diagram: two managed systems, each with dual VIO servers providing SEA failover for the network (primary and backup SEAs with a control channel on PVID 99) and MPIO vSCSI for the disks; each HACMP client LPAR uses a single virtual ethernet adapter (en0) and two vscsi adapters, and the shared LUNs are presented through the multi-path drivers of both VIOSes.)
NPIV: PowerHA SystemMirror Configuration Example

(Diagram: on System 1 and System 2, each cluster node gets its rootvg (hdisk0) over vSCSI, a vscsi_vg (hdisk1, hdisk2) over MPIO across vscsi0/vscsi1 served by both VIOSes, and an npiv_vg (hdisk3, hdisk4) over MPIO across the NPIV virtual FC adapters fcs0/fcs1; the vSCSI and NPIV LUNs come from the same shared storage subsystem.)
The biggest problem areas Split Brain

Split brain situations - keeping data consistent

Redundancy has moved out of the cluster


(and into the VIO Servers / VIO Setup)

No direct connectivity between VIO and cluster

Hung cluster nodes: write post-event scripts


VIO server maintenance
Maintenance of the VIO Server Partition

Situation is different from a VIO server shutdown:

- A SW update may cause temporarily inconsistent code levels.
- Can't rely on continuous operation.
Solution: manual failover

- chpath -l hdiskXY -p vscsiXY -s disable
- smitty etherchannel: force a failover (at least AIX 530-02 needed)
- for LVM mirrored disks, set the virtual SCSI target devices
to the 'defined' state in the VIO server partition.

The VIO partition needs to be rebooted after maintenance!

Reintegration:
Same as after a failure for vSCSI
Reconfigure Etherchannels back to their previous settings
Switchback of the SEA failover is done automatically.
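For the LVM-mirrored vSCSI case, setting the virtual target device to the defined state and bringing it back could look roughly like this on the VIO server (as padmin; vtscsi0 is only a placeholder):

rmdev -dev vtscsi0 -ucfg (unconfigure only, the device goes to Defined)
perform the maintenance, then shutdown -restart the VIO server
cfgdev (re-configure the devices after the maintenance)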
Virtual or Dedicated Adapters?

The decision whether virtual or dedicated devices are used depends on the following factors (pick
any two :-) ):

- Costs
- Performance
- Flexibility

From a performance point of view, virtual ethernet inside of a managed system delivers
very high data transfer rates, but this will need CPU cycles.

The use of virtual disk I/O is quite cheap, nevertheless some latency will be added to
your I/O.

NPIV Virtual FiberChannel gets nearer to the original cluster setups


VIO Server Failure

What happens if a VIO server is down?


Simple answer: It depends!
If you only use the VIO server for networking, the network connections may be lost,
depending on the setup. Then a failover of the resource group will occur.
If I/O and networking are done through the VIO server, you will lose all connections between
your cluster nodes. In this case, the LPAR needs to be shut down immediately.

Otherwise you may receive a DGSP, which will halt one of your cluster nodes.
So: redundant VIO servers + SEA failover + MPIO
or
redundant VIO servers + SEA failover + NPIV
I would recommend NPIV / virtual Fibre Channel: easier to set up and maintain,
more active resiliency.
VIO Server Failure / Maintenance / Reboot

Be sure to check all LPARs at all times !!

This is essential !!

There is no guarantee that your automated recovery settings
(vSCSI / SEA failover / NIB failback) will work.
So !!
Be sure to check all LPARs afterwards !!
POWER6 and POWER7 - PowerVM
POWER6/POWER7 new virtualization options

Live Partition Mobility - next charts
Integrated Virtual Ethernet - like virtual ethernet
Multiple shared pools - works, no PowerHA effects
Shared Dedicated Capacity - supported/works
VIO 2.1 NPIV - discussed already
VIO 2.1.1 Active Memory Sharing - yes, special considerations
POWER7 Active Memory Expansion - supported/works
VIO 2.2 Shared Storage Pools - not usable
VIO 2.2.1 Shared Storage Pools - supported
VIO 2.3.x Shared Storage Pools - supported
PowerHA (HACMP) and Live Partition Mobility

Minimum requirements:
Firmware Level: 01EM320_40 (or higher) (Nov 2007)
VIOS Level: 1.5.1.1-FP-10.1 (w/Interim Fix 071116)

HMC Level: 7.3.2.

Live Partition Mobility minimum levels per HACMP release:

HACMP 5.3 with AIX V5.3: HACMP APAR IZ07791, AIX 5.3.7.1, RSCT 2.4.5.4
HACMP 5.3 with AIX V6.1: HACMP APAR IZ07791, AIX 6.1.0.1, RSCT 2.5.0.0
HACMP 5.4.1 with AIX V5.3: HACMP APAR IZ02620, AIX 5.3.7.1, RSCT 2.4.5.4
HACMP 5.4.1 with AIX V6.1: HACMP APAR IZ02620, AIX 6.1.0.1, RSCT 2.5.0.0

HACMP tolerates Live Partition Mobility

Live Partition Mobility will not cause unwanted cluster events. When using Live Partition
Mobility, error detection rates have to be set to normal or slow, not fast.

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10640
Live Partition Mobility Steps

(Diagram: the mobile partition runs on the POWER6 source system; mover service partitions on the source and destination systems, coordinated by the HMC over ethernet, transfer it into a partition shell on the POWER6 destination system.)

The HMC creates a compatible partition shell on the destination system


The HMC configures the mover service partitions on the source and destination systems
The HMC issues a prepare for migration event to the source operating system
The HMC creates the necessary vSCSI and/or NPIV devices in the destination systems VIOSes
The source mover starts sending partition state to the destination mover
Once sufficient pages have moved, the Hypervisor suspends the source partition
During the suspension, the source mover partition continues to send partition state information
The mobile partition resumes execution on the destination server
The destination partition retries all pending I/O requests that were not completed
When the destination mover partition receives the last memory page the migration is complete
Live Partition Mobility - the migration process

(Diagram: the memory pages of AIX Client 1 are copied between the two POWER6 systems by the mover service partitions (VASI) in the VIOSes, coordinated by the HMC; the client's vSCSI and virtual ethernet devices are recreated on the destination VIOS, the partition on the source side is suspended, and the data stays on the shared storage subsystem.)
Live Partition Mobility

(Diagram: HACMP LPAR x is migrated from System A to System B while its cluster partner LPAR y keeps running on System X; the VIOSes of both systems provide SEA networking and virtual FC access to the same SAN LUN, the mover service partitions (VASI/MSP) use a private network, and the HMC controls the migration through the service processors.)
Live Partition Mobility to the standby system - OK?

(Diagram: the same setup as on the previous chart; the question is whether migrating the LPAR onto the system that hosts its standby cluster node is OK.)
Live Partition Mobility and PowerHA SystemMirror

Normal maintenance with PowerHA SystemMirror means:

Failover time
No high availability

Maintenance using LPM means:

No downtime because there is no failover
High availability, at least for OS and application

Maybe a SPOF because all LPARs are in one system

What about errors during LPM !?!


Live Partition Mobility to the standby system - OK? YES

Maintenance using LPM means:
No downtime because there is no failover
High availability, at least for OS and application
Same SPOF as before, because all LPARs are in one system

What about errors during LPM !?!
That is what PowerHA SystemMirror is for!

(Diagram: the same dual-system LPM picture as on the previous charts.)
POWER7 Active Memory Expansion
See AME sessions and Redbook for more details
http://www.redbooks.ibm.com/abstracts/sg247590.html

Hardware feature - uses free CPU cycles to compress/uncompress memory

New POWER7+ has accelerator to get a higher expansion factor with less CPU overhead.

AME is supported with PowerHA SystemMirror (HACMP)

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10708
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10705
Active Memory Sharing (AMS)

For details see AMS Redbook


http://www.redbooks.ibm.com/abstracts/redp4470.html

Use on standby LPARs

Faster than DLPAR of memory

Start testing to find out,
I have no active experience

Supported
AMS Think about the paging

Spread paging I/O across many disk spindles


5 second glitch or 0.5 second glitch

Cache
Disk write caches

Solid State Disks


Other virtualization features

Multiple shared pools - used for licensing:
are you in the right pool?

Shared Dedicated Capacity:
performance gain possible when switching,
unused cycles of the standby LPAR are added to
the shared pool
Discussion - Questions
Now, or send your questions and/or remarks by email

Stefan.Gocke@t-online.de

The End

Thank You and enjoy the rest of the conference


Continue growing your IBM skills

ibm.com/training provides a
comprehensive portfolio of skills and career
accelerators that are designed to meet all
your training needs.

Training in cities local to you - where and


when you need it, and in the format you want
Use IBM Training Search to locate public training classes
near to you with our five Global Training Providers
Private training is also available with our Global Training Providers

Demanding a high standard of quality


view the paths to success
Browse Training Paths and Certifications to find the
course that is right for you

If you can't find the training that is right for you with our
Global Training Providers, we can help.
Contact IBM Training at dpmc@us.ibm.com
Global Skills Initiative

Copyright IBM Corporation 2015
