
NIM ALT_DISK_INSTALL on AIX

Purpose: To update nodes to TL 12 with SP 1 on clustered systems using alt_disk_install from
the NIM server. The nodes were at TL 8 SP 1.

This procedure was tested on an IBM cluster of 4 nodes and then implemented on an IBM
cluster of 166 nodes.
The 4-node test cluster had already been updated to TL 11 SP 3, but what we specifically wanted
to test was the NIM alt_disk_install process itself, and it succeeded.
There was only a small margin for error, since there was only 1 fileset difference between
TL 11 SP 3 and TL 12 SP 1.

Details of the 166 node cluster, for your convenience, can be found at:
http://www.navo.hpc.mil/davinci_about.html#overview

===================

Versions to consider on the test system nodes:

                     MS*          Nodes (4)

Hardware
Model                8203-E4A     9125-F2A
fwversion            EL320_040    ES350_085

Operating Systems    Previous version    Current version
AIX                  5300-11-03-1013     5300-12-01-1016

Software
GPFS                 3.2.1.21            3.2.1.21
CSM                  1.7.1.6             1.7.1.6
ESSL                 4.4.0.1             4.4.0.1
PESSL                3.3.0.3             3.3.0.3
LAPI                 2.4.7.3             2.4.7.4
POE                  5.2.1.6             5.2.1.6**
PBSPro               10.4.0.101257       10.4.4.110077
XLC/C++ (lslpp)      11.1.0.0            11.1.0.0
XLF (lslpp)          13.1.0.0            13.1.0.0
XLC/C++ (ndi)        11.1.0.0            11.1.0.0
XLF (ndi)            13.1.0.0            13.1.0.0

* MS is the Management Server which also is the NIM Server.

Versions to consider on the production system nodes:

                     MS*          Nodes (166)

Hardware
Model                8203-E4A     9125-F2A
fwversion            EL320_061    ES340_123

Operating Systems    Previous version    Current version
AIX                  5300-08-01-0819     5300-12-01-1016

Software
GPFS                 3.2.1.19            3.2.1.21
CSM                  1.7.0.13            1.7.1.6
ESSL                 4.3.0.0             4.4.0.1
PESSL                3.3.0.2             3.3.0.3
LAPI                 2.4.5.5             2.4.7.4
POE                  4.3.2.5             5.2.1.6**
PBSPro               10.4.0.101257       10.4.4.110077
XLC/C++ (lslpp)      11.1.0.0            11.1.0.0
XLF (lslpp)          13.1.0.0            13.1.0.0
XLC/C++ (ndi)        11.1.0.0            11.1.0.0
XLF (ndi)            13.1.0.0            13.1.0.0

* MS is the Management Server which also is the NIM Server.


** IBM needed to provide an eFix; /usr/lpp/ppe.poe/lib/libmpi_r.a was fixed.
Note: Our test systems and operational systems are now running identical software.
Our next activity will be to update the firmware on our operational systems.

Prepare directory structures for NIM operations

If you already have a NIM environment, you can most likely skip this section.

Special Note: I walked into this production environment without the benefit of having designed and
configured the systems from scratch.
With that in mind, the decision was made to create the NIM environment as if from scratch.
Below you will see the newer directory structure, which holds all the components
needed for a successful NIM installation.

mkdir /csminstall/NIM
mkdir /csminstall/NIM/STAGE
mkdir /csminstall/NIM/LPPSOURCE
mkdir /csminstall/NIM/SPOT
mv /csminstall/ZIPs/TL_5300-12-00-0000 /csminstall/NIM/STAGE
mv /csminstall/ZIPs/SP_5300-12-01-1016 /csminstall/NIM/STAGE
mkdir /csminstall/NIM/STAGE/AIX53-08
mv /csminstall/AIX.old/lppsource/installp/ppc /csminstall/NIM/STAGE/AIX53-08
cd /csminstall/NIM/STAGE/AIX53-08/ppc
mv * ..
mv .toc ..
cd ..
rm -fr ppc
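
The flattening step above can also be wrapped in a small helper. In a POSIX shell, "mv * .." does not match names that begin with a dot, so the .toc file needs separate handling. The following is only a sketch; the function name flatten_ppc is hypothetical and was not part of the procedure we ran:

```shell
#!/bin/sh
# flatten_ppc DIR: move everything under DIR/ppc (including dot files
# such as .toc) up into DIR, then remove the now-empty ppc directory.
# Hypothetical helper; not part of the original procedure.
flatten_ppc() {
    dir=$1
    [ -d "$dir/ppc" ] || { echo "no ppc directory under $dir" >&2; return 1; }
    # Three patterns: regular names, dot files, and names such as "..x".
    for f in "$dir"/ppc/* "$dir"/ppc/.[!.]* "$dir"/ppc/..?*; do
        # Skip any pattern that matched nothing.
        [ -e "$f" ] || continue
        mv "$f" "$dir"/ || return 1
    done
    # rmdir (rather than rm -fr) fails loudly if anything was left behind.
    rmdir "$dir/ppc"
}
```

Usage would be: flatten_ppc /csminstall/NIM/STAGE/AIX53-08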

Creating a new lpp_source

Notice that my STAGE area is where I placed the pre-existing lpp_source (AIX53-08).
From the AIX53-08 lpp_source I created a new lpp_source and then added TL 12 and SP 1 to it.
This preserved the old lpp_source and gave me a new lpp_source to work with. It also
provided an additional avenue for disaster recovery, should we ever need to reinstall the
AIX 5.3 TL 8 SP 1 image. Don't get me wrong, we have other avenues that are better and would
be the first lines of defense for recovery (i.e. mksysbs are saved off into a samfs archive
that gets copied to tape).

mkdir /csminstall/NIM/LPPSOURCE/lpp_aix53-12-01
smit nim_mkres
lpp_source
Define a Resource

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Resource Name [lpp_aix53-12-01]
* Resource Type lpp_source
* Server of Resource [master] +
* Location of Resource [/csminstall/NIM/LPPSOURCE/lpp_aix53-12-01> /
Architecture of Resource [power] +
Source of Install Images [/csminstall/NIM/STAGE/AIX53-08> +/
Names of Option Packages [all]
Show Progress [yes] +
Comments []

lsnim -l lpp_aix53-12-01
lpp_aix53-12-01:
class = resources
type = lpp_source
arch = power
Rstate = ready for use
prev_state = unavailable for use
location = /csminstall/NIM/LPPSOURCE/lpp_aix53-12-01
simages = yes
alloc_count = 0
server = master

NOTE: Notice simages = yes; this means the lpp_source contains the minimum set of install images
NIM needs, so it is adequate for NIM to use to install systems.
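
For reference, the SMIT panel above corresponds roughly to a "nim -o define" call. The sketch below only prints the command (a dry run) so it can be reviewed before being run on the NIM master. The function name is hypothetical, and packages=all simply mirrors the [all] entry in the panel; I did not run it this way, so treat it as an assumption rather than what SMIT executes:

```shell
#!/bin/sh
# Print (dry run) a nim command that defines an lpp_source.
# Arguments: resource name, location directory, source of install images.
# Hypothetical helper; review the printed command before running it.
lpp_source_define_cmd() {
    name=$1
    location=$2
    source=$3
    printf 'nim -o define -t lpp_source -a server=master -a location=%s -a source=%s -a packages=all %s\n' \
        "$location" "$source" "$name"
}

lpp_source_define_cmd lpp_aix53-12-01 \
    /csminstall/NIM/LPPSOURCE/lpp_aix53-12-01 \
    /csminstall/NIM/STAGE/AIX53-08
```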

Updating our lpp_source adding our TL 12

smit nim_update_add
TARGET lpp_source
lpp_aix53-12-01 resources lpp_source
Media, directory or lpp_source to copy images from [/csminstall/NIM/STAGE/TL_5300-12-00-0000]
Add Software to an lpp_source

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
TARGET lpp_source lpp_aix53-12-01
SOURCE of Software to Add /csminstall/NIM/STAGE/TL_5300-12-00-0000>
SOFTWARE Packages to Add [all] +
-OR-
INSTALLP BUNDLE containing packages to add [] +

gencopy Flags
DIRECTORY for temporary storage during copying [/tmp]
EXTEND filesystems if space needed? yes +
Process multiple volumes? yes +

Press <Enter>
Must exit smit
Press <F3>

Updating our lpp_source adding our SP 1

smit nim_update_add
TARGET lpp_source
lpp_aix53-12-01 resources lpp_source
Media, directory or lpp_source to copy images from [/csminstall/NIM/STAGE/SP_5300-12-01-1016]
Add Software to an lpp_source

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
TARGET lpp_source lpp_aix53-12-01
SOURCE of Software to Add /csminstall/NIM/STAGE/SP_5300-12-01-1016>
SOFTWARE Packages to Add [all] +
-OR-
INSTALLP BUNDLE containing packages to add [] +

gencopy Flags
DIRECTORY for temporary storage during copying [/tmp]
EXTEND filesystems if space needed? yes +
Process multiple volumes? yes +

Press <Enter>
Must exit smit
Press <F3>
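
The two "Add Software to an lpp_source" passes above (TL 12, then SP 1) correspond roughly to two "nim -o update" calls. This is a dry-run sketch that only prints the commands; the function name is hypothetical, and the gencopy options from the panel (temporary directory, extend filesystems) are deliberately left out because I have not verified their command line equivalents:

```shell
#!/bin/sh
# Print (dry run) one "nim -o update" command per staged directory.
# Arguments: lpp_source name, then one or more source directories.
# Hypothetical helper; review the printed commands before running them.
lpp_source_update_cmds() {
    lpp=$1
    shift
    for src in "$@"; do
        printf 'nim -o update -a packages=all -a source=%s %s\n' "$src" "$lpp"
    done
}

# TL first, then SP, in the same order as the SMIT passes.
lpp_source_update_cmds lpp_aix53-12-01 \
    /csminstall/NIM/STAGE/TL_5300-12-00-0000 \
    /csminstall/NIM/STAGE/SP_5300-12-01-1016
```

Running "nim -o check lpp_aix53-12-01" afterwards should have NIM re-examine the lpp_source and refresh its simages attribute.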

Create our SPOT

smit nim_mkres
spot

Define a Resource

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[Entry Fields]
* Resource Name [spot_aix53-12-01]
* Resource Type spot
* Server of Resource [master] +
* Source of Install Images [lpp_aix53-12-01] +
* Location of Resource [/csminstall/NIM/SPOT/spot_aix53-12-01> /
Expand file systems if space needed? yes +
Comments []

installp Flags
PREVIEW only? (install operation will NOT occur) no +
COMMIT software updates? yes +
SAVE replaced files? yes +
AUTOMATICALLY install requisite software? yes +
OVERWRITE same or newer versions? no +
VERIFY install and check file sizes? no +

Press <Enter>
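
As with the lpp_source, the SPOT panel corresponds roughly to a "nim -o define -t spot" call. The sketch below is a dry run that only prints the command. The function name is hypothetical, and the location argument is inferred from the lsnim output for the SPOT (NIM appends the SPOT name and /usr to the location it is given). The installp flags from the panel are omitted here, so NIM's defaults would apply:

```shell
#!/bin/sh
# Print (dry run) a nim command that defines a SPOT from an lpp_source.
# Arguments: SPOT name, parent location directory, lpp_source name.
# Hypothetical helper; review the printed command before running it.
spot_define_cmd() {
    name=$1
    location=$2
    lpp=$3
    printf 'nim -o define -t spot -a server=master -a source=%s -a location=%s %s\n' \
        "$lpp" "$location" "$name"
}

spot_define_cmd spot_aix53-12-01 \
    /csminstall/NIM/SPOT/spot_aix53-12-01 \
    lpp_aix53-12-01
```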

Checking on newly created resources

lsnim -l lpp_aix53-12-01
lpp_aix53-12-01:
class = resources
type = lpp_source
arch = power
Rstate = ready for use
prev_state = unavailable for use
location = /csminstall/NIM/LPPSOURCE/lpp_aix53-12-01
simages = yes
alloc_count = 0
server = master

NOTE: simages still being set to yes is very good. It means NIM has checked the filesets and
has determined that the lpp_source is healthy and can be used to do installs to our
nodes.

lsnim -l spot_aix53-12-01
spot_aix53-12-01:
class = resources
type = spot
plat_defined = chrp
arch = power
Rstate = ready for use
prev_state = verification is being performed
location = /csminstall/NIM/SPOT/spot_aix53-12-01/spot_aix53-12-01/usr
version = 5
release = 3
mod = 12
oslevel_r = 5300-12
alloc_count = 0
server = master
if_supported = chrp.mp ent
Rstate_result = success

Note: We now have an lpp_source and a SPOT that support what we are trying to do, which is to
update our nodes.
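
The checks above (Rstate, simages, oslevel_r) can be scripted instead of read by eye. A minimal sketch, assuming the "attribute = value" layout shown in the lsnim -l output above; the function names nim_attr and resource_ready are hypothetical:

```shell
#!/bin/sh
# nim_attr ATTR: read "lsnim -l" output on stdin and print the value of ATTR.
# Splits each line at the first "=" and trims blanks, so both
# "Rstate = ready for use" and "version =5" are handled.
nim_attr() {
    awk -v want="$1" '
        {
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == want) { print val; exit }
        }'
}

# resource_ready: succeed only if the resource on stdin is "ready for use".
resource_ready() {
    [ "$(nim_attr Rstate)" = "ready for use" ]
}
```

For example: lsnim -l spot_aix53-12-01 | resource_ready && echo "SPOT is ready"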

smit nim_alt_clone

Clone the rootvg to an Alternate Disk

Type or select values in entry fields.


Press Enter AFTER making all desired changes.

[TOP] [Entry Fields]


* Target Machine / Group to Install [AIXNodes] +
* Target Disk(s) to install [hdisk1]
Phase to execute all +
IMAGE_DATA resource [] +/
EXCLUDE_FILES resource [] +/
(leave blank to include all files in backup)

BUNDLE to install [] +
-OR-
Fileset(s) to install []

FIX_BUNDLE to install [] +
-OR-
FIXES to install [update_all]

LPP_SOURCE [lpp_aix53-12-01] +
(required if filesets, bundles or fixes used)

installp Flags
COMMIT software updates? yes +
SAVE replaced files? no +
AUTOMATICALLY install requisite software? yes +
EXTEND filesystems if space needed? yes +
OVERWRITE same or newer versions? no +
VERIFY install and check file sizes? yes +
ACCEPT new license agreements? yes +

Customization SCRIPT resource [] +/


Set bootlist to boot from this disk
on next reboot no +
Reboot when complete? no +
Verbose output? yes +
Debug output? yes +

Group controls (only valid for group targets):


Number of concurrent operations [] #
Time limit (hours) [] #

[root@l1n2](/var/adm/ras): 3909# cat alt_disk_inst.log


Mon Mar 14 17:40:16 GMT 2011
cmd: /usr/sbin/alt_disk_copy -D -N -B -P all -I -acNgXvY -l /tmp/_nim_dir_504736/lpp_source -V -d hdisk1
0505-124 alt_disk_install: option -b, -F, -f, or -w must be used in conjunction with the -l option.

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

[TOP]
+-----------------------------------------------------------------------------+
Concurrency Control
+-----------------------------------------------------------------------------+

Processing will begin with the first 4 machines from the group...

+-----------------------------------------------------------------------------+
Initiating "alt_disk_install" Operation
+-----------------------------------------------------------------------------+
Allocating resources ...

Initiating the alt_disk_install operation on machine 1 of 4: l1n3 ...

Initiating the alt_disk_install operation on machine 2 of 4: l1n2 ...

Initiating the alt_disk_install operation on machine 3 of 4: l1n4 ...

Initiating the alt_disk_install operation on machine 4 of 4: l1n1 ...

+-----------------------------------------------------------------------------+
"alt_disk_install" Operation Summary
+-----------------------------------------------------------------------------+
Target Result
------ ------
l1n3 INITIATED
l1n2 INITIATED
l1n4 INITIATED
l1n1 INITIATED

Note: Use the lsnim command to monitor progress of "INITIATED"
targets by viewing their NIM database definition.

exportfs: /csminstall/AIX/lppsource: A file or directory in the path name does not exist.

+-----------------------------------------------------------------------------+
Concurrency Control
+-----------------------------------------------------------------------------+
The first 4 machines have been processed. As machines finish
installing processing will resume with the remaining members
of the group, one at a time.

Status: Pending: 0 Installing: 4 Complete: 0 Failed: 0

Status: Pending: 0 Installing: 4 Complete: 0 Failed: 0

Status: Pending: 0 Installing: 4 Complete: 0 Failed: 0

Status: Pending: 0 Installing: 3 Complete: 1 Failed: 0

+-----------------------------------------------------------------------------+
Concurrency Control: "alt_disk_install" Operation Summary
+-----------------------------------------------------------------------------+
Target Result
------ ------
l1n3 COMPLETE
l1n2 COMPLETE
l1n4 COMPLETE
l1n1 COMPLETE

What the mount looked like during the alt_disk_install cloning process:
[root@l1n1](/root): 2029# mount
node mounted mounted over vfs date options
-------- --------------- --------------- ------ ------------ ---------------
/dev/hd4 / jfs2 Nov 09 21:33 rw,log=/dev/hd8
/dev/hd2 /usr jfs2 Nov 09 21:33 rw,log=/dev/hd8
/dev/hd9var /var jfs2 Nov 09 21:33 rw,log=/dev/hd8
/dev/hd3 /tmp jfs2 Nov 09 21:33 rw,log=/dev/hd8
/dev/hd1 /home jfs2 Nov 09 21:35 rw,log=/dev/hd8
/proc /proc procfs Nov 09 21:35 rw
/dev/hd10opt /opt jfs2 Nov 09 21:35 rw,log=/dev/hd8
/dev/lv00 /var/adm/csd jfs Nov 09 21:35 rw,log=/dev/loglv00
/dev/root_lv /root jfs2 Nov 09 21:35 rw,log=/dev/hd8
/dev/node_lv /node jfs2 Nov 09 21:35 rw,log=/dev/hd8
/dev/scratch_lv /scratch jfs2 Nov 09 21:35 rw,log=/dev/hd8
/dev/fslv00 /LL jfs2 Nov 09 21:35 rw,log=/dev/hd8
/dev/spadm_lv /spadm jfs2 Nov 09 21:35 rw,log=INLINE
/dev/ptfs_lv /ptfs jfs2 Nov 09 21:35 rw,log=/dev/hd8
/dev/gpfs_bk /backup mmfs Nov 09 21:41
rw,mtime,atime,quota=userquota;groupquota;filesetquota,dev=gpfs_bk
/dev/gpfs_hm /u/home mmfs Nov 09 21:41
rw,mtime,atime,quota=userquota;groupquota;filesetquota,dev=gpfs_hm
/dev/gpfs_sd /scheduler mmfs Nov 09 21:41
rw,mtime,atime,quota=userquota;groupquota;filesetquota,dev=gpfs_sd
/dev/gpfs_st /site mmfs Nov 09 21:41
rw,mtime,atime,quota=userquota;groupquota;filesetquota,dev=gpfs_st
/dev/gpfs_wk /scr mmfs Nov 09 21:41
rw,mtime,atime,quota=userquota;groupquota;filesetquota,dev=gpfs_wk
lms1 /csminstall/AIX/lpp_source/5300-12-01 /tmp/mnt nfs3 Mar 02 20:30
lms1.test.navo.hpc.mil /csminstall/NIM/LPPSOURCE/lpp_aix53-12-01
/tmp/_nim_dir_234254/lpp_source nfs3 Mar 14 18:07 hard,intr
/dev/alt_hd4 /alt_inst jfs2 Mar 14 18:09 rw,log=/dev/alt_hd8
/dev/alt_fslv00 /alt_inst/LL jfs2 Mar 14 18:09 rw,log=/dev/alt_hd8
/dev/alt_hd1 /alt_inst/home jfs2 Mar 14 18:09 rw,log=/dev/alt_hd8
/dev/alt_node_lv /alt_inst/node jfs2 Mar 14 18:09 rw,log=/dev/alt_hd8
/dev/alt_hd10opt /alt_inst/opt jfs2 Mar 14 18:09 rw,log=/dev/alt_hd8
/dev/alt_ptfs_lv /alt_inst/ptfs jfs2 Mar 14 18:09 rw,log=/dev/alt_hd8
/dev/alt_root_lv /alt_inst/root jfs2 Mar 14 18:09 rw,log=/dev/alt_hd8
/dev/alt_scratch_lv /alt_inst/scratch jfs2 Mar 14 18:09 rw,log=/dev/alt_hd8
/dev/alt_spadm_lv /alt_inst/spadm jfs2 Mar 14 18:09 rw,log=INLINE
/dev/alt_hd3 /alt_inst/tmp jfs2 Mar 14 18:09 rw,log=/dev/alt_hd8
/dev/alt_hd2 /alt_inst/usr jfs2 Mar 14 18:09 rw,log=/dev/alt_hd8
/dev/alt_hd9var /alt_inst/var jfs2 Mar 14 18:09 rw,log=/dev/alt_hd8
/dev/alt_lv00 /alt_inst/var/adm/csd jfs Mar 14 18:09 rw,log=/dev/alt_loglv00

What the mount looks like after the alt_disk_install process:

[root@l1n1](/root): 2037# mount


node mounted mounted over vfs date options
-------- --------------- --------------- ------ ------------ ---------------
/dev/hd4 / jfs2 Nov 09 21:33 rw,log=/dev/hd8
/dev/hd2 /usr jfs2 Nov 09 21:33 rw,log=/dev/hd8
/dev/hd9var /var jfs2 Nov 09 21:33 rw,log=/dev/hd8
/dev/hd3 /tmp jfs2 Nov 09 21:33 rw,log=/dev/hd8
/dev/hd1 /home jfs2 Nov 09 21:35 rw,log=/dev/hd8
/proc /proc procfs Nov 09 21:35 rw
/dev/hd10opt /opt jfs2 Nov 09 21:35 rw,log=/dev/hd8
/dev/lv00 /var/adm/csd jfs Nov 09 21:35 rw,log=/dev/loglv00
/dev/root_lv /root jfs2 Nov 09 21:35 rw,log=/dev/hd8
/dev/node_lv /node jfs2 Nov 09 21:35 rw,log=/dev/hd8
/dev/scratch_lv /scratch jfs2 Nov 09 21:35 rw,log=/dev/hd8
/dev/fslv00 /LL jfs2 Nov 09 21:35 rw,log=/dev/hd8
/dev/spadm_lv /spadm jfs2 Nov 09 21:35 rw,log=INLINE
/dev/ptfs_lv /ptfs jfs2 Nov 09 21:35 rw,log=/dev/hd8
/dev/gpfs_bk /backup mmfs Nov 09 21:41
rw,mtime,atime,quota=userquota;groupquota;filesetquota,dev=gpfs_bk
/dev/gpfs_hm /u/home mmfs Nov 09 21:41
rw,mtime,atime,quota=userquota;groupquota;filesetquota,dev=gpfs_hm
/dev/gpfs_sd /scheduler mmfs Nov 09 21:41
rw,mtime,atime,quota=userquota;groupquota;filesetquota,dev=gpfs_sd
/dev/gpfs_st /site mmfs Nov 09 21:41
rw,mtime,atime,quota=userquota;groupquota;filesetquota,dev=gpfs_st
/dev/gpfs_wk /scr mmfs Nov 09 21:41
rw,mtime,atime,quota=userquota;groupquota;filesetquota,dev=gpfs_wk
lms1 /csminstall/AIX/lpp_source/5300-12-01 /tmp/mnt nfs3 Mar 02 20:30
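
The difference between the two listings is that every /alt_inst mount is gone once the clone has finished. A small filter can check for that on each node. This is a sketch; the function name alt_inst_mounts is hypothetical:

```shell
#!/bin/sh
# alt_inst_mounts: read "mount" output on stdin and print every mount
# point that is /alt_inst itself or lies beneath it.
# Device names such as /dev/alt_hd4 are not matched.
alt_inst_mounts() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "/alt_inst" || index($i, "/alt_inst/") == 1) {
                print $i
                break
            }
        }
    }'
}
```

For example: mount | alt_inst_mounts. Empty output means the clone's filesystems have been unmounted. Running lspv on the node should then show the target disk (hdisk1 here) assigned to the altinst_rootvg volume group.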

SPECIAL NOTE: The following explains why we chose to use smit.

Basically, if an application is built to be used in smit, then we should use smit. Some
applications make additional calls to other scripts or functions that one would not
necessarily be aware of, and running them from the command line could miss critical
steps. An example is our NIM server, which uses the nim command line operation. When I try
to clone hdisk0 to hdisk1 as an alternate disk and then update the alternate disk to
technology level 12, I am unable to accept the license agreements using the nim options
that are available. Going through smit and doing the same thing, I do have the option to
accept the license agreements. See below for this example:

As above, we issue smit nim_alt_clone and fill out the appropriate fields for what we
want to do. Then we press <F6> to see the command/script that will be run. We will notice the
following:

Note: I formatted it for easier reading.

/usr/sbin/nim -o alt_disk_install
-a source=rootvg
-a disk="${TARGET}"
-a phase="${PHASE}"
-a image_data="$IMAGE_DATA"
-a exclude_files="$EXCLUDE_LIST"
-a installp_bundle="$INSTALLP_BUNDLE"
-a filesets="$FILESETS"
-a fix_bundle="$FIX_BUNDLE"
-a fixes="$FIXES"
-a installp_flags="-$FLAGS"
-a script="$SCRIPT"
-a set_bootlist="$SET_BOOTLIST"
-a boot_client="$BOOT_CLIENT"
-a show_details="$VERBOSE"
-a debug="$DEBUG"
-a time_limit="${GRP_TIME_LIMIT}"
-a concurrent="${GRP_CONCURRENT}"
${MACHINE}

Note:
$FLAGS is passed to nim_alt_clone, which takes the -z option to allow for installp-type flags such
as "ACCEPT new license agreements?".
nim_alt_clone -g 'AIXNodes' -t 'hdisk1' -P'all' -l 'lpp_aix53-12-01' -z 'c' -z 'N' -z 'g' -z 'X' -z '' -z 'v' -z 'Y' '-B'
nim_alt_clone is not a command for us to run from the command line; it is more of a function call for the
nim command to use.
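
Each -z value above lines up with one yes/no answer under "installp Flags" in the SMIT panel, in order; the empty -z '' is the "OVERWRITE same or newer versions? no" answer. The sketch below reproduces that mapping using the standard installp flag letters. The function name is hypothetical and is only meant to show how the panel answers turn into the flag string seen in alt_disk_inst.log (-acNgXvY, where the leading "a" is apply):

```shell
#!/bin/sh
# installp_flags COMMIT SAVE REQUISITES EXTEND OVERWRITE VERIFY LICENSES
# Each argument is "yes" or "no", in the order the SMIT panel lists them.
# Prints the installp flag letters without the leading "a" (apply).
installp_flags() {
    flags=
    [ "$1" = yes ] && flags="${flags}c"   # COMMIT software updates
    [ "$2" = no ]  && flags="${flags}N"   # do NOT save replaced files
    [ "$3" = yes ] && flags="${flags}g"   # install requisite software
    [ "$4" = yes ] && flags="${flags}X"   # extend filesystems if needed
    [ "$5" = yes ] && flags="${flags}F"   # overwrite same or newer versions
    [ "$6" = yes ] && flags="${flags}v"   # verify install, check file sizes
    [ "$7" = yes ] && flags="${flags}Y"   # accept new license agreements
    printf '%s\n' "$flags"
}

# The answers used in the clone panel in this document; prints cNgXvY
installp_flags yes no yes yes no yes yes
```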

It is better to use SMIT than to figure out how to make calls to nim_alt_clone. We could
have smit generate a script, but tweaking that script would be working blind, and we would
end up reverting to smit anyway.
