
Booting up Solaris 10 from a SAN replicated LUN on a different Sun SPARC server

July 9, 2010  By Andrew Lin

The quickest way to recover from a total disaster is to have some sort of replication implemented. There are two different methods of real-time replication: hardware and software. My experience with software replication such as Symantec Veritas Volume Replicator for AIX was not pleasant; it required constant maintenance and troubleshooting. The best option is hardware replication if you can afford it. A lot of organizations pick software replication because it generally costs a lot less up front, but the cost of maintenance eventually adds up.

I will explain how to recover a Solaris 10 server from a hardware replicated SAN disk. It took me some time to figure out how to boot up from the replicated SAN LUN (disk), and many more hours to understand why the steps I applied work.

In this example I have a Sun SPARC M3000 server with two QLogic fibre channel cards (HBAs) installed in the PCI slots. The HBAs were already configured to connect to the SAN disk (LUN). This LUN contained the replicated copy of a production Solaris 10 server. The production server had two ZFS pools residing in a single LUN.

Using the Solaris 10 installation CD, boot the SPARC server into single user mode.

boot cdrom -s

The first thing you need to do is to see if the HBAs are working. The CONNECTED status indicates that communication between the server and the switch is working. There are two HBAs installed for redundancy, both connected to the same LUN.

# luxadm -e port
/devices/pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0:devctl    CONNECTED
/devices/pci@1,700000/pci@0/pci@0/SUNW,qlc@0/fp@0,0:devctl    CONNECTED

Now you need to find out if the SAN disk is visible from the server. Even though both HBAs are connected to the same SAN disk, you will see two separate SAN disks in the results below. It just means there are two paths to the SAN.

# luxadm probe
No Network Array enclosures found in /dev/es

Found Fibre Channel device(s):
  Node WWN:50060e80058c7b10  Device Type:Disk device
    Logical Path:/dev/rdsk/c1t50060E80058C7B10d1s2
  Node WWN:50060e80058c7b00  Device Type:Disk device
    Logical Path:/dev/rdsk/c2t50060E80058C7B00d1s2

In the above example the first LUN is c1t50060E80058C7B10d1s2. This is the logical device name, which is a symbolic link to the physical device name stored in the /devices directory. Logical device names contain the controller number (c1), target number (t50060E80058C7B10), disk number (d1), and slice number (s2).
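
A quick way to see this mapping for yourself is to list the symbolic link; this is just an illustrative check using the first device from the probe output, and the exact target on your system will differ.

# ls -l /dev/rdsk/c1t50060E80058C7B10d1s2

The link should point into the /devices tree, at the same physical path the format command reports for this disk later on.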

The next step is to find out how the disk is partitioned; the format command will give you that information. You need this information to understand how to boot from the disk.

# format
Searching for disks done

AVAILABLE DISK SELECTIONS:
  0. c1t50060E80058C7B10d1 <... 1066>
     /pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/ssd@w50060e80058c7b10,1
  1. c2t50060E80058C7B00d1 <... 1066>
     /pci@1,700000/pci@0/pci@0/SUNW,qlc@0/fp@0,0/ssd@w50060e80058c7b00,1

Select the first disk, 0.

Specify disk (enter its number): 0
selecting c1t50060E80058C7B10d1
[disk formatted]

FORMAT MENU:
    disk       - select a disk
    type       - select (define) a disk type
    partition  - select (define) a partition table
    current    - describe the current disk
    format     - format and analyze the disk
    repair     - repair a defective sector
    label      - write label to the disk
    analyze    - surface analysis
    defect     - defect list management
    backup     - search for backup labels
    verify     - read and display labels
    save       - save new disk/partition definitions
    inquiry    - show vendor, product and revision
    volname    - set 8-character volume name
    !<cmd>     - execute <cmd>, then return
    quit

Display the label and slices (partitions). In Solaris each slice is treated as a separate physical disk. In the example below you can tell that the disk is labeled with VTOC (Volume Table of Contents) because you can see cylinders. VTOC is also known as the SMI label. If the disk were labeled with EFI (Extensible Firmware Interface), you would see sectors instead of cylinders. Partition 0 (slice 0) holds the operating system files; it is the boot slice. Please note that you cannot boot from a disk with an EFI label. Slice 2 is the entire physical disk because it contains all cylinders, 0 - 65532.

format> verify

Primary label contents:

Volume name = <        >
ascii name  = <...>
pcyl        = 65535
ncyl        = 65533
acyl        =     2
nhead       =    15
nsect       =  1066

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0 -  3356       25.60GB    (3357/0/0)    53678430
  1 unassigned    wm       0                0         (0/0/0)              0
  2     backup    wm       0 - 65532      499.66GB    (65533/0/0) 1047872670
  3 unassigned    wm       0                0         (0/0/0)              0
  4 unassigned    wm       0                0         (0/0/0)              0
  5 unassigned    wm       0                0         (0/0/0)              0
  6 unassigned    wm       0                0         (0/0/0)              0
  7 unassigned    wm    3357 - 65532      474.07GB    (62176/0/0)  994194240
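
If you prefer to check the label and slice layout without stepping through the format menus, prtvtoc prints the same partition map directly from the raw device. This is just an optional cross-check, shown with the slice 2 device from this example.

# prtvtoc /dev/rdsk/c1t50060E80058C7B10d1s2

The output includes the disk geometry and a partition table equivalent to what format> verify reports.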

Here is what we know so far: the disk name is c1t50060E80058C7B10d1, and slice 0 on this disk contains the boot files. The physical path for this disk is /pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/ssd@w50060e80058c7b10,1. Now we need to find out the physical path for slice 0.

I know that the disk contains ZFS filesystems because it is a replica of the production disk. When a ZFS filesystem is moved to a different SPARC server it must first be imported, because the hostid is different.

List the ZFS pools contained on the disk using the zpool import command. There are two ZFS pools in the example below, epool and rpool. Take note of the status and action.

# zpool import
  pool: epool
    id: 16865366839830765202
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and the -f flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        epool                      ONLINE
          c2t50060E80058C7B00d1s7  ONLINE

  pool: rpool
    id: 10594898920105832331
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and the -f flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        rpool                      ONLINE
          c2t50060E80058C7B00d1s0  ONLINE

Import the pools with the zpool import command. The -a option will import all the ZFS pools it can find, and the -f option will force the import. If you do not specify the force option, the import may fail with the error cannot import rpool: pool may be in use from other system, it was last accessed by <server name> (hostid: 123456). Ignore the error messages about failed to create mountpoint.

# zpool import -af
cannot mount /epool : failed to create mountpoint
cannot mount /rpool : failed to create mountpoint

List the imported ZFS pools.

# zpool list
NAME    SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
epool   472G   260G    212G   55%   ONLINE
rpool  25.5G  5.75G   19.8G   22%   ONLINE
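
On Solaris 10 releases that support ZFS boot, the root pool also records its default boot environment in the bootfs pool property, which is a quick way to confirm which dataset the system boots from. An optional check, assuming the pool name from this example:

# zpool get bootfs rpool

This should report rpool/ROOT/zfsboot as the value, matching the dataset mounted at / in the next listing.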

List the ZFS filesystems. In the example below, notice that the mountpoint / is on the ZFS filesystem rpool/ROOT/zfsboot. This is the boot filesystem; it resides in rpool.

# zfs list
NAME                      USED   AVAIL   REFER   MOUNTPOINT
epool                     260G    205G     21K   /epool
rpool                    7.74G   17.4G     99K   /rpool
rpool/ROOT               4.74G   17.4G     21K   legacy
rpool/ROOT/zfsboot       4.74G   17.4G   4.48G   /
rpool/ROOT/zfsboot/var    139M   17.4G    105M   /var
rpool/dump               1.00G   17.4G   1.00G
rpool/swap               2.00G   19.4G     16K

Change the mountpoint for rpool/ROOT/zfsboot to /mnt so you can mount it and read its contents.

# zfs set mountpoint=/mnt rpool/ROOT/zfsboot

Confirm that the mountpoint was changed.

# zfs list
NAME                      USED   AVAIL   REFER   MOUNTPOINT
epool                     260G    205G     21K   /epool
rpool                    7.74G   17.4G     99K   /rpool
rpool/ROOT               4.74G   17.4G     21K   legacy
rpool/ROOT/zfsboot       4.74G   17.4G   4.48G   /mnt
rpool/ROOT/zfsboot/var    139M   17.4G    105M   /mnt/var
rpool/dump               1.00G   17.4G   1.00G
rpool/swap               2.00G   19.4G     16K

Now mount rpool/ROOT/zfsboot.

# zfs mount rpool/ROOT/zfsboot

List the logical disks.

# cd /dev/dsk
# ls
c1t50060E80058C7B10d1s0   c1t50060E80058C7B10d1s1   c1t50060E80058C7B10d1s2   c1t50060E80058C7B10d1s3
c1t50060E80058C7B10d1s4   c1t50060E80058C7B10d1s5   c1t50060E80058C7B10d1s6   c1t50060E80058C7B10d1s7
c2t50060E80058C7B00d1s0   c2t50060E80058C7B00d1s1   c2t50060E80058C7B00d1s2   c2t50060E80058C7B00d1s3
c2t50060E80058C7B00d1s4   c2t50060E80058C7B00d1s5   c2t50060E80058C7B00d1s6   c2t50060E80058C7B00d1s7
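
Since the target WWN is already known from the physical path, an optional shortcut is to grep the long listing of the same directory for it; this shows only the entries for that first path together with the /devices targets they link to (grep -i matches the WWN whether it appears in the uppercase device name or the lowercase link target). Purely a convenience, using the WWN from this example:

# ls -l /dev/dsk | grep -i 50060e80058c7b10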

As stated earlier, the physical path for the disk we are looking for is /pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/ssd@w50060e80058c7b10,1, and the boot slice is 0. We can see from the physical path that the disk name is 50060e80058c7b10. We also know from the output of the format command that the physical disk 50060e80058c7b10 maps to the logical disk c1t50060E80058C7B10d1. Therefore the logical boot disk is c1t50060E80058C7B10d1s0.

Now find out which physical path c1t50060E80058C7B10d1s0 is a symbolic link to, and that is your complete boot path. In the example below the boot path starts at the first slash (/) right after /devices. It is /pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/ssd@w50060e80058c7b10,1:a. You need to replace ssd@ with disk@ when entering the path into the EEPROM.

# ls -l c1t50060E80058C7B10d1s0
lrwxrwxrwx 1 root root 82 Jul 5 10:21 c1t50060E80058C7B10d1s0 -> ../../devices/pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/ssd@w50060e80058c7b10,1:a
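
If you find hand-editing the path error-prone, a small optional one-liner can build the EEPROM form of the boot path for you by stripping everything up to /devices and swapping ssd@ for disk@. This is just a convenience sketch using the logical device name from this example.

# ls -l /dev/dsk/c1t50060E80058C7B10d1s0 | sed -e 's|.*/devices||' -e 's|ssd@|disk@|'
/pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/disk@w50060e80058c7b10,1:a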

If you have more than one ZFS pool, the non-root pool may not get mounted when the server boots, as in this example. You may get the error below.

SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Mon Jul 5 11:54:14 EDT 2010
PLATFORM: SUNW,SPARC-Enterprise, CSN: PX654321, HOSTNAME: Andrew-Lin
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 33e5a9f1-49ac-6ebc-f2a9-dff25dea6b86
DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run zpool status -x and replace the bad device.

# zpool status -x
  pool: epool
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using zpool online.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME                                       STATE     READ WRITE CKSUM
        epool                                      UNAVAIL      0     0     0  insufficient replicas
          c3t60060E8005652C000000652C00002100d0s7  UNAVAIL      0     0     0  cannot open

The above error is caused by the zpool.cache file. This file contains the old disk paths from the previous server. The default behavior of Solaris 10 is to read the paths from the zpool.cache file to speed up the boot sequence. You should delete this file, and the system will recreate a fresh one during the boot sequence. Below are the steps to rename the zpool.cache file.

# cd /mnt/etc/zfs
# ls
zpool.cache
# mv zpool.cache zpool.cache.old

Now you need to reverse the change you applied to the mountpoint earlier. Make sure that you change directory out of /mnt to /, otherwise the set mountpoint command will fail with the error device busy. Ignore the cannot mount / : directory is not empty message.

# zfs set mountpoint=/ rpool/ROOT/zfsboot
cannot mount / : directory is not empty
property may be set but unable to remount filesystem

Confirm that the mountpoints were changed.

# zfs list
NAME                      USED   AVAIL   REFER   MOUNTPOINT
epool                     260G    205G     21K   /epool
rpool                    7.74G   17.4G     99K   /rpool
rpool/ROOT               4.74G   17.4G     21K   legacy
rpool/ROOT/zfsboot       4.74G   17.4G   4.48G   /
rpool/ROOT/zfsboot/var    139M   17.4G    105M   /var
rpool/dump               1.00G   17.4G   1.00G
rpool/swap               2.00G   19.4G     16K

Shut down the server.

# init 0

Now set the boot device in the EEPROM.

{0} ok setenv boot-device /pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/disk@w50060e80058c7b10,1:a

The server is ready to be booted with the boot command.

{0} ok boot
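
Optionally, before relying on the stored boot-device, you can sanity-check things from the ok prompt: printenv shows the value you just stored, and the device can also be booted directly by its full path for a one-off test. These are standard OpenBoot commands, shown here with the path from this example.

{0} ok printenv boot-device
{0} ok boot /pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/disk@w50060e80058c7b10,1:a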
