
Highly available data center on the Laptop/Desktop with Xen

Martin Bracher, Cyrill Müller, LinuxWorld 2006

Basel, Baden, Bern, Lausanne, Zürich, Düsseldorf, Frankfurt/M., Freiburg i. Br., Hamburg, München, Stuttgart

Data center on the Laptop with Xen


- Overview
- Installation and configuration Dom0
- Installation and configuration DomU

Data is always in play

LinuxWorld 2006

MnB / CyM

2006

Vocabulary: how the terms are used


- Host OS: XEN Domain 0, Dom0, privileged domain
- Guest OS: DomU, unprivileged domain, virtual machine, VM

- Paravirtualization: guest OS with a modified kernel
- Full / hardware virtualization, HVM: unmodified guest OS, requires CPU extensions
  - Intel Vanderpool (VT)
  - AMD Pacifica


High Availability (HA)


- HA is often a must in IT today
- Special hardware systems (certified, expensive) are required
- Often special and complex software is required (e.g. Oracle RAC)
- An identical test environment is required
  - You cannot test on the production system: if a change does not work, the system is not available (= no HA)

Can we build/emulate a test system with XEN?


- NO, we need a physical system
  - problems are often hardware-dependent (motherboard, controllers)
  - performance is hardware-dependent

- YES (next slides...)


High Availability (HA)


To handle complex HA systems, special knowledge is required
- DBAs/sysadmins must have this knowledge
- Education/training is required
- With new versions, experience with new functions, new syntax, ... must be gained

The test system must be a mirror of the production system
- When not necessary, do not change/destroy this system

Another system is required
- Low-cost system with HA functionality, but no redundancy
  - Disks not mirrored, less CPU, less memory, ...
- Not all data of a terabyte warehouse is required

Maybe we can use virtualization for these systems
- XEN

Introduction: Xen Virtualization


Paravirtualization
- Special kernel that knows about the virtualized environment
- Only the kernel must be modified, applications are unchanged

Hardware Virtualization (HVM)
- Unmodified guest OS can be used
  - Windows
  - Older Linux distributions without a XEN kernel (RHEL 4, SLES 9)
- Processors with special virtualization support are required
  - Intel Vanderpool (VT)
  - AMD Pacifica
- Can be used to emulate Windows clients
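Whether a processor offers the HVM extensions can be read from the CPU flags in /proc/cpuinfo: Intel VT appears as the flag vmx, AMD Pacifica as svm. A minimal sketch, assuming the text comes from a natively booted Linux kernel (a Dom0 kernel may not show these flags); the function name hvm_capability is hypothetical.

```python
from typing import Optional


def hvm_capability(cpuinfo_text: str) -> Optional[str]:
    """Return 'Intel VT' or 'AMD Pacifica' if a flags line shows HVM support, else None."""
    for line in cpuinfo_text.splitlines():
        # Each logical CPU has a line like "flags : fpu vme ... vmx ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            if "vmx" in flags:
                return "Intel VT"
            if "svm" in flags:
                return "AMD Pacifica"
    return None
```

Usage: pass the content of /proc/cpuinfo, e.g. hvm_capability(open("/proc/cpuinfo").read()); None means that only paravirtualized guests can be run.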


Introduction: Xen Architecture


[Figure: Xen architecture]
- Dom0 (host OS): XEN control software and unmodified user programs on a Linux XEN kernel with back-end device drivers and native device drivers
- DomU (guest OS): unmodified user programs on a modified kernel (paravirtualization) with front-end device drivers
- XEN Virtual Machine Monitor: control interface, safe H/W interface, event channel, virtual CPU, virtual MMU
- Hardware (CPU, Memory, Disk, Network, ...)


Introduction: Xen Network (Bridging)

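[Figure: network bridging on a computer running XEN]
- Dom0: eth0 (originally veth0) with the host's IP address (XXX.XXX.XXX.XXX) and MAC address (XX-XX-XX-XX-XX-XX), connected to the bridge xenbr0 through vif0.0
- Physical network card: peth0, connected to xenbr0 and to the network
- DomU-X: eth0, connected to xenbr0 through vifX.0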
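Which interfaces are attached to xenbr0 (peth0, vif0.0, vifX.0) can be checked in Dom0 with brctl show. A minimal sketch that turns that output into a mapping from bridge to interfaces, assuming the usual layout (header line, one line per bridge with name, bridge id, STP flag and the first interface, then indented continuation lines); the function names bridge_members and backend_vif are hypothetical. The backend device of a guest interface follows the vif<domain-id>.<interface> naming seen in the figure.

```python
from typing import Dict, List


def bridge_members(brctl_output: str) -> Dict[str, List[str]]:
    """Parse 'brctl show' output into {bridge: [interfaces]}."""
    bridges: Dict[str, List[str]] = {}
    current = None
    for line in brctl_output.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        fields = line.split()
        if line[0].isspace():
            # Continuation line: one more interface of the current bridge
            if current is not None:
                bridges[current].append(fields[0])
        else:
            # Bridge line: name, bridge id, STP flag, optional first interface
            current = fields[0]
            bridges[current] = fields[3:4]
    return bridges


def backend_vif(domain_id: int, interface: int) -> str:
    """Name of the Dom0 backend device of a guest interface, e.g. vif1.0."""
    return f"vif{domain_id}.{interface}"
```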



Introduction: RAC Architecture


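[Figure: RAC architecture]
- Server 1 (Oracle instance 1 with its SGA) and Server 2 (Oracle instance 2 with its SGA), both connected to the corporate network
- A private interconnect between the two servers
- Both instances access the same Oracle database on a SAN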


Example

HA system with Oracle Real Application Cluster
- Linux SuSE SLES 10
- Cluster with 2 nodes
- Oracle RAC database version 10.2
- Shared storage on a SAN
- Datafiles on a cluster filesystem (OCFS2)
- Cluster interconnect with bonding

XEN system with Oracle Real Application Cluster
- Linux SuSE SLES 10 (with SLES 9, hardware virtualization is required)
- Cluster with 2 nodes: 2 DomU
- Oracle RAC database version 10.2
- Shared storage on a SAN ???
- Datafiles on a cluster filesystem (OCFS2)
- Cluster interconnect with bonding ???


Location of the (virtual) systems


Represent a Real Application Cluster environment with XEN
- Storage, SAN: SLES10, Suse 10.1
  - can be in a XEN DomU (VM)
  - can be on the host OS (Dom0)
  - can be on a remote machine (laptop/desktop), on the host OS or in a VM
- RAC nodes: Enterprise Linux (SLES10)
  - can be in a VM
  - 1 node can be on the host OS

[Figure: possible placements of the SAN and the RAC nodes RAC1 and RAC2]


RAC requirements: Shared Storage


To set up a RAC environment, shared storage is required for
- OCR, the Oracle Cluster Registry
- Voting files
- All database files

These files can be
- on raw devices
- on a cluster filesystem
- only database files: in Oracle ASM (Automatic Storage Management)

Shared storage devices in a XEN environment can be
- Physical block devices (disk, partition, LVM)
- Files
- iSCSI devices


RAC requirements: Network


For a RAC environment, at least 2 interfaces are required
- Public interface (intranet)
- Private interface: cluster interconnect

Normally more interfaces are used for redundancy. In a XEN environment, we normally use virtual network devices
- eth0, eth1, eth2


Data center on the Laptop with Xen

Installation and configuration Dom0


Shared storage
(Shared) storage for VMs on the same Dom0
Physical devices

[Figure: RAC1 and RAC2 sharing the SAN storage]

disk = [ 'phy:sdc,hdb,r','phy:/dev/vg1/lv_vm1,sda,w' ]

Files
disk = [ 'file:/xen/vm1/hda,hda,w' ]

Can also be a sparse file (empty blocks are not yet allocated); this example creates a 1 GB file:
dd if=/dev/zero of=/xen/vm1/hda bs=1024k seek=1023 count=1

Slower, because the write cache is deactivated (SLES10 README). Not acceptable for a cluster: remove losetup -y in /etc/xen/scripts/block
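The dd command writes a single 1 MB block at an offset of 1023 MB, so the file is 1 GB long while only its last block is allocated. The same kind of sparse image can be sketched in Python by extending a new file with truncate(); the function names and the sizes are examples only, and the holes are only left on filesystems that support sparse files.

```python
import os


def create_sparse_image(path: str, size_mb: int) -> None:
    """Create a disk image of size_mb MiB without writing the empty blocks."""
    with open(path, "wb") as image:
        # Extending a file with truncate() leaves a hole instead of writing zeros
        image.truncate(size_mb * 1024 * 1024)


def allocated_bytes(path: str) -> int:
    """Bytes actually allocated on disk (on Linux, st_blocks counts 512-byte units)."""
    return os.stat(path).st_blocks * 512
```

create_sparse_image("/xen/vm1/hda", 1024) gives a file of the same size as the dd example; comparing os.path.getsize() with allocated_bytes() shows how much space is really used.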

For fully virtualized (HVM) domains (VT/Pacifica), add ioemu:


disk = [ 'phy:/dev/vg1/lv_vm1,ioemu:sda,w' ]


Shared storage
By default, XEN blocks multiple access to a device:
Error: Device 2048 (vbd) could not be connected. File /u03/xen/env4/san/sda is loopback-mounted through /dev/loop111, which is mounted in a guest domain, and so cannot be mounted now.

Append ! to the access mode (w!) to open the device in shared mode


disk = [ 'file:/u03/xen/env4/san/sda,sda,w!' ]
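Each entry of the disk list has the form type:device-or-file,device-in-DomU,mode; HVM domains prefix the DomU device with ioemu:, and w! opens a writable device in shared mode. A small sketch of a helper (the name disk_entry is hypothetical) that assembles such entries, for example to generate the configuration of several RAC nodes from one script:

```python
def disk_entry(backend: str, path: str, guest_dev: str,
               writable: bool = True, shared: bool = False,
               hvm: bool = False) -> str:
    """Build one entry of the Xen 'disk' list, e.g. 'file:/xen/vm1/hda,hda,w'."""
    if backend not in ("phy", "file"):
        raise ValueError("backend must be 'phy' or 'file'")
    mode = "w" if writable else "r"
    if shared:
        mode += "!"  # let several domains open the same device
    if hvm:
        guest_dev = "ioemu:" + guest_dev  # emulated device for HVM domains
    return f"{backend}:{path},{guest_dev},{mode}"
```

The resulting strings are placed in the list, e.g. disk = [ disk_entry("file", "/u03/xen/env4/san/sda", "sda", shared=True) ].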


Shared storage
Shared storage for VMs on different Dom0s (e.g. 2 laptops)
- The shared storage is handled on the Dom0 layer and presented to the DomU as on a single Dom0
  - Real shared storage (SCSI, fibre channel SAN)
  - When the device for the DomU is a disk file, it can be shared via NFS
  - iSCSI (SCSI over IP), visible like normal SCSI disks (slow)
- The shared storage is handled directly on the DomU layer
  - iSCSI (SCSI over IP), visible like normal SCSI disks (fast)
  - Direct access to PCI devices like SCSI/host bus adaptors

The most flexible solution is iSCSI (SCSI over IP)
- Allows migration of a whole DomU to another server

What is iSCSI
Fibre Channel, Host Bus Adaptor
- The HBA presents the SAN storage to the server like SCSI disks (LUNs)
- High-end solution: fast, complex and expensive
- Can also be used for XEN
- Optional: SAN switch, fabric

[Figure: two servers (172.16.1.2, 172.16.1.3) on a TCP/IP LAN, each connected to the SAN through a SCSI HBA]

What is iSCSI
iSCSI (SCSI over IP, RFC 3720, 256 pages)
- SCSI commands are encapsulated in TCP/IP packets and transferred via the network
- No expensive, special hardware and no SAN knowledge required
- Slower access time due to network delay (ping)

[Figure: servers and an iSCSI target connected through their NICs over a TCP/IP LAN (172.16.1.2, 172.16.1.3); the storage is presented as SCSI disks]


iSCSI target (the storage, SAN)


- You can buy storage hardware that offers iSCSI
- iscsitarget is Linux software that emulates an iSCSI storage box
  - iSCSI Enterprise Target: http://iscsitarget.sourceforge.net
  - Only available for newer kernels (e.g. Suse 10.x, SLES10, not SLES9)
  - Install iscsitarget.rpm and yast2-iscsi-server.rpm
  - Started by /etc/init.d/iscsitarget, daemon /usr/sbin/ietd


iSCSI target (the storage)


You can use disks, partitions, LVM volumes or files as iSCSI disks
- The same devices that can be presented as a disk to XEN
- All devices exported via iSCSI are presented as SCSI disks
- It is easy to convert existing XEN disks to iSCSI disks:
  - before: disk = ['phy:/dev/vg1/lv_vm1,sda,w' ]
  - present /dev/vg1/lv_vm1 as an iSCSI disk
  - then use the iSCSI disk: disk = ['phy:/dev/sdc,sda,w' ], but it is faster to connect to the target from within the DomU

Do NOT access the device both via iSCSI and locally
- Data corruption can happen


iSCSI target (the storage)


Every iSCSI device must have a worldwide unique identifier, the iSCSI Qualified Name:
- Prefix iqn.
- Date YYYY-MM. (month after the domain registration)
- Domain name (reversed)
- :string
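The naming rule can be sketched as a small builder plus a rough format check. The function names make_iqn and looks_like_iqn are hypothetical, and the pattern only covers the parts listed above (treating :string as optional), not every rule of RFC 3720.

```python
import re


def make_iqn(year: int, month: int, domain: str, name: str) -> str:
    """Build an iSCSI Qualified Name such as iqn.1996-11.ch.trivadis:disk1."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # Domain name in reversed order: trivadis.ch -> ch.trivadis
    reversed_domain = ".".join(reversed(domain.lower().split(".")))
    return f"iqn.{year:04d}-{month:02d}.{reversed_domain}:{name}"


# Prefix, date YYYY-MM, reversed domain, optional ":string"
IQN_PATTERN = re.compile(r"^iqn\.\d{4}-\d{2}\.[a-z0-9-]+(\.[a-z0-9-]+)*(:.+)?$")


def looks_like_iqn(value: str) -> bool:
    """Rough check that value follows the iqn.YYYY-MM.reversed.domain:string layout."""
    return IQN_PATTERN.match(value) is not None
```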

# /etc/ietd.conf
Target iqn.1996-11.ch.trivadis:disk1
    Lun 0 Path=/dev/sdc,Type=fileio
    Lun 1 Path=/dev/vg1/lv_vm1,Type=fileio
    MaxConnections 8

# More than 1 LUN is not configurable via the GUI
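Because more than one LUN has to be configured by hand, the target section can also be generated. A minimal sketch that renders a Target block in the layout of the example above; the function name ietd_target is hypothetical and only the options shown on this slide (Lun with Path and Type, MaxConnections) are supported.

```python
from typing import Sequence


def ietd_target(iqn: str, paths: Sequence[str], max_connections: int = 8,
                lun_type: str = "fileio") -> str:
    """Render one Target section for /etc/ietd.conf; LUNs are numbered from 0."""
    lines = [f"Target {iqn}"]
    for lun, path in enumerate(paths):
        lines.append(f"    Lun {lun} Path={path},Type={lun_type}")
    lines.append(f"    MaxConnections {max_connections}")
    return "\n".join(lines) + "\n"
```

ietd_target("iqn.1996-11.ch.trivadis:disk1", ["/dev/sdc", "/dev/vg1/lv_vm1"]) reproduces the section above.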


iSCSI initiator (client)


open-iscsi
- New client software
- Works with kernel 2.6.11+, included in SLES10 / Suse 10.x
  - In the initial kernel of SuSE 10.1, the kernel module is missing
- Persistent configuration in DBM databases
  - Discovery table (discovery.db)
  - Node table (node.db)
# communication with the /sbin/iscsid daemon
> iscsiadm -m discovery -t st -p 192.168.2.135:3260
[2f03cf] 192.168.2.135:3260,1 iqn.1996-11.ch.trivadis:disk1
> iscsiadm -m node    # show discovered targets
[2f03cf] 192.168.2.135:3260,1 iqn.1996-11.ch.trivadis:disk1
> iscsiadm -m node --record=2f03cf --login
> iscsiadm -m node -r 2f03cf -o update -n node.conn[0].startup -v automatic    # log in automatically at startup
> fdisk -l    # shows the iSCSI disks like /dev/sda, /dev/sdb, ...
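To use the record IDs in the following iscsiadm -m node commands from a script, the discovery lines ([record-id] portal,tpgt target-name) can be parsed. A sketch assuming exactly the output format of the open-iscsi version shown (other versions may omit the record ID); the names DiscoveredTarget and parse_discovery are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class DiscoveredTarget(NamedTuple):
    record: Optional[str]   # record ID such as 2f03cf, if printed
    portal: str             # IP address and port of the target
    tpgt: int               # target portal group tag
    target: str             # iSCSI qualified name


LINE = re.compile(r"^(?:\[(?P<record>[0-9a-f]+)\]\s+)?"
                  r"(?P<portal>\S+:\d+),(?P<tpgt>\d+)\s+(?P<target>\S+)$")


def parse_discovery(output: str) -> List[DiscoveredTarget]:
    """Parse lines printed by 'iscsiadm -m discovery' or 'iscsiadm -m node'."""
    targets = []
    for line in output.splitlines():
        match = LINE.match(line.strip())
        if match:
            targets.append(DiscoveredTarget(match["record"], match["portal"],
                                            int(match["tpgt"]), match["target"]))
    return targets
```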

Access to PCI devices


Using a PCI device exclusively in a DomU
- It is possible to access some PCI devices exclusively in a DomU, e.g. a network adaptor or an HBA
- At the moment only possible for paravirtualization

Dom0 kernel compiled with the following options:
sles10:/boot # grep PCIDEV config-2.6.16.20-0.12-xen
CONFIG_XEN_PCIDEV_BACKEND=m
# CONFIG_XEN_PCIDEV_BACKEND_VPCI is not set
CONFIG_XEN_PCIDEV_BACKEND_PASS=y

Rebuild the initrd with the following configuration in /etc/sysconfig/kernel:
> vi /etc/sysconfig/kernel
INITRD_MODULES="... pciback"
> mkinitrd


Access to PCI devices


Get the PCI ID of the device with lspci (something like xx:xx.x)
> lspci    # if you get "unknown device", try update-pciids first
02:01.0 Ethernet controller: Intel Corporation 82540EP Gigabit Ethernet Controller (Mobile) (rev 03)
07:01.0 Fibre Channel: QLogic Corp. QLA6312 Fibre Channel Adapter (rev 03)

Hide the IDs from Dom0 in /boot/grub/menu.lst:
module /boot/vmlinuz-xen ... pciback.hide=(02:01.0)(07:01.0)

Start the DomU:
> xm create -c xen1 pci=02:01.0 pci=07:01.0

or add the option to the config file


pci=['02:01.0','07:01.0']

When it works, lspci in the DomU shows the devices
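The boot parameter and the DomU option list the same IDs in two different notations. A sketch that builds both from the IDs reported by lspci; the function names are hypothetical, and the check only accepts the short bus:slot.function form used on this slide (no PCI domain prefix).

```python
import re
from typing import List, Sequence

# bus (2 hex digits) : slot (00-1f) . function (0-7)
PCI_ID = re.compile(r"^[0-9a-f]{2}:[0-1][0-9a-f]\.[0-7]$")


def check_pci_ids(ids: Sequence[str]) -> List[str]:
    """Normalize to lower case and reject IDs that are not bus:slot.function."""
    normalized = [pci_id.lower() for pci_id in ids]
    for pci_id in normalized:
        if not PCI_ID.match(pci_id):
            raise ValueError(f"not a bus:slot.function ID: {pci_id}")
    return normalized


def pciback_hide(ids: Sequence[str]) -> str:
    """Dom0 kernel parameter, e.g. pciback.hide=(02:01.0)(07:01.0)."""
    return "pciback.hide=" + "".join(f"({i})" for i in check_pci_ids(ids))


def domu_pci_option(ids: Sequence[str]) -> str:
    """Matching line for the DomU config file, e.g. pci=['02:01.0','07:01.0']."""
    return "pci=[" + ",".join(f"'{i}'" for i in check_pci_ids(ids)) + "]"
```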


Definition of DomU
Network
- Virtual Ethernet devices can be presented to the DomU
- Connected to a bridge (acts like a switch)
- The number of interfaces is restricted to 3
vif = [ 'mac=00:16:3e:36:56:b0', 'mac=00:16:3e:36:56:b1,bridge=xenbr1' , 'mac=00:16:3e:36:56:b4,bridge=xenbr2' ]
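The MAC addresses use the prefix 00:16:3e, the OUI registered for Xen guests. With several DomUs and several interfaces each, the addresses must be unique across all of them. A sketch that derives them from a node number and an interface index and writes the vif line; the numbering scheme and the function names are assumptions for illustration, not a Xen convention.

```python
from typing import Optional, Sequence

XEN_OUI = "00:16:3e"


def xen_mac(node: int, interface: int) -> str:
    """Deterministic MAC 00:16:3e:NN:NN:II for a node number and interface index."""
    if not (0 <= node <= 0xFFFF and 0 <= interface <= 0xFF):
        raise ValueError("node must fit in 16 bits, interface in 8 bits")
    return f"{XEN_OUI}:{node >> 8:02x}:{node & 0xFF:02x}:{interface:02x}"


def vif_list(node: int, bridges: Sequence[Optional[str]]) -> str:
    """One vif entry per interface; None keeps the default bridge."""
    entries = []
    for index, bridge in enumerate(bridges):
        entry = f"mac={xen_mac(node, index)}"
        if bridge is not None:
            entry += f",bridge={bridge}"
        entries.append(f"'{entry}'")
    return "vif = [ " + ", ".join(entries) + " ]"
```

vif_list(1, [None, "xenbr1", "xenbr2"]) yields three interfaces like the example above, with addresses that cannot collide with those of node 2.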


Definition of DomU
Kernel
- Specify the kernel and initrd to boot (paths in Dom0) and the root filesystem (relative to the DomU disk):
kernel = '/boot/vmlinuz-xen'
ramdisk = '/boot/initrd-xen'
root = "/dev/hda1 ro"

- Or use a special bootloader that is able to read the DomU disk:
bootloader = '/usr/lib/xen/boot/domUloader.py'
bootentry = 'hda2:/boot/vmlinuz-xen,/boot/initrd-xen'
root = '/dev/hda2 ro'

bootentry is given from the point of view of the DomU. The kernel and initrd to boot are copied to a temporary directory in Dom0.
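xm reads a DomU configuration file as Python source, so a file like the ones above can be loaded into a dictionary, e.g. to check it or to generate variants for further nodes. A minimal sketch, assuming a trusted file (exec runs arbitrary code); the function names and the check that either kernel or bootloader is set are illustrations of this slide, not part of xm.

```python
from typing import Any, Dict


def load_domu_config(text: str) -> Dict[str, Any]:
    """Evaluate a Xen DomU config (Python syntax) and return its variables."""
    namespace: Dict[str, Any] = {}
    exec(text, {}, namespace)  # only for trusted files: exec runs any code
    return namespace


def boot_method(config: Dict[str, Any]) -> str:
    """Report how the DomU is booted: 'bootloader' or 'kernel'."""
    if "bootloader" in config:
        return "bootloader"
    if "kernel" in config:
        return "kernel"
    raise ValueError("neither kernel nor bootloader is set")
```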

Use the same kernel type in Dom0 and DomU
- A xenpae kernel in Dom0 does not work with a xen kernel in the DomU


Data center on the Laptop with Xen

Installation and configuration DomU


OS installation
yast2 offers a module to install a DomU from installation media
- ISO images can be used (cdrom=/path/file.iso)
  - Changing the CD means: move the next image to the specified name
  - To re-read the (new) CD image, press eject when a graphical installation is preferred:

Reuse of an existing installation
- The XEN kernel may have to be added to it
- Do NOT boot an installation that is already running (filesystem corruption)
- With HVM virtualization, you can specify the whole disk (grub will be started)


Shared storage for Cluster-files


As already mentioned, the shared storage can be configured on the host OS and presented as a disk to the DomU
- Nothing special to configure in the DomU, but it can be extremely slow (iSCSI)

Shared storage via Open-iSCSI can be configured in the DomU
- Native performance (identical to non-XEN environments)

Exclusive access to a PCI HBA / SCSI controller for the DomU

In our example, we use iSCSI and access the shared storage via a cluster filesystem, OCFS2


Oracle Cluster Filesystem (OCFS2)


In August 2005, Oracle released the final version of OCFS2
- Today part of the official kernel

Key features / limitations of OCFS2
- Not compatible with OCFS1
- POSIX-compliant filesystem, not only for Oracle database files
- ORACLE_HOME can be a shared volume on OCFS2 (not recommended)
- Direct I/O and async I/O
- Journaling filesystem
- Codebase from ext3 used
- Default 512 TB, up to 4 PB possible

[Figure: Node1 and Node2, each with a listener, ORACM, GSD and an instance (TEST1, TEST2), sharing the volumes /u00, /u01, /u02]


Setup of OCFS2
For the initial setup, use the graphical tool ocfs2console
Enable startup in the runlevels:
/etc/init.d/o2cb enable
chkconfig ocfs2 on

Format the disk (shared storage between all RAC nodes) with block size 4K, cluster size 1M, label u05 and 4 node slots:
mkfs.ocfs2 -b 4K -C 1M -L u05 -N 4 /dev/sdb1

/etc/fstab:
LABEL=u05  /u05  ocfs2  _netdev,datavolume,nointr  0 0

- Use LABEL=... instead of the device path (the disk order can change)
  - Does not work with whole disks, e.g. /dev/sda: create a partition /dev/sda1

Fencing
Fencing is the act of forcefully removing a node from a cluster
- Fencing is done by a system panic
  - This normally results in a system reboot:
# /etc/sysctl.conf
# Reboot 60 seconds after a panic
kernel.panic = 60

Fencing happens when the node
- does not have a quorum (does not see 50% of the other nodes), or
- cannot write to the heartbeat within a defined period:
  (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, by default (7 - 1) * 2 = 12 seconds

On slow or virtual hardware (XEN), increase the threshold:
# /etc/sysconfig/o2cb
O2CB_HEARTBEAT_THRESHOLD=30    # (30 - 1) * 2 = 58 seconds

Use a different network interface (not the cluster-interconnect)
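The heartbeat timeout grows by 2 seconds per threshold step. A small helper based on the formula above converts in both directions, e.g. to pick a threshold for a desired timeout on slow virtual hardware; the function names are hypothetical.

```python
import math


def heartbeat_timeout(threshold: int) -> int:
    """Seconds until a node is fenced: (O2CB_HEARTBEAT_THRESHOLD - 1) * 2."""
    return (threshold - 1) * 2


def threshold_for(timeout_seconds: float) -> int:
    """Smallest threshold whose timeout is at least timeout_seconds."""
    return math.ceil(timeout_seconds / 2) + 1
```

heartbeat_timeout(7) gives the default of 12 seconds; threshold_for(60) gives 31.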



Network configuration
Virtual network cards (eth0, eth1, eth2)
- Configuration via YaST2

Physical network cards
- When the DomU is configured for exclusive access to the PCI card, configure it like a real network card

Bonding
- Use of 2 network devices with the same IP
- It is also possible to configure bonding on virtual devices
  - It does not make sense for speed, because it is not really faster
  - If you use bonding in production, use it here too, to have the same device names


Network configuration: Bonding for interconnect

[Figure: in each DomU, eth1 and eth2 are bonded into bond0]

Add 2 additional bridges in Dom0 (each acts like a switch)
/etc/xen/scripts/network-bridge start netdev=dummy0 \
    bridge=xenbr1 vifnum=1
/etc/xen/scripts/network-bridge start netdev=dummy1 \
    bridge=xenbr2 vifnum=2

Add 2 additional interfaces to DomU


vif = [ 'mac=00:16:3e:36:56:a0' , 'mac=00:16:3e:36:56:a1,bridge=xenbr1' , 'mac=00:16:3e:36:56:a2,bridge=xenbr2' ]
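Inside the DomU, the state of both bonded interfaces can be checked in the bonding driver's status file /proc/net/bonding/bond0. A sketch that extracts the MII status of each slave from its "Slave Interface:" and "MII Status:" lines; the function name slave_status is hypothetical and all other fields of the file are ignored.

```python
from typing import Dict


def slave_status(bonding_text: str) -> Dict[str, str]:
    """Map each slave interface to its MII status, e.g. {'eth1': 'up'}."""
    status: Dict[str, str] = {}
    current = None
    for line in bonding_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # MII Status lines before the first slave describe bond0 itself
            status[current] = value
    return status
```

Usage: slave_status(open("/proc/net/bonding/bond0").read()) should report "up" for eth1 and eth2.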


Oracle installation
Insert the installation CD and mount it
- If the CD-ROM or ISO image is not attached to the DomU, attach it now:
host# xm block-attach domU file:/tmp/10gR2.iso /dev/hdc ro
xen#  mount /dev/hdc /mnt
# or remove it:
xen#  umount /mnt
host# xm block-detach domU /dev/hdc

Remote installation as usual, no difference to real hardware
- ssh -X, vncviewer

Remove LD_ASSUME_KERNEL=... in $ORACLE_HOME/bin/*
- Incompatible with newer Linux distributions (SLES10)
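Editing every script by hand is tedious. A minimal sketch that disables the lines assigning LD_ASSUME_KERNEL in a script's text; it assumes the assignment starts a line (VAR=value or export VAR=value) and replaces it with unset LD_ASSUME_KERNEL instead of deleting it, so that an if/then block containing only that line stays valid shell. The function name is hypothetical; back up $ORACLE_HOME/bin before writing results back to the real files.

```python
import re

# Line that assigns LD_ASSUME_KERNEL, optionally via export, with its indentation
ASSIGNMENT = re.compile(r"^(\s*)(?:export\s+)?LD_ASSUME_KERNEL=.*$")


def disable_ld_assume_kernel(script: str) -> str:
    """Replace every LD_ASSUME_KERNEL assignment line with an unset."""
    result = []
    for line in script.splitlines(keepends=True):
        body = line.rstrip("\n")
        match = ASSIGNMENT.match(body)
        if match:
            newline = line[len(body):]
            indent = match.group(1)
            # Keep the old setting as a comment; unset keeps blocks non-empty
            line = f"{indent}unset LD_ASSUME_KERNEL  # was: {body.strip()}{newline}"
        result.append(line)
    return "".join(result)
```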

If Clusterware does not start after a reboot:
chkconfig init.crs off; chkconfig init.crs on

XEN for training in HA: core messages

What is possible
- Hardware-independent tests
  - syntax
  - features
- Education @home

What is not possible
- Real performance tests
- Hardware-related tests

Good for the desktop, bad for the laptop
- No standby/suspend supported
- Not all kernel modules available

