by Andrew Beekhof
pacemaker@clusterlabs.org
Table of Contents
Read-Me-First
What Is Pacemaker?
Pacemaker Architecture
Installation
OS Installation
Security Shortcuts
Setup
Configuring OpenAIS
Preparation
Preparation
Recovery
Example
Resources
Fencing
Apache
Read-Me-First
Given the graphical nature of the Fedora install process, a number of screenshots are included. However, the
guide is primarily composed of commands, the reasons for executing them, and their expected outputs.
If a command is listed and not followed by any output, then it should be assumed that the command does not
produce any.
What Is Pacemaker?
Pacemaker is a cluster resource manager. It achieves maximum availability for your cluster services (aka.
resources) by detecting and recovering from node- and resource-level failures, making use of the messaging
and membership capabilities provided by your preferred cluster infrastructure (either OpenAIS or Heartbeat).
Two-node Active/Passive clusters using Pacemaker and DRBD are a cost-effective solution for many High
Availability situations.
By supporting many nodes, Pacemaker can dramatically reduce hardware costs by allowing several active/
passive clusters to be combined and share a common backup node.
When shared storage is available, every node can potentially be used for failover. Pacemaker can even run
multiple copies of services to spread out the workload.
Pacemaker Architecture
At the highest level, the cluster is made up of three pieces:
• Core cluster infrastructure providing messaging and membership functionality (illustrated in red)
• Non-cluster aware components (illustrated in blue). In a Pacemaker cluster, these pieces include not only the
scripts that know how to start, stop and monitor resources, but also a local daemon that masks the
differences between the different standards these scripts implement.
• A brain (illustrated in green) that processes and reacts to events from the cluster (nodes leaving or joining) and
resources (eg. monitor failures) as well as configuration changes from the administrator. In response to all of
these events, Pacemaker will compute the ideal state of the cluster and plot a path to achieve it. This may
include moving resources, stopping nodes and even forcing them offline with remote power switches.
When combined with OpenAIS, Pacemaker also supports popular open source cluster filesystems 1. Due to
recent standardization in the cluster filesystem community, they all make use of a common distributed lock
manager, which uses OpenAIS for its messaging capabilities and Pacemaker for its membership (which
nodes are up/down) and fencing services.
1 Even though Pacemaker also supports Heartbeat, the filesystems need to use the stack for messaging and membership
and OpenAIS seems to be what they’re standardizing on. Technically it would be possible for them to support Heartbeat as
well, however there seems little interest in this.
Installation
OS Installation
Detailed instructions for installing Fedora are available at http://docs.fedoraproject.org/install-guide/f11/ in a
number of languages. The abbreviated version is:
Point your browser to http://fedoraproject.org/en/get-fedora-all, locate the Install Media section and download
the install DVD that matches your hardware.
Burn the disk image to a DVD2 and boot from it. Or use the image to boot a virtual machine as I have done here.
After clicking through the welcome screen, select your language and keyboard layout 3
2 http://docs.fedoraproject.org/readme-burning-isos/
3 http://docs.fedoraproject.org/install-guide/f11/en-US/html/s1-langselection-x86.html
Assign your machine a host name. 4 I happen to control the clusterlabs.org domain name, so I will use that here.
You will then be prompted to indicate the machine’s physical location and to supply a root password 5.
Now select where you want Fedora installed. 6 As I don’t care about any existing data, I will accept the default
and allow Fedora to use the complete drive.
4 http://docs.fedoraproject.org/install-guide/f11/en-US/html/sn-networkconfig-fedora.html
5 http://docs.fedoraproject.org/install-guide/f11/en-US/html/sn-account_configuration.html
6 http://docs.fedoraproject.org/install-guide/f11/en-US/html/s1-diskpartsetup-x86.html
The next step is to configure networking. Do not accept the default: cluster machines should never obtain an
IP address via DHCP. Here I will use the internal addresses for the clusterlabs.org network.
Next choose which software should be installed. Deselect the default "Office and Productivity" as it's not
appropriate for a cluster node. We'll install any needed software later. After you click next, Fedora will begin
installing.
Once the node reboots, follow the on screen instructions 7 to create a system user and configure the time. It is
highly recommended to enable NTP on your cluster nodes. Doing so ensures all nodes agree on the current
time and makes reading log files significantly easier.
Click through the next screens until you reach the login window. Click on the user you created and supply the
password you indicated earlier.
7 http://docs.fedoraproject.org/install-guide/f11/en-US/html/ch-firstboot.html
That was the last screenshot by the way, from here on in we’re going to be working from the terminal.
Switch to the super user account. You will need to supply the password you entered earlier during the
installation process.
su -
[beekhof@test1 ~]$ su -
Password:
[root@test1 ~]#
Note that the username (the text before the @ symbol) now indicates we’re running as the super user “root”.
Next, add the repository that provides up-to-date Pacemaker packages for Fedora 11:
wget -O /etc/yum.repos.d/pacemaker.repo \
  http://download.opensuse.org/repositories/server:/ha-clustering/Fedora_11/server:ha-clustering.repo
Before continuing, verify that the repository is installed and that it is providing Pacemaker packages (check that
the server_ha-clustering repository appears in the repolist and that pacemaker shows up as an available package).
yum repolist
yum info pacemaker
[root@test1 ~]# yum repolist
Loaded plugins: refresh-packagekit
server_ha-clustering | 951 B 00:00
server_ha-clustering/primary | 11 kB 00:00
server_ha-clustering 31/31
repo id repo name status
fedora Fedora 11 - x86_64 enabled: 16,577
server_ha-clustering High Availability/Clustering server technologies (Fedora_11) enabled: 31
updates Fedora 11 - x86_64 - Updates enabled: 5,947
repolist: 22,555
[root@test1 ~]# yum info pacemaker
Loaded plugins: refresh-packagekit
Available Packages
Name : pacemaker
Arch : i386
Version : 1.0.5
Release : 3.1
Size : 669 k
Repo : server_ha-clustering
Summary : Scalable High-Availability cluster resource manager
URL : http://www.clusterlabs.org
License : GPLv2+ and LGPLv2+
Description: Pacemaker is an advanced, scalable High-Availability cluster resource
: manager for Linux-HA (Heartbeat) and/or OpenAIS.
:
: It supports "n-node" clusters with significant capabilities for
: managing resources and dependencies.
:
: It will run scripts at initialization, when machines go up or down,
: when related resources fail and can be configured to periodically check
: resource health.
Name : pacemaker
Arch : x86_64
Version : 1.0.5
Release : 3.1
Size : 698 k
Repo : server_ha-clustering
Summary : Scalable High-Availability cluster resource manager
URL : http://www.clusterlabs.org
License : GPLv2+ and LGPLv2+
Description: Pacemaker is an advanced, scalable High-Availability cluster resource
: manager for Linux-HA (Heartbeat) and/or OpenAIS.
:
: It supports "n-node" clusters with significant capabilities for
: managing resources and dependencies.
:
: It will run scripts at initialization, when machines go up or down,
: when related resources fail and can be configured to periodically check
: resource health.
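The install itself follows. The command that produced the transaction below is not shown in this excerpt; it would have been along the lines of:
yum install -y pacemaker openais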
Dependencies Resolved
=======================================================================================================
Transaction Summary
=======================================================================================================
Install 12 Package(s)
Update 0 Package(s)
Remove 0 Package(s)
Installed:
openais.x86_64 0:0.80.5-15.1 pacemaker.x86_64 0:1.0.5-4.1
Dependency Installed:
OpenIPMI-libs.x86_64 0:2.0.16-1.fc11 PyXML.x86_64 0:0.8.4-14
cluster-glue.x86_64 0:1.0-12.1 cluster-glue-libs.x86_64 0:1.0-12.1
heartbeat.x86_64 0:3.0.0-33.2 libesmtp.x86_64 0:1.0.4-10.fc11
libopenais2.x86_64 0:0.80.5-15.1 openhpi-libs.x86_64 0:2.14.0-2.fc11
pacemaker-libs.x86_64 0:1.0.5-4.1 resource-agents.x86_64 0:1.0-31.4
Complete!
[root@test1 ~]#
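Next, the installed openais version is pinned so that a later update cannot pull in an incompatible build. This uses yum's versionlock plugin, installed with something like the following (the transaction below shows the result):
yum install -y yum-plugin-versionlock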
Dependencies Resolved
===========================================================================================================
Package Arch Version Repository Size
===========================================================================================================
Installing:
yum-plugin-versionlock noarch 1.1.22-1.fc11 updates 15 k
Transaction Summary
===========================================================================================================
Install 1 Package(s)
Update 0 Package(s)
Remove 0 Package(s)
Installed:
yum-plugin-versionlock.noarch 0:1.1.22-1.fc11
Complete!
[root@test1 ~]# rpm -q --qf "%{name}-%{version}-%{release}.%{arch}\n" openais >> /etc/yum/pluginconf.d/
versionlock.list
[root@test1 ~]# cat /etc/yum/pluginconf.d/versionlock.list
openais-0.80.5-15.1.x86_64
[root@test1 ~]#
Security Shortcuts
To simplify this guide and focus on the aspects directly connected to clustering, we will now disable the
machine's firewall and SELinux. Both of these actions create significant security issues and should
not be performed on machines that will be exposed to the outside world.
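A minimal sketch of what this might look like on Fedora 11, assuming the stock iptables service and the SELinux configuration in /etc/selinux/config:
sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
chkconfig iptables off
service iptables stop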
Now reboot all nodes so the new security settings take effect.
Setup
For the purposes of this document, the additional node is called test2 with address 192.168.9.42.
Finalize Networking
Confirm that you can communicate with the new node:
ping -c 3 192.168.9.42
[root@test1 ~]# ping -c 3 192.168.9.42
PING 192.168.9.42 (192.168.9.42) 56(84) bytes of data.
64 bytes from 192.168.9.42: icmp_seq=1 ttl=64 time=0.343 ms
64 bytes from 192.168.9.42: icmp_seq=2 ttl=64 time=0.402 ms
64 bytes from 192.168.9.42: icmp_seq=3 ttl=64 time=0.558 ms
Now we need to make sure we can communicate with the machines by their name. If you have a DNS server,
add entries for the cluster machines. Otherwise, you'll need to add them to /etc/hosts. Below
are the entries for my cluster nodes:
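Based on the addresses used throughout this guide, the /etc/hosts entries would be:
192.168.9.41 test1.clusterlabs.org test1
192.168.9.42 test2.clusterlabs.org test2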
ping test2
[root@test1 ~]# ping -c 3 test2
PING test2.clusterlabs.org (192.168.9.41) 56(84) bytes of data.
64 bytes from test1.clusterlabs.org (192.168.9.41): icmp_seq=1 ttl=64 time=0.164 ms
64 bytes from test1.clusterlabs.org (192.168.9.41): icmp_seq=2 ttl=64 time=0.475 ms
64 bytes from test1.clusterlabs.org (192.168.9.41): icmp_seq=3 ttl=64 time=0.186 ms
Configure SSH
SSH is a convenient and secure way to copy files and perform commands remotely. For the purposes of this
guide, we will create a key without a password (using the -N “” option) so that we can perform remote actions
without being prompted.
NOTE: Unprotected SSH keys, those without a password, are not recommended for servers exposed
to the outside world.
Create a new key and allow anyone with that key to log in:
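A sketch of this step, assuming a password-less DSA key in the default location:
ssh-keygen -t dsa -f ~/.ssh/id_dsa -N ""
cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys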
Install the key on the other nodes and test that you can now run commands remotely, without being prompted
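For example (assuming the .ssh directory on test2 does not yet contain anything you want to keep):
scp -r ~/.ssh test2:
ssh test2 -- uname -n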
uname -n
dnsdomainname
[root@test1 ~]# uname -n
test1.clusterlabs.org
[root@test1 ~]# dnsdomainname
clusterlabs.org
The output from the second command is fine, but we really don’t need the domain name included in the basic
host details. To address this, we need to update /etc/sysconfig/network. This is what it should look like before
we start.
cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=test1.clusterlabs.org
GATEWAY=192.168.9.1
All we need to do now is strip off the domain name portion, which is stored elsewhere anyway.
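One way to do this (a sketch; it simply deletes everything from the first ".letter" onwards on each line, which leaves the numeric GATEWAY entry untouched):
sed -i.bak 's/\.[a-z].*//g' /etc/sysconfig/network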
Now confirm the change was successful. The revised file contents should look something like this.
cat /etc/sysconfig/network
[root@test1 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=test1
GATEWAY=192.168.9.1
However we're not finished. The machine won't normally see the shortened host name until it reboots,
but we can force it to update.
source /etc/sysconfig/network
hostname $HOSTNAME
[root@test1 ~]# source /etc/sysconfig/network
[root@test1 ~]# hostname $HOSTNAME
uname -n
dnsdomainname
[root@test1 ~]# uname -n
test1
[root@test1 ~]# dnsdomainname
clusterlabs.org
Configuring OpenAIS
Choose a port number and multi-cast 8 address 9.
Be sure that the values you chose do not conflict with any existing clusters you might have. For advice on
choosing a multi-cast address, see http://www.29west.com/docs/THPM/multicast-address-assignment.html
For this document, I have chosen port 4000 and used 226.94.1.1 as the multi-cast address.
export ais_port=4000
export ais_mcast=226.94.1.1
Next we automatically determine the host's address. By not using the full address, we make the configuration
suitable to be copied to other nodes.
export ais_addr=`ip addr | grep "inet " | tail -n 1 | awk '{print $4}' | sed s/255/0/`
Once you’re happy with the chosen values, update the OpenAIS configuration
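Assuming the configuration file lives in /etc/ais/openais.conf (the default location for the openais packages used here), the three values can be substituted in with sed:
sed -i.gres "s/.*mcastaddr:.*/mcastaddr: $ais_mcast/g" /etc/ais/openais.conf
sed -i.gres "s/.*mcastport:.*/mcastport: $ais_port/g" /etc/ais/openais.conf
sed -i.gres "s/.*bindnetaddr:.*/bindnetaddr: $ais_addr/g" /etc/ais/openais.conf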
The final configuration should look something like the sample in the appendix.
/etc/init.d/openais start
[root@test1 ~]# /etc/init.d/openais start
Starting OpenAIS daemon (aisexec): starting... rc=0: OK
8 http://en.wikipedia.org/wiki/Multicast
9 http://en.wikipedia.org/wiki/Multicast_address
Check the cluster started correctly and that an initial membership was able to form
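One way to do this is to look for OpenAIS startup and membership (CLM) messages in the system log; the exact strings vary with the OpenAIS version, but something like the following works here:
grep -e "AIS Executive Service" -e "CLM CONFIGURATION CHANGE" /var/log/messages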
With one node functional, it's now safe to start OpenAIS on the second node as well.
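For example, using the SSH access configured earlier:
ssh test2 -- /etc/init.d/openais start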
The exact node id will depend on the IP address of the node; unless you also chose 192.168.9.41, you should
expect to see a number other than 688498880.
ps axf
[root@test1 ~]# ps axf
PID TTY STAT TIME COMMAND
2 ? S< 0:00 [kthreadd]
3 ? S< 0:00 \_ [migration/0]
... lots of processes ...
2166 pts/0 SLl 0:01 aisexec
2172 ? SLs 0:00 \_ /usr/lib64/heartbeat/stonithd
2173 pts/0 S 0:00 \_ /usr/lib64/heartbeat/cib
2174 pts/0 S 0:00 \_ /usr/lib64/heartbeat/lrmd
2175 pts/0 S 0:00 \_ /usr/lib64/heartbeat/attrd
2176 pts/0 S 0:00 \_ /usr/lib64/heartbeat/pengine
2177 pts/0 S 0:00 \_ /usr/lib64/heartbeat/crmd
2178 pts/0 S 0:00 \_ aisexec
And finally, check for any ERRORs during startup, there shouldn’t be any, and display the cluster’s status.
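A sketch of that final check:
grep ERROR: /var/log/messages
crm_mon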
Since Pacemaker 1.0, this has all changed and we have an integrated, scriptable, cluster shell that hides all the
messy XML scaffolding. It even allows you to queue up several changes at once and commit them atomically.
crm --help
[root@test1 ~]# crm --help
usage:
crm [-D display_type]
crm [-D display_type] args
crm [-D display_type] [-f file]
Examples:
# crm -f stopapp2.cli
# crm < stopapp2.cli
# crm resource stop global_www
# crm status
The primary tool for monitoring the status of the cluster is crm_mon (also available as crm status). It can be run in
a variety of modes and has a number of output options. To find out about any of the tools that come with
Pacemaker, simply invoke them with the --help option or consult the included man pages. Both sets of output
are generated from the tool itself, and so will always be in sync with each other and with the tool.
Additionally, the Pacemaker version and supported cluster stack(s) are available via the --version option.
crm_mon --version
crm_mon --help
[root@test1 ~]# crm_mon --version
crm_mon 1.0.5 for OpenAIS and Heartbeat (Build: 462f1569a43740667daf7b0f6b521742e9eb8fa7)
Modes:
-h, --as-html=value Write cluster status to the named file
-w, --web-cgi Web mode with output suitable for cgi
-s, --simple-status Display the cluster status once as a simple one line output (suitable for nagios)
-S, --snmp-traps=value Send SNMP traps to this station
-T, --mail-to=value Send Mail alerts to this user. See also --mail-from, --mail-host, --mail-prefix
Display Options:
-n, --group-by-node Group resources by node
-r, --inactive Display inactive resources
-f, --failcounts Display resource fail counts
-o, --operations Display resource operation history
-t, --timing-details Display resource operation history with timing details
Additional Options:
-i, --interval=value Update frequency in seconds
-1, --one-shot Display the cluster status once on the console and exit
-N, --disable-ncurses Disable the use of ncurses
-d, --daemonize Run in the background as a daemon
-p, --pid-file=value (Advanced) Daemon pid file location
-F, --mail-from=value Mail alerts should come from the named user
-H, --mail-host=value Mail alerts should be sent via the named host
-P, --mail-prefix=value Subjects for mail alerts should start with this string
-E, --external-agent=value A program to run when resource operations take place.
-e, --external-recipient=value A recipient for your program (assuming you want the program to send
something to someone).
Examples:
Display the cluster's status on the console with updates as they occur:
# crm_mon
Display the cluster's status on the console just once then exit:
# crm_mon -1
Display your cluster's status, group resources by node, and include inactive resources in the list:
# crm_mon --group-by-node --inactive
Start crm_mon as a background daemon and have it write the cluster's status to an HTML file:
# crm_mon --daemonize --as-html /path/to/docroot/filename.html
For those that are not afraid of XML, you can see the raw configuration by appending "xml" to the previous
command.
Before we make any changes, it's a good idea to check the validity of the configuration.
crm_verify -L
[root@test1 ~]# crm_verify -L
crm_verify[2195]: 2009/08/27_16:57:12 ERROR: unpack_resources: No STONITH resources have been defined
crm_verify[2195]: 2009/08/27_16:57:12 ERROR: unpack_resources: Either configure some or disable STONITH with
the stonith-enabled option
crm_verify[2195]: 2009/08/27_16:57:12 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH
to ensure data integrity
Errors found during check: config not valid
-V may provide more details
[root@test1 ~]#
In order to guarantee the safety of your data 10, Pacemaker ships with STONITH11 enabled. However it also
knows when no STONITH configuration has been supplied and reports this as a problem (since the cluster
would not be able to make progress if a situation requiring node fencing arose).
For now, we will disable this feature and configure it later in the Configuring STONITH section. It is important to
note that the use of STONITH is highly encouraged; turning it off tells the cluster to simply pretend that failed
nodes are safely powered off. Some vendors will even refuse to support clusters that have it disabled.
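The option in question is stonith-enabled; a one-line way to set it and then re-run the check is:
crm configure property stonith-enabled=false
crm_verify -L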
With the new cluster option set, the configuration is now valid.
11 A common node fencing mechanism. Used to ensure data integrity by powering off “bad” nodes.
Adding a Resource
The first thing we should do is configure an IP address. Regardless of where the cluster service(s) are running,
we need a consistent address to contact them on. Here I will choose and add 192.168.9.101 as the floating
address, give it the imaginative name ClusterIP and tell the cluster to check that it's running every 30 seconds.
Important: The chosen address must not be one already associated with a physical node
crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip=192.168.9.101 cidr_netmask=32 \
op monitor interval=30s
The other important piece of information here is ocf:heartbeat:IPaddr2. This tells Pacemaker three things about
the resource you want to add. The first field, ocf, is the standard to which the resource script conforms and
where to find it. The second field is specific to OCF resources and tells the cluster which namespace to find the
resource script in, in this case heartbeat. The last field indicates the name of the resource script.
crm ra classes
[root@test1 ~]# crm ra classes
heartbeat
lsb
ocf / heartbeat pacemaker
stonith
To then find all the OCF resource agents provided by Pacemaker and Heartbeat, run
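crm ra list ocf pacemaker
crm ra list ocf heartbeat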
Now verify that the IP resource has been added and display the cluster’s status to see that it is now active.
crm_mon
[root@test1 ~]# crm configure show
node test1
node test2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.9.101" cidr_netmask="32" \
op monitor interval="30s"
property $id="cib-bootstrap-options" \
dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
[root@test1 ~]# crm_mon
============
Last updated: Fri Aug 28 15:23:48 2009
Stack: openais
Current DC: test1 - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Perform a Failover
Being a high-availability cluster, we should test failover of our new resource before moving on.
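Assuming ClusterIP is currently active on test1, stop the cluster stack there:
/etc/init.d/openais stop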
Once OpenAIS is no longer running, go to the other node and check the cluster status with crm_mon.
Online: [ test2 ]
OFFLINE: [ test1 ]
There are three things to notice about the cluster’s current state. The first is that, as expected, test1 is now
offline. However we can also see that ClusterIP isn’t running anywhere!
A cluster is said to have quorum when more than half the known or expected nodes are online, or for the
mathematically inclined, whenever the following inequality is true:
total_nodes < 2 * active_nodes
Therefore a two-node cluster only has quorum when both nodes are running, which is no longer the case for our
cluster. This would normally make the creation of a two-node cluster pointless 12; however, it is possible to
control how Pacemaker behaves when quorum is lost. In particular, we can tell the cluster to simply ignore
quorum altogether.
12 Actually some would argue that two-node clusters are always pointless, but that is an argument for another time.
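This matches the no-quorum-policy="ignore" entry visible in the configuration later on; it can be set with:
crm configure property no-quorum-policy=ignore
crm configure show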
After a few moments, the cluster will start the IP address on the remaining node. Note that the cluster still does
not have quorum.
Online: [ test2 ]
OFFLINE: [ test1 ]
Now simulate node recovery by restarting the cluster stack on test1 and check the cluster’s status.
/etc/init.d/openais start
crm_mon
[root@test1 ~]# /etc/init.d/openais start
Starting OpenAIS daemon (aisexec): starting... rc=0: OK
[root@test1 ~]# crm_mon
============
Last updated: Fri Aug 28 15:32:13 2009
Stack: openais
Current DC: test2 - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Here we see something that some may consider surprising: the IP is back running at its original location!
To address this, Pacemaker has the concept of resource stickiness which controls how much a service prefers
to stay running where it is. You may like to think of it as the “cost” of any downtime. By default, Pacemaker
assumes there is zero cost associated with moving resources and will do so to achieve “optimal13” resource
placement. We can specify a different stickiness for every resource, but it is often sufficient to change the
default.
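The default used later in this guide (visible as resource-stickiness="100" in the configuration) can be set with:
crm configure rsc_defaults resource-stickiness=100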
If we now retry the failover test, we see that as expected ClusterIP still moves to test2 when test1 is taken offline.
Online: [ test2 ]
OFFLINE: [ test1 ]
13 It should be noted that Pacemaker's definition of optimal may not always agree with that of a human's. The order in which
Pacemaker processes lists of resources and nodes creates implicit preferences (required in order to reach a stable solution)
in situations where the administrator has not explicitly specified any.
However when we bring test1 back online, ClusterIP now remains running on test2.
Now that we have a basic but functional active/passive two-node cluster, we're ready to add some real services.
We're going to start with Apache because it's a feature of many clusters and relatively simple to configure.
Preparation
First we need to create a page for Apache to serve up. On Fedora 11 the default Apache docroot is /var/www/
html, so we’ll create an index file there.
cat <<-END >/var/www/html/index.html
<html>
<body>My Test Site - test1</body>
</html>
END
[root@test1 ~]# cat <<-END >/var/www/html/index.html
> <html>
> <body>My Test Site - test1</body>
> </html>
> END
[root@test1 ~]#
For the moment, we will simplify things by serving up only a static site and manually syncing the data between the
two nodes. So run the command again on test2.
cat <<-END >/var/www/html/index.html
<html>
<body>My Test Site - test2</body>
</html>
END
[root@test2 ~]# cat <<-END >/var/www/html/index.html
> <html>
> <body>My Test Site - test2</body>
> </html>
> END
[root@test2 ~]#
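The creation of the Apache resource itself is elided in this excerpt; based on the configuration shown below, and assuming the httpd package is already installed, it would look roughly like this:
crm configure primitive WebSite ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval=1min
Checking the status with crm_mon afterwards shows WebSite started, but on a different node than ClusterIP.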
Wait a moment, the WebSite resource isn’t running on the same host as our IP address!
14 Compare the key used here ocf:heartbeat:apache with the one we used earlier for the IP address: ocf:heartbeat:IPaddr2
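To ensure Apache runs on the same node as the address it serves, a colocation constraint is needed; based on the website-with-ip entry visible in the configuration below, it would be:
crm configure colocation website-with-ip inf: WebSite ClusterIP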
crm_mon
[root@test1 ~]# crm configure location prefer-test1 WebSite 50: test1
[root@test1 ~]# crm configure show
node test1
node test2
primitive WebSite ocf:heartbeat:apache \
params configfile="/etc/httpd/conf/httpd.conf" \
op monitor interval="1min"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.9.101" cidr_netmask="32" \
op monitor interval="30s"
location prefer-test1 WebSite 50: test1
colocation website-with-ip inf: WebSite ClusterIP
property $id="cib-bootstrap-options" \
dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
[root@test1 ~]# crm_mon
============
Last updated: Fri Aug 28 16:17:35 2009
Stack: openais
Current DC: test2 - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Even though we now prefer test1 over test2, that preference is (intentionally) less than the resource stickiness
(how much we preferred not to have unnecessary downtime).
To see the current placement scores, you can use a tool called ptest
ptest -sL
Notice how the colocation rule we created has ensured that ClusterIP was also moved to test1.
For the curious, we can see the effect of this command by examining the configuration
Note that the automated constraint is now gone. If we check the cluster status, we can also see that as
expected the resources are still active on test1.
Even if you're serving up static websites, having to manually synchronize the contents of that website to all the
machines in the cluster is not ideal. For dynamic websites, such as a wiki, it's not even an option. Not everyone
can afford network-attached storage, but somehow the data needs to be kept in sync. Enter DRBD, which can
be thought of as network-based RAID-1. See http://www.drbd.org/ for more details.
Install Pre-requisites
DRBD does not currently ship with Fedora and, since there is a kernel component, it can be sensitive to system
updates which change the kernel's APIs and ABIs. For this reason we'll simply build our own DRBD
packages - to be sure they are a perfect match for the machine.
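The install command that produced the transaction below is not shown here; judging by the "Installing" section of the output, it would have named the build prerequisites explicitly:
yum install -y flex gcc glibc-devel kernel-devel kernel-headers rpm-build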
Dependencies Resolved
==============================================================================================
Package Arch Version Repository Size
==============================================================================================
Installing:
flex x86_64 2.5.35-5.fc11 fedora 319 k
gcc x86_64 4.4.1-2.fc11 updates 11 M
glibc-devel x86_64 2.10.1-4 updates 998 k
kernel-devel x86_64 2.6.29.6-217.2.16.fc11 updates 6.2 M
kernel-headers x86_64 2.6.29.6-217.2.16.fc11 updates 855 k
rpm-build x86_64 4.7.1-1.fc11 updates 128 k
Installing for dependencies:
cloog-ppl x86_64 0.15-0.9.gitb9d79.fc11 updates 90 k
elfutils x86_64 0.142-1.fc11 updates 232 k
elfutils-libs x86_64 0.142-1.fc11 updates 205 k
glibc-headers x86_64 2.10.1-4 updates 630 k
ppl x86_64 0.10.2-3.fc11 updates 1.7 M
xz x86_64 4.999.8-0.8.beta.20090817git.fc11 updates 123 k
Transaction Summary
==============================================================================================
Install 12 Package(s)
Update 0 Package(s)
Remove 0 Package(s)
Total size: 22 M
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : elfutils-libs-0.142-1.fc11.x86_64 1/12
Installing : elfutils-0.142-1.fc11.x86_64 2/12
Installing : xz-4.999.8-0.8.beta.20090817git.fc11.x86_64 3/12
Installing : ppl-0.10.2-3.fc11.x86_64 4/12
Installing : cloog-ppl-0.15-0.9.gitb9d79.fc11.x86_64 5/12
Installing : kernel-headers-2.6.29.6-217.2.16.fc11.x86_64 6/12
Installing : rpm-build-4.7.1-1.fc11.x86_64 7/12
Installing : flex-2.5.35-5.fc11.x86_64 8/12
Installing : glibc-headers-2.10.1-4.x86_64 9/12
Installing : glibc-devel-2.10.1-4.x86_64 10/12
Installing : kernel-devel-2.6.29.6-217.2.16.fc11.x86_64 11/12
Installing : gcc-4.4.1-2.fc11.x86_64 12/12
Installed:
flex.x86_64 0:2.5.35-5.fc11 gcc.x86_64 0:4.4.1-2.fc11
glibc-devel.x86_64 0:2.10.1-4 kernel-devel.x86_64 0:2.6.29.6-217.2.16.fc11
kernel-headers.x86_64 0:2.6.29.6-217.2.16.fc11 rpm-build.x86_64 0:4.7.1-1.fc11
Dependency Installed:
cloog-ppl.x86_64 0:0.15-0.9.gitb9d79.fc11 elfutils.x86_64 0:0.142-1.fc11
elfutils-libs.x86_64 0:0.142-1.fc11 glibc-headers.x86_64 0:2.10.1-4
ppl.x86_64 0:0.10.2-3.fc11 xz.x86_64 0:4.999.8-0.8.beta.20090817git.fc11
Complete!
[root@test2 ~]#
wget http://oss.linbit.com/drbd/8.3/drbd-8.3.2.tar.gz
tar zxf drbd-8.3.2.tar.gz
cd drbd-8.3.2
sed -i.gres "s/%files\ -f\ %{_builddir}\/%{name}-%{version}\//%files\ -f\ /g" drbd.spec.in
grep files.*file.list *spec.in
make rpm
The following build log is quite long and is only included for reference in case it does not work on your machine.
Most people can skip to the end.
15 At the time of writing, the latest version was 8.3.2. If a later version is now available it would be advisable to try that first.
rm -f drbd_buildtag.c drbd_strings.c
rm -f *~
make[2]: Leaving directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/user'
make[2]: Entering directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/scripts'
rm -f *~
rm -f datadisk
make[2]: Leaving directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/scripts'
make[2]: Entering directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/documentation'
To clean the documentation: make doc-clean
make[2]: Leaving directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/documentation'
make[2]: Entering directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/drbd'
rm -rf .tmp_versions
rm -f *.[oas] *.ko .*.cmd .*.d .*.tmp *.mod.c .*.flags .depend .kernel*
make[2]: Leaving directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/drbd'
rm -f *~
rm -rf dist
make[1]: Leaving directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2'
+ '[' -e /etc/redhat-release ']'
+ make all doc PREFIX=/root/drbd-8.3.2/dist/install/ MANDIR=/usr/share/man KDIR=/lib/modules/
2.6.29.6-217.2.16.fc11.x86_64/build LOCALVERSION=
make[1]: Entering directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2'
make -C drbd drbd_buildtag.c
make[2]: Entering directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/drbd'
make[2]: Leaving directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/drbd'
make[2]: Entering directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/user'
flex -s -odrbdadm_scanner.c drbdadm_scanner.fl
gcc -O1 -c -W -Wall -I../drbd -c -o drbdadm_scanner.o drbdadm_scanner.c
gcc -O1 -c -W -Wall -I../drbd -c -o drbdadm_parser.o drbdadm_parser.c
gcc -O1 -c -W -Wall -I../drbd -c -o drbdadm_main.o drbdadm_main.c
drbdadm_main.c: In function ‘sanity_check_abs_cmd’:
drbdadm_main.c:2440: warning: suggest parentheses around operand of ‘!’ or change ‘&’ to ‘&&’ or ‘!’ to ‘~’
gcc -O1 -c -W -Wall -I../drbd -c -o drbdadm_adjust.o drbdadm_adjust.c
gcc -O1 -c -W -Wall -I../drbd -c -o drbdtool_common.o drbdtool_common.c
gcc -O1 -c -W -Wall -I../drbd -c -o drbdadm_usage_cnt.o drbdadm_usage_cnt.c
cp ../drbd/drbd_buildtag.c drbd_buildtag.c
gcc -O1 -c -W -Wall -I../drbd -c -o drbd_buildtag.o drbd_buildtag.c
gcc -O1 -c -W -Wall -I../drbd -c -o drbdadm_minor_table.o drbdadm_minor_table.c
gcc -o drbdadm drbdadm_scanner.o drbdadm_parser.o drbdadm_main.o drbdadm_adjust.o drbdtool_common.o
drbdadm_usage_cnt.o drbd_buildtag.o drbdadm_minor_table.o
gcc -O1 -c -W -Wall -I../drbd -c -o drbdmeta.o drbdmeta.c
flex -s -odrbdmeta_scanner.c drbdmeta_scanner.fl
gcc -O1 -c -W -Wall -I../drbd -c -o drbdmeta_scanner.o drbdmeta_scanner.c
gcc -o drbdmeta drbdmeta.o drbdmeta_scanner.o drbdtool_common.o drbd_buildtag.o
gcc -O1 -c -W -Wall -I../drbd -I/lib/modules/2.6.29.6-217.2.16.fc11.x86_64/build/include -c -o drbdsetup.o
drbdsetup.c
cp ../drbd/drbd_strings.c drbd_strings.c
gcc -O1 -c -W -Wall -I../drbd -I/lib/modules/2.6.29.6-217.2.16.fc11.x86_64/build/include -c -o
drbd_strings.o drbd_strings.c
gcc -o drbdsetup drbdsetup.o drbdtool_common.o drbd_buildtag.o drbd_strings.o
make[2]: Leaving directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/user'
make[2]: Entering directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/scripts'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/scripts'
make[2]: Entering directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/documentation'
To (re)make the documentation: make doc
make[2]: Leaving directory `/root/drbd-8.3.2/dist/BUILD/drbd-8.3.2/documentation'
Dependencies Resolved
==================================================================================
Package
Arch Version Repository Size
==================================================================================
Installing:
drbd x86_64 8.3.2-3 /drbd-8.3.2-3.x86_64 422 k
drbd-km-2.6.29.6_217.2.16.fc11.x86_64
x86_64 8.3.2-3 /drbd-km-2.6.29.6_217.2.16.fc11.x86_64-8.3.2-3.x86_64 3.7 M
Transaction Summary
==================================================================================
Install 2 Package(s)
Update 0 Package(s)
Remove 0 Package(s)
Installed:
drbd.x86_64 0:8.3.2-3 drbd-km-2.6.29.6_217.2.16.fc11.x86_64.x86_64 0:8.3.2-3
Complete!
By default, DRBD configures itself to start when the machine is powered on. However, since we want the cluster
to manage it, we will need to disable this behavior:
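One way to do this, assuming the package registered a drbd init script:
chkconfig --del drbd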
We could rebuild the DRBD packages on test2; however, if both nodes share the same architecture (x86_64 in this case)
we can reuse the ones we built for test1. Assuming this is the case for you, copy them to test2 and install:
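A sketch of that step; the exact location of the built packages depends on where make rpm put them, here assumed to be the dist/RPMS directory under the build tree:
scp dist/RPMS/x86_64/drbd-*.rpm test2:
ssh test2 -- yum localinstall -y --nogpgcheck drbd-*.rpm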
Dependencies Resolved
================================================================================
Package
Arch Version Repository Size
================================================================================
Installing:
drbd
x86_64 8.3.2-3 /drbd-8.3.2-3.x86_64 422 k
drbd-km-2.6.29.6_217.2.16.fc11.x86_64
x86_64 8.3.2-3 /drbd-km-2.6.29.6_217.2.16.fc11.x86_64-8.3.2-3.x86_64 3.7 M
Transaction Summary
================================================================================
Install 2 Package(s)
Update 0 Package(s)
Remove 0 Package(s)
Installed:
drbd.x86_64 0:8.3.2-3 drbd-km-2.6.29.6_217.2.16.fc11.x86_64.x86_64 0:8.3.2-3
Complete!
[root@test1 drbd-8.3.2]#
Configure DRBD
Before we configure DRBD, we need to set aside some disk for it to use.
If you arrived at this point without any space for a new LVM partition, you can cheat and steal some from the
swap device. This is less than an ideal solution, but it is significantly simpler than resizing a live root filesystem.
lvs
[root@test1 drbd-8.3.2]# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
lv_root vg_test1 -wi-ao 7.30G
lv_swap vg_test1 -wi-ao 512.00M
swapoff -a
lvremove -f vg_test1/lv_swap
lvcreate -n lv_swap -L 500m vg_test1
mkswap /dev/mapper/vg_test1-lv_swap
swapon -a
lvs
[root@test1 ~]# swapoff -a
[root@test1 drbd-8.3.2]# lvremove -f vg_test1/lv_swap
Logical volume "lv_swap" successfully removed
[root@test1 drbd-8.3.2]# lvcreate -n lv_swap -L 500m vg_test1
Logical volume "lv_swap" created
[root@test1 drbd-8.3.2]# mkswap /dev/mapper/vg_test1-lv_swap
Setting up swapspace version 1, size = 511996 KiB
no label, UUID=aaddc16a-b307-46b8-9efe-93cd8bbfc08d
[root@test1 drbd-8.3.2]# swapon -a
[root@test1 drbd-8.3.2]# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
lv_root vg_test1 -wi-ao 7.30G
lv_swap vg_test1 -wi-ao 500.00M
lvs
swapoff -a
lvremove -f vg_test2/lv_swap
lvcreate -n lv_swap -L 500m vg_test2
mkswap /dev/mapper/vg_test2-lv_swap
swapon -a
lvs
[root@test1 drbd-8.3.2]# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
lv_root vg_test2 -wi-ao 7.30G
lv_swap vg_test2 -wi-ao 512.00M
[root@test2 ~]# swapoff -a
[root@test2 ~]# lvremove -f vg_test2/lv_swap
Logical volume "lv_swap" successfully removed
[root@test2 ~]# lvcreate -n lv_swap -L 500m vg_test2
Logical volume "lv_swap" created
[root@test2 ~]# mkswap /dev/mapper/vg_test2-lv_swap
Setting up swapspace version 1, size = 511996 KiB
no label, UUID=aaddc16a-b307-46b8-9efe-93cd8bbfc08d
[root@test2 ~]# swapon -a
[root@test2 ~]# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
lv_root vg_test2 -wi-ao 7.30G
lv_swap vg_test2 -wi-ao 500.00M
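The volume DRBD will actually use on test1 is not shown in this excerpt; based on the matching commands for test2 below, it would be a small logical volume of the same size:
lvcreate -n drbd-demo -L 10m vg_test1
lvs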
Repeat this on the second node, be sure to use the same size partition.
lvs
lvcreate -n drbd-demo -L 10m vg_test2
lvs
[root@test2 ~]# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
lv_root vg_test2 -wi-ao 7.30G
lv_swap vg_test2 -wi-ao 500.00M
[root@test2 ~]# lvcreate -n drbd-demo -L 10m vg_test2
Detailed information on the directives used in this configuration (and other alternatives) is available from http://
www.drbd.org/users-guide/ch-configure.html
Write the following to the DRBD configuration file (typically /etc/drbd.conf), being sure to use the names and
addresses of your nodes if they differ from the ones used in this guide.
global {
usage-count yes;
}
common {
protocol C;
}
resource wwwdata {
meta-disk internal;
device /dev/drbd1;
syncer {
verify-alg sha1;
}
net {
allow-two-primaries;
}
on test1 {
disk /dev/mapper/vg_test1-drbd--demo;
address 192.168.9.41:7789;
}
on test2 {
disk /dev/mapper/vg_test2-drbd--demo;
address 192.168.9.42:7789;
}
}
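Before the resource can be brought up for the first time, its on-disk metadata must be created on each node; on test1 (the equivalent step for test2 is visible in its output below):
drbdadm --force create-md wwwdata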
Now load the DRBD kernel module and confirm that everything is sane
modprobe drbd
drbdadm up wwwdata
cat /proc/drbd
[root@test1 drbd-8.3.2]# modprobe drbd
[root@test1 drbd-8.3.2]# drbdadm up wwwdata
[root@test1 drbd-8.3.2]# cat /proc/drbd
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root@test1.clusterlabs.org, 2009-08-31 13:09:34
Repeat on the second node:
drbdadm --force create-md wwwdata
modprobe drbd
drbdadm up wwwdata
cat /proc/drbd
[root@test2 ~]# drbdadm --force create-md wwwdata
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
[root@test2 ~]# modprobe drbd
WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/.
[root@test2 ~]# drbdadm up wwwdata
[root@test2 ~]# cat /proc/drbd
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root@test1.clusterlabs.org, 2009-08-31 13:09:34
Now we need to tell DRBD which set of data to use. Since both sides contain garbage, we can run the
following on test1:
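The command is not shown in this excerpt; with DRBD 8.3 the usual way to pick the initial data source is to force test1 to become Primary and overwrite the peer, roughly:
drbdadm -- --overwrite-data-of-peer primary wwwdata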
test1 is now in the Primary state, which allows it to be written to. This means it's a good point at which to
create a filesystem and populate it with some data to serve up via our WebSite resource.
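Since the cluster configuration created later uses fstype="ext4", the filesystem would be created with:
mkfs.ext4 /dev/drbd1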
Now mount the newly created filesystem so we can create our index file
mount /dev/drbd1 /mnt/
cat <<-END >/mnt/index.html
<html>
<body>My Test Site - drbd</body>
</html>
END
umount /dev/drbd1
[root@test1 ~]# mount /dev/drbd1 /mnt/
[root@test1 ~]# cat <<-END >/mnt/index.html
> <html>
> <body>My Test Site - drbd</body>
> </html>
> END
[root@test1 ~]# umount /dev/drbd1
And finally, confirm the data is in sync between the two nodes
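Because verify-alg was set in the DRBD configuration, one way to do this is an online verify followed by a look at /proc/drbd (a sketch; the verify run can take a while):
drbdadm verify wwwdata
cat /proc/drbd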
First we launch the shell. The prompt will change to indicate you’re in interactive mode.
crm
[root@test1 ~]# crm
cib crm(live)#
Next we must create a working copy of the current configuration. This is where all our changes will go. The
cluster will not see any of them until we say it's OK. Notice again how the prompt changes, this time to indicate
that we're no longer looking at the live cluster.
Now we can create our DRBD clone and display the revised configuration.
Once we’re happy with the changes, we can tell the cluster to start using them and use crm_mon to check
everything is functioning.
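Assembled from the resource definitions that appear in the later configure show output, the whole interactive session would look roughly like this (the shadow CIB name "drbd" is an assumption):
crm
cib new drbd
configure primitive WebData ocf:linbit:drbd params drbd_resource=wwwdata op monitor interval=60s
configure ms WebDataClone WebData meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
configure show
cib commit drbd
quit
crm_mon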
Now that DRBD is functioning we can configure a Filesystem resource to use it. In addition to the filesystem’s
definition, we also need to tell the cluster where it can be located (only on the DRBD Primary) and when it is
allowed to start (after the Primary was promoted).
crm
cib new fs
configure primitive WebFS ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="ext4"
configure colocation fs_on_drbd inf: WebFS WebDataClone:Master
configure order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
[root@test1 ~]# crm
crm(live)# cib new fs
INFO: fs shadow CIB created
crm(fs)# configure primitive WebFS ocf:heartbeat:Filesystem params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="ext4"
crm(fs)# configure colocation fs_on_drbd inf: WebFS WebDataClone:Master
crm(fs)# configure order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
We also need to tell the cluster that Apache needs to run on the same machine as the filesystem and that it
must be active before Apache can start.
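Based on the WebSite-with-WebFS and WebSite-after-WebFS entries in the final configuration, the two constraints would be roughly:
configure colocation WebSite-with-WebFS inf: WebSite WebFS
configure order WebSite-after-WebFS inf: WebFS WebSite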
After reviewing the new configuration, we again upload it and watch the cluster put it into effect.
cib commit fs
quit
crm_mon
crm(fs)# cib commit fs
INFO: commited 'fs' shadow CIB to the cluster
crm(fs)# quit
bye
[root@test1 ~]# crm_mon
============
Last updated: Tue Sep 1 10:08:44 2009
Stack: openais
Current DC: test1 - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Testing Migration
We could shut down the active node again, but another way to safely simulate recovery is to put the node into
what is called “standby mode”. Nodes in this state tell the cluster that they are not allowed to run resources.
Any resources found active there will be moved elsewhere. This feature can be particularly useful when
updating the resources’ packages.
Put the local node into standby mode and observe the cluster move all the resources to the other node. Note
also that the node’s status will change to indicate that it can no longer host resources.
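crm node standby
crm_mon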
Once we’ve done everything we needed to on test1 (in this case nothing, we just wanted to see the resources
move), we can allow the node to be a full cluster member again.
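crm node online
crm_mon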
Notice that our resource stickiness settings prevent the services from migrating back to test1.
Conversion to Active/Active
The primary requirement for an Active/Active cluster is that the data required for your services is available,
simultaneously, on both machines. Pacemaker makes no requirement on how this is achieved; you could use a
SAN if you had one available, but since DRBD supports multiple Primaries, we can also use that.
The only hitch is that we need to use a cluster-aware filesystem (and the one we used earlier with DRBD, ext4, is
not one of those). GFS2 support is planned for later this year, but the best option currently is OCFS2, which
ships with Fedora 11.
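The install command that produced the transaction below would be something like:
yum install -y ocfs2-tools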
Dependencies Resolved
=================================================================================
Package Arch Version Repository Size
=================================================================================
Installing:
ocfs2-tools x86_64 1.4.1-41.1 server_ha-clustering 832 k
Installing for dependencies:
Transaction Summary
=================================================================================
Install 3 Package(s)
Update 0 Package(s)
Remove 0 Package(s)
Installed:
ocfs2-tools.x86_64 0:1.4.1-41.1
Dependency Installed:
libdlm.x86_64 0:2.99.08-15.1 libdlm2.x86_64 0:2.99.08-15.1
Complete!
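Two additional services are needed to use OCFS2 with Pacemaker. The first is the Distributed Lock Manager's control daemon, which is created as a cluster resource and cloned so that it runs on every node; reconstructed from the dlm and dlm-clone entries in the configuration shown later, the commands would look roughly like:
crm
cib new stack-glue
configure primitive dlm ocf:pacemaker:controld op monitor interval=120s
configure clone dlm-clone dlm meta interleave=true
configure show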
The second service is OCFS2’s own control daemon which also integrates with Pacemaker to obtain node
membership data.
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
Review the configuration before uploading it to the cluster, quitting the shell and watching the cluster’s response
cib commit stack-glue
quit
crm_mon
crm(stack-glue)# cib commit stack-glue
INFO: commited 'stack-glue' shadow CIB to the cluster
crm(stack-glue)# quit
bye
[root@test1 ~]# crm_mon
============
Last updated: Thu Sep 3 20:49:54 2009
Stack: openais
Current DC: test2 - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
5 Resources configured.
============
Before we do so, check to see that the o2cb service is not being automatically started when the machine boots.
This service is only used when OCFS2 uses its internal cluster stack. Neither it, nor the cluster.conf it creates
should be used when using OCFS2 with Pacemaker.
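A quick way to check and, if necessary, disable it (assuming the package installed an init script named o2cb):
chkconfig --list o2cb
chkconfig o2cb off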
There are currently a few minor issues with the o2cb script that need to be addressed before we use it. To
make the script compatible with Fedora 11, run the following commands (there should be no output).
Now use the crm shell to create the o2cb cluster resource:
crm
cib new oracle-glue
configure primitive o2cb ocf:ocfs2:o2cb op monitor interval=120s
configure clone o2cb-clone o2cb meta interleave=true
[root@test1 ~]# crm
crm(live)# cib new oracle-glue
INFO: oracle-glue shadow CIB created
crm(oracle-glue)# configure primitive o2cb ocf:ocfs2:o2cb op monitor interval=120s
crm(oracle-glue)# configure clone o2cb-clone o2cb meta interleave=true
Now ensure Pacemaker only starts the o2cb service on nodes that also have a copy of the dlm service (created above) already running
configure colocation o2cb-with-dlm INFINITY: o2cb-clone dlm-clone
configure order start-o2cb-after-dlm mandatory: dlm-clone o2cb-clone
crm(oracle-glue)# configure colocation o2cb-with-dlm INFINITY: o2cb-clone dlm-clone
crm(oracle-glue)# configure order start-o2cb-after-dlm mandatory: dlm-clone o2cb-clone
Review the configuration before uploading it to the cluster, quitting the shell and watching the cluster’s response
configure show
cib commit oracle-glue
quit
crm_mon
crm(oracle-glue)# configure show
node test1
node test2
primitive WebData ocf:linbit:drbd \
params drbd_resource="wwwdata" \
op monitor interval="60s"
primitive WebFS ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="ext4"
primitive WebSite ocf:heartbeat:apache \
params configfile="/etc/httpd/conf/httpd.conf" \
op monitor interval="1min"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.9.101" cidr_netmask="32" \
op monitor interval="30s"
primitive dlm ocf:pacemaker:controld \
op monitor interval="120s"
primitive o2cb ocf:ocfs2:o2cb \
op monitor interval="120s"
ms WebDataClone WebData \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone dlm-clone dlm \
meta interleave="true"
clone o2cb-clone o2cb \
meta interleave="true"
location prefer-test1 WebSite 50: test1
Preparation
Before we do anything to the existing partition, we need to make sure it is unmounted. We do this by telling the
cluster to stop the WebFS resource. This will ensure that other resources (in our case, Apache) using WebFS are
not only stopped, but stopped in the correct order.
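crm resource stop WebFS
crm_mon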
Important: This will erase all previous content stored on the DRBD device. Ensure you have a copy
of any important data.
The OCFS2 defaults are pretty good, so simply point mkfs to the DRBD partition we’re going to use.
mkfs.ocfs2 /dev/drbd1
[root@test1 ~]# mkfs.ocfs2 /dev/drbd1
mkfs.ocfs2 1.4.2
Cluster stack: pcmk
Cluster name: pacemaker
NOTE: Selecting extended slot map for userspace cluster stack
Filesystem label=
Block size=1024 (bits=10)
Cluster size=4096 (bits=12)
Volume size=12541952 (3062 clusters) (12248 blocks)
1 cluster groups (tail covers 3062 clusters, rest cover 3062 clusters)
Journal size=4194304
Initial number of node slots: 2
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 0 block(s)
Formatting Journals: done
Formatting slot map: done
Writing lost+found: done
Formatting quota files: done
mkfs.ocfs2 successful
[root@test1 ~]#
Then (re)populate the new filesystem with data (web pages). For now we’ll create another variation on our home
page.
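Mirroring the earlier ext4 example, this could be done by mounting the device on one node and writing a new index page (the page text here is just an example):
mount /dev/drbd1 /mnt/
cat <<-END >/mnt/index.html
<html>
<body>My Test Site - OCFS2</body>
</html>
END
umount /dev/drbd1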
Now that we’ve recreated the resource, we also need to recreate all the constraints that used it. This is
because the shell will automatically remove any constraints that referenced WebFS.
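Reconstructed from the entries in the final configuration, the recreated constraints would look roughly like this (run from the crm shell; the Apache colocation and ordering constraints from earlier would be re-added in the same way):
configure colocation WebFS-with-o2cb inf: WebFS o2cb-clone
configure colocation fs_on_drbd inf: WebFS WebDataClone:Master
configure order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
configure order start-WebFS-after-o2cb inf: o2cb-clone WebFS:start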
Review the configuration before uploading it to the cluster, quitting the shell and watching the cluster’s response
This will involve a number of changes, so we’ll again use interactive mode.
crm
cib new active
There's no point making the services active on both locations if we can't reach them, so let's first clone the IP
address. Cloned IPaddr2 resources use an iptables rule to ensure that each request is only processed by one of
the two clone instances. The additional meta options tell the cluster how many instances of the clone we want
(one "request bucket" for each node) and that if all other nodes fail, then the remaining node should hold all of
them. Otherwise the requests would be simply discarded.
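Based on the WebIP clone visible in the configuration below, the command would be:
configure clone WebIP ClusterIP meta globally-unique="true" clone-max="2" clone-node-max="2"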
Notice how any constraints that referenced ClusterIP have been updated to use WebIP instead. This is an
additional benefit of using the crm shell.
Next we need to convert the filesystem and Apache resources into clones. Again, the shell will automatically
update any relevant constraints.
The last step is to tell the cluster that it is now allowed to promote both instances to be Primary (aka. Master).
Change master-max to 2
configure show
crm(active)# configure clone WebFSClone WebFS
crm(active)# configure clone WebSiteClone WebSite
crm(active)# configure edit WebDataClone
crm(active)# configure show
node test1
node test2
primitive WebData ocf:linbit:drbd \
params drbd_resource="wwwdata" \
op monitor interval="60s"
primitive WebFS ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="ocfs2"
primitive WebSite ocf:heartbeat:apache \
params configfile="/etc/httpd/conf/httpd.conf" \
op monitor interval="1min"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.9.101" cidr_netmask="32" \
op monitor interval="30s"
primitive dlm ocf:pacemaker:controld \
op monitor interval="120s"
primitive o2cb ocf:ocfs2:o2cb \
op monitor interval="120s"
ms WebDataClone WebData \
meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone WebFSClone WebFS
clone WebIP ClusterIP \
meta globally-unique="true" clone-max="2" clone-node-max="2"
Review the configuration before uploading it to the cluster, quitting the shell and watching the cluster’s response
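As before, this amounts to committing the shadow CIB created at the start of this section:
cib commit active
quit
crm_mon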
Recovery
Configure STONITH
Just because a node is unresponsive doesn't mean it has stopped accessing your data. The only way to be 100%
sure that your data is safe is to use STONITH, so we can be certain that the node is truly offline before allowing
the data to be accessed from another node.
STONITH also has a role to play in the event that a clustered service cannot be stopped. In this case, the
cluster uses STONITH to force the whole node offline, thereby making it safe to start the service elsewhere.
The biggest mistake people make in choosing a STONITH device is to use a remote power switch (such as many
onboard IPMI controllers) that shares power with the node it controls. In such cases, the cluster cannot be sure
whether the node is really offline or is active but suffering from a network fault.
Likewise, any device that relies on the machine being active (such as SSH-based "devices" used during testing)
is inappropriate.
Configuring STONITH
1. Find the correct driver: stonith -L
2. Since every device is different, the parameters needed to configure it will vary. To find out the parameters required by
the device: stonith -t {type} -n
Hopefully the developers chose names that make sense, if not you can query for some additional information by finding
an active cluster node and running:
The output should be XML formatted text containing additional parameter descriptions
3. Create a file called stonith.xml containing a primitive resource with a class of stonith, a type of {type} and a parameter
for each of the values returned in step 2
4. Create a clone from the primitive resource if the device can shoot more than one node and supports multiple
simultaneous connections.
5. Upload it into the CIB using cibadmin: cibadmin -C -o resources --xml-file stonith.xml
Example
Assuming we have an IBM BladeCenter containing our two nodes and the management interface is active on
192.168.9.31, we would choose the external/ibmrsa driver (step 1) and obtain the following list of
parameters (step 2):
stonith -t external/ibmrsa -n
[root@test1 ~]# stonith -t external/ibmrsa -n
hostname ipaddr userid passwd type
Assuming we know the username and password for the management interface, we would create a STONITH
resource with the shell
crm
cib new stonith
configure primitive rsa-fencing stonith::external/ibmrsa \
params hostname="test1 test2" ipaddr=192.168.9.31 userid=mgmt passwd=abc123 type=ibm \
op monitor interval="60s"
configure clone Fencing rsa-fencing
[root@test1 ~]# crm
crm(live)# cib new stonith
INFO: stonith shadow CIB created
crm(stonith)# configure primitive rsa-fencing stonith::external/ibmrsa \
params hostname="test1 test2" ipaddr=192.168.9.31 userid=mgmt passwd=abc123 type=ibm \
op monitor interval="60s"
crm(stonith)# configure clone Fencing rsa-fencing
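And with a working fencing device defined, STONITH can be re-enabled, matching the stonith-enabled="true" entry visible in the configuration below:
configure property stonith-enabled="true"
cib commit stonith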
ms WebDataClone WebData \
meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone Fencing rsa-fencing
clone WebFSClone WebFS
clone WebIP ClusterIP \
meta globally-unique="true" clone-max="2" clone-node-max="2"
clone WebSiteClone WebSite
clone dlm-clone dlm \
meta interleave="true"
clone o2cb-clone o2cb \
meta interleave="true"
colocation WebFS-with-o2cb inf: WebFSClone o2cb-clone
colocation WebSite-with-WebFS inf: WebSiteClone WebFSClone
colocation fs_on_drbd inf: WebFSClone WebDataClone:Master
colocation o2cb-with-dlm inf: o2cb-clone dlm-clone
colocation website-with-ip inf: WebSiteClone WebIP
order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start
order WebSite-after-WebFS inf: WebFSClone WebSiteClone
order apache-after-ip inf: WebIP WebSiteClone
order start-WebFS-after-o2cb inf: o2cb-clone WebFSClone
order start-o2cb-after-dlm inf: dlm-clone o2cb-clone
property $id="cib-bootstrap-options" \
dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="true" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
Configuration Recap
dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="true" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
Node List
The list of cluster nodes is automatically populated by the cluster.
node test1
node test2
Cluster Options
This is where the cluster automatically stores some information about the cluster
• dc-version - the version (including upstream source-code hash) of Pacemaker used on the DC
• cluster-infrastructure - the cluster infrastructure being used (heartbeat or openais)
• expected-quorum-votes - the maximum number of nodes expected to be part of the cluster
and where the admin can set options that control the way the cluster operates
property $id="cib-bootstrap-options" \
dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="true" \
no-quorum-policy="ignore"
Resources
Default Options
Here we configure cluster options that apply to every resource.
Fencing
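As configured earlier:
primitive rsa-fencing stonith::external/ibmrsa \
params hostname="test1 test2" ipaddr=192.168.9.31 userid=mgmt passwd=abc123 type=ibm \
op monitor interval="60s"
clone Fencing rsa-fencing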
Service Address
Users of the services provided by the cluster require an unchanging address with which to access them.
Additionally, we cloned the address so it will be active on both nodes. An iptables rule (created as part of the
resource agent) is used to ensure that each request is only processed by one of the two clone instances. The
additional meta options tell the cluster that we want two instances of the clone (one "request bucket" for each
node) and that if one node fails, then the remaining node should hold both.
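From the configuration built up earlier:
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.9.101" cidr_netmask="32" \
op monitor interval="30s"
clone WebIP ClusterIP \
meta globally-unique="true" clone-max="2" clone-node-max="2"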
Cluster Filesystem
The cluster filesystem ensures that files are read and written correctly. We need to specify the block device
(provided by DRBD), where we want it mounted and that we are using OCFS2. Again it is a clone because it is
intended to be active on both nodes. The additional constraints ensure that it can only be started on nodes with
active o2cb and drbd instances.
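From the configuration built up earlier:
primitive WebFS ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="ocfs2"
clone WebFSClone WebFS
colocation WebFS-with-o2cb inf: WebFSClone o2cb-clone
colocation fs_on_drbd inf: WebFSClone WebDataClone:Master
order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start
order start-WebFS-after-o2cb inf: o2cb-clone WebFSClone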
Apache
Lastly we have the actual service, Apache. We need only tell the cluster where to find its main configuration file
and restrict it to running on nodes that have the required filesystem mounted and the IP address active.
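From the configuration built up earlier:
primitive WebSite ocf:heartbeat:apache \
params configfile="/etc/httpd/conf/httpd.conf" \
op monitor interval="1min"
clone WebSiteClone WebSite
colocation WebSite-with-WebFS inf: WebSiteClone WebFSClone
colocation website-with-ip inf: WebSiteClone WebIP
order WebSite-after-WebFS inf: WebFSClone WebSiteClone
order apache-after-ip inf: WebIP WebSiteClone
Sample OpenAIS Configuration
The OpenAIS configuration used for this cluster (referenced from the Configuring OpenAIS section) is reproduced below.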
aisexec {
# Run as root - this is necessary to be able to manage resources with Pacemaker
user: root
group: root
}
service {
# Load the Pacemaker Cluster Resource Manager
ver: 0
name: pacemaker
use_mgmtd: yes
use_logd: yes
}
totem {
version: 2
# How long to wait for join messages in the membership protocol (ms)
join: 1000
# How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
consensus: 2500
# Number of messages that may be sent by one processor on receipt of the token
max_messages: 20
# Disable encryption
secauth: off
interface {
ringnumber: 0
# The network address, multicast address and port chosen earlier
bindnetaddr: 192.168.9.0
mcastaddr: 226.94.1.1
mcastport: 4000
}
}
logging {
debug: off
fileline: off
to_syslog: yes
to_stderr: off
syslog_facility: daemon
timestamp: on
}
amf {
mode: disabled
}
Project Website
http://www.clusterlabs.org
Cluster Commands
A comprehensive guide to cluster commands has been written by Novell and can be found at:
http://www.novell.com/documentation/sles11/book_sleha/index.html?page=/documentation/sles11/book_sleha/data/book_sleha.html
OpenAIS
http://www.openais.org