Red Hat Cluster Suite in Debian GNU/Linux 4.0 Etch

The Red Hat Cluster Suite packages in Debian 4.0 Etch are partially broken. This article shows how to patch them to get your cluster up and running.

This article assumes you are an experienced system administrator; setting up a high-availability cluster is not a beginner's task.
Red Hat Cluster Suite is intended for high-availability clustering and load balancing of virtual servers.
You will have to patch source files and repackage them the Debian way, so that process should not be a mystery to you.
Download the Debian source files for redhat-cluster from the Debian repository [6] and place them in /usr/src.
It is also a good idea to add your user name to the group "src": members of this group can work on source trees under /usr/src without being root, which eases the work ahead without risking your system.
The directory must be /usr/src because you will also have to repackage some other packages, and they assume this location.
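The steps above can be sketched as follows; "youruser" is a placeholder for your actual login name:

```shell
# Add your user to the src group (log out and back in for it to take effect).
adduser youruser src

# Fetch the Debian sources into /usr/src (apt-get source needs no root).
cd /usr/src
apt-get source redhat-cluster
apt-get source linux-modules-extra-2.6
```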
The rgmanager binary package does not exist in Debian, although the code is in the source package.
So you have to follow two bug reports and apply the patch shown at the end of the report: it solves both the missing dependency [0] and the missing debian/control definition for rgmanager [1].
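Applying the patch and rebuilding follows the usual Debian workflow; this is a sketch, assuming you saved the patch from the bug report as rgmanager.diff and that the unpacked source directory is named as below (the exact version in the directory name may differ):

```shell
# Enter the unpacked source tree and apply the patch from the bug report.
cd /usr/src/redhat-cluster-1.03.00
patch -p1 < ../rgmanager.diff

# Rebuild the binary packages the Debian way.
dpkg-buildpackage -rfakeroot -us -uc
```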
A more serious problem is that the gnbd kernel module is not binary packaged in Debian either.
You will need to carefully apply the three patches shown in bug report [2].
First patch the redhat-cluster source package so that it produces a correct redhat-cluster-source package instead of the original one [7].
This corrected redhat-cluster-source package is then used to generate a correct linux-modules-extra-2.6 for your kernel, replacing the one from the repositories [8].
Verify that the resulting redhat-cluster-source package is installed in /usr/src.
Pay special attention to the command lines displayed in the bug reports; they show the exact files affected.
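The module rebuild can be sketched roughly as follows; file and tarball names are examples and may differ on your system, so check the bug reports for the exact commands:

```shell
# Install the patched -source package; it ships a source tarball in /usr/src.
dpkg -i /usr/src/redhat-cluster-source_1.03.00-2_all.deb
cd /usr/src
tar xjf redhat-cluster.tar.bz2

# One convenient way to build the modules for the running kernel
# is module-assistant (package "module-assistant").
m-a prepare
m-a auto-install redhat-cluster
```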
You will end up with about a dozen binary packages in /usr/src.
Move or copy the relevant ones to another directory that you will use for installation, selecting the ones suitable for the cluster you intend to configure.
Note that DLM and GULM are mutually exclusive, and the development packages are not needed on every machine.
You will also need good clvm [9] and fence [4] init scripts, if the defaults are not good enough.
The installation will fail at the configuration phase if you do not have an /etc/cluster/cluster.conf [4].
I do not like shipping a default file for this task.
If you install without the file, you will have to repeat the installation to force reconfiguration of all pending packages (dpkg -i *.deb).
Since you will have many files under /etc and /etc/default/* to configure on each machine, and will have to carefully craft YOUR /etc/cluster/cluster.conf anyway, this is not a real blocker.
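For orientation, a skeleton cluster.conf for the 1.x suite looks roughly like this; the cluster and node names are hypothetical, and real fencing devices and methods must be filled in for your hardware:

```xml
<?xml version="1.0"?>
<cluster name="example" config_version="1">
  <cman/>
  <clusternodes>
    <clusternode name="node1" votes="1">
      <fence/>
    </clusternode>
    <clusternode name="node2" votes="1">
      <fence/>
    </clusternode>
  </clusternodes>
  <fencedevices/>
</cluster>
```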
When recompiled and repackaged, linux-modules-extra-2.6 generates many binary packages for different kernel flavors; you will have to select the suitable ones for your machines.
Also, if you upgrade the kernel, you will have to regenerate them.
Cluster configuration and setup is the subject of a future article.

GFS over GNBD quirks

Freezes when writing
Red Hat cluster source 1.03.00-2 with kernel 2.6.18-4 has a subtle problem that leads to seemingly random filesystem I/O freezes during writes when you mount GFS over GNBD-imported devices from GNBD servers.
The problem may be masked by a SAN with clever caching and connections, and by GNBD servers interconnected through a really fast, dedicated fibre-optic network.
GNBD serving and cluster coordination add a large network overhead on top of the actual filesystem operations, so the problem becomes evident on slow, shared (saturated?) networks.
The filesystem I/O calls are uninterruptible, so your only way out is to shut down, or even hardware-reset, the machine importing the GNBD device.
The workaround is an undocumented feature of GFS: mount the GFS client machines with the "sync" option, which disables the filesystem cache by forcing synchronous transactions.
This has a big negative impact on performance but gives reliable operation.
Newer Red Hat cluster source and kernel versions should be tested for this problem.
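The workaround can be applied like this; the device and mount point names are hypothetical:

```shell
# Mount GFS with synchronous transactions on the client.
mount -t gfs -o sync /dev/gnbd/export0 /mnt/gfs0

# Or make it persistent with an /etc/fstab entry:
# /dev/gnbd/export0  /mnt/gfs0  gfs  sync  0  0
```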

Concurrent access
During further tests, I spotted a problem with concurrent access to GFS over GNBD-imported devices.
Even with the sync mount option, random freezes happen when one machine reads the same device that another is writing.
I tried various other configurations, but nothing solved the problem.
This kind of problem is sometimes difficult to reproduce: you have to impose a load close to your combined hardware and network capacity.
So it is unlikely to show up on high-end hardware (servers and disks) and private Fibre Channel networks under comfortable loads; on low-end hardware and a shared (loaded/crowded?) network it is much easier to reproduce.
Even if you use Red Hat, you should test your setup against this problem under heavy load.
It seems to be a GNBD problem, because it does not happen with local GFS.
Investigating further, I realized that the GNBD kernel module seems to get caught in an I/O deadlock.
Since Red Hat supports it in its own kernel releases, I guess it is an unfortunate combination of kernel 2.6.18 and the Red Hat Cluster Suite 1.03.00-2 source packaged in Debian 4.0 Etch.
Examining the CVS commits in the gnbd-kernel source subtree, I saw that the developers were already aware of some potential race conditions and deadlocks, addressed in recent development versions.
But since GFS2 (included in the newer redhat-cluster-suite 2.x series) was considered beta even by Red Hat (as of May 2007), and I did not want to diverge too much from the official Debian 4.0 packages (which use the redhat-cluster-suite 1.x series), I decided to discard GNBD because of its reliability issues.
I will miss its built-in fencing code and straightforward integration with Red Hat Cluster Suite; it is also easier to deploy than other solutions in this environment.

GFS over iSCSI
I am currently testing iSCSI as a substitute for GNBD, confirmed as the Achilles' heel of the cluster.
You need a target (the device "server") and a SCSI initiator (the "client").
I am using all-software implementations.
For the target, I chose iSCSI Enterprise Target [10].
For the initiator, I chose Open-iSCSI [11].
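A minimal setup sketch follows; the IQN, IP address, and device path are examples only:

```shell
# On the target ("server"), export a block device via iSCSI Enterprise
# Target by declaring it in /etc/ietd.conf:
#
#   Target iqn.2007-05.com.example:storage.disk0
#       Lun 0 Path=/dev/sdb1,Type=fileio

# On the initiator ("client"), discover the target and log in with Open-iSCSI:
iscsiadm -m discovery -t sendtargets -p 192.168.0.10
iscsiadm -m node -T iqn.2007-05.com.example:storage.disk0 -p 192.168.0.10 -l
```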
For three days, I tried hard to crash the three GFS volumes.
With the volumes mounted on a single cluster "client" running a GNOME desktop (although it is possible to mount GFS volumes locally on their respective servers), I opened six simultaneous bonnie++ hard disk test suites [12] and two scp sessions; scp on the local machine lets you watch the transfer speed.
Given the humble GFS iSCSI targets, the 100 Mb/s network, and the 1 GB RAM Pentium 4 desktop used, this was enough to saturate the network and make the desktop almost unresponsive.
  • Not a single crash in 3 days.
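The stress test can be reproduced roughly like this; /mnt/gfs0 is a hypothetical GFS mount point:

```shell
# Launch six concurrent bonnie++ runs against the GFS mount.
# -d sets the test directory, -u the user to run as.
for i in 1 2 3 4 5 6; do
    bonnie++ -d /mnt/gfs0 -u nobody &
done
wait
```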
I will stay with iSCSI Enterprise Target and Open-iSCSI. See [14], [15], [16], [17].
The Open-iSCSI initiator and bonnie++ are officially packaged in Debian.
iSCSI Enterprise Target is unofficially packaged for Debian Etch [13] as of May 2007. It is already available in the unstable repository, and you may want to backport it, because the official package is newer.

Updates to this article
I will update this article when more information becomes available for publication.

Update, October 9th, 2007:
The kernel received a security update [20], but the redhat-cluster source package did not. So you will still have to apply the patches yourself.

Update, November 7th, 2007:
We prepared instructions [21] for recompiling the redhat-cluster modules needed after you upgrade your kernel.
The stock repository packages do not contain the needed patches, so you must patch the sources and recompile the packages before installing.

[0] rgmanager uses arping options from the iputils-arping package, not arping's
[1] redhat-cluster: rgmanager not built, missing debian/control definition
[2] gnbd.ko block device not packaged
[3] clvm needs init script
[4] updated init script for fence
[5] ccs: cluster.conf(5) missing
[6] Debian source for redhat-cluster
[7] redhat-cluster-source (1.03.00-2)
[8] Source package: linux-modules-extra-2.6 (2.6.18-7+etch2)
[9] improved clvmd init script compliant with Debian Policy 9.3
[10] iSCSI Enterprise Target (the "server")
[11] Open-iSCSI project (the "client")
[12] bonnie++ hard disk benchmarking software
[13] iSCSI Enterprise Target unofficial packages for Debian
[14] A quick guide to iSCSI on Linux (a bit outdated, but still excellent, crystal-clear basics)
[15] Open-iSCSI README
[16] iSCSI target and initiator setup
[17] Open-iSCSI and SuSE Linux
[18] iSCSI hands-on article in Brazilian Portuguese
[19] redhat-cluster-source: does not repackage gnbd-kernel
[20] kernel images
[21] how to recompile linux-modules-extra-2.6
