Cluster node sync-on-boot technology
Description
Sync-on-boot technology allows you to use standard root file systems to:
- initialize computer cluster nodes
- synchronize cluster nodes after an update to the standard
Its purpose is to reduce cluster node maintenance by:
- keeping a central standard of the root file system of a node
- automatically synchronizing the node to the standard upon node reboot
Master
Depends
- bind9
- dhcp3-server
- initramfs-tools
- make
- nfs-kernel-server
- syslinux
- tftpd-hpa
- util-linux
Recommends
- shorewall
- tftp-hpa
Node
Depends
- busybox
- e2fsprogs
- rsync
- sed
- util-linux
Recommends
- syslinux
Configuration
I recommend you perform the following steps in the order they are presented here.
DNS
You are going to need name services for the cluster nodes. Configure forward and reverse name resolution.
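Forward and reverse resolution mean that every node name maps to an address (A record) and every address maps back to the name (PTR record). The POSIX sh function below is a hypothetical helper that prints both record sets for a numbered range of nodes. The domain rostclust and the 192.168.0.0/24 network come from the Update node section; the naming scheme (node1, node2, ...) is an assumption.

```shell
# Hypothetical helper (POSIX sh): print matching forward (A) and reverse (PTR)
# records for a contiguous range of node addresses in a /24 network.
# Usage: cluster_dns_records DOMAIN NET_PREFIX NAME_PREFIX FIRST LAST
#   e.g. cluster_dns_records rostclust 192.168.0 node 1 16
cluster_dns_records() {
    domain=$1 net=$2 prefix=$3 first=$4 last=$5
    # The reverse zone for 192.168.0 is 0.168.192.in-addr.arpa.
    rev=$(echo "$net" | awk -F. '{ print $3 "." $2 "." $1 ".in-addr.arpa" }')

    echo "; forward zone ($domain)"
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "$prefix$i IN A $net.$i"
        i=$((i + 1))
    done

    echo "; reverse zone ($rev)"
    i=$first
    while [ "$i" -le "$last" ]; do
        # Owner name is relative to the reverse zone origin; the target is absolute.
        echo "$i IN PTR $prefix$i.$domain."
        i=$((i + 1))
    done
}
```

Paste the A records into the forward zone file and the PTR records into the reverse zone file; the update node (updatenode, 192.168.0.253) needs its own pair of records.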
DHCP
You are going to need DHCP services as well. Have a look at this example DHCP server configuration file.
- Customize the host statement for the update node
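The host statement ties one machine's MAC address to the update node's fixed address and tells its PXE firmware which file to fetch. The hypothetical helper below prints such a statement for ISC dhcpd. The MAC address and the TFTP server address (next-server) are placeholders you supply; the filename assumes the cluster_node/pxelinux.0 layout from the SYSLINUX/PXELINUX step.

```shell
# Hypothetical helper: print an ISC dhcpd host statement that gives one
# machine a fixed address and tells its PXE firmware what to download.
# Usage: dhcp_host_stanza NAME MAC IP TFTP_SERVER
dhcp_host_stanza() {
    cat <<EOF
host $1 {
    hardware ethernet $2;
    fixed-address $3;
    next-server $4;
    # Relative to the TFTP root (the -s directory of in.tftpd).
    filename "cluster_node/pxelinux.0";
}
EOF
}
```

For example, `dhcp_host_stanza updatenode 00:16:3e:00:00:01 192.168.0.253 192.168.0.1` prints the stanza for the update node. Moving the update node role to another machine then only means changing the hardware ethernet line.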
TFTP
During boot, the TFTP daemon serves the nodes' requests for kernel and initramdisk downloads.
Verify that your TFTP server is ready to answer requests. Usually you should have a line like this in /etc/inetd.conf:
tftp dgram udp wait root /usr/sbin/in.tftpd /usr/sbin/in.tftpd -s /var/lib/tftpboot
/var/lib/tftpboot is the root of the file system visible to nodes during booting. The value of the dhcpd.conf filename statement is relative to this root.
- Customize the -s option above (/var/lib/tftpboot) as necessary
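Because the dhcpd.conf filename is resolved against the -s directory, a quick sanity check is to look that file up under the TFTP root before booting any node. The function below is a hypothetical helper that only inspects the local file system; it does not talk to the TFTP daemon.

```shell
# Hypothetical helper: report whether a DHCP 'filename' value exists under
# the TFTP root that in.tftpd was started with (-s DIR).
# Usage: tftp_file_check TFTP_ROOT FILENAME
tftp_file_check() {
    root=$1
    # With -s, a leading slash is still interpreted relative to the root.
    rel=${2#/}
    if [ -f "$root/$rel" ]; then
        echo "ok: $root/$rel"
    else
        echo "missing: $root/$rel" >&2
        return 1
    fi
}
```

Typical use: `tftp_file_check /var/lib/tftpboot cluster_node/pxelinux.0`.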
SYSLINUX/PXELINUX
PXELINUX is the network boot loader (documentation: /usr/share/doc/syslinux/pxelinux.txt.gz). If you follow our DHCP example:
- Copy /usr/lib/syslinux/pxelinux.0 to /var/lib/tftpboot/cluster_node/pxelinux.0
- mkdir /var/lib/tftpboot/cluster_node/pxelinux.cfg
- Have at least these two files in pxelinux.cfg:
  - pxelinux.cfg/default
  - pxelinux.cfg/C0A800FD: configuration for the update node
- Have pxelinux-common in /var/lib/tftpboot/cluster_node. Customize this file to match your kernel.
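Among the names PXELINUX tries is the client's IP address written as eight upper-case hexadecimal digits (192.168.0.253 becomes C0A800FD); if nothing more specific matches, it falls back to default. The sketch below computes that name and builds the layout from the list above. The contents of the two configuration files are assumptions: each sets a DEFAULT label and pulls in pxelinux-common with INCLUDE (assuming relative paths resolve against the cluster_node directory). linuxnfsroot is the label named in the Update node section; the regular-mode label 'linux' is hypothetical.

```shell
# Hypothetical helpers for the PXELINUX layout under the TFTP root.

# Print the PXELINUX config file name for an IPv4 address:
# each octet as two upper-case hex digits.
pxelinux_hex_name() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

# Create TFTP_ROOT/cluster_node with pxelinux.0 and two config files.
# Usage: pxelinux_layout TFTP_ROOT PXELINUX_0 UPDATE_NODE_IP
#   PXELINUX_0 is usually /usr/lib/syslinux/pxelinux.0
pxelinux_layout() {
    dir=$1/cluster_node
    mkdir -p "$dir/pxelinux.cfg" || return 1
    cp "$2" "$dir/pxelinux.0" || return 1
    # Regular nodes: boot the (hypothetical) 'linux' label by default.
    printf 'DEFAULT linux\nINCLUDE pxelinux-common\n' > "$dir/pxelinux.cfg/default"
    # Update node: boot the NFS-root label by default.
    printf 'DEFAULT linuxnfsroot\nINCLUDE pxelinux-common\n' \
        > "$dir/pxelinux.cfg/$(pxelinux_hex_name "$3")"
}
```

For example, `pxelinux_layout /var/lib/tftpboot /usr/lib/syslinux/pxelinux.0 192.168.0.253` creates pxelinux.cfg/C0A800FD; pxelinux-common itself still has to be written by hand.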
Standard node root file system
Prepare the standard root file system. We assume that it is exported from /srv/nfs4/clusternoderoot, as it is in our setup.
/etc/clustcontrol
This directory (on the master) holds node configuration parameters used during initialization and synchronization. Different standard node roots share this control directory.
- Have at least these files in the directory:
  - default: have a look at this one
  - functions: be careful, black magic
  - sfdisk-default: customize as necessary
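Before booting nodes it is worth confirming that the control directory is complete. A minimal, hypothetical check for the three files listed above:

```shell
# Hypothetical check: verify that a clustcontrol directory contains the
# files listed in this section.
# Usage: clustcontrol_check [DIR]   (DIR defaults to /etc/clustcontrol)
clustcontrol_check() {
    dir=${1:-/etc/clustcontrol}
    status=0
    for f in default functions sfdisk-default; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}
```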
NFS exports
- Bind mount /etc/clustcontrol to /srv/nfs4/clusternoderoot/etc/clustcontrol
  - This is because we want all standard roots to share a single /etc/clustcontrol
- NFS export to the nodes:
  - /srv/nfs4/clusternoderoot
    - Use the appropriate 'exports' flag (crossmnt) to 'unhide' the bind-mounted /srv/nfs4/clusternoderoot/etc/clustcontrol
  - /srv/nfs4/clusternoderoot/etc/clustcontrol
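The bind mount and the exports might look like the lines printed by the hypothetical helper below. The network 192.168.0.0/24 matches the addresses used elsewhere in this document. Apart from crossmnt, the export options are assumptions to review: rw for the update node (whose changes must land in the standard root), ro for the other nodes, and no_root_squash so that root on the nodes can read root-only files such as /etc/shadow while synchronizing.

```shell
# Hypothetical sketch: print the /etc/fstab bind-mount line and the
# /etc/exports lines for one standard node root.
# Usage: nfs_export_lines ROOT NETWORK UPDATE_NODE
#   e.g. nfs_export_lines /srv/nfs4/clusternoderoot 192.168.0.0/24 updatenode.rostclust
nfs_export_lines() {
    root=$1 net=$2 upd=$3
    opts=no_subtree_check,no_root_squash

    echo "# /etc/fstab: share the one /etc/clustcontrol with this standard root"
    echo "/etc/clustcontrol $root/etc/clustcontrol none bind 0 0"

    echo "# /etc/exports: crossmnt makes the bind mount visible below $root"
    # A single-host entry takes precedence over a network entry,
    # so the update node gets rw while the other nodes get ro.
    echo "$root $upd(rw,crossmnt,$opts) $net(ro,crossmnt,$opts)"
    echo "$root/etc/clustcontrol $net(ro,$opts)"
}
```

After adding the lines, mount the bind mount (for example with mount -a) and reload the export table with exportfs -ra.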
Initramdisk
Work in your standard node root file system for this step. You can do this either by working on the host where the standard is prepared or by doing a chroot on the server that stores the file system.
- Add these files to /srv/nfs4/clusternoderoot/etc/initramfs-tools:
  - conf.d/clustcontrol: should work for you as is
  - hooks/clustcontrol: should work for you as is
  - scripts/local-top/clustcontrol: should work for you as is
  - scripts/local-bottom/clustcontrol: should work for you as is
- Make all of the above files except conf.d/clustcontrol executable
- Have initramfs.conf in /srv/nfs4/clusternoderoot/etc/initramfs-tools look like the example under the link
- Create the initramdisk (e.g. update-initramfs -k all -u)
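The permission and rebuild steps above can be run from the master against the standard root. The sketch below assumes the four clustcontrol files are already in place and that the standard root is complete enough to chroot into (update-initramfs installed); the function name and the --no-build switch are hypothetical.

```shell
# Hypothetical sketch: make the clustcontrol initramfs hook and scripts
# executable inside a standard node root, then rebuild its initramdisks.
# Usage: clustcontrol_initramfs ROOT [--no-build]
clustcontrol_initramfs() {
    root=$1
    it=$root/etc/initramfs-tools
    # conf.d/clustcontrol is sourced, not executed, so it stays non-executable.
    chmod +x "$it/hooks/clustcontrol" \
             "$it/scripts/local-top/clustcontrol" \
             "$it/scripts/local-bottom/clustcontrol" || return 1
    if [ "$2" != "--no-build" ]; then
        # Use the standard root's own update-initramfs, for all its kernels.
        chroot "$root" update-initramfs -k all -u
    fi
}
```

Typical use: `clustcontrol_initramfs /srv/nfs4/clusternoderoot`.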
Kernel and initramdisk for TFTP
In this step we copy the kernel and the new initramdisk to the appropriate location within the TFTPD root directory /var/lib/tftpboot.
Either:
- Manually copy them to /var/lib/tftpboot/cluster_node/ (you will tire of this after the third time)
or
- Place this Makefile in /var/lib/tftpboot/cluster_node, edit it as necessary and simply run 'make' from then on.
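For the manual route, the copy amounts to the hypothetical helper below; it is not the Makefile mentioned above. It assumes Debian-style file names (vmlinuz-VERSION and initrd.img-VERSION in the standard root's /boot), so adjust the names to whatever pxelinux-common refers to.

```shell
# Hypothetical helper: copy one kernel and its initramdisk from a standard
# node root into the TFTP directory used by PXELINUX.
# Usage: copy_boot_files ROOT DEST VERSION
#   e.g. copy_boot_files /srv/nfs4/clusternoderoot /var/lib/tftpboot/cluster_node VERSION
copy_boot_files() {
    root=$1 dest=$2 ver=$3
    for f in "vmlinuz-$ver" "initrd.img-$ver"; do
        if [ ! -f "$root/boot/$f" ]; then
            echo "missing: $root/boot/$f" >&2
            return 1
        fi
    done
    # -p keeps timestamps, so stale TFTP copies are easy to spot.
    cp -p "$root/boot/vmlinuz-$ver" "$root/boot/initrd.img-$ver" "$dest/"
}
```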
Network interfaces
- It is best to disable the udev rule 75-persistent-net-generator.rules on the nodes: because every node is synchronized from one standard root, interface names recorded for one machine's MAC addresses would otherwise be carried over to the others. Prepend
GOTO="persistent_net_generator_end"
to /etc/udev/rules.d/75-persistent-net-generator.rules. If you do not have this file, first make a copy of /lib/udev/rules.d/75-persistent-net-generator.rules.
- Look at my /etc/network/interfaces. You need the 'manual' method on the interface that is configured in the initramdisk, for example to keep NFS working properly after a node reboot. This solution works both for the update node and for nodes in regular mode. Asking for 'dhcp' configuration would break the update node, whose root file system is mounted over NFS through that interface.
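Both changes can be applied to the standard root from the master. The hypothetical sketch below prepends the GOTO line once (so it is safe to re-run) and prints an interfaces file that uses the manual method. The interface name eth0 and the loopback stanza are assumptions; use the interface your initramdisk actually configures.

```shell
# Hypothetical sketch, run against a standard node root ROOT.

# Disable udev's persistent net rule generator by jumping to its end label.
# Usage: disable_net_generator ROOT
disable_net_generator() {
    rules=75-persistent-net-generator.rules
    f=$1/etc/udev/rules.d/$rules
    line='GOTO="persistent_net_generator_end"'
    # Start from the packaged copy if there is no local override yet.
    if [ ! -f "$f" ]; then
        cp "$1/lib/udev/rules.d/$rules" "$f" || return 1
    fi
    # Prepend only if the first line is not already the GOTO.
    if [ "$(head -n 1 "$f")" != "$line" ]; then
        { echo "$line"; cat "$f"; } > "$f.tmp" && mv "$f.tmp" "$f"
    fi
}

# Print an /etc/network/interfaces that leaves IFACE as configured by the
# initramdisk: 'manual' brings it up without touching its address.
# Usage: node_interfaces [IFACE]
node_interfaces() {
    cat <<EOF
auto lo
iface lo inet loopback

auto ${1:-eth0}
iface ${1:-eth0} inet manual
EOF
}
```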
/etc/fstab
- Comment out or remove the line for the root file system in /etc/fstab on the standard node root. This line is invalid when you use an NFS root, and it would confuse fsck in regular operation. The root is mounted by the initramdisk, which does not depend on /etc/fstab at all.
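One way to comment out the root entry, assuming the usual whitespace-separated fstab layout, is the hypothetical helper below. It matches active lines whose second field (the mount point) is "/" and keeps a backup copy next to the file.

```shell
# Hypothetical helper: comment out every active fstab entry whose mount
# point (second field) is "/". The original is kept as FSTAB.orig.
# Usage: comment_out_root_fstab FSTAB
#   e.g. comment_out_root_fstab /srv/nfs4/clusternoderoot/etc/fstab
comment_out_root_fstab() {
    cp -p "$1" "$1.orig" || return 1
    awk '!/^[ \t]*#/ && $2 == "/" { print "#" $0; next } { print }' \
        "$1.orig" > "$1"
}
```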
BIOS
- Configure network booting on the nodes.
Boot
GRUB
You do not need GRUB on the nodes, so remove it. When you do a kernel update, dpkg may complain that update-grub is missing. Put this in /usr/sbin/update-grub and make it executable:
#!/bin/sh
# lkajan: we do not want to use grub on the nodes - we use pxelinux and syslinux.
exit 0
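The stub only helps if it exists and is executable inside the standard root. The hypothetical helper below installs an equivalent no-op script there.

```shell
# Hypothetical helper: install a no-op update-grub into a standard node root
# so that kernel package scripts calling it succeed.
# Usage: install_update_grub_stub ROOT
install_update_grub_stub() {
    f=$1/usr/sbin/update-grub
    mkdir -p "$1/usr/sbin" || return 1
    cat > "$f" <<'EOF'
#!/bin/sh
# We do not want to use grub on the nodes - we use pxelinux and syslinux.
exit 0
EOF
    chmod 755 "$f"
}
```

Typical use: `install_update_grub_stub /srv/nfs4/clusternoderoot`.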
You are done. Boot the nodes.
Update node
The purpose of this node is to make it easy to update the standard node root. On boot, it mounts the standard node root as its NFS root, so any change made on this node directly affects the standard node root and is rsync'd to the other nodes when they reboot.
You can control the existence and identity of the update node in your dhcpd.conf file. Whichever node gets the IP address 192.168.0.253 (C0A800FD in hexadecimal; updatenode.rostclust resolves to this address) boots as the update node. This is because PXELINUX loads the configuration file pxelinux.cfg/C0A800FD for a client with that IP address, and that configuration file in turn makes PXELINUX boot 'linuxnfsroot' by default. The 'linuxnfsroot' entry in pxelinux-common is parametrized (boot=nfs) to use an NFS root.
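For orientation, an NFS-root label in pxelinux-common could look roughly like the one printed below. boot=nfs, root=/dev/nfs, nfsroot= and ip= are kernel command-line parameters understood by initramfs-tools' NFS boot script, and rw asks for a writable root, which the update node needs. The server address, kernel file names and the helper name are assumptions; your pxelinux-common may differ.

```shell
# Hypothetical helper: print a PXELINUX label that boots a kernel with its
# root file system on NFS (initramfs-tools: boot=nfs).
# Usage: nfsroot_label SERVER EXPORT KERNEL INITRD
#   e.g. nfsroot_label 192.168.0.1 /srv/nfs4/clusternoderoot vmlinuz-VERSION initrd.img-VERSION
nfsroot_label() {
    cat <<EOF
LABEL linuxnfsroot
    KERNEL $3
    APPEND initrd=$4 boot=nfs root=/dev/nfs nfsroot=$1:$2 ip=dhcp rw
EOF
}
```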
Extras
Booting a node from its own hard drive
You can make the nodes' own hard drives bootable like this:
- Add this to the /etc/rc.local of the nodes.
Advanced
This system allows you to have multiple standard node roots for different kinds of nodes.