Network installation of Debian on cluster nodes using SystemImager

Introduction

SystemImager (SI) is an excellent tool for installing multiple machines such as cluster nodes. Machines with a PXE-enabled BIOS can be booted and installed over a network connection in an almost fully automated fashion. Unfortunately, the SI documentation for this procedure is poor: outdated, incomplete and confusing. Here, I document my experience of using SI to install Debian GNU/Linux on 12 cluster nodes. I used the current Debian (sarge) version of SI, 3.2.3.

Useful links that helped a lot:
http://wiki.sisuite.org/NetworkBoot
http://www.falkotimme.com/howtos/systemimager/index.php

Setup

My cluster (cerberus) contains 14 machines as follows:

- cerberus00 (the head node), installed manually
- cerberus24 (fileserver and SI image server), installed manually
- cerberus01 (compute node, SI golden client), installed manually
- cerberus02-12 (compute nodes) awaiting installation via SI

Procedure

All operations are performed as root.

1. Install SI server on cerberus24:

 apt-get install systemimager-server

2. Install systemimager-client on cerberus01:

 apt-get install systemimager-client

3. Run prepareclient on cerberus01:

- this is where I had my first problem. The SI documentation says simply “run prepareclient” with no clue as to any required or optional switches. The full command is:

 prepareclient --server cerberus24

This requires an entry for the image server in /etc/hosts on the client; alternatively, you could give the server's IP address instead of its name. The client launches an rsync daemon, which the image server uses to pull the image; you should kill this daemon later.
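As a minimal sketch of the /etc/hosts step: the helper below (add_host_entry is my own name, not part of SI) appends an entry only if the name is not already there. I am assuming that cerberus24 is 192.168.1.124, which is the address used for the image server in the dhcpd.conf further down.

```shell
# Append "IP<TAB>name" to a hosts file unless the name is already present.
# add_host_entry is a hypothetical helper, not an SI command.
add_host_entry() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}
    grep -qw "$name" "$file" 2>/dev/null || printf '%s\t%s\n' "$ip" "$name" >> "$file"
}

# Usage on the golden client (address assumed from dhcpd.conf below):
#   add_host_entry 192.168.1.124 cerberus24
```

Calling it twice leaves a single entry, so it is safe to re-run.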

4. Run getimage on cerberus24:

- my problem here was that rsync failed when I used the IP address for cerberus01, but worked fine using the short hostname. Perhaps IP addresses should be in quotes? The getimage man page is also broken and does not list the values accepted by -ip-assignment. I went with static.

 getimage -golden-client cerberus01 -image cerberus01_base_install -ip-assignment static

Call the image whatever you like. Once the image has been pulled to the image server, you can kill the rsync daemons on the server and on the golden client.

5. Set up cerberus24 as a boot server

This is where the SI documentation really falls down. You need to configure DHCP on the image server so that when a client boots, it receives the required IP address, boots a kernel and then begins running the SI install scripts.

(i) Install required packages on cerberus24:

You need to apt-get install the following:

- dhcp (version 2 NOT dhcp3-server). Version 2 allocates IP addresses in ascending order, which is what you want as you boot each client. Version 3 allocates them in random order, which is probably not what you want.
- pxe
- tftpd-hpa
- tftp-hpa (you need this client for testing later on)
- syslinux
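The list above can be installed in one go. This is only a sketch, assuming the package names are exactly as listed for sarge; the command is printed rather than executed so you can review it first.

```shell
# Packages needed on the boot server (names as listed above for sarge).
PACKAGES="dhcp pxe tftpd-hpa tftp-hpa syslinux"

# Printed for review; remove the leading "echo" to run the install for real.
echo apt-get install $PACKAGES
```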

(ii) Edit /etc/systemimager/systemimager.conf

Make sure it contains:

 NET_BOOT_DEFAULT = local

This ensures that the SI scripts only run on the first boot - after that the client will boot from its own disk.
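A quick way to confirm the setting before moving on. This is a sketch; net_boot_default is a hypothetical helper name and it assumes the simple "KEY = value" layout shown above.

```shell
# Print the value of NET_BOOT_DEFAULT from the SI config file.
# If the key appears more than once, the last occurrence is printed.
net_boot_default() {
    conf=${1:-/etc/systemimager/systemimager.conf}
    sed -n 's/^[[:space:]]*NET_BOOT_DEFAULT[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1
}

# Usage:
#   net_boot_default        # should print: local
```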

(iii) Run mkbootserver

This is an interactive script to set up your image server as a boot server. It is very fussy and if it breaks, it does not properly shut down or start up some of the services that it tests. If the script exits early you should:

- rm -rf /tftpboot
- /etc/init.d/inetd stop, then start
- /etc/init.d/pxe restart
- killall in.tftpd

If the script runs to completion, it will rewrite the file /etc/dhcpd.conf and prepare the directory /tftpboot. My /etc/dhcpd.conf ended up looking like this:

 default-lease-time -1;
 filename "pxelinux.bin";
 subnet 192.168.1.0 netmask 255.255.255.0 { 
   range  192.168.1.102 192.168.1.112; 
   option domain-name "localdomain.domain";
   option routers 192.168.1.1;
   option option-140 "192.168.1.124";
   option option-144 "n";
   next-server 192.168.1.124;
 }

A few notes. “range” is the set of IP addresses that you want the clients to receive. Using dhcp version 2, they will be allocated in ascending order, so if you boot the clients in order, client “cerberus02” will be “192.168.1.102” and “cerberus12” will be “192.168.1.112”. “option-140” means that cerberus24 is the SI image server as well as the dhcp server, and “next-server” means that it is also the boot server. “pxelinux.bin” is the initial PXE boot file.
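To make the intended mapping explicit, here is a small sketch that prints the hostname/address pairs I expected from that range. expected_leases is a hypothetical helper, and the pairing only holds if the clients are booted in order.

```shell
# Print the expected hostname/address pairs for cerberus02..cerberus12,
# i.e. node NN gets 192.168.1.(100 + NN).
expected_leases() {
    n=2
    while [ "$n" -le 12 ]; do
        printf 'cerberus%02d 192.168.1.%d\n' "$n" $((100 + n))
        n=$((n + 1))
    done
}

expected_leases
```

The first line printed is "cerberus02 192.168.1.102" and the last is "cerberus12 192.168.1.112", which is handy for checking each node as it comes up.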

The directory /tftpboot should contain the following files and directories:

   -rw-r--r--  2 root root  11830 2004-09-20 07:37 pxelinux.bin
   -rw-r--r--  2 root root 818470 2005-02-22 10:50 kernel
   -rw-r--r--  2 root root 505184 2005-02-22 10:50 initrd.img
   drwxr-xr-x  3 root root   4096 2005-10-03 15:48 X86PC
   drwxr-xr-x  2 root root   4096 2005-10-03 17:29 pxelinux.cfg

(iv) Run mkclientnetboot

This is barely explained at all in the SI documentation. The purpose of this command is to generate one file per client in /tftpboot/pxelinux.cfg/, each named with the hexadecimal representation of that client's IP address. I ran:

 mkclientnetboot --verbose --netboot --clients "IP address list"

where IP address list is just a space-separated string, “192.168.1.102 192.168.1.103 192.168.1.104…” etc. ending with “192.168.1.112”.
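Typing eleven addresses by hand is error-prone, so here is a sketch that builds the list and also shows the hex file name to expect for each address (pxelinux looks for the client's IPv4 address written as eight uppercase hex digits). client_list and ip_to_hex are my own helper names, not SI commands.

```shell
# Convert a dotted-quad IPv4 address to the 8-digit uppercase hex name
# that pxelinux looks for under pxelinux.cfg/.
ip_to_hex() {
    printf '%s\n' "$1" | {
        IFS=. read -r a b c d
        printf '%02X%02X%02X%02X\n' "$a" "$b" "$c" "$d"
    }
}

# Build the space-separated list 192.168.1.102 ... 192.168.1.112.
client_list() {
    list=""
    n=102
    while [ "$n" -le 112 ]; do
        list="$list${list:+ }192.168.1.$n"
        n=$((n + 1))
    done
    printf '%s\n' "$list"
}

# Usage:
#   mkclientnetboot --verbose --netboot --clients "$(client_list)"
#   ip_to_hex 192.168.1.102     # prints C0A80166
```

After mkclientnetboot has run, each name printed by ip_to_hex should exist as a file in /tftpboot/pxelinux.cfg/.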

(v) Ensure that all required services are running

On your image server, make sure the following have been started:

 /etc/init.d/systemimager-server start
 /etc/init.d/netbootmond start
 /etc/init.d/pxe start

tftpd-hpa should run under inetd (it will be activated by a call from the client), so ensure inetd is running too.
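This is where the tftp-hpa client comes in: before booting a node, try fetching the boot file by hand. A sketch, assuming the boot server is 192.168.1.124 as in my dhcpd.conf; check_tftp is a hypothetical helper. It checks for a non-empty file instead of relying on the exit status of tftp.

```shell
# Fetch pxelinux.bin from the boot server into a scratch directory
# and report whether a non-empty file arrived.
check_tftp() {
    server=${1:-192.168.1.124}
    tmpdir=$(mktemp -d) || return 1
    ( cd "$tmpdir" && tftp "$server" -c get pxelinux.bin )
    if [ -s "$tmpdir/pxelinux.bin" ]; then
        echo "TFTP OK"
        rc=0
    else
        echo "TFTP FAILED"
        rc=1
    fi
    rm -rf "$tmpdir"
    return "$rc"
}

# Usage:
#   check_tftp                  # uses 192.168.1.124
#   check_tftp 192.168.1.124
```

If this reports a failure, check that inetd is running and that /tftpboot contains pxelinux.bin.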

6. And away you go…

Plug a monitor into your PXE-enabled install client and power up. If all goes well, you'll see the machine acquire the correct IP address and then fetch the pxelinux.bin file over TFTP. The kernel and image that you specified using mkbootserver will load, and then the SI scripts will do their magic. Reboot at the end and the machine should boot from its own disk.

 
systemimager_hacks.txt · Last modified: 2007/08/20 13:27 (external edit)
 