Bring an Aquila Node Online
a Mini-How-To*
*not in official LDP format
Sheyna Gifford July 24, 2002
The purpose of this document is to provide Aquila Beowulf cluster administrators with a quick reference guide to bringing processing nodes, also known as "slave" nodes, on board the network.
Introduction
The Aquila Cluster is the brainchild of DEEP program director Marc Davis. As such, its purpose is to store DEEP sky survey data, process the data in the most expedient manner possible, and return the results to the appropriate researcher. Since many kinds of storage devices are adequate to the storage task (RAID arrays and file servers are two such examples), Aquila's primary function is data processing.
Section 1: Hardware
Aquila slave nodes consist of rack-mounted Tyan Thunder K7 motherboards equipped with v. 2.09 BIOS. On board are dual AMD Athlon processors and approximately 2 GB of DDR RAM, as well as five available PCI slots. Dual 3Com LAN ports are also included.
For a complete list of hardware onboard the motherboard, please see Appendix A.
Assuming that the node is mounted into the rack, has been supplied with a source of power, and is linked into the network via an ethernet cable, you are ready to begin.
Section 2: Software
Installation Media
On the master node, mount the CDROM on /media/cdrom. Then edit the /etc/exports file and add the following record:
/media/cdrom 10.1.1.0/255.255.255.0(rw,no_root_squash)
Then run exportfs -a. The nodes should now be able to see the CDROM. Make sure CD1 is in the master's CDROM drive, then insert the SuSE boot diskette into the node's disk drive and reboot the node.
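The export step on the master can be sketched as a small shell function. This is only a sketch: the name add_export is hypothetical, and the function takes the exports file as an argument so that it can be tried on a scratch copy before it is pointed at /etc/exports. It adds the record only if it is not already present, so running it twice does no harm.

```shell
# Append an export record to an exports file only if it is not already there.
# add_export is a hypothetical helper name, not part of SuSE or the NFS tools.
add_export() {
    exports_file=$1
    record=$2
    # -x matches the whole line, -F treats the record as a fixed string.
    if ! grep -qxF -e "$record" "$exports_file" 2>/dev/null; then
        printf '%s\n' "$record" >> "$exports_file"
    fi
}
```

On the master, as root, this would be called as add_export /etc/exports '/media/cdrom 10.1.1.0/255.255.255.0(rw,no_root_squash)' and followed by exportfs -a, exactly as described above.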
Network Installation
The boot disk will load itself up, and ask for a modules disk. It will ask how you would like to install SuSE. Select "Network Install". Then, you will be asked to insert a second modules disk, which contains drivers for the ethernet card. Select the drivers for 3com 3C920. The address of the nameserver is 10.1.1.128, and the location of the installation media is /media/cdrom.
2.3 YaST (Yet another Setup Tool)
There is little difference between YaST 1 and 2, with the exception of the GUI. YaST 2 is gooey, YaST 1 is not.
Disk Partitions
The only partitions that the processing nodes need are /boot, /, swap and /scratch. Using fdisk or SuSE's partition program, put /boot on /dev/hda1, / on /dev/hda2, swap on /dev/hda3, and /scratch on /dev/hda4, as displayed in the partition table below. Use the following example from node1 to allocate blocks to the various partitions. At the end of the day your partition table should read:
Disk /dev/hda: 255 heads, 63 sectors, 2498 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1         6     48163+  83  Linux
/dev/hda2             7       785   6257317+  83  Linux
/dev/hda3           786       919   1076355   82  Linux swap
/dev/hda4           920      2498  12683317+  83  Linux
And /etc/fstab should read:
bash-2.05# more /etc/fstab
/dev/hda3    swap        swap    defaults    0 0
/dev/hda2    /           ext2    defaults    1 1
/dev/hda1    /boot       ext2    defaults    1 2
/dev/hda4    /scratch    ext2    defaults    1 2
Note that all filesystems are ext2.
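Once the node is installed, the layout above can be checked mechanically. The following is a minimal sketch, assuming the four devices and mount points listed for node1; check_fstab is a hypothetical helper name, and the file to check is passed as an argument so the function can be tried on a copy first.

```shell
# Check that an fstab-style file mounts each expected device on the expected
# mount point with the expected type. check_fstab is a hypothetical helper name.
check_fstab() {
    awk '
        BEGIN {
            want["/dev/hda1"] = "/boot ext2"
            want["/dev/hda2"] = "/ ext2"
            want["/dev/hda3"] = "swap swap"
            want["/dev/hda4"] = "/scratch ext2"
        }
        # Commented-out lines never match: their first field starts with "#".
        $1 in want { seen[$1] = $2 " " $3 }
        END {
            bad = 0
            for (dev in want) {
                if (seen[dev] != want[dev]) {
                    printf "%s: expected \"%s\", found \"%s\"\n", dev, want[dev], seen[dev]
                    bad = 1
                }
            }
            exit bad
        }
    ' "$1"
}
```

Running check_fstab /etc/fstab on the node prints one line per mismatch and returns a non-zero status if anything differs from the node1 example.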
Select Packages
After you have partitioned the drives, and created the mount points accordingly, you will be moved to a menu where you will be allowed to choose the various system software packages.
Choose to install all packages. That way, you won't miss anything.
The only packages that actually WILL be installed are the ones on CD1, since that is the one and only media source. In a normal install you would be prompted for the other CDs (2-5, anyway). However, this being a network install, you will have to install a few other packages from the directory where all of the CDs have been copied onto the network, along with a few others that have been downloaded off of the web. More on this in section 4.0.
Post-Package Installation Configuration
After you have chosen and installed the packages, you will be moved through a series of questions designed to fill in blanks in a file called /etc/rc.config. As you will be editing this file later, by hand, you need not answer most of these questions. However, you definitely want to make it a true network, eschew modem configuration, skip giving it a nameserver, etc. SuSE will try to set up sendmail. Don't let it. When it gets to the LILO configuration, just make sure to put LILO on the MBR, on /dev/hde. SuSE will occasionally attempt to put LILO on the root partition, /dev/hde. This is not a good idea, especially since that drive hasn't even been initialized. Typically, you can run with what SuSE has set as the LILO defaults (though you will have to create a configuration; please follow the prompts).
2.6 Misc. Install Information
After all is said and done, the system will reboot. After the reboot, chances are that YaST will restart and begin tying up loose ends. At least one of those loose ends involves the mouse. The mouse port is PS/2, but you do not have to bother configuring the mouse or the sound card.
Section 3: Post-Installation Configuration
/etc/rc.config
Whereas other Linux flavors (e.g. RedHat) hide their crucial configuration files in various scattered directories, SuSE centralizes the configuration process into a single file: /etc/rc.config. With some minor alterations to this file, and to two or three /etc/hosts files, and a little bit of luck, the node will be on the network in no time.
When you get right down to it, very few major changes will be required. Mostly, you will be checking to see that some variables set during installation were set correctly. On top of that, you'll be giving it the information it needs to see the network.
The first order of business is to check that START_LOOPBACK="yes". You were asked about this variable during installation, but this is a good opportunity to check that it was set properly.
Next, mark the presence of a network card by setting NETCONFIG="_0".
Now, give it a network address. An example, from node1, is:
IPADDR_0="10.1.1.1"
NETDEV_0="eth0"
IFCONFIG_0="10.1.1.1 broadcast 10.1.1.255 netmask 255.255.255.0"
CHECK_ETC_HOSTS=no
BEAUTIFY_ETC_HOSTS=no
In general, I do not recommend allowing SuSEconfig to "beautify" or otherwise alter /etc/hosts. That's for you to do later. SuSEconfig has a penchant for sticking a lot of junk in /etc/hosts that you don't need and probably don't want.
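The broadcast value in the IFCONFIG_0 line above is not arbitrary: it is the node's address with every host bit (every bit that is zero in the netmask) set to one. As a hedged illustration of that arithmetic, here is a small sketch; broadcast_of is a hypothetical helper name and is not something SuSE provides.

```shell
# Compute the broadcast address for a dotted-quad IP and netmask.
# broadcast_of is a hypothetical helper name used only for illustration.
# The body runs in a subshell so the IFS change does not leak out.
broadcast_of() (
    # Split both arguments on dots: $1-$4 are the IP octets, $5-$8 the mask.
    IFS=.
    set -- $1 $2
    # For each octet: keep the network bits, then set every host bit to 1.
    echo "$(( ($1 & $5) | (255 - $5) )).$(( ($2 & $6) | (255 - $6) )).$(( ($3 & $7) | (255 - $7) )).$(( ($4 & $8) | (255 - $8) ))"
)

# broadcast_of 10.1.1.1 255.255.255.0 prints 10.1.1.255
```

For every node on the 10.1.1.0 network with netmask 255.255.255.0 the answer is the same, 10.1.1.255, which is why only the address itself changes from node to node in IFCONFIG_0.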
*FQHOSTNAME="node1.localdomain"
OK, here we go. This particular variable is extremely important. Even if everything else is set right, if this one is set wrong, yp will not bind on reboot. "localdomain" is something that you will have to define in /etc/hosts.
*ROOT_LOGIN_REMOTE="yes"
Please set this so that you can access the node from a remote location. That is, unless you really LIKE the basement and you intend to be down there all day, every day.
START_LPD=no
Turning off the line printer daemon will make boot-up faster.
*CREATE_YP_CONF="yes"
OK, here we go again. When you reach this part of the document you are approximately 80% of the way thru the actual file, but only 50% of the way thru the changes you have to make, and make correctly. Be SURE to make all of the following changes.
*YP_DOMAINNAME="berkeley1"
*YP_SERVER="10.1.1.128"
*START_YPBIND="yes"
There is a lot of other junk, both before these last 4 lines and after it. You need pay it no mind for now.
After these modifications are complete, in order for your changes to be enacted, return to the command line and (as root) type
# /sbin/SuSEconfig
SuSE will then run through the script, line by line, making the necessary changes across the system. With any luck, one of the lines that will appear in the list of outputs is "Domainname Bound." If "Domainname not bound" appears, please double-check that YP_DOMAINNAME, YP_SERVER and FQHOSTNAME are set properly. Don't panic yet if the domain still refuses to bind, for there are a few changes yet to be made.
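Before running SuSEconfig, the settings that are the same on every node can be double-checked with a short script. This is a sketch under assumptions: rc_value and check_rc_config are hypothetical helper names, the expected values are the ones given in this section, and the parser only understands plain NAME=value or NAME="value" lines (a trailing comment after an unquoted value would confuse it). FQHOSTNAME, IPADDR_0 and IFCONFIG_0 differ per node, so they are left out of the fixed list and can be inspected with rc_value.

```shell
# Print the value of one variable from an rc.config-style file.
# rc_value and check_rc_config are hypothetical helper names.
rc_value() {
    # Take the last NAME=value line and strip the optional double quotes.
    sed -n "s/^$2=\"\{0,1\}\([^\"]*\)\"\{0,1\}.*/\1/p" "$1" | tail -n 1
}

# Compare the node-independent settings against the values in this guide.
check_rc_config() {
    rc=$1
    status=0
    for pair in START_LOOPBACK=yes NETCONFIG=_0 CREATE_YP_CONF=yes \
                YP_DOMAINNAME=berkeley1 YP_SERVER=10.1.1.128 START_YPBIND=yes
    do
        name=${pair%%=*}
        want=${pair#*=}
        got=$(rc_value "$rc" "$name")
        if [ "$got" != "$want" ]; then
            echo "$name is \"$got\", expected \"$want\""
            status=1
        fi
    done
    return $status
}
```

For example, check_rc_config /etc/rc.config prints one line per setting that differs, and rc_value /etc/rc.config FQHOSTNAME shows the hostname the node will claim.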
The /etc/hosts files
In order to bring a node onto the network, you will have to modify /etc/hosts, /etc/hosts.allow and /etc/hosts.deny as needed. A good example of how a node /etc/hosts file should read comes from node1:
bash-2.05a# more /etc/hosts
127.0.0.1 localhost
10.1.1.8 node8
10.1.1.7 node7
10.1.1.6 node6
10.1.1.5 node5
10.1.1.4 node4
10.1.1.3 node3
10.1.1.2 node2
10.1.1.1 node1
10.1.1.128 master
10.1.1.130 server2
# special IPv6 addresses
::1 localhost ipv6-localhost ipv6-loopback
fe00::0 ipv6-localnet
ff00::0 ipv6-mcastprefix
ff02::1 ipv6-allnodes
ff02::2 ipv6-allrouters
ff02::3 ipv6-allhosts
10.1.1.1 node1.localdomain node1
Note how the last line of the file identifies the node in question by its network address, FQHOSTNAME, and abbreviated hostname. The information in this line must correlate exactly with the information in /etc/rc.config, or ypbind will not start.
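Since a mismatch between this line and /etc/rc.config is enough to keep ypbind from starting, it is worth checking rather than eyeballing. A minimal sketch follows; hosts_has_node is a hypothetical helper name, and it assumes that the FQHOSTNAME contains a dot and that the short name is everything before the first dot.

```shell
# Succeed if the hosts file has a line of the form "IP FQHOSTNAME SHORTNAME".
# hosts_has_node is a hypothetical helper name.
# Arguments: hosts file, IP address, fully qualified hostname.
hosts_has_node() {
    awk -v ip="$2" -v fq="$3" '
        # Compare whole fields so that 10.1.1.1 does not match 10.1.1.10.
        $1 == ip && $2 == fq && $3 == substr(fq, 1, index(fq, ".") - 1) { found = 1 }
        END { exit !found }
    ' "$1"
}
```

On node1 this would be called as hosts_has_node /etc/hosts 10.1.1.1 node1.localdomain, using the same IPADDR_0 and FQHOSTNAME values that were entered in /etc/rc.config.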
After you have saved and exited, you may try running /etc/init.d/ypbind start. Sometimes, the domain will bind. More often, you will get an error message that the domain is not yet bound. Fear not: nine times out of ten, you must reboot before ypbind will take root. At the very least, at this point, the hostname command should return the appropriate value.
Please modify /etc/hosts.allow with the names of all of the other machines on the network. /etc/hosts.deny should already be properly set to deny all calls from IP addresses not in the localdomain. You may also choose to alter ~/.rhosts at this time so that rsh will work when the node comes back up. Reboot.
The First Reboot
During the first reboot after configuration, watch as SuSE brings up the various daemons. Most importantly, watch for ypbind. If that daemon fails to start, please check /etc/rc.config and make sure that your YP_DOMAINNAME, YP_SERVER and FQHOSTNAME are properly set, and that they correspond to the information in /etc/hosts. Please also double-check that the IPADDR_0 and IFCONFIG_0 addresses are set properly.
3.4 SuSE Quirks
Once or twice, despite having all of the proper information in the proper location, yp has still refused to bind. After three or four reboots, if this is still the case, try changing CHECK_ETC_HOSTS=yes. Sometimes, for reasons unknown, SuSE wants control over that file and won't fork over a connection without it. Also, be sure to check that you have loaded the proper modules for the 3com ethernet card.
4.0 Notes
While the node should now be happily on the network, the task of fully integrating it as a functional member of the massively parallel system is far from complete. The node must be introduced to the master computer and other nodes via their own /etc/hosts files. It must also be introduced to PBS via the /usr/spool/PBS/server_priv/nodes file, and to MPI via the /usr/local/mpich-1.2.4/util/machines/machines.LINUX file.
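Because the new node has to be introduced in several files, it is easy to miss one. The sketch below reports which of a set of files do not yet mention a node name; missing_node_entries is a hypothetical helper name, and the check is deliberately loose (it only looks for the name as a whole word, split on blanks and colons), since the exact line format differs between /etc/hosts, the PBS nodes file and the MPI machines file.

```shell
# List the files, among those given, that do not mention the node name.
# missing_node_entries is a hypothetical helper name.
# Arguments: node name, then one or more files to look in.
missing_node_entries() {
    node=$1
    shift
    for f in "$@"; do
        # Split on blanks and colons so "node9:2" and "10.1.1.9 node9" both
        # match, while "node90" does not. Unreadable files are reported too.
        awk -v n="$node" -F '[ \t:]+' '
            { for (i = 1; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$f" 2>/dev/null || echo "$f"
    done
}
```

On the master, it could be run against /etc/hosts, /usr/spool/PBS/server_priv/nodes and /usr/local/mpich-1.2.4/util/machines/machines.LINUX with the new node's name as the first argument; an empty result means all three files already know about the node.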
For the node to function like all of the other processing nodes, it will need some additional files not included on CD1. Find them in /cosmo/suse/cd-all. Specifically, find the xntpd package and install it on the new node. Also, you will need rdate and dump/restore, which were acquired for Aquila via the web, and can be found running on all of the nodes and the master server. Rdate and dump will need the scripts that allow them to run properly. Those scripts may be found in /usr/local/adm on any Aquila node. These very scripts (networkreboot, networkshutdown, runrdate, etc.) will also need to be provided with information about the new node.
Conclusion
The cluster is constantly changing. For example, within the last week or so, all of the cluster's processing nodes have been fitted with Myrinet cards. This required that a string of new addresses be added to /etc/hosts (10.1.2.1-10.1.2.8) as well as a host of other alterations as outlined in the installation docs from myricom.com.
There are many subtleties entailed in properly configuring a processing node. Please consult with the senior systems administrator for an up-to-date list. Furthermore, please add those details to this document. All your efforts in this regard are greatly appreciated.
Appendices
Appendix A
The following list of hardware specifications should provide all you will need to know in order to build a Linux kernel for any board of this configuration:
Processor
Dual AMD Athlon MP processors, system bus support for 200/266 MHz
Chipset
AMD-760 MP Chipset
Memory
Four 2.5 V 184-pin 25 degree angled registered DDR DIMM sockets
Expansion Slots
One AGP Pro 50 Slot
Five 64/32-bit 33 MHz (5-volt) PCI slots
Integrated IDE
Dual channel master mode for up to 4 IDE devices
Support for UDMA 100/66/33 IDE and ATAPI compliant devices
Integrated LAN
Dual 3Com 3C920 LAN controllers
10/100 Mbps data transfer rate per controller
Integrated video
ATI RAGE XL graphics accelerator
Standard 15-pin VGA port
BIOS
Phoenix BIOS 4 Mb Flash ROM
Supports APM v1.2 and ACPI v. 1.0
Myrinet cards
The master node also has:
ATA-100 support
Dual channel Ultra160 SCSI, with an Adaptec AIC-7899W controller, with 160 MB/s data throughput, supporting 15 LVD SCSI devices per channel