Raspberry Pi Cluster

This year (2018), when asked what I wanted for Christmas, I gave my wife a reasonable list of books I'd been wanting for some time. It's her opinion that I have too many books, so she refuses to buy them as gifts (in her defense, they have started to occupy our bedroom floor). After thinking long and hard about a non-literary gift, I finally settled on a couple of Raspberry Pi 3's so I could build a small cluster (for fun more than anything). I had set up a few computers to communicate in the past, and had set up MPI as well, but I was a bit rusty. When I went looking for information I was a bit disappointed, as most of the literature online focused on setting up MPI with Python on the Raspberry Pi. That, along with wanting a record for my future self, motivated me to write this post detailing how to set up the Pi's for parallel computing in C/C++ via the mpicc/mpic++ compiler wrappers.

Step 1: Setting up the Pi's

The kit I received with the Pi's contained a micro SD card pre-flashed with Raspbian, so setup was easy for me. If that is not the case for you, please look here for a good walkthrough on installing Raspbian on your Pi.

Afterwards, you will need to enable SSH on your Pi. I followed this guide for enabling SSH on a headless Pi, since I didn't want to connect a keyboard and monitor to each one just to turn it on. While I had access to the boot partitions I also overclocked the Pi's (only recommended if you have some additional way of dealing with the heat produced by the CPU, e.g. heatsinks). Since the Pi 3 doesn't allow overclocking through raspi-config, I edited the config.txt file in the /boot/ directory to enable it, as explained in this forum post.
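For the headless SSH step, Raspbian looks on the boot partition for a file named ssh at boot and turns the SSH server on if it finds one. A minimal sketch, assuming the SD card's boot partition is mounted at a path you put in BOOT_DIR (a placeholder name of mine; it defaults to a scratch directory so the snippet can be tried without a card inserted):

```shell
# BOOT_DIR is an assumption: set it to wherever your SD card's boot
# partition is mounted. It defaults to a scratch directory for a safe dry run.
BOOT_DIR="${BOOT_DIR:-$(mktemp -d)}"

# An empty file named "ssh" (no extension) is all that is needed.
touch "$BOOT_DIR/ssh"

# Confirm the file exists before ejecting the card.
ls -l "$BOOT_DIR/ssh"
```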

I would also recommend going into your router's settings and giving each Pi a static IP address, so you can avoid headaches when modifying the hosts file later on.

Step 2: Installing MPICH

I'll keep this short and sweet: use apt to install MPICH on each Pi. This can be done via

sudo apt install mpich

This is the easiest and most effective way of getting the MPI libraries and executables onto your machines.
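A quick way to confirm the install worked on a node is to check that the compiler wrappers and the launcher are on the PATH. This is only a sketch: check_mpi_tools is a helper name I made up, and the list assumes the usual MPICH names (mpicc for C, mpic++ for C++, mpiexec to launch jobs).

```shell
# check_mpi_tools is a hypothetical helper, not part of MPICH.
# It reports whether each expected MPI command can be found on the PATH.
check_mpi_tools() {
    for tool in mpicc mpic++ mpiexec; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found at $(command -v "$tool")"
        else
            echo "$tool: missing"
        fi
    done
}

check_mpi_tools
```

If anything shows up as missing, rerunning the apt command above on that Pi is the first thing I would try.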

Step 3: SSH Without a Password

To run code across the cluster without friction, each machine needs to be able to SSH to the others without asking for a password each time. To do so, you first need to generate an SSH key via:

ssh-keygen -t rsa

You will be asked a few questions (where to save the key and whether to set a passphrase). Accept the defaults by pressing Enter at each prompt without typing anything.

The SSH key needs to be copied to the other machines via

ssh-copy-id <other IP>

This needs to be done for every other IP in your cluster. Once that is done on your master node, SSH into each of the other Pi's in turn and repeat the process there. In short, every node needs to generate its own SSH key and copy it to every other machine in the cluster.

By the time this is all done, you should have SSH'd from your master/head node to every other node on the network at least once. If you haven't, do so now.
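The copy step gets repetitive, so a small loop helps. Below is a sketch under some assumptions: the addresses in NODES are made up, pi is the default Raspbian user, and DRY_RUN is a switch I added so the loop only prints the commands until you set it to 0.

```shell
# Hypothetical addresses; replace with the other nodes in your cluster
# (leave out the address of the node you are running this on).
NODES="192.168.1.101 192.168.1.102 192.168.1.103"
SSH_USER="pi"              # assumption: the default Raspbian user
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = run them

copy_keys() {
    for ip in $NODES; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh-copy-id $SSH_USER@$ip"
        else
            ssh-copy-id "$SSH_USER@$ip"
        fi
    done
}

copy_keys
```

Run it once on each node after generating that node's key. You will still be asked for each target's password during the copy, but that should be the last time.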

Step 4: Create Host File

Note: You may want to do step 5 first; I was too lazy to change the ordering. The order doesn't matter, but doing step 5 first saves you from coming back to edit the hostfile.

This is fairly straightforward: create a file (with any name) that will hold the names/IPs of all of the nodes (hosts) in the cluster, one per line. The format is

<hostname>:<num procs>

If you want to run computations on your master node, be sure to include it in the hostfile as well. The number of processes (slots) a node can take is optional and need not be specified; it is mostly there to prevent overloading a node's computational resources. The hostname here can be the actual hostname of the Pi (found by typing hostname into the terminal) or its IP address. Initially every Pi has the same hostname, which will cause issues, so use IP addresses until each Pi has been given a unique hostname (covered in step 5).
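As an illustration, a hostfile for a master and two other nodes could be written like this. The names assume the renaming from step 5 has already been done (use IP addresses otherwise), the file name hostfile is arbitrary, and the count of 4 is my pick to match the Pi 3's four cores.

```shell
# Write a three-node hostfile into the current directory.
# Hostnames and process counts here are illustrative assumptions.
cat > hostfile <<'EOF'
master:4
slave1:4
slave2:4
EOF

# Show what was written.
cat hostfile
```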

Step 5: Create Host Names and Edit Hosts Files

After the Pi's are set up you can SSH to each node and change its hostname by editing the /etc/hosts and /etc/hostname files. The hostname file should have only one line, which should be changed to whatever you want to call the node. In my case I used master/slave, so each non-master node had a hostname of the form slave{1,2,3,...}, i.e. slave1 for the first Pi, slave2 for the second, etc. In /etc/hosts the very last line (if the file hasn't been changed) needs to be updated to match the hostname in /etc/hostname.
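Here is a sketch of those two edits, run against scratch copies rather than the real files so nothing is changed by accident. It assumes a stock Raspbian install, where the hostname is raspberrypi and the last line of /etc/hosts starts with 127.0.1.1; NEW_NAME and WORK are placeholder names of mine.

```shell
NEW_NAME="slave1"       # the name this node should get
WORK="$(mktemp -d)"     # scratch directory standing in for /etc

# Stand-ins for a stock /etc/hostname and /etc/hosts (assumed contents).
printf 'raspberrypi\n' > "$WORK/hostname"
printf '127.0.0.1\tlocalhost\n127.0.1.1\traspberrypi\n' > "$WORK/hosts"

# /etc/hostname: replace the single line with the new name.
printf '%s\n' "$NEW_NAME" > "$WORK/hostname"

# /etc/hosts: rewrite the 127.0.1.1 line so it matches the new name.
sed "s/^127\.0\.1\.1.*/127.0.1.1 $NEW_NAME/" "$WORK/hosts" > "$WORK/hosts.new"
mv "$WORK/hosts.new" "$WORK/hosts"

cat "$WORK/hostname" "$WORK/hosts"
```

On a real node the same changes go into /etc/hostname and /etc/hosts with sudo, and a reboot afterwards makes sure the new name is picked up everywhere.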

Now that each host has a name you will need to add these names to the master node's /etc/hosts file. Append to the end of the file the following lines

<node IP> <5 spaces(tab)> <hostname>

for each node in the network.

Similarly, on every non-master (slave) node, you will need to add the master node's hostname and IP address (in the same format as above) to the /etc/hosts file.
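The entries themselves are simple enough to generate. A sketch with made-up addresses, written to a scratch file (HOSTS_SNIPPET is my name for it) so you can review the lines before appending them to /etc/hosts on the master:

```shell
HOSTS_SNIPPET="$(mktemp)"

# Read "IP hostname" pairs and emit them tab-separated, one per line.
# The addresses below are hypothetical; use your own static IPs.
while read -r ip name; do
    printf '%s\t%s\n' "$ip" "$name"
done > "$HOSTS_SNIPPET" <<'EOF'
192.168.1.100 master
192.168.1.101 slave1
192.168.1.102 slave2
EOF

cat "$HOSTS_SNIPPET"
```

Once the lines look right, something like sudo tee -a /etc/hosts < "$HOSTS_SNIPPET" appends them. On the other nodes only the master line is needed.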

Step 6: Test

At this point you should be able to do a test run on your cluster. To do so, we won't do any programming but rather use a built-in program called hostname to test communication back-and-forth between master and slave.

The command to run is

mpiexec -n <number procs> -hostfile <host file name (step 4)> hostname

Note: the number of processes to run should be large enough to 'hit' each node. That is, if there are three hosts in your host file and the first two allow for 3 processes each, putting the value 5 after -n above would not allow the third host to receive any commands from the master. In this case you would need to use at least seven processes.
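To make the arithmetic in that note concrete, the smallest useful -n is the sum of the process counts of every host before the last one, plus one. The sketch below works this out for a hostfile mirroring the example (two hosts with 3 processes each, then a third). It treats a host with no count as a single process, which is my assumption, and the host names are illustrative.

```shell
# Scratch hostfile matching the example in the note above.
HOSTFILE="$(mktemp)"
printf 'master:3\nslave1:3\nslave2\n' > "$HOSTFILE"

# Sum the counts of all hosts except the last, then add 1 so that
# at least one process lands on the last host.
MIN_PROCS="$(awk -F: '
    NF { if (seen) total += prev
         prev = ($2 == "" ? 1 : $2)   # assumption: no count means 1
         seen = 1 }
    END { print total + 1 }' "$HOSTFILE")"

echo "minimum -n to reach every host: $MIN_PROCS"
echo "mpiexec -n $MIN_PROCS -hostfile $HOSTFILE hostname"
```

For this hostfile the result is 7, which matches the seven processes mentioned above.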

Conclusion

That's the quick and dirty rundown of setting up a Raspberry Pi cluster using MPI and C/C++. I will concede the quality of this post is rather lacking, but I believe it provides enough information to get you started, and enough context that you can search the widely available material online for any issues you encounter. As I mentioned in the introduction, this post is partly to educate and partly to solidify (and create a reminder of) the steps I took to set up my cluster so that, if I add more Pi's, I will have a good starting point.

Good luck and happy computing!