BitTorrent for Clusters |
Introduction
When staging large input files to the many nodes of a cluster, the traditional network copy approach (scp, rcp, NFS) quickly overwhelms the machine holding the original copy of the input file. BitTorrent, a peer-to-peer file distribution protocol is naturally suited for file distribution when the file is large, and many clients want the file simultaneously - precisely the scenario when a staging large input files for a distributed cluster job.
This infrastructure make it possible to simultaneously distribute 40 copies of a 4Gb file to the nodes of a Linux cluster without bringing down the local network or the submit machine.
Installation
- Download the BitTorrent source from the SourceForge CVS repository. The patch assumes version 3.4.2 (CVS tip as of December 2004).
cvs -d:pserver:anonymous@bittorrent.cvs.sourceforge.net:/cvsroot/bittorrent login
cvs -z3 -d:pserver:anonymous@bittorrent.cvs.sourceforge.net:/cvsroot/bittorrent co BitTorrent - Download the patch and support files from http://www.umiacs.umd.edu/~nedwards/research/downloads/script.tar.gz.
cd BitTorrent
wget http://www.umiacs.umd.edu/~nedwards/research/downloads/script.tar.gz - Unpack patch and support files, and apply patch.
gunzip -c script.tar.gz | tar xovf -
patch -p0 < script.patch - Install in suitable location. I use
$HOME
as the prefix argument, which installs files in$HOME/bin
and$HOME/lib/python2.4/site-packages
.python setup.py install --prefix=$HOME
- Make sure your path contains the binary install location,
$HOME/bin
in my case; and the environment variable PYTHONPATH contains the library install location,$HOME/lib/python2.4/site-packages
in my case.
Wrapper Scripts
The wrapper scripts are very simple. They make distributing files to a cluster as simple as using scp.
-
btinit.sh [ -D n [ -E ] ]
file1 ... fileN
- Starts the BitTorrent tracker server, if necessary, and a single seeding client per file. A 4-tuple of information is output for each input file: filename, server, port, and info-hash. Usually run on the cluster job submission machine. Computes a checksum of the file to avoid creating multiple torrents for the same file. Will also handle directories, in which case the checksum is not computed - if a seeder is serving a torrent with the same name is found, it is killed; and if a torrent with the same name is found the torrent is recomputed. The option -D instructs the seeding client to terminate after n clients have downloaded the file or directory. The -E option instructs the tracker to terminate when no clients have contacted the tracker in the last 10 minutes. Use these options with caution - they have not been particularly well tested. The -E option, in particular, may terminate the tracker if only the seeding client is running, since the seeding client has a complete copy of the file or directory, and doesn't need to contact the tracker to find out where to get torrent chunks.
- Downloads the file tracked by the server server on port port with hash info-hash to the local file filename. Must be idle for 1 minute before exiting, to give its peers a chance to ask for pieces of the file. Usually run by each node before running its job to download the required input files. Failure to download the file is indicated by non-zero exit status.
- Kills the seeding client for each listed file (or all files with -a) and removes the torrent file from the tracker's directory. If no torrent files remain to be tracked, the tracker is killed and all tracker files are removed from /tmp. Run this on the same machine as btinit.sh.
- Print a table showing various statistics for each of the torrents being tracked by the tracker. Run this on the same machine as btinit.sh.
bt.sh filename server port info-hash
btfini.sh file1 ... fileN btfini.sh -a
btstatus.sh
Example
- Initiate the torrent on genesub00 for the files nraa.fasta, nrdb.fasta and msdb.fasta.
genesub00> ls -l nraa.fasta nrdb.fasta msdb.fasta
-rw-rw-r-- 1 nedwards nedwards 651374714 Nov 25 19:41 msdb.fasta
-rw-rw-r-- 1 nedwards nedwards 1100528784 Nov 25 19:42 nraa.fasta
-rw-rw-r-- 1 nedwards nedwards 760915396 Nov 25 19:45 nrdb.fasta
genesub00> btinit.sh nraa.fasta nrdb.fasta
Started torrent tracker server...
Make torrent file for nraa.fasta...
Make torrent file for nrdb.fasta...
Started torrent uploader for nraa.fasta...
Started torrent uploader for nrdb.fasta...
nraa.fasta genesub00.umiacs.umd.edu 7878 3b89f9eb2b8ba2e464e7d2184b523f73896e5568
nrdb.fasta genesub00.umiacs.umd.edu 7878 538b7e44f5c53ef6806502589d0f291b9aa8efca
genesub00> btstatus.sh
name size complete dling dled transferr
nraa.fasta 1.02GB 1 0 0 0B
3b89f9eb2b8ba2e464e7d2184b523f73896e5568
nrdb.fasta 725MB 1 0 0 0B
538b7e44f5c53ef6806502589d0f291b9aa8efca
Total 1.73GB 1/2 0/0 0/0 0B
genesub00> btinit.sh msdb.fasta nrdb.fasta
Torrent tracker server already running...
Make torrent file for msdb.fasta...
File nrdb.fasta torrent already made and file hasn't changed
Started torrent uploader for msdb.fasta...
Torrent uploader already running for nrdb.fasta...
msdb.fasta genesub00 7878 26a9afd2c998e94bc25f8798394e4b8eb3845225
nrdb.fasta genesub00 7878 538b7e44f5c53ef6806502589d0f291b9aa8efca
genesub00> btstatus.sh
name size complete dling dled transferr
msdb.fasta 621MB 1 0 0 0B
26a9afd2c998e94bc25f8798394e4b8eb3845225
nraa.fasta 1.02GB 1 0 0 0B
3b89f9eb2b8ba2e464e7d2184b523f73896e5568
nrdb.fasta 725MB 1 0 0 0B
538b7e44f5c53ef6806502589d0f291b9aa8efca
Total 2.34GB 1/3 0/0 0/0 0B - Download the file nrdb.fasta on genesub01 and genesub02.
genesub01> bt.sh nrdb.fasta genesub00 7878 538b7e44f5c53ef6806502589d0f291b9aa8efca &
genesub02> bt.sh nrdb.fasta genesub00 7878 538b7e44f5c53ef6806502589d0f291b9aa8efca &
genesub00> btstatus.sh
name size complete dling dled transferr
msdb.fasta 621MB 1 0 0 0B
26a9afd2c998e94bc25f8798394e4b8eb3845225
nraa.fasta 1.02GB 1 0 0 0B
3b89f9eb2b8ba2e464e7d2184b523f73896e5568
nrdb.fasta 725MB 1 2 0 0B
538b7e44f5c53ef6806502589d0f291b9aa8efca
Total 2.34GB 1/3 2/2 0/0 0B - Wait for completion.
genesub01> ls -l nrdb.fasta
-rw-rw-r-- 1 nedwards nedwards 760915396 Dec 21 16:10 nrdb.fasta
genesub02> ls -l nrdb.fasta
-rw-rw-r-- 1 nedwards nedwards 760915396 Dec 21 16:10 nrdb.fasta
genesub00> btstatus.sh
name size complete dling dled transferr
msdb.fasta 621MB 1 0 0 0B
26a9afd2c998e94bc25f8798394e4b8eb3845225
nraa.fasta 1.02GB 1 0 0 0B
3b89f9eb2b8ba2e464e7d2184b523f73896e5568
nrdb.fasta 725MB 3 0 2 1.41GB
538b7e44f5c53ef6806502589d0f291b9aa8efca
Total 2.34GB 3/5 0/0 2/2 1.41GB - Cleanup.
genesub00> btfini.sh -a
genesub00> btstatus.sh
Notes
There are a few things going on behind the scenes that are important to know about.
Temporary Files
The btinit.sh creates torrents and various other files the tracker server uses in /tmp
. All files used by a particular tracker match the template /tmp/bt*.UUUU.PPPP
, where UUUU is the username of the user running the tracker and PPPP is the port number it is bound to.
Tracker Server Port
The tracker server needs to bind to some port to listen to requests from clients. The temporary files in /tmp
are used to indicate whether or not a port is currently in use by some user. The btinit.sh script will search for an unused port before starting the tracker server. The port used by the tracker server can be set explicitly by setting the environment variable BTTPORT
.
Permitted IP Addresses
The tracker server and BitTorrent clients take a regular expression argument which constrains the IP addresses that will be permitted to contact the tracker server or client. Currently, the scripts use a regular expression allows any valid IP address to connect to the tracker and downloading and seeding clients. The regular expression can be set explicitly in the bt.sh and btinit.sh scripts or by setting the environment variable BTIPRE
. Note however, that this security measure is only as good as the firewalls' ability to block IP spoofing attacks.