BitTorrent for Clusters

Official website: http://www.umiacs.umd.edu/~nedwards/research/bittorrent_for_clusters.html

BitTorrent for Clusters

Introduction

When staging large input files to the many nodes of a cluster, the traditional network copy approach (scp, rcp, NFS) quickly overwhelms the machine holding the original copy of the input file. BitTorrent, a peer-to-peer file distribution protocol is naturally suited for file distribution when the file is large, and many clients want the file simultaneously - precisely the scenario when a staging large input files for a distributed cluster job.

This infrastructure make it possible to simultaneously distribute 40 copies of a 4Gb file to the nodes of a Linux cluster without bringing down the local network or the submit machine.

Installation

  1. Download the BitTorrent source from the SourceForge CVS repository. The patch assumes version 3.4.2 (CVS tip as of December 2004).
        cvs -d:pserver:anonymous@bittorrent.cvs.sourceforge.net:/cvsroot/bittorrent login
    cvs -z3 -d:pserver:anonymous@bittorrent.cvs.sourceforge.net:/cvsroot/bittorrent co BitTorrent

  2. Download the patch and support files from http://www.umiacs.umd.edu/~nedwards/research/downloads/script.tar.gz.
        cd BitTorrent
    wget http://www.umiacs.umd.edu/~nedwards/research/downloads/script.tar.gz

  3. Unpack patch and support files, and apply patch.
        gunzip -c script.tar.gz | tar xovf -
    patch -p0 < script.patch

  4. Install in suitable location. I use $HOME as the prefix argument, which installs files in $HOME/bin and $HOME/lib/python2.4/site-packages.
        python setup.py install --prefix=$HOME

  5. Make sure your path contains the binary install location, $HOME/bin in my case; and the environment variable PYTHONPATH contains the library install location, $HOME/lib/python2.4/site-packages in my case.

Wrapper Scripts

The wrapper scripts are very simple. They make distributing files to a cluster as simple as using scp.

btinit.sh [ -D n [ -E ] ] file1 ... fileN
Starts the BitTorrent tracker server, if necessary, and a single seeding client per file. A 4-tuple of information is output for each input file: filename, server, port, and info-hash. Usually run on the cluster job submission machine. Computes a checksum of the file to avoid creating multiple torrents for the same file. Will also handle directories, in which case the checksum is not computed - if a seeder is serving a torrent with the same name is found, it is killed; and if a torrent with the same name is found the torrent is recomputed. The option -D instructs the seeding client to terminate after n clients have downloaded the file or directory. The -E option instructs the tracker to terminate when no clients have contacted the tracker in the last 10 minutes. Use these options with caution - they have not been particularly well tested. The -E option, in particular, may terminate the tracker if only the seeding client is running, since the seeding client has a complete copy of the file or directory, and doesn't need to contact the tracker to find out where to get torrent chunks.

bt.sh filename server port info-hash
Downloads the file tracked by the server server on port port with hash info-hash to the local file filename. Must be idle for 1 minute before exiting, to give its peers a chance to ask for pieces of the file. Usually run by each node before running its job to download the required input files. Failure to download the file is indicated by non-zero exit status.

btfini.sh file1 ... fileN btfini.sh -a
Kills the seeding client for each listed file (or all files with -a) and removes the torrent file from the tracker's directory. If no torrent files remain to be tracked, the tracker is killed and all tracker files are removed from /tmp. Run this on the same machine as btinit.sh.

btstatus.sh
Print a table showing various statistics for each of the torrents being tracked by the tracker. Run this on the same machine as btinit.sh.

Example

  1. Initiate the torrent on genesub00 for the files nraa.fasta, nrdb.fasta and msdb.fasta.
        genesub00> ls -l nraa.fasta nrdb.fasta msdb.fasta
    -rw-rw-r-- 1 nedwards nedwards 651374714 Nov 25 19:41 msdb.fasta
    -rw-rw-r-- 1 nedwards nedwards 1100528784 Nov 25 19:42 nraa.fasta
    -rw-rw-r-- 1 nedwards nedwards 760915396 Nov 25 19:45 nrdb.fasta

    genesub00> btinit.sh nraa.fasta nrdb.fasta
    Started torrent tracker server...
    Make torrent file for nraa.fasta...
    Make torrent file for nrdb.fasta...
    Started torrent uploader for nraa.fasta...
    Started torrent uploader for nrdb.fasta...
    nraa.fasta genesub00.umiacs.umd.edu 7878 3b89f9eb2b8ba2e464e7d2184b523f73896e5568
    nrdb.fasta genesub00.umiacs.umd.edu 7878 538b7e44f5c53ef6806502589d0f291b9aa8efca

    genesub00> btstatus.sh
    name size complete dling dled transferr
    nraa.fasta 1.02GB 1 0 0 0B
    3b89f9eb2b8ba2e464e7d2184b523f73896e5568
    nrdb.fasta 725MB 1 0 0 0B
    538b7e44f5c53ef6806502589d0f291b9aa8efca
    Total 1.73GB 1/2 0/0 0/0 0B

    genesub00> btinit.sh msdb.fasta nrdb.fasta
    Torrent tracker server already running...
    Make torrent file for msdb.fasta...
    File nrdb.fasta torrent already made and file hasn't changed
    Started torrent uploader for msdb.fasta...
    Torrent uploader already running for nrdb.fasta...
    msdb.fasta genesub00 7878 26a9afd2c998e94bc25f8798394e4b8eb3845225
    nrdb.fasta genesub00 7878 538b7e44f5c53ef6806502589d0f291b9aa8efca

    genesub00> btstatus.sh
    name size complete dling dled transferr
    msdb.fasta 621MB 1 0 0 0B
    26a9afd2c998e94bc25f8798394e4b8eb3845225
    nraa.fasta 1.02GB 1 0 0 0B
    3b89f9eb2b8ba2e464e7d2184b523f73896e5568
    nrdb.fasta 725MB 1 0 0 0B
    538b7e44f5c53ef6806502589d0f291b9aa8efca
    Total 2.34GB 1/3 0/0 0/0 0B

  2. Download the file nrdb.fasta on genesub01 and genesub02.
        genesub01> bt.sh nrdb.fasta genesub00 7878 538b7e44f5c53ef6806502589d0f291b9aa8efca &

    genesub02> bt.sh nrdb.fasta genesub00 7878 538b7e44f5c53ef6806502589d0f291b9aa8efca &

    genesub00> btstatus.sh
    name size complete dling dled transferr
    msdb.fasta 621MB 1 0 0 0B
    26a9afd2c998e94bc25f8798394e4b8eb3845225
    nraa.fasta 1.02GB 1 0 0 0B
    3b89f9eb2b8ba2e464e7d2184b523f73896e5568
    nrdb.fasta 725MB 1 2 0 0B
    538b7e44f5c53ef6806502589d0f291b9aa8efca
    Total 2.34GB 1/3 2/2 0/0 0B

  3. Wait for completion.
        genesub01> ls -l nrdb.fasta
    -rw-rw-r-- 1 nedwards nedwards 760915396 Dec 21 16:10 nrdb.fasta

    genesub02> ls -l nrdb.fasta
    -rw-rw-r-- 1 nedwards nedwards 760915396 Dec 21 16:10 nrdb.fasta

    genesub00> btstatus.sh
    name size complete dling dled transferr
    msdb.fasta 621MB 1 0 0 0B
    26a9afd2c998e94bc25f8798394e4b8eb3845225
    nraa.fasta 1.02GB 1 0 0 0B
    3b89f9eb2b8ba2e464e7d2184b523f73896e5568
    nrdb.fasta 725MB 3 0 2 1.41GB
    538b7e44f5c53ef6806502589d0f291b9aa8efca
    Total 2.34GB 3/5 0/0 2/2 1.41GB

  4. Cleanup.
        genesub00> btfini.sh -a

    genesub00> btstatus.sh

Notes

There are a few things going on behind the scenes that are important to know about.

Temporary Files

The btinit.sh creates torrents and various other files the tracker server uses in /tmp. All files used by a particular tracker match the template /tmp/bt*.UUUU.PPPP, where UUUU is the username of the user running the tracker and PPPP is the port number it is bound to.

Tracker Server Port

The tracker server needs to bind to some port to listen to requests from clients. The temporary files in /tmp are used to indicate whether or not a port is currently in use by some user. The btinit.sh script will search for an unused port before starting the tracker server. The port used by the tracker server can be set explicitly by setting the environment variable BTTPORT.

Permitted IP Addresses

The tracker server and BitTorrent clients take a regular expression argument which constrains the IP addresses that will be permitted to contact the tracker server or client. Currently, the scripts use a regular expression allows any valid IP address to connect to the tracker and downloading and seeding clients. The regular expression can be set explicitly in the bt.sh and btinit.sh scripts or by setting the environment variable BTIPRE. Note however, that this security measure is only as good as the firewalls' ability to block IP spoofing attacks.

Other Assumptions
The scripts assume the following typical unix/linux programs are available in the user's path: python, sh, ls, wc, whoami, uname, ps, cat, test (as '['), echo, sed, kill, sleep, rm, wget, mkdir, cksum, expr, awk, fgrep; and that they behave in relatively standard ways.
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。
1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值