I have a big list of remote file locations and local paths where I would like them to end up. Each file is small, but there are very many of them. I am generating this list within Python.
I would like to download all of these files as quickly as possible (in parallel) before unpacking and processing them. What is the best library or Linux command-line utility to use? I attempted to implement this with multiprocessing.Pool, but that did not work with the FTP library.
I looked into pycurl, and that seemed to be what I wanted, but I could not get it to run on Windows 7 x64.
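For reference, the pattern I am aiming for looks roughly like this. It is only a sketch (Python 3, using threads instead of processes so the FTP connection objects never need to be pickled); the host, credentials, and file list are placeholders:

import os
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP

# (remote_path, local_path) pairs; the real list is generated elsewhere.
FILES = [
    ("/remote/dir/file001.dat", "downloads/file001.dat"),
    ("/remote/dir/file002.dat", "downloads/file002.dat"),
]

def download(pair):
    remote_path, local_path = pair
    local_dir = os.path.dirname(local_path)
    if local_dir:
        os.makedirs(local_dir, exist_ok=True)
    # One connection per task keeps things simple; reusing a connection per
    # worker thread would reduce login overhead for very large lists.
    with FTP("ftp.site.com") as ftp:       # placeholder host
        ftp.login("user", "password")      # placeholder credentials
        with open(local_path, "wb") as fh:
            ftp.retrbinary("RETR " + remote_path, fh.write)
    return local_path

with ThreadPoolExecutor(max_workers=8) as pool:
    for finished in pool.map(download, FILES):
        print("downloaded", finished)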
Solution
I know it's an old post, but there is a Linux utility that fits this perfectly. If you are transferring files to or from a remote host, lftp is great! I mainly use it to quickly push files to my FTP server, but it works just as well for pulling files down with the mirror command (plain mirror copies remote to local; mirror --reverse goes the other way). It also lets you transfer a user-defined number of files in parallel, which is what you wanted. To copy files from a remote path to a local path, the session would look something like this:
lftp
open ftp://user:password@ftp.site.com
cd some/remote/path
lcd some/local/path
mirror --parallel=2
Be very careful with this command, though: as with other mirror commands, if you get it wrong (for example by reversing the direction or adding --delete), you WILL DELETE FILES.
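lftp can also run non-interactively with -c, which makes the same transfer easy to call from a script. A sketch using the same placeholder host and paths as above:

lftp -c "open ftp://user:password@ftp.site.com; mirror --parallel=2 some/remote/path some/local/path"

Since your file list is generated in Python anyway, you can build a command like this there and run it with subprocess.run.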
For more options and full documentation, see the lftp man page: http://lftp.yar.ru/lftp-man.html