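How to implement multithreaded downloading had been bugging me for a long time, because a network stream cannot be repositioned (you cannot seek in it).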
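By chance I came across the article "Implementing Resumable Downloads in Visual C#" (http://www.yesky.com/390/1781390.shtml). One passage in it caught my interest and made me realize how powerful the HTTP protocol is:
Now let us talk about "resumable downloading". As the name suggests, it means continuing a download from the position where the previous attempt was interrupted.
In HTTP, the client can add a Range field to the request header to say where it wants the download to continue from.
For example, to start downloading from byte 1024, the request looks like this: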
GET /image/index_r4_c1.jpg HTTP/1.1
Accept: */*
Referer: http://192.168.3.120:8080
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705)
Host: 192.168.3.120:8080
Range:bytes=1024-
Connection: Keep-Alive
This way the download continues from byte 1024 onward.
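As a rough illustration of the quoted idea, the sketch below derives such a Range header from the size of a partially downloaded file. The helper name resume_headers is made up for this example; it only builds the header and does not send any request.

```python
import os
import tempfile


def resume_headers(path):
    """Return request headers asking for the bytes that are not on disk yet.

    Hypothetical helper for illustration: `path` is a partially
    downloaded file; a missing file means the download starts at byte 0.
    """
    already = os.path.getsize(path) if os.path.exists(path) else 0
    # "bytes=N-" means: from byte offset N (0-based) to the end of the resource
    return {'Range': 'bytes=%d-' % already}


if __name__ == '__main__':
    # Simulate a download that was interrupted after 1024 bytes
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b'\0' * 1024)
    print(resume_headers(f.name))   # {'Range': 'bytes=1024-'}
    os.remove(f.name)
```

The returned dictionary can be passed as the headers argument of an HTTP request; the file would then be reopened in append mode to receive the remaining bytes.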
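In other words, to implement multithreaded downloading, each thread only has to add 'Range:bytes=*' to the HTTP headers it sends, where * is the byte range that thread is responsible for!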
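To make the per-thread ranges concrete, here is a small sketch of one way to cut a file into byte ranges. The function name split_ranges is made up for this example, and it assumes the file has at least as many bytes as there are threads. Note that both ends of an HTTP byte range are inclusive, so a part that starts at byte s and is n bytes long ends at s + n - 1.

```python
def split_ranges(content_length, thread_amount):
    """Split a file of content_length bytes into inclusive (start, end) ranges.

    Assumes content_length >= thread_amount, so every part is non-empty.
    """
    share = content_length // thread_amount
    ranges = []
    for no in range(thread_amount):
        start = share * no
        if no == thread_amount - 1:
            # the last thread also takes the leftover bytes
            end = content_length - 1
        else:
            end = start + share - 1
        ranges.append((start, end))
    return ranges


for start, end in split_ranges(1000, 3):
    print('Range: bytes=%d-%d' % (start, end))
# Range: bytes=0-332
# Range: bytes=333-665
# Range: bytes=666-999
```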
My code is as follows:
import sys
from httplib import HTTPConnection
from threading import Thread, RLock

parts = []
thread_amount = 5
PART_LENGTH = 1024   # bytes read from the socket per call
lock = RLock()       # keeps seek + write on the shared file atomic


class Part(Thread):
    def __init__(self, NO, resource):
        Thread.__init__(self, name='part_%s' % NO)
        self.resource = resource
        self.NO = NO
        # every part gets an equal share; the last one also takes the remainder
        share = resource.content_length // thread_amount
        self.pos_start = share * NO
        if NO == thread_amount - 1:
            self.length = resource.content_length - self.pos_start
        else:
            self.length = share
        # both ends of a Range are inclusive, hence the -1
        self.pos_end = self.pos_start + self.length - 1
        self.downloaded = 0
        self.speed = 0
        parts.append(self)

    def run(self):
        if self.length <= 0:
            return
        http = HTTPConnection(self.resource.host, 80)
        headers = {'Range': 'bytes=%s-%s' % (self.pos_start, self.pos_end)}
        # a server that honours Range answers with 206 Partial Content
        http.request('GET', self.resource.url, '', headers)
        resp = http.getresponse()
        while self.downloaded < self.length:
            data = resp.read(min(PART_LENGTH, self.length - self.downloaded))
            if not data:
                # the connection was closed early; stop instead of looping forever
                break
            self.ongetdata(data)
        http.close()

    def ongetdata(self, data):
        lock.acquire()
        try:
            self.resource.F.seek(self.pos_start + self.downloaded, 0)
            self.resource.F.write(data)
        finally:
            lock.release()
        # count the bytes actually received, which may be fewer than PART_LENGTH
        self.downloaded += len(data)


class Resource:
    def __init__(self, url):
        # get host & url (assumes a plain "http://" URL, hence the offset 7)
        n = url.find('/', 7)
        self.host = url[7:n]
        self.url = url[n:]
        # get length; only the response headers are needed, so close right away
        http = HTTPConnection(self.host, 80)
        http.request('GET', self.url)
        resp = http.getresponse()
        self.content_length = int(resp.getheader('Content-Length'))
        http.close()
        # get filename & create a file of the final size before download
        n = url.rfind('/')
        self.filename = url[n + 1:]
        print(self.filename)
        self.F = open(self.filename, 'wb+')
        self.F.truncate(self.content_length)


def begin_download(url):
    r = Resource(url)
    workers = []
    for i in range(thread_amount):
        p = Part(i, r)
        p.start()
        workers.append(p)
    # wait for every part, then flush and close the file
    for p in workers:
        p.join()
    r.F.close()


try:
    thread_amount = int(sys.argv[2])
except (IndexError, ValueError):
    thread_amount = 1
begin_download(sys.argv[1])
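The part of the script that makes parallel writing possible is the seek-then-write step on the shared file. The sketch below isolates that step without any networking: an in-memory payload stands in for the remote file, and the name write_part is made up for this example.

```python
import os
import tempfile
import threading

lock = threading.Lock()


def write_part(f, offset, data):
    """Write data at the given offset; the lock keeps seek + write atomic."""
    with lock:
        f.seek(offset)
        f.write(data)


# Stand-in for the remote file: 1024 bytes of known content
payload = bytes(bytearray(range(256))) * 4
part = len(payload) // 4

fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb+') as f:
    f.truncate(len(payload))   # pre-allocate the full size
    threads = [
        threading.Thread(target=write_part,
                         args=(f, i * part, payload[i * part:(i + 1) * part]))
        for i in range(4)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    f.seek(0)
    assert f.read() == payload   # the parts landed in the right places
os.remove(path)
```

Without the lock, one thread could seek between another thread's seek and write, and a chunk would end up at the wrong offset.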
Let's test it:
python test.py http://tn4.cn3.yahoo.com/image/d43/ab6a1f9ede0aee6b0c.jpeg 8
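Afterwards a new file, ab6a1f9ede0aee6b0c.jpeg, shows up in the folder. Basically a success!
This is my first program in Python, so it is written rather casually and does very little: it only downloads, with no progress display and no resume support. I will add those when I have time.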
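PS: Implementing P2SP downloading like Xunlei (Thunder) is actually quite simple too: each thread just downloads its own part from a different source! This would require a web database of sources.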
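A minimal sketch of that idea, assuming a list of sources has already been looked up somewhere: the byte ranges are handed out to the sources in round-robin order. The function name assign_sources and the mirror URLs are hypothetical, and the sketch assumes every source serves a byte-identical copy of the file.

```python
def assign_sources(ranges, sources):
    """Pair each inclusive (start, end) byte range with a source, round-robin."""
    return [(sources[i % len(sources)], start, end)
            for i, (start, end) in enumerate(ranges)]


# Hypothetical mirrors, assumed to serve byte-identical copies
mirrors = ['http://mirror-a.example/file.jpeg',
           'http://mirror-b.example/file.jpeg']
ranges = [(0, 249), (250, 499), (500, 749), (750, 999)]

for url, start, end in assign_sources(ranges, mirrors):
    print('%s  Range: bytes=%d-%d' % (url, start, end))
```

Each tuple would then drive one download thread, which connects to its own source instead of a single shared host.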
PS: Thanks to andelf for the support!