To download a large file from a web URL in Python with multiple processes, each fetching its own chunk, the code can be organized as follows:
- First, send a HEAD request to obtain the size of the file to be downloaded, and work out how many chunks the file should be split into.
- Then build a list holding the information for every chunk: the chunk index, start offset, end offset, and the URL to download.
- Next, use multiprocessing.Process to create the desired number of processes and pass each chunk's information to the download_file function; every process runs download_file to fetch the chunk it is responsible for.
- In download_file, use requests.get to send a GET request carrying the chunk's Range header, receive the response as a stream via the stream=True parameter, and write the chunk's binary data into the local file, using file.seek to keep the write position consistent with the chunk's position in the source file.
- Finally, once the download completes, print a message reporting that the file has been downloaded.

In this way, downloading the file in chunks across multiple processes speeds up the transfer and improves efficiency.
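The chunk-splitting step described above can be sketched in isolation. In this sketch, make_ranges and the sample sizes are illustrative, not part of the downloader; note that HTTP Range offsets are inclusive on both ends, and the last chunk absorbs any remainder:

```python
def make_ranges(file_size, num_chunks):
    """Split file_size bytes into num_chunks inclusive (start, end) byte ranges."""
    chunk_size = file_size // num_chunks
    ranges = []
    for index in range(num_chunks):
        start = index * chunk_size
        # Range end offsets are inclusive; the last chunk takes the remainder
        end = file_size - 1 if index == num_chunks - 1 else start + chunk_size - 1
        ranges.append((start, end))
    return ranges

print(make_ranges(10, 4))  # [(0, 1), (2, 3), (4, 5), (6, 9)]
```

Each range maps directly to one Range header, e.g. (2, 3) becomes bytes=2-3, and the ranges together cover every byte of the file exactly once.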
Python source code for multiprocess chunked download of a large remote file (the per-chunk download wrapped in a standalone function):
import multiprocessing
import time
import requests
def download_file(start_pos, end_pos, url, file_name, file_size, shared_dict):
    # request only this chunk's byte range (both offsets are inclusive)
    headers = {'Range': 'bytes={}-{}'.format(start_pos, end_pos)}
    resp = requests.get(url, headers=headers, stream=True)
    with open(file_name, 'r+b') as file:
        # seek so the chunk lands at the same offset it occupies in the source file
        file.seek(start_pos)
        for chunk in resp.iter_content(chunk_size=1024 * 1024):
            if chunk:
                file.write(chunk)
                # note: += on a Manager dict entry is not atomic across
                # processes, so this progress counter is only approximate
                shared_dict['total_bytes_downloaded'] += len(chunk)
    print("chunk file with start_pos ~ end_pos: {} ~ {}, Download successfully!".format(start_pos, end_pos))
    print("{}/{}".format(shared_dict['total_bytes_downloaded'], file_size))
    percent = float(shared_dict['total_bytes_downloaded']) * 100 / file_size
    print("Download %.2f%% of the file." % percent)
def download_multiprocessing(url, file_name, file_size, shared_dict):
    # one worker process per CPU core
    num_processes = multiprocessing.cpu_count()
    print("number of CPU is {}, number of process is {}".format(multiprocessing.cpu_count(), num_processes))
    # create a new empty local file with the same size as the remote file
    with open(file_name, "wb") as f:
        f.truncate(file_size)
    # the first n-1 processes each handle chunk_size bytes; the last process
    # also picks up the remaining bytes
    chunk_size = file_size // num_processes
    print("chunk_size is {}".format(chunk_size))
    processes = []
    # create a process for each chunk
    print(range(num_processes))
    for index in range(num_processes):
        print("index:", index)
        start_pos = index * chunk_size
        # Range offsets are inclusive, so end one byte before the next chunk's start
        end_pos = start_pos + chunk_size - 1
        # the last process downloads through the final byte of the file
        if index == num_processes - 1:
            end_pos = file_size - 1
            print("end_pos", end_pos)
        args = (start_pos, end_pos, url, file_name, file_size, shared_dict)
        process = multiprocessing.Process(target=download_file, args=args)
        process.start()
        print(process)
        processes.append(process)
    # wait for all the processes to finish
    for process in processes:
        process.join()
        print(process)
def main():
    url = 'http://URL/abc.zip'
    file_name = './requests_largefile.zip'
    # a single HEAD request is enough to learn the remote file size
    response = requests.head(url)
    print(response)
    file_size = int(response.headers.get('Content-Length', 0))
    print("url file_size is {} bytes".format(file_size))
    # a Manager dict shares the progress counter between the worker processes
    manager = multiprocessing.Manager()
    shared_dict = manager.dict(total_bytes_downloaded=0)
    multiprocess_download_start = time.time()
    download_multiprocessing(url, file_name, file_size, shared_dict)
    multiprocess_download_end = time.time()
    multiprocess_download_cost = multiprocess_download_end - multiprocess_download_start
    print("{}/{}".format(shared_dict['total_bytes_downloaded'], file_size))
    print("Full file Download successfully! Cost time: {:.2f}s".format(multiprocess_download_cost))

if __name__ == "__main__":
    main()
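The preallocate-then-seek pattern the downloader relies on can be exercised locally without any network. In this sketch the file name, content, and chunk size are all made up for the demo: it truncates an empty file to the final size, writes the chunks out of order at their own offsets, and confirms the stitched file matches the source bytes.

```python
import os

data = bytes(range(256)) * 4           # 1024 bytes of pretend "remote" content
file_name = "stitch_demo.bin"
chunk_size = 300                       # arbitrary chunk size for the demo

# preallocate an empty file of the final size, as the downloader does
with open(file_name, "wb") as f:
    f.truncate(len(data))

# write the chunks out of order to show that seek() makes arrival order irrelevant
offsets = list(range(0, len(data), chunk_size))
for start in reversed(offsets):
    with open(file_name, "r+b") as f:
        f.seek(start)
        f.write(data[start:start + chunk_size])

with open(file_name, "rb") as f:
    assert f.read() == data            # stitched file matches the source
os.remove(file_name)
print("stitched OK")
```

This is why each worker can open the same file independently with 'r+b': the chunks never overlap, so no coordination between writers is needed.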
Sample run output:
$ python multi_process_download_url_bigfile_def_requests.py
<Response [200]>
url file_size is 368162450 bytes
number of CPU is 4, number of process is 4
chunk_size is 92040612
[0, 1, 2, 3]
('index:', 0)
<Process(Process-2, started)>
('index:', 1)
<Process(Process-3, started)>
('index:', 2)
<Process(Process-4, started)>
('index:', 3)
('end_pos', 368162449)
<Process(Process-5, started)>
chunk file with start_pos ~ end_pos: 92040612 ~ 184081224, Download successfully!
262144000/368162450
Download 71.20% of the file.
chunk file with start_pos ~ end_pos: 184081224 ~ 276121836, Download successfully!
291270053/368162450
Download 79.40% of the file.
chunk file with start_pos ~ end_pos: 276121836 ~ 368162449, Download successfully!
319347531/368162450
Download 87.03% of the file.
chunk file with start_pos ~ end_pos: 0 ~ 92040612, Download successfully!
358959344/368162450
Download 97.50% of the file.
<Process(Process-2, stopped)>
<Process(Process-3, stopped)>
<Process(Process-4, stopped)>
<Process(Process-5, stopped)>
358959344/368162450
Full file Download successfully! Cost time: 5.03s
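As a rough sanity check on the speedup claim, the file size and cost time copied from the log above give an effective throughput of about 70 MiB/s:

```python
file_size = 368162450                       # bytes, from the run above
cost = 5.03                                 # seconds, from the run above
mib_per_s = file_size / cost / (1024 ** 2)  # bytes/s -> MiB/s
print("effective throughput: %.1f MiB/s" % mib_per_s)  # effective throughput: 69.8 MiB/s
```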