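1. Background
Downloading a file with Python's requests library takes very little code. The only part that needs explanation is using the Session mechanism to get keep-alive.
I use the Session mechanism of the requests library to reuse HTTP connections. The official documentation is at https://requests.readthedocs.io/en/latest/user/advanced/
1.1 requests Session object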
Session Objects
The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3’s connection pooling. So if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
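As a minimal sketch of the "persist certain parameters" part, the snippet below sets a header on the session and prepares a request without sending it, so nothing touches the network. The User-Agent string is a made-up value for illustration, and the URL is the placeholder used later in this post.

```python
import requests

session = requests.Session()
# Session-level headers are merged into every request made through this session.
session.headers.update({"User-Agent": "demo-downloader/0.1"})  # hypothetical value

# Build the request without sending it; the host a.b.c is only a placeholder.
request = requests.Request(
    "GET",
    "https://a.b.c/dw?file=a.txt",
    headers={"Accept": "text/plain"},  # per-request header, overrides the session default
)
prepared = session.prepare_request(request)

print(prepared.headers["User-Agent"])  # comes from the session
print(prepared.headers["Accept"])      # comes from this request
session.close()
```

Headers passed on a single request take precedence over the session-level ones for that request only.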
1.2 Keep-Alive
https://requests.readthedocs.io/en/latest/user/advanced/#keep-alive
Keep-Alive
Excellent news — thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!
Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set stream to False or read the content property of the Response object.
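One way to see the reuse described above is to count how many TCP connections a server accepts. The sketch below assumes a throwaway local server built from the standard library (the handler, helper names, and request count are all made up for this example) and compares one shared Session against a fresh Session per request.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # the stdlib server only keeps connections open in HTTP/1.1 mode
    connections = 0                # TCP connections accepted so far

    def setup(self):
        super().setup()
        type(self).connections += 1  # setup() runs once per accepted connection

    def do_GET(self):
        body = b"x" * 1024
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def with_shared_session(url, n):
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy environment variables for the local demo
        for _ in range(n):
            session.get(url, timeout=5).content  # reading the body frees the connection


def with_new_session_each_time(url, n):
    for _ in range(n):
        with requests.Session() as session:
            session.trust_env = False
            session.get(url, timeout=5).content


def count_connections(fetch, n=5):
    """Run fetch(url, n) against a fresh local server; return connections accepted."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    try:
        fetch(url, n)
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    print("shared session:", count_connections(with_shared_session))
    print("new session per request:", count_connections(with_new_session_each_time))
```

With a shared Session the server should see a single connection for all requests, while creating a new Session each time opens one connection per request.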
2. Code
# Sample Python script: download a file repeatedly over one Session and save each copy.
from datetime import datetime
import os
import requests

"""
Prepare a URL in advance that serves a downloadable file without authentication:
this example adds no authentication headers and fetches the file with a plain GET.
"""
timeFormat = "%Y-%m-%d %H:%M:%S.%f"


def download_file(session, url, file_name):
    # Fill in a browser User-Agent string here if the server requires one.
    chrome = ""
    headers = {"User-Agent": chrome} if chrome else {}
    r = session.get(url, headers=headers, stream=True)
    r.raise_for_status()
    # Reading r.content consumes the whole body, which releases the
    # connection back to the pool so the next request can reuse it.
    with open(file_name, 'wb') as f:
        f.write(r.content)


def downloadAction(downloadCount):
    url = "https://a.b.c/dw?file=a.txt"
    startTime = datetime.now()
    print("file download startTime:%s" % startTime.strftime(timeFormat))
    # Save into the data folder under the current directory; file names start with "file".
    os.makedirs("./data", exist_ok=True)
    # One Session for every download, so the underlying connection is reused.
    with requests.Session() as session:
        for index in range(downloadCount):
            filePath = "./data/file{0}.txt".format(index)
            download_file(session, url, filePath)
    endTime = datetime.now()
    print("file download endTime:%s" % endTime.strftime(timeFormat))
    # total_seconds() keeps the fractional part; .seconds would truncate it.
    consumedTimeBySecond = (endTime - startTime).total_seconds()
    # Assume each file is 10 MB.
    totalFileSize = downloadCount * 10
    avgSpeed = totalFileSize / consumedTimeBySecond if consumedTimeBySecond > 0 else 0.0
    print("%d file downloads, consumed(second):%.3f, avgSpeed(MB/s):%f" %
          (downloadCount, consumedTimeBySecond, avgSpeed))


if __name__ == '__main__':
    downloadAction(10)
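For large files, holding the whole body in memory through r.content may not be desirable. A possible variant, sketched here under assumptions, writes the body chunk by chunk with iter_content. The helper name download_streamed, the chunk size, and the local test server with its 1 MB payload are all invented for this example and are not part of the original script.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests

PAYLOAD = b"0123456789" * 100_000  # 1 MB of demo data, made up for this example


def download_streamed(session, url, file_name, chunk_size=64 * 1024):
    """Write the response body to file_name chunk by chunk; return bytes written."""
    written = 0
    # The with-block makes sure the response is closed even if the loop fails.
    with session.get(url, stream=True, timeout=10) as r:
        r.raise_for_status()
        with open(file_name, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written


class PayloadHandler(BaseHTTPRequestHandler):
    """Throwaway local server so the sketch runs without a real download URL."""
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def demo():
    """Download PAYLOAD from the local server; return (bytes written, file size)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), PayloadHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    try:
        with tempfile.TemporaryDirectory() as tmp, requests.Session() as session:
            session.trust_env = False  # ignore proxy environment variables locally
            path = os.path.join(tmp, "file0.bin")
            written = download_streamed(session, url, path)
            return written, os.path.getsize(path)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Because the loop reads the body to the end, the connection can still go back to the pool for the next download in the same session.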
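This article shows how to download files with Python's requests library and how to use a Session object to keep connections alive and improve download efficiency. It includes a sample script that downloads files in a batch.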