Python requests file download example

russle

Last modified 2022-07-03 23:21:38



First published 2022-07-03 23:18:00

Article link: https://blog.csdn.net/russle/article/details/125591386


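1. Background

Downloading a file with Python's requests library is simple in itself. The only point worth explaining is how to use the Session mechanism when you need keep-alive.

I use the Session mechanism of the requests library to reuse HTTP connections. The official documentation is at https://requests.readthedocs.io/en/latest/user/advanced/

1.1 The requests Session object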

Session Objects

The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3’s connection pooling. So if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
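
To make the "persist certain parameters" part concrete, here is a minimal sketch that sets a header once on a Session and checks that it is merged into a later request. The header name `X-Demo-Token` and the URL are made up for illustration; `prepare_request` only builds the request, so nothing is sent over the network.

```python
import requests

# Hypothetical header name and URL, used only for illustration.
session = requests.Session()
session.headers.update({"X-Demo-Token": "abc123"})

# prepare_request merges session-level settings into the request
# without sending anything over the network.
request = requests.Request(
    "GET",
    "https://example.invalid/file.txt",
    headers={"Accept": "text/plain"},
)
prepared = session.prepare_request(request)

print(prepared.headers["X-Demo-Token"])  # header set once on the session
print(prepared.headers["Accept"])        # header set on this request only
session.close()
```

Per-request headers take precedence over session-level ones with the same name, which is why `Accept` here carries the value given on the request.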

1.2 Keep-Alive

https://requests.readthedocs.io/en/latest/user/advanced/#keep-alive
Keep-Alive

Excellent news — thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!
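
One way to see the reuse for yourself is to count client connections on a throwaway local server. The sketch below is my own illustration, not part of the requests documentation: `Handler`, `client_ports` and `fetch_with_new_session` are hypothetical names, and the server is Python's built-in `http.server` bound to a random local port. Each distinct client source port corresponds to one TCP connection.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests

client_ports = []  # one entry per request: the client's source port


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # required for persistent connections

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def fetch_with_new_session(url):
    # A fresh Session per call, so no connection can be reused.
    with requests.Session() as s:
        s.trust_env = False  # ignore proxy environment variables
        return s.get(url).content


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

# Three requests through one Session: the pooled connection is reused.
with requests.Session() as session:
    session.trust_env = False
    for _ in range(3):
        session.get(url)
session_ports = set(client_ports)

# Three requests, each through its own short-lived Session.
client_ports.clear()
for _ in range(3):
    fetch_with_new_session(url)
plain_ports = set(client_ports)

server.shutdown()
server.server_close()
print("shared session connections:", len(session_ports))
print("separate session connections:", len(plain_ports))
```

With the shared Session all three requests should arrive from the same source port, while the separate sessions typically each open a new connection.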

Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set stream to False or read the content property of the Response object.
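
For large files, reading everything through the content property holds the whole body in memory. A possible alternative, sketched below under my own assumptions, streams the body in chunks and still reads it to the end so the connection can go back to the pool. The function name `stream_to_file`, the 64 KiB chunk size and the 30 second timeout are illustrative choices, not values from the documentation.

```python
import requests


def stream_to_file(session, url, path, chunk_size=64 * 1024):
    """Stream the response body to `path`; return the number of bytes written."""
    written = 0
    # Leaving the `with` block closes the response. Because the loop reads
    # the body to the end, the connection can be returned to the pool.
    with session.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        with open(path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

Usage would look like `stream_to_file(session, url, "./data/file0.txt")` with an existing `requests.Session()`; the returned byte count can also feed a speed calculation.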

2. Code

# Sample Python script: download a file several times and write each copy to disk.

import os
from datetime import datetime

import requests

"""
  Prepare in advance a URL that serves a downloadable file without authentication.
  This example adds no authentication headers and downloads the file with a plain GET.
"""

timeFormat = "%Y-%m-%d %H:%M:%S.%f"


def download_file(session, url, file_name):
    # Left empty in this example; put a real User-Agent string here if the server needs one.
    chrome = ""
    headers = {"User-agent": chrome}

    # With the default stream=False the whole body is read immediately,
    # so the connection returns to the pool and the next request can reuse it.
    r = session.get(url, headers=headers)
    r.raise_for_status()
    with open(file_name, 'wb') as f:
        f.write(r.content)


def downloadAction(downloadCount):
    # Placeholder URL; replace it with a real download link.
    url = "https://a.b.c/dw?file=a.txt"

    startTime = datetime.now()
    print("file download startTime:%s" % startTime.strftime(timeFormat))

    # Save into the data folder under the current directory; file names start with "file".
    os.makedirs("./data", exist_ok=True)
    session = requests.Session()
    for index in range(downloadCount):
        filePath = "./data/file{0}.txt".format(index)
        download_file(session, url, filePath)

    session.close()

    endTime = datetime.now()
    print("file download   endTime:%s" % endTime.strftime(timeFormat))
    consumedTimeBySecond = (endTime - startTime).total_seconds()
    # Assume each file is 10 MB.
    totalFileSize = downloadCount * 10
    avgSpeed = totalFileSize / consumedTimeBySecond if consumedTimeBySecond > 0 else 0.0
    print("%d times download file, consumed(second):%.3f, avgSpeed(MB/s):%f" %
          (downloadCount, consumedTimeBySecond, avgSpeed))


if __name__ == '__main__':
    downloadAction(10)
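
The speed figure in the script depends on an assumed file size of 10 MB. If you would rather compute it from the bytes actually written, a small helper along these lines could be used. The name `average_speed_mb_per_s` and the sample numbers are my own illustration; in practice the byte total could come from `os.path.getsize` on each saved file.

```python
from datetime import datetime, timedelta


def average_speed_mb_per_s(total_bytes, start, end):
    """Return the average throughput in MB/s between two datetimes."""
    elapsed = (end - start).total_seconds()  # keeps fractions of a second
    if elapsed <= 0:
        return 0.0  # avoid dividing by zero for very fast runs
    return total_bytes / (1024 * 1024) / elapsed


# Illustrative numbers: 100 MB downloaded in 4 seconds.
start = datetime(2022, 7, 3, 23, 18, 0)
end = start + timedelta(seconds=4)
print(average_speed_mb_per_s(100 * 1024 * 1024, start, end))  # 25.0
```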
