Download large file in python with requests

This article is a translation of: Download large file in python with requests

Requests is a really nice library. I'd like to use it to download big files (>1GB). The problem is that it's not possible to keep the whole file in memory; I need to read it in chunks. And this is a problem with the following code:

import requests

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    f = open(local_filename, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024): 
        if chunk: # filter out keep-alive new chunks
            f.write(chunk)
    f.close()
    return 

For some reason it doesn't work this way: it still loads the whole response into memory before saving it to a file.
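
One rough way to confirm that the whole body is being buffered is to trace Python's allocations around the call; a sketch (the URL is a placeholder):

import tracemalloc
import requests

tracemalloc.start()
r = requests.get('https://example.com/bigfile.bin')  # placeholder URL
current, peak = tracemalloc.get_traced_memory()
# peak should be roughly the size of the downloaded file
print(f'current: {current / 2**20:.1f} MiB, peak: {peak / 2**20:.1f} MiB')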

UPDATE

If you need a small client (Python 2.x/3.x) which can download big files from FTP, you can find it here. It supports multithreading and reconnects (it monitors connections), and it also tunes socket params for the download task.
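
For reference, a minimal Python 3 sketch of a chunked FTP download using the standard-library ftplib (host and paths are placeholders; the linked client itself is not reproduced here):

from ftplib import FTP

def ftp_download(host, remote_path, local_path, blocksize=64 * 1024):
    # Stream the remote file to disk in fixed-size blocks so the
    # whole file never has to fit in memory.
    with FTP(host) as ftp:
        ftp.login()  # anonymous login; pass credentials if required
        with open(local_path, 'wb') as f:
            ftp.retrbinary('RETR ' + remote_path, f.write, blocksize=blocksize)

ftp_download('ftp.example.com', 'pub/bigfile.bin', 'bigfile.bin')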


#1

Reference: https://stackoom.com/question/1836h/使用请求在python中下载大文件


#2

Your chunk size could be too large; have you tried dropping that, maybe to 1024 bytes at a time? (Also, you could use with to tidy up the syntax.)

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024): 
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
    return 

Incidentally, how are you deducing that the response has been loaded into memory?

It sounds as if Python isn't flushing the data to the file; based on other SO questions, you could try f.flush() and os.fsync() to force the file write and free memory:

    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                f.flush()                 # push Python's buffer to the OS
                os.fsync(f.fileno())      # force the OS write to disk (needs "import os")

#3

With the following streaming code, the Python memory usage is restricted regardless of the size of the downloaded file:

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192): 
                if chunk: # filter out keep-alive new chunks
                    f.write(chunk)
                    # f.flush()
    return local_filename

Note that the number of bytes returned using iter_content is not exactly the chunk_size; it's expected to be a random number that is often far bigger, and is expected to be different in every iteration.
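
A quick way to observe the actual chunk sizes is to print their lengths; a sketch (the URL is a placeholder):

import requests

with requests.get('https://example.com/bigfile.bin', stream=True) as r:
    for i, chunk in enumerate(r.iter_content(chunk_size=8192)):
        print(i, len(chunk))  # lengths vary and need not equal 8192
        if i >= 9:            # sample only the first few chunks
            break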

See http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow for further reference.


#4

It's much easier if you use Response.raw and shutil.copyfileobj():

import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    return local_filename

This streams the file to disk without using excessive memory, and the code is simple.
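
One caveat worth noting (an addition, not part of the original answer): Response.raw bypasses requests' automatic decompression, so if the server sends the body gzip-compressed, the file written to disk stays compressed. A sketch that asks the underlying urllib3 stream to decode the Content-Encoding while reading:

import functools
import shutil
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        # Ask urllib3 to decode gzip/deflate transfer encodings on read.
        r.raw.read = functools.partial(r.raw.read, decode_content=True)
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename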


#5

Not exactly what OP was asking, but... it's ridiculously easy to do that with urllib:

from urllib.request import urlretrieve
url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
dst = 'ubuntu-16.04.2-desktop-amd64.iso'
urlretrieve(url, dst)
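
If you also want progress output, urlretrieve accepts a reporthook callback; a minimal sketch (reusing the url and dst variables from above):

from urllib.request import urlretrieve

def report(block_num, block_size, total_size):
    # Called by urlretrieve after each block; total_size is -1 if unknown.
    if total_size > 0:
        done = min(block_num * block_size, total_size)
        print(f'\r{done / total_size:.0%}', end='', flush=True)

urlretrieve(url, dst, reporthook=report)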

Or this way, if you want to save it to a temporary file:

from urllib.request import urlopen
from shutil import copyfileobj
from tempfile import NamedTemporaryFile
url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
    copyfileobj(fsrc, fdst)

I watched the process:

watch 'ps -p 18647 -o pid,ppid,pmem,rsz,vsz,comm,args; ls -al *.iso'

And I saw the file growing, but memory usage stayed at 17 MB. Am I missing something?

