Adapted from the CSDN blog post "Getting the file name when downloading files with Python" (我叫农的博客).
When downloading a large file, set stream=True. By default stream is False: requests downloads the entire response body immediately and holds it in memory, so a sufficiently large file can exhaust available memory.
response = requests.get(url, headers=headers, timeout=12)
with open('C:\\Users\\Administrator\\Desktop\\new1.pdf', 'wb') as f:
    f.write(response.content)  # the whole file is already in memory at this point
With stream=True, requests does not download the body right away; the download starts only when you iterate over it with iter_content or iter_lines, or access the content attribute.
iter_content: iterate over the body to download chunk by chunk
iter_lines: iterate over the body to download line by line
Downloading a large file through either iterator keeps memory usage low, because only a small piece of the data is held at any one time.
res = requests.get(url_file, stream=True)
with open("file_path", "wb") as pyFile:  # use an absolute path such as 'C:\\xx\down\\'
    for chunk in res.iter_content(chunk_size=1024):  # 1024 bytes per chunk
        if chunk:
            pyFile.write(chunk)
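The chunked pattern above can be wrapped into a reusable helper and exercised without a network request. A minimal sketch follows; FakeResponse is only a stand-in for a real requests.Response so the behavior can be demonstrated offline:

```python
import os
import tempfile

def save_streamed(response, file_path, chunk_size=1024):
    """Write a streamed response to disk one chunk at a time."""
    total = 0
    with open(file_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                total += len(chunk)
    return total

class FakeResponse:
    """Stand-in for requests.Response; for demonstration only."""
    def __init__(self, data):
        self.data = data
    def iter_content(self, chunk_size=1024):
        # Yield the payload in slices of at most chunk_size bytes.
        for i in range(0, len(self.data), chunk_size):
            yield self.data[i:i + chunk_size]

# A 5000-byte payload arrives as chunks of at most 1024 bytes each.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "demo.bin")
    written = save_streamed(FakeResponse(b"x" * 5000), path)
    print(written)  # 5000
```

At no point does the helper hold more than one chunk of the payload in memory, which is exactly the property that matters for large downloads.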
import os
import time
from urllib.parse import unquote

import requests

headers = {
    '...': '...',
}

def get_file_name(url, resp_headers):
    # Prefer the filename given in the Content-Disposition response header.
    filename = ''
    if resp_headers.get('Content-Disposition'):
        disposition_split = resp_headers['Content-Disposition'].split(';')
        if len(disposition_split) > 1:
            if disposition_split[1].strip().lower().startswith('filename='):
                file_name = disposition_split[1].split('=')
                if len(file_name) > 1:
                    filename = unquote(file_name[1])
    # Fall back to the last path segment of the URL, with any query string stripped.
    if not filename and os.path.basename(url):
        filename = os.path.basename(url).split("?")[0]
    # Last resort: use a timestamp (as a string, so path joining works).
    if not filename:
        return str(time.time())
    return filename
def download_file(response, file_name):
    # file_path is the global download directory set in __main__.
    with open(file_path + file_name, "wb") as pyFile:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                pyFile.write(chunk)
    print('Download finished')
def start(url):
    get_file = requests.get(url=url, headers=headers, stream=True, allow_redirects=False, timeout=10)
    # With Transfer-Encoding: chunked there is no Content-Length header, so use .get()
    content_length = get_file.headers.get('Content-Length')
    file_name = get_file_name(url, get_file.headers)
    download_file(get_file, file_name)
    print("File size:", content_length, "file name:", file_name)
if __name__ == '__main__':
    file_path = 'C:\\xx\down\\'
    url = 'https://iterm2.com/downloads/stable/iTerm2-3_3_6.zip'
    start(url)
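The filename logic in get_file_name can also be checked without touching the network. Below is a standalone sketch of the same parsing; file_name_from_headers is a hypothetical helper name, and the header values are made up for illustration:

```python
import os
from urllib.parse import unquote

def file_name_from_headers(url, resp_headers):
    """Same idea as get_file_name above: header first, URL path as fallback."""
    filename = ''
    disposition = resp_headers.get('Content-Disposition', '')
    parts = disposition.split(';')
    if len(parts) > 1 and parts[1].strip().lower().startswith('filename='):
        kv = parts[1].split('=')
        if len(kv) > 1:
            filename = unquote(kv[1]).strip('"')
    if not filename:
        # Fall back to the last URL path segment, query string stripped.
        filename = os.path.basename(url).split('?')[0]
    return filename

# The header, when present, wins over the URL path.
print(file_name_from_headers(
    'https://example.com/dl?id=7',
    {'Content-Disposition': 'attachment; filename=report%20v2.pdf'}))
# → report v2.pdf

# Otherwise the last path segment is used, without the query string.
print(file_name_from_headers(
    'https://iterm2.com/downloads/stable/iTerm2-3_3_6.zip?x=1', {}))
# → iTerm2-3_3_6.zip
```

The extra strip('"') handles servers that quote the filename, a common variant the blog's version does not cover.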