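Solutions
If a response header contains garbled Chinese such as æµ\x99大ä¸\xadæ\x8e§ECS-700å\x8a\x9fè\x83½å\, re-encode it:
# The header is UTF-8 bytes that were decoded as Latin-1; recover the bytes and decode them as UTF-8
filename = response.headers['Content-Disposition'].encode('raw_unicode_escape').decode()
Extract the file extension:
file_type = filename[filename.rfind('.') + 1:]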
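The two steps above can be sketched without a network call. This is a minimal illustration, not a complete header parser: the helper names `repair_header_text` and `filename_from_disposition` and the sample header value are assumptions made for the example, and the regular expression only handles the plain `filename=` form, not the RFC 5987 `filename*=` form.

```python
import os
import re


def repair_header_text(value: str) -> str:
    """Undo Latin-1 mis-decoding of a UTF-8 header value (hypothetical helper)."""
    try:
        return value.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not mis-decoded UTF-8, so leave it unchanged
        return value


def filename_from_disposition(header: str) -> str:
    """Pull the filename parameter out of a Content-Disposition value (hypothetical helper)."""
    match = re.search(r'filename="?([^";]+)"?', header)
    return match.group(1).strip() if match else ''


# Simulate what the client sees: UTF-8 bytes decoded as Latin-1 (sample value is an assumption)
raw_header = 'attachment; filename="浙大中控.pdf"'.encode('utf-8').decode('latin-1')
fixed = repair_header_text(raw_header)
name = filename_from_disposition(fixed)
# os.path.splitext returns the extension with its leading dot
extension = os.path.splitext(name)[1].lstrip('.')
print(name, extension)
```

Using `os.path.splitext` instead of `rfind('.')` avoids returning the whole string when the name has no dot at all.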
Large files can be downloaded in chunks:
import requests


def file_download(file_url: str, filename: str):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/87.0.4280.88 Safari/537.36',
    }
    # stream=True defers downloading the body until it is iterated
    res = requests.get(file_url, headers=headers, stream=True)
    with open(filename, 'wb') as f:
        # Write the body in 10 KB chunks so the whole file is never held in memory
        for chunk in res.iter_content(chunk_size=1024 * 10):
            if chunk:
                f.write(chunk)
Convert the response content to an HtmlResponse so that CSS selectors can be used:
import requests
from scrapy.http import HtmlResponse

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/72.0.3626.121 Safari/537.36',
}
req_url = 'https://www.baidu.com/'
res = requests.get(req_url, timeout=10, headers=headers)
# Wrap the downloaded bytes in a Scrapy HtmlResponse so that .css() and .xpath() are available
res = HtmlResponse(req_url, body=res.content, encoding="utf-8")
print(res.text)
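Requests to some HTTPS sites fail with an error, typically an SSL certificate verification failure. Adding the verify=False parameter to the request skips certificate verification; because this disables a security check, use it only for sites you trust.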
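A minimal sketch of passing `verify=False`, assuming the error comes from certificate verification. The helper name `get_without_verification` and the default timeout are hypothetical; `urllib3.disable_warnings` is used only to silence the warning that unverified requests otherwise emit.

```python
import requests
import urllib3

# Silence the InsecureRequestWarning that urllib3 emits for unverified requests
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def get_without_verification(url: str, **kwargs) -> requests.Response:
    """Hypothetical helper: GET a URL with certificate verification turned off."""
    kwargs.setdefault('timeout', 10)
    # Force verify=False even if the caller passed a different value
    kwargs['verify'] = False
    return requests.get(url, **kwargs)
```

A call such as `get_without_verification('https://example.com/')` then behaves like `requests.get` with `verify=False`.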