A pitfall when downloading a PDF with requests: the browser opens it fine, but requests gets an empty response (304) or a 404

.java&&web

Last modified 2023-02-27 14:48:01

First published 2023-02-27 14:37:01

Article link: https://blog.csdn.net/weixin_44055702/article/details/129241457

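A note on a pitfall I hit when downloading a PDF with requests. It is not quite the same as the problems you run into when downloading Excel and other files. Opening the URL in a browser shows the complete PDF, but a request sent with requests returns 404 when no request headers are set, and 304 with an empty body when headers are added. Only a 206 response contains the actual PDF content. 206 (Partial Content) is what a server returns for a request that carries a Range header, so the fix is to send one.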
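To make the status codes above concrete, here is a minimal sketch of a check that tells a real PDF response apart from the empty 304 or the 404. The helper name `looks_like_pdf_response` is hypothetical and not part of requests. It assumes a well-formed PDF, which begins with the marker `%PDF-`.

```python
def looks_like_pdf_response(status_code: int, first_bytes: bytes) -> bool:
    """Return True when the response plausibly carries real PDF data.

    Hypothetical helper for illustration only.
    """
    # Only 200 (full body) and 206 (partial content) carry file bytes;
    # the 304 and 404 responses described above do not contain the PDF.
    if status_code not in (200, 206):
        return False
    # A well-formed PDF file starts with the marker "%PDF-".
    return first_bytes.startswith(b"%PDF-")
```

With requests, it could be called as `looks_like_pdf_response(resp.status_code, resp.content[:8])` on a non-streamed response.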

Here is the solution:

import requests


def pdf_url_requests(pdf_url):
    # Step 1: ask the server for the size of the PDF with a HEAD request
    res_size = requests.head(pdf_url, timeout=20)
    content_length = int(res_size.headers['Content-Length'])

    # Step 2: use that size to build a Range header covering the whole file.
    # Byte ranges are inclusive, so the last byte is content_length - 1.
    headers = {
        'Connection': 'keep-alive',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Range': f'bytes=0-{content_length - 1}',
    }

    resp = requests.get(pdf_url, headers=headers, stream=True, timeout=20)
    return resp


def download_pdf(pdf_url, file_name):
    res = pdf_url_requests(pdf_url)
    # Output path
    pdf_path = fr'./{file_name}'
    print(pdf_path)
    # Write the streamed body to disk chunk by chunk
    with open(pdf_path, 'wb') as f:
        for content in res.iter_content(chunk_size=512):
            if content:
                f.write(content)
    res.close()


download_pdf('https://bigquant.com/wiki/static/upload/f3/f3192d11-d1e8-4c51-9e09-3e762ed0a028.pdf', '模板.pdf')
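
A 206 response normally carries a `Content-Range` header of the form `bytes first-last/total`. As a rough sketch, one way to confirm that the range request really covered the whole file is to parse that header. The helper names `parse_content_range` and `covers_whole_file` are hypothetical and not part of requests, and the sketch assumes the server sends a single-range `Content-Range` value.

```python
import re


def parse_content_range(value: str):
    """Parse 'bytes first-last/total' into (first, last, total).

    total is None when the server reports an unknown length ('*').
    Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range value: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total


def covers_whole_file(value: str) -> bool:
    """Return True when the range starts at byte 0 and ends at the last byte."""
    first, last, total = parse_content_range(value)
    # Byte positions are inclusive, so the last byte of the file is total - 1.
    return total is not None and first == 0 and last == total - 1
```

For a response object `resp` with status 206, the check would be `covers_whole_file(resp.headers['Content-Range'])`.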