Fixing a PDF download bug in Python

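The problem, its cause, and the fix are summarized here, followed by the full code:
  • 1. Bug: the binary result returned by requests is empty (resp.content == b''), so the PDF cannot be saved.
  • 2. Cause: with stream=True, requests does not download the body until it is read. response.close() releases the underlying connection, so if it runs before the body has been read and saved, the unread data is discarded and resp.content comes back empty. (The description commonly quoted for this, that Response.Close() calls HttpWorkerRequest.CloseConnection(), terminates the socket connection to the client, and invalidates buffers on the server, the client, and anything in between, refers to ASP.NET rather than requests. The lesson is the same: closing the connection before the data has been stored tends to lose data.)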
  • 3. Fix: call response.close() only after the body has been read and written to disk, or do not call it at all.
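The ordering problem can be reproduced without a network connection. The sketch below uses a hypothetical stand-in class, FakeStreamedResponse, which is not part of requests and only mimics the relevant behavior: the body stays unread until iter_content() is consumed, and nothing can be read once close() has run. A real requests response may behave differently in detail depending on the version, so treat this as an illustration of the cause, not a description of the library internals.

```python
import io


class FakeStreamedResponse:
    """Hypothetical stand-in for a streamed response; not part of requests."""

    def __init__(self, body: bytes):
        self._raw = io.BytesIO(body)  # plays the role of the unread socket data
        self._closed = False

    def iter_content(self, chunk_size=512):
        # After close() there is nothing left to read.
        while not self._closed:
            chunk = self._raw.read(chunk_size)
            if not chunk:
                break
            yield chunk

    def close(self):
        self._closed = True


def read_all(resp):
    """Join every chunk the response still yields."""
    return b"".join(resp.iter_content(chunk_size=4))


body = b"%PDF-1.4 fake pdf bytes"

early = FakeStreamedResponse(body)
early.close()                    # closed before anything was read
closed_first = read_all(early)

late = FakeStreamedResponse(body)
read_first = read_all(late)      # read everything first
late.close()                     # then close

print(closed_first)        # b''
print(read_first == body)  # True
```

Closing first yields an empty result, which matches the symptom in item 1. Reading first and closing afterwards returns the whole body.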
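import logging
import random
import socket
import time

import requests


def get_proxies():
    # Placeholder: return a proxies mapping in the form requests expects.
    # An empty dict means no proxy is used.
    proxy = {}
    return proxy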

def pdf_url_requests(pdf_url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
    }
    # Assign the proxy before the loop so the first request does not hit an unbound variable.
    proxy = get_proxies()
    for i in range(4):
        try:
            # stream=True defers downloading the body until it is read.
            resp = requests.get(pdf_url, headers=headers, proxies=proxy, stream=True, timeout=20)
        except socket.error as err:
            logging.warning(f"{pdf_url} : this website may socket timeout, sleep about 1s and try again : {err} ")
            time.sleep(random.uniform(0.5, 1.5))
            proxy = get_proxies()
        except Exception as _e:
            time.sleep(random.uniform(0.5, 1.5))
            logging.exception(f'{pdf_url} :Exception error happened in methods_get_requests :{_e}')
            proxy = get_proxies()
        else:
            # A Response is falsy for 4xx/5xx status codes; release it and retry.
            if not resp:
                resp.close()
                continue
            # Return the response still open: the caller reads the body, then closes it.
            return resp
    # All four attempts failed.
    return None

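def download_pdf(pdf_url, pdf_name):
    r = pdf_url_requests(pdf_url)
    if r is None:
        logging.error(f"{pdf_url} : download failed after all retries")
        return
    pdf_path = f"E:/pdf/AuditReport/{pdf_name}"
    try:
        with open(pdf_path, 'wb') as f:
            for content in r.iter_content(chunk_size=512):
                if content:
                    f.write(content)
    finally:
        # Close only after the body has been read and written to disk.
        r.close()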
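Another way to apply the same fix is to let a with-block decide when the response is closed. A requests response can be used as a context manager, so close() runs when the block exits, which is after the body has been written. This is a minimal sketch under stated assumptions: the names save_stream and download_pdf_with_context are illustrative and not part of the original code, and the headers, proxy, and retry logic are left out for brevity.

```python
def save_stream(resp, pdf_path, chunk_size=8192):
    """Write a streamed response body to pdf_path; return the bytes written."""
    written = 0
    with open(pdf_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                written += f.write(chunk)
    return written


def download_pdf_with_context(pdf_url, pdf_path, timeout=20):
    """Illustrative helper: stream a PDF to disk, closing the response last."""
    # Third-party import kept local so save_stream has no dependency on it.
    import requests

    with requests.get(pdf_url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()  # fail early on 4xx/5xx
        written = save_stream(resp, pdf_path)
    # Leaving the with-block closes the response, after the body is on disk.
    if written == 0:
        raise ValueError(f"{pdf_url} returned an empty body")
    return written
```

Because the response is never closed by hand, close() cannot run before the body is read. The check on the byte count turns the empty-content symptom into an explicit error instead of a zero-byte file.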