Notes on lxml and requests issues

txf-ly

Category: Python

Article link: https://blog.csdn.net/tianxifeng/article/details/102803912

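Fixing incomplete XPath results in lxml

If xpath returns only part of the expected results, strip the \x00 (NUL) characters from the HTML before parsing it.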

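import requests
from lxml import etree

r = requests.get(url)
# Remove NUL characters before parsing; fall back to a stub document if nothing is left.
body = r.text.strip().replace('\x00', '').encode('utf8') or b'<html/>'
root = etree.fromstring(body, parser=etree.HTMLParser(recover=True, encoding='utf8'))
root.xpath(...)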
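
The cleaning step above can be wrapped in small helpers so it is applied the same way everywhere. This is only a sketch: the function names `clean_html_text` and `parse_html` are made up for illustration and are not part of lxml or requests.

```python
def clean_html_text(text: str) -> bytes:
    """Strip surrounding whitespace, drop NUL characters, and return UTF-8 bytes."""
    cleaned = text.strip().replace("\x00", "")
    # Empty input is not a parseable document, so fall back to a stub element.
    return cleaned.encode("utf8") or b"<html/>"


def parse_html(text: str):
    """Parse the cleaned HTML with lxml's recovering HTML parser (hypothetical helper)."""
    # Imported lazily so clean_html_text stays usable without lxml installed.
    from lxml import etree

    parser = etree.HTMLParser(recover=True, encoding="utf8")
    return etree.fromstring(clean_html_text(text), parser=parser)
```

With these in place, `parse_html(r.text)` returns the same kind of root element as the snippet above, and xpath can be called on it directly.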

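Garbled text from requests when the response uses `Transfer-Encoding: chunked`

When the response is sent with Transfer-Encoding: chunked, the text returned by requests may come back garbled and cannot be parsed. In that case, change the request headers so that accept-encoding is an empty string, for example: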

accept-encoding: ''
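
As a rough sketch of how that header might be passed to requests: the helper names `without_compression` and `fetch_text` and the `timeout` default are assumptions for illustration. Only `requests.get(..., headers=...)`, `raise_for_status()`, and `.text` are actual requests API.

```python
from typing import Dict, Optional


def without_compression(headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of headers with Accept-Encoding forced to an empty string."""
    # Drop any existing accept-encoding entry regardless of letter case.
    merged = {k: v for k, v in (headers or {}).items() if k.lower() != "accept-encoding"}
    merged["Accept-Encoding"] = ""
    return merged


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a page with the empty Accept-Encoding header (hypothetical helper)."""
    # Imported lazily so the header helper can be used without requests installed.
    import requests

    response = requests.get(url, headers=without_compression(), timeout=timeout)
    response.raise_for_status()
    return response.text
```

An empty Accept-Encoding value replaces the default that requests would otherwise send, so the server is no longer asked for a gzip or deflate body.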

Appendix: for parsing HTML data I recommend parsel, which is built on lxml and cssselect and supports both XPath and CSS selectors.
