XPath parsing returns no results

Latest recommended article published on 2023-02-11 17:49:45

Yoooung～


Category: python. Tags: python, web scraping, programming languages

Article link: https://blog.csdn.net/m0_54797890/article/details/124771185


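While parsing a web page today I ran into a problem: a correct XPath expression returned none of the content I wanted. After confirming that the fetched page itself was fine, I found that when etree.HTML(r.text) converted the page, some escape characters caused part of the tags to be commented out, so they could not be reached through XPath.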
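A minimal sketch of how one might check this diagnosis, assuming the target markup really does end up inside comment nodes after parsing. The inline `SAMPLE` page and the helper name `find_in_comments` are hypothetical and only illustrate the idea; they are not taken from the page being scraped.

```python
from lxml import etree

# Hypothetical sample page: the target <div> is wrapped in an HTML comment.
SAMPLE = (
    '<html><body>'
    '<!-- <div id="git-readme">README text</div> -->'
    '<p>visible</p>'
    '</body></html>'
)


def find_in_comments(page_source, marker):
    """Return the text of every comment node that contains `marker`."""
    tree = etree.HTML(page_source)
    return [
        c.text
        for c in tree.xpath('//comment()')
        if c.text and marker in c.text
    ]


tree = etree.HTML(SAMPLE)
# XPath only sees real element nodes, so the commented-out <div> is not found.
xpath_hits = tree.xpath('//div[@id="git-readme"]')
# The same markup does show up as the text of a comment node.
comment_hits = find_in_comments(SAMPLE, 'git-readme')
print(len(xpath_hits), len(comment_hits))
```

If the XPath query comes back empty while the marker appears in a comment node, the element exists in the source but is not part of the element tree that XPath walks.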

Solution:

Use a CSS selector (via BeautifulSoup) instead of XPath.

# selector approach
import requests
from lxml import etree
from bs4 import BeautifulSoup

url = 'https://gitee.com/al-one/hass-xiaomi-miot'
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
}

# Fetch the page and parse it with BeautifulSoup's built-in HTML parser
r = requests.get(url=url, headers=headers)
content = r.text
html = BeautifulSoup(content, 'html.parser')
# select() returns a list of matching elements (the original listing is cut off after this call)
res = html.select('#git-readme > div > div.file_content.markdown-body')
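The selector approach can be tried offline with a small sketch like the one below. The source listing is cut off right after the `select(...)` call, so what the author did with the result is unknown; taking the first match and reading its text is an assumption. The inline `SAMPLE` markup, the `file_holder` class, and the helper name `first_match_text` are hypothetical.

```python
from bs4 import BeautifulSoup

# The CSS selector used in the post.
SELECTOR = '#git-readme > div > div.file_content.markdown-body'

# Hypothetical markup shaped to match the selector; not copied from the real page.
SAMPLE = (
    '<div id="git-readme">'
    '<div class="file_holder">'
    '<div class="file_content markdown-body"><h1>Title</h1></div>'
    '</div>'
    '</div>'
)


def first_match_text(page_source, css):
    """Return the stripped text of the first element matching `css`, or None."""
    soup = BeautifulSoup(page_source, 'html.parser')
    matches = soup.select(css)  # always a list, possibly empty
    if not matches:
        return None
    return matches[0].get_text(strip=True)


print(first_match_text(SAMPLE, SELECTOR))
```

Checking for an empty list before indexing avoids an `IndexError` when the page layout changes and the selector no longer matches.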
