Python Web Scraping: pyquery Fails to Find Elements

Latest recommended article published 2023-02-14 00:14:20

Author: 流云浅暮

Categories: Python, Web Scraping | Tags: Python, Web Scraping

Original link: https://blog.csdn.net/qq_40176258/article/details/85041089

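While working on a web-scraping project today, I ran into a problem: pyquery returned no elements for a selector that should have matched. A minimal reproduction: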


from pyquery import PyQuery as pq

html = '''
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>TEST</title>
</head>
<body>
    <div class="warp">
        <ul class="goodsList">
            <li>this is the test1</li>
            <li>this is the test2</li>
            <li>this is the test3</li>
            <li>this is the test4</li>
        </ul>
    </div>
</body>
</html>
'''
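doc = pq(html)  # no parser argument: pyquery tries lxml's XML parser first
element = doc('.warp ul li:first-child')
# An empty PyQuery prints as an empty string, so report the empty case explicitly
if element:
    print(element)
else:
    print('None')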

Output:

None

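The selector itself is correct, yet the result is always empty. Why? After checking the documentation, I found the cause. When pq() receives a string and no parser argument, pyquery first tries lxml's XML parser and falls back to the HTML parser only if XML parsing fails. This markup is well-formed XML, so the XML parse succeeds, and the xmlns="http://www.w3.org/1999/xhtml" declaration on <html> puts every element into the XHTML namespace: the tags become {http://www.w3.org/1999/xhtml}ul, {http://www.w3.org/1999/xhtml}li, and so on. Type selectors such as ul and li match only un-namespaced tag names, so the query finds nothing. Passing parser="html" when constructing pq() makes pyquery use lxml's HTML parser, which does not apply XML namespaces, and that solves the problem.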
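The namespace effect can be reproduced without pyquery. The sketch below uses Python's standard-library xml.etree.ElementTree as a stand-in for lxml (an assumption made only to keep the example dependency-free; pyquery itself uses lxml, which names namespaced tags in the same {uri}name form). After an XML parse, a lookup by the bare name li matches nothing, while the namespace-qualified name matches every item.

```python
import xml.etree.ElementTree as ET

# The article's markup: <html> declares a default XHTML namespace via xmlns.
XHTML_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>TEST</title>
</head>
<body>
    <div class="warp">
        <ul class="goodsList">
            <li>this is the test1</li>
            <li>this is the test2</li>
            <li>this is the test3</li>
            <li>this is the test4</li>
        </ul>
    </div>
</body>
</html>"""

# Parse with XML rules, analogous to pyquery's default XML-first attempt.
root = ET.fromstring(XHTML_DOC)

# The default namespace is folded into every tag name in {uri}name form.
print(root.tag)  # {http://www.w3.org/1999/xhtml}html

# Looking up the bare name "li" therefore finds nothing...
plain_hits = root.findall(".//li")
print(len(plain_hits))  # 0

# ...while the namespace-qualified name finds all four items.
ns = {"x": "http://www.w3.org/1999/xhtml"}
ns_hits = root.findall(".//x:li", ns)
print(len(ns_hits))  # 4
print(ns_hits[0].text)  # this is the test1
```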

from pyquery import PyQuery as pq

html = '''
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>TEST</title>
</head>
<body>
    <div class="warp">
        <ul class="goodsList">
            <li>this is the test1</li>
            <li>this is the test2</li>
            <li>this is the test3</li>
            <li>this is the test4</li>
        </ul>
    </div>
</body>
</html>
'''
doc = pq(html, parser="html")  # force lxml's HTML parser, which does not apply XML namespaces
element = doc('.warp ul li:first-child')
if element:
    print(element)
else:
    print('None')

Output:

<li>this is the test1</li>
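If the input genuinely is XML and you would rather keep the XML parser, another option is to strip the namespaces after parsing. pyquery appears to provide remove_namespaces() and xhtml_to_html() methods for this purpose; the sketch below shows the same idea with the standard library only, and strip_namespaces is a hypothetical helper, not a pyquery API.

```python
import xml.etree.ElementTree as ET

XHTML_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
    <div class="warp">
        <ul class="goodsList">
            <li>this is the test1</li>
            <li>this is the test2</li>
        </ul>
    </div>
</body>
</html>"""


def strip_namespaces(root):
    """Hypothetical helper: rename every '{uri}name' tag to plain 'name', in place."""
    for el in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


root = strip_namespaces(ET.fromstring(XHTML_DOC))

# Bare names match again. Note [@class='warp'] compares the whole attribute,
# so it only approximates the CSS class selector .warp.
first_li = root.find(".//div[@class='warp']/ul/li")
print(first_li.text)  # this is the test1
```

For HTML pages like the one in this article, parser="html" remains the simpler fix; stripping namespaces is mainly useful when other parts of the code depend on XML parsing.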
