The namespaces argument when parsing with lxml

Latest recommended article published on 2024-03-01 18:21:32

lizz2276

Original link: https://www.cnpython.com/qa/521675

I have an HTML page that I am trying to parse. Here is what I am doing with lxml:

>>> from lxml import etree
>>> node = etree.fromstring(html)
>>> node
<Element {http://www.w3.org/1999/xhtml}html at 0x110676a70>
>>> node.xpath('//body')
[]
>>> node.xpath('body')
[]

Unfortunately, all of my xpath calls return an empty list. Why does this happen, and how can I fix the call?

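Method 1: The root element is in the XHTML namespace (note the {http://www.w3.org/1999/xhtml} in its tag), and an unprefixed name in an XPath expression matches only elements that are in no namespace. You can pass the namespace to the query, as follows: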

>>> node.xpath('//xmlns:tr', namespaces={'xmlns':'http://www.w3.org/1999/xhtml'})
[<Element {http://www.w3.org/1999/xhtml}tr at 0x11067b6c8>, <Element {http://www.w3.org/1999/xhtml}tr at 0x11067b710>]
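
To make the behaviour easy to reproduce, here is a minimal self-contained sketch. The XHTML string is made up for illustration (it is not the asker's page), and the prefix name "x" is an arbitrary choice; only the namespace URI has to match.

```python
from lxml import etree

# Hypothetical XHTML document standing in for the asker's page.
XHTML_NS = "http://www.w3.org/1999/xhtml"
html = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><table><tr><td>a</td></tr><tr><td>b</td></tr></table></body>"
    "</html>"
)

node = etree.fromstring(html)

# An unprefixed name in XPath 1.0 matches only elements in no namespace,
# so this finds nothing: every element here is in the XHTML namespace.
unprefixed = node.xpath("//tr")

# Binding a prefix to the namespace URI lets the query match.
# The prefix ("x") is arbitrary and does not need to appear in the document.
prefixed = node.xpath("//x:tr", namespaces={"x": XHTML_NS})

print(len(unprefixed), len(prefixed))
```

The prefix is just a local label for the URI, so any name works as long as the query and the namespaces dict agree on it.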

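A better approach is to use lxml's HTML parser (lxml.html). It does not place elements in the XHTML namespace, so unprefixed queries such as //body match.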
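
The original snippet for this step did not survive, so what follows is only a sketch of the idea, assuming lxml.html.fromstring is the entry point that was meant. The sample page is made up for illustration.

```python
import lxml.html

# Hypothetical XHTML page; the xmlns declaration is what tripped up the XML parser.
html = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head>"
    "<body><p>hello</p></body>"
    "</html>"
)

# The HTML parser treats xmlns as an ordinary attribute,
# so the resulting elements carry no namespace.
tree = lxml.html.fromstring(html)

# Plain, unprefixed XPath now matches.
bodies = tree.xpath("//body")
print([el.tag for el in bodies])
```

The HTML parser is also more forgiving of markup that is not well-formed XML, which is common on real pages.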

Method 2: You need to use a namespace prefix in the query, like this:

node.xpath('//html:body', namespaces={'html': 'http://...'})

Or you can pass the element's .nsmap as the namespaces argument.

This assumes that all namespaces are declared on the tag that node points to, which is usually true for most XML documents.
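
A rough sketch of the .nsmap approach, under a few assumptions: the document below is invented for illustration, and the "html" prefix is an arbitrary name chosen here. One detail to watch is that nsmap stores the default namespace under the key None, which XPath cannot use as a prefix, so the sketch rebinds it to a named prefix before querying.

```python
from lxml import etree

# Hypothetical document with a default namespace and one prefixed namespace,
# both declared on the root element.
xml = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:svg="http://www.w3.org/2000/svg">'
    "<body><svg:svg/></body>"
    "</html>"
)
node = etree.fromstring(xml)

# Copy the prefixed entries as they are.
namespaces = {
    prefix: uri for prefix, uri in node.nsmap.items() if prefix is not None
}

# The default namespace sits under None; XPath 1.0 has no default namespace,
# so give it a prefix of our own choosing ("html" is arbitrary).
if None in node.nsmap:
    namespaces["html"] = node.nsmap[None]

bodies = node.xpath("//html:body", namespaces=namespaces)
svgs = node.xpath("//svg:svg", namespaces=namespaces)
print(len(bodies), len(svgs))
```

If a namespace is declared only on a descendant element, it will not appear in the root's nsmap and has to be added to the dict by hand.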
