Problem with iterating over an XPath result list

Iterating over XPath-selected elements [a big pitfall]


  • Code:
from lxml import etree

text = ''' <div> <ul> 
        <li class="item-1"><a>first item</a></li> 
        <li class="item-1"><a href="link2.html">second item</a></li> 
        <li class="item-inactive"><a href="link3.html">third item</a></li> 
        <li class="item-1"><a href="link4.html">fourth item</a></li> 
        <li class="item-0"><a href="link5.html">fifth item</a>  
        </ul> </div> '''

html = etree.HTML(text)

ret = html.xpath("//li")
print(ret)  # [<Element li at 0x2d90f08>, <Element li at 0x2d90ee0>, <Element li at 0x2d90eb8>, <Element li at 0x2d90e90>, <Element li at 0x2d90e68>]
for i in ret:
    ret2 = i.xpath("//@class")
    print(ret2)
  • Result:
[<Element li at 0x2d90f08>, <Element li at 0x2d90ee0>, <Element li at 0x2d90eb8>, <Element li at 0x2d90e90>, <Element li at 0x2d90e68>]
['item-1', 'item-1', 'item-inactive', 'item-1', 'item-0']
['item-1', 'item-1', 'item-inactive', 'item-1', 'item-0']
['item-1', 'item-1', 'item-inactive', 'item-1', 'item-0']
['item-1', 'item-1', 'item-inactive', 'item-1', 'item-0']
['item-1', 'item-1', 'item-inactive', 'item-1', 'item-0']
  • Expected result:
[<Element li at 0x3230f30>, <Element li at 0x3230f08>, <Element li at 0x3230ee0>, <Element li at 0x3230eb8>, <Element li at 0x3230e90>]
['item-1']
['item-1']
['item-inactive']
['item-1']
['item-0']

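  • Problem:
    In this loop, each i is a different etree Element, so the XPath query on each one should return a different result. Yet every iteration returns the same list.
    Hypothesis: calling xpath() on a child element automatically jumps back to the top-level (document root) path.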
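
A quick way to test this hypothesis is sketched below, using a shortened copy of the sample HTML. Per the XPath 1.0 specification, a location path that begins with "/" (and therefore also "//") is absolute: it is evaluated from the root node of the document that contains the context node. This means the element you call xpath() on only affects relative paths. The loop checks two things from every li. First, "/*" returns the <html> root that etree.HTML() wraps around the fragment. Second, "//@class" returns the same list as when it is called on the whole document. The names all_classes and checks are just illustrative.

```python
from lxml import etree

# Shortened copy of the sample fragment used in the article.
text = '''<div><ul>
    <li class="item-1"><a>first item</a></li>
    <li class="item-inactive"><a href="link3.html">third item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
</ul></div>'''

html = etree.HTML(text)  # the HTML parser wraps the fragment in <html><body>
all_classes = html.xpath("//@class")

checks = []
for li in html.xpath("//li"):
    # "/*" is absolute: it is resolved from the document root, not from li
    root_tag = li.xpath("/*")[0].tag
    # "//@class" is absolute too, so it scans the whole document again
    same_as_document = li.xpath("//@class") == all_classes
    checks.append((li.get("class"), root_tag, same_as_document))
    print(checks[-1])
```

If every printed tuple shows "html" and True, the absolute path was resolved against the document root and not against the li the method was called on.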


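  • Solution:
    After selecting the elements, prefix the path with "." whenever you call xpath() inside the loop. The "." refers to the current element, so the path is evaluated relative to it and not from the document root.
    Code: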
from lxml import etree

text = ''' <div> <ul> 
        <li class="item-1"><a>first item</a></li> 
        <li class="item-1"><a href="link2.html">second item</a></li> 
        <li class="item-inactive"><a href="link3.html">third item</a></li> 
        <li class="item-1"><a href="link4.html">fourth item</a></li> 
        <li class="item-0"><a href="link5.html">fifth item</a>  
        </ul> </div> '''

html = etree.HTML(text)

ret = html.xpath("//li")
print(ret)  # [<Element li at 0x2d90f08>, <Element li at 0x2d90ee0>, <Element li at 0x2d90eb8>, <Element li at 0x2d90e90>, <Element li at 0x2d90e68>]
for i in ret:
    ret2 = i.xpath(".//@class")  # note the "." added in front of "//"
    print(ret2)
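
The "." fix works because ".//@class" is a relative path. "." is the current li, and "//" below it expands to descendant-or-self, so the path matches the class attribute of the li itself and of anything nested inside it. In the sample the <a> tags carry no class, so each li yields exactly one value. The minimal sketch below uses a hypothetical fragment in which one <a> also has a class. It compares ".//@class" with "./@class" (the element's own attribute only, equivalent to "@class") and with lxml's Element.get(), which reads a single attribute without XPath.

```python
from lxml import etree

# Hypothetical fragment: the second <a> also carries a class attribute,
# to show how far each relative path reaches.
text = '''<div><ul>
    <li class="item-1"><a href="link1.html">first item</a></li>
    <li class="item-2"><a class="bold" href="link2.html">second item</a></li>
</ul></div>'''

html = etree.HTML(text)

rows = []
for li in html.xpath("//li"):
    # ".//@class": class attributes on li itself AND on all its descendants
    deep = li.xpath(".//@class")
    # "./@class" (same as "@class"): only the li's own class attribute
    own = li.xpath("./@class")
    # Element.get(): one attribute, no XPath; returns None if it is absent
    direct = li.get("class")
    rows.append((deep, own, direct))
    print(deep, own, direct)
```

When only the li's own class is wanted, "./@class" or li.get("class") is the narrower choice. ".//@class" is useful when the attribute may sit on a nested element.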
  • Finally:
    This fixes the problem and the output is now correct, but the hypothesis has not been verified.