A Summary of Common XPath Usage in Python

Latest recommended article published on 2024-05-01 09:50:36

大蛇王

Category: python  Tags: xpath

Article link: https://blog.csdn.net/t8116189520/article/details/78861177

The contents of test.html are as follows:

<!-- test.html -->
<div>
<ul>
<li class="item-0" text="1"><a href="link1.html">first item</a></li>
<li class="item-1" text="2"><a href="link2.html">second item</a></li>
<li class="item-inactive" text="3"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1" text="2"><a href="link4.html">fourth item</a></li>
<li class="item-0" text="1"><a href="link5.html">fifth item</a></li>
</ul>
</div>
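One caveat before the queries: `lxml.etree.parse` uses the XML parser unless told otherwise, so it accepts this file only because the markup happens to be well-formed. The following is a minimal sketch, assuming Python 3 and lxml are installed, of what changes when an `lxml.etree.HTMLParser` is passed instead. The temporary directory, the `MARKUP` constant and the variable names are illustrative and not part of the original script.

```python
import os
import tempfile

from lxml import etree

# Same markup as test.html, without the leading comment (illustrative copy).
MARKUP = """<div>
<ul>
<li class="item-0" text="1"><a href="link1.html">first item</a></li>
<li class="item-1" text="2"><a href="link2.html">second item</a></li>
<li class="item-inactive" text="3"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1" text="2"><a href="link4.html">fourth item</a></li>
<li class="item-0" text="1"><a href="link5.html">fifth item</a></li>
</ul>
</div>
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(MARKUP)

    # Default: strict XML parsing; fails on markup that is not well-formed.
    xml_tree = etree.parse(path)
    # Forgiving HTML parsing; tolerates unclosed tags and similar problems.
    html_tree = etree.parse(path, etree.HTMLParser())

    xml_count = len(xml_tree.xpath("//li"))
    html_count = len(html_tree.xpath("//li"))
    xml_root_tag = xml_tree.getroot().tag
    html_root_tag = html_tree.getroot().tag

print(xml_count, html_count)        # 5 5
print(xml_root_tag, html_root_tag)  # div html
```

With the HTML parser the tree gains `html` and `body` wrapper elements, but `//li` still finds the same five elements, because `//` searches the whole document.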

The XPath usage examples follow:

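# coding: utf-8
import lxml.etree

# Parse the file into an ElementTree (the XML parser is used by default)
html = lxml.etree.parse("test.html")
print(type(html))
res = html.xpath("//li")  # every li element in the document
print(res)
print(len(res))  # length of the list
print(type(res))  # xpath() returns a list of elements
print(type(res[0]))  # each item is an element of the tree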

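res1 = html.xpath("//li/@class")  # class attribute of every li
print(res1)

res2 = html.xpath("//li/@text")  # text attribute of every li
print(res2)

res3 = html.xpath("//li/a")  # a elements that are direct children of an li
print(res3)

res4 = html.xpath("//li/a/@href")  # href attribute of those a elements
print(res4)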

res5 = html.xpath('//li/a[@href="link3.html"]')  # the a element whose href is link3.html
print(res5)

res6 = html.xpath("//li//span")  # span elements at any depth below an li
print(res6)

res6 = html.xpath("//li//span/@class")  # class attribute of those span elements
print(res6)

res7 = html.xpath("//li/a//@class")  # class attributes on the a elements and their descendants
print(res7)

# res8 = html.xpath("//li[1]")  # the first li within its parent
res8 = html.xpath("//li[last()]")  # the last li within its parent
print(res8)

res9 = html.xpath("//li[last()]/a/@href")  # href of the a inside the last li
print(res9)

res9 = html.xpath("//li[last()-1]/a/@href")  # href of the a inside the second-to-last li
print(res9)

res10 = html.xpath('//*[@class="bold"]')  # any element whose class is "bold"
print(res10)

res11 = html.xpath('//*[@text="3"]')  # any element whose text attribute is "3"
print(res11)

res11 = html.xpath('//*[@text="3"]/@class')  # class attribute of that element
print(res11)
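For readers on Python 3, several of the queries above can be checked without a file on disk. The following is a minimal sketch, assuming lxml is installed. It parses the same markup from a string with `lxml.etree.fromstring` (the leading comment is omitted), and the names `MARKUP`, `root` and the result variables are illustrative only. The values in the comments follow from the five `li` elements in test.html.

```python
from lxml import etree

# Same markup as test.html (illustrative copy).
MARKUP = """<div>
<ul>
<li class="item-0" text="1"><a href="link1.html">first item</a></li>
<li class="item-1" text="2"><a href="link2.html">second item</a></li>
<li class="item-inactive" text="3"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1" text="2"><a href="link4.html">fourth item</a></li>
<li class="item-0" text="1"><a href="link5.html">fifth item</a></li>
</ul>
</div>
"""

root = etree.fromstring(MARKUP)

# Attribute queries return lists of strings.
classes = root.xpath("//li/@class")
hrefs = root.xpath("//li/a/@href")
last_href = root.xpath("//li[last()]/a/@href")
second_last_href = root.xpath("//li[last()-1]/a/@href")
nested_classes = root.xpath("//li/a//@class")
text3_class = root.xpath('//*[@text="3"]/@class')

print(classes)           # ['item-0', 'item-1', 'item-inactive', 'item-1', 'item-0']
print(last_href)         # ['link5.html']
print(second_last_href)  # ['link4.html']
print(nested_classes)    # ['bold']
print(text3_class)       # ['item-inactive']

# Element queries return element objects; read their data with .get() and .text.
link3 = root.xpath('//li/a[@href="link3.html"]')[0]
span = link3[0]  # first child of the a element
print(link3.get("href"), span.tag, span.text)  # link3.html span third item

# (//li)[last()] selects the last li in the whole document.
global_last_href = root.xpath("(//li)[last()]/a/@href")
print(global_last_href)  # ['link5.html']
```

One point worth knowing: `//li[last()]` selects the last `li` within each parent, while `(//li)[last()]` selects the last `li` in the whole document. The two agree here only because the sample contains a single `ul`.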