Python's XPath library: lxml

Latest recommended article published on 2024-04-07 08:00:00

weixin_39566578


Article tags: python xpath library

Tag elements:

lxml is an efficient library for processing and parsing HTML, and it supports XPath syntax. The first step is to import the lxml library:

from lxml import etree

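lxml's Element class is the basic class for handling HTML nodes. One Element can be regarded as one HTML tag element. It has text (root.text), attributes (root.attrib), and a tag name (root.tag).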

>>> root = etree.Element("root")
>>> type(root)
<class 'lxml.etree._Element'>
>>> root = etree.Element("body", name="names")
>>> root.attrib
{'name': 'names'}
>>> root.text = "my_names"
>>> etree.tostring(root)
b'<body name="names">my_names</body>'

The code above creates an Element whose tag name is body, with the attribute name="names" and the text my_names. These three parts are available as root.tag, root.attrib, and root.text.
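As a minimal sketch of the three parts described above, the following builds the same body element and reads each part back. The extra "id" attribute and its value are illustrative assumptions, not part of the original example.

```python
from lxml import etree

# Build <body name="names">my_names</body> step by step.
body = etree.Element("body", name="names")
body.text = "my_names"

# attrib behaves like a dict; set() and get() are the usual accessors.
# The "id" attribute is only an illustration.
body.set("id", "main")
attrs = dict(body.attrib)

# encoding="unicode" makes tostring() return str instead of bytes.
serialized = etree.tostring(body, encoding="unicode")

print(body.tag)
print(attrs)
print(body.text)
print(serialized)
```

Passing encoding="unicode" is a convenience when the result is meant for printing; without it, tostring() returns bytes, as in the session above.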

Child elements:

>>> root.append(etree.Element("child1"))
>>> child2 = etree.SubElement(root, "child2")
>>> child3 = etree.SubElement(root, "child3")

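The code above adds child objects to root (that is, it nests child tags inside the root tag). The element can then be treated as an iterable sequence: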

>>> child = root[0]
>>> print(child.tag)
child1
>>> print(len(root))
3
>>> root.index(root[1])  # lxml.etree only!
1
>>> for child in root:
...     print(child.tag)
child1
child2
child3
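The list-like behaviour shown above can be gathered into one self-contained sketch. It also uses getparent() and getnext(), which exist in lxml.etree but not in the standard library's ElementTree; the variable names are illustrative.

```python
from lxml import etree

root = etree.Element("root")
root.append(etree.Element("child1"))
etree.SubElement(root, "child2")
etree.SubElement(root, "child3")

# An element behaves like a list of its children.
tags = [child.tag for child in root]
first, last = root[0], root[-1]
middle = root[1:2]  # slicing returns a plain list of elements

# lxml.etree only: navigate from a child back to its parent or sibling.
parent = first.getparent()
next_sibling = first.getnext()

print(tags)
print(len(root), last.tag, next_sibling.tag)
```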

XPath syntax:

>>> html = etree.Element("html")
>>> body = etree.SubElement(html, "body")
>>> body.text = "TAIL"
>>> html.text = "TEXT"
>>> etree.tostring(html)
b'<html>TEXT<body>TAIL</body></html>'
>>> print(html.xpath("//text()"))  # lxml.etree only!
['TEXT', 'TAIL']
>>> build_text_list = etree.XPath("//text()")  # lxml.etree only!
>>> print(build_text_list(html))
['TEXT', 'TAIL']
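The session above stores both strings as .text values. As a hedged variation, the sketch below places "TAIL" in a real .tail (the text that follows a closing tag), using an assumed br child element, and reuses a compiled etree.XPath object. The results of text() are lxml "smart strings" that remember where they came from.

```python
from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "TEXT"
br = etree.SubElement(body, "br")  # assumed extra element for illustration
br.tail = "TAIL"                   # text that follows <br/> inside <body>

# Compile once, then call the XPath object on any element or tree.
get_text = etree.XPath("//text()")
texts = get_text(html)

# Smart strings know their origin (lxml.etree only).
origins = [(str(t), t.getparent().tag, t.is_tail) for t in texts]

# method="text" serializes only the text content.
plain = etree.tostring(html, method="text", encoding="unicode")

print(texts)
print(origins)
print(plain)
```

Compiling the expression is mainly useful when the same query runs against many documents.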

Tag search:

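find(): returns the first matching element, or None if nothing matches. It supports only a subset of XPath (ElementPath) and only relative paths, such as those starting with './/'.

findall(): returns a list of all matching elements, with the same path restrictions as find().

xpath(): evaluates a full XPath expression and accepts both relative and absolute paths. For expressions that select elements it returns a list of elements.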

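>>> root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")
>>> x = root.find('.//a[@x]')
>>> x
<Element a at 0x...>
>>> x.text
'aText'
>>> x.tag
'a'
>>> x2 = root.findall('.//a[@x]')
>>> x2
[<Element a at 0x...>]
>>> type(x2)
<class 'list'>
>>> x3 = root.xpath('//a[@x]')
>>> type(x3)
<class 'list'>
>>> x3
[<Element a at 0x...>]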
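To make the differences between the three methods concrete, here is a small sketch under the assumption of a sample document with one a element and two b elements (the XML string is illustrative). It shows what each call returns when nothing matches, what xpath() returns for non-element expressions, and that find() rejects an absolute path.

```python
from lxml import etree

# Illustrative sample document.
root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

first_a = root.find(".//a[@x]")   # a single element
all_b = root.findall(".//b")      # a list of elements
missing = root.find(".//d")       # None when nothing matches
none_found = root.findall(".//d") # empty list when nothing matches

# xpath() can also return attribute values and numbers.
x_values = root.xpath("//a/@x")
b_count = root.xpath("count(//b)")

# find() accepts only relative paths on an element.
try:
    root.find("//a")
    absolute_rejected = False
except SyntaxError:
    absolute_rejected = True

print(first_a.tag, len(all_b), missing, none_found)
print(x_values, b_count, absolute_rejected)
```

Because find() can return None, code that chains attribute access on its result usually needs a check first.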

Parsing HTML

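from lxml import etree
import requests
from chardet import detect

url = 'http://tool.chinaz.com/'
resp = requests.get(url, timeout=50)
html = resp.content

# Detect the encoding of the raw bytes (fall back to UTF-8 if detection fails)
cder = detect(html)
html = html.decode(cder.get('encoding') or 'utf-8')

tree = etree.HTML(html)

# Print all <a> tags
hrefs = tree.xpath('//a')
for href in hrefs:
    # The loop body is missing from the source; printing the link target
    # and link text is an assumed completion.
    print(href.get('href'), href.text)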
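The same parsing steps can be tried without a network request. The sketch below feeds etree.HTML an inline HTML string; the markup, paths, and class name are made-up sample data, and the unclosed p tag is there to show that the HTML parser tolerates imperfect markup.

```python
from lxml import etree

# Made-up sample page; note the unclosed <p> tag.
SAMPLE = """
<html><body>
  <a href="/tools/ip">IP lookup</a>
  <a href="/tools/whois" class="nav">Whois</a>
  <p>No link here
</body></html>
"""

tree = etree.HTML(SAMPLE)

# Select the elements, then read the attribute and text of each.
links = [(a.get("href"), a.text) for a in tree.xpath("//a")]

# Or let XPath return the attribute values directly.
hrefs = tree.xpath("//a/@href")

# A predicate narrows the match to links with a given class.
nav_text = tree.xpath('//a[@class="nav"]/text()')

for href, text in links:
    print(href, text)
```

Selecting //a/@href avoids a second step when only the URLs are needed; selecting the elements is more convenient when several properties of each link are used.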
