Using lxml.etree

bb67ao

Category: Web scraping

Article link: https://blog.csdn.net/bb67ao/article/details/117261607

Documentation

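Chinese documentation
A translation by another blogger; it covers only part of the original.
English documentation
How I use the English documentation: press Ctrl+F and search the page for the method I want to look up.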

One commonly used object and three methods

The Element object

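Every method below works on an Element object, so the string must first be parsed into one, for example with etree.HTML(). **An Element behaves like a list of its child elements**: it supports len(), indexing, and iteration, although it is not an actual Python list.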
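
To make the relationship between an Element and a list concrete, here is a minimal sketch. The sample document is a hypothetical one modeled on the examples in this post. It shows that an Element supports len(), indexing, and iteration over its direct children, while isinstance() confirms it is not a real list.

```python
from lxml import etree

# Hypothetical sample document, mirroring the examples in this post.
root = etree.fromstring("<root><child1>nihao</child1><child2>zhongg</child2></root>")

child_count = len(root)                    # number of direct children
first_tag = root[0].tag                    # indexing returns a child Element
all_tags = [child.tag for child in root]   # iteration walks the direct children
is_real_list = isinstance(root, list)      # an Element is not a Python list

print(child_count, first_tag, all_tags, is_real_list)
```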

etree.fromstring() (convert a string into an Element object)

from lxml import etree
text ='''
<root>
  <child1>nihao</child1>
  <child2>zhongg</child2>
</root>
'''
html = etree.fromstring(text)
print(html)

<Element root at 0x1c2698573c8>

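etree.XML(str) (convert XML into an Element object)

Converts a string into an Element object, in other words parses an XML document. It also works on HTML as long as the markup is well-formed XML, and it behaves much like fromstring.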

from lxml import etree
text ='''
<root>
  <child1>nihao</child1>
  <child2>zhongg</child2>
</root>
'''
xml = etree.XML(text)
print(xml)

<Element root at 0x1815e2e64c8>
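
As a quick check of how etree.XML() compares with etree.fromstring(), the sketch below parses the same string with both and compares the serialized results. The variable names are illustrative only.

```python
from lxml import etree

text = "<root><child1>nihao</child1><child2>zhongg</child2></root>"

from_xml = etree.XML(text)            # parse with etree.XML
from_string = etree.fromstring(text)  # parse with etree.fromstring

# Both calls return the root Element, and the two trees serialize identically.
same_output = etree.tostring(from_xml) == etree.tostring(from_string)

print(from_xml.tag, from_string.tag, same_output)
```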

etree.tostring() (convert an Element into a string)

Serializes an Element. By default the result is a bytes object rather than a str.

from lxml import etree
text ='''
<root>
  <child1>nihao</child1>
  <child2>zhongg</child2>
</root>
'''
html = etree.HTML(text)
print(etree.tostring(html))

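If the input is not a complete HTML document, etree.HTML() fills in the missing structure (here the html and body tags) while building the Element.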

b'<html><body><root>\n  <child1>nihao</child1>\n  <child2>zhongg</child2>\n</root>\n</body></html>'
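
Since the output above is a bytes object, it may help to see how to get a str instead. The sketch below uses the encoding="unicode" and pretty_print=True arguments of etree.tostring(); the sample document is a hypothetical one.

```python
from lxml import etree

root = etree.fromstring("<root><child1>nihao</child1></root>")

as_bytes = etree.tostring(root)                     # default: bytes
as_text = etree.tostring(root, encoding="unicode")  # str instead of bytes
pretty = etree.tostring(root, encoding="unicode", pretty_print=True)  # indented str

print(type(as_bytes).__name__, type(as_text).__name__)
print(pretty)
```

Alternatively, the bytes result can be decoded with as_bytes.decode("utf-8") when a str is needed.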

etree.parse() with etree.HTMLParser (parsing HTML content from a file)

Link to the documentation
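
This section has no example of its own, so here is a minimal sketch. It assumes the section refers to reading a file with etree.parse() and an etree.HTMLParser instance. The temporary file and its content are hypothetical and exist only to keep the example self-contained.

```python
import os
import tempfile

from lxml import etree

# Write a small HTML file to a temporary location (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False, encoding="utf-8") as f:
    f.write("<html><body><p>hello</p></body></html>")
    path = f.name

try:
    parser = etree.HTMLParser()          # lenient HTML parser
    tree = etree.parse(path, parser)     # returns an ElementTree, not an Element
    root = tree.getroot()                # the <html> Element
    paragraphs = root.xpath("//p/text()")
finally:
    os.remove(path)                      # clean up the temporary file

print(root.tag, paragraphs)
```

Note that etree.parse() returns an ElementTree; call getroot() when an Element is needed.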

Element.xpath (locating tags with XPath)

Use XPath to locate the nodes we need.

from lxml import etree
text ='''
<root>
  <child1>nihao</child1>
  <child2>zhongg</child2>
</root>
'''
html = etree.HTML(text)
print(html.xpath("//child1/text()"))
# XPath syntax itself is a separate topic to study on your own

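Note: xpath() returns a list (here ['nihao']), even when only one node matches, and an Element itself behaves like a list of its child elements.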
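
The sketch below illustrates what xpath() returns for a few kinds of expression. The id attribute on child1 is an assumption added for illustration and is not part of the original sample.

```python
from lxml import etree

# The id attribute is a hypothetical addition for this illustration.
text = """
<root>
  <child1 id="a">nihao</child1>
  <child2>zhongg</child2>
</root>
"""
html = etree.HTML(text)

texts = html.xpath("//child1/text()")  # list of text strings
elements = html.xpath("//child2")      # list of Element objects
ids = html.xpath("//child1/@id")       # list of attribute values
missing = html.xpath("//child3")       # no match gives an empty list

print(texts, [e.tag for e in elements], ids, missing)
```

Because the result is always a list for these expressions, take an item with an index such as texts[0], and check that the list is not empty first.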