About the E-factory and ElementPath

like_LeafFlying

The E-factory

The E-factory provides a simple and compact syntax for generating XML and HTML.

>>> from lxml import etree
>>> from lxml.builder import E

>>> def CLASS(*args): # class is a reserved word in Python
...     return {"class": ' '.join(args)}

>>> html = page = (
...   E.html(       # create an Element called "html"
...     E.head(
...       E.title("This is a sample document")
...     ),
...     E.body(
...       E.h1("Hello!", CLASS("title")),
...       E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
...       E.p("This is another paragraph, with a", "\n      ",
...         E.a("link", href="http://www.python.org"), "."),
...       E.p("Here are some reserved characters: <spam&egg>."),
...       etree.XML("<p>And finally an embedded XHTML fragment.</p>"),
...     )
...   )
... )

>>> # encoding="unicode" returns a str, so print() shows markup, not a bytes repr
>>> print(etree.tostring(page, pretty_print=True, encoding="unicode"))
<html>
  <head>
    <title>This is a sample document</title>
  </head>
  <body>
    <h1 class="title">Hello!</h1>
    <p>This is a paragraph with <b>bold</b> text in it!</p>
    <p>This is another paragraph, with a
      <a href="http://www.python.org">link</a>.</p>
    <p>Here are some reserved characters: &lt;spam&amp;egg&gt;.</p>
    <p>And finally an embedded XHTML fragment.</p>
  </body>
</html>

Because elements are created through attribute access, it is easy to build a simple vocabulary for an XML language:

>>> from lxml.builder import ElementMaker # lxml only !

>>> E = ElementMaker(namespace="http://my.de/fault/namespace",
...                  nsmap={'p' : "http://my.de/fault/namespace"})

>>> DOC = E.doc
>>> TITLE = E.title
>>> SECTION = E.section
>>> PAR = E.par

>>> my_doc = DOC(
...   TITLE("The dog and the hog"),
...   SECTION(
...     TITLE("The dog"),
...     PAR("Once upon a time, ..."),
...     PAR("And then ...")
...   ),
...   SECTION(
...     TITLE("The hog"),
...     PAR("Sooner or later ...")
...   )
... )

>>> print(etree.tostring(my_doc, pretty_print=True, encoding="unicode"))
<p:doc xmlns:p="http://my.de/fault/namespace">
  <p:title>The dog and the hog</p:title>
  <p:section>
    <p:title>The dog</p:title>
    <p:par>Once upon a time, ...</p:par>
    <p:par>And then ...</p:par>
  </p:section>
  <p:section>
    <p:title>The hog</p:title>
    <p:par>Sooner or later ...</p:par>
  </p:section>
</p:doc>

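One such example is the module lxml.html.builder, which provides a vocabulary for HTML. When dealing with multiple namespaces, it is good practice to define one ElementMaker per namespace URI. Note again how the example above predefines the tag builders as named constants. This makes it easy to put all tag declarations of a namespace into one Python module and to import and use the tag name constants from there. It also avoids pitfalls such as typos or accidentally missing namespaces.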
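The advice about one ElementMaker per namespace URI could look roughly like the sketch below. The namespace URIs, prefixes, tag names, and text values are all made up for illustration and do not come from the example above; this is one possible layout, not a prescribed one.

```python
from lxml import etree
from lxml.builder import ElementMaker

# Hypothetical namespace URIs and prefixes, invented for this sketch.
DOC_NS = "http://example.org/ns/doc"
META_NS = "http://example.org/ns/meta"
NSMAP = {"d": DOC_NS, "m": META_NS}

# One ElementMaker per namespace URI.
D = ElementMaker(namespace=DOC_NS, nsmap=NSMAP)
M = ElementMaker(namespace=META_NS, nsmap=NSMAP)

# Tag builders predefined as named constants. In a real project these
# could live in a module of their own and be imported where needed.
DOC = D.doc
PAR = D.par
AUTHOR = M.author

my_doc = DOC(
    AUTHOR("Jane Doe"),            # element in the "meta" namespace
    PAR("Once upon a time, ..."),  # element in the "doc" namespace
)

xml_text = etree.tostring(my_doc, pretty_print=True, encoding="unicode")
print(xml_text)
```

Since every tag goes through a named constant, a misspelled tag name fails with a NameError instead of silently producing an element in the wrong namespace.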

ElementPath

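The ElementTree library comes with a simple XPath-like path language called ElementPath. The main difference is that ElementPath expressions can use the {namespace}tag notation. However, advanced XPath features such as general value comparisons and functions are not available.
In addition to a full XPath implementation, lxml.etree supports the ElementPath language in the same way ElementTree does. The API provides four methods, available on both Elements and ElementTrees:
- iterfind() iterates over all Elements that match the path expression
- findall() returns a list of matching Elements
- find() returns only the first matching Element
- findtext() returns the text content of the first matching Element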
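As a minimal sketch of the four methods combined with the {namespace}tag notation, consider the snippet below. The namespace URI, the document, and the variable names are assumptions invented for this illustration.

```python
from lxml import etree

NS = "http://example.org/ns"  # hypothetical namespace URI

root = etree.XML(
    '<root xmlns:n="http://example.org/ns">'
    '<n:item>first</n:item><n:item>second</n:item><plain>third</plain>'
    '</root>'
)

# ElementPath addresses namespaced tags as {namespace}tag; no prefix map needed.
tag = "{%s}item" % NS

first = root.find(tag)                           # first match, or None
all_items = root.findall(tag)                    # list of all matches
texts = [el.text for el in root.iterfind(tag)]   # lazy iteration over matches
first_text = root.findtext(tag)                  # text of the first match
missing = root.findtext("missing", default="n/a")  # fallback when nothing matches

print(first.text, len(all_items), texts, first_text, missing)
```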

Here are some examples:

>>> root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

Find a child of an Element:

>>> print(root.find("b"))
None
>>> print(root.find("a").tag)
a

Find an Element anywhere in the tree:

>>> print(root.find(".//b").tag)
b
>>> [ b.tag for b in root.iterfind(".//b") ]
['b', 'b']

Find Elements with a certain attribute:

>>> print(root.findall(".//a[@x]")[0].tag)
a
>>> print(root.findall(".//a[@y]"))
[]

Since lxml 3.4, there is a helper method that generates a structural ElementPath expression for an Element:

>>> tree = etree.ElementTree(root)
>>> a = root[0]
>>> print(tree.getelementpath(a[0]))
a/b[1]
>>> print(tree.getelementpath(a[1]))
a/c
>>> print(tree.getelementpath(a[2]))
a/b[2]
>>> tree.find(tree.getelementpath(a[2])) == a[2]
True

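As long as the tree is not modified, this path expression identifies the given Element and can be passed to find() later to locate it in the same tree. Compared to XPath, ElementPath expressions have the advantage of being self-contained, even for documents that use namespaces.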
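To illustrate the self-contained property, the sketch below applies getelementpath() to a namespaced document and feeds the result straight back into find() without a prefix mapping. The namespace URI and the document are assumptions made up for this example.

```python
from lxml import etree

# Hypothetical namespaced document, invented for this sketch.
root = etree.XML(
    '<root xmlns:n="http://example.org/ns">'
    '<n:a><n:b/><n:c/><n:b/></n:a>'
    '</root>'
)
tree = etree.ElementTree(root)
target = root[0][2]  # the second n:b element

# The generated path carries the namespace URI inline as {uri}tag,
# so it can be used later without a separate namespaces argument.
path = tree.getelementpath(target)
print(path)

found = tree.find(path)
print(found is target)
```

An equivalent XPath expression would need a prefix-to-URI mapping supplied alongside it, which is the dependency the paragraph above refers to.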

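The .iter() method is a special case: it finds Elements in the tree only by their tag name, not by a path. This means that the following commands are all equivalent in the success case: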

>>> print(root.find(".//b").tag)
b
>>> print(next(root.iterfind(".//b")).tag)
b
>>> print(next(root.iter("b")).tag)
b

Note that the .find() method simply returns None if no match is found, whereas the other two examples would raise a StopIteration exception.
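One way to make the three variants behave the same when nothing matches is to give next() a default value. The following is a small sketch that reuses the sample document from above; the tag name "d" is picked only because it does not occur in that document.

```python
from lxml import etree

root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

# find() reports a missing Element by returning None.
via_find = root.find(".//d")

# next() accepts a default, which avoids StopIteration on an empty iterator.
via_iterfind = next(root.iterfind(".//d"), None)
via_iter = next(root.iter("d"), None)

# Without the default, the empty iterator raises StopIteration.
try:
    next(root.iter("d"))
    raised = False
except StopIteration:
    raised = True

print(via_find, via_iterfind, via_iter, raised)
```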
