[python] Parsing XML with ElementTree [translation]

Aowandowski

19.7 The ElementTree XML API

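Source code: Lib/xml/etree/ElementTree.py

The Element type is a flexible container object, designed to store hierarchical data structures in memory. The type can be described as a cross between a list and a dictionary.

Warning:

The xml.etree.ElementTree module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data see XML vulnerabilities.

Each element has a number of properties associated with it:

1. a tag, which is a string identifying what kind of data this element represents (the element type, in other words)
2. a number of attributes, stored in a Python dictionary
3. a text string
4. an optional tail string
5. a number of child elements, stored in a Python sequence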
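The five properties listed above can be inspected directly on a parsed element. The following is a minimal sketch; the one-line snippet document is a hypothetical input chosen only so that every property, including the less obvious tail, has a value to show.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-line document (not from the original text).
snippet = '<p class="intro">Hello <b>world</b>!</p>'
p = ET.fromstring(snippet)

tag = p.tag              # 'p'
attributes = p.attrib    # {'class': 'intro'}
text = p.text            # 'Hello ' (text before the first child)
children = list(p)       # a single child element, <b>

bold = children[0]
bold_text = bold.text    # 'world'
bold_tail = bold.tail    # '!' (text after </b> and before </p>)
root_tail = p.tail       # None, nothing follows the root element
```

The tail belongs to the child it follows, not to the parent, which is why the exclamation mark shows up on the b element.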

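To create an element instance, use the Element constructor or the SubElement() factory function.

The ElementTree class can be used to wrap an element structure, and convert it from and to XML.

A C implementation of this API is available as xml.etree.cElementTree.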
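A common idiom for preferring the C implementation where it exists is an import with a fallback. This is a sketch of that idiom, not something the documentation prescribes. Note that from Python 3.3 on, xml.etree.ElementTree uses the accelerated implementation automatically, and the cElementTree module was removed in Python 3.9, so on recent interpreters the fallback branch is the one that runs.

```python
try:
    # Faster C implementation, available on Python 2.x and early 3.x.
    import xml.etree.cElementTree as ET
except ImportError:
    # Pure-Python module name; on modern Python 3 it is already accelerated.
    import xml.etree.ElementTree as ET

# Either way, the same API is available under the name ET.
element = ET.fromstring('<a><b/></a>')
child_tag = element[0].tag   # 'b'
```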

Changed in version 2.7: The ElementTree API is updated to 1.3. For more information, see Introducing ElementTree 1.3.

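19.7.1. Tutorial

This is a short tutorial for using xml.etree.ElementTree (ET in short). The goal is to demonstrate some of the building blocks and basic concepts of the module.

19.7.1.1. XML tree and elements

XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. ET has two classes for this purpose: ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading and writing to and from files) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.

19.7.1.2. Parsing XML

We will use the following XML document as the sample data for this section: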

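<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>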

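We have a number of ways to import the data.

Reading the file from disk:

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()

Reading the data from a string:

root = ET.fromstring(country_data_as_string)

fromstring() parses XML from a string directly into an Element, which is the root element of the parsed tree. Other parsing functions may create an ElementTree.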
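The difference in return types can be checked without touching the disk, since parse() also accepts a file object. The sketch below uses a small hypothetical document and an in-memory io.BytesIO in place of country_data.xml.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of an XML file.
xml_bytes = b"<data><country name='Panama'/></data>"

# parse() takes a filename or file object and returns an ElementTree,
# from which the root Element is obtained with getroot().
tree = ET.parse(io.BytesIO(xml_bytes))
root_from_tree = tree.getroot()

# fromstring() skips the wrapper and returns the root Element directly.
root_from_string = ET.fromstring(xml_bytes)

same_tag = root_from_tree.tag == root_from_string.tag   # both are 'data'
```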

As an Element, root has a tag and a dictionary of attributes:

>>> root.tag
'data'
>>> root.attrib
{}

It also has child nodes over which we can iterate:

>>> for child in root:
...     print child.tag, child.attrib
...
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index:

>>> root[0][1].text
'2008'
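Because children are stored as a sequence, an Element also supports len(), negative indexes and slices. The sketch below assumes a condensed copy of the sample data (same elements and values, less whitespace) so that it can run on its own.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the section's sample document.
COUNTRY_DATA = """\
<data>
  <country name="Liechtenstein">
    <rank>1</rank><year>2008</year><gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank>4</rank><year>2011</year><gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama">
    <rank>68</rank><year>2011</year><gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>
"""

root = ET.fromstring(COUNTRY_DATA)

country_count = len(root)                          # 3
year_tag = root[0][1].tag                          # 'year'
year_text = root[0][1].text                        # '2008'
last_name = root[-1].get('name')                   # 'Panama'
first_two = [c.get('name') for c in root[:2]]      # ['Liechtenstein', 'Singapore']
```

Whitespace between tags ends up in text and tail, never in the child sequence, so the indexes do not depend on how the document is indented.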

19.7.1.3. Finding interesting elements

Element has some useful methods that help iterate recursively over all the sub-tree below it (its children, their children, and so on). For example, Element.iter():

>>> for neighbor in root.iter('neighbor'):
...     print neighbor.attrib
...
{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}
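Called without a tag, iter() walks the whole subtree in document order, starting with the element itself. The sketch below shows both forms on a condensed copy of the sample data.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the section's sample document.
COUNTRY_DATA = """\
<data>
  <country name="Liechtenstein">
    <rank>1</rank><year>2008</year><gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank>4</rank><year>2011</year><gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama">
    <rank>68</rank><year>2011</year><gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>
"""

root = ET.fromstring(COUNTRY_DATA)

# No tag filter: every element, root included, in document order.
all_tags = [elem.tag for elem in root.iter()]

# With a tag filter: only matching elements, at any depth.
neighbor_names = [n.get('name') for n in root.iter('neighbor')]
```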

Element.findall() finds only elements with a tag which are direct children of the current element. Element.find() finds the first child with a particular tag, and Element.text accesses the element's text content. Element.get() accesses the element's attributes:

>>> for country in root.findall('country'):
...     rank = country.find('rank').text
...     name = country.get('name')
...     print name, rank
...
Liechtenstein 1
Singapore 4
Panama 68
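Two details of these methods are easy to miss: findall() with a plain tag does not look below direct children, and find() returns None when nothing matches. The sketch below illustrates both on a condensed copy of the sample data; the population and capital names are hypothetical and deliberately absent from the document.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the section's sample document.
COUNTRY_DATA = """\
<data>
  <country name="Liechtenstein">
    <rank>1</rank><year>2008</year><gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank>4</rank><year>2011</year><gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama">
    <rank>68</rank><year>2011</year><gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>
"""

root = ET.fromstring(COUNTRY_DATA)
first = root[0]

# neighbor elements are grandchildren of root, so this finds nothing.
direct_neighbors = root.findall('neighbor')              # []

# find() returns None for a missing child; calling .text on it would fail.
missing = first.find('population')                       # None

# findtext() and get() accept a default for the missing case.
population = first.findtext('population', 'unknown')     # 'unknown'
capital = first.get('capital', 'n/a')                    # 'n/a'

ranks = {c.get('name'): int(c.findtext('rank')) for c in root.findall('country')}
```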

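More sophisticated specification of which elements to look for is possible by using XPath.

19.7.1.4. Modifying an XML File

ElementTree provides a simple way to build XML documents and write them to files. The ElementTree.write() method serves this purpose.

Once created, an Element object may be manipulated by directly changing its fields (such as Element.text), adding and modifying attributes (the Element.set() method), as well as adding new children (for example with Element.append()).

Let's say we want to add one to each country's rank, and add an updated attribute to the rank element:

>>> for rank in root.iter('rank'):
...     new_rank = int(rank.text) + 1
...     rank.text = str(new_rank)
...     rank.set('updated', 'yes')
...
>>> tree.write('output.xml')

Our XML now looks like this: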

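<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>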
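The same modification can be tried entirely in memory, using ET.tostring() instead of writing a file. The sketch below works on a condensed copy of the sample data and also exercises Element.append(); the note element it appends is a hypothetical addition, not part of the original example.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the section's sample document.
COUNTRY_DATA = """\
<data>
  <country name="Liechtenstein">
    <rank>1</rank><year>2008</year><gdppc>141100</gdppc>
  </country>
  <country name="Singapore">
    <rank>4</rank><year>2011</year><gdppc>59900</gdppc>
  </country>
  <country name="Panama">
    <rank>68</rank><year>2011</year><gdppc>13600</gdppc>
  </country>
</data>
"""

root = ET.fromstring(COUNTRY_DATA)

# Change text and attributes in place.
for rank in root.iter('rank'):
    rank.text = str(int(rank.text) + 1)
    rank.set('updated', 'yes')

# Append a new child (hypothetical element, for illustration only).
note = ET.Element('note')
note.text = 'ranks bumped by one'
root.append(note)

new_ranks = [r.text for r in root.iter('rank')]           # ['2', '5', '69']
first_rank_xml = ET.tostring(root[0][0]).decode('ascii')
# first_rank_xml is '<rank updated="yes">2</rank>'
```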

We can remove elements using Element.remove(). Let's say we want to remove all countries with a rank higher than 50:

>>> for country in root.findall('country'):
...     rank = int(country.find('rank').text)
...     if rank > 50:
...         root.remove(country)
...
>>> tree.write('output.xml')

Our XML now looks like this:

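<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>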
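The removal loop is safe because findall() returns a list, so the loop iterates over a snapshot while root itself is being changed. The sketch below repeats the removal in memory on a condensed copy of the sample data; it starts from the original ranks (1, 4, 68) rather than the incremented ones, which leaves the same two countries.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the section's sample document.
COUNTRY_DATA = """\
<data>
  <country name="Liechtenstein">
    <rank>1</rank><year>2008</year><gdppc>141100</gdppc>
  </country>
  <country name="Singapore">
    <rank>4</rank><year>2011</year><gdppc>59900</gdppc>
  </country>
  <country name="Panama">
    <rank>68</rank><year>2011</year><gdppc>13600</gdppc>
  </country>
</data>
"""

root = ET.fromstring(COUNTRY_DATA)

# findall() has already collected the matches into a list,
# so removing children of root inside the loop does not disturb iteration.
for country in root.findall('country'):
    if int(country.find('rank').text) > 50:
        root.remove(country)

remaining = [c.get('name') for c in root]   # ['Liechtenstein', 'Singapore']
```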

19.7.1.5. Building XML documents

The SubElement() function also provides a convenient way to create new sub-elements for a given element:

>>> a = ET.Element('a')
>>> b = ET.SubElement(a, 'b')
>>> c = ET.SubElement(a, 'c')
>>> d = ET.SubElement(c, 'd')
>>> ET.dump(a)
<a><b /><c><d /></c></a>
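Elements built this way usually also need attributes and text. The following sketch builds a small fragment of the country data from scratch, serializes it with ET.tostring(), and then wraps it in an ElementTree to write it out; an in-memory io.BytesIO stands in for a real output file, which is an assumption made to keep the example self-contained.

```python
import io
import xml.etree.ElementTree as ET

# Attributes can be passed as keyword arguments to the constructor.
country = ET.Element('country', name='Panama')

# SubElement() creates the child and appends it to the parent in one step.
rank = ET.SubElement(country, 'rank')
rank.text = '68'
year = ET.SubElement(country, 'year')
year.text = '2011'

# Serialize the element; decode() turns the encoded result into text.
xml_text = ET.tostring(country).decode('ascii')
# xml_text is '<country name="Panama"><rank>68</rank><year>2011</year></country>'

# Wrapping the element in an ElementTree gives access to write().
buffer = io.BytesIO()
ET.ElementTree(country).write(buffer)
written = buffer.getvalue().decode('ascii')
```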

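19.7.1.6. Parsing XML with Namespaces

If the XML input has namespaces, tags and attributes with prefixes in the form prefix:sometag get expanded to {uri}sometag where the prefix is replaced by the full URI. Also, if there is a default namespace, that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the prefix fictional and the other serving as the default namespace: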

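<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>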
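The expansion described above is visible in the tag attribute of the parsed elements. The sketch below parses a shortened copy of the example (one actor only) and shows that both prefixed and non-prefixed tags carry their full URI, which is why a search for a bare tag name finds nothing.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the namespaced example: one actor instead of two.
ACTORS_XML = """\
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
    </actor>
</actors>
"""

root = ET.fromstring(ACTORS_XML)

# The default namespace is applied to the non-prefixed root tag.
root_tag = root.tag          # '{http://people.example.com}actors'

# Children of <actor>: one from the default namespace, one prefixed.
child_tags = [child.tag for child in root[0]]

# A bare tag name does not match the expanded name.
bare_search = root.find('actor')     # None
```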

One way to search and explore this XML example is to manually add the URI to every tag or attribute in the xpath of a find() or findall():

root = ET.fromstring(xml_text)
for actor in root.findall('{http://people.example.com}actor'):
    name = actor.find('{http://people.example.com}name')
    print name.text
    for char in actor.findall('{http://characters.example.com}character'):
        print ' |-->', char.text

A better way to search the namespaced XML example is to create a dictionary with your own prefixes and use those in the search functions:

ns = {'real_person': 'http://people.example.com',
      'role': 'http://characters.example.com'}

for actor in root.findall('real_person:actor', ns):
    name = actor.find('real_person:name', ns)
    print name.text
    for char in actor.findall('role:character', ns):
        print ' |-->', char.text

These two approaches both output:

John Cleese
 |--> Lancelot
 |--> Archie Leach
Eric Idle
 |--> Sir Robin
 |--> Gunther
 |--> Commander Clement
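That the two approaches are equivalent can be checked by collecting their results into plain data and comparing. In the sketch below, cast_by_uri and cast_by_prefix are hypothetical helper names introduced for this comparison; the prefixes in the dictionary are local to the search and need not match the ones used in the document.

```python
import xml.etree.ElementTree as ET

ACTORS_XML = """\
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>
"""

PEOPLE = 'http://people.example.com'
CHARACTERS = 'http://characters.example.com'


def cast_by_uri(root):
    """Search with fully expanded {uri}tag names."""
    result = []
    for actor in root.findall('{%s}actor' % PEOPLE):
        name = actor.find('{%s}name' % PEOPLE).text
        roles = [c.text for c in actor.findall('{%s}character' % CHARACTERS)]
        result.append((name, roles))
    return result


def cast_by_prefix(root):
    """Search with self-chosen prefixes resolved through a dictionary."""
    ns = {'real_person': PEOPLE, 'role': CHARACTERS}
    result = []
    for actor in root.findall('real_person:actor', ns):
        name = actor.find('real_person:name', ns).text
        roles = [c.text for c in actor.findall('role:character', ns)]
        result.append((name, roles))
    return result


root = ET.fromstring(ACTORS_XML)
by_uri = cast_by_uri(root)
by_prefix = cast_by_prefix(root)
```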

19.7.1.7. Additional resources

See http://effbot.org/zone/element-index.htm for tutorials and links to other docs.

19.7.2. XPath support

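This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.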

19.7.2.1. Example

import xml.etree.ElementTree as ET

root = ET.fromstring(countrydata)

# Top-level elements

root.findall(".")

# All 'neighbor' grand-children of 'country' children of the top-level

# elements

root.findall("./country/neighbor")

# Nodes with name='Singapore' that have a 'year' child

root.findall(".//year/..[@name='Singapore']")

# 'year' nodes that are children of nodes with name='Singapore'

root.findall(".//*[@name='Singapore']/year")

# All 'neighbor' nodes that are the second child of their parent

root.findall(".//neighbor[2]")
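To see what each of these expressions returns, the sketch below runs them against a condensed copy of the sample country data and reduces the matches to names or text. The names() helper is a hypothetical convenience, not part of the module.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the tutorial's sample document.
COUNTRY_DATA = """\
<data>
  <country name="Liechtenstein">
    <rank>1</rank><year>2008</year><gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank>4</rank><year>2011</year><gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama">
    <rank>68</rank><year>2011</year><gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>
"""

root = ET.fromstring(COUNTRY_DATA)


def names(elements):
    """Return the name attribute of each element, for compact comparison."""
    return [e.get('name') for e in elements]


top = root.findall(".")                                   # the root element itself
neighbors = names(root.findall("./country/neighbor"))     # all five neighbors
with_year = names(root.findall(".//year/..[@name='Singapore']"))
years = [y.text for y in root.findall(".//*[@name='Singapore']/year")]
second = names(root.findall(".//neighbor[2]"))            # second neighbor of each country
```

Singapore has only one neighbor, so it contributes nothing to the last query.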

19.7.2.2. Supported XPath syntax

tag

Selects all child elements with the given tag. For example, spam selects all child elements named spam, and spam/egg selects all grandchildren named egg in all children named spam.

*

Selects all child elements. For example, */egg selects all grandchildren named egg.

.

Selects the current node. This is mostly useful at the beginning of the path, to indicate that it's a relative path.

//

Selects all subelements, on all levels beneath the current element. For example, .//egg selects all egg elements in the entire tree.

..

Selects the parent element.

[@attrib]

Selects all elements that have the given attribute.

[@attrib='value']

Selects all elements for which the given attribute has the given value. The value cannot contain quotes.

[tag]

Selects all elements that have a child named tag. Only immediate children are supported.

[tag='text']

Selects all elements that have a child named tag whose complete text content, including descendants, equals the given text.

[position]

Selects all elements that are located at the given position. The position can be either an integer (1 is the first position), the expression last() (for the last position), or a position relative to the last position (e.g. last()-1).

Predicates (expressions within square brackets) must be preceded by a tag name, an asterisk, or another predicate. position predicates must be preceded by a tag name.
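The predicate forms above can be tried one by one. The sketch below applies each of them to a condensed copy of the sample country data from the tutorial; the names() helper is a hypothetical convenience for showing results.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the tutorial's sample document.
COUNTRY_DATA = """\
<data>
  <country name="Liechtenstein">
    <rank>1</rank><year>2008</year><gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank>4</rank><year>2011</year><gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama">
    <rank>68</rank><year>2011</year><gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>
"""

root = ET.fromstring(COUNTRY_DATA)


def names(elements):
    """Return the name attribute of each element."""
    return [e.get('name') for e in elements]


# */tag : grandchildren named year
grandchild_years = [y.text for y in root.findall("*/year")]

# [@attrib] : any element, at any depth, that has a name attribute
named_count = len(root.findall(".//*[@name]"))       # 3 countries + 5 neighbors

# [@attrib='value']
west = names(root.findall(".//neighbor[@direction='W']"))

# [tag] and [tag='text']
has_neighbor = names(root.findall("country[neighbor]"))
rank_four = names(root.findall("country[rank='4']"))

# [position], including last() and last()-1
first = names(root.findall("country[1]"))
last = names(root.findall("country[last()]"))
second_last = names(root.findall("country[last()-1]"))
```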

To be continued.
