Python 3: xml.etree.ElementTree study notes

变量很难起

Category: Python column

Article link: https://blog.csdn.net/weixin_42547344/article/details/81097633

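Parsing XML files with xml.etree.ElementTree

First, here is a piece of XML. If it looks familiar, that's right: it comes straight from the official documentation.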

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

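Now that we have the XML, all that is left is to load and parse it. The code looks like this:

# First, read the XML file from disk
import xml.etree.ElementTree as ET
tree = ET.parse('demo.xml')
root = tree.getroot()
# Alternatively, parse the data from a string
# (demo_xml_string is assumed to hold the XML text shown above)
root = ET.fromstring(demo_xml_string)

fromstring() parses XML from a string directly into an Element; root is the root element of the parsed tree.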
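To make the difference between the two entry points concrete, here is a minimal sketch. The inline document assigned to `demo_xml_string` is a shortened, hypothetical stand-in for the file above; wrapping the root in `ET.ElementTree` is only needed if tree-level methods such as `write()` are wanted later.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline document; any well-formed XML string works here.
demo_xml_string = """<data>
    <country name="Liechtenstein"><rank>1</rank></country>
    <country name="Singapore"><rank>4</rank></country>
</data>"""

# fromstring() returns the root Element directly (no ElementTree wrapper).
root = ET.fromstring(demo_xml_string)

# Wrap the root in an ElementTree if tree-level methods such as write() are needed.
tree = ET.ElementTree(root)

print(root.tag)                  # data
print(tree.getroot() is root)    # True
```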

As an Element, root has a tag and a dictionary of attributes. A little code to take a look:

>>> root.tag
'data'
>>> root.attrib
{}

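The tag is 'data', and the attribute dictionary is empty: {}

That may not look very exciting yet, but there is more to come: root also has child nodes, and we can iterate over them

In [7]: for child in root:
   ...:     print(child.tag, child.attrib)
   ...: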
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}

The children under the root are nested, so a specific child node can also be reached by index, like this:

In [8]: root[0][1].text
Out[8]: '2008'
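Index access works because an Element behaves like a sequence of its direct children. The sketch below uses a shortened, hypothetical version of the country data to show `len()`, nested indexing, and slicing side by side.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank><year>2008</year></country>'
    '<country name="Singapore"><rank>4</rank><year>2011</year></country>'
    '</data>'
)

first_country = root[0]          # first child of <data>
year = first_country[1]          # second child of that <country>

print(len(root))                 # number of direct children: 2
print(year.tag, year.text)       # year 2008
print([c.get('name') for c in root[-1:]])   # slicing works too: ['Singapore']
```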

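1.1 The methods above are only a first taste; there are better ways to find elements.

Element has several methods that help iterate recursively over the whole subtree below it (its children, their children, and so on).

Element.iter(): recursively iterates over all elements in the subtree below the element

In [15]: for n in root.iter('neighbor'):
    ...:     print(n.attrib)
    ...:
{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}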

Element.findall(): finds elements with the given tag, but only among the direct children of the current element

Element.find(): finds the first child with the given tag

Element.text: the element's text content (an attribute, not a method)

Element.get(): accesses the element's attributes

A short example:

In [18]: for country in root.findall('country'):
    ...:     rank = country.find('rank').text
    ...:     name = country.get('name')
    ...:     print(name, rank)
    ...:
Liechtenstein 1
Singapore 4
Panama 68
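The difference between findall() and iter() is easy to miss, so here is a small sketch on a shortened, hypothetical document. It also shows what find() and findtext() return when nothing matches.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><neighbor name="Austria"/></country>'
    '<country name="Singapore"><neighbor name="Malaysia"/></country>'
    '</data>'
)

# findall() only looks at direct children of root, so no <neighbor> is found.
direct = root.findall('neighbor')

# iter() walks the whole subtree and finds every <neighbor>.
nested = [n.get('name') for n in root.iter('neighbor')]

# find() returns None when nothing matches; findtext() accepts a default.
missing = root.find('capital')
rank_text = root[0].findtext('rank', default='n/a')

print(direct)      # []
print(nested)      # ['Austria', 'Malaysia']
print(missing)     # None
print(rank_text)   # n/a
```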

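That covers finding elements. XPath matching is supported as well; for readers who would rather not hunt for it, the w3cschool XPath tutorial is a handy reference.

1.2 Everything above reads XML. We may also need to write or modify it, and that is possible too

ElementTree provides a simple way to build XML documents and write them to files; the ElementTree.write() method does the writing. An Element object can be manipulated by changing its fields directly (such as Element.text), by adding and modifying attributes with Element.set(), and by adding children with Element.append().

In the example below, suppose we want to add one to each country's rank and add an updated attribute to the rank element

In [32]: for rank in root.iter('rank'):
    ...:     new_rank = int(rank.text) + 1
    ...:     rank.text = str(new_rank)
    ...:     rank.set('updated', 'yes')
    ...:

In [33]: tree.write('output.xml')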


##### XML after the update #####

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
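To check a modification without writing a file, the tree can be serialized to a string instead. This is a minimal sketch on a one-country, hypothetical document; note that `text` has to be assigned a string, which is why the new rank goes through `str()`.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<data><country name="Panama"><rank>68</rank></country></data>')

for rank in root.iter('rank'):
    rank.text = str(int(rank.text) + 1)   # text must be a string, not an int
    rank.set('updated', 'yes')

# encoding='unicode' makes tostring() return str instead of bytes.
xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
# <data><country name="Panama"><rank updated="yes">69</rank></country></data>
```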

Elements can also be deleted with Element.remove(). Suppose we want to remove every country whose rank is higher than 50

>>> for country in root.findall('country'):
...     rank = int(country.find('rank').text)
...     if rank > 50:
...         root.remove(country)
...
>>> tree.write('output.xml')

Now look at the XML file again to see whether the entry is gone (of course it is):

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>
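The loop above is safe because findall() returns a list, so the removal does not disturb the iteration. Iterating over `root` itself while removing from it can skip elements; a sketch of one common workaround, taking a snapshot with `list(root)`, follows. The country names and ranks here are made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="A"><rank>60</rank></country>'
    '<country name="B"><rank>70</rank></country>'
    '<country name="C"><rank>5</rank></country>'
    '</data>'
)

# list(root) takes a snapshot of the children, so removing from root
# does not disturb the loop.
for country in list(root):
    if int(country.findtext('rank')) > 50:
        root.remove(country)

remaining = [c.get('name') for c in root]
print(remaining)   # ['C']
```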

1.3 Building XML documents

The SubElement() function creates child elements. See below

In [34]: a = ET.Element('a')

In [35]: b = ET.SubElement(a, 'b')

In [36]: c = ET.SubElement(b, 'c')

In [37]: d = ET.SubElement(c, 'd')

In [38]: ET.dump(a)
<a><b><c><d /></c></b></a>
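Real documents usually need attributes and text as well as nesting. The sketch below builds one country entry from the earlier data; attributes are passed both as keyword arguments and as a dictionary, which SubElement() accepts together.

```python
import xml.etree.ElementTree as ET

data = ET.Element('data')

# Attributes can be given as keyword arguments...
country = ET.SubElement(data, 'country', name='Singapore')

# ...text is assigned through the .text attribute...
rank = ET.SubElement(country, 'rank')
rank.text = '4'

# ...and a dict plus keyword arguments can be combined.
ET.SubElement(country, 'neighbor', {'name': 'Malaysia'}, direction='N')

xml_text = ET.tostring(data, encoding='unicode')
print(xml_text)
```

Calling `ET.ElementTree(data).write('output.xml')` would write the same structure to a file.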

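1.4 Parsing XML with namespaces

Here is an XML example that contains two namespaces: one with the prefix 'fictional', and the other serving as the default namespace.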

<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>

One way to search this XML is to manually add the namespace URI to every tag or attribute in the XPath passed to find() or findall()

import xml.etree.ElementTree as ET

# xml_text is assumed to hold the namespaced XML shown above
root = ET.fromstring(xml_text)
for actor in root.findall('{http://people.example.com}actor'):
    name = actor.find('{http://people.example.com}name')
    print(name.text)
    for char in actor.findall('{http://characters.example.com}character'):
        print(' |-->', char.text)

A better way to search the namespaced XML example is to create a dictionary with your own prefixes and use those in the search functions

ns = {'real_person': 'http://people.example.com',
      'role': 'http://characters.example.com'}

for actor in root.findall('real_person:actor', ns):
    name = actor.find('real_person:name', ns)
    print(name.text)
    for char in actor.findall('role:character', ns):
        print(' |-->', char.text)

Both approaches produce the same output:

John Cleese
 |--> Lancelot
 |--> Archie Leach
Eric Idle
 |--> Sir Robin
 |--> Gunther
 |--> Commander Clement
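It helps to see how the parser stores namespaced tags. The following sketch uses a shortened copy of the actors document and shows that the prefix chosen in the `ns` dictionary is independent of the prefix used in the file; only the URIs have to match.

```python
import xml.etree.ElementTree as ET

xml_text = """<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
    </actor>
</actors>"""

root = ET.fromstring(xml_text)
ns = {'real_person': 'http://people.example.com',
      'role': 'http://characters.example.com'}

# Parsed tags carry the full URI in braces, not the original prefix.
print(root.tag)   # {http://people.example.com}actors

names = [n.text for n in root.findall('real_person:actor/real_person:name', ns)]
roles = [c.text for c in root.findall('.//role:character', ns)]
print(names, roles)   # ['John Cleese'] ['Lancelot']
```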

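1.5 XPath support

ElementTree supports a convenient subset of XPath expressions for locating elements in a tree.

Here is the demo from the official documentation: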

import xml.etree.ElementTree as ET

root = ET.fromstring(countrydata)

# Top-level elements
root.findall(".")

# All 'neighbor' grand-children of 'country' children of the top-level
# elements
root.findall("./country/neighbor")

# Nodes with name='Singapore' that have a 'year' child
root.findall(".//year/..[@name='Singapore']")

# 'year' nodes that are children of nodes with name='Singapore'
root.findall(".//*[@name='Singapore']/year")

# All 'neighbor' nodes that are the second child of their parent
root.findall(".//neighbor[2]")

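tag	Selects all child elements with the given tag. For example, spam selects all child elements named spam, and spam/egg selects all grandchildren named egg in all children named spam.
*	Selects all child elements. For example, */egg selects all grandchildren named egg.
.	Selects the current node. This is mostly useful at the beginning of a path, to indicate that it is a relative path.
//	Selects all subelements on all levels beneath the current element. For example, .//egg selects all egg elements in the entire tree.
..	Selects the parent element.
[@attrib]	Selects all elements that have the given attribute.
[@attrib='value']	Selects all elements for which the given attribute has the given value. The value cannot contain quotes.
[tag]	Selects all elements that have a child named tag. Only immediate children are supported.
[tag='text']	Selects all elements that have a child named tag whose complete text content, including descendants, equals the given text.
[position]	Selects all elements located at the given position. The position can be an integer (1 is the first position), the expression last() (for the last position), or a position relative to the last position (for example last()-1).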
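A few of the predicates in the table can be tried out directly. This sketch runs them against a shortened, hypothetical copy of the country data; the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

countrydata = """<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""
root = ET.fromstring(countrydata)

# [@attrib='value']: countries whose name attribute is 'Singapore'
singapore = root.findall("./country[@name='Singapore']")

# [tag='text']: countries that have a <year> child with text '2008'
from_2008 = [c.get('name') for c in root.findall("./country[year='2008']")]

# [position]: the last <neighbor> under each parent
last_neighbors = [n.get('name') for n in root.findall(".//neighbor[last()]")]

# // combined with an attribute test: every <neighbor> with direction='W'
west = [n.get('name') for n in root.findall(".//neighbor[@direction='W']")]

print(len(singapore))    # 1
print(from_2008)         # ['Liechtenstein']
print(last_neighbors)    # ['Switzerland', 'Malaysia']
print(west)              # ['Switzerland']
```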

1.6 The module-level functions below are listed for reference

xml.etree.ElementTree.Comment(text=None)

xml.etree.ElementTree.dump(elem)

xml.etree.ElementTree.fromstring(text)

xml.etree.ElementTree.fromstringlist(sequence, parser=None)

xml.etree.ElementTree.iselement(element)

xml.etree.ElementTree.iterparse(source, events=None, parser=None)

xml.etree.ElementTree.parse(source, parser=None)

xml.etree.ElementTree.ProcessingInstruction(target, text=None)

xml.etree.ElementTree.register_namespace(prefix, uri)

xml.etree.ElementTree.SubElement(parent, tag, attrib={}, **extra)

xml.etree.ElementTree.tostring(element, encoding='us-ascii',method="xml")

xml.etree.ElementTree.tostringlist(element, encoding='us-ascii',method='xml')

xml.etree.ElementTree.XML(text, parser=None)

xml.etree.ElementTree.XMLID(text, parser=None)
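A bare list of signatures says little about how the functions fit together, so here is a short sketch that exercises a handful of them. The documents are hypothetical, and `io.BytesIO` stands in for a real file passed to iterparse().

```python
import io
import xml.etree.ElementTree as ET

# XML() parses a string; fromstringlist() parses a sequence of fragments.
root = ET.XML('<root><item id="1">a</item></root>')
same = ET.fromstringlist(['<root>', '<item id="1">a</item>', '</root>'])

# Comment() creates a comment node that can be appended like an element.
root.append(ET.Comment('generated for illustration'))

print(ET.iselement(root))                     # True
print(ET.tostring(same, encoding='unicode'))  # <root><item id="1">a</item></root>
print(ET.tostring(root, encoding='unicode'))

# iterparse() yields (event, element) pairs while reading a file-like source.
source = io.BytesIO(b'<root><item>a</item><item>b</item></root>')
texts = [elem.text for event, elem in ET.iterparse(source, events=('end',))
         if elem.tag == 'item']
print(texts)                                  # ['a', 'b']
```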

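1.7 Element objects, for reference

The following attributes and methods deal with the element's own properties

tag: a string identifying the element's type (an attribute, not a method)

text: the element's text content (an attribute)

attrib: a dictionary of the element's attributes (an attribute)

clear(): resets the element, removing all subelements and attributes and setting text and tail to None

get(key, default=None): gets the element attribute named key; returns the attribute value, or default if the attribute is not found

items(): returns the element's attributes as (name, value) pairs

keys(): returns the attribute names as a list

set(key, value): sets the attribute key on the element to value

The following methods work on the element's children (subelements)

append(subelement): adds subelement to the end of the element's children

extend(subelements): appends subelements from a sequence of zero or more elements

find(match): finds the first subelement matching match; returns an element instance or None

findall(match): finds all matching subelements by tag name or path; returns a list

findtext(match, default=None): finds the text of the first subelement matching match, which can be a tag name or a path; returns default if nothing matches

insert(index, element): inserts a subelement at the given position in the element

iter(tag=None): creates a tree iterator with the current element as the root

iterfind(match): finds all matching subelements by tag name or path; returns an iterable

itertext(): creates a text iterator that walks the element and all its subelements in document order and returns all inner text

makeelement(tag, attrib): creates a new element object of the same type as this element

remove(subelement): removes subelement from the element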
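Several of the methods listed above are easier to understand in action. The sketch below runs them on a small, hypothetical paragraph element with mixed text and a nested tag.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring('<p class="intro" id="p1">Hello <b>big</b> world</p>')

print(sorted(p.keys()))         # ['class', 'id']
print(sorted(p.items()))        # [('class', 'intro'), ('id', 'p1')]
print(''.join(p.itertext()))    # Hello big world
print(p.findtext('b'))          # big
print(repr(p.findtext('i', default='')))   # '' because there is no <i>

# insert() and extend() add children at a position or at the end.
p.insert(0, ET.Element('first'))
p.extend([ET.Element('x'), ET.Element('y')])
tags = [child.tag for child in p]
print(tags)                     # ['first', 'b', 'x', 'y']

p.clear()   # drops children and attributes, resets text and tail
print(len(p), p.attrib, p.text) # 0 {} None
```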

Caution: an element with no subelements tests as False

element = root.find('foo')

if not element:  # careful!
    print("element not found, or element has no subelements")

if element is None:
    print("element not found")
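A sketch of how the two cases can be told apart without relying on truthiness at all: compare against None to test existence, and use len() to test for children. Recent Python versions (3.12 and later, as far as I know) also emit a DeprecationWarning when an Element is truth-tested, which is one more reason to be explicit. The helper name `describe` is made up for this example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<root><foo/></root>')

foo = root.find('foo')   # exists, but has no children
bar = root.find('bar')   # does not exist


def describe(element):
    # Compare against None to test existence; use len() to test for children.
    if element is None:
        return 'not found'
    if len(element) == 0:
        return 'found, no subelements'
    return 'found, has subelements'


print(describe(foo))    # found, no subelements
print(describe(bar))    # not found
print(describe(root))   # found, has subelements
```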

That wraps up these xml.etree.ElementTree study notes. I hope they are helpful. Thanks.


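XML is a hierarchical data format, and the most natural way to represent it is as a tree. ElementTree represents the whole XML document as a tree, and an Element represents a single node in that tree; the content you need is then retrieved by walking down the tree level by level.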