Parsing XML Files in Python with ElementTree

cnhwl

Last modified 2022-06-27 15:18:02

First published 2022-06-10 17:19:46

Article link: https://blog.csdn.net/cnhwl/article/details/125223496

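For a general introduction to XML, the Runoob (菜鸟教程) tutorial covers everything needed here.

Suppose we have an XML file of movie data, movies.xml, with the following content: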

<?xml version="1.0"?>
<collection>
    <genre category="Action">
        <decade years="1980s">
            <movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
                <format multiple="No">DVD</format>
                <year>1981</year>
                <rating>PG</rating>
                <description>
                'Archaeologist and adventurer Indiana Jones 
                is hired by the U.S. government to find the Ark of  the Covenant before the Nazis.'
                </description>
            </movie>
               <movie favorite="True" title="THE KARATE KID">
               <format multiple="Yes">DVD,Online</format>
               <year>1984</year>
               <rating>PG</rating>
               <description>None provided.</description>
            </movie>
            <movie favorite="False" title="Back 2 the Future">
               <format multiple="False">Blu-ray</format>
               <year>1985</year>
               <rating>PG</rating>
               <description>Marty McFly</description>
            </movie>
        </decade>
        <decade years="1990s">
            <movie favorite="False" title="X-Men">
               <format multiple="Yes">dvd, digital</format>
               <year>2000</year>
               <rating>PG-13</rating>
               <description>Two mutants come to a private academy for their kind whose resident superhero team must oppose a terrorist organization with similar powers.</description>
            </movie>
            <movie favorite="True" title="Batman Returns">
               <format multiple="No">VHS</format>
               <year>1992</year>
               <rating>PG13</rating>
               <description>NA.</description>
            </movie>
               <movie favorite="False" title="Reservoir Dogs">
               <format multiple="No">Online</format>
               <year>1992</year>
               <rating>R</rating>
               <description>WhAtEvER I Want!!!?!</description>
            </movie>
        </decade>    
    </genre>

    <genre category="Thriller">
        <decade years="1970s">
            <movie favorite="False" title="ALIEN">
                <format multiple="Yes">DVD</format>
                <year>1979</year>
                <rating>R</rating>
                <description>"""""""""</description>
            </movie>
        </decade>
        <decade years="1980s">
            <movie favorite="True" title="Ferris Bueller's Day Off">
                <format multiple="No">DVD</format>
                <year>1986</year>
                <rating>PG13</rating>
                <description>Funny movie on funny guy </description>
            </movie>
            <movie favorite="FALSE" title="American Psycho">
                <format multiple="No">blue-ray</format>
                <year>2000</year>
                <rating>Unrated</rating>
                <description>psychopathic Bateman</description>
            </movie>
        </decade>
    </genre>
</collection>

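As you can see, an XML file is made up of units called elements. Every element has a beginning and an end: it opens with <xxx> and closes with </xxx>. You can think of the elements as the nodes of a tree. Each element has three main properties:
1. tag: the element's name, a string. It is the first word inside the angle brackets, for example movie;
2. attrib: the attributes, i.e. the name="value" pairs inside the opening tag. Together they form a dict in which the attribute name is the key and the quoted string is the value;
3. text: the content that sits outside the angle brackets, between the opening tag and the closing tag, for example: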

                ...
                <description>
                'Archaeologist and adventurer Indiana Jones 
                is hired by the U.S. government to find the Ark of  the Covenant before the Nazis.'
                </description>
                ...
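The three properties can be seen on a single element. The following is a minimal sketch, assuming the standard-library xml.etree.ElementTree module (the same one the rest of this post uses); the one-line XML string is copied from movies.xml and parsed inline so the snippet runs without the file.

```python
import xml.etree.ElementTree as ET

# One element taken from movies.xml, parsed from a string instead of a file.
snippet = '<format multiple="No">DVD</format>'
elem = ET.fromstring(snippet)

print(elem.tag)     # the name inside the angle brackets
print(elem.attrib)  # the name="value" pairs, as a dict
print(elem.text)    # the content between the opening and closing tags
```

Here the tag is format, the attrib dict has the single key multiple, and the text is DVD.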

Parsing an XML file in Python is straightforward. First import the ElementTree library and read the file:

import xml.etree.ElementTree as ET
tree = ET.parse('movies.xml')
root = tree.getroot()

Inspecting root at this point shows that it is simply an element:

<Element 'collection' at 0x0000026DF3130728>

The element's three properties are easy to read:

print(root.tag)
print(root.attrib)
print(root.text)
'''
collection
{}

'''

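This shows that the element's tag is collection, its attrib is an empty dict, and its text has no visible characters: it holds only the newline and indentation that come before the first child element.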
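To check what root.text actually holds, repr() makes whitespace visible. This is a small sketch on an inline stand-in for the top of movies.xml (an assumption made so the snippet is self-contained; the real file has more children but the same layout after the opening tag).

```python
import xml.etree.ElementTree as ET

# Inline stand-in for the start of movies.xml (trimmed to one child).
xml_text = """<collection>
    <genre category="Action"/>
</collection>"""
root = ET.fromstring(xml_text)

# repr() shows the newline and the four spaces of indentation explicitly.
print(repr(root.text))

# After stripping whitespace nothing is left, so the element has no real text.
print(repr(root.text.strip()))
```

Calling strip() before using text is a common way to avoid being misled by layout whitespace. Note that text can also be None for an element written as an empty tag, so real code may need to guard against that.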

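Because this element is also the root node of the tree, we can walk through its children. There are several ways to do so:

1. Treat the element as a list of its children and index into it directly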

print(root[0])
print(root[0].tag)
print(root[0].attrib)
print(root[0].text)
'''
<Element 'genre' at 0x0000026DF3130778>
genre
{'category': 'Action'}

'''

print(root[0][0][0][3])
print(root[0][0][0][3].tag)
print(root[0][0][0][3].attrib)
print(root[0][0][0][3].text)
'''
<Element 'description' at 0x0000026DF3130B38>
description
{}

                'Archaeologist and adventurer Indiana Jones 
                is hired by the U.S. government to find the Ark of  the Covenant before the Nazis.'
'''
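Chained positional indexes such as root[0][0][0][3] depend on the exact order of the children. As a possible alternative, not used in the original post, ElementTree's find() and findtext() methods look children up by tag path. The sketch below assumes a trimmed inline copy of one movie from movies.xml.

```python
import xml.etree.ElementTree as ET

# Trimmed inline copy of a single movie from movies.xml.
xml_text = """<collection>
    <genre category="Action">
        <decade years="1980s">
            <movie favorite="False" title="Back 2 the Future">
                <format multiple="False">Blu-ray</format>
                <year>1985</year>
                <rating>PG</rating>
                <description>Marty McFly</description>
            </movie>
        </decade>
    </genre>
</collection>"""
root = ET.fromstring(xml_text)

# Positional access: fourth child of the first movie.
by_index = root[0][0][0][3]

# Path access: first match for each tag along the path.
by_path = root.find('genre/decade/movie/description')

print(by_index.tag, by_index.text)
print(by_path.tag, by_path.text)

# findtext() returns the text of the first matching child directly.
movie = root.find('genre/decade/movie')
print(movie.findtext('year'))
```

Both lookups reach the same description element here. The path form keeps working if a new child is inserted before description, whereas the index form would silently point at a different element.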

A for loop visits every direct child in turn:

for child in root:
    print(child.tag, child.attrib)
'''
genre {'category': 'Action'} 
genre {'category': 'Thriller'} 
'''
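Because every child is itself an element, the same loop can be nested to go one level deeper at a time. The following sketch assumes a trimmed inline version of movies.xml (fewer movies, same genre/decade/movie nesting) and collects one row per decade; the name summary is introduced here only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed inline stand-in for movies.xml with the same nesting.
xml_text = """<collection>
    <genre category="Action">
        <decade years="1980s">
            <movie title="THE KARATE KID"/>
            <movie title="Back 2 the Future"/>
        </decade>
        <decade years="1990s">
            <movie title="Batman Returns"/>
        </decade>
    </genre>
    <genre category="Thriller">
        <decade years="1970s">
            <movie title="ALIEN"/>
        </decade>
    </genre>
</collection>"""
root = ET.fromstring(xml_text)

# One (category, years, movie count) row per decade.
# len(element) is the number of direct children of that element.
summary = []
for genre in root:
    for decade in genre:
        summary.append((genre.attrib['category'], decade.attrib['years'], len(decade)))

for row in summary:
    print(row)
```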

2. Use root.iter(tag) to iterate over every element with a given tag, at any depth in the tree

for movie in root.iter('movie'):
    print(movie.tag, movie.attrib)
'''
movie {'favorite': 'True', 'title': 'Indiana Jones: The raiders of the lost Ark'}
movie {'favorite': 'True', 'title': 'THE KARATE KID'}
movie {'favorite': 'False', 'title': 'Back 2 the Future'}
movie {'favorite': 'False', 'title': 'X-Men'}
movie {'favorite': 'True', 'title': 'Batman Returns'}
movie {'favorite': 'False', 'title': 'Reservoir Dogs'}
movie {'favorite': 'False', 'title': 'ALIEN'}
movie {'favorite': 'True', 'title': "Ferris Bueller's Day Off"}
movie {'favorite': 'FALSE', 'title': 'American Psycho'}
'''
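Since iter() yields ordinary elements, its results can be filtered on attributes. The output above shows that the favorite attribute is not spelled consistently ('False' and 'FALSE' both occur), so this sketch normalizes the value before comparing. The helper is_favorite is a hypothetical name introduced for this example, and the XML is a trimmed inline stand-in for movies.xml.

```python
import xml.etree.ElementTree as ET

# Trimmed inline stand-in for movies.xml, keeping the inconsistent spellings.
xml_text = """<collection>
    <genre category="Action">
        <decade years="1980s">
            <movie favorite="True" title="THE KARATE KID"/>
            <movie favorite="False" title="Back 2 the Future"/>
        </decade>
    </genre>
    <genre category="Thriller">
        <decade years="1980s">
            <movie favorite="True" title="Ferris Bueller's Day Off"/>
            <movie favorite="FALSE" title="American Psycho"/>
        </decade>
    </genre>
</collection>"""
root = ET.fromstring(xml_text)


def is_favorite(movie):
    """Hypothetical helper: treat any casing of 'true' as a favorite."""
    # Element.get() returns the default ('') when the attribute is missing.
    return movie.get('favorite', '').strip().lower() == 'true'


# iter('movie') finds movies under every genre and decade, in document order.
favorites = [m.get('title') for m in root.iter('movie') if is_favorite(m)]
print(favorites)
```

Using get() rather than attrib[...] avoids a KeyError on movies that lack the attribute, which seems a reasonable precaution for hand-edited data like this.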
