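For an introduction to XML itself, the Runoob (菜鸟教程) tutorial is enough (link here).
Suppose we have an XML file of movie data, movies.xml, with the following content: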
<?xml version="1.0"?>
<collection>
<genre category="Action">
<decade years="1980s">
<movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
<format multiple="No">DVD</format>
<year>1981</year>
<rating>PG</rating>
<description>
'Archaeologist and adventurer Indiana Jones
is hired by the U.S. government to find the Ark of the Covenant before the Nazis.'
</description>
</movie>
<movie favorite="True" title="THE KARATE KID">
<format multiple="Yes">DVD,Online</format>
<year>1984</year>
<rating>PG</rating>
<description>None provided.</description>
</movie>
<movie favorite="False" title="Back 2 the Future">
<format multiple="False">Blu-ray</format>
<year>1985</year>
<rating>PG</rating>
<description>Marty McFly</description>
</movie>
</decade>
<decade years="1990s">
<movie favorite="False" title="X-Men">
<format multiple="Yes">dvd, digital</format>
<year>2000</year>
<rating>PG-13</rating>
<description>Two mutants come to a private academy for their kind whose resident superhero team must oppose a terrorist organization with similar powers.</description>
</movie>
<movie favorite="True" title="Batman Returns">
<format multiple="No">VHS</format>
<year>1992</year>
<rating>PG13</rating>
<description>NA.</description>
</movie>
<movie favorite="False" title="Reservoir Dogs">
<format multiple="No">Online</format>
<year>1992</year>
<rating>R</rating>
<description>WhAtEvER I Want!!!?!</description>
</movie>
</decade>
</genre>
<genre category="Thriller">
<decade years="1970s">
<movie favorite="False" title="ALIEN">
<format multiple="Yes">DVD</format>
<year>1979</year>
<rating>R</rating>
<description>"""""""""</description>
</movie>
</decade>
<decade years="1980s">
<movie favorite="True" title="Ferris Bueller's Day Off">
<format multiple="No">DVD</format>
<year>1986</year>
<rating>PG13</rating>
<description>Funny movie on funny guy </description>
</movie>
<movie favorite="FALSE" title="American Psycho">
<format multiple="No">blue-ray</format>
<year>2000</year>
<rating>Unrated</rating>
<description>psychopathic Bateman</description>
</movie>
</decade>
</genre>
</collection>
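As you can see, an XML file is made up of many pieces called elements (Element). Every element has a beginning and an end: it opens with <xxx> and closes with </xxx>. You can think of the elements as the nodes of a tree. Each element has three main properties: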
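1. tag: the name that comes first inside the angle brackets, for example movie in <movie ...>. It is a string.
2. attrib: the attributes, i.e. the key="value" pairs that follow the tag inside the angle brackets. Together they form a dict, with the attribute name as the key and the quoted string as the value.
3. text: the content that is not inside angle brackets, i.e. whatever sits between the opening and closing tags, for example: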
...
<description>
'Archaeologist and adventurer Indiana Jones
is hired by the U.S. government to find the Ark of the Covenant before the Nazis.'
</description>
...
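To make the three properties concrete, here is a minimal sketch that looks ahead to the ElementTree module used in the rest of this post. It parses a single element copied from the listing above with ET.fromstring (rather than reading movies.xml) and prints each property; the variable names snippet and elem are only illustrative.

```python
import xml.etree.ElementTree as ET

# One element copied from the movies.xml listing above.
snippet = '<format multiple="Yes">DVD,Online</format>'
elem = ET.fromstring(snippet)

print(elem.tag)     # the element name, a str
print(elem.attrib)  # the attributes, a dict of str -> str
print(elem.text)    # the text between the opening and closing tags
```

Note that attribute values are always strings: multiple="Yes" comes back as the string 'Yes', not as a boolean.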
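Parsing an XML file with Python is very simple. First import the ElementTree library and read the file:
import xml.etree.ElementTree as ET

tree = ET.parse('movies.xml')  # parse the file into an ElementTree
root = tree.getroot()          # the top-level <collection> element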
If you inspect root at this point, you can see that it is simply an element:
<Element 'collection' at 0x0000026DF3130728>
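If the XML is already in memory rather than in a file, the same root element can be obtained without touching the disk. The sketch below assumes a tiny stand-in document (xml_text is not the real movies.xml) and shows two routes: ET.parse on a file-like object, which returns an ElementTree, and ET.fromstring, which returns the root element directly.

```python
import io
import xml.etree.ElementTree as ET

# A tiny stand-in for movies.xml, used only for illustration.
xml_text = '<collection><genre category="Action"/></collection>'

# ET.parse accepts a filename or a file-like object and returns an ElementTree.
tree = ET.parse(io.StringIO(xml_text))
root = tree.getroot()

# ET.fromstring skips the ElementTree wrapper and returns the root Element.
root2 = ET.fromstring(xml_text)

print(root.tag, root2.tag)
```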
The three properties of the element are easy to read:
print(root.tag)
print(root.attrib)
print(root.text)
'''
collection
{}
'''
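This shows that the element's tag is collection, its attrib is an empty dict, and its text has no visible content (at most the whitespace that precedes its first child).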
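Why the text of a container element such as collection prints as blank depends on how the file is laid out. The sketch below, using two hypothetical one-child documents, suggests what to expect: when the children are indented, text holds the newline and spaces before the first child; when there is nothing between the tags, text is None.

```python
import xml.etree.ElementTree as ET

# Same structure, with and without whitespace between the tags.
indented = ET.fromstring('<collection>\n  <genre category="Action"/>\n</collection>')
compact = ET.fromstring('<collection><genre category="Action"/></collection>')

print(repr(indented.text))  # whitespace that precedes the first child
print(repr(compact.text))   # None, there is no text at all

# A check that treats None and whitespace-only text the same way.
has_text = bool((indented.text or '').strip())
print(has_text)
```

Guarding with `(elem.text or '')` before calling strip() avoids an AttributeError on elements whose text is None.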
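Because this element is also the root node, we can traverse its child nodes. There are several ways to do so:
1. Treat the element as a list of its child nodes and index into it directly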
print(root[0])
print(root[0].tag)
print(root[0].attrib)
print(root[0].text)
'''
<Element 'genre' at 0x0000026DF3130778>
genre
{'category': 'Action'}
'''
print(root[0][0][0][3])
print(root[0][0][0][3].tag)
print(root[0][0][0][3].attrib)
print(root[0][0][0][3].text)
'''
<Element 'description' at 0x0000026DF3130B38>
description
{}
'Archaeologist and adventurer Indiana Jones
is hired by the U.S. government to find the Ark of the Covenant before the Nazis.'
'''
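Chained indices like root[0][0][0][3] depend on the exact position of every child, so it can help to check the length first or to address the element by tag instead. The following is a sketch on a trimmed-down copy of the listing (one genre, one decade, one movie); Element.find with a path of tags returns the first match, which here is the same object the indices reach.

```python
import xml.etree.ElementTree as ET

# A trimmed-down copy of movies.xml, for illustration only.
xml_text = """
<collection>
  <genre category="Action">
    <decade years="1980s">
      <movie favorite="True" title="THE KARATE KID">
        <format multiple="Yes">DVD,Online</format>
        <year>1984</year>
        <rating>PG</rating>
        <description>None provided.</description>
      </movie>
    </decade>
  </genre>
</collection>
""".strip()
root = ET.fromstring(xml_text)

movie = root[0][0][0]
print(len(movie))  # number of direct children of <movie>

by_index = movie[3]                                      # fourth child by position
by_path = root.find('genre/decade/movie/description')    # first match by tag path
print(by_index is by_path)
print(by_path.text)
```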
A for loop visits all of the direct children in turn:
for child in root:
    print(child.tag, child.attrib)
'''
genre {'category': 'Action'}
genre {'category': 'Thriller'}
'''
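Since a for loop only sees direct children, reaching the movies this way takes one loop per level. As a rough sketch on a shortened stand-in document (not the full movies.xml), nested loops can collect a (category, years, title) tuple for every movie; the list name rows is just an illustrative choice.

```python
import xml.etree.ElementTree as ET

# A shortened stand-in for movies.xml.
xml_text = """
<collection>
  <genre category="Action">
    <decade years="1980s">
      <movie favorite="True" title="THE KARATE KID"/>
      <movie favorite="False" title="Back 2 the Future"/>
    </decade>
  </genre>
  <genre category="Thriller">
    <decade years="1970s">
      <movie favorite="False" title="ALIEN"/>
    </decade>
  </genre>
</collection>
""".strip()
root = ET.fromstring(xml_text)

rows = []
for genre in root:            # direct children of <collection>
    for decade in genre:      # direct children of each <genre>
        for movie in decade:  # direct children of each <decade>
            rows.append((genre.attrib['category'],
                         decade.attrib['years'],
                         movie.attrib['title']))

for row in rows:
    print(row)
```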
2. Use root.iter(tag) to iterate over every element with the given tag, at any depth below root
for movie in root.iter('movie'):
    print(movie.tag, movie.attrib)
'''
movie {'favorite': 'True', 'title': 'Indiana Jones: The raiders of the lost Ark'}
movie {'favorite': 'True', 'title': 'THE KARATE KID'}
movie {'favorite': 'False', 'title': 'Back 2 the Future'}
movie {'favorite': 'False', 'title': 'X-Men'}
movie {'favorite': 'True', 'title': 'Batman Returns'}
movie {'favorite': 'False', 'title': 'Reservoir Dogs'}
movie {'favorite': 'False', 'title': 'ALIEN'}
movie {'favorite': 'True', 'title': "Ferris Bueller's Day Off"}
movie {'favorite': 'FALSE', 'title': 'American Psycho'}
'''
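The output above also shows that the favorite attribute is spelled inconsistently ('True', 'False', 'FALSE'). As a sketch on a shortened stand-in document, the code below contrasts plain iteration (direct children only) with root.iter('movie') (all depths) and then filters favorites with a case-insensitive comparison; treating any spelling of "true" as a favorite is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# A shortened stand-in for movies.xml.
xml_text = """
<collection>
  <genre category="Action">
    <decade years="1980s">
      <movie favorite="True" title="THE KARATE KID"/>
      <movie favorite="False" title="Back 2 the Future"/>
    </decade>
  </genre>
  <genre category="Thriller">
    <decade years="1980s">
      <movie favorite="True" title="Ferris Bueller's Day Off"/>
      <movie favorite="FALSE" title="American Psycho"/>
    </decade>
  </genre>
</collection>
""".strip()
root = ET.fromstring(xml_text)

# Plain iteration only reaches the direct children of root.
direct = [child.tag for child in root]

# iter('movie') walks the whole subtree in document order.
titles = [m.get('title') for m in root.iter('movie')]

# Attribute values are strings, so compare case-insensitively.
favorites = [m.get('title') for m in root.iter('movie')
             if m.get('favorite', '').lower() == 'true']

print(direct)
print(titles)
print(favorites)
```

Element.get(key, default) is used instead of attrib[key] so that a movie without a favorite attribute would be skipped rather than raise a KeyError.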