Create the test data file example.xml:
<?xml version="1.0" encoding="utf-8"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
You can use the ET module's parse() function to construct an ElementTree object from a given XML file:
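import xml.etree.ElementTree as ET
# Parse the file into an ElementTree (the XML document object).
# A raw string is used so that backslashes in the Windows path are not treated as escapes.
tree = ET.parse(r'C:\Users\micha\PycharmProjects\nlp\inspect\example.xml')
# Get the root Element of the document
root = tree.getroot()
print(root.tag) # output: data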
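The path above only exists on one machine, so here is a minimal self-contained sketch of the same parse() and getroot() steps. It assumes the XML is first written to a temporary file. The names `sample_xml` and `path` and the shortened data are illustrative and not part of the original example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample data used only for this sketch.
sample_xml = """<?xml version="1.0" encoding="utf-8"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
    </country>
</data>
"""

# Write the sample to a temporary file so ET.parse() has a real file to read.
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as f:
    f.write(sample_xml)
    path = f.name

try:
    tree = ET.parse(path)   # ElementTree wrapping the whole document
    root = tree.getroot()   # root Element, here <data>
    root_tag = root.tag
    child_tags = [child.tag for child in root]
    print(root_tag, child_tags)
finally:
    os.remove(path)         # clean up the temporary file
```

Iterating over an Element yields its direct children, which is why `child_tags` contains only the `country` element and not the nested `rank`.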
The ET module's fromstring() function constructs an Element object from an XML string.
# Generate a string representation of the XML element (bytes by default)
xml_str = ET.tostring(root)
print(xml_str)
# Parse an XML document from a string; returns the root Element
root = ET.fromstring(xml_str)
print(root.tag)
# output:
b'<data>\n <country name="Liechtenstein">\n <rank>1</rank>\n <year>2008</year>\n <gdppc>141100</gdppc>\n <neighbor direction="E" name="Austria" />\n <neighbor direction="W" name="Switzerland" />\n </country>\n <country name="Singapore">\n <rank>4</rank>\n <year>2011</year>\n <gdppc>59900</gdppc>\n <neighbor direction="N" name="Malaysia" />\n </country>\n <country name="Panama">\n <rank>68</rank>\n <year>2011</year>\n <gdppc>13600</gdppc>\n <neighbor direction="W" name="Costa Rica" />\n <neighbor direction="E" name="Colombia" />\n </country>\n</data>'
data
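The `b'...'` prefix in the output above shows that tostring() returns bytes by default. As a small sketch of the round trip, the example below parses a literal string with fromstring() and serializes it back both as bytes and, by passing `encoding="unicode"`, as a regular str. The `xml_text` sample is a shortened, illustrative stand-in for example.xml.

```python
import xml.etree.ElementTree as ET

# Illustrative one-line document (not the full example.xml).
xml_text = '<data><country name="Panama"><rank>68</rank></country></data>'

# fromstring() returns the root Element directly, not an ElementTree.
root = ET.fromstring(xml_text)

as_bytes = ET.tostring(root)                     # bytes (default encoding)
as_text = ET.tostring(root, encoding="unicode")  # str

print(type(as_bytes).__name__, type(as_text).__name__)
print(root.find("country").get("name"))
```

Use `encoding="unicode"` when the result is meant for printing or further string processing. The bytes form is the one to write to a binary file or send over a socket.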
Finding XML nodes
The Element class provides the Element.iter() method for finding nodes by tag. Element.iter() walks the whole subtree recursively, so it finds every matching node at any depth.
# Recursively find all neighbor nodes under root
for neighbor in root.iter("neighbor"):
    print(neighbor.attrib)
# output
{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}
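If you call Element.findall() or Element.find() with a plain tag name, only the node's direct children are searched; the search is not recursive. findall() returns every matching child, while find() returns only the first match.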
for country in root.findall("country"):
    rank = country.find("rank").text
    year = country.find("year").text
    name = country.get("name")
    print(name, rank, year)
# output
Liechtenstein 1 2008
Singapore 4 2011
Panama 68 2011
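To make the difference between the two lookup styles concrete, here is a minimal sketch that runs both against the same tree. It also shows that findall() can search deeper when given the path `.//neighbor`, which is part of the limited XPath support in ElementTree. The `xml_text` sample is a shortened, illustrative version of example.xml.

```python
import xml.etree.ElementTree as ET

# Illustrative, shortened version of example.xml.
xml_text = """
<data>
    <country name="Singapore">
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
"""
root = ET.fromstring(xml_text)

# neighbor is a grandchild of root, so a plain-tag findall() finds nothing.
direct = root.findall("neighbor")

# iter() descends through the whole subtree.
recursive = [n.get("name") for n in root.iter("neighbor")]

# The ".//" prefix asks findall() to match descendants at any depth.
via_path = [n.get("name") for n in root.findall(".//neighbor")]

# find() returns only the first matching direct child.
first_country = root.find("country").get("name")

print(len(direct), recursive, via_path, first_country)
```

Note that find() returns None when nothing matches, so a chained call such as `country.find("rank").text` raises AttributeError if the child element is missing.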