Parsing XML Files (Python)
Chapter 1: Basic Usage
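Preface
These are notes from my process of learning Python. They grow as I apply the language to more tasks and serve as a record and a review aid. Discussion and shared learning are welcome.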
I. Python libraries for parsing XML
xml.etree.ElementTree and xml.dom.minidom are two commonly used standard-library modules for parsing XML files. Import them as follows:
1. xml.etree.ElementTree
import xml.etree.ElementTree as ET
2. xml.dom.minidom
import xml.dom.minidom
II. Usage
1. Parsing the XML file
1. Using etree
xml_path = "./402_fd2.xml"
tree = ET.parse(xml_path)
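ET.parse returns an ElementTree object, and the data is reached through its root element. The following is a minimal sketch of that step. The layout of 402_fd2.xml is not shown in these notes, so the sample string (including the root tag name Data) is an assumption made up for illustration, and it is parsed with ET.fromstring so the snippet runs without the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real layout of 402_fd2.xml is assumed, not known.
SAMPLE_XML = """<Data>
  <Spectrum number="1">
    <X id="10" value="0.5"/>
    <X id="20" value="0.7"/>
  </Spectrum>
</Data>"""

# For a file on disk the equivalent is: root = ET.parse(xml_path).getroot()
root = ET.fromstring(SAMPLE_XML)

root_tag = root.tag
# iter() walks every descendant with the given tag; get() reads an attribute as str
x_values = [x.get("value") for x in root.iter("X")]
print(root_tag, x_values)
```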
2. Using dom
xml_path = "./402_fd2.xml"
# read the XML file content; the with block closes the file automatically
with open(xml_path, 'r', encoding='utf-8') as f:
    datasource = f.read()
# datasource is already a str, so no conversion is needed
# parse the string with minidom
DOMTree = xml.dom.minidom.parseString(datasource)
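As an alternative to reading the file by hand, xml.dom.minidom.parse accepts a file name (or an open file object) directly. The sketch below writes a small, made-up document to a temporary file so that it runs on its own. The file content and the root tag name Data are assumptions for illustration only.

```python
import os
import tempfile
import xml.dom.minidom

# Hypothetical sample content, written to a temporary file for the demo.
sample = '<?xml version="1.0" encoding="utf-8"?><Data><Spectrum number="1"/></Data>'
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

try:
    # parse() opens, reads and closes the file itself
    dom_tree = xml.dom.minidom.parse(tmp_path)
    root_name = dom_tree.documentElement.tagName
finally:
    os.remove(tmp_path)  # clean up the temporary file

print(root_name)
```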
2. Extracting data from the dom object
1. Get the root element of the dom object
collection = DOMTree.documentElement
2. Look up nodes by tag name
<tagname>_Node = collection.getElementsByTagName('<tagname>')
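# replace <tagname> with the name of the tag you are looking for
3. Get attribute values from a node
getElementsByTagName returns a NodeList, which behaves like a list. Index it to pick the node you need, even when the tag occurs only once in the XML file:
<attribute> = <tagname>_Node[0].getAttribute('<attribute>')
# a NodeList has no getAttribute method, so a single match still needs [0]
# replace <attribute> with the name of the attribute you need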
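The templates above can be tried on a concrete document. This is a small sketch with an inline XML string. The tag and attribute names (Data, Spectrum, number, unit) are assumptions chosen for illustration. It also shows that getAttribute returns an empty string for a missing attribute, so hasAttribute is the safer existence check.

```python
import xml.dom.minidom

# Hypothetical sample document with two <Spectrum> elements.
SAMPLE_XML = '<Data><Spectrum number="1"/><Spectrum number="2"/></Data>'
collection = xml.dom.minidom.parseString(SAMPLE_XML).documentElement

# NodeList of every <Spectrum> below the root
spectrum_nodes = collection.getElementsByTagName("Spectrum")
count = len(spectrum_nodes)

# index the list first, then read the attribute from the element
first_number = spectrum_nodes[0].getAttribute("number")

# a missing attribute yields "", not an exception
missing = spectrum_nodes[0].getAttribute("unit")
has_unit = spectrum_nodes[0].hasAttribute("unit")
print(count, first_number, repr(missing), has_unit)
```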
4. Iterating over child nodes and attributes under a parent node
Nest one for loop inside another:
for <tagname> in <tagname>_Node:
    <tagname*>_Node = <tagname>.getElementsByTagName('<tagname*>')
    for <tagname*> in <tagname*>_Node:
        <attribute*> = <tagname*>.getAttribute('<attribute*>')
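Filled in with concrete names, the nested loop might look like the sketch below. The sample document is an assumption for illustration. Note that getElementsByTagName matches all descendants of the node it is called on, not only its direct children.

```python
import xml.dom.minidom

# Hypothetical sample: two parent nodes, each with its own children.
SAMPLE_XML = """<Data>
  <Spectrum number="1"><X id="10" value="0.5"/><X id="20" value="0.7"/></Spectrum>
  <Spectrum number="2"><X id="10" value="0.9"/></Spectrum>
</Data>"""
collection = xml.dom.minidom.parseString(SAMPLE_XML).documentElement

pairs = []
for spectrum in collection.getElementsByTagName("Spectrum"):
    # search only below the current parent node
    for x in spectrum.getElementsByTagName("X"):
        pairs.append((spectrum.getAttribute("number"), x.getAttribute("id")))

print(pairs)
```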
Application
The full source code of the application is posted here, for reference only.
import xml.dom.minidom

# specify the path to your XML file
xml_path = "./402_fd2.xml"
# read the XML file content; the with block closes the file automatically
with open(xml_path, 'r', encoding='utf-8') as f:
    datasource = f.read()
# datasource is already a str, so it can be passed to minidom directly
DOMTree = xml.dom.minidom.parseString(datasource)
# get the root element
collection = DOMTree.documentElement
# collect all <Spectrum> nodes
Spectrum_Node = collection.getElementsByTagName("Spectrum")
# one (number, (frequency, amplitude)) entry per Spectrum
data = []
# iterate over the child nodes and attributes under each parent node
for Spectrum in Spectrum_Node:
    frequency = []
    amplitude = []
    number = Spectrum.getAttribute("number")
    X = Spectrum.getElementsByTagName('X')
    for x in X:
        # getAttribute returns str: "id" holds the frequency, "value" the amplitude
        frequency.append(x.getAttribute("id"))
        amplitude.append(x.getAttribute('value'))
    data.append((number, (frequency, amplitude)))
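For comparison, roughly the same extraction can be written with xml.etree.ElementTree. This is only a sketch. The helper name read_spectra and the sample document are hypothetical, and the conversion to float assumes that the id and value attributes always hold numeric text, which these notes do not confirm.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the Spectrum / X structure used above.
SAMPLE_XML = """<Data>
  <Spectrum number="1"><X id="10" value="0.5"/><X id="20" value="0.7"/></Spectrum>
  <Spectrum number="2"><X id="10" value="0.9"/></Spectrum>
</Data>"""


def read_spectra(xml_text):
    """Return [(number, (frequencies, amplitudes)), ...] from an XML string."""
    root = ET.fromstring(xml_text)
    result = []
    for spectrum in root.iter("Spectrum"):
        points = list(spectrum.iter("X"))
        # assumption: both attributes contain numeric text
        frequency = [float(x.get("id")) for x in points]
        amplitude = [float(x.get("value")) for x in points]
        result.append((spectrum.get("number"), (frequency, amplitude)))
    return result


data = read_spectra(SAMPLE_XML)
print(data)
```

Converting at read time keeps later numeric work (plotting, sorting by frequency) free of repeated string parsing. If the attributes can be empty or non-numeric, the float calls would need a guard.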