Working with XML files in Python

Tomator01

Article link: https://blog.csdn.net/big_pai/article/details/86659639

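If this helped, please give it a like; if not, feel free to leave a negative review.

You are welcome to share this article; please credit the source when reposting.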

1. Parsing XML files

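There are three common ways to parse XML in Python. The first is the xml.dom.* modules, an implementation of the W3C DOM API, which suit code that needs to work with the DOM API. The second is the xml.sax.* modules, an implementation of the SAX API; they trade convenience for speed and low memory use. SAX is event-based, which means it can process very large documents "on the fly" without loading them fully into memory. The third is the xml.etree.ElementTree module (ET for short), which offers a lightweight, Pythonic API. ET is much faster than DOM and has a pleasant API. Like SAX, ET can also process documents on the fly through ET.iterparse, so the whole document does not have to sit in memory at once (provided that elements are cleared once they have been handled). On average ET performs roughly on par with SAX, but its API is higher-level and more convenient to use.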
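To make the event-based SAX style described above concrete, here is a minimal sketch (not part of the original article). The handler receives start-tag, text, and end-tag callbacks while the parser streams through the document; the class name OSCollector and the trimmed inline sample are illustrative assumptions, so the sketch runs without a file on disk.

```python
import xml.sax

# Trimmed copy of the sample document, inlined as bytes so no file is needed
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<WebApplicationTest>
  <TestDescription name="demo" severity="high">
    <ApplicableTo>
      <Platform><OS>All</OS><Arch>*</Arch></Platform>
    </ApplicableTo>
  </TestDescription>
</WebApplicationTest>"""


class OSCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records severity attributes and <OS> text as events arrive."""

    def __init__(self):
        super().__init__()
        self.severities = []
        self.os_values = []
        self._in_os = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called for every opening tag; attrs behaves like a read-only mapping
        if name == "TestDescription":
            self.severities.append(attrs.get("severity"))
        elif name == "OS":
            self._in_os = True
            self._buffer = []

    def characters(self, content):
        # Text can arrive in several chunks, so buffer it until the end tag
        if self._in_os:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "OS":
            self.os_values.append("".join(self._buffer))
            self._in_os = False


handler = OSCollector()
xml.sax.parseString(SAMPLE, handler)
print(handler.severities)  # ['high']
print(handler.os_values)   # ['All']
```

Nothing is kept in memory except what the handler chooses to store, which is where SAX's low memory footprint comes from; the price is the manual state tracking (_in_os, _buffer) that a tree API does for you.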

Tree-based parsing is more intuitive, so only the first and third approaches are demonstrated here.
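The streaming mode of the third approach, ET.iterparse, is not covered by the examples that follow, so here is a minimal sketch. It assumes an in-memory sample (wrapped in io.BytesIO so it behaves like a file) and a hypothetical helper high_severity_names; each TestDescription is inspected when its "end" event fires and then cleared so finished subtrees do not pile up in memory.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample with two test descriptions
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<WebApplicationTest>
  <TestDescription name="a" severity="high"><Description>first</Description></TestDescription>
  <TestDescription name="b" severity="low"><Description>second</Description></TestDescription>
</WebApplicationTest>"""


def high_severity_names(source):
    """Stream TestDescription elements and return the names of the high-severity ones."""
    names = []
    # An "end" event fires once an element and all of its children have been parsed
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "TestDescription":
            if elem.get("severity") == "high":
                names.append(elem.get("name"))
            # Discard the finished element's children, text and attributes
            elem.clear()
    return names


print(high_severity_names(io.BytesIO(SAMPLE)))  # ['a']
```

The cleared (now empty) elements stay attached to the root, so for very large files a common refinement is to also remove them from their parent; the sketch keeps only the basic pattern.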

1.1 xml.dom.minidom

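The Document Object Model (DOM) is the W3C-recommended standard programming interface for processing XML (Extensible Markup Language). A DOM parser reads the entire document in one pass and stores all of its elements in an in-memory tree; you can then use the functions DOM provides to read or modify the document's content and structure, and write the modified content back to an XML file. In Python, the standard library's xml.dom.minidom module is used to parse XML files this way.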
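The article's script below only changes existing values, so here is a small sketch of the "modify the structure" part described above. It parses a trimmed inline sample, appends a new element, and serializes the result; the Reference element, its source attribute, and the text example-ref are hypothetical additions, not part of the original file's schema.

```python
import xml.dom.minidom

# Trimmed inline sample; the <Reference> element added below is hypothetical
SAMPLE = '<WebApplicationTest><TestDescription name="demo" severity="high"/></WebApplicationTest>'

dom = xml.dom.minidom.parseString(SAMPLE)
desc = dom.getElementsByTagName("TestDescription")[0]

# Structural change: build a new element with an attribute and a text child, then attach it
ref = dom.createElement("Reference")
ref.setAttribute("source", "example")
ref.appendChild(dom.createTextNode("example-ref"))
desc.appendChild(ref)

# Content change: overwrite an existing attribute value
desc.setAttribute("severity", "medium")

# Serialize the modified tree (Document.writexml() would write it to an open file instead)
result = dom.documentElement.toxml()
print(result)
```

New nodes are always created through the Document object (createElement, createTextNode) and then attached with appendChild; the tree only changes in memory until it is serialized.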

Take the following XML file, which describes a web vulnerability, as an example:

test.xml

<?xml version="1.0" encoding="utf-8"?>
<WebApplicationTest>
	<TestDescription name="ExtJS charts.swf cross site scripting" version="0.1" released="20080307" updated="20140527" protocol="FTP" mayproxy="false" affects="server" severity="high" alert="success" type="Configuration">
		<Description>The ExtJS JavaScript framework that is shipped with TYPO3 also delivers a flash file to show charts. This file is susceptible to cross site scripting (XSS). This vulnerability can be exploited without any authentication.</Description>
		<ApplicableTo>
			<Platform>
				<OS>All</OS>
				<Arch>*</Arch>
			</Platform>
			<WebServer>*</WebServer>
			<ApplicationServer>*</ApplicationServer>
		</ApplicableTo>
	</TestDescription>
</WebApplicationTest>

Parsing it with the DOM tree:


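import xml.dom.minidom

# Path to the sample file; adjust it to your own location
xml_file = "D:\\python3_anaconda3\\学习\\test\\test.xml"

# Open the XML document with the minidom parser to get a Document object
DOMTree = xml.dom.minidom.parse(xml_file)

# Get the root element
collection = DOMTree.documentElement

# Get all <TestDescription> descendants (returns a NodeList)
target_tag = collection.getElementsByTagName("TestDescription")
print(type(target_tag))

# Read an attribute value
print(target_tag[0].getAttribute('severity'))

# Get the <OS> elements
target_tag2 = collection.getElementsByTagName("OS")

# Read the text between the tags (the element's first child is a Text node)
print(target_tag2[0].firstChild.data)

print("*" * 20)

# Modify an attribute and the text content
target_tag[0].setAttribute('severity', 'medium')
target_tag2[0].firstChild.data = 'no'

# Changes exist only in memory, so the tree must be written out again (to a new file or the original one)
with open(xml_file, 'w', encoding='utf-8') as f:
    DOMTree.writexml(f, encoding='utf-8')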

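One detail the DOM script glosses over: in minidom, childNodes also contains the whitespace text nodes between tags (the tabs and newlines in test.xml), so the firstChild of a container element such as <Platform> is usually a Text node, not an element. A minimal sketch, with a hypothetical helper element_children that filters by nodeType:

```python
import xml.dom.minidom
from xml.dom import Node

# Inline fragment with the same indentation style as test.xml
SAMPLE = """<Platform>
    <OS>All</OS>
    <Arch>*</Arch>
</Platform>"""


def element_children(node):
    """Hypothetical helper: return only element children, skipping whitespace text nodes."""
    return [child for child in node.childNodes if child.nodeType == Node.ELEMENT_NODE]


platform = xml.dom.minidom.parseString(SAMPLE).documentElement

# The raw child list mixes whitespace Text nodes and Element nodes
print(platform.firstChild.nodeType == Node.TEXT_NODE)  # True

for child in element_children(platform):
    print(child.tagName, child.firstChild.data)
```

Looking elements up with getElementsByTagName, as the script does, sidesteps this issue; the filter matters once you walk childNodes or firstChild by hand.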

1.2 xml.etree.ElementTree

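In Python 2, the standard library shipped two ElementTree implementations: the pure-Python xml.etree.ElementTree and the faster C implementation xml.etree.cElementTree, and the C version was preferred because it is faster and uses less memory. Since Python 3.3, xml.etree.ElementTree automatically uses the C accelerator whenever it is available, and xml.etree.cElementTree was deprecated (it was removed in Python 3.9), so in Python 3 simply import xml.etree.ElementTree.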

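import xml.etree.ElementTree as ET

xml_file = "D:\\python3_anaconda3\\学习\\test\\test.xml"

# Parse the file into an ElementTree object
tree = ET.parse(xml_file)
print("tree type:", type(tree))

# Get the root element
root = tree.getroot()
print("root type:", type(root))
print(root.tag, "----", root.attrib)

# Access children by index: root[0] is <TestDescription>, root[0][0] is <Description>
print(root[0][0].tag)
print(root[0][0].text)

# Iterate over the children of root[0][1][0], i.e. <Platform>
for child in root[0][1][0]:
    print("child of Platform:", child.tag, "----", child.attrib)

# Look up by tag name (findall with a plain tag matches direct children only)
captionList = root[0][1][0].findall("OS")
print(len(captionList))
print(captionList[0].tag, ":", captionList[0].text)

print("*" * 20)

# Modify the XML: set a new attribute and replace the text
captionList[0].set("item", "9999")
captionList[0].text = "5555"

# Write the tree back; encoding and xml_declaration keep the UTF-8 header
tree.write(xml_file, encoding="utf-8", xml_declaration=True)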

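The ElementTree script reaches <OS> through root[0][1][0], which breaks as soon as the document's layout changes. ElementTree also accepts a limited XPath subset in find, findtext and findall, plus iter() for searching at any depth. A minimal sketch on a trimmed inline copy of the sample (parsed with ET.fromstring so no file is needed):

```python
import xml.etree.ElementTree as ET

# Trimmed inline copy of the sample document
SAMPLE = """<WebApplicationTest>
  <TestDescription name="demo" severity="high">
    <Description>text</Description>
    <ApplicableTo>
      <Platform><OS>All</OS><Arch>*</Arch></Platform>
      <WebServer>*</WebServer>
    </ApplicableTo>
  </TestDescription>
</WebApplicationTest>"""

root = ET.fromstring(SAMPLE)

# A path relative to root replaces the fragile root[0][1][0] indexing
os_elem = root.find("TestDescription/ApplicableTo/Platform/OS")
print(os_elem.text)  # All

# findtext returns the text directly, or the default when nothing matches
print(root.findtext("TestDescription/ApplicableTo/WebServer"))  # *
print(root.findtext("TestDescription/Missing", default="n/a"))  # n/a

# iter() walks all descendants with a matching tag, at any depth
print([e.text for e in root.iter("Arch")])  # ['*']

# ".//" searches at any depth and [@attr='value'] filters on an attribute
high = root.findall(".//TestDescription[@severity='high']")
print([e.get("name") for e in high])  # ['demo']
```

find returns None when the path matches nothing, so real code should check the result before reading .text; findtext with a default avoids that check for plain text lookups.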