Basic XML Parsing Operations in Python

知识充实人生

Last modified: 2022-09-04 08:40:36

First published: 2022-09-04 01:31:04

Article link: https://blog.csdn.net/zyp626/article/details/126450216

I. Creating an XML file

Method 1: Using xml.dom.minidom

1. Creating the document and its tags

from xml.dom.minidom import Document

# Create the XML document
doc = Document()
# Create the root node
root_node = doc.createElement("root")
doc.appendChild(root_node)
# Create a child node
son_node = doc.createElement("son_node")
root_node.appendChild(son_node)
# Add text content to the child node
text = doc.createTextNode("tag content")
son_node.appendChild(text)
# Set attributes on the node
son_node.setAttribute("name", "value")
son_node.setAttribute("name1", "value1")
# Add a second-level child node
sec_node = doc.createElement("second")
son_node.appendChild(sec_node)
text = doc.createTextNode("second-level child content")
sec_node.appendChild(text)
# Save the document to an XML file; the with block closes the file even on errors
filename = "test.xml"
with open(filename, "w", encoding="utf-8") as f:
    f.write(doc.toprettyxml(indent="  "))

[Figure: resulting test.xml]
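
As a quick check on the steps above, the following sketch builds a similar document in memory and parses it back instead of writing a file. The tag names mirror the example above; the variable names `xml_text`, `parsed`, `son_name`, and `son_text` are illustrative only.

```python
from xml.dom.minidom import Document, parseString

# Build a small document in memory (same shape as the example above).
doc = Document()
root_node = doc.createElement("root")
doc.appendChild(root_node)

son_node = doc.createElement("son_node")
son_node.setAttribute("name", "value")
son_node.appendChild(doc.createTextNode("tag content"))
root_node.appendChild(son_node)

# toxml() serializes without extra whitespace; toprettyxml() adds indentation.
xml_text = doc.toxml()

# Parse the string back and read the values that were just written.
parsed = parseString(xml_text)
son = parsed.getElementsByTagName("son_node")[0]
son_name = son.getAttribute("name")
son_text = son.firstChild.data
print(son_name, son_text)
```

Using `toxml()` for the round trip avoids the whitespace-only text nodes that `toprettyxml()` output produces when it is parsed again.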

Method 2: Using ElementTree

import xml.etree.ElementTree as etree

# Create the root element
root = etree.Element("root")
# Create the child element son and set its tag name and attributes
son = etree.SubElement(root, "max", attrib={"sex": "male"})
# Set the text content of the child element son
son.text = "content"
# Create sub_son, a child of the child element
sub_son = etree.SubElement(son, "lily", attrib={"sex": "female"})
# Create an ElementTree instance and write it to disk
et = etree.ElementTree(element=root)
et.write(r"C:\Users\Administrator\Desktop\test_python\CIRCLE2_TEST\test1.xml", encoding="utf-8")

[Figure: resulting test1.xml]
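
ElementTree writes the whole tree on a single line by default. The sketch below shows one way to inspect the result as a string and, where available, indent it. It assumes nothing beyond the standard library; `etree.indent()` exists only on Python 3.9 and later, so the call is guarded.

```python
import xml.etree.ElementTree as etree

root = etree.Element("root")
son = etree.SubElement(root, "max", attrib={"sex": "male"})
son.text = "content"
etree.SubElement(son, "lily", attrib={"sex": "female"})

# etree.indent() was added in Python 3.9; skip it on older interpreters.
if hasattr(etree, "indent"):
    etree.indent(root, space="  ")

# encoding="unicode" returns a str instead of bytes.
xml_text = etree.tostring(root, encoding="unicode")
print(xml_text)

# Parse the string back to confirm the structure survived serialization.
reparsed = etree.fromstring(xml_text)
```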

II. Reading an XML file

1. Finding tags directly under the root node

import xml.etree.ElementTree as etree

xml_path = r"C:\Users\Administrator\Desktop\test_python\feed.xml"
tree = etree.parse(xml_path)           # parse the whole XML document
root = tree.getroot()                  # get the root node
# List the direct children of the root node
tag = list(root)                       # root.getchildren() was removed in Python 3.9; use list(root)
print("tag:", tag)
# Find the first link tag directly under the root node
tag_find = root.find("{http://www.w3.org/2005/Atom}link")  # the XML declares a namespace, so the tag name needs the {http://www.w3.org/2005/Atom} prefix
print("tag_find:", tag_find)
# Find all link tags directly under the root node; returns a list
tag_all = root.findall("{http://www.w3.org/2005/Atom}link")
print("tag_all:", tag_all)

[Figure: output of the search]
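
Since feed.xml itself is not shown in the article, here is a self-contained sketch of the same calls against a small, made-up Atom fragment. The `FEED` string and the `ATOM` constant are assumptions for illustration.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample data standing in for feed.xml.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>demo</title>
  <link rel="self" href="http://example.org/feed"/>
  <link rel="alternate" href="http://example.org/"/>
  <entry><title>first</title></entry>
</feed>"""

ATOM = "{http://www.w3.org/2005/Atom}"
root = etree.fromstring(FEED)

# Direct children only; each tag carries the namespace in braces.
child_tags = [child.tag for child in root]
first_link = root.find(ATOM + "link")      # first match, or None
all_links = root.findall(ATOM + "link")    # list of all matches
print(child_tags)
print(first_link.get("rel"), len(all_links))
```

Note that the `title` inside `entry` does not appear in `child_tags`, because iterating an element yields only its direct children.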

2. Finding the category tag among all descendants of the root node

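# Search for the category tag among all tags, including nested ones at any depth
text = root.find(".//{http://www.w3.org/2005/Atom}category")   # the .// prefix is required
print("with .// ", text)
text1 = root.find("{http://www.w3.org/2005/Atom}category")     # without .//, only direct children are searched
print("without .// ", text1)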

[Figure: comparison of the two results]
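
The difference between the two searches can be reproduced with a short sketch. The `FEED` fragment is made up for illustration: `category` sits inside `entry`, so it is a descendant of the root but not a direct child.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample: category is nested one level below the root.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <category term="python"/>
  </entry>
</feed>"""

ATOM = "{http://www.w3.org/2005/Atom}"
root = etree.fromstring(FEED)

direct = root.find(ATOM + "category")          # direct children only
nested = root.find(".//" + ATOM + "category")  # any depth below root
print(direct)
print(nested.get("term"))
```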

3. Getting the content and attributes of a given tag

tag = root.find(".//{http://www.w3.org/2005/Atom}summary")   # the .// prefix is required
# Get the tag's text content
content = tag.text
print("content:", content)
# Get the tag's attributes (a dict)
attribute = tag.attrib
print("attribute:", attribute)

[Figure: printed content and attributes]
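
When a tag or attribute may be absent, `find()` returns None and `tag.text` would then raise an error. The sketch below shows a few defensive variants; the sample XML and variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as etree

root = etree.fromstring('<entry><summary type="html">hello</summary></entry>')

summary = root.find("summary")
missing = root.find("no_such_tag")          # find() returns None when nothing matches

content = summary.text                      # text content of the tag
attributes = dict(summary.attrib)           # copy of the attribute dict
kind = summary.get("type")                  # single attribute lookup
lang = summary.get("lang", "unknown")       # default when the attribute is absent
fallback = root.findtext("no_such_tag", default="")  # text lookup with a default
print(content, attributes, kind, lang, repr(fallback))
```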

III. Modifying an XML file

1. Modifying tag content and attributes

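# Modify the tag's text content
tag.text = "modify_content"
# Modify an attribute, or add it if it does not exist yet
tag.set("atrri", "value3")
# Replacing the attribute dict removes all other attributes and keeps only those given here
tag.attrib = {"atrri": "value"}
# These changes live in memory only; call tree.write(xml_path) to save them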

[Figure: result. The content and attributes of the summary tag on line 28 have been modified.]
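
The difference between `set()` and dropping attributes can be seen in a small in-memory sketch. The sample element and attribute names are made up; removing a single attribute through `attrib.pop()` is shown as an alternative to replacing the whole dict.

```python
import xml.etree.ElementTree as etree

root = etree.fromstring('<entry><summary type="html" lang="en">old</summary></entry>')
summary = root.find("summary")

summary.text = "new"            # replace the text content
summary.set("type", "text")     # overwrite an existing attribute
summary.set("id", "s1")         # add a new attribute
after_set = dict(summary.attrib)

# Remove one attribute and leave the others untouched.
summary.attrib.pop("lang", None)
after_pop = dict(summary.attrib)
print(after_set)
print(after_pop)
```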

2. Adding child tags

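# feed.xml declares the Atom namespace (see above), so the tag name needs the
# namespace prefix; a bare ".//name" would return None for such a file
tag = root.find(".//{http://www.w3.org/2005/Atom}name")   # the .// prefix is required
# Add sex and address tags under the name tag
sex = etree.SubElement(tag, "sex", attrib={"hobby": "swim"})
sex.text = "male"
addr = etree.SubElement(tag, "address", attrib={"provience": "guangdong"})
addr.text = "shenzhen"
# Write the modified tree back to the file
tree.write(xml_path)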

[Figure: feed.xml after adding the child tags]
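
`SubElement()` always appends the new tag as the last child. If the position matters, `Element.insert()` can be used instead, as in this sketch. The `author`, `email`, and other names here are made up for illustration.

```python
import xml.etree.ElementTree as etree

root = etree.fromstring("<author><name>max</name></author>")

# SubElement() creates the child and appends it at the end.
sex = etree.SubElement(root, "sex", attrib={"hobby": "swim"})
sex.text = "male"

# insert() places an already created element at a given index.
email = etree.Element("email")
email.text = "max@example.org"
root.insert(0, email)

order = [child.tag for child in root]
print(order)
```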

IV. Deleting in XML

1. Deleting a given tag

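ATOM = "{http://www.w3.org/2005/Atom}"   # feed.xml declares this namespace, so tag names include it
xml_path = r"C:\Users\Administrator\Desktop\test_python\feed.xml"
tree = etree.parse(xml_path)             # parse the whole XML document
for t in list(tree.iter()):              # tree.iter() walks every node in the document
    if t.tag == ATOM + "entry":          # found the parent node
        print(list(t))
        for i in list(t):                # iterate over a copy so removal is safe
            if i.tag == ATOM + "category":   # found the child node
                t.remove(i)              # a child is removed through its parent
                break                    # to remove every category child of the parent, drop this break (or use continue)
tree.write(xml_path)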
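
A more compact variant is sketched below on in-memory data without a namespace. It removes every `category` child from every `entry`; the sample XML and the `removed` counter are assumptions for illustration.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample without a namespace, to keep tag names short.
FEED = (
    "<feed>"
    "<entry><title>a</title><category term='x'/><category term='y'/></entry>"
    "<entry><category term='z'/></entry>"
    "</feed>"
)
root = etree.fromstring(FEED)

removed = 0
# list(...) snapshots the entries before the tree is modified.
for entry in list(root.iter("entry")):
    # findall() returns a new list, so removing while looping is safe.
    for child in entry.findall("category"):
        entry.remove(child)       # removal always goes through the parent
        removed += 1

remaining = root.findall(".//category")
print(removed, remaining)
```

ElementTree elements do not keep a reference to their parent, which is why the loop has to start from the parent `entry` rather than from the `category` tags themselves.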

2. Deleting all child tags and attributes under a given tag

clear() removes all of a tag's attributes, text content, and child tags, but the tag itself stays in the document.

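# feed.xml declares the Atom namespace, so the tag name needs the namespace prefix
tag = root.find(".//{http://www.w3.org/2005/Atom}summary")   # the .// prefix is required
# Remove everything from tag, including attributes, text, and child tags; only the tag itself remains
tag.clear()
tree.write(xml_path)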

[Figure: feed.xml after clear()]
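
The effect of `clear()` is easy to verify on a small in-memory element. The sample XML below is made up for illustration.

```python
import xml.etree.ElementTree as etree

root = etree.fromstring(
    '<entry><summary type="html">text<b>bold</b></summary></entry>'
)
summary = root.find("summary")

summary.clear()   # drops attributes, text, and children; the element stays

serialized = etree.tostring(root, encoding="unicode")
print(serialized)
```

To remove the tag itself as well, call `remove()` on its parent, here `root.remove(summary)`.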

3. Deleting the XML file

import os

xml_path = r"C:\Users\Administrator\Desktop\test_python\feed - 副本 (2).xml"
os.remove(xml_path)   # raises FileNotFoundError if the file does not exist
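
Because `os.remove()` raises an error for a missing file, it can help to check first or to catch the exception. The sketch below works in a temporary directory so that it does not touch real files; the file name `feed_copy.xml` is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a throwaway XML file to delete.
    xml_path = os.path.join(tmp_dir, "feed_copy.xml")
    with open(xml_path, "w", encoding="utf-8") as f:
        f.write("<root/>")

    existed_before = os.path.exists(xml_path)
    if os.path.exists(xml_path):
        os.remove(xml_path)
    existed_after = os.path.exists(xml_path)

    # A second delete fails because the file is already gone.
    try:
        os.remove(xml_path)
        second_delete_failed = False
    except FileNotFoundError:
        second_delete_failed = True

print(existed_before, existed_after, second_delete_failed)
```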

V. Common problems

1. find/findall cannot find a tag

Cause: the XML declares a namespace, so the tag name passed to find must be prefixed with that namespace.

tag_find = root.find("{http://www.w3.org/2005/Atom}link")  # the XML declares a namespace, so the tag name needs the {http://www.w3.org/2005/Atom} prefix
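
Writing the full namespace URI in every call gets verbose. As an alternative, `find()` and `findall()` accept a mapping from a prefix to the namespace URI. The sketch below uses a made-up Atom fragment; the prefix `atom` is an arbitrary choice and does not need to appear in the file.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample data with a default namespace.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="self" href="http://example.org/feed"/>
  <entry><category term="python"/></entry>
  <entry><category term="xml"/></entry>
</feed>"""

# Prefix chosen for this script only; it maps to the namespace URI.
NS = {"atom": "http://www.w3.org/2005/Atom"}
root = etree.fromstring(FEED)

plain = root.find("link")                  # no namespace given: no match
link = root.find("atom:link", NS)          # prefix resolved through NS
terms = [c.get("term") for c in root.findall(".//atom:category", NS)]
print(plain, link.get("rel"), terms)
```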
