Learning Python: the xml module

浅洛帆

Article link: https://blog.csdn.net/angelpumpkin/article/details/80066325

1. Overview

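XML (Extensible Markup Language) can be used to mark up data and define data types; it is a metalanguage that lets users define their own markup languages. It is used to exchange data between different languages or programs, and in that respect it is much like JSON, though JSON tends to look cleaner and be easier to read. JSON is also relatively young: before it appeared, XML was the usual choice for data exchange, and even now that JSON is in widespread use, XML is still everywhere, in Java projects in particular.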
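To make the comparison with JSON concrete, here is a minimal sketch that serializes the same record both ways using only the standard library. The `record` dict and the `record_to_xml` helper are made up for illustration; they are not part of any library.

```python
import json
import xml.etree.ElementTree as et

# A hypothetical record, used only for this illustration.
record = {"name": "Singapore", "rank": 5, "year": 2018}


def record_to_xml(rec):
    """Build a <country> element from the record and return it as a string."""
    country = et.Element("country", attrib={"name": rec["name"]})
    for key in ("rank", "year"):
        child = et.SubElement(country, key)
        child.text = str(rec[key])  # XML text is always a string
    return et.tostring(country, encoding="unicode")


xml_text = record_to_xml(record)
json_text = json.dumps(record)

print(xml_text)
print(json_text)
```

Note that JSON keeps the number types, while every value in the XML version is text and has to be converted back by whoever reads it.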

2. The structure of XML

Let's start with an example:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2018</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2018</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

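Structurally it looks a lot like the familiar HTML. The two were designed for different purposes, though: HTML was designed to display data, and its focus is how the data looks; XML was designed to transport and store data, and its focus is what the data is.
Its structural features are as follows:

It is made up of paired tags (an empty element may be written as a single self-closing tag, such as <neighbor name="Austria" direction="E"/>)
The top-level tag is called the root node; tags at other levels are called nodes
Tags can have attributes
A tag pair can enclose data, for example <year>2011</year>
The enclosed data is the value of the node
Tags can contain child tags (forming a hierarchy)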
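The features listed above map directly onto what `xml.etree.ElementTree` exposes. The sketch below parses a shortened copy of the sample document from a string (the `SAMPLE` constant is just a convenience for this illustration) and looks at each feature in turn.

```python
import xml.etree.ElementTree as et

# A shortened copy of the sample document, kept in a string for convenience.
SAMPLE = """<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
    </country>
</data>"""

root = et.fromstring(SAMPLE)   # the root node, <data>
country = root[0]              # child nodes can be indexed like a list
rank = country.find("rank")    # first direct child named "rank"

print(root.tag)                              # name of the root node
print(country.attrib)                        # attributes, as a dict
print(rank.text)                             # the value enclosed by the tag pair
print([child.tag for child in country])      # child tags, in document order
print(country.find("neighbor").text)         # a self-closing tag has no value
```

Values and attributes always come back as strings (or `None` for an empty element), so numeric data such as the rank has to be converted with `int()` before comparing.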

3. Working with XML files in Python

Let's use the XML file above as an example and see how to work with it from Python.

3.1 Reading an XML file

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import xml.etree.ElementTree as et

tree = et.parse('test.xml')
root = tree.getroot()  # get the root node
print(root.tag)  # print the root node's name

for child in root:
    print('-----------')
    print('\t', child.tag, child.attrib)  # print each child node's name and attributes

    for i in child:
        print('\t\t', i.tag, i.text,  i.attrib)  # print the name, value and attributes of each grandchild node
    # visit only the year nodes under this child
    for i in child.iter('year'):
        print('\t\t\t', i.tag, i.text)

print('')
# visit every year node anywhere in the document
for node in root.iter('year'):
    print(node.tag, node.text)

Output (the order of the keys in each attribute dict can vary with the Python version):
data
-----------
     country {'name': 'Liechtenstein'}
         rank 2 {'updated': 'yes'}
         year 2008 {}
         gdppc 141100 {}
         neighbor None {'direction': 'E', 'name': 'Austria'}
         neighbor None {'direction': 'W', 'name': 'Switzerland'}
             year 2008
-----------
     country {'name': 'Singapore'}
         rank 5 {'updated': 'yes'}
         year 2018 {}
         gdppc 59900 {}
         neighbor None {'direction': 'N', 'name': 'Malaysia'}
             year 2018
-----------
     country {'name': 'Panama'}
         rank 69 {'updated': 'yes'}
         year 2018 {}
         gdppc 13600 {}
         neighbor None {'direction': 'W', 'name': 'Costa Rica'}
         neighbor None {'direction': 'E', 'name': 'Colombia'}
             year 2018

year 2008
year 2018
year 2018

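Notes:

getroot() returns the root node; tag gives a node's name, attrib its attributes (as a dict), and text its value;
To visit only the nodes with a particular name, use iter(tag), which searches the whole subtree below the element it is called on.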
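The difference between looping over an element and calling `iter()` is easier to see on a document with deeper nesting. In this sketch the nested `<region>` element is invented purely to show the recursion; it does not appear in the sample file.

```python
import xml.etree.ElementTree as et

# The <region> wrapper is hypothetical; it only exists to add a nesting level.
root = et.fromstring(
    "<data>"
    "<country name='A'><year>2008</year></country>"
    "<country name='B'><year>2018</year>"
    "<region><year>1999</year></region></country>"
    "</data>"
)

# Looping over an element yields its direct children only.
direct_children = [child.tag for child in root]

# iter(tag) walks the whole subtree in document order, at any depth.
all_years = [node.text for node in root.iter("year")]

print(direct_children)
print(all_years)
```

Because `iter()` is recursive, it also picks up the `year` nested inside `region`, which a plain loop over `root` or over a `country` would never reach.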

3.2 Modifying an XML file

"""Modify the contents of an XML file."""
import xml.etree.ElementTree as et

tree = et.parse('test.xml')
root = tree.getroot()

print(type(root.iter('year')))  # iter() returns a generator
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)  # change the node's value
    node.tag = 'next_year'  # change the node's name
    node.set('Pumpkin', 'handsome')  # add or update an attribute

tree.write('test2.xml')  # save to a file

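Don't forget the final save step: the changes exist only in memory until write() is called!
The result of the modification looks like this (attributes appear sorted by name here; since Python 3.8, write() keeps them in their original order):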

<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <next_year Pumpkin="handsome">2009</next_year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <next_year Pumpkin="handsome">2019</next_year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <next_year Pumpkin="handsome">2019</next_year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
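When experimenting with modifications it can be handy to skip the file entirely and inspect the result as a string. The following is a minimal sketch using `et.fromstring()` and `et.tostring()`; the `checked` attribute is a made-up name for illustration.

```python
import xml.etree.ElementTree as et

root = et.fromstring(
    "<data><country name='Panama'><year>2018</year></country></data>"
)

for node in root.iter("year"):
    node.text = str(int(node.text) + 1)  # text is a string, so convert both ways
    node.set("checked", "yes")           # hypothetical attribute, added for the demo

year = root.find("country/year")
result = et.tostring(root, encoding="unicode")  # serialize without touching disk
print(result)
```

Passing `encoding="unicode"` makes `tostring()` return a `str` rather than `bytes`, which is usually more convenient for printing or comparing in tests.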

3.3 Deleting XML nodes

import xml.etree.ElementTree as et

tree = et.parse('test.xml')
root = tree.getroot()


for country in root.findall('country'):  # find the direct children named country
    rank = int(country.find('rank').text)  # find the rank node under each country
    if rank > 50:
        root.remove(country)  # delete the nodes that match the condition

tree.write('test2.xml')

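Notes:

findall() with a plain tag name only matches direct children of the element it is called on, and it returns a list of the matching Element objects;
Use remove() to delete a node; it must be called on the node's parent;
After deleting, call write() to save the result.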
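The scope of `findall()` and the removal step can be checked on an in-memory document. This sketch uses a cut-down version of the sample data; the path forms `.//rank` and `country/rank` are part of ElementTree's limited XPath support.

```python
import xml.etree.ElementTree as et

root = et.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>2</rank></country>"
    "<country name='Singapore'><rank>5</rank></country>"
    "<country name='Panama'><rank>69</rank></country>"
    "</data>"
)

direct = root.findall("rank")           # rank is not a direct child of <data>
anywhere = root.findall(".//rank")      # './/' searches all descendants
by_path = root.findall("country/rank")  # or spell out the path explicitly

# findall() returns a list, so removing while looping over it is safe.
for country in root.findall("country"):
    if int(country.findtext("rank")) > 50:
        root.remove(country)  # remove() is called on the parent

remaining = [country.get("name") for country in root]
print(len(direct), len(anywhere), len(by_path), remaining)
```

Looping over `root` itself while removing from it can skip elements, which is why the loop runs over the list returned by `findall()` instead.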

3.4 Creating a new XML file

import xml.etree.ElementTree as et

# Create the root node
new_xml = et.Element('profile')
# Create a direct child of the root; the arguments are the parent node, the child's tag name, and its attributes
name = et.SubElement(new_xml, 'name', attrib={'LuZhiShen': 'HuaHeShang'})
age = et.SubElement(name, 'age', attrib={'adult': 'yes'})
# Set the child node's value
age.text = '22'
gender = et.SubElement(name, 'gender')
gender.text = 'man'
# Create a second direct child of the root
name2 = et.SubElement(new_xml, 'name', attrib={'WuYong': 'Zhiduoxing'})
age2 = et.SubElement(name2, 'age')
age2.text = '23'

# Build an ElementTree document from the root node
tree = et.ElementTree(new_xml)
# Save the document (xml_declaration takes a boolean, not the string 'true')
tree.write('my.xml', encoding='utf-8', xml_declaration=True)
# Print the document to stdout
et.dump(new_xml)

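The generated XML document comes out on a single line, with no line breaks between the tags.
Compared with a typical XML file, the only thing missing is indentation, which does not affect data exchange.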
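If indentation matters for readability, one option (an addition here, not something the original post uses) is `et.indent()`, which is available from Python 3.9 onward. The sketch below guards the call with `hasattr` so it still runs on older versions, where the output simply stays on one line.

```python
import xml.etree.ElementTree as et

profile = et.Element("profile")
name = et.SubElement(profile, "name", attrib={"LuZhiShen": "HuaHeShang"})
age = et.SubElement(name, "age", attrib={"adult": "yes"})
age.text = "22"

# Without any extra step, the whole tree is serialized on one line.
compact = et.tostring(profile, encoding="unicode")

# et.indent() exists only in Python 3.9+; it edits the tree in place by
# inserting whitespace between elements.
if hasattr(et, "indent"):
    et.indent(profile, space="    ")
pretty = et.tostring(profile, encoding="unicode")

print(compact)
print(pretty)
```

The inserted whitespace is stored as text between the elements, so it changes how the file looks without changing the element values that a reader gets back from `find()` and `text`.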

Note:

SubElement() creates a new node; its first argument determines which node the new node becomes a child of.

Reference: http://www.cnblogs.com/linupython/p/8308315.html
