Learning Python -- parsing XML with xml.etree.ElementTree

Latest recommended article published 2024-08-13 04:13:21

帅气好男人_Jack

Category: python  Tags: xml python

Article link: https://blog.csdn.net/jackzhouyu/article/details/50542175

Python -- learning xml.etree.ElementTree

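xml.etree.ElementTree is a lightweight, DOM-style XML parser; its advantages include fast parsing and low memory use.

The core of ElementTree is the Element class, a data structure designed to store hierarchical, tagged data;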
----------------------------------------------------------------------------------------------------------------------------
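1. First, the object being parsed. An XML element consists of:
a. tag: the tag name, a string
b. attributes: the element's attributes, stored as a dictionary
c. text: the element's text value
d. child elements: the nested sub-tags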
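
As a minimal sketch of these four parts, the snippet below parses a cut-down country element (the string is made up for illustration) and reads each part back. It uses ET.fromstring, which is introduced further down.

```python
import xml.etree.ElementTree as ET

# Illustrative one-country snippet, used only to show the four parts.
country = ET.fromstring(
    '<country name="Singapore"><rank>4</rank><year>2011</year></country>'
)

print(country.tag)          # the tag, a str
print(country.attrib)       # the attributes, a dict

rank = country[0]           # child elements can be indexed like a list
print(rank.tag, rank.text)  # tag and text of the first child
print(len(country))         # number of direct children
```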

To create Element instances, use the Element constructor and SubElement. An ElementTree can contain many Elements; it can be serialized to XML and can also be built by parsing XML.

ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree.

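Creating an XML document entirely by hand:

    import xml.etree.ElementTree as ET

    a = ET.Element('a')
    b = ET.SubElement(a, 'b')
    c = ET.SubElement(a, 'c')
    d = ET.SubElement(c, 'd')
    ET.dump(a)  # writes the tree to stdout

Output:

    <a><b /><c><d /></c></a>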
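
The hand-built tree above has only tags. A slightly fuller sketch, assuming you also want attributes and text, might look like this; the element names are borrowed from the country example and ET.tostring is used to serialize to a string.

```python
import xml.etree.ElementTree as ET

# Build <data><country name="Panama"><rank>68</rank></country></data> by hand.
data = ET.Element('data')
country = ET.SubElement(data, 'country', {'name': 'Panama'})  # attributes as a dict
rank = ET.SubElement(country, 'rank')
rank.text = '68'  # text must be a str, not an int

xml_text = ET.tostring(data, encoding='unicode')  # serialize to a str
print(xml_text)

# Parsing the text back gives an equivalent tree.
round_trip = ET.fromstring(xml_text)
print(round_trip.find('country').get('name'))
```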

----------------------------------------------------------------------------------------------------------------------------
2. Steps for parsing XML:
The following country.xml is used as the example:

<?xml version="1.0"?>
			<data>
				<country name="Liechtenstein">
					<rank>1</rank>
					<year>2008</year>
					<gdppc>141100</gdppc>
					<neighbor name="Austria" direction="E"/>
					<neighbor name="Switzerland" direction="W"/>
				</country>
				<country name="Singapore">
					<rank>4</rank>
					<year>2011</year>
					<gdppc>59900</gdppc>
					<neighbor name="Malaysia" direction="N"/>
				</country>
				<country name="Panama">
					<rank>68</rank>
					<year>2011</year>
					<gdppc>13600</gdppc>
					<neighbor name="Costa Rica" direction="W"/>
					<neighbor name="Colombia" direction="E"/>
				</country>
			</data>

------------------------------------------------------------------------------------------------------------------------
import xml.etree.ElementTree as ET
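1. Loading XML data ---------- directly from an XML file:
tree = ET.parse("country.xml")  # an ElementTree: the whole XML tree
root = tree.getroot()           # the root node, an Element

Loading XML data ---------- from an XML string, which returns the root node directly:
root = ET.fromstring(count_as_string)  # count_as_string: a str holding the XML text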
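
To make the difference between the two entry points concrete, here is a small sketch. It assumes no country.xml is on disk, so io.StringIO stands in for the file; the XML string is a shortened, illustrative version of the example.

```python
import io
import xml.etree.ElementTree as ET

xml_text = '<data><country name="Panama"><rank>68</rank></country></data>'

# fromstring() returns the root Element directly.
root_from_string = ET.fromstring(xml_text)

# parse() accepts a file name or a file object and returns an ElementTree;
# io.StringIO stands in for a real file here.
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

print(type(tree).__name__)
print(root_from_string.tag, root_from_file.tag)
```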
------------------------------------------------------------------------------------------------------------------------
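2. Finding data
An Element offers three lookup methods: iter('tag'), findall('tag') and find('tag').
iter(): searches recursively; it visits the current node, its children, their children, and so on.
findall(): given a plain tag name, searches only the direct children of the current node and returns every match as a list.
find(): like findall(), but returns only the first match, or None if nothing matches; once you have the element, get('attribute_name') returns the value of an attribute.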
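
The difference in depth is easiest to see on an element that is not a direct child of the root. The sketch below uses a shortened copy of the country data, where neighbor is a grandchild of data.

```python
import xml.etree.ElementTree as ET

xml_text = """
<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>
"""
root = ET.fromstring(xml_text)

# <neighbor> is a grandchild of <data>, so only iter() reaches it from the root.
deep = [n.get('name') for n in root.iter('neighbor')]
shallow = root.findall('neighbor')
print(deep)     # ['Austria', 'Switzerland', 'Malaysia']
print(shallow)  # []

first = root.find('country')  # first matching direct child
print(first.get('name'))      # Liechtenstein
missing = root.find('city')   # no such child
print(missing)                # None
```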

example:
    #!/usr/bin/env python3

    __author__ = 'JackZhous'

    import sys
    import xml.etree.ElementTree as ET


    def script(xml_path, mode):
        tree = ET.parse(xml_path)
        node_root = tree.getroot()
        iter_mode = '1'
        if mode == iter_mode:
            # iter() walks the whole subtree below the root
            for node in node_root.iter('country'):
                name = node.get('name')
                year = node.find('year').text
                print('name = ', name, 'year = ', year)
        else:
            # findall() only looks at the root's direct children
            for node in node_root.findall('country'):
                name = node.get('name')
                year = node.find('year').text
                print('name = ' + name, 'year = ' + year)


    if __name__ == '__main__':
        print('Script name:', sys.argv[0])
        print('Argument 1:', sys.argv[1])
        print('Argument 2:', sys.argv[2])
        script(sys.argv[1], sys.argv[2])

------------------------------------------------------------------------------------------------------------------------
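3. Modifying XML data
Once the previous step has located the data you care about, you can change an element's text (assign to element.text), add or change an attribute (set('attribute', 'value')), or delete a node (remove(element), called on the parent of the node being removed). The last step is to write the result to a file by calling write() on the ElementTree object, e.g. tree.write('country.xml').
if name == 'Jackzhous':
    node1.remove(node)  # node1 must be the direct parent of node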
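
Put together, the three kinds of change might look like the sketch below. The data is a shortened copy of the country example, the rank threshold of 50 and the 'updated' attribute are arbitrary choices for illustration, and the result is serialized to a string rather than written to disk.

```python
import xml.etree.ElementTree as ET

xml_text = (
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank></country>'
    '<country name="Panama"><rank>68</rank></country>'
    '</data>'
)
root = ET.fromstring(xml_text)

for rank in root.iter('rank'):
    rank.text = str(int(rank.text) + 1)  # change the text
    rank.set('updated', 'yes')           # add or change an attribute

# findall() returns a list, so removing while looping over it is safe.
for country in root.findall('country'):
    if int(country.find('rank').text) > 50:
        root.remove(country)             # remove() is called on the parent

result = ET.tostring(root, encoding='unicode')
print(result)
```

To save to a file instead, wrap the root in a tree and call write(), for example ET.ElementTree(root).write('output.xml'), where output.xml is a placeholder name.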
------------------------------------------------------------------------------------------------------------------------
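4. Handling XML that uses namespaces, for example the Android manifest, which declares xmlns:android="http://schemas.android.com/apk/res/android"
A namespace groups many tag names under one URI so that they do not clash with identically named tags from elsewhere.
When searching, stand in for the prefix with either a dictionary or a string, e.g. dictionary = {'android': 'http://schemas.android.com/apk/res/android'}, or android_name = 'http://schemas.android.com/apk/res/android'
With the dictionary, search with find('android:name', dictionary); with the string, write the name in its expanded form, find('{' + android_name + '}name').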

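Searching by an attribute that lives in a namespace needs this expanded {uri}name form, as follows:
android_name = 'http://schemas.android.com/apk/res/android'
To find the activity whose name="a.b.activity" in that namespace, use:
tree.find("./application/activity[@{" + android_name + "}name='a.b.activity']")
. is the current node; application/activity descends through those two nodes in turn; and the condition inside [] describes the feature to match.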
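
The following sketch shows both lookup styles end to end. The manifest string is a hypothetical, cut-down example, and the second document with the http://example.com/x namespace is invented purely to show the dictionary form on a namespaced tag.

```python
import xml.etree.ElementTree as ET

ANDROID = 'http://schemas.android.com/apk/res/android'

# Hypothetical, cut-down manifest used only for illustration.
manifest = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    <application>
        <activity android:name="a.b.activity"/>
        <activity android:name="a.b.other"/>
    </application>
</manifest>
"""
root = ET.fromstring(manifest)

# The parser stores a namespaced attribute under the key '{uri}name'.
first = root.find('application/activity')
print(first.attrib)

# Attribute predicate written with the same '{uri}name' form.
hit = root.find("./application/activity[@{%s}name='a.b.other']" % ANDROID)
print(hit.get('{%s}name' % ANDROID))

# Namespaced tags: a prefix dictionary and the expanded form find the same element.
doc = ET.fromstring('<root xmlns:x="http://example.com/x"><x:item>1</x:item></root>')
ns = {'x': 'http://example.com/x'}
by_prefix = doc.find('x:item', ns)
by_uri = doc.find('{http://example.com/x}item')
print(by_prefix.text, by_uri.text)
```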

If the expression above is unclear, here is the supported path syntax:
tag                 Selects all child elements with the given tag. For example, spam selects all child elements named spam, and spam/egg selects all grandchildren named egg in all children named spam.
*                   Selects all child elements. For example, */egg selects all grandchildren named egg.
.                   Selects the current node. This is mostly useful at the beginning of the path, to indicate that it’s a relative path.
//                  Selects all subelements, on all levels beneath the current element. For example, .//egg selects all egg elements in the entire tree.
..                  Selects the parent element.
[@attrib]           Selects all elements that have the given attribute.
[@attrib='value']   Selects all elements for which the given attribute has the given value. The value cannot contain quotes.
[tag]               Selects all elements that have a child named tag. Only immediate children are supported.
[tag='text']        Selects all elements that have a child named tag whose complete text content, including descendants, equals the given text.
[position]          Selects all elements that are located at the given position. The position can be either an integer (1 is the first position), the expression last() (for the last position), or a position relative to the last position (e.g. last()-1).
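
A few of these forms applied to the country data, as a rough sketch. The names() helper is not part of ElementTree; it is defined here only to keep the output short.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
""")


def names(path):
    """Illustrative helper: the name attribute of every element matched by path."""
    return [e.get('name') for e in root.findall(path)]


print(names('country'))                      # tag
print(names('*/neighbor'))                   # * followed by a tag
print(names(".//neighbor[@direction='E']"))  # // with [@attrib='value']
print(names("country[rank='4']"))            # [tag='text']
print(names('country[1]'))                   # [position], counting from 1
print(names('country[last()]'))              # [position] with last()
```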

Using a for loop, with the Android manifest file as the example:

Finding the name of the main activity

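import xml.etree.ElementTree as ET

android = 'http://schemas.android.com/apk/res/android'
path = 'AndroidManifest.xml'  # assumed location of the manifest; adjust as needed

ET.register_namespace('android', android)  # keeps the android: prefix if the tree is written back out
tree = ET.parse(path)
root = tree.find('application')
for activity in root.findall('activity'):
    # look for <action android:name="android.intent.action.MAIN"> inside an intent-filter
    target = activity.find("./intent-filter/action[@{" + android + "}name='android.intent.action.MAIN']")
    if target is None:
        print('node has no intent-filter')
        continue
    main_activity = activity.get("{%s}name" % android)
    print('got the main activity ' + main_activity)
    break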

Note: for details, see: https://docs.python.org/2/library/xml.etree.elementtree.html?highlight=elementtree