https://www.cnblogs.com/handsome1013/p/10058838.html
ET.Parser usage
https://www.cnblogs.com/yezuhui/p/6853323.html
Introduction to xml.etree.ElementTree, the Python 3 XML parsing module
Removing duplicate XML nodes
import xml.etree.ElementTree as ET   # import the XML module

tree = ET.parse('GHO.xml')           # parse the given XML file
root = tree.getroot()                # get the root (top-level) element
data = root.find('Data')             # find the 'Data' element under the root
for obs in data:                     # iterate over every child of 'Data'
    for item in obs:                 # iterate over every child of each 'obs' element
        key = item.attrib            # attribute dict (attrib is a property, not a method)
        print(list(key))             # print the attribute names
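For a quick check without the GHO.xml file, the same nested iteration can be tried on an inline string. This is a minimal sketch that assumes a made-up document shape: the `Observation` and `Dim` tags and their `Category`/`Code` attributes are placeholders, not taken from the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real GHO.xml layout may differ.
SAMPLE = """
<GHO>
  <Data>
    <Observation>
      <Dim Category="COUNTRY" Code="AFG"/>
      <Dim Category="YEAR" Code="2015"/>
    </Observation>
  </Data>
</GHO>
"""

root = ET.fromstring(SAMPLE)      # fromstring returns the root element directly
data = root.find('Data')          # first direct child named 'Data'

attribute_names = []
for obs in data:                  # each child of <Data>
    for item in obs:              # each child of that child
        attribute_names.append(list(item.attrib))  # attrib is a plain dict

print(attribute_names)
```

Note that `ET.parse()` returns an `ElementTree` and needs `getroot()`, while `ET.fromstring()` hands back the root element itself.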
How to read attributes and node content.
How do you get id, name and their values out of data?
Explanation
There are two ways:
1. Get the node first
String strID = node.getAttributes().getNamedItem("id").getNodeValue();
String strName = node.getAttributes().getNamedItem("name").getNodeValue();
2. Get the element first
String strID = element.getAttribute("id");
String strName = element.getAttribute("name");
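The two snippets above use the Java DOM API. Since the rest of these notes use Python, a rough ElementTree equivalent may be useful. This is only a sketch, assuming `data` is an element that carries `id` and `name` attributes; the sample XML is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample: one <data> element with id and name attributes plus text.
SAMPLE = '<root><data id="42" name="example">some content</data></root>'

root = ET.fromstring(SAMPLE)
node = root.find('data')

# Option 1: go through the attribute dictionary (raises KeyError if missing).
str_id = node.attrib['id']
str_name = node.attrib['name']

# Option 2: use Element.get(), which returns None (or a default) if missing.
str_id_alt = node.get('id')
missing = node.get('missing', 'n/a')

# The node content is available through the text attribute.
content = node.text

print(str_id, str_name, content)
```

`Element.get()` is usually the safer choice when an attribute may be absent, because it does not raise an exception.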
A small exercise
#!/usr/bin/env python
import xml.etree.ElementTree as ET

tree = ET.parse('abcdefg.xml')
root = tree.getroot()
iter_elem = root.findall('.//*')   # every descendant of the root
print(len(iter_elem))
# elem = root.find('')
# print(iter_elem)
for element in iter_elem:
    if element is None:
        continue
    if element.text is None:       # skip elements with no text at all
        continue
    print("hello")
    context = []
    src_elem = element.find("source")   # direct child named 'source'
    if src_elem is None:
        continue
    context.append(src_elem.text)
    print("attri :%s" % src_elem.attrib)
    print("tag :%s" % src_elem.tag)
    # for item in src_elem:
    #     print(item.text)
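If the goal of the exercise is only to collect the text of every `source` element, `Element.iter()` can replace the `findall('.//*')` scan. This is a minimal sketch on an invented document: the `TS`, `context` and `message` tags and the sample strings are assumptions, and only `source` comes from the code above. Unlike the loop above, it also picks up `source` elements sitting directly under the root and does not skip parents whose text is `None`.

```python
import xml.etree.ElementTree as ET

# Invented sample document for illustration only.
SAMPLE = """
<TS>
  <context>
    <message><source>Open</source></message>
    <message><source>Close</source></message>
    <message/>
  </context>
</TS>
"""

root = ET.fromstring(SAMPLE)

# iter('source') walks the whole tree in document order and yields only
# elements whose tag is 'source', so no None checks on find() are needed.
sources = [elem.text for elem in root.iter('source')]

print(sources)
```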
Delete duplicate nodes:
import xml.etree.ElementTree as ET

path = 'in.xml'
tree = ET.parse(path)
root = tree.getroot()
prev = None

def elements_equal(e1, e2):
    # Recursively compare type, tag, text, tail, attributes and children.
    if type(e1) != type(e2):
        return False
    if e1.tag != e2.tag: return False   # compare e1 against e2, not against itself
    if e1.text != e2.text: return False
    if e1.tail != e2.tail: return False
    if e1.attrib != e2.attrib: return False
    if len(e1) != len(e2): return False
    return all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2))

for page in root:  # iterate over pages
    elems_to_remove = []
    for elem in page:
        if elements_equal(elem, prev):
            print("found duplicate: %s" % elem.text)  # equal function works well
            elems_to_remove.append(elem)
            continue
        prev = elem
    # remove after the loop so the page is not modified while iterating over it
    for elem_to_remove in elems_to_remove:
        page.remove(elem_to_remove)

tree.write("out.xml")
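To try the routine without `in.xml`, the same idea can be run on an in-memory document. The sketch below is a variant resting on several assumptions: the `pages`/`page`/`item` tags are invented, `prev` is reset for each page (the code above carries it across pages), and `tail` is left out of the comparison so that indentation whitespace cannot make otherwise identical elements differ. Like the code above, it only catches duplicates that are adjacent to the last element kept.

```python
import xml.etree.ElementTree as ET

# Invented sample with adjacent duplicates inside each page.
SAMPLE = (
    '<pages>'
    '<page><item id="1">a</item><item id="1">a</item><item id="2">b</item></page>'
    '<page><item id="2">b</item><item id="2">b</item></page>'
    '</pages>'
)


def elements_equal(e1, e2):
    """Recursively compare tag, text, attributes and children (tail ignored)."""
    if e1 is None or e2 is None:
        return False
    if e1.tag != e2.tag or e1.text != e2.text or e1.attrib != e2.attrib:
        return False
    if len(e1) != len(e2):
        return False
    return all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2))


def remove_adjacent_duplicates(root):
    """Drop every child of each page that equals the sibling kept before it."""
    removed = 0
    for page in root:
        prev = None                      # reset per page: an assumption
        for elem in list(page):          # iterate over a copy while removing
            if elements_equal(elem, prev):
                page.remove(elem)
                removed += 1
            else:
                prev = elem
    return removed


root = ET.fromstring(SAMPLE)
removed_count = remove_adjacent_duplicates(root)
result = ET.tostring(root, encoding='unicode')
print(removed_count, result)
```

Iterating over `list(page)` makes it safe to call `page.remove()` inside the loop, which avoids the separate `elems_to_remove` list.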