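For a similar task, I have had good results with the cElementTree.iterparse method.
I had a large XML document with repeated entries tagged "resFrame", and I wanted to extract only the entries with a specific id. The document structure and the script I used are shown below.
The source document has this structure (all inside a single root element):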
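<bucket>....</bucket>
<bucket>....</bucket>
<bucket>....</bucket>
...
<resFrame><id>234234</id>.....</resFrame>
<frame><id>344234</id>.....</frame>
<resFrame>...</resFrame>
<frame>...</frame>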
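To see why the filtering has to happen on "end" events, here is a minimal sketch (Python 3, using xml.etree.ElementTree) that lists the events iterparse reports for a tiny document of this shape. The root name `doc`, the sample content, and the helper `list_events` are made up for illustration; they are not from the original file.

```python
import io
import xml.etree.ElementTree as ET

# Made-up miniature of the layout above; the root name "doc" is hypothetical.
SAMPLE = b"<doc><bucket>b1</bucket><resFrame><id>234234</id></resFrame></doc>"

def list_events(xml_bytes):
    """Return (event, tag) pairs in the order iterparse reports them."""
    source = io.BytesIO(xml_bytes)
    return [(event, elem.tag)
            for event, elem in ET.iterparse(source, events=("start", "end"))]

for event, tag in list_events(SAMPLE):
    print(event, tag)
```

An element's text and children are only guaranteed to be complete once its "end" event has been reported, so checking the id child of a resFrame on "start" would be unreliable.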
I used the script below to create a smaller document that has the same structure, keeps all the bucket entries, and keeps only the resFrame entries with a specific id:

#!/usr/bin/env python2.6
import xml.etree.cElementTree as cElementTree

start = '''<?xml version="1.0" encoding="UTF-8"?>'''

def main():
    print start
    context = cElementTree.iterparse('snap.xml', events=("start", "end"))
    context = iter(context)
    event, root = next(context)  # the first event is the start of the root element
    print '<%s>' % root.tag  # reopen the root element in the output

    for event, elem in context:
        if event == "end":
            if elem.tag == 'bucket':  # write out all <bucket> entries
                elem.tail = None
                print cElementTree.tostring(elem)
            if elem.tag == 'resFrame':
                # write out only the resFrame entries with this id;
                # findtext returns None instead of raising if <id> is missing
                if elem.findtext("id") == ":4:39644:482:-1:1":
                    elem.tail = None
                    print cElementTree.tostring(elem)
            if elem.tag in ['bucket', 'frame', 'resFrame']:
                root.clear()  # once a section is parsed, clear the tree to save memory
    print '</%s>' % root.tag  # close the root element

main()
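
For readers on Python 3, where cElementTree is no longer available, the same idea can be sketched with xml.etree.ElementTree. This is only an illustration under assumptions, not the script I ran: the function name `filter_entries`, the root name `doc`, and the sample data are hypothetical, and the id to match is passed in as a parameter.

```python
import io
import xml.etree.ElementTree as ET

def filter_entries(source, wanted_id):
    """Yield every <bucket>, plus each <resFrame> whose <id> equals wanted_id."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end":
            continue  # element content is only complete at its "end" event
        is_bucket = elem.tag == "bucket"
        is_match = elem.tag == "resFrame" and elem.findtext("id") == wanted_id
        if is_bucket or is_match:
            elem.tail = None  # drop whitespace that followed the element
            yield ET.tostring(elem, encoding="unicode")
        if elem.tag in ("bucket", "frame", "resFrame"):
            root.clear()  # discard finished entries to keep memory use low

# Made-up sample data; the root name "doc" is hypothetical.
SAMPLE = (b"<doc>"
          b"<bucket>b1</bucket>"
          b"<resFrame><id>234234</id><v>x</v></resFrame>"
          b"<frame><id>344234</id></frame>"
          b"<resFrame><id>999</id></resFrame>"
          b"</doc>")

for chunk in filter_entries(io.BytesIO(SAMPLE), "234234"):
    print(chunk)
```

Writing it as a generator keeps the streaming behaviour: each matching entry is handed to the caller before the tree is cleared, so the caller can write it to a file or pipe without the whole document ever being held in memory.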