如果您有一个大的xml不能放入内存,那么可以尝试一次序列化一个元素。例如,假设...文档结构并忽略可能的命名空间问题:import xml.etree.cElementTree as etree
def getelements(filename_or_file, tag):
context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
_, root = next(context) # get root element
for event, elem in context:
if event == 'end' and elem.tag == tag:
yield elem
root.clear() # free memory
with open('output.xml', 'wb') as file:
# start root
file.write(b'')
for page in getelements('sample.xml', 'page'):
if keep(page):
file.write(etree.tostring(page, encoding='utf-8'))
# close root
file.write(b'')
其中keep(page)返回True如果page应该保留,例如:
^{pr2}$
为了进行比较,要修改一个小的xml文件,您可以:# parse small xml
tree = etree.parse('sample.xml')
# remove some root/page elements from xml
root = tree.getroot()
for page in root.findall('page'):
if not keep(page):
root.remove(page) # modify inplace
# write to a file modified xml tree
tree.write('output.xml', encoding='utf-8')