Checking whether an XML iter is empty in Python: using Python iterparse on large XML files

Latest recommended article published 2021-01-29 13:07:26

weixin_39811842


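Copyright notice: this is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_39811842/article/details/111448106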

Try Liza Daly's fast_iter. After processing an element, elem, it calls elem.clear() to remove its descendants, and it also removes the preceding siblings.

    from lxml import etree


    def fast_iter(context, func, *args, **kwargs):
        """
        http://lxml.de/parsing.html#modifying-the-tree
        Based on Liza Daly's fast_iter
        http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
        See also http://effbot.org/zone/element-iterparse.htm
        """
        for event, elem in context:
            func(elem, *args, **kwargs)
            # It's safe to call clear() here because no descendants will be
            # accessed
            elem.clear()
            # Also eliminate now-empty references from the root node to elem
            for ancestor in elem.xpath('ancestor-or-self::*'):
                while ancestor.getprevious() is not None:
                    del ancestor.getparent()[0]
        del context


    def process_element(elem):
        # Print the text of every <description> child of this element
        print(elem.xpath('description/text()'))


    # MYFILE is a placeholder: set it to the path of the XML file to parse
    context = etree.iterparse(MYFILE, tag='item')
    fast_iter(context, process_element)

Daly's article is an excellent read, especially if you are processing large XML files.
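For comparison, the standard library's xml.etree.ElementTree has no getprevious(), getparent(), or xpath('ancestor-or-self::*'), so the same idea is usually expressed there by keeping a reference to the root and clearing it after each record. The following is only a minimal sketch of that pattern, assuming the records are direct children of the root element; SAMPLE and iter_descriptions are hypothetical names made up for illustration, not part of the original answer.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: every <item> is a direct child of the root.
SAMPLE = b"""<items>
<item><description>first</description></item>
<item><description>second</description></item>
<item><description>third</description></item>
</items>"""


def iter_descriptions(source, tag="item"):
    """Yield the <description> text of each matching element, then free it."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.findtext("description")
            # Detach everything parsed so far from the root so that already
            # processed records can be garbage collected.
            root.clear()


descriptions = list(iter_descriptions(io.BytesIO(SAMPLE)))
print(descriptions)  # ['first', 'second', 'third']
```

If the records sit deeper in the tree, clearing the root is not enough, which is the case the ancestor loop in fast_iter is meant to handle.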

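Edit: The fast_iter posted above is a modified version of Daly's fast_iter. After processing an element, it is more aggressive about removing other elements that are no longer needed.

The script below shows the difference in behavior. Note in particular that orig_fast_iter does not delete the A1 element, while mod_fast_iter does delete it, thus saving more memory.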

    import io
    import textwrap

    import lxml.etree as ET


    def setup_ABC():
        # The XML literal was lost from this copy of the post. This minimal
        # document is a reconstruction that matches the logged output below.
        content = textwrap.dedent('''\
            <root>
              <A1>
                <B1></B1>
                <C>1</C>
              </A1>
              <A2>
                <B2></B2>
                <C>2</C>
              </A2>
            </root>
            ''')
        # iterparse reads bytes, so encode the text before wrapping it in BytesIO
        return content.encode('utf-8')


    def study_fast_iter():
        def show(elem):
            # Serialize as text, without the trailing whitespace after the element
            return ET.tostring(elem, encoding='unicode', with_tail=False)

        def orig_fast_iter(context, func, *args, **kwargs):
            for event, elem in context:
                print('Processing {e}'.format(e=show(elem)))
                func(elem, *args, **kwargs)
                print('Clearing {e}'.format(e=show(elem)))
                elem.clear()
                # Only the preceding siblings of elem itself are removed
                while elem.getprevious() is not None:
                    print('Deleting {p}'.format(
                        p=(elem.getparent()[0]).tag))
                    del elem.getparent()[0]
            del context

        def mod_fast_iter(context, func, *args, **kwargs):
            """
            http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
            Author: Liza Daly
            See also http://effbot.org/zone/element-iterparse.htm
            """
            for event, elem in context:
                print('Processing {e}'.format(e=show(elem)))
                func(elem, *args, **kwargs)
                # It's safe to call clear() here because no descendants will be
                # accessed
                print('Clearing {e}'.format(e=show(elem)))
                elem.clear()
                # Also eliminate now-empty references from the root node to elem
                for ancestor in elem.xpath('ancestor-or-self::*'):
                    print('Checking ancestor: {a}'.format(a=ancestor.tag))
                    while ancestor.getprevious() is not None:
                        print(
                            'Deleting {p}'.format(p=(ancestor.getparent()[0]).tag))
                        del ancestor.getparent()[0]
            del context

        content = setup_ABC()
        context = ET.iterparse(io.BytesIO(content), events=('end', ), tag='C')
        orig_fast_iter(context, lambda elem: None)
        # Processing <C>1</C>
        # Clearing <C>1</C>
        # Deleting B1
        # Processing <C>2</C>
        # Clearing <C>2</C>
        # Deleting B2

        print('-' * 80)
        # The improved fast_iter deletes A1. The original fast_iter does not.
        content = setup_ABC()
        context = ET.iterparse(io.BytesIO(content), events=('end', ), tag='C')
        mod_fast_iter(context, lambda elem: None)
        # Processing <C>1</C>
        # Clearing <C>1</C>
        # Checking ancestor: root
        # Checking ancestor: A1
        # Checking ancestor: C
        # Deleting B1
        # Processing <C>2</C>
        # Clearing <C>2</C>
        # Checking ancestor: root
        # Checking ancestor: A2
        # Deleting A1
        # Checking ancestor: C
        # Deleting B2


    study_fast_iter()
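The ancestor-pruning step can also be approximated without lxml. The sketch below is an assumption-laden port to the standard library, not the author's code: because xml.etree.ElementTree elements have no parent pointers, it tracks the currently open elements in a stack and, after each matching element, deletes the earlier siblings along that path. The names XML, prune_iter, and removed are hypothetical, and the sample document is a small A/B/C tree chosen to mirror the element names in the output above.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample with the same element names as the script above.
XML = b"<root><A1><B1/><C>1</C></A1><A2><B2/><C>2</C></A2></root>"


def prune_iter(source, tag, removed):
    """Yield each <tag> element, then drop it and all earlier siblings
    of it and of its ancestors. Deleted tags are appended to `removed`."""
    path = []  # stack of currently open elements, root first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem)
            continue
        if elem.tag == tag:
            yield elem
            # The caller is done with elem, so its content can go.
            elem.clear()
            # For every parent/child pair on the path, delete the children
            # that come before the one still being parsed.
            for parent, child in zip(path, path[1:]):
                while len(parent) and parent[0] is not child:
                    removed.append(parent[0].tag)
                    del parent[0]
        path.pop()


removed = []
texts = [elem.text for elem in prune_iter(io.BytesIO(XML), "C", removed)]
print(texts)    # ['1', '2']
print(removed)  # ['B1', 'A1', 'B2']
```

The deletion order matches the mod_fast_iter log: B1 goes while the first C is handled, and A1 is only released once parsing has moved on to A2.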
