Checking whether an XML iter is empty in Python: using Python iterparse on large XML files

Try Liza Daly's fast_iter. After processing an element, elem, it calls elem.clear() to remove its descendants, and it also removes the element's preceding siblings.

from lxml import etree


def fast_iter(context, func, *args, **kwargs):
    """
    http://lxml.de/parsing.html#modifying-the-tree
    Based on Liza Daly's fast_iter
    http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
    See also http://effbot.org/zone/element-iterparse.htm
    """
    for event, elem in context:
        func(elem, *args, **kwargs)
        # It's safe to call clear() here because no descendants will be
        # accessed
        elem.clear()
        # Also eliminate now-empty references from the root node to elem
        for ancestor in elem.xpath('ancestor-or-self::*'):
            while ancestor.getprevious() is not None:
                del ancestor.getparent()[0]
    del context


def process_element(elem):
    print(elem.xpath('description/text()'))


# MYFILE is the path to the XML file to parse
context = etree.iterparse(MYFILE, tag='item')
fast_iter(context, process_element)

Daly's article is well worth reading, especially if you are processing large XML files.
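To try fast_iter without a file on disk, the following is a minimal sketch that feeds it an in-memory document. The names SAMPLE and collect_description, and the sample markup itself, are assumptions made for illustration; only fast_iter comes from the text above. The callback receives an extra argument through *args, which is how results can be collected instead of printed.

```python
import io
from lxml import etree


def fast_iter(context, func, *args, **kwargs):
    # Same cleanup strategy as described above: clear the element,
    # then drop already-processed siblings of it and of its ancestors.
    for event, elem in context:
        func(elem, *args, **kwargs)
        elem.clear()
        for ancestor in elem.xpath('ancestor-or-self::*'):
            while ancestor.getprevious() is not None:
                del ancestor.getparent()[0]
    del context


def collect_description(elem, out):
    # Hypothetical callback: copy the text into plain str objects so the
    # results do not keep references back into the tree.
    out.extend(str(t) for t in elem.xpath('description/text()'))


# Hypothetical sample data standing in for a large file.
SAMPLE = b"""<rss><channel>
<item><description>first</description></item>
<item><description>second</description></item>
</channel></rss>"""

descriptions = []
context = etree.iterparse(io.BytesIO(SAMPLE), events=('end',), tag='item')
fast_iter(context, collect_description, descriptions)
print(descriptions)
```

Converting the XPath results to str before storing them is a deliberate choice in this sketch: the callback must finish with everything it needs from elem, because the element is cleared as soon as the callback returns.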

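Edit: The fast_iter posted above is a modified version of Daly's fast_iter. After processing an element, it is more aggressive about removing other elements that are no longer needed.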

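The script below shows the difference in behavior. Note in particular that orig_fast_iter does not delete the A1 element, while mod_fast_iter does delete it, thus saving more memory.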

import lxml.etree as ET
import textwrap
import io


def setup_ABC():
    # Two <A*> branches, each holding a <B*> sibling followed by a <C>
    content = textwrap.dedent('''\
        <root>
          <A1>
            <B1></B1>
            <C>1</C>
          </A1>
          <A2>
            <B2></B2>
            <C>2</C>
          </A2>
        </root>
        ''')
    return content


def study_fast_iter():
    def show(elem):
        # Serialize as text, without the whitespace that follows the element
        return ET.tostring(elem, encoding='unicode', with_tail=False)

    def orig_fast_iter(context, func, *args, **kwargs):
        for event, elem in context:
            print('Processing {e}'.format(e=show(elem)))
            func(elem, *args, **kwargs)
            print('Clearing {e}'.format(e=show(elem)))
            elem.clear()
            while elem.getprevious() is not None:
                print('Deleting {p}'.format(
                    p=(elem.getparent()[0]).tag))
                del elem.getparent()[0]
        del context

    def mod_fast_iter(context, func, *args, **kwargs):
        """
        http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
        Author: Liza Daly
        See also http://effbot.org/zone/element-iterparse.htm
        """
        for event, elem in context:
            print('Processing {e}'.format(e=show(elem)))
            func(elem, *args, **kwargs)
            # It's safe to call clear() here because no descendants will be
            # accessed
            print('Clearing {e}'.format(e=show(elem)))
            elem.clear()
            # Also eliminate now-empty references from the root node to elem
            for ancestor in elem.xpath('ancestor-or-self::*'):
                print('Checking ancestor: {a}'.format(a=ancestor.tag))
                while ancestor.getprevious() is not None:
                    print(
                        'Deleting {p}'.format(p=(ancestor.getparent()[0]).tag))
                    del ancestor.getparent()[0]
        del context

    # iterparse needs a bytes stream, so encode the text first
    content = setup_ABC().encode('utf-8')
    context = ET.iterparse(io.BytesIO(content), events=('end', ), tag='C')
    orig_fast_iter(context, lambda elem: None)
    # Processing <C>1</C>
    # Clearing <C>1</C>
    # Deleting B1
    # Processing <C>2</C>
    # Clearing <C>2</C>
    # Deleting B2

    print('-' * 80)
    # The improved fast_iter deletes A1. The original fast_iter does not.
    content = setup_ABC().encode('utf-8')
    context = ET.iterparse(io.BytesIO(content), events=('end', ), tag='C')
    mod_fast_iter(context, lambda elem: None)
    # Processing <C>1</C>
    # Clearing <C>1</C>
    # Checking ancestor: root
    # Checking ancestor: A1
    # Checking ancestor: C
    # Deleting B1
    # Processing <C>2</C>
    # Clearing <C>2</C>
    # Checking ancestor: root
    # Checking ancestor: A2
    # Deleting A1
    # Checking ancestor: C
    # Deleting B2


study_fast_iter()
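The same difference can be checked without reading a trace, by looking at what is still attached to the root once parsing has finished. This is only a sketch under stated assumptions: the compact XML document and the names orig_cleanup, mod_cleanup and remaining_top_level are hypothetical stand-ins, not part of the script above. It relies on the root attribute that lxml's iterparse exposes after the input is exhausted.

```python
import io
import lxml.etree as ET

# Hypothetical compact document with the same shape as the trace above.
XML = b"<root><A1><B1/><C>1</C></A1><A2><B2/><C>2</C></A2></root>"


def orig_cleanup(elem):
    # Original strategy: only siblings that precede elem itself are removed.
    elem.clear()
    while elem.getprevious() is not None:
        del elem.getparent()[0]


def mod_cleanup(elem):
    # Modified strategy: also remove siblings that precede each ancestor.
    elem.clear()
    for ancestor in elem.xpath('ancestor-or-self::*'):
        while ancestor.getprevious() is not None:
            del ancestor.getparent()[0]


def remaining_top_level(cleanup):
    # Parse the document, apply the cleanup to every <C>, and report
    # which children are still attached to the root afterwards.
    context = ET.iterparse(io.BytesIO(XML), events=('end',), tag='C')
    for _event, elem in context:
        cleanup(elem)
    return [child.tag for child in context.root]


orig_left = remaining_top_level(orig_cleanup)
mod_left = remaining_top_level(mod_cleanup)
print(orig_left)  # ['A1', 'A2']
print(mod_left)   # ['A2']
```

With the original strategy every finished branch stays attached to the root, so on a large file the tree keeps growing by one emptied branch per record. The modified strategy keeps only the branch currently being parsed.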
