python lxml etree,使用python lxml.etree处理庞大的XML文件

I would like to parse a huge xml (>200MB) using lxml.etree in Python. I tried to use etree.parse to load the XML file, but this does not work due to the filesize:

etree.parse('file.xml')Traceback (most recent call last):

File "", line 1, in

File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958)

File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71797)

File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:72080)

File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:71175)

File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:68173)

File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)

File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)

File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64521)

lxml.etree.XMLSyntaxError: Excessive depth in document: 256 use XML_PARSE_HUGE option, line 1276, column 7

As I want to use xpath expressions, I have to parse the file first. How can I therefore parse the XML file? How do I use XML_PARSE_HUGE in connection to lxml.etree?

Thanks!

解决方案

Try to create a custom XMLParser instance:

from lxml.etree import XMLParser, parse

p = XMLParser(huge_tree=True)

tree = parse('file.xml', parser=p)

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值