Preserving comments when writing XML with Python: how to keep comments when parsing XML with Python / ElementTree

I am currently using Python 2.4.3 and am not allowed to upgrade.

I want to change the value of a given attribute in one or more tags, while keeping the XML comments in the updated file.

I have managed to create a Python script that takes an XML file as an argument and, for each specified tag, changes an attribute, as shown below:

    def update(file, state):
        global Etree
        try:
            # Bind the module to the name Etree, which is used below
            from elementtree import ElementTree as Etree
            print '*** using ElementTree'
        except ImportError, e:
            print '***'
            print '*** Error: Must install either ElementTree or lxml.'
            print '***'
            raise ImportError, 'must install either ElementTree or lxml'
        #end try

        doc = Etree.parse(file)
        root = doc.getroot()

        # Update the attribute on every matching tag
        for element in root.findall('.//StateManageable'):
            element.attrib['initialState'] = state
        #end for

        doc.write(file)
    #end def

This all works: the "initialState" attributes are updated. However, my original XML also contains a lot of XML comments, and they are all missing from the output, which is bad.

I suspect that parse only retrieves the XML structure, but I thought XML comments were part of that structure. I also realize that the "human-readable" formatting of my original document is lost, but I understand that this is expected behavior and that I need to reformat afterwards using xmllint --format or XSL.
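The behavior described above is easy to reproduce. The following is a minimal sketch, assuming Python 3 and the standard-library xml.etree.ElementTree rather than the asker's Python 2.4 setup; the sample document is made up for illustration. It shows that the default parser builds a tree without any comment nodes, so they cannot appear in the output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with one comment and one target tag.
xml_text = '<config><!-- a note --><StateManageable initialState="STOP"/></config>'

# The default parser ignores comments while building the tree.
root = ET.fromstring(xml_text)
serialized = ET.tostring(root, encoding='unicode')

print(len(root))    # number of child nodes under <config>
print(serialized)   # the comment is absent from the serialized output
```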

Solution

I know this is old now, but I stumbled across the answer above about how to retain comments. Fredrik's published instructions on how to put comments into the tree still work with current versions of ElementTree, but they do more than I need, at least for my use. His version wraps the XML in an extra element, which is undesirable for me. I also don't need processing instructions preserved, only comments. So I trimmed down the class he provided on the site to this:

    import xml.etree.ElementTree as ET

    class PCParser(ET.XMLTreeBuilder):

        def __init__(self):
            ET.XMLTreeBuilder.__init__(self)
            # assumes ElementTree 1.2.X
            self._parser.CommentHandler = self.handle_comment

        def handle_comment(self, data):
            self._target.start(ET.Comment, {})
            self._target.data(data)
            self._target.end(ET.Comment)

To use this, create an instance of the class as a parser and pass it as the parser argument to ElementTree.parse(), like this:

    parser = PCParser()
    self.tree = ET.parse(self.templateOut, parser=parser)

I take no credit whatsoever for the code, or for the undocumented use of ElementTree internals, but it works for me: it preserves comments only, without affecting the original document structure. Note that because it relies on the private _parser and _target attributes, any future change to ElementTree (which seems unlikely after all these years) may break it.
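For readers who are not tied to an old interpreter, there is a route that avoids private attributes. The following is a minimal sketch, assuming Python 3.8 or newer, where TreeBuilder accepts an insert_comments option; it does not apply to the Python 2.4.3 setup in the question. The function name update_initial_state and the sample document are hypothetical, chosen to mirror the question's script.

```python
import io
import xml.etree.ElementTree as ET

def update_initial_state(source, state):
    """Parse source (a path or file object), set initialState, return the tree."""
    # insert_comments=True asks TreeBuilder to keep comments as tree nodes
    # (available in Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(source, parser=parser)
    for element in tree.getroot().findall('.//StateManageable'):
        element.set('initialState', state)
    return tree

# Hypothetical sample input; a real script would pass a file name instead.
sample = io.StringIO(
    '<config><!-- keep me --><StateManageable initialState="STOP"/></config>'
)
tree = update_initial_state(sample, 'START')
result = ET.tostring(tree.getroot(), encoding='unicode')
print(result)
```

In this sketch only comments inside the root element are exercised; how comments before or after the root are handled is left untested here. As in the question, the original whitespace formatting may still need to be restored afterwards with a tool such as xmllint --format.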
