Python XML to key-value pairs: converting XML into lists of tags and values with Python

I'm learning Python and I'm trying to extract lists of all tags and corresponding values from any XML file. This is my code so far.

from lxml import etree


def ParseXml(XmlFile):
    try:
        # lxml-specific parser options: drop whitespace-only text, use compact storage
        parser = etree.XMLParser(remove_blank_text=True, compact=True)
        tree = etree.parse(XmlFile, parser)
        root = tree.getroot()
        ListOfTags, ListOfValues, ListOfAttribs = [], [], []
        # Walk every element in document order
        for elem in root.iter('*'):
            Tag = elem.tag
            ListOfTags.append(Tag)
            value = elem.text
            if value is not None:
                ListOfValues.append(value)
            else:
                ListOfValues.append('')
            attrib = elem.attrib
            if attrib:
                ListOfAttribs.append([attrib])
            else:
                ListOfAttribs.append([])
        print('%s File parsed successfully' % XmlFile)
        return (ListOfTags, ListOfValues, ListOfAttribs)
    except Exception as e:
        print('Error while parsing XMLs : %s : %s' % (type(e), e))
        return ([], [], [])

For an XML input like this:

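<Application Version="2.01">
  <UserAuthRequest>
    <VendorApp>
      <AppName>SING</AppName>
    </VendorApp>
  </UserAuthRequest>
  <ApplicationRequest ID="12-123-AH">
    <GUID>ABD45129-PD1212-121DFL</GUID>
    <Type tc="200">Streaming</Type>
    <File/>
    <FileExtension VendorCode="200">
      <Result>
        <ResultCode tc="1">Success</ResultCode>
      </Result>
    </FileExtension>
  </ApplicationRequest>
</Application>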

the output is three lists (tags, values and attributes). This part works fine:

['Application', 'UserAuthRequest', 'VendorApp', 'AppName', 'ApplicationRequest', 'GUID', 'Type', 'File', 'FileExtension', 'Result', 'ResultCode']

['', '', '', 'SING', '', 'ABD45129-PD1212-121DFL', 'Streaming', '', '', '', 'Success']

[[{'Version': '2.01'}], [], [], [], [{'ID': '12-123-AH'}], [], [{'tc': '200'}], [], [{'VendorCode': '200'}], [], [{'tc': '1'}]]

My problem is that I need each tag to include its parent tags, i.e. the full path from the outer element down to the child. This is the output I'm actually targeting:

['Application', 'UserAuthRequest', 'UserAuthRequest.VendorApp', 'UserAuthRequest.VendorApp.AppName', 'ApplicationRequest', 'ApplicationRequest.GUID', 'ApplicationRequest.Type', 'ApplicationRequest.File', 'ApplicationRequest.File.FileExtension', 'ApplicationRequest.File.FileExtension.Result', 'ApplicationRequest.File.FileExtension.Result.ResultCode']

How can I do this in Python? Or is there an alternative way to do it?

Solution

import xml.etree.ElementTree as ET


def parse_node(node, ancestor_string=""):
    # Build this node's dotted path from its ancestors' path
    if ancestor_string:
        node_string = ".".join([ancestor_string, node.tag])
    else:
        node_string = node.tag
    tag_list = [node_string]
    text = node.text
    if text:
        text_list = [text.strip()]
    else:
        text_list = [""]
    attr_list = [node.attrib]
    # Recurse into the direct children, passing this node's path down
    for child_node in list(node):
        child_tag_list, child_text_list, child_attr_list = parse_node(child_node, ancestor_string=node_string)
        tag_list.extend(child_tag_list)
        text_list.extend(child_text_list)
        attr_list.extend(child_attr_list)
    return tag_list, text_list, attr_list


def parse_xml(file_name):
    # Parse the file that was passed in (not a hard-coded name)
    tree = ET.parse(file_name)
    root_node = tree.getroot()
    tags, texts, attrs = parse_node(root_node)
    print(tags)
    print(texts)
    print(attrs)


def main():
    parse_xml("a.xml")


if __name__ == "__main__":
    main()

Notes:

- The idea is to "remember the path" in the XML tree. This is done through parse_node's ancestor_string argument, which is computed for each node and passed to its direct children.

- The naming differs from the question's because it follows PEP 8 (Style Guide for Python Code).

- At first glance, having two functions (main and parse_xml) where one just calls the other might seem to add a useless level of nesting. I still consider it good practice and have gotten used to it.

- I corrected the attributes list. Instead of a list of lists, with each inner list holding a single dictionary, the function returns a flat list of dictionaries.
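The same path-tracking idea can also be written without recursion. This avoids Python's default recursion limit on very deeply nested documents. The sketch below is not part of the original answer, and the function name `parse_tree_iterative` is hypothetical. It keeps the answer's conventions: stripped text and one attribute dict per element. An explicit stack stores (element, path) pairs, and children are pushed in reverse so that elements come out in the same pre-order as the recursive version. The sample XML is an abbreviated piece of the question's input.

```python
import xml.etree.ElementTree as ET


def parse_tree_iterative(root):
    """Hypothetical iterative variant of parse_node: same three lists, same order."""
    tags, texts, attrs = [], [], []
    # Each stack entry pairs an element with its dotted path
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        tags.append(path)
        texts.append(node.text.strip() if node.text else "")
        attrs.append(dict(node.attrib))
        # Push children in reverse so they are popped in document order (pre-order)
        for child in reversed(list(node)):
            stack.append((child, path + "." + child.tag))
    return tags, texts, attrs


sample = """<Application Version="2.01">
  <UserAuthRequest>
    <VendorApp>
      <AppName>SING</AppName>
    </VendorApp>
  </UserAuthRequest>
</Application>"""

tags, texts, attrs = parse_tree_iterative(ET.fromstring(sample))
print(tags)
# ['Application', 'Application.UserAuthRequest', 'Application.UserAuthRequest.VendorApp', 'Application.UserAuthRequest.VendorApp.AppName']
```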

Output (I've run the script with Python 2.7 and Python 3.5):

['Application', 'Application.UserAuthRequest', 'Application.UserAuthRequest.VendorApp', 'Application.UserAuthRequest.VendorApp.AppName', 'Application.ApplicationRequest', 'Application.ApplicationRequest.GUID', 'Application.ApplicationRequest.Type', 'Application.ApplicationRequest.File', 'Application.ApplicationRequest.FileExtension', 'Application.ApplicationRequest.FileExtension.Result', 'Application.ApplicationRequest.FileExtension.Result.ResultCode']

['', '', '', 'SING', '', 'ABD45129-PD1212-121DFL', 'Streaming', '', '', '', 'Success']

[{'Version': '2.01'}, {}, {}, {}, {'ID': '12-123-AH'}, {}, {'tc': '200'}, {}, {'VendorCode': '200'}, {}, {'tc': '1'}]
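Since the goal in the title is key-value pairs, the path and text lists above can be zipped into a dict keyed by path. This is a minimal sketch, and `to_key_values` with its `relative` and `skip_empty` options is a hypothetical helper, not part of the answer. It assumes every path is unique. If sibling elements share a tag, their paths collide and a plain dict keeps only the last value. The `relative` flag drops the root tag from descendant paths, which is closer to the target list in the question.

```python
def to_key_values(tags, texts, relative=False, skip_empty=False):
    """Pair each dotted path with its text value (hypothetical helper).

    Assumes every path is unique: if sibling elements share a tag, their
    paths collide and later values overwrite earlier ones in the dict.
    """
    result = {}
    prefix = tags[0] + "." if tags else ""
    for path, text in zip(tags, texts):
        # Optionally drop the root tag, e.g. 'Application.ApplicationRequest.GUID' -> 'ApplicationRequest.GUID'
        if relative and path.startswith(prefix):
            path = path[len(prefix):]
        # Optionally skip container elements that carry no text of their own
        if skip_empty and not text:
            continue
        result[path] = text
    return result


tags = ['Application', 'Application.UserAuthRequest',
        'Application.UserAuthRequest.VendorApp',
        'Application.UserAuthRequest.VendorApp.AppName',
        'Application.ApplicationRequest', 'Application.ApplicationRequest.GUID']
texts = ['', '', '', 'SING', '', 'ABD45129-PD1212-121DFL']
print(to_key_values(tags, texts, relative=True, skip_empty=True))
# {'UserAuthRequest.VendorApp.AppName': 'SING', 'ApplicationRequest.GUID': 'ABD45129-PD1212-121DFL'}
```

If repeated tags are possible, the dict could instead map each path to a list of values. That choice depends on the data and is not covered by the original answer.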
