Python XML processing with pandas: parsing XML into a pandas DataFrame in Python


EYES 乱


Tags: python, XML processing, pandas

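Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_32495691/article/details/113641334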


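The problem with your solution is that the element data extraction is not done correctly. The XML in your question is nested several levels deep, so the data has to be read and extracted recursively. The solution below should meet your needs, although I encourage you to read this article and {a2} for a clearer understanding.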
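To see why a flat loop over the root's direct children is not enough, here is a minimal sketch (not the answer's own code) that walks a nested element recursively with the standard-library `xml.etree.ElementTree` and gathers every attribute and every leaf's text into one flat dict. The XML snippet and the helper name `flatten_element` are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested XML: the useful values sit two or three levels deep,
# so looping over the root's direct children would only ever see <Task>.
XML = """
<Instance ID="1">
  <Task Name="load">
    <Param Unit="s">
      <Value>30</Value>
    </Param>
  </Task>
</Instance>
"""


def flatten_element(elem, out=None):
    """Recursively copy attributes and leaf text of elem's subtree into a flat dict."""
    if out is None:
        out = {}
    out.update(elem.attrib)  # attributes become columns
    children = list(elem)
    if children:
        for child in children:  # descend one level per recursive call
            flatten_element(child, out)
    else:
        out[elem.tag] = (elem.text or "").strip()  # leaf: keep its text under its tag
    return out


row = flatten_element(ET.fromstring(XML))
print(row)  # {'ID': '1', 'Name': 'load', 'Unit': 's', 'Value': '30'}
```

Using the tag name as the column name is a simplifying assumption: if the same tag occurs twice in one subtree, the later value overwrites the earlier one.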

Method 1:

```python
import pandas as pd
import xml.etree.ElementTree as ET


def xml2df(xml_source, df_cols, source_is_file=False, show_progress=True):
    """Parse the input XML source and store the result in a pandas
    DataFrame with the given columns.

    For xml_source = xml_file,   set source_is_file = True
    For xml_source = xml_string, set source_is_file = False

        Child 1 Text
        Child 2 Text
        Child 3 Text

    Note that for an XML structure as shown above, the child elements of an
    element can be accessed with list(element) and its attributes with
    element.items() (or the dict element.attrib). Any text associated with
    the tag is available as element.text, and the name of the tag itself as
    element.tag.
    """
    if source_is_file:
        xtree = ET.parse(xml_source)       # xml_source = xml_file
        xroot = xtree.getroot()
    else:
        xroot = ET.fromstring(xml_source)  # xml_source = xml_string

    consolidator_dict = dict()
    default_instance_dict = {label: None for label in df_cols}

    def get_children_info(children, instance_dict):
        # Avoid element.getchildren(): it was deprecated and removed in Python 3.9.
        # Use list(element) instead to get the list of child elements.
        for child in children:
            # Recurse into nested elements first
            if len(child) > 0:
                instance_dict = get_children_info(list(child), instance_dict)
            # Store the element's attributes as columns
            if child.attrib:
                instance_dict.update(child.attrib)
            # Store the element's text under its tag name
            instance_dict.update({child.tag: child.text})
        return instance_dict

    # Loop over all instances (the direct children of the root)
    for instance in xroot:
        instance_dict = default_instance_dict.copy()
        # The first attribute is "ID"; list() also works where items() is not a list
        ikey, ivalue = list(instance.items())[0]
        instance_dict.update({ikey: ivalue})
        if show_progress:
            print('{}: {}={}'.format(instance.tag, ikey, ivalue))

        # Loop inside every instance
        instance_dict = get_children_info(list(instance), instance_dict)
        consolidator_dict[ivalue] = instance_dict.copy()

    # One column per instance ID, transposed so that each instance becomes a row
    df = pd.DataFrame(consolidator_dict).T
    df = df[df_cols]
    return df
```
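The last step of `xml2df` turns a dict keyed by instance ID into rows by building a DataFrame (outer keys become columns) and transposing it. pandas also provides `DataFrame.from_dict(..., orient="index")`, which reads the outer keys as row labels directly. The sketch below uses made-up records shaped like `consolidator_dict` to check that both routes give the same table for string values such as those parsed from XML.

```python
import pandas as pd

# Hypothetical records keyed by instance ID, shaped like consolidator_dict in xml2df
records = {
    "1": {"ID": "1", "Name": "load", "Value": "30"},
    "2": {"ID": "2", "Name": "save", "Value": "45"},
}
cols = ["ID", "Name", "Value"]

# Route used in xml2df: outer keys become columns, then transpose
via_transpose = pd.DataFrame(records).T[cols]

# Alternative: outer keys become the row index directly
via_from_dict = pd.DataFrame.from_dict(records, orient="index")[cols]

print(via_from_dict)
```

`from_dict` with `orient="index"` states the intent (one record per row) without the extra transpose. Either way the final column selection keeps the `df_cols` order.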

Run the following to generate the desired output:

^{pr2}$
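As a hedged end-to-end illustration of the same idea, the sketch below flattens a small hypothetical `<Instances>` document into a DataFrame with one row per `<Instance>`. Only the `<Instances>`/`<Instance ID="...">` shape is taken from this answer; the inner tags are invented. It swaps the explicit recursion for `Element.iter()`, which already walks a subtree depth-first, and the helper name `instance_to_row` is hypothetical.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical input: only the <Instances>/<Instance ID="..."> shape comes from the answer
XML = """
<Instances>
  <Instance ID="1">
    <Task Name="load"><Value>30</Value></Task>
  </Instance>
  <Instance ID="2">
    <Task Name="save"><Value>45</Value></Task>
  </Instance>
</Instances>
"""


def instance_to_row(instance):
    """Flatten one <Instance> subtree; iter() does the depth-first walk for us."""
    row = {}
    for elem in instance.iter():  # includes the instance element itself
        row.update(elem.attrib)   # attributes -> columns
        if len(elem) == 0 and elem.text and elem.text.strip():
            row[elem.tag] = elem.text.strip()  # leaf text -> column named after the tag
    return row


root = ET.fromstring(XML)
# A list of dicts becomes one row per dict; the ID attribute becomes the row index
df = pd.DataFrame([instance_to_row(inst) for inst in root]).set_index("ID")
print(df)
```

Building a list of row dicts and calling `set_index` avoids the dict-of-dicts transpose; which form to use is mostly a matter of taste.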

Method 2:

With xmltodict you can convert the XML into a JSON-like nested dictionary. Install it first, then run the code below to get the desired output.

```shell
pip install -U xmltodict
```

Solution:

```python
def read_recursively(x, instance_dict):
    # Note: df_cols (the list of wanted column names) must already be defined
    # in the enclosing (global) scope.
    txt = ''
    for key in x.keys():
        # xmltodict prefixes attribute names with "@"
        k = key.replace("@", "")
        if k in df_cols:
            if isinstance(x.get(key), dict):
                instance_dict, txt = read_recursively(x.get(key), instance_dict)
            instance_dict.update({k: x.get(key)})
        else:
            # dig deeper if the value is another dict
            if isinstance(x.get(key), dict):
                instance_dict, txt = read_recursively(x.get(key), instance_dict)
            # add the simple text associated with the element
            if k == '#text':
                txt = x.get(key)
            # assign that text to the corresponding parent element
            if (k != '#text') and (txt != ''):
                instance_dict.update({k: txt})
    return (instance_dict, txt)
```
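`read_recursively()` relies on two xmltodict conventions: attribute names get an `@` prefix, and an element's text is stored under `#text` when the element also has attributes or children. The standard-library sketch below imitates those defaults so you can see the shape of the dict that `read_recursively()` walks. It is a simplified approximation under stated assumptions (no namespaces, no mixed content), not xmltodict itself, and `to_xmltodict_like` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def to_xmltodict_like(elem):
    """Simplified imitation of xmltodict's default output for one element.

    Assumptions: attributes -> '@name' keys, text -> '#text' when the element
    also has attributes or children, repeated child tags -> lists.
    """
    node = {"@" + k: v for k, v in elem.attrib.items()}
    for child in elem:
        value = to_xmltodict_like(child)
        if child.tag in node:  # repeated tag -> collect the siblings in a list
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:  # plain leaf: just its text (None if empty)
        return text or None
    if text:
        node["#text"] = text
    return node


xml = '<Instances><Instance ID="1"><Param Unit="s">30</Param></Instance></Instances>'
root = ET.fromstring(xml)
print({root.tag: to_xmltodict_like(root)})
# {'Instances': {'Instance': {'@ID': '1', 'Param': {'@Unit': 's', '#text': '30'}}}}
```

Note that a single `<Instance>` yields a dict while two or more yield a list, which matters when looping over the parsed result.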

You need the read_recursively() function given above. Now run the following:

```python
import json

import pandas as pd
import xmltodict

o = xmltodict.parse(xml_string)  # INPUT: xml_string
# print(json.dumps(o))  # uncomment to see the XML converted to a JSON string

consolidated_dict = dict()
oi = o['Instances']['Instance']
for x in oi:
    instance_dict = dict()
    instance_dict, _ = read_recursively(x, instance_dict)
    # key each row by the instance's ID attribute
    consolidated_dict.update({x.get("@ID"): instance_dict.copy()})

df = pd.DataFrame(consolidated_dict).T
df = df[df_cols]
df
```
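One caveat with the loop over `o['Instances']['Instance']`: xmltodict returns a list there only when the document contains at least two `<Instance>` elements. With a single one it returns that element's dict, so `for x in oi` iterates over its keys (strings) and the code breaks. `xmltodict.parse` accepts a `force_list` argument (for example `force_list=('Instance',)`) that always produces a list; alternatively, a small normalizing helper like the hypothetical `as_list` below does the same after parsing.

```python
def as_list(value):
    """Wrap a single parsed node in a list; pass lists through unchanged."""
    return value if isinstance(value, list) else [value]


# Shapes xmltodict can return for o['Instances']['Instance'] (hypothetical data)
one = {"@ID": "1", "Value": "30"}
many = [{"@ID": "1", "Value": "30"}, {"@ID": "2", "Value": "45"}]

for oi in (one, many):
    ids = [x.get("@ID") for x in as_list(oi)]
    print(ids)
```

With this helper the loop becomes `for x in as_list(o['Instances']['Instance']):` and behaves the same for one or many instances.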
