pythonhadoopxml文件_利用python利用hadoop处理xml文件

最新推荐文章于 2021-12-16 16:05:26 发布

cstghitpku

最新推荐文章于 2021-12-16 16:05:26 发布

阅读量114

点赞数

文章标签： pythonhadoopxml文件

版权声明：本文为博主原创文章，遵循 CC 4.0 BY-SA 版权协议，转载请附上原文出处链接和本声明。

本文链接：https://blog.csdn.net/weixin_33198642/article/details/113647170

版权

你考虑过使用xpath吗？它是一种迷你语言，可以用来绕过xml树。在python中可以很容易地使用它。在

下面是我应该怎么做(这是有效的Python代码。我用Python3.2测试过。可以很好地处理示例xml)：import xml.etree.ElementTree as xml #you had this line in your code. I am not using any tool you do not have access to in your script

def get_row_attributes(the_xml_as_a_string):

"""

this function takes xml as a string.

It can work with xml that looks like your included example xml.

This function returns a list of dictionaries. Each dictionary is made up of the attributes of each row. So the result looks like:

[

{attribute_name:value_for_first_row,attribute_name:value_for_first_row...},

{attribute_name:value_for_second_row,attribute_name:value_for_second_row...},

etc

]

"""

tree = xml.fromstring(the_xml_as_a_string)

rows = tree.findall('table/row') # 'table/row' is xpath. it means get all the rows in all the tables

return [row.attrib for row in rows]

要使用此函数，请读入std并建立一个字符串。呼叫get_row_attributes(the_xml_as_a_string)

生成的字典包含您请求的信息(行的属性)。在

所以现在我们有了从std读入

得到了所有行的所有信息

全部使用完全正常的python

最后要做的是将它写入另一个进程。如果您需要有关此部分的帮助，请提供有关数据应采用何种格式以及数据应流向何处的信息

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。