python getchildren xml: How to correctly parse parent/child XML with Python

I have an XML parsing issue that I have been working on for the last few days, and I just can't figure it out. I've used both the ElementTree module built into Python and the lxml library, but I get the same results with each. I would like to keep using ElementTree if I can, but if that library has limitations, lxml would do. Please see the following XML example. What I am trying to do is find each connection element and see which classes it contains. I expect each connection to contain at least one class; if it doesn't, I want to know. The problem is that my code returns ALL THE CLASSES in the document for each connection, instead of only the classes for that specific connection.

10

DVD

DVD_TEST

20

TV

For example, here is my Python code and the output that it returns:

for parentConnection in elemetTree.getiterator('connection'):
    # print parentConnection.tag
    for childConnection in parentConnection:
        # print childConnection.text
        if childConnection.tag == 'id':
            connID = childConnection.text
            print connID
            for p in tree.xpath('./connections/connection/classes/class'):
                for attrib in p.attrib:
                    print '@' + attrib + '=' + p.attrib[attrib]
                children = p.getchildren()
                for child in children:
                    print child.text

Here is the output:

10

DVD

DVD_TEST

TV

20

DVD

DVD_TEST

TV

As you can see, I am printing the text of the CONNECTION ID and then the text of each CLASSNAME. However, both connections show the same CLASSNAME text. The output should really look like this:

10

DVD

DVD_TEST

20

TV

As the hand-modified example above shows, each connection ID (parent) should list only its own classes/classnames (children). I just can't figure out how to make this work. If any of you know how to do this, I would love to hear it.

I've tried building a data structure and other examples on this forum but just can't get it to work right.

Solution

My solution does not use XPath. I recommend digging a little further into the lxml documentation; there may be more elegant and direct ways to achieve this. There's a lot to explore!
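As background for why the loop in the question prints every class for every connection: a path searched from the tree or root element matches across the whole document, while the same search started from a single connection element only sees that element's descendants. Below is a minimal sketch of the difference using the standard-library ElementTree. The sample document is hypothetical: the `root` wrapper and the `classname` tag are assumptions, and the remaining element names follow the path and the `id` check used in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. Element names follow the path and the 'id' check
# from the question; <root> and <classname> are assumptions.
SAMPLE = """<root><connections>
  <connection><id>10</id><classes>
    <class><classname>DVD</classname></class>
    <class><classname>DVD_TEST</classname></class>
  </classes></connection>
  <connection><id>20</id><classes>
    <class><classname>TV</classname></class>
  </classes></connection>
</connections></root>"""

root = ET.fromstring(SAMPLE)

# Searching from the root matches every <class> in the document.
all_classes = root.findall('./connections/connection/classes/class')

# Searching from one <connection> matches only its own <class> elements.
first = root.find('./connections/connection')
own_classes = first.findall('./classes/class')

print(len(all_classes), len(own_classes))  # prints: 3 2
```

The only change between the two searches is the element the path starts from, which is why moving the lookup onto the current connection is enough to scope the results.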

Solution:

from lxml import etree


class FindClasses(object):

    @staticmethod
    def parse_xml(xml_string):
        # Parse the raw bytes and return the root element.
        return etree.fromstring(xml_string)

    def find(self, xml_string):
        # Visit every connection element in document order.
        for parent in self.parse_xml(xml_string).iter('connection'):
            for child in parent:
                if child.tag == 'id':
                    print(child.text)
                    self.find_classes(child)

    @staticmethod
    def find_classes(id_element):
        connection = id_element.getparent()  # step up: id -> connection
        for section in connection:           # children of connection: id, classes
            for class_element in section:    # children of classes: class (id has none)
                for name in class_element:   # children of class
                    print(name.text)
        print()


if __name__ == '__main__':
    # foo.xml, or the path to your XML file
    with open('foo.xml', 'rb') as xml_file:
        xml = xml_file.read()
    f = FindClasses()
    f.find(xml)

Output:

10

DVD

DVD_TEST

20

TV
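
Since the question asks to stay with ElementTree if possible and to be told when a connection has no class, here is a rough sketch of that variant using only the standard library. It is an illustration rather than the original answer's method: the function name `classes_per_connection`, the `root` wrapper, the `classname` tag, and the extra connection `30` with no classes are all assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; <root>, <classname>, and connection 30 are assumptions.
SAMPLE = """<root><connections>
  <connection><id>10</id><classes>
    <class><classname>DVD</classname></class>
    <class><classname>DVD_TEST</classname></class>
  </classes></connection>
  <connection><id>20</id><classes>
    <class><classname>TV</classname></class>
  </classes></connection>
  <connection><id>30</id><classes/></connection>
</connections></root>"""


def classes_per_connection(root):
    """Map each connection id to the text of its own class children."""
    result = {}
    for connection in root.iter('connection'):
        conn_id = connection.findtext('id')
        # The path is relative to this connection, so other connections'
        # classes are never matched.
        result[conn_id] = [
            child.text
            for cls in connection.findall('./classes/class')
            for child in cls
        ]
    return result


if __name__ == '__main__':
    for conn_id, names in classes_per_connection(ET.fromstring(SAMPLE)).items():
        print(conn_id)
        if not names:
            print('(no classes)')
        for name in names:
            print(name)
```

An empty list marks a connection without classes, so the caller can decide whether to report it, skip it, or raise an error.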
