python etree xpath,带xpath的python etree和带前缀的名称空间

I can't find info, how to parse my XML with namespace:

I have this xml:

blabla

string

And tried to parse it:

dom = ET.parse(u'C:\\filepath\\1.xml')

rootxml = dom.getroot()

for subtag in rootxml.xpath(u'//par:actual'):

#do something

print(subtag)

And got exception, because it doesn't know about namespace prefix.

Is there best way to solve that problem, counting that script will not know about file it going to parse and tag is going to search for?

Searching web and stackoverflow I found, that if I will add there:

namespace = {u'par': u"http://somewhere.net/actual"}

for subtag in rootxml.xpath(u'//par:actual', namespaces=namespace):

#do something

print(subtag)

That works. Perfect. But I don't know which XML I will parse, and searching tag (such as //par:actual) is also unknown to my script. So, I need to find way to extract namespace from XML somehow.

I found a lot of ways, how to extract namespace URI, such as:

print(rootxml.tag)

print(rootxml.xpath('namespace-uri(.)'))

print(rootxml.xpath('namespace-uri(/*)'))

But how should I extract prefix to create dictionary which ElementTree wants from me? I don't want to use regular expression monster over xml body to extract prefix, I believe there have to exist supported way for that, isn't it?

And maybe there have to exist some methods for me to extract by ETree namespace from XML as dictionary (as ETree wants!) without hands manipulation?

解决方案

You cannot rely on the namespace declarations on the root element: there is no guarantee that the declarations will even be there, or that the document will have the same prefix for the same namespace throughout.

Assuming you are going to have some way of passing the tag you want to search (because you say it is not known by your script), you should also provide a way to pass a namespace mapping as well. Or use the James Clark notation, like {http://somewhere.net/actual}actual (the ETXPath has support for this syntax, whereas "normal" xpath does not, but you can also use other methods like .findall() if you don't need full xpath)

If you don't care for the prefix at all, you could also use the local-name() function in xpath, eg. //*[local-name()="actual"] (but you won't be "really" sure it's the right "actual")

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值