I want to use the "findall" method of the ElementTree module to locate some elements in a source XML file.
However, the source XML file (test.xml) uses a namespace. Here is a truncated part of the file as a sample:
Updates
9/26/2012 10:30:34 AM
All Rights Reserved.
newlicense.htm
N
The sample Python code is below:
from xml.etree import ElementTree as ET
tree = ET.parse(r"test.xml")
el1 = tree.findall("DEAL_LEVEL/PAID_OFF") # Returns an empty list
el2 = tree.findall("{http://www.test.com}DEAL_LEVEL/{http://www.test.com}PAID_OFF") # Return
The second call works, but because of the namespace "{http://www.test.com}" it is very inconvenient to prepend the namespace to every tag.
How can I ignore the namespace when using methods such as "find" and "findall"?
Solution
Instead of modifying the XML document itself, it's best to parse it and then modify the tags in the result. This way you can handle multiple namespaces and namespace aliases:
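from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
import xml.etree.ElementTree as ET
# instead of ET.fromstring(xml), where xml is the document as a string
it = ET.iterparse(StringIO(xml))
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip the '{namespace}' prefix from the tag
# the root attribute is available once the iterator has been fully consumed
root = it.root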
This is based on the discussion here:
http://bugs.python.org/issue18304
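For reference, here is a minimal, self-contained sketch of the approach above on Python 3. The sample document and the helper name parse_without_namespaces are made up for illustration; only the DEAL_LEVEL and PAID_OFF tags and the namespace URI come from the question.

```python
import xml.etree.ElementTree as ET
from io import StringIO

# Hypothetical sample document: only DEAL_LEVEL, PAID_OFF and the
# namespace URI are taken from the question; the rest is assumed.
SAMPLE_XML = """<root xmlns="http://www.test.com">
  <DEAL_LEVEL>
    <PAID_OFF>N</PAID_OFF>
  </DEAL_LEVEL>
</root>"""


def parse_without_namespaces(xml_text):
    """Parse xml_text and strip the '{uri}' prefix from every element tag."""
    it = ET.iterparse(StringIO(xml_text))
    for _, el in it:
        if '}' in el.tag:
            el.tag = el.tag.split('}', 1)[1]
    # root is set on the iterator once the whole source has been read
    return it.root


root = parse_without_namespaces(SAMPLE_XML)
paid_off = root.findall("DEAL_LEVEL/PAID_OFF")
print([el.text for el in paid_off])
```

Note that this sketch only rewrites element tags. Namespace-qualified attribute names in el.attrib keep their '{uri}' prefix, and stripping namespaces can make two elements from different namespaces indistinguishable, so it is best suited to documents where the local tag names are unambiguous.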