Note: this answer is useful for Python's ElementTree standard library and does not rely on hardcoded namespaces.
To extract the namespace prefixes and URIs from XML data, you can use the ElementTree.iterparse function and parse only the namespace start events (start-ns):
>>> from io import StringIO
>>> from xml.etree import ElementTree
>>> my_schema = u'''<rdf:RDF
...     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
...     xmlns:owl="http://www.w3.org/2002/07/owl#"
...     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
...     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
...     xmlns="http://dbpedia.org/ontology/">
...
...     <owl:Class>
...         <rdfs:label>basketball league</rdfs:label>
...         <rdfs:comment>
...           a group of sports teams that compete against each other
...           in Basketball
...         </rdfs:comment>
...     </owl:Class>
...
... </rdf:RDF>
... '''
>>> my_namespaces = dict([
...     node for _, node in ElementTree.iterparse(
...         StringIO(my_schema), events=['start-ns']
...     )
... ])
>>> from pprint import pprint
>>> pprint(my_namespaces)
{'': 'http://dbpedia.org/ontology/',
 'owl': 'http://www.w3.org/2002/07/owl#',
 'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
 'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
 'xsd': 'http://www.w3.org/2001/XMLSchema#'}
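The dictionary can then be passed as an argument to the search functions, where root is the parsed root element of the document (for example, the result of ElementTree.fromstring(my_schema)):
root.findall('owl:Class', my_namespaces)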
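
For reference, the whole approach can be put together as a small standalone script. This is only a minimal sketch: the SAMPLE_XML document and the collect_namespaces helper are hypothetical names made up for illustration, not part of ElementTree. Note that if a document redeclares the same prefix on a nested element, the dictionary keeps only the last URI seen for that prefix.

```python
from io import StringIO
from xml.etree import ElementTree

# Hypothetical sample document, made up for this sketch.
SAMPLE_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class>
    <rdfs:label>basketball league</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def collect_namespaces(xml_text):
    """Return a {prefix: uri} dict built from the document's start-ns events.

    Hypothetical helper: it simply wraps the iterparse call from the answer.
    """
    events = ElementTree.iterparse(StringIO(xml_text), events=["start-ns"])
    # Each start-ns event carries a (prefix, uri) tuple as its payload.
    return dict(ns for _event, ns in events)


namespaces = collect_namespaces(SAMPLE_XML)
root = ElementTree.fromstring(SAMPLE_XML)

# Prefixed names in the path are resolved through the collected mapping.
classes = root.findall("owl:Class", namespaces)
labels = [c.findtext("rdfs:label", namespaces=namespaces) for c in classes]
print(labels)  # prints ['basketball league']
```

The document is parsed twice here (once for the namespace events, once for the tree), which keeps the sketch simple; for large inputs it may be worth collecting both in a single iterparse pass.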