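I have recently been looking into parsing XML with Python. Python has never been short of XML libraries; after comparing them, two stand out as good fits: xml.dom, which is well known yet low-key, and lxml, which is powerful and efficient. Let's start with minidom.
The readNodes method of the class below reads each node's value and its attributes.
readElementByName takes an element name and reads the child elements and attributes of the elements with that name.
Neither is hard to follow.
Here is the XML first: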
<?xml version="1.0" encoding="UTF-8"?>
<waf>
    <policy>acl</policy>
    <prot>
        <dstip>2.2.2.2</dstip>
        <dstip>3.3.3.3</dstip>
        <dstport>80</dstport>
        <srcip>3.3.3.3</srcip>
        <srcport>8888</srcport>
        <protocol>17</protocol>
    </prot>

    <other test_case_id="1">
        <action>
            0
        </action>
        <res>
            0
        </res>
    </other>
    <rule ID="18612269" value="/x22"/>
</waf>
#!/usr/bin/env python
# coding=utf-8
from xml.dom import minidom


class Xml_dom():
    def readNodes(self, domElement):
        # Recursively print each child element's name, attributes and text value.
        for nodes in domElement.childNodes:
            if nodes.nodeType == nodes.ELEMENT_NODE:
                print(nodes.nodeName + '=====================')
                for keys in nodes.attributes.keys():
                    print(nodes.attributes[keys].name + '=' + nodes.attributes[keys].value)
                if len(nodes.childNodes) == 1:
                    # A single child is taken to be the element's text node.
                    print(nodes.nodeName + ':' + nodes.childNodes[0].nodeValue)
                else:
                    self.readNodes(nodes)

    def readElementByName(self, elementList):
        # Same traversal, applied to a node list such as the result of getElementsByTagName.
        for elements in elementList:
            if elements.nodeType == elements.ELEMENT_NODE:
                print(elements.nodeName + '>>>>>>>>>>>>>>>>>>>>>>>')
                for keys in elements.attributes.keys():
                    print(elements.attributes[keys].name + '=' + elements.attributes[keys].value)
                if len(elements.childNodes) == 1:
                    print(elements.nodeName + ':' + elements.childNodes[0].nodeValue)
                else:
                    self.readElementByName(elements.childNodes)

    def __init__(self, filename, elename):
        self.dom = minidom.parse(filename)
        self.root = self.dom.documentElement
        print('=========xml_dom==============\n')
        self.readNodes(self.root)
        print('=========end===============\n')
        print('>>>>>>>>>xml_dom>>>>>>>>>>\n')
        el = self.dom.getElementsByTagName(elename)
        self.readElementByName(el)
        print('>>>>>>>>>end>>>>>>>>>>>>')


if __name__ == '__main__':
    # a = Xml_dom('rule_sqlInj.xml', 'configs')
    a = Xml_dom('waf_sqlrule.xml', 'prot')
The output:
> "D:/Python25/pythonw.exe" -u "D:/学习/python/xml/xml_dom/xml_dom.py"
=========xml_dom==============
policy=====================
prot=====================
other=====================
test_case_id=1
action=====================
res=====================
rule=====================
ID=18612269
value=/x22
=========end===============
>>>>>>>>>xml_dom>>>>>>>>>>
prot>>>>>>>>>>>>>>>>>>>>>>>
dstip>>>>>>>>>>>>>>>>>>>>>>>
dstip:2.2.2.2
dstip>>>>>>>>>>>>>>>>>>>>>>>
dstip:3.3.3.3
dstport>>>>>>>>>>>>>>>>>>>>>>>
dstport:80
srcip>>>>>>>>>>>>>>>>>>>>>>>
srcip:3.3.3.3
srcport>>>>>>>>>>>>>>>>>>>>>>>
srcport:8888
protocol>>>>>>>>>>>>>>>>>>>>>>>
protocol:17
>>>>>>>>>end>>>>>>>>>>>>
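Both methods decide whether an element is a leaf by checking len(childNodes) == 1. The sketch below is one way to see why that check works: minidom keeps the whitespace between tags as text nodes, so an element that contains child elements usually has more than one child. The trimmed-down SAMPLE string and the describe_children helper are made up for illustration and are not part of the original script.

```python
from xml.dom import minidom

# A shortened version of the sample document (illustrative only).
SAMPLE = """<waf>
<policy>acl</policy>
<prot>
<dstip>2.2.2.2</dstip>
<dstport>80</dstport>
</prot>
</waf>"""


def describe_children(element):
    """Return (nodeType, nodeName) pairs for each direct child of element."""
    return [(child.nodeType, child.nodeName) for child in element.childNodes]


dom = minidom.parseString(SAMPLE)
policy = dom.getElementsByTagName("policy")[0]
prot = dom.getElementsByTagName("prot")[0]

# policy holds only its text, so it has exactly one child (a text node, type 3).
print(describe_children(policy))
# [(3, '#text')]

# prot holds two elements plus the line breaks around them.
print(describe_children(prot))
# [(3, '#text'), (1, 'dstip'), (3, '#text'), (1, 'dstport'), (3, '#text')]
```

Node type 1 is ELEMENT_NODE and 3 is TEXT_NODE, which is why the script only reports nodes whose nodeType is 1.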
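There is no shortage of articles on parsing with minidom online, and I have only just learned it myself. It is fairly easy to understand: once you know the basic methods, you can put it to use. To learn more, read the minidom source code directly.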
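As a rough summary of those basic methods, here is a minimal sketch that returns values instead of printing them. It assumes the XML is available as a string; the SAMPLE text and the text_values helper are hypothetical names used only for this example, while parseString, getElementsByTagName, firstChild and getAttribute are standard minidom calls.

```python
from xml.dom import minidom

# A shortened version of the sample document (illustrative only).
SAMPLE = """<waf>
  <prot>
    <dstip>2.2.2.2</dstip>
    <dstip>3.3.3.3</dstip>
    <dstport>80</dstport>
  </prot>
  <rule ID="18612269" value="/x22"/>
</waf>"""


def text_values(dom, tag_name):
    """Collect the stripped text of every element named tag_name."""
    values = []
    for element in dom.getElementsByTagName(tag_name):
        first = element.firstChild
        # Skip empty elements and elements whose first child is not text.
        if first is not None and first.nodeType == first.TEXT_NODE:
            values.append(first.data.strip())
    return values


dom = minidom.parseString(SAMPLE)

dst_ips = text_values(dom, "dstip")
print(dst_ips)
# ['2.2.2.2', '3.3.3.3']

# Attributes are read by name from the element itself.
rule = dom.getElementsByTagName("rule")[0]
rule_id = rule.getAttribute("ID")
print(rule_id)
# 18612269
```

Returning lists and strings rather than printing makes the results easier to reuse, for example when comparing parsed values in a test.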