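Our project uses the SAX module to parse XML files. Here is our XML file first. It is only a small excerpt, of course, so the closing </ORDER_INFO> tag is not shown: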
<ORDER_INFO execute_id="58" order_id="16" show_sequence="default" show_type="CPM">
    <DATE_TIME>
        <DAY id='MON'>0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23</DAY>
        <DAY id='TUE'>0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23</DAY>
    </DATE_TIME>
    <AUDIENCE>
        <AREA>0010</AREA>
        <KEYWORD_FILES>Keywords_file/58/default.txt</KEYWORD_FILES>
        <KEYSITE_FILES>Keyurl_file/58/default.txt</KEYSITE_FILES>
    </AUDIENCE>
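The example I wrote parses the XML file above.
First, get familiar with the flow:
1. Write a handler class for the XML that carries out the whole parsing process; I named it XMLParse.
2. Call make_parser to create a parser and set the handler it should use, i.e. plug the handler from step 1 into the parser.
3. Pass in the file and run the parse with parse().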
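The three steps above can be sketched in a few lines. This is only a minimal illustration, not the project's handler: NameCollector and the inline sample document are made up for the example, and the document is parsed from memory instead of from a file.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler


class NameCollector(ContentHandler):
    """Step 1: a handler class (hypothetical) that records element names."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# Step 2: create the parser and attach the handler to it.
parser = make_parser()
handler = NameCollector()
parser.setContentHandler(handler)

# Step 3: run parse(); it accepts a file name or a file-like object.
sample = b"<AUDIENCE><AREA>0010</AREA></AUDIENCE>"
parser.parse(io.BytesIO(sample))
print(handler.names)
```

Keeping the handler in its own variable makes it easy to read the collected data once parse() returns.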
Building XMLParse:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from xml.sax.handler import ContentHandler


class XMLParse(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        self.content = ''
        self.label = None
        # Map each tag we care about to the method that handles its start.
        self.name = dict()
        self.name['ORDER_INFO'] = self._ORDERINFO
        self.name['DAY'] = self._DAY
        self.name['AREA'] = self._AREA
        self.name['KEYWORD_FILES'] = self._SEARCHORDER

    def startElement(self, name, attrs):
        # Reset the text buffer and the current label for every new element.
        self.content = ''
        self.label = None
        if name not in self.name:
            return
        self.attrs = attrs
        # Dispatch to the method registered for this tag.
        self.name[name]()

    def characters(self, content):
        # Only accumulate here; the text may arrive in several pieces.
        self.content = self.content + content

    def endElement(self, name):
        # The element's text is complete at this point.
        if self.label == 'DAY':
            # store the value
            self.day = self.content
        elif self.label == 'AREA':
            # store the value
            self.area = self.content
        elif self.label == 'KEYWORD_FILES':
            # store the value
            self.word_file = self.content
        # Clear the label so that an enclosing end tag does not store again.
        self.label = None

    def _ORDERINFO(self):
        try:
            orderid = str(self.attrs['execute_id'])
            self.tmp_eorder_id = orderid
            superid = str(self.attrs['order_id'])
            show_type = str(self.attrs['show_type'])
            linear = str(self.attrs['show_sequence'])
            return True
        except Exception:
            # For example, a required attribute is missing.
            return False

    def _DAY(self):
        try:
            self.label = "DAY"
            self.week_id = str(self.attrs['id'])
            return True
        except Exception:
            return False

    def _AREA(self):
        self.label = "AREA"
        return True

    def _SEARCHORDER(self):
        self.label = "KEYWORD_FILES"
        return True
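The storage step in endElement is left open in the class above; as written, self.day holds only the most recently closed DAY element. One possible way to keep every day is sketched below. It rests on assumptions: DayCollector and its hours_by_day dict are hypothetical names, the closing </ORDER_INFO> tag is added so that the sample is well-formed, and the hour lists are shortened.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler

# Shortened sample; the closing tag is an assumption added for the sketch.
SAMPLE = b"""<ORDER_INFO execute_id="58" order_id="16">
<DATE_TIME>
<DAY id='MON'>0,1,2</DAY>
<DAY id='TUE'>21,22,23</DAY>
</DATE_TIME>
</ORDER_INFO>"""


class DayCollector(ContentHandler):
    """Hypothetical handler: maps each DAY id to its list of hours."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.hours_by_day = {}
        self.week_id = None
        self.content = ''

    def startElement(self, name, attrs):
        self.content = ''
        if name == 'DAY':
            self.week_id = attrs['id']

    def characters(self, content):
        self.content += content

    def endElement(self, name):
        if name == 'DAY':
            # Split the comma-separated hours and file them under the day id.
            hours = [int(h) for h in self.content.split(',')]
            self.hours_by_day[self.week_id] = hours


collector = DayCollector()
parseString(SAMPLE, collector)
print(collector.hours_by_day)
```

Checking the tag name passed to endElement, rather than a label set earlier, is a design choice of this sketch; it avoids storing again when an enclosing tag such as DATE_TIME closes.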
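The parser may call characters() several times for a single element, so characters() only accumulates the text and the element's content is handled in endElement! For details see http://blog.csdn.net/wyjzt999/article/details/8661192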
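To see why the text is only accumulated in characters(), here is a small sketch that records every call. ChunkRecorder and the sample element are hypothetical. How the text gets split is up to the parser; an entity reference such as &amp; is a typical place where it happens, but the number of chunks should not be relied on.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class ChunkRecorder(ContentHandler):
    """Hypothetical handler that records every characters() call."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []
        self.text = None

    def startElement(self, name, attrs):
        self.chunks = []

    def characters(self, content):
        # Only accumulate; the text may not be complete yet.
        self.chunks.append(content)

    def endElement(self, name):
        # All chunks for this element have arrived; join them once.
        self.text = ''.join(self.chunks)


recorder = ChunkRecorder()
parseString(b"<AREA>north &amp; south</AREA>", recorder)
print(len(recorder.chunks), repr(recorder.text))
```

Whatever the chunk count turns out to be, the joined text in endElement is the full content of the element.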
Building the parser:
from xml.sax import make_parser
par = make_parser()
par.setContentHandler(XMLParse())
Running the parse:
xml_file = "/root/demo.xml"
par.parse(xml_file)
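After parse() returns, the parsed values live on the handler object, which can be fetched back from the parser with getContentHandler(). The sketch below also catches SAXParseException, which is raised when the input is not well-formed, for example when a closing tag is missing. AreaHandler and read_area are hypothetical names, and returning None on a bad document is just one possible choice.

```python
import io
from xml.sax import make_parser, SAXParseException
from xml.sax.handler import ContentHandler


class AreaHandler(ContentHandler):
    """Hypothetical handler that keeps the text of the AREA element."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.area = None
        self.content = ''

    def startElement(self, name, attrs):
        self.content = ''

    def characters(self, content):
        self.content += content

    def endElement(self, name):
        if name == 'AREA':
            self.area = self.content


def read_area(xml_bytes):
    """Return the AREA text, or None if the document is not well-formed."""
    par = make_parser()
    par.setContentHandler(AreaHandler())
    try:
        par.parse(io.BytesIO(xml_bytes))
    except SAXParseException:
        return None
    # The handler given to setContentHandler() holds the results.
    return par.getContentHandler().area


print(read_area(b"<AUDIENCE><AREA>0010</AREA></AUDIENCE>"))
print(read_area(b"<AUDIENCE><AREA>0010</AREA>"))  # closing tag missing
```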