I'm parsing XML in Python with xml.sax, but my handler never sees any entities. Why aren't skippedEntity() or resolveEntity() called in the following code?
import os
import cStringIO
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, DTDHandler

# Class to parse and run test XML files
class TestHandler(ContentHandler, EntityResolver, DTDHandler):

    # SAX handler - Entity resolver
    def resolveEntity(self, publicID, systemID):
        print "TestHandler.resolveEntity: %s %s" % (publicID, systemID)

    def skippedEntity(self, name):
        print "TestHandler.skippedEntity: %s" % (name)

    def unparsedEntityDecl(self, publicID, systemID, ndata):
        print "TestHandler.unparsedEntityDecl: %s %s" % (publicID, systemID)

    def startElement(self, name, attrs):
        # name = string.lower(name)
        summary = '' + attrs.get('summary', '')
        arg = '' + attrs.get('arg', '')
        print 'TestHandler.startElement(), %s : %s (%s)' % (name, summary, arg)

def run(xml_string):
    try:
        parser = xml.sax.make_parser()
        stream = cStringIO.StringIO(xml_string)
        curHandler = TestHandler()
        parser.setContentHandler(curHandler)
        parser.setDTDHandler(curHandler)
        parser.setEntityResolver(curHandler)
        parser.parse(stream)
        stream.close()
    except (xml.sax.SAXParseException), e:
        print "*** PARSER error: %s" % e

def main():
    try:
        XML = " ]>Entity: &not;"
        run(XML)
    except Exception, e:
        print 'FATAL ERROR: %s' % (str(e))

if __name__ == '__main__':
    main()
When run, all I see is:
TestHandler.startElement(), step: foo ()
*** PARSER error: <unknown>:1:36: undefined entity
Why don't I see the resolveEntity print for &num; or the skippedEntity print for &not;?
Solution
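I think resolveEntity and skippedEntity are only called when the document references an external DTD. With only an internal subset, a reference to an undeclared entity is a well-formedness error, so the parser stops with "undefined entity" before either callback can fire. I got this to work by modifying the XML to point at an external DTD: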
XML = """<?xml version="1.0" encoding="utf-8" ?>
Entity: &not;
"""
The external.dtd contains two simple entity declarations.
I also removed the resolveEntity method from the handler.
This outputs:
TestHandler.startElement(), test : step: bar foo ()
TestHandler.skippedEntity: not
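To check the difference between the two cases in isolation, here is a minimal sketch. It is written for Python 3 rather than the Python 2 of the question, and the handler name `RecordingHandler`, the helper `parse`, and both XML strings are made up for illustration; they are not the original test documents. It assumes the default expat-based SAX driver and explicitly turns off loading of external entities, so no external.dtd file is needed on disk.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class RecordingHandler(ContentHandler):
    """Hypothetical handler that records skipped entities and text."""

    def __init__(self):
        super().__init__()
        self.skipped = []
        self.text = []

    def skippedEntity(self, name):
        self.skipped.append(name)

    def characters(self, content):
        self.text.append(content)


def parse(xml_bytes):
    """Parse xml_bytes without loading external entities; return the handler."""
    handler = RecordingHandler()
    parser = xml.sax.make_parser()
    # Assumption: we never want the parser to fetch external.dtd itself.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler


# Internal subset only: &not; is undeclared, so the document is not well-formed.
INTERNAL_ONLY = b"<!DOCTYPE test [<!ENTITY num 'foo'>]><test>Entity: &not;</test>"

# External DTD referenced but never read: the parser cannot tell whether
# &not; is declared there, so it reports the reference as skipped.
EXTERNAL_DTD = b'<!DOCTYPE test SYSTEM "external.dtd"><test>Entity: &not;</test>'

try:
    parse(INTERNAL_ONLY)
    internal_error = None
except xml.sax.SAXParseException as exc:
    internal_error = exc.getMessage()

external = parse(EXTERNAL_DTD)
print("internal subset only:", internal_error)
print("external DTD, skipped:", external.skipped)
```

With the internal-only document the parse should end in a SAXParseException, while the document that names an external DTD should reach skippedEntity with the name not. Note that expat only reports skipped entities found in element content; an undeclared reference inside an attribute value is dropped silently.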
Hope this helps.
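If you do want resolveEntity to run, my understanding is that the parser has to be allowed to load external general entities, which recent Python 3 releases switch off by default. The sketch below, again in Python 3, turns that feature on and returns an in-memory DTD from resolveEntity. `ResolvingHandler` and `FAKE_DTD` are hypothetical stand-ins, not the contents of the real external.dtd. Only enable this for documents you trust, since it lets the XML decide what gets loaded.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical DTD content standing in for external.dtd.
FAKE_DTD = b'<!ENTITY not "NOT">'


class ResolvingHandler(ContentHandler, EntityResolver):
    """Hypothetical handler that serves the DTD from memory."""

    def __init__(self):
        super().__init__()
        self.resolved = []
        self.skipped = []
        self.text = []

    def resolveEntity(self, publicId, systemId):
        # Record the request, then hand back the in-memory DTD
        # instead of letting the parser open a file or URL.
        self.resolved.append(systemId)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(FAKE_DTD))
        return source

    def skippedEntity(self, name):
        self.skipped.append(name)

    def characters(self, content):
        self.text.append(content)


XML = b'<!DOCTYPE test SYSTEM "external.dtd"><test>Entity: &not;</test>'

handler = ResolvingHandler()
parser = xml.sax.make_parser()
# Assumption: the input is trusted, so external entities may be loaded.
parser.setFeature(feature_external_ges, True)
parser.setContentHandler(handler)
parser.setEntityResolver(handler)
parser.parse(io.BytesIO(XML))

print("resolved:", handler.resolved)
print("skipped:", handler.skipped)
print("text:", "".join(handler.text))
```

Here resolveEntity must return something the parser can read (a system identifier string or an InputSource). A method that only prints, like the one in the question, returns None and would fail as soon as it was actually called.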