Python XML CDATA parsing: how to read CDATA from an XML file with Python

I am trying to parse a large XML file with Python, but when I print the CDATA content nothing appears, in particular for the "content" tag that holds the description.

My source code looks like this:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import xml.sax
import re
from cStringIO import StringIO

class MovieHandler( xml.sax.ContentHandler ):

    def __init__(self):
        self.item = {}
        self.CurrentData = ""
        self.url = ""
        self.description = ""
        self.price = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag

    # Called when an element ends
    def endElement(self, tag):
        if self.CurrentData == "url":
            self.item["url"] = self.url
        elif self.CurrentData == "content":
            print 'description: ', self.description
        elif self.CurrentData == "price":
            if self.price:
                self.price = re.sub('[^0-9]','',self.price[0].encode('ascii', 'ignore'))
                self.item["price"] = int(self.price)
        self.CurrentData = ""
        print self.item
        self.item.clear()

    # Called when character data is read
    def characters(self, content):
        if self.CurrentData == "url":
            self.url = content
        elif self.CurrentData == "content":
            self.description = content
        elif self.CurrentData == "price":
            self.price = content

if ( __name__ == "__main__"):
    # create an XMLReader
    parser = xml.sax.make_parser()
    # turn off namespaces
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    # override the default ContentHandler
    Handler = MovieHandler()
    parser.setContentHandler(Handler)
    parser.parse("myfile.xml")
    print "done"

The content tag looks like this:

<content><![CDATA[
new tires
perfect condition
Black LeatherInterior]]></content>

Thanks in advance

Solution

The .characters() function can be called several times, each time with a fragment of the text. You seem to be overwriting self.description with each call.

Try this:

def characters(self, content):
    ...
    self.description += content # Note: '+=', not '='
    ...

And remember to reset self.description to "" once you are done with it (for example in endElement), so that text from one element does not leak into the next.
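As a minimal sketch of the accumulate-then-reset idea, here is a small self-contained handler written for Python 3 (the question's code is Python 2). The class name ItemHandler, the wrapping <items>/<item> elements, and the sample data are assumptions made for illustration; only the url, content and price tags come from the question.

```python
import xml.sax


class ItemHandler(xml.sax.ContentHandler):
    """Collects the text of <url>, <content> and <price> for each <item>."""

    # Tags whose text we want to keep (taken from the question).
    TEXT_TAGS = ("url", "content", "price")

    def __init__(self):
        super().__init__()
        self.items = []      # one dict per finished <item> (hypothetical wrapper tag)
        self.item = {}       # fields of the <item> currently being read
        self.buffer = None   # list of text fragments, or None outside TEXT_TAGS

    def startElement(self, tag, attributes):
        if tag in self.TEXT_TAGS:
            self.buffer = []  # start a fresh buffer for this element

    def characters(self, content):
        # The parser may call this several times per element,
        # so append each fragment instead of overwriting.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, tag):
        if tag in self.TEXT_TAGS and self.buffer is not None:
            self.item[tag] = "".join(self.buffer)
            self.buffer = None  # reset so text cannot leak into the next element
        elif tag == "item":
            self.items.append(self.item)
            self.item = {}


# Made-up sample document; the real file layout is not shown in the question.
SAMPLE = b"""<items>
  <item>
    <url>http://example.com/1</url>
    <content><![CDATA[new tires
perfect condition
Black LeatherInterior]]></content>
    <price>1200</price>
  </item>
</items>"""

handler = ItemHandler()
xml.sax.parseString(SAMPLE, handler)
for item in handler.items:
    print(item["content"])
```

Collecting fragments in a list and joining them in endElement avoids repeated string concatenation, and resetting the buffer there is the "set it back to empty when you are done" step from the answer.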
