XML parsing fails in Python: Python's SAX parser raises an error on an XML file. Does anyone know how to fix it?

a.xml:

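<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<article>
<author>Sanjeev Saxena</author>
<author>q</author>
<author>w</author>
<title>Parallel Integer Sorting and Simulation Amongst CRCW Models</title>
<pages>607-619</pages>
<year>1996</year>
<volume>33</volume>
<journal>Acta Inf</journal>
<number>7</number>
<url>db/journals/acta/acta33.html#Saxena96</url>
<ee>htt</ee>
</article>
</dblp>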

Python code:

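from xml.sax.handler import ContentHandler, EntityResolver
from xml.sax import parse
from itertools import combinations


class DBLP(ContentHandler, EntityResolver):
    """Writes every pair of co-authors of each <article> to dblp.txt."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.passthrough = False   # True while inside an <article>
        self.paper_authors = []    # authors of the current article
        self.currTag = ''
        self.author_chunks = []    # text pieces of the current <author>

    def startElement(self, name, attrs):
        if name == 'article':
            self.passthrough = True
        elif name == 'author' and self.passthrough:
            self.currTag = 'author'
            self.author_chunks = []

    def endElement(self, name):
        if name == 'article':
            self.passthrough = False
            self.generate_paper_info()
            self.paper_authors = []
        elif name == 'author' and self.currTag == 'author':
            # characters() may deliver one name in several pieces
            self.paper_authors.append(''.join(self.author_chunks))
            self.currTag = ''

    def characters(self, chars):
        if self.passthrough and self.currTag == 'author':
            self.author_chunks.append(chars)

    def generate_paper_info(self):
        # Append, so that earlier articles are not overwritten
        with open('dblp.txt', 'a') as f:
            if len(self.paper_authors) < 2:
                print('Only one author')
            else:
                for info in combinations(self.paper_authors, 2):
                    f.write('{0} {1}\n'.format(info[0], info[1]))
                print('Write one piece of cooperation user')


parse('a.xml', DBLP())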

Error message:

Traceback (most recent call last):
  File "dblp.py", line 40, in <module>
    parse('a.xml', DBLP())
  File "E:\Python2.7.12\lib\xml\sax\__init__.py", line 33, in parse
    parser.parse(source)
  File "E:\Python2.7.12\lib\xml\sax\expatreader.py", line 110, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "E:\Python2.7.12\lib\xml\sax\xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "E:\Python2.7.12\lib\xml\sax\expatreader.py", line 213, in feed
    self._parser.Parse(data, isFinal)
  File "E:\Python2.7.12\lib\xml\sax\expatreader.py", line 397, in external_entity_ref
    "")
  File "E:\Python2.7.12\lib\xml\sax\saxutils.py", line 349, in prepare_input_source
    f = urllib.urlopen(source.getSystemId())
  File "E:\Python2.7.12\lib\urllib.py", line 87, in urlopen
    return opener.open(url)
  File "E:\Python2.7.12\lib\urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "E:\Python2.7.12\lib\urllib.py", line 469, in open_file
    return self.open_local_file(url)
  File "E:\Python2.7.12\lib\urllib.py", line 483, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] : 'dblp.dtd'

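~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The error goes away if I simply delete the DOCTYPE line in a.xml that references dblp.dtd. I also thought of overriding EntityResolver so that the parser ignores the DTD, but I don't know how to write the override.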
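One possible way to do the EntityResolver override, as a minimal sketch rather than a definitive fix. `xml.sax.parse()` only registers its handler argument as the content handler, so inheriting from EntityResolver has no effect by itself; the resolver has to be set on a parser created with `make_parser()`. The names `IgnoreDTDResolver`, `AuthorCollector`, `parse_ignoring_dtd` and the `SAMPLE` document are hypothetical and exist only for this illustration. The sketch assumes it is acceptable to treat the DTD as empty.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class IgnoreDTDResolver(EntityResolver):
    """Hypothetical resolver: answers every external entity with empty content."""

    def resolveEntity(self, publicId, systemId):
        source = InputSource(systemId)
        # A ready-made (empty) byte stream keeps the parser from
        # trying to open systemId (here 'dblp.dtd') on disk.
        source.setByteStream(io.BytesIO(b''))
        return source


class AuthorCollector(ContentHandler):
    """Hypothetical stand-in for the DBLP handler: records <author> text."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.authors = []
        self._chunks = None  # list of text pieces while inside <author>

    def startElement(self, name, attrs):
        if name == 'author':
            self._chunks = []

    def characters(self, content):
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == 'author':
            self.authors.append(''.join(self._chunks))
            self._chunks = None


def parse_ignoring_dtd(source, handler):
    """Parse source (file name or file object) without loading its DTD."""
    parser = make_parser()
    parser.setContentHandler(handler)
    parser.setEntityResolver(IgnoreDTDResolver())
    # Python 2.7 consults the resolver by default. Recent Python 3 releases
    # skip external entities unless this feature is switched on.
    parser.setFeature(feature_external_ges, True)
    parser.parse(source)
    return handler


# Assumed sample input: references dblp.dtd, which does not exist on disk.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp><article><author>Sanjeev Saxena</author><author>q</author></article></dblp>"""

collector = parse_ignoring_dtd(io.BytesIO(SAMPLE), AuthorCollector())
print(collector.authors)
```

With the code from the question, the call would become `parse_ignoring_dtd('a.xml', DBLP())`. If no resolver is needed at all, calling `parser.setFeature(feature_external_ges, False)` on the parser should have a similar effect, since the parser then does not try to load the external DTD. In both cases, named entities that are defined only in the DTD are not expanded.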
