Writing XML files in Python: reading an XML file while it is being written (using Python)

I have to monitor an XML file that is written by a tool running all day long. The XML file is properly completed and closed only at the end of the day.

Same constraints as XML stream processing:

Parse an incomplete XML file on-the-fly and trigger actions

Keep track of the last position within the file to avoid processing it again from the beginning

In an answer to "Need to read XML files as a stream using BeautifulSoup in Python", slezica suggests xml.sax, xml.etree.ElementTree and cElementTree, but my attempts with xml.etree.ElementTree and cElementTree were unsuccessful. There are also xml.dom, xml.parsers.expat and lxml, but I do not see any support for "on-the-fly parsing" in them.

I need more obvious examples...

I am currently using Python 2.7 on Linux but will migrate to Python 3.x, so please also provide tips on new Python 3.x features. I also use watchdog to detect XML file modifications, so reusing the watchdog mechanism would be a plus. Windows support is optional.

Please provide easy to understand/maintain solutions. If it is too complex, I may just use tell()/seek() to move within the file, use stupid text search in the raw XML and finally extract the values using basic regex.
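For comparison, the fallback mentioned above (remembering a file offset and searching the raw text with a regex) could look roughly like the sketch below. The function names and the `<size>` tag are hypothetical and chosen only for illustration; the real element names depend on the monitored file.

```python
import re

# Hypothetical tag name, used only for illustration.
VALUE_RE = re.compile(r"<size>(\d+)</size>")


def read_new_text(path, offset):
    """Return (text, new_offset) for the data appended after `offset`."""
    # Binary mode keeps the offset a plain byte count on every platform.
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume complete lines only, so a half-written element is picked up
    # on the next call instead of being missed.
    end = data.rfind(b"\n") + 1
    return data[:end].decode("utf-8"), offset + end


def extract_sizes(text):
    """Pull the integer values out of the raw XML text with a regex."""
    return [int(value) for value in VALUE_RE.findall(text)]
```

The caller keeps the returned offset and passes it back on the next modification event. This assumes the writer only ever appends whole lines, and it ignores XML details such as comments, CDATA sections and entities.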

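XML sample (only the element text is shown; the SAX output below names the first elements: dfxml, creator, program and version):

    TCPFLOW
    1.4.6
    file1    288
    file2    352
    file3    456
    ...
    ...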

First test using SAX failed:

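    from __future__ import print_function

    import xml.sax
    import xml.sax.handler


    class StreamHandler(xml.sax.handler.ContentHandler):

        def startElement(self, name, attrs):
            print('start: name=', name)

        def endElement(self, name):
            print('end: name=', name)
            if name == 'root':
                # Attempt to stop parsing once the root element is closed
                raise StopIteration


    if __name__ == '__main__':
        parser = xml.sax.make_parser()
        parser.setContentHandler(StreamHandler())
        with open('f.xml') as f:
            # parse() reads up to the current end of the file, then closes
            # the parser, which fails while the document is incomplete
            parser.parse(f)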

Shell:

    $ while read line; do echo $line; sleep 1; done < report.xml > f.xml &
    ...
    $ ./test-using-sax.py

    start: name= dfxml
    start: name= creator
    start: name= program
    end: name= program
    start: name= version
    end: name= version
    Traceback (most recent call last):
      File "./test-using-sax.py", line 17, in <module>
        parser.parse(f)
      File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 107, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "/usr/lib64/python2.7/xml/sax/xmlreader.py", line 125, in parse
        self.close()
      File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 220, in close
        self.feed("", isFinal = 1)
      File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 214, in feed
        self._err_handler.fatalError(exc)
      File "/usr/lib64/python2.7/xml/sax/handler.py", line 38, in fatalError
        raise exception
    xml.sax._exceptions.SAXParseException: report.xml:15:0: no element found

Solution

Three hours after posting my question, I had received no answer. But I have finally implemented the simple example I was looking for.

It is inspired by saaj's answer and is based on xml.sax and watchdog.
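The traceback in the question shows why the first test failed: parse() reads up to the current end of the file and then calls close(), whose final feed reports "no element found" because the root element is still open. Passing data to feed() and never calling close() avoids this. Here is a minimal sketch of that idea on its own, without watchdog; the `EventRecorder` class and the sample lines are made up for illustration.

```python
import xml.sax
import xml.sax.handler


class EventRecorder(xml.sax.handler.ContentHandler):
    """Hypothetical handler that just records element events."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(('start', name))

    def endElement(self, name):
        self.events.append(('end', name))


recorder = EventRecorder()
parser = xml.sax.make_parser()
parser.setContentHandler(recorder)

# Lines arrive one at a time, as they would from the growing file.
for line in ['<dfxml>\n', '<creator>\n', '<program>TCPFLOW</program>\n']:
    parser.feed(line)  # handlers fire for whatever is complete so far

# close() is deliberately not called while the root element is still open.
print(recorder.events)
```

The complete script, which feeds the parser from watchdog events, follows.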

    from __future__ import print_function, division

    import time
    import xml.sax
    import xml.sax.handler

    import watchdog.events
    import watchdog.observers


    class XmlStreamHandler(xml.sax.handler.ContentHandler):

        def startElement(self, tag, attributes):
            print(tag, 'attributes=', attributes.items())
            self.tag = tag  # remembered so characters() can report it

        def characters(self, content):
            print(self.tag, 'content=', content)


    class XmlFileEventHandler(watchdog.events.PatternMatchingEventHandler):

        def __init__(self):
            watchdog.events.PatternMatchingEventHandler.__init__(
                self, patterns=['*.xml'])
            self.file = None
            self.parser = xml.sax.make_parser()
            self.parser.setContentHandler(XmlStreamHandler())

        def on_modified(self, event):
            if not self.file:
                # Opened once and kept open, so the file position records
                # how much has already been parsed
                self.file = open(event.src_path)
            # Feed only the newly appended data; close() is never called
            self.parser.feed(self.file.read())


    if __name__ == '__main__':
        observer = watchdog.observers.Observer()
        event_handler = XmlFileEventHandler()
        observer.schedule(event_handler, path='.')
        try:
            observer.start()
            while True:
                time.sleep(10)
        finally:
            observer.stop()
            observer.join()

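While the script is running, do not forget to touch one XML file, or simulate on-the-fly writing by copying a complete XML file line by line with the following command (in.xml is a placeholder name for that source file):

    while read line; do echo $line; sleep 1; done < in.xml > out.xml &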
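For the planned move to Python 3, xml.etree.ElementTree.XMLPullParser (added in Python 3.4) supports the same feed-as-you-go pattern and hands back finished elements instead of SAX callbacks. The sketch below is an illustration only, not part of the script above; `feed_new_data` is a hypothetical helper and the sample lines are made up.

```python
import xml.etree.ElementTree as ET


def feed_new_data(parser, chunk):
    """Feed one chunk and return (tag, text) for each element it closed."""
    parser.feed(chunk)
    results = []
    for event, elem in parser.read_events():
        results.append((elem.tag, elem.text))
        # Drop the element's content so the tree does not keep growing
        # over a whole day of input.
        elem.clear()
    return results


# Only "end" events are requested, so each element is complete when seen.
parser = ET.XMLPullParser(events=("end",))
for line in ["<dfxml>\n", "<creator>\n", "<program>TCPFLOW</program>\n"]:
    for tag, text in feed_new_data(parser, line):
        print(tag, text)
```

In the watchdog handler, on_modified could call such a helper with self.file.read() in place of self.parser.feed(...). As with the SAX version, close() should not be called until the writer has finished the file.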
