Watching a directory for file changes with Python

Requirement

Watch for changes in an FTP folder: whenever a new XML file is created, or an existing file is modified, it needs to be parsed and its contents inserted into the database.

Tools

  • Python 2.7
  • watchdog

Install with pip:

pip install watchdog

Watchdog is a Python API library and shell utilities to monitor file system events.

How to

First create the monitoring script. It will run daemonized and observe any changes to the given directory. Three modules/classes are used in that script:

  • time, from the Python standard library, is used to sleep in the main loop
  • watchdog.observers.Observer is the class that watches for any change and then dispatches the event to the specified handler
  • watchdog.events.PatternMatchingEventHandler is the class that takes the event dispatched by the observer and performs some action

The script is saved as watch_for_changes.py.

PatternMatchingEventHandler inherits from FileSystemEventHandler and exposes some useful methods:

Events are: modified, created, deleted, moved

  • on_any_event: if defined, will be executed for any event
  • on_created: Executed when a file or a directory is created
  • on_modified: Executed when a file is modified or a directory renamed
  • on_moved: Executed when a file or directory is moved
  • on_deleted: Executed when a file or directory is deleted.

Each one of those methods receives the event object as first parameter, and the event object has 3 attributes.

  • event_type ‘modified’ | ‘created’ | ‘moved’ | ‘deleted’
  • is_directory True | False
  • src_path path/to/observed/file

So to create a handler, just inherit from one of the existing handlers. For this example, PatternMatchingEventHandler will be used to match only XML files.

To keep things simple, I will put the file processing in a single method and implement only on_modified and on_created, which means that my handler will ignore any other events.

The patterns attribute is also defined, so that only files with the xml or lxml extension are watched.

from watchdog.events import PatternMatchingEventHandler


class MyHandler(PatternMatchingEventHandler):
    # only paths matching one of these glob patterns reach the handler
    patterns = ["*.xml", "*.lxml"]

    def process(self, event):
        """
        event.event_type
            'modified' | 'created' | 'moved' | 'deleted'
        event.is_directory
            True | False
        event.src_path
            path/to/observed/file
        """
        # the file will be processed here
        print("%s %s" % (event.src_path, event.event_type))  # print for now, only for debugging

    def on_modified(self, event):
        self.process(event)

    def on_created(self, event):
        self.process(event)
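
To get a feel for what the two glob patterns select, they can be tried against a few file names with the standard library's fnmatch module. This is only an illustration of glob-style matching, not watchdog's own code: watchdog does its matching internally and may differ in details such as case sensitivity. The helper name matches_any and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Same patterns as the handler's `patterns` attribute.
PATTERNS = ["*.xml", "*.lxml"]


def matches_any(filename, patterns=PATTERNS):
    """Return True if filename matches at least one glob pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


# Hypothetical file names, only to show which ones the patterns select.
for name in ["track.xml", "track.lxml", "track.txt", "xml"]:
    print("%-12s %s" % (name, matches_any(name)))
```

A name has to end in .xml or .lxml to be selected, so a file called just xml, or one with another extension, is skipped.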

With the above handler, only creation and modification will be watched. Now the Observer needs to be scheduled.
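
A minimal sketch of that scheduling step, assuming the watchdog package is installed. PrintingHandler is a hypothetical stand-in for the handler above that only prints events, and the watch function is a name chosen for this sketch, not part of watchdog.

```python
import time

from watchdog.events import PatternMatchingEventHandler
from watchdog.observers import Observer


class PrintingHandler(PatternMatchingEventHandler):
    """Hypothetical stand-in for MyHandler that only prints events."""

    patterns = ["*.xml", "*.lxml"]

    def on_created(self, event):
        print("%s %s" % (event.src_path, event.event_type))

    def on_modified(self, event):
        print("%s %s" % (event.src_path, event.event_type))


def watch(path="."):
    """Watch `path` until interrupted with Ctrl+C."""
    observer = Observer()
    # recursive=False watches only the directory itself, not subfolders.
    observer.schedule(PrintingHandler(), path=path, recursive=False)
    observer.start()  # events are delivered on a background thread
    try:
        while True:
            time.sleep(1)  # keep the main thread alive
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```

Calling watch("/path/to/directory") blocks until Ctrl+C is pressed. The sleep loop exists only to keep the main thread alive while the observer thread does the work.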

You can set the named argument recursive to True in observer.schedule if you want to watch for files in subfolders.

That is all that is needed to watch for modifications in the given directory. The script takes the path given as its first parameter, or the current directory by default.

python watch_for_changes.py /path/to/directory

Let it run in a shell, then open another shell or the file browser to change or create .xml files in /path/to/directory.

Since the handler prints its results, the output should be:

rochacbruno@~/$ python watch_for_changes.py /tmp
/tmp/test.xml created
/tmp/test.xml modified

Now, to complete the script, we only need to implement in the process method the logic that parses the file and inserts its contents into the database.

For example, suppose the XML file contains some data about the current track on a web radio.

The easiest way to parse this small XML is with the xmltodict library.

pip install xmltodict

With the xmltodict.parse function, the XML is returned as an OrderedDict.

Now we can just access that dict to create the record on the filesystem or somewhere else. Notice that I use the dict get method a lot to avoid KeyErrors.

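# body of MyHandler.process; needs "import xmltodict" at the top of the script
with open(event.src_path, 'r') as xml_source:
    xml_string = xml_source.read()
    parsed = xmltodict.parse(xml_string)
    # each get falls back to an empty dict, so a missing level yields None
    element = parsed.get('Pulsar', {}).get('OnAir', {}).get('media')
    if not element:
        return
    print(dict(element))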

The output is the media element converted to a plain dict.

Much better than XPath, and in this particular case, where xml_source is small, there will be no relevant performance issue.
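
The chained get calls used in process can be tried in isolation on plain dicts. The sketch below uses hypothetical, hand-written dicts shaped like the Pulsar / OnAir / media structure the handler expects. The values are made up and the helper name find_media is an assumption of this sketch.

```python
# Hypothetical parsed documents: nested dicts shaped like the structure
# the process method expects (Pulsar -> OnAir -> media). Values are made up.
complete = {"Pulsar": {"OnAir": {"media": {"title1": "Example title"}}}}
incomplete = {"Pulsar": {}}  # the OnAir level is missing


def find_media(parsed):
    """Return the media mapping, or None if any level is missing."""
    return parsed.get("Pulsar", {}).get("OnAir", {}).get("media")


print(find_media(complete))
print(find_media(incomplete))

try:
    incomplete["Pulsar"]["OnAir"]["media"]  # plain indexing, for contrast
except KeyError as error:
    print("KeyError: %s" % error)
```

The default only applies when a key is absent. If a level is present but empty, xmltodict stores None for it, and the next get in the chain would then raise AttributeError, so very defensive code may want to check for that case as well.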

Now we only need to get the values and populate the database. In my case I will use Redis DataModel as storage.

I will also use the magicdate module to automagically convert the date format to a datetime object.

import sys
import time
import xmltodict
from magicdate import magicdate  # import the function; the module itself is not callable
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

# Media is the author's Redis-backed model (not shown); this assumes
# models.py sits next to this script, since a relative import fails
# when the file is run directly with "python watch_for_changes.py"
from models import Media


class MyHandler(PatternMatchingEventHandler):
    patterns = ["*.xml"]

    def process(self, event):
        """
        event.event_type
            'modified' | 'created' | 'moved' | 'deleted'
        event.is_directory
            True | False
        event.src_path
            path/to/observed/file
        """

        with open(event.src_path, 'r') as xml_source:
            xml_string = xml_source.read()
            parsed = xmltodict.parse(xml_string)
            element = parsed.get('Pulsar', {}).get('OnAir', {}).get('media')
            if not element:
                return

            media = Media(
                title=element.get('title1'),
                description=element.get('title3'),
                media_id=element.get('media_id1'),
                hour=magicdate(element.get('hour')),
                length=element.get('title4')
            )
            media.save()

    def on_modified(self, event):
        self.process(event)

    def on_created(self, event):
        self.process(event)


if __name__ == '__main__':
    args = sys.argv[1:]
    observer = Observer()
    # watch the path given on the command line, or the current directory
    observer.schedule(MyHandler(), path=args[0] if args else '.')
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()

    observer.join()
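
The Media model imported above belongs to the author's project and is not shown. For readers who want to try the insertion step without it, here is a minimal sketch that swaps in the standard library's sqlite3 module as a stand-in storage. The table layout, the save_media helper and the sample values are all assumptions of this sketch. Only the mapping from XML keys (title1, title3, media_id1, hour, title4) to fields follows the script.

```python
import sqlite3


def save_media(connection, element):
    """Insert one parsed media mapping into a hypothetical `media` table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS media "
        "(title TEXT, description TEXT, media_id TEXT, hour TEXT, length TEXT)"
    )
    # Same key-to-field mapping as the Media(...) call in the script;
    # keys missing from the element are stored as NULL.
    connection.execute(
        "INSERT INTO media VALUES (?, ?, ?, ?, ?)",
        (
            element.get("title1"),
            element.get("title3"),
            element.get("media_id1"),
            element.get("hour"),
            element.get("title4"),
        ),
    )
    connection.commit()


# Made-up sample data, stored in an in-memory database.
connection = sqlite3.connect(":memory:")
save_media(connection, {"title1": "Example title", "media_id1": "42"})
rows = connection.execute("SELECT title, media_id, length FROM media").fetchall()
print(rows)
```

Inside the handler, process would call save_media(connection, element) in place of building and saving a Media object.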

Translated from: https://www.pybloggers.com/2013/12/watching-a-directory-for-file-changes-with-python/
