watchdog (Python): monitor only one file format and ignore all others in "PatternMatchingEventHandler"...


weixin_39838028


Tags: python watchdog pattern

Article link: https://blog.csdn.net/weixin_39838028/article/details/111450669


I'm running code from this article and have modified it to monitor file creations/additions of only one format, .csv, in a specified directory.

The problem now is this: my program breaks (it stops monitoring but keeps running) whenever the newly added file is not a .csv file. To compensate, I tried the ignore_patterns argument, but the program still stops monitoring after a file of another format is added:

```python
PatternMatchingEventHandler(patterns="*.csv", ignore_patterns=["*~"], ignore_directories=True, case_sensitive=True)
```
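For reference, here is a rough sketch of how an include list and an ignore list are meant to combine, assuming fnmatch-style, case-sensitive glob matching. should_dispatch is a hypothetical helper, not part of watchdog, and watchdog's real matching may differ in details. The point it illustrates: ignore_patterns only removes names that the include patterns would otherwise let through, so ["*~"] has no effect on an .ipynb file.

```python
from fnmatch import fnmatchcase


def should_dispatch(path, patterns, ignore_patterns=()):
    """Hypothetical approximation of a pattern-matching event filter.

    A path is dispatched when it matches at least one include pattern
    and none of the ignore patterns (case-sensitive glob matching).
    """
    included = any(fnmatchcase(path, p) for p in patterns)
    ignored = any(fnmatchcase(path, p) for p in ignore_patterns)
    return included and not ignored


# With a proper include list, non-.csv names are already filtered out,
# so the "*~" ignore pattern is redundant here.
for name in ["data.csv", "data.csv~", "notes.ipynb"]:
    print(name, should_dispatch(name, ["*.csv"], ["*~"]))
```

An ignore pattern only matters for names that also match an include pattern, e.g. excluding "*_tmp.csv" while including "*.csv".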

The complete code is:

```python
import time
import csv
from datetime import datetime
from os import path

from pandas import read_csv
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler


# class that takes care of everything
class file_validator(PatternMatchingEventHandler):

    def __init__(self, source_path):
        # setting parameters for 'PatternMatchingEventHandler'
        super(file_validator, self).__init__(patterns="*.csv", ignore_patterns=["*~"], ignore_directories=True, case_sensitive=True)
        self.source_path = source_path
        self.print_info = None

    def on_created(self, event):
        # this is the new file that was created
        new_file = event.src_path

        # details of each new .csv file
        # demographic details
        file_name = path.basename(new_file)
        file_size = f"{path.getsize(new_file) / 1024} KiB"  # 1 KiB = 1024 bytes
        # note: getmtime is the last-modification time of the file
        file_creation = f"{datetime.fromtimestamp(path.getmtime(new_file)).strftime('%Y-%m-%d %H:%M:%S')}"
        new_data = read_csv(new_file)

        # more details
        number_columns = new_data.shape[1]
        data_types_data = [
            ('float' if i == 'float64' else ('int' if i == 'int64' else ('character' if i == 'object' else i))) for i in
            [x.name for x in list(new_data.dtypes)]]
        null_count_data = list(dict(new_data.isna().sum()).values())

        print(f"{file_name}, {file_size}, {file_creation}, {number_columns}")

        # keep the latest summary so it can be read later via return_logs()
        self.print_info = f"{file_name}, {file_size}, {file_creation}, {number_columns}"

    def return_logs(self):
        return self.print_info


# main function
if __name__ == "__main__":
    some_path = "C:\\Users\\neevaN_Reddy\\Documents\\learning dash\\"
    my_validator = file_validator(source_path=some_path)
    my_observer = Observer()
    my_observer.schedule(my_validator, some_path, recursive=True)
    my_observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        my_observer.stop()
    my_observer.join()

    # only reached after Ctrl+C; return_logs must be called (note the
    # parentheses), otherwise this prints the bound method object
    print(my_validator.return_logs())
```
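On accessing the collected info: the handler's callbacks run on the observer's thread, not on the main thread, so an attribute like print_info only ever holds the most recent summary. A minimal sketch of one common alternative, assuming you want every result in the main thread, is to have the callback push onto a queue.Queue. SummaryHandler, simulate_observer and drain are hypothetical names, and a plain threading.Thread stands in for watchdog's observer thread.

```python
import queue
import threading


class SummaryHandler:
    """Hypothetical stand-in for file_validator: its callback runs on a
    worker thread and pushes each summary onto a thread-safe queue
    instead of overwriting a single attribute."""

    def __init__(self):
        self.results = queue.Queue()

    def on_created(self, src_path):
        # The real handler would build this string from event.src_path.
        self.results.put(f"{src_path} processed")


def simulate_observer(handler, paths):
    # Stand-in for the observer thread dispatching creation events.
    for p in paths:
        handler.on_created(p)


def drain(q):
    """Return every item currently in the queue without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


handler = SummaryHandler()
worker = threading.Thread(target=simulate_observer, args=(handler, ["a.csv", "b.csv"]))
worker.start()
worker.join()
for line in drain(handler.results):
    print(line)
```

In the real script, the main loop's time.sleep(1) could be replaced by handler.results.get(timeout=1) inside a try/except queue.Empty, so each summary is printed as soon as it arrives.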

EDIT 1 (after Quentin Pradet's comment):

Following your suggestion in the comments, I've changed my arguments to:

```python
super(file_validator, self).__init__(patterns="*.csv",
                                     # ignore_patterns=["*~"],
                                     ignore_directories=True,
                                     case_sensitive=True)
```

When I copy in a file of another format (I tried an .ipynb file), I see the following error, and after that the program stops monitoring even .csv files:

```
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\watchdog\observers\api.py", line 199, in run
    self.dispatch_events(self.event_queue, self.timeout)
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\watchdog\observers\api.py", line 368, in dispatch_events
    handler.dispatch(event)
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\watchdog\events.py", line 454, in dispatch
    _method_map[event_type](event)
  File "C:/Users/neevaN_Reddy/Documents/Work/Project-Aretaeus/diabetes_risk project/file validation using a class.py", line 26, in on_created
    new_data = read_csv(new_file)
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers.py", line 463, in _read
    data = parser.read(nrows)
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers.py", line 1154, in read
    ret = self._engine.read(nrows)
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers.py", line 2059, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 896, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 950, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 937, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2132, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2
```

Apparently pandas fails to parse the new file, which means my on_created handler is also being triggered for files that are not .csv. I presume this means something has to go into the ignore_patterns argument so that on_created is not called when a file of some other format is added.
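The traceback also shows why monitoring stops altogether: the exception raised inside on_created propagates out of the observer's thread ("Exception in thread Thread-1"), which ends that thread, so no further events are dispatched. Independent of the pattern fix, a minimal sketch of isolating each file's processing so one bad file cannot stop the loop is shown below. process_safely and strict_reader are hypothetical names, and strict_reader only stands in for read_csv raising on a non-CSV file.

```python
import logging


def process_safely(callback, path):
    """Run one per-file callback; log and swallow its exception so that a
    single bad file does not end the loop that is dispatching events."""
    try:
        callback(path)
    except Exception:
        logging.exception("could not process %s", path)
        return False
    return True


def strict_reader(path):
    # Hypothetical stand-in for read_csv failing on a non-CSV file.
    if not path.endswith(".csv"):
        raise ValueError(f"cannot parse {path} as CSV")


# Simulated dispatch loop: the failing middle file does not stop the others.
results = [process_safely(strict_reader, p) for p in ["a.csv", "notes.ipynb", "b.csv"]]
print(results)  # [True, False, True]
```

Inside the real handler, the same idea means wrapping the read_csv call (or the whole body of on_created) in try/except, for example catching pandas.errors.ParserError and returning early.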

Solution

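Can you try passing patterns as a list instead of a string, e.g. patterns=["*.csv"]? watchdog iterates over patterns, so a plain string such as "*.csv" is treated as the one-character patterns "*", ".", "c", "s" and "v"; the "*" pattern matches every file, which is why on_created also fires for the .ipynb file.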
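A small sketch of why the string/list difference matters, assuming the handler iterates over patterns and glob-matches each element (fnmatch-style). matches_any is a hypothetical stand-in for that check, not watchdog's API.

```python
from fnmatch import fnmatchcase


def matches_any(path, patterns):
    # Treat every element of `patterns` as one glob pattern,
    # the way a pattern-matching handler iterates over its include list.
    return any(fnmatchcase(path, p) for p in patterns)


# Iterating a string yields single characters, one of which is "*".
print(sorted(set("*.csv")))                   # ['*', '.', 'c', 's', 'v']
print(matches_any("notes.ipynb", "*.csv"))    # True: the "*" element matches anything
print(matches_any("notes.ipynb", ["*.csv"]))  # False: one real "*.csv" pattern
print(matches_any("data.csv", ["*.csv"]))     # True
```

In the handler itself the change is only the constructor call: super(file_validator, self).__init__(patterns=["*.csv"], ignore_directories=True, case_sensitive=True).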
