Extracting specific records from a log file with Python

Latest recommended article published on 2024-08-03 15:44:40

Y_zlin

Article tags: java python frontend

Article link: https://blog.csdn.net/hhxbc/article/details/126338527

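This post uses Python's re module to match records in a log file with regular expressions.

readlines() stores the records of the log file in a list, and the code then walks through the list line by line, applying the regular expression to each line.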
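The idea above can be sketched in a few lines. This is a minimal illustration, not the code used later in the post: the helper name `matching_lines` and the sample log lines are made up for the example, and the list stands in for what `f.readlines()` would return.

```python
import re


def matching_lines(lines, pattern):
    """Return the lines (trailing newline removed) in which pattern occurs."""
    compiled = re.compile(pattern)  # compile once, reuse for every line
    return [line.rstrip("\n") for line in lines if compiled.search(line)]


# Hypothetical sample data standing in for f.readlines()
sample_lines = [
    "2022-08-15 10:00:01 INFO task started\n",
    "2022-08-15 10:00:02 ERROR task failed\n",
    "2022-08-15 10:00:03 INFO task finished\n",
]

errors = matching_lines(sample_lines, r"ERROR")
print(errors)
```

`search` is used rather than `findall` because only a yes/no answer per line is needed here.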

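Background notes:

Python file objects offer three read methods: read, readline, and readlines.

read(): reads the whole file at once into a single string. When the file size is unknown, call read(size) instead, which returns at most size characters per call; a larger size means fewer calls but more memory used per call.

readline(): reads one line per call. It keeps memory use low, but in practice iterating over the file object (for line in f) is the more common way to read line by line.

readlines(): reads the whole file at once and returns it as a list of lines, which is convenient to loop over.

In general, use read() for small files and read(size) when the size is unknown. readlines() also loads the entire file into memory, so for large files it is safer to iterate over the file object line by line.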
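To make the difference between the read methods concrete, here is a small sketch that writes a three-line temporary file and reads it back in each way. The file content and variable names are invented for the example.

```python
import os
import tempfile

# Write a small hypothetical log file to read back.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("line one\nline two\nline three\n")
    path = tmp.name

with open(path, encoding="utf-8") as f:
    whole = f.read()            # the entire file as one string

with open(path, encoding="utf-8") as f:
    chunk = f.read(8)           # at most 8 characters

with open(path, encoding="utf-8") as f:
    first = f.readline()        # one line, trailing newline included
    second = f.readline()       # the next line

with open(path, encoding="utf-8") as f:
    as_list = f.readlines()     # every line, collected into a list

with open(path, encoding="utf-8") as f:
    # Iterating over the file object yields one line at a time,
    # so the whole file is never held in memory at once.
    line_count = sum(1 for _ in f)

os.remove(path)  # clean up the temporary file

print(repr(chunk), repr(first), as_list, line_count)
```

For a log file of a few kilobytes any of these works; the last form is the one that scales to files larger than available memory.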

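import re

# Compile the patterns once instead of on every loop iteration.
# The r prefix keeps backslashes in a pattern from being treated as escapes.
ERROR_PATTERN = re.compile(r'ERROR')      # marks the first line of a new record
FAILED_PATTERN = re.compile(r'执行失败')   # "execution failed", the text to extract


class WcsLog:
    def __init__(self, path="./log.log"):
        self.path = path
        self.data_ev = {}

    def report(self, record):
        """Print the record and its timestamp if it contains the target text."""
        if not FAILED_PATTERN.search(record):
            return 0
        print(record)
        # Split on whitespace and take the fields we need; the first two
        # fields are assumed here to be the date and the time.
        log = record.split()
        self.data_ev["date"] = log[0] + " " + log[1]
        print("Matched time:", self.data_ev["date"])
        print("\n")
        return 1

    def extract(self):
        print("Extracting ERROR logs")
        count = 0
        record = ''
        with open(self.path, encoding="utf-8") as f:
            # Merge multi-line entries into one record per ERROR line.
            for line in f:
                line = line.rstrip('\n')
                if record and ERROR_PATTERN.search(line):
                    # A new ERROR line starts, so the previous record is complete.
                    count += self.report(record)
                    record = line
                else:
                    # Continuation line: append it to the current record.
                    record = (record + " " + line).strip()
        # Handle the last record, which the loop itself never reports.
        if record:
            count += self.report(record)
        print("Total:", count, "records")
        return count


if __name__ == "__main__":
    WcsLog().extract()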
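Splitting on whitespace and picking fields by index works, but it breaks quietly when the layout changes. As an alternative sketch, a regular expression with named groups can pull the fields out directly. The layout assumed here ("date time LEVEL message") and the names `RECORD_PATTERN` and `parse_record` are hypothetical; the real log format may differ.

```python
import re

# Assumed layout: "<date> <time> <LEVEL> <message>". Adjust to the real log.
RECORD_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}(?:[.,]\d+)?)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*)$"
)


def parse_record(record):
    """Return the record's fields as a dict, or None if the layout differs."""
    match = RECORD_PATTERN.match(record.strip())
    return match.groupdict() if match else None


# Hypothetical sample record
sample = "2022-08-15 10:00:02 ERROR task 42 执行失败 timeout"
fields = parse_record(sample)
print(fields["date"], fields["time"], fields["level"])
```

Returning None for lines that do not fit lets the caller skip or log unexpected records instead of failing with an IndexError.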