系统日志类文本文件中数据的分类统计计算-python

使用Python进行MySQL错误日志分析,统计特定错误信息的每日出现频率,通过文本处理和数据结构操作实现高效分类统计。

python分类统计计算系统日志类文本文件中的数据

程序运行环境:windows 7, Anaconda3 Jupyter Notebook
类型:读书笔记
来源:Python数据分析基础-Clinton W. Brownley. 2017第一版,ISBN:978-7-115-46335-7
Foundations for Analystics with python by Clinton Brownley(O’Reilly).Copyright 2016 Clinton Brownley, 978-1-491-92253-8
第五章,5.3节,P167-174

错误日志通常以文本形式存储。以MySQL错误日志(扩展名为.err)为例。
便于打开,本文python分析用了txt格式。用文本编辑其打开日志文件,更名为testMySQLog.txtg,内容如下:

==========================================================================
246824 10:40:50 mysqld_safe Starting mysald daemon with databases from usr/local/mysql/data
2014-02-03 10:40:55 98765 [note] InnoDB:Compressstables use zlib 1.2.3
2014-02-03 10:40:55 98765 [note] InnoDB:Using atomics to ref count buffer pool pages
2014-02-03 10:47:18 98765 [note] InnoDB:5.6.16 started;log sequence number 123456
2014-02-03 10:47:18 64208 [note] InnoDB:Using atomics to ref count buffer pool pages
2014-02-03 10:47:18 64208 [note] InnoDB:Compressstables use zlib 1.2.3
2014-02-03 10:55:55 64208 [note] InnoDB:usr/local/mysql/bin/mysqld: shutdown complete

135791 15:59:29 mysqld_safe Starting mysald daemon with databases from usr/local/mysql/data
2014-03-07 10:40:55 98765 [note] InnoDB:Compressstables use zlib 1.2.3
2014-03-07 10:40:55 98765 [note] InnoDB:Compressstables use zlib 1.2.3
2014-03-07 10:47:18 98765 [note] InnoDB:5.6.16 started;log sequence number 123456
2014-03-07 10:47:18 64208 [note] InnoDB:Using atomics to ref count buffer pool pages
2014-03-07 10:47:18 64208 [note] InnoDB:Compressstables use zlib 1.2.3
2014-03-07 10:55:55 64208 [note] InnoDB:usr/local/mysql/bin/mysqld: shutdown complete

12457815:59:29 mysqld_safe Starting mysald daemon with databases from usr/local/mysql/data
2014-10-27 10:40:55 98765 [note] InnoDB:Completed initialization of buffer pool
2014-10-27 10:40:55 98765 [note] InnoDB:IPv6 available
2014-10-27 10:47:18 98765 [note] InnoDB:5.6.16 started;log sequence number 123456
2014-10-27 10:47:18 64208 [note] InnoDB:Completed initialization of buffer pool
2014-10-27 10:47:18 64208 [note] InnoDB:IPv6 available
2014-10-27 10:55:55 64208 [note] InnoDB:usr/local/mysql/bin/mysqld: shutdown complete

==========================================================================

利用phthon代码统计报错信息在每天出现的次数,代码如下:

#!/usr/bin/env/python 
#文本文件中数据的任意数目分类计算统计量
import sys
import string
#调试模式
DEBUG_1 = False 
DEBUG_2 = False
DEBUG_3 = False
DEBUG_4 = False
imput_file= 'testMySQLog.txt' #日志文件路径
output_file ='log_error.csv'  #记录日志错误
message ={ }#空字典,是一个嵌套字典
notes = [ ]# 空列表,保存输入的错误日志文件中所有日期发生的全部错误消息
count=0
with open(imput_file,'r',newline='') as text_file:#打开输入文件,供读取
    if DEBUG_1:#调试模式
        print(text_file)
    for row in text_file:
        count+=1
        if DEBUG_2:#调试模式
            print('行{0:d}:'.format(count),row)
        if '[note]' in row: #提取包含[note]这个关键词的行
            row_list = row.split(' ',4)#分词
            day=row_list[0].strip()
            note= row_list[4].strip('\n').strip()
            if DEBUG_3:
                print('第{0:d}次循环:\n'.format(count))
                print(row_list)
                print('day:',day)
                print('note:',note)
            if note not in notes:#记录错误信息note到列表notes中
                notes.append(note)
                if DEBUG_3:
                    print('notes:',notes)
            #每条错误消息与对应的计数值可以反映在对应的日期中
            #即每天某条错误消息出现的次数
            if day not in message:#检验day是否包含在字典message中
                message[day]={ }#日期作为关键字添加到字典中
                if DEBUG_3:
                    print('message:',message)
            if note not in message[day]:#检验note消息是否是字典中的一个键
                message[day][note]=1#note还不是字典中的一个键,首次出现
            else:
                message[day][note]+=1#note已经是字典中的一个键,出现次数加1
    if DEBUG_3:
        print('最终结果:\n')
        print('notes:',notes)
        print('message:',message)

#记录统计信息
filewriter = open(output_file,'w',newline='')
header=['date']#创建一个列表变量,并将字符串date赋给这个列表
header.extend(notes)
if DEBUG_4:
    print('extend header:',header)
header=','.join(map(str,header))+'\n'#把header转为长字符串,并满足csv的格式
if DEBUG_4:
    print('join header:',header)
    print('type header:',type(header))
filewriter.write(header)#将标题写入csv文件中
count=0
for day, day_value in message.items():#将messsage中的信息整理好,写入csv文件中
    row_of_output = [ ]
    row_of_output.append(day)#文件的第一列,日期
    if DEBUG_4:
        count+=1
        print('第{0:d}次循环:'.format(count))
        print('day:',day)
        print('day_value:',day_value)
        print('row_of_output:',row_of_output)
        print('notes:',notes)
    for index in range(len(notes)):#遍历记录的所有报错消息
        if notes[index] in day_value.keys():#某天对应的报错信息是否出现
            row_of_output.append(day_value[notes[index]])#记录报错信息出现的次数
        else:
            row_of_output.append(0)#如该消息在当天没有出现,则补零
        if DEBUG_4:
            print('遍历报错消息\n','第{0:d}次循环'.format(index))
            print('row_of_output',row_of_output)
    #row_of_output转为长字符串,并满足csv的格式
    output=','.join(map(str,row_of_output))+'\n'
    filewriter.write(output)#写入csv文件   
filewriter.close()#释放文件资源
    

代码中利用DEBUG_1,DEBUG_2,DEBUG_3,DEBUG_4分别展示了不同阶段代码的运行情况。

============================================================================
#调试模式
DEBUG_1 = True
DEBUG_2 = False
DEBUG_3 = False
DEBUG_4 = False
打开文件的句柄
运行结果:
<_io.TextIOWrapper name=‘testMySQLog.txt’ mode=‘r’ encoding=‘cp936’>

============================================================================
#调试模式
DEBUG_1 = False
DEBUG_2 = True
DEBUG_3 = False
DEBUG_4 = False
row遍历文件时每一次读到的数据
运行结果:
行1: 246824 10:40:50 mysqld_safe Starting mysald daemon with databases from usr/local/mysql/data

行2: 2014-02-03 10:40:55 98765 [note] InnoDB:Compressstables use zlib 1.2.3

行3: 2014-02-03 10:40:55 98765 [note] InnoDB:Using atomics to ref count buffer pool pages

行4: 2014-02-03 10:47:18 98765 [note] InnoDB:5.6.16 started;log sequence number 123456

行5: 2014-02-03 10:47:18 64208 [note] InnoDB:Using atomics to ref count buffer pool pages

行6: 2014-02-03 10:47:18 64208 [note] InnoDB:Compressstables use zlib 1.2.3

行7: 2014-02-03 10:55:55 64208 [note] InnoDB:usr/local/mysql/bin/mysqld: shutdown complete

行8:

行9: 135791 15:59:29 mysqld_safe Starting mysald daemon with databases from usr/local/mysql/data

行10: 2014-03-07 10:40:55 98765 [note] InnoDB:Compressstables use zlib 1.2.3

行11: 2014-03-07 10:40:55 98765 [note] InnoDB:Compressstables use zlib 1.2.3

行12: 2014-03-07 10:47:18 98765 [note] InnoDB:5.6.16 started;log sequence number 123456

行13: 2014-03-07 10:47:18 64208 [note] InnoDB:Using atomics to ref count buffer pool pages

行14: 2014-03-07 10:47:18 64208 [note] InnoDB:Compressstables use zlib 1.2.3

行15: 2014-03-07 10:55:55 64208 [note] InnoDB:usr/local/mysql/bin/mysqld: shutdown complete

行16:

行17: 12457815:59:29 mysqld_safe Starting mysald daemon with databases from usr/local/mysql/data

行18: 2014-10-27 10:40:55 98765 [note] InnoDB:Completed initialization of buffer pool

行19: 2014-10-27 10:40:55 98765 [note] InnoDB:IPv6 available

行20: 2014-10-27 10:47:18 98765 [note] InnoDB:5.6.16 started;log sequence number 123456

行21: 2014-10-27 10:47:18 64208 [note] InnoDB:Completed initialization of buffer pool

行22: 2014-10-27 10:47:18 64208 [note] InnoDB:IPv6 available

行23: 2014-10-27 10:55:55 64208 [note] InnoDB:usr/local/mysql/bin/mysqld: shutdown complete

============================================================================

#调试模式
DEBUG_1 = False
DEBUG_2 = False
DEBUG_3 = True
DEBUG_4 = False
提取日志中包含[note]这个关键词的行,写入字典message中
message 格式 日期:报错信息列表
报错信息列表: 报错信息(字符格式):出现次数
到此,统计已经完成
运行结果:
第2次循环:

[‘2014-02-03’, ‘10:40:55’, ‘98765’, ‘[note]’, ‘InnoDB:Compressstables use zlib 1.2.3\r\n’]
day: 2014-02-03
note: InnoDB:Compressstables use zlib 1.2.3
notes: [‘InnoDB:Compressstables use zlib 1.2.3’]
message: {‘2014-02-03’: {}}
第3次循环:

[‘2014-02-03’, ‘10:40:55’, ‘98765’, ‘[note]’, ‘InnoDB:Using atomics to ref count buffer pool pages\r\n’]
day: 2014-02-03
note: InnoDB:Using atomics to ref count buffer pool pages
notes: [‘InnoDB:Compressstables use zlib 1.2.3’, ‘InnoDB:Using atomics to ref count buffer pool pages’]
第4次循环:

[‘2014-02-03’, ‘10:47:18’, ‘98765’, ‘[note]’, ‘InnoDB:5.6.16 started;log sequence number 123456\r\n’]
day: 2014-02-03
note: InnoDB:5.6.16 started;log sequence number 123456
notes: [‘InnoDB:Compressstables use zlib 1.2.3’, ‘InnoDB:Using atomics to ref count buffer pool pages’, ‘InnoDB:5.6.16 started;log sequence number 123456’]
第5次循环:

[‘2014-02-03’, ‘10:47:18’, ‘64208’, ‘[note]’, ‘InnoDB:Using atomics to ref count buffer pool pages\r\n’]
day: 2014-02-03
note: InnoDB:Using atomics to ref count buffer pool pages
第6次循环:

[‘2014-02-03’, ‘10:47:18’, ‘64208’, ‘[note]’, ‘InnoDB:Compressstables use zlib 1.2.3\r\n’]
day: 2014-02-03
note: InnoDB:Compressstables use zlib 1.2.3
第7次循环:

[‘2014-02-03’, ‘10:55:55’, ‘64208’, ‘[note]’, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete\r\n’]
day: 2014-02-03
note: InnoDB:usr/local/mysql/bin/mysqld: shutdown complete
notes: [‘InnoDB:Compressstables use zlib 1.2.3’, ‘InnoDB:Using atomics to ref count buffer pool pages’, ‘InnoDB:5.6.16 started;log sequence number 123456’, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’]
第10次循环:

[‘2014-03-07’, ‘10:40:55’, ‘98765’, ‘[note]’, ‘InnoDB:Compressstables use zlib 1.2.3\r\n’]
day: 2014-03-07
note: InnoDB:Compressstables use zlib 1.2.3
message: {‘2014-02-03’: {‘InnoDB:Compressstables use zlib 1.2.3’: 2, ‘InnoDB:Using atomics to ref count buffer pool pages’: 2, ‘InnoDB:5.6.16 started;log sequence number 123456’: 1, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’: 1}, ‘2014-03-07’: {}}
第11次循环:

[‘2014-03-07’, ‘10:40:55’, ‘98765’, ‘[note]’, ‘InnoDB:Compressstables use zlib 1.2.3\r\n’]
day: 2014-03-07
note: InnoDB:Compressstables use zlib 1.2.3
第12次循环:

[‘2014-03-07’, ‘10:47:18’, ‘98765’, ‘[note]’, ‘InnoDB:5.6.16 started;log sequence number 123456\r\n’]
day: 2014-03-07
note: InnoDB:5.6.16 started;log sequence number 123456
第13次循环:

[‘2014-03-07’, ‘10:47:18’, ‘64208’, ‘[note]’, ‘InnoDB:Using atomics to ref count buffer pool pages\r\n’]
day: 2014-03-07
note: InnoDB:Using atomics to ref count buffer pool pages
第14次循环:

[‘2014-03-07’, ‘10:47:18’, ‘64208’, ‘[note]’, ‘InnoDB:Compressstables use zlib 1.2.3\r\n’]
day: 2014-03-07
note: InnoDB:Compressstables use zlib 1.2.3
第15次循环:

[‘2014-03-07’, ‘10:55:55’, ‘64208’, ‘[note]’, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete\r\n’]
day: 2014-03-07
note: InnoDB:usr/local/mysql/bin/mysqld: shutdown complete
第18次循环:

[‘2014-10-27’, ‘10:40:55’, ‘98765’, ‘[note]’, ‘InnoDB:Completed initialization of buffer pool\r\n’]
day: 2014-10-27
note: InnoDB:Completed initialization of buffer pool
notes: [‘InnoDB:Compressstables use zlib 1.2.3’, ‘InnoDB:Using atomics to ref count buffer pool pages’, ‘InnoDB:5.6.16 started;log sequence number 123456’, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’, ‘InnoDB:Completed initialization of buffer pool’]
message: {‘2014-02-03’: {‘InnoDB:Compressstables use zlib 1.2.3’: 2, ‘InnoDB:Using atomics to ref count buffer pool pages’: 2, ‘InnoDB:5.6.16 started;log sequence number 123456’: 1, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’: 1}, ‘2014-03-07’: {‘InnoDB:Compressstables use zlib 1.2.3’: 3, ‘InnoDB:5.6.16 started;log sequence number 123456’: 1, ‘InnoDB:Using atomics to ref count buffer pool pages’: 1, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’: 1}, ‘2014-10-27’: {}}
第19次循环:

[‘2014-10-27’, ‘10:40:55’, ‘98765’, ‘[note]’, ‘InnoDB:IPv6 available\r\n’]
day: 2014-10-27
note: InnoDB:IPv6 available
notes: [‘InnoDB:Compressstables use zlib 1.2.3’, ‘InnoDB:Using atomics to ref count buffer pool pages’, ‘InnoDB:5.6.16 started;log sequence number 123456’, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’, ‘InnoDB:Completed initialization of buffer pool’, ‘InnoDB:IPv6 available’]
第20次循环:

[‘2014-10-27’, ‘10:47:18’, ‘98765’, ‘[note]’, ‘InnoDB:5.6.16 started;log sequence number 123456\r\n’]
day: 2014-10-27
note: InnoDB:5.6.16 started;log sequence number 123456
第21次循环:

[‘2014-10-27’, ‘10:47:18’, ‘64208’, ‘[note]’, ‘InnoDB:Completed initialization of buffer pool\r\n’]
day: 2014-10-27
note: InnoDB:Completed initialization of buffer pool
第22次循环:

[‘2014-10-27’, ‘10:47:18’, ‘64208’, ‘[note]’, ‘InnoDB:IPv6 available\r\n’]
day: 2014-10-27
note: InnoDB:IPv6 available
第23次循环:

[‘2014-10-27’, ‘10:55:55’, ‘64208’, ‘[note]’, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’]
day: 2014-10-27
note: InnoDB:usr/local/mysql/bin/mysqld: shutdown complete
最终结果:

notes: [‘InnoDB:Compressstables use zlib 1.2.3’, ‘InnoDB:Using atomics to ref count buffer pool pages’, ‘InnoDB:5.6.16 started;log sequence number 123456’, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’, ‘InnoDB:Completed initialization of buffer pool’, ‘InnoDB:IPv6 available’]
message: {‘2014-02-03’: {‘InnoDB:Compressstables use zlib 1.2.3’: 2, ‘InnoDB:Using atomics to ref count buffer pool pages’: 2, ‘InnoDB:5.6.16 started;log sequence number 123456’: 1, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’: 1}, ‘2014-03-07’: {‘InnoDB:Compressstables use zlib 1.2.3’: 3, ‘InnoDB:5.6.16 started;log sequence number 123456’: 1, ‘InnoDB:Using atomics to ref count buffer pool pages’: 1, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’: 1}, ‘2014-10-27’: {‘InnoDB:Completed initialization of buffer pool’: 2, ‘InnoDB:IPv6 available’: 2, ‘InnoDB:5.6.16 started;log sequence number 123456’: 1, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’: 1}}

===========================================================================

#调试模式
DEBUG_1 = False
DEBUG_2 = False
DEBUG_3 = False
DEBUG_4 = True
将保存在message中的统计结果保存到log_error.csv文件中,保存格式为

date报错信息1报错信息2… …报错信息n
2014-02-03出现次数出现次数出现次数出现次数
2014-03-07出现次数出现次数出现次数出现次数
2014-10-27出现次数出现次数出现次数出现次数

运行结果:
extend header: [‘date’, ‘InnoDB:Compressstables use zlib 1.2.3’, ‘InnoDB:Using atomics to ref count buffer pool pages’, ‘InnoDB:5.6.16 started;log sequence number 123456’, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’, ‘InnoDB:Completed initialization of buffer pool’, ‘InnoDB:IPv6 available’]
join header: date,InnoDB:Compressstables use zlib 1.2.3,InnoDB:Using atomics to ref count buffer pool pages,InnoDB:5.6.16 started;log sequence number 123456,InnoDB:usr/local/mysql/bin/mysqld: shutdown complete,InnoDB:Completed initialization of buffer pool,InnoDB:IPv6 available

type header: <class ‘str’>
第1次循环:
day: 2014-02-03
day_value: {‘InnoDB:Compressstables use zlib 1.2.3’: 2, ‘InnoDB:Using atomics to ref count buffer pool pages’: 2, ‘InnoDB:5.6.16 started;log sequence number 123456’: 1, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’: 1}
row_of_output: [‘2014-02-03’]
notes: [‘InnoDB:Compressstables use zlib 1.2.3’, ‘InnoDB:Using atomics to ref count buffer pool pages’, ‘InnoDB:5.6.16 started;log sequence number 123456’, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’, ‘InnoDB:Completed initialization of buffer pool’, ‘InnoDB:IPv6 available’]
遍历报错消息
第0次循环
row_of_output [‘2014-02-03’, 2]
遍历报错消息
第1次循环
row_of_output [‘2014-02-03’, 2, 2]
遍历报错消息
第2次循环
row_of_output [‘2014-02-03’, 2, 2, 1]
遍历报错消息
第3次循环
row_of_output [‘2014-02-03’, 2, 2, 1, 1]
遍历报错消息
第4次循环
row_of_output [‘2014-02-03’, 2, 2, 1, 1, 0]
遍历报错消息
第5次循环
row_of_output [‘2014-02-03’, 2, 2, 1, 1, 0, 0]
第2次循环:
day: 2014-03-07
day_value: {‘InnoDB:Compressstables use zlib 1.2.3’: 3, ‘InnoDB:5.6.16 started;log sequence number 123456’: 1, ‘InnoDB:Using atomics to ref count buffer pool pages’: 1, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’: 1}
row_of_output: [‘2014-03-07’]
notes: [‘InnoDB:Compressstables use zlib 1.2.3’, ‘InnoDB:Using atomics to ref count buffer pool pages’, ‘InnoDB:5.6.16 started;log sequence number 123456’, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’, ‘InnoDB:Completed initialization of buffer pool’, ‘InnoDB:IPv6 available’]
遍历报错消息
第0次循环
row_of_output [‘2014-03-07’, 3]
遍历报错消息
第1次循环
row_of_output [‘2014-03-07’, 3, 1]
遍历报错消息
第2次循环
row_of_output [‘2014-03-07’, 3, 1, 1]
遍历报错消息
第3次循环
row_of_output [‘2014-03-07’, 3, 1, 1, 1]
遍历报错消息
第4次循环
row_of_output [‘2014-03-07’, 3, 1, 1, 1, 0]
遍历报错消息
第5次循环
row_of_output [‘2014-03-07’, 3, 1, 1, 1, 0, 0]
第3次循环:
day: 2014-10-27
day_value: {‘InnoDB:Completed initialization of buffer pool’: 2, ‘InnoDB:IPv6 available’: 2, ‘InnoDB:5.6.16 started;log sequence number 123456’: 1, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’: 1}
row_of_output: [‘2014-10-27’]
notes: [‘InnoDB:Compressstables use zlib 1.2.3’, ‘InnoDB:Using atomics to ref count buffer pool pages’, ‘InnoDB:5.6.16 started;log sequence number 123456’, ‘InnoDB:usr/local/mysql/bin/mysqld: shutdown complete’, ‘InnoDB:Completed initialization of buffer pool’, ‘InnoDB:IPv6 available’]
遍历报错消息
第0次循环
row_of_output [‘2014-10-27’, 0]
遍历报错消息
第1次循环
row_of_output [‘2014-10-27’, 0, 0]
遍历报错消息
第2次循环
row_of_output [‘2014-10-27’, 0, 0, 1]
遍历报错消息
第3次循环
row_of_output [‘2014-10-27’, 0, 0, 1, 1]
遍历报错消息
第4次循环
row_of_output [‘2014-10-27’, 0, 0, 1, 1, 2]
遍历报错消息
第5次循环
row_of_output [‘2014-10-27’, 0, 0, 1, 1, 2, 2]

可以到本代码同一目录下打开log_error.csv,可以得到以下表格

在这里插入图片描述

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值