Processing Large Log Files with Python

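I have recently been looking into Process Monitor, and I needed some simple statistics on the log it captured to find out which operation types occur. The log is a file of roughly 620,000 lines. Sample data: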

"15:20:33.9935624","cmd.exe","1784","IRP_MJ_CREATE","C:\EDK","SUCCESS","Desired Access: Read Data/List Directory, Synchronize, Disposition: Open, Options: Directory, Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened"

"15:20:33.9935792","cmd.exe","1784","IRP_MJ_DIRECTORY_CONTROL","C:\EDK\build.*","SUCCESS","Type: QueryDirectory, Filter: build.*, 2: Build"

"15:20:33.9936119","cmd.exe","1784","IRP_MJ_DIRECTORY_CONTROL","C:\EDK","NO MORE FILES","Type: QueryDirectory"

"15:20:33.9936272","cmd.exe","1784","IRP_MJ_CLEANUP","C:\EDK","SUCCESS",""

"15:20:33.9936306","cmd.exe","1784","IRP_MJ_CLOSE","C:\EDK","SUCCESS",""

"15:20:33.9937049","cmd.exe","1784","IRP_MJ_CREATE","C:\EDK","SUCCESS","Desired Access: Read Data/List Directory, Synchronize, Disposition: Open, Options: Directory, Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened"

"15:20:33.9937169","cmd.exe","1784","IRP_MJ_DIRECTORY_CONTROL","C:\EDK\build","SUCCESS","Type: QueryDirectory, Filter: build, 2: Build"

"15:20:33.9937342","cmd.exe","1784","IRP_MJ_DIRECTORY_CONTROL","C:\EDK","NO MORE FILES","Type: QueryDirectory"

"15:20:33.9937476","cmd.exe","1784","IRP_MJ_CLEANUP","C:\EDK","SUCCESS",""

"15:20:33.9937541","cmd.exe","1784","IRP_MJ_CLOSE","C:\EDK","SUCCESS",""

"15:20:33.9938281","cmd.exe","1784","IRP_MJ_CREATE","C:\EDK\BaseTools\Bin","SUCCESS","Desired Access: Read Data/List Directory, Synchronize, Disposition: Open, Options: Directory, Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened"

"15:20:33.9938586","cmd.exe","1784","IRP_MJ_DIRECTORY_CONTROL","C:\EDK\BaseTools\Bin\build.*","NO SUCH FILE","Type: QueryDirectory, Filter: build.*"

"15:20:33.9938787","cmd.exe","1784","IRP_MJ_CLEANUP","C:\EDK\BaseTools\Bin","SUCCESS",""

I want to extract and record the operations that appear, such as IRP_MJ_DIRECTORY_CONTROL and IRP_MJ_CLEANUP.
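As a minimal sketch of pulling the operation out of a single record, the snippet below parses one line with the standard-library csv module. The helper name extract_operation is made up for illustration, and the assumption that the operation is always the fourth field comes only from the sample data above. Unlike a plain split on commas, csv.reader keeps the quoted detail field together and returns each field without its quotes.

```python
import csv

def extract_operation(line):
    """Return the operation (fourth field) of one CSV log line, or None."""
    # csv.reader accepts any iterable of strings, so a one-element list works.
    row = next(csv.reader([line]), None)
    if row is None or len(row) < 4:
        return None
    return row[3]

# One record copied from the sample data (backslashes escaped for Python).
line = ('"15:20:33.9935792","cmd.exe","1784","IRP_MJ_DIRECTORY_CONTROL",'
        '"C:\\EDK\\build.*","SUCCESS","Type: QueryDirectory, Filter: build.*, 2: Build"')

print(extract_operation(line))
```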

I had heard that Python is good at this kind of task, so I learned just enough of it to produce the statistics.

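The first problem is how to handle a large file (the current data is not really large, about 200 MB). I found an approach in [Reference 1], which reads the file lazily, one line at a time.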

The second problem is how to do the counting. I only need to record whether an operation occurs or not, so a dictionary is a natural fit.
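Since only presence matters, a set expresses the same idea as a dictionary whose values are all 1. The short sketch below uses a hand-written list of operation names (illustrative input, not real log data) to show that both containers end up with the same distinct entries.

```python
# Illustrative input; in the real script these would come from the log file.
operations = ["IRP_MJ_CREATE", "IRP_MJ_CLEANUP", "IRP_MJ_CREATE", "IRP_MJ_CLOSE"]

seen_dict = {}
seen_set = set()
for op in operations:
    seen_dict[op] = 1   # assigning again just overwrites the same key
    seen_set.add(op)    # adding an existing member changes nothing

print(sorted(seen_dict))
print(sorted(seen_set))
```

Either container stays small no matter how long the log is, because its size depends on the number of distinct operations, not on the number of lines.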

The final code is as follows (run under Python 2.7):

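from __future__ import print_function  # same script runs on Python 2.7 and 3


class Load_Corpus_with_Iteration(object):
    """Iterate over a file line by line so the whole log never sits in memory."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # The with-block closes the file once iteration finishes.
        with open(self.path) as f:
            for line in f:
                # Split on commas; the first four fields contain none themselves.
                yield line.split(',')


corpus = Load_Corpus_with_Iteration('logfile.csv')
operate = {}  # operation name -> 1 (only presence matters)
index = 0
for fields in corpus:
    index = index + 1
    if len(fields) > 3:
        # The fourth field is the operation; its surrounding quotes are kept.
        operate[fields[3]] = 1
    # Report progress every 10000 lines.
    if index % 10000 == 0:
        print(index, str(operate))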
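If a count per operation is wanted rather than just presence, one possible variation (not what the script above does) is collections.Counter combined with csv.reader, which also strips the quotes from each field. The helper name count_operations and the small temporary file standing in for logfile.csv are assumptions made for this sketch, which targets Python 3.

```python
import csv
import os
import tempfile
from collections import Counter

def count_operations(path, column=3):
    """Stream the CSV at `path` and count the values found in `column`."""
    counts = Counter()
    # newline='' is what the csv module documentation recommends for file objects.
    with open(path, newline='') as f:
        for row in csv.reader(f):
            if len(row) > column:
                counts[row[column]] += 1
    return counts

# Tiny stand-in for logfile.csv so the sketch runs on its own.
sample = (
    '"15:20:33.9935624","cmd.exe","1784","IRP_MJ_CREATE","C:\\EDK","SUCCESS",""\n'
    '"15:20:33.9936306","cmd.exe","1784","IRP_MJ_CLOSE","C:\\EDK","SUCCESS",""\n'
    '"15:20:33.9937049","cmd.exe","1784","IRP_MJ_CREATE","C:\\EDK","SUCCESS",""\n'
)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

counts = count_operations(tmp_path)
os.remove(tmp_path)

# most_common() lists operations from most to least frequent.
for op, n in counts.most_common():
    print(op, n)
```

The file is still read one row at a time, so memory use depends on the number of distinct operations rather than on the size of the log.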

The result is as follows (lightly reformatted for readability):

620000 {
'"IRP_MJ_DIRECTORY_CONTROL"': 1,
'"FASTIO_READ"': 1,
'"IRP_MJ_READ"': 1,
'"FASTIO_LOCK"': 1,
'"FASTIO_RELEASE_FOR_SECTION_SYNCHRONIZATION"': 1,
'"IRP_MJ_CLOSE"': 1,
'"IRP_MJ_QUERY_INFORMATION"': 1,
'"IRP_MJ_SET_INFORMATION"': 1,
'"FASTIO_ACQUIRE_FOR_SECTION_SYNCHRONIZATION"': 1,
'"FASTIO_QUERY_INFORMATION"': 1,
'"IRP_MJ_WRITE"': 1,
'"FASTIO_ACQUIRE_FOR_CC_FLUSH"': 1,
'"IRP_MJ_FILE_SYSTEM_CONTROL"': 1,
'"IRP_MJ_QUERY_VOLUME_INFORMATION"': 1,
'"FASTIO_WRITE"': 1,
'"IRP_MJ_CREATE"': 1,
'"FASTIO_NETWORK_QUERY_OPEN"': 1,
'"FASTIO_RELEASE_FOR_CC_FLUSH"': 1,
'"FASTIO_UNLOCK_SINGLE"': 1,
'"IRP_MJ_CLEANUP"': 1,
'"FASTIO_CHECK_IF_POSSIBLE"': 1}

References:

1. https://blog.csdn.net/chixujohnny/article/details/53069988
