Analyzing Nginx Logs with Python Regular Expressions


樵枫


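I needed to analyze some Nginx logs and did not feel like studying open-source tools such as Logstash, so I simply wrote a script tailored to what I needed.

First, the log format. Ours is not quite the same as other people's, so there was no way around handling it ourselves: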

12.195.166.35 [10/May/2015:14:38:09 +0800] "list.xxxx.com" "GET /new/10:00/9.html?cat=0,0&sort=price_asc HTTP/1.0" 200 42164 "http://list.linuxidc.com/new/10:00/8.html?cat=0,0&sort=price_asc" "Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; H60-L02 Build/HDH60-L02) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.4.0.558 U3/0.8.0 Mobile Safari/534.30"

The line above is my log format.
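To make the field layout concrete, here is a minimal sketch that pulls the fields out of the sample line with one named-group pattern. The pattern and the group names (`ip`, `time`, `host`, `path` and so on) are illustrative assumptions, not the ones the script uses.

```python
import re

# Hypothetical pattern for the custom format shown above; the group names
# are illustrative and are not taken from the script in this post.
LINE_RE = re.compile(
    r'(?P<ip>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<host>[^"]*)" '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<agent>[^"]*)"'
)

# The sample line from the post, split across string literals for readability.
sample = ('12.195.166.35 [10/May/2015:14:38:09 +0800] "list.xxxx.com" '
          '"GET /new/10:00/9.html?cat=0,0&sort=price_asc HTTP/1.0" 200 42164 '
          '"http://list.linuxidc.com/new/10:00/8.html?cat=0,0&sort=price_asc" '
          '"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; H60-L02 Build/HDH60-L02) '
          'AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.4.0.558 '
          'U3/0.8.0 Mobile Safari/534.30"')

# groupdict() returns every named group as a plain dict of strings.
fields = LINE_RE.match(sample).groupdict()
print(fields['ip'], fields['host'], fields['status'], fields['time'])
```

Compared with a generic access-log layout, the notable differences are the extra quoted host name after the timestamp and the absence of the usual identity and user fields.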

The script is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author: xiaoluo
# QQ: 942729042
# Date: 2015-05-12

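import re
import sys

# Path of the access log, passed as the first command-line argument.
log = sys.argv[1]

# One pattern fragment per field. Each fragment is wrapped in parentheses
# below, which turns it into a named group named after its variable.
ip = r"?P<ip>[\d.]*"
date = r"?P<date>\d+"
month = r"?P<month>\w+"
year = r"?P<year>\d+"
log_time = r"?P<log_time>\S+"
# Captures everything up to the next quote, so the value keeps the closing "]".
timezone = r"""?P<timezone>
    [^\"]*
"""
name = r"""?P<name>\"
    [^\"]*\"
"""
# method keeps the opening quote and protocol keeps the closing quote.
method = r"?P<method>\S+"
request = r"?P<request>\S+"
protocol = r"?P<protocol>\S+"
status = r"?P<status>\d+"
bodyBytesSent = r"?P<bodyBytesSent>\d+"
refer = r"""?P<refer>\"
    [^\"]*\"
"""
# Assumption: the body of this group is missing from the post; a quoted
# string like refer is used here.
userAgent = r"""?P<userAgent>\"
    [^\"]*\"
"""

# re.VERBOSE ignores the whitespace inside the multi-line fragments above,
# which is why literal spaces in the log line are written as "\ ".
p = re.compile(
    r"(%s)\ \[(%s)/(%s)/(%s)\:(%s)\ (%s)\ (%s)\ (%s)\ (%s)\ (%s)\ (%s)\ (%s)\ (%s)\ (%s)"
    % (ip, date, month, year, log_time, timezone, name, method, request,
       protocol, status, bodyBytesSent, refer, userAgent),
    re.VERBOSE)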

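def getcode():
    """Count requests per HTTP status code."""
    codedic = {}
    # Iterate over the file object instead of calling readlines(), which
    # would load the whole log into memory; "with" also closes the file.
    with open(log, 'r') as f:
        for logline in f:
            matchs = p.match(logline)
            if matchs is not None:
                allGroups = matchs.groups()
                # Index 10 is the status code.
                status = allGroups[10]
                codedic[status] = codedic.get(status, 0) + 1
    return codedic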

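def getIP():
    """Return the 20 client IPs with the most requests as (ip, count) pairs."""
    IPdic = {}
    with open(log, 'r') as f:
        for logline in f:
            matchs = p.match(logline)
            if matchs is not None:
                allGroups = matchs.groups()
                # Index 0 is the client IP.
                IP = allGroups[0]
                IPdic[IP] = IPdic.get(IP, 0) + 1
    # Sort by count, highest first, and keep the first 20 entries
    # (the original slice 0:21 returned 21).
    IPdic = sorted(IPdic.items(), key=lambda c: c[1], reverse=True)
    IPdic = IPdic[:20]
    return IPdic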

def getURL():
    """Return the 20 site names with the most requests as (name, count) pairs."""
    URLdic = {}
    with open(log, 'r') as f:
        for logline in f:
            matchs = p.match(logline)
            if matchs is not None:
                allGroups = matchs.groups()
                # Index 6 is the quoted site name, e.g. "list.xxxx.com".
                urlname = allGroups[6]
                URLdic[urlname] = URLdic.get(urlname, 0) + 1
    URLdic = sorted(URLdic.items(), key=lambda c: c[1], reverse=True)
    URLdic = URLdic[:20]
    return URLdic

def getpv():
    """Return the 20 busiest minutes as (HH:MM, count) pairs."""
    pvdic = {}
    with open(log, 'r') as f:
        for logline in f:
            matchs = p.match(logline)
            if matchs is not None:
                allGroups = matchs.groups()
                # Index 4 is the time of day, e.g. "14:38:09".
                timeOfDay = allGroups[4]
                parts = timeOfDay.split(':')
                minute = parts[0] + ":" + parts[1]
                pvdic[minute] = pvdic.get(minute, 0) + 1
    pvdic = sorted(pvdic.items(), key=lambda c: c[1], reverse=True)
    pvdic = pvdic[:20]
    return pvdic

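if __name__ == '__main__':
    print("Site status check: request counts by HTTP status code")
    print(getcode())
    print("Top 20 client IP addresses by request count")
    print(getIP())
    print("Top 20 site names by request count")
    print(getURL())
    # Busiest minutes, as (HH:MM, request count) pairs.
    print(getpv())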

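One thing to point out: I originally wrapped the regex matching in a function of its own, so that the functions below would not each have to open the file again. But that function could only return its results as a list, and the list grew so large that it used up all my memory. The machine has 32 GB of RAM and the log is 15 GB.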
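One way to read the file only once without building that huge list is to parse lazily with a generator and update all the counters in the same loop. The following is a sketch of that idea, not the script's own code: the simplified pattern and the names `parse` and `summarize` are assumptions made for illustration.

```python
import re
from collections import Counter

# Simplified, hypothetical pattern: it only captures the fields that get counted.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \[\d+/\w+/\d+:(?P<minute>\d+:\d+):\d+ [^\]]*\] '
    r'"(?P<host>[^"]*)" "[^"]*" (?P<status>\d+) '
)

def parse(lines):
    """Yield one dict per matching line; nothing is accumulated here."""
    for line in lines:
        m = LINE_RE.match(line)
        if m is not None:
            yield m.groupdict()

def summarize(lines, top=20):
    """Update every counter in a single pass over the log."""
    codes, ips, hosts, minutes = Counter(), Counter(), Counter(), Counter()
    for rec in parse(lines):
        codes[rec['status']] += 1
        ips[rec['ip']] += 1
        hosts[rec['host']] += 1
        minutes[rec['minute']] += 1
    return (dict(codes), ips.most_common(top),
            hosts.most_common(top), minutes.most_common(top))

# Made-up lines in the same layout as the sample, used only for the demo.
demo = [
    '1.1.1.1 [10/May/2015:14:38:09 +0800] "a.example.com" "GET / HTTP/1.0" 200 10 "-" "ua"',
    '1.1.1.1 [10/May/2015:14:38:41 +0800] "a.example.com" "GET /x HTTP/1.0" 200 12 "-" "ua"',
    '2.2.2.2 [10/May/2015:14:39:02 +0800] "b.example.com" "GET /y HTTP/1.0" 404 0 "-" "ua"',
]
codes, ips, hosts, minutes = summarize(demo)
print(codes)
print(ips)
```

For a real log, an open file object can be passed in place of `demo`, since iterating over a file yields one line at a time. Memory use then grows with the number of distinct IPs, site names, and minutes rather than with the number of log lines.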

The last function, getpv(), counts the number of requests in each minute.
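Because the script builds its per-minute key from the hour and minute alone, the same minute on different days ends up in one bucket when a log spans several days. A minimal sketch of a date-aware key follows; the helper name `minute_key` is hypothetical, and the `%b` directive assumes an English or C locale for month names such as "May".

```python
from collections import Counter
from datetime import datetime

def minute_key(stamp):
    """Turn '10/May/2015:14:38:09 +0800' into '2015-05-10 14:38'."""
    dt = datetime.strptime(stamp, '%d/%b/%Y:%H:%M:%S %z')
    return dt.strftime('%Y-%m-%d %H:%M')

# Made-up timestamps: two in the same minute, one a day later.
stamps = [
    '10/May/2015:14:38:09 +0800',
    '10/May/2015:14:38:41 +0800',
    '11/May/2015:14:38:02 +0800',
]
per_minute = Counter(minute_key(s) for s in stamps)
print(per_minute.most_common(20))
```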
