Given log lines like the following:
2020-04-02 19:28:50 - *1*product result all time*20190302153429792181010892684972978^170966576^vw_agg_nEgU78m^2020040219285070410_12079403*10.12.0.204
2020-04-02 19:28:50 - *1*product result all time*20191224164314692507426235116468376^29211915^vw_agg_nEgU78m^2020040219285074210_12057799*10.12.0.98
2020-04-02 19:28:50 - *1*product result all time*20190609133048020739441628603549340^109434469^vw_agg_nEgU78m^2020040219285093810_12041928*10.12.0.12
First, split each line on * and pull out the response time, which is the second field
cat product.log.2020-04-01 |
grep "product result all time" |
awk -F'*' '{ print $2 }' | head -100
This is the awk + action form: the action inside the braces runs once for every input line.
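To see why the response time is $2, it can help to dump every field of one line. The sketch below uses the first sample line from the top of this post; show_fields is just an illustrative helper name, not part of the original commands.

```shell
# Sample line copied from the log excerpt at the top of this post.
line='2020-04-02 19:28:50 - *1*product result all time*20190302153429792181010892684972978^170966576^vw_agg_nEgU78m^2020040219285070410_12079403*10.12.0.204'

# Print each '*'-separated field with its index (show_fields is a hypothetical helper).
show_fields() {
    awk -F'*' '{ for (i = 1; i <= NF; i++) printf "$%d = [%s]\n", i, $i }'
}

printf '%s\n' "$line" | show_fields
```

The timestamp lands in $1, the response time in $2, the message text in $3, and the client IP in the last field.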
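Computing summary statistics
BEGIN and END let a program set up its initial state and do cleanup work after it has finished.
Any action listed after BEGIN (inside {}) runs before awk starts scanning the input, and the action listed after END runs once all input has been scanned. So BEGIN is typically used to print headings and initialise variables, and END to print the final result.
Note: the value must be converted to a number before it is used (adding 0, as in $1+0, forces this)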
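A small sketch of both points, using made-up numbers rather than real log data. sum_demo shows the order in which BEGIN, the per-line action, and END run; numeric_demo shows why forcing a number with +0 matters once a value is held as a string. Both function names are illustrative.

```shell
# BEGIN runs before any input, the middle action once per line, END after the last line.
sum_demo() {
    awk '
        BEGIN { sum = 0; print "start" }    # runs before any input is read
        { sum += $1 + 0 }                   # runs once per input line
        END { print "sum =", sum }          # runs after all input is consumed
    '
}

# String values compare character by character; adding 0 forces a numeric comparison.
numeric_demo() {
    awk 'BEGIN {
        a = "9"; b = "10"                   # string constants, not numbers
        s = (a > b)                         # string comparison: "9" sorts after "10"
        n = (a + 0 > b + 0)                 # numeric comparison: 9 is not greater than 10
        print "string compare:", s
        print "numeric compare:", n
    }'
}

printf '3\n10\n9\n' | sum_demo
numeric_demo
```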
Sum:
cat product.log.2020-04-01 |
grep "product result all time" |
awk -F'*' '{ print $2 }' |
awk '{ sum += $1+0 } END { print "Sum =", sum }'
Average:
cat product.log.2020-04-01 |
grep "product result all time" |
awk -F'*' '{ print $2 }' |
awk '{ sum += $1+0 } END { if (NR) print "Average =", sum/NR }'
Maximum:
cat product.log.2020-04-01 |
grep "product result all time" |
awk -F'*' '{ print $2 }' |
awk 'BEGIN { max = 0 } { if ($1+0 > max) max = $1+0 } END { print "Max =", max }'
Minimum (seeded from the first value rather than from a large constant):
cat product.log.2020-04-01 |
grep "product result all time" |
awk -F'*' '{ print $2 }' |
awk 'NR == 1 || $1+0 < min { min = $1+0 } END { print "Min =", min }'
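The four statistics can also be gathered in one pass over the values. This is only a sketch under the assumption that the input is one integer response time per line, as produced by the extraction step; stats is an illustrative function name.

```shell
# stats: read one number per line, report count, sum, average, min and max in a single pass.
stats() {
    awk '
        NR == 1 { min = max = $1 + 0 }      # seed min and max from the first value
        {
            v = $1 + 0                      # force a numeric value
            sum += v
            if (v > max) max = v
            if (v < min) min = v
        }
        END {
            if (NR == 0) { print "no data"; exit }
            printf "count=%d sum=%d avg=%.2f min=%d max=%d\n", NR, sum, sum / NR, min, max
        }
    '
}

printf '1\n12\n7\n' | stats
```

Reading the file once instead of four times matters mostly for large logs; for small ones the separate commands are just as good.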
Percentage of requests at each response time:
cat product.log.2020-04-01 |
grep "product result all time" |
awk -F'*' '{ print $2 }' |
awk '{ a[$1+0]++; s++ } END { for (j in a) printf "%s %.2f%%\n", j, a[j]*100/s }'
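One thing to watch for: awk does not guarantee any particular order for "for (j in a)", so the distribution can come out shuffled. A sketch that sorts the result by response time, with made-up input values and an illustrative function name:

```shell
# distribution: share of each distinct value, sorted numerically by value.
distribution() {
    awk '
        { a[$1 + 0]++; total++ }
        END { for (v in a) printf "%d %.2f%%\n", v, a[v] * 100 / total }
    ' | sort -n
}

printf '1\n2\n1\n1\n' | distribution
```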
Average TPS (requests per second, averaged over the seconds that appear in the log):
cat product.log.2020-04-01 |
awk '{ print $2 }' |
sort | uniq -c |
awk '{ sum += $1 } END { if (NR) print "Average =", sum/NR }'
Count the requests whose response time exceeds 50 ms
cat product.log.2020-04-01 |
awk -F'*' '{ print $2 }' |
awk '{ if ($1+0 > 50) count++ } END { print "count =", count+0 }'
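The threshold does not have to be hard-coded. The sketch below passes it in with awk's -v option and does the field extraction and counting in a single awk call. The sample lines are made up in the same shape as the log above (the id field is a placeholder), and count_slow and its limit parameter are illustrative names.

```shell
# Made-up sample lines in the same shape as the log above; the id field is a placeholder.
sample_log() {
    printf '%s\n' \
        '2020-04-02 19:28:50 - *12*product result all time*id-a*10.12.0.204' \
        '2020-04-02 19:28:50 - *75*product result all time*id-b*10.12.0.98' \
        '2020-04-02 19:28:51 - *50*product result all time*id-c*10.12.0.12' \
        '2020-04-02 19:28:52 - *120*product result all time*id-d*10.12.0.12'
}

# count_slow LIMIT: count matching requests slower than LIMIT milliseconds.
count_slow() {
    awk -F'*' -v limit="$1" '
        /product result all time/ && $2 + 0 > limit + 0 { slow++ }
        END { printf "count=%d\n", slow }
    '
}

sample_log | count_slow 50
```

A request that takes exactly 50 ms is not counted, matching the "exceeds 50" wording.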
Extract the log for a given time window (here 01:00:00 to 02:00:00)
cat product.log.2020-04-01 | awk '$2 >= "01:00:00" && $2 <= "02:00:00"'
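Comparing times as plain strings works because HH:MM:SS is zero-padded and fixed-width, so alphabetical order and chronological order agree. A sketch with the bounds passed in as parameters; between and sample_times are illustrative names, and the sample lines are made up.

```shell
# between START END: keep lines whose time field ($2) lies in [START, END], both HH:MM:SS.
between() {
    awk -v from="$1" -v to="$2" '$2 >= from && $2 <= to'
}

# Made-up lines covering times just outside and inside the window.
sample_times() {
    for t in 00:59:59 01:00:00 01:30:00 02:00:00 02:00:01; do
        printf '2020-04-01 %s - *1*product result all time*id*10.12.0.12\n' "$t"
    done
}

sample_times | between 01:00:00 02:00:00
```

This only holds within one day's file; a window that crosses midnight would need the date field as well.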
Traffic per minute
cat product.log.2020-04-01 | awk '$2 >="22:00:00"' | awk -F"[ ]" '{print $2}'| awk -F: '{a=$1":"$2;print a}' | sort|uniq -c
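The same per-minute count can be sketched as a single awk program: substr keeps the first five characters of the time (HH:MM) and an array counts lines per minute. The per_minute name, its start-time parameter, and the sample lines are all illustrative.

```shell
# per_minute START: count lines per HH:MM, considering only lines at or after START.
per_minute() {
    awk -v from="$1" '
        $2 >= from { c[substr($2, 1, 5)]++ }    # "22:00:01" -> "22:00"
        END { for (m in c) print m, c[m] }
    ' | sort
}

printf '%s\n' \
    '2020-04-01 21:59:59 - a' \
    '2020-04-01 22:00:01 - b' \
    '2020-04-01 22:00:59 - c' \
    '2020-04-01 22:01:10 - d' | per_minute 22:00:00
```

Unlike uniq -c, this prints the minute first and the count second, which is easier to feed into a plotting tool.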