Network Security: Filtering Logs Within a Given Range

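Contents

Note: read the previous post's introduction to awk before this one

I. The log file

II. First method

1. Steps

III. Second method

awk script

Result

IV. Second requirement

Counting unique IPs

Procedure

1. Create a file with the following test data

2. Write the awk script

3. Before running

4. After running

5. Check

V. Third requirement

Handling records with missing fields

Data

1. The problem

2. An odd approach: rebuilding the record (does not solve it)

3. Trick: keep the blank parts when printing

4. How a row where the field is present prints

5. Solution

Summary: commas inside a quoted field are no longer treated as separators, so the fields print correctly

VI. Fourth requirement

Filtering logs within a given time range

The problem

Background

Worked example

Input file

awk script

Output

Explanation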


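Note: read the previous post's introduction to awk before this one

I. The log file

Take an Apache access log from your own machine as the example. I picked a fairly large one: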

127.0.0.1 - - [30/Jul/2023:08:34:54 +0800] "GET /less02/index.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:34:54 +0800] "GET /favicon.ico HTTP/1.1" 404 2659
127.0.0.1 - - [30/Jul/2023:08:36:05 +0800] "GET /less02/index.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:36:55 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:36:55 +0800] "GET /less02/js.js HTTP/1.1" 200 211
127.0.0.1 - - [30/Jul/2023:08:37:55 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:08:38:17 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:17 +0800] "GET /less02/js.js HTTP/1.1" 200 226
127.0.0.1 - - [30/Jul/2023:08:38:20 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:20 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:21 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:21 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:21 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:21 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:35 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:35 +0800] "GET /less02/js.js HTTP/1.1" 200 226
127.0.0.1 - - [30/Jul/2023:08:38:36 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:36 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:36 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:36 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:36 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:38 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:38 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:39 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:39 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:39 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:59 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:59 +0800] "GET /less02/js.js HTTP/1.1" 200 249
127.0.0.1 - - [30/Jul/2023:08:39:59 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:08:42:20 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:42:20 +0800] "GET /less02/js.js HTTP/1.1" 200 178
127.0.0.1 - - [30/Jul/2023:08:43:20 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:08:44:50 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:44:50 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:45:50 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:08:50:04 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:50:04 +0800] "GET /less02/js.js HTTP/1.1" 200 271
127.0.0.1 - - [30/Jul/2023:08:50:08 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:50:08 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:51:04 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:08:58:41 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:58:41 +0800] "GET /less02/js.js HTTP/1.1" 200 472
127.0.0.1 - - [30/Jul/2023:08:58:47 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:58:47 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:58:48 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:58:48 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:58:48 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:59:47 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:14:40:28 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:40:28 +0800] "GET /less02/js.js HTTP/1.1" 200 180
127.0.0.1 - - [30/Jul/2023:14:40:28 +0800] "GET /favicon.ico HTTP/1.1" 404 2659
127.0.0.1 - - [30/Jul/2023:14:40:53 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:40:53 +0800] "GET /less02/js.js HTTP/1.1" 200 180
127.0.0.1 - - [30/Jul/2023:14:40:54 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:40:54 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:41:39 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:41:39 +0800] "GET /less02/js.js HTTP/1.1" 200 180
127.0.0.1 - - [30/Jul/2023:14:41:39 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:41:39 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:41:40 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:41:40 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:41:40 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:42:39 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:14:42:51 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:42:51 +0800] "GET /less02/js.js HTTP/1.1" 200 189
127.0.0.1 - - [30/Jul/2023:14:43:51 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:14:44:03 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:44:03 +0800] "GET /less02/js.js HTTP/1.1" 200 231
127.0.0.1 - - [30/Jul/2023:14:45:03 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:14:48:51 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:48:51 +0800] "GET /less02/js.js HTTP/1.1" 200 253
127.0.0.1 - - [30/Jul/2023:14:48:52 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:48:52 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:49:51 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:14:52:27 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:52:27 +0800] "GET /less02/js.js HTTP/1.1" 200 281
127.0.0.1 - - [30/Jul/2023:21:56:45 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:56:45 +0800] "GET /less02/js.js HTTP/1.1" 200 36
127.0.0.1 - - [30/Jul/2023:21:57:15 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:15 +0800] "GET /less02/js.js HTTP/1.1" 200 34
127.0.0.1 - - [30/Jul/2023:21:57:36 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:36 +0800] "GET /less02/js.js HTTP/1.1" 200 33
127.0.0.1 - - [30/Jul/2023:21:57:38 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:38 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:38 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:38 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:38 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:38 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:57 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:57 +0800] "GET /less02/js.js HTTP/1.1" 200 31
127.0.0.1 - - [30/Jul/2023:21:57:58 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:58 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:58 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:58 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:58 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:58 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:59 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:59 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:59 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:59 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:59 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:59 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:00 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:00 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:00 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:00 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:00 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:00 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:02 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:02 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:02 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:02 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:02 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:02 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:03 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:03 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:57 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:21:59:25 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:25 +0800] "GET /less02/js.js HTTP/1.1" 200 32
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:27 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:27 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:27 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:27 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:27 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:45 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:45 +0800] "GET /less02/js.js HTTP/1.1" 200 32
127.0.0.1 - - [30/Jul/2023:21:59:46 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:46 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:00:34 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:00:34 +0800] "GET /less02/js.js HTTP/1.1" 200 33
127.0.0.1 - - [30/Jul/2023:22:00:34 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:00:34 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:00:37 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:00:37 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:01:34 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:22:05:39 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:05:39 +0800] "GET /less02/js.js HTTP/1.1" 200 51
127.0.0.1 - - [30/Jul/2023:22:05:39 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:05:39 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:05:40 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:05:41 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:05:41 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:06:39 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:23:35:11 +0800] "GET /less02/tools HTTP/1.1" 404 2659
127.0.0.1 - - [30/Jul/2023:23:35:27 +0800] "GET /tools/ HTTP/1.1" 200 56719
127.0.0.1 - - [30/Jul/2023:23:35:27 +0800] "GET /tools/assets/main.css HTTP/1.1" 200 626596
127.0.0.1 - - [30/Jul/2023:23:35:27 +0800] "GET /tools/images/file-32x32.png HTTP/1.1" 200 1946
127.0.0.1 - - [30/Jul/2023:23:35:27 +0800] "GET /tools/images/file-128x128.png HTTP/1.1" 200 19378
127.0.0.1 - - [30/Jul/2023:23:35:27 +0800] "GET /tools/images/cook_male-32x32.png HTTP/1.1" 200 1624
127.0.0.1 - - [30/Jul/2023:23:35:27 +0800] "GET /tools/assets/main.js HTTP/1.1" 200 4237575
127.0.0.1 - - [30/Jul/2023:23:41:08 +0800] "GET /less/index.html HTTP/1.1" 404 2659
127.0.0.1 - - [30/Jul/2023:23:41:20 +0800] "GET /less01/index.html HTTP/1.1" 200 790
127.0.0.1 - - [30/Jul/2023:23:41:20 +0800] "GET /less01/css/style.css HTTP/1.1" 200 55
127.0.0.1 - - [30/Jul/2023:23:41:34 +0800] "GET /less02/index.html HTTP/1.1" 200 323
127.0.0.1 - - [30/Jul/2023:23:42:34 +0800] "-" 408 -

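II. First method

Count how many times each IP received a 304 status code in the log.

1. Steps

First, test that the IP and the status code print correctly:

awk '{print $1, $9}' access.log

Next, count the occurrences per IP and status code, as follows:

Note: my local Apache is only used for testing, so the only client is localhost. A real log would look like this:

When a single IP shows up with a very large number of requests, it is probably a brute-force or scanning source and should be blocked promptly.

Using my own log as the example: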

awk '$9==304{arr[$1]++}END{for(i in arr){print arr[i],i}}' access.log 
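Because the log above contains only 127.0.0.1, the per-IP count is easier to see on input with several clients. The following is a minimal sketch, not part of the original walkthrough: the function name `count_304` and the 10.0.0.x addresses are invented for illustration, and it assumes the default Apache log layout in which `$1` is the client IP and `$9` is the status code.

```shell
# count_304: print "<count> <ip>" for every IP that received a 304, busiest first.
# Reads the files given as arguments, or standard input when there are none.
count_304() {
    awk '$9 == 304 { hits[$1]++ }
         END { for (ip in hits) print hits[ip], ip }' "$@" | sort -rn
}

# Hypothetical sample: these addresses are made up for illustration.
count_304 <<'EOF'
10.0.0.5 - - [30/Jul/2023:08:34:54 +0800] "GET /a.html HTTP/1.1" 304 -
10.0.0.5 - - [30/Jul/2023:08:34:55 +0800] "GET /b.html HTTP/1.1" 304 -
10.0.0.9 - - [30/Jul/2023:08:34:56 +0800] "GET /a.html HTTP/1.1" 304 -
10.0.0.9 - - [30/Jul/2023:08:34:57 +0800] "GET /c.html HTTP/1.1" 200 211
EOF
```

On this sample, 10.0.0.5 is listed first with a count of 2 and 10.0.0.9 second with a count of 1. The 200 response is ignored.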


III. Second method

awk script:

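# Count requests per client IP whose status code is not 200
$9 != 200 {
    arr[$1]++
}
END {
    # gawk only: traverse the array by value, numerically, largest first
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (i in arr) {
        # stop after the ten busiest IPs
        if (cnt++ == 10) {
            break
        }
        print arr[i], i
    }
}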

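Result:

Remedy for an IP with excessive requests: block it automatically.

How:

Count requests per IP whose status code is not 200, and take the 10 IPs with the highest counts.

gawk's array-sorting functions: asort() and asorti()

Setting the traversal order with PROCINFO["sorted_in"]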
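PROCINFO["sorted_in"] is a gawk extension, so the script above does not sort under other awk implementations. As a hedged alternative, the same top-N listing can be sketched by letting `sort` and `head` do the ordering. The function name `top_non200` and the sample addresses are assumptions made for illustration.

```shell
# top_non200: print the N busiest IPs among requests whose status is not 200.
# Usage: top_non200 N [logfile...]   (reads standard input when no file is given)
top_non200() {
    n=$1
    shift
    awk '$9 != 200 { hits[$1]++ }
         END { for (ip in hits) print hits[ip], ip }' "$@" |
        sort -k1,1nr -k2,2 | head -n "$n"   # by count descending, then by IP text
}

# Hypothetical sample: these addresses are made up for illustration.
top_non200 2 <<'EOF'
10.0.0.5 - - [30/Jul/2023:08:34:54 +0800] "GET /a.html HTTP/1.1" 404 2659
10.0.0.5 - - [30/Jul/2023:08:34:55 +0800] "GET /b.html HTTP/1.1" 304 -
10.0.0.9 - - [30/Jul/2023:08:34:56 +0800] "GET /a.html HTTP/1.1" 404 2659
10.0.0.7 - - [30/Jul/2023:08:34:57 +0800] "GET /c.html HTTP/1.1" 200 211
10.0.0.5 - - [30/Jul/2023:08:34:58 +0800] "GET /d.html HTTP/1.1" 500 -
EOF
```

One caveat applies to both versions: lines whose request is just "-" (the 408 entries in the log above) have fewer whitespace-separated fields, so `$9` is empty on them. They are still counted as non-200, but only because an empty string differs from 200.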

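IV. Second requirement

Counting unique IPs

Requirement: count how many unique (deduplicated) IPs visited each URL, and save the result for each URL in its own file.

Procedure:

1. Create a file with the following test data: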

a.com.cn|202.109.134.23|2015-11-20 20:34:43|guest
b.com.cn|202.109.134.23|2015-11-20 20:34:48|guest
c.com.cn|202.109.134.24|2015-11-20 20:34:48|guest
a.com.cn|202.109.134.23|2015-11-20 20:34:43|guest
a.com.cn|202.109.134.24|2015-11-20 20:34:43|guest
b.com.cn|202.109.134.25|2015-11-20 20:34:48|guest

2. Write the awk script:

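BEGIN {
        FS = "|"
}

# true only the first time a (URL, IP) pair is seen
!arr[$1,$2]++ {
        arr1[$1]++      # one more unique IP for this URL
}

END {
        for (i in arr1) {
                # arr1[i] is the unique-IP count; write it to "<URL>.txt"
                print i, arr1[i] > (i".txt")
        }
}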

3. Before running:

4. After running:

5. Check
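The before, after, and check steps can be reproduced in a scratch directory. This is a minimal sketch under a few assumptions: the input file name `url.log` is hypothetical (the post does not name the file), and the array names are changed to `seen` and `uniq` for readability.

```shell
# Work in a scratch directory so the generated *.txt files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The test data from step 1, saved under a hypothetical file name.
cat > url.log <<'EOF'
a.com.cn|202.109.134.23|2015-11-20 20:34:43|guest
b.com.cn|202.109.134.23|2015-11-20 20:34:48|guest
c.com.cn|202.109.134.24|2015-11-20 20:34:48|guest
a.com.cn|202.109.134.23|2015-11-20 20:34:43|guest
a.com.cn|202.109.134.24|2015-11-20 20:34:43|guest
b.com.cn|202.109.134.25|2015-11-20 20:34:48|guest
EOF

awk -F'|' '
    # First time this (URL, IP) pair is seen: one more unique IP for the URL.
    !seen[$1, $2]++ { uniq[$1]++ }
    END {
        for (url in uniq) {
            out = url ".txt"
            print url, uniq[url] > out
            close(out)
        }
    }' url.log

# Check: print every generated file name together with its content.
grep . ./*.com.cn.txt
```

For this test data, a.com.cn and b.com.cn each have 2 unique IPs and c.com.cn has 1, so three files are created, one per URL.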

V. Third requirement

Handling records with missing fields

Data:

ID  name    gender  age  email          phone
1   Bob     male    28   abc@qq.com     18023394012
2   Alice   female  24   def@gmail.com  18084925203
3   Tony    male    21                  17048792503
4   Kevin   male    21   bbb@189.com    17023929033
5   Alex    male    18   ccc@xyz.com    18185904230
6   Andy    female       ddd@139.com    18923902352
7   Jerry   female  25   exdsa@189.com  18785234906
8   Peter   male    20   bax@qq.com     17729348758
9   Steven          23   bc@sohu.com    15947893212
10  Bruce   female  27   bcbd@139.com   13942943905

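1. The problem:

When a field is missing, splitting on whitespace shifts the later fields to the left, so the wrong column is printed.

2. An odd approach: rebuilding the record (does not solve it)

3. Trick: keep the blank parts when printing

awk '{print $0}' FIELDWIDTHS="2 2:6 2:6 2:3 2:13 2:11" a.txt
The first entry in FIELDWIDTHS is for the ID field, which is 2 characters wide, so its width is set to 2.
The second field is at most 6 characters wide, but two spaces separate it from ID. You can either give it a width of 8, or skip the two characters with 2:6.
awk 'NR==4{print $5}' FIELDWIDTHS="2 2:6 2:6 2:3 2:13 2:11" a.txt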
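FIELDWIDTHS is a gawk extension, and the skip:width form needs a fairly recent gawk. Where it is unavailable, the same fixed-column idea can be sketched with `substr()`, which is a different technique from the one above. The function name `name_email` is an assumption, and the column positions are derived from the layout of the sample table (name in columns 5 to 10, email in columns 26 to 38).

```shell
# name_email: print "name|email" for each data row, cutting fields by
# column position so that a missing value stays an empty field.
name_email() {
    awk '
        # Remove leading and trailing spaces from a fixed-width slice.
        function trim(s) { gsub(/^ +| +$/, "", s); return s }
        NR > 1 {                               # skip the header row
            name  = trim(substr($0, 5, 6))     # columns 5-10
            email = trim(substr($0, 26, 13))   # columns 26-38
            print name "|" email
        }' "$@"
}

name_email <<'EOF'
ID  name    gender  age  email          phone
2   Alice   female  24   def@gmail.com  18084925203
3   Tony    male    21                  17048792503
EOF
```

For Alice this prints her name and address separated by `|`. For Tony the part after `|` is empty, because the blank email column is kept instead of being replaced by the phone number.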

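4. How a row where the field is present prints:

5. Solution:

FPAT collects the text matched by a regular expression and stores each match as a field. (Just as grep highlights the matching parts in color, splitting with FPAT stores each matching part in $1, $2, $3, ...) FPAT is a gawk extension.

Summary: commas inside a quoted field are no longer treated as separators, so the fields print correctly.

awk 'BEGIN{FPAT="[^,]+|\"[^\"]+\""}{print $1, $3}' demo2.txt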

 

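VI. Fourth requirement

Filtering logs within a given time range

The problem:

When filtering logs with regular expressions in grep, sed, or awk, it is very hard to be precise down to the hour, minute, and second.

However, awk (gawk) provides the mktime() function, which converts a time into an epoch value.

With it, you can take the time string from a log line, extract the year, month, day, hour, minute, and second, and pass them to mktime() to build the corresponding epoch value. Because epoch values are numbers, they can be compared, which tells you which of two times is earlier.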
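The idea above can be sketched as follows for the Apache log used earlier. This is an illustration under stated assumptions, not the author's script: the function name `in_range` is invented, it needs an awk that provides mktime() (gawk does), and it ignores the +0800 offset by treating both the log times and the bounds as local wall-clock time.

```shell
# in_range: print access-log lines whose timestamp lies within [FROM, TO].
# FROM and TO use the "YYYY MM DD HH MM SS" form that mktime() expects.
# Usage: in_range FROM TO [logfile...]
in_range() {
    from=$1
    to=$2
    shift 2
    awk -v from="$from" -v to="$to" '
        BEGIN {
            # Map month abbreviations to numbers: Jan -> 1, ..., Dec -> 12.
            n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
            for (i = 1; i <= n; i++) month[names[i]] = i
            lo = mktime(from)
            hi = mktime(to)
        }
        {
            # $4 looks like "[30/Jul/2023:08:34:54"; drop "[" and cut at "/" and ":".
            split(substr($4, 2), t, "[/:]")
            # t[1]=day t[2]=month name t[3]=year t[4]=hour t[5]=minute t[6]=second
            ts = mktime(t[3] " " month[t[2]] " " t[1] " " t[4] " " t[5] " " t[6])
            if (ts >= lo && ts <= hi) print
        }' "$@"
}

# Keep only the requests made between 08:36:00 and 08:37:00.
in_range "2023 07 30 08 36 00" "2023 07 30 08 37 00" <<'EOF'
127.0.0.1 - - [30/Jul/2023:08:34:54 +0800] "GET /favicon.ico HTTP/1.1" 404 2659
127.0.0.1 - - [30/Jul/2023:08:36:05 +0800] "GET /less02/index.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:36:55 +0800] "GET /less02/js.js HTTP/1.1" 200 211
127.0.0.1 - - [30/Jul/2023:08:37:55 +0800] "-" 408 -
EOF
```

Of the four sample lines, only the two stamped 08:36:05 and 08:36:55 fall inside the range. Because every comparison is numeric, narrowing the range to the second only requires changing the two arguments.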

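Background

In awk, timestamps are typically used to handle the time information found in text data. awk is a language for text processing and data extraction: it lets you process the lines and fields of a text file with pattern matching and actions.

In gawk, the built-in function systime() returns the current Unix timestamp, that is, the number of seconds from the Epoch (1970-01-01 00:00:00 UTC) to the current time. This is useful for timestamp-related operations.

Worked example:

Input file: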

John 2023-08-01 15:30:45
Alice 2023-08-02 12:45:00
Bob 2023-08-03 09:15:30

awk script:

awk '{ 
    cmd = "date -d \"" $2 " " $3 "\" +%s"; 
    cmd | getline timestamp; 
    close(cmd); 
    print $1, timestamp 
}' data.txt

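Output:

Explanation:

$2 and $3 are the second and third fields of the input line, that is, the date and the time. The date -d command (GNU date) converts the date and time into a Unix timestamp, and the +%s format prints the result as seconds since the epoch. cmd | getline timestamp runs the external command and reads one line of its output into the timestamp variable, and close(cmd) closes the pipe so that the command is run again for the next line. Finally, print outputs the name and the corresponding Unix timestamp.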
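The same conversion can also be done inside awk with mktime(), which avoids starting one `date` process per line. A minimal sketch, assuming an awk that provides mktime() (gawk does); the function name `to_epoch` is invented for illustration, and the resulting numbers depend on the local time zone because mktime() interprets its argument as local time.

```shell
# to_epoch: read "name YYYY-MM-DD HH:MM:SS" lines and print "name epoch".
to_epoch() {
    awk '{
        spec = $2 " " $3           # e.g. "2023-08-01 15:30:45"
        gsub(/[-:]/, " ", spec)    # -> "2023 08 01 15 30 45", the form mktime() expects
        print $1, mktime(spec)
    }' "$@"
}

to_epoch <<'EOF'
John 2023-08-01 15:30:45
Alice 2023-08-02 12:45:00
Bob 2023-08-03 09:15:30
EOF
```

Each output line holds the name followed by the epoch value, in the same shape as the output of the date -d version.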
