Text analysis with awk + sort + uniq

weixin_34344403

Tags: awk, operations, networking

Original article: http://blog.51cto.com/wanyuetian/1716971

1. The uniq command
uniq - report or omit repeated lines
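Introduction: uniq checks a specified text file, or standard input, for repeated lines and reports or filters them out. It is commonly used in system troubleshooting and log analysis.
Syntax: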
uniq [OPTION]... [File1 [File2]]
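uniq reads the sorted text file File1 (or standard input), removes repeated lines, and writes the result to standard output or to File2. It is usually used as a filter in a pipeline.
uniq only compares adjacent lines, so to remove every duplicate the input must first be sorted with sort. Run without options, uniq collapses each run of identical adjacent lines into a single line.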
Common options:
-c, --count  prefix lines by the number of occurrences (count each line while removing duplicates)
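A quick sketch of the syntax above, assuming a POSIX shell and a standard uniq: given a second file argument, uniq writes its result to that file instead of standard output, and -c prefixes each line with its count. The file names sorted.txt and deduped.txt are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical, already-sorted input file.
printf '%s\n' a a b c c c > sorted.txt

# uniq File1 File2: the result goes to deduped.txt instead of stdout.
uniq sorted.txt deduped.txt
cat deduped.txt          # a, b, c on separate lines

# -c: each remaining line is prefixed with its number of occurrences.
uniq -c sorted.txt
```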
2. Hands-on examples

Test data:

[root@web01 ~]# cat uniq.txt 
10.0.0.9
10.0.0.8
10.0.0.7
10.0.0.7
10.0.0.8
10.0.0.8
10.0.0.9
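To follow along, the test file can be recreated with printf. This is a minimal sketch that assumes a POSIX sh. It works in a scratch directory from mktemp -d, so nothing existing is overwritten.

```shell
#!/bin/sh
# Scratch directory for the experiment.
cd "$(mktemp -d)" || exit 1

# One argument per line, in the same order as the test data above.
printf '%s\n' 10.0.0.9 10.0.0.8 10.0.0.7 10.0.0.7 10.0.0.8 10.0.0.8 10.0.0.9 > uniq.txt
cat uniq.txt
```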

a. Run uniq directly on the file with no options; only adjacent identical lines are collapsed:

[root@web01 ~]# uniq uniq.txt 
10.0.0.9
10.0.0.8
10.0.0.7
10.0.0.8
10.0.0.9
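The adjacency rule also applies to uniq's -d and -u options. These are standard POSIX options that the article does not cover. -d prints one copy of each run of repeated adjacent lines, and -u prints only lines with no identical neighbour. On the unsorted file, -u still reports 10.0.0.8 even though it occurs three times. The sketch below assumes a POSIX shell.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 10.0.0.9 10.0.0.8 10.0.0.7 10.0.0.7 10.0.0.8 10.0.0.8 10.0.0.9 > uniq.txt

# -d: one copy of each adjacent run longer than one line.
echo 'uniq -d:'
uniq -d uniq.txt          # 10.0.0.7, 10.0.0.8

# -u: lines whose neighbours differ; only adjacency is checked.
echo 'uniq -u:'
uniq -u uniq.txt          # 10.0.0.9, 10.0.0.8, 10.0.0.9

# After sorting, every address appears at least twice, so -u prints nothing.
echo 'sort | uniq -u:'
sort uniq.txt | uniq -u
```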

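b. Use sort to make duplicate lines adjacent (sort -u on its own also removes all duplicates), then pipe the result to uniq to remove all duplicates: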

[root@web01 ~]# sort uniq.txt 
10.0.0.7
10.0.0.7
10.0.0.8
10.0.0.8
10.0.0.8
10.0.0.9
10.0.0.9
[root@web01 ~]# sort -u uniq.txt 
10.0.0.7
10.0.0.8
10.0.0.9
[root@web01 ~]# sort uniq.txt|uniq
10.0.0.7
10.0.0.8
10.0.0.9
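If the original order of first appearance matters, a common awk idiom removes all duplicates without sorting. This is an alternative, not part of the original article. The sketch below assumes any POSIX awk.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 10.0.0.9 10.0.0.8 10.0.0.7 10.0.0.7 10.0.0.8 10.0.0.8 10.0.0.9 > uniq.txt

# seen[$0]++ evaluates to 0 the first time a line is seen, so !seen[$0]++
# is true only for the first occurrence; awk's default action prints it.
awk '!seen[$0]++' uniq.txt    # 10.0.0.9, 10.0.0.8, 10.0.0.7
```

Unlike sort | uniq, this keeps 10.0.0.9 first, because that is where it first appears in the file. The trade-off is that every distinct line is held in memory.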

c. Combine sort with uniq -c to remove duplicates and count each line:

[root@web01 ~]# sort uniq.txt|uniq -c
      2 10.0.0.7
      3 10.0.0.8
      2 10.0.0.9
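awk can also produce the same counts on its own, using an associative array. This is an alternative to sort | uniq -c and does not come from the original article. awk's for (k in c) loop visits keys in an unspecified order, so the output is piped through sort. The sketch below assumes a POSIX shell and awk.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 10.0.0.9 10.0.0.8 10.0.0.7 10.0.0.7 10.0.0.8 10.0.0.8 10.0.0.9 > uniq.txt

# One pass: c[line] accumulates the number of times each line occurs.
# sort -k2 orders the "count line" pairs by the line itself.
awk '{ c[$0]++ } END { for (k in c) print c[k], k }' uniq.txt | sort -k2
```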

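3. Real-world case
Process the file below: extract the domain names, count how often each domain occurs, and sort by that count (an interview question from Baidu and Sohu).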

[root@web01 ~]# cat access.log 
http://www.etiantian.org/index.html
http://www.etiantian.org/1.html
http://post.etiantian.org/index.html
http://mp3.etiantian.org/index.html
http://www.etiantian.org/3.html
http://post.etiantian.org/2.html

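Solution:
Analysis: this is one of the most common tasks in operations work. The same pattern carries over to log analysis, counting TCP connections in each state, ranking connection counts per IP, and so on. In the command, -F '[/]+' makes each run of one or more slashes a field separator. For http://www.etiantian.org/index.html, field $1 is therefore http: and $2 is www.etiantian.org. sort then groups identical domains, uniq -c counts each group, and sort -rn -k1 orders the result numerically by count, highest first.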

[root@web01 ~]# awk -F '[/]+' '{print $2}' access.log|sort|uniq -c|sort -rn -k1
      3 www.etiantian.org
      2 post.etiantian.org
      1 mp3.etiantian.org

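To reproduce the case end to end, the sketch below (assuming a POSIX shell) first rebuilds access.log from the sample lines above. It then runs the article's pipeline, followed by a single-pass awk variant that counts domains in an associative array. The awk variant is an alternative, not the author's solution.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Sample data copied from the case above.
cat > access.log <<'EOF'
http://www.etiantian.org/index.html
http://www.etiantian.org/1.html
http://post.etiantian.org/index.html
http://mp3.etiantian.org/index.html
http://www.etiantian.org/3.html
http://post.etiantian.org/2.html
EOF

# The article's pipeline: split on runs of '/', print the domain ($2),
# group with sort, count with uniq -c, order by count descending.
awk -F '[/]+' '{print $2}' access.log | sort | uniq -c | sort -rn -k1

# Alternative: count domains in one awk pass, then sort by the count.
awk -F '[/]+' '{ n[$2]++ } END { for (d in n) print n[d], d }' access.log | sort -rn
```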