uniq vs. sort -u: how their deduplication differs and how they relate

Latest recommended article published on 2024-08-12 12:00:18

FeelTouch Labs


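uniq only treats identical records that appear consecutively as duplicates, whereas sort -u deduplicates across the whole input. Sorting first and then piping into uniq gives the same result as sort -u (that is, sort -u file.txt is equivalent to sort file.txt | uniq).

Both sort -u and uniq can remove duplicate lines, so where exactly do they differ?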

$ cat test
jason
jason
jason
fffff
jason

Now run the following three commands in turn.

1: sort -u test

$ sort -u test
fffff
jason

2: uniq test

$ uniq test
jason
fffff
jason

3: sort test | uniq

$ sort test | uniq
fffff
jason

The three commands make the difference easy to see: uniq only treats identical records that appear consecutively as duplicates.
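The comparison above can be reproduced as a small script. This is only a sketch: the temporary file and the variable names (tmp, uniq_only, sort_u, sort_uniq) are made up for illustration, and the data is the same five-line test file shown earlier.

```shell
#!/bin/sh
# Recreate the article's sample file in a temporary location.
tmp=$(mktemp)
printf '%s\n' jason jason jason fffff jason > "$tmp"

# uniq alone collapses only adjacent duplicates, so the trailing
# "jason" survives as a separate line.
uniq_only=$(uniq "$tmp")

# Both of these deduplicate across the whole file.
sort_u=$(sort -u "$tmp")
sort_uniq=$(sort "$tmp" | uniq)

printf 'uniq alone:\n%s\n\n' "$uniq_only"
printf 'sort -u:\n%s\n\n' "$sort_u"

if [ "$sort_u" = "$sort_uniq" ]; then
    echo "sort -u and sort | uniq produce the same output"
fi

rm -f "$tmp"
```

If the original line order matters, neither form preserves it, since both sort the input first.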

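#cat a.txt | uniq -c -i | sort -k2 -n     collapse adjacent duplicates (ignoring case) with counts, then sort ascending on the second column of that output
#cat a.txt | uniq -c -i | sort -k2 -rn    same, but sorted in descending order

Note that in the output of uniq -c the first column is the count and the second column is the original line, so -k2 -n compares the line content numerically. Because uniq runs here without a preceding sort, only adjacent duplicates are merged.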
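A common variation is to sort before uniq so that the counts cover the whole file, and then order by the count column (column 1) instead. The sketch below assumes made-up sample data and illustrative variable names; sort -f folds case so that uniq -i sees the case variants as adjacent lines.

```shell
#!/bin/sh
# Hypothetical sample data with mixed case and non-adjacent duplicates.
tmp=$(mktemp)
printf '%s\n' apple Apple banana apple cherry banana > "$tmp"

# 1. sort -f     : case-insensitive sort, so variants end up adjacent
# 2. uniq -c -i  : collapse them ignoring case, prefixing each with a count
# 3. sort -k1,1nr: order by the count column, highest first
counts=$(sort -f "$tmp" | uniq -c -i | sort -k1,1nr)

printf '%s\n' "$counts"

rm -f "$tmp"
```

Which spelling uniq keeps for a case-insensitive group (apple or Apple) depends on how the sort orders the variants, so the example does not rely on it.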

uniq options explained

-c counts how many times each line is repeated

     -c      Precede each output line with the count of the number of times
             the line occurred in the input, followed by a single space.

     -d      Only output lines that are repeated in the input.

     -f num  Ignore the first num fields in each input line when doing
             comparisons. A field is a string of non-blank characters
             separated from adjacent fields by blanks. Field numbers are
             one based, i.e., the first field is field one.

     -s chars
             Ignore the first chars characters in each input line when doing
             comparisons. If specified in conjunction with the -f option, the
             first chars characters after the first num fields will be
             ignored. Character numbers are one based, i.e., the first
             character is character one.

     -u      Only output lines that are not repeated in the input.

     -i      Case insensitive comparison of lines.
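To make the options above concrete, here is a short sketch that exercises -d, -u, -f and -s. The input lines and variable names are invented for illustration; as with plain uniq, all of these options only look at adjacent lines.

```shell
#!/bin/sh
tmp=$(mktemp)

# Adjacent runs: a x2, b x1, c x3, d x1
printf '%s\n' a a b c c c d > "$tmp"
dups=$(uniq -d "$tmp")      # only lines that are repeated: a, c
singles=$(uniq -u "$tmp")   # only lines that are not repeated: b, d

# -f 1 skips the first field, so "1 x" and "2 x" compare as equal
# and only the first line of that run is kept.
printf '%s\n' '1 x' '2 x' '3 y' > "$tmp"
skip_field=$(uniq -f 1 "$tmp")

# -s 2 skips the first two characters, so "ab123" and "cd123"
# compare as equal.
printf '%s\n' ab123 cd123 ef456 > "$tmp"
skip_chars=$(uniq -s 2 "$tmp")

printf 'repeated:   %s\n' "$(printf '%s' "$dups" | tr '\n' ' ')"
printf 'unrepeated: %s\n' "$(printf '%s' "$singles" | tr '\n' ' ')"
printf '%s\n' "$skip_field" "$skip_chars"

rm -f "$tmp"
```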

=============================================================================

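Advanced usage of the Linux sort command (sorting by multiple columns)

Sorting whole lines with sort is simple enough, but sorting on several columns, with TAB as the delimiter and some columns in reverse order, makes the sort command noticeably trickier to write.

Take the following file, whose fields are separated by [TAB]: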

Group-ID   Category-ID   Text        Frequency
----------------------------------------------
200        1000          oranges     10
200        900           bananas     5
200        1000          pears       8
200        1000          lemons      10
200        900           figs        4
190        700           grapes      17

Sort by the following columns, in this order (column 4 is sorted before column 3, and column 4 is in reverse order):

    * Group ID (integer)
    * Category ID (integer)
    * Frequency “sorted in reverse order” (integer)
    * Text (alpha-numeric)

The sorted result should be:

Group-ID   Category-ID   Text        Frequency
----------------------------------------------
190        700           grapes      17
200        900           bananas     5
200        900           figs        4
200        1000          lemons      10
200        1000          oranges     10
200        1000          pears       8

The sort command can solve this directly:

BASH CODE

sort -t $'\t' -k 1n,1 -k 2n,2 -k4rn,4 -k3,3 my-file

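Explanation:

-t $'\t': use TAB as the field delimiter
-k 1,1: sort on the first column only; with just a single 1 (-k 1), the sort key would run from the first column to the end of the line
n: compare numerically; the default is lexicographic order, in which 10 sorts before 2
r: reverse the order; the default is ascending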

So the final command is: sort -t $'\t' -k 1n,1 -k 2n,2 -k4rn,4 -k3,3 my-file
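The following sketch runs that multi-key sort against the six data rows from the table. A few things here are assumptions rather than part of the original: the two header lines are left out so they do not get sorted in with the data, the file is a temporary one, and the TAB character is produced with printf so the script also works in shells that lack the $'\t' syntax.

```shell
#!/bin/sh
tmp=$(mktemp)
tab=$(printf '\t')   # a literal TAB, portable alternative to $'\t'

# printf reuses the format for every group of four arguments,
# writing one TAB-separated row per group.
printf '%s\t%s\t%s\t%s\n' \
    200 1000 oranges 10 \
    200 900  bananas 5 \
    200 1000 pears   8 \
    200 1000 lemons  10 \
    200 900  figs    4 \
    190 700  grapes  17 > "$tmp"

# Keys, in priority order:
#   column 1 numeric, column 2 numeric,
#   column 4 numeric descending, column 3 lexicographic
sorted=$(sort -t "$tab" -k1,1n -k2,2n -k4,4nr -k3,3 "$tmp")

printf '%s\n' "$sorted"

rm -f "$tmp"
```

Writing each key as -kN,N keeps it confined to a single column; placing the n and r modifiers after the end position, as here, is equivalent to the -k 4rn,4 form used above.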

