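For uniq, "duplicates" are identical records that appear on adjacent lines, whereas sort -u removes duplicates across the whole file. Sorting first and then running uniq gives the same result as sort -u (that is, sort -u file.txt is equivalent to sort file.txt | uniq).
Both sort -u and uniq can remove duplicate lines, so what exactly is the difference between them?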
$ cat test
jason
jason
jason
fffff
jason
Now run the following three commands in turn.
1: sort -u test
$ sort -u test
fffff
jason
2: uniq test
$ uniq test
jason
fffff
jason
3: sort test | uniq
$ sort test | uniq
fffff
jason
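The three commands above make the difference easy to see: uniq only treats identical records on adjacent lines as duplicates.
# cat a.txt | uniq -c -i | sort -k2 -n     collapse adjacent duplicates (ignoring case, with counts), then sort that output by its second column in ascending numeric order
# cat a.txt | uniq -c -i | sort -k2 -rn    collapse adjacent duplicates (ignoring case, with counts), then sort that output by its second column in descending numeric order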
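The comparison above can be reproduced with a short script. This is a minimal sketch: the temporary file and the variable names are illustrative, and LC_ALL=C is an assumption added here only to pin the sort order.

```shell
#!/bin/sh
# Recreate the sample file from the text in a temporary location.
tmp=$(mktemp)
printf '%s\n' jason jason jason fffff jason > "$tmp"

# LC_ALL=C pins the collation order so the output is predictable.
out_sort_u=$(LC_ALL=C sort -u "$tmp")         # global de-duplication
out_uniq=$(uniq "$tmp")                       # adjacent duplicates only
out_sort_uniq=$(LC_ALL=C sort "$tmp" | uniq)  # sort first, then uniq

echo "sort -u:";     echo "$out_sort_u"
echo "uniq:";        echo "$out_uniq"
echo "sort | uniq:"; echo "$out_sort_uniq"

rm -f "$tmp"
```

The last "jason" survives plain uniq because "fffff" separates it from the first three.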
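To see what the two pipelines produce, here is a small sketch with made-up numeric input standing in for a.txt (the sample values are an assumption, not from the original file). In the output of uniq -c, column 1 is the count and column 2 is the original line, so -k2 sorts by the original value. The awk step is added only to normalise the padding that uniq -c puts in front of the count, which differs between implementations.

```shell
#!/bin/sh
# Hypothetical sample data standing in for a.txt.
tmp=$(mktemp)
printf '%s\n' 3 3 1 2 2 2 > "$tmp"

# uniq -c emits "count value"; sort -k2 -n orders by the value column.
ascending=$(uniq -c -i "$tmp" | sort -k2 -n | awk '{print $1, $2}')
# -r flips the numeric order.
descending=$(uniq -c -i "$tmp" | sort -k2 -rn | awk '{print $1, $2}')

echo "ascending:";  echo "$ascending"
echo "descending:"; echo "$descending"

rm -f "$tmp"
```

Because the input is not sorted before uniq, the counts describe runs of adjacent lines, not totals over the whole file.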
uniq options explained
-c Precede each output line with the count of the number of times the line occurred in the input, followed by a single space. In short, it counts the duplicates.
-d Only output lines that are repeated in the input.
-f num Ignore the first num fields in each input line when doing comparisons. A field is a string of non-blank characters separated from adjacent fields by blanks. Field numbers are one based, i.e., the first field is field one.
-s chars Ignore the first chars characters in each input line when doing comparisons. If specified in conjunction with the -f option, the first chars characters after the first num fields will be ignored. Character numbers are one based, i.e., the first character is character one.
-u Only output lines that are not repeated in the input.
-i Case insensitive comparison of lines.
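The options listed above can be tried on a few lines of throwaway input. This is a sketch only: the sample words and variable names are invented for illustration, and -i is assumed to be available, as it is in the GNU and BSD versions of uniq.

```shell
#!/bin/sh
tmp=$(mktemp)
printf '%s\n' apple apple Apple banana cherry cherry > "$tmp"

# -d: print one copy of each run of adjacent lines that occurs more than once.
repeated=$(uniq -d "$tmp")
# -u: print only the lines that are not repeated.
unique_only=$(uniq -u "$tmp")
# -i: compare case-insensitively, so "Apple" joins the "apple" run.
ignore_case=$(uniq -i "$tmp")
# -f 1: skip the first field of each line before comparing.
skip_field=$(printf '%s\n' '1 x' '2 x' '3 y' | uniq -f 1)
# -s 2: skip the first two characters of each line before comparing.
skip_chars=$(printf '%s\n' aaX bbX ccY | uniq -s 2)

echo "-d:";   echo "$repeated"
echo "-u:";   echo "$unique_only"
echo "-i:";   echo "$ignore_case"
echo "-f 1:"; echo "$skip_field"
echo "-s 2:"; echo "$skip_chars"

rm -f "$tmp"
```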
=============================================================================
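Advanced use of the sort command on Linux (sorting by multiple columns)
Sorting whole lines with sort is simple enough.
Sorting by several columns at once, with TAB as the field separator and some columns in reverse order, makes the sort command noticeably harder to write.
Take the following file, whose fields are separated by [TAB]: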
Group-ID   Category-ID   Text      Frequency
----------------------------------------------
200        1000          oranges   10
200        900           bananas   5
200        1000          pears     8
200        1000          lemons    10
200        900           figs      4
190        700           grapes    17
Sort on the following columns, in this order (column 4 is sorted before column 3, and column 4 is in reverse order):
* Group ID (integer)
* Category ID (integer)
* Frequency, sorted in reverse order (integer)
* Text (alphanumeric)
The sorted result should be:
Group-ID   Category-ID   Text      Frequency
----------------------------------------------
190        700           grapes    17
200        900           bananas   5
200        900           figs      4
200        1000          lemons    10
200        1000          oranges   10
200        1000          pears     8
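The sort command can solve this directly:
sort -t $'\t' -k 1n,1 -k 2n,2 -k4rn,4 -k3,3 my-file
Explanation:
-t $'\t': use TAB as the field separator
-k 1,1: sort by the value of the first column only; with a single 1 (-k 1), the sort key runs from the first column to the end of the line
n: compare numerically; the default is lexicographic order, in which 10 < 2
r: reverse the order; the default is ascending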
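The effect of the n and r modifiers described above is easiest to see on three numbers. A minimal sketch; the input values and variable names are illustrative, and LC_ALL=C is added only to make the lexicographic case deterministic.

```shell
#!/bin/sh
# Default comparison is lexicographic: "10" sorts before "2".
lexicographic=$(printf '%s\n' 10 2 1 | LC_ALL=C sort)
# -n compares the values as numbers.
numeric=$(printf '%s\n' 10 2 1 | sort -n)
# -r reverses whatever order is in effect, here the numeric one.
numeric_reversed=$(printf '%s\n' 10 2 1 | sort -rn)

echo "default:"; echo "$lexicographic"
echo "-n:";      echo "$numeric"
echo "-rn:";     echo "$numeric_reversed"
```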
So the final command is: sort -t $'\t' -k 1n,1 -k 2n,2 -k4rn,4 -k3,3 my-file
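The command can be checked against the sample data with the sketch below. Two things here are assumptions rather than part of the original: the header and separator lines are left out of the test file so that only data rows are sorted, and the TAB is produced with printf instead of $'\t', since $'\t' is a bash/ksh/zsh feature that a plain POSIX sh may not understand.

```shell
#!/bin/sh
tmp=$(mktemp)
tab=$(printf '\t')   # a literal TAB character, portable across shells

# Data rows only; printf repeats the format for each group of four arguments.
printf '%s\t%s\t%s\t%s\n' \
    200 1000 oranges 10 \
    200 900  bananas 5 \
    200 1000 pears   8 \
    200 1000 lemons  10 \
    200 900  figs    4 \
    190 700  grapes  17 > "$tmp"

# Keys in order: column 1 numeric, column 2 numeric,
# column 4 numeric descending, column 3 lexicographic.
sorted=$(LC_ALL=C sort -t "$tab" -k 1n,1 -k 2n,2 -k 4rn,4 -k 3,3 "$tmp")

echo "$sorted"
rm -f "$tmp"
```

Each key is limited to a single column (for example 1n,1), so a tie in one column falls through to the next key instead of being decided by the rest of the line.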