Sorting and Merging Files: Using sort, cut, tr, and Related Commands

weixin_34049948

Original link: http://blog.51cto.com/hebingkun/1311391

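The commands in this article only read text and print a processed result; none of them changes the original file.

The sort command sorts the lines of a file.

Options and their meaning

-c check whether the file is already sorted

-k specify the field used as the sort key (common)

-m merge files that are already sorted

-n sort by numeric value (common)

-o write the output to the named file, like redirecting the output

-r reverse the sort order

-t set the field separator (common)

-u drop duplicate lines from the result

Format: sort [options] input-file

The options in detail:

-t and -k: by default sort compares whole lines, starting from the first field and moving on to later fields when the earlier ones are equal. -t sets the field separator, and -k selects the field where the sort key starts.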

[root@localhost ~]# cat abc
aa:ff:kk
bb:qq:gg
dd:ee:ww
xx cc
[root@localhost ~]# sort -t: -k3 abc
xx cc
bb:qq:gg
aa:ff:kk
dd:ee:ww

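Note that the sort key starts at the third field, so the first two fields do not affect the order, and -t: makes the colon the field separator. The line xx cc has no third field, so its key is empty and it sorts first.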
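A detail worth seeing in isolation: -k3 makes the key run from field 3 to the end of the line, while -k3,3 limits it to that one field. The sketch below uses made-up sample data and field 2, and sets LC_ALL=C as an assumption so that the byte-wise order is reproducible across systems.

```shell
# Hypothetical sample data: three colon-separated fields per line.
data='1:b:z
2:b:a
3:a:m'

# -k2: the key runs from field 2 to the end of the line ("b:z", "b:a", "a:m").
to_end=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2)

# -k2,2: the key is field 2 only; equal keys fall back to comparing whole lines.
field_only=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2)

printf '%s\n---\n%s\n' "$to_end" "$field_only"
```

With -k2 the two lines whose second field is b are ordered by their third field; with -k2,2 they are ordered by the whole line instead.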

The same command works on /etc/passwd:

[root@localhost ~]# sort -t: -k3 /etc/passwd
root:x:0:0:root:/root:/bin/bash
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
bin:x:1:1:bin:/bin:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
gopher:x:13:30:gopher:/var/gopher:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
squid:x:23:23::/var/spool/squid:/sbin/nologin

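The third field is the UID, so the lines are now ordered by UID. The comparison is textual, character by character, rather than by numeric value.

For example, a text sort compares digits from the left, so 400 sorts before 67:

Input   Sorted as text
1       1
400     400
67      501
89      602
501     67
602     89

The -n option compares each key as a whole number:

Input   Sorted with -n
900     3
60      60
3       100
100     900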

[root@localhost ~]# cat abc
900:aa
60:bb
3:rr
100:pp
[root@localhost ~]# sort -t: -k1 abc
100:pp
3:rr
60:bb
900:aa
[root@localhost ~]# sort -t: -k1n abc
3:rr
60:bb
100:pp
900:aa

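As the output shows, with n the first field is compared as a whole number; without it the field is compared character by character.

So sort -t: -k3n /etc/passwd sorts the entries by UID from smallest to largest.

-r reverses the order, sorting from largest to smallest: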

[root@localhost ~]# cat abc
900:aa
60:bb
3:rr
100:pp
[root@localhost ~]# sort -t: -k1nr abc
900:aa
100:pp
60:bb
3:rr
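The three orderings shown so far (text, numeric, reverse numeric) can be compared side by side. This is a small sketch on the same four lines; the variable names are illustrative, and LC_ALL=C is an assumption made so the text order does not depend on the system locale.

```shell
data='900:aa
60:bb
3:rr
100:pp'

# Text comparison of field 1: "100" < "3" < "60" < "900".
as_text=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k1,1)

# Numeric comparison of field 1.
as_number=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k1,1n)

# Numeric comparison, largest first.
descending=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k1,1nr)

printf '%s\n---\n%s\n---\n%s\n' "$as_text" "$as_number" "$descending"
```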

-u drops duplicates: of the lines that compare equal, only one is kept.

[root@localhost ~]# cat abc
900:aa
900:aa
900:aa
60:bb
[root@localhost ~]# sort -t: -k1n -u abc
60:bb
900:aa

The two extra copies of 900:aa are not shown.
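One point that is easy to miss: when -u is combined with -k, lines count as duplicates if their keys are equal, even when the rest of the line differs. The following sketch uses made-up data to show the difference; which of the equal-key lines is kept is deliberately not relied on here.

```shell
data='900:aa
900:zz
60:bb
900:aa'

# Whole-line comparison: only exact duplicates collapse.
by_line=$(printf '%s\n' "$data" | LC_ALL=C sort -u)

# Key comparison: every line whose first field is 900 counts as a duplicate.
by_key=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k1,1n -u)

# Keep only the first field to show which keys survived.
surviving_keys=$(printf '%s\n' "$by_key" | cut -d: -f1)

printf '%s\n---\n%s\n' "$by_line" "$surviving_keys"
```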

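-o writes the output to the named file. Plain > redirection usually does the same job, with one exception: sort -o abc abc can safely write the result back to the input file, whereas sort abc > abc empties the file before sort reads it.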
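As a minimal sketch of the one case where -o is not interchangeable with >, the snippet below sorts a file in place. It works in a temporary directory (an assumption made to keep the example from touching real files) and removes it afterwards.

```shell
tmp=$(mktemp -d)
printf '900:aa\n60:bb\n3:rr\n' > "$tmp/abc"

# -o may name the input file: sort reads all input before it writes the output.
sort -t: -k1,1n -o "$tmp/abc" "$tmp/abc"

in_place=$(cat "$tmp/abc")
rm -rf "$tmp"

printf '%s\n' "$in_place"
```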

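-m merges files without sorting them again. The precondition is that each input file is already sorted; when it is not, as with abc below, the lines are only interleaved and the result is not in order.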

[root@localhost ~]# cat abc
555:rr
900:aa
60:bb
[root@localhost ~]# cat aaa
xxx
xxxxx
[root@localhost ~]# sort -t: -m abc aaa
555:rr
900:aa
60:bb
xxx
xxxxx

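The lines of both files are printed together on standard output; neither file is changed.

Without -m, sort -t: abc aaa also reads both files, but it sorts all of the lines together instead of merging them, so the two commands agree only when each input file is already sorted.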
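For comparison, here is a sketch of -m used the way it is intended, on inputs that are already sorted by the same key. The file contents are invented for illustration, and -c is used first to confirm that each file is in order.

```shell
tmp=$(mktemp -d)
printf '3:rr\n60:bb\n900:aa\n' > "$tmp/abc"   # already in numeric order
printf '7:xx\n100:pp\n' > "$tmp/aaa"          # already in numeric order

# -c exits with status 0 when the file is already sorted for the given key.
sort -t: -k1,1n -c "$tmp/abc" && sort -t: -k1,1n -c "$tmp/aaa"

# -m interleaves the two sorted inputs without re-sorting them.
merged=$(sort -t: -k1,1n -m "$tmp/abc" "$tmp/aaa")
rm -rf "$tmp"

printf '%s\n' "$merged"
```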

Several files can also be sorted together by field:

[root@localhost ~]# cat abc
555:rr
900:aa
60:bb
[root@localhost ~]# cat aaa
777:xxx
444444:xxxxx
[root@localhost ~]# sort -t: -k1n abc aaa
60:bb
555:rr
777:xxx
900:aa
444444:xxxxx

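The contents of the two files are combined and then sorted by field.

That concludes the sort command.

The uniq command removes repeated lines that are adjacent; it is similar to sort -u.

Options and their meaning

-c prefix each line with the number of times it occurs consecutively

-d show only the lines that are repeated, once per group

-u show only the lines that are not repeated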

[root@localhost ~]# cat abc
555:rr
555:rr
555:rr
60:bb
[root@localhost ~]# uniq abc
555:rr
60:bb

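The two repeated lines are removed.

The difference from sort -u is that sort -u sorts first, so it keeps exactly one copy of every line no matter where the duplicates are: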

[root@localhost ~]# cat abc
555:rr
555:rr
555:rr
60:bb
60:bb
show me
555:rr
hello
[root@localhost ~]# sort -u abc
555:rr
60:bb
hello
show me
[root@localhost ~]# uniq abc
555:rr
60:bb
show me
555:rr
hello

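Note that uniq only removes consecutive repeats, so the 555:rr on line 4 of its output survives, while sort -u treats all copies as duplicates and drops them together.

-c prints how many times each line is repeated: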

[root@localhost ~]# uniq -c abc
      3 555:rr
      2 60:bb
      1 show me
      1 555:rr
      1 hello

The count of consecutive occurrences is shown in front of each line.
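Because uniq -c counts only consecutive repeats, 555:rr is reported twice in the output above. A common way to get one total per distinct line is to sort first. The sketch below does that on the same data; the sed step that strips the leading padding is an addition made here so the result is easy to compare.

```shell
data='555:rr
555:rr
555:rr
60:bb
60:bb
show me
555:rr
hello'

# Sort so that equal lines become adjacent, count them, then trim the padding.
counts=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq -c | sed 's/^ *//')

printf '%s\n' "$counts"
```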

-d prints only the lines that are repeated consecutively:

[root@localhost ~]# cat abc
555:rr
555:rr
555:rr
60:bb
60:bb
show me
555:rr
hello
[root@localhost ~]# uniq -d abc
555:rr
60:bb

Only lines with consecutive repeats are shown.

-u prints only the lines that are not repeated consecutively:

[root@localhost ~]# cat abc
555:rr
555:rr
555:rr
60:bb
60:bb
show me
555:rr
hello
[root@localhost ~]# uniq -u abc
show me
555:rr
hello

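In effect, -d and -u are opposites of each other.

The cut command extracts text by field or by character position from standard input or from files.

Format: cut [options] file

Options and their meaning

-c select characters by position or range

-f select fields by number or range

-d set the field delimiter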

[root@localhost ~]# cat abc
555:rr
60:bb
hello:333
[root@localhost ~]# cut -c1 abc
5
6
h
[root@localhost ~]# cut -c1-5 abc
555:r
60:bb
hello
[root@localhost ~]# cut -c2,4 abc
5:
0b
el

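-c1 extracts the first character of each line of abc, -c1-5 extracts characters 1 through 5, and -c2,4 extracts the 2nd and 4th characters.

-d sets the delimiter and -f selects the field; the two options are normally used together.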

[root@localhost ~]# cat abc
555:rr
60:bb
hello:333
[root@localhost ~]# cut -d: -f2 abc
rr
bb
333

This is much like extracting a field with awk.

A space can be used as the delimiter by quoting it: cut -d" "

[root@localhost ~]# cat aaa
777 xxx
444444 xxxxx
[root@localhost ~]# cut -d" " -f2 aaa
xxx
xxxxx
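The -f option also accepts lists and open-ended ranges, and cut differs from awk in one way that matters with spaces: cut treats every single delimiter as a field boundary, while awk by default collapses runs of blanks. A short sketch, using a passwd-style line as sample input:

```shell
line='squid:x:23:23::/var/spool/squid:/sbin/nologin'

name_uid=$(printf '%s\n' "$line" | cut -d: -f1,3)   # fields 1 and 3
from_six=$(printf '%s\n' "$line" | cut -d: -f6-)    # field 6 to the end

# Two spaces in a row: cut sees an empty field 2, awk sees "b" as field 2.
cut_field=$(printf 'a  b\n' | cut -d' ' -f2)
awk_field=$(printf 'a  b\n' | awk '{print $2}')

printf '%s\n%s\n[%s] [%s]\n' "$name_uid" "$from_six" "$cut_field" "$awk_field"
```

For input aligned with a variable number of spaces, awk or tr -s ' ' before cut is usually the easier route.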

cut can also extract an IP address:

[root@localhost ~]# ifconfig eth0 | grep "inet addr" | cut -d: -f2 | cut -d" " -f1
192.168.1.1

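The idea is the same as with awk: grep keeps the line containing the address, the first cut splits on : and takes field 2, and the second cut splits on spaces and takes field 1.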
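The two cut stages can be tried without a network interface by feeding them a fixed sample line. The line below is copied from the ifconfig output format shown in this article; newer ifconfig versions print the address in a different layout, so treat the delimiters as an assumption tied to this format.

```shell
# Sample line in the older "inet addr:" format used in this article.
sample='          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0'

# Stage 1 splits on ":" and keeps field 2 ("192.168.1.1  Bcast").
# Stage 2 splits on " " and keeps field 1.
ip=$(printf '%s\n' "$sample" | cut -d: -f2 | cut -d' ' -f1)

printf '%s\n' "$ip"
```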

cut can also be combined with sort to list UIDs in order:

[root@localhost ~]# cat test
squid:x:23:23::/var/spool/squid:/sbin/nologin
xfs:x:43:43:X Font Server:/etc/X11/fs:/sbin/nologin
sabayon:x:86:86:Sabayon user:/home/sabayon:/sbin/nologin
leon:x:500:500::/home/leon:/bin/bash
tom:x:501:501::/home/tom:/bin/bash
[root@localhost ~]# sort -t: -k3n test | cut -d: -f3
23
43
86
500
501

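The lines are sorted by UID and the UID field is then cut out on its own; awk can of course do the same.

The paste command joins the contents of two files side by side.

Options and their meaning

-d set the delimiter; the default is a tab

-s paste each file into a single line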

[root@localhost ~]# cat aa
aaa
bbb
ccc
[root@localhost ~]# cat bb
111
222
333
[root@localhost ~]# paste aa bb
aaa     111
bbb     222
ccc     333

Note that the pasted columns are separated by a tab, the default delimiter.

[root@localhost ~]# paste aa bb > newfile
[root@localhost ~]# cat newfile
aaa     111
bbb     222
ccc     333

The pasted result can of course be redirected to another file.

[root@localhost ~]# cat aaa
777 xxx
444444 xxxxx
[root@localhost ~]# paste -d? abc aaa
555:rr?777 xxx
60:bb?444444 xxxxx
hello:333?

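-d sets the delimiter: the default tab is replaced here by ?. Note that -d takes a list of single-character delimiters that are used in turn, not a multi-character string, so -d*** behaves the same as -d*. Characters such as ? and * are best quoted so that the shell does not expand them.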
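The delimiter list becomes visible once there are more than two input files. In this sketch the three file names and their contents are invented; the first delimiter joins columns 1 and 2, and the second joins columns 2 and 3.

```shell
tmp=$(mktemp -d)
printf 'aaa\nbbb\n' > "$tmp/f1"
printf '111\n222\n' > "$tmp/f2"
printf 'x\ny\n' > "$tmp/f3"

# ":" is used between f1 and f2, "," between f2 and f3.
pasted=$(paste -d ':,' "$tmp/f1" "$tmp/f2" "$tmp/f3")
rm -rf "$tmp"

printf '%s\n' "$pasted"
```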

[root@localhost ~]# cat aa
aaa
bbb
ccc
[root@localhost ~]# cat bb
111
222
333
[root@localhost ~]# paste -s aa bb
aaa     bbb     ccc
111     222     333

-s joins the lines of each file into one line, producing one output line per file; the result is not very readable.
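Combined with -d, the -s option is a handy way to turn a column of values into a delimited list. A minimal sketch, reading from standard input (the - operand) rather than from a file:

```shell
# Join all input lines into one comma-separated line.
joined=$(printf 'aaa\nbbb\nccc\n' | paste -s -d, -)

printf '%s\n' "$joined"
```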

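The tr command translates characters. It is similar to sed, but simpler.

Format: tr [options] string1 string2 < input-file, or after a pipe

Options and their meaning

-c use the complement of the character set in string1, that is, every character not in string1

-d delete every character that appears in string1 (common)

-s squeeze each run of a repeated character into a single character

tr reads only standard input, so it is used either after a pipe or with < redirection.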

[root@localhost ~]# cat abc
555:rr
60:bb
hello:333
[root@localhost ~]# tr -d 555 < abc
:rr
60:bb
hello:333

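tr reads abc on standard input and deletes every character 5. tr works on sets of characters, not on strings: 555 is just the set containing 5, so tr -d 5 gives the same result.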
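The difference between deleting a character and deleting a string shows up as soon as the input contains a lone 5. The sketch below contrasts tr -d with sed, which is brought in here only for comparison; the second input line is made up for the purpose.

```shell
data='555:rr
505:x5'

# tr deletes every character "5", wherever it appears.
by_char=$(printf '%s\n' "$data" | tr -d 555)

# sed deletes only the exact string "555".
by_string=$(printf '%s\n' "$data" | sed 's/555//g')

printf '%s\n---\n%s\n' "$by_char" "$by_string"
```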

[root@localhost ~]# cat abc
555:rRAd
60:bB
heLLo:333
[root@localhost ~]# tr -d A-Z < abc
555:rd
60:b
heo:333

Every uppercase character in the range A-Z is deleted from the text of abc.

Likewise, the digits 0-9 can be deleted:

[root@localhost ~]# cat abc
555:rRAd
60:bB
heLLo:333
[root@localhost ~]# tr -d 0-9 < abc
:rRAd
:bB
heLLo:

Every digit in the range 0-9 is deleted.
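The -c option from the table above has no example in this article, so here is a small sketch of the inverse operation: keep only the digits. -c complements the set, and -d then deletes everything outside it; the newline is included in the set so that line breaks survive.

```shell
# Delete every character that is NOT a digit or a newline.
digits=$(printf '555:rRAd\n60:bB\n' | tr -cd '0-9\n')

printf '%s\n' "$digits"
```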

-s squeezes repeated characters, keeping only one:

[root@localhost ~]# cat aa
aaa
bbb
ccc
[root@localhost ~]# tr -s a,b < aa
a
b
ccc

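The set a,b contains a, b, and the comma itself (tr does not treat the comma as a separator); each run of one of these characters is squeezed to a single character.

Removing runs of blank lines from a text: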

[root@localhost ~]# cat aa
aaa
bbb
ccc
[root@localhost ~]# tr -s "\n" < aa
aaa
bbb
ccc

Every run of newlines is squeezed into one, which removes the blank lines; \n stands for the newline character.
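To make the effect visible, the sketch below feeds tr an input that really does contain blank lines, written out with printf so the empty lines are explicit.

```shell
# Input: aaa, two blank lines, bbb, one blank line, ccc.
squeezed=$(printf 'aaa\n\n\nbbb\n\nccc\n' | tr -s '\n')

printf '%s\n' "$squeezed"
```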

Ranges can be squeezed too:

[root@localhost ~]# cat abc
WinnnneNrrrrTteeemmm
222223333441234
[root@localhost ~]# tr -s "[a-z][0-9]" < abc
WineNrTtem
2341234

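As the output shows, each run of a repeated lowercase letter or digit is reduced to one character.

With two strings and no option, tr translates characters instead, for example tr "[A-Z]" "9" < abc: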

[root@localhost ~]# cat abc
WinnnneNrrrrTteeemmm
222223333441234
[root@localhost ~]# tr "[A-Z]" "9" < abc
9innnne9rrrr9teeemmm
222223333441234

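Every uppercase character is changed to 9. tr maps one character to one character, so it cannot replace a character with a longer string:

tr "[A-Z]" "999" < abc still puts a single 9 in place of each uppercase character.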
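The one-to-one mapping is easiest to see when both strings have the same length. This sketch shows a case conversion with character classes and a plain three-character mapping; the sample strings are illustrative.

```shell
# Each uppercase letter maps to the lowercase letter at the same position.
lower=$(printf 'heLLo:333\n' | tr '[:upper:]' '[:lower:]')

# a -> x, b -> y, c -> z; other characters pass through unchanged.
mapped=$(printf 'aabbccdd\n' | tr 'abc' 'xyz')

printf '%s\n%s\n' "$lower" "$mapped"
```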

Using tr after a pipe:

[root@localhost ~]# ifconfig eth0 | grep "inet addr"
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
[root@localhost ~]# ifconfig eth0 | grep "inet addr" | tr -d 19
          inet addr:2.68..  Bcast:2.68..255  Mask:255.255.255.0

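Every 1 and every 9 is deleted; tr matches the two characters individually, not the string 19.

That concludes this introduction to sorting and merging files.

Reposted from: https://blog.51cto.com/hebingkun/1311391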
