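The comm command compares two sorted files line by line. Its output has three columns: the first holds lines unique to file1, the second holds lines unique to file2, and the third holds lines common to both. The basic syntax, from the man page, is as follows: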
NAME
comm - compare two sorted files line by line
SYNOPSIS
comm [OPTION]... FILE1 FILE2
DESCRIPTION
Compare sorted files FILE1 and FILE2 line by line.
With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines
unique to FILE2, and column three contains lines common to both files.
-1 suppress column 1 (lines unique to FILE1)
-2 suppress column 2 (lines unique to FILE2)
-3 suppress column 3 (lines that appear in both files)
--check-order
check that the input is correctly sorted, even if all input lines are pairable
--nocheck-order
do not check that the input is correctly sorted
--output-delimiter=STR
separate columns with STR
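As a rough illustration of the last two kinds of options, the sketch below assumes GNU coreutils comm (other implementations may not accept these long options). The scratch files a.txt, b.txt, and unsorted.txt are hypothetical and exist only for this demo.

```shell
#!/bin/sh
# Hypothetical scratch files, created only for this demo.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a.txt"
printf '%s\n' banana cherry date  > "$tmpdir/b.txt"

# LC_ALL=C gives plain byte ordering for the comparison.
# Columns are normally separated by tabs; here each tab becomes "|".
delim_out=$(LC_ALL=C comm --output-delimiter='|' "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$delim_out"

# --check-order makes comm fail when an input file is not sorted.
printf '%s\n' pear apple > "$tmpdir/unsorted.txt"
if LC_ALL=C comm --check-order "$tmpdir/unsorted.txt" "$tmpdir/b.txt" >/dev/null 2>&1; then
    check_status=ok
else
    check_status=unsorted
fi
echo "$check_status"

rm -rf "$tmpdir"
```

With a visible delimiter it is easier to see that a line in column 2 is preceded by one separator and a line in column 3 by two.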
Example: first extract a handful of lines, in order, from the dictionary file into two files. Because the dictionary is already sorted, this avoids having to sort them:
qingsong@db2a:/tmp$ sed -n '5p;1001p;3000p;4000p;5000p;7000p;8800p;9900p;10000p' /usr/share/dict/american-english > file1
qingsong@db2a:/tmp$ sed -n '2p;4000p;5000p;8888p;10000p;30000p;40000p' /usr/share/dict/american-english > file2
qingsong@db2a:/tmp$ cat file1
ABM's
Ashikaga's
Charybdis's
Decker
Eurasia
Idaho's
Lipizzaner
Meghan's
Merck's
qingsong@db2a:/tmp$ cat file2
A's
Decker
Eurasia
Lombard's
Merck's
collaborated
elms
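The two files above came out sorted only because they were cut from an already sorted dictionary. For input that is not sorted, a minimal sketch is to sort both files first, using the same locale for sort and for comm so that both agree on the ordering. The file names raw1.txt, raw2.txt, sorted1.txt, and sorted2.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical unsorted inputs, created only for this demo.
tmpdir=$(mktemp -d)
printf '%s\n' pear apple fig > "$tmpdir/raw1.txt"
printf '%s\n' fig kiwi apple > "$tmpdir/raw2.txt"

# Sort both files with the same locale that comm will use.
LC_ALL=C sort "$tmpdir/raw1.txt" > "$tmpdir/sorted1.txt"
LC_ALL=C sort "$tmpdir/raw2.txt" > "$tmpdir/sorted2.txt"

# -12 suppresses columns 1 and 2, leaving only the common lines.
common=$(LC_ALL=C comm -12 "$tmpdir/sorted1.txt" "$tmpdir/sorted2.txt")
printf '%s\n' "$common"

rm -rf "$tmpdir"
```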
Compare the two files:
qingsong@db2a:/tmp$ comm file1 file2
        A's
ABM's
Ashikaga's
Charybdis's
                Decker
                Eurasia
Idaho's
Lipizzaner
        Lombard's
Meghan's
                Merck's
        collaborated
        elms
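In comm's actual output a line in column 2 is preceded by one tab and a line in column 3 by two, which is how the columns are told apart. The sketch below uses that to label each line with awk. It assumes the data itself contains no tab characters, and a.txt and b.txt are hypothetical scratch files.

```shell
#!/bin/sh
# Hypothetical scratch files, created only for this demo.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a.txt"
printf '%s\n' banana date         > "$tmpdir/b.txt"

# Split each output line on tabs: the number of fields tells us
# which column the line belongs to (1, 2, or 3).
labeled=$(LC_ALL=C comm "$tmpdir/a.txt" "$tmpdir/b.txt" |
    awk -F '\t' '
        NF == 1 { print "only in file1: " $1 }
        NF == 2 { print "only in file2: " $2 }
        NF == 3 { print "in both: " $3 }
    ')
printf '%s\n' "$labeled"

rm -rf "$tmpdir"
```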
Show only the lines unique to file1 by suppressing columns 2 and 3:
qingsong@db2a:/tmp$ comm -2 -3 file1 file2
ABM's
Ashikaga's
Charybdis's
Idaho's
Lipizzaner
Meghan's
Show only the lines unique to file2:
qingsong@db2a:/tmp$ comm -1 -3 file1 file2
A's
Lombard's
collaborated
elms
Show only the lines common to both files:
qingsong@db2a:/tmp$ comm -1 -2 file1 file2
Decker
Eurasia
Merck's
Show only the lines that are not common to both files. The trailing sed strips the leading \t from lines that begin with one:
qingsong@db2a:/tmp$ comm -3 file1 file2 | sed 's/^\t//'
A's
ABM's
Ashikaga's
Charybdis's
Idaho's
Lipizzaner
Lombard's
Meghan's
collaborated
elms
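The \t escape inside a sed pattern may not be recognized by every sed implementation. As a possible alternative, and assuming the lines themselves contain no tabs, tr can delete the tab characters instead. The files a.txt and b.txt are hypothetical scratch files for this sketch.

```shell
#!/bin/sh
# Hypothetical scratch files, created only for this demo.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a.txt"
printf '%s\n' banana date         > "$tmpdir/b.txt"

# comm -3 drops the common lines; tr -d removes the tab that
# indents the lines unique to the second file.
symdiff=$(LC_ALL=C comm -3 "$tmpdir/a.txt" "$tmpdir/b.txt" | tr -d '\t')
printf '%s\n' "$symdiff"

rm -rf "$tmpdir"
```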
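This article showed how to use the comm command-line tool to compare two sorted files efficiently, covering its basic syntax, its options, and worked examples such as showing unique lines and excluding common lines, with sed used for extra post-processing.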