Linux_Linux_sort 命令

 最近有被问到如何在Linux 中实现 2个可能重复文件的交集。下面,我们进行下梳理。

 

 

函数介绍 英文

首先,看下sort 的函数介绍 :

可以使用的方法 man sort / sort -h

[root@cdh-manager linux_cmd_test]# sort --help
Usage: sort [OPTION]... [FILE]...
  or:  sort [OPTION]... --files0-from=F
Write sorted concatenation of all FILE(s) to standard output.

Mandatory arguments to long options are mandatory for short options too.
Ordering options:

  -b, --ignore-leading-blanks  ignore leading blanks
  -d, --dictionary-order      consider only blanks and alphanumeric characters
  -f, --ignore-case           fold lower case to upper case characters
  -g, --general-numeric-sort  compare according to general numerical value
  -i, --ignore-nonprinting    consider only printable characters
  -M, --month-sort            compare (unknown) < 'JAN' < ... < 'DEC'
  -h, --human-numeric-sort    compare human readable numbers (e.g., 2K 1G)
  -n, --numeric-sort          compare according to string numerical value
  -R, --random-sort           sort by random hash of keys
      --random-source=FILE    get random bytes from FILE
  -r, --reverse               reverse the result of comparisons
      --sort=WORD             sort according to WORD:
                                general-numeric -g, human-numeric -h, month -M,
                                numeric -n, random -R, version -V
  -V, --version-sort          natural sort of (version) numbers within text

Other options:

      --batch-size=NMERGE   merge at most NMERGE inputs at once;
                            for more use temp files
  -c, --check, --check=diagnose-first  check for sorted input; do not sort
  -C, --check=quiet, --check=silent  like -c, but do not report first bad line
      --compress-program=PROG  compress temporaries with PROG;
                              decompress them with PROG -d
      --debug               annotate the part of the line used to sort,
                              and warn about questionable usage to stderr
      --files0-from=F       read input from the files specified by
                            NUL-terminated names in file F;
                            If F is - then read names from standard input
  -k, --key=KEYDEF          sort via a key; KEYDEF gives location and type
  -m, --merge               merge already sorted files; do not sort
  -o, --output=FILE         write result to FILE instead of standard output
  -s, --stable              stabilize sort by disabling last-resort comparison
  -S, --buffer-size=SIZE    use SIZE for main memory buffer
  -t, --field-separator=SEP  use SEP instead of non-blank to blank transition
  -T, --temporary-directory=DIR  use DIR for temporaries, not $TMPDIR or /tmp;
                              multiple options specify multiple directories
      --parallel=N          change the number of sorts run concurrently to N
  -u, --unique              with -c, check for strict ordering;
                              without -c, output only the first of an equal run
  -z, --zero-terminated     end lines with 0 byte, not newline
      --help     display this help and exit
      --version  output version information and exit

KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a
field number and C a character position in the field; both are origin 1, and
the stop position defaults to the line's end.  If neither -t nor -b is in
effect, characters in a field are counted from the beginning of the preceding
whitespace.  OPTS is one or more single-letter ordering options [bdfgiMhnRrV],
which override global ordering options for that key.  If no key is given, use
the entire line as the key.

SIZE may be followed by the following multiplicative suffixes:
% 1% of memory, b 1, K 1024 (default), and so on for M, G, T, P, E, Z, Y.

With no FILE, or when FILE is -, read standard input.

*** WARNING ***
The locale specified by the environment affects sort order.
Set LC_ALL=C to get the traditional sort order that uses
native byte values.

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
For complete documentation, run: info coreutils 'sort invocation'

 

函数介绍 中文

再看下中文的具体介绍:

语法

sort [-bcdfimMnr][-o<输出文件>][-t<分隔字符>][+<起始栏位>-<结束栏位>][--help][--verison][文件]

参数说明:

-b 忽略每行前面开始出的空格字符。

-c 检查文件是否已经按照顺序排序。

-d 排序时,处理英文字母、数字及空格字符外,忽略其他的字符。

-f 排序时,将小写字母视为大写字母。

-i 排序时,除了040至176之间的ASCII字符外,忽略其他的字符。

-m 将几个排序好的文件进行合并。

-M 将前面3个字母依照月份的缩写进行排序。

-n 依照数值的大小排序。

-u 意味着是唯一的(unique),输出的结果是去完重了的。

-o<输出文件> 将排序后的结果存入指定的文件。

-r 以相反的顺序来排序。

-t<分隔字符> 指定排序时所用的栏位分隔字符。

+<起始栏位>-<结束栏位> 以指定的栏位来排序,范围由起始栏位到结束栏位的前一栏位。

--help 显示帮助。

--version 显示版本信息。

 

实例

构建 2个测试文件

test.txt

[root@cdh-manager linux_cmd_test]# cat test.txt
test
as
test
ss
pz
sda
as

test2.txt

[root@cdh-manager linux_cmd_test]# cat test2.txt
test
wq
qq
ss
ssdz
ww

 

案例一

在使用sort命令以默认的式对文件的行进行排序,使用的命令如下:(默认是字典序)

[root@cdh-manager linux_cmd_test]# sort test.txt
as
as
pz
sda
ss
test
test

 

案例二

-r

字典序倒序

[root@cdh-manager linux_cmd_test]# sort -r test.txt
test
test
ss
sda
pz
as
as

 

案例三

-r -u 

字典序排序并去重

[root@cdh-manager linux_cmd_test]# sort -r -u test.txt
test
ss
sda
pz
as

 

案例四

求 test.txt test2.txt 的交集,并去重输出到另一个文件

[root@cdh-manager linux_cmd_test]# cat test2.txt test.txt | sort -u -o intersect.txt
[root@cdh-manager linux_cmd_test]# cat intersect.txt 
as
pz
qq
sda
ss
ssdz
test
wq
ww

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值