Linux_Linux_sort Command

Latest recommended article published on 2022-04-01 11:08:51

高达一号

Category: Linux&&Mac&&Windows

Article link: https://blog.csdn.net/u010003835/article/details/106806413

I was recently asked how to compute the intersection of two files on Linux when the files may contain duplicate lines. Let's walk through it.

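Command reference (English)

First, let's look at the built-in description of sort:

You can read it with man sort or sort --help. (Note that sort -h does not print help; -h is the human-numeric-sort option.)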

[root@cdh-manager linux_cmd_test]# sort --help
Usage: sort [OPTION]... [FILE]...
  or:  sort [OPTION]... --files0-from=F
Write sorted concatenation of all FILE(s) to standard output.

Mandatory arguments to long options are mandatory for short options too.
Ordering options:

  -b, --ignore-leading-blanks  ignore leading blanks
  -d, --dictionary-order      consider only blanks and alphanumeric characters
  -f, --ignore-case           fold lower case to upper case characters
  -g, --general-numeric-sort  compare according to general numerical value
  -i, --ignore-nonprinting    consider only printable characters
  -M, --month-sort            compare (unknown) < 'JAN' < ... < 'DEC'
  -h, --human-numeric-sort    compare human readable numbers (e.g., 2K 1G)
  -n, --numeric-sort          compare according to string numerical value
  -R, --random-sort           sort by random hash of keys
      --random-source=FILE    get random bytes from FILE
  -r, --reverse               reverse the result of comparisons
      --sort=WORD             sort according to WORD:
                                general-numeric -g, human-numeric -h, month -M,
                                numeric -n, random -R, version -V
  -V, --version-sort          natural sort of (version) numbers within text

Other options:

      --batch-size=NMERGE   merge at most NMERGE inputs at once;
                            for more use temp files
  -c, --check, --check=diagnose-first  check for sorted input; do not sort
  -C, --check=quiet, --check=silent  like -c, but do not report first bad line
      --compress-program=PROG  compress temporaries with PROG;
                              decompress them with PROG -d
      --debug               annotate the part of the line used to sort,
                              and warn about questionable usage to stderr
      --files0-from=F       read input from the files specified by
                            NUL-terminated names in file F;
                            If F is - then read names from standard input
  -k, --key=KEYDEF          sort via a key; KEYDEF gives location and type
  -m, --merge               merge already sorted files; do not sort
  -o, --output=FILE         write result to FILE instead of standard output
  -s, --stable              stabilize sort by disabling last-resort comparison
  -S, --buffer-size=SIZE    use SIZE for main memory buffer
  -t, --field-separator=SEP  use SEP instead of non-blank to blank transition
  -T, --temporary-directory=DIR  use DIR for temporaries, not $TMPDIR or /tmp;
                              multiple options specify multiple directories
      --parallel=N          change the number of sorts run concurrently to N
  -u, --unique              with -c, check for strict ordering;
                              without -c, output only the first of an equal run
  -z, --zero-terminated     end lines with 0 byte, not newline
      --help     display this help and exit
      --version  output version information and exit

KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a
field number and C a character position in the field; both are origin 1, and
the stop position defaults to the line's end.  If neither -t nor -b is in
effect, characters in a field are counted from the beginning of the preceding
whitespace.  OPTS is one or more single-letter ordering options [bdfgiMhnRrV],
which override global ordering options for that key.  If no key is given, use
the entire line as the key.

SIZE may be followed by the following multiplicative suffixes:
% 1% of memory, b 1, K 1024 (default), and so on for M, G, T, P, E, Z, Y.

With no FILE, or when FILE is -, read standard input.

*** WARNING ***
The locale specified by the environment affects sort order.
Set LC_ALL=C to get the traditional sort order that uses
native byte values.

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
For complete documentation, run: info coreutils 'sort invocation'
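The KEYDEF syntax in the help text is easier to follow with a concrete case. Below is a minimal sketch, assuming a hypothetical colon-separated file scores.txt (name:score:city) that is not part of the original article. It sorts on the second field only, first numerically and then as plain text.

```shell
# Hypothetical sample data (name:score:city), created in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'bob:9:paris' 'alice:10:rome' 'carol:2:oslo' > "$workdir/scores.txt"

# -t ':' sets the field separator; -k2,2n restricts the key to field 2
# and compares it numerically.
by_score=$(sort -t ':' -k2,2n "$workdir/scores.txt")
printf '%s\n' "$by_score"

# Without the n modifier, field 2 is compared as text, so "10" sorts before "2".
by_text=$(LC_ALL=C sort -t ':' -k2,2 "$workdir/scores.txt")
printf '%s\n' "$by_text"

rm -rf "$workdir"
```

Writing the key as -k2,2 rather than -k2 matters: as the help text says, the stop position defaults to the end of the line, so -k2 alone would also include the city field in the key.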

Command reference (summary)

Next, a closer look at what the common options do:

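Syntax

sort [-bcdfimMnru][-o<output file>][-t<separator>][+<start field>-<end field>][--help][--version][file]

Options:

-b Ignore leading blanks at the start of each line.

-c Check whether the file is already sorted; do not sort it.

-d When sorting, consider only letters, digits, and blanks, and ignore all other characters.

-f When sorting, treat lowercase letters as uppercase.

-i When sorting, consider only ASCII characters in the octal range 040 to 176, and ignore all other characters.

-m Merge several files that are already sorted.

-M Sort by month, using the three-letter month abbreviation at the start of the key.

-n Sort by numeric value.

-u Unique: duplicates are removed from the output.

-o<output file> Write the sorted result to the specified file.

-r Sort in reverse order.

-t<separator> Use the given character as the field separator when sorting.

+<start field>-<end field> Sort on the given fields, from the start field up to but not including the end field. This is the obsolete form; the help text above documents -k (--key) for the same purpose.

--help Display help.

--version Display version information.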
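A few of the options above (-n, -c, -m) are not exercised by the examples later in this post. Here is a small sketch of how they behave, using hypothetical files odd.txt, even.txt, and unsorted.txt that are made up purely for illustration.

```shell
# Hypothetical input files, created in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 1 3 5 > "$workdir/odd.txt"
printf '%s\n' 2 4 6 > "$workdir/even.txt"
printf '%s\n' 10 9 100 > "$workdir/unsorted.txt"

# -n compares by numeric value instead of character by character.
numeric=$(sort -n "$workdir/unsorted.txt")
printf '%s\n' "$numeric"

# -c only checks: exit status 0 means already sorted, non-zero means not.
if sort -n -c "$workdir/odd.txt"; then odd_sorted=yes; else odd_sorted=no; fi
if sort -n -c "$workdir/unsorted.txt" 2>/dev/null; then bad_sorted=yes; else bad_sorted=no; fi
echo "odd.txt sorted: $odd_sorted, unsorted.txt sorted: $bad_sorted"

# -m merges inputs that are each already sorted, without re-sorting them.
merged=$(sort -n -m "$workdir/odd.txt" "$workdir/even.txt")
printf '%s\n' "$merged"

rm -rf "$workdir"
```

Because -c reports its result through the exit status, it fits naturally in an if statement or before a step that requires sorted input.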

Examples

Create two test files

test.txt

[root@cdh-manager linux_cmd_test]# cat test.txt
test
as
test
ss
pz
sda
as

test2.txt

[root@cdh-manager linux_cmd_test]# cat test2.txt
test
wq
qq
ss
ssdz
ww

Example 1

Sort the lines of a file with sort's default behavior (the default is lexicographic order):

[root@cdh-manager linux_cmd_test]# sort test.txt
as
as
pz
sda
ss
test
test
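The default order is not fixed: the WARNING in the help text notes that the locale affects it. The sketch below, using made-up input lines, pins the locale with LC_ALL=C so the result is byte-value order regardless of the machine's settings, and then shows -f for a case-insensitive comparison.

```shell
# LC_ALL=C compares raw byte values, so uppercase ASCII letters
# sort before lowercase ones.
c_order=$(printf '%s\n' b A a B | LC_ALL=C sort)
printf '%s\n' "$c_order"

# -f folds lowercase to uppercase for the comparison; lines that then
# compare equal (A and a) fall back to byte order as a last resort.
folded=$(printf '%s\n' b A a B | LC_ALL=C sort -f)
printf '%s\n' "$folded"
```

Under a non-C locale the first command may give a different order, which is why scripts that depend on a stable order often set LC_ALL=C explicitly.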

Example 2

-r

Reverse lexicographic order

[root@cdh-manager linux_cmd_test]# sort -r test.txt
test
test
ss
sda
pz
as
as

Example 3

-r -u

Reverse lexicographic order with duplicates removed

[root@cdh-manager linux_cmd_test]# sort -r -u test.txt
test
ss
sda
pz
as
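As a side note, sort -u only tells us which lines exist, not which ones were repeated. A minimal sketch of finding the repeated lines follows. It assumes the separate uniq utility is available (it is not covered in this post) and recreates test.txt in a scratch directory.

```shell
# Recreate test.txt from this post in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' test as test ss pz sda as > "$workdir/test.txt"

# sort -u keeps one copy of every distinct line.
unique_lines=$(LC_ALL=C sort -u "$workdir/test.txt")
printf '%s\n' "$unique_lines"

# uniq only collapses adjacent duplicates, so it needs sorted input;
# "sort | uniq" gives the same lines as "sort -u".
piped_unique=$(LC_ALL=C sort "$workdir/test.txt" | uniq)

# uniq -d prints only the lines that occur more than once.
repeated=$(LC_ALL=C sort "$workdir/test.txt" | uniq -d)
printf '%s\n' "$repeated"

rm -rf "$workdir"
```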

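Example 4

Combine test.txt and test2.txt, remove duplicates, and write the result to another file. Note that the output contains lines found in only one of the files (for example pz and qq), so this command produces the deduplicated union of the two files rather than their intersection.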

[root@cdh-manager linux_cmd_test]# cat test2.txt test.txt | sort -u -o intersect.txt

[root@cdh-manager linux_cmd_test]# cat intersect.txt 
as
pz
qq
sda
ss
ssdz
test
wq
ww
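For an actual intersection (lines present in both files), one possible approach is sketched below. It is not from the original post: it assumes the comm and uniq utilities are available, and the intermediate file names a.sorted and b.sorted are made up for the example.

```shell
# Recreate the two test files from this post in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' test as test ss pz sda as > "$workdir/test.txt"
printf '%s\n' test wq qq ss ssdz ww > "$workdir/test2.txt"

# comm expects sorted input, so sort and deduplicate each file first.
# LC_ALL=C keeps sort and comm on the same collation order.
LC_ALL=C sort -u "$workdir/test.txt" > "$workdir/a.sorted"
LC_ALL=C sort -u "$workdir/test2.txt" > "$workdir/b.sorted"

# -1 and -2 suppress the lines unique to each file, leaving only the common ones.
common=$(LC_ALL=C comm -12 "$workdir/a.sorted" "$workdir/b.sorted")
printf '%s\n' "$common"

# Alternative with sort and uniq only: once each file is deduplicated,
# a line seen twice in the combined stream must come from both files.
common_alt=$(cat "$workdir/a.sorted" "$workdir/b.sorted" | LC_ALL=C sort | uniq -d)
printf '%s\n' "$common_alt"

rm -rf "$workdir"
```

Deduplicating each file before combining is what makes the second variant correct; without it, a line repeated inside a single file (such as as in test.txt) would be reported as common.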
