【sort_uniq_join_cut_paste_split】

Article link: https://blog.csdn.net/kk185800961/article/details/8057731

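【【【sort overview】】】
sort [-cmu] [-o output_file] [other options] [-k pos1[,pos2]] input_files   (the older +pos1 -pos2 form is obsolete)
   -c Check whether the file is already sorted.       $ sort -c file
   -m Merge files that are already sorted, without re-sorting them.
   -u Output only one copy of each duplicated line.       $ sort -u file   or   $ awk '{print $2}' file | sort -u
   -o Name of the output file for the sorted result.   $ sort -o out_file file   or   $ sort file -o file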
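   [other options]
   -R Sort by a random hash of the keys (shuffle).
   -f Ignore case.
   -i Consider only printable characters.
   -b Ignore leading blanks; comparison starts at the first non-blank character of each line.
   -V Natural sort of version numbers within text.
   -n Compare the key as a number, in ascending order.   $ sort -n file
   -r Reverse the result of the comparison.       $ sort -r file
   -t Field separator; use this character instead of blank or tab to separate fields.   $ sort -n -t: -k 2 file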
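   -k n : n is a field number; the sort key starts at field n and runs to the end of the line.   $ sort -k 2 file
          To start comparing at the third character of the second field, use   $ sort -k 2.3 file
   +n : obsolete form; n is the number of fields to skip, so the key starts at field n+1. Use -k instead.
   pos1 has the form m.n, where m is a field number and n is a character offset. In the obsolete form, +4.6 means
   skip 4 fields and 6 characters, that is, sort on field 5 starting at its 7th character; the -k equivalent is
                   $ sort -k 5.7 file
   By contrast, -k 4,6 (with a comma) uses field 4 through field 6 as the key.   $ sort -k 4,6 file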
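
The options above can be tried on a small file. This is only a sketch: the file name scores.txt and its contents are made up for illustration, and LC_ALL=C is an assumption added so that the ordering does not depend on the locale.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data in the form name:score.
printf '%s\n' 'bob:30' 'alice:4' 'carol:30' 'alice:4' > scores.txt

# Numeric sort on field 2 only (-k 2,2n), with ':' as the field separator.
by_score=$(LC_ALL=C sort -t: -k 2,2n scores.txt)
echo "$by_score"

# -u keeps a single copy of each duplicated line.
unique_lines=$(LC_ALL=C sort -u scores.txt)
echo "$unique_lines"

# -c only checks the order: exit status 0 if sorted, non-zero otherwise.
if LC_ALL=C sort -c scores.txt 2>/dev/null; then
    sorted_state=sorted
else
    sorted_state=unsorted
fi
echo "$sorted_state"

# -o may name the input file itself, because sort reads all input before writing.
LC_ALL=C sort -o scores.txt scores.txt
```

Using -k 2,2 rather than -k 2 limits the key to the second field; with -k 2 the key would extend to the end of the line.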

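【【【uniq overview】】】
uniq [option] file 【Watch out for spaces, tabs, and blank lines in the file! uniq only compares adjacent lines, so sort the input first.】
   -u Print only lines that are not repeated.
   -d Print only repeated lines (one line per group of duplicates).
   -c Prefix each line with the number of times it occurs.
   -f n: n is a number; skip the first n fields and compare starting from field n+1.
   -s n: skip the first n characters when comparing.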
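
A minimal sketch of these options, assuming made-up files fruit.txt and tagged.txt. Because uniq only compares neighbouring lines, the input is sorted first; the awk step is only there to strip the padding that uniq -c puts in front of the counts.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data with duplicates that are not adjacent.
printf '%s\n' apple pear apple fig pear apple > fruit.txt

# Count occurrences of each line (sort first so duplicates become adjacent).
counts=$(LC_ALL=C sort fruit.txt | uniq -c | awk '{print $1, $2}')
echo "$counts"

# -d: one line per group of repeated lines.
repeated=$(LC_ALL=C sort fruit.txt | uniq -d)
echo "$repeated"

# -u: only lines that occur exactly once.
singles=$(LC_ALL=C sort fruit.txt | uniq -u)
echo "$singles"

# -f 1: ignore the first field, so '1 x' and '2 x' count as duplicates.
printf '%s\n' '1 x' '2 x' '3 y' > tagged.txt
by_second_field=$(uniq -f 1 tagged.txt)
echo "$by_second_field"
```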

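【【【join overview】】】
join [option] file1 file2 【Both files must be sorted on the join field.】
      -a FILENUM: FILENUM is 1 or 2, corresponding to file1 or file2.
         Also print the unpairable lines from that file.
      -e STRING: replace missing input fields with STRING.
      -i Ignore case when comparing.
      -j FIELD: equivalent to "-1 FIELD -2 FIELD".
      -o FORMAT: build each output line according to FORMAT.   $ join -o 1.2,2.2 file1 file2   (field 2 of file1, field 2 of file2)
      -t CHAR: use CHAR as the input and output field separator; -t ':' uses a colon.
      -v FILENUM: like -a FILENUM, but suppress joined output lines.
      -1 FIELD: join on this FIELD of file 1.
      -2 FIELD: join on this FIELD of file 2.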

Inner join               Format: join <FILE1> <FILE2>
Left join (left outer join)   Format: join -a1 <FILE1> <FILE2>
Right join (right outer join)   Format: join -a2 <FILE1> <FILE2>
Full join (full outer join)   Format: join -a1 -a2 <FILE1> <FILE2>
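
The join types listed above can be compared on two tiny files. This is a sketch only: left.txt and right.txt are hypothetical inputs, already sorted on the first field, which join uses as the key by default.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical inputs, both sorted on the join field (field 1).
printf '%s\n' '1 apple' '2 banana' '3 cherry' > left.txt
printf '%s\n' '2 yellow' '3 red' '4 green' > right.txt

# Inner join: only keys present in both files.
inner=$(join left.txt right.txt)
echo "$inner"

# Left outer join: additionally keep unpaired lines from file 1.
left_outer=$(join -a1 left.txt right.txt)
echo "$left_outer"

# Full outer join: keep unpaired lines from both files.
full_outer=$(join -a1 -a2 left.txt right.txt)
echo "$full_outer"

# -v1: only the lines of file 1 that have no partner in file 2.
only_left=$(join -v1 left.txt right.txt)
echo "$only_left"
```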

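   Specifying output fields:
   -o <FILENO.FIELDNO> ..
   FILENO=1 refers to the first file and FILENO=2 to the second; FIELDNO is the field number, counted from 1.
   -o 1.1,1.2,2.2 outputs the first and second fields of the first file, then the second field of the second file.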
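
In an outer join, some -o fields have no value on one side; combining -o with -e fills them in. The sketch below assumes two made-up colon-separated files, names.txt and colors.txt. In the -o list, 0 stands for the join field itself.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical colon-separated inputs, sorted on field 1.
printf '%s\n' '1:apple' '2:banana' > names.txt
printf '%s\n' '2:yellow' '3:red' > colors.txt

# Full outer join with a fixed output layout:
#   0   = the join field
#   1.2 = field 2 of names.txt
#   2.2 = field 2 of colors.txt
# -e NONE replaces any field that is missing on one side.
table=$(join -t: -a1 -a2 -e NONE -o 0,1.2,2.2 names.txt colors.txt)
echo "$table"
```

Without -o, unpaired lines would simply be shorter than paired ones, which makes the columns harder to process afterwards.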

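【【【cut overview】】】
cut [option] file1 file2
   -b, --bytes=LIST   Select only these bytes.
   -c, --characters=LIST   Select only these characters: -c1,5-7 (character 1 and characters 5 to 7), -c1-50 (the first 50).
   -d, --delimiter=DELIM   Use DELIM instead of tab as the field delimiter.
   -f, --fields=LIST   Select only these fields: -f1,10-12 (field 1 and fields 10 to 12).
               Lines that contain no delimiter are also printed, unless -s is given.
   -n           (ignored)
       --complement   Invert the selection: output the bytes, characters, or fields that were NOT selected.
   -s, --only-delimited   Do not print lines that contain no delimiter.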
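
A short sketch of the field and character selections, assuming a made-up file users.txt. The --complement call assumes GNU cut; other implementations may not provide that option.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical data: two colon-separated records and one line without a delimiter.
printf '%s\n' 'root:x:0:/root' 'alice:x:1000:/home/alice' 'no delimiter here' > users.txt

# Fields 1 and 3; the line without ':' is passed through unchanged.
fields=$(cut -d: -f1,3 users.txt)
echo "$fields"

# Same selection, but -s drops lines that contain no delimiter.
only_delimited=$(cut -d: -f1,3 -s users.txt)
echo "$only_delimited"

# Characters 1 to 4 of every line.
chars=$(cut -c1-4 users.txt)
echo "$chars"

# GNU extension: everything except field 2.
without_second=$(cut -d: -f2 --complement -s users.txt)
echo "$without_second"
```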

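【【【paste overview】】】
paste [option] file1 file2---------print the files side by side
   -d, --delimiters=LIST   Use the characters in LIST instead of tab as the separator between the files' lines.
   -s, --serial       Paste one file at a time instead of in parallel: each file becomes one output line, file1 first, file2 second.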
file1:
1       一月
2       二月
3       三月
4       四月
5       五月
file2:
1       January
2       February
3       March
4       April
$ paste file1 file2
1       一月    1       January
2       二月    2       February
3       三月    3       March
4       四月    4       April
5       五月

$ paste -d'-' file1 file2
1       一月-1       January
2       二月-2       February
3       三月-3       March
4       四月-4       April
5       五月-

$ paste -s file1 file2
1       一月    2       二月    3       三月    4       四月    5       五月
1       January    2       February    3       March    4       April
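
Beyond the file1/file2 runs above, -s and -d are often combined to turn a column into a single delimited line. A minimal sketch, assuming made-up files nums.txt and letters.txt:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical single-column inputs.
printf '%s\n' 1 2 3 4 > nums.txt
printf '%s\n' a b c d > letters.txt

# Parallel merge with a comma instead of the default tab.
pairs=$(paste -d, nums.txt letters.txt)
echo "$pairs"

# -s with -d joins all lines of one file into a single comma-separated line.
csv=$(paste -s -d, nums.txt)
echo "$csv"

# A delimiter list with several characters is used cyclically.
cycled=$(paste -s -d',;' nums.txt)
echo "$cycled"
```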

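【【【split overview】】】
split [option] [infile [prefix]]
   Split the input into fixed-size pieces written to "PREFIXaa", "PREFIXab", ...;
   the default unit is 1000 lines and the default prefix is "x". If no file is given, or
   the file is "-", data is read from standard input. In the examples below, "outfile" is the prefix.
   Mandatory arguments to long options are mandatory for short options too.
   -a, --suffix-length=N   Use suffixes of length N (default 2).
   -b, --bytes=SIZE   Put SIZE bytes in each output file.   $ split -b 400 infile outfile
   -C, --line-bytes=SIZE   Put at most SIZE bytes of whole lines in each output file.
   -d, --numeric-suffixes    Use numeric suffixes instead of alphabetic ones.   $ split -l 6 -d infile outfile
   -l, --lines=NUMBER   Put NUMBER lines in each output file.   $ split -l 6 infile outfile
$ split [-a N] [-d] [-b SIZE | -C SIZE | -l NUMBER] infile outfile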
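
A sketch of splitting a file and putting it back together. The file name numbers.txt and the prefixes part_ and chunk_ are made up for illustration; -d assumes GNU split, and seq is assumed to be available.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: the numbers 1 to 10, one per line (21 bytes in total).
seq 1 10 > numbers.txt

# 4 lines per piece, numeric suffixes, prefix "part_": part_00, part_01, part_02.
split -l 4 -d numbers.txt part_
pieces=$(echo part_*)
echo "$pieces"

# The last piece holds the remaining lines.
last_piece=$(cat part_02)
echo "$last_piece"

# Concatenating the pieces in name order restores the original file.
cat part_* > rebuilt.txt
if cmp -s numbers.txt rebuilt.txt; then
    rebuilt_state=identical
else
    rebuilt_state=different
fi
echo "$rebuilt_state"

# 5 bytes per piece with 3-letter suffixes: chunk_aaa, chunk_aab, ...
split -b 5 -a 3 numbers.txt chunk_
chunks=$(echo chunk_*)
echo "$chunks"
```

Byte-based splitting ignores line boundaries, so a piece may end in the middle of a line; -C is the option to use when whole lines must stay together.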