Using the Linux split File-Splitting Tool

kjcxmx

Article link: https://blog.csdn.net/kjcxmx/article/details/118424599

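This article introduces the split command on Linux: splitting files by line count, by byte size, and into a fixed number of pieces, and setting the file name prefix, the suffix length, and the --verbose option. It also explains how to merge the pieces back together on Linux and Windows so that the content stays intact.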

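0x00. Background

In everyday work we run into large files such as operating system images, high-definition movies, and huge log files, and compressing them often helps little. Here are a few situations I run into often.

1. Copying a very large file to a USB drive can hit a size limit; for example, a FAT32-formatted drive cannot hold a single file of 4 GiB or more.
2. Programs usually write operation logs, and even when the logs are rotated by date to make problems easier to locate, a single file can still be enormous.
3. Opening a huge file in vim on Linux is constrained by available memory; the file often fails to open, or memory usage climbs so high that the machine stalls.
4. Sometimes we do not need to open the whole file, only one part of it, and splitting the file makes that much easier.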

0x01. Introducing split

Linux ships with split, a powerful command-line tool for cutting files into pieces. Its help text is as follows.

root@Lemon:~/Desktop/split$ split --help
Usage: split [OPTION]... [FILE [PREFIX]]
Output pieces of FILE to PREFIXaa, PREFIXab, ...;
default size is 1000 lines, and default PREFIX is 'x'.

With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   generate suffixes of length N (default 2)
      --additional-suffix=SUFFIX  append an additional SUFFIX to file names
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of records per output file
  -d                      use numeric suffixes starting at 0, not alphabetic
      --numeric-suffixes[=FROM]  same as -d, but allow setting the start value
  -e, --elide-empty-files  do not generate empty output files with '-n'
      --filter=COMMAND    write to shell COMMAND; file name is $FILE
  -l, --lines=NUMBER      put NUMBER lines/records per output file
  -n, --number=CHUNKS     generate CHUNKS output files; see explanation below
  -t, --separator=SEP     use SEP instead of newline as the record separator;
                            '\0' (zero) specifies the NUL character
  -u, --unbuffered        immediately copy input to output with '-n r/...'
      --verbose           print a diagnostic just before each output file is opened
      --help     display this help and exit
      --version  output version information and exit

The SIZE argument is an integer and optional unit (example: 10K is 10*1024).
Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).

CHUNKS may be:
  N       split into N files based on size of input
  K/N     output Kth of N to stdout
  l/N     split into N files without splitting lines/records
  l/K/N   output Kth of N to stdout without splitting lines/records
  r/N     like 'l' but use round robin distribution
  r/K/N   likewise but only output Kth of N to stdout

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Report split translation bugs to <http://translationproject.org/team/zh_CN.html>
Full documentation at: <http://www.gnu.org/software/coreutils/split>
or available locally via: info '(coreutils) split invocation'

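The main options of split are:

split [--help][--version][-<lines>][-b <bytes>][-C <bytes>][-l <lines>][file to split][output file name prefix]

-<lines> : cut a new piece every given number of lines
-l<lines> : same as -<lines>; this is the form listed in the help text above
-b<bytes> : cut a new piece every given number of bytes
--help : display the help text
--version : display version information
-C<bytes> : similar to -b, but keeps each line intact where possible when cutting
[output file name prefix] : the prefix for the names of the pieces; split appends a suffix to this prefix automatically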
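
The difference between -b and -C is easier to see on a tiny file. The following is a minimal sketch, assuming GNU coreutils; the file name sample.txt, the prefixes bytes_ and lines_, and the 12-byte size are made up for illustration.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Three records of 9 bytes each (8 characters plus a newline): 27 bytes in total.
printf 'line-001\nline-002\nline-003\n' > sample.txt

# -b cuts strictly every 12 bytes, so a line can be cut in the middle.
split -b 12 sample.txt bytes_

# -C also caps each piece at 12 bytes, but only two whole lines would need 18,
# so every piece ends up holding exactly one complete line.
split -C 12 sample.txt lines_

wc -c bytes_* lines_*
```

With -b the pieces are 12, 12, and 3 bytes long; with -C each piece is one 9-byte line.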

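When no naming is specified, split uses x as the file name prefix and appends aa, ab, ac, and so on directly, with no separator, giving xaa, xab, xac. To use a different prefix, pass it as an argument after the input file name.

Adding the --verbose option prints each file as it is created.

The -a option sets the suffix length; for example, -a 5 produces xaaaaa, xaaaab.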
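
The naming rules above can be tried out with a short script. This is a sketch assuming GNU coreutils; numbers.txt and the part_ prefix are hypothetical names chosen for the example.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

seq 1 25 > numbers.txt   # a 25-line input file

# Default naming: the prefix "x" followed directly by a two-letter suffix.
split -l 10 numbers.txt

# Custom prefix, three-character suffix, and --verbose to report each file created.
split -l 10 -a 3 --verbose numbers.txt part_

ls
```

The directory then holds xaa, xab, xac from the first command and part_aaa, part_aab, part_aac from the second.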

Below is the manual page shown by man split.

SPLIT(1)                  User Commands                                SPLIT(1)
NAME
       split - split a file into pieces
SYNOPSIS
       split [OPTION]... [FILE [PREFIX]]
DESCRIPTION
       Output pieces of FILE to PREFIXaa, PREFIXab, ...; default size is 1000 lines, and default PREFIX is 'x'.

   With no FILE, or when FILE is -, read standard input.
   Mandatory arguments to long options are mandatory for short options too.
   -a, --suffix-length=N
          generate suffixes of length N (default 2)
   --additional-suffix=SUFFIX
          append an additional SUFFIX to file names
   -b, --bytes=SIZE
          put SIZE bytes per output file
   -C, --line-bytes=SIZE
          put at most SIZE bytes of records per output file
   -d     use numeric suffixes starting at 0, not alphabetic
   --numeric-suffixes[=FROM]
          same as -d, but allow setting the start value
   -e, --elide-empty-files
          do not generate empty output files with '-n'
   --filter=COMMAND
          write to shell COMMAND; file name is $FILE
   -l, --lines=NUMBER
          put NUMBER lines/records per output file
   -n, --number=CHUNKS
          generate CHUNKS output files; see explanation below
   -t, --separator=SEP
          use SEP instead of newline as the record separator; '\0' (zero) specifies the NUL character
   -u, --unbuffered
          immediately copy input to output with '-n r/...'
   --verbose
          print a diagnostic just before each output file is opened
   --help display this help and exit
   --version
          output version information and exit
   The  SIZE argument is an integer and optional unit (example: 10K is 10*1024).  Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers
   of 1000).
   CHUNKS may be:
   N      split into N files based on size of input
   K/N    output Kth of N to stdout
   l/N    split into N files without splitting lines/records
   l/K/N  output Kth of N to stdout without splitting lines/records
   r/N    like 'l' but use round robin distribution
   r/K/N  likewise but only output Kth of N to stdout
AUTHOR
       Written by Torbjorn Granlund and Richard M. Stallman.
REPORTING BUGS
       GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
       Report split translation bugs to <http://translationproject.org/team/>
COPYRIGHT
       Copyright © 2016 Free Software Foundation, Inc.  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
       This is free software: you are free to change and redistribute it.  There is NO WARRANTY, to the extent permitted by law.
SEE ALSO
       Full documentation at: <http://www.gnu.org/software/coreutils/split>
       or available locally via: info '(coreutils) split invocation'
GNU coreutils 8.26          February 2017     SPLIT(1)
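
One option from the manual that is not used in the examples of this article is --filter. As a rough sketch, assuming GNU coreutils and gzip are installed, it can compress every piece as it is written; big.log and the part_ prefix are hypothetical names.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

seq 1 2500 > big.log   # stand-in for a large log file

# The single quotes matter: $FILE must reach the shell started by split
# unexpanded, where it holds the name of the piece being written.
split -l 1000 --filter='gzip > "$FILE.gz"' big.log part_

ls

# Decompressing the pieces in order reproduces the original file.
gzip -dc part_aa.gz part_ab.gz part_ac.gz | cmp - big.log
```

Only the compressed pieces part_aa.gz, part_ab.gz, and part_ac.gz are created; no uncompressed pieces are left behind.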

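0x02. Worked examples

1. Default behavior

By default, split divides by line count, 1000 lines per file. Output file names use the prefix x followed by aa, ab, ac, and so on.

// split test  // split with default settings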
root@Lemon:~/Desktop/split/test$ ls
test
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ split test 
root@Lemon:~/Desktop/split/test$ ls
test  xaa  xab  xac  xad
root@Lemon:~/Desktop/split/test$
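
To check the 1000-line default independently of any particular input, a file of known length can be generated first. A minimal sketch, assuming GNU coreutils; the 2500-line file is made up for the example and is not the test file used in this article.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

seq 1 2500 > test   # hypothetical input with exactly 2500 lines

# No options: 1000 lines per piece, prefix "x".
split test

wc -l x*
```

This yields xaa and xab with 1000 lines each and xac with the remaining 500.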

2. Splitting by byte size

Split the file into 5 MB pieces. The size also accepts units such as K, M, G, and T.

// split -b 5M test  // split by size with default naming
root@Lemon:~/Desktop/split/test$ ls
test
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ split -b 5M test 
root@Lemon:~/Desktop/split/test$ ls
test  xaa  xab  xac  xad
root@Lemon:~/Desktop/split/test$

// split -b 5M -d test  // use numeric suffixes
root@Lemon:~/Desktop/split/test$ split -b 5M -d test 
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ ls
test  x00  x01  x02  x03
root@Lemon:~/Desktop/split/test$

// split -b 5M test test_split_   // use a custom file name prefix
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ split -b 5M test test_split_
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ ls
test  test_split_aa  test_split_ab  test_split_ac  test_split_ad
root@Lemon:~/Desktop/split/test$

// split -b 5M -d test test_split_   // custom prefix with numeric suffixes
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ split -b 5M -d test test_split_
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ ls
test  test_split_00  test_split_01  test_split_02  test_split_03
root@Lemon:~/Desktop/split/test$
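
The help text distinguishes units such as K (powers of 1024) from KB (powers of 1000). The sketch below, assuming GNU coreutils, makes the difference visible on a small file; blob.bin and the prefixes k_ and kb_ are hypothetical.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

head -c 25000 /dev/zero > blob.bin   # 25000 bytes of filler data

split -b 10K  blob.bin k_    # 10K  = 10 * 1024 = 10240 bytes per piece
split -b 10KB blob.bin kb_   # 10KB = 10 * 1000 = 10000 bytes per piece

wc -c k_* kb_*
```

The k_ pieces are 10240, 10240, and 4520 bytes; the kb_ pieces are 10000, 10000, and 5000 bytes.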

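3. Splitting by line count

To cut a new file every 10 lines, write a dash followed by the line count, as in -10, or use the -l option followed by the count, as in -l 10.

// split -10 test  // split by lines with default naming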
root@Lemon:~/Desktop/split/test$ ls
test
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ split -10 test 
root@Lemon:~/Desktop/split/test$ ls
test  xaa  xab  xac  xad
root@Lemon:~/Desktop/split/test$

// split -10 -d test  // use numeric suffixes
root@Lemon:~/Desktop/split/test$ split -10 -d test 
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ ls
test  x00  x01  x02  x03
root@Lemon:~/Desktop/split/test$

// split -10 test test_split_   // use a custom file name prefix
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ split -10 test test_split_
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ ls
test  test_split_aa  test_split_ab  test_split_ac  test_split_ad
root@Lemon:~/Desktop/split/test$

// split -10 -d test test_split_   // custom prefix with numeric suffixes
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ split -10 -d test test_split_
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ ls
test  test_split_00  test_split_01  test_split_02  test_split_03
root@Lemon:~/Desktop/split/test$
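
The help text also lists --numeric-suffixes[=FROM], which works like -d but lets the numbering start somewhere other than 0. A small sketch, assuming GNU coreutils; the 44-line input and the part_ prefix are generated here for illustration.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

seq 1 44 > test   # hypothetical 44-line input

# 10 lines per piece, numeric suffixes starting at 1 instead of 0.
split -l 10 --numeric-suffixes=1 test part_

wc -l part_*
```

The pieces are named part_01 through part_05; the first four hold 10 lines each and the last holds 4.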

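4. Splitting into a fixed number of files

Use -n followed by a count to split into that many files, as in -n 5. With a plain number, split divides the input by byte size, so a line can be cut across two files and the per-file line counts can differ; the l/N form (for example -n l/5) keeps lines intact.

// split -n 5 test  // split into 5 files with default naming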
root@Lemon:~/Desktop/split/test$ ls
test
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ split -n 5 test
root@Lemon:~/Desktop/split/test$ ls
test  xaa  xab  xac  xad  xae
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ wc -l x*
   7 xaa
  10 xab
   9 xac
  10 xad
   8 xae
  44 total
root@Lemon:~/Desktop/split/test$

// split -n 4 -d test  // use numeric suffixes
root@Lemon:~/Desktop/split/test$ split -n 4 -d test 
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ ls
test  x00  x01  x02  x03
root@Lemon:~/Desktop/split/test$

// split -n 4 test test_split_   // use a custom file name prefix
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ split -n 4 test test_split_
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ ls
test  test_split_aa  test_split_ab  test_split_ac  test_split_ad
root@Lemon:~/Desktop/split/test$

// split -n 4 -d test test_split_   // custom prefix with numeric suffixes
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ split -n 4 -d test test_split_
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ ls
test  test_split_00  test_split_01  test_split_02  test_split_03
root@Lemon:~/Desktop/split/test$
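
The CHUNKS forms N and l/N from the help text behave differently on text files. The following sketch, assuming GNU coreutils, compares them on a generated 50-line file; the prefixes bytes_ and lines_ are hypothetical.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

seq 1 50 > test   # hypothetical input: the numbers 1 to 50, one per line

# Plain N: five pieces of nearly equal byte size; lines may be cut in the middle.
split -n 5 test bytes_

# l/N: five pieces, but a piece always ends at the end of a line.
split -n l/5 test lines_

wc -l bytes_* lines_*

# Either way, concatenating the pieces in order gives back the input.
cat lines_* | cmp - test
cat bytes_* | cmp - test
```

Here the first bytes_ piece ends partway through a number, while every lines_ piece ends with a complete line.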

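5. Setting the suffix length

The suffix is 2 characters long by default; use -a to set a different length.

// split -n 5 -a 5 test  // 5 files with a suffix length of 5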
root@Lemon:~/Desktop/split/test$ ls
test
root@Lemon:~/Desktop/split/test$ split -n 5 -a 5 test
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ ls
test  xaaaaa  xaaaab  xaaaac  xaaaad  xaaaae
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ wc -l x*
   7 xaaaaa
  10 xaaaab
   9 xaaaac
  10 xaaaad
   8 xaaaae
  44 total
root@Lemon:~/Desktop/split/test$

// split -l 10 -a 4 -d test test_split_  // numeric suffixes padded to a length of 4
root@Lemon:~/Desktop/split/test$ ls
test
root@Lemon:~/Desktop/split/test$ split -l 10 -a 4 -d test test_split_
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ ls
test  test_split_0000  test_split_0001  test_split_0002  test_split_0003  test_split_0004
root@Lemon:~/Desktop/split/test$ wc -l test_split_000*
  10 test_split_0000
  10 test_split_0001
  10 test_split_0002
  10 test_split_0003
   4 test_split_0004
  44 total
root@Lemon:~/Desktop/split/test$
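
The suffix length also limits how many pieces can be named. As a sketch, assuming GNU coreutils, asking for more pieces than a one-letter suffix can name makes split stop with an error; numbers.txt and the prefixes short_ and ok_ are hypothetical.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

seq 1 30 > numbers.txt   # 30 lines, one piece per line below

# A one-letter suffix offers only 26 names (a to z), so 30 pieces do not fit.
if split -l 1 -a 1 numbers.txt short_ 2>/dev/null; then
    short_status=ok
else
    short_status=failed
fi
echo "one-letter suffix: $short_status"

# A two-letter suffix has room for 26 * 26 names, which is plenty.
split -l 1 -a 2 numbers.txt ok_
ls ok_* | wc -l
```

When the number of pieces is not known in advance, it is safer to choose a suffix length with some headroom.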

6. Adding the --verbose option

Adding --verbose prints each output file as it is created.

root@Lemon:~/Desktop/split/test$ split -10 test --verbose
creating file 'xaa'
creating file 'xab'
creating file 'xac'
creating file 'xad'
creating file 'xae'
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ ls
test  xaa  xab  xac  xad  xae
root@Lemon:~/Desktop/split/test$ wc -l x*
  10 xaa
  10 xab
  10 xac
  10 xad
   4 xae
  44 total
root@Lemon:~/Desktop/split/test$

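0x03. Merging files

Files that have been split can be merged back together. It is a good idea to record an MD5 or SHA-256 checksum of the original file when splitting, so that the merged result can be verified.

1. Merging files on Linux

Use cat to concatenate the pieces into one file. The MD5 checksum of the merged file matches that of the original: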

root@Lemon:~/Desktop/split/test$ ls
test
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ split -10 test 
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ ls
test  xaa  xab  xac  xad  xae
root@Lemon:~/Desktop/split/test$ md5sum test 
050952b65f63a6190c9b3328dd1afab1  test
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ mv test test_ori
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ cat x* > test
root@Lemon:~/Desktop/split/test$ 
root@Lemon:~/Desktop/split/test$ md5sum test 
050952b65f63a6190c9b3328dd1afab1  test
root@Lemon:~/Desktop/split/test$ ls
test  test_ori  xaa  xab  xac  xad  xae
root@Lemon:~/Desktop/split/test$
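
The same round trip can be scripted from start to finish. This is a minimal sketch, assuming GNU coreutils, that uses SHA-256 instead of MD5; original.bin, merged.bin, and the piece_ prefix are hypothetical names.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

head -c 100000 /dev/urandom > original.bin   # 100000 bytes of random test data

# Record the checksum before splitting.
orig_sum=$(sha256sum < original.bin)

split -b 30000 -d original.bin piece_

# The glob expands in sorted order, and fixed-width suffixes sort in the
# order the pieces were written, so cat reassembles them correctly.
cat piece_* > merged.bin

merged_sum=$(sha256sum < merged.bin)
if [ "$orig_sum" = "$merged_sum" ]; then
    echo "checksums match"
else
    echo "checksums differ"
fi
```

Comparing checksums rather than file sizes also catches pieces that were merged in the wrong order.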

2. Merging files on Windows

On Windows, open cmd and use the copy command to merge the files:

copy /b xaa + xab + xac + xad + xae test