Multi-threaded (de)compression on Linux: using the parallel compression tool pigz

Reference: 多线程压缩工具Pigz使用 (Using the multi-threaded compression tool Pigz)

Anyone learning Linux comes across a handful of compression tools: gzip, bzip2, zip, xz, plus the matching decompression commands. For how these tools are used and how they compare in compression ratio and compression time, see the earlier post Linux中归档压缩工具学习 (a study of archiving and compression tools on Linux).

So what is pigz? In short, it is gzip with parallel compression. By default pigz uses as many threads as there are logical CPUs; if that number cannot be detected it falls back to 8 threads, and the thread count can also be set explicitly with -p. Be aware that its CPU usage is correspondingly high.
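For example, a minimal sketch of matching the thread count to the logical CPUs reported by the system (nproc is part of GNU coreutils; "bigfile" is just a placeholder name):

# number of logical CPUs seen by the OS
nproc
# cap pigz at that many threads ("bigfile" is a placeholder file)
pigz -p "$(nproc)" bigfile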

Official site: http://zlib.net/pigz

Installation

yum install pigz
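On CentOS/RHEL the pigz package typically comes from the EPEL repository; on Debian/Ubuntu the equivalent command (assuming the default repositories carry the package) would be:

sudo apt-get install pigz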

Usage

$ pigz --help
Usage: pigz [options] [files ...]
  will compress files in place, adding the suffix '.gz'. If no files are
  specified, stdin will be compressed to stdout. pigz does what gzip does,
  but spreads the work over multiple processors and cores when compressing.

Options:
  -0 to -9, -11        Compression level (11 is much slower, a few % better)
  --fast, --best       Compression levels 1 and 9 respectively
  -b, --blocksize mmm  Set compression block size to mmmK (default 128K)
  -c, --stdout         Write all processed output to stdout (won't delete)
  -d, --decompress     Decompress the compressed input
  -f, --force          Force overwrite, compress .gz, links, and to terminal
  -F  --first          Do iterations first, before block split for -11
  -h, --help           Display a help screen and quit
  -i, --independent    Compress blocks independently for damage recovery
  -I, --iterations n   Number of iterations for -11 optimization
  -k, --keep           Do not delete original file after processing
  -K, --zip            Compress to PKWare zip (.zip) single entry format
  -l, --list           List the contents of the compressed input
  -L, --license        Display the pigz license and quit
  -M, --maxsplits n    Maximum number of split blocks for -11
  -n, --no-name        Do not store or restore file name in/from header
  -N, --name           Store/restore file name and mod time in/from header
  -O  --oneblock       Do not split into smaller blocks for -11
  -p, --processes n    Allow up to n compression threads (default is the
                       number of online processors, or 8 if unknown)
  -q, --quiet          Print no messages, even on error
  -r, --recursive      Process the contents of all subdirectories
  -R, --rsyncable      Input-determined block locations for rsync
  -S, --suffix .sss    Use suffix .sss instead of .gz (for compression)
  -t, --test           Test the integrity of the compressed input
  -T, --no-time        Do not store or restore mod time in/from header
  -v, --verbose        Provide more verbose output
  -V  --version        Show the version of pigz
  -z, --zlib           Compress to zlib (.zz) instead of gzip format
  --                   All arguments after "--" are treated as files
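As an alternative to piping tar output into pigz (the approach used in the benchmarks below), GNU tar can invoke the compressor itself through -I/--use-compress-program. A minimal sketch, assuming GNU tar with pigz on the PATH (recent GNU tar versions also accept a quoted command with options):

# let tar call pigz instead of its built-in gzip
tar -I pigz -cf hg19_index.tar.gz hg19_index/
# pass options to pigz (works on newer GNU tar)
tar -I 'pigz -p 8' -cf hg19_index.tar.gz hg19_index/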

Original directory size:

[20:30 root@hulab /DataBase/Human/hg19]$ du -h
8.1G    ./refgenome
1.4G    ./encode_anno
4.2G    ./hg19_index/hg19
8.1G    ./hg19_index
18G     .

Next we compress the hg19_index directory with gzip and with pigz at various thread counts, and compare the run times.

### Compressing with gzip (single-threaded)

[20:30 root@hulab /DataBase/Human/hg19]$ time tar -czvf index.tar.gz hg19_index/
hg19_index/
hg19_index/hg19.tar.gz
hg19_index/hg19/
hg19_index/hg19/genome.8.ht2
hg19_index/hg19/genome.5.ht2
hg19_index/hg19/genome.7.ht2
hg19_index/hg19/genome.6.ht2
hg19_index/hg19/genome.4.ht2
hg19_index/hg19/make_hg19.sh
hg19_index/hg19/genome.3.ht2
hg19_index/hg19/genome.1.ht2
hg19_index/hg19/genome.2.ht2

real    5m28.824s
user    5m3.866s
sys     0m35.314s

### Compressing with pigz, 4 threads

[20:36 root@hulab /DataBase/Human/hg19]$ ls
encode_anno  hg19_index  index.tar.gz  refgenome

[20:38 root@hulab /DataBase/Human/hg19]$ time tar -cvf - hg19_index/ | pigz -p 4 > index_p4.tar.gz
hg19_index/
hg19_index/hg19.tar.gz
hg19_index/hg19/
hg19_index/hg19/genome.8.ht2
hg19_index/hg19/genome.5.ht2
hg19_index/hg19/genome.7.ht2
hg19_index/hg19/genome.6.ht2
hg19_index/hg19/genome.4.ht2
hg19_index/hg19/make_hg19.sh
hg19_index/hg19/genome.3.ht2
hg19_index/hg19/genome.1.ht2
hg19_index/hg19/genome.2.ht2

real    1m18.236s
user    5m22.578s
sys     0m35.933s
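When only a single file rather than a whole directory needs compressing, tar is unnecessary and pigz can be run on the file directly. A sketch using the -p and -k options from the help output above ("genome.fa" is a placeholder name):

# produce genome.fa.gz with 4 threads and keep the original file
pigz -p 4 -k genome.fa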

### Compressing with pigz, 8 threads

[20:42 root@hulab /DataBase/Human/hg19]$ time tar -cvf - hg19_index/ | pigz -p 8 > index_p8.tar.gz
hg19_index/
hg19_index/hg19.tar.gz
hg19_index/hg19/
hg19_index/hg19/genome.8.ht2
hg19_index/hg19/genome.5.ht2
hg19_index/hg19/genome.7.ht2
hg19_index/hg19/genome.6.ht2
hg19_index/hg19/genome.4.ht2
hg19_index/hg19/make_hg19.sh
hg19_index/hg19/genome.3.ht2
hg19_index/hg19/genome.1.ht2
hg19_index/hg19/genome.2.ht2

real    0m42.670s
user    5m48.527s
sys     0m28.240s

### Compressing with pigz, 16 threads

[20:43 root@hulab /DataBase/Human/hg19]$ time tar -cvf - hg19_index/ | pigz -p 16 > index_p16.tar.gz
hg19_index/
hg19_index/hg19.tar.gz
hg19_index/hg19/
hg19_index/hg19/genome.8.ht2
hg19_index/hg19/genome.5.ht2
hg19_index/hg19/genome.7.ht2
hg19_index/hg19/genome.6.ht2
hg19_index/hg19/genome.4.ht2
hg19_index/hg19/make_hg19.sh
hg19_index/hg19/genome.3.ht2
hg19_index/hg19/genome.1.ht2
hg19_index/hg19/genome.2.ht2

real    0m23.643s
user    6m24.054s
sys     0m24.923s

### Compressing with pigz, 32 threads

[20:43 root@hulab /DataBase/Human/hg19]$ time tar -cvf - hg19_index/ | pigz -p 32 > index_p32.tar.gz
hg19_index/
hg19_index/hg19.tar.gz
hg19_index/hg19/
hg19_index/hg19/genome.8.ht2
hg19_index/hg19/genome.5.ht2
hg19_index/hg19/genome.7.ht2
hg19_index/hg19/genome.6.ht2
hg19_index/hg19/genome.4.ht2
hg19_index/hg19/make_hg19.sh
hg19_index/hg19/genome.3.ht2
hg19_index/hg19/genome.1.ht2
hg19_index/hg19/genome.2.ht2

real    0m17.523s
user    7m27.479s
sys     0m29.283s

### Decompressing

[21:00 root@hulab /DataBase/Human/hg19]$ time pigz -p 8 -d index_p8.tar.gz

real    0m27.717s
user    0m30.070s
sys     0m22.515s
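Note that the command above only turns index_p8.tar.gz back into index_p8.tar; the archive still has to be unpacked with tar. A sketch that decompresses and extracts in one pass (pigz's decompression stage is essentially single-threaded, so extra threads matter much less here than for compression):

pigz -dc -p 8 index_p8.tar.gz | tar -xf -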

Comparison of compression times:

| Program | Threads | Time      |
| ------- | ------- | --------- |
| gzip    | 1       | 5m28.824s |
| pigz    | 4       | 1m18.236s |
| pigz    | 8       | 0m42.670s |
| pigz    | 16      | 0m23.643s |
| pigz    | 32      | 0m17.523s |

As the table shows, multi-threaded pigz shortens the compression time dramatically: going from single-threaded gzip to 4-thread pigz already cuts the time by roughly a factor of four, while adding further threads brings progressively smaller gains.

Although pigz can greatly reduce the wall-clock time, it does so at the expense of CPU (note how the user time grows with the thread count), so in scenarios where the CPU is already heavily loaded it is better not to use a very high thread count; 4 or 8 threads is generally a sensible choice.
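If pigz has to share the machine with other CPU-hungry jobs, another option besides lowering -p is to run it at a reduced scheduling priority. A sketch using nice ("bigfile" is a placeholder name):

# lowest CPU priority, 8 compression threads
nice -n 19 pigz -p 8 bigfile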
