Which Checksum Tool on Linux Is Faster?

It is common practice to calculate checksums for files to check their integrity. For large files, the checksum computation is slow. I wondered why it is so slow and whether choosing another tool would help. In this post, I try three common tools, md5sum, sha1sum and crc32, on a relatively large file to see which checksum tool on Linux is faster and to help decide which one to use.

The file to be checksummed is a 15GB text file:

$ ls -lha wiki.txt 
-rw-r--r-- 1 zma zma 15G Jun 14 10:28 wiki.txt

The performance

Now, let’s see how the three tools perform when computing the checksum of the file.

sha1sum speed

$ time sha1sum wiki.txt 
251dcb5c08c6a2fabd258f2c8a9b95e15c0cc098  wiki.txt

real    1m21.143s
user    0m21.647s
sys     0m4.668s

crc32 speed

$ time crc32 wiki.txt
0080f7a1

real    1m21.051s
user    0m16.194s
sys     0m4.890s

md5sum speed

$ time md5sum wiki.txt
e2e649030c795ffa9f33a99bcb39dde7  wiki.txt

real    1m27.392s
user    0m25.563s
sys     0m3.936s

Summary

From the results, crc32 is the fastest, but only by a tiny margin: it finishes about 0.1 seconds ahead of sha1sum and about 6 seconds ahead of md5sum, in runs that take 81 to 87 seconds. md5sum is the slowest, but only slightly.

Why is there so little difference? To compute the checksums, the tools need to read the file and do the computation. So let’s check how much time is needed just to read the file content.

$ time dd if=wiki.txt of=/dev/null bs=8192
1953039+1 records in
1953039+1 records out
15999296457 bytes (16 GB) copied, 80.4203 s, 199 MB/s

real    1m20.447s
user    0m0.202s
sys     0m7.091s

The I/O read speed is around 200MB/s. That’s not bad for a single magnetic disk.
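
The 199 MB/s figure can be reproduced from the byte count and elapsed time that dd prints. A minimal sketch, assuming awk is available; the function name throughput_mbps is made up for this illustration, and MB means 10^6 bytes, as in dd's output:

```shell
# Hypothetical helper: average throughput in MB/s (1 MB = 10^6 bytes, as dd reports it).
# Usage: throughput_mbps BYTES SECONDS
throughput_mbps() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

# Numbers taken from the dd run above.
throughput_mbps 15999296457 80.4203
```

The result is just under 199, which matches the rounded value dd reports.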

So almost all of the time is spent reading the file content: dd needs 1m20s just to read the file, and the checksum tools need 1m21s to 1m27s in total. The algorithms and the tools themselves are not yet the limitation. The disk I/O speed is.

The conclusion: on a relatively modern computer, use whichever tool works best for you without worrying much about speed (it still takes time). You may need to be aware of the collisions for these algorithms; see Simard’s comment. If you want higher speed, improve your I/O speed first, until the CPU becomes the bottleneck (CPU usage reaches 100%).
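
One rough way to see how far a run is from being CPU-bound is to compare its CPU time (user + sys) with its wall-clock time (real). The sketch below does that for values in the format printed by `time`. The helper names to_seconds and cpu_share are hypothetical, and it assumes the tool runs on a single thread, so 100% would mean one fully busy core:

```shell
# Hypothetical helper: convert a `time` value such as "1m21.143s" into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Hypothetical helper: percentage of wall-clock time spent on the CPU.
# Usage: cpu_share REAL USER SYS
cpu_share() {
    awk -v r="$(to_seconds "$1")" -v u="$(to_seconds "$2")" -v s="$(to_seconds "$3")" \
        'BEGIN { printf "%.1f\n", (u + s) / r * 100 }'
}

# Values from the sha1sum run on wiki.txt.
cpu_share 1m21.143s 0m21.647s 0m4.668s
```

For the sha1sum run this comes out at roughly 32%, so the process spent about two thirds of the run waiting rather than computing.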

What if I/O was not the bottleneck

Pádraig comments that we can avoid the I/O and measure the computational cost alone. I changed the suggested command slightly to checksum a file under /dev/shm/, because crc32 does not accept input from STDIN. The system is the same one used for the previous tests. It could only support 3GB at the time I did this test. The results are as follows.
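
For tools that do read standard input, such as md5sum and sha1sum, the disk can be taken out of the picture by piping generated data straight into the tool. This is only a sketch of that idea, not necessarily the exact command Pádraig suggested. The name hash_from_pipe is hypothetical, and the size suffix (64M) relies on GNU head:

```shell
# Hypothetical helper: feed SIZE bytes of zeros to TOOL through a pipe,
# so no file is read from disk.
# Usage: hash_from_pipe TOOL SIZE
hash_from_pipe() {
    head -c "$2" /dev/zero | "$1"
}

hash_from_pipe md5sum 64M
hash_from_pipe sha1sum 64M
```

Prefixing a call with `time` gives a measurement comparable to the /dev/shm run, with the cost of generating the zeros and moving them through the pipe included.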

[zma@host:/dev/shm]$ head -c 3G /dev/zero >test
[zma@host:/dev/shm]$ for chk in crc32 md5sum sha1sum ; do echo $chk; time $chk test; done
crc32
480bbe37

real    0m3.411s
user    0m2.931s
sys     0m0.482s
md5sum
c698c87fb53058d493492b61f4c74189  test

real    0m5.103s
user    0m4.697s
sys     0m0.409s
sha1sum
6e7f6dca8def40df0b21f58e11c1a41c3e000285  test

real    0m4.451s
user    0m4.082s
sys     0m0.372s

To summarize, taking md5sum’s speed as the baseline:

md5sum: 1.00x
crc32: 1.50x
sha1sum: 1.15x
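
These ratios come from dividing md5sum's wall-clock time by each tool's wall-clock time in the /dev/shm run. A small sketch of the arithmetic, assuming awk is available; the function name speedup is hypothetical:

```shell
# Hypothetical helper: how many times faster a tool is than the baseline.
# Usage: speedup BASELINE_SECONDS TOOL_SECONDS
speedup() {
    awk -v base="$1" -v tool="$2" 'BEGIN { printf "%.2fx\n", base / tool }'
}

speedup 5.103 5.103   # md5sum (baseline)
speedup 5.103 3.411   # crc32
speedup 5.103 4.451   # sha1sum
```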

crc32 is the fastest here. It is a Perl 5 program that uses Archive::Zip::computeCRC32() to compute the CRC32.

The throughput here for md5sum is above 600MB/s (3GB in about 5.1 seconds). An SSD or a RAID of SSDs can reach that rate. On the system I tested, if the I/O were much faster, the computation would likely account for much of the time spent.

CPU model and versions of checksum tools used

Here are the CPU model and the versions of the checksum tools used during the test.

$ lscpu | grep "Model name"
Model name:            Intel(R) Core(TM) i5-4460  CPU @ 3.20GHz
$ md5sum --version
md5sum (GNU coreutils) 8.23
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Ulrich Drepper, Scott Miller, and David Madore.
$ sha1sum --version
sha1sum (GNU coreutils) 8.23
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Ulrich Drepper, Scott Miller, and David Madore.
$ rpm -qf `which crc32`
perl-Archive-Zip-1.46-1.fc22.noarch

Translated from: https://www.systutorials.com/which-checksum-tool-on-linux-is-faster/
