sha256算法 java,Java:高效计算大文件的SHA-256哈希

I need to calculate a SHA-256 hash of a large file (or portion of it). My implementation works fine, but its much slower than the C++'s CryptoPP calculation (25 Min. vs. 10 Min for ~30GB file). What I need is a similar execution time in C++ and Java, so the hashes are ready at almost the same time. I also tried the Bouncy Castle implementation, but it gave me the same result. Here is how I calculate the hash:

int buff = 16384;

try {

RandomAccessFile file = new RandomAccessFile("T:\\someLargeFile.m2v", "r");

long startTime = System.nanoTime();

MessageDigest hashSum = MessageDigest.getInstance("SHA-256");

byte[] buffer = new byte[buff];

byte[] partialHash = null;

long read = 0;

// calculate the hash of the hole file for the test

long offset = file.length();

int unitsize;

while (read < offset) {

unitsize = (int) (((offset - read) >= buff) ? buff : (offset - read));

file.read(buffer, 0, unitsize);

hashSum.update(buffer, 0, unitsize);

read += unitsize;

}

file.close();

partialHash = new byte[hashSum.getDigestLength()];

partialHash = hashSum.digest();

long endTime = System.nanoTime();

System.out.println(endTime - startTime);

} catch (FileNotFoundException e) {

e.printStackTrace();

}

解决方案

My explanation may not solve your problem since it depends a lot on your actual runtime environment, but when I run your code on my system, the throughput is limited by disk I/O and not the hash calculation. The problem is not solved by switching to NIO, but is simply caused by the fact that you're reading the file in very small pieces (16kB). Increasing the buffer size (buff) on my system to 1MB instead of 16kB more than doubles the throughput, but with >50MB/s, I am still limited by disk speed and not able to fully load a single CPU core.

BTW: You can simplify your implementation a lot by wrapping a DigestInputStream around a FileInputStream, read through the file and get the calculated hash from the DigestInputStream instead of manually shuffling the data from a RandomAccessFile to the MessageDigest as in your code.

I did a few performance tests with older Java versions and there seem to be a relevant difference between Java 5 and Java 6 here. I'm not sure though if the SHA implementation is optimized or if the VM is executing the code much faster. The throughputs I get with the different Java versions (1MB buffer) are:

Sun JDK 1.5.0_15 (client): 28MB/s, limited by CPU

Sun JDK 1.5.0_15 (server): 45MB/s, limited by CPU

Sun JDK 1.6.0_16 (client): 42MB/s, limited by CPU

Sun JDK 1.6.0_16 (server): 52MB/s, limited by disk I/O (85-90% CPU load)

I was a little bit curious on the impact of the assembler part in the CryptoPP SHA implementation, as the benchmarks results indicate that the SHA-256 algorithm only requires 15.8 CPU cycles/byte on an Opteron. I was unfortunately not able to build CryptoPP with gcc on cygwin (the build succeeded, but the generated exe failed immediately), but building a performance benchmark with VS2005 (default release configuration) with and without assembler support in CryptoPP and comparing to the Java SHA implementation on an in-memory buffer, leaving out any disk I/O, I get the following results on a 2.5GHz Phenom:

Sun JDK1.6.0_13 (server): 26.2 cycles/byte

CryptoPP (C++ only): 21.8 cycles/byte

CryptoPP (assembler): 13.3 cycles/byte

Both benchmarks compute the SHA hash of a 4GB empty byte array, iterating over it in chunks of 1MB, which are passed to MessageDigest#update (Java) or CryptoPP's SHA256.Update function (C++).

I was able to build and benchmark CryptoPP with gcc 4.4.1 (-O3) in a virtual machine running Linux and got only appr. half the throughput compared to the results from the VS exe. I am not sure how much of the difference is contributed to the virtual machine and how much is caused by VS usually producing better code than gcc, but I have no way to get any more exact results from gcc right now.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值