An Efficient Way to Compute a File's MD5

Latest recommended article published on 2025-03-11 11:28:52

yjad

Tags: python, programming languages

Article link: https://blog.csdn.net/dou986532/article/details/142421451

Here is the code first.


import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.time.StopWatch;

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class FileHashCalculator {

	private static final int CHUNK_SIZE = 1024 * 1024 * 10; // 10 MB

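	/**
	 * Hashes the file in CHUNK_SIZE pieces in parallel, then hashes the chunk digests in order.
	 * The result is an MD5 of chunk MD5s: it is deterministic for a given CHUNK_SIZE, but for a
	 * non-empty file it differs from the standard MD5 that tools such as md5sum report.
	 */
	public static String getMD5Checksum(String filePath) throws IOException, NoSuchAlgorithmException, InterruptedException, ExecutionException {
		File file = new File(filePath);
		if (!file.exists() || !file.isFile()) {
			throw new IllegalArgumentException("Invalid file path or file does not exist: " + filePath);
		}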

		try (RandomAccessFile raf = new RandomAccessFile(file, "r");
		     FileChannel fileChannel = raf.getChannel()) {

			long fileSize = fileChannel.size();
			int numChunks = (int) Math.ceil((double) fileSize / CHUNK_SIZE);
			ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
			List<Future<byte[]>> futures = new ArrayList<>();

			for (int i = 0; i < numChunks; i++) {
				long offset = (long) i * CHUNK_SIZE;
				long chunkSize = Math.min(CHUNK_SIZE, fileSize - offset);
				futures.add(executor.submit(new MD5ChunkTask(fileChannel, offset, chunkSize)));
			}
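			// No more tasks will be submitted; chunks that are already queued keep running.
			executor.shutdown();
			// Combine the per-chunk digests in file order. Iterating the futures in submission
			// order keeps the result deterministic even when chunks finish out of order.
			MessageDigest md = MessageDigest.getInstance("MD5");
			for (Future<byte[]> future : futures) {
				byte[] chunkMD5 = future.get(); // blocks until this chunk has been hashed
				md.update(chunkMD5);
			}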

			byte[] md5Bytes = md.digest();
			return convertBytesToHex(md5Bytes);
		}
	}

	private static String convertBytesToHex(byte[] bytes) {
		StringBuilder sb = new StringBuilder();
		for (byte b : bytes) {
			sb.append(String.format("%02x", b));
		}
		return sb.toString();
	}

	private static class MD5ChunkTask implements Callable<byte[]> {
		private final FileChannel fileChannel;
		private final long offset;
		private final long size;

		public MD5ChunkTask(FileChannel fileChannel, long offset, long size) {
			this.fileChannel = fileChannel;
			this.offset = offset;
			this.size = size;
		}

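		@Override
		public byte[] call() throws Exception {
			// Map only this chunk. FileChannel.map takes an explicit position, so tasks
			// that share the channel do not disturb each other.
			MappedByteBuffer buffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, offset, size);
			// MessageDigest is not thread-safe, so each task creates its own instance.
			MessageDigest md = MessageDigest.getInstance("MD5");
			md.update(buffer);
			return md.digest();
		}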
	}


	// Derives a short string from the checksum: its first 5 hexadecimal characters.
	public static String getCompressFilePwd(String filePath) {
		String res = "";
		try {
			res = getMD5Checksum(filePath);
			if (StringUtils.isNotBlank(res)) {
				res = res.substring(0, 5);
			}
		} catch (Exception e) {
			throw new RuntimeException(e);
		}
		return res;
	}

	public static void main(String[] args) {

		// Sample paths from the author's machine; the second assignment overrides the first.
		String filePath = "C:\\Users\\longf\\Downloads\\Compressed\\JH_Appform_V5.4_Linux_x64__DianKeXinYun_r40717.tar.gz";
		filePath = "C:\\Users\\longf\\Downloads\\Compressed\\test.txt";
		try {
			for (int i = 0; i < 10; i++) {
				StopWatch stopWatch = new StopWatch();
				stopWatch.start();
				String checksum = getCompressFilePwd(filePath);
				stopWatch.stop();
				System.out.println("CompressFilePwd: " + checksum);
				System.out.println("Time taken: " + stopWatch.getTime() + " ms");
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

}

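1. File operations and exception handling

File operations

RandomAccessFile and FileChannel: RandomAccessFile opens the file for reading, and its getChannel() method returns a FileChannel, which supports positional access and memory mapping. To compute the checksum, the code maps one chunk of the file at a time into a MappedByteBuffer and feeds that buffer to the digest.
MappedByteBuffer: FileChannel.map() maps a region of the file into memory as a MappedByteBuffer. The digest then reads the data straight from the mapped pages instead of issuing repeated read() calls into an intermediate buffer, which can reduce the overhead of reading large files.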
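As a minimal sketch of the mapping step described above, the following maps a small region of a file and copies it out. The class name MappedRegionDemo, the helper names, and the temporary file are assumptions made for illustration; they are not part of the original FileHashCalculator.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRegionDemo {

    // Maps bytes [offset, offset + length) of the file read-only and copies them out.
    static byte[] readRegion(Path path, long offset, int length) {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] out = new byte[length];
            buffer.get(out); // reads from the mapped pages rather than through read() calls
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a 16-byte temporary file (hypothetical test data) and reads back bytes 10..15.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("mapped-demo", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
            return new String(readRegion(tmp, 10, 6), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "abcdef"
    }
}
```

The requested region must lie within the file: a read-only mapping cannot extend past the end of the file.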

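Exception handling

IOException and NoSuchAlgorithmException: File access and the digest lookup (MessageDigest.getInstance("MD5")) can both throw checked exceptions. getMD5Checksum declares them in its throws clause (IOException and NoSuchAlgorithmException, plus InterruptedException and ExecutionException from Future.get()) and leaves the handling to the caller. getCompressFilePwd catches every exception and rethrows it wrapped in a RuntimeException.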
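One possible refinement of the catch-and-wrap approach is to map each checked exception to a more specific unchecked one, and to restore the interrupt flag when the thread is interrupted. This is only a sketch: ChecksumExceptions and callUnchecked are hypothetical names, and the choice of wrapper types is an assumption, not something the original code does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;

class ChecksumExceptions {

    // Runs the action and converts checked exceptions into specific unchecked ones.
    static <T> T callUnchecked(Callable<T> action) {
        try {
            return action.call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Digest algorithm is not available", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            throw new IllegalStateException("Interrupted while hashing", e);
        } catch (ExecutionException e) {
            // Future.get() wraps the task's failure; report the underlying cause.
            throw new IllegalStateException("A chunk task failed", e.getCause());
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        try {
            callUnchecked(() -> { throw new IOException("disk error"); });
        } catch (UncheckedIOException e) {
            System.out.println("wrapped: " + e.getCause().getMessage());
        }
    }
}
```

With a helper like this, a caller could write callUnchecked(() -> FileHashCalculator.getMD5Checksum(path)) and still tell an I/O failure apart from a missing algorithm.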

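2. Multithreaded computation

ExecutorService and Callable

ExecutorService: An ExecutorService manages a pool of threads and runs submitted tasks asynchronously. getMD5Checksum creates a fixed-size pool with Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()). The size is read once, when the pool is created, and equals the number of processors available to the JVM; it does not change afterwards. A new pool is created on every call and shut down once all chunks have been submitted.
Callable and Future: A Callable is a task that returns a result, and the Future returned by submit() gives access to that result. MD5ChunkTask implements Callable<byte[]>: each task computes the MD5 digest of one chunk and returns it, and the main thread collects the digests in chunk order and combines them.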
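The reason the combined result stays deterministic is that the futures are read in submission order, regardless of which task finishes first. The sketch below illustrates this with a toy task; OrderedFuturesDemo and squaresInOrder are hypothetical names, and the sleep only simulates chunks that finish out of order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedFuturesDemo {

    // Submits one task per index and returns the results in submission order.
    static List<Integer> squaresInOrder(int count) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int value = i;
                futures.add(pool.submit(() -> {
                    // Earlier tasks sleep longer, so later tasks tend to finish first.
                    Thread.sleep((count - value) * 5L);
                    return value * value;
                }));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // blocks until that particular task is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(squaresInOrder(5)); // prints [0, 1, 4, 9, 16]
    }
}
```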

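3. String handling and the hash algorithm

MD5 checksum computation

MessageDigest: MessageDigest.getInstance("MD5") returns an MD5 digest instance; update() feeds data into it and digest() produces the final 16-byte value. Each MD5ChunkTask calls update(ByteBuffer) on its mapped chunk and returns that chunk's digest. The main thread then calls update(byte[]) with each chunk digest in order and takes the digest of those. The final value is therefore an MD5 of the chunk MD5s, not the MD5 of the file's bytes, so for a non-empty file it will not match the output of tools such as md5sum, and it changes if CHUNK_SIZE changes.
String handling: StringUtils.isNotBlank() checks that the string is not null, empty, or whitespace-only before substring() is called. convertBytesToHex uses a StringBuilder and String.format("%02x", b) to turn each byte into two lowercase hexadecimal digits.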
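To make the difference between the two schemes concrete, the sketch below hashes the same in-memory bytes both ways: once by feeding every chunk into a single MessageDigest (the standard MD5) and once by hashing the per-chunk digests, as FileHashCalculator does. Md5Comparison and its method names are hypothetical, and a byte array stands in for the file.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Comparison {

    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Converts each byte to two lowercase hexadecimal digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    // Standard MD5: all chunks go, in order, into one digest instance.
    static String standardMd5(byte[] data, int chunkSize) {
        MessageDigest md = newMd5();
        for (int off = 0; off < data.length; off += chunkSize) {
            md.update(data, off, Math.min(chunkSize, data.length - off));
        }
        return toHex(md.digest());
    }

    // Scheme used in the article: MD5 over the concatenated per-chunk MD5 digests.
    static String md5OfChunkDigests(byte[] data, int chunkSize) {
        MessageDigest outer = newMd5();
        for (int off = 0; off < data.length; off += chunkSize) {
            MessageDigest inner = newMd5();
            inner.update(data, off, Math.min(chunkSize, data.length - off));
            outer.update(inner.digest());
        }
        return toHex(outer.digest());
    }

    public static void main(String[] args) {
        byte[] data = "hello world".getBytes(StandardCharsets.US_ASCII);
        System.out.println(standardMd5(data, 4)); // prints 5eb63bbbe01eeed093cb22bb8f5acdc3
        System.out.println(md5OfChunkDigests(data, 4)); // a different value
    }
}
```

The standard digest does not depend on the chunk size, because MD5 is computed over a sequential byte stream. That same property is why it cannot be parallelized by hashing chunks independently; the chunked scheme trades compatibility with md5sum for parallelism.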

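4. Thread pools

ExecutorService and Executors

ExecutorService: An interface in java.util.concurrent for running tasks on managed threads. It lets you submit tasks and control the pool's lifecycle, for example with shutdown().
Executors: A factory class with static methods that create common kinds of thread pools. FileHashCalculator uses Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()), so the pool has one thread per available processor. That is a reasonable default for CPU-bound hashing, although the actual speedup also depends on how fast the storage can deliver the data.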
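A common way to manage the pool's lifecycle is to size it to the smaller of the processor count and the task count, and to shut it down in a finally block so that it is released even when a task fails. The sketch below shows that pattern with a toy workload; PoolLifecycleDemo and sumOfLengths are hypothetical names, and this is a variation on the original code rather than a copy of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolLifecycleDemo {

    // Computes the total length of the strings, one task per string.
    static int sumOfLengths(List<String> items) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Never create more threads than tasks, and always at least one.
        int threads = Math.max(1, Math.min(cores, items.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String item : items) {
                tasks.add(item::length);
            }
            int total = 0;
            // invokeAll waits for every task and returns futures in task order.
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                total += future.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow(); // releases the threads on success and on failure
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfLengths(Arrays.asList("a", "bb", "ccc"))); // prints 6
    }
}
```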

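5. Performance optimization

Chunked computation and memory mapping

CHUNK_SIZE: The CHUNK_SIZE constant (10 MB) sets the size of each chunk. Splitting the file lets the chunks be hashed in parallel and keeps each mapping small; a single FileChannel.map() call cannot map more than Integer.MAX_VALUE bytes, so very large files have to be mapped in pieces anyway. The mapped pages are served from the operating system's page cache.
MappedByteBuffer: FileChannel.map() creates a MappedByteBuffer backed by a region of the file. Reading through the mapping avoids copying the data from the page cache into a separate user-space buffer, as a conventional read() would, which can speed up reading and hashing.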
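The chunk layout itself is simple arithmetic. The sketch below computes the offset and size of every chunk using integer ceiling division instead of Math.ceil on a double; ChunkPlanner and plan are hypothetical names introduced for illustration.

```java
class ChunkPlanner {

    // Returns one {offset, size} pair per chunk, covering the file without gaps.
    static long[][] plan(long fileSize, long chunkSize) {
        if (fileSize < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and chunkSize > 0");
        }
        // Integer ceiling division: rounds up when the last chunk is partial.
        int numChunks = Math.toIntExact((fileSize + chunkSize - 1) / chunkSize);
        long[][] ranges = new long[numChunks][2];
        for (int i = 0; i < numChunks; i++) {
            long offset = i * chunkSize; // long arithmetic, so large files do not overflow
            ranges[i][0] = offset;
            ranges[i][1] = Math.min(chunkSize, fileSize - offset);
        }
        return ranges;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 25 MB file with 10 MB chunks yields chunks of 10, 10 and 5 MB.
        for (long[] range : plan(25 * mb, 10 * mb)) {
            System.out.println("offset=" + range[0] / mb + " MB, size=" + range[1] / mb + " MB");
        }
    }
}
```

For an empty file the plan contains no chunks, which matches the original loop: no tasks are submitted and the final digest is computed over zero bytes.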

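6. Hash algorithm and exception handling

MD5 and exception handling

MessageDigest: MessageDigest.getInstance("MD5") returns the MD5 digest instance used for hashing. MD5 is a widely used hash function that produces a 128-bit fingerprint of its input. The fingerprint is not guaranteed to be unique: practical collision attacks on MD5 exist, so it suits integrity checks against accidental corruption rather than security-sensitive uses.
Exception handling: getMD5Checksum does not catch IOException or NoSuchAlgorithmException; it declares them and lets them propagate to the caller. getCompressFilePwd catches any exception and rethrows it wrapped in a RuntimeException, so its callers do not have to declare checked exceptions.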