Java: Zip Compression Can Be Optimized Like This

2401_83916241

Published 2024-05-15 01:50:54


Category: Programmer. Tags: java, spring, programming languages

Article link: https://blog.csdn.net/2401_83916241/article/details/138879008


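import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
// log and JSONObject come from the surrounding project (a logger and a JSON library)

/**
 * Batch-compress files, v2.0
 *
 * @param fileNames  names of the files to compress (including relative paths)
 * @param zipOutName name of the resulting zip file
 */
public static void batchZipFiles(List<String> fileNames, String zipOutName) throws Exception {
    ByteBuffer buffer = ByteBuffer.allocate(2048);
    // try-with-resources closes the channel and the zip stream even when an error occurs
    try (ZipOutputStream zipOutputStream = new ZipOutputStream(new FileOutputStream(zipOutName));
         WritableByteChannel writableByteChannel = Channels.newChannel(zipOutputStream)) {
        for (String sourceFile : fileNames) {
            File source = new File(sourceFile);
            zipOutputStream.putNextEntry(new ZipEntry(source.getName()));
            try (FileChannel fileChannel = new FileInputStream(source).getChannel()) {
                while (fileChannel.read(buffer) != -1) {
                    // switch the buffer from filling to draining
                    buffer.flip();
                    while (buffer.hasRemaining()) {
                        writableByteChannel.write(buffer);
                    }
                    // clear() rather than rewind(): rewind() keeps the old limit,
                    // so a short read would shrink every later read
                    buffer.clear();
                }
            }
            zipOutputStream.closeEntry();
        }
    } catch (Exception e) {
        log.error("batchZipFiles error fileNames:" + JSONObject.toJSONString(fileNames), e);
    }
}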

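This version still uses the java.nio API. First, Channels.newChannel() wraps the zipOutputStream in a writable channel. To read a file, FileInputStream.getChannel() returns that file's read channel; the contents are read from it into a ByteBuffer and then written to writableByteChannel. Be sure to call buffer.flip() before writing. Without it, the position still sits just past the bytes that were read, so the write sends the unused tail of the buffer (zero bytes) instead of the file content. The speed of this approach compared with the previous one is shown in the figure below: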

[Figure: compressing three 3.5 GB files]
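To make the flip() requirement concrete, here is a minimal sketch that fills a small heap buffer and drains it with and without flip(). The class and method names (BufferFlipDemo, fillFlipDrain, fillDrainWithoutFlip) are made up for illustration and are not part of the original code; only the ByteBuffer calls are real JDK API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BufferFlipDemo {
    /** Reads everything between position and limit into a String. */
    static String drain(ByteBuffer buffer) {
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return new String(out, StandardCharsets.US_ASCII);
    }

    /** Fill, flip, drain: returns exactly the text that was put in. */
    static String fillFlipDrain(ByteBuffer buffer, String text) {
        buffer.clear();
        buffer.put(text.getBytes(StandardCharsets.US_ASCII));
        buffer.flip(); // limit = bytes written, position = 0
        return drain(buffer);
    }

    /** Fill and drain without flip(): returns the unwritten tail of the buffer. */
    static String fillDrainWithoutFlip(ByteBuffer buffer, String text) {
        buffer.clear();
        buffer.put(text.getBytes(StandardCharsets.US_ASCII));
        return drain(buffer); // position..limit covers the bytes that were never written
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        System.out.println(fillFlipDrain(buffer, "abc"));                 // abc
        System.out.println(fillDrainWithoutFlip(buffer, "abc").length()); // 5 (the zero-filled tail)

        // rewind() keeps the limit left by flip(); clear() restores the full capacity
        fillFlipDrain(buffer, "abc");
        buffer.rewind();
        System.out.println(buffer.remaining()); // 3
        buffer.clear();
        System.out.println(buffer.remaining()); // 8
    }
}
```

The last two lines of main show why resetting the buffer with clear() is the safer choice in a read loop: after a short read, rewind() would leave only that many bytes available for the next read.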

Day 3

More optimizing: I heard that memory-mapped files are even faster! So what am I waiting for, let me try it. Here is the code:

import java.io.File;
import java.io.FileOutputStream;
import java.io.RandomAccessFile;
import java.lang.reflect.Method;
import java.nio.MappedByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import sun.nio.ch.FileChannelImpl; // JDK-internal class, see the note on unmap below

/**
 * Batch-compress files, v3.0
 *
 * @param fileNames  names of the files to compress (including relative paths)
 * @param zipOutName name of the resulting zip file
 */
public static void batchZipFiles(List<String> fileNames, String zipOutName) {
    try (ZipOutputStream zipOutputStream = new ZipOutputStream(new FileOutputStream(zipOutName));
         WritableByteChannel writableByteChannel = Channels.newChannel(zipOutputStream)) {
        // Private JDK-internal method: works on JDK 8, newer JDKs may block or lack it
        Method unmap = FileChannelImpl.class.getDeclaredMethod("unmap", MappedByteBuffer.class);
        unmap.setAccessible(true);
        for (String sourceFile : fileNames) {
            File source = new File(sourceFile);
            long fileSize = source.length();
            zipOutputStream.putNextEntry(new ZipEntry(source.getName()));
            long pre = 0;
            // Open the file once per entry instead of once per chunk, and close it afterwards
            try (FileChannel fileChannel = new RandomAccessFile(source, "r").getChannel()) {
                // A single mapping cannot exceed Integer.MAX_VALUE bytes (about 2 GB), so map in chunks
                while (pre < fileSize) {
                    long read = Math.min(fileSize - pre, Integer.MAX_VALUE);
                    MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, pre, read);
                    while (mappedByteBuffer.hasRemaining()) {
                        writableByteChannel.write(mappedByteBuffer);
                    }
                    pre += read;
                    // Release every chunk by hand, not only the last one
                    unmap.invoke(null, mappedByteBuffer);
                }
            }
            zipOutputStream.closeEntry();
        }
    } catch (Exception e) {
        log.error("zipMoreFile error fileNames:" + JSONObject.toJSONString(fileNames), e);
    }
}

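There are two pitfalls here:

1. FileChannel.map fails with an error when the requested size exceeds Integer.MAX_VALUE (about 2 GB), so a large file has to be mapped into memory in several chunks.

2. After a file has been mapped into memory, calling clear() on the mappedByteBuffer once the write is done does not release the memory (clear() only resets position and limit). The mapping has to be released by hand; see the code above for details.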
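The chunking arithmetic from the first pitfall can be sketched in isolation. The example below is an illustration, not the author's code: ChunkedMapDemo and copyInChunks are hypothetical names, and the chunk size is a parameter so the logic can be exercised on a tiny file (real code would pass Integer.MAX_VALUE). It leaves unmapping to the garbage collector, so it does not address the second pitfall.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChunkedMapDemo {
    /**
     * Copies src to out through read-only mappings of at most chunkSize bytes each
     * and returns how many mappings were needed. The caller keeps ownership of out.
     */
    static int copyInChunks(Path src, OutputStream out, long chunkSize) throws IOException {
        int chunks = 0;
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ)) {
            WritableByteChannel target = Channels.newChannel(out);
            long size = in.size();
            long position = 0;
            while (position < size) {
                // The last chunk is whatever remains, never more than chunkSize
                long length = Math.min(chunkSize, size - position);
                MappedByteBuffer chunk = in.map(FileChannel.MapMode.READ_ONLY, position, length);
                while (chunk.hasRemaining()) {
                    target.write(chunk);
                }
                position += length;
                chunks++;
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunked", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[10]);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // 10 bytes in chunks of 4 -> mappings of 4, 4 and 2 bytes
        System.out.println(copyInChunks(tmp, out, 4)); // 3
        System.out.println(out.size());                // 10
    }
}
```

Using Math.min for the chunk length avoids the separate "count" computation and the special case for the final chunk.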

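Now the speed!

[Figure: compressing three 3.5 GB files]

I must be doing something wrong: why is this the slowest version of all? Are the files too large? Does my machine have too little memory? Or am I misusing the API? Let me think about it. Discussion in the comments is welcome.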

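Day 4

I wondered whether batch compression is slow because it runs serially. Wouldn't it be faster with several threads working in parallel? No sooner said than done. I meant to write it myself at first, but while searching on Google I found that Apache Commons already has a ready-made implementation (ParallelScatterZipCreator in Commons Compress), so rather than reinvent the wheel, here is the code: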

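// Third-party imports below are assumptions about which libraries the author used:
// Commons Compress, Commons IO (NullInputStream) and Guava (ThreadFactoryBuilder)
import org.apache.commons.compress.archivers.zip.ParallelScatterZipCreator;
import org.apache.commons.compress.archivers.zip.UnixStat;
import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;
import org.apache.commons.compress.parallel.InputStreamSupplier;
import org.apache.commons.io.input.NullInputStream;
import com.google.common.util.concurrent.ThreadFactoryBuilder;
import java.io.*;
import java.util.List;
import java.util.concurrent.*;

/**
 * Batch-compress files, v4.0
 *
 * @param zipOutName   name of the resulting zip file
 * @param fileNameList names of the files to compress (including relative paths)
 */
public static void compressFileList(String zipOutName, List<String> fileNameList) throws IOException, ExecutionException, InterruptedException {
    // %d gives every pool thread a distinct name
    ThreadFactory factory = new ThreadFactoryBuilder().setNameFormat("compressFileList-pool-%d").build();
    // Note: with 10 threads at most and a queue of 20, the pool may reject tasks for long file lists
    ExecutorService executor = new ThreadPoolExecutor(5, 10, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(20), factory);
    ParallelScatterZipCreator parallelScatterZipCreator = new ParallelScatterZipCreator(executor);
    try (OutputStream outputStream = new FileOutputStream(zipOutName);
         ZipArchiveOutputStream zipArchiveOutputStream = new ZipArchiveOutputStream(outputStream)) {
        zipArchiveOutputStream.setEncoding("UTF-8");
        for (String fileName : fileNameList) {
            File inFile = new File(fileName);
            // The supplier is invoked later on a pool thread, when the entry is actually compressed
            final InputStreamSupplier inputStreamSupplier = () -> {
                try {
                    return new FileInputStream(inFile);
                } catch (FileNotFoundException e) {
                    log.error("compressFileList file not found: " + inFile, e);
                    // Fall back to an empty stream so the entry is written with no content
                    return new NullInputStream(0);
                }
            };
            ZipArchiveEntry zipArchiveEntry = new ZipArchiveEntry(inFile.getName());
            zipArchiveEntry.setMethod(ZipArchiveEntry.DEFLATED);
            zipArchiveEntry.setSize(inFile.length());
            // 0664 octal (436 decimal): rw-rw-r--
            zipArchiveEntry.setUnixMode(UnixStat.FILE_FLAG | 0664);
            parallelScatterZipCreator.addArchiveEntry(zipArchiveEntry, inputStreamSupplier);
        }
        // Waits for all compression tasks, then writes the entries to the target stream
        parallelScatterZipCreator.writeTo(zipArchiveOutputStream);
    }
    log.info("ParallelCompressUtil->ParallelCompressUtil-> info:{}", JSONObject.toJSONString(parallelScatterZipCreator.getStatisticsMessage()));
}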
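The reason parallelism can help is that each entry's data can be compressed independently and the results merged afterwards in a fixed order. The following JDK-only sketch illustrates that idea with java.util.zip.Deflater and a thread pool. It is not how ParallelScatterZipCreator is implemented and it does not produce a zip file; ParallelDeflateDemo, deflateAll and the other names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class ParallelDeflateDemo {
    /** Compresses one input completely on the calling thread. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // frees the native zlib state
        }
    }

    /** Compresses every input on a worker pool; results are returned in input order. */
    static List<byte[]> deflateAll(List<byte[]> inputs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (byte[] input : inputs) {
                futures.add(pool.submit(() -> deflate(input)));
            }
            List<byte[]> results = new ArrayList<>();
            for (Future<byte[]> future : futures) {
                results.add(future.get()); // waits for that task, preserving order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    /** Inverse of deflate, used here only to check the round trip. */
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported stream");
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }
}
```

Collecting the futures in submission order keeps the output deterministic even though the tasks finish in arbitrary order, which is the same property a zip writer needs when it merges separately compressed entries.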

