Java zip reading: reading zip files efficiently in Java

I am working on a project that processes a very large amount of data.

I have a lot (thousands) of zip files, each containing ONE simple txt file with thousands of lines (about 80k lines).

What I am currently doing is the following:

    for (File zipFile : dir.listFiles()) {
        ZipFile zf = new ZipFile(zipFile);
        // Each archive holds a single entry, so take the first one
        ZipEntry ze = zf.entries().nextElement();
        BufferedReader in = new BufferedReader(new InputStreamReader(zf.getInputStream(ze)));
        ...

This way I can read the file line by line, but it is definitely too slow.

Given the large number of files and lines that need to be read, I need to read them in a more efficient way.

I have looked for a different approach, but I haven't been able to find anything.

I think I should use the Java NIO APIs, which are intended for intensive I/O operations, but I don't know how to use them with zip files.

Any help would really be appreciated.

Thanks,

Marco

Solution

"I have a lot (thousands) of zip files. The zipped files are about 30MB each, while the txt inside the zip file is about 60-70MB. Reading and processing the files with this code takes many hours, around 15, but it depends."

Let's do some back-of-the-envelope calculations.

Let's say you have 5000 files. If it takes 15 hours to process them, this equates to ~10 seconds per file. The files are about 30MB each, so the throughput is ~3MB/s.
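The same arithmetic as a small Java sketch, in case you want to plug in your own numbers. The class and method names are made up for illustration, and the 5000-file count is the assumption from above, not a measured figure.

```java
import java.util.Locale;

class ThroughputEstimate {
    // Average wall-clock seconds spent on each file.
    static double secondsPerFile(double totalHours, int fileCount) {
        return totalHours * 3600.0 / fileCount;
    }

    // Throughput in MB/s, measured against the compressed file size.
    static double megabytesPerSecond(double fileSizeMb, double secondsPerFile) {
        return fileSizeMb / secondsPerFile;
    }

    public static void main(String[] args) {
        double perFile = secondsPerFile(15, 5000);      // 54000 s / 5000 files = 10.8 s
        double rate = megabytesPerSecond(30, perFile);  // 30 MB / 10.8 s, a bit under 3 MB/s
        System.out.println(String.format(Locale.ROOT,
                "%.1f s per file, %.1f MB/s", perFile, rate));
    }
}
```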

This is between one and two orders of magnitude slower than the rate at which ZipFile can decompress stuff.

Either there's a problem with the disks (are they local, or a network share?), or it is the actual processing that is taking most of the time.

The best way to find out for sure is by using a profiler.
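If you don't have a profiler at hand, a crude first check is to time the read alone: decompress one file line by line, throw the lines away, and see how long that takes. The sketch below is only an illustration, not a replacement for profiling. The class name ZipReadTimer and the method countLines are hypothetical, and it assumes the text is UTF-8 (the code in the question uses the platform default charset).

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class ZipReadTimer {
    // Reads the first entry of the archive line by line, discarding the
    // lines, and returns how many lines were read. No processing is done,
    // so the elapsed time covers disk I/O, decompression and decoding only.
    static long countLines(File zipPath) throws IOException {
        try (ZipFile zf = new ZipFile(zipPath)) {
            ZipEntry ze = zf.entries().nextElement();
            // Assumption: the text is UTF-8; adjust the charset if it is not.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(zf.getInputStream(ze), StandardCharsets.UTF_8))) {
                long lines = 0;
                while (in.readLine() != null) {
                    lines++;
                }
                return lines;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ZipReadTimer <file.zip>");
            return;
        }
        File zip = new File(args[0]);
        long start = System.nanoTime();
        long lines = countLines(zip);
        double seconds = (System.nanoTime() - start) / 1e9;
        double compressedMb = zip.length() / (1024.0 * 1024.0);
        System.out.println(String.format(Locale.ROOT,
                "%d lines, %.1f MB compressed, %.2f s, %.1f MB/s",
                lines, compressedMb, seconds, compressedMb / seconds));
    }
}
```

If this runs far faster than ~10 seconds per file, the time is going into the processing rather than into reading the zip. If it is just as slow, look at the disks. Keep in mind that a second run over the same file may be served from the OS file cache, so time a file that has not been read recently.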
