Reading multiple files in parallel in Java: how to read all lines of a file in parallel in Java 8

I want to read all lines of a 1 GB file into a Stream as fast as possible. Currently I'm using Files.lines(path) for that. After parsing the file, I do some computations (map()/filter()).

At first I thought this was already done in parallel, but it seems I was wrong:

When reading the file as it is, it takes about 50 seconds on my dual CPU laptop.

However, if I split the file using bash commands and then process them in parallel, it only takes about 30 seconds.

I tried the following combinations:

single file, no parallel lines() stream ~ 50 seconds

single file, Files.lines(..).parallel().[...] ~ 50 seconds

two files, no parallel lines() stream ~ 30 seconds

two files, Files.lines(..).parallel().[...] ~ 30 seconds

I ran each of these 4 combinations multiple times with roughly the same results (within 1 or 2 seconds). The [...] is a chain of map and filter only, with a toArray(...) at the end to trigger the evaluation.
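For reference, the kind of pipeline being timed might look like the sketch below. The class name LinesPipeline and the trim/non-empty steps are hypothetical stand-ins for the real map()/filter() chain, which the question does not show, and UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class LinesPipeline {

    // Hypothetical stand-in for the real chain: trim each line and drop empty ones.
    static String[] process(Path path, boolean parallel) throws IOException {
        // Files.lines reads lazily; try-with-resources closes the underlying file.
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            Stream<String> source = parallel ? lines.parallel() : lines;
            return source.map(String::trim)
                         .filter(line -> !line.isEmpty())
                         .toArray(String[]::new); // terminal operation triggers evaluation
        }
    }
}
```

Calling process(path, false) and process(path, true) corresponds to the non-parallel and parallel variants listed above; both return the lines in file order.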

The conclusion is that using lines().parallel() makes no difference. Since reading two files in parallel takes less time, there is a performance gain from splitting the file. However, it seems that a single file is read serially as a whole.

Edit:

I want to point out that I use an SSD, so there is practically no seek time. The file has 1658652 (relatively short) lines in total.

Splitting the file in bash takes about 1.5 seconds:

time split -l 829326 file # 829326 = 1658652 / 2

split -l 829326 file 0,14s user 1,41s system 16% cpu 9,560 total

So my question is: is there any class or function in the Java 8 JDK which can parallelize reading all lines without having to split the file first? For example, if I have two CPU cores, the first line reader should start at the first line and a second one at line (totalLines/2)+1.
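To make the idea concrete, here is a hand-rolled sketch of two readers working on the two halves of one file. It is only an illustration, not a JDK facility: the class TwoHalvesReader and its methods are hypothetical, it splits at the byte midpoint (moved forward to the next line break) rather than at line (totalLines/2)+1, because a line number cannot be located without scanning the file, and it assumes UTF-8 text with each half smaller than 2 GB.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class TwoHalvesReader {

    // Offset just past the first '\n' found at or after 'from', or the file length if none.
    // Reads byte by byte, which is acceptable here because the lines are assumed to be short.
    static long nextLineStart(RandomAccessFile raf, long from) throws IOException {
        raf.seek(from);
        int b;
        while ((b = raf.read()) != -1) {
            if (b == '\n') {
                return raf.getFilePointer();
            }
        }
        return raf.length();
    }

    // Reads the bytes in [start, end) and decodes them into lines (assumes UTF-8).
    static List<String> readRange(Path path, long start, long end) {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            byte[] bytes = new byte[Math.toIntExact(end - start)];
            raf.seek(start);
            raf.readFully(bytes);
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads both halves concurrently and concatenates them in file order.
    static List<String> readAllLines(Path path) throws IOException {
        final long length;
        final long boundary;
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            length = raf.length();
            boundary = nextLineStart(raf, length / 2);
        }
        CompletableFuture<List<String>> first =
                CompletableFuture.supplyAsync(() -> readRange(path, 0, boundary));
        CompletableFuture<List<String>> second =
                CompletableFuture.supplyAsync(() -> readRange(path, boundary, length));
        List<String> result = new ArrayList<>(first.join());
        result.addAll(second.join());
        return result;
    }
}
```

Splitting on a '\n' byte is safe for UTF-8 because that byte value never occurs inside a multi-byte sequence. Whether this actually beats a single reader depends on the disk and file system, so it would need to be measured.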

Solution

You might find some help from this post. Trying to parallelize the actual reading of a file is probably barking up the wrong tree, as the biggest slowdown will be your file system (even on an SSD).

If you load the file into memory first, for example through a file channel, you should be able to process the data in parallel from there with great speed. Chances are you won't even need the parallel step, since reading from memory alone should already give a huge speed increase.
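One way to read this suggestion is a memory-mapped FileChannel whose decoded content is then processed with a parallel stream. The sketch below makes several assumptions: the class name MappedLines is hypothetical, the trim/non-empty steps stand in for the real computations, the file is UTF-8, it fits in a single mapping (under 2 GB), and the JVM heap is large enough to hold the decoded text.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MappedLines {

    static String[] process(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // Map the whole file read-only; a single mapping is limited to Integer.MAX_VALUE bytes.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Decode once, sequentially (assumes UTF-8); this copy lives on the heap.
            String text = StandardCharsets.UTF_8.decode(buffer).toString();
            String[] lines = text.split("\r?\n");
            // An array is a sized source, so the parallel stream can split it evenly.
            return Arrays.stream(lines)
                         .parallel()
                         .map(String::trim)                 // hypothetical computation
                         .filter(line -> !line.isEmpty())   // hypothetical computation
                         .toArray(String[]::new);
        }
    }
}
```

Only the map/filter stage runs in parallel here; the read and decode remain serial, which matches the point that the file system, not the CPU, is likely the bottleneck.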
