java读取超大文本,Java读取大文本文件具有7000万行文本

I have a big test file with 70 million lines of text.

I have to read the file line by line.

I used two different approaches:

InputStreamReader isr = new InputStreamReader(new FileInputStream(FilePath),"unicode");

BufferedReader br = new BufferedReader(isr);

while((cur=br.readLine()) != null);

and

LineIterator it = FileUtils.lineIterator(new File(FilePath), "unicode");

while(it.hasNext()) cur=it.nextLine();

Is there another approach which can make this task faster?

Best Regards,

解决方案

1) I am sure there is no difference speedwise, both use FileInputStream internally and buffering

2) You can take measurements and see for yourself

3) Though there's no performance benifits I like 1.7 approach

try (BufferedReader br = Files.newBufferedReader(Paths.get("test.txt"), StandardCharsets.UTF_8)) {

for (String line = null; (line = br.readLine()) != null;) {

//

}

}

4) Scanner based version

try (Scanner sc = new Scanner(new File("test.txt"), "UTF-8")) {

while (sc.hasNextLine()) {

String line = sc.nextLine();

}

// note that Scanner suppresses exceptions

if (sc.ioException() != null) {

throw sc.ioException();

}

}

5) This may be faster than the rest

try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {

ByteBuffer bb = ByteBuffer.allocateDirect(1000);

for(;;) {

StringBuilder line = new StringBuilder();

int n = ch.read(bb);

// add chars to line

// ...

}

}

it requires a bit of coding but it can be really faster because of ByteBuffer.allocateDirect. It allows OS to read bytes from file to ByteBuffer directly, without copying

6) Parallel processing would definitely increase speed. Make a big byte buffer, run several tasks that read bytes from file into that buffer in parallel, when ready find first end of line, make a String, find next...

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值