Java broken lines: skipping broken lines with Files.lines in Java 8

I am reading a very large (500 MB) file with Files.lines(...).

It reads part of the file, but at some point it fails with java.io.UncheckedIOException: java.nio.charset.MalformedInputException: Input length = 1.

I think the file has lines in different charsets. Is there a way to skip these broken lines? I know that the returned stream is backed by a Reader, and I know how to skip with a Reader, but I don't know how to get at the Reader behind the stream to set it up the way I want.

List<String> lines = new ArrayList<>();
try (Stream<String> stream = Files.lines(
        Paths.get(getClass().getClassLoader().getResource("bigtest.txt").toURI()),
        Charset.forName("UTF-8"))) {
    stream
        .filter(s -> s.substring(0, 2).equalsIgnoreCase("aa"))
        .forEach(lines::add);
} catch (final IOException e) {
    // catch
}

Solution

You can’t filter out lines with invalid characters after decoding, because the preconfigured decoder has already aborted with an exception by then. You have to configure a CharsetDecoder manually and tell it either to ignore invalid input or to replace it with a special character.

// Decoder that drops malformed byte sequences instead of throwing
CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
    .onMalformedInput(CodingErrorAction.IGNORE);
Path path = Paths.get(getClass().getClassLoader().getResource("bigtest.txt").toURI());
List<String> lines;
try (Reader r = Channels.newReader(FileChannel.open(path), dec, -1);
     BufferedReader br = new BufferedReader(r)) {
    lines = br.lines()
        .filter(s -> s.regionMatches(true, 0, "aa", 0, 2))
        .collect(Collectors.toList());
}

This simply ignores charset decoding errors, skipping the characters. To skip entire lines containing errors, you can let the decoder insert a replacement character (defaults to '\ufffd') for errors and filter out lines containing that character:

// Decoder that substitutes the replacement character for malformed input
CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
    .onMalformedInput(CodingErrorAction.REPLACE);
Path path = Paths.get(getClass().getClassLoader().getResource("bigtest.txt").toURI());
List<String> lines;
try (Reader r = Channels.newReader(FileChannel.open(path), dec, -1);
     BufferedReader br = new BufferedReader(r)) {
    lines = br.lines()
        .filter(s -> !s.contains(dec.replacement())) // drop lines that had decoding errors
        .filter(s -> s.regionMatches(true, 0, "aa", 0, 2))
        .collect(Collectors.toList());
}
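To try the line-skipping variant without the original 500 MB file, here is a minimal self-contained sketch. The class name SkipBrokenLinesDemo, the helper readCleanLines, and the three-line sample content are made up for illustration. It writes a small temporary file whose second line contains the byte 0xE9 (valid ISO-8859-1, malformed as UTF-8) and reads it back with a REPLACE decoder.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class SkipBrokenLinesDemo {

    // Hypothetical helper: returns only the lines that decoded cleanly as UTF-8.
    static List<String> readCleanLines(Path path) throws IOException {
        // REPLACE makes the decoder emit its replacement string (U+FFFD by default)
        // for malformed input instead of throwing MalformedInputException.
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE);
        try (Reader r = Channels.newReader(FileChannel.open(path), dec, -1);
             BufferedReader br = new BufferedReader(r)) {
            return br.lines()
                    // A line containing the replacement string had a decoding error.
                    .filter(s -> !s.contains(dec.replacement()))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bigtest", ".txt");
        try {
            // Sample data (assumption): the second line carries a lone 0xE9 byte.
            byte[] content = {
                'a', 'a', '1', '\n',
                'a', 'a', (byte) 0xE9, '\n',
                'A', 'a', '2', '\n'
            };
            Files.write(tmp, content);
            System.out.println(readCleanLines(tmp)); // prints [aa1, Aa2]
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

One caveat with this approach: a line that legitimately contains U+FFFD in the source file would be dropped as well, since it cannot be told apart from a line where the decoder inserted the replacement.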
