Java broken lines: skipping broken lines with Files.lines in Java 8

Latest recommended article published on 2024-05-23 17:54:58

别说你也难过

I am reading a very large (500 MB) file with Files.lines(...).

It reads part of the file, but at some point it fails with java.io.UncheckedIOException: java.nio.charset.MalformedInputException: Input length = 1.

I think the file has lines with different charsets. Is there a way to skip these broken lines? I know that the stream returned is backed by a Reader and with the reader I know how to skip, but don't know how to get the Reader from the stream to set it up as I like.

List<String> lines = new ArrayList<>();

try (Stream<String> stream = Files.lines(Paths.get(getClass().getClassLoader().getResource("bigtest.txt").toURI()), Charset.forName("UTF-8"))) {
    stream
        .filter(s -> s.substring(0, 2).equalsIgnoreCase("aa"))
        .forEach(lines::add);
} catch (final IOException e) {
    // catch
}

Solution

You can’t filter out lines with invalid characters after decoding, because the preconfigured decoder has already aborted with an exception by then. You have to configure a CharsetDecoder manually to tell it to ignore invalid input or to replace that input with a special character.

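// Drop malformed byte sequences instead of throwing MalformedInputException
CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
    .onMalformedInput(CodingErrorAction.IGNORE);
Path path = Paths.get(getClass().getClassLoader().getResource("bigtest.txt").toURI());
List<String> lines;
// -1 requests the implementation-default buffer size
try (Reader r = Channels.newReader(FileChannel.open(path), dec, -1);
     BufferedReader br = new BufferedReader(r)) {
    lines = br.lines()
        // case-insensitive prefix test; returns false for lines shorter than 2 chars
        .filter(s -> s.regionMatches(true, 0, "aa", 0, 2))
        .collect(Collectors.toList());
}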

This simply ignores charset decoding errors, skipping the characters. To skip entire lines containing errors, you can let the decoder insert a replacement character (defaults to '\ufffd') for errors and filter out lines containing that character:
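A minimal, self-contained sketch of that replace-and-filter approach might look as follows. The class name SkipMalformedLines, the method readValidLines, and its prefix parameter are hypothetical names introduced for illustration. The sketch assumes the file is meant to be UTF-8 and that valid lines never contain U+FFFD themselves.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original answer.
class SkipMalformedLines {

    // Character the decoder substitutes for malformed input (the default replacement).
    private static final char REPLACEMENT = '\uFFFD';

    // Returns the lines that decoded cleanly and start with the given prefix (case-insensitive).
    static List<String> readValidLines(Path path, String prefix) throws IOException {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try (Reader r = Channels.newReader(FileChannel.open(path), dec, -1);
             BufferedReader br = new BufferedReader(r)) {
            return br.lines()
                    // a replacement character marks a line that had a decoding error
                    .filter(s -> s.indexOf(REPLACEMENT) < 0)
                    .filter(s -> s.regionMatches(true, 0, prefix, 0, prefix.length()))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bigtest", ".txt");
        try {
            // The byte 0xFF never occurs in well-formed UTF-8, so the second line is "broken".
            Files.write(tmp, new byte[] {
                'a', 'a', '1', '\n',
                'A', 'A', (byte) 0xFF, '\n',
                'b', 'b', '\n',
                'A', 'a', '2', '\n'});
            System.out.println(readValidLines(tmp, "aa")); // prints [aa1, Aa2]
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

One caveat of this design: a line that legitimately contains U+FFFD in the source file is indistinguishable from a broken one and is dropped as well.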