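Operating on consecutive elements that match a predicate is not currently easy to do with the Stream API.
One possible solution is the StreamEx library, which offers a groupRuns method. Its documentation describes it as follows: "Returns a stream consisting of lists of elements of this stream where adjacent elements are grouped according to supplied predicate."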
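The following code groups together consecutive lines whose record part number is strictly greater than that of the previous line. The record number is extracted with a regular expression that skips a first digit and captures all the digits after it.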
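import java.io.IOException;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import one.util.streamex.StreamEx;

// One ignored digit, then the captured digits that form the record part number.
private static final Pattern PATTERN = Pattern.compile("\\d(\\d+)");

public static void main(String[] args) throws IOException {
    try (StreamEx<String> stream = StreamEx.ofLines(Paths.get("..."))) {
        List<Record> records =
            // A line stays in the current group while its part number keeps increasing.
            stream.groupRuns((s1, s2) -> getRecordPart(s2) > getRecordPart(s1))
                  .map(RecordFactory::createRecord)
                  .toList();
    }
}

private static int getRecordPart(String str) {
    Matcher matcher = PATTERN.matcher(str);
    if (matcher.find()) {
        return Integer.parseInt(matcher.group(1));
    }
    return 1; // if the pattern didn't find anything, it means the record is on a single line
}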
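
To make the grouping rule concrete, here is a minimal plain-JDK sketch of what the predicate does to adjacent lines. The groupRuns helper below is a hypothetical stand-in written for illustration, not the StreamEx implementation, and the sample lines (one letter, one ignored digit, then the part number) are a made-up format, since the real file layout is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunGroupingSketch {
    private static final Pattern PATTERN = Pattern.compile("\\d(\\d+)");

    // Same extraction as in the answer: skip one digit, parse the digits after it.
    static int getRecordPart(String str) {
        Matcher matcher = PATTERN.matcher(str);
        return matcher.find() ? Integer.parseInt(matcher.group(1)) : 1;
    }

    // Hypothetical helper (not part of StreamEx): starts a new group whenever
    // the predicate rejects the pair (previous element, current element).
    static <T> List<List<T>> groupRuns(List<T> items, BiPredicate<T, T> sameGroup) {
        List<List<T>> groups = new ArrayList<>();
        List<T> current = null;
        T previous = null;
        for (T item : items) {
            if (current == null || !sameGroup.test(previous, item)) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(item);
            previous = item;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up sample data: part numbers 1, 2, 3, then 1, then 1, 2.
        List<String> lines = List.of(
            "A01 first", "A02 second", "A03 third",
            "B01 next",
            "C01 last", "C02 tail");
        List<List<String>> groups =
            groupRuns(lines, (s1, s2) -> getRecordPart(s2) > getRecordPart(s1));
        // Three groups: the A lines, the single B line, and the C lines.
        System.out.println(groups);
    }
}
```

A part number that does not increase (here, going from 3 back to 1, or from 1 to 1) closes the current group, which is why two consecutive single-line records end up in separate groups.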
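This assumes that your RecordFactory creates a Record from a List<String>, and not from a String. Note that this solution can be run in parallel, although if you want better parallel performance (at the cost of memory), it is probably better to store the content of the file in a List and post-process that list.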
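
As a rough illustration of that assumption, the factory only needs a static method that accepts the grouped lines. Both Record and RecordFactory below are hypothetical placeholders, since the real classes are not shown; the only point is the List<String> parameter that makes map(RecordFactory::createRecord) type-check.

```java
import java.util.List;

class RecordFactorySketch {
    // Hypothetical Record type: it simply keeps the lines that make up one record.
    static final class Record {
        private final List<String> lines;

        Record(List<String> lines) {
            this.lines = List.copyOf(lines); // defensive, unmodifiable copy
        }

        List<String> getLines() {
            return lines;
        }
    }

    // Hypothetical factory: one group of lines in, one Record out.
    static final class RecordFactory {
        static Record createRecord(List<String> lines) {
            if (lines.isEmpty()) {
                throw new IllegalArgumentException("a record needs at least one line");
            }
            return new Record(lines);
        }
    }

    public static void main(String[] args) {
        Record record = RecordFactory.createRecord(List.of("A01 first", "A02 second"));
        System.out.println(record.getLines());
    }
}
```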