The following is an example of a list of multiline records, each starting with a fixed string label (LABEL):
...
LABEL ...
...
...
LABEL ...
...
...
LABEL ...
...
...
LABEL ...
...
...
Is there a Java regular expression that can match the above and extract each record, i.e.
LABEL ...
...
...
Also, is this the fastest way to extract those records, or would reading line by line and checking the start of each line be faster?
Solution
To iterate over all the LABEL groups, use this:
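import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern regex = Pattern.compile("(?sm)LABEL.*?(?=^LABEL|\\Z)");
Matcher regexMatcher = regex.matcher(subjectString); // subjectString holds the whole input
while (regexMatcher.find()) {
    // the current LABEL record
    String record = regexMatcher.group();
}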
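Because the original demo link did not survive, here is a minimal self-contained sketch that runs the same pattern over a small sample and collects the records. The class name `LabelRecords`, the helper `splitRecords`, and the sample text are hypothetical, chosen only for illustration. One detail worth knowing: each record keeps the newline that precedes the next LABEL, but the last record stops before a final line terminator, because `\Z` matches there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelRecords {
    // Same pattern as the answer: DOTALL + MULTILINE, lazy match up to the
    // next line-initial LABEL or the end of the input.
    private static final Pattern RECORD =
            Pattern.compile("(?sm)LABEL.*?(?=^LABEL|\\Z)");

    // Hypothetical helper: returns every LABEL record found in the text.
    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            records.add(m.group());
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical sample input with three records.
        String sample = "LABEL 1\na\nb\nLABEL 2\nc\nd\nLABEL 3\ne\n";
        for (String record : splitRecords(sample)) {
            System.out.println("--- record ---");
            System.out.println(record);
        }
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which matters if the helper is invoked repeatedly.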
Explanation
(?s) activates DOTALL mode, allowing the dot to match across lines
(?m) turns on multi-line mode, allowing ^ and $ to match on each line
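LABEL matches the literal characters LABEL
.*? lazily matches any characters, including newlines, up to...
the point where the lookahead (?=^LABEL|\Z) can assert that what follows is either LABEL at the start of a line or the end of the input (\Z is written \\Z inside the Java string literal)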
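The answer above does not address the second question, about reading line by line. The following is a minimal sketch of that alternative, under the assumption that every record starts with LABEL at the beginning of a line and that lines before the first LABEL should be ignored. The names `LineByLineRecords` and `splitRecords` are hypothetical. It reads from a `BufferedReader`, so it does not need the whole input in one String, and it re-appends `'\n'` to each line (so `\r\n` input is normalized). Whether it is actually faster than the regex for your data is something to measure, for example with a JMH benchmark, rather than assume.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineByLineRecords {
    // Assumed record marker; a record starts on any line beginning with it.
    private static final String LABEL = "LABEL";

    static List<String> splitRecords(BufferedReader reader) throws IOException {
        List<String> records = new ArrayList<>();
        StringBuilder current = null; // stays null until the first LABEL line
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith(LABEL)) {
                if (current != null) {
                    records.add(current.toString()); // close the previous record
                }
                current = new StringBuilder();
            }
            if (current != null) { // lines before the first LABEL are skipped
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            records.add(current.toString()); // flush the last record
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample input with three records.
        String sample = "LABEL 1\na\nb\nLABEL 2\nc\nd\nLABEL 3\ne\n";
        List<String> records = splitRecords(new BufferedReader(new StringReader(sample)));
        System.out.println(records.size() + " records");
    }
}
```

Unlike the regex version, this one only starts a record at the beginning of a line, and every record, including the last, ends with a newline.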