I have some complex regular expressions which I need to comment for readability and maintenance. The Java spec is rather terse and I struggled for a long time getting this working. I finally caught my bug and will post it as an answer but I'd be grateful for any other advice on maintaining regexes
As an example I want to comment the subcomponents (of patternS) in a simple name parser:
String testTarget = "Waldorf T. Flywheel";
String patternS = "([A-Za-z]+)\\s+([A-Z]\\.)?\\s+([A-Za-z]+)";
Pattern pattern = Pattern.compile(patternS, Pattern.COMMENTS);
Assert.assertTrue(pattern.matcher(testTarget).matches());
EDIT: I would be grateful for examples of the (?x) format as well.
EDIT: @geowa4 has a good suggestion which avoids embedded comments. Sinnce java and others have provided for embedded comments what are the cases where they are useful? (I think I have a case but I'd be interested to see others).
EDIT: As noted below @mikej the regex does not support the optional initial well and would be better as:
String patternS = "([A-Za-z]+)\\s+([A-Z]\\.\\s+)?([A-Za-z]+)";
but that would end up extracting space in the initial
解决方案
See the post by Martin Fowler on ComposedRegex for some more ideas on improving regexp readability. In summary, he advocates breaking down a complex regexp into smaller parts which can be given meaningful variable names. e.g.
String mandatoryName = "([A-Za-z]+)";
String mandatoryWhiteSpace = "\\s+";
String optionalInitial = "([A-Z]\\.)?";
String pattern = mandatoryName + mandatoryWhiteSpace + optionalInitial +
mandatoryWhiteSpace + mandatoryName;