I know that $ is used in a Java regular expression to check whether a line ending follows.
Consider the following code:
String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(\\.[^:/]+).*$?", "$1");
System.out.println(test_domain);
The output is:
http://www.google.com
line2
line3
I assume that the pattern (\\.[^:/]+).*$? matches the first line, http://www.google.com/path, and that $1 is http://www.google.com. I also assume the ? makes the match reluctant (so it matches only the first line).
However, if I remove the ? from the pattern and run the following code:
String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);
The output is:
http://www.google.com/path
line2
line3
I expected the result to be http://www.google.com, because:
(\\.[^:/]+) matches http://www.google.com
.*$ matches /path\nline2\nline3
Where is my misunderstanding of the regex here?
Solution
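Your second regex (the one without the ?) does not match the input string at all, so replaceFirst returns the string unchanged. Without the multiline flag, $ matches only at the very end of the input (or just before a final line terminator), which here is after line3. Since you are not using the s (DOTALL) flag, . does not match newline characters, so .* cannot get there.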
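Also, a ? quantifier after the $ end-of-line/string anchor makes little sense. Java accepts it, but it only makes the anchor optional, so the pattern behaves as if the $ were not there: .* stops at the first newline and only the rest of the first line is matched. The ? does not make anything reluctant here. Note, too, that group 1 captures .google.com, not the whole URL; the http://www prefix is simply left untouched by the replacement.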
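To see what each pattern actually matches, here is a minimal sketch that uses Matcher.find() instead of replaceFirst. The class name DollarAnchorDemo and the helper firstMatch are made up for illustration; only the input string and the patterns come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original question.
class DollarAnchorDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";

    // Returns the text matched by the first find(), or null if nothing matches.
    static String firstMatch(String regex) {
        Matcher m = Pattern.compile(regex).matcher(INPUT);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // No flags: '.' stops at '\n' and '$' needs the end of the input,
        // so there is no match at all.
        System.out.println(firstMatch("(\\.[^:/]+).*$"));

        // Optional anchor: matches the same text as the pattern without '$'.
        System.out.println(firstMatch("(\\.[^:/]+).*$?"));
        System.out.println(firstMatch("(\\.[^:/]+).*"));
    }
}
```

The first call yields null, while the other two both yield .google.com/path, which is why the version with $? appeared to "work".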
To make the pattern match through to the end of the input, so that only http://www.google.com is returned, add the s (DOTALL) flag, (?s):
String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(?s)(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);
Output of this demo:
http://www.google.com
With the multiline (?m) flag, $ also matches at the end of each line. The regex finds a literal . followed by a sequence of characters other than : and /. Once a : or / is reached, .* consumes the remaining characters on that line, and replaceFirst drops them while leaving the other lines intact.
String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(?m)(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);
Output:
http://www.google.com
line2
line3
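The same comparison can be written with the Pattern.DOTALL and Pattern.MULTILINE constants instead of the inline (?s) and (?m) flags. This is only a sketch: the class name FlagComparisonDemo and its helper methods are hypothetical, and the regex and input are the ones from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
class FlagComparisonDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";
    static final String REGEX = "(\\.[^:/]+).*$";

    // Applies replaceFirst("$1") with the given Pattern flags (0 means none).
    static String replaceWith(int flags) {
        return Pattern.compile(REGEX, flags).matcher(INPUT).replaceFirst("$1");
    }

    // Returns what group 1 captured on the first match, or null if no match.
    static String firstGroup(int flags) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(INPUT);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(replaceWith(0));                 // no match: input unchanged
        System.out.println(replaceWith(Pattern.DOTALL));    // everything after .google.com removed
        System.out.println(replaceWith(Pattern.MULTILINE)); // only /path removed
        System.out.println(firstGroup(Pattern.DOTALL));     // group 1 is .google.com
    }
}
```

Which flag to use depends on the goal: DOTALL discards everything after the host, including the later lines, whereas MULTILINE trims only the rest of the line on which the match occurs.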