Regex-related methods in the String class
boolean matches(String regex)
Tells whether or not this string matches the given regular expression.
String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.
String replaceFirst(String regex, String replacement)
Replaces the first substring of this string that matches the given regular expression with the given replacement.
String[] split(String regex)
Splits this string around matches of the given regular expression.
String[] split(String regex, int limit)
Splits this string around matches of the given regular expression.
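As a quick illustration of the String methods listed above, the following sketch wraps each one in a small helper. The class name `StringRegexDemo` and the helper names are made up for this example; only the `String` methods themselves come from the JDK.

```java
import java.util.Arrays;

// Hypothetical demo class; the helper names are illustrative only.
final class StringRegexDemo {
    // matches() succeeds only if the WHOLE string matches the regex.
    static boolean isDigits(String s) {
        return s.matches("\\d+");
    }

    // Replaces every run of whitespace with a single space.
    static String collapseSpaces(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // Replaces only the first run of digits.
    static String maskFirstNumber(String s) {
        return s.replaceFirst("\\d+", "#");
    }

    // Splits on commas; a positive limit caps the number of parts.
    static String[] splitCsv(String s, int limit) {
        return s.split(",", limit);
    }

    public static void main(String[] args) {
        System.out.println(isDigits("12345"));                        // true
        System.out.println(isDigits("123a"));                         // false
        System.out.println(collapseSpaces("a  b \t c"));              // a b c
        System.out.println(maskFirstNumber("a1b22c"));                // a#b22c
        System.out.println(Arrays.toString(splitCsv("a,b,c,d", 2)));  // [a, b,c,d]
    }
}
```

Note that with a limit of 2 the second element keeps the remaining commas, because splitting stops once the limit is reached.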
The Pattern and Matcher classes
https://www.cnblogs.com/ggjucheng/p/3423731.html
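java.util.regex is the library package for matching strings against patterns specified as regular expressions. Its two main classes are Pattern and Matcher.
Pattern: a Pattern is the compiled representation of a regular expression.
Matcher: a Matcher object is a state machine that checks a string against the pattern held by a Pattern object. First, a Pattern instance is created as the compiled form of a regular expression whose syntax is similar to Perl's; then a Matcher instance performs the matching on an input string under the control of that Pattern.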
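The two-step workflow described above (compile a Pattern, then obtain a Matcher per input) might look like the sketch below. The class and helper names are hypothetical. Keeping the Pattern in a static field is one common choice, since Pattern instances are immutable and safe to share, whereas each Matcher holds per-input state.

```java
import java.util.regex.Pattern;

// Hypothetical demo class showing the compile-once, match-many workflow.
final class PatternMatcherDemo {
    // Compiled once and reused for every input.
    private static final Pattern WORD = Pattern.compile("[a-z]+");

    // matches() requires the entire input to match the pattern.
    static boolean isWholeWord(String input) {
        return WORD.matcher(input).matches();
    }

    // find() only requires some subsequence of the input to match.
    static boolean containsWord(String input) {
        return WORD.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isWholeWord("hello"));        // true
        System.out.println(isWholeWord("hello world"));  // false
        System.out.println(containsWord("123 abc"));     // true
        System.out.println(containsWord("123"));         // false
    }
}
```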
The Pattern class
https://docs.oracle.com/javase/8/docs/api/
Static methods
static Pattern compile(String regex)
Compiles the given regular expression into a pattern.
static Pattern compile(String regex, int flags)
Compiles the given regular expression into a pattern with the given flags.
static boolean matches(String regex, CharSequence input)
Compiles the given regular expression and attempts to match the given input against it.
static String quote(String s)
Returns a literal pattern String for the specified String.
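A minimal sketch of the static methods above, assuming a few made-up helper names. It shows compile with a flag, the one-shot `Pattern.matches`, and `Pattern.quote` for treating a delimiter literally.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; helper names are illustrative only.
final class PatternStaticDemo {
    // compile(regex, flags): CASE_INSENSITIVE makes ASCII letters match regardless of case.
    static boolean matchesIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // Pattern.matches compiles and matches in one call; handy for one-off checks.
    static boolean looksLikeIsoDate(String input) {
        return Pattern.matches("\\d{4}-\\d{2}-\\d{2}", input);
    }

    // Pattern.quote turns the delimiter into a literal pattern,
    // so "." means a dot rather than "any character".
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(matchesIgnoringCase("java", "JaVa"));        // true
        System.out.println(looksLikeIsoDate("2014-03-18"));             // true
        System.out.println(looksLikeIsoDate("18/03/2014"));             // false
        System.out.println(Arrays.toString(splitLiteral("a.b.c", "."))); // [a, b, c]
    }
}
```

Because `Pattern.matches` recompiles the expression on every call, a precompiled Pattern is usually the better fit when the same expression is applied many times.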
Instance methods
String[] split(CharSequence input)
Splits the given input sequence around matches of this pattern.
String[] split(CharSequence input, int limit)
Splits the given input sequence around matches of this pattern.
Matcher matcher(CharSequence input)
Creates a matcher that will match the given input against this pattern.
String pattern()
Returns the regular expression from which this pattern was compiled.
String toString()
Returns the string representation of this pattern.
int flags()
Returns this pattern’s match flags.
Stream<String> splitAsStream(CharSequence input)
Creates a stream from the given input sequence around matches of this pattern.
Predicate<String> asPredicate()
Creates a predicate which can be used to match a string.
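The stream-oriented instance methods (available since Java 8) can be sketched as follows. The class, field, and helper names are assumptions made for this example. Note that the predicate returned by `asPredicate()` behaves like `find()`: it accepts a string if any part of it matches.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class; names are illustrative only.
final class PatternInstanceDemo {
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");
    private static final Pattern HAS_DIGIT = Pattern.compile("\\d");

    // splitAsStream: like split(), but yields a Stream<String>.
    static List<String> fields(String line) {
        return COMMA.splitAsStream(line).collect(Collectors.toList());
    }

    // asPredicate: keeps the strings in which the pattern can be found.
    static List<String> withDigits(List<String> words) {
        return words.stream().filter(HAS_DIGIT.asPredicate()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fields("a , b,c"));                          // [a, b, c]
        System.out.println(withDigits(Arrays.asList("ab", "c3", "7"))); // [c3, 7]
        System.out.println(HAS_DIGIT.pattern());                        // \d
    }
}
```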
The Matcher class
https://docs.oracle.com/javase/8/docs/api/
boolean find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
boolean find(int start)
Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
boolean matches()
Attempts to match the entire region against the pattern.
String group()
Returns the input subsequence matched by the previous match.
String group(int group)
Returns the input subsequence captured by the given group during the previous match operation.
String group(String name)
Returns the input subsequence captured by the given named-capturing group during the previous match operation.
int groupCount()
Returns the number of capturing groups in this matcher’s pattern.
Pattern pattern()
Returns the pattern that is interpreted by this matcher.
String replaceAll(String replacement)
Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
String replaceFirst(String replacement)
Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.
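To make the group-related methods concrete, here is a small sketch built around a date pattern. The pattern, class name, and helpers are assumptions for illustration. In the replacement string, `$1`, `$2`, and `$3` refer to capturing groups by number; named groups are numbered as well, in order of their opening parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the date pattern is only an example.
final class MatcherGroupDemo {
    private static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // group(String name): reads a named-capturing group after a successful find().
    static String yearOf(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group("year") : null;
    }

    // replaceAll: rewrites every match, reordering the captured groups.
    static String toDayFirst(String text) {
        return DATE.matcher(text).replaceAll("$3/$2/$1");
    }

    // groupCount: counts capturing groups; group 0 (the whole match) is not included.
    static int groupCount() {
        return DATE.matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(yearOf("released 2014-03-18"));          // 2014
        System.out.println(toDayFirst("2014-03-18 and 2017-09-21")); // 18/03/2014 and 21/09/2017
        System.out.println(groupCount());                           // 3
    }
}
```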
Common operations
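import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Print every digit found in the input, one per line: 1, 2, 3.
String input = "abc123de";
Pattern p = Pattern.compile("\\d");
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println(m.group());
}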
Regular expression syntax
Summary of regular-expression constructs
Construct Matches
Characters
x The character x
\\ The backslash character
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh The character with hexadecimal value 0xhh
\uhhhh The character with hexadecimal value 0xhhhh
\x{h…h} The character with hexadecimal value 0xh…h (Character.MIN_CODE_POINT <= 0xh…h <= Character.MAX_CODE_POINT)
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\f The form-feed character ('\u000C')
\a The alert (bell) character ('\u0007')
\e The escape character ('\u001B')
\cx The control character corresponding to x
Character classes
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
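The union, intersection, and subtraction forms are easier to see in action. The sketch below uses a made-up helper, `remove`, that deletes every character matched by a given class, so the output shows exactly which characters the class covers.

```java
// Hypothetical demo class; remove() is an illustrative helper.
final class CharClassDemo {
    // Deletes every character of the input that matches the given character class.
    static String remove(String input, String charClass) {
        return input.replaceAll(charClass, "");
    }

    public static void main(String[] args) {
        // Union: a-d or m-p are removed.
        System.out.println(remove("abcdefmnop xyz", "[a-d[m-p]]")); // ef xyz
        // Intersection: only d, e, f are removed.
        System.out.println(remove("abcdefg", "[a-z&&[def]]"));      // abcg
        // Subtraction: everything in a-z except b and c is removed.
        System.out.println(remove("abcdefg", "[a-z&&[^bc]]"));      // bc
    }
}
```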
Predefined character classes
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\h A horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]
\H A non-horizontal whitespace character: [^\h]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\v A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
\V A non-vertical whitespace character: [^\v]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
POSIX character classes (US-ASCII only)
\p{Lower} A lower-case alphabetic character: [a-z]
\p{Upper} An upper-case alphabetic character:[A-Z]
\p{ASCII} All ASCII:[\x00-\x7F]
\p{Alpha} An alphabetic character:[\p{Lower}\p{Upper}]
\p{Digit} A decimal digit: [0-9]
\p{Alnum} An alphanumeric character:[\p{Alpha}\p{Digit}]
\p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} A visible character: [\p{Alnum}\p{Punct}]
\p{Print} A printable character: [\p{Graph}\x20]
\p{Blank} A space or a tab: [ \t]
\p{Cntrl} A control character: [\x00-\x1F\x7F]
\p{XDigit} A hexadecimal digit: [0-9a-fA-F]
\p{Space} A whitespace character: [ \t\n\x0B\f\r]
java.lang.Character classes (simple java character type)
\p{javaLowerCase} Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase} Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace} Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored} Equivalent to java.lang.Character.isMirrored()
Classes for Unicode scripts, blocks, categories and binary properties
\p{IsLatin} A Latin script character (script)
\p{InGreek} A character in the Greek block (block)
\p{Lu} An uppercase letter (category)
\p{IsAlphabetic} An alphabetic character (binary property)
\p{Sc} A currency symbol
\P{InGreek} Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction)
Boundary matchers
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
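Two of the boundary matchers above are sketched below with hypothetical helpers: `\b` for whole-word matching, and `^` combined with the `Pattern.MULTILINE` flag. By default `^` and `$` match only at the beginning and end of the entire input; with MULTILINE they also match just after and just before line terminators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; helper names are illustrative only.
final class BoundaryDemo {
    // Counts occurrences of word as a whole word, using \b on both sides.
    static int countWord(String text, String word) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Collects the first word of every line; MULTILINE lets ^ match at each line start.
    static List<String> lineStarts(String text) {
        Matcher m = Pattern.compile("^\\w+", Pattern.MULTILINE).matcher(text);
        List<String> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.group());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(countWord("cat concat cat.", "cat")); // 2
        System.out.println(lineStarts("one two\nthree four"));   // [one, three]
    }
}
```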
Linebreak matcher
\R Any Unicode linebreak sequence, is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times
Reluctant quantifiers
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times
Possessive quantifiers
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n but not more than m times
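The three quantifier families accept the same repetition counts but differ in how they backtrack. The following sketch runs a greedy, a reluctant, and a possessive variant of one pattern against the same input; `firstMatch` is a made-up helper that returns the first match or null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; firstMatch() is an illustrative helper.
final class QuantifierDemo {
    // Returns the first subsequence matching regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .+ takes as much as possible, then backs off to the last '>'.
        System.out.println(firstMatch("<.+>", input));  // <a><b>
        // Reluctant: .+? takes as little as possible, stopping at the first '>'.
        System.out.println(firstMatch("<.+?>", input)); // <a>
        // Possessive: .++ takes everything and never gives any back,
        // so the trailing '>' can never match.
        System.out.println(firstMatch("<.++>", input)); // null
    }
}
```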
Logical operators
XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group
Back references
\n Whatever the nth capturing group matched
\k<name> Whatever the named-capturing group "name" matched
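Back references let a pattern require that a later part of the input repeats what an earlier group captured. The sketch below, with hypothetical helper names, uses a numbered reference to detect a doubled word and a named reference to find text enclosed in matching quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; helper names are illustrative only.
final class BackReferenceDemo {
    // \1 must repeat exactly what group 1 captured, e.g. "is is".
    private static final Pattern DOUBLED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // \k<quote> must be the same quote character that opened the text.
    private static final Pattern QUOTED = Pattern.compile("(?<quote>['\"])(.*?)\\k<quote>");

    static boolean hasDoubledWord(String text) {
        return DOUBLED.matcher(text).find();
    }

    // Returns the text between the first pair of matching quotes, or null.
    static String firstQuoted(String text) {
        Matcher m = QUOTED.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledWord("this is is a test")); // true
        System.out.println(hasDoubledWord("this is a test"));    // false
        System.out.println(firstQuoted("say \"it's\" now"));     // it's
    }
}
```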
Quotation
\ Nothing, but quotes the following character
\Q Nothing, but quotes all characters until \E
\E Nothing, but ends quoting started by \Q
Special constructs (named-capturing and non-capturing)
(?<name>X) X, as a named-capturing group
(?:X) X, as a non-capturing group
(?idmsuxU-idmsuxU) Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X) X, as a non-capturing group with the given flags i d m s u x on - off
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
(?>X) X, as an independent, non-capturing group
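Lookahead and lookbehind check the surrounding text without consuming it, which the following sketch tries to show. The helpers are invented for this example: one extracts numbers that are followed by "px", one inserts thousands separators at zero-width positions, and one uses the inline flag `(?i)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; helper names are illustrative only.
final class LookaroundDemo {
    // Positive lookahead: digits only when followed by "px"; "px" is not part of the match.
    static List<String> pixelValues(String css) {
        Matcher m = Pattern.compile("\\d+(?=px)").matcher(css);
        List<String> values = new ArrayList<>();
        while (m.find()) {
            values.add(m.group());
        }
        return values;
    }

    // Lookbehind plus lookahead: matches the empty position that has a digit before it
    // and a multiple of three digits after it, then inserts a comma there.
    static String addThousandsSeparators(String digits) {
        return digits.replaceAll("(?<=\\d)(?=(\\d{3})+$)", ",");
    }

    // Inline flag: (?i) turns on case-insensitive matching for the rest of the pattern.
    static boolean isHello(String s) {
        return s.matches("(?i)hello");
    }

    public static void main(String[] args) {
        System.out.println(pixelValues("width: 10px; height: 20em; margin: 5px")); // [10, 5]
        System.out.println(addThousandsSeparators("1234567"));                     // 1,234,567
        System.out.println(isHello("HeLLo"));                                      // true
    }
}
```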