Regular Expressions
- Characters
Syntax | Meaning |
---|---|
x | The character x |
\\ | The backslash character |
\0n | The character with octal value 0n (0 <= n <= 7) |
\0nn | The character with octal value 0nn (0 <= n <= 7) |
\0mnn | The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7) |
\xhh | The character with hexadecimal value 0xhh |
\uhhhh | The character with hexadecimal value 0xhhhh |
\x{h…h} | The character with hexadecimal value 0xh…h (Character.MIN_CODE_POINT <= 0xh…h <= Character.MAX_CODE_POINT) |
\t | The tab character ('\u0009') |
\n | The newline (line feed) character ('\u000A') |
\r | The carriage-return character ('\u000D') |
\f | The form-feed character ('\u000C') |
\a | The alert (bell) character ('\u0007') |
\e | The escape character ('\u001B') |
\cx | The control character corresponding to x |
- Character classes
Syntax | Meaning |
---|---|
[abc] | a, b, or c (simple class) |
[^abc] | Any character except a, b, or c (negation) |
[a-zA-Z] | a through z or A through Z, inclusive (range) |
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union) |
[a-z&&[def]] | d, e, or f (intersection) |
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction) |
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction) |
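The union, intersection, and subtraction forms are easiest to check by trying single characters against them. The following is a minimal sketch; the class name `CharacterClassDemo` and the helper `matches` are illustrative names, not part of the original notes.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Returns true if the whole input matches the given character-class pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a-d[m-p]]", "n"));    // union: true
        System.out.println(matches("[a-z&&[def]]", "e"));  // intersection: true
        System.out.println(matches("[a-z&&[^bc]]", "b"));  // subtraction: false
        System.out.println(matches("[a-z&&[^m-p]]", "q")); // subtraction: true
    }
}
```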
- Predefined character classes
Syntax | Meaning |
---|---|
. | Any character (may or may not match line terminators) |
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\h | A horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000] |
\H | A non-horizontal whitespace character: [^\h] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
\v | A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029] |
\V | A non-vertical whitespace character: [^\v] |
\w | A word character: [a-zA-Z_0-9] |
\W | A non-word character: [^\w] |
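As a small illustration of `\d` and `\s`, the sketch below masks digits and splits on whitespace. Note that each backslash must be doubled inside a Java string literal. The class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

class PredefinedClassDemo {
    // Replaces every digit (\d) with '#'.
    static String maskDigits(String input) {
        return input.replaceAll("\\d", "#");
    }

    // Splits the input on runs of whitespace (\s+).
    static List<String> words(String input) {
        return Arrays.asList(input.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("tel: 555-0199"));    // tel: ###-####
        System.out.println(words("one  two\tthree\nfour")); // [one, two, three, four]
    }
}
```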
- POSIX character classes (US-ASCII only)
Syntax | Meaning |
---|---|
\p{Lower} | A lower-case alphabetic character: [a-z] |
\p{Upper} | An upper-case alphabetic character: [A-Z] |
\p{ASCII} | All ASCII: [\x00-\x7F] |
\p{Alpha} | An alphabetic character: [\p{Lower}\p{Upper}] |
\p{Digit} | A decimal digit: [0-9] |
\p{Alnum} | An alphanumeric character: [\p{Alpha}\p{Digit}] |
\p{Punct} | Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_\`{\|}~ |
\p{Graph} | A visible character: [\p{Alnum}\p{Punct}] |
\p{Print} | A printable character: [\p{Graph}\x20] |
\p{Blank} | A space or a tab: [ \t] |
\p{Cntrl} | A control character: [\x00-\x1F\x7F] |
\p{XDigit} | A hexadecimal digit: [0-9a-fA-F] |
\p{Space} | A whitespace character: [ \t\n\x0B\f\r] |
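The sketch below tries a few POSIX classes and shows the US-ASCII restriction: with default flags, `\p{Lower}` does not match the non-ASCII letter é. The class and helper names are illustrative only.

```java
import java.util.regex.Pattern;

class PosixClassDemo {
    // Removes every ASCII punctuation character.
    static String stripPunctuation(String input) {
        return input.replaceAll("\\p{Punct}", "");
    }

    // True if the input consists only of hexadecimal digits.
    static boolean isHex(String input) {
        return Pattern.matches("\\p{XDigit}+", input);
    }

    // True if the input consists only of ASCII lower-case letters.
    static boolean isAsciiLower(String input) {
        return Pattern.matches("\\p{Lower}+", input);
    }

    public static void main(String[] args) {
        System.out.println(stripPunctuation("Hello, World!")); // Hello World
        System.out.println(isHex("CAFE42"));                   // true
        System.out.println(isAsciiLower("abc"));               // true
        System.out.println(isAsciiLower("\u00e9"));            // false: é is not US-ASCII
    }
}
```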
- java.lang.Character classes
Syntax | Meaning |
---|---|
\p{javaLowerCase} | Equivalent to java.lang.Character.isLowerCase() |
\p{javaUpperCase} | Equivalent to java.lang.Character.isUpperCase() |
\p{javaWhitespace} | Equivalent to java.lang.Character.isWhitespace() |
\p{javaMirrored} | Equivalent to java.lang.Character.isMirrored() |
- Classes for Unicode scripts, blocks, categories, and binary properties
Syntax | Meaning |
---|---|
\p{IsLatin} | A Latin script character (script) |
\p{InGreek} | A character in the Greek block (block) |
\p{Lu} | An uppercase letter (category) |
\p{IsAlphabetic} | An alphabetic character (binary property) |
\p{Sc} | A currency symbol |
\P{InGreek} | Any character except one in the Greek block (negation; note the uppercase P) |
[\p{L}&&[^\p{Lu}]] | Any letter except an uppercase letter (subtraction) |
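A quick way to get a feel for these properties is to test single characters against them. This is a minimal sketch; the class name and the sample characters (Greek alpha, Greek capital Alpha, the euro sign) are chosen only for illustration.

```java
import java.util.regex.Pattern;

class UnicodePropertyDemo {
    // Returns true if the whole input matches the given pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\p{IsLatin}", "a"));          // script: true
        System.out.println(matches("\\p{InGreek}", "\u03b1"));     // block: true (Greek alpha)
        System.out.println(matches("\\P{InGreek}", "\u03b1"));     // negation: false
        System.out.println(matches("\\p{Lu}", "\u0391"));          // category: true (Greek capital Alpha)
        System.out.println(matches("\\p{Sc}", "\u20ac"));          // currency symbol: true (euro sign)
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "A"));  // subtraction: false
    }
}
```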
- Boundary matchers
Syntax | Meaning |
---|---|
^ | The beginning of a line |
$ | The end of a line |
\b | A word boundary |
\B | A non-word boundary |
\A | The beginning of the input |
\G | The end of the previous match |
\Z | The end of the input but for the final line terminator, if any |
\z | The end of the input |
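Two details of the boundary matchers are easy to miss: `^` and `$` match at every line only when `Pattern.MULTILINE` is set (by default they match at the start and end of the whole input), and `\Z` tolerates a trailing line terminator while `\z` does not. The sketch below demonstrates both; `BoundaryDemo` and `countMatches` are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // \b keeps "cat" from matching inside "concat".
        System.out.println("cat concat cat".replaceAll("\\bcat\\b", "dog")); // dog concat dog

        String text = "one\ntwo\nthree";
        System.out.println(countMatches(Pattern.compile("^\\w+"), text));                    // 1
        System.out.println(countMatches(Pattern.compile("^\\w+", Pattern.MULTILINE), text)); // 3

        // \Z ignores the final line terminator, \z does not.
        System.out.println(countMatches(Pattern.compile("end\\Z"), "the end\n")); // 1
        System.out.println(countMatches(Pattern.compile("end\\z"), "the end\n")); // 0
    }
}
```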
- Linebreak matcher
Syntax | Meaning |
---|---|
\R | Any Unicode linebreak sequence, equivalent to \u000D\u000A\|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029] |
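Because `\R` treats `\r\n` as a single linebreak, it is handy for splitting text with mixed line endings. A minimal sketch, assuming Java 8 or later (where `\R` is available); the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;

class LinebreakDemo {
    // Splits the input at any Unicode linebreak sequence.
    static List<String> lines(String input) {
        return Arrays.asList(input.split("\\R"));
    }

    public static void main(String[] args) {
        // Mixed Windows (\r\n), Unix (\n), and Unicode line separator (\u2028) endings.
        System.out.println(lines("a\r\nb\nc\u2028d")); // [a, b, c, d]
    }
}
```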
- Quantifiers (greedy)
Syntax | Meaning |
---|---|
X? | X, once or not at all |
X* | X, zero or more times |
X+ | X, one or more times |
X{n} | X, exactly n times |
X{n,} | X, at least n times |
X{n,m} | X, at least n but not more than m times |
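- Quantifier suffixes
Syntax | Meaning |
---|---|
? | Turns the default (greedy) match into a reluctant match |
+ | Turns the default (greedy) match into a possessive match |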
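The difference between the three modes shows up when the same quantifier is applied to the same input. In this sketch, the greedy form takes as much as it can and backtracks, the reluctant form takes as little as it can, and the possessive form takes everything and never gives anything back, so it fails here. `QuantifierModeDemo` and `firstMatch` are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModeDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        System.out.println(firstMatch("<.+>", input));  // greedy:     <a><b>
        System.out.println(firstMatch("<.+?>", input)); // reluctant:  <a>
        System.out.println(firstMatch("<.++>", input)); // possessive: null (no backtracking)
    }
}
```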
- Logical operators
Syntax | Meaning |
---|---|
XY | X followed by Y |
X\|Y | Either X or Y |
(X) | X, as a capturing group |
- Groups
Syntax | Meaning |
---|---|
(X) | Matches X and captures it as a group |
\n | Whatever the nth capturing group matched |
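A back reference such as `\1` matches the same text that group 1 captured, not the same pattern. The sketch below uses that to find a doubled word; the class name, method name, and sample sentences are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // Returns the first word that is immediately repeated, or null if none is.
    static String firstRepeatedWord(String input) {
        // (\w+) captures a word; \1 requires the same text again after one space.
        Matcher m = Pattern.compile("\\b(\\w+) \\1\\b").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("this is is a test")); // is
        System.out.println(firstRepeatedWord("no repeats here"));   // null
    }
}
```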
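Nested groups are numbered in the order of their opening parentheses. Group 0 is the entire match, and the first actual group has index 1. Matching the expression `((1?[0-9]):([0-5][0-9]))[ap]m` against "11:59am" gives: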
Group index | Start | End | String |
---|---|---|---|
0 | 0 | 7 | 11:59am |
1 | 0 | 5 | 11:59 |
2 | 0 | 2 | 11 |
3 | 3 | 5 | 59 |
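The table can be reproduced with `Matcher.start(int)`, `Matcher.end(int)`, and `Matcher.group(int)`. This sketch assumes the expression is `((1?[0-9]):([0-5][0-9]))[ap]m`, that is, the unnamed form of the named-group pattern given in these notes; `GroupIndexDemo` and `describeGroups` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Builds one "index | start | end | text" row per group, including group 0.
    static List<String> describeGroups(String input) {
        Matcher m = Pattern.compile("((1?[0-9]):([0-5][0-9]))[ap]m").matcher(input);
        List<String> rows = new ArrayList<>();
        if (m.matches()) {
            // groupCount() excludes group 0, so the loop bound is inclusive.
            for (int i = 0; i <= m.groupCount(); i++) {
                rows.add(i + " | " + m.start(i) + " | " + m.end(i) + " | " + m.group(i));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : describeGroups("11:59am")) {
            System.out.println(row);
        }
    }
}
```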
The groups in the expression above can also be named, using the syntax `(?<name>X)`: `(?<time>(?<hour>1?[0-9]):(?<minute>[0-5][0-9]))[ap]m`. Here group 1 is named time, group 2 is named hour, and group 3 is named minute.
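Named groups are read with `Matcher.group(String)`, and they remain accessible by number as well. In this minimal sketch the pattern is written without spaces around the colon, since spaces in a pattern match literally; `NamedGroupDemo` and `extract` are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    static final Pattern TIME =
            Pattern.compile("(?<time>(?<hour>1?[0-9]):(?<minute>[0-5][0-9]))[ap]m");

    // Returns the text captured by the named group, or null if the input does not match.
    static String extract(String input, String groupName) {
        Matcher m = TIME.matcher(input);
        return m.matches() ? m.group(groupName) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("11:59am", "time"));   // 11:59
        System.out.println(extract("11:59am", "hour"));   // 11
        System.out.println(extract("11:59am", "minute")); // 59
    }
}
```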
- Quotation
Syntax | Meaning |
---|---|
\ | Nothing, but quotes the following character |
\Q | Nothing, but quotes all characters until \E |
\E | Nothing, but ends quoting started by \Q |
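Quoting matters whenever the text to match contains regex metacharacters. The sketch below matches the literal string `1+1=2` in three ways: escaping the `+`, wrapping the text in `\Q...\E`, and calling `Pattern.quote`. The class name and sample string are illustrative.

```java
import java.util.regex.Pattern;

class QuotingDemo {
    // Returns true if the whole input matches the given pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String literal = "1+1=2";
        // Unquoted, '+' is a quantifier, so the pattern means "one or more 1s, then 1=2".
        System.out.println(matches("1+1=2", literal));                // false
        System.out.println(matches("1+1=2", "11=2"));                 // true
        System.out.println(matches("1\\+1=2", literal));              // true
        System.out.println(matches("\\Q1+1=2\\E", literal));          // true
        System.out.println(matches(Pattern.quote(literal), literal)); // true
    }
}
```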
For details, see the API documentation for the `Pattern` and `Matcher` classes.
- If you only need to know whether a string matches a regular expression:
// requires java.util.regex.Pattern
if (Pattern.matches(regex, input)) {
    // do something...
}
- If you need to work with the matched text:
// requires java.util.regex.Pattern and java.util.regex.Matcher
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
    // do something with matcher
}
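The `Matcher` class provides many more methods, such as `find()`, `group()`, `start()`, `end()`, and `replaceAll()`.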
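As one example of those methods, `matches()` requires the whole input to match, while `find()` scans for successive matches inside it. The following sketch collects every match with its position; `MatcherMethodsDemo` and `findAll` are illustrative names, and the `text@start-end` output format is just a convenience for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Collects every match as "text@start-end".
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "a1b22c333";
        System.out.println(findAll("\\d+", input)); // [1@1-2, 22@3-5, 333@6-9]

        Matcher m = Pattern.compile("\\d+").matcher(input);
        System.out.println(m.matches());       // false: the whole input is not digits
        System.out.println(m.replaceAll("#")); // a#b#c#
    }
}
```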