Core Java (Volume II) 02, Chapter 1: Regular Expressions

Regular Expressions

  1. Characters

    Syntax      Explanation
    x           The character x
    \\          The backslash character
    \0n         The character with octal value 0n (0 <= n <= 7)
    \0nn        The character with octal value 0nn (0 <= n <= 7)
    \0mnn       The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
    \xhh        The character with hexadecimal value 0xhh
    \uhhhh      The character with hexadecimal value 0xhhhh
    \x{h…h}     The character with hexadecimal value 0xh…h (Character.MIN_CODE_POINT <= 0xh…h <= Character.MAX_CODE_POINT)
    \t          The tab character ('\u0009')
    \n          The newline (line feed) character ('\u000A')
    \r          The carriage-return character ('\u000D')
    \f          The form-feed character ('\u000C')
    \a          The alert (bell) character ('\u0007')
    \e          The escape character ('\u001B')
    \cx         The control character corresponding to x
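A minimal sketch of a few of these escapes in use. The class name EscapeDemo and its helper are hypothetical. Note that in Java source each regex backslash has to be doubled inside a string literal.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Hexadecimal, Unicode and octal escapes for 'A', 'B' and 'C'.
        System.out.println(matches("\\x41\\u0042\\0103", "ABC"));
        // The regex tab escape matches a literal tab character.
        System.out.println(matches("a\\tb", "a\tb"));
        // Four backslashes in Java source form the regex for one literal backslash.
        System.out.println(matches("\\\\", "\\"));
        // Code point escape for U+1F600, a supplementary character.
        String smiley = new String(Character.toChars(0x1F600));
        System.out.println(matches("\\x{1F600}", smiley));
    }
}
```

Under these assumptions all four calls should report a match.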
  2. Character classes

    Syntax          Explanation
    [abc]           a, b, or c (simple class)
    [^abc]          Any character except a, b, or c (negation)
    [a-zA-Z]        a through z or A through Z, inclusive (range)
    [a-d[m-p]]      a through d, or m through p: [a-dm-p] (union)
    [a-z&&[def]]    d, e, or f (intersection)
    [a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)
    [a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)
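The union, intersection and subtraction forms are easiest to check against single characters. The following is a small sketch; CharClassDemo and its matches helper are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a-d[m-p]]", "n"));    // union: true
        System.out.println(matches("[a-d[m-p]]", "e"));    // union: false
        System.out.println(matches("[a-z&&[def]]", "e"));  // intersection: true
        System.out.println(matches("[a-z&&[^bc]]", "b"));  // subtraction: false
        System.out.println(matches("[a-z&&[^m-p]]", "q")); // subtraction: true
    }
}
```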
  3. Predefined character classes

    Syntax      Explanation
    .           Any character (may or may not match line terminators)
    \d          A digit: [0-9]
    \D          A non-digit: [^0-9]
    \h          A horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]
    \H          A non-horizontal whitespace character: [^\h]
    \s          A whitespace character: [ \t\n\x0B\f\r]
    \S          A non-whitespace character: [^\s]
    \v          A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
    \V          A non-vertical whitespace character: [^\v]
    \w          A word character: [a-zA-Z_0-9]
    \W          A non-word character: [^\w]
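The predefined classes are commonly used with String.replaceAll and String.split. A short sketch follows; the class and method names are hypothetical and the sample strings are made up for illustration.

```java
class PredefinedClassDemo {
    // Removes every character that is not an ASCII digit.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Splits on runs of whitespace after trimming both ends.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // Replaces each run of non-word characters with a single underscore.
    static String slug(String s) {
        return s.replaceAll("\\W+", "_");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("tel: 555-0199"));       // 5550199
        System.out.println(words(" a1 b22\tc333\n").length);   // 3
        System.out.println(slug("core java, vol. 2"));         // core_java_vol_2
    }
}
```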
  4. POSIX character classes (US-ASCII only)

    Syntax      Explanation
    \p{Lower}   A lower-case alphabetic character: [a-z]
    \p{Upper}   An upper-case alphabetic character: [A-Z]
    \p{ASCII}   All ASCII: [\x00-\x7F]
    \p{Alpha}   An alphabetic character: [\p{Lower}\p{Upper}]
    \p{Digit}   A decimal digit: [0-9]
    \p{Alnum}   An alphanumeric character: [\p{Alpha}\p{Digit}]
    \p{Punct}   Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    \p{Graph}   A visible character: [\p{Alnum}\p{Punct}]
    \p{Print}   A printable character: [\p{Graph}\x20]
    \p{Blank}   A space or a tab: [ \t]
    \p{Cntrl}   A control character: [\x00-\x1F\x7F]
    \p{XDigit}  A hexadecimal digit: [0-9a-fA-F]
    \p{Space}   A whitespace character: [ \t\n\x0B\f\r]
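The US-ASCII restriction is easy to overlook. The sketch below assumes the default flags (no Pattern.UNICODE_CHARACTER_CLASS) and uses a hypothetical PosixClassDemo class to show that \p{Alpha} rejects an accented letter.

```java
import java.util.regex.Pattern;

class PosixClassDemo {
    // True if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\p{Alpha}+", "abc"));       // true
        // U+00E9 (e with acute accent) is a letter, but not a US-ASCII one.
        System.out.println(matches("\\p{Alpha}+", "caf\u00e9")); // false
        System.out.println(matches("\\p{XDigit}+", "CAFE12"));   // true
        System.out.println(matches("\\p{Punct}+", "!?~"));       // true
        System.out.println(matches("\\p{Blank}", "\t"));         // true
    }
}
```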
  5. java.lang.Character classes

    Syntax              Explanation
    \p{javaLowerCase}   Equivalent to java.lang.Character.isLowerCase()
    \p{javaUpperCase}   Equivalent to java.lang.Character.isUpperCase()
    \p{javaWhitespace}  Equivalent to java.lang.Character.isWhitespace()
    \p{javaMirrored}    Equivalent to java.lang.Character.isMirrored()
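Because these classes delegate to java.lang.Character, they are not limited to US-ASCII. A brief sketch, with the hypothetical class JavaCharClassDemo, contrasts them with the POSIX classes:

```java
import java.util.regex.Pattern;

class JavaCharClassDemo {
    // True if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String eAcute = "\u00c9"; // uppercase E with acute accent
        System.out.println(matches("\\p{javaUpperCase}", eAcute)); // true
        System.out.println(matches("\\p{Upper}", eAcute));         // false, ASCII only
        System.out.println(matches("\\p{javaMirrored}", "("));     // true
        // Character.isWhitespace excludes the non-breaking space U+00A0.
        System.out.println(matches("\\p{javaWhitespace}", "\u00a0")); // false
    }
}
```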
  6. Classes for Unicode scripts, blocks, categories and binary properties

    Syntax              Explanation
    \p{IsLatin}         A Latin script character (script)
    \p{InGreek}         A character in the Greek block (block)
    \p{Lu}              An uppercase letter (category)
    \p{IsAlphabetic}    An alphabetic character (binary property)
    \p{Sc}              A currency symbol
    \P{InGreek}         Any character except one in the Greek block (negation; note the uppercase P)
    [\p{L}&&[^\p{Lu}]]  Any letter except an uppercase letter (subtraction)
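The following sketch exercises each kind of Unicode class once. UnicodeClassDemo is a hypothetical name, and the non-ASCII sample characters are written as Unicode escapes so the file does not depend on the source encoding.

```java
import java.util.regex.Pattern;

class UnicodeClassDemo {
    // True if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";            // Latin letters, the last one accented
        String greek = "\u03b1\u03b2\u03b3";  // Greek alpha, beta, gamma
        System.out.println(matches("\\p{IsLatin}+", cafe));           // script: true
        System.out.println(matches("\\p{InGreek}+", greek));          // block: true
        System.out.println(matches("\\P{InGreek}+", "abc"));          // negated block: true
        System.out.println(matches("\\p{Sc}", "\u20ac"));             // euro sign: true
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]+", "aBc"));  // contains uppercase: false
    }
}
```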
  7. Boundary matchers

    Syntax      Explanation
    ^           The beginning of a line
    $           The end of a line
    \b          A word boundary
    \B          A non-word boundary
    \A          The beginning of the input
    \G          The end of the previous match
    \Z          The end of the input but for the final line terminator, if any
    \z          The end of the input
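One detail worth checking in practice: by default ^ and $ anchor only at the start and end of the whole input, and they work per line only when Pattern.MULTILINE is set. The sketch below, using a hypothetical BoundaryDemo class, shows that along with \b and the difference between \Z and \z.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts non-overlapping matches of the pattern in the input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Word boundaries: only the standalone word "cat" is counted.
        System.out.println(count(Pattern.compile("\\bcat\\b"), "cat concat cats")); // 1
        // Without MULTILINE, ^ anchors only at the start of the input.
        System.out.println(count(Pattern.compile("^\\w+"), "one\ntwo\nthree")); // 1
        // With MULTILINE, ^ also anchors after each line terminator.
        System.out.println(count(Pattern.compile("^\\w+", Pattern.MULTILINE), "one\ntwo\nthree")); // 3
        // \Z tolerates a final line terminator, \z does not.
        System.out.println(Pattern.compile("end\\Z").matcher("the end\n").find()); // true
        System.out.println(Pattern.compile("end\\z").matcher("the end\n").find()); // false
    }
}
```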
  8. Linebreak matcher

    Syntax      Explanation
    \R          Any Unicode linebreak sequence; equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
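A typical use of \R (available since Java 8) is splitting text whose line endings are mixed. In this sketch, with the hypothetical class LinebreakDemo, the pair \r\n is consumed as a single linebreak, so no empty strings appear between lines.

```java
class LinebreakDemo {
    // Splits text into lines on any Unicode linebreak sequence.
    static String[] lines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        // Windows, Unix and old Mac line endings mixed in one string.
        for (String line : lines("a\r\nb\nc\rd")) {
            System.out.println(line);
        }
    }
}
```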
  9. Greedy quantifiers

    Syntax      Explanation
    X?          X, once or not at all
    X*          X, zero or more times
    X+          X, one or more times
    X{n}        X, exactly n times
    X{n,}       X, at least n times
    X{n,m}      X, at least n but not more than m times
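  10. Quantifier suffixes

    Syntax      Explanation
    ?           Turns the default (greedy) match into a reluctant match
    +           Turns the default (greedy) match into a possessive match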
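The three modes differ in how much they consume and whether they give characters back. A minimal sketch, with the hypothetical class QuantifierDemo and a made-up input string, compares them on the same text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: takes as much as possible, then backtracks to the last '>'.
        System.out.println(firstMatch("<.+>", input));   // <a><b>
        // Reluctant: takes as little as possible, stopping at the first '>'.
        System.out.println(firstMatch("<.+?>", input));  // <a>
        // Possessive: takes everything and never backtracks, so '>' cannot match.
        System.out.println(firstMatch("<.++>", input));  // null
    }
}
```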
  11. Logical operators

    Syntax      Explanation
    XY          X followed by Y
    X|Y         Either X or Y
    (X)         X, as a capturing group
  12. Groups

    | Syntax | Explanation |
    |-|-|
    | (X) | Captures the string matched by X as a group |
    | \n | Whatever the nth group matched |
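A back reference such as \1 is handy for spotting repeated text. The sketch below, with the hypothetical class BackreferenceDemo, looks for a word that is immediately repeated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // A word, some whitespace, then the same word again (group 1 referenced by \1).
    private static final Pattern REPEATED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Returns the first immediately repeated word, or null if there is none.
    static String repeatedWord(String text) {
        Matcher m = REPEATED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(repeatedWord("this is is a test")); // is
        System.out.println(repeatedWord("hello world"));       // null
    }
}
```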

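Figure: a regular expression that matches a time, ((1?[0-9]):([0-5][0-9]))[ap]m

As the figure shows, nested groups are numbered in the order of their opening parentheses. Group 0 is the entire input, and the first actual group has index 1. Matching the expression against "11:59am" gives the following groups: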

| Group index | Start | End | String |
|-|-|-|-|
| 0 | 0 | 7 | 11:59am |
| 1 | 0 | 5 | 11:59 |
| 2 | 0 | 2 | 11 |
| 3 | 3 | 5 | 59 |
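The group table can be reproduced with Matcher.start(int), Matcher.end(int) and Matcher.group(int). This is a sketch under one assumption: the figure's expression is taken to be ((1?[0-9]):([0-5][0-9]))[ap]m, inferred from the group results listed above. The class name GroupIndexDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Assumed form of the time expression; not copied from the figure itself.
    static final String TIME = "((1?[0-9]):([0-5][0-9]))[ap]m";

    // Returns one "index start end text" row per group, or an empty list if nothing matches.
    static List<String> groups(String regex, String input) {
        List<String> rows = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches()) {
            // groupCount() excludes group 0, so the loop bound is inclusive.
            for (int i = 0; i <= m.groupCount(); i++) {
                rows.add(i + " " + m.start(i) + " " + m.end(i) + " " + m.group(i));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        groups(TIME, "11:59am").forEach(System.out::println);
    }
}
```

The end index is exclusive, which is why group 3 ("59") spans 3 to 5.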

The groups in the expression above can also be named, using the syntax (?<name>X): (?<time>(?<hour>1?[0-9]):(?<minute>[0-5][0-9]))[ap]m. Here group 1 is named time, group 2 is named hour, and group 3 is named minute.
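Named groups are read back with Matcher.group(String). In this sketch the pattern is written without spaces around the colon so that it matches "11:59am"; the class name NamedGroupDemo and the helper part are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    private static final Pattern TIME =
            Pattern.compile("(?<time>(?<hour>1?[0-9]):(?<minute>[0-5][0-9]))[ap]m");

    // Returns the text captured by the named group, or null if the input does not match.
    static String part(String input, String groupName) {
        Matcher m = TIME.matcher(input);
        return m.matches() ? m.group(groupName) : null;
    }

    public static void main(String[] args) {
        System.out.println(part("11:59am", "time"));   // 11:59
        System.out.println(part("11:59am", "hour"));   // 11
        System.out.println(part("11:59am", "minute")); // 59
    }
}
```

Named groups still keep their numeric indexes, so group("hour") and group(2) return the same text here.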
13. Quotation

Syntax      Explanation
\           Nothing, but quotes the following character
\Q          Nothing, but quotes all characters until \E
\E          Nothing, but ends quoting started by \Q
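Quoting matters whenever the text to match contains metacharacters. The sketch below uses a hypothetical QuoteDemo class to compare an unquoted pattern, \Q...\E, a single backslash escape, and Pattern.quote, which builds a literal pattern string for arbitrary input.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Matches the input against the literal text, with all metacharacters quoted.
    static boolean literalMatch(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Unquoted, '+' is a quantifier, so the literal text does not match.
        System.out.println(Pattern.matches("1+1=2", "1+1=2"));        // false
        System.out.println(Pattern.matches("1+1=2", "11=2"));         // true
        // Quoting the whole expression or just the '+' makes it literal.
        System.out.println(Pattern.matches("\\Q1+1=2\\E", "1+1=2"));  // true
        System.out.println(Pattern.matches("1\\+1=2", "1+1=2"));      // true
        System.out.println(literalMatch("a.b", "axb"));               // false
    }
}
```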

For detailed explanations, see the API documentation of java.util.regex.Pattern.

  14. Pattern and Matcher (both in java.util.regex)
  • If you only need to know whether a string matches a regular expression:
 if (Pattern.matches(regex, input)) {
 	// do something...
 }
  • If you need to work with the matched text:
 Pattern pattern = Pattern.compile(regex);
 Matcher matcher = pattern.matcher(input);
 if (matcher.matches()) {
 	// do something with matcher
 }

The Matcher class offers many more methods, such as find(), group(), start(), and end().
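As one example of those methods, matches() requires the whole input to match, while find() scans for the next matching substring. The sketch below, with the hypothetical class MatcherFindDemo, collects every match with a find() loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherFindDemo {
    // Collects every non-overlapping match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "a1bb22ccc333")); // [1, 22, 333]
        Matcher m = Pattern.compile("\\d+").matcher("a1");
        System.out.println(m.matches()); // false: the whole input is not digits
        System.out.println(m.find());    // true: "1" occurs inside the input
    }
}
```

Compiling the Pattern once and reusing it is usually preferable when the same expression is applied to many inputs.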
