# Regular Expression Syntax Explained (Part 2)

\A and \Z are just like "^" and "$", except that they will not match multiple times when the modifier /m is used, whereas "^" and "$" will match at every internal line separator.
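
A minimal sketch of this difference, using Python's built-in re module as a stand-in for TRegExpr (an assumption made purely for illustration). Here re.MULTILINE plays the role of the /m modifier, and the sample text is made up. Python only treats \n as a line separator for "^" and "$", so this shows the idea rather than TRegExpr's exact behavior.

```python
import re

text = "one\ntwo\nthree"  # made-up sample with two internal line separators

# With multiline mode on, "^" and "$" match at every line boundary ...
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

# ... while \A and \Z stay anchored to the very start and end of the input.
input_start = re.findall(r"\A\w+", text, re.MULTILINE)
input_end = re.findall(r"\w+\Z", text, re.MULTILINE)

print(line_starts)  # ['one', 'two', 'three']
print(line_ends)    # ['one', 'two', 'three']
print(input_start)  # ['one']
print(input_end)    # ['three']
```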

The "." metacharacter matches any character by default, but if you switch off the modifier /s, "." will not match embedded line separators.

TRegExpr handles line separators as recommended at www.unicode.org ( http://www.unicode.org/unicode/reports/tr18/ ):

"^" matches at the beginning of the input string and, if modifier /m is on, also immediately after any occurrence of \x0D\x0A, \x0A or \x0D (if you are using the Unicode version of TRegExpr, then also \x2028, \x2029, \x0B, \x0C or \x85). Note that there is no empty line within the sequence \x0D\x0A.

"$" matches at the end of the input string and, if modifier /m is on, also immediately before any occurrence of \x0D\x0A, \x0A or \x0D (if you are using the Unicode version of TRegExpr, then also \x2028, \x2029, \x0B, \x0C or \x85). Note that there is no empty line within the sequence \x0D\x0A.

"." matches any character, but if you switch off modifier /s, "." does not match \x0D\x0A, \x0A or \x0D (if you are using the Unicode version of TRegExpr, then also \x2028, \x2029, \x0B, \x0C or \x85).

Note that "^.*$" (an empty line pattern) does not match the empty string within the sequence \x0D\x0A, but does match the empty string within the sequence \x0A\x0D.
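
The effect of the /s modifier on "." can be sketched with Python's re module, used here only as an illustrative stand-in for TRegExpr. The assumption is that re.DOTALL corresponds roughly to /s being switched on. Python's "." excludes only \n when DOTALL is off, which is narrower than the TRegExpr separator list above.

```python
import re

text = "foo\nbar"  # made-up sample containing a single \n

# Without DOTALL (roughly: /s switched off), "." stops at the line separator.
without_s = re.search(r"foo.bar", text)

# With DOTALL (roughly: /s switched on), "." also matches the line separator.
with_s = re.search(r"foo.bar", text, re.DOTALL)

print(without_s)               # None
print(with_s.group() == text)  # True
```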

Multiline processing can be easily tuned for your own purposes with the TRegExpr properties LineSeparators and LinePairedSeparator. You can use only Unix-style separators (\n), only DOS/Windows-style separators (\r\n), a mix of both (as described above and used by default), or define your own line separators.

Metacharacters - predefined classes

\w     an alphanumeric character (including "_")
\W     a non-alphanumeric character
\d     a numeric character
\D     a non-numeric character
\s     any space (same as [ \t\n\r\f])
\S     a non-space character

You may use \w, \d and \s within custom character classes.

Examples:
foob\dr     matches strings like 'foob1r', 'foob6r' and so on, but not 'foobar', 'foobbr' and so on
foob[\w\s]r matches strings like 'foobar', 'foob r', 'foobbr', 'foob1r' and so on, but not 'foob=r' and so on (\w includes digits)
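
These two patterns can be checked with a short sketch. It uses Python's re module rather than TRegExpr (an assumption for illustration; both define \d, \w and \s similarly for ASCII input). Because \w covers digits as well as letters and "_", 'foob1r' matches the second pattern.

```python
import re

digit_pattern = re.compile(r"foob\dr")
class_pattern = re.compile(r"foob[\w\s]r")

def matches(pattern, candidate):
    """True when the whole candidate string matches the pattern."""
    return pattern.fullmatch(candidate) is not None

digit_results = {s: matches(digit_pattern, s)
                 for s in ["foob1r", "foob6r", "foobar", "foobbr"]}
class_results = {s: matches(class_pattern, s)
                 for s in ["foobar", "foob r", "foob1r", "foob=r"]}

print(digit_results)
# {'foob1r': True, 'foob6r': True, 'foobar': False, 'foobbr': False}
print(class_results)
# {'foobar': True, 'foob r': True, 'foob1r': True, 'foob=r': False}
```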

TRegExpr uses the properties SpaceChars and WordChars to define the character classes \w, \W, \s and \S, so you can easily redefine them.

Metacharacters - word boundaries

\b     matches a word boundary
\B     matches a non-(word boundary)

A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.
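
A small sketch of this definition, using Python's re module as a stand-in for TRegExpr (an assumption for illustration). The sample text is made up. The apostrophe in "cat's" counts as \W, so a boundary sits right after that "cat".

```python
import re

text = "cat concat cat's"  # made-up sample

# \bcat\b needs a boundary on both sides, so the "cat" inside "concat" is skipped.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \Bcat matches "cat" only where it is preceded by another word character.
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]

print(whole_words)  # [0, 11]
print(inside_word)  # [7]
```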

Metacharacters - iterators

Any item of a regular expression may be followed by another type of metacharacter: an iterator. Using these metacharacters, you can specify the number of occurrences of the preceding character, metacharacter or subexpression.

*      zero or more ("greedy"), similar to {0,}
+      one or more ("greedy"), similar to {1,}
?      zero or one ("greedy"), similar to {0,1}
{n}    exactly n times ("greedy")
{n,}   at least n times ("greedy")
{n,m}  at least n but not more than m times ("greedy")
*?     zero or more ("non-greedy"), similar to {0,}?
+?     one or more ("non-greedy"), similar to {1,}?
??     zero or one ("non-greedy"), similar to {0,1}?
{n}?   exactly n times ("non-greedy")
{n,}?  at least n times ("non-greedy")
{n,m}? at least n but not more than m times ("non-greedy")

So, digits in curly brackets of the form {n,m} specify the minimum number of times to match the item (n) and the maximum (m). The form {n} is equivalent to {n,n} and matches exactly n times. The form {n,} matches n or more times. There is no limit to the size of n or m, but large numbers will use more memory and slow down regular expression execution.

If a curly bracket occurs in any other context, it is treated as a regular character.

Examples:
foob.*r     matches strings like 'foobar', 'foobalkjdflkj9r' and 'foobr'
foob.+r     matches strings like 'foobar', 'foobalkjdflkj9r' but not 'foobr'
foob.?r     matches strings like 'foobar', 'foobbr' and 'foobr' but not 'foobalkj9r'
fooba{2}r   matches the string 'foobaar'
fooba{2,}r  matches strings like 'foobaar', 'foobaaar', 'foobaaaar' etc.
fooba{2,3}r matches strings like 'foobaar' or 'foobaaar' but not 'foobaaaar'
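
The iterator examples can be verified with a short sketch. Python's re module is used as a stand-in for TRegExpr (an assumption for illustration; these iterators behave the same way in both for such simple patterns). Each case is checked against the whole candidate string.

```python
import re

# (pattern, candidate) pairs taken from the examples in this section
cases = [
    (r"foob.*r", "foobr"),
    (r"foob.+r", "foobr"),
    (r"foob.?r", "foobbr"),
    (r"foob.?r", "foobalkj9r"),
    (r"fooba{2}r", "foobaar"),
    (r"fooba{2,}r", "foobaaaar"),
    (r"fooba{2,3}r", "foobaaar"),
    (r"fooba{2,3}r", "foobaaaar"),
]

# True when the pattern matches the whole candidate string
results = [re.fullmatch(pattern, candidate) is not None
           for pattern, candidate in cases]

for (pattern, candidate), ok in zip(cases, results):
    print(f"{pattern:<12} {candidate:<12} {ok}")

print(results)
# [True, False, True, False, True, True, True, False]
```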

A little explanation about "greediness": "greedy" takes as many as possible, "non-greedy" takes as few as possible. For example, applied to the string 'abbbbc', 'b+' returns 'bbbb', 'b+?' returns 'b', 'b{2,3}' returns 'bbb' and 'b{2,3}?' returns 'bb'. Starting from the first 'b', 'b*' likewise takes 'bbbb', while 'b*?' takes the empty string.
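
A sketch of greedy versus non-greedy matching on 'abbbbc', using Python's re module as a stand-in for TRegExpr (an assumption for illustration). The helper names are hypothetical. The 'b*' patterns are tried at index 1, where the run of b's starts, because a leftmost search would otherwise succeed with an empty match before the 'a'.

```python
import re

s = "abbbbc"

def first(pattern):
    """Leftmost match found anywhere in s."""
    return re.search(pattern, s).group()

def at_first_b(pattern):
    """Match attempted exactly at index 1, where the run of b's starts."""
    return re.compile(pattern).match(s, 1).group()

greedy = [first(r"b+"), first(r"b{2,3}"), at_first_b(r"b*")]
lazy = [first(r"b+?"), first(r"b{2,3}?"), at_first_b(r"b*?")]

print(greedy)  # ['bbbb', 'bbb', 'bbbb']
print(lazy)    # ['b', 'bb', '']
```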

You can switch all iterators into "non-greedy" mode (see the modifier /g).

Metacharacters – alternatives

You can specify a series of alternatives for a pattern using "|" to separate them, so that fee|fie|foe will match any of "fee", "fie", or "foe" in the target string (as would f(e|i|o)e). The first alternative includes everything from the last pattern delimiter ("(", "[", or the beginning of the pattern) up to the first "|", and the last alternative contains everything from the last "|" to the next pattern delimiter. For this reason, it is common practice to include alternatives in parentheses, to minimize confusion about where they start and end.

Alternatives are tried from left to right, so the first alternative found for which the entire expression matches is the one that is chosen. This means that alternatives are not necessarily greedy. For example, when matching foo|foot against "barefoot", only the "foo" part will match, as that is the first alternative tried, and it successfully matches the target string. (This might not seem important, but it is important when you are capturing matched text using parentheses.)

Also remember that "|" is interpreted as a literal within square brackets, so if you write [fee|fie|foe] you are really only matching [feio|].

Examples:
foo(bar|foo)  matches the strings 'foobar' or 'foofoo'.
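
The left-to-right rule and the character class pitfall can be sketched as follows. Python's re module is a stand-in for TRegExpr (an assumption for illustration), and 'foobaz' and "a|f" are made-up test inputs.

```python
import re

# The first alternative that lets the overall match succeed wins.
first_alt = re.search(r"foo|foot", "barefoot").group()
longer_first = re.search(r"foot|foo", "barefoot").group()

# Inside square brackets "|" is a literal: this is just the set f, e, i, o, |
char_class = re.findall(r"[fee|fie|foe]", "a|f")

# Parentheses make the scope of the alternatives explicit.
grouped = [re.fullmatch(r"foo(bar|foo)", s) is not None
           for s in ["foobar", "foofoo", "foobaz"]]

print(first_alt)     # foo
print(longer_first)  # foot
print(char_class)    # ['|', 'f']
print(grouped)       # [True, True, False]
```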

Metacharacters – subexpressions

The bracketing construct ( ... ) may also be used to define regular expression subexpressions. After parsing, you can find subexpression positions, lengths and actual values in the MatchPos, MatchLen and Match properties of TRegExpr, and substitute them into template strings with TRegExpr.Substitute.

Subexpressions are numbered based on the left-to-right order of their opening parentheses.
The first subexpression has number '1' (the whole regular expression match has number '0'; you can substitute it in TRegExpr.Substitute as '$0' or '$&').

Examples:
(foobar){8,10}  matches strings which contain 8, 9 or 10 instances of 'foobar'
foob([0-9]|a+)r matches 'foob0r', 'foob1r', 'foobar', 'foobaar', 'foobaaar' etc.
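
Subexpression numbering can be sketched with Python's re module as a loose analogue of TRegExpr (an assumption for illustration). Here group(), start() and expand() stand in for Match, MatchPos/MatchLen and Substitute. Python positions are 0-based and its templates use \1 and \g<0> rather than $1 and $0, so the correspondence is only approximate. The pattern and input are made up.

```python
import re

m = re.search(r"(\w+)@((\w+)\.com)", "mail bob@example.com now")

# Groups are numbered by the position of their opening parenthesis.
whole = m.group(0)   # whole match, number 0
user = m.group(1)    # first "("
host = m.group(2)    # second "("
name = m.group(3)    # third "(", nested inside the second

# Position (0-based in Python) and length of subexpression 1.
user_pos = m.start(1)
user_len = m.end(1) - m.start(1)

# Template substitution using subexpression numbers.
swapped = m.expand(r"\3: \1")
wrapped = m.expand(r"[\g<0>]")

print(whole, user, host, name)  # bob@example.com bob example.com example
print(user_pos, user_len)       # 5 3
print(swapped)                  # example: bob
print(wrapped)                  # [bob@example.com]
```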

Metacharacters - backreferences

Metacharacters \1 through \9 are interpreted as backreferences. \<n> matches the previously matched subexpression #<n>.

Examples:
(.)\1+         matches 'aaaa' and 'cc'
(.+)\1+        also matches 'abab' and '123123'
(['"]?)(\d+)\1 matches "13" (in double quotes), '4' (in single quotes), 77 (without quotes) etc.
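
The backreference examples can be checked with a short sketch, again using Python's re module as a stand-in for TRegExpr (an assumption for illustration). The last check uses a made-up input with mismatched quotes to show that \1 must repeat exactly what the first subexpression captured.

```python
import re

repeated_char = re.fullmatch(r"(.)\1+", "aaaa") is not None
repeated_block = re.fullmatch(r"(.+)\1+", "123123") is not None

# Optional opening quote, digits, then the same quote (or nothing) again.
quoted = re.compile(r"""(['"]?)(\d+)\1""")
numbers = [quoted.fullmatch(s).group(2) for s in ['"13"', "'4'", "77"]]

# Opening double quote but closing single quote: \1 cannot match.
mismatched = quoted.fullmatch("\"13'") is None

print(repeated_char)   # True
print(repeated_block)  # True
print(numbers)         # ['13', '4', '77']
print(mismatched)      # True
```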
