Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec and test methods of RegExp, and with the match, replace, search, and split methods of String. This chapter describes JavaScript regular expressions.
Creating a Regular Expression
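You can construct a regular expression in one of two ways:
- Using a regular expression literal, as follows:
var re = /ab+c/;
Regular expression literals are compiled when the script is loaded. When the regular expression remains constant, using a literal can improve performance.
- Calling the constructor function of the RegExp object, as follows:
var re = new RegExp("ab+c");
Using the constructor function compiles the regular expression at run time. Use the constructor when you know the pattern will change, or when you do not know the pattern in advance and obtain it from another source, such as user input.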
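The two construction styles can be compared side by side. The following is a minimal sketch; `userInput` is a hypothetical variable standing in for text obtained at run time, and the note about doubling backslashes in a string is general JavaScript behavior rather than something stated above.

```javascript
// A literal suits a pattern that never changes.
const literalRe = /ab+c/;

// The constructor suits a pattern that is only known at run time.
// `userInput` is a hypothetical value, e.g. text typed by a user.
const userInput = "ab+c";
const constructedRe = new RegExp(userInput);

console.log(literalRe.test("xabbcx"));     // true
console.log(constructedRe.test("xabbcx")); // true

// Both objects hold the same pattern text.
console.log(literalRe.source === constructedRe.source); // true

// Inside a string, a backslash must itself be escaped,
// so /\d+/ becomes "\\d+" when passed to the constructor.
const digitsLiteral = /\d+/;
const digitsConstructed = new RegExp("\\d+");
console.log(digitsLiteral.source === digitsConstructed.source); // true
```

Note that text passed to the constructor is interpreted as a pattern, so characters such as `+` or `*` in user input keep their special meaning.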
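Writing a Regular Expression Pattern
A regular expression pattern is composed of simple characters, such as /abc/, or a combination of simple and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/. The last example includes parentheses, which are used as a memory device. The part of the string matched by this portion of the pattern is remembered for later use, as described in Using Parenthesized Substring Matches.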
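As a rough illustration of the "memory device", the sketch below runs the /Chapter (\d+)\.\d*/ pattern with exec and reads the remembered substring from the result array. The sample strings are invented for this example.

```javascript
const chapterRe = /Chapter (\d+)\.\d*/;

// exec returns an array: element 0 is the whole match,
// element 1 is the text remembered by the first pair of parentheses.
const chapterMatch = chapterRe.exec("Open Chapter 4.3, paragraph 6");
console.log(chapterMatch[0]); // "Chapter 4.3"
console.log(chapterMatch[1]); // "4"

// The pattern requires a literal dot after the chapter number,
// so this string does not match and exec returns null.
const noMatch = chapterRe.exec("Chapter 3 and 4");
console.log(noMatch); // null
```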
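Using Simple Patterns
Simple patterns are constructed of characters for which you want to find a direct match. For example, the pattern /abc/ matches character combinations in strings only when exactly the characters 'abc' occur together and in that order. Such a match succeeds in the strings "Hi, do you know your abc's?" and "The latest airplane designs evolved from slabcraft." In both cases the match is with the substring 'abc'. There is no match in the string "Grab crab" because it does not contain the substring 'abc'.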
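The three sample strings above can be checked directly with test. This is a small sketch; the variable names are chosen only for this example.

```javascript
const abcRe = /abc/;

const samples = [
  "Hi, do you know your abc's?",
  "The latest airplane designs evolved from slabcraft.",
  "Grab crab",
];

// test returns true when the pattern matches anywhere in the string.
const sampleResults = samples.map((text) => abcRe.test(text));

samples.forEach((text, i) => {
  console.log(JSON.stringify(text), "->", sampleResults[i]);
});
// The first two strings contain 'abc'; "Grab crab" contains 'ab c',
// which is not the same sequence, so the last result is false.
```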
Using Special Characters
When the search for a match requires something more than a direct match, such as finding one or more b's, or finding white space, the pattern includes special characters. For example, the pattern /ab*c/ matches any character combination in which a single 'a' is followed by zero or more 'b's (* means 0 or more occurrences of the preceding item) and then immediately followed by 'c'. In the string "cbbabbbbcdebc", the pattern matches the substring 'abbbbc'.
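A short sketch of the /ab*c/ example follows. The extra strings "ac" and "bc" are added here only to show the "zero or more" case and the requirement for a leading 'a'.

```javascript
const starRe = /ab*c/;

const starMatch = starRe.exec("cbbabbbbcdebc");
console.log(starMatch[0]);    // "abbbbc"
console.log(starMatch.index); // 3 (position of the 'a')

// Zero 'b's is allowed, so "ac" matches.
console.log(starRe.test("ac")); // true

// Without an 'a' there is nothing for the pattern to start on.
console.log(starRe.test("bc")); // false
```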
The following table provides a complete list and description of the special characters that can be used in regular expressions.
Character | Meaning |
---|---|
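\ | Either of the following: before a character that is normally treated literally, indicates that the next character is special and is not to be interpreted literally (for example, /b/ matches the character 'b', whereas /\b/ matches a word boundary); before a special character, indicates that the next character is not special and is to be interpreted literally (for example, /a*/ matches zero or more a's, whereas /a\*/ matches 'a*'). |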
^ | Matches beginning of input. If the multiline flag is set to true, also matches immediately after a line break character. For example, /^A/ does not match the 'A' in "an A", but does match the 'A' in "An E". This character has a different meaning when it appears as the first character in a character set pattern. For example, /[^a-z\s]/ matches the '3' in "my 3 sisters". |
$ | Matches end of input. If the multiline flag is set to true, also matches immediately before a line break character. For example, /t$/ does not match the 't' in "eater", but does match it in "eat". |
* | Matches the preceding item 0 or more times. For example, /bo*/ matches 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", but nothing in "A goat grunted". |
+ | Matches the preceding item 1 or more times. Equivalent to {1,}. For example, /a+/ matches the 'a' in "candy" and all the a's in "caaaaaaandy". |
? | Matches the preceding item 0 or 1 time. Equivalent to {0,1}. For example, /e?le?/ matches the 'el' in "angel", the 'le' in "angle", and the 'l' in "oslo". If used immediately after any of the quantifiers *, +, ?, or {}, makes the quantifier non-greedy (matching the fewest possible characters), as opposed to the default, which is greedy (matching as many characters as possible). For example, applying /\d+/ to "123abc" matches "123", but applying /\d+?/ to the same string matches only the "1". Also used in lookahead assertions, described under x(?=y) and x(?!y) in this table. |
. | (The decimal point) matches any single character except the newline character. For example, /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'. |
(x) | Matches 'x' and remembers the match. These are called capturing parentheses. For example, /(foo)/ matches and remembers 'foo' in "foo bar". The matched substring can be recalled from the resulting array's elements [1], ..., [n]. |
(?:x) | Matches 'x' but does not remember the match. These are called non-capturing parentheses. The matched substring cannot be recalled from the resulting array's elements [1], ..., [n]. |
x(?=y) | Matches 'x' only if 'x' is followed by 'y'. This is called a lookahead. For example, /Jack(?=Sprat)/ matches 'Jack' only if it is followed by 'Sprat'. /Jack(?=Sprat|Frost)/ matches 'Jack' only if it is followed by 'Sprat' or 'Frost'. However, neither 'Sprat' nor 'Frost' is part of the match results. |
x(?!y) | Matches 'x' only if 'x' is not followed by 'y'. This is called a negated lookahead. For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. The expression /\d+(?!\.)/.exec("3.141") matches '141' but not '3.141'. |
x|y | Matches either 'x' or 'y'. For example, /green|red/ matches 'green' in "green apple" and 'red' in "red apple". |
{n} | Where n is a positive integer. Matches exactly n occurrences of the preceding item. For example, /a{2}/ does not match the 'a' in "candy", but it matches all of the a's in "caandy" and the first two a's in "caaandy". |
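{n,m} | Where n and m are non-negative integers and n <= m. Matches at least n and at most m occurrences of the preceding item. For example, /a{1,3}/ matches nothing in "cndy", the 'a' in "candy", the first two a's in "caandy", and the first three a's in "caaaaaaandy". |
[xyz] | A character set. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen. Special characters (such as the dot (.) and asterisk (*)) are not special inside a character set, so they do not need to be escaped. For example, [abcd] is the same as [a-d]. They match the 'b' in "brisket" and the 'c' in "city". |
[^xyz] | A negated or complemented character set. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen. Everything that works in the normal character set also works here. For example, [^abc] is the same as [^a-c]. They initially match the 'r' in "brisket" and the 'h' in "chop". |
[\b] | Matches a backspace (U+0008). (Not to be confused with \b.) |
\b | Matches a word boundary. A word boundary matches the position where a word character is not followed or preceded by another word character. Note that a matched word boundary is not included in the match. In other words, the length of a matched word boundary is zero. (Not to be confused with [\b].) Examples: /\bm/ matches the 'm' in "moon"; /oo\b/ does not match the 'oo' in "moon", because 'oo' is followed by 'n', which is a word character; /oon\b/ matches the 'oon' in "moon", because 'oon' is the end of the string and thus not followed by a word character. |
\B | Matches a non-word boundary. This matches a position where the previous and next characters are of the same type: either both must be word characters, or both must be non-word characters. The beginning and end of a string are considered non-words. For example, /\B../ matches 'oo' in "noonday", and /y\B./ matches 'ye' in "possibly yesterday". |
\cX | Where X is a character ranging from A to Z. Matches a control character in a string. For example, /\cM/ matches control-M (U+000D) in a string. |
\d | Matches a digit character. Equivalent to [0-9]. For example, /\d/ or /[0-9]/ matches '2' in "B2 is the suite number". |
\D | Matches any non-digit character. Equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ matches 'B' in "B2 is the suite number". |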
\f | Matches a form feed (U+000C). |
\n | Matches a line feed (U+000A). |
\r | Matches a carriage return (U+000D). |
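\s | Matches a single white space character, including space, tab, form feed, and line feed. Equivalent to [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. For example, /\s\w*/ matches ' bar' in "foo bar". |
\S | Matches a single character other than white space. Equivalent to [^ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. For example, /\S\w*/ matches 'foo' in "foo bar". |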
\t | Matches a tab (U+0009). |
\v | Matches a vertical tab (U+000B). |
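\w | Matches any alphanumeric character including the underscore. Equivalent to [A-Za-z0-9_]. For example, /\w/ matches 'a' in "apple", '5' in "$5.28", and '3' in "3D". |
\W | Matches any non-word character. Equivalent to [^A-Za-z0-9_]. For example, /\W/ or /[^A-Za-z0-9_]/ matches '%' in "50%". |
\n | Where n is a positive integer. A back reference to the last substring matching the nth parenthesized group in the regular expression (counting left parentheses). For example, /apple(,)\sorange\1/ matches 'apple, orange,' in "apple, orange, cherry, peach". |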
\0 | Matches a NULL (U+0000) character. Do not follow this with another digit, because \0<digits> is an octal escape sequence. |
\xhh | Matches the character with the code hh (two hexadecimal digits) |
\uhhhh | Matches the character with the code hhhh (four hexadecimal digits). |
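Several entries in the table are easier to follow when run. The sketch below exercises a handful of them (non-greedy quantifiers, lookahead, negated lookahead, capturing versus non-capturing parentheses, back references, and word boundaries). The variable names and some of the input strings are chosen for this example only.

```javascript
// Greedy versus non-greedy: '?' after a quantifier matches as little as possible.
const greedy = "123abc".match(/\d+/)[0];  // "123"
const lazy = "123abc".match(/\d+?/)[0];   // "1"

// Lookahead: 'Sprat' must follow 'Jack' but is not part of the match.
const lookahead = /Jack(?=Sprat|Frost)/.exec("JackSprat")[0]; // "Jack"
const lookaheadMiss = /Jack(?=Sprat|Frost)/.exec("JackHorner"); // null

// Negated lookahead: digits not followed by a decimal point.
const notBeforeDot = /\d+(?!\.)/.exec("3.141")[0]; // "141"

// Capturing parentheses add an element to the result array;
// non-capturing parentheses do not.
const capturing = /(foo) (bar)/.exec("foo bar");      // ["foo bar", "foo", "bar"]
const nonCapturing = /(?:foo) (bar)/.exec("foo bar"); // ["foo bar", "bar"]

// Back reference: \1 must repeat what the first group matched (a comma here).
const backRef = /apple(,)\sorange\1/.exec("apple, orange, cherry, peach")[0];
// "apple, orange,"

// Word boundaries are zero-width positions.
const startsWord = /\bm/.test("moon");  // true
const ooAtEnd = /oo\b/.test("moon");    // false, 'oo' is followed by 'n'
const oonAtEnd = /oon\b/.test("moon");  // true

console.log({ greedy, lazy, lookahead, lookaheadMiss, notBeforeDot, backRef });
console.log(capturing.length, nonCapturing.length); // 3 2
console.log(startsWord, ooAtEnd, oonAtEnd);         // true false true
```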