Javascript Regular Expressions

10.1.2 Character Classes

Individual literal characters can be combined into character classes by placing them within square brackets. A character class matches any one character that is contained within it. Thus, the regular expression /[abc]/ matches any one of the letters a, b, or c. Negated character classes can also be defined -- these match any character except those contained within the brackets. A negated character class is specified by placing a caret (^ ) as the first character inside the left bracket. The regexp /[^abc]/ matches any one character other than a, b, or c. Character classes can use a hyphen to indicate a range of characters. To match any one lowercase character from the Latin alphabet, use /[a-z]/ , and to match any letter or digit from the Latin alphabet, use /[a-zA-Z0-9]/ .

Because certain character classes are commonly used, the JavaScript regular expression syntax includes special characters and escape sequences to represent these common classes. For example, \s matches the space character, the tab character, and any other Unicode whitespace character, and \S matches any character that is not Unicode whitespace. Table 10-2 lists these characters and summarizes character class syntax. (Note that several of these character class escape sequences match only ASCII characters and have not been extended to work with Unicode characters. You can explicitly define your own Unicode character classes; for example, /[\u0400-04FF]/ matches any one Cyrillic character.)

Table 10-2. Regular expression character classes

Character

Matches

[...]

Any one character between the brackets.

[^...]

Any one character not between the brackets.

.

Any character except newline or another Unicode line terminator.

\w

Any ASCII word character. Equivalent to [a-zA-Z0-9_] .

\W

Any character that is not an ASCII word character. Equivalent to [^a-zA-Z0-9_] .

\s

Any Unicode whitespace character.

\S

Any character that is not Unicode whitespace. Note that \w and \S are not the same thing.

\d

Any ASCII digit. Equivalent to [0-9] .

\D

Any character other than an ASCII digit. Equivalent to [^0-9] .

[\b]

A literal backspace (special case).

Note that the special character class escapes can be used within square brackets. \s matches any whitespace character and \d matches any digit, so /[\s\d]/ matches any one whitespace character or digit. Note that there is one special case. As we'll see later, the \b escape has a special meaning. When used within a character class, however, it represents the backspace character. Thus, to represent a backspace character literally in a regular expression, use the character class with one element: /[\b]/ .

10.1.3 Repetition

With the regular expression syntax we have learned so far, we can describe a two-digit number as /\d\d/ and a four-digit number as /\d\d\d\d/ . But we don't have any way to describe, for example, a number that can have any number of digits or a string of three letters followed by an optional digit. These more complex patterns use regular expression syntax that specifies how many times an element of a regular expression may be repeated.

The characters that specify repetition always follow the pattern to which they are being applied. Because certain types of repetition are quite commonly used, there are special characters to represent these cases. For example, + matches one or more occurrences of the previous pattern. Table 10-3 summarizes the repetition syntax. The following lines show some examples:

/\d{2,4}/     // Match between two and four digits

/\w{3}\d?/    // Match exactly three word characters and an optional digit

/\s+java\s+/  // Match "java" with one or more spaces before and after

/[^"]*/       // Match zero or more non-quote characters

Table 10-3. Regular expression repetition characters

Character

Meaning

{ n , m }

Match the previous item at least n times but no more than m times.

{ n ,}

Match the previous item n or more times.

{ n }

Match exactly n occurrences of the previous item.

?

Match zero or one occurrences of the previous item. That is, the previous item is optional. Equivalent to {0,1} .

+

Match one or more occurrences of the previous item. Equivalent to {1,} .

*

Match zero or more occurrences of the previous item. Equivalent to {0,} .

Be careful when using the * and ? repetition characters. Since these characters may match zero instances of whatever precedes them, they are allowed to match nothing. For example, the regular expression /a*/ actually matches the string "bbbb", because the string contains zero occurrences of the letter a!

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值