Regular Expressions

qq28754889

Published 2011-12-31 15:49:52


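Regular expressions are a powerful way to search and replace within a string. In JavaScript, regular expressions are built into the string methods search, match and replace.

A regular expression, or "regexp" for short, consists of a pattern and optional flags.

The simplest search with a regular expression is the same as a substring search. A regular expression object is written like a string, but delimited by slashes "/":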

regexp = /att/;

str = "Show me the pattern!";

alert( str.search(regexp) ); // 13

In this example, str.search returns the position of the regexp /att/ in the string Show me the pattern!
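As a quick check of how search reports its result, here is a minimal sketch. The wrapper name findPosition is made up for illustration; the built-in str.search returns the index of the first match, or -1 when nothing matches.

```javascript
// findPosition is a hypothetical wrapper, used only for illustration.
function findPosition(str, regexp) {
  // search returns the index of the first match, or -1 if there is none
  return str.search(regexp);
}

console.log(findPosition("Show me the pattern!", /att/)); // 13
console.log(findPosition("Show me the pattern!", /xyz/)); // -1
```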

In real life, regular expressions can get quite complex. For example, a regexp that searches for an email address:

regexp = /[a-z0-9!$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)\b/

str = "find-ka email@gmail.com in this text"

alert( str.match(regexp) )   // email@gmail.com

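In JavaScript, you can test a regular expression in the console by calling str.match directly:

alert( "Lala".match(/la/) )  // la

A substring search could find a digit by looking for each digit from 0 to 9 in a loop. Regular expressions handle this case properly.

Instead of an exact character, a regexp may contain a character class.

For example, an arbitrary digit is denoted by \d in a regexp. The example below matches a digit: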

showMatch( "I'm 5 years old", /\d/ )   // 5
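The showMatch helper used in these examples is not defined in this article. A minimal sketch, assuming it only needs to print and return the result of str.match (the original helper may behave differently):

```javascript
// Assumed stand-in for the article's showMatch helper.
function showMatch(str, regexp) {
  const result = str.match(regexp); // array of matches, or null
  console.log(result === null ? "no matches" : Array.from(result).join(", "));
  return result;
}

showMatch("I'm 5 years old", /\d/); // prints: 5
```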

\d

A digit: any character from 0 to 9.

\s

A whitespace character, such as a tab or a newline.

\w

A Latin letter, a digit, or an underscore '_'.

A regular expression may contain both regular symbols and character classes:

showMatch( "I'm the 1st one", /\dst/ )   // matches '1st'

showMatch( "I'm 1 year old", /\d\s\w\w\w\w/ )   // 1 year

\D

A non-digit, the inversion of \d

\S

A non-whitespace, the inversion of \s.

\W

A symbol which is neither from Latin alphabet, nor a digit, nor an underscore, the inversion of \w

In the example below, we look for the first non-word character:

showMatch( "I'm 1 year old", /\W/ )   // matches apostrophe '

A regexp may also contain non-printable string characters: \n, \t and others. These are of course just characters, not classes.
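To illustrate, here is a small sketch that looks for a tab and a newline as literal characters. The sample string is invented for this example.

```javascript
const text = "name\tvalue\nnext line";

// \t and \n in a regexp each match exactly one tab or newline character
const tabMatch = text.match(/\t/);
const newlineMatch = text.match(/\n/);

console.log(tabMatch.index);     // 4
console.log(newlineMatch.index); // 10
```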

Spaces are important

Usually, we don’t pay much attention to spaces. To the eye, there is not much difference between 1-5 and 1 - 5.

But in regular expressions, a space is just like any other symbol.

The regexp below doesn’t work, because it doesn’t include space symbols:

 
showMatch( "1 - 5", /\d-\d/ )  // no matches!

Let’s fix it. We could put space symbols in regexp or, better, include a generic space symbol:

 
showMatch( "1 - 5", /\d - \d/ )   // works
 
showMatch( "1 - 5", /\d\s-\s\d/ ) // also works
 
showMatch( "1-5", /\d - \d/ ) // fails! (no spaces in string)

The last match fails because the subject has no spaces. So don’t put extra spaces in regular expressions: every one of them is meaningful.

In a regular expression, the dot '.' denotes any character except a newline:

 
showMatch( "A char", /ch.r/ ) // "char"
 
showMatch( "A ch-r", /ch.r/ ) // "ch-r"
 
showMatch( "A ch r", /ch.r/ ) // "ch r", the space is also a char

Although the dot stands for any character, there must still be a character:

 
    
showMatch( "A chr", /ch.r/ ) // not found
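The three cases for the dot can be compared side by side. This is a sketch that calls str.match directly; the variable names are made up for the example.

```javascript
const withSpace = "A ch r".match(/ch.r/);    // the space counts as a character
const withNewline = "A ch\nr".match(/ch.r/); // a newline is the exception
const missing = "A chr".match(/ch.r/);       // nothing between "ch" and "r"

console.log(withSpace[0]); // "ch r"
console.log(withNewline);  // null
console.log(missing);      // null
```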

There are characters which have special use in regexps: [ \ ^ $ . | ? * + ( ).

They are special, because they are used to enhance regexp searching abilities. Don’t try to remember the list. You will find them easy to remember after we cover them.

To use a special character as a regular symbol, it must be escaped, that is, prepended with a backslash.

For example, suppose we need to find the dot '.'. In a regexp, it is a special symbol meaning any character except a newline.

So we need to escape it:

 
showMatch( "Chapter 5.1", /\d\.\d/ )  // 5.1

Without escaping, \d.\d would match 5+1 as well:

 
showMatch( "5+1 = 6", /\d.\d/ )  // 5+1

Round brackets are also special, so to find an opening bracket, use \(. The example below looks for a word character followed by an opening bracket:

showMatch( "function g()", /\w\(/ )  // g(

The slash '/' is not special in regexp syntax, but inside a slash-delimited /...pattern.../ literal it must also be escaped and written as '\/', so that the JavaScript parser knows you mean the character '/' and not the end of the regexp.

Here’s how it looks:

 
    
showMatch( "/", /\// )  // '/'
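A slightly longer sketch of the same idea, using an invented sample string with a slash between two pairs of digits:

```javascript
const note = "Due on 12/31";

// The "/" inside the /.../ literal is written as "\/"
const slashMatch = note.match(/\d\d\/\d\d/);

console.log(slashMatch[0]); // "12/31"
```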

A regular expression may have optional flags, which affect the search.

This article covers three JavaScript flags:

g

Find all matches.

i

Case-insensitive search.

m

Enable multiline mode.

A flag is appended after the pattern, like /.../g.

A regexp without the global flag returns only the first match:

alert( "123".match( /\d/ ))  // '1'

But if the global flag is used, all matches can be found:

 
alert( "123".match( /\d/g ))  // '1', '2', '3'

Multiple flags are possible. For example, find all matches and ignore the case:

 
    
alert( "Smile a smile".match( /SMILE/gi ))  // 'Smile', 'smile'
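The g flag also changes the shape of what str.match returns. A minimal sketch of the three possible outcomes, with variable names chosen for this example:

```javascript
const first = "123".match(/\d/);  // no g flag: first match plus details
const all = "123".match(/\d/g);   // g flag: every match, no details
const none = "abc".match(/\d/g);  // nothing found

console.log(first[0], first.index); // 1 0
console.log(all.join(", "));        // 1, 2, 3
console.log(none);                  // null
```

Because the result is null rather than an empty array when nothing matches, code that loops over the matches has to check for null first.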

Several characters or character classes may be grouped in square brackets [...] to search for any of them.

For instance, [eao] means any of characters ‘a’, ‘e’, or ‘o’. That’s a single char from the list.

 
showMatch( "The OGRE on green grass!", /gr[eao]/gi ) // "GRE", "gre", "gra"

Here, gr[eao] matches gre, not gree, because [eao] stands for only one char.

A time can be written as hour:minute or hour-minute, where both the hour and the minute have 2 digits.

Create a regular expression to find all times in: Breakfast at 09:00. Dinner at 21-30.

Solution

The regular expression: \d\d[-:]\d\d.

 
showMatch( "Breakfast at 09:00. Dinner at 21-30.", /\d\d[-:]\d\d/g)

Note that in the character set the hyphen '-' is not escaped, because it cannot denote a range in this position.

Flag g means global search instead of only first match.
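The solution can be wrapped in a small function. This is a sketch; the name findTimes is invented here, and the fallback to an empty array is an added convenience, since str.match returns null when nothing is found.

```javascript
// findTimes is a hypothetical helper name, not part of the task.
function findTimes(text) {
  // match returns null when nothing is found, so fall back to an empty array
  return text.match(/\d\d[-:]\d\d/g) || [];
}

const times = findTimes("Breakfast at 09:00. Dinner at 21-30.");
console.log(times.join(", ")); // 09:00, 21-30
```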

Square brackets can also contain character ranges. For example, [a-z] is a character from a to z, [0-5] matches a character from 0 to 5.

 
showMatch( "Exception 0xAF", /x[A-F]/g ) // matches "xA", not "xc"

The example above doesn’t find xc in Exception, because the range contains only uppercase characters and there is no i flag.

The regexp matches xA, and not xAF, because [A-F] is a single character from A to F.

Characters, classes and ranges can be put together.

The example below finds any character from the ranges a-f and A-F, the character x, or a digit:

showMatch( "look -> 0xAF", /[\dA-Fa-fx]/g ) // "0", "x", "A", "F"

Most character classes are actually short notations for ranges, for example:

\d is the same as [0-9],
\w is the same as [a-zA-Z0-9_],
\s is the same as [\t\n\v\f\r ] plus several Unicode space symbols.
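The first two equivalences are easy to check on a sample. A sketch with an invented test string:

```javascript
const sample = "alice_15@gmail.com, tab\there";

const shortDigits = sample.match(/\d/g);
const rangeDigits = sample.match(/[0-9]/g);
const shortWord = sample.match(/\w/g);
const rangeWord = sample.match(/[a-zA-Z0-9_]/g);

// Each class finds exactly the same characters as its range form
console.log(shortDigits.join("") === rangeDigits.join("")); // true
console.log(shortWord.join("") === rangeWord.join(""));     // true
```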

There are also negated ranges: [^...].

Square brackets starting with a caret, [^...], match any character except the given ones.

For example:

[^aeou] - any character except 'a', 'e', 'o', 'u'
[^0-9] - any non-digit, same as \D
[^\s] - any non-space, same as \S

Just like an ordinary range, a negated range may contain multiple characters and ranges.
The example below looks for non-letters, non-digits, non-spaces:

showMatch( "alice15@gmail.com", /[^\d\sA-Z]/gi ) // "@", "."

Does the pattern k[^s] match the text sock?

Solution

The regexp looks for a character "k" followed by a character which can be anything except "s".

But in sock there is no character after "k", hence no match.

A character set [...] must always match a character, no matter if it is inverted or not.
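The answer can be confirmed directly. A short sketch; the two extra test strings are added here for contrast:

```javascript
const endOfString = "sock".match(/k[^s]/);   // nothing follows "k"
const followedByS = "socks".match(/k[^s]/);  // "k" is followed by "s"
const followedByOther = "sock!".match(/k[^s]/);

console.log(endOfString);        // null
console.log(followedByS);        // null
console.log(followedByOther[0]); // "k!"
```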

Most special characters can be used in square brackets without escaping.

In square brackets you only need to escape the closing square bracket ']' and the backslash '\'.

Other special characters are escaped only if they may have a special meaning:

The hyphen '-' must be escaped only if it is in between other symbols. If it is first or last, it cannot denote a range and hence can come unescaped: [-...].
The caret symbol '^' must be escaped only if it is the first symbol: [\^..].
All other characters, including the dot '.', plus '+', brackets '( )', the opening square bracket '[' etc. can appear unescaped.
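The position rules for the hyphen and the caret can be seen in a short sketch. The sample strings are invented for this example.

```javascript
const range = "a-b-c".match(/[a-c]/g);   // hyphen between symbols: range a..c
const literal = "a-b-c".match(/[-ac]/g); // hyphen first: a literal "-"
const negated = "2^3".match(/[^2]/g);    // caret first: anything except "2"
const caret = "2^3".match(/[2^]/g);      // caret not first: a literal "^"

console.log(range.join(" "));   // a b c
console.log(literal.join(" ")); // a - - c
console.log(negated.join(" ")); // ^ 3
console.log(caret.join(" "));   // 2 ^
```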

If you look at most regular expressions in the code around you, special characters are usually escaped no matter where they are in the regexp.
But inside square brackets the escaping can often be dropped, which makes the pattern more readable.

For example, the regexp [-().^] literally means any of characters from the list -().^. Regexp special symbols do not have any special meaning here.

 
    
var re = /[-().^]/g

showMatch( "f(g)-^1", re ) // matches (, ), -, ^

So, technically it is possible to save on extra backslashes in square brackets. But if you forget and put them in, nothing breaks.

var re = /[\-\(\)\.\^]/g

showMatch( "f(g)-^1", re ) // matches same (, ), -, ^
