Regular expressions

Regular expressions are a powerful means of searching and replacing in strings. In JavaScript, regular expressions are built into the string methods search, match and replace.

A regular expression, or "regexp" as it is often called, consists of a pattern and optional flags.

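The simplest search with a regular expression is the same as a substring search. A regular expression object is written like a string, but delimited by slashes "/":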


regexp = /att/;

str = "Show me the pattern!";

alert( str.search(regexp) ); // 13
In this example, str.search returns the position of the regexp att within the string Show me the pattern!.
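
As a small sketch of the point above, the snippet below compares search with the plain substring method indexOf on the same text, and shows what search returns when nothing is found. It uses console.log rather than alert so that it also runs outside a browser; the variable names are illustrative only.

```javascript
const str = "Show me the pattern!";

// For a plain pattern, search behaves like a substring search.
const viaRegexp = str.search(/att/);      // 13
const viaSubstring = str.indexOf("att");  // 13

// When there is no match, search returns -1, just like indexOf.
const notFound = str.search(/xyz/);       // -1

console.log(viaRegexp, viaSubstring, notFound);
```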

In real life, regular expressions can get quite complicated. For example, here is a regexp that searches for an email address:


 

regexp = /[a-z0-9!$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)\b/

str = "find-ka email@gmail.com in this text"

alert( str.match(regexp) )   // email@gmail.com

In JavaScript you can test regular expressions in the console by calling the str.match method directly:

alert( "Lala".match(/la/) )   // la


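Plain substring search could find a digit only by looping over all the digits from 0 to 9. Regular expressions handle such cases directly.

A regular expression may contain not only exact characters, but also character classes.

For example, an arbitrary digit is denoted by \d in a regexp. The example below matches a digit: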


showMatch( "I'm 5 years old", /\d/ )   // 5
\d
A digit, any character from 0 to 9.
\s
A whitespace character, such as a tab or a newline.
\w
A Latin letter, a digit, or an underscore '_'.

A regular expression may contain both regular symbols and character classes:

showMatch( "I'm the 1st one", /\dst/ )   // matches '1st'
showMatch( "I'm 1 year old", /\d\s\w\w\w\w/ )   // 1 year
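
The examples call showMatch, which is never defined in this text. The following is only a minimal sketch of what such a helper might look like, assuming it collects the matched substrings into an array (empty when nothing matches). The real helper may behave differently, for example by showing the result with alert.

```javascript
// Hypothetical helper: the text does not show how showMatch is defined.
// Assumption: it returns an array of matched substrings, or [] if there are none.
function showMatch(str, regexp) {
  const result = str.match(regexp);
  if (result === null) return [];
  // With the g flag, match() returns every match; otherwise result[0] is the first match.
  const matches = regexp.global ? Array.from(result) : [result[0]];
  console.log(matches);
  return matches;
}

const digit = showMatch("I'm 5 years old", /\d/);              // ["5"]
const ordinal = showMatch("I'm the 1st one", /\dst/);          // ["1st"]
const phrase = showMatch("I'm 1 year old", /\d\s\w\w\w\w/);    // ["1 year"]
const nothing = showMatch("no digits here", /\d/);             // []
```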

\D
A non-digit, the inversion of  \d
\S
A non-whitespace, the inversion of  \s.
\W
A symbol which is neither from Latin alphabet, nor a digit, nor an underscore, the inversion of  \w

In the example below, we look for the first non-word character:

showMatch( "I'm 1 year old", /\W/ )   // matches apostrophe '

A regexp may also contain non-printable string characters: \n, \t and others. These are of course just characters, not classes.
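
To illustrate the difference between a single non-printable character and a class, here is a short sketch. The sample string and variable names are made up for the example.

```javascript
const text = "first line\nsecond\tcolumn";

// \n and \t each match exactly one specific character.
const newlineIndex = text.search(/\n/); // 10
const tabIndex = text.search(/\t/);     // 17

// \s is a class: it matches the space, the newline and the tab alike.
const whitespaceCount = text.match(/\s/g).length; // 3

console.log(newlineIndex, tabIndex, whitespaceCount);
```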

Spaces are important

Usually, we don’t pay much attention to spaces. To a reader, 1-5 and 1 - 5 look almost the same.

But in regular expressions, a space is just like any other symbol.

The regexp below doesn’t work, because it doesn’t include space symbols:

showMatch( "1 - 5", /\d-\d/ )  // no matches!

Let’s fix it. We could put space symbols in regexp or, better, include a generic space symbol:

showMatch( "1 - 5", /\d - \d/ )   // works
showMatch( "1 - 5", /\d\s-\s\d/ ) // also works
showMatch( "1-5", /\d - \d/ ) // fails! (no spaces in string)

The last match fails, because the subject has no spaces. So don’t put extra spaces in regular expressions, they are all meaningful.

In a regular expression, the dot '.' denotes any character except a newline:

showMatch( "A char", /ch.r/ ) // "char"
showMatch( "A ch-r", /ch.r/ ) // "ch-r"
showMatch( "A ch r", /ch.r/ ) // "ch r", the space is also a char

Although the dot stands for any character, there must be a character in that position. For instance, the pattern ch.r does not match the text "A chr".
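
A short sketch of this rule, using String.prototype.match directly instead of the showMatch helper. The sample strings are illustrative.

```javascript
// The dot consumes exactly one character.
const withChar = "A char".match(/ch.r/);  // match found: "char"

// Nothing stands between "ch" and "r", so the dot has nothing to match.
const noChar = "A chr".match(/ch.r/);     // null

// Without extra flags, the dot does not match a newline either.
const newline = "A ch\nr".match(/ch.r/);  // null

console.log(withChar[0], noChar, newline);
```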

There are characters which have special use in regexps: [ \ ^ $ . | ? * + ( ).

They are special, because they are used to enhance regexp searching abilities. Don’t try to remember the list. You will find them easy to remember after we cover them.

To use a special character as a regular symbol, it must be escaped, that is, prepended with a backslash.

For example, suppose we need to find the dot '.'. In a regexp, it is a special symbol meaning any character except a newline.

So we need to escape it:

showMatch( "Chapter 5.1", /\d\.\d/ )  // 5.1

Without escaping, \d.\d would match 5+1 as well:

showMatch( "5+1 = 6", /\d.\d/ )  // 5+1

Round brackets are also special, so to find an opening bracket, use \(. The example below looks for a word character followed by an opening bracket:

showMatch( "function g()", /\w\(/ )  // g(

The slash '/' is not special in regexp syntax, but inside a slash-delimited literal /...pattern.../ it must also be escaped and written as '\/', so that the JavaScript parser knows you mean the character '/' rather than the end of the regexp.

For example, the literal /\// matches a single slash character.
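
A minimal sketch of escaping the slash. The RegExp constructor shown for comparison is not covered in this text; it is included only to show that the escape is needed because of the literal syntax, not the pattern itself.

```javascript
// Inside a /.../ literal the slash must be written as \/ .
const slashLiteral = /\//;

// Built from a string, the same pattern needs no escaping,
// because no closing slash delimiter is involved.
const slashFromString = new RegExp("/");

const firstSlash = "path/to/file".match(slashLiteral)[0];  // "/"
const sameResult = "path/to/file".search(slashFromString); // 4
const parts = "2024/01/15".split(/\//);                    // ["2024", "01", "15"]

console.log(firstSlash, sameResult, parts);
```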

A regular expression may have optional flags, which affect the search.

JavaScript supports several flags. The three covered here are:

g
Find all matches.
i
Case-insensitive search
m
Enable multiline mode.

A flag is appended after the pattern, like /.../g.

A regexp without the global flag returns only the first match:

alert( "123".match( /\d/ ))  // '1'

But if the global flag is used, all matches can be found:

alert( "123".match( /\d/g ))  // '1', '2', '3'

Multiple flags are possible. For example, find all matches and ignore the case:

alert( "Smile a smile".match( /SMILE/gi ))  // 'Smile', 'smile'
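
The m flag is listed above but not demonstrated. The sketch below shows one effect of it, assuming the anchor '^' (start of input), which this text mentions among the special characters but does not explain. With m, the anchor also matches at the start of each line.

```javascript
const lines = "one\ntwo\nthree";

// Without m, ^ matches only at the very start of the string.
const firstOnly = lines.match(/^\w+/g);  // ["one"]

// With m, ^ also matches right after every newline.
const everyLine = lines.match(/^\w+/gm); // ["one", "two", "three"]

console.log(firstOnly, everyLine);
```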

Several characters or character classes may be grouped in square brackets [...] to search for any of them.

For instance, [eao] means any of the characters 'a', 'e', or 'o'. That’s a single char from the list.

showMatch( "The OGRE on green grass!", /gr[eao]/gi ) // "GRE", "gre", "gra"

Here, gr[eao] matches gre, not gree, because [eao] stands for only one char.

The time can be written as hour:minute or hour-minute, where both hour and minute are 2 digits:

09:00
21-30

Create a regular expression to find all times in: Breakfast at 09:00. Dinner at 21-30.

Solution

The regular expression: \d\d[-:]\d\d.

showMatch( "Breakfast at 09:00. Dinner at 21-30.", /\d\d[-:]\d\d/g )

Note that in the character set the hyphen '-' is not escaped, because it cannot denote a range when it comes first.

The g flag makes the search global, so all matches are found instead of only the first.

Square brackets can also contain character ranges. For example, [a-z] is a character from a to z, and [0-5] matches a character from 0 to 5.

showMatch( "Exception 0xAF", /x[A-F]/g ) // matches "xA", not "xc"

The example above doesn’t find xc in Exception, because the range contains only uppercase characters and there is no i flag.

The regexp matches xA, and not xAF, because [A-F] is a single character from A to F.

Characters, classes and ranges can be put together.

The example finds any character that is in the ranges a-f or A-F, or is x, or is a digit:

showMatch( "look -> 0xAF", /[\dA-Fa-fx]/g ) // "0", "x", "A", "F"

Most character classes are actually a short representation of ranges, for example:

  • \d is same as [0-9],
  • \w is same as [a-zA-Z0-9_],
  • \s is same as [\t\n\v\f\r ] plus several Unicode space symbols.
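
The equivalences for \d and \w can be checked with a quick sketch like the one below. The sample string is made up, and the comparison is only a spot check on one input, not a proof.

```javascript
const sample = "Room 42_b,\tfloor 7";

// \d and [0-9] pick out the same characters.
const digitsByClass = sample.match(/\d/g).join("");     // "427"
const digitsByRange = sample.match(/[0-9]/g).join("");  // "427"

// \w and [a-zA-Z0-9_] pick out the same characters.
const wordByClass = sample.match(/\w/g).join("");            // "Room42_bfloor7"
const wordByRange = sample.match(/[a-zA-Z0-9_]/g).join("");  // "Room42_bfloor7"

console.log(digitsByClass === digitsByRange, wordByClass === wordByRange);
```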

There are also negated ranges: [^...].

Square brackets starting with a caret: [^...] find all characters except the given ones.

For example:

  • [^aeou] - any character except 'a', 'e', 'o', 'u'
  • [^0-9] - any non-digit, same as \D
  • [^\s] - any not-a-space, same as \S

Just like the ordinary range, a negated range may contain multiple characters and ranges.
The example below looks for non-letters, non-digits, non-spaces:

showMatch( "alice15@gmail.com", /[^\d\sA-Z]/gi ) // "@", "."

Does the pattern k[^s] match the text sock?

Solution

The regexp looks for a character "k" followed by a character which can be anything except "s".

But in sock, there is no character after "k", hence no match.

A character set [...] must always match a character, no matter if it is inverted or not.
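
The answer can be verified with a short sketch. The extra test strings "socks" and "sock!" are not part of the task and are added only for contrast.

```javascript
// "k" is the last character, so [^s] has nothing to match.
const atEnd = "sock".match(/k[^s]/);        // null

// Here "k" is followed by "s", which the negated set excludes.
const beforeS = "socks".match(/k[^s]/);     // null

// Any other following character satisfies [^s].
const beforeOther = "sock!".match(/k[^s]/); // match found: "k!"

console.log(atEnd, beforeS, beforeOther[0]);
```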

Most special characters can be used in square brackets without escaping.

In square brackets you only need to escape the closing square bracket ']', and the backslash '\'.

Other special characters are escaped only if they could have a special meaning:

  • The hyphen '-' must be escaped only if it is in between other symbols. If it is first or last, it cannot denote a range and can therefore appear unescaped: [-...].
  • The caret symbol '^' must be escaped only if it is the first symbol: [\^..].
  • All other characters, including the dot '.', plus '+', round brackets '( )', the opening square bracket '[' etc. can appear unescaped.

If you look at most regular expressions in the code around you, special characters are usually escaped no matter where they are in the regexp.
But square brackets often allow the escaping to be dropped, which makes the pattern more readable.

For example, the regexp [-().^] literally means any of characters from the list -().^. Regexp special symbols do not have any special meaning here.

var re = /[-().^]/g

showMatch( "f(g)-^1", re ) // matches (, ), -, ^

So, technically it is possible to save on extra backslashes in square brackets. But if you forget it and put them in, nothing breaks.

var re = /[\-\(\)\.\^]/g

showMatch( "f(g)-^1", re ) // matches same (, ), -, ^
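
The two characters that do need escaping inside square brackets, ']' and '\', can be sketched as follows. The sample strings are illustrative. Note that in JavaScript source a backslash inside a string literal is itself written as "\\".

```javascript
// Inside [...] the closing bracket and the backslash must be escaped.
const bracketOrBackslash = /[\]\\]/g;
const found = "a]b\\c".match(bracketOrBackslash);   // ["]", "\\"], i.e. ] and one backslash

// A hyphen in first position and a caret in non-first position are literal.
const hyphenOrCaret = "a-b^c".match(/[-^]/g);       // ["-", "^"]

// A caret in first position must be escaped to be taken literally.
const caretFirst = "a^b".match(/[\^b]/g);           // ["^", "b"]

console.log(found, hyphenOrCaret, caretFirst);
```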

