Regular Expressions

The patterns in the input are written using an extended set of regular expressions. These are:

`x’
match the character `x’
`.’
any character (byte) except newline
`[xyz]’
a “character class”; in this case, the pattern matches either an `x’, a `y’, or a `z’
`[abj-oZ]’
a “character class” with a range in it; matches an `a’, a `b’, any letter from `j’ through `o’, or a `Z’
`[^A-Z]’
a “negated character class”, i.e., any character but those in the class. In this case, any character EXCEPT an uppercase letter.
`[^A-Z\n]’
any character EXCEPT an uppercase letter or a newline
`r*’
zero or more r’s, where r is any regular expression
`r+’
one or more r’s
`r?’
zero or one r’s (that is, “an optional r”)
`r{2,5}’
anywhere from two to five r’s
`r{2,}’
two or more r’s
`r{4}’
exactly 4 r’s
`{name}’
the expansion of the “name” definition (see above)
`"[xyz]\"foo"’
the literal string: `[xyz]"foo’
`\x’
if x is an `a’, `b’, `f’, `n’, `r’, `t’, or `v’, then the ANSI-C interpretation of \x. Otherwise, a literal `x’ (used to escape operators such as `*’)
`\0’
a NUL character (ASCII code 0)
`\123’
the character with octal value 123
`\x2a’
the character with hexadecimal value 2a
`(r)’
match an r; parentheses are used to override precedence (see below)
`rs’
the regular expression r followed by the regular expression s; called “concatenation”
`r|s’
either an r or an s
`r/s’
an r but only if it is followed by an s. The text matched by s is included when determining whether this rule is the longest match, but is then returned to the input before the action is executed. So the action only sees the text matched by r. This type of pattern is called trailing context. (There are some combinations of `r/s’ that flex cannot match correctly; see the notes in the Deficiencies / Bugs section below regarding “dangerous trailing context”.)
`^r’
an r, but only at the beginning of a line (i.e., when just starting to scan, or right after a newline has been scanned).
`r$’
an r, but only at the end of a line (i.e., just before a newline). Equivalent to “r/\n”. Note that flex’s notion of “newline” is exactly whatever the C compiler used to compile flex interprets ‘\n’ as; in particular, on some DOS systems you must either filter out \r’s in the input yourself, or explicitly use r/\r\n for “r$”.
`<s>r’
an r, but only in start condition s (see below for discussion of start conditions)
`<s1,s2,s3>r’
same, but in any of start conditions s1, s2, or s3
`<*>r’
an r in any start condition, even an exclusive one.
`<<EOF>>’
an end-of-file
`<s1,s2><<EOF>>’
an end-of-file when in start condition s1 or s2
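To make the `r/s’ (trailing context) entry concrete, here is a minimal C sketch, assuming both r and s are plain literal strings; real flex rules allow arbitrary regular expressions, so this only illustrates the bookkeeping. The helper `match_trailing’ is hypothetical, not part of flex: it requires s to follow r but reports only the length of r as consumed, mirroring how flex returns the text matched by s to the input before the action runs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of flex): emulates the trailing-context
 * pattern r/s for literal strings r and s. If `input` starts with r
 * immediately followed by s, returns strlen(r), i.e. the text the action
 * would see; otherwise returns 0. The characters matched by s are only
 * inspected, never counted as consumed. */
size_t match_trailing(const char *input, const char *r, const char *s)
{
    size_t r_len = strlen(r);
    size_t s_len = strlen(s);

    /* r must match at the start of the input */
    if (strncmp(input, r, r_len) != 0)
        return 0;

    /* s must follow r; it is looked at but left in the input */
    if (strncmp(input + r_len, s, s_len) != 0)
        return 0;

    return r_len;
}
```

For example, match_trailing("foobar", "foo", "bar") gives 3 while match_trailing("foobaz", "foo", "bar") gives 0. Passing "\n" as s approximates `foo$’, which the list above describes as equivalent to “foo/\n”.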

Note that inside of a character class, all regular expression operators lose their special meaning except escape (‘\’) and the character class operators, ‘-’, ‘]’, and, at the beginning of the class, ‘^’.
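POSIX extended regular expressions, available in C through <regex.h>, follow the same rule for most operators, so they offer a quick way to try it out. This is a sketch assuming a POSIX system; `matches_whole’ is a hypothetical helper name. One difference to keep in mind: inside a POSIX bracket expression the backslash is an ordinary character, whereas flex keeps it as an escape.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: returns true if the POSIX ERE `ere` matches the
 * whole of `text`. The pattern is wrapped as ^(ere)$ before compiling so
 * that a partial match somewhere inside `text` does not count. */
bool matches_whole(const char *ere, const char *text)
{
    char anchored[256];
    regex_t re;

    if (snprintf(anchored, sizeof anchored, "^(%s)$", ere) >= (int)sizeof anchored)
        return false; /* pattern too long for this sketch */
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false; /* invalid pattern */

    bool ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

With this helper, `[*+?.|()]’ matches a single literal `*’, `.’ or `|’, while `-’ inside `[a-c]’ still forms a range and a leading `^’ in `[^a-c]’ still negates.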

The regular expressions listed above are grouped according to precedence, from highest precedence at the top to lowest at the bottom. Those grouped together have equal precedence. For example,

foo|bar*

is the same as

(foo)|(ba(r*))

since the ‘*’ operator has higher precedence than concatenation, and concatenation higher than alternation (‘|’). This pattern therefore matches either the string “foo” or the string “ba” followed by zero-or-more r’s. To match “foo” or zero-or-more “bar”’s, use:

foo|(bar)*

and to match zero-or-more “foo”’s-or-“bar”’s:

(foo|bar)*
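POSIX extended regular expressions rank ‘*’, concatenation and ‘|’ in the same order, so the three patterns above can be compared with <regex.h>. This is a sketch assuming a POSIX system; `matches_whole’ is a hypothetical helper that anchors the pattern so the whole string must match.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if the POSIX ERE `ere` matches all of `text`.
 * Wrapping as ^(ere)$ keeps the alternation inside the anchors. */
bool matches_whole(const char *ere, const char *text)
{
    char anchored[256];
    regex_t re;

    if (snprintf(anchored, sizeof anchored, "^(%s)$", ere) >= (int)sizeof anchored)
        return false; /* pattern too long for this sketch */
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false; /* invalid pattern */

    bool ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

With it, “foo|bar*” accepts “ba” and “barrr” but rejects “barbar”; “foo|(bar)*” accepts “barbar” and the empty string but rejects “foobar”; and “(foo|bar)*” accepts “foobarfoo”.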

In addition to characters and ranges of characters, character classes can also contain character class expressions. These are expressions enclosed inside `[:’ and `:]’ delimiters (which themselves must appear between the ‘[’ and ‘]’ of the character class; other elements may occur inside the character class, too). The valid expressions are:

[:alnum:] [:alpha:] [:blank:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:]

These expressions all designate a set of characters equivalent to the corresponding standard C `isXXX’ function. For example, `[:alnum:]’ designates those characters for which `isalnum()’ returns true, i.e., any alphabetic or numeric character. Some systems don’t provide `isblank()’, so flex defines `[:blank:]’ as a blank or a tab.
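The correspondence with the `isXXX’ functions can be checked directly against <ctype.h>. In this sketch the helper names are illustrative, not flex API: one spells out the blank-or-tab fallback described for `[:blank:]’, the other confirms that `isalnum()’ selects exactly the characters for which `isalpha()’ or `isdigit()’ is true.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>

/* Illustrative stand-in for [:blank:] on systems without isblank():
 * a blank (space) or a tab, as the text above describes. */
bool flex_style_blank(int c)
{
    return c == ' ' || c == '\t';
}

/* Checks, for every unsigned char value, that [:alnum:] (isalnum)
 * selects exactly the union of [:alpha:] and [:digit:]. */
bool alnum_equals_alpha_or_digit(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++) {
        bool alnum = isalnum(c) != 0;
        bool alpha_or_digit = isalpha(c) != 0 || isdigit(c) != 0;
        if (alnum != alpha_or_digit)
            return false;
    }
    return true;
}
```

In the default "C" locale, isblank() is true only for space and tab, so flex_style_blank() agrees with it there.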

For example, the following character classes are all equivalent:

[[:alnum:]] [[:alpha:][:digit:]] [[:alpha:]0-9] [a-zA-Z0-9]

If your scanner is case-insensitive (the `-i’ flag), then `[:upper:]’ and `[:lower:]’ are equivalent to `[:alpha:]’.

Some notes on patterns:

  • A negated character class such as the example “[^A-Z]” above will match a newline unless “\n” (or an equivalent escape sequence) is one of the characters explicitly present in the negated character class (e.g., “[^A-Z\n]”). This is unlike how many other regular expression tools treat negated character classes, but unfortunately the inconsistency is historically entrenched. Matching newlines means that a pattern like [^"]* can match the entire input unless there’s another quote in the input.
  • A rule can have at most one instance of trailing context (the ‘/’ operator or the ‘$’ operator). The start condition, ‘^’, and “<<EOF>>” patterns can only occur at the beginning of a pattern and, like ‘/’ and ‘$’, cannot be grouped inside parentheses. A ‘^’ which does not occur at the beginning of a rule or a ‘$’ which does not occur at the end of a rule loses its special properties and is treated as a normal character. The following are illegal:
    foo/bar$
    <sc1>foo<sc2>bar

    Note that the first of these can be written “foo/bar\n”. The following will result in ‘$’ or ‘^’ being treated as a normal character:

    foo|(bar$)
    foo|^bar

    If what’s wanted is a “foo” or a bar-followed-by-a-newline, the following could be used (the special ‘|’ action is explained below):

    foo      |
    bar$     /* action goes here */

    A similar trick will work for matching a foo or a bar-at-the-beginning-of-a-line.
