I recently revisited regular expressions; brief notes follow.
Reference: "Classic Shell Scripting", pp. 33-47
POSIX BRE and ERE metacharacters
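Character   BRE / ERE   Meaning in a pattern
\           Both        Usually turns off the special meaning of the following character
.           Both        Matches any single character
*           Both        Matches zero or more of the preceding single character
^           Both        Anchors the match at the beginning of the line or string
$           Both        Anchors the match at the end of the line or string
[...]       Both        Bracket expression: matches any one of the enclosed characters
\{n,m\}     BRE         Interval expression: matches n to m occurrences of the preceding single character
\( \)       BRE         Saves the enclosed pattern for a later backreference
\n          BRE         Backreference: replays the nth saved subpattern (n from 1 to 9)
+           ERE         Matches one or more of the preceding regular expression
?           ERE         Matches zero or one of the preceding regular expression
|           ERE         Alternation: matches the regular expression before or after it
( )         ERE         Grouping: applies a match to the enclosed group of regular expressions
{n,m}       ERE         Interval expression, like the BRE \{n,m\}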
POSIX bracket expressions
Character classes
Class        Matching characters
[:alnum:]    Alphanumeric characters
[:alpha:]    Alphabetic characters
[:blank:]    Space and tab characters
[:cntrl:]    Control characters
[:digit:]    Numeric characters
[:graph:]    Nonspace characters
[:lower:]    Lowercase characters
[:print:]    Printable characters
[:punct:]    Punctuation characters
[:space:]    Whitespace characters
[:upper:]    Uppercase characters
[:xdigit:]   Hexadecimal digits
Collating symbols
A collating symbol is a multicharacter sequence that should be treated as a unit. It consists of the characters bracketed by [. and .]. Collating symbols are specific to
the locale in which they are used.
Equivalence classes
An equivalence class lists a set of characters that should be considered equivalent, such as e and è. It consists of a named element from the locale, bracketed by [= and =].
All three of these constructs must appear inside the square brackets of a bracket
expression. For example, [[:alpha:]!] matches any single alphabetic character or the
exclamation mark, and [[.ch.]] matches the collating element ch, but does not match just
the letter c or the letter h. In a French locale, [[=e=]] might match any of e, è, ë, ê, or é.
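As a small illustration of the doubled brackets described above, the sketch below uses grep with two character classes. The helper names letters_or_bang and has_digit are made up for this note, and the C locale is forced so that the class contents are predictable. Collating symbols and equivalence classes are left out because they only work in locales that define them.

```shell
# Pin the locale so the contents of each class are predictable.
export LC_ALL=C

# Hypothetical helper: keep lines made up only of letters and '!'.
letters_or_bang() {
    grep '^[[:alpha:]!]*$'
}

# Hypothetical helper: keep lines that contain at least one digit.
# The class name [:digit:] must itself sit inside a bracket expression.
has_digit() {
    grep '[[:digit:]]'
}

printf '%s\n' 'hello!' 'hello world' 'abc123' | letters_or_bang   # prints: hello!
printf '%s\n' 'room 42' 'no number' | has_digit                   # prints: room 42
```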
Basic Regular Expressions
Matching single characters
• Ordinary characters
• Metacharacters: escape one with a backslash to match it literally
• The . (dot) character
• Bracket expressions: [] (such as [012345], [0-5], [^0-5]), which may also contain character classes (such as
[:digit:]), equivalence classes (such as [=e=]), or collating symbols (such as [.ch.]).
Within bracket expressions, all other metacharacters lose their special meanings. Thus,
[*\.] matches a literal asterisk, a literal backslash, or a literal period. To get a ] into the set,
place it first in the list: []*\.] adds the ] to the list. To get a minus character into the set,
place it first in the list: [-*\.]. If you need both a right bracket and a minus, make the right
bracket the first character, and make the minus the last one in the list: []*\.-].
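The placement rules for ] and - can be checked with a quick grep run. This is only a sketch: has_special is a hypothetical helper name, and the pattern is single-quoted so the shell passes the backslash through untouched.

```shell
export LC_ALL=C

# Hypothetical helper: keep lines containing any of ] * \ . -
# The ] comes first and the - comes last, so both are taken literally.
has_special() {
    grep '[]*\.-]'
}

# Every line except "plain" contains one of the listed characters.
printf '%s\n' 'a]b' 'a-b' 'a\b' 'a.b' 'a*b' 'plain' | has_special
```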
Backreferences
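Pattern                                 Matches
\(ab\)\(cd\)[def]*\2\1                  abcdcdab, abcdeeecdab, abcdddeeffcdab, ...
\(why\).*\1                             A line with two occurrences of why
\([[:alpha:]_][[:alnum:]_]*\) = \1;     A simple C/C++ assignment of a variable to itself
\(["']\).*\1                            Text enclosed in a matching pair of single or double quotes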
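Two of the patterns above can be tried directly with grep, which uses BREs by default. The helper names mirrored and self_assign are invented for this sketch, and the first pattern is anchored with ^ and $ here only to make the sample output easier to read.

```shell
export LC_ALL=C

# Hypothetical helper: "ab", "cd", any run of d/e/f, then "cd" and "ab" again.
mirrored() {
    grep '^\(ab\)\(cd\)[def]*\2\1$'
}

# Hypothetical helper: an identifier assigned to itself, as in "x = x;".
self_assign() {
    grep '\([[:alpha:]_][[:alnum:]_]*\) = \1;'
}

printf '%s\n' 'abcdcdab' 'abcdeeecdab' 'abcdeeeabcd' | mirrored   # first two lines
printf '%s\n' 'count = count;' 'count = total;' | self_assign     # prints: count = count;
```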
Matching multiple characters with one expression
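*         Zero or more occurrences of the preceding single character
\{N\}     Exactly N occurrences
\{N,\}    At least N occurrences
\{N,M\}   Between N and M occurrences
\{,M\}    At most M occurrences (a GNU extension, not part of POSIX)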
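The POSIX interval forms can be exercised with grep as follows. This is a sketch with hypothetical helper names; the \{,M\} form is left out because it is not portable.

```shell
export LC_ALL=C

# Hypothetical helpers, one per interval form.
five_digits()      { grep '^[[:digit:]]\{5\}$'; }   # exactly 5
at_least_three_a() { grep '^a\{3,\}$'; }            # 3 or more
two_to_four_a()    { grep '^a\{2,4\}$'; }           # 2 through 4

printf '%s\n' '1234' '12345' '123456' | five_digits          # prints: 12345
printf '%s\n' 'a' 'aa' 'aaaa' 'aaaaa' | at_least_three_a     # prints: aaaa, aaaaa
printf '%s\n' 'a' 'aa' 'aaaa' 'aaaaa' | two_to_four_a        # prints: aa, aaaa
```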
Anchoring text matches
^    Beginning of the line or string
$    End of the line or string
BRE operator precedence
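Listed from highest to lowest precedence.
Operator           Meaning
[..] [==] [::]     Bracket symbols for character collation
\metacharacter     Escaped metacharacters
[]                 Bracket expressions
\(\) \digit        Subexpressions and backreferences
* \{\}             Repetition of the preceding single-character regular expression
no symbol          Concatenation
^ $                Anchors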
Extended Regular Expressions
Matching single characters
Same as BREs, with one notable exception: in awk, \ is special inside bracket expressions. Thus, to match a left bracket, dash, right bracket, or backslash, you could
use [\[\-\]\\].
Backreferences
Backreferences do not exist in EREs.
Matching multiple regular expressions with one expression
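*        Zero or more occurrences of the preceding regular expression
+        One or more occurrences
?        Zero or one occurrence
{N}      Exactly N occurrences
{N,}     At least N occurrences
{N,M}    Between N and M occurrences
{,M}     At most M occurrences (a GNU extension, not part of POSIX)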
Alternation
|
Grouping
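()
Anchoring text matches
Same as BREs, with one significant difference: in EREs, ^ and $ are always metacharacters. Thus, regular expressions such as ab^cd and ef$gh are valid, but cannot
match anything, because the text before the ^ and after the $ keeps them from matching the beginning and the end of the line. To match a literal ^ or $, escape it with a backslash.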
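The ERE operators above can be tried with grep -E. The sketch below uses hypothetical helper names and covers ?, +, grouping, alternation, and an escaped ^.

```shell
export LC_ALL=C

# Hypothetical helpers, one per ERE feature.
color()         { grep -E '^colou?r$'; }        # ? : optional "u"
repeated_ab()   { grep -E '^(ab)+$'; }          # () and + : one or more "ab"
read_or_write() { grep -E '^(read|write)$'; }   # | : either word
literal_caret() { grep -E 'ab\^cd'; }           # \^ : a literal caret

printf '%s\n' 'color' 'colour' 'colouur' | color        # prints: color, colour
printf '%s\n' 'ab' 'abab' 'aba' | repeated_ab           # prints: ab, abab
printf '%s\n' 'read' 'write' 'ready' | read_or_write    # prints: read, write
printf '%s\n' 'ab^cd' 'abcd' | literal_caret            # prints: ab^cd
```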
ERE operator precedence
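Listed from highest to lowest precedence.
Operator           Meaning
[..] [==] [::]     Bracket symbols for character collation
\metacharacter     Escaped metacharacters
[]                 Bracket expressions
()                 Grouping
* + ? {}           Repetition of the preceding regular expression
no symbol          Concatenation
^ $                Anchors
|                  Alternation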
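Because alternation binds most loosely, anchors attach to a single alternative unless parentheses say otherwise. A sketch of the difference, with made-up helper names loose and strict:

```shell
export LC_ALL=C

# Hypothetical helper: "abcd" at the start of the line OR "efgh" at the end.
loose() {
    grep -E '^abcd|efgh$'
}

# Hypothetical helper: the whole line is exactly "abcd" or exactly "efgh".
strict() {
    grep -E '^(abcd|efgh)$'
}

printf '%s\n' 'abcd' 'efgh' 'abcdxyz' 'xyzefgh' | loose    # all four lines
printf '%s\n' 'abcd' 'efgh' 'abcdxyz' 'xyzefgh' | strict   # prints: abcd, efgh
```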
Additional GNU regular expression operators
Operator   Meaning
\w         Matches any word-constituent character. Equivalent to [[:alnum:]_].
\W         Matches any nonword-constituent character. Equivalent to [^[:alnum:]_].
\< \>      Match the beginning and end of a word, respectively.
\b         Matches the null string found at either the beginning or the end of a word.
           This is a generalization of the \< and \> operators. Note: Because awk uses
           \b to represent the backspace character, GNU awk (gawk) uses \y.
\B         Matches the null string between two word-constituent characters.
\` \'      Match the beginning and end of an emacs buffer, respectively. GNU
           programs (besides emacs) generally treat these as being equivalent to ^ and $.
Finally, although POSIX explicitly states that the NUL character need not be matchable, GNU
programs have no such restriction. If a NUL character occurs in input data, it can be matched by
the . metacharacter or a bracket expression.
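The word operators can be tried as below. This sketch assumes GNU grep, since these operators are extensions and other grep implementations may not support them. The helper names are hypothetical.

```shell
export LC_ALL=C

# Hypothetical helpers; all three rely on GNU extensions.
whole_word_cat()   { grep '\<cat\>'; }     # word start and word end
whole_word_cat_b() { grep '\bcat\b'; }     # same idea using \b
identifier_only()  { grep -E '^\w+$'; }    # only letters, digits, underscore

printf '%s\n' 'cat' 'concatenate' 'a cat sat' 'cats' | whole_word_cat     # prints: cat, a cat sat
printf '%s\n' 'cat' 'concatenate' 'a cat sat' 'cats' | whole_word_cat_b   # same two lines
printf '%s\n' 'foo_bar1' 'foo-bar' | identifier_only                      # prints: foo_bar1
```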
Unix programs and their regular expression type
Type     grep  sed  ed  ex/vi  more  egrep  awk  lex
BRE      •     •    •   •      •
ERE                                  •      •    •
\< \>    •     •    •   •      •