Regular expressions


The world of regular expressions is full of interesting things. Believe that, and study it down to its core.

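Regular expressions help us match and replace text.

I divide regular expressions into three parts: search patterns, replacement patterns, and operators.


1 Search patterns tell us how to match.


2 Replacement patterns tell us what to do with the matched text.


3 Operators are not really part of regular expressions themselves; they belong to Perl.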

m// (Matching)

s/// (Substitution)


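In addition, Perl is often used from the command line, for example: perl -pe 's/PATTERN/REPLACEMENT/ig' file.txt (with -n instead of -p, the modified lines are not printed unless the code prints them).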


 Search patterns

The characters in the following table have special meaning only in search patterns:




.

Match any single character except newline. Can match newline in awk.

*

Match any number (or none) of the single character that immediately precedes it. The preceding character can also be a regular expression. For example, since . (dot) means any character, .* means "match any number of any character."

^

Match the following regular expression at the beginning of the line or string.

$

Match the preceding regular expression at the end of the line or string.

\

Turn off the special meaning of the following character.

[ ]

Match any one of the enclosed characters. A hyphen (-) indicates a range of consecutive characters. A circumflex (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list (i.e., literally).


{n,m}

Match a range of occurrences of the single character that immediately precedes it. The preceding character can also be a metacharacter. {n} matches exactly n occurrences; {n,} matches at least n occurrences; and {n,m} matches any number of occurrences between n and m. n and m must be between 0 and 255, inclusive.

\{n,m\}

Just like {n,m}, but with backslashes in front of the braces.

\( \)

Save the pattern enclosed between \( and \) into a special holding space. Up to nine patterns can be saved on a single line. The text matched by the subpatterns can be "replayed" in substitutions by the escape sequences \1 to \9.

\n

Replay the nth sub-pattern enclosed in \( and \) into the pattern at this point. n is a number from 1 to 9, with 1 starting on the left.

In Perl regular expressions, the content inside ( ) acts as a group, and the text it matches is also saved in a variable. That variable can be used in the search pattern (as \1) as well as in the replacement pattern (as $1).

\< \>

Match characters at beginning (\<) or end (\>) of a word.


+

Match one or more instances of preceding regular expression.

?

Match zero or one instances of preceding regular expression.

|

Match the regular expression specified before or after.

( )

Apply a match to the enclosed group of regular expressions.


\w

Word character

\W

Non-word character

\d

Digit character

\D

Non-digit character

\s

Whitespace character

\S

Non-whitespace character
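
As a rough illustration of the search metacharacters above, the following Perl sketch tries each one against a small sample string. The sample strings and the @hits array are made up for this illustration. Note that Perl writes grouping and intervals without backslashes, and uses \b rather than \< and \> for word boundaries.

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only to exercise each metacharacter.
my @hits;
push @hits, 'dot'         if "cat"       =~ /c.t/;           # . matches any one character
push @hits, 'star'        if "ct"        =~ /ca*t/;          # * allows zero occurrences of "a"
push @hits, 'anchors'     if "hello"     =~ /^hello$/;       # ^ and $ pin both ends
push @hits, 'escape'      if "a.b"       =~ /a\.b/;          # \. is a literal dot
push @hits, 'class'       if "x7"        =~ /[a-z][0-9]/;    # ranges inside [ ]
push @hits, 'negated'     if "x7"        !~ /[^a-z0-9]/;     # no character outside the class
push @hits, 'interval'    if "aaa"       =~ /^a{2,3}$/;      # two or three "a"s
push @hits, 'backref'     if "abcabc"    =~ /^(abc)\1$/;     # \1 replays group 1
push @hits, 'boundary'    if "a cat sat" =~ /\bcat\b/;       # Perl's word boundary
push @hits, 'plus'        if "caaat"     =~ /ca+t/;          # one or more "a"s
push @hits, 'optional'    if "color"     =~ /colou?r/;       # zero or one "u"
push @hits, 'alternation' if "dog"       =~ /^(?:cat|dog)$/; # either alternative
push @hits, 'shorthand'   if "id 42"     =~ /^\w+\s\d+$/;    # \w, \s, \d classes

print join(", ", @hits), "\n";
```

Every pattern in the list is written so that its condition holds, so each label should appear in the printed line.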
 Replacement patterns

The characters in the following table have special meaning only in replacement patterns:




\

Turn off the special meaning of the following character.

\n

Restore the text matched by the nth pattern previously saved by \( and \). n is a number from 1 to 9, with 1 starting on the left.

&

Reuse the text matched by the search pattern as part of the replacement pattern.

~

Reuse the previous replacement pattern in the current replacement pattern. Must be the only character in the replacement pattern (ex and vi).

%

Reuse the previous replacement pattern in the current replacement pattern. Must be the only character in the replacement pattern (ed).

\u

Convert first character of replacement pattern to uppercase.

\U

Convert entire replacement pattern to uppercase.

\l

Convert first character of replacement pattern to lowercase.

\L

Convert entire replacement pattern to lowercase.

\E

Turn off previous \U or \L.

\e

Turn off previous \u or \l.
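
The table above comes from the Unix editors, so here is a small sketch of how the same ideas look in Perl. In Perl the replacement side of s/// is an interpolated string, so saved text is written $1, $2 and the whole match is $&. The variable names and the sample name are assumptions made for this illustration.

```perl
use strict;
use warnings;

my $name = "john smith";   # hypothetical sample input

# Reorder the two saved groups.
(my $swapped = $name) =~ s/(\w+) (\w+)/$2, $1/;

# \u uppercases only the next character of the replacement.
(my $caps = $name) =~ s/(\w+)/\u$1/g;

# \U uppercases everything up to \E.
(my $shout = $name) =~ s/(\w+) (\w+)/\U$1\E $2/;

# $& reuses the whole matched text.
(my $quoted = $name) =~ s/\w+/[$&]/g;

print "$swapped\n$caps\n$shout\n$quoted\n";
```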





1.3.2 Regular Expression Operators

Perl provides the built-in regular expression operators qr//, m//, and s///, as well as the split function. Each operator accepts a regular expression pattern string that is run through string and variable interpolation and then compiled.

Regular expressions are often delimited with the forward slash, but you can pick any non-alphanumeric, non-whitespace character. Here are some examples:

qr#...#       m!...!        m{...}
s|...|...|    s[...][...]   s<...>/.../

A match delimited by slashes (/.../) doesn't require a leading m:

/.../      #same as m/.../

Using the single quote as a delimiter suppresses interpolation of variables and the constructs \N{name}, \u, \l, \U, \L, \Q, \E. Normally these are interpolated before being passed to the regular expression engine.
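
A minimal sketch of the difference the single-quote delimiter makes. The variable $word and the sample sentence are assumptions for this illustration.

```perl
use strict;
use warnings;

my $word = "cat";   # hypothetical variable to interpolate

# With slashes, $word is interpolated and the pattern becomes /cat/.
my $with_interp = "the cat sat" =~ m/$word/ ? 1 : 0;

# With single quotes nothing is interpolated: the engine sees the
# end-of-string anchor $ followed by the letters "word", which cannot match.
my $without_interp = "the cat sat" =~ m'$word' ? 1 : 0;

print "interpolated: $with_interp, single-quoted: $without_interp\n";
```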

qr// (Quote Regex)


Quote and compile PATTERN as a regular expression. The returned value may be used in a later pattern match or substitution. This saves time if the regular expression is going to be repeatedly interpolated. The match modes (or lack of), /ismxo, are locked in.
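
As a sketch of why qr// is convenient, the following compiles a pattern once, reuses it in a loop, and embeds it in a larger pattern. It also shows a mode modifier staying locked in. All names and sample strings here are assumptions for this illustration.

```perl
use strict;
use warnings;

# Compile once, reuse many times.
my $date_re = qr/(\d{4})-(\d{2})-(\d{2})/;
my @lines   = ("released 2004-07-15", "no date here", "patched 2005-01-02");

my @years;
for my $line (@lines) {
    push @years, $1 if $line =~ $date_re;
}

# A compiled pattern can be embedded in a larger one.
my $anchored = "2004-07-15" =~ /^$date_re$/ ? 1 : 0;

# The /i given to qr// stays locked in; the rest of the outer
# pattern ("-man") is still case-sensitive.
my $ci          = qr/spider/i;
my $mixed       = "SPIDER-man" =~ /^${ci}-man$/ ? 1 : 0;
my $strict_tail = "spider-MAN" =~ /^${ci}-man$/ ? 1 : 0;

print "@years $anchored $mixed $strict_tail\n";
```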

m// (Matching)


Match PATTERN against input string. In list context, returns a list of substrings matched by capturing parentheses, or else (1) for a successful match or ( ) for a failed match. In scalar context, returns 1 for success or "" for failure. /imsxo are optional mode modifiers. /cg are optional match modifiers. /g in scalar context causes the match to start from the end of the previous match. In list context, a /g match returns all matches or all captured substrings from all matches. A failed /g match will reset the match start to the beginning of the string unless the match is in combined /cg mode.
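
The context rules above can be sketched as follows. The sample string and variable names are assumptions for this illustration.

```perl
use strict;
use warnings;

my $str = "a=1, b=22, c=333";   # hypothetical sample input

# List context without /g: returns the captured substrings.
my ($k, $v) = $str =~ /(\w)=(\d+)/;

# List context with /g: returns the captures from every match.
my @all = $str =~ /(\d+)/g;

# Scalar context with /g: each match resumes where the last one ended.
my @positions;
while ($str =~ /=(\d+)/g) {
    push @positions, pos($str);
}

# A failed /g match resets the position, unless /c is also given.
my $first  = $str =~ /\d+/g;     # succeeds, position is now set
my $miss   = $str =~ /zzz/gc;    # fails, /c keeps the position
my $kept   = pos($str);
my $miss2  = $str =~ /zzz/g;     # fails, position is reset
my $reset  = defined(pos($str)) ? 0 : 1;

print "$k $v | @all | @positions | $kept | $reset\n";
```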

s/// (Substitution)


Match PATTERN in the input string and replace the match text with REPLACEMENT, returning the number of successes. /imosx are optional mode modifiers. /g substitutes all occurrences of PATTERN. Each /e causes an evaluation of REPLACEMENT as Perl code.
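
A short sketch of the return value, /g, and /e. The strings and variable names are assumptions for this illustration.

```perl
use strict;
use warnings;

# /g replaces every occurrence; the return value is the count.
my $text  = "one fish two fish";
my $count = ($text =~ s/fish/cat/g);

# /e evaluates the replacement as Perl code.
my $prices = "3 apples, 10 pears";
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

# When nothing matches, the return value is false.
my $none = "abc";
my $zero = ($none =~ s/x/y/);

print "$count: $text\n$doubled\n";
```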


split /PATTERN/, EXPR, LIMIT

Return a list of substrings surrounding matches of PATTERN in EXPR. If LIMIT is given and positive, the list contains at most LIMIT substrings. The pattern argument is a match operator, so use m if you want alternate delimiters (e.g., split m{PATTERN}). The match permits the same modifiers as m{}. Table 1-8 lists the after-match variables.
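
A minimal sketch of split with a pattern, with a limit, with an alternate delimiter, and with capturing parentheses. The input strings are made up for this illustration.

```perl
use strict;
use warnings;

# Split on commas, swallowing surrounding whitespace.
my @fields = split /\s*,\s*/, "a , b,c ,d";

# A limit of 2 yields at most two substrings.
my @two = split /,/, "a,b,c,d", 2;

# Use m{...} when the pattern itself contains a slash.
my @parts = split m{/}, "usr/local/bin";

# Capturing parentheses keep the separators in the result.
my @kept = split /(-)/, "x-y";

print "@fields | @two | @parts | @kept\n";
```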



1.3.4 Examples
Example 1-1. Simple match
# Match Spider-Man, Spiderman, SPIDER-MAN, etc.
my $dailybugle = "Spider-Man Menaces City!";
if ($dailybugle =~ m/spider[- ]?man/i) { do_something(  ); }
Example 1-2. Match, capture group, and qr
# Match dates formatted like MM/DD/YYYY, MM-DD-YY,...
my $date  = "12/30/1969";
my $regex = qr!(\d\d)[-/](\d\d)[-/](\d\d(?:\d\d)?)!;
if ($date =~ m/$regex/) {
  # With MM/DD/YYYY input, $1 is the month and $2 is the day
  print "Month=", $1, "\n",
        "Day=  ", $2, "\n",
        "Year= ", $3, "\n";
}
Example 1-3. Simple substitution
# Convert <br> to <br /> for XHTML compliance
my $text = "Hello World! <br>";
$text =~ s#<br>#<br />#ig;
Example 1-4. Harder substitution
# urlify - turn URLs into HTML links
$text = "Check the website,";
$text =~
    s{
      \b                         # start at word boundary
      (                          # capture group 1
       (https?|telnet|gopher|file|wais|ftp) :
                                 # resource and colon
       [\w/#~:.?+=&%@!\-] +?     # one or more valid
                                 # characters
                                 # but take as little as
                                 # possible
      )                          # end of capture group 1
      (?=                        # lookahead
        [.:?\-] *                #  for possible punctuation
        (?: [^\w/#~:.?+=&%@!\-]  #  invalid character
          | $ )                  #  or end of string
      )                          # end of lookahead
     }{<a href="$1">$1</a>}igox;
A further problem to try: match any word (a word is defined as a sequence of alphanumerics, with no whitespace) that contains a double letter. For example, "book" has a double "o" and "feed" has a double "e".
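
One possible way to match words that contain a double letter, sketched under the assumption that the words have already been separated on whitespace. The word list and variable names are made up for this illustration. The pattern captures one alphanumeric character and requires the same character again immediately afterwards with \1.

```perl
use strict;
use warnings;

# One alphanumeric captured as group 1, then the same character again.
my $double = qr/^[A-Za-z0-9]*([A-Za-z0-9])\1[A-Za-z0-9]*$/;

# Hypothetical word list, split on whitespace.
my @words       = split /\s+/, "book feed cat letter tree dog";
my @with_double = grep { $_ =~ $double } @words;

print "@with_double\n";
```

The character class is spelled out instead of using \w because \w also matches the underscore, which the definition of a word given here does not include.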






