C++11 Regular Expressions

```
// regex_replace example
#include <iostream>
#include <string>
#include <regex>
#include <iterator>

int main ()
{
  std::string s ("there is a subsequence in the string\n");
  std::regex e ("\\b(sub)([^ ]*)");   // matches words beginning by "sub"

  // using string/c-string (3) version:
  std::cout << std::regex_replace (s,e,"sub-$2");

  // using range/c-string (6) version:
  std::string result;
  std::regex_replace (std::back_inserter(result), s.begin(), s.end(), e, "$2");
  std::cout << result;

  // with flags:
  std::cout << std::regex_replace (s,e,"$1 and $2",std::regex_constants::format_no_copy);
  std::cout << std::endl;

  return 0;
}
```

```
// regex_match example
#include <iostream>
#include <string>
#include <regex>

int main ()
{

  if (std::regex_match ("subject", std::regex("(sub)(.*)") ))
    std::cout << "string literal matched\n";

  const char cstr[] = "subject";
  std::string s ("subject");
  std::regex e ("(sub)(.*)");

  if (std::regex_match (s,e))
    std::cout << "string object matched\n";

  if ( std::regex_match ( s.begin(), s.end(), e ) )
    std::cout << "range matched\n";

  std::cmatch cm;    // same as std::match_results<const char*> cm;
  std::regex_match (cstr,cm,e);
  std::cout << "string literal with " << cm.size() << " matches\n";

  std::smatch sm;    // same as std::match_results<string::const_iterator> sm;
  std::regex_match (s,sm,e);
  std::cout << "string object with " << sm.size() << " matches\n";

  std::regex_match ( s.cbegin(), s.cend(), sm, e);
  std::cout << "range with " << sm.size() << " matches\n";

  // using explicit flags:
  std::regex_match ( cstr, cm, e, std::regex_constants::match_default );

  std::cout << "the matches were: ";
  for (unsigned i=0; i<cm.size(); ++i) {
    std::cout << "[" << cm[i] << "] ";
  }

  std::cout << std::endl;

  return 0;
}
```

```
// ConsoleApplication6.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <cstdlib>   // system
#include <iostream>
#include <regex>
#include <string>

int _tmain(int argc, _TCHAR* argv[])
{
	std::string strPattern = "%1我们%2的歌";
	strPattern = std::regex_replace(strPattern, std::regex("%[1-9]"), "(.*)");
	std::cout << "pattern:" << strPattern << std::endl;

	std::string str = "tujiaw我们zhudan的歌";
	bool result = std::regex_match(str, std::regex(strPattern));
	std::cout << (result ? "true" : "false") << std::endl;

	system("pause");
	return 0;
}

// pattern:(.*)我们(.*)的歌
// true
```


Syntax specifications

std::regex_constants::ECMAScript syntax

ECMAScript regular expression pattern syntax
The following syntax is used to construct (or assign) regex objects that have selected ECMAScript as their grammar.

A regular expression pattern is formed by a sequence of characters.
Regular expression operations look sequentially for matches between the characters of the pattern and the characters in the target sequence: in principle, each character in the pattern is matched against the corresponding character in the target sequence, one by one. But the regex syntax allows for special characters and expressions in the pattern:

Special pattern characters

Special pattern characters are characters (or sequences of characters) that have a special meaning when they appear in a regular expression pattern, either to represent a character that is difficult to express in a string, or to represent a category of characters. Each of these special pattern characters is matched in the target sequence against a single character (unless a quantifier specifies otherwise).

| characters | description | matches |
|---|---|---|
| `.` | not newline | any character except line terminators (LF, CR, LS, PS). |
| `\t` | tab (HT) | a horizontal tab character (same as `\u0009`). |
| `\n` | newline (LF) | a newline (line feed) character (same as `\u000A`). |
| `\v` | vertical tab (VT) | a vertical tab character (same as `\u000B`). |
| `\f` | form feed (FF) | a form feed character (same as `\u000C`). |
| `\r` | carriage return (CR) | a carriage return character (same as `\u000D`). |
| `\cletter` | control code | a control code character whose code unit value is the same as the remainder of dividing the code unit value of letter by 32. For example: `\ca` is the same as `\u0001`, `\cb` the same as `\u0002`, and so on. |
| `\xhh` | ASCII character | a character whose code unit value has a hex value equivalent to the two hex digits hh. For example: `\x4c` is the same as `L`, or `\x23` the same as `#`. |
| `\uhhhh` | unicode character | a character whose code unit value has a hex value equivalent to the four hex digits hhhh. |
| `\0` | null | a null character (same as `\u0000`). |
| `\int` | backreference | the result of the submatch whose opening parenthesis is the int-th (int shall begin with a digit other than 0). See groups below for more info. |
| `\d` | digit | a decimal digit character (same as `[[:digit:]]`). |
| `\D` | not digit | any character that is not a decimal digit character (same as `[^[:digit:]]`). |
| `\s` | whitespace | a whitespace character (same as `[[:space:]]`). |
| `\S` | not whitespace | any character that is not a whitespace character (same as `[^[:space:]]`). |
| `\w` | word | an alphanumeric or underscore character (same as `[_[:alnum:]]`). |
| `\W` | not word | any character that is not an alphanumeric or underscore character (same as `[^_[:alnum:]]`). |
| `\character` | character | the character character as it is, without interpreting its special meaning within a regular expression. Any character can be escaped except those which form any of the special character sequences above. Needed for: `^ $ \ . * + ? ( ) [ ] { } \|` |
| `[class]` | character class | the target character is part of the class (see character classes below). |
| `[^class]` | negated character class | the target character is not part of the class (see character classes below). |

Notice that, in C++, character and string literals also escape characters using the backslash character (\), and this affects the syntax for constructing regular expressions from such literals. For example:
```
std::regex e1 ("\\d");  // regular expression: \d -> matches a digit character
std::regex e2 ("\\\\"); // regular expression: \\ -> matches a single backslash (\) character
```
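The double escaping shown above can be avoided with C++11 raw string literals, which the text above does not mention. The following is a minimal sketch under that assumption; the helper names `is_all_digits` and `contains_backslash` are hypothetical and exist only for illustration.

```cpp
#include <regex>
#include <string>

// Returns true if `text` consists only of decimal digits.
// The raw string literal R"(\d+)" holds the same pattern as "\\d+".
bool is_all_digits(const std::string& text) {
    static const std::regex digits(R"(\d+)");
    return std::regex_match(text, digits);
}

// Returns true if `text` contains at least one backslash character.
// R"(\\)" is the regular expression \\ , i.e. the same as the literal "\\\\".
bool contains_backslash(const std::string& text) {
    static const std::regex backslash(R"(\\)");
    return std::regex_search(text, backslash);
}
```

Inside a raw string literal the compiler leaves backslashes untouched, so the pattern reads exactly as the regex engine sees it.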
 


Quantifiers

Quantifiers follow a character or a special pattern character. They modify the number of times that character is repeated in the match:

| characters | times | effects |
|---|---|---|
| `*` | 0 or more | The preceding atom is matched 0 or more times. |
| `+` | 1 or more | The preceding atom is matched 1 or more times. |
| `?` | 0 or 1 | The preceding atom is optional (matched either 0 times or once). |
| `{int}` | int | The preceding atom is matched exactly int times. |
| `{int,}` | int or more | The preceding atom is matched int or more times. |
| `{min,max}` | between min and max | The preceding atom is matched at least min times, but not more than max. |

By default, all these quantifiers are greedy (i.e., they take as many characters that meet the condition as possible). This behavior can be changed to non-greedy (i.e., take as few characters that meet the condition as possible) by adding a question mark (?) after the quantifier.
For example:
Matching "(a+).*" against "aardvark" succeeds and yields aa as the first submatch.
Matching "(a+?).*" against "aardvark" also succeeds, but yields a as the first submatch.
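The greedy and non-greedy behavior described above can be checked with a small helper. This is a minimal sketch; `first_submatch` is a hypothetical name introduced here, not part of the standard library.

```cpp
#include <regex>
#include <string>

// Matches `pattern` against the whole of `text` and returns the first
// submatch, or an empty string when the match fails or the pattern
// has no capturing group.
std::string first_submatch(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_match(text, m, re) && m.size() > 1) {
        return m[1].str();
    }
    return std::string();
}
```

With this helper, `first_submatch("(a+).*", "aardvark")` returns `aa`, while `first_submatch("(a+?).*", "aardvark")` returns `a`, because the added `?` makes `a+` stop after the first `a`.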

Groups

Groups allow quantifiers to be applied to a sequence of characters (instead of a single character). There are two kinds of groups:

| characters | description | effects |
|---|---|---|
| `(subpattern)` | Group | Creates a backreference. |
| `(?:subpattern)` | Passive group | Does not create a backreference. |

When a group creates a backreference, the characters that represent the subpattern in the target sequence are stored as a submatch. Submatches are numbered in the order in which their opening parentheses appear (the first submatch is number 1, the second is number 2, and so on).

These submatches can be used in the regular expression itself to specify that the entire subpattern should appear again somewhere else (see \int in the special characters list). They can also be used in the replacement string or retrieved in the match_results object filled by some regex operations.
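The three uses of submatches described above (inside the pattern, in a replacement string, and through the compiled regex object) can be sketched as follows. The function names are hypothetical and chosen only for this illustration.

```cpp
#include <regex>
#include <string>

// Returns the first word that appears twice in a row, or "" if there is none.
// \1 inside the pattern refers back to whatever group 1 matched.
std::string first_doubled_word(const std::string& text) {
    static const std::regex doubled("\\b(\\w+)\\s+\\1\\b");
    std::smatch m;
    if (std::regex_search(text, m, doubled)) {
        return m[1].str();
    }
    return std::string();
}

// Rewrites every "key=value" as "value=key" by using $1 and $2
// in the replacement string.
std::string swap_pairs(const std::string& text) {
    static const std::regex pair("(\\w+)=(\\w+)");
    return std::regex_replace(text, pair, "$2=$1");
}

// Returns how many groups in `pattern` create a submatch.
// Passive groups (?:...) are not counted.
unsigned capturing_groups(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}
```

For instance, `first_doubled_word("this is is a test")` returns `is`, and `capturing_groups("(a)(?:b)(c)")` returns 2 because the middle group is passive.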

Assertions

Assertions are conditions that do not consume characters in the target sequence: they do not describe a character, but a condition that must be fulfilled before or after a character.

| characters | description | condition for match |
|---|---|---|
| `^` | Beginning of line | Either it is the beginning of the target sequence, or follows a line terminator. |
| `$` | End of line | Either it is the end of the target sequence, or precedes a line terminator. |
| `\b` | Word boundary | The previous character is a word character and the next is a non-word character (or vice-versa). Note: The beginning and the end of the target sequence are considered here as non-word characters. |
| `\B` | Not a word boundary | The previous and next characters are both word characters or both are non-word characters. Note: The beginning and the end of the target sequence are considered here as non-word characters. |
| `(?=subpattern)` | Positive lookahead | The characters following the assertion must match subpattern, but no characters are consumed. |
| `(?!subpattern)` | Negative lookahead | The characters following the assertion must not match subpattern, but no characters are consumed. |
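A short sketch of the word-boundary and lookahead assertions from the table above. The function names and the sample patterns are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts occurrences of "sub" as a whole word; \b on both sides
// rejects longer words such as "subject".
std::size_t count_whole_word_sub(const std::string& text) {
    static const std::regex re("\\bsub\\b");
    std::sregex_iterator first(text.begin(), text.end(), re);
    std::sregex_iterator last;
    return static_cast<std::size_t>(std::distance(first, last));
}

// True if `text` contains at least one letter and at least one digit.
// Both positive lookaheads are tested at the start and consume nothing.
bool has_letter_and_digit(const std::string& text) {
    static const std::regex re("(?=.*[[:alpha:]])(?=.*\\d).+");
    return std::regex_match(text, re);
}

// True if `text` contains "foo" that is not immediately followed by "bar".
bool has_foo_without_bar(const std::string& text) {
    static const std::regex re("foo(?!bar)");
    return std::regex_search(text, re);
}
```

Because assertions consume no characters, the two lookaheads in `has_letter_and_digit` both inspect the same input before `.+` matches it.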

Alternatives

A pattern can include different alternatives:

| character | description | effects |
|---|---|---|
| `\|` | Separator | Separates two alternative patterns or subpatterns. |

A regular expression can contain multiple alternative patterns simply by separating them with the separator operator (|): the regular expression will match if any of the alternatives match, and as soon as one does.

Subpatterns (in groups or assertions) can also use the separator operator to separate different alternatives.
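The sketch below illustrates alternatives at the top level and inside a group. The helper names and the file-extension pattern are hypothetical examples, not taken from the text above.

```cpp
#include <regex>
#include <string>

// Returns the text matched by the first occurrence of `pattern` in `text`,
// or "" when nothing matches.
std::string first_match(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        return m.str();
    }
    return std::string();
}

// True if `name` ends in .jpg, .jpeg or .png, ignoring case.
// The alternatives are placed inside a passive group.
bool is_image_name(const std::string& name) {
    static const std::regex re(".+\\.(?:jpe?g|png)", std::regex::icase);
    return std::regex_match(name, re);
}
```

Under the ECMAScript grammar the alternatives are tried from left to right, so `first_match("ab|abc", "abc")` returns `ab` rather than the longer `abc`.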

Character classes

A character class defines a category of characters. It is introduced by enclosing its descriptors in square brackets ([ and ]).
The regex object attempts to match the entire character class against a single character in the target sequence (unless a quantifier specifies otherwise).
The character class can contain any combination of:
  • Individual characters: Any character specified is considered part of the class (except the characters \, [, ] and - when they have a special meaning as described in the following paragraphs).
    For example:
    [abc] matches a, b or c.
    [^xyz] matches any character except x, y and z.
  • Ranges: They can be specified by using the hyphen character (-) between two valid characters.
    For example:
    [a-z] matches any lowercase letter (a, b, c, ... through z).
    [abc1-5] matches either a, b or c, or a digit between 1 and 5.
  • POSIX-like classes: A whole set of predefined classes can be added to a custom character class. There are three kinds:

    | class | description | notes |
    |---|---|---|
    | `[:classname:]` | character class | Uses the regex traits' isctype member with the appropriate type obtained from applying the lookup_classname member on classname for the match. |
    | `[.classname.]` | collating sequence | Uses the regex traits' lookup_collatename to interpret classname. |
    | `[=classname=]` | character equivalents | Uses the regex traits' transform_primary on the result of regex_traits::lookup_collatename for classname to check for matches. |

    The choice of available classes depends on the regex traits type and on its selected locale. But at least the following character classes shall be recognized by any regex traits type and locale:

    | class | description | equivalent (with regex_traits, default locale) |
    |---|---|---|
    | `[:alnum:]` | alphanumeric character | isalnum |
    | `[:alpha:]` | alphabetic character | isalpha |
    | `[:blank:]` | blank character | isblank |
    | `[:cntrl:]` | control character | iscntrl |
    | `[:digit:]` | decimal digit character | isdigit |
    | `[:graph:]` | character with graphical representation | isgraph |
    | `[:lower:]` | lowercase letter | islower |
    | `[:print:]` | printable character | isprint |
    | `[:punct:]` | punctuation mark character | ispunct |
    | `[:space:]` | whitespace character | isspace |
    | `[:upper:]` | uppercase letter | isupper |
    | `[:xdigit:]` | hexadecimal digit character | isxdigit |
    | `[:d:]` | decimal digit character | isdigit |
    | `[:w:]` | word character | isalnum (or an underscore) |
    | `[:s:]` | whitespace character | isspace |

    Please note that the brackets in the class names are additional to those opening and closing the class definition.
    For example:
    [[:alpha:]] is a character class that matches any alphabetic character.
    [abc[:digit:]] is a character class that matches a, b, c, or a digit.
    [^[:space:]] is a character class that matches any character except a whitespace.
  • Escape characters: All escape characters described above can also be used within a character class specification. The only change is with \b, which here is interpreted as a backspace character (\u0008) instead of a word boundary.
    Notice that within a class definition, those characters that have a special meaning in the regular expression (such as *, . and $) don't have such a meaning and are interpreted as normal characters (so they do not need to be escaped). Instead, within a class definition, the hyphen (-) and the brackets ([ and ]) do have special meanings under some circumstances, in which case they should either be placed at a position within the class where they do not have such special meaning, or be escaped with a backslash (\).
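The following sketch combines a POSIX-like class, a negated class and an escaped hyphen, as described in the list above. The function names and the sample inputs are hypothetical.

```cpp
#include <regex>
#include <string>

// True if `s` looks like a six-digit hex color such as "#1a2B3c".
// [[:xdigit:]] is the POSIX-like class nested inside a character class.
bool is_hex_color(const std::string& s) {
    static const std::regex re("#[[:xdigit:]]{6}");
    return std::regex_match(s, re);
}

// Removes every character that is not alphanumeric, '_' or '-'.
// The hyphen is escaped with a backslash so it is not read as a range.
std::string keep_identifier_chars(const std::string& s) {
    static const std::regex re("[^[:alnum:]_\\-]");
    return std::regex_replace(s, re, "");
}
```

For example, `keep_identifier_chars("my file-name_01!.txt")` returns `myfile-name_01txt`: the space, the exclamation mark and the dot fall outside the class and are dropped.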

