Using C++ regex (regular expressions)

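In C++ you can choose among three kinds of regular expressions: C++ regex (std::regex, in the <regex> header), C regex, and boost regex. When developing C++ on Windows, the latter two are not available out of the box, so if you want to get going quickly, C++ regex is clearly the most convenient choice. This article covers how to use C++ regex.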

C++ regex provides three functions: regex_match, regex_search, and regex_replace.

regex_match

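regex_match checks whether a regular expression matches the entire target sequence. The example below illustrates it; for a systematic description, see the regex_match reference.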

// regex_match example
#include <iostream>
#include <string>
#include <regex>

int main ()
{

  if (std::regex_match ("subject", std::regex("(sub)(.*)") ))
    std::cout << "string literal matched\n";

  std::string s ("subject");
  std::regex e ("(sub)(.*)");
  if (std::regex_match (s,e))
    std::cout << "string object matched\n";

  if ( std::regex_match ( s.begin(), s.end(), e ) )
    std::cout << "range matched\n";

  std::cmatch cm;    // same as std::match_results<const char*> cm;
  std::regex_match ("subject",cm,e);
  std::cout << "string literal with " << cm.size() << " matches\n";

  std::smatch sm;    // same as std::match_results<std::string::const_iterator> sm;
  std::regex_match (s,sm,e);
  std::cout << "string object with " << sm.size() << " matches\n";

  std::regex_match ( s.cbegin(), s.cend(), sm, e);
  std::cout << "range with " << sm.size() << " matches\n";

  // using explicit flags:
  std::regex_match ( "subject", cm, e, std::regex_constants::match_default );

  std::cout << "the matches were: ";
  for (unsigned i=0; i<sm.size(); ++i) {
    std::cout << "[" << sm[i] << "] ";
  }

  std::cout << std::endl;

  return 0;
}
Output:

string literal matched
string object matched
range matched
string literal with 3 matches
string object with 3 matches
range with 3 matches
the matches were: [subject] [sub] [ject]

regex_search

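regex_search is another regular-expression matching function; an example follows. The main difference between the two: regex_match requires the whole target sequence to match, while regex_search looks for a matching substring anywhere in it. For a systematic description, see the regex_search reference.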

// regex_search example
#include <iostream>
#include <regex>
#include <string>

int main(){
  std::string s ("this subject has a submarine as a subsequence");
  std::smatch m;
  std::regex e ("\\b(sub)([^ ]*)");   // matches words beginning by "sub"

  std::cout << "Target sequence: " << s << std::endl;
  std::cout << "Regular expression: /\\b(sub)([^ ]*)/" << std::endl;
  std::cout << "The following matches and submatches were found:" << std::endl;

  while (std::regex_search (s,m,e)) {
    for (auto x=m.begin();x!=m.end();x++) 
      std::cout << x->str() << " ";
    std::cout << "--> ([^ ]*) match " << m.format("$2") <<std::endl;
    s = m.suffix().str();
  }
}

Output:

Target sequence: this subject has a submarine as a subsequence
Regular expression: /\b(sub)([^ ]*)/
The following matches and submatches were found:
subject sub ject --> ([^ ]*) match ject
submarine sub marine --> ([^ ]*) match marine
subsequence sub sequence --> ([^ ]*) match sequence

Author: 没有开花的树
Blog: blog.csdn.net/mycwq
regex_replace

regex_replace replaces the parts of a string that match a regular expression; an example follows. For a systematic description, see the regex_replace reference.

#include <cstring>   // std::strlen
#include <iostream>
#include <regex>
#include <string>

int main() {
    char buf[20];
    const char *first = "axayaz";
    const char *last = first + std::strlen(first);
    std::regex rx("a");
    std::string fmt("A");
    std::regex_constants::match_flag_type fonly =
        std::regex_constants::format_first_only;

    // Output-iterator overload: writes into buf and returns an iterator one past
    // the last character written, where the terminating '\0' is stored.
    *std::regex_replace(&buf[0], first, last, rx, fmt) = '\0';
    std::cout << &buf[0] << std::endl;

    // format_first_only: only the first match is replaced.
    *std::regex_replace(&buf[0], first, last, rx, fmt, fonly) = '\0';
    std::cout << &buf[0] << std::endl;

    // std::string overload: returns a new string.
    std::string str("adaeaf");
    std::cout << std::regex_replace(str, rx, fmt) << std::endl;

    std::cout << std::regex_replace(str, rx, fmt, fonly) << std::endl;

    return 0;
}
Output:
AxAyAz
Axayaz
AdAeAf
Adaeaf

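The syntax of C++ regular expressions (std::regex uses the ECMAScript grammar by default) is much the same as in other programming languages: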

Special characters (for matching characters that are hard to write literally):

.  (not newline): any character except line terminators (LF, CR, LS, PS).
\t  (tab, HT): a horizontal tab character (same as \u0009).
\n  (newline, LF): a newline (line feed) character (same as \u000A).
\v  (vertical tab, VT): a vertical tab character (same as \u000B).
\f  (form feed, FF): a form feed character (same as \u000C).
\r  (carriage return, CR): a carriage return character (same as \u000D).
\cletter  (control code): a control code character whose code unit value is the same as the remainder of dividing the code unit value of letter by 32. For example, \ca is the same as \u0001, \cb the same as \u0002, and so on.
\xhh  (ASCII character): a character whose code unit value has a hex value equivalent to the two hex digits hh. For example, \x4c is the same as L, and \x23 the same as #.
\uhhhh  (Unicode character): a character whose code unit value has a hex value equivalent to the four hex digits hhhh.
\0  (null): a null character (same as \u0000).
\int  (backreference): the result of the submatch whose opening parenthesis is the int-th (int must begin with a digit other than 0). See groups below for more info.
\d  (digit): a decimal digit character.
\D  (not digit): any character that is not a decimal digit character.
\s  (whitespace): a whitespace character.
\S  (not whitespace): any character that is not a whitespace character.
\w  (word): an alphanumeric or underscore character.
\W  (not word): any character that is not an alphanumeric or underscore character.
\character  (character): the character as it is, without interpreting its special meaning within a regex expression. Any character can be escaped except those that form one of the special character sequences above. Escaping is needed for: ^ $ \ . * + ? ( ) [ ] { } |
[class]  (character class): the target character is part of the class.
[^class]  (negated character class): the target character is not part of the class.
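Note: in a C++ string literal the backslash (\) is itself an escape character, so every backslash in a pattern has to be doubled:
std::regex e1 ("\\d");  //  \d -> matches a digit character
std::regex e2 ("\\\\"); //  \\ -> matches a backslash character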

Quantifiers

*  (0 or more): the preceding atom is matched 0 or more times.
+  (1 or more): the preceding atom is matched 1 or more times.
?  (0 or 1): the preceding atom is optional (matched either 0 times or once).
{int}  (exactly int): the preceding atom is matched exactly int times.
{int,}  (int or more): the preceding atom is matched int or more times.
{min,max}  (between min and max): the preceding atom is matched at least min times, but not more than max.

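Note: quantifiers are greedy by default; appending ? (as in +?, *?, ?? or {min,max}?) makes them lazy, so they match as few times as possible. For example, when "(a+).*" is matched against "aardvark", group 1 captures aa, whereas with "(a+?).*" it captures just a.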

Groups (treat several consecutive characters as a single unit):

(subpattern)  (group): creates a backreference.
(?:subpattern)  (passive group): does not create a backreference.
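Note: the first form creates a backreference, which lets you extract the matched text afterwards; the second does not, so it avoids the cost of recording that submatch.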

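Anchors and alternation:

^  (beginning of line): matches if it is the beginning of the target sequence, or follows a line terminator.
$  (end of line): matches if it is the end of the target sequence, or precedes a line terminator.
|  (separator): separates two alternative patterns or subpatterns.
(Whether ^ and $ also match at line terminators may depend on the std::regex::multiline flag added in C++17 and on the standard library implementation.)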

Single characters

[abc] matches a, b, or c.
[^xyz] matches any character other than x, y, or z.

Ranges
[a-z] matches any lowercase letter (a, b, c, ..., z).
[abc1-5] matches a, b, c, or a digit from 1 to 5.

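C++ regex also supports POSIX-style character classes; they must be written inside a bracket expression, e.g. [[:digit:]]: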

Each entry gives the class, its description, and its equivalent (with regex_traits, default locale):

[:alnum:]  alphanumeric character  (isalnum)
[:alpha:]  alphabetic character  (isalpha)
[:blank:]  blank character  (isblank)
[:cntrl:]  control character  (iscntrl)
[:digit:]  decimal digit character  (isdigit)
[:graph:]  character with graphical representation  (isgraph)
[:lower:]  lowercase letter  (islower)
[:print:]  printable character  (isprint)
[:punct:]  punctuation mark character  (ispunct)
[:space:]  whitespace character  (isspace)
[:upper:]  uppercase letter  (isupper)
[:xdigit:]  hexadecimal digit character  (isxdigit)
[:d:]  decimal digit character  (isdigit)
[:w:]  word character  (isalnum, plus the underscore _)
[:s:]  whitespace character  (isspace)

References:

http://blog.csdn.net/mycwq/article/details/18838151

http://www.cplusplus.com/reference/regex/

