C++ regex: Using Regular Expressions


稻草人旅行

Article link: https://blog.csdn.net/weixin_31602525/article/details/114976275


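C++ code can choose among three regular expression libraries: C++ regex (std::regex from the <regex> header, available since C++11), C regex (the POSIX <regex.h> interface), and Boost regex. On Windows the latter two are not available out of the box, so C++ regex is the most convenient choice when you want to get started quickly. This article covers how to use C++ regex.

C++ regex provides three functions: regex_match, regex_search, and regex_replace.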

regex_match

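regex_match checks whether a regular expression matches an entire target sequence; the example below shows how to use it. For a systematic description, see the regex_match reference.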

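// regex_match example
#include <iostream>
#include <regex>
#include <string>

int main ()
{
  // match a string literal against a temporary regex
  if (std::regex_match ("subject", std::regex("(sub)(.*)") ))
    std::cout << "string literal matched\n";

  std::string s ("subject");
  std::regex e ("(sub)(.*)");

  // match a std::string
  if (std::regex_match (s,e))
    std::cout << "string object matched\n";

  // match an iterator range
  if ( std::regex_match ( s.begin(), s.end(), e ) )
    std::cout << "range matched\n";

  std::cmatch cm;    // same as std::match_results<const char*> cm;
  std::regex_match ("subject",cm,e);
  std::cout << "string literal with " << cm.size() << " matches\n";

  std::smatch sm;    // same as std::match_results<std::string::const_iterator> sm;
  std::regex_match (s,sm,e);
  std::cout << "string object with " << sm.size() << " matches\n";

  std::regex_match ( s.cbegin(), s.cend(), sm, e);
  std::cout << "range with " << sm.size() << " matches\n";

  // using explicit flags:
  std::regex_match ( "subject", cm, e, std::regex_constants::match_default );

  // cm[0] is the whole match, cm[1] and cm[2] are the two groups
  std::cout << "the matches were: ";
  for (unsigned i=0; i<cm.size(); ++i) {
    std::cout << "[" << cm[i] << "] ";
  }
  std::cout << std::endl;

  return 0;
}

Output: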

string literal matched

string object matched

range matched

string literal with 3 matches

string object with 3 matches

range with 3 matches

the matches were: [subject] [sub] [ject]
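As a smaller illustration of the same idea, the sketch below uses regex_match to validate a whole line and pull out its capture groups. The helper name parse_key_value and the "key=value" line format are assumptions made up for this example; they do not come from the original article.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: splits a "key=value" line into its two parts.
// Returns false when the whole line does not have that shape.
bool parse_key_value(const std::string& line, std::string& key, std::string& value) {
    // group 1: one or more word characters, group 2: everything after '='
    static const std::regex pattern("(\\w+)=(.*)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) {
        return false;  // regex_match requires the entire line to match
    }
    key = m[1].str();    // m[0] would be the whole match
    value = m[2].str();
    return true;
}
```

With this sketch, parse_key_value("name=regex", k, v) should succeed and leave "name" in k and "regex" in v, while a line without '=' is rejected.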

regex_search

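regex_search is the other matching function; an example follows. The main difference between the two is that regex_match succeeds only if the whole target sequence matches the pattern, whereas regex_search looks for any substring that matches. For a systematic description, see the regex_search reference.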

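// regex_search example
#include <iostream>
#include <regex>
#include <string>

int main(){
  std::string s ("this subject has a submarine as a subsequence");
  std::smatch m;
  std::regex e ("\\b(sub)([^ ]*)");   // matches words beginning by "sub"

  std::cout << "Target sequence: " << s << std::endl;
  std::cout << "Regular expression: /\\b(sub)([^ ]*)/" << std::endl;
  std::cout << "The following matches and submatches were found:" << std::endl;

  while (std::regex_search (s,m,e)) {
    // print the whole match followed by each submatch
    for (auto x=m.begin();x!=m.end();x++)
      std::cout << x->str() << " ";
    std::cout << "--> ([^ ]*) match " << m.str(2) << std::endl;
    // continue searching in the text after this match
    s = m.suffix().str();
  }

  return 0;
}

Output: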

Target sequence: this subject has a submarine as a subsequence

Regular expression: /\b(sub)([^ ]*)/

The following matches and submatches were found:

subject sub ject --> ([^ ]*) match ject

submarine sub marine --> ([^ ]*) match marine

subsequence sub sequence --> ([^ ]*) match sequence
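The loop above shortens the string after every match. A common alternative, sketched here under the assumption that only the matched words are wanted, is std::sregex_iterator, which walks over all matches without modifying the input. The function name find_sub_words is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every word that begins with "sub".
std::vector<std::string> find_sub_words(const std::string& text) {
    static const std::regex pattern("\\b(sub)([^ ]*)");
    std::vector<std::string> words;
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end; it != end; ++it) {
        words.push_back(it->str());  // it->str() is the whole match (group 0)
    }
    return words;
}
```

For the target sequence used above, this sketch should collect subject, submarine, and subsequence, in that order.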

(Original author: 没有开花的树, blog: blog.csdn.net/mycwq)

regex_replace

regex_replace replaces the text matched by a regular expression; an example follows. For a systematic description, see the regex_replace reference.

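#include <cstring>
#include <iostream>
#include <regex>
#include <string>

int main() {
  char buf[20];
  const char *first = "axayaz";
  const char *last = first + std::strlen(first);
  std::regex rx("a");
  std::string fmt("A");
  // replace only the first match instead of all matches
  std::regex_constants::match_flag_type fonly =
      std::regex_constants::format_first_only;

  // iterator form: writes into buf and returns the end of the output
  *std::regex_replace(&buf[0], first, last, rx, fmt) = '\0';
  std::cout << &buf[0] << std::endl;

  *std::regex_replace(&buf[0], first, last, rx, fmt, fonly) = '\0';
  std::cout << &buf[0] << std::endl;

  // string form: returns a new std::string
  std::string str("adaeaf");
  std::cout << std::regex_replace(str, rx, fmt) << std::endl;
  std::cout << std::regex_replace(str, rx, fmt, fonly) << std::endl;

  return 0;
}

Output: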

AxAyAz

Axayaz

AdAeAf

Adaeaf
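The format string passed to regex_replace can also refer to capture groups: with the default ECMAScript format rules, $1, $2, and so on stand for the corresponding submatches. The following is a minimal sketch; the helper name swap_name_order and the "Last, First" input format are assumptions for illustration only.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites "Last, First" as "First Last".
std::string swap_name_order(const std::string& name) {
    static const std::regex pattern("(\\w+), (\\w+)");
    // $2 and $1 insert the second and first capture groups.
    return std::regex_replace(name, pattern, "$2 $1");
}
```

Text that the pattern does not match is copied to the result unchanged, so an input without a comma comes back as it was.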

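The pattern syntax of C++ regex (the ECMAScript grammar by default) is much the same as in other programming languages:

Special pattern characters (for matching characters that are hard to write literally):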

| characters | description | matches |
| --- | --- | --- |
| . | not newline | any character except line terminators (LF, CR, LS, PS). |
| \t | tab (HT) | a horizontal tab character (same as \u0009). |
| \n | newline (LF) | a newline (line feed) character (same as \u000A). |
| \v | vertical tab (VT) | a vertical tab character (same as \u000B). |
| \f | form feed (FF) | a form feed character (same as \u000C). |
| \r | carriage return (CR) | a carriage return character (same as \u000D). |
| \cletter | control code | a control code character whose code unit value is the same as the remainder of dividing the code unit value of letter by 32. For example: \ca is the same as \u0001, \cb the same as \u0002, and so on. |
| \xhh | ASCII character | a character whose code unit value has a hex value equivalent to the two hex digits hh. For example: \x4c is the same as L, and \x23 the same as #. |
| \uhhhh | unicode character | a character whose code unit value has a hex value equivalent to the four hex digits hhhh. |
| \0 | null | a null character (same as \u0000). |
| \int | backreference | the result of the submatch whose opening parenthesis is the int-th (int shall begin with a digit other than 0). See groups below for more info. |
| \d | digit | a decimal digit character |
| \D | not digit | any character that is not a decimal digit character |
| \s | whitespace | a whitespace character |
| \S | not whitespace | any character that is not a whitespace character |
| \w | word | an alphanumeric or underscore character |
| \W | not word | any character that is not an alphanumeric or underscore character |
| \character | character | the character character as it is, without interpreting its special meaning within a regular expression. Any character can be escaped except those which form any of the special character sequences above. |
| [class] | character class | the target character is part of the class |
| [^class] | negated character class | the target character is not part of the class |

Escaping with \character is needed for: ^ $ \ . * + ? ( ) [ ] { } |
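To make the backreference entry (\int) more concrete, here is a small sketch that looks for a word immediately followed by itself. The helper name first_repeated_word is hypothetical, and treating a single space as the only separator between the two words is a simplifying assumption.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first word that is immediately repeated
// (for example "is" in "this is is a test"), or an empty string if none.
std::string first_repeated_word(const std::string& text) {
    // (\w+) captures a word; \1 must then match exactly the same text again.
    static const std::regex pattern("\\b(\\w+) \\1\\b");
    std::smatch m;
    if (std::regex_search(text, m, pattern)) {
        return m[1].str();
    }
    return std::string();
}
```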

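Note that in C++ source code the backslash (\) is itself the escape character of string literals, so every backslash in a pattern has to be doubled:

std::regex e1 ("\\d"); // \d -> matches a digit character

std::regex e2 ("\\\\"); // \\ -> matches a backslash character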
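Since C++11, raw string literals offer a way around the doubled backslashes: inside R"(...)" a backslash is kept as written. The sketch below places both spellings side by side; the function names are invented for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helpers showing two spellings of the same pattern \d+.
bool all_digits_escaped(const std::string& s) {
    return std::regex_match(s, std::regex("\\d+"));    // ordinary literal: backslash doubled
}

bool all_digits_raw(const std::string& s) {
    return std::regex_match(s, std::regex(R"(\d+)"));  // raw literal: backslash written once
}

// The regex \\ (a literal backslash) needs only two characters in a raw literal.
bool contains_backslash(const std::string& s) {
    return std::regex_search(s, std::regex(R"(\\)"));
}
```

Both all_digits functions build the same regular expression, so they should accept and reject exactly the same inputs.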

Quantifiers:

| characters | times | effects |
| --- | --- | --- |
| * | 0 or more | The preceding atom is matched 0 or more times. |
| + | 1 or more | The preceding atom is matched 1 or more times. |
| ? | 0 or 1 | The preceding atom is optional (matched either 0 times or once). |
| {int} | int | The preceding atom is matched exactly int times. |
| {int,} | int or more | The preceding atom is matched int or more times. |
| {min,max} | between min and max | The preceding atom is matched at least min times, but not more than max. |

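Note that quantifiers are greedy by default, and appending ? makes them lazy. Matching "aardvark" against the pattern "(a+).*" captures aa in the group, whereas matching it against "(a+?).*" captures only a.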
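The greedy and lazy behaviour can be checked with a few lines of code. This is a minimal sketch; first_group is a hypothetical helper that returns whatever the first capture group matched.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: matches the whole text against the pattern and
// returns the contents of capture group 1 (empty string if no match).
std::string first_group(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (std::regex_match(text, m, std::regex(pattern))) {
        return m[1].str();
    }
    return std::string();
}
```

Called with "aardvark", the pattern "(a+).*" should give "aa" and the pattern "(a+?).*" should give "a".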

Groups (for matching a sequence of several characters as a unit):

| characters | description | effects |
| --- | --- | --- |
| (subpattern) | Group | Creates a backreference. |
| (?:subpattern) | Passive group | Does not create a backreference. |

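Note that the first form creates a backreference that can be used to extract the matched text, while the second does not and therefore avoids that bookkeeping overhead.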
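The difference between the two group forms shows up in how submatches are numbered. In the sketch below, the scheme is grouped with (?:...) only so that the alternation applies to it, which leaves the host name as group 1. The helper names and the URL shape are assumptions for illustration; std::regex::mark_count() reports how many capturing groups a pattern has.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: extracts the host part of a simple URL.
std::string host_of(const std::string& url) {
    // (?:...) groups the scheme alternatives without capturing them,
    // so the host is submatch 1 rather than submatch 2.
    static const std::regex pattern("(?:https?|ftp)://([^/]+).*");
    std::smatch m;
    return std::regex_match(url, m, pattern) ? m[1].str() : std::string();
}

// Hypothetical helper: number of capturing groups in a pattern.
unsigned capture_count(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}
```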

Assertions and alternatives:

| characters | description | condition for match |
| --- | --- | --- |
| ^ | Beginning of line | Either it is the beginning of the target sequence, or follows a line terminator. |
| $ | End of line | Either it is the end of the target sequence, or precedes a line terminator. |
| \| | Separator (vertical bar) | Separates two alternative patterns or subpatterns. |
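A short sketch of the anchors and the separator in use follows. It sticks to single-line inputs, because how ^ and $ treat embedded line terminators can differ between standard library implementations. All function names are invented for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helpers, each exercising one of the constructs above.
bool starts_with_sub(const std::string& s) {
    return std::regex_search(s, std::regex("^sub"));    // ^ anchors at the beginning
}

bool ends_with_ine(const std::string& s) {
    return std::regex_search(s, std::regex("ine$"));    // $ anchors at the end
}

bool is_cat_or_dog(const std::string& s) {
    return std::regex_match(s, std::regex("cat|dog"));  // | separates the alternatives
}
```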

Individual characters

[abc] matches a, b, or c.

[^xyz] matches any character other than x, y, and z.

Ranges

[a-z] matches any lowercase letter (a, b, c, ..., z).

[abc1-5] matches a, b, c, or a digit from 1 to 5.
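Character classes combine naturally with quantifiers. The sketch below uses ranges to validate a six-digit hexadecimal colour code and a negated class to strip everything except digits; both helper names and both input formats are assumptions chosen for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true for strings such as "#1a2B3c".
bool is_hex_color(const std::string& s) {
    // three ranges in one class, repeated exactly six times
    return std::regex_match(s, std::regex("#[0-9a-fA-F]{6}"));
}

// Hypothetical helper: removes every character that is not a digit.
std::string digits_only(const std::string& s) {
    // [^0-9] is the negated class: any character outside the range 0-9
    return std::regex_replace(s, std::regex("[^0-9]"), "");
}
```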

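C++ regex also supports POSIX-like class names. These names are written inside a bracket expression, so their brackets are in addition to the ones that open and close the class: for example, [[:alpha:]] matches any alphabetic character.

| class | description | equivalent (with regex_traits, default locale) |
| --- | --- | --- |
| [:alnum:] | alpha-numerical character | isalnum |
| [:alpha:] | alphabetic character | isalpha |
| [:blank:] | blank character | isblank |
| [:cntrl:] | control character | iscntrl |
| [:digit:] | decimal digit character | isdigit |
| [:graph:] | character with graphical representation | isgraph |
| [:lower:] | lowercase letter | islower |
| [:print:] | printable character | isprint |
| [:punct:] | punctuation mark character | ispunct |
| [:space:] | whitespace character | isspace |
| [:upper:] | uppercase letter | isupper |
| [:xdigit:] | hexadecimal digit character | isxdigit |
| [:d:] | decimal digit character | isdigit |
| [:w:] | word character | isalnum, plus the underscore |
| [:s:] | whitespace character | isspace |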
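The class names can be mixed with ordinary characters inside one bracket expression. As a minimal sketch (the helper names and the simplified notion of an identifier are assumptions, not part of the original article):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: a letter or underscore followed by letters,
// digits, or underscores (a simplified identifier rule).
bool is_identifier(const std::string& s) {
    return std::regex_match(s, std::regex("[[:alpha:]_][[:alnum:]_]*"));
}

// Hypothetical helper: replaces each run of whitespace with one space.
std::string collapse_whitespace(const std::string& s) {
    return std::regex_replace(s, std::regex("[[:space:]]+"), " ");
}
```

Note the doubled brackets: the outer pair delimits the character class, the inner pair belongs to the class name.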

References:

http://blog.csdn.NET/mycwq/article/details/18838151

http://www.cplusplus.com/reference/regex/
