C++ Regular Expressions

多弗朗强哥

Last modified 2023-08-29 17:06:23


First published 2023-07-23 08:59:31

Article link: https://blog.csdn.net/weixin_39759247/article/details/131749890


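A regular expression uses symbols to describe a whole class of strings, so that many strings can be processed in one go. std::string itself has no regex support. Regular expressions are provided separately by the C++ regular expression library (header <regex>, since C++11).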

The library covers four tasks: Match, Search, Tokenize (splitting a string into tokens), and Replace.

Regex grammars

C++ supports the following regex grammars (each one is a flag in std::regex_constants):

ECMAScript  // default grammar
awk
grep
egrep
basic
extended

Commonly used ECMAScript syntax:

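/* Quantifiers (apply to the preceding element) */
*             // any number of times, >= 0
+             // at least once, >= 1
?             // at most once (optional), <= 1

/* Anchors */
^             // start of the input (with the C++17 multiline flag, also the start of each line)
$             // end of the input (with the C++17 multiline flag, also the end of each line)
\b            // word boundary; zero-width, consumes no characters

/* Character classes */
\w, \W        // letter (upper or lower case), digit or underscore, e.g. [A-Za-z0-9_]; \W is the negation
\d, \D        // digit [0-9]; \D is the negation
\r, \n, \t    // carriage return, newline, tab
\s, \S        // whitespace (space, tab, newline, ...); \S is the negation
.             // any character except a line terminator
[…]           // any one of the characters listed in the brackets
[^…]          // any one character NOT listed in the brackets
[[:alnum:]]   // letter or digit
[[:alpha:]]   // letter
[[:lower:]]   // lowercase letter
[[:upper:]]   // uppercase letter
[[:punct:]]   // punctuation
[[:blank:]]   // space or tab
[[:space:]]   // whitespace
[[:cntrl:]]   // control character
[[:digit:]]   // digit
[[:xdigit:]]  // hexadecimal digit
[[:print:]]   // printable character, including space
[[:graph:]]   // printable character, excluding space

/* Grouping */
|             // alternation (logical OR)
(…)           // capture group
\1            // back-reference: the text captured by group 1
{n}, {n,}, {n,m}  // repeat the preceding element exactly n times, at least n times, or n to m times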

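The grammar is selected when the regex object is constructed, by passing one of the flags above to the regex constructor:

regex reg(pattern, regex_constants::grep); // pattern: the regex source string
regex_match(s, reg);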

Whole-string match

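#include <regex>
#include <string>
using namespace std;

string s("2023-08-29");
regex reg("\\d{4}-\\d{2}-\\d{2}");
bool ok = regex_match(s, reg); // true only if the WHOLE of s matches reg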

Substring search

Searching for a substring (regex_search succeeds if any part of the string matches):

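#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main(){
    string s("name: alice, id: 42");
    regex reg("(\\w+): (\\d+)");
    smatch m; // smatch for std::string; cmatch for const char*; wsmatch / wcmatch for wide strings

    bool res = regex_search(s, reg); // only tests whether some substring matches
    cout << res << endl;

    if (regex_search(s, m, reg)) {        // also stores details of the FIRST match in m
        cout << m.str() << endl;          // whole matched text, same as m[0].str()
        cout << m.size() << endl;         // number of sub-matches: capture groups + 1
        cout << m[1].str() << endl;       // text of group 1
        cout << m.str(2) << endl;         // text of group 2
        cout << m.prefix().str() << endl; // text before the whole match
        cout << m.suffix().str() << endl; // text after the whole match
        cout << m.position(2) << endl;    // offset of group 2 from the start of s
    }
    return 0;
}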

Iterating over all matches with sregex_iterator:

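#include <algorithm>
#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main(){
    string s("tag: <head1>mmmm</head1> <href>alading</href> binggo! <head>nnnnn</head> end!");
    regex reg("<(.*)>(.*)</(\\1)>"); // \\1: the closing tag name must equal group 1

    sregex_iterator beg(s.cbegin(), s.cend(), reg); // positioned on the first match
    sregex_iterator end;                            // default-constructed = end of sequence

    for_each(beg, end, [](const smatch& m){
        cout << m.str() << endl;  // whole match, e.g. <head1>mmmm</head1>
        cout << m.str(1) << endl; // tag name
        cout << m.str(2) << endl; // content between the tags
    });

    return 0;
}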

First match of a repeated tag (regex_search returns only the first occurrence):

#include <iostream>
#include <string>
#include <regex>

using namespace std;

int main() {
    string s("tag: <head>mmmm</head> <href>alading</href> binggo! <head>nnnnn</head> end!");
    regex reg("<([^>]*)>([^<]*)</\\1>");
    smatch match;

    if (regex_search(s, match, reg)) {
        cout << "Matched tag: " << match[0] << endl;
        cout << "Opening tag: " << match[1] << endl;
        cout << "Content: " << match[2] << endl;
    } else {
        cout << "No match found." << endl;
    }

    return 0;
}

Splitting a string (tokenizing)

#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main(){
    string s("alian, bob, conda, davi");
    regex reg("[ \t\n]*[;,.][ \t\n]*");

    sregex_token_iterator beg(s.cbegin(),s.cend(),reg,-1); // -1: text between delimiters; 0: the delimiters themselves (whole match); n: group n of each match
    sregex_token_iterator end;

    for(; beg != end; beg++){
        cout << *beg << endl;
    }
    
    return 0;
}

Substring replacement

Replace every match with a formatted string:

#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main(){
    string s("tag: <head1>mmmm</head1> <href>alading</href> binggo! <head>nnnnn</head> end!");
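    regex reg("<(.*)>(.*)</(\\1)>");
    string target(R"({$1 $2})"); // $1 = group 1 (tag name), $2 = group 2 (content)

    string t = regex_replace(s,reg,target); // replaces every match; text outside the matches is copied unchanged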
    cout << t << endl;
    return 0;
}

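Note that the placeholders here use the $ sign ($1, $2, ..., and $& for the whole match), following the default ECMAScript format rules. To use sed-style \1 instead, pass regex_constants::format_sed as the fourth argument of regex_replace.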
