C++ Regular Expressions

Introduction

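Regular expressions make string processing very convenient. The workflow is:

Define the rule for the strings you want to match as a regular expression
Match the target string against that rule
Operate on the match results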

The C++ regex library (header <regex>) implements all of these operations:

Define: regex pattern
Match: regex_match
Search: regex_search
Replace: regex_replace
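
The four operations listed above can be sketched end to end as follows. This is a minimal illustration rather than code from the article: the helper name demo_workflow and the sample input are assumptions made for the example.

```cpp
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper (not from the article): runs define -> match -> search -> replace
// on `text` and returns the text with every run of digits replaced by '#'.
std::string demo_workflow(const std::string& text) {
    // 1. Define the pattern; it is compiled here, at run time.
    const std::regex digits("\\d+");

    // 2. Match: true only if the whole string consists of digits.
    const bool whole = std::regex_match(text, digits);

    // 3. Search: locate the first run of digits anywhere in the string.
    std::smatch first;
    const bool found = std::regex_search(text, first, digits);

    std::cout << "whole match: " << whole << '\n';
    if (found) {
        std::cout << "first number: " << first.str() << '\n';
    }

    // 4. Replace: substitute every run of digits.
    return std::regex_replace(text, digits, "#");
}
```

Calling demo_workflow("room 42, floor 7") prints 0 for the whole match, reports 42 as the first number, and returns "room #, floor #".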

A regular expression can itself be viewed as a small programming language

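A regex object is compiled at run time, when it is initialized or assigned a new pattern
If the regular expression contains an error, a regex_error exception is thrown at run time
Constructing a regex object is relatively expensive, so avoid constructing one unnecessarily (for example, inside a loop); build it once and reuse it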
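
The two points above, run-time errors and construction cost, might be handled as in the following sketch. The helper names compiles and count_numeric are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: reports whether `pattern` compiles as a regular expression.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);  // compilation happens here, at run time
        return true;
    } catch (const std::regex_error& e) {
        std::cerr << "bad pattern \"" << pattern << "\": " << e.what() << '\n';
        return false;
    }
}

// Hypothetical helper: counts the lines that consist only of digits.
// The regex is a function-local static, so it is constructed once and reused.
std::size_t count_numeric(const std::vector<std::string>& lines) {
    static const std::regex digits("\\d+");
    std::size_t count = 0;
    for (const auto& line : lines) {
        if (std::regex_match(line, digits)) {
            ++count;
        }
    }
    return count;
}
```

Here compiles("(unclosed") returns false because the unbalanced parenthesis makes the constructor throw, while count_numeric never rebuilds its pattern no matter how many lines it is given.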

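Escape characters

As in Java, a backslash inside a string literal is first processed as an escape by the compiler, and only the result is passed on to the regex engine. So, as in Java, each backslash in the regular expression must be written as two backslashes in the source.

In C++, a bool is printed as the integer 0 or 1 by default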

string str = "520";
cout << regex_match(str, regex("\d+")) << endl;		//0
cout << regex_match(str, regex("\\d+")) << endl;	//1

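Without the doubled backslash the match fails: compilers typically treat the unknown escape \d as a plain d, so the pattern no longer means "digits". The code still compiles, but the compiler issues a warning such as:

'd': unrecognized character escape sequence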
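
An alternative to doubling every backslash is a C++11 raw string literal, in which backslashes are not processed by the compiler. The sketch below assumes a hypothetical helper named all_digits; the pattern is the same \d+ used in the snippet above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` consists entirely of digits.
bool all_digits(const std::string& s) {
    // R"(...)" is a raw string literal: the backslash reaches the regex
    // engine unchanged, so this is the same pattern as "\\d+".
    static const std::regex digits(R"(\d+)");
    return std::regex_match(s, digits);
}
```

With this, all_digits("520") is true and no escape-sequence warning is produced.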

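Flags

Flags can be specified when a regular expression is constructed; combine several with "|" (bitwise OR)

Flag	Meaning
icase	Ignore case when matching
nosubs	Do not store matched sub-expressions
optimize	Favor matching speed over construction speed
ECMAScript	Use the ECMA-262 grammar (the default)
basic	Use the POSIX basic regular expression grammar
extended	Use the POSIX extended regular expression grammar
awk	Use the grammar of POSIX awk
grep	Use the grammar of POSIX grep
egrep	Use the grammar of POSIX egrep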
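
As a small sketch of combining flags from the table with "|", the example below builds one pattern with both icase and optimize. The helper name is_yes and the pattern are assumptions chosen for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: accepts "y" or "yes" in any letter case.
bool is_yes(const std::string& s) {
    // Two flags combined with bitwise OR. No grammar flag is given,
    // so the default ECMAScript grammar is used.
    static const std::regex yes("y(es)?", std::regex::icase | std::regex::optimize);
    return std::regex_match(s, yes);
}
```

The optimize flag fits here because the pattern is built once and matched many times.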

Case-insensitive matching

The most commonly used flag is icase, which ignores case

string str = "aAa";
cout << regex_match(str, regex("a*")) << endl;							//0
cout << regex_match(str, regex("a*", regex::icase)) << endl;		//1

Full match

bool regex_match(str, regex) checks whether the entire target string str matches the regular expression regex

Checking for a match

string str = "http://www.baidu.com";
string reg = "([a-zA-Z]+)://([^\\s]+)";	// group 1: scheme, group 2: everything after "://"
regex pattern(reg);

bool isMatch = regex_match(str, pattern);
if (isMatch) {
    cout << "regex_match(str, pattern) matched" << endl;
} else {
    cout << "regex_match(str, pattern) did not match" << endl;
}

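Capturing groups

On a successful match, the matched groups are stored in an smatch, whose size is "number of groups + 1".

smatch[0] holds the entire match (for regex_match, that is the whole input string), and indices 1 to N hold the captured groups. A group is the part of the regular expression enclosed in parentheses.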

string str = "http://www.baidu.com";
string reg = "([a-zA-Z]+)://([^\\s]+)";
regex pattern(reg);

smatch result;
bool isMatch = regex_match(str, result, pattern);
if (isMatch) {
    for (size_t i = 0; i < result.size(); i++) {
        cout << "regex_match(str, result, pattern) result[" << i << "] = " << result[i] << endl;
    }
} else {
    cout << "regex_match(str, result, pattern) did not match" << endl;
}

Output

regex_match(str, result, pattern) result[0] = http://www.baidu.com
regex_match(str, result, pattern) result[1] = http
regex_match(str, result, pattern) result[2] = www.baidu.com

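Search

bool regex_search(str, result, regex) performs a search: it extracts the first substring of str that matches the regular expression regex

Note: by default, search stops as soon as it finds the first match. To find every matching substring, use iterators.

Unlike regex_match, smatch[0] here holds only the matched substring, not the whole input. Indices 1 to N hold the captured groups, i.e. the parts of the regular expression enclosed in parentheses.

Finding the first match

By default only the first match is found, and the result is placed in the smatch variable.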

string str = "192.168.1.1 is an internal address, while 214.26.18.5 is an external address";
string reg = "(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})";
regex pattern(reg);
smatch result;
bool found = regex_search(str, result, pattern);
if (found) {
    cout << "result size = " << result.size() << endl;
    for (size_t i = 0; i < result.size(); i++) {
        cout << "result[" << i << "] = " << result.str(i) << endl;
    }
} else {
    cout << "regex_search(str, result, pattern) not found" << endl;
}

Output

result size = 5
result[0] = 192.168.1.1
result[1] = 192
result[2] = 168
result[3] = 1
result[4] = 1

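Finding all matches

The key to finding all matches is to search over a pair of string iterators. After each successful search, move the start iterator to the end of the match just found and search again, repeating until nothing more is found. (This is a loop rather than true recursion.)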

string str = "192.168.1.1 is an internal address, while 214.26.18.5 is an external address";
string reg = "(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})";
regex pattern(reg);
smatch result;

string::const_iterator iterStart = str.cbegin();
string::const_iterator iterEnd = str.cend();
while (regex_search(iterStart, iterEnd, result, pattern)) {
    cout << "result size = " << result.size() << endl;
    for (size_t i = 0; i < result.size(); i++) {
        cout << "result[" << i << "] = " << result.str(i) << endl;
    }
    iterStart = result[0].second;	// continue right after the end of this match
}

Output

result size = 5
result[0] = 192.168.1.1
result[1] = 192
result[2] = 168
result[3] = 1
result[4] = 1
result size = 5
result[0] = 214.26.18.5
result[1] = 214
result[2] = 26
result[3] = 18
result[4] = 5
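
The standard library also provides std::sregex_iterator, which packages the same "search, then advance past the match" loop. The following is a minimal sketch using the IP pattern from the example above; the function name find_all_ips is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every substring of `text` that looks like a dotted quad.
std::vector<std::string> find_all_ips(const std::string& text) {
    static const std::regex pattern("(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})");
    std::vector<std::string> found;
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end; it != end; ++it) {
        found.push_back(it->str());  // it->str(1) to it->str(4) would give the four groups
    }
    return found;
}
```

Each dereferenced iterator is an smatch, so the groups are available exactly as in the loop above.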

Replace

string regex_replace(str1, regex, str2) performs replacement: it returns a copy of str1 in which every substring that matches regex is replaced with str2

Replacing all occurrences

string str = "bob is a man, bob's age is twenty-eight, bob has a house and car!";
cout << regex_replace(str, regex("bob"), "tina") << endl;	// prints "tina is a man, tina's age is twenty-eight, tina has a house and car!"
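
When only the first occurrence should be replaced, regex_replace accepts the flag std::regex_constants::format_first_only as an optional fourth argument. A minimal sketch, with a hypothetical helper name replace_first:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces only the first match of `pattern` in `text`.
std::string replace_first(const std::string& text,
                          const std::string& pattern,
                          const std::string& replacement) {
    return std::regex_replace(text, std::regex(pattern), replacement,
                              std::regex_constants::format_first_only);
}
```

For example, replace_first("bob and bob", "bob", "tina") yields "tina and bob", whereas the call without the flag replaces both.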

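Characters that have a special meaning in a regular expression, such as $, { and }, must be escaped to be matched literally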

string str = "his name is ${name} and age is ${age}!";
regex pattern1("\\$\\{name\\}");
regex pattern2("\\$\\{age\\}");
str = regex_replace(str, pattern1, "bob");
cout << str << endl;										// prints: his name is bob and age is ${age}!
str = regex_replace(str, pattern2, "18");
cout << str << endl;										// prints: his name is bob and age is 18!

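Trimming leading and trailing whitespace

This regular expression uses alternation (logical OR): it matches either group A (leading whitespace) or group B (trailing whitespace), and whatever matches is replaced with an empty string.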

string trim(string str) {
	return regex_replace(str, regex("(^\\s*)|(\\s*$)"), "");
}

int main() {
	string str = "			大话西游				";
	cout << "\"" << trim(str) << "\"" << endl;		// prints "大话西游"
}

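Rearranging string content (reordering substrings, shortening the string)

In the replacement string, a dollar sign $ refers to a group; group numbers start at 1, for example $1, $2, $3, and so on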

string str = "2020-4-6";
regex pattern("(\\d+)-(\\d+)-(\\d+)");
cout << regex_replace(str, pattern, "$2/$3/$1") << endl;		// prints 4/6/2020
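
Besides numbered groups, the default (ECMAScript-style) replacement format also understands $&, which stands for the entire match. The sketch below shows both forms side by side; the helper names bracket_numbers and to_month_day_year are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: wraps every run of digits in square brackets using $& (the whole match).
std::string bracket_numbers(const std::string& text) {
    static const std::regex number("\\d+");
    return std::regex_replace(text, number, "[$&]");
}

// Hypothetical helper: reorders year-month-day into month/day/year using numbered groups.
std::string to_month_day_year(const std::string& date) {
    static const std::regex ymd("(\\d+)-(\\d+)-(\\d+)");
    return std::regex_replace(date, ymd, "$2/$3/$1");
}
```

Here bracket_numbers("a1b22") gives "a[1]b[22]", and to_month_day_year("2020-4-6") gives "4/6/2020".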