Note: inside a bracket expression such as [-. ], the . (dot) has no special meaning, so it does not need to be escaped with \.
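To make the note concrete, here is a minimal sketch comparing a dot inside and outside brackets. The helper name matches is made up for this illustration and is not part of the standard library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole string `s` matches `pattern`
// (default ECMAScript grammar).
bool matches(const std::string& s, const std::string& pattern) {
    return std::regex_match(s, std::regex(pattern));
}

// matches("x", ".")    -> a bare dot matches any single character
// matches("x", "[.]")  -> inside brackets the dot is literal, so "x" fails
// matches(".", "[.]")  -> the literal dot matches
// matches(".", "\\.")  -> outside brackets the dot has to be escaped
```

The bracket form is handy in C++ string literals because it avoids the double backslash.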
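This post does not cover regular expression syntax.
std::regex uses the ECMAScript grammar by default; see cplusplus std::ECMAScript syntax.
If you want to learn the POSIX syntax, see my other blog post, POSIX Regular Expressions.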
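The grammar is chosen with a flag passed to the regex constructor. The sketch below assumes a made-up helper, first_digits, and uses a POSIX character class so that the same pattern works under both the default ECMAScript grammar and std::regex::extended.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first run of digits in `s`, or an empty
// string if there is none. `grammar` selects the regex dialect;
// std::regex::ECMAScript is what you get when no flag is given.
std::string first_digits(const std::string& s,
                         std::regex::flag_type grammar = std::regex::ECMAScript) {
    // [[:digit:]] is accepted by the ECMAScript and POSIX extended grammars.
    std::regex re("[[:digit:]]+", grammar);
    std::smatch m;
    return std::regex_search(s, m, re) ? m.str() : std::string();
}
```

Calling first_digits("abc123def") and first_digits("abc123def", std::regex::extended) should both give "123".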
1. Finding the first match
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main(void)
{
    // "ei" preceded by any character other than 'c'
    string pattern = "[^c]ei";
    // extend the pattern to cover the whole word around it
    pattern = "[[:alpha:]]*" + pattern + "[[:alpha:]]*";
    regex policy(pattern);
    string text = "receipt freind theif receive";
    smatch results;
    if (regex_search(text, results, policy))
        cout << results.str() << endl;
}
Output:
freind
2. Finding all matches
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main (void)
{
    string pattern = "[^c]ei";
    pattern = "[[:alpha:]]*" + pattern + "[[:alpha:]]*";
    // ignore case
    regex policy (pattern, regex::icase);
    string text = "receipt freind theif receive";
    for (sregex_iterator it (text.begin(), text.end(), policy), end_it;
         it != end_it; ++it)
        cout << it->str() << endl;
}
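The part that is harder to understand is sregex_iterator it (text.begin(), text.end(), policy), end_it. This line defines the iterator it, which walks through the matches, and end_it, a default-constructed (empty) sregex_iterator that serves as the off-the-end iterator.
Output: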
freind
theif
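The same iterator pair can be wrapped in a small function that collects every match. This is only a sketch; find_all is a name invented here, not a standard function.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every match of `re` in `text`, in order.
std::vector<std::string> find_all(const std::string& text, const std::regex& re) {
    std::vector<std::string> out;
    const std::sregex_iterator end_it;  // default-constructed: off-the-end
    for (std::sregex_iterator it(text.begin(), text.end(), re); it != end_it; ++it)
        out.push_back(it->str());       // copy the match out of the iterator
    return out;
}
```

With the pattern and text from the example above, find_all returns the two words freind and theif. The strings are copied because the iterators refer to text and must not outlive it.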
3. Printing the context around each match
#include <iostream>
#include <regex>
using namespace std;
string text =
"Once there were two mice. They were friends. One mouse "
"lived in the country; the other mouse lived in the city."
"After many years the Country mouse saw the City mouse;"
"he said, \"Do come and see me at my house in the country."
"\" So the City mouse went. The City mouse said, \"This food"
"is not good, and your house is not good. Why do you live "
"in a hole in the field? You should come and live in the "
"city. You would live in a nice house made of stone. You "
"would have nice food to eat. You must come and see me at"
"my house in the city.\"The Country mouse went to the house"
"of the City mouse. It was a very good house. Nice food "
"was set ready for them to eat. But just as they began to"
"eat they heard a great noise. The City mouse cried, \" Run"
"! Run! The cat is coming!\" They ran away quickly and hid"
".After some time they came out. When they came out, the "
"Country mouse said, \"I do not like living in the city."
"I like living in my hole in the field. For it is nicer"
"to be poor and happy, than to be rich and afraid.";
int main (void)
{
    string pattern = "you";
    pattern = "[[:alpha:]]*" + pattern + "[[:alpha:]]*";
    regex policy (pattern, regex::icase);
    for (sregex_iterator it (text.begin(), text.end(), policy), end_it;
         it != end_it ; ++it) {
        // keep at most the last 40 characters of the prefix
        auto pos = it->prefix().length();
        pos = pos > 40 ? pos - 40 : 0;
        cout << it->prefix().str().substr (pos)
             << "\n\t\t>>> "
             << it->str()
             << " <<<\n"
             // and at most the first 40 characters of the suffix
             << it->suffix().str().substr (0, 40)
             << endl;
    }
}
Output:
mouse said, "This foodis not good, and
>>> your <<<
house is not good. Why do you live in a
house is not good. Why do
>>> you <<<
live in a hole in the field? You should
live in a hole in the field?
>>> You <<<
should come and live in the city. You w
should come and live in the city.
>>> You <<<
would live in a nice house made of ston
uld live in a nice house made of stone.
>>> You <<<
would have nice food to eat. You must c
would have nice food to eat.
>>> You <<<
must come and see me atmy house in the
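In the output above, the text printed before every match except the first starts right after the previous match, which is why some of those lines are shorter than 40 characters. When a sregex_iterator is incremented, prefix() of the new match begins at the end of the previous match rather than at the start of the input. A small sketch with a made-up helper, prefixes, shows this:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects prefix() of every match found while
// iterating over `text` with a std::sregex_iterator.
std::vector<std::string> prefixes(const std::string& text, const std::regex& re) {
    std::vector<std::string> out;
    const std::sregex_iterator end_it;
    for (std::sregex_iterator it(text.begin(), text.end(), re); it != end_it; ++it)
        out.push_back(it->prefix().str());
    return out;
}

// For the text "aa you bb you cc" and the pattern "you":
//   first match:  prefix is "aa "   (from the start of the input)
//   second match: prefix is " bb "  (from the end of the first match)
```

If the full left context is needed for every match, one option is to compute it from it->position() and the original string instead of prefix().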
4. Using subexpressions
#include <iostream>
#include <regex>
using namespace std;
string text =
"Once there were two mice. They were friends. One mouse "
"lived in the country; the other mouse lived in the city."
"After many years the Country mouse saw the City mouse;"
"he said, \"Do come and see me at my house in the country."
"\" So the City mouse went. The City mouse said, \"This food"
"is not good, and your house is not good. Why do you live "
"in a hole in the field? You should come and live in the "
"city. You would live in a nice house made of stone. You "
"would have nice food to eat. You must come and see me at"
"my house in the city.\"The Country mouse went to the house"
"of the City mouse. It was a very good house. Nice food "
"was set ready for them to eat. But just as they began to"
"eat they heard a great noise. The City mouse cried, \" Run"
"! Run! The cat is coming!\" They ran away quickly and hid"
".After some time they came out. When they came out, the "
"Country mouse said, \"I do not like living in the city."
"I like living in my hole in the field. For it is nicer"
"to be poor and happy, than to be rich and afraid.";
int main (void)
{
    // (.*?) is a non-greedy subexpression; (coming) is a second subexpression
    string pattern = "! The (.*?)(coming)[[:alnum:]]*";
    regex policy (pattern);
    for (sregex_iterator it (text.begin(), text.end(), policy), end_it; it != end_it ; ++it) {
        cout << "Whole match\n\t" << it->str() << "\n";
        // index 0 is the whole match; subexpressions start at index 1
        if ( (*it) [1].matched)
            cout << "Subexpression 1\n\t" << it->str (1) << "\n";
        if ( (*it) [2].matched)
            cout << "Subexpression 2\n\t" << it->str (2) << "\n";
    }
}
Output:
Whole match
! The cat is coming
Subexpression 1
cat is
Subexpression 2
coming
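The matched member used above is most useful with optional subexpressions, which may or may not take part in a match. As a sketch (the helper name balanced_area_code is made up for this post), the following checks that an area code either has both parentheses or neither:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true for "(201)" or "201", false for "(201" or "201)".
bool balanced_area_code(const std::string& s) {
    // group 1: optional '(', group 2: three digits, group 3: optional ')'
    static const std::regex re("(\\()?(\\d{3})(\\))?");
    std::smatch m;
    if (!std::regex_match(s, m, re))
        return false;
    // An optional group that did not participate has matched == false.
    return m[1].matched == m[3].matched;
}
```

Here m[1] and m[3] are sub_match objects, and comparing their matched flags says whether the two parentheses appear together.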
5. Find and replace
regex_replace()
finds matches and replaces them:
#include <iostream>
#include <regex>
#include <sstream>
#include <string>
using namespace std;
static const string text =
    "morgan (201) 555-2368 862-555-0123\n"
    "drew (973)555.0130\n"
    "lee (609) 555-0132 2015550175 800.555-0000";
int main (void)
{
    string phone_pattern =
        "(\\()?"     // optional opening parenthesis
        "(\\d{3})"   // area code
        "(\\))?"     // optional closing parenthesis
        "([-. ])?"   // optional separator
        "(\\d{3})"   // next three digits
        "([-. ])?"   // optional separator
        "(\\d{4})";  // last four digits
    regex policy (phone_pattern);
    string format = "$2.$5.$7"; // output format: xxx.xxx.xxxx
    istringstream input (text);
    string line;
    while (getline (input, line)) {
        cout << regex_replace (line, policy, format) << endl;
    }
}
Output:
morgan 201.555.2368 862.555.0123
drew 973.555.0130
lee 609.555.0132 201.555.0175 800.555.0000
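In format, $n refers to the n-th subexpression.
By default, regex_replace outputs the entire input sequence.
Parts that do not match the regular expression are copied unchanged, and matched parts are written according to the format string.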
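As a smaller illustration of $n and of unmatched text being copied through, here is a sketch that swaps two captured words. The helper name swap_names is invented for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites "last, first" as "first last".
// Text outside the match is copied to the result unchanged.
std::string swap_names(const std::string& s) {
    static const std::regex re("([[:alpha:]]+), ([[:alpha:]]+)");
    // $1 is the first subexpression, $2 the second.
    return std::regex_replace(s, re, "$2 $1");
}
```

For example, swap_names("id 7: Hopper, Grace!") gives "id 7: Grace Hopper!": the leading "id 7: " and the trailing "!" are not part of the match, so they pass through as they are.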
If only the matched parts are wanted, add the format_no_copy flag:
string fmt = "$2.$5.$7 "; // the trailing space separates the numbers
cout << regex_replace (line, policy, fmt, regex_constants::format_no_copy) << endl;
The output is now:
201.555.2368 862.555.0123
973.555.0130
609.555.0132 201.555.0175 800.555.0000
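The standard library defines flags that control matching or formatting during replacement. These flags can be passed to the functions regex_search and regex_match, or to the format member of the class smatch. For example, format_no_copy is a value of type match_flag_type, which is defined in the namespace std::regex_constants.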
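Match flags (values of type regex_constants::match_flag_type)
match_default | Equivalent to format_default |
match_not_bol | Do not treat the first character as the beginning of a line |
match_not_eol | Do not treat the last character as the end of a line |
match_not_bow | Do not treat the first character as the beginning of a word |
match_not_eow | Do not treat the last character as the end of a word |
match_any | If more than one match exists, any one of them may be returned |
match_not_null | Do not match an empty sequence |
match_continuous | The match must begin at the first character of the input |
match_prev_avail | The input sequence has valid content before its first character |
format_default | Format the replacement string using ECMAScript rules |
format_sed | Format the replacement string using POSIX sed rules |
format_no_copy | Do not output the unmatched parts of the input sequence |
format_first_only | Replace only the first match |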
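Two of the flags in the table can be tried out with the sketch below. The function names are made up for this post; the flags themselves come from std::regex_constants.

```cpp
#include <regex>
#include <string>

namespace rc = std::regex_constants;

// Hypothetical helper: replaces only the first '-' in `s`.
std::string replace_first_dash(const std::string& s) {
    static const std::regex dash("-");
    return std::regex_replace(s, dash, "+", rc::format_first_only);
}

// Hypothetical helper: searches for "abc" at the beginning of a line.
// With match_not_bol the first character of `s` is not treated as a line
// start, so '^' cannot match there.
bool search_line_start(const std::string& s,
                       rc::match_flag_type flags = rc::match_default) {
    static const std::regex re("^abc");
    return std::regex_search(s, re, flags);
}
```

replace_first_dash("a-b-c") gives "a+b-c", and search_line_start("abcdef") succeeds with the default flags but fails when rc::match_not_bol is passed. Flags are bitmasks, so several can be combined with |.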
Reference book: C++ Primer, 5th Edition (Chinese edition)
Usage notes
regex_replace()
Replace all:
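string replace_all (const string &text, const regex &pattern, const string &format)
{
    string result = text;
    // A single regex_replace call already replaces every match in one pass.
    // The loop repeats it in case a replacement produces new matches.
    // Caution: it never terminates if the replaced text still matches pattern.
    while (regex_search (result, pattern)) {
        result = regex_replace (result, pattern, format);
    }
    return result;
}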
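To see when repeating the replacement makes a difference, here is a sketch that collapses runs of spaces. Both function names are invented for this example. A single pass replaces each non-overlapping pair of spaces once, while the repeated version keeps going until no pair is left.

```cpp
#include <regex>
#include <string>

// One call: every non-overlapping "  " found in a single pass becomes " ".
std::string squeeze_once(const std::string& s) {
    static const std::regex two_spaces("  ");
    return std::regex_replace(s, two_spaces, " ");
}

// Repeats the replacement until no match remains. This terminates because
// every pass that finds a match makes the string shorter.
std::string squeeze_all(std::string s) {
    static const std::regex two_spaces("  ");
    while (std::regex_search(s, two_spaces))
        s = std::regex_replace(s, two_spaces, " ");
    return s;
}
```

For "a    b" (four spaces), squeeze_once leaves two spaces between the letters, and squeeze_all leaves one. For this particular job a pattern such as " +" would do it in one pass; the loop is only worth it when a replacement can create new matches.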