C++ Regular Expression Basics

Note: inside a bracket expression ([...]), . (dot) has no special meaning, so it does not need to be escaped with \.
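To make the note concrete, here is a minimal sketch. The helper name has_txt_extension and the file-name shape it accepts are assumptions made up for illustration. It uses [.] to match a literal dot; an unescaped . outside brackets would match any character.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` looks like "<name>.txt".
// Inside [...] the dot is literal, so "[.]" behaves like "\\.".
bool has_txt_extension(const std::string& s)
{
    static const std::regex re("[[:alnum:]_]+[.]txt");
    return std::regex_match(s, re);
}
```

With a bare dot, as in "[[:alnum:]_]+.txt", the string "notesxtxt" would also be accepted, because . then matches the x.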

This post does not cover regular expression syntax itself.
std::regex uses the ECMAScript grammar by default; see the ECMAScript syntax reference on cplusplus.com.

If you want to learn the POSIX syntax, see my other blog post, "POSIX Regular Expressions".
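As a rough illustration of how the grammar is chosen, the sketch below writes the same check twice: once with the default ECMAScript grammar and once with the std::regex::extended flag (POSIX extended). The function names are illustrative only.

```cpp
#include <regex>
#include <string>

// Default grammar (ECMAScript): \d is available as a digit shorthand.
bool is_number_ecmascript(const std::string& s)
{
    static const std::regex re("\\d+");
    return std::regex_match(s, re);
}

// POSIX extended grammar: use the [[:digit:]] character class instead.
bool is_number_posix_extended(const std::string& s)
{
    static const std::regex re("[[:digit:]]+", std::regex::extended);
    return std::regex_match(s, re);
}
```

The grammar flag is passed as the second argument of the std::regex constructor, in the same position as regex::icase in the examples in this post.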

1. Finding the first match

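#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main(void)
{
    // "ei" preceded by any character other than 'c'
    string pattern = "[^c]ei";
    // Extend the pattern so it covers the whole word around it
    pattern = "[[:alpha:]]*" + pattern + "[[:alpha:]]*";

    regex policy(pattern);
    string text = "receipt freind theif receive";
    smatch results;
    // regex_search stops at the first match it finds
    if (regex_search(text, results, policy))
        cout << results.str() << endl;
}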

Output:

freind

2. Finding all matches

#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main (void)
{
	string pattern = "[^c]ei";
	pattern = "[[:alpha:]]*" + pattern + "[[:alpha:]]*";
	
	// Ignore case when matching
	regex policy (pattern, regex::icase);
	
	string text = "receipt freind theif receive";
	
	// it visits each match in turn; end_it marks the end of the sequence
	for (sregex_iterator it (text.begin(), text.end(), policy), end_it;
	     it != end_it; ++it
	    )
		cout << it->str() << endl;
}

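The hardest part to read is sregex_iterator it (text.begin(), text.end(), policy), end_it. This statement defines two iterators: it, which walks through the matches one by one, and end_it, a default-constructed (empty) sregex_iterator that serves as the off-the-end iterator.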

Output:

freind
theif
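The same iterator loop can be wrapped in a small function that collects the matches instead of printing them. This is only a sketch; find_all is a name invented here, not a standard library function.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every non-overlapping match of `re` in `text`.
std::vector<std::string> find_all(const std::string& text, const std::regex& re)
{
    std::vector<std::string> out;
    // A default-constructed sregex_iterator acts as the off-the-end iterator.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end_it;
         it != end_it; ++it)
        out.push_back(it->str());
    return out;
}
```

Note that text must stay alive while the iterators are in use, because each match refers to positions inside it.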

3. Printing the context around each match

#include <iostream>
#include <regex>
#include <string>

using namespace std;


string text =
    "Once there were two mice. They were friends. One mouse "
    "lived in the country; the other mouse lived in the city."
    "After many years the Country mouse saw the City mouse;"
    "he said, \"Do come and see me at my house in the country."
    "\" So the City mouse went. The City mouse said, \"This food"
    "is not good, and your house is not good. Why do you live "
    "in a hole in the field? You should come and live in the "
    "city. You would live in a nice house made of stone. You "
    "would have nice food to eat. You must come and see me at"
    "my house in the city.\"The Country mouse went to the house"
    "of the City mouse. It was a very good house. Nice food "
    "was set ready for them to eat. But just as they began to"
    "eat they heard a great noise. The City mouse cried, \" Run"
    "! Run! The cat is coming!\" They ran away quickly and hid"
    ".After some time they came out. When they came out, the "
    "Country mouse said, \"I do not like living in the city."
    "I like living in my hole in the field. For it is nicer"
    "to be poor and happy, than to be rich and afraid.";

int main (void)
{
	string pattern = "you";
	pattern = "[[:alpha:]]*" + pattern + "[[:alpha:]]*";
	
	regex policy (pattern, regex::icase);
	
	for (sregex_iterator it (text.begin(), text.end(), policy), end_it;
	     it != end_it ; ++it) {
		// prefix() is the text between the previous match (or the start
		// of the input) and this match; keep at most its last 40 characters
		auto pos = it->prefix().length();
		pos = pos > 40 ? pos - 40 : 0;
		cout << it->prefix().str().substr (pos)
		     << "\n\t\t>>> "
		     << it->str()
		     << " <<<\n"
		     // suffix() is everything after this match; keep its first 40 characters
		     << it->suffix().str().substr (0, 40)
		     << endl;
	}
	
}

Output:

 mouse said, "This foodis not good, and 
		>>> your <<<
 house is not good. Why do you live in a
 house is not good. Why do 
		>>> you <<<
 live in a hole in the field? You should
 live in a hole in the field? 
		>>> You <<<
 should come and live in the city. You w
 should come and live in the city. 
		>>> You <<<
 would live in a nice house made of ston
uld live in a nice house made of stone. 
		>>> You <<<
 would have nice food to eat. You must c
 would have nice food to eat. 
		>>> You <<<
 must come and see me atmy house in the 
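Several of the prefixes in the output above are shorter than 40 characters. That is because, when iterating with sregex_iterator, prefix() starts at the end of the previous match rather than at the beginning of the input. A small sketch that makes this visible (the helper name prefixes is hypothetical):

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects prefix() of every match found by the iterator.
std::vector<std::string> prefixes(const std::string& text, const std::regex& re)
{
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end_it;
         it != end_it; ++it)
        out.push_back(it->prefix().str());  // text since the previous match
    return out;
}
```

For the input "aXbbXc" and the pattern "X", the two prefixes are "a" and "bb", not "a" and "aXbb".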

4. Using subexpressions

#include <iostream>
#include <regex>
#include <string>

using namespace std;


string text =
    "Once there were two mice. They were friends. One mouse "
    "lived in the country; the other mouse lived in the city."
    "After many years the Country mouse saw the City mouse;"
    "he said, \"Do come and see me at my house in the country."
    "\" So the City mouse went. The City mouse said, \"This food"
    "is not good, and your house is not good. Why do you live "
    "in a hole in the field? You should come and live in the "
    "city. You would live in a nice house made of stone. You "
    "would have nice food to eat. You must come and see me at"
    "my house in the city.\"The Country mouse went to the house"
    "of the City mouse. It was a very good house. Nice food "
    "was set ready for them to eat. But just as they began to"
    "eat they heard a great noise. The City mouse cried, \" Run"
    "! Run! The cat is coming!\" They ran away quickly and hid"
    ".After some time they came out. When they came out, the "
    "Country mouse said, \"I do not like living in the city."
    "I like living in my hole in the field. For it is nicer"
    "to be poor and happy, than to be rich and afraid.";

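int main (void)
{
	// (.*?) is a non-greedy group; (coming) is the second group
	string pattern = "! The (.*?)(coming)[[:alnum:]]*";
	regex policy (pattern);
	for (sregex_iterator it (text.begin(), text.end(), policy), end_it; it != end_it ; ++it) {
		// str() with no argument (index 0) is the whole match
		cout << "Whole match\n\t" << it->str() << "\n";
		// (*it)[n].matched tells whether the n-th group took part in the match
		if ( (*it) [1].matched)
			cout << "Subexpression 1\n\t" << it->str (1) << "\n";
			
		if ( (*it) [2].matched)
			cout << "Subexpression 2\n\t" << it->str (2) << "\n";
	}
	
}

Output:

Whole match
	! The cat is coming
Subexpression 1
	cat is 
Subexpression 2
	coming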
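Subexpressions are also handy with regex_match, when the whole input has a known shape. The sketch below splits a "key=value" line into its two parts; the helper name and the accepted key format are assumptions chosen for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: parses "key=value"; returns false if the line
// does not have that shape. Group 1 is the key, group 2 is the value.
bool parse_key_value(const std::string& line, std::string& key, std::string& value)
{
    static const std::regex re("([[:alpha:]_][[:alnum:]_]*)=(.*)");
    std::smatch m;
    if (!std::regex_match(line, m, re))
        return false;
    key = m[1].str();    // sub_match accessed by index
    value = m.str(2);    // equivalent shorthand on the match object
    return true;
}
```

Both m[1].str() and m.str(1) return the text of the first subexpression; index 0 always refers to the whole match.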

5. Find and replace

regex_replace() finds matches and replaces them:

#include <iostream>
#include <regex>
#include <sstream>
#include <string>

using namespace std;


static const string text =
    "morgan (201) 555-2368 862-555-0123\n"
    "drew (973)555.0130\n"
    "lee (609) 555-0132 2015550175 800.555-0000";
int main (void)
{
	string phone_pattern =
	    "(\\()?" // optional opening parenthesis
	    "(\\d{3})" // area code
	    "(\\))?" // optional closing parenthesis
	    "([-. ])?" // optional separator
	    "(\\d{3})" // next three digits
	    "([-. ])?" // optional separator
	    "(\\d{4})"; // last four digits
	regex policy (phone_pattern);
	string format = "$2.$5.$7"; // output format: xxx.xxx.xxxx
	istringstream input (text);
	string line;
	// Reformat every phone number, one input line at a time
	while (getline (input, line)) {
		cout << regex_replace (line, policy, format) << endl;
	}
}

Output:

morgan 201.555.2368 862.555.0123
drew 973.555.0130
lee 609.555.0132 201.555.0175 800.555.0000

In format, $n refers to the n-th subexpression.
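A smaller sketch of the same $n mechanism, using made-up helper names. It also uses $&, which in the default ECMAScript format rules stands for the whole match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turns "First Last" into "Last, First" using $2 and $1.
std::string swap_names(const std::string& s)
{
    static const std::regex re("(\\w+) (\\w+)");
    return std::regex_replace(s, re, "$2, $1");
}

// Hypothetical helper: wraps every run of digits in brackets using $&.
std::string bracket_digits(const std::string& s)
{
    static const std::regex re("\\d+");
    return std::regex_replace(s, re, "[$&]");
}
```

For example, swap_names("John Smith") yields "Smith, John", and bracket_digits("a1b22") yields "a[1]b[22]".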

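By default, regex_replace outputs the entire input sequence.
Parts that do not match the regular expression are copied unchanged, and matched parts are written according to the format string.
If you only want the matched parts, pass the format_no_copy flag: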

string fmt = "$2.$5.$7 "; // trailing space separates the numbers
cout << regex_replace (line, policy, fmt, regex_constants::format_no_copy) << endl;

The output is now:

201.555.2368 862.555.0123 
973.555.0130 
609.555.0132 201.555.0175 800.555.0000 

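The standard library defines flags that control matching and formatting during a replacement. These flags can be passed to regex_search, regex_match, and regex_replace, or to the format member of class smatch. For example, format_no_copy is a value of type match_flag_type, which is defined in the namespace std::regex_constants.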

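Match flags (defined in regex_constants::match_flag_type)

match_default: equivalent to format_default
match_not_bol: do not treat the first character as the beginning of a line
match_not_eol: do not treat the last character as the end of a line
match_not_bow: do not treat the first character as the beginning of a word
match_not_eow: do not treat the last character as the end of a word
match_any: if more than one match is possible, any of them may be returned
match_not_null: do not match an empty sequence
match_continuous: the match must begin at the first character of the input
match_prev_avail: the input sequence has characters before the first one
format_default: use ECMAScript rules for the replacement string
format_sed: use POSIX sed rules for the replacement string
format_no_copy: do not output the unmatched parts of the input sequence
format_first_only: replace only the first match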
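Two of the flags above are easy to try out in isolation. The following is a minimal sketch with illustrative helper names: one passes format_first_only to regex_replace, the other passes match_continuous to regex_search.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces only the first match of `re` in `text`.
std::string replace_first(const std::string& text, const std::regex& re,
                          const std::string& fmt)
{
    return std::regex_replace(text, re, fmt,
                              std::regex_constants::format_first_only);
}

// Hypothetical helper: true only if a match begins at the first character.
bool matches_at_start(const std::string& text, const std::regex& re)
{
    return std::regex_search(text, re,
                             std::regex_constants::match_continuous);
}
```

With the pattern "-" and the format "+", replace_first turns "a-b-c" into "a+b-c". With the pattern "abc", matches_at_start accepts "abcx" but rejects "xabc".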

Reference book: C++ Primer, 5th Edition (Chinese edition)

Usage notes

Replacing repeatedly with regex_replace() until nothing matches:

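string replace_all (const string &text, const regex &pattern, const string &format)
{
    string result = text;
    // A single regex_replace call already replaces every match in its input.
    // The loop repeats it in case a replacement creates new matches.
    // Caution: this never terminates if the replacement text itself
    // always matches the pattern (e.g. pattern "a", format "aa").
    while (regex_search (result, pattern)) {
        result = regex_replace (result, pattern, format);
    }
    return result;
}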
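Because the loop above keeps running for as long as the pattern still matches, a bounded variant may be safer. The sketch below is one possible approach, not part of the original code: the name replace_until_stable and the max_passes parameter are assumptions introduced here.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical variant: repeats regex_replace until the text stops changing
// or max_passes passes have been made, whichever comes first.
std::string replace_until_stable(const std::string& text, const std::regex& pattern,
                                 const std::string& format, int max_passes = 100)
{
    std::string result = text;
    for (int pass = 0; pass < max_passes; ++pass) {
        std::string next = std::regex_replace(result, pattern, format);
        if (next == result)   // nothing changed: no further pass can help
            break;
        result = std::move(next);
    }
    return result;
}
```

For example, removing "()" from "((()))" takes three passes and ends with an empty string, while a growing replacement such as pattern "a" with format "aa" stops after max_passes passes instead of looping forever.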