Learn C++ with Me, Intermediate Series: Regular Expressions

fpcc

Categories: C++11, C++. Tags: regular expressions, c++

Article link: https://blog.csdn.net/fpcc/article/details/120176987


1. Regular Expressions

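What is a regular expression? The idea is easy to grasp with an analogy. Suppose beans, rice, and several other grains have been accidentally mixed together and now need to be separated. The simplest approach is to pick out whatever you need one piece at a time. But if you need two or three kinds, it pays to think a little: use a stack of sieves with different mesh sizes, and each grain is filtered to its own layer automatically. A regular expression works on the same principle, only with different machinery. Regex engines are generally built in one of two ways: as a DFA (deterministic finite automaton) or as an NFA (nondeterministic finite automaton). In both cases matching amounts to walking a graph of states driven by the input characters, which will feel familiar to anyone who has studied compiler theory. If you have not, don't worry: the goal here is to use regular expressions, not to implement an engine.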
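To make the "state machine" idea concrete, here is a minimal hand-written DFA. It is only a sketch under stated assumptions: the pattern `ab*c` and the function name `dfa_matches_ab_star_c` are made up for illustration, and this is not how any particular regex library is implemented.

```cpp
#include <string>

// Hand-written DFA for the illustrative pattern ab*c.
// States: 0 = start, 1 = seen 'a' followed by any number of 'b', 2 = accepted.
bool dfa_matches_ab_star_c(const std::string& input) {
    int state = 0;
    for (char ch : input) {
        switch (state) {
        case 0:
            if (ch == 'a') state = 1;
            else return false;
            break;
        case 1:
            if (ch == 'b') state = 1;       // stay in state 1 while reading 'b'
            else if (ch == 'c') state = 2;  // the final 'c' moves to the accepting state
            else return false;
            break;
        default:
            // State 2: nothing may follow the final 'c'.
            return false;
        }
    }
    return state == 2;
}
```

Each input character moves the machine to exactly one next state, which is what "deterministic" means here; an NFA may have several candidate next states and has to try them.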
Put simply, a regular expression is a way of filtering source data: a pattern is used to find or replace matching strings. Here are some common examples:

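// Validate a mobile phone number
"^1([38][0-9]|4[5-9]|5[0-35-9]|66|7[0-8]|9[89])[0-9]{8}$"
// Validate an ID card number
"^\d{15}$"            // 15 digits
"^\d{17}(\d|X|x)$"    // 18 characters
// Validate a user name or password
"^[a-zA-Z]\w{5,15}$"  // Must start with a letter; 6 to 16 characters in total; letters, digits, and underscore (\w also matches '_').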

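Regular expressions are not friendly to beginners, but applied in the right place they are very powerful and can substantially reduce development cost and the amount of code.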
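As a rough sketch of how such patterns might be used from C++, the helpers below wrap them in std::regex_match. The function names are hypothetical. The phone pattern is written as `5[0-35-9]`, since a comma inside a character class would match a literal comma, and raw string literals are used so the backslashes do not need doubling.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: checks an 11-digit mobile number against the pattern above.
bool is_valid_mobile(const std::string& s) {
    static const std::regex re(
        R"(^1([38][0-9]|4[5-9]|5[0-35-9]|66|7[0-8]|9[89])[0-9]{8}$)");
    return std::regex_match(s, re);
}

// Hypothetical helper: 17 digits followed by a digit or X/x.
// This checks the format only, not the checksum.
bool is_valid_id18(const std::string& s) {
    static const std::regex re(R"(\d{17}[0-9Xx])");
    return std::regex_match(s, re);
}

// Hypothetical helper: a letter followed by 5 to 15 word characters.
// Note that \w also accepts the underscore.
bool is_valid_username(const std::string& s) {
    static const std::regex re(R"(^[a-zA-Z]\w{5,15}$)");
    return std::regex_match(s, re);
}
```

Because std::regex_match only succeeds when the whole string matches, the `^` and `$` anchors are redundant here; they are kept where the original patterns have them.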

2. Regular Expressions in C++11

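C++11 also provides regular expression support, a facility that almost every language has to offer. Let's look at how C++11 provides it, starting with the description in the documentation: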

The regular expressions library provides a class that represents regular expressions, which are a kind of mini-language used to perform pattern matching within strings. Almost all operations with regexes can be characterized by operating on several of the following objects:

Target sequence. The character sequence that is searched for a pattern. This may be a range specified by two iterators, a null-terminated character string or a std::string.
Pattern. This is the regular expression itself. It determines what constitutes a match. It is an object of type std::basic_regex, constructed from a string with special syntax. See regex_constants::syntax_option_type for the description of supported syntax variations.
Matched array. The information about matches may be retrieved as an object of type std::match_results.
Replacement string. This is a string that determines how to replace the matches, see regex_constants::match_flag_type for the description of supported syntax variations.

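In short, the library works with four things: the target sequence (usually the text to process), the pattern (the regular expression itself, like the pattern strings above), the match results (where each match and its sub-matches are found), and the replacement string (how matched text is rewritten). The description also names the classes C++11 supplies for these roles, such as std::basic_regex and std::match_results.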
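The four roles can be seen side by side in a small sketch. The function names `first_number` and `mask_numbers` and the `\d+` pattern are assumptions chosen for illustration; the library calls are the standard std::regex_search and std::regex_replace.

```cpp
#include <regex>
#include <string>

// Returns the first run of digits in `target`, or an empty string if there is none.
std::string first_number(const std::string& target) {   // target sequence
    const std::regex pattern(R"(\d+)");                  // pattern
    std::smatch results;                                 // match results
    if (std::regex_search(target, results, pattern)) {
        return results.str(0);                           // sub-match 0 is the whole match
    }
    return "";
}

// Replaces every run of digits in `target` with "#".
std::string mask_numbers(const std::string& target) {
    const std::regex pattern(R"(\d+)");
    return std::regex_replace(target, pattern, "#");     // replacement string
}
```

For example, `first_number("order 42 of 100")` yields `"42"`, and `mask_numbers` on the same input yields `"order # of #"`.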

3. Source Code Analysis

Here is how a match proceeds in the library source:

template<class _Elem,
	class _Alloc,
	class _RxTraits> inline
	bool regex_match(_In_z_ const _Elem *_Str,
		match_results<const _Elem *, _Alloc>& _Matches,
		const basic_regex<_Elem, _RxTraits>& _Re,
		regex_constants::match_flag_type _Flgs =
			regex_constants::match_default)
	{	// try to match regular expression to target text
	const _Elem *_Last = _Str + char_traits<_Elem>::length(_Str);
	return (_Regex_match1(_Str, _Last,
		&_Matches, _Re, _Flgs, true));
	}
template<class _BidIt,
	class _Alloc,
	class _Elem,
	class _RxTraits,
	class _It> inline
	bool _Regex_match1(_It _First, _It _Last,
		match_results<_BidIt, _Alloc> *_Matches,
		const basic_regex<_Elem, _RxTraits>& _Re,
		regex_constants::match_flag_type _Flgs,
		bool _Full)
	{	// try to match regular expression to target text
	if (_Re._Empty())
		return (false);
	_Matcher<_BidIt, _Elem, _RxTraits, _It> _Mx(_First, _Last,
		_Re._Get_traits(), _Re._Get(), _Re.mark_count() + 1, _Re.flags(),
			_Flgs);
	return (_Mx._Match(_Matches, _Full));
	}
	template<class _Alloc>
		bool _Match(match_results<_BidIt, _Alloc> *_Matches,
			bool _Full_match)
		{	// try to match
		if (_Matches)
			{	// clear _Matches before doing work
			_Matches->_Ready = true;
			_Matches->_Resize(0);
			}

		_Begin = _First;
		_Tgt_state._Cur = _First;
		_Tgt_state._Grp_valid.resize(_Get_ncap());
		_Tgt_state._Grps.resize(_Get_ncap());
		_Cap = _Matches != nullptr;
		_Full = _Full_match;
		_Max_complexity_count = _REGEX_MAX_COMPLEXITY_COUNT;
		_Max_stack_count = _REGEX_MAX_STACK_COUNT;

		_Matched = false;

		if (!_Match_pat(_Rep))
			return (false);

		if (_Matches)
			{	// copy results to _Matches
			_Matches->_Resize(_Get_ncap());
			for (unsigned int _Idx = 0; _Idx < _Get_ncap(); ++_Idx)
				{	// copy submatch _Idx
				if (_Res._Grp_valid[_Idx])
					{	// copy successful match
					_Matches->_At(_Idx).matched = true;
					_Matches->_At(_Idx).first = _Res._Grps[_Idx]._Begin;
					_Matches->_At(_Idx).second = _Res._Grps[_Idx]._End;
					}
				else
					{	// copy failed match
					_Matches->_At(_Idx).matched = false;
					_Matches->_At(_Idx).first = _End;
					_Matches->_At(_Idx).second = _End;
					}
				}
			_Matches->_Org = _Begin;
			_Matches->_Pfx().first = _Begin;
			_Matches->_Pfx().second = _Matches->_At(0).first;
			_Matches->_Pfx().matched =
				_Matches->_Pfx().first != _Matches->_Pfx().second;

			_Matches->_Sfx().first = _Matches->_At(0).second;
			_Matches->_Sfx().second = _End;
			_Matches->_Sfx().matched =
				_Matches->_Sfx().first != _Matches->_Sfx().second;

			_Matches->_Null().first = _End;
			_Matches->_Null().second = _End;
			}
		return (true);
		}

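MSVC's library code is not easy to read, so bear with it. Looking at the whole flow, it is simply the objects described above being created and used: regex_match computes the end of the target string and forwards to _Regex_match1, which builds a _Matcher from the target range and the compiled regex; _Match then runs the pattern and, on success, copies each capture group plus the prefix and suffix into match_results.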
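Two details of that code are visible from the public interface: the `_Full` flag corresponds to the difference between std::regex_match (the whole target must match) and std::regex_search (any substring may match), and the prefix and suffix filled in at the end are exposed through `prefix()` and `suffix()`. A minimal sketch, where the `SearchParts` struct and both function names are hypothetical:

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the pieces of a successful search.
struct SearchParts {
    bool found;
    std::string prefix;
    std::string match;
    std::string suffix;
};

// True only if the entire text matches the pattern.
bool whole_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Finds the first match anywhere in the text and reports what surrounds it.
SearchParts search_parts(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return {false, "", "", ""};
    }
    return {true, m.prefix().str(), m.str(0), m.suffix().str()};
}
```

With the text `abc123def` and the pattern `\d+`, `whole_match` returns false because of the surrounding letters, while `search_parts` reports the prefix `abc`, the match `123`, and the suffix `def`.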

4. Examples

Here is an example from cppreference.com:

 
#include <iostream>
#include <iterator>
#include <string>
#include <regex>
 
int main()
{
    std::string s = "Some people, when confronted with a problem, think "
        "\"I know, I'll use regular expressions.\" "
        "Now they have two problems.";
 
    std::regex self_regex("REGULAR EXPRESSIONS",
            std::regex_constants::ECMAScript | std::regex_constants::icase);
    if (std::regex_search(s, self_regex)) {
        std::cout << "Text contains the phrase 'regular expressions'\n";
    }
 
    std::regex word_regex("(\\w+)");
    auto words_begin = 
        std::sregex_iterator(s.begin(), s.end(), word_regex);
    auto words_end = std::sregex_iterator();
 
    std::cout << "Found "
              << std::distance(words_begin, words_end)
              << " words\n";
 
    const int N = 6;
    std::cout << "Words longer than " << N << " characters:\n";
    for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
        std::smatch match = *i;
        std::string match_str = match.str();
        if (match_str.size() > N) {
            std::cout << "  " << match_str << '\n';
        }
    }
 
    std::regex long_word_regex("(\\w{7,})");
    std::string new_s = std::regex_replace(s, long_word_regex, "[$&]");
    std::cout << new_s << '\n';
}

Output:

Text contains the phrase 'regular expressions'
Found 20 words
Words longer than 6 characters:
  confronted
  problem
  regular
  expressions
  problems
Some people, when [confronted] with a [problem], think "I know, I'll use [regular] [expressions]." Now they have two [problems].

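This is something you have to use often; without practice it is quickly forgotten. Read more sample code and apply it to real problems a few times, and mastering it should not be hard. In truth, regular expressions are used less in C++ than in front-end and web development, where user names, passwords, IP addresses, URLs, ID numbers, and the like are validated all the time; that is where they are most widely used.
That does not mean they have no place in C++, otherwise the library would not exist. Network server code also has to validate IP addresses and URLs, and some database operations need them as well. Less common does not mean never.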
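As one possible server-side check, the sketch below validates a dotted-decimal IPv4 address. The function name `is_valid_ipv4` and the pattern are assumptions written for this illustration: each octet must be in the range 0 to 255, and leading zeros are rejected.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` is four octets (0-255) separated by dots.
bool is_valid_ipv4(const std::string& s) {
    // Octet alternatives: 250-255, 200-249, 100-199, then 0-99 without a leading zero.
    static const std::regex re(
        R"(((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d))");
    return std::regex_match(s, re);
}
```

The regex object is `static` so the pattern is compiled once rather than on every call, which matters in server code that validates many addresses.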

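5. Summary

Being a standard library means that many general-purpose facilities belong in it, to lighten the developer's load while making developers rely on the standard library, creating mutually reinforcing positive feedback. The C++ standard library has not done especially well here: many things you need must be built by hand or taken from third-party libraries, which is one of the main reasons the C++ standard library has been iterating quickly in recent years. This is a double-edged sword, though: relying too heavily on the standard library can make C++ developers lose their feel for the flexibility and convenience of C++. Everything has two sides, and whether this is good or bad is for time to decide.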