Learning boost::regex

universee

2007-09-12, from 博客园 (cnblogs)

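I: Building
Boost's regex library has to be compiled before use. (If you don't need all of Boost, don't build all of Boost; that takes several hours. I recommend building only the libraries you need.)
The older boost 1.33 seems to have problems when compiled with vc8. Download boost 1.34.1, open the "Visual Studio 2005 Command Prompt", change to boost_1_34_1\libs\regex\build, and run:
nmake -f vc8.mak
OK, the generated files are under vc80.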

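II: Learning regular expressions
deelx_zh.rar
Good material for learning regular expressions. While I'm at it, I also recommend:
http://www.regexlab.com/
The owner of that site and I once exchanged a letter (about my article "P2P之UDP穿透NAT的原理与实现（附源代码）", on the principles and implementation of UDP NAT traversal for P2P, with source code). His regex library has been well received on CodeProject.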

III: A simple example
    std::string regstr = "a+";
    boost::regex expression(regstr);
    std::string testString = "aaa";

    // regex_match succeeds only if the whole string is one or more 'a'
    if( boost::regex_match(testString, expression) )
    {
        std::cout<< "Match" << std::endl;
    }
    else
    {
        std::cout<< "Not Match" << std::endl;
    }
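
To make the "whole string" requirement concrete, here is a minimal sketch that contrasts a full match with a search. It uses std::regex from C++11 rather than boost::regex so that it builds without a compiled Boost library; the functions used here (regex_match, regex_search) have the same names in both. The helper names all_a and contains_a are made up for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only when the WHOLE string is one or more 'a'.
bool all_a(const std::string& s) {
    static const std::regex expression("a+");
    return std::regex_match(s, expression);
}

// Hypothetical helper: true when "a+" matches somewhere inside the string.
bool contains_a(const std::string& s) {
    static const std::regex expression("a+");
    return std::regex_search(s, expression);
}
```

With these helpers, "aaa" passes both checks, while "aaab" passes only contains_a, because regex_match refuses the trailing 'b'.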

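IV: Learning regex_match through example code
1 We often need to check whether a string is a valid IP address. A valid IP address has this form:
xxx.xxx.xxx.xxx, where each xxx is an integer no greater than 255
Finding strings of this shape with a regular expression is easy; checking that xxx does not exceed 255 is harder (because a regular expression works on text, not on numbers).
OK, let's handle a single number first, i.e. xxx: find an expression that matches the number and guarantees that it does not exceed 255.
Case 1: x, a single digit, 0-9, written \d
Case 2: xx, two digits, 00-99, written \d\d
Case 3: xxx, which splits into two sub-cases. One is 1xx, written 1\d\d
The other is 2xx, which again splits in two: 2[0-4]\d (200-249)
and 25[0-5] (250-255)
Putting these together:
1?\d{1,2}|2[0-4]\d|25[0-5]
identifies a numeric string no greater than 255.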
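
The case analysis above can be checked with a short sketch. It is written against std::regex (C++11) instead of boost::regex so that it compiles without Boost, and it uses the ranges [0-4] and [0-5] so that 200-209 and 250 are covered. The function name is_octet is an assumption made for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` is a decimal number in the range 0..255,
// using the three alternatives from the case analysis:
//   1?\d{1,2}  -> 0..199
//   2[0-4]\d   -> 200..249
//   25[0-5]    -> 250..255
bool is_octet(const std::string& s) {
    static const std::regex expression("1?\\d{1,2}|2[0-4]\\d|25[0-5]");
    return std::regex_match(s, expression);
}
```

Note that the backslashes are doubled inside the C++ string literal; the regex engine itself sees \d.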

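Now we only need to repeat this four times, with each dot escaped as \. so that it matches a literal dot:
(1?\d{1,2}|2[0-4]\d|25[0-5])\.(1?\d{1,2}|2[0-4]\d|25[0-5])\.(1?\d{1,2}|2[0-4]\d|25[0-5])\.(1?\d{1,2}|2[0-4]\d|25[0-5])

It is a bit long. I tried to shorten it with the sub-expressions that boost supports, but without success; I'd welcome pointers from anyone who knows boost's regular expressions well:
(1?\d{1,2}|2[0-4]\d|25[0-5])\.\1$\.\1$\.\1$
(See back-references: http://www.boost.org/libs/regex/doc/syntax_perl.html
It seems a back-reference only matches text identical to what the first group actually matched, which is not what we need here.)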
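
A small sketch of that back-reference behaviour, using std::regex (C++11) in place of boost::regex so it builds without Boost. The helper name same_pair is hypothetical. The pattern (\d+)\.\1 requires the text after the dot to repeat exactly what group 1 captured; it does not re-apply the pattern \d+.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` has the form N.N where both sides are the
// SAME digits, because \1 re-matches the captured text, not the pattern.
bool same_pair(const std::string& s) {
    static const std::regex expression("(\\d+)\\.\\1");
    return std::regex_match(s, expression);
}
```

So "12.12" is accepted but "12.34" is rejected, which is why a back-reference cannot stand in for "another number up to 255".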

Example:
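
A minimal sketch of the full check, assuming std::regex (C++11) as a stand-in for boost::regex. One way to avoid typing the long pattern four times is to keep the octet sub-pattern in a string and assemble the expression in code; this is string concatenation in C++, not a regex feature. The name is_ipv4 and the [0-4] / [0-5] ranges are choices made for this sketch.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` is four dot-separated numbers, each 0..255.
bool is_ipv4(const std::string& s) {
    // One octet, written once and reused when the full pattern is assembled.
    const std::string octet = "(1?\\d{1,2}|2[0-4]\\d|25[0-5])";
    static const std::regex expression(
        octet + "\\." + octet + "\\." + octet + "\\." + octet);
    return std::regex_match(s, expression);
}
```

Because regex_match must consume the whole string, inputs with extra characters such as a trailing space are rejected.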

2 Now let's look at another overload of regex_match
template <class ST, class SA, class Allocator, class charT, class traits>
    bool regex_match(const basic_string<charT, ST, SA>& s,
     match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m,
    const basic_regex <charT, traits>& e, match_flag_type flags = match_default);

template <class BidirectionalIterator, class Allocator, class charT, class traits>
bool regex_match(BidirectionalIterator first, BidirectionalIterator last,
                 match_results<BidirectionalIterator, Allocator>& m,
                 const basic_regex <charT, traits>& e,
                 match_flag_type flags = match_default);

Note the parameter m: if the function returns false, m is undefined. If it returns true, m is defined as follows:

Element	Value
m.size()	e.mark_count()
m.empty()	false
m.prefix().first	first
m.prefix().last	first
m.prefix().matched	false
m.suffix().first	last
m.suffix().last	last
m.suffix().matched	false
m[0].first	first
m[0].second	last
m[0].matched	`true` if a full match was found, and `false` if it was a partial match (found as a result of the `match_partial` flag being set).
m[n].first	For all integers n < m.size(), the start of the sequence that matched sub-expression n. Alternatively, if sub-expression n did not participate in the match, then last.
m[n].second	For all integers n < m.size(), the end of the sequence that matched sub-expression n. Alternatively, if sub-expression n did not participate in the match, then last.
m[n].matched	For all integers n < m.size(), true if sub-expression n participated in the match, false otherwise.
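
The sketch below reads a few of these fields after a successful match. It uses std::regex and std::smatch (C++11) rather than the Boost types so that it builds without Boost; the struct GroupInfo and the function inspect are invented for this illustration. With the pattern (a+)|(b+) only one of the two groups can take part in a match, which makes the `matched` flag easy to observe.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical record of what the match object reported.
struct GroupInfo {
    std::size_t size;      // number of entries in the match object
    bool first_matched;    // did group 1, (a+), participate?
    bool second_matched;   // did group 2, (b+), participate?
    std::string whole;     // text of m[0], the whole match
};

GroupInfo inspect(const std::string& s) {
    static const std::regex expression("(a+)|(b+)");
    std::smatch m;
    GroupInfo info{0, false, false, ""};
    // Only look at m when the match succeeded.
    if (std::regex_match(s, m, expression)) {
        info.size = m.size();
        info.first_matched = m[1].matched;
        info.second_matched = m[2].matched;
        info.whole = std::string(m[0].first, m[0].second);
    }
    return info;
}
```

For the input "bbb" the match object holds three entries (the whole match plus two groups), group 1 reports matched == false, and group 2 reports matched == true.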

Example:

// Backslashes are doubled inside C++ string literals; [0-4] and [0-5] cover 200-255.
std::string regstr =
    "(1?\\d{1,2}|2[0-4]\\d|25[0-5])\\."
    "(1?\\d{1,2}|2[0-4]\\d|25[0-5])\\."
    "(1?\\d{1,2}|2[0-4]\\d|25[0-5])\\."
    "(1?\\d{1,2}|2[0-4]\\d|25[0-5])";
boost::regex expression(regstr);
std::string testString = "192.168.4.1";
boost::smatch what;
if (boost::regex_match(testString, what, expression))
{
    std::cout << "This is ip address" << std::endl;
    // what[0] is the whole match; what[1]..what[4] are the four numbers
    for (int i = 1; i <= 4; i++)
    {
        std::string msg(what[i].first, what[i].second);
        std::cout << i << ":" << msg.c_str() << std::endl;
    }
}
else
{
    std::cout << "This is not ip address" << std::endl;
}

This example prints each of the four numbers in the IP address:
This is ip address
1:192
2:168
3:4
4:1

V: Learning regex_search
regex_search is much like regex_match, except that it does not require the whole string to match; a partial match (a find) is enough.
A simple example:

std::string regstr = "(\\d+)";
boost::regex expression(regstr);
std::string testString = "192.168.4.1";
boost::smatch what;
// True as soon as one run of digits is found anywhere in the string
if (boost::regex_search(testString, expression))
{
    std::cout << "Have digit" << std::endl;
}

The example above checks whether the given string contains any digits.

Here is one more example, which prints all of the numbers:

std::string regstr = "(\\d+)";
boost::regex expression(regstr);
std::string testString = "192.168.4.1";
boost::smatch what;
std::string::const_iterator start = testString.begin();
std::string::const_iterator end = testString.end();
while (boost::regex_search(start, end, what, expression))
{
    std::cout << "Have digit:";
    std::string msg(what[1].first, what[1].second);
    std::cout << msg.c_str() << std::endl;
    // Continue searching right after the end of the current match
    start = what[0].second;
}

Output:
Have digit:192
Have digit:168
Have digit:4
Have digit:1
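
As an alternative to advancing the start iterator by hand, a regex iterator can walk over all matches. The sketch below uses std::sregex_iterator from C++11 so that it builds without Boost; Boost.Regex offers a boost::sregex_iterator as well. The function name find_numbers is an assumption for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of digits found in `text`, in order.
std::vector<std::string> find_numbers(const std::string& text) {
    static const std::regex expression("\\d+");
    std::vector<std::string> numbers;
    // A default-constructed iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), expression), end;
         it != end; ++it) {
        numbers.push_back(it->str());
    }
    return numbers;
}
```

Calling find_numbers("192.168.4.1") yields the same four strings that the loop above prints.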

VI: Greedy repetition
Let's start with an example:

std::string regstr = "(.*)(age)(.*)(\\d{2})";
boost::regex expression(regstr);
std::string testString = "My age is 28 His age is 27";
boost::smatch what;
std::string::const_iterator start = testString.begin();
std::string::const_iterator end = testString.end();
while (boost::regex_search(start, end, what, expression))
{
    // Group 1 is the text before "age", group 4 is the two-digit age
    std::string name(what[1].first, what[1].second);
    std::string age(what[4].first, what[4].second);
    std::cout << "Name:" << name.c_str() << std::endl;
    std::cout << "Age:" << age.c_str() << std::endl;
    start = what[0].second;
}

We hoped it would print each person's name followed by the age, but the result is disappointing:
Name:My age is 28 His
Age:27

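The cause: this is a side effect of repetition operators such as "+" and "*". These operators consume as much input as possible, which makes them "greedy". In other words, (.*) matches the longest possible string rather than the shortest one that lets the match succeed.
To make these repetition operators non-greedy, append "?" to the operator.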

std::string regstr = "(.*?)(age)(.*?)(\\d{2})";
boost::regex expression(regstr);
std::string testString = "My age is 28 His age is 27";
boost::smatch what;
std::string::const_iterator start = testString.begin();
std::string::const_iterator end = testString.end();
while (boost::regex_search(start, end, what, expression))
{
    // With the non-greedy (.*?), group 1 stops at the first "age"
    std::string name(what[1].first, what[1].second);
    std::string age(what[4].first, what[4].second);
    std::cout << "Name:" << name.c_str() << std::endl;
    std::cout << "Age:" << age.c_str() << std::endl;
    start = what[0].second;
}

Output:
Name:My
Age:28
Name: His
Age:27
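
The difference can be isolated in a small sketch that returns only what group 1 captured on the first match. It uses std::regex (C++11) instead of boost::regex so it compiles without Boost; the function name first_field and its `lazy` flag are made up for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by group 1 on the first match,
// using either the greedy (.*) or the non-greedy (.*?) form of the pattern.
std::string first_field(const std::string& text, bool lazy) {
    const std::regex expression(lazy ? "(.*?)(age)(.*?)(\\d{2})"
                                     : "(.*)(age)(.*)(\\d{2})");
    std::smatch what;
    if (!std::regex_search(text, what, expression)) {
        return "";
    }
    return std::string(what[1].first, what[1].second);
}
```

On "My age is 28 His age is 27", the greedy form captures everything up to the last "age", while the non-greedy form stops at the first one and captures only "My " (with its trailing space).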

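VII: Learning regex_replace
I wrote a regular expression that strips invalid characters (spaces, carriage returns, tabs) from the left side of a string.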

std::string testString = " Hello World ! GoodBye World ";
// Group 1 is the leading whitespace, group 2 is everything after it
std::string TrimLeft = "([\\s\\r\\n\\t]*)(\\w*.*)";
boost::regex expression(TrimLeft);
// Keep only group 2, which drops the leading whitespace
testString = boost::regex_replace(testString, expression, "$2");
std::cout << "TrimLeft:" << testString << std::endl;

Output:
TrimLeft:Hello World ! GoodBye World

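The open question is how to write the regular expression that strips invalid characters from the right side. If someone knowledgeable could help write it, many thanks.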
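
One possible answer, offered as a sketch rather than as the definitive approach: match a run of whitespace anchored at the end and replace it with nothing. The code uses std::regex (C++11) instead of boost::regex so that it builds without Boost, and the function name trim_right is an assumption. It is meant for single-line input; how `$` behaves around embedded newlines differs between regex implementations, so multi-line strings would need extra care.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes whitespace (spaces, tabs, CR, LF) from the
// right-hand end of a single-line string.
std::string trim_right(const std::string& s) {
    // \s+ is one or more whitespace characters; $ anchors the run at the end.
    static const std::regex trailing("\\s+$");
    return std::regex_replace(s, trailing, "");
}
```

Whitespace between words is left alone, because a run that is followed by more text cannot satisfy the `$` anchor.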

Original article: http://www.cnblogs.com/shootingstars/archive/2007/07/30/837522.html

http://www.kuqin.com/cpluspluslib/20070912/1033.html
