Using regex in VS2010

Reposted on March 22, 2012, 17:52:27

Getting started with C++ TR1 regular expressions

 

Overview
C++ TR1 regular expression flavor
Header and namespace
Matching
Retrieving matches
Replacing matches
Escape sequences
Case-sensitivity
Troubleshooting

Overview

This article is written for the benefit of someone familiar with regular expressions but not with the use of regular expressions in C++ via the TR1 (C++ Standards Committee Technical Report 1) extensions. Comparisons will be made with Perl for those familiar with Perl, though no knowledge of Perl is required. The focus is not on the syntax of regular expressions per se but rather how to use regular expressions to search for patterns and make replacements.

Support for the TR1 extensions was added to Visual Studio 2008 as a feature pack and is included in Visual Studio 2010. Other implementations include Boost and Dinkumware.

The C++ TR1 regular expression specification has an intimidating array of options. This article is intended to get you started, not to explore every nook and cranny. Getting started is the harder part since it's easier to find API details than basic examples.

The examples below use fully qualified namespaces for clarity. You could make your code more succinct by adding a few using statements to eliminate namespace qualifiers.

C++ TR1 regular expression flavor

The C++ TR1 regular expressions can follow the syntax of several regular expression environments depending on the optional flags sent to the regular expression class constructor. The six options given in the Microsoft implementation are as follows.

  • basic
  • extended
  • ECMAScript
  • awk
  • grep
  • egrep

The default for the Microsoft implementation is ECMAScript, matching the regular expression syntax of the ECMAScript (JavaScript) language, which is very similar to that in Perl 5.

The choice of flavors is extensible and implementation-specific. For example, the Boost implementation adds perl as an option, which presumably follows Perl 5 syntax more closely than the ECMAScript option does.
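As a rough illustration of how the flavor flag changes the meaning of a pattern, the sketch below compiles the same pattern under the default ECMAScript flavor and under the basic flavor. It assumes a C++11 compiler, where the TR1 names live in std rather than std::tr1; the helper function names are hypothetical and are not part of the library.

```cpp
#include <regex>
#include <string>

// Assumption: C++11 <regex>, so the namespace is std instead of std::tr1.
// Both helper names are made up for this example.

// Default flavor (ECMAScript): '+' means "one or more of the preceding item".
bool matches_ecmascript(const std::string& s) {
    std::regex rx("ab+c");  // same as passing std::regex_constants::ECMAScript
    return std::regex_match(s, rx);
}

// basic flavor (POSIX basic regular expressions): an unescaped '+'
// is an ordinary character that matches a literal plus sign.
bool matches_basic(const std::string& s) {
    std::regex rx("ab+c", std::regex_constants::basic);
    return std::regex_match(s, rx);
}
```

Under these assumptions, matches_ecmascript accepts "abbc" while matches_basic accepts only the literal text "ab+c".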

For someone familiar with regular expressions the difficulty in using regular expressions in C++ TR1 is not in the syntax of regular expressions themselves, but rather in using regular expressions to do work.

Header and namespace

The C++ regular expression functions are defined in the <regex> header and contained in the namespace std::tr1. Note that tr is lowercase in C++. In English prose “TR” is capitalized.

Matching

The first surprise you may run into with the C++ regular expression implementation is that regex_match does not "match" in the usual sense. It will return true only when the entire string matches the regular expression. The function regex_search works more like the match operator in other environments, such as the m// operator in Perl.

To illustrate regex_match and regex_search, start with a C++ string

        
    std::string str = "Hello world";
        

and construct a regular expression

        
    std::tr1::regex rx("ello");
        

The expression

        
    regex_match(str.begin(), str.end(), rx)
        

will return false because the string str contains more characters beyond the match of the regular expression rx. However

        
    regex_search(str.begin(), str.end(), rx)
        

will return true because the regular expression matches a substring of str.
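The two calls above can be wrapped into a pair of small functions to make the difference easy to test. This is a minimal sketch assuming a C++11 compiler (std::regex rather than std::tr1::regex); the names whole_match and partial_match are hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: C++11 <regex>, so the namespace is std instead of std::tr1.

// True only if the entire text matches the pattern.
bool whole_match(const std::string& text, const std::string& pattern) {
    std::regex rx(pattern);
    return std::regex_match(text.begin(), text.end(), rx);
}

// True if some substring of text matches the pattern.
bool partial_match(const std::string& text, const std::string& pattern) {
    std::regex rx(pattern);
    return std::regex_search(text.begin(), text.end(), rx);
}
```

With the string "Hello world" and the pattern "ello", whole_match returns false and partial_match returns true. To make regex_match succeed on a substring, the pattern itself has to cover the rest of the string, for example ".*ello.*".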

Retrieving matches

After performing a match in Perl, the captured matches are stored in the variables $1, $2, etc. Similarly, after a match C++ places the captured matches in a match_results object. However, while Perl always creates the $n variables, C++ does not store matches unless you call an overloaded form of regex_search that takes a match_results object. The class match_results is a template; often people use the class cmatch defined by

        
    typedef match_results<const char*> cmatch;
        

The following example shows how to retrieve captured matches.

        
    std::tr1::cmatch res;
    str = "<h2>Egg prices</h2>";
    std::tr1::regex rx("<h(.)>([^<]+)");
    std::tr1::regex_search(str.c_str(), res, rx);
    std::cout << res[1] << ". " << res[2] << "\n";
        

The code above will output

        
    2. Egg prices
        

Note that res[n] corresponds to Perl's $n.
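When the input is already a std::string, the smatch typedef (match_results over std::string::const_iterator) avoids the trip through c_str(). The sketch below assumes a C++11 compiler (std rather than std::tr1) and uses a hypothetical helper name, describe_heading; it also checks the return value of regex_search before reading the captures.

```cpp
#include <regex>
#include <string>

// Assumption: C++11 <regex>, so the namespace is std instead of std::tr1.

// Returns "<level>. <title>" for the first heading found,
// or an empty string when the pattern does not match.
std::string describe_heading(const std::string& html) {
    std::regex rx("<h(.)>([^<]+)");
    std::smatch res;  // match_results<std::string::const_iterator>
    if (!std::regex_search(html, res, rx)) {
        return "";
    }
    // res[0] is the whole match; res[1] and res[2] are the captures.
    return res[1].str() + ". " + res[2].str();
}
```

Called with "<h2>Egg prices</h2>", the function returns the same text that the cmatch example prints.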

Replacing matches

The following code will replace “world” in the string “Hello world” with “planet”. The string str2 will contain “Hello planet” and the string str will remain unchanged.

        
    std::string str = "Hello world";
    std::tr1::regex rx("world");
    std::string replacement = "planet";
    std::string str2 = std::tr1::regex_replace(str, rx, replacement);
        

Note that regex_replace does not change its arguments, unlike the Perl command s/world/planet/.

Note also that the third argument to regex_replace must be a string object and not a string literal. You could, however, eliminate the temporary variable replacement by passing a string literal converted to a string directly in the call to regex_replace.

        
    regex_replace(str, rx, std::string("planet"))
        

By default, all instances of the pattern that match the regular expression are replaced. In the example above, if str had been "Hello world world" the result would have been "Hello planet planet". To replace only the first instance (to produce "Hello planet world") you would need to add the flag

        
    std::tr1::regex_constants::format_first_only
        

as the fourth argument to regex_replace.

Because the default behavior of regex_replace is a global replace, the function is analogous to the s///g operator in Perl. With the format_first_only flag the function is analogous to the unmodified s/// Perl operator.
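The two replacement modes can be compared side by side. The following is a small sketch assuming a C++11 compiler (std rather than std::tr1); replace_all and replace_first are hypothetical names chosen for the example.

```cpp
#include <regex>
#include <string>

// Assumption: C++11 <regex>, so the namespace is std instead of std::tr1.

// Replaces every occurrence of "world", like s/world/planet/g in Perl.
std::string replace_all(const std::string& s) {
    std::regex rx("world");
    return std::regex_replace(s, rx, std::string("planet"));
}

// Replaces only the first occurrence, like s/world/planet/ in Perl.
std::string replace_first(const std::string& s) {
    std::regex rx("world");
    return std::regex_replace(s, rx, std::string("planet"),
                              std::regex_constants::format_first_only);
}
```

For the input "Hello world world", replace_all yields "Hello planet planet" and replace_first yields "Hello planet world". In both cases the argument s is left unchanged.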

Escape sequences

Regular expression processing is not as convenient in C++ as it is in languages such as Perl that have built-in regular expression support. One reason is escape sequences. To send a backslash \ to the regular expression engine, you have to type \\ in the source code. For example, consider these definitions.

        
    std::string str = "Hello\tworld";
    std::tr1::regex rx("o\\tw");
        

The string str contains a tab character between the o and the w. The regular expression rx does not contain a tab character; it contains \t, the regular expression syntax for matching a tab character.
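The sketch below checks the escaped pattern against a string with and without a tab. It assumes a C++11 compiler (std rather than std::tr1). The second function uses a C++11 raw string literal, which is not part of TR1 and is shown only as one possible way to avoid doubling the backslashes; both function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: C++11 <regex>, so the namespace is std instead of std::tr1.

// "o\\tw" in source code is the four-character pattern  o \ t w,
// which the regex engine reads as: 'o', a tab character, 'w'.
bool has_o_tab_w(const std::string& s) {
    std::regex rx("o\\tw");
    return std::regex_search(s, rx);
}

// Same pattern written as a C++11 raw string literal (not available in TR1):
// the backslash reaches the regex engine without being doubled.
bool has_o_tab_w_raw(const std::string& s) {
    std::regex rx(R"(o\tw)");
    return std::regex_search(s, rx);
}
```

Both functions match "Hello\tworld" (with a real tab) and reject "Hello world" (with a space).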

Case-sensitivity

C++ regular expressions are case-sensitive by default, as in Perl and many other environments. To specify that a regular expression is case-insensitive, add the flag std::tr1::regex_constants::icase as a second argument to the regex constructor. (The constructor flags can be combined with a bitwise OR. So if you're specifying a flag for the regular expression flavor, you can follow it with | icase to combine the two.)
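A minimal sketch of the icase flag follows, combined with an explicit flavor flag using the | operator. It assumes a C++11 compiler (std rather than std::tr1), and the two function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: C++11 <regex>, so the namespace is std instead of std::tr1.

// Case-insensitive search: the flavor flag and icase are combined with |.
bool contains_hello_any_case(const std::string& s) {
    std::regex rx("hello",
                  std::regex_constants::ECMAScript | std::regex_constants::icase);
    return std::regex_search(s, rx);
}

// Default construction: case-sensitive.
bool contains_hello_exact_case(const std::string& s) {
    std::regex rx("hello");
    return std::regex_search(s, rx);
}
```

Given "HELLO world", the first function returns true and the second returns false.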

Support for case-sensitivity highlights the differences between C++ and scripting languages. C++ allows more control over regular expressions but also requires more input. For example, Perl makes the m// (match) and s/// (replace) operators case-insensitive by simply appending an i. While the regular expression syntax in C++ is more cluttered than that of scripting languages, people who use C++ are doing so because they value control over succinct syntax.

Troubleshooting

If you have trouble linking with the regex library in Visual Studio 2008, this post may help.




Reposted from: http://www.johndcook.com/cpp_regex.html
