Using Regular Expressions in VC


xust999


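A regular expression is a pattern used for fuzzy (pattern-based) matching of text. Regular expressions are used for data validation and for searching and replacing text.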
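As a quick illustration of those three uses, here is a minimal sketch built on the C++11 standard <regex> library rather than ATL, so it compiles without atlrx.h. The helper names IsAllDigits, FindFirstNumber and MaskNumbers are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Pattern shared by the three helpers: one or more decimal digits.
static const std::regex kDigits("[0-9]+");

// Validation: true only if the WHOLE input matches the pattern.
bool IsAllDigits(const std::string& s) {
    return std::regex_match(s, kDigits);
}

// Searching: returns the first run of digits anywhere in the text, or "".
std::string FindFirstNumber(const std::string& text) {
    std::smatch m;
    return std::regex_search(text, m, kDigits) ? m.str() : std::string();
}

// Replacing: replaces every run of digits with '#'.
std::string MaskNumbers(const std::string& text) {
    return std::regex_replace(text, kDigits, std::string("#"));
}
```

For example, FindFirstNumber("order 66, room 101") returns "66", and MaskNumbers on the same text returns "order #, room #".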

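This article focuses on two ATL template classes, CAtlRegExp and CAtlREMatchContext.

To use CAtlRegExp, add #include <atlrx.h>.

RegExp is short for Regular Expression.

The following example, which validates an email address, shows how the two classes are used.

It is adapted from http://msdn.microsoft.com/en-us/library/k3zs4axe(VS.80).aspx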

// Assumes a Unicode (_UNICODE) MFC build, the VC2005 default.
// Letters are allowed before the @ so that an address like admin@domain.com matches.
CStringW strRegex = L"({[a-zA-Z0-9_]+@[a-zA-Z0-9]+[.][a-zA-Z0-9]+[.]?[a-zA-Z0-9]+})";
CStringW strInput = L"admin@domain.com";

// Compile the pattern; Parse() returns REPARSE_ERROR_OK on success.
CAtlRegExp<CAtlRECharTraitsW> reRule;
REParseError status = reRule.Parse(strRegex);
if (REPARSE_ERROR_OK != status)
{
    return 0;
}

// Match the input; the match groups are stored in mcRule.
CAtlREMatchContext<CAtlRECharTraitsW> mcRule;
if (!reRule.Match(strInput, &mcRule))
{
    AfxMessageBox(L"The email address you entered is invalid!");
}
else
{
    for (UINT nGroupIndex = 0; nGroupIndex < mcRule.m_uNumGroups; ++nGroupIndex)
    {
        const CAtlREMatchContext<CAtlRECharTraitsW>::RECHAR* szStart = 0;
        const CAtlREMatchContext<CAtlRECharTraitsW>::RECHAR* szEnd = 0;
        mcRule.GetMatch(nGroupIndex, &szStart, &szEnd);

        // szEnd points one past the last matched character.
        ptrdiff_t nLength = szEnd - szStart;
        CStringW strEmailAddress(szStart, static_cast<int>(nLength));

        if (strEmailAddress.Compare(strInput) != 0)
        {
            // Only part of the input matched: offer the matched part as a suggestion.
            CStringW strPrompt;
            strPrompt.Format(L"The email address is invalid. Did you mean %s?", strEmailAddress.GetString());
            AfxMessageBox(strPrompt);
        }
        else
        {
            AfxMessageBox(L"The email address is valid!");
        }
    }
}
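For comparison, roughly the same check can be sketched with the C++11 <regex> library, which needs no ATL. This is a translation under stated assumptions, not the ATL API: the ATL match group {...} becomes the capturing group (...), letters, digits and underscores are allowed before the @, narrow std::string stands in for CStringW, and std::regex_search is used because the example compares the matched group with the whole input, which suggests a match may cover only part of it. CheckEmail and EmailCheck are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Outcome of the check, mirroring the three message boxes in the ATL example.
enum class EmailCheck { Invalid, Suggest, Valid };

// Hypothetical std::regex version of the email check. The ATL match group
// {...} becomes the capturing group (...), i.e. group 1 in std::smatch.
EmailCheck CheckEmail(const std::string& input, std::string* suggestion) {
    static const std::regex kEmail(
        "([a-zA-Z0-9_]+@[a-zA-Z0-9]+[.][a-zA-Z0-9]+[.]?[a-zA-Z0-9]+)");
    std::smatch m;
    if (!std::regex_search(input, m, kEmail))
        return EmailCheck::Invalid;           // no address-like text at all
    if (m.str(1) != input) {                  // only part of the input matched
        if (suggestion != nullptr) *suggestion = m.str(1);
        return EmailCheck::Suggest;           // the "Did you mean ...?" case
    }
    return EmailCheck::Valid;                 // the whole input is the address
}
```

With this sketch, CheckEmail("to: admin@domain.com", &s) returns EmailCheck::Suggest and sets s to "admin@domain.com", which corresponds to the "Did you mean ...?" branch above.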

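Both template classes are parameterized by a character-traits class that describes the character set: ASCII, WCHAR (wide characters), or multibyte.

You can usually leave this parameter out: the templates default to CAtlRECharTraits, which is mapped to a concrete traits class according to the project's character-set setting.

atlrx.h provides three traits classes:

CAtlRECharTraitsA for ASCII

CAtlRECharTraitsW for Unicode (wide characters)

CAtlRECharTraitsMB for multibyte character sets

In VC2005, projects use the Unicode character set by default.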

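The relevant code in atlrx.h is:

#ifndef _UNICODE
typedef CAtlRECharTraitsA CAtlRECharTraits;
#else // _UNICODE
typedef CAtlRECharTraitsW CAtlRECharTraits;
#endif // !_UNICODE

Note that the default is never CAtlRECharTraitsMB; the multibyte traits have to be named explicitly.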
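The same compile-time switch can be mimicked for the standard library's narrow and wide regex types. A minimal sketch, in which TRegex, TString, TEXT_LIT and MatchesDigits are hypothetical names chosen to echo the ATL/TCHAR convention:

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <type_traits>

// Hypothetical analogue of the atlrx.h typedef: the same _UNICODE macro
// selects the wide or narrow standard regex and string types.
#ifndef _UNICODE
typedef std::regex   TRegex;    // narrow char, like CAtlRECharTraitsA
typedef std::string  TString;
#define TEXT_LIT(s) s
#else // _UNICODE
typedef std::wregex  TRegex;    // wchar_t, like CAtlRECharTraitsW
typedef std::wstring TString;
#define TEXT_LIT(s) L##s
#endif // !_UNICODE

// Code written against TRegex/TString compiles in either configuration.
bool MatchesDigits(const TString& s) {
    static const TRegex kDigits(TEXT_LIT("[0-9]+"));
    return std::regex_match(s, kDigits);
}
```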

So, in a Unicode build, a CAtlRegExp object can be declared either as

CAtlRegExp<> reRule;
REParseError status = reRule.Parse(strRegex);

or as

CAtlRegExp<CAtlRECharTraitsW> reRule;
REParseError status = reRule.Parse(strRegex);

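Calling CAtlRegExp::Parse() with the regular-expression string as its argument compiles the pattern into the object. Parse() returns REPARSE_ERROR_OK on success and a different REParseError value if the pattern is malformed.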
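Parse() reports a malformed pattern through its REParseError return value. For comparison, std::regex (C++11, not ATL) reports it by throwing std::regex_error, so the equivalent check is a try/catch. A minimal sketch; TryCompile is a hypothetical helper:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Compile-and-check step analogous to testing Parse() against
// REPARSE_ERROR_OK. On failure *out is left unchanged.
bool TryCompile(const std::string& pattern, std::regex* out) {
    try {
        *out = std::regex(pattern);       // throws on a malformed pattern
        return true;
    } catch (const std::regex_error&) {
        return false;                     // e.g. an unbalanced parenthesis
    }
}
```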

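Then call CAtlRegExp::Match().

Parameters of Match():

The first parameter is the string to match against.

The second parameter is a CAtlREMatchContext object that receives the match results.

The member variable m_uNumGroups of CAtlREMatchContext holds the number of match groups.

CAtlREMatchContext::GetMatch() returns, for a given group index, pointers to the start and end of the matched text. The end pointer points one past the last matched character, so szEnd - szStart is the length of the match.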
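The start/end pointer pair returned by GetMatch() has a direct counterpart in std::cmatch (C++11 <regex>, not ATL): for a const char* input, each sub-match's first and second members are pointers, and mark_count() plays the role of m_uNumGroups. A minimal sketch; CollectGroups is a hypothetical helper. Note the numbering difference: std::regex uses index 0 for the whole match and numbers capture groups from 1, whereas CAtlRegExp numbers its {} groups from 0.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the text of every capture group of the first match of `re` in
// `input`, using the same start/end-pointer arithmetic as the ATL example.
std::vector<std::string> CollectGroups(const char* input, const std::regex& re) {
    std::vector<std::string> groups;
    std::cmatch m;
    if (!std::regex_search(input, m, re))
        return groups;                          // no match: no groups
    // mark_count() is the number of capture groups, like m_uNumGroups.
    for (std::size_t i = 1; i <= re.mark_count(); ++i) {
        if (!m[i].matched) {                    // optional group that took no part
            groups.emplace_back();
            continue;
        }
        const char* szStart = m[i].first;       // first matched character
        const char* szEnd = m[i].second;        // one past the last matched character
        std::ptrdiff_t nLength = szEnd - szStart;
        groups.emplace_back(szStart, static_cast<std::size_t>(nLength));
    }
    return groups;
}
```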

The regular expression syntax below is excerpted from MSDN.

Original: http://msdn.microsoft.com/en-us/library/k3zs4axe(VS.80).aspx

Regular Expression Syntax

This table lists the metacharacters understood by CAtlRegExp.

Metacharacter	Meaning
.	Matches any single character.
[ ]	Indicates a character class. Matches any character inside the brackets (for example, [abc] matches "a", "b", and "c").
^	If this metacharacter occurs at the start of a character class, it negates the character class. A negated character class matches any character except those inside the brackets (for example, [^abc] matches all characters except "a", "b", and "c"). If ^ is at the beginning of the regular expression, it matches the beginning of the input (for example, ^[abc] will only match input that begins with "a", "b", or "c").
-	In a character class, indicates a range of characters (for example, [0-9] matches any of the digits "0" through "9").
?	Indicates that the preceding expression is optional: it matches once or not at all (for example, [0-9][0-9]? matches "2" and "12").
+	Indicates that the preceding expression matches one or more times (for example, [0-9]+ matches "1", "13", "456", and so on).
*	Indicates that the preceding expression matches zero or more times.
??, +?, *?	Non-greedy versions of ?, +, and *. These match as little as possible, unlike the greedy versions that match as much as possible (for example, given the input "<abc><def>", <.*?> matches "<abc>" while <.*> matches "<abc><def>").
( )	Grouping operator. Example: (\d+,)*\d+ matches a list of numbers separated by commas (for example, "1" or "1,23,456").
{ }	Indicates a match group. The actual text in the input that matches the expression inside the braces can be retrieved through the CAtlREMatchContext object.
\	Escape character: interpret the next character literally (for example, [0-9]+ matches one or more digits, but [0-9]\+ matches a digit followed by a plus character). Also used for abbreviations (such as \a for any alphanumeric character; see the following table). If \ is followed by a number n, it matches the nth match group (starting from 0). Example: <{.*?}>.*?</\0> matches "<head>Contents</head>". Note that, in C++ string literals, two backslashes must be used: "\\+", "\\a", "<{.*?}>.*?</\\0>".
$	At the end of a regular expression, this character matches the end of the input (for example, [0-9]$ matches a digit at the end of the input).
|	Alternation operator: separates two expressions, exactly one of which matches (for example, T|the matches "The" or "the").
!	Negation operator: the expression following ! does not match the input (for example, a!b matches "a" not followed by "b").
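To try the non-greedy, back-reference and negation examples from this table without ATL, they can be rewritten in the ECMAScript syntax used by std::regex (C++11): {...} becomes (...), the back-reference \0 becomes \1 because std::regex numbers groups from 1, and a!b becomes the lookahead a(?!b). This translation is an assumption of the sketch, not something CAtlRegExp provides; FirstMatch is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first substring of `text` matched by `pattern` (ECMAScript
// syntax, as used by std::regex), or an empty string if nothing matches.
std::string FirstMatch(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

With it, FirstMatch("<abc><def>", "<.*?>") returns "<abc>", while the greedy "<.*>" returns "<abc><def>", as in the table.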

Abbreviations

CAtlRegExp can handle abbreviations, such as \d instead of [0-9]. The abbreviations are provided by the character traits class passed in the CharTraits parameter. The predefined character traits classes provide the following abbreviations.

Abbreviation	Matches
\a	Any alphanumeric character: ([a-zA-Z0-9])
\b	White space (blank): ([ \t])
\c	Any alphabetic character: ([a-zA-Z])
\d	Any decimal digit: ([0-9])
\h	Any hexadecimal digit: ([0-9a-fA-F])
\n	Newline: (\r|(\r?\n))
\q	A quoted string: (\"[^\"]*\")|(\'[^\']*\')
\w	A simple word: ([a-zA-Z]+)
\z	An integer: ([0-9]+)
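These abbreviations are not portable: in ECMAScript (std::regex), \b means a word boundary and \w a word character, which differ from the ATL meanings above. A hypothetical helper that expands the ATL abbreviations into explicit classes before handing a pattern to std::regex might look like the sketch below. It assumes abbreviations appear outside bracket expressions, renders ATL's plain ( ) groups as non-capturing (?: ), and copies other escapes (including back-references such as \0, which would still need renumbering) unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: rewrites CAtlRegExp abbreviations (\a, \b, ... \z)
// as explicit ECMAScript expressions, following the table above.
std::string ExpandAtlAbbreviations(const std::string& atlPattern) {
    static const std::map<char, std::string> kAbbrev = {
        {'a', "[a-zA-Z0-9]"},
        {'b', "[ \\t]"},
        {'c', "[a-zA-Z]"},
        {'d', "[0-9]"},
        {'h', "[0-9a-fA-F]"},
        {'n', "(?:\\r|(?:\\r?\\n))"},
        {'q', "(?:(?:\"[^\"]*\")|(?:'[^']*'))"},
        {'w', "(?:[a-zA-Z]+)"},
        {'z', "(?:[0-9]+)"},
    };
    std::string out;
    for (std::size_t i = 0; i < atlPattern.size(); ++i) {
        if (atlPattern[i] == '\\' && i + 1 < atlPattern.size()) {
            auto it = kAbbrev.find(atlPattern[i + 1]);
            if (it != kAbbrev.end()) {      // known abbreviation: expand it
                out += it->second;
                ++i;
                continue;
            }
            out += atlPattern[i];           // ordinary escape: keep backslash
            out += atlPattern[++i];         // ...and the escaped character
            continue;
        }
        out += atlPattern[i];
    }
    return out;
}
```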

A Chinese translation of this syntax reference is available at http://www.vckbase.com/document/viewdoc/?id=1138


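Notes on "^":

If ^ appears at the start of a character class, it negates the class; for example, [^abc] matches any character except "a", "b", and "c".

If ^ appears at the start of the regular expression, it matches the beginning of the input; for example, ^[abc] matches only input that begins with "a", "b", or "c".

Notes on "$":

At the end of the regular expression, $ matches the end of the input.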
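A short std::regex sketch (C++11, not ATL) of these cases. std::regex_search looks for the pattern anywhere in the input, so only ^ and $ tie the match to the start or the end. Contains is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `pattern` (ECMAScript syntax) matches anywhere in `input`;
// ^ and $ restrict the match to the beginning or end of the input.
bool Contains(const std::string& input, const std::string& pattern) {
    return std::regex_search(input, std::regex(pattern));
}
```

For instance, Contains("xa", "[abc]") is true but Contains("xa", "^[abc]") is false, because "xa" does not begin with a, b, or c.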

ATL CAtlRegExp

ATL Server often has to decode complex text fields such as addresses and commands. Regular expressions are a powerful text-parsing tool, so ATL provides a regular-expression engine.

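Other notes:

Missing header file:

In VS 2008, part of ATL (the ATL Server code, which includes the regular-expression classes) was split off into a separate open-source project, so to use ATL's regular-expression support you have to download it manually.

Reference: http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=306398

Download: http://www.codeplex.com/AtlServer

Extract the download into a directory, for example c:\alt\

Then in Visual Studio go to [Tools] > [Options] > [Projects and Solutions] > [VC++ Directories], select [Include files] in the drop-down at the top right, and add c:\alt\include.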

CAtlRegExp and GRETA do not support quantifiers of the form {m,n} (in CAtlRegExp, braces denote match groups); Boost does.
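A common workaround when {m,n} is unavailable is to spell the repetition out: m mandatory copies of the item followed by n - m optional copies, for example [0-9][0-9][0-9]?[0-9]? instead of [0-9]{2,4}. The expanded form uses only ?, so, assuming ? behaves as documented in the table above, CAtlRegExp should accept it too. A minimal sketch of a hypothetical RepeatRange helper; it assumes m <= n and that the atom is a single character class or an already-grouped expression.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: builds the {m,n}-free equivalent of atom{m,n}
// as m required copies followed by (n - m) optional copies.
std::string RepeatRange(const std::string& atom, std::size_t m, std::size_t n) {
    std::string out;
    for (std::size_t i = 0; i < m; ++i) out += atom;        // required part
    for (std::size_t i = m; i < n; ++i) out += atom + "?";  // optional part
    return out;
}
```

RepeatRange("[0-9]", 2, 4) returns "[0-9][0-9][0-9]?[0-9]?".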

Also note that ATL uses braces {} to mark submatches (match groups).

Match groups are numbered from 0.

For example:

re.Parse("^{\\w+}\\b+{(\\a|\\b)+}$");

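// {} marks a group whose text is saved in the match results (CAtlREMatchContext), like () in Perl
// () groups an expression without saving it, like (?:...) in Perl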
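For comparison, the same pattern can be written in ECMAScript syntax for std::regex (C++11, not ATL) by expanding the abbreviations by hand: \w = [a-zA-Z]+ (so \w+ reduces to [a-zA-Z]+), \b = [ \t], \a = [a-zA-Z0-9]. Saved groups {...} become capturing (...), and a plain (...) becomes non-capturing (?:...). A minimal sketch; SplitRule is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// ECMAScript version of ^{\w+}\b+{(\a|\b)+}$ with the ATL abbreviations
// expanded by hand. The two ATL match groups {...} become capture groups
// 1 and 2; the plain ATL group (...) becomes the non-capturing (?:...).
static const std::regex kRule("^([a-zA-Z]+)[ \\t]+((?:[a-zA-Z0-9]|[ \\t])+)$");

// Splits "word rest" into the two saved groups. CAtlREMatchContext would
// report them as groups 0 and 1; std::smatch reports them as 1 and 2.
bool SplitRule(const std::string& input, std::string* word, std::string* rest) {
    std::smatch m;
    if (!std::regex_match(input, m, kRule))
        return false;
    *word = m.str(1);
    *rest = m.str(2);
    return true;
}
```

For the input "hello big world 42", the sketch yields "hello" and "big world 42".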

