Regular Expression Groups (Group) in C#

Latest recommended article published 2024-02-04 12:26:35

anmu1361

Tags: c#

Original article: http://www.cnblogs.com/px7034/archive/2011/01/24/1943062.html

Reference: http://www.cnblogs.com/kiant71/archive/2010/08/14/1799799.html

When you need to extract several distinct parts (subexpressions) from the text matched by a regular expression, you use groups.

In C#, the classes around Regex are related as follows; Group is the class that represents a single group:

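Regex -> MatchCollection (all matches, returned by Regex.Matches)
      -> Match (a single match)
            -> GroupCollection (the groups/subexpressions inside one match, Match.Groups)
                  -> Group (the content of one group/subexpression)
                        -> CaptureCollection (every capture the group made, Group.Captures; a group inside a quantifier can capture more than once)
                              -> Capture (one captured substring)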
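To make the hierarchy concrete, here is a minimal console sketch (assuming .NET 5 or later for top-level statements; the input string and the pattern `(\d)+` are made up for illustration) that walks every level, from the MatchCollection down to the individual Capture objects. A group inside a quantifier is where CaptureCollection matters: Group.Value keeps only the last repetition, while Group.Captures keeps all of them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample: two runs of digits separated by letters.
var regex = new Regex(@"(\d)+");
MatchCollection matches = regex.Matches("ab123cd45");

var lastValues = new List<string>();   // group 1's Value for each match
var allCaptures = new List<string>();  // every capture of group 1, across all matches

foreach (Match match in matches)                 // MatchCollection -> Match
{
    Group digit = match.Groups[1];               // GroupCollection -> Group
    lastValues.Add(digit.Value);                 // Value = last repetition only
    foreach (Capture capture in digit.Captures)  // CaptureCollection -> Capture
    {
        allCaptures.Add(capture.Value);
    }
    Console.WriteLine($"match \"{match.Value}\": group 1 = \"{digit.Value}\", " +
                      $"captures = {digit.Captures.Count}");
}
```

For the match "123", group 1 reports "3" as its Value but holds three captures ("1", "2", "3").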

Groups can be accessed in two ways:

1. Access by index

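The regular expression ((\d+)([a-z]))\s+ contains four groups in total (counting group 0). Groups are numbered from left to right by the position of their opening parenthesis:

Groups[0]    the match itself, i.e. the text matched by the entire expression ((\d+)([a-z]))\s+
Groups[1]    the subexpression ((\d+)([a-z]))
Groups[2]    the subexpression (\d+)
Groups[3]    the subexpression ([a-z])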
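Because numbering follows the opening parentheses, the outer group ((\d+)([a-z])) gets number 1 before the two groups nested inside it. A minimal console sketch (assuming .NET 5 or later for top-level statements, with Console.WriteLine standing in for the ASP.NET Response.Write used in the example that follows) checks this numbering on the first match of a short, made-up input:

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"((\d+)([a-z]))\s+", RegexOptions.IgnoreCase);
Match match = regex.Match("1A 2B");   // first match is "1A " (with trailing space)

// GetGroupNumbers lists every group number the pattern defines, including 0.
int[] numbers = regex.GetGroupNumbers();

foreach (int n in numbers)
{
    Group g = match.Groups[n];
    Console.WriteLine($"Groups[{n}] = \"{g.Value}\" (index {g.Index}, length {g.Length})");
}
```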

 
    // ASP.NET page code: Response.Write writes HTML to the response.
    // Requires: using System.Text.RegularExpressions;
    string text = "1A 2B 3C 4D 5E 6F 7G 8H 9I 10J 11Q 12J 13K 14L 15M 16N ffee80 #800080";
    Response.Write(text + "<br/>");

    string strPattern = @"((\d+)([a-z]))\s+";
    Regex rex = new Regex(strPattern, RegexOptions.IgnoreCase);
    MatchCollection matches = rex.Matches(text);

    // Extract each match
    foreach (Match match in matches)
    {
        GroupCollection groups = match.Groups;
        Response.Write(string.Format("<br/>Match \"{0}\" has {1} groups: {2}<br/>",
                                     match.Value, groups.Count, strPattern));

        // Extract the groups inside the match
        for (int i = 0; i < groups.Count; i++)
        {
            Response.Write(
                string.Format("Group {0} is \"{1}\", index {2}, length {3}<br/>",
                              i,
                              groups[i].Value,
                              groups[i].Index,
                              groups[i].Length));
        }
    }

    /*
     * Output:
     1A 2B 3C 4D 5E 6F 7G 8H 9I 10J 11Q 12J 13K 14L 15M 16N ffee80 #800080

     Match "1A " has 4 groups: ((\d+)([a-z]))\s+
     Group 0 is "1A ", index 0, length 3
     Group 1 is "1A", index 0, length 2
     Group 2 is "1", index 0, length 1
     Group 3 is "A", index 1, length 1

     ...
    */

2. Access by name

Use (?<xxx>subexpression) to give a group a name; you can then read that group's content with Groups["xxx"].

 
    // ASP.NET page code; requires: using System.Text.RegularExpressions;
    string text = "I've found this amazing URL at http://www.sohu.com, and then find ftp://ftp.sohu.comisbetter.";
    Response.Write(text + "<br/>");

    string pattern = @"\b(?<protocol>\S+)://(?<address>\S+)\b";
    // HTML-encode < and > so the group names are displayed in the page
    Response.Write(pattern.Replace("<", "&lt;").Replace(">", "&gt;") + "<br/><br/>");

    MatchCollection matches = Regex.Matches(text, pattern);
    foreach (Match match in matches)
    {
        GroupCollection groups = match.Groups;
        Response.Write(string.Format(
                        "URL: {0}; Protocol: {1}; Address: {2} <br/>",
                        match.Value,
                        groups["protocol"].Value,
                        groups["address"].Value));
    }

    /*
     * Output:
     I've found this amazing URL at http://www.sohu.com, and then find ftp://ftp.sohu.comisbetter.
     \b(?<protocol>\S+)://(?<address>\S+)\b

     URL: http://www.sohu.com; Protocol: http; Address: www.sohu.com
     URL: ftp://ftp.sohu.comisbetter; Protocol: ftp; Address: ftp.sohu.comisbetter
    */
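A named group still receives a number: .NET numbers unnamed groups first and then named groups, left to right, so in this pattern protocol is group 1 and address is group 2. The console sketch below (again assuming .NET 5 or later for top-level statements; the single-URL input is made up) reads the same group by name and by number, and lists the names with Regex.GetGroupNames:

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"\b(?<protocol>\S+)://(?<address>\S+)\b");
Match match = regex.Match("see http://www.sohu.com, please");

// Named access and numeric access refer to the same group.
string byName = match.Groups["protocol"].Value;
string byNumber = match.Groups[regex.GroupNumberFromName("protocol")].Value;

// GetGroupNames also includes the implicit group "0" (the whole match).
string[] names = regex.GetGroupNames();

Console.WriteLine($"protocol by name: {byName}, by number: {byNumber}");
Console.WriteLine($"address: {match.Groups["address"].Value}");
Console.WriteLine("group names: " + string.Join(", ", names));
```

Named access is usually the safer choice: adding another group to the pattern can shift the numbers, but not the names.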