Regular Expressions Explained, with C# and TypeScript Examples

Latest recommended article published on 2024-04-01 21:39:49

爱敲代码的小伙计

Article link: https://blog.csdn.net/qq_35231953/article/details/106835296

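When I was learning regular expressions, I found that most posts did not go into enough detail, so I pieced things together from many of them. These notes collect what I learned in a form that I hope is reasonably detailed and easy to follow, for anyone who wants a reference.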

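I. Metacharacters

Metacharacter	Description	Example
.	Matches any character except a line break	.abc => aabcdef, 6abcww (matches aabc and 6abc)
\w	Matches a letter, digit, or underscore; in .NET it also matches other Unicode letters such as Chinese characters, while in JavaScript it is limited to [A-Za-z0-9_]	\w3 => b3df, 333aad3kk (matches b3; 33 and d3)
\s	Matches any whitespace character	a\sb => sssa bdd (matches "a b")
\d	Matches a digit	\da => wa3a (matches 3a)
\b	Matches a word boundary (the start or end of a word)	\binfo\b => abcinfodex info des (matches only the standalone info)
^	Matches the start of the string (applies to everything to the right of ^)	^info => infoabc info dex (matches only the leading info)
$	Matches the end of the string (applies to everything to the left of $)	info$ => infoabc info dex info (matches only the trailing info)
[]	Character class: matches any one character listed inside the brackets	[b3]3 => b3df, 333aad3kk (matches b3; 33)
[^]	Negated character class: matches any one character not listed inside the brackets. Parentheses inside a class are literal, so [^(ac)] excludes (, a, c, and )	[^a]b => sssa bddabc; [^(ac)]b => sssa bddabcb (both match " b")
(xyz)	Group: matches the exact sequence xyz and captures it	(abc)d => abcdeabcaad (matches abcd)
|	Alternation: matches the expression before or after the symbol	(a|b)d => adbd abd (matches ad, bd, bd)
\	Escape character: matches a reserved character literally	[ ] ( ) { } . * + ? ^ $ \ |
@	Not a regex metacharacter but the C# verbatim-string prefix: backslashes in the string are not treated as escapes	The path "c:\abc\d.txt" causes a compiler error because \d is not a valid C# escape; write @"c:\abc\d.txt" instead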
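The highlighting in the example column is easy to lose, so a few of the rows can be checked directly. The following is a minimal TypeScript sketch; `findAll` is a hypothetical helper introduced here for illustration and is not part of the original notes.

```typescript
// Hypothetical helper: returns every match of `pattern` in `text`.
function findAll(pattern: string, text: string): string[] {
    return text.match(new RegExp(pattern, "g")) || [];
}

console.log(findAll(".abc", "aabcdef"));                   // matches: aabc
console.log(findAll("\\w3", "333aad3kk"));                 // matches: 33, d3
console.log(findAll("a\\sb", "sssa bdd"));                 // matches: "a b"
console.log(findAll("\\binfo\\b", "abcinfodex info des")); // matches: info (standalone word only)
console.log(findAll("^info", "infoabc info dex"));         // matches: info (leading only)
console.log(findAll("[^a]b", "sssa bddabc"));              // matches: " b"
console.log(findAll("(a|b)d", "adbd abd"));                // matches: ad, bd, bd
```

Each pattern is written as a string passed to `new RegExp`, so every backslash is doubled, the same convention the later TypeScript examples use.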

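II. Quantifiers

A quantifier applies to the single character, character class, or group immediately to its left. Quantifiers are greedy by default; appending "?" makes them lazy (non-greedy), meaning they repeat as few times as possible.

Syntax (greedy)	Description	Lazy (non-greedy)
*	Repeats zero or more times, equivalent to {0,}	*?
+	Repeats one or more times, equivalent to {1,}	+?
?	Repeats zero or one time (makes the item optional)	??
{n}	Repeats exactly n times
{n,}	Repeats n or more times	{n,}?
{n,m}	Repeats n to m times	{n,m}?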

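Examples:

1. Match an 8-digit QQ number: ^\d{8}$

2. Match an 11-digit number that starts with 1: ^1\d{10}$

3. Match a number with 14 to 18 digits: ^\d{14,18}$

4. Match a string that starts with a and ends with zero or more b characters: ^ab*$

5. * matches the preceding item zero or more times. For example, a* matches zero or more consecutive a characters, and [a-z]* matches any run of consecutive lowercase letters (possibly empty). "[a-z]*" => The car parked in the garage #21.

6. The metacharacter ? makes the preceding item optional, that is, it occurs 0 or 1 times. For example, [T]?he matches both he and The.

"[T]he" => The car is parked in the garage. (matches The)
"[T]?he" => The car is parked in the garage. (matches The, and the he in the)

7. Braces {} form a quantifier that specifies how many times a character or group may repeat. For example, [0-9]{2,3} matches at least 2 and at most 3 digits from 0 to 9. "[0-9]{2,3}" => The number was 9.9997 but we rounded it off to 10.0. (matches 999 and 10)

8. [0-9]{2,} matches two or more digits from 0 to 9. "[0-9]{2,}" => The number was 9.9997 but we rounded it off to 10.0. (matches 9997 and 10)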
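The patterns from the list above can be tried out with `RegExp.prototype.test` and `String.prototype.match`. This is a small sketch; the variable names and the sample inputs (such as the phone number) are made up for illustration.

```typescript
// Patterns from examples 1 to 4; names and inputs are illustrative.
const qqPattern = new RegExp("^\\d{8}$");
const phonePattern = new RegExp("^1\\d{10}$");
const longNumberPattern = new RegExp("^\\d{14,18}$");
const abPattern = new RegExp("^ab*$");

console.log(qqPattern.test("12345678"));              // true: exactly 8 digits
console.log(qqPattern.test("1234567"));               // false: only 7 digits
console.log(phonePattern.test("13800138000"));        // true: 1 followed by 10 digits
console.log(longNumberPattern.test("1234567890123")); // false: 13 digits is too short
console.log(abPattern.test("abbb"));                  // true
console.log(abPattern.test("a"));                     // true: b may occur zero times
console.log(abPattern.test("abc"));                   // false: c is not allowed before the end

// {2,3} versus {2,} on the sentence from examples 7 and 8.
const roundedSentence = "The number was 9.9997 but we rounded it off to 10.0.";
console.log(roundedSentence.match(new RegExp("[0-9]{2,3}", "g"))); // 999 and 10
console.log(roundedSentence.match(new RegExp("[0-9]{2,}", "g")));  // 9997 and 10
```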

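III. Grouping

Parentheses () create a group, so everything inside them is treated as a single unit.

1. Match a string that starts with zero or more repetitions of ab: ^(ab)*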
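To see what the parentheses change, it may help to compare the grouped pattern with the same pattern without them. A minimal sketch, using a sample string chosen here for illustration:

```typescript
const groupSample = "ababbb";

// With the group, * repeats the whole unit "ab".
const groupedMatch = new RegExp("^(ab)*").exec(groupSample);
// Without the group, * repeats only the b.
const ungroupedMatch = new RegExp("^ab*").exec(groupSample);

console.log(groupedMatch ? groupedMatch[0] : null);     // abab
console.log(ungroupedMatch ? ungroupedMatch[0] : null); // ab
```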

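IV. Escaping

Regular expressions provide escaping, which turns metacharacters, quantifiers, and other special symbols into ordinary characters.

The backslash \ escapes the character that follows it. The characters { } [ ] ( ) \ + * . $ ^ | ? are special (and / delimits a JavaScript regex literal); to match one of them literally, put a backslash \ in front of it.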
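When the text to match comes from a variable, escaping by hand is error-prone. A common approach is a small helper that prefixes every metacharacter with a backslash. The sketch below assumes such a helper; `escapeRegExp` is not a built-in used here, just a function defined for illustration.

```typescript
// Hypothetical helper: puts a backslash in front of every regex metacharacter.
function escapeRegExp(literal: string): string {
    return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const escapedGroup = escapeRegExp("(ab)");
console.log(escapedGroup); // \(ab\)

// Zero or more repetitions of the literal text "(ab)" at the start of the string.
const literalPrefix = new RegExp("^(" + escapedGroup + ")*").exec("(ab)(ab)c");
console.log(literalPrefix ? literalPrefix[0] : null); // (ab)(ab)

console.log(new RegExp(escapeRegExp("1+1=2")).test("1+1=2")); // true
console.log(new RegExp("1+1=2").test("1+1=2"));               // false: the unescaped + is a quantifier
```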

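1. Match a string that starts with zero or more repetitions of the literal text (ab): ^(\(ab\))*

2. . matches any character except a line break. To match a literal . in a sentence, write \. In the following example, \.? matches an optional period.

"(f|c|m)at\.?" => The fat cat sat on the mat. (matches fat, cat, and mat.)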

V. Alternation

The | symbol means OR, also called a branch condition: the match succeeds as soon as any one of the branches matches.

"(f|c|m)at\.?" => The fat cat sat on the mat.

VI. Ranges

The bracket metacharacter [] can also express a range.

Restrict to 0 through 9: [0-9]
Restrict to A through Z: [A-Z]
Restrict to specific digits: [165]

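VII. Shorthand character classes

Shorthand	Description
.	Any character except a line break
\w	Any letter, digit, or underscore, equivalent to [a-zA-Z0-9_] for ASCII input
\W	Any non-word character (symbols, whitespace), equivalent to [^\w]
\d	A digit: [0-9]
\D	A non-digit: [^\d]
\B	A position that is not a word boundary
\s	Any whitespace character, roughly [\t\n\f\r\p{Z}]
\S	Any non-whitespace character: [^\s]
\f	A form feed
\n	A line feed
\r	A carriage return
\t	A tab
\v	A vertical tab
\p{...}	A Unicode category or property, for example \p{L} for any letter (JavaScript requires the u flag). It does not match CR/LF; use \r\n for a DOS line terminator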
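The shorthand classes are easiest to compare on one sample line. The sketch below uses a made-up string and a hypothetical `findAll` helper; the last two lines show that JavaScript's `\w` is ASCII-only, while `\p{L}` with the `u` flag (ES2018 or later) covers letters in any script.

```typescript
// Hypothetical helper: returns every match of `pattern` in `text`.
function findAll(pattern: string, text: string): string[] {
    return text.match(new RegExp(pattern, "g")) || [];
}

const sampleLine = "Room 21\tis open_now!";

console.log(findAll("\\d", sampleLine));        // 2, 1
console.log(findAll("\\w+", sampleLine));       // Room, 21, is, open_now
console.log(findAll("\\s", sampleLine).length); // 3: space, tab, space
console.log(findAll("\\W", sampleLine).length); // 4: the three whitespace characters plus !

console.log(new RegExp("\\w").test("汉"));           // false: \w is ASCII-only in JavaScript
console.log(new RegExp("\\p{L}", "u").test("汉"));   // true: \p{L} matches any Unicode letter
```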

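VIII. Zero-width assertions

1. Positive lookahead

Syntax: (?=pattern). Matches text that is immediately followed by pattern; pattern itself is not included in the match.

Example: in My name is Zhang. we want to match everything before Zhang.

.*(?=Zhang) => My name is Zhang. (matches "My name is ")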
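The same lookahead can be run in TypeScript. This is a minimal sketch; the variable names are illustrative.

```typescript
const nameSentence = "My name is Zhang.";

// .* is greedy, then backtracks until the next characters are "Zhang".
const beforeZhang = new RegExp(".*(?=Zhang)").exec(nameSentence);

// JSON.stringify makes the trailing space visible.
console.log(beforeZhang ? JSON.stringify(beforeZhang[0]) : null); // "My name is "
```

The lookahead only checks that Zhang follows; it does not consume it, which is why Zhang is absent from the result.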

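2. Positive lookbehind

Syntax: (?<=pattern). Matches text that is immediately preceded by pattern; pattern itself is not included in the match.

Example: in m_ParentPrefab: {fileID: 0} we want to match everything after m_ParentPrefab:

(?<=m_ParentPrefab:).* => m_ParentPrefab: {fileID: 0} (matches " {fileID: 0}", including the leading space)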
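A TypeScript version of this lookbehind might look as follows. It assumes a runtime with lookbehind support (ES2018 or later, for example a current Node.js); the variable names are illustrative. Putting the space inside the lookbehind is one way to drop it from the result.

```typescript
const prefabLine = "m_ParentPrefab: {fileID: 0}";

// The match starts right after the colon, so it keeps the leading space.
const afterColon = new RegExp("(?<=m_ParentPrefab:).*").exec(prefabLine);
console.log(afterColon ? JSON.stringify(afterColon[0]) : null); // " {fileID: 0}"

// Including the space in the lookbehind moves the start past it.
const afterSpace = new RegExp("(?<=m_ParentPrefab: ).*").exec(prefabLine);
console.log(afterSpace ? JSON.stringify(afterSpace[0]) : null); // "{fileID: 0}"
```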

3. Negative lookahead

Syntax: (?!pattern). Matches text that is not immediately followed by pattern; pattern itself is not included in the match.
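One thing to watch with negative lookahead is backtracking: a quantifier in front of it can give characters back until the assertion passes. The sketch below uses a made-up CSS-like string and a hypothetical `findAll` helper to show the effect and one way to guard against it.

```typescript
// Hypothetical helper: returns every match of `pattern` in `text`.
function findAll(pattern: string, text: string): string[] {
    return text.match(new RegExp(pattern, "g")) || [];
}

const sizes = "10px 20em 30px";

// Naive: \d+ backtracks from "10" to "1", and "1" is not followed by "px".
console.log(findAll("\\d+(?!px)", sizes));      // 1, 20, 3

// Also forbidding a following digit stops the partial matches.
console.log(findAll("\\d+(?!\\d|px)", sizes));  // 20
```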

Example: in The fat cat sat on the mat. we want to match the (T|t)he that is not followed by a space and fat.

"(T|t)he(?!\sfat)" => The fat cat sat on the mat. (matches only the second the)

4. Negative lookbehind

Syntax: (?<!pattern). Matches text that is not immediately preceded by pattern; pattern itself is not included in the match.
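Negative lookbehind has a similar pitfall: the match can simply start one character later, where the forbidden prefix no longer sits directly before it. A small sketch with a made-up price string and a hypothetical `findAll` helper (lookbehind needs an ES2018 or later runtime):

```typescript
// Hypothetical helper: returns every match of `pattern` in `text`.
function findAll(pattern: string, text: string): string[] {
    return text.match(new RegExp(pattern, "g")) || [];
}

const amounts = "$42 and 17 and $8";

// Naive: the 2 in 42 is preceded by 4, not by $, so it still matches.
console.log(findAll("(?<!\\$)\\d+", amounts));    // 2, 17

// Also forbidding a preceding digit keeps only whole numbers without a $.
console.log(findAll("(?<![$\\d])\\d+", amounts)); // 17
```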

Example: in The cat sat on cat. we want to match the cat that does not come right after The.

"(?<!The\s)(cat)" => The cat sat on cat. (matches only the second cat)

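IX. Capturing and non-capturing groups

1. Numbered capture groups

Syntax: (exp)

Explanation: counting left parentheses from left to right, the content between each left parenthesis and its matching right parenthesis forms a group. Group 0 is the whole match, and the numbered groups start at 1.

Take a landline number such as 020-85653333.

The regular expression is: (0\d{2})-(\d{8})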

C#:

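using System;
using System.Text.RegularExpressions;

string str = "020-85653333";
Regex test = new Regex("(0\\d{2})-(\\d{8})");
var matches = test.Matches(str);
for (int i = 0; i < matches.Count; i++)
{
    var match = matches[i];
    // Groups[0] is the whole match; Groups[1] onward are the capture groups.
    for (int j = 0; j < match.Groups.Count; j++)
    {
        Console.WriteLine("Group id: " + j + ", value: " + match.Groups[j].Value);
    }
}
// Group id: 0, value: 020-85653333
// Group id: 1, value: 020
// Group id: 2, value: 85653333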

TypeScript:

let str = "020-85653333";
let reg = new RegExp( "(0\\d{2})-(\\d{8})" );
let matches = str.match(reg);
if( matches !== null )
{
    // Without the g flag, match() returns the full match followed by the capture groups.
    for (let index = 0; index < matches.length; index++) {
        console.log("Group id:", index, "value:", matches[index]);
    }
}
// Group id: 0 value: 020-85653333
// Group id: 1 value: 020
// Group id: 2 value: 85653333

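Note: in TypeScript/JavaScript, calling match() with the g (global) flag returns only the full matches and leaves out the capture groups.

let str = "020-85653333";
let reg = new RegExp( "(0\\d{2})-(\\d{8})","g" );
let matches = str.match(reg);
if( matches !== null )
{
    // With the g flag, the array holds one entry per full match and no groups.
    for (let index = 0; index < matches.length; index++) {
        console.log("Group id:", index, "value:", matches[index]);
    }
}
// Group id: 0 value: 020-85653333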
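If both the g flag and the capture groups are needed, one option is to call `RegExp.prototype.exec` in a loop, which returns the groups for each match in turn (`String.prototype.matchAll`, available from ES2020, is an alternative). The sketch below assumes a hypothetical `phoneParts` function and a second, made-up phone number.

```typescript
// Hypothetical helper: returns [fullMatch, areaCode, number] for every phone number found.
function phoneParts(text: string): string[][] {
    const re = new RegExp("(0\\d{2})-(\\d{8})", "g");
    const rows: string[][] = [];
    let m: RegExpExecArray | null;
    // With the g flag, each exec() call resumes after the previous match.
    while ((m = re.exec(text)) !== null) {
        rows.push([m[0], m[1], m[2]]);
    }
    return rows;
}

const phoneRows = phoneParts("020-85653333, 021-12345678");
for (const row of phoneRows) {
    console.log("full:", row[0], "area:", row[1], "number:", row[2]);
}
// full: 020-85653333 area: 020 number: 85653333
// full: 021-12345678 area: 021 number: 12345678
```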

2. Named capture groups

Syntax: (?<name>exp). Explanation: the group's name is given by name in the expression.

C#:

using System;
using System.Text.RegularExpressions;

string str = "020-85653333";
Regex test = new Regex("(?<groupId>0\\d{2})-(?<groupValue>\\d{8})");
var matches = test.Matches(str);
for (int i = 0; i < matches.Count; i++)
{
    var match = matches[i];
    // Named groups are read by name instead of by index.
    Console.WriteLine("groupId: " + match.Groups["groupId"].Value + ", groupValue: " + match.Groups["groupValue"].Value);
}
// groupId: 020, groupValue: 85653333

TypeScript:

let str = "020-85653333";
let reg = new RegExp( "(?<groupId>0\\d{2})-(?<groupValue>\\d{8})" );
let matches = str.match(reg);
// groups is undefined when the pattern has no named groups, hence the extra check.
if( matches !== null && matches.groups !== undefined )
{
    console.log( "groupId:", matches.groups["groupId"], "groupValue:" , matches.groups["groupValue"]);
}
// groupId: 020 groupValue: 85653333

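3. Non-capturing groups

Syntax: (?:exp)

Explanation: the opposite of a capture group. It marks a group that is used only for grouping and does not need to be captured.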

C#:

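using System;
using System.Text.RegularExpressions;

string str = "020-85653333";
Regex test = new Regex("(?:0\\d{2})-(\\d{8})");
var matches = test.Matches(str);
for (int i = 0; i < matches.Count; i++)
{
    var match = matches[i];
    // (?:...) is not captured, so the only numbered group is the 8-digit part.
    for (int j = 0; j < match.Groups.Count; j++)
    {
        Console.WriteLine("Group id: " + j + ", value: " + match.Groups[j].Value);
    }
}
// Group id: 0, value: 020-85653333
// Group id: 1, value: 85653333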

TypeScript:

let str = "020-85653333";
let reg = new RegExp( "(?:0\\d{2})-(\\d{8})" );
let matches = str.match(reg);
if( matches !== null )
{
    // (?:...) is not captured, so index 1 is the 8-digit part.
    for (let index = 0; index < matches.length; index++) {
        console.log( "Group id:" , index , "value:" , matches[index] );
    }
}
// Group id: 0 value: 020-85653333
// Group id: 1 value: 85653333

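X. Backreferences

A capture produces a capture group that is kept in memory. It can be referenced by the program outside the regular expression, and it can also be referenced inside the regular expression itself; the latter is called a backreference.

Depending on how the capture group is named, backreferences come in two forms:

Backreference to a numbered group: \k<number> or \number

Backreference to a named group: \k<name> or \k'name'

The \k<number> and \k'name' forms are .NET syntax; JavaScript supports \number and \k<name>.

Purpose: mainly used to find repeated content or to replace specific text.

Capture groups can be identified in two ways: by their position in the pattern, or by a custom name.

By default groups are identified by number, and numbering starts at 1.

Example 1: find the doubled letters in the string "aabbbbgbddesddfiid".

To do this we need to refer back to the first capture group, which in numbered form is \k<1> or \1.

The latter is the more common choice.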

C#:

using System;
using System.Text.RegularExpressions;

string str = "aabbbbgbddesddfiid";
Regex test = new Regex("(\\w)\\1");
// Equivalent .NET spellings of the same backreference:
//Regex test = new Regex("(\\w)\\k<1>");
//Regex test = new Regex("(?<test>\\w)\\k<test>");
//Regex test = new Regex("(?<test>\\w)\\k'test'");
//Regex test = new Regex("(?<test>\\w)\\k\'test\'");
var matches = test.Matches(str);
for (int i = 0; i < matches.Count; i++)
{
    var match = matches[i];
    Console.WriteLine(match.Value);
}
/* Output:
aa
bb
bb
dd
dd
ii
*/

TypeScript:

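Note that in TypeScript/JavaScript, match() without the g flag stops at the first match and returns it together with its capture groups. To find every match, add the g flag.

let str = "aabbbbgbddesddfiid";
let reg = new RegExp( "(\\w)\\1" );
let matches = str.match(reg);
if( matches !== null )
{
    for (let index = 0; index < matches.length; index++) {
        console.log( index , "value:" , matches[index] );
    }
}
// 0 value: aa   (the first full match)
// 1 value: a    (capture group 1)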


let str = "aabbbbgbddesddfiid";
let reg = new RegExp( "(\\w)\\1","g" );
let matches = str.match(reg);
if( matches !== null )
{
    // With the g flag, every doubled letter is returned.
    for (let index = 0; index < matches.length; index++) {
        console.log( index , "value:" , matches[index] );
    }
}
// 0 value: aa
// 1 value: bb
// 2 value: bb
// 3 value: dd
// 4 value: dd
// 5 value: ii
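A named backreference works in TypeScript as well, using the \k<name> form on an ES2018 or later runtime. In this brief sketch the group name `ch` and the variable name are chosen for illustration.

```typescript
// (?<ch>\w) captures one word character; \k<ch> requires the same character again.
const doubledLetters = "aabbbbgbddesddfiid".match(new RegExp("(?<ch>\\w)\\k<ch>", "g")) || [];

console.log(doubledLetters.join(" ")); // aa bb bb dd dd ii
```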

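Example 2: replace abc in a string with a

C#:

using System;
using System.Text.RegularExpressions;

string str = "abcbbabcbcgbddesddfiid";
Regex test = new Regex("(a)(b)c");
var str2 = test.Replace(str, "a");
Console.WriteLine(str2); // abbabcgbddesddfiid
var str3 = test.Replace(str, "$1"); // $1 is the value of group 1, the a in (a)(b)c
Console.WriteLine(str3); // abbabcgbddesddfiid
var str4 = test.Replace(str, "$2"); // $2 is the value of group 2, the b in (a)(b)c
Console.WriteLine(str4); // bbbbbcgbddesddfiid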

TypeScript:

let str = "abcbbabcbcgbddesddfiid";
let reg = new RegExp( "(a)(b)c" , "g" );
console.log( str.replace(reg , "-" ) );
//-bb-bcgbddesddfiid
console.log( str.replace(reg , "$1" ) );
//abbabcgbddesddfiid
console.log( str.replace(reg , "$2" ) );
//bbbbbcgbddesddfiid

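XI. Greedy and lazy matching

1. Greedy

Greedy matching: when a regular expression contains a quantifier that accepts repetition, the default behavior is to match as many characters as possible while still allowing the whole expression to match. This is called greedy matching.

Behavior: the quantifier first consumes as much of the input as it can. Whenever the rest of the pattern fails to match, it gives up the rightmost character and tries again, repeating this match-and-give-up cycle (known as backtracking) until the match succeeds or there is nothing left to give up. The result is therefore the longest match available.

The quantifiers introduced earlier are greedy, for example \d{3,6}.

"/(.*at)/" => The fat cat sat on the mat. (greedy: matches The fat cat sat on the mat)
"/(.*?at)/" => The fat cat sat on the mat. (lazy: the first match is The fat)

When several greedy quantifiers appear together and the string is long enough to satisfy each of them fully, they do not interfere with one another. Otherwise they are served from left to right: each quantifier takes as much as it can, and whatever remains goes to the next one.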

C#:

using System;
using System.Text.RegularExpressions;

string str = "61762828 176 2991 87321";
string reg = "(\\d{1,2})(\\d{3,4})";
Regex test = new Regex(reg);
Console.WriteLine("Text: " + str);
Console.WriteLine("Greedy pattern: " + reg);
var matches = test.Matches(str);
for (int i = 0; i < matches.Count; i++)
{
    var match = matches[i];
    Console.WriteLine("Match: " + match.Value);
}
// Text: 61762828 176 2991 87321
// Greedy pattern: (\d{1,2})(\d{3,4})
// Match: 617628
// Match: 2991
// Match: 87321

TypeScript:

let str = "61762828 176 2991 87321";
let reg = new RegExp( "(\\d{1,2})(\\d{3,4})" , "g" );
console.log( "Text:" , str );
let matches = str.match( reg );
if(matches !== null)
{
    for (let index = 0; index < matches.length; index++) {
        console.log("Match:" , index , matches[index] );
    }
}
// Text: 61762828 176 2991 87321
// Match: 0 617628
// Match: 1 2991
// Match: 2 87321

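2. Lazy (non-greedy)

Lazy matching: when a regular expression contains a quantifier that accepts repetition, a lazy quantifier matches as few characters as possible while still allowing the whole expression to match. This is called lazy matching.

Behavior: matching proceeds from left to right, starting at the leftmost position. The quantifier first tries to match without consuming any further character; if the rest of the pattern then matches, it is done. Otherwise it consumes one more character and tries again, repeating this cycle until the match succeeds or the input is exhausted.

A lazy quantifier is a greedy quantifier followed by "?".

Syntax	Description
*?	Repeats zero or more times, but as few times as possible
+?	Repeats one or more times, but as few times as possible
??	Repeats zero or one time, but as few times as possible
{n,}?	Repeats n or more times, but as few times as possible
{n,m}?	Repeats n to m times, but as few times as possible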
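The difference between a greedy and a lazy quantifier shows up clearly on the sentence The fat cat sat on the mat. A minimal TypeScript sketch with illustrative variable names:

```typescript
const catSentence = "The fat cat sat on the mat.";

// Greedy: .* runs to the end, then backtracks to the last "at".
const greedyAt = new RegExp(".*at").exec(catSentence);
console.log(greedyAt ? greedyAt[0] : null); // The fat cat sat on the mat

// Lazy: .*? grows one character at a time and stops at the first "at".
const lazyAt = new RegExp(".*?at").exec(catSentence);
console.log(lazyAt ? lazyAt[0] : null);     // The fat

// With the g flag, the lazy pattern splits the sentence at every "at".
const lazyAll = catSentence.match(new RegExp(".*?at", "g")) || [];
console.log(lazyAll.length);                // 4: "The fat", " cat", " sat", " on the mat"
```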

C#:

using System;
using System.Text.RegularExpressions;

string str = "61762828 176 2991 87321";
string reg = "(\\d{1,2}?)(\\d{3,4})";
Regex test = new Regex(reg);
Console.WriteLine("Text: " + str);
Console.WriteLine("Lazy pattern: " + reg);
var matches = test.Matches(str);
for (int i = 0; i < matches.Count; i++)
{
    var match = matches[i];
    Console.WriteLine("Match: " + match.Value);
}
// Text: 61762828 176 2991 87321
// Lazy pattern: (\d{1,2}?)(\d{3,4})
// Match: 61762
// Match: 2991
// Match: 87321

TypeScript:

let str = "61762828 176 2991 87321";
let reg = new RegExp( "(\\d{1,2}?)(\\d{3,4})" , "g" );
console.log( "Text:" , str );
let matches = str.match( reg );
if(matches !== null)
{
    for (let index = 0; index < matches.length; index++) {
        console.log("Match:" , index , matches[index] );
    }
}
// Text: 61762828 176 2991 87321
// Match: 0 61762
// Match: 1 2991
// Match: 2 87321
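The match values alone do not show how the digits are divided between the two groups. The sketch below prints each group for the first match; `splitDigits` is a hypothetical helper written for this illustration.

```typescript
// Hypothetical helper: returns [group1, group2] of the first match, or null if there is none.
function splitDigits(pattern: string, text: string): string[] | null {
    const m = new RegExp(pattern).exec(text);
    return m ? [m[1], m[2]] : null;
}

// Greedy first group: takes two digits, then the second group takes four.
console.log(splitDigits("(\\d{1,2})(\\d{3,4})", "61762828"));  // 61 and 7628

// Lazy first group: takes one digit only; the second group is still greedy and takes four.
console.log(splitDigits("(\\d{1,2}?)(\\d{3,4})", "61762828")); // 6 and 1762
```

This is why the first match shrinks from 617628 to 61762: only the first quantifier became lazy.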