Getting Started with Regular Expressions


辰三

Article link: https://blog.csdn.net/u012549951/article/details/109775216

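Preface

I never really understood regular expressions. I have recently been reading 《正则表达式必知必会》 to fill that gap, and these are my notes on the basics. PS: everything here is based on C#. The syntax differs slightly between languages, but most of it is the same.

The code may be hard to read here. Pasted into VS it gets hints and syntax coloring, which helps a lot. The complete code is at the end of the post for cross-reference.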

I. Let's Get Started

1. Character matching

Take any sample string:

string hello = "你好2a,你妈好a，你爸好a，你爸好，你爷好p，欢迎来到知识的海洋,知识是人类进步的阶梯好.";

Literal characters match themselves directly:

var result1 = Regex.Matches(hello, "知识");// result1.Count == 2: both occurrences of 知识 are matched

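2. Matching any character: .

"." matches any character (except \n by default), but it stands for exactly one character.

var result3 = Regex.IsMatch(hello, "进步.");// true: matches 进步的
var result4 = Regex.Matches(hello, ".好.");// 6 matches: 你好2  妈好a  爸好a  爸好，  爷好p  梯好.
// a literal "." is written with an escape: \.
var result5 = Regex.Matches(hello, @".好\.");// 梯好.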
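
To check what a pattern actually returns, it helps to print every match. The following is a minimal sketch rather than part of the original notes; the class name DotDemo and the helper AllMatches are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DotDemo
{
    // Hypothetical helper: returns the text of every match as an array.
    public static string[] AllMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string hello = "你好2a,你妈好a，你爸好a，你爸好，你爷好p，欢迎来到知识的海洋,知识是人类进步的阶梯好.";

        // "." stands for exactly one character on each side of 好.
        Console.WriteLine(string.Join(" | ", AllMatches(hello, ".好.")));
        // 你好2 | 妈好a | 爸好a | 爸好， | 爷好p | 梯好.

        // "\." only matches a literal dot.
        Console.WriteLine(string.Join(" | ", AllMatches(hello, @".好\.")));
        // 梯好.
    }
}
```

Matches never overlap: once 你好2 has been consumed, the search resumes after it.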

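3. Single-character alternatives: [ and ]

For example, use [Tt] when both upper and lower case should match. C# also has an option, RegexOptions.IgnoreCase, that can simplify the pattern.

// [...] matches any one of the listed characters
var result6 = Regex.Matches(hello, "[你妈]好");// 你好, 妈好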

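4. Ranges

// [x-y] is a range from x to y, ordered by character code (ASCII order for ASCII characters)
var result7 = Regex.Matches(hello, ".好[a-z]");// 妈好a  爸好a  爷好p
var result8 = Regex.Matches(hello, ".好[0-9]");// 你好2
// several ranges in one class
var result9 = Regex.Matches(hello, ".好[a-z0-9]");// 你好2  妈好a  爸好a  爷好p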

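5. Shorthand classes

Note: in C#, a pattern that contains backslashes is easiest to write as a verbatim string, prefixed with @ (for example @"\d"). Without the @, every backslash has to be doubled ("\\d").

\d is shorthand for a digit, i.e. [0-9]. In .NET it also matches other Unicode decimal digits unless RegexOptions.ECMAScript is set.

\D is the negation of \d: every character that is not a digit.

\w covers a-z, A-Z, _, 0-9 and Chinese characters (in .NET, Unicode letters and decimal digits plus the underscore), so it is very broad.

\W is the negation of \w: special characters such as punctuation, and also whitespace.

\s is whitespace, such as space, tab and line breaks.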
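
The difference between the Unicode-aware defaults and the ASCII-only behaviour can be seen by counting matches. This is a small sketch under my own assumptions: the class name ShorthandDemo, the helper Count and the sample string are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ShorthandDemo
{
    // Hypothetical helper: counts how many matches a pattern produces.
    public static int Count(string input, string pattern, RegexOptions options = RegexOptions.None) =>
        Regex.Matches(input, pattern, options).Count;

    public static void Main()
    {
        // Characters: a, 1, space, 好, _, tab, fullwidth digit three (U+FF13)
        string sample = "a1 好_\t\uFF13";

        Console.WriteLine(Count(sample, @"\d"));                          // 2: '1' and the fullwidth digit
        Console.WriteLine(Count(sample, @"\d", RegexOptions.ECMAScript)); // 1: only the ASCII '1'
        Console.WriteLine(Count(sample, @"\w"));                          // 5: a, 1, 好, _, fullwidth digit
        Console.WriteLine(Count(sample, @"\w", RegexOptions.ECMAScript)); // 3: a, 1, _
        Console.WriteLine(Count(sample, @"\s"));                          // 2: space and tab
        Console.WriteLine(Count(sample, @"\W"));                          // 2: the same two whitespace characters
    }
}
```

If only ASCII digits should be accepted, writing [0-9] explicitly is the simplest choice.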

6. Operators

string hello2 = "hai..1.123456789@163.com";

^ negation

// to negate a character class, put ^ right after the opening [
var result10 = Regex.Matches(hello, "好[^a-f0-9.]");// 好，  好p

+ means one or more times

var result17 = Regex.Matches(hello2, @"[\w.]+@\w+\.\w+");// hai..1.123456789@163.com  Note: inside [ ] the "." needs no backslash escape
var result18 = Regex.Matches(hello2, @"\w+@\w+\.\w+");// 123456789@163.com

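* means zero or more times

// the preceding item may appear zero or more times; append * (used the same way as +)
// You might ask why 123456789@163.com is not returned as well. The engine scans from the start of the string and returns the leftmost match; 1.123456789@163.com has already consumed those characters, and matches do not overlap.
var result19 = Regex.Matches(hello2, @"\w+\.*\w+@\w+\.\w+");// 1.123456789@163.com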

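? means zero or one time

// "?" matches zero or one time. *Note*: +, * and ? must be placed outside []; inside brackets they are literal characters. [\w?] matches a word character or a literal "?"
var result20 = Regex.Matches(hello2, @"\w+\.?\w+@\w+\.\w+");// 1.123456789@163.com
// in practice I like to wrap each matching unit in [] to improve readability
var result21 = Regex.Matches(hello2, @"[\w]+[.]?[\w]+[@][\w]+[.][\w]+");// 1.123456789@163.com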

Exact counts: {count}, {minCount,maxCount}, {minCount,}

// exact repetition counts: {count}, {minCount,maxCount}, {minCount,} (at least minCount times)
var result22 = Regex.Matches(hello2, @"[\w][.]?[\d]{6}[@][\d]+[.][a-z]+");// 3456789@163.com: exactly 6 digits before @, preceded by one \w character (the 3)
var result23 = Regex.Matches(hello2, @"[\w][.]?[\d]{6,8}[@][\d]+[.][\w]+");// 123456789@163.com: 6 to 8 digits before @, as many as possible
var result24 = Regex.Matches(hello2, @"[\w][.]?[\d]{7,}[@][\d]+[.][\w]+");// 1.123456789@163.com: at least 7 digits before @, as many as possible
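
The counted quantifiers are easier to compare on a trivial input. A minimal sketch, assuming a made-up class QuantifierDemo and helper AllMatches, run against the string "aaaaa":

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: returns the text of every match as an array.
    public static string[] AllMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string input = "aaaaa";

        Console.WriteLine(string.Join(",", AllMatches(input, "a{2}")));    // aa,aa (the fifth a is left over)
        Console.WriteLine(string.Join(",", AllMatches(input, "a{2,3}")));  // aaa,aa (greedy: takes 3 first)
        Console.WriteLine(string.Join(",", AllMatches(input, "a{2,}")));   // aaaaa
        Console.WriteLine(string.Join(",", AllMatches(input, "a{2,3}?"))); // aa,aa (lazy: takes the minimum)
    }
}
```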

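Greedy and lazy

+ and * are greedy (so are ? and {n,m}): they match as much as possible. Adding a ? after the quantifier makes it lazy: it matches as little as possible.

string hello3 = "<body>\r\n //以下是div \r\n <div>this is title</div>\r\n <div>this is content</div> \r\n</body>";
// "." does not match \n by default, so RegexOptions.Singleline is needed here to let .* run across lines
var result25 = Regex.Matches(hello3, @"<div>.*</div>", RegexOptions.Singleline);// 1 result: <div>this is title</div>\r\n <div>this is content</div>
var result26 = Regex.Matches(hello3, @"<div>.*?</div>", RegexOptions.Singleline);// 2 results: 1. <div>this is title</div>  2. <div>this is content</div>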
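
The same contrast shows up without any line breaks involved. A minimal sketch on a single-line string; the class name GreedyDemo, the helper AllMatches and the sample input are made up for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Hypothetical helper: returns the text of every match as an array.
    public static string[] AllMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string html = "<b>one</b> and <b>two</b>";

        // Greedy: .* runs to the last </b> on the line, giving one long match.
        foreach (string m in AllMatches(html, @"<b>.*</b>"))
            Console.WriteLine(m);   // <b>one</b> and <b>two</b>

        // Lazy: .*? stops at the first </b> it can reach, giving two matches.
        foreach (string m in AllMatches(html, @"<b>.*?</b>"))
            Console.WriteLine(m);   // <b>one</b>, then <b>two</b>
    }
}
```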

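Position matching

(1) Whole-word matching: \b and \B

// \b is a word boundary: a position between a \w character and a \W character, or between a \w character and the start or end of the string.
// \bis\b matches "is" only as a whole word, so the "is" inside "this" is not matched.
// \B is the opposite: a position that is not a word boundary. Example: in 你-好，我 - 好, \b-\b matches only the hyphen in 你-好, and \B-\B matches only the hyphen in 我 - 好 (the one surrounded by spaces).
var result27 = Regex.Matches(hello3, "is");// 4 matches: the is in this, is, the is in this, is
var result28 = Regex.Matches(hello3, @"\bis\b");// 2 matches: the standalone is on each div line
var result29 = Regex.Matches(hello, @"\b你爸好\b");// 你爸好 (only the second occurrence; the first is followed by a, so there is no boundary after 好)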
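
The hyphen example from the comment above can be checked by printing match positions. A minimal sketch; the class name BoundaryDemo and the helper Indexes are hypothetical:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Hypothetical helper: returns the start index of every match.
    public static int[] Indexes(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Index).ToArray();

    public static void Main()
    {
        string text = "你-好，我 - 好";

        // Index 1: the hyphen sits directly between two word characters.
        Console.WriteLine(string.Join(",", Indexes(text, @"\b-\b")));   // 1

        // Index 6: the hyphen has whitespace on both sides, so neither side is a boundary.
        Console.WriteLine(string.Join(",", Indexes(text, @"\B-\B")));   // 6

        // Whole-word search: the "is" inside "this" (index 2) is skipped.
        Console.WriteLine(string.Join(",", Indexes("this is it", @"\bis\b")));   // 5
    }
}
```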

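(2) Start and end of the string: ^ and $

^ at the start of a pattern anchors the match to the start of the string; $ at the end anchors it to the end of the string. (^ means negation only as the first character inside square brackets.)

var result30 = Regex.Matches(hello3, @"^\s*<body>");// starts with <body>
var result31 = Regex.Matches(hello3, @"</body>\s*$");// ends with </body>
var result32 = Regex.Matches(hello3, @"^\s*<body>.*?</body>\s*$", RegexOptions.Singleline);// starts with <body> and ends with </body>; Singleline lets "." cross the \r\n line breaks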

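(3) Start of each line: (?m)

A \r\n starts a new line.
For example, to find the comment line in hello3:

var result44 = Regex.Matches(hello3, @"^\s*//.*");// no match: without multiline mode, ^ only matches at the very start of the string, which is <body>
var result33 = Regex.Matches(hello3, @"(?m)^\s*//.*$");// 1 match: " //以下是div \r" (the trailing \r is included because "." matches \r and $ stops before \n)
var result43 = Regex.Matches(hello3, @"^\s*//.*", RegexOptions.Multiline);// same match; the C# option replaces the inline (?m) and keeps the pattern simpler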
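
When only the comment text is wanted, without the leading whitespace or the trailing \r, a slightly stricter pattern helps. This is a sketch under my own assumptions: the class CommentDemo, the helper CommentTexts and the group name text are not from the original.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommentDemo
{
    // Hypothetical helper: returns the text of every line that consists only of a // comment.
    // [^\r\n]* stops before the line break, so no trailing \r ends up in the result.
    public static string[] CommentTexts(string source) =>
        Regex.Matches(source, @"^[ \t]*//(?<text>[^\r\n]*)", RegexOptions.Multiline)
             .Cast<Match>()
             .Select(m => m.Groups["text"].Value.Trim())
             .ToArray();

    public static void Main()
    {
        string source = "int a = 1;\r\n// first\r\n  // second\r\nint b = 2; // trailing";

        foreach (string text in CommentTexts(source))
            Console.WriteLine(text);   // first, then second
    }
}
```

The trailing comment on the last line is skipped on purpose, because the pattern only allows spaces or tabs between the start of the line and //.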

Subexpressions: ()

string hello4 = "加油加油你最强，加油加油你最棒，加油加油加油！";

(1) Treating several characters as a unit

var result34 = Regex.Matches(hello4, @"加油{2,}[\w]+");// the quantifier applies only to 油, so this could only match text like 加油油 (no match here), which is kind of cute, haha
var result35 = Regex.Matches(hello4, @"(加油){2,}[\w]+");// 加油加油你最强   加油加油你最棒   加油加油加油
var result36 = Regex.Matches(hello4, @"((加油){2,}[\w]+，){2}(加油){2,}[\w]+");// nested groups: 加油加油你最强，加油加油你最棒，加油加油加油
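
The effect of the parentheses is perhaps clearest on ASCII input. A minimal sketch; the class name GroupDemo, the helper AllMatches and the sample string are made up for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: returns the text of every match as an array.
    public static string[] AllMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string text = "abab abb";

        // Without a group, {2} repeats only the b.
        Console.WriteLine(string.Join(",", AllMatches(text, "ab{2}")));    // abb

        // With a group, {2} repeats the whole unit "ab".
        Console.WriteLine(string.Join(",", AllMatches(text, "(ab){2}")));  // abab
    }
}
```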

(2) Multi-character alternatives: |

// a single-character "or" is written with []; a multi-character "or" is written as (ab|cd)
var result37 = Regex.Matches(hello4, @"[\w]+(最强|最棒)");// 加油加油你最强   加油加油你最棒

Backreferences

- \n
A backreference refers to the result of an earlier subexpression: the earlier subexpression is matched first, and once it has a fixed value the backreference must match that same text.
n is a number: \1 is the first subexpression, \2 is the second, and so on.

string hello5 = "<body>\r\n //以下是div \r\n <div><h1>this is title</h1></div>\r\n <div><h2>this is content</h2></div> \r\n <div><h3>this is bottom</h4></div>\r\n<h5>没想到吧，我还有一层</h5></body>";

var result38 = Regex.Matches(hello5, @"<[hH]\d>.*?</[hH]\d>");// <h1>this is title</h1>  <h2>this is content</h2>  <h3>this is bottom</h4>  <h5>没想到吧，我还有一层</h5>
// <h3>this is bottom</h4>: 3 and 4 differ, so the backreference does not match it; it would match if the closing tag were </h3>
var result39 = Regex.Matches(hello5, @"<[hH](\d)>.*?</[hH]\1>");// <h1>this is title</h1>  <h2>this is content</h2>   <h5>没想到吧，我还有一层</h5>

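- Naming subexpressions (named captures)
C# lets you give subexpressions names, which is really nice.
Name a subexpression with (?<name>subexpression) inside its parentheses, refer back to it with \k<name>, and use ${name} in replacement strings.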

var result40 = Regex.Matches(hello5, @"<[hH](?<num>\d)>.*?</[hH]\k<num>>");//<h1>this is title</h1>  <h2>this is content</h2>   <h5>没想到吧，我还有一层</h5>
var result41 = Regex.Replace(hello5, @"(?<head><[hH](?<num>\d)>)(?<content>.*?)(?<foot></[hH]\k<num>>)", @"${head}<b>${content}</b>${foot}");// wraps the content of each h tag in a b tag
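
A classic use of a backreference together with a named replacement is collapsing an accidentally doubled word. A minimal sketch; the class name DuplicateWordDemo, the helper Collapse and the group name word are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DuplicateWordDemo
{
    // Hypothetical helper: replaces "word word" with a single "word".
    // \k<word> must match exactly the text captured by (?<word>...);
    // the numbered form of the same pattern would be \b(\w+)\s+\1\b.
    public static string Collapse(string text) =>
        Regex.Replace(text, @"\b(?<word>\w+)\s+\k<word>\b", "${word}");

    public static void Main()
    {
        Console.WriteLine(Collapse("this is is a test test ok"));
        // this is a test ok
    }
}
```

The \b anchors keep the "is" at the end of "this" from pairing with the following "is".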

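Lookahead and lookbehind

(1) Positive lookahead (?=...) and positive lookbehind (?<=...)

// (?<=<[H](?<num>\d)>) matches the content that follows <hN> without including the tag itself; a lookaround must be wrapped in parentheses
var result42 = Regex.Matches(hello5, @"(?<=<[H](?<num>\d)>).*(?=</[H]\k<num>>)", RegexOptions.IgnoreCase);// this is title   this is content  没想到吧，我还有一层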

(2) Negative lookahead (?!...) and negative lookbehind (?<!...)

string hello6 = "一个苹果5块钱，100个苹果多少钱";
var result45 = Regex.Matches(hello6, @"\d+(?=块钱)");// 5
// find numbers that are not followed by 块钱
var result46 = Regex.Matches(hello6, @"\d+(?!块钱)");// 100
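
One thing to watch with the negative lookahead: on a multi-digit price the engine can backtrack and return part of the number. The sketch below uses a sample string of my own (15块钱 instead of 5块钱); the class name LookaroundDemo and the helper AllMatches are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: returns the text of every match as an array.
    public static string[] AllMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string text = "一个苹果15块钱，100个苹果多少钱";

        // Positive lookahead: digits followed by 块钱.
        Console.WriteLine(string.Join(",", AllMatches(text, @"\d+(?=块钱)")));       // 15

        // Negative lookahead alone backtracks from "15" to "1", because "1" is followed by "5", not 块钱.
        Console.WriteLine(string.Join(",", AllMatches(text, @"\d+(?!块钱)")));       // 1,100

        // Also rejecting a following digit keeps the number whole.
        Console.WriteLine(string.Join(",", AllMatches(text, @"\d+(?!\d|块钱)")));    // 100

        // Positive lookbehind: digits that come right after 苹果.
        Console.WriteLine(string.Join(",", AllMatches(text, @"(?<=苹果)\d+")));      // 15
    }
}
```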

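Conditional construct: (?(regex)true|false)

Find the content of the h tags; if an h tag is wrapped in a div, return the div as well.

// This is a bit hard to read. In (?<div><div>)?, ?<div> names the group, and the ? after the parentheses makes <div> optional.
// (?(div)</div>|</body>) is the standard conditional form: if the group named div matched, a </div> must follow; if it did not, a </body> must follow.
// The false branch (</body>) is something I added arbitrarily; it has no practical meaning.
var result47 = Regex.Matches(hello5, @"(?<div><div>)?\s*<h(?<num>\d)>.*?</h\k<num>>(?(div)</div>|</body>)");// <div><h1>this is title</h1></div>    <div><h2>this is content</h2></div>    <h5>没想到吧，我还有一层</h5></body> (the last match also includes the leading \r\n consumed by \s*)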
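
A smaller case where the conditional does real work is an optional pair of parentheses: the closing one is required only if the opening one was matched. This is a sketch with made-up names (ConditionalDemo, IsBalancedNumber, the group name open), not something from the original notes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    // Hypothetical helper: accepts "123" or "(123)", but not "(123" or "123)".
    // (?<open>\()?  optionally captures an opening parenthesis;
    // (?(open)\))   requires a closing parenthesis only if the group named open matched.
    public static bool IsBalancedNumber(string input) =>
        Regex.IsMatch(input, @"^(?<open>\()?\d+(?(open)\))$");

    public static void Main()
    {
        Console.WriteLine(IsBalancedNumber("123"));    // True
        Console.WriteLine(IsBalancedNumber("(123)"));  // True
        Console.WriteLine(IsBalancedNumber("(123"));   // False
        Console.WriteLine(IsBalancedNumber("123)"));   // False
    }
}
```

Here the false branch is simply left out, which is allowed when nothing needs to be matched in that case.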

Wrapping Up: The Full Code

That's all. Here is the complete code.

            // requires: using System.Text.RegularExpressions;
            string hello = "你好2a,你妈好a，你爸好a，你爸好，你爷好p，欢迎来到知识的海洋,知识是人类进步的阶梯好.";
            // find a single match
            var result = Regex.Match(hello, "知识");
            // find all matches
            var result1 = Regex.Matches(hello, "知识");
            // "." matches any single character
            var result2 = Regex.Matches(hello, "你.好");
            var result3 = Regex.IsMatch(hello, "进步.");
            var result4 = Regex.Matches(hello, ".好.");
            // a literal "." is written with an escape: \.
            var result5 = Regex.Matches(hello, @".好\.");
            // [...] matches any one of the listed characters: 你好, 妈好
            var result6 = Regex.Matches(hello, "[你妈]好");
            // [x-y] is a range from x to y, ordered by character code
            var result7 = Regex.Matches(hello, ".好[a-z]");
            var result8 = Regex.Matches(hello, ".好[0-9]");
            // \d is shorthand for [0-9]; \D is shorthand for [^0-9]
            var result12 = Regex.Matches(hello, @".好\d");
            // several ranges in one class
            var result9 = Regex.Matches(hello, ".好[a-z0-9]");
            // to negate a character class, put ^ right after the opening [
            var result10 = Regex.Matches(hello, "好[^a-f0-9.]");

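            string hello1 = "好随便给出 \r\n一串字母数字下划线组 合： 好_0a1q2e3e4r5";
            // \w is [a-zA-Z0-9_] plus Chinese characters; \W is its negation,
            // so result15 (the ASCII-only negation) also returns Chinese characters while result14 does not
            var result13 = Regex.Matches(hello1, @"\w");
            var result14 = Regex.Matches(hello1, @"\W");
            var result15 = Regex.Matches(hello1, "[^a-zA-Z0-9_]");
            // \s is a whitespace character; \S is a non-whitespace character
            var result16 = Regex.Matches(hello1, @"\s.");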

            string hello2 = "hai..1.123456789@163.com";
            // "+" after an item means it may appear one or more times
            var result17 = Regex.Matches(hello2, @"[\w.]+@\w+\.\w+");
            var result18 = Regex.Matches(hello2, @"\w+@\w+\.\w+");
            // "*" after an item means zero or more times (used the same way as +)
            var result19 = Regex.Matches(hello2, @"\w+\.*\w+@\w+\.\w+");
            // "?" means zero or one time. *Note*: +, * and ? must be placed outside []; inside brackets they are literal characters. [\w?] matches a word character or a literal "?"
            var result20 = Regex.Matches(hello2, @"\w+\.?\w+@\w+\.\w+");
            var result21 = Regex.Matches(hello2, @"[\w]+[.]?[\w]+[@][\w]+[.][\w]+");
            // exact repetition counts: {count}, {minCount,maxCount}, {minCount,} (at least minCount times)
            var result22 = Regex.Matches(hello2, @"[\w][.]?[\d]{6}[@][\d]+[.][a-z]+");
            var result23 = Regex.Matches(hello2, @"[\w][.]?[\d]{6,8}[@][\d]+[.][\w]+");
            var result24 = Regex.Matches(hello2, @"[\w][.]?[\d]{7,}[@][\d]+[.][\w]+");

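            // greedy and lazy: + and * are greedy (as much as possible); adding ? makes them lazy (as little as possible)
            // "." does not match \n by default, so RegexOptions.Singleline is needed for .* to run across lines
            string hello3 = "<body>\r\n //以下是div \r\n <div>this is title</div>\r\n <div>this is content</div> \r\n</body>";
            var result25 = Regex.Matches(hello3, @"<div>.*</div>", RegexOptions.Singleline);
            var result26 = Regex.Matches(hello3, @"<div>.*?</div>", RegexOptions.Singleline);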

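            // position matching: \b is a word boundary (between a \w and a \W character, or between a \w character and the start or end of the string); \bis\b matches is as a whole word, so the is in this is not matched
            // \B is a position that is not a word boundary. Example: in 你-好，我 - 好, \b-\b matches only the hyphen in 你-好 and \B-\B matches only the hyphen in 我 - 好
            var result27 = Regex.Matches(hello3, "is");
            var result28 = Regex.Matches(hello3, @"\bis\b");
            var result29 = Regex.Matches(hello, @"\b你爸好\b");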

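            // start and end of the string: ^ at the start of a pattern anchors to the start of the string, $ at the end anchors to the end (^ means negation only inside square brackets)
            var result30 = Regex.Matches(hello3, @"^\s*<body>");// starts with <body>
            var result31 = Regex.Matches(hello3, @"</body>\s*$");// ends with </body>
            var result32 = Regex.Matches(hello3, @"^\s*<body>.*?</body>\s*$", RegexOptions.Singleline);// starts with <body> and ends with </body>; Singleline lets "." cross line breaks
            // start and end of each line with (?m); for example, to find every // comment line in program.cs: (?m)^\s*//.*$
            var result33 = Regex.Matches(hello3, @"(?m)^\s*//.*$");
            var result43 = Regex.Matches(hello3, @"^\s*//.*", RegexOptions.Multiline);
            var result44 = Regex.Matches(hello3, @"^\s*//.*");// no match without multiline mode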
            // subexpressions treat several characters as a unit, e.g. "加油"
            string hello4 = "加油加油你最强，加油加油你最棒，加油加油加油！";
            var result34 = Regex.Matches(hello4, @"加油{2,}[\w]+");// the quantifier applies only to 油, so this could only match text like 加油油
            var result35 = Regex.Matches(hello4, @"(加油){2,}[\w]+");
            var result36 = Regex.Matches(hello4, @"((加油){2,}[\w]+，){2}(加油){2,}[\w]+");

            // a single-character "or" is written with []; a multi-character "or" is written as (ab|cd)
            var result37 = Regex.Matches(hello4, @"[\w]+(最强|最棒)");

            string hello5 = "<body>\r\n //以下是div \r\n <div><h1>this is title</h1></div>\r\n <div><h2>this is content</h2></div> \r\n <div><h3>this is bottom</h4></div>\r\n<h5>没想到吧，我还有一层</h5></body>";

            // backreference: \n must match the same text as the n-th earlier subexpression
            var result38 = Regex.Matches(hello5, @"<[hH]\d>.*?</[hH]\d>");
            var result39 = Regex.Matches(hello5, @"<[hH](\d)>.*?</[hH]\1>");
            // naming subexpressions (named captures)
            var result40 = Regex.Matches(hello5, @"<[hH](?<num>\d)>.*?</[hH]\k<num>>");
            var result41 = Regex.Replace(hello5, @"(?<head><[hH](?<num>\d)>)(?<content>.*?)(?<foot></[hH]\k<num>>)", @"${head}<b>${content}</b>${foot}");
            // (?=...) positive lookahead, (?<=...) positive lookbehind, (?!...) negative lookahead, (?<!...) negative lookbehind
            var result42 = Regex.Matches(hello5, @"(?<=<[H](?<num>\d)>).*(?=</[H]\k<num>>)", RegexOptions.IgnoreCase);
            //var result45= Regex.Matches(hello5, @"($'<[H](?<num>\d)>).*($`</[H]\k<num>>)", RegexOptions.IgnoreCase);
            //var result42 = Regex.Matches(hello5, @"<[hH](?<num>\d)>.*</[hH]!\k<num>>");
            string hello6 = "一个苹果5块钱，100个苹果多少钱";
            var result45 = Regex.Matches(hello6, @"\d+(?=块钱)");
            // find numbers that are not followed by 块钱
            var result46 = Regex.Matches(hello6, @"\d+(?!块钱)");
            // conditional construct (?(regex)true|false): find everything wrapped in h tags in hello5; if the h tag is wrapped in a div, return the div too
            var result47 = Regex.Matches(hello5, @"(?<div><div>)?\s*<h(?<num>\d)>.*?</h\k<num>>(?(div)</div>|</body>)");
