Linux: Mastering Regular Expressions with VIM

Latest recommended article published on 2024-07-18 23:22:58

傻傻的心动


Category: Linux basic commands. Tags: java, programming language, linux

Panta rhei

Article link: https://blog.csdn.net/m0_63624418/article/details/127592747


Mastering regular expressions

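Simply put, a regular expression is a way of processing strings, and it works on strings one line at a time. With the help of a few special symbols, regular expressions let the user easily find, delete, or replace specific strings.
For example, suppose you only want to find the strings MYweb (first two letters uppercase) or Myweb (only one uppercase letter), while MYWEB, myweb, and so on do not qualify. How would you do it? In an environment without regular expressions (for example MS Word), you might have to use a case-insensitive search, or search twice, once for MYweb and once for Myweb. However, a case-insensitive search may also find MYWEB/myweb/MyWeB and other unwanted strings, which causes trouble.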
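
The MYweb/Myweb case above can be sketched with a bracket expression. This is only a minimal sketch: the temporary file and its five lines are made up for illustration, and the variable names are assumptions.

```shell
# Hypothetical sample data, one candidate spelling per line.
tmp=$(mktemp)
printf '%s\n' MYweb Myweb MYWEB myweb MyWeB > "$tmp"

# [Yy] matches exactly one character, either Y or y, so only the
# two wanted spellings are selected.
wanted=$(grep -n 'M[Yy]web' "$tmp")
echo "$wanted"

# A case-insensitive search counts all five lines instead.
loose=$(grep -ci 'myweb' "$tmp")
echo "$loose"

rm -f "$tmp"
```

The first command prints only lines 1 and 2, while the case-insensitive count is 5, which is exactly the over-matching the paragraph warns about.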

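1. Advanced use of grep
Format: grep [-A] [-B] [--color=auto] 'search string' filename

The options and parameters have the following meanings.
-A: may be followed by a number; stands for "after". In addition to the matching line, the following n lines are also listed.
-B: may be followed by a number; stands for "before". In addition to the matching line, the preceding n lines are also listed.
--color=auto: marks the matched text in a special color.

[Example 1] Use dmesg to list kernel messages, then use grep to find the lines that contain IPv6.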

[root@CentOS7-1 ~]# dmesg | grep 'IPv6'
[ 20.944553] IPv6: ADDRCONF(NETDEV_UP): ens38: link is not ready
[ 26.822775] IPv6: ADDRCONF(NETDEV_UP): virbr0: link is not ready
[ 553.276846] IPv6: ADDRCONF(NETDEV_UP): ens38: link is not ready
[ 553.282437] IPv6: ADDRCONF(NETDEV_UP): ens38: link is not ready
[ 553.284846] IPv6: ADDRCONF(NETDEV_UP): ens38: link is not ready
[ 553.286861] IPv6: ADDRCONF(NETDEV_CHANGE): ens38: link becomes ready
# dmesg lists kernel messages, and grep extracts the IPv6-related lines. There are no line numbers or colors yet.

[Example 2] Continuing from the previous example, highlight the matched keyword in color and add line numbers (-n).
[root@CentOS7-1 ~]# dmesg | grep -n --color=auto 'IPv6'
1903:[ 20.944553] IPv6: ADDRCONF(NETDEV_UP): ens38: link is not ready
1912:[ 26.822775] IPv6: ADDRCONF(NETDEV_UP): virbr0: link is not ready
1918:[ 553.276846] IPv6: ADDRCONF(NETDEV_UP): ens38: link is not ready
1919:[ 553.282437] IPv6: ADDRCONF(NETDEV_UP): ens38: link is not ready
1920:[ 553.284846] IPv6: ADDRCONF(NETDEV_UP): ens38: link is not ready
1922:[ 553.286861] IPv6: ADDRCONF(NETDEV_CHANGE): ens38: link becomes ready
# Besides the special color, each line now starts with its line number

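[Example 3] Continuing from the previous example, also show the line before and the line after each line that contains the keyword.
[root@CentOS7-1 ~]# dmesg | grep -n -A1 -B1 --color=auto 'IPv6'
1902-[ 20.666378] ip_set: protocol 6
1903:[ 20.944553] IPv6: ADDRCONF(NETDEV_UP): ens38: link is not ready
1922:[ 553.286861] IPv6: ADDRCONF(NETDEV_CHANGE): ens38: link becomes ready
1923-[ 555.495760] TCP: lp registered
# As shown above, the line before the match on line 1903 and the line after the match on line 1922 are also displayed
# This lets you pull out the data around a keyword for analysis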
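
The context options can be tried without dmesg. The following is a minimal sketch on a made-up five-line log; the file contents and the variable name are assumptions for illustration only.

```shell
# Hypothetical five-line log standing in for dmesg output.
tmp=$(mktemp)
printf '%s\n' 'usb 1-1: new device' 'ip_set: protocol 6' \
  'IPv6: ens38: link is not ready' 'TCP: lp registered' 'audit: done' > "$tmp"

# -n prefixes line numbers. Matching lines use ':' after the number,
# while context lines pulled in by -B1 and -A1 use '-'.
context=$(grep -n -A1 -B1 'IPv6' "$tmp")
echo "$context"

rm -f "$tmp"
```

The ':' versus '-' separator is how you tell a real match from a context line in the output.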

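2. Practicing basic regular expressions
The practice file sample.txt has the following content. The file has 22 lines, and the last line is blank. Copy the file into root's home directory, /root.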
[root@CentOS7-1 ~]# pwd
/root
[root@CentOS7-1 ~]# cat /root/sample.txt
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
However, this dress is about $ 3183 dollars.^M
GNU is free air not free beer.^M
Her hair is very beauty.^M
I can't finish the test.^M
Oh! The soup taste good.^M
motorcycle is cheap than car.
This window is clear.
the symbol '*' is represented as start.
Oh! My god!
The gd software is a library for drafting programs.^M
You are the best is mean you are the no. 1.
The world <Happy> is the same with "glad".
I like dog.
google is the best tools for search keyword.
goooooogle yes!
go! go! Let's go.
#I am Bobby

(1) Searching for a specific string.

Suppose we want to get the specific string "the" from the file sample.txt. The simplest way is:
[root@CentOS7-1 ~]# grep -n 'the' /root/sample.txt
8:I can't finish the test.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.

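What about inverting the selection? That is, show a line on screen only when it does not contain the string "the":
[root@CentOS7-1 ~]# grep -vn 'the' /root/sample.txt
You will find that the screen shows every line except lines 8, 12, 15, 16, and 18. Next, if you want the string "the" regardless of case, run: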
[root@CentOS7-1 ~]# grep -in 'the' /root/sample.txt
8:I can't finish the test.
9:Oh! The soup taste good.
12:the symbol '*' is represented as start.
14:The gd software is a library for drafting programs.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.
Besides the two extra lines (9 and 14), line 16 also has an additional "The" keyword highlighted in color.
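
The three variants used so far (plain, -v, and -i) can be compared side by side. This is a small sketch on a made-up three-line file, not the sample.txt from the article; the variable names are assumptions.

```shell
# Hypothetical three-line file.
tmp=$(mktemp)
printf '%s\n' 'The soup' 'the symbol' 'I like dog.' > "$tmp"

exact=$(grep -n 'the' "$tmp")      # case-sensitive match
inverted=$(grep -vn 'the' "$tmp")  # lines that do NOT contain "the"
anycase=$(grep -in 'the' "$tmp")   # match regardless of case

printf '%s\n---\n%s\n---\n%s\n' "$exact" "$inverted" "$anycase"
rm -f "$tmp"
```

Note that -v inverts the case-sensitive test here, so the line starting with "The" counts as not containing "the".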

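(2) Using square brackets [] to search for a set of characters.

Comparing the two words "test" and "taste", you can see that they share the pattern "t?st". In that case you can search like this:
[root@CentOS7-1 ~]# grep -n 't[ae]st' /root/sample.txt
8:I can't finish the test.
9:Oh! The soup taste good.
In fact, no matter how many characters appear inside [], they stand for just one character, so the example above asks for the string tast or test. To search for lines that contain "oo", use: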
[root@CentOS7-1 ~]# grep -n 'oo' /root/sample.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!
But what if you do not want to show lines where "oo" is preceded by "g"? You can use the negated set [^] to do this:
[root@CentOS7-1 ~]# grep -n '[^g]oo' /root/sample.txt
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!
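Lines 1 and 9 are gone, because the oo in those lines is preceded by g. Lines 2 and 3 are no surprise, since foo and Foo are both acceptable. Line 18 does contain the goo of google, but the too of tools appears later on the same line, so the line is still listed. In other words, although line 18 contains the item we do not want (goo), it also contains an item we do want (too), so it satisfies the search.

As for line 19, in goooooogle the oo can be preceded by o, for example go(ooo)oogle, so this line also meets the requirement.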
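
To see which part of each line actually satisfies '[^g]oo', the -o option of grep prints only the matched text. The sketch below uses three lines borrowed from the discussion above; the temporary file and variable names are assumptions.

```shell
# Three lines in a throwaway file.
tmp=$(mktemp)
printf '%s\n' 'good' 'google is the best tools' 'goooooogle yes!' > "$tmp"

# Whole lines that contain "oo" preceded by something other than g.
matched_lines=$(grep -n '[^g]oo' "$tmp")
echo "$matched_lines"

# -o prints only the matching parts, one per output line.
matched_parts=$(grep -on '[^g]oo' "$tmp")
echo "$matched_parts"

rm -f "$tmp"
```

On the second line the match is "too" (from tools), not anything in google, and on the third line each match is "ooo", an o followed by oo, which is the go(ooo)oogle reading described above.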

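Furthermore, suppose you do not want a lowercase letter before oo. You could write [^abcd..z]oo, but that is not very convenient. Because lowercase letters are encoded consecutively in ASCII, we can simplify it to: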
[root@CentOS7-1 ~]# grep -n '[^a-z]oo' sample.txt
3:Football game is not use feet only.

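In other words, when the characters in a set are consecutive, such as uppercase letters, lowercase letters, or digits, you can write [a-z], [A-Z], [0-9], and so on. What if the string should be digits and letters? Then write them all together as [a-zA-Z0-9]. For example, to get the lines that contain a digit: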
[root@CentOS7-1 ~]# grep -n '[0-9]' /root/sample.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.

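However, because the locale affects the encoding order, besides using the minus sign "-" for consecutive codes, you can also use the following to get the results of the previous two tests:
[root@CentOS7-1 ~]# grep -n '[^[:lower:]]oo' /root/sample.txt
# [:lower:] means the same as a-z
[root@CentOS7-1 ~]# grep -n '[[:digit:]]' /root/sample.txt

By now, [], [^], and the "-" inside [] should be quite familiar.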
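
The POSIX character classes can be checked on a small input. This is a minimal sketch with a made-up three-line file; the variable names are assumptions.

```shell
# Hypothetical three-line file.
tmp=$(mktemp)
printf '%s\n' 'Football game' 'apple food' 'about $ 3183 dollars' > "$tmp"

# "oo" preceded by any character that is not a lowercase letter.
no_lower_before_oo=$(grep -n '[^[:lower:]]oo' "$tmp")
echo "$no_lower_before_oo"

# Lines that contain at least one digit.
has_digit=$(grep -n '[[:digit:]]' "$tmp")
echo "$has_digit"

rm -f "$tmp"
```

The class name goes inside its own pair of brackets, so the full expression has two levels: the outer [] is the set and [:lower:] or [:digit:] is a member of it.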
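(3) Line-start and line-end characters ^ and $.
Earlier we could find lines that contain "the" anywhere. What if we want to list "the" only when it is at the start of a line?
[root@CentOS7-1 ~]# grep -n '^the' /root/sample.txt
12:the symbol '*' is represented as start.
Now only line 12 remains, because only line 12 begins with the. What if we want the lines that start with a lowercase letter? We can write:
[root@CentOS7-1 ~]# grep -n '^[a-z]' /root/sample.txt
2:apple is my favorite food.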
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
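And if we do not want lines that start with an English letter, we can do this:
[root@CentOS7-1 ~]# grep -n '^[^a-zA-Z]' /root/sample.txt
1:"Open Source" is a good mechanism to develop programs.
21:#I am Bobby

Special note: the "^" symbol means different things inside and outside the character set brackets []. Inside [] it means "negated selection"; outside [] it anchors the match at the start of the line. Conversely, how do we find the lines that end with a period (.)?
[root@CentOS7-1 ~]# grep -n '\.$' /root/sample.txt
1:"Open Source" is a good mechanism to develop programs.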
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
11:This window is clear.
12:the symbol '*' is represented as start.
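15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
20:go! go! Let's go.
Special note: because the period has another meaning (introduced below), you must use the escape character (\) to remove its special meaning. You may find it odd, though, that lines 5 to 9 also end with "." yet are not printed. This comes down to how Windows software marks line breaks. If we display those lines with cat -A, you will see the following (the -A option of cat shows non-printing characters and marks each line end with "$"):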

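[root@CentOS7-1 ~]# cat -An /root/sample.txt | head -n 10 | tail -n 6
5 However, this dress is about $ 3183 dollars.^M$
6 GNU is free air not free beer.^M$
7 Her hair is very beauty.^M$
8 I can't finish the test.^M$
9 Oh! The soup taste good.^M$
10 motorcycle is cheap than car.$
From this we can see that lines 5 to 9 use the Windows line ending (^M$), whereas a normal Linux line should end as line 10 does ($). That is why lines 5 to 9 were not found: on those lines the last character before the line end is the carriage return (^M), not the period. This makes the meaning of "^" and "$" clear.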
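
The effect of the Windows line ending can be reproduced with two lines. This is a minimal sketch, assuming a made-up file with one CRLF line and one LF line; stripping the carriage return with tr is one possible workaround, not something the article itself prescribes.

```shell
tmp=$(mktemp)
# \r\n is the Windows line ending; \n alone is the Linux one.
printf 'beer.\r\ncar.\n' > "$tmp"

# Only the Linux-style line ends with a period right before the line end.
plain=$(grep -c '\.$' "$tmp")

# After deleting the carriage returns, both lines match.
stripped=$(tr -d '\r' < "$tmp" | grep -c '\.$')

echo "$plain $stripped"
rm -f "$tmp"
```

The counts are 1 and 2, which mirrors why lines 5 to 9 of sample.txt were missing from the '\.$' search.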
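Think about it: how would you find blank lines, that is, lines on which nothing was typed?
[root@CentOS7-1 ~]# grep -n '^$' /root/sample.txt
22:
Because the line has only a start and an end (^$) with nothing in between, this finds the blank lines.
Tip: suppose you know that in a shell script or a configuration file, blank lines and lines starting with # are only comments, so when you print the data for reference you want to leave them out to save paper. How do you do that? Take /etc/rsyslog.conf as an example and compare the following outputs (the -v option outputs every line except those that match):
[root@CentOS7-1 ~]# cat -n /etc/rsyslog.conf
# The output has 91 lines, including many blank lines and comment lines starting with #

[root@CentOS7-1 ~]# grep -v '^$' /etc/rsyslog.conf | grep -v '^#'
# The result has only 10 lines; the first "-v '^$'" drops the blank lines
# The second "-v '^#'" drops the lines that start with #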
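
The same filtering can be tried on a throwaway file instead of /etc/rsyslog.conf. In this sketch the configuration lines are invented for illustration, and the single-command form with two -e patterns is an alternative to the pipeline shown above, not the article's own method.

```shell
# Hypothetical configuration file with comments and blank lines.
tmp=$(mktemp)
printf '%s\n' '# a comment' '' 'option_a = 1' '' '#disabled = 0' 'option_b = 2' > "$tmp"

# Two-step pipeline, as in the rsyslog.conf example.
piped=$(grep -v '^$' "$tmp" | grep -v '^#')

# One grep with two patterns: -v drops lines matching either of them.
single=$(grep -v -e '^$' -e '^#' "$tmp")

echo "$piped"
rm -f "$tmp"
```

Both forms leave only the two option lines.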

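(4) Any single character "." and the repetition character "*".
We know that the wildcard "*" can stand for any number (zero or more) of characters, but regular expressions are not wildcards, and the two are different. In a regular expression, "." means "there must be exactly one arbitrary character". The two symbols mean the following in regular expressions.
. (period): stands for one arbitrary character.
* (asterisk): repeats the preceding character zero to infinitely many times; it works in combination with that character.
Let us practice directly. Suppose we need to find the string "g??d", that is, four characters in total, starting with "g" and ending with "d". We can do this: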
[root@CentOS7-1 ~]# grep -n 'g..d' /root/sample.txt
1:"Open Source" is a good mechanism to develop programs.
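9:Oh! The soup taste good.
16:The world <Happy> is the same with "glad".
Because we insist that there be two characters between g and d, the god on line 13 and the gd on line 14 are not listed. What if we want to list oo, ooo, oooo, and so on, that is, at least two o's? Is it o*, oo*, or ooo*?
Because * means "repeat the preceding RE character zero or more times", "o*" means "an empty string, or one or more o's".

Special note: because the empty string is allowed (that is, it matches whether or not the character is present), "grep -n 'o*' sample.txt" lists every line.
What about "oo*"? The first o must be present, and the second o may be repeated zero or more times, so any line containing o, oo, ooo, oooo, and so on is listed.
Likewise, when we need "a string of at least two o's", we need ooo*, that is: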
[root@CentOS7-1 ~]# grep -n 'ooo*' /root/sample.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!
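
The difference between o*, oo*, and ooo* shows up clearly when counting matching lines. The sketch below uses four made-up words; the file and variable names are assumptions.

```shell
# Hypothetical file: zero, one, two, and three o's between g and d.
tmp=$(mktemp)
printf '%s\n' 'gd' 'god' 'good' 'goood' > "$tmp"

zero_or_more=$(grep -c 'o*' "$tmp")    # the empty match is allowed: every line
one_or_more=$(grep -c 'oo*' "$tmp")    # at least one o
two_or_more=$(grep -c 'ooo*' "$tmp")   # at least two o's

echo "$zero_or_more $one_or_more $two_or_more"
rm -f "$tmp"
```

The counts come out as 4, 3, and 2: each extra literal o in front of o* raises the minimum run length by one.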
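Continuing the practice: suppose we want a string that starts and ends with g, with at least one o and nothing else between the two g's, that is, gog, goog, gooog, and so on. How do we do it?
[root@CentOS7-1 ~]# grep -n 'goo*g' sample.txt
18:google is the best tools for search keyword.
19:goooooogle yes!
What if we want a string that starts with g and ends with g, where the characters in between are optional? Is it "g*g"?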
[root@CentOS7-1 ~]# grep -n 'g*g' /root/sample.txt
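1:"Open Source" is a good mechanism to develop programs.
3:Football game is not use feet only.
9:Oh! The soup taste good.
13:Oh! My god!
14:The gd software is a library for drafting programs.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
Why does the test return so many lines? In g*g, the g* means "an empty string or one or more g's", followed by another g, so the whole regular expression matches g, gg, ggg, gggg, and so on. Any line that contains at least one g therefore qualifies.
So how do we meet the g...g requirement? Use the any-character ".", that is, "g.*g". Because "*" repeats the preceding character zero or more times and "." is any character, ".*" means zero or more arbitrary characters.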
[root@CentOS7-1 ~]# grep -n 'g.*g' /root/sample.txt
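1:"Open Source" is a good mechanism to develop programs.
14:The gd software is a library for drafting programs.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
Because the pattern means "starts with g and ends with g, with any characters in between", lines 1, 14, and 20 are also acceptable.
Note: the RE (regular expression) ".*" for "any characters" is very common; make sure you understand it and get used to it.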
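
The contrast between g*g and g.*g is easiest to see by printing only the matched text with grep -o. This is a small sketch on one made-up line; the variable names are assumptions.

```shell
line='go! go! Let us go.'

# g*g: zero or more g's followed by a g, so every single g matches on its own.
star_only=$(printf '%s\n' "$line" | grep -o 'g*g')
echo "$star_only"

# g.*g: a g, then any characters, then a g. The match is as long as possible,
# running from the first g to the last g on the line.
dot_star=$(printf '%s\n' "$line" | grep -o 'g.*g')
echo "$dot_star"
```

The first command prints three separate "g" matches, while the second prints one long span.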

One more exercise: what if we want the lines that contain "any number"? Since only digits are involved, we do this:
[root@CentOS7-1 ~]# grep -n '[0-9][0-9]*' /root/sample.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.
Although grep -n '[0-9]' sample.txt gives the same result, make sure you understand what the RE in the command above means.
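
The two patterns select the same lines, but they match different text. A minimal sketch with grep -o on one made-up line shows the difference; the variable names are assumptions.

```shell
text='about $ 3183 dollars, no. 1'

# [0-9][0-9]* matches a whole run of digits at once.
numbers=$(printf '%s\n' "$text" | grep -o '[0-9][0-9]*')
echo "$numbers"

# [0-9] matches one digit at a time.
digits=$(printf '%s\n' "$text" | grep -o '[0-9]')
echo "$digits"
```

With the first pattern the output is the two numbers 3183 and 1; with the second it is five separate digits.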
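(5) Limiting the repetition range of an RE character with {}.
In the previous examples we used "." and an RE character with "*" to allow zero to infinitely many repetitions. What if we want to limit the repetition count to a range? For example, how do we find runs of two to five consecutive o's? This is where the range-limiting characters "{}" come in. However, "{" and "}" have special meaning in the shell, and in a basic regular expression an unescaped brace is matched literally, so the interval must be written with the escape character "\" as \{ and \}.
First an exercise: to find the lines that contain a string with two o's, do this:
[root@CentOS7-1 ~]# grep -n 'o\{2\}' /root/sample.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!
This seems no different from ooo*, because line 19, which has many o's, still shows up. So let us try another search string. To find g followed by two to five o's and then another g, do this:
[root@CentOS7-1 ~]# grep -n 'go\{2,5\}g' /root/sample.txt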
18:google is the best tools for search keyword.
Line 19 is not selected (because line 19 has six o's). What if we want goooo...g with two or more o's? Besides gooo*g, we can also write:
[root@CentOS7-1 ~]# grep -n 'go\{2,\}g' /root/sample.txt
18:google is the best tools for search keyword.
19:goooooogle yes!
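
The interval forms can be compared on a few made-up words. This is a sketch only: the file contents and variable names are assumptions, and the last command uses grep -E (extended regular expressions), where the braces are written without backslashes; the article itself sticks to the basic syntax.

```shell
# Hypothetical file: 1, 2, 5, and 6 o's between the two g's.
tmp=$(mktemp)
printf '%s\n' 'gog' 'goog' 'gooooog' 'goooooog' > "$tmp"

two_to_five=$(grep -n 'go\{2,5\}g' "$tmp")   # between 2 and 5 o's
two_or_more=$(grep -n 'go\{2,\}g' "$tmp")    # 2 or more o's
ere_form=$(grep -En 'go{2,5}g' "$tmp")       # same range in extended syntax

echo "$two_to_five"
echo "$two_or_more"
rm -f "$tmp"
```

The word with six o's is excluded by \{2,5\} but accepted by \{2,\}, matching what happened to line 19 above.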

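3. Summary of special characters in basic regular expressions
After the simple examples above, the special characters of basic regular expressions can be summarized in a table.

Summary of special characters in basic regular expressions
RE character	Meaning and example
^word	Meaning: the string being searched for (word) is at the start of the line. Example: find the lines that start with # and show line numbers: grep -n '^#' sample.txt
word$	Meaning: the string being searched for (word) is at the end of the line. Example: list the lines that end with ! and show line numbers: grep -n '!$' sample.txt
.	Meaning: exactly one arbitrary character. Example: the matched string can be "eve", "eae", "eee", or "e e", but not just "ee"; there must be exactly one character between the two e's, and a space counts as a character: grep -n 'e.e' sample.txt
\	Meaning: escape character; removes the special meaning of a special symbol. Example: find the lines that contain a single quote ('): grep -n \' sample.txt
*	Meaning: repeats the preceding RE character zero to infinitely many times. Example: find strings such as "es", "ess", "esss"; note that because zero repetitions are allowed, es also matches. Also, since * repeats "the preceding RE character", it must directly follow an RE character; for example, "any characters" is ".*": grep -n 'ess*' sample.txt
[list]	Meaning: RE character set listing the characters to choose from. Example: find the lines that contain gl or gd. Note that [] stands for just one character to search for; for example "a[afl]y" matches aay, afy, or aly, that is, [afl] means a or f or l: grep -n 'g[ld]' sample.txt
[n1-n2]	Meaning: RE character set listing the range of characters to choose from. Example: [0-9] finds the lines that contain any digit. Note that inside a character set [] the minus sign - has a special meaning: it stands for all consecutive characters between the two given characters. Whether characters are consecutive depends on the ASCII encoding, so your encoding must be set correctly (in bash, check that the LANG and LANGUAGE variables are correct). For example, all uppercase letters are [A-Z]: grep -n '[A-Z]' sample.txt
[^list]	Meaning: RE character set listing the characters or ranges that are not wanted. Example: the matched string can be "oog" or "ood" but not "oot"; when ^ is inside [], it means "negated selection". For example, to exclude uppercase letters, use [^A-Z]. Note, however, that searching with grep -n '[^A-Z]' sample.txt lists every line of the file. Why? Because [^A-Z] means "a non-uppercase character", and every line contains one: grep -n 'oo[^t]' sample.txt
\{n,m\}	Meaning: n to m consecutive occurrences of the preceding RE character. \{n\} means exactly n consecutive occurrences; \{n,\} means n or more consecutive occurrences. Example: strings with two to three o's between g and g, that is, "goog" or "gooog": grep -n 'go\{2,3\}g' sample.txt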