Shell Programming: Regular Expressions

anbesrt

Tags: regular expressions, linux

Article link: https://blog.csdn.net/anbesrt/article/details/139291634

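Regular Expressions

Overview

Definition

A regular expression (regex) is a single string that describes and matches a set of strings following a given syntax rule. By combining ordinary characters with a few special symbols, it lets you quickly find, delete, or replace specific strings.

Purpose

Regular expressions make it quick to extract the information you need from text.

Basic Regular Expressions

Examples

First, create a test file named test: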

[root@localhost ~]$ cat test 
he was short and fat.
he was weating a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
PI=3.14
a wood cross!
Actions speak louder than words

#woood #
#woooooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.

1. Finding specific characters

[root@localhost ~]$ grep -n 'the' test                # -n: show line numbers
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
[root@localhost ~]$ grep -in 'the' test                # -i: ignore case
3:The home of Football on BBC Sport online.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.

# -v: invert the match, so the command prints the lines that do NOT contain "the"
[root@localhost ~]$ grep -vn 'the' test
1:he was short and fat.
2:he was weating a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
6:PI=3.14
7:a wood cross!
8:Actions speak louder than words
9:
10:#woood #
11:#woooooooood #
12:AxyzxyzxyzxyzC
13:I bet this place is really spooky late at night!
14:Misfortunes never come alone/single.
15:I shouldn't have lett so tast.

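2. Using [] to match a set of characters

[root@localhost ~]$ grep -n 'sh[io]rt' test        # [] matches exactly one character from the set, however many characters the set lists; here it must be preceded by "sh" and followed by "rt"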
1:he was short and fat.
2:he was weating a blue polo shirt with black pants.

To find lines containing the repeated character "oo", run:

[root@localhost ~]$ grep -n 'oo' test
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
7:a wood cross!
10:#woood #
11:#woooooooood #
13:I bet this place is really spooky late at night!

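To find "oo" that is not preceded by "w", use the negated set "[^]". Lines such as "#woood #" are still reported, because the "ooo" inside them contains an "oo" that is preceded by "o" rather than "w".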

[root@localhost ~]$ grep -n '[^w]oo' test
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
10:#woood #
11:#woooooooood #
13:I bet this place is really spooky late at night!

[root@localhost ~]$ grep -n '[^a-z]oo' test        # lines where "oo" is not preceded by a lowercase letter; likewise A-Z means uppercase letters and 0-9 means digits
3:The home of Football on BBC Sport online.

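3. Matching the start "^" and end "$" of a line

Basic regular expressions have two anchor metacharacters: "^" (start of line) and "$" (end of line).

[root@localhost ~]$ grep -n '^the' test        # lines that start with "the"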
4:the tongue is boneless but it breaks bones.12!

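The "^" symbol behaves differently inside and outside "[]": inside the brackets it negates the set, outside it anchors the match to the start of the line.

[root@localhost ~]$ grep -n '\.$' test        # lines that end with "."; because "." is itself a metacharacter, it must be escaped with "\" to be treated as an ordinary character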
1:he was short and fat.
2:he was weating a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
14:Misfortunes never come alone/single.
15:I shouldn't have lett so tast.

4. Matching any single character "." and repetition "*"

[root@localhost ~]$ grep -n 'w..d' test        # ".": any single character
5:google is the best tools for search keyword.
7:a wood cross!
8:Actions speak louder than words

"*" repeats the preceding single character zero or more times, so "o*" matches zero "o" characters (the empty string) or one or more of them. To require at least two "o" characters, write "ooo*": two literal "o" followed by zero or more.

[root@localhost ~]$ grep -n 'ooo*' test      # strings with two or more consecutive "o" characters
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
7:a wood cross!
10:#woood #
11:#woooooooood #
13:I bet this place is really spooky late at night!
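
As a small self-contained check of "." and "*", the sketch below runs three anchored patterns against a made-up word list. The words and variable names are assumptions for illustration and are not part of the test file above.

```shell
# Hypothetical sample input: one word per line
words=$(printf '%s\n' wd wod wood woood word)

# "." stands for exactly one arbitrary character, so w..d needs four characters
any_two=$(printf '%s\n' "$words" | grep '^w..d$')

# "o*" allows zero or more "o", so even "wd" matches
zero_up=$(printf '%s\n' "$words" | grep '^wo*d$')

# Two literal "o" followed by "o*" requires at least two "o"
two_up=$(printf '%s\n' "$words" | grep '^wooo*d$')

printf 'w..d:\n%s\nwo*d:\n%s\nwooo*d:\n%s\n' "$any_two" "$zero_up" "$two_up"
```

Anchoring with "^" and "$" makes each pattern describe the whole word, which makes the effect of each metacharacter easier to see than in a substring search.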

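5. Matching a repetition range

This uses the interval expression "{}". In basic regular expressions the braces must be escaped as "\{" and "\}"; without the backslashes grep treats them as ordinary characters.

[root@localhost ~]$ grep -n 'o\{2\}' test    # lines containing two consecutive "o" characters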
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
7:a wood cross!
10:#woood #
11:#woooooooood #
13:I bet this place is really spooky late at night!


[root@localhost ~]$ grep -n 'wo\{2,5\}d' test        # strings that start with "w", end with "d", and have 2 to 5 "o" characters in between
7:a wood cross!
10:#woood #
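
To illustrate the interval syntax, the following sketch runs the same range in basic mode and in extended mode (grep -E). The word list and variable names are assumptions made up for this example.

```shell
# Hypothetical sample input: one word per line
words=$(printf '%s\n' wd wod wood woood woooooood)

# Basic regular expression: the braces are escaped to form an interval
bre=$(printf '%s\n' "$words" | grep 'wo\{2,5\}d')

# Extended regular expression (-E): the same interval without backslashes
ere=$(printf '%s\n' "$words" | grep -E 'wo{2,5}d')

printf 'BRE:\n%s\nERE:\n%s\n' "$bre" "$ere"
```

Both commands should select the same words: those with between two and five "o" characters directly between "w" and "d".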

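Summary

^ Matches the start of a line. Inside a bracket expression it instead negates the character set. To match a literal "^", use "\^".

$ Matches the end of a line. To match a literal "$", use "\$".

. Matches any single character except the newline.

\ The backslash, also called the escape character, removes the special meaning of the metacharacter that follows it.

* Matches the preceding subexpression zero or more times. To match a literal "*", use "\*".

[] Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".

[^] Negated character set. Matches any one character that is not enclosed. For example, "[^abc]" matches "p", "l", "i", and "n" in "plain".

[n1-n2] Character range. Matches any one character in the given range. For example, "[a-z]" matches any lowercase letter from "a" to "z".

Note: a hyphen (-) denotes a range only when it appears inside the brackets between two characters; at the start of the bracket expression it stands for a literal hyphen.

{n} n is a non-negative integer; matches exactly n times. For example, "o{2}" does not match the "o" in "Bob" but matches the "oo" in "food". In basic regular expressions write it as "o\{2\}".

{n,} n is a non-negative integer; matches at least n times. For example, "o{2,}" does not match the "o" in "Bob" but matches all the o's in "foooood". "o{1,}" is equivalent to the extended "o+", and "o{0,}" is equivalent to "o*".

{n,m} m and n are non-negative integers with n<=m; matches at least n and at most m times.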

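Extended Regular Expressions

egrep is used in much the same way as grep; it is equivalent to "grep -E" and interprets the pattern as an extended regular expression. It can search one or more files for a pattern, and the pattern can be a single character, a string, a word, or a sentence.

Common metacharacters

+ Purpose: repeats the preceding character one or more times

Example: "egrep -n 'wo+d' test" finds strings such as "wood", "woood", and "woooooood"

? Purpose: matches the preceding character zero or one time

Example: "egrep -n 'bes?t' test" finds the two strings "bet" and "best"

| Purpose: alternation (or); matches any one of several strings

Example: "egrep -n 'of|is|on' test" finds "of", "is", or "on"

() Purpose: groups part of a pattern

Example: "egrep -n 't(a|e)st' test". "tast" and "test" share the "t" and "st", so "a" and "e" are listed inside "()" separated by "|" to find either "tast" or "test"

()+ Purpose: matches a group repeated one or more times

Example: "egrep -n 'A(xyz)+C' test" finds strings that begin with "A", end with "C", and have one or more "xyz" in between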
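
The extended metacharacters can be tried out on a small word list. This is a minimal sketch: the sample words and variable names are assumptions, and grep -E is used as the equivalent of egrep.

```shell
# Hypothetical sample input: one candidate string per line
sample=$(printf '%s\n' wd wood bet best beat test tast AxyzxyzC AC)

plus=$(printf '%s\n' "$sample" | grep -E '^wo+d$')      # "+": one or more "o"
opt=$(printf '%s\n' "$sample" | grep -E '^bes?t$')      # "?": the "s" is optional
alt=$(printf '%s\n' "$sample" | grep -E '^t(a|e)st$')   # "|" inside a group: "tast" or "test"
grp=$(printf '%s\n' "$sample" | grep -E '^A(xyz)+C$')   # "()+": one or more "xyz"

printf '%s\n' "$plus" "$opt" "$alt" "$grp"
```

Note that "wd" and "AC" are rejected, because "+" requires at least one occurrence, while "bet" is accepted because "?" also allows zero.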

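Text Processing Tools

The sed tool

How sed works:

Read: sed reads one line from the input stream (a file, a pipe, or standard input) and stores it in a temporary buffer called the pattern space.

Execute: by default every sed command is applied to the pattern space in order. Unless a line address is given, the commands are applied to every line in turn.

Display: the modified content is sent to the output stream. After the data is sent, the pattern space is cleared.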
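
The read, execute, display cycle can be observed with the p command. The sketch below uses a made-up three-line input; the variable names are assumptions for illustration.

```shell
# Hypothetical three-line input
input=$(printf '%s\n' one two three)

# Without -n, each line is printed by "p" and again when the cycle ends
twice=$(printf '%s\n' "$input" | sed 'p')

# With -n the automatic print is suppressed, so only "p" produces output
once=$(printf '%s\n' "$input" | sed -n 'p')

# With -n and a line address, only the addressed line is printed
second=$(printf '%s\n' "$input" | sed -n '2p')

printf 'sed p:\n%s\nsed -n p:\n%s\nsed -n 2p:\n%s\n' "$twice" "$once" "$second"
```

This is why printing examples normally combine p with -n: otherwise every selected line appears twice.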

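Common usage

sed [options] 'command' file

sed [options] -f scriptfile file

Common options

-e or --expression=: process the input text with the given command or script

-f or --file=: process the input text with the given script file

-h or --help: show help

-n, --quiet or --silent: suppress automatic printing, so that only explicitly printed results are shown

-i: edit the file in place

The 'command' specifies what to do with the file, that is, the sed command itself. It usually takes the form "[n1[,n2]]command", where the optional n1 and n2 select the line numbers to operate on. Common commands:

a: append; adds a line with the given content below the current line

c: change; replaces the selected lines with the given content

d: delete; deletes the selected lines

i: insert; inserts a line with the given content above the selected line

p: print; prints the addressed lines, or all lines when no address is given. It is usually combined with -n so that each line is not printed twice

s: substitute; replaces the specified string

y: transliterate; converts characters one for one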

Examples

1. Printing lines that match a condition (p prints the line)

[root@localhost ~]$ sed -n '3p' test        # print line 3
The home of Football on BBC Sport online.
[root@localhost ~]$ sed -n '3,5p' test       # print lines 3 through 5
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
[root@localhost ~]$ sed -n 'p;n' test        # print the odd-numbered lines
he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.
a wood cross!

#woooooooood #
I bet this place is really spooky late at night!
I shouldn't have lett so tast.
[root@localhost ~]$ sed -n 'n;p' test        # print the even-numbered lines
he was weating a blue polo shirt with black pants.
the tongue is boneless but it breaks bones.12!
PI=3.14
Actions speak louder than words
#woood #
AxyzxyzxyzxyzC
Misfortunes never come alone/single.


[root@localhost ~]$ sed -n '10,${n;p}' test    # from line 10 to the end of the file, print every second line (file lines 11, 13, 15)
#woooooooood #
I bet this place is really spooky late at night!
I shouldn't have lett so tast.

Combining sed with regular expressions

[root@localhost ~]$ sed -n '/the/p' test        # print lines containing "the"
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.

[root@localhost ~]$ sed -n '4,/the/p' test        # print from line 4 through the next line after it that contains "the"
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.

[root@localhost ~]$ sed -n '/the/=' test    # print the line numbers of lines containing "the"
4
5

[root@localhost ~]$ sed -n '/[0-9]$/p' test    # print lines that end with a digit
PI=3.14

[root@localhost ~]$ sed -n '/\<wood\>/p' test    # print lines containing the whole word "wood"; \< and \> mark word boundaries
a wood cross!

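2. Deleting lines that match a condition (d)

The nl command used below numbers the lines of the file (blank lines are left unnumbered by default)

[root@localhost ~]$ nl test | sed '3d'    # delete line 3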
     1	he was short and fat.
     2	he was weating a blue polo shirt with black pants.
     4	the tongue is boneless but it breaks bones.12!
     5	google is the best tools for search keyword.
     6	PI=3.14
     7	a wood cross!
     8	Actions speak louder than words
       
     9	#woood #
    10	#woooooooood #
    11	AxyzxyzxyzxyzC
    12	I bet this place is really spooky late at night!
    13	Misfortunes never come alone/single.
    14	I shouldn't have lett so tast.
[root@localhost ~]$ nl test | sed '3,5d'    # delete lines 3 through 5
     1	he was short and fat.
     2	he was weating a blue polo shirt with black pants.
     6	PI=3.14
     7	a wood cross!
     8	Actions speak louder than words
       
     9	#woood #
    10	#woooooooood #
    11	AxyzxyzxyzxyzC
    12	I bet this place is really spooky late at night!
    13	Misfortunes never come alone/single.
    14	I shouldn't have lett so tast.

[root@localhost ~]$ nl test | sed '/cross/d'    # delete lines containing "cross"
     …………    # part of the output omitted
     6	PI=3.14
     8	Actions speak louder than words
    …………

[root@localhost ~]$ sed '/^[a-z]/d' test        # delete lines that start with a lowercase letter
The home of Football on BBC Sport online.
PI=3.14
Actions speak louder than words

#woood #
#woooooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.


[root@localhost ~]$ sed '/\.$/d' test        # delete lines that end with "."
the tongue is boneless but it breaks bones.12!
PI=3.14
a wood cross!
Actions speak louder than words

#woood #
#woooooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!

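[root@localhost ~]$ sed '/^$/d' test        # delete all blank lines

Note: to collapse repeated blank lines instead, run "sed -e '/^$/{n;/^$/d}' test", where n reads the next line of input. For two consecutive blank lines the result is the same as "cat -s test"; a longer run of blank lines may leave more than one behind.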
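
The two blank-line commands can be compared on a short input. This is a sketch under stated assumptions: the sample text and variable names are made up, and the input contains at most two consecutive blank lines.

```shell
# Hypothetical input: "a", one blank line, "b", two blank lines, "c", "d"
input=$(printf 'a\n\nb\n\n\nc\nd\n')

# Delete every blank line
no_blank=$(printf '%s\n' "$input" | sed '/^$/d')

# On a blank line, print it, read the next line, and delete that one if it is blank too
squeezed=$(printf '%s\n' "$input" | sed '/^$/{n;/^$/d}')

# Reference result from cat -s, which squeezes repeated blank lines
reference=$(printf '%s\n' "$input" | cat -s)

printf 'deleted:\n%s\nsqueezed:\n%s\n' "$no_blank" "$squeezed"
```

For this input the sed result and the cat -s result coincide, since no run of blank lines is longer than two.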

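3. Replacing text that matches a condition

s: replace a string

c: replace a whole line or block

y: transliterate characters

Common usage:

sed 's/the/THE/' test        # replace the first "the" on each line with "THE"
sed 's/l/L/2' test          # replace the second "l" on each line with "L"
sed 's/the/THE/g' test      # replace every "the" in the file with "THE"
sed 's/o//g' test            # delete every "o" in the file (replace with the empty string)
sed 's/^/#/' test         # insert "#" at the start of each line
sed '/the/s/^/#/' test      # insert "#" at the start of each line containing "the"
sed 's/$/EOF/' test         # append the string "EOF" to the end of each line
sed '3,5s/the/THE/g' test     # replace every "the" on lines 3 to 5 with "THE"
sed '/the/s/o/O/g' test      # replace every "o" with "O" on lines containing "the"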
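
The effect of the substitution flags is easiest to see on a single line. The sample sentence and variable names below are assumptions chosen for illustration.

```shell
# Hypothetical sample line; note that "other" also contains the letters "the"
line='the cat and the other hat'

first=$(printf '%s\n' "$line" | sed 's/the/THE/')           # no flag: first match only
second=$(printf '%s\n' "$line" | sed 's/the/THE/2')         # numeric flag: second match only
all=$(printf '%s\n' "$line" | sed 's/the/THE/g')            # g flag: every match
marked=$(printf '%s\n' "$line" | sed 's/^/#/; s/$/ EOF/')   # anchors: add a prefix and a suffix

printf '%s\n' "$first" "$second" "$all" "$marked"
```

Because the pattern is a plain substring, the g flag also rewrites the "the" inside "other"; a word-boundary pattern such as "\<the\>" (supported by GNU sed) would avoid that.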

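4. Moving text that matches a condition

Common commands

H: append the pattern space to the hold space (a clipboard-like buffer);

g, G: overwrite / append to the selected line with the contents of the hold space;

w: write to a file;

r: read the given file;

a: append the given content.

Usage

sed '/the/{H;d};$G' test         # move lines containing "the" to the end of the file; {;} groups several commands
sed '1,5{H;d};15G' test          # move lines 1 to 5 after line 15, the last line of test
sed '/the/w out.file' test      # save lines containing "the" to the file out.file
sed '/the/r /etc/hostname' test  # add the contents of /etc/hostname after each line containing "the"
sed '3aNew' test                 # add a new line "New" after line 3
sed '/the/aNew' test           # add a new line "New" after each line containing "the"
sed '3aNew1\nNew2' test         # add several lines after line 3; the \n in the middle is a line break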
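
How H, d, and G cooperate through the hold space can be seen on a short input. The sample lines and the variable names are assumptions for this sketch, which was written with GNU sed in mind.

```shell
# Hypothetical five-line input; two lines contain "the"
input=$(printf '%s\n' alpha 'the one' beta 'the two' gamma)

# H appends each matching line to the hold space and d drops it from the output;
# on the last line ($), G appends the hold space after the pattern space
moved=$(printf '%s\n' "$input" | sed '/the/{H;d};$G')

printf '%s\n' "$moved"
```

The blank line in front of the moved block appears because H always adds a newline before the appended text, even when the hold space is still empty. Also note that if the last line itself matched the pattern, d would end the cycle before $G could run, so this form assumes the final line is not one of the lines being moved.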

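5. Editing a file with a script

A sed script stores several editing commands in a file (one command per line) and is invoked with the "-f" option.

The following command moves lines 1 to 5 after line 15:

sed '1,5{H;d};15G' test

The same can be done with a script file: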

[root@localhost ~]$ vim opt.list 
1,5H
1,5d
15G
[root@localhost ~]$ sed -f opt.list test 
PI=3.14
a wood cross!
Actions speak louder than words

#woood #
#woooooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.

he was short and fat.
he was weating a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.

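The awk tool

1. Common usage

awk is usually invoked in the forms shown below. The single quotes together with the braces "{}" hold the action to perform on the data. awk can process the target files directly, or read a script with "-f" and apply it to them.

awk options 'pattern or condition {action}' file1 file2 …    # filter and print the matching content of the files
awk -f scriptfile file1 file2 …    # read the actions from a script, then filter and print

awk tends to split each line into several "fields" before processing it; by default the field separator is a space or a tab. Results are displayed by printing field data with print. Conditions can use the logical operators "&&" (and), "||" (or), and "!" (not), as well as simple arithmetic: +, -, *, /, %, and ^ stand for addition, subtraction, multiplication, division, remainder, and exponentiation.

To list the user name, user ID, and group ID from /etc/passwd, run the following awk command: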

[root@localhost ~]$ awk -F':' '{print $1,$3,$4}' /etc/passwd
root 0 0
bin 1 1
daemon 2 2
…………

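Built-in awk variables

FS: the field separator for each line of text; defaults to a space or tab.

NF: the number of fields in the current line.

NR: the number (ordinal) of the current line.

$0: the whole content of the current line.

$n: the n-th field (column n) of the current line.

FILENAME: the name of the file being processed.

RS: the record separator; defaults to \n, so each line is one record.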
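
The built-in variables can be inspected on a couple of records. In this sketch the two passwd-style lines are made-up sample data, and the variable names are assumptions.

```shell
# Hypothetical colon-separated records in the style of /etc/passwd
data=$(printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' 'bin:x:1:1:bin:/bin:/sbin/nologin')

# NR = record number, NF = field count, $1 and $NF = first and last field
report=$(printf '%s\n' "$data" | awk -F: '{print NR, NF, $1, $NF}')

# Setting FS in a BEGIN block has the same effect as the -F option
count=$(printf '%s\n' "$data" | awk 'BEGIN {FS=":"} $7 ~ /bash$/ {n++} END {print n+0}')

printf '%s\nbash users: %s\n' "$report" "$count"
```

Writing n+0 in the END block forces a numeric result, so the count is printed as 0 rather than an empty string when nothing matches.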

Examples

1. Printing text by line

awk '{print}' test    # print everything, same as cat test
awk '{print $0}' test    # print everything, same as cat test
awk 'NR==1,NR==3{print}' test    # print lines 1 to 3
awk '(NR>=1)&&(NR<=3){print}' test    # print lines 1 to 3
awk 'NR==1||NR==3{print}' test    # print line 1 and line 3
awk '(NR%2)==1{print}' test    # print all odd-numbered lines
awk '(NR%2)==0{print}' test    # print all even-numbered lines
awk '/^root/{print}' /etc/passwd    # print lines that start with "root"
awk '/nologin$/{print}' /etc/passwd    # print lines that end with "nologin"
awk 'BEGIN {x=0};/\/bin\/bash$/{x++};END {print x}' /etc/passwd
# count lines that end with /bin/bash, same as grep -c "/bin/bash$" /etc/passwd
awk 'BEGIN{RS=""};END{print NR}' /etc/squid/squid.conf
# count the paragraphs of text separated by blank lines

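2. Printing text by field

awk '{print $3}' test        # print the third field of each line (fields separated by spaces or tabs)
awk '{print $1,$3}' test        # print the first and third fields of each line
awk -F":" '$2==""{print $1,$3}' /etc/shadow        # print fields 1 and 3 of the shadow records of users with an empty password
awk 'BEGIN {FS=":"}; $2==""{print}' /etc/shadow
# print the shadow records of users with an empty password

awk -F ":" '$7~"/bash"{print $1}' /etc/passwd
# print field 1 of colon-separated lines whose field 7 contains /bash

awk '($1~"nfs")&&(NF==8){print $1,$2}' /etc/services
# print fields 1 and 2 of lines that have 8 fields and whose field 1 contains nfs

awk -F ":" '($7!="/bin/bash")&&($7!="/sbin/nologin"){print}' /etc/passwd
# print all lines whose field 7 is neither /bin/bash nor /sbin/nologin

3. Calling shell commands through pipes and double quotes

awk -F: '/bash$/{print | "wc -l"}' /etc/passwd    # pipe to wc -l to count the users whose shell is bash, same as grep -c "bash$" /etc/passwd

awk 'BEGIN {while ("w" | getline) n++ ;{print n-2}}'    # run the w command and count the logged-in users (n-2 drops its two header lines)

awk 'BEGIN{"hostname" | getline;print $0}'       # run hostname and print the current host name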
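
The command-pipe form of getline can be sketched with a harmless stand-in command. Here three echo calls take the place of a real command such as w; the variable names cmd, line, and n are assumptions for illustration.

```shell
# Count the lines produced by an external command from inside awk
count=$(awk 'BEGIN {
    cmd = "echo one; echo two; echo three"   # stand-in for a real command such as "w"
    while ((cmd | getline line) > 0) n++     # getline returns 1 per line, 0 at end, -1 on error
    close(cmd)                               # close the pipe once all output is read
    print n
}')

printf 'lines read: %s\n' "$count"
```

Comparing the return value with "> 0" is a defensive choice: a bare "while (cmd | getline)" would keep looping if getline ever returned -1.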

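The sort tool

sort orders the contents of a file line by line, and can sort according to different data types.

Common options

-f: ignore case

-b: ignore leading blanks on each line

-M: sort by month

-n: sort numerically

-r: reverse the order

-u: like uniq, output identical lines only once

-t: set the field separator; by default fields are separated by blanks (spaces and tabs)

-o <output file>: write the sorted result to the given file

-k: select the sort key (field)

1. Sort the accounts in /etc/passwd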

[root@localhost ~]$ sort /etc/passwd
adm:x:3:4:adm:/var/adm:/sbin/nologin
bin:x:1:1:bin:/bin:/sbin/nologin
chrony:x:998:996::/var/lib/chrony:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
……

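2. Sort /etc/passwd in reverse order by the third column. Without -n the comparison is by character rather than by numeric value, which is why the UIDs below are not in numeric order.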

[root@localhost ~]$ sort -t":" -rk 3 /etc/passwd
nobody:x:99:99:Nobody:/:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
chrony:x:998:996::/var/lib/chrony:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
halt:x:7:0:halt:/sbin:/sbin/halt
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
sync:x:5:0:sync:/sbin:/bin/sync
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
bin:x:1:1:bin:/bin:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
root:x:0:0:root:/root:/bin/bash

To save the output to a text file, use sort -t":" -rk 3 /etc/passwd -o user.txt
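
The difference between character order and numeric order on a key can be checked with a few records. In this sketch the four passwd-style lines are made-up data, LC_ALL=C is set only to make the character comparison predictable, and the variable names are assumptions.

```shell
# Hypothetical records: name, password placeholder, numeric ID
data=$(printf '%s\n' 'sync:x:9' 'uucp:x:10' 'games:x:100' 'daemon:x:2')

# Reverse character order on field 3 only: "9" > "2" > "100" > "10"
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k3,3r | cut -d: -f1)

# Reverse numeric order on field 3 only: 100 > 10 > 9 > 2
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k3,3nr | cut -d: -f1)

printf 'character order:\n%s\nnumeric order:\n%s\n' "$lexical" "$numeric"
```

Writing the key as -k3,3 limits the comparison to the third field; a plain -k 3 would use everything from the third field to the end of the line.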

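The uniq tool

uniq reports or omits repeated lines in a file. It compares adjacent lines only, so it is commonly run on sorted input. Syntax: uniq [options] file

Common options

-c: prefix each line with its number of occurrences

-d: show only the repeated lines

-u: show only the lines that occur once

Examples

1. Remove repeated lines from a file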

[root@localhost ~]$ vim name
[root@localhost ~]$ uniq name
zhangsan
lisi
wangwu

2. Remove repeated lines and show at the start of each line how many times it occurred

[root@localhost ~]$ uniq -c name
      2 zhangsan
      3 lisi
      2 wangwu

3. Find the repeated lines in a file

[root@localhost ~]$ uniq -d name
zhangsan
lisi
wangwu
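
Since uniq only compares neighbouring lines, unsorted input is usually sorted first. The sketch below assumes a made-up name list in which the duplicates are not all adjacent; the variable names are illustrative.

```shell
# Hypothetical name list with non-adjacent duplicates
names=$(printf '%s\n' zhangsan lisi zhangsan wangwu lisi lisi)

# uniq alone merges only the two adjacent "lisi" lines at the end
adjacent=$(printf '%s\n' "$names" | uniq)

# Sorting first groups identical lines, so every duplicate is counted;
# awk strips the leading padding that uniq -c adds
counted=$(printf '%s\n' "$names" | LC_ALL=C sort | uniq -c | awk '{print $1, $2}')

printf 'uniq only:\n%s\nsort | uniq -c:\n%s\n' "$adjacent" "$counted"
```

The same pipeline with "sort -u" in place of "sort | uniq" would remove the duplicates without counting them.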

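The tr tool

tr is commonly used to translate, squeeze, and delete characters read from standard input.

Syntax

tr [options] [arguments]

Common options

-c: operate on all characters that are NOT in the first set;

-d: delete all characters that are in the first set;

-s: squeeze each run of a repeated character into a single character;

-t: first truncate the first set to the length of the second set

Examples

Convert the input from uppercase to lowercase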

[root@localhost ~]$ echo "KCg" | tr 'A-Z' 'a-z'
kcg

Squeeze repeated characters in the input

[root@localhost ~]$ echo "niiihaooo" | tr -s 'io'
nihao

Delete certain characters from a string

[root@localhost ~]$ echo "niiihaooo" | tr -d 'io'
nha
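
The options can also be combined. The following sketch adds a -c and -d combination to the examples above; the sample strings and variable names are assumptions for illustration.

```shell
# Translate lowercase letters to uppercase
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')

# Squeeze runs of "i" and "o" into single characters
squeezed=$(printf 'niiihaooo\n' | tr -s 'io')

# -c complements the set, so -cd deletes everything except digits and the newline
digits=$(printf 'tel: 010-1234\n' | tr -cd '0-9\n')

printf '%s\n' "$upper" "$squeezed" "$digits"
```

Keeping "\n" in the complemented set preserves the line ending, which would otherwise be deleted along with the other non-digit characters.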
