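1. What is a regular expression?
Simply put, a regular expression is a set of rules and methods defined for processing large amounts of text. As an analogy: suppose “@” stands for nishishei and “!” stands for linzhongniao; then, conceptually, echo “@!” would print “nishisheilinzhongniao”.
With the help of such special symbols, a system administrator can quickly filter, replace, or print exactly the strings they need. Linux regular expression tools generally process text one line at a time. See man grep for a deeper look.
2. Why learn regular expressions?
In day-to-day Linux operations work we constantly deal with text full of strings: configuration files, programs, command output, log files, and so on. We often need to pick out the specific strings that matter from all of that, and this is where regular expressions come in. For example: printing only the IP address from ifconfig output, or extracting only the IPs from an access.log file. As noted above, the matching is done line by line.
3. Basic regular expressions, part 1
3.1 Sample data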
[root@linzhongniao ~]# cat linzhongniao.log
I am linzhongniao!
I like linux.
(blank line)
I like badminton ball,billiard ball and chinese chess !
my blog id https://blog.51cto.com/10642812
my qq num is 1200098
(blank line)
my god,i am not linzhongniao,But to the birds of the forest!!!
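3.2 The “^” caret
“^” anchors a match to the start of a line, so “^m” matches lines that begin with m. In the vi/vim editor, “^” likewise refers to the start of a line.
Example: filter the lines that start with the letter m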
[root@linzhongniao ~]# grep "^m" linzhongniao.log
my blog id https://blog.51cto.com/10642812
my qq num is 1200098
my god,i am not linzhongniao,But to the birds of the forest!!!
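3.3 The “$” dollar sign
“$” anchors a match to the end of a line, so “8$” matches lines that end with 8. In the vi/vim editor, “$” likewise refers to the end of a line.
Example: filter the lines that end with 8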
[root@linzhongniao ~]# grep "8$" linzhongniao.log
my qq num is 1200098
3.4 The “^$” combination
“^$” matches an empty line: the start of the line is immediately followed by its end.
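A minimal sketch of how “^$” is typically used, assuming a throwaway file created with mktemp (its contents are made up for illustration): grep -n shows which lines are empty, and grep -v drops them.

```shell
# Build a small sample file containing two empty lines (hypothetical content).
sample=$(mktemp)
printf 'I am linzhongniao!\n\nI like linux.\n\nmy qq num is 1200098\n' > "$sample"

# -n prefixes each matching line with its line number; only empty lines match.
blank_lines=$(grep -n '^$' "$sample")
echo "$blank_lines"

# -v inverts the match and -c counts, giving the number of non-empty lines.
non_blank_count=$(grep -vc '^$' "$sample")
echo "$non_blank_count"

rm -f "$sample"
```

Note that “^$” only matches lines that are truly empty; a line holding only spaces would need something like '^ *$' instead.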
4. Basic regular expressions, part 2
4.1 The “.” dot
The “.” dot matches exactly one character, and that character can be anything.
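Example (every non-empty line contains at least one character, so only the blank lines are dropped; note that in the outputs from here on the first line of the sample file reads linzhongnieo, the file having evidently been edited since section 3.1):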
[root@linzhongniao ~]# grep "." linzhongniao.log
I am linzhongnieo!
I like linux.
I like badminton ball,billiard ball and chinese chess !
my blog id https://blog.51cto.com/10642812
my qq num is 1200098
my god,i am not linzhongniao,But to the birds of the forest!!!
Match text that starts with linzhongni and ends with o, with any number of characters in between:
[root@linzhongniao ~]# grep "linzhongni.*o" linzhongniao.log
I am linzhongnieo!
my god,i am not linzhongniao,But to the birds of the FOREST!!!!
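4.2 The “\” backslash
The escape character. For example, “\.” stands for a literal dot and nothing else: the backslash strips a special character of its special meaning and restores its literal form. Likewise, \$ stands for a literal $ sign.
To match only lines that end with a dot, the dot has to be escaped: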
[root@linzhongniao ~]# grep "\.$" linzhongniao.log
I like linux.
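To see what the escape actually changes, here is a small sketch comparing the escaped and unescaped dot on two made-up lines (the file and its contents are assumptions for illustration only).

```shell
# Two hypothetical lines: only the first one really ends with a dot.
sample=$(mktemp)
printf 'I like linux.\nI like linuxx\n' > "$sample"

# Unescaped: "." matches any single character, so both lines match.
unescaped_count=$(grep -c 'linux.$' "$sample")

# Escaped: "\." matches a literal dot only, so just the first line matches.
escaped_count=$(grep -c 'linux\.$' "$sample")

echo "$unescaped_count $escaped_count"
rm -f "$sample"
```

Single quotes are used here so that the shell passes the backslash and the $ to grep untouched.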
4.3 The “*” asterisk
Repeats the preceding character zero or more times. For example, o* matches zero or more occurrences of the letter o.
[root@linzhongniao ~]# grep "linzhongniao*" linzhongniao.log
my god,i am not linzhongniao,But to the birds of the forest!!!
[root@linzhongniao ~]# grep "n*" linzhongniao.log
I am linzhongnieo!
I like linux.
I like badminton ball,billiard ball and chinese chess !
my blog id https://blog.51cto.com/10642812
my qq num is 1200098
my god,i am not linzhongniao,But to the birds of the forest!!!
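Because “*” also allows zero repetitions, a pattern such as n* matches every line, including empty ones. The sketch below, run on a made-up three-line file, contrasts it with nn* (one or more n), which is the basic-regex way to require at least one occurrence.

```shell
# Hypothetical sample: one line with "n", one empty line, one more line with "n".
sample=$(mktemp)
printf 'I like linux.\n\nmy qq num is 1200098\n' > "$sample"

total_lines=$(wc -l < "$sample" | tr -d ' ')

# "n*" can match the empty string, so every line (even the empty one) matches.
star_count=$(grep -c 'n*' "$sample")

# "nn*" requires one n followed by zero or more n, so the empty line is excluded.
one_or_more_count=$(grep -c 'nn*' "$sample")

echo "$total_lines $star_count $one_or_more_count"
rm -f "$sample"
```

The same reasoning explains the earlier "linzhongniao*" example: the asterisk applies only to the final o, so the pattern really means linzhongnia followed by any number of o characters.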
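4.4 The “.*” combination
“.*” matches any number of arbitrary characters (including none). By extension, “^.*” matches any run of characters from the start of a line, and “.*$” matches any run of characters up to the end of a line.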
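One property worth seeing in action is that “.*” is greedy: it takes as much of the line as it can. The sketch below uses grep -o (print only the matched part, available in GNU and BSD grep) on a made-up string.

```shell
# Hypothetical input with several words that start with g and end with d.
text='goodi very good glad'

# "g.*d" starts at the first g and, being greedy, runs to the LAST d on the line.
greedy_match=$(printf '%s\n' "$text" | grep -o 'g.*d')
echo "$greedy_match"
```

This greediness is what the later sed examples rely on when they use ^.* to swallow everything before the target string.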
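Example (by this point the sample file also contains extra lines such as goodi, very good, goood, good, gd and glad):
Match goo followed by any number of characters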
[root@linzhongniao ~]# grep "goo.*" linzhongniao.log
goodi
very good
goood
good
Match lines that end with the letter d, preceded by any number of characters
[root@linzhongniao ~]# grep ".*d$" linzhongniao.log
gd
goood
glad
good
Match lines that end with the digit 2, preceded by any number of characters
[root@linzhongniao ~]# grep ".*2$" linzhongniao.log
my blog id https://blog.51cto.com/10642812
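Match lines that end with an exclamation mark. Note the backslash: in an interactive bash session an unescaped ! inside double quotes triggers history expansion. Writing the pattern in single quotes ('.*!$') avoids the problem altogether.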
[root@linzhongniao ~]# grep ".*\!$" linzhongniao.log
I am linzhongnieo!
I like badminton ball,billiard ball and chinese chess !
my god,i am not linzhongniao,But to the birds of the FOREST!!!!
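5. Basic regular expressions, part 3
5.1 [abc] bracket expressions
Matches any single character from the set inside the brackets, for example [a-zA-Z], [0-9], or [A-Z].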
[root@linzhongniao ~]# grep "[A-Z]" linzhongniao.log
I am linzhongnieo!
I like linux.
I like badminton ball,billiard ball and chinese chess !
my god,i am not linzhongniao,But to the birds of the FOREST!!!!
LINZHONGNIAO
[root@linzhongniao ~]# grep "[a-z]" linzhongniao.log
I am linzhongnieo!
I like linux.
I like badminton ball,billiard ball and chinese chess !
my blog id https://blog.51cto.com/10642812
my qq num is 1200098
not 1200000098
my god,i am not linzhongniao,But to the birds of the FOREST!!!!
goodi
good
gd
goood
glad
[root@linzhongniao ~]# grep -i "[A-Z]" linzhongniao.log
I am linzhongnieo!
I like linux.
I like badminton ball,billiard ball and chinese chess !
my blog id https://blog.51cto.com/10642812
my qq num is 1200098
not 1200000098
my god,i am not linzhongniao,But to the birds of the FOREST!!!!
goodi
good
gd
goood
glad
LINZHONGNIAO
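5.2 [^abc] negated bracket expressions
A “^” at the start of a bracket expression means negation: [^abc] matches any single character that is not listed after the ^. This is different from a ^ outside the brackets, which anchors the match to the start of the line.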
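The two meanings of ^ can be put side by side. The following sketch uses a made-up three-line file; note that [^a-z] matches a line as soon as it contains one character outside the set, not only lines that avoid the set entirely.

```shell
# Hypothetical sample: letters only, letters plus a digit, digits only.
sample=$(mktemp)
printf 'abc\nabc1\n123\n' > "$sample"

# ^ inside brackets negates: lines containing at least one non-lowercase character.
has_non_lower=$(grep '[^a-z]' "$sample")
echo "$has_non_lower"

# ^ outside anchors, ^ inside negates: lines whose FIRST character is not a-z.
starts_non_lower=$(grep '^[^a-z]' "$sample")
echo "$starts_non_lower"

rm -f "$sample"
```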
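5.3 a\{n,m\}
Repeats the preceding character n to m times (here, the letter a repeated n to m times). With egrep (grep -E) or sed -r the backslashes can be dropped, because those understand extended regular expressions.
5.4 a\{n,\}
Repeats the preceding character at least n times (here, a repeated at least n times). With egrep (grep -E) or sed -r the backslashes can be dropped.
5.5 a\{n\}
Repeats the preceding character exactly n times. With egrep (grep -E) or sed -r the backslashes can be dropped. A line matches as soon as it contains n consecutive occurrences anywhere, so a longer run matches as well: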
[root@linzhongniao ~]# egrep "0{3}" linzhongniao.log
my qq num is 1200098
not 1200000098
5.6 a\{,m\}
Repeats the preceding character at most m times (this form is a GNU extension). With egrep (grep -E) or sed -r the backslashes can be dropped.
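The interval forms above can be compared on a small made-up file. This sketch shows the same interval written in basic and extended syntax, and how surrounding context is needed when "exactly three" is meant literally; the sample lines are assumptions for illustration.

```shell
# Hypothetical sample with runs of three, six and one zero.
sample=$(mktemp)
printf 'my qq num is 1200098\nnot 1200000098\nport 80\n' > "$sample"

# Basic syntax: braces need backslashes.
bre_three=$(grep -c '0\{3\}' "$sample")

# Extended syntax (-E): no backslashes. Both lines with 3+ zeros match.
ere_three=$(grep -Ec '0{3}' "$sample")

# To require a run of exactly three, fence it with non-zero characters.
exactly_three=$(grep -E '[^0]0{3}[^0]' "$sample")

# {4,} means at least four in a row.
at_least_four=$(grep -E '0{4,}' "$sample")

echo "$bre_three $ere_three"
echo "$exactly_three"
echo "$at_least_four"
rm -f "$sample"
```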
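6. Extended regular expressions
grep -E and egrep
(Background knowledge only.)
(1) “+”: the plus sign repeats the preceding character one or more times (whereas * is zero or more).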
[root@linzhongniao ~]# egrep "g+d" linzhongniao.log
gd
[root@linzhongniao ~]# egrep "go+d" linzhongniao.log
my god,i am not linzhongniao,But to the birds of the FOREST!!!!
good
(2) “*”: the asterisk repeats the preceding character zero or more times
[root@linzhongniao ~]# egrep "go*d" linzhongniao.log
my god,i am not linzhongniao,But to the birds of the FOREST!!!!
good
gd
(3) “?”: the question mark repeats the preceding character zero or one time (whereas the “.” dot stands for exactly one character)
[root@linzhongniao ~]# egrep "go?d" linzhongniao.log
my god,i am not linzhongniao,But to the birds of the FOREST!!!!
gd
[root@linzhongniao ~]# egrep "go.d" linzhongniao.log
good
(4) “|”: the pipe (alternation)
Filters for several strings at once.
[root@linzhongniao ~]# egrep "3306|1521" /etc/services
mysql 3306/tcp # MySQL
mysql 3306/udp # MySQL
ncube-lm 1521/tcp # nCube License Manager
ncube-lm 1521/udp # nCube License Manager
[root@linzhongniao ~]# egrep "god|good" linzhongniao.log
my god,i am not linzhongniao,But to the birds of the FOREST!!!!
good
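(5) “( )” and “(|)”: parentheses group part of a pattern, either for alternation inside the group or for back-references.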
[root@linzhongniao ~]# egrep "g(la|oo)d" linzhongniao.log
good
glad
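Grouping and back-references can be shown together. In this sketch, run on a made-up word list, \1 refers back to whatever the first group matched. Back-references with grep -E are supported by GNU grep, which is assumed here.

```shell
# Hypothetical word list.
sample=$(mktemp)
printf 'good\nglad\ngd\ngogo\n' > "$sample"

# Grouped alternation: g, then either "la" or "oo", then d.
grouped=$(grep -E 'g(la|oo)d' "$sample")
echo "$grouped"

# Back-reference: \1 must repeat the text matched by (go), i.e. "gogo".
doubled=$(grep -E '(go)\1' "$sample")
echo "$doubled"

rm -f "$sample"
```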
7. Metacharacters
The metacharacters covered here are Perl-style additions to regular expressions. Only some text-processing tools support them, not all.
\b word boundary
Example:
[root@linzhongniao ~]# grep "good" linzhongniao.log
goodi
good
To match only good and not goodi, use \b to mark a word boundary, or use grep -w to search by whole word.
[root@linzhongniao ~]# grep "good\b" linzhongniao.log
good
[root@linzhongniao ~]# grep -w "good" linzhongniao.log
good
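A word boundary is not the same as a line boundary, which the following sketch illustrates on three made-up lines. It assumes GNU grep for \b; -w and -x are standard grep options.

```shell
# Hypothetical sample lines.
sample=$(mktemp)
printf 'goodi\ngood\nvery good\n' > "$sample"

# \b after "good": the word must end there, so "goodi" is excluded,
# but "very good" still matches because "good" is a whole word in it.
boundary=$(grep 'good\b' "$sample")

# -w treats the pattern as a whole word on both sides.
whole_word=$(grep -w 'good' "$sample")

# -x requires the pattern to match the entire line.
whole_line=$(grep -x 'good' "$sample")

echo "$boundary"
echo "$whole_word"
echo "$whole_line"
rm -f "$sample"
```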
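8. Regular expression summary
9. Hands-on practice: Linux regular expressions with the "three swordsmen" (grep, sed, awk)
9.1 Extract the IP address from the ifconfig output
Solution:
General form, where REGEX marks the position that accepts a regular expression:
sed -n 's#REGEX##gp' file
Method 1: first select the line, then match and delete everything before the target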
[root@linzhongniao ~]# ifconfig eth0|sed -n '2'p|sed 's#^.*dr:##g'
192.168.0.117 Bcast:192.168.0.255 Mask:255.255.255.0
Then match and delete everything after the target
[root@linzhongniao ~]# ifconfig eth0|sed -n '2p'|sed 's#^.*dr:##g'|sed 's#  B.*$##g'   <== there are two spaces between # and B in "#  B.*$"; copying and pasting is safest
192.168.0.117
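Tips:
To strip the text before the target (the string you want, such as the IP above), anchor the pattern at the start of the line with ^.* and end it with literal characters. For example, “^.*addr:” matches everything from the start of the line through addr:.
To strip the text after the target, begin the pattern with literal characters and end it with .*$. For example, “  B.*$” matches from the two spaces before the capital B through the end of the line.
Replacing the matched text with nothing leaves exactly the content you want.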
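The two-step technique can be tried without a network interface by feeding sed a canned line. The line below imitates the old net-tools ifconfig format and reuses the addresses from the example; treating it as fixed input is an assumption made so the sketch runs anywhere.

```shell
# A canned line in the old ifconfig format (two spaces before "Bcast").
line='          inet addr:192.168.0.117  Bcast:192.168.0.255  Mask:255.255.255.0'

# Step 1: "^.*addr:" deletes everything up to and including "addr:".
after_prefix=$(printf '%s\n' "$line" | sed 's#^.*addr:##')
echo "$after_prefix"

# Step 2: "  *Bcast.*$" deletes one or more spaces, "Bcast" and the rest of the line.
ip=$(printf '%s\n' "$line" | sed 's#^.*addr:##; s#  *Bcast.*$##')
echo "$ip"
```

Writing the space part as "  *" (a space followed by zero or more spaces) avoids depending on the exact number of spaces in the output.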
Method 2:
[root@linzhongniao ~]# ifconfig eth0|sed -n '2s#^.*dr:##gp'|sed 's#  B.*$##g'
192.168.0.117
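Method 3:
sed back-references:
sed -nr 's#()()#\1\2#gp' file
Options:
-n suppress the default output
-r use extended regular expressions, so ( ) need no backslashes
sed back-reference demo: extract linzhongniao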
[root@linzhongniao ~]# echo "I am linzhongniao linux" >f.txt
[root@linzhongniao ~]# cat f.txt
I am linzhongniao linux
[root@linzhongniao ~]# cat f.txt|sed -nr 's#^.*m (.*) l.*$#\1#gp'
linzhongniao
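When part of the pattern is wrapped in parentheses, the text matched by the first pair can be reused in the replacement as \1, the text matched by the second pair as \2, and so on.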
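As a small illustration of using more than one group, the sketch below captures two words from the same test string and prints them in swapped order. It uses sed -E, which GNU sed treats the same as -r.

```shell
# Group 1 captures "linzhongniao", group 2 captures "linux";
# the replacement emits them in reverse order.
swapped=$(echo "I am linzhongniao linux" | sed -nE 's#^.*m (.*) (l.*)$#\2 \1#p')
echo "$swapped"
```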
[root@linzhongniao ~]# ifconfig eth0|sed -nr '2s#^.*dr:(.*) B.*$#\1#gp'
192.168.1.106
Method 4:
[root@linzhongniao ~]# ifconfig eth0|awk -F "[ :]+" 'NR==2{print $4}'
192.168.0.106
Method 5:
[root@linzhongniao ~]# ifconfig eth0|sed -nr '/inet addr/s#^.*dr:(.*) B.*$#\1#gp'
192.168.0.117
Method 6:
[root@linzhongniao ~]# ifconfig bond0|awk -F "(addr:| Bcast:)" 'NR==2{print $2}'
192.168.1.225
Extract the IP address from the output of ip addr
[root@linzhongniao ~]# ip addr|awk -F "[ /]+" 'NR==8 {print $3}'
192.168.0.106
[root@linzhongniao ~]# ip addr|sed -nr '8s#^.*inet ##gp'|sed 's#/24 b.*$##g'
192.168.1.106
[root@linzhongniao ~]# ip addr|sed -nr '8s#^.*inet (.*)/24.*$#\1#gp'
192.168.1.106
[root@linzhongniao ~]# ip addr|awk -F "(inet |/24 brd)" 'NR==8{print $2}'
192.168.1.106
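The commands above depend on the address being on line 8 and on a /24 prefix, both of which vary between machines. A possible alternative is to select the line by pattern instead. The sketch below runs awk on canned text shaped like ip addr output; the interface name, MAC and IPv6 values are made up for illustration.

```shell
# Canned text imitating "ip addr" output (hypothetical values).
ip_output='2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.106/24 brd 192.168.1.255 scope global eth0
    inet6 fe80::20c:29ff:fe00:1/64 scope link'

# "/inet /" (with the trailing space) selects IPv4 lines but not "inet6".
# $2 is "address/prefix"; split() cuts it at the slash.
ip=$(printf '%s\n' "$ip_output" | awk '/inet /{split($2, parts, "/"); print parts[1]}')
echo "$ip"
```

On a real host the same awk program could be fed from ip addr show eth0, which also avoids picking up the loopback address.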
9.2 Swap the first and last columns of /etc/passwd
[root@linzhongniao ~]# tail /etc/passwd|awk -F "[:]+" '{print $6":"$2":"$3":"$4"::"$5":"$1}'
/bin/bash:x:855:855::/home/stu1:stu1
/bin/bash:x:856:856::/home/stu2:stu2
/bin/bash:x:857:857::/home/stu3:stu3
/bin/bash:x:858:858::/home/stu4:stu4
/bin/bash:x:859:859::/home/stu5:stu5
/bin/bash:x:860:860::/home/stu6:stu6
/bin/bash:x:861:861::/home/stu7:stu7
/bin/bash:x:862:862::/home/stu8:stu8
/bin/bash:x:863:863::/home/stu9:stu9
/bin/bash:x:864:864::/home/stu10:stu10
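The command above relies on "[:]+" collapsing the empty comment field and then re-inserts "::" by hand. A sketch of a more general variant, assuming nothing beyond standard awk, sets both the input and output separators to ":" and swaps $1 with $NF, so empty fields are preserved automatically. The input line mirrors the stu1 entry shown in the output.

```shell
# One passwd-style line with an empty fifth field.
line='stu1:x:855:855::/home/stu1:/bin/bash'

# FS=OFS=":" keeps empty fields; assigning to a field rebuilds the line with OFS.
swapped=$(printf '%s\n' "$line" |
  awk 'BEGIN{FS=OFS=":"} {first=$1; $1=$NF; $NF=first; print}')
echo "$swapped"
```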
9.3 Extract the file permissions
Goal: extract 644
[root@linzhongniao ~]# stat /etc/hosts
File: `/etc/hosts'
Size: 218 Blocks: 8 IO Block: 4096 regular file
Device: 804h/2052d Inode: 260125 Links: 2
Access: (0644/-rw-r--r--) Uid: (0/root) Gid: (0/root)
Access: 2018-07-18 10:09:51.759042316 +0800
Modify: 2018-07-11 16:18:38.646992646 +0800
Change: 2018-07-11 16:18:38.646992646 +0800
Solution:
Method 1:
[root@linzhongniao ~]# stat /etc/hosts|sed -nr 's#^.*0(.*)/-rw.*$#\1#gp'
644
Method 2:
[root@linzhongniao ~]# stat /etc/hosts|awk -F "[0/]+" 'NR==4 {print $2}'
644
[root@linzhongniao ~]# stat ett.txt|awk -F "[: (0/]+" 'NR==4{print $2}'
644
Method 3:
[root@linzhongniao ~]# stat /etc/hosts|awk -F "(0|/)" 'NR==4{print $2}'
644
Method 4 (the simplest: let stat print the octal mode directly):
[root@linzhongniao ~]# stat -c %a /etc/hosts
644
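The sed and awk patterns above split on the character 0, so they can misbehave when the mode itself contains a zero (0600, for instance). When stat -c is not an option, a stricter pattern is one way around that. The sketch below works on canned stat lines; extract_perm is a hypothetical helper name introduced only for this example.

```shell
# Hypothetical helper: print the three permission digits from stat's "Access: (" line.
extract_perm() {
  sed -nE 's#^Access: \(0([0-7]{3})/.*$#\1#p'
}

# Canned stat output: the permission line plus the access-time line,
# which also starts with "Access:" but has no "(" and is therefore skipped.
stat_sample='Access: (0644/-rw-r--r--) Uid: (0/root) Gid: (0/root)
Access: 2018-07-18 10:09:51.759042316 +0800'

perm_hosts=$(printf '%s\n' "$stat_sample" | extract_perm)
perm_private=$(printf '%s\n' 'Access: (0600/-rw-------) Uid: (0/root) Gid: (0/root)' | extract_perm)
echo "$perm_hosts $perm_private"
```

Because the pattern names the literal "Access: (" prefix, no line number such as NR==4 is needed.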
9.4 Batch-rename files
The current directory contains the files shown below. Using sed, rename them so that _finished is removed from each file name.
[root@linzhongniao test]# ls
stu_102999_1_finished.jpg stu_102999_2_finished.jpg stu_102999_3_finished.jpg stu_102999_4_finished.jpg stu_102999_5_finished.jpg
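Solution:
In “mv &” below, the & stands for the entire text matched by the pattern, which here is the whole original file name printed by ls.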
[root@linzhongniao test]# ls|sort|sed -nr 's#(^.*_)(.*)(_.*ed)(.j.*$)#mv & \1\2\4#gp'
mv stu_102999_1_finished.jpg stu_102999_1.jpg
mv stu_102999_2_finished.jpg stu_102999_2.jpg
mv stu_102999_3_finished.jpg stu_102999_3.jpg
mv stu_102999_4_finished.jpg stu_102999_4.jpg
mv stu_102999_5_finished.jpg stu_102999_5.jpg
Pipe the output above to bash to execute it.
[root@linzhongniao test]# ls|sort|sed -nr 's#(^.*_)(.*)(_.*ed)(.j.*$)#mv & \1\2\4#gp'|bash
[root@linzhongniao test]# ls
stu_102999_1.jpg stu_102999_2.jpg stu_102999_3.jpg stu_102999_4.jpg stu_102999_5.jpg
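The generate-then-execute pattern can be rehearsed safely in a scratch directory. The sketch below creates two empty files with mktemp -d, builds the mv commands with a simplified pattern (an assumption; it keys on the literal _finished suffix rather than on four groups), reviews them, and only then runs them.

```shell
# Scratch directory with two sample files (names follow the example above).
workdir=$(mktemp -d)
touch "$workdir/stu_102999_1_finished.jpg" "$workdir/stu_102999_2_finished.jpg"

# Dry run: generate the mv commands; & is the whole matched file name.
commands=$(ls "$workdir" | sed -nE 's#^(.*)_finished(\.jpg)$#mv & \1\2#p')
echo "$commands"

# Execute the generated commands inside the scratch directory.
printf '%s\n' "$commands" | (cd "$workdir" && sh)

result=$(ls "$workdir")
echo "$result"
rm -rf "$workdir"
```

Generated commands like these break on file names containing spaces or shell metacharacters, so inspecting the dry-run output before piping it to a shell is a sensible habit.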
9.5 Batch-create users
[root@linzhongniao ~]# echo stu{1..10}|xargs -n 1|awk '{print "useradd" ,$0}'
useradd stu1
useradd stu2
useradd stu3
useradd stu4
useradd stu5
useradd stu6
useradd stu7
useradd stu8
useradd stu9
useradd stu10
[root@linzhongniao ~]# echo stu{1..10}|xargs -n 1|awk '{print "useradd " ,$0}'
useradd stu1
useradd stu2
useradd stu3
useradd stu4
useradd stu5
useradd stu6
useradd stu7
useradd stu8
useradd stu9
useradd stu10
Finally, pipe the result to bash to execute it
[root@linzhongniao ~]# echo stu{1..10}|xargs -n 1|awk '{print "useradd " ,$0}'|bash
Reposted from: https://blog.51cto.com/10642812/2179014